mida

command module
v0.0.0-...-1d26606 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Dec 11, 2022 License: MIT Imports: 25 Imported by: 0

README

MIDA: A Tool for Measuring the Web

Go Go Report Card

MIDA is meant to be a general tool for web measurement projects. It is built in Go on top of Chrome/Chromium and the DevTools protocol, giving it a realistic vantage point to study the web and fine-grained access to information provided by Chrome Developer Tools.


Getting Started

Getting started with MIDA is easy! First, install:

$ wget files.mida.sprai.org/setup.py
$ sudo python3 setup.py 

Now we are ready to visit a site and collect some data:

$ mida go example.org

You can find the results of your crawl in the results/ directory.

Easy At-Scale Crawling

One major benefit of MIDA is in being able to run large scale, highly configurable crawls without needing to write your own crawler code. Here's an example of a single MIDA command which will crawl the Alexa Top 100K and gather a few specific types of data:

$ mida go -f https://files.mida.sprai.org/toplists/alexa.lst -n100000 -c8 --all-resources --screenshot --dom

Breaking this down by argument:

-f https://files.mida.sprai.org/toplists/alexa.lst: This is a list of the Alexa Top Websites. You can read from a local file or go get one hosted on the web somewhere

-n100000: Read the top 100,000 entries from the list

-c8: Run with 8 parallel crawlers (browser instances)

--all-resources: Gather all of the actual files/resources required to render the web page. Beware, this takes a lot of space!

--screenshot: Capture a screenshot after/if the load event for each website fires.

--dom: Capture a JSON representation of the DOM for each website visited.

Documentation

The Go Gopher

There is no documentation for this package.

Directories

Path Synopsis
This package contains the base/root components of MIDA.
This package contains the base/root components of MIDA.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL