2022-04-websteps-illustrated

module
v0.0.0-...-41ba115 Latest
Published: May 30, 2023 License: GPL-3.0

README

Websteps illustrated

This repository contains the third prototype of websteps (codename "winter 2022"). This prototype follows the "summer 2021" and "fall 2021" prototypes.

Content of this repository

The internal directory contains code derived from ooni/probe-cli v3.14.1 as well as new code. As a rule of thumb, most directories could be easily merged back, except measurex and websteps, which have been significantly rewritten and would require either more careful merging or a yolo-rewrite-everything approach.

The cmd directory contains commands using code in the internal library. The Directories section below lists each command with a one-line synopsis.

The spec directory contains the current draft specification of websteps, which still needs to be discussed with my colleagues and other friends of OONI.

The python directory contains:

  • python/websteps.py: minimal implementation of the websteps concept written in Python, not meant to become production ready, but useful to see the algorithms in a smaller context and to show that websteps implementations not written in Go can interoperate with the test helper written in Go;

  • python/analysis/dbsteps: Python script to analyze websteps measurements and view them in the browser;

  • python/testcase/create: script to create integration test cases for websteps while running measurements;

  • python/testcase/shell: script to manage the integration tests for websteps;

  • python/ooni: comprehensive library to import and process websteps measurements using Python.

The testdata/testcase directory contains a few test cases collected using the create command and managed using the shell command.

The html directory contains support files for browsing websteps measurements and test cases using HTML.

NOTE: while I spent some time polishing this code, it is still experimental, with little unit testing and, for sure, a bunch of inconsistencies between the spec and the implementation. This is normal, given that for now websteps is still a bit of a moving target.

Building the websteps client

You need to use go1.17.13 to build this repository.

go build -v ./cmd/websteps

Building the TH

go build -v ./cmd/thd

Changes since websteps fall 2021

These are the main changes since the fall 2021 edition (collection?! 😅):

  1. added support for PTR and NS queries as well as for opportunistically extracting the CNAME from replies;

  2. implemented a parallel DNSResolver using a custom DNSTransport;

  3. reworked the system resolver to fake a DNSTransport and more easily produce the OONI DNS data format;

  4. several reliability and correctness fixes in DNS code;

  5. significantly reworked the conceptual model of measurex to more easily accommodate implementing websteps;

  6. around one month of experience running websteps code in several countries (including China, Italy, and Iran), which dramatically helped to improve the robustness of the implementation as well as to develop "scoring" algorithms;

  7. developed a set of algorithms to assign blocking flags to websteps measurements as well as heuristics to spot common classes of false positives and flag them correctly;

  8. implemented and integrated a dnsping extension for websteps that allows us to confirm cases of DNS blocking with more confidence as well as to retract DNS timeout claims when the timeouts are transient;

  9. integration testing framework based on caching the TH's and the probe's measurements and replaying measurements collected in the field (thus being more true to real-world censorship than censorship simulated using jafar or similar tools);

  10. robust caching mechanism for the TH;

  11. started experimenting with using TLSH to classify webpages in addition to using the traditional Web Connectivity algorithm (but this effort is so far a bit inconclusive);

  12. TH protocol using WebSocket in addition to web APIs to increase robustness when middleboxes close connections that stay silent for a number of seconds;

  13. figured out ways in which the original, optimistic let's-measure-every-endpoint model breaks when coupled with the typical OONI constraints of timing and single-URL-at-a-time, and added to the algorithm reasonable settings to strike a balance between depth and breadth;

  14. learned that my effort estimate is usually off by a 5x factor 😬;

  15. the design incorporates future improvements in the check-in API that will allow us to customize how we measure URLs depending on the context (so, we will be able to say for each URL in a given country and ASN, the amount of body bytes to download, whether to follow redirects, etc).

This work addresses, in part or completely:

issue        level of completion
probe#2034   complete
probe#1190   complete
probe#1806   complete
probe#1803   now unnecessary
probe#1516   mostly(?) complete
probe#1718   complete

What happens now

  • continue to discuss the spec with OONI friends;

  • prepare a short presentation for pitching websteps since the spec is long and it may be beneficial to also provide people with short introductions;

  • continue extensive data analysis and start preparing reports/blog posts based on this work;

  • write spec for extensions (including dnsping, already implemented, and sniblocking, which we need);

  • collect more test cases and add support for automatically checking that we're still passing these test cases;

  • figure out ways to auto-generate parts of the codebase if possible (especially python data structs that depend on Go data structs: that would be nice);

  • perform again a performance comparison with Web Connectivity and also a comparison in terms of accuracy;

  • double check that our level of parallelism is adequate for testing in low bandwidth scenarios;

  • start merging back into probe-cli the easy parts and generally aim to reduce the diff between this fork and the original codebase;

  • sync up the OONI issue tracker with the work I have been doing here basically in sti mode;

  • extend the underlying library to add support for as many raw errors as possible;

  • decide how to adapt tutorials to changes in here.
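
Until the dnsping spec mentioned above is written, the idea of retracting a DNS timeout claim when timeouts are transient can be sketched as follows. This is not the algorithm implemented in internal/dnsping: the Verdict type, the ErrTimeout value, the Classify function and the "all pings must time out" rule are assumptions chosen to keep the example small.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrTimeout is a stand-in for the timeout error a real DNS transport
// would return (hypothetical name).
var ErrTimeout = errors.New("timeout")

// Verdict summarizes a series of DNS pings (illustrative names).
type Verdict string

const (
	VerdictInconclusive     Verdict = "inconclusive"
	VerdictNoTimeout        Verdict = "no_timeout"
	VerdictTransientTimeout Verdict = "transient_timeout"
	VerdictPersistentTimeout Verdict = "persistent_timeout"
)

// Classify takes the error of each ping (nil means a reply was received)
// and decides whether a timeout claim should stand. The rule used here is
// an assumption: the claim stands only if every single ping timed out.
func Classify(pings []error) Verdict {
	if len(pings) == 0 {
		return VerdictInconclusive
	}
	timeouts := 0
	for _, err := range pings {
		if errors.Is(err, ErrTimeout) {
			timeouts++
		}
	}
	switch {
	case timeouts == 0:
		return VerdictNoTimeout
	case timeouts == len(pings):
		return VerdictPersistentTimeout
	default:
		// Some pings got through: the earlier timeout was likely transient.
		return VerdictTransientTimeout
	}
}

func main() {
	fmt.Println(Classify([]error{ErrTimeout, nil, nil}))
	fmt.Println(Classify([]error{ErrTimeout, ErrTimeout, ErrTimeout}))
}
```

A real implementation would probably also look at the replies themselves (for example, whether the answers agree with the TH's), which this sketch ignores.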

Nginx setup

If thd is running locally (and please remember to force it to drop root privileges), you can integrate it with an existing nginx setup by adding:

  location /websteps/v1/websocket {
      proxy_read_timeout 900;
      proxy_pass http://127.0.0.1:9876;
      proxy_set_header X-Forwarded-Proto $scheme;
      proxy_set_header Upgrade $http_upgrade;
      proxy_set_header Connection "Upgrade";
      proxy_set_header Host $host;
  }
  location /websteps/v1/http {
      proxy_set_header X-Forwarded-Proto $scheme;
      proxy_read_timeout 900;
      proxy_pass http://127.0.0.1:9876;
  }

Directories

Path                                  Synopsis
cmd/crawler                           Command crawler crawls a set of URLs.
cmd/dnslookup                         Command dnslookup allows performing DNS lookups.
cmd/dnsping                           Command dnsping allows sending DNS pings.
cmd/thctl                             Command thctl is the test helper client.
cmd/thd                               Command thd is the test helper daemon.
cmd/websteps                          Command websteps is a websteps client.
internal/archival                     Package archival is a heavily modified copy of probe-cli internal/archival.
internal/atomicx                      Package atomicx extends sync/atomic.
internal/caching                      Package caching implements an on-disk cache.
internal/dnsping                      Package dnsping contains code for sending DNS pings.
internal/engine/experiment/websteps   Package websteps implements the websteps experiment.
internal/engine/geolocate             Package geolocate contains stubs emulating the namesake probe-cli package.
internal/engine/httpheader            Package httpheader contains code to set common HTTP headers.
internal/logcat                       Package logcat implements a logcat-like functionality for ooniprobe.
internal/measurex                     Package measurex contains a heavily modified internal/measurex.
internal/model                        Package model is a subset of probe-cli's internal/model package.
internal/netxlite                     Package netxlite contains network extensions.
internal/runtimex                     Package runtimex contains runtime extensions.
internal/scrubber                     Package scrubber contains part of probe-cli's internal/scrubber.
