subwaydata.nyc

Version: v0.0.0-...-e98602f. Published: Aug 10, 2023. License: MIT.

Repository

github.com/jamespfennell/subwaydata.nyc

README

subwaydata.nyc

This is all of the source code for the subwaydata.nyc project. The project has two pieces:

An ETL pipeline that takes raw subway data collected using Hoard and generates per-day CSV files containing data on all of the trips that ran.
An HTTP server that serves the website files.

Metadata

The project maintains a metadata file that lists all days that have been processed. This file is persisted in DigitalOcean's object (bucket) storage and can be downloaded directly. Each time the ETL pipeline runs, it updates the metadata file. The website periodically polls for the file and regenerates its HTML files when the file changes.

The structure of the metadata file is defined in the metadata/metadata.go file.
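The "update when it changes" step can be sketched as a small change detector. This is only an illustration: the changeDetector type and its hashing approach are assumptions and are not taken from the project, and the payloads are placeholders rather than the real schema from metadata/metadata.go.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// changeDetector is a hypothetical helper that remembers a digest of the
// last payload it saw, so a poller can tell whether a download is new.
type changeDetector struct {
	seen bool
	last [sha256.Size]byte
}

// Changed reports whether body differs from the previously seen payload.
// The first payload always counts as a change.
func (d *changeDetector) Changed(body []byte) bool {
	sum := sha256.Sum256(body)
	if d.seen && sum == d.last {
		return false
	}
	d.seen, d.last = true, sum
	return true
}

func main() {
	var d changeDetector
	// Placeholder payloads standing in for three successive polls.
	polls := [][]byte{[]byte("v1"), []byte("v1"), []byte("v2")}
	for _, body := range polls {
		fmt.Println(d.Changed(body))
	}
}
```

Comparing digests rather than parsed contents keeps the check independent of the metadata schema; a real poller might instead rely on HTTP caching headers if the storage provider returns them.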

ETL pipeline

The code is mostly in the etl directory. A binary is defined in cmd/etl.

The ETL pipeline needs a config file that specifies the feeds to work with and the credentials for the object storage where the data is stored. An example of this config is given in etl/config/sample.json.

To run the ETL pipeline for a single day:

go run ./cmd/etl --hoard-config $HOARD_CONFIG --etl-config $ETL_CONFIG run --day YYYY-MM-DD

To run the ETL pipeline over the whole backlog (i.e. days that have yet to be processed):

go run ./cmd/etl --hoard-config $HOARD_CONFIG --etl-config $ETL_CONFIG backlog
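As a rough illustration of what "days that have yet to be processed" means, the sketch below lists every day in a date range that is missing from a set of processed days. The backlog function, its signature, and the sample dates are assumptions made for this example; the project's actual logic lives in the etl directory and may differ.

```go
package main

import (
	"fmt"
	"time"
)

// dayLayout matches the YYYY-MM-DD format used by the --day flag.
const dayLayout = "2006-01-02"

// backlog is a hypothetical helper: it returns every day in [first, last]
// (inclusive) that does not appear in processed.
func backlog(first, last string, processed map[string]bool) ([]string, error) {
	start, err := time.Parse(dayLayout, first)
	if err != nil {
		return nil, err
	}
	end, err := time.Parse(dayLayout, last)
	if err != nil {
		return nil, err
	}
	var pending []string
	for d := start; !d.After(end); d = d.AddDate(0, 0, 1) {
		if key := d.Format(dayLayout); !processed[key] {
			pending = append(pending, key)
		}
	}
	return pending, nil
}

func main() {
	// Made-up sample data, not real pipeline state.
	processed := map[string]bool{"2023-08-01": true, "2023-08-03": true}
	pending, err := backlog("2023-08-01", "2023-08-04", processed)
	if err != nil {
		panic(err)
	}
	fmt.Println(pending) // [2023-08-02 2023-08-04]
}
```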

To run a periodic job that runs the pipeline every day at certain times:

go run ./cmd/etl --hoard-config $HOARD_CONFIG --etl-config $ETL_CONFIG periodic 05:30:00-06:00:00
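The README does not spell out how the periodic argument is interpreted. Assuming it denotes a daily START-END window of clock times, parsing it could look roughly like the sketch below. The window type, parseWindow, and the half-open interval semantics are all assumptions for illustration, not the project's implementation.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// window is a hypothetical daily interval, stored as offsets from midnight.
type window struct{ start, end time.Duration }

// sinceMidnight returns how far into its day t falls.
func sinceMidnight(t time.Time) time.Duration {
	return time.Duration(t.Hour())*time.Hour +
		time.Duration(t.Minute())*time.Minute +
		time.Duration(t.Second())*time.Second
}

// parseClock parses an HH:MM:SS string into an offset from midnight.
func parseClock(s string) (time.Duration, error) {
	t, err := time.Parse("15:04:05", s)
	if err != nil {
		return 0, err
	}
	return sinceMidnight(t), nil
}

// parseWindow parses "HH:MM:SS-HH:MM:SS" and requires start < end.
func parseWindow(s string) (window, error) {
	from, to, ok := strings.Cut(s, "-")
	if !ok {
		return window{}, fmt.Errorf("expected START-END, got %q", s)
	}
	start, err := parseClock(from)
	if err != nil {
		return window{}, err
	}
	end, err := parseClock(to)
	if err != nil {
		return window{}, err
	}
	if end <= start {
		return window{}, fmt.Errorf("window %q does not end after it starts", s)
	}
	return window{start: start, end: end}, nil
}

// contains treats the window as half-open: start is inside, end is not.
func (w window) contains(offset time.Duration) bool {
	return offset >= w.start && offset < w.end
}

func main() {
	w, err := parseWindow("05:30:00-06:00:00")
	if err != nil {
		panic(err)
	}
	fmt.Println(w.contains(sinceMidnight(time.Date(2023, 8, 10, 5, 45, 0, 0, time.UTC)))) // true
	fmt.Println(w.contains(sinceMidnight(time.Date(2023, 8, 10, 6, 15, 0, 0, time.UTC)))) // false
}
```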

Each of these commands accepts additional options, which are described in the help text:

go run ./cmd/etl help

Website

The code is mostly in the website directory. A binary is defined in cmd/website.

The website periodically polls for the metadata and then updates the HTML files. Run it with:

go run ./cmd/website --metadata-url https://data.subwaydata.nyc/subwaydatanyc_metadata.json --port 8080
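One way to serve pages that are regenerated in the background is to keep the latest rendered HTML behind a lock and swap it whenever new metadata arrives. The sketch below illustrates that pattern only; the page type and the sample HTML are assumptions, and the real server in the website directory may be structured quite differently.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"sync"
)

// page is a hypothetical holder for the most recently rendered HTML.
// It implements http.Handler so it can be served directly.
type page struct {
	mu   sync.RWMutex
	html string
}

// Set replaces the rendered HTML, for example after a metadata poll.
func (p *page) Set(html string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.html = html
}

// Get returns the current rendered HTML.
func (p *page) Get() string {
	p.mu.RLock()
	defer p.mu.RUnlock()
	return p.html
}

func (p *page) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "text/html; charset=utf-8")
	io.WriteString(w, p.Get())
}

func main() {
	p := &page{}
	p.Set("<p>1 day processed</p>")

	// A local test server stands in for the real listener on --port.
	srv := httptest.NewServer(p)
	defer srv.Close()

	// Simulate a poll that found updated metadata.
	p.Set("<p>2 days processed</p>")

	resp, err := http.Get(srv.URL)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body)) // <p>2 days processed</p>
}
```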

Directories

Path	Synopsis
cmd
cmd/etl
cmd/journal
cmd/website
etl
etl/config
etl/export
etl/journal	Package journal contains a tool for building trip journals.
etl/periodic
etl/storage
metadata
website
website/html
website/static
