dist

package module
v0.3.3
Warning

This package is not in the latest version of its module.

Published: Jan 13, 2021 License: BSD-3-Clause Imports: 5 Imported by: 2

README

go-whosonfirst-dist

Go package for working with Who's On First distributions.

Important

The documentation for this package is incomplete.

Build process(es)

Building SQLite databases
  • Fetch remote clone or use local checkout
  • Build databases, generate indices
  • Compress databases, optionally preserve uncompressed
  • Clean up (remote clone or local checkout)

For example:

$> wof-dist-build \
	-build-sqlite-common \
	-preserve-checkout \
	-workdir /usr/local/dist \
	whosonfirst-data-admin-ca
$> wof-dist-build \
	-build-sqlite-common \
	-local-checkout \
	-compress-all \
	-workdir /usr/local/dist \
	/usr/local/data/dist/whosonfirst-data

If -local-checkout is true, -preserve-checkout is automatically set to true.
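If you drive wof-dist-build from another Go program, the flag combinations above can be assembled with os/exec. This is only a sketch: the flag names are taken from the examples above, while buildArgs and its parameters are hypothetical helpers that are not part of this package.

```go
package main

import (
	"fmt"
	"os/exec"
)

// buildArgs assembles the arguments for a SQLite-only build of one repo.
// It is a hypothetical helper; only the flag names come from the README.
func buildArgs(workdir, repo string, localCheckout, preserveCheckout bool) []string {
	args := []string{"-build-sqlite-common", "-workdir", workdir}
	if localCheckout {
		// -preserve-checkout is implied by -local-checkout, so it is not added.
		args = append(args, "-local-checkout")
	} else if preserveCheckout {
		args = append(args, "-preserve-checkout")
	}
	return append(args, repo)
}

func main() {
	args := buildArgs("/usr/local/dist", "whosonfirst-data-admin-ca", false, true)
	cmd := exec.Command("wof-dist-build", args...)
	fmt.Println(cmd.Args)
	// To actually run the build, call cmd.CombinedOutput() and check the error.
}
```

Keeping the argument assembly in its own function makes it possible to test the flag logic without running a build.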

Building meta file(s)

To be written.

Building bundles
  • Fetch remote clone or use remote (compressed) SQLite database or use local (uncompressed) SQLite database
  • Generate metafiles
  • Build bundle, generate index
  • Compress bundle, optionally preserve uncompressed
  • Clean up (remote clone or remote (compressed) SQLite database or local (uncompressed) SQLite database)

For example:

$> wof-dist-build \
	-build-sqlite-common=false \
	-build-bundle \
	-local-sqlite \
	-workdir /usr/local/dist \
	whosonfirst-data-admin-ca

If -local-sqlite is true, -preserve-sqlite is automatically set to true.

Tools

To build binary versions of these tools run the cli Makefile target. For example:

$> make cli
go build -mod vendor -o bin/wof-dist-build cmd/wof-dist-build/main.go
go build -mod vendor -o bin/wof-dist-fetch cmd/wof-dist-fetch/main.go

wof-dist-build

Build one or more distribution files for a list of Who's On First repositories.

$> ./bin/wof-dist-build -h
Build one or more distribution files for a list of Who's On First repositories.

Usage:
	 ./bin/wof-dist-build [options] repo(N) repo(N)

Valid options are:
  -build-bundle
    	Build a bundle distribution for a repo.
  -build-meta
    	Build meta files for a repo
  -build-sqlite
    	Build a (common) SQLite distribution for a repo. This flag is DEPRECATED. (default true)
  -build-sqlite-all
    	Build a SQLite distribution for a repo, with all tables defined by the other -build-sqlite flags.
  -build-sqlite-common
    	Build a SQLite distribution for a repo, with common tables. (default true)
  -build-sqlite-rtree
    	Build a SQLite distribution for a repo, with rtree-related tables.
  -build-sqlite-search
    	Build a (common) SQLite distribution for a repo, with search-tables.
  -combined
    	Create a single combined distribution from multiple repos.
  -combined-name string
    	Distribution name for a single combined distribution from multiple repos.
  -compress-all
    	Compress all distributions that are created. (default true)
  -compress-bundle
    	Compress bundle distribution. (default true)
  -compress-max-cpus int
    	Number of concurrent processes to use when compressing distribution items. (default 2)
  -compress-meta
    	Compess meta (CSV) file distribution. (default true)
  -compress-sqlite
    	Compress SQLite database distribution. (default true)
  -custom-repo
    	Allow custom repo names (default true)
  -git-clone string
    	Indicate how to clone a repo, using either a native Git binary or the go-git implementation. Currently only the native Git binary is supported. (default "native")
  -git-organization string
    	Fetch repos from the user (or organization) (default "whosonfirst-data")
  -git-protocol string
    	Fetch repos using this protocol (default "https")
  -git-source string
    	Fetch repos from this endpoint (default "github.com")
  -index-alt-files
    	Index alternate geometry files.
  -index-relations
    	Index the records related to a feature, specifically wof:belongsto, wof:depicts and wof:involves. Alt files for relations are not indexed at this time.
  -index-relations-reader-uri string
    	A valid go-reader.Reader URI from which to read data for a relations candidate.
  -local-checkout
    	Do not fetch a repo from a remote source but instead use a local checkout on disk
  -local-sqlite
    	Do not build a new SQLite database but use a pre-existing database on disk (this expects to find the database at the same path it would be stored if the database were created from scratch)
  -preserve-all
    	Preserve all intermediary files that are used or created while building distributions.
  -preserve-bundle
    	Preserve any intermediary bundles used or created while building distributions.
  -preserve-checkout
    	Do not remove repo from disk after the build process is complete. This is automatically set to true if the -local-checkout flag is true.
  -preserve-meta
    	Preserve any intermediary meta (CSV) files used or created while building distributions.
  -preserve-sqlite
    	Preserve any intermediary SQLite databases used or created while building distributions.
  -query value
    	One or more {PATH}={REGEXP} parameters for filtering records.
  -query-mode string
    	Specify how query filtering should be evaluated. Valid modes are: ALL, ANY (default "ALL")
  -strict
    	Stop execution if any errors loading files (typically alternate geometries) are encountered.
  -timings
    	Display timings during the build process
  -verbose
    	Be chatty
  -workdir string
    	Where to store temporary and final build files. If empty the code will attempt to use the current working directory.

For example:

$> mkdir tmp

$> ./bin/wof-dist-build \
	-timings \
	-verbose \
	-workdir ./tmp \
	-build-sqlite-common \
	-build-meta \
	whosonfirst-data-constituency-ca

13:09:59.008358 [wof-dist-build] STATUS git lfs clone -depth 1 https://github.com/whosonfirst-data/whosonfirst-data-constituency-ca.git tmp/whosonfirst-data-constituency-ca
13:10:04.008358 [wof-dist-build] STATUS time to clone whosonfirst-data-constituency-ca 4.425250127s
13:10:04.008388 [wof-dist-build] STATUS LOCAL tmp/whosonfirst-data-constituency-ca
13:10:07.620093 [wof-dist-build] STATUS CREATED tmp/whosonfirst-data-constituency-ca-latest.db
13:10:07.620126 [wof-dist-build] STATUS BUILD METAFILE sqlite tmp/whosonfirst-data-constituency-ca-latest.db
2018/06/11 13:10:08 time to prepare tmp/whosonfirst-data-constituency-ca-latest.db 874.109451ms
2018/06/11 13:10:08 time to prepare all 55 records 874.133191ms
13:10:08.613818 [wof-dist-build] STATUS time to build all 9.030797929s

$> ls -al ./tmp
-rw-r--r--  1 asc  staff  33390592 Jun 11 13:10 whosonfirst-data-constituency-ca-latest.db
-rw-r--r--  1 asc  staff     17895 Jun 11 13:10 wof-constituency-ca-latest.csv

Inline queries

You can also specify inline queries by passing a -query parameter which is a string in the format of:

{PATH}={REGULAR EXPRESSION}

Paths follow the dot notation syntax used by the tidwall/gjson package and regular expressions are any valid Go language regular expression. Successful path lookups will be treated as a list of candidates and each candidate's string value will be tested against the regular expression's MatchString method.

For example:

$> ./bin/wof-dist-build \
	-query 'properties.wof:placetype=locality' \
	-compress-all=false \
	-workdir /tmp \
	whosonfirst-data-admin-ca

$> sqlite3 /tmp/whosonfirst-data-admin-ca-latest.db
SQLite version 3.28.0 2019-04-15 14:49:49
Enter ".help" for usage hints.
sqlite> SELECT DISTINCT(placetype) FROM spr;
locality
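
The {PATH}={REGULAR EXPRESSION} format described above could be handled roughly as follows. This is a standard-library sketch, not the package's own code: the query type, parseQuery and matchesAny are hypothetical names, the candidate values are hard-coded where a gjson lookup would normally supply them, and treating a query as satisfied when any one candidate matches is an assumption.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// query is a hypothetical container for one parsed -query parameter.
type query struct {
	Path  string
	Match *regexp.Regexp
}

// parseQuery splits "{PATH}={REGEXP}" on the first "=" and compiles the
// right-hand side as a Go regular expression.
func parseQuery(raw string) (*query, error) {
	path, expr, ok := strings.Cut(raw, "=")
	if !ok || path == "" {
		return nil, fmt.Errorf("invalid query %q, expected {PATH}={REGEXP}", raw)
	}
	re, err := regexp.Compile(expr)
	if err != nil {
		return nil, err
	}
	return &query{Path: path, Match: re}, nil
}

// matchesAny reports whether at least one candidate matches the query.
// (Assumption: one matching candidate is enough to satisfy a query.)
func (q *query) matchesAny(candidates []string) bool {
	for _, c := range candidates {
		if q.Match.MatchString(c) {
			return true
		}
	}
	return false
}

func main() {
	q, err := parseQuery("properties.wof:belongsto=136251273")
	if err != nil {
		panic(err)
	}
	// In the real tool the candidates would come from looking up q.Path
	// in each record; here they are hard-coded.
	fmt.Println(q.Path, q.matchesAny([]string{"85633041", "136251273"}))
}
```

Note that MatchString reports whether the string contains a match, so an unanchored expression such as locality also matches longer values that contain it. Use ^locality$ if you need an exact match.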

You can pass multiple -query parameters. For example:

$> ./bin/wof-dist-build \
	-query 'properties.wof:placetype=locality' \
	-query 'properties.wof:belongsto=136251273' \
	-compress-all=false \
	-workdir /tmp \
	whosonfirst-data-admin-ca

$> sqlite3 /tmp/whosonfirst-data-admin-ca-latest.db

SQLite version 3.28.0 2019-04-15 14:49:49
Enter ".help" for usage hints.
sqlite> SELECT COUNT(id) FROM geojson;
2892

By default (-query-mode ALL) a record must match every query. To keep records that match at least one query instead, pass -query-mode ANY.
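
The difference between the two modes can be sketched as a function that combines per-query results. The function name and signature are hypothetical and not taken from the package. Only the mode names ALL and ANY come from the help text.

```go
package main

import "fmt"

// matchMode combines the results of individual queries for one record.
// mode is "ALL" or "ANY", mirroring the values accepted by -query-mode.
func matchMode(mode string, results []bool) (bool, error) {
	switch mode {
	case "ALL":
		for _, ok := range results {
			if !ok {
				return false, nil
			}
		}
		return true, nil
	case "ANY":
		for _, ok := range results {
			if ok {
				return true, nil
			}
		}
		return false, nil
	default:
		return false, fmt.Errorf("unsupported query mode %q", mode)
	}
}

func main() {
	// One record, two queries: the first matched, the second did not.
	results := []bool{true, false}

	all, _ := matchMode("ALL", results)
	any, _ := matchMode("ANY", results)
	fmt.Println("ALL:", all, "ANY:", any)
}
```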

"Combined" distributions

It is also possible to create a single combined distribution from two or more repos by passing the -combined and -combined-name flags.

Here's an example that, in addition to creating a combined distribution, assumes local and non-standard repositories, builds a "bundle" distribution and indexes alternate geometry files.

Note that as of this writing alternate geometry files are not supported for either bundles or (CSV) meta files. They will be, but today they are only indexed in SQLite databases.

$> ./bin/wof-dist-build \
	-build-bundle -custom-repo -preserve-checkout -local-checkout -index-alt-files \
	-timings -verbose \
	-workdir /usr/local/data \
	-combined -combined-name sfomuseum-data-flights \
	sfomuseum-data-flights-2019-04 sfomuseum-data-flights-2019-05

go build -o bin/wof-dist-build cmd/wof-dist-build/main.go
go build -o bin/wof-dist-fetch cmd/wof-dist-fetch/main.go
15:24:20.003232 [wof-dist-build] STATUS local_checkouts are [/usr/local/data/sfomuseum-data-flights-2019-04 /usr/local/data/sfomuseum-data-flights-2019-05]
15:24:20.105579 [wof-dist-build] STATUS commit hashes are map[sfomuseum-data-flights-2019-04:bd913977adef7a56a5d236046ff878b261d7f289 sfomuseum-data-flights-2019-05:2a12dc4085353ea65423f09ec1369e4c6d6d6426] ([/usr/local/data/sfomuseum-data-flights-2019-04 /usr/local/data/sfomuseum-data-flights-2019-05])
15:25:20.122712 [wof-dist-build] STATUS time to index ancestors (61608) : 23.027587622s
15:25:20.122774 [wof-dist-build] STATUS time to index concordances (61608) : 3.051047626s
15:25:20.123193 [wof-dist-build] STATUS time to index geojson (61608) : 8.833956578s
15:25:20.123215 [wof-dist-build] STATUS time to index spr (61608) : 17.115882932s
15:25:20.123221 [wof-dist-build] STATUS time to index names (61608) : 6.171740731s
15:25:20.123226 [wof-dist-build] STATUS time to index all (61608) : 1m0.004661436s
15:26:20.123918 [wof-dist-build] STATUS time to index names (110124) : 11.724367851s
15:26:20.123986 [wof-dist-build] STATUS time to index ancestors (110124) : 41.615312492s
15:26:20.124001 [wof-dist-build] STATUS time to index concordances (110124) : 6.179600527s
15:26:20.124007 [wof-dist-build] STATUS time to index geojson (110124) : 17.307942063s
15:26:20.124012 [wof-dist-build] STATUS time to index spr (110124) : 39.327491403s
15:26:20.124018 [wof-dist-build] STATUS time to index all (110124) : 2m0.005381573s
15:27:20.127816 [wof-dist-build] STATUS time to index geojson (155162) : 25.833516964s
15:27:20.127834 [wof-dist-build] STATUS time to index spr (155162) : 1m1.9488226s
15:27:20.127854 [wof-dist-build] STATUS time to index names (155162) : 17.371926717s
15:27:20.127860 [wof-dist-build] STATUS time to index ancestors (155162) : 59.038325554s
15:27:20.127864 [wof-dist-build] STATUS time to index concordances (155162) : 9.573363568s
15:27:20.127868 [wof-dist-build] STATUS time to index all (155162) : 3m0.008763864s
15:27:48.281906 [wof-dist-build] STATUS Built  without any reported errors
15:27:48.281945 [wof-dist-build] STATUS local sqlite is /usr/local/data/sfomuseum-data-flights-latest.db
15:27:48.281973 [wof-dist-build] STATUS build metafile from sqlite ([/usr/local/data/sfomuseum-data-flights-latest.db])
2019/06/03 15:28:34 time to prepare /usr/local/data/sfomuseum-data-flights-latest.db 45.957440286s
2019/06/03 15:28:34 time to prepare all 154858 records 45.957729135s
15:28:35.683017 [wof-dist-build] STATUS time to build metafiles (/usr/local/data/sfomuseum-data-flights.csv) 47.400595263s
15:32:04.141721 [wof-dist-build] STATUS time to build bundles () 3m28.456791321s
15:32:04.141747 [wof-dist-build] STATUS time to build UNCOMPRESSED distributions for sfomuseum-data-flights 7m44.134569605s
15:32:04.144406 [wof-dist-build] STATUS register function to compress /usr/local/data/sfomuseum-data-flights-latest.db
15:32:04.144602 [wof-dist-build] STATUS time to wait to start compressing /usr/local/data/sfomuseum-data-flights-latest.db 581ns
15:32:04.144530 [wof-dist-build] STATUS register function to compress /usr/local/data/sfomuseum-data-flights.csv
15:32:04.145585 [wof-dist-build] STATUS time to wait to start compressing /usr/local/data/sfomuseum-data-flights.csv 437ns
15:32:04.144563 [wof-dist-build] STATUS register function to compress /usr/local/data/sfomuseum-data-flights-latest
15:32:20.892664 [wof-dist-build] STATUS All done compressing /usr/local/data/sfomuseum-data-flights.csv (throttle)
15:32:20.892781 [wof-dist-build] STATUS time to compress /usr/local/data/sfomuseum-data-flights.csv 16.747045767s
15:32:20.892860 [wof-dist-build] STATUS All done compressing /usr/local/data/sfomuseum-data-flights.csv
15:32:20.892820 [wof-dist-build] STATUS time to wait to start compressing /usr/local/data/sfomuseum-data-flights-latest 16.746488168s
15:34:21.574356 [wof-dist-build] STATUS All done compressing /usr/local/data/sfomuseum-data-flights-latest.db (throttle)
15:34:21.574376 [wof-dist-build] STATUS time to compress /usr/local/data/sfomuseum-data-flights-latest.db 2m17.428545702s
15:34:21.574380 [wof-dist-build] STATUS All done compressing /usr/local/data/sfomuseum-data-flights-latest.db
15:35:08.325796 [wof-dist-build] STATUS All done compressing /usr/local/data/sfomuseum-data-flights-latest (throttle)
15:35:08.325817 [wof-dist-build] STATUS time to compress /usr/local/data/sfomuseum-data-flights-latest 3m4.1779794s
15:35:08.325822 [wof-dist-build] STATUS All done compressing /usr/local/data/sfomuseum-data-flights-latest
15:35:08.325854 [wof-dist-build] STATUS remove uncompressed file /usr/local/data/sfomuseum-data-flights-latest.db
15:35:08.325873 [wof-dist-build] STATUS remove uncompressed file /usr/local/data/sfomuseum-data-flights.csv
15:35:08.325861 [wof-dist-build] STATUS remove uncompressed file /usr/local/data/sfomuseum-data-flights-latest
15:35:51.580311 [wof-dist-build] STATUS time to remove uncompressed files for sfomuseum-data-flights 43.254038216s
15:35:51.580393 [wof-dist-build] STATUS time to build COMPRESSED distributions for sfomuseum-data-flights 11m31.571211707s
15:35:51.580542 [wof-dist-build] STATUS time to build distributions for 2 repos 11m31.571501299s
15:35:51.581774 [wof-dist-build] STATUS Wrote inventory /usr/local/data/sfomuseum-data-flights-inventory.json

$> cat /usr/local/data/sfomuseum-data-flights-inventory.json 
[
  {
    "name": "sfomuseum-data-flights.csv",
    "type": "x-urn:whosonfirst:csv:meta#event",
    "name_compressed": "sfomuseum-data-flights.csv.bz2",
    "count": 154860,
    "size": 45900340,
    "size_compressed": 7550967,
    "sha256_compressed": "cf2023e9f895f5f9671ebbb280983149b4aa09dfecfc71c967576da5750b4de6",
    "last_updated": "2019-05-17T11:19:14-07:00",
    "last_modified": "2019-06-03T15:28:34-07:00",
    "repo": "sfomuseum-data-flights-2019-04:sfomuseum-data-flights-2019-05",
    "commit": "bd913977adef7a56a5d236046ff878b261d7f289:2a12dc4085353ea65423f09ec1369e4c6d6d6426"
  }, 
  {
    "name": "sfomuseum-data-flights-latest.db",
    "type": "x-urn:whosonfirst:database:sqlite#common",
    "name_compressed": "sfomuseum-data-flights-latest.db.bz2",
    "count": 166469,
    "size": 943042560,
    "size_compressed": 105789001,
    "sha256_compressed": "1c1afc5f337cea024da5e4cd198e67ccb3dbe9c45ed31dfdafb505f2a0a1bc4d",
    "last_updated": "2019-05-17T11:19:14-07:00",
    "last_modified": "2019-06-03T15:27:47-07:00",
    "repo": "sfomuseum-data-flights-2019-04:sfomuseum-data-flights-2019-05",
    "commit": "bd913977adef7a56a5d236046ff878b261d7f289:2a12dc4085353ea65423f09ec1369e4c6d6d6426"
  }, 
  {
    "name": "sfomuseum-data-flights-latest",
    "type": "x-urn:whosonfirst:fs:bundle#sfomuseum-data-flights-latest",
    "name_compressed": "sfomuseum-data-flights-latest.tar.bz2",
    "count": 154860,
    "size": 383617201,
    "size_compressed": 27602413,
    "sha256_compressed": "d6b74885c70107a35cc4c4a8b8707036b55aae74e875cf263d720a4ea67926f4",
    "last_updated": "2019-05-17T11:19:14-07:00",
    "last_modified": "2019-06-03T15:31:00-07:00",
    "repo": "sfomuseum-data-flights-2019-04:sfomuseum-data-flights-2019-05",
    "commit": "bd913977adef7a56a5d236046ff878b261d7f289:2a12dc4085353ea65423f09ec1369e4c6d6d6426"
  }
]

Git

This package depends on platform-specific git and git-lfs binaries being present on the system where it is running. There is a branch of the code that uses the go-git package for cloning repositories but some Who's On First repos still require lfs support (hello whosonfirst-data...). It seems like that should be possible in go-git but if it is I haven't figured it out.

Ultimately I'd like to build everything on top of go-git because then we would have a proper pure-Go distribution tool which means we could build platform-native binaries with no extra dependencies. Today, everything depends on Git.

See also

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type CompressedDistribution

type CompressedDistribution interface {
	Path() string
	Hash() string
}

type Distribution

type Distribution interface {
	Type() DistributionType
	Path() string
	Count() int64
	Size() int64
	LastUpdate() time.Time

	Compress() (CompressedDistribution, error)
}

type DistributionType

type DistributionType interface {
	Class() string
	Major() string
	Minor() string
}
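
To show how the three interfaces fit together, here is a small in-memory implementation. Everything prefixed with mem is hypothetical, the type values are placeholders, and gzip is used only because the standard library's bzip2 package cannot compress. The inventory example above suggests that the real tool produces .bz2 files.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"time"
)

// The three interfaces are copied from the package documentation so that
// this sketch compiles on its own.
type DistributionType interface {
	Class() string
	Major() string
	Minor() string
}

type CompressedDistribution interface {
	Path() string
	Hash() string
}

type Distribution interface {
	Type() DistributionType
	Path() string
	Count() int64
	Size() int64
	LastUpdate() time.Time
	Compress() (CompressedDistribution, error)
}

// memType, memCompressed and memDist are hypothetical in-memory types.
type memType struct{ class, major, minor string }

func (t memType) Class() string { return t.class }
func (t memType) Major() string { return t.major }
func (t memType) Minor() string { return t.minor }

type memCompressed struct{ path, hash string }

func (c memCompressed) Path() string { return c.path }
func (c memCompressed) Hash() string { return c.hash }

type memDist struct {
	kind    memType
	path    string
	data    []byte
	count   int64
	updated time.Time
}

func (d *memDist) Type() DistributionType { return d.kind }
func (d *memDist) Path() string           { return d.path }
func (d *memDist) Count() int64           { return d.count }
func (d *memDist) Size() int64            { return int64(len(d.data)) }
func (d *memDist) LastUpdate() time.Time  { return d.updated }

// Compress gzips the data in memory and records the hex-encoded SHA-256
// digest of the compressed bytes. (gzip is a stand-in, see the note above.)
func (d *memDist) Compress() (CompressedDistribution, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(d.data); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	sum := sha256.Sum256(buf.Bytes())
	return memCompressed{path: d.path + ".gz", hash: hex.EncodeToString(sum[:])}, nil
}

// Compile-time check that *memDist satisfies Distribution.
var _ Distribution = (*memDist)(nil)

func main() {
	d := &memDist{
		kind:    memType{class: "example", major: "demo", minor: "sketch"},
		path:    "example.csv",
		data:    []byte("id,name\n1,Example\n"),
		count:   1,
		updated: time.Now(),
	}
	c, err := d.Compress()
	if err != nil {
		panic(err)
	}
	fmt.Println(d.Path(), d.Size(), c.Path(), len(c.Hash()))
}
```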

type Inventory

type Inventory []*Item

type Item

type Item struct {
	Name             string `json:"name"`
	Type             string `json:"type"`
	NameCompressed   string `json:"name_compressed"`
	Count            int64  `json:"count"`
	Size             int64  `json:"size"`
	SizeCompressed   int64  `json:"size_compressed"`
	Sha256Compressed string `json:"sha256_compressed"`
	LastUpdate       string `json:"last_updated"`
	LastModified     string `json:"last_modified"`
	Repo             string `json:"repo,omitempty"`
	Commit           string `json:"commit,omitempty"`
}

func NewItemFromDistribution

func NewItemFromDistribution(d Distribution, c CompressedDistribution, m *MetaData) (*Item, error)

func (*Item) String

func (i *Item) String() string

type MetaData

type MetaData struct {
	Repo         string
	CommitHashes map[string]string
}

Directories

Path Synopsis
cmd
