properties

package module
v1.2.4 Latest
Warning

This package is not in the latest version of its module.

Published: Dec 18, 2023 License: BSD-3-Clause Imports: 11 Imported by: 0

README

go-whosonfirst-properties

Go package for working with Who's On First properties

Documentation

Go Reference

Tools

$> make cli
go build -mod vendor -o bin/report-properties cmd/report-properties/main.go
go build -mod vendor -o bin/index-properties cmd/index-properties/main.go

index-properties

Crawl a series of Who's On First documents and ensure that all their properties have a corresponding property file in your whosonfirst-properties/properties directory.

$> ./bin/index-properties -h
Usage of ./bin/index-properties:
  -alternate value
    	One or more paths to alternate properties directories that will be crawled to check for existing properties (that will not be duplicated).
  -debug
    	Go through all the motions but don't write any new files.
  -exclude value
    	One or more valid regular expressions to use for excluding property names you don't want to index
  -iterator-uri string
    	A valid go-whosonfirst-iterate/v2 URI. (default "repo://")
  -properties string
    	The path to your whosonfirst-properties/properties directory

For example:

$> ./bin/index-properties \
	-mode sqlite \
	-properties ../whosonfirst-properties/properties \
	/usr/local/data/whosonfirst-data-constituency-us-latest.db

Or:

$> ./bin/index-properties \
	-exclude 'misc\:.*' \
	-alternate /usr/local/whosonfirst/whosonfirst-properties/properties \
	-properties /usr/local/sfomuseum/sfomuseum-properties \
	/usr/local/data/sfomuseum-data-*

Or iterating over all the repositories matching a pattern (sfomuseum-data-flights-) in a given organization (sfomuseum-data):

$> ./bin/index-properties \
	-iterator-uri org:///tmp \
	-properties /usr/local/sfomuseum/sfomuseum-properties/properties \
	-alternate /usr/local/whosonfirst/whosonfirst-properties/properties \
	'sfomuseum-data://?prefix=sfomuseum-data-flights-&exclude=sfomuseum-data-flights-YYYY-MM'
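
To make the shape of the organization source URI concrete, here is a rough Go sketch that takes one apart and applies it to a list of repository names. It is not how go-whosonfirst-iterate-organization is implemented: the `repoFilter` type and its methods are hypothetical, the repository names are made up, and it assumes both `prefix` and `exclude` are plain string prefixes.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// repoFilter holds the pieces of an organization source URI.
// Hypothetical helper type, not part of any Who's On First package.
type repoFilter struct {
	Org     string // the URI scheme, for example "sfomuseum-data"
	Prefix  string // value of the ?prefix= parameter
	Exclude string // value of the ?exclude= parameter
}

// parseSource extracts the organization and query parameters from raw.
func parseSource(raw string) (*repoFilter, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return nil, err
	}
	if u.Scheme == "" {
		return nil, fmt.Errorf("missing organization in %q", raw)
	}
	q := u.Query()
	return &repoFilter{Org: u.Scheme, Prefix: q.Get("prefix"), Exclude: q.Get("exclude")}, nil
}

// matches reports whether a repository name should be indexed,
// assuming prefix and exclude are both simple string prefixes.
func (f *repoFilter) matches(name string) bool {
	if f.Prefix != "" && !strings.HasPrefix(name, f.Prefix) {
		return false
	}
	if f.Exclude != "" && strings.HasPrefix(name, f.Exclude) {
		return false
	}
	return true
}

func main() {
	f, err := parseSource("sfomuseum-data://?prefix=sfomuseum-data-flights-&exclude=sfomuseum-data-flights-YYYY-MM")
	if err != nil {
		fmt.Println("parse failed:", err)
		return
	}
	// Made-up repository names, for illustration only.
	for _, name := range []string{"sfomuseum-data-flights-2020-01", "sfomuseum-data-flights-YYYY-MM", "sfomuseum-data-maps"} {
		fmt.Printf("%s/%s index=%t\n", f.Org, name, f.matches(name))
	}
}
```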

report-properties

Generate a CSV report for a list of whosonfirst-properties properties.

$> ./bin/report-properties -h
Usage of ./bin/report-properties:
  -properties string
    	The path to your whosonfirst-properties/properties directory
  -report string
    	The path to write your whosonfirst-properties report. Default is STDOUT.

For example:

$> ./bin/report-properties -properties ../whosonfirst-properties/properties
id,prefix,name,description
1158804491,edtf,cessation,"Indicates when a place stopped being a going concern. The semantics for something ceasing may vary from placetype to placetype. For example, a venue may cease operations or a country may split in to multiple countries."
1158844675,abrv,{lang}_x_colloquial,"The colloquial, informal abbreviation for a place."
1158808009,addr,city,
1158804493,geom,area,"The geometric area of a feature, in WGS84 (unprojected lat/lng)."
1158844669,abrv,{lang}_x_historical,The historical abbreviation for a place.
1158804489,edtf,deprecated,Indicates the date when a place was determined to be invalid (was never a going concern).
1158808003,addr,conscriptionnumber,
1158804497,geom,area_square_m,"The geometric area of a feature in square meters, in the EPSG:3410 projection."
... and so on

Docker

Basics

There is a Dockerfile for building a container designed to clone a specific properties (definitions) repo, record properties for all the files from multiple repositories in a given organization and commit those changes.

For example:

$> docker build -t whosonfirst-properties-indexing .

And then:

$> docker run whosonfirst-properties-indexing /bin/index.sh \
	-t 'constant://?val={GITHUB_TOKEN}' \
	-s 'whosonfirst-data://?prefix=whosonfirst-data-admin-'

Note: The command above will index all 270+ whosonfirst-data-admin-* repositories which won't be quick. The idea behind the Docker stuff is to periodically run across all the Who's On First repositories in a hosted container like Amazon's ECS service, or equivalent.

The index.sh script bundled with the container is copied from the docker-bin/index.sh script. It accepts the following arguments:

$> ./docker-bin/index.sh -h
usage: ./index.sh -options
options:
-h Print this message.
-a Zero or more Git URLs for alternate properties repositories to clone.
-c An optional branch to checkout when performing updates. If not empty then this value will be used to set the -u (update branch) flag. (Default is ).
-e Zero or more regular expressions to specify properties that should not be indexed.
-o The GitHub organization for the properties repo. (Default is whosonfirst.)
-r The name of the properties repo. (Default is whosonfirst-properties.)
-s A whosonfirst/go-whosonfirst-iterate-organization URI source to defines repositories to index. (Default is whosonfirst-data:\/\/?prefix=whosonfirst-data-&exclude=whosonfirst-data-venue-.)
-t A gocloud.dev/runtimevar URI referencing the GitHub API access token to use for updating {PROPERTIES_REPO}. (Default is constant://?val=s33kret.)
-u The branch name where updates should be pushed. (Default is main).

Fancy

Here's a more sophisticated example. In this instance the "principal" properties repository is sfomuseum/sfomuseum-properties but the whosonfirst/whosonfirst-properties repository is used as an "alternate" (source of property definitions). In this way the sfomuseum-properties repository should only contain property definitions unique to sfomuseum-specific projects.

Additionally, properties starting with misc are excluded (-e misc) from consideration and the final updates are pushed to a testing2 branch (-c testing2).

In this example only a single repository is indexed from the sfomuseum-data organization (-s 'sfomuseum-data://?prefix=sfomuseum-data-maps').

$> docker run whosonfirst-properties-indexing /bin/index.sh \
	-a https://github.com/whosonfirst/whosonfirst-properties.git \
	-e misc \
	-o sfomuseum \
	-s 'sfomuseum-data://?prefix=sfomuseum-data-maps' \
	-t 'constant://?val={GITHUB_TOKEN}' \
	-r sfomuseum-properties \
	-c testing2
	
Cloning into '/usr/local/data/sfomuseum-properties'...
Cloning into '/usr/local/data/whosonfirst-properties.git'...
./bin/index-properties -iterator-uri org:///tmp -properties /usr/local/data/sfomuseum-properties/properties -alternate /usr/local/data/whosonfirst-properties.git/properties -exclude misc sfomuseum-data://?prefix=sfomuseum-data-maps
2022/07/01 22:31:50 time to index paths (1) 1.570320779s
2022/07/01 22:31:50 time to index paths (1) 3.087979488s
Switched to a new branch 'testing2'
On branch testing2
nothing to commit, working tree clean
remote: 
remote: Create a pull request for 'testing2' on GitHub by visiting:        
remote:      https://github.com/sfomuseum/sfomuseum-properties/pull/new/testing2        
remote: 
To https://github.com/sfomuseum/sfomuseum-properties
 * [new branch]      testing2 -> testing2

Notes

  • GitHub API access tokens (specified in the -t flag) are derived using the sfomuseum/runtimevar tool. Please consult the documentation for the list of supported URI schemes.

AWS

As usual, setting things up in AWS is a bit of a confusing mess. The following are basic instructions for running the Docker tools described above as a scheduled task using the AWS Elastic Container Service.

Elastic Container Registry

Create a new entry for the whosonfirst-properties-indexing container, per the AWS documentation. For example:

docker build -t whosonfirst-properties-indexing .
docker tag whosonfirst-properties-indexing:latest {ACCOUNT}.dkr.ecr.{REGION}.amazonaws.com/whosonfirst-properties-indexing:0.0.1
docker push {ACCOUNT}.dkr.ecr.{REGION}.amazonaws.com/whosonfirst-properties-indexing:0.0.1
docker tag whosonfirst-properties-indexing:latest {ACCOUNT}.dkr.ecr.{REGION}.amazonaws.com/whosonfirst-properties-indexing:latest
docker push {ACCOUNT}.dkr.ecr.{REGION}.amazonaws.com/whosonfirst-properties-indexing:latest

Parameter Store

Create a new encrypted key (entry) in the AWS Parameter Store that contains a valid GitHub access token that can be used to update a properties repository.

For the purposes of this documentation we'll call this key github-properties-token.

IAM
Policies

Create a new policy to allow reading the github-properties-token AWS Parameter Store entry.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "ssm:DescribeParameters"
            ],
            "Resource": "*",
            "Effect": "Allow"
        },
        {
            "Sid": "",
            "Effect": "Allow",
            "Action": "ssm:GetParameter",
            "Resource": "arn:aws:ssm:{REGION}:{ACCOUNT}:parameter/github-properties-token"
        },
        {
            "Effect": "Allow",
            "Action": [
                "kms:Decrypt"
            ],
            "Resource": [
                "arn:aws:kms:{REGION}:{ACCOUNT}:key/CMK"
            ]
        }
    ]
}

For the purposes of this documentation we'll call this policy GetGitHubPropertiesToken.

Roles

Create a new role to run the whosonfirst-properties-indexing container with the following policies:

  • GetGitHubPropertiesToken
  • AmazonECSTaskExecutionRolePolicy

Make sure it has a "trust relationship" with ECS:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "ecs-tasks.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}

For the purposes of this documentation we'll call this role PropertiesIndexing.

Elastic Container Service
Task Definitions

Create a new Linux-based FARGATE task definition referencing the whosonfirst-properties-indexing container which assumes the PropertiesIndexing role.

For the purposes of this documentation we'll call this task definition whosonfirst-properties-indexing.

Scheduled Tasks

In a suitable (ECS) cluster create a new scheduled task to run the whosonfirst-properties-indexing task definition at a desired interval.

Unless you've already added a container override in the task definition, create one in the scheduled task. For example:

/bin/index.sh,-s,whosonfirst-data://?prefix=whosonfirst-data-&exclude=whosonfirst-data-venue-,-t,awsparamstore://github-properties-token?region={REGION}&credentials=iam:

The command above will index all the properties in all the whosonfirst-data- repositories except those starting with whosonfirst-data-venue. Note the use of the awsparamstore token parameter (-t) to read a GitHub access token from AWS Parameter Store.

See also

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Property

type Property struct {
	// The unique ID of this property
	Id int64 `json:"id"`
	// The name of the property
	Name string `json:"name"`
	// The namespace (prefix) of this property
	Prefix string `json:"prefix"`
	// A description of the property targeted at humans (rather than machines)
	Description string `json:"description"`
	// The expected (JSON schema) type of this property
	Type PropertyType `json:"type"`
}

type Property is a struct that maps to a machine-readable data file describing a Who's On First property.

func NewPropertyFromFile

func NewPropertyFromFile(path string) (*Property, error)

NewPropertyFromFile() reads and parses the contents of 'path' and returns a new `Property` instance.

func NewPropertyFromKey

func NewPropertyFromKey(k string) (*Property, error)

NewPropertyFromKey() parses 'k' and returns a new `Property` instance.
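
The exact parsing rules are not spelled out here. Assuming a key is the fully-qualified form that String() is documented to return (prefix + ":" + name), the round trip might look like the sketch below. `splitKey` and `joinKey` are hypothetical helpers and not functions exported by this package.

```go
package main

import (
	"fmt"
	"strings"
)

// splitKey splits a fully-qualified key such as "geom:area" into its
// prefix and name. Hypothetical helper, assuming a "prefix:name" layout
// with the first colon acting as the separator.
func splitKey(k string) (string, string, error) {
	prefix, name, ok := strings.Cut(k, ":")
	if !ok || prefix == "" || name == "" {
		return "", "", fmt.Errorf("invalid property key %q", k)
	}
	return prefix, name, nil
}

// joinKey is the inverse of splitKey.
func joinKey(prefix, name string) string {
	return prefix + ":" + name
}

func main() {
	for _, k := range []string{"edtf:cessation", "abrv:{lang}_x_colloquial", "cessation"} {
		prefix, name, err := splitKey(k)
		if err != nil {
			fmt.Println("skipping:", err)
			continue
		}
		fmt.Printf("prefix=%s name=%s key=%s\n", prefix, name, joinKey(prefix, name))
	}
}
```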

func NewPropertyFromReader

func NewPropertyFromReader(fh io.Reader) (*Property, error)

NewPropertyFromReader() reads and parses the contents of 'fh' and returns a new `Property` instance.

func (*Property) EnsureId

func (p *Property) EnsureId() error

EnsureId() ensures that the property has a unique (64-bit) identifier.

func (*Property) Filename

func (p *Property) Filename() string

Filename() returns the filename for the serialized representation of this property.

func (*Property) IsName

func (p *Property) IsName() bool

IsName() returns a boolean value indicating whether or not the property is a `name:*` property.

func (*Property) RelPath

func (p *Property) RelPath() string

RelPath() returns the relative path (inclusive of filename) for the serialized representation of this property.

func (*Property) String

func (p *Property) String() string

String() returns the fully-qualified name (prefix + ":" + name) of this property.

func (*Property) Write

func (p *Property) Write(dest string) error

Write() serializes this property and writes it to 'dest' which is expected to be a directory.

type PropertyType added in v1.0.0

type PropertyType interface{}

Directories

Path Synopsis
cmd
index-properties
index is a command line tool for crawling one or more Who's On First data sources and ensuring that individual properties contained in those records have a corresponding machine-readable properties description file.
Package index provides methods for cataloging and indexing directories containing Who's On First style property definition files.
