dp-csv-filter

The application retrieves a specified CSV file from an AWS S3 bucket and filters it by dimension values. The output is then written to a new file in an AWS S3 bucket.

The /filter endpoint accepts an HTTP POST request with a FilterRequest body: {"filePath": "$PATH_TO_FILE$"}
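For example, a minimal request body would look like this (the file path here is illustrative; the worked curl example further down uses a fuller body with inputUrl, outputUrl and dimensions):

{ "filePath": "s3://dp-csv-splitter-1/Open-Data-for-filter.csv" }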

Getting started

First, grab the code:

go get github.com/ONSdigital/dp-csv-filter

You will need to have Kafka set up locally. Set the following environment variables (the examples here use the default ports):

ZOOKEEPER=localhost:2181
KAFKA=localhost:9092

Install Kafka:

brew install kafka
brew services start zookeeper
brew services start kafka

Run the Kafka console consumer:

kafka-console-consumer --zookeeper $ZOOKEEPER --topic filter-request

Run the Kafka console producer:

kafka-console-producer --broker-list $KAFKA --topic filter-request

Run the filter:

make debug

The following curl command instructs the application to fetch the specified file from the AWS bucket, filter it, and write the output back to the output file in the bucket:

curl -H "Content-Type: application/json" -X POST -d '{ "inputUrl": "s3://dp-csv-splitter-1/Open-Data-for-filter.csv", "outputUrl": "s3://dp-csv-splitter-1/Open-Data-filtered.csv", "dimensions": { "NACE": [ "08 - Other mining and quarrying", "1012 - Processing and preserving of poultry meat"], "Prodcom Elements": [ "Work done", "Waste Products"] } }' http://localhost:21100/filter

Alternatively, paste the following line into the Kafka console producer started above:

{ "inputUrl": "s3://dp-csv-splitter/Open-Data-v3.csv", "outputUrl": "s3://dp-dd-csv-filter/Open-Data-v3.csv", "dimensions": { "NACE": [ "CI_0000072", "CI_0008197"], "Prodcom Elements": [ "CI_0021513", "CI_0021514"] } }

The project includes a small data set in the sample_csv directory for testing.
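To make the dimension filtering concrete, here is a rough, self-contained Go sketch of the technique (illustrative only, not the project's actual implementation; the column name and values are taken from the examples above). It reads a CSV on stdin, keeps only the rows whose value in one dimension column is among the requested values, and writes the result to stdout. You can pipe one of the files from sample_csv through it:

package main

import (
	"encoding/csv"
	"io"
	"log"
	"os"
)

func main() {
	in := csv.NewReader(os.Stdin)
	out := csv.NewWriter(os.Stdout)

	// Read the header row and locate the dimension column.
	header, err := in.Read()
	if err != nil {
		log.Fatal(err)
	}
	out.Write(header)

	col := -1
	for i, name := range header {
		if name == "NACE" { // illustrative dimension name
			col = i
		}
	}
	if col < 0 {
		log.Fatal("dimension column not found")
	}

	// Illustrative requested values, from the producer example above.
	wanted := map[string]bool{"CI_0000072": true, "CI_0008197": true}

	// Copy through only the rows whose dimension value was requested.
	for {
		row, err := in.Read()
		if err == io.EOF {
			break
		}
		if err != nil {
			log.Fatal(err)
		}
		if wanted[row[col]] {
			out.Write(row)
		}
	}

	out.Flush()
	if err := out.Error(); err != nil {
		log.Fatal(err)
	}
}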

Configuration
Environment variable    Default                  Description
BIND_ADDR               ":21100"                 The host and port to bind to.
KAFKA_ADDR              "http://localhost:9092"  The Kafka address to request messages from.
S3_BUCKET               "dp-csv-splitter-1"      The name of the AWS S3 bucket to read the CSV files from.
AWS_REGION              "eu-west-1"              The AWS region to use.
KAFKA_CONSUMER_GROUP    "filter-request"         The name of the Kafka consumer group to read messages from.
KAFKA_CONSUMER_TOPIC    "filter-request"         The name of the Kafka topic to read messages from.
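Any of these settings can be overridden from the environment when starting the service. For example (the bucket name here is illustrative, and this assumes the make debug target inherits the shell environment, as make normally does):

S3_BUCKET=my-filter-bucket make debug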
Contributing

See CONTRIBUTING for details.

License

Copyright © 2016, Office for National Statistics (https://www.ons.gov.uk)

Released under the MIT license; see LICENSE for details.
