v1.11.2
Published: Sep 13, 2023 License: MIT

Generic Reader/Writer for S3

system-code: upp-generic-rw-s3 and upp-concepts-rw-s3

Introduction

An API for reading and writing generic payloads to S3. It can also be set up to read those payloads from Kafka.

Build and Test

git clone https://github.com/Financial-Times/generic-rw-s3.git
cd ./generic-rw-s3
go build -mod=readonly
go test -mod=readonly -race ./...

Running locally

export|set PORT=8080
export|set BUCKET_NAME="bucketName"
export|set AWS_REGION="eu-west-1"
./generic-rw-s3

The app assumes that you have correctly set up your AWS credentials by either using the ~/.aws/credentials file:

[default]
aws_access_key_id = AKID1234567890
aws_secret_access_key = MY-SECRET-KEY

or the default AWS environment variables:

AWS_ACCESS_KEY_ID=AKID1234567890
AWS_SECRET_ACCESS_KEY=MY-SECRET-KEY

There are optional arguments as well:

export|set BUCKET_PREFIX="bucketPrefix" # adds a prefix folder to all items uploaded
export|set WORKERS=10 # Number of concurrent downloads when downloading all items. Default is 10

Run locally with read from Kafka enabled

./generic-rw-s3 --port=8080 --bucketName="bucketName" --bucketPrefix="bucketPrefix" --awsRegion="eu-west-1" --kafka-address="<kafka_address>" --consumer-group="<consumer_group>" --consumer-topic="<topic_to_read>"

Run locally with specified resource path

./generic-rw-s3 --port=8080 --resourcePath="concepts" --bucketName="bucketName" --bucketPrefix="bucketPrefix" --awsRegion="eu-west-1"

Test locally

See the Service Endpoints section.

Build and deployment

Service Endpoints

For the complete API specification, see S3 Read/Write API Endpoint.

PUT /UUID

Any payload can be written with a PUT request, using a unique UUID to identify the payload within the S3 bucket.

curl -H 'Content-Type: application/json' -X PUT -d '{"tags":["tag1","tag2"],"question":"Which band?","answers":[{"id":"a0","answer":"Answer1"},{"id":"a1","answer":"answer2"}]}' http://localhost:8080/123e4567-e89b-12d3-a456-426655440000

The Content-Type header is important because it determines the content type the file is stored with. The transaction ID is also stored in S3: it is taken from the request header if one is provided, and auto-generated otherwise.

When the content is uploaded, the key generated for the item is converted from 123e4567-e89b-12d3-a456-426655440000 to <bucket_prefix>/123e4567/e89b/12d3/a456/426655440000. This makes it easier to manage and browse content in the AWS console. It is also good practice, because files are put into different partitions, so content is written to and read from different partitions on S3.
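
As a rough illustration of the request described above, the following Go sketch builds the same PUT as the curl example. The helper name newPutRequest is hypothetical, and the X-Request-Id header name for the transaction ID is an assumption, since this README does not name the header.

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

// newPutRequest builds a PUT request that writes payload under the given UUID.
// Hypothetical helper: the "X-Request-Id" header name is an assumption; the
// README only says the transaction ID may be supplied as a request header.
func newPutRequest(baseURL, uuid, contentType, transactionID string, payload []byte) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPut, baseURL+"/"+uuid, bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	// The Content-Type decides how the object is stored.
	req.Header.Set("Content-Type", contentType)
	// Leave the header out to let the service auto-generate a transaction ID.
	if transactionID != "" {
		req.Header.Set("X-Request-Id", transactionID)
	}
	return req, nil
}

func main() {
	req, err := newPutRequest(
		"http://localhost:8080",
		"123e4567-e89b-12d3-a456-426655440000",
		"application/json",
		"", // no transaction ID supplied
		[]byte(`{"question":"Which band?"}`),
	)
	if err != nil {
		fmt.Println("could not build request:", err)
		return
	}
	// Send it with http.DefaultClient.Do(req) once the service is running locally.
	fmt.Println(req.Method, req.URL.String(), req.Header.Get("Content-Type"))
}
```

The sketch only constructs the request, so it runs without a local instance of the service.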

If bucket_prefix is present, concepts will always be written in the following format <bucket_prefix>/123e4567/e89b/12d3/a456/426655440000.

However, if bucket_prefix is not present, an additional path query parameter can be used to write/read concepts to/from a specific partition. A sample request would look like:

curl -H 'Content-Type: application/json' -X PUT -d '{"tags":["tag1","tag2"],"question":"Which band?","answers":[{"id":"a0","answer":"Answer1"},{"id":"a1","answer":"answer2"}]}' 'http://localhost:8080/123e4567-e89b-12d3-a456-426655440000?path=TestDirectory'

When the parameter is present and the content is uploaded, the key generated for the item is converted from 123e4567-e89b-12d3-a456-426655440000 to TestDirectory/123e4567/e89b/12d3/a456/426655440000.
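
The key layout described above can be sketched in Go as follows. The function s3Key is a hypothetical helper written for illustration, not the service's actual code, and the behaviour when neither a prefix nor a path is given (a bare key) is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// s3Key mirrors the key layout described in this README (hypothetical helper).
// The UUID's dashes become slashes; bucketPrefix takes precedence over the
// path parameter, which only applies when no prefix is configured.
func s3Key(bucketPrefix, path, uuid string) string {
	key := strings.ReplaceAll(uuid, "-", "/")
	switch {
	case bucketPrefix != "":
		return bucketPrefix + "/" + key
	case path != "":
		return path + "/" + key
	default:
		// Assumption: with neither prefix nor path, the key is used as-is.
		return key
	}
}

func main() {
	uuid := "123e4567-e89b-12d3-a456-426655440000"
	fmt.Println(s3Key("bucketPrefix", "", uuid))  // bucketPrefix/123e4567/e89b/12d3/a456/426655440000
	fmt.Println(s3Key("", "TestDirectory", uuid)) // TestDirectory/123e4567/e89b/12d3/a456/426655440000
}
```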

GET /UUID

This internal read returns what was written to S3 for the given UUID.

If the UUID is not found, you'll get a 404 response.

curl http://localhost:8080/bcac6326-dd23-4b6a-9dfa-c2fbeb9737d9
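
A client might treat the 404 case separately from other failures. The Go sketch below shows one way to do that; readPayload, errNotFound and newStubServer are hypothetical names, and the stub server only stands in for the real service so the example runs without S3. It assumes a successful read returns 200.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// errNotFound signals that the service answered 404 for the requested UUID.
var errNotFound = errors.New("payload not found")

// readPayload fetches the payload stored under uuid (hypothetical helper).
func readPayload(baseURL, uuid string) ([]byte, error) {
	resp, err := http.Get(baseURL + "/" + uuid)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	switch resp.StatusCode {
	case http.StatusOK: // assumption: successful reads return 200
		return io.ReadAll(resp.Body)
	case http.StatusNotFound:
		return nil, errNotFound
	default:
		return nil, fmt.Errorf("unexpected status %s", resp.Status)
	}
}

// newStubServer returns a test server that knows exactly one UUID.
// It is a stand-in for the real service, not part of it.
func newStubServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path == "/bcac6326-dd23-4b6a-9dfa-c2fbeb9737d9" {
			w.Write([]byte(`{"question":"Which band?"}`))
			return
		}
		http.NotFound(w, r)
	}))
}

func main() {
	stub := newStubServer()
	defer stub.Close()

	body, err := readPayload(stub.URL, "bcac6326-dd23-4b6a-9dfa-c2fbeb9737d9")
	fmt.Println(string(body), err)

	_, err = readPayload(stub.URL, "00000000-0000-0000-0000-000000000000")
	fmt.Println(errors.Is(err, errNotFound))
}
```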

Getting what was written in a specific directory:

curl 'http://localhost:8080/bcac6326-dd23-4b6a-9dfa-c2fbeb9737d9?path=TestDirectory'

DELETE /UUID

To delete something from a specific directory, the path parameter should be appended to the request as follows:

curl -X DELETE '.../bcac6326-dd23-4b6a-9dfa-c2fbeb9737d9?path=TestDirectory'

This will return 204.

Utility endpoints

GET /

Streams all payloads in a given bucket.

To stream the payloads from a specific directory, the path parameter should be appended to the request as follows:

curl '.../bcac6326-dd23-4b6a-9dfa-c2fbeb9737d9?path=TestDirectory'

GET /__ids

Streams all IDs in a given bucket.

curl http://localhost:8080/__ids

The return payload will look like:

{"ID":"dcfa65d6-3849-445e-ac6a-15bc5a17e954"}
{"ID":"2136f8ad-e94e-45cb-b616-336f38533214"}
{"ID":"c9f5337d-0435-477e-b0f5-bd35ff3a4b48"}
{"ID":"7f84a70b-7085-4309-aa8e-304b3759f49f"}
{"ID":"99a0537a-3635-479b-92f7-ba10b63e2f87"}
...

If there are also concepts in a directory named TestDirectory, the payload will look like:

{"ID":"dcfa65d6-3849-445e-ac6a-15bc5a17e954"}
{"ID":"2136f8ad-e94e-45cb-b616-336f38533214"}
{"ID":"c9f5337d-0435-477e-b0f5-bd35ff3a4b48"}
{"ID":"TestDirectory-7f84a70b-7085-4309-aa8e-304b3759f49f"}
{"ID":"TestDirectory-99a0537a-3635-479b-92f7-ba10b63e2f87"}
...

Admin endpoints

Healthchecks: http://localhost:8080/__health
Build Info: http://localhost:8080/__build-info
GTG: http://localhost:8080/__gtg

Other Information

Hashing

This service stores a hash of the payload in the metadata of the S3 object on each write. If the ONLY_UPDATES_ENABLED flag is set to true, the payload's hash is compared to the hash of the stored record. Only records which have been updated or are entirely new will be written. Records that have not been updated will instead return 304 Not Modified. If the ONLY_UPDATES_ENABLED flag is set to false, records will always be written regardless of the stored hash. The hash check can also be bypassed by setting the "X-Ignore-Hash" request header to true.
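
The decision described above can be summarised in a few lines of Go. This is a sketch under stated assumptions, not the service's implementation: the README does not name the hash algorithm, so SHA-256 is used purely for illustration, and payloadHash and shouldWrite are hypothetical names.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// payloadHash returns a hex digest of the payload.
// Assumption: SHA-256 is illustrative; the actual algorithm is not documented here.
func payloadHash(payload []byte) string {
	sum := sha256.Sum256(payload)
	return hex.EncodeToString(sum[:])
}

// shouldWrite reports whether a write should go ahead.
// onlyUpdates corresponds to ONLY_UPDATES_ENABLED, ignoreHash to the
// X-Ignore-Hash request header, and storedHash is "" for a new record.
// A false result corresponds to the 304 Not Modified response.
func shouldWrite(onlyUpdates, ignoreHash bool, storedHash string, payload []byte) bool {
	if !onlyUpdates || ignoreHash {
		return true
	}
	return storedHash != payloadHash(payload)
}

func main() {
	payload := []byte(`{"question":"Which band?"}`)
	stored := payloadHash(payload)

	fmt.Println(shouldWrite(true, false, stored, payload))  // unchanged record: false
	fmt.Println(shouldWrite(true, true, stored, payload))   // hash bypassed: true
	fmt.Println(shouldWrite(false, false, stored, payload)) // flag disabled: true
}
```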

S3 buckets

For this to work, make sure that your AWS credentials have the following policy on the bucket.

{
    "Version": "2012-10-17",
    "Id": "Policy12345678990",
    "Statement": [
        {
            "Sid": "Stmt12345678990",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::<ID>:user/<AWS_KEY_ID>"
            },
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::<BUCKET_NAME>",
                "arn:aws:s3:::<BUCKET_NAME>/*"
            ]
        }
    ]
}
