Forklift

A simple utility that takes stdin and redirects it to templated paths on S3.

Installing

go install github.com/dacort/forklift/cmd/forklift@latest

Then pipe the sample file to a bucket!

curl -o - \
    "https://raw.githubusercontent.com/dacort/forklift/main/sample_data.json" \
    | forklift -w 's3://forklift-demo/{{json "event_type"}}/{{today}}.json'

Overview

Usage is pretty simple - pipe some content to forklift and it will upload it to the desired S3 bucket and path.

echo "Hello Damon" | forklift -w s3://bucket/some/file.txt

While that in itself isn't too exciting (you could just use aws s3 cp - !), where it gets interesting is when you want to pipe JSON data and have it uploaded to a dynamic location based on the content of the data itself. For example, imagine a JSON file with the following content:

sample_data.json:
{"event_type": "click", "data": {"uid": 1234, "path": "/signup"}}
{"event_type": "login", "data": {"uid": 1234, "referer": "yak.shave"}}

And imagine we want to pipe this to S3, but split it by event_type. Well, forklift can do that for us!

cat sample_data.json | forklift -w 's3://bucket/{{json "event_type"}}/{{today}}.json'

That will upload two different files:

  1. s3://bucket/click/2021-02-18.json
  2. s3://bucket/login/2021-02-18.json
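
How could that splitting work? Roughly speaking, each input line gets its own destination path rendered from the template. The sketch below uses Go's text/template with two stand-in helpers named after the functions above (json and today). It's illustrative only; note that it passes the current line explicitly as ., which the real forklift template doesn't require.

package main

import (
    "bufio"
    "encoding/json"
    "fmt"
    "os"
    "text/template"
    "time"
)

func main() {
    // Stand-in helpers mirroring the {{json ...}} and {{today}} functions above.
    funcs := template.FuncMap{
        "json": func(field, line string) (string, error) {
            var record map[string]interface{}
            if err := json.Unmarshal([]byte(line), &record); err != nil {
                return "", err
            }
            return fmt.Sprintf("%v", record[field]), nil
        },
        "today": func() string { return time.Now().Format("2006-01-02") },
    }

    // Unlike the real flag, this sketch passes the current line in as "." explicitly.
    tmpl := template.Must(template.New("path").Funcs(funcs).
        Parse(`s3://bucket/{{json "event_type" .}}/{{today}}.json`))

    // Render one destination path per input line; grouping lines by rendered
    // path and uploading each group as its own object gives the split above.
    scanner := bufio.NewScanner(os.Stdin)
    for scanner.Scan() {
        if err := tmpl.Execute(os.Stdout, scanner.Text()); err != nil {
            panic(err)
        }
        fmt.Println()
    }
}
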
Default behavior

Note that the default behavior of forklift is to simply echo whatever is passed to it to stdout. This is partially because I build forklift into another project, as noted in the section below.
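
In other words, without -w it behaves like a plain pass-through, conceptually nothing more than:

package main

import (
    "io"
    "os"
)

func main() {
    // Default (no -w): copy stdin straight to stdout.
    if _, err := io.Copy(os.Stdout, os.Stdin); err != nil {
        panic(err)
    }
}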

Advanced Usage

Again, while not terribly interesting as a standalone CLI, where this becomes particularly useful is with cargo-crates. This is a sample project that makes it easy to capital-E Extract data from third-party services without having to be a data engineering wizard.

For example, I've got an Oura ring and want to extract my sleep data. With the Oura Crate, I can simply do:

docker run -e OURA_PAT ghcr.io/dacort/crates-oura sleep

And that'll return a JSON blob with my sleep data for the past 7 days. But let's say I want to drop that sleep data into a location on S3 based on when I went to bed:

docker run -e OURA_PAT ghcr.io/dacort/crates-oura sleep | forklift -w 's3://bucket/{{json "bedtime_start" | ymdFromTimestamp }}/sleep_data.json'
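
The ymdFromTimestamp helper presumably just turns a timestamp field into a yyyy-mm-dd path segment. A hypothetical version (the real one may differ) could be as small as:

package main

import (
    "fmt"
    "time"
)

// ymdFromTimestamp converts an RFC 3339 timestamp, such as Oura's
// bedtime_start, into a yyyy-mm-dd path segment. Hypothetical sketch.
func ymdFromTimestamp(ts string) (string, error) {
    t, err := time.Parse(time.RFC3339, ts)
    if err != nil {
        return "", err
    }
    return t.Format("2006-01-02"), nil
}

func main() {
    segment, err := ymdFromTimestamp("2024-04-10T23:41:00-07:00")
    if err != nil {
        panic(err)
    }
    fmt.Println(segment) // 2024-04-10
}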

Cool. Now imagine I want to drop a single Docker container into an ETL workflow that does both of these for me. Well, forklift is integrated into Cargo Crates.

docker run \
    -e OURA_PAT \
    -e FORKLIFT_URI='s3://bucket/{{json "bedtime_start" | ymdFromTimestamp }}/sleep_data.json' \
    ghcr.io/dacort/crates-oura sleep

That will automatically take any stdout of the Docker container and pipe it to that location!
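
One way a wrapper could wire that up, purely as a sketch (the made-up extract-sleep command stands in for the crate's own extractor, and the actual Cargo Crates integration may work differently): check FORKLIFT_URI and, if set, connect the extractor's stdout to forklift's stdin.

package main

import (
    "os"
    "os/exec"
)

func main() {
    // "extract-sleep" is a stand-in for whatever produces the crate's JSON.
    crate := exec.Command("extract-sleep")
    crate.Stderr = os.Stderr

    uri := os.Getenv("FORKLIFT_URI")
    if uri == "" {
        // No destination configured: just print to stdout as usual.
        crate.Stdout = os.Stdout
        if err := crate.Run(); err != nil {
            panic(err)
        }
        return
    }

    // Destination configured: pipe the extractor's stdout into forklift.
    forklift := exec.Command("forklift", "-w", uri)
    forklift.Stdout = os.Stdout
    forklift.Stderr = os.Stderr

    pipe, err := crate.StdoutPipe()
    if err != nil {
        panic(err)
    }
    forklift.Stdin = pipe

    if err := crate.Start(); err != nil {
        panic(err)
    }
    if err := forklift.Start(); err != nil {
        panic(err)
    }
    if err := crate.Wait(); err != nil {
        panic(err)
    }
    if err := forklift.Wait(); err != nil {
        panic(err)
    }
}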

Why?

This seems like a lot of work to just ... upload a file. Well, a few reasons.

  1. I started playing around with the idea of Docker containers that could very simply extract data from an API, leaving the consumer with nothing to worry about except having Docker and the proper authentication tokens.
  2. Then I wanted to upload the data to S3. But I wanted the Docker containers to remain as lightweight as possible.
  3. It's just a fun experiment. 🤷
