rs_ingester

command module
v0.0.0-...-b3895e6 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jan 30, 2018 License: MIT Imports: 26 Imported by: 0

README

NOTE: This project is no longer being updated publicly.

Table of Contents

Ingester

The ingester manages loading processed event data into a redshift database and migrating the schemas of the database. At the highest level, it does this by receiving pointers to tsv files and loads them in batches, and migrates tables if it discovers a new version.

metadatastorer

The metadatastorer (code) is a separate binary that has the simple task of reading messages off an SQS queue and then writing them as rows in a postgres metadata database. The incoming messages look like:

{
    "KeyName":"spade-compacter-prod/20160729/oauth_authorize/v0/processor-data-ami-94f837f4/ip-10-192-9-216.us-west-2.compute.internal.1469832764.log.gz",
    "TableName":"oauth_authorize",
    "TableVersion":0
}

and get stored into the tsv table, whose schema is here.

rsloadmanager

The rsloadmanager (code) is the main binary that performs two major functions:

  1. Batch tsvs, and tell redshift to load them with manifests.
  2. Migrate schemas to new versions.
Loaders

The loaders are a pool of goroutines that manages loading the tsvs into the redshift database.

Each goroutine does the following:

  • It searches the tsv table for events that have --loadAgeSeconds old tsvs, or --loadCountTrigger many rows (both configurable) and pulls the oldest to load that is the current table version.
  • It then creates a row in the manifest table and sets the manifest_uuid on the rows in tsv corresponding to that table-version.
  • It creates a manifest in s3 of all those s3 keys (from the tsv rows).
  • Then it submits a COPY query to redshift, pointing at that manifest. If the load succeeds, the files and manifest are deleted from tsv and manifest.
Migrator

The migrator (code) is a separate goroutine that discovers schemas that need to be migrated, and then migrates them.

On startup, a shared (across all the goroutines) map of table name to version number is pulled from the redshift table infra.table_version.

The migrator does the following:

  • It periodically polls the tsv table for (event_name, version) pairs, and compares them to its table version cache. If it discovers a version that is higher than the one in its cache (or it isn't in the cache), that table needs to be migrated.
  • Then it hits blueprint's /migration endpoint to discover the operations it needs to apply to reach the next version. Example of the endpoint:
GET http://<blueprint>/migration/minute-watched?to_version=1
response body:
[
    {"Action":"add","Name":"time","ActionMetadata":{"column_options":" sortkey","column_type":"f@timestamp@unix","inbound":"time"}
    {"Action":"add","Name":"browser","ActionMetadata":{"column_options":"(180)","column_type":"varchar","inbound":"browser"}
    {"Action":"add","Name":"channel","ActionMetadata":{"column_options":"(25)","column_type":"varchar","inbound":"channel"}
    ...
]
  • It then runs the CREATE TABLE or ALTER query and updates infra.table_version in a transaction, and updates its local cache. It then moves on to the next migration.

The migrator also handles calls to the /control/increment_version/:id endpoint (see below). It handles the necessary updates to infra.table_version and the in-memory version cache so that only one goroutine is ever modifying them.

Control

The control module provides an API to control aspects parts of the ingester, called from blueprint.

On error, each of these endpoints returns a 4xx or 5xx and a JSON object: {"Error": }

POST endpoints:

  • /control/force_load: Execute a force load. On success, response is empty with 204 (no content) status code. Body of request must be JSON with:
    Table: name of the table to load
    Requester: name of the person or system requesting the load
  • /control/increment_version/:id: Increment a table's version without waiting for a TSV to come in and the migration to be executed. On success, response is empty with 204 (no content) status code.

GET endpoints:

  • /control/table_exists/:id: Return if a table exists in the infra.table_versions table. Can return false positives for tables that have been dropped.

Response format:

{"Exists": bool}
Blueprint's usage

Blueprint's UI forwards to the force load endpoint in response to a button press, and uses increment version to drop tables which don't have any events being sent.

License

see LICENSE

Documentation

Overview

Package rs_ingester provides servers which load processed event data from the Spade pipeline into a Redshift database.It receives pointers to tsv files from the Spade processor, loads the files in batches, and migrates tables when they are changed in Blueprint.

The outer binary is aliased as rsloadmanager, and is responsible for running loads, retrying loads on failure, and executing migrations on the Redshift instance.

Directories

Path Synopsis
_vendor
github.com/aws/aws-sdk-go/aws
Package aws provides core functionality for making requests to AWS services.
Package aws provides core functionality for making requests to AWS services.
github.com/aws/aws-sdk-go/aws/awserr
Package awserr represents API error interface accessors for the SDK.
Package awserr represents API error interface accessors for the SDK.
github.com/aws/aws-sdk-go/aws/credentials
Package credentials provides credential retrieval and management The Credentials is the primary method of getting access to and managing credentials Values.
Package credentials provides credential retrieval and management The Credentials is the primary method of getting access to and managing credentials Values.
github.com/aws/aws-sdk-go/aws/credentials/endpointcreds
Package endpointcreds provides support for retrieving credentials from an arbitrary HTTP endpoint.
Package endpointcreds provides support for retrieving credentials from an arbitrary HTTP endpoint.
github.com/aws/aws-sdk-go/aws/credentials/stscreds
Package stscreds are credential Providers to retrieve STS AWS credentials.
Package stscreds are credential Providers to retrieve STS AWS credentials.
github.com/aws/aws-sdk-go/aws/defaults
Package defaults is a collection of helpers to retrieve the SDK's default configuration and handlers.
Package defaults is a collection of helpers to retrieve the SDK's default configuration and handlers.
github.com/aws/aws-sdk-go/aws/ec2metadata
Package ec2metadata provides the client for making API calls to the EC2 Metadata service.
Package ec2metadata provides the client for making API calls to the EC2 Metadata service.
github.com/aws/aws-sdk-go/aws/endpoints
Package endpoints provides the types and functionality for defining regions and endpoints, as well as querying those definitions.
Package endpoints provides the types and functionality for defining regions and endpoints, as well as querying those definitions.
github.com/aws/aws-sdk-go/aws/session
Package session provides configuration for the SDK's service clients.
Package session provides configuration for the SDK's service clients.
github.com/aws/aws-sdk-go/aws/signer/v4
Package v4 implements signing for AWS V4 signer Provides request signing for request that need to be signed with AWS V4 Signatures.
Package v4 implements signing for AWS V4 signer Provides request signing for request that need to be signed with AWS V4 Signatures.
github.com/aws/aws-sdk-go/private/endpoints
Package endpoints validates regional endpoints for services.
Package endpoints validates regional endpoints for services.
github.com/aws/aws-sdk-go/private/protocol/query
Package query provides serialization of AWS query requests, and responses.
Package query provides serialization of AWS query requests, and responses.
github.com/aws/aws-sdk-go/private/protocol/rest
Package rest provides RESTful serialization of AWS requests and responses.
Package rest provides RESTful serialization of AWS requests and responses.
github.com/aws/aws-sdk-go/private/protocol/restxml
Package restxml provides RESTful XML serialization of AWS requests and responses.
Package restxml provides RESTful XML serialization of AWS requests and responses.
github.com/aws/aws-sdk-go/private/protocol/xml/xmlutil
Package xmlutil provides XML serialization of AWS requests and responses.
Package xmlutil provides XML serialization of AWS requests and responses.
github.com/aws/aws-sdk-go/private/signer/v4
Package v4 implements signing for AWS V4 signer
Package v4 implements signing for AWS V4 signer
github.com/aws/aws-sdk-go/service/s3
Package s3 provides a client for Amazon Simple Storage Service.
Package s3 provides a client for Amazon Simple Storage Service.
github.com/aws/aws-sdk-go/service/s3/s3iface
Package s3iface provides an interface to enable mocking the Amazon Simple Storage Service service client for testing your code.
Package s3iface provides an interface to enable mocking the Amazon Simple Storage Service service client for testing your code.
github.com/aws/aws-sdk-go/service/s3/s3manager
Package s3manager provides utilities to upload and download objects from S3 concurrently.
Package s3manager provides utilities to upload and download objects from S3 concurrently.
github.com/aws/aws-sdk-go/service/s3/s3manager/s3manageriface
Package s3manageriface provides an interface for the s3manager package
Package s3manageriface provides an interface for the s3manager package
github.com/aws/aws-sdk-go/service/sqs
Package sqs provides a client for Amazon Simple Queue Service.
Package sqs provides a client for Amazon Simple Queue Service.
github.com/aws/aws-sdk-go/service/sqs/sqsiface
Package sqsiface provides an interface to enable mocking the Amazon Simple Queue Service service client for testing your code.
Package sqsiface provides an interface to enable mocking the Amazon Simple Queue Service service client for testing your code.
github.com/aws/aws-sdk-go/service/sts
Package sts provides a client for AWS Security Token Service.
Package sts provides a client for AWS Security Token Service.
github.com/cactus/go-statsd-client/statsd
Package statsd provides a StatsD client implementation that is safe for concurrent use by multiple goroutines and for efficiency can be created and reused.
Package statsd provides a StatsD client implementation that is safe for concurrent use by multiple goroutines and for efficiency can be created and reused.
github.com/davecgh/go-spew/spew
Package spew implements a deep pretty printer for Go data structures to aid in debugging.
Package spew implements a deep pretty printer for Go data structures to aid in debugging.
github.com/go-ini/ini
Package ini provides INI file read and write functionality in Go.
Package ini provides INI file read and write functionality in Go.
github.com/gorilla/context
Package context stores values shared during a request lifetime.
Package context stores values shared during a request lifetime.
github.com/lib/pq
Package pq is a pure Go Postgres driver for the database/sql package.
Package pq is a pure Go Postgres driver for the database/sql package.
github.com/lib/pq/oid
Package oid contains OID constants as defined by the Postgres server.
Package oid contains OID constants as defined by the Postgres server.
github.com/pborman/uuid
The uuid package generates and inspects UUIDs.
The uuid package generates and inspects UUIDs.
github.com/pmezard/go-difflib/difflib
Package difflib is a partial port of Python difflib module.
Package difflib is a partial port of Python difflib module.
github.com/sirupsen/logrus
Package logrus is a structured logger for Go, completely API compatible with the standard library logger.
Package logrus is a structured logger for Go, completely API compatible with the standard library logger.
github.com/stretchr/testify/assert
Package assert provides a set of comprehensive testing tools for use with the normal Go testing system.
Package assert provides a set of comprehensive testing tools for use with the normal Go testing system.
github.com/stretchr/testify/require
Package require implements the same assertions as the `assert` package but stops test execution when a test fails.
Package require implements the same assertions as the `assert` package but stops test execution when a test fails.
github.com/twitchscience/aws_utils/cache/lru
Package lru implements an LRU cache.
Package lru implements an LRU cache.
github.com/twitchscience/aws_utils/common
Package common provides utilities for retrying and working with S3 URLs.
Package common provides utilities for retrying and working with S3 URLs.
github.com/twitchscience/aws_utils/listener
Package listener provides an SQS listener which calls a function on each message.
Package listener provides an SQS listener which calls a function on each message.
github.com/twitchscience/aws_utils/logger
Package logger is a wrapper around logrus that logs in a structured JSON format and provides additional context keys.
Package logger is a wrapper around logrus that logs in a structured JSON format and provides additional context keys.
github.com/zenazn/goji/web
Package web provides a fast and flexible middleware stack and mux.
Package web provides a fast and flexible middleware stack and mux.
github.com/zenazn/goji/web/middleware
Package middleware provides several standard middleware implementations.
Package middleware provides several standard middleware implementations.
github.com/zenazn/goji/web/mutil
Package mutil contains various functions that are helpful when writing http middleware.
Package mutil contains various functions that are helpful when writing http middleware.
golang.org/x/sys/unix
Package unix contains an interface to the low-level operating system primitives.
Package unix contains an interface to the low-level operating system primitives.
gopkg.in/DATA-DOG/go-sqlmock.v1
Package sqlmock is a mock library implementing sql driver.
Package sqlmock is a mock library implementing sql driver.
MetadataStorer fetches pointers to processed TSV files and stores them into a Postgres instance.
MetadataStorer fetches pointers to processed TSV files and stores them into a Postgres instance.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL