spade

command module
v0.0.0-...-7e90556 Latest
Warning

This package is not in the latest version of its module.

Published: Jan 29, 2018 License: MIT Imports: 45 Imported by: 0

README

NOTE: This project is no longer being updated publicly.

Spade Processor

Overview

The Spade Processor provides a service to transform and collate stat events into a format that is consistent with the current storage schema for a particular event (if it exists). If you are interested in how we create and maintain schemas, take a look at twitchscience/blueprint.

The above has a lot of big words that are kind of confusing, so let's start off with an example.

What the Processor Does

The processor can be described by the following diagram:

Downloads logs from S3
   +–+
   |
+––+––+     +–––––––––––––––––––––+––––+–––+
|     |     |                     |    |   ++
|     |     |                     |    +––––+       XXXXXXXXX
|     |     |                     |    |   ++  XXXXXX       XX
|     |     | Parsing Pool        |    +––––+ XX            XX
|     |     |                     |    |   ++  X            XX
|     +–––––+                     |    +––––––––+X   S3      X
|     |     |                     |    |   ++    XX           X
|     |  +  |                     |    +––––+  XX          XXXX
+–––––+  |  +–––––––––––––––––––––+––––+   ++  XXXXXXXXXXXXX
         |
         |                        +––––––––+
         | chan []byte            Writer Controller and Writers
         +

By this model, if we attached a thread that simply output

222.22.222.222 [1395707641.000] data=eyJwcm9wZXJ0a...GNoZWQifQ== 09fff3e1-49eff880-535707f3-20f9e114d784a3fa

to the parsing pool, the parsing pool would receive this line and extract the data contained in the (truncated) base64-encoded JSON blob. This JSON blob would then be transformed into a tab-delimited representation of the schema for the event as described in the daemon's table_config.json file (an example of how that config looks is contained in the config directory).

This in turn will trigger an ingest of the data into Redshift.

Testing

If you are on a Mac, you need to brew install pkg-config and gzrt to run the tests. If you are on a Mac with Xcode 8.3 and Go < 1.8.1, you also need to pass -ldflags -s to your run.

Replay mode

It is also possible to replay data from an S3 bucket of deglobbed inputs which failed to process properly for some reason (errors in the Blueprint schema or bugs in Spade, for example). Once this is done, the replay script will automatically reingest the reprocessed data into Redshift, replacing the erroneous records if any. Replay can also publish to Kinesis streams, but consumers must be aware that replays are possible.

To run in replay mode:

  1. Deploy a Spark cluster where each worker node has Spade installed. Spade should be isolated by only allowing one task at a time on each worker node.
  2. Inside the Spade directory, run
./replay.sh MASTER_IP REDSHIFT_TARGET START END [TABLE ... | --all-tables]

     where
       * MASTER_IP is the central node of the Spark cluster;
       * REDSHIFT_TARGET is the destination Redshift cluster;
       * START and END specify the time window to be replaced, in Pacific time in the format %Y-%m-%d %H:%M:%S;
       * TABLE ... specifies the tables whose records should be replaced.
  3. If the results of the previous step are unsatisfactory, make necessary changes and repeat.

Replay is a two-step process: transforming edge logs on Spark, and loading the transformed data into Redshift. For flexibility, there are two flags that let you skip either step. --skip-ace-upload makes replay do the edge log transformation and place the transformed files in S3, but stop short of loading them into Redshift. --skip-transform makes replay skip the edge log transformation and directly load a runtag's transformed data into Redshift; if this flag is specified, --runtag must also be specified. For example, --skip-ace-upload is used by Tahoe replay because the transformed files are Tahoe-bound instead of Ace-bound.

For extra reliability, instead of running replay.sh locally, start a tmux session on the master node and submit from there.

We expect this tool to be used only every few months; if your needs are similar and the Spark cluster you bring up is only for this tool, consider tearing it down.

Utilities

libexec/spade_parse --config <file>

runs a mini spade server that will read spade requests at localhost:8888 and output how it parsed them.

License

see License

Documentation

Overview

Package spade provides a parallel processing layer for event data in the Spade pipeline. It consumes data written by the Spade Edge, processes it according to rules in Blueprint, and writes it to Kinesis streams and TSV files intended for Redshift (via rs_ingester). It expects events to be a base64 encoded JSON object or list of objects with an event field and a properties field. It rejects any data it doesn’t recognize or cannot decode, including unmapped properties of correctly formatted data. Decodable but unmapped event types are flushed to S3 periodically so that Blueprint can suggest schemas for them.

Directories

Path Synopsis
_vendor
github.com/abh/geoip
Go (cgo) interface to libgeoip
github.com/aws/aws-sdk-go/aws
Package aws provides core functionality for making requests to AWS services.
github.com/aws/aws-sdk-go/aws/awserr
Package awserr represents API error interface accessors for the SDK.
github.com/aws/aws-sdk-go/aws/credentials
Package credentials provides credential retrieval and management The Credentials is the primary method of getting access to and managing credentials Values.
github.com/aws/aws-sdk-go/aws/credentials/endpointcreds
Package endpointcreds provides support for retrieving credentials from an arbitrary HTTP endpoint.
github.com/aws/aws-sdk-go/aws/credentials/stscreds
Package stscreds are credential Providers to retrieve STS AWS credentials.
github.com/aws/aws-sdk-go/aws/defaults
Package defaults is a collection of helpers to retrieve the SDK's default configuration and handlers.
github.com/aws/aws-sdk-go/aws/ec2metadata
Package ec2metadata provides the client for making API calls to the EC2 Metadata service.
github.com/aws/aws-sdk-go/aws/endpoints
Package endpoints provides the types and functionality for defining regions and endpoints, as well as querying those definitions.
github.com/aws/aws-sdk-go/aws/session
Package session provides configuration for the SDK's service clients.
github.com/aws/aws-sdk-go/aws/signer/v4
Package v4 implements signing for AWS V4 signer Provides request signing for request that need to be signed with AWS V4 Signatures.
github.com/aws/aws-sdk-go/private/protocol/json/jsonutil
Package jsonutil provides JSON serialization of AWS requests and responses.
github.com/aws/aws-sdk-go/private/protocol/jsonrpc
Package jsonrpc provides JSON RPC utilities for serialization of AWS requests and responses.
github.com/aws/aws-sdk-go/private/protocol/query
Package query provides serialization of AWS query requests, and responses.
github.com/aws/aws-sdk-go/private/protocol/rest
Package rest provides RESTful serialization of AWS requests and responses.
github.com/aws/aws-sdk-go/private/protocol/restxml
Package restxml provides RESTful XML serialization of AWS requests and responses.
github.com/aws/aws-sdk-go/private/protocol/xml/xmlutil
Package xmlutil provides XML serialization of AWS requests and responses.
github.com/aws/aws-sdk-go/service/dynamodb
Package dynamodb provides a client for Amazon DynamoDB.
github.com/aws/aws-sdk-go/service/dynamodb/dynamodbattribute
Package dynamodbattribute provides marshaling utilities for marshaling to dynamodb.AttributeValue types and unmarshaling to Go value types.
github.com/aws/aws-sdk-go/service/dynamodb/dynamodbiface
Package dynamodbiface provides an interface to enable mocking the Amazon DynamoDB service client for testing your code.
github.com/aws/aws-sdk-go/service/elasticache
Package elasticache provides a client for Amazon ElastiCache.
github.com/aws/aws-sdk-go/service/elasticache/elasticacheiface
Package elasticacheiface provides an interface to enable mocking the Amazon ElastiCache service client for testing your code.
github.com/aws/aws-sdk-go/service/firehose
Package firehose provides a client for Amazon Kinesis Firehose.
github.com/aws/aws-sdk-go/service/firehose/firehoseiface
Package firehoseiface provides an interface to enable mocking the Amazon Kinesis Firehose service client for testing your code.
github.com/aws/aws-sdk-go/service/kinesis
Package kinesis provides a client for Amazon Kinesis.
github.com/aws/aws-sdk-go/service/kinesis/kinesisiface
Package kinesisiface provides an interface to enable mocking the Amazon Kinesis service client for testing your code.
github.com/aws/aws-sdk-go/service/s3
Package s3 provides a client for Amazon Simple Storage Service.
github.com/aws/aws-sdk-go/service/s3/s3iface
Package s3iface provides an interface to enable mocking the Amazon Simple Storage Service service client for testing your code.
github.com/aws/aws-sdk-go/service/s3/s3manager
Package s3manager provides utilities to upload and download objects from S3 concurrently.
github.com/aws/aws-sdk-go/service/s3/s3manager/s3manageriface
Package s3manageriface provides an interface for the s3manager package
github.com/aws/aws-sdk-go/service/sns
Package sns provides a client for Amazon Simple Notification Service.
github.com/aws/aws-sdk-go/service/sns/snsiface
Package snsiface provides an interface to enable mocking the Amazon Simple Notification Service service client for testing your code.
github.com/aws/aws-sdk-go/service/sqs
Package sqs provides a client for Amazon Simple Queue Service.
github.com/aws/aws-sdk-go/service/sqs/sqsiface
Package sqsiface provides an interface to enable mocking the Amazon Simple Queue Service service client for testing your code.
github.com/aws/aws-sdk-go/service/sts
Package sts provides a client for AWS Security Token Service.
github.com/bradfitz/gomemcache/memcache
Package memcache provides a client for the memcached cache server.
github.com/cactus/go-statsd-client/statsd
Package statsd provides a StatsD client implementation that is safe for concurrent use by multiple goroutines and for efficiency can be created and reused.
github.com/davecgh/go-spew/spew
Package spew implements a deep pretty printer for Go data structures to aid in debugging.
gosplitargs
github.com/go-ini/ini
Package ini provides INI file read and write functionality in Go.
github.com/myesui/uuid
Package uuid provides RFC4122 and DCE 1.1 UUIDs.
github.com/pmezard/go-difflib/difflib
Package difflib is a partial port of Python difflib module.
github.com/sirupsen/logrus
Package logrus is a structured logger for Go, completely API compatible with the standard library logger.
github.com/stretchr/testify/assert
Package assert provides a set of comprehensive testing tools for use with the normal Go testing system.
github.com/stretchr/testify/require
Package require implements the same assertions as the `assert` package but stops test execution when a test fails.
github.com/twinj/uuid
This package provides RFC4122 and DCE 1.1 UUIDs.
github.com/twitchscience/aws_utils/logger
Package logger is a wrapper around logrus that logs in a structured JSON format and provides additional context keys.
github.com/vrischmann/jsonutil
Package jsonutil provides a collection of types implementing the json.Unmarshaler and json.Marshaler interface.
golang.org/x/net/context
Package context defines the Context type, which carries deadlines, cancelation signals, and other request-scoped values across API boundaries and between processes.
golang.org/x/sys/unix
Package unix contains an interface to the low-level operating system primitives.
golang.org/x/time/rate
Package rate provides a rate limiter.
lru
Package lru implements an LRU cache.
config_fetcher
Package deglobber decompresses and unglobs events and checks for duplicates.
