corral: Index | Files | Directories

package corral

import ""

Package corral is a MapReduce framework designed to be deployed to serverless platforms, like AWS Lambda.

It presents a lightweight alternative to Hadoop MapReduce. Much of the design philosophy was inspired by Yelp's mrjob -- corral retains mrjob's ease-of-use while gaining the type safety and speed of Go.

Corral's runtime model consists of stateless, transient executors controlled by a central driver. Currently, the best environment for deployment is AWS Lambda, but corral is modular enough that support for other serverless platforms can be added as support for Go in cloud functions improves.

Corral is best suited for data-intensive but computationally inexpensive tasks, such as ETL jobs.


Package Files

config.go doc.go driver.go emitter.go executor.go job.go lambda.go mapreduce.go split.go task.go

type Driver Uses

type Driver struct {
    // contains filtered or unexported fields

Driver controls the execution of a MapReduce Job

func NewDriver Uses

func NewDriver(job *Job, options ...Option) *Driver

NewDriver creates a new Driver with the provided job and optional configuration

func NewMultiStageDriver Uses

func NewMultiStageDriver(jobs []*Job, options ...Option) *Driver

NewMultiStageDriver creates a new Driver with the provided jobs and optional configuration

func (*Driver) Main Uses

func (d *Driver) Main()

Main starts the Driver, running the submitted jobs.

type Emitter Uses

type Emitter interface {
    Emit(key, value string) error
    // contains filtered or unexported methods

Emitter enables mappers and reducers to yield key-value pairs.

type Job Uses

type Job struct {
    Map           Mapper
    Reduce        Reducer
    PartitionFunc PartitionFunc
    // contains filtered or unexported fields

Job is the logical container for a MapReduce job

func NewJob Uses

func NewJob(mapper Mapper, reducer Reducer) *Job

NewJob creates a new job from a Mapper and Reducer.

type Mapper Uses

type Mapper interface {
    Map(key, value string, emitter Emitter)

Mapper defines the interface for a Map task.

type Option Uses

type Option func(*config)

Option allows configuration of a Driver

func WithInputs Uses

func WithInputs(inputs ...string) Option

WithInputs specifies job inputs (i.e. input files/directories)

func WithMapBinSize Uses

func WithMapBinSize(s int64) Option

WithMapBinSize sets the MapBinSize of the Driver

func WithReduceBinSize Uses

func WithReduceBinSize(s int64) Option

WithReduceBinSize sets the ReduceBinSize of the Driver

func WithSplitSize Uses

func WithSplitSize(s int64) Option

WithSplitSize sets the SplitSize of the Driver

func WithWorkingLocation Uses

func WithWorkingLocation(location string) Option

WithWorkingLocation sets the location and filesystem backend of the Driver

type PartitionFunc Uses

type PartitionFunc func(key string, numBins uint) (binIdx uint)

PartitionFunc defines a function that can be used to segment map keys into intermediate buckets. The default partition function simply hashes the key, and takes hash % numBins to determine the bin. The value returned from PartitionFunc (binIdx) must be in the range 0 <= binIdx < numBins, i.e. [0, numBins)

type Phase Uses

type Phase int

Phase is a descriptor of the phase (i.e. Map or Reduce) of a Job

const (
    MapPhase Phase = iota

Descriptors of the Job phase

type Reducer Uses

type Reducer interface {
    Reduce(key string, values ValueIterator, emitter Emitter)

Reducer defines the interface for a Reduce task.

type ValueIterator Uses

type ValueIterator struct {
    // contains filtered or unexported fields

ValueIterator iterates over a sequence of values. This is used during the Reduce phase, wherein a reduce task iterates over all values for a particular key.

func (*ValueIterator) Iter Uses

func (v *ValueIterator) Iter() <-chan string

Iter iterates over all the values in the iterator.



Package corral imports 26 packages (graph). Updated 2019-04-14. Refresh now. Tools for package owners.