pipeline

package module
v0.0.0-...-773bb05 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Aug 14, 2015 License: MIT Imports: 10 Imported by: 0

README

pipeline

A framework for performing MapReduce using Go.

Build Status

Documentation

Index

Constants

This section is empty.

Variables

View Source
var DefaultKeyFieldDelimiter = []byte("\t")
View Source
var NewLine = []byte("\n")

Functions

func Hash

func Hash(keys ...string) string

Hash returns a hashed string of the given keys. Used for creating compound keys when grouping by multiple fields.

Types

type DelimiterDeserializer

type DelimiterDeserializer struct {
	// contains filtered or unexported fields
}

DelimiterDeserializer is a deserializer that splits input rows based on the configured delimiter. It is suitable for use with Hadoop Streaming.

func (*DelimiterDeserializer) Error

func (g *DelimiterDeserializer) Error() error

Error returns the last error to occur or nil if there are no errors.

func (*DelimiterDeserializer) HasNext

func (g *DelimiterDeserializer) HasNext() bool

HasNext advances the underlying scanner and returns true when data is available. It will return false once no data is available or an error occurs.

func (*DelimiterDeserializer) Next

func (g *DelimiterDeserializer) Next() Emitter

Next retrieves the next available Emitter from the underlying scanner calling the defined constructor method.

type Deserializer

type Deserializer interface {
	// HasNext should return true if there is data available
	HasNext() bool
	// Next should return the next available Emitter interface
	Next() Emitter
	// Error should return the last error to occur
	Error() error
}

Deserializer is an interface for deserializing the input data into Emitter interfaces which can then be used within other pipeline methods.

type DeserializerFunc

type DeserializerFunc func(io.Reader) Deserializer

DeserializerFunc is a function that accepts an io.Reader and returns a Deserializer interface.

func NewDelimiterDeserializer

func NewDelimiterDeserializer(delimiter []byte, fn func([]byte) Emitter) DeserializerFunc

NewDelimiterDeserializer returns a DeserializerFunc that uses delimiter as the row delimiter and fn as the constructor function.

func NewGOBDeserializer

func NewGOBDeserializer(fn func() Emitter) DeserializerFunc

NewGOBDeserializer

func NewJSONDeserializer

func NewJSONDeserializer(fn func() Emitter) DeserializerFunc

NewJSONDeserializer

func NewMsgPackDeserializer

func NewMsgPackDeserializer(fn func() Emitter) DeserializerFunc

func NewTSVDeserializer

func NewTSVDeserializer(constructor func() Emitter) DeserializerFunc

type Emitter

type Emitter interface {
	// Emit is called passing in the underlying writer. It's upto the caller
	// to write data in the required format.
	Emit(w io.Writer) error
	// Where should return true or false. It it returns false then this
	// Emitter will not be used within the Map or Reduce methods.
	Where() bool
}

Emitter implements methods required within the Map stage.

type Finalizer

type Finalizer interface {
	Reducer
	Finalize(w io.Writer) string
}

Finalize implements a Finalize method. During the reduce stage, if a type implements the Finalizer interface Finalize is called once for each reduced Key.

type GOBDeserializer

type GOBDeserializer struct {
	// contains filtered or unexported fields
}

func (*GOBDeserializer) Error

func (g *GOBDeserializer) Error() error

func (*GOBDeserializer) HasNext

func (g *GOBDeserializer) HasNext() bool

func (*GOBDeserializer) Next

func (g *GOBDeserializer) Next() Emitter

type JSONDeserializer

type JSONDeserializer struct {
	// contains filtered or unexported fields
}

func (*JSONDeserializer) Error

func (j *JSONDeserializer) Error() error

func (*JSONDeserializer) HasNext

func (j *JSONDeserializer) HasNext() bool

func (*JSONDeserializer) IgnoreMalformedJSON

func (j *JSONDeserializer) IgnoreMalformedJSON() *JSONDeserializer

func (*JSONDeserializer) Next

func (j *JSONDeserializer) Next() Emitter

type Map

type Map struct {
	DeserializerFunc DeserializerFunc
	// contains filtered or unexported fields
}

Map implements the Pipeline interface. It deserializes an input using a Deserializer and emits a key value pair of resulting Emitter interface.

func NewMap

func NewMap(d DeserializerFunc) *Map

NewMap creates a new Map with the given Deserializer d

func (*Map) Close

func (m *Map) Close() error

func (*Map) In

func (m *Map) In(r io.Reader) Pipeline

In reads from input r into the Map and returns the Map.

func (*Map) Out

func (m *Map) Out(w io.Writer)

Out writes the Map output to w. It returns a channel of Emitters from the deserializer and then emits the Emitters Key and Value to the output.

func (*Map) Read

func (m *Map) Read(p []byte) (int, error)

func (*Map) Then

func (m *Map) Then(p Pipeline) Pipeline

Then starts a goroutine to write the output of the Map to the given Pipeline and returns it.

func (*Map) Write

func (m *Map) Write(p []byte) (int, error)

type MsgPackDeserializer

type MsgPackDeserializer struct {
	// contains filtered or unexported fields
}

func (*MsgPackDeserializer) Error

func (m *MsgPackDeserializer) Error() error

func (*MsgPackDeserializer) HasNext

func (m *MsgPackDeserializer) HasNext() bool

func (*MsgPackDeserializer) Next

func (m *MsgPackDeserializer) Next() Emitter

type Pipeline

type Pipeline interface {
	io.ReadWriteCloser
	// In takes an io.Reader and returns a instance of the Pipeline
	// interface. It is used for providing an input, such as a file,
	// to a Pipeline
	In(io.Reader) Pipeline
	// Then writes the output of the Pipeline to another Pipeline and returns
	// that Pipeline. It is used for chaining together stages of the pipeline.
	Then(Pipeline) Pipeline
	// Out writes the output of this Pipeline to the given io.Writer
	Out(io.Writer)
}

Pipeline implements the required methods for a pipeline stage.

type Reduce

type Reduce struct {
	DeserializerFunc DeserializerFunc
	// contains filtered or unexported fields
}

Reduce is a reduce process that reads from an io.PipeReader performs a reduce operation and writes to an io.PipeWriter

func NewReduce

func NewReduce(d DeserializerFunc) *Reduce

NewReduce creates and returns a pointer to a new reduce type using the given DeserializerFunc

func (*Reduce) Close

func (m *Reduce) Close() error

func (*Reduce) In

func (m *Reduce) In(r io.Reader) Pipeline

func (*Reduce) Out

func (m *Reduce) Out(w io.Writer)

func (*Reduce) Read

func (m *Reduce) Read(p []byte) (int, error)

func (*Reduce) Reduce

func (m *Reduce) Reduce(w io.Writer)

Reduce performs the reduce operation

func (*Reduce) Then

func (m *Reduce) Then(p Pipeline) Pipeline

func (*Reduce) Write

func (m *Reduce) Write(p []byte) (int, error)

type Reducer

type Reducer interface {
	Emitter
	// Provides the key to use for grouping the sum operations
	Key() string
	// Sum implements logic to Sum together two copies of this Emitter.
	// The underlying type of the receiver and the argument will always be
	// identical.
	Sum(emitter ...Emitter)
}

Reducer implements the Sum method. It is used within the reduce stage to perform summing of sequential matching keys and their values

type Reducers

type Reducers []Reducer

Emitters is a slice of Emitter interfaces that implements the sort.Interface

func (Reducers) Len

func (e Reducers) Len() int

func (Reducers) Less

func (e Reducers) Less(i, j int) bool

func (Reducers) Swap

func (e Reducers) Swap(i, j int)

type Sorter

type Sorter struct {
	DeserializerFunc DeserializerFunc
	// contains filtered or unexported fields
}

func NewSorter

func NewSorter(d DeserializerFunc) *Sorter

func (*Sorter) Close

func (m *Sorter) Close() error

func (*Sorter) In

func (m *Sorter) In(r io.Reader) Pipeline

func (*Sorter) Out

func (m *Sorter) Out(w io.Writer)

func (*Sorter) Read

func (m *Sorter) Read(p []byte) (int, error)

func (*Sorter) Sort

func (m *Sorter) Sort(w io.Writer)

func (*Sorter) Then

func (m *Sorter) Then(p Pipeline) Pipeline

func (*Sorter) Write

func (m *Sorter) Write(p []byte) (int, error)

type TSVDeserializer

type TSVDeserializer struct {
	// contains filtered or unexported fields
}

func (*TSVDeserializer) Error

func (m *TSVDeserializer) Error() error

func (*TSVDeserializer) HasNext

func (m *TSVDeserializer) HasNext() bool

func (*TSVDeserializer) Next

func (m *TSVDeserializer) Next() Emitter

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL