Documentation ¶
Overview ¶
Runs a job (or part of a job). There are three primary types of runners:
- LocalRunner - Used for simulating a job locally. The sorting and combining behavior of Hadoop is emulated as faithfully as possible, though no guarantees are made.
- TaskPhaseRunner - Used from within a running Hadoop job; runs a single phase of a single task.
- JobRunner - Submits a multi-task Job to Hadoop, organizing temporary files and forking the necessary processes.
Index ¶
- func Copy(r Reader, w Writer) (err error)
- func NewWriterCollector(writer Writer) *writerCollector
- func Run(tasks ...*Task) error
- type Collector
- type GroupedReader
- type Job
- type LineReader
- type LocalRunner
- type Phase
- type Reader
- type Runner
- type SortWriter
- type StringWriter
- type Task
- type TaskPhaseRunner
- type Writer
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func NewWriterCollector ¶
func NewWriterCollector(writer Writer) *writerCollector
Types ¶
type GroupedReader ¶
type GroupedReader struct {
// contains filtered or unexported fields
}
A reader that, for each key, will group all its values into a channel.
func (*GroupedReader) Next ¶
func (gr *GroupedReader) Next() (k, v interface{}, err error)
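The grouping idea can be sketched without the package: given pairs already sorted by key, collect each run of equal keys and deliver the values over a channel. This is an illustrative sketch only — `pair` and `nextGroup` are hypothetical names, not part of this package, and the real GroupedReader wraps an underlying Reader rather than a slice.

```go
package main

import "fmt"

// pair is a key/value record, as produced by an upstream sorted reader.
type pair struct{ key, val string }

// nextGroup returns the key of the run starting at index i, a channel
// delivering all of that key's values, and the index where the next
// run begins. Pairs must already be sorted (or at least grouped) by key.
func nextGroup(pairs []pair, i int) (string, <-chan string, int) {
	key := pairs[i].key
	j := i
	for j < len(pairs) && pairs[j].key == key {
		j++
	}
	ch := make(chan string, j-i) // buffered so we can fill it eagerly
	for _, p := range pairs[i:j] {
		ch <- p.val
	}
	close(ch)
	return key, ch, j
}

func main() {
	pairs := []pair{{"a", "1"}, {"a", "2"}, {"b", "3"}}
	for i := 0; i < len(pairs); {
		key, vals, next := nextGroup(pairs, i)
		for v := range vals {
			fmt.Printf("%s -> %s\n", key, v)
		}
		i = next
	}
}
```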
type LineReader ¶
type LineReader struct {
// contains filtered or unexported fields
}
LineReader is used by basic streaming jobs. It yields a line number and the raw line, delimited by \n. The consumer must accept arguments of type (int64, string).
func NewLineReader ¶
func NewLineReader(r io.Reader) *LineReader
func (*LineReader) Next ¶
func (lr *LineReader) Next() (k, v interface{}, err error)
type LocalRunner ¶
type LocalRunner struct {
// contains filtered or unexported fields
}
LocalRunner simulates a job locally rather than submitting it to Hadoop.
func (*LocalRunner) Run ¶
func (lr *LocalRunner) Run(j *Job) (err error)
type Reader ¶
type Reader interface {
Next() (k, v interface{}, err error)
}
func NewGroupedReader ¶
func NewPairReader ¶
Read pairs serialized with Hadoop's typedbytes. It is assumed that in non-local mode, this will always be the wire format for reading and writing.
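Given only its signature, Copy(r Reader, w Writer) plausibly pumps pairs from a Reader into a Writer until io.EOF — an assumption, sketched below against local copies of the two interfaces. sliceReader and collectWriter are toy implementations for the demo, not part of the package.

```go
package main

import (
	"fmt"
	"io"
)

// Reader and Writer mirror the package's pair-stream interfaces.
type Reader interface {
	Next() (k, v interface{}, err error)
}

type Writer interface {
	Write(k, v interface{}) error
}

// copyPairs sketches what Copy(r Reader, w Writer) plausibly does:
// read pairs until io.EOF, writing each one through.
func copyPairs(r Reader, w Writer) error {
	for {
		k, v, err := r.Next()
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
		if err := w.Write(k, v); err != nil {
			return err
		}
	}
}

// sliceReader yields pairs from a slice, then io.EOF.
type sliceReader struct {
	pairs [][2]string
	i     int
}

func (r *sliceReader) Next() (interface{}, interface{}, error) {
	if r.i >= len(r.pairs) {
		return nil, nil, io.EOF
	}
	p := r.pairs[r.i]
	r.i++
	return p[0], p[1], nil
}

// collectWriter records each pair as a formatted line.
type collectWriter struct{ lines []string }

func (w *collectWriter) Write(k, v interface{}) error {
	w.lines = append(w.lines, fmt.Sprintf("%v\t%v", k, v))
	return nil
}

func main() {
	r := &sliceReader{pairs: [][2]string{{"a", "1"}, {"b", "2"}}}
	w := &collectWriter{}
	if err := copyPairs(r, w); err != nil {
		panic(err)
	}
	for _, line := range w.lines {
		fmt.Println(line)
	}
}
```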
type SortWriter ¶
type SortWriter struct {
// contains filtered or unexported fields
}
func NewSortWriter ¶
func NewSortWriter(w io.WriteCloser, capacity int) (*SortWriter, error)
func (*SortWriter) Close ¶
func (sw *SortWriter) Close() (err error)
func (*SortWriter) Write ¶
func (sw *SortWriter) Write(k, v interface{}) (err error)
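The idea behind a sorting writer — buffer pairs, emit them ordered by key when flushed — can be sketched as below. How the real SortWriter uses its capacity argument (e.g. spilling sorted runs once the buffer fills) is not documented here, so this sketch simply sorts one in-memory buffer; sortWriter and Flush are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

// pair is a buffered key/value record.
type pair struct{ key, val string }

// sortWriter buffers writes and emits them ordered by key at flush
// time — a sketch of the technique, not the package's implementation.
type sortWriter struct {
	buf []pair
}

func (sw *sortWriter) Write(k, v string) {
	sw.buf = append(sw.buf, pair{k, v})
}

// Flush sorts the buffered pairs by key and returns them in order.
func (sw *sortWriter) Flush() []pair {
	sort.SliceStable(sw.buf, func(i, j int) bool {
		return sw.buf[i].key < sw.buf[j].key
	})
	return sw.buf
}

func main() {
	sw := &sortWriter{}
	sw.Write("b", "2")
	sw.Write("a", "1")
	for _, p := range sw.Flush() {
		fmt.Printf("%s\t%s\n", p.key, p.val)
	}
}
```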
type StringWriter ¶
type StringWriter struct {
// contains filtered or unexported fields
}
StringWriter will coax each key/value to a simple string and output it in the simple streaming format: key\tvalue\n
func NewStringWriter ¶
func NewStringWriter(w io.WriteCloser) *StringWriter
func (*StringWriter) Close ¶
func (sw *StringWriter) Close() error
func (*StringWriter) Write ¶
func (sw *StringWriter) Write(k, v interface{}) (err error)
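The documented key\tvalue\n format amounts to one formatted line per pair. A sketch of that formatting step, assuming %v-style stringification (the doc says only that values are "coaxed" to strings); writePair and stringCollector are illustrative helpers, not package APIs:

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// writePair emits one pair in the streaming format key\tvalue\n,
// coaxing each side to a string via %v.
func writePair(w io.Writer, k, v interface{}) error {
	_, err := fmt.Fprintf(w, "%v\t%v\n", k, v)
	return err
}

// stringCollector is a toy io.Writer that records what was written.
type stringCollector struct{ s string }

func (c *stringCollector) Write(p []byte) (int, error) {
	c.s += string(p)
	return len(p), nil
}

func main() {
	writePair(os.Stdout, int64(1), "hello world")
}
```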
type TaskPhaseRunner ¶
type TaskPhaseRunner struct {
// contains filtered or unexported fields
}
TaskPhaseRunner runs a single phase of a task forked from Hadoop. It is assumed that all input and output is in typedbytes format at this point.
func TaskPhaseRunnerFromArgs ¶
func TaskPhaseRunnerFromArgs(args []string) (tpr *TaskPhaseRunner, err error)
func (*TaskPhaseRunner) Run ¶
func (tpr *TaskPhaseRunner) Run(j *Job) error
type Writer ¶
func NewPairWriter ¶
func NewPairWriter(w io.WriteCloser) Writer
Write pairs to an underlying writer in Hadoop's typedbytes format. As above, it is assumed that all non-local IO will happen in this format.