mapreduce

package
v0.0.0-...-33a8a46 Latest
Warning

This package is not in the latest version of its module.

Published: May 5, 2021 License: MIT Imports: 15 Imported by: 0

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

func RunWorker

func RunWorker(MasterAddress string, me string,
	MapFunc func(string, string) []KeyValue,
	ReduceFunc func(string, []string) string,
	nRPC int, parallelism *Parallelism,
)

RunWorker sets up a connection with the master, registers its address, and waits for tasks to be scheduled.

Types

type DoTaskArgs

type DoTaskArgs struct {
	JobName    string
	File       string   // only for map, the input file
	Phase      jobPhase // are we in mapPhase or reducePhase?
	TaskNumber int      // this task's index in the current phase

	// NumOtherPhase is the total number of tasks in the other phase; mappers
	// need this to compute the number of output bins, and reducers need
	// this to know how many input files to collect.
	NumOtherPhase int
}

DoTaskArgs holds the arguments that are passed to a worker when a job is scheduled on it.
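
The NumOtherPhase comment implies that a mapper splits its output into one bin per reduce task. The following is a minimal sketch of one common way to pick a bin, assuming an FNV-1a hash of the key taken modulo the reduce-task count. The names ihash and binFor are hypothetical, and the partitioning scheme this package actually uses is not shown in this documentation.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// ihash is a hypothetical helper that hashes a key to a non-negative int.
func ihash(key string) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32() & 0x7fffffff)
}

// binFor picks the output bin for a key. nReduce is the number of reduce
// tasks, which a mapper would receive as NumOtherPhase (assumption).
func binFor(key string, nReduce int) int {
	return ihash(key) % nReduce
}

func main() {
	for _, k := range []string{"apple", "banana", "cherry"} {
		fmt.Printf("%s -> bin %d\n", k, binFor(k, 3))
	}
}
```

Because the bin depends only on the key, every occurrence of a key lands in the same bin regardless of which map task emitted it.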

type KeyValue

type KeyValue struct {
	Key   string
	Value string
}

KeyValue is a type used to hold the key/value pairs passed to the map and reduce functions.
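
As an illustration of how KeyValue fits the map and reduce signatures used throughout this package, here is a word-count sketch. It assumes the map function receives an input name and that input's contents, and that the reduce function receives a key and every value emitted for it. The names mapF and reduceF and the input name "input.txt" are hypothetical, and KeyValue is redeclared locally so the example compiles on its own.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"unicode"
)

// KeyValue mirrors the type documented above.
type KeyValue struct {
	Key   string
	Value string
}

// mapF emits one (word, "1") pair per word in contents.
// The first argument (the input name) is unused in this sketch.
func mapF(_ string, contents string) []KeyValue {
	words := strings.FieldsFunc(contents, func(r rune) bool {
		return !unicode.IsLetter(r)
	})
	kvs := make([]KeyValue, 0, len(words))
	for _, w := range words {
		kvs = append(kvs, KeyValue{Key: w, Value: "1"})
	}
	return kvs
}

// reduceF sums the counts collected for a single key.
func reduceF(_ string, values []string) string {
	total := 0
	for _, v := range values {
		n, err := strconv.Atoi(v)
		if err != nil {
			continue // skip malformed counts in this sketch
		}
		total += n
	}
	return strconv.Itoa(total)
}

func main() {
	kvs := mapF("input.txt", "the quick fox, the lazy dog")
	fmt.Println(len(kvs))                           // 6
	fmt.Println(reduceF("the", []string{"1", "1"})) // 2
}
```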

type Master

type Master struct {
	sync.Mutex
	// contains filtered or unexported fields
}

Master holds all the state that the master needs to keep track of.

func Distributed

func Distributed(jobName string, files []string, nreduce int, master string) (mr *Master)

Distributed schedules map and reduce tasks on workers that register with the master over RPC.
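
The scheduling idea can be sketched without any RPC machinery. The example below assumes registered workers are kept as names in a buffered channel, that a task takes a worker from the channel and returns it when done, and that a WaitGroup signals completion of the phase. The names schedule, runDemo, and doTask are hypothetical; this is not the package's scheduler, whose internals are unexported.

```go
package main

import (
	"fmt"
	"sync"
)

// schedule hands every task number in [0, nTasks) to an idle worker and
// returns once all tasks have finished. doTask stands in for the DoTask RPC.
func schedule(nTasks int, idle chan string, doTask func(worker string, task int)) {
	var wg sync.WaitGroup
	for task := 0; task < nTasks; task++ {
		wg.Add(1)
		go func(task int) {
			defer wg.Done()
			worker := <-idle // block until some worker is free
			doTask(worker, task)
			idle <- worker // never blocks: idle has room for every worker
		}(task)
	}
	wg.Wait()
}

// runDemo schedules nTasks across two in-process "workers" and reports how
// many distinct tasks ran.
func runDemo(nTasks int) int {
	idle := make(chan string, 2) // capacity equals the number of workers
	idle <- "worker0"
	idle <- "worker1"

	var mu sync.Mutex
	ran := make(map[int]string)
	schedule(nTasks, idle, func(worker string, task int) {
		mu.Lock()
		defer mu.Unlock()
		ran[task] = worker
	})
	return len(ran)
}

func main() {
	fmt.Println("tasks completed:", runDemo(5)) // tasks completed: 5
}
```

Sizing the channel to the number of workers keeps the hand-back from blocking after the last task finishes.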

func Sequential

func Sequential(jobName string, files []string, nreduce int,
	mapF func(string, string) []KeyValue,
	reduceF func(string, []string) string,
) (mr *Master)

Sequential runs map and reduce tasks sequentially, waiting for each task to complete before running the next.
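
To make the control flow concrete, here is an in-memory analogue of a sequential run. It is a simplification under stated assumptions: inputs are passed as a map from name to contents rather than as files, there are no reduce bins or intermediate files, and the names runSequential and countWords are hypothetical. It only shows the ordering: every map task completes before the next begins, and reduction starts after the map phase ends.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// KeyValue mirrors the type documented above.
type KeyValue struct {
	Key   string
	Value string
}

// runSequential applies mapF to each input in turn, groups the emitted
// values by key, then applies reduceF once per distinct key.
func runSequential(
	inputs map[string]string,
	mapF func(string, string) []KeyValue,
	reduceF func(string, []string) string,
) map[string]string {
	names := make([]string, 0, len(inputs))
	for name := range inputs {
		names = append(names, name)
	}
	sort.Strings(names) // fixed order keeps the map phase deterministic

	// Map phase: one task per input, one after another.
	grouped := make(map[string][]string)
	for _, name := range names {
		for _, kv := range mapF(name, inputs[name]) {
			grouped[kv.Key] = append(grouped[kv.Key], kv.Value)
		}
	}

	// Reduce phase: one call per distinct key.
	out := make(map[string]string, len(grouped))
	for key, values := range grouped {
		out[key] = reduceF(key, values)
	}
	return out
}

// countWords runs the sketch with a word-count map and reduce pair.
func countWords(inputs map[string]string) map[string]string {
	mapF := func(_ string, contents string) []KeyValue {
		var kvs []KeyValue
		for _, w := range strings.Fields(contents) {
			kvs = append(kvs, KeyValue{Key: w, Value: "1"})
		}
		return kvs
	}
	reduceF := func(_ string, values []string) string {
		return strconv.Itoa(len(values))
	}
	return runSequential(inputs, mapF, reduceF)
}

func main() {
	counts := countWords(map[string]string{
		"a.txt": "go map reduce",
		"b.txt": "go go",
	})
	fmt.Println(counts["go"], counts["map"], counts["reduce"]) // 3 1 1
}
```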

func (*Master) CleanupFiles

func (mr *Master) CleanupFiles()

CleanupFiles removes all intermediate files produced by running mapreduce.

func (*Master) Register

func (mr *Master) Register(args *RegisterArgs, _ *struct{}) error

Register is an RPC method that is called by workers after they have started up to report that they are ready to receive tasks.
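
The registration call can be exercised with the standard net/rpc package. The sketch below makes several assumptions: Registry is a hypothetical stand-in for Master that only records worker names, the service is registered under the name "Master", and an in-memory net.Pipe replaces the UNIX-domain socket so the example needs no files on disk. RegisterArgs is redeclared locally to keep the example self-contained.

```go
package main

import (
	"fmt"
	"net"
	"net/rpc"
	"sync"
)

// RegisterArgs mirrors the documented argument type.
type RegisterArgs struct {
	Worker string
}

// Registry is a hypothetical stand-in for Master.
type Registry struct {
	mu      sync.Mutex
	workers []string
}

// Register has the same shape as the documented RPC method.
func (r *Registry) Register(args *RegisterArgs, _ *struct{}) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.workers = append(r.workers, args.Worker)
	return nil
}

// registerOverPipe serves a Registry on one end of an in-memory pipe, calls
// Register from the other end, and returns the workers recorded afterwards.
func registerOverPipe(name string) ([]string, error) {
	reg := &Registry{}
	srv := rpc.NewServer()
	if err := srv.RegisterName("Master", reg); err != nil {
		return nil, err
	}

	serverConn, clientConn := net.Pipe()
	go srv.ServeConn(serverConn) // returns when the client closes its end

	client := rpc.NewClient(clientConn)
	defer client.Close()

	args := &RegisterArgs{Worker: name}
	if err := client.Call("Master.Register", args, new(struct{})); err != nil {
		return nil, err
	}

	reg.mu.Lock()
	defer reg.mu.Unlock()
	return append([]string(nil), reg.workers...), nil
}

func main() {
	workers, err := registerOverPipe("worker0")
	if err != nil {
		fmt.Println("register failed:", err)
		return
	}
	fmt.Println(workers) // [worker0]
}
```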

func (*Master) Shutdown

func (mr *Master) Shutdown(_, _ *struct{}) error

Shutdown is an RPC method that shuts down the Master's RPC server.

func (*Master) Wait

func (mr *Master) Wait()

Wait blocks until the currently scheduled work has completed. This happens when all tasks have been scheduled and completed, the final output has been computed, and all workers have been shut down.
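
One way to implement this kind of blocking call is a channel that is closed when the background run finishes. The sketch below assumes that design; the job type, its fields, and the start function are hypothetical, and the Master's real fields are unexported and not shown here.

```go
package main

import "fmt"

// job is a hypothetical stand-in for the Master's unexported state.
type job struct {
	done   chan struct{} // closed once all work has finished
	result int           // written only by the background goroutine
}

// start launches the work in the background and returns immediately.
func start(nTasks int) *job {
	j := &job{done: make(chan struct{})}
	go func() {
		defer close(j.done)
		for i := 0; i < nTasks; i++ {
			j.result++ // placeholder for running one task
		}
	}()
	return j
}

// Wait blocks until the background run has closed done.
func (j *job) Wait() {
	<-j.done
}

func main() {
	j := start(3)
	j.Wait()
	fmt.Println("tasks run:", j.result) // tasks run: 3
}
```

Closing the channel, rather than sending on it, lets any number of callers wait and makes repeated calls return immediately.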

type Parallelism

type Parallelism struct {
	// contains filtered or unexported fields
}

Parallelism tracks whether workers executed in parallel.

type RegisterArgs

type RegisterArgs struct {
	Worker string // the worker's UNIX-domain socket name, i.e. its RPC address
}

RegisterArgs is the argument passed when a worker registers with the master.

type ShutdownReply

type ShutdownReply struct {
	Ntasks int
}

ShutdownReply is the response to a Worker.Shutdown RPC. It holds the number of tasks the worker has processed since it was started.

type Worker

type Worker struct {
	sync.Mutex

	Map    func(string, string) []KeyValue
	Reduce func(string, []string) string
	// contains filtered or unexported fields
}

Worker holds the state for a server waiting for DoTask or Shutdown RPCs.

func (*Worker) DoTask

func (wk *Worker) DoTask(arg *DoTaskArgs, _ *struct{}) error

DoTask is called by the master when a new task is being scheduled on this worker.

func (*Worker) Shutdown

func (wk *Worker) Shutdown(_ *struct{}, res *ShutdownReply) error

Shutdown is called by the master when all work has been completed. The worker responds with the number of tasks it has processed.
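
The bookkeeping behind this reply can be sketched as a mutex-guarded counter. The example assumes each completed task increments the counter and that Shutdown copies it into the reply; the counter type and runAndShutdown are hypothetical, DoTask is reduced to the counting step, and ShutdownReply is redeclared locally so the example compiles on its own.

```go
package main

import (
	"fmt"
	"sync"
)

// ShutdownReply mirrors the documented reply type.
type ShutdownReply struct {
	Ntasks int
}

// counter is a hypothetical stand-in for Worker's unexported bookkeeping.
type counter struct {
	mu     sync.Mutex
	nTasks int
}

// DoTask records that one more task has been processed.
// The actual map or reduce work is omitted from this sketch.
func (c *counter) DoTask() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.nTasks++
}

// Shutdown has the same shape as the documented method and reports the count.
func (c *counter) Shutdown(_ *struct{}, res *ShutdownReply) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	res.Ntasks = c.nTasks
	return nil
}

// runAndShutdown runs n tasks concurrently, then asks for the final count.
func runAndShutdown(n int) int {
	c := &counter{}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.DoTask()
		}()
	}
	wg.Wait()

	var reply ShutdownReply
	if err := c.Shutdown(nil, &reply); err != nil {
		return -1
	}
	return reply.Ntasks
}

func main() {
	fmt.Println("tasks processed:", runAndShutdown(4)) // tasks processed: 4
}
```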
