maplejuice

package
v0.0.0-...-b893f88
Warning

This package is not in the latest version of its module.

Published: Jun 28, 2020 License: GPL-3.0 Imports: 20 Imported by: 0

README

MapleJuice

MapleJuice is a batch processing system that works like MapReduce.

Programming Model

MapleJuice consists of two phases of computation: Maple (brother of Map) and Juice (sister of Reduce). Maple takes a set of data and transforms each of its lines into a list of key-value pairs. Juice combines the output from Maple into a smaller list of key-value pairs. Unlike the traditional Map function, which processes one input line at a time, each Maple invocation processes 10 input lines from a file. The Maple and Juice tasks are user-defined: users can upload to the system any Maple or Juice executable that fits this programming model. For simplicity, the system can run only one phase at a time.
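The programming model can be illustrated with an in-process word count. This is a minimal sketch, not the package's code: the KV type, the function names and the word-count logic are hypothetical, and real Maple/Juice tasks are separate executables rather than Go functions. It shows Maple being fed batches of 10 lines, its output being grouped by key, and Juice being called once per key.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// KV is a hypothetical key-value pair type used only for this sketch.
type KV struct {
	Key   string
	Value string
}

// batchSize mirrors the README: each Maple invocation sees 10 input lines.
const batchSize = 10

// mapleWordCount is an example user-defined Maple function: it receives a
// batch of up to batchSize lines and emits one (word, "1") pair per word.
func mapleWordCount(lines []string) []KV {
	var out []KV
	for _, line := range lines {
		for _, w := range strings.Fields(line) {
			out = append(out, KV{Key: w, Value: "1"})
		}
	}
	return out
}

// juiceWordCount is an example user-defined Juice function: it combines all
// values emitted for one key into a single pair (here, their count).
func juiceWordCount(key string, values []string) KV {
	return KV{Key: key, Value: fmt.Sprint(len(values))}
}

// runMaple feeds the input to maple in batches of batchSize lines and groups
// the emitted values by key.
func runMaple(lines []string, maple func([]string) []KV) map[string][]string {
	grouped := make(map[string][]string)
	for start := 0; start < len(lines); start += batchSize {
		end := start + batchSize
		if end > len(lines) {
			end = len(lines)
		}
		for _, kv := range maple(lines[start:end]) {
			grouped[kv.Key] = append(grouped[kv.Key], kv.Value)
		}
	}
	return grouped
}

// runJuice applies juice once per key, in sorted key order for determinism.
func runJuice(grouped map[string][]string, juice func(string, []string) KV) []KV {
	keys := make([]string, 0, len(grouped))
	for k := range grouped {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	out := make([]KV, 0, len(keys))
	for _, k := range keys {
		out = append(out, juice(k, grouped[k]))
	}
	return out
}

func main() {
	input := []string{"a b", "b c", "c c"}
	fmt.Println(runJuice(runMaple(input, mapleWordCount), juiceWordCount))
}
```

With 25 input lines, for example, Maple would be invoked three times (10, 10 and 5 lines).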

System Architecture

MapleJuice is composed of three major components: i) clients for interacting with users, ii) a single master node for task scheduling and coordination, and iii) worker nodes for task processing. It relies on our Distributed Group Membership Service (MP2) for failure detection and on our Simple Distributed File System (SDFS, MP3) for storing the input data and the results of MapleJuice jobs.

Workflow

Users submit jobs on clients, which then send job requests to the master node. For a Maple job, the master node partitions the input data and evenly distributes the partitions to a set of selected workers. Each worker processes its share of the data by repeatedly calling the user-uploaded Maple executable. The generated key-value pairs are sent back to the master, which gathers them and writes them to SDFS, one file per key. For a subsequent Juice job, the master node shuffles the keys and allocates them to another set of selected workers. The workers process their assigned tasks by repeatedly calling the user-uploaded Juice executable. The processed results are sent back to the master to be combined and written to SDFS.

Scheduling

The master node is in charge of all scheduling. It maintains a first-in, first-out (FIFO) job queue: at most one job runs at a time, and all subsequent jobs wait in the queue. When dispatching tasks, it prefers worker nodes that already hold replicas of the input files, which reduces file I/O. During failure recovery, it prefers free workers (if any) when restarting a task that was running on the failed worker.

Fault Tolerance

MapleJuice can tolerate up to 3 simultaneous worker failures, a limit set by the replication factor of 4 used in SDFS. A master failure, however, cannot be tolerated. The master node maintains a worker-task mapping table. When a busy worker is reported as failed, the master looks up the task(s) it was running and selects another worker (preferably a free one) to restart them.

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func ExecuteJuice

func ExecuteJuice(exe string, cmdArgs []string) bytes.Buffer

func ExecuteMaple

func ExecuteMaple(exe string, content string) bytes.Buffer
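ExecuteMaple and ExecuteJuice are documented only by their signatures. One plausible way to run a user-uploaded executable is with os/exec, shown below as a sketch: runExecutable is a hypothetical helper, and passing the Maple input on stdin and the Juice arguments on the command line are assumptions about how the real functions behave. It returns the program's stdout and folds stderr into the error on failure.

```go
package main

import (
	"bytes"
	"fmt"
	"os/exec"
	"strings"
)

// runExecutable is a hypothetical helper in the spirit of ExecuteMaple and
// ExecuteJuice: it runs exe with args, writes input to its stdin, and returns
// everything the program printed to stdout.
func runExecutable(exe string, args []string, input string) ([]byte, error) {
	var stdout, stderr bytes.Buffer
	cmd := exec.Command(exe, args...)
	cmd.Stdin = strings.NewReader(input)
	cmd.Stdout = &stdout
	cmd.Stderr = &stderr
	if err := cmd.Run(); err != nil {
		return nil, fmt.Errorf("%s failed: %w (stderr: %s)", exe, err, stderr.String())
	}
	return stdout.Bytes(), nil
}

func main() {
	// "cat" stands in for a user-uploaded Maple executable: it echoes its input.
	out, err := runExecutable("cat", nil, "line1\nline2\n")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Print(string(out))
}
```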

Types

type Job

type Job struct {
	Type                   int // 0 for Maple, 1 for Juice
	Exe                    string
	IntermediateFilePrefix string
	SrcDirectory           []string
	NumWorkers             int
	DestFile               string
	DeleteInput            bool
	Partitioner            string
}

Job is a generic representation of a Maple or Juice job

type JobResult

type JobResult struct {
	Keys    []string
	Results []string
}

JobResult is a generic representation of the result of a Maple or Juice task

type JuiceChan

type JuiceChan struct {
	// contains filtered or unexported fields
}

type MasterService

type MasterService struct {
	// contains filtered or unexported fields
}

MasterService provides services of MapleJuice's master

func (*MasterService) HandleJuiceRequest

func (master *MasterService) HandleJuiceRequest(job *Job, reply *int) error

HandleJuiceRequest handles requests of Juice tasks

func (*MasterService) HandleJuiceResult

func (master *MasterService) HandleJuiceResult(result *JobResult, reply *int) error

HandleJuiceResult gathers results from Juice tasks

func (*MasterService) HandleMapleRequest

func (master *MasterService) HandleMapleRequest(job *Job, reply *int) error

HandleMapleRequest handles requests of Maple tasks

func (*MasterService) HandleMapleResult

func (master *MasterService) HandleMapleResult(result *JobResult, reply *int) error

HandleMapleResult gathers results from Maple tasks

func (*MasterService) StartService

func (master *MasterService) StartService(fs *F.FileSystemService) error

StartService starts master service of MapleJuice
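The request and result handlers all have the shape `func (t *T) Method(args *A, reply *B) error`, which is exactly what Go's net/rpc package requires of a callable method. The sketch below assumes the package exposes MasterService through net/rpc; the trimmed Job, the stand-in MasterService and the dial helper are hypothetical. It serves the stand-in over an in-memory net.Pipe so it runs without a network; a real client would use rpc.Dial("tcp", …) with the master's address.

```go
package main

import (
	"fmt"
	"net"
	"net/rpc"
	"sync"
)

// Job is a trimmed, hypothetical copy of the package's Job type.
type Job struct {
	Type       int // 0 for Maple, 1 for Juice
	Exe        string
	NumWorkers int
}

// MasterService is a stand-in that only shows the method shape. net/rpc may
// invoke methods concurrently, so its state is guarded by a mutex.
type MasterService struct {
	mu       sync.Mutex
	received []Job
}

// HandleMapleRequest accepts a Maple job and replies with its sequence number.
func (m *MasterService) HandleMapleRequest(job *Job, reply *int) error {
	if job.Type != 0 {
		return fmt.Errorf("expected a Maple job (Type 0), got Type %d", job.Type)
	}
	m.mu.Lock()
	defer m.mu.Unlock()
	m.received = append(m.received, *job)
	*reply = len(m.received)
	return nil
}

// dial registers master under its type name ("MasterService"), serves it over
// an in-memory pipe, and returns a client connected to it.
func dial(master *MasterService) (*rpc.Client, error) {
	server := rpc.NewServer()
	if err := server.Register(master); err != nil {
		return nil, err
	}
	serverConn, clientConn := net.Pipe()
	go server.ServeConn(serverConn)
	return rpc.NewClient(clientConn), nil
}

func main() {
	client, err := dial(&MasterService{})
	if err != nil {
		panic(err)
	}
	defer client.Close()
	var reply int
	job := &Job{Type: 0, Exe: "wordcount_maple", NumWorkers: 4}
	if err := client.Call("MasterService.HandleMapleRequest", job, &reply); err != nil {
		fmt.Println("call failed:", err)
		return
	}
	fmt.Println("accepted, reply =", reply)
}
```

Errors returned by the handler reach the client as an rpc.ServerError from client.Call.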

func (*MasterService) SubmitJuice

func (master *MasterService) SubmitJuice(strList []string) error

SubmitJuice submits a Juice job

func (*MasterService) SubmitMaple

func (master *MasterService) SubmitMaple(strList []string) error

SubmitMaple submits a Maple job
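SubmitMaple and SubmitJuice take the user's command as a []string, but the expected layout is not documented. The sketch below assumes a layout of `maple <maple_exe> <num_maples> <intermediate_prefix> <src_dir>...` and `juice <juice_exe> <num_juices> <intermediate_prefix> <dest_file> <0|1> [hash|range]`; both layouts, the "hash" default and the parseMaple/parseJuice names are hypothetical. It shows how such a command could fill the exported fields of Job.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// Job mirrors the exported fields of the package's Job type.
type Job struct {
	Type                   int // 0 for Maple, 1 for Juice
	Exe                    string
	IntermediateFilePrefix string
	SrcDirectory           []string
	NumWorkers             int
	DestFile               string
	DeleteInput            bool
	Partitioner            string
}

// parseMaple builds a Maple Job from the hypothetical layout:
//   maple <maple_exe> <num_maples> <intermediate_prefix> <src_dir>...
func parseMaple(strList []string) (*Job, error) {
	if len(strList) < 5 || strList[0] != "maple" {
		return nil, errors.New("usage: maple <maple_exe> <num_maples> <intermediate_prefix> <src_dir>...")
	}
	n, err := strconv.Atoi(strList[2])
	if err != nil || n <= 0 {
		return nil, fmt.Errorf("invalid number of Maple workers %q", strList[2])
	}
	return &Job{Type: 0, Exe: strList[1], NumWorkers: n,
		IntermediateFilePrefix: strList[3], SrcDirectory: strList[4:]}, nil
}

// parseJuice builds a Juice Job from the hypothetical layout:
//   juice <juice_exe> <num_juices> <intermediate_prefix> <dest_file> <0|1> [hash|range]
func parseJuice(strList []string) (*Job, error) {
	if len(strList) < 6 || len(strList) > 7 || strList[0] != "juice" {
		return nil, errors.New("usage: juice <juice_exe> <num_juices> <intermediate_prefix> <dest_file> <0|1> [hash|range]")
	}
	n, err := strconv.Atoi(strList[2])
	if err != nil || n <= 0 {
		return nil, fmt.Errorf("invalid number of Juice workers %q", strList[2])
	}
	partitioner := "hash" // assumed default
	if len(strList) == 7 {
		partitioner = strList[6]
	}
	if partitioner != "hash" && partitioner != "range" {
		return nil, fmt.Errorf("unknown partitioner %q", partitioner)
	}
	return &Job{Type: 1, Exe: strList[1], NumWorkers: n,
		IntermediateFilePrefix: strList[3], DestFile: strList[4],
		DeleteInput: strList[5] == "1", Partitioner: partitioner}, nil
}

func main() {
	job, err := parseMaple([]string{"maple", "wc_maple", "4", "wc_inter", "input_dir"})
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("%+v\n", *job)
}
```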

type WorkerService

type WorkerService struct {
	// contains filtered or unexported fields
}

WorkerService provides services of MapleJuice's worker

func (*WorkerService) RunJuiceTask

func (worker *WorkerService) RunJuiceTask(job *Job, reply *int) error

func (*WorkerService) RunMapleTask

func (worker *WorkerService) RunMapleTask(job *Job, reply *int) error

func (*WorkerService) StartService

func (worker *WorkerService) StartService(fs *F.FileSystemService) error

StartService starts worker service of MapleJuice

func (*WorkerService) SubmitJuice

func (worker *WorkerService) SubmitJuice(strList []string) error

func (*WorkerService) SubmitMaple

func (worker *WorkerService) SubmitMaple(strList []string) error
