bigmachine: github.com/grailbio/bigmachine Index | Files | Directories

package bigmachine

import "github.com/grailbio/bigmachine"

Package bigmachine implements a vertically integrated stack for distributed computing in Go. Go programs written with bigmachine are transparently distributed across a number of machines as instantiated by the backend used. (Currently supported: EC2, local machines, unit tests.) Bigmachine clusters comprise a driver node and a number of bigmachine nodes (called "machines"). The driver node can create new machines and communicate with them; machines can call each other.

Computing model

On startup, a bigmachine program calls driver.Start. Driver.Start configures a bigmachine instance based on a set of standard flags and then starts it. (Users desiring a lower-level API can use bigmachine.Start directly.)

import (
	"github.com/grailbio/bigmachine"
	"github.com/grailbio/bigmachine/driver"
	...
)

func main() {
	flag.Parse()
	// Additional configuration and setup.
	b, shutdown := driver.Start()
	defer shutdown()

	// Driver code...
}

When the program is run, driver.Start returns immediately: the program can then interact with the returned bigmachine B to create new machines, define services on those machines, and invoke methods on those services. Bigmachine bootstraps machines by running the same binary, but in these runs, driver.Start never returns; instead it launches a server to handle calls from the driver program and other machines.

A machine is started by (*B).Start. Machines must be configured with at least one service:

m, err := b.Start(ctx, bigmachine.Services{
	"MyService": &MyService{Param1: value1, ...},
})

Users may then invoke methods on the services provided by the returned machine. A services's methods can be invoked so long as they are of the form:

Func(ctx context.Context, arg argType, reply *replyType) error

See package github.com/grailbio/bigmachine/rpc for more details.

Methods are named by the sevice and method name, separated by a dot ('.'), e.g.: "MyService.MyMethod":

if err := m.Call(ctx, "MyService.MyMethod", arg, &reply); err != nil {
	log.Print(err)
} else {
	// Examine reply
}

Since service instances must be serialized so that they can be transmitted to the remote machine, and because we do not know the service types a priori, any type that can appear as a service must be registered with gob. This is usually done in an init function in the package that declares type:

type MyService struct { ... }

func init() {
	// MyService has method receivers
	gob.Register(new(MyService))
}

Vertical computing

A bigmachine program attempts to appear and act like a single program:

- Each machine's standard output and error are copied to the driver;
- bigmachine provides aggregating profile handlers at /debug/bigmachine/pprof
  so that aggregate profiles may be taken over the full cluster;
- command line flags are propagated from the driver to the machine,
  so that a binary run can be configured in the usual way.

The driver program maintains keepalives to all of its machines. Once this is no longer maintained (e.g., because the driver finished, or crashed, or lost connectivity), the machines become idle and shut down.

Services

A service is any Go value that implements methods of the form given above. Services are instantiated by the user and registered with bigmachine. When a service is registered, bigmachine will also invoke an initialization method on the service if it exists. Per-machine initialization can be performed by this method.The form of the method is:

Init(*Service) error

If a non-nil error is returned, the machine is considered failed.

Index

Package Files

bigmachine.go doc.go expvar.go local.go machine.go profile.go status.go supervisor.go system.go

Constants

const RpcPrefix = "/bigrpc/"

RpcPrefix is the path prefix used to serve RPC requests.

func Init Uses

func Init()

Init initializes bigmachine. It should be called after flag parsing and global setup in bigmachine-based processes. Init is a no-op if the binary is not running as a bigmachine worker; if it is, Init never returns.

func RegisterSystem Uses

func RegisterSystem(name string, system System)

RegisterSystem is used by systems implementation to register a system implementation. RegisterSystem registers the implementation with gob, so that instances can be transmitted over the wire. It also registers the provided System instance as a default to use for the name to support bigmachine.Init.

type B Uses

type B struct {
    // contains filtered or unexported fields
}

B is a bigmachine instance. Bs are created by Start and, outside of testing situations, there is exactly one per process.

func Start Uses

func Start(system System, opts ...Option) *B

Start is the main entry point of bigmachine. Start starts a new B using the provided system, returning the instance. B's shutdown method should be called to tear down the session, usually in a defer statement from the program's main:

func main() {
	// Parse flags, configure system.
	b := bigmachine.Start()
	defer b.Shutdown()

	// bigmachine driver code
}

func (*B) Dial Uses

func (b *B) Dial(ctx context.Context, addr string) (*Machine, error)

Dial connects to the machine named by the provided address.

The returned machine is not owned: it is not kept alive as Start does.

func (*B) HandleDebug Uses

func (b *B) HandleDebug(mux *http.ServeMux)

HandleDebug registers diagnostic http endpoints on the provided ServeMux.

func (*B) HandleDebugPrefix Uses

func (b *B) HandleDebugPrefix(prefix string, mux *http.ServeMux)

HandleDebugPrefix registers diagnostic http endpoints on the provided ServeMux under the provided prefix.

func (*B) IsDriver Uses

func (b *B) IsDriver() bool

IsDriver is true if this is a driver instance (rather than a spawned machine).

func (*B) Machines Uses

func (b *B) Machines() []*Machine

Machines returns a snapshot of the current set machines known to this B.

func (*B) Shutdown Uses

func (b *B) Shutdown()

Shutdown tears down resources associated with this B. It should be called by the driver to discard a session, usually in a defer:

b := bigmachine.Start()
defer b.Shutdown()
// driver code

func (*B) Start Uses

func (b *B) Start(ctx context.Context, n int, params ...Param) ([]*Machine, error)

Start launches up to n new machines and returns them. The machines are configured according to the provided parameters. Each machine must have at least one service exported, or else Start returns an error. The new machines may be in Starting state when they are returned. Start maintains a keepalive to the returned machines, thus tying the machines' lifetime with the caller process.

Start returns at least one machine, or else an error.

func (*B) System Uses

func (b *B) System() System

System returns this B's System implementation.

type DiskInfo Uses

type DiskInfo struct {
    Usage disk.UsageStat
}

A DiskInfo describes system disk usage.

type Environ Uses

type Environ []string

Environ is a machine parameter that amends the process environment of the machine. It is a slice of strings in the form "key=value"; later definitions override earlies ones.

type Expvar Uses

type Expvar struct {
    Key   string
    Value string
}

An Expvar is a snapshot of an expvar.

type Expvars Uses

type Expvars []Expvar

Expvars is a collection of snapshotted expvars.

func (Expvars) MarshalJSON Uses

func (e Expvars) MarshalJSON() ([]byte, error)

type Info Uses

type Info struct {
    // Goos and Goarch are the operating system and architectures
    // as reported by the Go runtime.
    Goos, Goarch string
    // Digest is the fingerprint of the currently running binary on the machine.
    Digest digest.Digest
}

Info contains system information about a machine.

func LocalInfo Uses

func LocalInfo() Info

LocalInfo returns system information for this process.

type LoadInfo Uses

type LoadInfo struct {
    Averages load.AvgStat
}

A LoadInfo describes system load.

type Machine Uses

type Machine struct {
    // Addr is the address of the machine. It may be used to create
    // machine instances through Dial.
    Addr string

    // Maxprocs is the number of processors available on the machine.
    Maxprocs int

    // NoExec should be set to true if the machine should not exec a
    // new binary. This is meant for testing purposes.
    NoExec bool
    // contains filtered or unexported fields
}

A Machine is a single machine managed by bigmachine. Each machine is a "one-shot" execution of a bigmachine binary. Machines embody a failure detection mechanism, but does not provide fault tolerance. Each machine comprises instances of each registered bigmachine service. A Machine is created by the bigmachine driver binary, but its address can be passed to other Machines which can in turn connect to each other (through Dial).

Machines are created with (*B).Start.

func (*Machine) Call Uses

func (m *Machine) Call(ctx context.Context, serviceMethod string, arg, reply interface{}) error

Call invokes a method named by a service on this machine. The argument and reply must be provided in accordance to bigmachine's RPC mechanism (see package docs or the docs of the rpc package). Call waits to invoke the method until the machine is in running state, and fails fast when it is stopped.

If a machine fails its keepalive, pending calls are canceled.

func (*Machine) Cancel Uses

func (m *Machine) Cancel()

Cancel cancels all pending operations on machine m. The machine is stopped with an error of context.Canceled.

func (*Machine) DiskInfo Uses

func (m *Machine) DiskInfo(ctx context.Context) (info DiskInfo, err error)

DiskInfo returns the machine's disk usage information.

func (*Machine) Err Uses

func (m *Machine) Err() error

Err returns a machine's error. Err is only well-defined when the machine is in Stopped state.

func (*Machine) Hostname Uses

func (m *Machine) Hostname() string

Hostname returns the hostname portion of the machine's address.

func (*Machine) KeepaliveReplyTimes Uses

func (m *Machine) KeepaliveReplyTimes() []time.Duration

KeepaliveReplyTimes returns a buffer up to the last numKeepaliveReplyTimes keepalive reply latencies, most recent first.

func (*Machine) LoadInfo Uses

func (m *Machine) LoadInfo(ctx context.Context) (info LoadInfo, err error)

LoadInfo returns the machine's current load.

func (*Machine) MemInfo Uses

func (m *Machine) MemInfo(ctx context.Context, readMemStats bool) (info MemInfo, err error)

MemInfo returns the machine's memory usage information. Go runtime memory stats are read if readMemStats is true.

func (*Machine) NextKeepalive Uses

func (m *Machine) NextKeepalive() time.Time

NextKeepalive returns the time at which the next keepalive request is due.

func (*Machine) Owned Uses

func (m *Machine) Owned() bool

Owned tells whether this machine was created and is managed by this bigmachine instance.

func (*Machine) RetryCall Uses

func (m *Machine) RetryCall(ctx context.Context, serviceMethod string, arg, reply interface{}) error

RetryCall invokes Call, and retries on a temporary error.

func (*Machine) State Uses

func (m *Machine) State() State

State returns the machine's current state.

func (*Machine) Wait Uses

func (m *Machine) Wait(state State) <-chan struct{}

Wait returns a channel that is closed once the machine reaches the provided state or greater.

type MemInfo Uses

type MemInfo struct {
    System  mem.VirtualMemoryStat
    Runtime runtime.MemStats
}

A MemInfo describes system and Go runtime memory usage.

type Option Uses

type Option func(b *B)

Option is an option that can be provided when starting a new B. It is a function that can modify the b that will be returned by Start.

func Name Uses

func Name(name string) Option

Name is an option that will name the B. See B.name.

type Param Uses

type Param interface {
    // contains filtered or unexported methods
}

A Param is a machine parameter. Parameters customize machines before the are started.

type Services Uses

type Services map[string]interface{}

Services is a machine parameter that specifies the set of services that should be served by the machine. Each machine should have at least one service. Multiple Services parameters may be passed.

type State Uses

type State int32

State enumerates the possible states of a machine. Machine states proceed monotonically: they can only increase in value.

const (
    // Unstarted indicates the machine has yet to be started.
    Unstarted State = iota
    // Starting indicates that the machine is currently bootstrapping.
    Starting
    // Running indicates that the machine is running and ready to
    // receive calls.
    Running
    // Stopped indicates that the machine was stopped, eitehr because of
    // a failure, or because the driver stopped it.
    Stopped
)

func (State) String Uses

func (m State) String() string

String returns a State's string.

type Supervisor Uses

type Supervisor struct {
    // contains filtered or unexported fields
}

Supervisor is the system service installed on every machine.

func StartSupervisor Uses

func StartSupervisor(ctx context.Context, b *B, system System, server *rpc.Server) *Supervisor

StartSupervisor starts a new supervisor based on the provided arguments.

func (*Supervisor) CPUProfile Uses

func (s *Supervisor) CPUProfile(ctx context.Context, dur time.Duration, prof *io.ReadCloser) error

CPUProfile takes a pprof CPU profile of this process for the provided duration. If a duration is not provided (is 0) a 30-second profile is taken. The profile is returned in the pprof serialized form (which uses protocol buffers underneath the hood).

func (*Supervisor) DiskInfo Uses

func (s *Supervisor) DiskInfo(ctx context.Context, _ struct{}, info *DiskInfo) error

DiskInfo returns disk usage information on the disk where the temporary directory resides.

func (*Supervisor) Exec Uses

func (s *Supervisor) Exec(ctx context.Context, _ struct{}, _ *struct{}) error

Exec reads a new image from its argument and replaces the current process with it. As a consequence, the currently running machine will die. It is up to the caller to manage this interaction.

func (*Supervisor) Expvars Uses

func (s *Supervisor) Expvars(ctx context.Context, _ struct{}, vars *Expvars) error

Expvars returns a snapshot of this machine's expvars.

func (*Supervisor) GetBinary Uses

func (s *Supervisor) GetBinary(ctx context.Context, _ struct{}, rc *io.ReadCloser) error

GetBinary retrieves the last binary uploaded via Setbinary.

func (*Supervisor) Getpid Uses

func (s *Supervisor) Getpid(ctx context.Context, _ struct{}, pid *int) error

Getpid returns the PID of the supervisor process.

func (*Supervisor) Info Uses

func (s *Supervisor) Info(ctx context.Context, _ struct{}, info *Info) error

Info returns the info struct for this machine.

func (*Supervisor) Keepalive Uses

func (s *Supervisor) Keepalive(ctx context.Context, next time.Duration, reply *keepaliveReply) error

Keepalive maintains the machine keepalive. The next argument indicates the callers desired keepalive interval (i.e., the amount of time until the keepalive expires from the time of the call); the accepted time is returned. In order to maintain the keepalive, the driver should call Keepalive again before replynext expires.

func (*Supervisor) LoadInfo Uses

func (s *Supervisor) LoadInfo(ctx context.Context, _ struct{}, info *LoadInfo) error

LoadInfo returns system load information.

func (*Supervisor) MemInfo Uses

func (s *Supervisor) MemInfo(ctx context.Context, readMemStats bool, info *MemInfo) error

MemInfo returns system and Go runtime memory usage information. Go runtime stats are read if readMemStats is true.

func (*Supervisor) Ping Uses

func (s *Supervisor) Ping(ctx context.Context, seq int, replyseq *int) error

Ping replies immediately with the sequence number provided.

func (*Supervisor) Profile Uses

func (s *Supervisor) Profile(ctx context.Context, req profileRequest, prof *io.ReadCloser) error

Profile returns the named pprof profile for the current process. The profile is returned in protocol buffer format.

func (*Supervisor) Profiles Uses

func (s *Supervisor) Profiles(ctx context.Context, _ struct{}, profiles *[]profileStat) error

Profiles returns the set of available profiles and their counts.

func (*Supervisor) Register Uses

func (s *Supervisor) Register(ctx context.Context, svc service, _ *struct{}) error

Register registers a new service with the machine (server) associated with this supervisor. After registration, the service is also initialized if it implements the method

Init(*B) error

func (*Supervisor) Setargs Uses

func (s *Supervisor) Setargs(ctx context.Context, args []string, _ *struct{}) error

Setargs sets the process' arguments. It should be used before Exec in order to invoke the new image with the appropriate arguments.

func (*Supervisor) Setbinary Uses

func (s *Supervisor) Setbinary(ctx context.Context, binary io.Reader, _ *struct{}) error

Setbinary uploads a new binary to replace the current binary when Supervisor.Exec is called. The two calls are separated so that different timeouts can be applied to upload and exec.

func (*Supervisor) Setenv Uses

func (s *Supervisor) Setenv(ctx context.Context, env []string, _ *struct{}) error

Setenv sets the processes' environment. It is applied to newly exec'd images, and should be called before Exec. The provided environment is appended to the default process environment: keys provided here override those that already exist in the environment.

func (*Supervisor) Shutdown Uses

func (s *Supervisor) Shutdown(ctx context.Context, req shutdownRequest, _ *struct{}) error

Shutdown will cause the process to exit asynchronously at a point in the future no sooner than the specified delay.

type System Uses

type System interface {
    // Name is the name of this system. It is used to multiplex multiple
    // system implementations, and thus should be unique among
    // systems.
    Name() string
    // Init is called when the bigmachine starts up in order to
    // initialize the system implementation. If an error is returned,
    // the Bigmachine fails.
    Init(*B) error
    // Main is called to start  a machine. The system is expected to
    // take over the process; the bigmachine fails if main returns (and
    // if it does, it should always return with an error).
    Main() error
    // HTTPClient returns an HTTP client that can be used to communicate
    // from drivers to machines as well as between machines.
    HTTPClient() *http.Client
    // ListenAndServe serves the provided handler on an HTTP server that
    // is reachable from other instances in the bigmachine cluster. If addr
    // is the empty string, the default cluster address is used.
    ListenAndServe(addr string, handle http.Handler) error
    // Start launches up to n new machines.  The returned machines can
    // be in Unstarted state, but should eventually become available.
    Start(ctx context.Context, n int) ([]*Machine, error)
    // Exit is called to terminate a machine with the provided exit code.
    Exit(int)
    // Shutdown is called on graceful driver exit. It's should be used to
    // perform system tear down. It is not guaranteed to be called.
    Shutdown()
    // Maxprocs returns the maximum number of processors per machine,
    // as configured. Returns 0 if is a dynamic value.
    Maxprocs() int
    // KeepaliveConfig returns the various keepalive timeouts that should
    // be used to maintain keepalives for machines started by this system.
    KeepaliveConfig() (period, timeout, rpcTimeout time.Duration)
    // Tail returns a reader that follows the bigmachine process logs.
    Tail(ctx context.Context, m *Machine) (io.Reader, error)
    // Read returns a reader that reads the contents of the provided filename
    // on the host. This is done outside of the supervisor to support external
    // monitoring of the host.
    Read(ctx context.Context, m *Machine, filename string) (io.Reader, error)
}

A System implements a set of methods to set up a bigmachine and start new machines. Systems are also responsible for providing an HTTP client that can be used to communicate between machines and drivers.

var Local System = new(localSystem)

Local is a System that insantiates machines by creating new processes on the local machine.

Directories

PathSynopsis
driverPackage driver provides a convenient API for bigmachine drivers, which includes configuration by flags.
ec2systemPackage ec2system implements a bigmachine System that launches machines on dedicated EC2 spot instances.
ec2system/instances
internal/authorityPackage authority provides an in-process TLS certificate authority, useful for creating and distributing TLS certificates for mutually authenticated HTTPS networking within Bigmachine.
internal/ioutilPackage ioutil contains utilities for performing I/O in bigmachine.
internal/teePackage tee implements utilities for I/O multiplexing.
rpcPackage rpc implements a simple RPC system for Go methods.
testsystemPackage testsystem implements a bigmachine system that's useful for testing.

Package bigmachine imports 50 packages (graph) and is imported by 8 packages. Updated 2019-11-27. Refresh now. Tools for package owners.