runner

package
v0.0.0-...-4c91ef0
Published: Oct 28, 2019 License: Apache-2.0 Imports: 30 Imported by: 0

Documentation

Overview

Package runner provides a simple parallel cluster runner for diviner studies. It uses bigmachine[1] to launch multiple machines over which trials are run in parallel, one trial per machine.

Variables

var DefaultPreamble = `set -ex; `

DefaultPreamble is prepended to any script run by the worker. It is exposed here for testing.

TODO(marius): make this unnecessary by fixing bigmachine.
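
As a rough illustration of what prepending the preamble means, the sketch below concatenates the documented preamble value with a script. In a POSIX shell, `set -e` exits on the first failing command and `set -x` traces each command as it runs. The helper withPreamble and the constant defaultPreamble are hypothetical stand-ins, not part of package runner; how the worker actually applies the preamble is not shown in this documentation.

```go
package main

import "fmt"

// defaultPreamble mirrors the documented value of runner.DefaultPreamble.
// It is a local copy so that this sketch is self-contained.
const defaultPreamble = `set -ex; `

// withPreamble is a hypothetical helper that prepends a preamble to a
// shell script, so the script exits on the first error and traces commands.
func withPreamble(preamble, script string) string {
	return preamble + script
}

func main() {
	fmt.Println(withPreamble(defaultPreamble, "echo hello"))
}
```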

var Logger = log.Debug

Logger is the default logger used for informational messages by package runner. By default it is log.Debug, appropriate for library use.

Functions

func TestSetRetryBackoff

func TestSetRetryBackoff(d time.Duration)

TestSetRetryBackoff sets the interval between retries after errors. For unit tests only.

Types

type Runner

type Runner struct {
	// contains filtered or unexported fields
}

A Runner is responsible for creating a cluster of machines and running trials on the cluster.

Runner is also an http.Handler that prints trial statuses.

func New

func New(db diviner.Database) *Runner

New returns a new runner that will perform trials, recording their results to the provided database. The runner uses bigmachine to create new systems according to the run configurations returned from the study. The caller must start the runner's run loop by calling Loop.

func (*Runner) Counters

func (r *Runner) Counters() map[string]int

Counters returns a set of runtime counters from this runner's run loop (see Loop).
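
Because Counters returns a map[string]int, and Go map iteration order is unspecified, callers that log the counters may want a stable ordering. The following is a minimal sketch of such a formatter; formatCounters and the counter names in main are hypothetical, since the documentation does not list the actual keys.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// formatCounters renders a counter map, such as the one returned by
// (*Runner).Counters, as one "name=value" line per key in sorted order.
func formatCounters(counters map[string]int) string {
	keys := make([]string, 0, len(counters))
	for k := range counters {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	var b strings.Builder
	for _, k := range keys {
		fmt.Fprintf(&b, "%s=%d\n", k, counters[k])
	}
	return b.String()
}

func main() {
	// The counter names below are made up for illustration.
	fmt.Print(formatCounters(map[string]int{"ndone": 3, "nfail": 1}))
}
```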

func (*Runner) Loop

func (r *Runner) Loop(ctx context.Context) error

Loop is the runner's main run loop, managing clusters of machines and allocating workers among the runs. The runner stops doing work when the provided context is canceled. All errors are fatal: the runner may not be revived.

BUG(marius): the runner should re-create failed machines.
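
Since Loop blocks until the context is canceled or a fatal error occurs, a caller would typically run it in its own goroutine and cancel the context to shut down. The sketch below shows one way to do that against a small interface that captures only the documented Loop signature. The names looper, fakeRunner, and runUntil are hypothetical, and the fake returning ctx.Err() on cancellation is an assumption; the documentation does not say what Loop returns in that case.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// looper captures only the documented signature of (*Runner).Loop.
type looper interface {
	Loop(ctx context.Context) error
}

// fakeRunner is a stand-in for *runner.Runner. It blocks until the
// context is canceled and then returns the context's error (an assumption).
type fakeRunner struct{}

func (fakeRunner) Loop(ctx context.Context) error {
	<-ctx.Done()
	return ctx.Err()
}

// runUntil runs l.Loop in a goroutine. It returns early if Loop fails on
// its own (all Loop errors are fatal); otherwise it cancels the context
// when stop is closed and returns whatever Loop returns.
func runUntil(l looper, stop <-chan struct{}) error {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	errc := make(chan error, 1)
	go func() { errc <- l.Loop(ctx) }()
	select {
	case err := <-errc:
		return err
	case <-stop:
		cancel()
		return <-errc
	}
}

func main() {
	stop := make(chan struct{})
	time.AfterFunc(10*time.Millisecond, func() { close(stop) })
	fmt.Println("loop returned:", runUntil(fakeRunner{}, stop))
}
```

With the real package, the value passed to runUntil would presumably be the *Runner returned by New.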

func (*Runner) Round

func (r *Runner) Round(ctx context.Context, study diviner.Study, ntrials int) (done bool, err error)

func (*Runner) Run

func (r *Runner) Run(ctx context.Context, study diviner.Study, values diviner.Values, replicate int) (diviner.Run, error)

Run performs a single run with the provided study and values. The run is registered in the runner's configured database, and its status is maintained throughout the course of execution. Run returns when the run is complete (its status may be inspected by methods on diviner.Run); any error returned is a runtime error, not an error of the run itself. The run is also registered with the runner and will show up in the various introspection facilities.

func (*Runner) ServeHTTP

func (r *Runner) ServeHTTP(w http.ResponseWriter, req *http.Request)

ServeHTTP implements http.Handler, providing a simple status page used to examine the currently running trials, organized by study.
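
Because the runner satisfies http.Handler, it can be mounted on any standard mux. The following sketch uses a stand-in handler and net/http/httptest to exercise the mounting in memory. The type fakeStatus, the "/status" path, and the response text are all hypothetical; the real status page's content is not described beyond being organized by study.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// fakeStatus is a stand-in for *runner.Runner, which implements
// http.Handler. The body it writes is invented for this example.
type fakeStatus struct{}

func (fakeStatus) ServeHTTP(w http.ResponseWriter, req *http.Request) {
	fmt.Fprintln(w, "study example: 2 trials running")
}

// newMux mounts the status handler under /status (an arbitrary path).
func newMux(status http.Handler) *http.ServeMux {
	mux := http.NewServeMux()
	mux.Handle("/status", status)
	return mux
}

// fetch issues an in-memory GET request against h and returns the
// response status code and body.
func fetch(h http.Handler, path string) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code, rec.Body.String()
}

func main() {
	code, body := fetch(newMux(fakeStatus{}), "/status")
	fmt.Printf("%d %s", code, body)
}
```

In a real program the mux would be passed to http.ListenAndServe rather than exercised through httptest.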

func (*Runner) StartTime

func (r *Runner) StartTime() time.Time

StartTime returns the time that the runner was created.

func (*Runner) Stream

func (r *Runner) Stream(ctx context.Context, study diviner.Study, nparallel int) *Streamer

Stream starts a streaming run for the provided study with the provided parallelism. The returned Streamer controls the ongoing study. Streamers maintain the target parallelism, requesting new points from the underlying oracle as they are needed. A streaming study stops when the caller requests it, or after it runs out of points to explore, as determined by the study's oracle.
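
The idea of maintaining a target parallelism while pulling points on demand can be sketched with a counting semaphore. This is not the package's implementation: the functions stream and measure, and the next callback standing in for the oracle, are hypothetical, and caller-requested stopping is left out for brevity.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// stream runs trial once for every point produced by next, keeping at
// most nparallel trials in flight. A point is requested only after a slot
// is free. next returns false once there are no more points to explore.
func stream(nparallel int, next func() (int, bool), trial func(point int)) {
	sem := make(chan struct{}, nparallel) // counting semaphore
	var wg sync.WaitGroup
	for {
		sem <- struct{}{} // wait for a free slot
		point, ok := next()
		if !ok {
			break
		}
		wg.Add(1)
		go func(p int) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot
			trial(p)
		}(point)
	}
	wg.Wait()
}

// measure streams npoints dummy trials and reports the peak number of
// concurrently running trials and the total number of trials run.
func measure(nparallel, npoints int) (peak, ran int) {
	var mu sync.Mutex
	inflight, i := 0, 0
	next := func() (int, bool) {
		if i >= npoints {
			return 0, false
		}
		i++
		return i, true
	}
	trial := func(int) {
		mu.Lock()
		inflight++
		ran++
		if inflight > peak {
			peak = inflight
		}
		mu.Unlock()
		time.Sleep(time.Millisecond) // simulate work
		mu.Lock()
		inflight--
		mu.Unlock()
	}
	stream(nparallel, next, trial)
	return peak, ran
}

func main() {
	peak, ran := measure(3, 10)
	fmt.Printf("ran %d trials, peak parallelism %d\n", ran, peak)
}
```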

type Streamer

type Streamer struct {
	// contains filtered or unexported fields
}

A Streamer is used to control streaming trials.

func (*Streamer) Stop

func (s *Streamer) Stop()

Stop requests that the streaming study stop once the currently executing trials complete. The study is guaranteed to have stopped only after the Wait method returns.

func (*Streamer) Wait

func (s *Streamer) Wait() error

Wait blocks until the study has completed. If the study fails, an error is returned.
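
Stop only requests a shutdown, so a caller that needs the study to have finished would follow it with Wait. A minimal sketch of that calling pattern, written against an interface carrying just the two documented methods, might look like the following. The names streamer, fakeStreamer, and shutdown are hypothetical; the fake simply runs dummy trials until stopped.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// streamer captures the documented methods of *runner.Streamer.
type streamer interface {
	Stop()
	Wait() error
}

// fakeStreamer is a stand-in that runs dummy trials until stopped.
type fakeStreamer struct {
	stop   chan struct{}
	done   chan struct{}
	once   sync.Once
	trials int
}

func newFakeStreamer() *fakeStreamer {
	s := &fakeStreamer{stop: make(chan struct{}), done: make(chan struct{})}
	go func() {
		defer close(s.done)
		for {
			select {
			case <-s.stop:
				return
			default:
				s.trials++ // one simulated trial
				time.Sleep(time.Millisecond)
			}
		}
	}()
	return s
}

// Stop requests a stop; it is safe to call more than once in this fake.
func (s *fakeStreamer) Stop() { s.once.Do(func() { close(s.stop) }) }

// Wait blocks until the simulated study has finished.
func (s *fakeStreamer) Wait() error { <-s.done; return nil }

// shutdown requests a stop and then blocks until the study has finished.
func shutdown(s streamer) error {
	s.Stop()
	return s.Wait()
}

func main() {
	s := newFakeStreamer()
	time.Sleep(5 * time.Millisecond)
	err := shutdown(s)
	fmt.Println("stopped, err:", err, "trials run:", s.trials)
}
```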

Notes

Bugs

  • the runner should re-create failed machines.
