lang

package
v0.0.0-...-90deddd
Warning

This package is not in the latest version of its module.

Published: Oct 18, 2023 License: Apache-2.0 Imports: 18 Imported by: 3

Documentation

Overview

Package lang implements the reflow language.

The reflow language is a simple, type-safe, applicative domain specific language used to construct Reflow flows.

A reflow expression is one of the following, where e1, e2, .. themselves represent expressions; id represents an identifier. Other literals are exemplars.

(e1)                      // parenthesization
param("name", "help")     // parameter definition
let id = e1 in e2         // let-binding
func(e1, e2)              // function application (arbitrary arity)
image(e1)                 // import docker image e1
intern(e1)                // internalize data from url e1 (string or list of strings), see below
groupby(e1, e2)           // group the value e1 by the regular expression e2
concat(e1, e2)            // concatenate the strings e1 and e2
map(e1, e2)               // map the function e2 onto the list e1
collect(e1, e2)           // filter out files in e1 that don't match the regexp in e2
collect(e1, e2, e3)       // filter out files in e1 that don't match the regexp in e2;
                          // then rewrite keys with replacement string e3
pullup(vs...)             // flatten values into one
e1 { bash script {{e2}} } // evaluate a bash script inside the image e1;
                          // materialize the value e2 into its namespace and
                          // substitute {{e2}} for its path
e1["attr1=val1",..]       // set attributes on images
args                      // command line arguments (list of strings)
"literal string"          // a literal string

A reflow program comprises a number of toplevel definitions, each of which is one of:

include("path")           // read definitions from the given file
extern(e1, e2)            // externalize e1 to url e2
id = e1                   // bind e1 to identifier id
func(id) = e1             // define a function where e1 is evaluated with
                          // the bound value of id upon application.

For example, the following program produces a flow that will align a pair of FASTQ files.

// A Docker image that contains the BWA aligner.
bwa = image("619867110810.dkr.ecr.us-west-2.amazonaws.com/wgsv1:latest")

// A read pair stored on S3.
r1 = intern("s3f://grail-marius/demultiplex2/W044216555475mini/FC0/W044216555475mini_S2_L001_R1_001.fastq.gz")
r2 = intern("s3f://grail-marius/demultiplex2/W044216555475mini/FC0/W044216555475mini_S2_L001_R2_001.fastq.gz")

// The BWA reference we'll be using. This fetches the entire s3 prefix,
// which contains both the FASTA files as well as a BWA index.
decoyAndViral = intern("s3://grail-scna/reference/bwa_decoy_viral_index")

// Align a pair of fastq files using BWA. Outputs a BAM file.
// We reserve approximately 12GB of memory for this operation.
align(r1, r2) = bwa["rss=12000000000"] {
	/usr/local/bin/bwa mem {{decoyAndViral}}/decoy_and_viral.fa {{r1}} {{r2}} | \
		/usr/local/bin/samtools view -Sb - > $out
}

// Upload the results of the expression "align(r1, r2)" to a file in S3.
extern(align(r1, r2), "s3://grail-marius/aligned.bam")

Interns

If the function intern is handed a string containing a comma-separated list of URLs, it interns each URL separately and combines them into a single "virtual" value. The resulting value contains the union of all of the URLs, with the basename (the directory name for directory interns, the file name for file interns) prepended to the keys of each respective intern. In this mode, directory interns must end in "/" so that the names are translated correctly. In the following example, "input" is a value containing INDEX and the contents of "s3://grail-marius/dir1" under the "dir1/" prefix.

input = intern("s3://grail-marius/dir1/,s3f://grail-marius/INDEX")

In this mode, empty list entries are ignored, so adding a trailing "," after a single URL also hoists that URL into a directory. In the following example, reflow presents a directory with one file named "INDEX".

input = intern("s3f://grail-marius/INDEX,")
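The naming rules for comma-separated interns can be sketched in Go as follows. This is only an illustration of the rules described above, not reflow's implementation; the names internEntry and splitInterns are hypothetical, and the sketch models only the comma-separated mode.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// internEntry is a hypothetical record describing one URL of a
// comma-separated intern argument.
type internEntry struct {
	URL    string // the URL to intern
	Prefix string // the name under which it appears in the combined value
}

// splitInterns splits a comma-separated intern argument, skips empty
// entries, and derives each entry's name from the URL's basename.
// Directory URLs (those ending in "/") keep a trailing "/" in the name.
func splitInterns(arg string) []internEntry {
	var entries []internEntry
	for _, u := range strings.Split(arg, ",") {
		if u == "" {
			continue // empty list entries are ignored
		}
		name := path.Base(u) // path.Base drops any trailing slashes
		if strings.HasSuffix(u, "/") {
			name += "/"
		}
		entries = append(entries, internEntry{URL: u, Prefix: name})
	}
	return entries
}

func main() {
	for _, e := range splitInterns("s3://grail-marius/dir1/,s3f://grail-marius/INDEX") {
		fmt.Printf("%s -> %s\n", e.URL, e.Prefix)
	}
}
```

Under these assumptions, the trailing-comma form "s3f://grail-marius/INDEX," yields a single entry named "INDEX", which matches the one-file directory described above.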

Type checking and evaluation

Reflow programs are type checked by inference: reflow computes the type of each expression and checks that it is subsequently used correctly.

Reflow types are one of:

string             // the type of expressions producing strings
num                // the type of expressions producing numeric values
flow               // the type of expressions producing flows
flowlist           // the type of expressions producing lists of flows
func(n, r)         // the type of n-ary functions returning type r
template           // the type of command literals
image              // the type of expressions producing Docker image refs
void               // the type of side-effecting expressions

Here are some examples of expressions and their types:

"hello world"                                  // string
let h = "hello world" in h                     // string
image("ubuntu")                                // image
image("ubuntu") {
	echo hello world
}                                              // flow
intern("s3://grail-marius/foobar")             // flow
extern(out, "s3://...")                        // void
let h(a, b, c) = string in h                   // func(3, string)

The program is then evaluated into a Flow, which may in turn be evaluated on a computing cluster by the reflow evaluator.
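To make the idea of checking by inference concrete, here is a minimal sketch in Go over a small, hypothetical subset of the language. The Expr, Type, and infer definitions below are assumptions made for illustration and do not reflect this package's internal representation; intern is simplified to accept only a string.

```go
package main

import "fmt"

// Type is a hypothetical subset of the reflow types listed above.
type Type int

const (
	TypeError Type = iota // ill-typed or unbound
	TypeString
	TypeImage
	TypeFlow
	TypeVoid
)

func (t Type) String() string {
	return [...]string{"error", "string", "image", "flow", "void"}[t]
}

// Expr is a hypothetical expression node. Op selects the form:
// "lit", "ident", "let", "image", "intern", "exec", or "extern".
type Expr struct {
	Op          string
	Ident       string // used by "ident" and "let"
	Left, Right *Expr
}

// infer computes the type of e and checks that each subexpression has
// the type its context requires. env maps identifiers to types.
func infer(e *Expr, env map[string]Type) Type {
	switch e.Op {
	case "lit":
		return TypeString
	case "ident":
		return env[e.Ident] // unbound identifiers yield TypeError
	case "let": // let Ident = Left in Right
		bound := infer(e.Left, env)
		if bound == TypeError {
			return TypeError
		}
		inner := map[string]Type{}
		for k, v := range env {
			inner[k] = v
		}
		inner[e.Ident] = bound
		return infer(e.Right, inner)
	case "image":
		if infer(e.Left, env) == TypeString {
			return TypeImage
		}
	case "intern":
		if infer(e.Left, env) == TypeString {
			return TypeFlow
		}
	case "exec": // Left { script }: Left must be an image
		if infer(e.Left, env) == TypeImage {
			return TypeFlow
		}
	case "extern": // extern(Left, Right)
		if infer(e.Left, env) == TypeFlow && infer(e.Right, env) == TypeString {
			return TypeVoid
		}
	}
	return TypeError
}

func main() {
	lit := &Expr{Op: "lit"}
	hello := &Expr{Op: "let", Ident: "h", Left: lit, Right: &Expr{Op: "ident", Ident: "h"}}
	run := &Expr{Op: "exec", Left: &Expr{Op: "image", Left: lit}}
	fmt.Println(infer(hello, nil), infer(run, nil))
}
```

The sketch rejects misuse before anything runs: for instance, extern applied to a string instead of a flow infers to the error type.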

Bugs and future work

The language has many flaws and short-cuts. In particular, it is somewhat hamstrung by its static type checking discipline: for example, we currently restrict the type of function arguments so that they may be safely inferred without a more complicated type inferencing scheme.

We can get rid of this restriction while also retaining safety by more carefully staging reflow evaluation. Currently, a reflow program is evaluated into a flow, but the semantics of map demand that some evaluation is deferred (since we don't know its input beforehand). However, we can sever this tie by representing maps differently. Namely, they may evaluate to a flow where arguments are "holes", named by a de Bruijn index (so that maps may be nested safely). This evaluation scheme would permit the reflow language to use runtime typing while at the same time exposing errors before the (expensive) flow evaluation occurs.
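The role of the indices in nested maps can be illustrated with a small Go sketch. It is not a proposal for how reflow would represent flows; the Term, Lit, Hole, Concat, and Map types are hypothetical and evaluate to plain strings. Hole(0) names the argument of the innermost enclosing map and Hole(1) the next one out.

```go
package main

import (
	"fmt"
	"strings"
)

// Term is a hypothetical flow-like term that evaluates to a list of strings.
// env holds the arguments of the enclosing maps, outermost first.
type Term interface {
	eval(env []string) []string
}

type Lit string // a literal value

type Hole int // a map argument, named by its distance from the innermost map

type Concat struct{ L, R Term } // string concatenation

type Map struct { // evaluates Body once per element of List
	List []string
	Body Term
}

func (l Lit) eval(env []string) []string { return []string{string(l)} }

func (h Hole) eval(env []string) []string {
	return []string{env[len(env)-1-int(h)]}
}

func (c Concat) eval(env []string) []string {
	return []string{strings.Join(c.L.eval(env), "") + strings.Join(c.R.eval(env), "")}
}

func (m Map) eval(env []string) []string {
	var out []string
	for _, x := range m.List {
		// Copy env so that sibling iterations never share storage.
		inner := append(append([]string(nil), env...), x)
		out = append(out, m.Body.eval(inner)...)
	}
	return out
}

func main() {
	nested := Map{[]string{"a", "b"}, Map{[]string{"1", "2"}, Concat{Hole(1), Hole(0)}}}
	fmt.Println(nested.eval(nil)) // prints [a1 a2 b1 b2]
}
```

Because a hole is named by position rather than by identifier, the body of the inner map can be built before either input list is known, and nesting cannot capture the wrong argument.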

The language also has several other problems and inconsistencies. First, it has shift-reduce conflicts, which we should seek to avoid. Second, it lacks some common features for which users compensate. For example, retaining filename information across groupby-map-merge operations is cumbersome. This can be addressed in future refinements of the language.

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Error

type Error struct {
	W io.Writer // The io.Writer to which errors are reported.
	N int       // The error count.
}

Error implements error reporting during parsing, typechecking, and evaluation. It is not safe for concurrent access.

func (*Error) Errorf

func (e *Error) Errorf(pos scanner.Position, format string, args ...interface{})

Errorf formats and reports an error.
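A counting error reporter of this shape can be sketched as below. The struct fields and the Errorf signature are copied from the documentation above, but the method body and the output format are assumptions; the file name and message in demo are made up for illustration.

```go
package main

import (
	"fmt"
	"io"
	"strings"
	"text/scanner"
)

// Error mirrors the documented struct: a writer plus an error count.
type Error struct {
	W io.Writer // The io.Writer to which errors are reported.
	N int       // The error count.
}

// Errorf writes "position: message" to W and increments the count.
// The exact output format is an assumption of this sketch.
func (e *Error) Errorf(pos scanner.Position, format string, args ...interface{}) {
	fmt.Fprintf(e.W, "%s: %s\n", pos, fmt.Sprintf(format, args...))
	e.N++
}

// demo reports one error into an in-memory buffer and returns the
// reported text together with the resulting error count.
func demo() (string, int) {
	var buf strings.Builder
	e := &Error{W: &buf}
	pos := scanner.Position{Filename: "align.reflow", Line: 3, Column: 7}
	e.Errorf(pos, "undefined identifier %q", "r3")
	return buf.String(), e.N
}

func main() {
	out, n := demo()
	fmt.Print(out)
	fmt.Println("errors:", n)
}
```

A caller can keep going after the first error and decide at the end, by inspecting N, whether any stage failed. As noted above, such a reporter is not safe for concurrent use.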

type EvalEnv

type EvalEnv struct {
	*Error

	// Def contains toplevel defs available in the environment.
	Def map[string]*Expr
	// Param returns the value of parameter id.
	// The second argument returned indicates whether the
	// parameter was defined.
	Param func(id, help string) (string, bool)
	// contains filtered or unexported fields
}

EvalEnv contains the evaluation environment used by reflow. It contains a set of defs, params, and a value environment. It is an error reporter.

func (*EvalEnv) Bind

func (e *EvalEnv) Bind(ident string, val Val)

Bind binds a val to an ident.

func (*EvalEnv) Ident

func (e *EvalEnv) Ident(pos scanner.Position) string

Ident returns the current ident.

func (*EvalEnv) Pop

func (e *EvalEnv) Pop()

Pop pops the evaluation environment stack.

func (*EvalEnv) Push

func (e *EvalEnv) Push()

Push pushes the current evaluation environment onto the stack.

func (*EvalEnv) SetIdent

func (e *EvalEnv) SetIdent(ident string)

SetIdent sets the current ident.

func (*EvalEnv) Val

func (e *EvalEnv) Val(ident string) (Val, bool)

Val returns the val bound to ident.
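The Push, Pop, Bind, and Val operations suggest a scoped environment. The following Go sketch shows one common way to build such a structure, as a stack of frames searched from innermost to outermost. It is an assumption about the general technique, not a description of how EvalEnv is implemented; the env type and string values are hypothetical.

```go
package main

import "fmt"

// env is a hypothetical scoped symbol table: a stack of frames in which
// the innermost frame is searched first.
type env struct {
	frames []map[string]string
}

func newEnv() *env {
	return &env{frames: []map[string]string{make(map[string]string)}}
}

// Push opens a new scope.
func (e *env) Push() { e.frames = append(e.frames, make(map[string]string)) }

// Pop discards the innermost scope and every binding made in it.
func (e *env) Pop() { e.frames = e.frames[:len(e.frames)-1] }

// Bind binds val to ident in the innermost scope.
func (e *env) Bind(ident, val string) { e.frames[len(e.frames)-1][ident] = val }

// Val returns the innermost binding of ident and whether one exists.
func (e *env) Val(ident string) (string, bool) {
	for i := len(e.frames) - 1; i >= 0; i-- {
		if v, ok := e.frames[i][ident]; ok {
			return v, true
		}
	}
	return "", false
}

func main() {
	e := newEnv()
	e.Bind("x", "outer")
	e.Push()
	e.Bind("x", "inner") // shadows the outer binding
	v, _ := e.Val("x")
	fmt.Println(v)
	e.Pop()
	v, _ = e.Val("x")
	fmt.Println(v)
}
```

Bracketing the evaluation of a let body or a function application with Push and Pop keeps its bindings from leaking into the surrounding scope.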

type Expr

type Expr struct {
	scanner.Position
	// contains filtered or unexported fields
}

Expr implements expressions in reflow. They contain the expression's op and arguments (left, right, list) as well as any literal values (ident, val).

func (*Expr) Eval

func (e *Expr) Eval(env EvalEnv) Val

Eval evaluates the expression e in the evaluation environment env. Eval assumes the expression has been typechecked; thus the expression is well-formed.

func (*Expr) Pos

func (e *Expr) Pos() scanner.Position

Pos tells the position of the expression.

func (*Expr) String

func (e *Expr) String() string

String returns a human-readable representation of an expression.

func (*Expr) Type

func (e *Expr) Type(env TypeEnv) Type

Type computes the type of e in the type environment env.

func (*Expr) Walk

func (e *Expr) Walk(v Visitor)

Walk walks the visitor v through the expression. Walk recursively visits the current expression, then its left, right, and list arguments. Finally, it visits nil.
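The traversal order described for Walk can be sketched on a hypothetical Node type, since Expr's fields are unexported. The pruning rule used here (a nil Visitor stops descent) follows the convention of Go's go/ast package and is an assumption, as are the Node and recorder types.

```go
package main

import "fmt"

// Node is a hypothetical stand-in for an expression with left, right,
// and list arguments.
type Node struct {
	Name        string
	Left, Right *Node
	List        []*Node
}

// Visitor has the same shape as the interface documented below.
type Visitor interface {
	Visit(n *Node) Visitor
}

// walk visits n, then its left, right, and list children with the
// visitor returned by Visit, and finally calls Visit(nil).
func walk(v Visitor, n *Node) {
	if n == nil {
		return
	}
	w := v.Visit(n)
	if w == nil {
		return // assumption: a nil visitor prunes this subtree
	}
	walk(w, n.Left)
	walk(w, n.Right)
	for _, c := range n.List {
		walk(w, c)
	}
	w.Visit(nil)
}

// recorder collects node names in visiting order; "<nil>" marks the
// final Visit(nil) call for each node.
type recorder struct{ names []string }

func (r *recorder) Visit(n *Node) Visitor {
	if n == nil {
		r.names = append(r.names, "<nil>")
		return nil
	}
	r.names = append(r.names, n.Name)
	return r
}

func main() {
	tree := &Node{Name: "concat", Left: &Node{Name: "a"}, Right: &Node{Name: "b"}}
	r := &recorder{}
	walk(r, tree)
	fmt.Println(r.names) // prints [concat a <nil> b <nil> <nil>]
}
```

The trailing Visit(nil) gives a visitor a hook for leaving a node, which is useful when it tracks nesting depth or scope.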

type Lexer

type Lexer struct {
	// File is the filename reported by the lexer's position.
	File string
	// Body contains the text to be lexed.
	Body io.Reader

	// Mode specifies the Lexer mode.
	Mode LexerMode

	// HashVersion is the hash version string, if any.
	HashVersion string

	Expr  *Expr
	Stmts []*Stmt
	// contains filtered or unexported fields
}

Lexer is a lexer for reflow. Its tokens are defined in the reflow grammar. The lexer composes Go's text/scanner: it knows how to tokenize special identifiers, and performs semicolon insertion in the style of Go.

The lexer also manages include directives, which are implemented by recursively instantiating a lexer for the included file. (If we want to support dynamic inclusion, this mechanism would need to be moved to the evaluator.)
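Go-style semicolon insertion on top of text/scanner can be sketched as follows. The set of tokens that trigger insertion is an assumption made for illustration and is not taken from the reflow grammar; tokens is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
	"text/scanner"
)

// tokens scans src with text/scanner and inserts a ";" at each newline
// (and at end of input) when the preceding token could end a statement.
func tokens(src string) []string {
	var s scanner.Scanner
	s.Init(strings.NewReader(src))
	// Exclude '\n' from white space so that newlines are returned as tokens.
	s.Whitespace = 1<<'\t' | 1<<'\r' | 1<<' '

	var out []string
	needSemi := false
	for {
		tok := s.Scan()
		if tok == '\n' || tok == scanner.EOF {
			if needSemi {
				out = append(out, ";")
				needSemi = false
			}
			if tok == scanner.EOF {
				return out
			}
			continue
		}
		out = append(out, s.TokenText())
		switch tok {
		case scanner.Ident, scanner.Int, scanner.Float,
			scanner.String, scanner.RawString, ')', ']', '}':
			needSemi = true // these tokens may end a statement
		default:
			needSemi = false
		}
	}
}

func main() {
	src := "bwa = image(\"ubuntu\")\nout = bwa"
	fmt.Println(strings.Join(tokens(src), " "))
}
```

With this rule a line that ends in an operator such as "=" continues onto the next line, while a line that ends in an identifier, literal, or closing bracket terminates the statement.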

func (*Lexer) Error

func (lx *Lexer) Error(s string)

Error reports an error to the lexer.

func (*Lexer) Errorf

func (lx *Lexer) Errorf(format string, args ...interface{})

Errorf formats and then reports an error to the lexer.

func (*Lexer) Init

func (lx *Lexer) Init()

Init initializes the lexer.

func (*Lexer) Lex

func (lx *Lexer) Lex(yy *yySymType) (x int)

Lex returns the next token.

type LexerMode

type LexerMode int

LexerMode determines the lexer's entry behavior.

const (
	// LexerTop begins lexing of a top-level program--i.e., one
	// containing a number of statements.
	LexerTop LexerMode = iota
	// LexerExpr begins lexing of an expression.
	LexerExpr
	// LexerInclude begins lexing of an included file.
	LexerInclude
)

type Op

type Op int

Op is the operation that an expression or statement implements.

func (Op) String

func (i Op) String() string

type Program

type Program struct {
	// Errors is the writer to which errors are reported.
	Errors io.Writer
	// File is the name of the file containing the reflow program.
	File string
	// Args contains the command-line arguments (but not flags)
	// used for this program invocation. Args must be set before
	// calling Eval.
	Args []string
	// contains filtered or unexported fields
}

Program represents a reflow program. It parses, typechecks, and evaluates reflow programs, managing parameters via Go's flag package.

func (*Program) Eval

func (p *Program) Eval() *flow.Flow

Eval evaluates the program and returns a flow. All toplevel extern statements are merged into a single flow.Merge node.

func (*Program) Flags

func (p *Program) Flags() *flag.FlagSet

Flags returns the set of flags that are defined by the program. It is defined only after ParseAndTypecheck has been called. Flags may be set to parameterize the program.
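The mapping from param("name", "help") expressions to flags can be sketched with the standard flag package. The helper names, the string-typed flags, and the choice to treat "defined" as "explicitly set on the command line" are assumptions of this sketch, not documented behavior of Program.

```go
package main

import (
	"flag"
	"fmt"
)

// param is a hypothetical record of one param("name", "help") expression.
type param struct{ Name, Help string }

// flagsFor registers one string flag per parameter. Names must be
// unique, because flag.FlagSet panics when a flag is redefined.
func flagsFor(params []param) *flag.FlagSet {
	fs := flag.NewFlagSet("program", flag.ContinueOnError)
	for _, p := range params {
		fs.String(p.Name, "", p.Help)
	}
	return fs
}

// lookup returns the value of parameter id and whether it was set.
// FlagSet.Visit only visits flags that have been set.
func lookup(fs *flag.FlagSet, id string) (string, bool) {
	set := false
	fs.Visit(func(f *flag.Flag) {
		if f.Name == id {
			set = true
		}
	})
	if !set {
		return "", false
	}
	return fs.Lookup(id).Value.String(), true
}

func main() {
	fs := flagsFor([]param{
		{"reference", "path to the reference"},
		{"sample", "sample name"},
	})
	if err := fs.Parse([]string{"-sample", "NA12878"}); err != nil {
		fmt.Println(err)
		return
	}
	v, ok := lookup(fs, "sample")
	fmt.Println(v, ok)
	_, ok = lookup(fs, "reference")
	fmt.Println(ok)
}
```

A function with the shape of lookup could back a Param callback like the one in EvalEnv, with the help string serving only as flag usage text.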

func (*Program) ModuleType

func (p *Program) ModuleType() *types.T

ModuleType computes and returns the Reflow module type for this program. This is used for bridging "v0" scripts into "v1" modules. This should be called only after type checking has completed.

For simplicity we only export non-function values, since they always evaluate either to immediate values.T or else to Flows, both of which have defined digests. We don't let functions escape.

func (*Program) ModuleValue

func (p *Program) ModuleValue() (values.T, error)

ModuleValue computes the Reflow module value given the set of defined parameters.

func (*Program) ParseAndTypecheck

func (p *Program) ParseAndTypecheck(r io.Reader) error

ParseAndTypecheck parses the program presented by the io.Reader r. It returns any error.

func (*Program) Visit

func (p *Program) Visit(e *Expr) Visitor

Visit implements the expression visitor used for converting program parameters to flags.

type Stmt

type Stmt struct {
	scanner.Position
	// contains filtered or unexported fields
}

Stmt implements a statement in reflow. It contains its operation and arguments (left, right, list).

func (*Stmt) Pos

func (s *Stmt) Pos() scanner.Position

Pos tells the position of the statement.

func (*Stmt) Type

func (s *Stmt) Type(env TypeEnv) Type

Type computes the type of s in the type environment env.

type Type

type Type int

Type is the type of types in the reflow language.

func (Type) ReflowType

func (t Type) ReflowType() *types.T

ReflowType converts a "v0" type to a Reflow type. A nil is returned if the type is not supported as a Reflow type.

func (Type) String

func (t Type) String() string

String returns a human-readable string for type t.

type TypeEnv

type TypeEnv struct {
	*Error
	// The set of toplevel defs
	Def map[string]*Expr
	// contains filtered or unexported fields
}

TypeEnv is a type environment used during typechecking and type inference. It is an error reporter.

func (*TypeEnv) Bind

func (t *TypeEnv) Bind(ident string, typ Type)

Bind binds an identifier to a type.

func (*TypeEnv) Pop

func (t *TypeEnv) Pop()

Pop pops the type environment stack.

func (*TypeEnv) Push

func (t *TypeEnv) Push()

Push pushes the current type environment onto the environment stack.

func (*TypeEnv) Type

func (t *TypeEnv) Type(ident string) (Type, bool)

Type returns the type of an identifier in this environment. The second return value indicates whether ident was defined.

type Val

type Val struct {
	// contains filtered or unexported fields
}

Val is a container type for values in the reflow language.

type Visitor

type Visitor interface {
	Visit(e *Expr) Visitor
}

Visitor is implemented by Expr's visitors.
