package graph

import "github.com/apache/beam/sdks/go/pkg/beam/core/graph"

Package graph is the internal representation of the Beam execution plan. This package is used by the public-facing Beam package to organize the user's pipeline into a connected graph structure. This graph is a precise, strongly typed representation of the user's intent. It allows the runtime to verify the typing of collections, and it tracks data dependencies so that an optimizer can schedule the work.

Package Files

bind.go doc.go edge.go fn.go graph.go node.go scope.go xlang.go

Constants

const (
    MainUnknown mainInputs = -1 // Number of inputs is unknown for DoFn validation.
    MainSingle  mainInputs = 1  // Number of inputs for single value elements.
    MainKv      mainInputs = 2  // Number of inputs for KV elements.
)

The constants prefixed with "Main" represent the valid numbers of DoFn main inputs, used for DoFn construction and validation.

const CombinePerKeyScope = "CombinePerKey"

CombinePerKeyScope is the Go SDK canonical name for the combine composite scope. With Beam Portability, "primitive" composite transforms like combine have their URNs & payloads attached to a high level scope, with a default representation beneath. The use of this const permits the translation layer to confirm the SDK expects this combine to be liftable by a runner and should set this scope's URN and Payload accordingly.

Variables

var (
    // SourceInputTag is a constant random string used when an ExternalTransform
    // expects a single unnamed input. xlangx and graphx use it to explicitly
    // bypass steps in pipeline construction meant for named inputs.
    SourceInputTag string

    // SinkOutputTag is a constant random string used when an ExternalTransform
    // expects a single unnamed output. xlangx and graphx use it to explicitly
    // bypass steps in pipeline construction meant for named outputs.
    SinkOutputTag string

    // NewNamespace is a utility random string generator used by xlang to
    // scope individual ExternalTransforms by a unique namespace.
    NewNamespace func() string
)

func Bounded

func Bounded(ns []*Node) bool

Bounded returns true iff all nodes are bounded.

func CoGBKMainInput

func CoGBKMainInput(components int) func(*config)

CoGBKMainInput is an optional config to NewDoFn which specifies the number of components of a CoGBK input to the DoFn being created, allowing for more complete validation.

Example usage:

var col beam.PCollection
graph.NewDoFn(fn, graph.CoGBKMainInput(len(col.Type().Components())))

func IsLifecycleMethod

func IsLifecycleMethod(n string) bool

IsLifecycleMethod reports whether the given string is one of the method names used by the Go SDK as DoFn or CombineFn lifecycle methods. These are the only methods that need shims generated for them.
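
A name check of this kind can be sketched as a set lookup. The sketch below is standalone and only illustrative: the set contains the DoFn and CombineFn method names that appear elsewhere on this page, and the set the package actually recognizes may differ. The names lifecycleNames and isLifecycleMethod are hypothetical.

```go
package main

import "fmt"

// lifecycleNames is an assumed set built from method names mentioned on
// this page; it is not copied from the package source.
var lifecycleNames = map[string]bool{
	"Setup":             true,
	"StartBundle":       true,
	"ProcessElement":    true,
	"FinishBundle":      true,
	"Teardown":          true,
	"CreateAccumulator": true,
	"AddInput":          true,
	"MergeAccumulators": true,
	"ExtractOutput":     true,
	"Compact":           true,
}

// isLifecycleMethod reports whether n is in the assumed set above.
func isLifecycleMethod(n string) bool {
	return lifecycleNames[n]
}

func main() {
	fmt.Println(isLifecycleMethod("ProcessElement"), isLifecycleMethod("String"))
}
```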

func NewNamespaceGenerator

func NewNamespaceGenerator(n int) func() string

NewNamespaceGenerator returns a function that generates a random string of n letters.
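
As a rough illustration of the closure shape described here, the following standalone sketch returns a generator of random n-letter strings. It is not the package's implementation: the alphabet, the use of math/rand, and the lowercase helper name newNamespaceGenerator are all assumptions.

```go
package main

import (
	"fmt"
	"math/rand"
)

// letters is an assumed alphabet; the real generator may use another.
const letters = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

// newNamespaceGenerator returns a closure that produces a fresh random
// string of n letters on every call.
func newNamespaceGenerator(n int) func() string {
	return func() string {
		b := make([]byte, n)
		for i := range b {
			b[i] = letters[rand.Intn(len(letters))]
		}
		return string(b)
	}
}

func main() {
	gen := newNamespaceGenerator(8)
	fmt.Println(gen(), gen())
}
```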

func NodeTypes

func NodeTypes(list []*Node) []typex.FullType

NodeTypes returns the fulltypes of the supplied slice of nodes.

func NumMainInputs

func NumMainInputs(num mainInputs) func(*config)

NumMainInputs is an optional config to NewDoFn which specifies the number of main inputs to the DoFn being created, allowing for more complete validation. Valid inputs are the package constants of type mainInputs.

Example usage:

graph.NewDoFn(fn, graph.NumMainInputs(graph.MainKv))
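
NumMainInputs and CoGBKMainInput both return a func(*config), which is Go's functional-options pattern. The standalone sketch below shows how such options might be applied. The config fields, the MainUnknown default, and the applyOptions helper are assumptions made for illustration, since the package's config type is unexported.

```go
package main

import "fmt"

// mainInputs mirrors the constant type documented on this page.
type mainInputs int

const (
	MainUnknown mainInputs = -1
	MainSingle  mainInputs = 1
	MainKv      mainInputs = 2
)

// config is a hypothetical stand-in for the unexported config type.
type config struct {
	numMainIn mainInputs
	coGBKIn   int
}

// NumMainInputs returns an option that records the main input count.
func NumMainInputs(num mainInputs) func(*config) {
	return func(c *config) { c.numMainIn = num }
}

// CoGBKMainInput returns an option that records the CoGBK component count.
func CoGBKMainInput(components int) func(*config) {
	return func(c *config) { c.coGBKIn = components }
}

// applyOptions starts from an assumed default and applies options in order.
func applyOptions(options ...func(*config)) config {
	cfg := config{numMainIn: MainUnknown}
	for _, opt := range options {
		opt(&cfg)
	}
	return cfg
}

func main() {
	cfg := applyOptions(NumMainInputs(MainKv), CoGBKMainInput(3))
	fmt.Println(cfg.numMainIn, cfg.coGBKIn)
}
```

Because each option is just a function, new optional settings can be added without changing the constructor's signature.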

type CombineFn

type CombineFn Fn

CombineFn represents a CombineFn.

func AsCombineFn

func AsCombineFn(fn *Fn) (*CombineFn, error)

AsCombineFn converts a Fn to a CombineFn, if possible.

func NewCombineFn

func NewCombineFn(fn interface{}) (*CombineFn, error)

NewCombineFn constructs a CombineFn from the given value, if possible.

func (*CombineFn) AddInputFn

func (f *CombineFn) AddInputFn() *funcx.Fn

AddInputFn returns the "AddInput" function, if present.

func (*CombineFn) CompactFn

func (f *CombineFn) CompactFn() *funcx.Fn

CompactFn returns the "Compact" function, if present.

func (*CombineFn) CreateAccumulatorFn

func (f *CombineFn) CreateAccumulatorFn() *funcx.Fn

CreateAccumulatorFn returns the "CreateAccumulator" function, if present.

func (*CombineFn) ExtractOutputFn

func (f *CombineFn) ExtractOutputFn() *funcx.Fn

ExtractOutputFn returns the "ExtractOutput" function, if present.

func (*CombineFn) MergeAccumulatorsFn

func (f *CombineFn) MergeAccumulatorsFn() *funcx.Fn

MergeAccumulatorsFn returns the "MergeAccumulators" function. If it is the only method present, then InputType == AccumulatorType == OutputType.

func (*CombineFn) Name

func (f *CombineFn) Name() string

Name returns the name of the function or struct.

func (*CombineFn) SetupFn

func (f *CombineFn) SetupFn() *funcx.Fn

SetupFn returns the "Setup" function, if present.

func (*CombineFn) TeardownFn

func (f *CombineFn) TeardownFn() *funcx.Fn

TeardownFn returns the "Teardown" function, if present.
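
To show how the CombineFn methods named above relate to each other, here is a standalone sketch of a struct that computes a mean, with a small driver that calls the methods in the order a runner plausibly would. The meanFn and accum types and the combineLocally driver are hypothetical, and nothing here goes through the graph package itself.

```go
package main

import "fmt"

// accum is the accumulator type; it differs from both input and output.
type accum struct {
	Sum   float64
	Count int
}

// meanFn is a hypothetical struct-based CombineFn computing an average.
type meanFn struct{}

// CreateAccumulator returns an empty accumulator.
func (meanFn) CreateAccumulator() accum { return accum{} }

// AddInput folds one input value into the accumulator.
func (meanFn) AddInput(a accum, v float64) accum {
	return accum{Sum: a.Sum + v, Count: a.Count + 1}
}

// MergeAccumulators combines two partial accumulators.
func (meanFn) MergeAccumulators(a, b accum) accum {
	return accum{Sum: a.Sum + b.Sum, Count: a.Count + b.Count}
}

// ExtractOutput converts the final accumulator into the output value.
func (meanFn) ExtractOutput(a accum) float64 {
	if a.Count == 0 {
		return 0
	}
	return a.Sum / float64(a.Count)
}

// combineLocally accumulates each bundle separately, merges the partial
// accumulators, and extracts the output.
func combineLocally(fn meanFn, bundles [][]float64) float64 {
	merged := fn.CreateAccumulator()
	for _, bundle := range bundles {
		acc := fn.CreateAccumulator()
		for _, v := range bundle {
			acc = fn.AddInput(acc, v)
		}
		merged = fn.MergeAccumulators(merged, acc)
	}
	return fn.ExtractOutput(merged)
}

func main() {
	fmt.Println(combineLocally(meanFn{}, [][]float64{{1, 2}, {3, 6}}))
}
```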

type DoFn

type DoFn Fn

DoFn represents a DoFn.

func AsDoFn

func AsDoFn(fn *Fn, numMainIn mainInputs) (*DoFn, error)

AsDoFn converts a Fn to a DoFn, if possible. numMainIn specifies how many main inputs are expected in the DoFn's method signatures. Valid inputs are the package constants of type mainInputs. If that number is MainUnknown, validation is best-effort and may miss some edge cases.

func NewDoFn

func NewDoFn(fn interface{}, options ...func(*config)) (*DoFn, error)

NewDoFn constructs a DoFn from the given value, if possible.

func (*DoFn) FinishBundleFn

func (f *DoFn) FinishBundleFn() *funcx.Fn

FinishBundleFn returns the "FinishBundle" function, if present.

func (*DoFn) IsSplittable

func (f *DoFn) IsSplittable() bool

IsSplittable returns whether the DoFn is a valid Splittable DoFn.

func (*DoFn) Name

func (f *DoFn) Name() string

Name returns the name of the function or struct.

func (*DoFn) ProcessElementFn

func (f *DoFn) ProcessElementFn() *funcx.Fn

ProcessElementFn returns the "ProcessElement" function.

func (*DoFn) SetupFn

func (f *DoFn) SetupFn() *funcx.Fn

SetupFn returns the "Setup" function, if present.

func (*DoFn) StartBundleFn

func (f *DoFn) StartBundleFn() *funcx.Fn

StartBundleFn returns the "StartBundle" function, if present.

func (*DoFn) TeardownFn

func (f *DoFn) TeardownFn() *funcx.Fn

TeardownFn returns the "Teardown" function, if present.

type DynFn

type DynFn struct {
    // Name is the name of the function. It does not have to be a valid symbol.
    Name string
    // T is the type of the generated function
    T   reflect.Type
    // Data holds the data, if any, for the generator. Each function
    // generator typically needs some configuration data, which is
    // required by the DynFn to be encoded.
    Data []byte
    // Gen is the function generator. The function generator itself must be a
    // function with a unique symbol.
    Gen func(string, reflect.Type, []byte) reflectx.Func
}

DynFn is a generator for dynamically-created functions:

gen: (name string, t reflect.Type, []byte) -> func : T

where the generated function, fn : T, is re-created at runtime. This concept allows serialization of dynamically generated functions, such as those created via reflect.MakeFunc, which do not have a valid (unique) symbol.
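
The generator idea can be illustrated with the standard library alone. In the sketch below, a hypothetical generator genAdder decodes its configuration from a byte slice and uses reflect.MakeFunc to rebuild a function of the requested type. It returns a reflect.Value rather than the reflectx.Func that the Gen field uses, and the helpers encodeOffset and rebuild are assumptions made to keep the example self-contained.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"reflect"
)

// encodeOffset serializes the generator's configuration data.
func encodeOffset(offset int64) []byte {
	buf := make([]byte, 8)
	binary.BigEndian.PutUint64(buf, uint64(offset))
	return buf
}

// genAdder is a hypothetical generator. It decodes an int64 offset from
// data and builds a function of type t, assumed to be func(int64) int64,
// that adds the offset to its argument. The name is unused in this sketch.
func genAdder(name string, t reflect.Type, data []byte) reflect.Value {
	offset := int64(binary.BigEndian.Uint64(data))
	return reflect.MakeFunc(t, func(args []reflect.Value) []reflect.Value {
		return []reflect.Value{reflect.ValueOf(args[0].Int() + offset)}
	})
}

// rebuild re-creates a callable function from the serialized data alone,
// as a worker would after receiving the encoded generator inputs.
func rebuild(data []byte) func(int64) int64 {
	t := reflect.TypeOf((func(int64) int64)(nil))
	return genAdder("adder", t, data).Interface().(func(int64) int64)
}

func main() {
	add5 := rebuild(encodeOffset(5))
	fmt.Println(add5(37))
}
```

Only the generator needs a stable symbol. The generated function is reproduced from the name, type, and data on the receiving side.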

type ExpandedTransform

type ExpandedTransform struct {
    Components   interface{} // *pipepb.Components
    Transform    interface{} // *pipepb.PTransform
    Requirements []string
}

ExpandedTransform stores the expansion response associated with each ExternalTransform.

The Components and Transform fields are purposely typed as interface{} to avoid unnecessary proto-related imports into graph.

type ExternalTransform

type ExternalTransform struct {
    Namespace string

    Urn           string
    Payload       []byte
    ExpansionAddr string

    InputsMap  map[string]int
    OutputsMap map[string]int

    Expanded *ExpandedTransform
}

ExternalTransform represents a cross-language transform going in and out of the pipeline graph. It is associated with a MultiEdge and its Inbound and Outbound links. It also stores the associated expansion response in the Expanded field.

func (ExternalTransform) WithNamedInputs

func (ext ExternalTransform) WithNamedInputs(inputsMap map[string]int) ExternalTransform

WithNamedInputs adds a map (tag -> index of Inbound in MultiEdge.Input) of named inputs corresponding to ExternalTransform's InputsMap.

func (ExternalTransform) WithNamedOutputs

func (ext ExternalTransform) WithNamedOutputs(outputsMap map[string]int) ExternalTransform

WithNamedOutputs adds a map (tag -> index of Outbound in MultiEdge.Output) of named outputs corresponding to ExternalTransform's OutputsMap.

type Fn

type Fn struct {
    // Fn holds the function, if present. If Fn is nil, Recv must be
    // non-nil.
    Fn  *funcx.Fn
    // Recv holds the struct receiver, if present. If Recv is nil, Fn
    // must be non-nil.
    Recv interface{}
    // DynFn holds the function-generator, if dynamic. If not nil, Fn
    // holds the generated function.
    DynFn *DynFn
    // contains filtered or unexported fields
}

Fn holds either a function or struct receiver.

func NewFn

func NewFn(fn interface{}) (*Fn, error)

NewFn pre-processes a function, dynamic function or struct for graph construction.

func (*Fn) Name

func (f *Fn) Name() string

Name returns the name of the function or struct.

type Graph

type Graph struct {
    // contains filtered or unexported fields
}

Graph represents an in-progress deferred execution graph and is easily translatable to the model graph. This graph representation allows precise control over scope and connectivity.

func New

func New() *Graph

New returns an empty graph with the scope set to the root.

func (*Graph) Build

func (g *Graph) Build() ([]*MultiEdge, []*Node, error)

Build performs finalization on the graph. It verifies the correctness of the graph structure, typechecks the plan, and returns slices of the edges and nodes in the graph.

func (*Graph) NewEdge

func (g *Graph) NewEdge(parent *Scope) *MultiEdge

NewEdge creates a new edge of the graph in the supplied scope.

func (*Graph) NewNode

func (g *Graph) NewNode(t typex.FullType, w *window.WindowingStrategy, bounded bool) *Node

NewNode creates a new node in the graph of the supplied fulltype.

func (*Graph) NewScope

func (g *Graph) NewScope(parent *Scope, name string) *Scope

NewScope creates and returns a new scope that is a child of the supplied scope.

func (*Graph) Root

func (g *Graph) Root() *Scope

Root returns the root scope of the graph.

func (*Graph) String

func (g *Graph) String() string

type Inbound

type Inbound struct {
    // Kind presents the form of the data that the edge expects. Main input
    // must be processed element-wise, but side input may take several
    // convenient forms. For example, a DoFn that processes ints may choose
    // among the following parameter types:
    //
    //   * Main:      int
    //   * Singleton: int
    //   * Slice:     []int
    //   * Iter:      func(*int) bool
    //   * ReIter:    func() func(*int) bool
    //
    // If the DoFn is generic then int may be replaced by any of the type
    // variables. For example,
    //
    //   * Slice:     []typex.T
    //   * Iter:      func(*typex.X) bool
    //
    // If the input type is KV<int,string>, say, then the options are:
    //
    //   * Main:      int, string  (as two separate parameters)
    //   * Map:       map[int]string
    //   * MultiMap:  map[int][]string
    //   * Iter:      func(*int, *string) bool
    //   * ReIter:    func() func(*int, *string) bool
    //
    // As above, either int, string, or both can be replaced with type
    // variables. For example,
    //
    //   * Map:       map[typex.X]typex.Y
    //   * MultiMap:  map[typex.T][]string
    //   * Iter:      func(*typex.Z, *typex.Z) bool
    //
    // Note that in the last case the parameter type requires that both
    // the key and value types are identical. Bind enforces such constraints.
    Kind InputKind

    // From is the incoming node in the graph.
    From *Node

    // Type is the fulltype matching the actual type used by the transform.
    // Due to the loose signatures of DoFns, we can only determine the
    // inbound structure when the fulltypes of the incoming links are present.
    // For example,
    //
    //     func (ctx context.Context, key int, value typex.X) error
    //
    // is a generic DoFn that if bound to KV<int,string> would have one
    // Inbound link with type KV<int, X>.
    Type typex.FullType
}

Inbound represents an inbound data link from a Node.
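
The Iter and ReIter parameter shapes listed in the Kind comment can be tried out without the SDK. The standalone sketch below builds both shapes over a plain slice and drains them. The helpers iterOf, reIterOf, and sumIter are hypothetical and only mimic the function signatures. In a real pipeline the runtime supplies these functions.

```go
package main

import "fmt"

// iterOf returns an Iter-shaped function: each call writes the next value
// through the pointer and reports whether a value was available.
func iterOf(values []int) func(*int) bool {
	i := 0
	return func(out *int) bool {
		if i >= len(values) {
			return false
		}
		*out = values[i]
		i++
		return true
	}
}

// reIterOf returns a ReIter-shaped function: each call yields a fresh
// Iter, so the side input can be traversed more than once.
func reIterOf(values []int) func() func(*int) bool {
	return func() func(*int) bool { return iterOf(values) }
}

// sumIter drains an Iter-shaped input and returns the sum of its values.
func sumIter(next func(*int) bool) int {
	total := 0
	var v int
	for next(&v) {
		total += v
	}
	return total
}

func main() {
	side := reIterOf([]int{1, 2, 3})
	fmt.Println(sumIter(side()), sumIter(side()))
}
```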

func NamedInboundLinks

func NamedInboundLinks(ins map[string]*Node) (map[string]int, []*Inbound)

NamedInboundLinks returns a slice of new Inbound links and a map (tag -> index of Inbound in MultiEdge.Input) from each name to the index of its link.
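
The tag-to-index bookkeeping can be sketched with toy types. In the standalone example below, node and inbound are hypothetical stand-ins for Node and Inbound, and the tags are sorted purely to make the sketch deterministic. Whether the package orders links in any particular way is not stated on this page.

```go
package main

import (
	"fmt"
	"sort"
)

// node and inbound are toy stand-ins for graph.Node and graph.Inbound.
type node struct{ ID int }

type inbound struct{ From *node }

// namedInboundLinks builds one link per tagged node and records, for each
// tag, the index of its link in the returned slice.
func namedInboundLinks(ins map[string]*node) (map[string]int, []*inbound) {
	tags := make([]string, 0, len(ins))
	for tag := range ins {
		tags = append(tags, tag)
	}
	sort.Strings(tags) // assumed ordering, for determinism only

	index := make(map[string]int, len(ins))
	links := make([]*inbound, 0, len(ins))
	for _, tag := range tags {
		index[tag] = len(links)
		links = append(links, &inbound{From: ins[tag]})
	}
	return index, links
}

func main() {
	ins := map[string]*node{"right": {ID: 2}, "left": {ID: 1}}
	index, links := namedInboundLinks(ins)
	for _, tag := range []string{"left", "right"} {
		fmt.Printf("%s -> input %d (node %d)\n", tag, index[tag], links[index[tag]].From.ID)
	}
}
```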

func (*Inbound) String

func (i *Inbound) String() string

type InputKind

type InputKind string

InputKind represents the role of the input and its shape.

const (
    Main      InputKind = "Main"
    Singleton InputKind = "Singleton"
    Slice     InputKind = "Slice"
    Map       InputKind = "Map"      // TODO: allow?
    MultiMap  InputKind = "MultiMap" // TODO: allow?
    Iter      InputKind = "Iter"
    ReIter    InputKind = "ReIter"
)

Valid input kinds.

func Bind

func Bind(fn *funcx.Fn, typedefs map[string]reflect.Type, in ...typex.FullType) ([]typex.FullType, []InputKind, []typex.FullType, []typex.FullType, error)

Bind returns the inbound, outbound and underlying output types for a Fn, when bound to the underlying input types. The complication of bind is primarily that UserFns have loose signatures and bind must produce valid type information for the execution plan.

For example,

func (t EventTime, k typex.X, v int, emit func(string, typex.X))

or

func (context.Context, k typex.X, v int) (string, typex.X, error)

are UserFns that may take one or two incoming fulltypes: either KV<X,int> or X with a singleton side input of type int. For the purpose of the shape of data processing, the two forms are equivalent. The non-data types, context.Context and error, are not part of the data signature and come into play only at runtime.

If either was bound to the input type [KV<string,int>], bind would return:

inbound:  [Main: KV<X,int>]
outbound: [KV<string,X>]
output:   [KV<string,string>]

Note that it propagates the assignment of X to string in the output type.

If either was instead bound to the input fulltypes [float, int], the result would be:

inbound:  [Main: X, Singleton: int]
outbound: [KV<string,X>]
output:   [KV<string,float>]

Here, the inbound shape and output types are different from before.
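
The propagation of a type-variable assignment into the output type can be illustrated with a toy substitution. The sketch below is standalone: fullType is a hypothetical stand-in for typex.FullType, and substitute shows only the final replacement step, not the signature matching that Bind performs.

```go
package main

import (
	"fmt"
	"strings"
)

// fullType is a toy stand-in for typex.FullType: a name plus components.
type fullType struct {
	Name       string
	Components []fullType
}

// String renders the type in the KV<a,b> notation used on this page.
func (t fullType) String() string {
	if len(t.Components) == 0 {
		return t.Name
	}
	parts := make([]string, len(t.Components))
	for i, c := range t.Components {
		parts[i] = c.String()
	}
	return t.Name + "<" + strings.Join(parts, ",") + ">"
}

// substitute replaces every bound type variable in t with its binding.
func substitute(t fullType, bindings map[string]fullType) fullType {
	if len(t.Components) == 0 {
		if bound, ok := bindings[t.Name]; ok {
			return bound
		}
		return t
	}
	out := fullType{Name: t.Name, Components: make([]fullType, len(t.Components))}
	for i, c := range t.Components {
		out.Components[i] = substitute(c, bindings)
	}
	return out
}

func main() {
	outbound := fullType{Name: "KV", Components: []fullType{{Name: "string"}, {Name: "X"}}}
	fmt.Println(substitute(outbound, map[string]fullType{"X": {Name: "string"}}))
	fmt.Println(substitute(outbound, map[string]fullType{"X": {Name: "float"}}))
}
```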

type MultiEdge

type MultiEdge struct {
    Op               Opcode
    DoFn             *DoFn              // ParDo
    RestrictionCoder *coder.Coder       // SplittableParDo
    CombineFn        *CombineFn         // Combine
    AccumCoder       *coder.Coder       // Combine
    Value            []byte             // Impulse
    External         *ExternalTransform // Current External Transforms API
    Payload          *Payload           // Legacy External Transforms API
    WindowFn         *window.Fn         // WindowInto

    Input  []*Inbound
    Output []*Outbound
    // contains filtered or unexported fields
}

MultiEdge represents a primitive data processing operation. Each non-user code operation may be implemented by either the harness or the runner.

func NewCoGBK

func NewCoGBK(g *Graph, s *Scope, ns []*Node) (*MultiEdge, error)

NewCoGBK inserts a new CoGBK edge into the graph.

func NewCombine

func NewCombine(g *Graph, s *Scope, u *CombineFn, in *Node, ac *coder.Coder, typedefs map[string]reflect.Type) (*MultiEdge, error)

NewCombine inserts a new Combine edge into the graph. Combines cannot have side input.

func NewCrossLanguage

func NewCrossLanguage(g *Graph, s *Scope, ext *ExternalTransform, ins []*Inbound, outs []*Outbound) (*MultiEdge, func(*Node, bool))

NewCrossLanguage inserts a cross-language External transform using initialized input and output nodes.

func NewExternal

func NewExternal(g *Graph, s *Scope, payload *Payload, in []*Node, out []typex.FullType, bounded bool) *MultiEdge

NewExternal inserts an External transform. The system makes no assumptions about what this transform might do.

func NewFlatten

func NewFlatten(g *Graph, s *Scope, in []*Node) (*MultiEdge, error)

NewFlatten inserts a new Flatten edge in the graph. Flatten output type is the shared input type.

func NewImpulse

func NewImpulse(g *Graph, s *Scope, value []byte) *MultiEdge

NewImpulse inserts a new Impulse edge into the graph. It must use the built-in bytes coder.

func NewParDo

func NewParDo(g *Graph, s *Scope, u *DoFn, in []*Node, rc *coder.Coder, typedefs map[string]reflect.Type) (*MultiEdge, error)

NewParDo inserts a new ParDo edge into the graph.

func NewReshuffle

func NewReshuffle(g *Graph, s *Scope, in *Node) (*MultiEdge, error)

NewReshuffle inserts a new Reshuffle edge into the graph.

func NewWindowInto

func NewWindowInto(g *Graph, s *Scope, wfn *window.Fn, in *Node) *MultiEdge

NewWindowInto inserts a new WindowInto edge into the graph.

func (*MultiEdge) ID

func (e *MultiEdge) ID() int

ID returns the graph-local identifier for the edge.

func (*MultiEdge) Name

func (e *MultiEdge) Name() string

Name returns a not-necessarily-unique name for the edge.

func (*MultiEdge) Scope

func (e *MultiEdge) Scope() *Scope

Scope returns the scope.

func (*MultiEdge) String

func (e *MultiEdge) String() string

type Node

type Node struct {

    // Coder defines the data encoding. It can be changed, but must be of
    // the underlying type, t.
    Coder *coder.Coder
    // contains filtered or unexported fields
}

Node is a typed connector describing the data type and encoding. A node may have multiple inbound and outbound connections. The underlying type must be a complete type, i.e., not include any type variables.

func (*Node) Bounded

func (n *Node) Bounded() bool

Bounded returns true iff the collection is bounded.

func (*Node) ID

func (n *Node) ID() int

ID returns the graph-local identifier for the node.

func (*Node) String

func (n *Node) String() string

func (*Node) Type

func (n *Node) Type() typex.FullType

Type returns the underlying full type of the data, such as KV<int,string>.

func (*Node) WindowingStrategy

func (n *Node) WindowingStrategy() *window.WindowingStrategy

WindowingStrategy returns the window applied to the data.

type Opcode

type Opcode string

Opcode represents a primitive Beam instruction kind.

const (
    Impulse    Opcode = "Impulse"
    ParDo      Opcode = "ParDo"
    CoGBK      Opcode = "CoGBK"
    Reshuffle  Opcode = "Reshuffle"
    External   Opcode = "External"
    Flatten    Opcode = "Flatten"
    Combine    Opcode = "Combine"
    WindowInto Opcode = "WindowInto"
)

Valid opcodes.

type Outbound

type Outbound struct {
    // To is the outgoing node in the graph.
    To  *Node

    // Type is the fulltype matching the actual type used by the transform.
    // For DoFns, unlike inbound, the outbound types closely mimic the type
    // signature. For example,
    //
    //     func (ctx context.Context, emit func (key int, value typex.X)) error
    //
    // is a generic DoFn that produces one Outbound link of type KV<int,X>.
    Type typex.FullType // representation type of data
}

Outbound represents an outbound data link to a Node.

func NamedOutboundLinks

func NamedOutboundLinks(g *Graph, outs map[string]typex.FullType) (map[string]int, []*Outbound)

NamedOutboundLinks returns a slice of new Outbound links and a map (tag -> index of Outbound in MultiEdge.Output) from each name to the index of its link.

func (*Outbound) String

func (o *Outbound) String() string

type Payload

type Payload struct {
    URN  string
    Data []byte
}

Payload represents an external payload.

type Scope

type Scope struct {

    // Label is the human-visible label for this scope.
    Label string
    // Parent is the parent scope, if nested.
    Parent *Scope
    // contains filtered or unexported fields
}

Scope is a syntactic scope, such as one arising from a composite transform. It has no semantic meaning at execution time. It is used by monitoring.
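
Since each scope carries a Label and a Parent, a human-readable path can be derived by walking up to the root. The standalone sketch below uses a toy scope type with just those two fields. The path helper and the "/" separator are assumptions and are not taken from the package's String method.

```go
package main

import "fmt"

// scope is a toy stand-in carrying only the two exported fields above.
type scope struct {
	Label  string
	Parent *scope
}

// path joins the labels from the root down to s with "/".
func path(s *scope) string {
	if s == nil {
		return ""
	}
	if s.Parent == nil {
		return s.Label
	}
	return path(s.Parent) + "/" + s.Label
}

func main() {
	root := &scope{Label: "root"}
	composite := &scope{Label: "CombinePerKey", Parent: root}
	inner := &scope{Label: "ParDo", Parent: composite}
	fmt.Println(path(inner))
}
```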

func (*Scope) ID

func (s *Scope) ID() int

ID returns the graph-local identifier for the scope.

func (*Scope) String

func (s *Scope) String() string

type SplittableDoFn

type SplittableDoFn DoFn

SplittableDoFn represents a DoFn implementing SDF methods.

func (*SplittableDoFn) CreateInitialRestrictionFn

func (f *SplittableDoFn) CreateInitialRestrictionFn() *funcx.Fn

CreateInitialRestrictionFn returns the "CreateInitialRestriction" function, if present.

func (*SplittableDoFn) CreateTrackerFn

func (f *SplittableDoFn) CreateTrackerFn() *funcx.Fn

CreateTrackerFn returns the "CreateTracker" function, if present.

func (*SplittableDoFn) Name

func (f *SplittableDoFn) Name() string

Name returns the name of the function or struct.

func (*SplittableDoFn) RestrictionSizeFn

func (f *SplittableDoFn) RestrictionSizeFn() *funcx.Fn

RestrictionSizeFn returns the "RestrictionSize" function, if present.

func (*SplittableDoFn) RestrictionT

func (f *SplittableDoFn) RestrictionT() reflect.Type

RestrictionT returns the restriction type from the SDF.

func (*SplittableDoFn) SplitRestrictionFn

func (f *SplittableDoFn) SplitRestrictionFn() *funcx.Fn

SplitRestrictionFn returns the "SplitRestriction" function, if present.

Directories

Path     Synopsis
coder    Package coder contains coder representation and utilities.
mtime    Package mtime contains a millisecond representation of time.
window   Package window contains window representation, windowing strategies and utilities.

Package graph imports 13 packages and is imported by 9 packages. Updated 2020-08-25.