traversal

package
v0.21.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Aug 10, 2023 License: MIT Imports: 8 Imported by: 123

Documentation

Overview

Package traversal provides functional utilities for traversing and transforming IPLD graphs.

Two primary types of traversal are implemented in this package: "Focus" and "Walk". Both types have a "Transforming" variant, which supports mutation through emulated copy-on-write tree rebuilding.

Traversal operations use the Progress type for configuration and state tracking. Helper functions such as Focus and Walk exist to avoid manual setup of a Progress struct, but they cannot cross link boundaries without a LinkSystem, which needs to be configured on the Progress struct.

A typical traversal operation involves creating a Progress struct, setting up the LinkSystem, and calling one of the Focus or Walk functions on the Progress object. Various other configuration options are available when traversing this way.

Focus

"Focus" and "Get" functions provide syntactic sugar for using ipld.Path to access Nodes deep within a graph.

"FocusedTransform" resembles "Focus" but supports user-defined mutation using its TransformFn.

Walk

"Walk" functions perform a recursive walk of a Node graph, applying visitor functions to matched parts of the graph.

The selector sub-package offers a declarative mechanism for guiding traversals and filtering relevant Nodes. (Refer to the selector sub-package for more details.)

"WalkLocal" is a special case of Walk that doesn't require a selector. It walks a local graph, not crossing link boundaries, and calls its VisitFn for each encountered Node.

"WalkMatching" traverses according to a selector, calling the VisitFn for each match based on the selector's matching rules.

"WalkAdv" performs the same traversal as WalkMatching, but calls its AdvVisitFn on every Node, regardless of whether it matches the selector.

"WalkTransforming" resembles "WalkMatching" but supports user-defined mutation using its TransformFn.

Usage Notes

These functions work via callbacks, performing traversal and calling a user-provided function with a handle to the reached Node(s). Further "Focus" and "Walk" operations can be performed recursively within this callback if desired.

All traversal functions operate on a Progress object, except "WalkLocal", which can be configured with a LinkSystem for automatic resolution and loading of new Node trees when IPLD Links are encountered.

The "*Transform" methods are best suited for point-mutation patterns. For more general transformations, use the read-only systems (e.g., Focus, Traverse) and handle accumulation in the visitor functions.

A common use case for walking traversal is running a selector over a graph and noting all the blocks it uses. This is achieved by configuring a LinkSystem that can handle and observe block loads. Be aware that a selector might visit the same block multiple times during a traversal, as IPLD graphs often form "diamond patterns" with the same block referenced from multiple locations.

The LinkVisitOnlyOnce option can be used to avoid duplicate loads, but it must be used carefully with non-trivial selectors, where repeat visits of the same block may be essential for traversal or visit callbacks.

A Budget can be set at the beginning of a traversal to limit the number of Nodes and/or Links encountered before failing the traversal (with the ErrBudgetExceeded error).

The "Preloader" option provides a way to parallelize block loading in environments where block loading is a high-latency operation (such as fetching over the network). The traversal operation itself is not parallel and will proceed strictly according to path or selector order. However, a Preloader can be used to load blocks asynchronously, and prepare the LinkSystem that the traversal is using with already-loaded blocks.

A Preloader and a Budget option can be used on the same traversal, BUT the Preloader may not receive the same links that the traversal wants to load from the LinkSystem. Use with care. See notes below.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func Focus

func Focus(n datamodel.Node, p datamodel.Path, fn VisitFn) error

Focus traverses a Node graph according to a path, reaches a single Node, and calls the given VisitFn on that reached node.

This function is a helper function which starts a new traversal with default configuration. It cannot cross links automatically (since this requires configuration). Use the equivalent Focus function on the Progress structure for more advanced and configurable walks.

func FocusedTransform

func FocusedTransform(n datamodel.Node, p datamodel.Path, fn TransformFn, createParents bool) (datamodel.Node, error)

FocusedTransform traverses a datamodel.Node graph, reaches a single Node, and calls the given TransformFn to decide what new node to replace the visited node with. A new Node tree will be returned (the original is unchanged).

This function is a helper function which starts a new traversal with default configuration. It cannot cross links automatically (since this requires configuration). Use the equivalent FocusedTransform function on the Progress structure for more advanced and configurable walks.

func Get added in v0.6.0

Get is the equivalent of Focus, but returns the reached node (rather than invoking a callback at the target), and does not yield Progress information.

This function is a helper function which starts a new traversal with default configuration. It cannot cross links automatically (since this requires configuration). Use the equivalent Get function on the Progress structure for more advanced and configurable walks.

func SelectLinks(n datamodel.Node) ([]datamodel.Link, error)

SelectLinks walks a Node tree and returns a slice of all Links encountered. SelectLinks will recurse down into any maps and lists, but does not attempt to load any of the links it encounters nor recurse further through them (in other words, it's confined to one "block").

SelectLinks only returns the list of links; it does not return any other information about them such as position in the tree, etc.

An error may be returned if any of the nodes returns errors during iteration; this is generally only possible if one of the Nodes is an ADL, and unable to be fully walked because of the inability to load or process some data inside the ADL. Nodes already fully in memory should not encounter such errors, and it should be safe to ignore errors from this method when used in that situation. In case of an error, a partial list will still be returned.

If an identical link is found several times during the walk, it is reported several times in the resulting list; no deduplication is performed by this method.

func WalkAdv added in v0.0.2

func WalkAdv(n datamodel.Node, s selector.Selector, fn AdvVisitFn) error

WalkAdv is identical to WalkMatching, except it is called for *all* nodes visited (not just matching nodes), together with the reason for the visit. An AdvVisitFn is used instead of a VisitFn, so that the reason can be provided.

This function is a helper function which starts a new walk with default configuration. It cannot cross links automatically (since this requires configuration). Use the equivalent WalkAdv function on the Progress structure for more advanced and configurable walks.

func WalkLocal added in v0.14.1

func WalkLocal(n datamodel.Node, fn VisitFn) error

WalkLocal walks a tree of Nodes, visiting each of them, and calling the given VisitFn on all of them; it does not traverse any links.

WalkLocal can skip subtrees if the VisitFn returns SkipMe, but lacks any other options for controlling or directing the visit; consider using some of the various Walk functions with Selector parameters if you want more control.

func WalkMatching added in v0.0.2

func WalkMatching(n datamodel.Node, s selector.Selector, fn VisitFn) error

WalkMatching walks a graph of Nodes, deciding which to visit by applying a Selector, and calling the given VisitFn on those that the Selector deems a match.

This function is a helper function which starts a new walk with default configuration. It cannot cross links automatically (since this requires configuration). Use the equivalent WalkMatching function on the Progress structure for more advanced and configurable walks.

func WalkTransforming added in v0.0.2

func WalkTransforming(n datamodel.Node, s selector.Selector, fn TransformFn) (datamodel.Node, error)

WalkTransforming walks a graph of Nodes, deciding which to alter by applying a Selector, and calls the given TransformFn to decide what new node to replace the visited node with. A new Node tree will be returned (the original is unchanged).

This function is a helper function which starts a new walk with default configuration. It cannot cross links automatically (since this requires configuration). Use the equivalent WalkTransforming function on the Progress structure for more advanced and configurable walks.

Types

type AdvVisitFn

type AdvVisitFn func(Progress, datamodel.Node, VisitReason) error

AdvVisitFn is like VisitFn, but for use with AdvTraversal: it gets additional arguments describing *why* this node is visited.

type Budget added in v0.12.3

type Budget struct {
	// NodeBudget is a monotonically-decrementing "budget" for how many more nodes we're willing to visit before halting.
	NodeBudget int64
	// LinkBudget is a monotonically-decrementing "budget" for how many more links we're willing to load before halting.
	// (This is not aware of any caching; it's purely in terms of links encountered and traversed.)
	LinkBudget int64
}

If you set any budgets (by having a non-nil Progress.Budget field), you must set some value for all of them. Traversal halts when _any_ of the budgets reaches zero. The max value of an int (math.MaxInt64) is acceptable for any budget you don't care about.

Beware of using both Budget and Preloader! See the documentation on Progress for more information on this usage and the likely surprising effects.

func (*Budget) Clone added in v0.21.0

func (b *Budget) Clone() *Budget

Clone returns a copy of the budget.

type Config added in v0.0.2

type Config struct {
	// Ctx is the context carried through a traversal.
	// Optional; use it if you need cancellation.
	Ctx context.Context

	// LinkSystem is used for automatic link loading, and also any storing if mutation features (e.g. traversal.Transform) are used.
	LinkSystem linking.LinkSystem

	// LinkTargetNodePrototypeChooser is a chooser for Node implementations to produce during automatic link traversal.
	LinkTargetNodePrototypeChooser LinkTargetNodePrototypeChooser

	// LinkVisitOnlyOnce controls repeat-link visitation.
	// By default, we visit across links wherever we see them again, even if we've visited them before, because the reason for visiting might be different than it was before since we got to it via a different path.
	// If set to true, track links we've seen before in Progress.SeenLinks and do not visit them again.
	// Note that sufficiently complex selectors may require valid revisiting of some links, so setting this to true can change behavior noticably and should be done with care.
	LinkVisitOnlyOnce bool

	// StartAtPath, if set, causes a traversal to skip forward until passing this path, and only then begins calling visit functions.
	// Block loads will also be skipped wherever possible.
	StartAtPath datamodel.Path

	// Preloader receives links within each block prior to traversal-proper by performing a lateral scan of a block without descending into links themselves before backing up and doing a traversal-proper.
	// This can be used to asynchronously load blocks that will be required at a later phase of the retrieval, or even to load blocks in a different order than the traversal would otherwise do.
	// Preload calls are not de-duplicated, it is up to the receiver to do so if desired.
	// Beware of using both Budget and Preloader!  See the documentation on Progress for more information on this usage and the likely surprising effects.
	Preloader preload.Loader
}

Config is a set of options for a traversal. Set a Config on a Progress to customize the traversal.

type ErrBudgetExceeded added in v0.12.3

type ErrBudgetExceeded struct {
	BudgetKind string // "node"|"link"
	Path       datamodel.Path
	Link       datamodel.Link // only present if BudgetKind=="link"
}

func (*ErrBudgetExceeded) Error added in v0.12.3

func (e *ErrBudgetExceeded) Error() string

func (*ErrBudgetExceeded) Is added in v0.21.0

func (e *ErrBudgetExceeded) Is(target error) bool

type LinkTargetNodePrototypeChooser added in v0.5.0

type LinkTargetNodePrototypeChooser func(datamodel.Link, linking.LinkContext) (datamodel.NodePrototype, error)

LinkTargetNodePrototypeChooser is a function that returns a NodePrototype based on the information in a Link and/or its LinkContext.

A LinkTargetNodePrototypeChooser can be used in a traversal.Config to be clear about what kind of Node implementation to use when loading a Link. In a simple example, it could constantly return a `basicnode.Prototype.Any`. In a more complex example, a program using `bind` over native Go types could decide what kind of native type is expected, and return a `bind.NodeBuilder` for that specific concrete native type.

type Progress added in v0.0.2

type Progress struct {
	// Cfg is the configuration for the traversal, set by user.
	Cfg *Config

	// Budget, if present, tracks "budgets" for how many more steps we're willing to take before we should halt.
	// Budget is initially set by user, but is then updated as the traversal proceeds.
	Budget *Budget

	// Path is how we reached the current point in the traversal.
	Path datamodel.Path

	// LastBlock stores the Path and Link of the last block edge we had to load.  (It will always be zero in traversals with no linkloader.)
	LastBlock struct {
		Path datamodel.Path
		Link datamodel.Link
	}

	// PastStartAtPath indicates whether the traversal has progressed passed the StartAtPath in the config -- use to avoid path checks when inside a sub portion of a DAG that is entirely inside the "not-skipped" portion of a traversal
	PastStartAtPath bool

	// SeenLinks is a set used to remember which links have been visited before, if Cfg.LinkVisitOnlyOnce is true.
	SeenLinks map[datamodel.Link]struct{}
}

Progress tracks a traversal as it proceeds. It is used initially to begin a traversal, and it is then passed to the visit function as the traversal proceeds.

As the traversal descends into the graph, new Progress values are created and passed to the visit function with updated properties representing the current state of the traversal.

Most customization of a traversal is done by setting a Cfg property on a Progress before beginning the traversal. Typical customization involves setting a LinkSystem for link loading and/or tracking.

Advanced traversal control options, such as LinkVisitOnlyOnce and StartAtPath, are also available in the Cfg but may have surprising effects on traversal behavior; be careful when using them.

Budgets are set on the Progress option because a Budget, while set at the beginning of a traversal, is also updated as the traversal proceeds, with its fields being monotonically decremented. Beware of using Budgets in tandem with a Preloader! The preloader discovers links in a lateral scan of a whole block, before rewinding for a depth-first walk for traversal-proper. Budgets are intended to be used for the depth-first walk, and there is no way to know ahead of time how the budget may impact the lateral parts of the graph that the preloader encounters. Currently a best-guess approach is used to try and have the preloader adhere to the budget, but with typical real-world graphs, this is likely to be inaccurate. In the case of inaccuracies, the budget will be properly applied to the traversal-proper, but the preloader may receive a different set of links than the traversal-proper will.

func (Progress) Focus added in v0.0.2

func (prog Progress) Focus(n datamodel.Node, p datamodel.Path, fn VisitFn) error

Focus traverses a Node graph according to a path, reaches a single Node, and calls the given VisitFn on that reached node.

Focus is a read-only traversal. See FocusedTransform if looking for a way to do an "update" to a Node.

Provide configuration to this process using the Config field in the Progress object.

This walk will automatically cross links, but requires some configuration with link loading functions to do so.

Focus (and the other traversal functions) can be used again again inside the VisitFn! By using the traversal.Progress handed to the VisitFn, the Path recorded of the traversal so far will continue to be extended, and thus continued nested uses of Walk and Focus will see the fully contextualized Path.

func (Progress) FocusedTransform added in v0.0.2

func (prog Progress) FocusedTransform(n datamodel.Node, p datamodel.Path, fn TransformFn, createParents bool) (datamodel.Node, error)

FocusedTransform traverses a datamodel.Node graph, reaches a single Node, and calls the given TransformFn to decide what new node to replace the visited node with. A new Node tree will be returned (the original is unchanged).

If the TransformFn returns the same Node which it was called with, then the transform is a no-op, and the Node returned from the FocusedTransform call as a whole will also be the same as its starting Node.

Otherwise, the reached node will be "replaced" with the new Node -- meaning that new intermediate nodes will be constructed to also replace each parent Node that was traversed to get here, thus propagating the changes in a copy-on-write fashion -- and the FocusedTransform function as a whole will return a new Node containing identical children except for those replaced.

Returning nil from the TransformFn as the replacement node means "remove this".

FocusedTransform can be used again inside the applied function! This kind of composition can be useful for doing batches of updates. E.g. if have a large Node graph which contains a 100-element list, and you want to replace elements 12, 32, and 95 of that list: then you should FocusedTransform to the list first, and inside that TransformFn's body, you can replace the entire list with a new one that is composed of copies of everything but those elements -- including using more TransformFn calls as desired to produce the replacement elements if it so happens that those replacement elements are easiest to construct by regarding them as incremental updates to the previous values. (This approach can also be used when doing other modifications like insertion or reordering -- which would otherwise be tricky to define, since each operation could change the meaning of subsequently used indexes.)

As a special case, list appending is supported by using the path segment "-". (This is determined by the node it applies to -- if that path segment is applied to a map, it's just a regular map key of the string of dash.)

Note that anything you can do with the Transform function, you can also do with regular Node and NodeBuilder usage directly. Transform just does a large amount of the intermediate bookkeeping that's useful when creating new values which are partial updates to existing values.

func (Progress) Get added in v0.6.0

Get is the equivalent of Focus, but returns the reached node (rather than invoking a callback at the target), and does not yield Progress information.

Provide configuration to this process using the Config field in the Progress object.

This walk will automatically cross links, but requires some configuration with link loading functions to do so.

If doing several traversals which are nested, consider using the Focus funcion in preference to Get; the Focus functions provide updated Progress objects which can be used to do nested traversals while keeping consistent track of progress, such that continued nested uses of Walk or Focus or Get will see the fully contextualized Path.

func (Progress) WalkAdv added in v0.0.2

func (prog Progress) WalkAdv(n datamodel.Node, s selector.Selector, fn AdvVisitFn) error

WalkAdv is identical to WalkMatching, except it is called for *all* nodes visited (not just matching nodes), together with the reason for the visit. An AdvVisitFn is used instead of a VisitFn, so that the reason can be provided.

func (Progress) WalkLocal added in v0.14.1

func (prog Progress) WalkLocal(n datamodel.Node, fn VisitFn) error

WalkLocal is the same as the package-scope function of the same name, but considers an existing Progress state (and any config it might reference).

func (Progress) WalkMatching added in v0.0.2

func (prog Progress) WalkMatching(n datamodel.Node, s selector.Selector, fn VisitFn) error

WalkMatching walks a graph of Nodes, deciding which to visit by applying a Selector, and calling the given VisitFn on those that the Selector deems a match.

WalkMatching is a read-only traversal. See WalkTransforming if looking for a way to do "updates" to a tree of nodes.

Provide configuration to this process using the Config field in the Progress object.

This walk will automatically cross links, but requires some configuration with link loading functions to do so.

Traversals are defined as visiting a (node,path) tuple. This is important to note because when walking DAGs with Links, it means you may visit the same node multiple times due to having reached it via a different path. (You can prevent this by using a LinkLoader function which memoizes a set of already-visited Links, and returns a SkipMe when encountering them again.)

WalkMatching (and the other traversal functions) can be used again again inside the VisitFn! By using the traversal.Progress handed to the VisitFn, the Path recorded of the traversal so far will continue to be extended, and thus continued nested uses of Walk and Focus will see the fully contextualized Path.

WalkMatching can be configured to run with a Preloader. When a Preloader is configured, the walk will first do a "preload" pass over the initial, root tree up to link boundaries and report any links encountered to the preloader. It will then perform a second pass over the tree, calling the VisitFn where necessary as per normal WalkMatching behavior. This two-pass operation will continue for each block loaded, allowing the preloader to potentially asynchronously preload any blocks that are going to be encountered at a future point in the walk.

func (Progress) WalkTransforming added in v0.0.2

func (prog Progress) WalkTransforming(n datamodel.Node, s selector.Selector, fn TransformFn) (datamodel.Node, error)

WalkTransforming walks a graph of Nodes, deciding which to alter by applying a Selector, and calls the given TransformFn to decide what new node to replace the visited node with. A new Node tree will be returned (the original is unchanged).

If the TransformFn returns the same Node which it was called with, then the transform is a no-op; if every visited node is a no-op, then the root node returned from the walk as a whole will also be the same as its starting Node (no new memory will be used).

When a Node is replaced, no further recursion of this walk will occur on its contents. (You can certainly do a additional traversals, including transforms, from inside the TransformFn while building the replacement node.)

The prototype (that is, implementation) of Node returned will be the same as the prototype of the Nodes at the same positions in the existing tree (literally, builders used to construct any new needed intermediate nodes are chosen by asking the existing nodes about their prototype).

type SkipMe added in v0.0.3

type SkipMe struct{}

SkipMe is a signalling "error" which can be used to tell traverse to skip some data.

SkipMe can be returned by the Config.LinkLoader to skip entire blocks without aborting the walk. (This can be useful if you know you don't have data on hand, but want to continue the walk in other areas anyway; or, if you're doing a way where you know that it's valid to memoize seen areas based on Link alone.)

func (SkipMe) Error added in v0.0.3

func (SkipMe) Error() string

type TransformFn

type TransformFn func(Progress, datamodel.Node) (datamodel.Node, error)

TransformFn is like a visitor that can also return a new Node to replace the visited one.

type VisitFn

type VisitFn func(Progress, datamodel.Node) error

VisitFn is a read-only visitor.

type VisitReason added in v0.0.2

type VisitReason byte

VisitReason provides additional information to traversals using AdvVisitFn.

const (
	// VisitReason_SelectionMatch tells AdvVisitFn that this node was explicitly selected.  (This is the set of nodes that VisitFn is called for.)
	VisitReason_SelectionMatch VisitReason = 'm'
	// VisitReason_SelectionParent tells AdvVisitFn that this node is a parent of one that will be explicitly selected.  (These calls only happen if the feature is enabled -- enabling parent detection requires a different algorithm and adds some overhead.)
	VisitReason_SelectionParent VisitReason = 'p'
	// VisitReason_SelectionCandidate tells AdvVisitFn that this node was visited while searching for selection matches.  It is not necessarily implied that any explicit match will be a child of this node; only that we had to consider it.  (Merkle-proofs generally need to include any node in this group.)
	VisitReason_SelectionCandidate VisitReason = 'x'
)

Directories

Path Synopsis
Package patch provides an implementation of the IPLD Patch specification.
Package patch provides an implementation of the IPLD Patch specification.
parse
selectorparse package contains some helpful functions for parsing the serial form of Selectors.
selectorparse package contains some helpful functions for parsing the serial form of Selectors.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL