edittree

package
v0.64.1 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Apr 26, 2024 License: Apache-2.0 Imports: 7 Imported by: 0

Documentation

Overview

Package EditTree implements a specialized tree data structure that allows for cheap edits and modifications of nested Term structures.

Overview

The EditTree data structure exists to solve an ugly problem in Rego: modification/deletion of a Term can be very expensive, because we have to rebuild the whole Term, sans the modified/deleted parts.

To work around that problem, the EditTree allows simple add, modify, or delete operations on Term structures, and then at the end of a series of edits, the caller can pay the cost of generating a new Term from the tree of edits relatively efficiently. (Essentially a recursive, DFS traversal of the tree.)

The data structure preserves basic type/safety properties, the same as working on the real underlying Term values. To do this, recursive lookups are used. On average, these are fairly straightforward and cheap to do.

Basic Operations:

  • Insert/update EditTree node
  • Delete EditTree node
  • Unfold EditTree nodes along a path
  • Render EditTree node

These operations provide all of the basic utilities required to recursively construct the tree. Ref-based convenience functions are also provided, to make rendering subtrees at a particular JSON path easier.

Path-Based Convenience Functions:

  • InsertAtPath
  • DeleteAtPath
  • RenderAtPath

Additionally, a few "optional" (but nice to have) functions have been added, to allow replacing slower/less-efficient equivalents elsewhere.

Optional Functions:

  • Exists (a more efficient boolean alternative to Unfold)
  • Filter (an alternative to Object.Filter that efficiently renders paths out of an EditTree)

Storing scalar children "inline"

The original design for the EditTree allocated a new EditTree node for each Term stored in the tree, but this was found to be inefficient when dealing with large arrays and objects. The current design of the EditTree separates children based on their types, with scalars stored in a hash -> Term map, and composites stored in a hash -> EditTree map.

This results in dramatically fewer heap allocations and faster access times for "shallow" Term structures, without penalizing nested Term structures noticeably.

Object operations

Objects are the most straightforward composite type, as their key-value structure maps naturally onto trees. Their inserts/deletes are recorded directly in the appropriate child maps, with almost no additional complexity required.

Object EditTree nodes use the child key and value maps, and will not initialize the bit-vectors, since those are only used for Arrays.

Set operations

Set data types have a major problem: they're *content-addressed*. This means that we often have to render/materialize the sub-terms before carrying out inserts or deletes, in order to know if the path to the destination Term exists. This forces a tree collapse at the Set's EditTree node, and is brutally inefficient.

Example:

Source set: {[0], "a"}
{"op": "add", "path": [[0], 1], "value": 1}      -> result value: {[0, 1], "a"}
{"op": "add", "path": [[0, 1], 3], "value": 3}   -> result value: {[0, 1, 3], "a"}

We mitigate this somewhat by only collapsing a Set when a composite value is being used for indexing. Scalars imply a shallow access, which we can look up directly in the appropriate child map.

Set EditTree nodes use the child key and value maps, and will not initialize the bit-vectors, since those are only used for Arrays.

Array operations

Arrays can have elements inserted/replaced/deleted, and this requires some bookkeeping work to keep everything straight. We do this bookkeeping work using two bit vectors to track all the insertions/deletions.

One bit-vector tracks which indexes are preserved/eliminated from the original Array, and the second bit-vector tracks which indexes have insertions. We can record inserts and deletes *directly* on the second bit-vector, "bleeding through" deletions to the preserved/eliminated bit vector when there's not an insert to wipe out first.

For bleed-through deletes, a linear scan is required to find the index of which original element will be knocked out. We then mark that bit in the preserved/eliminated bit-vector. This is a fair bit of bookkeeping, but greatly reduces the cost and complexity of tracking Array state. There can only be insertions, or original values present. Any other "deletion" is an error.

Insert and Delete operations also imply a linear "index rewriting" pass for an Array's child maps, where indexes that occur above the affected index of the insertion/deletion must be incremented or decremented appropriately. This ensures that when rendered later, the original/inserted values will be spliced in at the correct offsets in the final Array value.

Due to optimizations discussed later, Array EditTree nodes do not use the child key map (leaving it uninitialized), but will initialize and use the child value maps normally. Array EditTree nodes are the only types of EditTree nodes that should ever be expected to have initialized bit-vectors present.

Scalar operations

Scalars are fairly simple: just a term stored in an EditTree node, or in the scalar child map of a composite type's EditTree node. They cannot have children, and normally do not exist as independent EditTree nodes, except to satisfy certain EditTree APIs.

Scalar EditTree nodes can only be expected to have a valid Term value; all other fields will be left uninitialized.

Optimization: Direct Array Indexing with ints

Arrays are unique in Rego, because the only valid Terms that can index into them are integer, numeric values. When processing the key Terms for Objects and Sets, we have to identify children by their hash values (which hash to integers). Because the only valid key Terms for Arrays work as ints as well, we can skip the hashing step entirely, and just use the int indexes *directly*.

This provides a substantial CPU savings in benchmarks, because the "index rewriting" passes become much cheaper from not having to rehash every child's index.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type EditTree

type EditTree struct {
	// contains filtered or unexported fields
}

Deletions are encoded with a nil value pointer.

func NewEditTree

func NewEditTree(term *ast.Term) *EditTree

Creates a new EditTree node from term.

func (*EditTree) Delete

func (e *EditTree) Delete(key *ast.Term) (*EditTree, error)

Delete removes a child of e, or else creates a delete node for a term already present in e. It then returns the deleted child EditTree node.

func (*EditTree) DeleteAtPath

func (e *EditTree) DeleteAtPath(path ast.Ref) (*EditTree, error)

DeleteAtPath traverses down the tree from e and uses the last path segment as the key to delete a node from the tree. Returns the deleted EditTree node.

func (*EditTree) Exists

func (e *EditTree) Exists(path ast.Ref) bool

func (*EditTree) Filter

func (e *EditTree) Filter(paths []ast.Ref) *ast.Term

Filter pulls out only the values selected by paths. This is done recursively by plucking off one level from the paths each time we descend a level. Note: Values pulled from arrays will have the same approximate ordering in the final term.

func (*EditTree) Insert

func (e *EditTree) Insert(key, value *ast.Term) (*EditTree, error)

Insert creates a new child of e, and returns the new child EditTree node.

func (*EditTree) InsertAtPath

func (e *EditTree) InsertAtPath(path ast.Ref, value *ast.Term) (*EditTree, error)

InsertAtPath traverses down the tree from e and uses the last path segment as the key to insert value into the tree. Returns the inserted EditTree node.

func (*EditTree) Render

func (e *EditTree) Render() *ast.Term

Render generates the effective value for the term at e by recursively rendering the children of e, and then copying over any leftover keys from the original term stored at e.

func (*EditTree) RenderAtPath

func (e *EditTree) RenderAtPath(path ast.Ref) (*ast.Term, error)

RenderAtPath traverses down the tree from e and renders the EditTree node at the end of path.

func (*EditTree) String

func (e *EditTree) String() string

func (*EditTree) Unfold

func (e *EditTree) Unfold(path ast.Ref) (*EditTree, error)

Unfurls a chain of EditTree nodes down a given path, or else returns an error.

Directories

Path Synopsis
Package bitvector provides the implementation of a variable sized compact vector of bits which supports lookups, sets, appends, insertions, and deletions.
Package bitvector provides the implementation of a variable sized compact vector of bits which supports lookups, sets, appends, insertions, and deletions.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL