mast

package module
v1.2.28 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jan 29, 2024 License: MPL-2.0 Imports: 13 Imported by: 6

README

Go GoDoc

mast

immutable, versioned, diffable map implementation of the Merkle Search Tree

import "github.com/jrhy/mast"

Primary use cases

  • Strongly-consistent versioned KV store layer with flexible backends (S3, files today, designed for Dynamo, Firebase, SQL as well)
  • Provides consistent access to multiple versions of collections or materialized views, with incremental storage cost logarithmically proportional to delta size
  • Flexible change reporting through efficient diffing of snapshots

What's new?

  • v1.2.23 adds func (m *Mast) StartDiff(context.Context, oldMast *Mast) (*DiffCursor, error) for stateful iterating through differences without callbacks

Documentation

See Go package documentation at:

https://godoc.org/github.com/jrhy/mast

Documentation

Overview

Package mast provides an immutable, versioned, diffable map implementation of a Merkle Search Tree (MST). Masts can be huge (not limited to memory). Masts can be stored in anything, like a filesystem, KV store, or blob store. Masts are designed to be safely concurrently accessed by multiple threads/hosts. And like Merkle DAGs, Masts are designed to be easy to cache and synchronize.

Uses

- Efficient storage of multiple versions of materialized views

- Diffing of versions integrates CDC/streaming

- Efficient copy-on-write alternative to Go builtin map

What are MSTs

Mast is an implementation of the structure described in the awesome paper, "Merkle Search Trees: Efficient State-Based CRDTs in Open Networks", by Alex Auvolat and François Taïani, 2019 (https://hal.inria.fr/hal-02303490/document).

MSTs are similar to persistent B-Trees, except node splits are chosen deterministically based on key and tree height, rather than node size, so MSTs eliminate the dependence on insertion/deletion order that B-Trees have for equivalence, making MSTs an interesting choice for conflict-free replicated data types (CRDTs). (MSTs are also simpler to implement than B-Trees, needing less complicated rebalancing, and no rotations.)

MSTs are like other Merkle structures in that two instances can easily be compared to confirm equality or find differences, since equal node hashes indicate equal contents.

Concurrency

A Mast can be Clone()d for sharing between threads. Cloning creates a new version that can evolve independently, yet shares all the unmodified subtrees with its parent, and as such are relatively cheap to create.

Inspiration

The immutable data types in Clojure, Haskell, ML and other functional languages really do make it easier to "reason about" systems; easier to test, provide a foundation to build more quickly on.

https://github.com/bodil/im-rs, "Blazing fast immutable collection datatypes for Rust", by Bodil Stokke, is an exemplar: the diff algorithm and use of property testing, are instructive, and Chunks and PoolRefs fill gaps in understanding of Rust's ownership model for library writers coming from GC'd languages.

Index

Examples

Constants

View Source
const DefaultBranchFactor = 16

DefaultBranchFactor is how many entries per node a tree will normally have.

Variables

View Source
var (
	V1Marshaler = nodeFormat("v1marshaler")
	V115Binary  = nodeFormat("v1.1.5binary")
)
View Source
var (
	// ErrIterDone is the error returned by Iter and SeekIter to stop the iteration
	ErrIterDone = errors.New("iter done")
)
View Source
var ErrNoMoreDiffs = errors.New("no more differences")

Functions

func DefaultKeyCompare added in v1.2.13

func DefaultKeyCompare(marshaler func(interface{}) ([]byte, error)) func(i, i2 interface{}) (int, error)

func DefaultLayer added in v1.2.13

func DefaultLayer(marshaler func(interface{}) ([]byte, error)) func(i interface{}, branchFactor uint) (uint8, error)

Types

type CreateRemoteOptions

type CreateRemoteOptions struct {
	// BranchFactor, or number of entries per node.  0 means use DefaultBranchFactor.
	BranchFactor uint
	// NodeFormat, defaults to more-compact "v1.1.5binary" for new trees, can be set to "v1marshaler" to make nodes compatible with pre-v1.1.5 code.
	NodeFormat nodeFormat
}

CreateRemoteOptions sets initial parameters for the tree, that would be painful to change after the tree has data.

type Cursor added in v1.2.11

type Cursor struct {
	// contains filtered or unexported fields
}

Cursor can be used to seek around a tree.

func (*Cursor) Backward added in v1.2.11

func (c *Cursor) Backward(ctx context.Context) error

Backward moves the cursor to the entry with th enext-smaller key.

func (*Cursor) Ceil added in v1.2.11

func (c *Cursor) Ceil(ctx context.Context, key interface{}) error

Ceil moves the cursor to the entry with the given key, or if not present, the entry with the next-larger key.

func (*Cursor) Forward added in v1.2.11

func (c *Cursor) Forward(ctx context.Context) error

Forward moves the cursor to the entry with the next-larger key.

func (*Cursor) Get added in v1.2.11

func (c *Cursor) Get() (interface{}, interface{}, bool)

Get returns the key and value of the entry at the cursor, if there is an entry, or !ok if there is no entry.

func (*Cursor) Max added in v1.2.11

func (c *Cursor) Max(ctx context.Context) error

Max moves the cursor to the largest key in the subtree under the current position.

func (*Cursor) Min added in v1.2.11

func (c *Cursor) Min(ctx context.Context) error

Min moves the cursor to the smallest key in the subtree under the current position.

func (*Cursor) String added in v1.2.11

func (c *Cursor) String() string

type Diff added in v1.2.23

type Diff struct {
	Key      interface{}
	Type     DiffType
	OldValue interface{}
	NewValue interface{}
}

type DiffCursor added in v1.2.23

type DiffCursor struct {
	// contains filtered or unexported fields
}

func (*DiffCursor) NextEntry added in v1.2.23

func (dc *DiffCursor) NextEntry(ctx context.Context) (Diff, error)

type DiffType added in v1.2.23

type DiffType int
const (
	DiffType_Add DiffType = iota
	DiffType_Remove
	DiffType_Change
)

type Key

type Key interface {
	// Layer can deterministically compute its ideal layer (distance from leaves) in a tree with the given branch factor.
	Layer(branchFactor uint) uint8
	// Order returns -1 if the argument is less than than this one, 1 if greater, and 0 if equal.
	Order(Key) int
}

A Key has a sort order and deterministic maximum distance from leaves.

type Mast

type Mast struct {
	// contains filtered or unexported fields
}

Mast encapsulates data and parameters for the in-memory portion of a Merkle Search Tree.

func NewInMemory

func NewInMemory() Mast

NewInMemory returns a new tree for use as an in-memory data structure (i.e. that isn't intended to be remotely persisted).

func (*Mast) BranchFactor added in v0.5.0

func (m *Mast) BranchFactor() uint

BranchFactor returns the ideal number of entries that are stored per node.

func (*Mast) Clone added in v0.3.4

func (m *Mast) Clone(ctx context.Context) (Mast, error)

Clone() returns a new Mast that shares all the source's data but can evolve independently (copy-on-write).

func (*Mast) Cursor added in v1.2.11

func (m *Mast) Cursor(ctx context.Context) (*Cursor, error)

Cursor obtains a cursor set to the smallest value in the root node.

func (*Mast) Delete

func (m *Mast) Delete(ctx context.Context, key, value interface{}) error

Delete deletes the entry with given key and value from the tree.

func (*Mast) DiffIter

func (m *Mast) DiffIter(
	ctx context.Context,
	oldMast *Mast,
	f func(added, removed bool,
		key, addedValue, removedValue interface{},
	) (bool, error),
) error

DiffIter invokes the given callback for every entry that is different from the given tree. The iteration will stop if the callback returns keepGoing==false or an error. Callback invocation with added==removed==false signifies entries whose values have changed.

Example
ctx := context.Background()
v1 := NewInMemory()
v1.Insert(ctx, 0, "foo")
v1.Insert(ctx, 100, "asdf")
v2, err := v1.Clone(ctx)
if err != nil {
	panic(err)
}
v2.Insert(ctx, 0, "bar")
v2.Delete(ctx, 100, "asdf")
v2.Insert(ctx, 200, "qwerty")
v2.DiffIter(ctx, &v1, func(added, removed bool, key, addedValue, removedValue interface{}) (bool, error) {
	if !added && !removed {
		fmt.Printf("changed '%v'   from '%v' to '%v'\n", key, removedValue, addedValue)
	} else if removed {
		fmt.Printf("removed '%v' value '%v'\n", key, removedValue)
	} else if added {
		fmt.Printf("added   '%v' value '%v'\n", key, addedValue)
	}
	return true, nil
})
Output:

changed '0'   from 'foo' to 'bar'
removed '100' value 'asdf'
added   '200' value 'qwerty'
func (m *Mast) DiffLinks(
	ctx context.Context,
	oldMast *Mast,
	f func(removed bool, link interface{}) (bool, error),
) error

DiffLinks invokes the given callback for every node that is different from the given tree. The iteration will stop if the callback returns keepGoing==false or an error.

func (*Mast) Get

func (m *Mast) Get(ctx context.Context, k, value interface{}) (bool, error)

Get gets the value of the entry with the given key and stores it at the given value pointer. Returns false if the tree doesn't contain the given key.

func (*Mast) Height added in v0.5.0

func (m *Mast) Height() uint8

Height returns the number of levels between the leaves and root.

func (*Mast) Insert

func (m *Mast) Insert(ctx context.Context, key, value interface{}) error

Insert adds or replaces the value for the given key.

func (*Mast) IsDirty added in v0.5.0

func (m *Mast) IsDirty() bool

IsDirty signifies that in-memory values have been Set() or merged that haven't been Save()d.

func (*Mast) Iter

func (m *Mast) Iter(ctx context.Context, f func(interface{}, interface{}) error) error

Iter iterates over the entries of a tree, invoking the given callback for every entry's key and value.

func (*Mast) MakeRoot

func (m *Mast) MakeRoot(ctx context.Context) (*Root, error)

MakeRoot makes a new persistent root, after ensuring all the changed nodes have been written to the persistent store.

func (*Mast) SeekIter added in v1.1.4

func (m *Mast) SeekIter(ctx context.Context, k interface{}, f func(interface{}, interface{}) error) error

Seekiter is similar to Iter, but the difference is to find the first position greater than or equal to the key and start the iteration

func (*Mast) Size

func (m *Mast) Size() uint64

Size returns the number of entries in the tree.

Example
ctx := context.Background()
m := NewInMemory()
m.Insert(ctx, 0, "zero")
m.Insert(ctx, 1, "one")
fmt.Println(m.Size())
Output:

2

func (*Mast) StartDiff added in v1.2.23

func (m *Mast) StartDiff(
	ctx context.Context,
	oldMast *Mast,
) (*DiffCursor, error)

type Node added in v1.2.15

type Node struct {
	Key   []interface{}
	Value []interface{}
	Link  []interface{} `json:",omitempty"`
}

type NodeCache added in v0.3.2

type NodeCache interface {
	// Add adds a freshly-persisted node to the cache.
	Add(key, value interface{})
	// Contains indicates the node with the given key has already been persisted.
	Contains(key interface{}) bool
	// Get retrieves the already-deserialized node with the given hash, if cached.
	Get(key interface{}) (value interface{}, ok bool)
}

NodeCache caches the immutable nodes from a remote storage source. It is also used to avoid re-storing nodes, so care should be taken to switch/invalidate NodeCache when the Persist is changed.

func NewNodeCache added in v0.3.2

func NewNodeCache(size int) NodeCache

NewNodeCache creates a new LRU-based node cache of the given size. One cache can be shared by any number of trees.

type Persist

type Persist interface {
	// Store makes the given bytes accessible by the given name. The given string identity corresponds
	// to the content which is immutable (never modified).
	Store(context.Context, string, []byte) error
	// Load retrieves the previously-stored bytes by the given name.
	Load(context.Context, string) ([]byte, error)
	// NodeURLPrefix returns some string that identifies the
	// container this Persist uses, to enable NodeCaches to
	// distinguish identical nodes on different servers.
	NodeURLPrefix() string
}

Persist is the interface for loading and storing (serialized) tree nodes. The given string identity corresponds to the content which is immutable (never modified).

func NewInMemoryStore

func NewInMemoryStore() Persist

NewInMemoryStore provides a Persist that stores serialized nodes in a map, usually for testing.

type RemoteConfig

type RemoteConfig struct {
	// KeysLike is an instance of the type keys will be deserialized as.
	KeysLike interface{}

	// KeysLike is an instance of the type values will be deserialized as.
	ValuesLike interface{}

	// StoreImmutablePartsWith is used to store and load serialized nodes.
	StoreImmutablePartsWith Persist

	// Unmarshal function, defaults to JSON
	Unmarshal func([]byte, interface{}) error

	// Marshal function, defaults to JSON
	Marshal func(interface{}) ([]byte, error)

	// UnmarshalerUsesRegisteredTypes indicates that the unmarshaler will know how to deserialize an
	// interface{} for a key/value in an entry.  By default, JSON decoding doesn't do this, so is done
	// in two stages, the first to a JsonRawMessage, the second to the actual key/value type.
	UnmarshalerUsesRegisteredTypes bool

	// NodeCache caches deserialized nodes and may be shared across multiple trees.
	NodeCache NodeCache

	KeyCompare func(_, _ interface{}) (int, error)
}

RemoteConfig controls how nodes are persisted and loaded.

type Root

type Root struct {
	Link         *string
	Size         uint64
	Height       uint8
	BranchFactor uint
	NodeFormat   string `json:"NodeFormat,omitempty"`
}

Root identifies a version of a tree whose nodes are accessible in the persistent store.

func NewRoot

func NewRoot(remoteOptions *CreateRemoteOptions) *Root

NewRoot creates an empty tree whose nodes will be persisted remotely according to remoteOptions.

func (*Root) LoadMast

func (r *Root) LoadMast(ctx context.Context, config *RemoteConfig) (*Mast, error)

LoadMast loads a tree from a remote store. The root is loaded and verified; other nodes will be loaded on demand.

Directories

Path Synopsis
persist
s3
proto_test
v1

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL