The highest tagged major version is v3.

hamt

package module

v2.0.0 Latest Latest Go to latest Published: Sep 8, 2020 License: MIT Imports: 12 Imported by: 8

Details

Valid go.mod file

The Go module system was introduced in Go 1.11 and is the official dependency management solution for Go.
Redistributable license

Redistributable licenses place minimal restrictions on how software can be used, modified, and redistributed.
Tagged version

Modules with tagged versions give importers more predictable builds.
Stable version

When a project reaches major version v1 it is considered stable.
Learn more about best practices

Repository

github.com/filecoin-project/go-hamt-ipld

Links

Open Source Insights

README ¶

go-hamt-ipld

This package is a reference implementation of the IPLD HAMT used in the Filecoin blockchain. It includes some optional flexibility such that it may be used for other purposes outside of Filecoin.

HAMT is a "hash array mapped trie". This implementation extends the standard form by including buckets for the key/value pairs at storage leaves and CHAMP mutation semantics. The CHAMP invariant and mutation rules provide us with the ability to maintain canonical forms given any set of keys and their values, regardless of insertion order and intermediate data insertion and deletion. Therefore, for any given set of keys and their values, a HAMT using the same parameters and CHAMP semantics, the root node should always produce the same content identifier (CID).

See https://godoc.org/github.com/filecoin-project/go-hamt-ipld for more information and API details.

License

MIT © Whyrusleeping

Documentation ¶

Overview ¶

Package hamt provides a reference implementation of the IPLD HAMT used in the Filecoin blockchain. It includes some optional flexibility such that it may be used for other purposes outside of Filecoin.

HAMT is a "hash array mapped trie" https://en.wikipedia.org/wiki/Hash_array_mapped_trie. This implementation extends the standard form by including buckets for the key/value pairs at storage leaves and CHAMP mutation semantics https://michael.steindorfer.name/publications/oopsla15.pdf. The CHAMP invariant and mutation rules provide us with the ability to maintain canonical forms given any set of keys and their values, regardless of insertion order and intermediate data insertion and deletion. Therefore, for any given set of keys and their values, a HAMT using the same parameters and CHAMP semantics, the root node should always produce the same content identifier (CID).

Algorithm Overview ¶

The HAMT algorithm hashes incoming keys and uses incrementing subsections of that hash digest at each level of its tree structure to determine the placement of either the entry or a link to a child node of the tree. A `bitWidth` determines the number of bits of the hash to use for index calculation at each level of the tree such that the root node takes the first `bitWidth` bits of the hash to calculate an index and as we move lower in the tree, we move along the hash by `depth x bitWidth` bits. In this way, a sufficiently randomizing hash function will generate a hash that provides a new index at each level of the data structure. An index comprising `bitWidth` bits will generate index values of `[ 0, 2^bitWidth )`. So a `bitWidth` of 8 will generate indexes of 0 to 255 inclusive.

Each node in the tree can therefore hold up to `2^bitWidth` elements of data, which we store in an array. In the this HAMT and the IPLD HashMap we store entries in buckets. A `Set(key, value)` mutation where the index generated at the root node for the hash of key denotes an array index that does not yet contain an entry, we create a new bucket and insert the key / value pair entry. In this way, a single node can theoretically hold up to `2^bitWidth x bucketSize` entries, where `bucketSize` is the maximum number of elements a bucket is allowed to contain ("collisions"). In practice, indexes do not distribute with perfect randomness so this maximum is theoretical. Entries stored in the node's buckets are stored in key-sorted order.

Parameters ¶

This HAMT implementation:

• Fixes the `bucketSize` to 3.

• Defaults the `bitWidth` to 8, however within Filecoin it uses 5

• Defaults the hash algorithm to the 64-bit variant of Murmur3-x64

Index ¶

Variables
type KV
- func (t *KV) MarshalCBOR(w io.Writer) error
- func (t *KV) UnmarshalCBOR(r io.Reader) error
type Node
- func LoadNode(ctx context.Context, cs cbor.IpldStore, c cid.Cid, options ...Option) (*Node, error)
- func NewNode(cs cbor.IpldStore, options ...Option) *Node
type Option
- func UseHashFunction(hash func([]byte) []byte) Option
- func UseTreeBitWidth(bitWidth int) Option
type Pointer
- func (t *Pointer) MarshalCBOR(w io.Writer) error
- func (t *Pointer) UnmarshalCBOR(br io.Reader) error

Constants ¶

This section is empty.

Variables ¶

View Source

var ErrMalformedHamt = fmt.Errorf("HAMT node was malformed")

ErrMalformedHamt is returned whenever a block intended as a HAMT node does not conform to the expected form that a block may take. This can occur during block-load where initial validation takes place or during traversal where certain conditions are expected to be met.

View Source

var ErrMaxDepth = fmt.Errorf("attempted to traverse HAMT beyond max-depth")

ErrMaxDepth is returned when the HAMT spans further than the hash function is capable of representing. This can occur when sufficient hash collisions (e.g. from a weak hash function and attacker-provided keys) extend leaf nodes beyond the number of bits that a hash can represent. Or this can occur on extremely large (likely impractical) HAMTs that are unable to be represented with the hash function used. Hash functions with larger byte output increase the maximum theoretical depth of a HAMT.

View Source

var ErrNotFound = fmt.Errorf("not found")

ErrNotFound is returned when a Find operation fails to locate the specified key in the HAMT

Functions ¶

This section is empty.

Types ¶

type KV ¶

type KV struct {
	Key   []byte
	Value *cbg.Deferred
}

KV represents leaf storage within a HAMT node. A Pointer may hold up to `bucketSize` KV elements, where each KV contains a key and value pair stored by the user.

Keys are represented as bytes.

The IPLD Schema representation of this data structure is as follows:

type KV struct {
	key Bytes
	value Any
} representation tuple

func (*KV) MarshalCBOR ¶

func (t *KV) MarshalCBOR(w io.Writer) error

func (*KV) UnmarshalCBOR ¶

func (t *KV) UnmarshalCBOR(r io.Reader) error

type Node ¶

type Node struct {
	Bitfield *big.Int   `refmt:"bf"`
	Pointers []*Pointer `refmt:"p"`
	// contains filtered or unexported fields
}

Node is a single point in the HAMT, encoded as an IPLD tuple in DAG-CBOR of shape:

[bytes, [Pointer...]]

where 'bytes' is the big.Int#Bytes() and the Pointers array is between 1 and `2^bitWidth`.

The Bitfield provides us with a mechanism to store a compacted array of Pointers. Each bit in the Bitfield represents an element in a sparse array where `1` indicates the element is present in the Pointers array and `0` indicates it is omitted. To look-up a specific index in the Pointers array you must first make a count of the number of `1`s (popcount) up to the element you are looking for. e.g. a Bitfield of `10010110000` shows that we have a 4 element Pointers array. Indexes `[1]` and `[2]` are not present, but index `[3]` is at the second position of our Pointers array.

(Note: the `refmt` tags are ignored by cbor-gen which will generate an array type rather than map.)

The IPLD Schema representation of this data structure is as follows:

type Node struct {
	bitfield Bytes
	pointers [Pointer]
} representation tuple

func LoadNode ¶

func LoadNode(ctx context.Context, cs cbor.IpldStore, c cid.Cid, options ...Option) (*Node, error)

LoadNode loads a HAMT Node from the IpldStore and configures it according to any specified Option parameters. Where the parameters of this HAMT vary from the defaults (hash function and bitWidth), those variations _must_ be supplied here via Options otherwise the HAMT will not be readable.

Users should consider how their HAMT parameters are stored or specified along with their HAMT where the data is expected to have a long shelf-life as future users will need to know the parameters of a HAMT being loaded in order to decode it. Users should also NOT rely on the default parameters of this library to remain the defaults long-term and have strategies in place to manage variations.

func NewNode ¶

func NewNode(cs cbor.IpldStore, options ...Option) *Node

NewNode creates a new IPLD HAMT Node with the given IPLD store and any additional options (bitWidth and hash function).

This function creates a new HAMT that you can use directly and is also used internally to create child nodes.

func (*Node) Copy ¶

func (n *Node) Copy() *Node

Copy a HAMT node and all of its contents. May be useful for mutation operations where the original needs to be preserved in memory.

This operation will also recursively clone any child nodes that are attached as cached nodes.

func (*Node) Delete ¶

func (n *Node) Delete(ctx context.Context, k string) error

Delete removes an entry entirely from the HAMT structure.

This operation will result in the modification of _at least_ one IPLD block via the IpldStore. Depending on the contents of the leaf node, this operation may result in a node collapse to shrink the HAMT into its canonical form for the remaining data. For an insufficiently random collection of keys at the relevant leaf nodes such a collapse may cascade to further nodes.

func (*Node) Find ¶

func (n *Node) Find(ctx context.Context, k string, out interface{}) error

Find navigates through the HAMT structure to where key `k` should exist. If the key is not found, an ErrNotFound error is returned. If the key is found and the `out` parameter has an UnmarshalCBOR(Reader) method, the decoded value is returned. If found and the `out` parameter is `nil`, then `nil` will be returned (can be used to determine if a key exists where you don't need the value, e.g. using the HAMT as a Set).

Depending on the size of the HAMT, this method may load a large number of child nodes via the HAMT's IpldStore.

func (*Node) FindRaw ¶

func (n *Node) FindRaw(ctx context.Context, k string) ([]byte, error)

FindRaw performs the same function as Find, but returns the raw bytes found at the key's location (which may or may not be DAG-CBOR, see also SetRaw).

func (*Node) Flush ¶

func (n *Node) Flush(ctx context.Context) error

Flush saves and purges any cached Nodes recursively from this Node through its (cached) children. Cached nodes primarily exist through the use of Copy() operations where the entire graph is instantiated in memory and each child pointer exists in cached form.

func (*Node) ForEach ¶

func (n *Node) ForEach(ctx context.Context, f func(k string, val interface{}) error) error

ForEach recursively calls function f on each k / val pair found in the HAMT. This performs a full traversal of the graph and for large HAMTs can cause a large number of loads from the IpldStore. This should not be used lightly as it can incur large costs.

func (*Node) MarshalCBOR ¶

func (t *Node) MarshalCBOR(w io.Writer) error

func (*Node) Set ¶

func (n *Node) Set(ctx context.Context, k string, v interface{}) error

Set key k to value v, where v is has a MarshalCBOR(bytes.Buffer) method to encode it.

func (*Node) SetRaw ¶

func (n *Node) SetRaw(ctx context.Context, k string, raw []byte) error

SetRaw is similar to Set but sets key k in the HAMT to raw bytes without performing a DAG-CBOR marshal. The bytes may or may not be encoded DAG-CBOR (see also FindRaw for fetching raw form).

func (*Node) UnmarshalCBOR ¶

func (t *Node) UnmarshalCBOR(r io.Reader) error

type Option ¶

type Option func(*Node)

Option is a function that configures the node

See UseTreeBitWidth and UseHashFunction

func UseHashFunction ¶

func UseHashFunction(hash func([]byte) []byte) Option

UseHashFunction allows you to set the hash function used for internal indexing by the HAMT.

Passing in the returned Option to NewNode will generate a new HAMT that uses the specified hash function.

The default hash function is murmur3-x64 but you should use a cryptographically secure function such as SHA2-256 if an attacker may be able to pick the keys in order to avoid potential hash collision (tree explosion) attacks.

func UseTreeBitWidth ¶

func UseTreeBitWidth(bitWidth int) Option

UseTreeBitWidth allows you to set a custom bitWidth of the HAMT in bits (from 1-8).

Passing in the returned Option to NewNode will generate a new HAMT that uses the specified bitWidth.

The default bitWidth is 8.

type Pointer ¶

type Pointer struct {
	KVs  []*KV   `refmt:"v,omitempty"`
	Link cid.Cid `refmt:"l,omitempty"`
	// contains filtered or unexported fields
}

Pointer is an element in a HAMT node's Pointers array, encoded as an IPLD tuple in DAG-CBOR of shape:

{"0": CID} or {"1": [KV...]}

Where a map with a single key of "0" contains a Link, where a map with a single key of "1" contains a KV bucket. The map may contain only one of these two possible keys.

There are between 1 and 2^bitWidth of these Pointers in any HAMT node.

A Pointer contains either a KV bucket of up to `bucketSize` (3) values or a link (CID) to a child node. When a KV bucket overflows beyond `bucketSize`, the bucket is replaced with a link to a newly created HAMT node which will contain the `bucketSize+1` elements in its own Pointers array.

(Note: the `refmt` tags are ignored by cbor-gen which will generate an array type rather than map.)

The IPLD Schema representation of this data structure is as follows:

type Pointer union {
	&Node "0"
	Bucket "1"
} representation keyed

type Bucket [KV]

func (*Pointer) MarshalCBOR ¶

func (t *Pointer) MarshalCBOR(w io.Writer) error

func (*Pointer) UnmarshalCBOR ¶

func (t *Pointer) UnmarshalCBOR(br io.Reader) error

Source Files ¶

View all Source files

Directories ¶

Path	Synopsis
gen

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL

README ¶

go-hamt-ipld

License

Documentation ¶

Overview ¶

Algorithm Overview ¶

Parameters ¶

Further Reading ¶

Index ¶

Constants ¶

Variables ¶

Functions ¶

Types ¶

type KV ¶

func (*KV) MarshalCBOR ¶

func (*KV) UnmarshalCBOR ¶

type Node ¶

func LoadNode ¶

func NewNode ¶

func (*Node) Copy ¶

func (*Node) Delete ¶

func (*Node) Find ¶

func (*Node) FindRaw ¶

func (*Node) Flush ¶

func (*Node) ForEach ¶

func (*Node) MarshalCBOR ¶

func (*Node) Set ¶

func (*Node) SetRaw ¶

func (*Node) UnmarshalCBOR ¶

type Option ¶

func UseHashFunction ¶

func UseTreeBitWidth ¶

type Pointer ¶

func (*Pointer) MarshalCBOR ¶

func (*Pointer) UnmarshalCBOR ¶

Source Files ¶

Directories ¶