deephash

package
v0.0.0-...-113f59a Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Feb 16, 2024 License: BSD-3-Clause Imports: 11 Imported by: 0

Documentation

Overview

Package deephash hashes a Go value recursively, in a predictable order, without looping. The hash is only valid within the lifetime of a program. Users should not store the hash on disk or send it over the network. The hash is sufficiently strong and unique such that Hash(&x) == Hash(&y) is an appropriate replacement for x == y.

The definition of equality is identical to reflect.DeepEqual except:

  • Floating-point values are compared based on the raw bits, which means that NaNs (with the same bit pattern) are treated as equal.
  • time.Time are compared based on whether they are the same instant in time and also in the same zone offset. Monotonic measurements and zone names are ignored as part of the hash.
  • netip.Addr are compared based on a shallow comparison of the struct.

WARNING: This package, like most of the tailscale.com Go module, should be considered Tailscale-internal; we make no API promises.

Cycle detection

This package correctly handles cycles in the value graph, but in a way that is potentially pathological in some situations.

The algorithm for cycle detection operates by pushing a pointer onto a stack whenever deephash is visiting a pointer and popping the pointer from the stack after deephash is leaving the pointer. Before visiting a new pointer, deephash checks whether it has already been visited on the pointer stack. If so, it hashes the index of the pointer on the stack and avoids visiting the pointer.

This algorithm is guaranteed to detect cycles, but may expand pointers more often than a potential alternate algorithm that remembers all pointers ever visited in a map. The current algorithm uses O(D) memory, where D is the maximum depth of the recursion, while the alternate algorithm would use O(P) memory where P is all pointers ever seen, which can be a lot, and most of which may have nothing to do with cycles. Also, the alternate algorithm has to deal with challenges of producing deterministic results when pointers are visited in non-deterministic ways such as when iterating through a Go map. The stack-based algorithm avoids this challenge since the stack is always deterministic regardless of non-deterministic iteration order of Go maps.

To concretely see how this algorithm can be pathological, consider the following data structure:

var big *Item = ... // some large data structure that is slow to hash
var manyBig []*Item
for i := 0; i < 1000; i++ {
	manyBig = append(manyBig, &big)
}
deephash.Hash(manyBig)

Here, the manyBig data structure is not even cyclic. We have the same big *Item being stored multiple times in a []*Item. When deephash hashes []*Item, it hashes each individual *Item not realizing that it had just done the computation earlier. To avoid the pathological situation, Item should implement SelfHasher and memoize attempts to hash itself.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func HasherForType

func HasherForType[T any](opts ...Option) func(*T) Sum

HasherForType returns a hash that is specialized for the provided type.

HasherForType panics if the opts are invalid for the provided type.

Currently, at most one option can be provided (IncludeFields or ExcludeFields) and its type must match the type of T. Those restrictions may be removed in the future, along with documentation about their precedence when combined.

func Update

func Update[T any](last *Sum, v *T) (changed bool)

Update sets last to the hash of v and reports whether its value changed.

Types

type Hasher

type Hasher struct {
	// contains filtered or unexported fields
}

Hasher is a value passed to [SelfHasher.Hash] that allow implementations to hash themselves in a structured manner.

func (Hasher) HashBytes

func (h Hasher) HashBytes(b []byte)

HashBytes hashes a sequence of bytes b. The length of b is not explicitly hashed.

func (Hasher) HashString

func (h Hasher) HashString(s string)

HashString hashes the string data of s The length of s is not explicitly hashed.

func (Hasher) HashSum

func (h Hasher) HashSum(s Sum)

HashSum hashes a Sum.

func (Hasher) HashUint16

func (h Hasher) HashUint16(n uint16)

HashUint16 hashes a uint16.

func (Hasher) HashUint32

func (h Hasher) HashUint32(n uint32)

HashUint32 hashes a uint32.

func (Hasher) HashUint64

func (h Hasher) HashUint64(n uint64)

HashUint64 hashes a uint64.

func (Hasher) HashUint8

func (h Hasher) HashUint8(n uint8)

HashUint8 hashes a uint8.

type Option

type Option interface {
	// contains filtered or unexported methods
}

Option is an optional argument to HasherForType.

func ExcludeFields

func ExcludeFields[T any](fields ...string) Option

ExcludeFields returns an option that modifies the hashing for T to include all struct fields of T except those provided in fields.

T must be a struct type, and must match the type of the value passed to HasherForType.

func IncludeFields

func IncludeFields[T any](fields ...string) Option

IncludeFields returns an option that modifies the hashing for T to only include the named struct fields.

T must be a struct type, and must match the type of the value passed to HasherForType.

type SelfHasher

type SelfHasher interface {
	Hash(Hasher)
}

SelfHasher is implemented by types that can compute their own hash by writing values through the provided Hasher parameter. Implementations must not leak the provided Hasher.

If the implementation of SelfHasher recursively calls deephash.Hash, then infinite recursion is quite likely to occur. To avoid this, use a type definition to drop methods before calling deephash.Hash:

func (v *MyType) Hash(h deephash.Hasher) {
	v.hashMu.Lock()
	defer v.hashMu.Unlock()
	if v.dirtyHash {
		type MyTypeWithoutMethods MyType // type define MyType to drop Hash method
		v.dirtyHash = false              // clear out dirty bit to avoid hashing over it
		v.hashSum = deephash.Sum{}       // clear out hashSum to avoid hashing over it
		v.hashSum = deephash.Hash((*MyTypeWithoutMethods)(v))
	}
	h.HashSum(v.hashSum)
}

In the above example, we acquire a lock since it is possible that deephash is called in a concurrent manner, which implies that MyType.Hash may also be called in a concurrent manner. Whether this lock is necessary is application-dependent and left as an exercise to the reader. Also, the example assumes that dirtyHash is set elsewhere by application logic whenever a mutation is made to MyType that would alter the hash.

type Sum

type Sum struct {
	// contains filtered or unexported fields
}

Sum is an opaque checksum type that is comparable.

func Hash

func Hash[T any](v *T) Sum

Hash returns the hash of v.

func (Sum) AppendTo

func (s Sum) AppendTo(b []byte) []byte

AppendTo appends the string encoding of this sum (as returned by the String method) to the provided byte slice and returns the extended buffer.

func (Sum) String

func (s Sum) String() string

Directories

Path Synopsis
Package testtype contains types for testing deephash.
Package testtype contains types for testing deephash.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL