hashmachine

package module
v0.0.0-...-6d9a4ec Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Nov 6, 2021 License: MIT Imports: 6 Imported by: 0

README

hashmachine

Hashmachine is a simple and general protocol for hashing byte strings, useful when evaluating hash proofs from a number of sources (e.g. Merkle Trees).

Hashmachine programs are intended to support use cases where data is timestamped in a verifiable log and an inclusion proof (in hashmachine format) is later stored with the data. Protocol Buffers are used to ensure hashmachine program formats can evolve while maintaining compatibility (see below).

Running hashmachine programs

Hashmachine encodes proofs as instructions to a stack machine

A Hashmachine program can be thought of logically as a function with the following signature:

func hashMachineProgram(inputs [][]byte, ops []opcodes) ([]byte, error)

Programs consist of the following opcodes:

  • PUSH_INPUT(index uint32): pushes input[index] onto the stack and mark that input as "used"; fail if no such input exists
  • PUSH_BYTES(payload []byte): push a byte string literal onto the stack; fail if no byte string is provided in the program
  • POP_N_PUSH_HASH(index uint32): pop index values from the stack, hashing each value in pop order, and pushing the hash result onto the stack; fail if the stack has insufficient values
    • This op code is useful for multi-valued top-level digests, like that of the MMR
  • POP_CHILDREN_PUSH_HASH: equivalent to POP_N_PUSH_HASH where N == metadata.branching_factor.
    • This op code is useful for hashing a set of interior nodes to produce their parent
    • Having a separate op code saves us from repetitively storing the branching factor for this common case

All operations are executed sequentially and exactly once. There is no flow control.

All inputs provided to the program must be used exactly once by a call to PUSH_INPUT. Otherwise, the program is not valid.

After completing, exactly one byte string, representing the program output, must be left on the stack. If the stack is empty or has more than one value on it, the program is invalid and execution fails.

The output value can be compared to some expected value to validate the "proof" that the program encodes.

Constructing hashmachine programs

Hashmachine programs are constructed by walking verifiable data structures to generate proofs linking some input value(s) to an expected output value.

The smallest valid hashmachine program simply hashes a single input:

Metadata{
    expected_input_count = 1
    branching_factor = 1
}
PUSH_INPUT(0)
POP_N_PUSH_HASH
Inclusion proofs

A more complex example might involve a client requesting a proof of inclusion for the value b in a binary Merkle tree summarized by o. In this tree, each non-leaf node is the hash of its two children from right to left.

      ---- o ----
    /             \
    g              n
   /  \           /  \
 /     \         /    \
 c      f       j     m
/ \    / \     / \   / \
a  b  d   e   h   i k   l

To prove inclusion of b in o, the prover would provide values a, f, and n along with instructions on how to hash them together to produce o.

Metadata{
    expected_input_count = 1
    branching_factor = 2
}
PUSH_BYTES(a)
PUSH_INPUT(0)                 // == b (as input)
POP_CHILDREN_PUSH_HASH    // hashes b, then a, pushes c
PUSH_BYTES(f)
POP_CHILDREN_PUSH_HASH    // hashes f, then c, pushes g
PUSH_BYTES(n)
POP_CHILDREN_PUSH_HASH    // hashes n, then g, pushes o

The client can then verify that the output of the above program matches o.

Intermediate (non-leaf) values can also be proven. The proof of inclusion for j in o is:

Metadata{
    expected_input_count = 1
    branching_factor = 2
}
PUSH_BYTES(g)
PUSH_INPUT(0)                 // == j (as input)
PUSH_BYTES(m)
POP_CHILDREN_PUSH_HASH    // hashes m, then j, pushes n
POP_CHILDREN_PUSH_HASH    // hashes n, then g, pushes o

Multiple input values can be proven in a single program. The proof for [b, j] in o is:

Metadata{
    expected_input_count = 2
    branching_factor = 2
}
PUSH_BYTES(a)
PUSH_INPUT(0)                 // == b (as input)
POP_CHILDREN_PUSH_HASH    // hashes b, then a, pushes c
PUSH_BYTES(f)
POP_CHILDREN_PUSH_HASH    // hashes f, then c, pushes g
PUSH_INPUT(1)                 // == j (as input)
PUSH_BYTES(m)
POP_CHILDREN_PUSH_HASH    // hashes m, then j, pushes n
POP_CHILDREN_PUSH_HASH    // hashes n, then g, pushes o

Proving multiple values reuses literals and intermediates, reducing the size of the proof. The proof for b alone had 3 literals and 7 operations, the proof for j alone had 2 literals and 5 operations, totalling 5 literals and 12 operations. The combined proof, however, only had 3 literals and 9 operations. For a given tree, combined proof lengths grow approximately O(logn) in the number of values being proven (i.e. the number of inputs).

Consistency proofs

Whereas inclusion proofs demonstrate inclusion of a value in a summary, consistency proofs demonstrate inclusion of a prior summary in a future one.

Consider the Merkle mountain range below:

      ---- o ----
    /             \
    g              n
   /  \           /  \
 /     \         /    \
 c      f       j     m      r
/ \    / \     / \   / \    /  \
a  b  d   e   h   i k   l   p  q   s

digest_1 = hash(s, r, o)

At a later time, the MMR may have the form below, with new entries shown in bracketed uppercase letters:

      ---- o ----
    /             \
    g              n           [V]
   /  \           /  \         /  \
 /     \         /    \       /    \
 c      f       j     m      r     [U]
/ \    / \     / \   / \    /  \   /  \
a  b  d   e   h   i k   l   p  q   s [T]

digest_2 = hash(V, o)

We want to demonstrate that digest_2 is consistent with digest_1. To do this, we first must recreate digest_1 using its underlying hashes and then proceed to recreate digest_2 from those and additional hashes.

Metadata{
    expected_input_count = 1
    branching_factor = 2
}
PUSH_BYTES(o)
PUSH_BYTES(r)
PUSH_BYTES(s)
PEAK_N_PUSH_HASH(3)       // hashes s, r, o (left on stack), pushes digest_1
MATCH_INPUT(0)            // pop digest_1, match it with input 0

PUSH_BYTES(T)
POP_CHILDREN_PUSH_HASH    // hashes T, then s, pushes U
POP_CHILDREN_PUSH_HASH    // hashes U, then r, pushes V
POP_N_PUSH_HASH(2)        // hashes V, then o, pushes digest_2

The top of the stack can then be compared with digest_2 to complete the proof.

TODO: Support data entries within the tree nodes. Hash children plus data? Make parent?

TODO: Generalize Execute the program with inputs as well as outputs. Each input must be pushed onto the stack exactly once and each output must be matched with a MATCH_OUTPUT(i) opcode exactly once (can drop expected_input_count).

Compatibility

Hashmachine is currently pre-alpha. The hashmachine format and semantics are not stable

When hashmachine reaches 1.0, the format will be considered stable. This means programs encoded in hashmachine are guaranteed to verify or fail to verify consistently over time.

Caveats

Reusable sponges

Modern hashes like Keccak/SHA-3 use a sponge construction whereby reading output from the hash modifies internal hash state, and where output can be read from the hash multiple times.

While hashmachine can make use of hashes relying on sponge construction, hashmachine itself uses them as though they were legacy non-sponge hashes. That is, hashmachine opcodes can only be used to create programs that write to a hash function multiple times and perform a single final read from the hash before the internal hash state is discarded. This is in keeping with the use case of verifiable data structure proofs: hash state is not used by these data strucutures after a single parent node hash is read from them.

Documentation

Index

Constants

This section is empty.

Variables

View Source
var (
	HashFunctionOutputLength_name = map[int32]string{
		0: "HASHFUNCTIONOUTPUTLENGTH_UNKNOWN",
		1: "HASHFUNCTIONOUTPUTLENGTH_FIXED",
		2: "HASHFUNCTIONOUTPUTLENGTH_VARIABLE",
	}
	HashFunctionOutputLength_value = map[string]int32{
		"HASHFUNCTIONOUTPUTLENGTH_UNKNOWN":  0,
		"HASHFUNCTIONOUTPUTLENGTH_FIXED":    1,
		"HASHFUNCTIONOUTPUTLENGTH_VARIABLE": 2,
	}
)

Enum value maps for HashFunctionOutputLength.

View Source
var (
	HashFunction_name = map[int32]string{
		0: "HASHFUNCTION_UNKNOWN",
		1: "HASHFUNCTION_SHA_256",
		2: "HASHFUNCTION_SHA3_512",
	}
	HashFunction_value = map[string]int32{
		"HASHFUNCTION_UNKNOWN":  0,
		"HASHFUNCTION_SHA_256":  1,
		"HASHFUNCTION_SHA3_512": 2,
	}
)

Enum value maps for HashFunction.

View Source
var (
	OpCode_name = map[int32]string{
		0: "OPCODE_UNKNOWN",
		1: "OPCODE_INVALID",
		2: "OPCODE_PUSH_INPUT",
		3: "OPCODE_PUSH_BYTES",
		4: "OPCODE_POP_CHILDREN_PUSH_HASH",
		5: "OPCODE_POP_N_PUSH_HASH",
		6: "OPCODE_PEAK_N_PUSH_HASH",
		7: "OPCODE_MATCH_INPUT",
	}
	OpCode_value = map[string]int32{
		"OPCODE_UNKNOWN":                0,
		"OPCODE_INVALID":                1,
		"OPCODE_PUSH_INPUT":             2,
		"OPCODE_PUSH_BYTES":             3,
		"OPCODE_POP_CHILDREN_PUSH_HASH": 4,
		"OPCODE_POP_N_PUSH_HASH":        5,
		"OPCODE_PEAK_N_PUSH_HASH":       6,
		"OPCODE_MATCH_INPUT":            7,
	}
)

Enum value maps for OpCode.

View Source
var (
	// optional hashmachine.HashFunctionOutputLength output_length = 54435;
	E_OutputLength = &file_hashmachine_proto_extTypes[0]
)

Extension fields to descriptor.EnumValueOptions.

View Source
var File_hashmachine_proto protoreflect.FileDescriptor

Functions

This section is empty.

Types

type HashConfig

type HashConfig struct {
	HashFunction HashFunction `` /* 128-byte string literal not displayed */
	// hash_output_length_bytes is the number of bytes to read from a variable-
	// length hash function.
	//
	// For hash functions with variable-length outputs, this field must be set
	// and must be greater than zero, otherwise the program is invalid.
	//
	// For hashes with fixed-length outputs, this must not be set or set to
	// zero, otherwise the program is invalid.
	HashOutputLengthBytes uint32 `` /* 129-byte string literal not displayed */
	// contains filtered or unexported fields
}

HashConfig specifies the configuration for hashing operations used in verifying the hashmachine program.

func (*HashConfig) Descriptor deprecated

func (*HashConfig) Descriptor() ([]byte, []int)

Deprecated: Use HashConfig.ProtoReflect.Descriptor instead.

func (*HashConfig) GetHashFunction

func (x *HashConfig) GetHashFunction() HashFunction

func (*HashConfig) GetHashOutputLengthBytes

func (x *HashConfig) GetHashOutputLengthBytes() uint32

func (*HashConfig) ProtoMessage

func (*HashConfig) ProtoMessage()

func (*HashConfig) ProtoReflect

func (x *HashConfig) ProtoReflect() protoreflect.Message

func (*HashConfig) Reset

func (x *HashConfig) Reset()

func (*HashConfig) String

func (x *HashConfig) String() string

type HashFunction

type HashFunction int32

HashFunction specifies the hash function to use when evaluating the hashmachine program.

const (
	HashFunction_HASHFUNCTION_UNKNOWN  HashFunction = 0
	HashFunction_HASHFUNCTION_SHA_256  HashFunction = 1
	HashFunction_HASHFUNCTION_SHA3_512 HashFunction = 2
)

func (HashFunction) Descriptor

func (HashFunction) Enum

func (x HashFunction) Enum() *HashFunction

func (HashFunction) EnumDescriptor deprecated

func (HashFunction) EnumDescriptor() ([]byte, []int)

Deprecated: Use HashFunction.Descriptor instead.

func (HashFunction) Number

func (HashFunction) String

func (x HashFunction) String() string

func (HashFunction) Type

type HashFunctionOutputLength

type HashFunctionOutputLength int32

HashFunctionOutputLength describes whether a HashFunction has fixed or variable length output.

Programs using a HashFunction with variable length output must also set hash_output_length_bytes in their HashConfig.

const (
	HashFunctionOutputLength_HASHFUNCTIONOUTPUTLENGTH_UNKNOWN  HashFunctionOutputLength = 0
	HashFunctionOutputLength_HASHFUNCTIONOUTPUTLENGTH_FIXED    HashFunctionOutputLength = 1
	HashFunctionOutputLength_HASHFUNCTIONOUTPUTLENGTH_VARIABLE HashFunctionOutputLength = 2
)

func (HashFunctionOutputLength) Descriptor

func (HashFunctionOutputLength) Enum

func (HashFunctionOutputLength) EnumDescriptor deprecated

func (HashFunctionOutputLength) EnumDescriptor() ([]byte, []int)

Deprecated: Use HashFunctionOutputLength.Descriptor instead.

func (HashFunctionOutputLength) Number

func (HashFunctionOutputLength) String

func (x HashFunctionOutputLength) String() string

func (HashFunctionOutputLength) Type

type Op

type Op struct {
	Opcode OpCode `protobuf:"varint,1,opt,name=opcode,proto3,enum=hashmachine.OpCode" json:"opcode,omitempty"`
	// Parameters used by some opcodes
	Index   uint64 `protobuf:"varint,2,opt,name=index,proto3" json:"index,omitempty"`
	Payload []byte `protobuf:"bytes,3,opt,name=payload,proto3" json:"payload,omitempty"`
	// contains filtered or unexported fields
}

Op represents a single operation in the hashmachine program. An Op can be evaluated using its opcode and parameters as well as the current stack of the hashmachine.

func (*Op) Descriptor deprecated

func (*Op) Descriptor() ([]byte, []int)

Deprecated: Use Op.ProtoReflect.Descriptor instead.

func (*Op) GetIndex

func (x *Op) GetIndex() uint64

func (*Op) GetOpcode

func (x *Op) GetOpcode() OpCode

func (*Op) GetPayload

func (x *Op) GetPayload() []byte

func (*Op) ProtoMessage

func (*Op) ProtoMessage()

func (*Op) ProtoReflect

func (x *Op) ProtoReflect() protoreflect.Message

func (*Op) Reset

func (x *Op) Reset()

func (*Op) String

func (x *Op) String() string

type OpCode

type OpCode int32

OpCode identifies the operation to be performed.

const (
	OpCode_OPCODE_UNKNOWN OpCode = 0
	// OPCODE_INVALID invalidates a program if it appears anywhere in the list
	// of opcodes.
	//
	// Implementations should not check for this opcode and should instead use
	// the opcode for testing handling of unknown opcodes.
	//
	// OPCODE_INVALID is distinct from OPCODE_UNKNOWN. OPCODE_UNKNOWN is the
	// default value and represents an opcode that has not been set, rather than
	// an opcode that is set but to an invalid or unrecognized value.
	OpCode_OPCODE_INVALID OpCode = 1
	// OPCODE_PUSH_INPUT pushes the input value at 'index' onto the stack.
	//
	// The program is invalid if there is no input at 'index'.
	OpCode_OPCODE_PUSH_INPUT OpCode = 2
	// OPCODE_PUSH_BYTES pushes 'payload' onto the stack.
	//
	// The program is invalid if there is no 'payload'.
	OpCode_OPCODE_PUSH_BYTES OpCode = 3
	// OPCODE_POP_CHILDREN_PUSH_HASH pops metadata.branchingfactor values
	// from the stack, hashing each in pop order, gets the hash sum and pushes
	// it onto the stack. This is equivalent to OPCODE_POP_N_PUSH_HASH where
	// N == medatadata.branchingfactor.
	//
	// OPCODE_POP_CHILDREN_PUSH_HASH saves from repeatedly encoding a count
	// of values to pop for the common case where we are popping a fixed number
	// of children to produce a parent in a tree of hashes.
	//
	// The program is invalid if the stack underflows.
	OpCode_OPCODE_POP_CHILDREN_PUSH_HASH OpCode = 4
	// OPCODE_POP_N_PUSH_HASH pops 'index' values from the stack, hashing
	// each in pop order, gets the hash sum and pushes it onto the stack.
	//
	// OPCODE_POP_N_PUSH_HASH ignores metadata.branchingfactor. This opcode
	// is useful when hashing variable numbers of inputs such as when creating
	// a digest of peaks of a growing Merkle Mountain Range.
	OpCode_OPCODE_POP_N_PUSH_HASH OpCode = 5
	// OPCODE_PEAK_N_PUSH_HASH peaks 'index' values from the stack, hashing
	// each in pop order, gets the hash sum and pushes it onto the stack.
	//
	// OPCODE_PEAK_N_PUSH_HASH ignores metadata.branchingfactor. This opcode
	// is useful when hashing variable numbers of inputs such as when recreating
	// an interim digest of peaks while consistency checking a Merkle Mountain
	// Range.
	OpCode_OPCODE_PEAK_N_PUSH_HASH OpCode = 6
	// OPCODE_MATCH_INPUT pops the top value of the stack and compares it with
	// the input indentified by 'index'. If the values match, the program
	// proceeds. If the values do not match, the program fails verification.
	OpCode_OPCODE_MATCH_INPUT OpCode = 7
)

func (OpCode) Descriptor

func (OpCode) Descriptor() protoreflect.EnumDescriptor

func (OpCode) Enum

func (x OpCode) Enum() *OpCode

func (OpCode) EnumDescriptor deprecated

func (OpCode) EnumDescriptor() ([]byte, []int)

Deprecated: Use OpCode.Descriptor instead.

func (OpCode) Number

func (x OpCode) Number() protoreflect.EnumNumber

func (OpCode) String

func (x OpCode) String() string

func (OpCode) Type

func (OpCode) Type() protoreflect.EnumType

type Program

type Program struct {
	Metadata *ProgramMetadata `protobuf:"bytes,1,opt,name=metadata,proto3" json:"metadata,omitempty"`
	Ops      []*Op            `protobuf:"bytes,2,rep,name=ops,proto3" json:"ops,omitempty"`
	// contains filtered or unexported fields
}

func (*Program) Descriptor deprecated

func (*Program) Descriptor() ([]byte, []int)

Deprecated: Use Program.ProtoReflect.Descriptor instead.

func (*Program) GetMetadata

func (x *Program) GetMetadata() *ProgramMetadata

func (*Program) GetOps

func (x *Program) GetOps() []*Op

func (*Program) ProtoMessage

func (*Program) ProtoMessage()

func (*Program) ProtoReflect

func (x *Program) ProtoReflect() protoreflect.Message

func (*Program) Reset

func (x *Program) Reset()

func (*Program) String

func (x *Program) String() string

type ProgramMetadata

type ProgramMetadata struct {
	HashConfig *HashConfig `protobuf:"bytes,1,opt,name=hash_config,json=hashConfig,proto3" json:"hash_config,omitempty"`
	// expected_input_count is the number of byte strings the program expects
	// to be provided when the program is executed. Executions with a different
	// number of byte strings as inputs (regardless of whether they are used)
	// must fail.
	ExpectedInputCount uint32 `protobuf:"varint,3,opt,name=expected_input_count,json=expectedInputCount,proto3" json:"expected_input_count,omitempty"`
	// branching_factor is the branching factor of the tree of hashes used to
	// create the program. If the program uses OPCODE_POP_CHILDREN_PUSH_HASH
	// then branching_factor must be set and non-zero, otherwise the program is
	// invalid. If the program does not use OPCODE_POP_CHILDREN_PUSH_HASH,
	// then branching_factor is ignored.
	BranchingFactor uint32 `protobuf:"varint,4,opt,name=branching_factor,json=branchingFactor,proto3" json:"branching_factor,omitempty"`
	// contains filtered or unexported fields
}

ProgramMetadata provides metadata to verify and execute the hashmachine program.

func (*ProgramMetadata) Descriptor deprecated

func (*ProgramMetadata) Descriptor() ([]byte, []int)

Deprecated: Use ProgramMetadata.ProtoReflect.Descriptor instead.

func (*ProgramMetadata) GetBranchingFactor

func (x *ProgramMetadata) GetBranchingFactor() uint32

func (*ProgramMetadata) GetExpectedInputCount

func (x *ProgramMetadata) GetExpectedInputCount() uint32

func (*ProgramMetadata) GetHashConfig

func (x *ProgramMetadata) GetHashConfig() *HashConfig

func (*ProgramMetadata) ProtoMessage

func (*ProgramMetadata) ProtoMessage()

func (*ProgramMetadata) ProtoReflect

func (x *ProgramMetadata) ProtoReflect() protoreflect.Message

func (*ProgramMetadata) Reset

func (x *ProgramMetadata) Reset()

func (*ProgramMetadata) String

func (x *ProgramMetadata) String() string

Directories

Path Synopsis
cmd
pkg
hm
Package hm provides the implementation of the hash machine.
Package hm provides the implementation of the hash machine.
oncehash
package onceHash provides an interface for a hash that can only be used once.
package onceHash provides an interface for a hash that can only be used once.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL