rabin

package

v0.0.0-...-d0b643e Latest Latest Go to latest Published: Sep 11, 2017 License: BSD-3-Clause Imports: 4 Imported by: 8

Details

Valid go.mod file

The Go module system was introduced in Go 1.11 and is the official dependency management solution for Go.
Redistributable license

Redistributable licenses place minimal restrictions on how software can be used, modified, and redistributed.
Tagged version

Modules with tagged versions give importers more predictable builds.
Stable version

When a project reaches major version v1 it is considered stable.
Learn more about best practices

Repository

github.com/aclements/go-rabin

Links

Open Source Insights

Documentation ¶

Overview ¶

Package rabin implements Rabin hashing (fingerprinting).

A given Rabin hash function is defined by a polynomial over GF(2):

p(x) = ... + p₂x² + p₁x + p₀   where pₙ ∈ GF(2)

The message to be hashed is likewise interpreted as a polynomial over GF(2), where the coefficients are the bits of the message in left-to-right most-significant-bit-first order. Given a message polynomial m(x) and a hashing polynomial p(x), the Rabin hash is simply the coefficients of m(x) mod p(x).

Rabin hashing has the unusual property that it can efficiently compute a "rolling hash" of a stream of data, where the hash value reflects only the most recent w bytes of the stream, for some window size w. This property makes it ideal for "content-defined chunking", which sub-divides sequential data on boundaries that are robust to insertions and deletions.

The details of Rabin hashing are described in Rabin, Michael (1981). "Fingerprinting by Random Polynomials." Center for Research in Computing Technology, Harvard University. Tech Report TR-CSE-03-01.

Index ¶

Constants
type Chunker
- func NewChunker(table *Table, r io.Reader, minBytes, avgBytes, maxBytes int) *Chunker
- func (c *Chunker) Next() (int, error)
- func (c *Chunker) Reset()
type Discarder
type Hash
- func New(table *Table) *Hash
type Table
- func NewTable(polynomial uint64, window int) *Table

Constants ¶

View Source

const Poly64 = 0xbfe6b8a5bf378d83

Poly64 is an 64-bit (degree 63) irreducible polynomial over GF(2).

This is a convenient polynomial to use for computing 64-bit Rabin hashes.

Variables ¶

This section is empty.

Functions ¶

This section is empty.

Types ¶

type Chunker ¶

type Chunker struct {
	// contains filtered or unexported fields
}

A Chunker performs content-defined chunking. It divides a sequence of bytes into chunks such that insertions and deletions in the sequence will only affect chunk boundaries near those modifications.

func NewChunker ¶

func NewChunker(table *Table, r io.Reader, minBytes, avgBytes, maxBytes int) *Chunker

NewChunker returns a content-defined chunker for data read from r using the Rabin hash defined by table. The chunks produced by this Chunker will be at least minBytes and at most maxBytes large and will, on average, be avgBytes large.

The Chunker buffers data from the Reader internally, so the Reader need not buffer itself. The caller may seek the reader, but if it does, it must only seek to a known chunk boundary and it must call Reset on the Chunker.

If the Reader additionally implements Discarder, the Chunker will use this to skip over bytes more efficiently.

The hash function defined by table must have a non-zero window size.

minBytes must be >= the window size. This ensures that chunk boundary n+1 does not depend on data from before chunk boundary n.

avgBytes must be a power of two.

func (*Chunker) Next ¶

func (c *Chunker) Next() (int, error)

Next returns the length in bytes of the next chunk. If there are no more chunks, it returns 0, io.EOF. If the underlying Reader returns some other error, it passes that error on to the caller.

func (*Chunker) Reset ¶

func (c *Chunker) Reset()

Reset resets c and clears its internal buffer. The caller must ensure that the underlying Reader is at a chunk boundary when calling Reset.

This is useful if the caller has knowledge of where an already-chunked stream is being modified. It can start at the chunk boundary before the modified point and re-chunk the stream until a new chunk boundary lines up with a boundary in the previous version of the stream.

type Discarder ¶

type Discarder interface {
	// Discard skips the next n bytes, returning the number of
	// bytes discarded.
	//
	// If Discard skips fewer than n bytes, it also returns an
	// error. Discard must not skip beyond the end of the file.
	Discard(n int) (discarded int, err error)
}

A Discarder supports discarding bytes from an input stream.

type Hash ¶

type Hash struct {
	// contains filtered or unexported fields
}

Hash computes Rabin hashes (often called fingerprints).

Hash implements hash.Hash64.

func New ¶

func New(table *Table) *Hash

New returns a new Rabin hash using the polynomial and window size represented by table.

func (*Hash) BlockSize ¶

func (h *Hash) BlockSize() int

BlockSize returns the window size if a window is configured, and otherwise returns 1.

This satisfies the hash.Hash interface and indicates that Write is most efficient if writes are a multiple of the returned size.

func (*Hash) Reset ¶

func (h *Hash) Reset()

Reset resets h to its initial state.

func (*Hash) Size ¶

func (h *Hash) Size() int

Size returns the number of bytes Sum will append. This is the minimum number of bytes necessary to represent the hash.

func (*Hash) Sum ¶

func (h *Hash) Sum(b []byte) []byte

Sum appends the least-significant byte first representation of the current hash to b and returns the resulting slice.

func (*Hash) Sum64 ¶

func (h *Hash) Sum64() uint64

Sum64 returns the hash of all bytes written to h.

func (*Hash) Write ¶

func (h *Hash) Write(p []byte) (n int, err error)

Write adds p to the running hash h.

If h is windowed, this may also expire previously written bytes from the running hash so that h represents the hash of only the most recently written window bytes.

It always returns len(p), nil.

type Table ¶

type Table struct {
	// contains filtered or unexported fields
}

Table is a set of pre-computed tables for computing Rabin fingerprints for a given polynomial and window size.

func NewTable ¶

func NewTable(polynomial uint64, window int) *Table

NewTable returns a Table for constructing Rabin hashes using the polynomial

p(x) = ... + p₂x² + p₁x + p₀   where pₙ ∈ GF(2)

where pₙ = (polynomial >> n) & 1. This polynomial must be irreducible and must have degree >= 8. The number of bits in the resulting hash values will be the same as the number of bits in polynomial.

This package defines Poly64 as a convenient 64-bit irreducible polynomial that can be used with this function.

If window > 0, hashes constructed from this Table will be rolling hash over only the most recently written window bytes of data.

Source Files ¶

View all Source files

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL