README
¶
Rolling Hashes
Philosophy
This package contains several various rolling hashes for you to play with crazy ideas. The API design philosophy is to stick as closely as possible to the interface provided by the builtin hash package (the hashes implemented here are effectively drop-in replacements for their builtin counterparts), while providing simultaneously the highest speed and simplicity.
Usage
A rollinghash.Hash
is just a hash.Hash
which
implements the
Roller
interface. Here is how it is typically used:
data := []byte("here is some data to roll on")
h := buzhash64.New()
n := 16
// Initialize the rolling window
h.Write(data[:n])
for _, c := range(data[n:]) {
// Slide the window and update the hash
h.Roll(c)
// Get the updated hash value
fmt.Println(h.Sum64())
}
Gotchas
The rolling window MUST be initialized by calling Write
first (which
saves a copy). The byte leaving the rolling window is inferred from the
internal copy of the rolling window, which is updated with every call to
Roll
.
If you want your code to run at the highest speed, do NOT cast the result
of a New()
as a rollinghash.Hash. Instead, use the native type returned
by New()
. This is because the go compiler cannot inline calls from an
interface. When later you call Roll(), the native type call will be
inlined by the compiler, but not the casted type call.
var h1 rollinghash.Hash
h1 = buzhash32.New()
h2 := buzhash32.New()
[...]
h1.Roll(b) // Not inlined (slow)
h2.Roll(b) // inlined (fast)
What's new in v4
In v4:
-
Write
has become fully consistent withhash.Hash
. As opposed to previous versions, where writing data would reinitialize the window, it now appends this data to the existing window. In order to reset the window, one should instead use theReset
method. -
Calling
Roll
on an empty window is considered a bug, and now triggers a panic.
Brief reminder of the behaviors in previous versions:
-
From v0.x.x to v2.x.x:
Roll
returns an error for an empty window.Write
reinitializes the rolling window. -
v3.x.x :
Roll
does not return anything.Write
still reinitializes the rolling window. The rolling window always has a minimum size of 1, which yields wrong results when using roll before having initialized the window.
Go versions
The RabinKarp64
rollinghash does not yield consistent results before
go1.7. This is because it uses Rand.Read()
from the builtin math/rand
.
This function was fixed in go
1.7 to produce a consistent
stream of bytes that is independant of the size of the input buffer. If
you depend on this hash, it is strongly recommended to stick to versions
of go superior to 1.7.
License
This code is delivered to you under the terms of the MIT public license,
except the rabinkarp64
subpackage, which has been adapted from
restic (BSD 2-clause "Simplified").
Notable users
If you are using this in production or for research, let me know and I will happily put a link here!
Documentation
¶
Overview ¶
Package rollinghash implements rolling versions of some hashes
Index ¶
Examples ¶
Constants ¶
const DefaultWindowCap = 64
DefaultWindowCap is the default capacity of the internal window of a new Hash.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type Hash ¶
rollinghash.Hash extends hash.Hash by adding the method Roll. A rollinghash.Hash can be updated byte by byte, by specifying which byte enters the window.
type Hash32 ¶
rollinghash.Hash32 extends hash.Hash by adding the method Roll. A rollinghash.Hash32 can be updated byte by byte, by specifying which byte enters the window.
type Hash64 ¶
rollinghash.Hash64 extends hash.Hash by adding the method Roll. A rollinghash.Hash64 can be updated byte by byte, by specifying which byte enters the window.
type Roller ¶
type Roller interface {
Roll(b byte)
}
A Roller is a type that has the method Roll. Roll updates the hash of a rolling window from just the entering byte. You MUST call Write() BEFORE using this method and provide it with an initial window of size at least 1 byte. You can then call this method for every new byte entering the window. The byte leaving the window is automatically computed from a copy of the window internally kept in the checksum. This window is updated along with the internal state of the checksum every time Roll() is called.