rsqf

Version: v0.0.0-...-f4eb522 | Published: Dec 19, 2017 | License: BSD-3-Clause | Imports: 3 | Imported by: 0

Repository

github.com/nfisher/rsqf

README ¶

Rank and Seek Quotient Filter

A general purpose counting filter: making every bit count by Pandey et al., SIGMOD’17

A Rank and Seek Quotient Filter (RSQF) is an Approximate Membership Query data structure. It is similar to the popular Bloom Filter (BF), but where a BF provides only insert and lookup, an RSQF provides:

insert
delete
lookup
resize¹
merge

1 - resize does not permit an increase in p (i.e. unique identity capacity). It does, however, increase the total number of available slots. This provides two benefits:

Faster queries as the remainders can be aligned closer to their home slot with shorter runs.
Increase in the number of entries allowing for deletion.

Overview

This implementation of RSQF has a fixed error rate of 1/512 or ~0.2%. This was done so that each block is ensured to be contiguous in memory.

r is therefore fixed at 9 (r = log2(1/δ) = log2(512)). q varies with p, where p = log2(n/δ) rounded up and q = p - r (e.g. for 1 million entries, p = log2(1,000,000 × 512) ≈ 28.93, rounded up to 29, so q = 20).

A filter that accepts 1 million entries requires ~11.67 bits per entry, or approximately 1.46 MB (SI) of memory.
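
The arithmetic above can be reproduced with a short sketch. The function name `sizing` is hypothetical and not part of this package. It assumes the fixed δ = 1/512 and the `64(r + 2) + 8` bit block layout described in the glossary, and it ignores any struct padding the Go compiler may add.

```go
package main

import (
	"fmt"
	"math"
)

const rBits = 9 // remainder bits, fixed because delta = 1/512

// sizing returns p, q and the packed filter size in bytes for n insertions.
// Assumption: one block of 64(r+2)+8 bits per 64 slots, with no padding.
func sizing(n float64) (p, q int, size uint64) {
	p = int(math.Ceil(math.Log2(n * 512))) // p = ceil(log2(n / delta))
	q = p - rBits
	blocks := (uint64(1) << uint(q)) / 64
	size = blocks * (64*(rBits+2) + 8) / 8
	return p, q, size
}

func main() {
	p, q, size := sizing(1000000)
	bitsPerEntry := float64(size*8) / 1000000
	fmt.Printf("p=%d q=%d bytes=%d bits/entry=%.2f\n", p, q, size, bitsPerEntry)
}
```

For n = 1,000,000 this yields p = 29, q = 20 and 1,458,176 bytes, which matches the ~11.67 bits per entry quoted above.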

Status

Rank
Select
Hash (fnv, might consider Murmur3, CityHash, or xxHash)
FirstAvailableSlot
Insert
MayContain
Resize
Merge
Count (CQF)

Sizing

The following table shows the approximate sizing for this RSQF implementation at various values of n.

n	δ	p	r	q	Q size
100,000	1/512	26	9	17	184.32 KB
1,000,000	1/512	29	9	20	1.47 MB
10,000,000	1/512	33	9	24	23.59 MB
100,000,000	1/512	36	9	27	188.74 MB
1,000,000,000	1/512	39	9	30	1.51 GB

Glossary

This glossary summarises the variables specified in the Stony Brook paper.

Note: Where integers are specified, fractional values are rounded up to the nearest integer when calculated from floating point values.

n - (integer): Maximum number of insertions (e.g. 1,000,000).
δ - (fraction): Error rate or false-positive rate (e.g. 1/512 or 1/100).
p - (integer): Number of bits required from the hashed input to achieve the target error for the given number of insertions (n). The p-bit hash is split into high bits (quotient) and low bits (remainder).
r / remainder - (integer): The number of remainder bits which are written to `Q.remainders`.
q / quotient - (integer): The number of quotient bits used to indicate the expected `home slot` in the filter.
run: A run is a consecutive group of remainders where the quotient is equal.
occupied - (bit): A bit that indicates the position of a home slot for a given `run`.
runend - (bit): A bit that indicates the end of a `run`.
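Q - (struct): The RSQF data structure, which contains 2^q slots of r bits each, allocated as a `block` array. Given the block layout below, the memory in bits required by the struct can be calculated as `(2^q / 64) × (64(r + 2) + 8)`, i.e. `2^q × (r + 2.125)`. It holds the following bit vectors: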
Q.occupieds - (bit vector)
Q.runends - (bit vector)
Q.remainders - (bit vector)
block: A block is a `64(r + 2) + 8` bit structure covering 64 slots. It is composed of the following fields: an 8-bit offset, 64 occupied bits, 64 runend bits, and 64 r-bit remainders.
home slot - (array index): The home slot is the location where a remainder would be placed if h0(x) is unoccupied by another value.
slot i - (array index)
h(x) - (integer): A universal hashing function. For this library FNV-1a (64-bit) was employed as it is available in the standard library.
h0(x) / i - (integer): The masked upper bits (quotient) of the p-bit hash, shifted right by `r` bits.
h1(x) - (integer): The masked lower `r` bits (remainder) of the hash.
B - (bit vector): Variable representing a bit-vector. Typically one of `Q.occupieds`, `Q.runends`, or `Q.remainders`.
RANK(B, i) - (integer): Rank returns the number of 1s in B up to position i.
SELECT(B, i) - (integer): Select returns the index of the i^th 1 in B.
O_i - (integer): Is every 64^th slot which is stored in `Q[i].offset` to save space. The offset is calculated with the algorithm that follows.
O_j - (integer): O_j is a derived intermediate slot value which is discovered using the algorithm that follows.
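
To make the h(x), h0(x) and h1(x) entries concrete, here is a minimal sketch of splitting a 64-bit FNV-1a hash into a quotient and a remainder. The name `splitHash` and the constants `qBits` and `rBits` are hypothetical. The sketch assumes q = 20 (the 1 million entry case) and that the low p bits of the hash are the ones used, which may not match what this package's `Hash` does.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

const (
	rBits = 9  // remainder bits (delta = 1/512)
	qBits = 20 // quotient bits, assumed here for n = 1,000,000
)

// splitHash hashes b with FNV-1a (64-bit), keeps the low p = q + r bits,
// and returns the quotient h0 (home slot) and the remainder h1.
func splitHash(b []byte) (h0, h1 uint64) {
	h := fnv.New64a()
	h.Write(b) // hash.Hash writes never return an error
	p := h.Sum64() & (1<<(qBits+rBits) - 1)
	return p >> rBits, p & (1<<rBits - 1)
}

func main() {
	h0, h1 := splitHash([]byte("example"))
	fmt.Printf("home slot=%d remainder=%d\n", h0, h1)
}
```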

Documentation ¶

Constants ¶

This section is empty.

Variables ¶

var ErrFilterOverflow = errors.New("RSQF overflow")

ErrFilterOverflow is returned if an insert would result in an overflow within the filter.

Functions ¶

func Rank ¶

func Rank(B, i uint64) uint64

Rank returns the number of 1s in B up to position i, where i is between 0 and 63.
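
As an illustration of this definition, the following is a minimal sketch of rank over a single 64-bit word using `math/bits`. The name `rankSketch` is hypothetical. The sketch assumes that "up to position i" is inclusive and that position 0 is the least significant bit; the package's own `Rank` may differ on both points.

```go
package main

import (
	"fmt"
	"math/bits"
)

// rankSketch counts the 1 bits of B in positions 0..i.
// Assumptions: i is inclusive, in the range 0..63, and bit 0 is the LSB.
func rankSketch(B, i uint64) uint64 {
	mask := (uint64(2) << i) - 1 // for i == 63 the shift wraps to 0, giving all ones
	return uint64(bits.OnesCount64(B & mask))
}

func main() {
	// 0b1011 has bits 0, 1 and 3 set.
	fmt.Println(rankSketch(0b1011, 1), rankSketch(0b1011, 3))
}
```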

func Select ¶

func Select(B, i uint64) uint64

Select returns the index of the ith 1 in B. A return value of 64 means the ith 1 lies in the next bit vector. Benchmarked at ~83ns, which is too slow. References:

https://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel
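
For comparison, here is a minimal sketch of select over one 64-bit word that clears the lowest set bit i times and then counts trailing zeros. The name `selectSketch` is hypothetical, and the sketch assumes i is zero-based (i = 0 asks for the first 1); the package's `Select` may index differently. It is not claimed to be faster than the existing implementation.

```go
package main

import (
	"fmt"
	"math/bits"
)

// selectSketch returns the position of the ith 1 in B (zero-based, an
// assumption), or 64 when B has fewer than i+1 bits set.
func selectSketch(B, i uint64) uint64 {
	for ; i > 0 && B != 0; i-- {
		B &= B - 1 // clear the lowest set bit
	}
	return uint64(bits.TrailingZeros64(B)) // TrailingZeros64(0) is 64
}

func main() {
	// 0b1011 has bits 0, 1 and 3 set.
	fmt.Println(selectSketch(0b1011, 2), selectSketch(0b1011, 3))
}
```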

Types ¶

type Rsqf ¶

type Rsqf struct {
	Q []block
	// contains filtered or unexported fields
}

Rsqf is the core data structure for this filter. It might evolve to use a 64-bit array, which would expand the filter's per-slot size from 2.125 + r bits to 3 + r bits.

func New ¶

func New(n float64) *Rsqf

New returns a new Rsqf with a fixed error rate of 1/512 (~0.2%).

func (*Rsqf) Hash ¶

func (q *Rsqf) Hash(b []byte) uint64

Hash applies a 64-bit hashing algorithm to b and then splits the result into h0 and h1, shifting h0 to the right by the remainder size.

func (*Rsqf) Insert ¶

func (q *Rsqf) Insert(x uint64) error

Insert places the hash x into the filter where space is available.

func Insert(Q, x)

b <- h0(x) // home slot of x
t <- rank(Q.occupieds, b)
s <- select(Q.runends, t)
if h0(x) > s then // home slot advantage: the home slot is free
	Q.remainders[h0(x)] <- h1(x)
	Q.runends[h0(x)] <- 1
else // oh noes, someone's in our home slot
	s <- s + 1 // next slot
	n <- FirstAvailableSlot(Q, s) // first unused slot at or after s
	while n > s do // can prob do this as a block shift op on remainders/runends
		Q.remainders[n] <- Q.remainders[n - 1] // shift remainder right
		Q.runends[n] <- Q.runends[n - 1] // shift runend value right
		n <- n - 1 // decrement to previous slot
	Q.remainders[s] <- h1(x) // insert into slot s
	if Q.occupieds[h0(x)] == 1 then
		Q.runends[s - 1] <- 0 // zero previous runend
	Q.runends[s] <- 1 // set current runend
Q.occupieds[h0(x)] <- 1 // force set occupieds for h0(x)
return
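
The pseudocode can be exercised on a small, unpacked model. The sketch below is not this package's implementation: the `toy` type, `rank`, `sel`, `firstUnused` and `errFull` are hypothetical names, each slot is stored in its own slice element rather than in packed blocks, and there is no wraparound or offset handling. It assumes `rank` is inclusive and that `sel` takes a 1-based count, returning -1 for a count of 0.

```go
package main

import (
	"errors"
	"fmt"
)

var errFull = errors.New("toy filter full")

// toy is an unpacked stand-in for Q with one slice element per slot.
type toy struct {
	occupieds, runends []bool
	remainders         []uint16
}

func newToy(slots int) *toy {
	return &toy{make([]bool, slots), make([]bool, slots), make([]uint16, slots)}
}

// rank counts the set entries in b[0..i].
func rank(b []bool, i int) int {
	c := 0
	for _, v := range b[:i+1] {
		if v {
			c++
		}
	}
	return c
}

// sel returns the index of the t-th set entry (1-based), or -1 if there is none.
func sel(b []bool, t int) int {
	for k, v := range b {
		if v {
			t--
			if t == 0 {
				return k
			}
		}
	}
	return -1
}

// firstUnused returns the first unused slot at or after x, or len if none.
func (f *toy) firstUnused(x int) int {
	for x < len(f.runends) {
		s := sel(f.runends, rank(f.occupieds, x))
		if x > s {
			return x
		}
		x = s + 1
	}
	return x
}

// insert follows the pseudocode: use the home slot if it is free, otherwise
// shift later remainders right by one and append to the end of the run.
func (f *toy) insert(h0 int, h1 uint16) error {
	s := sel(f.runends, rank(f.occupieds, h0))
	if h0 > s {
		f.remainders[h0], f.runends[h0] = h1, true
	} else {
		s++
		n := f.firstUnused(s)
		if n >= len(f.runends) {
			return errFull
		}
		for ; n > s; n-- {
			f.remainders[n], f.runends[n] = f.remainders[n-1], f.runends[n-1]
		}
		f.remainders[s] = h1
		if f.occupieds[h0] {
			f.runends[s-1] = false // the run for h0 now ends at s
		}
		f.runends[s] = true
	}
	f.occupieds[h0] = true
	return nil
}

func main() {
	f := newToy(8)
	for _, e := range [][2]int{{2, 5}, {2, 7}, {3, 9}, {2, 1}} {
		if err := f.insert(e[0], uint16(e[1])); err != nil {
			fmt.Println(err)
		}
	}
	fmt.Println(f.remainders, f.runends)
}
```

After the four inserts, the run for quotient 2 occupies slots 2 to 4 and the single remainder for quotient 3 has been shifted to slot 5.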

func (*Rsqf) MayContain ¶

func (q *Rsqf) MayContain(x []byte)

MayContain tests if the hash exists in this filter. False positives are possible; however, false negatives cannot occur.

func MayContain(Q, x)

	b <- h0(x)
	if Q.occupieds[b] == 0 then
		return 0
	t <- rank(Q.occupieds, b)
	l <- select(Q.runends, t)
	v <- h1(x)
	repeat
		if Q.remainders[l] == v then
			return 1
		l <- l - 1
	until l < b or Q.runends[l] == 1
	return 0
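
The lookup walk can be tried on a hand-built state. This sketch is an unpacked model rather than the package's code: `toy`, `rank` and `sel` are hypothetical names, `rank` is assumed inclusive, and `sel` is assumed to take a 1-based count. The state in `main` places the run for quotient 2 in slots 2 to 4 and the run for quotient 3 in slot 5.

```go
package main

import "fmt"

// toy is an unpacked stand-in for Q with one slice element per slot.
type toy struct {
	occupieds, runends []bool
	remainders         []uint16
}

// rank counts the set entries in b[0..i].
func rank(b []bool, i int) int {
	c := 0
	for _, v := range b[:i+1] {
		if v {
			c++
		}
	}
	return c
}

// sel returns the index of the t-th set entry (1-based), or -1 if there is none.
func sel(b []bool, t int) int {
	for k, v := range b {
		if v {
			t--
			if t == 0 {
				return k
			}
		}
	}
	return -1
}

// mayContain scans the run for h0 backwards from its runend, stopping when it
// passes the home slot or reaches the end of the previous run.
func (f *toy) mayContain(h0 int, h1 uint16) bool {
	if !f.occupieds[h0] {
		return false
	}
	l := sel(f.runends, rank(f.occupieds, h0))
	for {
		if f.remainders[l] == h1 {
			return true
		}
		l--
		if l < h0 || f.runends[l] {
			return false
		}
	}
}

func main() {
	f := &toy{
		occupieds:  []bool{false, false, true, true, false, false},
		runends:    []bool{false, false, false, false, true, true},
		remainders: []uint16{0, 0, 5, 7, 1, 9},
	}
	fmt.Println(f.mayContain(2, 7), f.mayContain(3, 7))
}
```

Here remainder 7 is found in the run for quotient 2 but not in the run for quotient 3, because the backward scan for quotient 3 stops at the runend bit in slot 4.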

func (*Rsqf) Put ¶

func (q *Rsqf) Put(h0, h1 uint64)

Put treats the Remainders block as a block of memory.

func (*Rsqf) Put2 ¶

func (q *Rsqf) Put2(h0, h1 uint64)

Put2 treats each row in the Remainders block as a bit field for the associated bit position of a given remainder.
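
One possible reading of the Put2 layout is a bit-sliced block, where row k holds bit k of every slot's remainder. The sketch below illustrates that reading only; the `rows` type and its `put` and `get` methods are hypothetical, and the actual `Put2` may arrange the bits differently.

```go
package main

import "fmt"

const rBits = 9 // remainder bits per slot

// rows is a hypothetical bit-sliced remainders block for 64 slots:
// rows[k] holds bit k of each slot's remainder, one slot per bit position.
type rows [rBits]uint64

// put writes the low rBits bits of rem into the given slot (0..63).
func (b *rows) put(slot uint, rem uint64) {
	for k := uint(0); k < rBits; k++ {
		bit := (rem >> k) & 1
		b[k] = b[k]&^(1<<slot) | bit<<slot // clear the slot's bit, then set it
	}
}

// get reassembles the remainder stored in the given slot.
func (b *rows) get(slot uint) uint64 {
	var rem uint64
	for k := uint(0); k < rBits; k++ {
		rem |= ((b[k] >> slot) & 1) << k
	}
	return rem
}

func main() {
	var b rows
	b.put(3, 0x1A5)
	fmt.Printf("%#x\n", b.get(3))
}
```

With this layout a whole remainder spans nine words, so reads and writes touch every row, but each row lines up bit for bit with the 64-bit occupieds and runends vectors.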

Source Files ¶

rsqf.go
