cuckoo

Version: v1.0.6 (latest) | Published: Oct 4, 2023 | License: MIT | Imports: 5 | Imported by: 8

Repository

github.com/panmari/cuckoofilter

README

Cuckoo Filter

Well-tuned, production-ready cuckoo filter that performs best in class for low false positive rates (at around 0.01%). For details, see full evaluation.

Background

A cuckoo filter is a Bloom filter replacement for approximate set-membership queries. While Bloom filters are well-known, space-efficient data structures for answering queries like "is item x in the set?", they do not support deletion. Variants that enable deletion (such as counting Bloom filters) usually require much more space.

Cuckoo filters let you add and remove items dynamically. A cuckoo filter is based on cuckoo hashing (hence the name). It is essentially a cuckoo hash table that stores each key's fingerprint. Because cuckoo hash tables can be highly compact, a cuckoo filter can use less space than a conventional Bloom filter in applications that require low false positive rates (< 3%).
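
To make the "hash table of fingerprints" idea concrete, here is a minimal sketch of how an item can be mapped to a fingerprint and two candidate buckets, following the scheme described in the paper cited in this section. It is not this package's code: the FNV hash, the table size and every function name below are assumptions chosen for illustration.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// numBuckets is a hypothetical table size. It must be a power of two so that
// masking after the XOR keeps indices in range.
const numBuckets = 1 << 10

// hash64 is a stand-in hash (FNV-1a). The real package may use another hash.
func hash64(data []byte) uint64 {
	h := fnv.New64a()
	h.Write(data)
	return h.Sum64()
}

// fingerprint derives a non-zero 16-bit fingerprint from the item.
// Zero is reserved in this sketch to mark an empty slot.
func fingerprint(data []byte) uint16 {
	fp := uint16(hash64(data) >> 48)
	if fp == 0 {
		fp = 1
	}
	return fp
}

// primaryIndex maps the item to its first candidate bucket.
func primaryIndex(data []byte) uint64 {
	return hash64(data) & (numBuckets - 1)
}

// altIndex computes the other candidate bucket from a bucket index and the
// fingerprint alone, so an entry can be relocated without the original item.
func altIndex(i uint64, fp uint16) uint64 {
	fpHash := hash64([]byte{byte(fp), byte(fp >> 8)})
	return (i ^ fpHash) & (numBuckets - 1)
}

func main() {
	item := []byte("pizza")
	fp := fingerprint(item)
	i1 := primaryIndex(item)
	i2 := altIndex(i1, fp)
	fmt.Printf("fingerprint=%#04x buckets=(%d, %d)\n", fp, i1, i2)
	// Applying altIndex twice leads back to the starting bucket.
	fmt.Println(altIndex(i2, fp) == i1)
}
```

Because the alternate index depends only on the current index and the stored fingerprint, entries can be moved between their two buckets, and later deleted, without keeping the original key.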

"Cuckoo Filter: Better Than Bloom" by Bin Fan, Dave Andersen and Michael Kaminsky

Implementation details

The paper cited above leaves several parameters to choose. In this implementation:

1. Every element has 2 possible bucket indices
2. Buckets have a static size of 4 fingerprints
3. Fingerprints have a static size of 16 bits

The authors suggest that choices 1 and 2 are optimal. Choice 3 comes down to the desired false positive rate. Given a target false positive rate r and a bucket size b, they suggest choosing the fingerprint size f using

f >= log2(2b/r) bits

With the 16-bit fingerprint size in this repository, you can expect r ~= 0.0001. Other implementations use 8-bit fingerprints, which corresponds to a false positive rate of r ~= 0.03.
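
The numbers above can be checked by rearranging the bound to r = 2b / 2^f. The following is a small sketch of that arithmetic. The function names are made up for this example and are not part of the package.

```go
package main

import (
	"fmt"
	"math"
)

// falsePositiveRate returns the bound r = 2b / 2^f for bucket size b and
// fingerprint size f (in bits).
func falsePositiveRate(bucketSize, fingerprintBits int) float64 {
	return 2 * float64(bucketSize) / math.Pow(2, float64(fingerprintBits))
}

// minFingerprintBits returns the smallest integer f with f >= log2(2b/r).
func minFingerprintBits(bucketSize int, targetRate float64) int {
	return int(math.Ceil(math.Log2(2 * float64(bucketSize) / targetRate)))
}

func main() {
	fmt.Printf("%.6f\n", falsePositiveRate(4, 16)) // 0.000122
	fmt.Printf("%.6f\n", falsePositiveRate(4, 8))  // 0.031250
	fmt.Println(minFingerprintBits(4, 0.001))      // 13
}
```

With b = 4, 16 bits gives 8/65536, roughly 0.0001, and 8 bits gives 8/256, roughly 0.03, matching the figures quoted above.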

Example usage

import (
	"fmt"

	cuckoo "github.com/panmari/cuckoofilter"
)

func Example() {
	cf := cuckoo.NewFilter(1000)

	cf.Insert([]byte("pizza"))
	cf.Insert([]byte("tacos"))
	cf.Insert([]byte("tacos")) // Re-insertion is possible.

	fmt.Println(cf.Lookup([]byte("pizza")))
	fmt.Println(cf.Lookup([]byte("missing")))

	cf.Reset()
	fmt.Println(cf.Lookup([]byte("pizza")))
	// Output:
	// true
	// false
	// false
}

For more examples, see the example tests in the repository. Operations on a filter are not thread-safe by default; see the ThreadSafe example for using the filter concurrently.

Documentation

Overview

Package cuckoo provides a Cuckoo Filter, a Bloom filter replacement for approximated set-membership queries.

While Bloom filters are well-known, space-efficient data structures for answering queries like "is item x in the set?", they do not support deletion. Variants that enable deletion (such as counting Bloom filters) usually require much more space.

Cuckoo filters let you add and remove items dynamically. A cuckoo filter is based on cuckoo hashing (hence the name). It is essentially a cuckoo hash table that stores each key's fingerprint. Because cuckoo hash tables can be highly compact, a cuckoo filter can use less space than a conventional Bloom filter in applications that require low false positive rates (< 3%).

"Cuckoo Filter: Better Than Bloom" by Bin Fan, Dave Andersen and Michael Kaminsky (https://www.cs.cmu.edu/~dga/papers/cuckoo-conext2014.pdf)

Example

package main

import (
	"fmt"

	cuckoo "github.com/panmari/cuckoofilter"
)

func main() {
	cf := cuckoo.NewFilter(1000)

	cf.Insert([]byte("pizza"))
	cf.Insert([]byte("tacos"))
	cf.Insert([]byte("tacos")) // Re-insertion is possible.

	fmt.Println(cf.Lookup([]byte("pizza")))
	fmt.Println(cf.Lookup([]byte("missing")))

	cf.Reset()
	fmt.Println(cf.Lookup([]byte("pizza")))
}

Output:

true
false
false

Example (ThreadSafe)

package main

import (
	"fmt"
	"sync"

	cuckoo "github.com/panmari/cuckoofilter"
)

// Small wrapper around cuckoo filter making it thread safe.
type threadSafeFilter struct {
	cf *cuckoo.Filter
	mu sync.RWMutex
}

func (f *threadSafeFilter) insert(item []byte) {
	// Concurrent inserts need a Write lock.
	f.mu.Lock()
	defer f.mu.Unlock()
	f.cf.Insert(item)
}

func (f *threadSafeFilter) lookup(item []byte) bool {
	// Concurrent lookups need a read lock.
	f.mu.RLock()
	defer f.mu.RUnlock()
	return f.cf.Lookup(item)
}

func main() {
	cf := &threadSafeFilter{
		cf: cuckoo.NewFilter(1000),
	}

	var wg sync.WaitGroup
	// Insert items concurrently...
	for i := byte(0); i < 50; i++ {
		wg.Add(1)
		go func(item byte) {
			defer wg.Done()
			cf.insert([]byte{item})
		}(i)
	}

	// ...while also doing lookups concurrently.
	for i := byte(0); i < 100; i++ {
		wg.Add(1)
		go func(item byte) {
			defer wg.Done()
			// State is not well-defined here, so we can't define expectations.
			cf.lookup([]byte{item})
		}(i)
	}
	wg.Wait()

	// Simple lookups to verify initialization.
	fmt.Println(cf.lookup([]byte{1}))
	fmt.Println(cf.lookup([]byte{99}))

}

Output:

true
false

Index

type Filter
- func Decode(data []byte) (*Filter, error)
- func NewFilter(numElements uint) *Filter

Types

type Filter

type Filter struct {
	// contains filtered or unexported fields
}

Filter is a cuckoo filter: a probabilistic data structure for approximate set-membership queries.

func Decode

func Decode(data []byte) (*Filter, error)

Decode returns a Cuckoofilter from a byte slice created using Encode.

func NewFilter

func NewFilter(numElements uint) *Filter

NewFilter returns a new cuckoo filter sized for the given number of elements. When more elements than that are inserted, insertion speed drops significantly and insertions might fail altogether. A capacity of 1000000 is a reasonable default, which allocates about 2 MB on 64-bit machines.
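
The 2 MB figure follows from the parameters listed in the README: 4 fingerprints of 2 bytes each per bucket. The sketch below reproduces that estimate. Rounding the bucket count up to a power of two is an assumption made for this example; the package's actual sizing logic may differ, and all names here are hypothetical.

```go
package main

import "fmt"

const (
	bucketSize       = 4 // fingerprints per bucket
	fingerprintBytes = 2 // 16-bit fingerprints
)

// nextPow2 rounds n up to the next power of two (for n >= 1).
func nextPow2(n uint64) uint64 {
	p := uint64(1)
	for p < n {
		p <<= 1
	}
	return p
}

// estimateBytes approximates the size of the bucket array for numElements.
// Assumption: the bucket count is rounded up to a power of two.
func estimateBytes(numElements uint64) uint64 {
	buckets := nextPow2((numElements + bucketSize - 1) / bucketSize)
	return buckets * bucketSize * fingerprintBytes
}

func main() {
	fmt.Println(estimateBytes(1000000)) // 2097152 bytes, i.e. 2 MiB
}
```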

func (*Filter) Count

func (cf *Filter) Count() uint

Count returns the number of items in the filter.

func (*Filter) Delete

func (cf *Filter) Delete(data []byte) bool

Delete data from the filter. Returns true if the data was found and deleted.

Example

package main

import (
	"fmt"

	cuckoo "github.com/panmari/cuckoofilter"
)

func main() {
	cf := cuckoo.NewFilter(1000)

	cf.Insert([]byte("pizza"))
	cf.Insert([]byte("tacos"))

	fmt.Println(cf.Lookup([]byte("pizza")))

	cf.Delete([]byte("pizza"))
	fmt.Println(cf.Lookup([]byte("pizza")))
}

Output:

true
false

func (*Filter) Encode

func (cf *Filter) Encode() []byte

Encode returns a byte slice representing a Cuckoofilter.

func (*Filter) Insert

func (cf *Filter) Insert(data []byte) bool

Insert data into the filter. Returns false if insertion failed. After a failed insertion, the filter:

- might return false negatives
- is not guaranteed to delete items correctly

To increase the success rate of inserts, create a larger filter.

func (*Filter) LoadFactor (added in v0.0.6)

func (cf *Filter) LoadFactor() float64

LoadFactor returns the fraction of slots that are occupied.

func (*Filter) Lookup

func (cf *Filter) Lookup(data []byte) bool

Lookup returns true if data is in the filter.

Example

package main

import (
	"fmt"

	cuckoo "github.com/panmari/cuckoofilter"
)

func main() {
	cf := cuckoo.NewFilter(1000)

	cf.Insert([]byte("pizza"))
	cf.Insert([]byte("tacos"))

	fmt.Println(cf.Lookup([]byte("pizza")))
	fmt.Println(cf.Lookup([]byte("missing")))
}

Output:

true
false

func (*Filter) Reset

func (cf *Filter) Reset()

Reset removes all items from the filter, setting count to 0.
