bloom: github.com/yourbasic/bloom Index | Examples | Files

package bloom

import "github.com/yourbasic/bloom"

Package bloom provides a Bloom filter implementation.

Bloom filters

A Bloom filter is a fast and space-efficient probabilistic data structure used to test set membership.

A membership test returns either ”likely member” or ”definitely not a member”. Only false positives can occur: an element that has been added to the filter will always be identified as ”likely member”.

The probabilities of different outcomes of a membership test at a false-positives rate of 1/100 are:

Test(s)                 true     false
--------------------------------------
s has been added        1        0
s has not been added    0.01     0.99

Elements can be added, but not removed. With more elements in the filter, the probability of false positives increases.

Performance

A full filter with a false-positives rate of 1/p uses roughly 0.26ln(p) bytes per element and performs ⌈1.4ln(p)⌉ bit array lookups per test:

    p     bytes   lookups
-------------------------
    4      0.4       2
    8      0.5       3
   16      0.7       4
   32      0.9       5
   64      1.1       6
  128      1.3       7
  256      1.5       8
  512      1.6       9
 1024      1.8      10

Each membership test makes a single call to a 128-bit hash function. This improves speed without increasing the false-positives rate as shown by Kirsch and Mitzenmacher.

Limitations

This implementation is not intended for cryptographic use.

The internal data representation is different for big-endian and little-endian machines.

Typical use case

The Basics example contains a typcial use case: a blacklist of shady websites.

Build a blacklist of shady websites.

Code:

// Create a Bloom filter with room for 10000 elements
// at a false-positives rate less than 0.5 percent.
blacklist := bloom.New(10000, 200)

// Add an element to the filter.
url := "https://rascal.com"
blacklist.Add(url)

// Test for membership.
if blacklist.Test(url) {
    fmt.Println(url, "seems to be shady.")
} else {
    fmt.Println(url, "has not yet been added to our blacklist.")
}

Output:

https://rascal.com seems to be shady.

Index

Examples

Package Files

filter.go hash.go

type Filter Uses

type Filter struct {
    // contains filtered or unexported fields
}

Filter represents a Bloom filter.

func New Uses

func New(n int, p int) *Filter

New creates an empty Bloom filter with room for n elements at a false-positives rate less than 1/p.

func (*Filter) Add Uses

func (f *Filter) Add(s string) bool

Add adds s to the filter and tells if s was already a likely member.

func (*Filter) AddByte Uses

func (f *Filter) AddByte(b []byte) bool

AddByte adds b to the filter and tells if b was already a likely member.

func (*Filter) Count Uses

func (f *Filter) Count() int64

Count returns an estimate of the number of elements in the filter.

func (*Filter) Test Uses

func (f *Filter) Test(s string) bool

Test tells if s is a likely member of the filter. If true, s is probably a member; if false, s is definitely not a member.

func (*Filter) TestByte Uses

func (f *Filter) TestByte(b []byte) bool

TestByte tells if b is a likely member of the filter. If true, b is probably a member; if false, b is definitely not a member.

func (*Filter) Union Uses

func (f1 *Filter) Union(f2 *Filter) *Filter

Union returns a new Bloom filter that consists of all elements that belong to either f1 or f2. The two filters must be of the same size n and have the same false-positives rate p.

The resulting filter is the same as the filter created from scratch using the union of the two sets.

Compute the union of two filters.

Code:

// Create two Bloom filters, each with room for 1000 elements
// at a false-positives rate less than 1/100.
n, p := 1000, 100
f1 := bloom.New(n, p)
f2 := bloom.New(n, p)

// Add "0", "2", …, "498" to f1
for i := 0; i < n/2; i += 2 {
    f1.Add(strconv.Itoa(i))
}

// Add "1", "3", …, "499" to f2
for i := 1; i < n/2; i += 2 {
    f2.Add(strconv.Itoa(i))
}

// Compute the approximate size of f1 ∪ f2.
fmt.Println("f1 ∪ f2:", f1.Union(f2).Count())

Output:

f1 ∪ f2: 505

Package bloom imports 1 packages (graph). Updated 2017-12-20. Refresh now. Tools for package owners.