Documentation ¶
Overview ¶
Package bloom provides a Bloom filter implementation.
Bloom filters ¶
A Bloom filter is a fast and space-efficient probabilistic data structure used to test set membership.
A membership test returns either ”likely member” or ”definitely not a member”. Only false positives can occur: an element that has been added to the filter will always be identified as ”likely member”.
The probabilities of different outcomes of a membership test at a false-positives rate of 1/100 are:
Test(s) true false -------------------------------------- s has been added 1 0 s has not been added 0.01 0.99
Elements can be added, but not removed. With more elements in the filter, the probability of false positives increases.
Performance ¶
A full filter with a false-positives rate of 1/p uses roughly 0.26ln(p) bytes per element and performs ⌈1.4ln(p)⌉ bit array lookups per test:
p bytes lookups ------------------------- 4 0.4 2 8 0.5 3 16 0.7 4 32 0.9 5 64 1.1 6 128 1.3 7 256 1.5 8 512 1.6 9 1024 1.8 10
Each membership test makes a single call to a 128-bit hash function. This improves speed without increasing the false-positives rate as shown by Kirsch and Mitzenmacher.
Limitations ¶
This implementation is not intended for cryptographic use.
The internal data representation is different for big-endian and little-endian machines.
Typical use case ¶
The Basics example contains a typcial use case: a blacklist of shady websites.
Example (Basics) ¶
Build a blacklist of shady websites.
package main import ( "fmt" "github.com/yourbasic/bloom" ) func main() { // Create a Bloom filter with room for 10000 elements // at a false-positives rate less than 0.5 percent. blacklist := bloom.New(10000, 200) // Add an element to the filter. url := "https://rascal.com" blacklist.Add(url) // Test for membership. if blacklist.Test(url) { fmt.Println(url, "seems to be shady.") } else { fmt.Println(url, "has not yet been added to our blacklist.") } }
Output: https://rascal.com seems to be shady.
Index ¶
Examples ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type Filter ¶
type Filter struct {
// contains filtered or unexported fields
}
Filter represents a Bloom filter.
func New ¶
New creates an empty Bloom filter with room for n elements at a false-positives rate less than 1/p.
func (*Filter) Test ¶
Test tells if s is a likely member of the filter. If true, s is probably a member; if false, s is definitely not a member.
func (*Filter) TestByte ¶
TestByte tells if b is a likely member of the filter. If true, b is probably a member; if false, b is definitely not a member.
func (*Filter) Union ¶
Union returns a new Bloom filter that consists of all elements that belong to either f1 or f2. The two filters must be of the same size n and have the same false-positives rate p.
The resulting filter is the same as the filter created from scratch using the union of the two sets.
Example ¶
Compute the union of two filters.
package main import ( "fmt" "github.com/yourbasic/bloom" "strconv" ) func main() { // Create two Bloom filters, each with room for 1000 elements // at a false-positives rate less than 1/100. n, p := 1000, 100 f1 := bloom.New(n, p) f2 := bloom.New(n, p) // Add "0", "2", …, "498" to f1 for i := 0; i < n/2; i += 2 { f1.Add(strconv.Itoa(i)) } // Add "1", "3", …, "499" to f2 for i := 1; i < n/2; i += 2 { f2.Add(strconv.Itoa(i)) } // Compute the approximate size of f1 ∪ f2. fmt.Println("f1 ∪ f2:", f1.Union(f2).Count()) }
Output: f1 ∪ f2: 505