bloom

package module

v0.0.0-...-04a87e7 Latest Latest Go to latest Published: Jun 2, 2017 License: BSD-2-Clause Imports: 1 Imported by: 0

Details

Valid go.mod file

The Go module system was introduced in Go 1.11 and is the official dependency management solution for Go.
Redistributable license

Redistributable licenses place minimal restrictions on how software can be used, modified, and redistributed.
Tagged version

Modules with tagged versions give importers more predictable builds.
Stable version

When a project reaches major version v1 it is considered stable.
Learn more about best practices

Repository

github.com/yourbasic/bloom

Links

Open Source Insights

README ¶

Your basic Bloom filter

Golang probabilistic set data structure

A Bloom filter is a fast and space-efficient probabilistic data structure used to test set membership. A membership test returns either ”likely member” or ”definitely not a member”.

Neutral density filter

Image by Robert Emperley, CC BY-SA 2.0.

Installation

Once you have installed Go, run this command to install the bloom package:

go get github.com/yourbasic/bloom

Documentation

There is an online reference for the package at godoc.org/github.com/yourbasic/bloom.

Roadmap

The API of this library is frozen.
Version numbers adhere to semantic versioning.

The only accepted reason to modify the API of this package is to handle issues that can't be resolved in any other reasonable way.

Thanks

Thanks to Sébastien Paolacci for his excellent MurmurHash implementation.

Stefan Nilsson – korthaj

Documentation ¶

Overview ¶

Package bloom provides a Bloom filter implementation.

Bloom filters ¶

A Bloom filter is a fast and space-efficient probabilistic data structure used to test set membership.

A membership test returns either ”likely member” or ”definitely not a member”. Only false positives can occur: an element that has been added to the filter will always be identified as ”likely member”.

The probabilities of different outcomes of a membership test at a false-positives rate of 1/100 are:

Test(s)                 true     false
--------------------------------------
s has been added        1        0
s has not been added    0.01     0.99

Elements can be added, but not removed. With more elements in the filter, the probability of false positives increases.

Performance ¶

A full filter with a false-positives rate of 1/p uses roughly 0.26ln(p) bytes per element and performs ⌈1.4ln(p)⌉ bit array lookups per test:

    p     bytes   lookups
-------------------------
    4      0.4       2
    8      0.5       3
   16      0.7       4
   32      0.9       5
   64      1.1       6
  128      1.3       7
  256      1.5       8
  512      1.6       9
 1024      1.8      10

Each membership test makes a single call to a 128-bit hash function. This improves speed without increasing the false-positives rate as shown by Kirsch and Mitzenmacher.

Limitations ¶

This implementation is not intended for cryptographic use.

The internal data representation is different for big-endian and little-endian machines.

Typical use case ¶

The Basics example contains a typcial use case: a blacklist of shady websites.

Example (Basics) ¶

Build a blacklist of shady websites.

package main

import (
	"fmt"
	"github.com/yourbasic/bloom"
)

func main() {
	// Create a Bloom filter with room for 10000 elements
	// at a false-positives rate less than 0.5 percent.
	blacklist := bloom.New(10000, 200)

	// Add an element to the filter.
	url := "https://rascal.com"
	blacklist.Add(url)

	// Test for membership.
	if blacklist.Test(url) {
		fmt.Println(url, "seems to be shady.")
	} else {
		fmt.Println(url, "has not yet been added to our blacklist.")
	}
}

Output:

https://rascal.com seems to be shady.

Examples ¶

Package (Basics)
Filter.Union

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

This section is empty.

Types ¶

type Filter ¶

type Filter struct {
	// contains filtered or unexported fields
}

Filter represents a Bloom filter.

func New ¶

func New(n int, p int) *Filter

New creates an empty Bloom filter with room for n elements at a false-positives rate less than 1/p.

func (*Filter) Add ¶

func (f *Filter) Add(s string) bool

Add adds s to the filter and tells if s was already a likely member.

func (*Filter) AddByte ¶

func (f *Filter) AddByte(b []byte) bool

AddByte adds b to the filter and tells if b was already a likely member.

func (*Filter) Count ¶

func (f *Filter) Count() int64

Count returns an estimate of the number of elements in the filter.

func (*Filter) Test ¶

func (f *Filter) Test(s string) bool

Test tells if s is a likely member of the filter. If true, s is probably a member; if false, s is definitely not a member.

func (*Filter) TestByte ¶

func (f *Filter) TestByte(b []byte) bool

TestByte tells if b is a likely member of the filter. If true, b is probably a member; if false, b is definitely not a member.

func (*Filter) Union ¶

func (f1 *Filter) Union(f2 *Filter) *Filter

Union returns a new Bloom filter that consists of all elements that belong to either f1 or f2. The two filters must be of the same size n and have the same false-positives rate p.

The resulting filter is the same as the filter created from scratch using the union of the two sets.

Example ¶

Compute the union of two filters.

package main

import (
	"fmt"
	"github.com/yourbasic/bloom"
	"strconv"
)

func main() {
	// Create two Bloom filters, each with room for 1000 elements
	// at a false-positives rate less than 1/100.
	n, p := 1000, 100
	f1 := bloom.New(n, p)
	f2 := bloom.New(n, p)

	// Add "0", "2", …, "498" to f1
	for i := 0; i < n/2; i += 2 {
		f1.Add(strconv.Itoa(i))
	}

	// Add "1", "3", …, "499" to f2
	for i := 1; i < n/2; i += 2 {
		f2.Add(strconv.Itoa(i))
	}

	// Compute the approximate size of f1 ∪ f2.
	fmt.Println("f1 ∪ f2:", f1.Union(f2).Count())
}

Output:

f1 ∪ f2: 505

Source Files ¶

View all Source files

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL