bloom

package module
v0.0.0-...-04a87e7 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jun 2, 2017 License: BSD-2-Clause Imports: 1 Imported by: 0

README

Your basic Bloom filter

Golang probabilistic set data structure

A Bloom filter is a fast and space-efficient probabilistic data structure used to test set membership. A membership test returns either ”likely member” or ”definitely not a member”.

Neutral density filter

Image by Robert Emperley, CC BY-SA 2.0.

Installation

Once you have installed Go, run this command to install the bloom package:

go get github.com/yourbasic/bloom
Documentation

There is an online reference for the package at godoc.org/github.com/yourbasic/bloom.

Roadmap

The only accepted reason to modify the API of this package is to handle issues that can't be resolved in any other reasonable way.

Thanks

Thanks to Sébastien Paolacci for his excellent MurmurHash implementation.

Stefan Nilsson – korthaj

Documentation

Overview

Package bloom provides a Bloom filter implementation.

Bloom filters

A Bloom filter is a fast and space-efficient probabilistic data structure used to test set membership.

A membership test returns either ”likely member” or ”definitely not a member”. Only false positives can occur: an element that has been added to the filter will always be identified as ”likely member”.

The probabilities of different outcomes of a membership test at a false-positives rate of 1/100 are:

Test(s)                 true     false
--------------------------------------
s has been added        1        0
s has not been added    0.01     0.99

Elements can be added, but not removed. With more elements in the filter, the probability of false positives increases.

Performance

A full filter with a false-positives rate of 1/p uses roughly 0.26ln(p) bytes per element and performs ⌈1.4ln(p)⌉ bit array lookups per test:

    p     bytes   lookups
-------------------------
    4      0.4       2
    8      0.5       3
   16      0.7       4
   32      0.9       5
   64      1.1       6
  128      1.3       7
  256      1.5       8
  512      1.6       9
 1024      1.8      10

Each membership test makes a single call to a 128-bit hash function. This improves speed without increasing the false-positives rate as shown by Kirsch and Mitzenmacher.

Limitations

This implementation is not intended for cryptographic use.

The internal data representation is different for big-endian and little-endian machines.

Typical use case

The Basics example contains a typcial use case: a blacklist of shady websites.

Example (Basics)

Build a blacklist of shady websites.

package main

import (
	"fmt"
	"github.com/yourbasic/bloom"
)

func main() {
	// Create a Bloom filter with room for 10000 elements
	// at a false-positives rate less than 0.5 percent.
	blacklist := bloom.New(10000, 200)

	// Add an element to the filter.
	url := "https://rascal.com"
	blacklist.Add(url)

	// Test for membership.
	if blacklist.Test(url) {
		fmt.Println(url, "seems to be shady.")
	} else {
		fmt.Println(url, "has not yet been added to our blacklist.")
	}
}
Output:

https://rascal.com seems to be shady.

Index

Examples

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Filter

type Filter struct {
	// contains filtered or unexported fields
}

Filter represents a Bloom filter.

func New

func New(n int, p int) *Filter

New creates an empty Bloom filter with room for n elements at a false-positives rate less than 1/p.

func (*Filter) Add

func (f *Filter) Add(s string) bool

Add adds s to the filter and tells if s was already a likely member.

func (*Filter) AddByte

func (f *Filter) AddByte(b []byte) bool

AddByte adds b to the filter and tells if b was already a likely member.

func (*Filter) Count

func (f *Filter) Count() int64

Count returns an estimate of the number of elements in the filter.

func (*Filter) Test

func (f *Filter) Test(s string) bool

Test tells if s is a likely member of the filter. If true, s is probably a member; if false, s is definitely not a member.

func (*Filter) TestByte

func (f *Filter) TestByte(b []byte) bool

TestByte tells if b is a likely member of the filter. If true, b is probably a member; if false, b is definitely not a member.

func (*Filter) Union

func (f1 *Filter) Union(f2 *Filter) *Filter

Union returns a new Bloom filter that consists of all elements that belong to either f1 or f2. The two filters must be of the same size n and have the same false-positives rate p.

The resulting filter is the same as the filter created from scratch using the union of the two sets.

Example

Compute the union of two filters.

package main

import (
	"fmt"
	"github.com/yourbasic/bloom"
	"strconv"
)

func main() {
	// Create two Bloom filters, each with room for 1000 elements
	// at a false-positives rate less than 1/100.
	n, p := 1000, 100
	f1 := bloom.New(n, p)
	f2 := bloom.New(n, p)

	// Add "0", "2", …, "498" to f1
	for i := 0; i < n/2; i += 2 {
		f1.Add(strconv.Itoa(i))
	}

	// Add "1", "3", …, "499" to f2
	for i := 1; i < n/2; i += 2 {
		f2.Add(strconv.Itoa(i))
	}

	// Compute the approximate size of f1 ∪ f2.
	fmt.Println("f1 ∪ f2:", f1.Union(f2).Count())
}
Output:

f1 ∪ f2: 505

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL