bloomfilter

package module
v0.0.0-...-132606f Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Mar 31, 2014 License: MIT Imports: 5 Imported by: 0

README

Bloom Filter

A simple data structure that can be used to test set membership. It's main advantage is that it is space efficient compared to structures like hash tables or basic arrays. The downside however is that it can result in false positives, though false negatives are not possible. It's time complexity is dependent on the number of hash functions used.

The basic structure is an array of m bits with k different hash functions. When an element is added or checked, it is run through the hash functions. The results of the hash identify positions in the bit array. The structure can then set the bit to add an element or check the bit for existence of the element.

Usage

Call go get github.com/asadmshah/bloomfilter to install the package. Using it is just as straightforward:

package main

import (
	"fmt"
	"github.com/asadmshah/bloomfilter"
)

func main() {
	n := 500000
	p := 0.01
	e := []byte("Hello World")
	filter := NewBloomFilter(n, p)
	filter.Add(e)
	fmt.Println(filter.Has(e))
}

The NewBloomFilter constructor requires two variables: n the expected number of elements and p the acceptable false positive rate. Add and Has, both require []byte type as their argument. This package does provide a ToBytes helper function for converting anything to bytes but it's probably very inefficient compared to using something like encoding/binary when converting numbers.

By default, the constructor will evaluate the optimum number of bits and hash functions based on the arguments provided. If you'd like more control, the BloomFilter type provides methods SetK and SetM for manually setting them. Calling these methods will automatically clear out any existing data.

Hashing

You're free to change the default Hashing algorithm to any that satisfies the hash.Hash64 interface. The FNV1a hash provided by the standard library is the default one in use. Based on this paper, only two hash functions are required to create a Bloom filter as long as k hashes are generated using gi(x) = h1(x) + (h2(x) * i) with i running from 0:k-1. We can use a single 64-bit hash function to mock two different hash functions by splitting the bytes in half, hence the need for filter.H to satisfy the interface for hash.Hash64 instead of hash.Hash32, from here.

Documentation

Overview

Package bloomfilter provides a space efficient data structure called a bloom filter. It's main advantage is that it is space efficient compared to structures like hash tables or basic arrays. The downside however is that it can result in false positives,/ though false negatives are not possible. It's time complexity is dependent on/ the number of hash functions used.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func ToBytes

func ToBytes(x interface{}) []byte

ToBytes is a helper function to convert any type to an array of bytes. This can be used to convert numbers, but using encoding/binary is much more efficient.

Types

type BitSet

type BitSet []uint8

BitSet is an array of 8-bit integers.

func NewBitSet

func NewBitSet(n uint) BitSet

NewBitSet creates a new BitSet with size n.

func (BitSet) Get

func (self BitSet) Get(x uint) bool

Get returns true if the bit at index x is set to 1.

func (BitSet) Set

func (self BitSet) Set(x uint)

Set sets the bit at index x to 1.

type BloomFilter

type BloomFilter struct {
	H hash.Hash64
	// contains filtered or unexported fields
}

BloomFilter represents a basic bloom filter.

func NewBloomFilter

func NewBloomFilter(n int, p float64) *BloomFilter

NewBloomFilter returns an empty bloom filter. It chooses the optimum number of hash functions, k, and bits, m, to be used based on the given expected number of elements, n, and the acceptable false positive rate.

func (*BloomFilter) Add

func (self *BloomFilter) Add(member []byte)

Add adds a member to the set.

func (*BloomFilter) CalculateK

func (self *BloomFilter) CalculateK() float64

CalculateK returns the number of hashes required in order to satisfy the false positive rate.

func (*BloomFilter) CalculateM

func (self *BloomFilter) CalculateM() float64

CalculateM returns the number of bits required in order to satisfy the false positive rate.

func (*BloomFilter) CalculateP

func (self *BloomFilter) CalculateP() float64

CalculateP returns the expected false positive rate based on the number of hash functions and the size of the bitset.

func (*BloomFilter) Clear

func (self *BloomFilter) Clear()

Clear generates and sets a new empty bitset.

func (*BloomFilter) Has

func (self *BloomFilter) Has(member []byte) bool

Has returns true if the member exists in the set.

func (*BloomFilter) K

func (self *BloomFilter) K() uint

K returns the number of hash functions to be used in each operation.

func (*BloomFilter) M

func (self *BloomFilter) M() uint

M returns the size of the set.

func (*BloomFilter) P

func (self *BloomFilter) P() float64

P returns the false positive rate.

func (*BloomFilter) SetK

func (self *BloomFilter) SetK(k uint)

SetK can be used to manually set the number of hash functions. This will empty out the set, so use this before inserting any items.

func (*BloomFilter) SetM

func (self *BloomFilter) SetM(m uint)

SetM can be used to manually set the number of bits. This will empty out the set, so use this before inserting any items.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL