bloomfilter

package module
v0.0.0-...-c08d238 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Feb 23, 2017 License: BSD-3-Clause Imports: 5 Imported by: 0

README

License BSD Go Report Card GoDoc Build Status

bloomfilter

This package implements a bloom filter in Go.

http://en.wikipedia.org/wiki/Bloom_filter

provides an explanation of what a bloom filter is. Essentially it is a probabilistic memebership function with good size characteristics. For example, we may wish to read in the words from the dictionary file into the filter. Over 99% of the words can be entered into the filter before a collision occurs.

The approach in this package for hashing items into the filter is to obtain the 160 bit SHA1 hash of the original input item, which should give a good distribution. Then, this 160 bit value is decomposed into five 32-bit integers which are then used as modulo (wrapping) offsets into a BitSet (the storage mechanism used by this package). The bits at those offsets are set to true (1).

Should an input token ever hash to five locations that are already set in the BitSet, it will be considered a collision (false positive). Experiment with size settings that minimize collisions.

The size argument provided to the constructor is a desired size in bits for the BitSet used by the bloom filter as a storage mechanism. This value is rounded up to a byte boundary.

Documentation

Overview

Package bloomfilter implemented with SHA1 for hashing.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type BloomFilter

type BloomFilter struct {
	// contains filtered or unexported fields
}

BloomFilter is implemented using the bitset package.

func New

func New(n uint32) *BloomFilter

New is an alias for NewBloomFilter.

func NewBloomFilter

func NewBloomFilter(n uint32) *BloomFilter

NewBloomFilter will construct a new BloomFilter intended to model n bits. The BitSet constructor will round that number up to the next byte boundary. The BitSet should be adequately compact. Values written into the bloom filter will use modulo to determine the index to set...meaning, overflow indexes will wrap. The BitSet is already concurrent safe through the use of RWMutex. Note: each entry into the filter sets five values, so having n below be less than five is nonsensical

func (*BloomFilter) Read

func (b *BloomFilter) Read(sha1Ints SHA1Ints) (FilterVals, bool, error)

Read the filter values for the modulo offsets for the SHA1Ints, and also send back a convenience bool to indicate if they were all true or not

func (*BloomFilter) Size

func (b *BloomFilter) Size() int

Size will return the size of the underlying BitSet. May be greater than the arg provided to the constructor...the BitSet package rounds up to a byte boundary.

func (*BloomFilter) Write

func (b *BloomFilter) Write(sha1Ints SHA1Ints) (bool, error)

Write shall enter a true (1) value into the underlying BitSet at the modulo offsets described by the sha1Ints (five 32-bit ints). Returns a boolean indicating if there was a collision in the filter (meaning all indexes to be set were already set to true)

type FilterVals

type FilterVals [5]bool

FilterVals are filter values corresponding to offsets derived from the SHA1-ints.

type SHA1Ints

type SHA1Ints [5]uint32

SHA1Ints is 160 bits which we can decompose into 5 32-bit ints.

func GetSHA1Ints

func GetSHA1Ints(s string) (SHA1Ints, error)

GetSHA1Ints will calculate the sha1 hash of a string. From this 160 bit hash, the five 32 bit ints are returned.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL