filter

package module
v0.0.0-...-68e4ff5 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Feb 3, 2015 License: BSD-3-Clause Imports: 7 Imported by: 0

README

Inverse Bloom Filter

An inverse Bloom filter, or "the opposite of a Bloom filter", is a concurrent, probabilistic data structure used to test whether an item has been observed or not. This is a Go implementation, originally by Jeff Hodges, which replaces the use of MD5 hashing with a non-cryptographic FNV-1a function.

The inverse filter may report a false negative but can never report a false positive. That is, it may report that an item has not been seen when it actually has, but it will never report an item as seen which it hasn't come across. This behaves in a similar manner to a fixed-size hashmap which does not handle conflicts.

An example use case is de-duplicating events while processing a stream of data. Ideally, duplicate events are relatively close together.

Documentation

Overview

Package filter implements a concurrent "inverse" Bloom filter, which is effectively the opposite of a classical Bloom filter. It may report a false negative but can never report a false positive. That is, it may report that an item has not been seen when it actually has, but it will never report an item as seen which it hasn't come across. This behaves in a similar manner to a fixed-size hashmap which does not handle conflicts. An example use case is de-duplicating events while processing a stream of data. Ideally, duplicate events are relatively close together.

Credits go to Jeff Hodges (http://www.somethingsimilar.com/2012/05/21/the-opposite-of-a-bloom-filter/).

Index

Constants

This section is empty.

Variables

View Source
var (
	// ErrSizeTooLarge is returned by NewSet when the specified size is too
	// large to allocate.
	ErrSizeTooLarge = errors.New("Size given too large to round to a power of 2")

	// ErrSizeTooSmall is returned by NewSet when the specified size is less
	// than or equal to zero.
	ErrSizeTooSmall = errors.New("Cannot have a zero or negative size")

	// MaxSize indicates the largest possible filter size.
	MaxSize = 1 << 30
)

Functions

This section is empty.

Types

type InverseBloomFilter

type InverseBloomFilter struct {
	// contains filtered or unexported fields
}

InverseBloomFilter is a concurrent, probabilistic data structure which can be thought of as the "opposite" of a Bloom filter. It may report a false negative but can never report a false positive.

func NewInverseBloomFilter

func NewInverseBloomFilter(size int) (*InverseBloomFilter, error)

NewInverseBloomFilter creates and returns a new InverseBloomFilter with the specified capacity. It returns an error if the size is not between 0 and MaxSize.

func (*InverseBloomFilter) Observe

func (i *InverseBloomFilter) Observe(key []byte) bool

Observe marks a key as observed. It returns true if the key has been previously observed and false if the key has possibly not been observed yet. It may report a false negative but will never report a false positive. That is, it may return false even though the key was previously observed, but it will never return true for a key that has never been observed.

func (*InverseBloomFilter) Size

func (i *InverseBloomFilter) Size() int

Size returns the filter length.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL