package filter

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func OptimalK

func OptimalK(fpRate float64) uint

OptimalK calculates the optimal number of hash functions to use for a Bloom filter based on the desired rate of false positives.

func OptimalM

func OptimalM(n uint, fpRate float64) uint

OptimalM calculates the optimal Bloom filter size, m, based on the number of items and the desired rate of false positives.
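
The bodies of these helpers aren't shown on this page, but the textbook optima are m = ⌈−n·ln(p) / (ln 2)²⌉ and k = ⌈log₂(1/p)⌉. A minimal sketch under the assumption that these are the formulas used (the package's exact rounding may differ):

package main

import (
	"fmt"
	"math"
)

// optimalM mirrors the textbook sizing formula m = ceil(-n*ln(p) / (ln 2)^2).
func optimalM(n uint, fpRate float64) uint {
	return uint(math.Ceil(-float64(n) * math.Log(fpRate) / (math.Ln2 * math.Ln2)))
}

// optimalK mirrors k = ceil(log2(1/p)).
func optimalK(fpRate float64) uint {
	return uint(math.Ceil(math.Log2(1 / fpRate)))
}

func main() {
	fmt.Println(optimalM(10000, 0.01)) // 95851 bits
	fmt.Println(optimalK(0.01))        // 7
}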

Types

type BucketGetter

type BucketGetter interface {
	Get(bucket uint) uint32
}

type Buckets

type Buckets struct {
	// contains filtered or unexported fields
}

Buckets is a fast, space-efficient array of buckets where each bucket can store up to a configured maximum value.

func NewBuckets

func NewBuckets(count uint, bucketSize uint8) *Buckets

NewBuckets creates a new Buckets with the provided number of buckets where each bucket is the specified number of bits.

func (*Buckets) Count

func (b *Buckets) Count() uint

Count returns the number of buckets.

func (*Buckets) DecodeFrom

func (b *Buckets) DecodeFrom(data []byte) (int64, error)

DecodeFrom reads a binary representation of Buckets (such as might have been written by WriteTo()) from a buffer. Whereas ReadFrom() reads the entire data into memory and makes a copy of the data buffer, DecodeFrom keeps a reference to the original data buffer, so the resulting Buckets must be treated as read-only: it can serve Test-style lookups but must not be modified.

func (*Buckets) Get

func (b *Buckets) Get(bucket uint) uint32

Get returns the value in the specified bucket.

func (*Buckets) GobDecode

func (b *Buckets) GobDecode(data []byte) error

GobDecode implements gob.GobDecoder interface.

func (*Buckets) GobEncode

func (b *Buckets) GobEncode() ([]byte, error)

GobEncode implements gob.GobEncoder interface.

func (*Buckets) Increment

func (b *Buckets) Increment(bucket uint, delta int32) *Buckets

Increment will increment the value in the specified bucket by the provided delta. A bucket can be decremented by providing a negative delta. The value is clamped to zero and the maximum bucket value. Returns itself to allow for chaining.
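
For example, a short usage sketch of the clamping described above (the import path is a placeholder for wherever this package lives in your module):

package main

import (
	"fmt"

	filter "example.com/placeholder/filter" // hypothetical import path
)

func main() {
	b := filter.NewBuckets(100, 4) // 100 buckets of 4 bits each; max value is 15
	b.Increment(0, 20)             // delta exceeds the maximum, so the value clamps to 15
	fmt.Println(b.Get(0))          // 15
	b.Increment(0, -100)           // negative delta decrements; clamps to zero
	fmt.Println(b.Get(0))          // 0
}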

func (*Buckets) MaxBucketValue

func (b *Buckets) MaxBucketValue() uint8

MaxBucketValue returns the maximum value that can be stored in a bucket.

func (*Buckets) PopCount

func (b *Buckets) PopCount() (count int)

PopCount returns the number of set bits (the population count, or Hamming weight) of the underlying bit array.

func (*Buckets) ReadFrom

func (b *Buckets) ReadFrom(stream io.Reader) (int64, error)

ReadFrom reads a binary representation of Buckets (such as might have been written by WriteTo()) from an i/o stream. It returns the number of bytes read.

func (*Buckets) Reset

func (b *Buckets) Reset() *Buckets

Reset restores the Buckets to the original state. Returns itself to allow for chaining.

func (*Buckets) Set

func (b *Buckets) Set(bucket uint, value uint8) *Buckets

Set will set the bucket value. The value is clamped to zero and the maximum bucket value. Returns itself to allow for chaining.

func (*Buckets) WriteTo

func (b *Buckets) WriteTo(stream io.Writer) (int64, error)

WriteTo writes a binary representation of Buckets to an i/o stream. It returns the number of bytes written.
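
A round-trip sketch of the WriteTo/DecodeFrom pair (assuming, as is conventional, that decoding into a zero-value Buckets populates it; the import path is again a placeholder):

package main

import (
	"bytes"
	"fmt"
	"log"

	filter "example.com/placeholder/filter" // hypothetical import path
)

func main() {
	b := filter.NewBuckets(100, 4).Set(1, 9)

	var buf bytes.Buffer
	if _, err := b.WriteTo(&buf); err != nil {
		log.Fatal(err)
	}

	// DecodeFrom keeps a reference to buf's bytes rather than copying them,
	// so the decoded Buckets is read-only and the buffer must outlive it.
	var decoded filter.Buckets
	if _, err := decoded.DecodeFrom(buf.Bytes()); err != nil {
		log.Fatal(err)
	}
	fmt.Println(decoded.Get(1)) // 9
}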

type Checker

type Checker interface {
	// Test will test for membership of the data and returns true if it is a
	// member, false if not.
	Test(data []byte) bool
}

type Filter

type Filter interface {
	Checker

	// Add will add the data to the Bloom filter. It returns the filter to
	// allow for chaining.
	Add([]byte) Filter

	// TestAndAdd is equivalent to calling Test followed by Add. It returns
	// true if the data is a member, false if not.
	TestAndAdd([]byte) bool
}

Filter is a probabilistic data structure which is used to test the membership of an element in a set.

type PartitionedBloomFilter

type PartitionedBloomFilter struct {
	// contains filtered or unexported fields
}

PartitionedBloomFilter implements a variation of a classic Bloom filter as described by Almeida, Baquero, Preguica, and Hutchison in Scalable Bloom Filters:

http://gsd.di.uminho.pt/members/cbm/ps/dbloom.pdf

This filter works by partitioning the M-sized bit array into k slices of size m = M/k bits. Each hash function produces an index over m for its respective slice. Thus, each element is described by exactly k bits, meaning the distribution of false positives is uniform across all elements.
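
A hypothetical sketch of that layout (the names here are illustrative, not the package's internals):

// partitionIndexes maps one element's k hash values onto the k slices: with
// slice size m = M/k, the element sets exactly one bit per slice, so every
// element is described by exactly k bits.
func partitionIndexes(hashes []uint64, m uint64) []uint64 {
	idx := make([]uint64, len(hashes))
	for i, h := range hashes {
		idx[i] = h % m // bit index within slice i
	}
	return idx
}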

func NewPartitionedBloomFilter

func NewPartitionedBloomFilter(n uint, fpRate float64) *PartitionedBloomFilter

NewPartitionedBloomFilter creates a new partitioned Bloom filter optimized to store n items with a specified target false-positive rate.

func NewPartitionedBloomFilterWithCapacity

func NewPartitionedBloomFilterWithCapacity(m uint, fpRate float64) *PartitionedBloomFilter

NewPartitionedBloomFilterWithCapacity creates a new partitioned Bloom filter with a specific capacity, m, and the specified target false-positive rate.

func (*PartitionedBloomFilter) Add

func (p *PartitionedBloomFilter) Add(data []byte) Filter

Add will add the data to the Bloom filter. It returns the filter to allow for chaining.

func (*PartitionedBloomFilter) Capacity

func (p *PartitionedBloomFilter) Capacity() uint

Capacity returns the Bloom filter capacity, m.

func (*PartitionedBloomFilter) Count

func (p *PartitionedBloomFilter) Count() uint

Count returns the number of items added to the filter.

func (*PartitionedBloomFilter) DecodeFrom

func (p *PartitionedBloomFilter) DecodeFrom(data []byte) (int64, error)

DecodeFrom reads a binary representation of PartitionedBloomFilter (such as might have been written by WriteTo()) from a buffer. Whereas ReadFrom() calls Buckets.ReadFrom() hence making a copy of the data, DecodeFrom calls Buckets.DecodeFrom which keeps a reference to the original data buffer.

func (*PartitionedBloomFilter) EstimatedFillRatio

func (p *PartitionedBloomFilter) EstimatedFillRatio() float64

EstimatedFillRatio returns the current estimated ratio of set bits.

func (*PartitionedBloomFilter) FillRatio

func (p *PartitionedBloomFilter) FillRatio() float64

FillRatio returns the average ratio of set bits across all partitions.

func (*PartitionedBloomFilter) GobDecode

func (p *PartitionedBloomFilter) GobDecode(data []byte) error

GobDecode implements gob.GobDecoder interface.

func (*PartitionedBloomFilter) GobEncode

func (p *PartitionedBloomFilter) GobEncode() ([]byte, error)

GobEncode implements gob.GobEncoder interface.

func (*PartitionedBloomFilter) K

func (p *PartitionedBloomFilter) K() uint

K returns the number of hash functions.

func (*PartitionedBloomFilter) OptimalCount

func (p *PartitionedBloomFilter) OptimalCount() uint

OptimalCount returns the optimal number of distinct items that can be stored in this filter.

func (*PartitionedBloomFilter) ReadFrom

func (p *PartitionedBloomFilter) ReadFrom(stream io.Reader) (int64, error)

ReadFrom reads a binary representation of PartitionedBloomFilter (such as might have been written by WriteTo()) from an i/o stream. It returns the number of bytes read.

func (*PartitionedBloomFilter) Reset

func (p *PartitionedBloomFilter) Reset() *PartitionedBloomFilter

Reset restores the Bloom filter to its original state. It returns the filter to allow for chaining.

func (*PartitionedBloomFilter) SetHash

func (p *PartitionedBloomFilter) SetHash(h hash.Hash64)

SetHash sets the hashing function used in the filter. For the effect on false positive rates see: https://github.com/tylertreat/BoomFilters/pull/1

func (*PartitionedBloomFilter) Test

func (p *PartitionedBloomFilter) Test(data []byte) bool

Test will test for membership of the data and returns true if it is a member, false if not. This is a probabilistic test, meaning there is a non-zero probability of false positives but a zero probability of false negatives. Due to the way the filter is partitioned, the probability of false positives is uniformly distributed across all elements.

func (*PartitionedBloomFilter) TestAndAdd

func (p *PartitionedBloomFilter) TestAndAdd(data []byte) bool

TestAndAdd is equivalent to calling Test followed by Add. It returns true if the data is a member, false if not.

func (*PartitionedBloomFilter) UpdateCount

func (p *PartitionedBloomFilter) UpdateCount() float64

UpdateCount re-estimates the number of distinct items in the filter. Since duplicates can be added to a Bloom filter, the count is updated with the estimate n ≈ −m ln(1 − p) from https://gsd.di.uminho.pt/members/cbm/ps/dbloom.pdf, where p is the exact fill ratio. This prevents the count from drifting far from the true value when duplicates are added. NOTE: this calls FillRatio, which computes the Hamming weight across all bits and can be relatively expensive. It returns the current exact fill ratio.
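
A sketch of that estimate in isolation (illustrative only, not the package's code):

import "math"

// estimateCount applies n ≈ -m*ln(1-p) for a filter of m bits whose exact
// fill ratio is p.
func estimateCount(m uint, p float64) float64 {
	return -float64(m) * math.Log(1 - p)
}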

func (*PartitionedBloomFilter) WriteTo

func (p *PartitionedBloomFilter) WriteTo(stream io.Writer) (int64, error)

WriteTo writes a binary representation of the PartitionedBloomFilter to an i/o stream. It returns the number of bytes written.

type ScalableBloomFilter

type ScalableBloomFilter struct {
	// contains filtered or unexported fields
}

ScalableBloomFilter implements a Scalable Bloom Filter as described by Almeida, Baquero, Preguica, and Hutchison in Scalable Bloom Filters:

http://gsd.di.uminho.pt/members/cbm/ps/dbloom.pdf

A Scalable Bloom Filter dynamically adapts to the number of elements in the data set while enforcing a tight upper bound on the false-positive rate. This works by adding Bloom filters with geometrically decreasing false-positive rates as filters become full. The tightening ratio, r, controls the filter growth. The compounded probability over the whole series converges to a target value, even accounting for an infinite series.
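
Concretely, if the i-th filter in the series is assigned false-positive rate fp₀·rⁱ with 0 < r < 1, the compounded rate is bounded by Σ fp₀·rⁱ = fp₀/(1 − r), which stays finite even for an infinite series. A small illustrative sketch (not the package's code):

import "math"

// seriesFPRate returns the false-positive rate assigned to the i-th filter in
// the series: fp0 * r^i. Summed over all i this is bounded by fp0 / (1 - r).
func seriesFPRate(fp0, r float64, i int) float64 {
	return fp0 * math.Pow(r, float64(i))
}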

Scalable Bloom Filters are useful for cases where the size of the data set isn't known a priori and memory constraints aren't of particular concern. For situations where memory is bounded, consider using Inverse or Stable Bloom Filters.

func NewDefaultScalableBloomFilter

func NewDefaultScalableBloomFilter(fpRate float64) *ScalableBloomFilter

NewDefaultScalableBloomFilter creates a new Scalable Bloom Filter with the specified target false-positive rate and an optimal tightening ratio.

func NewScalableBloomFilter

func NewScalableBloomFilter(hint uint, fpRate, r float64) *ScalableBloomFilter

NewScalableBloomFilter creates a new Scalable Bloom Filter with the specified target false-positive rate and tightening ratio. Use NewDefaultScalableBloomFilter if you don't want to calculate these parameters.
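
A minimal usage sketch (the import path is a placeholder, and r = 0.8 is shown purely as an example tightening ratio):

package main

import (
	"fmt"

	filter "example.com/placeholder/filter" // hypothetical import path
)

func main() {
	sbf := filter.NewScalableBloomFilter(10000, 0.01, 0.8) // hint, fpRate, tightening ratio r
	sbf.Add([]byte("foo"))
	fmt.Println(sbf.Test([]byte("foo")))       // true
	fmt.Println(sbf.TestAndAdd([]byte("bar"))) // false; "bar" is now added
}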

func (*ScalableBloomFilter) Add

func (s *ScalableBloomFilter) Add(data []byte) Filter

Add will add the data to the Bloom filter. It returns the filter to allow for chaining.

func (*ScalableBloomFilter) Capacity

func (s *ScalableBloomFilter) Capacity() uint

Capacity returns the current Scalable Bloom Filter capacity, which is the sum of the capacities for the contained series of Bloom filters.

func (*ScalableBloomFilter) DecodeFrom

func (s *ScalableBloomFilter) DecodeFrom(data []byte) (int64, error)

DecodeFrom reads a binary representation of ScalableBloomFilter (such as might have been written by WriteTo()) from a buffer. Whereas ReadFrom() calls PartitionedBloomFilter.ReadFrom() hence making a copy of the data, DecodeFrom calls PartitionedBloomFilter.DecodeFrom which keeps a reference to the original data buffer.

func (*ScalableBloomFilter) FillRatio

func (s *ScalableBloomFilter) FillRatio() float64

FillRatio returns the average ratio of set bits across every filter.

func (*ScalableBloomFilter) GobDecode

func (s *ScalableBloomFilter) GobDecode(data []byte) error

GobDecode implements gob.GobDecoder interface.

func (*ScalableBloomFilter) GobEncode

func (s *ScalableBloomFilter) GobEncode() ([]byte, error)

GobEncode implements gob.GobEncoder interface.

func (*ScalableBloomFilter) K

func (s *ScalableBloomFilter) K() uint

K returns the number of hash functions used in each Bloom filter. Because successive filters in the series use more hash functions, this returns the highest value, i.e. that of the last filter.

func (*ScalableBloomFilter) ReadFrom

func (s *ScalableBloomFilter) ReadFrom(stream io.Reader) (int64, error)

ReadFrom reads a binary representation of ScalableBloomFilter (such as might have been written by WriteTo()) from an i/o stream. It returns the number of bytes read.

func (*ScalableBloomFilter) Reset

func (s *ScalableBloomFilter) Reset() *ScalableBloomFilter

Reset restores the Bloom filter to its original state. It returns the filter to allow for chaining.

func (*ScalableBloomFilter) SetHash

func (s *ScalableBloomFilter) SetHash(h hash.Hash64)

SetHash sets the hashing function used in the filter. For the effect on false positive rates see: https://github.com/tylertreat/BoomFilters/pull/1

func (*ScalableBloomFilter) Test

func (s *ScalableBloomFilter) Test(data []byte) bool

Test will test for membership of the data and returns true if it is a member, false if not. This is a probabilistic test, meaning there is a non-zero probability of false positives but a zero probability of false negatives.

func (*ScalableBloomFilter) TestAndAdd

func (s *ScalableBloomFilter) TestAndAdd(data []byte) bool

TestAndAdd is equivalent to calling Test followed by Add. It returns true if the data is a member, false if not.

func (*ScalableBloomFilter) WriteTo

func (s *ScalableBloomFilter) WriteTo(stream io.Writer) (int64, error)

WriteTo writes a binary representation of the ScalableBloomFilter to an i/o stream. It returns the number of bytes written.
