Documentation ¶
Index ¶
- func OptimalK(fpRate float64) uint
- func OptimalM(n uint, fpRate float64) uint
- type BucketGetter
- type Buckets
- func (b *Buckets) Count() uint
- func (b *Buckets) DecodeFrom(data []byte) (int64, error)
- func (b *Buckets) Get(bucket uint) uint32
- func (b *Buckets) GobDecode(data []byte) error
- func (b *Buckets) GobEncode() ([]byte, error)
- func (b *Buckets) Increment(bucket uint, delta int32) *Buckets
- func (b *Buckets) MaxBucketValue() uint8
- func (b *Buckets) PopCount() (count int)
- func (b *Buckets) ReadFrom(stream io.Reader) (int64, error)
- func (b *Buckets) Reset() *Buckets
- func (b *Buckets) Set(bucket uint, value uint8) *Buckets
- func (b *Buckets) WriteTo(stream io.Writer) (int64, error)
- type Checker
- type Filter
- type PartitionedBloomFilter
- func (p *PartitionedBloomFilter) Add(data []byte) Filter
- func (p *PartitionedBloomFilter) Capacity() uint
- func (p *PartitionedBloomFilter) Count() uint
- func (p *PartitionedBloomFilter) DecodeFrom(data []byte) (int64, error)
- func (p *PartitionedBloomFilter) EstimatedFillRatio() float64
- func (p *PartitionedBloomFilter) FillRatio() float64
- func (p *PartitionedBloomFilter) GobDecode(data []byte) error
- func (p *PartitionedBloomFilter) GobEncode() ([]byte, error)
- func (p *PartitionedBloomFilter) K() uint
- func (p *PartitionedBloomFilter) OptimalCount() uint
- func (p *PartitionedBloomFilter) ReadFrom(stream io.Reader) (int64, error)
- func (p *PartitionedBloomFilter) Reset() *PartitionedBloomFilter
- func (p *PartitionedBloomFilter) SetHash(h hash.Hash64)
- func (p *PartitionedBloomFilter) Test(data []byte) bool
- func (p *PartitionedBloomFilter) TestAndAdd(data []byte) bool
- func (p *PartitionedBloomFilter) UpdateCount() float64
- func (p *PartitionedBloomFilter) WriteTo(stream io.Writer) (int64, error)
- type ScalableBloomFilter
- func (s *ScalableBloomFilter) Add(data []byte) Filter
- func (s *ScalableBloomFilter) Capacity() uint
- func (s *ScalableBloomFilter) DecodeFrom(data []byte) (int64, error)
- func (s *ScalableBloomFilter) FillRatio() float64
- func (s *ScalableBloomFilter) GobDecode(data []byte) error
- func (s *ScalableBloomFilter) GobEncode() ([]byte, error)
- func (s *ScalableBloomFilter) K() uint
- func (s *ScalableBloomFilter) ReadFrom(stream io.Reader) (int64, error)
- func (s *ScalableBloomFilter) Reset() *ScalableBloomFilter
- func (s *ScalableBloomFilter) SetHash(h hash.Hash64)
- func (s *ScalableBloomFilter) Test(data []byte) bool
- func (s *ScalableBloomFilter) TestAndAdd(data []byte) bool
- func (s *ScalableBloomFilter) WriteTo(stream io.Writer) (int64, error)
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func OptimalK ¶
func OptimalK(fpRate float64) uint
func OptimalM ¶
func OptimalM(n uint, fpRate float64) uint
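OptimalK and OptimalM presumably follow the classic Bloom filter sizing formulas k = log₂(1/p) and m = n·ln(1/p)/(ln 2)². The sketch below reimplements those formulas directly; the exact rounding the package uses is an assumption, so treat the numbers as ballpark figures rather than the package's output.

```go
package main

import (
	"fmt"
	"math"
)

// optimalM returns the classic Bloom filter bit count for n items at
// false-positive rate p: m = ceil(n * ln(1/p) / (ln 2)^2).
func optimalM(n uint, p float64) uint {
	return uint(math.Ceil(float64(n) * math.Log(1/p) / (math.Ln2 * math.Ln2)))
}

// optimalK returns the hash-function count that minimizes the
// false-positive rate for that m: k = ceil(log2(1/p)).
func optimalK(p float64) uint {
	return uint(math.Ceil(math.Log2(1 / p)))
}

func main() {
	// Storing 1,000 items at a 1% false-positive rate needs roughly
	// 9.6 bits per item and 7 hash functions.
	fmt.Println(optimalM(1000, 0.01), optimalK(0.01)) // 9586 7
}
```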
Types ¶
type BucketGetter ¶
type Buckets ¶
type Buckets struct {
// contains filtered or unexported fields
}
Buckets is a fast, space-efficient array of buckets where each bucket can store up to a configured maximum value.
func NewBuckets ¶
NewBuckets creates a new Buckets with the provided number of buckets where each bucket is the specified number of bits.
func (*Buckets) DecodeFrom ¶
func (b *Buckets) DecodeFrom(data []byte) (int64, error)
DecodeFrom reads a binary representation of Buckets (such as might have been written by WriteTo()) from a buffer. Whereas ReadFrom() reads the entire data into memory and makes a copy of the data buffer, DecodeFrom keeps a reference to the original data buffer and can only be used to Test.
func (*Buckets) Increment ¶
func (b *Buckets) Increment(bucket uint, delta int32) *Buckets
Increment will increment the value in the specified bucket by the provided delta. A bucket can be decremented by providing a negative delta. The value is clamped to zero and the maximum bucket value. Returns itself to allow for chaining.
func (*Buckets) MaxBucketValue ¶
func (b *Buckets) MaxBucketValue() uint8
MaxBucketValue returns the maximum value that can be stored in a bucket.
func (*Buckets) ReadFrom ¶
func (b *Buckets) ReadFrom(stream io.Reader) (int64, error)
ReadFrom reads a binary representation of Buckets (such as might have been written by WriteTo()) from an i/o stream. It returns the number of bytes read.
func (*Buckets) Reset ¶
func (b *Buckets) Reset() *Buckets
Reset restores the Buckets to the original state. Returns itself to allow for chaining.
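The idea behind Buckets can be illustrated with a self-contained sketch (this is not the package's implementation): fixed-width counters packed into a byte slice, with Increment clamping to zero and the maximum bucket value. Four-bit buckets are assumed here, so each bucket holds 0 through 15.

```go
package main

import "fmt"

// toyBuckets packs 4-bit counters into a byte slice, two per byte.
type toyBuckets struct {
	data []byte
	max  uint8
}

func newToyBuckets(count uint) *toyBuckets {
	return &toyBuckets{data: make([]byte, (count+1)/2), max: 15}
}

// Get returns the value in bucket i (low nibble for even i, high for odd).
func (b *toyBuckets) Get(i uint) uint8 {
	if i%2 == 0 {
		return b.data[i/2] & 0x0f
	}
	return b.data[i/2] >> 4
}

func (b *toyBuckets) set(i uint, val uint8) {
	if i%2 == 0 {
		b.data[i/2] = b.data[i/2]&0xf0 | val
	} else {
		b.data[i/2] = b.data[i/2]&0x0f | val<<4
	}
}

// Increment adds delta to bucket i, clamping to [0, max] and returning
// the receiver for chaining, mirroring Buckets.Increment.
func (b *toyBuckets) Increment(i uint, delta int32) *toyBuckets {
	v := int32(b.Get(i)) + delta
	if v < 0 {
		v = 0
	}
	if v > int32(b.max) {
		v = int32(b.max)
	}
	b.set(i, uint8(v))
	return b
}

func main() {
	b := newToyBuckets(8)
	b.Increment(3, 20).Increment(3, -1) // 20 clamps to 15, then decrements
	fmt.Println(b.Get(3))               // 14
}
```

Packing buckets this way is what makes the structure space-efficient: eight 4-bit buckets occupy four bytes rather than eight.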
type Filter ¶
type Filter interface {
	Checker

	// Add will add the data to the Bloom filter. It returns the filter to
	// allow for chaining.
	Add([]byte) Filter

	// TestAndAdd is equivalent to calling Test followed by Add. It returns
	// true if the data is a member, false if not.
	TestAndAdd([]byte) bool
}
Filter is a probabilistic data structure which is used to test the membership of an element in a set.
type PartitionedBloomFilter ¶
type PartitionedBloomFilter struct {
// contains filtered or unexported fields
}
PartitionedBloomFilter implements a variation of a classic Bloom filter as described by Almeida, Baquero, Preguica, and Hutchison in Scalable Bloom Filters:
http://gsd.di.uminho.pt/members/cbm/ps/dbloom.pdf
This filter works by partitioning the M-sized bit array into k slices of size m = M/k bits. Each hash function produces an index over m for its respective slice. Thus, each element is described by exactly k bits, meaning the distribution of false positives is uniform across all elements.
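A minimal, self-contained sketch of the partitioning scheme (not the package's implementation): the M-bit array is split into k slices of m bits each, and hash function i only ever sets or tests a bit inside slice i. Deriving the k indexes from one 64-bit FNV hash via double hashing is an assumed simplification, one common way to simulate k independent hash functions.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// toyPartitioned splits an M-bit array into k slices of m = M/k bits;
// hash function i only ever touches slice i.
type toyPartitioned struct {
	bits []bool
	k, m uint
}

func newToyPartitioned(M, k uint) *toyPartitioned {
	m := M / k
	return &toyPartitioned{bits: make([]bool, m*k), k: k, m: m}
}

// indexes derives k bit positions from one 64-bit hash using double
// hashing (g_i = lower + i*upper), one position per slice.
func (t *toyPartitioned) indexes(data []byte) []uint {
	h := fnv.New64()
	h.Write(data)
	sum := h.Sum64()
	lower, upper := uint(uint32(sum)), uint(uint32(sum>>32))
	idx := make([]uint, t.k)
	for i := uint(0); i < t.k; i++ {
		idx[i] = i*t.m + (lower+i*upper)%t.m // stays inside slice i
	}
	return idx
}

// Add sets exactly one bit in each of the k slices.
func (t *toyPartitioned) Add(data []byte) {
	for _, i := range t.indexes(data) {
		t.bits[i] = true
	}
}

// Test reports membership; it can return false positives but never
// false negatives.
func (t *toyPartitioned) Test(data []byte) bool {
	for _, i := range t.indexes(data) {
		if !t.bits[i] {
			return false
		}
	}
	return true
}

func main() {
	f := newToyPartitioned(1024, 4)
	f.Add([]byte("boom"))
	fmt.Println(f.Test([]byte("boom"))) // true: no false negatives
}
```

Because every element sets exactly k bits, one per slice, no single dense region of the array can make some elements more false-positive-prone than others, which is the uniformity property described above.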
func NewPartitionedBloomFilter ¶
func NewPartitionedBloomFilter(n uint, fpRate float64) *PartitionedBloomFilter
NewPartitionedBloomFilter creates a new partitioned Bloom filter optimized to store n items with a specified target false-positive rate.
func NewPartitionedBloomFilterWithCapacity ¶
func NewPartitionedBloomFilterWithCapacity(m uint, fpRate float64) *PartitionedBloomFilter
NewPartitionedBloomFilterWithCapacity creates a new partitioned Bloom filter with a specific capacity m and target false-positive rate.
func (*PartitionedBloomFilter) Add ¶
func (p *PartitionedBloomFilter) Add(data []byte) Filter
Add will add the data to the Bloom filter. It returns the filter to allow for chaining.
func (*PartitionedBloomFilter) Capacity ¶
func (p *PartitionedBloomFilter) Capacity() uint
Capacity returns the Bloom filter capacity, m.
func (*PartitionedBloomFilter) Count ¶
func (p *PartitionedBloomFilter) Count() uint
Count returns the number of items added to the filter.
func (*PartitionedBloomFilter) DecodeFrom ¶
func (p *PartitionedBloomFilter) DecodeFrom(data []byte) (int64, error)
DecodeFrom reads a binary representation of PartitionedBloomFilter (such as might have been written by WriteTo()) from a buffer. Whereas ReadFrom() calls Buckets.ReadFrom() hence making a copy of the data, DecodeFrom calls Buckets.DecodeFrom which keeps a reference to the original data buffer.
func (*PartitionedBloomFilter) EstimatedFillRatio ¶
func (p *PartitionedBloomFilter) EstimatedFillRatio() float64
EstimatedFillRatio returns the current estimated ratio of set bits.
func (*PartitionedBloomFilter) FillRatio ¶
func (p *PartitionedBloomFilter) FillRatio() float64
FillRatio returns the average ratio of set bits across all partitions.
func (*PartitionedBloomFilter) GobDecode ¶
func (p *PartitionedBloomFilter) GobDecode(data []byte) error
GobDecode implements gob.GobDecoder interface.
func (*PartitionedBloomFilter) GobEncode ¶
func (p *PartitionedBloomFilter) GobEncode() ([]byte, error)
GobEncode implements gob.GobEncoder interface.
func (*PartitionedBloomFilter) K ¶
func (p *PartitionedBloomFilter) K() uint
K returns the number of hash functions.
func (*PartitionedBloomFilter) OptimalCount ¶
func (p *PartitionedBloomFilter) OptimalCount() uint
OptimalCount returns the optimal number of distinct items that can be stored in this filter.
func (*PartitionedBloomFilter) ReadFrom ¶
func (p *PartitionedBloomFilter) ReadFrom(stream io.Reader) (int64, error)
ReadFrom reads a binary representation of PartitionedBloomFilter (such as might have been written by WriteTo()) from an i/o stream. It returns the number of bytes read.
func (*PartitionedBloomFilter) Reset ¶
func (p *PartitionedBloomFilter) Reset() *PartitionedBloomFilter
Reset restores the Bloom filter to its original state. It returns the filter to allow for chaining.
func (*PartitionedBloomFilter) SetHash ¶
func (p *PartitionedBloomFilter) SetHash(h hash.Hash64)
SetHash sets the hashing function used in the filter. For the effect on false positive rates see: https://github.com/tylertreat/BoomFilters/pull/1
func (*PartitionedBloomFilter) Test ¶
func (p *PartitionedBloomFilter) Test(data []byte) bool
Test will test for membership of the data and returns true if it is a member, false if not. This is a probabilistic test, meaning there is a non-zero probability of false positives but a zero probability of false negatives. Due to the way the filter is partitioned, the probability of false positives is uniformly distributed across all elements.
func (*PartitionedBloomFilter) TestAndAdd ¶
func (p *PartitionedBloomFilter) TestAndAdd(data []byte) bool
TestAndAdd is equivalent to calling Test followed by Add. It returns true if the data is a member, false if not.
func (*PartitionedBloomFilter) UpdateCount ¶
func (p *PartitionedBloomFilter) UpdateCount() float64
UpdateCount estimates the number of distinct items in the filter. Since duplicates can be added to a Bloom filter, the count is updated via the formula n ≈ −m ln(1 − p) from https://gsd.di.uminho.pt/members/cbm/ps/dbloom.pdf. This prevents the count from being off by a large amount when duplicates are added. NOTE: this calls FillRatio, which calculates the Hamming weight across all bits and can be relatively expensive. It returns the current exact fill ratio.
type ScalableBloomFilter ¶
type ScalableBloomFilter struct {
// contains filtered or unexported fields
}
ScalableBloomFilter implements a Scalable Bloom Filter as described by Almeida, Baquero, Preguica, and Hutchison in Scalable Bloom Filters:
http://gsd.di.uminho.pt/members/cbm/ps/dbloom.pdf
A Scalable Bloom Filter dynamically adapts to the number of elements in the data set while enforcing a tight upper bound on the false-positive rate. This works by adding Bloom filters with geometrically decreasing false-positive rates as filters become full. The tightening ratio, r, controls the filter growth. The compounded probability over the whole series converges to a target value, even accounting for an infinite series.
Scalable Bloom Filters are useful for cases where the size of the data set isn't known a priori and memory constraints aren't of particular concern. For situations where memory is bounded, consider using Inverse or Stable Bloom Filters.
func NewDefaultScalableBloomFilter ¶
func NewDefaultScalableBloomFilter(fpRate float64) *ScalableBloomFilter
NewDefaultScalableBloomFilter creates a new Scalable Bloom Filter with the specified target false-positive rate and an optimal tightening ratio.
func NewScalableBloomFilter ¶
func NewScalableBloomFilter(hint uint, fpRate, r float64) *ScalableBloomFilter
NewScalableBloomFilter creates a new Scalable Bloom Filter with the specified target false-positive rate and tightening ratio. Use NewDefaultScalableBloomFilter if you don't want to calculate these parameters.
func (*ScalableBloomFilter) Add ¶
func (s *ScalableBloomFilter) Add(data []byte) Filter
Add will add the data to the Bloom filter. It returns the filter to allow for chaining.
func (*ScalableBloomFilter) Capacity ¶
func (s *ScalableBloomFilter) Capacity() uint
Capacity returns the current Scalable Bloom Filter capacity, which is the sum of the capacities for the contained series of Bloom filters.
func (*ScalableBloomFilter) DecodeFrom ¶
func (s *ScalableBloomFilter) DecodeFrom(data []byte) (int64, error)
DecodeFrom reads a binary representation of ScalableBloomFilter (such as might have been written by WriteTo()) from a buffer. Whereas ReadFrom() calls PartitionedBloomFilter.ReadFrom() hence making a copy of the data, DecodeFrom calls PartitionedBloomFilter.DecodeFrom which keeps a reference to the original data buffer.
func (*ScalableBloomFilter) FillRatio ¶
func (s *ScalableBloomFilter) FillRatio() float64
FillRatio returns the average ratio of set bits across every filter.
func (*ScalableBloomFilter) GobDecode ¶
func (s *ScalableBloomFilter) GobDecode(data []byte) error
GobDecode implements gob.GobDecoder interface.
func (*ScalableBloomFilter) GobEncode ¶
func (s *ScalableBloomFilter) GobEncode() ([]byte, error)
GobEncode implements gob.GobEncoder interface.
func (*ScalableBloomFilter) K ¶
func (s *ScalableBloomFilter) K() uint
K returns the number of hash functions used in each Bloom filter. Since the contained filters can use different values, it returns the value for the last (most recently added) filter, which is the highest.
func (*ScalableBloomFilter) ReadFrom ¶
func (s *ScalableBloomFilter) ReadFrom(stream io.Reader) (int64, error)
ReadFrom reads a binary representation of ScalableBloomFilter (such as might have been written by WriteTo()) from an i/o stream. It returns the number of bytes read.
func (*ScalableBloomFilter) Reset ¶
func (s *ScalableBloomFilter) Reset() *ScalableBloomFilter
Reset restores the Bloom filter to its original state. It returns the filter to allow for chaining.
func (*ScalableBloomFilter) SetHash ¶
func (s *ScalableBloomFilter) SetHash(h hash.Hash64)
SetHash sets the hashing function used in the filter. For the effect on false positive rates see: https://github.com/tylertreat/BoomFilters/pull/1
func (*ScalableBloomFilter) Test ¶
func (s *ScalableBloomFilter) Test(data []byte) bool
Test will test for membership of the data and returns true if it is a member, false if not. This is a probabilistic test, meaning there is a non-zero probability of false positives but a zero probability of false negatives.
func (*ScalableBloomFilter) TestAndAdd ¶
func (s *ScalableBloomFilter) TestAndAdd(data []byte) bool
TestAndAdd is equivalent to calling Test followed by Add. It returns true if the data is a member, false if not.