pachyderm: Index | Files

package bloom

import ""


Package Files

bloom.go bloom.pb.go


var (
    ErrInvalidLengthBloom        = fmt.Errorf("proto: negative length found during unmarshaling")
    ErrIntOverflowBloom          = fmt.Errorf("proto: integer overflow")
    ErrUnexpectedEndOfGroupBloom = fmt.Errorf("proto: unexpected end of group")

func FilterSizeForFalsePositiveRate Uses

func FilterSizeForFalsePositiveRate(falsePositiveRate float64, elementCount int) int

FilterSizeForFalsePositiveRate returns the memory footprint for a filter with the given constraints.

type BloomFilter Uses

type BloomFilter struct {
    NumSubhashes uint32 `protobuf:"varint,1,opt,name=num_subhashes,json=numSubhashes,proto3" json:"num_subhashes,omitempty"`
    // TODO: we could make each bucket signed, which would allow us to do some interesting things,
    // but it might make the design a little confusing.
    // Two BloomFilters with identical hash_length and len(buckets) can be summed
    // to produce a new filter which should be identical to running all operations
    // from the original two filters onto a blank filter.
    // Negative bucket values may be useful if we were to add a background
    // reprocessing stage that would iterate over all existing items and re-add
    // them, but would also track live updates to the set.  Due to live updates,
    // some buckets may need to go negative temporarily - but we would lose the
    // guarantee that removing something that wasn't added to the set is an error.
    // Perhaps better to provide a 'BloomFilterDelta' that can later be combined
    // into an existing filter, while still preserving the invariant that all
    // buckets are positive.
    Buckets              []uint32 `protobuf:"varint,2,rep,packed,name=buckets,proto3" json:"buckets,omitempty"`
    XXX_NoUnkeyedLiteral struct{} `json:"-"`
    XXX_unrecognized     []byte   `json:"-"`
    XXX_sizecache        int32    `json:"-"`

func NewFilterWithFalsePositiveRate Uses

func NewFilterWithFalsePositiveRate(falsePositiveRate float64, elementCount int, maxBytes int) *BloomFilter

NewFilterWithFalsePositiveRate constructs a new bloom filter with the given constraints:

* 'falsePositiveRate' - the expected false positive rate when the filter is
    loaded with 'elementCount' elements
* 'elementCount' - the expected number of elements that will be present in
    the filter
* 'maxBytes' - the maximum memory footprint to allocate for this filter.

Given these, the filter will choose a size that will provide the specified constraints. If the memory footprint is capped by 'maxBytes', the falsePositiveRate will increase.

func NewFilterWithSize Uses

func NewFilterWithSize(bytes int, elementCount int) *BloomFilter

NewFilterWithSize constructs a new bloom filter with the given constraints.

* 'bytes' - the number of bytes to allocate for the filter buckets
* 'elementCount' - the expected number of elements that will be present in
    the filter

Given these, the filter will choose the optimal number of hashes to perform to get the best possible false-positive rate.

func (*BloomFilter) Add Uses

func (f *BloomFilter) Add(hash []byte)

Add adds an item to the filter.

func (*BloomFilter) Descriptor Uses

func (*BloomFilter) Descriptor() ([]byte, []int)

func (*BloomFilter) FalsePositiveRate Uses

func (f *BloomFilter) FalsePositiveRate(elementCount int) float64

FalsePositiveRate returns the expected rate of false positives if the filter contains the given number of elements.

func (*BloomFilter) GetBuckets Uses

func (m *BloomFilter) GetBuckets() []uint32

func (*BloomFilter) GetNumSubhashes Uses

func (m *BloomFilter) GetNumSubhashes() uint32

func (*BloomFilter) IsNotPresent Uses

func (f *BloomFilter) IsNotPresent(hash []byte) bool

IsNotPresent returns true if an item corresponding to the given hash has definitely not been added to the set.

func (*BloomFilter) Marshal Uses

func (m *BloomFilter) Marshal() (dAtA []byte, err error)

func (*BloomFilter) MarshalTo Uses

func (m *BloomFilter) MarshalTo(dAtA []byte) (int, error)

func (*BloomFilter) MarshalToSizedBuffer Uses

func (m *BloomFilter) MarshalToSizedBuffer(dAtA []byte) (int, error)

func (*BloomFilter) OverflowRate Uses

func (f *BloomFilter) OverflowRate() float64

OverflowRate iterates over all buckets in the filter and returns the ratio of buckets that are in overflow to the total number of buckets. Once a bucket overflows, it will never return to normal, even if all items corresponding to that bucket are removed - a new filter must be constructed instead.

func (*BloomFilter) ProtoMessage Uses

func (*BloomFilter) ProtoMessage()

func (*BloomFilter) Remove Uses

func (f *BloomFilter) Remove(hash []byte)

Remove removes an item from the filter

func (*BloomFilter) Reset Uses

func (m *BloomFilter) Reset()

func (*BloomFilter) Size Uses

func (m *BloomFilter) Size() (n int)

func (*BloomFilter) String Uses

func (m *BloomFilter) String() string

func (*BloomFilter) Unmarshal Uses

func (m *BloomFilter) Unmarshal(dAtA []byte) error

func (*BloomFilter) UpperBoundCount Uses

func (f *BloomFilter) UpperBoundCount(hash []byte) int

UpperBoundCount returns the maximum number of items in the filter matching the given hash.

func (*BloomFilter) XXX_DiscardUnknown Uses

func (m *BloomFilter) XXX_DiscardUnknown()

func (*BloomFilter) XXX_Marshal Uses

func (m *BloomFilter) XXX_Marshal(b []byte, deterministic bool) ([]byte, error)

func (*BloomFilter) XXX_Merge Uses

func (m *BloomFilter) XXX_Merge(src proto.Message)

func (*BloomFilter) XXX_Size Uses

func (m *BloomFilter) XXX_Size() int

func (*BloomFilter) XXX_Unmarshal Uses

func (m *BloomFilter) XXX_Unmarshal(b []byte) error

Package bloom imports 6 packages (graph). Updated 2020-02-03. Refresh now. Tools for package owners.