gostatix

package module
v1.0.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Aug 24, 2023 License: MIT Imports: 24 Imported by: 0

README

GoStatix Tests

Thread-safe and persistent Golang implementations of probabilistic data structures: Bloom Filter, Cuckoo Filter, HyperLogLog, Count-Min Sketch and Top-K

About

This package provides two implementations of the data structures (a) In-memory (b) Redis backed. While keeping data in-memory has the advantages of being higly performant and yielding high throughput, it doesn't serve use cases of portability or communication of the underlying data so well. Take, for example, a setup where two applications are running to fulfill a goal. One of them writes the data and the other reads from it taking decisions based upon that. In this case, there should be a mechanism to share the underlying data structure for the two applications to do their tasks. With some trade-off to performance, this package solves that problem using Redis as the intermediate storage layer to achieve the same.

Quick Start

Install

Bloom Filters

A Bloom filter is a space-efficient probabilistic data structure that is used to test whether an element is a member of a set. It provides a way to check for the presence of an element in a set without actually storing the entire set. Bloom filters are particularly useful in scenarios where memory is limited or when the exact membership information is not critical.

Refer: https://web.stanford.edu/~balaji/papers/bloom.pdf

In-memory
package main

import (
	"fmt"

	"github.com/kwertop/gostatix"
)

func main() {

	// create a new in-memory bloom filter with 1000000 items and a false positive rate of 0.0001
	filter, _ := gostatix.NewMemBloomFilterWithParameters(1000000, 0.001)

	e1 := []byte("cat")
	e2 := []byte("dog")

	// insert a few elements - "cat" and "dog"
	filter.Insert(e1).Insert(e2)

	// do a lookup for an element
	ok := filter.Lookup(e1)
	fmt.Printf("found cat, %v\n", ok) // found cat, true

	ok = filter.Lookup([]byte("elephant"))
	fmt.Printf("found elephant, %v\n", ok) // found elephant, false
}

Redis
package main

import (
	"fmt"

	"github.com/kwertop/gostatix"
)

func main() {

    // parse redis uri
    redisConnOpt, _ := gostatix.ParseRedisURI("redis://127.0.0.1:6379")

    // create redis connection
    gostatix.MakeRedisClient(*redisConnOpt)

    // create a new redis bloom filter with 1000000 items and a false positive rate of 0.0001
    bloomRedis, _ := gostatix.NewRedisBloomFilterWithParameters(1000000, 0.001)

    e1 := []byte("cat")
    e2 := []byte("dog")

    // insert a few elements - "cat" and "dog"
    bloomRedis.Insert(e1).Insert(e2)

    // do a lookup for an element
    ok := bloomRedis.Lookup(e1)
    fmt.Printf("found cat, %v\n", ok) // found cat, true

    ok = filter.Lookup([]byte("elephant"))
    fmt.Printf("found elephant, %v\n", ok) // found elephant, false
}

Cuckoo Filters

A Cuckoo filter is a data structure used for approximate set membership queries, similar to a Bloom filter. It is designed to provide a compromise between memory efficiency, fast membership queries, and the ability to delete elements from the filter. Unlike a Bloom filter, a Cuckoo filter allows for efficient removal of elements while maintaining relatively low false positive rates. Refer: https://www.cs.cmu.edu/~dga/papers/cuckoo-conext2014.pdf

In-memory
package main

import (
	"fmt"

	"github.com/kwertop/gostatix"
)

func main() {
	// create a new in-memory cuckoo filter with 1000000 items and a false positive rate of 0.0001
    // see doc for more details
	filter := gostatix.NewCuckooFilterWithErrorRate(100000, 4, 100, 0.001)

	e1 := []byte("cat")
	e2 := []byte("dog")

	// insert a few elements - "cat" and "dog"
	filter.Insert(e1, false)
    filter.Insert(e2, false)

	// do a lookup for an element
	ok := filter.Lookup(e1)
	fmt.Printf("found cat, %v\n", ok) // found cat, true

	ok = filter.Lookup([]byte("elephant"))
	fmt.Printf("found elephant, %v\n", ok) // found elephant, false

	ok = filter.Lookup(e2)
	fmt.Printf("found dog, %v\n", ok) // found dog, true

	//remove an element
	_ = filter.Remove(e2)

	ok = filter.Lookup(e2)
	fmt.Printf("found dog, %v\n", ok) // found dog, false
}

Redis
package main

import (
	"fmt"

	"github.com/kwertop/gostatix"
)

func main() {
    // parse redis uri
    redisConnOpt, _ := gostatix.ParseRedisURI("redis://127.0.0.1:6379")

    // create redis connection
    gostatix.MakeRedisClient(*redisConnOpt)

    // create a new redis backed cuckoo filter with 1000000 items and a false positive rate of 0.0001
    // see doc for more details
    filter, _ := gostatix.NewCuckooFilterRedisWithErrorRate(1000000, 4, 100, 0.001)

    // insert a few elements - "cat" and "dog"
    filter.Insert(e1, false)
    filter.Insert(e2, false)

    // do a lookup for an element
    ok, _ = filter.Lookup(e1)
    fmt.Printf("found cat, %v\n", ok) // found cat, true

    ok, _ = filter.Lookup([]byte("elephant"))
    fmt.Printf("found elephant, %v\n", ok) // found elephant, false

    ok, _ = filter.Lookup(e2)
    fmt.Printf("found dog, %v\n", ok) // found dog, true

    //remove an element
    _, _ = filter.Remove(e2)

    ok, _ = filter.Lookup(e2)
    fmt.Printf("found dog, %v\n", ok) // found dog, false
}

Count-Min Sketch

A probabilistic data structure used to estimate the frequency of items in a data stream. Refer: http://dimacs.rutgers.edu/~graham/pubs/papers/cm-full.pdf

In-memory
package main

import (
	"fmt"

	"github.com/kwertop/gostatix"
)

func main() {
	// create a new in-memory count-min sketch with error rate of 0.0001 and delta of 0.9999
	sketch, _ := gostatix.NewCountMinSketchFromEstimates(0.0001, 0.9999)

	e1 := []byte("cat")
	e2 := []byte("dog")

	// insert counts of few elements - "cat" and "dog"
	sketch.Update(e1, 2)
	sketch.Update(e2, 3)
	sketch.Update(e1, 1)

	// do a lookup for an count for any element
	count := sketch.Count(e1)
	fmt.Printf("found %v cats\n", count) // found 3 cats
}

Redis
package main

import (
	"fmt"

	"github.com/kwertop/gostatix"
)

func main() {
    // create redis connection
    gostatix.MakeRedisClient(*redisConnOpt)

    // create a new redis backed count-min sketch with error rate of 0.0001 and delta of 0.9999
    sketch, _ := gostatix.NewCountMinSketchRedisFromEstimates(0.001, 0.999)

    e1 := []byte("cat")
    e2 := []byte("dog")

    // insert counts of few elements - "cat" and "dog"
    err := sketch.Update(e1, 2)
    if err != nil {
        fmt.Printf("error: %v\n", err)
    }
    sketch.Update(e2, 3)
    sketch.Update(e1, 1)

    // do a lookup for an count for any element
    count, _ = sketch.Count(e1)
    fmt.Printf("found %v cats\n", count) // found 3 cats
}

HyperLogLog

A probabilistic data structure used for estimating the cardinality (number of unique elements) of in a very large dataset. Refer: https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/40671.pdf

In-memory
package main

import (
	"fmt"

	"github.com/kwertop/gostatix"
)

func main() {
	// create a new in-memory hyperloglog with 4 registers
	hll, _ := gostatix.NewHyperLogLog(4)

	e1 := []byte("cat")
	e2 := []byte("dog")

	// insert counts of few elements
	hll.Update(e1)
	hll.Update(e2)
	hll.Update(e1)

	// do a lookup for count of distinct elements
	distinct := hll.Count(true, true)
	fmt.Printf("found %v distinct elements\n", distinct) // found 2 distinct elements
}

Redis
package main

import (
	"fmt"

	"github.com/kwertop/gostatix"
)

func main() {
    // parse redis uri
    redisConnOpt, _ := gostatix.ParseRedisURI("redis://127.0.0.1:6379")

    // create redis connection
    gostatix.MakeRedisClient(*redisConnOpt)

    // create a new redis backed hyperloglog with 4 registers
    hll, _ := gostatix.NewHyperLogLogRedis(4)

    e1 := []byte("cat")
    e2 := []byte("dog")

    // insert counts of few elements
    hll.Update(e1)
    hll.Update(e2)
    hll.Update(e1)

    // do a lookup for count of distinct elements
    distinct, _ := hll.Count(true, true)
    fmt.Printf("found %v distinct elements\n", distinct) // found 2 distinct elements
}

Top-K

It's a data structure designed to efficiently retrieve the "top-K" or "largest-K" elements from a dataset based on a certain criterion, such as frequency, value, or score.

In-memory
package main

import (
	"fmt"

	"github.com/kwertop/gostatix"
)

func main() {
	// create a new in-memory top-k with k=2, error rate of 0.001 and delta of 0.999
	t := gostatix.NewTopK(2, 0.001, 0.999)

	e1 := []byte("cat")
	e2 := []byte("dog")
	e3 := []byte("lion")
	e4 := []byte("tiger")

	// insert counts of few elements
	t.Insert(e1, 3)
	t.Insert(e2, 2)
	t.Insert(e1, 1)
	t.Insert(e3, 3)
	t.Insert(e4, 1)

	// do a lookup for top-k (2) elements
	values := t.Values()
	fmt.Printf("%v\n", values) // [{cat 4} {lion 3}]
}

Redis
package main

import (
	"fmt"

	"github.com/kwertop/gostatix"
)

func main() {
    // parse redis uri
    redisConnOpt, _ := gostatix.ParseRedisURI("redis://127.0.0.1:6379")

    // create redis connection
    gostatix.MakeRedisClient(*redisConnOpt)

    // create a new redis backed top-k with k=2, error rate of 0.001 and delta of 0.999
    t1 := gostatix.NewTopKRedis(2, 0.001, 0.999)

    e1 := []byte("cat")
    e2 := []byte("dog")
    e3 := []byte("lion")
    e4 := []byte("tiger")

    // insert counts of few elements
    t1.Insert(e1, 3)
    t1.Insert(e2, 2)
    t1.Insert(e1, 1)
    t1.Insert(e3, 3)
    t1.Insert(e4, 1)

    // do a lookup for top-k (2) elements
    values1, _ := t1.Values()
    fmt.Printf("%v\n", values1) // [{cat 4} {lion 3}]
}

Documentation

Overview

Implements buckets - a container of fixed number of entries used in cuckoo filters.

Implements probabilistic data structure used in estimating count.

Count-Min Sketch: A probabilistic data structure used to estimate the frequency of items in a data stream. Refer: http://dimacs.rutgers.edu/~graham/pubs/papers/cm-full.pdf

The package implements both in-mem and Redis backed solutions for the data structures. The in-memory data structures are thread-safe.

Provides data structures and methods for creating probabilistic filters. This package provides implementation Cuckoo Filter.

BaseFilter provides data structures and methods for creating probabilistic filters. This package provides implementations of two of the most widely used filters, Bloom Filter and Cuckoo Filter.

A Bloom filter is a space-efficient probabilistic data structure that is used to test whether an element is a member of a set. It provides a way to check for the presence of an element in a set without actually storing the entire set. Bloom filters are particularly useful in scenarios where memory is limited or when the exact membership information is not critical. Refer: https://web.stanford.edu/~balaji/papers/bloom.pdf

A Cuckoo filter is a data structure used for approximate set membership queries, similar to a Bloom filter. It is designed to provide a compromise between memory efficiency, fast membership queries, and the ability to delete elements from the filter. Unlike a Bloom filter, a Cuckoo filter allows for efficient removal of elements while maintaining relatively low false positive rates. Refer: https://www.cs.cmu.edu/~dga/papers/cuckoo-conext2014.pdf

Implements probabilistic data structure hyperloglog used in estimating unique entries in a large dataset.

Hyperloglog: A probabilistic data structure used for estimating the cardinality (number of unique elements) of in a very large dataset. Refer: https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/40671.pdf

The package implements both in-mem and Redis backed solutions for the data structures. The in-memory data structures are thread-safe.

Implements bitsets - both in-memory and redis. For in-memory, https://github.com/bits-and-blooms/bitset is used while for redis, bitset operations of redis are used.

Implements bitsets - both in-memory and redis. For in-memory, https://github.com/bits-and-blooms/bitset is used while for redis, bitset operations of redis are used.

Provides data structures and methods for creating probabilistic filters. This package provides implementation of Bloom Filter.

Implements buckets - a container of fixed number of entries used in cuckoo filters.

Implements buckets - a container of fixed number of entries used in cuckoo filters.

Implements probabilistic data structure used in estimating count.

Count-Min Sketch: A probabilistic data structure used to estimate the frequency of items in a data stream. Refer: http://dimacs.rutgers.edu/~graham/pubs/papers/cm-full.pdf

The package implements both in-mem and Redis backed solutions for the data structures. The in-memory data structures are thread-safe.

Implements probabilistic data structure used in estimating count.

Count-Min Sketch: A probabilistic data structure used to estimate the frequency of items in a data stream. Refer: http://dimacs.rutgers.edu/~graham/pubs/papers/cm-full.pdf

The package implements both in-mem and Redis backed solutions for the data structures. The in-memory data structures are thread-safe.

Provides data structures and methods for creating probabilistic filters. This package provides implementation Cuckoo Filter.

Provides data structures and methods for creating probabilistic filters. This package provides implementation Cuckoo Filter.

Implements probabilistic data structure hyperloglog used in estimating unique entries in a large dataset.

Hyperloglog: A probabilistic data structure used for estimating the cardinality (number of unique elements) of in a very large dataset. Refer: https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/40671.pdf

The package implements both in-mem and Redis backed solutions for the data structures. The in-memory data structures are thread-safe.

Implements probabilistic data structure hyperloglog used in estimating unique entries in a large dataset.

Hyperloglog: A probabilistic data structure used for estimating the cardinality (number of unique elements) of in a very large dataset. Refer: https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/40671.pdf

The package implements both in-mem and Redis backed solutions for the data structures. The in-memory data structures are thread-safe.

Implements bitsets - both in-memory and redis. For in-memory, https://github.com/bits-and-blooms/bitset is used while for redis, bitset operations of redis are used.

Package count implements various probabilistic data structures used in estimating top-K elements.

Top-K: A data structure designed to efficiently retrieve the "top-K" or "largest-K" elements from a dataset based on a certain criterion, such as frequency, value, or score

The package implements both in-mem and Redis backed solutions for the data structures. The in-memory data structures are thread-safe.

Package count implements various probabilistic data structures used in estimating top-K elements.

Top-K: A data structure designed to efficiently retrieve the "top-K" or "largest-K" elements from a dataset based on a certain criterion, such as frequency, value, or score

The package implements both in-mem and Redis backed solutions for the data structures. The in-memory data structures are thread-safe.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func MakeRedisClient

func MakeRedisClient(options RedisConnOptions)

Types

type AbstractBucket

type AbstractBucket struct {
	BaseBucket
	// contains filtered or unexported fields
}

func (*AbstractBucket) Size

func (bucket *AbstractBucket) Size() uint64

Size returns the number of entries in the bucket

type AbstractCountMinSketch

type AbstractCountMinSketch struct {
	BaseCountMinSketch
	// contains filtered or unexported fields
}

func (*AbstractCountMinSketch) GetColumns

func (cms *AbstractCountMinSketch) GetColumns() uint

GetColumns returns the number of columns in the underlying matrix of the Count-Min Sketch

func (*AbstractCountMinSketch) GetRows

func (cms *AbstractCountMinSketch) GetRows() uint

GetRows returns the number of rows in the underlying matrix of the Count-Min Sketch

type AbstractCuckooFilter

type AbstractCuckooFilter struct {
	BaseCuckooFilter
	// contains filtered or unexported fields
}

func (*AbstractCuckooFilter) BucketSize

func (cuckooFilter *AbstractCuckooFilter) BucketSize() uint64

BucketSize returns the size of the individual buckets of the Cuckoo Filter

func (*AbstractCuckooFilter) CellSize

func (cuckooFilter *AbstractCuckooFilter) CellSize() uint64

CellSize returns the overall size of the Cuckoo Filter - _size_ * _bucketSize_

func (*AbstractCuckooFilter) CuckooPositiveRate

func (cuckooFilter *AbstractCuckooFilter) CuckooPositiveRate() float64

CuckooPositiveRate returns the false positive error rate of the filter

func (*AbstractCuckooFilter) FingerPrintLength

func (cuckooFilter *AbstractCuckooFilter) FingerPrintLength() uint64

FingerPrintLength returns the length of the fingerprint of the Cuckoo Filter

func (*AbstractCuckooFilter) Retries

func (cuckooFilter *AbstractCuckooFilter) Retries() uint64

Retries returns the number of retries that the Cuckoo Filter makes if the first and second indices returned after hashing the input is already occupied in the filter

func (*AbstractCuckooFilter) Size

func (cuckooFilter *AbstractCuckooFilter) Size() uint64

Size returns the size of the buckets slice of the Cuckoo Filter

type AbstractHyperLogLog

type AbstractHyperLogLog struct {
	BaseHyperLogLog
	// contains filtered or unexported fields
}

func (*AbstractHyperLogLog) Accuracy

func (h *AbstractHyperLogLog) Accuracy() float64

Accuracy returns the accuracy of the hyperloglog

func (*AbstractHyperLogLog) NumRegisters

func (h *AbstractHyperLogLog) NumRegisters() uint64

NumRegisters returns the number of registers in the hyperloglog

type BaseBucket

type BaseBucket interface {
	Size() uint64
}

type BaseCountMinSketch

type BaseCountMinSketch interface {
	GetRows() uint
	GetColumns() uint
	Update(data []byte, count uint64)
	UpdateString(data string, count uint64)
	Count(data []byte) uint64
	CountString(data string) uint64
	UpdateOnce(data []byte)
}

Interface for Count-Min Sketch

type BaseCuckooFilter

type BaseCuckooFilter interface {
	Size() uint64
	Length() uint64
	BucketSize() uint64
	FingerPrintLength() uint64
	CellSize() uint64
	Retries() uint64
}

type BaseFilter

type BaseFilter[T any] interface {
	Insert(element T) (bool, error)
	Lookup(element T) (bool, error)
}

type BaseHyperLogLog

type BaseHyperLogLog interface {
	NumRegisters() uint64
	Accuracy() float64
	Count(withCorrection bool, withRoundingOff bool) uint64
	Update(data []byte)
	Equals(g *HyperLogLog) bool
}

Interface for Hyperloglog

type BitSetMem

type BitSetMem struct {
	// contains filtered or unexported fields
}

BitSetMem is an implementation of IBitSet. _size_ is the number of bits in the bitset _set_ is the bitset implementation adopted from https://github.com/bits-and-blooms/bitset

type BitSetRedis

type BitSetRedis struct {
	// contains filtered or unexported fields
}

BitSetRedis is an implementation of IBitSet. size is the number of bits in the bitset key is the redis key to the bitset data structure in redis Bitsets or Bitmaps are implemented in Redis using string. All bit operations are done on the string stored at _key_. For more details, please refer https://redis.io/docs/data-types/bitmaps/

type BloomFilter

type BloomFilter struct {
	// contains filtered or unexported fields
}

The BloomFilter data structure. It mainly has two fields: _size_ and _numHashes_ _size_ denotes the maximum size of the bloom filter _numHashes_ denotes the number of hashing functions applied on the entrant element during insertion or lookup. _filter_ is the bitset backing internally the bloom filter. It can either be a type of BitSetMem (in-memory) or BitSetRedis (redis-backed). _metadataKey_ saves the information about a Bloom Filter saved on Redis _lock_ is used to synchronize read/write on an in-memory BitSetMem. It's not used for BitSetRedis as Redis is event-driven single threaded

func NewBloomFilterWithBitSet

func NewBloomFilterWithBitSet(size, numHashes uint, filter IBitSet, metadataKey string) (*BloomFilter, error)

NewBloomFilterWithBitSet creates and returns a new BloomFilter _size_ is the maximum size of the bloom filter _numHashes_ is the number of hashing functions to be applied on the entrant _filter_ is either BitSetMem or BitSetRedis _metadataKey_ is needed if the filter is of type BitSetRedis otherwise it's overlooked

func NewMemBloomFilterFromBitSet

func NewMemBloomFilterFromBitSet(data []uint64, numHashes uint) *BloomFilter

NewRedisBloomFilterFromBitSet creates and returns a new in-memory BloomFilter from the bitset passed in the parameter _data_ _numHashes_ parameter is needed for the number of hashing functions

func NewMemBloomFilterWithParameters

func NewMemBloomFilterWithParameters(numItems uint, errorRate float64) (*BloomFilter, error)

NewRedisBloomFilterWithParameters creates and returns a new in-memory BloomFilter _numItems_ is the number of items for which the bloom filter has to be checked for validation _errorRate_ is the acceptable false positive error rate Based upon the above two parameters passed, the size of the bloom filter is calculated

func NewRedisBloomFilterFromBitSet

func NewRedisBloomFilterFromBitSet(data []uint64, numHashes uint) (*BloomFilter, error)

NewRedisBloomFilterFromBitSet creates and returns a new Redis backed BloomFilter from the bitset passed in the parameter _data_ _numHashes_ parameter is needed for the number of hashing functions

func NewRedisBloomFilterFromKey

func NewRedisBloomFilterFromKey(metadataKey string) (*BloomFilter, error)

NewRedisBloomFilterFromKey is used to create a new Redis backed BloomFilter from the _metadataKey_ (the Redis key used to store the metadata about the bloom filter) passed For this to work, value should be present in Redis at _key_

func NewRedisBloomFilterWithParameters

func NewRedisBloomFilterWithParameters(numItems uint, errorRate float64) (*BloomFilter, error)

NewRedisBloomFilterWithParameters creates and returns a new Redis backed BloomFilter _numItems_ is the number of items for which the bloom filter has to be checked for validation _errorRate_ is the acceptable false positive error rate Based upon the above two parameters passed, the size of the bloom filter is calculated metadataKey is created using a random alpha-numeric generator which can be retrieved using MetadataKey() method

func (*BloomFilter) BloomPositiveRate

func (bloomFilter *BloomFilter) BloomPositiveRate() float64

BloomPositiveRate returns the false positive error rate of the filter

func (*BloomFilter) Equals

func (aFilter *BloomFilter) Equals(bFilter *BloomFilter) (bool, error)

Equals checks if two BloomFilter's are equal

func (*BloomFilter) Export

func (bloomFilter *BloomFilter) Export() ([]byte, error)

Export JSON marshals the BloomFilter and returns a byte slice containing the data

func (*BloomFilter) GetBitSet

func (bloomFilter *BloomFilter) GetBitSet() *IBitSet

GetBitSet returns the internal bitset. It would be a BitSetMem in case of an in-memory Bloom filter while it would be a BitSetRedis for a Redis backed Bloom filter.

func (*BloomFilter) GetCap

func (bloomFilter *BloomFilter) GetCap() uint

GetCap returns the size of the bloom filter

func (*BloomFilter) GetMetadataKey

func (bloomFilter *BloomFilter) GetMetadataKey() string

GetMetadataKey returns the Redis key used to store the metadata about the Redis backed Bloom filter

func (*BloomFilter) GetNumHashes

func (bloomFilter *BloomFilter) GetNumHashes() uint

GetNumHashes returns the number of hash functions used in the bloom filter

func (*BloomFilter) Import

func (bloomFilter *BloomFilter) Import(data []byte) error

Import JSON unmarshals the _data_ into the BloomFilter

func (*BloomFilter) Insert

func (bloomFilter *BloomFilter) Insert(data []byte) *BloomFilter

Insert writes new _data_ in the bloom filter

func (*BloomFilter) InsertString

func (bloomFilter *BloomFilter) InsertString(data string) *BloomFilter

InsertString accepts string value as _data_ for inserting into the Bloom filter

func (*BloomFilter) Lookup

func (bloomFilter *BloomFilter) Lookup(data []byte) bool

Lookup returns true if the corresponding bits in the bitset for _data_ is set, otherwise false

func (*BloomFilter) LookupString

func (bloomFilter *BloomFilter) LookupString(data string) bool

LookupString accepts string value as _data_ to lookup the Bloom filter

func (*BloomFilter) ReadFrom

func (bloomFilter *BloomFilter) ReadFrom(stream io.Reader) (int64, error)

ReadFrom reads the BloomFilter from the specified _stream_ and returns the number of bytes read. It can be used to read from disk (using a file stream) or from network. It's not implemented for Redis backed Bloom filter (BitSetRedis) as data for a Redis backed Bloom Filter is already there in Redis. NewRedisBloomFilterFromKey method can be used to import or create a BloomFilter instead

func (*BloomFilter) WriteTo

func (bloomFilter *BloomFilter) WriteTo(stream io.Writer) (int64, error)

WriteTo writes the BloomFilter onto the specified _stream_ and returns the number of bytes written. It can be used to write to disk (using a file stream) or to network. It's not implemented for Redis backed Bloom filter (BitSetRedis) as data for a Redis backed Bloom Filter is already there in Redis.

type BucketMem

type BucketMem struct {
	*AbstractBucket
	// contains filtered or unexported fields
}

BucketMem is in-memmory data structure holding the entries of the bucket used for cuckoo filters. _elements_ is the string slice which holds the actual values _length_ is used to track the number of non-empty/valied entries in the bucket

type BucketRedis

type BucketRedis struct {
	*AbstractBucket
	// contains filtered or unexported fields
}

BucketRedis is data structure holding the key to the entries of the bucket saved in redis used for cuckoo filters. BucketRedis is implemented using Redis Lists. _key_ is the redis key to the list which holds the actual values _key_len is used to track the number of non-empty/valied entries in the bucket as a key-value pair in Redis. Lua scripts are used wherever possible to make the read/write operations from Redis atomic.

type CountMinSketch

type CountMinSketch struct {
	AbstractCountMinSketch
	// contains filtered or unexported fields
}

CountMinSketch struct. This is an in-memory implementation of Count-Min Sketch. It's mainly governed by a 2-d slice _matrix_ which holds the count of hashed items at different hashed locations _lock_ is used to synchronize concurrent read/writes

func NewCountMinSketch

func NewCountMinSketch(rows, columns uint) (*CountMinSketch, error)

NewCountMinSketch creates CountMinSketch with _rows_ and _columns_

func NewCountMinSketchFromEstimates

func NewCountMinSketchFromEstimates(errorRate, delta float64) (*CountMinSketch, error)

NewCountMinSketchFromEstimates creates a new CountMinSketch based upon the desired _errorRate_ and _delta_ rows and columns are calculated based upon these supplied values

func (*CountMinSketch) Count

func (cms *CountMinSketch) Count(data []byte) uint64

Count estimates the count of the _data_ (byte slice) in the Count-Min Sketch data structure

func (*CountMinSketch) CountString

func (cms *CountMinSketch) CountString(data string) uint64

CountString estimates the count of the _data_ (string) in the Count-Min Sketch data structure

func (*CountMinSketch) Equals

func (cms *CountMinSketch) Equals(cms1 *CountMinSketch) bool

Equals checks if two CountMinSketch are equal

func (*CountMinSketch) Export

func (cms *CountMinSketch) Export() ([]byte, error)

Export JSON marshals the CountMinSketch and returns a byte slice containing the data

func (*CountMinSketch) Import

func (cms *CountMinSketch) Import(data []byte) error

Import JSON unmarshals the _data_ into the CountMinSketch

func (*CountMinSketch) Merge

func (cms *CountMinSketch) Merge(cms1 *CountMinSketch) error

Merge merges two Count-Min Sketch data structures

func (*CountMinSketch) ReadFrom

func (cms *CountMinSketch) ReadFrom(stream io.Reader) (int64, error)

ReadFrom reads the CountMinSketch from the specified _stream_ and returns the number of bytes read. It can be used to read from disk (using a file stream) or from network.

func (*CountMinSketch) Update

func (cms *CountMinSketch) Update(data []byte, count uint64)

Update increments the count of _data_ (byte slice) in Count-Min Sketch by value _count_ passed

func (*CountMinSketch) UpdateOnce

func (cms *CountMinSketch) UpdateOnce(data []byte)

UpdateOnce increments the count of _data_ in Count-Min Sketch by 1

func (*CountMinSketch) UpdateString

func (cms *CountMinSketch) UpdateString(data string, count uint64)

UpdateString increments the count of _data_ (string) in Count-Min Sketch by value _count_ passed

func (*CountMinSketch) WriteTo

func (cms *CountMinSketch) WriteTo(stream io.Writer) (int64, error)

WriteTo writes the CountMinSketch onto the specified _stream_ and returns the number of bytes written. It can be used to write to disk (using a file stream) or to network.

type CountMinSketchRedis

type CountMinSketchRedis struct {
	AbstractCountMinSketch
	// contains filtered or unexported fields
}

CountMinSketchRedis is the Redis backed implementation of BaseCountMinSketch _key_ holds the Redis key to the list which has the Redis keys of rows of data _metadataKey_ is used to store the additional information about CountMinSketchRedis for retrieving the sketch by the Redis key

func NewCountMinSketchRedis

func NewCountMinSketchRedis(rows, columns uint) (*CountMinSketchRedis, error)

NewCountMinSketchRedis creates CountMinSketchRedis with _rows_ and _columns_

func NewCountMinSketchRedisFromEstimates

func NewCountMinSketchRedisFromEstimates(errorRate, delta float64) (*CountMinSketchRedis, error)

NewCountMinSketchRedisFromEstimates creates a new CountMinSketchRedis based upon the desired _errorRate_ and _delta_ rows and columns are calculated based upon these supplied values

func NewCountMinSketchRedisFromKey

func NewCountMinSketchRedisFromKey(metadataKey string) (*CountMinSketchRedis, error)

NewCountMinSketchRedisFromKey is used to create a new Redis backed CountMinSketchRedis from the _metadataKey_ (the Redis key used to store the metadata about the count-min sketch) passed. For this to work, value should be present in Redis at _key_

func (*CountMinSketchRedis) Count

func (cms *CountMinSketchRedis) Count(data []byte) (uint64, error)

Count estimates the count of the _data_ (byte slice) in the CountMinSketchRedis

func (*CountMinSketchRedis) CountString

func (cms *CountMinSketchRedis) CountString(data string) (uint64, error)

CountString estimates the count of the _data_ (string) in the CountMinSketchRedis

func (*CountMinSketchRedis) Equals

func (cms *CountMinSketchRedis) Equals(cms1 *CountMinSketchRedis) (bool, error)

Equals checks if two CountMinSketchRedis are equal

func (*CountMinSketchRedis) Export

func (cms *CountMinSketchRedis) Export() ([]byte, error)

Export JSON marshals the CountMinSketchRedis and returns a byte slice containing the data

func (*CountMinSketchRedis) Import

func (cms *CountMinSketchRedis) Import(data []byte, withNewKey bool) error

Import JSON unmarshals the _data_ into the CountMinSketchRedis

func (*CountMinSketchRedis) Merge

func (cms *CountMinSketchRedis) Merge(cms1 *CountMinSketchRedis) error

Merge merges two Count-Min Sketch data structures

func (*CountMinSketchRedis) MetadataKey

func (cms *CountMinSketchRedis) MetadataKey() string

MetadataKey returns the metadataKey

func (*CountMinSketchRedis) Update

func (cms *CountMinSketchRedis) Update(data []byte, count uint64) error

Update increments the count of _data_ (byte slice) in CountMinSketchRedis by value _count_ passed

func (*CountMinSketchRedis) UpdateOnce

func (cms *CountMinSketchRedis) UpdateOnce(data []byte)

UpdateOnce increments the count of _data_ in CountMinSketchRedis by 1

func (*CountMinSketchRedis) UpdateString

func (cms *CountMinSketchRedis) UpdateString(data string, count uint64) error

UpdateString increments the count of _data_ (string) in CountMinSketchRedis by value _count_ passed

type CuckooFilter

type CuckooFilter struct {
	*AbstractCuckooFilter
	// contains filtered or unexported fields
}

CuckooFilter is the in-memory implementation of BaseCuckooFilter _buckets_ is a slice of BucketMem _length_ represents the number of entries present in the Cuckoo Filter _lock_ is used to synchronize concurrent read/writes

func NewCuckooFilter

func NewCuckooFilter(size, bucketSize, fingerPrintLength uint64) *CuckooFilter

NewCuckooFilter creates a new in-memory CuckooFilter _size_ is the size of the BucketMem slice _bucketSize_ is the size of the individual buckets inside the bucket slice _fingerPrintLength_ is fingerprint hash of the input to be inserted/removed/lookup

func NewCuckooFilterWithErrorRate

func NewCuckooFilterWithErrorRate(size, bucketSize, retries uint64, errorRate float64) *CuckooFilter

NewCuckooFilterWithErrorRate creates an in-memory CuckooFilter with a specified false positive rate : _errorRate_ _size_ is the size of the BucketMem slice _bucketSize_ is the size of the individual buckets inside the bucket slice _retries_ is the number of retries that the Cuckoo filter makes if the first two indices obtained _errorRate_ is the desired false positive rate of the filter. fingerPrintLength is calculated according to this error rate.

func NewCuckooFilterWithRetries

func NewCuckooFilterWithRetries(size, bucketSize, fingerPrintLength, retries uint64) *CuckooFilter

NewCuckooFilterWithRetries creates new in-memory CuckooFilter with specified _retries_ _size_ is the size of the BucketMem slice _bucketSize_ is the size of the individual buckets inside the bucket slice _fingerPrintLength_ is fingerprint hash of the input to be inserted/removed/lookup _retries_ is the number of retries that the Cuckoo filter makes if the first two indices obtained after hashing the input is already occupied in the filter

func (*CuckooFilter) Equals

func (aFilter *CuckooFilter) Equals(bFilter *CuckooFilter) bool

Equals checks if two CuckooFilter are same or not

func (*CuckooFilter) Export

func (cuckooFilter *CuckooFilter) Export() ([]byte, error)

Export JSON marshals the CuckooFilter and returns a byte slice containing the data

func (*CuckooFilter) Import

func (cuckooFilter *CuckooFilter) Import(data []byte) error

Import JSON unmarshals the _data_ into the CuckooFilter

func (*CuckooFilter) Insert

func (cuckooFilter *CuckooFilter) Insert(data []byte, destructive bool) bool

Insert writes the _data_ in the Cuckoo Filter for future lookup _destructive_ parameter is used to specify if the previous ordering of the present entries is to be preserved after the retries (if that case arises)

func (*CuckooFilter) Length

func (cuckooFilter *CuckooFilter) Length() uint64

Length returns the current length of the Cuckoo Filter or the current number of entries present in the Cuckoo Filter

func (*CuckooFilter) Lookup

func (cuckooFilter *CuckooFilter) Lookup(data []byte) bool

Lookup returns true if the _data_ is present in the Cuckoo Filter, else false

func (*CuckooFilter) ReadFrom

func (cuckooFilter *CuckooFilter) ReadFrom(stream io.Reader) (int64, error)

ReadFrom reads the CuckooFilter from the specified _stream_ and returns the number of bytes read. It can be used to read from disk (using a file stream) or from network.

func (*CuckooFilter) Remove

func (cuckooFilter *CuckooFilter) Remove(data []byte) bool

Remove deletes the _data_ from the Cuckoo Filter

func (*CuckooFilter) WriteTo

func (cuckooFilter *CuckooFilter) WriteTo(stream io.Writer) (int64, error)

WriteTo writes the CuckooFilter onto the specified _stream_ and returns the number of bytes written. It can be used to write to disk (using a file stream) or to network.

type CuckooFilterRedis

type CuckooFilterRedis struct {
	*AbstractCuckooFilter
	// contains filtered or unexported fields
}

CuckooFilterRedis is the Redis backed implementation of BaseCuckooFilter _buckets_ is a slice of BucketRedis _key_ holds the Redis key to the list which has the Redis keys of all buckets _metadataKey_ is used to store the additional information about CuckooFilterRedis for retrieving the filter by the Redis key

func NewCuckooFilterRedis

func NewCuckooFilterRedis(size, bucketSize, fingerPrintLength uint64) (*CuckooFilterRedis, error)

NewCuckooFilter creates a new CuckooFilterRedis _size_ is the size of the BucketRedis slice _bucketSize_ is the size of the individual buckets inside the bucket slice _fingerPrintLength_ is fingerprint hash of the input to be inserted/removed/lookup

func NewCuckooFilterRedisFromKey

func NewCuckooFilterRedisFromKey(metadataKey string) (*CuckooFilterRedis, error)

NewCuckooFilterRedisFromKey is used to create a new Redis backed Cuckoo Filter from the _metadataKey_ (the Redis key used to store the metadata about the cuckoo filter) passed For this to work, value should be present in Redis at _key_

func NewCuckooFilterRedisWithErrorRate

func NewCuckooFilterRedisWithErrorRate(size, bucketSize, retries uint64, errorRate float64) (*CuckooFilterRedis, error)

NewCuckooFilterWithErrorRate creates an CuckooFilterRedis with a specified false positive rate : _errorRate_ _size_ is the size of the BucketRedis slice _bucketSize_ is the size of the individual buckets inside the bucket slice _retries_ is the number of retries that the Cuckoo filter makes if the first two indices obtained _errorRate_ is the desired false positive rate of the filter. fingerPrintLength is calculated according to this error rate.

func NewCuckooFilterRedisWithRetries

func NewCuckooFilterRedisWithRetries(size, bucketSize, fingerPrintLength, retries uint64) (*CuckooFilterRedis, error)

NewCuckooFilterWithRetries creates new CuckooFilterRedis with specified _retries_ _size_ is the size of the BucketRedis slice _bucketSize_ is the size of the individual buckets inside the bucket slice _fingerPrintLength_ is fingerprint hash of the input to be inserted/removed/lookup _retries_ is the number of retries that the Cuckoo filter makes if the first two indices obtained after hashing the input is already occupied in the filter

func (CuckooFilterRedis) Equals

func (aFilter CuckooFilterRedis) Equals(bFilter CuckooFilterRedis) (bool, error)

func (*CuckooFilterRedis) Export

func (filter *CuckooFilterRedis) Export() ([]byte, error)

Export JSON marshals the CuckooFilterRedis and returns a byte slice containing the data

func (*CuckooFilterRedis) Import

func (filter *CuckooFilterRedis) Import(data []byte, withNewRedisKey bool) error

Import JSON unmarshals the _data_ into the CuckooFilterRedis

func (*CuckooFilterRedis) Insert

func (cuckooFilter *CuckooFilterRedis) Insert(data []byte, destructive bool) bool

Insert writes the _data_ in the Cuckoo Filter for future lookup _destructive_ parameter is used to specify if the previous ordering of the present entries is to be preserved after the retries (if that case arises)

func (CuckooFilterRedis) Key

func (cuckooFilter CuckooFilterRedis) Key() string

Key returns the value of the _key_ to the Redis list where all bucket keys are stored

func (*CuckooFilterRedis) Length

func (cuckooFilter *CuckooFilterRedis) Length() uint64

Length returns the current length of the Cuckoo Filter or the current number of entries present in the Cuckoo Filter. In CuckooFilterRedis, length is tracked using a key-value pair in Redis and modified using INCRBY Redis command

func (*CuckooFilterRedis) Lookup

func (cuckooFilter *CuckooFilterRedis) Lookup(data []byte) (bool, error)

Lookup returns true if the _data_ is present in the Cuckoo Filter, else false

func (CuckooFilterRedis) MetadataKey

func (cuckooFilter CuckooFilterRedis) MetadataKey() string

MetadataKey return _metadataKey_

func (*CuckooFilterRedis) Remove

func (cuckooFilter *CuckooFilterRedis) Remove(data []byte) (bool, error)

Remove deletes the _data_ from the Cuckoo Filter

type HyperLogLog

type HyperLogLog struct {
	AbstractHyperLogLog
	// contains filtered or unexported fields
}

HyperLogLog struct. This is an in-memory implementation of HyperLogLog. It's mainly governed by a 1-d slice _registers_ which holds the count of hashed items at different hashed locations _numRegisters_ is used to specify the size of the _registers_ slice _lock_ is used to synchronize concurrent read/writes

func NewHyperLogLog

func NewHyperLogLog(numRegisters uint64) (*HyperLogLog, error)

NewHyperLogLog creates new HyperLogLog with the specified _numRegisters_

func (*HyperLogLog) Count

func (h *HyperLogLog) Count(withCorrection, withRoundingOff bool) uint64

Count returns the number of distinct elements so far _withCorrection_ is used to specify if correction is to be done for large registers _withRoundingOff_ is used to specify if rounding off is required for estimation

func (*HyperLogLog) Equals

func (h *HyperLogLog) Equals(g *HyperLogLog) bool

Equals checks if two Hyperloglog data structures are equal

func (*HyperLogLog) Export

func (h *HyperLogLog) Export() ([]byte, error)

Export JSON marshals the HyperLogLog and returns a byte slice containing the data

func (*HyperLogLog) Import

func (h *HyperLogLog) Import(data []byte) error

Import JSON unmarshals the _data_ into the HyperLogLog

func (*HyperLogLog) Merge

func (h *HyperLogLog) Merge(g *HyperLogLog) error

Merge merges two Hyperloglog data structures

func (*HyperLogLog) ReadFrom

func (h *HyperLogLog) ReadFrom(stream io.Reader) (int64, error)

ReadFrom reads the BloomFilter from the specified _stream_ and returns the number of bytes read. It can be used to read from disk (using a file stream) or from network.

func (*HyperLogLog) Reset

func (h *HyperLogLog) Reset()

Reset sets all values in the _registers_ slice to zero

func (*HyperLogLog) Update

func (h *HyperLogLog) Update(data []byte)

Update sets the count of the passed _data_ (byte slice) to the hashed location in the _registers_ slice

func (*HyperLogLog) WriteTo

func (h *HyperLogLog) WriteTo(stream io.Writer) (int64, error)

WriteTo writes the HyperLogLog onto the specified _stream_ and returns the number of bytes written. It can be used to write to disk (using a file stream) or to network.

type HyperLogLogRedis

type HyperLogLogRedis struct {
	AbstractHyperLogLog
	// contains filtered or unexported fields
}

HyperLogLogRedis is the Redis backed implementation of BaseHyperLogLog _key_ holds the Redis key to the list which has the registers _metadataKey_ is used to store the additional information about HyperLogLogRedis for retrieving the sketch by the Redis key

func NewHyperLogLogRedis

func NewHyperLogLogRedis(numRegisters uint64) (*HyperLogLogRedis, error)

NewHyperLogLogRedis creates new HyperLogLogRedis with the specified _numRegisters_

func NewHyperLogLogRedisFromKey

func NewHyperLogLogRedisFromKey(metadataKey string) (*HyperLogLogRedis, error)

NewHyperLogLogRedisFromKey is used to create a new Redis backed HyperLogLogRedis from the _metadataKey_ (the Redis key used to store the metadata about the hyperloglog) passed. For this to work, value should be present in Redis at _key_

func (*HyperLogLogRedis) Count

func (h *HyperLogLogRedis) Count(withCorrection bool, withRoundingOff bool) (uint64, error)

Count returns the number of distinct elements so far _withCorrection_ is used to specify if correction is to be done for large registers _withRoundingOff_ is used to specify if rounding off is required for estimation

func (*HyperLogLogRedis) Equals

func (h *HyperLogLogRedis) Equals(g *HyperLogLogRedis) (bool, error)

Equals checks if two HyperLogLogRedis data structures are equal

func (*HyperLogLogRedis) Export

func (h *HyperLogLogRedis) Export() ([]byte, error)

Export JSON marshals the HyperLogLogRedis and returns a byte slice containing the data

func (*HyperLogLogRedis) Import

func (h *HyperLogLogRedis) Import(data []byte, withNewKey bool) error

Import JSON unmarshals the _data_ into the HyperLogLogRedis

func (*HyperLogLogRedis) Merge

Merge merges two HyperLogLogRedis data structures

func (*HyperLogLogRedis) MetadataKey

func (h *HyperLogLogRedis) MetadataKey() string

MetadataKey returns the metadataKey

func (*HyperLogLogRedis) Update

func (h *HyperLogLogRedis) Update(data []byte) error

Update sets the count of the passed _data_ (byte slice) to the hashed location in the Redis list at _key_

type IBitSet

type IBitSet interface {
	// contains filtered or unexported methods
}

type RedisConnOptions

type RedisConnOptions struct {
	DB                int
	Network           string
	Address           string
	Username          string
	Password          string
	ConnectionTimeout time.Duration
	ReadTimeout       time.Duration
	WriteTimeout      time.Duration
	PoolSize          int
	TLSConfig         *tls.Config
}

func ParseRedisURI

func ParseRedisURI(uri string) (*RedisConnOptions, error)

type TopK

type TopK struct {
	// contains filtered or unexported fields
}

In-memory TopK struct. _k_ is the number of top elements to track _errorRate_ is the acceptable error rate in topk estimation _accuracy_ is the delta in the error rate _sketch_ is the in-memory count-min sketch used to keep the estimated track of counts _heap_ is a min heap

func NewTopK

func NewTopK(k uint, errorRate, accuracy float64) *TopK

NewTopK creates new TopK _k_ is the number of top elements to track _errorRate_ is the acceptable error rate in topk estimation _accuracy_ is the delta in the error rate

func (*TopK) Equals

func (t *TopK) Equals(u *TopK) (bool, error)

Equals checks if two TopK structures are equal

func (*TopK) Export

func (t *TopK) Export() ([]byte, error)

Export JSON marshals the TopK and returns a byte slice containing the data

func (*TopK) Import

func (t *TopK) Import(data []byte) error

Import JSON unmarshals the _data_ into the TopK

func (*TopK) Insert

func (t *TopK) Insert(data []byte, count uint64)

Insert puts the _data_ (byte slice) in the TopK data structure with _count_ _data_ is the element to be inserted _count_ is the count of the element

func (*TopK) ReadFrom

func (t *TopK) ReadFrom(stream io.Reader) (int64, error)

ReadFrom reads the TopK from the specified _stream_ and returns the number of bytes read. It can be used to read from disk (using a file stream) or from network.

func (*TopK) Values

func (t *TopK) Values() []TopKElement

Values returns the top _k_ elements in the TopK data structure

func (*TopK) WriteTo

func (t *TopK) WriteTo(stream io.Writer) (int64, error)

WriteTo writes the TopK onto the specified _stream_ and returns the number of bytes written. It can be used to write to disk (using a file stream) or to network.

type TopKElement

type TopKElement struct {
	// contains filtered or unexported fields
}

TopKElement is the struct used to return the results of the TopK

type TopKRedis

type TopKRedis struct {
	// contains filtered or unexported fields
}

In-memory TopKRedis struct. _k_ is the number of top elements to track _errorRate_ is the acceptable error rate in topk estimation _accuracy_ is the delta in the error rate _sketch_ is the redis backed count-min sketch used to keep the estimated track of counts _heapKey_ is a key to Redis sorted set _metadataKey_ is used to store the additional information about TopKRedis

func NewTopKRedis

func NewTopKRedis(k uint, errorRate, accuracy float64) *TopKRedis

NewTopKRedis creates new TopKRedis _k_ is the number of top elements to track _errorRate_ is the acceptable error rate in topk estimation _accuracy_ is the delta in the error rate

func NewTopKRedisFromKey

func NewTopKRedisFromKey(metadataKey string) *TopKRedis

NewTopKRedisFromKey is used to create a new Redis backed TopKRedis from the _metadataKey_ (the Redis key used to store the metadata about the TopK) passed. For this to work, value should be present in Redis at _heapKey_

func (*TopKRedis) Equals

func (t *TopKRedis) Equals(u *TopKRedis) (bool, error)

Equals checks if two TopKRedis structures are equal

func (*TopKRedis) Export

func (t *TopKRedis) Export() ([]byte, error)

Export JSON marshals the TopKRedis and returns a byte slice containing the data

func (*TopKRedis) Import

func (t *TopKRedis) Import(data []byte, withNewKey bool) error

Import JSON unmarshals the _data_ into the TopKRedis

func (*TopKRedis) Insert

func (t *TopKRedis) Insert(data []byte, count uint64) error

Insert puts the _data_ (byte slice) in the TopKRedis data structure with _count_ _data_ is the element to be inserted _count_ is the count of the element

func (*TopKRedis) MetadataKey

func (t *TopKRedis) MetadataKey() string

MetadataKey returns the metadataKey

func (*TopKRedis) Values

func (t *TopKRedis) Values() ([]TopKElement, error)

Values returns the top _k_ elements in the TopKRedis data structure

Directories

Path Synopsis
internal

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL