sprout

package module
v0.0.3 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Mar 5, 2022 License: MIT Imports: 11 Imported by: 0

README

Sprout

A bloom filter is a probabilistic data structure that is used to determine if an element is present in a set. Bloom filters are fast and space efficient. They allow for false positives, but mitigate the probability with an expected false positive rate. An error rate of 0.001 implies that the probability of a false positive is 1 in 1000. Bloom filters don't store the elements themselves, but instead use a set of hash functions to determine the presence of an element.

To fulfil the false positive rate, bloom filters can be initialized with a capacity. The capacity is the number of elements that can be inserted into the bloom filter, and this cannot be changed.

Sprout implements a bloom filter in Go. The bits of the filter are stored in a memory-mapped file. Sprout also allows attaching a persistent storage (boltdb and badgerdb) to store the key value pairs.

Sprout also implement a scalable bloom filter described in a paper written by P. Almeida, C.Baquero, N. Preguiça, D. Hutchison.

A scalable bloom filter allows you to grow the filter beyond the initial filter capacity, while preserving the desired false positive rate.

Storage Size

Bloom filters are space efficient, as they only store the bits returned from the hash function. For a filter with a capacity of 2,000,000 and a error rate of 0.001, the storage size is approximately 3.4MB. That implies that there are approximately 1.8 bytes (~14 bits) per element. The number of bits per element is as a result of the number of hash functions, which is derived from the capacity and the error rate.

Scalable Bloom Filters: A scalable bloom filter initialized with a capacity of 200,000 and an error rate of 0.001, when grown to a capacity of 2,000,000, the total storage size is approximately 4.3MB.

Comparison to Key-Value stores

Adding 2 million elements (with a single byte value)

Database Size
BoltDB 108MB
BadgerDB 128MB
Sprout 3.4MB
Sprout (Sbf) 4.3MB

Installation

go get github.com/dsa0x/sprout

Usage

Sprout contains implementation of both the normal and scalable bloom filter via the methods below:

opts := &sprout.BloomOptions{
		Err_rate: 0.001,
		Capacity: 100000,
	}
// Normal Bloom Filter
bf := sprout.NewBloom(opts)

// Scalable Bloom Filter
sbf := sprout.NewScalableBloom(opts)

With a persistent store

Sprout supports boltdb and badgerdb as persistent storage. Using them is very simple. Sprout exposes methods that initializes the database and then they can be attached to the bloom filter.

Using Boltdb

// initialize boltdb
db := sprout.NewBolt("/tmp/test.db", 0600)

// the bolt store can be configured as defined in the boltdb documentations
opts := bbolt.Options{
		Timeout: 10 * time.Second,
	}
db = sprout.NewBolt("/tmp/test.db", 0600, opts)
defer db.Close()

opts := &sprout.BloomOptions{
		Err_rate: 0.01,
		Path:     "bloom.db",
		Capacity: 100,
	}
bf := sprout.NewBloom(opts)

Example

package main

import (
	"fmt"

	"github.com/dgraph-io/badger/v3"
	"github.com/dsa0x/sprout"
)

func main() {

	opts := &sprout.BloomOptions{
		Err_rate: 0.001,
		Path:     "bloom.db",
		Capacity: 100000,
	}
	bf := sprout.NewBloom(opts)
	defer bf.Close()
	bf.Add([]byte("foo"))
	fmt.Println(bf.Contains([]byte("foo")))

	// with a persistent store
	badgerOpts := badger.DefaultOptions("/tmp/store.db")
	db := sprout.NewBadger(badgerOpts)
	opts.Database = db
	bf := sprout.NewScalableBloom(opts)

	bf.Put([]byte("key"), []byte("bar"))
	fmt.Printf("%s\n", bf.Get([]byte("key")))
}

References

  1. P. Almeida, C.Baquero, N. Preguiça, D. Hutchison
  2. Austin Appleby Murmur hash Source Code

Documentation

Index

Constants

This section is empty.

Variables

View Source
var DefaultBloomOptions = BloomOptions{
	Path:       "bloom.db",
	Err_rate:   0.001,
	Capacity:   10000,
	GrowthRate: 2,
	Database:   nil,
}
View Source
var ErrKeyNotFound = fmt.Errorf("Key not found")

Functions

This section is empty.

Types

type BadgerStore

type BadgerStore struct {
	// contains filtered or unexported fields
}

func NewBadger

func NewBadger(opts ...badger.Options) *BadgerStore

NewBadger instantiates a new BadgerStore.

func (*BadgerStore) Close

func (store *BadgerStore) Close() error

func (*BadgerStore) DB

func (store *BadgerStore) DB() interface{}

func (*BadgerStore) Get

func (store *BadgerStore) Get(key []byte) ([]byte, error)

func (*BadgerStore) Put

func (store *BadgerStore) Put(key, value []byte) error

type BloomFilter

type BloomFilter struct {
	// contains filtered or unexported fields
}

func NewBloom

func NewBloom(opts *BloomOptions) *BloomFilter

NewBloom creates a new bloom filter. err_rate is the desired false error rate. e.g. 0.001 implies 1 false positive in 1000 lookups

capacity is the number of entries intended to be added to the filter

database is the persistent store to attach to the filter. can be nil.

func (*BloomFilter) Add

func (bf *BloomFilter) Add(key []byte) error

Add adds the key to the bloom filter

func (*BloomFilter) Capacity

func (bf *BloomFilter) Capacity() int

Capacity returns the total capacity of the scalable bloom filter

func (*BloomFilter) Clear

func (bf *BloomFilter) Clear()

Clear resets all bits in the bloom filter

func (*BloomFilter) Close

func (bf *BloomFilter) Close() error

Close flushes the file to disk and closes the file handle to the filter

func (*BloomFilter) Contains

func (bf *BloomFilter) Contains(key []byte) bool

Contains checks if the key exists in the bloom filter

func (*BloomFilter) Count

func (bf *BloomFilter) Count() int

Count returns the number of items added to the bloom filter

func (*BloomFilter) DB

func (bf *BloomFilter) DB() interface{}

DB returns the underlying persistent store

func (*BloomFilter) FilterSize

func (bf *BloomFilter) FilterSize() int

FilterSize returns the size of the bloom filter

func (*BloomFilter) Get

func (bf *BloomFilter) Get(key []byte) []byte

Get gets the key from the underlying persistent store

func (*BloomFilter) Merge

func (bf *BloomFilter) Merge(bf2 *BloomFilter) error

Merge merges the filter with another bloom filter. Both filters must have the same capacity and error rate. merging increases the false positive rate of the resulting filter

func (*BloomFilter) Put

func (bf *BloomFilter) Put(key, val []byte) error

Put adds the key to the bloom filter, and also stores it in the persistent store

func (*BloomFilter) Stats

func (bf *BloomFilter) Stats() BloomFilterStats

Stats returns the stats of the bloom filter

type BloomFilter2

type BloomFilter2 struct {
	// contains filtered or unexported fields
}

func NewBloom2

func NewBloom2(opts *BloomOptions) *BloomFilter2

NewBloom2 creates a new bloom filter in-memory err_rate is the desired false positive rate. e.g. 0.1 error rate implies 1 in 1000

capacity is the number of entries intended to be added to the filter

database is the persistent store to attach to the filter. can be nil.

func (*BloomFilter2) Add

func (bf *BloomFilter2) Add(key, val []byte)

Add adds the key to the bloom filter

func (*BloomFilter2) Capacity

func (bf *BloomFilter2) Capacity() int

Capacity returns the total capacity of the scalable bloom filter

func (*BloomFilter2) Close

func (bf *BloomFilter2) Close() error

Close closes the file handle to the filter and the persistent store (if any)

func (*BloomFilter2) Contains

func (bf *BloomFilter2) Contains(key []byte) bool

Find checks if the key exists in the bloom filter

func (*BloomFilter2) Count

func (bf *BloomFilter2) Count() int

Count returns the number of items added to the bloom filter

func (*BloomFilter2) FilterSize

func (bf *BloomFilter2) FilterSize() int

FilterSize returns the size of the bloom filter

func (*BloomFilter2) Get

func (bf *BloomFilter2) Get(key []byte) []byte

Get Gets the key from the underlying persistent store

type BloomFilterStats

type BloomFilterStats struct {
	Capacity int
	Count    int
	Size     int
	M        int
	K        int

	// Prob is the error probability of the filter
	Prob float64
}

type BloomOptions

type BloomOptions struct {

	// path to the filter
	Path string

	// The desired false positive rate
	Err_rate float64

	// the number of items intended to be added to the bloom filter (n)
	Capacity int

	// persistent storage
	Database Store

	// growth rate of the bloom filter (valid values are 2 and 4)
	GrowthRate GrowthRate
	// contains filtered or unexported fields
}

BloomOptions is the options for creating a new bloom filter

type BoltStore

type BoltStore struct {
	// contains filtered or unexported fields
}

func NewBolt

func NewBolt(filePath string, filemode os.FileMode, opts ...bolt.Options) *BoltStore

NewBolt instantiates a new BoltStore.

func (*BoltStore) Close

func (store *BoltStore) Close() error

func (*BoltStore) DB

func (store *BoltStore) DB() interface{}

func (*BoltStore) Get

func (store *BoltStore) Get(key []byte) ([]byte, error)

func (*BoltStore) Put

func (store *BoltStore) Put(key []byte, value []byte) error

type GrowthRate

type GrowthRate uint
var (
	// GrowthSmall represents a small expected set growth
	GrowthSmall GrowthRate = 2
	// GrowthLarge represents a large expected set growth
	GrowthLarge GrowthRate = 4
)

type ScalableBloomFilter

type ScalableBloomFilter struct {
	// contains filtered or unexported fields
}

func NewScalableBloom

func NewScalableBloom(opts *BloomOptions) *ScalableBloomFilter

NewScalableBloom creates a new scalable bloom filter. err_rate is the desired false error rate. e.g. 0.001 implies 1 false positive in 1000 lookups initial_capacity is the initial capacity of the bloom filter. When the number of items exceed the initial capacity, a new filter is created.

The growth rate defaults to 2.

func (*ScalableBloomFilter) Add

func (sbf *ScalableBloomFilter) Add(key []byte)

Add adds a key to the scalable bloom filter Complexity: O(k)

func (*ScalableBloomFilter) Capacity

func (sbf *ScalableBloomFilter) Capacity() int

Size returns the total capacity of the scalable bloom filter

func (*ScalableBloomFilter) Clear

func (sbf *ScalableBloomFilter) Clear()

Clear resets all bits in the bloom filter

func (*ScalableBloomFilter) Close

func (sbf *ScalableBloomFilter) Close() error

Close closes the scalable bloom filter

func (*ScalableBloomFilter) Contains

func (sbf *ScalableBloomFilter) Contains(key []byte) bool

Contains checks if the key is in the bloom filter Complexity: O(k*n)

func (*ScalableBloomFilter) Count

func (sbf *ScalableBloomFilter) Count() int

Count returns the number of items added to the bloom filter

func (*ScalableBloomFilter) DB

func (sbf *ScalableBloomFilter) DB() Store

DB returns the store used by the scalable bloom filter

func (*ScalableBloomFilter) Get

func (sbf *ScalableBloomFilter) Get(key []byte) []byte

Get returns the value associated with the key

func (*ScalableBloomFilter) Put

func (sbf *ScalableBloomFilter) Put(key, val []byte) error

Put adds a key to the scalable bloom filter, and puts the value in the database

func (*ScalableBloomFilter) Stats

Stats returns the stats of the bloom filter

func (*ScalableBloomFilter) Top

func (sbf *ScalableBloomFilter) Top() *BloomFilter

Top returns the top filter in the scalable bloom filter

type Store

type Store interface {
	Close() error
	Get(key []byte) ([]byte, error)
	Put(key, value []byte) error

	DB() interface{}
	// contains filtered or unexported methods
}

Directories

Path Synopsis
pkg

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL