disk_bloom

package module
v1.0.1 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Oct 27, 2021 License: Apache-2.0 Imports: 9 Imported by: 6

README

disk-bloom-filter

Disk-based bloom filter. It operates the bloom filter on the disk for long-term anti-replay protection.

func doubleFNVFactory(salt []byte) func (b []byte) (uint64, uint64) {
	return func(b []byte) (uint64, uint64) {
		hx := fnv.New64()
		hx.Write(b)
		hx.Write(salt)
		x := hx.Sum64()
		hy := fnv.New64a()
		hy.Write(b)
		hy.Write(salt)
		y := hy.Sum64()
		return x, y
	}
}

const (
    n         = 1e8
    expectFPR = 1e-6
)

bf, err := disk_bloom.NewGroup("testfile*", disk_bloom.FsyncModeEverySec, n, expectFPR, doubleFNVFactory([]byte("some_salt")))
if err != nil {
    panic(err)
}
bf.ExistOrAdd([]byte("hello"))
bf.Exist([]byte("hello"))
bf.ExistOrAdd([]byte("world"))

Note that it is not recommended to use doubleFNV directly, please be sure to add user-personalized salt to prevent active detection attacks based on hash collisions.

Benchmark

goos: linux
goarch: amd64
cpu: Intel(R) Core(TM) i5-8250U CPU @ 1.60GHz
disk: LENSE20256GMSP34MEAT2TA NVME

pkg: github.com/mzz2017/disk-bloom
BenchmarkFilterGroup_ExistOrAdd
BenchmarkFilterGroup_ExistOrAdd-8   	   32413	     33660 ns/op	     521 B/op	       9 allocs/op
BenchmarkFilterGroup_Exist
BenchmarkFilterGroup_Exist-8        	  631995	      1871 ns/op	     304 B/op	       7 allocs/op
BenchmarkDiskFilter_ExistOrAdd
BenchmarkDiskFilter_ExistOrAdd-8    	   34406	     34097 ns/op	     520 B/op	       9 allocs/op
BenchmarkDiskFilter_Exist
BenchmarkDiskFilter_Exist-8         	  626137	      1857 ns/op	     304 B/op	       7 allocs/op

pkg: github.com/riobard/go-bloom
BenchmarkAdd
BenchmarkAdd-8    	 5635690	       186.9 ns/op	       0 B/op	       0 allocs/op
BenchmarkTest
BenchmarkTest-8   	15413245	        74.68 ns/op	       0 B/op	       0 allocs/op

Thanks

Documentation

Index

Constants

View Source
const LenOfMetadataSize = 2

Variables

View Source
var InconsistentMetadataSizeErr = fmt.Errorf("inconsistent metadata size")
View Source
var (
	InvalidPatternErr = fmt.Errorf("invalid pattern")
)

Functions

func OptimalParam

func OptimalParam(n uint64, p float64) (slots uint8, bits uint64)

n is the expected number of entries. p is the expected false positive rate.

Types

type Controller

type Controller struct {
	Fsync FsyncMode
	// Size in bytes
	MetadataSize uint16
	// Control will be invoked every second.
	//
	// | len of metadata size(2 bytes) | metadata | bloom filter |
	Control func(f *os.File, modified bool)
	// GetParam will be invoked when New.
	//
	// | len of metadata size(2 bytes) | metadata | bloom filter |
	GetParam func(metadata []byte) (param FilterParam, updatedMetadata []byte)
}

type DiskFilter

type DiskFilter struct {
	// contains filtered or unexported fields
}

Disk-based Classic Bloom Filter

func New

func New(filename string, controller Controller) (*DiskFilter, error)

New creates a classic Bloom Filter. h is a double hash that takes an entry and returns two different hashes.

func (*DiskFilter) Close

func (f *DiskFilter) Close() error

Close should be invoked if the filter is not needed anymore

func (*DiskFilter) Controller

func (f *DiskFilter) Controller() Controller

func (*DiskFilter) Exist

func (f *DiskFilter) Exist(b []byte) bool

Exist returns if an entry is in the filter

func (*DiskFilter) ExistOrAdd

func (f *DiskFilter) ExistOrAdd(b []byte) (exist bool)

ExistOrAdd returns whether the entry was in the filter, and adds an entry to the filter if it was not in.

func (*DiskFilter) FilterParam

func (f *DiskFilter) FilterParam() FilterParam

func (*DiskFilter) Size

func (f *DiskFilter) Size() uint64

Size returns the size of the filter in bytes

type FilterGroup

type FilterGroup struct {
	// contains filtered or unexported fields
}

FilterGroup manages a group of DiskFilter. To make sure each filter meets the best performance, it will add new filters to the group if necessary.

func NewGroup

func NewGroup(pattern string, fsync FsyncMode, n uint64, p float64, hash func([]byte) (uint64, uint64)) (*FilterGroup, error)

NewGroup returns a FilterGroup, each filter is a file. The filenames are generated by taking pattern and adding a index to the end. the Pattern should includes a "*", and the index replaces the last "*". n is the expected number of entries in single file. p is the expected false positive rate.

func (*FilterGroup) Exist

func (g *FilterGroup) Exist(b []byte) (exist bool)

Exist returns if an entry is in the filterGroup

func (*FilterGroup) ExistOrAdd

func (g *FilterGroup) ExistOrAdd(b []byte) (exist bool)

ExistOrAdd returns whether the entry was in the filter, and adds an entry to the filterGroup if it was not in. TODO: batch?

type FilterParam

type FilterParam struct {
	Slots uint8
	Bits  uint64
	Hash  func([]byte) (uint64, uint64)
}

type FsyncMode

type FsyncMode int
const (
	FsyncModeAlways FsyncMode = iota
	FsyncModeEverySec
	FsyncModeNo
)

type Metadata

type Metadata struct {
	Added    uint64
	Expected uint64
	Slots    uint8
	Bits     uint64
}

|added entries(8)|expected max entries(8)|slots(1)|bits(8)|reserved(39)|

func (Metadata) Encode

func (m Metadata) Encode() []byte

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL