Go Rumorlog
Package rumorlog tries to implement a fast way to map hashed identifiers to sequence numbers.
These sequence numbers can then be used to build integer bitmaps.
These bitmaps can then be used for set operations.
The API can be seen on godoc.
Installation
go get go.mindeco.de/rumorlog
Simple usage examples can be seen on the trie implementation package.
Implementations
A very naive first draft for the idea is done as a SQLite implementation with this schema:
CREATE TABLE simple_index (
seq integer PRIMARY KEY,
key blob unique
);
CREATE INDEX key_tree ON simple_index (key);
The idea is to use a trie data strucutre together with an append-only log to optimize both mapping directions.
The offset log will impose an (arbitrary) sequence on the hashes and the trie is used to find the sequence number for a given hash.
For comparisson there also is a flat mkv but the GetKey
lookup is implemented with a naive search loop.
Benchmark
We insert a fixed size set of identifiers before running benchmarks of GetKey
and GetSequence
to see if the lookups are stable with growing set sizes.
$ go test ./tests -timeout 20m -bench=. -benchtime=3s -benchmem | tee tests/first-draft.bench
goos: linux
goarch: amd64
pkg: go.mindeco.de/rumorlog/tests
BenchmarkAll/add-trie+ofst-8 683858 5073 ns/op 1304 B/op 13 allocs/op
BenchmarkAll/fill/trie+ofst-20-8 4797 706998 ns/op 87296 B/op 1064 allocs/op
BenchmarkAll/fill/trie+ofst-100-8 1308 2729179 ns/op 327041 B/op 5055 allocs/op
BenchmarkAll/fill/trie+ofst-1000-8 141 25566906 ns/op 2909403 B/op 50053 allocs/op
BenchmarkAll/fill/trie+ofst-10000-8 13 267205672 ns/op 29858106 B/op 495028 allocs/op
BenchmarkAll/getSeq/trie+ofst-20-8 1658960 2071 ns/op 655 B/op 6 allocs/op
BenchmarkAll/getSeq/trie+ofst-100-8 1733563 2121 ns/op 655 B/op 6 allocs/op
BenchmarkAll/getSeq/trie+ofst-1000-8 1649378 2176 ns/op 655 B/op 6 allocs/op
BenchmarkAll/getSeq/trie+ofst-10000-8 1592440 2293 ns/op 655 B/op 6 allocs/op
BenchmarkAll/getKey/trie+ofst-20-8 417561 8318 ns/op 2911 B/op 17 allocs/op
BenchmarkAll/getKey/trie+ofst-100-8 441717 8705 ns/op 2911 B/op 17 allocs/op
BenchmarkAll/getKey/trie+ofst-1000-8 385377 8540 ns/op 2911 B/op 17 allocs/op
BenchmarkAll/getKey/trie+ofst-10000-8 421308 9080 ns/op 2911 B/op 17 allocs/op
BenchmarkAll/add-modernkv-8 16825 251873 ns/op 62567 B/op 258 allocs/op
BenchmarkAll/fill/modernkv-20-8 61 58563973 ns/op 116278 B/op 1382 allocs/op
BenchmarkAll/fill/modernkv-100-8 75 47505093 ns/op 717073 B/op 8895 allocs/op
BenchmarkAll/fill/modernkv-1000-8 21 156441060 ns/op 25172910 B/op 130513 allocs/op
BenchmarkAll/fill/modernkv-10000-8 2 1891127095 ns/op 278148252 B/op 1695248 allocs/op
BenchmarkAll/getSeq/modernkv-20-8 537592 6851 ns/op 3940 B/op 48 allocs/op
BenchmarkAll/getSeq/modernkv-100-8 356325 10230 ns/op 10701 B/op 65 allocs/op
BenchmarkAll/getSeq/modernkv-1000-8 111956 28992 ns/op 73344 B/op 96 allocs/op
BenchmarkAll/getSeq/modernkv-10000-8 82911 38719 ns/op 57426 B/op 123 allocs/op
BenchmarkAll/getKey/modernkv-20-8 431254 8504 ns/op 3978 B/op 80 allocs/op
BenchmarkAll/getKey/modernkv-100-8 136754 25516 ns/op 14417 B/op 321 allocs/op
BenchmarkAll/getKey/modernkv-1000-8 7161 536460 ns/op 138602 B/op 3338 allocs/op
BenchmarkAll/getKey/modernkv-10000-8 631 4929963 ns/op 935402 B/op 28195 allocs/op
BenchmarkAll/add-sqlite-8 793 4979380 ns/op 2296 B/op 45 allocs/op
BenchmarkAll/fill/sqlite-20-8 32 114943073 ns/op 63338 B/op 656 allocs/op
BenchmarkAll/fill/sqlite-100-8 7 524763195 ns/op 193181 B/op 2739 allocs/op
BenchmarkAll/fill/sqlite-1000-8 1 5066513896 ns/op 1623088 B/op 26122 allocs/op
BenchmarkAll/fill/sqlite-10000-8 1 50756266937 ns/op 16168112 B/op 260126 allocs/op
BenchmarkAll/getSeq/sqlite-20-8 83376 43069 ns/op 2063 B/op 50 allocs/op
BenchmarkAll/getSeq/sqlite-100-8 83034 45102 ns/op 2063 B/op 50 allocs/op
BenchmarkAll/getSeq/sqlite-1000-8 84038 43742 ns/op 2063 B/op 50 allocs/op
BenchmarkAll/getSeq/sqlite-10000-8 82750 44007 ns/op 2063 B/op 50 allocs/op
BenchmarkAll/getKey/sqlite-20-8 177464 21390 ns/op 1439 B/op 29 allocs/op
BenchmarkAll/getKey/sqlite-100-8 170108 20689 ns/op 1439 B/op 29 allocs/op
BenchmarkAll/getKey/sqlite-1000-8 171957 20939 ns/op 1439 B/op 29 allocs/op
BenchmarkAll/getKey/sqlite-10000-8 176138 21546 ns/op 1439 B/op 29 allocs/op
PASS
ok go.mindeco.de/rumorlog/tests 707.135s
We can see that while sqlite is stable, it's orders of magnitude slower compared to the specialzied approach of trie+ofset.
ps: These results were generated on an Intel Core i7-4770 CPU @ 3.40GHz with 16Gb RAM.
License
MIT