roaring64

package module
v0.0.0-...-b4a3812 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Aug 9, 2021 License: Apache-2.0 Imports: 11 Imported by: 0

README

roaring Build Status Coverage Status GoDoc Go Report Card Build Status

This is a go version of the Roaring bitmap data structure, for 64-bit integers. It uses the same approach as the C++ and Rust version to build a BTree map of bitmaps.

Roaring bitmaps are used by several major systems such as Apache Lucene and derivative systems such as Solr and Elasticsearch, Apache Druid (Incubating), LinkedIn Pinot, Netflix Atlas, Apache Spark, OpenSearchServer, Cloud Torrent, Whoosh, Pilosa, Microsoft Visual Studio Team Services (VSTS), and eBay's Apache Kylin.

Roaring bitmaps are found to work well in many important applications:

Use Roaring for bitmap compression whenever possible. Do not use other bitmap compression methods (Wang et al., SIGMOD 2017)

The roaring Go library is used by

This library is used in production in several systems, it is part of the Awesome Go collection.

There are also Java and C/C++ versions. The Java, C, C++ and Go version are binary compatible: e.g, you can save bitmaps from a Java program and load them back in Go, and vice versa. We have a format specification.

This code is licensed under Apache License, Version 2.0 (ASL2.0).

Copyright 2016-... by the authors.

References
  • Daniel Lemire, Owen Kaser, Nathan Kurz, Luca Deri, Chris O'Hara, François Saint-Jacques, Gregory Ssi-Yan-Kai, Roaring Bitmaps: Implementation of an Optimized Software Library, Software: Practice and Experience 48 (4), 2018 arXiv:1709.07821
  • Samy Chambi, Daniel Lemire, Owen Kaser, Robert Godin, Better bitmap performance with Roaring bitmaps, Software: Practice and Experience 46 (5), 2016. http://arxiv.org/abs/1402.6407 This paper used data from http://lemire.me/data/realroaring2014.html
  • Daniel Lemire, Gregory Ssi-Yan-Kai, Owen Kaser, Consistently faster and smaller compressed bitmaps with Roaring, Software: Practice and Experience 46 (11), 2016. http://arxiv.org/abs/1603.06549
Dependencies

Dependencies are fetched automatically by giving the -t flag to go get.

they include

  • github.com/willf/bitset
  • github.com/mschoch/smat
  • github.com/glycerine/go-unsnap-stream
  • github.com/philhofer/fwd
  • github.com/jtolds/gls

Note that the smat library requires Go 1.6 or better.

Installation
  • go get -t github.com/RoaringBitmap/roaring
Example

Here is a simplified but complete example:

package main

import (
    "fmt"
    "github.com/RoaringBitmap/roaring"
    "bytes"
)


func main() {
    // example inspired by https://github.com/fzandona/goroar
    fmt.Println("==roaring==")
    rb1 := roaring64.New(1, 2, 3, 4, 5, 100, 1000)
    fmt.Println(rb1.String())

    rb2 := roaring64.New(3, 4, 1000)
    fmt.Println(rb2.String())

    rb3 := roaring64.New()
    fmt.Println(rb3.String())

    fmt.Println("Cardinality: ", rb1.GetCardinality())

    fmt.Println("Contains 3? ", rb1.Contains(3))

    rb1.And(rb2)

    rb3.Add(1)
    rb3.Add(5)

    rb3.Or(rb1)

    // computes union of the three bitmaps in parallel using 4 workers  
    roaring.ParOr(4, rb1, rb2, rb3)
    // computes intersection of the three bitmaps in parallel using 4 workers  
    roaring.ParAnd(4, rb1, rb2, rb3)


    // prints 1, 3, 4, 5, 1000
    i := rb3.Iterator()
    for i.HasNext() {
        fmt.Println(i.Next())
    }
    fmt.Println()

    // next we include an example of serialization
    buf := new(bytes.Buffer)
    rb1.WriteTo(buf) // we omit error handling
    newrb:= roaring.New()
    newrb.ReadFrom(buf)
    if rb1.Equals(newrb) {
    	fmt.Println("I wrote the content to a byte stream and read it back.")
    }
    // you can iterate over bitmaps using ReverseIterator(), Iterator, ManyIterator()
}

If you wish to use serialization and handle errors, you might want to consider the following sample of code:

	rb := BitmapOf(1, 2, 3, 4, 5, 100, 1000)
	buf := new(bytes.Buffer)
	size,err:=rb.WriteTo(buf)
	if err != nil {
		t.Errorf("Failed writing")
	}
	newrb:= New()
	size,err=newrb.ReadFrom(buf)
	if err != nil {
		t.Errorf("Failed reading")
	}
	if ! rb.Equals(newrb) {
		t.Errorf("Cannot retrieve serialized version")
	}

Given N integers in [0,x), then the serialized size in bytes of a Roaring bitmap should never exceed this bound:

8 + 9 * ((long)x+65535)/65536 + 2 * N

That is, given a fixed overhead for the universe size (x), Roaring bitmaps never use more than 2 bytes per integer. You can call BoundSerializedSizeInBytes for a more precise estimate.

Documentation

Current documentation is available at http://godoc.org/github.com/RoaringBitmap/roaring

Goroutine safety

In general, it should not generally be considered safe to access the same bitmaps using different goroutines--they are left unsynchronized for performance. Should you want to access a Bitmap from more than one goroutine, you should provide synchronization. Typically this is done by using channels to pass the *Bitmap around (in Go style; so there is only ever one owner), or by using sync.Mutex to serialize operations on Bitmaps.

Coverage

We test our software. For a report on our test coverage, see

https://coveralls.io/github/RoaringBitmap/roaring?branch=master

Benchmark

Type

     go test -bench Benchmark -run -

To run benchmarks on Real Roaring Datasets run the following:

go get github.com/RoaringBitmap/real-roaring-datasets
BENCH_REAL_DATA=1 go test -bench BenchmarkRealData -run -
Iterative use

You can use roaring with gore:

  • go get -u github.com/motemen/gore
  • Make sure that $GOPATH/bin is in your $PATH.
  • go get github.com/RoaringBitmap/roaring
$ gore
gore version 0.2.6  :help for help
gore> :import github.com/RoaringBitmap/roaring
gore> x:=roaring.New()
gore> x.Add(1)
gore> x.String()
"{1}"
Fuzzy testing

You can help us test further the library with fuzzy testing:

     go get github.com/dvyukov/go-fuzz/go-fuzz
     go get github.com/dvyukov/go-fuzz/go-fuzz-build
     go test -tags=gofuzz -run=TestGenerateSmatCorpus
     go-fuzz-build github.com/RoaringBitmap/roaring
     go-fuzz -bin=./roaring-fuzz.zip -workdir=workdir/ -timeout=200

Let it run, and if the # of crashers is > 0, check out the reports in the workdir where you should be able to find the panic goroutine stack traces.

Alternative in Go

There is a Go version wrapping the C/C++ implementation https://github.com/RoaringBitmap/gocroaring

For an alternative implementation in Go, see https://github.com/fzandona/goroar The two versions were written independently.

Mailing list/discussion group

https://groups.google.com/forum/#!forum/roaring-bitmaps

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type BTreemap

type BTreemap struct {
	// contains filtered or unexported fields
}

func FastAnd

func FastAnd(bitmaps ...*BTreemap) *BTreemap

FastAnd computes the intersection between many bitmaps quickly Compared to the And function, it can take many bitmaps as input, thus saving the trouble of manually calling "And" many times.

func FastOr

func FastOr(bitmaps ...*BTreemap) *BTreemap

FastOr computes the union between many bitmaps quickly, as opposed to having to call Or repeatedly. It might also be faster than calling Or repeatedly.

func FastXor

func FastXor(bitmaps ...*BTreemap) *BTreemap

FastXor computes the symmetric difference between many bitmaps quickly, as opposed to having to call Or repeatedly. It might also be faster than calling Xor repeatedly.

func HeapOr

func HeapOr(bitmaps ...*BTreemap) *BTreemap

HeapOr computes the union between many bitmaps quickly using a heap. It might be faster than calling Or repeatedly.

func HeapXor

func HeapXor(bitmaps ...*BTreemap) *BTreemap

HeapXor computes the symmetric difference between many bitmaps quickly (as opposed to calling Xor repeated). Internally, this function uses a heap. It might be faster than calling Xor repeatedly.

func New

func New(values ...uint64) *BTreemap

func (*BTreemap) Add

func (tm *BTreemap) Add(value uint64)

func (*BTreemap) AddInt

func (tm *BTreemap) AddInt(value int)

func (*BTreemap) AddMany

func (tm *BTreemap) AddMany(values []uint64)

func (*BTreemap) AddRange

func (tm *BTreemap) AddRange(rangeStart, rangeEnd uint64)

func (*BTreemap) And

func (tm *BTreemap) And(other *BTreemap)

func (*BTreemap) AndCardinality

func (tm *BTreemap) AndCardinality(other *BTreemap) uint64

func (*BTreemap) AndNot

func (tm *BTreemap) AndNot(other *BTreemap)

func (*BTreemap) CheckedAdd

func (tm *BTreemap) CheckedAdd(value uint64) bool

func (*BTreemap) CheckedRemove

func (tm *BTreemap) CheckedRemove(value uint64) bool

func (*BTreemap) Clear

func (tm *BTreemap) Clear()

func (*BTreemap) Clone

func (tm *BTreemap) Clone() *BTreemap

func (*BTreemap) Contains

func (tm *BTreemap) Contains(value uint64) bool

func (*BTreemap) ContainsInt

func (tm *BTreemap) ContainsInt(x int) bool

func (*BTreemap) Equals

func (tm *BTreemap) Equals(o interface{}) bool

func (*BTreemap) Flip

func (tm *BTreemap) Flip(rangeStart, rangeEnd uint64)

func (*BTreemap) FlipInt

func (tm *BTreemap) FlipInt(rangeStart, rangeEnd int)

func (*BTreemap) FromBase64

func (tm *BTreemap) FromBase64(str string) (int64, error)

func (*BTreemap) FromBuffer

func (tm *BTreemap) FromBuffer(buf []byte) (p int64, err error)

func (*BTreemap) GetCardinality

func (tm *BTreemap) GetCardinality() uint64

func (*BTreemap) GetSerializedSizeInBytes

func (tm *BTreemap) GetSerializedSizeInBytes() uint64

func (*BTreemap) GetSizeInBytes

func (tm *BTreemap) GetSizeInBytes() uint64

func (*BTreemap) Intersects

func (tm *BTreemap) Intersects(other *BTreemap) bool

func (*BTreemap) IsEmpty

func (tm *BTreemap) IsEmpty() bool

func (*BTreemap) Iterate

func (tm *BTreemap) Iterate(cb func(x uint64) bool)

func (*BTreemap) Iterator

func (tm *BTreemap) Iterator() IntPeekable

func (*BTreemap) ManyIterator

func (tm *BTreemap) ManyIterator() ManyIntIterable

func (*BTreemap) MarshalBinary

func (tm *BTreemap) MarshalBinary() ([]byte, error)

func (*BTreemap) Maximum

func (tm *BTreemap) Maximum() uint64

func (*BTreemap) Minimum

func (tm *BTreemap) Minimum() uint64

func (*BTreemap) Or

func (tm *BTreemap) Or(other *BTreemap)

func (*BTreemap) OrCardinality

func (tm *BTreemap) OrCardinality(other *BTreemap) uint64

func (*BTreemap) Rank

func (tm *BTreemap) Rank(value uint64) uint64

func (*BTreemap) ReadFrom

func (tm *BTreemap) ReadFrom(reader io.Reader) (p int64, err error)

func (*BTreemap) Remove

func (tm *BTreemap) Remove(value uint64)

func (*BTreemap) RemoveRange

func (tm *BTreemap) RemoveRange(rangeStart, rangeEnd uint64)

func (*BTreemap) ReverseIterator

func (tm *BTreemap) ReverseIterator() IntIterable

func (*BTreemap) RunOptimize

func (tm *BTreemap) RunOptimize()

func (*BTreemap) Select

func (tm *BTreemap) Select(value uint64) (uint64, error)

func (*BTreemap) Stats

func (tm *BTreemap) Stats() (stats roaring.Statistics)

func (*BTreemap) String

func (tm *BTreemap) String() string

func (*BTreemap) ToArray

func (tm *BTreemap) ToArray() []uint64

func (*BTreemap) ToBase64

func (tm *BTreemap) ToBase64() (string, error)

func (*BTreemap) ToBytes

func (tm *BTreemap) ToBytes() ([]byte, error)

func (*BTreemap) UnmarshalBinary

func (tm *BTreemap) UnmarshalBinary(data []byte) error

func (*BTreemap) WithCppSerializer

func (tm *BTreemap) WithCppSerializer() *BTreemap

serializer that is compatible with C++ version found in CRoaring at https://github.com/RoaringBitmap/CRoaring/blob/master/cpp/roaring64map.hh

func (*BTreemap) WithJvmSerializer

func (tm *BTreemap) WithJvmSerializer() *BTreemap

serializer that is compatible with JVM version of Treemap found in RoaringBitmap Java implementation at: https://github.com/RoaringBitmap/RoaringBitmap/blob/master/roaringbitmap/src/main/java/org/roaringbitmap/longlong/Roaring64NavigableMap.java

func (*BTreemap) WriteTo

func (tm *BTreemap) WriteTo(stream io.Writer) (int64, error)

func (*BTreemap) Xor

func (tm *BTreemap) Xor(other *BTreemap)

type Bitmap64

type Bitmap64 interface {
	ToBase64() (string, error)
	FromBase64(str string) (int64, error)
	WriteTo(stream io.Writer) (int64, error)
	ToBytes() ([]byte, error)
	ReadFrom(reader io.Reader) (p int64, err error)
	FromBuffer(buf []byte) (p int64, err error)
	RunOptimize()
	MarshalBinary() ([]byte, error)
	UnmarshalBinary(data []byte) error
	Clear()
	ToArray() []uint64
	GetSizeInBytes() uint64
	GetSerializedSizeInBytes() uint64
	String() string
	Iterate(cb func(x uint64) bool)
	Iterator() IntPeekable
	ReverseIterator() IntIterable
	ManyIterator() ManyIntIterable
	Clone() *BTreemap
	Minimum() uint64
	Maximum() uint64
	Contains(x uint64) bool
	ContainsInt(x int) bool
	Equals(o interface{}) bool
	Add(x uint64)
	CheckedAdd(x uint64) bool
	AddInt(x int)
	Remove(x uint64)
	CheckedRemove(x uint64) bool
	IsEmpty() bool
	GetCardinality() uint64
	And(other *BTreemap)
	OrCardinality(other *BTreemap) uint64
	AndCardinality(other *BTreemap) uint64
	Intersects(other *BTreemap) bool
	Xor(other *BTreemap)
	Or(other *BTreemap)
	AndNot(other *BTreemap)
	AddMany(dat []uint64)
	Rank(x uint64) uint64
	Select(x uint64) (uint64, error)
	Flip(rangeStart, rangeEnd uint64)
	FlipInt(rangeStart, rangeEnd int)
	AddRange(rangeStart, rangeEnd uint64)
	RemoveRange(rangeStart, rangeEnd uint64)
	Stats() roaring.Statistics
}

type IntIterable

type IntIterable interface {
	HasNext() bool
	Next() uint64
}

IntIterable allows you to iterate over the values in a Bitmap

type IntPeekable

type IntPeekable interface {
	IntIterable
	// PeekNext peeks the next value without advancing the iterator
	PeekNext() uint64
	// AdvanceIfNeeded advances as long as the next value is smaller than minval
	AdvanceIfNeeded(minval uint64)
}

IntPeekable allows you to look at the next value without advancing and advance as long as the next value is smaller than minval

type ManyIntIterable

type ManyIntIterable interface {
	// pass in a buffer to fill up with values, returns how many values were returned
	NextMany([]uint64) int
}

ManyIntIterable allows you to iterate over the values in a Bitmap

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL