roaring

package
v0.0.0-...-0cf3de4 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Apr 9, 2016 License: Apache-2.0, MIT Imports: 9 Imported by: 0

README

roaring Build StatusGoDoc

This is a go port of the Roaring bitmap data structure.

Roaring is used by Apache Spark (https://spark.apache.org/), Apache Kylin (http://kylin.io), Druid.io (http://druid.io/), Whoosh (https://pypi.python.org/pypi/Whoosh/) and Apache Lucene (http://lucene.apache.org/) (as well as supporting systems such as Solr and Elastic).

The original java version can be found at https://github.com/RoaringBitmap/RoaringBitmap

The Java and Go version are meant to be binary compatible: you can save bitmaps from a Java program and load them back in Go, and vice versa.

This code is licensed under Apache License, Version 2.0 (ASL2.0).

Contributors: Todd Gruben (@tgruben), Daniel Lemire (@lemire), Elliot Murphy (@statik), Bob Potter (@bpot), Tyson Maly (@tvmaly), Will Glynn (@willglynn), Brent Pedersen (@brentp)

References
Dependencies
  • go get github.com/smartystreets/goconvey/convey
  • go get github.com/willf/bitset

Naturally, you also need to grab the roaring code itself:

  • go get github.com/RoaringBitmap/roaring
Example

Here is a simplified but complete example:

package main

import (
    "fmt"
    "github.com/RoaringBitmap/roaring"
    "bytes"
)


func main() {
    // example inspired by https://github.com/fzandona/goroar
    fmt.Println("==roaring==")
    rb1 := roaring.BitmapOf(1, 2, 3, 4, 5, 100, 1000)
    fmt.Println(rb1.String())

    rb2 := roaring.BitmapOf(3, 4, 1000)
    fmt.Println(rb2.String())

    rb3 := roaring.NewBitmap()
    fmt.Println(rb3.String())

    fmt.Println("Cardinality: ", rb1.GetCardinality())

    fmt.Println("Contains 3? ", rb1.Contains(3))

    rb1.And(rb2)

    rb3.Add(1)
    rb3.Add(5)

    rb3.Or(rb1)

    // prints 1, 3, 4, 5, 1000
    i := rb3.Iterator()
    for i.HasNext() {
        fmt.Println(i.Next())
    }
    fmt.Println()

    // next we include an example of serialization
    buf := new(bytes.Buffer)
    rb1.WriteTo(buf) // we omit error handling
    newrb:= roaring.NewBitmap()
    newrb.ReadFrom(buf)
    if rb1.Equals(newrb) {
    	fmt.Println("I wrote the content to a byte stream and read it back.")
    }
}

If you wish to use serialization and handle errors, you might want to consider the following sample of code:

	rb := BitmapOf(1, 2, 3, 4, 5, 100, 1000)
	buf := new(bytes.Buffer)
	size,err:=rb.WriteTo(buf)
	if err != nil {
		t.Errorf("Failed writing")
	}
	newrb:= NewBitmap()
	size,err=newrb.ReadFrom(buf)
	if err != nil {
		t.Errorf("Failed reading")
	}
	if ! rb.Equals(newrb) {
		t.Errorf("Cannot retrieve serialized version")
	}

Given N integers in [0,x), then the serialized size in bytes of a Roaring bitmap should never exceed this bound:

8 + 9 * ((long)x+65535)/65536 + 2 * N

That is, given a fixed overhead for the universe size (x), Roaring bitmaps never use more than 2 bytes per integer. You can call BoundSerializedSizeInBytes for a more precise estimate.

Documentation

Current documentation is available at http://godoc.org/github.com/RoaringBitmap/roaring

Benchmark

Type

     go test -bench Benchmark -run -
Compatibility with Java RoaringBitmap library

You can read bitmaps in Go (resp. Java) that have been serialized in Java (resp. Go) with the caveat that the Go library does not yet support run containers. So if you plan to read bitmaps serialized from Java in Go, you might want to call removeRunCompression prior to serializing your Java instances. This is a temporary limitation: we plan to add support for run containers to the Go library.

Alternative

For an alternative implementation in Go, see https://github.com/fzandona/goroar The two versions were written independently.

Documentation

Overview

Package roaring is an implementation of Roaring Bitmaps in Go. They provide fast compressed bitmap data structures (also called bitset). They are ideally suited to represent sets of integers over relatively small ranges. See http://roaringbitmap.org for details.

Example (Roaring)

Example_roaring demonstrates how to use the roaring library.

package main

import (
	"bytes"
	"fmt"
	"github.com/RoaringBitmap/roaring"
)

func main() {
	// example inspired by https://github.com/fzandona/goroar
	fmt.Println("==roaring==")
	rb1 := roaring.BitmapOf(1, 2, 3, 4, 5, 100, 1000)
	fmt.Println(rb1.String())

	rb2 := roaring.BitmapOf(3, 4, 1000)
	fmt.Println(rb2.String())

	rb3 := roaring.NewBitmap()
	fmt.Println(rb3.String())

	fmt.Println("Cardinality: ", rb1.GetCardinality())

	fmt.Println("Contains 3? ", rb1.Contains(3))

	rb1.And(rb2)

	rb3.Add(1)
	rb3.Add(5)

	rb3.Or(rb1)

	// prints 1, 3, 4, 5, 1000
	i := rb3.Iterator()
	for i.HasNext() {
		fmt.Println(i.Next())
	}
	fmt.Println()

	// next we include an example of serialization
	buf := new(bytes.Buffer)
	size, err := rb1.WriteTo(buf)
	if err != nil {
		fmt.Println("Failed writing")
		return
	} else {
		fmt.Println("Wrote ", size, " bytes")
	}
	newrb := roaring.NewBitmap()
	size, err = newrb.ReadFrom(buf)
	if err != nil {
		fmt.Println("Failed reading")
		return
	}
	if !rb1.Equals(newrb) {
		fmt.Println("I did not get back to original bitmap?")
		return
	} else {
		fmt.Println("I wrote the content to a byte stream and read it back.")
	}
}
Output:

Index

Examples

Constants

This section is empty.

Variables

This section is empty.

Functions

func BoundSerializedSizeInBytes

func BoundSerializedSizeInBytes(cardinality uint64, universe_size uint64) uint64

BoundSerializedSizeInBytes returns an upper bound on the serialized size in bytes assuming that one wants to store "cardinality" integers in [0, universe_size)

Types

type Bitmap

type Bitmap struct {
	// contains filtered or unexported fields
}

Bitmap represents a compressed bitmap where you can add integers.

func And

func And(x1, x2 *Bitmap) *Bitmap

And computes the intersection between two bitmaps and returns the result

func AndNot

func AndNot(x1, x2 *Bitmap) *Bitmap

AndNot computes the difference between two bitmaps and returns the result

func BitmapOf

func BitmapOf(dat ...uint32) *Bitmap

BitmapOf generates a new bitmap filled with the specified integer

func FastAnd

func FastAnd(bitmaps ...*Bitmap) *Bitmap

FastAnd computes the intersection between many bitmaps quickly Compared to the And function, it can take many bitmaps as input, thus saving the trouble of manually calling "And" many times.

func FastOr

func FastOr(bitmaps ...*Bitmap) *Bitmap

FastOr computes the union between many bitmaps quickly, as opposed to having to call Or repeatedly. It might also be faster than calling Or repeatedly.

func Flip

func Flip(bm *Bitmap, rangeStart, rangeEnd uint32) *Bitmap

Flip negates the bits in the given range, any integer present in this range and in the bitmap is removed, and any integer present in the range and not in the bitmap is added, a new bitmap is returned leaving the current bitmap unchanged

func FlipInt

func FlipInt(bm *Bitmap, rangeStart, rangeEnd int) *Bitmap

FlipInt calls Flip after casting the parameters to uint32 (convenience method)

func HeapOr

func HeapOr(bitmaps ...*Bitmap) *Bitmap

HeapOr computes the union between many bitmaps quickly using a heap. It might be faster than calling Or repeatedly.

func HeapXor

func HeapXor(bitmaps ...*Bitmap) *Bitmap

HeapXor computes the symmetric difference between many bitmaps quickly (as opposed to calling Xor repeated). Internally, this function uses a heap. It might be faster than calling Xor repeatedly.

func NewBitmap

func NewBitmap() *Bitmap

NewBitmap creates a new empty Bitmap

func Or

func Or(x1, x2 *Bitmap) *Bitmap

Or computes the union between two bitmaps and returns the result

func Xor

func Xor(x1, x2 *Bitmap) *Bitmap

Xor computes the symmetric difference between two bitmaps and returns the result

func (*Bitmap) Add

func (rb *Bitmap) Add(x uint32)

Add the integer x to the bitmap

func (*Bitmap) AddInt

func (rb *Bitmap) AddInt(x int)

AddInt adds the integer x to the bitmap (convenience method: the parameter is casted to uint32 and we call Add)

func (*Bitmap) AddRange

func (rb *Bitmap) AddRange(rangeStart, rangeEnd uint32)

AddRange adds the integers in [rangeStart, rangeEnd) to the bitmap

func (*Bitmap) And

func (rb *Bitmap) And(x2 *Bitmap)

And computes the intersection between two bitmaps and stores the result in the current bitmap

func (*Bitmap) AndCardinality

func (rb *Bitmap) AndCardinality(x2 *Bitmap) uint64

AndCardinality returns the cardinality of the intersection between two bitmaps, bitmaps are not modified

func (*Bitmap) AndNot

func (rb *Bitmap) AndNot(x2 *Bitmap)

AndNot computes the difference between two bitmaps and stores the result in the current bitmap

func (*Bitmap) CheckedAdd

func (rb *Bitmap) CheckedAdd(x uint32) bool

CheckedAdd adds the integer x to the bitmap and return true if it was added (false if the integer was already present)

func (*Bitmap) CheckedRemove

func (rb *Bitmap) CheckedRemove(x uint32) bool

CheckedRemove removes the integer x from the bitmap and return true if the integer was effectively remove (and false if the integer was not present)

func (*Bitmap) Clear

func (rb *Bitmap) Clear()

Clear removes all content from the Bitmap and frees the memory

func (*Bitmap) Clone

func (rb *Bitmap) Clone() *Bitmap

Clone creates a copy of the Bitmap

func (*Bitmap) Contains

func (rb *Bitmap) Contains(x uint32) bool

Contains returns true if the integer is contained in the bitmap

func (*Bitmap) ContainsInt

func (rb *Bitmap) ContainsInt(x int) bool

ContainsInt returns true if the integer is contained in the bitmap (this is a convenience method, the parameter is casted to uint32 and Contains is called)

func (*Bitmap) Equals

func (rb *Bitmap) Equals(o interface{}) bool

Equals returns true if the two bitmaps contain the same integers

func (*Bitmap) Flip

func (rb *Bitmap) Flip(rangeStart, rangeEnd uint32)

Flip negates the bits in the given range, any integer present in this range and in the bitmap is removed, and any integer present in the range and not in the bitmap is added

func (*Bitmap) FlipInt

func (rb *Bitmap) FlipInt(rangeStart, rangeEnd int)

FlipInt calls Flip after casting the parameters to uint32 (convenience method)

func (*Bitmap) FromBase64

func (rb *Bitmap) FromBase64(str string) (int64, error)

FromBase64 deserializes a bitmap from Base64

func (*Bitmap) GetCardinality

func (rb *Bitmap) GetCardinality() uint64

GetCardinality returns the number of integers contained in the bitmap

func (*Bitmap) GetSerializedSizeInBytes

func (rb *Bitmap) GetSerializedSizeInBytes() uint64

GetSerializedSizeInBytes computes the serialized size in bytes the Bitmap. It should correspond to the number of bytes written when invoking WriteTo

func (*Bitmap) GetSizeInBytes

func (rb *Bitmap) GetSizeInBytes() uint64

GetSizeInBytes estimates the memory usage of the Bitmap. Note that this might differ slightly from the amount of bytes required for persistent storage

func (*Bitmap) Intersects

func (rb *Bitmap) Intersects(x2 *Bitmap) bool

Intersects checks whether two bitmap intersects, bitmaps are not modified

func (*Bitmap) IsEmpty

func (rb *Bitmap) IsEmpty() bool

IsEmpty returns true if the Bitmap is empty (it is faster than doing (GetCardinality() == 0))

func (*Bitmap) Iterator

func (rb *Bitmap) Iterator() IntIterable

Iterator creates a new IntIterable to iterate over the integers contained in the bitmap, in sorted order

func (*Bitmap) Or

func (rb *Bitmap) Or(x2 *Bitmap)

Or computes the union between two bitmaps and stores the result in the current bitmap

func (*Bitmap) OrCardinality

func (rb *Bitmap) OrCardinality(x2 *Bitmap) uint64

OrCardinality returns the cardinality of the union between two bitmaps, bitmaps are not modified

func (*Bitmap) Rank

func (rb *Bitmap) Rank(x uint32) uint32

Rank returns the number of integers that are smaller or equal to x (Rank(infinity) would be GetCardinality())

func (*Bitmap) ReadFrom

func (rb *Bitmap) ReadFrom(stream io.Reader) (int64, error)

ReadFrom reads a serialized version of this bitmap from stream

func (*Bitmap) Remove

func (rb *Bitmap) Remove(x uint32)

Remove the integer x from the bitmap

func (*Bitmap) RemoveRange

func (rb *Bitmap) RemoveRange(rangeStart, rangeEnd uint32)

RemoveRange removes the integers in [rangeStart, rangeEnd) from the bitmap

func (*Bitmap) Select

func (rb *Bitmap) Select(x uint32) (uint32, error)

Select returns the xth integer in the bitmap

func (*Bitmap) String

func (rb *Bitmap) String() string

String creates a string representation of the Bitmap

func (*Bitmap) ToArray

func (rb *Bitmap) ToArray() []uint32

ToArray creates a new slice containing all of the integers stored in the Bitmap in sorted order

func (*Bitmap) ToBase64

func (rb *Bitmap) ToBase64() (string, error)

ToBase64 serializes a bitmap as Base64

func (*Bitmap) WriteTo

func (rb *Bitmap) WriteTo(stream io.Writer) (int64, error)

WriteTo writes a serialized version of this bitmap to stream

func (*Bitmap) Xor

func (rb *Bitmap) Xor(x2 *Bitmap)

Xor computes the symmetric difference between two bitmaps and stores the result in the current bitmap

type IntIterable

type IntIterable interface {
	HasNext() bool
	Next() uint32
}

IntIterable allows you to iterate over the values in a Bitmap

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL