bitmap

package module
v1.5.2 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Nov 6, 2023 License: MIT Imports: 12 Imported by: 19

README

kelindar/bitmap
Go Version PkgGoDev Go Report Card License Coverage

SIMD-Vectorized Bitmap (Bitset) in Go

This package contains a bitmap implementation, backed by a slice of []uint64 and designed for dense small or medium collections. This implementation focuses on high performance by avoiding heap allocations, unrolling loops and implementing SIMD vectorization in assembly.

Features

  • Optimized for zero heap allocation for all of the important methods of the bitmap.
  • Optimized by vectorized instructions (SIMD) used for certain operations such as boolean algebra.
  • Support for boolean algebra that makes it perfect to implement bitmap indexes.
  • Support for bit counting with operations such Min(), Max(), Count() and more.
  • Support for fast iteration over bits set to one by using an unrolled loop.
  • Support for in-place filtering based on a user-defined predicate.
  • Support for binary encoding and can be read/written and has a no-copy slice conversion.
  • Support for reusability by providing Clone() and Clear() operations.

Documentation

The general idea of this package is to have a dead simple way of creating bitmaps (bitsets) that provide maximum performance on the modern hardware by using vectorized single-instruction multiple data (SIMD) operations. As opposed to something as roaring bitmaps which are excellent for sparse data, this implementation is designed to be used for small or medium dense bit sets. I've used this package to build a columnar in-memory store, so if you want to see how it can be used for indexing, have a look at kelindar/column. I'd like to specifically point out the indexing part and how bitmaps can be used as a good alternative to B*Trees and Hash Maps.

First, here's what you need to do in order to import this package.

import "github.com/kelindar/bitmap"

Boolean Algebra

Perhaps one of the most useful features of this package is the vectorized implementation of boolean operations allowing us to perform boolean algebra on multiple bitmaps. For example, let's imagine that we have a dataset containing books, and four bitmaps defining one of the four properties of each book. In the figure below, you can imagine that our books can be on "columns" and each bit in a bitmap defines whether this attribute exists on a book or not.

kelindar/bitmap

Now, if we want to find all books that were recently published and have an ebook available, we can use an And() method on our two bitmaps in order to combine them. In the example below we retrieve 3 hypothetical bitmaps and combine them to answer our query by calling and And() method to mutate the books bitmap twice.

books  := bitmapFor("books")           // bitmap.Bitmap
recent := bitmapFor("books_recent")    // bitmap.Bitmap
ebooks := bitmapFor("books_has_ebook") // bitmap.Bitmap

// And operation actually mutates our "books" bitmap
books.And(recent)
books.And(ebooks)

kelindar/bitmap

Now, what if we want to find recently published books which has e-book available but are not best-sellers? In that case, we could use binary AndNot() operation that hardware exposes. In the example below we combine

books.And(recent)
books.And(ebooks)
books.AndNot(bestsellers) 

kelindar/bitmap

Single Bit Operations

When dealing with single elements, this package supports simple single-bit operations. They include Set() and Remove() to set a bit to one and to zero respectively, as well as Contans() to check for a presence (value set to one) of a certain bit. These methods are simple to use and setting a bit which is out of range would automatically resize the bitmap.

In the example below we're creating a bitmap, setting one bit to one, checking its presence and setting it back to zero after.

var books bitmap.Bitmap

books.Set(3)                 // Set the 3rd bit to '1'
hasBook := books.Contains(3) // Returns 'true'
books.Remove(3)              // Set the 3rd bit to '0'

When using a bitmap for indexing or free-list purposes, you will often find yourself in need of counting how many bits are set in a bitmap. This operation actually has a specialized hardware instruction POPCNT and an efficient implementation is included in this library. The example below shows how you can simply count the number of bits in a bitmap by calling the Count() method.

// Counts number of bits set to '1'
numBooks := books.Count()

On the other hand, you might want to find a specific bit either set to one or to zero, the methods Min(), Max() allow you to find first or last bit set to one while MinZero() and MaxZero() allow you to find first or last bit set to zero. The figure below demonstrates an example of that.

kelindar/bitmap

Iterate and Filter

The bits in the bitmap can also be iterated over using the Range method. It is a simple loop which iterates over and calls a callback. If the callback returns false, then the iteration is halted (similar to sync.Map).

// Iterate over the bits in the bitmap
bitmap.Range(func(x uint32) bool {
    println(x)
    return true
})

Another way of iterating is using the Filter method. It iterates similarly to Range but the callback returns a boolean value, and if it returns false then the current bit will be cleared in the underlying bitmap. You could accomplish the same using Range and Remove but Filter is significantly faster.

// Filter iterates over the bits and applies a callback
bitmap.Filter(func(x uint32) bool {
    return x % 2 == 0
})

Example Usage

In its simplest form, you can use the bitmap as a bitset, set and remove bits. This is quite useful as an index (free/fill-list) for an array of data.

import "github.com/kelindar/bitmap"
var books := bitmap.Bitmap
books.Set(300)      // sets 300-th bit
books.Set(400)      // sets 400-th bit
books.Set(600)      // sets 600-th bit (auto-resized)
books.Contains(300) // returns true
books.Contains(301) // returns false
books.Remove(400)   // clears 400-th bit

// Min, Max, Count
min, ok := books.Min() // returns 300
max, ok := books.Max() // returns 600
count := books.Count() // returns 2

// Boolean algebra
var other bitmap.Bitmap
other.Set(300)
books.And(other)      // Intersection
count = books.Count() // Now returns 1

Benchmarks

Benchmarks below were run on a pre-allocated bitmap of 100,000 elements containing with around 50% bits set to one.

cpu: Intel(R) Core(TM) i7-9700K CPU @ 3.60GHz
BenchmarkBitmap/set-8         552331321    4.319 ns/op    0 B/op    0 allocs/op
BenchmarkBitmap/remove-8     1000000000    1.621 ns/op    0 B/op    0 allocs/op
BenchmarkBitmap/contains-8   1000000000    1.309 ns/op    0 B/op    0 allocs/op
BenchmarkBitmap/clear-8        26083383    90.45 ns/op    0 B/op    0 allocs/op
BenchmarkBitmap/ones-8          6751939    347.9 ns/op    0 B/op    0 allocs/op
BenchmarkBitmap/min-8         757831477    3.137 ns/op    0 B/op    0 allocs/op
BenchmarkBitmap/max-8        1000000000    1.960 ns/op    0 B/op    0 allocs/op
BenchmarkBitmap/min-zero-8    776620110    3.081 ns/op    0 B/op    0 allocs/op
BenchmarkBitmap/max-zero-8   1000000000    1.536 ns/op    0 B/op    0 allocs/op
BenchmarkBitmap/count-8         6071037    382.5 ns/op    0 B/op    0 allocs/op
BenchmarkBitmap/count-to-8     82777459    28.85 ns/op    0 B/op    0 allocs/op
BenchmarkBitmap/clone-8        20654008    111.5 ns/op    0 B/op    0 allocs/op
BenchmarkBitmap/and-8          16813963    143.6 ns/op    0 B/op    0 allocs/op
BenchmarkBitmap/andnot-8       16961106    141.9 ns/op    0 B/op    0 allocs/op
BenchmarkBitmap/or-8           16999562    141.7 ns/op    0 B/op    0 allocs/op
BenchmarkBitmap/xor-8          16954036    144.7 ns/op    0 B/op    0 allocs/op
BenchmarkRange/range-8            18225   131908 ns/op    0 B/op    0 allocs/op
BenchmarkRange/filter-8           25636    93630 ns/op    0 B/op    0 allocs/op

Contributing

We are open to contributions, feel free to submit a pull request and we'll review it as quickly as we can. This library is maintained by Roman Atachiants

License

Tile is licensed under the MIT License.

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func Max added in v1.4.0

func Max[T simd.Number](src []T, filter Bitmap) (max T, hit bool)

Max finds the largest value in a slice, filtered by the provided bitmap

func Min added in v1.4.0

func Min[T simd.Number](src []T, filter Bitmap) (min T, hit bool)

Min finds the smallest value in a slice, filtered by the provided bitmap

func Sum added in v1.4.0

func Sum[T simd.Number](src []T, filter Bitmap) (sum T)

Sum computes a horizontal sum of a slice, filtered by the provided bitmap

Types

type Bitmap

type Bitmap []uint64

Bitmap represents a scalar-backed bitmap index

func FromBytes

func FromBytes(buffer []byte) (out Bitmap)

FromBytes reads a bitmap from a byte buffer without copying the buffer.

func ReadFrom

func ReadFrom(r io.Reader) (Bitmap, error)

ReadFrom reads the bitmap from the reader.

func (*Bitmap) And

func (dst *Bitmap) And(other Bitmap, extra ...Bitmap)

And computes the intersection between two bitmaps and stores the result in the current bitmap

func (*Bitmap) AndNot

func (dst *Bitmap) AndNot(other Bitmap, extra ...Bitmap)

AndNot computes the difference between two bitmaps and stores the result in the current bitmap. Operation works as set subtract: dst - b

func (*Bitmap) Clear

func (dst *Bitmap) Clear()

Clear clears the bitmap and resizes it to zero.

func (Bitmap) Clone

func (dst Bitmap) Clone(into *Bitmap) Bitmap

Clone clones the bitmap. If a destination bitmap is provided, the bitmap will be cloned inside, otherwise a new Bitmap will be allocated and returned

func (Bitmap) Contains

func (dst Bitmap) Contains(x uint32) bool

Contains checks whether a value is contained in the bitmap or not.

func (Bitmap) Count

func (dst Bitmap) Count() int

Count returns the number of elements in this bitmap

func (Bitmap) CountTo added in v1.0.4

func (dst Bitmap) CountTo(until uint32) int

CountTo counts the number of elements in the bitmap up until the specified index. If until is math.MaxUint32, it will return the count. The count is non-inclusive of the index.

func (*Bitmap) Filter

func (dst *Bitmap) Filter(f func(x uint32) bool)

Filter iterates over the bitmap elements and calls a predicate provided for each containing element. If the predicate returns false, the bitmap at the element's position is set to zero.

func (*Bitmap) Grow added in v1.0.7

func (dst *Bitmap) Grow(desiredBit uint32)

Grow grows the bitmap size until we reach the desired bit.

func (Bitmap) MarshalJSON added in v1.2.1

func (dst Bitmap) MarshalJSON() ([]byte, error)

MarshalJSON returns encoded string representation for the bitmap

func (Bitmap) Max

func (dst Bitmap) Max() (uint32, bool)

Max get the largest value stored in this bitmap, assuming the bitmap is not empty.

func (Bitmap) MaxZero added in v1.0.10

func (dst Bitmap) MaxZero() (uint32, bool)

MaxZero get the last zero bit and return its index, assuming bitmap is not empty

func (Bitmap) Min

func (dst Bitmap) Min() (uint32, bool)

Min get the smallest value stored in this bitmap, assuming the bitmap is not empty.

func (Bitmap) MinZero added in v1.0.10

func (dst Bitmap) MinZero() (uint32, bool)

MinZero finds the first zero bit and returns its index, assuming the bitmap is not empty.

func (Bitmap) Ones

func (dst Bitmap) Ones()

Ones sets the entire bitmap to one.

func (*Bitmap) Or

func (dst *Bitmap) Or(other Bitmap, extra ...Bitmap)

Or computes the union between two bitmaps and stores the result in the current bitmap

func (Bitmap) Range

func (dst Bitmap) Range(fn func(x uint32))

Range iterates over all of the bits set to one in this bitmap.

func (*Bitmap) ReadFrom added in v1.1.4

func (dst *Bitmap) ReadFrom(r io.Reader) (int64, error)

ReadFrom reads data from r until EOF or error. The return value n is the number of bytes read. Any error except EOF encountered during the read is also returned.

func (*Bitmap) Remove

func (dst *Bitmap) Remove(x uint32)

Remove removes the bit x from the bitmap, but does not shrink it.

func (*Bitmap) Set

func (dst *Bitmap) Set(x uint32)

Set sets the bit x in the bitmap and grows it if necessary.

func (*Bitmap) ToBytes

func (dst *Bitmap) ToBytes() (out []byte)

ToBytes converts the bitmap to binary representation without copying the underlying data. The output buffer should not be modified, since it would also change the bitmap.

func (*Bitmap) UnmarshalJSON added in v1.2.1

func (dst *Bitmap) UnmarshalJSON(data []byte) (err error)

UnmarshalJSON decodes the received bytes and loads it to bitmap object

func (*Bitmap) WriteTo

func (dst *Bitmap) WriteTo(w io.Writer) (int64, error)

WriteTo writes the bitmap to a specified writer.

func (*Bitmap) Xor

func (dst *Bitmap) Xor(other Bitmap, extra ...Bitmap)

Xor computes the symmetric difference between two bitmaps and stores the result in the current bitmap

Directories

Path Synopsis

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL