bitindex

package module
v0.0.0-...-04d3b0d Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Nov 10, 2015 License: BSD-2-Clause Imports: 8 Imported by: 0

README

bitindex

Build Status GoDoc

A small library for building and operating on bit indexes. A bit index of composed of a domain and a table. A domain maps a set of uint32 members to bit positions. The table contains a set of sparse bit arrays for each key in the index.

For example, given a set of fruit choices (called the domain):

  • Apples
  • Cherries
  • Peaches
  • Grapes

and a set of people (referred to as keys) with the fruit they enjoy (referred to at the item set):

  • Bob: [Apples, Peaches]
  • Sue: [Grapes]
  • Joe: [Grapes, Cherries, Peaches]

The resulting matrix would look like this:

|Apples|Cherries|Peaches|Grapes ----|------|--------|-------|------ Bob|1|0|1|0 Sue|0|0|0|1 Joe|0|1|1|1

Once the index is built, lookup operations can be performed to find the items that match certain criteria:

  • any - Find all keys that have membership for any item in the lookup set.
  • all - Find all keys that have membership for all items in the lookup set.
  • not any - Find all keys that do not have membership for any item in the lookup set.
  • not all - Find all keys that do not have membership for all items in the lookup set.

Applying these operations to the example:

  • People who enjoy Apples or Cherries (an any operation) would match Bob and Joe.
  • People who enjoy Apples and Peaches (an all operation) would match Bob.
  • People who do not enjoy Peaches or Apples (a not any operation) would match Sue.
  • People who do not enjoy Grapes and Cherries (a not all operation) would match Bob and Sue.

Install

Download a release from the releases page for your architecture.

Source

Clone the repository and run the following to install bitindex in your $GOPATH/bin directory.

$ make install build

Usage

Build an index

To build an index, the input stream must be in one of the supported formats. For this example, we will use a CSV format and assuming the fruit and people have the following IDs.

ID Fruit
1 Apples
2 Cherries
3 Peaches
4 Grapes
ID Person
100 Bob
101 Sue
102 Joe

An index requires use of unsigned 32-bit integers, so we can use the IDs to encoded the labels.

$ cat << EOF > fruit.csv
person,fruit
100,1
100,3
101,4
102,4
102,2
102,3
EOF

To build the index, use the build command. Since we included a CSV header, we add the flag to denote that. The index is written to stdout by default, but we included the --output option to specify a filename.

$ bitindex build --format=csv --csv-header --output=fruit.bitx fruit.csv

This will output the following

Build time: 22.684µs
Statistics
* Domain size: 4
* Table size: 3
* Sparsity: 0

The domain size is equal to the number of fruit and the table size is the number of people.

Interfaces

Command Line

As noted above, the four operations that are supported are any, all, nany, and nall. Any or all of these flags can be passed with a comma-separated list of fruit IDs. Below query the index for the questions asked above.

Apples or Cherries

$ bitindex query --any=1,2 fruit.bitx
100
102

Apples and Peaches

$ bitindex query --all=1,3 fruit.bitx
100

Not (Peaches or Apples)

$ bitindex query --nany=3,1 fruit.bitx
101

Not (Grapes and Cherries)

$ bitindex query --nall=4,2 fruit.bitx
100
101
HTTP

To use the index in a real environment, the bitindex server would be started and RPC requests could be use to query the index.

Start the server by using the serve command.

$ bitindex serve fruit.bitx
Listening on 127.0.0.0:7000...

Domain

curl 127.0.0.1:7000/domain
{
    "complement": false,
    "items": [1, 2, 3, 4]
}

Keys

curl 127.0.0.1:7000/keys
{
    "complement": false,
    "item": [100, 101, 102]
}

Query

curl -X POST 127.0.0.0:7000/query -d '{"any": [1, 2]}'
{
    "complement": false,
    "items": [100, 102]
}

Formats

Currently, the only supported format is CSV.

Deployment

Docker

The Dockerfile defaults to running the HTTP service. The volume is required if the image is used as is.

docker run -v /path/to/data.bitx:/data.bitx -p 7000:7000 -d dbhi/bitindex

An alternate method would be to extend the image and copy the index into the image itself.

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func DumpIndex

func DumpIndex(w io.Writer, idx *Index) error

DumpIndex writes an Index to it binary representation.

Types

type Array

type Array map[uint32]byte

Array represents an array of bytes.

func NewArray

func NewArray() Array

NewArray initializes a new array.

func (Array) All

func (a Array) All(bits ...uint32) bool

All returns true if all of the bits are set.

func (Array) Any

func (a Array) Any(bits ...uint32) bool

Any returns true any of the bits set.

func (Array) Bytes

func (a Array) Bytes() int

Size returns the number of bytes used by the array.

func (Array) Clear

func (a Array) Clear(bit uint32)

Clear sets the bit to 0.

func (Array) Flip

func (a Array) Flip(bit uint32) bool

Flip flips the bit.

func (Array) Has

func (a Array) Has(bit uint32) bool

Has returns true if the bit is set.

func (Array) Loc

func (a Array) Loc(bit uint32) Loc

Loc returns the location of the bit in this array.

func (Array) NotAll

func (a Array) NotAll(bits ...uint32) bool

NotAll returns true if all of the bits are not set.

func (Array) NotAny

func (a Array) NotAny(bits ...uint32) bool

NotAny returns true if any of the bits are not set.

func (Array) Set

func (a Array) Set(bit uint32)

Set sets the bit to 1.

type CSVIndexer

type CSVIndexer struct {
	*csv.Reader

	// If true, the first line will be skipped.
	Header bool

	// A function that takes a CSV row and returns the key and member
	// to be index.
	Parse func([]string) (uint32, uint32, error)
}

CSVIndexer is an indexer for CSV structured data.

func NewCSVIndexer

func NewCSVIndexer(r io.Reader) *CSVIndexer

NewCSVIndexer initializes a new CSV parser for building an index.

func (*CSVIndexer) Index

func (p *CSVIndexer) Index() (*Index, error)

Build implements the Indexer interface and builds an index from a CSV file.

type Domain

type Domain struct {
	// contains filtered or unexported fields
}

Domain maps a member to a position in the bit array.

func LoadDomain

func LoadDomain(r io.Reader) (*Domain, error)

LoadDomain loads only the domain from an io.Reader.

func NewDomain

func NewDomain(ms []uint32) *Domain

func (*Domain) Add

func (d *Domain) Add(m uint32) uint32

Add adds a member to the domain.

func (*Domain) Bit

func (d *Domain) Bit(m uint32) uint32

Bit returns the bit of the member.

func (*Domain) Bytes

func (d *Domain) Bytes() int

Bytes returns the number of bytes required for the domain.

func (*Domain) Mask

func (d *Domain) Mask(ms ...uint32) ([]uint32, error)

Mask builds a bitmask for a subset of members in the domain.

func (*Domain) Member

func (d *Domain) Member(b uint32) uint32

Member returns the member for the bit.

func (*Domain) Members

func (d *Domain) Members() []uint32

Members returns the members in the domain.

func (*Domain) Size

func (d *Domain) Size() int

Size returns the size of the domain, which is also the number of bits.

type Index

type Index struct {
	Domain *Domain
	Table  Table
}

Index combines and domain

func LoadIndex

func LoadIndex(r io.Reader) (*Index, error)

LoadIndex loads the index from an io.Reader.

func NewIndex

func NewIndex(d []uint32) *Index

NewIndex initializes a new index.

func (*Index) Add

func (ix *Index) Add(k uint32, m uint32)

Add adds sets the bit for key `k` for member `m` in the domain.

func (*Index) All

func (ix *Index) All(ms ...uint32) ([]uint32, error)

All returns all keys that match all of the passed members.

func (*Index) Any

func (ix *Index) Any(ms ...uint32) ([]uint32, error)

Any returns all keys that match any of the passed members.

func (*Index) Has

func (ix *Index) Has(k uint32, m uint32) bool

Has returns true if the key has the member.

func (*Index) NotAll

func (ix *Index) NotAll(ms ...uint32) ([]uint32, error)

NotAll returns all keys that do not match all of the passed members.

func (*Index) NotAny

func (ix *Index) NotAny(ms ...uint32) ([]uint32, error)

NotAny returns all keys that do not match any of the passed members.

func (*Index) Query

func (ix *Index) Query(any, all, nany, nall []uint32) (*Result, error)

func (*Index) Sparsity

func (ix *Index) Sparsity() float32

Sparsity returns the proportion of bits being represented in the domain to the bytes being allocated in the index.

type Indexer

type Indexer interface {
	Index() (*Index, error)
}

Indexer is an interface for defines a method for building an index from a source.

type Loc

type Loc [2]uint32

Loc is the location of the bit in array consisting of the byte position and the relative bit.

type Result

type Result struct {
	// contains filtered or unexported fields
}

func (*Result) Complement

func (r *Result) Complement() []uint32

func (*Result) Items

func (r *Result) Items() []uint32

func (*Result) Len

func (r *Result) Len() int

func (*Result) Smallest

func (r *Result) Smallest(thres float32) bool

type Table

type Table map[uint32]Array

Table is an map of keys to bit arrays.

func (Table) Bytes

func (t Table) Bytes() int

Bytes returns the number of bytes allocated in the table.

func (Table) Get

func (t Table) Get(k uint32) Array

Get gets the bit array for the key.

func (Table) Keys

func (t Table) Keys() []uint32

Keys returns all keys in the table.

func (Table) Set

func (t Table) Set(k uint32, b uint32)

Set sets the bit for key `k`. The array for `k` is created if it does not exist.

func (Table) Size

func (t Table) Size() int

Size returns the number of items in the table.

type Uint32Array

type Uint32Array []uint32

func (Uint32Array) Len

func (a Uint32Array) Len() int

func (Uint32Array) Less

func (a Uint32Array) Less(i, j int) bool

func (Uint32Array) Swap

func (a Uint32Array) Swap(i, j int)

type Uint32Set

type Uint32Set map[uint32]struct{}

func (Uint32Set) Add

func (s Uint32Set) Add(is ...uint32)

func (Uint32Set) Clear

func (s Uint32Set) Clear()

func (Uint32Set) Contains

func (s Uint32Set) Contains(i uint32) bool

func (Uint32Set) Intersect

func (s Uint32Set) Intersect(b Uint32Set) Uint32Set

func (Uint32Set) Items

func (s Uint32Set) Items() []uint32

func (Uint32Set) Len

func (s Uint32Set) Len() int

func (Uint32Set) Remove

func (s Uint32Set) Remove(is ...uint32)

Directories

Path Synopsis
cmd

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL