bbhash

package module
v0.0.0-...-efb52d0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Oct 25, 2023 License: MIT Imports: 12 Imported by: 0

README

BBHash

BBHash is a fast Go implementation of a minimal perfect hash function for large key sets.

Installing the package for use in your own project

% go get github.com/relab/bbhash

How to use the package

import (
	"fmt"

	"github.com/relab/bbhash"
)

func ExampleBBHash_Find() {
	keys := []uint64{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
	bb, err := bbhash.NewSequential(1.5, 0, keys)
	if err != nil {
		panic(err)
	}
	for _, key := range keys {
		hashIndex := bb.Find(key)
		fmt.Printf("%d, ", hashIndex)
	}
	fmt.Println()
	// Output:
	// 2, 6, 8, 3, 5, 7, 1, 9, 10, 4,
}

Credits

Implemented by Hein Meling.

The implementation is mainly inspired by Sudhi Herle's Go implementation. Damian Gryski also has a Go implementation.

The algorithm is described in the paper: Fast and Scalable Minimal Perfect Hashing for Massive Key Sets Antoine Limasset, Guillaume Rizk, Rayan Chikhi, and Pierre Peterlongo. Their C++ implementation is available here.

Documentation

Overview

Package bbhash implements the BBHash algorithm for minimal perfect hash functions.

Index

Examples

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type BBHash

type BBHash struct {
	// contains filtered or unexported fields
}

BBHash represents a minimal perfect hash for a set of keys.

func NewParallel

func NewParallel(gamma float64, keys []uint64) (*BBHash, error)

NewParallel creates a new BBHash for the given keys. The keys must be unique. This creates the BBHash using multiple goroutines. The gamma parameter is the expansion factor for the bit vector; the paper recommends a value of 2.0. The larger the value the more memory will be consumed by the BBHash.

func NewSequential

func NewSequential(gamma float64, keys []uint64) (*BBHash, error)

NewSequential creates a new BBHash for the given keys. The keys must be unique. This creates the BBHash in a single goroutine. The gamma parameter is the expansion factor for the bit vector; the paper recommends a value of 2.0. The larger the value the more memory will be consumed by the BBHash.

func NewSequentialWithKeymap

func NewSequentialWithKeymap(gamma float64, keys []uint64) (*BBHash, []uint64, error)

NewSequentialWithKeymap is similar to NewSequential, but in addition returns the reverse map.

func (BBHash) BitVectors

func (bb BBHash) BitVectors() string

BitVectors returns a Go slice for BBHash's per-level bit vectors. This is intended for testing and debugging; no guarantees are made about the format.

func (BBHash) BitsPerKey

func (bb BBHash) BitsPerKey() float64

BitsPerKey returns the number of bits per key in the minimal perfect hash.

func (*BBHash) Find

func (bb *BBHash) Find(key uint64) uint64

Find returns a unique index representing the key in the minimal hash set.

The return value is meaningful ONLY for keys in the original key set (provided at the time of construction of the minimal hash set).

If the key is in the original key set, the return value is guaranteed to be in the range [1, len(keys)].

If the key is not in the original key set, two things can happen: 1. The return value is 0, representing that the key was not in the original key set. 2. The return value is in the expected range [1, len(keys)], but is a false positive.

Example
package main

import (
	"fmt"

	"github.com/relab/bbhash"
)

func main() {
	keys := []uint64{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
	bb, err := bbhash.NewSequential(1.5, keys)
	if err != nil {
		panic(err)
	}
	for _, key := range keys {
		hashIndex := bb.Find(key)
		fmt.Printf("%d, ", hashIndex)
	}
	fmt.Println()
}
Output:

2, 6, 1, 4, 9, 3, 8, 5, 10, 7,

func (BBHash) LevelVectors

func (bb BBHash) LevelVectors() [][]uint64

LevelVectors returns a slice representation of the BBHash's per-level bit vectors.

func (BBHash) Levels

func (bb BBHash) Levels() int

Levels returns the number of Levels in the minimal perfect hash.

func (BBHash) MarshalBinary

func (bb BBHash) MarshalBinary() ([]byte, error)

MarshalBinary implements the encoding.BinaryMarshaler interface.

func (*BBHash) ReverseMap

func (bb *BBHash) ReverseMap(dbKeys [][]byte, mphfKeys []uint64) ReverseMap

ReverseMap returns a reverse map from the proof's MPHF index to the chunk's content address (or database key).

func (BBHash) String

func (bb BBHash) String() string

String returns a string representation of BBHash with overall and per-level statistics.

func (*BBHash) UnmarshalBinary

func (bb *BBHash) UnmarshalBinary(data []byte) error

UnmarshalBinary implements the encoding.BinaryUnmarshaler interface.

type BBHash2

type BBHash2 struct {
	// contains filtered or unexported fields
}

func New

func New(gamma float64, partitions int, keys []uint64) (*BBHash2, error)

New creates a new BBHash2 for the given keys. The keys must be unique. If partitions is 1 or less, then a single BBHash is created, wrapped in a BBHash2. Otherwise, the keys are partitioned into the given the number partitions, and multiple BBHashes are created in parallel.

The gamma parameter is the expansion factor for the bit vector; the paper recommends a value of 2.0. The larger the value the more memory will be consumed by the BBHash.

func NewParallel2

func NewParallel2(gamma float64, numPartitions int, keys []uint64) (*BBHash2, error)

NewParallel2 creates a new BBHash2 for the given keys. The keys must be unique. For small key sets, you may want to use NewSequential instead, since it will likely be faster. NewParallel2 allocates more memory than NewSequential, but will be faster for large key sets.

This partitions the input and creates multiple BBHashes using multiple goroutines. The gamma parameter is the expansion factor for the bit vector; the paper recommends a value of 2.0. The larger the value the more memory will be consumed by the BBHash.

func (BBHash2) BitsPerKey

func (bb BBHash2) BitsPerKey() float64

func (*BBHash2) Find

func (bb *BBHash2) Find(key uint64) uint64

Find returns a unique index representing the key in the minimal hash set.

The return value is meaningful ONLY for keys in the original key set (provided at the time of construction of the minimal hash set).

If the key is in the original key set, the return value is guaranteed to be in the range [1, len(keys)].

If the key is not in the original key set, two things can happen: 1. The return value is 0, representing that the key was not in the original key set. 2. The return value is in the expected range [1, len(keys)], but is a false positive.

func (BBHash2) MaxMinLevels

func (bb BBHash2) MaxMinLevels() (max, min int)

func (BBHash2) String

func (bb BBHash2) String() string

type ReverseMap

type ReverseMap [][]byte

func (ReverseMap) Lookup

func (rm ReverseMap) Lookup(index uint64) []byte

Lookup returns the chunk's content address (or database key) for the given MPHF index.

Directories

Path Synopsis
cmd

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL