offheap

package module
v0.0.0-...-0a843ca Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jul 27, 2022 License: MIT Imports: 5 Imported by: 0

README

offheap (formerly go-offheap-hashtable)

TODO update .. doesnt match the desc anymore

Docs:

http://godoc.org/github.com/glycerine/offheap

When GC pauses are long because you've got big hash tables in Go, you need an off-heap hash-table.

The purpose here is to have a hash table that can work away from Go's Garbage Collector, to avoid long GC pause times.

We accomplish this by writing our own Malloc() and Free() implementation (see malloc.go) which requests memory directly from the OS. The keys, the values, and the entire hash table itself is kept in this off-heap storage. This storage can also optionally be backed by a memory mapped file for speedy persistence and fast startup times.

Update: the cgo-malloc branch of this github repo has an implementation that uses CGO to call the malloc/calloc/free functions in the C stdlib. Using CGO gives up the save-to-disk instantly feature and creates a portability issue where you have linked against a specific version of the C stdlib. However if you are making/destroying alot of tables, the CGO apporach may be faster.

See offheap.go for all the interesting code. Modify val_t to hold you values, and key_t to contain your keys. Current sample code for three types of keys (int64, []byte, and strings) is provided (see bytekey.go). For the hashing function itself, the incredibly fast xxhash64 is used to produce uint64 hashes of strings and []byte.

Note that all your key and values should be inline in the Cell. If you point back into the go-heap, such values maybe garbage collected by the Go runtime without notice.

Initial HashTable implementation inspired by the public domain C++ code of

https://github.com/preshing/CompareIntegerMaps

See also

http://preshing.com/20130107/this-hash-table-is-faster-than-a-judy-array/

for performance studies of the C++ code.

installation

 go get -t github.com/glycerine/go-offheap-hashtable

testing

go test -v

The implementation was test driven and includes over 4500 correctness checks.

Copyright (C) 2015 by Jason E. Aten, Ph.D.

License: MIT.

Documentation

Overview

TODO : update

Offheap

An off-heap hash-table for Go (golang). Originally called go-offheap-hashtable, but now shortened to just offheap.

The purpose here is to have a hash table that can work away from Go's Garbage Collector, to avoid long GC pause times.

We accomplish this by writing our own Malloc() and Free() implementation (see malloc.go) which requests memory directly from the OS.

The keys, values, and entire hash table is kept on off-heap storage. This storage can also optionally be backed by memory mapped file for speedy persistence and fast startup times.

Initial HashTable implementation inspired by the public domain C++ code of

https://github.com/preshing/CompareIntegerMaps

See also

http://preshing.com/20130107/this-hash-table-is-faster-than-a-judy-array/

for performance studies of the C++ code.

HashTable

The implementation is mostly in offheap.go, read that to start.

Maps pointer-sized integers to Cell structures, which in turn hold Val_t as well as Key_t structures.

Uses open addressing with linear probing. This makes it very cache friendly and thus very fast.

In the t.Cells array, UnHashedKey = 0 is reserved to indicate an unused cell. Actual value for key 0 (if any) is stored in t.ZeroCell. The hash table automatically doubles in size when it becomes 75% full. The hash table never shrinks in size, even after Clear(), unless you explicitly call Compact().

Basic operations: Lookup(), Insert(), DeleteKey(). These are the equivalent of the builtin map[uint64]interface{}.

As an example of how to specialize for a map[string]*Cell equivalent, see the following functions in the bytekey.go file:

func (t *HashTable) InsertStringKey(strkey string, value interface{}) bool
func (t *HashTable) LookupStringKey(strkey string) (Val_t, bool)
func (t *HashTable) DeleteStringKey(strkey string) bool

Example use:

h := offheap.NewHashTable(2)

// basic three operations are:
h.InsertStringKey("My number", 43)
val, ok := h.LookupStringKey("My number")
h.DeleteStringKey("My number")

Note that this library is only a starting point of source code, and not intended to be used without customization. Users of the HashTable will have to customize it by changing the definitions of Key_t and Val_t to suite their needs. I'm experimenting next with storing objects in Capnproto serialized format, but this branch (branch capnp) isn't quite ready for use.

Related ideas:

https://gist.github.com/mish15/9822474 (using CGO)

https://groups.google.com/forum/#!topic/golang-nuts/kCQP6S6ZGh0

not fully off-heap, but using a slice instead of a map appears to help GC quite alot too:

https://github.com/cespare/kvcache/blob/master/refmap.go

Index

Constants

View Source
const MAGIC_NUMBERInt = 0x123456789ABCDEF

Variables

This section is empty.

Functions

This section is empty.

Types

type CellInt

type CellInt struct {
	Value int
	// contains filtered or unexported fields
}

func (*CellInt) ZeroValue

func (cell *CellInt) ZeroValue()

ZeroValue sets the cell's value to all zeros.

type CustomMetadataIntTest

type CustomMetadataIntTest struct {
	A int
	B float64
	C [10]byte
}

type EmptyStructInt

type EmptyStructInt struct{}

type HashTableCustomMetadataInt

type HashTableCustomMetadataInt struct {
	CustomMetadataIntTest
}

type HashTableInt

type HashTableInt struct {
	*HashTableMetadataInt
	*HashTableCustomMetadataInt
	// contains filtered or unexported fields
}

func NewHashTableInt

func NewHashTableInt(initialSize uint64) *HashTableInt

Create a new hash table, able to hold initialSize count of keys.

func NewHashTableIntFileBacked

func NewHashTableIntFileBacked(initialSize uint64, filepath string) (*HashTableInt, error)

func OpenHashTableIntFileBacked

func OpenHashTableIntFileBacked(filepath string) (*HashTableInt, error)

func (*HashTableInt) Bytes

func (t *HashTableInt) Bytes() []byte

func (*HashTableInt) Clear

func (t *HashTableInt) Clear()

Clear does not resize the table, but zeroes-out all entries and the custom metadata

func (*HashTableInt) Compact

func (t *HashTableInt) Compact()

Compact will compress the hashtable so that it is at most 75% full.

func (*HashTableInt) DeleteCellInt

func (t *HashTableInt) DeleteCellInt(cell *CellInt)

DeleteCellInt deletes the cell pointed to by cell.

func (*HashTableInt) DeleteKey

func (t *HashTableInt) DeleteKey(key uint64)

DeleteKey will delete the contents of the cell associated with key.

func (*HashTableInt) DestroyHashTable

func (t *HashTableInt) DestroyHashTable()

DestroyHashTable frees the memory-mapping, returning the memory containing the hash table and its cells to the OS. By default the save-to-file-on-disk functionality in malloc.go is not used, but that can be easily activated. See malloc.go. Deferencing any cells/pointers into the hash table after destruction will result in crashing your process, almost surely.

func (*HashTableInt) Insert

func (t *HashTableInt) Insert(key uint64) (*CellInt, bool)

func (*HashTableInt) InsertInt

func (h *HashTableInt) InsertInt(k uint64, v int)

func (*HashTableInt) Lookup

func (t *HashTableInt) Lookup(key uint64) *CellInt

Lookup a cell based on a uint64 key value. Returns nil if key not found.

func (*HashTableInt) LookupInt

func (h *HashTableInt) LookupInt(key uint64) (int, bool)

func (*HashTableInt) NewIterator

func (h *HashTableInt) NewIterator() *IntIterator

NewIterator creates a new iterator for HashTable tab.

func (*HashTableInt) Repopulate

func (t *HashTableInt) Repopulate(desiredSize uint64)

Repopulate expands the hashtable to the desiredSize count of cells.

func (*HashTableInt) Save

func (t *HashTableInt) Save()

Save syncs the memory mapped file to disk using MmapMalloc::BlockUntilSync()

func (*HashTableInt) UpdateChecksum

func (t *HashTableInt) UpdateChecksum()

type HashTableMetadataInt

type HashTableMetadataInt struct {
	MagicNumber uint64
	ArraySize   uint64
	Population  uint64
	Checksum    uint32
}

type IntIterator

type IntIterator struct {
	H   *HashTableInt
	Pos int64
	Cur *CellInt
}

IntIterator

sample use: given a HashTable h, enumerate h's contents with:

for it := offheap.NewIterator(h); it.Cur != nil; it.Next() {
  found = append(found, it.Cur.unHashedKey)
}

func (*IntIterator) Done

func (it *IntIterator) Done() bool

Done checks to see if we have already iterated through all cells in the table. Equivalent to checking it.Cur == nil.

func (*IntIterator) Next

func (it *IntIterator) Next() *CellInt

Next advances the iterator so that it.Cur points to the next filled cell in the table, and returns that cell. Returns nil once there are no more cells to be visited.

Directories

Path Synopsis

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL