offheap: github.com/glycerine/offheap Index | Files | Directories

package offheap

import "github.com/glycerine/offheap"

Offheap

An off-heap hash-table for Go (golang). Originally called go-offheap-hashtable, but now shortened to just offheap.

The purpose here is to have a hash table that can work away from Go's Garbage Collector, to avoid long GC pause times.

We accomplish this by writing our own Malloc() and Free() implementation (see malloc.go) which requests memory directly from the OS.

The keys, values, and entire hash table is kept on off-heap storage. This storage can also optionally be backed by memory mapped file for speedy persistence and fast startup times.

Initial HashTable implementation inspired by the public domain C++ code of

https://github.com/preshing/CompareIntegerMaps

See also

http://preshing.com/20130107/this-hash-table-is-faster-than-a-judy-array/

for performance studies of the C++ code.

HashTable

The implementation is mostly in offheap.go, read that to start.

Maps pointer-sized integers to Cell structures, which in turn hold Val_t as well as Key_t structures.

Uses open addressing with linear probing. This makes it very cache friendly and thus very fast.

In the t.Cells array, UnHashedKey = 0 is reserved to indicate an unused cell. Actual value for key 0 (if any) is stored in t.ZeroCell. The hash table automatically doubles in size when it becomes 75% full. The hash table never shrinks in size, even after Clear(), unless you explicitly call Compact().

Basic operations: Lookup(), Insert(), DeleteKey(). These are the equivalent of the builtin map[uint64]interface{}.

As an example of how to specialize for a map[string]*Cell equivalent, see the following functions in the bytekey.go file:

func (t *HashTable) InsertStringKey(strkey string, value interface{}) bool
func (t *HashTable) LookupStringKey(strkey string) (Val_t, bool)
func (t *HashTable) DeleteStringKey(strkey string) bool

Example use:

h := offheap.NewHashTable(2)

// basic three operations are:
h.InsertStringKey("My number", 43)
val, ok := h.LookupStringKey("My number")
h.DeleteStringKey("My number")

Note that this library is only a starting point of source code, and not intended to be used without customization. Users of the HashTable will have to customize it by changing the definitions of Key_t and Val_t to suite their needs.

On Save(), serialization of the HashTable itself is done using msgpack to write bytes to the first page (4k bytes) of the memory mapped file. This uses github.com/tinylib/msgp which is a blazing fast msgpack serialization library. It is fast because it avoids reflection and pre-computes the serializations (using go generate based inspection of your go source). If you need to serialize your values into the Val_t, I would suggest evaluating the msgp for serialization and deserialization. The author, Philip Hofer, has done a terrific job and put alot of effort into tuning it for performance. If you are still pressed for speed, consider also ommitting the field labels using the '//msgp:tuple MyValueType' annotation. As Mr. Hofer says, "For smaller objects, tuple encoding can yield serious performance improvements." [https://github.com/tinylib/msgp/wiki/Preprocessor-Directives].

Related ideas:

https://gist.github.com/mish15/9822474 (using CGO)

CGO note: the cgo-malloc branch of this github repo has an implementation that uses CGO to call the malloc/calloc/free functions in the C stdlib. Using CGO gives up the save-to-disk instantly feature and creates a portability issue where you have linked against a specific version of the C stdlib. However if you are making/destroying alot of tables, the CGO apporach may be faster. This is because calling malloc and free in the standard C library are much faster than making repeated system calls to mmap().

more related ideas:

https://groups.google.com/forum/#!topic/golang-nuts/kCQP6S6ZGh0

not fully off-heap, but using a slice instead of a map appears to help GC quite alot too:

https://github.com/cespare/kvcache/blob/master/refmap.go

Index

Package Files

bytekey.go doc.go exists.go malloc.go offheap.go offheap_gen.go util.go verbose.go

Constants

const MetadataHeaderMaxBytes = 4096

metadata serialization size can never grow bigger than MetadataHeaderMaxBytes (currently one 4K page), without impacting backwards compatibility. We reserve this many bytes at the beginning of the memory mapped file for the metadata.

type ByteKeyHashTable Uses

type ByteKeyHashTable HashTable

ByteKeyHashTable shows how to specialize HashTable to handle the []byte type as a key. At present, given the current definition of Key_t (you should redefine Key_t if needed -- see offheap.go), only the first 64 bytes are used in the hash of the key.

func NewByteKeyHashTable Uses

func NewByteKeyHashTable(initialSize int64) *ByteKeyHashTable

NewByteKeyHashTable produces a new ByteKeyHashTable, one specialized for handling []byte as keys.

func (*ByteKeyHashTable) DeleteBK Uses

func (t *ByteKeyHashTable) DeleteBK(bytekey []byte) bool

DeleteBK removes an entry with a []byte key. By default only len(Key_t) bytes are used in the key.

func (*ByteKeyHashTable) InsertBK Uses

func (t *ByteKeyHashTable) InsertBK(bytekey []byte, value interface{}) bool

InsertBK is the insert function for []byte keys. By default only len(Key_t) bytes are used in the key.

func (*ByteKeyHashTable) LookupBK Uses

func (t *ByteKeyHashTable) LookupBK(bytekey []byte) (Val_t, bool)

LookupBK is the lookup function for []byte keys. By default only len(Key_t) bytes are used in the key.

type Cell Uses

type Cell struct {
    UnHashedKey uint64
    ByteKey     Key_t
    Value       Val_t // customize this to hold your value's data type entirely here.
}

Cell is the basic payload struct, stored inline in the HashTable. The cell is returned by the fundamental Lookup() function. The member Value is where the value that corresponds to the key (in ByteKey) is stored. Both the key (in ByteKey) and the value (in Value) are stored inline inside the hashtable, so that all storage for the hashtable is in the same offheap segment. The uint64 key given to fundamental Insert() method is stored in UnHashedKey. The hashed value of the UnHashedKey is not stored in the Cell, but rather computed as needed by the basic Insert() and Lookup() methods.

func (*Cell) DecodeMsg Uses

func (z *Cell) DecodeMsg(dc *msgp.Reader) (err error)

DecodeMsg implements msgp.Decodable

func (*Cell) EncodeMsg Uses

func (z *Cell) EncodeMsg(en *msgp.Writer) (err error)

EncodeMsg implements msgp.Encodable

func (*Cell) GetInt Uses

func (cell *Cell) GetInt() int

GetInt retreives an integer value from the cell.

func (*Cell) GetString Uses

func (cell *Cell) GetString() string

GetString retreives a string value from the cell.Value.

func (*Cell) MarshalMsg Uses

func (z *Cell) MarshalMsg(b []byte) (o []byte, err error)

MarshalMsg implements msgp.Marshaler

func (*Cell) Msgsize Uses

func (z *Cell) Msgsize() (s int)

Msgsize returns an upper bound estimate of the number of bytes occupied by the serialized message

func (*Cell) SetInt Uses

func (cell *Cell) SetInt(n int)

SetInt stores an integer value in the cell.

func (*Cell) SetString Uses

func (cell *Cell) SetString(s string)

SetString stores string s (up to val_t length, currently 56 bytes) in cell.Value.

func (*Cell) SetValue Uses

func (cell *Cell) SetValue(v interface{})

SetValue stores any value v in the Cell. Note that users of the library will need to extend this for their type. Only strings of length less than 56, and integers are handled by default.

func (*Cell) UnmarshalMsg Uses

func (z *Cell) UnmarshalMsg(bts []byte) (o []byte, err error)

UnmarshalMsg implements msgp.Unmarshaler

func (*Cell) ZeroValue Uses

func (cell *Cell) ZeroValue()

ZeroValue sets the cell's value to all zeros.

type HashTable Uses

type HashTable struct {
    MagicNumber   int     // distinguish between reading from empty file versus an on-disk HashTable.
    Cells         uintptr `msg:"-"`
    CellSz        uint64
    ArraySize     uint64
    Population    uint64
    ZeroUsed      bool
    ZeroCell      Cell
    OffheapHeader []byte     `msg:"-"`
    OffheapCells  []byte     `msg:"-"`
    Mmm           MmapMalloc `msg:"-"`
}

HashTable represents the off-heap hash table. Create a new one with NewHashTable(), then use Lookup(), Insert(), and DeleteKey() on it. HashTable is meant to be customized by the user, to reflect your choice of key and value types. See StringHashTable and ByteKeyHashTable for examples of this specialization process.

func NewHashFileBacked Uses

func NewHashFileBacked(initialSize int64, filepath string) *HashTable

func NewHashTable Uses

func NewHashTable(initialSize int64) *HashTable

Create a new hash table, able to hold initialSize count of keys.

func (*HashTable) CellAt Uses

func (t *HashTable) CellAt(pos uint64) *Cell

CellAt: fetch the cell at a given index. E.g. t.CellAt(pos) replaces t.Cells[pos]

func (*HashTable) Clear Uses

func (t *HashTable) Clear()

Clear does not resize the table, but zeroes-out all entries.

func (*HashTable) Compact Uses

func (t *HashTable) Compact()

Compact will compress the hashtable so that it is at most 75% full.

func (*HashTable) DecodeMsg Uses

func (z *HashTable) DecodeMsg(dc *msgp.Reader) (err error)

DecodeMsg implements msgp.Decodable

func (*HashTable) DeleteCell Uses

func (t *HashTable) DeleteCell(cell *Cell)

DeleteCell deletes the cell pointed to by cell.

func (*HashTable) DeleteKey Uses

func (t *HashTable) DeleteKey(key uint64)

DeleteKey will delete the contents of the cell associated with key.

func (*HashTable) DestroyHashTable Uses

func (t *HashTable) DestroyHashTable()

DestroyHashTable frees the memory-mapping, returning the memory containing the hash table and its cells to the OS. By default the save-to-file-on-disk functionality in malloc.go is not used, but that can be easily activated. See malloc.go. Deferencing any cells/pointers into the hash table after destruction will result in crashing your process, almost surely.

func (*HashTable) Dump Uses

func (t *HashTable) Dump()

Dump provides a diagnostic dump of the full HashTable contents.

func (*HashTable) EncodeMsg Uses

func (z *HashTable) EncodeMsg(en *msgp.Writer) (err error)

EncodeMsg implements msgp.Encodable

func (*HashTable) Insert Uses

func (t *HashTable) Insert(key uint64) (*Cell, bool)

Insert a key and get back the Cell for that key, so as to enable assignment of Value within that Cell, for the specified key. The 2nd return value is false if key already existed (and thus required no addition); if the key already existed you can inspect the existing value in the *Cell returned.

func (*HashTable) InsertBK Uses

func (t *HashTable) InsertBK(bytekey []byte, value interface{}) bool

InsertBK is the insert function for []byte keys. By default only len(Key_t) bytes are used in the key.

func (*HashTable) InsertIntValue Uses

func (t *HashTable) InsertIntValue(key uint64, value int) bool

InsertIntValue inserts value under key in the table.

func (*HashTable) Lookup Uses

func (t *HashTable) Lookup(key uint64) *Cell

Lookup a cell based on a uint64 key value. Returns nil if key not found.

func (*HashTable) LookupBK Uses

func (t *HashTable) LookupBK(bytekey []byte) (Val_t, bool)

func (*HashTable) LookupBKInt Uses

func (t *HashTable) LookupBKInt(bytekey []byte) (int, bool)

func (*HashTable) MarshalMsg Uses

func (z *HashTable) MarshalMsg(b []byte) (o []byte, err error)

MarshalMsg implements msgp.Marshaler

func (*HashTable) Msgsize Uses

func (z *HashTable) Msgsize() (s int)

Msgsize returns an upper bound estimate of the number of bytes occupied by the serialized message

func (*HashTable) NewIterator Uses

func (tab *HashTable) NewIterator() *Iterator

NewIterator creates a new iterator for HashTable tab.

func (*HashTable) Repopulate Uses

func (t *HashTable) Repopulate(desiredSize uint64)

Repopulate expands the hashtable to the desiredSize count of cells.

func (*HashTable) Save Uses

func (t *HashTable) Save(background bool) error

Save syncs the memory mapped file to disk using MmapMalloc::BlockUntilSync(). If background is true, the save using BackgroundSync() instead of blocking.

func (*HashTable) UnmarshalMsg Uses

func (z *HashTable) UnmarshalMsg(bts []byte) (o []byte, err error)

UnmarshalMsg implements msgp.Unmarshaler

type Iterator Uses

type Iterator struct {
    Tab *HashTable
    Pos int64
    Cur *Cell // will be set to nil when done with iteration.
}

Iterator

sample use: given a HashTable h, enumerate h's contents with:

for it := offheap.NewIterator(h); it.Cur != nil; it.Next() {
  found = append(found, it.Cur.UnHashedKey)
}

func (*Iterator) DecodeMsg Uses

func (z *Iterator) DecodeMsg(dc *msgp.Reader) (err error)

DecodeMsg implements msgp.Decodable

func (*Iterator) Done Uses

func (it *Iterator) Done() bool

Done checks to see if we have already iterated through all cells in the table. Equivalent to checking it.Cur == nil.

func (*Iterator) EncodeMsg Uses

func (z *Iterator) EncodeMsg(en *msgp.Writer) (err error)

EncodeMsg implements msgp.Encodable

func (*Iterator) MarshalMsg Uses

func (z *Iterator) MarshalMsg(b []byte) (o []byte, err error)

MarshalMsg implements msgp.Marshaler

func (*Iterator) Msgsize Uses

func (z *Iterator) Msgsize() (s int)

Msgsize returns an upper bound estimate of the number of bytes occupied by the serialized message

func (*Iterator) Next Uses

func (it *Iterator) Next() *Cell

Next advances the iterator so that it.Cur points to the next filled cell in the table, and returns that cell. Returns nil once there are no more cells to be visited.

func (*Iterator) UnmarshalMsg Uses

func (z *Iterator) UnmarshalMsg(bts []byte) (o []byte, err error)

UnmarshalMsg implements msgp.Unmarshaler

type Key_t Uses

type Key_t [64]byte

Key_t is the basic type for keys. Users of the library will probably redefine this.

func (*Key_t) DecodeMsg Uses

func (z *Key_t) DecodeMsg(dc *msgp.Reader) (err error)

DecodeMsg implements msgp.Decodable

func (*Key_t) EncodeMsg Uses

func (z *Key_t) EncodeMsg(en *msgp.Writer) (err error)

EncodeMsg implements msgp.Encodable

func (*Key_t) MarshalMsg Uses

func (z *Key_t) MarshalMsg(b []byte) (o []byte, err error)

MarshalMsg implements msgp.Marshaler

func (*Key_t) Msgsize Uses

func (z *Key_t) Msgsize() (s int)

Msgsize returns an upper bound estimate of the number of bytes occupied by the serialized message

func (*Key_t) UnmarshalMsg Uses

func (z *Key_t) UnmarshalMsg(bts []byte) (o []byte, err error)

UnmarshalMsg implements msgp.Unmarshaler

type MmapMalloc Uses

type MmapMalloc struct {
    Path         string
    File         *os.File
    Fd           int
    FileBytesLen int64
    BytesAlloc   int64
    MMap         gommap.MMap // equiv to Mem, just avoids casts everywhere.
    Mem          []byte      // equiv to Mmap
}

The MmapMalloc struct represents either an anonymous, private region of memory (if path was "", or a memory mapped file if path was supplied to Malloc() at creation.

Malloc() creates and returns an MmapMalloc struct, which can then be later Free()-ed. Malloc() calls request memory directly from the kernel via mmap(). Memory can optionally be backed by a file for simplicity/efficiency of saving to disk.

For use when the Go GC overhead is too large, and you need to move the hash table off-heap.

func Growmap Uses

func Growmap(oldmap *MmapMalloc, newSize int64) (newmap *MmapMalloc, err error)

Growmap grows a memory mapping

func Malloc Uses

func Malloc(numBytes int64, path string) *MmapMalloc

Malloc() creates a new memory region that is provided directly by OS via the mmap() call, and is thus not scanned by the Go garbage collector.

If path is not empty then we memory map to the given path. Otherwise it is just like a call to malloc(): an anonymous memory allocation, outside the realm of the Go Garbage Collector. If numBytes is -1, then we take the size from the path file's size. Otherwise the file is expanded or truncated to be numBytes in size. If numBytes is -1 then a path must be provided; otherwise we have no way of knowing the size to allocate, and the function will panic.

The returned value's .Mem member holds a []byte pointing to the returned memory (as does .MMap, for use in other gommap calls).

func (*MmapMalloc) BackgroundSync Uses

func (mm *MmapMalloc) BackgroundSync()

BackgroundSync() schedules a sync to disk, but may return before it is done. Without a call to either BackgroundSync() or BlockUntilSync(), there is no guarantee that file has ever been written to disk at any point before the munmap() call that happens during Free(). See the man pages msync(2) and mmap(2) for details.

func (*MmapMalloc) BlockUntilSync Uses

func (mm *MmapMalloc) BlockUntilSync()

BlockUntilSync() returns only once the file is synced to disk.

func (*MmapMalloc) Free Uses

func (mm *MmapMalloc) Free()

Free eleases the memory allocation back to the OS by removing the (possibly anonymous and private) memroy mapped file that was backing it. Warning: any pointers still remaining will crash the program if dereferenced.

func (*MmapMalloc) TruncateTo Uses

func (mm *MmapMalloc) TruncateTo(newSize int64)

TruncateTo enlarges or shortens the file backing the memory map to be size newSize bytes. It only impacts the file underlying the mapping, not the mapping itself at this point.

type StringHashTable Uses

type StringHashTable HashTable

StringHashTable shows how to specialize HashTable to handle strings as keys. At present, given the current definition of Key_t (you should redefine Key_t if needed -- see offheap.go), only the first 64 bytes are used in the hash of the key.

func NewStringHashTable Uses

func NewStringHashTable(initialSize int64) *StringHashTable

NewStringHashTable produces a new StringHashTable, one specialized for handling keys of type string.

func (*StringHashTable) DeleteStringKey Uses

func (t *StringHashTable) DeleteStringKey(strkey string) bool

DeleteStringKey deletes the cell (if there is one) that has been previously inserted with the given strkey string key.

func (*StringHashTable) DumpStringKey Uses

func (t *StringHashTable) DumpStringKey()

DumpStringKey provides a diagnostic printout of the contents of a hashtable that is using strings as keys.

func (*StringHashTable) InsertStringKey Uses

func (t *StringHashTable) InsertStringKey(strkey string, value interface{}) bool

InsertStringKey inserts a value with a key that is a string.

func (*StringHashTable) LookupStringKey Uses

func (t *StringHashTable) LookupStringKey(strkey string) (Val_t, bool)

LookupStringKey looks up a value based on a key that is a string.

type Val_t Uses

type Val_t [56]byte

Val_t is the basic type for values stored in the cells in the table. Users of the library will probably redefine this to be a different size at the very least.

func (*Val_t) DecodeMsg Uses

func (z *Val_t) DecodeMsg(dc *msgp.Reader) (err error)

DecodeMsg implements msgp.Decodable

func (*Val_t) EncodeMsg Uses

func (z *Val_t) EncodeMsg(en *msgp.Writer) (err error)

EncodeMsg implements msgp.Encodable

func (*Val_t) GetInt Uses

func (v *Val_t) GetInt() int

GetInt gets an int value for Val_t v.

func (*Val_t) GetString Uses

func (v *Val_t) GetString() string

GetString retreives a string value for Val_t v.

func (*Val_t) MarshalMsg Uses

func (z *Val_t) MarshalMsg(b []byte) (o []byte, err error)

MarshalMsg implements msgp.Marshaler

func (*Val_t) Msgsize Uses

func (z *Val_t) Msgsize() (s int)

Msgsize returns an upper bound estimate of the number of bytes occupied by the serialized message

func (*Val_t) SetInt Uses

func (v *Val_t) SetInt(n int)

SetInt sets an int value for Val_t v.

func (*Val_t) SetString Uses

func (v *Val_t) SetString(s string)

SetString sets a string value for Val_t v.

func (*Val_t) UnmarshalMsg Uses

func (z *Val_t) UnmarshalMsg(bts []byte) (o []byte, err error)

UnmarshalMsg implements msgp.Unmarshaler

Directories

PathSynopsis
keyval
ohser

Package offheap imports 9 packages (graph). Updated 2017-04-17. Refresh now. Tools for package owners.