gonudb

Gonudb

Gonudb is an append-only key/value datastore written in Go.

Overview

Gonudb is a port of NuDB, a C++ key/value store.

A Gonudb datastore comprises a data file holding keys and values stored sequentially and an accompanying key file which forms an on-disk hash table indexing the values stored in the data file.

During commits a log file is created to store bookkeeping information that may be used to repair the datastore in the event of a failure.

The data file and key file are independent and a new key file may be rebuilt from the data file if necessary, potentially with an alternate hashing scheme.

Installation

Execute go get github.com/iand/gonudb within a Go module directory to add it to your module.

Usage

Gonudb is primarily a library. Import package github.com/iand/gonudb to use it. A sample application demonstrating simple inserts and fetches is provided in cmd/gonudbsample.
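
For illustration, here is a minimal sketch of creating a store, inserting a value and fetching it back, based on the signatures documented below. The file names, appnum, block size and load factor are placeholder values, and logr.Discard() stands in for whatever logger the application uses.

package main

import (
	"fmt"
	"log"

	"github.com/go-logr/logr"
	"github.com/iand/gonudb"
)

func main() {
	// Create the three files that make up a store. The appnum (1),
	// block size (4096) and load factor (0.5) are illustrative values.
	err := gonudb.CreateStore("db.dat", "db.key", "db.log", 1, gonudb.NewSalt(), 4096, 0.5)
	if err != nil {
		log.Fatal(err)
	}

	s, err := gonudb.OpenStore("db.dat", "db.key", "db.log", &gonudb.StoreOptions{Logger: logr.Discard()})
	if err != nil {
		log.Fatal(err)
	}
	defer s.Close()

	if err := s.Insert("alpha", []byte("some value")); err != nil {
		log.Fatal(err)
	}

	v, err := s.Fetch("alpha")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("alpha: %s\n", v)
}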

An admin tool can be found in cmd/gonudbadmin which provides some commands for inspecting and validating the files that comprise a store.

Install by executing go install github.com/iand/gonudb/cmd/gonudbadmin from the root of the repository.

  • gonudbadmin info can be used to view characteristic information about any of the three files used by gonudb (data, key and log files).
  • gonudbadmin verify verifies the consistency of data and key files and shows some statistics on the data they hold.

Design

Gonudb shares the design ideals that motivated NuDB (but see Status below):

  1. Writes should not block reads.
  2. Reads should be limited only by the SSD's IOPS limit.
  3. A read for a non-present key should require one IOP.
  4. A read for a present key whose data can be read in a single IOP should only require two IOPs, one to figure out where it is and one to read it in.

Keys and values are stored sequentially in an append-only data file. The data file begins with a header that contains characteristic information about the file such as the version of the encoding scheme, a datastore identifier and an application identifier. Data records follow immediately on from the header. Each record comprises the size of the value, followed by the size of the key, followed by the key, followed by the value data. The data file is considered to be immutable and there are no delete or mutate operations.
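
As an illustration of this layout, the following sketch serializes a record with a fixed-width header. The integer widths (8 bytes for the value size, 2 for the key size) are assumptions made purely for the example, not gonudb's actual on-disk encoding.

package layout

import "encoding/binary"

// appendRecord sketches the record layout described above: value size,
// then key size, then key bytes, then value bytes. The integer widths
// are illustrative assumptions, not gonudb's actual encoding.
func appendRecord(buf []byte, key string, value []byte) []byte {
	var hdr [10]byte
	binary.BigEndian.PutUint64(hdr[0:8], uint64(len(value))) // value size
	binary.BigEndian.PutUint16(hdr[8:10], uint16(len(key)))  // key size
	buf = append(buf, hdr[:]...)
	buf = append(buf, key...)   // key bytes
	buf = append(buf, value...) // value data
	return buf
}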

Inserts are buffered in memory and periodically committed to disk. Clients are throttled based on the rate at which data is flushed to disk. Values are immediately discoverable via their key and may be read from memory or disk.

Keys are hashed and written to buckets stored in the key file. As with the data file, the key file begins with a header containing characteristic information. The key file's version, datastore identifier and application identifier must match those in the data file header. Additionally the key file header contains the hash salt, the block size of each bucket and the target load factor which determines when a bucket should be split. Buckets are a fixed size and written sequentially after the header, which enables them to be easily located by index.

Each bucket is assigned a range of hash values and entries within a bucket are ordered by hash. When a bucket's occupancy exceeds the target load factor it undergoes a split and its entries are rehashed across the resulting pair of buckets using the linear hashing algorithm, as sketched below. When a bucket exceeds its capacity it is spilled to the data file and replaced with an empty bucket containing a pointer to the spill record. A spilled bucket may spill multiple times, with the resulting spill records forming a linked list in the data file.
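
The bucket selection step of linear hashing can be sketched as follows. This is the textbook formulation; gonudb's implementation may differ in detail.

package hashing

// bucketIndex sketches textbook linear hashing: hash values map onto
// 2^level buckets, except that the first `split` buckets have already
// been split and address their entries with the finer-grained modulus
// 2^(level+1). Gonudb's exact scheme may differ.
func bucketIndex(hash uint64, level uint, split uint64) uint64 {
	idx := hash % (1 << level)
	if idx < split {
		// this bucket has already been split; rehash across the
		// doubled bucket count
		idx = hash % (1 << (level + 1))
	}
	return idx
}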

In the best case, reading a record from the datastore requires one read from the key file to load the relevant bucket and one read from the data file to access the value. Additional reads from the data file may be required to resolve hash collisions and to load spill records. Read performance is independent of the size of the datastore, and the size of buckets in the key file may be tuned to the block size of the underlying physical media so that loading a bucket may take only a single IOP.

Status

Version 0.1.0 is an alpha quality functional port of the original NuDB suitable for testing with expendable loads. Correctness and safety have been prioritised over performance. Locks are broad in scope and treat reads and writes with equal priority. Future work will tune the locking behaviour to better meet the goal of writes not blocking reads.

High priority tasks include:

  • Add recovery from partial writes.
  • Add a rekey admin function.
  • Tune the locking strategy.

Additional features under consideration:

  • Allow alternate hashing functions to be specified.

Author

Go port written by:

License

Distributed under the Boost Software License, Version 1.0. (See accompanying file LICENSE or copy at http://www.boost.org/LICENSE_1_0.txt)

Documentation

Index

Constants

const (
	LogLevelDiagnostics = 1 // log level increment for diagnostics logging
	LogLevelTrace       = 2 // log level increment for verbose tracing
)

Variables

var (
	ErrAppNumMismatch     = internal.ErrAppNumMismatch
	ErrDataMissing        = internal.ErrDataMissing
	ErrDataTooLarge       = internal.ErrDataTooLarge
	ErrDifferentVersion   = internal.ErrDifferentVersion
	ErrHashMismatch       = internal.ErrHashMismatch
	ErrInvalidBlockSize   = internal.ErrInvalidBlockSize
	ErrInvalidBucketCount = internal.ErrInvalidBucketCount
	ErrInvalidCapacity    = internal.ErrInvalidCapacity
	ErrInvalidDataRecord  = internal.ErrInvalidDataRecord
	ErrInvalidKeySize     = internal.ErrInvalidKeySize
	ErrInvalidLoadFactor  = internal.ErrInvalidLoadFactor
	ErrInvalidRecordSize  = internal.ErrInvalidRecordSize
	ErrInvalidSpill       = internal.ErrInvalidSpill
	ErrKeyExists          = internal.ErrKeyExists
	ErrKeyMismatch        = internal.ErrKeyMismatch
	ErrKeyMissing         = internal.ErrKeyMissing
	ErrKeyNotFound        = internal.ErrKeyNotFound
	ErrKeySizeMismatch    = internal.ErrKeySizeMismatch
	ErrKeyTooLarge        = internal.ErrKeyTooLarge
	ErrKeyWrongSize       = internal.ErrKeyWrongSize // deprecated: use ErrKeyMissing and ErrKeyTooLarge instead
	ErrNotDataFile        = internal.ErrNotDataFile
	ErrNotKeyFile         = internal.ErrNotKeyFile
	ErrNotLogFile         = internal.ErrNotLogFile
	ErrShortKeyFile       = internal.ErrShortKeyFile
	ErrUIDMismatch        = internal.ErrUIDMismatch
)

Functions

func CreateStore

func CreateStore(datPath, keyPath, logPath string, appnum, salt uint64, blockSize int, loadFactor float64) error

func NewSalt

func NewSalt() uint64

func Version

func Version() string

Types

type Bucket

type Bucket struct {
	// contains filtered or unexported fields
}

A Bucket contains a set of key entries that form part of the data store's index.

func (*Bucket) ActualSize

func (b *Bucket) ActualSize() int

ActualSize returns the serialized bucket size, excluding empty space.

func (*Bucket) BlockSize

func (b *Bucket) BlockSize() int

BlockSize returns the physical size of a key file bucket.

func (*Bucket) Capacity

func (b *Bucket) Capacity() int

Capacity returns the maximum number of key entries that can be held in the bucket.

func (*Bucket) Count

func (b *Bucket) Count() int

Count returns the number of key entries in the bucket.

func (*Bucket) Entry

func (b *Bucket) Entry(idx int) BucketEntry

Entry returns the record for a key entry.

func (*Bucket) Has

func (b *Bucket) Has(h uint64) bool

Has reports whether the bucket contains an entry with the given hash.

func (*Bucket) HashRange

func (b *Bucket) HashRange() (uint64, uint64)

HashRange returns the range of hashed keys that are contained in the bucket.

func (*Bucket) IsEmpty

func (b *Bucket) IsEmpty() bool

IsEmpty reports whether the bucket has any key entries.

func (*Bucket) Spill

func (b *Bucket) Spill() int64

Spill returns the offset in the store's data file of the next spill record, or 0 if there is no spill.

type BucketEntry

type BucketEntry struct {
	// Offset is the position in the store's data file of the data record.
	Offset int64

	// Size is the size of the data value within the data record.
	Size int64

	// Hash is the hashed version of the key used to insert the data value.
	Hash uint64
}

type BucketScanner

type BucketScanner struct {
	// contains filtered or unexported fields
}

BucketScanner implements a sequential scan through a key file. Successive calls to the Next method will step through the buckets in the file, including spilled buckets in the data file.

func (*BucketScanner) Bucket

func (s *BucketScanner) Bucket() *Bucket

Bucket returns the current bucket. Should not be called until Next has been called. The bucket is backed by data that may be overwritten with a call to Next so should not be retained.

func (*BucketScanner) Close

func (s *BucketScanner) Close() error

Close closes the underlying reader used by the scanner.

func (*BucketScanner) Err

func (s *BucketScanner) Err() error

Err returns the first non-EOF error that was encountered by the BucketScanner.

func (*BucketScanner) Index

func (s *BucketScanner) Index() int

Index returns the index of the current bucket. Should not be called until Next has been called. Spill buckets share an index with their parent.

func (*BucketScanner) IsSpill

func (s *BucketScanner) IsSpill() bool

IsSpill reports whether the current bucket was read from a data store spill.

func (*BucketScanner) Next

func (s *BucketScanner) Next() bool

Next reads the next bucket in sequence, including spills to the data store. It returns false if it encounters an error or there are no more buckets to read.
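
Putting these methods together, iteration follows the usual Go scanner idiom. The sketch below assumes a store opened elsewhere; dumpBuckets is a hypothetical helper, not part of the package.

package example

import (
	"fmt"

	"github.com/iand/gonudb"
)

// dumpBuckets walks every bucket in the key file, including spilled
// buckets in the data file. dumpBuckets is a hypothetical helper.
func dumpBuckets(store *gonudb.Store) error {
	scanner := store.BucketScanner()
	defer scanner.Close()
	for scanner.Next() {
		b := scanner.Bucket() // backed by reused data; do not retain
		fmt.Printf("bucket %d: %d entries, spill=%v\n", scanner.Index(), b.Count(), scanner.IsSpill())
	}
	return scanner.Err()
}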

type RecordScanner

type RecordScanner struct {
	// contains filtered or unexported fields
}

RecordScanner implements a sequential scan through a store's data file. Successive calls to the Next method will step through the records in the file. Note that the scanner does not include data buffered in memory. Call Flush to ensure all written data is visible to the scanner.

func (*RecordScanner) Close

func (s *RecordScanner) Close() error

func (*RecordScanner) Err

func (s *RecordScanner) Err() error

Err returns the first non-EOF error that was encountered by the RecordScanner.

func (*RecordScanner) IsData

func (s *RecordScanner) IsData() bool

IsData reports whether the current record is a data record.

func (*RecordScanner) IsSpill

func (s *RecordScanner) IsSpill() bool

IsSpill reports whether the current record is a bucket spill.

func (*RecordScanner) Key

func (s *RecordScanner) Key() string

Key returns the key of the current record.

func (*RecordScanner) Next

func (s *RecordScanner) Next() bool

Next reads the next record in sequence, including bucket spills written to the data file. It returns false if it encounters an error or there are no more records to read.

func (*RecordScanner) Reader

func (s *RecordScanner) Reader() io.Reader

Reader returns an io.Reader that may be used to read the data from the record. Should not be called until Next has been called. The Reader is only valid for use until the next call to Next().

func (*RecordScanner) RecordSize

func (s *RecordScanner) RecordSize() int64

RecordSize returns the number of bytes occupied by the current record including its header.

func (*RecordScanner) Size

func (s *RecordScanner) Size() int64

Size returns the size of the current record's data in bytes.
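
A typical scan over data records flushes first so buffered inserts are visible, then reads each value through the record's Reader. dumpRecords is a hypothetical helper built from the methods documented above.

package example

import (
	"fmt"
	"io"

	"github.com/iand/gonudb"
)

// dumpRecords iterates the store's data records in file order,
// skipping bucket spill records. dumpRecords is a hypothetical helper.
func dumpRecords(store *gonudb.Store) error {
	if err := store.Flush(); err != nil { // make buffered inserts visible to the scanner
		return err
	}
	scanner := store.RecordScanner()
	defer scanner.Close()
	for scanner.Next() {
		if !scanner.IsData() {
			continue // skip bucket spill records
		}
		data, err := io.ReadAll(scanner.Reader()) // Reader is valid only until the next call to Next
		if err != nil {
			return err
		}
		fmt.Printf("%s: %d bytes\n", scanner.Key(), len(data))
	}
	return scanner.Err()
}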

type Store

type Store struct {
	// contains filtered or unexported fields
}

func OpenStore

func OpenStore(datPath, keyPath, logPath string, options *StoreOptions) (*Store, error)

func (*Store) AppNum

func (s *Store) AppNum() uint64

AppNum returns the store's application-defined integer constant.

func (*Store) BlockSize

func (s *Store) BlockSize() uint16

BlockSize returns the physical size of a key file bucket.

func (*Store) BucketScanner

func (s *Store) BucketScanner() *BucketScanner

BucketScanner returns a scanner that may be used to iterate the datastore's index of keys. The caller is responsible for calling Close on the scanner after use.

func (*Store) Close

func (s *Store) Close() error

func (*Store) DataSize added in v0.2.1

func (s *Store) DataSize(key string) (int64, error)

DataSize returns the size of the data record associated with a key.

func (*Store) Err

func (s *Store) Err() error

Err returns an error if the store is in an error state, nil otherwise.

func (*Store) Exists added in v0.2.1

func (s *Store) Exists(key string) (bool, error)

Exists reports whether a data record is associated with a key.

func (*Store) Fetch

func (s *Store) Fetch(key string) ([]byte, error)

Fetch fetches the value associated with key from the store.

func (*Store) FetchReader

func (s *Store) FetchReader(key string) (io.Reader, error)

FetchReader returns a reader that may be used to read the value associated with a key.

func (*Store) Flush

func (s *Store) Flush() error

func (*Store) Insert

func (s *Store) Insert(key string, value []byte) error

Insert adds a key/value pair to the store. Zero length values are not supported.

func (*Store) Rate added in v0.2.0

func (s *Store) Rate() float64

Rate returns the data write rate in bytes per second.

func (*Store) RecordCount added in v0.2.0

func (s *Store) RecordCount() int

RecordCount returns the number of data records in the store.

func (*Store) RecordScanner

func (s *Store) RecordScanner() *RecordScanner

RecordScanner returns a scanner that may be used to iterate the datastore's values. The caller is responsible for calling Close on the scanner after use.

func (*Store) UID

func (s *Store) UID() uint64

UID returns the store's unique id that was generated on creation.

func (*Store) Version

func (s *Store) Version() uint16

Version returns the version number of the store's data format.

type StoreOptions

type StoreOptions struct {
	Logger                 logr.Logger
	BackgroundSyncInterval time.Duration
}
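
A sketch of opening a store with explicit options; openWithOptions is a hypothetical helper, logr.Discard() stands in for the application's logger, and the sync interval shown is an arbitrary example value.

package example

import (
	"time"

	"github.com/go-logr/logr"
	"github.com/iand/gonudb"
)

// openWithOptions opens an existing store with an explicit logger and
// background sync interval. openWithOptions is a hypothetical helper.
func openWithOptions(dat, key, log string) (*gonudb.Store, error) {
	opts := &gonudb.StoreOptions{
		Logger:                 logr.Discard(), // substitute the application's logger
		BackgroundSyncInterval: 500 * time.Millisecond,
	}
	return gonudb.OpenStore(dat, key, log, opts)
}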
