bubt

package
v0.0.0-...-b47ea92 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Feb 10, 2021 License: MIT Imports: 21 Imported by: 2

README

Bottoms up Btree

GoDoc

The short version bubt means Bottoms-up-Btree. It simply says how the btree was built, which is bottoms up. The idea behind this implementation is that sorted index of key,value entries are built once, marked as immutable and available for concurrent readers.

Bottoms up construction

  • Building a bottoms-up tree starts with an iterator object that supply a stream of key,value entries, and associated fields for each entry.
  • Incoming key,value entries are populated in a leaf node, called z-node.
  • z-nodes can be stored in a separate file.
  • Once an z-node is fully utilized, a new entry, called m-entry, is composed with:
    • Key element which is the z-node first entry's key element.
    • Value element which is the file position of the z-node.
  • m-entries are inserted in the intermediate node called m-node.
  • For every z-node, fully utilized, a m-entry is created and added into the m-node.
  • As and when z-nodes are fully utilized they are flushed to disk in append only mode.
  • When a m-node is fully utilized another level of intermediate node is created which points down to the fully utilized m-node.
  • As and when m-nodes are fully utilized they are flushed to disk in append only mode.
  • Size of z-node is same across the tree and configurable with each build.
  • Size of m-node is same across the tree and configurable with each build.
  • Finally the root node is flushed.
  • After the root node, a single info-block of MarkerBlocksize is flushed. Infoblock contains arguments used to build the snapshot and also some statistics about the snapshot.
  • After info-block, one or more blocks of index metadata (blocksize same as m-node) is flushed.
  • After metadata, a single block, (blocksize is MarkerBlocksize) of marker-block is flushed.

** TODO: block diagram of disk format**

Value log

To optimize on the write-amplification, Bubt instances can be constructed with values (from each key,value entry) can be stored in separate file. Note that this might have some negative impact on disk-amplication and in come cases can decrease the throughput of random Get operations.

Metadata, info-block

Applications can attach an opaque blob of metadata with every bubt index. This can be supplied as argument to the Build() API. It is upto the application to interpret this metadata.

Similarly info-block saves all the arguments / parameters supplied to the Build() API, along with useful statistics about the snapshit as JSON property. The size of info-block cannot exceed MarkerBlocksize.

** TODO: shape of info-block property**

Background routines

While building the btree, separate go-routines are spawned to flush data into file. There will be one go-routine for each index file.

A fully formed bubt instance can be opened using OpenSnapshot for read only access and can be shared between go-routines. Snapshots can be opened across multiple process without the danger of any race conditions. None of the snapshots spawn go-routines. Everytime a snapshot is shared with another go-routine, its reference count should be bumped. Only when all snapshot references are released, snapshot can be closed. Likewise, only when all snapshots are closed, the last reference can destory the snapshot.

Panic and Recovery

Panics are to expected when APIs are misused. Programmers might choose to ignore the errors, but not panics. For example:

  • For disk errors while building the tree or reading from snapshots.
  • If input iterator returns error other than io.EOF.
  • If bytes required to encode a key,value entry is more than the zblock's size.
  • Using mutation APIs, like Set, Delete, Commit, on View object.
  • Using mutation APIs like BeginTxn, Set, SetCAS, Delete, on Snapshot.
  • Using mutation APIs, like Set, Delete, Delcursor, on Cursor object.
  • Validate() API will panic, if:
    • keys in the bubt instance are not in sort order.
    • number of entries in the bubt instance does not match with header.
    • disk footprint is unreasonably larger.

None of the panics will automatically recover. It is upto the caller to recover or fail-quick as the case may be.

Documentation

Overview

Package bubt builds Btree bottoms up and keeps it immutable. By having it as immutable, it is possible to attain near 100% node utilization, and allow concurrent reads on fully built tree, without locks.

Index

Constants

View Source
const MarkerBlocksize = 4096

MarkerBlocksize to close snapshot file.

View Source
const MarkerByte = 0xAB

MarkerByte to populate Markerblock.

Variables

This section is empty.

Functions

func LogComponents

func LogComponents(components ...string)

LogComponents enable logging. By default logging is disabled, if applications want log information for bubt components call this function with "self" or "all" or "bubt" as argument.

func PurgeSnapshot

func PurgeSnapshot(name string, paths []string)

PurgeSnapshot remove disk footprint of this snapshot.

Types

type Bubt

type Bubt struct {
	// contains filtered or unexported fields
}

Bubt instance can be used to persist sorted {key,value} entries in immutable btree, built bottoms up and not updated there after.

func NewBubt

func NewBubt(
	name string, paths []string,
	mblocksize, zblocksize, vblocksize int64) (tree *Bubt, err error)

NewBubt create a Bubt instance to build a new bottoms-up btree. If zblocksize == 0, then zblocksize will be same as mblocksize. if vblocksize == 0, then values will be stored in value log.

func (*Bubt) AppendValuelogs

func (tree *Bubt) AppendValuelogs(vblocksize int64, appendid string, valuelogs []string)

AppendValuelogs builder should use `valuelogs` files instead of creating a new set of value-logs corresponding to each z-index files, vblocksize should be same as used while creating `valuelogs`. appendid, should be same as the index.ID() whose value-logs are to be appended.

func (*Bubt) Build

func (tree *Bubt) Build(itere api.EntryIterator, metadata []byte) (err error)

Build starts building the tree from iterator, iterator is expected to be a full-table scan over another data-store.

func (*Bubt) Close

func (tree *Bubt) Close()

Close instance after building the btree. This will mark disk files as immutable for rest of its life-time. Use OpenSnapshot for reading.

func (*Bubt) TombstonePurge

func (tree *Bubt) TombstonePurge(what bool)

TombstonePurge to enable or disable purging tombstone entries while Building a bubt instance from an iterator.

func (*Bubt) Writemetadata

func (tree *Bubt) Writemetadata(metadata []byte) (int, error)

type Cursor

type Cursor struct {
	// contains filtered or unexported fields
}

Cursor object maintains an active pointer into index. Use OpenCursor on Txn object to create a new cursor.

func (*Cursor) Delcursor

func (cur *Cursor) Delcursor(lsm bool)

Delcursor not allowed.

func (*Cursor) Delete

func (cur *Cursor) Delete(key, oldvalue []byte, lsm bool) []byte

Delete not allowed.

func (*Cursor) GetNext

func (cur *Cursor) GetNext() (key, value []byte, deleted bool, err error)

GetNext move cursor to next entry and return its key, value, whether it is deleted, err will be io.EOF or any other disk error.

func (*Cursor) Key

func (cur *Cursor) Key() (key []byte, deleted bool)

Key return key at cursor.

func (*Cursor) Set

func (cur *Cursor) Set(key, value, oldvalue []byte) []byte

Set not allowed.

func (*Cursor) Value

func (cur *Cursor) Value() (value []byte)

Value return value at cursor.

func (*Cursor) YNext

func (cur *Cursor) YNext(fin bool) (key,
	value []byte, seqno uint64, deleted bool, err error)

YNext can be used for lsm-sort. Similar to GetNext, but includes the seqno at which the entry was created/updated/deleted.

type Snapshot

type Snapshot struct {
	// contains filtered or unexported fields
}

Snapshot to read index entries persisted using Bubt builder. Since no writes are allowed on the btree, any number of snapshots can be opened for reading.

func OpenSnapshot

func OpenSnapshot(
	name string, paths []string, mmap bool) (snap *Snapshot, err error)

OpenSnapshot from paths.

func (*Snapshot) BeginTxn

func (snap *Snapshot) BeginTxn(id uint64) api.Transactor

BeginTxn is not allowed.

func (*Snapshot) Close

func (snap *Snapshot) Close()

Close snapshot, will release all in-memory resources but will keep the disk files. All Opened-Snapshots must be closed before it can be destoryed.

func (*Snapshot) Count

func (snap *Snapshot) Count() int64

Count number of indexed entries.

func (*Snapshot) Delete

func (snap *Snapshot) Delete(key, oldvalue []byte, lsm bool) ([]byte, uint64)

Delete is not allowed.

func (*Snapshot) Destroy

func (snap *Snapshot) Destroy()

Destroy snapshot will remove disk footprint of the btree. Can be called only after Close is called on all OpenSnapshots.

func (*Snapshot) Footprint

func (snap *Snapshot) Footprint() int64

Footprint return the size occupied by this instance on disk.

func (*Snapshot) Get

func (snap *Snapshot) Get(
	key, value []byte) (actualvalue []byte, cas uint64, deleted, ok bool)

Get value for key, if value argument is not nil it will be used to copy the entry's value. Also returns entry's cas, whether entry is marked as deleted by LSM. If ok is false, then key is not found.

func (*Snapshot) Getseqno

func (snap *Snapshot) Getseqno() uint64

Getseqno return the latest mutation's seqno persisted in this snapshot.

func (*Snapshot) ID

func (snap *Snapshot) ID() string

ID of snapshot, same as name argument passed to OpenSnapshot.

func (*Snapshot) Info

func (snap *Snapshot) Info() s.Settings

Info return parameters used to build the snapshot and statistical information.

mfile      : m-index file name.
zfiles     : list of z-index file name.
vfiles     : list of value log files for each each z-index, if present.
zblocksize : block size used for z-index file.
mblocksize : block size used for m-index file.
vblocksize : block size used for value log.
buildtime  : time taken, in nanoseconds, to build this snapshot.
epoch      : snapshot born time, in nanosec, after January 1, 1970 UTC.
seqno      : maximum seqno contained in this snapshot.
keymem     : total payload size for all keys.
valmem     : total payload size for all values.
paddingmem : total bytes used for padding m-block and z-block alignment.
numpaths   : number of paths for this instance.
n_zblocks  : total number of blocks in z-index files.
n_mblocks  : total number of blocks in m-index files.
n_ablocks  : total number of blocks in value log, before appending.
n_vblocks  : total number of blocks in value log.
n_count    : number of entries in this snapshot, includes deleted.
n_deleted  : number of entries marked as deleted.
footprint  : disk footprint for this snapshot.

func (*Snapshot) Log

func (snap *Snapshot) Log()

Log vital information

func (*Snapshot) Metadata

func (snap *Snapshot) Metadata() []byte

Metadata return metadata blob associated with this snapshot.

func (*Snapshot) Scan

func (snap *Snapshot) Scan() api.Iterator

Scan return a full table iterator, if iteration is stopped before reaching end of table (io.EOF), application should call iterator with fin as true. EG: iter(true)

func (*Snapshot) ScanEntries

func (snap *Snapshot) ScanEntries() api.EntryIterator

ScanEntry return a full table iterator, if iteration is stopped before reaching end of table (io.EOF), application should call iterator with fin as true. EG: iter(true)

func (*Snapshot) Set

func (snap *Snapshot) Set(key, value, oldvalue []byte) (ov []byte, cas uint64)

Set is not allowed.

func (*Snapshot) SetCAS

func (snap *Snapshot) SetCAS(
	key, value, oldvalue []byte, cas uint64) ([]byte, uint64, error)

SetCAS is not allowed.

func (*Snapshot) Validate

func (snap *Snapshot) Validate()

Validate snapshot on disk. This is a costly call, use it only for testing and administration purpose.

func (*Snapshot) Valuelogs

func (snap *Snapshot) Valuelogs() []string

func (*Snapshot) View

func (snap *Snapshot) View(id uint64) api.Transactor

View start a read only transaction. Any number of views can be created on this snapshot provided they are not concurrently accessed.

type View

type View struct {
	// contains filtered or unexported fields
}

View read only transaction instance.

func (*View) Abort

func (view *View) Abort()

Abort view, must be called once done with the view.

func (*View) Commit

func (view *View) Commit() error

Commit not allowed.

func (*View) Delete

func (view *View) Delete(key, oldvalue []byte, lsm bool) []byte

Delete not allowed.

func (*View) Get

func (view *View) Get(
	key, value []byte) (v []byte, cas uint64, deleted, ok bool)

Get value for key, if value argument is not nil it will be used to copy the entry's value. Also return whether entry is marked as deleted by LSM. If ok is false, then key is not found.

func (*View) ID

func (view *View) ID() uint64

ID return view-transaction id.

func (*View) OpenCursor

func (view *View) OpenCursor(key []byte) (api.Cursor, error)

OpenCursor open an active cursor, point at key, inside the index.

func (*View) Set

func (view *View) Set(key, value, oldvalue []byte) []byte

Set not allowed.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL