btree

package module
v0.0.0-...-213e92d Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jul 14, 2022 License: Apache-2.0 Imports: 16 Imported by: 2

Documentation

Overview

Supplies API to append/fetch key/value/docid from kv-file. kv-file is opened and managed by the WStore structure. entry format is,

| 4-byte size | size-byte value |

Maximum size of each entry is int32, that is 2^31.

translates btree blocks from persistant storage to in-memory data structure, called btree-node. A btree node can be a lnode (also called leaf node) or it can be a inode. `block` structure is fundamental to both type of nodes.

Btree indexing algorithm for json {key,docid,value} triplets. `keys` and `values` are expected to be in json, while `docid` is the primary key of json document which contains the key fragment. `value` can optionally be used to store fragment of a document. since keys generated for seconday indexes may not be unique, indexing a.k.a sorting is done on {key,docid}.

This module provides a lock free interface to cache data structure. The general idea is to create an array of pointers, each pointing to the head of DCacheItem linked list.

Since nodes are cached based on their file-position (fpos), fpos is hashed to index into the array. Refer indexFor() to know how.

Subsequenly we walk the linked list for a match in fpos and corresponding cached node.

For deleting we simply remove the node from the list.

For inserting, we prepend the new DCacheItem as the head of the list, and walk the remaining list to delete the item, if it is already present.

(fpos & hashmask) >> rshift

|
|   *--------*
*-->| head   | -->|DcacheItem|-->|DcacheItem|-->|DcacheItem|
    *--------*
|   *--------*
*-->| head   | -->|DcacheItem|-->|DcacheItem|-->|DcacheItem|
    *--------*

    ...

|   *--------*
*-->| head   | -->|DcacheItem|-->|DcacheItem|-->|DcacheItem|
    *--------*

defer goroutine that handles mvcc snapshots. Functionalites of this process augments the functionalit of MVCC process.

Manages list of free blocks in btree index-file.

Manages head sector of btree index-file. Head sector contains the following items,

rootFileposition int64
timestamp int64
sectorsize int64
flistsize int64
blocksize int64
maxkeys int64
pick int64
crc uint32

Index mutation due to {key,docid,value} insert.

MVCC controller process.

Handles all btree traversals except, insert() and remove()

The following is the general idea on cache structure.

                                |
*------------*    WRITE         |     READ       *------------*
|   inode    |       ^          |       ^        |   inode    |
| ping-cache |       |          |       |        | pong-cache |
|            |       *<-----*-----------*        |            |
|   lnode    |       |      |   |       *------->|   lnode    |
| ping-cache |       |      |   |     ncache()   | pong-cache |
*------------*       |  commitQ |                *------------*
      ^              V      ^   |        (Locked access using sync.Mutex)
      |           *------*  |   |

commits*-----------| MVCC |<-* | recyles *------* | reclaims | |

*----->ping2Pong() (atomic, no locking)
           |
           |

The cycle of ping-pong,

  • reads will always refer to the pong-cache.

  • reads will populate the cache from disk, when ever cache lookup fails.

  • writes will refer to the commitQ maintained by MVCC, if node is not in commitQ it will refer to pong-cache.

  • ping-cache is operated only by the MVCC controller.

  • MVCC controller will _populate_ the ping-cache when new nodes are generated due to index mutations.

  • MVCC controller will _evict_ the pong-cache as and when nodes become stale due to index mutations.

  • ping2Pong() happens when snapshot is flushed to disk.

  • pong becomes ping, and MVCC controller will _populate_ and _evict_ the newly flipped ping-cache based on commited, recycled and reclaimed node, before allowing further mutations.

Package btree Data store for btree, organised in two files, index-file and kv-file.

index-file,

contains head block, list of free blocks within index file,
and btree-blocks.

head,
  512 byte sector written at the head of the file. contains reference to
  the root bock, head-sector-size, free-list-size and block-size.
freelist,
  contains a list of 8-byte offset into the index file that contains
  free blocks.

kv-file,

contains key, value, docid bytes. They are always added in append
only mode, and a separate read-fd fetches them in random-access. Refer to
appendkv.go for more information.

Common functions used across test cases.

Contains necessary functions to do index writing.

Index

Constants

View Source
const (
	// FIXME : Is there a better way to learn sizeof a struct.
	BLK_KEY_SIZE   = 20 // bytes
	BLK_VALUE_SIZE = 8  // bytes
	BLK_OVERHEAD   = 16 // bytes, leaf+size field
	TRUE           = 1
	FALSE          = 0
)
View Source
const (
	DEFER_ADD byte = iota
	DEFER_DELETE
)
View Source
const (
	WS_SAYHI byte = iota
	WS_CLOSE      // {WS_CLOSE}

	// messages to mvcc goroutine
	WS_ACCESS      // {WS_ACCESS} -> timestamp int64
	WS_RELEASE     // {WS_RELEASE, timestamp} -> minAccess int64
	WS_SETSNAPSHOT // {WS_SETSNAPSHOT, offsets []int64, root int64, timestamp int64}

	// messages to defer routine
	WS_PINGCACHE    // {WS_PINGCACHE, what byte, fpos int64, node Node}
	WS_PINGKD       // {WS_PINGKD, fpos int64, key []byte}
	WS_MV           // {WS_MV, mv *MV}
	WS_SYNCSNAPSHOT // {WS_MV, minAccess int64}
)
View Source
const (
	IO_FLUSH byte = iota
	IO_APPEND
	IO_CLOSE
)
View Source
const (
	OFFSET_SIZE = 8                  // 64 bit offset
	SECTOR_SIZE = 512                // Disk drive sector size in bytes.
	FLIST_SIZE  = 1024 * OFFSET_SIZE // default free list size in bytes.
	BLOCK_SIZE  = 1024 * 64          // default block size in bytes.
)

constants that are relevant for index-file and kv-file

Variables

This section is empty.

Functions

func RandomKey

func RandomKey(rnd *rand.Rand) string

func RandomValue

func RandomValue(rnd *rand.Rand) string

func TestData

func TestData(count int, seed int64) ([]*TestKey, []*TestValue)

Types

type BTree

type BTree struct {
	Config
	// contains filtered or unexported fields
}

btree instance. Typical usage, where `conf` is Config structure.

bt = btree.NewBTree( btree.NewStore( conf ))

any number of BTree instances can be created.

func NewBTree

func NewBTree(store *Store) *BTree

Create a new instance of btree. `store` will be used to persist btree blocks, key-value data and associated meta-information.

func (*BTree) Check

func (bt *BTree) Check()

func (*BTree) Close

func (bt *BTree) Close()

Opposite of NewBTree() API, make sure to call this on every instance of BTree before exiting.

func (*BTree) Contains

func (bt *BTree) Contains(key Key) bool

func (*BTree) Count

func (bt *BTree) Count() int64

func (*BTree) DocidSet

func (bt *BTree) DocidSet() <-chan []byte

func (*BTree) Drain

func (bt *BTree) Drain()

func (*BTree) Equals

func (bt *BTree) Equals(key Key) bool

func (*BTree) Front

func (bt *BTree) Front() ([]byte, []byte, []byte)

func (*BTree) FullSet

func (bt *BTree) FullSet() <-chan []byte

func (*BTree) Insert

func (bt *BTree) Insert(key Key, v Value) bool

func (*BTree) KeySet

func (bt *BTree) KeySet() <-chan []byte

func (*BTree) LevelCount

func (bt *BTree) LevelCount() ([]int64, int64, int64)

func (*BTree) Lookup

func (bt *BTree) Lookup(key Key) chan []byte

func (*BTree) LookupDirty

func (bt *BTree) LookupDirty(key Key) chan []byte

func (*BTree) Remove

func (bt *BTree) Remove(key Key) bool

func (*BTree) Show

func (bt *BTree) Show()

func (*BTree) ShowKeys

func (bt *BTree) ShowKeys()

func (*BTree) Stats

func (bt *BTree) Stats(check bool)

func (*BTree) ValueSet

func (bt *BTree) ValueSet() <-chan []byte

type CheckContext

type CheckContext struct {
	// contains filtered or unexported fields
}

type Config

type Config struct {
	//-- file store
	Idxfile string
	Kvfile  string
	IndexConfig

	// maximum number of levels btree can grow, this information is used as a
	// cue in calculating couple of limits within the algorithm.
	Maxlevel int

	// if number of entries within a node goes below this threshold, then a
	// rebalance will be triggered on its parent node.
	RebalanceThrs int

	// when free nodes are not available to record btree mutations, then a new
	// set of btree blocks will be appended to the index file.
	//  count of appended blocks = freelist-size * AppendRatio
	AppendRatio float32

	// MVCC snapshots are flushed in batches. DrainRate defines the maximum
	// number of snapshots to accumulate in-memory, after which they are
	// flushed to disk.
	DrainRate int

	// all intermediate nodes are cached in memory, there are no upper limit
	// to that. But number of leaf nodes can be really large and
	// `MaxLeafCache` limits the number of leaf nodes to be cached.
	MaxLeafCache int

	// MVCC throttle rate in milliseconds
	MVCCThrottleRate time.Duration

	// enables O_SYNC flag for indexfile and kvfile.
	Sync bool

	// enables O_DIRECT flag for indexfile and kvfile.
	Nocache bool

	// Debug
	Debug bool
}

BTree configuration parameters, these parameters cannot change once the index-file and kv-file are created, for intance, when indexing server restarts on existing index files.

type DCache

type DCache struct {
	// contains filtered or unexported fields
}

func NewDCache

func NewDCache(blocksize, hashsize int64, hashmask int64) *DCache

type DCacheItem

type DCacheItem struct {
	// contains filtered or unexported fields
}

Singley linked list.

type DEFER

type DEFER struct {
	// contains filtered or unexported fields
}

type Emitter

type Emitter func([]byte) // Internal type

type FreeList

type FreeList struct {
	// contains filtered or unexported fields
}

Structure to manage the free list

type Head struct {
	// contains filtered or unexported fields
}

Structure to manage the head sector

type IO

type IO struct {
	// contains filtered or unexported fields
}

type IndexConfig

type IndexConfig struct {
	Sectorsize int64 // head sector-size in bytes.
	Flistsize  int64 // free-list size in bytes.
	Blocksize  int64 // btree block size in bytes.
}

Sub-structure to `Config` structure.

type Indexer

type Indexer interface {
	// Insert {key,value} pairs into the index. key type is expected to
	// implement `Key` interface and value type is expected to implement
	// `Value` interface. If the key is successfuly inserted it returns true.
	Insert(Key, Value) bool

	// Count number of key,value pairs in this index.
	Count() int64

	// Return key-bytes, docid-bytes, and value bytes of the first
	// element in the list.
	Front() ([]byte, []byte, []byte)

	// Check whether `key` is present in the index.
	Contains(Key) bool

	// Check whether `key` and `docid` is present in the index.
	Equals(Key) bool

	// Return a channel on which the caller can receive key bytes, docid-
	// bytes and value-bytes for each entry in the index.
	//      ch := bt.FullSet()
	//      keybytes := <-ch
	//      valbytes := <-ch
	//      docidbytes := <-ch
	FullSet() <-chan []byte

	// Return a channel on which the caller can receive key-bytes.
	KeySet() <-chan []byte

	// Return a channel on which the caller can receive docid-bytes
	DocidSet() <-chan []byte

	// Return a channel on which the caller can receive value-bytes
	ValueSet() <-chan []byte

	// Return a channel that will transmit all values associated with `key`,
	// make sure the `docid` is set to minimum value to lookup all values
	// greater that `key` && `docid`
	Lookup(Key) (chan []byte, error)

	// Remove an entry identified by {key,docid}
	Remove(Key) bool

	//-- Meant for debugging.
	Drain()      // flush the MVCC snapshots into disk.
	Check()      // check the btree data structure for anamolies.
	Show()       // displays in-memory btree structure on stdout.
	ShowKeys()   // list keys and docids inside the tree.
	Stats(bool)  // display statistics so far.
	LevelCount() // count number of inodes, knodes and number of entries.
}

interface made available to btree user.

type Key

type Key interface {
	// transform actual key content into byte slice, that can be persisted in
	// file.
	Bytes() []byte

	// every key carries the document-id that emitted this {key,value} tupele,
	// transform the document-id into byte slice, that can be persisted in file.
	Docid() []byte

	// this is the call-back hook that `Key` types can use to sort themself.
	// kfpos : file-position inside kv-file that contains key-content.
	// dfpos : file-position inside kv-file that contains docid-content.
	// isD : boolean that says whether comparision needs to be done on
	//       document-id as well
	//
	// Example:
	//
	//      otherkey = s.fetchKey(kfpos)
	//      if cmp = bytes.Compare(thiskey, otherkey); cmp == 0 && isD {
	//          otherdocid = s.fetchKey(dfpos)
	//          cmp = bytes.Compare(thisdocid, otherdocid)
	//          if cmp == 0 {
	//              return cmp, kfpos, dfpos
	//          } else {
	//              return cmp, kfpos, -1
	//          }
	//      } else if cmp == 0 {
	//          return cmp, kfpos, -1
	//      } else {
	//          return cmp, -1, -1
	//      }
	//
	// Returns:
	//    - cmp, result of comparision, either -1, 0, 1.
	//    - kfpos, if > -1, it means the keys are equal and specifies the
	//      offset in kv-file that contains the key.
	//    - dfpos, if > -1, it means the docids are equal and specifies the
	//      offset in kv-file that contains the docid.
	CompareLess(s *Store, kfpos int64, dfpos int64, isD bool) (int, int64, int64)

	// check whether both key and document-id compares equal.
	Equal([]byte, []byte) (bool, bool)
}

interfaces to be supported by key,value types.

type MV

type MV struct {
	// contains filtered or unexported fields
}

type MVCC

type MVCC struct {
	// contains filtered or unexported fields
}

type Node

type Node interface {
	// contains filtered or unexported methods
}

Node interface that is implemented by both `lnode` and `inode` structure.

type ReclaimData

type ReclaimData struct {
	// contains filtered or unexported fields
}

type RecycleData

type RecycleData ReclaimData

type Store

type Store struct {
	//Config
	*WStore // Reference to write-store.
	// contains filtered or unexported fields
}

func NewStore

func NewStore(conf Config) *Store

Construct a new `Store` object.

func (*Store) Close

func (store *Store) Close()

Close will release all resources maintained by store.

func (*Store) Destroy

func (store *Store) Destroy()

Destroy is opposite of Create, it cleans up the datafiles. Data files will be deleted only when all references to WStore is removed.

func (*Store) FetchMVCache

func (store *Store) FetchMVCache(fpos int64) Node

Fetch a node, identified by its file-position, from commitQ or from memory cache. If it is not available from memory fetch from disk. NOTE: multi-version fetches are only used from index mutations and they eventually end up in commitQ under a new file-position. They are not cached into memory until the snapshot is flushed into the disk.

func (*Store) FetchNCache

func (store *Store) FetchNCache(fpos int64) Node

Fetch a node, identified by its file-position, from cache. If it is not available from cache, fetch from disk and cache them in memory. To learn how nodes are cached, refer to cache.go

func (*Store) FetchNode

func (store *Store) FetchNode(fpos int64) Node

FetchNode Fetch the prestine block from disk and make a lnode or inode out of it.

func (*Store) OpEnd

func (store *Store) OpEnd(transaction bool, mv *MV, ts int64)

Opposite of OpStart() API.

func (*Store) OpStart

func (store *Store) OpStart(transaction bool) (Node, *MV, int64)

Fetch the root btree block from index-file. `transaction` must be true for write access. It is assumed that there will be only one outstanding transaction at any given time, so the caller has to make sure to acquire a transaction lock from MVCC controller.

func (*Store) OpStartDirty

func (store *Store) OpStartDirty(transaction bool) (Node, *MV, int64)

type TestKey

type TestKey struct {
	K  string
	Id int64
}

func (*TestKey) Bytes

func (tk *TestKey) Bytes() []byte

func (*TestKey) CompareLess

func (tk *TestKey) CompareLess(s *Store, kfp, dfp int64, isD bool) (int, int64, int64)

func (*TestKey) Docid

func (tk *TestKey) Docid() []byte

func (*TestKey) Equal

func (tk *TestKey) Equal(otherk []byte, otherd []byte) (bool, bool)

type TestValue

type TestValue struct {
	V string
}

func (*TestValue) Bytes

func (tv *TestValue) Bytes() []byte

type Value

type Value interface {
	// transform actual value content into byte slice, that can be persisted in
	// file.
	Bytes() []byte
}

type WStore

type WStore struct {
	Config

	MVCC  // MVCC concurrency control go-routine
	IO    // IO flusher
	DEFER // kv-cache

	WStoreStats
	// contains filtered or unexported fields
}

structure that handles write.

func OpenWStore

func OpenWStore(conf Config) *WStore

Main API to get or instantiate a write-store. If write-store for this index file is already created, it will bre returned after incrementing the reference count.

func (*WStore) CloseWStore

func (wstore *WStore) CloseWStore() bool

Close write-Store

func (*WStore) DestroyWStore

func (wstore *WStore) DestroyWStore()

Destroy is opposite of Create, it cleans up the datafiles.

type WStoreStats

type WStoreStats struct {
	MVloadCounts int64
	// contains filtered or unexported fields
}

Statistical counts

Directories

Path Synopsis
tests
tools

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL