lsmdb

package module
v1.2.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Mar 15, 2022 License: Apache-2.0 Imports: 30 Imported by: 1

README

lsmdb

基于Wisckey论文的LSM数据存储引擎

heap包

heap的实现使用到了小根堆,下面先对堆做个简单说明

  1. 堆的概念
  • 堆是一种经过排序的完全二叉树,其中任一非终端节点的数据值均不大于(或不小于)其左孩子和右孩子节点的值
  • 最大堆和最小堆是二叉堆的两种形式
  • 最大堆:根结点的键值是所有堆结点键值中最大者
  • 最小堆:根结点的键值是所有堆结点键值中最小者
  1. heap
  • heap包提供了对任意类型(实现了heap.Interface接口)的堆操作。(最小)堆是具有“每个节点都是以其为根的子树中最小值”属性的树
  • 树的最小元素为其根元素,索引0的位置
  • heap是常用的实现优先队列的方法。要创建一个优先队列,实现一个具有使用(负的)优先级作为比较的依据的Less方法的Heap接口,如此一来可用Push添加项目而用Pop取出队列最高优先级的项目

a(-1) b(-2) c(-3) c的权重最大 c的权重值最小

成员函数

Init

func Init(h Interface) 一个堆在使用任何堆操作之前应先初始化。接受参数为实现了heap.Interface接口的对象。

Fix

func Fix(h Interface, i int) 在修改第i个元素后,调用本函数修复堆,比删除第i个元素后插入新元素更有效率。

Push&Pop

Push和Pop是一对标准堆操作,Push向堆添加一个新元素,Pop弹出并返回堆顶元素,而在push和pop操作不会破坏堆的结构

Remove

删除堆中的第i个元素,并保持堆的约束性

工业级实现badger

  • DB 初始化
  • 读写事务
  • 只读事务
  • 范围查询
  • GC
  • LSM日志合并
DB 初始化
  1. 构造配置参数对象
  2. 打开或创建工作目录并上锁
  3. 打开或创建ManifestFile文件
  4. 创建DB对象
    1. 创建内存表列表,用于预写日志
    2. 创建通知刷新任务的channel
    3. 创建写入任务的channel
    4. 配置对象
    5. manifest文件对象
    6. 目录锁
    7. value目录锁
    8. 创建oracle对象,用于并发事务的控制
  5. 创建一个块缓存
  6. 创建一个索引缓存
  7. 开启key注册器
  8. 打开memtable 并全部追加到 imm 的列表中
    1. 创建一个内存跳表对象
    2. 封装.mem文件为一个logFile对象取名为wal
    3. 打开 .mem结尾的文件并关联为mmap文件
  9. 创建一个激活的内存表
  10. 创建一个level管理器
    1. 创建一个tables的二维数组
    2. 初始化每一个.sst结尾的文件关联为一个mmap文件
    3. 根据manfist里的记录,初始化每一层的table对象
    4. 如果是L0层则按fid排序,否则按每个表中最小key进行排序
  11. vlog初始化
  12. 打开一个DISCARD的文件并关联为mmap文件
  13. 启动日志合并过程
  14. 获取DB的事务最大版本号
    1. 打开vlog文件, 关联mmap
    2. 按fid排序处理vlog文件
    3. 读取目录填充vlog file 的map
    4. 如果没有vlog 文件则创建一个新的
    5. 如果vlog文件是空的则删除
    6. 获取vlog文件的实际大小并截断文件
    7. 创建一个新的活跃的vlog文件
  15. 开启负责处理写请求的工作协程
  16. 启动vlog文件的GC过程
读写事务
只读事务
范围查询
GC
LSM日志合并

Documentation

Index

Constants

View Source
const (
	// ManifestFilename is the filename for the manifest file.
	ManifestFilename = "MANIFEST"
)

Variables

View Source
var (
	// ErrInvalidDir is returned when Badger cannot find the directory
	// from where it is supposed to load the key-value store.
	ErrInvalidDir = errors.New("Invalid Dir, directory does not exist")

	// ErrValueLogSize is returned when opt.ValueLogFileSize option is not within the valid
	// range.
	ErrValueLogSize = errors.New("Invalid ValueLogFileSize, must be between 1MB and 2GB")

	// ErrKeyNotFound is returned when key isn't found on a txn.Get.
	ErrKeyNotFound = errors.New("Key not found")

	// ErrTxnTooBig is returned if too many writes are fit into a single transaction.
	ErrTxnTooBig = errors.New("Txn is too big to fit into one request")

	// ErrConflict is returned when a transaction conflicts with another transaction. This can happen if
	// the read rows had been updated concurrently by another transaction.
	ErrConflict = errors.New("Transaction Conflict. Please retry")

	// ErrReadOnlyTxn is returned if an update function is called on a read-only transaction.
	ErrReadOnlyTxn = errors.New("No sets or deletes are allowed in a read-only transaction")

	// ErrDiscardedTxn is returned if a previously discarded transaction is re-used.
	ErrDiscardedTxn = errors.New("This transaction has been discarded. Create a new one")

	// ErrEmptyKey is returned if an empty key is passed on an update function.
	ErrEmptyKey = errors.New("Key cannot be empty")

	// ErrRetry is returned when a log file containing the value is not found.
	// This usually indicates that it may have been garbage collected, and the
	// operation needs to be retried.
	ErrRetry = errors.New("Unable to find log file. Please retry")

	// ErrThresholdZero is returned if threshold is set to zero, and value log GC is called.
	// In such a case, GC can't be run.
	ErrThresholdZero = errors.New(
		"Value log GC can't run because threshold is set to zero")
	// ErrNoRewrite is returned if a call for value log GC doesn't result in a log file rewrite.
	ErrNoRewrite = errors.New(
		"Value log GC attempt didn't result in any cleanup")

	// ErrRejected is returned if a value log GC is called either while another GC is running, or
	// after DB::Close has been called.
	ErrRejected = errors.New("Value log GC request rejected") // reject: 拒绝

	// ErrInvalidRequest is returned if the user request is invalid.
	ErrInvalidRequest = errors.New("Invalid request")

	// ErrManagedTxn is returned if the user tries to use an API which isn't
	// allowed due to external management of transactions, when using ManagedDB.
	ErrManagedTxn = errors.New(
		"Invalid API request. Not allowed to perform this action using ManagedDB")

	// ErrInvalidDump if a data dump made previously cannot be loaded into the database.
	ErrInvalidDump = errors.New("Data dump cannot be read")
)
View Source
var DefaultIteratorOptions = IteratorOptions{
	PrefetchValues: true,
	PrefetchSize:   100,
	Reverse:        false,
	AllVersions:    false,
}

DefaultIteratorOptions contains default options when iterating over lsmdb key-value stores.

Functions

This section is empty.

Types

type DB

type DB struct {
	sync.RWMutex
	// contains filtered or unexported fields
}

func Open

func Open(opt Options) (db *DB, err error)

func (*DB) Backup

func (db *DB) Backup(w io.Writer, since uint64) (uint64, error)

func (*DB) Close

func (db *DB) Close() (err error)

func (*DB) Load

func (db *DB) Load(r io.Reader) error

func (*DB) NewTransaction

func (db *DB) NewTransaction(update bool) *Txn

func (*DB) PurgeOlderVersions

func (db *DB) PurgeOlderVersions() error

PurgeOlderVersions deletes older versions of all keys.

func (*DB) PurgeVersionsBelow

func (db *DB) PurgeVersionsBelow(key []byte, ts uint64) error

PurgeVersionsBelow will delete all versions of a key below the specified version purge 清理 净化

func (*DB) RunValueLogGC

func (db *DB) RunValueLogGC(discardRatio float64) error

func (*DB) Update

func (db *DB) Update(fn func(txn *Txn) error) error

Update executes a function, creating and managing a read-write transaction for the user. Error returned by the function is relayed by the Update method.

func (*DB) View

func (db *DB) View(fn func(txn *Txn) error) error

View executes a function creating and managing a read-only transaction for the user. Error returned by the function is relayed by the View method.

type Entry

type Entry struct {
	Key       []byte
	Value     []byte
	UserMeta  byte
	ExpiresAt uint64 // time.Unix
	// contains filtered or unexported fields
}

type Item

type Item struct {
	// contains filtered or unexported fields
}

Item is returned during iteration. Both the Key() and Value() output is only valid until iterator.Next() is called.

func (*Item) EstimatedSize

func (item *Item) EstimatedSize() int64

EstimatedSize returns approximate size of the key-value pair.

This can be called while iterating through a store to quickly estimate the size of a range of key-value pairs (without fetching the corresponding values).

func (*Item) ExpiresAt

func (item *Item) ExpiresAt() uint64

ExpiresAt returns a Unix time value indicating when the item will be considered expired. 0 indicates that the item will never expire.

func (*Item) Key

func (item *Item) Key() []byte

Key returns the key.

Key is only valid as long as item is valid, or transaction is valid. If you need to use it outside its validity, please copy it.

func (*Item) ToString

func (item *Item) ToString() string

ToString returns a string representation of Item

func (*Item) UserMeta

func (item *Item) UserMeta() byte

UserMeta returns the userMeta set by the user. Typically, this byte, optionally set by the user is used to interpret the value.

func (*Item) Value

func (item *Item) Value() ([]byte, error)

Value retrieves the value of the item from the value log.

The returned value is only valid as long as item is valid, or transaction is valid. So, if you need to use it outside, please parse or copy it.

func (*Item) ValueCopy

func (item *Item) ValueCopy(dst []byte) ([]byte, error)

This function is useful in long running iterate/update transactions to avoid a write deadlock. See Github issue: https://github.com/dgraph-io/badger/issues/315

func (*Item) Version

func (item *Item) Version() uint64

Version returns the commit timestamp of the item.

type Iterator

type Iterator struct {
	// contains filtered or unexported fields
}

Iterator helps iterating over the KV pairs in a lexicographically(按字典顺序) sorted order.

func (*Iterator) Close

func (it *Iterator) Close()

Close would close the iterator. NOTE: It is important to call this when you're done with iteration.

func (*Iterator) Item

func (it *Iterator) Item() *Item

Item returns pointer to the current key-value pair. This item is only valid until it.Next() gets called.

func (*Iterator) Next

func (it *Iterator) Next()

Next would advance the iterator by one. Always check it.Valid() after a Next() to ensure you have access to a valid it.Item().

func (*Iterator) Rewind

func (it *Iterator) Rewind()

func (*Iterator) Seek

func (it *Iterator) Seek(key []byte)

func (*Iterator) Valid

func (it *Iterator) Valid() bool

Valid returns false when iteration is done.

func (*Iterator) ValidForPrefix

func (it *Iterator) ValidForPrefix(prefix []byte) bool

ValidForPrefix returns false when iteration is done or when the current key is not prefixed by the specified prefix.

type IteratorOptions

type IteratorOptions struct {
	// Indicates whether we should prefetch values during iteration and store them.
	PrefetchValues bool
	// How many KV pairs to prefetch while iterating. Valid only if PrefetchValues is true.
	PrefetchSize int
	Reverse      bool // Direction of iteration. False is forward, true is backward.
	AllVersions  bool // Fetch all valid versions of the same key.
}

IteratorOptions is used to set options when iterating over lsmdb key-value stores.

This package provides DefaultIteratorOptions which contains options that should work for most applications. Consider using that as a starting point before customizing it for your own needs.

type ManagedDB

type ManagedDB struct {
	*DB
}

ManagedDB allows end users to manage the transactions themselves. Transaction start and commit timestamps are set by end-user.

This is only useful for databases built on top of Badger (like Dgraph), and can be ignored by most users.

WARNING: This is an experimental feature and may be changed significantly in a future release. So please proceed with caution.

func OpenManaged

func OpenManaged(opts Options) (*ManagedDB, error)

OpenManaged returns a new ManagedDB, which allows more control over setting transaction timestamps.

This is only useful for databases built on top of Badger (like Dgraph), and can be ignored by most users.

func (*ManagedDB) NewTransaction

func (db *ManagedDB) NewTransaction(update bool)

NewTransaction overrides DB.NewTransaction() and panics when invoked. Use NewTransactionAt() instead.

func (*ManagedDB) NewTransactionAt

func (db *ManagedDB) NewTransactionAt(readTs uint64, update bool) *Txn

NewTransactionAt follows the same logic as DB.NewTransaction(), but uses the provided read timestamp.

This is only useful for databases built on top of Badger (like Dgraph), and can be ignored by most users.

func (*ManagedDB) PurgeVersionsBelow

func (db *ManagedDB) PurgeVersionsBelow(key []byte, ts uint64) error

PurgeVersionsBelow will delete all versions of a key below the specified version

type Manifest

type Manifest struct {
	Levels []levelManifest          // L1~Lx
	Tables map[uint64]tableManifest // L0

	// Contains(包含) total number of creation and deletion changes in the manifest -- used to compute
	Creations int // 创造物
	Deletions int // 删除部分
}

载货单,货单;旅客名单 Manifest represnts the contents of the MANIFEST file in a Badger store. The MANIFEST file describes the startup state of the db -- all LSM files and what level they're at. It consists of(包括) a sequence of(一系列的) ManifestChangeSet objects. Each of these is treated atomically, and contains a sequence of ManifestChange's (file creations/deletions) which we use to

func ReplayManifestFile

func ReplayManifestFile(fp *os.File) (ret Manifest, truncOffset int64, err error)

type Options

type Options struct {
	// 1. Mandatory flags
	// -------------------
	// Directory to store the data in. Should exist and be writable.
	Dir string
	// Directory to store the value log in. Can be the same as Dir. Should
	// exist and be writable.
	ValueDir string

	// 2. Frequently modified flags
	// -----------------------------
	// Sync all writes to disk. Setting this to true would slow down data
	// loading significantly.
	SyncWrites bool

	// How should LSM tree be accessed.
	TableLoadingMode options.FileLoadingMode

	// 3. Flags that user might want to review
	// ----------------------------------------
	// The following affect all levels of LSM tree.
	MaxTableSize        int64 // Each table (or file) is at most this size.
	LevelSizeMultiplier int   // Equals SizeOf(Li+1)/SizeOf(Li).
	MaxLevels           int   // Maximum number of levels of compaction.
	// If value size >= this threshold, only store value offsets in tree.
	ValueThreshold int
	// Maximum number of tables to keep in memory, before stalling.
	NumMemtables int
	// The following affect how we handle LSM tree L0.
	// Maximum number of Level 0 tables before we start compacting.
	NumLevelZeroTables int

	// If we hit this number of Level 0 tables, we will stall until L0 is
	// compacted away.
	NumLevelZeroTablesStall int

	// Maximum total size for L1.
	LevelOneSize int64

	// Size of single value log file.
	ValueLogFileSize int64

	// Number of compaction workers to run concurrently.
	NumCompactors int

	// 4. Flags for testing purposes
	// ------------------------------
	DoNotCompact bool // Stops LSM tree from compactions.
	// contains filtered or unexported fields
}

Options are params for creating DB object.

This package provides DefaultOptions which contains options that should work for most applications. Consider using that as a starting point before customizing it for your own needs.

func DefaultOptions

func DefaultOptions(dir string) Options

type Txn

type Txn struct {
	// contains filtered or unexported fields
}

Txn represents a lsmdb transaction.

func (*Txn) Commit

func (txn *Txn) Commit(callback func(error)) error

Commit commits the transaction, following these steps:

1. If there are no writes, return immediately.

2. Check if read rows were updated since txn started. If so, return ErrConflict.

3. If no conflict, generate a commit timestamp and update written rows' commit ts.

4. Batch up all writes, write them to value log and LSM tree.

5. If callback is provided, Badger will return immediately after checking for conflicts. Writes to the database will happen in the background. If there is a conflict, an error will be returned and the callback will not run. If there are no conflicts, the callback will be called in the background upon successful completion of writes or any error during write.

If error is nil, the transaction is successfully committed. In case of a non-nil error, the LSM tree won't be updated, so there's no need for any rollback.

func (*Txn) CommitAt

func (txn *Txn) CommitAt(commitTs uint64, callback func(error)) error

CommitAt commits the transaction, following the same logic as Commit(), but at the given commit timestamp. This will panic if not used with ManagedDB.

This is only useful for databases built on top of Badger (like Dgraph), and can be ignored by most users.

func (*Txn) Delete

func (txn *Txn) Delete(key []byte) error

Delete deletes a key. This is done by adding a delete marker for the key at commit timestamp. Any reads happening before this timestamp would be unaffected. Any reads after this commit would see the deletion.

func (*Txn) Discard

func (txn *Txn) Discard()

func (*Txn) Get

func (txn *Txn) Get(key []byte) (item *Item, rerr error)

Get looks for key and returns corresponding Item. If key is not found, ErrKeyNotFound is returned.

func (*Txn) NewIterator

func (txn *Txn) NewIterator(opt IteratorOptions) *Iterator

NewIterator returns a new iterator. Depending upon(依赖于) the options, either only keys, or both key-value pairs would be fetched. The keys are returned in lexicographically(按照字典序) sorted order. Using prefetch is highly recommended if you're doing a long running iteration. Avoid long running iterations in update transactions.

func (*Txn) Set

func (txn *Txn) Set(key, val []byte) error

Set adds a key-value pair to the database.

It will return ErrReadOnlyTxn if update flag was set to false when creating the transaction.

func (*Txn) SetWithMeta

func (txn *Txn) SetWithMeta(key, val []byte, meta byte) error

SetWithMeta adds a key-value pair to the database, along with a metadata byte. This byte is stored alongside the key, and can be used as an aid to interpret the value or store other contextual bits corresponding to the key-value pair.

func (*Txn) SetWithTTL

func (txn *Txn) SetWithTTL(key, val []byte, dur time.Duration) error

SetWithTTL adds a key-value pair to the database, along with a time-to-live (TTL) setting. A key stored with with a TTL would automatically expire after the time has elapsed , and be eligible for garbage collection.

func (*Txn) Update added in v1.2.0

func (txn *Txn) Update(key, val []byte) error

Update delete a key-value pair then add it again with new value

Directories

Path Synopsis
cmd
y

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL