pebble: github.com/cockroachdb/pebble

package pebble

import "github.com/cockroachdb/pebble"

Package pebble provides an ordered key/value store.

Package Files

batch.go cache.go checkpoint.go cleaner.go commit.go compaction.go compaction_iter.go compaction_picker.go comparer.go db.go error_iter.go event.go filenames.go get_iter.go ingest.go internal.go iterator.go level_iter.go log_recycler.go logger.go mem_table.go merger.go merging_iter.go merging_iter_heap.go metrics.go open.go options.go pacer.go race_off.go read_state.go snapshot.go syncing_fs.go table_cache.go version_set.go

Constants

const (
    InternalKeyKindDelete          = base.InternalKeyKindDelete
    InternalKeyKindSet             = base.InternalKeyKindSet
    InternalKeyKindMerge           = base.InternalKeyKindMerge
    InternalKeyKindLogData         = base.InternalKeyKindLogData
    InternalKeyKindSingleDelete    = base.InternalKeyKindSingleDelete
    InternalKeyKindRangeDelete     = base.InternalKeyKindRangeDelete
    InternalKeyKindMax             = base.InternalKeyKindMax
    InternalKeyKindInvalid         = base.InternalKeyKindInvalid
    InternalKeySeqNumBatch         = base.InternalKeySeqNumBatch
    InternalKeySeqNumMax           = base.InternalKeySeqNumMax
    InternalKeyRangeDeleteSentinel = base.InternalKeyRangeDeleteSentinel
)

These constants are part of the file format, and should not be changed.

const (
    DefaultCompression = sstable.DefaultCompression
    NoCompression      = sstable.NoCompression
    SnappyCompression  = sstable.SnappyCompression
)

Exported Compression constants.

const (
    TableFormatRocksDBv2 = sstable.TableFormatRocksDBv2
    TableFormatLevelDB   = sstable.TableFormatLevelDB
)

Exported TableFormat constants.

const (
    TableFilter = base.TableFilter
)

Exported TableFilter constants.

Variables

var (
    // ErrNotFound is returned when a get operation does not find the requested
    // key.
    ErrNotFound = base.ErrNotFound
    // ErrClosed is returned when an operation is performed on a closed snapshot
    // or DB.
    ErrClosed = errors.New("pebble: closed")
    // ErrReadOnly is returned when a write operation is performed on a read-only
    // database.
    ErrReadOnly = errors.New("pebble: read-only")
)
var DefaultComparer = base.DefaultComparer

DefaultComparer exports the base.DefaultComparer variable.

var DefaultLogger defaultLogger

DefaultLogger logs to the Go standard library log package.

var DefaultMerger = base.DefaultMerger

DefaultMerger exports the base.DefaultMerger variable.

var ErrInvalidBatch = errors.New("pebble: invalid batch")

ErrInvalidBatch indicates that a batch is invalid or otherwise corrupted.

var ErrNotIndexed = errors.New("pebble: batch not indexed")

ErrNotIndexed means that a read operation on a batch failed because the batch is not indexed and thus doesn't support reads.

var NoSync = &WriteOptions{Sync: false}

NoSync specifies the default write options for writes which do not synchronize to disk.

var Sync = &WriteOptions{Sync: true}

Sync specifies the default write options for writes which synchronize to disk.
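
For example, if d is a DB, the sketch below writes one key durably and one key without waiting for the disk:

// Guaranteed to be durable on disk when Set returns.
if err := d.Set([]byte("alpha"), []byte("1"), pebble.Sync); err != nil {
	log.Fatal(err)
}
// Faster, but may be lost if the machine crashes before the WAL is synced.
if err := d.Set([]byte("beta"), []byte("2"), pebble.NoSync); err != nil {
	log.Fatal(err)
}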

func NewCache

func NewCache(size int64) *cache.Cache

NewCache creates a new cache of the specified size. Memory for the cache is allocated on demand, not during initialization.

type AbbreviatedKey

type AbbreviatedKey = base.AbbreviatedKey

AbbreviatedKey exports the base.AbbreviatedKey type.

type ArchiveCleaner

type ArchiveCleaner = base.ArchiveCleaner

ArchiveCleaner exports the base.ArchiveCleaner type.

type Batch

type Batch struct {
    // contains filtered or unexported fields
}

A Batch is a sequence of Sets, Merges, Deletes, and/or DeleteRanges that are applied atomically. Batch implements the Reader interface, but only an indexed batch supports reading (without error) via Get or NewIter. A non-indexed batch will return ErrNotIndexed when read from.

Indexing

Batches can be optionally indexed (see DB.NewIndexedBatch). An indexed batch allows iteration via an Iterator (see Batch.NewIter). The iterator provides a merged view of the operations in the batch and the underlying database. This is implemented by treating the batch as an additional layer in the LSM where every entry in the batch is considered newer than any entry in the underlying database (batch entries have the InternalKeySeqNumBatch bit set). By treating the batch as an additional layer in the LSM, iteration supports all batch operations (i.e. Set, Merge, Delete, and DeleteRange) with minimal effort.

The same key can be operated on multiple times in a batch, though only the latest operation will be visible. For example, Set("a", "b") followed by Delete("a") will cause the key "a" to not be visible in the batch. Set("a", "b") followed by Set("a", "c") will cause a read of "a" to return the value "c".
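
For example, as a sketch (assuming d is an open *DB; the *WriteOptions argument is ignored by batch mutations):

b := d.NewIndexedBatch()
_ = b.Set([]byte("a"), []byte("b"), nil)
_ = b.Set([]byte("a"), []byte("c"), nil)
v, err := b.Get([]byte("a"))
// err == nil and v == []byte("c"): the latest operation on "a" wins.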

The batch index is implemented via a skiplist (internal/batchskl). While the skiplist implementation is very fast, inserting into an indexed batch is significantly slower than inserting into a non-indexed batch. Only use an indexed batch if you require reading from it.

Atomic commit

The operations in a batch are persisted by calling Batch.Commit which is equivalent to calling DB.Apply(batch). A batch is committed atomically by writing the internal batch representation to the WAL, adding all of the batch operations to the memtable associated with the WAL, and then incrementing the visible sequence number so that subsequent reads can see the effects of the batch operations. If WriteOptions.Sync is true, a call to Batch.Commit will guarantee that the batch is persisted to disk before returning. See commitPipeline for more on the implementation details.
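
A minimal commit sequence, as a sketch (assuming d is an open *DB):

b := d.NewBatch()
_ = b.Set([]byte("k1"), []byte("v1"), nil)
_ = b.DeleteRange([]byte("x"), []byte("y"), nil)
// Commit is equivalent to d.Apply(b, pebble.Sync).
if err := b.Commit(pebble.Sync); err != nil {
	log.Fatal(err)
}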

Large batches

The size of a batch is limited only by available memory (be aware that indexed batches require considerable additional memory for the skiplist structure). A given WAL file has a single memtable associated with it (this restriction could be removed, but doing so is onerous and complex), and a memtable has a fixed size due to the underlying fixed-size arena. Note that this differs from RocksDB where a memtable can grow arbitrarily large using a list of arena chunks. In RocksDB this is accomplished by storing pointers in the arena memory, but that isn't possible in Go.

During Batch.Commit, a batch which is larger than a threshold (> MemTableSize/2) is wrapped in a flushableBatch and inserted into the queue of memtables. A flushableBatch forces the WAL to be rotated, but that happens anyway when the memtable becomes full, so this does not cause significant WAL churn. Because the flushableBatch is readable as another layer in the LSM, Batch.Commit returns as soon as the flushableBatch has been added to the queue of memtables.

Internally, a flushableBatch provides Iterator support by sorting the batch contents (the batch is sorted once, when it is added to the memtable queue). Sorting the batch contents and insertion of the contents into a memtable have the same big-O time, but the constant factor dominates here. Sorting is significantly faster and uses significantly less memory.

Internal representation

The internal batch representation is a contiguous byte buffer with a fixed 12-byte header, followed by a series of records.

+-------------+------------+--- ... ---+
| SeqNum (8B) | Count (4B) |  Entries  |
+-------------+------------+--- ... ---+

Each record has a 1-byte kind tag prefix, followed by 1 or 2 length prefixed strings (varstring):

+-----------+-----------------+-------------------+
| Kind (1B) | Key (varstring) | Value (varstring) |
+-----------+-----------------+-------------------+

A varstring is a varint32 followed by N bytes of data. The Kind tags are exactly those specified by InternalKeyKind. The following table shows the format for records of each kind:

InternalKeyKindDelete       varstring
InternalKeyKindLogData      varstring
InternalKeyKindSet          varstring varstring
InternalKeyKindMerge        varstring varstring
InternalKeyKindRangeDelete  varstring varstring

The intuition here is that the arguments to Delete(), Set(), Merge(), and DeleteRange() are encoded into the batch.

The internal batch representation is the on-disk format for a batch in the WAL, and is thus stable. New record kinds may be added, but the existing ones will not be modified.
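
The encoded entries can be walked with a BatchReader (see Batch.Reader and MakeBatchReader). A minimal sketch, assuming b is a *Batch:

r := b.Reader()
for {
	kind, ukey, value, ok := r.Next()
	if !ok {
		// ok is false at the end of the batch, or if the batch is
		// corrupt (in which case len(r) != 0).
		break
	}
	fmt.Printf("kind=%d key=%q value=%q\n", kind, ukey, value)
}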

func (*Batch) Apply

func (b *Batch) Apply(batch *Batch, _ *WriteOptions) error

Apply the operations contained in the batch to the receiver batch.

It is safe to modify the contents of the arguments after Apply returns.

func (*Batch) Close

func (b *Batch) Close() error

Close closes the batch without committing it.

func (*Batch) Commit

func (b *Batch) Commit(o *WriteOptions) error

Commit applies the batch to its parent writer.

func (*Batch) Count

func (b *Batch) Count() uint32

Count returns the count of memtable-modifying operations in this batch. All operations except LogData increment this count.

func (*Batch) Delete

func (b *Batch) Delete(key []byte, _ *WriteOptions) error

Delete adds an action to the batch that deletes the entry for key.

It is safe to modify the contents of the arguments after Delete returns.

func (*Batch) DeleteDeferred

func (b *Batch) DeleteDeferred(keyLen int) *DeferredBatchOp

DeleteDeferred is similar to Delete in that it adds a delete operation to the batch, except it only takes in a key length instead of a complete slice, letting the caller encode the key directly into the batch representation and then call Finish() on the returned object.

func (*Batch) DeleteRange

func (b *Batch) DeleteRange(start, end []byte, _ *WriteOptions) error

DeleteRange deletes all of the keys (and values) in the range [start,end) (inclusive on start, exclusive on end).

It is safe to modify the contents of the arguments after DeleteRange returns.

func (*Batch) DeleteRangeDeferred

func (b *Batch) DeleteRangeDeferred(startLen, endLen int) *DeferredBatchOp

DeleteRangeDeferred is similar to DeleteRange in that it adds a delete range operation to the batch, except it only takes in key lengths instead of complete slices, letting the caller encode into those objects and then call Finish() on the returned object. Note that DeferredBatchOp.Key should be populated with the start key, and DeferredBatchOp.Value should be populated with the end key.

func (*Batch) Empty

func (b *Batch) Empty() bool

Empty returns true if the batch is empty, and false otherwise.

func (*Batch) Get

func (b *Batch) Get(key []byte) (value []byte, err error)

Get gets the value for the given key. It returns ErrNotFound if the DB does not contain the key.

The caller should not modify the contents of the returned slice, but it is safe to modify the contents of the argument after Get returns.

func (*Batch) Indexed

func (b *Batch) Indexed() bool

Indexed returns true if the batch is indexed (i.e. supports read operations).

func (*Batch) LogData

func (b *Batch) LogData(data []byte, _ *WriteOptions) error

LogData adds the specified data to the batch. The data will be written to the WAL, but not added to memtables or sstables. Log data is never indexed, which makes it useful for testing WAL performance.

It is safe to modify the contents of the argument after LogData returns.

func (*Batch) Merge

func (b *Batch) Merge(key, value []byte, _ *WriteOptions) error

Merge adds an action to the batch that merges the value at key with the new value. The details of the merge are dependent upon the configured merge operator.

It is safe to modify the contents of the arguments after Merge returns.

func (*Batch) MergeDeferred

func (b *Batch) MergeDeferred(keyLen, valueLen int) *DeferredBatchOp

MergeDeferred is similar to Merge in that it adds a merge operation to the batch, except it only takes in key/value lengths instead of complete slices, letting the caller encode into those objects and then call Finish() on the returned object.

func (*Batch) NewIter

func (b *Batch) NewIter(o *IterOptions) *Iterator

NewIter returns an iterator that is unpositioned (Iterator.Valid() will return false). The iterator can be positioned via a call to SeekGE, SeekPrefixGE, SeekLT, First or Last. Only indexed batches support iterators.

func (*Batch) Reader

func (b *Batch) Reader() BatchReader

Reader returns a BatchReader for the current batch contents. If the batch is mutated, the new entries will not be visible to the reader.

func (*Batch) Repr

func (b *Batch) Repr() []byte

Repr returns the underlying batch representation. It is not safe to modify the contents. Reset() will not change the contents of the returned value, though any other mutation operation may do so.

func (*Batch) Reset

func (b *Batch) Reset()

Reset clears the underlying byte slice and effectively empties the batch for reuse. It is intended for cases where a Batch is used only to build a representation retrieved via Repr(), rather than being committed or closed; Commit and Close take care of releasing resources when appropriate.

func (*Batch) SeqNum

func (b *Batch) SeqNum() uint64

SeqNum returns the batch sequence number which is applied to the first record in the batch. The sequence number is incremented for each subsequent record.

func (*Batch) Set

func (b *Batch) Set(key, value []byte, _ *WriteOptions) error

Set adds an action to the batch that sets the key to map to the value.

It is safe to modify the contents of the arguments after Set returns.

func (*Batch) SetDeferred

func (b *Batch) SetDeferred(keyLen, valueLen int) *DeferredBatchOp

SetDeferred is similar to Set in that it adds a set operation to the batch, except it only takes in key/value lengths instead of complete slices, letting the caller encode into those objects and then call Finish() on the returned object.

func (*Batch) SetRepr

func (b *Batch) SetRepr(data []byte) error

SetRepr sets the underlying batch representation. The batch takes ownership of the supplied slice. It is not safe to modify it afterwards until the Batch is no longer in use.

func (*Batch) SingleDelete

func (b *Batch) SingleDelete(key []byte, _ *WriteOptions) error

SingleDelete adds an action to the batch that single deletes the entry for key. See Writer.SingleDelete for more details on the semantics of SingleDelete.

It is safe to modify the contents of the arguments after SingleDelete returns.

func (*Batch) SingleDeleteDeferred

func (b *Batch) SingleDeleteDeferred(keyLen int) *DeferredBatchOp

SingleDeleteDeferred is similar to SingleDelete in that it adds a single delete operation to the batch, except it only takes in key/value lengths instead of complete slices, letting the caller encode into those objects and then call Finish() on the returned object.

type BatchReader

type BatchReader []byte

BatchReader iterates over the entries contained in a batch.

func MakeBatchReader

func MakeBatchReader(repr []byte) BatchReader

MakeBatchReader constructs a BatchReader from a batch representation. The header (containing the batch count and seqnum) is ignored.

func (*BatchReader) Next

func (r *BatchReader) Next() (kind InternalKeyKind, ukey []byte, value []byte, ok bool)

Next returns the next entry in this batch. The final return value is false if the batch is corrupt or exhausted; the end of the batch is reached when len(r)==0.

type Cache

type Cache = cache.Cache

Cache exports the cache.Cache type.

type CacheMetrics

type CacheMetrics = cache.Metrics

CacheMetrics holds metrics for the block and table cache.

type Cleaner

type Cleaner = base.Cleaner

Cleaner exports the base.Cleaner type.

type CompactionInfo

type CompactionInfo struct {
    // JobID is the ID of the compaction job.
    JobID int
    // Reason is the reason for the compaction.
    Reason string
    // Input contains the input tables for the compaction. A compaction is
    // performed from Input.Level to Input.Level+1. Input.Tables[0] contains the
    // inputs from Input.Level and Input.Tables[1] contains the inputs from
    // Input.Level+1.
    Input struct {
        Level  int
        Tables [2][]TableInfo
    }
    // Output contains the output tables generated by the compaction. The output
    // tables are empty for the compaction begin event.
    Output struct {
        Level  int
        Tables []TableInfo
    }
    Done bool
    Err  error
}

CompactionInfo contains the info for a compaction event.

func (CompactionInfo) String

func (i CompactionInfo) String() string

type Compare

type Compare = base.Compare

Compare exports the base.Compare type.

type Comparer

type Comparer = base.Comparer

Comparer exports the base.Comparer type.

type Compression

type Compression = sstable.Compression

Compression exports the sstable.Compression type.

type DB

type DB struct {
    // contains filtered or unexported fields
}

DB provides a concurrent, persistent ordered key/value store.

A DB's basic operations (Get, Set, Delete) should be self-explanatory. Get will return ErrNotFound if the requested key is not in the store; callers are free to ignore this error. Deletes are blind and succeed even if the given key does not exist (see DB.Delete).

A DB also allows for iterating over the key/value pairs in key order. If d is a DB, the code below prints all key/value pairs whose keys are 'greater than or equal to' k:

iter := d.NewIter(readOptions)
for iter.SeekGE(k); iter.Valid(); iter.Next() {
	fmt.Printf("key=%q value=%q\n", iter.Key(), iter.Value())
}
return iter.Close()

The Options struct holds the optional parameters for the DB, including a Comparer to define a 'less than' relationship over keys. It is always valid to pass a nil *Options, which means to use the default parameter values. Any zero field of a non-nil *Options also means to use the default value for that parameter. Thus, the code below uses a custom Comparer, but the default values for every other parameter:

db, err := pebble.Open(dirname, &Options{
	Comparer: myComparer,
})

func Open

func Open(dirname string, opts *Options) (*DB, error)

Open opens a DB whose files live in the given directory.

func (*DB) Apply

func (d *DB) Apply(batch *Batch, opts *WriteOptions) error

Apply the operations contained in the batch to the DB. If the batch is large, the contents of the batch may be retained by the database. If that occurs, the batch contents will be cleared, preventing the caller from attempting to reuse them.

It is safe to modify the contents of the arguments after Apply returns.

func (*DB) AsyncFlush

func (d *DB) AsyncFlush() (<-chan struct{}, error)

AsyncFlush asynchronously flushes the memtable to stable storage.

If no error is returned, the caller can receive from the returned channel in order to wait for the flush to complete.
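
For example, a sketch that starts a flush and then waits for it to complete:

ch, err := d.AsyncFlush()
if err != nil {
	log.Fatal(err)
}
<-ch // receives once the memtable has reached stable storage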

func (*DB) Checkpoint

func (d *DB) Checkpoint(destDir string) (err error)

Checkpoint constructs a snapshot of the DB instance in the specified directory. The WAL, MANIFEST, OPTIONS, and sstables will be copied into the snapshot. Hard links will be used when possible. Beware of the significant space overhead for a checkpoint if hard links are disabled. Also beware that even if hard links are used, the space overhead for the checkpoint will increase over time as the DB performs compactions.

func (*DB) Close

func (d *DB) Close() error

Close closes the DB.

It is not safe to close a DB until all outstanding iterators are closed. It is valid to call Close multiple times. Other methods should not be called after the DB has been closed.

func (*DB) Compact

func (d *DB) Compact(
    start, end []byte,
) error

Compact the specified range of keys in the database.

func (*DB) Delete

func (d *DB) Delete(key []byte, opts *WriteOptions) error

Delete deletes the value for the given key. Deletes are blind; they will succeed even if the given key does not exist.

It is safe to modify the contents of the arguments after Delete returns.

func (*DB) DeleteRange

func (d *DB) DeleteRange(start, end []byte, opts *WriteOptions) error

DeleteRange deletes all of the keys (and values) in the range [start,end) (inclusive on start, exclusive on end).

It is safe to modify the contents of the arguments after DeleteRange returns.

func (*DB) EstimateDiskUsage

func (d *DB) EstimateDiskUsage(start, end []byte) (uint64, error)

EstimateDiskUsage returns the estimated filesystem space used in bytes for storing the range `[start, end]`. The estimation is computed as follows:

- For sstables fully contained in the range, the whole file size is included.
- For sstables partially contained in the range, the overlapping data block
  sizes are included. Even if a data block partially overlaps, or we cannot
  determine overlap due to abbreviated index keys, the full data block size is
  included in the estimation. Note that unlike fully contained sstables, none
  of the meta-block space is counted for partially overlapped files.
- There may also exist WAL entries for unflushed keys in this range. This
  estimation currently excludes space used for the range in the WAL.

func (*DB) Flush

func (d *DB) Flush() error

Flush the memtable to stable storage.

func (*DB) Get

func (d *DB) Get(key []byte) ([]byte, error)

Get gets the value for the given key. It returns ErrNotFound if the DB does not contain the key.

The caller should not modify the contents of the returned slice, but it is safe to modify the contents of the argument after Get returns.

func (*DB) Ingest

func (d *DB) Ingest(paths []string) error

Ingest ingests a set of sstables into the DB. Ingestion of the files is atomic and semantically equivalent to creating a single batch containing all of the mutations in the sstables. Ingestion may require the memtable to be flushed. The ingested sstable files are moved into the DB and must reside on the same filesystem as the DB. Sstables can be created for ingestion using sstable.Writer.

Ingestion loads each sstable into the lowest level of the LSM which it doesn't overlap (see ingestTargetLevel). If an sstable overlaps a memtable, ingestion forces the memtable to flush, and then waits for the flush to occur.

The steps for ingestion are:

1. Allocate file numbers for every sstable being ingested.
2. Load the metadata for all sstables being ingested.
3. Sort the sstables by smallest key, verifying non-overlap.
4. Hard link the sstables into the DB directory.
5. Allocate a sequence number to use for all of the entries in the
   sstables. This is the step where overlap with memtables is
   determined. If there is overlap, we remember the most recent memtable
   that overlaps.
6. Update the sequence number in the ingested sstables.
7. Wait for the most recent memtable that overlaps to flush (if any).
8. Add the ingested sstables to the version (DB.ingestApply).
9. Publish the ingestion sequence number.

Note that if the mutable memtable overlaps with ingestion, a flush of the memtable is forced, equivalent to DB.Flush. Additionally, subsequent mutations that get sequence numbers larger than the ingestion sequence number get queued up behind the ingestion, waiting for it to complete. This can produce a noticeable hiccup in performance. See https://github.com/cockroachdb/pebble/issues/25 for an idea for how to fix this hiccup.
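
A minimal ingestion sketch; the paths below are hypothetical, and the files are assumed to have been created with sstable.Writer on the same filesystem as the DB:

// The files are moved (hard linked) into the DB and must not be
// modified or reused by the caller afterwards.
if err := d.Ingest([]string{"/data/ext0.sst", "/data/ext1.sst"}); err != nil {
	log.Fatal(err)
}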

func (*DB) LogData

func (d *DB) LogData(data []byte, opts *WriteOptions) error

LogData adds the specified data to the WAL, but not to memtables or sstables. Log data is never indexed, which makes it useful for testing WAL performance.

It is safe to modify the contents of the argument after LogData returns.

func (*DB) Merge

func (d *DB) Merge(key, value []byte, opts *WriteOptions) error

Merge adds an action to the DB that merges the value at key with the new value. The details of the merge are dependent upon the configured merge operator.

It is safe to modify the contents of the arguments after Merge returns.

func (*DB) Metrics

func (d *DB) Metrics() *Metrics

Metrics returns metrics about the database.

func (*DB) NewBatch

func (d *DB) NewBatch() *Batch

NewBatch returns a new empty write-only batch. Any reads on the batch will return an error. If the batch is committed it will be applied to the DB.

func (*DB) NewIndexedBatch

func (d *DB) NewIndexedBatch() *Batch

NewIndexedBatch returns a new empty read-write batch. Any reads on the batch will read from both the batch and the DB. If the batch is committed it will be applied to the DB. An indexed batch is slower than a non-indexed batch for insert operations. If you do not need to perform reads on the batch, use NewBatch instead.

func (*DB) NewIter

func (d *DB) NewIter(o *IterOptions) *Iterator

NewIter returns an iterator that is unpositioned (Iterator.Valid() will return false). The iterator can be positioned via a call to SeekGE, SeekLT, First or Last. The iterator provides a point-in-time view of the current DB state. This view is maintained by preventing file deletions and preventing memtables referenced by the iterator from being deleted. Using an iterator to maintain a long-lived point-in-time view of the DB state can lead to an apparent memory and disk usage leak. Use snapshots (see NewSnapshot) for point-in-time views which avoid these problems.

func (*DB) NewSnapshot

func (d *DB) NewSnapshot() *Snapshot

NewSnapshot returns a point-in-time view of the current DB state. Iterators created with this handle will all observe a stable snapshot of the current DB state. The caller must call Snapshot.Close() when the snapshot is no longer needed. Snapshots are not persisted across DB restarts (close -> open). Unlike the implicit snapshot maintained by an iterator, a snapshot will not prevent memtables from being released or sstables from being deleted. Instead, a snapshot prevents deletion of sequence numbers referenced by the snapshot.
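
Typical usage, as a sketch (assuming d is an open *DB):

snap := d.NewSnapshot()
defer snap.Close() // required: releases the sequence number pinned by the snapshot
v, err := snap.Get([]byte("k"))
if err != nil && err != pebble.ErrNotFound {
	log.Fatal(err)
}
_ = v // reflects the DB state as of NewSnapshot, ignoring later writes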

func (*DB) SSTables

func (d *DB) SSTables() [][]TableInfo

SSTables retrieves the current sstables. The returned slice is indexed by level and each level is indexed by the position of the sstable within the level. Note that this information may be out of date due to concurrent flushes and compactions.
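
For example, a sketch that prints the sstable count per level:

for level, tables := range d.SSTables() {
	fmt.Printf("L%d: %d sstables\n", level, len(tables))
}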

func (*DB) Set

func (d *DB) Set(key, value []byte, opts *WriteOptions) error

Set sets the value for the given key. It overwrites any previous value for that key; a DB is not a multi-map.

It is safe to modify the contents of the arguments after Set returns.

func (*DB) SingleDelete

func (d *DB) SingleDelete(key []byte, opts *WriteOptions) error

SingleDelete adds an action to the DB that single deletes the entry for key. See Writer.SingleDelete for more details on the semantics of SingleDelete.

It is safe to modify the contents of the arguments after SingleDelete returns.

type DeferredBatchOp

type DeferredBatchOp struct {

    // Key and Value point to parts of the binary batch representation where
    // keys and values should be encoded/copied into. len(Key) and len(Value)
    // bytes must be copied into these slices respectively before calling
    // Finish(). Changing where these slices point to is not allowed.
    Key, Value []byte
    // contains filtered or unexported fields
}

DeferredBatchOp represents a batch operation (e.g. set, merge, delete) that is being inserted into the batch. Indexing is not performed on the specified key until Finish is called, hence the name deferred. This struct lets the caller copy or encode keys/values directly into the batch representation instead of copying into an intermediary buffer and then having pebble.Batch copy from it.

func (DeferredBatchOp) Finish

func (d DeferredBatchOp) Finish()

Finish completes the addition of this batch operation, and adds it to the index if necessary. It must be called exactly once, after the keys/values have been filled into Key and Value. Not calling Finish, or not copying/encoding the keys, will result in an incomplete index; calling Finish twice may result in a panic.
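
A sketch of the deferred pattern, assuming b is a *Batch:

k, v := []byte("key"), []byte("value")
op := b.SetDeferred(len(k), len(v))
copy(op.Key, k)   // encode the key directly into the batch buffer
copy(op.Value, v) // encode the value directly into the batch buffer
op.Finish()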

type DeleteCleaner

type DeleteCleaner = base.DeleteCleaner

DeleteCleaner exports the base.DeleteCleaner type.

type Equal

type Equal = base.Equal

Equal exports the base.Equal type.

type EventListener

type EventListener struct {
    // BackgroundError is invoked whenever an error occurs during a background
    // operation such as flush or compaction.
    BackgroundError func(error)

    // CompactionBegin is invoked after the inputs to a compaction have been
    // determined, but before the compaction has produced any output.
    CompactionBegin func(CompactionInfo)

    // CompactionEnd is invoked after a compaction has completed and the result
    // has been installed.
    CompactionEnd func(CompactionInfo)

    // FlushBegin is invoked after the inputs to a flush have been determined,
    // but before the flush has produced any output.
    FlushBegin func(FlushInfo)

    // FlushEnd is invoked after a flush has completed and the result has been
    // installed.
    FlushEnd func(FlushInfo)

    // ManifestCreated is invoked after a manifest has been created.
    ManifestCreated func(ManifestCreateInfo)

    // ManifestDeleted is invoked after a manifest has been deleted.
    ManifestDeleted func(ManifestDeleteInfo)

    // TableCreated is invoked when a table has been created.
    TableCreated func(TableCreateInfo)

    // TableDeleted is invoked after a table has been deleted.
    TableDeleted func(TableDeleteInfo)

    // TableIngested is invoked after an externally created table has been
    // ingested via a call to DB.Ingest().
    TableIngested func(TableIngestInfo)

    // WALCreated is invoked after a WAL has been created.
    WALCreated func(WALCreateInfo)

    // WALDeleted is invoked after a WAL has been deleted.
    WALDeleted func(WALDeleteInfo)

    // WriteStallBegin is invoked when writes are intentionally delayed.
    WriteStallBegin func(WriteStallBeginInfo)

    // WriteStallEnd is invoked when delayed writes are released.
    WriteStallEnd func()
}

EventListener contains a set of functions that will be invoked when various significant DB events occur. Note that the functions should not run for an excessive amount of time as they are invoked synchronously by the DB and may block continued DB work. For a similar reason it is advisable to not perform any synchronous calls back into the DB.

func MakeLoggingEventListener

func MakeLoggingEventListener(logger Logger) EventListener

MakeLoggingEventListener creates an EventListener that logs all events to the specified logger.
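
For example, a sketch that logs all events through the default logger (the directory name is illustrative):

opts := &pebble.Options{
	EventListener: pebble.MakeLoggingEventListener(pebble.DefaultLogger),
}
db, err := pebble.Open("demo-dir", opts)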

func (*EventListener) EnsureDefaults

func (l *EventListener) EnsureDefaults(logger Logger)

EnsureDefaults ensures that background error events are logged to the specified logger if a handler for those events hasn't been otherwise specified. Ensure all handlers are non-nil so that we don't have to check for nil-ness before invoking.

type FilterMetrics

type FilterMetrics = sstable.FilterMetrics

FilterMetrics holds metrics for the filter policy.

type FilterPolicy

type FilterPolicy = base.FilterPolicy

FilterPolicy exports the base.FilterPolicy type.

type FilterType

type FilterType = base.FilterType

FilterType exports the base.FilterType type.

type FilterWriter

type FilterWriter = base.FilterWriter

FilterWriter exports the base.FilterWriter type.

type FlushInfo

type FlushInfo struct {
    // JobID is the ID of the flush job.
    JobID int
    // Reason is the reason for the flush.
    Reason string
    // Output contains the output table generated by the flush. The output info
    // is empty for the flush begin event.
    Output []TableInfo
    Done   bool
    Err    error
}

FlushInfo contains the info for a flush event.

func (FlushInfo) String

func (i FlushInfo) String() string

type InternalKey

type InternalKey = base.InternalKey

InternalKey exports the base.InternalKey type.

type InternalKeyKind

type InternalKeyKind = base.InternalKeyKind

InternalKeyKind exports the base.InternalKeyKind type.

type IterOptions

type IterOptions struct {
    // LowerBound specifies the smallest key (inclusive) that the iterator will
    // return during iteration. If the iterator is seeked or iterated past this
    // boundary the iterator will return Valid()==false. Setting LowerBound
    // effectively truncates the key space visible to the iterator.
    LowerBound []byte
    // UpperBound specifies the largest key (exclusive) that the iterator will
    // return during iteration. If the iterator is seeked or iterated past this
    // boundary the iterator will return Valid()==false. Setting UpperBound
    // effectively truncates the key space visible to the iterator.
    UpperBound []byte
    // TableFilter can be used to filter the tables that are scanned during
    // iteration based on the user properties. Return true to scan the table and
    // false to skip scanning.
    TableFilter func(userProps map[string]string) bool
}

IterOptions holds the optional per-query parameters for NewIter.

Like Options, a nil *IterOptions is valid and means to use the default values.
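
For example, a sketch that restricts iteration to keys in ["a","m"):

iter := d.NewIter(&pebble.IterOptions{
	LowerBound: []byte("a"),
	UpperBound: []byte("m"),
})
for iter.First(); iter.Valid(); iter.Next() {
	fmt.Printf("key=%q value=%q\n", iter.Key(), iter.Value())
}
if err := iter.Close(); err != nil {
	log.Fatal(err)
}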

func (*IterOptions) GetLowerBound

func (o *IterOptions) GetLowerBound() []byte

GetLowerBound returns the LowerBound or nil if the receiver is nil.

func (*IterOptions) GetUpperBound

func (o *IterOptions) GetUpperBound() []byte

GetUpperBound returns the UpperBound or nil if the receiver is nil.

type Iterator

type Iterator struct {
    // contains filtered or unexported fields
}

Iterator iterates over a DB's key/value pairs in key order.

An iterator must be closed after use, but it is not necessary to read an iterator until exhaustion.

An iterator is not goroutine-safe, but it is safe to use multiple iterators concurrently, with each in a dedicated goroutine.

It is also safe to use an iterator concurrently with modifying its underlying DB, if that DB permits modification. However, the resultant key/value pairs are not guaranteed to be a consistent snapshot of that DB at a particular point in time.

func (*Iterator) Close

func (i *Iterator) Close() error

Close closes the iterator and returns any accumulated error. Exhausting all the key/value pairs in a table is not considered to be an error. It is valid to call Close multiple times. Other methods should not be called after the iterator has been closed.

func (*Iterator) Error

func (i *Iterator) Error() error

Error returns any accumulated error.

func (*Iterator) First

func (i *Iterator) First() bool

First moves the iterator to the first key/value pair. Returns true if the iterator is pointing at a valid entry and false otherwise.

func (*Iterator) Key

func (i *Iterator) Key() []byte

Key returns the key of the current key/value pair, or nil if done. The caller should not modify the contents of the returned slice, and its contents may change on the next call to Next.

func (*Iterator) Last

func (i *Iterator) Last() bool

Last moves the iterator to the last key/value pair. Returns true if the iterator is pointing at a valid entry and false otherwise.

func (*Iterator) Next

func (i *Iterator) Next() bool

Next moves the iterator to the next key/value pair. Returns true if the iterator is pointing at a valid entry and false otherwise.

func (*Iterator) Prev

func (i *Iterator) Prev() bool

Prev moves the iterator to the previous key/value pair. Returns true if the iterator is pointing at a valid entry and false otherwise.

func (*Iterator) SeekGE

func (i *Iterator) SeekGE(key []byte) bool

SeekGE moves the iterator to the first key/value pair whose key is greater than or equal to the given key. Returns true if the iterator is pointing at a valid entry and false otherwise.

func (*Iterator) SeekLT

func (i *Iterator) SeekLT(key []byte) bool

SeekLT moves the iterator to the last key/value pair whose key is less than the given key. Returns true if the iterator is pointing at a valid entry and false otherwise.

func (*Iterator) SeekPrefixGE

func (i *Iterator) SeekPrefixGE(key []byte) bool

SeekPrefixGE moves the iterator to the first key/value pair whose key is greater than or equal to the given key and shares a common prefix with the given key. Returns true if the iterator is pointing at a valid entry and false otherwise. Note that a user-defined Split function must be supplied to the Comparer. Also note that the iterator will not observe keys not matching the prefix.
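
For example, a sketch assuming the DB was opened with a Comparer whose Split function treats everything up to and including "/" as the prefix (an assumption of this sketch, not a property of the default Comparer):

iter := d.NewIter(nil)
for iter.SeekPrefixGE([]byte("user/")); iter.Valid(); iter.Next() {
	// Only keys sharing the prefix "user/" are observed.
	fmt.Printf("key=%q\n", iter.Key())
}
_ = iter.Close()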

func (*Iterator) SetBounds

func (i *Iterator) SetBounds(lower, upper []byte)

SetBounds sets the lower and upper bounds for the iterator. Note that the iterator will always be invalidated and must be repositioned with a call to SeekGE, SeekPrefixGE, SeekLT, First, or Last.

func (*Iterator) Valid

func (i *Iterator) Valid() bool

Valid returns true if the iterator is positioned at a valid key/value pair and false otherwise.

func (*Iterator) Value

func (i *Iterator) Value() []byte

Value returns the value of the current key/value pair, or nil if done. The caller should not modify the contents of the returned slice, and its contents may change on the next call to Next.

type LevelMetrics

type LevelMetrics struct {
    // The total number of files in the level.
    NumFiles int64
    // The total size in bytes of the files in the level.
    Size uint64
    // The level's compaction score.
    Score float64
    // The number of incoming bytes from other levels read during
    // compactions. This excludes bytes moved and bytes ingested. For L0 this is
    // the bytes written to the WAL.
    BytesIn uint64
    // The number of bytes ingested. The sibling metric for tables is
    // TablesIngested.
    BytesIngested uint64
    // The number of bytes moved into the level by a "move" compaction. The
    // sibling metric for tables is TablesMoved.
    BytesMoved uint64
    // The number of bytes read for compactions at the level. This includes bytes
    // read from other levels (BytesIn), as well as bytes read for the level.
    BytesRead uint64
    // The number of bytes written during flushes and compactions. The sibling
    // metrics for tables are TablesCompacted and TablesFlushed.
    BytesWritten uint64
    // The number of sstables compacted to this level.
    TablesCompacted uint64
    // The number of sstables flushed to this level.
    TablesFlushed uint64
    // The number of sstables ingested into the level.
    TablesIngested uint64
    // The number of sstables moved to this level by a "move" compaction.
    TablesMoved uint64
}

LevelMetrics holds per-level metrics such as the number of files and total size of the files, and compaction related metrics.

func (*LevelMetrics) Add

func (m *LevelMetrics) Add(u *LevelMetrics)

Add updates the counter metrics for the level.

func (*LevelMetrics) WriteAmp

func (m *LevelMetrics) WriteAmp() float64

WriteAmp computes the write amplification for compactions at this level. Computed as BytesWritten / BytesIn.

type LevelOptions

type LevelOptions struct {
    // BlockRestartInterval is the number of keys between restart points
    // for delta encoding of keys.
    //
    // The default value is 16.
    BlockRestartInterval int

    // BlockSize is the target uncompressed size in bytes of each table block.
    //
    // The default value is 4096.
    BlockSize int

    // BlockSizeThreshold finishes a block if the block size is larger than the
    // specified percentage of the target block size and adding the next entry
    // would cause the block to be larger than the target block size.
    //
    // The default value is 90.
    BlockSizeThreshold int

    // Compression defines the per-block compression to use.
    //
    // The default value (DefaultCompression) uses snappy compression.
    Compression Compression

    // FilterPolicy defines a filter algorithm (such as a Bloom filter) that can
    // reduce disk reads for Get calls.
    //
    // One such implementation is bloom.FilterPolicy(10) from the pebble/bloom
    // package.
    //
    // The default value means to use no filter.
    FilterPolicy FilterPolicy

    // FilterType defines whether an existing filter policy is applied at a
    // block-level or table-level. Block-level filters use less memory to create,
    // but are slower to access as a check for the key in the index must first be
    // performed to locate the filter block. A table-level filter will require
    // memory proportional to the number of keys in an sstable to create, but
    // avoids the index lookup when determining if a key is present. Table-level
    // filters should be preferred except under constrained memory situations.
    FilterType FilterType

    // IndexBlockSize is the target uncompressed size in bytes of each index
    // block. When the index block size is larger than this target, two-level
    // indexes are automatically enabled. Setting this option to a large value
    // (such as math.MaxInt32) disables the automatic creation of two-level
    // indexes.
    //
    // The default value is the value of BlockSize.
    IndexBlockSize int

    // The target file size for the level.
    TargetFileSize int64
}

LevelOptions holds the optional per-level parameters.

func (*LevelOptions) EnsureDefaults

func (o *LevelOptions) EnsureDefaults() *LevelOptions

EnsureDefaults ensures that the default values for all of the options have been initialized. It is valid to call EnsureDefaults on a nil receiver. A non-nil result will always be returned.

type Logger

type Logger interface {
    Infof(format string, args ...interface{})
    Fatalf(format string, args ...interface{})
}

Logger defines an interface for writing log messages.

type ManifestCreateInfo

type ManifestCreateInfo struct {
    // JobID is the ID of the job that caused the manifest to be created.
    JobID int
    Path  string
    // The file number of the new Manifest.
    FileNum uint64
    Err     error
}

ManifestCreateInfo contains info about a manifest creation event.

func (ManifestCreateInfo) String

func (i ManifestCreateInfo) String() string

type ManifestDeleteInfo

type ManifestDeleteInfo struct {
    // JobID is the ID of the job that caused the Manifest to be deleted.
    JobID   int
    Path    string
    FileNum uint64
    Err     error
}

ManifestDeleteInfo contains the info for a Manifest deletion event.

func (ManifestDeleteInfo) String

func (i ManifestDeleteInfo) String() string

type Merge

type Merge = base.Merge

Merge exports the base.Merge type.

type Merger

type Merger = base.Merger

Merger exports the base.Merger type.

type Metrics

type Metrics struct {
    BlockCache CacheMetrics

    Compact struct {
        // The total number of compactions.
        Count int64
        // An estimate of the number of bytes that need to be compacted for the LSM
        // to reach a stable state.
        EstimatedDebt uint64
    }

    Flush struct {
        // The total number of flushes.
        Count int64
    }

    Filter FilterMetrics

    Levels [numLevels]LevelMetrics

    MemTable struct {
        // The number of bytes allocated by memtables and large (flushable)
        // batches.
        Size uint64
        // The count of memtables.
        Count int64
    }

    TableCache CacheMetrics

    // Count of the number of open sstable iterators.
    TableIters int64

    WAL struct {
        // Number of live WAL files.
        Files int64
        // Number of obsolete WAL files.
        ObsoleteFiles int64
        // Size of the live data in the WAL files. Note that with WAL file
        // recycling this is less than the actual on-disk size of the WAL files.
        Size uint64
        // Number of logical bytes written to the WAL.
        BytesIn uint64
        // Number of bytes written to the WAL.
        BytesWritten uint64
    }
}

Metrics holds metrics for various subsystems of the DB such as the Cache, Compactions, WAL, and per-Level metrics.

TODO(peter): The testing of these metrics is relatively weak. There should be testing that performs various operations on a DB and verifies that the metrics reflect those operations.

func (*Metrics) String

func (m *Metrics) String() string

Pretty-print the metrics, showing a line for the WAL, a line per-level, and a total:

__level_____count____size___score______in__ingest(sz_cnt)____move(sz_cnt)___write(sz_cnt)____read___w-amp
    WAL         1    27 B       -    48 B       -       -       -       -   108 B       -       -     2.2
      0         2   1.6 K    0.50    81 B   825 B       1     0 B       0   2.4 K       3     0 B    30.6
      1         0     0 B    0.00     0 B     0 B       0     0 B       0     0 B       0     0 B     0.0
      2         0     0 B    0.00     0 B     0 B       0     0 B       0     0 B       0     0 B     0.0
      3         0     0 B    0.00     0 B     0 B       0     0 B       0     0 B       0     0 B     0.0
      4         0     0 B    0.00     0 B     0 B       0     0 B       0     0 B       0     0 B     0.0
      5         0     0 B    0.00     0 B     0 B       0     0 B       0     0 B       0     0 B     0.0
      6         1   825 B    0.00   1.6 K     0 B       0     0 B       0   825 B       1   1.6 K     0.5
  total         3   2.4 K       -   933 B   825 B       1     0 B       0   4.1 K       4   1.6 K     4.5
  flush         3
compact         1   1.6 K          (size == estimated-debt)
 memtbl         1   4.0 M
 bcache         4   752 B    7.7%  (score == hit-rate)
 tcache         0     0 B    0.0%  (score == hit-rate)
 titers         0
 filter         -       -    0.0%  (score == utility)

The WAL "in" metric is the size of the batches written to the WAL. The WAL "write" metric is the size of the physical data written to the WAL which includes record fragment overhead. Write amplification is computed as bytes-written / bytes-in, except for the total row where bytes-in is replaced with WAL-bytes-written + bytes-ingested.

type Options

type Options struct {
    // Sync sstables and the WAL periodically in order to smooth out writes to
    // disk. This option does not provide any persistence guarantee, but is used
    // to avoid latency spikes if the OS automatically decides to write out a
    // large chunk of dirty filesystem buffers.
    //
    // The default value is 512KB.
    BytesPerSync int

    // Cache is used to cache uncompressed blocks from sstables.
    //
    // The default cache size is 8 MB.
    Cache *cache.Cache

    // Cleaner cleans obsolete files.
    //
    // The default cleaner uses the DeleteCleaner.
    Cleaner Cleaner

    // Comparer defines a total ordering over the space of []byte keys: a 'less
    // than' relationship. The same comparison algorithm must be used for reads
    // and writes over the lifetime of the DB.
    //
    // The default value uses the same ordering as bytes.Compare.
    Comparer *Comparer

    // Disable the write-ahead log (WAL). Disabling the write-ahead log prohibits
    // crash recovery, but can improve performance if crash recovery is not
    // needed (e.g. when only temporary state is being stored in the database).
    //
    // TODO(peter): untested
    DisableWAL bool

    // ErrorIfExists is whether it is an error if the database already exists.
    //
    // The default value is false.
    ErrorIfExists bool

    // ErrorIfNotExists is whether it is an error if the database does not
    // already exist.
    //
    // The default value is false which will cause a database to be created if it
    // does not already exist.
    ErrorIfNotExists bool

    // EventListener provides hooks for listening to significant DB events such as
    // flushes, compactions, and table deletion.
    EventListener EventListener

    // Filters is a map from filter policy name to filter policy. It is used for
    // debugging tools which may be used on multiple databases configured with
    // different filter policies. It is not necessary to populate this filters
    // map during normal usage of a DB.
    Filters map[string]FilterPolicy

    // FS provides the interface for persistent file storage.
    //
    // The default value uses the underlying operating system's file system.
    FS  vfs.FS

    // The number of files necessary to trigger an L0 compaction.
    L0CompactionThreshold int

    // Hard limit on the number of L0 files. Writes are stopped when this
    // threshold is reached.
    L0StopWritesThreshold int

    // The maximum number of bytes for LBase. The base level is the level which
    // L0 is compacted into. The base level is determined dynamically based on
    // the existing data in the LSM. The maximum number of bytes for other levels
    // is computed dynamically based on the base level's maximum size. When the
    // maximum number of bytes for a level is exceeded, compaction is requested.
    LBaseMaxBytes int64

    // Per-level options. Options for at least one level must be specified. The
    // options for the last level are used for all subsequent levels.
    Levels []LevelOptions

    // Logger used to write log messages.
    //
    // The default logger uses the Go standard library log package.
    Logger Logger

    // MaxManifestFileSize is the maximum size the MANIFEST file is allowed to
    // become. When the MANIFEST exceeds this size it is rolled over and a new
    // MANIFEST is created.
    MaxManifestFileSize int64

    // MaxOpenFiles is a soft limit on the number of open files that can be
    // used by the DB.
    //
    // The default value is 1000.
    MaxOpenFiles int

    // The size of a MemTable in steady state. The actual MemTable size starts at
    // min(256KB, MemTableSize) and doubles for each subsequent MemTable up to
    // MemTableSize. This reduces the memory pressure caused by MemTables for
    // short lived (test) DB instances. Note that more than one MemTable can be
    // in existence since flushing a MemTable involves creating a new one and
    // writing the contents of the old one in the
    // background. MemTableStopWritesThreshold places a hard limit on the size of
    // the queued MemTables.
    MemTableSize int

    // Hard limit on the size of the queued MemTables. Writes are stopped when the
    // sum of the queued memtable sizes exceeds
    // MemTableStopWritesThreshold*MemTableSize. This value should be at least 2
    // or writes will stop whenever a MemTable is being flushed.
    MemTableStopWritesThreshold int

    // Merger defines the associative merge operation to use for merging values
    // written with {Batch,DB}.Merge.
    //
    // The default merger concatenates values.
    Merger *Merger

    // MinCompactionRate sets the minimum rate at which compactions occur. The
    // default is 4 MB/s.
    MinCompactionRate int

    // MinFlushRate sets the minimum rate at which the MemTables are flushed. The
    // default is 1 MB/s.
    MinFlushRate int

    // ReadOnly indicates that the DB should be opened in read-only mode. Writes
    // to the DB will return an error, background compactions are disabled, and
    // the flush that normally occurs after replaying the WAL at startup is
    // disabled.
    ReadOnly bool

    // TableFormat specifies the format version for writing sstables. The default
    // is TableFormatRocksDBv2 which creates RocksDB compatible sstables. Use
    // TableFormatLevelDB to create LevelDB compatible sstables which can be used
    // by a wider range of tools and libraries.
    TableFormat TableFormat

    // TablePropertyCollectors is a list of TablePropertyCollector creation
    // functions. A new TablePropertyCollector is created for each sstable built
    // and lives for the lifetime of the table.
    TablePropertyCollectors []func() TablePropertyCollector

    // WALDir specifies the directory to store write-ahead logs (WALs) in. If
    // empty (the default), WALs will be stored in the same directory as sstables
    // (i.e. the directory passed to pebble.Open).
    WALDir string

    // WALMinSyncInterval is the minimum duration between syncs of the WAL. If
    // WAL syncs are requested faster than this interval, they will be
    // artificially delayed. Introducing a small artificial delay (500us) between
    // WAL syncs can allow more operations to arrive and reduce IO operations
    // while having a minimal impact on throughput. This option is supplied as a
    // closure in order to allow the value to be changed dynamically. The default
    // value is 0.
    //
    // TODO(peter): rather than a closure, should there be another mechanism for
    // changing options dynamically?
    WALMinSyncInterval func() time.Duration
}

Options holds the optional parameters for configuring pebble. These options apply to the DB at large; per-query options are defined by the IterOptions and WriteOptions types.
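
For example, a sketch with a few commonly tuned fields (the values shown are illustrative, not recommendations):

opts := &pebble.Options{
	Cache:                 pebble.NewCache(64 << 20), // 64 MB block cache
	MemTableSize:          32 << 20,                  // 32 MB steady-state memtables
	L0CompactionThreshold: 4,
	Levels: []pebble.LevelOptions{
		{BlockSize: 32 << 10, Compression: pebble.SnappyCompression},
	},
}
db, err := pebble.Open("demo-dir", opts)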

func (*Options) Check

func (o *Options) Check(s string) error

Check verifies the options are compatible with the previous options serialized by Options.String(). For example, the Comparer and Merger must be the same, or the data in the DB will not be readable.

func (*Options) Clone

func (o *Options) Clone() *Options

Clone creates a shallow-copy of the supplied options.

func (*Options) EnsureDefaults

func (o *Options) EnsureDefaults() *Options

EnsureDefaults ensures that the default values for all options are set if a valid value was not already specified. Returns the new options.

func (*Options) Level

func (o *Options) Level(level int) LevelOptions

Level returns the LevelOptions for the specified level.

func (*Options) MakeReaderOptions

func (o *Options) MakeReaderOptions() sstable.ReaderOptions

MakeReaderOptions constructs sstable.ReaderOptions from the corresponding options in the receiver.

func (*Options) MakeWriterOptions

func (o *Options) MakeWriterOptions(level int) sstable.WriterOptions

MakeWriterOptions constructs sstable.WriterOptions for the specified level from the corresponding options in the receiver.

func (*Options) Parse

func (o *Options) Parse(s string, hooks *ParseHooks) error

Parse parses the options from the specified string. Note that certain options cannot be parsed directly into populated fields; for example, the comparer and merger must be constructed by name via the ParseHooks callbacks.

func (*Options) String

func (o *Options) String() string

func (*Options) Validate

func (o *Options) Validate() error

Validate verifies that the options are mutually consistent. For example, L0StopWritesThreshold must be >= L0CompactionThreshold, otherwise a write stall would persist indefinitely.

type ParseHooks

type ParseHooks struct {
    NewCleaner      func(name string) (Cleaner, error)
    NewComparer     func(name string) (*Comparer, error)
    NewFilterPolicy func(name string) (FilterPolicy, error)
    NewMerger       func(name string) (*Merger, error)
    SkipUnknown     func(name string) bool
}

ParseHooks contains callbacks to create options fields which can have user-defined implementations.

type Reader

type Reader interface {
    // Get gets the value for the given key. It returns ErrNotFound if the DB
    // does not contain the key.
    //
    // The caller should not modify the contents of the returned slice, but
    // it is safe to modify the contents of the argument after Get returns.
    Get(key []byte) (value []byte, err error)

    // NewIter returns an iterator that is unpositioned (Iterator.Valid() will
    // return false). The iterator can be positioned via a call to SeekGE,
    // SeekLT, First or Last.
    NewIter(o *IterOptions) *Iterator

    // Close closes the Reader. It may or may not close any underlying io.Reader
    // or io.Writer, depending on how the DB was created.
    //
    // It is not safe to close a DB until all outstanding iterators are closed.
    // It is valid to call Close multiple times. Other methods should not be
    // called after the DB has been closed.
    Close() error
}

Reader is a readable key/value store.

It is safe to call Get and NewIter from concurrent goroutines.

type Separator

type Separator = base.Separator

Separator exports the base.Separator type.

type Snapshot

type Snapshot struct {
    // contains filtered or unexported fields
}

Snapshot provides a read-only point-in-time view of the DB state.

func (*Snapshot) Close

func (s *Snapshot) Close() error

Close closes the snapshot, releasing its resources. Close must be called. Failure to do so will result in a tiny memory leak and a large leak of resources on disk due to the entries the snapshot is preventing from being deleted.

func (*Snapshot) Get

func (s *Snapshot) Get(key []byte) ([]byte, error)

Get gets the value for the given key. It returns ErrNotFound if the DB does not contain the key.

The caller should not modify the contents of the returned slice, but it is safe to modify the contents of the argument after Get returns.

func (*Snapshot) NewIter Uses

func (s *Snapshot) NewIter(o *IterOptions) *Iterator

NewIter returns an iterator that is unpositioned (Iterator.Valid() will return false). The iterator can be positioned via a call to SeekGE, SeekLT, First or Last.
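
A sketch of the snapshot lifecycle, assuming db is an open *pebble.DB (NewSnapshot is documented with DB earlier on this page):

    snap := db.NewSnapshot()
    defer snap.Close() // required; see Close above

    // Reads against snap see the DB as of NewSnapshot, regardless of
    // writes made to db afterwards.
    value, err := snap.Get([]byte("key"))
    switch {
    case err == pebble.ErrNotFound:
        // the key was absent when the snapshot was taken
    case err != nil:
        log.Fatal(err)
    default:
        fmt.Printf("value: %s\n", value)
    }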

type Split Uses

type Split = base.Split

Split exports the base.Split type.

type Successor Uses

type Successor = base.Successor

Successor exports the base.Successor type.

type TableCreateInfo Uses

type TableCreateInfo struct {
    JobID int
    // Reason is the reason for the table creation (flushing or compacting).
    Reason  string
    Path    string
    FileNum uint64
}

TableCreateInfo contains the info for a table creation event.

func (TableCreateInfo) String Uses

func (i TableCreateInfo) String() string
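
TableCreateInfo and the other Info structs below are delivered through the EventListener hooks documented earlier on this page. A minimal sketch, assuming the listener's field names mirror the event types:

    opts := &pebble.Options{}
    opts.EventListener.TableCreated = func(info pebble.TableCreateInfo) {
        log.Printf("table created: %s", info)
    }
    opts.EventListener.WALCreated = func(info pebble.WALCreateInfo) {
        log.Printf("WAL created: %s", info)
    }
    db, err := pebble.Open("demo", opts)
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()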

type TableDeleteInfo Uses

type TableDeleteInfo struct {
    JobID   int
    Path    string
    FileNum uint64
    Err     error
}

TableDeleteInfo contains the info for a table deletion event.

func (TableDeleteInfo) String Uses

func (i TableDeleteInfo) String() string

type TableFormat Uses

type TableFormat = sstable.TableFormat

TableFormat exports the sstable.TableFormat type.

type TableInfo Uses

type TableInfo = manifest.TableInfo

TableInfo exports the manifest.TableInfo type.

type TableIngestInfo Uses

type TableIngestInfo struct {
    // JobID is the ID of the job that caused the table to be ingested.
    JobID  int
    Tables []struct {
        TableInfo
        Level int
    }
    // GlobalSeqNum is the sequence number that was assigned to all entries in
    // the ingested table.
    GlobalSeqNum uint64
    Err          error
}

TableIngestInfo contains the info for a table ingestion event.

func (TableIngestInfo) String Uses

func (i TableIngestInfo) String() string

type TablePropertyCollector Uses

type TablePropertyCollector = sstable.TablePropertyCollector

TablePropertyCollector exports the sstable.TablePropertyCollector type.

type ValueMerger Uses

type ValueMerger = base.ValueMerger

ValueMerger exports the base.ValueMerger type.

type WALCreateInfo Uses

type WALCreateInfo struct {
    // JobID is the ID of the job that caused the WAL to be created.
    JobID int
    Path  string
    // The file number of the new WAL.
    FileNum uint64
    // The file number of a previous WAL which was recycled to create this
    // one. Zero if recycling did not take place.
    RecycledFileNum uint64
    Err             error
}

WALCreateInfo contains info about a WAL creation event.

func (WALCreateInfo) String Uses

func (i WALCreateInfo) String() string

type WALDeleteInfo Uses

type WALDeleteInfo struct {
    // JobID is the ID of the job that caused the WAL to be deleted.
    JobID   int
    Path    string
    FileNum uint64
    Err     error
}

WALDeleteInfo contains the info for a WAL deletion event.

func (WALDeleteInfo) String Uses

func (i WALDeleteInfo) String() string

type WriteOptions Uses

type WriteOptions struct {
    // Sync is whether to sync underlying writes from the OS buffer cache
    // through to actual disk, if applicable. Setting Sync can result in
    // slower writes.
    //
    // If false, and the machine crashes, then some recent writes may be lost.
    // Note that if it is just the process that crashes (and the machine does
    // not) then no writes will be lost.
    //
    // In other words, Sync being false has the same semantics as a write
    // system call. Sync being true means write followed by fsync.
    //
    // The default value is true.
    Sync bool
}

WriteOptions holds the optional per-query parameters for Set and Delete operations.

Like Options, a nil *WriteOptions is valid and means to use the default values.

func (*WriteOptions) GetSync Uses

func (o *WriteOptions) GetSync() bool

GetSync returns the Sync value or true if the receiver is nil.
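
A sketch contrasting the package-level Sync and NoSync WriteOptions values, assuming db is an open *pebble.DB:

    // Durable across a machine crash: write followed by fsync.
    if err := db.Set([]byte("k1"), []byte("v1"), pebble.Sync); err != nil {
        log.Fatal(err)
    }

    // Survives a process crash, but recent writes may be lost if the
    // machine itself crashes.
    if err := db.Set([]byte("k2"), []byte("v2"), pebble.NoSync); err != nil {
        log.Fatal(err)
    }

    // A nil *WriteOptions is valid; GetSync reports the default.
    var wo *pebble.WriteOptions
    _ = wo.GetSync() // true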

type WriteStallBeginInfo Uses

type WriteStallBeginInfo struct {
    Reason string
}

WriteStallBeginInfo contains the info for a write stall begin event.

func (WriteStallBeginInfo) String Uses

func (i WriteStallBeginInfo) String() string

type Writer Uses

type Writer interface {
    // Apply the operations contained in the batch to the DB.
    //
    // It is safe to modify the contents of the arguments after Apply returns.
    Apply(batch *Batch, o *WriteOptions) error

    // Delete deletes the value for the given key. Deletes are blind and will
    // succeed even if the given key does not exist.
    //
    // It is safe to modify the contents of the arguments after Delete returns.
    Delete(key []byte, o *WriteOptions) error

    // SingleDelete is similar to Delete in that it deletes the value for the given key. Like Delete,
    // it is a blind operation that will succeed even if the given key does not exist.
    //
    // WARNING: Undefined (non-deterministic) behavior will result if a key is overwritten and
    // then deleted using SingleDelete. The record may appear deleted immediately, but be
    // resurrected at a later time after compactions have been performed. Or the record may
    // be deleted permanently. A Delete operation lays down a "tombstone" which shadows all
    // previous versions of a key. The SingleDelete operation is akin to "anti-matter" and will
    // only delete the most recently written version for a key. These different semantics allow
    // the DB to avoid propagating a SingleDelete operation during a compaction as soon as the
    // corresponding Set operation is encountered. These semantics require extreme care to handle
    // properly. Only use if you have a workload where the performance gain is critical and you
    // can guarantee that a record is written once and then deleted once.
    //
    // SingleDelete is internally transformed into a Delete if the most recent record for a key is either
    // a Merge or Delete record.
    //
    // It is safe to modify the contents of the arguments after SingleDelete returns.
    SingleDelete(key []byte, o *WriteOptions) error

    // DeleteRange deletes all of the keys (and values) in the range [start,end)
    // (inclusive on start, exclusive on end).
    //
    // It is safe to modify the contents of the arguments after DeleteRange returns.
    DeleteRange(start, end []byte, o *WriteOptions) error

    // LogData adds the specified data to the batch. The data will be written to the
    // WAL, but not added to memtables or sstables. Log data is never indexed,
    // which makes it useful for testing WAL performance.
    //
    // It is safe to modify the contents of the argument after LogData returns.
    LogData(data []byte, opts *WriteOptions) error

    // Merge merges the value for the given key. The details of the merge are
    // dependent upon the configured merge operation.
    //
    // It is safe to modify the contents of the arguments after Merge returns.
    Merge(key, value []byte, o *WriteOptions) error

    // Set sets the value for the given key. It overwrites any previous value
    // for that key; a DB is not a multi-map.
    //
    // It is safe to modify the contents of the arguments after Set returns.
    Set(key, value []byte, o *WriteOptions) error
}

Writer is a writable key/value store.

Goroutine safety is dependent on the specific implementation.
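
A sketch of the write-once-then-delete-once discipline that SingleDelete requires, using a Batch (which implements Writer) against an assumed open *pebble.DB:

    b := db.NewBatch()
    // Safe SingleDelete usage: the key is set exactly once and never
    // overwritten. Overwriting it before the SingleDelete would make the
    // outcome non-deterministic, per the warning above.
    if err := b.Set([]byte("once"), []byte("v"), nil); err != nil {
        log.Fatal(err)
    }
    if err := b.SingleDelete([]byte("once"), nil); err != nil {
        log.Fatal(err)
    }
    if err := db.Apply(b, pebble.Sync); err != nil {
        log.Fatal(err)
    }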

Directories

Path                    Synopsis
bloom                   Package bloom implements Bloom filters.
internal/ackseq
internal/arenaskl
internal/base
internal/batchskl
internal/bytealloc
internal/cache          Package cache implements the CLOCK-Pro caching algorithm.
internal/crc            Package crc implements the checksum algorithm used throughout pebble.
internal/datadriven
internal/humanize
internal/lint
internal/manifest
internal/metamorphic
internal/private
internal/randvar
internal/rangedel
internal/rate           Package rate provides a rate limiter.
internal/rawalloc
internal/record         Package record reads and writes sequences of records.
sstable                 Package sstable implements readers and writers of pebble tables.
tool
vfs

Package pebble imports 33 packages and is imported by 13 packages. Updated 2019-12-07.