base: github.com/grailbio/base/mapio Index | Files

package mapio

import "github.com/grailbio/base/mapio"

Package mapio implements a sorted, on-disk map, similar to the SSTable data structure used in Bigtable [1], Cassandra [2], and others. Maps are read-only, and are produced by a Writer. Each Writer expects keys to be appended in lexicographic order. Buf provides a means of buffering writes to be sorted before appended to a Writer.

Mapio's on-disk layout loosely follows that of LevelDB [3]. Each Map is a sequence of blocks; each block comprises a sequence of entries, followed by a trailer:

block := blockEntry* blockTrailer
blockEntry :=
	nshared:   uvarint           // number of bytes shared with previous key
	nunshared: uvarint           // number of new bytes in this entry's key
	nvalue:    uvarint           // number of bytes in value
	key:       uint8[nunshared]  // the (prefix compressed) key
	value:     uint8[nvalue]     // the entry's value
blockTrailer :=
	restarts:  uint32[nrestart]  // array of key restarts
	nrestart:  uint32            // size of restart array
	type:      uint8             // block type (should be 0; reserved for future use)
	crc32:     uint32            // IEEE crc32 of contents and trailer

Maps prefix compress each key by storing the number of bytes shared with the previous key. Maps contain a number of restart points: points at which the full key is specified (and nshared = 0). The restart point are stored in an array in the block trailer. This array can be used to perform binary search for keys.

A Map is a sequence of data blocks, followed by an index block, followed by a trailer.

map := block(data)* block(meta)* block(index) mapTrailer
mapTrailer :=
	meta:	blockAddr[20]  // zero-padded address of the meta block index (tbd)
	index:  blockAddr[20]  // zero-padded address of index
	magic:	uint64         // magic (0xa8b2374e8558bc76)
blockAddr :=
	offset: uvarint        // offset of block in map
	len:    uvarint        // length of block

The index block contains one entry for each block in the map: each entry's key is the last key in that block; the entry's value is a blockAddr containing the position of that block. This arrangement allows the reader to binary search the index block then search the found block.

[1] https://static.googleusercontent.com/media/research.google.com/en//archive/bigtable-osdi06.pdf [2] https://www.cs.cornell.edu/projects/ladis2009/papers/lakshman-ladis2009.pdf [3] https://github.com/google/leveldb

Index

Package Files

block.go buf.go doc.go map.go merged.go scanner.go writer.go

type Buf Uses

type Buf struct {
    // contains filtered or unexported fields
}

A Buf is an unordered write buffer for maps. It holds entries in memory; these are then sorted and written to a map.

func (*Buf) Append Uses

func (b *Buf) Append(key, value []byte)

Append append the given entry to the buffer.

func (*Buf) Len Uses

func (b *Buf) Len() int

Len implements sort.Interface

func (*Buf) Less Uses

func (b *Buf) Less(i, j int) bool

Less implements sort.Interface

func (*Buf) Size Uses

func (b *Buf) Size() int

Size returns the number size of this buffer in bytes.

func (*Buf) Swap Uses

func (b *Buf) Swap(i, j int)

Swap implements sort.Interface

func (*Buf) WriteTo Uses

func (b *Buf) WriteTo(w *Writer) error

WriteTo sorts and then writes all of the entries in this buffer to the provided writer.

type Map Uses

type Map struct {
    // contains filtered or unexported fields
}

Map is a read-only, sorted map backed by an io.ReadSeeker. The on-disk layout of maps are described by the package documentation. Maps support both lookup and (ordered) iteration. A Map instance maintains a current position, starting out at the first entry.

func New Uses

func New(r io.ReadSeeker) (*Map, error)

New opens the map at the provided io.ReadSeeker (usually a file).

func (*Map) Seek Uses

func (m *Map) Seek(key []byte) *MapScanner

Seek returns a map scanner beginning at the first key in the map >= the provided key.

type MapScanner Uses

type MapScanner struct {
    // contains filtered or unexported fields
}

MapScanner implements ordered iteration over a map.

func (*MapScanner) Err Uses

func (m *MapScanner) Err() error

Err returns the last error encountered while scanning.

func (*MapScanner) Key Uses

func (m *MapScanner) Key() []byte

Key returns the key that was last scanned.

func (*MapScanner) Scan Uses

func (m *MapScanner) Scan() bool

Scan scans the next entry, returning true on success. When Scan returns false, the caller should inspect Err to distinguish between scan completion and scan error.

func (*MapScanner) Value Uses

func (m *MapScanner) Value() []byte

Value returns the value that was last scanned.

type Merged Uses

type Merged []*Map

Merged represents the merged contents of multiple underlying maps. Like Map, Merged presents a sorted, scannable map, but it does not guarantee that the order of traversal is stable.

func (Merged) Seek Uses

func (m Merged) Seek(key []byte) *MergedScanner

Seek returns a scanner for the merged map that starts at the first entry where entryKey <= key.

type MergedScanner Uses

type MergedScanner []*MapScanner

MergedScanner is a scanner for merged maps.

func (MergedScanner) Err Uses

func (m MergedScanner) Err() error

Err returns the last error encountered while scanning, if any.

func (MergedScanner) Key Uses

func (m MergedScanner) Key() []byte

Key returns the last key scanned.

func (MergedScanner) Len Uses

func (m MergedScanner) Len() int

Len implements heap.Interface

func (MergedScanner) Less Uses

func (m MergedScanner) Less(i, j int) bool

Less implements heap.Interface

func (*MergedScanner) Pop Uses

func (m *MergedScanner) Pop() interface{}

Pop implements heap.Interface

func (*MergedScanner) Push Uses

func (m *MergedScanner) Push(x interface{})

Push implements heap.Interface

func (*MergedScanner) Scan Uses

func (m *MergedScanner) Scan() bool

Scan scans the next entry in the merged map, returning true on success. If Scan returns false, the caller should check Err to distinguish between scan completion and scan error.

func (MergedScanner) Swap Uses

func (m MergedScanner) Swap(i, j int)

Swap implements heap.Interface

func (MergedScanner) Value Uses

func (m MergedScanner) Value() []byte

Value returns the last value scanned.

type Scanner Uses

type Scanner interface {
    // Scan scans the next entry, returning true on success, after which
    // time the entry is available to inspect using the Key and Value
    // methods.
    Scan() bool
    // Err returns the last error encountered while scanning, if any.
    Err() error
    // Key returns the key of the last scanned entry.
    Key() []byte
    // Value returns the value of the last scanned entry.
    Value() []byte
}

Scanner is an ordered iterator over map entries.

type WriteOption Uses

type WriteOption func(*Writer)

WriteOption represents a tunable writer parameter.

func BlockSize Uses

func BlockSize(sz int) WriteOption

BlockSize sets the writer's target block size to sz (in bytes). Note that keys and values cannot straddle blocks, so that if large data are added to a map, block sizes can grow large. The default target block size is 4KB.

func RestartInterval Uses

func RestartInterval(iv int) WriteOption

RestartInterval sets the writer's restart interval to provided value. The default restart interval is 16.

type Writer Uses

type Writer struct {
    // contains filtered or unexported fields
}

A Writer appends key-value pairs to a map. Keys must be appended in lexicographic order.

func NewWriter Uses

func NewWriter(w io.Writer, opts ...WriteOption) *Writer

NewWriter returns a new Writer that writes a map to the provided io.Writer. BlockSize specifies the target block size, while restartInterval determines the frequency of key restart points, which trades off lookup performance with size. See package docs for more details.

func (*Writer) Append Uses

func (w *Writer) Append(key, value []byte) error

Append appends an entry to the maps. Keys must be provided in lexicographic order.

func (*Writer) Close Uses

func (w *Writer) Close() error

Close flushes the last block of the writer and writes the map's trailer. After successful close, the map is ready to be opened.

func (*Writer) Flush Uses

func (w *Writer) Flush() error

Flush creates a new block with the current contents. It forces the creation of a new block, and overrides the Writer's block size parameter.

Package mapio imports 9 packages (graph). Updated 2019-10-15. Refresh now. Tools for package owners.