vellum: github.com/couchbase/vellum Index | Examples | Files | Directories

package vellum

import "github.com/couchbase/vellum"

Package vellum is a library for building, serializing and executing an FST (finite state transducer).

There are two distinct phases, building an FST and using it.

When building an FST, you insert keys ([]byte) and their associated value (uint64). Insert operations MUST be done in lexicographic order. While building the FST, data is streamed to an underlying Writer. At the conclusion of building, you MUST call Close() on the builder.

After completion of the build phase, you can either Open() the FST if you serialized it to disk. Alternatively, if you already have the bytes in memory, you can use Load(). By default, Open() will use mmap to avoid loading the entire file into memory.

Once the FST is ready, you can use the Contains() method to see if a keys is in the FST. You can use the Get() method to see if a key is in the FST and retrieve it's associated value. And, you can use the Iterator method to enumerate key/value pairs within a specified range.

Code:

var buf bytes.Buffer
builder, err := vellum.New(&buf, nil)
if err != nil {
    log.Fatal(err)
}

err = builder.Insert([]byte("cat"), 1)
if err != nil {
    log.Fatal(err)
}

err = builder.Insert([]byte("dog"), 2)
if err != nil {
    log.Fatal(err)
}

err = builder.Insert([]byte("fish"), 3)
if err != nil {
    log.Fatal(err)
}

err = builder.Close()
if err != nil {
    log.Fatal(err)
}

fst, err := vellum.Load(buf.Bytes())
if err != nil {
    log.Fatal(err)
}

val, exists, err := fst.Get([]byte("cat"))
if err != nil {
    log.Fatal(err)
}
if exists {
    fmt.Println(val)
}

val, exists, err = fst.Get([]byte("dog"))
if err != nil {
    log.Fatal(err)
}
if exists {
    fmt.Println(val)
}

val, exists, err = fst.Get([]byte("fish"))
if err != nil {
    log.Fatal(err)
}
if exists {
    fmt.Println(val)
}

Output:

1
2
3

Index

Examples

Package Files

automaton.go builder.go common.go decoder_v1.go encoder_v1.go encoding.go fst.go fst_iterator.go merge_iterator.go pack.go registry.go transducer.go vellum.go vellum_mmap.go writer.go

Variables

var ErrIteratorDone = errors.New("iterator-done")

ErrIteratorDone is returned by Iterator/Next/Seek methods when the Current() value pointed to by the iterator is greater than the last key in this FST, or outside the configured startKeyInclusive/endKeyExclusive range of the Iterator.

var ErrOutOfOrder = errors.New("values not inserted in lexicographic order")

ErrOutOfOrder is returned when values are not inserted in lexicographic order.

func AutomatonContains Uses

func AutomatonContains(a Automaton, k []byte) bool

AutomatonContains implements an generic Contains() method which works on any implementation of Automaton

func Merge Uses

func Merge(w io.Writer, opts *BuilderOpts, itrs []Iterator, f MergeFunc) error

Merge will iterate through the provided Iterators, merge duplicate keys with the provided MergeFunc, and build a new FST to the provided Writer.

func MergeMax Uses

func MergeMax(vals []uint64) uint64

MergeMax chooses the maximum value

func MergeMin Uses

func MergeMin(vals []uint64) uint64

MergeMin chooses the minimum value

func MergeSum Uses

func MergeSum(vals []uint64) uint64

MergeSum sums the values

func TransducerGet Uses

func TransducerGet(t Transducer, k []byte) (bool, uint64)

TransducerGet implements an generic Get() method which works on any implementation of Transducer The caller MUST check the boolean return value for a match. Zero is a valid value regardless of match status, and if it is NOT a match, the value collected so far is returned.

type AlwaysMatch Uses

type AlwaysMatch struct{}

AlwaysMatch is an Automaton implementation which always matches

func (*AlwaysMatch) Accept Uses

func (m *AlwaysMatch) Accept(int, byte) int

Accept returns the next AlwaysMatch state

func (*AlwaysMatch) CanMatch Uses

func (m *AlwaysMatch) CanMatch(int) bool

CanMatch always returns true

func (*AlwaysMatch) IsMatch Uses

func (m *AlwaysMatch) IsMatch(int) bool

IsMatch always returns true

func (*AlwaysMatch) Start Uses

func (m *AlwaysMatch) Start() int

Start returns the AlwaysMatch start state

func (*AlwaysMatch) WillAlwaysMatch Uses

func (m *AlwaysMatch) WillAlwaysMatch(int) bool

WillAlwaysMatch always returns true

type Automaton Uses

type Automaton interface {

    // Start returns the start state
    Start() int

    // IsMatch returns true if and only if the state is a match
    IsMatch(int) bool

    // CanMatch returns true if and only if it is possible to reach a match
    // in zero or more steps
    CanMatch(int) bool

    // WillAlwaysMatch returns true if and only if the current state matches
    // and will always match no matter what steps are taken
    WillAlwaysMatch(int) bool

    // Accept returns the next state given the input to the specified state
    Accept(int, byte) int
}

Automaton represents the general contract of a byte-based finite automaton

type Builder Uses

type Builder struct {
    // contains filtered or unexported fields
}

A Builder is used to build a new FST. When possible data is streamed out to the underlying Writer as soon as possible.

func New Uses

func New(w io.Writer, opts *BuilderOpts) (*Builder, error)

New returns a new Builder which will stream out the underlying representation to the provided Writer as the set is built.

func (*Builder) Close Uses

func (b *Builder) Close() error

Close MUST be called after inserting all values.

func (*Builder) Insert Uses

func (b *Builder) Insert(key []byte, val uint64) error

Insert the provided value to the set being built. NOTE: values must be inserted in lexicographical order.

func (*Builder) Reset Uses

func (b *Builder) Reset(w io.Writer) error

type BuilderOpts Uses

type BuilderOpts struct {
    Encoder           int
    RegistryTableSize int
    RegistryMRUSize   int
}

BuilderOpts is a structure to let advanced users customize the behavior of the builder and some aspects of the generated FST.

type FST Uses

type FST struct {
    // contains filtered or unexported fields
}

FST is an in-memory representation of a finite state transducer, capable of returning the uint64 value associated with each []byte key stored, as well as enumerating all of the keys in order.

func Load Uses

func Load(data []byte) (*FST, error)

Load will return the FST represented by the provided byte slice.

func Open Uses

func Open(path string) (*FST, error)

Open loads the FST stored in the provided path

func (*FST) Accept Uses

func (f *FST) Accept(addr int, b byte) int

Accept returns the next state for this Automaton on input of byte b

func (*FST) AcceptWithVal Uses

func (f *FST) AcceptWithVal(addr int, b byte) (int, uint64)

AcceptWithVal returns the next state for this Automaton on input of byte b and also returns the output value for the transition

func (*FST) CanMatch Uses

func (f *FST) CanMatch(addr int) bool

CanMatch returns if this state can ever transition to a matching state in this Automaton

func (*FST) Close Uses

func (f *FST) Close() error

Close will unmap any mmap'd data (if managed by vellum) and it will close the backing file (if managed by vellum). You MUST call Close() for any FST instance that is created.

func (*FST) Contains Uses

func (f *FST) Contains(val []byte) (bool, error)

Contains returns true if this FST contains the specified key.

func (*FST) Debug Uses

func (f *FST) Debug(callback func(int, interface{}) error) error

Debug is only intended for debug purposes, it simply asks the underlying decoder visit each state, and pass it to the provided callback.

func (*FST) Get Uses

func (f *FST) Get(input []byte) (uint64, bool, error)

Get returns the value associated with the key. NOTE: a value of zero does not imply the key does not exist, you must consult the second return value as well.

func (*FST) GetMaxKey Uses

func (f *FST) GetMaxKey() ([]byte, error)

func (*FST) GetMinKey Uses

func (f *FST) GetMinKey() ([]byte, error)

func (*FST) IsMatch Uses

func (f *FST) IsMatch(addr int) bool

IsMatch returns if this state is a matching state in this Automaton

func (*FST) IsMatchWithVal Uses

func (f *FST) IsMatchWithVal(addr int) (bool, uint64)

IsMatchWithVal returns if this state is a matching state in this Automaton and also returns the final output value for this state

func (*FST) Iterator Uses

func (f *FST) Iterator(startKeyInclusive, endKeyExclusive []byte) (*FSTIterator, error)

Iterator returns a new Iterator capable of enumerating the key/value pairs between the provided startKeyInclusive and endKeyExclusive.

func (*FST) Len Uses

func (f *FST) Len() int

Len returns the number of entries in this FST instance.

func (*FST) Reader Uses

func (f *FST) Reader() (*Reader, error)

Reader() returns a Reader instance that a single thread may use to retrieve data from the FST

func (*FST) Search Uses

func (f *FST) Search(aut Automaton, startKeyInclusive, endKeyExclusive []byte) (*FSTIterator, error)

Search returns a new Iterator capable of enumerating the key/value pairs between the provided startKeyInclusive and endKeyExclusive that also satisfy the provided automaton.

func (*FST) Start Uses

func (f *FST) Start() int

Start returns the start state of this Automaton

func (*FST) Type Uses

func (f *FST) Type() int

Type returns the type of this FST instance.

func (*FST) Version Uses

func (f *FST) Version() int

Version returns the encoding version used by this FST instance.

func (*FST) WillAlwaysMatch Uses

func (f *FST) WillAlwaysMatch(int) bool

WillAlwaysMatch returns if from this state the Automaton will always be in a matching state

type FSTIterator Uses

type FSTIterator struct {
    // contains filtered or unexported fields
}

FSTIterator is a structure for iterating key/value pairs in this FST in lexicographic order. Iterators should be constructed with the FSTIterator method on the parent FST structure.

func (*FSTIterator) Close Uses

func (i *FSTIterator) Close() error

Close will free any resources held by this iterator.

func (*FSTIterator) Current Uses

func (i *FSTIterator) Current() ([]byte, uint64)

Current returns the key and value currently pointed to by the iterator. If the iterator is not pointing at a valid value (because Iterator/Next/Seek) returned an error previously, it may return nil,0.

func (*FSTIterator) Next Uses

func (i *FSTIterator) Next() error

Next advances this iterator to the next key/value pair. If there is none or the advancement goes beyond the configured endKeyExclusive, then ErrIteratorDone is returned.

func (*FSTIterator) Reset Uses

func (i *FSTIterator) Reset(f *FST,
    startKeyInclusive, endKeyExclusive []byte, aut Automaton) error

Reset resets the Iterator' internal state to allow for iterator reuse (e.g. pooling).

func (*FSTIterator) Seek Uses

func (i *FSTIterator) Seek(key []byte) error

Seek advances this iterator to the specified key/value pair. If this key is not in the FST, Current() will return the next largest key. If this seek operation would go past the last key, or outside the configured startKeyInclusive/endKeyExclusive then ErrIteratorDone is returned.

type Iterator Uses

type Iterator interface {

    // Current() returns the key/value pair currently pointed to.
    // The []byte of the key is ONLY guaranteed to be valid until
    // another call to Next/Seek/Close.  If you need it beyond that
    // point you MUST make a copy.
    Current() ([]byte, uint64)

    // Next() advances the iterator to the next key/value pair.
    // If no more key/value pairs exist, ErrIteratorDone is returned.
    Next() error

    // Seek() advances the iterator the specified key, or the next key
    // if it does not exist.
    // If no keys exist after that point, ErrIteratorDone is returned.
    Seek(key []byte) error

    // Reset resets the Iterator' internal state to allow for iterator
    // reuse (e.g. pooling).
    Reset(f *FST, startKeyInclusive, endKeyExclusive []byte, aut Automaton) error

    // Close() frees any resources held by this iterator.
    Close() error
}

Iterator represents a means of visiting key/value pairs in order.

type MergeFunc Uses

type MergeFunc func([]uint64) uint64

MergeFunc is used to choose the new value for a key when merging a slice of iterators, and the same key is observed with multiple values. Values presented to the MergeFunc will be in the same order as the original slice creating the MergeIterator. This allows some MergeFunc implementations to prioritize one iterator over another.

type MergeIterator Uses

type MergeIterator struct {
    // contains filtered or unexported fields
}

MergeIterator implements the Iterator interface by traversing a slice of iterators and merging the contents of them. If the same key exists in mulitipe underlying iterators, a user-provided MergeFunc will be invoked to choose the new value.

func NewMergeIterator Uses

func NewMergeIterator(itrs []Iterator, f MergeFunc) (*MergeIterator, error)

NewMergeIterator creates a new MergeIterator over the provided slice of Iterators and with the specified MergeFunc to resolve duplicate keys.

func (*MergeIterator) Close Uses

func (m *MergeIterator) Close() error

Close will attempt to close all the underlying Iterators. If any errors are encountered, the first will be returned.

func (*MergeIterator) Current Uses

func (m *MergeIterator) Current() ([]byte, uint64)

Current returns the key and value currently pointed to by this iterator. If the iterator is not pointing at a valid value (because Iterator/Next/Seek) returned an error previously, it may return nil,0.

func (*MergeIterator) Next Uses

func (m *MergeIterator) Next() error

Next advances this iterator to the next key/value pair. If there is none, then ErrIteratorDone is returned.

func (*MergeIterator) Seek Uses

func (m *MergeIterator) Seek(key []byte) error

Seek advances this iterator to the specified key/value pair. If this key is not in the FST, Current() will return the next largest key. If this seek operation would go past the last key, then ErrIteratorDone is returned.

type Reader Uses

type Reader struct {
    // contains filtered or unexported fields
}

A Reader is meant for a single threaded use

func (*Reader) Get Uses

func (r *Reader) Get(input []byte) (uint64, bool, error)

type Transducer Uses

type Transducer interface {

    // all transducers are also automatons
    Automaton

    // IsMatchWithValue returns true if and only if the state is a match
    // additionally it returns a states final value (if any)
    IsMatchWithVal(int) (bool, uint64)

    // Accept returns the next state given the input to the specified state
    // additionally it returns the value associated with the transition
    AcceptWithVal(int, byte) (int, uint64)
}

Transducer represents the general contract of a byte-based finite transducer

Directories

PathSynopsis
levenshtein
regexp
utf8

Package vellum imports 10 packages (graph) and is imported by 19 packages. Updated 2019-09-02. Refresh now. Tools for package owners.