bsearch

package module
v0.4.1 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Feb 19, 2024 License: MIT Imports: 14 Imported by: 0

README

bsearch

bsearch is a go library providing binary search functionality for line-ordered byte streams (such as LC_ALL=C sorted text files). It allows very fast lookups based on line prefixes, like the venerable look(1) unix utility.

bsearch can also be used via a DB object with a simple db-like interface, allowing sorted CSV files to be used as a key-value store, with excellent performance.

bsearch currently only supports bytewise key comparisons (not UTF-8 collations). This restriction may be removed in the future.

Usage

    import "github.com/ProfoundNetworks/bsearch"

    // Create the index (also see cmd/bsearch_index for a CLI tool)
    idx, err := bsearch.NewIndex(filepath)

    // Alternatively, if you need to tweak the options
    idx, err := bsearch.NewIndexOptions(filepath, IndexOptions{Delimiter: '|'})

    // Persist the created index to disk, so that the searcher may use it
    err = idx.Write()

    // Instantiate searcher from a file
    bss, err := bsearch.NewSearcher(filepath)
    defer bss.Close()

    // Find first line beginning with searchStr
    line, err := bss.Line([]byte(searchStr))

    // Find position of first line beginning with searchStr
    pos, err := bss.LinePosition([]byte(searchStr))

    // Distinguish not found from other errors
    if err != nil && err == bsearch.ErrNotFound {
        // do something
    } else if err != nil {
        log.Fatal(err)
    }


    // Or via the `DB` interface
    db, err := bsearch.NewDB(filepath)
    defer db.Close()

    // Lookup values via byteslices or strings
    valbytes := db.Get(keybytes)
    valstr := db.GetString(keystring)

Status

bsearch is alpha software. Interfaces may break and change until an official version 1.0.0 is released.

Copyright 2020 Profound Networks LLC.

This project is licensed under the terms of the MIT license.

Documentation

Index

Constants

This section is empty.

Variables

View Source
var (
	ErrIndexNotFound     = errors.New("index file not found")
	ErrIndexExpired      = errors.New("index file out of date")
	ErrIndexEmpty        = errors.New("index contains no entries")
	ErrIndexPathMismatch = errors.New("index file path mismatch")
)
View Source
var (
	ErrFileNotFound        = errors.New("filepath not found")
	ErrNotFile             = errors.New("filepath exists but is not a file")
	ErrFileCompressed      = errors.New("filepath exists but is compressed")
	ErrNotFound            = errors.New("key not found")
	ErrKeyExceedsBlocksize = errors.New("key length exceeds blocksize")
	ErrUnknownDelimiter    = errors.New("cannot guess delimiter from filename")
)

Functions

func IndexPath

func IndexPath(path string) (string, error)

IndexPath returns the absolute filepath of the index file assocated with path

Types

type DB

type DB struct {
	// contains filtered or unexported fields
}

DB provides a simple key-value-store-like interface using bsearch.Searcher, returning the first value from path for a given key (if you need more control you're encouraged to use bsearch.Searcher directly).

func NewDB

func NewDB(path string) (*DB, error)

NewDB returns a new DB for the file at path. The caller is responsible for calling DB.Close() when finished (e.g. via defer).

func (*DB) Close

func (db *DB) Close()

Close closes our Searcher's underlying reader (if applicable)

func (*DB) Get

func (db *DB) Get(key []byte) ([]byte, error)

Get returns the (first) value associated with key in db (or ErrNotFound if missing)

func (*DB) GetSlice added in v0.4.1

func (db *DB) GetSlice(key string) ([]string, error)

GetSlice returns the (first) value associated with key in db as a string slice (read using a csv.Reader with the appropriate Delimiter) (or returns ErrNotFound if missing)

func (*DB) GetString

func (db *DB) GetString(key string) (string, error)

GetString returns the (first) value associated with key in db, as a string (or ErrNotFound if missing)

type Index

type Index struct {
	Blocksize int
	// FIXME: Delimiter should really be a rune, not an arbitrarily-length []byte
	// Can we change without bumping the index version?
	Delimiter []byte
	Epoch     int64
	// Filepath is no longer exported (it is explicitly emptied in Write()),
	// but we keep it capitalised to accept old indices that used it
	// instead of Filename
	Filepath       string `json:",omitempty"`
	Filename       string
	Header         bool
	KeysIndexFirst bool
	KeysUnique     bool
	Length         int
	List           []IndexEntry `json:"-"`
	Version        int
	HeaderFields   []string `json:",omitempty"`
	// contains filtered or unexported fields
}

Index provides index metadata for the filepath dataset

func LoadIndex

func LoadIndex(path string) (*Index, error)

LoadIndex loads Index from the associated index file for path. Returns ErrIndexNotFound if no index file exists. Returns ErrIndexExpired if path is newer than the index file. Returns ErrIndexPathMismatch if index filepath does not equal path.

func NewIndex

func NewIndex(path string) (*Index, error)

NewIndex creates a new Index for the path dataset

func NewIndexOptions

func NewIndexOptions(path string, opt IndexOptions) (*Index, error)

NewIndexOptions creates a new Index for path with delim as the delimiter

func (*Index) Write

func (i *Index) Write() error

Write writes the index to disk

type IndexEntry

type IndexEntry struct {
	Key    string
	Offset int64 // file offset for start-of-block
}

type IndexOptions

type IndexOptions struct {
	Blocksize int
	Delimiter []byte
	Header    bool
	Logger    *zerolog.Logger // debug logger
}

type Searcher

type Searcher struct {
	Index *Index // bsearch index
	// contains filtered or unexported fields
}

Searcher provides binary search functionality on byte-ordered CSV-style delimited text files.

func NewSearcher

func NewSearcher(path string) (*Searcher, error)

NewSearcher returns a new Searcher for path using default options. The caller is responsible for calling *Searcher.Close() when finished.

func NewSearcherOptions

func NewSearcherOptions(path string, opt SearcherOptions) (*Searcher, error)

NewSearcherOptions returns a new Searcher for path using opt. The caller is responsible for calling *Searcher.Close() when finished.

func (*Searcher) Close

func (s *Searcher) Close()

Close closes the searcher's reader (if applicable)

func (*Searcher) Line

func (s *Searcher) Line(key []byte) ([]byte, error)

Line returns the first line in the reader that begins with key, using a binary search (data must be bytewise-ordered).

func (*Searcher) Lines

func (s *Searcher) Lines(b []byte) ([][]byte, error)

Lines returns all lines in the reader that begin with the byte slice b, using a binary search (data must be bytewise-ordered).

func (*Searcher) LinesN

func (s *Searcher) LinesN(key []byte, n int) ([][]byte, error)

LinesN returns the first n lines in the reader that begin with key, using a binary search (data must be bytewise-ordered).

type SearcherOptions

type SearcherOptions struct {
	MatchLE bool            // use less-than-or-equal-to match semantics
	Logger  *zerolog.Logger // debug logger
	// Index options (used to check index or build new one)
	Delimiter []byte // delimiter separating fields in dataset
	Header    bool   // first line of dataset is header and should be ignored
}

SearcherOptions struct for use with NewSearcherOptions

Directories

Path Synopsis
cmd

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL