disktable

v0.0.0-...-423a4a7
Warning

This package is not in the latest version of its module.

Published: Oct 1, 2022 License: MIT Imports: 22 Imported by: 0

README

DiskTable

NOTE

This disk format is not stable yet, nor is the API. So unless you can afford to erase the data on a new build, don't use this.

If you simply want to use it as a disk cache that you create on startup, it should be fine. But I make no guarantees.

Introduction

I needed a very basic local NoSQL store that I could embed in my Go program, allowing me to serve data off disk with some semblance of speed.

It needed to be:

  • Write once
  • Read many
  • Support a main data repo
  • Support a bunch of indexes on the data
  • Support duplicates in indexes
  • Lookup by indexes
  • Stream all data fast
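To make the requirements above concrete, here is a toy in-memory model of that shape: a write-once main data repo addressed by insertion order, plus named indexes that map an exact key to the IDs of matching entries, with a per-index AllowDuplicates flag. Every name in it (table, newTable, insert, lookup) is hypothetical, and it is only a sketch under those assumptions; it is not disktable's on-disk format, which lives in badger.

```go
package main

import "fmt"

// table is a hypothetical in-memory model of the layout described above.
// It illustrates the idea only; it is not disktable's on-disk format.
type table struct {
	data    [][]byte                       // main data repo; an entry's ID is its position
	indexes map[string]map[string][]uint64 // index name -> exact key -> entry IDs
	dups    map[string]bool                // index name -> AllowDuplicates
}

func newTable(dups map[string]bool) *table {
	idx := make(map[string]map[string][]uint64, len(dups))
	for name := range dups {
		idx[name] = map[string][]uint64{}
	}
	return &table{indexes: idx, dups: dups}
}

// insert appends value once (write once) and records its index keys.
// All checks run before anything is stored, so a failed insert leaves
// the table unchanged.
func (t *table) insert(value []byte, keys map[string]string) error {
	id := uint64(len(t.data))
	for name, key := range keys {
		idx, ok := t.indexes[name]
		if !ok {
			return fmt.Errorf("unknown index %q", name)
		}
		if !t.dups[name] && len(idx[key]) > 0 {
			return fmt.Errorf("duplicate key %q in index %q", key, name)
		}
	}
	for name, key := range keys {
		t.indexes[name][key] = append(t.indexes[name][key], id)
	}
	t.data = append(t.data, value)
	return nil
}

// lookup returns every value whose key in the named index matches exactly.
func (t *table) lookup(index, key string) [][]byte {
	var out [][]byte
	for _, id := range t.indexes[index][key] {
		out = append(out, t.data[id])
	}
	return out
}

func main() {
	t := newTable(map[string]bool{"First Name": true, "ID": false})
	_ = t.insert([]byte("John Smith"), map[string]string{"First Name": "John", "ID": "1"})
	_ = t.insert([]byte("John Doe"), map[string]string{"First Name": "John", "ID": "2"})
	err := t.insert([]byte("Jane Roe"), map[string]string{"First Name": "Jane", "ID": "2"})
	fmt.Println(len(t.lookup("First Name", "John")), err) // 2 duplicate key "2" in index "ID"
}
```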

These disktables are built off of outcaste.io/dgraph.

Now, this lacks SQL-like characteristics. It is simply a set of key/value stores that let you do exact matches on indexes in order to find matching data. So I can say things like: "find cars that are blue with a v8 and made by chevy".
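A query like the car example is an AND of several exact-match lookups. One common way to evaluate that, shown here as a sketch, is to treat each lookup's result as a sorted list of entry IDs and intersect the lists. The posting-list representation and the helper names (intersect, matchAll) are assumptions for illustration; the README does not say how disktable combines lookups internally.

```go
package main

import "fmt"

// intersect returns the IDs present in both sorted, de-duplicated lists.
func intersect(a, b []uint64) []uint64 {
	var out []uint64
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch {
		case a[i] < b[j]:
			i++
		case a[i] > b[j]:
			j++
		default:
			out = append(out, a[i])
			i++
			j++
		}
	}
	return out
}

// matchAll ANDs together the ID lists produced by several exact-match lookups.
func matchAll(lists ...[]uint64) []uint64 {
	if len(lists) == 0 {
		return nil
	}
	result := lists[0]
	for _, l := range lists[1:] {
		result = intersect(result, l)
	}
	return result
}

func main() {
	color := []uint64{1, 3, 4, 7}  // entries with Color == "blue"
	engine := []uint64{3, 4, 5, 7} // entries with Engine == "v8"
	maker := []uint64{2, 4, 7}     // entries with Make == "chevy"
	fmt.Println(matchAll(color, engine, maker)) // [4 7]
}
```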

In the future I may add features such as searching by prefix, but those aren't available today.

Documentation

Overview

Package disktable provides a write-once, read-many table with index support. It is built on top of BadgerDB, which is essentially a key/value SSTable storage mechanism.

Let's create a table with some data:

dir := filepath.Join(os.TempDir(), "your_table")
// Remove it if it exists. You may or may not want to do this, but you cannot
// create a table in a directory that already exists.
os.RemoveAll(dir)

// These are our indexes on the data. AllowDuplicates allows duplicate entries
// in the index.
indexes := NewIndexes(
	&Index{Name: "First Name", AllowDuplicates: true},
	&Index{Name: "Last Name", AllowDuplicates: true},
	&Index{Name: "ID"},
)

w, err := New(dir, WithIndexes(indexes))
if err != nil {
	panic(err)
}

for _, data := range someData {
	b, err := proto.Marshal(data)
	if err != nil {
		panic(err)
	}

	insert := indexes.Insert(b).AddIndexKey(
		"First Name", UnsafeGetBytes(data.First),
	).AddIndexKey(
		"Last Name", UnsafeGetBytes(data.Last),
	).AddIndexKey(
		"ID", NumToByte(data.ID),
	)

	if err = w.WriteData(insert); err != nil {
		panic(err)
	}
}

if err := w.Close(); err != nil {
	panic(err)
}

Now let's open it and stream all records:

table, err := Open(dir)
if err != nil {
	panic(err)
}
defer table.Close()

ctx := context.Background()
results, err := table.FetchAll(ctx)
if err != nil {
	panic(err)
}

for result := range results {
	if result.Err != nil {
		panic(result.Err)
	}

	entry := &pb.MyData{}
	// proto.Unmarshal takes the encoded bytes first, then the message.
	if err := proto.Unmarshal(result.Value, entry); err != nil {
		panic(err)
	}

	fmt.Println("found: ", pretty.Sprint(entry))
}

Let's look for all entries that have the first name John:

results, err := table.Fetch(
	ctx,
	Lookup{IndexName: "First Name", Key: UnsafeGetBytes("John")},
)

if err != nil {
	panic(err)
}

for result := range results {
	if result.Err != nil {
		panic(result.Err)
	}

	entry := &pb.MyData{}
	if err := proto.Unmarshal(result.Value, entry); err != nil {
		panic(err)
	}

	fmt.Println("found: ", pretty.Sprint(entry))
}

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func ByteSlice2String

func ByteSlice2String(bs []byte) string

ByteSlice2String converts a []byte to a string without incurring the cost of copying the given []byte parameter. This is an unsafe operation and requires that you never modify the []byte slice you passed in.

func ByteToNum

func ByteToNum[N Number](b []byte) (N, error)

ByteToNum decodes the number of type N stored in b. The number must be encoded in big-endian byte order, usually by NumToByte().
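A minimal decoding sketch using encoding/binary follows. It assumes a hypothetical FixedNumber constraint that drops int and uint, whose size depends on the platform, so it covers less than the package's Number constraint. The explicit length check is also an assumption about sensible behavior, not documented behavior of ByteToNum.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// FixedNumber is a hypothetical constraint for the fixed-size types that
// encoding/binary can decode directly.
type FixedNumber interface {
	~uint8 | ~uint16 | ~uint32 | ~uint64 |
		~int8 | ~int16 | ~int32 | ~int64 |
		~float32 | ~float64
}

// byteToNum decodes a big-endian value of type N from b, rejecting input
// whose length does not match the size of N.
func byteToNum[N FixedNumber](b []byte) (N, error) {
	var n N
	if size := binary.Size(n); len(b) != size {
		return n, fmt.Errorf("need %d bytes, got %d", size, len(b))
	}
	err := binary.Read(bytes.NewReader(b), binary.BigEndian, &n)
	return n, err
}

func main() {
	v, err := byteToNum[uint32]([]byte{0x00, 0x00, 0x01, 0x00})
	fmt.Println(v, err) // 256 <nil>

	_, err = byteToNum[uint64]([]byte{0x01})
	fmt.Println(err) // need 8 bytes, got 1
}
```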

func NumStreamGoroutines

func NumStreamGoroutines(n int) interface {
	FetchAllOption
	calloptions.CallOption
}

NumStreamGoroutines sets the number of goroutines to be used in FetchAll(). By default this is 16.

func NumToByte

func NumToByte[N Number](n N) []byte

NumToByte converts a number into a BigEndian []byte sequence.

func UnsafeGetBytes

func UnsafeGetBytes(s string) []byte

UnsafeGetBytes retrieves the underlying []byte held in string "s" without doing a copy. Do not modify the []byte or suffer the consequences.

func WithInMemory

func WithInMemory() interface {
	WriteOption
	calloptions.CallOption
}

WithInMemory causes the DB to run from memory with no disk persistence. Great for tests. Can be used with:

  • New()

func WithIndexes

func WithIndexes(indexes Indexes) interface {
	WriteOption
	calloptions.CallOption
}

WithIndexes provides the indexes that will be used on this database. Can be used with:

  • New()

func WithLogger

func WithLogger(l badger.Logger) interface {
	WriteOption
	OpenOption
	calloptions.CallOption
}

WithLogger sets the logger for badger. By default, log output is discarded. Can be used with:

  • New()
  • Open()

Types

type FetchAllOption

type FetchAllOption interface {
	// contains filtered or unexported methods
}

type Index

type Index struct {
	// Name of the index. This must be unique.
	Name string
	// AllowDuplicates indicates if this index allows duplicate keys for the index.
	AllowDuplicates bool
	// contains filtered or unexported fields
}

Index represents an index on our database.

type Indexes

type Indexes struct {
	Err error
	// contains filtered or unexported fields
}

func NewIndexes

func NewIndexes(indexes ...*Index) Indexes

func (Indexes) Insert

func (i Indexes) Insert(value []byte) Insert

Insert creates an Insert type that can be used to write data to the database. See Insert for more information.

type Insert

type Insert struct {
	Err error
	// contains filtered or unexported fields
}

Insert represents a data insert into the table and is created from Indexes. You must use Insert.AddIndexKey() to add all index keys defined in Indexes.
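The rule that every declared index needs a key can be checked before writing. The sketch below is a hypothetical validation step (checkInsert is not part of the package) that reports missing keys and keys for undeclared indexes together, using errors.Join from Go 1.20. How disktable actually reports these cases, for example through the Insert.Err field, is not documented here.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// checkInsert verifies that every declared index has a key and that no key
// names an undeclared index. It sketches the stated rule only.
func checkInsert(declared []string, keys map[string][]byte) error {
	var errs []error
	known := make(map[string]bool, len(declared))
	for _, name := range declared {
		known[name] = true
		if _, ok := keys[name]; !ok {
			errs = append(errs, fmt.Errorf("missing key for index %q", name))
		}
	}
	var extra []string
	for name := range keys {
		if !known[name] {
			extra = append(extra, name)
		}
	}
	sort.Strings(extra) // map iteration order is random; keep errors stable
	for _, name := range extra {
		errs = append(errs, fmt.Errorf("unknown index %q", name))
	}
	return errors.Join(errs...) // nil when errs is empty
}

func main() {
	declared := []string{"First Name", "Last Name", "ID"}
	err := checkInsert(declared, map[string][]byte{
		"First Name": []byte("John"),
		"ID":         {0, 0, 0, 1},
	})
	fmt.Println(err) // missing key for index "Last Name"
}
```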

func NewInsert

func NewInsert(value []byte) Insert

NewInsert creates a new Insert for writing into the table. It is only used when there are no indexes defined on the table; otherwise you must use Indexes.Insert().

func (Insert) AddIndexKey

func (i Insert) AddIndexKey(indexName string, key []byte) Insert

AddIndexKey adds a key for a given index. You must capture the returned Insert as AddIndexKey() does not have a pointer receiver.
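Why the returned Insert must be captured: a method with a value receiver works on a copy, so anything it appends or sets is lost unless the caller keeps the result. The sketch uses a hypothetical insert type with the same calling pattern; its fields are invented, since Insert's internals are unexported.

```go
package main

import (
	"errors"
	"fmt"
)

// insert is a hypothetical stand-in for disktable's Insert.
type insert struct {
	value []byte
	names []string
	keys  [][]byte
	err   error
}

// addIndexKey has a value receiver: it modifies and returns a copy.
func (i insert) addIndexKey(name string, key []byte) insert {
	if name == "" {
		i.err = errors.New("empty index name")
		return i
	}
	i.names = append(i.names, name)
	i.keys = append(i.keys, key)
	return i
}

func main() {
	ins := insert{value: []byte("row")}

	ins.addIndexKey("ID", []byte{1}) // WRONG: the updated copy is discarded
	fmt.Println(len(ins.names))      // 0

	ins = ins.addIndexKey("ID", []byte{1}) // right: capture the result
	fmt.Println(len(ins.names))            // 1
}
```

Chaining, as in the package's example (`indexes.Insert(b).AddIndexKey(...).AddIndexKey(...)`), captures each intermediate result automatically.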

type Lookup

type Lookup struct {
	// IndexName is the name of the index to do the lookup in.
	IndexName string
	// Key is the key in the index to lookup.
	Key []byte
}

Lookup provides the index name and the key that must match for an entry to be returned.

type Number

type Number interface {
	~uint | ~uint8 | ~uint16 | ~uint32 | ~uint64 |
		~int | ~int8 | ~int16 | ~int32 | ~int64 |
		~float32 | ~float64
}

Number represents any uint*, int* or float* type.

type OpenOption

type OpenOption interface {
	// contains filtered or unexported methods
}

OpenOption is an optional argument for Open().

type Result

type Result struct {
	Value []byte
	Err   error
}

Result is the result of a table lookup.

type Table

type Table struct {
	// contains filtered or unexported fields
}

Table represents our read-only table.

func Open

func Open(pathDir string, options ...OpenOption) (*Table, error)

Open opens an existing disktable for reading.

func (*Table) Close

func (t *Table) Close() error

Close closes all the databases.

func (*Table) Fetch

func (t *Table) Fetch(ctx context.Context, primary Lookup, secondaries ...Lookup) (chan Result, error)

Fetch retrieves specific rows that match all index lookups. You cannot currently specify multiple lookups on the same index. If you wish to fetch all rows, use FetchAll(). Here is an example:

results, err := table.Fetch(
	ctx,
	Lookup{IndexName: "First Name", Key: UnsafeGetBytes("John")},
)

if err != nil {
	panic(err)
}

for result := range results {
	if result.Err != nil {
		panic(result.Err)
	}

	entry := &pb.MyData{}
	if err := proto.Unmarshal(result.Value, entry); err != nil {
		panic(err)
	}

	fmt.Println("found: ", pretty.Sprint(entry))
}
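Because Fetch does not accept two lookups on the same index, a caller might check its arguments up front. The sketch below is a hypothetical pre-flight check (checkLookups is not a package function) over a Lookup type that mirrors the documented fields.

```go
package main

import "fmt"

// Lookup mirrors the documented disktable.Lookup fields.
type Lookup struct {
	IndexName string
	Key       []byte
}

// checkLookups rejects argument lists that use the same index twice.
func checkLookups(primary Lookup, secondaries ...Lookup) error {
	seen := map[string]bool{primary.IndexName: true}
	for _, l := range secondaries {
		if seen[l.IndexName] {
			return fmt.Errorf("index %q used in more than one lookup", l.IndexName)
		}
		seen[l.IndexName] = true
	}
	return nil
}

func main() {
	first := Lookup{IndexName: "First Name", Key: []byte("John")}
	last := Lookup{IndexName: "Last Name", Key: []byte("Smith")}
	fmt.Println(checkLookups(first, last)) // <nil>
	fmt.Println(checkLookups(first, Lookup{IndexName: "First Name", Key: []byte("Jane")}))
}
```

To find "John" or "Jane", one option under the current restriction would be to call Fetch once per key and combine the two result streams yourself.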

func (*Table) FetchAll

func (t *Table) FetchAll(ctx context.Context, options ...FetchAllOption) (chan Result, error)

FetchAll fetches all of the table's entries.

func (*Table) Get

func (t *Table) Get(ctx context.Context, i uint64) ([]byte, error)

Get gets the i'th entry stored in the table.

func (*Table) Len

func (t *Table) Len() uint64

Len returns the number of entries in the table.

type WriteOption

type WriteOption interface {
	// contains filtered or unexported methods
}

WriteOption is an optional argument for New().

type Writer

type Writer struct {
	// contains filtered or unexported fields
}

Writer represents our disk database.

func New

func New(dirPath string, options ...WriteOption) (*Writer, error)

New creates a new instance of our table store. "dirPath" is the path to a directory that will be created. This must not already exist.

func (*Writer) Close

func (d *Writer) Close() error

Close closes out the Writer.

func (*Writer) Flatten

func (d *Writer) Flatten()

Flatten flattens the LSM tree.

func (*Writer) GC

func (d *Writer) GC(discardRatio float64)

GC does garbage collection on the value log. For details on what it does, see badger.DB.RunValueLogGC(). A discardRatio of 0 is treated as 0.5.
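The default can be sketched together with the calling pattern badger's documentation suggests: one RunValueLogGC call removes at most one log file, so callers typically repeat it while it returns nil. The valueLogGC interface matches the signature of badger's (*DB).RunValueLogGC so a fake can stand in for a database; whether disktable's GC loops like this is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// valueLogGC matches the signature of badger's (*DB).RunValueLogGC.
type valueLogGC interface {
	RunValueLogGC(discardRatio float64) error
}

// gc applies the documented default (0 means 0.5) and repeats GC until
// the database reports that nothing more was rewritten.
func gc(db valueLogGC, discardRatio float64) int {
	if discardRatio == 0 {
		discardRatio = 0.5
	}
	runs := 0
	for db.RunValueLogGC(discardRatio) == nil {
		runs++
	}
	return runs
}

// fakeDB succeeds `left` times, then returns an error.
type fakeDB struct {
	left  int     // successful runs remaining
	ratio float64 // last ratio seen
}

func (f *fakeDB) RunValueLogGC(r float64) error {
	f.ratio = r
	if f.left == 0 {
		return errors.New("no rewrite")
	}
	f.left--
	return nil
}

func main() {
	db := &fakeDB{left: 2}
	fmt.Println(gc(db, 0), db.ratio) // 2 0.5
}
```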

func (*Writer) WriteData

func (d *Writer) WriteData(insert Insert) error

WriteData writes data to our database. Indexes must be in the same order as when you created this DB, and there must be the same number of them. You cannot reuse any "value" or "indexValues" passed in until all data has been written, because a single WriteData() call does not by itself cause the data to be written.
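The reuse restriction is the usual slice-aliasing hazard with buffered writes: if a writer keeps the slices it is given and flushes later, reusing one buffer makes every pending entry point at the same bytes. The batch type below is a hypothetical model of such a buffer, not WriteData itself. The same caution applies to keys produced by UnsafeGetBytes.

```go
package main

import "fmt"

// batch is a hypothetical write buffer that keeps the slices it is given
// and only writes them out later.
type batch struct{ pending [][]byte }

func (b *batch) add(value []byte) { b.pending = append(b.pending, value) }

func main() {
	var bad batch
	buf := make([]byte, 1)
	for _, c := range []byte("abc") {
		buf[0] = c // BUG: reuses the same backing array every iteration
		bad.add(buf)
	}
	fmt.Printf("%s %s %s\n", bad.pending[0], bad.pending[1], bad.pending[2]) // c c c

	var ok batch
	for _, c := range []byte("abc") {
		ok.add([]byte{c}) // a fresh slice per value is safe
	}
	fmt.Printf("%s %s %s\n", ok.pending[0], ok.pending[1], ok.pending[2]) // a b c
}
```

When a buffer must be reused, copying each value first (for example with bytes.Clone in Go 1.20+) avoids the problem.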

Directories

Path Synopsis
testing
