database

package

v0.1.3 (latest) Published: Sep 10, 2021 License: Apache-2.0 Imports: 23 Imported by: 0

Repository

github.com/kokes/smda

Documentation ¶

Index ¶

func CacheIncomingFile(r io.Reader, path string) error
type Config
type Database
- func NewDatabase(wdir string, overrides *Config) (*Database, error)
type Dataset
- func NewDataset(name string) *Dataset
type ObjectType
type RowReader
- func NewRowReader(r io.Reader, settings *loadSettings) (RowReader, error)
type Stripe
type StripeReader
- func NewStripeReader(db *Database, ds *Dataset, stripe Stripe) (*StripeReader, error)
- func (sr *StripeReader) Close() error
- func (sr *StripeReader) ReadColumn(nthColumn int) (column.Chunk, error)
type UID
- func UIDFromHex(data []byte) (UID, error)

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

func CacheIncomingFile ¶

func CacheIncomingFile(r io.Reader, path string) error

CacheIncomingFile saves data from a given reader to a file at the given path.
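As a rough illustration of the reader-to-file pattern described here, the sketch below streams a reader into a newly created file. It is not the package's implementation: `cacheToFile` and `roundTrip` are hypothetical names, and the choice of `os.Create` plus `io.Copy` is an assumption.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"path/filepath"
	"strings"
)

// cacheToFile is a hypothetical stand-in for CacheIncomingFile: it streams
// everything from r into a newly created file at path.
func cacheToFile(r io.Reader, path string) (err error) {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	// Surface a failed Close only if nothing else went wrong first.
	defer func() {
		if cerr := f.Close(); err == nil {
			err = cerr
		}
	}()
	_, err = io.Copy(f, r)
	return err
}

// roundTrip writes payload to a temporary file and reads it back.
func roundTrip(payload string) (string, error) {
	dir, err := os.MkdirTemp("", "cache-sketch")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "incoming.csv")
	if err := cacheToFile(strings.NewReader(payload), path); err != nil {
		return "", err
	}
	data, err := os.ReadFile(path)
	return string(data), err
}

func main() {
	got, err := roundTrip("a,b\n1,2\n")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%q\n", got)
}
```

Streaming with `io.Copy` avoids holding the whole upload in memory, which matters when the incoming file is large.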

Types ¶

type Config ¶

type Config struct {
	WorkingDirectory  string `json:"-"` // not exposing this in our json representation as the db can be moved around
	CreatedTimestamp  int64  `json:"created_timestamp"`
	DatabaseID        UID    `json:"database_id"`
	MaxRowsPerStripe  int    `json:"max_rows_per_stripe"`
	MaxBytesPerStripe int    `json:"max_bytes_per_stripe"`
}

Config sets some high level properties for a new Database. It's useful for testing or for passing settings based on cli flags.
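To show what the struct tags above imply for the JSON representation, here is a small sketch using a local mirror of `Config`. The mirror type `config` and the helper `encode` are hypothetical, and `DatabaseID` is simplified to a plain string because `UID` has its own marshaling.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// config mirrors the fields of Config for illustration only.
// DatabaseID is a string here (an assumption made to keep the sketch standalone).
type config struct {
	WorkingDirectory  string `json:"-"` // skipped by encoding/json
	CreatedTimestamp  int64  `json:"created_timestamp"`
	DatabaseID        string `json:"database_id"`
	MaxRowsPerStripe  int    `json:"max_rows_per_stripe"`
	MaxBytesPerStripe int    `json:"max_bytes_per_stripe"`
}

// encode returns the JSON form of c as a string.
func encode(c config) (string, error) {
	b, err := json.Marshal(c)
	return string(b), err
}

func main() {
	out, err := encode(config{
		WorkingDirectory: "/tmp/smda", // will not appear in the output
		MaxRowsPerStripe: 1000,
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```

Because the working directory never reaches the serialized form, a database directory can be moved without the stored configuration going stale.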

type Database ¶

type Database struct {
	sync.Mutex
	Datasets    []*Dataset
	ServerHTTP  *http.Server
	ServerHTTPS *http.Server
	Config      *Config
}

Database is the main struct that contains it all, notably the datasets' metadata and the webserver. Having the webserver here makes it convenient for testing: we can spawn new servers at a moment's notice.

func NewDatabase ¶

func NewDatabase(wdir string, overrides *Config) (*Database, error)

NewDatabase initiates a new database object and binds it to a given directory. If the directory doesn't exist, it creates it. If it exists, it loads the data contained within.

func (*Database) AddDataset ¶

func (db *Database) AddDataset(ds *Dataset) error

AddDataset adds a Dataset to a Database. This is a pretty rare event, so we don't expect much contention; the locking is just there to avoid some issues when marshaling the object around in the API etc.
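The locking mentioned here can be pictured with the embedded `sync.Mutex` that `Database` declares. The following is a minimal sketch under that assumption, with hypothetical lowercase types standing in for the real ones; it does not reproduce whatever else AddDataset does.

```go
package main

import (
	"fmt"
	"sync"
)

// dataset and database are simplified, hypothetical stand-ins.
type dataset struct{ Name string }

type database struct {
	sync.Mutex
	Datasets []*dataset
}

// addDataset appends under the embedded mutex so concurrent callers
// never race on the slice.
func (db *database) addDataset(ds *dataset) {
	db.Lock()
	defer db.Unlock()
	db.Datasets = append(db.Datasets, ds)
}

// addConcurrently adds n datasets from n goroutines and returns how many
// ended up in the slice.
func addConcurrently(n int) int {
	db := &database{}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			db.addDataset(&dataset{Name: fmt.Sprintf("ds%d", i)})
		}(i)
	}
	wg.Wait()
	return len(db.Datasets)
}

func main() {
	fmt.Println(addConcurrently(100))
}
```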

func (*Database) DatasetPath ¶

func (db *Database) DatasetPath(ds *Dataset) string

DatasetPath returns the path of a given dataset (all the stripes are there). ARCH: consider merging this with dataPath based on a nullable dataset argument (like manifestPath).

func (*Database) Drop ¶

func (db *Database) Drop() error

Drop deletes all local data for a given Database

func (*Database) GetDataset ¶

func (db *Database) GetDataset(name, version string, latest bool) (*Dataset, error)

func (*Database) GetDatasetByVersion ¶ added in v0.1.3

func (db *Database) GetDatasetByVersion(name, version string) (*Dataset, error)

GetDataset retrieves a dataset based on its UID. OPTIM: not efficient in this implementation, but we don't have a map-like structure to store our datasets. We keep them in a slice so that we have predictable order, which means we would need a sorted map.

func (*Database) GetDatasetLatest ¶ added in v0.1.3

func (db *Database) GetDatasetLatest(name string) (*Dataset, error)

func (*Database) LoadDatasetFromMap ¶

func (db *Database) LoadDatasetFromMap(name string, data map[string][]string) (*Dataset, error)

LoadDatasetFromMap allows for an easy setup of a new dataset, mostly useful for tests. It converts this map into an in-memory CSV file and passes it to our usual routines. OPTIM: the underlying call (LoadDatasetFromReaderAuto) caches this raw data on disk, which may be unnecessary.
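The map-to-CSV conversion could look roughly like the sketch below. The helper `mapToCSV` is hypothetical, and two details are assumptions rather than documented behavior: columns are emitted in sorted name order (Go map iteration order is not stable), and columns of unequal length are rejected.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
	"sort"
)

// mapToCSV renders a column-name -> values map as an in-memory CSV file
// with a header row.
func mapToCSV(data map[string][]string) (*bytes.Buffer, error) {
	headers := make([]string, 0, len(data))
	for name := range data {
		headers = append(headers, name)
	}
	sort.Strings(headers) // assumption: deterministic, sorted column order

	nrows := 0
	if len(headers) > 0 {
		nrows = len(data[headers[0]])
	}
	for _, h := range headers {
		if len(data[h]) != nrows {
			return nil, fmt.Errorf("column %q has %d values, expected %d", h, len(data[h]), nrows)
		}
	}

	buf := new(bytes.Buffer)
	w := csv.NewWriter(buf)
	if err := w.Write(headers); err != nil {
		return nil, err
	}
	for i := 0; i < nrows; i++ {
		row := make([]string, len(headers))
		for j, h := range headers {
			row[j] = data[h][i]
		}
		if err := w.Write(row); err != nil {
			return nil, err
		}
	}
	w.Flush()
	return buf, w.Error()
}

func main() {
	buf, err := mapToCSV(map[string][]string{
		"name": {"ada", "grace"},
		"age":  {"36", "45"},
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(buf.String())
}
```

The resulting buffer is an `io.Reader`, so it can be handed to any loader that accepts one, such as a call shaped like LoadDatasetFromReaderAuto.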

func (*Database) LoadDatasetFromReaderAuto ¶

func (db *Database) LoadDatasetFromReaderAuto(name string, r io.Reader) (*Dataset, error)

LoadDatasetFromReaderAuto loads data from a reader and returns a Dataset

func (*Database) LoadSampleData ¶

func (db *Database) LoadSampleData(sampleDir fs.FS) error

LoadSampleData reads all CSVs from a given directory and loads them up into the database using default settings
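Since the sample directory is passed as an `fs.FS`, it can be an on-disk directory, an embedded one, or an in-memory one in tests. The sketch below shows one way to enumerate and parse top-level CSV files from such a filesystem; `countCSVRows` and `demoFS` are hypothetical helpers, and counting records stands in for the actual loading step.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
	"io/fs"
	"testing/fstest"
)

// countCSVRows finds every *.csv file at the top level of dir and returns
// the number of records (header included) in each.
func countCSVRows(dir fs.FS) (map[string]int, error) {
	names, err := fs.Glob(dir, "*.csv")
	if err != nil {
		return nil, err
	}
	counts := make(map[string]int, len(names))
	for _, name := range names {
		data, err := fs.ReadFile(dir, name)
		if err != nil {
			return nil, fmt.Errorf("reading %s: %w", name, err)
		}
		rows, err := csv.NewReader(bytes.NewReader(data)).ReadAll()
		if err != nil {
			return nil, fmt.Errorf("parsing %s: %w", name, err)
		}
		counts[name] = len(rows)
	}
	return counts, nil
}

// demoFS builds a small in-memory filesystem for the example.
func demoFS() fs.FS {
	return fstest.MapFS{
		"people.csv": {Data: []byte("name,age\nada,36\n")},
		"notes.txt":  {Data: []byte("not a csv")},
	}
}

func main() {
	counts, err := countCSVRows(demoFS())
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(counts)
}
```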

func (*Database) ReadColumnsFromStripeByNames ¶

func (db *Database) ReadColumnsFromStripeByNames(ds *Dataset, stripe Stripe, columns []string) (map[string]column.Chunk, int, error)

OPTIM: perhaps reorder the column requests so that they are contiguous, or at least in order. Also add a benchmark that reads columns in reverse to see if we get any benefits from this.

type Dataset ¶

type Dataset struct {
	ID   UID    `json:"id"`
	Name string `json:"name"`
	// ARCH: move the next three to a `Meta` struct?
	Created int64 `json:"created_timestamp"`
	NRows   int64 `json:"nrows"`
	// ARCH: note that we'd ideally get this as the uncompressed size... might be tricky to get
	SizeRaw    int64 `json:"size_raw"`
	SizeOnDisk int64 `json:"size_on_disk"`

	Schema column.TableSchema `json:"schema"`
	// TODO/OPTIM: we need the following for manifests, but it's unnecessary for writing in our
	// web requests - remove it from there
	Stripes []Stripe `json:"stripes"`
}

Dataset contains metadata for a given dataset, which at this point means a table

func NewDataset ¶

func NewDataset(name string) *Dataset

NewDataset creates a new empty dataset

type ObjectType ¶

type ObjectType uint8

ObjectType denotes what type an object is (or its ID) - dataset, stripe etc.

const (
	OtypeNone ObjectType = iota
	OtypeDatabase
	OtypeDataset
	OtypeStripe
)

Object types are reflected in the UID: the first two hex characters define the object type, so it's clear what sort of object you're dealing with based on its prefix.
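The prefix scheme can be sketched as follows. The type byte and constant order follow the listing above, but everything else is an assumption: the helper names are hypothetical, and the payload length and contents of a real UID are not documented here.

```go
package main

import (
	"encoding/hex"
	"errors"
	"fmt"
)

type objectType uint8

const (
	otypeNone objectType = iota
	otypeDatabase
	otypeDataset
	otypeStripe
)

// encodeUID hex-encodes the type byte followed by an opaque payload, so the
// first two hex characters always identify the object type.
func encodeUID(t objectType, payload []byte) string {
	raw := append([]byte{byte(t)}, payload...)
	return hex.EncodeToString(raw)
}

// typeFromHex recovers the object type from the two-character prefix.
func typeFromHex(s string) (objectType, error) {
	if len(s) < 2 {
		return otypeNone, errors.New("uid too short")
	}
	b, err := hex.DecodeString(s[:2])
	if err != nil {
		return otypeNone, err
	}
	return objectType(b[0]), nil
}

func main() {
	id := encodeUID(otypeDataset, []byte{0xab, 0xcd})
	t, err := typeFromHex(id)
	fmt.Println(id, t, err)
}
```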

type RowReader ¶

type RowReader interface {
	// ARCH: consider making it a ([]byte, error) as soon as we
	//       1) have csv.NewReader with []byte support
	//       2) have strconv.ParseXXX with []byte support (will come with generics?)
	ReadRow() ([]string, error)
}

func NewRowReader ¶

func NewRowReader(r io.Reader, settings *loadSettings) (RowReader, error)

NewRowReader creates a new RowReader based on loadSettings passed in - e.g. if there's a delimiter specified, it will likely create a csvReader etc.
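Because `loadSettings` is unexported, the sketch below only illustrates the shape of the idea: an interface with a single `ReadRow` method and a CSV-backed implementation chosen by delimiter. All names are hypothetical, and passing the delimiter directly is an assumption standing in for the settings struct.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"strings"
)

// rowReader mirrors the RowReader interface.
type rowReader interface {
	ReadRow() ([]string, error)
}

// csvRowReader adapts encoding/csv to the rowReader interface.
type csvRowReader struct{ cr *csv.Reader }

func (c *csvRowReader) ReadRow() ([]string, error) { return c.cr.Read() }

// newCSVRowReader builds a delimiter-aware reader.
func newCSVRowReader(r io.Reader, delimiter rune) rowReader {
	cr := csv.NewReader(r)
	cr.Comma = delimiter
	return &csvRowReader{cr: cr}
}

// readAll drains a rowReader until io.EOF.
func readAll(rr rowReader) ([][]string, error) {
	var rows [][]string
	for {
		row, err := rr.ReadRow()
		if err == io.EOF {
			return rows, nil
		}
		if err != nil {
			return nil, err
		}
		rows = append(rows, row)
	}
}

// parse is a convenience wrapper for string input.
func parse(input string, delimiter rune) ([][]string, error) {
	return readAll(newCSVRowReader(strings.NewReader(input), delimiter))
}

func main() {
	rows, err := parse("a;b\n1;2\n", ';')
	fmt.Println(rows, err)
}
```

Callers that depend only on the interface can consume other row-oriented formats later without changes, as long as each format provides its own `ReadRow`.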

type Stripe ¶

type Stripe struct {
	Id      UID      `json:"id"`
	Length  int      `json:"length"`
	Offsets []uint32 `json:"offsets"`
}

Stripe only contains metadata about a given stripe; it has to be loaded separately to obtain actual data.
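One plausible reading of the `Offsets` field is that it records byte boundaries of columns inside the stripe's data, so a single column can be fetched without reading the rest. The page does not confirm this layout, so treat the sketch below purely as an assumption: column n is taken to span `offsets[n]` to `offsets[n+1]`, and both helper names are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"strings"
)

// readColumnBytes returns the raw bytes of the nth column, assuming offsets
// holds len(columns)+1 ascending byte boundaries (an assumed layout).
func readColumnBytes(r io.ReaderAt, offsets []uint32, nthColumn int) ([]byte, error) {
	if nthColumn < 0 || nthColumn+1 >= len(offsets) {
		return nil, fmt.Errorf("column %d out of range", nthColumn)
	}
	start, end := int64(offsets[nthColumn]), int64(offsets[nthColumn+1])
	if end < start {
		return nil, errors.New("offsets are not ascending")
	}
	buf := make([]byte, end-start)
	section := io.NewSectionReader(r, start, end-start)
	if _, err := io.ReadFull(section, buf); err != nil {
		return nil, err
	}
	return buf, nil
}

// columnFromBlob runs readColumnBytes against an in-memory blob.
func columnFromBlob(blob string, offsets []uint32, n int) (string, error) {
	b, err := readColumnBytes(strings.NewReader(blob), offsets, n)
	return string(b), err
}

func main() {
	col, err := columnFromBlob("aaabbcccc", []uint32{0, 3, 5, 9}, 1)
	fmt.Printf("%q %v\n", col, err)
}
```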

type StripeReader ¶

type StripeReader struct {
	// contains filtered or unexported fields
}

func NewStripeReader ¶

func NewStripeReader(db *Database, ds *Dataset, stripe Stripe) (*StripeReader, error)

OPTIM: pass in a bytes buffer to reuse it?

func (*StripeReader) Close ¶

func (sr *StripeReader) Close() error

func (*StripeReader) ReadColumn ¶

func (sr *StripeReader) ReadColumn(nthColumn int) (column.Chunk, error)

type UID ¶

type UID struct {
	Otype ObjectType
	// contains filtered or unexported fields
}

UID is a unique ID for a given object; it's NOT a uuid.

func UIDFromHex ¶

func UIDFromHex(data []byte) (UID, error)

ARCH: test this instead of the Unmarshal? Or both?

func (UID) MarshalJSON ¶

func (uid UID) MarshalJSON() ([]byte, error)

MarshalJSON satisfies the Marshaler interface, so that we can automatically marshal UIDs as JSON

func (UID) String ¶

func (uid UID) String() string

func (*UID) UnmarshalJSON ¶

func (uid *UID) UnmarshalJSON(data []byte) error

UnmarshalJSON satisfies the Unmarshaler interface (we need a pointer here, because we'll be writing to it)
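The receiver choice described for MarshalJSON and UnmarshalJSON can be demonstrated with a small standalone type. The `id` type below is hypothetical and stores a plain byte slice; rendering it as a hex string in JSON is an assumption that loosely follows UIDFromHex, not the package's confirmed wire format.

```go
package main

import (
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// id is a hypothetical identifier type serialized as a hex string.
type id struct{ raw []byte }

// MarshalJSON uses a value receiver: it only reads the id.
func (u id) MarshalJSON() ([]byte, error) {
	return json.Marshal(hex.EncodeToString(u.raw))
}

// UnmarshalJSON needs a pointer receiver because it writes into the id.
func (u *id) UnmarshalJSON(data []byte) error {
	var s string
	if err := json.Unmarshal(data, &s); err != nil {
		return err
	}
	raw, err := hex.DecodeString(s)
	if err != nil {
		return err
	}
	u.raw = raw
	return nil
}

type record struct {
	ID id `json:"id"`
}

// roundTrip decodes a record containing the given hex id and re-encodes it.
func roundTrip(hexID string) (string, error) {
	var rec record
	if err := json.Unmarshal([]byte(`{"id":"`+hexID+`"}`), &rec); err != nil {
		return "", err
	}
	out, err := json.Marshal(rec)
	return string(out), err
}

func main() {
	out, err := roundTrip("02abcd")
	fmt.Println(out, err)
}
```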
