infreqdb

v0.0.0-...-61ce4a1 Latest Latest
Published: Feb 20, 2018 License: MIT Imports: 14 Imported by: 1

README


infreqdb

S3-backed key/value database for infrequent read access

Access to hot data may be very frequent, but access to the majority of the data is rare.

Use-Cases

infreqdb might be useful if:

  1. Your database is quite large.
  2. You mostly do bulk updates.
  3. Most of the data is cold, i.e. only a small subset of data is typically queried.
  4. You are able to partition your data in a way such that hot and cold objects live on different partitions.
  5. Your data fits a key/value model.
  6. You don't mind occasional slow responses.
  7. You can tolerate eventual consistency.

Architecture

The source of truth for all data is a bucket in S3. The data is split into multiple partitions. Each partition is a Bolt database file, stored gzipped. infreqdb caches partitions on disk. A partition is changed by rewriting and uploading the entire partition.

Motivation

I have a PostgreSQL database (mostly time series) that's consuming about 500GB (and growing) of storage. The data is the output of batch processing scripts, which process an hour's worth of data each time and merge it into the database. The queries are mostly for fresh data.

500GB of data might take 1500GB of disk storage: 2 replicas for HA, each with 250GB extra to accommodate growth. The same data, compressed, might be about 30GB on S3 (I haven't done an export yet), and S3 is already replicated.

For comparison:

  • 1500 GB EBS(gp2) costs $150/month
  • 30 GB on S3 costs $0.69/month + extra for requests, no bandwidth charges if running in EC2.

There are additional charges for requests when using S3. If the data/partition structure is not optimized, it can end up costing a lot.

  • $0.005 per 1,000 requests for PUT, COPY, POST, or LIST requests
  • $0.004 per 10,000 requests for GET and all other requests

In my use-case, I expect to do a maximum of 100 PUTs per hour: 100 * 24 * 31 = 74,400 per month, costing $0.372/month.

GET/HEAD requests should cost even less, depending on the cache hit ratio I can achieve.
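The request arithmetic above can be reproduced with a small sketch. The prices are the ones quoted in this README and may not match current S3 pricing; `monthlyRequests` and `requestCost` are hypothetical helper names, and the GET volume in the example is an arbitrary illustration, not a measured figure.

```go
package main

import "fmt"

// monthlyRequests converts an hourly request rate into a monthly count.
func monthlyRequests(perHour, days int) int {
	return perHour * 24 * days
}

// requestCost prices a request count given a price per batch of `per` requests.
func requestCost(requests int, price float64, per int) float64 {
	return float64(requests) / float64(per) * price
}

func main() {
	// 100 PUTs per hour over a 31-day month, at $0.005 per 1,000 requests.
	puts := monthlyRequests(100, 31)
	fmt.Printf("PUTs: %d/month, $%.3f/month\n", puts, requestCost(puts, 0.005, 1000))

	// Assumed GET/HEAD volume for illustration only, at $0.004 per 10,000 requests.
	gets := monthlyRequests(1000, 31)
	fmt.Printf("GET/HEAD: %d/month, $%.3f/month\n", gets, requestCost(gets, 0.004, 10000))
}
```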

infreqdb makes use of:

  1. PUT to store partitions.
  2. GET to fetch partitions.
  3. HEAD to check if the cached partition is the latest one. It does a HEAD request for each mutable partition.

Usage

infreqdb is a library, not a database server.

Example: toyexample.

Ideas

  1. Make storage pluggable.
  2. Make a cluster whose nodes can gossip evictions and take ownership of a portion of the data.
  3. Allow cached partitions to persist across restarts.

Disclaimer

I have not yet used infreqdb for anything large, just the toyexample.

Backwards compatibility is not guaranteed. I am making changes to the API as I start using this library for real-world applications.

Once I settle on a few things, I will make the object store pluggable to allow user implementations of any object store they wish to use.

Documentation

Index

Constants

This section is empty.

Variables

var (
	//ErrKeyNotString when partition key is not a valid string.
	ErrKeyNotString = errors.New("Key must be a string")
	//ErrInvalidObject occurs when object in cache is invalid.
	ErrInvalidObject = errors.New("Returned object is incorrect type")
)

Functions

func IsNotFound

func IsNotFound(err error) bool

IsNotFound inspects the error and reports whether it is a not-found type rather than a real failure.

Types

type DB

type DB struct {
	// contains filtered or unexported fields
}

DB is an instance of InfreqDB

func New

func New(bucket *s3.Bucket, prefix string, len int) (*DB, error)

New creates a new InfreqDB instance. len is the number of partitions to hold on disk; use it wisely. Prefer NewWithStorage() instead; New() will remain for backwards compatibility.

func NewWithStorage

func NewWithStorage(storage Storage, len int) (*DB, error)

NewWithStorage creates a new DB with user-provided storage.

func (*DB) CheckExpiry

func (db *DB) CheckExpiry() int

CheckExpiry expires items that have changed upstream. Maybe unexport it and launch it as a loop.

func (*DB) Close

func (db *DB) Close()

Close closes the db and deletes all local database fragments

func (*DB) Expire

func (db *DB) Expire(partid string)

Expire evicts the partition from disk

func (*DB) Get

func (db *DB) Get(partid string, bucket, key []byte) ([]byte, error)

Get gets a single key from the db.

func (*DB) SetPart

func (db *DB) SetPart(partid, fname string, mutable bool) error

SetPart uploads the partition to S3 and expires the local cache. fname is the path to an uncompressed boltdb file. The cache for this partition is invalidated. If running on a cluster, you need to propagate this and call Expire(partid) on the other nodes somehow. Set mutable to true if you expect changes to this partition.
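One way to picture the "propagate and Expire(partid)" step is a fan-out over anything that exposes an Expire method. This is only a sketch under assumptions: the `Expirer` interface mirrors the documented `Expire(partid string)` signature, while `recordingNode` and `broadcastExpire` are hypothetical. In a real cluster the peers would be remote, so each call would go over the network.

```go
package main

import "fmt"

// Expirer mirrors the Expire method documented on *DB.
type Expirer interface {
	Expire(partid string)
}

// recordingNode is a hypothetical stand-in for a cluster member;
// it only records which partitions it was asked to evict.
type recordingNode struct {
	name    string
	expired []string
}

func (n *recordingNode) Expire(partid string) {
	n.expired = append(n.expired, partid)
}

// broadcastExpire asks every peer to evict partid from its local cache.
func broadcastExpire(peers []Expirer, partid string) {
	for _, p := range peers {
		p.Expire(partid)
	}
}

func main() {
	a := &recordingNode{name: "node-a"}
	b := &recordingNode{name: "node-b"}

	// After a successful SetPart on one node, tell the others to drop their copy.
	broadcastExpire([]Expirer{a, b}, "2018-02-20")

	fmt.Println(a.name, "evicted", a.expired)
	fmt.Println(b.name, "evicted", b.expired)
}
```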

func (*DB) View

func (db *DB) View(partid string, fn func(*bolt.Tx) error) (bool, error)

View runs fn inside the bolt db of an individual partition. See https://godoc.org/github.com/boltdb/bolt#DB.View for more info. The bool return value indicates whether the partition is mutable, a helpful hint for downstream caching.

type S3Storage

type S3Storage struct {
	// contains filtered or unexported fields
}

S3Storage implements the Storage interface to access AWS S3. It uses gzip for compression.

func NewS3Storage

func NewS3Storage(bucket *s3.Bucket, prefix string) *S3Storage

NewS3Storage creates new storage that talks to AWS S3.

func (*S3Storage) Get

func (s3s *S3Storage) Get(part string) (fname string, found, mutable bool, lastmod time.Time, err error)

Get fetches a partition file from S3 into a local file, suppressing the not-found error.

func (*S3Storage) GetLastMod

func (s3s *S3Storage) GetLastMod(part string) time.Time

GetLastMod gets the last modification time for a partition. It returns an ancient time on failure.

func (*S3Storage) Put

func (s3s *S3Storage) Put(part, fname string, mutable bool) error

Put uploads a partition to S3.

type Storage

type Storage interface {
	//Get retrieves a partition file from object store
	Get(part string) (fname string, found, mutable bool, lastmod time.Time, err error)
	//Put stores partition into object store
	Put(part, fname string, mutable bool) error
	//GetLastMod gets the last modified time for a partition.
	GetLastMod(part string) time.Time
}

Storage allows various operations against an object store. Use any object/file store. The Storage is responsible for [un]compression. This might be a good place to hook in some sort of upstream cache layer.

Directories

Path Synopsis
examples
