microblob


README

microblob

microblob is a simplistic key-value store that serves JSON documents from a file over HTTP. It is implemented in a few hundred lines of code and does not contain many features.

Warning: This server SHOULD NEVER BE EXPOSED PUBLICLY as it contains no security, rate-limiting or other safety measures whatsoever.

microblob was written in 2017 as an ad-hoc solution to replace a previous setup using memcachedb (which was getting slow). The main goal was to serve about 200M JSON documents from a "persistent key-value store" over HTTP and to support frequent, fast rebuilds, with limited disk space and potentially limited memory. The code lacks tests and I would write it differently today. However, it ran without issues and happily served up to 400 requests/s with limited resources, with average response times of around 1ms.


This project has been developed for Project finc at Leipzig University Library.

$ cat file.ldj
{"id": "some-id-1", "name": "alice"}
{"id": "some-id-2", "name": "bob"}

$ microblob -key id file.ldj
INFO[0000] creating db fixtures/file.ldj.832a9151.db ...
INFO[0000] listening at http://127.0.0.1:8820 (fixtures/file.ldj.832a9151.db)
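Once the server is running, documents are retrieved by key over HTTP (a quick check; this assumes the default routing, where a blob is served directly under its key):

$ curl -s localhost:8820/some-id-1
{"id": "some-id-1", "name": "alice"}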

It supports fast rebuilds from scratch, since the preferred deployment is a build-once, update-never use case. It scales up and down with available memory and can serve a hundred million documents and more.

Inspiration: So what's wrong with 1975 programming? Idea: Instead of implementing complicated caching mechanisms, we hand over caching completely to the operating system and try to stay out of its way.

Inserts are fast, since no data is actually moved. 150 million documents (1kB each) can be made servable within an hour.
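The serving side stays simple for the same reason. A minimal sketch of the idea in Go (illustrative only, with a hypothetical in-memory index; microblob keeps the index in LevelDB): a key maps to an (offset, length) pair and a lookup is a single ReadAt on the blob file, so the operating system's page cache does all the caching.

package main

import (
	"fmt"
	"os"
)

// get returns the raw bytes for key, given an index of (offset, length)
// pairs into the blob file f. The offsets used below are illustrative.
func get(f *os.File, idx map[string][2]int64, key string) ([]byte, error) {
	loc, ok := idx[key]
	if !ok {
		return nil, fmt.Errorf("key not found: %s", key)
	}
	p := make([]byte, loc[1])
	if _, err := f.ReadAt(p, loc[0]); err != nil { // pread(2); hot data comes from the page cache
		return nil, err
	}
	return p, nil
}

func main() {
	f, err := os.Open("file.ldj")
	if err != nil {
		panic(err)
	}
	defer f.Close()
	idx := map[string][2]int64{"some-id-1": {0, 36}, "some-id-2": {37, 34}}
	b, _ := get(f, idx, "some-id-1")
	fmt.Println(string(b))
}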

  • ㊗️ 2017-06-30 first 100 million requests served in production

Further documentation: docs/microblob.md

Update via curl

To send compressed data with curl:

$ curl -v --data-binary @- localhost:8820/update?key=id < <(gunzip -c fixtures/fake.ldj.gz)
...

Usage

Usage of microblob:
  -addr string
        address to serve (default "127.0.0.1:8820")
  -backend string
        backend to use: leveldb, debug (default "leveldb")
  -batch int
        number of lines in a batch (default 50000)
  -c string
        load options from a config (ini) file
  -create-db-only
        build the database only, then exit
  -db string
        the root directory, by default: 1000.ldj -> 1000.ldj.05028f38.db (based on flags)
  -ignore-missing-keys
ignore records that do not have the specified key
  -key string
        key to extract, json, top-level only
  -log string
        access log file, don't log if empty
  -r string
        regular expression to use as key extractor
  -s string
        the config file section to use (default "main")
  -t    top level key extractor
  -version
        show version and exit
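Options can also come from an ini file via -c, with -s selecting a section. A hypothetical config, assuming the option names mirror the flag names:

$ cat microblob.ini
[main]
addr = 127.0.0.1:8820
backend = leveldb
key = id

$ microblob -c microblob.ini file.ldj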

What it doesn't do

  • no deletions (microblob is currently append-only and does not care about garbage, so if you add more and more things, you will run out of space)
  • no compression (yet)
  • no security (anyone can query or update via HTTP)

Installation

Debian and RPM packages: see releases.

Or:

$ go install github.com/miku/microblob/cmd/microblob@latest

Documentation

Overview

Package microblob implements a thin layer above LevelDB to implement a key-value store.

Index

Constants

const Version = "0.2.14"

Version of application.

Variables

var ErrInvalidValue = errors.New("invalid entry")

ErrInvalidValue if a value is corrupted.

Functions

func Append added in v0.1.17

func Append(blobfn, fn string, backend Backend, kf KeyFunc) error

Append appends a file to an existing blob file and adds its keys to the store.

func AppendBatchSize added in v0.1.17

func AppendBatchSize(blobfn, fn string, backend Backend, kf KeyFunc, size int, ignoreMissingKeys bool) (err error)

AppendBatchSize uses a given batch size.

func IsAllZero added in v0.1.17

func IsAllZero(p []byte) bool

IsAllZero returns true if all bytes in a slice are zero.

func NewHandler added in v0.2.0

func NewHandler(backend Backend, blobfile string) http.Handler

NewHandler sets up routes for serving and stats.
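microblob can also be embedded as a library. A minimal sketch of serving with NewHandler (assuming the exported fields are enough to set up a LevelDBBackend and that the index database already exists):

package main

import (
	"log"
	"net/http"

	"github.com/miku/microblob"
)

func main() {
	backend := &microblob.LevelDBBackend{
		Filename: "file.ldj.832a9151.db", // LevelDB index
		Blobfile: "file.ldj",             // newline-delimited JSON documents
	}
	defer backend.Close()
	log.Fatal(http.ListenAndServe("127.0.0.1:8820", microblob.NewHandler(backend, "file.ldj")))
}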

func WithLastResponseTime added in v0.1.8

func WithLastResponseTime(h http.Handler) http.Handler

WithLastResponseTime keeps track of the last response time in exported variable lastResponseTime.

Types

type Backend

type Backend interface {
	Get(key string) ([]byte, error)
	WriteEntries(entries []Entry) error
	Close() error
}

Backend abstracts various implementations.
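The interface is small, so alternative backends are easy to sketch. A hypothetical in-memory backend for tests (it stores entry locations only, as the real backends do; Get here returns the location rather than reading the blob file):

package main

import (
	"fmt"

	"github.com/miku/microblob"
)

// MemBackend is a hypothetical in-memory Backend for tests.
type MemBackend struct {
	entries map[string]microblob.Entry
}

var _ microblob.Backend = (*MemBackend)(nil)

func (m *MemBackend) WriteEntries(entries []microblob.Entry) error {
	if m.entries == nil {
		m.entries = make(map[string]microblob.Entry)
	}
	for _, e := range entries {
		m.entries[e.Key] = e
	}
	return nil
}

func (m *MemBackend) Get(key string) ([]byte, error) {
	e, ok := m.entries[key]
	if !ok {
		return nil, fmt.Errorf("key not found: %s", key)
	}
	// A real backend would read e.Length bytes at e.Offset from the blob file.
	return []byte(fmt.Sprintf("%s %d %d", e.Key, e.Offset, e.Length)), nil
}

func (m *MemBackend) Close() error { return nil }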

type BlobHandler

type BlobHandler struct {
	Backend Backend
}

BlobHandler serves blobs.

func (*BlobHandler) ServeHTTP

func (h *BlobHandler) ServeHTTP(w http.ResponseWriter, r *http.Request)

ServeHTTP serves HTTP.

type Counter added in v0.1.20

type Counter interface {
	Count() (int64, error)
}

Counter can return the number of elements.

type DebugBackend

type DebugBackend struct {
	Writer io.Writer
}

DebugBackend just writes the key, value and offsets to a given writer.

func (DebugBackend) Close

func (b DebugBackend) Close() error

Close is a noop.

func (DebugBackend) Get

func (b DebugBackend) Get(key string) ([]byte, error)

Get is a noop and always returns nothing.

func (DebugBackend) WriteEntries

func (b DebugBackend) WriteEntries(entries []Entry) error

WriteEntries writes entries as TSV to the given writer.

type Entry

type Entry struct {
	Key    string `json:"k"`
	Offset int64  `json:"o"`
	Length int64  `json:"l"`
}

Entry associates a string key with a section in a file specified by offset and length.

type EntryWriter

type EntryWriter func(entries []Entry) error

EntryWriter writes entries to some storage, e.g. a file or a database.

type KeyExtractor

type KeyExtractor interface {
	ExtractKey([]byte) (string, error)
}

KeyExtractor extracts a string key from data.

type KeyFunc

type KeyFunc func([]byte) (string, error)

KeyFunc extracts a key from a blob.
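A KeyFunc can be as simple as unmarshalling the document and picking a field; a sketch roughly equivalent to what ParsingExtractor does for the top-level key "id":

import (
	"encoding/json"
	"fmt"
)

// idKey is a KeyFunc extracting the top-level "id" field.
func idKey(b []byte) (string, error) {
	var doc map[string]interface{}
	if err := json.Unmarshal(b, &doc); err != nil {
		return "", err
	}
	s, ok := doc["id"].(string)
	if !ok {
		return "", fmt.Errorf(`missing or non-string key: "id"`)
	}
	return s, nil
}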

type LevelDBBackend

type LevelDBBackend struct {
	Blobfile string

	Filename string

	AllowEmptyValues bool
	// contains filtered or unexported fields
}

LevelDBBackend writes entries into LevelDB.

func (*LevelDBBackend) Close

func (b *LevelDBBackend) Close() error

Close closes database handle and blob file.

func (*LevelDBBackend) Count added in v0.1.20

func (b *LevelDBBackend) Count() (n int64, err error)

Count returns the number of documents added. LevelDB says: There is no way to implement Count more efficiently inside leveldb than outside.

func (*LevelDBBackend) Get

func (b *LevelDBBackend) Get(key string) (data []byte, err error)

Get retrieves the data for a given key, using pread(2).

func (*LevelDBBackend) WriteEntries

func (b *LevelDBBackend) WriteEntries(entries []Entry) error

WriteEntries writes entries as batch into LevelDB. The value is fixed 16 byte slice, first 8 bytes represents the offset, last 8 bytes the length. https://play.golang.org/p/xwX8BmWtVl
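Packing and unpacking such a value is a few lines (the byte order here is an assumption for illustration; the linked snippet is authoritative):

package main

import (
	"encoding/binary"
	"fmt"
)

func main() {
	var offset, length int64 = 1024, 36
	v := make([]byte, 16)
	binary.BigEndian.PutUint64(v[:8], uint64(offset)) // first 8 bytes: offset
	binary.BigEndian.PutUint64(v[8:], uint64(length)) // last 8 bytes: length
	fmt.Printf("%x\n", v)
	fmt.Println(int64(binary.BigEndian.Uint64(v[:8])), int64(binary.BigEndian.Uint64(v[8:])))
}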

type LineProcessor

type LineProcessor struct {
	BatchSize         int   // number of lines in a batch
	InitialOffset     int64 // allow offsets beside zero
	Verbose           bool
	IgnoreMissingKeys bool // skip document with missing keys
	// contains filtered or unexported fields
}

LineProcessor reads a line, extracts the key and writes entries.

func NewLineProcessor

func NewLineProcessor(r io.Reader, w EntryWriter, f KeyFunc) LineProcessor

NewLineProcessor reads lines from the given reader, extracts the key with the given key function and writes entries to the given entry writer.

func NewLineProcessorBatchSize

func NewLineProcessorBatchSize(r io.Reader, w EntryWriter, f KeyFunc, size int) LineProcessor

NewLineProcessorBatchSize reads lines from the given reader, extracts the key with the given key function and writes entries to the given entry writer. Additionally, the number of lines per batch can be specified.

func (LineProcessor) RunWithWorkers

func (p LineProcessor) RunWithWorkers() error

RunWithWorkers starts processing the input using multiple workers.
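Indexing a file outside the bundled command can be wired up like this; a sketch using the DebugBackend, which prints entries as TSV instead of writing to LevelDB:

package main

import (
	"log"
	"os"

	"github.com/miku/microblob"
)

func main() {
	f, err := os.Open("file.ldj")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()
	backend := microblob.DebugBackend{Writer: os.Stdout}
	extractor := microblob.ParsingExtractor{Key: "id"}
	p := microblob.NewLineProcessor(f, backend.WriteEntries, extractor.ExtractKey)
	if err := p.RunWithWorkers(); err != nil {
		log.Fatal(err)
	}
}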

type ParsingExtractor

type ParsingExtractor struct {
	Key string
}

ParsingExtractor actually parses the JSON and extracts a top-level key at the given path. This is slower than, for example, regular expressions, but not by much.

func (ParsingExtractor) ExtractKey

func (e ParsingExtractor) ExtractKey(b []byte) (s string, err error)

ExtractKey extracts the key. Fails, if key cannot be found in the document.
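For example, in godoc example style (using the documents from the README):

func ExampleParsingExtractor() {
	e := microblob.ParsingExtractor{Key: "id"}
	s, _ := e.ExtractKey([]byte(`{"id": "some-id-1", "name": "alice"}`))
	fmt.Println(s)
	// Output: some-id-1
}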

type RegexpExtractor

type RegexpExtractor struct {
	Pattern *regexp.Regexp
}

RegexpExtractor extracts a key via regular expression.

func (RegexpExtractor) ExtractKey

func (e RegexpExtractor) ExtractKey(b []byte) (string, error)

ExtractKey returns the key found in a byte slice. It never fails, but it might return unexpected values.

type ToplevelKeyExtractor added in v0.2.7

type ToplevelKeyExtractor struct{}

ToplevelKeyExtractor parses a JSON object, where the actual object is nested under a top level key, e.g. {"mykey1": {"name": "alice"}}.

func (ToplevelKeyExtractor) ExtractKey added in v0.2.7

func (e ToplevelKeyExtractor) ExtractKey(b []byte) (s string, err error)

type UpdateHandler added in v0.1.17

type UpdateHandler struct {
	Blobfile string
	Backend  Backend
}

UpdateHandler adds more data to the blob server.

func (UpdateHandler) ServeHTTP added in v0.1.17

func (u UpdateHandler) ServeHTTP(w http.ResponseWriter, r *http.Request)

ServeHTTP appends data from POST body to existing blob file.

Directories

Path Synopsis
cmd
microblob
Executable for microblob; can read options from flags or an ini file.
