gocask

package module
v0.0.5
Published: Sep 17, 2022 License: MIT Imports: 5 Imported by: 0

README

GoCask

A Go implementation of Bitcask, a log-structured hash table for fast key/value data, as defined in the Bitcask paper (https://riak.com/assets/bitcask-intro.pdf) and with help from this repo.

A learning venture into database development. Special thanks go to the amazing Ben Johnson for pointing me in the right direction and being as helpful as he was.

Features (as defined by the paper+)

  • Low latency per item read or written
  • High throughput, especially when writing an incoming stream of random items
  • Ability to handle datasets much larger than RAM without degradation
  • Crash friendliness, both in terms of fast recovery and not losing data
  • Ease of backup and restore
  • A relatively simple, understandable (and thus supportable) code structure and data format
  • Predictable behavior under heavy access load or large volume
  • Data files are rotated based on the user-defined data file size (2GB default)
  • A license that allowed for easy use
  • CRC check to detect data corruption
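
The CRC check in the last bullet can be illustrated with a minimal sketch. The record layout here (a 4-byte big-endian CRC-32 followed by the payload) and the names encodeRecord and decodeRecord are assumptions made for illustration; gocask's actual on-disk format is not described in this README.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"hash/crc32"
)

// errCorrupt is returned when the stored checksum does not match the payload.
var errCorrupt = errors.New("record checksum mismatch")

// encodeRecord prepends a 4-byte big-endian CRC-32 (IEEE) of the payload.
// This layout is hypothetical, not gocask's real record format.
func encodeRecord(payload []byte) []byte {
	buf := make([]byte, 4+len(payload))
	binary.BigEndian.PutUint32(buf[:4], crc32.ChecksumIEEE(payload))
	copy(buf[4:], payload)
	return buf
}

// decodeRecord recomputes the checksum and returns the payload only if it matches.
func decodeRecord(rec []byte) ([]byte, error) {
	if len(rec) < 4 {
		return nil, errCorrupt
	}
	want := binary.BigEndian.Uint32(rec[:4])
	payload := rec[4:]
	if crc32.ChecksumIEEE(payload) != want {
		return nil, errCorrupt
	}
	return payload, nil
}

func main() {
	rec := encodeRecord([]byte("somekey=someval"))
	if _, err := decodeRecord(rec); err != nil {
		fmt.Println("unexpected error:", err)
	}

	rec[len(rec)-1] ^= 0xFF // simulate a damaged byte on disk
	_, err := decodeRecord(rec)
	fmt.Println(err) // prints "record checksum mismatch"
}
```

Storing the checksum next to each record lets a reader reject a damaged entry at read time instead of returning bad data.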

Important notes

  • GoCask does not implement any buffer cache in-memory. Instead, it depends on the filesystem’s cache. Adjusting the caching characteristics of your filesystem can impact performance.
  • GoCask stores all keys in memory, which means that your system needs enough RAM to hold your entire keyspace.
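
To make the second note concrete, here is a toy sketch of the idea behind a Bitcask-style store: values are appended to a log, while an in-memory map (the paper calls it the keydir) points each key at its latest value. The store type, its fields, and the byte-slice log are all simplifications assumed for illustration, not gocask's internals.

```go
package main

import "fmt"

// entry records where a value lives inside the append-only log.
type entry struct {
	offset int
	size   int
}

// store is a toy log-structured hash table. Values go into log;
// keydir lives entirely in memory and maps each key to its latest value.
type store struct {
	log    []byte
	keydir map[string]entry
}

func newStore() *store {
	return &store{keydir: make(map[string]entry)}
}

// put appends the value to the log and repoints the key at it.
func (s *store) put(key string, val []byte) {
	s.keydir[key] = entry{offset: len(s.log), size: len(val)}
	s.log = append(s.log, val...)
}

// get looks the key up in memory, then reads the value from the log.
func (s *store) get(key string) ([]byte, bool) {
	e, ok := s.keydir[key]
	if !ok {
		return nil, false
	}
	return s.log[e.offset : e.offset+e.size], true
}

func main() {
	s := newStore()
	s.put("somekey", []byte("v1"))
	s.put("somekey", []byte("v2")) // the old value stays in the log
	v, _ := s.get("somekey")
	fmt.Printf("%s, log bytes: %d, keys in memory: %d\n", v, len(s.log), len(s.keydir))
}
```

The log can grow far beyond RAM because only the map is held in memory, but the map itself grows with the number of distinct keys, which is why the keyspace has to fit.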

How to Use/Run

There are two ways to use gocask:

Using gocask as a library (embedded db) in your own app

GoCask can be used similarly to bolt or badger as an embedded db.

Run go get github.com/aneshas/gocask/cmd/gocask and use the API. See the docs.

Running as a standalone process

If you have go installed:

  • go install github.com/aneshas/gocask/cmd/gocask@latest
  • go install github.com/aneshas/gocask/cmd/gccli@latest

Run db server

Then run gocask, which runs the db engine itself, opens the default db, and starts a gRPC (Twirp) server on localhost:8888. Run gocask -help to see the config options and their defaults.

Interact with server via cli

While the server is running you can interact with it via gccli binary:

  • gccli keys - list stored keys
  • gccli put somekey someval - stores the key value pair
  • gccli get somekey - retrieves the value stored under the key
  • gccli del somekey - deletes the value stored under the key

gccli is just meant as a simple probing tool. To generate your own client, you can use the included .proto definition (or use the pre-generated Go client).

If you don't have Go installed, you can go to releases, download the latest release, and go through the same process as above.

Still to come

The primary motivation for this repo was learning more about how db engines work, so although it can already be used, it's far from production ready. With that being said, I do plan to maintain and extend it in the future.

Some things that are on my mind:

  • Support for multiple processes and write locking
  • Current key deletion is a soft delete (implement merging and hint files)
  • Fold over keys
  • Double down on tests (fuzz?)
  • Add benchmarks
  • Make it distributed
  • An eventstore spin off (use gocask instead of sqlite)

Documentation

Index

Constants

const (
	// KB represents base2 kilobyte
	KB int64 = 1024

	// MB represents base2 megabyte
	MB = KB * 1024

	// GB represents base2 gigabyte
	GB = MB * 1024

	// TB represents base2 terabyte
	TB = GB * 1024
)
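
A short sketch of how these constants might be used to express sizes. The constant block is copied here so the example is self-contained; filesNeeded is a hypothetical helper that gives only a rough estimate, since it ignores per-record overhead and record boundaries.

```go
package main

import "fmt"

// Local copy of the size constants so this example compiles on its own.
const (
	KB int64 = 1024
	MB       = KB * 1024
	GB       = MB * 1024
	TB       = GB * 1024
)

// filesNeeded is a hypothetical helper: roughly how many data files a
// dataset of totalBytes spans if files rotate at maxFileSize bytes.
func filesNeeded(totalBytes, maxFileSize int64) int64 {
	return (totalBytes + maxFileSize - 1) / maxFileSize // ceiling division
}

func main() {
	limit := 2 * GB // the 2GB default mentioned in the README
	fmt.Println(limit)                    // 2147483648
	fmt.Println(filesNeeded(5*GB, limit)) // 3
}
```

Because KB is declared as int64, the derived constants are int64 too, so an expression such as 2 * GB can be passed directly to a function taking an int64 byte count.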

Variables

This section is empty.

Functions

This section is empty.

Types

type DB

type DB struct {
	*cask.DB
}

DB represents gocask, a log-structured hash table for fast key/value data, based on https://riak.com/assets/bitcask-intro.pdf

func Open

func Open(dbPath string, opts ...Option) (*DB, error)

Open opens an existing database at dbPath or creates a new one. The database location can be configured with config options; the default is ~/gcdata. The magic value in:mem:db can be passed as dbPath to instantiate an in-memory file system, which is useful for testing.

type Option

type Option func(config cask.Config) cask.Config

Option represents gocask configuration option

func WithDataDir

func WithDataDir(path string) Option

WithDataDir configures the location of the data dir where your databases will reside

func WithMaxDataFileSize

func WithMaxDataFileSize(bytes int64) Option

WithMaxDataFileSize configures maximum data file size after which data files will be rotated
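
The Option type above follows the functional options pattern: each option takes a config value and returns a modified copy. The sketch below shows how such options could be folded over a set of defaults. Config is a hypothetical stand-in for cask.Config (its real fields are not shown in this documentation), applyOptions is an assumed helper, and the option bodies are guesses at the behaviour described above.

```go
package main

import "fmt"

// Config is a hypothetical stand-in for cask.Config; the field names
// are assumptions made for this example.
type Config struct {
	DataDir         string
	MaxDataFileSize int64
}

// Option has the same shape as gocask's Option type.
type Option func(config Config) Config

// WithDataDir returns an option that sets the data directory.
func WithDataDir(path string) Option {
	return func(c Config) Config {
		c.DataDir = path
		return c
	}
}

// WithMaxDataFileSize returns an option that sets the rotation threshold.
func WithMaxDataFileSize(bytes int64) Option {
	return func(c Config) Config {
		c.MaxDataFileSize = bytes
		return c
	}
}

// applyOptions (hypothetical) applies each option in order to the defaults.
func applyOptions(defaults Config, opts ...Option) Config {
	cfg := defaults
	for _, opt := range opts {
		cfg = opt(cfg)
	}
	return cfg
}

func main() {
	// Defaults taken from the README text: ~/gcdata and 2GB.
	defaults := Config{DataDir: "~/gcdata", MaxDataFileSize: 2 << 30}
	cfg := applyOptions(defaults, WithMaxDataFileSize(64<<20))
	fmt.Printf("%+v\n", cfg)
}
```

Because each option works on a copy, options that are not passed leave the defaults untouched, and later options override earlier ones.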

Directories

Path Synopsis
cmd
internal
crc
fs
pkg
