olricdb

package module
v0.1.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Apr 23, 2018 License: Apache-2.0 Imports: 27 Imported by: 0

README

OlricDB

GoDoc Coverage Status Build Status Go Report Card License

Embeddable, in-memory and distributed key/value store for Go.

WIP

This project is a work in progress. The implementation is incomplete. The documentation may be inaccurate.

Table of Contents

Features

  • Designed to share some transient, approximate, fast-changing data between servers,
  • Accepts arbitrary types as value,
  • Only in-memory,
  • Embeddable,
  • Highly available,
  • Horizontally scalable,
  • Provides best-effort consistency guarantees without being a complete CP solution,
  • Distributes load fairly among cluster members with a consistent hash function,
  • Supports replication by default(with sync and async options),
  • Thread-safe by default,
  • Very simple package API,
  • Time-To-Live(TTL) eviction policy,
  • Offers an HTTP API with built-in Go client,
  • Provides a single-node lock implementation which can be used for non-critical purposes.

Installing

With a correctly configured Golang environment:

go get -u github.com/buraksezer/olricdb

Usage

OlricDB is designed to work efficiently with the minimum amount of configuration. So the default configuration should be enough for experimenting:

db, err := olricdb.New(nil)

This creates an OlricDB object without running any server at background. In order to run OlricDB, you need to call Start method.

err := db.Start()

When you call Start method, your process joins the cluster and will be responsible for some parts of the data. This call blocks indefinitely. So you may need to run it in a goroutine. Of course, this is just a single-node instance, because you didn't give any configuration.

Create a DMap object to access the cluster:

dm := db.NewDMap("my-dmap")

DMap object has Put, PutEx, Get, Delete, LockWithTimeout, Unlock and Destroy methods to access and modify data in OlricDB. We may add more methods for finer control but first, I'm willing to stabilize this set of features.

When you want to leave the cluster, just need to call Shutdown method:

err := db.Shutdown(context.Background())

This will stop background tasks, then call Shutdown methods of HTTP server and memberlist, respectively.

Put

Put sets the value for the given key. It overwrites any previous value for that key and it's thread-safe.

err := dm.Put("my-key", "my-value")

The key has to be string. Value type is arbitrary. It is safe to modify the contents of the arguments after Put returns but not before.

PutEx

Put sets the value for the given key with TTL. It overwrites any previous value for that key. It's thread-safe.

err := dm.PutEx("my-key", "my-value", time.Second)

The key has to be string. Value type is arbitrary. It is safe to modify the contents of the arguments after PutEx returns but not before.

Get

Get gets the value for the given key. It returns ErrKeyNotFound if the DB does not contains the key. It's thread-safe.

value, err := dm.Get("my-key")

It is safe to modify the contents of the returned value. It is safe to modify the contents of the argument after Get returns.

Delete

Delete deletes the value for the given key. Delete will not return error if key doesn't exist. It's thread-safe.

err := dm.Delete("my-key")

It is safe to modify the contents of the argument after Delete returns.

LockWithTimeout

LockWithTimeout sets a lock for the given key. If the lock is still unreleased the end of given period of time, it automatically releases the lock. Acquired lock is only for the key in this map. Please note that, before setting a lock for a key, you should set the key with Put method. Otherwise it returns ErrKeyNotFound error.

err := dm.LockWithTimeout("my-key", time.Second)

It returns immediately if it acquires the lock for the given key. Otherwise, it waits until timeout. The timeout is determined by http.Client which can be configured via Config structure.

You should know that the locks are approximate, and only to be used for non-critical purposes.

Please take a look at Lock Implementation section for implementation details.

Unlock

Unlock releases an acquired lock for the given key. It returns ErrNoSuchLock if there is no lock for the given key.

err := dm.Unlock("my-key")
Destroy

Destroy flushes the given DMap on the cluster. You should know that there is no global lock on DMaps. So if you call Put/PutEx and Destroy methods concurrently on the cluster, Put/PutEx calls may set new values to the DMap.

err := dm.Destroy()

Configuration

memberlist configuration can be tricky and and the default configuration set should be tuned for your environment. A detailed deployment and configuration guide will be prepared before stable release.

Please take a look at Config section at godoc.org

Here is a sample configuration for a cluster with two hosts:

m1, _ := olricdb.NewMemberlistConfig("local")
m1.BindAddr = "127.0.0.1"
m1.BindPort = 5555
c1 := &olricdb.Config{
	Name:          "127.0.0.1:3535", // Unique in the cluster and used by HTTP server.
	Peers:         []string{"127.0.0.1:5656"},
	MemberlistCfg: m1,
}

m2, _ := olricdb.NewMemberlistConfig("local")
m2.BindAddr = "127.0.0.1"
m2.BindPort = 5656
c2 := &olricdb.Config{
	Name:          "127.0.0.1:3636",
	Peers:         []string{"127.0.0.1:5555"},
	MemberlistCfg: m2,
}

db1, err := olricdb.New(c1)
// Check error

db2, err := olricdb.New(c2)
// Check error

// Call Start method for db1 and db2 in a seperate goroutine.

Architecture

Overview

OlricDB uses:

OlricDB distributes data among partitions. Every partition is owned by a cluster member and may has one or more backup for redundancy. When you read or write a map entry, you transparently talk to the partition owner. Each request hits the most up-to-date version of a particular data entry in a stable cluster.

In order to find the partition which the key belongs to, OlricDB hashes the key and mod it with the number of partitions:

partID = MOD(hash result, partition count)

The partitions are distributed among cluster members by using a consistent hashing algorithm. In order to get details, please see buraksezer/consistent. The backup owners are also calculated by the same package.

When a new cluster is created, one of the instances elected as the cluster coordinator. It manages the partition table:

  • When a node joins or leaves, it distributes the partitions and their backups among the members again,
  • Removes empty owners from the partition owners list,
  • Pushes the new partition table to all the members,
  • Pushes the the partition table to the cluster periodically.

Members propagates their birthdate(Unix timestamp in nanoseconds) to the cluster. The coordinator is the oldest member in the cluster. If the coordinator leaves the cluster, the second oldest member elected as the coordinator.

OlricDB has a component called fsck which is responsible for keeping underlying data structures consistent:

  • Works on every node,
  • When a node joins or leaves, the cluster coordinator pushes the new partition table. Then, fsck goroutine runs immediately and moves the partitions and backups to their new hosts,
  • Merges fragmented partitions,
  • Runs at background periodically and repairs partitions i.e. creates new backups if required.

Partitions have a concept called owners list. When a node joins or leaves the cluster, a new primary owner may be assigned by the coordinator. At any time, a partition may has one or more partition owner. If a partition has two or more owner, this is called fragmented partition. The last added owner is called primary owner. Write operation is only done by the primary owner. The previous owners are only used for read and delete.

When you read a key, the primary owner tries to find the key on itself, first. Then, queries the previous owners and backups, respectively. Delete operation works with the same way.

The data(distributed map objects) in the fragmented partition is moved slowly to the primary owner by fsck goroutine. Until the move is done, the data remains available on the previous owners. DMap methods use this list to query data on the cluster.

Please note that, multiple partition owner is an undesirable situation and the fsck component is designed to fix that in a short time.

OlricDB uses HTTP as transport layer. It's suitable to transfer small messages between servers. HTTP/2 is highly recommended for production use because it uses a single TCP socket to deliver multiple requests and responses in parallel.

When you call Start method of OlricDB, it starts an HTTP server at background which can be configured by the user via Config struct.

Consistency and Replication Model

OlricDB is an AP product, which employs the combination of primary-copy and optimistic replication techniques. With optimistic replication, when the partition owner receives a write or delete operation for a key, applies it locally, and propagates it to backup owners.

This technique enables OlricDB clusters to offer high throughput. However, due to temporary situations in the system, such as network failure, backup owners can miss some updates and diverge from the primary owner. If a partition owner crashes while there is an inconsistency between itself and the backups, strong consistency of the data can be lost.

Two types of backup replication are available: sync and async. Both types are still implementations of the optimistic replication model.

  • sync: Blocks until write/delete operation is applied by backup owners.
  • async: Just fire & forget.

An anti-entropy system has been planned to deal with inconsistencies in DMaps.

Eviction

OlricDB only implements TTL eviction policy. It shares the same algorithm with Redis:

Periodically Redis tests a few keys at random among keys with an expire set. All the keys that are already expired are deleted from the keyspace.

Specifically this is what Redis does 10 times per second:

  • Test 20 random keys from the set of keys with an associated expire.
  • Delete all the keys found expired.
  • If more than 25% of keys were expired, start again from step 1.

This is a trivial probabilistic algorithm, basically the assumption is that our sample is representative of the whole key space, and we continue to expire until the percentage of keys that are likely to be expired is under 25%

When a client tries to access a key, OlricDB returns ErrKeyNotFound if the key is found to be timed out. A background task evicts keys with the algorithm described above.

LRU eviction policy implementation has been planned.

Lock Implementation

DMap implementation is already thread-safe to meet your thread safety requirements. When you want to have more control on the concurrency, you can use LockWithTimeout method. It's slightly modified version of Moby's(formerly Docker) locker package. It utilizes sync.Mutex. Take a look at the code for details.

Please note that the lock implementation has no backup. So if the node, which the lock belongs to, crashed, the acquired lock is dropped.

I recommend the lock implementation to be used for efficiency purposes in general, instead of correctness.

Client

OlricDB is mainly designed to be used as an embedded DHT. So if you are running long-lived servers, OlricDB is pretty suitable to share some transient, approximate, fast-changing data between them. What if you want to access the cluster in a short-lived process? Fortunately, OlricDB has a simple HTTP API which can be used to access the cluster within any environment. It will be documented soon.

A Golang client is already prepared to access and modify DMaps from outside. Here is the documentation.

Planned Features

  • Anti-entropy system to repair inconsistencies in DMaps,
  • LRU eviction policy,
  • Eviction listeners, if it's reasonable and easy to build,
  • Python client.

We may implement different data structures such as list, queue or bitmap in OlricDB. It's highly depends on attention of the Golang community.

Sample Code

The following snipped can be run on your computer directly. It's a single-node setup, of course:

package main

import (
	"context"
	"fmt"
	"log"
	"reflect"
	"strconv"
	"time"

	"github.com/buraksezer/olricdb"
)

type customType struct {
	Field1 string
	Field2 uint64
}

func main() {
	// This creates a single-node OlricDB cluster. It's good enough for experimenting.
	db, err := olricdb.New(nil)
	if err != nil {
		log.Fatalf("Failed to create OlricDB object: %v", err)
	}

	go func() {
		// Call Start at background. It's a blocker call.
		err = db.Start()
		if err != nil {
			log.Fatalf("Failed to call Start: %v", err)
		}
	}()

	// Put 10 items into the DMap object.
	dm := db.NewDMap("bucket-of-arbitrary-items")
	for i := 0; i < 10; i++ {
		c := customType{}
		c.Field1 = fmt.Sprintf("num: %d", i)
		c.Field2 = uint64(i)
		err = dm.Put(strconv.Itoa(i), c)
		if err != nil {
			log.Printf("Put call failed: %v", err)
		}
	}

	// Read them again.
	for i := 0; i < 10; i++ {
		val, err := dm.Get(strconv.Itoa(i))
		if err != nil {
			log.Printf("Get call failed: %v", err)
		}
		fmt.Println(val, reflect.TypeOf(val))
	}

	// Don't forget the call Shutdown when you want to leave the cluster.
	ctx, _ := context.WithTimeout(context.Background(), 10*time.Second)
	err = db.Shutdown(ctx)
	if err != nil {
		log.Printf("Failed to shutdown OlricDB: %v", err)
	}
}

To-Do

  • Document the code,
  • Some parts of FSCK implementation is missing: It currently doesn't repair failed backups,
  • Design & write benchmarks,
  • Document the external HTTP API,
  • Build a website for OlricDB and create extensive documentation.

Caveats

OlricDB uses Golang's built-in map. It's known that the built-in map has problems with the GC:

OlricDB already uses map[uint64]interface{} as underlying data structure. It should work fine for most of the cases.

I have implemented an off-heap hash table with mmap. We may add an option to use it in the future but my implementation needs too much effort to be used in production.

Contributions

Please don't hesitate to fork the project and send a pull request or just e-mail me to ask questions and share ideas.

License

The Apache License, Version 2.0 - see LICENSE for more details.

About the name

The inner voice of Turgut Özben who is the main character of Oğuz Atay's masterpiece -The Disconnected-.

Documentation

Overview

Package olricdb provides embeddable, in-memory and distributed key/value store.

Index

Constants

View Source
const (
	// SyncBackupMode enables sync backup mode which means that the caller is blocked
	// until write/delete operation is applied by backup owners.
	// The default mode is SyncBackupMode
	SyncBackupMode = 0

	// AsyncBackupMode enables async backup mode which means that write/delete operations
	// are done in a background task.
	AsyncBackupMode = 1
)
View Source
const (
	// DefaultPartitionCount determines default partition count in the cluster.
	DefaultPartitionCount = 271

	// DefaultLoadFactor is used by the consistent hashing function. Keep it small.
	DefaultLoadFactor = 1.25

	// DefaultLogLevel determines the log level without extra configuration. It's DEBUG.
	DefaultLogLevel = "DEBUG"
)

Variables

View Source
var (
	// ErrKeyNotFound is returned when a key could not be found.
	ErrKeyNotFound = errors.New("key not found")

	// ErrOperationTimeout is returned when an operation times out.
	ErrOperationTimeout = errors.New("operation timeout")
)
View Source
var ErrNoSuchLock = errors.New("no such lock")

ErrNoSuchLock is returned when the requested lock does not exist

Functions

func NewMemberlistConfig

func NewMemberlistConfig(env string) (*memberlist.Config, error)

NewMemberlistConfig returns a new memberlist.Config from vendored version of that package. It takes an env parameter: local, lan and wan.

local: DefaultLocalConfig works like DefaultConfig, however it returns a configuration that is optimized for a local loopback environments. The default configuration is still very conservative and errs on the side of caution.

lan: DefaultLANConfig returns a sane set of configurations for Memberlist. It uses the hostname as the node name, and otherwise sets very conservative values that are sane for most LAN environments. The default configuration errs on the side of caution, choosing values that are optimized for higher convergence at the cost of higher bandwidth usage. Regardless, these values are a good starting point when getting started with memberlist.

wan: DefaultWANConfig works like DefaultConfig, however it returns a configuration that is optimized for most WAN environments. The default configuration is still very conservative and errs on the side of caution.

Types

type Config

type Config struct {
	LogLevel string
	// Name of this node in the cluster. This must be unique in the cluster. If this is not set,
	// OlricDB will set it to the hostname of the running machine. Example: node1.my-cluster.net
	//
	// Name is also used by the HTTP server as Addr. It should be an IP adress or domain name of the server.
	Name string

	// The list of host:port which are used by memberlist for discovery. Don't confuse it with Name.
	Peers []string

	// PartitionCount is 271, by default.
	PartitionCount uint64

	// BackupCount is 0, by default.
	BackupCount int

	// Default value is SyncBackupMode.
	BackupMode int

	// LoadFactor is used by consistent hashing function. It determines the maximum load
	// for a server in the cluster. Keep it small.
	LoadFactor float64

	// Default hasher is FNV64a. You may want to use a different hasher which implements
	// Hasher interface.
	Hasher Hasher

	// TLS certificate file for HTTP server. If it's empty, TLS is disabled.
	CertFile string

	// TLS key file for HTTP server. If it's empty, TLS is disabled.
	KeyFile string

	// A Client is an HTTP client. Its zero value (DefaultClient) is a usable client that uses DefaultTransport.
	Client *http.Client

	// HTTP server. Don't set Addr field. It's overwritten by Name field.
	Server *http.Server

	// LogOutput is the writer where logs should be sent. If this is not
	// set, logging will go to stderr by default. You cannot specify both LogOutput
	// and Logger at the same time.
	LogOutput io.Writer

	// Logger is a custom logger which you provide. If Logger is set, it will use
	// this for the internal logger. If Logger is not set, it will fall back to the
	// behavior for using LogOutput. You cannot specify both LogOutput and Logger
	// at the same time.
	Logger *log.Logger

	// MemberlistConfig is the memberlist configuration that OlricDB will
	// use to do the underlying membership management and gossip. Some
	// fields in the MemberlistConfig will be overwritten by OlricDB no
	// matter what:
	//
	//   * Name - This will always be set to the same as the NodeName
	//     in this configuration.
	//
	//   * Events - OlricDB uses a custom event delegate.
	//
	//   * Delegate - OlricDB uses a custom delegate.
	//
	// You have to use NewMemberlistConfig to create a new one.
	// Then, you may need to modify it to tune for your environment.
	MemberlistConfig *memberlist.Config
}

Config is the configuration for creating a OlricDB instance.

type DMap

type DMap struct {
	// contains filtered or unexported fields
}

DMap represents a distributed map object.

func (*DMap) Delete

func (dm *DMap) Delete(key string) error

Delete deletes the value for the given key. Delete will not return error if key doesn't exist. It's thread-safe. It is safe to modify the contents of the argument after Delete returns.

func (*DMap) Destroy

func (dm *DMap) Destroy() error

Destroy flushes the given DMap on the cluster. You should know that there is no global lock on DMaps. So if you call Put/PutEx and Destroy methods concurrently on the cluster, Put/PutEx calls may set new values to the DMap.

func (*DMap) Get

func (dm *DMap) Get(key string) (interface{}, error)

Get gets the value for the given key. It returns ErrKeyNotFound if the DB does not contains the key. It's thread-safe. It is safe to modify the contents of the returned value. It is safe to modify the contents of the argument after Get returns.

func (*DMap) LockWithTimeout

func (dm *DMap) LockWithTimeout(key string, timeout time.Duration) error

LockWithTimeout sets a lock for the given key. If the lock is still unreleased the end of given period of time, it automatically releases the lock. Acquired lock is only for the key in this map. Please note that, before setting a lock for a key, you should set the key with Put method. Otherwise it returns ErrKeyNotFound error.

It returns immediately if it acquires the lock for the given key. Otherwise, it waits until timeout. The timeout is determined by http.Client which can be configured via Config structure.

You should know that the locks are approximate, and only to be used for non-critical purposes.

func (*DMap) Put

func (dm *DMap) Put(key string, value interface{}) error

Put sets the value for the given key. It overwrites any previous value for that key and it's thread-safe. The key has to be string. Value type is arbitrary. It is safe to modify the contents of the arguments after Put returns but not before.

func (*DMap) PutEx

func (dm *DMap) PutEx(key string, value interface{}, timeout time.Duration) error

PutEx sets the value for the given key with TTL. It overwrites any previous value for that key. It's thread-safe. The key has to be string. Value type is arbitrary. It is safe to modify the contents of the arguments after Put returns but not before.

func (*DMap) Unlock

func (dm *DMap) Unlock(key string) error

Unlock releases an acquired lock for the given key. It returns ErrNoSuchLock if there is no lock for the given key.

type Hasher

type Hasher interface {
	Sum64([]byte) uint64
}

Hasher is responsible for generating unsigned, 64 bit hash of provided byte slice. Hasher should minimize collisions (generating same hash for different byte slice) and while performance is also important fast functions are preferable (i.e. you can use FarmHash family).

type NodeMetadata

type NodeMetadata struct {
	Birthdate int64
}

TODO: NodeMetadata will be removed.

type OlricDB

type OlricDB struct {
	// contains filtered or unexported fields
}

OlricDB represens an member in the cluster. All functions on the OlricDB structure are safe to call concurrently.

func New

func New(c *Config) (*OlricDB, error)

New creates a new OlricDB object, otherwise returns an error.

func (*OlricDB) NewDMap

func (db *OlricDB) NewDMap(name string) *DMap

NewDMap creates an returns a new DMap object.

func (*OlricDB) Shutdown

func (db *OlricDB) Shutdown(ctx context.Context) error

Shutdown stops background servers and leaves the cluster.

func (*OlricDB) Start

func (db *OlricDB) Start() error

Start starts background servers and joins the cluster.

Directories

Path Synopsis
Package client implements a Golang client to access an OlricDB cluster from outside.
Package client implements a Golang client to access an OlricDB cluster from outside.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL