datad
v0.0.0-...-d23f83d
Published: Nov 27, 2014 License: BSD-3-Clause Imports: 14 Imported by: 0

README

datad

A distributed cache that spreads an existing local data source across a cluster, routes requests for data to the appropriate nodes, and ensures data is replicated and available.

datad was created at Sourcegraph, where it is almost ready to be used in production, to provide fast, reliable access to 4TB+ of git and hg repository data (files, commits, branches, etc.).

WARNING: This is a very new project. Use at your own risk!

Architecture

  • Data source: any existing local data source, keyed on some function of your choice. E.g., git repository data (keyed on clone URL).
  • Provider: an interface to the data source on the local machine with methods for ensuring a copy of the data exists on disk, updating the data, and enumerating all of the keys of data.
  • Registry: two mappings: (1) for a given data key, a list of cluster nodes that have the underlying data on disk; and (2) for a given node, a list of data keys that it should fetch/compute and store on disk.
  • Node: a member of the cluster that hosts a subset of the data from its local data source, which it continuously synchronizes with the registry.
  • Client: a consumer of the data source that routes its requests for data to the nodes that are registered for any given data key.

Tests

Run go test.

The cluster package of sourcegraph.com/sourcegraph/vcsstore also contains good tests.

TODO

  • Support keeping a list of data keys that must always be available.
  • Allow nodes to indicate that they are fetching a piece of data for the first time. Currently, if the first fetch takes (e.g.) 5s and a client requests the key twice before it finishes, the first request registers the key to the node and the second request deregisters the key from the node (because the key transport notices that the node failed to respond successfully). Because registration uses (pseudo-) "consistent hashing", the key usually gets registered to the same node on the next request, by which time the node has the data, but this still causes unnecessary traffic and errors in the meantime.
  • Allow nodes to indicate they don't want more keys to be registered to them (e.g., when their disk is full).
  • Make the Provider.Keys method return keys as it finds them on disk, instead of waiting until it's found all of them.
  • When the provider registers keys that already exist on disk, the watcher also notices them and triggers a duplicate update. Make the watcher ignore keys that are already registered.
  • Fix rebalancing, which continues to reassign keys to dead nodes.
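
The TODO list above mentions that registration uses (pseudo-) "consistent hashing" but does not describe the algorithm. The following is a minimal sketch of one scheme that fits that description, assuming a simple hash-modulo choice over the sorted node list. The names pickNode and errNoNodes are hypothetical and are not part of datad.

```go
package main

import (
	"errors"
	"fmt"
	"hash/fnv"
	"sort"
)

// errNoNodes is a hypothetical error for this sketch; datad exports
// ErrNoAvailableNodesForRegistration for the analogous condition.
var errNoNodes = errors.New("no available nodes")

// pickNode hashes key and uses the hash to index into the sorted node list,
// so the same key and the same set of nodes always yield the same node.
// This is an assumed scheme, not datad's actual implementation.
func pickNode(key string, nodes []string) (string, error) {
	if len(nodes) == 0 {
		return "", errNoNodes
	}
	sorted := append([]string(nil), nodes...) // copy so the caller's slice is untouched
	sort.Strings(sorted)
	h := fnv.New32a()
	h.Write([]byte(key)) // writes to a hash.Hash never return an error
	return sorted[h.Sum32()%uint32(len(sorted))], nil
}

func main() {
	nodes := []string{"node2:8080", "node1:8080", "node3:8080"}
	for _, key := range []string{"github.com/foo/bar", "github.com/baz/qux"} {
		node, err := pickNode(key, nodes)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s -> %s\n", key, node)
	}
}
```

Unlike true consistent hashing, a modulo scheme remaps most keys whenever the number of nodes changes. As long as the node list is stable, though, a key that was deregistered lands on the same node when it is registered again.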

Documentation

Constants

const (
	DefaultKeyPrefix = "/datad/"
)

Variables

var (
	// NodeMembershipTTL is the time-to-live of the etcd key that denotes a
	// node's membership in the cluster.
	NodeMembershipTTL = 10 * time.Second

	// BalanceInterval is the time interval for starting a balancing job on the
	// whole keyspace on each node.
	BalanceInterval = 5 * time.Minute
)
var ErrKeyNotExist = errors.New("key does not exist")
var ErrNoAvailableNodesForRegistration = errors.New("no available nodes to register key with")
var ErrNoNodesForKey = errors.New("key has no nodes")
var RegistrationTTL = 60 * time.Second

Functions

func IdentityKey

func IdentityKey(path string) (string, error)

IdentityKey is a KeyFunc that treats each path as a key.

Types

type Backend

type Backend interface {
	Get(key string) (string, error)
	List(key string, recursive bool) ([]string, error)

	// ListKeys lists only keys (not directories).
	ListKeys(key string, recursive bool) ([]string, error)

	Set(key, value string) error
	SetDir(key string, ttl uint64) error
	UpdateDir(key string, ttl uint64) error
	Delete(key string) error
}

func NewEtcdBackend

func NewEtcdBackend(keyPrefix string, c *etcd.Client) Backend

type Client

type Client struct {
	// KeyURLPrefix, if set, is prepended to all HTTP request URL paths using
	// the transport from TransportForKey. It is useful when your keys refer to
	// data hosted on an HTTP server somewhere other than the root path. For
	// example, if the datad key "/foo" refers to "http://example.com/api/foo",
	// then KeyURLPrefix would be "/api/".
	KeyURLPrefix string

	Log *log.Logger
	// contains filtered or unexported fields
}

A Client routes requests for data.
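
The KeyURLPrefix comment gives the example of the key "/foo" living at "http://example.com/api/foo". The sketch below shows one way such a prefix could be prepended to a request URL's path. The helper prefixedURL is hypothetical and only illustrates the documented example; how datad applies the prefix internally is not shown in this documentation.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
)

// prefixedURL returns rawURL with keyURLPrefix prepended to its path.
// It is a hypothetical helper, not part of datad's API.
func prefixedURL(rawURL, keyURLPrefix string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	if keyURLPrefix != "" {
		// path.Join collapses duplicate slashes, so "/api/" + "/foo"
		// becomes "/api/foo". It also drops any trailing slash.
		u.Path = path.Join("/", keyURLPrefix, u.Path)
	}
	return u.String(), nil
}

func main() {
	out, err := prefixedURL("http://example.com/foo", "/api/")
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // prints "http://example.com/api/foo"
}
```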

func NewClient

func NewClient(b Backend) *Client

func (*Client) NodesForKey

func (c *Client) NodesForKey(key string) ([]string, error)

NodesForKey returns a list of nodes that, according to the registry, hold the data specified by key.

func (*Client) NodesInCluster

func (c *Client) NodesInCluster() ([]string, error)

NodesInCluster returns a list of all nodes in the cluster.

func (*Client) TransportForKey

func (c *Client) TransportForKey(key string, underlying http.RoundTripper) (*KeyTransport, error)

TransportForKey returns an HTTP transport (http.RoundTripper) optimized for accessing the data specified by key.

If key is not registered to any nodes, ErrNoNodesForKey is returned.

func (*Client) Update

func (c *Client) Update(key string) (nodes []string, err error)

Update updates key from the data source on the nodes that are registered to it. If key is not registered to any nodes, a node is registered for it and the key is created on that node.

type EtcdBackend

type EtcdBackend struct {
	// contains filtered or unexported fields
}

func (*EtcdBackend) Delete

func (c *EtcdBackend) Delete(key string) error

func (*EtcdBackend) Get

func (c *EtcdBackend) Get(key string) (string, error)

func (*EtcdBackend) List

func (c *EtcdBackend) List(key string, recursive bool) ([]string, error)

func (*EtcdBackend) ListKeys

func (c *EtcdBackend) ListKeys(key string, recursive bool) ([]string, error)

func (*EtcdBackend) Set

func (c *EtcdBackend) Set(key, value string) error

func (*EtcdBackend) SetDir

func (c *EtcdBackend) SetDir(key string, ttl uint64) error

func (*EtcdBackend) UpdateDir

func (c *EtcdBackend) UpdateDir(key string, ttl uint64) error

type HTTPError

type HTTPError struct {
	StatusCode int
	Body       string
}

func (*HTTPError) Error

func (e *HTTPError) Error() string

type KeyFunc

type KeyFunc func(path string) (key string, err error)

A KeyFunc maps path-space onto key-space.

In other words, it returns the key (a string) of the data stored at path. The key, in datad terms, is the unit of storage.

Depending on the type of data, keys and paths may be a 1-to-1 mapping, or paths may point to resources inside of a key. For example, you might key on repository clone URLs and allow paths that refer to specific files or commits inside of a repository.
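
To make the repository example concrete, here is a minimal sketch of a KeyFunc for which several paths map to one key. It assumes paths have the form "host/owner/repo/..." and keys on the first three segments. That layout and the name repoKey are assumptions for illustration, not something datad prescribes.

```go
package main

import (
	"fmt"
	"strings"
)

// KeyFunc repeats the type from the documentation so that this sketch
// compiles on its own.
type KeyFunc func(path string) (key string, err error)

// repoKey is a hypothetical KeyFunc. It assumes paths look like
// "host/owner/repo/anything/else" and returns "host/owner/repo" as the key.
func repoKey(path string) (string, error) {
	parts := strings.Split(strings.Trim(path, "/"), "/")
	if len(parts) < 3 {
		return "", fmt.Errorf("path %q does not name a repository", path)
	}
	for _, p := range parts[:3] {
		if p == "" {
			return "", fmt.Errorf("path %q has an empty segment", path)
		}
	}
	return strings.Join(parts[:3], "/"), nil
}

// Compile-time check that repoKey has the KeyFunc signature.
var _ KeyFunc = repoKey

func main() {
	paths := []string{
		"/github.com/foo/bar",
		"/github.com/foo/bar/src/main.go",
		"/github.com/foo",
	}
	for _, p := range paths {
		key, err := repoKey(p)
		fmt.Printf("%q -> key=%q err=%v\n", p, key, err)
	}
}
```

The first two paths share the key "github.com/foo/bar", so requests for the repository and for a file inside it would be routed to the same nodes. The third path is rejected because it does not identify a repository.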

type KeyTransport

type KeyTransport struct {
	// contains filtered or unexported fields
}

func (*KeyTransport) CancelRequest

func (t *KeyTransport) CancelRequest(req *http.Request)

CancelRequest allows a nonzero Timeout to be used on the http.Client. TODO(sqs): check this.

func (*KeyTransport) RoundTrip

func (t *KeyTransport) RoundTrip(req *http.Request) (*http.Response, error)

RoundTrip implements http.RoundTripper. If at least one node responds successfully, no error is returned. If all nodes fail to respond successfully, a *KeyTransportError is returned with the errors from each node.
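
The behavior described for RoundTrip can be sketched as a small failover http.RoundTripper. This is not KeyTransport's implementation. It assumes nodes are tried one at a time and that a 2xx status counts as success. The types failoverTransport, nodeErrors, and roundTripFunc and the stub fakeCluster are hypothetical. The stub stands in for the network so the example runs without any servers.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net/http"
	"sort"
	"strings"
)

// nodeErrors records the failure from each node, keyed by node name.
type nodeErrors map[string]error

func (e nodeErrors) Error() string {
	parts := make([]string, 0, len(e))
	for node, err := range e {
		parts = append(parts, node+": "+err.Error())
	}
	sort.Strings(parts)
	return "all nodes failed: " + strings.Join(parts, "; ")
}

// failoverTransport sends each request to the nodes in order until one
// responds successfully.
type failoverTransport struct {
	nodes      []string // "host:port" of each node registered for the key
	underlying http.RoundTripper
}

func (t *failoverTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	if len(t.nodes) == 0 {
		return nil, errors.New("key has no nodes")
	}
	errs := nodeErrors{}
	for _, node := range t.nodes {
		// Clone so the caller's request is never modified. Requests with
		// bodies would also need the body rewound; this sketch ignores that.
		r := req.Clone(req.Context())
		r.URL.Scheme, r.URL.Host, r.Host = "http", node, node
		resp, err := t.underlying.RoundTrip(r)
		if err != nil {
			errs[node] = err
			continue
		}
		if resp.StatusCode >= 200 && resp.StatusCode <= 299 {
			return resp, nil
		}
		resp.Body.Close()
		errs[node] = fmt.Errorf("HTTP status %d", resp.StatusCode)
	}
	return nil, errs
}

// roundTripFunc adapts a function to http.RoundTripper.
type roundTripFunc func(*http.Request) (*http.Response, error)

func (f roundTripFunc) RoundTrip(r *http.Request) (*http.Response, error) { return f(r) }

// fakeCluster is a stub network in which only node2:8080 is reachable.
func fakeCluster(req *http.Request) (*http.Response, error) {
	if req.URL.Host == "node2:8080" {
		body := io.NopCloser(strings.NewReader("data for " + req.URL.Path))
		return &http.Response{StatusCode: http.StatusOK, Body: body}, nil
	}
	return nil, errors.New("connection refused")
}

// fetch requests /repo/file through the failover transport.
func fetch(nodes ...string) (string, error) {
	t := &failoverTransport{nodes: nodes, underlying: roundTripFunc(fakeCluster)}
	req, err := http.NewRequest("GET", "http://datad.invalid/repo/file", nil)
	if err != nil {
		return "", err
	}
	resp, err := t.RoundTrip(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	fmt.Println(fetch("node1:8080", "node2:8080")) // node1 fails, node2 answers
	fmt.Println(fetch("node1:8080", "node3:8080")) // both fail, errors are aggregated
}
```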

func (*KeyTransport) SyncWithRegistry

func (t *KeyTransport) SyncWithRegistry() error

SyncWithRegistry updates the list of nodes that this transport attempts to make HTTP requests to. The new nodes are looked up in the registry.

type KeyTransportError

type KeyTransportError struct {
	URL        string
	NodeErrors map[string]error

	// OtherError is an error encountered while trying to register key with other nodes.
	OtherError error
}

KeyTransportError denotes that the key transport's RoundTrip failed. It records the individual errors for each node it attempted to contact.

func (*KeyTransportError) Error

func (e *KeyTransportError) Error() string

type Node

type Node struct {
	Name     string
	Provider Provider

	// Updaters is the maximum number of concurrent calls to Provider.Update
	// that may be executing at any given time on this node.
	Updaters int

	Log *log.Logger
	// contains filtered or unexported fields
}

A Node ensures that the provider's keys are registered and coordinates distribution of data among the other nodes in the cluster.
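
The Updaters field caps the number of concurrent Provider.Update calls. One common way to enforce such a cap in Go is a buffered channel used as a semaphore, sketched below. The functions updateAll and peakConcurrency are hypothetical; whether Node uses this technique internally is not stated in this documentation.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// updateAll calls update once per key, with at most updaters calls running
// at the same time. It returns the errors keyed by the key that failed.
func updateAll(keys []string, updaters int, update func(key string) error) map[string]error {
	if updaters < 1 {
		updaters = 1
	}
	sem := make(chan struct{}, updaters) // one slot per allowed concurrent update
	var (
		wg   sync.WaitGroup
		mu   sync.Mutex
		errs = map[string]error{}
	)
	for _, key := range keys {
		wg.Add(1)
		sem <- struct{}{} // blocks while all slots are in use
		go func(key string) {
			defer wg.Done()
			defer func() { <-sem }() // free the slot when this update finishes
			if err := update(key); err != nil {
				mu.Lock()
				errs[key] = err
				mu.Unlock()
			}
		}(key)
	}
	wg.Wait()
	return errs
}

// peakConcurrency runs numKeys slow fake updates and reports the highest
// number that were observed running at once.
func peakConcurrency(numKeys, updaters int) int {
	var mu sync.Mutex
	running, peak := 0, 0
	keys := make([]string, numKeys)
	for i := range keys {
		keys[i] = fmt.Sprintf("key%d", i)
	}
	updateAll(keys, updaters, func(string) error {
		mu.Lock()
		running++
		if running > peak {
			peak = running
		}
		mu.Unlock()
		time.Sleep(time.Millisecond) // simulate a slow fetch
		mu.Lock()
		running--
		mu.Unlock()
		return nil
	})
	return peak
}

func main() {
	fmt.Println("peak concurrent updates:", peakConcurrency(10, 3))
}
```

With updaters set to 3, the reported peak never exceeds 3, however many keys are queued.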

func NewNode

func NewNode(name string, b Backend, p Provider) *Node

NewNode creates a new node to publish data from a provider to the cluster. The name ("host:port") is advertised to the cluster and therefore must be accessible by the other clients and nodes in the cluster. The name should be the host and port where the data on this machine is accessible.

Call Start on this node to begin publishing its keys to the cluster.

func (*Node) Start

func (n *Node) Start() error

Start begins advertising this node's provider's keys to the cluster.

func (*Node) Stop

func (n *Node) Stop() error

Stop deregisters this node's keys and stops background processes for this node.

type Provider

type Provider interface {
	// HasKey returns whether this provider has the underlying data for key. If
	// not, it returns the error ErrKeyNotExist.
	HasKey(key string) (bool, error)

	// Keys returns a list of keys under keyPrefix.
	Keys(keyPrefix string) ([]string, error)

	// Update performs a synchronous update of this key's data from the
	// underlying data source. If key does not exist in this provider, it will
	// be created.
	Update(key string) error
}

A Provider makes data accessible to the datad cluster.
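
For tests or experiments, a Provider can be backed by a map instead of the disk. The sketch below redeclares the Provider interface and ErrKeyNotExist so that it compiles on its own. The type memProvider is hypothetical, and an update is recorded as a counter rather than a real fetch.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"strings"
	"sync"
)

// ErrKeyNotExist and Provider repeat the declarations from the documentation.
var ErrKeyNotExist = errors.New("key does not exist")

type Provider interface {
	HasKey(key string) (bool, error)
	Keys(keyPrefix string) ([]string, error)
	Update(key string) error
}

// memProvider is a hypothetical in-memory Provider. It counts how many
// times each key has been updated instead of fetching real data.
type memProvider struct {
	mu      sync.Mutex
	updates map[string]int
}

func newMemProvider() *memProvider {
	return &memProvider{updates: map[string]int{}}
}

// HasKey reports whether key has been created by a previous Update.
func (p *memProvider) HasKey(key string) (bool, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if _, ok := p.updates[key]; !ok {
		return false, ErrKeyNotExist
	}
	return true, nil
}

// Keys returns, in sorted order, every key that starts with keyPrefix.
func (p *memProvider) Keys(keyPrefix string) ([]string, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	var keys []string
	for k := range p.updates {
		if strings.HasPrefix(k, keyPrefix) {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)
	return keys, nil
}

// Update creates key if it does not exist and records the update.
func (p *memProvider) Update(key string) error {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.updates[key]++
	return nil
}

// Compile-time check that memProvider satisfies Provider.
var _ Provider = (*memProvider)(nil)

func main() {
	p := newMemProvider()
	_, err := p.HasKey("github.com/foo/bar")
	fmt.Println(err) // prints "key does not exist"

	if err := p.Update("github.com/foo/bar"); err != nil {
		panic(err)
	}
	ok, _ := p.HasKey("github.com/foo/bar")
	keys, _ := p.Keys("github.com/")
	fmt.Println(ok, keys) // prints "true [github.com/foo/bar]"
}
```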

type Registry

type Registry struct {
	// contains filtered or unexported fields
}

A Registry contains a bidirectional mapping between data keys and nodes: (1) for a given data key, a list of cluster nodes that have the underlying data on disk; and (2) for a given node, a list of data keys that it should fetch/compute and store on disk.
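
The two mappings can be pictured as a pair of maps that are always updated together. The sketch below is an in-memory illustration only. The real Registry keeps its state in a Backend, and the type memRegistry and its lowercase methods are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// memRegistry is a hypothetical in-memory stand-in for the registry's two
// mappings.
type memRegistry struct {
	nodesByKey map[string]map[string]bool // data key -> nodes holding it
	keysByNode map[string]map[string]bool // node -> keys it should store
}

func newMemRegistry() *memRegistry {
	return &memRegistry{
		nodesByKey: map[string]map[string]bool{},
		keysByNode: map[string]map[string]bool{},
	}
}

// add records the (key, node) pairing in both directions.
func (r *memRegistry) add(key, node string) {
	if r.nodesByKey[key] == nil {
		r.nodesByKey[key] = map[string]bool{}
	}
	if r.keysByNode[node] == nil {
		r.keysByNode[node] = map[string]bool{}
	}
	r.nodesByKey[key][node] = true
	r.keysByNode[node][key] = true
}

// remove deletes the (key, node) pairing from both directions.
// Deleting from a nil inner map is a no-op, so unknown pairs are ignored.
func (r *memRegistry) remove(key, node string) {
	delete(r.nodesByKey[key], node)
	delete(r.keysByNode[node], key)
}

// sortedSet returns the members of a set in sorted order.
func sortedSet(m map[string]bool) []string {
	out := make([]string, 0, len(m))
	for k := range m {
		out = append(out, k)
	}
	sort.Strings(out)
	return out
}

func (r *memRegistry) nodesForKey(key string) []string  { return sortedSet(r.nodesByKey[key]) }
func (r *memRegistry) keysForNode(node string) []string { return sortedSet(r.keysByNode[node]) }

func main() {
	r := newMemRegistry()
	r.add("github.com/foo/bar", "node1:8080")
	r.add("github.com/foo/bar", "node2:8080")
	r.add("github.com/baz/qux", "node1:8080")

	fmt.Println(r.nodesForKey("github.com/foo/bar")) // prints "[node1:8080 node2:8080]"
	fmt.Println(r.keysForNode("node1:8080"))         // prints "[github.com/baz/qux github.com/foo/bar]"

	r.remove("github.com/foo/bar", "node1:8080")
	fmt.Println(r.keysForNode("node1:8080")) // prints "[github.com/baz/qux]"
}
```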

func NewRegistry

func NewRegistry(b Backend) *Registry

func (*Registry) Add

func (r *Registry) Add(key, node string) error

func (*Registry) KeyMap

func (r *Registry) KeyMap() (map[string][]string, error)

KeyMap returns a map of keys to a list of their registered nodes.

func (*Registry) KeysForNode

func (r *Registry) KeysForNode(node string) ([]string, error)

func (*Registry) NodesForKey

func (r *Registry) NodesForKey(key string) ([]string, error)

func (*Registry) Remove

func (r *Registry) Remove(key, node string) error
