storage

package
v0.0.0-...-2b2087e Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jul 21, 2014 License: Apache-2.0 Imports: 22 Imported by: 0

Documentation

Overview

Package storage implements the Cockroach storage node. A storage node exports the "Node" Go RPC service. Each node handles one or more stores, identified by a device name. Each store corresponds to a single physical device. A store multiplexes to one or more ranges, identified by start key. Ranges are contiguous regions of the keyspace. Each range implements an instance of the raft consensus algorithm to synchronize range replicas.

The Engine interface provides an API for key-value stores. InMem implements an in-memory engine using a sorted map. RocksDB implements an engine for data stored to local disk using RocksDB, a variant of LevelDB.

Index

Constants

This section is empty.

Variables

View Source
var (
	// KeyMin is a minimum key value which sorts before all other keys.
	KeyMin = Key("")
	// KeyMax is a maximum key value which sorts after all other
	// keys. Because keys are stored using an ordered encoding (see
	// storage/encoding.go), they will never start with \xff.
	KeyMax = Key("\xff")

	// KeyConfigAccountingPrefix specifies the key prefix for accounting
	// configurations. The suffix is the affected key prefix.
	KeyConfigAccountingPrefix = Key("\x00acct")
	// KeyConfigPermissionPrefix specifies the key prefix for accounting
	// configurations. The suffix is the affected key prefix.
	KeyConfigPermissionPrefix = Key("\x00perm")
	// KeyConfigZonePrefix specifies the key prefix for zone
	// configurations. The suffix is the affected key prefix.
	KeyConfigZonePrefix = Key("\x00zone")
	// KeyMetaPrefix is the prefix for range metadata keys.
	KeyMetaPrefix = Key("\x00\x00meta")
	// KeyMeta1Prefix is the first level of key addressing. The value is a
	// RangeDescriptor struct.
	KeyMeta1Prefix = MakeKey(KeyMetaPrefix, Key("1"))
	// KeyMeta2Prefix is the second level of key addressing. The value is a
	// RangeDescriptor struct.
	KeyMeta2Prefix = MakeKey(KeyMetaPrefix, Key("2"))
	// KeyNodeIDGenerator contains a sequence generator for node IDs.
	KeyNodeIDGenerator = Key("\x00node-id-generator")
	// KeyStoreIDGeneratorPrefix specifies key prefixes for sequence
	// generators, one per node, for store IDs.
	KeyStoreIDGeneratorPrefix = Key("\x00store-id-generator-")
)

Constants for system-reserved keys in the KV map.

Functions

This section is empty.

Types

type AcctConfig

type AcctConfig struct {
}

AcctConfig holds accounting configuration.

type AccumulateTSRequest

type AccumulateTSRequest struct {
	RequestHeader
	Key    Key
	Counts []int64 // One per discrete subtime period (e.g. one/minute or one/second)
}

An AccumulateTSRequest is arguments to the AccumulateTS() method. It specifies the key at which to accumulate TS values, and the time series counts for this discrete time interval.

type AccumulateTSResponse

type AccumulateTSResponse struct {
	ResponseHeader
}

An AccumulateTSResponse is the return value from the AccumulateTS() method.

type Attributes

type Attributes []string

Attributes specifies a list of arbitrary strings describing node topology, store type, and machine capabilities.

func (Attributes) IsSubset

func (a Attributes) IsSubset(b Attributes) bool

IsSubset returns whether attributes list b is a subset of attributes list a.

func (Attributes) SortedString

func (a Attributes) SortedString() string

SortedString returns a sorted, de-duplicated, comma-separated list of the attributes.

type ContainsRequest

type ContainsRequest struct {
	RequestHeader
	Key Key
}

A ContainsRequest is arguments to the Contains() method.

type ContainsResponse

type ContainsResponse struct {
	ResponseHeader
	Exists bool
}

A ContainsResponse is the return value of the Contains() method.

type DeleteRangeRequest

type DeleteRangeRequest struct {
	RequestHeader
	StartKey Key // Empty to start at first key
	EndKey   Key // Non-inclusive; if empty, deletes all
}

A DeleteRangeRequest is arguments to the DeleteRange method. It specifies the range of keys to delete.

type DeleteRangeResponse

type DeleteRangeResponse struct {
	ResponseHeader
	NumDeleted uint64
}

A DeleteRangeResponse is the return value from the DeleteRange() method.

type DeleteRequest

type DeleteRequest struct {
	RequestHeader
	Key Key
}

A DeleteRequest is arguments to the Delete() method.

type DeleteResponse

type DeleteResponse struct {
	ResponseHeader
}

A DeleteResponse is the return value from the Delete() method.

type EndTransactionRequest

type EndTransactionRequest struct {
	RequestHeader
	Commit bool  // False to abort and rollback
	Keys   []Key // Write-intent keys to commit or abort
}

An EndTransactionRequest is arguments to the EndTransaction() method. It specifies whether to commit or roll back an extant transaction. It also lists the keys involved in the transaction so their write intents may be aborted or committed.

type EndTransactionResponse

type EndTransactionResponse struct {
	ResponseHeader
	CommitTimestamp int64 // Unix nanos (us)
	CommitWait      int64 // Remaining with (us)
}

An EndTransactionResponse is the return value from the EndTransaction() method. It specifies the commit timestamp for the final transaction (all writes will have this timestamp). It further specifies the commit wait, which is the remaining time the client MUST wait before signalling completion of the transaction to another distributed node to maintain consistency.

type Engine

type Engine interface {
	// The engine/store attributes.
	Attrs() Attributes
	// contains filtered or unexported methods
}

Engine is the interface that wraps the core operations of a key/value store.

type EnqueueMessageRequest

type EnqueueMessageRequest struct {
	RequestHeader
	Inbox   Key   // Recipient key
	Message Value // Message value to delivery to inbox
}

An EnqueueMessageRequest is arguments to the EnqueueMessage() method. It specifies the recipient inbox key and the message (an arbitrary byte slice value).

type EnqueueMessageResponse

type EnqueueMessageResponse struct {
	ResponseHeader
}

An EnqueueMessageResponse is the return value from the EnqueueMessage() method.

type EnqueueUpdateRequest

type EnqueueUpdateRequest struct {
	RequestHeader
	Update interface{}
}

An EnqueueUpdateRequest is arguments to the EnqueueUpdate() method. It specifies the update to enqueue for asynchronous execution. Update is an instance of one of the following messages: PutRequest, IncrementRequest, DeleteRequest, DeleteRangeRequest, or AccountingRequest.

type EnqueueUpdateResponse

type EnqueueUpdateResponse struct {
	ResponseHeader
}

An EnqueueUpdateResponse is the return value from the EnqueueUpdate() method.

type GetRequest

type GetRequest struct {
	RequestHeader
	Key Key
}

A GetRequest is arguments to the Get() method.

type GetResponse

type GetResponse struct {
	ResponseHeader
	Value Value
}

A GetResponse is the return value from the Get() method. If the key doesn't exist, returns nil for Value.Bytes.

type InMem

type InMem struct {
	sync.RWMutex
	// contains filtered or unexported fields
}

InMem a simple, in-memory key-value store.

func NewInMem

func NewInMem(attrs Attributes, maxBytes int64) *InMem

NewInMem allocates and returns a new InMem object.

func (*InMem) Attrs

func (in *InMem) Attrs() Attributes

Attrs returns the list of attributes describing this engine. This includes the disk type (always "mem") and potentially other labels to identify important attributes of the engine.

func (*InMem) String

func (in *InMem) String() string

String formatter.

type IncrementRequest

type IncrementRequest struct {
	RequestHeader
	Key       Key
	Increment int64
}

An IncrementRequest is arguments to the Increment() method. It increments the value for key, interpreting the existing value as a varint64.

type IncrementResponse

type IncrementResponse struct {
	ResponseHeader
	NewValue int64
}

An IncrementResponse is the return value from the Increment method. The new value after increment is specified in NewValue. If the value could not be decoded as specified, Error will be set.

type InternalRangeLookupRequest

type InternalRangeLookupRequest struct {
	RequestHeader
	Key Key
}

An InternalRangeLookupRequest is arguments to the InternalRangeLookup() method. It specifies the key for range lookup, which is a system key prefixed by KeyMeta1Prefix or KeyMeta2Prefix to the user key.

type InternalRangeLookupResponse

type InternalRangeLookupResponse struct {
	ResponseHeader
	EndKey Key // The key in datastore whose value is the Range object.
	Range  RangeDescriptor
}

An InternalRangeLookupResponse is the return value from the InternalRangeLookup() method. It returns the metadata for the range where the key resides. When looking up 1-level metadata, it returns the info for the range containing the 2-level metadata for the key. And when looking up 2-level metadata, it returns the info for the range possibly containing the actual key and its value.

type Key

type Key []byte

Key defines the key in the key-value datastore.

func MakeKey

func MakeKey(prefix, suffix Key) Key

MakeKey makes a new key which is prefix+suffix.

func PrefixEndKey

func PrefixEndKey(prefix Key) Key

PrefixEndKey determines the end key given a start key as a prefix. This adds "1" to the final byte and propagates the carry. The special case of KeyMin ("") always returns KeyMax ("\xff").

type KeyValue

type KeyValue struct {
	Key
	Value
}

KeyValue is a pair of Key and Value for returned Key/Value pairs from ScanRequest/ScanResponse. It embeds a Key and a Value.

func (KeyValue) Compare

func (kv KeyValue) Compare(b llrb.Comparable) int

Compare implements the llrb.Comparable interface for tree nodes.

type LogEntry

type LogEntry struct {
	Method string
	Args   interface{}
	Reply  interface{}
	// contains filtered or unexported fields
}

A LogEntry provides serialization of a read/write command. Once committed to the log, the command is executed and the result returned via the done channel.

type NodeDescriptor

type NodeDescriptor struct {
	NodeID  int32
	Address net.Addr
	Attrs   Attributes // node specific attributes (e.g. datacenter, machine info)
}

NodeDescriptor holds details on node physical/network topology.

type PermConfig

type PermConfig struct {
	Perms []Permission `yaml:"permissions,omitempty"`
}

PermConfig holds permission configuration.

type Permission

type Permission struct {
	Users    []string `yaml:"users,omitempty"`    // Empty to specify default permission
	Read     bool     `yaml:"read,omitempty"`     // Default means reads are restricted
	Write    bool     `yaml:"write,omitempty"`    // Default means writes are restricted
	Priority float32  `yaml:"priority,omitempty"` // 0.0 means default priority
}

Permission specifies read/write access and associated priority.

type PutRequest

type PutRequest struct {
	RequestHeader
	Key      Key    // must be non-empty
	Value    Value  // The value to put
	ExpValue *Value // ExpValue.Bytes empty to test for non-existence
}

A PutRequest is arguments to the Put() method. Conditional puts are supported if ExpValue is set. - Returns true and sets value if ExpValue equals existing value. - If key doesn't exist and ExpValue is empty, sets value. - Otherwise, returns error.

type PutResponse

type PutResponse struct {
	ResponseHeader
	ActualValue *Value // ActualValue.Bytes set if conditional put failed
}

A PutResponse is the return value form the Put() method.

type Range

type Range struct {
	Meta RangeMetadata
	// contains filtered or unexported fields
}

A Range is a contiguous keyspace with writes managed via an instance of the Raft consensus algorithm. Many ranges may exist in a store and they are unlikely to be contiguous. Ranges are independent units and are responsible for maintaining their own integrity by replacing failed replicas, splitting and merging as appropriate.

func NewRange

func NewRange(meta RangeMetadata, engine Engine, allocator *allocator, gossip *gossip.Gossip) *Range

NewRange initializes the range starting at key.

func (*Range) AccumulateTS

func (r *Range) AccumulateTS(args *AccumulateTSRequest, reply *AccumulateTSResponse)

AccumulateTS is used internally to aggregate statistics over key ranges throughout the distributed cluster.

func (*Range) Contains

func (r *Range) Contains(args *ContainsRequest, reply *ContainsResponse)

Contains verifies the existence of a key in the key value store.

func (*Range) Delete

func (r *Range) Delete(args *DeleteRequest, reply *DeleteResponse)

Delete deletes the key and value specified by key.

func (*Range) DeleteRange

func (r *Range) DeleteRange(args *DeleteRangeRequest, reply *DeleteRangeResponse)

DeleteRange deletes the range of key/value pairs specified by start and end keys.

func (*Range) EndTransaction

func (r *Range) EndTransaction(args *EndTransactionRequest, reply *EndTransactionResponse)

EndTransaction either commits or aborts (rolls back) an extant transaction according to the args.Commit parameter.

func (*Range) EnqueueMessage

func (r *Range) EnqueueMessage(args *EnqueueMessageRequest, reply *EnqueueMessageResponse)

EnqueueMessage enqueues a message (Value) for delivery to a recipient inbox.

func (*Range) EnqueueUpdate

func (r *Range) EnqueueUpdate(args *EnqueueUpdateRequest, reply *EnqueueUpdateResponse)

EnqueueUpdate sidelines an update for asynchronous execution. AccumulateTS updates are sent this way. Eventually-consistent indexes are also built using update queues. Crucially, the enqueue happens as part of the caller's transaction, so is guaranteed to be executed if the transaction succeeded.

func (*Range) Get

func (r *Range) Get(args *GetRequest, reply *GetResponse)

Get returns the value for a specified key.

func (*Range) Increment

func (r *Range) Increment(args *IncrementRequest, reply *IncrementResponse)

Increment increments the value (interpreted as varint64 encoded) and returns the newly incremented value (encoded as varint64). If no value exists for the key, zero is incremented.

func (*Range) InternalRangeLookup

func (r *Range) InternalRangeLookup(args *InternalRangeLookupRequest, reply *InternalRangeLookupResponse)

InternalRangeLookup looks up the metadata info for the given args.Key. args.Key should be a metadata key, which are of the form "\0\0meta[12]<encoded_key>".

func (*Range) IsFirstRange

func (r *Range) IsFirstRange() bool

IsFirstRange returns true if this is the first range.

func (*Range) IsLeader

func (r *Range) IsLeader() bool

IsLeader returns true if this range replica is the raft leader. TODO(spencer): this is always true for now.

func (*Range) Put

func (r *Range) Put(args *PutRequest, reply *PutResponse)

Put sets the value for a specified key. Conditional puts are supported.

func (*Range) ReadOnlyCmd

func (r *Range) ReadOnlyCmd(method string, args, reply interface{}) error

ReadOnlyCmd executes a read-only command against the store. If this server is the raft leader, we can satisfy the read locally. Otherwise, if this server has executed a raft command or heartbeat at a timestamp greater than the read timestamp, we can also satisfy the read locally. Otherwise, we must ping the leader to determine with certainty whether our local data is up to date.

func (*Range) ReadWriteCmd

func (r *Range) ReadWriteCmd(method string, args, reply interface{}) <-chan error

ReadWriteCmd executes a read-write command against the store. If this node is the raft leader, it proposes the write to the other raft participants. Otherwise, the write is forwarded via a FollowerPropose RPC to the leader and this replica waits for an ACK to execute the command locally and return the result to the requesting client.

Commands which mutate the store must be proposed as part of the raft consensus write protocol. Only after committed can the command be executed. To facilitate this, ReadWriteCmd returns a channel which is signaled upon completion.

func (*Range) ReapQueue

func (r *Range) ReapQueue(args *ReapQueueRequest, reply *ReapQueueResponse)

ReapQueue destructively queries messages from a delivery inbox queue. This method must be called from within a transaction.

func (*Range) Scan

func (r *Range) Scan(args *ScanRequest, reply *ScanResponse)

Scan scans the key range specified by start key through end key up to some maximum number of results. The last key of the iteration is returned with the reply.

func (*Range) Start

func (r *Range) Start()

Start begins gossiping and starts the pending log entry processing loop in a goroutine.

func (*Range) Stop

func (r *Range) Stop()

Stop ends the log processing loop.

type RangeDescriptor

type RangeDescriptor struct {
	// The start key of the range represented by this struct, along with the
	// meta1 or meta2 key prefix.
	StartKey Key
	Replicas []Replica
}

RangeDescriptor is the metadata value stored for a metadata key. The metadata key has meta1 or meta2 key prefix and the suffix encodes the end key of the range this struct represents.

type RangeMetadata

type RangeMetadata struct {
	ClusterID string
	RangeID   int64
	StartKey  Key
	EndKey    Key
	Replicas  RangeDescriptor
}

A RangeMetadata holds information about the range, including range ID and start and end keys, and replicas slice.

type ReapQueueRequest

type ReapQueueRequest struct {
	RequestHeader
	Inbox      Key   // Recipient inbox key
	MaxResults int64 // Maximum results to return; must be > 0
}

A ReapQueueRequest is arguments to the ReapQueue() method. It specifies the recipient inbox key to which messages are waiting to be reapted and also the maximum number of results to return.

type ReapQueueResponse

type ReapQueueResponse struct {
	ResponseHeader
	Messages []Value
}

A ReapQueueResponse is the return value from the ReapQueue() method.

type Replica

type Replica struct {
	NodeID  int32
	StoreID int32
	RangeID int64
	Attrs   Attributes // combination of node & store attributes
}

Replica describes a replica location by node ID (corresponds to a host:port via lookup on gossip network), store ID (corresponds to a physical device, unique per node) and range ID. Datacenter and DiskType are provided to optimize reads. Replicas are stored in Range lookup records (meta1, meta2).

func ChooseRandomReplica

func ChooseRandomReplica(replicas []Replica) *Replica

ChooseRandomReplica returns a replica selected at random or nil if none exist.

type RequestHeader

type RequestHeader struct {
	// Timestamp specifies time at which read or writes should be
	// performed. In nanoseconds since the epoch. Defaults to current
	// wall time.
	Timestamp int64

	// Replica specifies the destination for the request. See config.go.
	Replica Replica
	// MaxTimestamp is the maximum wall time seen by the client to
	// date. This should be supplied with successive transactions for
	// linearalizability for this client. In nanoseconds since the
	// epoch.
	MaxTimestamp int64
	// TxID is set non-empty if a transaction is underway. Empty string
	// to start a new transaction.
	TxID string
}

RequestHeader is supplied with every storage node request.

type ResponseHeader

type ResponseHeader struct {
	// Error is non-nil if an error occurred.
	Error error
	// TxID is non-empty if a transaction is underway.
	TxID string
}

ResponseHeader is returned with every storage node response.

type RocksDB

type RocksDB struct {
	// contains filtered or unexported fields
}

RocksDB is a wrapper around a RocksDB database instance.

func NewRocksDB

func NewRocksDB(attrs Attributes, dir string) (*RocksDB, error)

NewRocksDB allocates and returns a new RocksDB object.

func (*RocksDB) Attrs

func (r *RocksDB) Attrs() Attributes

Attrs returns the list of attributes describing this engine. This may include a specification of disk type (e.g. hdd, ssd, fio, etc.) and potentially other labels to identify important attributes of the engine.

func (*RocksDB) String

func (r *RocksDB) String() string

String formatter.

type ScanRequest

type ScanRequest struct {
	RequestHeader
	StartKey   Key   // Empty to start at first key
	EndKey     Key   // Optional max key; empty to ignore
	MaxResults int64 // Must be > 0
}

A ScanRequest is arguments to the Scan() method. It specifies the start and end keys for the scan and the maximum number of results.

type ScanResponse

type ScanResponse struct {
	ResponseHeader
	Rows []KeyValue // Empty if no rows were scanned
}

A ScanResponse is the return value from the Scan() method.

type Store

type Store struct {
	Ident StoreIdent
	// contains filtered or unexported fields
}

A Store maintains a map of ranges by start key. A Store corresponds to one physical device.

func NewStore

func NewStore(engine Engine, gossip *gossip.Gossip) *Store

NewStore returns a new instance of a store.

func (*Store) Attrs

func (s *Store) Attrs() Attributes

Attrs returns the attributes of the underlying store.

func (*Store) Bootstrap

func (s *Store) Bootstrap(ident StoreIdent) error

Bootstrap writes a new store ident to the underlying engine. To ensure that no crufty data already exists in the engine, it scans the engine contents before writing the new store ident. The engine should be completely empty. It returns an error if called on a non-empty engine.

func (*Store) Capacity

func (s *Store) Capacity() (StoreCapacity, error)

Capacity returns the capacity of the underlying storage engine.

func (*Store) Close

func (s *Store) Close()

Close calls Range.Stop() on all active ranges.

func (*Store) CreateRange

func (s *Store) CreateRange(startKey, endKey Key, replicas []Replica) (*Range, error)

CreateRange allocates a new range ID and stores range metadata. On success, returns the new range.

func (*Store) Descriptor

func (s *Store) Descriptor(nodeDesc *NodeDescriptor) (*StoreDescriptor, error)

Descriptor returns a StoreDescriptor including current store capacity information.

func (*Store) GetRange

func (s *Store) GetRange(rangeID int64) (*Range, error)

GetRange fetches a range by ID. Returns an error if no range is found.

func (*Store) Init

func (s *Store) Init() error

Init reads the StoreIdent from the underlying engine.

func (*Store) IsBootstrapped

func (s *Store) IsBootstrapped() bool

IsBootstrapped returns true if the store has already been bootstrapped. If the store ident is corrupt, IsBootstrapped will return true; the exact error can be retrieved via a call to Init().

func (*Store) String

func (s *Store) String() string

String formats a store for debug output.

type StoreCapacity

type StoreCapacity struct {
	Capacity  int64
	Available int64
}

StoreCapacity contains capacity information for a storage device.

func (StoreCapacity) PercentAvail

func (sc StoreCapacity) PercentAvail() float64

PercentAvail computes the percentage of disk space that is available.

type StoreDescriptor

type StoreDescriptor struct {
	StoreID  int32
	Attrs    Attributes // store specific attributes (e.g. ssd, hdd, mem)
	Node     NodeDescriptor
	Capacity StoreCapacity
}

StoreDescriptor holds store information including store attributes, node descriptor and store capacity.

func (*StoreDescriptor) CombinedAttrs

func (s *StoreDescriptor) CombinedAttrs() Attributes

CombinedAttrs returns the full list of attributes for the store, including both the node and store attributes.

func (StoreDescriptor) Less

func (s StoreDescriptor) Less(b util.Ordered) bool

Less compares two StoreDescriptors based on percentage of disk available.

type StoreFinder

type StoreFinder func(Attributes) ([]*StoreDescriptor, error)

StoreFinder finds the disks in a datacenter with the most available capacity.

type StoreIdent

type StoreIdent struct {
	ClusterID string
	NodeID    int32
	StoreID   int32
}

A StoreIdent uniquely identifies a store in the cluster. The StoreIdent is written to the underlying storage engine at a store-reserved system key (keyStoreIdent).

type Value

type Value struct {
	// Bytes is the byte string value.
	Bytes []byte
	// Timestamp of value in nanoseconds since epoch.
	Timestamp int64
	// Expiration in nanoseconds.
	Expiration int64
}

Value specifies the value at a key. Multiple values at the same key are supported based on timestamp. Values which have been overwritten have an associated expiration, after which they will be permanently deleted.

type ZoneConfig

type ZoneConfig struct {
	// Replicas is a slice of Attributes, each describing required
	// capabilities of each replica in the zone.
	Replicas      []Attributes `yaml:"replicas,omitempty,flow"`
	RangeMinBytes int64        `yaml:"range_min_bytes,omitempty"`
	RangeMaxBytes int64        `yaml:"range_max_bytes,omitempty"`
}

ZoneConfig holds configuration that is needed for a range of KV pairs.

func ParseZoneConfig

func ParseZoneConfig(in []byte) (*ZoneConfig, error)

ParseZoneConfig parses a YAML serialized ZoneConfig.

func (*ZoneConfig) ToYAML

func (z *ZoneConfig) ToYAML() ([]byte, error)

ToYAML serializes a ZoneConfig as YAML.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL