Package ts

Documentation

Overview

Package ts provides a high-level time series API on top of an underlying Cockroach datastore. Cockroach itself provides a single monolithic, sorted key-value map. The ts package collects time series data, consisting of timestamped data points, and efficiently stores it in the key-value store.

Storing time series data is a unique challenge for databases. Time series data is typically generated at an extremely high volume, and is queried by providing a range of time of arbitrary size, which can lead to an enormous amount of data being scanned for a query. Many specialized time series databases exist to meet these challenges.

Cockroach organizes time series data by series name and timestamp; when querying a given series over a time range, all data in that range is thus stored contiguously. However, Cockroach transforms the data in a few important ways in order to effectively query a large amount of data.

Downsampling

The amount of data produced by time series sampling can be staggering; the naive solution, storing every incoming data point with perfect fidelity in a unique key, would consume a tremendous amount of computing resources.

However, in most cases perfect fidelity is not necessary or desired; the exact time a sample was taken is unimportant, and the overall trend of the data over time is far more important to analysis than the individual samples.

With this in mind, Cockroach downsamples data before storing it; the original timestamp for each data point in a series is not recorded. Cockroach instead divides time into contiguous slots of uniform length (e.g. 10 seconds); if multiple data points for a series fall in the same slot, they are aggregated together. Aggregated data for a slot records the count, sum, min and max of data points that were found in that slot. A single slot is referred to as a "sample", and the length of the slot is known as the "sample duration".

Cockroach also uses its own key space efficiently by storing data for multiple samples in a single key. This is done by again dividing time into contiguous slots, but with a longer duration; this is known as the "key duration". For example, Cockroach samples much of its internal data at a resolution of 10 seconds, but stores it with a "key duration" of 1 hour, meaning that all samples that fall in the same hour are stored at the same key. This strategy helps reduce the number of keys scanned during a query.

Finally, Cockroach can record the same series at multiple sample durations, in a process commonly known as "rollup". For example, a single series may be recorded with a sample duration of 10 seconds, while the same data is also recorded with a sample duration of 1 hour. The 1 hour data contains much less information, but can be queried much faster; this is very useful when querying a series over a very long period of time (e.g. an entire month or year).

A specific sample duration in Cockroach is known as a Resolution. Cockroach supports a fixed set of Resolutions; each Resolution has a fixed sample duration and a key duration. For example, the resolution "Resolution10s" has a sample duration of 10 seconds and a key duration of 1 hour.
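
The following sketch illustrates the slot arithmetic described above; normalizeToSlot is a hypothetical helper used only for illustration and is not part of the package API, which performs the equivalent truncation internally.

// normalizeToSlot truncates a nanosecond timestamp down to the start of the
// slot that contains it, for a slot of the given duration.
func normalizeToSlot(timestampNanos, slotDurationNanos int64) int64 {
	return timestampNanos - timestampNanos%slotDurationNanos
}

// For Resolution10s, a datapoint taken at time t falls in the 10-second sample
// starting at normalizeToSlot(t, Resolution10s.SampleDuration()) and is stored
// under the key covering normalizeToSlot(t, Resolution10s.KeyDuration()).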

Source Keys

Another dimension of time series queries is the aggregation of multiple series; for example, you may want to query the same data point across multiple machines on a cluster.

While Cockroach will support this, in some cases queries are almost *always* an aggregate of multiple source series; for example, we often want to query storage data for a Cockroach accounting prefix, but that data is always collected from multiple stores; the information for a specific range is rarely relevant, as ranges can arbitrarily split and rebalance over time.

Unfortunately, data from multiple sources cannot be safely aggregated in the same way that multiple data points from the same series can be downsampled; each series must be stored separately. However, Cockroach *can* optimize for this situation by altering the keyspace; Cockroach can store data from multiple sources contiguously in the key space, ensuring that the multiple series can be queried together more efficiently.

This is done by creating a "source key", an optional identifier that is separate from the series name itself. Source keys are appended to the key as a suffix, after the series name and timestamp; this means that data that is from the same series and time period, but is from different sources, will be stored adjacently in the key space. Data from all sources in a series can thus be queried in a single scan.

Example

A hypothetical example from Cockroach: we want to record the size of all data stored in the cluster with a key prefix of "ExampleApplication". This will let us track data usage for a single application in a shared cluster.

The series name is: Cockroach.disksize.ExampleApplication

Data points for this series are automatically collected from any store that contains keys with a prefix of "ExampleApplication". When data points are written, they are recorded with a source key of: [store id]

There are three stores which contain data: 3, 9 and 10. These are arbitrary and may change over time, and we are not often interested in the behavior of a specific store in this context.

Data is recorded for January 1st, 2016 between 10:05 pm and 11:05 pm. The data is recorded at a 10 second resolution.

The data is recorded into keys structurally similar to the following:

tsd.cockroach.disksize.ExampleApplication.10s.403234.3
tsd.cockroach.disksize.ExampleApplication.10s.403234.9
tsd.cockroach.disksize.ExampleApplication.10s.403234.10
tsd.cockroach.disksize.ExampleApplication.10s.403235.3
tsd.cockroach.disksize.ExampleApplication.10s.403235.9
tsd.cockroach.disksize.ExampleApplication.10s.403235.10

Data for each source is stored in two keys: one for the 10 pm hour, and one for the 11 pm hour. Each key contains the tsd prefix, the series name, the resolution (10s), a timestamp representing the hour, and finally the source key. The keys will appear in the data store in the order shown above.

(Note that the keys will NOT be exactly as pictured above; they will be encoded in a way that is more efficient, but is not readily human readable.)

TODO(mrtracy): The ts package is a work in progress, and will initially only service queries for Cockroach's own internally generated time series data.

Package ts is a generated protocol buffer package.

It is generated from these files:
	cockroach/ts/timeseries.proto

It has these top-level messages:
	TimeSeriesDatapoint
	TimeSeriesData
	Query
	TimeSeriesQueryRequest
	TimeSeriesQueryResponse

Constants

const (
	// URLPrefix is the prefix for all time series endpoints hosted by the
	// server.
	URLPrefix = "/ts/"
	// URLQuery is the relative URL which should accept query requests.
	URLQuery = URLPrefix + "query"
)

Variables

var (
	ErrInvalidLengthTimeseries = fmt.Errorf("proto: negative length found during unmarshaling")
	ErrIntOverflowTimeseries   = fmt.Errorf("proto: integer overflow")
)

var TimeSeriesQueryAggregator_name = map[int32]string{
	1: "AVG",
	2: "SUM",
	3: "MAX",
	4: "MIN",
}

var TimeSeriesQueryAggregator_value = map[string]int32{
	"AVG": 1,
	"SUM": 2,
	"MAX": 3,
	"MIN": 4,
}

var TimeSeriesQueryDerivative_name = map[int32]string{
	0: "NONE",
	1: "DERIVATIVE",
	2: "NON_NEGATIVE_DERIVATIVE",
}

var TimeSeriesQueryDerivative_value = map[string]int32{
	"NONE":                    0,
	"DERIVATIVE":              1,
	"NON_NEGATIVE_DERIVATIVE": 2,
}

Functions

func MakeDataKey

func MakeDataKey(name string, source string, r Resolution, timestamp int64) roachpb.Key

MakeDataKey creates a time series data key for the given series name, source, Resolution and timestamp. The timestamp is expressed in nanoseconds since the epoch; it will be truncated to an exact multiple of the supplied Resolution's KeyDuration.
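
As a minimal sketch, a caller with a series name, a source identifier and a current timestamp can build the corresponding key directly; the series name and source below are hypothetical values used only for illustration.

// makeExampleKey builds the storage key for a hypothetical series and source
// at the 10-second resolution; the timestamp is truncated to the one-hour key
// slot that contains it.
func makeExampleKey(nowNanos int64) roachpb.Key {
	return MakeDataKey("cr.store.valcount", "1", Resolution10s, nowNanos)
}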

Types

type DB

type DB struct {
	// contains filtered or unexported fields
}

DB provides Cockroach's Time Series API.

func NewDB

func NewDB(db *client.DB) *DB

NewDB creates a new DB instance.

func (*DB) PollSource

func (db *DB) PollSource(source DataSource, frequency time.Duration, r Resolution, stopper *stop.Stopper)

PollSource begins a Goroutine which periodically queries the supplied DataSource for time series data, storing the returned data in the server. Stored data will be sampled using the provided Resolution. The polling process will continue until the provided stop.Stopper is stopped.
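
A minimal sketch of starting the polling loop, assuming the caller already holds a constructed DB, a DataSource implementation and a stop.Stopper:

// startPolling asks src for fresh data every 10 seconds and stores it at the
// 10-second resolution until stopper is stopped.
func startPolling(db *DB, src DataSource, stopper *stop.Stopper) {
	db.PollSource(src, 10*time.Second, Resolution10s, stopper)
}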

func (*DB) Query

func (db *DB) Query(query Query, r Resolution, startNanos, endNanos int64) ([]TimeSeriesDatapoint, []string, error)

Query returns datapoints for the named time series during the supplied time span. Data is returned as a series of consecutive data points.

Data is queried only at the Resolution supplied: if data for the named time series is not stored at the given resolution, an empty result will be returned.

All data stored on the server is downsampled to some degree; the data points returned represent the average value within a sample period. Each datapoint's timestamp falls in the middle of the sample period it represents.

If data for the named time series was collected from multiple sources, each returned datapoint will represent the sum of datapoints from all sources at the same time. The returned string slice contains a list of all sources for the metric which were aggregated to produce the result.
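
A minimal sketch of querying the trailing hour of a series at the 10-second resolution; leaving the aggregation fields of the Query message unset falls back to its documented defaults.

// queryLastHour returns the datapoints and contributing sources for the named
// series over the last hour of wall-clock time.
func queryLastHour(db *DB, name string) ([]TimeSeriesDatapoint, []string, error) {
	end := time.Now().UnixNano()
	start := end - time.Hour.Nanoseconds()
	return db.Query(Query{Name: name}, Resolution10s, start, end)
}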

func (*DB) StoreData

func (db *DB) StoreData(r Resolution, data []TimeSeriesData) error

StoreData writes the supplied time series data to the cockroach server. Stored data will be sampled at the supplied resolution.
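
A minimal sketch of writing a single datapoint by hand; the series name and source are hypothetical.

// storeSample records one measurement for a hypothetical series and source at
// the 10-second resolution.
func storeSample(db *DB, nowNanos int64, value float64) error {
	return db.StoreData(Resolution10s, []TimeSeriesData{{
		Name:   "cr.node.example-metric",
		Source: "1",
		Datapoints: []TimeSeriesDatapoint{
			{TimestampNanos: nowNanos, Value: value},
		},
	}})
}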

type DataSource

type DataSource interface {
	GetTimeSeriesData() []TimeSeriesData
}

A DataSource can be queried for a slice of time series data.
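
A toy implementation of the interface, suitable for passing to PollSource; a real DataSource would read live measurements rather than a fixed value.

// constantSource reports a single fixed value for a hypothetical series each
// time it is polled.
type constantSource struct {
	value float64
}

func (s constantSource) GetTimeSeriesData() []TimeSeriesData {
	return []TimeSeriesData{{
		Name:   "cr.node.example-metric",
		Source: "1",
		Datapoints: []TimeSeriesDatapoint{
			{TimestampNanos: time.Now().UnixNano(), Value: s.value},
		},
	}}
}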

type Query

type Query struct {
	// The name of the time series to query.
	Name string `protobuf:"bytes,1,opt,name=name" json:"name"`
	// A downsampling aggregation function to apply to datapoints within the
	// same sample period.
	Downsampler *TimeSeriesQueryAggregator `protobuf:"varint,2,opt,name=downsampler,enum=cockroach.ts.TimeSeriesQueryAggregator,def=1" json:"downsampler,omitempty"`
	// An aggregation function used to combine timelike datapoints from the
	// different sources being queried.
	SourceAggregator *TimeSeriesQueryAggregator `` /* 153-byte string literal not displayed */
	// If set to a value other than 'NONE', query will return a derivative
	// (rate of change) of the aggregated datapoints.
	Derivative *TimeSeriesQueryDerivative `protobuf:"varint,4,opt,name=derivative,enum=cockroach.ts.TimeSeriesQueryDerivative,def=0" json:"derivative,omitempty"`
	// An optional list of sources to restrict the time series query. If no
	// sources are provided, all available sources will be queried.
	Sources []string `protobuf:"bytes,5,rep,name=sources" json:"sources,omitempty"`
}

Each Query defines a specific metric to query over the time span of this request.
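
A sketch of a fully specified Query, assuming the usual protobuf Enum helpers for the optional enum fields; the series name and sources are hypothetical.

// exampleQuery downsamples with MAX, sums across two hypothetical sources,
// and reports a non-negative rate of change.
func exampleQuery() Query {
	return Query{
		Name:             "cr.store.valcount",
		Downsampler:      TimeSeriesQueryAggregator_MAX.Enum(),
		SourceAggregator: TimeSeriesQueryAggregator_SUM.Enum(),
		Derivative:       TimeSeriesQueryDerivative_NON_NEGATIVE_DERIVATIVE.Enum(),
		Sources:          []string{"1", "2"},
	}
}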

func (*Query) Descriptor

func (*Query) Descriptor() ([]byte, []int)

func (*Query) GetDerivative

func (m *Query) GetDerivative() TimeSeriesQueryDerivative

func (*Query) GetDownsampler

func (m *Query) GetDownsampler() TimeSeriesQueryAggregator

func (*Query) GetName

func (m *Query) GetName() string

func (*Query) GetSourceAggregator

func (m *Query) GetSourceAggregator() TimeSeriesQueryAggregator

func (*Query) GetSources

func (m *Query) GetSources() []string

func (*Query) Marshal

func (m *Query) Marshal() (data []byte, err error)

func (*Query) MarshalTo

func (m *Query) MarshalTo(data []byte) (int, error)

func (*Query) ProtoMessage

func (*Query) ProtoMessage()

func (*Query) Reset

func (m *Query) Reset()

func (*Query) Size

func (m *Query) Size() (n int)

func (*Query) String

func (m *Query) String() string

func (*Query) Unmarshal

func (m *Query) Unmarshal(data []byte) error

type Resolution

type Resolution int64

Resolution is used to enumerate the different resolution values supported by Cockroach.

const (
	// Resolution10s stores data with a sample resolution of 10 seconds.
	Resolution10s Resolution = 1
)

Resolution enumeration values are directly serialized and persisted into system keys; these values must never be altered or reordered.

func DecodeDataKey

func DecodeDataKey(key roachpb.Key) (string, string, Resolution, int64, error)

DecodeDataKey decodes a time series key into its components.
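
A minimal sketch of the round trip with MakeDataKey; decoding a freshly encoded key recovers the series name, source, Resolution, and the timestamp truncated to its key slot. The series name and source are hypothetical.

// roundTrip encodes a key for a hypothetical series and immediately decodes
// it back into its components.
func roundTrip(nowNanos int64) (string, string, Resolution, int64, error) {
	key := MakeDataKey("cr.store.valcount", "1", Resolution10s, nowNanos)
	return DecodeDataKey(key)
}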

func (Resolution) KeyDuration

func (r Resolution) KeyDuration() int64

KeyDuration returns the key duration corresponding to this resolution value, expressed in nanoseconds. The key duration determines how many samples are stored at a single Cockroach key.

func (Resolution) SampleDuration

func (r Resolution) SampleDuration() int64

SampleDuration returns the sample duration corresponding to this resolution value, expressed in nanoseconds.
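
Because the sample duration evenly divides the key duration, the two methods together determine how many sample slots fit under a single key; a small sketch:

// samplesPerKey returns the number of sample slots stored under a single key
// for the given resolution (360 for Resolution10s: one hour of 10-second
// samples).
func samplesPerKey(r Resolution) int64 {
	return r.KeyDuration() / r.SampleDuration()
}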

type Server

type Server struct {
	// contains filtered or unexported fields
}

Server handles incoming external requests related to time series data.

func NewServer

func NewServer(db *DB) *Server

NewServer instantiates a new Server which services requests with data from the supplied DB.

func (*Server) ServeHTTP

func (s *Server) ServeHTTP(w http.ResponseWriter, r *http.Request)

ServeHTTP implements the http.Handler interface.
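
A minimal sketch of exposing the time series endpoints on an existing ServeMux; the Server handles requests under URLPrefix ("/ts/"), including query requests at URLQuery.

// registerTimeSeries mounts the time series handler on mux at "/ts/".
func registerTimeSeries(mux *http.ServeMux, db *DB) {
	mux.Handle(URLPrefix, NewServer(db))
}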

type TimeSeriesData

type TimeSeriesData struct {
	// A string which uniquely identifies the variable from which this data was
	// measured.
	Name string `protobuf:"bytes,1,opt,name=name" json:"name"`
	// A string which identifies the unique source from which the variable was measured.
	Source string `protobuf:"bytes,2,opt,name=source" json:"source"`
	// Datapoints representing one or more measurements taken from the variable.
	Datapoints []TimeSeriesDatapoint `protobuf:"bytes,3,rep,name=datapoints" json:"datapoints"`
}

TimeSeriesData is a set of measurements of a single named variable at multiple points in time. This message contains a name and a source which, in combination, uniquely identify the time series being measured. Measurement data is represented as a repeated set of TimeSeriesDatapoint messages.

func (*TimeSeriesData) Descriptor

func (*TimeSeriesData) Descriptor() ([]byte, []int)

func (*TimeSeriesData) Marshal

func (m *TimeSeriesData) Marshal() (data []byte, err error)

func (*TimeSeriesData) MarshalTo

func (m *TimeSeriesData) MarshalTo(data []byte) (int, error)

func (*TimeSeriesData) ProtoMessage

func (*TimeSeriesData) ProtoMessage()

func (*TimeSeriesData) Reset

func (m *TimeSeriesData) Reset()

func (*TimeSeriesData) Size

func (m *TimeSeriesData) Size() (n int)

func (*TimeSeriesData) String

func (m *TimeSeriesData) String() string

func (TimeSeriesData) ToInternal

func (ts TimeSeriesData) ToInternal(
	keyDuration, sampleDuration int64,
) ([]roachpb.InternalTimeSeriesData, error)

ToInternal places the datapoints in a TimeSeriesData message into one or more InternalTimeSeriesData messages. The structure and number of messages returned depends on two variables: a key duration, and a sample duration.

The key duration is an interval length in nanoseconds used to determine how many intervals are grouped into a single InternalTimeSeriesData message.

The sample duration is also an interval length in nanoseconds; it must be less than or equal to the key duration, and must also evenly divide the key duration. Datapoints which fall into the same sample interval will be aggregated together into a single Sample.

Example: Assume the desired result is to aggregate individual datapoints into the same sample if they occurred within the same second; additionally, all samples which occur within the same hour should be stored at the same key location within the same InternalTimeSeriesValue. The sample duration should be 10^9 nanoseconds (value of time.Second), and the key duration should be (3600*10^9) nanoseconds (value of time.Hour).

Note that this method does not accumulate data into individual samples in the case where multiple datapoints fall into the same sample period. Internal data should be merged into the cockroach data store before being read; the engine is responsible for accumulating samples.

The returned slice of InternalTimeSeriesData objects will not be sorted.

For more information on how time series data is stored, see InternalTimeSeriesData and its related structures.
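
A minimal sketch of converting a message using the durations of an existing Resolution, which satisfy the divisibility requirement described above:

// toInternal10s splits data into InternalTimeSeriesData messages using
// one-hour keys that hold 10-second samples.
func toInternal10s(data TimeSeriesData) ([]roachpb.InternalTimeSeriesData, error) {
	return data.ToInternal(Resolution10s.KeyDuration(), Resolution10s.SampleDuration())
}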

func (*TimeSeriesData) Unmarshal

func (m *TimeSeriesData) Unmarshal(data []byte) error

type TimeSeriesDatapoint

type TimeSeriesDatapoint struct {
	// The timestamp when this datapoint is located, expressed in nanoseconds
	// since the unix epoch.
	TimestampNanos int64 `protobuf:"varint,1,opt,name=timestamp_nanos,json=timestampNanos" json:"timestamp_nanos"`
	// A floating point representation of the value of this datapoint.
	Value float64 `protobuf:"fixed64,2,opt,name=value" json:"value"`
}

TimeSeriesDatapoint is a single point of time series data; a value associated with a timestamp.

func (*TimeSeriesDatapoint) Descriptor

func (*TimeSeriesDatapoint) Descriptor() ([]byte, []int)

func (*TimeSeriesDatapoint) Marshal

func (m *TimeSeriesDatapoint) Marshal() (data []byte, err error)

func (*TimeSeriesDatapoint) MarshalTo

func (m *TimeSeriesDatapoint) MarshalTo(data []byte) (int, error)

func (*TimeSeriesDatapoint) ProtoMessage

func (*TimeSeriesDatapoint) ProtoMessage()

func (*TimeSeriesDatapoint) Reset

func (m *TimeSeriesDatapoint) Reset()

func (*TimeSeriesDatapoint) Size

func (m *TimeSeriesDatapoint) Size() (n int)

func (*TimeSeriesDatapoint) String

func (m *TimeSeriesDatapoint) String() string

func (*TimeSeriesDatapoint) Unmarshal

func (m *TimeSeriesDatapoint) Unmarshal(data []byte) error

type TimeSeriesQueryAggregator

type TimeSeriesQueryAggregator int32

TimeSeriesQueryAggregator describes a set of aggregation functions which can be used to combine multiple datapoints into a single datapoint.

Aggregators are used to "downsample" series by combining datapoints from the same series at different times. They are also used to "aggregate" values from different series, combining data points from different series at the same time.

const (
	// AVG returns the average value of datapoints.
	TimeSeriesQueryAggregator_AVG TimeSeriesQueryAggregator = 1
	// SUM returns the sum value of datapoints.
	TimeSeriesQueryAggregator_SUM TimeSeriesQueryAggregator = 2
	// MAX returns the maximum value of datapoints.
	TimeSeriesQueryAggregator_MAX TimeSeriesQueryAggregator = 3
	// MIN returns the minimum value of datapoints.
	TimeSeriesQueryAggregator_MIN TimeSeriesQueryAggregator = 4
)
const Default_Query_Downsampler TimeSeriesQueryAggregator = TimeSeriesQueryAggregator_AVG
const Default_Query_SourceAggregator TimeSeriesQueryAggregator = TimeSeriesQueryAggregator_SUM

func (TimeSeriesQueryAggregator) Enum

func (x TimeSeriesQueryAggregator) Enum() *TimeSeriesQueryAggregator

func (TimeSeriesQueryAggregator) EnumDescriptor

func (TimeSeriesQueryAggregator) EnumDescriptor() ([]byte, []int)

func (TimeSeriesQueryAggregator) String

func (x TimeSeriesQueryAggregator) String() string

func (*TimeSeriesQueryAggregator) UnmarshalJSON

func (x *TimeSeriesQueryAggregator) UnmarshalJSON(data []byte) error

type TimeSeriesQueryDerivative

type TimeSeriesQueryDerivative int32

TimeSeriesQueryDerivative describes a derivative function used to convert returned datapoints into a rate-of-change.

const (
	// NONE is the default value, and does not apply a derivative function.
	TimeSeriesQueryDerivative_NONE TimeSeriesQueryDerivative = 0
	// DERIVATIVE returns the first-order derivative of values in the time series.
	TimeSeriesQueryDerivative_DERIVATIVE TimeSeriesQueryDerivative = 1
	// NON_NEGATIVE_DERIVATIVE returns only non-negative values of the first-order
	// derivative; negative values are returned as zero. This should be used for
	// counters that monotonically increase, but might wrap or reset.
	TimeSeriesQueryDerivative_NON_NEGATIVE_DERIVATIVE TimeSeriesQueryDerivative = 2
)

func (TimeSeriesQueryDerivative) Enum

func (x TimeSeriesQueryDerivative) Enum() *TimeSeriesQueryDerivative

func (TimeSeriesQueryDerivative) EnumDescriptor

func (TimeSeriesQueryDerivative) EnumDescriptor() ([]byte, []int)

func (TimeSeriesQueryDerivative) String

func (x TimeSeriesQueryDerivative) String() string

func (*TimeSeriesQueryDerivative) UnmarshalJSON

func (x *TimeSeriesQueryDerivative) UnmarshalJSON(data []byte) error

type TimeSeriesQueryRequest

type TimeSeriesQueryRequest struct {
	// A timestamp in nanoseconds which defines the early bound of the time span
	// for this query.
	StartNanos int64 `protobuf:"varint,1,opt,name=start_nanos,json=startNanos" json:"start_nanos"`
	// A timestamp in nanoseconds which defines the late bound of the time span
	// for this query. Must be greater than start_nanos.
	EndNanos int64 `protobuf:"varint,2,opt,name=end_nanos,json=endNanos" json:"end_nanos"`
	// A set of Queries for this request. A request must have at least one
	// Query.
	Queries []Query `protobuf:"bytes,3,rep,name=queries" json:"queries"`
}

TimeSeriesQueryRequest is the standard incoming time series query request accepted from cockroach clients.
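
A minimal sketch of the request a client would send to URLQuery; the series name is hypothetical.

// exampleRequest asks for one series over the supplied time span, using the
// default downsampler and source aggregator.
func exampleRequest(startNanos, endNanos int64) TimeSeriesQueryRequest {
	return TimeSeriesQueryRequest{
		StartNanos: startNanos,
		EndNanos:   endNanos,
		Queries:    []Query{{Name: "cr.store.valcount"}},
	}
}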

func (*TimeSeriesQueryRequest) Descriptor

func (*TimeSeriesQueryRequest) Descriptor() ([]byte, []int)

func (*TimeSeriesQueryRequest) Marshal

func (m *TimeSeriesQueryRequest) Marshal() (data []byte, err error)

func (*TimeSeriesQueryRequest) MarshalTo

func (m *TimeSeriesQueryRequest) MarshalTo(data []byte) (int, error)

func (*TimeSeriesQueryRequest) ProtoMessage

func (*TimeSeriesQueryRequest) ProtoMessage()

func (*TimeSeriesQueryRequest) Reset

func (m *TimeSeriesQueryRequest) Reset()

func (*TimeSeriesQueryRequest) Size

func (m *TimeSeriesQueryRequest) Size() (n int)

func (*TimeSeriesQueryRequest) String

func (m *TimeSeriesQueryRequest) String() string

func (*TimeSeriesQueryRequest) Unmarshal

func (m *TimeSeriesQueryRequest) Unmarshal(data []byte) error

type TimeSeriesQueryResponse

type TimeSeriesQueryResponse struct {
	// A set of Results; there will be one result for each Query in the matching
	// TimeSeriesQueryRequest, in the same order. A Result will be present for
	// each Query even if there are zero datapoints to return.
	Results []TimeSeriesQueryResponse_Result `protobuf:"bytes,1,rep,name=results" json:"results"`
}

TimeSeriesQueryResponse is the standard response for time series queries returned to cockroach clients.

func (*TimeSeriesQueryResponse) Descriptor

func (*TimeSeriesQueryResponse) Descriptor() ([]byte, []int)

func (*TimeSeriesQueryResponse) Marshal

func (m *TimeSeriesQueryResponse) Marshal() (data []byte, err error)

func (*TimeSeriesQueryResponse) MarshalTo

func (m *TimeSeriesQueryResponse) MarshalTo(data []byte) (int, error)

func (*TimeSeriesQueryResponse) ProtoMessage

func (*TimeSeriesQueryResponse) ProtoMessage()

func (*TimeSeriesQueryResponse) Reset

func (m *TimeSeriesQueryResponse) Reset()

func (*TimeSeriesQueryResponse) Size

func (m *TimeSeriesQueryResponse) Size() (n int)

func (*TimeSeriesQueryResponse) String

func (m *TimeSeriesQueryResponse) String() string

func (*TimeSeriesQueryResponse) Unmarshal

func (m *TimeSeriesQueryResponse) Unmarshal(data []byte) error

type TimeSeriesQueryResponse_Result

type TimeSeriesQueryResponse_Result struct {
	Query      `protobuf:"bytes,1,opt,name=query,embedded=query" json:"query"`
	Datapoints []TimeSeriesDatapoint `protobuf:"bytes,2,rep,name=datapoints" json:"datapoints"`
}

Result is the data returned from a single metric query over a time span.

func (*TimeSeriesQueryResponse_Result) Descriptor

func (*TimeSeriesQueryResponse_Result) Descriptor() ([]byte, []int)

func (*TimeSeriesQueryResponse_Result) Marshal

func (m *TimeSeriesQueryResponse_Result) Marshal() (data []byte, err error)

func (*TimeSeriesQueryResponse_Result) MarshalTo

func (m *TimeSeriesQueryResponse_Result) MarshalTo(data []byte) (int, error)

func (*TimeSeriesQueryResponse_Result) ProtoMessage

func (*TimeSeriesQueryResponse_Result) ProtoMessage()

func (*TimeSeriesQueryResponse_Result) Reset

func (m *TimeSeriesQueryResponse_Result) Reset()

func (*TimeSeriesQueryResponse_Result) Size

func (m *TimeSeriesQueryResponse_Result) Size() (n int)

func (*TimeSeriesQueryResponse_Result) String

func (m *TimeSeriesQueryResponse_Result) String() string

func (*TimeSeriesQueryResponse_Result) Unmarshal

func (m *TimeSeriesQueryResponse_Result) Unmarshal(data []byte) error
