timeseries

package
v0.0.0-...-cbea63e
Warning

This package is not in the latest version of its module.

Published: Dec 4, 2021 License: Apache-2.0 Imports: 11 Imported by: 4

README

C* Time Series

Maintainer: dg@hailo-platform/H2O.com

This README is brief because we are recreating this functionality using CQL3 within Om, where it will be explained in more detail.

The basic idea is that we store entities, ordered by time, within C*. As a user, I should be able to say:

| Give me a list of "things" between date/time X and date/time Y

We store each entity in serialised form (JSON) as a column value in C*, and we use the column name to get time ordering. Each column name is hand-crafted from a time component followed by a unique ID, e.g. 1387072974-7375.

To avoid rows growing too big (a C* anti-pattern), we choose a row key that relates to a "bucket" of time. This is configurable, and is the single most important thing you need to choose if using this package. If you choose a day, then all the things that have a time within a single day will be stored in one row. If you had 1M things happening on a day, this would be a problem.

Integration tests

The integration tests are set up to run against boxen (hence the hard-coded port 19160). You need to create the following schema definitions first:

create keyspace testing;
use testing;
create column family TestNarrowRow;
create column family TestNarrowRowIndex;
create column family TestWideRow;
create column family TestWideRowIndex;
create column family TestPrimeRow;
create column family TestPrimeRowIndex;
create column family TestNoInterval;
create column family TestNoIntervalIndex;
create column family TestLargeInterval;
create column family TestLargeIntervalIndex;

Then run this:

go test -tags=integration -v .

Documentation

Overview

Offers up a recipe built on Gossie to make dealing with time indexes simple.

Types

type Item

type Item struct {
	// contains filtered or unexported fields
}

Item represents something fetched from timeseries

func (*Item) Unmarshal

func (i *Item) Unmarshal(into interface{}) error

Unmarshal will unmarshal the fetched item into a struct

type Iterator

type Iterator interface {
	Next() bool
	Item() *Item
	Err() error
	Rewind()
	Last() string
}

Iterator represents a slice of our time series that can be looped through, loading on-demand
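The method set suggests a loop in the style of bufio.Scanner: call Next until it returns false, read each Item, then check Err. The sketch below shows that pattern against local stand-ins, so that it compiles without the package or a running C* node. The Item and Iterator types here are copies made for illustration, sliceIterator is a hypothetical in-memory fake, Event is a hypothetical entity, and JSON decoding inside Unmarshal is an assumption drawn from the README.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Item is a local stand-in; the real Item has unexported fields.
type Item struct{ data []byte }

// Unmarshal decodes the stored JSON (an assumption about the real type).
func (i *Item) Unmarshal(into interface{}) error { return json.Unmarshal(i.data, into) }

// Iterator is a local copy of the interface documented above.
type Iterator interface {
	Next() bool
	Item() *Item
	Err() error
	Rewind()
	Last() string
}

// sliceIterator is an in-memory fake used only to exercise the loop.
type sliceIterator struct {
	items []*Item
	pos   int // number of items handed out so far
}

func (s *sliceIterator) Next() bool {
	if s.pos >= len(s.items) {
		return false
	}
	s.pos++
	return true
}

func (s *sliceIterator) Item() *Item {
	if s.pos == 0 {
		return nil
	}
	return s.items[s.pos-1]
}

func (s *sliceIterator) Err() error   { return nil }
func (s *sliceIterator) Rewind()      { s.pos = 0 }
func (s *sliceIterator) Last() string { return "" }

// Event is a hypothetical entity stored in the time series.
type Event struct {
	Name string `json:"name"`
}

// collect drains an iterator into a slice, checking Err once at the end.
func collect(it Iterator) ([]Event, error) {
	var out []Event
	for it.Next() {
		var e Event
		if err := it.Item().Unmarshal(&e); err != nil {
			return nil, err
		}
		out = append(out, e)
	}
	return out, it.Err()
}

func main() {
	it := &sliceIterator{items: []*Item{
		{data: []byte(`{"name":"first"}`)},
		{data: []byte(`{"name":"second"}`)},
	}}
	events, err := collect(it)
	fmt.Println(events, err)
}
```

Checking Err after the loop matters for an iterator that loads on demand, because a failed fetch would otherwise look the same as reaching the end of the range.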

type Marshaler

type Marshaler func(i interface{}) (uid string, t time.Time)

Marshaler turns each item into a unique ID and time (that we index under)

type SecondaryIndexer

type SecondaryIndexer func(i interface{}) (index string)

SecondaryIndexer turns each item into a secondary index ID

type TimeSeries

type TimeSeries struct {
	Ks, Cf string
	// RowGranularity defines how small or large our row sizes are (how much time each represents)
	RowGranularity time.Duration
	// Marshaler is used to turn an interface{} into a UID and time (for each item)
	Marshaler Marshaler
	// Secondary indexer is either nil (if none) or a function to extract the secondary
	// index ID from each item
	SecondaryIndexer SecondaryIndexer
	// IndexCf defines whether we should keep an overall index of the data (meaning we can skip rows on read and also do open-ended iterators)
	IndexCf string
	// Ttler is a function for calculating a TTL to apply to written columns
	Ttler Ttler
}

TimeSeries represents a single C* time series index on some data

func (*TimeSeries) Delete

func (ts *TimeSeries) Delete(writer gossie.Writer, item interface{}) error

Delete an item from the timeseries (remove the column entirely)

func (*TimeSeries) Iterator

func (ts *TimeSeries) Iterator(start, end time.Time, from, secondaryIndex string) Iterator

Iterator yields a new iterator that will loop through all time series Items within the specified range - loading from C* on-demand

func (*TimeSeries) Map

func (ts *TimeSeries) Map(writer gossie.Writer, item, lastRead interface{}) error

Map will write in any mutations needed to maintain this timeseries index based on the current item and the lastRead item

func (*TimeSeries) ReversedIterator

func (ts *TimeSeries) ReversedIterator(start, end time.Time, from, secondaryIndex string) Iterator

ReversedIterator yields a new iterator that will loop through all time series Items within the specified range - loading from C* on-demand - in reverse order

func (*TimeSeries) RowKeyAndColumnName

func (ts *TimeSeries) RowKeyAndColumnName(item interface{}) ([]byte, []byte)

func (*TimeSeries) UnboundedIterator

func (ts *TimeSeries) UnboundedIterator(secondaryIndex string) (Iterator, error)

UnboundedIterator yields a new iterator that will loop through every item within a time series

type Ttler

type Ttler func(i interface{}) int32

Ttler turns each item into a TTL, for ageing out data stored within a timeseries
