muse

package module
v0.6.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Feb 6, 2021 License: MIT Imports: 12 Imported by: 0

README

Build Status codecov Go Report Card GoDoc License

go-muse

Golang library for comparing one time series with a group of other labeled time series. This library supports arbitrary slicing and dicing of the labeled time series for a more iterative approach to finding visibly similar timeseries. Comparison between two timeseries is done by z-normalizing both series and cross correlating the two using Fast Fourier Transforms (FFT).

Motivation

A common problem in the operations world is finding all the graphs that look like a particular alert or incident. For example a Site Reliability Engineer (SRE) receives an alert which indicates that something is broken. The SRE generally will open up the graph that triggered the alert which is likely one graph of many in a dashboard. Next, the SRE begins scrolling through this dashboard looking for anything that resembles the waveform of the received alert. Once the SRE has filtered down the set of graphs that looks the original alert graph, he/she begins building a story as to why the alert fired and root causing the incident. This whole process can be time consuming depending on the size and complexity of the dashboards. This library aims to provide a first pass filtering of the existing graphs or time series, so that an engineer can focus just on what looks similar.

This library will filter results down to anything that is positively or negatively correlated with the input reference series. You can limit the number of results returned and also specify a score between 0 to 1 with 1 being perfectly correlated. You can also filter down to a number of samples before and after the input reference series to filter our strong matches that are outside your window of interest.

Contents

Installation

$ go get -u github.com/aouyang1/go-muse
$ cd $GOPATH/src/github.com/aouyang1/go-muse
$ make all

Quick start

package main

import (
	"fmt"

	"github.com/aouyang1/go-muse"
	"github.com/matrix-profile-foundation/go-matrixprofile/siggen"
)

func main() {
	sampleRate := 1.0 // once per minute
	duration := 480.0 // minutes

	// create a reference rectangular time series with an amplitude of 1.5 centered
	// at 240 minutes and a width of 10 minutes
	ref := NewSeries(
		siggen.Add(
			siggen.Rect(1.5, 240, 10, sampleRate, duration),
			siggen.Noise(0.1, int(sampleRate*duration)),
		), NewLabels(LabelMap{"graph": "CallTime99Pct", "host": "host1"}),
	)

	// create a comparison group of time series that the reference will query against
	comp := NewGroup("comparison")
	comp.Add(
		ref,
		NewSeries(
			siggen.Add(
				siggen.Rect(1.5, 242, 7, sampleRate, duration),
				siggen.Noise(0.1, int(sampleRate*duration)),
			), NewLabels(LabelMap{"graph": "CallTime99Pct", "host": "host2"}),
		),
		NewSeries(
			siggen.Add(
				siggen.Rect(43, 240, 10, sampleRate, duration),
				siggen.Noise(0.1, int(sampleRate*duration)),
			), NewLabels(LabelMap{"graph": "ErrorRate", "host": "host1"}),
		),
		NewSeries(
			siggen.Add(
				siggen.Line(0, 0.1, int(sampleRate*duration)),
				siggen.Noise(0.1, int(sampleRate*duration)),
			), NewLabels(LabelMap{"graph": "ErrorRate", "host": "host2"}),
		),
	)

	maxLag := 15.0   // minutes
	topN := 4        // top 4 grouped series
	threshold := 0.5 // correlation threshold
	m := NewBatch(ref, comp, NewResults(int(maxLag/sampleRate), topN, threshold))

	// Rank each individual time series in the comparison group
	m.Run(nil)
	fmt.Println(m.Results.Fetch())

	// Rank time series grouped by the graph label
	m.Run([]string{"graph"})
	fmt.Println(m.Results.Fetch())

	// Rank time series grouped by the host label
	m.Run([]string{"host"})
	fmt.Println(m.Results.Fetch())
}

Benchmarks

Benchmark name NumReps Time/Rep Memory/Rep Alloc/Rep
BenchmarkMuseRun-4 214075 5019 ns/op 1680 B/op 20 allocs/op
BenchmarkMuseRunLarge-4 9 128044546 ns/op 2124970 B/op 405 allocs/op
BenchmarkFilterByLabelValues-4 1736983 705 ns/op 496 B/op 8 allocs/op
BenchmarkIndexLabelValues-4 354844 3198 ns/op 2152 B/op 38 allocs/op
BenchmarkZPad-4 1035133 1180 ns/op 4096 B/op 1 allocs/op
BenchmarkZNormalize-4 789307 1287 ns/op 0 B/op 0 allocs/op
BenchmarkXCorr-4 148 8228987 ns/op 2115418 B/op 7 allocs/op
BenchmarkXCorrWithX-4 226 4910405 ns/op 0 B/op 0 allocs/op

Ran on a 2018 MacBookAir on Apr 19, 2020

    Processor: 1.6 GHz Intel Core i5
       Memory: 8GB 2133 MHz LPDDR3
           OS: macOS Mojave v10.14.2
 Logical CPUs: 4
Physical CPUs: 2
$ make bench

Contributing

  • Fork the repository
  • Create a new branch (feature_* or bug_*)for the new feature or bug fix
  • Run tests
  • Commit your changes
  • Push code and open a new pull request

Testing

Run all tests including benchmarks

$ make all

Just run benchmarks

$ make bench

Just run tests

$ make test

Contact

License

The MIT License (MIT). See LICENSE for more details.

Copyright (c) 2018 Austin Ouyang

Documentation

Overview

Example
sampleRate := 1.0 // once per minute
duration := 480.0 // minutes

// create a reference rectangular time series with an amplitude of 1.5 centered
// at 240 minutes and a width of 10 minutes
ref := NewSeries(
	siggen.Add(
		siggen.Rect(1.5, 240, 10, sampleRate, duration),
		siggen.Noise(0.1, int(sampleRate*duration)),
	), NewLabels(LabelMap{"graph": "CallTime99Pct", "host": "host1"}),
)

// create a comparison group of time series that the reference will query against
comp := NewGroup("comparison")
comp.Add(
	ref,
	NewSeries(
		siggen.Add(
			siggen.Rect(1.5, 242, 7, sampleRate, duration),
			siggen.Noise(0.1, int(sampleRate*duration)),
		), NewLabels(LabelMap{"graph": "CallTime99Pct", "host": "host2"}),
	),
	NewSeries(
		siggen.Add(
			siggen.Rect(43, 240, 10, sampleRate, duration),
			siggen.Noise(0.1, int(sampleRate*duration)),
		), NewLabels(LabelMap{"graph": "ErrorRate", "host": "host1"}),
	),
	NewSeries(
		siggen.Add(
			siggen.Line(0, 0.1, int(sampleRate*duration)),
			siggen.Noise(0.1, int(sampleRate*duration)),
		), NewLabels(LabelMap{"graph": "ErrorRate", "host": "host2"}),
	),
	NewSeries(
		siggen.Line(0, 0.1, int(sampleRate*duration)),
		NewLabels(LabelMap{"graph": "ErrorRate", "host": "host3"}),
	),
)

maxLag := 15.0   // minutes
topN := 4        // top 4 grouped series
threshold := 0.0 // correlation threshold
m, err := NewBatch(ref, comp, NewResults(int(maxLag/sampleRate), topN, threshold, SignFilter_ANY), 2)
if err != nil {
	panic(err)
}

// Rank each individual time series in the comparison group
m.Run(nil)
res, _ := m.Results.Fetch()
fmt.Println("Unique")
for _, s := range res {
	fmt.Printf("%s, Lag: %d, Score: %.3f\n", s.Labels.ID(s.Labels.Keys()), s.Lag, s.PercentScore)
}

// Rank time series grouped by the graph label
m.Run([]string{"graph"})
res, _ = m.Results.Fetch()
fmt.Println("By Graph")
for _, s := range res {
	fmt.Printf("%s, Lag: %d, Score: %.3f\n", s.Labels.ID(s.Labels.Keys()), s.Lag, s.PercentScore)
}

// Rank time series grouped by the host label
m.Run([]string{"host"})
res, _ = m.Results.Fetch()
fmt.Println("By Host")
for _, s := range res {
	fmt.Printf("%s, Lag: %d, Score: %.3f\n", s.Labels.ID(s.Labels.Keys()), s.Lag, s.PercentScore)
}
Output:

Unique
graph:CallTime99Pct,host:host1, Lag: 0, Score: 1.000
graph:ErrorRate,host:host1, Lag: 0, Score: 0.991
graph:CallTime99Pct,host:host2, Lag: -3, Score: 0.822
graph:ErrorRate,host:host3, Lag: 0, Score: 0.000
By Graph
graph:CallTime99Pct,host:host1, Lag: 0, Score: 1.000
graph:ErrorRate,host:host1, Lag: 0, Score: 0.991
By Host
graph:CallTime99Pct,host:host1, Lag: 0, Score: 1.000
graph:CallTime99Pct,host:host2, Lag: -3, Score: 0.822
graph:ErrorRate,host:host3, Lag: 0, Score: 0.000

Index

Examples

Constants

View Source
const (
	SignFilter_POS = 1
	SignFilter_NEG = -1
	SignFilter_ANY = 0
)
View Source
const (
	// DefaultLabel is the label name if a series is specified without any labels
	DefaultLabel = "uid"
)

Variables

This section is empty.

Functions

This section is empty.

Types

type Batch added in v0.4.0

type Batch struct {
	Comparison  *Group
	Results     *Results
	Concurrency int
	// contains filtered or unexported fields
}

Batch is used to setup and run a z-normalized cross correlation between a reference series against each individual comparison series while tracking the resulting scores

func NewBatch added in v0.4.0

func NewBatch(ref *Series, comp *Group, results *Results, cc int) (*Batch, error)

NewBatch creates a new Muse instance with a set reference timeseries, a comparison group of timeseries, and results

func (*Batch) Run added in v0.4.0

func (b *Batch) Run(groupByLabels []string) error

Run calculates the top N graphs with the highest scores given a reference time series and a group of comparison time series. Number of scores will be the number of unique labels specified in the input. If no groupByLabels is specified, then each timeseries will receive its own score.

type Group

type Group struct {
	Name string
	// contains filtered or unexported fields
}

Group is a collection of timeseries keeping track of all labeled timeseries, All timeseries must be unique regarding their label value pairs

func NewGroup

func NewGroup(name string) *Group

NewGroup creates a new Group and initializes the timeseries label registry

func (*Group) Add

func (g *Group) Add(series ...*Series) error

Add will register a time series with its labels into the current groups registry. If the timeseries with the exact same label values already exists, an error will be returned

func (*Group) FilterByLabelValues

func (g *Group) FilterByLabelValues(labels *Labels) []*Series

FilterByLabelValues returns the slice of timeseries filtered by specified label value pairs

func (*Group) Length

func (g *Group) Length() int

Length returns the length of all timeseries. All timeseries have the same length

type LabelMap

type LabelMap map[string]string

LabelMap is a map of label keys to values

type Labels

type Labels struct {
	// contains filtered or unexported fields
}

Labels is a map of label names to label values

func NewLabels

func NewLabels(labels LabelMap) *Labels

NewLabels creates a Label storing the map and sorted keys

func (*Labels) Get

func (l *Labels) Get(key string) (string, bool)

Get returns the value of a specified key. Returns an empty string if the key is not present

func (Labels) ID

func (l Labels) ID(labels []string) string

ID constructs the unique identifier based on an input set of labels. This does not have to be all the unique label names. Format will have the following "key1:val1,key2:val2" and so on

func (*Labels) Keys

func (l *Labels) Keys() []string

Keys returns the sorted keys of the labels

func (*Labels) Len

func (l *Labels) Len() int

Len returns the number of labels

type Muse

type Muse struct {
	Results *Results
	// contains filtered or unexported fields
}

Muse is the primary struct to setup and run a z-normalized cross correlation between a reference series against an individual comparison series while tracking the resulting scores

func New

func New(ref *Series, results *Results) (*Muse, error)

New creates a new Muse instance with a set reference timeseries, and a comparison timeseries, and results

func (*Muse) Run

func (m *Muse) Run(compGraphs []*Series) error

Run compares a single comparison series against the reference series and updates the score results

type Results

type Results struct {
	sync.Mutex
	MaxLag     int
	TopN       int
	Threshold  float64
	SignFilter SignFilter
	// contains filtered or unexported fields
}

Results tracks the top scores in sorted order given a specified maximum lag, top N and score threshold

func NewResults

func NewResults(maxLag int, topN int, threshold float64, signFilter SignFilter) *Results

NewResults creates a new instance of results to track the top similar graphs

func (*Results) Fetch

func (r *Results) Fetch() (Scores, float64)

Fetch returns the sorted scores in ascending order along with the average absolute percent score

func (*Results) Update added in v0.4.0

func (r *Results) Update(s Score)

Update records the input score

type Score

type Score struct {
	Labels       *Labels `json:"labels"`
	Lag          int     `json:"lag"`
	PercentScore float64 `json:"percentScore"`
}

Score keeps track of the cross correlation score and the related series

type Scores

type Scores []Score

Scores is a slice of individual Score

func (Scores) Len

func (s Scores) Len() int

func (Scores) Less

func (s Scores) Less(i, j int) bool

func (*Scores) Pop

func (s *Scores) Pop() interface{}

Pop implements the function in the heap interface

func (*Scores) Push

func (s *Scores) Push(x interface{})

Push implements the function in the heap interface

func (Scores) Swap

func (s Scores) Swap(i, j int)

type Series

type Series struct {
	// contains filtered or unexported fields
}

Series is the general representation of timeseries containing only values.

func NewSeries

func NewSeries(y []float64, labels *Labels) *Series

NewSeries creates a new Series with a set of labels. If not labels are specified a unique ID is automatically generated

func (*Series) Labels

func (s *Series) Labels() *Labels

Labels returns the map of label to values for the timeseries

func (*Series) Length

func (s *Series) Length() int

Length returns the length of the timeseries

func (*Series) UID

func (s *Series) UID() string

UID generates the unique identifier string that represents this particular timeseries. This must be unique within a timeseries Group

func (*Series) Values

func (s *Series) Values() []float64

Values returns the y or series values

type SignFilter added in v0.6.0

type SignFilter int

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL