quantiles

package module
v0.0.0-...-dc62c7f Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Feb 15, 2023 License: MIT Imports: 3 Imported by: 1

README

quantiles - Optimal Quantile Approximation in Streams

GoDoc CircleCI

This is a translation of TensorFlow's quantile helper class, it aims to compute approximate quantiles with error bound guarantees for weighted data sets. This implementation is an adaptation of techniques from the following papers:

The key ideas at play are the following:
  • Maintain an in-memory multi-level quantile summary in a way to guarantee a maximum approximation error of eps * W per bucket where W is the total weight across all points in the input dataset.
  • Two base operations are defined: MERGE and COMPRESS. MERGE combines two summaries guaranteeing a epsNew = max(eps1, eps2). COMPRESS compresses a summary to b + 1 elements guaranteeing epsNew = epsOld + 1/b.
  • b * sizeof(summary entry) must ideally be small enough to fit in an average CPU L2 cache.
  • To distribute this algorithm with maintaining error bounds, we need the worker-computed summaries to have no more than eps / h error where h is the height of the distributed computation graph which is 2 for an MR with no combiner.

We mainly want to max out IO bw by ensuring we're not compute-bound and using a reasonable amount of RAM.

Complexity:
  • Compute: O(n * log(1/eps * log(eps * n))).
  • Memory: O(1/eps * log^2(eps * n)) <- for one worker streaming through the entire dataset.

An epsilon value of zero would make the algorithm extremely inefficent and therefore, is disallowed.

Example Usage

package quantiles_test

import (
	"fmt"

	"github.com/axiomhq/quantiles"
)

func Example() {
	sketch := quantiles.NewDefault()
	for i := 0.0; i < 1e6; i++ {
		if err := sketch.Push(i, 1.0); err != nil {
			panic(err)
		}
	}
	fmt.Print("ApproximationError:") 	
	fmt.Println(sketch.ApproximationError(1))  // 0 <nil>

	fmt.Print("Finalize:") 
	fmt.Println(sketch.Finalize())            // <nil>

 
	fmt.Print("GenerateQuantiles(4):")         
	fmt.Println(sketch.GenerateQuantiles(4))  // [0 251865 503730 746595 999999] <nil>


	fmt.Print("GenerateQuantiles(10):")
	fmt.Println(sketch.GenerateQuantiles(10)) // [0 98946 197892 296838 395789 503730 602676 701622 800568 899514 999999] <nil>

	sum, err := sketch.FinalSummary()
	if err != nil {
		panic(err)
	}
	fmt.Print("GenerateQuantiles(4):")
	fmt.Println(sum.GenerateQuantiles(4))     // [0 251865 503730 746595 999999]
}

TODO

  • Implement an online estimator without the need of finalizing the stream
  • Add proper documentation
  • Benchmark
  • Add serialization

Documentation

Overview

Example
package main

import (
	"fmt"

	"github.com/axiomhq/quantiles"
)

func main() {
	sketch := quantiles.NewDefault()
	for i := 0.0; i < 1e6; i++ {
		if err := sketch.Push(i, 1.0); err != nil {
			panic(err)
		}
	}
	fmt.Print("ApproximationError:")
	fmt.Println(sketch.ApproximationError(1))

	fmt.Print("Finalize:")
	fmt.Println(sketch.Finalize())

	fmt.Print("GenerateQuantiles(4):")
	fmt.Println(sketch.GenerateQuantiles(4))

	fmt.Print("GenerateQuantiles(10):")
	fmt.Println(sketch.GenerateQuantiles(10))

	sum, err := sketch.FinalSummary()
	if err != nil {
		panic(err)
	}
	fmt.Print("GenerateQuantiles(4):")
	fmt.Println(sum.GenerateQuantiles(4))

}
Output:

ApproximationError:0.006218905472636816 <nil>
Finalize:<nil>
GenerateQuantiles(4):[0 249854 499710 749566 999999] <nil>
GenerateQuantiles(10):[0 98302 200702 299006 401406 499710 598014 700414 798718 900094 999999] <nil>
GenerateQuantiles(4):[0 249854 499710 749566 999999]

Index

Examples

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Sketch

type Sketch struct {
	// contains filtered or unexported fields
}

Sketch ...

func New

func New(eps float64, maxElements int64) (*Sketch, error)

New returns a new Sketch for a given eps and maxElements

func NewDefault

func NewDefault() *Sketch

NewDefault returns a new Sketch with the eps = 0.01 and maxElements 1000

func (*Sketch) ApproximationError

func (stream *Sketch) ApproximationError(level int64) (float64, error)

ApproximationError calculates approximation error for the specified level. If the passed level is negative, the approximation error for the entire summary is returned. Note that after Finalize is called, only the overall error is available.

func (*Sketch) FinalSummary

func (stream *Sketch) FinalSummary() (*Summary, error)

FinalSummary ...

func (*Sketch) Finalize

func (stream *Sketch) Finalize() error

Finalize flushes approximator and finalizes state.

func (*Sketch) GenerateBoundaries

func (stream *Sketch) GenerateBoundaries(numBoundaries int64) ([]float64, error)

GenerateBoundaries generates requested number of boundaries after finalizing stream. The returned boundaries can be queried using std::lower_bound to get the bucket for a given value. The boundaries, while still guaranteeing approximation bounds, don't necessarily represent the actual quantiles of the distribution. Boundaries are preferable over quantiles when the caller is less interested in the actual quantiles distribution and more interested in getting a representative sample of boundary values.

func (*Sketch) GenerateQuantiles

func (stream *Sketch) GenerateQuantiles(numQuantiles int64) ([]float64, error)

GenerateQuantiles generates requested number of quantiles after finalizing stream. The returned quantiles can be queried using std::lower_bound to get the bucket for a given value.

func (*Sketch) MaxDepth

func (stream *Sketch) MaxDepth() int

MaxDepth ...

func (*Sketch) Push

func (stream *Sketch) Push(value float64, weight float64) error

Push a value and a weight into the stream

func (*Sketch) PushSummary

func (stream *Sketch) PushSummary(summary []SumEntry) error

PushSummary pushes full summary while maintaining approximation error invariants.

func (*Sketch) Quantile

func (stream *Sketch) Quantile(q float64) (float64, error)

Quantile ...

type SumEntry

type SumEntry struct {
	// contains filtered or unexported fields
}

SumEntry represents a summary entry

func (SumEntry) MaxRank

func (se SumEntry) MaxRank() float64

MaxRank returns the entries maximum rank

func (SumEntry) MinRank

func (se SumEntry) MinRank() float64

MinRank returns the entries minimum rank

func (SumEntry) Value

func (se SumEntry) Value() float64

Value returns the entries value

func (SumEntry) Weight

func (se SumEntry) Weight() float64

Weight returns the entries weight

type Summary

type Summary struct {
	// contains filtered or unexported fields
}

Summary is a summarizes the stream entries

func (*Summary) ApproximationError

func (sum *Summary) ApproximationError() float64

ApproximationError ...

func (*Summary) Clear

func (sum *Summary) Clear()

Clear reset the summary

func (*Summary) Entries

func (sum *Summary) Entries() []SumEntry

Entries returns all summary entries

func (*Summary) GenerateBoundaries

func (sum *Summary) GenerateBoundaries(numBoundaries int64) []float64

GenerateBoundaries ...

func (*Summary) GenerateQuantiles

func (sum *Summary) GenerateQuantiles(numQuantiles int64) []float64

GenerateQuantiles returns a slice of float64 of size numQuantiles+1, the ith entry is the `i * 1/numQuantiles+1` quantile

func (*Summary) MaxValue

func (sum *Summary) MaxValue() float64

MaxValue returns the max weight value of the summary

func (*Summary) Merge

func (sum *Summary) Merge(other *Summary)

Merge another summary into the this summary (great for esimating quantiles over several streams)

func (*Summary) MinValue

func (sum *Summary) MinValue() float64

MinValue returns the min weight value of the summary

func (*Summary) Quantile

func (sum *Summary) Quantile(q float64) (float64, error)

Quantile returns the value for quantile q

func (*Summary) Size

func (sum *Summary) Size() int64

Size returns the size (num of entries) in the summary

func (*Summary) TotalWeight

func (sum *Summary) TotalWeight() float64

TotalWeight returns the total weight of the summary

Directories

Path Synopsis

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL