quantile

package
v0.0.0-...-03f9df7 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Mar 7, 2023 License: BSD-2-Clause Imports: 2 Imported by: 105

Documentation

Overview

Package quantile computes approximate quantiles over an unbounded data stream within low memory and CPU bounds.

A small amount of accuracy is traded to achieve the above properties.

Multiple streams can be merged before calling Query to generate a single set of results. This is meaningful when the streams represent the same type of data. See Merge and Samples.

For more detailed information about the algorithm used, see:

Effective Computation of Biased Quantiles over Data Streams

http://dimacs.rutgers.edu/~graham/pubs/papers/bquant-icde.pdf

Example (MergeMultipleStreams)
package main

import (
	"fmt"
	"github.com/bmizerany/perks/quantile"
)

func main() {
	// Scenario:
	// We have multiple database shards. On each shard, there is a process
	// collecting query response times from the database logs and inserting
	// them into a Stream (created via NewTargeted(0.90)), much like the
	// Simple example. These processes expose a network interface for us to
	// ask them to serialize and send us the results of their
	// Stream.Samples so we may Merge and Query them.
	//
	// NOTES:
	// * These sample sets are small, allowing us to get them
	// across the network much faster than sending the entire list of data
	// points.
	//
	// * For this to work correctly, we must supply the same quantiles
	// a priori the process collecting the samples supplied to NewTargeted,
	// even if we do not plan to query them all here.
	ch := make(chan quantile.Samples)
	getDBQuerySamples(ch)
	q := quantile.NewTargeted(0.90)
	for samples := range ch {
		q.Merge(samples)
	}
	fmt.Println("perc90:", q.Query(0.90))
}

// This is a stub for the above example. In reality this would hit the remote
// servers via http or something like it.
func getDBQuerySamples(ch chan quantile.Samples) {}
Output:

Example (Simple)
package main

import (
	"bufio"
	"fmt"
	"github.com/bmizerany/perks/quantile"
	"log"
	"os"
	"strconv"
)

func main() {
	ch := make(chan float64)
	go sendFloats(ch)

	// Compute the 50th, 90th, and 99th percentile.
	q := quantile.NewTargeted(0.50, 0.90, 0.99)
	for v := range ch {
		q.Insert(v)
	}

	fmt.Println("perc50:", q.Query(0.50))
	fmt.Println("perc90:", q.Query(0.90))
	fmt.Println("perc99:", q.Query(0.99))
	fmt.Println("count:", q.Count())
}

func sendFloats(ch chan<- float64) {
	f, err := os.Open("exampledata.txt")
	if err != nil {
		log.Fatal(err)
	}
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		b := sc.Bytes()
		v, err := strconv.ParseFloat(string(b), 64)
		if err != nil {
			log.Fatal(err)
		}
		ch <- v
	}
	if sc.Err() != nil {
		log.Fatal(sc.Err())
	}
	close(ch)
}
Output:

perc50: 5
perc90: 14
perc99: 40
count: 2388
Example (Window)
package main

import (
	"github.com/bmizerany/perks/quantile"
	"time"
)

func main() {
	// Scenario: We want the 90th, 95th, and 99th percentiles for each
	// minute.

	ch := make(chan float64)
	go sendStreamValues(ch)

	tick := time.NewTicker(1 * time.Minute)
	q := quantile.NewTargeted(0.90, 0.95, 0.99)
	for {
		select {
		case t := <-tick.C:
			flushToDB(t, q.Samples())
			q.Reset()
		case v := <-ch:
			q.Insert(v)
		}
	}
}

func sendStreamValues(ch chan float64) {

}

func flushToDB(t time.Time, samples quantile.Samples) {

}
Output:

Index

Examples

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Sample

type Sample struct {
	Value float64 `json:",string"`
	Width float64 `json:",string"`
	Delta float64 `json:",string"`
}

Sample holds an observed value and meta information for compression. JSON tags have been added for convenience.

type Samples

type Samples []Sample

Samples represents a slice of samples. It implements sort.Interface.

func (Samples) Len

func (a Samples) Len() int

func (Samples) Less

func (a Samples) Less(i, j int) bool

func (Samples) Swap

func (a Samples) Swap(i, j int)

type Stream

type Stream struct {
	// contains filtered or unexported fields
}

Stream computes quantiles for a stream of float64s. It is not thread-safe by design. Take care when using across multiple goroutines.

func NewBiased

func NewBiased() *Stream

NewBiased returns an initialized Stream for high-biased quantiles (e.g. 50th, 90th, 99th) not known a priori with finer error guarantees for the higher ranks of the data distribution. See http://dimacs.rutgers.edu/~graham/pubs/papers/bquant-icde.pdf for time, space, and error properties.

func NewTargeted

func NewTargeted(quantiles ...float64) *Stream

NewTargeted returns an initialized Stream concerned with a particular set of quantile values that are supplied a priori. Knowing these a priori reduces space and computation time. See http://dimacs.rutgers.edu/~graham/pubs/papers/bquant-icde.pdf for time, space, and error properties.

func (*Stream) Count

func (s *Stream) Count() int

Count returns the total number of samples observed in the stream since initialization.

func (*Stream) Insert

func (s *Stream) Insert(v float64)

Insert inserts v into the stream.

func (*Stream) Merge

func (s *Stream) Merge(samples Samples)

Merge merges samples into the underlying streams samples. This is handy when merging multiple streams from separate threads, database shards, etc.

func (*Stream) Query

func (s *Stream) Query(q float64) float64

Query returns the computed qth percentiles value. If s was created with NewTargeted, and q is not in the set of quantiles provided a priori, Query will return an unspecified result.

func (*Stream) Reset

func (s *Stream) Reset()

Reset reinitializes and clears the list reusing the samples buffer memory.

func (*Stream) Samples

func (s *Stream) Samples() Samples

Samples returns stream samples held by s.

func (Stream) SetEpsilon

func (s Stream) SetEpsilon(epsilon float64)

SetEpsilon sets the error epsilon for the Stream. The default epsilon is 0.01 and is usually satisfactory. If needed, this must be called before all Inserts. To learn more, see: http://dimacs.rutgers.edu/~graham/pubs/papers/bquant-icde.pdf

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL