Documentation ¶
Overview ¶
Package quantile computes approximate quantiles over an unbounded data stream within low memory and CPU bounds.
A small amount of accuracy is traded to achieve the above properties.
Multiple streams can be merged before calling Query to generate a single set of results. This is meaningful when the streams represent the same type of data. See Merge and Samples.
For more detailed information about the algorithm used, see:
Effective Computation of Biased Quantiles over Data Streams ¶
http://dimacs.rutgers.edu/~graham/pubs/papers/bquant-icde.pdf
Example (MergeMultipleStreams) ¶
package main import ( "fmt" "github.com/bmizerany/perks/quantile" ) func main() { // Scenario: // We have multiple database shards. On each shard, there is a process // collecting query response times from the database logs and inserting // them into a Stream (created via NewTargeted(0.90)), much like the // Simple example. These processes expose a network interface for us to // ask them to serialize and send us the results of their // Stream.Samples so we may Merge and Query them. // // NOTES: // * These sample sets are small, allowing us to get them // across the network much faster than sending the entire list of data // points. // // * For this to work correctly, we must supply the same quantiles // a priori the process collecting the samples supplied to NewTargeted, // even if we do not plan to query them all here. ch := make(chan quantile.Samples) getDBQuerySamples(ch) q := quantile.NewTargeted(0.90) for samples := range ch { q.Merge(samples) } fmt.Println("perc90:", q.Query(0.90)) } // This is a stub for the above example. In reality this would hit the remote // servers via http or something like it. func getDBQuerySamples(ch chan quantile.Samples) {}
Output:
Example (Simple) ¶
package main import ( "bufio" "fmt" "github.com/bmizerany/perks/quantile" "log" "os" "strconv" ) func main() { ch := make(chan float64) go sendFloats(ch) // Compute the 50th, 90th, and 99th percentile. q := quantile.NewTargeted(0.50, 0.90, 0.99) for v := range ch { q.Insert(v) } fmt.Println("perc50:", q.Query(0.50)) fmt.Println("perc90:", q.Query(0.90)) fmt.Println("perc99:", q.Query(0.99)) fmt.Println("count:", q.Count()) } func sendFloats(ch chan<- float64) { f, err := os.Open("exampledata.txt") if err != nil { log.Fatal(err) } sc := bufio.NewScanner(f) for sc.Scan() { b := sc.Bytes() v, err := strconv.ParseFloat(string(b), 64) if err != nil { log.Fatal(err) } ch <- v } if sc.Err() != nil { log.Fatal(sc.Err()) } close(ch) }
Output: perc50: 5 perc90: 14 perc99: 40 count: 2388
Example (Window) ¶
package main import ( "github.com/bmizerany/perks/quantile" "time" ) func main() { // Scenario: We want the 90th, 95th, and 99th percentiles for each // minute. ch := make(chan float64) go sendStreamValues(ch) tick := time.NewTicker(1 * time.Minute) q := quantile.NewTargeted(0.90, 0.95, 0.99) for { select { case t := <-tick.C: flushToDB(t, q.Samples()) q.Reset() case v := <-ch: q.Insert(v) } } } func sendStreamValues(ch chan float64) { } func flushToDB(t time.Time, samples quantile.Samples) { }
Output:
Index ¶
Examples ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type Sample ¶
type Sample struct { Value float64 `json:",string"` Width float64 `json:",string"` Delta float64 `json:",string"` }
Sample holds an observed value and meta information for compression. JSON tags have been added for convenience.
type Samples ¶
type Samples []Sample
Samples represents a slice of samples. It implements sort.Interface.
type Stream ¶
type Stream struct {
// contains filtered or unexported fields
}
Stream computes quantiles for a stream of float64s. It is not thread-safe by design. Take care when using across multiple goroutines.
func NewBiased ¶
func NewBiased() *Stream
NewBiased returns an initialized Stream for high-biased quantiles (e.g. 50th, 90th, 99th) not known a priori with finer error guarantees for the higher ranks of the data distribution. See http://dimacs.rutgers.edu/~graham/pubs/papers/bquant-icde.pdf for time, space, and error properties.
func NewTargeted ¶
NewTargeted returns an initialized Stream concerned with a particular set of quantile values that are supplied a priori. Knowing these a priori reduces space and computation time. See http://dimacs.rutgers.edu/~graham/pubs/papers/bquant-icde.pdf for time, space, and error properties.
func (*Stream) Count ¶
Count returns the total number of samples observed in the stream since initialization.
func (*Stream) Merge ¶
Merge merges samples into the underlying streams samples. This is handy when merging multiple streams from separate threads, database shards, etc.
func (*Stream) Query ¶
Query returns the computed qth percentiles value. If s was created with NewTargeted, and q is not in the set of quantiles provided a priori, Query will return an unspecified result.
func (*Stream) Reset ¶
func (s *Stream) Reset()
Reset reinitializes and clears the list reusing the samples buffer memory.
func (Stream) SetEpsilon ¶
func (s Stream) SetEpsilon(epsilon float64)
SetEpsilon sets the error epsilon for the Stream. The default epsilon is 0.01 and is usually satisfactory. If needed, this must be called before all Inserts. To learn more, see: http://dimacs.rutgers.edu/~graham/pubs/papers/bquant-icde.pdf