streamstats

package module

v0.0.0-...-15a9614 Latest Latest Go to latest Published: Feb 10, 2017 License: MIT Imports: 4 Imported by: 0

Details

Valid go.mod file

The Go module system was introduced in Go 1.11 and is the official dependency management solution for Go.
Redistributable license

Redistributable licenses place minimal restrictions on how software can be used, modified, and redistributed.
Tagged version

Modules with tagged versions give importers more predictable builds.
Stable version

When a project reaches major version v1 it is considered stable.
Learn more about best practices

Repository

github.com/bmkessler/streamstats

Links

Open Source Insights

README ¶

streamstats

Streaming stats data structures and algorithms in golang that are O(1) time and space in the number of elements processed.

Moment-based Statistics

Single variable moments up to fourth order and first-order covariance use the methods of:

"Formulas for Robust, One-Pass Parallel Computation of Covariances and Arbitrary-Order Statistical Moments." Philippe P. Pébay, Technical Report SAND2008-6212, Sandia National Laboratories, September 2008.

which extend the results of:

"Note on a method for calculating corrected sums of squares and products". B. P. Welford (1962). Technometrics 4(3):419–420 (popularized by Donald Knuth in "The Art of Computer Programming")

to arbitrary moments and combinations of arbitrary sized populations allowing parallel aggregation. These moments are also extended to two dependent variables with a covariance Sxy

This also an includes exponentially-weighted moving average with damping factor, 0 < lambda < 1, using update formula m = (1-lambda)*m + lambda*x

Order Statistics

Quantiles and Histograms are based on the P2-algorithm:

"The P2 algorithm for dynamic calculation of quantiles and histograms without storing observations." Raj Jain and Imrich Chlamtac, Communications of the ACM Volume 28 Issue 10, October 1985 Pages 1076-1085

Count Distinct

Count distinct is provided by an implementation of the HyperLogLog data structure based on:

"Hyperloglog: The analysis of a near-optimal cardinality estimation algorithm" Philippe Flajolet and Éric Fusy and Olivier Gandouet and et al. in AOFA ’07: PROCEEDINGS OF THE 2007 INTERNATIONAL CONFERENCE ON ANALYSIS OF ALGORITHMS

This implementation includes some of the HyperLogLog++ enhancements such as the 64-bit hash function which eliminates the large cardinality correction for hash collisions and an empirical bias correction for small cardinalities The implementation is space in-efficient since bits are used to store the counts which could be at most 60 < 2^6

An additional LinearCounting implementation that is backed by a BitVector is available as well. If the maximum possible cardinality is known, this structure uses only 12.5% of the memory as the HyperLogLog and runs much faster for both Add and Distinct. However, the data structure saturates at the maximum value while HyperLogLog can count to virtually unlimited cardinalities.

Set Membership

Approximate set membership is provided by a BloomFilter implementation based on:

"Space/time trade-offs in hash coding with allowable errors" Burton H. Bloom Communications of the ACM Volume 13 Issue 7, July 1970 Pages 422-426

the size m of the filter is rounded up to the nearest power of two for speed of addition and membership check, which could result in a larger filter depending on the cardinality and false positive target you supply. the k different hash functions are derived from top (h1) and bottom (h2) 32-bits of a 64-bit hash function using h[i] = h1 + i* h2 mod m for i in 0...m-1 based on

"Less hashing, same performance: Building a better Bloom filter" Adam Kirsch, Michael Mitzenmacher Random Structures & Algorithms Volume 33 Issue 2, September 2008 Pages 187-218

Benchmarks

Intel(R) Core(TM) i3-4010U CPU @ 1.70GHz
go version go1.7.3 linux/amd64
BenchmarkBloomFilterAdd-4              	20000000	        75.5 ns/op
BenchmarkBloomFilterCheck-4            	20000000	        70.0 ns/op
BenchmarkEWMAAdd-4                     	200000000	         8.28 ns/op
BenchmarkHyperLogLogP10Add-4           	30000000	        56.3 ns/op
BenchmarkHyperLogLogP10Distinct-4      	 1000000	      2178 ns/op
BenchmarkLinearCountingP10Add-4        	50000000	        36.1 ns/op
BenchmarkLinearCountingP10Distinct-4   	10000000	       175 ns/op
BenchmarkMomentStatsAdd-4              	100000000	        19.7 ns/op
BenchmarkP2Histogram8Add-4             	10000000	       165 ns/op
BenchmarkP2Histogram16Add-4            	 5000000	       349 ns/op
BenchmarkP2Histogram32Add-4            	 2000000	       702 ns/op
BenchmarkP2Histogram64Add-4            	 1000000	      1365 ns/op
BenchmarkP2Histogram128Add-4           	  500000	      2673 ns/op
BenchmarkP2QuantileAdd-4               	20000000	        66.4 ns/op

Documentation ¶

Overview ¶

Package streamstats includes data structures and algorithms that are O(1) time and space in the number of elements processed.

Moment-Based Statistics ¶

Single variable moments up to fourth order and first-order covariance use the methods of:

"Formulas for Robust, One-Pass Parallel Computation of Covariances and Arbitrary-Order Statistical Moments." Philippe P. Pébay, Technical Report SAND2008-6212, Sandia National Laboratories, September 2008.

which extend the results of:

"Note on a method for calculating corrected sums of squares and products".

B. P. Welford (1962).
Technometrics 4(3):419–420

(popularized by Donald Knuth in "The Art of Computer Programming")

to arbitrary moments and combinations of arbitrary sized populations allowing parallel aggregation. These moments are also extended to two dependent variables with a covariance Sxy

This also an includes exponentially-weighted moving average with damping factor, 0 < *lambda* < 1, using update formula m = (1-lambda)*m + lambda*x

Order Statistics ¶

Quantiles and Histograms are based on the P2-algorithm:

"The P2 algorithm for dynamic calculation of quantiles and histograms without storing observations." Raj Jain and Imrich Chlamtac, Communications of the ACM Volume 28 Issue 10, October 1985 Pages 1076-1085

Count Distinct ¶

Count distinct is provided by an implementation of the HyperLogLog data structure based on:

"Hyperloglog: The analysis of a near-optimal cardinality estimation algorithm" Philippe Flajolet and Éric Fusy and Olivier Gandouet and et al. in AOFA ’07: PROCEEDINGS OF THE 2007 INTERNATIONAL CONFERENCE ON ANALYSIS OF ALGORITHMS

This implementation includes some of the HyperLogLog++ enhancements such as the 64-bit hash function which eliminates the large cardinality correction for hash collisions and an empirical bias correction for small cardinalities The implementation is space in-efficient since bits are used to store the counts which could be at most 60 < 2^6

An additional LinearCounting implementation that is backed by a BitVector is available as well. If the maximum possible cardinality is known, this structure uses only 12.5% of the memory as the HyperLogLog and runs much faster for both `Add` and `Distinct`. However, the data structure saturates at the maximum value while HyperLogLog can count to virtually unlimited cardinalities.

Set Membership ¶

Approximate set membership is provided by a BloomFilter implementation based on:

"Space/time trade-offs in hash coding with allowable errors" Burton H. Bloom Communications of the ACM Volume 13 Issue 7, July 1970 Pages 422-426

the size m of the filter is rounded up to the nearest power of two for speed of addition and membership check, which could result in a larger filter depending on the cardinality and false positive target you supply. the k different hash functions are derived from top (`h1`) and bottom (`h2`) 32-bits of a 64-bit hash function using h[i] = h1 + i* h2 mod m for i in 0...m-1 based on

"Less hashing, same performance: Building a better Bloom filter" Adam Kirsch, Michael Mitzenmacher Random Structures & Algorithms Volume 33 Issue 2, September 2008 Pages 187-218

Index ¶

type BitVector
- func NewBitVector(L uint64) BitVector
- func (b BitVector) Clear(N uint64)
- func (b BitVector) Get(N uint64) uint64
- func (b BitVector) PopCount() uint64
- func (b BitVector) Set(N uint64)
- func (b BitVector) String() string
type BloomFilter
- func NewBloomFilter(Nitems uint64, FalsePositiveRate float64, hash hash.Hash64) *BloomFilter
- func (bf *BloomFilter) Add(item []byte)
- func (bf BloomFilter) Check(item []byte) bool
- func (bf BloomFilter) Distinct() uint64
- func (bf BloomFilter) FalsePositiveRate() float64
- func (bf BloomFilter) Intersect(bfB *BloomFilter) (*BloomFilter, error)
- func (bf BloomFilter) Occupancy() float64
- func (bf BloomFilter) Union(bfB *BloomFilter) (*BloomFilter, error)
type BoxPlot
- func NewBoxPlot() BoxPlot
- func (bp BoxPlot) InterQuartileRange() float64
- func (bp BoxPlot) IsOutlier(x float64) bool
- func (bp BoxPlot) LowerQuartile() float64
- func (bp BoxPlot) LowerWhisker() float64
- func (bp BoxPlot) Median() float64
- func (bp BoxPlot) MidHinge() float64
- func (bp BoxPlot) MidRange() float64
- func (bp BoxPlot) String() string
- func (bp BoxPlot) TriMean() float64
- func (bp BoxPlot) UpperQuartile() float64
- func (bp BoxPlot) UpperWhisker() float64
type CovarStats
- func NewCovarStats() *CovarStats
- func (c *CovarStats) Add(x, y float64)
- func (c *CovarStats) Combine(b *CovarStats) CovarStats
- func (c *CovarStats) Correlation() float64
- func (c *CovarStats) Intercept() float64
- func (c *CovarStats) N() uint64
- func (c *CovarStats) Slope() float64
- func (c *CovarStats) XKurtosis() float64
- func (c *CovarStats) XMean() float64
- func (c *CovarStats) XSkewness() float64
- func (c *CovarStats) XStdDev() float64
- func (c *CovarStats) XVariance() float64
- func (c *CovarStats) YKurtosis() float64
- func (c *CovarStats) YMean() float64
- func (c *CovarStats) YSkewness() float64
- func (c *CovarStats) YStdDev() float64
- func (c *CovarStats) YVariance() float64
type CumulativeDensity
type EWMA
- func NewEWMA(initialValue float64, lambda float64) EWMA
- func (e *EWMA) Add(x float64)
- func (e *EWMA) Mean() float64
type HyperLogLog
- func NewHyperLogLog(p byte, hash hash.Hash64) *HyperLogLog
- func (hll *HyperLogLog) Add(item []byte)
- func (hll *HyperLogLog) BiasCorrected() uint64
- func (hll *HyperLogLog) Compress(factor byte) *HyperLogLog
- func (hll *HyperLogLog) Distinct() uint64
- func (hll *HyperLogLog) ExpectedError() float64
- func (hll *HyperLogLog) Intersect(hllB *HyperLogLog) (*HyperLogLog, error)
- func (hll *HyperLogLog) LinearCounting() uint64
- func (hll *HyperLogLog) RawEstimate() uint64
- func (hll *HyperLogLog) Reset()
- func (hll *HyperLogLog) String() string
- func (hll *HyperLogLog) Union(hllB *HyperLogLog) (*HyperLogLog, error)
type LinearCounting
- func NewLinearCounting(p byte, hash hash.Hash64) *LinearCounting
- func (lc *LinearCounting) Add(item []byte)
- func (lc *LinearCounting) Compress(factor byte) *LinearCounting
- func (lc LinearCounting) Distinct() uint64
- func (lc LinearCounting) ExpectedError() float64
- func (lc *LinearCounting) Intersect(lcB *LinearCounting) (*LinearCounting, error)
- func (lc LinearCounting) Occupancy() float64
- func (lc LinearCounting) String() string
- func (lc *LinearCounting) Union(lcB *LinearCounting) (*LinearCounting, error)
type MomentStats
- func NewMomentStats() *MomentStats
- func (m *MomentStats) Add(x float64)
- func (m *MomentStats) Combine(b *MomentStats) MomentStats
- func (m *MomentStats) Kurtosis() float64
- func (m *MomentStats) Mean() float64
- func (m *MomentStats) N() uint64
- func (m *MomentStats) Skewness() float64
- func (m *MomentStats) StdDev() float64
- func (m *MomentStats) String() string
- func (m *MomentStats) Variance() float64
type P2Histogram
- func NewP2Histogram(b uint64) P2Histogram
- func (h *P2Histogram) Add(x float64)
- func (h *P2Histogram) CDF(x float64) float64
- func (h *P2Histogram) Histogram() []CumulativeDensity
- func (h *P2Histogram) Max() float64
- func (h *P2Histogram) Min() float64
- func (h *P2Histogram) N() uint64
- func (h *P2Histogram) Quantile(p float64) float64
type P2Quantile
- func NewP2Quantile(p float64) P2Quantile
- func (p *P2Quantile) Add(x float64)
- func (p *P2Quantile) LowerQuantile() float64
- func (p *P2Quantile) Max() float64
- func (p *P2Quantile) Min() float64
- func (p *P2Quantile) N() uint64
- func (p *P2Quantile) P() float64
- func (p *P2Quantile) Quantile() float64
- func (p *P2Quantile) UpperQuantile() float64

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

This section is empty.

Types ¶

type BitVector ¶

type BitVector []uint64

BitVector represents an arbitrary length vector of bits backed by 64-bit words it is used as the data structure backing the Bloom Filter and Linear Counting implementations

func NewBitVector ¶

func NewBitVector(L uint64) BitVector

NewBitVector returns a new BitVector of length L

func (BitVector) Clear ¶

func (b BitVector) Clear(N uint64)

Clear clears the bit at position N for N >= L this will access memory out of bounds of the backing array and panic

func (BitVector) Get ¶

func (b BitVector) Get(N uint64) uint64

Get returns the bit at position N as a uint64 for N >= L this will access memory out of bounds of the backing array and panic

func (BitVector) PopCount ¶

func (b BitVector) PopCount() uint64

PopCount returns the nubmer of set bits in the bit vector the algorithm for PopCount on a single 64-bit word is from 1957 due to Donald B. Gillies and Jeffrey C. P. Miller and referenced by Donald Knuth

func (BitVector) Set ¶

func (b BitVector) Set(N uint64)

Set sets the bit at position N for N >= L this will access memory out of bounds of the backing array and panic

func (BitVector) String ¶

func (b BitVector) String() string

String outputs a string representation of the binary string with the first bit at the left note that any padding zeros are present on the right hand side

type BloomFilter ¶

type BloomFilter struct {
	// contains filtered or unexported fields
}

BloomFilter is a datastructure for approximate set membership with no false negatives and limited false positives

func NewBloomFilter ¶

func NewBloomFilter(Nitems uint64, FalsePositiveRate float64, hash hash.Hash64) *BloomFilter

NewBloomFilter returns a pointer to a new BloomFilter that has been sized in m with k hash functions to target the given false positive rate at the given number of items using the given hash function

func (*BloomFilter) Add ¶

func (bf *BloomFilter) Add(item []byte)

Add puts an item in the set represented by the BloomFilter

func (BloomFilter) Check ¶

func (bf BloomFilter) Check(item []byte) bool

Check returns false if an item in is definitely not in the set represented by the BloomFilter

func (BloomFilter) Distinct ¶

func (bf BloomFilter) Distinct() uint64

Distinct estimates the number of elements in the filter by using the LinearCounting estimate accounting for the k hash functions calculated for each element

func (BloomFilter) FalsePositiveRate ¶

func (bf BloomFilter) FalsePositiveRate() float64

FalsePositiveRate returns the expected false positive rate based on the ratio of filled buckets in the BloomFilter

func (BloomFilter) Intersect ¶

func (bf BloomFilter) Intersect(bfB *BloomFilter) (*BloomFilter, error)

Intersect combines two BloomFilters producing one that contains only of the elements in both BloomFilters the BloomFilters must be the same size m and k as well as use the same hash function

func (BloomFilter) Occupancy ¶

func (bf BloomFilter) Occupancy() float64

Occupancy returns the ratio of filled buckets in the BloomFilter

func (BloomFilter) Union ¶

func (bf BloomFilter) Union(bfB *BloomFilter) (*BloomFilter, error)

Union combines two BloomFilters producing one that contains all of the elements in either BloomFilter the BloomFilters must be the same size m and k as well as use the same hash function

type BoxPlot ¶

type BoxPlot struct {
	P2Quantile
}

BoxPlot represents a BoxPlot with interquartile range and whiskers backed by a P2-Quantile tracking the median, P=0.5

func NewBoxPlot ¶

func NewBoxPlot() BoxPlot

NewBoxPlot returns a new BoxPlot

func (BoxPlot) InterQuartileRange ¶

func (bp BoxPlot) InterQuartileRange() float64

InterQuartileRange returns the estimated interquartile range

func (BoxPlot) IsOutlier ¶

func (bp BoxPlot) IsOutlier(x float64) bool

IsOutlier returns true if the data is outside the whiskers

func (BoxPlot) LowerQuartile ¶

func (bp BoxPlot) LowerQuartile() float64

LowerQuartile returns the estimated lower quartile

func (BoxPlot) LowerWhisker ¶

func (bp BoxPlot) LowerWhisker() float64

LowerWhisker returns the estimated lower whisker, Q1 - 1.5 * IQR

func (BoxPlot) Median ¶

func (bp BoxPlot) Median() float64

Median returns the estimated median

func (BoxPlot) MidHinge ¶

func (bp BoxPlot) MidHinge() float64

MidHinge returns the MidHinge of the data, average of upper and lower quantiles

func (BoxPlot) MidRange ¶

func (bp BoxPlot) MidRange() float64

MidRange returns the MidRange of the data, average of max and min

func (BoxPlot) String ¶

func (bp BoxPlot) String() string

func (BoxPlot) TriMean ¶

func (bp BoxPlot) TriMean() float64

TriMean returns the TriMean of the data, average of Median and MidHinge

func (BoxPlot) UpperQuartile ¶

func (bp BoxPlot) UpperQuartile() float64

UpperQuartile returns the estimated upper quartile

func (BoxPlot) UpperWhisker ¶

func (bp BoxPlot) UpperWhisker() float64

UpperWhisker returns the estimated upper whisker, Q3 + 1.5 * IQR

type CovarStats ¶

type CovarStats struct {
	// contains filtered or unexported fields
}

CovarStats is a data structure for computing stats on two related variables x,y from a stream

func NewCovarStats ¶

func NewCovarStats() *CovarStats

NewCovarStats returns an empty CovarStats structure with no values

func (*CovarStats) Add ¶

func (c *CovarStats) Add(x, y float64)

Add adds a sample of the two variables to the CovarStats data structure

func (*CovarStats) Combine ¶

func (c *CovarStats) Combine(b *CovarStats) CovarStats

Combine returns the combination of two CovarStats datastructures

func (*CovarStats) Correlation ¶

func (c *CovarStats) Correlation() float64

Correlation returns the Pearson product-moment correlation coefficient of the x and y samples seen so far

func (*CovarStats) Intercept ¶

func (c *CovarStats) Intercept() float64

Intercept returns the intercept of the correlation between x and y samples seen so far

func (*CovarStats) N ¶

func (c *CovarStats) N() uint64

N returns the number of samples seen so far

func (*CovarStats) Slope ¶

func (c *CovarStats) Slope() float64

Slope returns the slope of the correlation between x and y samples seen so far

func (*CovarStats) XKurtosis ¶

func (c *CovarStats) XKurtosis() float64

XKurtosis returns the kurtorsis of the x values seen so far

func (*CovarStats) XMean ¶

func (c *CovarStats) XMean() float64

XMean returns the mean of the x values seen so far

func (*CovarStats) XSkewness ¶

func (c *CovarStats) XSkewness() float64

XSkewness returns the skewness of the x values seen so far

func (*CovarStats) XStdDev ¶

func (c *CovarStats) XStdDev() float64

XStdDev returns the standard deviation of the x values seen so far

func (*CovarStats) XVariance ¶

func (c *CovarStats) XVariance() float64

XVariance returns the variance of the x values seen so far

func (*CovarStats) YKurtosis ¶

func (c *CovarStats) YKurtosis() float64

YKurtosis returns the kurtosis of the y values seen so far

func (*CovarStats) YMean ¶

func (c *CovarStats) YMean() float64

YMean returns the mean of the y values seen so far

func (*CovarStats) YSkewness ¶

func (c *CovarStats) YSkewness() float64

YSkewness returns the skewness of the y values seen so far

func (*CovarStats) YStdDev ¶

func (c *CovarStats) YStdDev() float64

YStdDev returns the standard deviation of the y values seen so far

func (*CovarStats) YVariance ¶

func (c *CovarStats) YVariance() float64

YVariance returns the variance of the y values seen so far

type CumulativeDensity ¶

type CumulativeDensity struct {
	X float64
	P float64
}

CumulativeDensity represents the probability P of observing a value less than or equal to X

type EWMA ¶

type EWMA struct {
	// contains filtered or unexported fields
}

EWMA data structure for exponentially weighted moving average

func NewEWMA ¶

func NewEWMA(initialValue float64, lambda float64) EWMA

NewEWMA initializes an EWMA with weighting lambda and given initial value

func (*EWMA) Add ¶

func (e *EWMA) Add(x float64)

Add updates the average value with the stored weight

func (*EWMA) Mean ¶

func (e *EWMA) Mean() float64

Mean returns the exponentially weighted average value

type HyperLogLog ¶

type HyperLogLog struct {
	// contains filtered or unexported fields
}

HyperLogLog a data structure for computing count distinct on arbitrary sized data

func NewHyperLogLog ¶

func NewHyperLogLog(p byte, hash hash.Hash64) *HyperLogLog

NewHyperLogLog returns a new HyperLogLog data structure with 2^p buckets based on Hyperloglog: The analysis of a near-optimal cardinality estimation algorithm Philippe Flajolet and Éric Fusy and Olivier Gandouet and et al. IN AOFA ’07: PROCEEDINGS OF THE 2007 INTERNATIONAL CONFERENCE ON ANALYSIS OF ALGORITHMS This implementation does not include any of the HyperLogLog++ enhancments except for the 64-bit hash function which eliminates the large cardinality correction for hash collisions this is also space in-efficient since bytes are used to store the counts which could be at most 60 < 2^6

func (*HyperLogLog) Add ¶

func (hll *HyperLogLog) Add(item []byte)

Add adds an item to the multiset represented by the HyperLogLog

func (*HyperLogLog) BiasCorrected ¶

func (hll *HyperLogLog) BiasCorrected() uint64

BiasCorrected returns the bias corrected estimated number of distinct items in the multiset

func (*HyperLogLog) Compress ¶

func (hll *HyperLogLog) Compress(factor byte) *HyperLogLog

Compress produces a new HyperLogLog with reduced size by 2^factor with reduced precision if new p < minimumHyperLogLogP, p=minimumHyperLogLogP , if factor=0 it just produces a copy

func (*HyperLogLog) Distinct ¶

func (hll *HyperLogLog) Distinct() uint64

Distinct returns the estimated number of distinct items in the multiset

func (*HyperLogLog) ExpectedError ¶

func (hll *HyperLogLog) ExpectedError() float64

ExpectedError returns the estimated error in the number of distinct items in the multiset

func (*HyperLogLog) Intersect ¶

func (hll *HyperLogLog) Intersect(hllB *HyperLogLog) (*HyperLogLog, error)

Intersect the estimate of two HyperLogLog reducing the precision to the minimum of the two sets the function will return nil and an error if the hash functions mismatch Intersect will always overestimate the size of the intersection

func (*HyperLogLog) LinearCounting ¶

func (hll *HyperLogLog) LinearCounting() uint64

LinearCounting returns the linear counting estimated number of distinct items in the multiset

func (*HyperLogLog) RawEstimate ¶

func (hll *HyperLogLog) RawEstimate() uint64

RawEstimate returns the raw estimated number of distinct items in the multiset

func (*HyperLogLog) Reset ¶

func (hll *HyperLogLog) Reset()

Reset zeros out the estimated number of distinct items in the multiset

func (*HyperLogLog) String ¶

func (hll *HyperLogLog) String() string

func (*HyperLogLog) Union ¶

func (hll *HyperLogLog) Union(hllB *HyperLogLog) (*HyperLogLog, error)

Union the estimate of two HyperLogLog reducing the precision to the minimum of the two sets the function will return nil and an error if the hash functions mismatch

type LinearCounting ¶

type LinearCounting struct {
	// contains filtered or unexported fields
}

LinearCounting is a space efficient data structure for count distinct with hard upper bound

func NewLinearCounting ¶

func NewLinearCounting(p byte, hash hash.Hash64) *LinearCounting

NewLinearCounting initializes a LinearCounting structure with size m=2^p and the given hash function

func (*LinearCounting) Add ¶

func (lc *LinearCounting) Add(item []byte)

Add adds an item to the multiset represented by the LinearCounting structure

func (*LinearCounting) Compress ¶

func (lc *LinearCounting) Compress(factor byte) *LinearCounting

Compress produces a new LinearCouting with reduced size by 2^factor with reduced precision if new p < minLinearCountingP, p=minLinearCountingP , if factor=0 it just produces a copy

func (LinearCounting) Distinct ¶

func (lc LinearCounting) Distinct() uint64

Distinct returns the estimate of the number of distinct elements seen if the backing BitVector is full it returns m, the size of the BitVector

func (LinearCounting) ExpectedError ¶

func (lc LinearCounting) ExpectedError() float64

ExpectedError returns the expected error at the current filling in the LinearCounting

func (*LinearCounting) Intersect ¶

func (lc *LinearCounting) Intersect(lcB *LinearCounting) (*LinearCounting, error)

Intersect the estimate of two LinearCounting reducing the precision to the minimum of the two sets the function will return nil and an error if the hash functions mismatch

func (LinearCounting) Occupancy ¶

func (lc LinearCounting) Occupancy() float64

Occupancy returns the ratio of filled buckets in the LinearCounting bitvector

func (LinearCounting) String ¶

func (lc LinearCounting) String() string

func (*LinearCounting) Union ¶

func (lc *LinearCounting) Union(lcB *LinearCounting) (*LinearCounting, error)

Union the estimate of two LinearCounting reducing the precision to the minimum of the two sets the function will return nil and an error if the hash functions mismatch

type MomentStats ¶

type MomentStats struct {
	// contains filtered or unexported fields
}

MomentStats is a datastructure for computing the first four moments of a stream

func NewMomentStats ¶

func NewMomentStats() *MomentStats

NewMomentStats returns an empty MomentStats structure with no values

func (*MomentStats) Add ¶

func (m *MomentStats) Add(x float64)

Add updates the moment stats

func (*MomentStats) Combine ¶

func (m *MomentStats) Combine(b *MomentStats) MomentStats

Combine combines the stats from two MomentStats structures

func (*MomentStats) Kurtosis ¶

func (m *MomentStats) Kurtosis() float64

Kurtosis returns the excess kurtosis of the samples seen so far

func (*MomentStats) Mean ¶

func (m *MomentStats) Mean() float64

Mean returns the mean of the observations seen so far

func (*MomentStats) N ¶

func (m *MomentStats) N() uint64

N returns the observations stored so far

func (*MomentStats) Skewness ¶

func (m *MomentStats) Skewness() float64

Skewness returns the skewness of the samples seen so far

func (*MomentStats) StdDev ¶

func (m *MomentStats) StdDev() float64

StdDev returns the standard deviation of the samples seen so far

func (*MomentStats) String ¶

func (m *MomentStats) String() string

String returns the standard string representation of the samples seen so far

func (*MomentStats) Variance ¶

func (m *MomentStats) Variance() float64

Variance returns the variance of the observations seen so far

type P2Histogram ¶

type P2Histogram struct {
	// contains filtered or unexported fields
}

P2Histogram is an O(1) time and space data structure for estimating the evenly spaced histogram bins of a series of N data points based on the "The P2 Algorithm for Dynamic Computing Calculation of Quantiles and Histograms Without Storing Observations" by RAJ JAIN and IIMRICH CHLAMTAC Communications of the ACM Volume 28 Issue 10, Oct. 1985 Pages 1076-1085

func NewP2Histogram ¶

func NewP2Histogram(b uint64) P2Histogram

NewP2Histogram intializes the data structure to track b bins

func (*P2Histogram) Add ¶

func (h *P2Histogram) Add(x float64)

Add updates the data structure with a given x value

func (*P2Histogram) CDF ¶

func (h *P2Histogram) CDF(x float64) float64

CDF returns the linear approximation to the CDF at x based on the histogram data

func (*P2Histogram) Histogram ¶

func (h *P2Histogram) Histogram() []CumulativeDensity

Histogram returns the histogram of observations seen so far

func (*P2Histogram) Max ¶

func (h *P2Histogram) Max() float64

Max returns the maximum of observations seen so far

func (*P2Histogram) Min ¶

func (h *P2Histogram) Min() float64

Min returns the minimum of observations seen so far

func (*P2Histogram) N ¶

func (h *P2Histogram) N() uint64

N returns the number of observations seen so far

func (*P2Histogram) Quantile ¶

func (h *P2Histogram) Quantile(p float64) float64

Quantile returns the linear approximation to the given quantile based on the histogram data

type P2Quantile ¶

type P2Quantile struct {
	// contains filtered or unexported fields
}

P2Quantile is an O(1) time and space data structure for estimating the p-quantile of a series of N data points based on the "The P2 Algorithm for Dynamic Computing Calculation of Quantiles and Histograms Without Storing Observations" by RAJ JAIN and IIMRICH CHLAMTAC Communications of the ACM Volume 28 Issue 10, Oct. 1985 Pages 1076-1085

func NewP2Quantile ¶

func NewP2Quantile(p float64) P2Quantile

NewP2Quantile intializes the data structure to track the p-quantile

func (*P2Quantile) Add ¶

func (p *P2Quantile) Add(x float64)

Add updates the data structure with a given x value

func (*P2Quantile) LowerQuantile ¶

func (p *P2Quantile) LowerQuantile() float64

LowerQuantile returns the estimate for the lower quantile, p/2

func (*P2Quantile) Max ¶

func (p *P2Quantile) Max() float64

Max returns the exact maximum value seen so far

func (*P2Quantile) Min ¶

func (p *P2Quantile) Min() float64

Min returns the exact minimum value seen so far

func (*P2Quantile) N ¶

func (p *P2Quantile) N() uint64

N returns the number of observations seen so far

func (*P2Quantile) P ¶

func (p *P2Quantile) P() float64

P returns the quantile being tracked

func (*P2Quantile) Quantile ¶

func (p *P2Quantile) Quantile() float64

Quantile returns the estimated value for the p-quantile

func (*P2Quantile) UpperQuantile ¶

func (p *P2Quantile) UpperQuantile() float64

UpperQuantile returns the estimate for the upper quantile, (1+p/2)

Source Files ¶

View all Source files

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL