Package stats

v0.4.0
Published: Mar 21, 2023 License: Apache-2.0 Imports: 13 Imported by: 10

README

Statistics package

Variable substitution in Monte Carlo integration

Given the original integral:

I = integral_{x_min..x_max} f(x) dx

replace x by x(t), so the integral becomes over dt:

I = integral_{t_min..t_max} f(x(t)) x'(t) dt

where x(t_min) = x_min, x(t_max) = x_max, and x'(t) = dx/dt is the derivative of x(t) over t.

The interesting case supported here is an N-dimensional integral over a vector X=(x_1, ..., x_N) in R^N, that is the N-dimensional real hyperspace. The original integral is assumed to be of the form:

I = E[g(X)] = integral g(X) * f(X) dX

where f(X) is the p.d.f. of some multivariate distribution of X. The simplest way to compute it is to generate random samples of X using the same distribution. Then the integral I can be approximated as:

I ~= 1/K * sum_{i=1..K} g(X_i)

where X_i are the K generated samples.

In practice, sampling directly from f(X) may require a very large number of samples to cover the area of interest, e.g. where g(X) is large enough to contribute significantly to the integral. Therefore, it may be beneficial to replace each x in the vector X with another variable t uniformly distributed in (-1..1), such that x(t -> -1) -> -Inf, x(t -> 1) -> Inf, x(t) is monotonically increasing and differentiable over (-1..1), and the probability of "interesting" values of x(t) is significant, so the number of required samples can be reduced.

Specifically, our g(X) will often be an indicator function of a subspace, typically used for computing a bucket value in a histogram of the N-compounded sample:

g(X) = (sum(X) in [low .. high]) ? 1 : 0

The substitution is

x(t) = r * t / (1 - t^(2*b))

where r controls the width of a near-uniform distribution of x values around zero, and b controls the portion of samples falling beyond the interval [-r..r].

Empirically, for the N-sum over [low..high], a good choice of parameters is:

r = max(|low|, |high|) / sqrt(N)
b = ceiling(sqrt(N))

However, rather than computing each bucket value separately, note that we are effectively sampling x over the entire range, so each sample can increment its corresponding bucket by f(x(t))*x'(t), thus computing many g(X)'s in one go. The value of r in this case is the maximum absolute value in the buckets' range.
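
To illustrate, here is a minimal sketch of this sampling scheme in Go, using the VarSubst and VarPrime functions documented below (the package is imported as stats) together with math and math/rand. The names pdf (the source one-dimensional p.d.f. f(x)), n, low, high (the buckets' range bounds), numSamples and buckets (a *Buckets, see below) are assumed to be defined by the caller; the actual implementation (see CompoundHistogram and ParallelSamplingConfig) parallelizes this loop.

// A minimal sketch of biased sampling for the n-compounded sum.
r := math.Max(math.Abs(low), math.Abs(high)) / math.Sqrt(float64(n))
b := math.Ceil(math.Sqrt(float64(n)))
weights := make([]float64, buckets.N)
for i := 0; i < numSamples; i++ {
	sum, w := 0.0, 1.0
	for k := 0; k < n; k++ {
		t := 2*rand.Float64() - 1             // approximately uniform over (-1..1)
		x := stats.VarSubst(t, r, b, 0)       // x(t)
		sum += x
		w *= pdf(x) * stats.VarPrime(t, r, b) // f(x(t)) * x'(t) for this dimension
	}
	if j := buckets.Bucket(sum); 0 <= j && j < buckets.N {
		weights[j] += w
	}
}
// weights now hold unnormalized estimates of the compounded p.d.f. per bucket.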

Documentation

Overview

Package stats implements statistical utilities.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func ExpectationMC

func ExpectationMC(f func(x float64) float64, random func() float64,
	low, high float64, minIter, maxIter uint, precision float64, relative bool) float64

ExpectationMC computes a (potentially partial) expectation integral: integral_{low..high} [ f(x) * p(x) dx ], where p(x) is the p.d.f. of the sampling distribution, using the simple Monte-Carlo method of sampling f(x) with the given distribution sampler and computing the average. The bounds are inclusive. Note that low may be -Inf, and high may be +Inf.

The sampling stops either when maxIter samples have been reached, or when the estimated standard error becomes less than the required relative or absolute precision. See PreciseEnough for the exact semantics.

In any case, minIter iterations are guaranteed; it should normally be a small number (e.g. 100) to accumulate a reasonable initial error estimate.
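
A minimal usage sketch, assuming the package is imported as stats: estimate the second moment E[X^2] of (approximately) a standard normal. The numeric arguments are illustrative.

d := stats.NewNormalDistribution(0.0, math.Sqrt(2/math.Pi)) // MAD of a sigma=1 normal
secondMoment := stats.ExpectationMC(
	func(x float64) float64 { return x * x }, // f(x)
	d.Rand,                    // sampler of the distribution
	math.Inf(-1), math.Inf(1), // integrate over the entire real line
	100, 1000000,              // minIter, maxIter
	0.01, true,                // 1% relative precision
)
// secondMoment should be close to 1.0 for a standard normal.
_ = secondMoment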

func PreciseEnough added in v0.1.7

func PreciseEnough(x, deviation, epsilon float64, relative bool) bool

PreciseEnough determines if the value of x with an estimated deviation is within an epsilon neighborhood of its true value. This can be used as a termination criterion in iterative approximation methods when a desired precision has been reached.

Note that epsilon provides a relative precision: the true value of x is assumed to be within the [x-dev..x+dev] interval, and the precision is reached when dev/|x| < epsilon for |x| >= 1, and when dev < epsilon otherwise.
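
For example, PreciseEnough can terminate a sampling loop once the standard error of the running mean is small enough (a sketch; sampler, minIter and maxIter are assumed to be defined by the caller, and StandardError is documented below):

var se stats.StandardError
for i := uint(0); i < maxIter; i++ {
	se.Add(sampler())
	stdErr := se.Sigma() / math.Sqrt(float64(se.N())) // standard error of the mean
	if i >= minIter && stats.PreciseEnough(se.Mean(), stdErr, 0.01, true) {
		break // the mean is estimated to be within ~1% of its true value
	}
}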

func SafeLog

func SafeLog(x float64) float64

SafeLog is a "safe" natural logarithm, which for x <= 0 returns -Inf.

func TimeseriesIntersectIndices added in v0.2.7

func TimeseriesIntersectIndices(tss ...*Timeseries) [][]int

TimeseriesIntersectIndices returns the slice of indices S effectively intersecting the given Timeseries by Date. That is:

- len(S) is the number of distinct Dates present in all of the tss;

- len(S[i]) = len(tss) for any i<len(S), so each S[i] is the slice of indices in the corresponding Timeseries such that tss[j].Dates()[S[i][j]] == tss[k].Dates()[S[i][k]] for any j, k < len(tss).

func VarPrime added in v0.1.7

func VarPrime(t, scale, power float64) float64

VarPrime is the value of x'(t), the first derivative of x(t).

func VarSubst added in v0.1.7

func VarSubst(t, scale, power, shift float64) float64

VarSubst computes the value of

x(t) = shift + scale * t / (1 - t^(2*power))

to be used as a variable substitution in an integral over x in (-Inf..Inf). The new bounds for t become (-1..1), excluding the boundaries.

In Monte Carlo integration, the integral_{-Inf..Inf} f(x)dx is approximated by 2 * E[ f(x(t))*x'(t) ], i.e. the length of the t interval times the sample average, for a uniformly distributed t over (-1..1).
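
For example, a sketch of estimating integral_{-Inf..Inf} f(x) dx for the standard normal p.d.f. (which should come out close to 1); scale and power are illustrative choices, and the factor of 2 is the length of the t interval:

f := func(x float64) float64 { return math.Exp(-x*x/2) / math.Sqrt(2*math.Pi) }
scale, power := 1.0, 1.0
const numSamples = 100000
sum := 0.0
for i := 0; i < numSamples; i++ {
	t := 2*rand.Float64() - 1 // approximately uniform over (-1..1)
	sum += f(stats.VarSubst(t, scale, power, 0)) * stats.VarPrime(t, scale, power)
}
integral := 2 * sum / numSamples // interval length times the sample average
// integral should be close to 1.0.
_ = integral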

Types

type Buckets

type Buckets struct {
	N int `json:"n" default:"101"`
	// Indicate that spacing / min / max can be set automatically.
	Auto    bool        `json:"auto bounds" default:"true"`
	Spacing SpacingType `json:"spacing"` // choices:"linear,exponential,symmetric exponential"
	Min     float64     `json:"min" default:"-50"`
	Max     float64     `json:"max" default:"50"`
	Bounds  []float64   `json:"-"` // n+1 bucket boundaries, auto-generated
}

Buckets configures the properties of histogram buckets. It implements message.Message, thus can be directly used in configs.

func NewBuckets

func NewBuckets(n int, minval, maxval float64, spacing SpacingType) (*Buckets, error)

NewBuckets creates and initializes a new buckets object.
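
A minimal usage sketch (the values are illustrative):

b, err := stats.NewBuckets(101, -50.0, 50.0, stats.LinearSpacing)
if err != nil {
	panic(err)
}
i := b.Bucket(3.2) // index of the bucket containing x = 3.2
w := b.Size(i)     // width of that bucket
xs := b.Xs(0.5)    // representative values shifted half-way to the next boundary
_, _ = w, xs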

func (*Buckets) Bucket

func (b *Buckets) Bucket(x float64) int

Bucket computes the bucket index for a sample.

func (*Buckets) FitTo added in v0.1.6

func (b *Buckets) FitTo(data []float64) error

FitTo fits the bucket parameters, such as spacing, min & max, to the data. It assumes that data is sorted in ascending order. In case of an error, the original values are preserved.

func (*Buckets) InitMessage added in v0.0.6

func (b *Buckets) InitMessage(js any) error

func (*Buckets) SameAs added in v0.0.7

func (b *Buckets) SameAs(b2 *Buckets) bool

SameAs checks if b defines the same buckets as b2.

func (*Buckets) Size

func (b *Buckets) Size(i int) float64

Size of the i'th bucket.

func (Buckets) String added in v0.0.7

func (b Buckets) String() string

String prints Buckets. It is a value method, so non-pointer Buckets will print correctly in fmt.Printf.

func (*Buckets) X

func (b *Buckets) X(i int, shift float64) float64

X computes the representative value of x for the i'th bucket, optionally adjusted by the relative shift amount (shift=1.0 is the next bucket boundary).

func (*Buckets) Xs

func (b *Buckets) Xs(shift float64) []float64

Xs returns the list of representative values for all buckets, optionally adjusted by the relative shift amount. It always returns a newly allocated slice, so it is safe to modify it.

type Distribution

type Distribution interface {
	distuv.Rander
	distuv.Quantiler
	Prob(float64) float64 // the p.d.f. value at x
	Mean() float64
	MAD() float64 // mean absolute deviation
	Variance() float64
	CDF(x float64) float64 // returns max. quantile for x
	Copy() Distribution    // shallow-copy with a new instance of rand.Source
	// Set random seed when applicable. Mostly used in tests.
	Seed(uint64)
}

Distribution API for common operations.

type DistributionWithHistogram added in v0.1.5

type DistributionWithHistogram interface {
	Distribution
	Histogram() *Histogram
}

type FastCompoundState added in v0.2.1

type FastCompoundState []float64

FastCompoundState is used in Transform by FastCompoundRandDistribution.

type Histogram

type Histogram struct {
	// contains filtered or unexported fields
}

Histogram stores sample counts for each bucket. The counts are continuous (float64) so that Histogram can be used to represent c.d.f.-based distributions derived numerically.

func CompoundHistogram added in v0.1.7

func CompoundHistogram(ctx context.Context, source Distribution, n int, c *ParallelSamplingConfig) *Histogram

CompoundHistogram computes a histogram of an n-compounded source distribution from its p.d.f. source.Prob(x) method.

func NewHistogram

func NewHistogram(buckets *Buckets) *Histogram

NewHistogram creates and initializes a Histogram. It panics if buckets is nil.

func (*Histogram) Add

func (h *Histogram) Add(xs ...float64)

Add samples to the Histogram.
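
A small usage sketch combining Buckets and Histogram (the values are illustrative):

buckets, err := stats.NewBuckets(101, -50.0, 50.0, stats.LinearSpacing)
if err != nil {
	panic(err)
}
h := stats.NewHistogram(buckets)
h.Add(1.5, -2.3, 0.7, 4.2) // accumulate samples
mean := h.Mean()           // approximate mean
median := h.Quantile(0.5)  // approximate median
pdfs := h.PDFs()           // p.d.f. estimate per bucket, e.g. for plotting against h.Xs()
_, _, _ = mean, median, pdfs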

func (*Histogram) AddHistogram added in v0.0.7

func (h *Histogram) AddHistogram(h2 *Histogram) error

AddHistogram adds h2 samples into the Histogram. h2 must have the same buckets as self.

func (*Histogram) AddWeights added in v0.1.7

func (h *Histogram) AddWeights(weights []float64) error

AddWeights to the histogram directly. Assumes len(weights) = h.Buckets().N.

func (*Histogram) AddWithWeight added in v0.1.7

func (h *Histogram) AddWithWeight(x, weight float64)

func (*Histogram) Buckets

func (h *Histogram) Buckets() *Buckets

Buckets value of the Histogram.

func (*Histogram) CDF

func (h *Histogram) CDF(x float64) float64

CDF value at x, approximated using histogram weights. It is effectively an inverse of Quantile(), interpolating values of x when it falls between bucket boundaries.

func (*Histogram) Count

func (h *Histogram) Count(i int) uint

Count of the i'th bucket. Returns 0 if i is out of range.

func (*Histogram) Counts

func (h *Histogram) Counts() []uint

Counts of the actual (possibly biased) samples in the Histogram. For p.d.f. estimates use Weights.

func (*Histogram) CountsTotal added in v0.1.7

func (h *Histogram) CountsTotal() uint

CountsTotal is the sum total of all counts.

func (*Histogram) MAD added in v0.0.7

func (h *Histogram) MAD() float64

MAD estimates the mean absolute deviation.

func (*Histogram) Mean

func (h *Histogram) Mean() float64

Mean computes the approximate mean of the distribution.

func (*Histogram) PDF

func (h *Histogram) PDF(i int) float64

PDF value at the i'th bucket. Returns 0 if i is out of range. The p.d.f. integrates to 1.0 with dx = h.Buckets().Size(i).

func (*Histogram) PDFs

func (h *Histogram) PDFs() []float64

PDFs lists all the values of PDF for all the buckets. This is suitable for plotting against Xs().

func (*Histogram) Prob added in v0.1.5

func (h *Histogram) Prob(x float64) float64

Prob is the p.d.f. value at x, approximated using histogram weights.

func (*Histogram) Quantile

func (h *Histogram) Quantile(q float64) float64

Quantile computes the approximation of the q'th quantile, where e.g. q=0.5 is the 50th percentile. Quantiles of 0 and 1 can be used as approximations of the minimum and maximum sample values. Panics if q is not within [0..1].

func (*Histogram) Sigma added in v0.0.7

func (h *Histogram) Sigma() float64

Sigma is the estimated standard deviation.

func (*Histogram) StdError added in v0.1.7

func (h *Histogram) StdError(i int) float64

StdError estimates the standard deviation of the p.d.f. value at each bucket.

func (*Histogram) StdErrors added in v0.1.7

func (h *Histogram) StdErrors() []float64

StdErrors is a slice of estimated standard errors for all buckets.

func (*Histogram) Sum added in v0.0.7

func (h *Histogram) Sum(i int) float64

Sum of samples for the i'th bucket. Returns 0 if i is out of range.

func (*Histogram) SumTotal added in v0.0.7

func (h *Histogram) SumTotal() float64

SumTotal of all samples.

func (*Histogram) Sums added in v0.0.7

func (h *Histogram) Sums() []float64

Sums of samples per bucket.

func (*Histogram) Variance added in v0.0.7

func (h *Histogram) Variance() float64

Variance estimation.

func (*Histogram) Weight added in v0.1.7

func (h *Histogram) Weight(i int) float64

Weight of the i'th bucket. Returns 0 if i is out of range.

func (*Histogram) Weights added in v0.1.7

func (h *Histogram) Weights() []float64

Weights of the buckets in the Histogram. These are the true "sizes" of the buckets in a traditional sense of a histogram.

func (*Histogram) WeightsTotal added in v0.1.8

func (h *Histogram) WeightsTotal() float64

WeightsTotal is the sum total of all weights.

func (*Histogram) X added in v0.0.7

func (h *Histogram) X(i int) float64

X returns the mean x value of the i'th bucket, or the logical middle of the bucket if it has no samples.

func (*Histogram) Xs added in v0.0.7

func (h *Histogram) Xs() []float64

Xs returns the list of mean values for all buckets. The slice is always newly allocated.

type HistogramDistribution added in v0.1.5

type HistogramDistribution struct {
	// contains filtered or unexported fields
}

HistogramDistribution creates a Distribution out of a Histogram.

func NewHistogramDistribution added in v0.1.5

func NewHistogramDistribution(h *Histogram) *HistogramDistribution

NewHistogramDistribution creates a new distribution out of h. Note that h is stored as the original pointer and is not deep-copied. The caller must ensure that h is not modified after creating this distribution, otherwise the behavior may be unpredictable.

func (*HistogramDistribution) CDF added in v0.1.5

func (*HistogramDistribution) Copy added in v0.1.5

Copy shallow-copies the distribution. Note that the underlying Histogram is copied by pointer, not deep-copied.

func (*HistogramDistribution) Histogram added in v0.1.5

func (d *HistogramDistribution) Histogram() *Histogram

func (*HistogramDistribution) MAD added in v0.1.5

func (d *HistogramDistribution) MAD() float64

func (*HistogramDistribution) Mean added in v0.1.5

func (d *HistogramDistribution) Mean() float64

func (*HistogramDistribution) Prob added in v0.1.5

func (*HistogramDistribution) Quantile added in v0.1.5

func (d *HistogramDistribution) Quantile(x float64) float64

func (*HistogramDistribution) Rand added in v0.1.5

func (d *HistogramDistribution) Rand() float64

func (*HistogramDistribution) Seed added in v0.1.5

func (d *HistogramDistribution) Seed(seed uint64)

func (*HistogramDistribution) Variance added in v0.1.5

func (d *HistogramDistribution) Variance() float64

type Normal

type Normal struct {
	distuv.Normal
}

Normal distribution.

func NewNormalDistribution

func NewNormalDistribution(mean, MAD float64) *Normal

NewNormalDistribution creates an instance of a normal distribution scaled and shifted for the given mean and MAD (mean absolute deviation).
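
Note that the constructor is parameterized by MAD rather than the standard deviation; for a normal distribution MAD = sigma * sqrt(2/pi), so a standard normal (sigma = 1) can be created as follows:

d := stats.NewNormalDistribution(0.0, math.Sqrt(2/math.Pi)) // MAD ~= 0.798 => sigma = 1
p := d.Prob(0.0)                                            // ~= 0.399, the standard normal p.d.f. at 0
x := d.Rand()                                               // one random sample
_, _ = p, x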

func (*Normal) Copy

func (d *Normal) Copy() Distribution

func (*Normal) MAD

func (d *Normal) MAD() float64

func (*Normal) Mean

func (d *Normal) Mean() float64

func (*Normal) Seed

func (d *Normal) Seed(seed uint64)

type ParallelSamplingConfig added in v0.1.7

type ParallelSamplingConfig struct {
	BatchMin int     `json:"batch size min" default:"10"`
	BatchMax int     `json:"batch size max" default:"10000"`
	Samples  int     `json:"samples" default:"10000"` // for histogram
	Buckets  Buckets `json:"buckets"`
	// Biased sampling parameters, when applicable. Zero values indicate that the
	// caller must set appropriate defaults.
	Scale   float64 `json:"bias scale"` // size of uniform distribution area
	Power   float64 `json:"bias power"` // approach +-Inf near +-1 as 1/(1-t^(2*Power))
	Shift   float64 `json:"bias shift"` // value of x(t=0)
	Workers int     `json:"workers"`    // default: 2*runtime.NumCPU()
	Seed    int     `json:"seed"`       // for use in tests when > 0
}

ParallelSamplingConfig is a set of configuration parameters for RandDistribution suitable for use in user config file schema.

func (*ParallelSamplingConfig) InitMessage added in v0.1.7

func (c *ParallelSamplingConfig) InitMessage(js any) error

type PriceField

type PriceField uint8

PriceField is an enum type indicating which PriceRow field to use.

const (
	PriceOpenUnadjusted PriceField = iota
	PriceOpenSplitAdjusted
	PriceOpenFullyAdjusted
	PriceHighUnadjusted
	PriceHighSplitAdjusted
	PriceHighFullyAdjusted
	PriceLowUnadjusted
	PriceLowSplitAdjusted
	PriceLowFullyAdjusted
	PriceCloseUnadjusted
	PriceCloseSplitAdjusted
	PriceCloseFullyAdjusted
	PriceCashVolume
)

type RandDistribution

type RandDistribution[State any] struct {
	// contains filtered or unexported fields
}

RandDistribution uses a transformed Rand method of a source distribution to create another distribution. In particular, its own Rand function simply calls the source's Rand and applies the transform. It estimates and caches mean, MAD and quantiles (as a histogram) from a set number of samples. It never stores the generated samples, so its memory footprint remains small.

func CompoundRandDistribution

func CompoundRandDistribution(ctx context.Context, source Distribution, n int, cfg *ParallelSamplingConfig) *RandDistribution[struct{}]

CompoundRandDistribution creates a RandDistribution out of source compounded n times. That is, source.Rand() is invoked n times and the sum of its samples is a new single sample in the new distribution.

func FastCompoundRandDistribution added in v0.1.4

func FastCompoundRandDistribution(ctx context.Context, source Distribution, n int, cfg *ParallelSamplingConfig) *RandDistribution[FastCompoundState]

FastCompoundRandDistribution creates a RandDistribution out of source compounded n times. However, the source.Rand() values are not recomputed n times for each new sample, but are taken as the sum of a sliding window in a single sequence of source samples. This reduces the number of generated source samples from N*numSamples to N+numSamples. In practice, multiple such sequences are generated in parallel for further speedup.

func NewRandDistribution

func NewRandDistribution[S any](ctx context.Context, source Distribution, xform *Transform[S], cfg *ParallelSamplingConfig) *RandDistribution[S]

NewRandDistribution creates a Distribution using the transformation of the random sampler function of the source distribution. The source distribution is copied using Distribution.Copy method, and therefore can be sampled independently and in parallel with the original source. It uses the given number of samples to estimate and lazily cache mean, MAD and quantiles.
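
A sketch of a stateless Transform whose samples are the squares of the source's samples, assuming that initializing ParallelSamplingConfig from an empty JSON object via InitMessage populates the documented defaults:

xform := &stats.Transform[struct{}]{
	InitState: func() struct{} { return struct{}{} },
	Fn: func(src stats.Distribution, s struct{}) (float64, struct{}) {
		x := src.Rand()
		return x * x, s
	},
}
source := stats.NewNormalDistribution(0.0, math.Sqrt(2/math.Pi)) // approximately a standard normal
var cfg stats.ParallelSamplingConfig
if err := cfg.InitMessage(map[string]any{}); err != nil { // assumed to populate the defaults
	panic(err)
}
d := stats.NewRandDistribution(context.Background(), source, xform, &cfg)
mean := d.Mean() // lazily estimated and cached; ~= 1 for a squared standard normal
_ = mean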

func (*RandDistribution[S]) CDF

func (d *RandDistribution[S]) CDF(x float64) float64

func (*RandDistribution[S]) Copy

func (d *RandDistribution[S]) Copy() Distribution

func (*RandDistribution[S]) Histogram

func (d *RandDistribution[S]) Histogram() *Histogram

Histogram of the generator, lazily cached.

func (*RandDistribution[S]) MAD

func (d *RandDistribution[S]) MAD() float64

func (*RandDistribution[S]) Mean

func (d *RandDistribution[S]) Mean() float64

func (*RandDistribution[S]) Prob

func (d *RandDistribution[S]) Prob(x float64) float64

func (*RandDistribution[S]) Quantile

func (d *RandDistribution[S]) Quantile(x float64) float64

func (*RandDistribution[S]) Rand

func (d *RandDistribution[S]) Rand() float64

func (*RandDistribution[S]) Seed

func (d *RandDistribution[S]) Seed(seed uint64)

func (*RandDistribution[S]) Variance added in v0.1.2

func (d *RandDistribution[S]) Variance() float64

type Sample

type Sample struct {
	// contains filtered or unexported fields
}

Sample stores an unordered set of numerical data (float64) and computes various statistics over it.

func NewSample

func NewSample(data []float64) *Sample

NewSample creates a new sample initialized with data. Note that it reuses the slice without copying. Use Copy() if you need to decouple your input from the Sample.

func (*Sample) Copy

func (s *Sample) Copy() *Sample

Copy creates a deep copy of the Sample. This can be useful, e.g. like this:

s := NewSample(data).Copy()
// can safely modify data in place without affecting s.

func (*Sample) Data

func (s *Sample) Data() []float64

Data returns the sample data.

func (*Sample) MAD

func (s *Sample) MAD() float64

MAD computes mean absolute deviation of the Sample, cached.

func (*Sample) Mean

func (s *Sample) Mean() float64

Mean computes the mean of the Sample, cached.

func (*Sample) Normalize added in v0.0.5

func (s *Sample) Normalize() (*Sample, error)

Normalize creates a new Sample of {(x - mean) / MAD}, thus its Mean and MAD are 0 and 1, respectively.

func (*Sample) Sigma

func (s *Sample) Sigma() float64

Sigma computes the standard deviation of the Sample, cached.

func (*Sample) Sum

func (s *Sample) Sum() float64

Sum of samples, cached.

func (*Sample) SumDev

func (s *Sample) SumDev() float64

SumDev computes the sum of absolute deviations from the mean, cached.

func (*Sample) SumSquaredDev

func (s *Sample) SumSquaredDev() float64

SumSquaredDev computes the sum of squared deviations from the mean, cached.

func (*Sample) Variance

func (s *Sample) Variance() float64

Variance of the Sample (sigma squared), cached.

type SampleDistribution

type SampleDistribution struct {
	// contains filtered or unexported fields
}

SampleDistribution implements a distribution of a sample.

func CompoundSampleDistribution

func CompoundSampleDistribution(ctx context.Context, source Distribution, n int, cfg *ParallelSamplingConfig) *SampleDistribution

CompoundSampleDistribution creates a SampleDistribution out of source compounded n times. That is, source.Rand() is invoked n times and the sum of its samples becomes a new single sample in the new distribution.

func FastCompoundSampleDistribution added in v0.1.4

func FastCompoundSampleDistribution(ctx context.Context, source Distribution, n int, cfg *ParallelSamplingConfig) *SampleDistribution

FastCompoundSampleDistribution creates a SampleDistribution out of a random generator compounded n times. See FastCompoundRandDistribution.

func NewSampleDistribution

func NewSampleDistribution(sample []float64, buckets *Buckets) *SampleDistribution

NewSampleDistribution creates an instance of a SampleDistribution. It requires Buckets to create a Histogram for computing a reasonable p.d.f. NOTE: it will sort the sample in place and store the slice as is, without deep copying. The caller is responsible for making a copy if the original order is important, or if the sample will later be modified by the caller.
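
A minimal usage sketch; the data slice is copied before the call since NewSampleDistribution sorts it in place:

data := []float64{-1.2, 0.3, 0.8, 1.5, -0.4, 2.1}
buckets, err := stats.NewBuckets(11, -3.0, 3.0, stats.LinearSpacing)
if err != nil {
	panic(err)
}
d := stats.NewSampleDistribution(append([]float64{}, data...), buckets)
median := d.Quantile(0.5) // sample median
p := d.Prob(0.0)          // histogram-based p.d.f. estimate at x = 0
_, _ = median, p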

func NewSampleDistributionFromRand

func NewSampleDistributionFromRand(d Distribution, samples int, buckets *Buckets) *SampleDistribution

NewSampleDistributionFromRand creates an instance of a SampleDistribution by sampling a given distribution. It requires Buckets to create a Histogram for computing a reasonable p.d.f.

func NewSampleDistributionFromRandDist added in v0.1.4

func NewSampleDistributionFromRandDist[S any](d *RandDistribution[S], samples int, buckets *Buckets) *SampleDistribution

NewSampleDistributionFromRandDist is similar to NewSampleDistributionFromRand except that it uses fast stateful sample generation of RandDistribution.

func (*SampleDistribution) CDF

CDF of the sample distribution.

func (*SampleDistribution) Copy

func (d *SampleDistribution) Copy() Distribution

func (*SampleDistribution) Histogram

func (d *SampleDistribution) Histogram() *Histogram

Histogram of the sample distribution.

func (*SampleDistribution) MAD

func (d *SampleDistribution) MAD() float64

func (*SampleDistribution) Mean

func (d *SampleDistribution) Mean() float64

func (*SampleDistribution) Prob

func (d *SampleDistribution) Prob(x float64) float64

func (*SampleDistribution) Quantile

func (d *SampleDistribution) Quantile(x float64) float64

func (*SampleDistribution) Rand

func (d *SampleDistribution) Rand() float64

func (*SampleDistribution) Sample

func (d *SampleDistribution) Sample() *Sample

Sample as the source of the distribution.

func (*SampleDistribution) Seed

func (d *SampleDistribution) Seed(seed uint64)

func (*SampleDistribution) Variance added in v0.1.2

func (d *SampleDistribution) Variance() float64

type SpacingType

type SpacingType uint8

SpacingType is enum for different ways buckets are spaced out.

const (
	LinearSpacing SpacingType = iota
	ExponentialSpacing
	SymmetricExponentialSpacing
)

Values of SpacingType:

  • LinearSpacing divides the interval into n equal parts.

  • ExponentialSpacing divides the log-space interval into n equal parts, so the buckets in the original interval grow exponentially away from zero. Note that Min must be > 0.

  • SymmetricExponentialSpacing makes the exponential spacing symmetric around zero. That is, the buckets grow exponentially away from zero in both directions, and the middle bucket spans [-Min..Min]. It requires n to be odd, and Min > 0, but the actual interval is [-Max..Max].
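
For example, symmetric exponential buckets over [-50..50] with a middle bucket of about [-0.01..0.01] (n must be odd and Min > 0); the exact boundary values are a sketch and depend on the implementation:

b, err := stats.NewBuckets(101, 0.01, 50.0, stats.SymmetricExponentialSpacing)
if err != nil {
	panic(err)
}
// Bounds[0] ~= -50, the middle bucket spans Bounds[50]..Bounds[51] ~= [-0.01..0.01],
// and Bounds[101] ~= 50.
fmt.Println(b.Bounds[0], b.Bounds[50], b.Bounds[51], b.Bounds[101])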

func (*SpacingType) InitMessage added in v0.0.6

func (s *SpacingType) InitMessage(js any) error

func (SpacingType) String added in v0.0.7

func (s SpacingType) String() string

String prints SpacingType. It's a value method, so it prints correctly in fmt.Printf.

type StandardError added in v0.1.7

type StandardError struct {
	// contains filtered or unexported fields
}

StandardError accumulates and estimates the standard deviation of an online sequence of samples. The accumulation of the standard deviation is done in a computationally stable way using a generalization of the Youngs and Cramer formulas, a variant of the more popular Welford's algorithm.

A zero value of StandardError is ready for use, and represents 0 samples.
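
A small usage sketch:

var se stats.StandardError // the zero value is ready for use
for _, x := range []float64{1.0, 2.0, 3.0, 4.0} {
	se.Add(x)
}
se.AddZeros(2)                             // two additional zero-valued samples
fmt.Println(se.N(), se.Mean(), se.Sigma()) // 6 samples, mean ~= 1.667

// Partial accumulators can be merged, e.g. from parallel workers.
var se2 stats.StandardError
se2.Add(5.0)
se.Merge(se2) // se now accounts for all 7 samples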

func (*StandardError) Add added in v0.1.7

func (e *StandardError) Add(x float64)

Add a single sample.

func (*StandardError) AddZeros added in v0.1.7

func (e *StandardError) AddZeros(n uint)

AddZeros adds n zero-valued samples.

func (StandardError) Mean added in v0.1.7

func (e StandardError) Mean() float64

Mean value of all samples.

func (*StandardError) Merge added in v0.1.7

func (e *StandardError) Merge(other StandardError)

Merge the other StandardError into e, so the resulting error estimate is for the union of samples.

func (StandardError) N added in v0.1.7

func (e StandardError) N() uint

N returns the number of accumulated samples.

func (StandardError) Sigma added in v0.1.7

func (e StandardError) Sigma() float64

Sigma is the standard deviation of the accumulated samples.

func (StandardError) Variance added in v0.1.7

func (e StandardError) Variance() float64

Variance of the accumulated samples.

type StudentsT

type StudentsT struct {
	distuv.StudentsT
}

StudentsT distribution.

func NewStudentsTDistribution

func NewStudentsTDistribution(alpha, mean, MAD float64) *StudentsT

NewStudentsTDistribution creates an instance of a Student's T distribution scaled and shifted to have a given mean and MAD (mean absolute deviation).

func (*StudentsT) Copy

func (d *StudentsT) Copy() Distribution

func (*StudentsT) MAD

func (d *StudentsT) MAD() float64

func (*StudentsT) Mean

func (d *StudentsT) Mean() float64

func (*StudentsT) Seed

func (d *StudentsT) Seed(seed uint64)

type Timeseries

type Timeseries struct {
	// contains filtered or unexported fields
}

Timeseries stores numeric values along with timestamps. The timestamps are always sorted in ascending order.

func NewTimeseries

func NewTimeseries(dates []db.Date, data []float64) *Timeseries

NewTimeseries creates a new Timeseries. The dates are expected to be sorted in ascending order (not checked). It panics if dates and data have different lengths. Note that the argument slices are used as is, not copied. Use Copy() if the arguments need to be modified after the call.

func NewTimeseriesFromPrices added in v0.3.0

func NewTimeseriesFromPrices(prices []db.PriceRow, f PriceField) *Timeseries

NewTimeseriesFromPrices initializes Timeseries from PriceRow slice.

func TimeseriesIntersect added in v0.2.7

func TimeseriesIntersect(tss ...*Timeseries) []*Timeseries

TimeseriesIntersect creates a new list of Timeseries whose Dates are identical, dropping the mismatching Dates and Data elements. The resulting slice is guaranteed to be of the same length as the number of arguments and to contain valid Timeseries, even if they are empty.

func (*Timeseries) Add added in v0.2.9

func (t *Timeseries) Add(t2 *Timeseries) *Timeseries

Add two Timeseries pointwise.

func (*Timeseries) AddC added in v0.2.9

func (t *Timeseries) AddC(c float64) *Timeseries

AddC adds a constant to Timeseries data, pointwise.

func (*Timeseries) BinaryOp added in v0.2.9

func (t *Timeseries) BinaryOp(f func(x, y float64) float64, t2 *Timeseries) *Timeseries

BinaryOp applies f to the two Timeseries element-wise. It panics if the lengths or dates (pointwise) differ.

func (*Timeseries) Check

func (t *Timeseries) Check() error

Check that the Timeseries is consistent: dates and data have the same length, and the dates are sorted in ascending order.

func (*Timeseries) Copy

func (t *Timeseries) Copy() *Timeseries

Copy makes a deep copy of the Timeseries.

func (*Timeseries) Data

func (t *Timeseries) Data() []float64

Data of the Timeseries.

func (*Timeseries) Dates

func (t *Timeseries) Dates() []db.Date

Dates of the Timeseries.

func (*Timeseries) Div added in v0.2.9

func (t *Timeseries) Div(t2 *Timeseries) *Timeseries

Div divides Timeseries by another, pointwise.

func (*Timeseries) DivC added in v0.2.9

func (t *Timeseries) DivC(c float64) *Timeseries

DivC divides Timeseries by a constant, pointwise.

func (*Timeseries) Exp added in v0.2.9

func (t *Timeseries) Exp() *Timeseries

Exp of the Timeseries data, pointwise.

func (*Timeseries) Filter added in v0.3.6

func (t *Timeseries) Filter(f func(int) bool) *Timeseries

Filter elements of the Timeseries to only those that satisfy f, by index.

func (*Timeseries) Log added in v0.2.9

func (t *Timeseries) Log() *Timeseries

Log of the Timeseries data, pointwise.

func (*Timeseries) LogProfits added in v0.0.5

func (t *Timeseries) LogProfits(n int, intraday bool) *Timeseries

LogProfits computes a new Timeseries of log-profits {log(x[t+n]) - log(x[t])}. The associated log-profit date is t+n. When intraday is true, log-profits spanning more than one day are skipped.

func (*Timeseries) Mult added in v0.2.9

func (t *Timeseries) Mult(t2 *Timeseries) *Timeseries

Mult multiplies two Timeseries pointwise.

func (*Timeseries) MultC added in v0.2.9

func (t *Timeseries) MultC(c float64) *Timeseries

MultC multiplies Timeseries data by a constant, pointwise.

func (*Timeseries) Range

func (t *Timeseries) Range(start, end db.Date) *Timeseries

Range extracts the sub-series from the inclusive time interval. It may return an empty Timeseries, but never nil.

func (*Timeseries) Shift

func (t *Timeseries) Shift(shift int) *Timeseries

Shift the timeseries in time. A positive shift moves the values into the future, a negative one into the past. Values outside of the date range are dropped. It may return an empty Timeseries, but never nil.

func (*Timeseries) Sub added in v0.2.9

func (t *Timeseries) Sub(t2 *Timeseries) *Timeseries

Sub subtracts another Timeseries from self, pointwise.

func (*Timeseries) SubC added in v0.2.9

func (t *Timeseries) SubC(c float64) *Timeseries

SubC subtracts a constant from Timeseries, pointwise.

func (*Timeseries) UnaryOp added in v0.2.9

func (t *Timeseries) UnaryOp(f func(float64) float64) *Timeseries

UnaryOp applies f pointwise to the Timeseries data.

type Transform added in v0.1.4

type Transform[State any] struct {
	InitState func() State
	Fn        func(d Distribution, state State) (float64, State)
}

Transform is a stateful random variable transformer used by RandDistribution to generate its random values. The initial state generator and the transform function must be goroutine-safe.

The random values Y_i are generated as Y_i, S_i = Fn(d, S_(i-1)), where S_0=InitState(). It is assumed that, asymptotically, generating multiple short sequences is statistically equivalent to generating a single long sequence. If this property doesn't hold, the Y values likely cannot be directly modeled by a random variable.

As an example, a sliding window compounding (the sum of last N d.Rand() values, or the log-profit over N steps) satisfies this property, but the unbounded sum (such as log-price) does not.
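
A sketch of such a sliding-window compounding Transform, keeping the window of the last n source samples as the state; slidingSumTransform is a hypothetical helper shown for illustration (FastCompoundRandDistribution implements this idea internally, in parallel):

// slidingSumTransform returns a Transform whose every sample is the sum of the
// last n source samples.
func slidingSumTransform(n int) *stats.Transform[[]float64] {
	return &stats.Transform[[]float64]{
		InitState: func() []float64 { return nil }, // the window is filled on the first call
		Fn: func(src stats.Distribution, window []float64) (float64, []float64) {
			for len(window) < n { // fill the initial window
				window = append(window, src.Rand())
			}
			window = append(window[1:], src.Rand()) // slide the window by one sample
			sum := 0.0
			for _, x := range window {
				sum += x
			}
			return sum, window
		},
	}
}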
