bayesian

package

Documentation

Overview

Package bayesian implements Bayesian analysis for detecting change points.

Index

Constants

const (
	// ConfidenceIntervalTail is the default tail value for confidence analysis.
	// A tail of 0.005 on each side gives a 99% confidence interval (1 - 2*0.005 = 0.99).
	ConfidenceIntervalTail = 0.005
)

Variables

This section is empty.

Functions

func AddLogLikelihoods

func AddLogLikelihoods(x []float64) float64
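
AddLogLikelihoods has no doc comment above. Its name suggests it combines likelihoods that are stored in log space, i.e. it computes log(exp(x[0]) + ... + exp(x[len(x)-1])) in a numerically stable way. Whether the package uses exactly this log-sum-exp formulation is an assumption; the sketch below (using the standard library "math" package) only illustrates that reading.

// logSumExp is an illustrative, numerically stable log-sum-exp; it is an
// assumed reading of AddLogLikelihoods, not its actual source.
func logSumExp(x []float64) float64 {
	if len(x) == 0 {
		return math.Inf(-1)
	}
	// Subtract the maximum before exponentiating to avoid underflow.
	m := x[0]
	for _, v := range x[1:] {
		if v > m {
			m = v
		}
	}
	sum := 0.0
	for _, v := range x {
		sum += math.Exp(v - m)
	}
	return m + math.Log(sum)
}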

func Lgamma

func Lgamma(x float64) float64
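
Lgamma is also undocumented above. Given its signature, it presumably returns the natural logarithm of the Gamma function. The standard library's math.Lgamma returns both the value and a sign, so a thin wrapper along the following lines would match this signature; this is an assumption, not confirmed by the source.

// lgamma is an illustrative wrapper over math.Lgamma that discards the sign;
// it sketches an assumed reading of the exported Lgamma.
func lgamma(x float64) float64 {
	v, _ := math.Lgamma(x)
	return v
}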

Types

type BetaDistribution

type BetaDistribution struct {
	// Alpha parameter of Beta distribution.
	// Must be greater than 0.
	Alpha float64
	// Beta parameter of the Beta distribution.
	// Must be greater than 0.
	Beta float64
}

BetaDistribution specifies the parameters of a Beta distribution. See https://en.wikipedia.org/wiki/Beta_distribution.

In this package, we use the Beta distribution as the prior for a test's propensity to produce unexpected results, because it simplifies the maths.

Alpha = 1, Beta = 1 indicates an uninformative (uniform) prior.
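
For illustration, two priors expressed with this type (a sketch, assuming the package is imported as bayesian): the uniform prior from the note above, and sub-1 shape parameters of the kind suggested in the ChangepointPredictor field documentation below.

// Uninformative (uniform) prior.
uniform := bayesian.BetaDistribution{Alpha: 1, Beta: 1}

// A prior biased towards consistently-passing or consistently-failing tests
// (shape parameters below 1), e.g. as suggested for HasUnexpectedPrior.
hasUnexpected := bayesian.BetaDistribution{Alpha: 0.3, Beta: 0.5}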

type ChangepointPredictor

type ChangepointPredictor struct {
	// Threshold for creating new change points.
	ChangepointLikelihood float64

	// The prior for the rate at which a test's runs have any
	// unexpected test result.
	// This is the prior for estimating the ratio
	// HasUnexpected / Runs of a segment.
	//
	// Generally tests tend to be either consistently passing or
	// consistently failing, with a bias towards consistently
	// passing, so shape parameters Alpha < 1, Beta < 1, Alpha < Beta
	// are typically selected (e.g. alpha = 0.3, beta = 0.5).
	HasUnexpectedPrior BetaDistribution

	// The prior for the rate at which a test's runs have
	// only unexpected results, given they have at least
	// two results and one is unexpected.
	//
	// This is the prior for estimating UnexpectedAfterRetry / Retried.
	// Generally the result of retrying a fail inside a test run
	// either leads to a pass (fairly consistently) or another failure
	// (fairly consistently). Consequently, shape parameters Alpha < 1,
	// Beta < 1 are advised (e.g. alpha = 0.5, beta = 0.5).
	UnexpectedAfterRetryPrior BetaDistribution
}
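
A sketch of constructing a predictor with the example shape parameters from the field comments above. The ChangepointLikelihood value here is a hypothetical placeholder, not a recommended setting.

predictor := bayesian.ChangepointPredictor{
	ChangepointLikelihood: 0.01, // hypothetical threshold; tune for your data
	HasUnexpectedPrior: bayesian.BetaDistribution{
		Alpha: 0.3,
		Beta:  0.5,
	},
	UnexpectedAfterRetryPrior: bayesian.BetaDistribution{
		Alpha: 0.5,
		Beta:  0.5,
	},
}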

func (ChangepointPredictor) ChangePoints

ChangePoints runs change point detection and confidence interval analysis for history. history must be sorted by commit position in ascending order (oldest commit first).

func (ChangepointPredictor) FindBestChangepoint

func (a ChangepointPredictor) FindBestChangepoint(history []inputbuffer.PositionVerdict) (relativeLikelihood float64, position int)

FindBestChangepoint finds the change point position that maximises the likelihood of observing the given test history.

It returns the position of the change point in the history slice, as well as the change in log-likelihood attributable to the change point, relative to the `no change point` case.

The semantics of the returned position are as follows: a position p means the history is segmented as history[:p] and history[p:]. If the returned position is 0, it means no change point position was better than the `no change point` case.

This method requires the provided history to be sorted by commit position (either ascending or descending is fine). It allows multiple verdicts to be specified per commit position, by including those verdicts as adjacent elements in the history slice.

Note that if multiple verdicts are specified per commit position, the returned position will only ever be between two commit positions in the history, i.e. it holds that history[position-1].CommitPosition != history[position].CommitPosition (or position == 0).

This method assumes a uniform prior over all change point positions, including the `no change point` case. To bias towards the `no change point` case, apply a threshold to relativeLikelihood before treating the change point as real.
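
A hypothetical usage sketch of the semantics described above. It assumes history is a []inputbuffer.PositionVerdict already sorted by commit position, a configured ChangepointPredictor (e.g. the predictor sketched earlier), and an arbitrary threshold on relativeLikelihood; the appropriate threshold value is left to the caller.

const relativeLikelihoodThreshold = 10.0 // hypothetical; biases towards `no change point`

relLL, pos := predictor.FindBestChangepoint(history)
if pos != 0 && relLL > relativeLikelihoodThreshold {
	// Per the documented semantics, the change point splits the history into
	// history[:pos] (before the change point) and history[pos:] (after it).
	before, after := history[:pos], history[pos:]
	_, _ = before, after
}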

type SequenceLikelihood

type SequenceLikelihood struct {
	// contains filtered or unexported fields
}

SequenceLikelihood is used to compute the likelihood of observing a particular sequence of pass/fail results, assuming the failure rate p of the test remains constant over the sequence, and p is unknown but has the known prior Beta(alpha,beta) for some alpha and beta where alpha > 0 and beta > 0.

See go/bayesian-changepoint-estimation for details.

func NewSequenceLikelihood

func NewSequenceLikelihood(prior BetaDistribution) SequenceLikelihood

func (SequenceLikelihood) LogLikelihood

func (s SequenceLikelihood) LogLikelihood(x, n int) float64

LogLikelihood returns the log-likelihood of observing a particular sequence of pass/fail results (Y_0, Y_1 ... Y_{n-1}) of a known length, i.e. it returns log(P(Y)).

See LogLikelihoodPartial for more details.
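
For example (a sketch, assuming the package is imported as bayesian; the prior parameters shown are illustrative, not recommended values):

sl := bayesian.NewSequenceLikelihood(bayesian.BetaDistribution{Alpha: 0.3, Beta: 0.5})

// Log-probability of one particular sequence of 10 results containing
// exactly 2 unexpected (failing) results.
ll := sl.LogLikelihood(2, 10)
_ = ll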

func (SequenceLikelihood) LogLikelihoodPartial

func (s SequenceLikelihood) LogLikelihoodPartial(x, n int, maxProportion float64) float64

LogLikelihoodPartial returns the log-likelihood of:

  • observing a particular binary sequence Y_0, Y_1 ... Y_{n-1} of known length, AND
  • jointly, the underlying proportion p (= P(Y_i = 1)) that generated that sequence being less than or equal to maxProportion.

i.e. the method returns log(P(Y AND p <= maxProportion)).

The following assumptions are made:

  • Y ~ BernoulliProcess(p, n), i.e. Y is n independent samples of a Bernoulli process with probability p of observing a 1.
  • p ~ Beta(a, b), i.e. p is unknown but taken from a Beta distribution [1] with parameters a and b. If a = b = 1 is provided, the Beta distribution collapses to the uniform (uninformative) prior. In practice, where a test's failure rate is being modelled, a < 1 and b < 1 make more sense, as tests tend towards being either consistently failing or consistently passing.

maxProportion must be between 0.0 and 1.0 (inclusive); if it is 1.0, the method returns log(P(Y)).

While the method returns the probability of observing a precise sequence Y, the only statistics needed to evaluate this probability are x (the number of elements that are true) and n (the total), so these are the only values accepted as arguments. (As such, inputs are also constrained to 0 <= x <= n.)

Log-likelihoods are returned, as for larger values of n (e.g. n > 100) the probabilities for any particular sequence can get very small. Working in log-probabilities avoids floating point precision issues.

[1] https://en.wikipedia.org/wiki/Beta_distribution.
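
For reference, the standard closed form implied by these assumptions (the marginal likelihood of one specific sequence with x ones out of n under a Beta(a, b) prior) is P(Y) = B(x+a, n-x+b) / B(a, b), where B is the Beta function. The sketch below evaluates this in log space using the standard library "math" package; it illustrates the maths only and is not claimed to be the package's implementation.

// logSequenceLikelihood returns log(P(Y)) for a specific sequence with x ones
// out of n, marginalising over p ~ Beta(a, b). Illustrative only.
func logSequenceLikelihood(x, n int, a, b float64) float64 {
	// lbeta computes log(B(p, q)) = lgamma(p) + lgamma(q) - lgamma(p+q).
	lbeta := func(p, q float64) float64 {
		lp, _ := math.Lgamma(p)
		lq, _ := math.Lgamma(q)
		lpq, _ := math.Lgamma(p + q)
		return lp + lq - lpq
	}
	fx, fn := float64(x), float64(n)
	return lbeta(fx+a, fn-fx+b) - lbeta(a, b)
}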
