dynsampler

package module
v0.6.0
Published: Jan 12, 2024 License: Apache-2.0 Imports: 7 Imported by: 20

README

dynsampler-go

Dynsampler is a Go library for doing dynamic sampling of traffic before sending it on to Honeycomb (or another analytics system). It contains several sampling algorithms to help you select a representative set of events instead of a full stream.

A "sample rate" of 100 means that for every 100 requests, we capture a single event and indicate that it represents 100 similar requests.

For full API documentation, see the Documentation section below.

For more information about using Honeycomb, see our docs.

Sampling Techniques

This package is intended to help sample a stream of tracking events, where events are typically created in response to a stream of traffic (for the purposes of logging or debugging). In general, sampling is used to reduce the total volume of events necessary to represent the stream of traffic in a meaningful way.

There are a variety of available techniques for reducing a high-volume stream of incoming events to a lower-volume, more manageable stream of events. Depending on the shape of your traffic, one may serve better than another, or you may need to write a new one! Please consider contributing it back to this package if you do.

  • If your system has a completely homogeneous stream of requests: use Static sampling to use a constant sample rate.
  • If your system has a steady stream of requests and a well-known low cardinality partition key (e.g. http status): use Static sampling and override sample rates on a per-key basis (e.g. if you want to sample HTTP 200/OK events at a different rate from HTTP 503/Server Error).
  • If your logging system has a strict cap on the rate it can receive events, use TotalThroughput, which will calculate sample rates based on keeping the entire system's representative event throughput right around (or under) a particular cap.
  • If you need a throughput sampler that is responsive to spikes, but also averages sample rates over a longer period of time, use WindowedThroughput.
  • If your system has a rough cap on the rate it can receive events and your partitioned keyspace is fairly steady, use PerKeyThroughput, which will calculate sample rates based on keeping the event throughput roughly constant per key/partition (e.g. per user id).
  • The best choice for a system with a large key space and a large disparity between the highest volume and lowest volume keys is AvgSampleWithMin - it will increase the sample rate of higher volume traffic proportionally to the logarithm of the specific key's volume. If total traffic falls below a configured minimum, it stops sampling entirely, since traffic that low doesn't warrant it.
  • EMASampleRate works like AvgSampleRate, but calculates sample rates based on a moving average (Exponential Moving Average) of many measurement intervals rather than a single isolated interval. In addition, it can detect large bursts in traffic and will trigger a recalculation of sample rates before the regular interval.
  • If you want the benefit of a key-based sampler that also has limits on throughput, use EMAThroughput. It will adjust sample rates across a key space to achieve a given throughput while still ensuring that all keys are represented.

Documentation

Overview

Package dynsampler contains several sampling algorithms to help you select a representative set of events instead of a full stream.

This package is intended to help sample a stream of tracking events, where events are typically created in response to a stream of traffic (for the purposes of logging or debugging). In general, sampling is used to reduce the total volume of events necessary to represent the stream of traffic in a meaningful way.

For the purposes of these examples, the "traffic" will be a set of HTTP requests being handled by a server, and "event" will be a blob of metadata about a given HTTP request that might be useful to keep track of later. A "sample rate" of 100 means that for every 100 requests, we capture a single event and indicate that it represents 100 similar requests.

Use

Use the `Sampler` interface in your code; each sampling algorithm in this package implements it.
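
A minimal usage sketch (the goal rate and key string are illustrative choices; error handling is abbreviated):

package main

import (
	"fmt"

	dynsampler "github.com/honeycombio/dynsampler-go"
)

func main() {
	sampler := &dynsampler.AvgSampleRate{GoalSampleRate: 10}
	if err := sampler.Start(); err != nil {
		panic(err)
	}
	defer sampler.Stop()

	// Partition traffic with whatever key suits your system,
	// e.g. method plus status code for HTTP traffic.
	rate := sampler.GetSampleRate("GET_200")
	fmt.Println("sample rate for GET_200:", rate)
}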

The following guidelines can help you choose a sampler. Depending on the shape of your traffic, one may serve better than another, or you may need to write a new one! Please consider contributing it back to this package if you do.

* If your system has a completely homogeneous stream of requests: use `Static` to use a constant sample rate.

* If your system has a steady stream of requests and a well-known low cardinality partition key (e.g. http status): use `Static` and override sample rates on a per-key basis (e.g. if you want to sample `HTTP 200/OK` events at a different rate from `HTTP 503/Server Error`).

* If your logging system has a strict cap on the rate it can receive events, use `TotalThroughput`, which will calculate sample rates based on keeping *the entire system's* representative event throughput right around (or under) a particular cap.

* If your system has a rough cap on the rate it can receive events and your partitioned keyspace is fairly steady, use `PerKeyThroughput`, which will calculate sample rates based on keeping the event throughput roughly constant *per key/partition* (e.g. per user id).

* The best choice for a system with a large key space and a large disparity between the highest volume and lowest volume keys is `AvgSampleWithMin` - it will increase the sample rate of higher volume traffic proportionally to the logarithm of the specific key's volume. If total traffic falls below a configured minimum, it stops sampling entirely, since traffic that low doesn't warrant it.

* `EMASampleRate` works like `AvgSampleRate`, but calculates sample rates based on a moving average (Exponential Moving Average) of many measurement intervals rather than a single isolated interval. In addition, it can detect large bursts in traffic and will trigger a recalculation of sample rates before the regular interval.

Each sampler implementation below has additional configuration parameters and a detailed description of how it chooses a sample rate.

Some implementations implement `SaveState` and `LoadState` - enabling you to serialize the Sampler's internal state and load it back. This is useful, for example, if you want to avoid losing calculated sample rates between process restarts.
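
For example, a process might persist state on shutdown and restore it before the next Start. A sketch assuming the dynsampler and os imports, with an illustrative file path:

// saveSamplerState persists the sampler's internal state, e.g. during
// graceful shutdown. Samplers that don't implement SaveState return an
// error, which we skip past here.
func saveSamplerState(s dynsampler.Sampler, path string) {
	if state, err := s.SaveState(); err == nil {
		_ = os.WriteFile(path, state, 0o600)
	}
}

// restoreAndStart loads any previously saved state, then starts the
// sampler. LoadState should be called before Start.
func restoreAndStart(s dynsampler.Sampler, path string) error {
	if state, err := os.ReadFile(path); err == nil {
		if err := s.LoadState(state); err != nil {
			return err
		}
	}
	return s.Start()
}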

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type AvgSampleRate

type AvgSampleRate struct {
	// DEPRECATED -- use ClearFrequencyDuration.
	// ClearFrequencySec is how often the counters reset in seconds.
	ClearFrequencySec int

	// ClearFrequencyDuration is how often the counters reset as a Duration.
	// Note that either this or ClearFrequencySec can be specified, but not both.
	// If neither one is set, the default is 30s.
	ClearFrequencyDuration time.Duration

	// GoalSampleRate is the average sample rate we're aiming for, across all
	// events. Default 10
	GoalSampleRate int

	// MaxKeys, if greater than 0, limits the number of distinct keys used to build
	// the sample rate map within the interval defined by `ClearFrequencyDuration`. Once
	// MaxKeys is reached, new keys will not be included in the sample rate map, but
	// existing keys will continue to be counted.
	MaxKeys int
	// contains filtered or unexported fields
}

AvgSampleRate implements Sampler and attempts to average a given sample rate, weighting rare traffic and frequent traffic differently so as to end up with the correct average. This method breaks down when total traffic is low, because low-volume traffic will be excessively sampled.

Keys that occur only once within ClearFrequencyDuration will always have a sample rate of 1. Keys that occur more frequently will be sampled on a logarithmic curve. In other words, every key will be represented at least once per ClearFrequencyDuration and more frequent keys will have their sample rate increased proportionally to wind up with the goal sample rate.
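
The shape of that curve can be pictured with a simplified single-interval sketch. This is an illustration of the logarithmic weighting described above, not the library's exact internals (assumes "math" is imported):

func roughRates(counts map[string]int, goalRate int) map[string]int {
	total, logSum := 0, 0.0
	for _, c := range counts {
		total += c
		logSum += math.Log10(float64(c))
	}
	// Total number of events we would like to keep this interval.
	goalCount := float64(total) / float64(goalRate)
	rates := make(map[string]int)
	for k, c := range counts {
		if c <= 1 || logSum == 0 {
			rates[k] = 1 // keys seen once are always kept
			continue
		}
		// Each key's share of kept events is proportional to log10(count),
		// so higher-volume keys are sampled progressively more aggressively.
		share := goalCount * math.Log10(float64(c)) / logSum
		rates[k] = int(math.Max(1, float64(c)/share))
	}
	return rates
}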

func (*AvgSampleRate) GetMetrics added in v0.5.0

func (a *AvgSampleRate) GetMetrics(prefix string) map[string]int64

func (*AvgSampleRate) GetSampleRate

func (a *AvgSampleRate) GetSampleRate(key string) int

GetSampleRate takes a key and returns the appropriate sample rate for that key.

func (*AvgSampleRate) GetSampleRateMulti added in v0.4.0

func (a *AvgSampleRate) GetSampleRateMulti(key string, count int) int

GetSampleRateMulti takes a key representing count spans and returns the appropriate sample rate for that key.

func (*AvgSampleRate) LoadState added in v0.2.0

func (a *AvgSampleRate) LoadState(state []byte) error

LoadState accepts a byte array with a JSON representation of a previous instance's state

func (*AvgSampleRate) SaveState added in v0.2.0

func (a *AvgSampleRate) SaveState() ([]byte, error)

SaveState returns a byte array with a JSON representation of the sampler state

func (*AvgSampleRate) Start

func (a *AvgSampleRate) Start() error

func (*AvgSampleRate) Stop added in v0.4.0

func (a *AvgSampleRate) Stop() error

type AvgSampleWithMin

type AvgSampleWithMin struct {
	// DEPRECATED -- use ClearFrequencyDuration.
	// ClearFrequencySec is how often the counters reset in seconds.
	ClearFrequencySec int

	// ClearFrequencyDuration is how often the counters reset as a Duration.
	// Note that either this or ClearFrequencySec can be specified, but not both.
	// If neither one is set, the default is 30s.
	ClearFrequencyDuration time.Duration

	// GoalSampleRate is the average sample rate we're aiming for, across all
	// events. Default 10
	GoalSampleRate int

	// MaxKeys, if greater than 0, limits the number of distinct keys used to build
	// the sample rate map within the interval defined by `ClearFrequencyDuration`. Once
	// MaxKeys is reached, new keys will not be included in the sample rate map, but
	// existing keys will continue to be counted.
	MaxKeys int

	// MinEventsPerSec - when the total number of events drops below this
	// threshold, sampling will cease. default 50
	MinEventsPerSec int
	// contains filtered or unexported fields
}

AvgSampleWithMin implements Sampler and attempts to average a given sample rate, with a minimum number of events per second (i.e. it will reduce sampling if it would end up sending fewer than the minimum number of events). This method attempts to get the best of the normal average sample rate method, without the failings it shows on the low end of total traffic throughput.

Keys that occur only once within ClearFrequencyDuration will always have a sample rate of 1. Keys that occur more frequently will be sampled on a logarithmic curve. In other words, every key will be represented at least once per ClearFrequencyDuration and more frequent keys will have their sample rate increased proportionally to wind up with the goal sample rate.

func (*AvgSampleWithMin) GetMetrics added in v0.5.0

func (a *AvgSampleWithMin) GetMetrics(prefix string) map[string]int64

func (*AvgSampleWithMin) GetSampleRate

func (a *AvgSampleWithMin) GetSampleRate(key string) int

GetSampleRate takes a key and returns the appropriate sample rate for that key.

func (*AvgSampleWithMin) GetSampleRateMulti added in v0.4.0

func (a *AvgSampleWithMin) GetSampleRateMulti(key string, count int) int

GetSampleRateMulti takes a key representing count spans and returns the appropriate sample rate for that key.

func (*AvgSampleWithMin) LoadState added in v0.2.0

func (a *AvgSampleWithMin) LoadState(state []byte) error

LoadState is not implemented

func (*AvgSampleWithMin) SaveState added in v0.2.0

func (a *AvgSampleWithMin) SaveState() ([]byte, error)

SaveState is not implemented

func (*AvgSampleWithMin) Start

func (a *AvgSampleWithMin) Start() error

func (*AvgSampleWithMin) Stop added in v0.4.0

func (a *AvgSampleWithMin) Stop() error

type Block added in v0.4.0

type Block struct {
	// contains filtered or unexported fields
}

type BlockList added in v0.4.0

type BlockList interface {
	IncrementKey(key string, keyIndex int64, count int) error
	AggregateCounts(currentIndex int64, lookbackIndex int64) map[string]int
}

BlockList is a data structure that keeps track of how often keys occur in a given time range in order to perform windowed lookback sampling. BlockList operates with monotonically increasing indexes, instead of timestamps. A BlockList is a single linked list of Blocks. Each Block has a frequency hashmap and a unique index.
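
A sketch of driving a BlockList by hand, using the package's UnixSecondsIndexGenerator; the samplers do this internally on a timer, and the key string is illustrative:

func blockListSketch() map[string]int {
	gen := &dynsampler.UnixSecondsIndexGenerator{DurationPerIndex: time.Second}
	bl := dynsampler.NewUnboundedBlockList()

	// Record one event for a key at the index for "now".
	_ = bl.IncrementKey("GET_200", gen.GetCurrentIndex(), 1)

	// Aggregate every count recorded over the last 30 seconds' worth of
	// index ticks (the second argument is the differential returned by
	// DurationToIndexes).
	lookback := gen.DurationToIndexes(30 * time.Second)
	return bl.AggregateCounts(gen.GetCurrentIndex(), lookback)
}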

func NewBoundedBlockList added in v0.4.0

func NewBoundedBlockList(maxKeys int) BlockList

Creates a new BlockList with a sentinel node.

func NewUnboundedBlockList added in v0.4.0

func NewUnboundedBlockList() BlockList

Creates a new BlockList with a sentinel node.

type BoundedBlockList added in v0.4.0

type BoundedBlockList struct {
	// contains filtered or unexported fields
}

BoundedBlockList has a limit on the maximum number of keys within the blocklist. Additional keys will be dropped by IncrementKey. This is implemented with another data structure on top of an UnboundedBlockList that keeps track of total keys, using a map from each key to the indexes in which it appears.
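
When the bound is hit, IncrementKey rejects new keys with a MaxSizeError; one possible response is to fold overflow into a catch-all key (an illustrative choice, not behavior the package provides):

func incrementBounded(bl dynsampler.BlockList, key string, idx int64) {
	if err := bl.IncrementKey(key, idx, 1); err != nil {
		// The key was new and the list already holds maxKeys entries;
		// existing keys continue to increment without error.
		_ = bl.IncrementKey("overflow", idx, 1)
	}
}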

func (*BoundedBlockList) AggregateCounts added in v0.4.0

func (b *BoundedBlockList) AggregateCounts(
	currentIndex int64,
	lookbackIndex int64,
) (aggregateCounts map[string]int)

func (*BoundedBlockList) IncrementKey added in v0.4.0

func (b *BoundedBlockList) IncrementKey(key string, keyIndex int64, count int) error

IncrementKey will always increment an existing key. If the key is new, it will be rejected if there are maxKeys existing entries.

type EMASampleRate added in v0.2.0

type EMASampleRate struct {
	// DEPRECATED -- use AdjustmentIntervalDuration
	// AdjustmentInterval defines how often (in seconds) we adjust the moving average from
	// recent observations.
	AdjustmentInterval int

	// AdjustmentIntervalDuration is how often we adjust the moving average from
	// recent observations.
	// Note that either this or AdjustmentInterval can be specified, but not both.
	// If neither one is set, the default is 15s.
	AdjustmentIntervalDuration time.Duration

	// Weight is a value between (0, 1) indicating the weighting factor used to adjust
	// the EMA. With larger values, newer data will influence the average more, and older
	// values will be factored out more quickly.  In mathematical literature concerning EMA,
	// this is referred to as the `alpha` constant.
	// Default is 0.5
	Weight float64

	// GoalSampleRate is the average sample rate we're aiming for, across all
	// events. Default 10
	GoalSampleRate int

	// MaxKeys, if greater than 0, limits the number of distinct keys tracked in EMA.
	// Once MaxKeys is reached, new keys will not be included in the sample rate map, but
	// existing keys will continue to be counted.
	MaxKeys int

	// AgeOutValue indicates the threshold for removing keys from the EMA. The EMA of any key will approach 0
	// if it is not repeatedly observed, but will never truly reach it, so we have to decide what constitutes "zero".
	// Keys with averages below this threshold will be removed from the EMA. Default is the same as Weight, as this prevents
	// a key with the smallest integer value (1) from being aged out immediately. This value should generally be <= Weight,
	// unless you have very specific reasons to set it higher.
	AgeOutValue float64

	// BurstMultiple, if set, is multiplied by the sum of the running average of counts to define
	// the burst detection threshold. If total counts observed for a given interval exceed the threshold
	// EMA is updated immediately, rather than waiting on the AdjustmentIntervalDuration.
	// Defaults to 2; negative value disables. With a default of 2, if your traffic suddenly doubles,
	// burst detection will kick in.
	BurstMultiple float64

	// BurstDetectionDelay indicates the number of intervals to run after Start is called before burst detection kicks in.
	// Defaults to 3
	BurstDetectionDelay uint
	// contains filtered or unexported fields
}

EMASampleRate implements Sampler and attempts to average a given sample rate, weighting rare traffic and frequent traffic differently so as to end up with the correct average. This method breaks down when total traffic is low, because low-volume traffic will be excessively sampled.

Based on the AvgSampleRate implementation, EMASampleRate differs in that rather than computing rates based on a periodic sample of traffic, it maintains an Exponential Moving Average of counts seen per key, and adjusts this average at regular intervals. The weight applied to more recent intervals is defined by `weight`, a number between (0, 1) - larger values weight the average more toward recent observations. In other words, a larger weight will cause sample rates to adapt more quickly to traffic patterns, while a smaller weight will result in sample rates that are less sensitive to bursts or drops in traffic and thus more consistent over time.

Keys that are not found in the EMA will always have a sample rate of 1. Keys that occur more frequently will be sampled on a logarithmic curve. In other words, every key will be represented at least once in any given window and more frequent keys will have their sample rate increased proportionally to wind up with the goal sample rate.
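
The average itself follows the standard EMA recurrence. A sketch of the shape, not the library's exact code, with `weight` playing the role of the alpha constant described above:

// emaUpdate moves the average toward the newest observation by `weight`.
// With weight 0.5, a key just observed with count 1 averages to at least
// 0.5, which is why AgeOutValue defaults to Weight: it keeps such keys
// from being aged out immediately.
func emaUpdate(oldAvg, observed, weight float64) float64 {
	return weight*observed + (1-weight)*oldAvg
}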

func (*EMASampleRate) GetMetrics added in v0.5.0

func (e *EMASampleRate) GetMetrics(prefix string) map[string]int64

func (*EMASampleRate) GetSampleRate added in v0.2.0

func (e *EMASampleRate) GetSampleRate(key string) int

GetSampleRate takes a key and returns the appropriate sample rate for that key.

func (*EMASampleRate) GetSampleRateMulti added in v0.4.0

func (e *EMASampleRate) GetSampleRateMulti(key string, count int) int

GetSampleRateMulti takes a key representing count spans and returns the appropriate sample rate for that key.

func (*EMASampleRate) LoadState added in v0.2.0

func (e *EMASampleRate) LoadState(state []byte) error

LoadState accepts a byte array with a JSON representation of a previous instance's state

func (*EMASampleRate) SaveState added in v0.2.0

func (e *EMASampleRate) SaveState() ([]byte, error)

SaveState returns a byte array with a JSON representation of the sampler state

func (*EMASampleRate) Start added in v0.2.0

func (e *EMASampleRate) Start() error

func (*EMASampleRate) Stop added in v0.4.0

func (e *EMASampleRate) Stop() error

type EMAThroughput added in v0.4.0

type EMAThroughput struct {
	// AdjustmentInterval defines how often we adjust the moving average from
	// recent observations. Default 15s.
	AdjustmentInterval time.Duration

	// Weight is a value between (0, 1) indicating the weighting factor used to adjust
	// the EMA. With larger values, newer data will influence the average more, and older
	// values will be factored out more quickly.  In mathematical literature concerning EMA,
	// this is referred to as the `alpha` constant.
	// Default is 0.5
	Weight float64

	// InitialSampleRate is the sample rate to use during startup, before we
	// have accumulated enough data to calculate a reasonable desired sample
	// rate. This is mainly useful in situations where unsampled throughput is
	// high enough to cause problems.
	// Default 10.
	InitialSampleRate int

	// GoalThroughputPerSec is the target number of events to send per second.
	// Sample rates are generated to squash the total throughput down to match the
	// goal throughput. Actual throughput may exceed goal throughput. default 100
	GoalThroughputPerSec int

	// MaxKeys, if greater than 0, limits the number of distinct keys tracked in EMA.
	// Once MaxKeys is reached, new keys will not be included in the sample rate map, but
	// existing keys will continue to be counted.
	// Defaults to 0
	MaxKeys int

	// AgeOutValue indicates the threshold for removing keys from the EMA. The EMA of any key will approach 0
	// if it is not repeatedly observed, but will never truly reach it, so we have to decide what constitutes "zero".
	// Keys with averages below this threshold will be removed from the EMA. Default is the same as Weight, as this prevents
	// a key with the smallest integer value (1) from being aged out immediately. This value should generally be <= Weight,
	// unless you have very specific reasons to set it higher.
	AgeOutValue float64

	// BurstMultiple, if set, is multiplied by the sum of the running average of counts to define
	// the burst detection threshold. If total counts observed for a given interval exceed the threshold
	// EMA is updated immediately, rather than waiting on the AdjustmentInterval.
	// Defaults to 2; negative value disables. With a default of 2, if your traffic suddenly doubles,
	// burst detection will kick in.
	BurstMultiple float64

	// BurstDetectionDelay indicates the number of intervals to run after Start is called before burst detection kicks in.
	// Defaults to 3
	BurstDetectionDelay uint
	// contains filtered or unexported fields
}

EMAThroughput implements Sampler and attempts to achieve a given throughput rate, weighting rare traffic and frequent traffic differently so as to end up with the desired throughput.

Based on the EMASampleRate implementation, EMAThroughput differs in that instead of trying to achieve a given sample rate, it tries to reach a given throughput of events. During bursts of traffic, it will reduce sample rates so as to keep the number of events per second roughly constant.

Like the EMA sampler, it maintains an Exponential Moving Average of counts seen per key, and adjusts this average at regular intervals. The weight applied to more recent intervals is defined by `weight`, a number between (0, 1) - larger values weight the average more toward recent observations. In other words, a larger weight will cause sample rates to more quickly adapt to traffic patterns, while a smaller weight will result in sample rates that are less sensitive to bursts or drops in traffic and thus more consistent over time.

New keys that are not found in the EMA will always have a sample rate of 1. Keys that occur more frequently will be sampled on a logarithmic curve. In other words, every key will be represented at least once in any given window and more frequent keys will have their sample rate increased proportionally to wind up with the goal throughput.
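
A configuration sketch using the fields above; the values shown match the documented defaults, written out for clarity, and error handling is abbreviated:

func newEMAThroughput() (*dynsampler.EMAThroughput, error) {
	s := &dynsampler.EMAThroughput{
		AdjustmentInterval:   15 * time.Second, // recompute the moving average every 15s
		Weight:               0.5,              // how strongly new intervals move the average
		InitialSampleRate:    10,               // used until enough data accumulates
		GoalThroughputPerSec: 100,              // aim for ~100 events/sec overall
	}
	if err := s.Start(); err != nil {
		return nil, err
	}
	return s, nil
}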

func (*EMAThroughput) GetMetrics added in v0.5.0

func (e *EMAThroughput) GetMetrics(prefix string) map[string]int64

func (*EMAThroughput) GetSampleRate added in v0.4.0

func (e *EMAThroughput) GetSampleRate(key string) int

GetSampleRate takes a key and returns the appropriate sample rate for that key.

func (*EMAThroughput) GetSampleRateMulti added in v0.4.0

func (e *EMAThroughput) GetSampleRateMulti(key string, count int) int

GetSampleRateMulti takes a key representing count spans and returns the appropriate sample rate for that key.

func (*EMAThroughput) LoadState added in v0.4.0

func (e *EMAThroughput) LoadState(state []byte) error

LoadState accepts a byte array with a JSON representation of a previous instance's state

func (*EMAThroughput) SaveState added in v0.4.0

func (e *EMAThroughput) SaveState() ([]byte, error)

SaveState returns a byte array with a JSON representation of the sampler state

func (*EMAThroughput) Start added in v0.4.0

func (e *EMAThroughput) Start() error

func (*EMAThroughput) Stop added in v0.4.0

func (e *EMAThroughput) Stop() error

type IndexGenerator added in v0.4.0

type IndexGenerator interface {
	// Get the index corresponding to the current time.
	GetCurrentIndex() int64

	// Return the index differential for a particular duration -- i.e. 5 seconds = how many ticks of
	// the index.
	DurationToIndexes(duration time.Duration) int64
}

An index generator turns timestamps into indexes. This is essentially a bucketing mechanism.
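
Because the interface deals in indexes rather than wall-clock timestamps, tests can substitute a manually advanced clock. A sketch; this type is not part of the package:

type fakeIndexGenerator struct {
	index            int64 // advanced by hand in tests
	durationPerIndex time.Duration
}

var _ dynsampler.IndexGenerator = (*fakeIndexGenerator)(nil)

func (g *fakeIndexGenerator) GetCurrentIndex() int64 { return g.index }

func (g *fakeIndexGenerator) DurationToIndexes(d time.Duration) int64 {
	// e.g. 5 seconds with 1s per index = 5 ticks.
	return int64(d / g.durationPerIndex)
}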

type MaxSizeError added in v0.4.0

type MaxSizeError struct {
	// contains filtered or unexported fields
}

MaxSizeError is the error encountered when the BoundedBlockList has reached maxKeys capacity.

func (MaxSizeError) Error added in v0.4.0

func (e MaxSizeError) Error() string

type OnlyOnce

type OnlyOnce struct {
	// DEPRECATED -- use ClearFrequencyDuration.
	// ClearFrequencySec is how often the counters reset in seconds.
	ClearFrequencySec int

	// ClearFrequencyDuration is how often the counters reset as a Duration.
	// Note that either this or ClearFrequencySec can be specified, but not both.
	// If neither one is set, the default is 30s.
	ClearFrequencyDuration time.Duration
	// contains filtered or unexported fields
}

OnlyOnce implements Sampler and returns a sample rate of 1 the first time a key is seen and 1,000,000,000 every subsequent time. Essentially, this means that every key will be reported the first time it's seen during each ClearFrequencySec and never again. Set ClearFrequencySec to a negative number to report each key only once for the life of the process.

(Note that it's not guaranteed that each key will be reported exactly once, just that the first seen event will be reported and subsequent events are unlikely to be reported. It is probable that an additional event will be reported for every billion times the key appears.)

This emulates what you might expect from something catching stack traces - the first one is important but every subsequent one just repeats the same information.
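
For example, to report each distinct stack trace roughly once per interval; the hashing and logging choices are illustrative (assumes "crypto/sha256", "fmt", and "log" are imported):

func reportOnce(o *dynsampler.OnlyOnce, stack []byte) {
	// Hash the trace to form a stable deduplication key.
	key := fmt.Sprintf("%x", sha256.Sum256(stack))
	// The first sighting in an interval gets rate 1; later sightings get
	// a rate of 1,000,000,000 and are effectively dropped.
	if o.GetSampleRate(key) == 1 {
		log.Printf("new stack trace:\n%s", stack)
	}
}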

func (*OnlyOnce) GetMetrics added in v0.5.0

func (o *OnlyOnce) GetMetrics(prefix string) map[string]int64

func (*OnlyOnce) GetSampleRate

func (o *OnlyOnce) GetSampleRate(key string) int

GetSampleRate takes a key and returns the appropriate sample rate for that key.

func (*OnlyOnce) GetSampleRateMulti added in v0.4.0

func (o *OnlyOnce) GetSampleRateMulti(key string, count int) int

GetSampleRateMulti takes a key representing count spans and returns the appropriate sample rate for that key.

func (*OnlyOnce) LoadState added in v0.2.0

func (o *OnlyOnce) LoadState(state []byte) error

LoadState is not implemented

func (*OnlyOnce) SaveState added in v0.2.0

func (o *OnlyOnce) SaveState() ([]byte, error)

SaveState is not implemented

func (*OnlyOnce) Start

func (o *OnlyOnce) Start() error

Start initializes the OnlyOnce sampler

func (*OnlyOnce) Stop added in v0.4.0

func (o *OnlyOnce) Stop() error

type PerKeyThroughput

type PerKeyThroughput struct {
	// DEPRECATED -- use ClearFrequencyDuration.
	// ClearFrequencySec is how often the counters reset in seconds.
	ClearFrequencySec int

	// ClearFrequencyDuration is how often the counters reset as a Duration.
	// Note that either this or ClearFrequencySec can be specified, but not both.
	// If neither one is set, the default is 30s.
	ClearFrequencyDuration time.Duration

	// PerKeyThroughputPerSec is the target number of events to send per second
	// per key. Sample rates are generated on a per key basis to squash the
	// throughput down to match the goal throughput. default 10
	PerKeyThroughputPerSec int

	// MaxKeys, if greater than 0, limits the number of distinct keys used to build
	// the sample rate map within the interval defined by `ClearFrequencyDuration`. Once
	// MaxKeys is reached, new keys will not be included in the sample rate map, but
	// existing keys will continue to be counted.
	MaxKeys int
	// contains filtered or unexported fields
}

PerKeyThroughput implements Sampler and attempts to meet a goal of a fixed number of events per key per second sent to Honeycomb.

This method guarantees that at most a certain number of events per key get transmitted, no matter how many keys you have or how much traffic comes through. In other words, if capturing a minimum amount of traffic per key is important, but extra traffic beyond that doesn't matter much, this is the best method.

func (*PerKeyThroughput) GetMetrics added in v0.5.0

func (p *PerKeyThroughput) GetMetrics(prefix string) map[string]int64

func (*PerKeyThroughput) GetSampleRate

func (p *PerKeyThroughput) GetSampleRate(key string) int

GetSampleRate takes a key and returns the appropriate sample rate for that key.

func (*PerKeyThroughput) GetSampleRateMulti added in v0.4.0

func (p *PerKeyThroughput) GetSampleRateMulti(key string, count int) int

GetSampleRateMulti takes a key representing count spans and returns the appropriate sample rate for that key.

func (*PerKeyThroughput) LoadState added in v0.2.0

func (p *PerKeyThroughput) LoadState(state []byte) error

LoadState is not implemented

func (*PerKeyThroughput) SaveState added in v0.2.0

func (p *PerKeyThroughput) SaveState() ([]byte, error)

SaveState is not implemented

func (*PerKeyThroughput) Start

func (p *PerKeyThroughput) Start() error

func (*PerKeyThroughput) Stop added in v0.4.0

func (p *PerKeyThroughput) Stop() error

type Sampler

type Sampler interface {
	// Start initializes the sampler. You should call Start() before using the
	// sampler.
	Start() error

	// Stop halts the sampler and any background goroutines
	Stop() error

	// GetSampleRate will return the sample rate to use for the given key
	// string. You should call it with whatever key you choose to use to
	// partition traffic into different sample rates. It assumes that you're
	// calling it for a single item to be sampled (typically a span from a
	// trace), and simply calls GetSampleRateMulti with 1 for the second
	// parameter.
	GetSampleRate(string) int

	// GetSampleRateMulti will return the sample rate to use for the given key
	// string. You should call it with whatever key you choose to use to
	// partition traffic into different sample rates. It assumes you're calling
	// it for a group of samples. The second parameter is the number of samples
	// this call represents.
	GetSampleRateMulti(string, int) int

	// SaveState returns a byte array containing the state of the Sampler implementation.
	// It can be used to persist state between process restarts.
	SaveState() ([]byte, error)

	// LoadState accepts a byte array containing the serialized, previous state of the sampler
	// implementation. It should be called before `Start`.
	LoadState([]byte) error

	// GetMetrics returns a map of metrics about the sampler's performance.
	// All values are returned as int64; counters are cumulative and the names
	// always end with "_count", while gauges are instantaneous with no particular naming convention.
	// All names are prefixed with the given string.
	GetMetrics(prefix string) map[string]int64
}

Sampler is the interface to samplers using different methods to determine sample rate. You should instantiate one of the actual samplers in this package, depending on the sample method you'd like to use. Each sampling method has its own set of struct variables you should set before Start()ing the sampler.

type Static

type Static struct {
	// Rates is the set of sample rates to use
	Rates map[string]int
	// Default is the value to use if the key is not whitelisted in Rates
	Default int
	// contains filtered or unexported fields
}

Static implements Sampler with a static mapping for sample rates. This is useful if you have a known set of keys that you want to sample at specific rates and apply a default to everything else.
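
A configuration sketch keyed by HTTP status; the keys and rates are illustrative choices:

func newStaticSampler() (*dynsampler.Static, error) {
	s := &dynsampler.Static{
		Rates: map[string]int{
			"200": 100, // sample successes aggressively
			"503": 1,   // keep every server error
		},
		Default: 10, // 1-in-10 for any key not listed in Rates
	}
	if err := s.Start(); err != nil {
		return nil, err
	}
	return s, nil
}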

func (*Static) GetMetrics added in v0.5.0

func (s *Static) GetMetrics(prefix string) map[string]int64

func (*Static) GetSampleRate

func (s *Static) GetSampleRate(key string) int

GetSampleRate takes a key and returns the appropriate sample rate for that key.

func (*Static) GetSampleRateMulti added in v0.4.0

func (s *Static) GetSampleRateMulti(key string, count int) int

GetSampleRateMulti takes a key representing count spans and returns the appropriate sample rate for that key.

func (*Static) LoadState added in v0.2.0

func (s *Static) LoadState(state []byte) error

LoadState is not implemented

func (*Static) SaveState added in v0.2.0

func (s *Static) SaveState() ([]byte, error)

SaveState is not implemented

func (*Static) Start

func (s *Static) Start() error

Start initializes the static dynsampler

func (*Static) Stop added in v0.4.0

func (s *Static) Stop() error

type TotalThroughput

type TotalThroughput struct {
	// DEPRECATED -- use ClearFrequencyDuration.
	// ClearFrequencySec is how often the counters reset in seconds.
	ClearFrequencySec int

	// ClearFrequencyDuration is how often the counters reset as a Duration.
	// Note that either this or ClearFrequencySec can be specified, but not both.
	// If neither one is set, the default is 30s.
	ClearFrequencyDuration time.Duration

	// GoalThroughputPerSec is the target number of events to send per second.
	// Sample rates are generated to squash the total throughput down to match the
	// goal throughput. Actual throughput may exceed goal throughput. default 100
	GoalThroughputPerSec int

	// MaxKeys, if greater than 0, limits the number of distinct keys used to build
	// the sample rate map within the interval defined by `ClearFrequencySec`. Once
	// MaxKeys is reached, new keys will not be included in the sample rate map, but
	// existing keys will continue to be counted.
	MaxKeys int
	// contains filtered or unexported fields
}

TotalThroughput implements Sampler and attempts to meet a goal of a fixed number of events per second sent to Honeycomb.

If your key space is sharded across different servers, this is a good method for making sure each server sends roughly the same volume of content to Honeycomb. It performs poorly when the active keyspace is very large.

GoalThroughputPerSec * ClearFrequencyDuration (in seconds) defines the upper limit on the number of keys that can be reported while staying under the goal, but with that many keys, you'll only get one event per key per interval, which is very coarse. You should aim for somewhere between 1 event per key per second and 1 event per key per 10 seconds to get reasonable data. In other words, the number of active keys should be less than 10 * GoalThroughputPerSec.
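
As a worked example: with a GoalThroughputPerSec of 100 and a 30-second ClearFrequencyDuration, the budget is 100 * 30 = 3,000 events per interval, so 3,000 active keys would yield exactly one event per key per interval. Keeping the active key count under 1,000 (10 * GoalThroughputPerSec) stays within the one-event-per-key-per-10-seconds guideline.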

func (*TotalThroughput) GetMetrics added in v0.5.0

func (t *TotalThroughput) GetMetrics(prefix string) map[string]int64

func (*TotalThroughput) GetSampleRate

func (t *TotalThroughput) GetSampleRate(key string) int

GetSampleRate takes a key and returns the appropriate sample rate for that key.

func (*TotalThroughput) GetSampleRateMulti added in v0.4.0

func (t *TotalThroughput) GetSampleRateMulti(key string, count int) int

GetSampleRateMulti takes a key representing count spans and returns the appropriate sample rate for that key.

func (*TotalThroughput) LoadState added in v0.2.0

func (t *TotalThroughput) LoadState(state []byte) error

LoadState is not implemented

func (*TotalThroughput) SaveState added in v0.2.0

func (t *TotalThroughput) SaveState() ([]byte, error)

SaveState is not implemented

func (*TotalThroughput) Start

func (t *TotalThroughput) Start() error

func (*TotalThroughput) Stop added in v0.4.0

func (t *TotalThroughput) Stop() error

type UnboundedBlockList added in v0.4.0

type UnboundedBlockList struct {
	// contains filtered or unexported fields
}

UnboundedBlockList can have unlimited keys.

func (*UnboundedBlockList) AggregateCounts added in v0.4.0

func (b *UnboundedBlockList) AggregateCounts(
	currentIndex int64,
	lookbackIndex int64,
) map[string]int

AggregateCounts returns a frequency hashmap of all counts from the currentIndex to the lookbackIndex. It also drops old blocks. This is an O(N) operation, where N is the length of the linked list.

func (*UnboundedBlockList) IncrementKey added in v0.4.0

func (b *UnboundedBlockList) IncrementKey(key string, keyIndex int64, count int) error

IncrementKey is used when we've encountered a new key. The current keyIndex is also provided. This function will increment the key in the current block or create a new block, if needed. The happy path invocation is very fast, O(1). The count is the number of events that this call represents.

type UnixSecondsIndexGenerator added in v0.4.0

type UnixSecondsIndexGenerator struct {
	DurationPerIndex time.Duration
}

The standard implementation of the index generator.

func (*UnixSecondsIndexGenerator) DurationToIndexes added in v0.4.0

func (g *UnixSecondsIndexGenerator) DurationToIndexes(duration time.Duration) int64

func (*UnixSecondsIndexGenerator) GetCurrentIndex added in v0.4.0

func (g *UnixSecondsIndexGenerator) GetCurrentIndex() int64

type WindowedThroughput added in v0.4.0

type WindowedThroughput struct {
	// UpdateFrequencyDuration is how often the sampling rate is recomputed. Default is 1s.
	UpdateFrequencyDuration time.Duration

	// LookbackFrequencyDuration is how far back in time we look back to dynamically adjust our sampling
	// rate. Default is 30 * UpdateFrequencyDuration. This will be 30s assuming the default
	// configuration of UpdateFrequencyDuration. We enforce this to be an _integer multiple_ of
	// UpdateFrequencyDuration.
	LookbackFrequencyDuration time.Duration

	// Target throughput per second.
	GoalThroughputPerSec float64

	// MaxKeys, if greater than 0, limits the number of distinct keys used to build
	// the sample rate map within the interval defined by `LookbackFrequencyDuration`. Once
	// MaxKeys is reached, new keys will not be included in the sample rate map, but
	// existing keys will continue to be counted.
	// If MaxKeys is set to 0 (default), there is no upper bound on the number of distinct keys.
	MaxKeys int
	// contains filtered or unexported fields
}

Windowed Throughput sampling is an enhanced version of total throughput sampling. Like the original throughput sampler, it attempts to meet a goal of a fixed number of events per second sent to Honeycomb.

The original throughput sampler updates the sampling rate every "ClearFrequency" seconds. While this parameter is configurable, it suffers from the following tradeoff:

  • Decreasing it makes you more responsive to load spikes, but with the cost of making the sampling decision on less data.
  • Increasing it makes you less responsive to load spikes, but your sample rates will be more stable because they are made with more data.

The windowed throughput sampler resolves this by introducing two different, tunable parameters:

  • UpdateFrequency: how often the sampling rate is recomputed
  • LookbackFrequency: how far back in time we look to recompute our sampling rate.

A standard configuration would be to set UpdateFrequency to 1s and LookbackFrequency to 30s. In this configuration, every second, we look back at the last 30s of data in order to compute the new sampling rate. The actual sampling rate computation is nearly identical to the original throughput sampler, but this variant has better support for floating point numbers.

Because our lookback window is _rolling_ instead of static, we need a special data structure to quickly and efficiently store our data. The code and additional information for this data structure can be found in blocklist.go.
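
The standard configuration described above looks like the following sketch (the update and lookback values match the documented defaults; the goal throughput is an illustrative choice):

func newWindowedThroughput() (*dynsampler.WindowedThroughput, error) {
	s := &dynsampler.WindowedThroughput{
		UpdateFrequencyDuration:   time.Second,      // recompute sample rates every second
		LookbackFrequencyDuration: 30 * time.Second, // over a rolling 30s window
		GoalThroughputPerSec:      100,              // illustrative target
	}
	if err := s.Start(); err != nil {
		return nil, err
	}
	return s, nil
}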

func (*WindowedThroughput) GetMetrics added in v0.5.0

func (t *WindowedThroughput) GetMetrics(prefix string) map[string]int64

func (*WindowedThroughput) GetSampleRate added in v0.4.0

func (t *WindowedThroughput) GetSampleRate(key string) int

GetSampleRate takes a key and returns the appropriate sample rate for that key.

func (*WindowedThroughput) GetSampleRateMulti added in v0.4.0

func (t *WindowedThroughput) GetSampleRateMulti(key string, count int) int

GetSampleRateMulti takes a key representing count spans and returns the appropriate sample rate for that key.

func (*WindowedThroughput) LoadState added in v0.4.0

func (t *WindowedThroughput) LoadState(state []byte) error

LoadState is not implemented

func (*WindowedThroughput) SaveState added in v0.4.0

func (t *WindowedThroughput) SaveState() ([]byte, error)

SaveState is not implemented

func (*WindowedThroughput) Start added in v0.4.0

func (t *WindowedThroughput) Start() error

func (*WindowedThroughput) Stop added in v0.4.0

func (t *WindowedThroughput) Stop() error
