templates

package

v0.0.0-...-de45acf Published: Jul 13, 2023 License: MIT Imports: 7 Imported by: 0

Repository

github.com/gocardless/slo-builder

Documentation ¶

Overview ¶

This package contains the implementation of various SLO templates.

When creating SLOs for services, once the key SLIs have been defined it's necessary to formulate an expression that represents how well the service is performing against its objectives.

Finding an expression that accounts for the system's key performance objectives while remaining understandable can be difficult. This package provides pre-configured templates that map to different categories of system and cater for the properties people commonly care about, enabling developers to apply sensible SLOs without diving deep into SLO theory and Prometheus details.

Each template registers itself with a global registry, at which point it's possible to use the template in a definition file provided to the build command. Pipelines then construct a rule group in the order required to power each different template, while feeding into a common set of alerting windows that apply to all SLOs.

Index ¶

Variables
func MustRegisterTemplate(slo SLO, rules ...rulefmt.Rule)
type BatchProcessingSLO
- func (b BatchProcessingSLO) GetName() string
- func (b BatchProcessingSLO) Rules() []rulefmt.Rule
type ErrorRateSLO
- func (b ErrorRateSLO) GetName() string
- func (e ErrorRateSLO) Rules() []rulefmt.Rule
type LatencySLO
- func (b LatencySLO) GetName() string
- func (l LatencySLO) Rules() []rulefmt.Rule
type Pipeline
- func NewPipeline(name string) *Pipeline
- func (p *Pipeline) Build() rulefmt.RuleGroups
- func (p *Pipeline) MustRegister(slos ...SLO)
type SLO
- func ParseDefinitions(payload []byte) ([]SLO, error)

Constants ¶

This section is empty.

Variables ¶

var (
	// Templates stores a mapping of template name to registered template. This is used to
	// unmarshal template definitions from their yaml source and to provide users with
	// feedback about what templates this tool supports.
	Templates = map[string]SLO{}

	// TemplateRules implement the translation from the rules produced by each instance of
	// SLO templates into the generic SLO error:ratio<I> format, which then power alerts.
	TemplateRules = []rulefmt.Rule{}

	// AlertWindows are common interval windows we want to precompute
	AlertWindows = []string{"1m", "5m", "30m", "1h", "2h", "6h", "1d", "3d", "7d", "28d"}

	// AlertRules are generic alerts shared by all templates. Every SLO type produces
	// rules that terminate in job:slo_error:ratio<I> and job:slo_error_budget:ratio
	// series. Together, these power generic multi-window SLO error budget burn alerts,
	// which run as the final part of the Pipeline-generated RuleGroup.
	AlertRules = []rulefmt.Rule{
		rulefmt.Rule{
			Alert: "SLOErrorBudgetFastBurn",
			For:   model.Duration(time.Minute),
			Labels: map[string]string{
				"severity": "ticket",
			},
			Expr: `
((
  job:slo_error:ratio1h > on(name) group_left() (14.4 * job:slo_error_budget:ratio)
and
  job:slo_error:ratio5m > on(name) group_left() (14.4 * job:slo_error_budget:ratio)
)
or
(
  job:slo_error:ratio6h > on(name) group_left() (6.0 * job:slo_error_budget:ratio)
and
  job:slo_error:ratio30m > on(name) group_left() (6.0 * job:slo_error_budget:ratio)
)) * on(name) group_left(channel) job:slo_labels_info
			`,
		},
		rulefmt.Rule{
			Alert: "SLOErrorBudgetSlowBurn",
			For:   model.Duration(time.Hour),
			Labels: map[string]string{
				"severity": "ticket",
			},
			Expr: `
((
  job:slo_error:ratio1d > on(name) group_left() (3.0 * job:slo_error_budget:ratio)
and
  job:slo_error:ratio2h > on(name) group_left() (3.0 * job:slo_error_budget:ratio)
)
or
(
  job:slo_error:ratio3d > on(name) group_left() (1.0 * job:slo_error_budget:ratio)
and
  job:slo_error:ratio6h > on(name) group_left() (1.0 * job:slo_error_budget:ratio)
)) * on(name) group_left(channel) job:slo_labels_info
			`,
		},
	}
)
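
The alert expressions above share one shape: a pair of windows fires only when both the long and the short window exceed a multiple of the error budget, and an alert fires when any of its pairs does. A minimal sketch of that check in plain Go, assuming in-memory values in place of Prometheus series; the `windowPair` type and `burning` function are hypothetical names for illustration and are not part of this package.

```go
package main

import "fmt"

// windowPair is a hypothetical stand-in for one (long, short) pair of
// job:slo_error:ratio<I> values, together with its burn-rate multiplier.
type windowPair struct {
	long, short float64 // error ratios over the long and short windows
	factor      float64 // burn-rate multiplier, e.g. 14.4 or 6.0
}

// burning reports whether any pair has both of its windows above
// factor * budget, mirroring the "(A and B) or (C and D)" shape of the
// alert expressions.
func burning(budget float64, pairs ...windowPair) bool {
	for _, p := range pairs {
		threshold := p.factor * budget
		if p.long > threshold && p.short > threshold {
			return true
		}
	}
	return false
}

func main() {
	budget := 0.001 // assumed error budget ratio, i.e. a 99.9% objective

	// Fast burn: 1h/5m windows at 14.4x, or 6h/30m windows at 6x.
	fast := burning(budget,
		windowPair{long: 0.02, short: 0.03, factor: 14.4},
		windowPair{long: 0.004, short: 0.03, factor: 6.0},
	)
	fmt.Println("fast burn firing:", fast)
}
```

Requiring the short window as well as the long one means the alert stops firing soon after the error rate recovers, rather than waiting for the long window to drain.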

var (
	// BatchProcessingTemplateRules map from the job:slo_batch_* time series to
	// the SLO-compliant job:slo_error:ratio<I> series that are used to power
	// alerts.
	BatchProcessingTemplateRules = flattenRules(

		rulefmt.Rule{
			Record: "job:slo_batch_error:interval",
			Expr: `
1.0 - clamp_max(
  job:slo_batch_throughput:interval / job:slo_batch_throughput_target:max,
  1.0
)
			`,
		},

		forIntervals(AlertWindows,
			rulefmt.Rule{
				Record: "job:slo_error:ratio%s",
				Expr:   `avg_over_time(job:slo_batch_error:interval[%s])`,
			},
		),
	)
)
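
The `forIntervals` and `flattenRules` helpers used above are unexported and their source is not shown on this page. The following is a sketch of how such helpers could work, assuming `forIntervals` substitutes each window into the `Record` and `Expr` format strings and `flattenRules` concatenates a mix of single rules and rule slices. The local `Rule` type is a stand-in for `rulefmt.Rule` so the example stays self-contained; the real implementations may differ.

```go
package main

import "fmt"

// Rule is a local stand-in for rulefmt.Rule, reduced to the two fields used here.
type Rule struct {
	Record string
	Expr   string
}

// forIntervals is a hypothetical sketch: it renders each rule once per
// interval, substituting the interval into the Record and Expr templates.
// Both "%s" and "%[1]s" verbs work because the interval is the only argument.
func forIntervals(intervals []string, rules ...Rule) []Rule {
	out := make([]Rule, 0, len(intervals)*len(rules))
	for _, interval := range intervals {
		for _, r := range rules {
			out = append(out, Rule{
				Record: fmt.Sprintf(r.Record, interval),
				Expr:   fmt.Sprintf(r.Expr, interval),
			})
		}
	}
	return out
}

// flattenRules is a hypothetical sketch: it accepts a mix of Rule and []Rule
// values and returns them as a single slice, preserving order.
func flattenRules(items ...interface{}) []Rule {
	var out []Rule
	for _, item := range items {
		switch v := item.(type) {
		case Rule:
			out = append(out, v)
		case []Rule:
			out = append(out, v...)
		default:
			panic(fmt.Sprintf("unsupported rule type %T", item))
		}
	}
	return out
}

func main() {
	rules := flattenRules(
		Rule{
			Record: "job:slo_batch_error:interval",
			Expr:   "1.0 - clamp_max(job:slo_batch_throughput:interval / job:slo_batch_throughput_target:max, 1.0)",
		},
		forIntervals([]string{"1m", "5m"}, Rule{
			Record: "job:slo_error:ratio%s",
			Expr:   "avg_over_time(job:slo_batch_error:interval[%s])",
		}),
	)
	for _, r := range rules {
		fmt.Printf("%s = %s\n", r.Record, r.Expr)
	}
}
```

Order matters in this sketch because rules in a Prometheus rule group are evaluated sequentially, so the intermediate series should be recorded before the windowed ratios that read it.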

var (
	// ErrorRateTemplateRules map from the job:slo_error_rate_total and
	// job:slo_error_rate_errors time series to the SLO-compliant
	// job:slo_error:ratio<I> series that are used to power alerts.
	ErrorRateTemplateRules = flattenRules(

		forIntervals(AlertWindows, rulefmt.Rule{
			Record: "job:slo_error:ratio%s",
			Expr:   `((job:slo_error_rate_errors:rate%[1]s) or (0 * job:slo_error_rate_total:rate%[1]s)) / job:slo_error_rate_total:rate%[1]s`,
		}),
	)
)
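
The `or (0 * ...total...)` branch in the expression above supplies a zero when no errors series exists for an SLO, so that the SLO still yields an error ratio of 0 instead of disappearing from the result. A rough model of that behaviour in plain Go, assuming series are keyed by SLO name in maps; the `errorRatios` function and the sample names are hypothetical and only illustrate the fallback.

```go
package main

import "fmt"

// errorRatios mimics `(errors or (0 * total)) / total` for series keyed by
// SLO name: a missing errors sample is treated as zero rather than dropping
// the SLO from the result. Names present only in errs are ignored, as they
// would have no matching total to divide by.
func errorRatios(errs, total map[string]float64) map[string]float64 {
	out := make(map[string]float64, len(total))
	for name, t := range total {
		e := errs[name] // zero when absent, like the `or (0 * total)` branch
		out[name] = e / t
	}
	return out
}

func main() {
	// Hypothetical request rates per SLO; "search" has recorded no errors.
	total := map[string]float64{"checkout": 200, "search": 50}
	errs := map[string]float64{"checkout": 4}

	for name, ratio := range errorRatios(errs, total) {
		fmt.Printf("%s: %.3f\n", name, ratio)
	}
}
```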

var (
	// LatencyTemplateRules map from the job:slo_latency_* time series to the
	// SLO-compliant job:slo_error:ratio<I> series that are used to power
	// alerts.
	LatencyTemplateRules = flattenRules(

		forIntervals(AlertWindows, rulefmt.Rule{
			Record: "job:slo_error:ratio%s",
			Expr:   `(job:slo_latency_total:rate%[1]s - job:slo_latency_observation:rate%[1]s) / job:slo_latency_total:rate%[1]s`,
		}),
	)
)

Functions ¶

func MustRegisterTemplate ¶

func MustRegisterTemplate(slo SLO, rules ...rulefmt.Rule)

MustRegisterTemplate installs the rules that map template-specific SLO intermediate calculations to the job:slo_error:ratio<I> series that power alerts. It is called from the place where a template is implemented.

Types ¶

type BatchProcessingSLO ¶

type BatchProcessingSLO struct {
	Deadline   serializeableDuration // time after starting the batch that it must finish
	Volume     string                // expected maximum volume to be processed by a single batch run
	Throughput string                // measure of batch throughput
	// contains filtered or unexported fields
}

BatchProcessingSLO is used to construct SLOs around large batch processes that the business requires to finish within a given deadline.

To use this template, you provide a measure of throughput for the batch process which is only present when the job is underway. The SLO then uses an estimated measure of maximum expected volume and the business deadline to compute a target throughput, then measures SLO compliance against how well the batch process meets the target.

It can be a good idea to compute the volume measurement by taking a record of previous historic maximums and applying a growth multiplier that is appropriate for the business context. If you're processing a number of payments, and your peak volume comes once a month, expecting 1.5x the maximum volume processed by the batch job in the last 60 days might be a good starting point.

The important characteristics of this SLO are:

- Error budget is consumed at a rate proportional to unmet target performance
- Error budget is consumed even by batches that process less-than-maximum volume

One thing to note is that throughput exceeding the target threshold is considered 0% error, rather than some negative error value. This is a deliberate choice to avoid encouraging spiky throughput values, but may be toggled in future.

func (BatchProcessingSLO) GetName ¶

func (b BatchProcessingSLO) GetName() string

func (BatchProcessingSLO) Rules ¶

func (b BatchProcessingSLO) Rules() []rulefmt.Rule

type ErrorRateSLO ¶

type ErrorRateSLO struct {
	Errors string
	Total  string
	// contains filtered or unexported fields
}

To use this template, you provide a parameterised rate of requests and errors that are sliced across multiple time windows.

func (ErrorRateSLO) GetName ¶

func (b ErrorRateSLO) GetName() string

func (ErrorRateSLO) Rules ¶

func (e ErrorRateSLO) Rules() []rulefmt.Rule

type LatencySLO ¶

type LatencySLO struct {
	RequestClass string // request class references a latency target
	Total        string // parameterized rate of total requests
	Observation  string // parameterized rate of histogram bucket
	// contains filtered or unexported fields
}

LatencySLO is used to construct SLOs based on latency.

To use this template, you provide a parameterized rate of total requests, a parameterized rate that tracks the number of observations falling within the target histogram bucket, and a request class that references a latency target.

This template allows defining SLOs as follows:

90% requests < 300ms
99% requests < 1000ms
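
As a worked example of the first objective, the error ratio for a latency SLO is the share of requests that did not land in the target histogram bucket, following the `(total - observation) / total` rule defined for this template. The rates below are invented for illustration and `latencyErrorRatio` is a hypothetical helper, not part of this package.

```go
package main

import "fmt"

// latencyErrorRatio mirrors `(total - observation) / total`: the share of
// requests that were slower than the target bucket's upper bound.
func latencyErrorRatio(totalRate, bucketRate float64) float64 {
	return (totalRate - bucketRate) / totalRate
}

func main() {
	// Hypothetical rates: 200 req/s in total, 188 req/s at or under 300ms.
	ratio := latencyErrorRatio(200, 188)

	// "90% of requests < 300ms" leaves a 10% error budget.
	budget := 1 - 0.90

	fmt.Printf("error ratio %.3f, within budget: %t\n", ratio, ratio <= budget)
}
```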

func (LatencySLO) GetName ¶

func (b LatencySLO) GetName() string

func (LatencySLO) Rules ¶

func (l LatencySLO) Rules() []rulefmt.Rule

type Pipeline ¶

type Pipeline struct {
	// Name defines the RuleGroup name in Prometheus
	Name string

	// SLORules is where each SLO should place the appropriate rules that power the
	// post-processing and alert trailers.
	SLORules []rulefmt.Rule
}

Pipeline can build a RuleGroup that powers the generation of SLO time series. The RuleGroup generated by the Pipeline includes the rules installed by templates and the global alerting windows, along with the rules for each SLO registered on the Pipeline instance via the MustRegister() method.

func NewPipeline ¶

func NewPipeline(name string) *Pipeline

func (*Pipeline) Build ¶

func (p *Pipeline) Build() rulefmt.RuleGroups

func (*Pipeline) MustRegister ¶

func (p *Pipeline) MustRegister(slos ...SLO)

type SLO ¶

type SLO interface {
	// GetName returns a globally unique name for the SLO
	GetName() string
	// Rules generates Prometheus recording rules that implement the SLO definition
	Rules() []rulefmt.Rule
}

SLO is the base interface type for all SLOs.

func ParseDefinitions ¶

func ParseDefinitions(payload []byte) ([]SLO, error)

ParseDefinitions loads a YAML file of configured templates that looks like this:

---
definitions:
  - template: BatchProcessingSLO
    definition:
      name: MarkPaymentsAsPaidMeetsDeadline
      ...

and produces a list of SLOs. This is the file format users are expected to provide to slo-builder.
