watchdog

v1.3.0
Published: Jul 12, 2022 | License: Apache-2.0, MIT | Imports: 14 | Imported by: 24

README

Go memory watchdog

🐺 A library to curb OOMs by running Go GC according to a user-defined policy.


Package watchdog runs a singleton memory watchdog in the process, which watches memory utilization and forces Go GC in accordance with a user-defined policy.

There are three kinds of watchdogs:

  1. heap-driven (watchdog.HeapDriven()): applies a heap limit, adjusting GOGC dynamically in accordance with the policy.
  2. system-driven (watchdog.SystemDriven()): applies a limit to the total system memory used, obtaining the current usage through elastic/go-sigar.
  3. cgroups-driven (watchdog.CgroupDriven()): discovers the memory limit from the cgroup of the process (derived from /proc/self/cgroup), or from the root cgroup path if the PID == 1 (which indicates that the process is running in a container). It uses the cgroup stats to obtain the current usage.

The watchdog's behaviour is controlled by the policy, a pluggable function that determines when to trigger GC based on the current utilization. This library ships with two policies:

  1. watermarks policy (watchdog.NewWatermarkPolicy()): runs GC at configured watermarks of memory utilisation.
  2. adaptive policy (watchdog.NewAdaptivePolicy()): runs GC when the current usage surpasses a dynamically-set threshold.

You can easily write a custom policy tailored to the allocation patterns of your program.

The recommended way to set up the watchdog is as follows, in descending order of precedence. This logic assumes that the program embedding this library supports setting a heap limit through an environment variable (e.g. MYAPP_HEAP_MAX) or config key.

  1. If a heap limit is set and valid, initialize a heap-driven watchdog.
  2. Otherwise, try to use the cgroup-driven watchdog. If it succeeds, return.
  3. Otherwise, try to initialize a system-driven watchdog. If it succeeds, return.
  4. Watchdog initialization failed. Log a warning to inform the user that they're flying solo.
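
The precedence above can be sketched as a chain of fallbacks. This is a minimal illustration that uses only the standard library: `starter`, `startFirst` and `errUnavailable` are hypothetical names, and the stub functions in `main` stand in for closures around watchdog.HeapDriven, watchdog.CgroupDriven and watchdog.SystemDriven (which return `(err error, stopFn func())`).

```go
package main

import (
	"errors"
	"fmt"
)

// errUnavailable is a hypothetical error used by the stubs below.
var errUnavailable = errors.New("watchdog mode unavailable")

// starter is a hypothetical wrapper around one watchdog constructor. In a real
// program, start would be a closure calling watchdog.HeapDriven,
// watchdog.CgroupDriven or watchdog.SystemDriven.
type starter struct {
	name  string
	start func() (stop func(), err error)
}

// startFirst tries each starter in order and returns the name and stop
// function of the first one that succeeds, or the last error seen.
func startFirst(starters []starter) (string, func(), error) {
	lastErr := errors.New("no watchdog modes configured")
	for _, s := range starters {
		stop, err := s.start()
		if err == nil {
			return s.name, stop, nil
		}
		lastErr = fmt.Errorf("%s-driven watchdog: %w", s.name, err)
	}
	return "", nil, lastErr
}

func main() {
	heapLimit := uint64(0) // hypothetical: parsed from MYAPP_HEAP_MAX, 0 if unset

	var starters []starter
	if heapLimit > 0 {
		starters = append(starters, starter{name: "heap", start: func() (func(), error) {
			return func() {}, nil // stand-in for watchdog.HeapDriven
		}})
	}
	starters = append(starters,
		starter{name: "cgroup", start: func() (func(), error) {
			return nil, errUnavailable // stand-in for watchdog.CgroupDriven failing
		}},
		starter{name: "system", start: func() (func(), error) {
			return func() {}, nil // stand-in for watchdog.SystemDriven
		}},
	)

	name, stop, err := startFirst(starters)
	if err != nil {
		fmt.Println("warning: running without a memory watchdog:", err)
		return
	}
	defer stop()
	fmt.Println("started", name+"-driven watchdog")
}
```

Keeping the stop function of whichever mode started lets the caller shut the watchdog down cleanly on exit.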

Running the tests

Given the low-level nature of this component, some tests need to run in isolation, so that they don't carry over Go runtime metrics. For completeness, this module uses a Docker image for testing, so we can simulate cgroup memory limits.

The test execution and docker builds have been conveniently packaged in a Makefile. Run with:

$ make

Why is this even needed?

The garbage collector that ships with the Go runtime is pretty good in some regards (low latency, negligible stop-the-world pauses), but it is unsatisfactory in a number of situations that yield ill-fated outcomes:

  1. it is incapable of dealing with bursty/spiky allocations efficiently; depending on the workload, the program may OOM as a consequence of not scheduling GC in a timely manner.
  2. part of the above is due to the fact that Go doesn't concern itself with any limits. At the time of writing, it is not possible to set a maximum heap size.
  3. its default policy of scheduling GC when the heap doubles, coupled with its ignorance of system or process limits, can easily cause it to OOM.

For more information, check out these GitHub issues:

License

Dual-licensed: MIT, Apache Software License v2, by way of the Permissive License Stack.

Documentation

Constants

const PolicyTempDisabled uint64 = math.MaxUint64

PolicyTempDisabled is a marker value that a policy returns to signal that it is temporarily disabled. Use it when there is no realistic chance of recovering from significant memory pressure (such as when usage is above an "extreme" watermark).

Variables

var (
	// Logger is the logger to use. If nil, it will default to a logger that
	// proxies to a standard logger using the "[watchdog]" prefix.
	Logger logger = &stdlog{log: log.New(log.Writer(), "[watchdog] ", log.LstdFlags|log.Lmsgprefix)}

	// Clock can be used to inject a mock clock for testing.
	Clock = clock.New()

	// ForcedGCFunc specifies the function to call when forced GC is necessary.
	// Its default value is runtime.GC, but it can be set to debug.FreeOSMemory
	// to force the release of memory to the OS.
	ForcedGCFunc = runtime.GC

	// NotifyGC, if non-nil, will be called when a GC has happened.
	// Deprecated: use RegisterPostGCNotifee instead.
	NotifyGC func() = func() {}

	// HeapProfileThreshold sets the utilization threshold that will trigger a
	// heap profile to be taken automatically. A zero value disables this feature.
	// By default, it is disabled.
	HeapProfileThreshold float64

	// HeapProfileMaxCaptures sets the maximum number of heap profiles a process will generate.
	// This limits the number of episodes that will be captured, in case the
	// utilization climbs repeatedly over the threshold. By default, it is 10.
	HeapProfileMaxCaptures = uint(10)

	// HeapProfileDir is the directory where the watchdog will write the heap profile.
	// It will be created if it doesn't exist upon initialization. An error when
	// creating the dir will not prevent watchdog initialization; it will just
	// disable the heap profile capture feature. If zero-valued, the feature is
	// disabled.
	//
	// HeapProfiles will be written to path <HeapProfileDir>/<RFC3339Nano formatted timestamp>.heap.
	HeapProfileDir string
)

The watchdog is designed to be used as a singleton; global vars are OK for that reason.
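
To locate captured profiles, it helps to know what the file names look like. The sketch below builds a path following the `<HeapProfileDir>/<RFC3339Nano formatted timestamp>.heap` scheme described above. `heapProfilePath` and `sampleTime` are hypothetical names, and formatting the timestamp in UTC is an assumption of this sketch, not something the documentation states.

```go
package main

import (
	"fmt"
	"path/filepath"
	"time"
)

// sampleTime is an arbitrary fixed instant used for illustration.
var sampleTime = time.Date(2022, time.July, 12, 10, 0, 0, 123456789, time.UTC)

// heapProfilePath builds <dir>/<RFC3339Nano timestamp>.heap.
// Converting to UTC first is an assumption of this sketch.
func heapProfilePath(dir string, t time.Time) string {
	return filepath.Join(dir, t.UTC().Format(time.RFC3339Nano)+".heap")
}

func main() {
	fmt.Println(heapProfilePath("profiles", sampleTime))
}
```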

var (
	// ErrAlreadyStarted is returned when the user tries to start the watchdog more than once.
	ErrAlreadyStarted = fmt.Errorf("singleton memory watchdog was already started")
)
var ErrNotSupported = errors.New("watchdog run mode not supported")

ErrNotSupported is returned when the watchdog does not support the requested run mode in the current OS/arch.

Functions

func CgroupDriven added in v1.0.1

func CgroupDriven(frequency time.Duration, policyCtor PolicyCtor) (err error, stopFn func())

CgroupDriven initializes a cgroups-driven watchdog. It will try to discover the memory limit from the cgroup of the process (derived from /proc/self/cgroup), or from the root cgroup path if the PID == 1 (which indicates that the process is running in a container).

Memory usage is calculated by querying the cgroup stats.

This function will return an error immediately if the OS does not support cgroups, or if another error occurs during initialization. The caller can then safely fall back to the system driven watchdog.
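
To make the discovery step more concrete, here is a rough sketch of how a memory cgroup path can be derived from the contents of /proc/self/cgroup, where each line has the form `hierarchy-ID:controller-list:cgroup-path`. This is not the library's implementation; `memoryCgroupPath` and `cgroupPathFor` are hypothetical helpers written only for illustration.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// memoryCgroupPath scans /proc/self/cgroup content and returns the path of the
// cgroup v1 "memory" controller, falling back to the cgroup v2 unified entry
// ("0::<path>") when no v1 memory line is present.
func memoryCgroupPath(procCgroup string) (string, bool) {
	unified, foundUnified := "", false
	for _, line := range strings.Split(procCgroup, "\n") {
		parts := strings.SplitN(strings.TrimSpace(line), ":", 3)
		if len(parts) != 3 {
			continue // blank or malformed line
		}
		for _, ctrl := range strings.Split(parts[1], ",") {
			if ctrl == "memory" {
				return parts[2], true
			}
		}
		if parts[0] == "0" && parts[1] == "" {
			unified, foundUnified = parts[2], true
		}
	}
	return unified, foundUnified
}

// cgroupPathFor applies the PID == 1 rule: a process running as PID 1 is
// assumed to be inside a container, so the root cgroup path is used.
func cgroupPathFor(pid int, procCgroup string) (string, bool) {
	if pid == 1 {
		return "/", true
	}
	return memoryCgroupPath(procCgroup)
}

func main() {
	data, err := os.ReadFile("/proc/self/cgroup")
	if err != nil {
		fmt.Println("cgroups not available:", err)
		return
	}
	if path, ok := cgroupPathFor(os.Getpid(), string(data)); ok {
		fmt.Println("memory cgroup path:", path)
	}
}
```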

func HeapDriven added in v1.0.1

func HeapDriven(limit uint64, minGOGC int, policyCtor PolicyCtor) (err error, stopFn func())

HeapDriven starts a singleton heap-driven watchdog, which adjusts GOGC dynamically after every GC, to honour the policy requirements.

Providing a zero-valued limit will error. A minimum GOGC value is required, so as to avoid overscheduling GC, and overfitting to a specific target.
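
The relationship between a target heap size and GOGC can be illustrated with a little arithmetic. With GOGC set to g, the runtime aims to start the next collection once the heap has grown by roughly g percent over the live heap, so hitting a target `next` requires g = (next/live - 1) * 100. The sketch below is an assumption about how such a value could be derived and clamped; `gogcFor` and its `maxGOGC` parameter are hypothetical, not part of this package.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// gogcFor returns the GOGC percentage that schedules the next collection at
// roughly `next` bytes, given `live` bytes that survived the last GC. The
// result is clamped to the range [minGOGC, maxGOGC].
func gogcFor(next, live uint64, minGOGC, maxGOGC int) int {
	if live == 0 || next <= live {
		return minGOGC
	}
	pct := (float64(next)/float64(live) - 1) * 100
	if pct > float64(maxGOGC) {
		return maxGOGC
	}
	if pct < float64(minGOGC) {
		return minGOGC
	}
	return int(pct)
}

func main() {
	// 400 MiB live, next GC wanted at 700 MiB, never below GOGC=25.
	gogc := gogcFor(700<<20, 400<<20, 25, 100)
	previous := debug.SetGCPercent(gogc)
	fmt.Printf("GOGC set to %d (was %d)\n", gogc, previous)
}
```

The lower clamp reflects the point made above: without a minimum, a target very close to the live heap would make the collector run almost continuously.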

func RegisterPostGCNotifee added in v1.2.0

func RegisterPostGCNotifee(f func()) (unregister func())

RegisterPostGCNotifee registers a function that is called after every GC run, whether it was triggered by the Go runtime or by the watchdog. The returned unregister function can be used to unregister this notifee.

func RegisterPreGCNotifee added in v1.2.0

func RegisterPreGCNotifee(f func()) (unregister func())

RegisterPreGCNotifee registers a function that is called before watchdog triggers a GC run. It is ONLY called when watchdog triggers a GC run, not when the Go runtime triggers it. The unregister function returned can be used to unregister this notifee.

func SystemDriven added in v1.0.1

func SystemDriven(limit uint64, frequency time.Duration, policyCtor PolicyCtor) (err error, stopFn func())

SystemDriven starts a singleton system-driven watchdog.

The system-driven watchdog keeps a threshold, above which GC will be forced. The watchdog polls the system utilization at the specified frequency. When the actual utilization exceeds the threshold, a GC is forced.

This threshold is calculated by querying the policy every time that GC runs, whether triggered by the runtime or forced by the watchdog.
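
One iteration of such a polling loop might look like the sketch below. It is an illustration under stated assumptions, not the library's code: `pollOnce` is a hypothetical helper, the policy is reduced to a plain function, and the usage samples are hard-coded where a real watchdog would sample the system on a timer at the configured frequency.

```go
package main

import (
	"fmt"
	"runtime"
)

// pollOnce is one iteration of a hypothetical polling loop. If the sampled
// usage exceeds the threshold, it forces a GC and asks the policy for the next
// threshold; otherwise the threshold is returned unchanged.
func pollOnce(used, threshold uint64, forceGC func(), evaluate func(used uint64) uint64) uint64 {
	if used <= threshold {
		return threshold
	}
	forceGC()
	return evaluate(used)
}

func main() {
	threshold := uint64(600)
	evaluate := func(used uint64) uint64 { return used + 100 } // stand-in policy

	// Hard-coded samples; a real loop would read system usage on each tick.
	for _, used := range []uint64{500, 650, 700} {
		threshold = pollOnce(used, threshold, runtime.GC, evaluate)
		fmt.Printf("used=%d next threshold=%d\n", used, threshold)
	}
}
```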

Types

type Policy

type Policy interface {
	// Evaluate determines when the next GC should take place. It receives the
	// current usage, and it returns the next usage at which to trigger GC.
	Evaluate(scope UtilizationType, used uint64) (next uint64)
}

Policy is polled by the watchdog to determine the next utilisation at which a GC should be forced.
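
As an illustration of what a custom policy could look like, the sketch below implements Evaluate for a hypothetical "step" policy that asks for a GC every fixed number of bytes of growth. So that it compiles on its own, it declares local mirrors of UtilizationType, Policy, PolicyCtor and PolicyTempDisabled; in a real program those would come from this package. `stepPolicy` and `newStepPolicy` are invented for this example.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// Local mirrors of the package's declarations, so the sketch compiles alone.
type UtilizationType int

const PolicyTempDisabled uint64 = math.MaxUint64

type Policy interface {
	Evaluate(scope UtilizationType, used uint64) (next uint64)
}

type PolicyCtor func(limit uint64) (Policy, error)

// stepPolicy is a hypothetical policy: trigger GC every `step` bytes of
// growth, capped at `limit`, and disable itself once usage reaches the limit.
type stepPolicy struct {
	limit, step uint64
}

func (p stepPolicy) Evaluate(_ UtilizationType, used uint64) uint64 {
	if used >= p.limit {
		return PolicyTempDisabled
	}
	next := used + p.step
	if next > p.limit || next < used { // cap at the limit; guard against overflow
		next = p.limit
	}
	return next
}

// newStepPolicy returns a constructor with the same shape as PolicyCtor.
func newStepPolicy(step uint64) PolicyCtor {
	return func(limit uint64) (Policy, error) {
		if step == 0 {
			return nil, errors.New("step must be greater than zero")
		}
		return stepPolicy{limit: limit, step: step}, nil
	}
}

func main() {
	policy, err := newStepPolicy(100)(1000)
	if err != nil {
		fmt.Println("invalid policy:", err)
		return
	}
	for _, used := range []uint64{250, 950, 1000} {
		fmt.Printf("used=%d next=%d\n", used, policy.Evaluate(0, used))
	}
}
```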

type PolicyCtor added in v1.0.1

type PolicyCtor func(limit uint64) (Policy, error)

PolicyCtor is a policy constructor.

func NewAdaptivePolicy added in v1.0.1

func NewAdaptivePolicy(factor float64) PolicyCtor

NewAdaptivePolicy creates a policy that forces GC when the usage surpasses a user-configured percentage (factor) of the available memory.

This policy recalculates the next target as usage+(limit-usage)*factor.
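
A quick worked example of the formula: with a limit of 1000 and a factor of 0.5, a usage of 400 yields a next target of 400 + (1000 - 400) * 0.5 = 700, and a usage of 700 yields 850. Targets therefore get closer together as usage approaches the limit. The sketch below reproduces that arithmetic; `adaptiveNext` is a hypothetical helper, and returning the current usage once the limit has been reached is an assumption of this sketch.

```go
package main

import "fmt"

// adaptiveNext applies next = used + (limit-used)*factor.
// Returning `used` when usage is at or above the limit is an assumption.
func adaptiveNext(limit, used uint64, factor float64) uint64 {
	if used >= limit {
		return used
	}
	return used + uint64(float64(limit-used)*factor)
}

func main() {
	const limit = 1000
	for _, used := range []uint64{400, 700, 850} {
		fmt.Printf("used=%d next=%d\n", used, adaptiveNext(limit, used, 0.5))
	}
}
```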

func NewWatermarkPolicy added in v1.0.1

func NewWatermarkPolicy(watermarks ...float64) PolicyCtor

NewWatermarkPolicy creates a watchdog policy that schedules GC at concrete watermarks. When queried, it will determine the next trigger point based on the current utilisation. If the last watermark is surpassed, the policy will be disarmed. It is recommended to set an extreme watermark as the last element (e.g. 0.99) to prevent the policy from disarming too soon.
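
The selection of the next trigger point can be pictured as follows. This is a simplified sketch rather than the library's implementation: `nextWatermark` and `policyDisabled` are hypothetical names, the watermarks are assumed to be sorted in ascending order, and the disarmed state is represented by the same value as PolicyTempDisabled (math.MaxUint64).

```go
package main

import (
	"fmt"
	"math"
)

// policyDisabled mirrors the value of PolicyTempDisabled.
const policyDisabled uint64 = math.MaxUint64

// nextWatermark returns the first threshold (watermark * limit) that lies
// above the current usage, or policyDisabled once the last watermark has been
// surpassed. Watermarks are assumed to be sorted in ascending order.
func nextWatermark(limit, used uint64, watermarks []float64) uint64 {
	for _, w := range watermarks {
		threshold := uint64(float64(limit) * w)
		if threshold > used {
			return threshold
		}
	}
	return policyDisabled
}

func main() {
	marks := []float64{0.5, 0.75, 0.99}
	for _, used := range []uint64{100, 600, 800, 995} {
		next := nextWatermark(1000, used, marks)
		if next == policyDisabled {
			fmt.Printf("used=%d policy disarmed\n", used)
			continue
		}
		fmt.Printf("used=%d next=%d\n", used, next)
	}
}
```

With these watermarks the policy only disarms above 99% of the limit, which is why a high final watermark is recommended.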

type UtilizationType added in v1.0.1

type UtilizationType int

UtilizationType is the utilization metric in use.

const (
	// UtilizationSystem specifies that the policy compares against actual used
	// system memory.
	UtilizationSystem UtilizationType = iota
	// UtilizationProcess specifies that the watchdog is using process limits.
	UtilizationProcess
	// UtilizationHeap specifies that the policy compares against heap used.
	UtilizationHeap
)
