kubernetes-pressurecooker

module v0.2.5
Published: Jun 5, 2020 License: Apache-2.0

README

Kubernetes Pressure Cooker

Automatically taint nodes under high CPU pressure and evict Pods from them. Derived from kubernetes-loadwatcher.

The load average describes the average length of the run queue whenever a scheduling decision is made, but it does not tell us how often processes were actually waiting for CPU time. The kernel's pressure metrics (PSI, Pressure Stall Information, developed at Facebook) describe how often there was not enough CPU available.
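
On Linux, these metrics are exposed under /proc/pressure/cpu as stall-time percentages averaged over 10-second, 60-second and 300-second windows. A minimal Go sketch for reading the "some" averages (illustrative only, not the controller's actual implementation) could look like this:

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// readCPUPressure parses /proc/pressure/cpu and returns the "some"
// averages keyed by window name (avg10, avg60, avg300). Each value is
// the percentage of wall-clock time in which at least one task was
// stalled waiting for CPU.
func readCPUPressure() (map[string]float64, error) {
	data, err := os.ReadFile("/proc/pressure/cpu")
	if err != nil {
		return nil, err
	}
	for _, line := range strings.Split(string(data), "\n") {
		fields := strings.Fields(line)
		if len(fields) == 0 || fields[0] != "some" {
			continue
		}
		avgs := make(map[string]float64)
		for _, kv := range fields[1:] {
			parts := strings.SplitN(kv, "=", 2)
			if len(parts) != 2 || !strings.HasPrefix(parts[0], "avg") {
				continue // skip the cumulative "total" counter
			}
			v, err := strconv.ParseFloat(parts[1], 64)
			if err != nil {
				return nil, err
			}
			avgs[parts[0]] = v
		}
		return avgs, nil
	}
	return nil, fmt.Errorf("no \"some\" line found in /proc/pressure/cpu")
}

func main() {
	avgs, err := readCPUPressure()
	if err != nil {
		panic(err)
	}
	fmt.Printf("CPU pressure: %v\n", avgs)
}
```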

Synopsis

A Kubernetes node can be overcommitted on CPU: the processes running on it may want more CPU than was requested for them. This can easily happen due to variable resource usage per Pod, variance in hardware, or variance in Pod distribution. By default, Kubernetes will not evict Pods from a node based on CPU usage, since CPU is considered a compressible resource. However, if a node does not have enough CPU resources to handle all of its Pods, it will impose additional latencies that can be undesirable depending on the workload (e.g. web/interactive traffic).

This project contains a small Kubernetes controller that watches each node's CPU pressure; when a certain threshold is exceeded, the node will be tainted (so that no additional workloads are scheduled on an already-overloaded node) and finally the controller will start to evict Pods from the node.

Pressure is more sensitive to small overloads; for example, pressure information makes it easy to express statements such as "there is an up to 20% chance of not getting CPU instantly when it is needed".

How it works

This controller can be started with two threshold flags: -taint-threshold and -evict-threshold. There are also safeguard flags -min-pod-age and -eviction-backoff. The controller will continuously monitor a node's CPU pressure.
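
As an illustration, these flags might be declared with the standard flag package as follows; the flag names are taken from the description above, while the default values shown here are hypothetical and not the controller's actual defaults:

```go
package main

import (
	"flag"
	"fmt"
	"time"
)

func main() {
	// Threshold and safeguard flags as described above; defaults are hypothetical.
	taintThreshold := flag.Float64("taint-threshold", 25, "CPU pressure (5 min average) above which the node is tainted")
	evictThreshold := flag.Float64("evict-threshold", 50, "CPU pressure (15 min average) above which Pods are evicted")
	minPodAge := flag.Duration("min-pod-age", 5*time.Minute, "Pods younger than this are never evicted")
	evictionBackoff := flag.Duration("eviction-backoff", 10*time.Minute, "minimum time between two evictions on the same node")
	flag.Parse()

	fmt.Println(*taintThreshold, *evictThreshold, *minPodAge, *evictionBackoff)
}
```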

  • If the CPU pressure (5 min average) exceeds the taint threshold, the node will be tainted with a pressurecooker/load-exceeded taint with the PreferNoSchedule effect. This will instruct Kubernetes to not schedule any additional workloads on this node if at all possible (a simplified client-go sketch of this step follows the list below).

  • If the CPU pressure (both 5 min and 15 min averages) falls back below the taint threshold, the taint will be removed again.

  • If the CPU pressure (15 min average) exceeds the eviction threshold, the controller will pick a suitable Pod running on the node and evict it. However, the following types of Pods will not be evicted:

    • Pods with the Guaranteed QoS class
    • Pods belonging to Stateful Sets
    • Pods belonging to Daemon Sets
    • Standalone pods not managed by any kind of controller
    • Pods running in the kube-system namespace or with a critical priorityClassName
    • Pods newer than min-pod-age
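
The taint from the first bullet point could be applied with client-go roughly as in the following sketch. The taintNode helper is illustrative; the real controller additionally removes the taint again and handles update conflicts.

```go
package sketch

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// taintNode adds the pressurecooker/load-exceeded taint with the
// PreferNoSchedule effect to a node, unless it is already present.
func taintNode(ctx context.Context, client kubernetes.Interface, nodeName string) error {
	node, err := client.CoreV1().Nodes().Get(ctx, nodeName, metav1.GetOptions{})
	if err != nil {
		return err
	}
	for _, t := range node.Spec.Taints {
		if t.Key == "pressurecooker/load-exceeded" {
			return nil // node is already tainted
		}
	}
	node.Spec.Taints = append(node.Spec.Taints, corev1.Taint{
		Key:    "pressurecooker/load-exceeded",
		Effect: corev1.TaintEffectPreferNoSchedule,
	})
	_, err = client.CoreV1().Nodes().Update(ctx, node, metav1.UpdateOptions{})
	return err
}
```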

After a Pod has been evicted, the next Pod will only be evicted after a configurable eviction backoff (the -eviction-backoff flag mentioned above), and only if the 15 min pressure average is still above the eviction threshold.

Older Pods will be evicted first. The rationale for removing old Pods first is that it is usually better to move well-behaved Pods away from bad neighbors than to move bad neighbors around the cluster. Moreover, since the node was still in a healthy state while the older Pods were already running, it can be assumed that the older Pods are less likely to be the cause of the overload.
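
Putting the exclusion rules and the oldest-first ordering together, candidate selection could look roughly like this sketch. The evictionCandidates helper is illustrative; the check for a critical priorityClassName is omitted for brevity.

```go
package sketch

import (
	"sort"
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// evictionCandidates returns the Pods that may be evicted, oldest first.
// The exclusion rules mirror the list above.
func evictionCandidates(pods []corev1.Pod, minPodAge time.Duration) []corev1.Pod {
	var candidates []corev1.Pod
	for _, pod := range pods {
		if pod.Status.QOSClass == corev1.PodQOSGuaranteed {
			continue // Guaranteed Pods are never evicted
		}
		if pod.Namespace == "kube-system" {
			continue
		}
		if time.Since(pod.CreationTimestamp.Time) < minPodAge {
			continue // younger than min-pod-age
		}
		owner := metav1.GetControllerOf(&pod)
		if owner == nil {
			continue // standalone Pod, not managed by any controller
		}
		if owner.Kind == "StatefulSet" || owner.Kind == "DaemonSet" {
			continue
		}
		candidates = append(candidates, pod)
	}
	// Oldest Pods first: they were already running while the node was
	// still healthy, so they are less likely to cause the overload.
	sort.Slice(candidates, func(i, j int) bool {
		return candidates[i].CreationTimestamp.Before(&candidates[j].CreationTimestamp)
	})
	return candidates
}
```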

Directories

Path Synopsis
pkg
