Azure ScheduledEvents Manager
Manager for Linux VMs and Kubernetes clusters for Azure ScheduledEvents (planned VM maintenance) with Prometheus metrics support.
Drains nodes automatically when a Redeploy, Reboot, Preempt or Terminate event is detected, and approves the event automatically so that it starts as soon as possible.
Kubernetes support
Automatically drains nodes before ScheduledEvents (Reboot, Redeploy, Terminate) and uncordons them afterwards to ensure service reliability.
AKS and custom Kubernetes clusters on Azure are supported.
VM support
Automatically executes the configured drain and uncordon commands around ScheduledEvents (Reboot, Redeploy, Terminate) to ensure service reliability.
Notification support
Supports shoutrrr for notifications.
Configuration
Usage:
azure-scheduledevents-manager [OPTIONS]
Application Options:
--debug debug mode [$DEBUG]
-v, --verbose verbose mode [$VERBOSE]
--log.json Switch log output to json format [$LOG_JSON]
--server.bind= Server address (default: :8080) [$SERVER_BIND]
--server.timeout.read= Server read timeout (default: 5s) [$SERVER_TIMEOUT_READ]
--server.timeout.write= Server write timeout (default: 10s) [$SERVER_TIMEOUT_WRITE]
--scrape.time= Scrape time in seconds (default: 1m) [$SCRAPE_TIME]
--azure.metadatainstance-url= Azure ScheduledEvents API URL (default:
http://169.254.169.254/metadata/instance?api-version=2019-08-01)
[$AZURE_METADATAINSTANCE_URL]
--azure.scheduledevents-url= Azure ScheduledEvents API URL (default:
http://169.254.169.254/metadata/scheduledevents?api-version=2019-08-01)
[$AZURE_SCHEDULEDEVENTS_URL]
--azure.timeout= Azure API timeout (seconds) (default: 30s) [$AZURE_TIMEOUT]
--azure.error-threshold= Azure API error threshold (after which app will panic) (default: 0)
[$AZURE_ERROR_THRESHOLD]
--azure.approve-scheduledevent Approve ScheduledEvent and start (if possible) start them ASAP
[$AZURE_APPROVE_SCHEDULEDEVENT]
--vm.nodename= VM node name [$VM_NODENAME]
--drain.enable Enable drain handling [$DRAIN_ENABLE]
--drain.mode=[kubernetes|command] Mode [$DRAIN_MODE]
--drain.not-before= Dont drain before this time (default: 5m) [$DRAIN_NOT_BEFORE]
--drain.events= Enable drain handling (default: reboot, redeploy, preempt, terminate) [$DRAIN_EVENTS]
--drain.wait-before-cmd= Wait duration before trigger drain command (default: 0) [$DRAIN_WAIT_BEFORE_CMD]
--drain.wait-after-cmd= Wait duration before trigger drain command (default: 0) [$DRAIN_WAIT_AFTER_CMD]
--command.test.cmd= Test command in command mode [$COMMAND_TEST_CMD]
--command.drain.cmd= Drain command in command mode [$COMMAND_DRAIN_CMD]
--command.uncordon.cmd= Uncordon command in command mode [$COMMAND_UNCORDON_CMD]
--kube.nodename= Kubernetes node name [$KUBE_NODENAME]
--kube.drain.args= Arguments for kubectl drain [$KUBE_DRAIN_ARGS]
--kube.drain.dry-run Do not drain, uncordon or label any node [$KUBE_DRAIN_DRY_RUN]
--notification= Shoutrrr url for notifications (https://containrrr.github.io/shoutrrr/) [$NOTIFICATION]
--notification.messagetemplate= Notification template (default: %v) [$NOTIFICATION_MESSAGE_TEMPLATE]
--metrics-requeststats Enable request stats metrics [$METRICS_REQUESTSTATS]
Help Options:
-h, --help Show this help message
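Each option above lists an environment variable in square brackets, so the same configuration can be supplied without flags. A minimal sketch, assuming that boolean options accept `true` as their environment value (this is not stated in the help output):

```shell
# Roughly equivalent to:
#   --drain.enable --drain.mode=command --drain.not-before=15m --azure.approve-scheduledevent
# Assumption: boolean options are switched on by the value "true".
export DRAIN_ENABLE=true
export DRAIN_MODE=command
export DRAIN_NOT_BEFORE=15m
export AZURE_APPROVE_SCHEDULEDEVENT=true

# Same value as the documented default, listed only to show the naming scheme.
export SERVER_BIND=":8080"
```

With these variables exported, `azure-scheduledevents-manager` can be started without any options; the same names can be passed to a container with `docker run -e`.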
Metrics
| Metric | Description |
|---|---|
| azure_scheduledevent_document_incarnation | Document incarnation number (version) |
| azure_scheduledevent_event | Fetched events from API |
| azure_scheduledevent_event_drain | Timestamp of drain (start and finish time) |
| azure_scheduledevent_event_approval | Timestamp of last event acknowledge |
| azure_scheduledevent_request | Request histogram (count and request duration; disabled by default) |
| azure_scheduledevent_request_error | Counter for failed requests |
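For a quick manual check of a single metric, the Prometheus text output can be filtered in the shell. A small sketch; the helper name `filter_metric` is hypothetical and not part of the tool:

```shell
# Read Prometheus text format on stdin and print only the samples of one metric.
# Matches "name{labels} value" and "name value", but not longer names that merely
# share the prefix (e.g. azure_scheduledevent_event_drain).
filter_metric() {
  grep -E "^$1(\{| )" || true
}
```

Assuming the default bind address `:8080`, it could be used as `curl -s http://localhost:8080/metrics | filter_metric azure_scheduledevent_event`.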
VM support
This example executes /host-drain.sh on the host when a ScheduledEvent is received.
The Docker container needs access to the host, so it requires elevated permissions (privileged, pid=host, and it must run as root).
The container can run with a read-only filesystem.
Run via docker:
docker run --restart=always --read-only --user=0 --privileged --pid=host --name=azure-scheduledevents-manager \
webdevops/azure-scheduledevents-manager:latest \
--drain.enable \
--drain.mode=command \
--drain.not-before=15m \
--azure.approve-scheduledevent \
--command.test.cmd="nsenter -m/proc/1/ns/mnt -- /usr/bin/test -x /host-drain.sh" \
--command.drain.cmd="nsenter -m/proc/1/ns/mnt -- /host-drain.sh \$EVENT_TYPE"
This example also passes the event type ($EVENT_TYPE) to /host-drain.sh as its first argument.
docker-compose:
version: "3"
services:
  scheduledEvents:
    image: webdevops/azure-scheduledevents-manager:latest
    command:
      - --drain.enable
      - --drain.mode=command
      - --drain.not-before=15m
      - --azure.approve-scheduledevent
      - --command.test.cmd="nsenter -m/proc/1/ns/mnt -- /usr/bin/test -x /host-drain.sh"
      - --command.drain.cmd="nsenter -m/proc/1/ns/mnt -- /host-drain.sh $$EVENT_TYPE"
    user: 0:0
    privileged: true
    pid: "host"
    read_only: true
    restart: always
Environment variables
All Docker environment variables are passed to the drain command, along with the following event variables:
- EVENT_ID
- EVENT_SOURCE
- EVENT_STATUS
- EVENT_TYPE
- EVENT_NOTBEFORE
- EVENT_RESOURCES
- EVENT_RESOURCETYPE
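A drain script can use these variables to decide what to do. The following is only a sketch of what a /host-drain.sh might look like: the action names are illustrative placeholders, and the exact casing of the EVENT_TYPE value is an assumption, so the comparison is made case-insensitive.

```shell
#!/bin/sh
# Hypothetical /host-drain.sh; the action names below are placeholders.
set -e

# Map an event type to an illustrative action name (case-insensitive).
drain_action() {
  type_lc=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$type_lc" in
    reboot|redeploy)   echo "stop-services" ;;
    preempt|terminate) echo "stop-and-flush" ;;
    *)                 echo "skip" ;;
  esac
}

# Prefer the first argument (as passed in the examples above),
# otherwise fall back to the EVENT_TYPE environment variable.
event_type="${1:-${EVENT_TYPE:-unknown}}"

echo "event=${EVENT_ID:-?} type=${event_type} not-before=${EVENT_NOTBEFORE:-?} action=$(drain_action "$event_type")"
```

A real script would replace the final `echo` with the actual shutdown steps for the workload on that host.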
Kubernetes deployment
see deployment
HTTP endpoints
| Endpoint | Description |
|---|---|
| /metrics | Prometheus metric endpoint |
| /healthz | Health endpoint (always HTTP 200 if running) |
| /readyz | Ready endpoint (always HTTP 200 if running and if no ScheduledEvent of type $DRAIN_EVENTS received) |
| /drainz | Ready endpoint (always HTTP 200 if running and if no ScheduledEvent of type $DRAIN_EVENTS received and drain was executed) |
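The endpoints can be probed from a shell, for example in a wrapper script or a manual check. A minimal sketch, assuming the server listens on the default `:8080` and treating any status other than 200 as "not ready" (the non-200 status code is not documented here); `BASE_URL`, `http_status` and `endpoint_ok` are hypothetical names:

```shell
# Assumption: the manager is reachable locally on the default bind address.
BASE_URL="${BASE_URL:-http://localhost:8080}"

# Print the HTTP status code of an endpoint (curl reports 000 if the request fails).
http_status() {
  curl --silent --output /dev/null --max-time 5 \
    --write-out '%{http_code}' "$BASE_URL$1" || true
}

# Succeed only when the given endpoint answers HTTP 200.
endpoint_ok() {
  [ "$(http_status "$1")" = "200" ]
}
```

For example, `endpoint_ok /readyz && echo "no pending drain event"` prints the message only while the ready endpoint returns HTTP 200.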