configuration-anomaly-detection

module
v0.0.0-...-96040f1
Warning

This package is not in the latest version of its module.

Published: Apr 26, 2024 License: Apache-2.0

README


Configuration Anomaly Detection

About

Configuration Anomaly Detection (CAD) reduces manual SRE effort by pre-investigating alerts, detecting cluster anomalies, and sending relevant communications to the cluster owner.

Contributing

To contribute to CAD, please see our CONTRIBUTING Document.

Documentation

CAD CLI

  • cadctl -- Performs investigation workflow.

Investigations

Every alert managed by CAD corresponds to an investigation, which is the code CAD executes for that alert.

Investigation-specific documentation can be found in the corresponding investigation folder, e.g. for ClusterHasGoneMissing.

Integrations

  • AWS -- Logging into the cluster, retrieving instance info and AWS CloudTrail events.
  • PagerDuty -- Retrieving alert info, escalating or silencing incidents, and adding notes.
  • OCM -- Retrieving cluster info, sending service logs, and managing (post, delete) limited support reasons.
  • osd-network-verifier -- Tool to verify the pre-configured networking components for ROSA and OSD CCS clusters.

Overview

  • CAD is a command-line tool that runs in Tekton pipelines.
  • The Tekton service runs on an app-sre cluster.
  • CAD is triggered by PagerDuty webhooks configured on selected services, so every alert in such a service triggers a CAD pipeline.
  • CAD uses the data received via the webhook to determine which investigation to start.

CAD Overview
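
The last step of the overview, choosing an investigation from the webhook data, could be sketched roughly as below. This is an illustration only, not CAD's actual code: the `webhookPayload` struct, the `investigationFor` and `selectInvestigation` helpers, and the idea of matching on a substring of the alert title are all assumptions. The short names `chgm` and `cpd` are borrowed from the investigation package names listed under Directories.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// webhookPayload is a hypothetical, trimmed-down view of the webhook body.
// The field names are assumptions for illustration, not a confirmed schema.
type webhookPayload struct {
	Event struct {
		Data struct {
			Title string `json:"title"`
		} `json:"data"`
	} `json:"event"`
}

// investigationFor maps an alert title to an investigation name.
// Matching on a title substring is an assumed convention.
func investigationFor(title string) (string, bool) {
	switch {
	case strings.Contains(title, "ClusterHasGoneMissing"):
		return "chgm", true
	case strings.Contains(title, "ClusterProvisioningDelay"):
		return "cpd", true
	default:
		return "", false
	}
}

// selectInvestigation decodes the webhook body and picks an investigation.
func selectInvestigation(body []byte) (string, error) {
	var p webhookPayload
	if err := json.Unmarshal(body, &p); err != nil {
		return "", fmt.Errorf("decoding webhook payload: %w", err)
	}
	name, ok := investigationFor(p.Event.Data.Title)
	if !ok {
		return "", fmt.Errorf("no investigation for alert %q", p.Event.Data.Title)
	}
	return name, nil
}

func main() {
	body := []byte(`{"event":{"data":{"title":"ClusterHasGoneMissing CRITICAL (1)"}}}`)
	name, err := selectInvestigation(body)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("investigation:", name)
}
```

Returning an error for unknown alerts keeps the decision explicit: the caller can decide whether an unmatched alert should be escalated to a human or ignored.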

Templates

  • Update-Template -- Updating configuration-anomaly-detection-template.Template.yaml.
  • OpenShift -- Used by app-interface to deploy the CAD resources on a target cluster.

Dashboards

Grafana dashboard configmaps are stored in the Dashboards directory. See app-interface for further documentation on dashboards.

Deployment

  • Tekton -- Installation/configuration of Tekton and triggering pipeline runs.
  • Skip Webhooks -- Skipping the eventlistener and creating the pipelinerun directly.
  • Namespace -- Allowing the code to ignore the namespace.

Boilerplate

PipelinePruner

Required ENV variables

  • CAD_OCM_CLIENT_ID: refers to the OCM client ID used by CAD to initialize the OCM client
  • CAD_OCM_CLIENT_SECRET: refers to the OCM client secret used by CAD to initialize the OCM client
  • CAD_OCM_URL: refers to the OCM URL used by CAD to initialize the OCM client
  • AWS_ACCESS_KEY_ID: refers to the access key id of the base AWS account used by CAD
  • AWS_SECRET_ACCESS_KEY: refers to the secret access key of the base AWS account used by CAD
  • CAD_AWS_CSS_JUMPROLE: refers to the ARN of the RH-SRE-CCS-Access jumprole
  • CAD_AWS_SUPPORT_JUMPROLE: refers to the ARN of the RH-Technical-Support-Access jumprole
  • CAD_ESCALATION_POLICY: refers to the escalation policy CAD should use when escalating an incident
  • CAD_PD_EMAIL: refers to the email for a login via email/password credentials
  • CAD_PD_PW: refers to the password for a login via email/password credentials
  • CAD_PD_TOKEN: refers to the generated private access token for token-based authentication
  • CAD_PD_USERNAME: refers to the username of CAD on PagerDuty
  • CAD_SILENT_POLICY: refers to the silent policy CAD should use when an incident is to be silenced
  • PD_SIGNATURE: refers to the PagerDuty webhook signature (HMAC+SHA256)
  • X_SECRET_TOKEN: refers to our custom Secret Token for authenticating against our pipeline
  • CAD_PROMETHEUS_PUSHGATEWAY: refers to the URL CAD will push metrics to
  • BACKPLANE_URL: refers to the backplane URL to use
  • BACKPLANE_INITIAL_ARN: refers to the initial ARN used for the isolated backplane jumprole flow
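
When running CAD locally it can help to check up front that none of the variables above are missing. The following is a minimal sketch of such a check, not part of CAD itself: `requiredEnv` and `missingEnv` are hypothetical names, and treating an empty value as missing is an assumption.

```go
package main

import (
	"fmt"
	"os"
)

// requiredEnv mirrors the names in the "Required ENV variables" list above.
var requiredEnv = []string{
	"CAD_OCM_CLIENT_ID", "CAD_OCM_CLIENT_SECRET", "CAD_OCM_URL",
	"AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY",
	"CAD_AWS_CSS_JUMPROLE", "CAD_AWS_SUPPORT_JUMPROLE",
	"CAD_ESCALATION_POLICY", "CAD_PD_EMAIL", "CAD_PD_PW",
	"CAD_PD_TOKEN", "CAD_PD_USERNAME", "CAD_SILENT_POLICY",
	"PD_SIGNATURE", "X_SECRET_TOKEN", "CAD_PROMETHEUS_PUSHGATEWAY",
	"BACKPLANE_URL", "BACKPLANE_INITIAL_ARN",
}

// missingEnv returns the names that are unset or empty according to lookup.
// lookup is injected so the check can be exercised without touching the
// real process environment (hypothetical helper, not a CAD function).
func missingEnv(names []string, lookup func(string) (string, bool)) []string {
	var missing []string
	for _, name := range names {
		if value, ok := lookup(name); !ok || value == "" {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	missing := missingEnv(requiredEnv, os.LookupEnv)
	if len(missing) > 0 {
		fmt.Println("missing required environment variables:", missing)
		return
	}
	fmt.Println("all required environment variables are set")
}
```

Reporting every missing name at once, rather than failing on the first, avoids a fix-one-rerun loop when setting up a new environment.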

Optional ENV variables

  • BACKPLANE_PROXY: refers to the proxy CAD uses for the isolated backplane access flow.

Note: BACKPLANE_PROXY is required for local development, as the backplane API is only accessible through the proxy.

For Red Hat employees, these environment variables can be found in the SRE-P vault.

Directories

Path -- Synopsis

Package main is the main package

  • cmd -- Package cmd holds the cadctl cobra data
  • cmd/investigate -- Package investigate holds the investigate command
  • hack
  • pkg
  • aws -- Package aws contains functions related to aws sdk
  • aws/mock -- Package awsmock is a generated GoMock package.
  • investigations -- Package investigation contains base functions for investigations
  • investigations/ccam -- Package ccam Cluster Credentials Are Missing (CCAM) provides a service for detecting missing cluster credentials
  • investigations/chgm -- Package chgm contains functionality for the chgm investigation
  • investigations/cpd -- Package cpd contains functionality for the ClusterProvisioningDelay investigation
  • logging -- Package logging wraps the zap logging package to provide easier access and initialization of the logger
  • managedcloud -- Package managedcloud contains functionality to access cloud environments of managed clusters
  • metrics -- Package metrics provides prometheus instrumentation for CAD
  • networkverifier -- Package networkverifier contains functionality for running the network verifier
  • ocm -- Package ocm contains ocm api related functions
  • ocm/mock -- Package ocmmock is a generated GoMock package.
  • pagerduty -- Package pagerduty contains wrappers for pagerduty api calls
  • pagerduty/mock -- Package pdmock is a generated GoMock package.
  • utils -- Package utils contains utility functions