reacter

package module
v1.1.0

This package is not in the latest version of its module.
Published: Nov 13, 2019 License: LGPL-2.1 Imports: 43 Imported by: 0

README

Reacter

A tool for generating, consuming, and handling system monitoring events

Overview

  1. Executes Nagios-compatible check scripts
  2. Collects output and prints it as a formatted JSON string to standard output
  3. Publishes output to an AMQP message broker
  4. Consumes output from an AMQP message broker
  5. Conditionally executes handler scripts based on check details and handler criteria

Checks: reacter check

Check scripts are consistent with the Nagios Plugin API: a check can be any shell-executable program that exits with status 0 (OK), 1 (Warning), 2 (Critical), or 3 or higher (Unknown). Plugin output and performance data are parsed from the check's standard output.
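As a rough illustration of these Nagios conventions, the sketch below maps an exit status to a state name and splits the first line of plugin output into its text and performance-data parts. The functions stateName and splitPluginOutput are hypothetical and not part of reacter's API; the state names are borrowed from the REACTER_STATE handler variable.

```go
package main

import "strings"

// stateName maps a Nagios plugin exit status to a state name:
// 0 = okay, 1 = warning, 2 = critical, anything else = unknown.
func stateName(exitStatus int) string {
	switch exitStatus {
	case 0:
		return "okay"
	case 1:
		return "warning"
	case 2:
		return "critical"
	default:
		return "unknown"
	}
}

// splitPluginOutput separates the human-readable text from the performance
// data on a plugin's first output line, following the Nagios convention
// "TEXT | label=value[UOM];warn;crit;min;max ...".
func splitPluginOutput(line string) (text string, perfdata []string) {
	parts := strings.SplitN(line, "|", 2)
	text = strings.TrimSpace(parts[0])
	if len(parts) == 2 {
		// Simplification: quoted labels containing spaces are not handled.
		perfdata = strings.Fields(parts[1])
	}
	return text, perfdata
}
```

Splitting performance data on whitespace is a simplification; the Nagios format allows quoted labels that contain spaces, which this sketch does not handle.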

Configuration

Checks are configured via a YAML file placed in a directory that Reacter loads definitions from (specified via the --config-dir flag). An example check definition looks like the following:

---
checks:
- name:                'my_cool_check'
  command:             ['my_cool_check', '--warning', '10', '--critical', '5']
  directory:           '/usr/local/bin'
  interval:            30
  timeout:             5000
  fall:                3
  rise:                2
  flap_threshold_high: 0.35
  flap_threshold_low:  0.15
  environment:
    HOME:      '/srv/home'
    OTHER_VAR: 6
    IS_COOL:   true
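The fields in the example above map fairly directly onto Go's os/exec package. The following is a minimal sketch, assuming a hypothetical checkDef type that holds an already-parsed definition (YAML parsing is omitted); it is not reacter's own implementation. It shows how command, directory, timeout (in milliseconds), and environment could be turned into an *exec.Cmd that is killed if the timeout expires.

```go
package main

import (
	"context"
	"fmt"
	"os/exec"
	"sort"
	"time"
)

// checkDef is a hypothetical, already-parsed form of a YAML check definition.
type checkDef struct {
	Command     []string
	Directory   string
	TimeoutMS   int                    // milliseconds, as in the check definition table
	Environment map[string]interface{} // YAML values may be strings, numbers, or booleans
}

// buildCmd prepares (but does not start) the check command. The caller must
// call cancel once the command has finished to release the timer.
func buildCmd(def checkDef) (*exec.Cmd, context.CancelFunc, error) {
	if len(def.Command) == 0 {
		return nil, nil, fmt.Errorf("check command is empty")
	}
	ctx, cancel := context.WithTimeout(context.Background(),
		time.Duration(def.TimeoutMS)*time.Millisecond)
	// CommandContext kills the process if the context expires first.
	cmd := exec.CommandContext(ctx, def.Command[0], def.Command[1:]...)
	cmd.Dir = def.Directory

	// A non-nil cmd.Env replaces the inherited environment entirely.
	if len(def.Environment) > 0 {
		keys := make([]string, 0, len(def.Environment))
		for k := range def.Environment {
			keys = append(keys, k)
		}
		sort.Strings(keys) // deterministic order
		cmd.Env = make([]string, 0, len(keys))
		for _, k := range keys {
			// %v renders values such as 6 or true as "6" and "true".
			cmd.Env = append(cmd.Env, fmt.Sprintf("%s=%v", k, def.Environment[k]))
		}
	}
	return cmd, cancel, nil
}
```

In Go, a bare command name such as my_cool_check is resolved via PATH, not relative to cmd.Dir; the sketch does not attempt to reproduce however reacter itself resolves the command.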

The configuration consists of a top-level checks array populated with one or more check definitions. Check definition fields are:

Field                Type              Required  Default  Description
name                 String            Yes       -        The name of the check
command              Array(String)     Yes       -        The command, expressed as an array of the executable followed by its command-line arguments
directory            String            No        $(pwd)   The working directory to use when executing the command
interval             Integer           No        60       How often (in seconds) to execute the check
timeout              Integer           No        3000     How long (in milliseconds) to let the check run before killing it
fall                 Integer           No        1        How many failed checks are required before a change in status is reported
rise                 Integer           No        1        How many successful checks are required, after failing, before the check is reported as okay
environment          Hash(String,Any)  No        -        Key-value pairs passed to the command as environment variables; replaces the calling shell's environment
flap_threshold_high  Float             No        0.5      The flap factor (0.0-1.0) above which the check starts flapping
flap_threshold_low   Float             No        0.25     The flap factor (0.0-1.0) below which the check stops flapping
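The rise and fall fields implement a form of hysteresis. The minimal sketch below assumes that results are reduced to okay/not-okay and that the failures or successes must be consecutive (the table does not say so explicitly); the hysteresis type is hypothetical. Its hard return value corresponds to the check.hard output field: false while a change is pending, true otherwise.

```go
package main

// hysteresis tracks the reported state of a check. A change from okay to
// failing is reported only after Fall consecutive failures, and a return to
// okay only after Rise consecutive successes.
type hysteresis struct {
	Rise, Fall int
	reportedOK bool // the last reported state
	streak     int  // consecutive results that disagree with reportedOK
}

func newHysteresis(rise, fall int) *hysteresis {
	return &hysteresis{Rise: rise, Fall: fall, reportedOK: true}
}

// Observe records one result. It returns the reported state and whether that
// state is hard (false while the check is still rising or falling).
func (h *hysteresis) Observe(ok bool) (reportedOK, hard bool) {
	if ok == h.reportedOK {
		h.streak = 0
		return h.reportedOK, true
	}
	h.streak++
	needed := h.Fall
	if ok {
		needed = h.Rise
	}
	if h.streak >= needed {
		h.reportedOK = ok
		h.streak = 0
		return h.reportedOK, true
	}
	return h.reportedOK, false
}
```

With rise: 2 and fall: 3, as in the example definition, three failures in a row are needed before the reported state changes, and two successes in a row before it returns to okay.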
Publication

Check results can be emitted to standard output for consumption by the reacter handle subcommand of this utility, or by another service or program. One intended use case is to emit results and HTTP POST them to a web service, which enqueues the messages to an AMQP message broker for later consumption by handlers.

The output format of a check is as follows:

{
  "check": {
    "node_name": "myhost",
    "name": "my_cool_check",
    "command": ["my_cool_check", "--warning", "10", "--critical", "5"],
    "timeout": 5000,
    "enabled": true,
    "state": 0,
    "hard": true,
    "changed": true,
    "interval": 30,
    "rise": 2,
    "fall": 3,
    "observations": {
      "size": 21,
      "flapping": false,
      "flap_detection": true,
      "flap_threshold_low": 0.15,
      "flap_threshold_high": 0.35,
      "flap_factor": 0
    }
  },
  "output": "OK",
  "error": false,
  "timestamp": "1970-01-01T12:59:00.000000000-04:00"
}

This example is formatted for readability; reacter check emits each result as a single line, one line per check execution. Some of the fields in the output are described below.

Field                           Type     Description
check.node_name                 String   The hostname of the host the check executed on, or the value of --node-name
check.enabled                   Boolean  Whether the check is enabled
check.state                     Integer  The exit status of the check script
check.hard                      Boolean  false while a check is in the process of rising or falling (its reported status stays unchanged until then); true otherwise
check.changed                   Boolean  true if the check's current state differs from its previous state
error                           Boolean  true if the check script encountered an error that prevented it from executing
check.observations.flapping     Boolean  true if the check is oscillating between an okay and a non-okay state
check.observations.size         Integer  How many of the most recent check states are kept in memory for flap detection
check.observations.flap_factor  Float    The current flap factor, which is compared to the high/low thresholds to determine whether the check is flapping
output                          String   The standard output captured from the check script's execution
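Because each result is one JSON object per line, a consumer written in Go can decode just the fields it needs with encoding/json. The checkResult type below is a hypothetical subset of the output, not the package's CheckEvent type. Since CheckEvent tags error with omitempty, a line without an error field simply decodes it as false.

```go
package main

import (
	"encoding/json"
	"time"
)

// checkResult mirrors a subset of the output fields; fields not listed
// here are ignored by encoding/json.
type checkResult struct {
	Check struct {
		NodeName     string `json:"node_name"`
		Name         string `json:"name"`
		State        int    `json:"state"`
		Hard         bool   `json:"hard"`
		Changed      bool   `json:"changed"`
		Observations struct {
			Flapping   bool    `json:"flapping"`
			FlapFactor float64 `json:"flap_factor"`
		} `json:"observations"`
	} `json:"check"`
	Output    string    `json:"output"`
	Error     bool      `json:"error"`
	Timestamp time.Time `json:"timestamp"` // RFC 3339, fractional seconds allowed
}

// parseResult decodes one line of reacter check output.
func parseResult(line []byte) (checkResult, error) {
	var r checkResult
	err := json.Unmarshal(line, &r)
	return r, err
}
```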

Handlers: reacter handle

Handlers are executed in response to check results read from standard input. Handler definitions specify the conditions under which a handler is executed, such as node name, check name, state, whether the check is flapping, and whether the check has changed state. These conditions let a handler respond to only a subset of the check results as they stream in. Because each result is evaluated against every handler definition, multiple handlers can respond to the same result.
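The matching can be pictured as a filter that every incoming result passes through once per handler. The sketch below models a simplified subset of the handler fields (node_names, checks, skip_ok, skip_flapping, only_changes); the event and rule types and the matches and handlersFor functions are hypothetical, and query, nodefile, and states handling are omitted.

```go
package main

import "sort"

// event holds the check-result fields a handler can match on.
type event struct {
	Node, Check string
	State       int // 0 = okay
	Flapping    bool
	Changed     bool
}

// rule is a hypothetical, simplified handler definition.
type rule struct {
	NodeNames    []string // empty = any node
	CheckNames   []string // empty = any check
	SkipOK       bool
	SkipFlapping bool
	OnlyChanges  bool
}

func contains(list []string, s string) bool {
	for _, v := range list {
		if v == s {
			return true
		}
	}
	return false
}

// matches reports whether the rule's handler should run for ev.
func (r rule) matches(ev event) bool {
	switch {
	case len(r.NodeNames) > 0 && !contains(r.NodeNames, ev.Node):
		return false
	case len(r.CheckNames) > 0 && !contains(r.CheckNames, ev.Check):
		return false
	case r.SkipOK && ev.State == 0:
		return false
	case r.SkipFlapping && ev.Flapping:
		return false
	case r.OnlyChanges && !ev.Changed:
		return false
	}
	return true
}

// handlersFor evaluates every rule, so several handlers may fire for one event.
func handlersFor(rules map[string]rule, ev event) []string {
	var names []string
	for name, r := range rules {
		if r.matches(ev) {
			names = append(names, name)
		}
	}
	sort.Strings(names) // map iteration order is random
	return names
}
```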

Configuration

Handlers, like checks, are configured via a YAML file placed in a directory that Reacter loads definitions from (specified via the --config-dir flag). An example handler definition looks like the following:

---
handlers:
- name:                'my_team_slack_chat'
  command:             ['reacter-slack']
  timeout:             6000
  directory:           '/usr/local/bin'
  query:               ['bash', '-c', 'get_my_nodes > /tmp/node-list.txt']
  query_timeout:       3000
  nodefile:            /tmp/node-list.txt

  node_names:
  - my_node1
  - my_node2

  checks:
  - my_cool_check

  skip_flapping: true
  only_changes: true

  parameters:
    token:   abc123def456
    channel: my-channel

  environment:
    HOME:      '/srv/home'

The configuration consists of a top-level handlers array populated with one or more handler definitions. Handler definition fields are:

Field          Type              Required  Default  Description
checks         Array(String)     No        -        A list of check names to respond to
command        Array(String)     Yes       -        The handler command, expressed as an array of the executable followed by its command-line arguments
cooldown       Duration          No        3000     How long to wait after the handler has fired before it may fire again
directory      String            No        $(pwd)   The working directory to use when executing the command
disable        Boolean           No        false    Whether to disable the handler
environment    Hash(String,Any)  No        -        Key-value pairs passed to the handler command as environment variables; replaces the calling shell's environment
name           String            Yes       -        The name of the handler
node_names     Array(String)     No        -        A list of nodes to respond to (overrides query and nodefile)
nodefile       String            No        -        The path to a file containing a list of nodes to respond to
only_changes   Boolean           No        false    Whether to handle only state changes (uses the check result's changed field)
parameters     Hash(String,Any)  No        -        Key-value pairs passed to the handler command as environment variables, prefixed with REACTER_PARAM_
query_timeout  Duration          No        3000     How long to wait for the query command to finish before killing it
query          Array(String)     No        -        A command, run before the handler, that returns a list of nodes to respond to
skip_flapping  Boolean           No        true     Whether to skip results from flapping checks
skip_ok        Boolean           No        false    Whether to skip results in an okay state (i.e., handle only non-okay states)
timeout        Duration          No        6000     How long to wait for the handler command to finish before killing it
Handler Scripts

Handler scripts are executed only when a handler definition's conditions are met. They can do whatever is needed in response to a check result; typical examples include sending a PagerDuty alert, posting a notification to a Slack channel, or forwarding check data to a time series database. Handler scripts are called with several well-known environment variables that provide context-specific details about the check result being handled. These variables include:

Environment Variable    Description
REACTER_CHECK_ID        The check's node name and check name, joined by a colon (node:check). This uniquely identifies a check from a specific node, which is useful for services that need stateful information to clear events after they are first generated.
REACTER_CHECK_NAME      The name of the check being handled
REACTER_CHECK_NODE      The node name that the check was emitted from (corresponds to --node-name from reacter check)
REACTER_EPOCH           The epoch time of the check event (seconds since Jan 1, 1970)
REACTER_EPOCH_MS        The epoch time of the check event (milliseconds since Jan 1, 1970)
REACTER_HANDLER         The name of the handler as given in the handler definition
REACTER_STATE           The state of the check result being handled; one of "okay", "warning", "critical", or "unknown"
REACTER_STATE_CHANGED   0 if the state is unchanged, 1 if the check's state has changed
REACTER_STATE_FLAPPING  0 if the check is not flapping, 1 if it is
REACTER_STATE_HARD      0 if the check is rising or falling, 1 if the check is in a hard state
REACTER_STATE_ID        The numeric exit status of the check result, as emitted by the check script
REACTER_PARAM_*         One variable for each key in the handler definition's parameters hash; keys are converted to uppercase
Node Queries and Caching Features

Documentation

Index

Constants

const (
	DEFAULT_AMQP_PORT  = 5672
	DEFAULT_QUEUE_NAME = `reacter`
)
const (
	Unknown MeasurementUnit = 0
	Numeric                 = 1
	Time                    = 2
	Percent                 = 3
	Bytes                   = 4
	Counter                 = 5
)
const (
	SuccessState  ObservationState = 0
	WarningState                   = 1
	CriticalState                  = 2
	UnknownState                   = 3
)
const DefaultFlapHighThresh = 0.5
const DefaultFlapLowThresh = 0.25
const DefaultMaxObservations = 21
const FlapBaseCoefficient = 0.8
const FlapWeightMultiplier = 0.02
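The constants DefaultMaxObservations, FlapBaseCoefficient, and FlapWeightMultiplier resemble the weighted state-change calculation used for flap detection in Nagios, where recent state changes count more than older ones. The following sketch implements that style of calculation, together with hysteresis between the low and high thresholds; it is an assumption about how the constants are used, not a transcription of reacter's code.

```go
package main

// Values copied from the constants above.
const (
	flapBaseCoefficient  = 0.8
	flapWeightMultiplier = 0.02
)

// flapFactor returns a weighted fraction of state changes across states
// (oldest first). Transition i (0-based) is weighted
// flapBaseCoefficient + flapWeightMultiplier*i, so newer changes weigh more.
func flapFactor(states []int) float64 {
	if len(states) < 2 {
		return 0
	}
	var sum float64
	for i := 1; i < len(states); i++ {
		if states[i] != states[i-1] {
			sum += flapBaseCoefficient + flapWeightMultiplier*float64(i-1)
		}
	}
	return sum / float64(len(states)-1)
}

// nextFlapping starts flapping at or above high and stops only below low,
// so a factor between the thresholds keeps the previous status.
func nextFlapping(flapping bool, factor, low, high float64) bool {
	if !flapping && factor >= high {
		return true
	}
	if flapping && factor < low {
		return false
	}
	return flapping
}
```

With 21 alternating states, every one of the 20 transitions is a change, so the factor is (20 × 0.8 + 0.02 × 190) / 20 = 0.99, well above the default high threshold of 0.5.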

Variables

var DefaultCacheDir = executil.RootOrString(
	`/dev/shm/reacter/handler-queries`,
	`~/.cache/reacter`,
)
var DefaultCheckInterval = 60
var DefaultCheckTimeout = 10000
var DefaultConfigDir = executil.RootOrString(`/etc/reacter/conf.d`, `~/.config/reacter.d`)
var DefaultConfigFile = executil.RootOrString(`/etc/reacter.yml`, `~/.config/reacter.yml`)
var DefaultHandleExecTimeout = 6 * time.Second
var DefaultHandleQueryExecTimeout = 3 * time.Second
var ZeroconfInstanceName = `reacter`

Functions

func Dir added in v1.0.1

func Dir(useLocal bool, name string) http.FileSystem

Dir returns an http.FileSystem for the embedded assets under the given prefix directory. If useLocal is true, the contents of the local filesystem are used instead.

func DiscoverEC2ByTag added in v1.0.5

func DiscoverEC2ByTag(tagName string, values ...string) ([]*netutil.Service, error)

func FS added in v1.0.1

func FS(useLocal bool) http.FileSystem

FS returns an http.FileSystem for the embedded assets. If useLocal is true, the contents of the local filesystem are used instead.

func FSByte added in v1.0.1

func FSByte(useLocal bool, name string) ([]byte, error)

FSByte returns the named file from the embedded assets. If useLocal is true, the file is read from the local filesystem instead.

func FSMustByte added in v1.0.1

func FSMustByte(useLocal bool, name string) []byte

FSMustByte is the same as FSByte, but panics if name is not present.

func FSMustString added in v1.0.1

func FSMustString(useLocal bool, name string) string

FSMustString is the string version of FSMustByte.

func FSString added in v1.0.1

func FSString(useLocal bool, name string) (string, error)

FSString is the string version of FSByte.

Types

type Check

type Check struct {
	UID               string                 `json:"id"`
	NodeName          string                 `json:"node_name"`
	Name              string                 `json:"name"`
	Command           interface{}            `json:"command"`
	Timeout           interface{}            `json:"timeout"`
	Enabled           bool                   `json:"enabled"`
	State             ObservationState       `json:"state"`
	HardState         bool                   `json:"hard"`
	StateChanged      bool                   `json:"changed"`
	Parameters        map[string]interface{} `json:"parameters"`
	Environment       map[string]string      `json:"environment"`
	Directory         string                 `json:"directory,omitempty"`
	Interval          interface{}            `json:"interval"`
	FlapThresholdHigh float64                `json:"flap_threshold_high"`
	FlapThresholdLow  float64                `json:"flap_threshold_low"`
	Rise              int                    `json:"rise"`
	Fall              int                    `json:"fall"`
	Observations      *Observations          `json:"observations"`
	EventStream       chan CheckEvent        `json:"-"`
	StopMonitorC      chan bool              `json:"-"`
}

func NewCheck

func NewCheck() *Check

func (*Check) Execute

func (self *Check) Execute() (Observation, error)

func (*Check) ID

func (self *Check) ID() string

func (*Check) IsFallen

func (self *Check) IsFallen() bool

func (*Check) IsFlapping

func (self *Check) IsFlapping() bool

func (*Check) IsOK

func (self *Check) IsOK() bool

func (*Check) IsRisen

func (self *Check) IsRisen() bool

func (*Check) Monitor

func (self *Check) Monitor(eventStream chan CheckEvent) error

func (*Check) StateString

func (self *Check) StateString() string

type CheckEvent

type CheckEvent struct {
	Check       *Check       `json:"check"`
	Observation *Observation `json:"observation,omitempty"`
	Output      string       `json:"output,omitempty"`
	Error       bool         `json:"error,omitempty"`
	Timestamp   time.Time    `json:"timestamp"`
}

type Config

type Config struct {
	ChecksDefinitions []Check `json:"checks"`
}

type Consumer

type Consumer struct {
	ID         string
	Host       string
	Port       int
	Username   string
	Password   string
	Vhost      string
	QueueName  string
	Durable    bool
	Autodelete bool
	Exclusive  bool
	// contains filtered or unexported fields
}

func NewConsumer

func NewConsumer(uri string) (*Consumer, error)

func (*Consumer) Close

func (self *Consumer) Close() error

func (*Consumer) Connect

func (self *Consumer) Connect() error

func (*Consumer) Subscribe

func (self *Consumer) Subscribe() (<-chan string, error)

func (*Consumer) SubscribeRaw

func (self *Consumer) SubscribeRaw() (<-chan amqp.Delivery, error)

type EventRouter

type EventRouter struct {
	NodeName   string
	Handlers   []*Handler
	ConfigFile string
	ConfigDir  string
	CacheDir   string
}

func NewEventRouter

func NewEventRouter() *EventRouter

func (*EventRouter) AddHandler

func (self *EventRouter) AddHandler(handler *Handler) error

func (*EventRouter) LoadConfig

func (self *EventRouter) LoadConfig(path string) error

func (*EventRouter) RegenerateCache

func (self *EventRouter) RegenerateCache()

func (*EventRouter) ReloadConfig

func (self *EventRouter) ReloadConfig() error

Loads the ConfigFile (if present), and recursively scans and loads all *.yml files in ConfigDir.

func (*EventRouter) Run

func (self *EventRouter) Run(input io.Reader) error

func (*EventRouter) RunQueryCacher

func (self *EventRouter) RunQueryCacher(interval time.Duration) error

type Handler

type Handler struct {
	Name               string            `json:"name"`
	QueryCommand       interface{}       `json:"query,omitempty"`
	NodeFile           string            `json:"nodefile,omitempty"`
	NodeFileAutoreload bool              `json:"nodefile_autoreload,omitempty"`
	NodeNames          []string          `json:"node_names,omitempty"`
	SkipOK             bool              `json:"skip_ok"`
	CheckNames         []string          `json:"checks,omitempty"`
	States             []int             `json:"states,omitempty"`
	SkipFlapping       bool              `json:"skip_flapping"`
	OnlyChanges        bool              `json:"only_changes"`
	Command            interface{}       `json:"command,omitempty"`
	Environment        map[string]string `json:"environment,omitempty"`
	Parameters         map[string]string `json:"parameters,omitempty"`
	Directory          string            `json:"directory,omitempty"`
	Disable            bool              `json:"disable,omitempty"`
	Timeout            interface{}       `json:"timeout,omitempty"`
	Cooldown           interface{}       `json:"cooldown,omitempty"`
	QueryTimeout       interface{}       `json:"query_timeout,omitempty"`
	CacheDir           string            `json:"-"`
	// contains filtered or unexported fields
}

func (*Handler) Execute

func (self *Handler) Execute(event CheckEvent) error

func (*Handler) ExecuteNodeQuery

func (self *Handler) ExecuteNodeQuery() ([]string, error)

func (*Handler) GetCacheFilename

func (self *Handler) GetCacheFilename() string

func (*Handler) LoadNodeFile

func (self *Handler) LoadNodeFile()

func (*Handler) ShouldExec

func (self *Handler) ShouldExec(check *Check) bool

type HandlerConfig

type HandlerConfig struct {
	HandlerDefinitions []Handler `json:"handlers"`
}

type Measurement

type Measurement struct {
	Unit              MeasurementUnit `json:"unit"`
	Value             float64         `json:"value"`
	WarningThreshold  float64         `json:"warning"`
	CriticalThreshold float64         `json:"critical"`
	Minumum           float64         `json:"minimum"`
	Maximum           float64         `json:"maximum"`
}

func (*Measurement) SetValues

func (self *Measurement) SetValues(valueUOM string, warn string, crit string, min string, max string) error
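SetValues takes a value with an attached unit of measure, as found in Nagios performance data (for example 2643MB or 72%). The sketch below shows one way such a string could be split and classified into the MeasurementUnit categories listed under Constants. The suffix mapping follows the Nagios plugin guidelines and is an assumption about what reacter does; the unit type and parseValueUOM function are hypothetical, and values are not rescaled.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// unit mirrors the MeasurementUnit constants (Unknown = 0 through Counter = 5).
type unit int

const (
	unitUnknown unit = iota
	unitNumeric
	unitTime
	unitPercent
	unitBytes
	unitCounter
)

// parseValueUOM splits a perfdata value such as "2643MB" into its numeric
// part and unit class. Values are not rescaled (MB stays MB).
func parseValueUOM(s string) (float64, unit, error) {
	// Walk back from the end over the non-numeric unit suffix.
	i := len(s)
	for i > 0 && !strings.ContainsRune("0123456789.", rune(s[i-1])) {
		i--
	}
	num, uom := s[:i], s[i:]
	v, err := strconv.ParseFloat(num, 64)
	if err != nil {
		return 0, unitUnknown, fmt.Errorf("invalid value %q: %v", s, err)
	}
	switch uom {
	case "":
		return v, unitNumeric, nil
	case "s", "ms", "us":
		return v, unitTime, nil
	case "%":
		return v, unitPercent, nil
	case "B", "KB", "MB", "GB", "TB":
		return v, unitBytes, nil
	case "c":
		return v, unitCounter, nil
	}
	return v, unitUnknown, nil
}
```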

type MeasurementUnit

type MeasurementUnit int32

type Observation

type Observation struct {
	Timestamp       time.Time              `json:"-"`
	State           ObservationState       `json:"-"`
	Output          []string               `json:"-"`
	Errors          []string               `json:"-"`
	PerformanceData map[string]Measurement `json:"measurements,omitempty"`
}

func (*Observation) SetState

func (self *Observation) SetState(state int)

type ObservationState

type ObservationState int32

func (ObservationState) String added in v1.0.1

func (self ObservationState) String() string

type Observations

type Observations struct {
	Values            []Observation `json:"-"`
	Size              int           `json:"size"`
	Flapping          bool          `json:"flapping"`
	FlapDetect        bool          `json:"flap_detection"`
	FlapThresholdLow  float64       `json:"flap_threshold_low"`
	FlapThresholdHigh float64       `json:"flap_threshold_high"`
	StateChangeFactor float64       `json:"flap_factor"`
}

func NewObservations

func NewObservations() *Observations

func (*Observations) Push

func (self *Observations) Push(observation Observation) error
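Observations keeps a bounded history of recent results (DefaultMaxObservations is 21) for flap detection. A minimal sketch of such a bounded window is below; the window type is hypothetical, stores only integer states, and returns an error in the same spirit as Push's signature.

```go
package main

import "errors"

// window keeps at most size of the most recent states, dropping the oldest.
type window struct {
	size   int
	values []int
}

// Push appends a state and trims the history to the configured size.
func (w *window) Push(state int) error {
	if w.size <= 0 {
		return errors.New("window size must be positive")
	}
	w.values = append(w.values, state)
	if len(w.values) > w.size {
		// Copy the newest entries so the backing array does not grow forever.
		w.values = append([]int(nil), w.values[len(w.values)-w.size:]...)
	}
	return nil
}
```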

type Reacter

type Reacter struct {
	NodeName         string             `json:"name"`
	Peers            []*netutil.Service `json:"peers"`
	Checks           []*Check           `json:"-"`
	Events           chan CheckEvent    `json:"-"`
	ConfigFile       string             `json:"-"`
	ConfigDir        string             `json:"-"`
	PrintJson        bool               `json:"-"`
	WriteJson        io.Writer          `json:"-"`
	OnlyPrintChanges bool               `json:"-"`
	SuppressFlapping bool               `json:"-"`
	// contains filtered or unexported fields
}

func NewReacter

func NewReacter() *Reacter

func (*Reacter) AddCheck

func (self *Reacter) AddCheck(checkConfig Check) error

func (*Reacter) LoadConfig

func (self *Reacter) LoadConfig(path string) error

Loads the configuration file at the given path and appends any checks it defines to this instance.

func (*Reacter) ReloadConfig

func (self *Reacter) ReloadConfig() error

Loads the ConfigFile (if present), and recursively scans and loads all *.yml files in ConfigDir.

func (*Reacter) Run

func (self *Reacter) Run() error

func (*Reacter) StartEventProcessing

func (self *Reacter) StartEventProcessing()

type Server added in v1.0.1

type Server struct {
	ZeroconfMDNS   bool
	ZeroconfEC2Tag string
	PathPrefix     string
	// contains filtered or unexported fields
}

func NewServer added in v1.0.1

func NewServer(reacter *Reacter) *Server

func (*Server) ListenAndServe added in v1.0.1

func (self *Server) ListenAndServe(address string) error

Directories

Path Synopsis
cmd
