goplum

package module
v0.7.0
Published: Jun 1, 2023 License: MIT Imports: 19 Imported by: 0

README

:toc:
:toc-placement!:

image::.images/banner.png?raw=true[Goplum]

Goplum is an extensible monitoring and alerting daemon designed for
personal infrastructure and small businesses. It can monitor
websites and APIs, and send alerts by a variety of means if they go down.

toc::[]

== Features

**Highly extensible**: Goplum supports plugins written in Go
to define new monitoring rules and alert types, and it has an API
for integration with other services and tools.

**Get alerts anywhere**: Goplum supports a variety of ways to
alert you out-of-the-box:

[width="100%",cols="3",frame="none",grid="none"]
|=====
| image:.images/alerts/discord.png[Discord logo] Discord
| image:.images/alerts/mail.png[Mail icon] E-mail
| image:.images/alerts/msteams.png[Microsoft Teams icon] Microsoft Teams
| image:.images/alerts/phone.png[Phone icon] Phone call (via Twilio)
| image:.images/alerts/pushover.png[Pushover logo] Pushover
| image:.images/alerts/slack.png[Slack logo] Slack
| image:.images/alerts/sms.png[SMS icon] SMS (via Twilio)
| image:.images/alerts/webhook.png[Webhook logo] Webhook
|
|=====

**Lightweight**: Goplum has a small resource footprint, and all
checks are purpose-written in Go. No need to worry about chains
of interdependent scripts being executed.

**Heartbeat monitoring**: Have an offline service or a cron job
that you want to monitor? Have it send a heartbeat to Goplum
periodically and get alerted if it stops.

**Simple to get started**: If you're set up to run services in
containers, you can get Goplum up and running in a couple of minutes.

== Getting started

=== Basic configuration

Goplum works by running a number of _checks_ (which test to see
if a service is working or not). When a check changes state, Goplum
runs an _alert_ that notifies you about the change.

Checks and alerts are both defined in Goplum's config file. A
minimal example looks like this:

[source]
----
check http.get "example.com" { <1>
  url = "https://example.com/" <2>
}

alert twilio.sms "Text Bob" { <3>
  sid = "sid"
  token = "token"
  from = "+01 867 5309"
  to = "+01 867 5309" <4>
}
----
<1> Goplum's configuration consists of "blocks". The contents
    of the blocks are placed within braces (`{}`). This is
    a "check block"; these will likely make up the bulk of your
    configuration.
    * `http.get` is the type of check we want to execute. The
      `http` part indicates it comes from the HTTP plugin, while
      the `get` part is the type of check.
    * All checks (and alerts) have a unique name, in this case
      we've called it "example.com". If a check starts to fail,
      the alert you receive will contain the check name.
<2> Parameters for the check are specified as `key = value`
    pairs within the body of the check. The documentation for
    each check and alert will explain what parameters are available,
    and whether they're required or not.
<3> Like checks, alerts have both a type and a name. Here we're
    using the `sms` alert from the `twilio` plugin, and we've
    named it `Text Bob`.
<4> The `twilio.sms` alert has a number of required parameters
    that define the account you wish to use and the phone numbers
    involved. These are all just given as `key = value` pairs.

This simple example will try to retrieve \https://example.com/
every thirty seconds. If it fails twice in a row, a text
message will be sent using Twilio. If it then passes twice in a row,
another message will be sent saying it has recovered.
Don't worry - these numbers are all configurable: see the
<<Default Settings>> section.

In this example we used the `http.get` check and the `twilio.sms`
alert. See the <<Available checks and alerts>> section for details
of the other types available by default.

There is a complete link:docs/syntax.adoc[syntax guide] available
in the `docs` folder if you need to look up a specific aspect of
the configuration syntax.

=== Docker

The easiest way to run Goplum is using Docker. Goplum doesn't require
any privileges, settings, or ports exposed to get a basic setup
running. It just needs the configuration file, and optionally a
file it can use to persist data across restarts.

Running it via the command line:

[source,shell]
----
# Create a configuration file
vi goplum.conf

# Make a 'tombstone' file that Goplum's unprivileged user can write
touch goplum.tomb
chown 65532:65532 goplum.tomb

# Start goplum
docker run -d --restart always \
   -v "$(pwd)/goplum.conf:/goplum.conf:ro" \
   -v "$(pwd)/goplum.tomb:/tmp/goplum.tomb" \
   ghcr.io/csmith/goplum
----

Or using Docker Compose:

[source,yaml]
----
version: "3.8"

services:
  goplum:
    image: ghcr.io/csmith/goplum
    volumes:
      - ./goplum.conf:/goplum.conf
      - ./goplum.tomb:/tmp/goplum.tomb
    restart: always
----

The `latest` tag points to the latest stable release of Goplum. If
you wish to run the very latest build from this repository, you can
use the `dev` tag.

=== Without Docker

While Docker is the easiest way to run Goplum, it's not that hard to run it
directly on a host without containerisation. See the
link:docs/baremetal.adoc[installing without Docker] guide for more information.

== Usage

=== Available checks and alerts

All checks and alerts in Goplum are implemented as plugins. The following are maintained in
this repository and are available by default in the Docker image. Each plugin has its own
documentation that explains how its checks and alerts need to be configured.

|====
| Plugin | checks | alerts

| link:plugins/discord[discord]
| -
| message

| link:plugins/http[http]
| get, healthcheck
| webhook

| link:plugins/network[network]
| connect, portscan
| -

| link:plugins/heartbeat[heartbeat]
| received
| -

| link:plugins/msteams[msteams]
| -
| message

| link:plugins/pushover[pushover]
| -
| message

| link:plugins/slack[slack]
| -
| message

| link:plugins/smtp[smtp]
| -
| send

| link:plugins/snmp[snmp]
| int, string
| -

| link:plugins/twilio[twilio]
| -
| call, sms

| link:plugins/debug[debug]
| random
| sysout

| link:plugins/exec[exec]
| command
| -
|====

The `docs` folder contains link:docs/example.conf[an example configuration file]
with a fully configured example of every check and alert.

=== Settling and thresholds

When Goplum first starts, it is not aware of the current state of your services.
To avoid immediately sending alerts when the state is determined, Goplum waits for
each check to **settle** into a state, and then only alerts when that state
subsequently changes.

Goplum uses **thresholds** to decide how many times a check result must happen in
a row before it's considered settled. By default, the threshold is two "good"
results or two "failing" results, but this can be changed - see <<Default Settings>>.

For example:

----
 Goplum                    Failing            Recovery
 starts                     Alert               Alert
   ↓                          ↓                   ↓
    ✓ ✓ ✓ ✓ ✓ ✓ ✓ 🗙 ✓ ✓ ✓ 🗙 🗙 🗙 🗙 🗙 ✓ 🗙 ✓ 🗙 ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ …
       ↑                      ↑                   ↑
  State settles          State becomes       State becomes
    as "good"              "failing"            "good"
----
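
The timeline above can be approximated with a small Go sketch. This is a simplified illustration under stated assumptions, not Goplum's actual implementation: the `state`, `tracker` and `observe` names are hypothetical, and both thresholds are assumed to be two.

```go
package main

import "fmt"

// state is a hypothetical stand-in for the state of a check.
type state int

const (
	indeterminate state = iota
	good
	failing
)

// tracker counts consecutive identical results and records the settled state.
type tracker struct {
	thresholds map[state]int
	current    state // settled state; indeterminate until the check settles
	last       state // most recent result seen
	streak     int   // how many times in a row last has been seen
}

// observe records a result and reports whether an alert should be raised.
// The first time the state settles, no alert is raised.
func (t *tracker) observe(result state) bool {
	if result == t.last {
		t.streak++
	} else {
		t.last, t.streak = result, 1
	}
	threshold, ok := t.thresholds[result]
	if !ok || t.streak < threshold || result == t.current {
		return false
	}
	alert := t.current != indeterminate
	t.current = result
	return alert
}

func main() {
	tr := &tracker{thresholds: map[state]int{good: 2, failing: 2}}
	results := []state{good, good, failing, good, failing, failing, good, good}
	for i, r := range results {
		if tr.observe(r) {
			fmt.Printf("result %d: alert raised, state is now %d\n", i, tr.current)
		}
	}
}
```

A single failure between passing results never reaches the threshold, so it resets the streak without raising an alert.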

=== Default Settings

All checks have a number of additional settings to control how they work. These can be
specified for each check, or changed globally by putting them in the "defaults" section.
If they're not specified then Goplum's built-in defaults will be used.

|===
|Setting |Description |Default

|`interval`
|Length of time between each run of the check.
|`30s`

|`timeout`
|Maximum length of time the check can run for before it's terminated.
|`20s`

|`alerts`
|A list of alert names to trigger when the service changes state.
 Supports '*' as a wildcard.
|`["*"]`

|`failing_threshold`
|The number of checks that must fail in a row before a failure alert is raised.
|`2`

|`good_threshold`
|The number of checks that must pass in a row before a recovery alert is raised.
|`2`
|===

For example, to change the `interval` and `timeout` for all checks:

[source,goplum]
----
defaults {
  interval = 2m
  timeout = 30s
}
----

Or to specify a custom timeout and alerts for one check:

[source,goplum]
----
check http.get "get" {
  url = "https://www.example.com/"
  timeout = 60s
  alerts = ["Text Bob"]
}
----

== Advanced topics

=== Creating new plugins

Goplum is designed to be easily extensible. Plugins must have a main package which contains
a function named "Plum" that returns an implementation of `goplum.Plugin`. They are then
compiled with the `-buildmode=plugin` flag to create a shared library.

The Docker image loads plugins recursively from the `/plugins` directory, allowing you to
mount custom folders if you wish to supply your own plugins.

Note that the Go plugin loader does not work on Windows. For Windows-based development,
the `goplumdev` command hardcodes plugins, skipping the loader.

=== gRPC API

In addition to allowing plugins to define new checks and alerts, Goplum provides a gRPC
API to enable development of custom tooling and facilitate use cases not supported by
Goplum itself (e.g. persisting check history indefinitely). The API is currently in
development; more information can be found in the link:docs/api.adoc[API documentation].

=== plumctl command-line tool

Goplum comes with `plumctl`, a command-line interface to inspect the state of Goplum
as well as perform certain operations such as pausing and resuming a check. `plumctl`
uses the <<gRPC API>>. For more information see the
link:docs/plumctl.adoc[plumctl documentation].

== Licence and credits

Goplum is licensed under the MIT licence. A full copy of the licence is available in
the link:LICENCE[LICENCE] file.

Some icons in this README are modifications of the Material Design icons created by Google
and released under the https://www.apache.org/licenses/LICENSE-2.0.html[Apache 2.0 licence].

Goplum makes use of a number of third-party libraries. See the link:go.mod[go.mod] file
for a list of direct dependencies. Users of the Docker image will find a copy of the
relevant licence and notice files under the `/notices` directory in the image.

Documentation

Constants

This section is empty.

Variables

var DefaultSettings = CheckSettings{
	Alerts:           []string{"*"},
	Interval:         time.Second * 30,
	Timeout:          time.Second * 20,
	GoodThreshold:    2,
	FailingThreshold: 2,
}

Functions

func Run added in v0.3.0

func Run(plugins map[string]PluginLoader, configPath string)

Run creates a new instance of Plum, registers plugins and loads configuration, and starts the main loop. Listens for interrupt and sigterm signals in order to save state and clean up. It is expected that flag.Parse has been called prior to calling this method.

Types

type Alert

type Alert interface {
	// Send dispatches an alert in relation to the given check event.
	Send(details AlertDetails) error
}

Alert defines the method to inform the user of a change to a service - e.g. when it comes up or goes down.

Alerts may also implement the Validator interface to validate their arguments when configured.

type AlertDetails

type AlertDetails struct {
	// Text is a short, pre-generated message describing the alert.
	Text string `json:"text"`
	// Name is the name of the check that transitioned.
	Name string `json:"name"`
	// Type is the type of check involved.
	Type string `json:"type"`
	// Config is the user-supplied parameters to the check.
	Config interface{} `json:"config"`
	// LastResult is the most recent result that caused the transition.
	LastResult *Result `json:"last_result"`
	// PreviousState is the state this check was previously in.
	PreviousState CheckState `json:"previous_state"`
	// NewState is the state this check is now in.
	NewState CheckState `json:"new_state"`
}

AlertDetails contains information about a triggered alert.

type Check

type Check interface {
	// Execute performs the actual check to see if the service is up or not.
	// It should block until a result is available or the passed context is cancelled.
	Execute(ctx context.Context) Result
}

Check defines the method to see if a service is up or not. The check is persistent - its Execute method will be called repeatedly over the lifetime of the application.

Checks may also implement the Validator interface to validate their arguments when configured.

type CheckListener added in v0.4.0

type CheckListener func(*ScheduledCheck, Result)

type CheckSettings

type CheckSettings struct {
	Alerts           []string
	Interval         time.Duration
	Timeout          time.Duration
	GoodThreshold    int `config:"good_threshold"`
	FailingThreshold int `config:"failing_threshold"`
}

func (CheckSettings) Copy added in v0.2.0

func (c CheckSettings) Copy() CheckSettings

type CheckState

type CheckState int

CheckState describes the state of a check.

const (
	// StateIndeterminate indicates that it's not clear if the check passed or failed, e.g. it hasn't run yet.
	StateIndeterminate CheckState = iota
	// StateGood indicates the service is operating correctly.
	StateGood
	// StateFailing indicates a problem with the service.
	StateFailing
)

func (CheckState) MarshalJSON

func (c CheckState) MarshalJSON() ([]byte, error)

func (CheckState) String

func (c CheckState) String() string

String returns an English, lowercase name for the state.

func (*CheckState) UnmarshalJSON added in v0.3.0

func (c *CheckState) UnmarshalJSON(val []byte) error

type CheckTombStone added in v0.3.0

type CheckTombStone struct {
	LastRun     time.Time
	Settled     bool
	State       CheckState
	Suspended   bool
	History     ResultHistory
	PluginState json.RawMessage `json:"plugin_state,omitempty"`
}

type Fact added in v0.6.0

type Fact string

Fact defines a type of information that may be returned in a Result.

Fact names should consist of the package name that defines them, a '#' character, and then a short, human-friendly name for the metric in `snake_case`.

var (
	// ResponseTime denotes the length of time it took for a service to respond to a request.
	// Its value should be a time.Duration.
	ResponseTime Fact = "github.com/csmith/goplum#response_time"

	// CheckTime indicates how long the entire check took to invoke. Its value should be a time.Duration.
	CheckTime Fact = "github.com/csmith/goplum#check_time"
)

type GrpcServer added in v0.4.0

type GrpcServer struct {
	// contains filtered or unexported fields
}

func NewGrpcServer added in v0.4.0

func NewGrpcServer(plum *Plum) *GrpcServer

func (*GrpcServer) GetCheck added in v0.4.0

func (s *GrpcServer) GetCheck(_ context.Context, name *api.CheckName) (*api.Check, error)

func (*GrpcServer) GetChecks added in v0.4.0

func (s *GrpcServer) GetChecks(_ context.Context, _ *api.Empty) (*api.CheckList, error)

func (*GrpcServer) Results added in v0.4.0

func (s *GrpcServer) Results(_ *api.Empty, rs api.GoPlum_ResultsServer) error

func (*GrpcServer) ResumeCheck added in v0.4.0

func (s *GrpcServer) ResumeCheck(_ context.Context, name *api.CheckName) (*api.Check, error)

func (*GrpcServer) Start added in v0.4.0

func (s *GrpcServer) Start()

func (*GrpcServer) Stop added in v0.4.0

func (s *GrpcServer) Stop()

func (*GrpcServer) SuspendCheck added in v0.4.0

func (s *GrpcServer) SuspendCheck(_ context.Context, name *api.CheckName) (*api.Check, error)

type LongRunning added in v0.6.0

type LongRunning interface {
	// Timeout specifies the upper-bound for how long the check will take.
	Timeout() time.Duration
}

LongRunning is implemented by checks that intentionally run for a long period of time. Checks that implement this interface won't be subject to user-defined timeouts.

type Plugin

type Plugin interface {
	Check(kind string) Check
	Alert(kind string) Alert
}

Plugin is the API between plugins and the core. Plugins must provide an exported "Plum()" method in the main package which returns an instance of Plugin.

The Check and Alert funcs should provide new instances of the named type, or nil if such a type does not exist. Exported fields of the checks and alerts will then be populated according to the user's config, and the Validate() func will be called.

type PluginLoader added in v0.2.0

type PluginLoader func() (Plugin, error)

type Plum

type Plum struct {
	Alerts map[string]Alert
	Checks map[string]*ScheduledCheck
	// contains filtered or unexported fields
}

func NewPlum added in v0.2.0

func NewPlum() *Plum

func (*Plum) AddCheckListener added in v0.4.0

func (p *Plum) AddCheckListener(listener CheckListener)

func (*Plum) AlertsMatching

func (p *Plum) AlertsMatching(names []string) []Alert

func (*Plum) RaiseAlerts

func (p *Plum) RaiseAlerts(c *ScheduledCheck, previousState CheckState)

func (*Plum) ReadConfig added in v0.2.0

func (p *Plum) ReadConfig(path string) error

func (*Plum) RegisterPlugin added in v0.2.0

func (p *Plum) RegisterPlugin(name string, loader PluginLoader)

func (*Plum) RegisterPlugins added in v0.2.0

func (p *Plum) RegisterPlugins(plugins map[string]PluginLoader)

func (*Plum) RemoveCheckListener added in v0.4.0

func (p *Plum) RemoveCheckListener(listener CheckListener)

func (*Plum) RestoreState added in v0.3.0

func (p *Plum) RestoreState() error

func (*Plum) Run

func (p *Plum) Run()

func (*Plum) RunCheck

func (p *Plum) RunCheck(c *ScheduledCheck)

func (*Plum) SaveState added in v0.3.0

func (p *Plum) SaveState() error

func (*Plum) Suspend added in v0.4.0

func (p *Plum) Suspend(checkName string) *ScheduledCheck

Suspend sets the check with the given name to be suspended (i.e., it won't run until unsuspended). Returns the modified check, or nil if the check didn't exist.

func (*Plum) Unsuspend added in v0.4.0

func (p *Plum) Unsuspend(checkName string) *ScheduledCheck

Unsuspend sets the check with the given name to be resumed (i.e., it will run normally). Returns the modified check, or nil if the check didn't exist.

type Result

type Result struct {
	// State gives the current state of the service.
	State CheckState `json:"state"`
	// Time is the time the check was performed.
	Time time.Time `json:"time"`
	// Detail is a short, optional explanation of the current state.
	Detail string `json:"detail,omitempty"`
	// Facts provides details about the check and/or the remote service, such as the response time or version.
	Facts map[Fact]interface{} `json:"facts,omitempty"`
}

Result contains information about a check that was performed.

func FailingResult

func FailingResult(format string, a ...interface{}) Result

FailingResult creates a new result indicating the service is in a bad state.

func GoodResult

func GoodResult() Result

GoodResult creates a new result indicating the service is in a good state.

func IndeterminateResult added in v0.4.0

func IndeterminateResult(format string, a ...interface{}) Result

IndeterminateResult creates a new result indicating the check wasn't able to compute a state.

type ResultHistory

type ResultHistory [10]*Result

func (ResultHistory) State

func (h ResultHistory) State(thresholds map[CheckState]int) CheckState

type ScheduledCheck

type ScheduledCheck struct {
	Name      string
	Type      string
	Config    *CheckSettings
	Check     Check
	LastRun   time.Time
	Scheduled bool
	Settled   bool
	State     CheckState
	Suspended bool
	History   ResultHistory
}

func (*ScheduledCheck) AddResult

func (c *ScheduledCheck) AddResult(result *Result) ResultHistory

func (*ScheduledCheck) LastResult

func (c *ScheduledCheck) LastResult() *Result

func (*ScheduledCheck) Remaining

func (c *ScheduledCheck) Remaining() time.Duration

type Stateful added in v0.4.0

type Stateful interface {
	Save() interface{}
	Restore(func(interface{}))
}

Stateful is implemented by checks that keep local state that should be persisted across restarts.

type TombStone added in v0.3.0

type TombStone struct {
	Time   time.Time
	Checks map[string]CheckTombStone
}

func LoadTombStone added in v0.3.0

func LoadTombStone() (*TombStone, error)

func NewTombStone added in v0.3.0

func NewTombStone(checks map[string]*ScheduledCheck) *TombStone

func (*TombStone) Restore added in v0.3.0

func (ts *TombStone) Restore(checks map[string]*ScheduledCheck) error

func (*TombStone) Save added in v0.3.0

func (ts *TombStone) Save() error

type Validator added in v0.4.0

type Validator interface {
	// Validate checks the configuration of the object and returns any errors.
	Validate() error
}

Validator is implemented by checks, alerts and plugins that wish to validate their own config.
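
A minimal sketch of a check validating its own configuration. The `getCheck` type, its `URL` field and the error message are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// getCheck is a hypothetical check whose URL field is populated from config.
type getCheck struct {
	URL string
}

// Validate reports an error when the required URL parameter is missing.
func (c getCheck) Validate() error {
	if c.URL == "" {
		return errors.New("missing required argument: url")
	}
	return nil
}

func main() {
	unconfigured := getCheck{}
	configured := getCheck{URL: "https://example.com/"}
	fmt.Println(unconfigured.Validate())
	fmt.Println(configured.Validate())
}
```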
