inspeqtor

v0.5.0
Published: Sep 30, 2014 License: GPL-3.0 Imports: 29 Imported by: 10

README

Inspeqtor

This software is still under active development and should be considered beta quality.

Inspeqtor monitors your application infrastructure. It gathers and verifies key metrics from all the moving parts in your application and alerts you when something looks wrong. It understands the application deployment workflow so it won't bother you during a deploy.

What it does:

  • Monitor systemd-, upstart-, runit- or launchd-managed services
  • Monitor process memory and CPU usage
  • Monitor daemon-specific metrics (e.g. redis, memcached, mysql, nginx...)
  • Monitor and alert based on host CPU, load, swap and disk usage
  • Alert or restart a process if a rule threshold is breached
  • Alert if a process disappears or changes PID
  • Signal deploy start/stop to silence alerts during deploy

What it doesn't do:

  • Monitor or control arbitrary processes; services must be init-managed
  • Have any runtime dependencies at all, not even libc

Installation

See the Inspeqtor wiki for complete documentation.

Requirements

Linux 3.0+. Inspeqtor also runs on OS X; FreeBSD is untested. It uses about 5-10MB of RAM at runtime.

Upgrade

Inspeqtor Pro is the commercial version of Inspeqtor and offers more features, official support and a non-GPL license:

  • Monitor legacy /etc/init.d services with PID files
  • Route alerts to different teams or individuals
  • Send alerts to Slack or other team chat rooms

See the wiki for in-depth documentation of each Pro feature.

License

Licensed under GPLv3.

Author

Inspeqtor is written by Mike Perham of Contributed Systems. We build awesome open source-based infrastructure to help you build awesome apps.

We also develop Sidekiq and sell Sidekiq Pro, the best Ruby background job processing system.

Documentation

Constants

const (
	VERSION = "0.5.0"
)

Variables

var (
	Actions = map[string]ActionBuilder{
		"alert":   buildAlerter,
		"restart": buildRestarter,
	}
	Notifier = map[string]NotifierBuilder{
		"email": buildEmailNotifier,
		"gmail": buildGmailNotifier,
	}
)
var (
	BuildHost    = convertHost
	BuildService = convertService
	BuildRule    = convertRule
	BuildAction  = convertAction
)
var (
	Term os.Signal = syscall.SIGTERM
	Hup  os.Signal = syscall.SIGHUP

	SignalHandlers = map[os.Signal]func(*Inspeqtor){
		Term:         exit,
		os.Interrupt: exit,
		Hup:          reload,
	}
	Name      string = "Inspeqtor"
	Licensing string = "Licensed under the GNU Public License 3.0"
)
var (
	CommandHandlers = map[string]commandFunc{
		"start":  startDeploy,
		"finish": finishDeploy,
		"status": currentStatus,
		"show":   sparkline,
		"♡":      heart,
	}
)
var Defaults = GlobalConfig{15, 300}

Functions

func HandleSignal

func HandleSignal(sig os.Signal, handler func(*Inspeqtor))

func HandleSignals

func HandleSignals()

func ParseInq

func ParseInq(global *ConfigFile, confDir string) (*Host, []Checkable, error)

Parses the host- and service-specific rules in /etc/inspeqtor/conf.d/*.inq

Types

type Action

type Action interface {
	Trigger(event *Event) error
}

type ActionBuilder

type ActionBuilder func(Checkable, *AlertRoute) (Action, error)

An Action is something that is triggered when a rule is broken. This is typically either sending a notification or restarting the service.
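
To illustrate the Action contract, here is a small sketch of a custom implementation. It does not import the package: Event is a simplified local stand-in, and recordingAction and fire are hypothetical names invented for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// Event is a simplified local stand-in for the package's Event type.
type Event struct {
	Type   string
	Target string
}

// Action mirrors the single-method interface documented above.
type Action interface {
	Trigger(event *Event) error
}

// recordingAction is a hypothetical Action that records one line per event.
type recordingAction struct {
	lines []string
}

func (a *recordingAction) Trigger(event *Event) error {
	if event == nil {
		return errors.New("nil event")
	}
	a.lines = append(a.lines, fmt.Sprintf("%s: %s", event.Target, event.Type))
	return nil
}

// fire triggers every action for an event and stops at the first error.
func fire(actions []Action, event *Event) error {
	for _, action := range actions {
		if err := action.Trigger(event); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	rec := &recordingAction{}
	if err := fire([]Action{rec}, &Event{Type: "RuleFailed", Target: "redis"}); err != nil {
		fmt.Println("error:", err)
	}
	fmt.Println(rec.lines)
}
```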

type AlertRoute

type AlertRoute struct {
	Name    string
	Channel string
	Config  map[string]string
}

An alert route is a way to send an alert to a recipient.

Channel is the notification mechanism: email, campfire, etc. Config is a free-form set of key/value pairs for configuring the channel.

The configuration looks like this:

send alerts
  via CHANNEL with K V, K V, K V

You'd then write a rule like:

if foo > 10 then alert
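
As a rough illustration of how the "via CHANNEL with K V, K V" clause maps onto an AlertRoute, the sketch below parses one such line with plain string splitting. parseVia is a hypothetical helper, not the package's parser, and the keys used in main are made up for the example. Name is left empty because the clause does not carry one.

```go
package main

import (
	"fmt"
	"strings"
)

// AlertRoute mirrors the struct documented above.
type AlertRoute struct {
	Name    string
	Channel string
	Config  map[string]string
}

// parseVia is a hypothetical helper that parses "via CHANNEL with K V, K V".
func parseVia(line string) (*AlertRoute, error) {
	fields := strings.Fields(line)
	if len(fields) < 2 || fields[0] != "via" {
		return nil, fmt.Errorf("expected \"via CHANNEL\", got %q", line)
	}
	route := &AlertRoute{Channel: fields[1], Config: map[string]string{}}
	if len(fields) == 2 {
		return route, nil
	}
	if fields[2] != "with" {
		return nil, fmt.Errorf("expected \"with\", got %q", fields[2])
	}
	// Everything after "with" is a comma-separated list of "K V" pairs.
	rest := strings.Join(fields[3:], " ")
	for _, pair := range strings.Split(rest, ",") {
		kv := strings.Fields(pair)
		if len(kv) != 2 {
			return nil, fmt.Errorf("malformed pair %q", strings.TrimSpace(pair))
		}
		route.Config[kv[0]] = kv[1]
	}
	return route, nil
}

func main() {
	route, err := parseVia("via email with username mike, smtp_server smtp.example.com")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(route.Channel, route.Config)
}
```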

func ValidateChannel

func ValidateChannel(name string, channel string, config map[string]string) (*AlertRoute, error)

type Checkable

type Checkable interface {
	Name() string
	Parameter(string) string
	Metrics() metrics.Store
	Resolve([]services.InitSystem) error
	Rules() []*Rule
	Verify() []*Event
	Collect(bool, func(Checkable))
}

type ConfigFile

type ConfigFile struct {
	Top         GlobalConfig
	AlertRoutes map[string]*AlertRoute
}

func ParseGlobal

func ParseGlobal(rootDir string) (*ConfigFile, error)

type EmailEvent

type EmailEvent struct {
	*Event
	Config *EmailNotifier
}

type EmailNotifier

type EmailNotifier struct {
	Username string
	Password string
	Host     string
	From     string
	To       string
}

func (EmailNotifier) Trigger

func (e EmailNotifier) Trigger(event *Event) error

func (*EmailNotifier) TriggerEmail

func (e *EmailNotifier) TriggerEmail(event *Event, sender EmailSender) error

type EmailSender

type EmailSender func(e *EmailNotifier, doc bytes.Buffer) error

type Entity

type Entity struct {
	// contains filtered or unexported fields
}

A named thing which can be checked by Inspeqtor.

func (*Entity) Metrics

func (e *Entity) Metrics() metrics.Store

func (*Entity) Name

func (e *Entity) Name() string

func (*Entity) Parameter

func (e *Entity) Parameter(key string) string

func (*Entity) Parameters

func (e *Entity) Parameters() map[string]string

func (*Entity) Rules

func (e *Entity) Rules() []*Rule

type Event

type Event struct {
	Type EventType
	Checkable
	*Rule
}

func (*Event) Hostname

func (e *Event) Hostname() string

func (*Event) Service

func (e *Event) Service() *Service

func (*Event) Target

func (e *Event) Target() string

type EventType

type EventType string

There are several different types of Events:

  • Process disappeared (or did not exist when we started up)
  • Process appeared
  • Rule failed check
  • Rule has recovered

const (
	ProcessDoesNotExist EventType = "ProcessDoesNotExist"
	ProcessExists       EventType = "ProcessExists"
	RuleFailed          EventType = "RuleFailed"
	RuleRecovered       EventType = "RuleRecovered"
)

func (EventType) String

func (s EventType) String() string

type GlobalConfig

type GlobalConfig struct {
	CycleTime    uint
	DeployLength uint
}

Parses the global inspeqtor configuration in /etc/inspeqtor/inspeqtor.conf.

type Host

type Host struct {
	*Entity
}

Host is the local machine.

func NewHost

func NewHost() *Host

func (*Host) Collect

func (h *Host) Collect(silenced bool, completeCallback func(Checkable))

func (*Host) Resolve

func (h *Host) Resolve(_ []services.InitSystem) error

func (*Host) Verify

func (s *Host) Verify() []*Event

type Inspeqtor

type Inspeqtor struct {
	RootDir    string
	SocketPath string
	StartedAt  time.Time

	ServiceManagers []services.InitSystem
	Host            *Host
	Services        []Checkable
	GlobalConfig    *ConfigFile
	Socket          net.Listener
	SilenceUntil    time.Time
	Valid           bool
}

func New

func New(dir string, socketpath string) (*Inspeqtor, error)

func (*Inspeqtor) Parse

func (i *Inspeqtor) Parse() error

func (*Inspeqtor) Shutdown

func (i *Inspeqtor) Shutdown()

func (*Inspeqtor) Start

func (i *Inspeqtor) Start()

func (*Inspeqtor) TestAlertRoutes

func (i *Inspeqtor) TestAlertRoutes() int

type NotifierBuilder

type NotifierBuilder func(Checkable, map[string]string) (Action, error)

A Notifier is a route to send an alert somewhere else. The global conf sets up the necessary params for the notification to work.

type Operator

type Operator uint8
const (
	LT Operator = iota
	GT
)

func (Operator) String

func (o Operator) String() string
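
A sketch of how an iota-based Operator might be printed and applied to a threshold follows. The strings returned by String here and the breached helper are assumptions made for illustration; the reference does not show what the package's own String method returns.

```go
package main

import "fmt"

// Operator mirrors the type and constants documented above.
type Operator uint8

const (
	LT Operator = iota
	GT
)

// String returns a symbol for the operator (assumed format).
func (o Operator) String() string {
	switch o {
	case LT:
		return "<"
	case GT:
		return ">"
	default:
		return fmt.Sprintf("Operator(%d)", uint8(o))
	}
}

// breached is a hypothetical helper: it reports whether current violates
// a rule of the form "metric OP threshold".
func breached(op Operator, current, threshold float64) bool {
	switch op {
	case LT:
		return current < threshold
	case GT:
		return current > threshold
	default:
		return false
	}
}

func main() {
	fmt.Println("foo", GT, 10, "breached:", breached(GT, 12, 10))
}
```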

type Restartable

type Restartable interface {
	Restart() error
}

A Service is Restartable, Host is not.

type Restarter

type Restarter struct {
	*Service
}

func (Restarter) Trigger

func (r Restarter) Trigger(event *Event) error

type Rule

type Rule struct {
	Entity           Checkable
	MetricFamily     string
	MetricName       string
	Op               Operator
	DisplayThreshold string
	Threshold        float64
	CurrentValue     float64
	CycleCount       int
	TrippedCount     int
	State            RuleState
	Actions          []Action
}

func (*Rule) Check

func (rule *Rule) Check() *Event

Run through all Rules and check if we need to trigger actions.

"tripped" means the Rule threshold was breached **this cycle**. "triggered" means the Rule threshold was breached enough cycles in a row to fire the alerts associated with the Rule.

There are three possible states for Rules:

  • Ok - rule was fine last time and passed this time too.
  • Triggered - threshold breached enough times, action should be taken.
  • Recovered - rule is currently Triggered but threshold was not breached this time.
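
The tripped/triggered distinction can be sketched as a small state machine. This is not the package's Check implementation. It assumes that CycleCount is the number of consecutive breaches required, that a Triggered rule does not fire again while it stays breached, and that Recovered falls back to Ok on the next cycle. The rule type is a trimmed-down, hypothetical version of Rule with a fixed ">" comparison.

```go
package main

import "fmt"

type RuleState string

const (
	Ok        RuleState = "Ok"
	Triggered RuleState = "Triggered"
	Recovered RuleState = "Recovered"
)

// rule is a hypothetical, reduced form of Rule.
type rule struct {
	Threshold    float64
	CycleCount   int // consecutive breaches required before triggering (assumed meaning)
	TrippedCount int // consecutive breaches seen so far
	State        RuleState
}

// check feeds one cycle's value into the rule and returns the event name
// ("RuleFailed" or "RuleRecovered"), or "" when nothing should fire.
func (r *rule) check(value float64) string {
	// Recovered only lasts one cycle in this sketch.
	if r.State == Recovered {
		r.State = Ok
	}
	tripped := value > r.Threshold
	if !tripped {
		r.TrippedCount = 0
		if r.State == Triggered {
			r.State = Recovered
			return "RuleRecovered"
		}
		return ""
	}
	r.TrippedCount++
	if r.State == Ok && r.TrippedCount >= r.CycleCount {
		r.State = Triggered
		return "RuleFailed"
	}
	return ""
}

func main() {
	r := &rule{Threshold: 10, CycleCount: 2, State: Ok}
	for _, v := range []float64{12, 15, 20, 5, 5} {
		event := r.check(v)
		fmt.Printf("value=%v state=%s event=%q\n", v, r.State, event)
	}
}
```

With CycleCount set to 2, a single breach only trips the rule; the second consecutive breach triggers it.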

func (*Rule) Consequence

func (r *Rule) Consequence() string

func (*Rule) DisplayState

func (r *Rule) DisplayState() string

func (*Rule) EntityName

func (r *Rule) EntityName() string

func (*Rule) FetchDisplayCurrentValue

func (r *Rule) FetchDisplayCurrentValue() string

func (*Rule) FetchLatestMetricValue

func (r *Rule) FetchLatestMetricValue() float64

func (*Rule) Metric

func (r *Rule) Metric() string

func (*Rule) Reset

func (r *Rule) Reset()

type RuleState

type RuleState string
const (
	Ok        RuleState = "Ok"
	Triggered RuleState = "Triggered"
	Recovered RuleState = "Recovered"
)

func (RuleState) String

func (o RuleState) String() string

type Service

type Service struct {
	*Entity
	// Handles process events: exists, doesn't exist
	EventHandler Action
	Process      *services.ProcessStatus
	Manager      services.InitSystem
}

A service is an Entity which resolves to a Process we can monitor.

func NewService

func NewService(name string) *Service

func (*Service) Collect

func (svc *Service) Collect(silenced bool, completeCallback func(Checkable))

Called for each service each cycle, in parallel. This method must be thread-safe. Since this method executes in a goroutine, errors must be handled/logged here and not just returned.

Each cycle we need to:

  1. Verify the service is up and running.
  2. Capture process metrics.
  3. Run rules.
  4. Trigger any necessary actions.
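
The calling side of this contract, one goroutine per service with a completion callback, might look like the sketch below. It is an assumption about how a caller could drive Collect, not the package's scheduler. checkable, fakeService and collectAll are hypothetical, and the fake Collect skips the real work and only signals completion.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// checkable is a reduced, hypothetical version of the Checkable interface.
type checkable interface {
	Name() string
	Collect(silenced bool, completeCallback func(checkable))
}

// fakeService stands in for a real service; it does no actual collection.
type fakeService struct{ name string }

func (s *fakeService) Name() string { return s.name }

func (s *fakeService) Collect(silenced bool, completeCallback func(checkable)) {
	// A real implementation would verify the process, capture metrics,
	// run rules and trigger actions here, logging any errors itself.
	completeCallback(s)
}

// collectAll runs Collect for every item in its own goroutine and waits
// until each one has invoked its callback.
func collectAll(items []checkable, silenced bool) []string {
	var (
		mu       sync.Mutex
		finished []string
		wg       sync.WaitGroup
	)
	for _, item := range items {
		wg.Add(1)
		go func(c checkable) {
			c.Collect(silenced, func(done checkable) {
				mu.Lock()
				finished = append(finished, done.Name())
				mu.Unlock()
				wg.Done()
			})
		}(item)
	}
	wg.Wait()
	sort.Strings(finished) // goroutine order is not deterministic
	return finished
}

func main() {
	services := []checkable{&fakeService{"redis"}, &fakeService{"nginx"}}
	fmt.Println(collectAll(services, false))
}
```

The mutex guards the shared slice because callbacks run concurrently, which is the same reason the documented method must be thread-safe.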

func (*Service) Resolve

func (svc *Service) Resolve(mgrs []services.InitSystem) error

Resolve each defined service to its managing init system. Called only at startup, this is what maps services to init and fires ProcessDoesNotExist events.

func (*Service) Restart

func (s *Service) Restart() error

func (*Service) SetMetrics

func (s *Service) SetMetrics(newStore metrics.Store)

func (*Service) String

func (s *Service) String() string

func (*Service) Transition

func (s *Service) Transition(ps *services.ProcessStatus, emitter func(EventType))

func (*Service) Verify

func (s *Service) Verify() []*Event

Directories

Path Synopsis
conf
runit manages services usually found in /etc/service or /service, which are soft links to the actual service directories in /etc/sv:

  <service_name>/
    run
    log/
      run
    supervise/
      pid   # => 4994
      stat  # => run / down
