libhealth

Version: v1.1.1
Published: Nov 23, 2022 License: Apache-2.0 Imports: 12 Imported by: 2

README

libhealth

libhealth is a Golang library that provides flexible components to report the current state of external systems that an application depends on, as well as the current health of any internal aspects of the application.

components

dependency set

Most applications have a set of dependencies whose health state needs to be tracked. A dependency set can compute the health state of an application from health monitors and their associated urgency levels.

The final health state of an application is computed from the status and urgency of each monitor. A weakly coupled component in a complete outage results in a MINOR healthcheck state. A more strongly coupled or required component in an OUTAGE results in correspondingly more severe states. The matrix below shows how the status of a component (rows) and its urgency (columns) determine the computed status.

Status \ Urgency   REQUIRED   STRONG   WEAK    NONE
OUTAGE             OUTAGE     MAJOR    MINOR   OK
MAJOR              MAJOR      MAJOR    MINOR   OK
MINOR              MINOR      MINOR    MINOR   OK
OK                 OK         OK       OK      OK
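
One way to read the matrix is as a cap on severity: each urgency level sets the worst state a dependency may contribute. The following is a minimal standalone sketch of that reading, not libhealth's implementation. The types are redeclared locally to mirror the library's constants, and `effective` is a hypothetical helper introduced only for illustration.

```go
package main

import "fmt"

// Status and Urgency are local stand-ins that mirror the ordering of the
// library's constants, redeclared so the sketch is self-contained.
type Status int

const (
	OUTAGE Status = iota
	MAJOR
	MINOR
	OK
)

type Urgency int

const (
	REQUIRED Urgency = iota
	STRONG
	WEAK
	NONE
)

// effective is a hypothetical helper: it returns the state a dependency
// contributes to the application, following the matrix.
func effective(s Status, u Urgency) Status {
	// The worst state each urgency level is allowed to contribute.
	worst := map[Urgency]Status{REQUIRED: OUTAGE, STRONG: MAJOR, WEAK: MINOR, NONE: OK}
	if limit := worst[u]; s < limit {
		return limit // the reported state is worse than the cap, so soften it
	}
	return s
}

func main() {
	names := []string{"OUTAGE", "MAJOR", "MINOR", "OK"}
	fmt.Println(names[effective(OUTAGE, WEAK)])    // MINOR
	fmt.Println(names[effective(OUTAGE, STRONG)])  // MAJOR
	fmt.Println(names[effective(MINOR, REQUIRED)]) // MINOR
}
```

The comparison `s < limit` relies on the constants being declared from worst (OUTAGE) to best (OK), which matches the order of the Status constants listed in the reference section.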

While applications can implement a dependency set of their own, a basic dependency set is provided which fits most use cases. Applications typically need one basic dependency set, and libhealth will update monitors and track their state in background goroutines.

Example:

import (
	"context"

	"oss.indeed.com/go/libhealth"
)

func setupHealth() {
	deps := libhealth.NewBasicDependencySet()
	deps.Register(libhealth.NewMonitor(
		"health-monitor-name",
		"monitor description",
		"https://docs/to/your/monitor",
		libhealth.WEAK,
		func(ctx context.Context) libhealth.Health {
			// calculate monitor health here
			return libhealth.NewHealth(libhealth.OK, "everything is fine")
		},
		nil, // statusChan: no subscriber for state changes (assumes nil is accepted)
	))
}

healthcheck endpoints

Typical applications expose several healthcheck endpoints on an HTTP server for tracking their state. libhealth provides two classes of endpoints: public "info" endpoints and private healthcheck endpoints. The public endpoints are typically consumed by other software (e.g. load balancers, HAProxy, nginx). They return a 200 or 500 status code and a very simple JSON payload indicating the source of the response.

An example response from the /info endpoints is shown below:

{
  "condition" : "OK",
  "duration" : 0,
  "hostname" : "aus-worker11"
}
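
A consumer of the public endpoints can decode this payload with the standard library. The sketch below assumes only the three fields shown in the example response; `infoResponse` and `parseInfo` are hypothetical names introduced here, not part of libhealth.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// infoResponse mirrors the three fields shown in the example payload.
type infoResponse struct {
	Condition string `json:"condition"`
	Duration  int64  `json:"duration"`
	Hostname  string `json:"hostname"`
}

// parseInfo decodes the body of an /info response.
func parseInfo(body []byte) (infoResponse, error) {
	var r infoResponse
	err := json.Unmarshal(body, &r)
	return r, err
}

func main() {
	body := []byte(`{"condition": "OK", "duration": 0, "hostname": "aus-worker11"}`)
	r, err := parseInfo(body)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Printf("%s is %s\n", r.Hostname, r.Condition)
}
```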

The private healthcheck endpoints expose significantly more information about the runtime and environment of the process to aid in debugging outages. They should therefore only be exposed to allowlisted IP addresses. If this is not possible, consider exposing only the less verbose public endpoints.

A common pattern is to expose the following endpoints:

/info/healthcheck
/info/healthcheck/live
/private/healthcheck
/private/healthcheck/live

HTTP handlers are provided by libhealth for serving each of these routes:

import (
	"net/http"

	"oss.indeed.com/go/libhealth"
)

func healthRouter(d libhealth.DependencySet) *http.ServeMux {
	router := http.NewServeMux()
	router.Handle("/info/healthcheck", libhealth.NewInfo(d))
	router.Handle("/info/healthcheck/live", libhealth.NewInfo(d))
	router.Handle("/private/healthcheck", libhealth.NewPrivate("my-app-name", d))
	router.Handle("/private/healthcheck/live", libhealth.NewPrivate("my-app-name", d))
	return router
}

Alternatively, you can use the helper function WrapServeMux, which will register all these handlers for you:

import "oss.indeed.com/go/libhealth"

...
router := http.NewServeMux()
libhealth.WrapServeMux(router, "my-app-name", dependencies)

Contributing

We welcome contributions! Feel free to help make libhealth better.

Process
  • Open an issue and describe the desired feature / bug fix before making changes. It's useful to get a second pair of eyes before investing development effort.
  • Make the change. If adding a new feature, remember to provide tests that demonstrate the new feature works, including any error paths. If contributing a bug fix, add tests that demonstrate the erroneous behavior is fixed.
  • Open a pull request. Automated CI tests will run. If the tests fail, please make changes to fix the behavior, and repeat until the tests pass.
  • Once everything looks good, one of the indeedeng members will review the PR and provide feedback.

Maintainers

The oss.indeed.com/go/libhealth project is maintained by Indeed Engineering.

While we are always busy helping people get jobs, we will try to respond to GitHub issues, pull requests, and questions within a couple of business days.

Code of Conduct

oss.indeed.com/go/libhealth is governed by the Contributor Covenant v1.4.1.

For more information please contact opensource@indeed.com.

License

The oss.indeed.com/go/libhealth project is open source under the Apache 2.0 license.

Documentation

Constants

const (
	InfoHealthCheck     = `/info/healthcheck`
	InfoHealthCheckLive = `/info/healthcheck/live`
)

Paths to info healthchecks.

const (
	PrivateHealthCheck     = `/private/healthcheck`
	PrivateHealthCheckLive = `/private/healthcheck/live`
)

The exposed endpoints for private healthchecks.

Functions

func ComputeStatusCode

func ComputeStatusCode(info bool, s Summary) int

ComputeStatusCode returns the HTTP status code representative of the status. The public info endpoints return HTTP 500 Internal Server Error if the component is in outage; any lesser status returns HTTP 200 OK. The private endpoints, conversely, return HTTP 200 OK if and only if the overall state is OK; any unhealthy state returns HTTP 500 Internal Server Error.
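
The rule described above can be sketched as a small standalone function. This is an illustration of the documented behavior rather than the library's code: `statusCode` is a hypothetical stand-in that takes the overall Status directly instead of a Summary, and Status is redeclared locally.

```go
package main

import (
	"fmt"
	"net/http"
)

// Status is a local stand-in mirroring the library's constants.
type Status int

const (
	OUTAGE Status = iota
	MAJOR
	MINOR
	OK
)

// statusCode is a hypothetical stand-in for the rule described above.
func statusCode(info bool, overall Status) int {
	if info {
		// Public endpoints fail only on a full outage.
		if overall == OUTAGE {
			return http.StatusInternalServerError
		}
		return http.StatusOK
	}
	// Private endpoints succeed only when everything is OK.
	if overall == OK {
		return http.StatusOK
	}
	return http.StatusInternalServerError
}

func main() {
	fmt.Println(statusCode(true, MAJOR))  // 200
	fmt.Println(statusCode(false, MAJOR)) // 500
}
```

The asymmetry means a load balancer polling the info endpoint keeps routing traffic during a MAJOR or MINOR degradation, while a monitor polling the private endpoint sees a failure as soon as anything is unhealthy.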

func WrapServeMux

func WrapServeMux(
	mux *http.ServeMux,
	appname string,
	provided DependencySet,
	additional ...HealthMonitor,
)

WrapServeMux will wrap mux with Handlers for

  • /private/healthcheck
  • /private/healthcheck/live
  • /info/healthcheck
  • /info/healthcheck/live

Types

type BasicDependencySet

type BasicDependencySet struct {
	// contains filtered or unexported fields
}

BasicDependencySet is the standard implementation of DependencySet that behaves the way you would expect. HealthMonitors are registered to a DependencySet instance, and then you can call Live() or Background() on that instance. Live() will cause all of the HealthMonitors to call their Check() methods, whereas Background() will retrieve the cached Health of each health check.

func NewBasicDependencySet

func NewBasicDependencySet(monitors ...HealthMonitor) *BasicDependencySet

NewBasicDependencySet will create a new BasicDependencySet instance and register all of the provided HealthMonitor instances.

func NewBasicDependencySetWithContext added in v1.1.1

func NewBasicDependencySetWithContext(ctx context.Context, monitors ...HealthMonitor) *BasicDependencySet

NewBasicDependencySetWithContext will create a new BasicDependencySet instance using a specific context and register all of the provided HealthMonitor instances.

func (*BasicDependencySet) Background

func (d *BasicDependencySet) Background() Summary

Background will retrieve the cached Health for each of the registered HealthMonitor instances.

func (*BasicDependencySet) Live

func (d *BasicDependencySet) Live() Summary

Live will force all of the registered HealthMonitor instances to execute their Check methods, and will update all cached Health as well.

func (*BasicDependencySet) Register

func (d *BasicDependencySet) Register(monitors ...HealthMonitor)

Register will register all of the provided HealthMonitor instances. HealthMonitors which have a 0 value Timeout will NOT be executed on Register(). They will only be executed on calls to Live(). This is important, because it means such a checker will be set to OUTAGE if only Background() is ever called.

type Component

type Component struct {
	Timestamp   int64  `json:"timestamp"`
	DocURL      string `json:"documentationUrl"`
	Urgency     string `json:"urgency"`
	Description string `json:"description"`
	State       string `json:"status"`
	Message     string `json:"errorMessage"`
	Duration    int64  `json:"duration"`
	LastGood    int64  `json:"lastKnownGoodTimestamp"`
	Period      int64  `json:"period"`
	ID          string `json:"id"`
	Date        string `json:"date"`
}

A Component is the healthcheck status of one component in a /private/healthcheck result.

type Components

type Components struct {
	Outage []Component `json:"OUTAGE,omitempty"`
	Major  []Component `json:"MAJOR,omitempty"`
	Minor  []Component `json:"MINOR,omitempty"`
	Ok     []Component `json:"OK,omitempty"`
}

Components of a particular Status. We list them explicitly so that during json encoding they are ordered.
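
The ordering follows from how encoding/json handles structs: fields are emitted in declaration order, and `omitempty` drops empty slices. The sketch below demonstrates this with a copy of the Components struct. Component is trimmed to two fields for brevity, and `encode` is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Component is trimmed to two fields here; the real struct has more.
type Component struct {
	ID    string `json:"id"`
	State string `json:"status"`
}

// Components copies the field order and tags from the definition above.
type Components struct {
	Outage []Component `json:"OUTAGE,omitempty"`
	Major  []Component `json:"MAJOR,omitempty"`
	Minor  []Component `json:"MINOR,omitempty"`
	Ok     []Component `json:"OK,omitempty"`
}

// encode marshals c, returning an empty string on failure.
func encode(c Components) string {
	b, err := json.Marshal(c)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	// Fields are set out of order on purpose; the output order is unaffected.
	c := Components{
		Ok:     []Component{{ID: "cache", State: "OK"}},
		Outage: []Component{{ID: "db", State: "OUTAGE"}},
	}
	fmt.Println(encode(c))
	// {"OUTAGE":[{"id":"db","status":"OUTAGE"}],"OK":[{"id":"cache","status":"OK"}]}
}
```

A map keyed by status name would also serialize deterministically, but in sorted key order (MAJOR, MINOR, OK, OUTAGE) rather than by severity.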

type DependencySet

type DependencySet interface {
	Register(monitors ...HealthMonitor)
	Background() Summary
	Live() Summary
}

DependencySet is an interface which can be used to represent a set of HealthChecker instances. These HealthChecker instances are used to determine the healthiness of a service.

type Health

type Health struct {
	Status
	Urgency
	time.Time
	Message
	time.Duration
}

Health is a representation of the health of a service at a moment in time. It is composed of a Status, an Urgency, a Time, a Message, and a Duration. Once created it should not be modified.

func NewHealth

func NewHealth(
	state Status,
	message string,
) Health

NewHealth creates a Health for a fixed moment in time.

func (Health) String

func (h Health) String() string

String returns a human readable summary

type HealthChecker

type HealthChecker func(ctx context.Context) Health

HealthChecker is a func that you provide which checks the status of something's health.

type HealthMonitor

type HealthMonitor interface {
	// Name provides a unique name for the healthcheck.
	Name() string
	// Check is the function representing the work of a health check.
	Check(ctx context.Context) Health
	// Timeout is how long Check gets to run before defaulting to OUTAGE.
	Timeout() time.Duration
	// Period is how often Check should run.
	Period() time.Duration
	// Description is an informative string about what the healthcheck does.
	Description() string
	// Documentation is a url link that documents additional information about the healthcheck.
	Documentation() string
	// Urgency is how important this service is.
	Urgency() Urgency
	// LastOk is the time when this healthcheck was last OK.
	LastOk() time.Time
	// Failed is the number of consecutive healthcheck failures. Returns zero when the healthcheck is OK.
	Failed() int
}

HealthMonitor is an interface that should be implemented for every custom type of health check you need. These can then be added to a DependencySet and used in the Live and Background versions of the Info and Private healthchecks.

type HealthStatus

type HealthStatus struct {
	Monitor HealthMonitor

	Prev Status
	Next Health
}

type HealthTracker

type HealthTracker interface {
	DependencySet() DependencySet       // implemented by your app
	RegisterDependencies(DependencySet) // implemented by libhealth
}

HealthTracker is the interface that a framework managing a HealthServer should satisfy so that it can register the dependencies defined by the application the framework is running.

type Info

type Info struct {
	// contains filtered or unexported fields
}

Info is an http.Handler for

/info/healthcheck
/info/healthcheck/live.

func NewInfo

func NewInfo(d DependencySet) *Info

NewInfo will create a new Info handler for a given DependencySet.

func (*Info) ServeHTTP

func (i *Info) ServeHTTP(w http.ResponseWriter, r *http.Request)

ServeHTTP is intended to be used by a net/http.ServeMux for serving formatted json.

type InfoResult

type InfoResult struct {
	Condition string `json:"condition"`
	Hostname  string `json:"hostname"`
	Duration  int64  `json:"duration"`
}

InfoResult represents the body of an info healthcheck.

type Message

type Message string

Message is a string with some useful information regarding a Health.

type Monitor

type Monitor struct {
	// contains filtered or unexported fields
}

Monitor is the standard implementation of HealthMonitor.

func NewMonitor

func NewMonitor(
	name,
	description,
	docURL string,
	urgency Urgency,
	check HealthChecker,
	statusChan chan HealthStatus,
) *Monitor

NewMonitor will create a new monitor and set some defaults. If desired, pass in a channel and it will be published to when the health state of the monitor changes.

func NewMonitorWithOptions added in v1.1.0

func NewMonitorWithOptions(
	name,
	description,
	docURL string,
	urgency Urgency,
	check HealthChecker,
	options ...MonitorOption,
) *Monitor

NewMonitorWithOptions constructs a new monitor from the provided components and configures optional ones (such as a status channel, timeout, and period) based on the provided options.

func TransitiveMonitor

func TransitiveMonitor(
	url,
	name,
	description,
	wikipage string,
	urgency Urgency,
	statusChan chan HealthStatus,
) *Monitor

TransitiveMonitor creates a Monitor that represents a dependency on another service which itself responds to a healthcheck at the given url.

func (*Monitor) Check

func (m *Monitor) Check(ctx context.Context) Health

Check will execute the HealthChecker associated with the monitor.

func (*Monitor) Description

func (m *Monitor) Description() string

func (*Monitor) Documentation

func (m *Monitor) Documentation() string

func (*Monitor) Failed

func (m *Monitor) Failed() int

func (*Monitor) LastOk

func (m *Monitor) LastOk() time.Time

func (*Monitor) Name

func (m *Monitor) Name() string

func (*Monitor) Period

func (m *Monitor) Period() time.Duration

func (*Monitor) Timeout

func (m *Monitor) Timeout() time.Duration

func (*Monitor) Urgency

func (m *Monitor) Urgency() Urgency

type MonitorOption added in v1.1.0

type MonitorOption func(monitor *Monitor)

func WithPeriod added in v1.1.0

func WithPeriod(period time.Duration) MonitorOption

WithPeriod configures an optional interval for the monitor to be run on. If not provided, the default is used.

func WithStatusChan added in v1.1.0

func WithStatusChan(statusChan chan HealthStatus) MonitorOption

WithStatusChan configures the optional subscription channel to notify consumers of health changes.

func WithTimeout added in v1.1.0

func WithTimeout(timeout time.Duration) MonitorOption

WithTimeout configures an optional timeout for the monitor. If not provided, the default is used.

type Private

type Private struct {
	// contains filtered or unexported fields
}

Private is an http.Handler for

/private/healthcheck
/private/healthcheck/live

func NewPrivate

func NewPrivate(appName string, set DependencySet) *Private

NewPrivate creates a new Private so that service appName can register its DependencySet.

func (*Private) ServeHTTP

func (p *Private) ServeHTTP(w http.ResponseWriter, r *http.Request)

type PrivateResult

type PrivateResult struct {
	AppName                   string            `json:"appName"`
	Condition                 string            `json:"condition"`
	Duration                  int64             `json:"duration"`
	Hostname                  string            `json:"hostname"`
	Environment               map[string]string `json:"environment"`
	CWD                       string            `json:"cwd"`
	AppStartDateSystem        string            `json:"appStartDateSystem"`
	AppStartDateUTC           string            `json:"appStartDateUTC"`
	AppStartUnixTimestamp     string            `json:"appStartUnixTimestamp"`
	AppUpTimeReadable         string            `json:"appUpTimeReadable"`
	AppUpTimeSeconds          string            `json:"appUpTimeSeconds"`
	LeastRecentlyExecutedDate string            `json:"leastRecentlyExecutedDate"`
	LeastRecentlyExecutedTime int64             `json:"leastRecentlyExecutedTimestamp"`
	Results                   Components        `json:"results"`
}

A PrivateResult is the struct (and JSON) definition of what the response to a private healthcheck endpoint should be composed of.

type Result

type Result struct {
	Health
	// contains filtered or unexported fields
}

Result represents the final result of a HealthChecker. It contains all the information needed before being consumed by a robot.

type Status

type Status int

Status is a representation of the health of the component.

const (
	OUTAGE Status = iota
	MAJOR
	MINOR
	OK
)

func BestState

func BestState(left, right Status) Status

BestState returns the more cheerful of two states.

func ParseStatus

func ParseStatus(state string) Status

ParseStatus parses the given string into a Status. If the string is malformed, OUTAGE is returned.

func WorstState

func WorstState(left, right Status) Status

WorstState returns the less positive of two states.

func (Status) BetterThan

func (s Status) BetterThan(level Status) bool

BetterThan compares a Status to another Status.

func (Status) SameAs

func (s Status) SameAs(level Status) bool

SameAs compares a Status to another Status.

func (Status) SameOrBetterThan

func (s Status) SameOrBetterThan(level Status) bool

SameOrBetterThan compares a Status to another Status.

func (Status) SameOrWorseThan

func (s Status) SameOrWorseThan(level Status) bool

SameOrWorseThan compares a Status to another Status.

func (Status) String

func (s Status) String() string

String provides a regular string representation of a Status.

func (Status) WorseThan

func (s Status) WorseThan(level Status) bool

WorseThan compares a Status to another Status.

type Summary

type Summary struct {
	// contains filtered or unexported fields
}

Summary represents the result of running multiple HealthChecker instances, such as the ones kept by a DependencySet. This object is intended to be used by a DependencySet for generating the specified JSON output.

func NewSummary

func NewSummary(executed time.Time, results []Result) Summary

NewSummary will create a new Summary for a list of Health responses.

func (Summary) Duration

func (s Summary) Duration() time.Duration

Duration is a method provided to determine the total length of time it took to compute a Summary - that is how long it took to run a set of Checkers and produce results.

func (Summary) Executed

func (s Summary) Executed() time.Time

Executed returns the time at which s was generated by initiating the checks it represents.

func (Summary) Overall

func (s Summary) Overall() Status

Overall will return the combined downgraded Status of all of the Health instances. This state depends on both the check status and the urgency of each of the Health instances.
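
One plausible reading of "combined downgraded Status" is: soften each check's status according to its urgency, then take the worst of the results. The sketch below models that reading with local stand-in types. `check` and `overall` are hypothetical names, the UNKNOWN urgency is left out, and returning OK for an empty set is an assumption.

```go
package main

import "fmt"

// Local stand-ins mirroring the library's constants.
type Status int

const (
	OUTAGE Status = iota
	MAJOR
	MINOR
	OK
)

type Urgency int

const (
	REQUIRED Urgency = iota
	STRONG
	WEAK
	NONE
)

// check pairs a monitor's reported status with its urgency.
type check struct {
	name    string
	status  Status
	urgency Urgency
}

// overall is a hypothetical model of the combination: it starts from OK
// and keeps the worst urgency-adjusted status it sees.
func overall(checks []check) Status {
	worst := map[Urgency]Status{REQUIRED: OUTAGE, STRONG: MAJOR, WEAK: MINOR, NONE: OK}
	result := OK
	for _, c := range checks {
		s := c.status
		if limit := worst[c.urgency]; s < limit {
			s = limit // soften according to urgency
		}
		if s < result {
			result = s // lower values are worse
		}
	}
	return result
}

func main() {
	names := []string{"OUTAGE", "MAJOR", "MINOR", "OK"}
	got := overall([]check{
		{"db", OUTAGE, STRONG},   // contributes MAJOR
		{"cache", OUTAGE, WEAK},  // contributes MINOR
		{"metrics", MAJOR, NONE}, // contributes OK
	})
	fmt.Println(names[got]) // MAJOR
}
```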

func (Summary) Status

func (s Summary) Status(names ...string) Status

Status will return the combined downgraded Status of all of the Health instances identified by name. The state does NOT depend on the urgency of each of the Health instances.

func (Summary) StatusWithUrgency

func (s Summary) StatusWithUrgency(names ...string) Status

StatusWithUrgency will return the combined downgraded Status of all of the Health instances identified by name. This status depends on both the check status and the urgency of each of the Health instances.

type Urgency

type Urgency int

Urgency is a level of requirement for a service to be operational. A REQUIRED service would cause major service disruption if it is not healthy. Likewise a WEAK service can fail without major issues.

const (
	REQUIRED Urgency = iota
	STRONG
	WEAK
	NONE
	UNKNOWN
)

func ParseUrgency

func ParseUrgency(urgency string) Urgency

ParseUrgency converts the given string into an Urgency. If the string is malformed, UNKNOWN is returned.

func (Urgency) Detail

func (u Urgency) Detail() string

Detail provides a detailed representation of the Urgency level

func (Urgency) DowngradeWith

func (u Urgency) DowngradeWith(systemState, newState Status) Status

DowngradeWith returns the downgraded Outage state according to HCv3 math.

func (Urgency) String

func (u Urgency) String() string

String provides an obvious representation of the Urgency level

Directories

Path Synopsis
Package count enables tracking accumulating values.
internal/data
Package data provides time series buckets of accumulated values.
Package gauge provides mechanisms for gauging values.
