status

module

v0.1.1 Latest Latest Go to latest Published: May 2, 2024 License: AGPL-3.0

Details

Valid go.mod file

The Go module system was introduced in Go 1.11 and is the official dependency management solution for Go.
Redistributable license

Redistributable licenses place minimal restrictions on how software can be used, modified, and redistributed.
Tagged version

Modules with tagged versions give importers more predictable builds.
Stable version

When a project reaches major version v1 it is considered stable.
Learn more about best practices

Repository

github.com/practable/status

Links

Open Source Insights

README ¶

status

Status server interacts with the book, jump, and relay services to identify experiments with (un)healthy streams and adjusts the status of experiment resources in booking system accordingly:

All required streams ok?	Jump OK?	Available in booking system?
Y	X	Y
N	X	N

Key: Y: yes, N: no, X: does not matter

Email alerts are sent on status changes to both jump and streams.

Stream health changed?	Jump health changed?	Send Email?	Change availabilty?
N	N	N	N
N	Y	Y	N
Y	N	Y	Y
Y	Y	Y	Y

An experiment without a jump connection can still be used so long as the required streams are present, although it is an issue for the maintainer to handle, so email alerts are sent for changes to jump and streams health events (an event is a change in the health).

Criteria

Jump

Jump connections remain dormant with no communication, until a client wishes to log in. At which point, there is traffic recorded in the rx section of the statistics. A healthy jump connection is simply one that exists - there is no need to check the stats.

Relay

Healthy stream connections from an experiment have transmitted recently, typically every few seconds or more frequently. Rather than define a healthy stream, as this requires some quite detailed checks, we can instead detect definitely unhealthy streams and alert on those. An unhealthy stream is defined as NOT having transmitted wtihin the last 10s.

Usage

Status requires configuration with the details of the book, jump and relay services, an optional email relay, and some tuning parameters that control how quick it is to adjust the status of experiments.

Typically, status is configured using admin tools scripts for your instance, where it is usually started as a systemd service.

A local version can be run using ./scripts/local-demo.sh which will need modifying to suit your configuration.

export STATUS_BASEPATH_BOOK=/tenant/book
export STATUS_BASEPATH_JUMP=/tenant/jump
export STATUS_BASEPATH_RELAY=/tenant/relay
export STATUS_EMAIL_CC="beta@a.org"
export STATUS_EMAIL_FROM=other@b.org
export STATUS_EMAIL_HOST=stmp.b.org
export STATUS_EMAIL_LINK=app.practable.io/tenant/status
export STATUS_EMAIL_PASSWORD=something
export STATUS_EMAIL_PORT=587
export STATUS_EMAIL_SUBJECT=app.practable.io/tenant
export STATUS_EMAIL_TO="alpha@a.org"
export STATUS_HEALTH_LAST=10s
export STATUS_HEALTH_LOG_EVERY=10m
export STATUS_HEALTH_EVENTS=100
export STATUS_HEALTH_STARTUP=1m
export STATUS_HOST_BOOK=app.example.org
export STATUS_HOST_JUMP=app.example.org
export STATUS_HOST_RELAY=app.example.org
export STATUS_LOG_LEVEL=warn
export STATUS_LOG_FORMAT=json
export STATUS_LOG_FILE=/var/log/status/status.log
export STATUS_PORT_PROFILE=6061
export STATUS_PORT_SERVE=3007
export STATUS_PROFILE=false
export STATUS_QUERY_BOOK_EVERY=15m
export STATUS_RECONNECT_JUMP_EVERY=1d
export STATUS_RECONNECT_RELAY_EVERY=1d
export STATUS_SCHEME_BOOK=https
export STATUS_SCHEME_JUMP=https
export STATUS_SCHEME_RELAY=https
export STATUS_SECRET_JUMP=othersecret
export STATUS_SECRET_BOOK=somesecret
export STATUS_SECRET_RELAY=somesecret
export STATUS_TIMEOUT_BOOK=5s
status serve

Tuning

The tuning parameters affect the responsiveness of the system to problems, and how complete a history of issues is stored, both of which must be traded off against CPU, memory and disk usage. Some suggested ranges are

parameter	test	prod	comment
STATUS_HEALTH_LAST	10s	10s	healthy stream must have last transmitted within this duration. Configure experiments with a heartbeat for data streams that is shorter than this duration.
STATUS_HEALTH_LOG_EVERY	1m	10m-1h	how often to write current overall status info to log file
STATUS_HEALTH_EVENTS	20	20	how many health events to store in memory per experiment
STATUS_HEALTH_STARTUP	1m	5m	how long to wait after an experiment has first connected to before assessing overall health
STATUS_QUERY_BOOK_EVERY	5m	15m-1h	how often to check the booking manifest for required streams
STATUS_RECONNECT_JUMP_EVERY	30s	1d-30d	Reconnection is automatic, this just provides a way to have a long running service without setting a long token lifetime up front
STATUS_RECONNECT_RELAY_EVERY	30s	1d-30d	"
STATUS_TIMEOUT_BOOK	5s	30s	This is how long to wait for GET/PUT requests to the book server

Email

Many cloud providers do not allow outgoing access to smtp (port 25), so as to reduce spam. Many email providers offer an smtp service running on port 587, with basic auth. If your current provider does not offer this, it is relatively inexpensive to obtain a custom domain and then seek an inexpensive email provider to host an account for that domain.

Multiple addresses

Multiple addresses are supported - just put emails in a comma-separated list

Rate limits

Emails can only be sent as often as checks are performed, and only one email is sent per check, summarising all the health events that just occurred, as well as listing all current problems in a single list (to aid cut and paste forwarding of the current state info to other team members, for eample). Rate limits on emails are therefore enforced by the frequency with which relay and jump send status updates. If both relay and jump were to fail, then the

Trouble shooting

If you are seeing unexpected errors, it is likely a configuration issue. The openapi client code can be difficult to develop with because unexpected errors tend to throw an unmarshalling error rather than passing the server's error message. Hence some clients used by status are written using raw HTTP requests. If you are seeing unexpected errors, then it could well be a configuration error that is being masked by one of the clients that uses the openapi client code. In these cases, check the logs for the services to see whether something straightforward like a token generation error is preventing authentication, or whether requests are misdirected (e.g. incorrect base path such as using the wrong instance/tenant sub path). Note that book and jump currently require /api/v1 in the base path but relay does not.

Directories ¶

Path	Synopsis
cmd
status
status/cmd
internal
config
serve
serve/models
serve/restapi Package restapi Status	Package restapi Status
serve/restapi/operations
server
status

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL