datadog-home-project

command module

v0.0.0-...-5c26a86 Published: Jul 4, 2020 License: MIT Imports: 11 Imported by: 0

Repository

github.com/ayoubed/datadog-home-project

README

Website availability & performance monitoring tool

Overview

A console program to monitor the performance and availability of websites
Websites and check intervals are user-defined

Statistics

Check the different websites with their corresponding check intervals
Compute a few useful metrics: availability, max/avg response time, max/avg time to first byte, and response code counts

Alerting

When a website's availability stays below a user-defined threshold over a user-defined interval, an alert message is created: "Website {website} is down. availability={availability}, time={time}" (default config: threshold 80%, interval 2 min)
When availability recovers, a second message is created stating when the alert was resolved

Dashboard

Displays stats for a user-defined timeframe; the stats are refreshed at a user-defined interval. Defaults:
- Every 10s, display the stats for the past 10 minutes for each website
- Every minute, display the stats for the past hour for each website
Shows all past alert messages

Requirements

InfluxDB 2.0 - open source time series database
Go 1.14 - a systems programming language
Docker - used to simplify running an InfluxDB instance

Installation

Building from source

Build your Go app:

$ go build

Run an InfluxDB instance:

$ docker run -p 8086:8086 -v influxdb:/var/lib/influxdb influxdb

Run the resulting binary:

$ ./datadog-home-project

Testing

Tests are provided for the alerting process, written with Go's standard testing package. The following six scenarios are covered:

0: got 0 records
Expected: don't send a "website down" alert

1: We don't have enough records on the last timeframe, availability <= threshold
Expected: don't send a "website down" alert.

2: We have enough records on the last timeframe, availability > threshold, website state is up
Expected: don't send a "website up" alert.

3: We have enough records on the last timeframe, availability <= threshold, website state is up
Expected: send a "website down" alert.

4: We have enough records on the last timeframe, availability <= threshold, website state is down
Expected: don't send a "website down" alert.

5: We have enough records on the last timeframe, availability > threshold, website state is down
Expected: send a "website up" alert.
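The six scenarios above amount to a small decision table. The following is a minimal sketch of that table, not the project's actual code: the names decide, Action and minRecords are hypothetical, and treating "enough records" as a simple count cutoff is an assumption.

```go
package main

import "fmt"

// Action is the outcome of one alerting check (hypothetical type).
type Action int

const (
	NoAlert Action = iota
	SendDown
	SendUp
)

// decide maps one alerting check to an action.
// records is the number of measurements found in the timeframe and
// minRecords is an assumed cutoff for "enough records".
// availability and threshold are fractions in [0, 1].
// down reports whether the website is currently flagged as down.
func decide(records, minRecords int, availability, threshold float64, down bool) Action {
	// Scenarios 0 and 1: nothing, or too little, to judge on.
	if records == 0 || records < minRecords {
		return NoAlert
	}
	switch {
	case availability <= threshold && !down:
		return SendDown // scenario 3
	case availability > threshold && down:
		return SendUp // scenario 5
	default:
		return NoAlert // scenarios 2 and 4: state unchanged
	}
}

func main() {
	names := map[Action]string{NoAlert: "no alert", SendDown: "website down", SendUp: "website up"}
	cases := []struct {
		records      int
		availability float64
		down         bool
	}{
		{0, 0, false},
		{2, 0.5, false},
		{10, 0.9, false},
		{10, 0.5, false},
		{10, 0.5, true},
		{10, 0.9, true},
	}
	for i, c := range cases {
		fmt.Printf("scenario %d: %s\n", i, names[decide(c.records, 5, c.availability, 0.8, c.down)])
	}
}
```

Keeping the decision in a pure function like this makes each scenario a one-line table-driven test case.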

To run the test suite, execute the following:

$ go test -v ./...

Implementation details

The project relies heavily on Go's built-in concurrency features. All of the following entities run concurrently as goroutines, and they communicate through Go channels, in particular the monitor logs channel and the alerts channel.

Error management and propagation are handled with the "errgroup" package, which makes it easier to manage errors across multiple goroutines.

Monitor

The monitor starts a concurrent ticker for each website. At a user-defined interval, it sends a request to the website, measures a few metrics (response time, time to first byte), and sends the results as a measurement to the logs channel.

Database

Responsible for storing the measurements we provide in a time-based manner. It facilitates getting measurements for a particular timeframe.

Statsagent

Called by the other entities. It computes the stats (avg/max response time, avg/max time to first byte) for the monitored websites, as well as the availability of a website over a timeframe.

Dashboard

Displays stats about the monitored websites according to user-defined configs (update interval, stats timeframe). It starts a concurrent ticker for each view, which calls the stats agent to fetch fresh metrics.

The dashboard also listens to the alerts channel and displays new and past alerts on the GUI.

Alerting

It starts a ticker with a user-defined interval that calls the stats agent to compute the availability for a user-defined timeframe. All alerts are sent to an alerts channel that is consumed by our dashboard.

PS: the alerting ticker interval should be small enough to keep alerts accurate, but not so small that it overloads the database. Using a ticker was a simplification I chose. In a production environment, we may be able to rely on a pub/sub approach, which InfluxDB supports, to reduce the load.

Possible improvements

Architecture: In a production environment, it makes sense to split the entities described above into separate microservices. The real-time communication would then have to change accordingly: we could use gRPC streaming, WebSockets, or a message broker.
Stats: We recompute the stats from scratch every time they are requested. A possible improvement is to keep a queue of all relevant measurements and compute the stats in a rolling manner.
Alerts: We keep all of our alert messages in memory. In a production environment, it makes sense to use a caching service like Redis or Memcached.
Logging: All of our alert messages should be logged for historical purposes. It also makes sense to store stats over some indicative timeframes (day, week, and month stats).
Installation and deployment: We should containerize the application to make it easier for people to use and to deploy the tool in different environments.

Source Files

main.go

Directories

alerting
dashboard
database
monitor
request
statsagent