# Oxcross

Oxcross is a simple HTTP latency monitoring system, allowing multiple edge servers to be monitored from multiple locations at the same time, with a centralised configuration mechanism. The system consists of:
- `oxcross-origin`, which runs as a systemd daemon on edge servers to be monitored.
  - These daemons are optional: any existing health-check endpoint returning 200s can be monitored in the `simple` mode, at the loss of some synchronisation information.
- `oxcross-leaf`, which runs as a systemd daemon on monitoring servers, and is responsible for probing each origin server for HTTP response timing and perceived clock drift (`advanced` mode only), and recording these as metrics.
  - It is lightweight and can run in virtual server environments with minimal specs.
- `configserver`, which is responsible for distributing information about origin servers and global configuration to leaf clients.
  - It is packaged as a Docker image and ready to run in a Kubernetes cluster.
## Background

Through persistent bargain hunting, I have accumulated a small fleet of virtual private servers from budget hosting providers, costing under $1/month each and distributed around the world. These servers are low-powered, and their management interfaces are very different from each other. Therefore, I could not join them into a Kubernetes cluster efficiently, and it has been difficult for me to make good use of them.
Because of the large variety of network environments these servers sit in, they can be very useful for testing HTTP round-trip latency and other connectivity information from around the world. Under this model, there will be one `oxcross-origin` server for each location my Kubernetes cluster operates in, and one `oxcross-leaf` monitor server for each location where I have a low-powered server needing to be put to use. Through a config distributed by `configserver` and periodically reloaded by each `oxcross-leaf`, all leaves can monitor all origins and export their results as Prometheus metrics. All I then need to do is to scrape these metrics from my cluster and visualise them.

However, this is only a good use case if I don't need to manually reconfigure each existing monitor server every time I add an edge server to be monitored. Furthermore, having many monitor servers won't be useful unless their results can be gathered at a centralised location easily. Struggling to find an existing solution fitting these requirements, I decided to write one.
## Configuration

Set up the components in the following order: `oxcross-origin` on servers to be monitored, then `configserver` for distributing configuration pointing to the servers to be monitored, and finally `oxcross-leaf` on servers used for monitoring.
### oxcross-origin

This component is optional if you already have a health-check endpoint which returns 200 responses. `oxcross-origin` does the same, and additionally exports some optional timing information for leaves to use.
To set it up on a node to be monitored:

```shell
apt install sudo git
git clone https://github.com/chongyangshi/Oxcross.git
cd Oxcross
sh setup_origin.sh
```
### configserver

Following the example of `config.yaml.example`, add all origin server locations into a JSON config file.

- In `simple` mode, Oxcross will send a GET request to `scheme://host:port/` and expect a 200 response.
- In `advanced` mode (`oxcross-origin` required), Oxcross will send a GET request to `scheme://host:port/oxcross`, which exports timing information in a 200 response.
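For illustration only, such a config lists each origin with the scheme, host, and port used to build the probe URL. The field names below are assumptions on my part; `config.yaml.example` in the repository is the authoritative reference for the actual schema.

```json
{
  "origins": [
    {"scheme": "https", "host": "edge-1.example.com", "port": 443},
    {"scheme": "http", "host": "edge-2.example.com", "port": 8080}
  ]
}
```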
`configserver` is optimized for running in a Kubernetes cluster. If using Kubernetes:

- Wrap the JSON in a `ConfigMap` manifest as shown in `config.yaml.example`
- Apply the `ConfigMap` manifest to create the in-cluster configuration
- Apply `configserver.yaml` to set up the `configserver`
- The `Service` created (`go-oxcross-configserver.monitoring.svc.cluster.local`) will need to be fronted by some kind of load balancer or reverse proxy to be exposed to the internet.
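As a sketch, such a `ConfigMap` wrapper might look like the following; the resource name, file key, and config contents here are illustrative (the namespace matches the `Service`'s `monitoring` namespace above), and `config.yaml.example` in the repository is authoritative.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: oxcross-config   # illustrative name
  namespace: monitoring  # matches the Service's namespace above
data:
  config.json: |
    {"origins": [{"scheme": "https", "host": "edge-1.example.com", "port": 443}]}
```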
If not running in a cluster:

- At the project's root directory, run:

  ```shell
  GOOS=linux GOARCH=amd64 CGO_ENABLED=0 go build -ldflags="-w -s" -o ./oxcross-configserver
  ```

- Wrap the resulting binary `oxcross-configserver` in a daemon wrapper of your choice.
- Start the daemon.

The binary will listen on `:9300` in either case.
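For example, a minimal systemd unit for the binary might look like this; the install path and user are assumptions, not part of the project:

```ini
# /etc/systemd/system/oxcross-configserver.service (illustrative)
[Unit]
Description=Oxcross configserver
After=network-online.target

[Service]
ExecStart=/opt/oxcross/oxcross-configserver
Restart=on-failure
User=nobody

[Install]
WantedBy=multi-user.target
```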
### oxcross-leaf

This component does the actual monitoring. To set it up on a node and monitor origin nodes:

```shell
apt install sudo git
git clone https://github.com/chongyangshi/Oxcross.git
cd Oxcross
sh setup_leaf.sh <leaf-id> https://your-oxcross-configserver.example.com
```

You will need to give each leaf a unique `<leaf-id>` to identify it in metrics, and also supply the endpoint at which your `configserver` is available over the internet or some kind of transit link. The leaf will automatically retrieve its config from `https://your-oxcross-configserver.example.com/config` and keep it up to date as you change the config on `configserver`'s end.
## Metrics

`oxcross-leaf` instances export Prometheus metrics on `:9299`, which can be scraped over the internet or an internal network by your Prometheus instance. An example Prometheus job can be found here.
The following metrics are available:

- `oxcross_leaf_probe_timings_{count|sum|bucket}`: a histogram providing HTTP round-trip latency information from each leaf to each origin
- `oxcross_leaf_probe_results`: a success/failure counter allowing monitoring of reachability from each leaf to each origin
- `oxcross_leaf_origin_time_drift`: a gauge estimating the relative system time difference between each origin and each leaf which observed it
Once metrics are scraped, you can find an example Grafana dashboard JSON here.
## TODOs

- Prometheus metrics are low-security information, so I haven't implemented TLS for metrics scraping. Due to the complexity of PKI management, this will have to be done later.
- Observe and export other types of useful connectivity information from `oxcross-leaf`, such as traceroute data.