aerospike

command
v0.0.0-...-c72af41 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Feb 26, 2024 License: Apache-2.0 Imports: 15 Imported by: 0

README

General

Namespace discovery

The probe is not discovering namespaces automatically. The namespace_meta_key_prefix needs to be defined on the cluster's consul services for the probe to discover them The probe will not do any checking without at least one namespace specified.

Example: namespace_meta_key_prefix per default is set to aerospike-monitoring-.

For advertising "foo" and "bar" namespace

aerospike-monitoring-foo: true aerospike-monitoring-bar: true

Latency checks executed at cluster level

In the Aerospike probe, all latency checks are being run on cluster level. Normally they should be done at node level but with Aerospike it is not simple to write on a specifc node/partion as it depends heavily on consistent hashing. Instead we execute the check at cluster level and try to find were the request will be performed (by looking at current topology). It is not perfect but it was the best compromise at the time.

One "cluster" for each Namespace

In Aerospike, Namespace are mostly isolated (using difference disks, even different nodes). It means that different namespace on the same node might have a different behavior. The probe will query each namespace independently.

The second interest architectural: the probe-worker can only start if the prepare phase is complete. At the moment, the prepare phase is used to create all the objects for the durability check. If it starts without all objects created it can trigger false positives. We chose to prevent a worker from starting if the prepare phase is not working. The prepare phase is executed periodically until it complete, or human intervention (the scheduled failed metric will incr at each failed prepare).

Considering different namespaces as different clusters is a simple way to enable namespace checking to start independantly.

Checks

Latency

The latency check is performing the following operations:

  • Put an object with some data
  • Get the object and read its data and compare with what was put earlier
  • Delete the object

We use some heuristics to guess which node processed the request. While it may not be 100% accurate, having latency per server is very useful for debugging.

Durability

The durability check is working by writing many item once and checking if they are still there and if their data is correct.

It is done in two phases:

Prepare phase

Executed at the startup of the probe:

  • look for a flag item to determine if data has already been pushed
  • if not present: push all the items requested
Check phase

Executed every X period of time:

  • Look for all items and validate the data for each one

Fixing the data after dataloss

To reset the data after a loss the simpliest is to remove the flag item (named: durability_key_prefix + "all_pushed_flag") or to change the number of items to be used by the probe.

Documentation

The Go Gopher

There is no documentation for this package.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL