prometheus-cluster-exporter

command module
v0.0.0-...-4823153 Latest Latest
Published: Jan 13, 2022 License: GPL-3.0 Imports: 18 Imported by: 0

Prometheus Cluster Exporter

A Prometheus exporter for Lustre metadata operations and IO throughput metrics associated with SLURM accounts and process names, including user and group information, on a cluster.

A Grafana dashboard is also available.

Building

go build -o prometheus-cluster-exporter *.go

Requirements

Lustre Exporter

A Lustre exporter that exposes the Lustre Jobstats enabled on the filesystem is required.

Squeue Command

The squeue command from SLURM must be accessible locally to the exporter so that it can retrieve the running jobs.

For instance, running the exporter on the SLURM controller is advisable, since that host should be the most stable one in a production environment.
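
As a rough illustration of how running jobs could be read from squeue, the sketch below parses lines of the form "jobid account user". It assumes squeue is invoked with the standard options `-h` (no header) and `-o "%i %a %u"` (job ID, account, user name); the README does not state which options the exporter actually uses, and the names `jobInfo` and `parseSqueue` are hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// jobInfo holds the fields this sketch extracts for each running job.
// The type is hypothetical and not taken from the exporter's source.
type jobInfo struct {
	JobID   string
	Account string
	User    string
}

// parseSqueue parses whitespace-separated "jobid account user" lines, the
// shape produced by an assumed call such as: squeue -h -o "%i %a %u".
// Empty lines are skipped; any other line without exactly three fields
// is reported as an error.
func parseSqueue(out string) ([]jobInfo, error) {
	var jobs []jobInfo
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		f := strings.Fields(line)
		if len(f) != 3 {
			return nil, fmt.Errorf("unexpected squeue line: %q", line)
		}
		jobs = append(jobs, jobInfo{JobID: f[0], Account: f[1], User: f[2]})
	}
	return jobs, sc.Err()
}

func main() {
	// Sample text standing in for real squeue output.
	sample := "1001 hpc alice\n1002 bio bob\n"
	jobs, err := parseSqueue(sample)
	if err != nil {
		panic(err)
	}
	for _, j := range jobs {
		fmt.Printf("job=%s account=%s user=%s\n", j.JobID, j.Account, j.User)
	}
}
```

In a real setup the sample string would be replaced by the output of the squeue process, for example obtained through `os/exec`.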

Getent

The getent command is required for the UID-to-user and group mapping used for the process name throughput metrics.
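
The mapping described above can be sketched as follows. The example assumes the exporter resolves a UID with `getent passwd <uid>` and then the user's primary group with `getent group <gid>`, which print entries in the standard colon-separated passwd and group formats. Whether the exporter uses the primary group in this way is an assumption, and `parsePasswd` and `parseGroup` are hypothetical helper names.

```go
package main

import (
	"fmt"
	"strings"
)

// parsePasswd extracts the user name and primary GID from one passwd entry
// (name:password:uid:gid:gecos:home:shell), as printed by "getent passwd".
func parsePasswd(line string) (user, gid string, err error) {
	f := strings.Split(strings.TrimSpace(line), ":")
	if len(f) != 7 {
		return "", "", fmt.Errorf("unexpected passwd entry: %q", line)
	}
	return f[0], f[3], nil
}

// parseGroup extracts the group name from one group entry
// (name:password:gid:members), as printed by "getent group".
func parseGroup(line string) (string, error) {
	f := strings.Split(strings.TrimSpace(line), ":")
	if len(f) != 4 {
		return "", fmt.Errorf("unexpected group entry: %q", line)
	}
	return f[0], nil
}

func main() {
	// Sample entries standing in for real getent output.
	user, gid, err := parsePasswd("alice:x:1000:2000:Alice:/home/alice:/bin/bash")
	if err != nil {
		panic(err)
	}
	group, err := parseGroup("hpc:x:2000:alice,bob")
	if err != nil {
		panic(err)
	}
	fmt.Printf("uid 1000 -> user=%s gid=%s group=%s\n", user, gid, group)
}
```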

Execution

Parameter
Name        Default  Description
version     false    Print version
promserver  -        [REQUIRED] Prometheus server to be used, e.g. http://prometheus-server:9090
log         INFO     Sets the log level: INFO, DEBUG or TRACE
port        9846     The port to listen on for HTTP requests
timeout     15       HTTP request timeout in seconds for retrieving Lustre Jobstats via the Prometheus HTTP API
timerange   1m       Time range used for the rate function when retrieving Lustre metrics from Prometheus: a number of up to three digits followed by a unit s, m, h or d
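
To make the timerange format concrete, here is a minimal sketch that validates such a value and builds an instant query URL for the Prometheus HTTP API endpoint `/api/v1/query`. It reads "three digit number" as one to three digits, since the default is `1m`; that reading, the helper names `validTimeRange` and `rateQueryURL`, and the metric name `example_metric` are assumptions, not taken from the exporter.

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
)

// timeRangePattern accepts one to three digits followed by a unit
// (s, m, h or d). This is an assumed reading of the documented format.
var timeRangePattern = regexp.MustCompile(`^[0-9]{1,3}[smhd]$`)

// validTimeRange reports whether s looks like a valid timerange value.
func validTimeRange(s string) bool {
	return timeRangePattern.MatchString(s)
}

// rateQueryURL builds an instant query URL that applies rate() to the
// given metric over the given time range.
func rateQueryURL(server, metric, timeRange string) (string, error) {
	if !validTimeRange(timeRange) {
		return "", fmt.Errorf("invalid timerange %q", timeRange)
	}
	q := url.Values{}
	q.Set("query", fmt.Sprintf("rate(%s[%s])", metric, timeRange))
	return server + "/api/v1/query?" + q.Encode(), nil
}

func main() {
	// example_metric is a placeholder, not a real Lustre exporter metric name.
	u, err := rateQueryURL("http://prometheus-server:9090", "example_metric", "1m")
	if err != nil {
		panic(err)
	}
	fmt.Println(u)
}
```

A request to such a URL would typically be sent with an `http.Client` whose `Timeout` corresponds to the timeout parameter.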

Running in a Production Environment

For a production environment, it is advisable to run the exporter on the SLURM controller,
since that host should be the most stable one.

Prometheus Scrape Settings

Depending on the required resolution and runtime of the exporter,

  • the scrape interval should be set as appropriate, e.g. to at least 1 minute.
  • the scrape timeout should be set close to the specified scrape interval.

Metrics

Cluster exporter metrics are prefixed with "cluster_".

Global

These metrics are always exported.

Metric                            Labels  Description
exporter_scrape_ok                -       Indicates whether the scrape of the exporter was successful.
exporter_stage_execution_seconds  name    Execution duration in seconds spent in a specific exporter stage.

Metadata

Metadata operations are exposed per MDT, since this has proven to be very helpful information.

Jobs

Metric                   Labels                 Description
job_metadata_operations  account, user, target  Total metadata operations of all jobs per account and user on an MDT.

Process Names

Metric                    Labels                                   Description
proc_metadata_operations  proc_name, group_name, user_name, target  Total metadata operations of process names per group and user on an MDT.

Throughput

Jobs

Metric                      Labels         Description
job_read_throughput_bytes   account, user  Total IO read throughput of all jobs on the cluster per account in bytes per second.
job_write_throughput_bytes  account, user  Total IO write throughput of all jobs on the cluster per account in bytes per second.

Process Names

Metric                       Labels                           Description
proc_read_throughput_bytes   proc_name, group_name, user_name  Total IO read throughput of process names on the cluster per group and user in bytes per second.
proc_write_throughput_bytes  proc_name, group_name, user_name  Total IO write throughput of process names on the cluster per group and user in bytes per second.

Multiple Scrape Prevention

Since the forked processes have no timeout handling, they might block for an indefinite amount of time.
It is very unlikely that re-executing the processes will resolve the blocking.
Therefore, the exporter prevents multiple scrapes from running at the same time.

The following warning is displayed on subsequent scrape attempts while a scrape is still active:
"Collect is still active... - Skipping now"

Besides that, the cluster_exporter_scrape_ok metric will be set to 0 for skipped scrape attempts.
