maprobe

package module
v0.7.2
Published: Apr 5, 2024 License: MIT Imports: 41 Imported by: 1

README

maprobe

Mackerel external probe / aggregate agent.

Description

maprobe is an external probe / aggregate agent for Mackerel.

The maprobe agent works as follows.

for probes
  1. Fetch host information from the Mackerel API.
    • Hosts are filtered by service and role.
  2. For each host, execute probes (ping, tcp, http, command).
    • The placeholder {{ .Host }} in the configuration is expanded as the Mackerel host struct.
    • {{ .Host.IPAddress.eth0 }} expands to e.g. 192.168.1.1
  3. Post host metrics to Mackerel (and/or an OpenTelemetry metrics endpoint if configured).
  4. Repeat these steps every 60 seconds.
for aggregates
  1. Fetch host information from the Mackerel API.
    • Hosts are filtered by service and role.
  2. For each host, fetch the specified host metrics and aggregate them with the configured functions.
  3. Post these aggregated metrics as Mackerel service metrics.
  4. Repeat these steps every 60 seconds.
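
The placeholder expansion described in step 2 for probes can be illustrated with Go's text/template package. This is a minimal sketch, not maprobe's actual code: the host struct is a hypothetical stand-in with only two fields, and the expand helper and the way the env function is registered are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"os"
	"strings"
	"text/template"
)

// host is a hypothetical stand-in for the Mackerel host struct.
// The real struct has many more fields.
type host struct {
	ID   string
	Name string
}

// expand renders src with text/template, exposing the host as .Host
// and an env function that reads environment variables.
func expand(src string, h host) (string, error) {
	funcs := template.FuncMap{"env": os.Getenv}
	tmpl, err := template.New("config").Funcs(funcs).Parse(src)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, map[string]any{"Host": h}); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	h := host{ID: "abc123", Name: "web01"}
	out, err := expand("-tempfile=/tmp/redis-{{ .Host.ID }}", h)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```

Because every string value goes through the same expansion, one probe definition can be reused for all hosts that match the service and role filter.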

Install

Binary packages

GitHub releases

Docker

DockerHub

Usage

usage: maprobe [<flags>] <command> [<args> ...]

Flags:
  --help              Show context-sensitive help (also try --help-long and --help-man).
  --log-level="info"  log level

Commands:
  help [<command>...]
    Show help.

  agent [<flags>]
    Run agent

  once [<flags>]
    Run once

  ping [<flags>] <address>
    Run ping probe

  tcp [<flags>] <host> <port>
    Run TCP probe

  http [<flags>] <url>
    Run HTTP probe

  firehose-endpoint [<flags>]
    Run Firehose HTTP endpoint
agent / once

MACKEREL_APIKEY environment variable is required.

$ maprobe agent --help
usage: maprobe agent [<flags>]

Run agent

Flags:
      --help              Show context-sensitive help (also try --help-long and --help-man).
      --log-level="info"  log level
  -c, --config=CONFIG     configuration file path or URL(http|s3)

--config accepts a local file path or a URL (http, https, or s3 scheme). maprobe checks whether the config has been modified and reloads it at run time.

The defaults of --config and --log-level can be overridden by the environment variables CONFIG and LOG_LEVEL.

agent runs maprobe continuously; once runs it a single time.

Example Configuration for probes
post_probed_metrics: false   # when false, do not post host metrics to Mackerel. only dump to [info] log.
probes:
  - service: '{{ env "SERVICE" }}'   # expand environment variable
    role: server
    ping:
      address: '{{ .Host.IPAddresses.eth0 }}'

  - service: production
    role: webserver
    http:
      url: 'http://{{ .Host.CustomIdentifier }}/api/healthcheck'
      method: POST
      headers:
        Content-Type: application/json
      body: '{"hello":"world"}'
      expect_pattern: 'ok'

  - service: production
    role: redis
    tcp:
      host: '{{ .Host.IPAddress.eth0 }}'
      port: 6379
      send: "PING\n"
      expect_pattern: "PONG"
      quit: "QUIT\n"
    command:
      command:
        - "mackerel-plugin-redis"
        - "-host={{ .Host.IPAddress.eth0 }}"
        - "-tempfile=/tmp/redis-{{ .Host.ID }}"
    attributes: # support OpenTelemetry attributes
      - service.namespace: redis
      - host.name: "{{ .Host.Name }}"

  - service: production
    service_metric: true # post metrics as service metrics
    http:
      url: 'https://example.net/api/healthcheck'
      method: GET
      headers:
        Content-Type: application/json
      body: '{"hello":"world"}'
      expect_pattern: 'ok'

destination:
  mackerel:
    enabled: true # default true
  otel:
    enabled: true # default false
    endpoint: localhost:4317
    insecure: true
OpenTelemetry metrics endpoint support

Setting destination.otel.enabled: true makes maprobe post metrics to an OpenTelemetry metrics endpoint. maprobe sends the metrics over gRPC.

destination:
  mackerel:
    enabled: false # disable mackerel host/service metrics
  otel:
    enabled: true
    endpoint: localhost:4317
    insecure: true

Extra attributes can be added to metrics with the attributes key in the probe configuration.

By default, maprobe adds service.name and host.id attributes to metrics.

probes:
  - service: production
    role: redis
    command:
      command:
        - "mackerel-plugin-redis"
        - "-host={{ .Host.IPAddress.eth0 }}"
        - "-tempfile=/tmp/redis-{{ .Host.ID }}"
    attributes: # extra attributes
      - service.namespace: redis
      - host.name: "{{ .Host.Name }}"
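
How the default and extra attributes combine can be sketched as a simple map merge. This is an illustration only: buildAttributes is a hypothetical helper, and letting extra attributes override the two defaults is an assumption, not documented maprobe behavior.

```go
package main

import (
	"fmt"
	"sort"
)

// buildAttributes merges the two default attributes named above
// (service.name and host.id) with user-supplied extras.
// Assumption: an extra attribute with the same key overrides a default.
func buildAttributes(service, hostID string, extra map[string]string) map[string]string {
	attrs := map[string]string{
		"service.name": service,
		"host.id":      hostID,
	}
	for k, v := range extra {
		attrs[k] = v
	}
	return attrs
}

func main() {
	attrs := buildAttributes("production", "abc123", map[string]string{
		"service.namespace": "redis",
		"host.name":         "redis01",
	})
	// Print in sorted key order so the output is stable.
	keys := make([]string, 0, len(attrs))
	for k := range attrs {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s=%s\n", k, attrs[k])
	}
}
```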
Service metrics support in probes

Setting service_metric: true in a probe configuration makes maprobe post the metrics as service metrics.

probes:
  - service: production
    service_metric: true # post metrics as service metrics
    # ...

In this case, .Host is not available in probe configuration.

Backup metrics using Amazon Kinesis Firehose

When the Mackerel API is down, maprobe can back up collected metrics to Amazon Kinesis Firehose.

backup:
  firehose_stream_name: your-maprobe-backup

If maprobe cannot post metrics to the Mackerel API, it posts them to the Firehose stream as a backup.

maprobe agent --with-firehose-endpoint or maprobe firehose-endpoint runs an HTTP server that acts as a Firehose HTTP endpoint.

You can configure the Firehose stream to deliver its data to maprobe's HTTP server.

[maprobe] -XXX-> [Mackerel]
          \
        (backup)
            \---> [Firehose](buffer and retry) -(ELB)-> [maprobe HTTP] --> [Mackerel]

The Firehose HTTP endpoint serves the following paths.

  • /post : Endpoint for posting metrics. The Firehose "Access key" must be the same as the MACKEREL_APIKEY set in maprobe.
  • /ping : Always returns 200 OK (for health checks).

maprobe accepts Firehose HTTP requests and sends the metrics to the Mackerel API (when it is available).
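
The two paths can be sketched with net/http as follows. This is a rough illustration under stated assumptions, not maprobe's implementation: newMux and statusFor are hypothetical helpers, the access key is read from the X-Amz-Firehose-Access-Key header that Firehose sends to HTTP endpoints, and decoding the records and forwarding them to Mackerel is left out.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// accessKeyHeader is the request header in which Firehose sends the access key.
const accessKeyHeader = "X-Amz-Firehose-Access-Key"

// newMux builds a handler with the /ping and /post paths.
// apiKey plays the role of MACKEREL_APIKEY.
func newMux(apiKey string) *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/ping", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK) // health check: always 200
	})
	mux.HandleFunc("/post", func(w http.ResponseWriter, r *http.Request) {
		got := []byte(r.Header.Get(accessKeyHeader))
		if subtle.ConstantTimeCompare(got, []byte(apiKey)) != 1 {
			http.Error(w, "forbidden", http.StatusForbidden)
			return
		}
		// A real endpoint would decode the Firehose records here and
		// forward the metrics to Mackerel. This sketch only acknowledges.
		w.WriteHeader(http.StatusOK)
	})
	return mux
}

// statusFor sends one in-process request and returns the response status code.
func statusFor(h http.Handler, method, path, key string) int {
	req := httptest.NewRequest(method, path, strings.NewReader("{}"))
	if key != "" {
		req.Header.Set(accessKeyHeader, key)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	mux := newMux("secret")
	fmt.Println(statusFor(mux, http.MethodGet, "/ping", ""))
	fmt.Println(statusFor(mux, http.MethodPost, "/post", "secret"))
	fmt.Println(statusFor(mux, http.MethodPost, "/post", "wrong"))
}
```

Rejecting a mismatched key before reading the body keeps unauthenticated requests cheap, which matters when the endpoint sits behind a public load balancer.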

Ping

Ping probe sends ICMP ping to the address.

ping:
  address: "192.168.1.1"      # Hostname or IP address (required)
  count: 5                    # Iteration count (default 3)
  timeout: "500ms"            # Timeout to ping response (default 1 sec)
  metric_key_prefix:          # default ping

Ping probe generates the following metrics.

  • ping.count.success (count)
  • ping.count.failure (count)
  • ping.rtt.min (seconds)
  • ping.rtt.max (seconds)
  • ping.rtt.avg (seconds)
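
As an illustration of how these five metrics relate to the individual ping results, the sketch below derives them from a list of round-trip times. pingMetrics is a hypothetical helper, and omitting the rtt metrics when every ping fails is an assumption of the sketch.

```go
package main

import "fmt"

// pingMetrics summarizes the round-trip times (in seconds) of the pings
// that succeeded. attempts is the total number of pings sent, so
// attempts - len(rtts) pings failed.
func pingMetrics(rtts []float64, attempts int) map[string]float64 {
	m := map[string]float64{
		"ping.count.success": float64(len(rtts)),
		"ping.count.failure": float64(attempts - len(rtts)),
	}
	if len(rtts) == 0 {
		return m // assumption: no rtt metrics when every ping failed
	}
	lo, hi, sum := rtts[0], rtts[0], 0.0
	for _, v := range rtts {
		if v < lo {
			lo = v
		}
		if v > hi {
			hi = v
		}
		sum += v
	}
	m["ping.rtt.min"] = lo
	m["ping.rtt.max"] = hi
	m["ping.rtt.avg"] = sum / float64(len(rtts))
	return m
}

func main() {
	m := pingMetrics([]float64{0.010, 0.020, 0.030}, 5)
	for _, k := range []string{
		"ping.count.success", "ping.count.failure",
		"ping.rtt.min", "ping.rtt.max", "ping.rtt.avg",
	} {
		fmt.Printf("%s\t%.3f\n", k, m[k])
	}
}
```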
TCP

TCP probe connects to host:port by TCP (or TLS).

tcp:
  host: "memcached.example.com" # Hostname or IP Address (required)
  port: 11211                   # Port number (required)
  timeout: 10s                  # Seconds of timeout (default 5)
  send: "VERSION\n"             # String to send to the server
  quit: "QUIT\n"                # String to send to the server to initiate a clean close of the connection
  expect_pattern: "^VERSION 1"  # Regexp pattern to expect in server response
  tls: false                    # Use TLS for connection
  no_check_certificate: false   # Do not check certificate
  metric_key_prefix:            # default tcp

TCP probe generates the following metrics.

  • tcp.check.ok (0 or 1)
  • tcp.elapsed.seconds (seconds)
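
The send / expect_pattern exchange can be sketched over a net.Conn as shown below. This is a simplified illustration, not maprobe's code: checkConn and demo are hypothetical helpers, the reply is read with a single Read call (a real probe may need to read until the pattern matches or the limit is reached), and the fake server runs over an in-memory pipe.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"regexp"
	"time"
)

// checkConn writes send to conn, reads up to maxBytes of the reply,
// and reports whether the reply matches expect.
func checkConn(conn net.Conn, send string, expect *regexp.Regexp, maxBytes int, timeout time.Duration) (bool, error) {
	if err := conn.SetDeadline(time.Now().Add(timeout)); err != nil {
		return false, err
	}
	if _, err := io.WriteString(conn, send); err != nil {
		return false, err
	}
	buf := make([]byte, maxBytes)
	n, err := conn.Read(buf)
	if err != nil && err != io.EOF {
		return false, err
	}
	return expect.Match(buf[:n]), nil
}

// demo runs checkConn against an in-memory fake server that answers with reply.
func demo(reply string) (bool, error) {
	client, server := net.Pipe()
	defer client.Close()
	go func() {
		defer server.Close()
		buf := make([]byte, 64)
		if _, err := server.Read(buf); err != nil {
			return
		}
		io.WriteString(server, reply)
	}()
	return checkConn(client, "PING\n", regexp.MustCompile("PONG"), 32*1024, time.Second)
}

func main() {
	start := time.Now()
	ok, err := demo("+PONG\r\n")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("tcp.check.ok:", ok)
	fmt.Println("tcp.elapsed.seconds:", time.Since(start).Seconds())
}
```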
HTTP

HTTP probe sends an HTTP request to url.

http:
  url: "http://example.com/"     # URL
  method: "GET"                  # Method of request (default GET)
  headers:                       # Map of request header
    Foo: "bar"
  body: ""                       # Body of request
  expect_pattern: "ok"           # Regexp pattern to expect in server response
  timeout: 10s                   # Seconds of request timeout (default 15)
  no_check_certificate: false    # Do not check certificate
  metric_key_prefix:             # default http

HTTP probe generates the following metrics.

  • http.check.ok (0 or 1)
  • http.response_time.seconds (seconds)
  • http.status.code (100~)
  • http.content.length (bytes)

When the status code is greater than 400, http.check.ok is set to 0.
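
A sketch of how the four metrics could be derived from one response is shown below. httpMetrics is a hypothetical helper, not part of maprobe. Three points are assumptions of the sketch: a status of exactly 400 is treated as a failure too, a body that does not match expect_pattern also sets http.check.ok to 0, and http.content.length is taken from the length of the body that was read.

```go
package main

import (
	"fmt"
	"regexp"
)

// httpMetrics derives the HTTP probe metrics from one response.
// An empty expectPattern disables the body check.
func httpMetrics(status int, body string, elapsedSeconds float64, expectPattern string) (map[string]float64, error) {
	ok := 1.0
	// Assumption: 400 and above count as failure.
	if status >= 400 {
		ok = 0
	}
	if expectPattern != "" {
		re, err := regexp.Compile(expectPattern)
		if err != nil {
			return nil, err
		}
		// Assumption: a body that does not match fails the check.
		if !re.MatchString(body) {
			ok = 0
		}
	}
	return map[string]float64{
		"http.check.ok":              ok,
		"http.response_time.seconds": elapsedSeconds,
		"http.status.code":           float64(status),
		"http.content.length":        float64(len(body)),
	}, nil
}

func main() {
	m, err := httpMetrics(200, "status: ok", 0.12, "ok")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(m["http.check.ok"], m["http.status.code"], m["http.content.length"])
}
```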

Command

Command probe executes a command whose output follows the Mackerel metric plugin format.

command:
  command: "/path/to/metric-command -option=foo" # execute command
  timeout: "5s"                      # Seconds of command timeout (default 15)
  graph_defs: true                   # Post graph definitions to Mackerel (default false)
  env:  # environment variables for command execution
    FOO: foo
    BAR: bar

command accepts either a single string or an array. When an array is passed, the arguments are not processed by a shell.

command:
  command:
    - "/path/to/metric-command"
    - "-option=foo"
  timeout: "5s"                      # Seconds of command timeout (default 15)
  graph_defs: true                   # Post graph definitions to Mackerel (default false)

Command probe handles the command's output as host metrics.

When graph_defs is true, maprobe runs the command with the MACKEREL_AGENT_PLUGIN_META=1 environment variable and posts the graph definitions to Mackerel on the first run.

If the command does not return valid graph definitions, the output is ignored.

See also "Posting custom metrics for a host" - Mackerel Help.
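
Mackerel metric plugins print one metric per line as name, value, and epoch seconds separated by tabs. The sketch below parses such a line. parseLine and the metric type are hypothetical, and prepending "custom." to the name is an assumption based on the CustomPrefix constant exported by this package.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// customPrefix mirrors the package's CustomPrefix constant.
const customPrefix = "custom."

// metric is a hypothetical, simplified metric value.
type metric struct {
	Name      string
	Value     float64
	Timestamp time.Time
}

// parseLine parses one "name<TAB>value<TAB>epoch" line of plugin output.
func parseLine(line string) (metric, error) {
	fields := strings.Split(strings.TrimSpace(line), "\t")
	if len(fields) != 3 {
		return metric{}, fmt.Errorf("expected 3 tab-separated fields, got %d", len(fields))
	}
	value, err := strconv.ParseFloat(fields[1], 64)
	if err != nil {
		return metric{}, fmt.Errorf("invalid value %q: %w", fields[1], err)
	}
	epoch, err := strconv.ParseInt(fields[2], 10, 64)
	if err != nil {
		return metric{}, fmt.Errorf("invalid timestamp %q: %w", fields[2], err)
	}
	return metric{
		Name:      customPrefix + fields[0], // assumption: prefix added here
		Value:     value,
		Timestamp: time.Unix(epoch, 0),
	}, nil
}

func main() {
	m, err := parseLine("redis.connections\t12\t1712300000")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(m.Name, m.Value, m.Timestamp.Unix())
}
```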

Example of automated cleanup for terminated EC2 instances

Command probe can run any script against Mackerel hosts.

For example,

service: production
role: server
statuses:
  - working
  - standby
  - poweroff
command:
  command: 'cleanup.sh {{.Host.ID}} {{index .Host.Meta.Cloud.MetaData "instance-id"}}'

cleanup.sh checks the instance status and retires the Mackerel host when the instance no longer exists.

#!/bin/bash
set -u
host_id="$1"
instance_id="$2"
exec 1> /dev/null # dispose stdout
result=$(aws ec2 describe-instance-status --instance-id "${instance_id}" 2>&1)
if [[ $? == 0 ]]; then
  exit
elif [[ $result =~ "InvalidInstanceID.NotFound" ]]; then
   mkr retire --force "${host_id}"
fi
Example Configuration for aggregates
post_aggregated_metrics: false   # when false, do not post service metrics to Mackerel. only dump to [info] log.
aggregates:
  - service: production
    role: app-server
    metrics:
      - name: cpu.user.percentage
        outputs:
          - func: sum
            name: cpu.user.sum_percentage
          - func: avg
            name: cpu.user.avg_percentage
      - name: cpu.idle.percentage
        outputs:
          - func: sum
            name: cpu.idle.sum_percentage
          - func: avg
            name: cpu.idle.avg_percentage

This configuration posts service metrics (for service "production") as below.

  • cpu.user.sum_percentage = sum(cpu.user.percentage) of production:app-server
  • cpu.user.avg_percentage = avg(cpu.user.percentage) of production:app-server
  • cpu.idle.sum_percentage = sum(cpu.idle.percentage) of production:app-server
  • cpu.idle.avg_percentage = avg(cpu.idle.percentage) of production:app-server
functions for aggregates

The following functions are available to aggregate host metrics.

  • sum
  • min / minimum
  • max / maximum
  • avg / average
  • median
  • count
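
To make the semantics of these functions concrete, here is a minimal sketch that applies one of them to the latest values collected from the matching hosts. aggregate is a hypothetical helper, not maprobe's implementation. Averaging the two middle values for an even-sized median and returning an error for empty input (except for count) are assumptions.

```go
package main

import (
	"fmt"
	"sort"
)

// aggregate applies the named function to values.
// Both the short and long names listed above are accepted.
func aggregate(fn string, values []float64) (float64, error) {
	if fn == "count" {
		return float64(len(values)), nil
	}
	if len(values) == 0 {
		return 0, fmt.Errorf("%s: no values", fn)
	}
	// Work on a sorted copy so the caller's slice is left untouched.
	s := append([]float64(nil), values...)
	sort.Float64s(s)
	sum := 0.0
	for _, v := range s {
		sum += v
	}
	switch fn {
	case "sum":
		return sum, nil
	case "min", "minimum":
		return s[0], nil
	case "max", "maximum":
		return s[len(s)-1], nil
	case "avg", "average":
		return sum / float64(len(s)), nil
	case "median":
		mid := len(s) / 2
		if len(s)%2 == 1 {
			return s[mid], nil
		}
		// Assumption: even-sized input uses the mean of the two middle values.
		return (s[mid-1] + s[mid]) / 2, nil
	default:
		return 0, fmt.Errorf("unknown function %q", fn)
	}
}

func main() {
	cpuUser := []float64{12.5, 40.0, 7.5, 20.0} // one value per host
	for _, fn := range []string{"sum", "min", "max", "avg", "median", "count"} {
		v, err := aggregate(fn, cpuUser)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s = %v\n", fn, v)
	}
}
```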

Author

Fujiwara Shunichiro fujiwara.shunichiro@gmail.com

License

Copyright 2018 Fujiwara Shunichiro

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Documentation

Constants

const CustomPrefix = "custom."

Variables

var (
	DefaultHTTPTimeout         = 15 * time.Second
	DefaultHTTPMetricKeyPrefix = "http"
)
var (
	Version                = "0.5.4"
	MaxConcurrency         = 100
	MaxClientConcurrency   = 5
	PostMetricBufferLength = 100

	ProbeInterval = 60 * time.Second

	MackerelAPIKey string
)
var (
	DefaultPingTimeout    = time.Second
	DefaultPingCount      = 3
	DefaultPingMetricName = "ping"
)
var (
	DefaultTCPTimeout         = 5 * time.Second
	DefaultTCPMaxBytes        = 32 * 1024
	DefaultTCPMetricKeyPrefix = "tcp"
)
var DefaultCommandTimeout = 15 * time.Second

Functions

func GetSSMParameter added in v0.5.1

func GetSSMParameter(ctx context.Context, name string) (string, error)

func Run

func Run(ctx context.Context, wg *sync.WaitGroup, configPath string, once bool) error

func RunFirehoseEndpoint added in v0.4.0

func RunFirehoseEndpoint(ctx context.Context, wg *sync.WaitGroup, port int)

RunFirehoseEndpoint runs Firehose HTTP endpoint server.

Types

type AggregateDefinition added in v0.2.0

type AggregateDefinition struct {
	Service  exString        `yaml:"service"`
	Role     exString        `yaml:"role"`
	Roles    []exString      `yaml:"roles"`
	Statuses []exString      `yaml:"statuses"`
	Metrics  []*MetricConfig `yaml:"metrics"`
}

type Attribute added in v0.7.0

type Attribute struct {
	Service string
	HostID  string
	Extra   map[string]string
}

func (*Attribute) Otel added in v0.7.0

func (a *Attribute) Otel() *otelattribute.Set

func (*Attribute) SetExtra added in v0.7.0

func (a *Attribute) SetExtra(ex map[string]string, host *mackerel.Host)

func (Attribute) String added in v0.7.0

func (a Attribute) String() string

type BackupConfig added in v0.4.0

type BackupConfig struct {
	FirehoseStreamName string `yaml:"firehose_stream_name"`
}

type Channels added in v0.7.0

type Channels struct {
	ServiceMetrics chan ServiceMetric
	HostMetrics    chan HostMetric
	OtelMetrics    chan Metric
	Destination    *DestinationConfig
}

func NewChannels added in v0.7.0

func NewChannels(dst *DestinationConfig) *Channels

func (*Channels) Close added in v0.7.0

func (ch *Channels) Close()

func (*Channels) SendAggregatedMetric added in v0.7.0

func (ch *Channels) SendAggregatedMetric(m ServiceMetric)

func (*Channels) SendHostMetric added in v0.7.0

func (ch *Channels) SendHostMetric(m HostMetric)

func (*Channels) SendServiceMetric added in v0.7.0

func (ch *Channels) SendServiceMetric(m ServiceMetric)

type Client added in v0.4.0

type Client struct {
	// contains filtered or unexported fields
}

func (*Client) FindHosts added in v0.4.0

func (client *Client) FindHosts(p *mackerel.FindHostsParam) ([]*mackerel.Host, error)

func (*Client) PostHostMetricValues added in v0.4.0

func (c *Client) PostHostMetricValues(mvs []*mackerel.HostMetricValue) error

func (*Client) PostServiceMetricValues added in v0.4.0

func (c *Client) PostServiceMetricValues(serviceName string, mvs []*mackerel.MetricValue) error

type CommandProbe

type CommandProbe struct {
	Command   []string
	Timeout   time.Duration
	GraphDefs bool
	// contains filtered or unexported fields
}

func (*CommandProbe) GetGraphDefs added in v0.1.0

func (p *CommandProbe) GetGraphDefs() (*GraphsOutput, error)

func (*CommandProbe) MetricName

func (p *CommandProbe) MetricName(name string) string

func (*CommandProbe) PostGraphDefs added in v0.1.0

func (p *CommandProbe) PostGraphDefs(client *mackerel.Client, pc *CommandProbeConfig) error

func (*CommandProbe) Run

func (p *CommandProbe) Run(_ context.Context) (ms Metrics, err error)

func (*CommandProbe) String

func (p *CommandProbe) String() string

func (*CommandProbe) TempDir added in v0.3.5

func (p *CommandProbe) TempDir() string

type CommandProbeConfig

type CommandProbeConfig struct {
	RawCommand interface{} `yaml:"command"`

	Timeout   time.Duration     `yaml:"timeout"`
	GraphDefs bool              `yaml:"graph_defs"`
	Env       map[string]string `yaml:"env"`
	// contains filtered or unexported fields
}

func (*CommandProbeConfig) GenerateProbe

func (pc *CommandProbeConfig) GenerateProbe(host *mackerel.Host, client *mackerel.Client) (Probe, error)

type Config

type Config struct {
	Probes            []*ProbeDefinition `yaml:"probes"`
	PostProbedMetrics bool               `yaml:"post_probed_metrics"`

	Aggregates            []*AggregateDefinition `yaml:"aggregates"`
	PostAggregatedMetrics bool                   `yaml:"post_aggregated_metrics"`

	ProbeOnly *bool `yaml:"probe_only"` // deprecated

	Backup      *BackupConfig      `yaml:"backup"`
	Destination *DestinationConfig `yaml:"destination"`
	// contains filtered or unexported fields
}

func LoadConfig

func LoadConfig(location string) (*Config, string, error)

func (*Config) String added in v0.0.1

func (c *Config) String() string

type DestinationConfig added in v0.7.0

type DestinationConfig struct {
	Mackerel *MackerelConfig `yaml:"mackerel"`
	Otel     *OtelConfig     `yaml:"otel"`
}

type Graph added in v0.1.0

type Graph struct {
	Label   string            `json:"label"`
	Unit    string            `json:"unit"`
	Metrics []GraphDefsMetric `json:"metrics"`
}

type GraphDefsMetric added in v0.1.0

type GraphDefsMetric struct {
	Name    string `json:"name"`
	Label   string `json:"label"`
	Stacked bool   `json:"stacked"`
}

type GraphsOutput added in v0.1.0

type GraphsOutput struct {
	Graphs map[string]Graph `json:"graphs"`
}

type HTTPProbe

type HTTPProbe struct {
	URL                string
	Method             string
	Headers            map[string]string
	Body               string
	ExpectPattern      *regexp.Regexp
	Timeout            time.Duration
	NoCheckCertificate bool
	// contains filtered or unexported fields
}

func (*HTTPProbe) HostID

func (p *HTTPProbe) HostID() string

func (*HTTPProbe) MetricName

func (p *HTTPProbe) MetricName(name string) string

func (*HTTPProbe) Run

func (p *HTTPProbe) Run(_ context.Context) (ms Metrics, err error)

func (*HTTPProbe) String

func (p *HTTPProbe) String() string

type HTTPProbeConfig

type HTTPProbeConfig struct {
	URL                string            `yaml:"url"`
	Method             string            `yaml:"method"`
	Headers            map[string]string `yaml:"headers"`
	Body               string            `yaml:"body"`
	ExpectPattern      string            `yaml:"expect_pattern"`
	Timeout            time.Duration     `yaml:"timeout"`
	NoCheckCertificate bool              `yaml:"no_check_certificate"`
	MetricKeyPrefix    string            `yaml:"metric_key_prefix"`
}

func (*HTTPProbeConfig) GenerateProbe

func (pc *HTTPProbeConfig) GenerateProbe(host *mackerel.Host) (Probe, error)

type HostMetric added in v0.2.0

type HostMetric struct {
	HostID string
	Metric
}

func (HostMetric) HostMetricValue added in v0.2.0

func (m HostMetric) HostMetricValue() *mackerel.HostMetricValue

func (HostMetric) String added in v0.2.0

func (m HostMetric) String() string

type HostMetrics added in v0.2.0

type HostMetrics []HostMetric

func (HostMetrics) String added in v0.2.0

func (ms HostMetrics) String() string

type MackerelConfig added in v0.7.0

type MackerelConfig struct {
	Enabled bool `yaml:"enabled"`
}

type Metric

type Metric struct {
	Name      string
	Value     float64
	Timestamp time.Time
	Attribute *Attribute
}

func (Metric) HostMetric added in v0.6.0

func (m Metric) HostMetric(hostID string) HostMetric

func (Metric) Otel added in v0.7.0

func (m Metric) Otel() otelmetricdata.Metrics

func (Metric) OtelString added in v0.7.0

func (m Metric) OtelString() string

func (Metric) ServiceMetric added in v0.6.0

func (m Metric) ServiceMetric(service string) ServiceMetric

func (Metric) String

func (m Metric) String() string

type MetricConfig added in v0.2.0

type MetricConfig struct {
	Name    exString        `yaml:"name"`
	Outputs []*OutputConfig `yaml:"outputs"`
}

type Metrics

type Metrics []Metric

func (Metrics) String

func (ms Metrics) String() string

type OtelConfig added in v0.7.0

type OtelConfig struct {
	Enabled  bool   `yaml:"enabled"`
	Endpoint string `yaml:"endpoint"`
	Insecure bool   `yaml:"insecure"`
}

type OtelMetric added in v0.7.0

type OtelMetric interface {
	ServiceMetric | HostMetric
}

type OutputConfig added in v0.2.0

type OutputConfig struct {
	Func     exString `yaml:"func"`
	Name     exString `yaml:"name"`
	EmitZero bool     `yaml:"emit_zero"`
	// contains filtered or unexported fields
}

type PingProbe

type PingProbe struct {
	Address string
	Count   int
	Timeout time.Duration
	// contains filtered or unexported fields
}

func (*PingProbe) MetricName

func (p *PingProbe) MetricName(name string) string

func (*PingProbe) Run

func (p *PingProbe) Run(ctx context.Context) (Metrics, error)

func (*PingProbe) String

func (p *PingProbe) String() string

type PingProbeConfig

type PingProbeConfig struct {
	Address         string        `yaml:"address"`
	Count           int           `yaml:"count"`
	Timeout         time.Duration `yaml:"timeout"`
	MetricKeyPrefix string        `yaml:"metric_key_prefix"`
}

func (*PingProbeConfig) GenerateProbe

func (pc *PingProbeConfig) GenerateProbe(host *mackerel.Host) (Probe, error)

type Probe

type Probe interface {
	Run(ctx context.Context) (Metrics, error)
	MetricName(string) string
}

type ProbeConfig

type ProbeConfig interface {
	GenerateProbe(host *mackerel.Host) (Probe, error)
}

type ProbeDefinition

type ProbeDefinition struct {
	Service  exString   `yaml:"service"`
	Role     exString   `yaml:"role"`
	Roles    []exString `yaml:"roles"`
	Statuses []exString `yaml:"statuses"`

	IsServiceMetric bool `yaml:"service_metric"`

	Ping    *PingProbeConfig    `yaml:"ping"`
	TCP     *TCPProbeConfig     `yaml:"tcp"`
	HTTP    *HTTPProbeConfig    `yaml:"http"`
	Command *CommandProbeConfig `yaml:"command"`

	Attributes map[string]string `yaml:"attributes"`
}

func (*ProbeDefinition) GenerateProbes

func (pd *ProbeDefinition) GenerateProbes(host *mackerel.Host, client *mackerel.Client) []Probe

func (*ProbeDefinition) RunHostProbes added in v0.6.0

func (pd *ProbeDefinition) RunHostProbes(ctx context.Context, client *Client) []HostMetric

func (*ProbeDefinition) RunProbes added in v0.6.0

func (pd *ProbeDefinition) RunProbes(ctx context.Context, client *Client, chs *Channels, wg *sync.WaitGroup)

func (*ProbeDefinition) RunServiceProbes added in v0.6.0

func (pd *ProbeDefinition) RunServiceProbes(ctx context.Context, client *Client) []ServiceMetric

func (*ProbeDefinition) Validate added in v0.6.0

func (pd *ProbeDefinition) Validate() error

type ServiceMetric added in v0.2.0

type ServiceMetric struct {
	Service string
	Metric
}

func (ServiceMetric) MetricValue added in v0.2.0

func (m ServiceMetric) MetricValue() *mackerel.MetricValue

func (ServiceMetric) String added in v0.2.0

func (m ServiceMetric) String() string

type ServiceMetrics added in v0.2.0

type ServiceMetrics []ServiceMetric

func (ServiceMetrics) String added in v0.2.0

func (ms ServiceMetrics) String() string

type TCPProbe

type TCPProbe struct {
	Host               string
	Port               string
	Send               string
	Quit               string
	MaxBytes           int
	ExpectPattern      *regexp.Regexp
	Timeout            time.Duration
	TLS                bool
	NoCheckCertificate bool
	// contains filtered or unexported fields
}

func (*TCPProbe) MetricName

func (p *TCPProbe) MetricName(name string) string

func (*TCPProbe) Run

func (p *TCPProbe) Run(_ context.Context) (ms Metrics, err error)

func (*TCPProbe) String

func (p *TCPProbe) String() string

type TCPProbeConfig

type TCPProbeConfig struct {
	Host               string        `yaml:"host"`
	Port               string        `yaml:"port"`
	Timeout            time.Duration `yaml:"timeout"`
	Send               string        `yaml:"send"`
	Quit               string        `yaml:"quiet"`
	MaxBytes           int           `yaml:"max_bytes"`
	ExpectPattern      string        `yaml:"expect_pattern"`
	TLS                bool          `yaml:"tls"`
	NoCheckCertificate bool          `yaml:"no_check_certificate"`
	MetricKeyPrefix    string        `yaml:"metric_key_prefix"`
}

func (*TCPProbeConfig) GenerateProbe

func (pc *TCPProbeConfig) GenerateProbe(host *mackerel.Host) (Probe, error)

Directories

Path Synopsis
cmd
