slog-agent

command module
v0.0.0-...-f132d4f Latest Latest
Published: Nov 2, 2023 License: MIT Imports: 6 Imported by: 0

README

slog-agent

A log agent designed to process and filter massive amounts of logs in real time and forward them upstream (fluentd).

What we built this for

We have hundreds of thousands of application logs per second that need to be processed or filtered as quickly as possible, for each server.

At the target rate of one million logs per second, every step can be a bottleneck, and conventional log processors are not designed to handle that sort of traffic. This agent is built to be extremely efficient in both memory and CPU usage, and to scale across multiple CPU cores efficiently, at the cost of everything else.

A possibly biased and unfair comparison of this agent against a Lua transform in fluent-bit: roughly 0.5M log/s from network input, processed and gzipped at a 1:20-50 ratio (2 cores), versus 50K log/s from file, uncompressed (one core), for the same processing steps. We also tested Vector, with similar but worse results.

What you need to adopt this

You need a basic understanding of Go, and you should be ready to write new transforms and dig into profiling reports.

Generic log processors are slow for very good reasons. For example, simple matching by regular expression can be 50 times slower than a specialized glob pattern, and it allocates many buffers on the heap, which then cost more CPU time to garbage-collect. The boundary-crossing scripting interface is another bottleneck: marshalling and unmarshalling each record can cost more than the script execution itself.
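To make the regular-expression point concrete, here is a minimal sketch that extracts the same value twice, once with Go's regexp package and once with a fixed prefix and suffix. The function names and the sample line are illustrative assumptions, not code from this repository.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// taskPattern is the generic approach: a compiled regular expression with one capture.
var taskPattern = regexp.MustCompile(`^task=(.*) done$`)

// extractByRegexp returns the captured value using the regexp engine.
func extractByRegexp(line string) (string, bool) {
	m := taskPattern.FindStringSubmatch(line) // allocates a []string on each match
	if m == nil {
		return "", false
	}
	return m[1], true
}

// extractByAffix does the same job with a fixed prefix and suffix.
// It returns a substring of line, so no new buffer is allocated.
func extractByAffix(line, prefix, suffix string) (string, bool) {
	if len(line) < len(prefix)+len(suffix) ||
		!strings.HasPrefix(line, prefix) || !strings.HasSuffix(line, suffix) {
		return "", false
	}
	return line[len(prefix) : len(line)-len(suffix)], true
}

func main() {
	line := "task=reindex done"
	a, _ := extractByRegexp(line)
	b, _ := extractByAffix(line, "task=", " done")
	fmt.Println(a == b, b)
}
```

The two functions agree on this input; how large the speed difference is depends on the pattern and the data, so it should be measured with a benchmark rather than assumed.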

Without such generic and flexible transforms and parsers, everything has to be done in hand-written code, or in blocks of code that can be assembled together. That is essentially what this log agent provides: a base and building blocks for high-performance log processors, worthwhile only if you need that kind of performance. The design is pluggable and the program is largely configurable, but you will run into situations that can only be solved by writing new code.

Features

  • Input: RFC 5424 Syslog protocol via TCP, with experimental multiline support
  • Transforms: field extraction and creation, drop, truncate, if/switch, email redaction
  • Buffering: hybrid disk+memory buffering - compressed and only persisted when necessary
  • Output: Fluentd Forward protocol, both compressed and uncompressed. Single output only.
  • Metrics: Prometheus metrics to count logs and log size by key fields (e.g. vhost + log level + filename)

Dynamic fields are not supported - All fields must be known in configuration because they're packed in arrays that can be accessed without hashmap lookup.
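As a rough illustration of why a fixed field list helps, the sketch below resolves field names to slice positions once, at setup time, and then uses plain indexing per record. The types schema and record are hypothetical and do not reflect the agent's actual data structures.

```go
package main

import "fmt"

// schema maps field names to array positions. It is built once from configuration.
type schema struct {
	names []string
	index map[string]int // consulted only at setup time
}

func newSchema(names ...string) *schema {
	s := &schema{names: names, index: make(map[string]int, len(names))}
	for i, n := range names {
		s.index[n] = i
	}
	return s
}

// locate resolves a field name to its position; unknown names are a configuration error.
func (s *schema) locate(name string) (int, error) {
	i, ok := s.index[name]
	if !ok {
		return -1, fmt.Errorf("field %q is not declared in the schema", name)
	}
	return i, nil
}

// record holds one value per declared field, in schema order.
type record struct {
	fields []string
}

func (s *schema) newRecord() record {
	return record{fields: make([]string, len(s.names))}
}

func main() {
	s := newSchema("app", "level", "message")
	levelAt, err := s.locate("level") // resolved once, before any log is processed
	if err != nil {
		panic(err)
	}
	r := s.newRecord()
	r.fields[levelAt] = "warn" // hot path: plain slice indexing, no map lookup
	fmt.Println(r.fields[levelAt])
}
```

The trade-off is the one stated above: a field that is not declared up front has no slot, so it cannot be added at runtime.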

There are no "tags" or similar concepts here. Instead, "if" and "switch-case" match on field values.

See the sample configurations for full features.

Performance and Backpressure

Logs are compressed and saved in chunk files if the output cannot clear them fast enough. The maximum number of pending chunks for each pipeline (key field set) is limited and defined in defs/params.go.

Input would be paused if logs cannot be processed fast enough - since RFC 5424 doesn't support any pause mechanism, it'd likely cause internal errors on both the agent and the logging application, but would not affect other applications' logging if pipelines are properly set-up / isolated (e.g. by app-name and vhost).
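The pause behaviour can be pictured as a bounded queue per pipeline: once the limit of pending chunks is reached, the producer is told to stop. This is only a sketch under that assumption; chunkQueue and its methods are hypothetical names, and the real limits live in defs/params.go.

```go
package main

import "fmt"

// chunkQueue is a hypothetical bounded queue of compressed chunks for one pipeline.
type chunkQueue struct {
	pending chan []byte
}

func newChunkQueue(limit int) *chunkQueue {
	return &chunkQueue{pending: make(chan []byte, limit)}
}

// offer enqueues without blocking and reports false when the limit is reached,
// which is the signal for the caller to stop reading from its input.
func (q *chunkQueue) offer(chunk []byte) bool {
	select {
	case q.pending <- chunk:
		return true
	default:
		return false
	}
}

// take removes the oldest chunk, as an output would once upstream accepts it.
func (q *chunkQueue) take() ([]byte, bool) {
	select {
	case c := <-q.pending:
		return c, true
	default:
		return nil, false
	}
}

func main() {
	q := newChunkQueue(2)
	fmt.Println(q.offer([]byte("a")), q.offer([]byte("b")), q.offer([]byte("c")))
	q.take() // the output drains one chunk
	fmt.Println(q.offer([]byte("c")))
}
```

Because each pipeline owns its queue in this sketch, a full queue only stalls the inputs feeding that pipeline, which matches the isolation described above.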

For a typical server CPU (e.g. Xeon, 2GHz), a single pipeline / core should be able to handle at least:

  • 300-500K log/s for small logs, around 100-200 bytes each including syslog headers
  • 200K log/s or 400MB/s for larger logs

Note: on servers with more than a few dozen CPU cores, an optimal GOMAXPROCS has to be measured and set for the production workload, until https://github.com/golang/go/issues/28808 is resolved.
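The Go runtime reads the GOMAXPROCS environment variable at startup, so the value can be set without code changes. If it has to be capped programmatically, a sketch like the following would do; capProcs is a hypothetical helper, and the limit of 16 is a placeholder rather than a recommendation.

```go
package main

import (
	"fmt"
	"runtime"
)

// capProcs limits GOMAXPROCS to at most limit and returns the value now in effect.
// A limit below 1 leaves the setting unchanged.
func capProcs(limit int) int {
	current := runtime.GOMAXPROCS(0) // an argument of 0 only queries the current value
	if limit >= 1 && limit < current {
		runtime.GOMAXPROCS(limit)
		return limit
	}
	return current
}

func main() {
	fmt.Println("CPUs:", runtime.NumCPU(), "GOMAXPROCS:", capProcs(16))
}
```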

Build

Requires gotils, which provides the build tools.

make
make test

Operation manual

Configuration

See sample configurations.

Experimental configuration reloading is supported by starting with --allow_reload and sending SIGHUP; see [testdata/config_sample.yml] for details on which sections may be reconfigured. In general, everything after inputs is reconfigurable. If reconfiguration fails, errors are logged and the agent continues to run with the old configuration, without any side effects.
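The "keep the old configuration on failure" behaviour is a common pattern in Go services. Here is a minimal sketch of it, assuming a single atomically swapped config pointer; the types agent and config and the loaders loadBroken and loadV2 are hypothetical and not taken from the agent's source.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"os/signal"
	"sync/atomic"
	"syscall"
)

// config stands in for the parsed configuration file (hypothetical type).
type config struct{ version int }

// agent holds the active configuration behind an atomic pointer,
// so readers never observe a half-applied configuration.
type agent struct {
	active atomic.Pointer[config]
}

// reload swaps in a new configuration only if load succeeds;
// on failure the previous configuration stays active.
func (a *agent) reload(load func() (*config, error)) error {
	next, err := load()
	if err != nil {
		return fmt.Errorf("reload rejected, keeping current config: %w", err)
	}
	a.active.Store(next)
	return nil
}

// watch calls reload for every signal received, until sig is closed.
func (a *agent) watch(sig <-chan os.Signal, load func() (*config, error)) {
	for range sig {
		if err := a.reload(load); err != nil {
			fmt.Fprintln(os.Stderr, err)
		}
	}
}

// Two stand-in loaders: one that fails and one that succeeds.
func loadBroken() (*config, error) { return nil, errors.New("invalid YAML") }
func loadV2() (*config, error)     { return &config{version: 2}, nil }

func main() {
	a := &agent{}
	a.active.Store(&config{version: 1})

	sig := make(chan os.Signal, 2)
	signal.Notify(sig, syscall.SIGHUP)
	// A long-running agent would now run watch in a goroutine with a file-based loader.

	// To let the example terminate, stop delivery and drive the channel by hand.
	signal.Stop(sig)
	sig <- syscall.SIGHUP
	close(sig)
	a.watch(sig, loadBroken) // logs the error; the active config is untouched
	fmt.Println("active version:", a.active.Load().version)
}
```

Swapping a pointer only covers the configuration value itself. Recreating pipelines and re-sending unacknowledged logs, as described in this section, is outside what this sketch shows.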

Note that after a successful reload, some earlier logs may be sent upstream again if they had not been acknowledged in time.

The metric family slogagent_reloads_total counts successes and failures of reconfigurations.

Currently it is not possible to recover previously queued logs if orchestration/keys have been changed.

Runtime diagnosis

  • SIGHUP aborts and recreates all pipelines with new config loaded from the same file. Incoming connections are unaffected.
  • SIGUSR1 recreates all outgoing connections or sessions gracefully.
  • http://localhost:METRICS_PORT/ provides Go's built-in debug functions, such as stack dumps and profiling, in addition to metrics.

Development

Mark inlinable code

Add an xx:inline comment on the same line as the function declaration:

func (s *xLogSchema) GetFieldName(index int) string { // xx:inline

If the function is too complex to be inlined, the build fails with a warning.
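For orientation, a complete function carrying the marker might look like the sketch below. The type logSchema is a hypothetical stand-in; only the comment convention comes from the text above.

```go
package main

import "fmt"

// logSchema is a hypothetical stand-in for a schema type holding field names.
type logSchema struct{ fieldNames []string }

// GetFieldName is a small leaf accessor, the kind of function the Go compiler
// can usually inline. The trailing marker follows the convention described above.
func (s *logSchema) GetFieldName(index int) string { // xx:inline
	return s.fieldNames[index]
}

func main() {
	s := &logSchema{fieldNames: []string{"app", "level", "message"}}
	fmt.Println(s.GetFieldName(1))
}
```

With the standard toolchain, go build -gcflags=-m prints the compiler's inlining decisions, which is one way to check a function like this by hand.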

Re-generate templated source (.tpl.go)

make gen

Re-generate expected output in integration tests

make test-gen

Runtime Diagnosis

The Prometheus listener address (default :9335) also exposes Go's /debug/pprof endpoints alongside the metrics, which can dump goroutine stacks among other things.
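Serving pprof from the same HTTP listener is typically done with the standard net/http/pprof handlers. The sketch below registers them on a mux and requests a goroutine dump in-process; newDebugMux and fetchStatus are hypothetical helpers, and the metrics handler is left out.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/http/pprof"
)

// newDebugMux returns a mux serving the standard pprof endpoints.
// A metrics handler would be registered on the same mux; it is omitted here.
func newDebugMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/debug/pprof/", pprof.Index) // also serves named profiles such as goroutine and heap
	mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
	mux.HandleFunc("/debug/pprof/trace", pprof.Trace)
	return mux
}

// fetchStatus issues an in-process GET against the handler and returns the status code.
func fetchStatus(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	mux := newDebugMux()
	// goroutine?debug=2 requests a full stack dump of every goroutine.
	fmt.Println(fetchStatus(mux, "/debug/pprof/goroutine?debug=2"))
	// A long-running process would call http.ListenAndServe(":9335", mux) instead.
}
```

Against a running agent, the same path can be fetched from the listener address with any HTTP client.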

Options:

  • --cpuprofile FILE_PATH: enable Go CPU profiling, with some overhead
  • --memprofile FILE_PATH: enable Go memory profiling
  • --trace FILE_PATH: enable Go tracing
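Flags like these are commonly wired to the standard runtime/pprof package. The following is a sketch of that wiring for the CPU profile only, not the agent's actual implementation; startCPUProfile is a hypothetical helper.

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"runtime/pprof"
)

// startCPUProfile begins CPU profiling into path and returns a stop function
// that ends profiling and closes the file.
func startCPUProfile(path string) (stop func() error, err error) {
	f, err := os.Create(path)
	if err != nil {
		return nil, err
	}
	if err := pprof.StartCPUProfile(f); err != nil {
		f.Close()
		return nil, err
	}
	return func() error {
		pprof.StopCPUProfile() // flushes buffered samples to the file
		return f.Close()
	}, nil
}

func main() {
	cpuprofile := flag.String("cpuprofile", "", "write a CPU profile to this file")
	flag.Parse()
	if *cpuprofile == "" {
		fmt.Println("profiling disabled")
		return
	}
	stop, err := startCPUProfile(*cpuprofile)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer stop()
	// ... workload to be profiled ...
}
```

The resulting file can be opened with go tool pprof, as in the benchmark example in the next section.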

Benchmark & Profiling

Example:

LOG_LEVEL=warn time BUILD/slog-agent benchmark agent --input 'testdata/development/*.log' --repeat 250000 --config testdata/config_sample.yml --output null --cpuprofile /tmp/agent.cpu --memprofile /tmp/agent.mem
go tool pprof -http=:8080 BUILD/slog-agent /tmp/agent.cpu

--output supports several formats:

  • `` (empty): forward to fluentd as defined in the config (default). Chunks may not be fully sent at shutdown; unsent chunks are saved for the next run.
  • null: no output. Results are compressed as in the normal routine, counted and then dropped.
  • .../%s: create one fluentd forward message file per chunk at the path (%s is the chunk ID). The directory must already exist.
  • .../%s.json: create one JSON file per chunk at the path (%s is the chunk ID). The directory must already exist.

Fluentd forward message files can be examined with fluentlibtool.

Internals

See DESIGN

Key dependencies

  • fluentlib for fluentd forward protocol, fake server and dump tool for testing.
  • klauspost's compress library for fast gzip compression, which is absolutely critical to the agent: always benchmark before upgrading. Compression takes 1/3 to 1/2 of CPU time in our environments.
  • YAML v3, required for custom tags in configuration. KnownFields is still not working, so it cannot detect non-existent or misspelled YAML properties.
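Since compression dominates CPU time, it helps to be able to measure it in isolation. The sketch below uses the standard library's compress/gzip so that it runs without extra dependencies; klauspost's gzip package documents itself as a drop-in replacement, so swapping the import path should be the only change needed, though that is an assumption worth verifying. The function compressChunk and the sample log line are illustrative.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"strings"
)

// compressChunk gzips data at the given level and returns the compressed bytes.
func compressChunk(data []byte, level int) ([]byte, error) {
	var buf bytes.Buffer
	zw, err := gzip.NewWriterLevel(&buf, level)
	if err != nil {
		return nil, err
	}
	if _, err := zw.Write(data); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil { // Close writes the remaining data and the gzip footer
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	// Synthetic, highly repetitive input; real logs compress less uniformly.
	line := "<14>1 2023-11-02T10:00:00Z host app 123 - - request served\n"
	chunk := []byte(strings.Repeat(line, 1000))
	out, err := compressChunk(chunk, gzip.BestSpeed)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d -> %d bytes\n", len(chunk), len(out))
}
```

Wrapping compressChunk in a testing.B benchmark before and after a library upgrade is one way to follow the "always benchmark" advice above.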

Authors

Special thanks to Henrik Sjöström for his guidance on Go optimization and integration testing, and for invaluable suggestions on performant design.


Directories

Path Synopsis
base
Package base defines data types of log processing and interfaces of log processing steps, for example LogRecord struct and LogTransform interface
bconfig
Package bconfig provides configuration interfaces for log processing units and factory mechanism
bmatch
Package bmatch provides log matchers used for filtering and transforms
bsupport
Package bsupport provides helpers for log processing and abstract types for implementations,
btest
Package btest provides test utilities and stubs of interfaces inside the base package
buffer
Package buffer registers the list of all ChunkBufferer implementations
hybridbuffer
Package hybridbuffer provides a ChunkBufferer implementation which keeps N unsent chunks in memory and starts saving to and loading from the queue directory when the limit is reached.
cmd
Package cmd provides list of commands including self-benchmarks and tools
defs
Package defs provides shared constants and parameters
input
Package input registers the list of all LogInput implementations
sysloginput
Package sysloginput provides an input source for Syslog (RFC 5424) protocol via TCP
syslogparser
Package syslogparser provides a LogParser for Syslog protocol (RFC 5424).
syslogprotocol
Package syslogprotocol provides shared functions and constants of the syslog RFC 5424 protocol
tcplistener
Package tcplistener provides TCP listener(s)
orchestrate
Package orchestrate registers the list of all Orchestrator implementations
obase
Package obase provides common classes for orchestration
obykeyset
Package obykeyset provides ByKeySetOrchestrator, which creates pipelines for each unique key-field set and distributes input logs among them
osingleton
Package osingleton provides SingletonOrchestrator, for test and benchmark only
output
Package output registers the list of all output implementations
baseoutput
Package baseoutput provides common framework for output implementations.
fastmsgpack
Package fastmsgpack offers a subset of msgpack serialization operated on fixed length []byte with no heap allocation, no IO abstraction and all calls are inlined.
fluentdforward
Package fluentdforward provides output implementations for fluentd "Forward" protocol, split into:
rewrite
Package rewrite registers the list of all LogRewriter implementations
rcopy
Package rcopy provides 'copy' rewriter, which copies the original field value unmodified
rinline
Package rinline provides 'inline' rewriter, which inlines exactly one other field into the current field value if exists
runescape
Package runescape provides 'unescape' rewriter, which handles custom escape bytes like those in JSON strings.
run
Package run runs the actual log agent
test
Package test provides integration testing for the whole log agent
transform
Package transform registers the list of all LogTransform implementations
taddfields
Package taddfields provides 'addFields' transform, which adds fields of fixed value or string template (with '$') to every log record, for example "message: task=$task class=$class $message" or "task_last_digit: ${task[-1:]}"
tblock
Package tblock provides 'block' transform, which groups child transform steps
tdelfields
Package tdelfields provides 'delFields' transform which removes (empties) fields from log records
tdrop
Package tdrop provides 'drop' transform, which drops all log records matching specific criteria
textract
Package textract provides 'extract' transform, which parses specified field with regular expression and updates fields with named captures (overriding any existing value).
textractspecial
Package textractspecial provides 'extractHead' and 'extractTail' transforms, using prefix+wildcard+postfix for fast field extraction of simple cases, e.g.
tif
Package tif provides 'if' transform, performing optional steps if the given conditions are satisfied
tmapvalue
Package tmapvalue provides 'mapValue' transform, providing one-to-one mapping on a field value.
tparsetime
Package tparsetime provides 'parseTime' transform to parse timestamp from a given field.
tredactemail
Package tredactemail provides 'redactEmail' transform to mask email addresses
treplace
Package treplace provides 'replace' transform to perform replacements by regular expression on specified field.
tswitch
Package tswitch provides 'switch' transform which acts like C switch without fallthrough.
ttruncate
Package ttruncate provides 'truncate' transform to truncate field values exceeding certain limit
tunescape
Package tunescape provides 'unescape' transform, which handles custom escape bytes like those in JSON strings.
util
Package util provides utility functions and types
localcachedmap
Package localcachedmap provides a map with thread-local caches (copy-on-reference)
stringtemplate
Package stringtemplate provides string expansion by pre-compiled templates, for example:
stringunescape
Package stringunescape provides Unescaper(s) for escaped strings
