holmes

v1.2.0

Note: this package is not in the latest version of its module.

Published: Dec 2, 2022 License: Apache-2.0 Imports: 16 Imported by: 0

Holmes

Self-aware Golang profile dumper.

Our online system often crashes at midnight (usually killed by the OS due to OOM). As lazy developers, we don't want to be woken up at midnight and then wait for the online error to recur.

Holmes comes to the rescue.

Design

Holmes collects the following stats once every collect interval:

  • Goroutine number, via runtime.NumGoroutine.
  • RSS used by the current process, via gopsutil.
  • CPU usage as a percentage of total capacity, via gopsutil; e.g. with 8 cores in total, using 4 cores = 50%.

In addition, if you enable the GC heap dump, Holmes also collects RSS on every GC cycle.

After the warm-up phase (the first 10 collections after the application starts) has finished, Holmes compares the current stats with the average of the previously collected stats (the last 10 cycles). If a dump rule matches, Holmes dumps the related profile to the log (text mode) or to a binary file (binary mode).
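
The warm-up and averaging step can be pictured as a fixed-size ring buffer of recent samples. The sketch below is an assumption about the mechanism, not Holmes's actual data structure; the window type and its methods are hypothetical, and the 125% comparison uses diff = 25 from the examples in the How to use section.

```go
package main

import "fmt"

// window keeps the most recent samples in a fixed-size ring buffer.
// Hypothetical type for illustration; Holmes's internals may differ.
type window struct {
	data  []int
	next  int // slot that the next sample overwrites
	count int // samples collected so far, capped at len(data)
}

func newWindow(size int) *window { return &window{data: make([]int, size)} }

// push records a sample, overwriting the oldest one once the buffer is full.
func (w *window) push(v int) {
	w.data[w.next] = v
	w.next = (w.next + 1) % len(w.data)
	if w.count < len(w.data) {
		w.count++
	}
}

// warmedUp reports whether the buffer has been filled once (the warm-up phase).
func (w *window) warmedUp() bool { return w.count == len(w.data) }

// avg returns the integer average of the collected samples.
func (w *window) avg() int {
	if w.count == 0 {
		return 0
	}
	sum := 0
	for _, v := range w.data[:w.count] {
		sum += v
	}
	return sum / w.count
}

func main() {
	w := newWindow(10)
	for i := 0; i < 10; i++ {
		w.push(100) // warm-up: ten collections of 100 goroutines each
	}
	cur := 130
	// Compare the new sample with the average of the previous ones, then record it.
	fmt.Println(w.warmedUp(), w.avg(), cur > w.avg()*125/100) // true 100 true
	w.push(cur)
}
```

Comparing before pushing keeps the spike itself out of the baseline it is measured against.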

When your own monitoring system sends you a warning, e.g. memory usage exceeds 80%, the process was OOM-killed, CPU usage exceeds 80%, or the goroutine number exceeds 100k, the profile has already been dumped to your dump path. You can simply fetch the profile and see what actually happened, without pressure.

How to use

Dump goroutine when goroutine number spikes
h, _ := holmes.New(
    holmes.WithCollectInterval("5s"),
    holmes.WithDumpPath("/tmp"),
    holmes.WithTextDump(),
    holmes.WithDumpToLogger(true),
    holmes.WithGoroutineDump(10, 25, 2000, 100*1000, time.Minute),
)
h.EnableGoroutineDump()

// start the metrics collect and dump loop
h.Start()

// stop the dumper
h.Stop()
  • WithCollectInterval("5s") means the system metrics are collected every 5 seconds.
  • WithDumpPath("/tmp") means that in binary mode the profile files are written to the /tmp directory.
  • WithTextDump() selects text mode instead of binary mode, so the profiles are dumped as text.
  • WithDumpToLogger(true) means the profile content is written to the logger.
  • WithGoroutineDump(10, 25, 2000, 100*1000, time.Minute) means a dump happens when current_goroutine_num > 10 && current_goroutine_num < 100*1000 && (current_goroutine_num > 125% * previous_average_goroutine_num || current_goroutine_num > 2000). time.Minute is the cooldown: once a dump has happened, the next dump cannot happen until 1 minute has passed.

    In WithGoroutineDump(min int, diff int, abs int, max int, coolDown time.Duration), max (100*1000 in the example) is the maximum goroutine number: when the current goroutine number is greater than 100k, Holmes does not dump the goroutine profile, because with that many goroutines the dump itself becomes a heavy operation (stop-the-world plus a stack dump). max = 0 means no limit.
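
The trigger condition above can be written as a small predicate. This is a sketch under the assumption that the rule is exactly as described (integer percentages, cooldown handled separately); shouldDumpGoroutine is a hypothetical name, not part of the holmes API.

```go
package main

import "fmt"

// shouldDumpGoroutine encodes the rule described for
// WithGoroutineDump(min, diff, abs, max, coolDown), ignoring the cooldown:
// dump when cur > min && (max == 0 || cur < max) &&
// (cur > (100+diff)% of the previous average || cur > abs).
// Hypothetical helper for illustration; not Holmes's actual code.
func shouldDumpGoroutine(cur, avg, minV, diff, abs, maxV int) bool {
	if cur <= minV {
		return false // below the minimum: never dump
	}
	if maxV > 0 && cur >= maxV {
		return false // too many goroutines: the dump itself would be too heavy
	}
	return cur > avg*(100+diff)/100 || cur > abs
}

func main() {
	// Values from WithGoroutineDump(10, 25, 2000, 100*1000, time.Minute).
	fmt.Println(shouldDumpGoroutine(130, 100, 10, 25, 2000, 100*1000))    // true: 130 > 125% of 100
	fmt.Println(shouldDumpGoroutine(120, 100, 10, 25, 2000, 100*1000))    // false
	fmt.Println(shouldDumpGoroutine(2500, 2400, 10, 25, 2000, 100*1000))  // true: above abs
	fmt.Println(shouldDumpGoroutine(150000, 100, 10, 25, 2000, 100*1000)) // false: above max
}
```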

Dump CPU profile when CPU load spikes
h, _ := holmes.New(
    holmes.WithCollectInterval("5s"),
    holmes.WithDumpPath("/tmp"),
    holmes.WithCPUDump(10, 25, 80, time.Minute),
    holmes.WithCPUMax(90),
)
h.EnableCPUDump()

// start the metrics collect and dump loop
h.Start()

// stop the dumper
h.Stop()
  • WithCollectInterval("5s") means the system metrics are collected every 5 seconds.
  • WithDumpPath("/tmp") means that in binary mode the profile files are written to the /tmp directory.
  • WithBinaryDump() or WithTextDump() doesn't affect the CPU profile dump, because the pprof standard library doesn't support dumping CPU profiles in text mode.
  • WithCPUDump(10, 25, 80, time.Minute) means a dump happens when cpu usage > 10% && (cpu usage > 125% * previous average cpu usage || cpu usage > 80%). time.Minute is the cooldown: once a dump has happened, the next dump cannot happen until 1 minute has passed.
  • WithCPUMax(90) means Holmes does not dump any type of profile while the current CPU usage percent is greater than CPUMaxPercent (90 here).

Dump heap profile when RSS spikes
h, _ := holmes.New(
    holmes.WithCollectInterval("5s"),
    holmes.WithDumpPath("/tmp"),
    holmes.WithTextDump(),
    holmes.WithMemDump(30, 25, 80, time.Minute),
)

h.EnableMemDump()

// start the metrics collect and dump loop
h.Start()

// stop the dumper
h.Stop()
  • WithCollectInterval("5s") means the system metrics are collected every 5 seconds.
  • WithDumpPath("/tmp") means that in binary mode the profile files are written to the /tmp directory.
  • WithTextDump() selects text mode instead of binary mode, so the profiles are dumped as text.
  • WithMemDump(30, 25, 80, time.Minute) means a dump happens when memory usage > 30% && (memory usage > 125% * previous average memory usage || memory usage > 80%). time.Minute is the cooldown: once a dump has happened, the next dump cannot happen until 1 minute has passed.
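
The memory percentages above are relative to a memory limit, presumably the system memory, the cgroup limit, or the value set with WithMemoryLimit. A sketch of the arithmetic and the rule, assuming integer percentages and the cooldown handled separately; memPercent and shouldDumpMem are hypothetical helpers.

```go
package main

import "fmt"

// memPercent expresses RSS as an integer percentage of the memory limit.
// Hypothetical helper for illustration.
func memPercent(rss, limit uint64) int {
	if limit == 0 {
		return 0
	}
	return int(rss * 100 / limit)
}

// shouldDumpMem applies the WithMemDump(min, diff, abs, coolDown) rule,
// ignoring the cooldown: cur > min && (cur > (100+diff)% of avg || cur > abs).
func shouldDumpMem(cur, avg, minPct, diff, absPct int) bool {
	return cur > minPct && (cur > avg*(100+diff)/100 || cur > absPct)
}

func main() {
	const limit = 100 * 1024 * 1024 // e.g. WithMemoryLimit(100*1024*1024)
	cur := memPercent(45*1024*1024, limit)
	// Previous average was 32%, so the diff threshold is 40%.
	fmt.Println(cur, shouldDumpMem(cur, 32, 30, 25, 80)) // 45 true
}
```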

Dump heap profile when RSS spikes based on GC cycle

In some situations the interval-based rule cannot capture useful information, for example when the application allocates heap memory and the GC reclaims it within a single CollectInterval. For this case Holmes provides a heap memory rule based on the GC cycle. When RSS spikes, it dumps the heap profile twice in a row, so developers can compare the two profiles with pprof's -base option.

	h, _ := holmes.New(
		holmes.WithDumpPath("/tmp"),
		holmes.WithLogger(holmes.NewFileLog("/tmp/holmes.log", mlog.INFO)),
		holmes.WithBinaryDump(),
		holmes.WithMemoryLimit(100*1024*1024), // 100MB
		holmes.WithGCHeapDump(10, 20, 40, time.Minute),
		// holmes.WithProfileReporter(reporter),
	)
	h.EnableGCHeapDump().Start()
	time.Sleep(time.Hour)

Set Holmes configurations on the fly

You can use the Set method to modify Holmes's configuration while the application is running.

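    // min, diff and abs are your existing goroutine thresholds.
    if err := h.Set(
        holmes.WithCollectInterval("2s"),
        holmes.WithGoroutineDump(min, diff, abs, 90, time.Minute),
    ); err != nil {
        log.Printf("holmes: failed to update options: %v", err)
    }

Reporter dump event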

You can use a Reporter to implement the following features:

  • Send alarm messages that include the scene information when Holmes dumps profiles.
  • Send profiles to a data center for storage or analysis.

    type ReporterImpl struct{}

    // Report implements the holmes.ProfileReporter interface.
    func (r *ReporterImpl) Report(pType string, filename string, reason holmes.ReasonType, eventID string, sampleTime time.Time, pprofBytes []byte, scene holmes.Scene) error {
        // do something
        return nil
    }
    ......
    r := &ReporterImpl{} // an implementation of the holmes.ProfileReporter interface
    h, _ := holmes.New(
        holmes.WithProfileReporter(r),
        holmes.WithDumpPath("/tmp"),
        holmes.WithLogger(holmes.NewFileLog("/tmp/holmes.log", mlog.INFO)),
        holmes.WithBinaryDump(),
        holmes.WithMemoryLimit(100*1024*1024), // 100MB
        holmes.WithGCHeapDump(10, 20, 40, time.Minute),
    )

Enable Holmes as a Pyroscope client

Holmes supports uploading your profiles to a Pyroscope server. Click here for more details.

Note: do NOT set TextDump when you enable Holmes as a Pyroscope client.

Enable them all!

It's easy.

h, _ := holmes.New(
    holmes.WithCollectInterval("5s"),
    holmes.WithDumpPath("/tmp"),
    holmes.WithTextDump(),

    holmes.WithCPUDump(10, 25, 80, time.Minute),
    //holmes.WithMemDump(30, 25, 80, time.Minute),
    holmes.WithGCHeapDump(10, 20, 40, time.Minute),
    holmes.WithGoroutineDump(500, 25, 20000, 0, time.Minute),
)

h.EnableCPUDump().
    EnableGoroutineDump().
    EnableMemDump().
    EnableGCHeapDump().
    Start()

Running in Docker or another cgroup-limited environment
h, _ := holmes.New(
    holmes.WithCollectInterval("5s"),
    holmes.WithDumpPath("/tmp"),
    holmes.WithTextDump(),

    holmes.WithCPUDump(10, 25, 80, time.Minute),
    holmes.WithCGroup(true), // set cgroup to true
)
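
Inside a container, the host's core count overstates the CPU the process may use, which is presumably why WithCGroup(true) exists. As an illustration of where a container's CPU limit comes from, the sketch below parses the cgroup v2 cpu.max file; coresFromCPUMax is a hypothetical helper, and Holmes's own cgroup handling (including cgroup v1) may work differently.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// coresFromCPUMax parses the contents of a cgroup v2 "cpu.max" file, whose
// format is "<quota> <period>" in microseconds, or "max <period>" when no
// quota is set. It returns the effective number of cores and whether a
// limit exists. Hypothetical helper for illustration.
func coresFromCPUMax(content string) (float64, bool, error) {
	fields := strings.Fields(content)
	if len(fields) != 2 {
		return 0, false, fmt.Errorf("unexpected cpu.max format: %q", content)
	}
	if fields[0] == "max" {
		return 0, false, nil // no CPU quota
	}
	quota, err := strconv.ParseFloat(fields[0], 64)
	if err != nil {
		return 0, false, fmt.Errorf("invalid quota in %q: %w", content, err)
	}
	period, err := strconv.ParseFloat(fields[1], 64)
	if err != nil {
		return 0, false, fmt.Errorf("invalid period in %q: %w", content, err)
	}
	if period <= 0 {
		return 0, false, fmt.Errorf("non-positive period in %q", content)
	}
	return quota / period, true, nil
}

func main() {
	// In a cgroup v2 container the file is typically /sys/fs/cgroup/cpu.max.
	cores, limited, err := coresFromCPUMax("200000 100000\n")
	fmt.Println(cores, limited, err) // 2 true <nil>
}
```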

Known risks

If the Go version is lower than 1.19, collecting a goroutine profile may itself cause a latency spike because of a long stop-the-world (STW) pause. Go 1.19 optimized this by collecting the goroutine profile concurrently, in this CL.

Showcases

Click here

Contributing

See our contributor guide.

Community

Scan the QR code below with DingTalk (钉钉) to join the Holmes user group.

[DingTalk group QR code]

Documentation

Constants

const (
	// TrimResultTopN: trimResult keeps only the top n entries.
	TrimResultTopN = 10

	// TrimResultMaxBytes: trimResultFront keeps only the first n bytes.
	TrimResultMaxBytes = 512000

	// NotSupportTypeMaxConfig means this profile type does
	// not support controlling the dump by the max parameter.
	NotSupportTypeMaxConfig = 0

	// UniformLogFormat is the format of uniform logging.
	UniformLogFormat = "[Holmes] %v %v, config_min : %v, config_diff : %v, config_abs : %v, config_max : %v, previous : %v, current: %v"
)

Variables

This section is empty.

Functions

func NewFileLog

func NewFileLog(path string, level mlog.Level) mlog.ErrorLogger

func NewStdLogger

func NewStdLogger() mlog.ErrorLogger

NewStdLogger creates an ErrorLogger interface value that writes to os.Stdout.

Types

type DumpOptions

type DumpOptions struct {
	// full path to put the profile files, default /tmp
	DumpPath string
	// binary profile by default; set it to text mode if you want a text profile
	DumpProfileType dumpProfileType
	// only dump top 10 if set to false, otherwise dump all, only effective when in_text = true
	DumpFullStack bool
	// dump profile to logger. Enabling DumpToLogger can produce huge log output, see issues/90.
	DumpToLogger bool
}

DumpOptions contains configuration about dump file.

type Holmes

type Holmes struct {

	// lock protects the following fields
	sync.Mutex
	// contains filtered or unexported fields
}

Holmes is a self-aware profile dumper.

func New

func New(opts ...Option) (*Holmes, error)

New creates a holmes dumper.

func (*Holmes) Alertf

func (h *Holmes) Alertf(alert string, format string, args ...interface{})

func (*Holmes) Debugf

func (h *Holmes) Debugf(format string, args ...interface{})

func (*Holmes) DisableCPUDump

func (h *Holmes) DisableCPUDump() *Holmes

DisableCPUDump disables the CPU dump.

func (*Holmes) DisableGCHeapDump

func (h *Holmes) DisableGCHeapDump() *Holmes

DisableGCHeapDump disables the gc heap dump.

func (*Holmes) DisableGoroutineDump

func (h *Holmes) DisableGoroutineDump() *Holmes

DisableGoroutineDump disables the goroutine dump.

func (*Holmes) DisableMemDump

func (h *Holmes) DisableMemDump() *Holmes

DisableMemDump disables the mem dump.

func (*Holmes) DisableProfileReporter

func (h *Holmes) DisableProfileReporter()

func (*Holmes) DisableShrinkThread

func (h *Holmes) DisableShrinkThread() *Holmes

DisableShrinkThread disables thread shrinking.

func (*Holmes) DisableThreadDump

func (h *Holmes) DisableThreadDump() *Holmes

DisableThreadDump disables the thread dump.

func (*Holmes) EnableCPUDump

func (h *Holmes) EnableCPUDump() *Holmes

EnableCPUDump enables the CPU dump.

func (*Holmes) EnableDump

func (h *Holmes) EnableDump(curCPU int) (err error)

func (*Holmes) EnableGCHeapDump

func (h *Holmes) EnableGCHeapDump() *Holmes

EnableGCHeapDump enables the GC heap dump.

func (*Holmes) EnableGoroutineDump

func (h *Holmes) EnableGoroutineDump() *Holmes

EnableGoroutineDump enables the goroutine dump.

func (*Holmes) EnableMemDump

func (h *Holmes) EnableMemDump() *Holmes

EnableMemDump enables the mem dump.

func (*Holmes) EnableProfileReporter

func (h *Holmes) EnableProfileReporter()

func (*Holmes) EnableShrinkThread

func (h *Holmes) EnableShrinkThread() *Holmes

EnableShrinkThread enables thread shrinking.

func (*Holmes) EnableThreadDump

func (h *Holmes) EnableThreadDump() *Holmes

EnableThreadDump enables the thread dump.

func (*Holmes) Errorf

func (h *Holmes) Errorf(format string, args ...interface{})

func (*Holmes) Infof

func (h *Holmes) Infof(format string, args ...interface{})

func (*Holmes) ReportProfile

func (h *Holmes) ReportProfile(pType string, filename string, reason ReasonType, eventID string, sampleTime time.Time, pprofBytes []byte, scene Scene)

func (*Holmes) Set

func (h *Holmes) Set(opts ...Option) error

Set updates Holmes's options after initialization.

func (*Holmes) Start

func (h *Holmes) Start()

Start starts the dump loop of holmes.

func (*Holmes) Stop

func (h *Holmes) Stop()

Stop stops the dump loop.

func (*Holmes) Warnf

func (h *Holmes) Warnf(format string, args ...interface{})

type Option

type Option interface {
	// contains filtered or unexported methods
}

Option is the holmes option type.

func WithBinaryDump

func WithBinaryDump() Option

WithBinaryDump sets the dump mode to binary.

func WithCGroup

func WithCGroup(useCGroup bool) Option

WithCGroup sets whether Holmes uses cgroup or not.

func WithCPUCore

func WithCPUCore(cpuCore float64) Option

WithCPUCore overrides the system-level CPU core number when cpuCore > 0. It's not a good idea to modify it on the fly, since it affects the CPU percent calculation.

func WithCPUDump

func WithCPUDump(min int, diff int, abs int, coolDown time.Duration) Option

WithCPUDump sets the CPU dump options.

func WithCPUMax

func WithCPUMax(max int) Option

WithCPUMax sets the CPUMaxPercent parameter to max.

func WithCollectInterval

func WithCollectInterval(interval string) Option

WithCollectInterval sets the collect interval. interval must be a valid time duration string; valid units are "ns", "us" (or "µs"), "ms", "s", "m", "h".
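
Because WithCollectInterval takes a string, one way to check a value before passing it in is time.ParseDuration from the standard library, which accepts exactly the units listed above. We assume, without having checked the Holmes source, that the option parses its argument the same way; validInterval is a hypothetical helper, and rejecting non-positive durations is an extra assumption of this sketch.

```go
package main

import (
	"fmt"
	"time"
)

// validInterval reports whether s parses as a Go duration and is positive.
// Hypothetical helper; the positivity check is an assumption of this sketch.
func validInterval(s string) bool {
	d, err := time.ParseDuration(s)
	return err == nil && d > 0
}

func main() {
	for _, s := range []string{"5s", "1m30s", "500ms", "5"} {
		fmt.Println(s, validInterval(s))
	}
	// prints: 5s true / 1m30s true / 500ms true / 5 false (a unit is required)
}
```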

func WithDumpPath

func WithDumpPath(dumpPath string) Option

WithDumpPath sets the dump path for Holmes.

func WithDumpToLogger

func WithDumpToLogger(new bool) Option

func WithFile added in v1.2.0

func WithFile(use bool) Option

func WithFullStack

func WithFullStack(isFull bool) Option

WithFullStack sets whether to dump the full stack or only the top 10 stacks when dumping in text mode.

func WithGCHeapDump

func WithGCHeapDump(min int, diff int, abs int, coolDown time.Duration) Option

WithGCHeapDump sets the GC heap dump options.

func WithGoProcAsCPUCore

func WithGoProcAsCPUCore(enabled bool) Option

WithGoProcAsCPUCore sets whether Holmes uses GOMAXPROCS as the CPU core number.

func WithGoroutineDump

func WithGoroutineDump(min int, diff int, abs int, max int, coolDown time.Duration) Option

WithGoroutineDump sets the goroutine dump options.

func WithLogger

func WithLogger(logger mlog.ErrorLogger) Option

WithLogger sets the logger. A logger can be created with NewFileLog("/path/to/log/file", level).

func WithMemDump

func WithMemDump(min int, diff int, abs int, coolDown time.Duration) Option

WithMemDump sets the memory dump options.

func WithMemoryLimit

func WithMemoryLimit(limit uint64) Option

WithMemoryLimit overrides the system-level memory limit when limit > 0.

func WithProfileReporter

func WithProfileReporter(r ProfileReporter) Option

WithProfileReporter enables the profile reporter. The reporter can be reopened through WithProfileReporter(h.opts.rptOpts.reporter).

func WithShrinkThread

func WithShrinkThread(threshold int, delay time.Duration) Option

WithShrinkThread enables thread shrinking when the thread number exceeds the max threshold.

func WithTextDump

func WithTextDump() Option

WithTextDump sets the dump mode to text.

func WithThreadDump

func WithThreadDump(min, diff, abs int, coolDown time.Duration) Option

WithThreadDump sets the thread dump options.

type ProfileReporter

type ProfileReporter interface {
	Report(pType string, filename string, reason ReasonType, eventID string, sampleTime time.Time, pprofBytes []byte, scene Scene) error
}

type ReasonType

type ReasonType uint8
const (
	// ReasonCurlLessMin means current value is less than min value.
	ReasonCurlLessMin ReasonType = iota
	// ReasonCurlGreaterMin means current value is greater than min value,
	// but does not meet any trigger condition.
	ReasonCurlGreaterMin
	// ReasonCurGreaterMax means current value is greater than max value.
	ReasonCurGreaterMax
	// ReasonCurGreaterAbs means current value meets the trigger condition where
	// it is greater than abs value.
	ReasonCurGreaterAbs
	// ReasonDiff means current value is greater than the value: (diff + 1) * avg.
	ReasonDiff
)

func (ReasonType) String

func (rt ReasonType) String() string

type ReporterOptions

type ReporterOptions struct {
	// contains filtered or unexported fields
}

type Scene

type Scene struct {

	// current value while dump event occurs
	CurVal int
	// Avg is the average of the past values
	Avg int
	// contains filtered or unexported fields
}

Scene contains the scene information when a profile dump is triggered, including the current value, the average value and the configuration.

func (*Scene) Set

func (base *Scene) Set(min, abs, diff int, coolDown time.Duration)

type ShrinkThrOptions

type ShrinkThrOptions struct {
	// shrink the thread number when it exceeds the max threshold specified in Threshold
	Enable    bool
	Threshold int
	Delay     time.Duration // start to shrink thread after the delay time.
}

ShrinkThrOptions contains the configuration for thread shrinking.

Directories

Path Synopsis
example
reporters
pyroscope_reporter
* Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements.