package s

Version: v0.5.0
Published: Nov 30, 2023 License: MIT Imports: 9 Imported by: 0

Documentation

Constants

const NodeSepToken = "/"

NodeSepToken is the token used to separate sub-tree and child node names in the supervision tree.
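
As a rough illustration of how the separator shapes runtime names, the sketch below splits a slash-separated name into its segments using only the standard library. The local constant nodeSepToken mirrors NodeSepToken; the helper splitRuntimeName and the name "root/net/listener" are hypothetical and not part of this package.

```go
package main

import (
	"fmt"
	"strings"
)

// nodeSepToken mirrors the documented NodeSepToken constant ("/").
const nodeSepToken = "/"

// splitRuntimeName is a hypothetical helper that breaks a runtime name
// into the name of each supervisor or worker along the path.
func splitRuntimeName(runtimeName string) []string {
	return strings.Split(runtimeName, nodeSepToken)
}

func main() {
	// Print each segment indented by its depth in the tree.
	for depth, segment := range splitRuntimeName("root/net/listener") {
		fmt.Printf("%s%s\n", strings.Repeat("  ", depth), segment)
	}
}
```

This also shows why node names must not contain the separator: a slash inside a name would be indistinguishable from a tree boundary.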

Variables

var HealthyReport = HealthReport{}

HealthyReport represents a healthy report

var WithOrder = WithStartOrder

WithOrder is a backwards compatible alias to WithStartOrder

Deprecated: Use WithStartOrder instead

Functions

func ExplainError

func ExplainError(err error) string

ExplainError is a utility function that explains capataz errors in a human-friendly way. Defaults to a call to error.Error() if the underlying error does not come from the capataz library

Types

type BuildNodesFn

type BuildNodesFn = func() ([]Node, CleanupResourcesFn, error)

BuildNodesFn is a function that returns a list of nodes

Check the documentation of NewSupervisorSpec for more details and examples.

func WithNodes

func WithNodes(nodes ...Node) BuildNodesFn

WithNodes allows the registration of child nodes in a SupervisorSpec. Node records passed to this function are going to be supervised by the Supervisor created from a SupervisorSpec.

Check the documentation of NewSupervisorSpec for more details and examples.

type CleanupResourcesFn

type CleanupResourcesFn = func() error

CleanupResourcesFn is a function that cleans up resources that were allocated in a BuildNodesFn function.

Check the documentation of NewSupervisorSpec for more details and examples

type DynSupervisor

type DynSupervisor struct {
	// contains filtered or unexported fields
}

DynSupervisor is a supervisor that can spawn workers in a procedural way.

func NewDynSupervisor

func NewDynSupervisor(ctx context.Context, name string, opts ...Opt) (DynSupervisor, error)

NewDynSupervisor creates a DynSupervisor, which can start workers at runtime in a procedural manner. It receives a context and the supervisor name (for tracing purposes).

When to use a DynSupervisor?

Use a DynSupervisor when you want to run supervised worker routines on dynamic inputs. A regular Supervisor cannot do this, because it needs to know its child nodes at construction time.

Differences to Supervisor

As opposed to a Supervisor, a DynSupervisor:

* Cannot receive node specifications to start them in a static fashion

* Is able to spawn workers dynamically

* In case of a hard crash and following restart, starts with an empty list of children

func (DynSupervisor) GetName

func (dyn DynSupervisor) GetName() string

GetName returns the name of the Spec used to start this Supervisor

func (*DynSupervisor) Spawn

func (dyn *DynSupervisor) Spawn(nodeFn Node) (func() error, error)

Spawn creates a new worker routine from the given node specification. It returns either a cancel/shutdown callback or, if the worker failed to start, an error. This function blocks until the worker is started.

func (*DynSupervisor) Terminate

func (dyn *DynSupervisor) Terminate() error

Terminate is a synchronous procedure that halts the execution of the whole supervision tree.

func (DynSupervisor) Wait

func (dyn DynSupervisor) Wait() error

Wait blocks the execution of the current goroutine until the Supervisor finishes its execution.

type ErrKVs

type ErrKVs interface {
	KVs() map[string]interface{}
}

ErrKVs is a utility interface used to get key-values out of Capataz errors.

type Event

type Event struct {
	// contains filtered or unexported fields
}

Event is a record emitted by the supervision system. The events are used for multiple purposes, from testing to monitoring the healthiness of the supervision system.

func (Event) Err

func (e Event) Err() error

Err returns an error reported by the process that emitted this event

func (Event) GetCreated

func (e Event) GetCreated() time.Time

GetCreated returns a timestamp of the creation of the event by the process

func (Event) GetNodeTag

func (e Event) GetNodeTag() c.ChildTag

GetNodeTag returns the c.ChildTag from an Event

func (Event) GetProcessRuntimeName

func (e Event) GetProcessRuntimeName() string

GetProcessRuntimeName returns the given name of a process that emitted this event

func (Event) GetTag

func (e Event) GetTag() EventTag

GetTag returns the EventTag from an Event

func (Event) String

func (e Event) String() string

String returns a string representation of the Event

type EventNotifier

type EventNotifier func(Event)

EventNotifier is a function that is used for reporting events from the supervision system.

Check the documentation of WithNotifier for more details.

type EventTag

type EventTag uint32

EventTag specifies the type of Event that gets notified from the supervision system

const (

	// ProcessStarted is an Event that indicates a process started
	ProcessStarted EventTag
	// ProcessTerminated is an Event that indicates a process was stopped by a parent
	// supervisor
	ProcessTerminated
	// ProcessStartFailed is an Event that indicates a process failed to start
	ProcessStartFailed
	// ProcessFailed is an Event that indicates a process reported an error
	ProcessFailed
	// ProcessCompleted is an Event that indicates a process finished without errors
	ProcessCompleted
)

func (EventTag) String

func (tag EventTag) String() string

String returns a string representation of the current EventTag

type HealthReport

type HealthReport struct {
	// contains filtered or unexported fields
}

HealthReport contains a report for the HealthcheckMonitor

func (HealthReport) GetDelayedRestartProcesses

func (hr HealthReport) GetDelayedRestartProcesses() map[string]bool

GetDelayedRestartProcesses returns the set of delayed restart processes

func (HealthReport) GetFailedProcesses

func (hr HealthReport) GetFailedProcesses() map[string]bool

GetFailedProcesses returns the set of failed processes

func (HealthReport) IsHealthyReport

func (hr HealthReport) IsHealthyReport() bool

IsHealthyReport indicates if this is a healthy report

type HealthcheckMonitor

type HealthcheckMonitor struct {
	// contains filtered or unexported fields
}

HealthcheckMonitor listens to the events of a supervision tree and assesses whether the supervisor is healthy or not.

func NewHealthcheckMonitor

func NewHealthcheckMonitor(
	maxAllowedFailures uint32,
	maxAllowedRestartDuration time.Duration,
) *HealthcheckMonitor

NewHealthcheckMonitor offers a way to monitor a supervision tree health from events emitted by it.

maxAllowedFailures: the threshold beyond which the environment is considered unhealthy.

maxAllowedRestartDuration: the restart threshold which, if exceeded, indicates an unhealthy environment. Any process that fails to restart under the threshold results in an unhealthy report.

func (*HealthcheckMonitor) GetHealthReport

func (h *HealthcheckMonitor) GetHealthReport() HealthReport

GetHealthReport returns a HealthReport that indicates why the system is unhealthy. Returns an empty report if everything is ok.

func (*HealthcheckMonitor) HandleEvent

func (h *HealthcheckMonitor) HandleEvent(ev Event)

HandleEvent is a function that receives supervision events and assesses whether the supervisor sending these events is healthy or not.

func (*HealthcheckMonitor) IsHealthy

func (h *HealthcheckMonitor) IsHealthy() bool

IsHealthy returns true when the system is in a healthy state, meaning no processes are restarting at the moment.

type Node

type Node func(SupervisorSpec) c.ChildSpec

Node represents a tree node in a supervision tree, it could either be a Subtree or a Worker

func NewDynSubtree

func NewDynSubtree(
	name string,
	startFn func(context.Context, Spawner) error,
	spawnerOpts []Opt,
	opts ...c.Opt,
) Node

NewDynSubtree builds a worker that receives a Spawner, which allows it to create more child workers dynamically in a sibling sub-tree.

The runtime subtree is composed of a worker and a supervisor

<name>
|
`- spawner (creates dynamic workers in sibling subtree)
|
`- subtree
   |
   `- <dynamic_worker>

Note: The Spawner is automatically managed by the supervision tree, so clients are not required to terminate it explicitly.

func NewDynSubtreeWithNotifyStart

func NewDynSubtreeWithNotifyStart(
	name string,
	runFn func(context.Context, NotifyStartFn, Spawner) error,
	spawnerOpts []Opt,
	opts ...c.Opt,
) Node

NewDynSubtreeWithNotifyStart accomplishes the same goal as NewDynSubtree with the addition of passing an extra argument (a notifyStart callback) to the runFn function parameter.

func NewWorker

func NewWorker(name string, startFn func(context.Context) error, opts ...c.Opt) Node

NewWorker creates a Node that represents a worker goroutine. It requires two arguments: a name that is used for runtime tracing and a startFn function.

The name argument

The name argument must not be empty and must not contain forward slash characters (/); otherwise, the system will panic[*].

[*] Panicking is preferred over returning an error because an invalid name is considered a programming error (ideally it would be a compilation error).

The startFn argument

The startFn function is where your business logic should be located. This function will be running on a new supervised goroutine.

The startFn function will receive a context.Context record that *must* be used inside your business logic to accept stop signals from its parent supervisor.

Depending on the Shutdown values used with the WithShutdown settings of the worker, if the `startFn` function does not respect the given context, the parent supervisor will either block forever or leak goroutines after a timeout has been reached.

func NewWorkerWithNotifyStart

func NewWorkerWithNotifyStart(
	name string,
	startFn func(context.Context, NotifyStartFn) error,
	opts ...c.Opt,
) Node

NewWorkerWithNotifyStart accomplishes the same goal as NewWorker with the addition of passing an extra argument (notifyStart callback) to the startFn function parameter.

The NotifyStartFn argument

Sometimes you want to consider a goroutine started only after certain initialization is done, such as reading from a database or API, or binding a socket. The NotifyStartFn is a callback that allows the spawned worker goroutine to signal when it has officially started.

It is essential to call this callback function in your business logic as soon as you consider the worker is initialized, otherwise the parent supervisor will block and eventually fail with a timeout.

Report a start error on NotifyStartFn

If, for some reason, a child node is not able to start correctly (e.g. the DB connection fails or the network is down), the node may call the given NotifyStartFn function with the error as a parameter. This will cause the whole supervision system start procedure to abort.

func Subtree

func Subtree(subtreeSpec SupervisorSpec, opts ...c.Opt) Node

Subtree transforms SupervisorSpec into a Node. This function allows you to insert a black-box sub-system into a bigger supervised system.

Note the subtree SupervisorSpec is going to inherit the event notifier from its parent supervisor.

Example:

// Initialize a SupervisorSpec that doesn't know anything about other
// parts of the system (i.e. it is self-contained)
networkingSubsystem := cap.NewSupervisorSpec("net", ...)

// Another self-contained system
filesystemSubsystem := cap.NewSupervisorSpec("fs", ...)

// SupervisorSpec that is started in your main.go
cap.NewSupervisorSpec("root",
 cap.WithNodes(
   cap.Subtree(networkingSubsystem),
   cap.Subtree(filesystemSubsystem),
 ),
)

type NotifyStartFn

type NotifyStartFn = c.NotifyStartFn

NotifyStartFn is a function given to worker nodes that allows them to notify the parent supervisor that they are officially started.

The argument contains an error if there was a failure, nil otherwise.

See the documentation of NewWorkerWithNotifyStart for more details

type Opt

type Opt func(*SupervisorSpec)

Opt is a type used to configure a SupervisorSpec

func WithNotifier

func WithNotifier(en EventNotifier) Opt

WithNotifier is an Opt that specifies a callback that gets called whenever the supervision system reports an Event

This function may be used to observe the behavior of all the supervisors in the system, and it is a great place to hook in monitoring services like logging, error tracing and metrics gatherers.

func WithRestartTolerance

func WithRestartTolerance(maxErrCount uint32, errWindow time.Duration) Opt

WithRestartTolerance is an Opt that specifies how many errors the supervisor should be willing to tolerate before giving up on restarting and failing.

If the tolerance is exceeded, the supervisor fails. If the supervisor is a sub-tree, the error is handled by a grand-parent supervisor, which applies its own restart tolerance in turn.

Example

// Tolerate 10 errors every 5 seconds
//
// - if there are 11 errors in a 5 second window, the supervisor fails
//
WithRestartTolerance(10, 5 * time.Second)

func WithStartOrder

func WithStartOrder(o Order) Opt

WithStartOrder is an Opt that specifies the start/stop order of a supervisor's children nodes

Possible values may be:

* LeftToRight -- Start children nodes from left to right, stop them from right to left

* RightToLeft -- Start children nodes from right to left, stop them from left to right
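
The relationship between start and stop order can be sketched as two small functions. This is an illustration only: the local order type and the helpers startSequence and stopSequence are hypothetical and merely mirror the behavior described in the list above.

```go
package main

import "fmt"

// order is a local stand-in for the documented Order type.
type order int

const (
	leftToRight order = iota
	rightToLeft
)

// reverse reverses s in place.
func reverse(s []string) {
	for i, j := 0, len(s)-1; i < j; i, j = i+1, j-1 {
		s[i], s[j] = s[j], s[i]
	}
}

// startSequence returns the children in the order they would be started.
func startSequence(children []string, o order) []string {
	seq := append([]string(nil), children...) // copy, keep input intact
	if o == rightToLeft {
		reverse(seq)
	}
	return seq
}

// stopSequence is always the reverse of the start sequence.
func stopSequence(children []string, o order) []string {
	seq := startSequence(children, o)
	reverse(seq)
	return seq
}

func main() {
	children := []string{"db", "cache", "http"}
	fmt.Println("start:", startSequence(children, rightToLeft))
	fmt.Println("stop: ", stopSequence(children, rightToLeft))
}
```

Stopping in reverse start order means a node is always stopped before the nodes that were started ahead of it, which matters when later nodes depend on earlier ones.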

func WithStrategy

func WithStrategy(s Strategy) Opt

WithStrategy is an Opt that specifies how children nodes of a supervisor get restarted when one of the nodes fails

Possible values may be:

* OneForOne -- Only restart the failing child

* OneForAll (Not Implemented Yet) -- Restart the failing child and all its siblings[*]

[*] This option may come in handy when all the other siblings depend on one another to work correctly.

type Order

type Order uint32

Order specifies the order in which a supervisor is going to start its node children. The stop order is the reverse of the start order.

const (
	// LeftToRight is an Order that specifies children start from left to right
	LeftToRight Order = iota
	// RightToLeft is an Order that specifies children start from right to left
	RightToLeft
)

type RestartToleranceReached

type RestartToleranceReached struct {
	// contains filtered or unexported fields
}

RestartToleranceReached is an error that gets reported when a supervisor has restarted a child so many times over a period of time that it does not make sense to keep restarting.

func NewRestartToleranceReached

func NewRestartToleranceReached(
	tolerance restartTolerance,
	sourceCh c.Child,
	sourceErr error,
	lastErr error,
) *RestartToleranceReached

NewRestartToleranceReached creates a RestartToleranceReached record

func (*RestartToleranceReached) Error

func (err *RestartToleranceReached) Error() string

func (*RestartToleranceReached) KVs

func (err *RestartToleranceReached) KVs() map[string]interface{}

KVs returns a data bag map that may be used in structured logging

func (*RestartToleranceReached) Unwrap

func (err *RestartToleranceReached) Unwrap() error

Unwrap returns the last error that caused the creation of a RestartToleranceReached error

type Spawner

type Spawner interface {
	Spawn(Node) (func() error, error)
}

Spawner is a record that can spawn other workers, and can wait for termination

type Strategy

type Strategy uint32

Strategy specifies how children get restarted when one of them reports an error

const (
	// OneForOne is a Strategy that tells the Supervisor to only restart the
	// child process that errored
	OneForOne Strategy = iota
	// OneForAll is a Strategy that tells the Supervisor to restart all the
	// siblings of a failed child process
	OneForAll
)

type Supervisor

type Supervisor struct {
	// contains filtered or unexported fields
}

Supervisor represents the root of a tree of goroutines. A Supervisor may have leaf or sub-tree children, where each node in the tree represents a goroutine that is restarted automatically as soon as the parent supervisor detects that an error has occurred. A Supervisor is always generated from a SupervisorSpec.

func (Supervisor) GetCrashError

func (sup Supervisor) GetCrashError(block bool) (bool, error)

GetCrashError is a non-blocking function that returns a crash error if there is one, the first parameter indicates if the supervisor is running or not. If the returned error is not nil, the first result will always be true.

func (Supervisor) GetName

func (sup Supervisor) GetName() string

GetName returns the name of the Spec used to start this Supervisor

func (Supervisor) Terminate

func (sup Supervisor) Terminate() error

Terminate is a synchronous procedure that halts the execution of the whole supervision tree.

func (Supervisor) Wait

func (sup Supervisor) Wait() error

Wait blocks the execution of the current goroutine until the Supervisor finishes its execution.

type SupervisorBuildError

type SupervisorBuildError struct {
	// contains filtered or unexported fields
}

SupervisorBuildError wraps errors returned from a client provided function that builds the supervisor nodes, enhancing it with supervisor information

func (*SupervisorBuildError) Error

func (err *SupervisorBuildError) Error() string

func (*SupervisorBuildError) KVs

func (err *SupervisorBuildError) KVs() map[string]interface{}

KVs returns a metadata map for structured logging

type SupervisorRestartError

type SupervisorRestartError struct {
	// contains filtered or unexported fields
}

SupervisorRestartError wraps an error tolerance surpassed error from a child node, enhancing it with supervisor information and possible termination errors on other siblings

func (*SupervisorRestartError) Error

func (err *SupervisorRestartError) Error() string

Error returns an error message

func (*SupervisorRestartError) KVs

func (err *SupervisorRestartError) KVs() map[string]interface{}

KVs returns a metadata map for structured logging

type SupervisorSpec

type SupervisorSpec struct {
	// contains filtered or unexported fields
}

SupervisorSpec represents the specification of a static supervisor; it serves as a template for the construction of a runtime supervision tree. In the SupervisorSpec you can specify settings like:

* The children (workers or sub-trees) you want spawned in your system when it starts

* The order in which the supervised node children get started

* Whether the supervisor restarts only the failing child node or, if specified, all of its siblings as well when the node fails in unexpected ways

func NewSupervisorSpec

func NewSupervisorSpec(name string, buildNodes BuildNodesFn, opts ...Opt) SupervisorSpec

NewSupervisorSpec creates a SupervisorSpec. It requires the name of the supervisor (for tracing purposes) and some children nodes to supervise.

Monitoring children that do not share resources

This is intended for situations where you need worker goroutines that are self-contained running in the background.

To specify a group of children nodes, you need to use the WithNodes utility function. This function may receive Subtree or Worker nodes.

Example:

cap.NewSupervisorSpec("root",

  // (1)
  // Specify child nodes to spawn when this supervisor starts
  cap.WithNodes(
    cap.Subtree(subtreeSupervisorSpec),
    workerChildSpec,
  ),

  // (2)
  // Specify child nodes start from right to left (reversed order) and
  // stop from left to right.
  cap.WithStartOrder(cap.RightToLeft),
)

Monitoring nodes that share resources

Sometimes, you want a group of children nodes to interact with each other via some shared resource that only the workers know about (for example, a gochan, a db datapool, etc).

You are able to specify a custom function (BuildNodesFn) that allocates and releases these resources.

This function should return:

* The children nodes of the supervision tree

* A function that cleans up the allocated resources (CleanupResourcesFn)

* An error, but only in the scenario where a resource initialization failed

Example:

cap.NewSupervisorSpec("root",

  // (1)
  // Implement a function that returns all nodes to be supervised.
  // When this supervisor gets (re)started, this function will be called.
  // Imagine this function as a factory for its children.
  func() ([]cap.Node, cap.CleanupResourcesFn, error) {

    // In this example, child nodes have a shared resource (a gochan)
    // and it gets passed to their constructors.
    buffer := make(chan MyType)
    nodes := []cap.Node{
      producerWorker(buffer),
      consumerWorker(buffer),
    }

    // We create a function that gets executed when the supervisor
    // shuts down.
    cleanup := func() error {
      close(buffer)
      return nil
    }

    // We return the allocated Node records and the cleanup function
    return nodes, cleanup, nil
  },

  // (2)
  cap.WithStartOrder(cap.RightToLeft),
)

Dealing with errors

Given that resources can involve IO allocations, using this functionality opens the door to a few error scenarios:

1) Resource allocation returns an error

In this scenario, the supervision start procedure will fail and it will follow the regular shutdown procedure: the already started nodes will be terminated and an error will be returned immediately.

2) Resource cleanup returns an error

In this scenario, the termination procedure will collect the error and report it in the returned error (see SupervisorTerminationError).

3) Resource allocation/cleanup hangs

This library does not handle this scenario. It is the responsibility of the user of the API to implement start timeouts and cleanup timeouts inside the given BuildNodesFn and CleanupResourcesFn functions.

func (SupervisorSpec) GetName

func (spec SupervisorSpec) GetName() string

GetName returns the given name of the supervisor spec (not a runtime name)

func (SupervisorSpec) Start

func (spec SupervisorSpec) Start(startCtx context.Context) (Supervisor, error)

Start creates a Supervisor from this SupervisorSpec.

A Supervisor is a tree of workers and/or sub-trees. The Start algorithm spawns the leaf worker goroutines first and then it will go up into the supervisor sub-trees. Depending on the SupervisorSpec's order, it will do an initialization in pre-order (LeftToRight) or post-order (RightToLeft).

Supervisor Tree Initialization

Once all the leaf children are initialized and running, the supervisor executes its supervision monitor logic (listening for failures on its children). Invoking this method blocks the calling goroutine until all the children, and the children of its sub-trees, have been started.

Failures on Child Initialization

In the scenario that one of the child nodes fails to start (IO error, etc.), the Start algorithm aborts the start routine, stops all the child nodes that have already been started in reverse order, and finally returns an error value.

type SupervisorStartError

type SupervisorStartError struct {
	// contains filtered or unexported fields
}

SupervisorStartError wraps an error reported on the initialization of a child node, enhancing it with supervisor information and possible termination errors on other siblings

func (*SupervisorStartError) Error

func (err *SupervisorStartError) Error() string

Error returns an error message

func (*SupervisorStartError) KVs

func (err *SupervisorStartError) KVs() map[string]interface{}

KVs returns a metadata map for structured logging

type SupervisorTerminationError

type SupervisorTerminationError struct {
	// contains filtered or unexported fields
}

SupervisorTerminationError wraps errors returned by a child node that failed to terminate (IO errors, timeouts, etc.), enhancing it with supervisor information. Note that the only way to get a valid SupervisorTerminationError is for one of the child nodes to fail to terminate or for the supervisor cleanup operation to fail.

func (*SupervisorTerminationError) Error

func (err *SupervisorTerminationError) Error() string

Error returns an error message

func (*SupervisorTerminationError) KVs

func (err *SupervisorTerminationError) KVs() map[string]interface{}

KVs returns a metadata map for structured logging
