goresilience


README

Goresilience

Goresilience is a Go toolkit to increase the resilience of applications. At its core it is inspired by hystrix and similar libraries, but at the same time it is very different:

Features

  • Increases the resilience of your programs.
  • Easy to extend and test, with a clean design.
  • Go idiomatic.
  • Uses the decorator pattern (middleware), like Go's http.Handler does.
  • Ability to create custom resilience flows (simple, advanced, specific...) by combining different runners in chains.
  • Safe defaults.
  • Not coupled to any framework/library.
  • Prometheus/OpenMetrics metrics as a first class citizen.


Motivation

You may be wondering: why another circuit breaker library...?

Well, this is not a circuit breaker library. It's true that Go has some good circuit breaker libraries (like sony/gobreaker, afex/hystrix-go or rubyist/circuitbreaker). But there was a lack of a resilience toolkit that is easy to extend and customize and that establishes an extensible design; that's why goresilience was born.

The aim of goresilience is to provide resilience runners that can be combined or used independently depending on the nature of the execution logic (complex, simple, performance sensitive, very reliable...).

Another key part of goresilience is extensibility: you can create new runners yourself and use them in combination with the bulkhead, the circuit breaker or any of the runners of this library or from others.

Getting started

The usage of the library is simple. Everything is based on the Runner interface.
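
This is the whole interface (documented in full in the Types section below):

type Runner interface {
    // Run will run the unit of execution passed on f.
    Run(ctx context.Context, f Func) error
}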

The runners can be used in two ways. In standalone mode (a single runner):

package main

import (
    "context"
    "log"
    "time"

    "github.com/slok/goresilience/timeout"
)

func main() {
    // Create our command.
    cmd := timeout.New(timeout.Config{
        Timeout: 100 * time.Millisecond,
    })

    for i := 0; i < 200; i++ {
        // Execute.
        result := ""
        err := cmd.Run(context.TODO(), func(_ context.Context) error {
            if time.Now().Nanosecond()%2 == 0 {
                time.Sleep(5 * time.Second)
            }
            result = "all ok"
            return nil
        })

        if err != nil {
            result = "not ok, but fallback"
        }

        log.Printf("the result is: %s", result)
    }
}

or combined in a chain of multiple runners by using runner middlewares. In this example the execution will be concurrency controlled, retried, and timed out using a runner chain:

package main

import (
    "context"
    "errors"
    "fmt"

    "github.com/slok/goresilience"
    "github.com/slok/goresilience/bulkhead"
    "github.com/slok/goresilience/retry"
    "github.com/slok/goresilience/timeout"
)

func main() {
    // Create our execution chain.
    cmd := goresilience.RunnerChain(
        bulkhead.NewMiddleware(bulkhead.Config{}),
        retry.NewMiddleware(retry.Config{}),
        timeout.NewMiddleware(timeout.Config{}),
    )

    // Execute.
    calledCounter := 0
    result := ""
    err := cmd.Run(context.TODO(), func(_ context.Context) error {
        calledCounter++
        if calledCounter%2 == 0 {
            return errors.New("you didn't expect this error")
        }
        result = "all ok"
        return nil
    })

    if err != nil {
        result = "not ok, but fallback"
    }

    fmt.Printf("result: %s", result)
}

As you can see, you can create any combination of resilient execution flows by combining the different runners of the toolkit.

Static Runners

Static runners are the ones that are based on a static configuration and don't change based on the environment (unlike the adaptive ones).

Timeout

This runner is based on the timeout pattern: it will execute the goresilience.Func, but if the execution duration exceeds a configured T duration it will return a timeout error.

Check example.

Retry

This runner is based on the retry pattern: it will retry the execution of the goresilience.Func up to N times if it fails.

It will use an exponential backoff with some jitter (for more information check this).

Check example.
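
As a rough illustration of the idea only (not the library's actual implementation), a minimal sketch of exponential backoff with full jitter could look like this; the function and parameter names are made up for the example:

package main

import (
    "errors"
    "fmt"
    "math/rand"
    "time"
)

// retryWithJitter sketches exponential backoff with full jitter: the wait
// ceiling doubles on every attempt and the actual sleep is a random value
// below that ceiling, which spreads retries from many clients over time.
func retryWithJitter(times int, waitBase time.Duration, f func() error) error {
    var err error
    for attempt := 0; attempt <= times; attempt++ {
        if err = f(); err == nil {
            return nil
        }
        ceiling := waitBase * (1 << uint(attempt))
        time.Sleep(time.Duration(rand.Int63n(int64(ceiling))))
    }
    return err
}

func main() {
    calls := 0
    err := retryWithJitter(3, 50*time.Millisecond, func() error {
        calls++
        if calls < 3 {
            return errors.New("transient error")
        }
        return nil
    })
    fmt.Printf("calls: %d, err: %v\n", calls, err)
}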

Bulkhead

This runner is based on the bulkhead pattern: it will control the concurrency of the goresilience.Func executions that use the same runner.

It can also time out if a goresilience.Func has been waiting too long on the execution queue.

Check example.
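
To make the pattern concrete, here is a minimal, self-contained sketch of a bulkhead (a buffered channel as a semaphore plus a wait timeout); it is an illustration of the idea, not the library's implementation:

package main

import (
    "context"
    "errors"
    "fmt"
    "sync"
    "time"
)

// bulkhead limits concurrent executions: the buffered channel acts as a
// semaphore and waiting for a free slot is bounded by a timeout.
type bulkhead struct {
    slots   chan struct{}
    timeout time.Duration
}

func (b *bulkhead) run(ctx context.Context, f func(ctx context.Context) error) error {
    select {
    case b.slots <- struct{}{}:
        defer func() { <-b.slots }()
        return f(ctx)
    case <-time.After(b.timeout):
        return errors.New("timed out waiting for an execution slot")
    }
}

func main() {
    b := &bulkhead{slots: make(chan struct{}, 2), timeout: 50 * time.Millisecond}
    var wg sync.WaitGroup
    for i := 0; i < 5; i++ {
        wg.Add(1)
        go func(i int) {
            defer wg.Done()
            err := b.run(context.TODO(), func(_ context.Context) error {
                time.Sleep(100 * time.Millisecond)
                return nil
            })
            fmt.Printf("execution %d: %v\n", i, err)
        }(i)
    }
    wg.Wait()
}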

Circuit breaker

This runner is based on the circuitbreaker pattern: it stores the results of the executed goresilience.Func in N buckets of T duration and changes the state of the circuit based on those measured metrics.

Check example.
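
As a rough sketch of the core idea (the real runner uses N timed buckets and half-open probing, none of which is modeled here), a failure-ratio breaker over a fixed window could look like this:

package main

import (
    "errors"
    "fmt"
)

// breaker records the last len(window) results and rejects executions once
// the failure ratio over that window passes the threshold.
type breaker struct {
    window  []bool // true = failure
    maxFail float64
}

func (b *breaker) run(f func() error) error {
    if b.failureRatio() > b.maxFail {
        return errors.New("circuit open: execution rejected")
    }
    err := f()
    // Slide the window: drop the oldest result, record the newest.
    b.window = append(b.window[1:], err != nil)
    return err
}

func (b *breaker) failureRatio() float64 {
    failures := 0
    for _, failed := range b.window {
        if failed {
            failures++
        }
    }
    return float64(failures) / float64(len(b.window))
}

func main() {
    b := &breaker{window: make([]bool, 10), maxFail: 0.5}
    for i := 0; i < 15; i++ {
        err := b.run(func() error { return errors.New("boom") })
        fmt.Printf("call %d: %v\n", i, err)
    }
}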

Chaos

This runner is based on failure injection of errors and latency. It will inject those failures in the required executions (based on a percentage, or in all of them).

Check example.

Adaptive Runners

Concurrency limit

Concurrency limit is based on Netflix's concurrency-limits library. It tries to implement the same features but for the goresilience library (and compatible with other runners).

It limits the concurrency with little configuration, adapting to the environment it is running in at any given moment (hardware, load...).

This Runner will limit the concurrency (like the bulkhead), but it will use different TCP congestion control algorithms to adapt the concurrency limit based on errors and latency.

The Runner is based on four components:

  • Limiter: the one that measures and calculates the concurrency limit based on one of several algorithms that can be chosen, for example AIMD.
  • Executor: the one executing the goresilience.Func itself; it has different queuing implementations that prioritize and drop executions depending on the implementation.
  • Runner: the runner itself that will be used by the user; it is the glue between the Limiter and the Executor. It has a policy that decides whether an execution result is treated as an error, a success or ignored by the Limiter algorithm.
  • Result policy: a function that can be configured on the concurrencylimit Runner. It receives the result of the executed function and returns a result for the limit algorithm. This policy is responsible for telling the limit algorithm whether the received error should count as a success, count as a failure, or be ignored in the calculation of the concurrency limit. For example: count only 502 errors as failures and ignore the rest.

Check AIMD example. Check CoDel example.

Executors
  • FIFO: the default executor; it executes the queued jobs in first-in-first-out order and also has a queue wait timeout.
  • LIFO: executes the queued jobs in last-in-first-out order and also has a queue wait timeout.
  • AdaptiveLIFOCodel: an implementation of Facebook's CoDel + adaptive LIFO algorithm. This executor is meant to be used with the Static limiter.
Limiter
  • Static: sets a constant limit that will not change.
  • AIMD: based on the AIMD TCP congestion control algorithm. It increases the limit at a constant rate, and when congestion occurs (by timeout or result failure) it decreases it by a configured factor (see the sketch after this list).
Result policy
  • everyExternalErrorAsFailurePolicy: the default policy. Errors of type errors.ErrRejectedExecution are ignored by the limit algorithms; the rest of the errors are treated as failures.
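
Here is a tiny sketch of the AIMD idea referenced above (additive increase, multiplicative decrease); the type and field names are illustrative, not the library's API:

package main

import "fmt"

// aimdLimit grows the concurrency limit additively while executions succeed
// and shrinks it multiplicatively when congestion (a timeout or failure) is
// detected, never dropping below a minimum.
type aimdLimit struct {
    limit    float64
    increase float64 // additive increase per success
    factor   float64 // multiplicative decrease on congestion (e.g. 0.7)
    min      float64
}

func (a *aimdLimit) measure(congested bool) {
    if congested {
        a.limit *= a.factor
        if a.limit < a.min {
            a.limit = a.min
        }
        return
    }
    a.limit += a.increase
}

func main() {
    l := &aimdLimit{limit: 10, increase: 1, factor: 0.7, min: 1}
    for _, congested := range []bool{false, false, false, true, false} {
        l.measure(congested)
        fmt.Printf("congested=%v -> limit=%.1f\n", congested, l.limit)
    }
}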

Other

Metrics

All the runners can be measured using a metrics.Recorder, but instead of passing it to every runner, the runners will try to get this recorder from the context. So you can wrap any runner using metrics.NewMiddleware and it will activate the metrics support on the wrapped runners. This should be the first runner of the chain.

At this moment only Prometheus is supported.

In this example the runners are measured.

Measuring always has a performance hit (though not a high one); in most cases it is not a problem, but there is a benchmark to see the numbers:

BenchmarkMeasuredRunner/Without_measurement_(Dummy).-4            300000              6580 ns/op             677 B/op         12 allocs/op
BenchmarkMeasuredRunner/With_prometheus_measurement.-4            200000             12901 ns/op             752 B/op         15 allocs/op
Hystrix-like

Using the different runners, a Hystrix-like flow can be obtained. You can see a simple example of how it can be done in this example.

http middleware

Creating HTTP middlewares with goresilience runners is simple and clean. You can see an example of how it can be done in this example. The example shows how you can protect the server by load shedding using an adaptive concurrencylimit goresilience.Runner.
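
A minimal sketch of such a middleware, using only the documented goresilience.Runner API (a bulkhead is used here because it rejects executions before running them, so writing the 503 on error is safe):

package main

import (
    "context"
    "net/http"

    "github.com/slok/goresilience"
    "github.com/slok/goresilience/bulkhead"
)

// resilienceMiddleware runs every request through the given
// goresilience.Runner. If the runner rejects the execution (e.g. the
// bulkhead queue wait timed out), the handler never ran and we shed
// load with a 503.
func resilienceMiddleware(runner goresilience.Runner, next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
        err := runner.Run(req.Context(), func(_ context.Context) error {
            next.ServeHTTP(w, req)
            return nil
        })
        if err != nil {
            http.Error(w, "service unavailable", http.StatusServiceUnavailable)
        }
    })
}

func main() {
    runner := goresilience.RunnerChain(bulkhead.NewMiddleware(bulkhead.Config{}))
    mux := http.NewServeMux()
    mux.HandleFunc("/", func(w http.ResponseWriter, _ *http.Request) {
        w.Write([]byte("ok"))
    })
    http.ListenAndServe(":8080", resilienceMiddleware(runner, mux))
}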

Architecture

At its core, goresilience is based on a very simple idea: the Runner interface. The Runner interface is the unit of execution; it accepts a context.Context and a goresilience.Func, and returns an error.

The idea of the Runner is the same as Go's http.Handler: having an interface, you can create chains of runners, also known as middlewares (also called the decorator pattern).

The library comes with decorators called Middleware that return a function wrapping one runner with another. This gives us the ability to create a resilient execution flow where any runner can be wrapped and customized with the pieces we want, including custom ones not in this library.

This way we can create execution flows like this one:

Circuit breaker
└── Timeout
    └── Retry
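
Assuming the circuitbreaker package follows the same Config/NewMiddleware convention as the runners shown above, that flow could be built like this:

package main

import (
    "context"

    "github.com/slok/goresilience"
    "github.com/slok/goresilience/circuitbreaker"
    "github.com/slok/goresilience/retry"
    "github.com/slok/goresilience/timeout"
)

func main() {
    // Outermost middleware first: the circuit breaker wraps the timeout,
    // which wraps the retry, which finally executes the Func.
    cmd := goresilience.RunnerChain(
        circuitbreaker.NewMiddleware(circuitbreaker.Config{}),
        timeout.NewMiddleware(timeout.Config{}),
        retry.NewMiddleware(retry.Config{}),
    )

    _ = cmd.Run(context.TODO(), func(_ context.Context) error {
        // The execution logic protected by the whole chain goes here.
        return nil
    })
}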

Extend using your own runners

To create your own runner, you need to keep two things in mind:

  • Implement the goresilience.Runner interface.
  • Provide constructors that return a goresilience.Middleware; this way your Runner can be chained with other Runners.

In this example (full example here) we create a new resilience runner for chaos engineering that will fail at a constant rate set by the Config.FailEveryTimes setting.

Following the library convention, with NewFailer we get the standalone Runner (the one that is not chainable), and with NewFailerMiddleware we get a Middleware that can be used with goresilience.RunnerChain to chain it with other Runners.

Note: We can pass nil to NewFailer because NewFailerMiddleware uses goresilience.SanitizeRunner, which will return a valid Runner as the last part of the chain when it receives nil (for more information about this check goresilience.command).

package failer

import (
    "context"
    "fmt"

    "github.com/slok/goresilience"
)

// Config is the configuration of constFailer.
type Config struct {
    // FailEveryTimes will make the runner return an error every N executed times.
    FailEveryTimes int
}

// NewFailer is like NewFailerMiddleware but will not wrap any other runner; it is standalone.
func NewFailer(cfg Config) goresilience.Runner {
    return NewFailerMiddleware(cfg)(nil)
}

// NewFailerMiddleware returns a new middleware that will wrap runners and will fail
// every N executions.
func NewFailerMiddleware(cfg Config) goresilience.Middleware {
    return func(next goresilience.Runner) goresilience.Runner {
        calledTimes := 0
        // Use the RunnerFunc helper so we don't need to create a new type.
        return goresilience.RunnerFunc(func(ctx context.Context, f goresilience.Func) error {
            // We should lock the counter writes; not done here because this is an example.
            calledTimes++

            if calledTimes == cfg.FailEveryTimes {
                calledTimes = 0
                return fmt.Errorf("failed due to %d call", cfg.FailEveryTimes)
            }

            // Run using the chain.
            next = goresilience.SanitizeRunner(next)
            return next.Run(ctx, f)
        })
    }
}

Documentation

Overview

Package goresilience is a framework/library of utilities to improve the resilience of programs easily.

The library is based on the `goresilience.Runner` interface; these runners can be chained using the decorator pattern (like the std library's `http.Handler` interface). This makes the library extensible, flexible and clean to use. The runners can be chained as if they were middlewares that can act on the whole execution process of the `goresilience.Func`.

Example (Basic)

Uses a single runner: retry with the default settings. This will make the `goresilience.Func` be executed and retried N times if it fails.

package main

import (
	"context"
	"fmt"
	"io/ioutil"
	"net/http"

	"github.com/slok/goresilience/retry"
)

func main() {
	// Create our func `runner`. Use nil as it will not be chained with another `Runner`.
	cmd := retry.New(retry.Config{})

	// Execute.
	var result string
	err := cmd.Run(context.TODO(), func(ctx context.Context) error {
		resp, err := http.Get("https://bruce.wayne.is.batman.io")
		if err != nil {
			return err
		}
		defer resp.Body.Close()

		b, err := ioutil.ReadAll(resp.Body)
		if err != nil {
			return err
		}

		result = string(b)
		return nil
	})

	// We could fallback to get a Hystrix like behaviour.
	if err != nil {
		result = "fallback result"
	}

	fmt.Printf("result is: %s\n", result)
}

Example (Chain)

Uses more than one `goresilience.Runner` and chains them to create a very resilient execution of the `goresilience.Func`. In this case we create a runner that retries and also times out, and we configure the timeout.

package main

import (
	"context"
	"fmt"
	"io/ioutil"
	"net/http"
	"time"

	"github.com/slok/goresilience"
	"github.com/slok/goresilience/retry"
	"github.com/slok/goresilience/timeout"
)

func main() {
	// Create our chain, first the retry and then the timeout with 100ms.
	cmd := goresilience.RunnerChain(
		retry.NewMiddleware(retry.Config{}),
		timeout.NewMiddleware(timeout.Config{
			Timeout: 100 * time.Millisecond,
		}),
	)

	var result string
	err := cmd.Run(context.TODO(), func(ctx context.Context) error {
		resp, err := http.Get("https://bruce.wayne.is.batman.io")
		if err != nil {
			return err
		}
		defer resp.Body.Close()

		b, err := ioutil.ReadAll(resp.Body)
		if err != nil {
			return err
		}

		result = string(b)
		return nil
	})

	// We could fallback to get a Hystrix like behaviour.
	if err != nil {
		result = "fallback result"
	}

	fmt.Printf("result is: %s\n", result)
}

Example (Metrics)

Measures all the executions through the runners using Prometheus metrics.

package main

import (
	"context"
	"fmt"
	"net/http"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
	"github.com/slok/goresilience"
	"github.com/slok/goresilience/metrics"
	"github.com/slok/goresilience/retry"
	"github.com/slok/goresilience/timeout"
)

func main() {
	// Create a prometheus registry and expose that registry over http.
	promreg := prometheus.NewRegistry()
	go func() {
		http.ListenAndServe(":8081", promhttp.HandlerFor(promreg, promhttp.HandlerOpts{}))
	}()

	// Create the metrics recorder for our runner.
	metricsRecorder := metrics.NewPrometheusRecorder(promreg)

	// Create our chain with our metrics wrapper.
	cmd := goresilience.RunnerChain(
		metrics.NewMiddleware("example-metrics", metricsRecorder),
		retry.NewMiddleware(retry.Config{}),
		timeout.NewMiddleware(timeout.Config{
			Timeout: 100 * time.Millisecond,
		}),
	)

	var result string
	err := cmd.Run(context.TODO(), func(ctx context.Context) error {
		sec := time.Now().Second()
		if sec%2 == 0 {
			return fmt.Errorf("error because %d is even", sec)
		}
		return nil
	})

	// We could fallback to get a Hystrix like behaviour.
	if err != nil {
		result = "fallback result"
	}

	fmt.Printf("result is: %s\n", result)
}

Example (Noresult)

An example showing that when the result is not needed we don't need to use an inline function.

package main

import (
	"context"

	"github.com/slok/goresilience/retry"
)

func myFunc(ctx context.Context) error { return nil }

func main() {
	cmd := retry.New(retry.Config{})

	// Execute.
	err := cmd.Run(context.TODO(), myFunc)
	if err != nil {
		// Do fallback.
	}
}

Example (Structresult)

An example showing that we can also use structs to pass parameters and get our results.

package main

import (
	"context"
	"errors"
	"fmt"

	"github.com/slok/goresilience/retry"
)

func main() {
	type myfuncResult struct {
		name     string
		lastName string
		result   string
	}

	cmd := retry.New(retry.Config{})

	// Execute.
	res := myfuncResult{
		name:     "Bruce",
		lastName: "Wayne",
	}
	err := cmd.Run(context.TODO(), func(ctx context.Context) error {
		if res.name == "Bruce" && res.lastName == "Wayne" {
			res.result = "Batman"
		}
		return errors.New("identity unknown")
	})

	if err != nil {
		res.result = "Unknown"
	}

	fmt.Printf("%s %s is %s", res.name, res.lastName, res.result)
}


Types

type Func

type Func func(ctx context.Context) error

Func is the function to execute with resilience.

type Middleware

type Middleware func(Runner) Runner

Middleware represents a middleware for a runner, it takes a runner and returns a runner.

type Runner

type Runner interface {
	// Run will run the unit of execution passed on f.
	Run(ctx context.Context, f Func) error
}

Runner knows how to execute an execution logic and returns an error if it fails.

func RunnerChain

func RunnerChain(middlewares ...Middleware) Runner

RunnerChain will take N middlewares and create a Runner chain with them in the order in which they have been passed.

func SanitizeRunner

func SanitizeRunner(r Runner) Runner

SanitizeRunner returns a safe execution Runner if the runner is nil. Usually this helper is used for the last part of the runner chain when the runner is nil, so instead of acting on a nil Runner the execution falls to a `command` Runner; this runner knows how to execute the `Func` function. It's safe to use always: if it encounters a valid Runner it will return that same Runner.

type RunnerFunc

type RunnerFunc func(ctx context.Context, f Func) error

RunnerFunc is a helper that satisfies the Runner interface by using a function.

func (RunnerFunc) Run

func (r RunnerFunc) Run(ctx context.Context, f Func) error

Run satisfies Runner interface.

Directories

Path Synopsis
examples
internal
mocks
Package mocks will have all the mocks of the library, we'll try to use mocking using blackbox testing and integration tests whenever is possible.
