leaderelection

package module
v0.0.0-...-509b79d Latest
Published: Mar 31, 2021 License: Apache-2.0 Imports: 8 Imported by: 1

README

GoLea

GoLea lets a set of distributed processes compete for leadership of a shared resource. It is implemented on top of ZooKeeper. For a general description of leader election, see the Wikipedia page on leader election.

Terms

Term

Description

Nominee

Nominee is a potential leader. Each client nominates itself. One will become Leader; the others will become Followers.

Candidate

See Nominee.

Client

Client is the customer, or client, of the Library (see below). The Client uses the Library to negotiate leadership for a given resource.

Election Node

ElectionNode is the base ZooKeeper node under which leader candidates place their nominations for leadership. For the leader election examples in this package this node is called /election. For an actual application, elections may be defined as `/election-type`. Using politics as an example, the `election-type` might be `president`.

For an application that has the concept of *primary* and *standby*, the `election-type` might be `primary`. For applications that have multiple components, the *Election Node* might consist of 2 or more ZooKeeper nodes. An example might be `/component-type/primary`, where multiple application components each have the concept of `primary` and `standby`.

Resource

Resource is the target of the leadership request. If leadership is granted, the leader is free to manage the resource in whatever way is appropriate to the application, without worrying that other clients are concurrently managing that resource. *Resource* is a more general name for *Election Node*.

Follower

Follower is an entity that is not the leader for a Resource but is actively monitoring the resource in case the Leader is removed from its leadership position. If this happens, a Follower becomes a candidate for Leader, and the new Leader is decided via an automatic, asynchronous process.

Leader

Leader is the entity that has exclusive ownership for a given Resource.

Library

Library is this component - i.e., the Leader Election library.

Package

See Library.

ZK

Zookeeper.
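
The Election Node naming described in the table above (`/election-type` for a single election, `/component-type/primary` for multi-component applications) can be sketched as a small path helper. This is only an illustration: `electionPath` and its parameters are hypothetical names and are not part of the library.

```go
package main

import (
	"fmt"
	"path"
)

// electionPath builds a ZooKeeper node path for an election.
// It is a hypothetical helper, not part of the leaderelection package.
// An empty component yields "/<electionType>"; otherwise the result is
// "/<component>/<electionType>".
func electionPath(component, electionType string) string {
	// path.Join skips empty elements and cleans the result.
	return path.Join("/", component, electionType)
}

func main() {
	fmt.Println(electionPath("", "president"))        // /president
	fmt.Println(electionPath("scheduler", "primary")) // /scheduler/primary
}
```

Whatever path is chosen, the node must already exist before it is passed to NewElection.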

How to use

This section provides an overview of the various phases of an election. The file Election_test.go is a good example of how a client is expected to use the leader election package. The integration tests (see the integrationtest directory) may also provide some insight into how to use the package, although their intent is primarily to test the package rather than to provide examples.

Leader Election (LE) clients will have to import the following packages:

import (
	"github.com/go-zookeeper/zk"
	"github.com/Comcast/go-leaderelection"
)

Leader election clients must provide a ZK connection when creating an election instance. Because the library never creates ZK connections behind the scenes, an application that participates in multiple elections can share one connection across them and so control how many ZK connections it uses.

Create a Resource

A resource is represented as a ZK node. During the design and implementation of the library an explicit decision was made not to provide the capability to create election resources, so the client creates the node itself before calling NewElection. This decision may be revisited in the future.

// Create the ZK connection
zkConn, _, err := zk.Connect([]string{zkURL}, heartBeat)
if err != nil {
    log.Printf("Error in zk.Connect (%s): %v",
        zkURL, err)
    return nil, err
}

// Create the election node in ZooKeeper
_, err = zkConn.Create(electionNode,
    []byte(""), 0, zk.WorldACL(zk.PermAll))
if err != nil {
    log.Printf("Error creating the election node (%s): %v",
        electionNode, err)
    return nil, err
}

// NewElection takes a client name as its third argument.
candidate, err := leaderelection.NewElection(zkConn, electionNode, "myhostname")
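
When several candidates start at about the same time, each of them may try to create the election node, and in ZooKeeper a create on an existing node fails with a "node exists" error. One way to handle this is to treat that error as success. The sketch below shows the idea without needing a ZooKeeper server: `nodeCreator`, `fakeZK` and `ensureNode` are hypothetical names, and `errNodeExists` is a local stand-in for the `zk.ErrNodeExists` value a real connection would return.

```go
package main

import (
	"errors"
	"fmt"
)

// errNodeExists is a local stand-in for zk.ErrNodeExists so that this
// sketch runs without a ZooKeeper server.
var errNodeExists = errors.New("node already exists")

// nodeCreator is a hypothetical, narrowed view of a ZooKeeper connection.
type nodeCreator interface {
	Create(path string, data []byte) error
}

// ensureNode creates the election node and treats "already exists" as
// success, so that concurrent candidates can all call it safely.
func ensureNode(c nodeCreator, path string) error {
	err := c.Create(path, nil)
	if err != nil && !errors.Is(err, errNodeExists) {
		return fmt.Errorf("creating election node %s: %w", path, err)
	}
	return nil
}

// fakeZK is an in-memory substitute for a ZooKeeper connection.
type fakeZK struct{ nodes map[string]bool }

func (f *fakeZK) Create(path string, _ []byte) error {
	if f.nodes[path] {
		return errNodeExists
	}
	f.nodes[path] = true
	return nil
}

func main() {
	conn := &fakeZK{nodes: map[string]bool{}}
	// The second call hits an existing node and is still reported as success.
	for i := 0; i < 2; i++ {
		if err := ensureNode(conn, "/election"); err != nil {
			fmt.Println("unexpected error:", err)
			return
		}
	}
	fmt.Println("election node is present")
}
```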

Delete an Election

err := leaderelection.DeleteElection(zkConn, electionNode)

or

candidate.EndElection()

Request Leadership for an Election

Leadership election is inherently an asynchronous process. For candidates that win an election the initial response is fairly rapid. Even so, leaders can still receive errors from the election that affect their leadership of a resource. The most common example is when the underlying election instance gets disconnected from ZooKeeper. In the event of errors, leaders are expected to resign their leadership. This is likely futile if the ZooKeeper connection is lost, but it is good practice to do so for any error.

Followers of an election can wait an indeterminate amount of time to become the leader of an election. If a follower becomes a leader they will be notified. In some cases the election may end before a follower can become a leader. Like leaders, followers can receive errors from the election. Like leaders, they should resign from the election in this case.

Finally, an election can be unilaterally ended by any actor in the application. As with errors, this results in a status notification to all candidates, including the leader. Candidates are expected to resign from the election in this case as well.

Given that leadership election is an asynchronous process, clients should start the election as a goroutine as shown below. All events pertaining to the status of an election are communicated via channels, so the typical pattern is for a client to monitor the election status in a for/select loop as shown below.

// Each candidate should register interest in participating in an election.
candidate, err := NewElection(zkConn, "/election", "myhostname")
...
// Each candidate should begin its participation in the election
// by starting a goroutine.
go candidate.ElectLeader()
...

Monitor election and candidate's role in the election

var status Status
var ok bool

for {
    select {
    case status, ok = <-leaderElector.Status():
        if !ok {
            fmt.Println("\t\t\tChannel closed, election is terminated!!!")
            respCh <- ElectionResponse{false, status.CandidateID}
            leaderElector.Resign()
            wg.Done()
            return
        }
        if status.Err != nil {
            fmt.Println("Received election status error <<", status.Err, ">> for candidate <", leaderElector.candidateID, ">.")
            leaderElector.Resign()
            wg.Done()
            return
        }

        fmt.Println("Candidate received status message: <", status, ">.")
        if status.Role == Leader {
            doLeaderStuff(leaderElector, status, respCh, connFailCh, waitFor)
            wg.Done()
            return
        }
    case <-connFailCh:
        fmt.Println("\t\t\tZK connection failed for candidate <", status.CandidateID, ">, exiting")
        respCh <- ElectionResponse{false, status.CandidateID}
        leaderElector.Resign()
        wg.Done()
        return
    case <-time.After(time.Second * 100):
        fmt.Println("\t\t\tERROR!!! Timer expired, stop waiting to become leader for", status.CandidateID)
        leaderElector.Resign()
        respCh <- ElectionResponse{false, status.CandidateID}
        wg.Done()
        return
    }
}
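
The monitoring loop can also be packaged as a reusable function that returns once the candidate either becomes leader or has to give up. The following is a minimal sketch under stated assumptions, not the library's own code: `Role` and `Status` are redeclared locally (mirroring the documented types) so that it runs without ZooKeeper, and `waitForLeadership`, `errElectionOver`, `errGaveUp` and the `resign` callback are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// Role and Status mirror the shapes documented for this package; they are
// redeclared here so the sketch compiles without ZooKeeper.
type Role int

const (
	Follower Role = iota
	Leader
)

type Status struct {
	CandidateID string
	Err         error
	Role        Role
}

var (
	errElectionOver = errors.New("election ended before leadership was won")
	errGaveUp       = errors.New("caller stopped waiting for leadership")
)

// waitForLeadership blocks until the candidate becomes leader (nil error),
// the status channel is closed, a status carries an error, or done is
// closed. resign is called on every exit path except winning leadership.
func waitForLeadership(statusCh <-chan Status, done <-chan struct{}, resign func()) error {
	for {
		select {
		case st, ok := <-statusCh:
			if !ok {
				resign()
				return errElectionOver
			}
			if st.Err != nil {
				resign()
				return st.Err
			}
			if st.Role == Leader {
				return nil
			}
			// Still a follower: keep waiting for the next status.
		case <-done:
			resign()
			return errGaveUp
		}
	}
}

func main() {
	statusCh := make(chan Status, 2)
	statusCh <- Status{CandidateID: "c1", Role: Follower}
	statusCh <- Status{CandidateID: "c1", Role: Leader}

	done := make(chan struct{}) // never closed here; close it to stop waiting
	err := waitForLeadership(statusCh, done, func() {})
	fmt.Println("became leader:", err == nil) // became leader: true
}
```

With the real package, `statusCh` would be the channel returned by Status() and `resign` would call the election's Resign method.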

Resign from an Election

candidate.Resign()

Leadership change

See "Monitor election and candidate's role in the election" above.

Query status

Candidates are always notified when an election's status changes. It is up to the client to cache this status if they need to reference it between status changes.
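
One way to cache the status is a small mutex-protected holder that the monitoring goroutine updates and other goroutines read. This is a sketch only: `statusCache` is a hypothetical type, and `Role` and `Status` are simplified local copies of the documented types so that the example runs on its own.

```go
package main

import (
	"fmt"
	"sync"
)

// Role and Status are simplified local mirrors of the package's types.
type Role int

const (
	Follower Role = iota
	Leader
)

type Status struct {
	CandidateID string
	Role        Role
}

// statusCache remembers the most recent Status seen on the status channel.
// The zero value is ready to use.
type statusCache struct {
	mu     sync.RWMutex
	latest Status
	seen   bool
}

// Store records a status received from the election.
func (c *statusCache) Store(s Status) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.latest = s
	c.seen = true
}

// Load returns the last stored status; ok is false if none was stored yet.
func (c *statusCache) Load() (s Status, ok bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.latest, c.seen
}

func main() {
	var cache statusCache
	cache.Store(Status{CandidateID: "c1", Role: Follower})
	cache.Store(Status{CandidateID: "c1", Role: Leader})

	s, ok := cache.Load()
	fmt.Println(ok, s.CandidateID, s.Role == Leader) // true c1 true
}
```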

Prerequisites

  1. go-leaderelection uses github.com/go-zookeeper/zk.
  2. All tests require the availability of a Zookeeper installation. zkServer.sh must be in the path. Election_test.go requires that Zookeeper be running. The integration tests control Zookeeper so Zookeeper should not be running when executing the integration tests.

Testing the package has additional prerequisites:

  1. The integration tests leverage github.com/Comcast/goint. This package must be installed prior to executing integration tests.

Documentation

Overview

Package leaderelection implements a leader election protocol in Go using the Zookeeper leader election recipe. The main steps to implement a leader election are as follows:

Initialize an election

Before leader election can take place the election needs to be initialized. This involves providing a live connection to Zookeeper and an existing Zookeeper node (resource) that represents the top level of the election to the NewElection function.

election, err := NewElection(zkConn, "/election", "myhostname")

In the above example, if zkConn is not valid or "/election" doesn't already exist, an error will be returned.

Run an election

Running an election will register this election instance as interested in becoming leader for the election resource that was provided in the call to NewElection (e.g., "/election"). Multiple candidates may be registered for the same election resource and contend for leadership of the resource. One will be chosen as the leader and the rest will become followers. In the event that the leader exits prior to the election terminating one of the followers will become leader. Example:

go leaderElector.ElectLeader()

This registers the candidate that created the election instance as a potential leader for the election resource. As it is started as a goroutine the candidate is expected to monitor one of several election related channels for election events.
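
In the general ZooKeeper leader election recipe, each candidate creates a sequentially numbered node under the election node, the candidate with the lowest sequence number is the leader, and every other candidate watches its immediate predecessor. The sketch below models only that ordering rule in memory. The node names (`n_0000000001` and so on) and the `following` function are hypothetical, and the sketch makes no claim about how this package names or stores its candidate nodes.

```go
package main

import (
	"fmt"
	"sort"
)

// following maps each candidate node to the node it watches. The lowest
// node is the leader and watches nobody (""); every other node watches
// the node immediately before it in sorted order.
// Node names are assumed to end in a fixed-width, zero-padded sequence
// number, so that lexical order matches numeric order.
func following(nodes []string) map[string]string {
	sorted := append([]string(nil), nodes...) // copy; leave the input untouched
	sort.Strings(sorted)

	out := make(map[string]string, len(sorted))
	for i, n := range sorted {
		if i == 0 {
			out[n] = ""
			continue
		}
		out[n] = sorted[i-1]
	}
	return out
}

func main() {
	f := following([]string{"n_0000000003", "n_0000000001", "n_0000000002"})
	fmt.Printf("%q\n", f["n_0000000001"]) // "" (leader)
	fmt.Printf("%q\n", f["n_0000000003"]) // "n_0000000002"

	// If the leader's node disappears, the next lowest node leads.
	g := following([]string{"n_0000000003", "n_0000000002"})
	fmt.Printf("%q\n", g["n_0000000002"]) // ""
}
```

Watching only the predecessor, rather than the leader, means a leader failure wakes a single follower instead of all of them.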

Events that can happen while an election is running

Several channels must be monitored during an election. The first is the channel returned by election.Status(). It notifies a follower that it has become leader, and it also signals the end of the election: the Election instance closes the Status() channel, after which the client is expected to stop using the election.

Errors may also be returned via the Status() channel. They generally indicate a problem with the election, such as a network partition from Zookeeper. Errors are unrecoverable and mean the election is over for that candidate. The connFailCh channel in the example below belongs to the client: the client owns the Zookeeper connection and is responsible for handling any errors associated with it.

for {
	select {
	case status, ok = <-leaderElector.Status():
		if !ok {
			fmt.Println("\t\t\tChannel closed, election is terminated!!!")
			leaderElector.Resign()
			return
		}
		if status.Err != nil {
			fmt.Println("Received election status error <<", status.Err, ">> for candidate <", leaderElector.candidateID, ">.")
			leaderElector.Resign()
			return
		}
		fmt.Println("Candidate received status message: <", status, ">.")
		if status.Role == Leader {
			doLeaderStuff(leaderElector, status, respCh, connFailCh, waitFor)
			leaderElector.EndElection() // Terminates the election and signals all followers the election is over.
			return
		}
	case <-connFailCh:
		fmt.Println("\t\t\tZK connection failed for candidate <", status.CandidateID, ">, exiting")
		respCh <- ElectionResponse{false, status.CandidateID}
		leaderElector.Resign()
		wg.Done()
		return

	case ... // Any other channels the client may need to monitor.
	...
	}
}

The infinite for loop indicates that the election continues until one of the halting conditions described above occurs. The "case <-connFailCh:" branch is used to monitor for Zookeeper connection problems. It is up to the client to decide what to do in this event.

On channel close or Error events the client is expected to resign from the election as shown in the above code snippet. When the leader is done with the work associated with the election it is expected to terminate the election by calling the EndElection method. This is required to properly clean up election resources including termination of any existing followers.
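
The cleanup rule in the paragraph above (Resign on channel close or error, EndElection once the leader's work is done) can be captured in one place. The sketch below is an assumption-laden illustration: `elector` is a hypothetical interface covering just the Resign and EndElection methods, and `outcome`, `finish` and `recorder` are made-up names used to show the rule with a fake instead of a live election.

```go
package main

import "fmt"

// elector is a hypothetical interface covering the two cleanup methods
// that the Election type is documented to provide.
type elector interface {
	Resign()
	EndElection()
}

// outcome classifies why the monitoring loop stopped.
type outcome int

const (
	becameLeader outcome = iota
	channelClosed
	statusError
)

// finish applies the cleanup rule: a leader runs its work and then ends
// the election; on channel close or error the candidate resigns.
func finish(e elector, o outcome, work func()) {
	switch o {
	case becameLeader:
		work()
		e.EndElection()
	default:
		e.Resign()
	}
}

// recorder is a fake elector that records which methods were called.
type recorder struct{ calls []string }

func (r *recorder) Resign()      { r.calls = append(r.calls, "Resign") }
func (r *recorder) EndElection() { r.calls = append(r.calls, "EndElection") }

func main() {
	r := &recorder{}
	finish(r, becameLeader, func() { fmt.Println("doing leader work") })
	finish(r, channelClosed, nil)
	fmt.Println(r.calls) // [EndElection Resign]
}
```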

Index

Constants

const (
	// Follower indicates that this candidate is following another candidate/leader.
	Follower = iota
	// Leader indicates that this candidate is the leader for the election.
	Leader
)
const (
	// ElectionCompletedNotify indicates the election is over and no new candidates can be nominated.
	ElectionCompletedNotify = "Election is over, no new candidates allowed"
	// ElectionSelfDltNotify indicates that the referenced candidate was deleted. This can happen if the election is
	// ended (EndElection) or deleted (DeleteElection)
	ElectionSelfDltNotify = "Candidate has been deleted"
)

Variables

This section is empty.

Functions

func DeleteCandidates

func DeleteCandidates(zkConn *zk.Conn, electionName string, doneNodePath string) error

DeleteCandidates will remove all the candidates for the provided electionName.

func DeleteElection

func DeleteElection(zkConn *zk.Conn, electionResource string) error

DeleteElection removes the election resource passed to NewElection.

Types

type Election

type Election struct {
	ElectionResource string // This is the Zookeeper node that represents the overall election. Candidates
	// contains filtered or unexported fields
}

Election is a structure that represents a new instance of an Election. This instance can then be used to request leadership for a specific resource.

func NewElection

func NewElection(zkConn *zk.Conn, electionResource string, clientName string) (*Election, error)

NewElection initializes a new instance of an Election that can later be used to request leadership for a specific resource.

It accepts: zkConn - a connection to a running Zookeeper instance; electionResource - the resource for which the election is being held, for example /election/president for a presidential election or /election/senator for a senatorial election. The clientName parameter is not described here; the README example passes "myhostname".

It will return either a non-nil Election instance and a nil error, or a nil Election and a non-nil error.

func (*Election) ElectLeader

func (le *Election) ElectLeader()

ElectLeader will make the caller a candidate for leadership and determine if the candidate becomes the leader.

ElectLeader has no return value. The outcome is reported on the channel returned by Status(): a status whose Role is Leader means the candidate was elected leader. Candidates that aren't elected leader will be placed in the pool of possible leaders (aka followers). Candidates must explicitly resign if they don't want to be considered for future leadership.

func (*Election) EndElection

func (le *Election) EndElection()

EndElection is called by the Client (leader) to signal that any work it was doing as a result of the election has completed.

Ending an election results in all followers being notified that the election is over. Followers are expected to resign from the election and move on to whatever they do when not actively involved in an Election. Ending an election also results in the freeing of all resources associated with an election.

func (*Election) Resign

func (le *Election) Resign()

Resign results in the resignation of the associated candidate from the election.

Resign is called by the Client, either in a leader or follower role, to indicate that it is no longer interested in being a party to the election.

If the candidate is the leader, then a new leader election will be triggered assuming there are other candidates. If the candidate is not the leader then Resign merely results in the removal of the associated candidate from the set of possible leaders.

Note that Resign, as declared above, has no return value, so a problem that prevents the complete resignation of the candidate is not reported to the caller through a return value.

func (*Election) Status

func (le *Election) Status() <-chan Status

Status returns a channel that is used by the library to provide election status updates to clients.

func (*Election) String

func (le *Election) String() string

String is the Stringer implementation for this type.

type Role

type Role int

Role represents whether the associated candidate is a Leader or Follower.
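
Because Role is an int, a status printed with default formatting shows the role as a number. A client that wants readable log output could map roles to names itself. The sketch below is illustrative only: `Role` and its constants are redeclared locally (and typed, unlike the untyped constants listed above), and `roleName` is a hypothetical helper, not something the package provides.

```go
package main

import "fmt"

// Role and its constants mirror the package's declarations so that this
// sketch runs on its own.
type Role int

const (
	Follower Role = iota
	Leader
)

// roleName returns a readable name for a Role and falls back to the
// numeric value for anything it does not recognize.
func roleName(r Role) string {
	switch r {
	case Follower:
		return "Follower"
	case Leader:
		return "Leader"
	default:
		return fmt.Sprintf("Role(%d)", int(r))
	}
}

func main() {
	fmt.Println(roleName(Follower)) // Follower
	fmt.Println(roleName(Leader))   // Leader
	fmt.Println(roleName(Role(7)))  // Role(7)
}
```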

type Status

type Status struct {
	// CandidateID is the identifier assigned to the associated Candidate for this Election.
	CandidateID string
	// Err is used to communicate the specific error associated with this Election.
	// nil means there is no Err.
	Err error
	// NowFollowing is the CandidateID of the candidate that this Candidate is following.
	// If the followed Candidate is the Leader, then this Candidate will become Leader
	// if the current Leader fails. It will be an empty string ("") if the associated
	// candidate is the Leader.
	NowFollowing string
	// Role is used to indicate if the associated Candidate is the Leader or a Follower.
	Role Role
	// WasFollowing is the CandidateID of the Candidate that this Candidate was following
	WasFollowing string
}

Status is used to communicate the current state of the Election.

func (Status) String

func (status Status) String() string

String returns a formatted string representation of Status.

Directories

Path Synopsis
leader_crash_test
The goal of this mock worker is to simulate the behavior of a client who is the leader crashing and a follower client picking up the same work.
