lunk

package module
v0.0.0-...-3bb9f3b Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Nov 20, 2014 License: MIT Imports: 17 Imported by: 0

Documentation

Overview

Package lunk provides a set of tools for structured logging in the style of Google's Dapper or Twitter's Zipkin.

The Context Of Structured Logging

When we consider a complex event in a distributed system, we're actually considering a partially-ordered tree of events from various services, libraries, and modules.

Consider a user-initiated web request. Their browser sends an HTTP request to an edge server, which extracts the credentials (e.g., OAuth token) and authenticates the request by communicating with an internal authentication service, which returns a signed set of internal credentials (e.g., signed user ID). The edge web server then proxies the request to a cluster of web servers, each running a PHP application. The PHP application loads some data from several databases, places the user in a number of treatment groups for running A/B experiments, writes some data to a Dynamo-style distributed database, and returns an HTML response. The edge server receives this response and proxies it to the user's browser.

In this scenario we have a number of infrastructure-specific events:

  1. The edge server handled a request, which took 142ms and whose response had a status of "200 OK".
  2. The edge server sent a request to the authentication service, which took 5ms to handle and identified the principal as user 14002.
  3. The authentication service handled a request, which took 4ms to handle and was served entirely from memory.
  4. The edge server proxied a request to the app cluster, which took 132ms and whose response had a status of "200 OK".
  5. The app load balancer handled a request, which took 131ms and whose response had a status of "200 OK".
  6. The app load balancer proxied a request to the app, which took 130ms and was handled by app server 10.
  7. The app handled a request, which took 129ms, and was handled by PhotoController.
  8. The app sent a query to database A, which took 1ms.
  9. The app sent a query to database B, which took 53ms.
  10. The app rendered template "photo.tpl", which took 4ms.
  11. The app wrote a value to the distributed database, which took 10ms.
  12. The distributed database handled the write locally on one node, and proxied it to two others, which took 9ms.
  13. Those distributed database nodes concurrently handled the write locally, which took 4ms and 8ms.

This scenario also involves a number of events which have little to do with the infrastructure, but are still critical information for the business the system supports:

  1. The app gave the user the control treatment for experiment 15 ("Really Big Buttons v2").
  2. The app gave the user the experimental treatment for experiment 54 ("More Yelling v1").
  3. User 14002 viewed photo 1819 ("rude-puppy.gif").

There are a number of different teams all trying to monitor and improve aspects of this system. Operational staff need to know if a particular host or service is experiencing a latency spike or drop in throughput. Development staff need to know if their application's response times have gone down as a result of a recent deploy. Customer support staff need to know if the system is operating nominally as a whole, and for customers in particular. Product designers and managers need to know the effect of an A/B test on user behavior. But the fact that these teams will be consuming the data in different ways for different purposes does mean that they are working on different systems.

In order to instrument the various components of the system, we need a common data model.

Trees Of Events

We adopt Dapper's notion of a tree to mean a partially-ordered tree of events from a distributed system. A tree in Lunk is identified by its root ID, which is the unique ID of its root event. All events in a common tree share a root ID. In our photo example, we would assign a unique root ID as soon as the edge server received the request.

Events inside a tree are causally ordered: each event has a unique ID, and an optional parent ID. By passing the IDs across systems, we establish causal ordering between events. In our photo example, the two database queries from the app would share the same parent ID--the ID of the event corresponding to the app handling the request which caused those queries.

Each event has a schema of properties, which allow us to record specific pieces of information about each event. For HTTP requests, we can record the method, the request URI, the elapsed time to handle the request, etc.

Event Aggregation

Lunk is agnostic in terms of aggregation technologies, but two use cases seem clear: real-time process monitoring and offline causational analysis.

For real-time process monitoring, events can be streamed to a aggregation service like Riemann (http://riemann.io) or Storm (http://storm.incubator.apache.org), which can calculate process statistics (e.g., the 95th percentile latency for the edge server responses) in real-time. This allows for adaptive monitoring of all services, with the option of including example root IDs in the alerts (e.g., 95th percentile latency is over 300ms, mostly as a result of requests like those in tree XXXXX).

For offline causational analysis, events can be written in batches to batch processing systems like Hadoop or OLAP databases like Vertica. These aggregates can be queried to answer questions traditionally reserved for A/B testing systems. "Did users who were show the new navbar view more photos?" "Did the new image optimization algorithm we enabled for 1% of views run faster? Did it produce smaller images? Did it have any effect on user engagement?" "Did any services have increased exception rates after any recent deploys?" &tc &tc

Observing Specific Events

By capturing the root ID of a particular web request, we can assemble a partially-ordered tree of events which were involved in the handling of that request. All events with a common root ID are in a common tree, which allows for O(M) retrieval for a tree of M events.

Sending And Receiving HTTP Requests

To send a request with a root ID and a parent ID, use the Event-ID HTTP header:

GET /woo HTTP/1.1
Accept: application/json
Event-ID: d6cb1d852bbf32b6/6eeee64a8ef56225

The header value is simply the root ID and event ID, hex-encoded and separated with a slash. If the event has a parent ID, that may be included as an optional third parameter. A server that receives a request with this header can use this to properly parent its own events.

Event Properties

Each event has a set of named properties, the keys and values of which are strings. This allows aggregation layers to take advantage of simplifying assumptions and either store events in normalized form (with event data separate from property data) or in denormalized form (essentially pre-materializing an outer join of the normalized relations). Durations are always recorded as fractional milliseconds.

Log Formats

Lunk currently provides two formats for log entries: text and JSON. Text-based logs encode each entry as a single line of text, using key="value" formatting for all properties. Event property keys are scoped to avoid collisions. JSON logs encode each entry as a single JSON object.

Example
l := NewJSONEventLogger(os.Stdout)

rootID := NewRootEventID()
l.Log(rootID, Message("root action"))

subID := NewEventID(rootID)
l.Log(subID, Message("sub action"))

leafID := NewEventID(subID)
l.Log(leafID, Message("leaf action"))

// Produces something like this:
// {
//     "properties": {
//         "msg": "root action"
//     },
//     "pid": 44345,
//     "host": "server1.example.com",
//     "time": "2014-04-28T13:58:32.201883418-07:00",
//     "id": "09c84ee90e7d9b74",
//     "root": "ca2e3c0fdfcf3f5e",
//     "schema": "message"
// }
// {
//     "properties": {
//         "msg": "sub action"
//     },
//     "pid": 44345,
//     "host": "server1.example.com",
//     "time": "2014-04-28T13:58:32.202241745-07:00",
//     "parent": "09c84ee90e7d9b74",
//     "id": "794f8bde67a7f1a7",
//     "root": "ca2e3c0fdfcf3f5e",
//     "schema": "message"
// }
// {
//     "properties": {
//         "msg": "leaf action"
//     },
//     "pid": 44345,
//     "host": "server1.example.com",
//     "time": "2014-04-28T13:58:32.202257354-07:00",
//     "parent": "794f8bde67a7f1a7",
//     "id": "33cff19e8bfb7cef",
//     "root": "ca2e3c0fdfcf3f5e",
//     "schema": "message"
// }
Output:

Index

Examples

Constants

View Source
const (
	// EventIDDelimiter is the delimiter used to concatenate an EventID's
	// components.
	EventIDDelimiter = "/"
)

Variables

View Source
var (
	// NormalizedEventHeaders are the set of headers used for storing events in
	// normalized CSV files.
	NormalizedEventHeaders = []string{
		"root",
		"id",
		"parent",
		"schema",
		"time",
		"host",
		"pid",
		"deploy",
	}

	// NormalizedPropertyHeaders are the set of headers used for storing
	// properties in normalized CSV files.
	NormalizedPropertyHeaders = []string{
		"root",
		"id",
		"parent",
		"prop_name",
		"prop_value",
	}

	// DenormalizedEventHeaders are the set of headers used for storing events
	// in denormalized CSV files.
	DenormalizedEventHeaders = []string{
		"root",
		"id",
		"parent",
		"schema",
		"time",
		"host",
		"pid",
		"deploy",
		"prop_name",
		"prop_value",
	}
)
View Source
var (
	// ErrBadEventID is returned when the event ID cannot be parsed.
	ErrBadEventID = errors.New("bad event ID")
)

Functions

This section is empty.

Types

type Entry

type Entry struct {
	EventID

	// Schema is the schema of the event.
	Schema string `json:"schema"`

	// Time is the timestamp of the event.
	Time time.Time `json:"time"`

	// Host is the name of the host on which the event occurred.
	Host string `json:"host,omitempty"`

	// Deploy is the ID of the deployed artifact, read from the DEPLOY
	// environment variable on startup.
	Deploy string `json:"deploy,omitempty"`

	// PID is the process ID which generated the event.
	PID int `json:"pid"`

	// Properties are the flattened event properties.
	Properties map[string]string `json:"properties"`
}

An Entry is the combination of an event and its metadata.

func NewEntry

func NewEntry(id EventID, e Event) Entry

NewEntry creates a new entry for the given ID and event.

type EntryRecorder

type EntryRecorder interface {
	Record(Entry) error
}

An EntryRecorder records entries, e.g. to a streaming processing system or an OLAP database.

func NewDenormalizedCSVEntryRecorder

func NewDenormalizedCSVEntryRecorder(w *csv.Writer) EntryRecorder

NewDenormalizedCSVEntryRecorder returns an EntryRecorder which writes events and their properties to a single CSV file, duplicating event data when necessary.

func NewNormalizedCSVEntryRecorder

func NewNormalizedCSVEntryRecorder(events, props *csv.Writer) EntryRecorder

NewNormalizedCSVEntryRecorder returns an EntryRecorder which writes events to one CSV file and properties to another.

type Event

type Event interface {
	// Schema returns the schema of the event. This should be constant.
	Schema() string
}

An Event is a record of the occurrence of something.

func Message

func Message(msg string) Event

Message returns an Event which contains only a human-readable message.

type EventID

type EventID struct {
	// Root is the root ID of the tree which contains all of the events related
	// to this one.
	Root ID `json:"root"`

	// ID is an ID uniquely identifying the event.
	ID ID `json:"id"`

	// Parent is the ID of the parent event, if any.
	Parent ID `json:"parent,omitempty"`
}

EventID is the ID of an event, its parent event, and its root event.

func NewEventID

func NewEventID(parent EventID) EventID

NewEventID returns a new ID for an event which is the child of the given parent ID. This should be used to track causal relationships between events.

func NewRootEventID

func NewRootEventID() EventID

NewRootEventID generates a new event ID for a root event. This should only be used to generate entries for events caused exclusively by events which are outside of your system as a whole (e.g., a root event for the first time you see a user request).

func ParseEventID

func ParseEventID(s string) (*EventID, error)

ParseEventID parses the given string as a slash-separated set of parameters.

func (EventID) Format

func (id EventID) Format(s string, args ...interface{}) string

Format formats according to a format specifier and returns the resulting string. The receiver's string representation is the first argument.

Example
// Assume we're connected to a database.
var (
	event  EventID
	db     *sql.DB
	userID int
)

// This passes the root ID and the parent event ID to the database, which
// allows us to correlate, for example, slow queries with the web requests
// which caused them.
query := event.Format(`/* %s/%s */ %s`, `SELECT email FROM users WHERE id = ?`)
r := db.QueryRow(query, userID)
if r == nil {
	panic("user not found")
}

var email string
if err := r.Scan(&email); err != nil {
	panic("couldn't read email")
}

fmt.Printf("User's email: %s\n", email)
Output:

func (EventID) String

func (id EventID) String() string

String returns the EventID as a slash-separated, set of hex-encoded parameters (root, ID, parent). If the EventID has no parent, that value is elided.

type EventLogger

type EventLogger interface {
	// Log adds the given event to the log stream.
	Log(id EventID, e Event)
}

An EventLogger logs events and their metadata.

func NewJSONEventLogger

func NewJSONEventLogger(w io.Writer) EventLogger

NewJSONEventLogger returns an EventLogger which writes entries as streaming JSON to the given writer.

func NewTextEventLogger

func NewTextEventLogger(w io.Writer) EventLogger

NewTextEventLogger returns an EventLogger which writes entries as single lines of attr="value" formatted text.

type ID

type ID uint64

An ID is a unique, uniformly distributed 64-bit ID.

func ParseID

func ParseID(s string) (ID, error)

ParseID parses the given string as a hexadecimal string.

func (ID) MarshalJSON

func (id ID) MarshalJSON() ([]byte, error)

MarshalJSON encodes the ID as a hex string.

func (ID) String

func (id ID) String() string

String returns the ID as a hex string.

func (*ID) UnmarshalJSON

func (id *ID) UnmarshalJSON(data []byte) error

UnmarshalJSON decodes the given data as either a hexadecimal string or JSON integer.

type SamplingEventLogger

type SamplingEventLogger struct {
	// contains filtered or unexported fields
}

A SamplingEventLogger logs a uniform sampling of events, if configured to do so.

func NewSamplingEventLogger

func NewSamplingEventLogger(l EventLogger) *SamplingEventLogger

NewSamplingEventLogger returns a new SamplingEventLogger, passing events through to the given EventLogger.

func (SamplingEventLogger) Log

func (l SamplingEventLogger) Log(id EventID, e Event)

Log passes the event to the underlying EventLogger, probabilistically dropping some events.

func (SamplingEventLogger) SetRootSampleRate

func (l SamplingEventLogger) SetRootSampleRate(root ID, p float64)

SetRootSampleRate sets the sampling rate for all events with the given root ID. p should be between 0.0 (no events logged) and 1.0 (all events logged), inclusive.

func (SamplingEventLogger) SetSchemaSampleRate

func (l SamplingEventLogger) SetSchemaSampleRate(schema string, p float64)

SetSchemaSampleRate sets the sampling rate for all events with the given schema. p should be between 0.0 (no events logged) and 1.0 (all events logged), inclusive.

func (SamplingEventLogger) UnsetRootSampleRate

func (l SamplingEventLogger) UnsetRootSampleRate(root ID)

UnsetRootSampleRate removes any settings for events with the given root ID.

func (SamplingEventLogger) UnsetSchemaSampleRate

func (l SamplingEventLogger) UnsetSchemaSampleRate(schema string)

UnsetSchemaSampleRate removes any settings for events with the given root ID.

Directories

Path Synopsis

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL