Documentation ¶
Overview ¶
Package lunk provides a set of tools for structured logging in the style of Google's Dapper or Twitter's Zipkin.
The Context Of Structured Logging ¶
When we consider a complex event in a distributed system, we're actually considering a partially-ordered tree of events from various services, libraries, and modules.
Consider a user-initiated web request. Their browser sends an HTTP request to an edge server, which extracts the credentials (e.g., OAuth token) and authenticates the request by communicating with an internal authentication service, which returns a signed set of internal credentials (e.g., signed user ID). The edge web server then proxies the request to a cluster of web servers, each running a PHP application. The PHP application loads some data from several databases, places the user in a number of treatment groups for running A/B experiments, writes some data to a Dynamo-style distributed database, and returns an HTML response. The edge server receives this response and proxies it to the user's browser.
In this scenario we have a number of infrastructure-specific events:
- The edge server handled a request, which took 142ms and whose response had a status of "200 OK".
- The edge server sent a request to the authentication service, which took 5ms to handle and identified the principal as user 14002.
- The authentication service handled a request, which took 4ms to handle and was served entirely from memory.
- The edge server proxied a request to the app cluster, which took 132ms and whose response had a status of "200 OK".
- The app load balancer handled a request, which took 131ms and whose response had a status of "200 OK".
- The app load balancer proxied a request to the app, which took 130ms and was handled by app server 10.
- The app handled a request, which took 129ms, and was handled by PhotoController.
- The app sent a query to database A, which took 1ms.
- The app sent a query to database B, which took 53ms.
- The app rendered template "photo.tpl", which took 4ms.
- The app wrote a value to the distributed database, which took 10ms.
- The distributed database handled the write locally on one node, and proxied it to two others, which took 9ms.
- Those distributed database nodes concurrently handled the write locally, which took 4ms and 8ms.
This scenario also involves a number of events which have little to do with the infrastructure, but are still critical information for the business the system supports:
- The app gave the user the control treatment for experiment 15 ("Really Big Buttons v2").
- The app gave the user the experimental treatment for experiment 54 ("More Yelling v1").
- User 14002 viewed photo 1819 ("rude-puppy.gif").
There are a number of different teams all trying to monitor and improve aspects of this system. Operational staff need to know if a particular host or service is experiencing a latency spike or drop in throughput. Development staff need to know if their application's response times have gone down as a result of a recent deploy. Customer support staff need to know if the system is operating nominally as a whole, and for customers in particular. Product designers and managers need to know the effect of an A/B test on user behavior. But the fact that these teams will be consuming the data in different ways for different purposes does mean that they are working on different systems.
In order to instrument the various components of the system, we need a common data model.
Trees Of Events ¶
We adopt Dapper's notion of a tree to mean a partially-ordered tree of events from a distributed system. A tree in Lunk is identified by its root ID, which is the unique ID of its root event. All events in a common tree share a root ID. In our photo example, we would assign a unique root ID as soon as the edge server received the request.
Events inside a tree are causally ordered: each event has a unique ID, and an optional parent ID. By passing the IDs across systems, we establish causal ordering between events. In our photo example, the two database queries from the app would share the same parent ID--the ID of the event corresponding to the app handling the request which caused those queries.
Each event has a schema of properties, which allow us to record specific pieces of information about each event. For HTTP requests, we can record the method, the request URI, the elapsed time to handle the request, etc.
Event Aggregation ¶
Lunk is agnostic in terms of aggregation technologies, but two use cases seem clear: real-time process monitoring and offline causational analysis.
For real-time process monitoring, events can be streamed to a aggregation service like Riemann (http://riemann.io) or Storm (http://storm.incubator.apache.org), which can calculate process statistics (e.g., the 95th percentile latency for the edge server responses) in real-time. This allows for adaptive monitoring of all services, with the option of including example root IDs in the alerts (e.g., 95th percentile latency is over 300ms, mostly as a result of requests like those in tree XXXXX).
For offline causational analysis, events can be written in batches to batch processing systems like Hadoop or OLAP databases like Vertica. These aggregates can be queried to answer questions traditionally reserved for A/B testing systems. "Did users who were show the new navbar view more photos?" "Did the new image optimization algorithm we enabled for 1% of views run faster? Did it produce smaller images? Did it have any effect on user engagement?" "Did any services have increased exception rates after any recent deploys?" &tc &tc
Observing Specific Events ¶
By capturing the root ID of a particular web request, we can assemble a partially-ordered tree of events which were involved in the handling of that request. All events with a common root ID are in a common tree, which allows for O(M) retrieval for a tree of M events.
Sending And Receiving HTTP Requests ¶
To send a request with a root ID and a parent ID, use the Event-ID HTTP header:
GET /woo HTTP/1.1 Accept: application/json Event-ID: d6cb1d852bbf32b6/6eeee64a8ef56225
The header value is simply the root ID and event ID, hex-encoded and separated with a slash. If the event has a parent ID, that may be included as an optional third parameter. A server that receives a request with this header can use this to properly parent its own events.
Event Properties ¶
Each event has a set of named properties, the keys and values of which are strings. This allows aggregation layers to take advantage of simplifying assumptions and either store events in normalized form (with event data separate from property data) or in denormalized form (essentially pre-materializing an outer join of the normalized relations). Durations are always recorded as fractional milliseconds.
Log Formats ¶
Lunk currently provides two formats for log entries: text and JSON. Text-based logs encode each entry as a single line of text, using key="value" formatting for all properties. Event property keys are scoped to avoid collisions. JSON logs encode each entry as a single JSON object.
Example ¶
l := NewJSONEventLogger(os.Stdout) rootID := NewRootEventID() l.Log(rootID, Message("root action")) subID := NewEventID(rootID) l.Log(subID, Message("sub action")) leafID := NewEventID(subID) l.Log(leafID, Message("leaf action")) // Produces something like this: // { // "properties": { // "msg": "root action" // }, // "pid": 44345, // "host": "server1.example.com", // "time": "2014-04-28T13:58:32.201883418-07:00", // "id": "09c84ee90e7d9b74", // "root": "ca2e3c0fdfcf3f5e", // "schema": "message" // } // { // "properties": { // "msg": "sub action" // }, // "pid": 44345, // "host": "server1.example.com", // "time": "2014-04-28T13:58:32.202241745-07:00", // "parent": "09c84ee90e7d9b74", // "id": "794f8bde67a7f1a7", // "root": "ca2e3c0fdfcf3f5e", // "schema": "message" // } // { // "properties": { // "msg": "leaf action" // }, // "pid": 44345, // "host": "server1.example.com", // "time": "2014-04-28T13:58:32.202257354-07:00", // "parent": "794f8bde67a7f1a7", // "id": "33cff19e8bfb7cef", // "root": "ca2e3c0fdfcf3f5e", // "schema": "message" // }
Output:
Index ¶
- Constants
- Variables
- type Entry
- type EntryRecorder
- type Event
- type EventID
- type EventLogger
- type ID
- type SamplingEventLogger
- func (l SamplingEventLogger) Log(id EventID, e Event)
- func (l SamplingEventLogger) SetRootSampleRate(root ID, p float64)
- func (l SamplingEventLogger) SetSchemaSampleRate(schema string, p float64)
- func (l SamplingEventLogger) UnsetRootSampleRate(root ID)
- func (l SamplingEventLogger) UnsetSchemaSampleRate(schema string)
Examples ¶
Constants ¶
const ( // EventIDDelimiter is the delimiter used to concatenate an EventID's // components. EventIDDelimiter = "/" )
Variables ¶
var ( // NormalizedEventHeaders are the set of headers used for storing events in // normalized CSV files. NormalizedEventHeaders = []string{ "root", "id", "parent", "schema", "time", "host", "pid", "deploy", } // NormalizedPropertyHeaders are the set of headers used for storing // properties in normalized CSV files. NormalizedPropertyHeaders = []string{ "root", "id", "parent", "prop_name", "prop_value", } // DenormalizedEventHeaders are the set of headers used for storing events // in denormalized CSV files. DenormalizedEventHeaders = []string{ "root", "id", "parent", "schema", "time", "host", "pid", "deploy", "prop_name", "prop_value", } )
var ( // ErrBadEventID is returned when the event ID cannot be parsed. ErrBadEventID = errors.New("bad event ID") )
Functions ¶
This section is empty.
Types ¶
type Entry ¶
type Entry struct { EventID // Schema is the schema of the event. Schema string `json:"schema"` // Time is the timestamp of the event. Time time.Time `json:"time"` // Host is the name of the host on which the event occurred. Host string `json:"host,omitempty"` // Deploy is the ID of the deployed artifact, read from the DEPLOY // environment variable on startup. Deploy string `json:"deploy,omitempty"` // PID is the process ID which generated the event. PID int `json:"pid"` // Properties are the flattened event properties. Properties map[string]string `json:"properties"` }
An Entry is the combination of an event and its metadata.
type EntryRecorder ¶
An EntryRecorder records entries, e.g. to a streaming processing system or an OLAP database.
func NewDenormalizedCSVEntryRecorder ¶
func NewDenormalizedCSVEntryRecorder(w *csv.Writer) EntryRecorder
NewDenormalizedCSVEntryRecorder returns an EntryRecorder which writes events and their properties to a single CSV file, duplicating event data when necessary.
func NewNormalizedCSVEntryRecorder ¶
func NewNormalizedCSVEntryRecorder(events, props *csv.Writer) EntryRecorder
NewNormalizedCSVEntryRecorder returns an EntryRecorder which writes events to one CSV file and properties to another.
type Event ¶
type Event interface { // Schema returns the schema of the event. This should be constant. Schema() string }
An Event is a record of the occurrence of something.
type EventID ¶
type EventID struct { // Root is the root ID of the tree which contains all of the events related // to this one. Root ID `json:"root"` // ID is an ID uniquely identifying the event. ID ID `json:"id"` // Parent is the ID of the parent event, if any. Parent ID `json:"parent,omitempty"` }
EventID is the ID of an event, its parent event, and its root event.
func NewEventID ¶
NewEventID returns a new ID for an event which is the child of the given parent ID. This should be used to track causal relationships between events.
func NewRootEventID ¶
func NewRootEventID() EventID
NewRootEventID generates a new event ID for a root event. This should only be used to generate entries for events caused exclusively by events which are outside of your system as a whole (e.g., a root event for the first time you see a user request).
func ParseEventID ¶
ParseEventID parses the given string as a slash-separated set of parameters.
func (EventID) Format ¶
Format formats according to a format specifier and returns the resulting string. The receiver's string representation is the first argument.
Example ¶
// Assume we're connected to a database. var ( event EventID db *sql.DB userID int ) // This passes the root ID and the parent event ID to the database, which // allows us to correlate, for example, slow queries with the web requests // which caused them. query := event.Format(`/* %s/%s */ %s`, `SELECT email FROM users WHERE id = ?`) r := db.QueryRow(query, userID) if r == nil { panic("user not found") } var email string if err := r.Scan(&email); err != nil { panic("couldn't read email") } fmt.Printf("User's email: %s\n", email)
Output:
type EventLogger ¶
type EventLogger interface { // Log adds the given event to the log stream. Log(id EventID, e Event) }
An EventLogger logs events and their metadata.
func NewJSONEventLogger ¶
func NewJSONEventLogger(w io.Writer) EventLogger
NewJSONEventLogger returns an EventLogger which writes entries as streaming JSON to the given writer.
func NewTextEventLogger ¶
func NewTextEventLogger(w io.Writer) EventLogger
NewTextEventLogger returns an EventLogger which writes entries as single lines of attr="value" formatted text.
type ID ¶
type ID uint64
An ID is a unique, uniformly distributed 64-bit ID.
func (ID) MarshalJSON ¶
MarshalJSON encodes the ID as a hex string.
func (*ID) UnmarshalJSON ¶
UnmarshalJSON decodes the given data as either a hexadecimal string or JSON integer.
type SamplingEventLogger ¶
type SamplingEventLogger struct {
// contains filtered or unexported fields
}
A SamplingEventLogger logs a uniform sampling of events, if configured to do so.
func NewSamplingEventLogger ¶
func NewSamplingEventLogger(l EventLogger) *SamplingEventLogger
NewSamplingEventLogger returns a new SamplingEventLogger, passing events through to the given EventLogger.
func (SamplingEventLogger) Log ¶
func (l SamplingEventLogger) Log(id EventID, e Event)
Log passes the event to the underlying EventLogger, probabilistically dropping some events.
func (SamplingEventLogger) SetRootSampleRate ¶
func (l SamplingEventLogger) SetRootSampleRate(root ID, p float64)
SetRootSampleRate sets the sampling rate for all events with the given root ID. p should be between 0.0 (no events logged) and 1.0 (all events logged), inclusive.
func (SamplingEventLogger) SetSchemaSampleRate ¶
func (l SamplingEventLogger) SetSchemaSampleRate(schema string, p float64)
SetSchemaSampleRate sets the sampling rate for all events with the given schema. p should be between 0.0 (no events logged) and 1.0 (all events logged), inclusive.
func (SamplingEventLogger) UnsetRootSampleRate ¶
func (l SamplingEventLogger) UnsetRootSampleRate(root ID)
UnsetRootSampleRate removes any settings for events with the given root ID.
func (SamplingEventLogger) UnsetSchemaSampleRate ¶
func (l SamplingEventLogger) UnsetSchemaSampleRate(schema string)
UnsetSchemaSampleRate removes any settings for events with the given root ID.