flashback

package module
v0.0.0-...-29326ed
Published: Dec 19, 2016 License: BSD-3-Clause Imports: 12 Imported by: 3

README

What is Flashback

How do you measure how good your MongoDB performance is (or that of another database with a similar interface)? Easy: you benchmark it. A common approach is to use a benchmark tool that generates queries with random contents under some random distribution.

But sometimes you are not satisfied with randomly generated queries, since you cannot be confident how closely they resemble your real workload.

The difficulty compounds when one MongoDB instance hosts completely different types of databases, each with its own unique and complicated access patterns.

That is why we came up with Flashback, a MongoDB benchmark framework that allows us to benchmark with "real" queries. It consists of a set of scripts that fall into two categories:

  1. record the operations (ops) that occur during a stretch of time;
  2. replay the recorded ops.

The two parts are not tied to each other and can be used independently for different purposes.

How it works

Record

How do you know which ops are performed by MongoDB? There are many ways to do this, but Flashback records ops by enabling MongoDB's profiling.

By setting the profile level to 2 (profile all ops), we can fetch op information detailed enough for future replay -- except for insert ops.

MongoDB does not log insertion details in the profile DB. However, if a MongoDB instance is working in a "replica set", we can capture insert information by reading the oplog.

Thus, we record the ops with the following steps:

  1. The script starts multiple threads to pull the profiling results and oplog entries for collections and databases that we are interested in. Each thread works independently.
  2. After fetching the entries, we'll merge the results from all sources to get a full picture of all operations.
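To make the oplog half of this flow concrete, here is a minimal Go sketch using the mgo driver. The connection URL and namespace filter are illustrative assumptions; the actual record script is written in Python and does more bookkeeping.

package main

import (
	"fmt"
	"time"

	mgo "gopkg.in/mgo.v2"
	"gopkg.in/mgo.v2/bson"
)

func main() {
	// Connect to a replica-set member; the oplog only exists in replica set mode.
	session, err := mgo.Dial("mongodb://localhost:27017")
	if err != nil {
		panic(err)
	}
	defer session.Close()

	// Tail the oplog for insert ("i") entries on one namespace (hypothetical).
	oplog := session.DB("local").C("oplog.rs")
	iter := oplog.Find(bson.M{"op": "i", "ns": "mydb.mycollection"}).Tail(5 * time.Second)

	var entry bson.M
	for iter.Next(&entry) {
		fmt.Printf("insert: %v\n", entry["o"]) // "o" holds the inserted document
	}
	iter.Close() // the loop exits when the tail times out or errors
}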

Replay

With the ops recorded, the replayer can replay them in different ways:

  • Replay ops with "best effort". The replayer sends these ops to the database as fast as possible, which helps us measure the database's limits. Note that to reduce loading overhead, we preload the ops into memory and replay them from there. This potentially limits the number of ops played back per session to the available memory on the replay host.
  • Replay ops in accordance with their original timestamps, which allows us to imitate regular traffic.

The replay module is written in Go because Python does not handle concurrent CPU-intensive tasks well.
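The building blocks used for replay are documented below (readers, dispatchers, an executor, and a stats analyzer). Here is a rough sketch of how they might fit together for a "best effort" replay; the file name, buffer sizes, and the empty opFilter are illustrative assumptions, not the flashback command itself:

package main

import (
	mgo "gopkg.in/mgo.v2"

	"github.com/ParsePlatform/flashback"
)

func main() {
	// Log file paths are hypothetical; see NewLogger below.
	logger, err := flashback.NewLogger("replay.out", "replay.err")
	if err != nil {
		panic(err)
	}
	defer logger.Close()

	// Note the (error, reader) return order of this constructor.
	err, reader := flashback.NewFileByLineOpsReader("ops.json", logger, "")
	if err != nil {
		panic(err)
	}
	defer reader.Close()

	session, err := mgo.Dial("mongodb://localhost:27017")
	if err != nil {
		panic(err)
	}
	defer session.Close()

	statsChan := make(chan flashback.OpStat, 1024)
	analyzer := flashback.NewStatsAnalyzer(statsChan) // assumed to drain statsChan itself
	executor := flashback.NewOpsExecutor(session, statsChan, logger)

	// "Best effort": preload up to opsSize ops and dispatch them as fast as
	// possible; the dispatcher is assumed to close the channel when done.
	for op := range flashback.NewBestEffortOpsDispatcher(reader, 100000, logger) {
		if op = flashback.CanonicalizeOp(op); op == nil {
			continue // assumption: nil signals an unsupported op type
		}
		if err := executor.Execute(op); err != nil {
			logger.Error(err)
		}
	}
	logger.Infof("ops executed: %d", analyzer.GetStatus().OpsExecuted)
}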

How to use it

Record

Prerequisites
  • The "record" module is written in python. You'll need to have pymongo, mongodb's python driver installed.
  • Set MongoDB profiling level to be 2, which captures all the ops.
  • Run MongoDB in a replica set mode (even there is only one node), which allows us to access the oplog.
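For reference, level-2 profiling can be enabled per database with MongoDB's profile command (db.setProfilingLevel(2) in the mongo shell). A minimal Go sketch with mgo; the database name is a placeholder:

package main

import (
	mgo "gopkg.in/mgo.v2"
	"gopkg.in/mgo.v2/bson"
)

func main() {
	session, err := mgo.Dial("mongodb://localhost:27017")
	if err != nil {
		panic(err)
	}
	defer session.Close()

	// Equivalent to db.setProfilingLevel(2): profile all ops on "mydb".
	var result bson.M
	if err := session.DB("mydb").Run(bson.D{{"profile", 2}}, &result); err != nil {
		panic(err)
	}
}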
Configuration
  • If you are a first-time user, run cp config.py.example config.py.
  • Modify config.py based on your needs. Here are some notes:
    • We intentionally separate the servers for oplog retrieval and profiling-results retrieval. As a good practice, it's better to pull the oplog from secondaries; profiling results, however, must be pulled from the primary.
    • duration_secs indicates the length of the recording in seconds.
Start Recording

After configuration, simply run python record.py.

Replay

Prerequisites
  • Go 1.4
  • PyMongo 2.9.x (earlier 2.x versions may work; 3.x does NOT currently work)
Installation
$ go get github.com/ParsePlatform/flashback/cmd/flashback
Command

Required options:

flashback \
    --style=[real|stress] \
    --ops_filename=<file_name> \ # Operations file, such as generated by the Record tool

To use a specific host/port and/or to use authentication, specify a mongodb:// url:

flashback \
    --url=mongodb://myuser:mypass@mongodb01.example.com:27017 \
    ...

For a full list of options:

flashback --help

Misc

pcap_converter

pcap_converter is an experimental way to build a recorded ops file from a pcap of mongo traffic.

Note: 'getmore' operations are not yet supported by pcap_converter

$ go get github.com/ParsePlatform/flashback/cmd/pcap_converter
$ tcpdump -i lo0 -w some_mongo_cap.pcap 'tcp and dst port 27017'
$ pcap_converter -f some_mongo_cap.pcap -o ops_filename.bson

Documentation

Index

Constants

const (
	P50 = iota
	P70 = iota
	P90 = iota
	P95 = iota
	P99 = iota
)

Percentiles

Variables

AllOpTypes specifies all supported op types

var (
	NotSupported = errors.New("op type not supported")
)

Functions

func GetElem

func GetElem(doc bson.D, key string) (interface{}, bool)

GetElem is a helper to fetch a specific key from a bson.D. The second return value indicates whether or not the key exists.
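A small usage sketch (the document contents are illustrative):

package main

import (
	"fmt"

	"github.com/ParsePlatform/flashback"
	"gopkg.in/mgo.v2/bson"
)

func main() {
	doc := bson.D{{"_id", 1}, {"status", "ok"}}
	if v, ok := flashback.GetElem(doc, "status"); ok {
		fmt.Println(v) // prints: ok
	}
	if _, ok := flashback.GetElem(doc, "missing"); !ok {
		fmt.Println("key not found")
	}
}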

func NewBestEffortOpsDispatcher

func NewBestEffortOpsDispatcher(reader OpsReader, opsSize int, logger *Logger) chan *Op

func NewByTimeOpsDispatcher

func NewByTimeOpsDispatcher(reader OpsReader, opsSize int, logger *Logger, speedup float64) chan *Op

Types

type ByLineOpsReader

type ByLineOpsReader struct {
	// contains filtered or unexported fields
}

ByLineOpsReader reads ops from a JSON file exported by Python's json_util module, where each line is a JSON-represented op.

Note: after parsing each JSON-represented op, we need to perform post-processing to convert some "metadata" into MongoDB-specific data structures, such as ObjectId and datetime.

func NewByLineOpsReader

func NewByLineOpsReader(reader io.ReadCloser, logger *Logger, opFilter string) (error, *ByLineOpsReader)

func NewFileByLineOpsReader

func NewFileByLineOpsReader(filename string, logger *Logger, opFilter string) (error, *ByLineOpsReader)

func (*ByLineOpsReader) AllLoaded

func (r *ByLineOpsReader) AllLoaded() bool

func (*ByLineOpsReader) Close

func (r *ByLineOpsReader) Close()

func (*ByLineOpsReader) Err

func (r *ByLineOpsReader) Err() error

func (*ByLineOpsReader) Next

func (r *ByLineOpsReader) Next() *Op

func (*ByLineOpsReader) OpsRead

func (r *ByLineOpsReader) OpsRead() int

func (*ByLineOpsReader) SetStartTime

func (r *ByLineOpsReader) SetStartTime(startTime int64) (int64, error)

func (*ByLineOpsReader) SkipOps

func (r *ByLineOpsReader) SkipOps(numSkipOps int) error

type CyclicOpsReader

type CyclicOpsReader struct {
	// contains filtered or unexported fields
}

func NewCyclicOpsReader

func NewCyclicOpsReader(maker func() OpsReader, logger *Logger) *CyclicOpsReader
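CyclicOpsReader has no doc comment here, but its constructor takes a maker function, which suggests it re-creates the underlying reader when one is exhausted. A hedged sketch of a maker that reopens the ops file each cycle (the looping behavior is our assumption):

package replaytools

import "github.com/ParsePlatform/flashback"

// newLoopingReader builds a CyclicOpsReader whose maker reopens the ops file
// each time, so a finite recording can drive a long-running stress test.
func newLoopingReader(filename string, logger *flashback.Logger) *flashback.CyclicOpsReader {
	maker := func() flashback.OpsReader {
		err, reader := flashback.NewFileByLineOpsReader(filename, logger, "")
		if err != nil {
			logger.Error(err)
			return nil
		}
		return reader
	}
	return flashback.NewCyclicOpsReader(maker, logger)
}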

func (*CyclicOpsReader) AllLoaded

func (c *CyclicOpsReader) AllLoaded() bool

func (*CyclicOpsReader) Close

func (c *CyclicOpsReader) Close()

func (*CyclicOpsReader) Err

func (c *CyclicOpsReader) Err() error

func (*CyclicOpsReader) Next

func (c *CyclicOpsReader) Next() *Op

func (*CyclicOpsReader) OpsRead

func (c *CyclicOpsReader) OpsRead() int

func (*CyclicOpsReader) SetStartTime

func (c *CyclicOpsReader) SetStartTime(startTime int64) (int64, error)

func (*CyclicOpsReader) SkipOps

func (c *CyclicOpsReader) SkipOps(numSkipOps int) error

type Document

type Document map[string]interface{}

Document represents the json-like information of an op

type ExecutionStatus

type ExecutionStatus struct {
	OpsExecuted         int64
	IntervalOpsExecuted int64
	OpsErrors           int64
	IntervalOpsErrors   int64
	OpsPerSec           float64
	IntervalOpsPerSec   float64
	IntervalDuration    time.Duration
	Latencies           map[OpType][]float64
	IntervalLatencies   map[OpType][]float64
	MaxLatency          map[OpType]float64
	IntervalMaxLatency  map[OpType]float64
	Counts              map[OpType]int64
	IntervalCounts      map[OpType]int64
	TypeOpsSec          map[OpType]float64
	IntervalTypeOpsSec  map[OpType]float64
}

ExecutionStatus encapsulates the aggregated information for the execution

type Logger

type Logger struct {
	// contains filtered or unexported fields
}

Logger provides a way to send different types of log messages to stderr/stdout

func NewLogger

func NewLogger(stdout string, stderr string) (logger *Logger, err error)

NewLogger creates a new logger

func (*Logger) Close

func (l *Logger) Close()

Close the underlying files

func (*Logger) Error

func (l *Logger) Error(v ...interface{})

Error prints message to stderr

func (*Logger) Errorf

func (l *Logger) Errorf(format string, v ...interface{})

Errorf prints message to stderr

func (*Logger) Info

func (l *Logger) Info(v ...interface{})

Info prints message to stdout

func (*Logger) Infof

func (l *Logger) Infof(format string, v ...interface{})

Infof prints message to stdout

type Op

type Op struct {
	Ns         string    `bson:"ns"`
	Timestamp  time.Time `bson:"ts"`
	Type       OpType    `bson:"op"`
	NToSkip    int64     `bson:"ntoskip,omitempty"`
	NToReturn  int64     `bson:"ntoreturn,omitempty"`
	QueryDoc   bson.D    `bson:"query,omitempty"`
	CommandDoc bson.D    `bson:"command,omitempty"`
	InsertDoc  bson.D    `bson:"o,omitempty"`
	UpdateDoc  bson.D    `bson:"updateobj,omitempty"`
	Database   string    `bson:",omitempty"`
	Collection string    `bson:",omitempty"`
}

Op represents an op generated by the record utility. It must (currently) be massaged a little before handing off to the executor.

func CanonicalizeOp

func CanonicalizeOp(op *Op) *Op

We only support a handful of op types. This function helps us process supported ops in a uniform way.

We do not canonicalize the ops in OpsReader because we want the ops reader to report the original ops faithfully and let the consumers of these ops decide how to process them further.

type OpStat

type OpStat struct {
	OpType  OpType
	Latency time.Duration
	OpError bool
}

type OpType

type OpType string

OpType is the name of a mongo op type

const (
	Insert        OpType = "insert"
	Update        OpType = "update"
	Remove        OpType = "remove"
	Query         OpType = "query"
	Command       OpType = "command"
	Count         OpType = "command.count"
	FindAndModify OpType = "command.findandmodify"
	GetMore       OpType = "getmore"
)

Contains a list of mongo op types

type OpsExecutor

type OpsExecutor struct {
	// contains filtered or unexported fields
}

func NewOpsExecutor

func NewOpsExecutor(session *mgo.Session, statsChan chan OpStat, logger *Logger) *OpsExecutor

func (*OpsExecutor) Execute

func (e *OpsExecutor) Execute(op *Op) error

func (*OpsExecutor) LastLatency

func (e *OpsExecutor) LastLatency() time.Duration

type OpsReader

type OpsReader interface {
	// Move to the next op and return it. Nil will be returned if the last op
	// has already been read, or if an error occurred.
	// TODO change from Document to Op
	Next() *Op

	// Allow skipping the first N ops in the source file
	SkipOps(int) error

	// Start at a specific time in the set of ops
	// Return an error if we get to EOF without finding an op
	// Can be used with SkipOps, but you should call SkipOps after SetStartTime
	SetStartTime(int64) (int64, error)

	// How many ops have been read so far
	OpsRead() int

	// Have all the ops been read?
	AllLoaded() bool

	// Indicates the latest error that occurred while reading ops.
	Err() error

	Close()
}

OpsReader reads ops from a source and presents an interface for consumers to fetch these ops sequentially.
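Per the comments above, SetStartTime should be called before SkipOps when starting mid-stream. A hedged sketch (the units of startTime are not documented; we assume a UNIX-epoch value):

package replaytools

import "github.com/ParsePlatform/flashback"

// positionReader seeks an OpsReader to a start time and then skips a further
// numSkip ops, following the call order documented in the interface.
func positionReader(reader flashback.OpsReader, startTime int64, numSkip int, logger *flashback.Logger) {
	if _, err := reader.SetStartTime(startTime); err != nil {
		logger.Error(err) // EOF reached without finding an op at/after startTime
	}
	if err := reader.SkipOps(numSkip); err != nil {
		logger.Error(err)
	}
}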

type StatsAnalyzer

type StatsAnalyzer struct {
	// contains filtered or unexported fields
}

func NewStatsAnalyzer

func NewStatsAnalyzer(statsChan chan OpStat) *StatsAnalyzer

func (*StatsAnalyzer) GetStatus

func (s *StatsAnalyzer) GetStatus() *ExecutionStatus
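One possible way to report progress during a replay is to poll GetStatus on an interval. In this sketch, the assumptions that latencies are in milliseconds and that the Latencies slices are indexed by the percentile constants (P50 ... P99) above are ours, not documented:

package replaytools

import (
	"time"

	"github.com/ParsePlatform/flashback"
)

// reportProgress logs a summary of execution stats every interval.
func reportProgress(analyzer *flashback.StatsAnalyzer, logger *flashback.Logger, interval time.Duration) {
	for range time.Tick(interval) {
		status := analyzer.GetStatus()
		logger.Infof("ops/sec: %.1f (interval: %.1f), errors: %d",
			status.OpsPerSec, status.IntervalOpsPerSec, status.OpsErrors)
		if lats, ok := status.Latencies[flashback.Query]; ok && len(lats) > flashback.P99 {
			logger.Infof("query p99: %.2f ms", lats[flashback.P99])
		}
	}
}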

Directories

Path: cmd/pcap_converter
Synopsis: Simple program which accepts a pcap file and prints a flashback-compatible ops stream
