cockroach: github.com/cockroachdb/cockroach/pkg/workload

package workload

import "github.com/cockroachdb/cockroach/pkg/workload"

Package workload provides an abstraction for generators of SQL query loads (and the requisite initial data), as well as tools for working with these generators.

Package Files

connection.go csv.go driver.go pgx_helpers.go sql_runner.go stats.go workload.go

Constants

const AutoStatsName = "__auto__"

AutoStatsName is copied from stats.AutoStatsName to avoid pulling in a dependency on sql/stats.

func ApproxDatumSize

func ApproxDatumSize(x interface{}) int64

ApproxDatumSize returns the canonical size of a datum as returned from a call to `Table.InitialRowFn`. NB: These datums end up getting serialized in different ways, which means there's no one size that will be correct for all of them.

func CSVMux

func CSVMux(metas []Meta) *http.ServeMux

CSVMux returns a mux over HTTP handlers for CSV data in all tables in the given generators.

func ColBatchToRows

func ColBatchToRows(cb coldata.Batch) [][]interface{}

ColBatchToRows materializes the columnar data in a coldata.Batch into rows.

func DistinctCount

func DistinctCount(rowCount, maxDistinctCount uint64) uint64

DistinctCount returns the expected number of distinct values in a column with rowCount rows, given that the values are chosen from maxDistinctCount possible values using uniform random sampling with replacement.

func HandleCSV

func HandleCSV(w http.ResponseWriter, req *http.Request, prefix string, meta Meta) error

HandleCSV configures a Generator with url params and outputs the data for a single Table as a CSV (optionally limiting the rows via `row-start` and `row-end` params). It is intended for use in implementing a `net/http.Handler`.

func NewCSVRowsReader

func NewCSVRowsReader(t Table, batchStart, batchEnd int) io.Reader

NewCSVRowsReader returns an io.Reader that outputs the initial data of the given table as CSVs. If batchEnd is the zero-value it defaults to the end of the table.

func Register

func Register(m Meta)

Register is a hook for init-time registration of Generator implementations. This allows only the necessary generators to be compiled into a given binary.

func SanitizeUrls

func SanitizeUrls(gen Generator, dbOverride string, urls []string) (string, error)

SanitizeUrls verifies that the given SQL connection strings have the correct SQL database set, rewriting them in place if necessary. The database name is returned.

func WriteCSVRows

func WriteCSVRows(
    ctx context.Context, w io.Writer, table Table, rowStart, rowEnd int, sizeBytesLimit int64,
) (rowBatchIdx int, err error)

WriteCSVRows writes the specified table rows as a CSV. If sizeBytesLimit is > 0, it will be used as an approximate upper bound for how much to write. The next rowStart is returned (i.e., the last row written + 1).

type BatchedTuples

type BatchedTuples struct {
    // NumBatches is the number of batches of tuples.
    NumBatches int
    // FillBatch is a function to deterministically compute a columnar-batch of
    // tuples given its index.
    //
    // To save allocations, the Vecs in the passed Batch are reused when possible,
    // so the results of this call are invalidated the next time the same Batch is
    // passed to FillBatch. Ditto the ByteAllocator, which can be reset in between
    // calls. If a caller needs the Batch and its contents to be long lived,
    // simply pass a new Batch to each call and don't reset the ByteAllocator.
    FillBatch func(int, coldata.Batch, *bufalloc.ByteAllocator)
}

BatchedTuples is a generic generator of tuples (SQL rows, PKs to split at, etc). Tuples are generated in batches of arbitrary size. Each batch has an index in `[0,NumBatches)` and a batch can be generated given only its index.

func Tuples

func Tuples(count int, fn func(int) []interface{}) BatchedTuples

Tuples is like TypedTuples except that it tries to guess the type of each datum. However, if the function ever returns nil for one of the datums, you need to use TypedTuples instead and specify the coltypes.

func TypedTuples

func TypedTuples(count int, colTypes []coltypes.T, fn func(int) []interface{}) BatchedTuples

TypedTuples returns a BatchedTuples where each batch has size 1. It's intended to be easier to use than directly specifying a BatchedTuples, but the tradeoff is some bit of performance. If colTypes is nil, an attempt is made to infer them.

func (BatchedTuples) BatchRows

func (b BatchedTuples) BatchRows(batchIdx int) [][]interface{}

BatchRows is a function to deterministically compute a row-batch of tuples given its index. BatchRows doesn't attempt any reuse and so is allocation heavy. In performance-critical code, FillBatch should be used directly, instead.

type ConnFlags

type ConnFlags struct {
    *pflag.FlagSet
    DBOverride  string
    Concurrency int
    // Method for issuing queries; see SQLRunner.
    Method string
}

ConnFlags is a helper bundling the common flags relevant to QueryLoads.

func NewConnFlags

func NewConnFlags(genFlags *Flags) *ConnFlags

NewConnFlags returns an initialized ConnFlags.

type FlagMeta

type FlagMeta struct {
    // RuntimeOnly may be set to true only if the corresponding flag has no
    // impact on the behavior of any Tables in this workload.
    RuntimeOnly bool
    // CheckConsistencyOnly is expected to be true only if the corresponding
    // flag only has an effect on the CheckConsistency hook.
    CheckConsistencyOnly bool
}

FlagMeta is metadata about a workload flag.

type Flags

type Flags struct {
    *pflag.FlagSet
    // Meta is keyed by flag name and may be nil if no metadata is needed.
    Meta map[string]FlagMeta
}

Flags is a container for flags and associated metadata.

type Flagser

type Flagser interface {
    Generator
    Flags() Flags
}

Flagser returns the flags this Generator is configured with. Any randomness in the Generator must be deterministic from these options so that table data initialization, query work, etc. can be distributed by sending only these flags.

type Generator

type Generator interface {
    // Meta returns meta information about this generator, including a name,
    // description, and a function to create instances of it.
    Meta() Meta

    // Tables returns the set of tables for this generator, including schemas
    // and initial data.
    Tables() []Table
}

Generator represents one or more SQL query loads and associated initial data.

func FromFlags

func FromFlags(meta Meta, flags ...string) Generator

FromFlags returns a new validated generator with the given flags. If anything goes wrong, it panics. FromFlags is intended for use with unit test helpers in individual generators, see its callers for examples.

type Hooks

type Hooks struct {
    // Validate is called after workload flags are parsed. It should return an
    // error if the workload configuration is invalid.
    Validate func() error
    // PreLoad is called after workload tables are created and before workload
    // data is loaded. It is not called when storing or loading a fixture.
    // Implementations should be idempotent.
    //
    // TODO(dan): Deprecate the PreLoad hook, it doesn't play well with fixtures.
    // It's only used in practice for zone configs, so it should be reasonably
    // straightforward to make zone configs first class citizens of
    // workload.Table.
    PreLoad func(*gosql.DB) error
    // PostLoad is called after workload tables are created and workload data
    // is loaded. It is also called after restoring a fixture. This, for
    // example, is where creating foreign keys should go. Implementations
    // should be idempotent.
    PostLoad func(*gosql.DB) error
    // PostRun is called after workload run has ended, with the duration of the
    // run. This is where any post-run special printing or validation can be done.
    PostRun func(time.Duration) error
    // CheckConsistency is called to run generator-specific consistency checks.
    // These are expected to pass after the initial data load as well as after
    // running queryload.
    CheckConsistency func(context.Context, *gosql.DB) error
    // Partition is used to run a partitioning step on the data created by the workload.
    // TODO (rohany): migrate existing partitioning steps (such as tpcc's) into here.
    Partition func(*gosql.DB) error
}

Hooks stores functions to be called at points in the workload lifecycle.

type Hookser

type Hookser interface {
    Generator
    Hooks() Hooks
}

Hookser returns any hooks associated with the generator.

type InitialDataLoader

type InitialDataLoader interface {
    InitialDataLoad(context.Context, *gosql.DB, Generator) (int64, error)
}

InitialDataLoader loads the initial data for all tables in a workload. It returns a measure of how many bytes were loaded.

TODO(dan): It would be lovely if the number of bytes loaded was comparable between implementations but this is sadly not the case right now.

var ImportDataLoader InitialDataLoader = requiresCCLBinaryDataLoader(`IMPORT`)

ImportDataLoader is a hook for binaries that include CCL code to inject an IMPORT-based InitialDataLoader implementation.

type JSONStatistic

type JSONStatistic struct {
    Name          string   `json:"name,omitempty"`
    CreatedAt     string   `json:"created_at"`
    Columns       []string `json:"columns"`
    RowCount      uint64   `json:"row_count"`
    DistinctCount uint64   `json:"distinct_count"`
    NullCount     uint64   `json:"null_count"`
}

JSONStatistic is copied from stats.JSONStatistic to avoid pulling in a dependency on sql/stats.

func MakeStat

func MakeStat(columns []string, rowCount, distinctCount, nullCount uint64) JSONStatistic

MakeStat returns a JSONStatistic given the column names, row count, distinct count, and null count.

type Meta

type Meta struct {
    // Name is a unique name for this generator.
    Name string
    // Description is a short description of this generator.
    Description string
    // Details optionally allows specifying longer, more in-depth usage details.
    Details string
    // Version is a semantic version for this generator. It should be bumped
    // whenever InitialRowFn or InitialRowCount change for any of the tables.
    Version string
    // PublicFacing indicates that this workload is also intended for use by
    // users doing their own testing and evaluations. This allows hiding workloads
    // that are only expected to be used in CockroachDB's internal development to
    // avoid confusion. Workloads setting this to true should pay added attention
    // to their documentation and help-text.
    PublicFacing bool
    // New returns an unconfigured instance of this generator.
    New func() Generator
}

Meta is used to register a Generator at init time and holds meta information about this generator, including a name, description, and a function to create instances of it.

func Get

func Get(name string) (Meta, error)

Get returns the registered Generator with the given name, if it exists.

func Registered

func Registered() []Meta

Registered returns all registered Generators.

type MultiConnPool

type MultiConnPool struct {
    Pools []*pgx.ConnPool
    // contains filtered or unexported fields
}

MultiConnPool maintains a set of pgx ConnPools (to different servers).

func NewMultiConnPool

func NewMultiConnPool(cfg MultiConnPoolCfg, urls ...string) (*MultiConnPool, error)

NewMultiConnPool creates a new MultiConnPool.

Each URL gets one or more pools, and each pool has at most MaxConnsPerPool connections.

The pools have approximately the same number of max connections, adding up to MaxTotalConnections.

func (*MultiConnPool) Close

func (m *MultiConnPool) Close()

Close closes all the pools.

func (*MultiConnPool) Get

func (m *MultiConnPool) Get() *pgx.ConnPool

Get returns one of the pools, in a round-robin manner.

func (*MultiConnPool) PrepareEx

func (m *MultiConnPool) PrepareEx(
    ctx context.Context, name, sql string, opts *pgx.PrepareExOptions,
) (*pgx.PreparedStatement, error)

PrepareEx prepares the given statement on all the pools.

type MultiConnPoolCfg

type MultiConnPoolCfg struct {
    // MaxTotalConnections is the total maximum number of connections across all
    // pools.
    MaxTotalConnections int

    // MaxConnsPerPool is the maximum number of connections in any single pool.
    // Limiting this is useful especially for prepared statements, which are
    // prepared on each connection inside a pool (serially).
    // If 0, there is no per-pool maximum (other than the total maximum number of
    // connections which still applies).
    MaxConnsPerPool int
}

MultiConnPoolCfg encapsulates the knobs passed to NewMultiConnPool.

type Opser

type Opser interface {
    Generator
    Ops(urls []string, reg *histogram.Registry) (QueryLoad, error)
}

Opser returns the work functions for this generator. The tables are required to have been created and initialized before running these.

type PgxTx

type PgxTx pgx.Tx

PgxTx is a thin wrapper that implements the crdb.Tx interface, allowing pgx transactions to be used with ExecuteInTx.

func (*PgxTx) Commit

func (tx *PgxTx) Commit() error

Commit is part of the crdb.Tx interface.

func (*PgxTx) ExecContext

func (tx *PgxTx) ExecContext(
    ctx context.Context, sql string, args ...interface{},
) (gosql.Result, error)

ExecContext is part of the crdb.Tx interface.

func (*PgxTx) Rollback

func (tx *PgxTx) Rollback() error

Rollback is part of the crdb.Tx interface.

type QueryLoad

type QueryLoad struct {
    SQLDatabase string

    // WorkerFns is one function per worker. It is to be called once per unit of
    // work to be done.
    WorkerFns []func(context.Context) error

    // Close, if set, is called before the process exits, giving workloads a
    // chance to print some information.
    // It's guaranteed that the ctx passed to WorkerFns (if they're still running)
    // has been canceled by the time this is called (so an implementer can
    // synchronize with the WorkerFns if need be).
    Close func(context.Context)

    // ResultHist is the name of the NamedHistogram to use for the benchmark
    // formatted results output at the end of `./workload run`. The empty string
    // will use the sum of all histograms.
    //
    // TODO(dan): This will go away once more of run.go moves inside Operations.
    ResultHist string
}

QueryLoad represents some SQL query workload performable on a database initialized with the requisite tables.

type SQLRunner

type SQLRunner struct {
    // contains filtered or unexported fields
}

SQLRunner is a helper for issuing SQL statements; it supports multiple methods for issuing queries.

Queries need to first be defined using calls to Define. Then the runner must be initialized, after which we can use the handles returned by Define.

Sample usage:

sr := &workload.SQLRunner{}

sel := sr.Define("SELECT x FROM t WHERE y = $1")
ins := sr.Define("INSERT INTO t(x, y) VALUES ($1, $2)")

err := sr.Init(ctx, "worker", mcp, flags)
// [handle err]

row := sel.QueryRow(ctx, 1)
// [use row]

_, err = ins.Exec(ctx, 5, 6)
// [handle err]

A runner should typically be associated with a single worker.

func (*SQLRunner) Define

func (sr *SQLRunner) Define(sql string) StmtHandle

Define creates a handle for the given statement. The handle can be used after Init is called.

func (*SQLRunner) Init

func (sr *SQLRunner) Init(
    ctx context.Context, name string, mcp *MultiConnPool, flags *ConnFlags,
) error

Init initializes the runner; must be called after calls to Define and before the StmtHandles are used.

The name is used for naming prepared statements. Multiple workers that use the same set of defined queries can and should use the same name.

The way we issue queries is set by flags.Method:

- "prepare": we prepare the query once during Init, then we reuse it for
  each execution. This results in a Bind and Execute on the server each time
  we run a query (on the given connection). Note that it's important to
  prepare on separate connections if there are many parallel workers; this
  avoids lock contention in the sql.Rows objects they produce. See #30811.

- "noprepare": each query is issued separately (on the given connection).
  This results in Parse, Bind, Execute on the server each time we run a
  query.

- "simple": each query is issued in a single string; parameters are
  rendered inside the string. This results in a single SimpleExecute
  request to the server for each query. Note that only a few parameter types
  are supported.

type StmtHandle

type StmtHandle struct {
    // contains filtered or unexported fields
}

StmtHandle is associated with a (possibly prepared) statement; created by SQLRunner.Define.

func (StmtHandle) Exec

func (h StmtHandle) Exec(ctx context.Context, args ...interface{}) (pgx.CommandTag, error)

Exec executes a query that doesn't return rows. The query is executed on the connection that was passed to SQLRunner.Init.

See pgx.Conn.Exec.

func (StmtHandle) ExecTx

func (h StmtHandle) ExecTx(
    ctx context.Context, tx *pgx.Tx, args ...interface{},
) (pgx.CommandTag, error)

ExecTx executes a query that doesn't return rows, inside a transaction.

See pgx.Conn.Exec.

func (StmtHandle) Query

func (h StmtHandle) Query(ctx context.Context, args ...interface{}) (*pgx.Rows, error)

Query executes a query that returns rows.

See pgx.Conn.Query.

func (StmtHandle) QueryRow

func (h StmtHandle) QueryRow(ctx context.Context, args ...interface{}) *pgx.Row

QueryRow executes a query that is expected to return at most one row.

See pgx.Conn.QueryRow.

func (StmtHandle) QueryRowTx

func (h StmtHandle) QueryRowTx(ctx context.Context, tx *pgx.Tx, args ...interface{}) *pgx.Row

QueryRowTx executes a query that is expected to return at most one row, inside a transaction.

See pgx.Conn.QueryRow.

func (StmtHandle) QueryTx

func (h StmtHandle) QueryTx(
    ctx context.Context, tx *pgx.Tx, args ...interface{},
) (*pgx.Rows, error)

QueryTx executes a query that returns rows, inside a transaction.

See pgx.Tx.Query.

type Table

type Table struct {
    // Name is the unqualified table name, pre-escaped for use directly in SQL.
    Name string
    // Schema is the SQL formatted schema for this table, with the `CREATE TABLE
    // <name>` prefix omitted.
    Schema string
    // InitialRows is the initial rows that will be present in the table after
    // setup is completed. Note that the default value of NumBatches (zero) is
    // special - such a Table will be skipped during `init`; non-zero NumBatches
    // with a nil FillBatch function will trigger an error during `init`.
    InitialRows BatchedTuples
    // Splits is the initial splits that will be present in the table after
    // setup is completed.
    Splits BatchedTuples
    // Stats is the pre-calculated set of statistics on this table. They can be
    // injected using `ALTER TABLE <name> INJECT STATISTICS ...`.
    Stats []JSONStatistic
}

Table represents a single table in a Generator. Included are a name, schema, and initial data.

Directories

bank
bulkingest: Package bulkingest defines a workload that is intended to stress some edge cases in our bulk-ingestion infrastructure.
cli
examples
faker
histogram
indexes
interleavedpartitioned
jsonload
kv
ledger
movr
querybench
querylog
queue
rand
sqlsmith
tpcc
tpccchecks
tpcds
tpch
workloadimpl: Package workloadimpl provides dependency-light helpers for implementing workload.Generators.
workloadsql
ycsb: Package ycsb is the workload specified by the Yahoo! Cloud Serving Benchmark.

Package workload imports 30 packages and is imported by 67 packages. Updated 2019-11-12.