go-vitess.v1: gopkg.in/src-d/go-vitess.v1/vt/vttablet/tabletserver/splitquery

package splitquery

import "gopkg.in/src-d/go-vitess.v1/vt/vttablet/tabletserver/splitquery"

Package splitquery contains the logic needed for implementing the tabletserver's SplitQuery RPC.

It defines the Splitter type that drives the query splitting procedure. It cooperates with the SplitParams type and the SplitAlgorithmInterface interface. See example_test.go for a usage example.

General guidelines for contributing to this package: 1) Error messages should not contain the "splitquery:" prefix. It will be added by the calling code in 'tabletserver'.

Code:

package main

import (
    "fmt"

    "gopkg.in/src-d/go-vitess.v1/sqltypes"
    "gopkg.in/src-d/go-vitess.v1/vt/sqlparser"
    "gopkg.in/src-d/go-vitess.v1/vt/vttablet/tabletserver/schema"

    querypb "gopkg.in/src-d/go-vitess.v1/vt/proto/query"
)

func main() {
    // 1. Create a SplitParams object.
    // There are two "constructors": NewSplitParamsGivenSplitCount and
    // NewSplitParamsGivenNumRowsPerQueryPart. They each take several parameters including a "schema"
    // object which should be a map[string]*schema.Table that maps a table name to its schema.Table
    // object. It is used for error-checking the split columns and their types. We use an empty
    // object for this toy example, but in real code this object must have correct entries.
    //
    // This schema is typically derived from tabletserver.TabletServer.qe.se.
    schema := map[string]*schema.Table{}
    splitParams, err := NewSplitParamsGivenSplitCount(
        &querypb.BoundQuery{
            Sql:           "SELECT * FROM table WHERE id > :id",
            BindVariables: map[string]*querypb.BindVariable{"id": sqltypes.Int64BindVariable(5)},
        },
        []sqlparser.ColIdent{
            sqlparser.NewColIdent("id"),
            sqlparser.NewColIdent("user_id"),
        },  // SplitColumns
        1000, // SplitCount
        schema)
    if err != nil {
        panic(fmt.Sprintf("NewSplitParamsGivenSplitCount failed with: %v", err))
    }

    // 2. Create the SplitAlgorithmInterface object used for splitting.
    // SplitQuery supports multiple algorithms for splitting the query. These are encapsulated as
    // types implementing the SplitAlgorithmInterface. Currently two algorithms are supported
    // represented by the FullScanAlgorithm and EqualSplitsAlgorithm types. See the documentation
    // of these types for more details on each algorithm.
    // To do the split we'll need to create an object of one of these types and pass it to the
    // Splitter (see below). Here we use the FullScan algorithm.
    // We also pass a type implementing the SQLExecuter interface that the algorithm will
    // use to send statements to MySQL.
    algorithm, err := NewFullScanAlgorithm(splitParams, getSQLExecuter())
    if err != nil {
        panic(fmt.Sprintf("NewFullScanAlgorithm failed with: %v", err))
    }

    // 3. Create a splitter object. Always succeeds.
    splitter := NewSplitter(splitParams, algorithm)

    // 4. Call splitter.Split() to split the query.
    // The result is a slice of *querypb.QuerySplit objects (and an error object).
    queryParts, err := splitter.Split()
    if err != nil {
        panic(fmt.Sprintf("splitter.Split() failed with: %v", err))
    }
    fmt.Println(queryParts)
}

func getSQLExecuter() SQLExecuter {
    // In real code, this should be an object implementing the SQLExecuter interface.
    return nil
}

Package Files

doc.go equal_splits_algorithm.go full_scan_algorithm.go split_algorithm_interface.go split_params.go splitter.go sql_executer_interface.go utils.go

type EqualSplitsAlgorithm

type EqualSplitsAlgorithm struct {
    // contains filtered or unexported fields
}

EqualSplitsAlgorithm implements the SplitAlgorithmInterface and represents the equal-splits algorithm for generating the boundary tuples. If this algorithm is used then SplitParams.split_columns must contain only one split_column. Additionally, the split_column must have numeric type (integral or floating point).

The algorithm works by issuing a query to the database to find the minimum and maximum elements of the split column in the table referenced by the given SQL query. Denote these by min and max, respectively. The algorithm then "splits" the interval [min, max] into SplitParams.split_count sub-intervals of equal length: [a_1, a_2], [a_2, a_3], ..., [a_{split_count}, a_{split_count+1}], where min = a_1 < a_2 < a_3 < ... < a_{split_count} < a_{split_count+1} = max. The boundary points returned by this algorithm are then a_2, a_3, ..., a_{split_count} (an empty list of boundary points is returned if split_count <= 1). If the type of the split column is integral, the boundary points are truncated to the integer part.
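The boundary arithmetic described above can be sketched as follows. This is a minimal illustration, not the package's implementation: the function name equalSplitBoundaries and its parameters are hypothetical, and the minimum and maximum are passed in directly instead of being fetched from MySQL.

```go
package main

import (
	"fmt"
	"math"
)

// equalSplitBoundaries (hypothetical name) returns the interior boundary
// points a_2, ..., a_{splitCount} of the interval [lo, hi] divided into
// splitCount sub-intervals of equal length. If integral is true, each
// boundary is truncated to its integer part.
func equalSplitBoundaries(lo, hi float64, splitCount int, integral bool) []float64 {
	if splitCount <= 1 {
		return []float64{} // no interior boundaries
	}
	width := (hi - lo) / float64(splitCount)
	boundaries := make([]float64, 0, splitCount-1)
	for i := 1; i < splitCount; i++ {
		b := lo + float64(i)*width
		if integral {
			b = math.Trunc(b)
		}
		boundaries = append(boundaries, b)
	}
	return boundaries
}

func main() {
	fmt.Println(equalSplitBoundaries(0, 100, 4, false)) // prints [25 50 75]
	fmt.Println(equalSplitBoundaries(1, 10, 4, true))   // prints [3 5 7]
}
```

With split_count = 4 there are three interior boundaries, which delimit four sub-intervals.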

func NewEqualSplitsAlgorithm

func NewEqualSplitsAlgorithm(splitParams *SplitParams, sqlExecuter SQLExecuter) (
    *EqualSplitsAlgorithm, error)

NewEqualSplitsAlgorithm constructs a new equal splits algorithm. It requires an SQLExecuter since it needs to execute a query to figure out the minimum and maximum elements in the table.

type FullScanAlgorithm

type FullScanAlgorithm struct {
    // contains filtered or unexported fields
}

FullScanAlgorithm implements the SplitAlgorithmInterface and represents the full-scan algorithm for generating the boundary tuples. The algorithm regards the table as ordered (in ascending order) by the split columns. It then returns boundary tuples from rows which are splitParams.numRowsPerQueryPart rows apart. More precisely, it works as follows: It iteratively executes the following query over the replica’s database (recall that MySQL performs tuple comparisons lexicographically):

SELECT <split_columns> FROM <table> FORCE INDEX (PRIMARY)
                       WHERE :prev_boundary <= (<split_columns>)
                       ORDER BY <split_columns>
                       LIMIT <num_rows_per_query_part>, 1

where <split_columns> denotes the ordered list of split columns and <table> is the value of the FROM clause. The 'prev_boundary' bind variable holds a tuple consisting of split column values. It is updated after each iteration with the result of the query. In the query executed in the first iteration (the initial query) the term ':prev_boundary <= (<split_columns>)' is omitted. The algorithm stops when the query returns no results. The result of this algorithm is the list consisting of the result of each query in order.
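The iteration described above can be simulated in memory. The sketch below is an illustration under simplifying assumptions, not the package's code: it uses a single integer split column with unique values, and the name fullScanBoundaries is hypothetical. Each loop step plays the role of one "LIMIT <num_rows_per_query_part>, 1" query.

```go
package main

import (
	"fmt"
	"sort"
)

// fullScanBoundaries (hypothetical name) simulates the full-scan iteration
// over an in-memory column of unique values. Each "query" skips
// numRowsPerQueryPart rows, starting at the previous boundary, and returns
// the next row.
func fullScanBoundaries(values []int, numRowsPerQueryPart int) []int {
	boundaries := []int{}
	if numRowsPerQueryPart <= 0 {
		return boundaries // guard: the loop below would never advance
	}
	sorted := append([]int(nil), values...)
	sort.Ints(sorted) // ORDER BY <split_columns>

	start := 0 // the initial query has no prev_boundary condition
	for {
		idx := start + numRowsPerQueryPart // LIMIT numRowsPerQueryPart, 1
		if idx >= len(sorted) {
			return boundaries // the query returned no rows: stop
		}
		boundaries = append(boundaries, sorted[idx])
		start = idx // next query: WHERE prev_boundary <= column
	}
}

func main() {
	values := []int{5, 3, 9, 1, 7, 2, 8, 10, 4, 6}
	fmt.Println(fullScanBoundaries(values, 3)) // prints [4 7 10]
}
```

For the ten values 1..10 and three rows per part, the boundaries fall at sorted offsets 3, 6 and 9.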

Actually, the code below differs slightly from the above description: the lexicographical tuple inequality in the query above is rewritten to use only scalar comparisons, since MySQL does not optimize queries involving tuple inequalities correctly. Instead of using a single tuple bind variable, 'prev_boundary', the code uses a list of scalar bind variables, one for each element of the tuple. The bind variable storing the tuple element corresponding to a split column named 'x' is called <prevBindVariablePrefix><x>, where prevBindVariablePrefix is the string constant defined below.
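One way to expand a lexicographic tuple inequality into scalar comparisons is sketched below. It is an illustration only and may differ from the expression the package actually generates. The function name scalarTupleGE is hypothetical, and examplePrefix is a placeholder, since the real value of prevBindVariablePrefix is not shown on this page.

```go
package main

import "fmt"

// examplePrefix is a placeholder for prevBindVariablePrefix, whose real
// value is not shown in this documentation.
const examplePrefix = "prev_"

// scalarTupleGE (hypothetical name) builds a scalar-only SQL expression
// equivalent to (:prev_c1, ..., :prev_cn) <= (c1, ..., cn) under
// lexicographic ordering.
func scalarTupleGE(columns []string) string {
	if len(columns) == 0 {
		return "TRUE" // the empty tuple compares equal to itself
	}
	// Start from the last column, which uses a non-strict comparison.
	last := columns[len(columns)-1]
	expr := fmt.Sprintf("%s >= :%s%s", last, examplePrefix, last)
	// Wrap the expression with each earlier column, innermost first.
	for i := len(columns) - 2; i >= 0; i-- {
		c := columns[i]
		bind := ":" + examplePrefix + c
		expr = fmt.Sprintf("%s > %s OR (%s = %s AND (%s))", c, bind, c, bind, expr)
	}
	return expr
}

func main() {
	fmt.Println(scalarTupleGE([]string{"id", "user_id"}))
	// prints id > :prev_id OR (id = :prev_id AND (user_id >= :prev_user_id))
}
```

A row passes the filter if its first column is strictly greater than the previous boundary's, or if the first columns are equal and the remaining columns satisfy the same condition recursively.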

func NewFullScanAlgorithm

func NewFullScanAlgorithm(
    splitParams *SplitParams, sqlExecuter SQLExecuter) (*FullScanAlgorithm, error)

NewFullScanAlgorithm constructs a new FullScanAlgorithm.

type SQLExecuter

type SQLExecuter interface {
    SQLExecute(sql string, bindVariables map[string]*querypb.BindVariable) (*sqltypes.Result, error)
}

SQLExecuter encapsulates access to the MySQL database for this package.
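Because the algorithms reach MySQL only through this interface, a test can substitute a fake. The sketch below shows one possible shape for such a fake. To stay self-contained it declares simplified stand-in types, BindVariable and Result, in place of querypb.BindVariable and sqltypes.Result, and the fakeExecuter type is hypothetical.

```go
package main

import "fmt"

// BindVariable and Result are simplified stand-ins for querypb.BindVariable
// and sqltypes.Result, so that this sketch compiles without Vitess.
type BindVariable struct{ Value string }
type Result struct{ Rows [][]string }

// SQLExecuter mirrors the shape of the interface above, using the stand-ins.
type SQLExecuter interface {
	SQLExecute(sql string, bindVariables map[string]*BindVariable) (*Result, error)
}

// fakeExecuter (hypothetical) records each statement it receives and
// replays canned results in order.
type fakeExecuter struct {
	results []*Result
	calls   []string
}

func (f *fakeExecuter) SQLExecute(sql string, _ map[string]*BindVariable) (*Result, error) {
	f.calls = append(f.calls, sql)
	if len(f.results) == 0 {
		return nil, fmt.Errorf("fakeExecuter: no canned result for %q", sql)
	}
	r := f.results[0]
	f.results = f.results[1:]
	return r, nil
}

func main() {
	fake := &fakeExecuter{results: []*Result{{Rows: [][]string{{"1", "100"}}}}}
	var exec SQLExecuter = fake
	res, err := exec.SQLExecute("SELECT MIN(id), MAX(id) FROM t", nil)
	if err != nil {
		panic(err)
	}
	fmt.Println(res.Rows, fake.calls)
}
```

In real code the fake would implement the package's own SQLExecuter with the Vitess types, and would be passed to NewFullScanAlgorithm or NewEqualSplitsAlgorithm.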

type SplitAlgorithmInterface

type SplitAlgorithmInterface interface {
    // contains filtered or unexported methods
}

SplitAlgorithmInterface defines the interface for a splitting algorithm.

type SplitParams

type SplitParams struct {
    // contains filtered or unexported fields
}

SplitParams stores the context for a splitquery computation. It is used by both the Splitter object and the SplitAlgorithm object and caches some data that is used by both.

func NewSplitParamsGivenNumRowsPerQueryPart

func NewSplitParamsGivenNumRowsPerQueryPart(
    query *querypb.BoundQuery,
    splitColumnNames []sqlparser.ColIdent,
    numRowsPerQueryPart int64,
    schema map[string]*schema.Table,
) (*SplitParams, error)

NewSplitParamsGivenNumRowsPerQueryPart returns a new SplitParams object to be used in a splitquery request in which the Vitess client specified a numRowsPerQueryPart parameter. See NewSplitParamsGivenSplitCount for the constructor to use if the Vitess client specified a splitCount parameter.

Parameters:

'query.Sql' is the SQL query to split. The query must satisfy the restrictions found in the documentation of the vtgate.SplitQueryRequest.query protocol buffer field.

'query.BindVariables' are the bind variables for the SQL query.

'splitColumnNames' should contain the names of split columns to use. These must adhere to the restrictions found in the documentation of the vtgate.SplitQueryRequest.split_column protocol buffer field. If splitColumnNames is empty, the split columns used are the primary key columns (in order).

'numRowsPerQueryPart' is the desired number of rows per query part returned. The actual number may be different depending on the split-algorithm used.

'schema' should map a table name to a schema.Table. It is used for looking up the split-column types and error checking.

func NewSplitParamsGivenSplitCount

func NewSplitParamsGivenSplitCount(
    query *querypb.BoundQuery,
    splitColumnNames []sqlparser.ColIdent,
    splitCount int64,
    schema map[string]*schema.Table,
) (*SplitParams, error)

NewSplitParamsGivenSplitCount returns a new SplitParams object to be used in a splitquery request in which the Vitess client specified a splitCount parameter. See NewSplitParamsGivenNumRowsPerQueryPart for the constructor to use if the Vitess client specified a numRowsPerQueryPart parameter.

Parameters:

'query.Sql' is the SQL query to split. The query must satisfy the restrictions found in the documentation of the vtgate.SplitQueryRequest.query protocol buffer field.

'query.BindVariables' are the bind variables for the SQL query.

'splitColumnNames' should contain the names of split columns to use. These must adhere to the restrictions found in the documentation of the vtgate.SplitQueryRequest.split_column protocol buffer field. If splitColumnNames is empty, the split columns used are the primary key columns (in order).

'splitCount' is the desired splitCount to use. The actual number may be different depending on the split-algorithm used.

'schema' should map a table name to a schema.Table. It is used for looking up the split-column types and error checking.

func (*SplitParams) GetSplitTableName

func (sp *SplitParams) GetSplitTableName() sqlparser.TableIdent

GetSplitTableName returns the name of the table to split.

type Splitter

type Splitter struct {
    // contains filtered or unexported fields
}

Splitter is used to drive the splitting procedure.

func NewSplitter

func NewSplitter(splitParams *SplitParams, algorithm SplitAlgorithmInterface) *Splitter

NewSplitter creates a new Splitter object.

func (*Splitter) Split

func (splitter *Splitter) Split() ([]*querypb.QuerySplit, error)

Split does the actual work of splitting the query. It returns a slice of *querypb.QuerySplit objects representing the query parts.
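This page does not spell out how boundary tuples become query parts. The sketch below shows one plausible mapping, assuming half-open ranges consistent with the ':prev_boundary <= (<split_columns>)' comparison used by the full-scan algorithm. It handles a single integer split column, and the name partConditions is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// partConditions (hypothetical name) turns k sorted boundary values into
// k+1 range conditions on a single column, using half-open ranges:
// col < b_1; b_1 <= col AND col < b_2; ...; b_k <= col.
// The half-open convention is an assumption made for this sketch.
func partConditions(column string, boundaries []int) []string {
	parts := make([]string, 0, len(boundaries)+1)
	for i := 0; i <= len(boundaries); i++ {
		var conds []string
		if i > 0 {
			conds = append(conds, fmt.Sprintf("%d <= %s", boundaries[i-1], column))
		}
		if i < len(boundaries) {
			conds = append(conds, fmt.Sprintf("%s < %d", column, boundaries[i]))
		}
		// With no boundaries there is one part with an empty condition,
		// which stands for the unrestricted original query.
		parts = append(parts, strings.Join(conds, " AND "))
	}
	return parts
}

func main() {
	for _, cond := range partConditions("id", []int{4, 7}) {
		fmt.Println(cond)
	}
}
```

Under this assumption, each condition would be combined with the original query's WHERE clause, so that every row lands in exactly one part.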

Package splitquery imports 10 packages and is imported by 1 package. Updated 2019-06-16.