cockroach: github.com/cockroachdb/cockroach/pkg/sql/opt/xform

package xform

import "github.com/cockroachdb/cockroach/pkg/sql/opt/xform"

Package Files

coster.go custom_funcs.go explorer.go index_scan_builder.go interesting_orderings.go join_order_builder.go memo_format.go optimizer.go physical_props.go scan_index_iter.go

func BuildChildPhysicalProps

func BuildChildPhysicalProps(
    mem *memo.Memo, parent memo.RelExpr, nth int, parentProps *physical.Required,
) *physical.Required

BuildChildPhysicalProps returns the set of physical properties required of the nth child, based upon the properties required of the parent. For example, the Project operator passes through any ordering requirement to its child, but provides any presentation requirement.

The childProps argument is allocated once by the caller and can be reused repeatedly as physical properties are derived for each child. On each call, BuildChildPhysicalProps updates the childProps argument.

func BuildChildPhysicalPropsScalar

func BuildChildPhysicalPropsScalar(mem *memo.Memo, parent opt.Expr, nth int) *physical.Required

BuildChildPhysicalPropsScalar is like BuildChildPhysicalProps, but for when the parent is a scalar expression.

func CanProvidePhysicalProps

func CanProvidePhysicalProps(e memo.RelExpr, required *physical.Required) bool

CanProvidePhysicalProps returns true if the given expression can provide the required physical properties. The optimizer uses this to determine whether an expression provides a required physical property; if it does not, then the optimizer inserts an enforcer operator that is able to provide it.

Some operators, like Select and Project, may not directly provide a required physical property, but do "pass through" the requirement to their input. Operators that do this should return true from the appropriate canProvide method and then pass through that property in the buildChildPhysicalProps method.
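This provide-or-enforce decision can be sketched with simplified stand-in types. The `Required`, `Expr`, `canProvide`, and `enforce` names below are illustrative inventions, not this package's API; the real code operates on memo.RelExpr and *physical.Required:

```go
package main

import "fmt"

// Required and Expr are simplified stand-ins for physical.Required and
// memo.RelExpr; the operator names below are illustrative only.
type Required struct{ Ordering string }

type Expr struct {
	Op    string
	Child *Expr
}

// canProvide mirrors the idea behind CanProvidePhysicalProps: some
// operators provide an ordering directly, while others such as Select
// and Project pass the requirement through to their input.
func canProvide(e *Expr, req Required) bool {
	if req.Ordering == "" {
		return true
	}
	switch e.Op {
	case "OrderedScan":
		return true
	case "Select", "Project":
		return e.Child != nil && canProvide(e.Child, req)
	default:
		return false
	}
}

// enforce wraps the expression in a Sort enforcer when it cannot
// provide the required ordering, as the optimizer does for physical
// properties it cannot obtain any other way.
func enforce(e *Expr, req Required) *Expr {
	if canProvide(e, req) {
		return e
	}
	return &Expr{Op: "Sort", Child: e}
}

func main() {
	sel := &Expr{Op: "Select", Child: &Expr{Op: "Scan"}}
	fmt.Println(enforce(sel, Required{Ordering: "+a"}).Op) // Sort
	ord := &Expr{Op: "Select", Child: &Expr{Op: "OrderedScan"}}
	fmt.Println(enforce(ord, Required{Ordering: "+a"}).Op) // Select
}
```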

func DeriveInterestingOrderings

func DeriveInterestingOrderings(e memo.RelExpr) opt.OrderingSet

DeriveInterestingOrderings calculates and returns the Relational.Rule.InterestingOrderings property of a relational operator.

type AppliedRuleFunc

type AppliedRuleFunc = norm.AppliedRuleFunc

AppliedRuleFunc defines the callback function for the NotifyOnAppliedRule event supported by the optimizer. See the comment in factory.go for more details.

type Coster

type Coster interface {
    // ComputeCost returns the estimated cost of executing the candidate
    // expression. The optimizer does not expect the cost to correspond to any
    // real-world metric, but does expect costs to be comparable to one another,
    // as well as summable.
    ComputeCost(candidate memo.RelExpr, required *physical.Required) memo.Cost
}

Coster is used by the optimizer to assign a cost to a candidate expression that can provide a set of required physical properties. If a candidate expression has a lower cost than any other expression in the memo group, then it becomes the new best expression for the group.

The set of costing formulas maintained by the coster for the set of all operators constitute the "cost model". A given cost model can be designed to maximize any optimization goal, such as:

1. Max aggregate cluster throughput (txns/sec across cluster)
2. Min transaction latency (time to commit txns)
3. Min latency to first row (time to get first row of txns)
4. Min memory usage
5. Some weighted combination of #1 - #4

The cost model in this file targets #1 as the optimization goal. However, note that #2 is implicitly important to that goal, since overall cluster throughput will suffer if there are lots of pending transactions waiting on I/O.

Coster is an interface so that different costing algorithms can be used by the optimizer. For example, the OptSteps command uses a custom coster that assigns infinite costs to some expressions in order to prevent them from being part of the lowest cost tree (for debugging purposes).

func MakeDefaultCoster

func MakeDefaultCoster(mem *memo.Memo) Coster

MakeDefaultCoster creates an instance of the default coster.

type CustomFuncs

type CustomFuncs struct {
    norm.CustomFuncs
    // contains filtered or unexported fields
}

CustomFuncs contains all the custom match and replace functions used by the exploration rules. The unnamed norm.CustomFuncs allows CustomFuncs to provide a clean interface for calling functions from both the xform and norm packages using the same struct.

func (*CustomFuncs) AddPrimaryKeyColsToScanPrivate

func (c *CustomFuncs) AddPrimaryKeyColsToScanPrivate(sp *memo.ScanPrivate) *memo.ScanPrivate

AddPrimaryKeyColsToScanPrivate creates a new ScanPrivate that is the same as the input ScanPrivate, but has primary keys added to the ColSet.

func (*CustomFuncs) AddWithBinding

func (c *CustomFuncs) AddWithBinding(expr memo.RelExpr) opt.WithID

AddWithBinding adds a With binding for the given binding expression. Returns the WithID associated with the binding.

func (*CustomFuncs) CanGenerateInvertedJoin

func (c *CustomFuncs) CanGenerateInvertedJoin(rightInput memo.RelExpr, on memo.FiltersExpr) bool

CanGenerateInvertedJoin is a best-effort check that returns true if it may be possible to generate an inverted join with the given right-side input and on conditions. It may return some false positives, but it is used to avoid applying certain rules such as ConvertLeftToInnerJoin in cases where they may not be beneficial.

func (*CustomFuncs) CanLimitFilteredScan

func (c *CustomFuncs) CanLimitFilteredScan(
    scanPrivate *memo.ScanPrivate, required physical.OrderingChoice,
) bool

CanLimitFilteredScan returns true if the given scan has not already been limited, and is constrained or scans a partial index. This is only possible when the required ordering of the rows to be limited can be satisfied by the Scan operator.

NOTE: Limiting unconstrained, non-partial index scans is done by the GenerateLimitedScans rule, since that can require IndexJoin operators to be generated.

func (*CustomFuncs) ConvertIndexToLookupJoinPrivate

func (c *CustomFuncs) ConvertIndexToLookupJoinPrivate(
    indexPrivate *memo.IndexJoinPrivate, outCols opt.ColSet,
) *memo.LookupJoinPrivate

ConvertIndexToLookupJoinPrivate constructs a new LookupJoinPrivate using the given IndexJoinPrivate with the given output columns.

func (*CustomFuncs) EnsureNotNullColFromFilteredScan

func (c *CustomFuncs) EnsureNotNullColFromFilteredScan(expr memo.RelExpr) memo.RelExpr

EnsureNotNullColFromFilteredScan ensures that there is at least one not-null column in the given expression. If there is already at least one not-null column, EnsureNotNullColFromFilteredScan returns the expression unchanged. Otherwise, it calls TryAddKeyToScan, which will try to augment the expression with the primary key of the underlying table scan (guaranteed to be not-null). In order for this call to succeed, the input expression must be a non-virtual Scan, optionally wrapped in a Select. Otherwise, EnsureNotNullColFromFilteredScan will panic.

Note that we cannot just project a not-null constant value because we want the output expression to be like the input: a non-virtual Scan, optionally wrapped in a Select. This is needed to ensure that GenerateInvertedJoins or GenerateInvertedJoinsFromSelect can match on this expression.

func (*CustomFuncs) ExprPairFiltersItemToReplace

func (c *CustomFuncs) ExprPairFiltersItemToReplace(ep ExprPair) *memo.FiltersItem

ExprPairFiltersItemToReplace returns the original FiltersItem that the ExprPair was generated from. This FiltersItem should be replaced by ExprPairLeft and ExprPairRight in the newly generated filters in SplitDisjunction(AddKey).

func (*CustomFuncs) ExprPairForSplitDisjunction

func (c *CustomFuncs) ExprPairForSplitDisjunction(
    sp *memo.ScanPrivate, filters memo.FiltersExpr,
) ExprPair

ExprPairForSplitDisjunction finds the first "interesting" ExprPair in the filters and returns it. If an "interesting" ExprPair is not found, an empty ExprPair is returned.

For details on what makes an ExprPair "interesting", see buildExprPairForSplitDisjunction.

func (*CustomFuncs) ExprPairLeft

func (c *CustomFuncs) ExprPairLeft(ep ExprPair) opt.ScalarExpr

ExprPairLeft returns the left ScalarExpr in an ExprPair.

func (*CustomFuncs) ExprPairRight

func (c *CustomFuncs) ExprPairRight(ep ExprPair) opt.ScalarExpr

ExprPairRight returns the right ScalarExpr in an ExprPair.

func (*CustomFuncs) ExprPairSucceeded

func (c *CustomFuncs) ExprPairSucceeded(ep ExprPair) bool

ExprPairSucceeded returns true if the ExprPair is not nil.

func (*CustomFuncs) GenerateConstrainedScans

func (c *CustomFuncs) GenerateConstrainedScans(
    grp memo.RelExpr, scanPrivate *memo.ScanPrivate, explicitFilters memo.FiltersExpr,
)

GenerateConstrainedScans enumerates all non-inverted secondary indexes on the Scan operator's table and tries to push the given Select filter into new constrained Scan operators using those indexes. Since this only needs to be done once per table, GenerateConstrainedScans should only be called on the original unaltered primary index Scan operator (i.e. not constrained or limited).

For each secondary index that "covers" the columns needed by the scan, there are three cases:

- a filter that can be completely converted to a constraint over that index
  generates a single constrained Scan operator (to be added to the same
  group as the original Select operator):

    (Scan $scanDef)

- a filter that can be partially converted to a constraint over that index
  generates a constrained Scan operator in a new memo group, wrapped in a
  Select operator having the remaining filter (to be added to the same group
  as the original Select operator):

    (Select (Scan $scanDef) $filter)

- a filter that cannot be converted to a constraint generates nothing

And for a secondary index that does not cover the needed columns:

- a filter that can be completely converted to a constraint over that index
  generates a single constrained Scan operator in a new memo group, wrapped
  in an IndexJoin operator that looks up the remaining needed columns (and
  is added to the same group as the original Select operator)

    (IndexJoin (Scan $scanDef) $indexJoinDef)

- a filter that can be partially converted to a constraint over that index
  generates a constrained Scan operator in a new memo group, wrapped in an
  IndexJoin operator that looks up the remaining needed columns; the
  remaining filter is distributed above and/or below the IndexJoin,
  depending on which columns it references:

    (IndexJoin
      (Select (Scan $scanDef) $filter)
      $indexJoinDef
    )

    (Select
      (IndexJoin (Scan $scanDef) $indexJoinDef)
      $filter
    )

    (Select
      (IndexJoin
        (Select (Scan $scanDef) $innerFilter)
        $indexJoinDef
      )
      $outerFilter
    )

GenerateConstrainedScans will further constrain the enumerated index scans by trying to use the check constraints and computed columns that apply to the table being scanned, as well as the partitioning defined for the index. See comments above checkColumnFilters, computedColFilters, and partitionValuesFilters for more detail.

func (*CustomFuncs) GenerateIndexScans

func (c *CustomFuncs) GenerateIndexScans(grp memo.RelExpr, scanPrivate *memo.ScanPrivate)

GenerateIndexScans enumerates all non-inverted secondary indexes on the given Scan operator's table and generates an alternate Scan operator for each index that includes the set of needed columns specified in the ScanOpDef.

This transformation can only consider partial indexes that are guaranteed to index every row in the table. Therefore, only partial indexes with predicates that always evaluate to true are considered. Such an index is pseudo-partial in that it behaves exactly the same as a non-partial secondary index.

NOTE: This does not generate index joins for non-covering indexes (except in the case of ForceIndex). Index joins are usually only introduced "one level up", when the Scan operator is wrapped by an operator that constrains or limits scan output in some way (e.g. Select, Limit, InnerJoin). Index joins are only lower cost when their input does not include all rows from the table. See ConstrainScans and LimitScans for cases where index joins are introduced into the memo.

func (*CustomFuncs) GenerateInvertedIndexScans

func (c *CustomFuncs) GenerateInvertedIndexScans(
    grp memo.RelExpr, scanPrivate *memo.ScanPrivate, filters memo.FiltersExpr,
)

GenerateInvertedIndexScans enumerates all inverted indexes on the Scan operator's table and generates an alternate Scan operator for each inverted index that can service the query.

The resulting Scan operator is pre-constrained and requires an IndexJoin to project columns other than the primary key columns. The reason it's pre-constrained is that we cannot treat an inverted index in the same way as a regular index, since it does not actually contain the indexed column.

func (*CustomFuncs) GenerateInvertedIndexZigzagJoins

func (c *CustomFuncs) GenerateInvertedIndexZigzagJoins(
    grp memo.RelExpr, scanPrivate *memo.ScanPrivate, filters memo.FiltersExpr,
)

GenerateInvertedIndexZigzagJoins generates zigzag joins for constraints on an inverted index. It looks for cases where one inverted index can satisfy two constraints, and it produces zigzag joins with the same index on both sides of the zigzag join for those cases, fixed on different constant values.

func (*CustomFuncs) GenerateInvertedJoins

func (c *CustomFuncs) GenerateInvertedJoins(
    grp memo.RelExpr,
    joinType opt.Operator,
    input memo.RelExpr,
    scanPrivate *memo.ScanPrivate,
    on memo.FiltersExpr,
    joinPrivate *memo.JoinPrivate,
)

GenerateInvertedJoins is similar to GenerateLookupJoins, but instead of generating lookup joins with regular indexes, it generates lookup joins with inverted indexes. Similar to GenerateLookupJoins, there are two cases depending on whether or not the index is covering. See the comment above GenerateLookupJoins for details. TODO(rytaft): add support for JSON and array inverted indexes.

func (*CustomFuncs) GenerateLimitedScans

func (c *CustomFuncs) GenerateLimitedScans(
    grp memo.RelExpr,
    scanPrivate *memo.ScanPrivate,
    limit tree.Datum,
    required physical.OrderingChoice,
)

GenerateLimitedScans enumerates all non-inverted and non-partial secondary indexes on the Scan operator's table and tries to create new limited Scan operators from them. Since this only needs to be done once per table, GenerateLimitedScans should only be called on the original unaltered primary index Scan operator (i.e. not constrained or limited).

For a secondary index that "covers" the columns needed by the scan, a single limited Scan operator is created. For a non-covering index, an IndexJoin is constructed to add missing columns to the limited Scan.

Inverted index scans are not guaranteed to produce a specific number of result rows because they can contain multiple entries for a single indexed row. Therefore, they cannot be considered for limited scans.

Partial indexes do not index every row in the table and they can only be used in cases where a query filter implies the partial index predicate. GenerateLimitedScans deals with limits, but no filters, so it cannot generate limited partial index scans. Limiting partial indexes is done by the PushLimitIntoFilteredScans rule.

func (*CustomFuncs) GenerateLookupJoins

func (c *CustomFuncs) GenerateLookupJoins(
    grp memo.RelExpr,
    joinType opt.Operator,
    input memo.RelExpr,
    scanPrivate *memo.ScanPrivate,
    on memo.FiltersExpr,
    joinPrivate *memo.JoinPrivate,
)

GenerateLookupJoins looks at the possible indexes and creates lookup join expressions in the current group. A lookup join can be created when the ON condition has equality constraints on a prefix of the index columns.

There are two cases:

1. The index has all the columns we need; this is the simple case, where we
   generate a LookupJoin expression in the current group:

       Join                       LookupJoin(t@idx))
       /   \                           |
      /     \            ->            |
    Input  Scan(t)                   Input

2. The index is not covering. We have to generate an index join above the
   lookup join. Note that this index join is also implemented as a
   LookupJoin, because an IndexJoin can only output columns from one table,
   whereas we also need to output columns from Input.

       Join                       LookupJoin(t@primary)
       /   \                           |
      /     \            ->            |
    Input  Scan(t)                LookupJoin(t@idx)
                                       |
                                       |
                                     Input

   For example:
    CREATE TABLE abc (a INT PRIMARY KEY, b INT, c INT)
    CREATE TABLE xyz (x INT PRIMARY KEY, y INT, z INT, INDEX (y))
    SELECT * FROM abc JOIN xyz ON a=y

   We want to first join abc with the index on y (which provides columns y, x)
   and then use a lookup join to retrieve column z. The "index join" (top
   LookupJoin) will produce columns a,b,c,x,y; the lookup columns are just z
   (the original index join produced x,y,z).

   Note that the top LookupJoin "sees" column IDs from the table on both
   "sides" (in this example x,y on the left and z on the right) but there is
   no overlap.

func (*CustomFuncs) GenerateMergeJoins

func (c *CustomFuncs) GenerateMergeJoins(
    grp memo.RelExpr,
    originalOp opt.Operator,
    left, right memo.RelExpr,
    on memo.FiltersExpr,
    joinPrivate *memo.JoinPrivate,
)

GenerateMergeJoins spawns MergeJoinOps, based on any interesting orderings.

func (*CustomFuncs) GeneratePartialIndexScans

func (c *CustomFuncs) GeneratePartialIndexScans(
    grp memo.RelExpr, scanPrivate *memo.ScanPrivate, filters memo.FiltersExpr,
)

GeneratePartialIndexScans generates unconstrained index scans over all non-inverted, partial indexes with predicates that are implied by the filters. Partial indexes with predicates which cannot be proven to be implied by the filters are disregarded.

When a filter completely matches the predicate, the remaining filters are simplified so that they do not include the filter. A redundant filter is unnecessary to include in the remaining filters because a scan over the partial index implicitly filters the results.

For every partial index that is implied by the filters, a Scan is generated, possibly in combination with an IndexJoin and Selects. Three questions determine which operators are generated:

1. Does the index "cover" the columns needed?
2. Are there any remaining filters to apply after the Scan?
3. If there are remaining filters does the index cover the referenced
   columns?

If the index covers the needed columns, no IndexJoin is required. The two possible generated expressions are either a lone Scan or a Scan wrapped in a Select that applies any remaining filters.

(Scan $scanDef)

(Select (Scan $scanDef) $remainingFilters)

If the index is not covering, then an IndexJoin is required to retrieve the needed columns. Some of the remaining filters may be applied in a Select before the IndexJoin, if all the columns referenced in the filter are covered by the index. Some of the remaining filters may be applied in a Select after the IndexJoin, if their columns are not covered. Therefore, Selects can be constructed before, after, or both before and after the IndexJoin.

(IndexJoin (Scan $scanDef) $indexJoinDef)

(IndexJoin
  (Select (Scan $scanDef) $remainingFilters)
  $indexJoinDef
)

(Select
  (IndexJoin (Scan $scanDef) $indexJoinDef)
  $outerFilter
)

(Select
  (IndexJoin
    (Select (Scan $scanDef) $innerFilter)
    $indexJoinDef
  )
  $outerFilter
)

func (*CustomFuncs) GenerateStreamingGroupBy

func (c *CustomFuncs) GenerateStreamingGroupBy(
    grp memo.RelExpr,
    op opt.Operator,
    input memo.RelExpr,
    aggs memo.AggregationsExpr,
    private *memo.GroupingPrivate,
)

GenerateStreamingGroupBy generates variants of a GroupBy or DistinctOn expression with more specific orderings on the grouping columns, using the interesting orderings property. See the GenerateStreamingGroupBy rule.

func (*CustomFuncs) GenerateZigzagJoins

func (c *CustomFuncs) GenerateZigzagJoins(
    grp memo.RelExpr, scanPrivate *memo.ScanPrivate, filters memo.FiltersExpr,
)

GenerateZigzagJoins generates zigzag joins for all pairs of indexes of the Scan table which contain one of the constant columns in the FiltersExpr as a prefix.

Similar to the lookup join, if the selected index pair does not contain all the columns in the output of the scan, we wrap the zigzag join in another index join (implemented as a lookup join) on the primary index. The index join is implemented with a lookup join since the index join does not support arbitrary input sources that are not plain index scans.

func (*CustomFuncs) HasInvertedIndexes

func (c *CustomFuncs) HasInvertedIndexes(scanPrivate *memo.ScanPrivate) bool

HasInvertedIndexes returns true if at least one inverted index is defined on the Scan operator's table.

func (*CustomFuncs) Init

func (c *CustomFuncs) Init(e *explorer)

Init initializes a new CustomFuncs with the given explorer.

func (*CustomFuncs) IsCanonicalGroupBy

func (c *CustomFuncs) IsCanonicalGroupBy(private *memo.GroupingPrivate) bool

IsCanonicalGroupBy returns true if the private is for the canonical version of the grouping operator. This is the operator that is built initially (and has all grouping columns as optional in the ordering), as opposed to variants generated by the GenerateStreamingGroupBy exploration rule.

func (*CustomFuncs) IsCanonicalScan

func (c *CustomFuncs) IsCanonicalScan(scan *memo.ScanPrivate) bool

IsCanonicalScan returns true if the given ScanPrivate is an original unaltered primary index Scan operator (i.e. unconstrained and not limited).

func (*CustomFuncs) IsLocking

func (c *CustomFuncs) IsLocking(scan *memo.ScanPrivate) bool

IsLocking returns true if the ScanPrivate is configured to use a row-level locking mode. This can be the case either because the Scan is in the scope of a SELECT .. FOR [KEY] UPDATE/SHARE clause or because the Scan was configured as part of the row retrieval of a DELETE or UPDATE statement.

func (*CustomFuncs) IsSimpleEquality

func (c *CustomFuncs) IsSimpleEquality(filters memo.FiltersExpr) bool

IsSimpleEquality returns true if all of the filter conditions are equalities between simple data types (constants, variables, tuples and NULL).

func (*CustomFuncs) LimitScanPrivate

func (c *CustomFuncs) LimitScanPrivate(
    scanPrivate *memo.ScanPrivate, limit tree.Datum, required physical.OrderingChoice,
) *memo.ScanPrivate

LimitScanPrivate constructs a new ScanPrivate value that is based on the given ScanPrivate. The new private's HardLimit is set to the given limit, which must be a constant int datum value. The other fields are inherited from the existing private.

func (*CustomFuncs) MakeOrderingChoiceFromColumn

func (c *CustomFuncs) MakeOrderingChoiceFromColumn(
    op opt.Operator, col opt.ColumnID,
) physical.OrderingChoice

MakeOrderingChoiceFromColumn constructs a new OrderingChoice with one element in the sequence: the columnID in the order defined by the (MIN/MAX) operator. This function was originally created to be used with the Replace(Min|Max)WithLimit exploration rules.

WARNING: The MinOp case can return a NULL value if the column allows it. This is because NULL values sort first in CRDB.

func (*CustomFuncs) MakeProjectFromPassthroughAggs

func (c *CustomFuncs) MakeProjectFromPassthroughAggs(
    grp memo.RelExpr, input memo.RelExpr, aggs memo.AggregationsExpr,
)

MakeProjectFromPassthroughAggs constructs a top-level Project operator that contains one output column per function in the given aggregate list. The input expression is expected to return zero or one rows, and the aggregate functions are expected to always pass through their values in that case.

func (*CustomFuncs) MakeSetPrivateForSplitDisjunction

func (c *CustomFuncs) MakeSetPrivateForSplitDisjunction(
    left, right *memo.ScanPrivate,
) *memo.SetPrivate

MakeSetPrivateForSplitDisjunction constructs a new SetPrivate with column sets from the left and right ScanPrivate. We use the same ColList for the LeftCols and OutCols of the SetPrivate because we've used the original ScanPrivate column IDs for the left ScanPrivate and those are safe to use as output column IDs of the Union expression.

func (*CustomFuncs) MakeWithPrivate

func (c *CustomFuncs) MakeWithPrivate(id opt.WithID) *memo.WithPrivate

MakeWithPrivate returns a WithPrivate containing the given WithID.

func (*CustomFuncs) MakeWithScan

func (c *CustomFuncs) MakeWithScan(withID opt.WithID) memo.RelExpr

MakeWithScan constructs a WithScan expression that scans the With expression with the given WithID. It creates new columns in the metadata for the WithScan output columns.

func (*CustomFuncs) MakeWithScanKeyEqualityFilters

func (c *CustomFuncs) MakeWithScanKeyEqualityFilters(left, right opt.Expr) memo.FiltersExpr

MakeWithScanKeyEqualityFilters takes two WithScan expressions that scan the same With expression. It returns a filters expression that contains equality conditions between the primary key columns from each side. For example, if WithScans a and b are both scanning a With expression that has key columns x and y, MakeWithScanKeyEqualityFilters will return filters a.x = b.x AND a.y = b.y.

func (*CustomFuncs) MakeWithScanUsingCols

func (c *CustomFuncs) MakeWithScanUsingCols(withID opt.WithID, outColSet opt.ColSet) memo.RelExpr

MakeWithScanUsingCols constructs a WithScan expression that scans the With expression with the given WithID. It uses the provided columns for the output columns of the WithScan.

func (*CustomFuncs) MapFilterCols

func (c *CustomFuncs) MapFilterCols(
    filters memo.FiltersExpr, src, dst opt.ColSet,
) memo.FiltersExpr

MapFilterCols returns a new FiltersExpr with all the src column IDs in the input expression replaced with column IDs in dst.

NOTE: Every ColumnID in src must map to a ColumnID in dst with the same relative position in the ColSets. For example, if src and dst are (1, 5, 6) and (7, 12, 15), then the following mapping would be applied:

1 => 7
5 => 12
6 => 15
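The positional mapping rule above can be sketched as follows. ColumnID is a stand-in for opt.ColumnID, and buildColMap is a hypothetical helper invented for illustration, not part of this package:

```go
package main

import (
	"fmt"
	"sort"
)

// ColumnID is a stand-in for opt.ColumnID; the real code rewrites IDs
// inside a FiltersExpr, but the positional rule is the same.
type ColumnID int

// buildColMap pairs each src column with the dst column at the same
// relative (sorted) position, as MapFilterCols requires.
func buildColMap(src, dst []ColumnID) map[ColumnID]ColumnID {
	if len(src) != len(dst) {
		panic("src and dst must have the same number of columns")
	}
	s := append([]ColumnID(nil), src...)
	d := append([]ColumnID(nil), dst...)
	sort.Slice(s, func(i, j int) bool { return s[i] < s[j] })
	sort.Slice(d, func(i, j int) bool { return d[i] < d[j] })
	m := make(map[ColumnID]ColumnID, len(s))
	for i := range s {
		m[s[i]] = d[i]
	}
	return m
}

func main() {
	// Reproduces the example from the doc comment: (1, 5, 6) => (7, 12, 15).
	m := buildColMap([]ColumnID{1, 5, 6}, []ColumnID{7, 12, 15})
	fmt.Println(m[1], m[5], m[6]) // 7 12 15
}
```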

func (*CustomFuncs) MapScanFilterCols

func (c *CustomFuncs) MapScanFilterCols(
    filters memo.FiltersExpr, src *memo.ScanPrivate, dst *memo.ScanPrivate,
) memo.FiltersExpr

MapScanFilterCols returns a new FiltersExpr with all the src column IDs in the input expression replaced with column IDs in dst.

NOTE: Every ColumnID in src must map to a ColumnID in dst with the same relative position in the ColSets. For example, if src and dst are (1, 5, 6) and (7, 12, 15), then the following mapping would be applied:

1 => 7
5 => 12
6 => 15

func (*CustomFuncs) NotNullCol

func (c *CustomFuncs) NotNullCol(expr memo.RelExpr) opt.ColumnID

NotNullCol returns the first not-null column from the input expression. EnsureNotNullColFromFilteredScan must have been called previously to ensure that such a column exists.

func (*CustomFuncs) OtherAggsAreConst

func (c *CustomFuncs) OtherAggsAreConst(
    aggs memo.AggregationsExpr, except *memo.AggregationsItem,
) bool

OtherAggsAreConst returns true if all items in the given aggregate list contain ConstAgg functions, except for the "except" item. The ConstAgg functions will always return the same value, as long as there is at least one input row.

func (*CustomFuncs) ReorderJoins

func (c *CustomFuncs) ReorderJoins(grp memo.RelExpr) memo.RelExpr

ReorderJoins adds alternate orderings of the given join tree to the memo. The first expression of the memo group is used for construction of the join graph. For more information, see the comment in join_order_builder.go.

func (*CustomFuncs) ReplaceOutputCols

func (c *CustomFuncs) ReplaceOutputCols(expr memo.RelExpr) memo.RelExpr

ReplaceOutputCols replaces the output columns of the given expression by wrapping the expression in a project expression that projects each of the original output columns as a new column with a new ColumnID.

func (*CustomFuncs) ScanIsInverted

func (c *CustomFuncs) ScanIsInverted(sp *memo.ScanPrivate) bool

ScanIsInverted returns true if the index of the given ScanPrivate is an inverted index.

func (*CustomFuncs) ScanIsLimited

func (c *CustomFuncs) ScanIsLimited(sp *memo.ScanPrivate) bool

ScanIsLimited returns true if the scan operator with the given ScanPrivate is limited.

func (*CustomFuncs) ShouldReorderJoins

func (c *CustomFuncs) ShouldReorderJoins(root memo.RelExpr) bool

ShouldReorderJoins returns whether the optimizer should attempt to find a better ordering of inner joins. This is the case if the given expression is the first expression of its group, and the join tree rooted at the expression has not previously been reordered. This is to avoid duplicate work. In addition, a join cannot be reordered if it has join hints.

func (*CustomFuncs) SplitScanIntoUnionScans

func (c *CustomFuncs) SplitScanIntoUnionScans(
    limitOrdering physical.OrderingChoice, scan memo.RelExpr, sp *memo.ScanPrivate, limit tree.Datum,
) memo.RelExpr

SplitScanIntoUnionScans returns a Union of Scan operators with hard limits that each scan over a single key from the original scan's constraints. This is beneficial in cases where the original scan had to scan over many rows but had relatively few keys to scan over. TODO(drewk): handle inverted scans.

type ExprPair

type ExprPair struct {
    // contains filtered or unexported fields
}

ExprPair stores a left and right ScalarExpr. ExprPairForSplitDisjunction returns an ExprPair, which can be deconstructed later, to avoid extra computation in determining the left and right expression groups.

type FmtFlags

type FmtFlags int

FmtFlags controls how the memo output is formatted.

const (
    // FmtPretty performs a breadth-first topological sort on the memo groups,
    // and shows the root group at the top of the memo.
    FmtPretty FmtFlags = iota
)

type JoinOrderBuilder

type JoinOrderBuilder struct {
    // contains filtered or unexported fields
}

JoinOrderBuilder is used to add valid orderings of a given join tree to the memo during exploration.

Motivation
----------

For any given join tree, it is possible to enumerate all possible orderings through exhaustive application of transformation rules. However, enumerating join orderings this way leads to a large number of duplicates, which in turn significantly impacts planning time. Ideally, each unique join ordering would only be enumerated once.

In addition, in the vast majority of cases, the optimal plan for a query does not involve introducing cross joins that were not present in the original version of the plan. This is because cross joins produce |m| x |n| output tuples, where |m| is left input cardinality and |n| is right input cardinality. With a query like this:

SELECT *
FROM (SELECT * FROM xy INNER JOIN ab ON x = a)
INNER JOIN uv ON x = u

An ordering like the following is valid but not desirable, since the cross join will likely be very expensive compared to a join with a predicate:

SELECT *
FROM (SELECT * FROM uv INNER JOIN ab ON True)
INNER JOIN xy ON x = a AND x = u

Avoiding cross joins significantly decreases the search space (and therefore planning time) without preventing the best plan from being found in most cases.

Finally, join trees that incorporate non-inner joins should be handled in addition to inner join trees. This is non-trivial, because non-inner joins are much more restrictive than inner joins. For example, the associative property does not apply to an inner join in the right input of a left join. Any join enumeration method must take care not to introduce invalid orderings.

How JoinOrderBuilder works
--------------------------

First, the initial join tree is traversed and encoded as a hypergraph (which is a graph in which any edge can reference two or more vertexes). The 'leaves' of the join tree (base relations) become the vertexes of the graph, and the edges are built using the ON conditions of the joins.

Taking this query as an example:

SELECT *
FROM (SELECT * FROM xy LEFT JOIN ab ON x = a)
INNER JOIN uv ON x = u AND (y = b OR b IS NULL)

The vertexes of the graph would represent the base relations xy, ab and uv. The three edges would be:

x = a [left]
x = u [inner]
y = b OR b IS NULL [inner]

Then, the DPSube algorithm is executed (see citations: [8]). DPSube enumerates all disjoint pairs of subsets of base relations such as ({xy}, {uv}) or ({xy, uv}, {ab}). This is done efficiently using bit sets. For each pair of relation subsets, the edges are iterated over. For each edge, if certain edge conditions (to be described in the next section) are satisfied, a new join is created using the filters from that edge and the relation subsets as inputs.
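The bit-set enumeration of disjoint subset pairs can be sketched in a few lines. This is a minimal, self-contained illustration of the subset-splitting loop only, not CockroachDB's actual implementation; the real algorithm checks each pair against every edge's eligibility set before building a join.

```go
package main

// dpSub enumerates, for every set of at least two base relations, all
// unordered splits into two disjoint nonempty halves, using bit masks
// (bit i set means relation i is present). It returns the split pairs.
func dpSub(n int) [][2]int {
	var pairs [][2]int
	full := (1 << n) - 1
	for s := 1; s <= full; s++ {
		if s&(s-1) == 0 {
			continue // a single relation is a leaf, not a join
		}
		// Walk the proper nonempty subsets of s in decreasing order.
		for s1 := (s - 1) & s; s1 > 0; s1 = (s1 - 1) & s {
			s2 := s ^ s1 // the complement of s1 within s
			if s1 > s2 {
				continue // count each unordered split once
			}
			pairs = append(pairs, [2]int{s1, s2})
		}
	}
	return pairs
}
```

For three relations this yields six candidate splits: one for each of the three two-relation sets, plus the three ways to split the full set.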

Avoiding invalid orderings
--------------------------

Earlier, it was mentioned that edges are restricted in their ability to form new joins. How is this accomplished?

The paper 'On the correct and complete enumeration of the core search space' introduces the concept of a total eligibility set (TES) (citations: [8]). The TES is an extension of the syntactic eligibility set (SES). The SES models the producer-consumer relationship of joins and base relations; any relations contained in the SES of a join must be present in the join's input. For example, take the following query:

SELECT *
FROM xy
LEFT JOIN (SELECT * FROM ab INNER JOIN uv ON a = u)
ON x = u

The SES for the left join will contain relations xy and uv because both are referenced by the join's predicate. Therefore, both must be in the input of this join for any ordering to be valid.
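Computing an SES amounts to mapping the columns a predicate references back to the relations that produce them. The following is a rough sketch under assumed representations (column and relation sets as plain bit masks); the function name and parameters are illustrative, not the package's API.

```go
package main

// ses computes a syntactic eligibility set: the base relations (as a bit
// mask, bit i = relation i) whose columns a join predicate references.
// predCols is the bit set of columns used by the predicate; relCols[i]
// holds the set of columns produced by relation i.
func ses(predCols uint64, relCols []uint64) uint64 {
	var s uint64
	for i, cols := range relCols {
		if predCols&cols != 0 {
			s |= 1 << uint(i)
		}
	}
	return s
}
```

In the example above, a predicate x = u references one column from xy and one from uv, so the SES contains exactly those two relations.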

The TES of an edge is initialized with the SES, and then expanded during execution of the CD-C algorithm (see citations: [8] section 5.4). For each 'child' join under the current join, associative, left-asscom and right-asscom properties are provided by lookup tables (the asscom properties are derived from a combination of association and commutation). Depending on these properties, the TES will be expanded to take into account whether certain transformations of the join tree are valid. During execution of the DPSube algorithm, the TES is used to decide whether a given edge can be used to construct a new join operator.

Consider the following (invalid) reordering of the above example:

SELECT *
FROM ab
INNER JOIN (SELECT * FROM xy LEFT JOIN uv ON x = u)
ON a = u

The left join's TES will include relations xy and uv because they are in the SES. The TES will also contain ab because the right-asscom property does not hold for a left join and an inner join. Violation of the right-asscom property in this context means that the xy and ab relations cannot switch places. Because all relations in the join's TES must be a part of its inputs, ab cannot be pulled out of the left join. This prevents the invalid plan from being considered.

In addition to the TES, 'conflict rules' are also required to detect invalid plans. For details, see the methods: calculateTES, addJoins, checkNonInnerJoin, and checkInnerJoin.
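The eligibility check during enumeration can be sketched as a bit-set containment test. This is a simplified, self-contained illustration, assuming a hypothetical edge type whose TES is pre-split by join side; the real checks (checkNonInnerJoin, checkInnerJoin) also consult conflict rules.

```go
package main

// relSet is a bit set of base relations (bit i = relation i).
type relSet uint64

// edge is a hypothetical hyperedge of the join graph: the relations of
// its TES that must appear in the left and right inputs of any join it
// forms, plus whether the originating join was an inner join.
type edge struct {
	tesLeft, tesRight relSet
	inner             bool // inner-join edges also apply with inputs swapped
}

// applicable reports whether the edge may build a join with inputs s1 and
// s2: every relation in the TES must land on its required side. x&^y == 0
// tests that the set x is contained in the set y.
func (e edge) applicable(s1, s2 relSet) bool {
	if e.tesLeft&^s1 == 0 && e.tesRight&^s2 == 0 {
		return true
	}
	// Commutativity: an inner join edge may also match with the inputs
	// swapped; non-inner joins are not commutative.
	return e.inner && e.tesLeft&^s2 == 0 && e.tesRight&^s1 == 0
}
```

Expanding an edge's TES (as CD-C does) shrinks the set of (s1, s2) pairs for which applicable returns true, which is exactly how invalid reorderings like the one above are ruled out.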

Special handling of inner joins
-------------------------------

In general, a join's ON condition must be treated as a single entity, because join filter conjuncts cannot usually be pulled out of (or pushed down from) the ON condition. However, this restriction can be relaxed for inner joins because inner join trees have a unique property: they can be modeled as a series of cross joins followed by a series of selects with the inner join conjuncts. This allows inner join conjuncts to be treated as 'detached' from their original operator, free to be combined with conjuncts from other inner joins. For example, take this query:

SELECT *
FROM (SELECT * FROM xy INNER JOIN ab ON x = a)
INNER JOIN uv ON x = u AND a = u

Treating the ON conditions of these joins as a conglomerate (as we do with non-inner joins), a join between base relations xy and uv would not be possible, because the a = u conjunct requires that the ab base relation also be under that edge. However, creating separate edges for each inner join conjunct solves this problem, allowing a reordering like the following (the ab and uv relations are switched, along with the filters):

SELECT *
FROM (SELECT * FROM xy INNER JOIN uv ON x = u)
INNER JOIN ab ON x = a AND a = u

In fact, this idea can be taken even further. Take this query as an example:

SELECT *
FROM xy
INNER JOIN (SELECT * FROM ab LEFT JOIN uv ON b = v)
ON x = a AND (y = u OR u IS NULL)

The following is a valid reformulation:

SELECT *
FROM (SELECT * FROM xy INNER JOIN ab ON x = a)
LEFT JOIN uv ON b = v
WHERE y = u OR u IS NULL

Notice the new Select operation that now carries the inner join conjunct that references the right side of the left join. We can model the process that leads to this reformulation as follows:

1. The inner join is rewritten as a cross join and two selects, each
   carrying a conjunct: (x = a) for one and (y = u OR u IS NULL) for the
   other.
2. The Select operators are pulled above the inner join.
3. The left join and inner join are reordered according to the associative
   property (see citations: [8] table 2).
4. Finally, the inner join conjuncts are pushed back down the reordered
   join tree as far as possible. The x = a conjunct can be pushed to the
   inner join, but the (y = u OR u IS NULL) conjunct must remain on the
   Select.

JoinOrderBuilder is able to effect this transformation (though it is not accomplished in so many steps).

Note that it would be correct to handle inner joins in the same way as non-inner joins, by never splitting up predicates. However, this would diminish the search space for plans involving inner joins, and in many cases prevent the best plan from being found. It is for this reason that inner joins are handled separately.

Also note that this approach to handling inner joins is not discussed in [8]. Rather, it is an extension of the ideas of [8] motivated by the fact that the best plans for many real-world queries require inner joins to be handled in this way.

Transitive closure
------------------

Treating inner join conjuncts as separate edges allows yet another addition: we can add new edges that are implied by the transitive closure of the inner join edges. For example, take this query:

SELECT * FROM xy
INNER JOIN ab ON x = a
INNER JOIN uv ON a = u

The two edges x = a and a = u are explicit in this join tree. However, there is the additional implicit edge x = u which can be added to the join graph. Adding this edge allows xy to be joined to uv without introducing a cross join. This step of ensuring transitive closure is often crucial to finding the best plan; for example, the plan for TPC-H query 9 is much slower without it (try commenting out the call to ensureClosure()).
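One standard way to derive the implied equalities is a union-find over column indices: columns joined by inner-join equality conjuncts fall into the same equivalence class, and every cross-class-free pair is an edge. This sketch is illustrative only and is not how ensureClosure is necessarily implemented.

```go
package main

// find returns the representative of i's equivalence class, with path halving.
func find(parent []int, i int) int {
	for parent[i] != i {
		parent[i] = parent[parent[i]]
		i = parent[i]
	}
	return i
}

// closure returns every column pair implied by the transitive closure of
// the given equality edges, including pairs not explicitly present.
func closure(nCols int, eqs [][2]int) [][2]int {
	parent := make([]int, nCols)
	for i := range parent {
		parent[i] = i
	}
	// Union the two columns of each explicit equality.
	for _, e := range eqs {
		parent[find(parent, e[0])] = find(parent, e[1])
	}
	// Emit all pairs that ended up in the same equivalence class.
	var out [][2]int
	for i := 0; i < nCols; i++ {
		for j := i + 1; j < nCols; j++ {
			if find(parent, i) == find(parent, j) {
				out = append(out, [2]int{i, j})
			}
		}
	}
	return out
}
```

With columns x, a, u as indices 0, 1, 2 and explicit edges x = a and a = u, the closure also yields the implicit pair x = u.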

Citations: [8]

func (*JoinOrderBuilder) Init Uses

func (jb *JoinOrderBuilder) Init(f *norm.Factory, evalCtx *tree.EvalContext)

Init initializes a new JoinOrderBuilder with the given factory and evaluation context. The join graph is reset, so a JoinOrderBuilder can be reused. Callback functions are not reset.

func (*JoinOrderBuilder) NotifyOnAddJoin Uses

func (jb *JoinOrderBuilder) NotifyOnAddJoin(onAddJoin OnAddJoinFunc)

NotifyOnAddJoin sets a callback function that is called when a join is added to the memo.

func (*JoinOrderBuilder) NotifyOnReorder Uses

func (jb *JoinOrderBuilder) NotifyOnReorder(reorderFunc OnReorderFunc)

NotifyOnReorder sets a callback function that is called when a join is passed to JoinOrderBuilder to be reordered.

func (*JoinOrderBuilder) Reorder Uses

func (jb *JoinOrderBuilder) Reorder(join memo.RelExpr)

Reorder adds all valid orderings of the given join to the memo.

type MatchedRuleFunc Uses

type MatchedRuleFunc = norm.MatchedRuleFunc

MatchedRuleFunc defines the callback function for the NotifyOnMatchedRule event supported by the optimizer. See the comment in factory.go for more details.

type OnAddJoinFunc Uses

type OnAddJoinFunc func(left, right, all, refs []memo.RelExpr, op opt.Operator)

OnAddJoinFunc defines the callback function for the NotifyOnAddJoin event supported by JoinOrderBuilder. OnAddJoinFunc is called when JoinOrderBuilder attempts to add a join to the memo via addJoin. The callback parameters give the base relations of the left and right inputs of the join, the set of all base relations currently being considered, the base relations referenced by the join's ON condition, and the type of join.

type OnReorderFunc Uses

type OnReorderFunc func(
    join memo.RelExpr,
    vertexes []memo.RelExpr,
    edges []memo.FiltersExpr,
    edgeOps []opt.Operator,
)

OnReorderFunc defines the callback function for the NotifyOnReorder event supported by the optimizer and factory. OnReorderFunc is called once the join graph has been assembled. The callback parameters give the root join as well as the vertexes and edges of the join graph.

type Optimizer Uses

type Optimizer struct {
    // contains filtered or unexported fields
}

Optimizer transforms an input expression tree into the logically equivalent output expression tree with the lowest possible execution cost.

To use the optimizer, construct an input expression tree by invoking construction methods on the Optimizer.Factory instance. The factory transforms the input expression into its canonical form as part of construction. Pass the root of the tree constructed by the factory to the Optimize method, along with a set of required physical properties that the expression must provide. The optimizer will return an Expr over the output expression tree with the lowest cost.

func (*Optimizer) Coster Uses

func (o *Optimizer) Coster() Coster

Coster returns the coster instance that the optimizer is currently using to estimate the cost of executing portions of the expression tree. When a new optimizer is constructed, it creates a default coster that will be used unless it is overridden with a call to SetCoster.

func (*Optimizer) DetachMemo Uses

func (o *Optimizer) DetachMemo() *memo.Memo

DetachMemo extracts the memo from the optimizer, and then re-initializes the optimizer so that its reuse will not impact the detached memo. This method is used to extract a read-only memo during the PREPARE phase.

func (*Optimizer) DisableOptimizations Uses

func (o *Optimizer) DisableOptimizations()

DisableOptimizations disables all transformation rules, including normalize and explore rules. The unaltered input expression tree becomes the output expression tree (because no transforms are applied).

func (*Optimizer) Factory Uses

func (o *Optimizer) Factory() *norm.Factory

Factory returns a factory interface that the caller uses to construct an input expression tree. The root of the resulting tree can be passed to the Optimize method in order to find the lowest cost plan.

func (*Optimizer) FormatExpr Uses

func (o *Optimizer) FormatExpr(e opt.Expr, flags memo.ExprFmtFlags) string

FormatExpr is a convenience wrapper for memo.FormatExpr.

func (*Optimizer) FormatMemo Uses

func (o *Optimizer) FormatMemo(flags FmtFlags) string

FormatMemo returns a string representation of the memo for testing and debugging. The given flags control which properties are shown.

func (*Optimizer) Init Uses

func (o *Optimizer) Init(evalCtx *tree.EvalContext, catalog cat.Catalog)

Init initializes the Optimizer with a new, blank memo structure inside. This must be called before the optimizer can be used (or reused).

func (*Optimizer) JoinOrderBuilder Uses

func (o *Optimizer) JoinOrderBuilder() *JoinOrderBuilder

JoinOrderBuilder returns the JoinOrderBuilder instance that the optimizer is currently using to reorder join trees.

func (*Optimizer) Memo Uses

func (o *Optimizer) Memo() *memo.Memo

Memo returns the memo structure that the optimizer is using to optimize.

func (*Optimizer) NotifyOnAppliedRule Uses

func (o *Optimizer) NotifyOnAppliedRule(appliedRule AppliedRuleFunc)

NotifyOnAppliedRule sets a callback function which is invoked each time an optimization rule (Normalize or Explore) has been applied by the optimizer. If appliedRule is nil, then no further notifications are sent.

func (*Optimizer) NotifyOnMatchedRule Uses

func (o *Optimizer) NotifyOnMatchedRule(matchedRule MatchedRuleFunc)

NotifyOnMatchedRule sets a callback function which is invoked each time an optimization rule (Normalize or Explore) has been matched by the optimizer. If matchedRule is nil, then no notifications are sent, and all rules are applied by default. In addition, callers can invoke the DisableOptimizations convenience method to disable all rules.

func (*Optimizer) Optimize Uses

func (o *Optimizer) Optimize() (_ opt.Expr, err error)

Optimize returns the expression which satisfies the required physical properties at the lowest possible execution cost, but is still logically equivalent to the given expression. If there is a cost "tie", then any one of the qualifying lowest cost expressions may be selected by the optimizer.

func (*Optimizer) RecomputeCost Uses

func (o *Optimizer) RecomputeCost()

RecomputeCost recomputes the cost of each expression in the lowest cost tree. It should be used in combination with the perturb-cost OptTester flag in order to update the query plan tree after optimization is complete with the real computed cost, not the perturbed cost.

func (*Optimizer) SetCoster Uses

func (o *Optimizer) SetCoster(coster Coster)

SetCoster overrides the default coster. The optimizer will now use the given coster to estimate the cost of expression execution.

func (*Optimizer) String Uses

func (o *Optimizer) String() string

type RuleSet Uses

type RuleSet = util.FastIntSet

RuleSet efficiently stores an unordered set of RuleNames.

Package xform imports 28 packages and is imported by 9 packages. Updated 2020-09-22.