cockroach: github.com/cockroachdb/cockroach/pkg/sql/opt/invertedexpr Index | Files

package invertedexpr

import "github.com/cockroachdb/cockroach/pkg/sql/opt/invertedexpr"

Index

Package Files

expression.go geo_expression.go span_expression.pb.go

Variables

var (
    ErrInvalidLengthSpanExpression = fmt.Errorf("proto: negative length found during unmarshaling")
    ErrIntOverflowSpanExpression   = fmt.Errorf("proto: integer overflow")
)
var SetOperator_name = map[int32]string{
    0:  "None",
    1:  "SetUnion",
    2:  "SetIntersection",
}
var SetOperator_value = map[string]int32{
    "None":            0,
    "SetUnion":        1,
    "SetIntersection": 2,
}

type DatumToInvertedExpr Uses

type DatumToInvertedExpr interface {
    // Convert uses the lookup column to construct an inverted expression.
    Convert(context.Context, sqlbase.EncDatum) (*SpanExpressionProto, error)
}

DatumToInvertedExpr is an interface that is used by the rowexec.invertedJoiner to extract a SpanExpressionProto given an input row. The rowexec.invertedJoiner calls Convert and uses the resulting SpanExpressionProto.SpansToRead to determine which spans to read from the inverted index. Then it computes a set expression on the scanned rows as defined by the SpanExpressionProto.Node.

type EncInvertedVal Uses

type EncInvertedVal []byte

EncInvertedVal is the encoded form of a value in the inverted column. This library does not care about how the value is encoded. The following encoding comment is only relevant for integration purposes, and to justify the use of an encoded form.

If the inverted column stores an encoded datum, the encoding is DatumEncoding_ASCENDING_KEY, and is performed using EncodeTableKey(nil /* prefix */, val tree.Datum, encoding.Ascending). It is used to represent spans of the inverted column.

It would be ideal if the inverted column only contained Datums, since we could then work with a Datum here. However, JSON breaks that approach: - JSON inverted columns use a custom encoding that uses a special byte

jsonInvertedIndex, followed by the bytes produced by the various
implementations of the encodeInvertedIndexKey() method in the JSON
interface. This could be worked around by using a JSON datum that
represents a single path as the start key of the span, and representing
[start, start] spans. We would special case the encoding logic to
recognize that it is dealing with JSON (we have similar special path code
for JSON elsewhere). But this is insufficient (next bullet).

- Expressions like x ? 'b' don't have operands that are JSON, but can be

represented using a span on the inverted column.

So we make it the job of the caller of this library to encode the inverted column. Note that the second bullet above has some similarities with the behavior in makeStringPrefixSpan(), except there we can represent the start and end keys using the string type.

type InvertedExpression Uses

type InvertedExpression interface {
    // IsTight returns whether the inverted expression is tight, i.e., will the
    // original expression not need to be reevaluated on each row output by the
    // query evaluation over the inverted index.
    IsTight() bool
    // SetNotTight sets tight to false.
    SetNotTight()
}

InvertedExpression is the interface representing an expression or sub-expression to be evaluated on the inverted index. Any implementation can be used in the builder functions And() and Or(), but in practice there are two useful implementations provided here: - SpanExpression: this is the normal expression representing unions and

intersections over spans of the inverted index. A SpanExpression is the
root of an expression tree containing other SpanExpressions (there is one
exception when a SpanExpression tree can contain non-SpanExpressions,
discussed below for Joins).

- NonInvertedColExpression: this is a marker expression representing the universal

span, due to it being an expression on the non inverted column. This only appears in
expression trees with a single node, since Anding with such an expression simply
changes the tightness to false and Oring with this expression replaces the
other expression with a NonInvertedColExpression.

Optimizer cost estimation

There are two cases: - Single table expression: after generating the InvertedExpression, the

optimizer will check that it is a *SpanExpression -- if not, it is a
NonInvertedColExpression, which implies a full inverted index scan, and
it is definitely not worth using the inverted index. There are two costs for
using the inverted index:
- The scan cost: this should be estimated by using SpanExpression.SpansToRead.
- The cardinality of the output set after evaluating the expression: this
  requires a traversal of the expression to assign cardinality to the
  spans in each FactoredUnionSpans (this could be done using a mean,
  or using histograms). The cardinality of a SpanExpression is the
  cardinality of the union of its FactoredUnionSpans and the intersection
  of its left and right expressions. If the cardinality of the original
  table is C (i.e., the number of primary keys), and we have two subsets
  of cardinality C1 and C2, we can assume that each set itself is a
  drawing without replacement from the original table. This can be
  used to derive the expected cardinality of the union of the two sets
  and the intersection of the two sets.

- Join expression: Assigning a cost is hard since there are two

parameters, corresponding to the left and right columns. In some cases,
like Geospatial, the expression that could be generated is a black-box to
the optimizer since the quad-tree traversal is unknown until partial
application (when one of the parameters is known). Minimally, we do need to
know whether the user expression is going to cause a full inverted index
scan due to parts of the expression referring to non-inverted columns.
The optimizer will provide its own placeholder implementation of
InvertedExpression into which it can embed whatever information it wants.
Let's call this the UnknownExpression -- it will only exist at the
leaves of the expression tree. It will use this UnknownExpression
whenever there is an expression involving both the inverted columns. If
the final expression is a NonInvertedColExpression, it is definitely not
worth using the inverted index. If the final expression is an
UnknownExpression (the tree must be a single node) or a *SpanExpression,
the optimizer could either conjure up some magic cost number or try to
compose one using costs assigned to each span (as described in the
previous bullet) and to each leaf-level UnknownExpression.

Query evaluation

There are two cases: - Single table expression: The optimizer will convert the *SpanExpression

into a form that is passed to the evaluation machinery, which can recreate
the *SpanExpression and evaluate it. The optimizer will have constructed
the spans for the evaluation using SpanExpression.SpansToRead, so the
expression evaluating code does not need to concern itself with the spans
to be read.
e.g. the query was of the form ... WHERE x <@ '{"a":1, "b":2}'::json
The optimizer constructs a *SpanExpression, and
- uses the serialization of the *SpanExpression as the spec for a processor
  that will evaluate the expression.
- uses the SpanExpression.SpansToRead to specify the inverted index
  spans that must be read and fed to the processor.

- Join expression: The optimizer had an expression tree with the root as

a *SpanExpression or an UnknownExpression. Therefore it knows that after
partial application the expression will be a *SpanExpression. It passes the
inverted expression with two unknowns, as a string, to the join execution
machinery. The optimizer provides a way to do partial application for each
input row, and returns a *SpanExpression, which is evaluated on the
inverted index.
e.g. the join query was of the form
... ON t1.x <@ t2.y OR (t1.x @> t2.y AND t2.y @> '{"a":1, "b":2}'::json)
and the optimizer decides to use the inverted index on t2.y. The optimizer
passes an expression string with two unknowns in the InvertedJoinerSpec,
where @1 represents t1.x and @2 represents t2.y. For each input row of
t1 the inverted join processor asks the optimizer to apply the value of @1
and return a *SpanExpression, which the join processor will evaluate on
the inverted index.

func And Uses

func And(left, right InvertedExpression) InvertedExpression

And of two boolean expressions.

func Or Uses

func Or(left, right InvertedExpression) InvertedExpression

Or of two boolean expressions.

type InvertedSpan Uses

type InvertedSpan struct {
    Start, End EncInvertedVal
}

InvertedSpan is a span of the inverted index. Represents [start, end).

func MakeSingleInvertedValSpan Uses

func MakeSingleInvertedValSpan(val EncInvertedVal) InvertedSpan

MakeSingleInvertedValSpan constructs a span equivalent to [val, val].

func (InvertedSpan) Equals Uses

func (s InvertedSpan) Equals(other InvertedSpan) bool

Equals returns true if this span has the same start and end as the given span.

type InvertedSpans Uses

type InvertedSpans []InvertedSpan

InvertedSpans is a slice of InvertedSpan objects.

func (InvertedSpans) Equals Uses

func (is InvertedSpans) Equals(other InvertedSpans) bool

Equals returns true if this InvertedSpans has the same spans as the given InvertedSpans, in the same order.

func (InvertedSpans) Format Uses

func (is InvertedSpans) Format(tp treeprinter.Node, label string)

Format pretty-prints the spans.

type NonInvertedColExpression Uses

type NonInvertedColExpression struct{}

NonInvertedColExpression is an expression to use for parts of the user expression that do not involve the inverted index.

func (NonInvertedColExpression) IsTight Uses

func (n NonInvertedColExpression) IsTight() bool

IsTight implements the InvertedExpression interface.

func (NonInvertedColExpression) SetNotTight Uses

func (n NonInvertedColExpression) SetNotTight()

SetNotTight implements the InvertedExpression interface.

type SetOperator Uses

type SetOperator int32

SetOperator is an operator on sets.

const (
    // None is used in an expression node with no children.
    None SetOperator = 0
    // SetUnion unions the children.
    SetUnion SetOperator = 1
    // SetIntersection intersects the children.
    SetIntersection SetOperator = 2
)

func (SetOperator) EnumDescriptor Uses

func (SetOperator) EnumDescriptor() ([]byte, []int)

func (SetOperator) String Uses

func (x SetOperator) String() string

type SpanExpression Uses

type SpanExpression struct {
    // Tight mirrors the definition of IsTight().
    Tight bool

    // SpansToRead are the spans to read from the inverted index
    // to evaluate this SpanExpression. These are non-overlapping
    // and sorted. If left or right contains a non-SpanExpression,
    // it is not included in the spanning union.
    // To illustrate, consider a made up example:
    // [2, 10) \intersection [6, 14)
    // is factored into:
    // [6, 10) \union ([2, 6) \intersection [10, 14))
    // The root expression has a spanning union of [2, 14).
    SpansToRead InvertedSpans

    // FactoredUnionSpans are the spans to be unioned. These are
    // non-overlapping and sorted. As mentioned earlier, factoring
    // can result in faster evaluation and can be useful for
    // optimizer cost estimation.
    //
    // Using the same example, the FactoredUnionSpans will be
    // [6, 10). Now let's extend the above example and say that
    // it was just a sub-expression in a bigger expression, and
    // the full expression involved an intersection of that
    // sub-expression and [5, 8). After factoring, we would get
    // [6, 8) \union ([5, 6) \intersection ([8, 10) \union ([2, 6) \intersection [10, 14))))
    // The top-level expression has FactoredUnionSpans [6, 8), and the left and
    // right children have factoredUnionSpans [5, 6) and [8, 10) respectively.
    // The SpansToRead of this top-level expression is still [2, 14) since the
    // intersection with [5, 8) did not add anything to the spans to read. Also
    // note that, despite factoring, there are overlapping spans in this
    // expression, specifically [2, 6) and [5, 6).
    FactoredUnionSpans InvertedSpans

    // Operator is the set operation to apply to Left and Right.
    // When this is union or intersection, both Left and Right are non-nil,
    // else both are nil.
    Operator SetOperator
    Left     InvertedExpression
    Right    InvertedExpression
}

SpanExpression is an implementation of InvertedExpression.

TODO(sumeer): after integration and experimentation with optimizer costing, decide if we can eliminate the generality of the InvertedExpression interface. If we don't need that generality, we can merge SpanExpression and SpanExpressionProto.

func ExprForInvertedSpan Uses

func ExprForInvertedSpan(span InvertedSpan, tight bool) *SpanExpression

ExprForInvertedSpan constructs a leaf-level SpanExpression for an inverted expression. Note that these leaf-level expressions may also have tight = false. Geospatial functions are all non-tight.

For JSON, expressions like x <@ '{"a":1, "b":2}'::json will have tight = false. Say SpanA, SpanB correspond to "a":1 and "b":2 respectively). A tight expression would require the following set evaluation: Set(SpanA) \union Set(SpanB) - Set(ComplementSpan(SpanA \spanunion SpanB)) where ComplementSpan(X) is everything in the inverted index except for X. Since ComplementSpan(SpanA \spanunion SpanB) is likely to be very wide when SpanA and SpanB are narrow, or vice versa, this tight expression would be very costly to evaluate.

func GeoRPKeyExprToSpanExpr Uses

func GeoRPKeyExprToSpanExpr(rpExpr geoindex.RPKeyExpr) (*SpanExpression, error)

GeoRPKeyExprToSpanExpr converts geoindex.RPKeyExpr to SpanExpression.

func GeoUnionKeySpansToSpanExpr Uses

func GeoUnionKeySpansToSpanExpr(ukSpans geoindex.UnionKeySpans) *SpanExpression

GeoUnionKeySpansToSpanExpr converts geoindex.UnionKeySpans to a SpanExpression.

func (*SpanExpression) Format Uses

func (s *SpanExpression) Format(tp treeprinter.Node, includeSpansToRead bool)

Format pretty-prints the SpanExpression.

func (*SpanExpression) IsTight Uses

func (s *SpanExpression) IsTight() bool

IsTight implements the InvertedExpression interface.

func (*SpanExpression) SetNotTight Uses

func (s *SpanExpression) SetNotTight()

SetNotTight implements the InvertedExpression interface.

func (*SpanExpression) String Uses

func (s *SpanExpression) String() string

func (*SpanExpression) ToProto Uses

func (s *SpanExpression) ToProto() *SpanExpressionProto

ToProto constructs a SpanExpressionProto for execution. It should be called on an expression tree that contains only *SpanExpressions.

type SpanExpressionProto Uses

type SpanExpressionProto struct {
    SpansToRead []SpanExpressionProto_Span `protobuf:"bytes,1,rep,name=spans_to_read,json=spansToRead,proto3" json:"spans_to_read"`
    Node        SpanExpressionProto_Node   `protobuf:"bytes,2,opt,name=node,proto3" json:"node"`
}

SpanExpressionProto is a proto representation of an InvertedExpression tree consisting only of SpanExpressions. It is intended for use in expression execution.

func (*SpanExpressionProto) Descriptor Uses

func (*SpanExpressionProto) Descriptor() ([]byte, []int)

func (*SpanExpressionProto) Marshal Uses

func (m *SpanExpressionProto) Marshal() (dAtA []byte, err error)

func (*SpanExpressionProto) MarshalTo Uses

func (m *SpanExpressionProto) MarshalTo(dAtA []byte) (int, error)

func (*SpanExpressionProto) ProtoMessage Uses

func (*SpanExpressionProto) ProtoMessage()

func (*SpanExpressionProto) Reset Uses

func (m *SpanExpressionProto) Reset()

func (*SpanExpressionProto) Size Uses

func (m *SpanExpressionProto) Size() (n int)

func (*SpanExpressionProto) String Uses

func (m *SpanExpressionProto) String() string

func (*SpanExpressionProto) Unmarshal Uses

func (m *SpanExpressionProto) Unmarshal(dAtA []byte) error

func (*SpanExpressionProto) XXX_DiscardUnknown Uses

func (m *SpanExpressionProto) XXX_DiscardUnknown()

func (*SpanExpressionProto) XXX_Marshal Uses

func (m *SpanExpressionProto) XXX_Marshal(b []byte, deterministic bool) ([]byte, error)

func (*SpanExpressionProto) XXX_Merge Uses

func (dst *SpanExpressionProto) XXX_Merge(src proto.Message)

func (*SpanExpressionProto) XXX_Size Uses

func (m *SpanExpressionProto) XXX_Size() int

func (*SpanExpressionProto) XXX_Unmarshal Uses

func (m *SpanExpressionProto) XXX_Unmarshal(b []byte) error

type SpanExpressionProto_Node Uses

type SpanExpressionProto_Node struct {
    FactoredUnionSpans []SpanExpressionProto_Span `protobuf:"bytes,1,rep,name=factored_union_spans,json=factoredUnionSpans,proto3" json:"factored_union_spans"`
    Operator           SetOperator                `protobuf:"varint,2,opt,name=operator,proto3,enum=cockroach.sql.opt.invertedexpr.SetOperator" json:"operator,omitempty"`
    Left               *SpanExpressionProto_Node  `protobuf:"bytes,3,opt,name=left,proto3" json:"left,omitempty"`
    Right              *SpanExpressionProto_Node  `protobuf:"bytes,4,opt,name=right,proto3" json:"right,omitempty"`
}

func (*SpanExpressionProto_Node) Descriptor Uses

func (*SpanExpressionProto_Node) Descriptor() ([]byte, []int)

func (*SpanExpressionProto_Node) Marshal Uses

func (m *SpanExpressionProto_Node) Marshal() (dAtA []byte, err error)

func (*SpanExpressionProto_Node) MarshalTo Uses

func (m *SpanExpressionProto_Node) MarshalTo(dAtA []byte) (int, error)

func (*SpanExpressionProto_Node) ProtoMessage Uses

func (*SpanExpressionProto_Node) ProtoMessage()

func (*SpanExpressionProto_Node) Reset Uses

func (m *SpanExpressionProto_Node) Reset()

func (*SpanExpressionProto_Node) Size Uses

func (m *SpanExpressionProto_Node) Size() (n int)

func (*SpanExpressionProto_Node) String Uses

func (m *SpanExpressionProto_Node) String() string

func (*SpanExpressionProto_Node) Unmarshal Uses

func (m *SpanExpressionProto_Node) Unmarshal(dAtA []byte) error

func (*SpanExpressionProto_Node) XXX_DiscardUnknown Uses

func (m *SpanExpressionProto_Node) XXX_DiscardUnknown()

func (*SpanExpressionProto_Node) XXX_Marshal Uses

func (m *SpanExpressionProto_Node) XXX_Marshal(b []byte, deterministic bool) ([]byte, error)

func (*SpanExpressionProto_Node) XXX_Merge Uses

func (dst *SpanExpressionProto_Node) XXX_Merge(src proto.Message)

func (*SpanExpressionProto_Node) XXX_Size Uses

func (m *SpanExpressionProto_Node) XXX_Size() int

func (*SpanExpressionProto_Node) XXX_Unmarshal Uses

func (m *SpanExpressionProto_Node) XXX_Unmarshal(b []byte) error

type SpanExpressionProto_Span Uses

type SpanExpressionProto_Span struct {
    Start []byte `protobuf:"bytes,1,opt,name=start,proto3" json:"start,omitempty"`
    End   []byte `protobuf:"bytes,2,opt,name=end,proto3" json:"end,omitempty"`
}

Span is a span of the inverted index. Represents [start, end).

func (*SpanExpressionProto_Span) Descriptor Uses

func (*SpanExpressionProto_Span) Descriptor() ([]byte, []int)

func (*SpanExpressionProto_Span) Marshal Uses

func (m *SpanExpressionProto_Span) Marshal() (dAtA []byte, err error)

func (*SpanExpressionProto_Span) MarshalTo Uses

func (m *SpanExpressionProto_Span) MarshalTo(dAtA []byte) (int, error)

func (*SpanExpressionProto_Span) ProtoMessage Uses

func (*SpanExpressionProto_Span) ProtoMessage()

func (*SpanExpressionProto_Span) Reset Uses

func (m *SpanExpressionProto_Span) Reset()

func (*SpanExpressionProto_Span) Size Uses

func (m *SpanExpressionProto_Span) Size() (n int)

func (*SpanExpressionProto_Span) String Uses

func (m *SpanExpressionProto_Span) String() string

func (*SpanExpressionProto_Span) Unmarshal Uses

func (m *SpanExpressionProto_Span) Unmarshal(dAtA []byte) error

func (*SpanExpressionProto_Span) XXX_DiscardUnknown Uses

func (m *SpanExpressionProto_Span) XXX_DiscardUnknown()

func (*SpanExpressionProto_Span) XXX_Marshal Uses

func (m *SpanExpressionProto_Span) XXX_Marshal(b []byte, deterministic bool) ([]byte, error)

func (*SpanExpressionProto_Span) XXX_Merge Uses

func (dst *SpanExpressionProto_Span) XXX_Merge(src proto.Message)

func (*SpanExpressionProto_Span) XXX_Size Uses

func (m *SpanExpressionProto_Span) XXX_Size() int

func (*SpanExpressionProto_Span) XXX_Unmarshal Uses

func (m *SpanExpressionProto_Span) XXX_Unmarshal(b []byte) error

Package invertedexpr imports 14 packages (graph) and is imported by 8 packages. Updated 2020-07-08. Refresh now. Tools for package owners.