inverted

package
v0.23.2 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Feb 12, 2024 License: Apache-2.0 Imports: 12 Imported by: 0

Documentation

Index

Constants

This section is empty.

Variables

View Source
var (
	ErrInvalidLengthSpanExpression        = fmt.Errorf("proto: negative length found during unmarshaling")
	ErrIntOverflowSpanExpression          = fmt.Errorf("proto: integer overflow")
	ErrUnexpectedEndOfGroupSpanExpression = fmt.Errorf("proto: unexpected end of group")
)
View Source
var SetOperator_name = map[int32]string{
	0: "None",
	1: "SetUnion",
	2: "SetIntersection",
}
View Source
var SetOperator_value = map[string]int32{
	"None":            0,
	"SetUnion":        1,
	"SetIntersection": 2,
}

Functions

This section is empty.

Types

type EncVal

type EncVal []byte

EncVal is the encoded form of a value in the inverted column. This library does not care about how the value is encoded. The following encoding comment is only relevant for integration purposes, and to justify the use of an encoded form.

If the inverted column stores an encoded datum, the encoding is DatumEncoding_ASCENDING_KEY, and is performed using keyside.Encode(nil /* prefix */, val tree.Datum, encoding.Ascending). It is used to represent spans of the inverted column.

It would be ideal if the inverted column only contained Datums, since we could then work with a Datum here. However, JSON breaks that approach:

  • JSON inverted columns use a custom encoding that uses a special byte jsonInvertedIndex, followed by the bytes produced by the various implementations of the encodeInvertedIndexKey() method in the JSON interface. This could be worked around by using a JSON datum that represents a single path as the start key of the span, and representing [start, start] spans. We would special case the encoding logic to recognize that it is dealing with JSON (we have similar special path code for JSON elsewhere). But this is insufficient (next bullet).
  • Expressions like x ? 'b' don't have operands that are JSON, but can be represented using a span on the inverted column.

So we make it the job of the caller of this library to encode the inverted column. Note that the second bullet above has some similarities with the behavior in makeStringPrefixSpan(), except there we can represent the start and end keys using the string type.

type Expression

type Expression interface {
	// IsTight returns whether the inverted expression is tight, i.e., will the
	// original expression not need to be reevaluated on each row output by the
	// query evaluation over the inverted index.
	IsTight() bool
	// SetNotTight sets tight to false.
	SetNotTight()
	// Copy makes a copy of the inverted expression.
	Copy() Expression
}

Expression is the interface representing an expression or sub-expression to be evaluated on the inverted index. Any implementation can be used in the builder functions And() and Or(), but in practice there are two useful implementations provided here:

  • SpanExpression: this is the normal expression representing unions and intersections over spans of the inverted index. A SpanExpression is the root of an expression tree containing other SpanExpressions (there is one exception when a SpanExpression tree can contain non-SpanExpressions, discussed below for Joins).
  • NonInvertedColExpression: this is a marker expression representing the universal span, due to it being an expression on the non inverted column. This only appears in expression trees with a single node, since Anding with such an expression simply changes the tightness to false and Oring with this expression replaces the other expression with a NonInvertedColExpression.

Optimizer cost estimation

There are two cases:

  • Single table expression: after generating the Expression, the optimizer will check that it is a *SpanExpression -- if not, it is a NonInvertedColExpression, which implies a full inverted index scan, and it is definitely not worth using the inverted index. There are two costs for using the inverted index:

  • The scan cost: this should be estimated by using SpanExpression.SpansToRead.

  • The cardinality of the output set after evaluating the expression: this requires a traversal of the expression to assign cardinality to the spans in each FactoredUnionSpans (this could be done using a mean, or using histograms). The cardinality of a SpanExpression is the cardinality of the union of its FactoredUnionSpans and the intersection of its left and right expressions. If the cardinality of the original table is C (i.e., the number of primary keys), and we have two subsets of cardinality C1 and C2, we can assume that each set itself is a drawing without replacement from the original table. This can be used to derive the expected cardinality of the union of the two sets and the intersection of the two sets.

  • Join expression: Assigning a cost is hard since there are two parameters, corresponding to the left and right columns. In some cases, like Geospatial, the expression that could be generated is a black-box to the optimizer since the quad-tree traversal is unknown until partial application (when one of the parameters is known). Minimally, we do need to know whether the user expression is going to cause a full inverted index scan due to parts of the expression referring to non-inverted columns. The optimizer will provide its own placeholder implementation of Expression into which it can embed whatever information it wants. Let's call this the UnknownExpression -- it will only exist at the leaves of the expression tree. It will use this UnknownExpression whenever there is an expression involving both the inverted columns. If the final expression is a NonInvertedColExpression, it is definitely not worth using the inverted index. If the final expression is an UnknownExpression (the tree must be a single node) or a *SpanExpression, the optimizer could either conjure up some magic cost number or try to compose one using costs assigned to each span (as described in the previous bullet) and to each leaf-level UnknownExpression.

Query evaluation

There are two cases:

  • Single table expression: The optimizer will convert the *SpanExpression into a form that is passed to the evaluation machinery, which can recreate the *SpanExpression and evaluate it. The optimizer will have constructed the spans for the evaluation using SpanExpression.SpansToRead, so the expression evaluating code does not need to concern itself with the spans to be read. e.g. the query was of the form ... WHERE x <@ '{"a":1, "b":2}'::json The optimizer constructs a *SpanExpression, and
  • uses the serialization of the *SpanExpression as the spec for a processor that will evaluate the expression.
  • uses the SpanExpression.SpansToRead to specify the inverted index spans that must be read and fed to the processor.
  • Join expression: The optimizer had an expression tree with the root as a *SpanExpression or an UnknownExpression. Therefore it knows that after partial application the expression will be a *SpanExpression. It passes the inverted expression with two unknowns, as a string, to the join execution machinery. The optimizer provides a way to do partial application for each input row, and returns a *SpanExpression, which is evaluated on the inverted index. e.g. the join query was of the form ... ON t1.x <@ t2.y OR (t1.x @> t2.y AND t2.y @> '{"a":1, "b":2}'::json) and the optimizer decides to use the inverted index on t2.y. The optimizer passes an expression string with two unknowns in the InvertedJoinerSpec, where @1 represents t1.x and @2 represents t2.y. For each input row of t1 the inverted join processor asks the optimizer to apply the value of @1 and return a *SpanExpression, which the join processor will evaluate on the inverted index.

func And

func And(left, right Expression) Expression

And of two boolean expressions. This function may modify both the left and right Expressions.

func Or

func Or(left, right Expression) Expression

Or of two boolean expressions. This function may modify both the left and right Expressions.

type NonInvertedColExpression

type NonInvertedColExpression struct{}

NonInvertedColExpression is an expression to use for parts of the user expression that do not involve the inverted index.

func (NonInvertedColExpression) Copy

Copy implements the Expression interface.

func (NonInvertedColExpression) IsTight

func (n NonInvertedColExpression) IsTight() bool

IsTight implements the Expression interface.

func (NonInvertedColExpression) SetNotTight

func (n NonInvertedColExpression) SetNotTight()

SetNotTight implements the Expression interface.

type SetOperator

type SetOperator int32

SetOperator is an operator on sets.

const (
	// None is used in an expression node with no children.
	None SetOperator = 0
	// SetUnion unions the children.
	SetUnion SetOperator = 1
	// SetIntersection intersects the children.
	SetIntersection SetOperator = 2
)

func (SetOperator) EnumDescriptor

func (SetOperator) EnumDescriptor() ([]byte, []int)

func (SetOperator) String

func (x SetOperator) String() string

type Span

type Span struct {
	Start, End EncVal
}

Span is a span of the inverted index. Represents [start, end).

func MakeSingleValSpan

func MakeSingleValSpan(val EncVal) Span

MakeSingleValSpan constructs a span equivalent to [val, val].

func (Span) ContainsKey

func (s Span) ContainsKey(key EncVal) bool

ContainsKey returns whether the span contains the given key.

func (Span) Equals

func (s Span) Equals(other Span) bool

Equals returns true if this span has the same start and end as the given span.

func (Span) IsSingleVal

func (s Span) IsSingleVal() bool

IsSingleVal returns true iff the span is equivalent to [val, val].

type SpanExpression

type SpanExpression struct {
	// Tight mirrors the definition of IsTight().
	Tight bool

	// Unique is true if the spans in FactoredUnionSpans are guaranteed not to
	// produce duplicate primary keys. Otherwise, Unique is false. Unique may
	// be true for certain JSON or Array SpanExpressions, and it holds when
	// unique SpanExpressions are combined with And. It does not hold when
	// these SpanExpressions are combined with Or.
	//
	// Once a SpanExpression is built, this field is relevant if the root
	// SpanExpression has no children (i.e., Operator is None). In this case,
	// Unique is used to determine whether an invertedFilter is needed on top
	// of the inverted index scan to deduplicate keys (an invertedFilter is
	// always necessary if Operator is not None).
	Unique bool

	// SpansToRead are the spans to read from the inverted index
	// to evaluate this SpanExpression. These are non-overlapping
	// and sorted. If left or right contains a non-SpanExpression,
	// it is not included in the spanning union.
	// To illustrate, consider a made up example:
	// [2, 10) \intersection [6, 14)
	// is factored into:
	// [6, 10) \union ([2, 6) \intersection [10, 14))
	// The root expression has a spanning union of [2, 14).
	SpansToRead Spans

	// FactoredUnionSpans are the spans to be unioned. These are
	// non-overlapping and sorted. As mentioned earlier, factoring
	// can result in faster evaluation and can be useful for
	// optimizer cost estimation.
	//
	// Using the same example, the FactoredUnionSpans will be
	// [6, 10). Now let's extend the above example and say that
	// it was just a sub-expression in a bigger expression, and
	// the full expression involved an intersection of that
	// sub-expression and [5, 8). After factoring, we would get
	// [6, 8) \union ([5, 6) \intersection ([8, 10) \union ([2, 6) \intersection [10, 14))))
	// The top-level expression has FactoredUnionSpans [6, 8), and the left and
	// right children have factoredUnionSpans [5, 6) and [8, 10) respectively.
	// The SpansToRead of this top-level expression is still [2, 14) since the
	// intersection with [5, 8) did not add anything to the spans to read. Also
	// note that, despite factoring, there are overlapping spans in this
	// expression, specifically [2, 6) and [5, 6).
	FactoredUnionSpans Spans

	// Operator is the set operation to apply to Left and Right.
	// When this is union or intersection, both Left and Right are non-nil,
	// else both are nil.
	Operator SetOperator
	Left     Expression
	Right    Expression
}

SpanExpression is an implementation of Expression.

TODO(sumeer): after integration and experimentation with optimizer costing, decide if we can eliminate the generality of the Expression interface. If we don't need that generality, we can merge SpanExpression and SpanExpressionProto.

func ExprForSpan

func ExprForSpan(span Span, tight bool) *SpanExpression

ExprForSpan constructs a leaf-level SpanExpression for an inverted expression. Note that these leaf-level expressions may also have tight = false. Geospatial functions are all non-tight.

For JSON, expressions like x <@ '{"a":1, "b":2}'::json will have tight = false. Say SpanA, SpanB correspond to "a":1 and "b":2 respectively). A tight expression would require the following set evaluation: Set(SpanA) \union Set(SpanB) - Set(ComplementSpan(SpanA \spanunion SpanB)) where ComplementSpan(X) is everything in the inverted index except for X. Since ComplementSpan(SpanA \spanunion SpanB) is likely to be very wide when SpanA and SpanB are narrow, or vice versa, this tight expression would be very costly to evaluate.

func (*SpanExpression) ContainsKeys

func (s *SpanExpression) ContainsKeys(keys [][]byte) (bool, error)

ContainsKeys traverses the SpanExpression to determine whether the span expression contains the given keys. It is primarily used for testing.

func (*SpanExpression) Copy

func (s *SpanExpression) Copy() Expression

Copy implements the Expression interface.

Copy makes a copy of the SpanExpression and returns it. Copy recurses into the children and makes copies of them as well, so the new struct is independent from the old. It does *not* perform a deep copy of the SpansToRead or FactoredUnionSpans slices, however, because those slices are never modified in place and therefore are safe to reuse.

func (*SpanExpression) Format

func (s *SpanExpression) Format(tp treeprinter.Node, includeSpansToRead, redactable bool)

Format pretty-prints the SpanExpression.

func (*SpanExpression) IsTight

func (s *SpanExpression) IsTight() bool

IsTight implements the Expression interface.

func (*SpanExpression) SetNotTight

func (s *SpanExpression) SetNotTight()

SetNotTight implements the Expression interface.

func (*SpanExpression) String

func (s *SpanExpression) String() string

func (*SpanExpression) ToProto

func (s *SpanExpression) ToProto() *SpanExpressionProto

ToProto constructs a SpanExpressionProto for execution. It should be called on an expression tree that contains only *SpanExpressions.

type SpanExpressionProto

type SpanExpressionProto struct {
	SpansToRead []SpanExpressionProto_Span `protobuf:"bytes,1,rep,name=spans_to_read,json=spansToRead,proto3" json:"spans_to_read"`
	Node        SpanExpressionProto_Node   `protobuf:"bytes,2,opt,name=node,proto3" json:"node"`
}

SpanExpressionProto is a proto representation of an inverted.Expression tree consisting only of SpanExpressions. It is intended for use in expression execution.

func (*SpanExpressionProto) Descriptor

func (*SpanExpressionProto) Descriptor() ([]byte, []int)

func (*SpanExpressionProto) Marshal

func (m *SpanExpressionProto) Marshal() (dAtA []byte, err error)

func (*SpanExpressionProto) MarshalTo

func (m *SpanExpressionProto) MarshalTo(dAtA []byte) (int, error)

func (*SpanExpressionProto) MarshalToSizedBuffer

func (m *SpanExpressionProto) MarshalToSizedBuffer(dAtA []byte) (int, error)

func (*SpanExpressionProto) ProtoMessage

func (*SpanExpressionProto) ProtoMessage()

func (*SpanExpressionProto) Reset

func (m *SpanExpressionProto) Reset()

func (*SpanExpressionProto) Size

func (m *SpanExpressionProto) Size() (n int)

func (*SpanExpressionProto) String

func (m *SpanExpressionProto) String() string

func (*SpanExpressionProto) Unmarshal

func (m *SpanExpressionProto) Unmarshal(dAtA []byte) error

func (*SpanExpressionProto) XXX_DiscardUnknown

func (m *SpanExpressionProto) XXX_DiscardUnknown()

func (*SpanExpressionProto) XXX_Marshal

func (m *SpanExpressionProto) XXX_Marshal(b []byte, deterministic bool) ([]byte, error)

func (*SpanExpressionProto) XXX_Merge

func (m *SpanExpressionProto) XXX_Merge(src proto.Message)

func (*SpanExpressionProto) XXX_Size

func (m *SpanExpressionProto) XXX_Size() int

func (*SpanExpressionProto) XXX_Unmarshal

func (m *SpanExpressionProto) XXX_Unmarshal(b []byte) error

type SpanExpressionProtoSpans

type SpanExpressionProtoSpans []SpanExpressionProto_Span

SpanExpressionProtoSpans is a slice of SpanExpressionProto_Span.

func (SpanExpressionProtoSpans) End

func (s SpanExpressionProtoSpans) End(i int) []byte

End implements the span.InvertedSpans interface.

func (SpanExpressionProtoSpans) Len

func (s SpanExpressionProtoSpans) Len() int

Len implements the span.InvertedSpans interface.

func (SpanExpressionProtoSpans) Start

func (s SpanExpressionProtoSpans) Start(i int) []byte

Start implements the span.InvertedSpans interface.

type SpanExpressionProto_Node

type SpanExpressionProto_Node struct {
	FactoredUnionSpans []SpanExpressionProto_Span `protobuf:"bytes,1,rep,name=factored_union_spans,json=factoredUnionSpans,proto3" json:"factored_union_spans"`
	Operator           SetOperator                `protobuf:"varint,2,opt,name=operator,proto3,enum=cockroach.sql.inverted.SetOperator" json:"operator,omitempty"`
	Left               *SpanExpressionProto_Node  `protobuf:"bytes,3,opt,name=left,proto3" json:"left,omitempty"`
	Right              *SpanExpressionProto_Node  `protobuf:"bytes,4,opt,name=right,proto3" json:"right,omitempty"`
}

func (*SpanExpressionProto_Node) Descriptor

func (*SpanExpressionProto_Node) Descriptor() ([]byte, []int)

func (*SpanExpressionProto_Node) Marshal

func (m *SpanExpressionProto_Node) Marshal() (dAtA []byte, err error)

func (*SpanExpressionProto_Node) MarshalTo

func (m *SpanExpressionProto_Node) MarshalTo(dAtA []byte) (int, error)

func (*SpanExpressionProto_Node) MarshalToSizedBuffer

func (m *SpanExpressionProto_Node) MarshalToSizedBuffer(dAtA []byte) (int, error)

func (*SpanExpressionProto_Node) ProtoMessage

func (*SpanExpressionProto_Node) ProtoMessage()

func (*SpanExpressionProto_Node) Reset

func (m *SpanExpressionProto_Node) Reset()

func (*SpanExpressionProto_Node) Size

func (m *SpanExpressionProto_Node) Size() (n int)

func (*SpanExpressionProto_Node) String

func (m *SpanExpressionProto_Node) String() string

func (*SpanExpressionProto_Node) Unmarshal

func (m *SpanExpressionProto_Node) Unmarshal(dAtA []byte) error

func (*SpanExpressionProto_Node) XXX_DiscardUnknown

func (m *SpanExpressionProto_Node) XXX_DiscardUnknown()

func (*SpanExpressionProto_Node) XXX_Marshal

func (m *SpanExpressionProto_Node) XXX_Marshal(b []byte, deterministic bool) ([]byte, error)

func (*SpanExpressionProto_Node) XXX_Merge

func (m *SpanExpressionProto_Node) XXX_Merge(src proto.Message)

func (*SpanExpressionProto_Node) XXX_Size

func (m *SpanExpressionProto_Node) XXX_Size() int

func (*SpanExpressionProto_Node) XXX_Unmarshal

func (m *SpanExpressionProto_Node) XXX_Unmarshal(b []byte) error

type SpanExpressionProto_Span

type SpanExpressionProto_Span struct {
	Start []byte `protobuf:"bytes,1,opt,name=start,proto3" json:"start,omitempty"`
	End   []byte `protobuf:"bytes,2,opt,name=end,proto3" json:"end,omitempty"`
}

Span is a span of the inverted index. Represents [start, end).

func (*SpanExpressionProto_Span) Descriptor

func (*SpanExpressionProto_Span) Descriptor() ([]byte, []int)

func (*SpanExpressionProto_Span) Marshal

func (m *SpanExpressionProto_Span) Marshal() (dAtA []byte, err error)

func (*SpanExpressionProto_Span) MarshalTo

func (m *SpanExpressionProto_Span) MarshalTo(dAtA []byte) (int, error)

func (*SpanExpressionProto_Span) MarshalToSizedBuffer

func (m *SpanExpressionProto_Span) MarshalToSizedBuffer(dAtA []byte) (int, error)

func (*SpanExpressionProto_Span) ProtoMessage

func (*SpanExpressionProto_Span) ProtoMessage()

func (*SpanExpressionProto_Span) Reset

func (m *SpanExpressionProto_Span) Reset()

func (*SpanExpressionProto_Span) Size

func (m *SpanExpressionProto_Span) Size() (n int)

func (*SpanExpressionProto_Span) String

func (m *SpanExpressionProto_Span) String() string

func (*SpanExpressionProto_Span) Unmarshal

func (m *SpanExpressionProto_Span) Unmarshal(dAtA []byte) error

func (*SpanExpressionProto_Span) XXX_DiscardUnknown

func (m *SpanExpressionProto_Span) XXX_DiscardUnknown()

func (*SpanExpressionProto_Span) XXX_Marshal

func (m *SpanExpressionProto_Span) XXX_Marshal(b []byte, deterministic bool) ([]byte, error)

func (*SpanExpressionProto_Span) XXX_Merge

func (m *SpanExpressionProto_Span) XXX_Merge(src proto.Message)

func (*SpanExpressionProto_Span) XXX_Size

func (m *SpanExpressionProto_Span) XXX_Size() int

func (*SpanExpressionProto_Span) XXX_Unmarshal

func (m *SpanExpressionProto_Span) XXX_Unmarshal(b []byte) error

type Spans

type Spans []Span

Spans is a slice of Span objects.

func (Spans) End

func (is Spans) End(i int) []byte

End implements the span.KeyableInvertedSpans interface.

func (Spans) Equals

func (is Spans) Equals(other Spans) bool

Equals returns true if this Spans has the same spans as the given Spans, in the same order.

func (Spans) Format

func (is Spans) Format(tp treeprinter.Node, label string, redactable bool)

Format pretty-prints the spans.

func (Spans) Len

func (is Spans) Len() int

Len implements sort.Interface.

func (Spans) Less

func (is Spans) Less(i, j int) bool

Less implements sort.Interface, when Spans is known to contain non-overlapping spans.

func (Spans) Start

func (is Spans) Start(i int) []byte

Start implements the span.KeyableInvertedSpans interface.

func (Spans) Swap

func (is Spans) Swap(i, j int)

Swap implements the sort.Interface.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL