luci: go.chromium.org/luci/appengine/mapper/splitter

package splitter

import "go.chromium.org/luci/appengine/mapper/splitter"

Package splitter implements the SplitIntoRanges function, which is useful for splitting a large datastore query into a number of smaller queries with approximately evenly sized result sets.

It is based on the special __scatter__ property. For more info see: https://github.com/GoogleCloudPlatform/appengine-mapreduce/wiki/ScatterPropertyImplementation

Package Files

split.go

type Params

type Params struct {
    // Shards is the maximum number of key ranges to return.
    //
    // Should be >=1. The function may return fewer key ranges if the query has
    // very few results. In the most extreme case it can return one shard that
    // covers the entirety of the key space.
    Shards int

    // Samples tells how many random entities to sample when deciding where to
    // split the query.
    //
    // A higher number of samples means better accuracy of the split in exchange
    // for slower execution of SplitIntoRanges. For a large number of shards
    // (hundreds), the number of samples can be set to the number of shards. For
    // a small number of shards (tens), it makes sense to sample 16x or even 32x
    // more entities.
    //
    // If Samples is 0, the default of 512 is used. If Shards >= Samples, Shards
    // is used instead.
    Samples int
}

Params are passed to SplitIntoRanges.

See the doc for SplitIntoRanges for more info.
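
The defaulting rules described for Samples can be sketched as follows. This is only an illustration: effectiveSamples is a hypothetical helper, not part of the splitter package, and it assumes the default of 512 is applied before the comparison with Shards.

```go
package main

import "fmt"

// defaultSamples mirrors the documented default for Params.Samples.
const defaultSamples = 512

// effectiveSamples is a hypothetical helper (not part of the splitter
// package) that resolves the number of samples to query.
// Assumption: the default is applied first, then Shards is used if it is
// greater than or equal to the resulting value.
func effectiveSamples(shards, samples int) int {
	if samples == 0 {
		samples = defaultSamples
	}
	if shards >= samples {
		samples = shards
	}
	return samples
}

func main() {
	fmt.Println(effectiveSamples(8, 0))    // Samples unset: prints 512
	fmt.Println(effectiveSamples(16, 256)) // explicit 16x oversampling: prints 256
	fmt.Println(effectiveSamples(1000, 0)) // Shards exceeds the default: prints 1000
}
```

For a handful of shards, an explicit value such as 16x the shard count follows the guidance in the field comment above.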

type Range

type Range struct {
    Start *datastore.Key // if nil, then the range represents (0x000..., End]
    End   *datastore.Key // if nil, then the range represents (Start, 0xfff...)
}

Range represents a range of datastore keys (Start, End].
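
To make the (Start, End] semantics concrete, here is a minimal sketch that uses plain strings in place of *datastore.Key. The keyRange type and its contains method are hypothetical stand-ins, and string comparison is only an approximation of real datastore key ordering.

```go
package main

import "fmt"

// keyRange is a stand-in for Range that uses strings instead of
// *datastore.Key. A nil bound means the range is unbounded on that side.
type keyRange struct {
	start *string
	end   *string
}

// contains reports whether key falls into (start, end]: the start bound is
// exclusive and the end bound is inclusive.
func (r keyRange) contains(key string) bool {
	if r.start != nil && key <= *r.start {
		return false
	}
	if r.end != nil && key > *r.end {
		return false
	}
	return true
}

// strPtr returns a pointer to s, for building bounds inline.
func strPtr(s string) *string { return &s }

func main() {
	r := keyRange{start: strPtr("b"), end: strPtr("d")}
	for _, k := range []string{"a", "b", "c", "d", "e"} {
		fmt.Printf("%s in (b, d]: %v\n", k, r.contains(k))
	}
	// Only "c" and "d" are reported as contained.
}
```

Because one bound is exclusive and the other inclusive, adjacent ranges that share a boundary key never overlap and never leave a gap.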

func SplitIntoRanges

func SplitIntoRanges(c context.Context, q *datastore.Query, p Params) ([]Range, error)

SplitIntoRanges returns a list of key ranges (up to 'Shards') that together cover the results of the provided query.

If all query results are fetched and distributed among the returned ranges, the resulting buckets are approximately equal in size.

Internally it uses the special entity property __scatter__, which is set on ~0.8% of datastore entities. Querying entities ordered by __scatter__ returns a pseudorandom sample of the entities that match the query. To improve the chances of an even split, the function queries 'Samples' entities and then picks the split points evenly among them.

If the given query has filters, SplitIntoRanges may need a corresponding composite index that includes the __scatter__ field.

It may return fewer ranges than requested if it detects that there are too few entities. In the extreme case it may return a single range (000..., fff...), represented by a Range struct with both 'Start' and 'End' set to nil.
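
The idea of picking split points evenly among sampled keys can be sketched as below. This is not the package's actual implementation, which is not shown on this page. splitEvenly and keyRange are hypothetical names, keys are modeled as non-empty strings, and an empty string stands in for a nil (open) bound.

```go
package main

import (
	"fmt"
	"sort"
)

// keyRange models (start, end] over string keys. An empty string stands in
// for a nil bound, so real keys are assumed to be non-empty.
type keyRange struct{ start, end string }

// splitEvenly sorts the sampled keys and picks up to shards-1 evenly spaced
// split points among them, returning the ranges between those points.
func splitEvenly(samples []string, shards int) []keyRange {
	if shards < 1 {
		shards = 1
	}
	sorted := append([]string(nil), samples...)
	sort.Strings(sorted)

	// n samples can produce at most n split points, hence n+1 ranges.
	if shards > len(sorted)+1 {
		shards = len(sorted) + 1
	}

	ranges := make([]keyRange, 0, shards)
	prev := ""
	for i := 1; i < shards; i++ {
		point := sorted[i*len(sorted)/shards]
		if point == prev {
			continue // duplicate sample, would produce an empty range
		}
		ranges = append(ranges, keyRange{start: prev, end: point})
		prev = point
	}
	// The final range is open-ended and covers all keys after the last point.
	return append(ranges, keyRange{start: prev})
}

func main() {
	samples := []string{"m", "c", "x", "f", "t", "j", "q"}
	for _, r := range splitEvenly(samples, 4) {
		fmt.Printf("(%q, %q]\n", r.start, r.end)
	}
	// With no samples, a single range covering the whole key space remains.
	fmt.Println(len(splitEvenly(nil, 8)))
}
```

In this sketch, more samples give finer-grained candidates for split points, which is the accuracy-versus-speed trade-off described for Params.Samples.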

func (Range) Apply

func (r Range) Apply(q *datastore.Query) *datastore.Query

Apply adds >Start and <=End filters to the query and returns the resulting query.

func (Range) IsEmpty

func (r Range) IsEmpty() bool

IsEmpty is true if the range represents an empty set.
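
One way to read this for a half-open interval (Start, End]: the set is empty exactly when both bounds are present and Start is not less than End. The sketch below illustrates that reading with string keys. keyRange and isEmpty are hypothetical stand-ins, and the real method may apply different rules.

```go
package main

import "fmt"

// keyRange is a stand-in for Range that uses strings instead of
// *datastore.Key. A nil bound means the range is unbounded on that side.
type keyRange struct {
	start *string
	end   *string
}

// isEmpty reports whether (start, end] contains no keys.
// Assumption: a range with a nil bound is treated as never empty.
func (r keyRange) isEmpty() bool {
	if r.start == nil || r.end == nil {
		return false
	}
	return *r.start >= *r.end
}

// strPtr returns a pointer to s, for building bounds inline.
func strPtr(s string) *string { return &s }

func main() {
	fmt.Println(keyRange{strPtr("b"), strPtr("d")}.isEmpty()) // false
	fmt.Println(keyRange{strPtr("d"), strPtr("d")}.isEmpty()) // true: (d, d] has no keys
	fmt.Println(keyRange{}.isEmpty())                         // false: whole key space
}
```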

Package splitter imports 4 packages and is imported by 1 package. Updated 2018-11-15.