Published: Mar 29, 2016 License: MIT Imports: 24 Imported by: 0


scoredb

A simple database index optimized for returning results by custom scoring functions.

To my knowledge, it is the only open source system with an algorithm designed for this purpose; in some cases, it is faster than elasticsearch's implementation by an order of magnitude. (see below)

Why?

Scoredb is optimized for systems that want to find the top scoring results, where the scoring function is specified by the client, and may depend on more than one field. It may be a good choice for any system that needs to incorporate multiple factors when returning results. For instance, it might power a used car website to produce results based on factors like mileage, year, and distance.

Run It

Though Scoredb has a straightforward programmatic interface, you can run a simple standalone HTTP server like so:

$ go get github.com/pschanely/scoredb
$ go install github.com/pschanely/scoredb/...
$ ${GOPATH}/bin/scoredb serve -datadir my_data_directory -port 11625

... and in another shell:

# insert some people with ages and weights
$ curl -XPUT http://localhost:11625/jim -d '{"age":21, "weight":170}'
$ curl -XPUT http://localhost:11625/bob -d '{"age":34, "weight":150}'

# get people by age
$ curl -G 'http://localhost:11625' --data-urlencode 'score=["field", "age"]'
{"Ids":["bob","jim"]}

# get people by the sum of their age and weight:
$ curl -G 'http://localhost:11625' --data-urlencode 'score=["sum", ["field", "age"], ["field", "weight"]]'
{"Ids":["jim","bob"]}

The Algorithm

Scoredb uses a format on disk that is very similar to that used by text search systems like solr and elasticsearch. We divide each field into ranges of values (buckets) and, for each bucket, maintain a file containing the IDs of objects that have their value inside that range.

The IDs in each file are strictly increasing; this means that we can traverse several buckets efficiently by using a heap of buckets to find the next smallest id among many buckets.
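As a sketch of that heap-based traversal (illustrative only; `bucket` and `mergeIDs` are hypothetical names, not scoredb's internals):

```go
package main

import (
	"container/heap"
	"fmt"
)

// bucket is a posting list: a strictly increasing slice of doc IDs.
type bucket struct {
	ids []int64
	pos int
}

func (b *bucket) cur() int64 { return b.ids[b.pos] }

// bucketHeap orders buckets by their current (smallest unread) doc ID.
type bucketHeap []*bucket

func (h bucketHeap) Len() int            { return len(h) }
func (h bucketHeap) Less(i, j int) bool  { return h[i].cur() < h[j].cur() }
func (h bucketHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *bucketHeap) Push(x interface{}) { *h = append(*h, x.(*bucket)) }
func (h *bucketHeap) Pop() interface{} {
	old := *h
	item := old[len(old)-1]
	*h = old[:len(old)-1]
	return item
}

// mergeIDs visits the IDs of several buckets in globally increasing order,
// always advancing whichever bucket currently holds the smallest next ID.
func mergeIDs(buckets []*bucket) []int64 {
	h := bucketHeap{}
	for _, b := range buckets {
		if len(b.ids) > 0 {
			h = append(h, b)
		}
	}
	heap.Init(&h)
	var out []int64
	for h.Len() > 0 {
		b := h[0] // bucket with the smallest current ID
		out = append(out, b.cur())
		b.pos++
		if b.pos < len(b.ids) {
			heap.Fix(&h, 0) // its current ID grew; restore heap order
		} else {
			heap.Pop(&h) // bucket exhausted
		}
	}
	return out
}

func main() {
	got := mergeIDs([]*bucket{
		{ids: []int64{1, 4, 9}},
		{ids: []int64{2, 3, 8}},
	})
	fmt.Println(got) // [1 2 3 4 8 9]
}
```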

As we traverse the buckets, we score the objects produced and put them into a candidate result set. The result set is capped at the &limit= parameter specified by the user. As poorly scoring results get kicked out of the candidate result set, we can infer a lower bound on the final score. With some math, we can propagate that lower bound backwards through the scoring function to infer bounds on the individual fields. These bounds may then be used to stop traversing very poorly scoring buckets that could not produce a good enough final score. In this manner, as the candidate result set gets better and better, the system can eliminate more and more buckets to arrive at a result very quickly.
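For a weighted-sum scorer, the "propagate the lower bound backwards" step reduces to simple arithmetic. The sketch below (hypothetical names; scoredb's actual implementation works through its iterator tree) computes, for each field, the smallest value a document could have and still possibly reach the score threshold:

```go
package main

import "fmt"

// fieldBound holds a field's weight and its known value range [Min, Max].
type fieldBound struct{ W, Min, Max float64 }

// requiredMins back-propagates a lower bound on the total score
// (score = sum of W*value over fields, all W > 0) into per-field bounds:
// a document can only reach `threshold` if field j contributes at least
// threshold minus the best possible contribution of every other field.
func requiredMins(fields []fieldBound, threshold float64) []float64 {
	bestTotal := 0.0
	for _, f := range fields {
		bestTotal += f.W * f.Max
	}
	mins := make([]float64, len(fields))
	for j, f := range fields {
		othersBest := bestTotal - f.W*f.Max
		mins[j] = (threshold - othersBest) / f.W
	}
	return mins
}

func main() {
	// score = 10*num_children + age; children in [0,5], age in [0,90].
	fields := []fieldBound{{W: 10, Min: 0, Max: 5}, {W: 1, Min: 0, Max: 90}}
	// Once the candidate set implies score >= 120, buckets with
	// num_children < 3 (or age < 70) can be skipped entirely: even a
	// maximal value in the other field cannot make up the difference.
	fmt.Println(requiredMins(fields, 120)) // [3 70]
}
```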

The following graph shows bucket elimination over the course of an example query combining two fields, "age" and "wages":

Performance

Few database systems support custom scoring functions, and fewer (possibly none?) use algorithms designed for that purpose. In practice, I've found elasticsearch's custom scoring functions to be quite fast, so I've benchmarked against it here. Please let me know about other systems I might benchmark against!

This is a graph of how 5 different queries perform with varying database sizes (yellow is elasticsearch and blue is scoredb):

The elasticsearch query times (yellow) look like they're rising exponentially, but it's actually linear because the X-axis has a logarithmic scale.

The dataset is anonymized US census data, each object representing an individual. These are the scoring functions used for benchmarking, in order from fastest to slowest (for scoredb):

10 * number_of_children + age
10000 * age + yearly_wages
100 * age + yearly_wages
40 * gender + weekly_work_hours
100.0 * gender + 9 * num_children + age + weekly_work_hours
5 * num_children + age + weekly_work_hours

This is an unscientific test! Just my personal laptop, this datafile repeated a few times over for the biggest datasets, and scoredb benchmark -maxrecords 10000000 -csv censusdata.csv. There's no substitute for testing with your own data, queries, and hardware.

It's clear from the graph that scoredb's performance can vary significantly based on the scoring function. Some guidance on scoring:

  • Prefer to combine fields with addition, multiplication, and, in particular, minimum, because they allow the computation of useful lower bounds. Combining fields with a max() function does not, because a bad value in one field can be completely overcome by a good value in another.
  • Combining many fields instead of a few will make the query take longer, because it takes longer to determine useful lower bounds on each field.
  • Prefer to engineer weights so that the contributions from each of your fields are similar in scale. Scoredb may never be able to find useful bounds on fields that tweak the final score only slightly.

Limitations

Scoredb is minimalistic and highly specialized; it is intended to just act as one piece of a larger system:

  • Scoredb has no delete or update operation. To remove or change an object, you must build a new index. See below for how to swap a new index in under a running instance without downtime.
  • It stores objects as a flat set of key-value pairs with string keys and numeric values only. (internally, all values are 32 bit floating point values)
  • Scoredb can only respond to queries with lists of identifiers; scoredb's indexes do not provide efficient access to the original field data.
  • Scoredb has no built-in clustering, redundancy, or backup functions.
  • Adding objects to scoredb is slow if you add them one at a time. Bulk insertion should be used whenever possible.
  • Scoredb requires many open files; sometimes thousands of them. You will need to increase default filehandle limits on your system (see "ulimit" on linux).
  • Scoredb expects you to provide every field for every object; objects that are missing a field cannot be returned from queries that use the missing fields.
  • Scoredb data files are endian specific; most modern CPUs are little endian, so you won't normally have to worry about this.

Index Bulk Load

You can create a database without running a server using the scoredb load command, which expects newline-separated JSON records on stdin. For instance:

printf '{"id":"person_1", "values":{"age":10, "height":53}}\n' > data.jsonl
printf '{"id":"person_2", "values":{"age":32, "height":68}}\n' >> data.jsonl
cat data.jsonl | scoredb load

Index Swapping

If you need deletes or updates, you'll have to periodically rebuild your database and swap in updated versions. If you specify the -automigrate option to the server, it will look for new database directories that begin with the given data directory and keep the (lexicographically largest) one live. Use an atomic mv command to put it in place like so:

$ cat new_data.jsonlines | scoredb load -datadir ./live_db_v00001  # Load initial data
$ scoredb serve -readonly -automigrate -datadir ./live_db_v        # Start server

# when ready for a new version of the database,

$ cat new_data.jsonlines | scoredb load -datadir ./tmp_db          # Create the database
$ mv ./tmp_db ./live_db_v00002                                     # Rename to match the watched prefix

# The server should detect and load the new database here.

$ rm -rf ./live_db_v00001                                          # Now, remove the old database

Supported Query Functions

As shown above, queries are expressed as JSON expressions and then url encoded into the "score" query parameter. Each expression takes a lisp-like form: [<function name>, <argument 1>, <argument 2>, ...]. These are the supported functions:

["field", <field_name>]

Simply produces the value of <field_name> as a score.

  • Example: ["field", "age"] (return the age value as a score)
["scale", <factor>, <subexpression>]

Takes the result of <subexpression> and multiplies it by <factor>. <factor> may be negative.

  • Example: ["scale", 2.0, ["field", "age"]] (age, doubled)
["sum", <subexpression 1>, <subexpression 2>, ...]

Sums the results of each <subexpression>.

  • Example: ["sum", ["field", "age"], ["field", "height"]] (add age and height together)
["product", <subexpression 1>, <subexpression 2>, ...]

Multiplies the result of each <subexpression> together. For bounding reasons, negative inputs are not allowed.

  • Example: ["product", ["field", "age"], ["field", "height"]] (multiply age by height)
["min", <subexpression 1>, <subexpression 2>, ...]

Takes the least score resulting from all <subexpression>s.

  • Example: ["min", ["field", "age"], ["field", "height"]] (Take age or height, whichever is smaller)

["diff", <subexpression 1>, <subexpression 2>]

Returns the absolute difference between the values produced by the two subexpressions.

  • Example: ["diff", ["field", "age"], ["field", "height"]] (the difference between each age and height)
["pow", <subexpression>, <exponent>]

Raises the result from the given subexpression to the <exponent> power.
<exponent> may be fractional (for Nth roots) or negative.
However, for bounding reasons, the subexpression may not produce negative values.

  • Example: ["pow", ["field", "age"], 2.0] (age, squared)
["custom_linear", [[<x1>, <y1>], [<x2>, <y2>], ..], <subexpression>]

Establishes a user-defined function using a set of linearly interpolated [x, y] points. Inputs smaller than the smallest X value or larger than the largest X value get the closest specified Y value.

  • Example: ["custom_linear", [[0, 0.0], [30, 1.0], [80, 0.0]], ["field", "age"]] Mapping ages to scores: 30-year-olds get a score of one, gradually declining to a score of zero for infants and the elderly.
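The interpolation rule described above can be sketched like so (the `customLinear` helper is a hypothetical illustration of the documented behavior, not scoredb's internal CustomLinearDocItr):

```go
package main

import "fmt"

type point struct{ X, Y float64 }

// customLinear linearly interpolates between user-supplied (x, y) points,
// which must be sorted by X. Inputs outside the covered range clamp to
// the nearest endpoint's Y, as documented for ["custom_linear", ...].
func customLinear(points []point, x float64) float64 {
	if x <= points[0].X {
		return points[0].Y
	}
	last := points[len(points)-1]
	if x >= last.X {
		return last.Y
	}
	for i := 1; i < len(points); i++ {
		if x <= points[i].X {
			p0, p1 := points[i-1], points[i]
			t := (x - p0.X) / (p1.X - p0.X) // fraction of the way across this segment
			return p0.Y + t*(p1.Y-p0.Y)
		}
	}
	return last.Y
}

func main() {
	curve := []point{{0, 0.0}, {30, 1.0}, {80, 0.0}}
	fmt.Println(customLinear(curve, 15)) // 0.5: halfway up the ramp
	fmt.Println(customLinear(curve, 90)) // 0: clamped past the last point
}
```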
["geo_distance", <lat>, <lng>, <lat field name>, <lng field name>]

Returns the distance to a fixed point in kilometers as a score.
This is experimental: may be inaccurate for large distances, and fails across the prime meridian.
Since you typically want smaller distances to have higher scores, you'll probably want to wrap the "scale" or "custom_linear" functions around this one to invert it.

  • Example: ["geo_distance", 40.7, -74.0, "home_lat", "home_lng"] Scores each result by how far its home_lat and home_lng fields put it from New York City.

Status

Though it has reasonable test coverage and a small, straightforward codebase, scoredb is certainly alpha-quality software.

Your bug reports are greatly appreciated.

Thanks

Thanks are due to the Samsung Accelerator which let us start this project as a hackathon proof of concept. Scoredb was built with this awesome team (in github lexicographic order!):

Plugs

Check out some of our other side projects too:

  • wildflower-touch is proof-of-concept programming IDE and language for touch devices.
  • music-tonight makes playlists of bands playing near you, tonight.

Documentation

Index

Constants

This section is empty.

Variables

View Source
var INITIAL_VAR_BITS = uint(23 - 0)
View Source
var NegativeInfinity = float32(math.Inf(-1))
View Source
var PositiveInfinity = float32(math.Inf(1))

Functions

func Abs

func Abs(val float32) float32

func CandidateIsLess

func CandidateIsLess(r1, r2 DocScore) bool

func CheckIntersection

func CheckIntersection(yValue float32, p1, p2 CustomPoint, insideMin, insideMax *float32)

func CloseWriters

func CloseWriters(db *FsScoreDb) error

func ComputeCustomFunc

func ComputeCustomFunc(x float32, points []CustomPoint) float32

func EnsureDirectory

func EnsureDirectory(dir string) error

func Exists

func Exists(path string) bool

func FileIsAtEnd

func FileIsAtEnd(file *os.File) bool

func Max

func Max(v1, v2 float32) float32

func MaxDocsForFile

func MaxDocsForFile(fileInfo *FileInfo) int64

func Min

func Min(v1, v2 float32) float32

func Pow

func Pow(val, exp float32) float32

func QueryFloatVal

func QueryFloatVal(queryParams url.Values, key string, defaultValue float32) (float32, error)

func QueryIntVal

func QueryIntVal(queryParams url.Values, key string, defaultValue int) (int, error)

func ReadNativeLong

func ReadNativeLong(buf []byte) uint64

func RunBenchmark

func RunBenchmark(db LinearCombinationBackend, csvFilename string, maxRecords int64) ([]int64, []int64, [][]int64, error)

func RunItr

func RunItr(itr DocItr, myWorkerNum int, resultChannel chan CandidateResult, boundsChannel chan Bounds)

func ServeHttp

func ServeHttp(addr string, db Db, readOnly bool) error

func ShardIdToExt

func ShardIdToExt(idInShard int64, shardNum int) int64

func ToFloat32

func ToFloat32(val interface{}) (float32, error)

func WriteNativeLong

func WriteNativeLong(val uint64, writer io.Writer) error

func WritePostingListEntry

func WritePostingListEntry(fileInfo *FileInfo, docId int64, score float32)

Types

type BaseDb

type BaseDb struct {
	StreamingDb StreamingDb
	IdDb        IdBackend
}

func (BaseDb) BulkIndex

func (db BaseDb) BulkIndex(records []Record) error

func (BaseDb) Index

func (db BaseDb) Index(id string, values map[string]float32) error

func (BaseDb) LinearQuery

func (db BaseDb) LinearQuery(numResults int, weights map[string]float32) []string

func (BaseDb) Query

func (db BaseDb) Query(query Query) (QueryResult, error)

type BaseDbResultSet

type BaseDbResultSet []DocScore

func (BaseDbResultSet) Len

func (h BaseDbResultSet) Len() int

func (BaseDbResultSet) Less

func (h BaseDbResultSet) Less(i, j int) bool

func (*BaseDbResultSet) Pop

func (h *BaseDbResultSet) Pop() interface{}

func (*BaseDbResultSet) Push

func (h *BaseDbResultSet) Push(x interface{})

func (BaseDbResultSet) Swap

func (h BaseDbResultSet) Swap(i, j int)

type BaseStreamingDb

type BaseStreamingDb struct {
	Backend DbBackend
}

func (BaseStreamingDb) BulkIndex

func (db BaseStreamingDb) BulkIndex(records []map[string]float32) ([]int64, error)

func (BaseStreamingDb) QueryItr

func (db BaseStreamingDb) QueryItr(scorer []interface{}) (DocItr, error)

type BitReader

type BitReader struct {
	OrigMmap        *mmap.MMap
	Mmap            []uint64
	MmapPtr         uint
	MmapPtrBitsLeft uint
	File            *os.File
	Cur             uint64
	CurBitsLeft     uint
}

func NewBitReader

func NewBitReader(file *os.File) (*BitReader, error)

func (*BitReader) Close

func (reader *BitReader) Close() error

func (*BitReader) ReadBits

func (reader *BitReader) ReadBits(numBits uint) (uint64, error)

func (*BitReader) ReadVarUInt32

func (reader *BitReader) ReadVarUInt32() (uint32, error)

func (*BitReader) Refill

func (reader *BitReader) Refill(cur uint64, bitsLeft uint, numNeeded uint) (uint64, uint, error)

type BitWriter

type BitWriter struct {
	BufferedWriter *bufio.Writer
	File           *os.File
	Cur            uint64
	CurBitsUsed    uint
}

func NewBitWriter

func NewBitWriter(file *os.File) (*BitWriter, error)

func (*BitWriter) Close

func (writer *BitWriter) Close() error

func (*BitWriter) WriteBits

func (writer *BitWriter) WriteBits(val uint64, numBits uint) error

func (*BitWriter) WriteVarUInt32

func (writer *BitWriter) WriteVarUInt32(val uint32) error

type BoltIdDb

type BoltIdDb struct {
	Db *bolt.DB
}

func NewBoltIdDb

func NewBoltIdDb(file string) (*BoltIdDb, error)

func (*BoltIdDb) Get

func (db *BoltIdDb) Get(scoreIds []int64) ([]string, error)

func (*BoltIdDb) Put

func (db *BoltIdDb) Put(scoreIds []int64, clientIds []string) error

type Bounds

type Bounds struct {
	// contains filtered or unexported fields
}

type CandidateResult

type CandidateResult struct {
	DocId     int64
	Score     float32
	WorkerNum int
}

type CustomLinearDocItr

type CustomLinearDocItr struct {
	// contains filtered or unexported fields
}

Remaps a value according to a user-specified function that linearly interpolates among a set of (x, y) points.

func (*CustomLinearDocItr) Close

func (op *CustomLinearDocItr) Close()

func (*CustomLinearDocItr) Cur

func (op *CustomLinearDocItr) Cur() (int64, float32)

func (*CustomLinearDocItr) GetBounds

func (op *CustomLinearDocItr) GetBounds() (min, max float32)

func (*CustomLinearDocItr) Name

func (op *CustomLinearDocItr) Name() string

func (*CustomLinearDocItr) Next

func (op *CustomLinearDocItr) Next(minId int64) bool

func (*CustomLinearDocItr) SetBounds

func (op *CustomLinearDocItr) SetBounds(outsideMin, outsideMax float32) bool

type CustomMapDocItr

type CustomMapDocItr struct {
	// contains filtered or unexported fields
}

Remaps a value according to a user-specified mapping of values to scores

func (*CustomMapDocItr) Close

func (op *CustomMapDocItr) Close()

func (*CustomMapDocItr) ComputeCustomFunc

func (op *CustomMapDocItr) ComputeCustomFunc(val float32) float32

func (*CustomMapDocItr) Cur

func (op *CustomMapDocItr) Cur() (int64, float32)

func (*CustomMapDocItr) GetBounds

func (op *CustomMapDocItr) GetBounds() (min, max float32)

func (*CustomMapDocItr) Name

func (op *CustomMapDocItr) Name() string

func (*CustomMapDocItr) Next

func (op *CustomMapDocItr) Next(minId int64) bool

func (*CustomMapDocItr) SetBounds

func (op *CustomMapDocItr) SetBounds(outsideMin, outsideMax float32) bool

type CustomPoint

type CustomPoint struct {
	X, Y float32
}

func ToXyPoints

func ToXyPoints(input interface{}) ([]CustomPoint, error)

type Db

type Db interface {
	BulkIndex(records []Record) error
	Index(id string, values map[string]float32) error
	Query(query Query) (QueryResult, error)
}

type DbBackend

type DbBackend interface {
	BulkIndex(records []map[string]float32) ([]int64, error)
	FieldDocItr(field string) DocItr
}

type DiffDocItr

type DiffDocItr struct {
	// contains filtered or unexported fields
}

(Absolute) difference between a value and a constant

func (*DiffDocItr) Close

func (op *DiffDocItr) Close()

func (*DiffDocItr) Cur

func (op *DiffDocItr) Cur() (int64, float32)

func (*DiffDocItr) GetBounds

func (op *DiffDocItr) GetBounds() (min, max float32)

func (*DiffDocItr) Name

func (op *DiffDocItr) Name() string

func (*DiffDocItr) Next

func (op *DiffDocItr) Next(minId int64) bool

func (*DiffDocItr) SetBounds

func (op *DiffDocItr) SetBounds(min, max float32) bool

type DocItr

type DocItr interface {
	Name() string

	// return false if the iterator is now known to not produce any more values
	SetBounds(min, max float32) bool

	GetBounds() (min, max float32)

	// Next() skips the iterator ahead to at least as far as the given id.
	// It always advances the iterator at least one position.
	// It returns false if there are no remaining values.
	// Iterators need a call to Next(0) to initialize them to a real value; they all initially have a docId of -1.
	Next(minId int64) bool

	Close() // release resources held by this iterator (if any)

	Cur() (int64, float32) // doc id and score of current result, or (-1, 0.0) if the iterator has not been initialized

}
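A minimal, hypothetical implementation of this contract over in-memory slices (for illustration only; `sliceItr` is not part of the package):

```go
package main

import "fmt"

// DocItr reproduced here so the sketch is self-contained.
type DocItr interface {
	Name() string
	SetBounds(min, max float32) bool
	GetBounds() (min, max float32)
	Next(minId int64) bool
	Close()
	Cur() (int64, float32)
}

// sliceItr is a toy DocItr over parallel id/score slices, honoring the
// documented contract: docId starts at -1, Next(0) initializes, and
// Next(minId) always advances at least one position.
type sliceItr struct {
	ids    []int64
	scores []float32
	pos    int // index into ids; -1 before initialization
}

var _ DocItr = (*sliceItr)(nil) // compile-time interface check

func newSliceItr(ids []int64, scores []float32) *sliceItr {
	return &sliceItr{ids: ids, scores: scores, pos: -1}
}

func (it *sliceItr) Name() string { return "sliceItr" }

// SetBounds ignores the bounds; returning true ("may still produce
// values") is always safe under the documented contract.
func (it *sliceItr) SetBounds(min, max float32) bool { return true }

func (it *sliceItr) GetBounds() (min, max float32) {
	min, max = it.scores[0], it.scores[0]
	for _, s := range it.scores {
		if s < min {
			min = s
		}
		if s > max {
			max = s
		}
	}
	return min, max
}

func (it *sliceItr) Next(minId int64) bool {
	it.pos++ // always advance at least one position
	for it.pos < len(it.ids) && it.ids[it.pos] < minId {
		it.pos++
	}
	return it.pos < len(it.ids)
}

func (it *sliceItr) Close() {}

func (it *sliceItr) Cur() (int64, float32) {
	if it.pos < 0 || it.pos >= len(it.ids) {
		return -1, 0.0
	}
	return it.ids[it.pos], it.scores[it.pos]
}

func main() {
	it := newSliceItr([]int64{3, 7, 9}, []float32{0.5, 1.5, 0.25})
	for it.Next(0) {
		fmt.Println(it.Cur())
	}
	it.Close()
}
```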

func NewPostingListDocItr

func NewPostingListDocItr(rangePrefix uint32, path string, header *PostingListHeader, numVarBits uint) DocItr

type DocScore

type DocScore struct {
	DocId int64
	Score float32
}

type EsQueryResponse

type EsQueryResponse struct {
	Hits struct {
		Hits []struct {
			Id string `json:"_id"`
		} `json:"hits"`
	} `json:"hits"`
}

type EsScoreDb

type EsScoreDb struct {
	BaseURL, Index string
}

func (*EsScoreDb) BulkIndex

func (db *EsScoreDb) BulkIndex(records []Record) error

func (*EsScoreDb) CreateIndex

func (db *EsScoreDb) CreateIndex()

func (*EsScoreDb) DeleteIndex

func (db *EsScoreDb) DeleteIndex()

func (*EsScoreDb) LinearQuery

func (db *EsScoreDb) LinearQuery(numResults int, weights map[string]float32) []string

func (*EsScoreDb) ParseQuery

func (db *EsScoreDb) ParseQuery(query string) map[string]float32

func (*EsScoreDb) RefreshIndex

func (db *EsScoreDb) RefreshIndex()

type FieldDocItr

type FieldDocItr struct {
	// contains filtered or unexported fields
}

func NewFieldDocItr

func NewFieldDocItr(field string, lists FieldDocItrs) *FieldDocItr

func (*FieldDocItr) Close

func (op *FieldDocItr) Close()

func (*FieldDocItr) Cur

func (op *FieldDocItr) Cur() (int64, float32)

func (*FieldDocItr) GetBounds

func (op *FieldDocItr) GetBounds() (min, max float32)

func (*FieldDocItr) Name

func (op *FieldDocItr) Name() string

func (*FieldDocItr) Next

func (op *FieldDocItr) Next(minId int64) bool

func (*FieldDocItr) SetBounds

func (op *FieldDocItr) SetBounds(min, max float32) bool

type FieldDocItrs

type FieldDocItrs []DocItr // FieldDocItrs implements heap.Interface

func (FieldDocItrs) Len

func (so FieldDocItrs) Len() int

func (FieldDocItrs) Less

func (so FieldDocItrs) Less(i, j int) bool

func (*FieldDocItrs) Pop

func (so *FieldDocItrs) Pop() interface{}

func (*FieldDocItrs) Push

func (so *FieldDocItrs) Push(x interface{})

func (FieldDocItrs) Swap

func (so FieldDocItrs) Swap(i, j int)

type FileInfo

type FileInfo struct {
	// contains filtered or unexported fields
}

func FindPostingListFileForWrite

func FindPostingListFileForWrite(db *FsScoreDb, docId int64, key string, value float32) (*FileInfo, error)

func MakeFileInfo

func MakeFileInfo(fieldDir string, value float32, numVarBits uint, docId int64) (*FileInfo, error)

type FsScoreDb

type FsScoreDb struct {
	// contains filtered or unexported fields
}

func NewFsScoreDb

func NewFsScoreDb(dataDir string) *FsScoreDb

func (*FsScoreDb) BulkIndex

func (db *FsScoreDb) BulkIndex(records []map[string]float32) ([]int64, error)

func (*FsScoreDb) FieldDocItr

func (db *FsScoreDb) FieldDocItr(fieldName string) DocItr

func (*FsScoreDb) Index

func (db *FsScoreDb) Index(record map[string]float32) (int64, error)

type IdBackend

type IdBackend interface {
	Put(scoreIds []int64, clientIds []string) error
	Get(scoreIds []int64) ([]string, error)
}

type LinearCombinationBackend

type LinearCombinationBackend interface {
	BulkIndex(records []Record) error
	LinearQuery(numResults int, coefs map[string]float32) []string
}

type MemoryDocItr

type MemoryDocItr struct {
	// contains filtered or unexported fields
}

func NewMemoryDocItr

func NewMemoryDocItr(scores []float32, docs []int64) *MemoryDocItr

func (*MemoryDocItr) Close

func (op *MemoryDocItr) Close()

func (*MemoryDocItr) Cur

func (op *MemoryDocItr) Cur() (int64, float32)

func (*MemoryDocItr) GetBounds

func (op *MemoryDocItr) GetBounds() (min, max float32)

func (*MemoryDocItr) Name

func (op *MemoryDocItr) Name() string

func (*MemoryDocItr) Next

func (op *MemoryDocItr) Next(minId int64) bool

func (*MemoryDocItr) SetBounds

func (op *MemoryDocItr) SetBounds(min, max float32) bool

type MemoryIdDb

type MemoryIdDb struct {
	// contains filtered or unexported fields
}

func NewMemoryIdDb

func NewMemoryIdDb() MemoryIdDb

func (MemoryIdDb) Get

func (db MemoryIdDb) Get(scoreIds []int64) ([]string, error)

func (MemoryIdDb) Put

func (db MemoryIdDb) Put(scoreIds []int64, clientIds []string) error

type MemoryScoreDb

type MemoryScoreDb struct {
	Fields map[string][]float32
	// contains filtered or unexported fields
}

func NewMemoryScoreDb

func NewMemoryScoreDb() *MemoryScoreDb

func (*MemoryScoreDb) BulkIndex

func (db *MemoryScoreDb) BulkIndex(records []map[string]float32) ([]int64, error)

func (*MemoryScoreDb) FieldDocItr

func (db *MemoryScoreDb) FieldDocItr(fieldName string) DocItr

type MemoryScoreDocItr

type MemoryScoreDocItr struct {
	// contains filtered or unexported fields
}

func NewMemoryScoreDocItr

func NewMemoryScoreDocItr(scores []float32) *MemoryScoreDocItr

func (*MemoryScoreDocItr) Close

func (op *MemoryScoreDocItr) Close()

func (*MemoryScoreDocItr) Cur

func (op *MemoryScoreDocItr) Cur() (int64, float32)

func (*MemoryScoreDocItr) GetBounds

func (op *MemoryScoreDocItr) GetBounds() (min, max float32)

func (*MemoryScoreDocItr) Name

func (op *MemoryScoreDocItr) Name() string

func (*MemoryScoreDocItr) Next

func (op *MemoryScoreDocItr) Next(minId int64) bool

func (*MemoryScoreDocItr) SetBounds

func (op *MemoryScoreDocItr) SetBounds(min, max float32) bool

type MigratableDb

type MigratableDb struct {
	Current Db
}

func (*MigratableDb) BulkIndex

func (db *MigratableDb) BulkIndex(records []Record) error

func (*MigratableDb) Index

func (db *MigratableDb) Index(id string, values map[string]float32) error

func (*MigratableDb) Query

func (db *MigratableDb) Query(query Query) (QueryResult, error)

type MinComponents

type MinComponents []DocItr

func (MinComponents) Len

func (a MinComponents) Len() int

func (MinComponents) Less

func (a MinComponents) Less(i, j int) bool

func (MinComponents) Swap

func (a MinComponents) Swap(i, j int)

type MinDocItr

type MinDocItr struct {
	// contains filtered or unexported fields
}

func NewMinDocItr

func NewMinDocItr(itrs []DocItr) *MinDocItr

func (*MinDocItr) Close

func (op *MinDocItr) Close()

func (*MinDocItr) Cur

func (op *MinDocItr) Cur() (int64, float32)

func (*MinDocItr) GetBounds

func (op *MinDocItr) GetBounds() (min, max float32)

func (*MinDocItr) Name

func (op *MinDocItr) Name() string

func (*MinDocItr) Next

func (op *MinDocItr) Next(minId int64) bool

func (*MinDocItr) SetBounds

func (op *MinDocItr) SetBounds(min, max float32) bool

type OrderedFileInfos

type OrderedFileInfos []*FileInfo

func (OrderedFileInfos) Len

func (a OrderedFileInfos) Len() int

func (OrderedFileInfos) Less

func (a OrderedFileInfos) Less(i, j int) bool

func (OrderedFileInfos) Swap

func (a OrderedFileInfos) Swap(i, j int)

type ParallelDocItr

type ParallelDocItr struct {
	NumAlive      int
	Bounds        Bounds
	ResultChannel chan CandidateResult
	Comms         []chan Bounds
	// contains filtered or unexported fields
}

func NewParallelDocItr

func NewParallelDocItr(parts []DocItr) *ParallelDocItr

func (*ParallelDocItr) Close

func (op *ParallelDocItr) Close()

func (*ParallelDocItr) Cur

func (op *ParallelDocItr) Cur() (int64, float32)

func (*ParallelDocItr) GetBounds

func (op *ParallelDocItr) GetBounds() (min, max float32)

func (*ParallelDocItr) Name

func (op *ParallelDocItr) Name() string

func (*ParallelDocItr) Next

func (op *ParallelDocItr) Next(minId int64) bool

func (*ParallelDocItr) SetBounds

func (op *ParallelDocItr) SetBounds(min, max float32) bool

type PostingListDocItr

type PostingListDocItr struct {
	// contains filtered or unexported fields
}

func (*PostingListDocItr) Close

func (op *PostingListDocItr) Close()

func (*PostingListDocItr) Cur

func (op *PostingListDocItr) Cur() (int64, float32)

func (*PostingListDocItr) GetBounds

func (op *PostingListDocItr) GetBounds() (min, max float32)

func (*PostingListDocItr) Name

func (op *PostingListDocItr) Name() string

func (*PostingListDocItr) Next

func (op *PostingListDocItr) Next(minId int64) bool

func (*PostingListDocItr) SetBounds

func (op *PostingListDocItr) SetBounds(min, max float32) bool

type PostingListHeader

type PostingListHeader struct {
	FirstDocId    int64
	LastDocId     int64
	NumDocs       int64
	MinVal        float32
	MaxVal        float32
	FirstDocScore float32
	Version       uint8
	// contains filtered or unexported fields
}

type PowDocItr

type PowDocItr struct {
	// contains filtered or unexported fields
}

Takes a constant power of a value. Important: for bounds calculation reasons, assumes only positive values are provided as inputs!

func NewPowDocItr

func NewPowDocItr(itr DocItr, exp float32) *PowDocItr

func (*PowDocItr) Close

func (op *PowDocItr) Close()

func (*PowDocItr) Cur

func (op *PowDocItr) Cur() (int64, float32)

func (*PowDocItr) GetBounds

func (op *PowDocItr) GetBounds() (min, max float32)

func (*PowDocItr) Name

func (op *PowDocItr) Name() string

func (*PowDocItr) Next

func (op *PowDocItr) Next(minId int64) bool

func (*PowDocItr) SetBounds

func (op *PowDocItr) SetBounds(min, max float32) bool

type ProductComponents

type ProductComponents []DocItr

func (ProductComponents) Len

func (a ProductComponents) Len() int

func (ProductComponents) Less

func (a ProductComponents) Less(i, j int) bool

func (ProductComponents) Swap

func (a ProductComponents) Swap(i, j int)

type ProductDocItr

type ProductDocItr struct {
	// contains filtered or unexported fields
}

func NewProductDocItr

func NewProductDocItr(itrs []DocItr) *ProductDocItr

func (*ProductDocItr) Close

func (op *ProductDocItr) Close()

func (*ProductDocItr) Cur

func (op *ProductDocItr) Cur() (int64, float32)

func (*ProductDocItr) GetBounds

func (op *ProductDocItr) GetBounds() (min, max float32)

func (*ProductDocItr) Name

func (op *ProductDocItr) Name() string

func (*ProductDocItr) Next

func (op *ProductDocItr) Next(minId int64) bool

func (*ProductDocItr) SetBounds

func (op *ProductDocItr) SetBounds(min, max float32) bool

type Query

type Query struct {
	Offset   int
	Limit    int
	MinScore float32

	// mixed, nested arrays of strings and numbers describing a function; for example: ["sum", ["field", "age"], ["field", "height"]]
	Scorer []interface{}
}

type QueryResult

type QueryResult struct {
	Ids    []string
	Scores []float32
}

type Record

type Record struct {
	Id     string
	Values map[string]float32
}

type ScaleDocItr

type ScaleDocItr struct {
	// contains filtered or unexported fields
}

Multiplies a value by a constant

func (*ScaleDocItr) Close

func (op *ScaleDocItr) Close()

func (*ScaleDocItr) Cur

func (op *ScaleDocItr) Cur() (int64, float32)

func (*ScaleDocItr) GetBounds

func (op *ScaleDocItr) GetBounds() (min, max float32)

func (*ScaleDocItr) Name

func (op *ScaleDocItr) Name() string

func (*ScaleDocItr) Next

func (op *ScaleDocItr) Next(minId int64) bool

func (*ScaleDocItr) SetBounds

func (op *ScaleDocItr) SetBounds(min, max float32) bool

type ScoreDbServer

type ScoreDbServer struct {
	Db                    Db
	ReadOnly, AutoMigrate bool
}

func (*ScoreDbServer) ServeHTTP

func (sds *ScoreDbServer) ServeHTTP(w http.ResponseWriter, req *http.Request)

type ShardedDb

type ShardedDb struct {
	Shards []StreamingDb
}

func NewShardedDb

func NewShardedDb(shards []StreamingDb) (*ShardedDb, error)

func (ShardedDb) BulkIndex

func (db ShardedDb) BulkIndex(records []map[string]float32) ([]int64, error)

func (ShardedDb) QueryItr

func (db ShardedDb) QueryItr(scorer []interface{}) (DocItr, error)

type StreamingDb

type StreamingDb interface {
	BulkIndex(records []map[string]float32) ([]int64, error)
	QueryItr(Scorer []interface{}) (DocItr, error)
}

type StubDb

type StubDb struct {
	// contains filtered or unexported fields
}

func (*StubDb) BulkIndex

func (sdb *StubDb) BulkIndex(records []map[string]float32) ([]int64, error)

func (*StubDb) Index

func (sdb *StubDb) Index(record map[string]float32) (int64, error)

func (*StubDb) LinearQuery

func (db *StubDb) LinearQuery(numResults int, coefs map[string]float32) []string

func (*StubDb) Query

func (db *StubDb) Query(query Query) (QueryResult, error)

type SumComponent

type SumComponent struct {
	// contains filtered or unexported fields
}

type SumComponents

type SumComponents []SumComponent

func (SumComponents) Len

func (a SumComponents) Len() int

func (SumComponents) Less

func (a SumComponents) Less(i, j int) bool

func (SumComponents) Swap

func (a SumComponents) Swap(i, j int)

type SumDocItr

type SumDocItr struct {
	// contains filtered or unexported fields
}

func NewSumDocItr

func NewSumDocItr(itrs []DocItr) *SumDocItr

func (*SumDocItr) Close

func (op *SumDocItr) Close()

func (*SumDocItr) Cur

func (op *SumDocItr) Cur() (int64, float32)

func (*SumDocItr) GetBounds

func (op *SumDocItr) GetBounds() (min, max float32)

func (*SumDocItr) Name

func (op *SumDocItr) Name() string

func (*SumDocItr) Next

func (op *SumDocItr) Next(minId int64) bool

func (*SumDocItr) SetBounds

func (op *SumDocItr) SetBounds(min, max float32) bool

