Published: Mar 29, 2016 License: MIT Imports: 24 Imported by: 0


scoredb

A simple database index optimized for returning results by custom scoring functions.

To my knowledge, it is the only open source system with an algorithm designed for this purpose; in some cases, it is faster than elasticsearch's implementation by an order of magnitude. (see below)

Why?

Scoredb is optimized for systems that want to find the top scoring results, where the scoring function is specified by the client, and may depend on more than one field. It may be a good choice for any system that needs to incorporate multiple factors when returning results. For instance, it might power a used car website to produce results based on factors like mileage, year, and distance.

Run It

Though Scoredb has a straightforward programmatic interface, you can run a simple standalone HTTP server like so:

$ go get github.com/pschanely/scoredb
$ go install github.com/pschanely/scoredb/...
$ ${GOPATH}/bin/scoredb serve -datadir my_data_directory -port 11625

... and in another shell:

# insert some people with ages and weights
$ curl -XPUT http://localhost:11625/jim -d '{"age":21, "weight":170}'
$ curl -XPUT http://localhost:11625/bob -d '{"age":34, "weight":150}'

# get people by age
$ curl -G 'http://localhost:11625' --data-urlencode 'score=["field", "age"]'
{"Ids":["bob","jim"]}

# get people by the sum of their age and weight:
$ curl -G 'http://localhost:11625' --data-urlencode 'score=["sum", ["field", "age"], ["field", "weight"]]'
{"Ids":["jim","bob"]}

The Algorithm

Scoredb uses a format on disk that is very similar to that used by text search systems like solr and elasticsearch. We divide each field into ranges of values (buckets) and, for each bucket, maintain a file containing the IDs of objects that have their value inside that range.

The IDs in each file are strictly increasing; this means that we can traverse several buckets efficiently by using a heap of buckets to find the next smallest id among many buckets.
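As a sketch of that heap-based traversal (illustrative only; `bucket` and `mergeIDs` are hypothetical names, not scoredb's internals):

```go
package main

import (
	"container/heap"
	"fmt"
)

// bucket is a posting list: a strictly increasing slice of doc IDs.
type bucket struct {
	ids []int64
	pos int
}

func (b *bucket) cur() int64 { return b.ids[b.pos] }

// bucketHeap orders buckets by their current (smallest unread) doc ID.
type bucketHeap []*bucket

func (h bucketHeap) Len() int            { return len(h) }
func (h bucketHeap) Less(i, j int) bool  { return h[i].cur() < h[j].cur() }
func (h bucketHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *bucketHeap) Push(x interface{}) { *h = append(*h, x.(*bucket)) }
func (h *bucketHeap) Pop() interface{} {
	old := *h
	item := old[len(old)-1]
	*h = old[:len(old)-1]
	return item
}

// mergeIDs visits the IDs of several buckets in globally increasing order,
// always advancing whichever bucket currently holds the smallest next ID.
func mergeIDs(buckets []*bucket) []int64 {
	h := bucketHeap{}
	for _, b := range buckets {
		if len(b.ids) > 0 {
			h = append(h, b)
		}
	}
	heap.Init(&h)
	var out []int64
	for h.Len() > 0 {
		b := h[0] // bucket with the smallest current ID
		out = append(out, b.cur())
		b.pos++
		if b.pos < len(b.ids) {
			heap.Fix(&h, 0) // its current ID grew; restore heap order
		} else {
			heap.Pop(&h) // bucket exhausted
		}
	}
	return out
}

func main() {
	got := mergeIDs([]*bucket{
		{ids: []int64{1, 4, 9}},
		{ids: []int64{2, 3, 8}},
	})
	fmt.Println(got) // [1 2 3 4 8 9]
}
```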

As we traverse the buckets, we score the objects produced and put them into a candidate result set. The result set is capped at the &limit= parameter specified by the user. As poorly scoring results get kicked out of the candidate result set, we can infer a lower bound on the final score. With some math, we can propagate that lower bound backwards through the scoring function to infer bounds on the individual fields. These bounds may then be used to stop traversing very poorly scoring buckets that could not produce a good enough final score. In this manner, as the candidate result set gets better and better, the system can eliminate more and more buckets to arrive at a result very quickly.
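For a weighted-sum scorer, the "propagate the lower bound backwards" step reduces to simple arithmetic. The sketch below (hypothetical names; scoredb's actual implementation works through its iterator tree) computes, for each field, the smallest value a document could have and still possibly reach the score threshold:

```go
package main

import "fmt"

// fieldBound holds a field's weight and its known value range [Min, Max].
type fieldBound struct{ W, Min, Max float64 }

// requiredMins back-propagates a lower bound on the total score
// (score = sum of W*value over fields, all W > 0) into per-field bounds:
// a document can only reach `threshold` if field j contributes at least
// threshold minus the best possible contribution of every other field.
func requiredMins(fields []fieldBound, threshold float64) []float64 {
	bestTotal := 0.0
	for _, f := range fields {
		bestTotal += f.W * f.Max
	}
	mins := make([]float64, len(fields))
	for j, f := range fields {
		othersBest := bestTotal - f.W*f.Max
		mins[j] = (threshold - othersBest) / f.W
	}
	return mins
}

func main() {
	// score = 10*num_children + age; children in [0,5], age in [0,90].
	fields := []fieldBound{{W: 10, Min: 0, Max: 5}, {W: 1, Min: 0, Max: 90}}
	// Once the candidate set implies score >= 120, buckets with
	// num_children < 3 (or age < 70) can be skipped entirely: even a
	// maximal value in the other field cannot make up the difference.
	fmt.Println(requiredMins(fields, 120)) // [3 70]
}
```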

The following graph shows bucket elimination over the course of an example query combining two fields, "age" and "wages":

Performance

Few database systems support custom scoring functions, and fewer (possibly none?) use algorithms designed for that purpose. In practice, I've found elasticsearch's custom scoring functions to be quite fast, so I've benchmarked against it here. Please let me know about other systems I might benchmark against!

This is a graph of how 5 different queries perform with varying database sizes (yellow is elasticsearch and blue is scoredb):

The elasticsearch query times (yellow) look like they're rising exponentially, but it's actually linear because the X-axis has a logarithmic scale.

The dataset is anonymized US census data, each object representing an individual. These are the scoring functions used for benchmarking, in order from fastest to slowest (for scoredb):

10 * number_of_children + age
10000 * age + yearly_wages
100 * age + yearly_wages
40 * gender + weekly_work_hours
100.0 * gender + 9 * num_children + age + weekly_work_hours
5 * num_children + age + weekly_work_hours

This is an unscientific test! Just my personal laptop, this datafile repeated a few times over for the biggest datasets, and scoredb benchmark -maxrecords 10000000 -csv censusdata.csv. There's no substitute for testing with your own data, queries, and hardware.

It's clear from the graph that scoredb's performance can vary significantly based on the scoring function. Some guidance on scoring:

  • Prefer to combine fields with addition, multiplication, and, in particular, minimum, because they allow the computation of useful lower bounds. Combining fields with a max() function does not, because a bad value in one field can be completely overcome by a good value in another.
  • Combining many fields instead of a few will make the query take longer, because it takes longer to determine useful lower bounds on each field.
  • Prefer to engineer weights so that the contributions from each of your fields are similar in scale. Scoredb may never be able to find useful bounds on fields that tweak the final score only slightly.

Limitations

Scoredb is minimalistic and highly specialized; it is intended to just act as one piece of a larger system:

  • Scoredb has no delete or update operation. To remove or change an object, you must build a new index. See below for how to swap a new index in under a running instance without downtime.
  • It stores objects as a flat set of key-value pairs with string keys and numeric values only. (internally, all values are 32 bit floating point values)
  • Scoredb can only respond to queries with lists of identifiers; scoredb's indexes do not provide efficient access to the original field data.
  • Scoredb has no built-in clustering, redundancy, or backup functions.
  • Adding objects to scoredb is slow if you add them one at a time. Bulk insertion should be used whenever possible.
  • Scoredb requires many open files; sometimes thousands of them. You will need to increase default filehandle limits on your system (see "ulimit" on linux).
  • Scoredb expects you to provide every field for every object; objects that are missing a field cannot be returned from queries that use the missing fields.
  • Scoredb data files are endian specific; most modern CPUs are little endian, so you won't normally have to worry about this.

Index Bulk Load

You can create a database without running a server using the scoredb load command, which expects newline-separated JSON records on stdin. For instance:

printf '{"id":"person_1", "values":{"age":10, "height":53}}\n' > data.jsonl
printf '{"id":"person_2", "values":{"age":32, "height":68}}\n' >> data.jsonl
cat data.jsonl | scoredb load

Index Swapping

If you need deletes or updates, you'll have to periodically rebuild your database and swap in updated versions. If you specify the -automigrate option to the server, it will look for new database directories that begin with the given data directory and keep the (lexicographically largest) one live. Use an atomic mv command to put it in place like so:

$ cat new_data.jsonlines | scoredb load -datadir ./live_db_v00001  # Load initial data
$ scoredb serve -readonly -automigrate -datadir ./live_db_v        # Start server

# when ready for a new version of the database,

$ cat new_data.jsonlines | scoredb load -datadir ./tmp_db          # Create the database
$ mv ./tmp_db ./live_db_v00002                                     # Rename to match the watched prefix

# The server should detect and load the new database here.

$ rm -rf ./live_db_v00001                                          # Now, remove the old database

Supported Query Functions

As shown above, queries are expressed as JSON expressions and then url encoded into the "score" query parameter. Each expression takes a lisp-like form: [<function name>, <argument 1>, <argument 2>, ...]. These are the supported functions:

["field", <field_name>]

Simply produces the value of <field_name> as a score.

  • Example: ["field", "age"] (return the age value as a score)
["scale", <factor>, <subexpression>]

Takes the result of <subexpression> and multiplies it by <factor>. <factor> may be negative.

  • Example: ["scale", 2.0, ["field", "age"]] (age, doubled)
["sum", <subexpression 1>, <subexpression 2>, ...]

Sums the results of each <subexpression>.

  • Example: ["sum", ["field", "age"], ["field", "height"]] (add age and height together)
["product", <subexpression 1>, <subexpression 2>, ...]

Multiplies the result of each <subexpression> together. For bounding reasons, negative inputs are not allowed.

  • Example: ["product", ["field", "age"], ["field", "height"]] (multiply age by height)
["min", <subexpression 1>, <subexpression 2>, ...]

Takes the least score resulting from all <subexpression>s.

  • Example: ["min", ["field", "age"], ["field", "height"]] (Take age or height, whichever is smaller)

["diff", <subexpression 1>, <subexpression 2>]

Returns the absolute difference between the values produced by the two subexpressions.

  • Example: ["diff", ["field", "age"], ["field", "height"]] (the difference between each age and height)
["pow", <subexpression>, <exponent>]

Raises the result from the given subexpression to the <exponent> power.
<exponent> may be fractional (for Nth roots) or negative.
However, for bounding reasons, the subexpression may not produce negative values.

  • Example: ["pow", ["field", "age"], 2.0] (age, squared)
["custom_linear", [[<x1>, <y1>], [<x2>, <y2>], ..], <subexpression>]

Establishes a user-defined function using a set of linearly interpolated [x, y] points. Inputs smaller than the smallest X value or larger than the largest X value get the closest specified Y value.

  • Example: ["custom_linear", [[0, 0.0], [30, 1.0], [80, 0.0]], ["field", "age"]] Mapping ages to scores: 30-year-olds get a score of one, gradually declining to a score of zero for infants and the elderly.
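The interpolation rule described above can be sketched like so (the `customLinear` helper is a hypothetical illustration of the documented behavior, not scoredb's internal CustomLinearDocItr):

```go
package main

import "fmt"

type point struct{ X, Y float64 }

// customLinear linearly interpolates between user-supplied (x, y) points,
// which must be sorted by X. Inputs outside the covered range clamp to
// the nearest endpoint's Y, as documented for ["custom_linear", ...].
func customLinear(points []point, x float64) float64 {
	if x <= points[0].X {
		return points[0].Y
	}
	last := points[len(points)-1]
	if x >= last.X {
		return last.Y
	}
	for i := 1; i < len(points); i++ {
		if x <= points[i].X {
			p0, p1 := points[i-1], points[i]
			t := (x - p0.X) / (p1.X - p0.X) // fraction of the way across this segment
			return p0.Y + t*(p1.Y-p0.Y)
		}
	}
	return last.Y
}

func main() {
	curve := []point{{0, 0.0}, {30, 1.0}, {80, 0.0}}
	fmt.Println(customLinear(curve, 15)) // 0.5: halfway up the ramp
	fmt.Println(customLinear(curve, 90)) // 0: clamped past the last point
}
```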
["geo_distance", <lat>, <lng>, <lat field name>, <lng field name>]

Returns the distance to a fixed point in kilometers as a score.
This is experimental: may be inaccurate for large distances, and fails across the prime meridian.
Since you typically want smaller distances to have higher scores, you'll probably want to wrap the "scale" or "custom_linear" functions around this one to invert it.

  • Example: ["geo_distance", 40.7, -74.0, "home_lat", "home_lng"] Scores each result by how far its home_lat and home_lng fields put it from New York City.

Status

Though it has reasonable test coverage and a small, straightforward codebase, scoredb is certainly alpha-quality software.

Your bug reports are greatly appreciated.

Thanks

Thanks are due to the Samsung Accelerator which let us start this project as a hackathon proof of concept. Scoredb was built with this awesome team (in github lexicographic order!):

Plugs

Check out some of our other side projects too:

  • wildflower-touch is proof-of-concept programming IDE and language for touch devices.
  • music-tonight makes playlists of bands playing near you, tonight.

Documentation

Index

Constants

This section is empty.

Variables

View Source
var INITIAL_VAR_BITS = uint(23 - 0)
View Source
var NegativeInfinity = float32(math.Inf(-1))
View Source
var PositiveInfinity = float32(math.Inf(1))

Functions

func Abs

func Abs(val float32) float32

func CandidateIsLess

func CandidateIsLess(r1, r2 DocScore) bool

func CheckIntersection

func CheckIntersection(yValue float32, p1, p2 CustomPoint, insideMin, insideMax *float32)

func CloseWriters

func CloseWriters(db *FsScoreDb) error

func ComputeCustomFunc

func ComputeCustomFunc(x float32, points []CustomPoint) float32

func EnsureDirectory

func EnsureDirectory(dir string) error

func Exists

func Exists(path string) bool

func FileIsAtEnd

func FileIsAtEnd(file *os.File) bool

func Max

func Max(v1, v2 float32) float32

func MaxDocsForFile

func MaxDocsForFile(fileInfo *FileInfo) int64

func Min

func Min(v1, v2 float32) float32

func Pow

func Pow(val, exp float32) float32

func QueryFloatVal

func QueryFloatVal(queryParams url.Values, key string, defaultValue float32) (float32, error)

func QueryIntVal

func QueryIntVal(queryParams url.Values, key string, defaultValue int) (int, error)

func ReadNativeLong

func ReadNativeLong(buf []byte) uint64

func RunBenchmark

func RunBenchmark(db LinearCombinationBackend, csvFilename string, maxRecords int64) ([]int64, []int64, [][]int64, error)

func RunItr

func RunItr(itr DocItr, myWorkerNum int, resultChannel chan CandidateResult, boundsChannel chan Bounds)

func ServeHttp

func ServeHttp(addr string, db Db, readOnly bool) error

func ShardIdToExt

func ShardIdToExt(idInShard int64, shardNum int) int64

func ToFloat32

func ToFloat32(val interface{}) (float32, error)

func WriteNativeLong

func WriteNativeLong(val uint64, writer io.Writer) error

func WritePostingListEntry

func WritePostingListEntry(fileInfo *FileInfo, docId int64, score float32)

Types

type BaseDb

type BaseDb struct {
	StreamingDb StreamingDb
	IdDb        IdBackend
}

func (BaseDb) BulkIndex

func (db BaseDb) BulkIndex(records []Record) error

func (BaseDb) Index

func (db BaseDb) Index(id string, values map[string]float32) error

func (BaseDb) LinearQuery

func (db BaseDb) LinearQuery(numResults int, weights map[string]float32) []string

func (BaseDb) Query

func (db BaseDb) Query(query Query) (QueryResult, error)

type BaseDbResultSet

type BaseDbResultSet []DocScore

func (BaseDbResultSet) Len

func (h BaseDbResultSet) Len() int

func (BaseDbResultSet) Less

func (h BaseDbResultSet) Less(i, j int) bool

func (*BaseDbResultSet) Pop

func (h *BaseDbResultSet) Pop() interface{}

func (*BaseDbResultSet) Push

func (h *BaseDbResultSet) Push(x interface{})

func (BaseDbResultSet) Swap

func (h BaseDbResultSet) Swap(i, j int)

type BaseStreamingDb

type BaseStreamingDb struct {
	Backend DbBackend
}

func (BaseStreamingDb) BulkIndex

func (db BaseStreamingDb) BulkIndex(records []map[string]float32) ([]int64, error)

func (BaseStreamingDb) QueryItr

func (db BaseStreamingDb) QueryItr(scorer []interface{}) (DocItr, error)

type BitReader

type BitReader struct {
	OrigMmap        *mmap.MMap
	Mmap            []uint64
	MmapPtr         uint
	MmapPtrBitsLeft uint
	File            *os.File
	Cur             uint64
	CurBitsLeft     uint
}

func NewBitReader

func NewBitReader(file *os.File) (*BitReader, error)

func (*BitReader) Close

func (reader *BitReader) Close() error

func (*BitReader) ReadBits

func (reader *BitReader) ReadBits(numBits uint) (uint64, error)

func (*BitReader) ReadVarUInt32

func (reader *BitReader) ReadVarUInt32() (uint32, error)

func (*BitReader) Refill

func (reader *BitReader) Refill(cur uint64, bitsLeft uint, numNeeded uint) (uint64, uint, error)

type BitWriter

type BitWriter struct {
	BufferedWriter *bufio.Writer
	File           *os.File
	Cur            uint64
	CurBitsUsed    uint
}

func NewBitWriter

func NewBitWriter(file *os.File) (*BitWriter, error)

func (*BitWriter) Close

func (writer *BitWriter) Close() error

func (*BitWriter) WriteBits

func (writer *BitWriter) WriteBits(val uint64, numBits uint) error

func (*BitWriter) WriteVarUInt32

func (writer *BitWriter) WriteVarUInt32(val uint32) error

type BoltIdDb

type BoltIdDb struct {
	Db *bolt.DB
}

func NewBoltIdDb

func NewBoltIdDb(file string) (*BoltIdDb, error)

func (*BoltIdDb) Get

func (db *BoltIdDb) Get(scoreIds []int64) ([]string, error)

func (*BoltIdDb) Put

func (db *BoltIdDb) Put(scoreIds []int64, clientIds []string) error

type Bounds

type Bounds struct {
	// contains filtered or unexported fields
}

type CandidateResult

type CandidateResult struct {
	DocId     int64
	Score     float32
	WorkerNum int
}

type CustomLinearDocItr

type CustomLinearDocItr struct {
	// contains filtered or unexported fields
}

Remaps a value according to a user-specified function that linearly interpolates among a set of (x, y) points.

func (*CustomLinearDocItr) Close

func (op *CustomLinearDocItr) Close()

func (*CustomLinearDocItr) Cur

func (op *CustomLinearDocItr) Cur() (int64, float32)

func (*CustomLinearDocItr) GetBounds

func (op *CustomLinearDocItr) GetBounds() (min, max float32)

func (*CustomLinearDocItr) Name

func (op *CustomLinearDocItr) Name() string

func (*CustomLinearDocItr) Next

func (op *CustomLinearDocItr) Next(minId int64) bool

func (*CustomLinearDocItr) SetBounds

func (op *CustomLinearDocItr) SetBounds(outsideMin, outsideMax float32) bool

type CustomMapDocItr

type CustomMapDocItr struct {
	// contains filtered or unexported fields
}

Remaps a value according to a user-specified mapping of values to scores

func (*CustomMapDocItr) Close

func (op *CustomMapDocItr) Close()

func (*CustomMapDocItr) ComputeCustomFunc

func (op *CustomMapDocItr) ComputeCustomFunc(val float32) float32

func (*CustomMapDocItr) Cur

func (op *CustomMapDocItr) Cur() (int64, float32)

func (*CustomMapDocItr) GetBounds

func (op *CustomMapDocItr) GetBounds() (min, max float32)

func (*CustomMapDocItr) Name

func (op *CustomMapDocItr) Name() string

func (*CustomMapDocItr) Next

func (op *CustomMapDocItr) Next(minId int64) bool

func (*CustomMapDocItr) SetBounds

func (op *CustomMapDocItr) SetBounds(outsideMin, outsideMax float32) bool

type CustomPoint

type CustomPoint struct {
	X, Y float32
}

func ToXyPoints

func ToXyPoints(input interface{}) ([]CustomPoint, error)

type Db

type Db interface {
	BulkIndex(records []Record) error
	Index(id string, values map[string]float32) error
	Query(query Query) (QueryResult, error)
}

type DbBackend

type DbBackend interface {
	BulkIndex(records []map[string]float32) ([]int64, error)
	FieldDocItr(field string) DocItr
}

type DiffDocItr

type DiffDocItr struct {
	// contains filtered or unexported fields
}

(Absolute) difference between a value and a constant

func (*DiffDocItr) Close

func (op *DiffDocItr) Close()

func (*DiffDocItr) Cur

func (op *DiffDocItr) Cur() (int64, float32)

func (*DiffDocItr) GetBounds

func (op *DiffDocItr) GetBounds() (min, max float32)

func (*DiffDocItr) Name

func (op *DiffDocItr) Name() string

func (*DiffDocItr) Next

func (op *DiffDocItr) Next(minId int64) bool

func (*DiffDocItr) SetBounds

func (op *DiffDocItr) SetBounds(min, max float32) bool

type DocItr

type DocItr interface {
	Name() string

	// return false if the iterator is now known to not produce any more values
	SetBounds(min, max float32) bool

	GetBounds() (min, max float32)

	// Next() skips the iterator ahead to at least as far as the given id.
	// It always advances the iterator at least one position.
	// It returns false if there are no remaining values.
	// Iterators need a call to Next(0) to initialize them to a real value; they all initially have a docId of -1.
	Next(minId int64) bool

	Close() // release resources held by this iterator (if any)

	Cur() (int64, float32) // doc id and score of current result, or (-1, 0.0) if the iterator has not been initialized

}
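A minimal, hypothetical implementation of this contract over in-memory slices (for illustration only; `sliceItr` is not part of the package):

```go
package main

import "fmt"

// DocItr reproduced here so the sketch is self-contained.
type DocItr interface {
	Name() string
	SetBounds(min, max float32) bool
	GetBounds() (min, max float32)
	Next(minId int64) bool
	Close()
	Cur() (int64, float32)
}

// sliceItr is a toy DocItr over parallel id/score slices, honoring the
// documented contract: docId starts at -1, Next(0) initializes, and
// Next(minId) always advances at least one position.
type sliceItr struct {
	ids    []int64
	scores []float32
	pos    int // index into ids; -1 before initialization
}

var _ DocItr = (*sliceItr)(nil) // compile-time interface check

func newSliceItr(ids []int64, scores []float32) *sliceItr {
	return &sliceItr{ids: ids, scores: scores, pos: -1}
}

func (it *sliceItr) Name() string { return "sliceItr" }

// SetBounds ignores the bounds; returning true ("may still produce
// values") is always safe under the documented contract.
func (it *sliceItr) SetBounds(min, max float32) bool { return true }

func (it *sliceItr) GetBounds() (min, max float32) {
	min, max = it.scores[0], it.scores[0]
	for _, s := range it.scores {
		if s < min {
			min = s
		}
		if s > max {
			max = s
		}
	}
	return min, max
}

func (it *sliceItr) Next(minId int64) bool {
	it.pos++ // always advance at least one position
	for it.pos < len(it.ids) && it.ids[it.pos] < minId {
		it.pos++
	}
	return it.pos < len(it.ids)
}

func (it *sliceItr) Close() {}

func (it *sliceItr) Cur() (int64, float32) {
	if it.pos < 0 || it.pos >= len(it.ids) {
		return -1, 0.0
	}
	return it.ids[it.pos], it.scores[it.pos]
}

func main() {
	it := newSliceItr([]int64{3, 7, 9}, []float32{0.5, 1.5, 0.25})
	for it.Next(0) {
		fmt.Println(it.Cur())
	}
	it.Close()
}
```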

func NewPostingListDocItr

func NewPostingListDocItr(rangePrefix uint32, path string, header *PostingListHeader, numVarBits uint) DocItr

type DocScore

type DocScore struct {
	DocId int64
	Score float32
}

type EsQueryResponse

type EsQueryResponse struct {
	Hits struct {
		Hits []struct {
			Id string `json:"_id"`
		} `json:"hits"`
	} `json:"hits"`
}

type EsScoreDb

type EsScoreDb struct {
	BaseURL, Index string
}

func (*EsScoreDb) BulkIndex

func (db *EsScoreDb) BulkIndex(records []Record) error

func (*EsScoreDb) CreateIndex

func (db *EsScoreDb) CreateIndex()

func (*EsScoreDb) DeleteIndex

func (db *EsScoreDb) DeleteIndex()

func (*EsScoreDb) LinearQuery

func (db *EsScoreDb) LinearQuery(numResults int, weights map[string]float32) []string

func (*EsScoreDb) ParseQuery

func (db *EsScoreDb) ParseQuery(query string) map[string]float32

func (*EsScoreDb) RefreshIndex

func (db *EsScoreDb) RefreshIndex()

type FieldDocItr

type FieldDocItr struct {
	// contains filtered or unexported fields
}

func NewFieldDocItr

func NewFieldDocItr(field string, lists FieldDocItrs) *FieldDocItr

func (*FieldDocItr) Close

func (op *FieldDocItr) Close()

func (*FieldDocItr) Cur

func (op *FieldDocItr) Cur() (int64, float32)

func (*FieldDocItr) GetBounds

func (op *FieldDocItr) GetBounds() (min, max float32)

func (*FieldDocItr) Name

func (op *FieldDocItr) Name() string

func (*FieldDocItr) Next

func (op *FieldDocItr) Next(minId int64) bool

func (*FieldDocItr) SetBounds

func (op *FieldDocItr) SetBounds(min, max float32) bool

type FieldDocItrs

type FieldDocItrs []DocItr // FieldDocItrs implements heap.Interface

func (FieldDocItrs) Len

func (so FieldDocItrs) Len() int

func (FieldDocItrs) Less

func (so FieldDocItrs) Less(i, j int) bool

func (*FieldDocItrs) Pop

func (so *FieldDocItrs) Pop() interface{}

func (*FieldDocItrs) Push

func (so *FieldDocItrs) Push(x interface{})

func (FieldDocItrs) Swap

func (so FieldDocItrs) Swap(i, j int)

type FileInfo

type FileInfo struct {
	// contains filtered or unexported fields
}

func FindPostingListFileForWrite

func FindPostingListFileForWrite(db *FsScoreDb, docId int64, key string, value float32) (*FileInfo, error)

func MakeFileInfo

func MakeFileInfo(fieldDir string, value float32, numVarBits uint, docId int64) (*FileInfo, error)

type FsScoreDb

type FsScoreDb struct {
	// contains filtered or unexported fields
}

func NewFsScoreDb

func NewFsScoreDb(dataDir string) *FsScoreDb

func (*FsScoreDb) BulkIndex

func (db *FsScoreDb) BulkIndex(records []map[string]float32) ([]int64, error)

func (*FsScoreDb) FieldDocItr

func (db *FsScoreDb) FieldDocItr(fieldName string) DocItr

func (*FsScoreDb) Index

func (db *FsScoreDb) Index(record map[string]float32) (int64, error)

type IdBackend

type IdBackend interface {
	Put(scoreIds []int64, clientIds []string) error
	Get(scoreIds []int64) ([]string, error)
}

type LinearCombinationBackend

type LinearCombinationBackend interface {
	BulkIndex(records []Record) error
	LinearQuery(numResults int, coefs map[string]float32) []string
}

type MemoryDocItr

type MemoryDocItr struct {
	// contains filtered or unexported fields
}

func NewMemoryDocItr

func NewMemoryDocItr(scores []float32, docs []int64) *MemoryDocItr

func (*MemoryDocItr) Close

func (op *MemoryDocItr) Close()

func (*MemoryDocItr) Cur

func (op *MemoryDocItr) Cur() (int64, float32)

func (*MemoryDocItr) GetBounds

func (op *MemoryDocItr) GetBounds() (min, max float32)

func (*MemoryDocItr) Name

func (op *MemoryDocItr) Name() string

func (*MemoryDocItr) Next

func (op *MemoryDocItr) Next(minId int64) bool

func (*MemoryDocItr) SetBounds

func (op *MemoryDocItr) SetBounds(min, max float32) bool

type MemoryIdDb

type MemoryIdDb struct {
	// contains filtered or unexported fields
}

func NewMemoryIdDb

func NewMemoryIdDb() MemoryIdDb

func (MemoryIdDb) Get

func (db MemoryIdDb) Get(scoreIds []int64) ([]string, error)

func (MemoryIdDb) Put

func (db MemoryIdDb) Put(scoreIds []int64, clientIds []string) error

type MemoryScoreDb

type MemoryScoreDb struct {
	Fields map[string][]float32
	// contains filtered or unexported fields
}

func NewMemoryScoreDb

func NewMemoryScoreDb() *MemoryScoreDb

func (*MemoryScoreDb) BulkIndex

func (db *MemoryScoreDb) BulkIndex(records []map[string]float32) ([]int64, error)

func (*MemoryScoreDb) FieldDocItr

func (db *MemoryScoreDb) FieldDocItr(fieldName string) DocItr

type MemoryScoreDocItr

type MemoryScoreDocItr struct {
	// contains filtered or unexported fields
}

func NewMemoryScoreDocItr

func NewMemoryScoreDocItr(scores []float32) *MemoryScoreDocItr

func (*MemoryScoreDocItr) Close

func (op *MemoryScoreDocItr) Close()

func (*MemoryScoreDocItr) Cur

func (op *MemoryScoreDocItr) Cur() (int64, float32)

func (*MemoryScoreDocItr) GetBounds

func (op *MemoryScoreDocItr) GetBounds() (min, max float32)

func (*MemoryScoreDocItr) Name

func (op *MemoryScoreDocItr) Name() string

func (*MemoryScoreDocItr) Next

func (op *MemoryScoreDocItr) Next(minId int64) bool

func (*MemoryScoreDocItr) SetBounds

func (op *MemoryScoreDocItr) SetBounds(min, max float32) bool

type MigratableDb

type MigratableDb struct {
	Current Db
}

func (*MigratableDb) BulkIndex

func (db *MigratableDb) BulkIndex(records []Record) error

func (*MigratableDb) Index

func (db *MigratableDb) Index(id string, values map[string]float32) error

func (*MigratableDb) Query

func (db *MigratableDb) Query(query Query) (QueryResult, error)

type MinComponents

type MinComponents []DocItr

func (MinComponents) Len

func (a MinComponents) Len() int

func (MinComponents) Less

func (a MinComponents) Less(i, j int) bool

func (MinComponents) Swap

func (a MinComponents) Swap(i, j int)

type MinDocItr

type MinDocItr struct {
	// contains filtered or unexported fields
}

func NewMinDocItr

func NewMinDocItr(itrs []DocItr) *MinDocItr

func (*MinDocItr) Close

func (op *MinDocItr) Close()

func (*MinDocItr) Cur

func (op *MinDocItr) Cur() (int64, float32)

func (*MinDocItr) GetBounds

func (op *MinDocItr) GetBounds() (min, max float32)

func (*MinDocItr) Name

func (op *MinDocItr) Name() string

func (*MinDocItr) Next

func (op *MinDocItr) Next(minId int64) bool

func (*MinDocItr) SetBounds

func (op *MinDocItr) SetBounds(min, max float32) bool

type OrderedFileInfos

type OrderedFileInfos []*FileInfo

func (OrderedFileInfos) Len

func (a OrderedFileInfos) Len() int

func (OrderedFileInfos) Less

func (a OrderedFileInfos) Less(i, j int) bool

func (OrderedFileInfos) Swap

func (a OrderedFileInfos) Swap(i, j int)

type ParallelDocItr

type ParallelDocItr struct {
	NumAlive      int
	Bounds        Bounds
	ResultChannel chan CandidateResult
	Comms         []chan Bounds
	// contains filtered or unexported fields
}

func NewParallelDocItr

func NewParallelDocItr(parts []DocItr) *ParallelDocItr

func (*ParallelDocItr) Close

func (op *ParallelDocItr) Close()

func (*ParallelDocItr) Cur

func (op *ParallelDocItr) Cur() (int64, float32)

func (*ParallelDocItr) GetBounds

func (op *ParallelDocItr) GetBounds() (min, max float32)

func (*ParallelDocItr) Name

func (op *ParallelDocItr) Name() string

func (*ParallelDocItr) Next

func (op *ParallelDocItr) Next(minId int64) bool

func (*ParallelDocItr) SetBounds

func (op *ParallelDocItr) SetBounds(min, max float32) bool

type PostingListDocItr

type PostingListDocItr struct {
	// contains filtered or unexported fields
}

func (*PostingListDocItr) Close

func (op *PostingListDocItr) Close()

func (*PostingListDocItr) Cur

func (op *PostingListDocItr) Cur() (int64, float32)

func (*PostingListDocItr) GetBounds

func (op *PostingListDocItr) GetBounds() (min, max float32)

func (*PostingListDocItr) Name

func (op *PostingListDocItr) Name() string

func (*PostingListDocItr) Next

func (op *PostingListDocItr) Next(minId int64) bool

func (*PostingListDocItr) SetBounds

func (op *PostingListDocItr) SetBounds(min, max float32) bool

type PostingListHeader

type PostingListHeader struct {
	FirstDocId    int64
	LastDocId     int64
	NumDocs       int64
	MinVal        float32
	MaxVal        float32
	FirstDocScore float32
	Version       uint8
	// contains filtered or unexported fields
}

type PowDocItr

type PowDocItr struct {
	// contains filtered or unexported fields
}

Takes a constant power of a value. Important: for bounds calculation reasons, assumes only positive values are provided as inputs!

func NewPowDocItr

func NewPowDocItr(itr DocItr, exp float32) *PowDocItr

func (*PowDocItr) Close

func (op *PowDocItr) Close()

func (*PowDocItr) Cur

func (op *PowDocItr) Cur() (int64, float32)

func (*PowDocItr) GetBounds

func (op *PowDocItr) GetBounds() (min, max float32)

func (*PowDocItr) Name

func (op *PowDocItr) Name() string

func (*PowDocItr) Next

func (op *PowDocItr) Next(minId int64) bool

func (*PowDocItr) SetBounds

func (op *PowDocItr) SetBounds(min, max float32) bool

type ProductComponents

type ProductComponents []DocItr

func (ProductComponents) Len

func (a ProductComponents) Len() int

func (ProductComponents) Less

func (a ProductComponents) Less(i, j int) bool

func (ProductComponents) Swap

func (a ProductComponents) Swap(i, j int)

type ProductDocItr

type ProductDocItr struct {
	// contains filtered or unexported fields
}

func NewProductDocItr

func NewProductDocItr(itrs []DocItr) *ProductDocItr

func (*ProductDocItr) Close

func (op *ProductDocItr) Close()

func (*ProductDocItr) Cur

func (op *ProductDocItr) Cur() (int64, float32)

func (*ProductDocItr) GetBounds

func (op *ProductDocItr) GetBounds() (min, max float32)

func (*ProductDocItr) Name

func (op *ProductDocItr) Name() string

func (*ProductDocItr) Next

func (op *ProductDocItr) Next(minId int64) bool

func (*ProductDocItr) SetBounds

func (op *ProductDocItr) SetBounds(min, max float32) bool

type Query

type Query struct {
	Offset   int
	Limit    int
	MinScore float32

	// mixed, nested arrays of strings and numbers describing a function; for example: ["sum", ["field", "age"], ["field", "height"]]
	Scorer []interface{}
}

type QueryResult

type QueryResult struct {
	Ids    []string
	Scores []float32
}

type Record

type Record struct {
	Id     string
	Values map[string]float32
}

type ScaleDocItr

type ScaleDocItr struct {
	// contains filtered or unexported fields
}

Multiplies a value by a constant

func (*ScaleDocItr) Close

func (op *ScaleDocItr) Close()

func (*ScaleDocItr) Cur

func (op *ScaleDocItr) Cur() (int64, float32)

func (*ScaleDocItr) GetBounds

func (op *ScaleDocItr) GetBounds() (min, max float32)

func (*ScaleDocItr) Name

func (op *ScaleDocItr) Name() string

func (*ScaleDocItr) Next

func (op *ScaleDocItr) Next(minId int64) bool

func (*ScaleDocItr) SetBounds

func (op *ScaleDocItr) SetBounds(min, max float32) bool

type ScoreDbServer

type ScoreDbServer struct {
	Db                    Db
	ReadOnly, AutoMigrate bool
}

func (*ScoreDbServer) ServeHTTP

func (sds *ScoreDbServer) ServeHTTP(w http.ResponseWriter, req *http.Request)

type ShardedDb

type ShardedDb struct {
	Shards []StreamingDb
}

func NewShardedDb

func NewShardedDb(shards []StreamingDb) (*ShardedDb, error)

func (ShardedDb) BulkIndex

func (db ShardedDb) BulkIndex(records []map[string]float32) ([]int64, error)

func (ShardedDb) QueryItr

func (db ShardedDb) QueryItr(scorer []interface{}) (DocItr, error)

type StreamingDb

type StreamingDb interface {
	BulkIndex(records []map[string]float32) ([]int64, error)
	QueryItr(Scorer []interface{}) (DocItr, error)
}

type StubDb

type StubDb struct {
	// contains filtered or unexported fields
}

func (*StubDb) BulkIndex

func (sdb *StubDb) BulkIndex(records []map[string]float32) ([]int64, error)

func (*StubDb) Index

func (sdb *StubDb) Index(record map[string]float32) (int64, error)

func (*StubDb) LinearQuery

func (db *StubDb) LinearQuery(numResults int, coefs map[string]float32) []string

func (*StubDb) Query

func (db *StubDb) Query(query Query) (QueryResult, error)

type SumComponent

type SumComponent struct {
	// contains filtered or unexported fields
}

type SumComponents

type SumComponents []SumComponent

func (SumComponents) Len

func (a SumComponents) Len() int

func (SumComponents) Less

func (a SumComponents) Less(i, j int) bool

func (SumComponents) Swap

func (a SumComponents) Swap(i, j int)

type SumDocItr

type SumDocItr struct {
	// contains filtered or unexported fields
}

func NewSumDocItr

func NewSumDocItr(itrs []DocItr) *SumDocItr

func (*SumDocItr) Close

func (op *SumDocItr) Close()

func (*SumDocItr) Cur

func (op *SumDocItr) Cur() (int64, float32)

func (*SumDocItr) GetBounds

func (op *SumDocItr) GetBounds() (min, max float32)

func (*SumDocItr) Name

func (op *SumDocItr) Name() string

func (*SumDocItr) Next

func (op *SumDocItr) Next(minId int64) bool

func (*SumDocItr) SetBounds

func (op *SumDocItr) SetBounds(min, max float32) bool

