Documentation ¶
Overview ¶
Package minsearch implements a minimal solution for indexing text and retrieving search results ranked by score.
Index ¶
- Constants
- type File
- func (f *File) AvgCount() (float32, error)
- func (f *File) Close()
- func (f *File) IndexBatch(pairs []Pair, maxIDs int) error
- func (f *File) IndexPair(pair Pair, maxIDs int) error
- func (f *File) KeyCount() (uint32, error)
- func (f *File) LastID() (ID, error)
- func (f *File) Search(query []byte, setOp SetOperation, maxResults int) ([]Result, error)
- func (f *File) SetLastID(id ID) error
- func (f File) String() string
- func (f *File) UpdateStatistics() error
- type ID
- type Pair
- type Result
- type Score
- type SetOperation
Constants ¶
const DefaultMaxResults = 1000000
DefaultMaxResults is the default limit on the number of temporary results kept while a search is being calculated.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type File ¶
type File struct {
// contains filtered or unexported fields
}
File is the index file.
func Open ¶
Open opens the File, creating it if it doesn't exist. Setting the noSync flag causes the database to skip fsync() calls after each commit. This makes indexing much faster but is unsafe: data can be lost in the event of a system failure.
func (*File) AvgCount ¶
AvgCount returns the average number of IDs per key in the database as of the last calculation. If the statistics have not been calculated yet (see UpdateStatistics), an error is returned.
func (*File) IndexBatch ¶
IndexBatch indexes all relevant segments for each Pair as a batch operation. See IndexPair for more information.
func (*File) IndexPair ¶
IndexPair indexes all relevant segments of the given Pair. If maxIDs > 0, each indexed segment stores at most maxIDs different (ID, Score) pairs, keeping only those with the highest scores. If maxIDs is chosen too small, result quality can suffer noticeably. A value in the range [1000, 10000] may be a good choice, as it limits very common words of a language, such as "the" or "a" in English. If maxIDs <= 0, the number of scores per segment is not limited. This yields the best results (assuming the result set is not limited), but also the largest file size and higher temporary memory usage.
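The effect of maxIDs on a single segment can be sketched as "keep the maxIDs highest-scoring (ID, Score) pairs". The standalone code below assumes a hypothetical entry type and limitEntries function; it only illustrates the selection rule described above and is not minsearch's actual storage code.

```go
package main

import (
	"fmt"
	"sort"
)

// ID and Score mirror the minsearch type aliases.
type ID = uint32
type Score = float32

// entry is a hypothetical (ID, Score) pair stored under one segment.
type entry struct {
	ID    ID
	Score Score
}

// limitEntries sketches the effect of maxIDs: if maxIDs > 0, only the
// maxIDs entries with the highest scores are kept; otherwise all are kept.
func limitEntries(entries []entry, maxIDs int) []entry {
	if maxIDs <= 0 || len(entries) <= maxIDs {
		return entries
	}
	sorted := append([]entry(nil), entries...)
	// Highest score first; ties broken by ID so the result is deterministic.
	sort.Slice(sorted, func(i, j int) bool {
		if sorted[i].Score != sorted[j].Score {
			return sorted[i].Score > sorted[j].Score
		}
		return sorted[i].ID < sorted[j].ID
	})
	return sorted[:maxIDs]
}

func main() {
	segment := []entry{{1, 0.2}, {2, 0.9}, {3, 0.5}, {4, 0.7}}
	fmt.Println(limitEntries(segment, 2))      // [{2 0.9} {4 0.7}]
	fmt.Println(len(limitEntries(segment, 0))) // 4
}
```

A very common word would produce a long entries list for its segment; with maxIDs > 0 that list is bounded, which is where the file size and memory savings come from.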
func (*File) KeyCount ¶
KeyCount returns the number of keys in the database as of the last calculation. If the statistics have not been calculated yet (see UpdateStatistics), an error is returned.
func (*File) LastID ¶
LastID returns the last ID that was saved using SetLastID. This can be used to restore the last saved state of an operation.
func (*File) Search ¶
Search searches the relevant segments of the query in the index file and returns a result set ordered by score. If maxResults > 0, the number of temporary results _during_ calculation of the search, which can be much higher than the number of final results, is limited to maxResults. If the number of results for at least one segment exceeds maxResults, the result set may miss results with a higher score. If maxResults <= 0, memory usage is not limited. It's recommended to set maxResults > 0 to limit the maximum RAM usage, especially if the SetOperation is Union or the query is user input.
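Why a limit on temporary results can drop high-scoring hits is easiest to see in a small model. The sketch below is hypothetical: it assumes per-segment candidate lists, assumes scores for the same ID are summed across segments, and caps the number of candidates read per segment at maxResults. minsearch's real scoring and limiting may differ; only the "missed result" effect is the point.

```go
package main

import (
	"fmt"
	"sort"
)

type ID = uint32
type Score = float32

// Result mirrors the documented content of minsearch.Result: an ID and its score.
type Result struct {
	ID    ID
	Score Score
}

// searchSketch is hypothetical: it merges per-segment candidate lists into one
// result set, summing scores per ID (an assumed scoring rule), and reads at
// most maxResults candidates per segment when maxResults > 0.
func searchSketch(segments [][]Result, maxResults int) []Result {
	scores := make(map[ID]Score)
	for _, seg := range segments {
		for i, r := range seg {
			if maxResults > 0 && i >= maxResults {
				break // temporary results are capped; later candidates are never seen
			}
			scores[r.ID] += r.Score
		}
	}
	results := make([]Result, 0, len(scores))
	for id, s := range scores {
		results = append(results, Result{ID: id, Score: s})
	}
	// Order by score, highest first; ties by ID for deterministic output.
	sort.Slice(results, func(i, j int) bool {
		if results[i].Score != results[j].Score {
			return results[i].Score > results[j].Score
		}
		return results[i].ID < results[j].ID
	})
	return results
}

func main() {
	segments := [][]Result{{{1, 0.1}, {2, 0.2}, {3, 0.9}}}
	fmt.Println(searchSketch(segments, 0)) // unlimited: ID 3 ranks first
	fmt.Println(searchSketch(segments, 2)) // capped: ID 3 is never seen
}
```

The temporary map grows with every distinct ID touched by any segment, which is why a union over common words, or arbitrary user input, benefits most from a limit such as DefaultMaxResults.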
func (*File) SetLastID ¶
SetLastID stores the given ID (which may also hold a value of some unrelated type with the same byte length) in the statistics of the database. This can be used to store the last state of some operation; the stored value can be retrieved with LastID. Setting the value has no effect on the indexed data.
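As a sketch of storing "some unrelated type with the same byte length", the standalone code below packs a float32 (4 bytes, like the uint32 ID) into an ID with math.Float32bits and recovers it with math.Float32frombits. The helper names progressToID and idToProgress are hypothetical; with minsearch, the packed value would be passed to SetLastID and the value returned by LastID unpacked the same way.

```go
package main

import (
	"fmt"
	"math"
)

// ID mirrors the minsearch alias (uint32, 4 bytes).
type ID = uint32

// progressToID (hypothetical) reinterprets the bits of a float32 as an ID
// without changing them, so the value could be passed to SetLastID.
func progressToID(p float32) ID { return math.Float32bits(p) }

// idToProgress (hypothetical) reverses progressToID, e.g. for a value
// returned by LastID.
func idToProgress(id ID) float32 { return math.Float32frombits(id) }

func main() {
	id := progressToID(0.75)
	fmt.Println(idToProgress(id)) // 0.75
}
```

Because only the bits are reinterpreted, the round trip is exact for every float32 value.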
func (*File) UpdateStatistics ¶
UpdateStatistics calculates the current number of keys and the average data length (number of IDs) per key. The results can be retrieved with KeyCount and AvgCount.
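The statistics are a snapshot: KeyCount and AvgCount report values "at last calculation" and return an error before the first UpdateStatistics call. The sketch below models that behavior with a hypothetical in-memory stats type and a map from segment keys to ID lists; it is an illustration of the described contract, not minsearch's implementation.

```go
package main

import (
	"errors"
	"fmt"
)

type ID = uint32

// stats is a hypothetical in-memory stand-in for the index statistics.
type stats struct {
	calculated bool
	keyCount   uint32
	avgCount   float32
}

var errNoStats = errors.New("statistics not calculated; call updateStatistics first")

// updateStatistics counts the keys and the average number of IDs per key.
func (s *stats) updateStatistics(index map[string][]ID) {
	total := 0
	for _, ids := range index {
		total += len(ids)
	}
	s.keyCount = uint32(len(index))
	s.avgCount = 0
	if len(index) > 0 {
		s.avgCount = float32(total) / float32(len(index))
	}
	s.calculated = true
}

// KeyCount returns the key count from the last calculation.
func (s *stats) KeyCount() (uint32, error) {
	if !s.calculated {
		return 0, errNoStats
	}
	return s.keyCount, nil
}

// AvgCount returns the average IDs per key from the last calculation.
func (s *stats) AvgCount() (float32, error) {
	if !s.calculated {
		return 0, errNoStats
	}
	return s.avgCount, nil
}

func main() {
	index := map[string][]ID{"go": {1, 2, 3}, "search": {2}}
	var s stats
	if _, err := s.AvgCount(); err != nil {
		fmt.Println("before:", err)
	}
	s.updateStatistics(index)
	n, _ := s.KeyCount()
	avg, _ := s.AvgCount()
	fmt.Println(n, avg) // 2 2
}
```

Indexing more data after updateStatistics leaves the stored values unchanged until it is called again.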
type ID ¶
type ID = uint32
ID is a unique uint32 number, such as a position or an FNV hash, that is indexed together with a Score.
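One way to obtain an FNV-based ID is the standard library's hash/fnv package. The sketch below assumes 32-bit FNV-1a (fnv.New32a) applied to a document name; the helper idFromName is hypothetical, and the choice of FNV-1a over FNV-1 is an assumption, not something the package prescribes.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// ID mirrors the minsearch alias.
type ID = uint32

// idFromName (hypothetical) derives an ID from a name with 32-bit FNV-1a.
// Distinct names can collide, so uniqueness is not guaranteed in general.
func idFromName(name string) ID {
	h := fnv.New32a()
	h.Write([]byte(name)) // hash.Hash.Write never returns an error
	return h.Sum32()
}

func main() {
	fmt.Println(idFromName("docs/readme.txt"))
	fmt.Println(idFromName("docs/readme.txt") == idFromName("docs/readme.txt")) // true
}
```

A hash is stable across runs without any bookkeeping, but if IDs must be strictly unique, a sequential position is the safer choice.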
type Result ¶
Result is a single search result of a result set. It holds the ID and its score for the given search query.
type Score ¶
type Score = float32
Score is a priority value calculated for each indexed segment per ID.
type SetOperation ¶
type SetOperation uint8
SetOperation is the operation that is done on the result set when the query consists of multiple relevant segments.
const (
	// Union collects all search results that match at least one relevant segment of the query.
	Union SetOperation = iota
	// Intersection collects all results that match each relevant segment of the query.
	Intersection
)
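The difference between Union and Intersection can be sketched on plain ID sets. The standalone code below redeclares SetOperation and its constants to mirror the declaration above; the combine function is hypothetical and ignores scores, so it shows only which IDs survive each operation.

```go
package main

import (
	"fmt"
	"sort"
)

type ID = uint32

// SetOperation and its constants mirror the minsearch declaration.
type SetOperation uint8

const (
	Union SetOperation = iota
	Intersection
)

// combine (hypothetical) merges the ID sets matched by each relevant
// segment of a query according to op. Scores are ignored here.
func combine(segmentIDs [][]ID, op SetOperation) []ID {
	counts := make(map[ID]int)
	for _, ids := range segmentIDs {
		seen := make(map[ID]bool) // count each ID at most once per segment
		for _, id := range ids {
			if !seen[id] {
				seen[id] = true
				counts[id]++
			}
		}
	}
	var out []ID
	for id, c := range counts {
		// Union: matched by any segment. Intersection: matched by every segment.
		if op == Union || c == len(segmentIDs) {
			out = append(out, id)
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i] < out[j] })
	return out
}

func main() {
	query := [][]ID{{1, 2, 3}, {2, 3, 4}}
	fmt.Println(combine(query, Union))        // [1 2 3 4]
	fmt.Println(combine(query, Intersection)) // [2 3]
}
```

Union can only grow as segments are added, while Intersection can only shrink, which matches the advice to limit maxResults especially for Union searches.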