Documentation ¶
Overview ¶
Package cleo provides a fast search algorithm for prefix matching large amounts of text.
Index ¶
- Constants
- func BuildIndexes(corpusPath string, scoringFunction fn_score)
- func InitIndex(iIndex *InvertedIndex, fIndex *ForwardIndex, corpusPath string)
- func LevenshteinDistance(s, t string) int
- func Max(a ...int) int
- func Min(a ...int) int
- func Score(query, candidate string) float64
- func TestBytesFromQuery(bf int, qBloom int) bool
- type ByScore
- type Document
- type ForwardIndex
- type InvertedIndex
- type RankedResult
- type RankedResults
Constants ¶
const ( FNV_BASIS_64 = uint64(14695981039346656037) FNV_PRIME_64 = uint64((1 << 40) + 435) FNV_MASK_64 = uint64(^uint64(0) >> 1) NUM_BITS = 64 FNV_BASIS_32 = uint32(0x811c9dc5) FNV_PRIME_32 = uint32((1 << 24) + 403) FNV_MASK_32 = uint32(^uint32(0) >> 1) )
Used for the bloom filter
Variables ¶
This section is empty.
Functions ¶
func BuildIndexes ¶
func BuildIndexes(corpusPath string, scoringFunction fn_score)
func InitIndex ¶
func InitIndex(iIndex *InvertedIndex, fIndex *ForwardIndex, corpusPath string)
func LevenshteinDistance ¶
Levenshtein distance is the number of inserts, deletions, and substitutions that differentiate one word from another. This algorithm is dynamic programming found at http://en.wikipedia.org/wiki/Levenshtein_distance
func TestBytesFromQuery ¶
Iterates through all of the 8 bytes (64 bits) and tests each bit that is set to 1 in the query's filter against the bit in the comparison's filter. If the bit is not also 1, you do not have a match.
Types ¶
type ByScore ¶
type ByScore struct{ RankedResults }
type ForwardIndex ¶
Forward Index - Maps the document id to the document
func NewForwardIndex ¶
func NewForwardIndex() *ForwardIndex
func (*ForwardIndex) AddDoc ¶
func (x *ForwardIndex) AddDoc(docId int, doc string)
type InvertedIndex ¶
Inverted Index - Maps the query prefix to the matching documents
func NewInvertedIndex ¶
func NewInvertedIndex() *InvertedIndex
func (*InvertedIndex) Search ¶
func (x *InvertedIndex) Search(query string) []Document
func (*InvertedIndex) Size ¶
func (x *InvertedIndex) Size() int
type RankedResult ¶
func CleoSearch ¶
func CleoSearch(iIndex *InvertedIndex, fIndex *ForwardIndex, query string) []RankedResult
This is the meat of the search. It first checks the inverted index for matches, then filters the potentially numerous results using the bloom filter. Finally, it ranks the word using a Levenshtein distance.
type RankedResults ¶
type RankedResults []RankedResult
func (RankedResults) Len ¶
func (s RankedResults) Len() int
func (RankedResults) Swap ¶
func (s RankedResults) Swap(i, j int)