package search

v0.3.0
Published: Aug 31, 2022 License: Apache-2.0 Imports: 15 Imported by: 0

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func Distance

func Distance(s1, s2 string, c DistanceCalculator) (float64, error)

Distance calculates the similarity between two strings. The DistanceCalculator determines which algorithm is used.

This function is a wrapper for the matchr library. The documentation of the DistanceCalculator constants and parts of the testdata are adapted from there. For more information about the implementation, see http://github.com/antzucaro/matchr.

There are two groups of algorithms:

Edit distance: Levenshtein, Damerau-Levenshtein, Hamming, Jaro, Jaro-Winkler, OSA, Smith-Waterman

Sound similarity: Metaphone, Nysiis, Phonex, Soundex
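To illustrate what the edit-distance group computes, here is a minimal, self-contained Levenshtein sketch. This is not the package's (or matchr's) implementation, just the textbook dynamic-programming version of the distance the Levenshtein constant refers to.

```go
package main

import "fmt"

// levenshtein computes the edit distance between two strings:
// the minimum number of insertions, deletions, and substitutions
// needed to transform s1 into s2.
func levenshtein(s1, s2 string) int {
	r1, r2 := []rune(s1), []rune(s2)
	prev := make([]int, len(r2)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(r1); i++ {
		curr := make([]int, len(r2)+1)
		curr[0] = i
		for j := 1; j <= len(r2); j++ {
			cost := 1
			if r1[i-1] == r2[j-1] {
				cost = 0
			}
			curr[j] = min(prev[j]+1, min(curr[j-1]+1, prev[j-1]+cost))
		}
		prev = curr
	}
	return prev[len(r2)]
}

func min(a, b int) int {
	if a < b {
		return a
	}
	return b
}

func main() {
	fmt.Println(levenshtein("kitten", "sitting")) // 3
}
```

Damerau-Levenshtein and OSA extend this recurrence with a transposition case; the phonetic group below works differently, comparing encodings of how strings sound rather than how they are spelled.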

func IsFuzzyMatch

func IsFuzzyMatch(s1, s2 string, fuzziness float64, c DistanceCalculator) (bool, error)

IsFuzzyMatch determines if two strings are similar enough within the specified fuzziness.

func IsPhraseMatch

func IsPhraseMatch(pos1, pos2, slop int) (bool, error)

IsPhraseMatch is a helper to determine if two positions are close enough to be part of the same phrase. This is used for phrase queries. Default slop is 0.
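A sketch of one plausible slop rule, assuming slop counts the number of intervening positions allowed between the two terms (so the default slop of 0 only accepts directly adjacent positions). The exact semantics used by IsPhraseMatch are not spelled out here, so treat this as an illustration only.

```go
package main

import "fmt"

// isPhraseMatch reports whether two term positions are close enough to
// belong to the same phrase. With slop 0 only directly adjacent positions
// (distance 1) match; each extra point of slop allows one more
// intervening position.
func isPhraseMatch(pos1, pos2, slop int) bool {
	d := pos1 - pos2
	if d < 0 {
		d = -d
	}
	return d >= 1 && d <= slop+1
}

func main() {
	fmt.Println(isPhraseMatch(3, 4, 0)) // adjacent terms, slop 0: true
	fmt.Println(isPhraseMatch(3, 6, 0)) // two terms apart, slop 0: false
	fmt.Println(isPhraseMatch(3, 6, 2)) // allowed once slop is 2: true
}
```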

func LuceneASCIIFolding

func LuceneASCIIFolding(str string) string

LuceneASCIIFolding converts Unicode characters to their ASCII equivalents. When no equivalent is found, the original Unicode character is returned.

The native Go solution in NativeASCIIFolding() does not produce the exact same result as the Lucene 'ASCIIFoldingFilter'.

func NativeASCIIFolding

func NativeASCIIFolding(text string) string

NativeASCIIFolding uses the Go native ASCII folding functionality. This is sufficient in most cases. When full compliance with the Lucene ASCIIFoldingFilter is required, use LuceneASCIIFolding().
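The core idea behind both folding functions can be sketched with a tiny rune-replacement table. The real filters (Lucene's ASCIIFoldingFilter and Go's text transformers) use far larger tables; the fold table below is a hand-picked illustration, not the mapping either function actually uses.

```go
package main

import (
	"fmt"
	"strings"
)

// foldTable maps a handful of accented runes to ASCII replacements.
// Runes without an entry are passed through unchanged, mirroring the
// "original is returned" behaviour described above.
var foldTable = map[rune]string{
	'à': "a", 'á': "a", 'â': "a", 'ä': "a",
	'ç': "c",
	'è': "e", 'é': "e", 'ê': "e", 'ë': "e",
	'ï': "i", 'ö': "o", 'ü': "u",
	'ß': "ss",
}

func foldASCII(s string) string {
	var b strings.Builder
	for _, r := range s {
		if repl, ok := foldTable[r]; ok {
			b.WriteString(repl)
			continue
		}
		b.WriteRune(r)
	}
	return b.String()
}

func main() {
	fmt.Println(foldASCII("café straße")) // cafe strasse
}
```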

func Transform

func Transform(s1 string, pp PhoneticPreprocessor) (string, error)

Types

type Analyzer

type Analyzer struct{}

Analyzer is the default analyzer for Search actions. It folds unicode to ASCII characters and lowercases them all.

The goal is to have this analyzer behave similarly to the ElasticSearch Analyzer that Ikuzo comes preconfigured with.

func (*Analyzer) Transform

func (a *Analyzer) Transform(text string) string

func (*Analyzer) TransformPhrase

func (a *Analyzer) TransformPhrase(text string) string

type AutoComplete added in v0.1.3

type AutoComplete struct {
	SuggestFn func(a Autos) Autos
	// contains filtered or unexported fields
}

func NewAutoComplete added in v0.1.3

func NewAutoComplete() *AutoComplete

func (*AutoComplete) FromStrings added in v0.1.3

func (ac *AutoComplete) FromStrings(words []string)

func (*AutoComplete) FromTokenSteam added in v0.1.3

func (ac *AutoComplete) FromTokenSteam(stream *TokenStream)

func (*AutoComplete) Suggest added in v0.1.3

func (ac *AutoComplete) Suggest(input string, limit int) ([]Autos, error)

type Autos added in v0.1.3

type Autos struct {
	Term     string
	Count    int
	Metadata map[string][]string
}
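A naive stand-in for the FromStrings/Suggest flow: count every word, keep the ones with the requested prefix, and return the most frequent first. This sketches the shape of the API only; the package's actual matching and ranking may differ.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// autos mirrors the shape of the Autos suggestion: a term plus how often
// it was seen.
type autos struct {
	Term  string
	Count int
}

// suggest counts each word, filters by prefix, and ranks by descending
// count (ties broken alphabetically), truncating to limit.
func suggest(words []string, prefix string, limit int) []autos {
	counts := map[string]int{}
	for _, w := range words {
		counts[w]++
	}
	var out []autos
	for term, n := range counts {
		if strings.HasPrefix(term, prefix) {
			out = append(out, autos{Term: term, Count: n})
		}
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Count != out[j].Count {
			return out[i].Count > out[j].Count
		}
		return out[i].Term < out[j].Term
	})
	if len(out) > limit {
		out = out[:limit]
	}
	return out
}

func main() {
	words := []string{"apple", "apply", "apple", "banana"}
	fmt.Println(suggest(words, "app", 2)) // [{apple 2} {apply 1}]
}
```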

type DistanceCalculator

type DistanceCalculator int
const (
	// Levenshtein computes the Levenshtein distance between two
	// strings. The returned value - distance - is the number of insertions,
	// deletions, and substitutions it takes to transform one
	// string (s1) into another (s2). Each step in the transformation "costs"
	// one distance point.
	Levenshtein DistanceCalculator = iota

	// DamerauLevenshtein computes the Damerau-Levenshtein distance between two
	// strings. The returned value - distance - is the number of insertions,
	// deletions, substitutions, and transpositions it takes to transform one
	// string (s1) into another (s2). Each step in the transformation "costs"
	// one distance point. It is similar to the Optimal String Alignment,
	// algorithm, but is more complex because it allows multiple edits on
	// substrings.
	DamerauLevenshtein

	// Hamming computes the Hamming distance between two equal-length strings.
	// This is the number of times the two strings differ between characters at
	// the same index. This implementation is based off of the algorithm
	// description found at http://en.wikipedia.org/wiki/Hamming_distance.
	Hamming

	// Jaro computes the Jaro edit distance between two strings. It represents
	// this with a float64 between 0 and 1 inclusive, with 0 indicating the two
	// strings are not at all similar and 1 indicating the two strings are exact
	// matches.
	//
	// See http://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance for a
	// full description.
	Jaro

	// JaroWinkler computes the Jaro-Winkler edit distance between two strings.
	// This is a modification of the Jaro algorithm that gives additional weight
	// to prefix matches.
	JaroWinkler

	// OSA computes the Optimal String Alignment distance between two
	// strings. The returned value - distance - is the number of insertions,
	// deletions, substitutions, and transpositions it takes to transform one
	// string (s1) into another (s2). Each step in the transformation "costs"
	// one distance point. It is similar to Damerau-Levenshtein, but is simpler
	// because it does not allow multiple edits on any substring.
	Osa

	// SmithWaterman computes the Smith-Waterman local sequence alignment for the
	// two input strings. This was originally designed to find similar regions in
	// strings representing DNA or protein sequences.
	SmithWaterman
)
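As a contrast to the Levenshtein family, the Hamming constant above is the simplest of the group: it only compares equal-length strings, position by position. A minimal sketch (again, not the matchr implementation):

```go
package main

import (
	"errors"
	"fmt"
)

// hamming counts the positions at which two equal-length strings differ;
// strings of unequal length are an error, as the distance is undefined.
func hamming(s1, s2 string) (int, error) {
	r1, r2 := []rune(s1), []rune(s2)
	if len(r1) != len(r2) {
		return 0, errors.New("hamming: strings must be the same length")
	}
	d := 0
	for i := range r1 {
		if r1[i] != r2[i] {
			d++
		}
	}
	return d, nil
}

func main() {
	d, _ := hamming("karolin", "kathrin")
	fmt.Println(d) // 3
}
```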

func (DistanceCalculator) String

func (dc DistanceCalculator) String() string

type Matches added in v0.1.3

type Matches struct {
	// contains filtered or unexported fields
}

func NewMatches added in v0.1.3

func NewMatches() *Matches

func (*Matches) AppendTerm added in v0.1.3

func (m *Matches) AppendTerm(term string, tv *Vectors)

func (*Matches) DocCount added in v0.1.3

func (m *Matches) DocCount() int

func (*Matches) HasDocID added in v0.1.3

func (m *Matches) HasDocID(docID int) bool

func (*Matches) Merge added in v0.1.3

func (m *Matches) Merge(matches *Matches)

func (*Matches) Reset added in v0.1.7

func (m *Matches) Reset()

Reset is used when already gathered matches must be reset when ErrSearchNoMatch is returned.

func (*Matches) TermCount added in v0.1.3

func (m *Matches) TermCount() int

func (*Matches) TermFrequency added in v0.1.3

func (m *Matches) TermFrequency() map[string]int

func (*Matches) Total added in v0.1.3

func (m *Matches) Total() int

func (*Matches) Vectors added in v0.1.3

func (m *Matches) Vectors() *Vectors

type Operator

type Operator string
const (
	AndOperator      Operator = "AND"
	BoostOperator    Operator = "^"
	FieldOperator    Operator = ":"
	FuzzyOperator    Operator = "~"
	NilOperator      Operator = ""
	NotOperator      Operator = "NOT"
	OrOperator       Operator = "OR"
	WildCardOperator Operator = "*"
)

type PhoneticPreprocessor

type PhoneticPreprocessor int
const (
	// DoubleMetaphone computes the Double-Metaphone value of the input string.
	// This value is a phonetic representation of how the string sounds, with
	// affordances for many different language dialects. It was originally
	// developed by Lawrence Phillips in the 1990s.
	//
	// More information about this algorithm can be found on Wikipedia at
	// http://en.wikipedia.org/wiki/Metaphone.
	DoubleMetaphone PhoneticPreprocessor = iota

	// NYSIIS computes the NYSIIS phonetic encoding of the input string. It is a
	// modification of the traditional Soundex algorithm.
	Nysiis

	// Phonex computes the Phonex phonetic encoding of the input string. Phonex is
	// a modification of the venerable Soundex algorithm. It accounts for a few
	// more letter combinations to improve accuracy on some data sets.
	//
	// This implementation is based off of the original C implementation by the
	// creator - A. J. Lait - as found in his research paper entitled "An
	// Assessment of Name Matching Algorithms."
	Phonex

	// Soundex computes the Soundex phonetic representation of the input string. It
	// attempts to encode homophones with the same characters. More information can
	// be found at http://en.wikipedia.org/wiki/Soundex.
	Soundex
)
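To make the phonetic group concrete, here is a simplified American Soundex: keep the first letter, encode the remaining consonants as digits, collapse adjacent duplicate codes, and pad or truncate to four characters. It omits the full algorithm's extra rule for h/w between duplicate codes, and it is not the matchr implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// soundexCode maps consonants to their Soundex digit; vowels and the
// letters h, w, y have no entry and are dropped.
var soundexCode = map[byte]byte{
	'b': '1', 'f': '1', 'p': '1', 'v': '1',
	'c': '2', 'g': '2', 'j': '2', 'k': '2', 'q': '2', 's': '2', 'x': '2', 'z': '2',
	'd': '3', 't': '3',
	'l': '4',
	'm': '5', 'n': '5',
	'r': '6',
}

// soundex encodes a name so that homophones tend to share a code.
func soundex(name string) string {
	name = strings.ToLower(name)
	if name == "" {
		return ""
	}
	out := []byte{name[0] - 'a' + 'A'}
	prev := soundexCode[name[0]]
	for i := 1; i < len(name) && len(out) < 4; i++ {
		code := soundexCode[name[i]]
		if code != 0 && code != prev {
			out = append(out, code)
		}
		prev = code
	}
	for len(out) < 4 {
		out = append(out, '0')
	}
	return string(out)
}

func main() {
	fmt.Println(soundex("Robert"), soundex("Rupert")) // R163 R163
}
```

Robert and Rupert sharing the code R163 is the point of the encoding: spelled differently, they sound alike.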

func (PhoneticPreprocessor) String

func (pp PhoneticPreprocessor) String() string

type QueryOption

type QueryOption func(*QueryParser) error

func SetDefaultOperator

func SetDefaultOperator(op Operator) QueryOption

SetDefaultOperator sets the default boolean search operator for the query

func SetFields

func SetFields(field ...string) QueryOption

SetFields sets the default search fields for the query

type QueryParser

type QueryParser struct {
	// contains filtered or unexported fields
}

term1*  -- Searches for the prefix term1
term1\* -- Searches for the term term1*
term*1  -- Searches for the term term*1
term\*1 -- Searches for the term term*1

Note that the examples above show the terms before text processing.

The specification and documentation are adapted from the Lucene documentation for the SimpleQueryParser: https://lucene.apache.org/core/6_6_1/queryparser/org/apache/lucene/queryparser/simple/SimpleQueryParser.html
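The wildcard rules above can be sketched as a small classifier: an unescaped trailing * makes the term a prefix query, while an escaped \* (or a * anywhere else) is kept as a literal character. This mirrors my reading of the examples, not the parser's actual code.

```go
package main

import (
	"fmt"
	"strings"
)

// classify returns the processed term and whether it is a prefix query.
// Only an unescaped trailing * triggers prefix matching; escaped \*
// sequences are unescaped into literal asterisks.
func classify(raw string) (term string, prefix bool) {
	if strings.HasSuffix(raw, "*") && !strings.HasSuffix(raw, `\*`) {
		return strings.ReplaceAll(strings.TrimSuffix(raw, "*"), `\*`, "*"), true
	}
	return strings.ReplaceAll(raw, `\*`, "*"), false
}

func main() {
	fmt.Println(classify("term1*"))  // term1 true  (prefix query)
	fmt.Println(classify(`term1\*`)) // term1* false (literal term)
	fmt.Println(classify("term*1"))  // term*1 false
	fmt.Println(classify(`term\*1`)) // term*1 false
}
```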

func NewQueryParser

func NewQueryParser(options ...QueryOption) (*QueryParser, error)

NewQueryParser returns a QueryParser that can be used to parse user queries.

func (*QueryParser) Fields

func (qp *QueryParser) Fields() []string

Fields returns the default search fields for the query

func (*QueryParser) Parse

func (qp *QueryParser) Parse(query string) (*QueryTerm, error)

type QueryTerm

type QueryTerm struct {
	Field          string
	Value          string
	Prohibited     bool
	Phrase         bool
	SuffixWildcard bool
	PrefixWildcard bool
	Boost          float64
	Fuzzy          int // fuzzy is for words
	Slop           int // slop is for phrases
	// contains filtered or unexported fields
}

func (*QueryTerm) IsBoolQuery

func (qt *QueryTerm) IsBoolQuery() bool

IsBoolQuery returns true if the QueryTerm has a nested QueryTerm in a Boolean clause.

func (*QueryTerm) Must

func (qt *QueryTerm) Must() []*QueryTerm

Must returns a list of Required QueryTerms.

func (*QueryTerm) MustNot

func (qt *QueryTerm) MustNot() []*QueryTerm

MustNot returns a list of Prohibited QueryTerms.

func (*QueryTerm) Should

func (qt *QueryTerm) Should() []*QueryTerm

Should returns a list of Optional QueryTerms. One or more must match to satisfy the Query.

func (*QueryTerm) Type

func (qt *QueryTerm) Type() QueryType

Type returns the type of the Query.

type QueryType

type QueryType int
const (
	BoolQuery QueryType = iota
	FuzzyQuery
	PhraseQuery
	TermQuery
	WildCardQuery
)

func (QueryType) String

func (qt QueryType) String() string

type SpellCheckOption added in v0.1.3

type SpellCheckOption func(*SpellChecker)

func SetSuggestDepth added in v0.1.3

func SetSuggestDepth(depth int) SpellCheckOption

func SetThreshold added in v0.1.3

func SetThreshold(threshold int) SpellCheckOption

type SpellChecker added in v0.1.3

type SpellChecker struct {
	// contains filtered or unexported fields
}

func NewSpellCheck added in v0.1.3

func NewSpellCheck(options ...SpellCheckOption) *SpellChecker

func (*SpellChecker) SetCount added in v0.1.3

func (s *SpellChecker) SetCount(term string, count int, suggest bool)

func (*SpellChecker) SpellCheck added in v0.1.3

func (s *SpellChecker) SpellCheck(input string) string

SpellCheck returns the most likely correction for the input term.

func (*SpellChecker) SpellCheckSuggestions added in v0.1.3

func (s *SpellChecker) SpellCheckSuggestions(input string, n int) []string

SpellCheckSuggestions returns the most likely corrections, ordered from best to worst.

func (*SpellChecker) Train added in v0.1.3

func (s *SpellChecker) Train(stream *TokenStream)
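The train/correct cycle can be sketched with a Norvig-style corrector: build a frequency table of known terms, then pick the most frequent term within one edit of the input. Whether SpellChecker works this way internally is an assumption; this only illustrates the flavour of the API.

```go
package main

import "fmt"

// corrector is a frequency table of known terms, comparable in spirit to
// training a spell checker from a token stream.
type corrector map[string]int

func train(words []string) corrector {
	c := corrector{}
	for _, w := range words {
		c[w]++
	}
	return c
}

// edits1 generates every string one edit away from w: deletions,
// transpositions, replacements, and insertions (ASCII only).
func edits1(w string) []string {
	const letters = "abcdefghijklmnopqrstuvwxyz"
	var out []string
	for i := 0; i <= len(w); i++ {
		if i < len(w) {
			out = append(out, w[:i]+w[i+1:]) // delete
		}
		if i < len(w)-1 {
			out = append(out, w[:i]+string(w[i+1])+string(w[i])+w[i+2:]) // transpose
		}
		for _, c := range letters {
			if i < len(w) {
				out = append(out, w[:i]+string(c)+w[i+1:]) // replace
			}
			out = append(out, w[:i]+string(c)+w[i:]) // insert
		}
	}
	return out
}

// correct returns the most frequent known term within one edit of input,
// or the input itself when nothing better is known.
func (c corrector) correct(input string) string {
	if c[input] > 0 {
		return input
	}
	best, bestCount := input, 0
	for _, cand := range edits1(input) {
		if n := c[cand]; n > bestCount {
			best, bestCount = cand, n
		}
	}
	return best
}

func main() {
	c := train([]string{"spelling", "spelling", "spelled"})
	fmt.Println(c.correct("speling")) // spelling
}
```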

type Token added in v0.1.3

type Token struct {
	Vector        int
	TermVector    int
	OffsetStart   int
	OffsetEnd     int
	Ignored       bool
	RawText       string
	Normal        string
	TrailingSpace bool
	Punctuation   bool
	DocID         int
}

func (*Token) GetTermVector added in v0.1.3

func (t *Token) GetTermVector() Vector

type TokenOption added in v0.1.3

type TokenOption func(tok *Tokenizer)

func SetPhraseAware added in v0.1.3

func SetPhraseAware() TokenOption

type TokenStream added in v0.1.3

type TokenStream struct {
	// contains filtered or unexported fields
}

func (*TokenStream) Highlight added in v0.1.3

func (ts *TokenStream) Highlight(vectors *Vectors, tagLabel, emClass string) string

TODO(kiivihal): refactor to reduce cyclo complexity

func (*TokenStream) String added in v0.1.3

func (ts *TokenStream) String() string

func (*TokenStream) Tokens added in v0.1.3

func (ts *TokenStream) Tokens() []Token

type Tokenizer added in v0.1.3

type Tokenizer struct {
	// contains filtered or unexported fields
}

func NewTokenizer added in v0.1.3

func NewTokenizer(options ...TokenOption) *Tokenizer

func (*Tokenizer) Parse added in v0.1.3

func (t *Tokenizer) Parse(r io.Reader, docID int) *TokenStream

Parse creates a stream of tokens from an io.Reader. When a document identifier of 0 is given, the document count is auto-incremented on each call to Parse; without this, repeated calls would effectively create the same vectors as previous runs.
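A minimal picture of what a token stream carries: each term gets a normalized form, a 1-based position, and the document it came from. The real Tokenizer tracks more (offsets, punctuation, phrase awareness); this whitespace-splitting sketch only shows the position/docID bookkeeping.

```go
package main

import (
	"fmt"
	"strings"
)

// token holds a minimal subset of the Token fields above: the normalized
// term, its 1-based position in the document, and the document ID.
type token struct {
	Normal string
	Vector int
	DocID  int
}

// tokenize lowercases the text, splits on whitespace, and assigns
// 1-based positions (position 0 is never used).
func tokenize(text string, docID int) []token {
	var tokens []token
	for i, w := range strings.Fields(strings.ToLower(text)) {
		tokens = append(tokens, token{Normal: w, Vector: i + 1, DocID: docID})
	}
	return tokens
}

func main() {
	for _, t := range tokenize("Hello search world", 1) {
		fmt.Printf("%+v\n", t)
	}
}
```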

func (*Tokenizer) ParseBytes added in v0.1.3

func (t *Tokenizer) ParseBytes(b []byte, docID int) *TokenStream

func (*Tokenizer) ParseString added in v0.1.3

func (t *Tokenizer) ParseString(text string, docID int) *TokenStream

type Vector added in v0.1.3

type Vector struct {
	DocID    int
	Location int
}

func ValidPhrasePosition

func ValidPhrasePosition(vector Vector, slop int) []Vector

ValidPhrasePosition returns a list of valid positions from the source position to determine if the term is part of a phrase.

type Vectors added in v0.1.3

type Vectors struct {
	Locations     map[Vector]bool
	Docs          map[int]bool
	PhraseVectors int
}

func NewVectors added in v0.1.3

func NewVectors() *Vectors

func (*Vectors) Add added in v0.1.3

func (tv *Vectors) Add(doc, pos int)

Add records a (doc, pos) vector; pos must not be 0.

func (*Vectors) AddPhraseVector added in v0.1.3

func (tv *Vectors) AddPhraseVector(vector Vector)

func (*Vectors) AddVector added in v0.1.3

func (tv *Vectors) AddVector(vector Vector)

func (*Vectors) DocCount added in v0.1.3

func (tv *Vectors) DocCount() int

func (*Vectors) HasDoc added in v0.1.3

func (tv *Vectors) HasDoc(doc int) bool

func (*Vectors) HasVector added in v0.1.3

func (tv *Vectors) HasVector(vector Vector) bool

func (*Vectors) Merge added in v0.1.3

func (tv *Vectors) Merge(vectors *Vectors)

func (*Vectors) Size added in v0.1.3

func (tv *Vectors) Size() int
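The Vectors bookkeeping above can be sketched as two sets: the (docID, location) pairs seen and the documents seen. Because both are sets, Merge naturally collapses duplicates. A minimal stand-in, not the package's implementation:

```go
package main

import "fmt"

// vector is a (document, location) pair, matching the Vector type above.
type vector struct {
	DocID    int
	Location int
}

// vectors tracks the set of locations and the set of documents seen.
type vectors struct {
	Locations map[vector]bool
	Docs      map[int]bool
}

func newVectors() *vectors {
	return &vectors{Locations: map[vector]bool{}, Docs: map[int]bool{}}
}

func (v *vectors) Add(doc, pos int) {
	v.Locations[vector{DocID: doc, Location: pos}] = true
	v.Docs[doc] = true
}

// Merge folds another set of vectors into this one; duplicate entries
// collapse because both fields are sets.
func (v *vectors) Merge(other *vectors) {
	for loc := range other.Locations {
		v.Locations[loc] = true
	}
	for doc := range other.Docs {
		v.Docs[doc] = true
	}
}

func (v *vectors) Size() int     { return len(v.Locations) }
func (v *vectors) DocCount() int { return len(v.Docs) }

func main() {
	a, b := newVectors(), newVectors()
	a.Add(1, 1)
	a.Add(1, 2)
	b.Add(2, 1)
	b.Add(1, 2) // duplicate of a's second vector
	a.Merge(b)
	fmt.Println(a.Size(), a.DocCount()) // 3 2
}
```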

Directories

Path Synopsis
