Documentation ¶
Index ¶
- Constants
- func CalculateHybridSimilarity(text1, text2 string, opts ...Option) float64
- func FindBestMatchInList(targetText string, texts []string, opts ...Option) (bestMatch string, highestScore float64)
- func FindBestNMatchesInList(targetText string, texts []string, n int, opts ...Option) []utils.Match
- func PreprocessText(text string, opts *SimilarityOptions) string
- type Option
- type SimilarityOptions
Constants ¶
const (
	DefaultNgramSize            = 2
	DefaultWordSimWeight        = 0.5
	DefaultNgramSimWeight       = 0.3
	DefaultContainmentSimWeight = 0.2
)
Functions ¶
func CalculateHybridSimilarity ¶
func CalculateHybridSimilarity(text1, text2 string, opts ...Option) float64
CalculateHybridSimilarity calculates a hybrid similarity score between two text strings. It combines different similarity measures (word similarity, n-gram similarity, and containment similarity) with custom weightings to provide an overall similarity score between the two texts.
Parameters:
- text1: The first text string for comparison.
- text2: The second text string for comparison.
- opts: An optional variadic parameter that allows customization of the n-gram size and weights.
Returns: The hybrid similarity score, which is a weighted combination of the three similarity measures.
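The weighted combination described above can be sketched as follows. This is a minimal illustration, assuming the score is a plain linear sum of the three component scores; the `weights` type and `hybridScore` function are hypothetical names, and the component scores are passed in directly rather than computed.

```go
package main

import "fmt"

// weights holds the three weight settings. The type name is hypothetical;
// the values used in main match the documented Default* constants.
type weights struct {
	word, ngram, containment float64
}

// hybridScore combines three component scores as a weighted sum.
// Assumption: the package uses a linear combination like this one.
func hybridScore(wordSim, ngramSim, containmentSim float64, w weights) float64 {
	return w.word*wordSim + w.ngram*ngramSim + w.containment*containmentSim
}

func main() {
	w := weights{word: 0.5, ngram: 0.3, containment: 0.2}
	// 0.5*0.8 + 0.3*0.5 + 0.2*1.0
	fmt.Printf("%.2f\n", hybridScore(0.8, 0.5, 1.0, w)) // prints 0.75
}
```

Because the default weights sum to 1.0, component scores in the range [0, 1] yield a combined score in the same range under this assumption.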
func FindBestMatchInList ¶
func FindBestMatchInList(targetText string, texts []string, opts ...Option) (bestMatch string, highestScore float64)
FindBestMatchInList takes a target text and a slice of texts, calculates the similarity for each, and returns the text with the highest similarity score.
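The search described above amounts to a single linear scan. The sketch below illustrates that idea under stated assumptions: `bestMatch` and `jaccard` are hypothetical names, and the word-set Jaccard scorer is only a stand-in for the package's hybrid measure so that the example is self-contained.

```go
package main

import (
	"fmt"
	"strings"
)

// wordSet lowercases s and returns its distinct whitespace-separated words.
func wordSet(s string) map[string]bool {
	set := map[string]bool{}
	for _, w := range strings.Fields(strings.ToLower(s)) {
		set[w] = true
	}
	return set
}

// jaccard is a stand-in scorer (word-set overlap), not the hybrid measure.
func jaccard(a, b string) float64 {
	setA, setB := wordSet(a), wordSet(b)
	if len(setA) == 0 && len(setB) == 0 {
		return 0
	}
	shared := 0
	for w := range setA {
		if setB[w] {
			shared++
		}
	}
	return float64(shared) / float64(len(setA)+len(setB)-shared)
}

// bestMatch scans texts once and keeps the highest-scoring entry.
// It returns ("", 0) when texts is empty.
func bestMatch(target string, texts []string, score func(a, b string) float64) (string, float64) {
	best, highest := "", -1.0
	for _, t := range texts {
		if s := score(target, t); s > highest {
			best, highest = t, s
		}
	}
	if highest < 0 {
		return "", 0
	}
	return best, highest
}

func main() {
	texts := []string{"red apple pie", "green apple", "blue sky"}
	match, score := bestMatch("green apple tart", texts, jaccard)
	fmt.Printf("%q %.2f\n", match, score) // prints "green apple" 0.67
}
```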
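func FindBestNMatchesInList ¶
func FindBestNMatchesInList(targetText string, texts []string, n int, opts ...Option) []utils.Match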
FindBestNMatchesInList searches through a list of texts to find the top `n` texts that are most similar to a target text. It uses a heap to efficiently keep track of the best matches while iterating through the list.
Parameters:
- targetText: The text string you want to compare against the list of texts.
- texts: A slice of text strings that you want to compare with the target text.
- n: The number of top matches you want to find.
- opts: Zero or more options that can modify the similarity calculation (such as n-gram size, weights, etc.).
Returns: A slice of Match structs, each containing a text from the input list and its similarity score to the target text. The slice is sorted in descending order of similarity scores, with the highest scoring matches first.
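A bounded min-heap is one common way to keep the best `n` items while scanning a list, and the sketch below shows that technique with the standard `container/heap` package. It is an illustration, not the package's code: the `match` type is a hypothetical stand-in for `utils.Match` (whose fields are not listed here), and the scores are supplied directly instead of being computed.

```go
package main

import (
	"container/heap"
	"fmt"
	"sort"
)

// match is a hypothetical stand-in for utils.Match.
type match struct {
	Text  string
	Score float64
}

// minHeap keeps the lowest-scoring match at index 0.
type minHeap []match

func (h minHeap) Len() int           { return len(h) }
func (h minHeap) Less(i, j int) bool { return h[i].Score < h[j].Score }
func (h minHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *minHeap) Push(x any)        { *h = append(*h, x.(match)) }
func (h *minHeap) Pop() any {
	old := *h
	last := old[len(old)-1]
	*h = old[:len(old)-1]
	return last
}

// topN returns the n highest-scoring candidates in descending score order.
func topN(candidates []match, n int) []match {
	if n <= 0 {
		return nil
	}
	h := &minHeap{}
	for _, c := range candidates {
		if h.Len() < n {
			heap.Push(h, c)
		} else if c.Score > (*h)[0].Score {
			// Replace the current weakest match and restore heap order.
			(*h)[0] = c
			heap.Fix(h, 0)
		}
	}
	out := append([]match(nil), (*h)...)
	sort.Slice(out, func(i, j int) bool { return out[i].Score > out[j].Score })
	return out
}

func main() {
	scored := []match{{"a", 0.2}, {"b", 0.9}, {"c", 0.5}, {"d", 0.7}}
	for _, m := range topN(scored, 2) {
		fmt.Println(m.Text, m.Score) // prints "b 0.9" then "d 0.7"
	}
}
```

With a heap capped at `n` entries, the scan costs O(len(texts) · log n) comparisons rather than the O(len(texts) · log len(texts)) of sorting every score.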
func PreprocessText ¶
func PreprocessText(text string, opts *SimilarityOptions) string
PreprocessText processes the input text for similarity comparison by performing several steps:
- Tokenization: Splitting the text into words (tokens) based on whitespace.
- Normalization: Converting all words to lowercase to ensure case insensitivity.
- Stop word removal: Eliminating common words (stop words) that are unlikely to be useful in the similarity comparison. It uses both a predefined set of stop words and any custom stop words provided in the SimilarityOptions.
- Stemming: Reducing words to their base or root form (stem).
Parameters:
- text: The input text to preprocess.
- opts: A pointer to SimilarityOptions which contains settings for the preprocessing, including any custom stop words to consider.
Returns: A preprocessed version of the input text with all words stemmed and stop words removed, joined into a single string separated by spaces.
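The four steps above can be sketched as a single pass over the tokens. This is a rough illustration under stated assumptions: `preprocess`, `naiveStem`, and `defaultStopWords` are hypothetical names, the stop-word set is a tiny sample rather than the package's predefined list, and the suffix-stripping stemmer is a placeholder for whatever stemming algorithm the package actually uses.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultStopWords is a small illustrative set, not the package's real list.
var defaultStopWords = map[string]bool{"the": true, "a": true, "is": true, "of": true}

// naiveStem strips a few common English suffixes.
// Placeholder only: a real stemmer (for example Porter) is more careful.
func naiveStem(w string) string {
	for _, suf := range []string{"ing", "ed", "s"} {
		if len(w) > len(suf)+2 && strings.HasSuffix(w, suf) {
			return strings.TrimSuffix(w, suf)
		}
	}
	return w
}

// preprocess tokenizes on whitespace, lowercases, drops stop words
// (default and custom), stems what remains, and joins with spaces.
func preprocess(text string, custom map[string]bool) string {
	var kept []string
	for _, tok := range strings.Fields(text) {
		tok = strings.ToLower(tok)
		if defaultStopWords[tok] || custom[tok] {
			continue
		}
		kept = append(kept, naiveStem(tok))
	}
	return strings.Join(kept, " ")
}

func main() {
	custom := map[string]bool{"cat": true}
	fmt.Println(preprocess("The cat is chasing painted birds", custom))
	// prints: chas paint bird
}
```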
Types ¶
type Option ¶
type Option func(*SimilarityOptions)
func WithContainmentSimWeight ¶
WithContainmentSimWeight sets the ContainmentSimWeight field in SimilarityOptions.
func WithCustomStopWords ¶
WithCustomStopWords allows users to add custom stop words by providing a list of words.
func WithNgramSimWeight ¶
WithNgramSimWeight sets the NgramSimWeight field in SimilarityOptions.
func WithNgramSize ¶
WithNgramSize sets the NgramSize field in SimilarityOptions.
func WithWordSimWeight ¶
WithWordSimWeight sets the WordSimWeight field in SimilarityOptions.
type SimilarityOptions ¶
type SimilarityOptions struct {
	NgramSize            int
	WordSimWeight        float64
	NgramSimWeight       float64
	ContainmentSimWeight float64
	CustomStopWords      map[string]bool
}
SimilarityOptions represents optional settings for hybrid similarity calculation.
- NgramSize: The n-gram size used for n-gram similarity calculation.
- WordSimWeight: Weight for word similarity in the final score.
- NgramSimWeight: Weight for n-gram similarity in the final score.
- ContainmentSimWeight: Weight for containment similarity in the final score.
- CustomStopWords: Additional stop words to remove during preprocessing, on top of the predefined set.
func DefaultSimilarityOptions ¶
func DefaultSimilarityOptions() *SimilarityOptions
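The Option type follows the functional options pattern: each option is a function that mutates a settings struct that starts from defaults. The sketch below reproduces the pattern with local, lowercase copies of the documented types so that it runs on its own. The parameter types of `withNgramSize` and `withCustomStopWords`, the `resolve` helper, and the assumption that the defaults mirror the Default* constants are all illustrative guesses, not confirmed signatures or behavior.

```go
package main

import "fmt"

// options is a local copy of the documented SimilarityOptions fields.
type options struct {
	NgramSize            int
	WordSimWeight        float64
	NgramSimWeight       float64
	ContainmentSimWeight float64
	CustomStopWords      map[string]bool
}

// option mirrors the documented `type Option func(*SimilarityOptions)`.
type option func(*options)

// defaults assumes the default options use the Default* constant values.
func defaults() *options {
	return &options{
		NgramSize:            2,
		WordSimWeight:        0.5,
		NgramSimWeight:       0.3,
		ContainmentSimWeight: 0.2,
		CustomStopWords:      map[string]bool{},
	}
}

// withNgramSize overrides the n-gram size (parameter type is an assumption).
func withNgramSize(n int) option {
	return func(o *options) { o.NgramSize = n }
}

// withCustomStopWords adds words to the stop-word set (signature assumed).
func withCustomStopWords(words []string) option {
	return func(o *options) {
		for _, w := range words {
			o.CustomStopWords[w] = true
		}
	}
}

// resolve starts from the defaults and applies each option in order.
func resolve(opts ...option) *options {
	o := defaults()
	for _, apply := range opts {
		apply(o)
	}
	return o
}

func main() {
	o := resolve(withNgramSize(3), withCustomStopWords([]string{"foo"}))
	fmt.Println(o.NgramSize, o.WordSimWeight, o.CustomStopWords["foo"])
	// prints: 3 0.5 true
}
```

Since options are applied in order, a later option overrides an earlier one that sets the same field, and any field not touched by an option keeps its default value.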