similigo

Version: v0.0.0-...-f9dd620
Published: Nov 4, 2023
License: BSD-3-Clause
Imports: 4
Imported by: 0

README

Similigo

A Go library for calculating the similarity between two strings.

Usage

go get -u github.com/Ojelaidi/similigo

Example

Default Usage

package main

import (
	"fmt"

	"github.com/Ojelaidi/similigo"
)

func main() {
	// Compare two strings using the default n-gram size and weights.
	similarityScore := similigo.CalculateHybridSimilarity("text1", "text2")
	fmt.Printf("Similarity Score: %.2f\n", similarityScore)
}

Usage with options

package main

import (
	"fmt"

	"github.com/Ojelaidi/similigo"
)

func main() {
	// Override the n-gram size and the three weights (here they sum to 1.0).
	similarityScore := similigo.CalculateHybridSimilarity(
		"text1",
		"text2",
		similigo.WithNgramSize(4),
		similigo.WithWordSimWeight(0.4),
		similigo.WithNgramSimWeight(0.4),
		similigo.WithContainmentSimWeight(0.2),
	)
	fmt.Printf("Similarity Score: %.2f\n", similarityScore)
}

How to Contribute

To contribute, see the Contributing guide.

License

This project is licensed under the BSD 3-Clause License.

Documentation

Constants

const (
	DefaultNgramSize            = 2
	DefaultWordSimWeight        = 0.5
	DefaultNgramSimWeight       = 0.3
	DefaultContainmentSimWeight = 0.2
)

Functions

func CalculateHybridSimilarity

func CalculateHybridSimilarity(text1, text2 string, opts ...Option) float64

CalculateHybridSimilarity calculates a hybrid similarity score between two text strings. It combines different similarity measures (word similarity, n-gram similarity, and containment similarity) with custom weightings to provide an overall similarity score between the two texts.

Parameters:

  • text1: The first text string for comparison.
  • text2: The second text string for comparison.
  • opts: Optional variadic options that customize the n-gram size and the weights.

Returns: The hybrid similarity score, which is a weighted combination of the three similarity measures.
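The weighted combination described above can be sketched in plain Go. This is a minimal illustration, not the library's implementation: it assumes word similarity is Jaccard overlap of word sets, n-gram similarity is Jaccard overlap of character n-grams, and containment is the shared word count divided by the smaller set's size. All function names here (hybridSketch, jaccard, containment, charNgrams) are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// toSet builds a set from a slice of items.
func toSet(items []string) map[string]bool {
	s := make(map[string]bool, len(items))
	for _, it := range items {
		s[it] = true
	}
	return s
}

// charNgrams returns the character n-grams of s (assumed definition).
func charNgrams(s string, n int) []string {
	r := []rune(s)
	var out []string
	for i := 0; i+n <= len(r); i++ {
		out = append(out, string(r[i:i+n]))
	}
	return out
}

// overlap counts the items present in both sets.
func overlap(a, b map[string]bool) int {
	c := 0
	for k := range a {
		if b[k] {
			c++
		}
	}
	return c
}

// jaccard is |A ∩ B| / |A ∪ B|, or 0 when both sets are empty.
func jaccard(a, b map[string]bool) float64 {
	inter := overlap(a, b)
	union := len(a) + len(b) - inter
	if union == 0 {
		return 0
	}
	return float64(inter) / float64(union)
}

// containment is |A ∩ B| / min(|A|, |B|), or 0 when either set is empty.
func containment(a, b map[string]bool) float64 {
	m := len(a)
	if len(b) < m {
		m = len(b)
	}
	if m == 0 {
		return 0
	}
	return float64(overlap(a, b)) / float64(m)
}

// hybridSketch returns the weighted sum of the three assumed measures.
func hybridSketch(t1, t2 string, n int, wWord, wNgram, wCont float64) float64 {
	l1, l2 := strings.ToLower(t1), strings.ToLower(t2)
	w1, w2 := toSet(strings.Fields(l1)), toSet(strings.Fields(l2))
	g1, g2 := toSet(charNgrams(l1, n)), toSet(charNgrams(l2, n))
	return wWord*jaccard(w1, w2) + wNgram*jaccard(g1, g2) + wCont*containment(w1, w2)
}

func main() {
	// Weights mirror the package defaults: 0.5, 0.3, 0.2.
	score := hybridSketch("the quick brown fox", "the quick red fox", 2, 0.5, 0.3, 0.2)
	fmt.Printf("Similarity Score: %.2f\n", score)
}
```

With weights that sum to 1.0, as the package defaults do, the score in this sketch stays between 0 and 1.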

func FindBestMatchInList

func FindBestMatchInList(targetText string, texts []string, opts ...Option) (bestMatch string, highestScore float64)

FindBestMatchInList takes a target text and a slice of texts, calculates the similarity for each, and returns the text with the highest similarity score.
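The search itself amounts to a linear scan that keeps the highest score seen so far. The sketch below illustrates that idea with a pluggable scoring function; bestMatch and wordJaccard are hypothetical names, and the handling of ties (first match wins) and of an empty list (empty string and 0) is an assumption, not documented library behavior.

```go
package main

import (
	"fmt"
	"strings"
)

// wordJaccard is a stand-in scorer: shared words over total distinct words.
func wordJaccard(a, b string) float64 {
	sa := map[string]bool{}
	for _, w := range strings.Fields(strings.ToLower(a)) {
		sa[w] = true
	}
	sb := map[string]bool{}
	for _, w := range strings.Fields(strings.ToLower(b)) {
		sb[w] = true
	}
	inter := 0
	for w := range sa {
		if sb[w] {
			inter++
		}
	}
	union := len(sa) + len(sb) - inter
	if union == 0 {
		return 0
	}
	return float64(inter) / float64(union)
}

// bestMatch scores every candidate against target and returns the best one.
func bestMatch(target string, texts []string, score func(a, b string) float64) (string, float64) {
	best, high := "", -1.0
	for _, t := range texts {
		if s := score(target, t); s > high {
			best, high = t, s
		}
	}
	if high < 0 { // no candidates were scored
		return "", 0
	}
	return best, high
}

func main() {
	texts := []string{"green pear", "red apple pie", "blue sky"}
	match, score := bestMatch("red apple", texts, wordJaccard)
	fmt.Printf("Best match: %q (%.2f)\n", match, score)
}
```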

func FindBestNMatchesInList

func FindBestNMatchesInList(targetText string, texts []string, n int, opts ...Option) []utils.Match

FindBestNMatchesInList searches through a list of texts to find the top `n` texts that are most similar to a target text. It uses a heap to efficiently keep track of the best matches while iterating through the list.

Parameters:

  • targetText: The text string you want to compare against the list of texts.
  • texts: A slice of text strings that you want to compare with the target text.
  • n: The number of top matches you want to find.
  • opts: Zero or more options that can modify the similarity calculation (such as n-gram size, weights, etc.).

Returns: A slice of Match structs, each containing a text from the input list and its similarity score to the target text. The slice is sorted in descending order of similarity scores, with the highest scoring matches first.
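A bounded min-heap is one common way to keep the best n items in a single pass, and the sketch below shows that technique with the standard container/heap package. It is an illustration under assumptions: the match type stands in for utils.Match (whose exact fields are not shown here), and topN and prefixScore are hypothetical names.

```go
package main

import (
	"container/heap"
	"fmt"
	"sort"
)

// match is a hypothetical stand-in for utils.Match.
type match struct {
	Text  string
	Score float64
}

// minHeap keeps the lowest score at the root so it can be replaced cheaply.
type minHeap []match

func (h minHeap) Len() int            { return len(h) }
func (h minHeap) Less(i, j int) bool  { return h[i].Score < h[j].Score }
func (h minHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *minHeap) Push(x interface{}) { *h = append(*h, x.(match)) }
func (h *minHeap) Pop() interface{} {
	old := *h
	last := old[len(old)-1]
	*h = old[:len(old)-1]
	return last
}

// prefixScore is a toy scorer: shared prefix length over the longer length.
func prefixScore(a, b string) float64 {
	n := 0
	for n < len(a) && n < len(b) && a[n] == b[n] {
		n++
	}
	longer := len(a)
	if len(b) > longer {
		longer = len(b)
	}
	if longer == 0 {
		return 0
	}
	return float64(n) / float64(longer)
}

// topN returns up to n best-scoring texts, highest score first.
func topN(target string, texts []string, n int, score func(a, b string) float64) []match {
	if n <= 0 {
		return nil
	}
	h := &minHeap{}
	for _, t := range texts {
		m := match{Text: t, Score: score(target, t)}
		if h.Len() < n {
			heap.Push(h, m)
		} else if m.Score > (*h)[0].Score {
			(*h)[0] = m // replace the current weakest match
			heap.Fix(h, 0)
		}
	}
	out := make([]match, len(*h))
	copy(out, *h)
	sort.Slice(out, func(i, j int) bool { return out[i].Score > out[j].Score })
	return out
}

func main() {
	texts := []string{"gopher", "golang", "rust", "goland"}
	for _, m := range topN("golang", texts, 2, prefixScore) {
		fmt.Printf("%s %.2f\n", m.Text, m.Score)
	}
}
```

The heap never grows beyond n entries, so the scan costs roughly O(len(texts) · log n) instead of sorting the whole list.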

func PreprocessText

func PreprocessText(text string, opts *SimilarityOptions) string

PreprocessText processes the input text for similarity comparison by performing several steps:

  1. Tokenization: Splitting the text into words (tokens) based on whitespace.
  2. Normalization: Converting all words to lowercase to ensure case insensitivity.
  3. Stop word removal: Eliminating common words (stop words) that are unlikely to be useful in the similarity comparison. It uses both a predefined set of stop words and any custom stop words provided in the SimilarityOptions.
  4. Stemming: Reducing words to their base or root form (stem).

Parameters:

  • text: The input text to preprocess.
  • opts: A pointer to SimilarityOptions which contains settings for the preprocessing, including any custom stop words to consider.

Returns: A preprocessed version of the input text with all words stemmed and stop words removed, joined into a single string separated by spaces.
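The four steps can be sketched as a single loop over whitespace-separated tokens. This is an approximation under stated assumptions: the stop word set is a tiny illustrative sample rather than the library's predefined list, naiveStem is a crude suffix stripper standing in for a real stemmer, and preprocessSketch is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// sampleStopWords is an illustrative set, not the library's predefined list.
var sampleStopWords = map[string]bool{
	"the": true, "a": true, "is": true, "of": true, "and": true,
}

// naiveStem strips a few common English suffixes; a real stemmer does more.
func naiveStem(w string) string {
	for _, suf := range []string{"ing", "ed", "s"} {
		if len(w) > len(suf)+2 && strings.HasSuffix(w, suf) {
			return strings.TrimSuffix(w, suf)
		}
	}
	return w
}

// preprocessSketch tokenizes, lowercases, drops stop words, and stems.
func preprocessSketch(text string, custom map[string]bool) string {
	var kept []string
	for _, tok := range strings.Fields(text) { // 1. tokenize on whitespace
		tok = strings.ToLower(tok) // 2. normalize case
		if sampleStopWords[tok] || custom[tok] { // 3. drop stop words
			continue
		}
		kept = append(kept, naiveStem(tok)) // 4. stem
	}
	return strings.Join(kept, " ")
}

func main() {
	custom := map[string]bool{"quickly": true}
	fmt.Println(preprocessSketch("The cats jumped quickly", custom))
}
```

Punctuation is left attached to tokens in this sketch, since the steps above only mention splitting on whitespace.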

Types

type Option

type Option func(*SimilarityOptions)

func WithContainmentSimWeight

func WithContainmentSimWeight(w float64) Option

WithContainmentSimWeight sets ContainmentSimWeight in SimilarityOptions.

func WithCustomStopWords

func WithCustomStopWords(words []string) Option

WithCustomStopWords allows users to add custom stop words by providing a list of words.

func WithNgramSimWeight

func WithNgramSimWeight(w float64) Option

WithNgramSimWeight sets NgramSimWeight in SimilarityOptions.

func WithNgramSize

func WithNgramSize(n int) Option

WithNgramSize sets NgramSize in SimilarityOptions.

func WithWordSimWeight

func WithWordSimWeight(w float64) Option

WithWordSimWeight sets WordSimWeight in SimilarityOptions.

type SimilarityOptions

type SimilarityOptions struct {
	NgramSize            int
	WordSimWeight        float64
	NgramSimWeight       float64
	ContainmentSimWeight float64
	CustomStopWords      map[string]bool
}

SimilarityOptions represents optional settings for hybrid similarity calculation.

  • NgramSize: The n-gram size used for n-gram similarity calculation.
  • WordSimWeight: Weight for word similarity in the final score.
  • NgramSimWeight: Weight for n-gram similarity in the final score.
  • ContainmentSimWeight: Weight for containment similarity in the final score.
  • CustomStopWords: Additional stop words to remove during preprocessing (see WithCustomStopWords).

func DefaultSimilarityOptions

func DefaultSimilarityOptions() *SimilarityOptions
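The Option type and the With* constructors follow Go's functional options pattern: each option is a function that mutates a settings struct initialized with defaults. The sketch below reproduces the pattern with lowercase, hypothetical names (options, option, resolve, withNgramSize, withCustomStopWords). It assumes the defaults come from the package constants and that custom stop words are stored as map keys; neither detail is confirmed by the documentation above.

```go
package main

import "fmt"

// Values copied from the package constants.
const (
	defaultNgramSize            = 2
	defaultWordSimWeight        = 0.5
	defaultNgramSimWeight       = 0.3
	defaultContainmentSimWeight = 0.2
)

// options mirrors the fields of SimilarityOptions.
type options struct {
	NgramSize            int
	WordSimWeight        float64
	NgramSimWeight       float64
	ContainmentSimWeight float64
	CustomStopWords      map[string]bool
}

// option mutates an options value, like the package's Option type.
type option func(*options)

// withNgramSize overrides the n-gram size.
func withNgramSize(n int) option {
	return func(o *options) { o.NgramSize = n }
}

// withCustomStopWords converts a word list into set-style map entries.
func withCustomStopWords(words []string) option {
	return func(o *options) {
		if o.CustomStopWords == nil {
			o.CustomStopWords = make(map[string]bool)
		}
		for _, w := range words {
			o.CustomStopWords[w] = true
		}
	}
}

// defaultOptions returns settings populated from the constants (assumed).
func defaultOptions() *options {
	return &options{
		NgramSize:            defaultNgramSize,
		WordSimWeight:        defaultWordSimWeight,
		NgramSimWeight:       defaultNgramSimWeight,
		ContainmentSimWeight: defaultContainmentSimWeight,
	}
}

// resolve starts from the defaults and applies each option in order.
func resolve(opts ...option) *options {
	o := defaultOptions()
	for _, apply := range opts {
		apply(o)
	}
	return o
}

func main() {
	o := resolve(withNgramSize(4), withCustomStopWords([]string{"foo"}))
	fmt.Printf("%+v\n", *o)
}
```

Because options are applied in order, a later option overrides an earlier one that sets the same field.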
