textdistance

Version: v0.0.0-...-fa16639
Published: Apr 12, 2022 License: MIT Imports: 5 Imported by: 0

README

textdistance

textdistance is a string comparison library written in Go. Heavily inspired by the identically named Python library, it aims to provide a wide range of string comparison algorithms.

Additionally, it aims to:

  • be safe for production use, preferring to return an error where required over absolute raw performance
  • have consistent interfaces, so that different implementations can be tested against each other and swapped via dependency injection
  • be as performant as possible within those constraints: assembly snippets are allowed, but linking to C libraries is not

Documentation

The full documentation with further examples is available at GoDoc.

Usage

Install
go get -u github.com/benfdking/textdistance
Example
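package main

import (
	"fmt"
	"log"

	"github.com/benfdking/textdistance"
)

func main() {
	h := textdistance.NewHamming()
	// Distance returns an error alongside the value; check it rather than discarding it.
	distance, err := h.Distance("drummer", "dresser")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(distance)
}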

Documentation

Overview

textdistance provides a set of string comparison functions for different applications. It includes the following algorithms:

  • Hamming
  • Jaccard
  • Match Rating Approach
  • Sørensen-Dice (exposed as SorensonDice)

Functions

func WordsToSet

func WordsToSet(s string) mapset.Set

Types

type Distance

type Distance interface {
	Distance(s1, s2 string) (float64, error)
}

Distance is implemented by algorithms that measure the difference between two strings.
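Because every algorithm satisfies the same small interface, calling code can accept a Distance and leave the choice of algorithm to the caller. The following is a minimal sketch of that idea. The interface is redeclared locally so the example compiles on its own, and exactMatch and closest are hypothetical names invented for illustration, not part of the package.

```go
package main

import (
	"errors"
	"fmt"
)

// Distance mirrors the interface documented above; it is redeclared here
// so the sketch compiles without importing the package.
type Distance interface {
	Distance(s1, s2 string) (float64, error)
}

// exactMatch is a hypothetical stand-in implementation used only for
// illustration: 0 when the strings are equal, 1 otherwise.
type exactMatch struct{}

func (exactMatch) Distance(s1, s2 string) (float64, error) {
	if s1 == s2 {
		return 0, nil
	}
	return 1, nil
}

// closest returns the candidate with the smallest distance to target,
// using whichever Distance implementation the caller injects.
func closest(d Distance, target string, candidates []string) (string, error) {
	if len(candidates) == 0 {
		return "", errors.New("no candidates")
	}
	best, bestDist := "", 0.0
	for i, c := range candidates {
		dist, err := d.Distance(target, c)
		if err != nil {
			return "", err
		}
		if i == 0 || dist < bestDist {
			best, bestDist = c, dist
		}
	}
	return best, nil
}

func main() {
	best, err := closest(exactMatch{}, "go", []string{"rust", "go", "zig"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(best) // prints "go"
}
```

Any type with a matching Distance method, such as the Hamming or Levenshtein types listed below, could presumably be passed to closest in place of exactMatch.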

type Hamming

type Hamming struct {
}

Hamming provides methods for computing the Hamming distance and similarity between two strings.

func NewHamming

func NewHamming() Hamming

NewHamming returns a Hamming structure

func (Hamming) Distance

func (Hamming) Distance(s1, s2 string) (float64, error)

Distance returns the Hamming distance between s1 and s2.
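For readers unfamiliar with the measure, the Hamming distance counts the positions at which two equal-length strings differ. The sketch below is a standalone illustration, not the package's implementation. Comparing rune by rune and returning an error for unequal lengths are assumptions of this sketch, and hammingDistance is a hypothetical name.

```go
package main

import (
	"errors"
	"fmt"
)

// hammingDistance counts the positions at which two equal-length strings
// differ. Comparing by rune and rejecting unequal lengths are assumptions
// of this sketch, not documented behaviour of the package.
func hammingDistance(s1, s2 string) (float64, error) {
	r1, r2 := []rune(s1), []rune(s2)
	if len(r1) != len(r2) {
		return 0, errors.New("hamming: strings must have equal length")
	}
	var d float64
	for i := range r1 {
		if r1[i] != r2[i] {
			d++
		}
	}
	return d, nil
}

func main() {
	d, err := hammingDistance("drummer", "dresser")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(d) // prints 3: the strings differ at positions 3, 4 and 5
}
```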

func (Hamming) Maximum

func (Hamming) Maximum(s1, s2 string) (float64, error)

Maximum returns the maximum value for Distance given two strings.

func (Hamming) Minimum

func (Hamming) Minimum(_, _ string) (float64, error)

Minimum returns the minimum value for Distance given two strings.

func (Hamming) Similarity

func (h Hamming) Similarity(s1, s2 string) (float64, error)

Similarity returns the Hamming similarity between s1 and s2.

type Jaccard

type Jaccard struct {
	StringToSet func(s string) mapset.Set
}

func NewJaccard

func NewJaccard() Jaccard

NewJaccard returns a Jaccard structure with the StringToSet set to the default WordsToSet

func (Jaccard) Similarity

func (j Jaccard) Similarity(s1, s2 string) (float64, error)
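The Jaccard similarity of two sets is the size of their intersection divided by the size of their union. The sketch below illustrates this over sets of words. It uses a plain map rather than mapset.Set, splits on whitespace with strings.Fields, and treats two empty inputs as identical. All three are assumptions of the sketch, and wordSet and jaccard are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// wordSet splits s on whitespace and returns the distinct words.
// Whitespace splitting is an assumption; the package's WordsToSet
// may tokenise differently.
func wordSet(s string) map[string]struct{} {
	set := make(map[string]struct{})
	for _, w := range strings.Fields(s) {
		set[w] = struct{}{}
	}
	return set
}

// jaccard returns |A ∩ B| / |A ∪ B| for the word sets of s1 and s2.
func jaccard(s1, s2 string) float64 {
	a, b := wordSet(s1), wordSet(s2)
	if len(a) == 0 && len(b) == 0 {
		return 1 // convention chosen for this sketch
	}
	inter := 0
	for w := range a {
		if _, ok := b[w]; ok {
			inter++
		}
	}
	union := len(a) + len(b) - inter
	return float64(inter) / float64(union)
}

func main() {
	// Shared words: b, c. Union: a, b, c, d.
	fmt.Println(jaccard("a b c", "b c d")) // prints 0.5
}
```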

type Levenshtein

type Levenshtein struct {
}

func NewLevenshtein

func NewLevenshtein() Levenshtein

NewLevenshtein returns a Levenshtein structure

func (Levenshtein) Distance

func (l Levenshtein) Distance(s1, b string) (float64, error)

Distance returns the Levenshtein distance
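The Levenshtein distance is the minimum number of single-character insertions, deletions and substitutions needed to turn one string into the other. The sketch below shows the standard two-row dynamic programming formulation. It is a standalone illustration rather than the package's code. Operating on runes and returning an int are assumptions, and levenshtein and min3 are hypothetical names.

```go
package main

import "fmt"

// min3 returns the smallest of three ints.
func min3(a, b, c int) int {
	m := a
	if b < m {
		m = b
	}
	if c < m {
		m = c
	}
	return m
}

// levenshtein returns the edit distance between s1 and s2, keeping only
// two rows of the dynamic programming table in memory.
func levenshtein(s1, s2 string) int {
	a, b := []rune(s1), []rune(s2)
	prev := make([]int, len(b)+1)
	curr := make([]int, len(b)+1)
	for j := range prev {
		prev[j] = j // distance from the empty prefix of a
	}
	for i := 1; i <= len(a); i++ {
		curr[0] = i
		for j := 1; j <= len(b); j++ {
			cost := 1
			if a[i-1] == b[j-1] {
				cost = 0
			}
			curr[j] = min3(
				prev[j]+1,      // deletion
				curr[j-1]+1,    // insertion
				prev[j-1]+cost, // substitution or match
			)
		}
		prev, curr = curr, prev
	}
	return prev[len(b)]
}

func main() {
	fmt.Println(levenshtein("kitten", "sitting")) // prints 3
}
```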

func (Levenshtein) Maximum

func (l Levenshtein) Maximum(s1, s2 string) (float64, error)

func (Levenshtein) Minimum

func (l Levenshtein) Minimum(s1, s2 string) (float64, error)

type MRA

type MRA struct {
}

func NewMRA

func NewMRA() MRA

NewMRA returns an MRA structure.

func (MRA) Distance

func (m MRA) Distance(s1, s2 string) (float64, error)

func (MRA) Encoding

func (m MRA) Encoding(s string) (string, error)

Encoding returns the encoded MRA string according to the match rating approach. Encoding proceeds in the following steps:

1. Delete all vowels unless the vowel begins the word
2. Remove the second consonant of any double consonants present
3. Reduce codex to 6 letters by joining the first 3 and last 3 letters only

From Wikipedia: https://en.wikipedia.org/wiki/Match_rating_approach
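The three encoding steps above can be sketched as follows. This is an illustration of the listed rules rather than the package's implementation. Upper-casing the input, treating only A, E, I, O and U as vowels, applying the steps in the order listed, and collapsing whole runs of a repeated letter are all assumptions of the sketch, and mraEncode is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// mraEncode applies the three match rating approach encoding steps.
func mraEncode(s string) string {
	word := []rune(strings.ToUpper(s))

	// Step 1: drop vowels, except one that begins the word.
	var noVowels []rune
	for i, r := range word {
		if i == 0 || !strings.ContainsRune("AEIOU", r) {
			noVowels = append(noVowels, r)
		}
	}

	// Step 2: keep only the first letter of each run of repeated letters.
	var deduped []rune
	for i, r := range noVowels {
		if i == 0 || r != noVowels[i-1] {
			deduped = append(deduped, r)
		}
	}

	// Step 3: if longer than 6 letters, join the first 3 and last 3.
	// The full slice expression forces append to copy instead of
	// overwriting the tail of deduped.
	if len(deduped) > 6 {
		deduped = append(deduped[:3:3], deduped[len(deduped)-3:]...)
	}
	return string(deduped)
}

func main() {
	fmt.Println(mraEncode("Catherine"))  // prints CTHRN
	fmt.Println(mraEncode("Washington")) // prints WSHGTN
}
```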

func (MRA) Minimum

func (m MRA) Minimum(s1, s2 string) (float64, error)

type Maximum

type Maximum interface {
	Maximum(s1, s2 string) (float64, error)
}

Maximum returns the theoretical maximum value of the measure for two strings.

type Minimum

type Minimum interface {
	Minimum(s1, s2 string) (float64, error)
}

Minimum returns a quickly calculated theoretical minimum value of the measure for two strings.

type Overlap

type Overlap struct {
	StringToSet func(string) mapset.Set
}

func NewOverlap

func NewOverlap() Overlap

NewOverlap returns an Overlap structure with the StringToSet set to the default WordsToSet.

func (Overlap) Maximum

func (Overlap) Maximum(_, _ string) (float64, error)

func (Overlap) Similarity

func (o Overlap) Similarity(s1, s2 string) (float64, error)
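The overlap coefficient divides the size of the intersection of two sets by the size of the smaller set. The sketch below shows that formula over word sets. It uses a plain map rather than mapset.Set, splits on whitespace, and returns 0 when either input is empty. These are assumptions of the sketch, and wordSet and overlap are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// wordSet splits s on whitespace and returns the distinct words.
// Whitespace splitting is an assumption of this sketch.
func wordSet(s string) map[string]struct{} {
	set := make(map[string]struct{})
	for _, w := range strings.Fields(s) {
		set[w] = struct{}{}
	}
	return set
}

// overlap returns |A ∩ B| / min(|A|, |B|) for the word sets of s1 and s2.
func overlap(s1, s2 string) float64 {
	a, b := wordSet(s1), wordSet(s2)
	if len(a) == 0 || len(b) == 0 {
		return 0 // convention chosen for this sketch
	}
	inter := 0
	for w := range a {
		if _, ok := b[w]; ok {
			inter++
		}
	}
	smaller := len(a)
	if len(b) < smaller {
		smaller = len(b)
	}
	return float64(inter) / float64(smaller)
}

func main() {
	// Every word of the shorter string appears in the longer one.
	fmt.Println(overlap("a b", "a b c d")) // prints 1
}
```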

type Similarity

type Similarity interface {
	Similarity(s1, s2 string) (float64, error)
}

Similarity is a measure of how alike two strings are, normalised to lie between 0 and 1.

type SorensonDice

type SorensonDice struct {
	StringToSet func(s string) mapset.Set
}

func NewSorensonDice

func NewSorensonDice() SorensonDice

NewSorensonDice returns a SorensonDice structure with the StringToSet set to the default WordsToSet

func (SorensonDice) Maximum

func (SorensonDice) Maximum(_, _ string) (float64, error)

func (SorensonDice) Similarity

func (s SorensonDice) Similarity(s1, s2 string) (float64, error)
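The Sørensen-Dice coefficient is twice the size of the intersection of two sets divided by the sum of their sizes. The sketch below illustrates the formula over word sets. A plain map stands in for mapset.Set, words are split on whitespace, and two empty inputs are treated as identical. These are assumptions of the sketch, and wordSet and dice are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// wordSet splits s on whitespace and returns the distinct words.
// Whitespace splitting is an assumption of this sketch.
func wordSet(s string) map[string]struct{} {
	set := make(map[string]struct{})
	for _, w := range strings.Fields(s) {
		set[w] = struct{}{}
	}
	return set
}

// dice returns 2|A ∩ B| / (|A| + |B|) for the word sets of s1 and s2.
func dice(s1, s2 string) float64 {
	a, b := wordSet(s1), wordSet(s2)
	if len(a)+len(b) == 0 {
		return 1 // convention chosen for this sketch
	}
	inter := 0
	for w := range a {
		if _, ok := b[w]; ok {
			inter++
		}
	}
	return 2 * float64(inter) / float64(len(a)+len(b))
}

func main() {
	// One shared word out of two words on each side: 2*1 / (2+2).
	fmt.Println(dice("a b", "b c")) // prints 0.5
}
```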

type SymmetricalTversky

type SymmetricalTversky struct {
	StringToSet func(s string) mapset.Set
	Alpha       float64
	Beta        float64
}

func NewSymmetricalTversky

func NewSymmetricalTversky(alpha, beta float64) SymmetricalTversky

NewSymmetricalTversky returns a SymmetricalTversky structure with the StringToSet set to the default WordsToSet

func (SymmetricalTversky) Maximum

func (SymmetricalTversky) Maximum(_, _ string) (float64, error)

func (SymmetricalTversky) Similarity

func (t SymmetricalTversky) Similarity(s1, s2 string) (float64, error)
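The page does not state which formula SymmetricalTversky uses. The sketch below assumes the commonly cited symmetric variant of the Tversky index, |X ∩ Y| / (|X ∩ Y| + β(α·a + (1−α)·b)), where a and b are the smaller and larger of |X − Y| and |Y − X|. The formula choice, whitespace tokenisation, and the names wordSet and symmetricTversky are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// wordSet splits s on whitespace and returns the distinct words.
// Whitespace splitting is an assumption of this sketch.
func wordSet(s string) map[string]struct{} {
	set := make(map[string]struct{})
	for _, w := range strings.Fields(s) {
		set[w] = struct{}{}
	}
	return set
}

// symmetricTversky computes the assumed symmetric Tversky index over the
// word sets of s1 and s2. It may not match the package's formula.
func symmetricTversky(s1, s2 string, alpha, beta float64) float64 {
	x, y := wordSet(s1), wordSet(s2)
	inter := 0
	for w := range x {
		if _, ok := y[w]; ok {
			inter++
		}
	}
	onlyX := float64(len(x) - inter) // |X − Y|
	onlyY := float64(len(y) - inter) // |Y − X|
	a := math.Min(onlyX, onlyY)
	b := math.Max(onlyX, onlyY)
	denom := float64(inter) + beta*(alpha*a+(1-alpha)*b)
	if denom == 0 {
		return 0 // convention chosen for this sketch
	}
	return float64(inter) / denom
}

func main() {
	// One shared word, one unshared word on each side: 1 / (1 + 1).
	fmt.Println(symmetricTversky("a b", "b c", 0.5, 1)) // prints 0.5
}
```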
