fuzzy

v0.0.1
Warning

This package is not in the latest version of its module.

Published: Apr 28, 2023 License: GPL-3.0 Imports: 8 Imported by: 0

README

go-fuzzywuzzy

This is a port of SeatGeek's fuzzywuzzy, a fuzzy string matching library.

Usage

Levenshtein Edit Distance
fuzzy.EditDistance("bart", "bort")
1
Simple Ratio
fuzzy.Ratio("coolstring", "coooolstring")
91
fuzzy.Ratio("coolstring", "radstring")
63
Partial Ratio
fuzzy.Ratio("needle", "haystackneedelhaystack")
36
fuzzy.PartialRatio("needle", "haystackneedelhaystack")
83
Token Sort Ratio
fuzzy.Ratio("several tokens arbitrary order", "order arbitrary several tokens")
50
fuzzy.TokenSortRatio("several tokens arbitrary order", "order arbitrary several tokens")
100
Token Set Ratio
fuzzy.TokenSortRatio("several tokens arbitrary order", "order order arbitrary several tokens")
91
fuzzy.TokenSetRatio("several tokens arbitrary order", "order order arbitrary several tokens")
100
Process
choices := []string{"Wayne Shorter", "Jonathan Richman", "Wayne Hancock", "Kate Bush"}
fuzzy.ExtractOne("wayne hancock", choices)
{Match:"Wayne Hancock", Score:100}
fuzzy.Extract("wayne hancock", choices, 2)
[{Match:"Wayne Hancock", Score:100}, {Match:"Wayne Shorter", Score:62}]

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

func ASCIIOnly

func ASCIIOnly(s string) string

func Cleanse

func Cleanse(s string, forceASCII bool) string

func Dedupe

func Dedupe(sliceWithDupes []string, args ...interface{}) ([]string, error)

func EditDistance

func EditDistance(s1, s2 string) int

EditDistance computes the Levenshtein distance between two strings, weighting replacements the same as insertions and deletions.

func LevEditDistance

func LevEditDistance(s1, s2 string, xcost int) int

LevEditDistance computes the Levenshtein distance between two strings. If the xcost parameter is zero, all edit operations (insert, delete, replace) have equal weight 1. Otherwise, the replace operation has weight 2.
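How the substitution weight affects the result can be seen in a small self-contained sketch. The function levDistance and its subCost parameter are hypothetical names chosen for illustration; the sketch takes the substitution cost directly rather than a flag such as xcost, and it is not taken from this package's source.

```go
package main

import "fmt"

// minInt returns the smaller of two ints.
func minInt(a, b int) int {
	if a < b {
		return a
	}
	return b
}

// levDistance is an illustrative weighted Levenshtein distance.
// Insertions and deletions cost 1; a substitution costs subCost.
// (Hypothetical helper, not part of the fuzzy package.)
func levDistance(s1, s2 string, subCost int) int {
	a, b := []rune(s1), []rune(s2)
	// prev[j] holds the distance between a[:i-1] and b[:j].
	prev := make([]int, len(b)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(a); i++ {
		cur := make([]int, len(b)+1)
		cur[0] = i
		for j := 1; j <= len(b); j++ {
			sub := prev[j-1]
			if a[i-1] != b[j-1] {
				sub += subCost
			}
			cur[j] = minInt(sub, minInt(prev[j], cur[j-1])+1)
		}
		prev = cur
	}
	return prev[len(b)]
}

func main() {
	fmt.Println(levDistance("bart", "bort", 1)) // substitution weighted like an insert or delete
	fmt.Println(levDistance("bart", "bort", 2)) // substitution weighted like delete plus insert
}
```

With a substitution cost of 1, "bart" and "bort" are one edit apart, which matches the EditDistance example in the README.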

func PartialRatio

func PartialRatio(s1, s2 string) int

PartialRatio computes a score of how close a string is to the most similar substring of another string. Order of arguments does not matter. Returns an integer score in [0,100]; a higher score indicates that the string and substring are closer.

func PartialTokenSetRatio

func PartialTokenSetRatio(s1, s2 string, opts ...bool) int

PartialTokenSetRatio extracts tokens from each input string, adds them to a set, constructs two strings of the form <sorted intersection><sorted remainder>, takes the partial ratios of those two strings, and returns the max.

func PartialTokenSortRatio

func PartialTokenSortRatio(s1, s2 string, opts ...bool) int

PartialTokenSortRatio computes a score similar to PartialRatio, except tokens are sorted and (optionally) cleansed prior to comparison.

func QRatio

func QRatio(s1, s2 string) int

QRatio computes a score similar to Ratio, except both strings are trimmed, cleansed of non-ASCII characters, and case-standardized.

func Ratio

func Ratio(s1, s2 string) int

Ratio computes a score of how close two Unicode strings are, based on their Levenshtein edit distance. Returns an integer score in [0,100]; a higher score indicates that the strings are closer.
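One way to turn an edit distance into a score in [0,100] is to normalize by the combined length of both strings. The sketch below is an assumption inferred from the README examples, not a confirmed description of this package's internals: it counts only insertions and deletions (so a substitution effectively costs 2) and rounds 100 * (len1 + len2 - distance) / (len1 + len2). The names indelDistance and ratioSketch are hypothetical, and returning 0 for two empty strings is likewise an assumption.

```go
package main

import (
	"fmt"
	"math"
)

// indelDistance counts the insertions and deletions needed to turn a into b.
// A substitution is modeled as one deletion plus one insertion.
func indelDistance(a, b []rune) int {
	prev := make([]int, len(b)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(a); i++ {
		cur := make([]int, len(b)+1)
		cur[0] = i
		for j := 1; j <= len(b); j++ {
			if a[i-1] == b[j-1] {
				cur[j] = prev[j-1] // characters match, no cost
			} else if prev[j] < cur[j-1] {
				cur[j] = prev[j] + 1 // delete from a
			} else {
				cur[j] = cur[j-1] + 1 // insert into a
			}
		}
		prev = cur
	}
	return prev[len(b)]
}

// ratioSketch maps the distance onto an integer score in [0,100].
// (Hypothetical helper; empty input scoring 0 is an assumption.)
func ratioSketch(s1, s2 string) int {
	a, b := []rune(s1), []rune(s2)
	total := len(a) + len(b)
	if total == 0 {
		return 0
	}
	d := indelDistance(a, b)
	return int(math.Round(100 * float64(total-d) / float64(total)))
}

func main() {
	fmt.Println(ratioSketch("coolstring", "coooolstring"))       // 91
	fmt.Println(ratioSketch("coolstring", "radstring"))          // 63
	fmt.Println(ratioSketch("needle", "haystackneedelhaystack")) // 36
}
```

These three values agree with the Ratio outputs listed in the README, which is why this normalization was chosen for the sketch.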

func TokenSetRatio

func TokenSetRatio(s1, s2 string, opts ...bool) int

TokenSetRatio extracts tokens from each input string, adds them to a set, constructs strings of the form <sorted intersection><sorted remainder>, takes the ratios of those strings, and returns the max.
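The string construction described above can be sketched without the scoring step. The helper tokenSetStrings is a hypothetical name; splitting on whitespace and lowercasing the input are assumptions about how tokens are extracted, and the sketch stops before any ratio is computed.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// tokenSet splits s on whitespace and collects the distinct lowercase tokens.
func tokenSet(s string) map[string]bool {
	set := make(map[string]bool)
	for _, tok := range strings.Fields(strings.ToLower(s)) {
		set[tok] = true
	}
	return set
}

// tokenSetStrings returns the sorted intersection of both token sets, plus
// one "<sorted intersection> <sorted remainder>" string per input.
// (Hypothetical helper, not part of the fuzzy package.)
func tokenSetStrings(s1, s2 string) (sect, combined1, combined2 string) {
	set1, set2 := tokenSet(s1), tokenSet(s2)
	var both, only1, only2 []string
	for tok := range set1 {
		if set2[tok] {
			both = append(both, tok)
		} else {
			only1 = append(only1, tok)
		}
	}
	for tok := range set2 {
		if !set1[tok] {
			only2 = append(only2, tok)
		}
	}
	sort.Strings(both)
	sort.Strings(only1)
	sort.Strings(only2)
	sect = strings.Join(both, " ")
	combined1 = strings.TrimSpace(sect + " " + strings.Join(only1, " "))
	combined2 = strings.TrimSpace(sect + " " + strings.Join(only2, " "))
	return sect, combined1, combined2
}

func main() {
	sect, c1, c2 := tokenSetStrings(
		"several tokens arbitrary order",
		"order order arbitrary several tokens",
	)
	fmt.Printf("%q\n%q\n%q\n", sect, c1, c2)
}
```

For the README inputs, the duplicated "order" collapses in the set, so all three strings come out identical. That is consistent with the score of 100 shown for TokenSetRatio.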

func TokenSortRatio

func TokenSortRatio(s1, s2 string, opts ...bool) int

TokenSortRatio computes a score similar to Ratio, except tokens are sorted and (optionally) cleansed prior to comparison.
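The token-sorting step can be illustrated on its own. In this sketch the helper sortTokens is a hypothetical name, and lowercasing stands in for the optional cleansing step; the package's actual cleansing rules are not shown in this documentation.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// sortTokens lowercases s, splits it on whitespace, sorts the tokens,
// and joins them back with single spaces.
// (Hypothetical helper, not part of the fuzzy package.)
func sortTokens(s string) string {
	tokens := strings.Fields(strings.ToLower(s))
	sort.Strings(tokens)
	return strings.Join(tokens, " ")
}

func main() {
	fmt.Println(sortTokens("several tokens arbitrary order"))
	fmt.Println(sortTokens("order arbitrary several tokens"))
	// Both lines print: arbitrary order several tokens
}
```

Because both inputs normalize to the same string, any ratio computed afterwards sees identical arguments, which is why word order stops affecting the score.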

func UQRatio

func UQRatio(s1, s2 string) int

UQRatio computes a score similar to Ratio, except both strings are trimmed and case-standardized.

func UWRatio

func UWRatio(s1, s2 string) int

UWRatio computes a score similar to WRatio, except non-ASCII characters are allowed.

func WRatio

func WRatio(s1, s2 string) int

WRatio computes a score with the following steps:

  1. Cleanse both strings, remove non-ASCII characters.
  2. Take Ratio as baseline score.
  3. Run a few heuristics to determine whether partial ratios should be taken.
  4. If partial ratios were determined to be necessary, compute PartialRatio, PartialTokenSetRatio, and PartialTokenSortRatio. Otherwise, compute TokenSortRatio and TokenSetRatio.
  5. Return the max of all computed ratios.

Types

type MatchPair

type MatchPair struct {
	Match string
	Score int
}

func ExtractOne

func ExtractOne(query string, choices []string, args ...interface{}) (*MatchPair, error)

type MatchPairs

type MatchPairs []*MatchPair

func Extract

func Extract(query string, choices []string, limit int, args ...interface{}) (MatchPairs, error)

func ExtractWithoutOrder

func ExtractWithoutOrder(query string, choices []string, args ...interface{}) (MatchPairs, error)

func (MatchPairs) Len

func (slice MatchPairs) Len() int

func (MatchPairs) Less

func (slice MatchPairs) Less(i, j int) bool

func (*MatchPairs) Pop

func (slice *MatchPairs) Pop() interface{}

func (*MatchPairs) Push

func (slice *MatchPairs) Push(x interface{})

func (MatchPairs) Swap

func (slice MatchPairs) Swap(i, j int)
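The five methods Len, Less, Swap, Push, and Pop are exactly what the standard library's container/heap package requires of a heap. The following sketch shows how a type with that method set can keep only the best N matches. The names pair, pairHeap, and topN are hypothetical, the ordering chosen in Less is an assumption, and whether Extract works this way internally is not stated in this documentation.

```go
package main

import (
	"container/heap"
	"fmt"
)

// pair mirrors the shape of MatchPair for illustration only.
type pair struct {
	Match string
	Score int
}

// pairHeap is a min-heap on Score, so the weakest match sits at the root.
type pairHeap []*pair

func (h pairHeap) Len() int            { return len(h) }
func (h pairHeap) Less(i, j int) bool  { return h[i].Score < h[j].Score }
func (h pairHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *pairHeap) Push(x interface{}) { *h = append(*h, x.(*pair)) }
func (h *pairHeap) Pop() interface{} {
	old := *h
	n := len(old)
	x := old[n-1]
	*h = old[:n-1]
	return x
}

// topN returns the n highest-scoring pairs, best first.
func topN(scored []*pair, n int) []*pair {
	h := &pairHeap{}
	for _, p := range scored {
		heap.Push(h, p)
		if h.Len() > n {
			heap.Pop(h) // discard the current lowest score
		}
	}
	// Popping yields ascending scores, so fill the result from the back.
	out := make([]*pair, h.Len())
	for i := len(out) - 1; i >= 0; i-- {
		out[i] = heap.Pop(h).(*pair)
	}
	return out
}

func main() {
	// Scores are illustrative values, not output of the fuzzy package.
	scored := []*pair{
		{"Wayne Shorter", 62},
		{"Jonathan Richman", 30},
		{"Wayne Hancock", 100},
		{"Kate Bush", 20},
	}
	for _, p := range topN(scored, 2) {
		fmt.Printf("%s %d\n", p.Match, p.Score)
	}
}
```

Keeping the heap bounded at N entries means memory stays proportional to the limit rather than to the number of choices.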

type StringSet

type StringSet struct {
	// contains filtered or unexported fields
}

func NewStringSet

func NewStringSet(slice []string) *StringSet

func (*StringSet) Difference

func (s *StringSet) Difference(other *StringSet) *StringSet

Difference returns the set of strings that are present in this set but not the other set

func (*StringSet) Equals

func (s *StringSet) Equals(other *StringSet) bool

Equals returns true if two sets contain the same elements

func (*StringSet) Intersect

func (s *StringSet) Intersect(other *StringSet) *StringSet

Intersect returns the set of strings that are contained in both sets

func (*StringSet) ToSlice

func (s *StringSet) ToSlice() []string

ToSlice produces a string slice from the set
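Since StringSet's fields are unexported, its internal representation is not visible here. A minimal sketch of the same operations, assuming a map-backed set, could look like the following; the type stringSet and its methods are hypothetical and only mirror the documented behavior.

```go
package main

import (
	"fmt"
	"sort"
)

// stringSet is an illustrative map-backed set of strings.
type stringSet map[string]struct{}

// newStringSet builds a set from a slice, dropping duplicates.
func newStringSet(items []string) stringSet {
	s := make(stringSet, len(items))
	for _, it := range items {
		s[it] = struct{}{}
	}
	return s
}

// difference returns the elements of s that are not in other.
func (s stringSet) difference(other stringSet) stringSet {
	out := make(stringSet)
	for k := range s {
		if _, ok := other[k]; !ok {
			out[k] = struct{}{}
		}
	}
	return out
}

// intersect returns the elements present in both sets.
func (s stringSet) intersect(other stringSet) stringSet {
	out := make(stringSet)
	for k := range s {
		if _, ok := other[k]; ok {
			out[k] = struct{}{}
		}
	}
	return out
}

// equals reports whether both sets hold the same elements.
func (s stringSet) equals(other stringSet) bool {
	if len(s) != len(other) {
		return false
	}
	for k := range s {
		if _, ok := other[k]; !ok {
			return false
		}
	}
	return true
}

// toSortedSlice returns the elements in sorted order for stable output.
func (s stringSet) toSortedSlice() []string {
	out := make([]string, 0, len(s))
	for k := range s {
		out = append(out, k)
	}
	sort.Strings(out)
	return out
}

func main() {
	a := newStringSet([]string{"x", "y", "z", "y"})
	b := newStringSet([]string{"y", "z", "w"})
	fmt.Println(a.difference(b).toSortedSlice())
	fmt.Println(a.intersect(b).toSortedSlice())
}
```

Sorting in toSortedSlice is a choice made for this sketch because Go map iteration order is unspecified; the documented ToSlice makes no ordering promise.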
