levenshtein

package module
v1.0.4-0...-30b76e3 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Feb 28, 2021 License: MIT Imports: 2 Imported by: 0

README

levenshtein Build Status Go Report Card GoDoc

Go package to calculate the Levenshtein Distance

The library is fully capable of working with non-ascii strings. But the strings are not normalized. That is left as a user-dependant use case. Please normalize the strings before passing it to the library if you have such a requirement.

Limitation

As a performance optimization, the library can handle strings only up to 65536 characters (runes). This is only available on tip, and is not part of a tagged release yet. If you require such an optimization, please use the version at tip.

Install

go get github.com/agnivade/levenshtein

Example

package main

import (
	"fmt"
	"github.com/agnivade/levenshtein"
)

func main() {
	s1 := "kitten"
	s2 := "sitting"
	distance := levenshtein.ComputeDistance(s1, s2)
	fmt.Printf("The distance between %s and %s is %d.\n", s1, s2, distance)
	// Output:
	// The distance between kitten and sitting is 3.
}

Benchmarks

name              time/op
Simple/ASCII-4     330ns ± 2%
Simple/French-4    617ns ± 2%
Simple/Nordic-4   1.16µs ± 4%
Simple/Tibetan-4  1.05µs ± 1%

name              alloc/op
Simple/ASCII-4     96.0B ± 0%
Simple/French-4     128B ± 0%
Simple/Nordic-4     192B ± 0%
Simple/Tibetan-4    144B ± 0%

name              allocs/op
Simple/ASCII-4      1.00 ± 0%
Simple/French-4     1.00 ± 0%
Simple/Nordic-4     1.00 ± 0%
Simple/Tibetan-4    1.00 ± 0%

Comparisons with other libraries

name                     time/op
Leven/ASCII/agniva-4      353ns ± 1%
Leven/ASCII/arbovm-4      485ns ± 1%
Leven/ASCII/dgryski-4     395ns ± 0%
Leven/French/agniva-4     648ns ± 1%
Leven/French/arbovm-4     791ns ± 0%
Leven/French/dgryski-4    682ns ± 0%
Leven/Nordic/agniva-4    1.28µs ± 1%
Leven/Nordic/arbovm-4    1.52µs ± 1%
Leven/Nordic/dgryski-4   1.32µs ± 1%
Leven/Tibetan/agniva-4   1.12µs ± 1%
Leven/Tibetan/arbovm-4   1.31µs ± 0%
Leven/Tibetan/dgryski-4  1.16µs ± 0%

Documentation

Overview

Package levenshtein is a Go implementation to calculate Levenshtein Distance.

Implementation taken from https://gist.github.com/andrei-m/982927#gistcomment-1931258

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func ComputeDistance

func ComputeDistance(s1, s2 []rune) int

ComputeDistance computes the levenshtein distance between the two strings passed as an argument. The return value is the levenshtein distance

Works on runes (Unicode code points) but does not normalize the input strings. See https://blog.golang.org/normalization and the golang.org/x/text/unicode/norm pacage.

func ComputeDistance64

func ComputeDistance64(s1, s2 []uint64) int

func ComputeWordDistance

func ComputeWordDistance(s1, s2 []rune) int

Types

type EditStats

type EditStats = struct {
	Subs map[string]int `json:"subs"`
	Ins  map[string]int `json:"ins"`
	Dels map[string]int `json:"dels"`
}

EditStats stores information about the number of substitutions, insertions and deletions used in the optimal path between the two string

func ComputeDistWithCon64

func ComputeDistWithCon64(s1, s2 []uint64, r1, r2 map[uint64]string) (int, EditStats)

func ComputeDistanceWithConstruction

func ComputeDistanceWithConstruction(s1, s2 []rune) (int, EditStats)

ComputeDistanceWithConstruction extends ComputeDistance to return information about the number of substitutions, insertions and deletions alongside the distance score

func ComputeWordDistCon

func ComputeWordDistCon(s1, s2 []rune) (int, EditStats)

func NewEditStats

func NewEditStats() EditStats

NewEditStats returns a new EditStats object

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL