levenshtein

package module
v0.0.0-...-fff5a58 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jul 7, 2019 License: GPL-3.0 Imports: 2 Imported by: 0

README

levenshtein

Go Report Card GoDoc license

This package provides functions for calculating the Levenshtein distance (a type of edit distance) between two strings, and for generating a minimal list of edit operations required to convert the source string into the target string. It does this by building an edit matrix according to the Wagner-Fischer Algorithm. Alternative insertion/removal/swap costs can be provided as options. The list of edit operations is retrieved via a recursive algorithm which reads off a backtrace of edit operations from the matrix.

Documentation and examples can be found at godoc.org

Documentation

Overview

Package levenshtein provides functions for calculating the Levenshtein distance (a type of edit distance) between two strings, and for generating a minimal list of edit operations required to convert the source string into the target string. It does this by building an edit matrix according to the Wagner-Fischer Algorithm. Alternative insertion/removal/swap costs can be provided as options. The list of edit operations is retrieved via a recursive algorithm which reads off a backtrace of edit operations from the matrix.

More information about the Wagner-Fischer algorithm can be found here: https://en.wikipedia.org/wiki/Wagner%E2%80%93Fischer_algorithm

Example
package main

import (
	"fmt"

	"github.com/nathanjcochran/levenshtein"
)

func main() {
	matrix := levenshtein.Build("horse", "arose")

	fmt.Printf("Matrix:\n%s\n\n", matrix)
	fmt.Printf("Edit distance: %d\n", matrix.Distance())
	fmt.Printf("Operations:\n")
	for _, op := range matrix.Operations() {
		fmt.Printf(" %s\n", op)
	}

}
Output:

Matrix:
    a r o s e
  0 1 2 3 4 5
h 1 1 2 3 4 5
o 2 2 2 2 3 4
r 3 3 2 3 3 4
s 4 4 3 3 3 4
e 5 5 4 4 4 3

Edit distance: 3
Operations:
   swap a at index 0: aorse
 remove o at index 1: arse
   keep r at index 1: arse
 insert o at index 2: arose
   keep s at index 3: arose
   keep e at index 4: arose

Index

Examples

Constants

View Source
const (
	DefaultInsertCost = 1
	DefaultRemoveCost = 1
	DefaultSwapCost   = 1
)

Default costs for inserting, removing, and swapping characters.

Variables

This section is empty.

Functions

func Distance

func Distance(source, target string, options ...Option) int

Distance builds a matrix and returns the edit distance between the two strings - i.e. the minimum number of edits required to transform the source string into the target string. This method is a short-cut, useful in cases where you do not need to use the edit matrix for any other purpose. It is equivalent to: Build(source, target, options...).Distance()

Example
package main

import (
	"fmt"

	"github.com/nathanjcochran/levenshtein"
)

func main() {
	fmt.Println(levenshtein.Distance("horse", "arose"))

}
Output:

3

Types

type Matrix

type Matrix struct {
	// contains filtered or unexported fields
}

Matrix contains a two-dimensional matrix used for calculating edit distances between two strings, and for retrieving a minimal list of edit operations for converting the source string into the target string.

func Build

func Build(source, target string, options ...Option) *Matrix

Builds and fills a matrix which can be used to calculate the edit distance between the two strings, or to retrieve a list of edit operations required to transform the source string into the target string.

Example
package main

import (
	"fmt"

	"github.com/nathanjcochran/levenshtein"
)

func main() {
	fmt.Println(levenshtein.Build("horse", "arose"))

}
Output:

    a r o s e
  0 1 2 3 4 5
h 1 1 2 3 4 5
o 2 2 2 2 3 4
r 3 3 2 3 3 4
s 4 4 3 3 3 4
e 5 5 4 4 4 3

func (*Matrix) Distance

func (m *Matrix) Distance() int

Distance returns the edit distance between the two strings - i.e. the minimum number of edits required to transform the source string into the target string.

Example
package main

import (
	"fmt"

	"github.com/nathanjcochran/levenshtein"
)

func main() {
	matrix := levenshtein.Build("horse", "arose")
	fmt.Println(matrix.Distance())

}
Output:

3

func (*Matrix) Operations

func (m *Matrix) Operations() []Operation

Operations returns a minimal list of edit operations required to transform the source string into the target string.

Example
package main

import (
	"fmt"

	"github.com/nathanjcochran/levenshtein"
)

func main() {
	matrix := levenshtein.Build("horse", "arose")
	for _, op := range matrix.Operations() {
		fmt.Println(op)
	}

}
Output:

  swap a at index 0: aorse
remove o at index 1: arse
  keep r at index 1: arse
insert o at index 2: arose
  keep s at index 3: arose
  keep e at index 4: arose

func (*Matrix) String

func (m *Matrix) String() string

String returns a string representation of the edit matrix, with proper column spacing.

type OpType

type OpType int8

OpType represents a type of edit operation.

const (
	Insert OpType = iota
	Remove
	Keep
	Swap
)

func (OpType) String

func (o OpType) String() string

String returns the string representation of an operation type.

type Operation

type Operation struct {
	Type   OpType
	Char   rune
	Index  int
	Result string
}

Operation represents one of the operations performed on a source string during the process of converting it into a target string. Contains information about the type of operation, the character affected, the index at which the operation occured, and the intermediate result of performing this operation.

func Operations

func Operations(source, target string, options ...Option) []Operation

Operations builds a matrix and returns a minimal list of edit operations required to transform the source string into the target string. This method is a short-cut, useful in cases where you do not need to use the edit matrix for any other purpose. It is equivalent to: Build(source, target, options...).Distance()

Example
package main

import (
	"fmt"

	"github.com/nathanjcochran/levenshtein"
)

func main() {
	for _, op := range levenshtein.Operations("horse", "arose") {
		fmt.Println(op)
	}

}
Output:

  swap a at index 0: aorse
remove o at index 1: arse
  keep r at index 1: arse
insert o at index 2: arose
  keep s at index 3: arose
  keep e at index 4: arose

func (Operation) String

func (o Operation) String() string

String returns the string representation of an operation.

type Option

type Option func(m *Matrix)

An Option which can be applied when generating an edit matrix or calculating the edit distance between two strings - e.g. setting a non-default insert/removal/swap cost.

func SetInsertCost

func SetInsertCost(cost int) Option

SetInsertCost is an option which allows you to set a custom insertion cost to use when calculating edit distances. If this option is not provided, DefaultInsertCost is used instead.

func SetRemoveCost

func SetRemoveCost(cost int) Option

SetRemoveCost is an option which allows you to set a custom removal cost to use when calculating edit distances. If this option is not provided, DefaultRemoveCost is used instead.

func SetSwapCost

func SetSwapCost(cost int) Option

SetSwapCost is an option which allows you to set a custom swap cost to use when calculating edit distances. If this option is not provided, DefaultSwapCost is used instead.

Directories

Path Synopsis
cmd

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL