diff

package
v0.0.0-...-81a76c0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Dec 14, 2023 License: BSD-3-Clause Imports: 3 Imported by: 0

Documentation

Overview

Package diff implements an algorithm for producing edit-scripts. The edit-script is a sequence of operations needed to transform one list of symbols into another (or vice-versa). The edits allowed are insertions, deletions, and modifications. The summation of all edits is called the Levenshtein distance as this problem is well-known in computer science.

This package prioritizes performance over accuracy. That is, the run time is more important than obtaining a minimal Levenshtein distance.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type EditScript

type EditScript []EditType

EditScript represents the series of differences between two lists.

func Difference

func Difference(nx, ny int, f EqualFunc) (es EditScript)

Difference reports whether two lists of lengths nx and ny are equal given the definition of equality provided as f.

This function returns an edit-script, which is a sequence of operations needed to convert one list into the other. The following invariants for the edit-script are maintained:

  • eq == (es.Dist()==0)
  • nx == es.LenX()
  • ny == es.LenY()

This algorithm is not guaranteed to be an optimal solution (i.e., one that produces an edit-script with a minimal Levenshtein distance). This algorithm favors performance over optimality. The exact output is not guaranteed to be stable and may change over time.

func (EditScript) Dist

func (es EditScript) Dist() int

Dist is the Levenshtein distance and is guaranteed to be 0 if and only if lists X and Y are equal.

func (EditScript) LenX

func (es EditScript) LenX() int

LenX is the length of the X list.

func (EditScript) LenY

func (es EditScript) LenY() int

LenY is the length of the Y list.

func (EditScript) String

func (es EditScript) String() string

String returns a human-readable string representing the edit-script where Identity, UniqueX, UniqueY, and Modified are represented by the '.', 'X', 'Y', and 'M' characters, respectively.

type EditType

type EditType uint8

EditType represents a single operation within an edit-script.

const (
	// Identity indicates that a symbol pair is identical in both list X and Y.
	Identity EditType = iota
	// UniqueX indicates that a symbol only exists in X and not Y.
	UniqueX
	// UniqueY indicates that a symbol only exists in Y and not X.
	UniqueY
	// Modified indicates that a symbol pair is a modification of each other.
	Modified
)

type EqualFunc

type EqualFunc func(ix int, iy int) Result

EqualFunc reports whether the symbols at indexes ix and iy are equal. When called by Difference, the index is guaranteed to be within nx and ny.

type Result

type Result struct{ NumSame, NumDiff int }

Result is the result of comparison. NumSame is the number of sub-elements that are equal. NumDiff is the number of sub-elements that are not equal.

func BoolResult

func BoolResult(b bool) Result

BoolResult returns a Result that is either Equal or not Equal.

func (Result) Equal

func (r Result) Equal() bool

Equal indicates whether the symbols are equal. Two symbols are equal if and only if NumDiff == 0. If Equal, then they are also Similar.

func (Result) Similar

func (r Result) Similar() bool

Similar indicates whether two symbols are similar and may be represented by using the Modified type. As a special case, we consider binary comparisons (i.e., those that return Result{1, 0} or Result{0, 1}) to be similar.

The exact ratio of NumSame to NumDiff to determine similarity may change.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL