Documentation ¶
Overview ¶
Package levtrie provides a Trie implementation that supports fast searches for words within a given edit distance of a query string. Edit distance bounds are maintained during the search by simulating an NFA that accepts all words within distance d of the query string in parallel with the Trie traversal.
An example NFA is pictured below for d = 2 and the word "edit":
┌──┐ e ┌──┐ d ┌──┐ i ┌──┐ t ╔══╗ | |─────▷| |─────▷| |─────▷| |─────▷║ ║ └──┘ ◹└──┘ ◹└──┘ ◹└──┘ ◹╚══╝ △ ╱ △ ╱ △ ╱ △ ╱ △ │ ε,*╱ │ ε,*╱ │ ε,*╱ │ ε,*╱ │ │ ╱ │ ╱ │ ╱ │ ╱ │ *│ ╱ *│ ╱ *│ ╱ *│ ╱ *│ │ ╱ │ ╱ │ ╱ │ ╱ │ ┌──┐ e ┌──┐ d ┌──┐ i ┌──┐ t ╔══╗ | |─────▷| |─────▷| |─────▷| |─────▷║ ║ └──┘ ◹└──┘ ◹└──┘ ◹└──┘ ◹╚══╝ △ ╱ △ ╱ △ ╱ △ ╱ △ │ ε,*╱ │ ε,*╱ │ ε,*╱ │ ε,*╱ │ │ ╱ │ ╱ │ ╱ │ ╱ │ *│ ╱ *│ ╱ *│ ╱ *│ ╱ *│ │ ╱ │ ╱ │ ╱ │ ╱ │ ┌──┐ e ┌──┐ d ┌──┐ i ┌──┐ t ╔══╗ ──▷| |─────▷| |─────▷| |─────▷| |─────▷║ ║ └──┘ └──┘ └──┘ └──┘ ╚══╝
The state on the bottom left is the initial state and the double-bordered states on the far right are accepting states. *-transitions can be taken on any character and ε-transitions can be taken without consuming a character. Each transition in the NFA above corresponds to an edit operation: transitions to the right represent no edit, transitions up represent insertions, diagonal ε-transitions represent deletions and diagonal *-transitions represent substitutions.
The NFA above is a specific example from a parameterized family of Levenshtein NFAs. Creating other Levenshtein NFAs for different words or edit distances is straightfoward: add or remove columns as needed to fit the length of the word, add rows as needed to increase the edit distance, and adjust the labeled horizontal transitions to match the word.
Simulating the NFA involves keeping track of an active set of states. The active set of states is defined by the lowest active state on each diagonal and only diagonals within a sliding window of 2d + 1 possible diagonals ever contain active states during the simulation. These properties make it possible to simulate each transition of the NFA in time O(d).
Index ¶
- type KV
- type Trie
- func (t *Trie) Delete(key string)
- func (t *Trie) Get(key string) (string, bool)
- func (t *Trie) Set(key string, val string)
- func (t Trie) Suggest(key string, d int8, n int) []KV
- func (t Trie) SuggestAfterExactPrefix(key string, p int, d int8, n int) []KV
- func (t Trie) SuggestSuffixes(key string, d int8, n int) []KV
- func (t Trie) SuggestSuffixesAfterExactPrefix(key string, p int, d int8, n int) []KV
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type Trie ¶
type Trie struct {
// contains filtered or unexported fields
}
Trie supports common map operations as well as lookups within a given edit distance bound. Don't create directly, use levtrie.New() instead.
func (*Trie) Delete ¶
Delete removes the key from the Trie. A subsequent call to Get(key) will return ("", false).
func (*Trie) Get ¶
Get returns the value stored in the Trie at the given key. If there is no such key in the Trie, it returns the empty string. The second value returned is true exactly when the key exists in the Trie.
func (*Trie) Set ¶
Set associates key with val in the Trie. A subsequent call to Get(key) will return (val, true).
func (Trie) Suggest ¶
Suggest returns up to n KVs with keys that are within edit distance d of the input key. Example: Suggest("banana", 2, 10) would return up to 10 results which might include keys like "bahama", "bananas", or "panama".
func (Trie) SuggestAfterExactPrefix ¶
SuggestAfterExactPrefix returns up to n KVs that share an exact prefix of length p with the input key and are within edit distance d of the input key. Example: SuggestAfterExactPrefix("britney", 3, 2, 10) would return up to 10 results which might include "brine" and "briney" but not "jitney".
func (Trie) SuggestSuffixes ¶
SuggestSuffixes returns up to n KVs, all of whose keys have a prefix that is within edit distance d of the input key. Example: SuggestSuffixes("eat", 1, 10) would return up to 10 results which might include keys like "eaten", "eating", "beaten", and "meatball"
func (Trie) SuggestSuffixesAfterExactPrefix ¶
SuggestSuffixesAfterExactPrefix returns up to n KVs, all of whose keys have a prefix that is within edit distance d of the input key and share an exact prefix of at least length p with the input key. Example: SuggestSuffixesAfterExactPrefix("toads", 1, 2, 10) would return up to 10 results which might include "toadstool" and "toast" but not "roads".