levtrie

package module
v0.0.0-...-3746106 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jun 10, 2023 License: Unlicense Imports: 1 Imported by: 0

README

levtrie

This go package implements a Trie that acts like a map from strings to strings, allowing the usual Get, Set, and Delete operations on key-value associations. In addition, the Trie supports a few variations on searches for key-value pairs where the key is within a particular edit distance of a query string.

Here's an example of how you might use levtrie to find the 10 closest words to the word "sitting" that are within edit distance 2 from a list of words held in the slice wordlist:

import "github.com/aaw/levtrie"

...

t := levtrie.New()
for _, word := range wordlist {
    t.Set(word, "")
}
results := t.Suggest("sitting", 2, 10)

Or see a full example at the Go Playground.

See the godoc for this package for complete documentation.

See examples/typeahead in this repo for an extended example of implementing typeahead-style query suggestions with levtrie. You can launch a small webapp with go run examples/typeahead/typeahead.go from the top level clone of this repo that lets you type in a query box to find suggestions for spelling corrections.

All of the searches restricted by edit distance in levtrie are accomplished by generating a non-deterministic Levenshtein Automata on the fly and simulating it in parallel with the Trie search. I've described this technique in more detail in this post.

Documentation

Overview

Package levtrie provides a Trie implementation that supports fast searches for words within a given edit distance of a query string. Edit distance bounds are maintained during the search by simulating an NFA that accepts all words within distance d of the query string in parallel with the Trie traversal.

An example NFA is pictured below for d = 2 and the word "edit":

   ┌──┐   e  ┌──┐   d  ┌──┐   i  ┌──┐   t  ╔══╗
   |  |─────▷|  |─────▷|  |─────▷|  |─────▷║  ║
   └──┘     ◹└──┘     ◹└──┘     ◹└──┘     ◹╚══╝
    △      ╱  △      ╱  △      ╱  △      ╱  △
    │  ε,*╱   │  ε,*╱   │  ε,*╱   │  ε,*╱   │
    │    ╱    │    ╱    │    ╱    │    ╱    │
   *│   ╱    *│   ╱    *│   ╱    *│   ╱    *│
    │  ╱      │  ╱      │  ╱      │  ╱      │
   ┌──┐   e  ┌──┐   d  ┌──┐   i  ┌──┐   t  ╔══╗
   |  |─────▷|  |─────▷|  |─────▷|  |─────▷║  ║
   └──┘     ◹└──┘     ◹└──┘     ◹└──┘     ◹╚══╝
    △      ╱  △      ╱  △      ╱  △      ╱  △
    │  ε,*╱   │  ε,*╱   │  ε,*╱   │  ε,*╱   │
    │    ╱    │    ╱    │    ╱    │    ╱    │
   *│   ╱    *│   ╱    *│   ╱    *│   ╱    *│
    │  ╱      │  ╱      │  ╱      │  ╱      │
   ┌──┐   e  ┌──┐   d  ┌──┐   i  ┌──┐   t  ╔══╗
──▷|  |─────▷|  |─────▷|  |─────▷|  |─────▷║  ║
   └──┘      └──┘      └──┘      └──┘      ╚══╝

The state on the bottom left is the initial state and the double-bordered states on the far right are accepting states. *-transitions can be taken on any character and ε-transitions can be taken without consuming a character. Each transition in the NFA above corresponds to an edit operation: transitions to the right represent no edit, transitions up represent insertions, diagonal ε-transitions represent deletions and diagonal *-transitions represent substitutions.

The NFA above is a specific example from a parameterized family of Levenshtein NFAs. Creating other Levenshtein NFAs for different words or edit distances is straightfoward: add or remove columns as needed to fit the length of the word, add rows as needed to increase the edit distance, and adjust the labeled horizontal transitions to match the word.

Simulating the NFA involves keeping track of an active set of states. The active set of states is defined by the lowest active state on each diagonal and only diagonals within a sliding window of 2d + 1 possible diagonals ever contain active states during the simulation. These properties make it possible to simulate each transition of the NFA in time O(d).

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type KV

type KV struct {
	Key   string
	Value string
}

KV is a key-value pair, the basic storage unit of the Trie.

type Trie

type Trie struct {
	// contains filtered or unexported fields
}

Trie supports common map operations as well as lookups within a given edit distance bound. Don't create directly, use levtrie.New() instead.

func New

func New() *Trie

New returns a new Trie.

func (*Trie) Delete

func (t *Trie) Delete(key string)

Delete removes the key from the Trie. A subsequent call to Get(key) will return ("", false).

func (*Trie) Get

func (t *Trie) Get(key string) (string, bool)

Get returns the value stored in the Trie at the given key. If there is no such key in the Trie, it returns the empty string. The second value returned is true exactly when the key exists in the Trie.

func (*Trie) Set

func (t *Trie) Set(key string, val string)

Set associates key with val in the Trie. A subsequent call to Get(key) will return (val, true).

func (Trie) Suggest

func (t Trie) Suggest(key string, d int8, n int) []KV

Suggest returns up to n KVs with keys that are within edit distance d of the input key. Example: Suggest("banana", 2, 10) would return up to 10 results which might include keys like "bahama", "bananas", or "panama".

func (Trie) SuggestAfterExactPrefix

func (t Trie) SuggestAfterExactPrefix(key string, p int, d int8, n int) []KV

SuggestAfterExactPrefix returns up to n KVs that share an exact prefix of length p with the input key and are within edit distance d of the input key. Example: SuggestAfterExactPrefix("britney", 3, 2, 10) would return up to 10 results which might include "brine" and "briney" but not "jitney".

func (Trie) SuggestSuffixes

func (t Trie) SuggestSuffixes(key string, d int8, n int) []KV

SuggestSuffixes returns up to n KVs, all of whose keys have a prefix that is within edit distance d of the input key. Example: SuggestSuffixes("eat", 1, 10) would return up to 10 results which might include keys like "eaten", "eating", "beaten", and "meatball"

func (Trie) SuggestSuffixesAfterExactPrefix

func (t Trie) SuggestSuffixesAfterExactPrefix(key string, p int, d int8, n int) []KV

SuggestSuffixesAfterExactPrefix returns up to n KVs, all of whose keys have a prefix that is within edit distance d of the input key and share an exact prefix of at least length p with the input key. Example: SuggestSuffixesAfterExactPrefix("toads", 1, 2, 10) would return up to 10 results which might include "toadstool" and "toast" but not "roads".

Directories

Path Synopsis
examples
typeahead
A simple spelling corrector implemented as a HTTP server.
A simple spelling corrector implemented as a HTTP server.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL