Documentation ¶
Overview ¶
Package dawg is an implemention of a Directed Acyclic Word Graph, as described on my blog at http://stevehanov.ca/blog/?id=115
A DAWG provides fast lookup of all possible prefixes of words in a dictionary, as well as the ability to get the index number of any word.
This particular implementation may be different from others because it is very memory efficient, and it also works fast with large character sets. It can deal with thousands of branches out of a single node without needing to go through each one.
The storage format is as small as possible. Bits are used instead of bytes so that no space is wasted as padding, and there are no practical limitations to the number of nodes or characters. A summary of the data format is found at the top of disk.go.
In general, to use it you first create a builder using dawg.New(). You can then add words to the Dawg. The two restrictions are that you cannot repeat a word, and they must be in strictly increasing alphabetical order.
After all the words are added, call Finish() which returns a dawg.Finder interface. You can perform queries with this interface, such as finding all prefixes of a given string which are also words, or looking up a word's index that you have previously added.
After you have called Finish() on a Builder, you may choose to write it to disk using the Save() function. The DAWG can then be opened again later using the Load() function. When opened from disk, no memory is used. The structure is accessed in-place on disk.
Index ¶
Examples ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
Types ¶
type Builder ¶
type Builder interface { // Add the word to the dawg Add(wordIn string) // Returns true if the word can be added. CanAdd(word string) bool // Complete the dawg and return a Finder. Finish() Finder }
Builder is the interface for creating a new Dawg. Use New() to create it.
func New ¶
func New() Builder
New creates a new dawg
Example ¶
package main import ( "fmt" "github.com/smhanov/dawg" ) func main() { dawg := dawg.New() dawg.Add("blip") // index 0 dawg.Add("cat") // index 1 dawg.Add("catnip") // index 2 dawg.Add("cats") // index 3 finder := dawg.Finish() for _, result := range finder.FindAllPrefixesOf("catsup") { fmt.Printf("Found prefix %s, index %d\n", result.Word, result.Index) } }
Output: Found prefix cat, index 1 Found prefix cats, index 3
type EnumFn ¶
type EnumFn = func(index int, word []rune, final bool) EnumerationResult
EnumFn is a method that you implement. It will be called with all prefixes stored in the DAWG. If final is true, the prefix represents a complete word that has been stored.
type EnumerationResult ¶
type EnumerationResult = int
EnumerationResult is returned by the enumeration function to indicate whether indication should continue below this depth or to stop altogether
const ( // Continue enumerating all words with this prefix Continue EnumerationResult = iota // Skip will skip all words with this prefix Skip // Stop will immediately stop enumerating words Stop )
type FindResult ¶
FindResult is the result of a lookup in the d. It contains both the word found, and it's index based on the order it was added.
type Finder ¶
type Finder interface { // Find all prefixes of the given string FindAllPrefixesOf(input string) []FindResult // Find the index of the given string IndexOf(input string) int AtIndex(index int) (string, error) // Enumerate all prefixes stored in the dawg. Enumerate(fn EnumFn) // Returns the number of words NumAdded() int // Returns the number of edges NumEdges() int // Returns the number of nodes NumNodes() int // Output a human-readable description of the dawg to stdout Print() // Close the dawg that was opened with Load(). After this, it is no longer // accessible. Close() error // Save to a writer Write(w io.Writer) (int64, error) // Save to a file Save(filename string) (int64, error) }
Finder is the interface for querying a dawg. Use either Builder.Finish() or Load() to obtain one.