import "github.com/smhanov/dawg"
Package dawg is an implemention of a Directed Acyclic Word Graph, as described on my blog at http://stevehanov.ca/blog/?id=115
It is designed to be as memory efficient as possible. Instead of storing a map in each node, which can consume hundreds of bytes per node, one map is used to store the edges of all nodes.
When you are finished adding words, all unneeded information is discarded.
In general, to use it you first create a builder using dawg.New(). You can then add words to the Dawg. The two restrictions are that you cannot repeat a word, and they must be in strictly increasing alphabetical order.
After all the words are added, call Finish() which returns a dawg.Finder interface. You can perform queries with this interface, such as finding all prefixes of a given string which are also words, or looking up a word's index that you have previously added.
After you have called Finish() on a Builder, you may choose to write it to disk using the Save() function. The DAWG can then be opened again later using the Load() function. When opened from dist, no memory is used. The structure is accessed in-place on disk using a minimal perfect hash.
type Builder interface { CanAdd(word string) bool Add(wordIn string) Finish() Finder Write(w io.Writer) (int, error) Save(filename string) (int, error) }
Builder is the interface for creating a new Dawg
New creates a new DAWG
Code:
dawg := dawg.New() dawg.Add("blip") // index 0 dawg.Add("cat") // index 1 dawg.Add("catnip") // index 2 dawg.Add("cats") // index 3 finder := dawg.Finish() for _, result := range finder.FindAllPrefixesOf("catsup") { fmt.Printf("Found prefix %s, index %d\n", result.Word, result.Index) }
Output:
Found prefix cat, index 1 Found prefix cats, index 3
type Dawg struct {
// contains filtered or unexported fields
}
Dawg represents a Directed Acyclic Word Graph
Add adds a word to the structure. Adding a word not in alphaetical order, or to a finished Dawg will panic.
CanAdd will return true if the word can be added to the Dawg. Words must be added in alphabetical order.
func (dawg *Dawg) FindAllPrefixesOf(input string) []FindResult
FindAllPrefixesOf returns all items in the dawg that are a prefix of the input string. It will panic if the dawg is not finished.
Finish will mark the dawg as complete. The Dawg cannot be used for lookups until Finish has been called.
IndexOf returns the index, which is the order the item was inserted. If the item was never inserted, it returns -1 It will panic if the dawg is not finished.
NumAdded returns the number of words added
NumEdges returns the number of edges in the dawg. This includes transitions to the "final" node after each word.
NumNodes returns the number of nodes in the dawg.
Print will print all edges to the standard output
Save writes the DAWG to disk. Returns the number of bytes written
Save writes the DAWG to an io.Writer. Returns the number of bytes written
FindResult is the result of a lookup in the Dawg. It contains both the word found, and it's index based on the order it was added.
type Finder interface { FindAllPrefixesOf(input string) []FindResult IndexOf(input string) int NumAdded() int NumEdges() int NumNodes() int Print() }
Finder is the interface for using a Dawg to find words A dawg stored on disk only implements this interface, since it cannot be added to.
Load loads the dawg from a file
Read returns a finder that accesses the dawg in-place using the given io.ReaderAt
Package dawg imports 9 packages (graph) and is imported by 1 packages. Updated 2019-04-08. Refresh now. Tools for package owners.