dawg: github.com/smhanov/dawg Index | Examples | Files

package dawg

import "github.com/smhanov/dawg"

Package dawg is an implemention of a Directed Acyclic Word Graph, as described on my blog at http://stevehanov.ca/blog/?id=115

It is designed to be as memory efficient as possible. Instead of storing a map in each node, which can consume hundreds of bytes per node, one map is used to store the edges of all nodes.

When you are finished adding words, all unneeded information is discarded.

In general, to use it you first create a builder using dawg.New(). You can then add words to the Dawg. The two restrictions are that you cannot repeat a word, and they must be in strictly increasing alphabetical order.

After all the words are added, call Finish() which returns a dawg.Finder interface. You can perform queries with this interface, such as finding all prefixes of a given string which are also words, or looking up a word's index that you have previously added.

After you have called Finish() on a Builder, you may choose to write it to disk using the Save() function. The DAWG can then be opened again later using the Load() function. When opened from dist, no memory is used. The structure is accessed in-place on disk using a minimal perfect hash.

Index

Examples

Package Files

dawg.go disk.go doc.go mph.go

type Builder Uses

type Builder interface {
    CanAdd(word string) bool
    Add(wordIn string)
    Finish() Finder
    Write(w io.Writer) (int, error)
    Save(filename string) (int, error)
}

Builder is the interface for creating a new Dawg

func New Uses

func New() Builder

New creates a new DAWG

Code:

dawg := dawg.New()

dawg.Add("blip")   // index 0
dawg.Add("cat")    // index 1
dawg.Add("catnip") // index 2
dawg.Add("cats")   // index 3

finder := dawg.Finish()

for _, result := range finder.FindAllPrefixesOf("catsup") {
    fmt.Printf("Found prefix %s, index %d\n", result.Word, result.Index)
}

Output:

Found prefix cat, index 1
Found prefix cats, index 3

type Dawg Uses

type Dawg struct {
    // contains filtered or unexported fields
}

Dawg represents a Directed Acyclic Word Graph

func (*Dawg) Add Uses

func (dawg *Dawg) Add(wordIn string)

Add adds a word to the structure. Adding a word not in alphaetical order, or to a finished Dawg will panic.

func (*Dawg) CanAdd Uses

func (dawg *Dawg) CanAdd(word string) bool

CanAdd will return true if the word can be added to the Dawg. Words must be added in alphabetical order.

func (*Dawg) FindAllPrefixesOf Uses

func (dawg *Dawg) FindAllPrefixesOf(input string) []FindResult

FindAllPrefixesOf returns all items in the dawg that are a prefix of the input string. It will panic if the dawg is not finished.

func (*Dawg) Finish Uses

func (dawg *Dawg) Finish() Finder

Finish will mark the dawg as complete. The Dawg cannot be used for lookups until Finish has been called.

func (*Dawg) IndexOf Uses

func (dawg *Dawg) IndexOf(input string) int

IndexOf returns the index, which is the order the item was inserted. If the item was never inserted, it returns -1 It will panic if the dawg is not finished.

func (*Dawg) NumAdded Uses

func (dawg *Dawg) NumAdded() int

NumAdded returns the number of words added

func (*Dawg) NumEdges Uses

func (dawg *Dawg) NumEdges() int

NumEdges returns the number of edges in the dawg. This includes transitions to the "final" node after each word.

func (*Dawg) NumNodes Uses

func (dawg *Dawg) NumNodes() int

NumNodes returns the number of nodes in the dawg.

func (*Dawg) Print Uses

func (dawg *Dawg) Print()

Print will print all edges to the standard output

func (*Dawg) Save Uses

func (dawg *Dawg) Save(filename string) (int, error)

Save writes the DAWG to disk. Returns the number of bytes written

func (*Dawg) Write Uses

func (dawg *Dawg) Write(w io.Writer) (int, error)

Save writes the DAWG to an io.Writer. Returns the number of bytes written

type FindResult Uses

type FindResult struct {
    Word  string
    Index int
}

FindResult is the result of a lookup in the Dawg. It contains both the word found, and it's index based on the order it was added.

type Finder Uses

type Finder interface {
    FindAllPrefixesOf(input string) []FindResult
    IndexOf(input string) int
    NumAdded() int
    NumEdges() int
    NumNodes() int
    Print()
}

Finder is the interface for using a Dawg to find words A dawg stored on disk only implements this interface, since it cannot be added to.

func Load Uses

func Load(filename string) (Finder, error)

Load loads the dawg from a file

func Read Uses

func Read(f io.ReaderAt, offset int64) (Finder, error)

Read returns a finder that accesses the dawg in-place using the given io.ReaderAt

Package dawg imports 9 packages (graph) and is imported by 1 packages. Updated 2019-04-08. Refresh now. Tools for package owners.