spell

Version: v0.0.0-...-e96bc70

Warning: This package is not in the latest version of its module.
Published: May 17, 2022 License: MIT Imports: 17 Imported by: 0

README

SMARTY DISCLAIMER: Subject to the terms of the associated license agreement, this software is freely available for your use. This software is FREE, AS IN PUPPIES, and is a gift. Enjoy your new responsibility. This means that while we may consider enhancement requests, we may or may not choose to entertain requests at our sole and absolute discretion.

spell

A blazing fast spell checker written in Go.

N.B. This library is still in early development and may change.

Overview

package main

import (
    "fmt"

    "github.com/eskriett/spell"
)

func main() {

    // Create a new instance of spell
    s := spell.New()

    // Add words to the dictionary. Words require a frequency, but can have
    // other arbitrary metadata associated with them
    s.AddEntry(spell.Entry{
        Word: "two",
        WordData: spell.WordData{
            "frequency": 100,
            "type":      "number",
        },
    })
    s.AddEntry(spell.Entry{
        Word: "town",
        WordData: spell.WordData{
            "frequency": 10,
            "type":      "noun",
        },
    })

    // Look up a misspelling; by default only the "best" suggestion is returned
    suggestions, _ := s.Lookup("twon")
    fmt.Printf("%v\n", suggestions)
    // -> [two]

    // Get metadata from the suggestion
    suggestion := suggestions[0]
    fmt.Printf("%v\n", suggestion.WordData["type"])
    // -> number

    // Get multiple suggestions during lookup
    suggestions, _ = s.Lookup("twon", spell.SuggestionLevel(spell.LevelAll))
    fmt.Printf("%v\n", suggestions)
    // -> [two, town]

    // Save the dictionary
    s.Save("dict.spell")

    // Load the dictionary
    s2, _ := spell.Load("dict.spell")

    suggestions, _ = s2.Lookup("twon", spell.SuggestionLevel(spell.LevelAll))
    fmt.Printf("%v\n", suggestions)
    // -> [two, town]

    // Spell supports word segmentation
    s3 := spell.New()

    wd := spell.WordData{"frequency": 1}
    s3.AddEntry(spell.Entry{Word: "the", WordData: wd})
    s3.AddEntry(spell.Entry{Word: "quick", WordData: wd})
    s3.AddEntry(spell.Entry{Word: "brown", WordData: wd})
    s3.AddEntry(spell.Entry{Word: "fox", WordData: wd})

    segmentResult, _ := s3.Segment("thequickbrownfox")
    fmt.Println(segmentResult)
    // -> the quick brown fox
}

Credits

Spell makes use of a symmetric delete algorithm and is loosely based on the SymSpell implementation.
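
To make the idea of a symmetric delete index more concrete, here is a minimal, self-contained sketch. It is not the package's implementation: the names deletes, buildIndex and candidates are hypothetical, and the sketch only finds candidate words. A real checker would still verify each candidate with an edit distance function and rank the results.

```go
package main

import "fmt"

// deletes returns the set of strings reachable from word by removing up to
// maxDist characters. The word itself is included.
func deletes(word string, maxDist int) map[string]struct{} {
	out := map[string]struct{}{word: {}}
	frontier := []string{word}
	for d := 0; d < maxDist; d++ {
		var next []string
		for _, w := range frontier {
			r := []rune(w)
			for i := range r {
				cand := string(r[:i]) + string(r[i+1:])
				if _, seen := out[cand]; !seen {
					out[cand] = struct{}{}
					next = append(next, cand)
				}
			}
		}
		frontier = next
	}
	return out
}

// buildIndex maps every delete variant to the dictionary words producing it.
func buildIndex(words []string, maxDist int) map[string][]string {
	idx := make(map[string][]string)
	for _, w := range words {
		for d := range deletes(w, maxDist) {
			idx[d] = append(idx[d], w)
		}
	}
	return idx
}

// candidates returns the dictionary words that share at least one delete
// variant with input. These are candidates only, not verified suggestions.
func candidates(idx map[string][]string, input string, maxDist int) map[string]struct{} {
	found := make(map[string]struct{})
	for d := range deletes(input, maxDist) {
		for _, w := range idx[d] {
			found[w] = struct{}{}
		}
	}
	return found
}

func main() {
	idx := buildIndex([]string{"two", "town"}, 2)
	for w := range candidates(idx, "twon", 2) {
		fmt.Println(w)
	}
}
```

Because only deletions are generated, on both the dictionary side and the input side, the index stays far smaller than one that also enumerates insertions, substitutions and transpositions.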

Documentation

Overview

Package spell provides fast spelling correction and string segmentation

Constants

const (
	// LevelBest will yield 'best' suggestion
	LevelBest suggestionLevel = iota

	// LevelClosest will yield closest suggestions
	LevelClosest

	// LevelAll will yield all suggestions
	LevelAll
)

Suggestion Levels used during Lookup.

Functions

func LoadBigrams

func LoadBigrams(filename string) (map[string]int, error)

LoadBigrams loads a bigram dictionary from the file at filename. It returns a new map on success, or an error if there is a problem reading the file.

Types

type Entry

type Entry struct {
	Word     string
	WordData WordData
}

Entry represents a word in the dictionary

type LookupOption

type LookupOption func(*lookupParams) error

LookupOption is a function that controls how a Lookup is performed. An error will be returned if the LookupOption is invalid.
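
The signature above follows the functional options pattern. The sketch below shows the general shape of that pattern using stand-in names: params, option, withEditDistance and apply are hypothetical, as are the default value and the validation limit. It does not reproduce the package's unexported lookupParams type.

```go
package main

import (
	"errors"
	"fmt"
)

// params is a hypothetical stand-in for an unexported parameter struct.
type params struct {
	editDistance uint32
}

// option mutates params and may reject invalid settings.
type option func(*params) error

// withEditDistance returns an option that sets the edit distance.
// The limit of 3 is arbitrary and exists only for this illustration.
func withEditDistance(d uint32) option {
	return func(p *params) error {
		if d > 3 {
			return errors.New("edit distance too large")
		}
		p.editDistance = d
		return nil
	}
}

// apply starts from assumed defaults and applies each option in order,
// stopping at the first error.
func apply(opts ...option) (params, error) {
	p := params{editDistance: 2} // assumed default
	for _, o := range opts {
		if err := o(&p); err != nil {
			return params{}, err
		}
	}
	return p, nil
}

func main() {
	p, err := apply(withEditDistance(1))
	fmt.Println(p.editDistance, err)
}
```

The pattern lets a variadic parameter carry any number of settings while callers that pass nothing get the defaults.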

func DistanceFunc

func DistanceFunc(df func(string, string, int) int) LookupOption

DistanceFunc accepts a function, f(str1, str2, maxDist), which calculates the distance between two strings. It should return -1 if the distance between the strings is greater than maxDist.
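
As an illustration of the contract described above, here is a sketch of a function with the func(string, string, int) int shape that returns -1 once the distance exceeds maxDist. The names boundedLevenshtein and min3 are hypothetical, and the function computes plain Levenshtein distance over runes, so a transposition counts as two edits. It is written for clarity and does not exit early, which an optimised implementation normally would.

```go
package main

import "fmt"

// min3 returns the smallest of three ints.
func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// boundedLevenshtein returns the Levenshtein distance between a and b, or -1
// if that distance is greater than maxDist.
func boundedLevenshtein(a, b string, maxDist int) int {
	ra, rb := []rune(a), []rune(b)

	// prev[j] is the distance between the first i-1 runes of a and the
	// first j runes of b.
	prev := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(ra); i++ {
		cur := make([]int, len(rb)+1)
		cur[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			cur[j] = min3(prev[j]+1, cur[j-1]+1, prev[j-1]+cost)
		}
		prev = cur
	}

	if d := prev[len(rb)]; d <= maxDist {
		return d
	}
	return -1
}

func main() {
	fmt.Println(boundedLevenshtein("eample", "example", 2)) // 1
	fmt.Println(boundedLevenshtein("twon", "town", 1))      // -1
}
```

Since its signature matches the parameter of DistanceFunc, a function like this could be passed to it directly.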

func EditDistance

func EditDistance(dist uint32) LookupOption

EditDistance allows the max edit distance to be set for the Lookup. Reducing the edit distance will improve lookup performance.

func PrefixLength

func PrefixLength(prefixLength uint32) LookupOption

PrefixLength defines how much of the input word should be used for the lookup.

func SortFunc

func SortFunc(sf func(SuggestionList)) LookupOption

SortFunc allows the sorting of the SuggestionList to be configured. By default, suggestions will be sorted by their edit distance, then their frequency.
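
The default ordering described above can be sketched as a two-key comparison. The suggestion type and sortSuggestions function here are hypothetical stand-ins, and placing higher frequencies first within the same distance is an assumption of this sketch rather than something the documentation states.

```go
package main

import (
	"fmt"
	"sort"
)

// suggestion is a simplified stand-in for a lookup result.
type suggestion struct {
	Word      string
	Distance  int
	Frequency int
}

// sortSuggestions orders by ascending edit distance, then by descending
// frequency (assumed direction) for suggestions at the same distance.
func sortSuggestions(sl []suggestion) {
	sort.SliceStable(sl, func(i, j int) bool {
		if sl[i].Distance != sl[j].Distance {
			return sl[i].Distance < sl[j].Distance
		}
		return sl[i].Frequency > sl[j].Frequency
	})
}

func main() {
	sl := []suggestion{
		{"town", 2, 10},
		{"tow", 1, 5},
		{"two", 1, 100},
	}
	sortSuggestions(sl)
	for _, s := range sl {
		fmt.Println(s.Word)
	}
}
```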

func SuggestionLevel

func SuggestionLevel(level suggestionLevel) LookupOption

SuggestionLevel defines how many results are returned for the lookup. See the package constants for the levels available.
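
One plausible reading of the three levels, applied to a list already sorted by distance, is sketched below. The types and the filter function are hypothetical, and the exact semantics (in particular that "closest" means every suggestion tied for the smallest distance) are an assumption based on the constant comments, not confirmed behaviour.

```go
package main

import "fmt"

type level int

const (
	levelBest    level = iota // keep only the first suggestion
	levelClosest              // keep suggestions tied for the smallest distance
	levelAll                  // keep everything
)

type suggestion struct {
	Word     string
	Distance int
}

// filter trims a list that is already sorted by ascending distance.
func filter(sorted []suggestion, l level) []suggestion {
	if len(sorted) == 0 {
		return nil
	}
	switch l {
	case levelBest:
		return sorted[:1]
	case levelClosest:
		end := 1
		for end < len(sorted) && sorted[end].Distance == sorted[0].Distance {
			end++
		}
		return sorted[:end]
	default:
		return sorted
	}
}

func main() {
	sorted := []suggestion{{"two", 1}, {"tow", 1}, {"town", 2}}
	fmt.Println(len(filter(sorted, levelBest)))
	fmt.Println(len(filter(sorted, levelClosest)))
	fmt.Println(len(filter(sorted, levelAll)))
}
```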

type Segment

type Segment struct {
	Word  string
	Entry *Entry
}

Segment contains details about an individual segment

type SegmentOption

type SegmentOption func(*segmentParams) error

SegmentOption is a function that controls how a Segment is performed. An error will be returned if the SegmentOption is invalid.

func SegmentLookupOpts

func SegmentLookupOpts(opt ...LookupOption) SegmentOption

SegmentLookupOpts allows the Lookup() options for the current segmentation to be configured

type SegmentResult

type SegmentResult struct {
	Segments []Segment
}

SegmentResult holds the result of a call to Segment()

func (SegmentResult) GetWords

func (s SegmentResult) GetWords() []string

GetWords returns a string slice of words for the segments

func (SegmentResult) String

func (s SegmentResult) String() string

String returns a string representation of the SegmentResult.

type Spell

type Spell struct {
	// The max number of deletes that will be performed to each word in the
	// dictionary
	MaxEditDistance uint32

	// The prefix length that will be examined
	PrefixLength uint32
	// contains filtered or unexported fields
}

Spell provides access to functions for spelling correction

func Load

func Load(filename string) (*Spell, error)

Load reads a dictionary from the file at filename. It returns a new Spell instance on success, or an error if there is a problem reading the file.

func New

func New() *Spell

New creates a new spell instance

func (*Spell) AddEntry

func (s *Spell) AddEntry(de Entry) (bool, error)

AddEntry adds an entry to the dictionary. If the word already exists, its data is overwritten. Returns true if a new word was added, false otherwise. Returns an error if the word could not be added, for example when the entry's word data has no "frequency" field.

Example
// Create a new speller
s := New()

// Add a new word, "example" to the dictionary
s.AddEntry(Entry{
	Word:     "example",
	WordData: WordData{"frequency": 10},
})

// Overwrite the data for word "example"
s.AddEntry(Entry{
	Word:     "example",
	WordData: WordData{"frequency": 100},
})

// Output the frequency for word "example"
entry := s.GetEntry("example")
fmt.Printf("Output for word 'example' is: %v\n",
	entry.WordData.GetFrequency())
Output:

Output for word 'example' is: 100

func (*Spell) GetEntry

func (s *Spell) GetEntry(word string) *Entry

GetEntry returns the Entry for word. If the word does not exist, nil is returned.

func (*Spell) GetLongestWord

func (s *Spell) GetLongestWord() uint32

GetLongestWord returns the length of the longest word in the dictionary

func (*Spell) Lookup

func (s *Spell) Lookup(input string, opts ...LookupOption) (SuggestionList, error)

Lookup takes an input and returns suggestions from the dictionary for that word. By default it will return the best suggestion for the word if it exists.

Accepts zero or more LookupOption that can be used to configure how lookup occurs.

Example
// Create a new speller
s := New()
s.AddEntry(Entry{
	Word:     "example",
	WordData: WordData{"frequency": 1},
})

// Perform a default lookup for example
suggestions, _ := s.Lookup("eample")
fmt.Printf("Suggestions are: %v\n", suggestions)
Output:

Suggestions are: [example]

Example (ConfigureDistanceFunc)
// Create a new speller
s := New()
s.AddEntry(Entry{
	Word:     "example",
	WordData: WordData{"frequency": 1},
})

// Configure the Lookup to use Levenshtein distance rather than the default
// Damerau Levenshtein calculation
s.Lookup("example", DistanceFunc(func(s1, s2 string, maxDist int) int {
	// Call the Levenshtein function from github.com/eskriett/strmet
	return strmet.Levenshtein(s1, s2, maxDist)
}))
Output:

Example (ConfigureEditDistance)
// Create a new speller
s := New()
s.AddEntry(Entry{
	Word:     "example",
	WordData: WordData{"frequency": 1},
})

// Lookup exact matches, i.e. edit distance = 0
suggestions, _ := s.Lookup("eample", EditDistance(0))
fmt.Printf("Suggestions are: %v\n", suggestions)
Output:

Suggestions are: []

Example (ConfigureSortFunc)
// Create a new speller
s := New()
s.AddEntry(Entry{
	Word:     "example",
	WordData: WordData{"frequency": 1},
})

// Configure suggestions to be sorted solely by their frequency
s.Lookup("example", SortFunc(func(sl SuggestionList) {
	sort.Slice(sl, func(i, j int) bool {
		s1Freq := sl[i].WordData.GetFrequency()
		s2Freq := sl[j].WordData.GetFrequency()
		return s1Freq < s2Freq
	})
}))
Output:

func (*Spell) RemoveEntry

func (s *Spell) RemoveEntry(word string) bool

RemoveEntry removes an entry from the dictionary. Returns true if the entry was removed, false otherwise.

func (*Spell) Save

func (s *Spell) Save(filename string) error

Save a representation of spell to disk at filename

func (*Spell) Segment

func (s *Spell) Segment(input string, opts ...SegmentOption) (*SegmentResult, error)

Segment takes an input string which may have word concatenations, and attempts to divide it into the most likely set of words by adding spaces at the most appropriate positions.

Accepts zero or more SegmentOption that can be used to configure how segmentation occurs

Example
// Create a new speller
s := New()

wd := WordData{"frequency": 1}
s.AddEntry(Entry{Word: "the", WordData: wd})
s.AddEntry(Entry{Word: "quick", WordData: wd})
s.AddEntry(Entry{Word: "brown", WordData: wd})
s.AddEntry(Entry{Word: "fox", WordData: wd})

// Segment a string with words concatenated together
segmentResult, _ := s.Segment("thequickbrownfox")
fmt.Println(segmentResult)
Output:

the quick brown fox

type Suggestion

type Suggestion struct {
	// The distance between this suggestion and the input word
	Distance int
	Entry
}

Suggestion is used to represent a suggested word from a lookup.

type SuggestionList

type SuggestionList []Suggestion

SuggestionList is a slice of Suggestion

func (SuggestionList) GetWords

func (s SuggestionList) GetWords() []string

GetWords returns a string slice of words for the suggestions

func (SuggestionList) String

func (s SuggestionList) String() string

String returns a string representation of the SuggestionList.

type WordData

type WordData map[string]interface{}

WordData stores metadata about a word, for example its frequency.

func (WordData) GetFrequency

func (w WordData) GetFrequency() int

GetFrequency returns the frequency of a word, i.e. how many times it's been seen
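
Because WordData is a map[string]interface{}, reading a number back out requires a type assertion. The sketch below shows one defensive way to do that with a type switch. The wordData type and frequencyOf helper are hypothetical, and the set of numeric types handled is an assumption made for illustration. It says nothing about how GetFrequency itself is implemented.

```go
package main

import "fmt"

// wordData mirrors the shape of a map[string]interface{} metadata map.
type wordData map[string]interface{}

// frequencyOf extracts the "frequency" value as an int. It reports false if
// the key is missing or holds a non-numeric value.
func frequencyOf(w wordData) (int, bool) {
	switch v := w["frequency"].(type) {
	case int:
		return v, true
	case int64:
		return int(v), true
	case float64:
		return int(v), true
	default:
		return 0, false
	}
}

func main() {
	f, ok := frequencyOf(wordData{"frequency": 100, "type": "number"})
	fmt.Println(f, ok)
}
```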