spell

v0.0.0-...-b5c37f2

Warning: This package is not in the latest version of its module.
Published: Oct 13, 2022 | License: MIT | Imports: 15 | Imported by: 2

README

spell

A blazing fast spell checker written in Go.

N.B. This library is still in early development and may change.

Overview

package main

import (
	"fmt"

	"github.com/eskriett/spell"
)

func main() {
	// Create a new instance of spell
	s := spell.New()

	// Add words to the dictionary. Words require a frequency, but can have
	// other arbitrary metadata associated with them
	s.AddEntry(spell.Entry{
		Frequency: 100,
		Word:      "two",
		WordData: spell.WordData{
			"type": "number",
		},
	})
	s.AddEntry(spell.Entry{
		Frequency: 1,
		Word:      "town",
		WordData: spell.WordData{
			"type": "noun",
		},
	})

	// Look up a misspelling. By default, only the "best" suggestion is returned
	suggestions, _ := s.Lookup("twon")
	fmt.Println(suggestions)
	// -> [two]

	suggestion := suggestions[0]

	// Get the frequency from the suggestion
	fmt.Println(suggestion.Frequency)
	// -> 100

	// Get metadata from the suggestion
	fmt.Println(suggestion.WordData["type"])
	// -> number

	// Get multiple suggestions during lookup
	suggestions, _ = s.Lookup("twon", spell.SuggestionLevel(spell.LevelAll))
	fmt.Println(suggestions)
	// -> [two, town]

	// Save the dictionary
	s.Save("dict.spell")

	// Load the dictionary
	s2, _ := spell.Load("dict.spell")

	suggestions, _ = s2.Lookup("twon", spell.SuggestionLevel(spell.LevelAll))
	fmt.Println(suggestions)
	// -> [two, town]

	// Spell supports word segmentation
	s3 := spell.New()

	s3.AddEntry(spell.Entry{Frequency: 1, Word: "the"})
	s3.AddEntry(spell.Entry{Frequency: 1, Word: "quick"})
	s3.AddEntry(spell.Entry{Frequency: 1, Word: "brown"})
	s3.AddEntry(spell.Entry{Frequency: 1, Word: "fox"})

	segmentResult, _ := s3.Segment("thequickbrownfox")
	fmt.Println(segmentResult)
	// -> the quick brown fox

	// Spell supports multiple dictionaries
	s4 := spell.New()

	s4.AddEntry(spell.Entry{Word: "épeler"}, spell.DictionaryName("french"))
	suggestions, _ = s4.Lookup("épeler", spell.DictionaryOpts(
		spell.DictionaryName("french"),
	))
	fmt.Println(suggestions)
	// -> [épeler]
}

Credits

Spell makes use of a symmetric delete algorithm and is loosely based on the SymSpell implementation.
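
To make the idea behind a symmetric delete approach concrete, here is a minimal, self-contained sketch. It is not the package's implementation: deleteVariants, buildIndex, and candidates are hypothetical helpers written only for illustration. Every dictionary word is indexed under each string obtained by deleting up to a fixed number of runes, the same deletes are generated for the input, and any shared variant marks a candidate. A real checker would still verify each candidate with a true edit distance function and rank the results.

```go
package main

import (
	"fmt"
	"sort"
)

// deleteVariants returns word plus every string reachable by deleting up to
// maxDeletes runes from it.
func deleteVariants(word string, maxDeletes int) map[string]struct{} {
	out := map[string]struct{}{word: {}}
	frontier := []string{word}
	for d := 0; d < maxDeletes; d++ {
		var next []string
		for _, w := range frontier {
			r := []rune(w)
			for i := range r {
				v := string(r[:i]) + string(r[i+1:])
				if _, seen := out[v]; !seen {
					out[v] = struct{}{}
					next = append(next, v)
				}
			}
		}
		frontier = next
	}
	return out
}

// buildIndex maps each delete variant to the dictionary words that produce it.
func buildIndex(words []string, maxDeletes int) map[string][]string {
	idx := make(map[string][]string)
	for _, w := range words {
		for v := range deleteVariants(w, maxDeletes) {
			idx[v] = append(idx[v], w)
		}
	}
	return idx
}

// candidates returns, in sorted order, the dictionary words that share at
// least one delete variant with input.
func candidates(idx map[string][]string, input string, maxDeletes int) []string {
	seen := map[string]struct{}{}
	for v := range deleteVariants(input, maxDeletes) {
		for _, w := range idx[v] {
			seen[w] = struct{}{}
		}
	}
	out := make([]string, 0, len(seen))
	for w := range seen {
		out = append(out, w)
	}
	sort.Strings(out)
	return out
}

func main() {
	idx := buildIndex([]string{"two", "town", "fox"}, 2)
	fmt.Println(candidates(idx, "twon", 2)) // prints [town two]
}
```

The index trades memory for speed: only deletes are generated at lookup time, never inserts or substitutions, so the number of strings probed does not depend on the alphabet size.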

Documentation

Overview

Package spell provides fast spelling correction and string segmentation.

Constants

const (
	// LevelBest will yield 'best' suggestion.
	LevelBest suggestionLevel = iota

	// LevelClosest will yield closest suggestions.
	LevelClosest

	// LevelAll will yield all suggestions.
	LevelAll
)

Suggestion Levels used during Lookup.
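
The documentation does not spell out exactly how the three levels differ. The sketch below shows one plausible reading, borrowed from SymSpell's verbosity levels: "best" keeps a single suggestion, "closest" keeps everything tied at the smallest distance, and "all" keeps every suggestion found. The level and suggestion types and the filter function are local stand-ins, not the package's own, and the distances are made up for the example.

```go
package main

import "fmt"

// level mirrors the three suggestion levels; a local stand-in only.
type level int

const (
	best level = iota
	closest
	all
)

// suggestion is a hypothetical, simplified result record.
type suggestion struct {
	Word     string
	Distance int
}

// filter assumes in is already sorted from best to worst.
func filter(in []suggestion, l level) []suggestion {
	if len(in) == 0 || l == all {
		return in
	}
	if l == best {
		return in[:1]
	}
	// closest: keep everything tied with the smallest distance.
	end := 1
	for end < len(in) && in[end].Distance == in[0].Distance {
		end++
	}
	return in[:end]
}

func main() {
	sorted := []suggestion{{"two", 1}, {"town", 1}, {"tow", 2}}
	fmt.Println(len(filter(sorted, best)), len(filter(sorted, closest)), len(filter(sorted, all)))
	// prints 1 2 3
}
```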

Variables

This section is empty.

Functions

This section is empty.

Types

type DictionaryOption

type DictionaryOption func(*dictOptions) error

DictionaryOption is a function that controls the dictionary being used. An error will be returned if a dictionary option is invalid.

func DictionaryName

func DictionaryName(name string) DictionaryOption

DictionaryName defines the name of the dictionary that should be used when storing, deleting, looking up words, etc. If not set, the default dictionary will be used.
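
DictionaryOption follows the functional options pattern: each option is a function that mutates an options struct and may return an error. The sketch below reproduces that shape with local stand-ins (dictOptions, option, withName, and store are all hypothetical names) to show how a default dictionary can be used when no name is given. Rejecting an empty name is an assumption made here purely to demonstrate the error path.

```go
package main

import (
	"errors"
	"fmt"
)

const defaultDict = "default"

// dictOptions and option are local stand-ins for the package's unexported
// options struct and its DictionaryOption type.
type dictOptions struct{ name string }

type option func(*dictOptions) error

// withName selects a dictionary. Rejecting "" is an assumption made here to
// show how an option can report an error.
func withName(name string) option {
	return func(o *dictOptions) error {
		if name == "" {
			return errors.New("dictionary name must not be empty")
		}
		o.name = name
		return nil
	}
}

// store keeps one word-to-frequency map per dictionary name.
type store map[string]map[string]uint64

// add returns true if word was new to the selected dictionary and false if
// an existing entry was overwritten.
func (s store) add(word string, freq uint64, opts ...option) (bool, error) {
	o := dictOptions{name: defaultDict}
	for _, opt := range opts {
		if err := opt(&o); err != nil {
			return false, err
		}
	}
	if s[o.name] == nil {
		s[o.name] = make(map[string]uint64)
	}
	_, existed := s[o.name][word]
	s[o.name][word] = freq
	return !existed, nil
}

func main() {
	s := store{}
	added, _ := s.add("épeler", 1, withName("french"))
	fmt.Println(added, len(s["french"]), len(s[defaultDict])) // prints true 1 0
}
```

Because options are applied to a struct that starts with the default name, callers who never pass an option get the default dictionary without any extra code.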

type Entry

type Entry struct {
	Frequency uint64 `json:",omitempty"`
	Word      string
	WordData  WordData `json:",omitempty"`
}

Entry represents a word in the dictionary.
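
The struct tags on Entry affect how it is encoded by encoding/json: Frequency and WordData are dropped when they hold their zero value, while Word is always written. The sketch below redeclares Entry and WordData locally, copied from the definitions on this page, so that it runs without the package. Whether Save uses this encoding on disk is not stated here, so treat this only as an illustration of the tags.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// WordData and Entry are local copies of the types documented on this page.
type WordData map[string]interface{}

type Entry struct {
	Frequency uint64 `json:",omitempty"`
	Word      string
	WordData  WordData `json:",omitempty"`
}

// encode returns the JSON form of e, or "" if marshalling fails.
func encode(e Entry) string {
	b, err := json.Marshal(e)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	fmt.Println(encode(Entry{Word: "fox"}))
	// prints {"Word":"fox"}
	fmt.Println(encode(Entry{Frequency: 100, Word: "two", WordData: WordData{"type": "number"}}))
	// prints {"Frequency":100,"Word":"two","WordData":{"type":"number"}}
}
```

One consequence of omitempty is that a Frequency of 0 and a missing Frequency are indistinguishable once encoded.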

type LookupOption

type LookupOption func(*lookupParams) error

LookupOption is a function that controls how a Lookup is performed. An error will be returned if the LookupOption is invalid.

func DictionaryOpts

func DictionaryOpts(opts ...DictionaryOption) LookupOption

DictionaryOpts accepts multiple DictionaryOption and controls what dictionary should be used during lookup.

func DistanceFunc

func DistanceFunc(df func([]rune, []rune, int) int) LookupOption

DistanceFunc accepts a function, f(str1, str2, maxDist), which calculates the distance between two strings. It should return -1 if the distance between the strings is greater than maxDist.
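
As an illustration of that contract, here is a plain Levenshtein distance written with only the standard library. It has the func([]rune, []rune, int) int shape that DistanceFunc expects and returns -1 when the distance exceeds maxDist. The function name levenshtein and the helper min3 are hypothetical, and this straightforward version computes the full table rather than stopping early, so it is a sketch of the contract rather than an optimised implementation.

```go
package main

import "fmt"

// min3 returns the smallest of three ints.
func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// levenshtein returns the number of single-rune inserts, deletes, and
// substitutions needed to turn a into b, or -1 if that exceeds maxDist.
func levenshtein(a, b []rune, maxDist int) int {
	prev := make([]int, len(b)+1)
	curr := make([]int, len(b)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(a); i++ {
		curr[0] = i
		for j := 1; j <= len(b); j++ {
			cost := 1
			if a[i-1] == b[j-1] {
				cost = 0
			}
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev, curr = curr, prev
	}
	d := prev[len(b)]
	if d > maxDist {
		return -1
	}
	return d
}

func main() {
	fmt.Println(levenshtein([]rune("eample"), []rune("example"), 2)) // prints 1
	fmt.Println(levenshtein([]rune("twon"), []rune("town"), 2))      // prints 2
	fmt.Println(levenshtein([]rune("abc"), []rune("xyz"), 2))        // prints -1
}
```

Note that "twon" to "town" costs 2 here because plain Levenshtein has no transposition operation; a Damerau-Levenshtein distance would count the swap as a single edit.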

func EditDistance

func EditDistance(dist uint32) LookupOption

EditDistance allows the max edit distance to be set for the Lookup. Reducing the edit distance will improve lookup performance.
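
One way to see why a smaller edit distance helps, assuming a symmetric delete style lookup as mentioned in the Credits: the number of delete variants that may need to be generated for an input grows quickly with the allowed number of deletes. The sketch below computes an upper bound for a word of n runes (the sum of C(n, k) for k up to the maximum). The helpers binomial and maxVariants are hypothetical and say nothing about the package's actual internals.

```go
package main

import "fmt"

// binomial returns C(n, k) using the multiplicative formula.
func binomial(n, k int) int {
	if k < 0 || k > n {
		return 0
	}
	c := 1
	for i := 1; i <= k; i++ {
		c = c * (n - k + i) / i
	}
	return c
}

// maxVariants is an upper bound on the distinct strings obtained by deleting
// up to maxDeletes runes from a word of n runes. Repeated runes can make the
// true count smaller.
func maxVariants(n, maxDeletes int) int {
	total := 0
	for k := 0; k <= maxDeletes; k++ {
		total += binomial(n, k)
	}
	return total
}

func main() {
	// A 7-rune word such as "example".
	for d := 0; d <= 3; d++ {
		fmt.Println(d, maxVariants(7, d))
	}
	// prints:
	// 0 1
	// 1 8
	// 2 29
	// 3 64
}
```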

func PrefixLength

func PrefixLength(prefixLength uint32) LookupOption

PrefixLength defines how much of the input word should be used for the lookup.

func SortFunc

func SortFunc(sf func(SuggestionList)) LookupOption

SortFunc allows the sorting of the SuggestionList to be configured. By default, suggestions will be sorted by their edit distance, then their frequency.
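
The default ordering described above can be sketched as a two-key comparison. The suggestion type here is a local stand-in, and sorting ties by higher frequency first is an assumption, although it is consistent with the README output [two, town] for the input "twon".

```go
package main

import (
	"fmt"
	"sort"
)

// suggestion is a hypothetical, flattened stand-in for a lookup result.
type suggestion struct {
	Word      string
	Distance  int
	Frequency uint64
}

// sortDefault orders by ascending distance, then by descending frequency.
func sortDefault(sl []suggestion) {
	sort.SliceStable(sl, func(i, j int) bool {
		if sl[i].Distance != sl[j].Distance {
			return sl[i].Distance < sl[j].Distance
		}
		return sl[i].Frequency > sl[j].Frequency
	})
}

func main() {
	sl := []suggestion{{"town", 1, 1}, {"tow", 2, 50}, {"two", 1, 100}}
	sortDefault(sl)
	for _, s := range sl {
		fmt.Println(s.Word)
	}
	// prints:
	// two
	// town
	// tow
}
```

A custom SortFunc replaces this comparison entirely, so a function that sorts only by frequency would let a distant but common word outrank a close but rare one.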

func SuggestionLevel

func SuggestionLevel(level suggestionLevel) LookupOption

SuggestionLevel defines how many results are returned for the lookup. See the package constants for the levels available.

type Segment

type Segment struct {
	Input string
	Entry *Entry
	Word  string
}

Segment contains details about an individual segment.

type SegmentOption

type SegmentOption func(*segmentParams) error

SegmentOption is a function that controls how a Segment is performed. An error will be returned if the SegmentOption is invalid.

func SegmentLookupOpts

func SegmentLookupOpts(opt ...LookupOption) SegmentOption

SegmentLookupOpts allows the Lookup() options for the current segmentation to be configured.

type SegmentResult

type SegmentResult struct {
	Distance int
	Segments []Segment
}

SegmentResult holds the result of a call to Segment().

func (SegmentResult) GetWords

func (s SegmentResult) GetWords() []string

GetWords returns a string slice of words for the segments.

func (SegmentResult) String

func (s SegmentResult) String() string

String returns a string representation of the SegmentResult.
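
Because SegmentResult has a String method, passing a result to fmt.Println prints the segmented text directly, which is what the Segment example relies on. The sketch below mimics that behaviour with local stand-in types (segment and result are hypothetical). Joining the words with single spaces is an assumption based on the example output "the quick brown fox".

```go
package main

import (
	"fmt"
	"strings"
)

// segment and result are simplified stand-ins for Segment and SegmentResult.
type segment struct{ Word string }

type result struct{ Segments []segment }

// words collects the word of each segment, in order.
func (r result) words() []string {
	out := make([]string, 0, len(r.Segments))
	for _, s := range r.Segments {
		out = append(out, s.Word)
	}
	return out
}

// String makes result satisfy fmt.Stringer.
func (r result) String() string {
	return strings.Join(r.words(), " ")
}

func main() {
	// A pointer is used here because Segment returns *SegmentResult; methods
	// with value receivers are also available on the pointer.
	r := &result{Segments: []segment{{"the"}, {"quick"}}}
	fmt.Println(r) // prints the quick
}
```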

type Spell

type Spell struct {
	// The max number of deletes that will be performed to each word in the
	// dictionary
	MaxEditDistance uint32

	// The prefix length that will be examined
	PrefixLength uint32
	// contains filtered or unexported fields
}

Spell provides access to functions for spelling correction.

func Load

func Load(filename string) (*Spell, error)

Load reads a dictionary from disk at filename. It returns a new Spell instance on success, or an error if there is a problem reading the file.

func New

func New() *Spell

New creates a new spell instance.

func (*Spell) AddEntry

func (s *Spell) AddEntry(de Entry, opts ...DictionaryOption) (bool, error)

AddEntry adds an entry to the dictionary. If the word already exists its data will be overwritten. Returns true if a new word was added, false otherwise. Will return an error if there was a problem adding a word.

Example
package main

import (
	"fmt"

	"github.com/eskriett/spell"
)

func main() {
	// Create a new speller
	s := spell.New()

	// Add a new word, "example", to the dictionary
	_, _ = s.AddEntry(spell.Entry{
		Frequency: 10,
		Word:      "example",
	})

	// Overwrite the data for word "example"
	_, _ = s.AddEntry(spell.Entry{
		Frequency: 100,
		Word:      "example",
	})

	// Output the frequency for word "example"
	entry, _ := s.GetEntry("example")
	fmt.Printf("Output for word 'example' is: %v\n",
		entry.Frequency)
}
Output:

Output for word 'example' is: 100

func (*Spell) GetEntry

func (s *Spell) GetEntry(word string, opts ...DictionaryOption) (*Entry, error)

GetEntry returns the Entry for word. If a word does not exist, nil will be returned.

func (*Spell) GetLongestWord

func (s *Spell) GetLongestWord() uint32

GetLongestWord returns the length of the longest word in the dictionary.

func (*Spell) Lookup

func (s *Spell) Lookup(input string, opts ...LookupOption) (SuggestionList, error)

Lookup takes an input and returns suggestions from the dictionary for that word. By default, it will return the best suggestion for the word if it exists.

Accepts zero or more LookupOption that can be used to configure how lookup occurs.

Example
package main

import (
	"fmt"

	"github.com/eskriett/spell"
)

func main() {
	// Create a new speller
	s := spell.New()
	_, _ = s.AddEntry(spell.Entry{
		Frequency: 1,
		Word:      "example",
	})

	// Perform a default lookup for the misspelling "eample"
	suggestions, _ := s.Lookup("eample")
	fmt.Printf("Suggestions are: %v\n", suggestions)
}
Output:

Suggestions are: [example]
Example (ConfigureDistanceFunc)
package main

import (
	"github.com/eskriett/spell"
	"github.com/eskriett/strmet"
)

func main() {
	// Create a new speller
	s := spell.New()
	_, _ = s.AddEntry(spell.Entry{
		Frequency: 1,
		Word:      "example",
	})

	// Configure the Lookup to use Levenshtein distance rather than the default
	// Damerau Levenshtein calculation
	_, _ = s.Lookup("example", spell.DistanceFunc(func(r1, r2 []rune, maxDist int) int {
		// Call the Levenshtein function from github.com/eskriett/strmet
		return strmet.LevenshteinRunes(r1, r2, maxDist)
	}))
}
Output:

Example (ConfigureEditDistance)
package main

import (
	"fmt"

	"github.com/eskriett/spell"
)

func main() {
	// Create a new speller
	s := spell.New()
	_, _ = s.AddEntry(spell.Entry{
		Frequency: 1,
		Word:      "example",
	})

	// Lookup exact matches, i.e. edit distance = 0
	suggestions, _ := s.Lookup("eample", spell.EditDistance(0))
	fmt.Printf("Suggestions are: %v\n", suggestions)
}
Output:

Suggestions are: []
Example (ConfigureSortFunc)
package main

import (
	"sort"

	"github.com/eskriett/spell"
)

func main() {
	// Create a new speller
	s := spell.New()
	_, _ = s.AddEntry(spell.Entry{
		Frequency: 1,
		Word:      "example",
	})

	// Configure suggestions to be sorted solely by their frequency, in
	// ascending order
	_, _ = s.Lookup("example", spell.SortFunc(func(sl spell.SuggestionList) {
		sort.Slice(sl, func(i, j int) bool {
			return sl[i].Frequency < sl[j].Frequency
		})
	}))
}
Output:

func (*Spell) RemoveEntry

func (s *Spell) RemoveEntry(word string, opts ...DictionaryOption) (bool, error)

RemoveEntry removes an entry from the dictionary. Returns true if the entry was removed, false otherwise.

func (*Spell) Save

func (s *Spell) Save(filename string) error

Save a representation of spell to disk at filename.

func (*Spell) Segment

func (s *Spell) Segment(input string, opts ...SegmentOption) (*SegmentResult, error)

Segment takes an input string which may have word concatenations, and attempts to divide it into the most likely set of words by adding spaces at the most appropriate positions.

Accepts zero or more SegmentOption that can be used to configure how segmentation occurs.

Example
package main

import (
	"fmt"

	"github.com/eskriett/spell"
)

func main() {
	// Create a new speller
	s := spell.New()

	_, _ = s.AddEntry(spell.Entry{Frequency: 1, Word: "the"})
	_, _ = s.AddEntry(spell.Entry{Frequency: 1, Word: "quick"})
	_, _ = s.AddEntry(spell.Entry{Frequency: 1, Word: "brown"})
	_, _ = s.AddEntry(spell.Entry{Frequency: 1, Word: "fox"})

	// Segment a string with words concatenated together
	segmentResult, _ := s.Segment("thequickbrownfox")
	fmt.Println(segmentResult)
}
Output:

the quick brown fox
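
For intuition about how a concatenated string can be split, here is a deliberately simplified dynamic programming sketch. It only accepts exact dictionary words and returns the first split it finds, whereas Segment also tolerates misspellings and aims for the most likely split, so this is not the package's algorithm. The function name segment and the map-based dictionary are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// segment splits input into dictionary words, or reports false if no exact
// split exists.
func segment(input string, dict map[string]bool) ([]string, bool) {
	r := []rune(input)
	n := len(r)

	// reach[i] is true if r[:i] can be split into dictionary words;
	// prev[i] is where the last word of that split starts.
	reach := make([]bool, n+1)
	prev := make([]int, n+1)
	reach[0] = true
	for i := 1; i <= n; i++ {
		for j := 0; j < i; j++ {
			if reach[j] && dict[string(r[j:i])] {
				reach[i] = true
				prev[i] = j
				break
			}
		}
	}
	if !reach[n] {
		return nil, false
	}

	// Walk back from the end to recover the words in order.
	var words []string
	for i := n; i > 0; i = prev[i] {
		words = append([]string{string(r[prev[i]:i])}, words...)
	}
	return words, true
}

func main() {
	dict := map[string]bool{"the": true, "quick": true, "brown": true, "fox": true}
	words, ok := segment("thequickbrownfox", dict)
	fmt.Println(strings.Join(words, " "), ok) // prints the quick brown fox true
}
```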

type Suggestion

type Suggestion struct {
	// The distance between this suggestion and the input word
	Distance int
	Entry
}

Suggestion is used to represent a suggested word from a lookup.

type SuggestionList

type SuggestionList []Suggestion

SuggestionList is a slice of Suggestion.

func (SuggestionList) GetWords

func (s SuggestionList) GetWords() []string

GetWords returns a string slice of words for the suggestions.

func (SuggestionList) String

func (s SuggestionList) String() string

String returns a string representation of the SuggestionList.

type WordData

type WordData map[string]interface{}

WordData stores metadata about a word.
