spellchecker

package module
v0.0.0-...-e6a8834 Latest
Warning

This package is not in the latest version of its module.

Published: Feb 17, 2023 | License: MIT | Imports: 12 | Imported by: 0

README

Spellchecker

Yet another spellchecker written in Go.

Features:

  • very small database: approximately 1 MB for 30,000 unique words
  • average time to fix one word: ~35 μs
  • about 70-74% accuracy on Peter Norvig's test sets (see Benchmarks)

Installation

$ go get -v github.com/f1monkey/spellchecker

Usage

package main

import (
	"errors"
	"fmt"
	"os"

	"github.com/f1monkey/spellchecker"
)

func main() {
	// Create a new instance
	sc, err := spellchecker.New(spellchecker.Alphabet{
		Letters: "abcdefghijklmnopqrstuvwxyz1234567890",
		Length:  36,
	})
	if err != nil {
		panic(err)
	}

	// Read data from any io.Reader
	in, err := os.Open("data/sample.txt")
	if err != nil {
		panic(err)
	}
	if err := sc.AddFrom(in); err != nil {
		panic(err)
	}
	in.Close()

	// Add some more words
	sc.Add("lock", "stock", "and", "two", "smoking", "barrels")

	// Check if a word is correct
	result := sc.IsCorrect("coffee")
	fmt.Println(result) // true

	// Fix one word (an unknown word is not treated as a fatal error here)
	fixed, err := sc.Fix("awepon")
	if err != nil && !errors.Is(err, spellchecker.ErrUnknownWord) {
		panic(err)
	}
	fmt.Println(fixed) // weapon

	// Find max=10 suggestions for a word
	matches, err := sc.Suggest("rang", 10)
	if err != nil && !errors.Is(err, spellchecker.ErrUnknownWord) {
		panic(err)
	}
	fmt.Println(matches) // [range orange]

	// Save data to any io.Writer
	out, err := os.Create("data/out.bin")
	if err != nil {
		panic(err)
	}
	if err := sc.Save(out); err != nil {
		panic(err)
	}
	// Close the file before reopening it for reading
	if err := out.Close(); err != nil {
		panic(err)
	}

	// Load saved data from any io.Reader
	in, err = os.Open("data/out.bin")
	if err != nil {
		panic(err)
	}
	defer in.Close()
	sc, err = spellchecker.Load(in)
	if err != nil {
		panic(err)
	}
}

Benchmarks

The benchmarks are based on the test data from Peter Norvig's article about spelling correction.

Test set 1:
$ go test -benchmem -run=^$ -bench ^Benchmark_Norvig1$ github.com/f1monkey/spellchecker

goos: linux
goarch: amd64
pkg: github.com/f1monkey/spellchecker
cpu: Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz
Benchmark_Norvig1-12    	     100	  10721930 ns/op	        74.07 success_percent	       200.0 success_words	       270.0 total_words	 1085913 B/op	    2063 allocs/op
PASS
ok  	github.com/f1monkey/spellchecker	1.910s

Test set 2:
$ go test -benchmem -run=^$ -bench ^Benchmark_Norvig2$ github.com/f1monkey/spellchecker

goos: linux
goarch: amd64
pkg: github.com/f1monkey/spellchecker
cpu: Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz
Benchmark_Norvig2-12    	      72	  13977916 ns/op	        70.00 success_percent	       280.0 success_words	       400.0 total_words	 1573316 B/op	    3050 allocs/op
PASS
ok  	github.com/f1monkey/spellchecker	1.874s
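
The custom columns in the output above can be cross-checked with a little arithmetic. The sketch below recomputes the success percentage and a per-word time from the reported figures. It assumes that one benchmark op runs the whole test set once, which the output itself does not state; the type and function names are made up for this example.

```go
package main

import "fmt"

// benchResult holds figures copied from one benchmark line above.
type benchResult struct {
	name         string
	nsPerOp      float64 // ns/op column
	successWords float64 // success_words column
	totalWords   float64 // total_words column
}

// successPercent returns the share of correctly fixed words, in percent.
func successPercent(r benchResult) float64 {
	return 100 * r.successWords / r.totalWords
}

// microsPerWord returns the time spent per word in microseconds,
// ASSUMING one benchmark op processes every word of the test set once.
func microsPerWord(r benchResult) float64 {
	return r.nsPerOp / r.totalWords / 1000
}

func main() {
	results := []benchResult{
		{"Benchmark_Norvig1", 10721930, 200, 270},
		{"Benchmark_Norvig2", 13977916, 280, 400},
	}
	for _, r := range results {
		fmt.Printf("%s: %.2f%% success, %.1f µs per word\n",
			r.name, successPercent(r), microsPerWord(r))
	}
	// Prints:
	// Benchmark_Norvig1: 74.07% success, 39.7 µs per word
	// Benchmark_Norvig2: 70.00% success, 34.9 µs per word
}
```

Under that assumption the per-word times land close to the ~35 μs quoted in the feature list.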

Documentation

Constants

This section is empty.

Variables

var DefaultAlphabet = Alphabet{
	Letters: "abcdefghijklmnopqrstuvwxyz",
	Length:  26,
}
var ErrUnknownWord = fmt.Errorf("unknown word")

Functions

This section is empty.

Types

type Alphabet

type Alphabet struct {
	// Letters is the set of letters in the alphabet. Duplicates are not allowed.
	Letters string
	// Length is the number of bits used to encode the alphabet.
	// If it is less than the rune count of Letters,
	// several letters are encoded as the same bit.
	// This slightly decreases the database size
	// but drastically reduces search performance in large dictionaries.
	Length int
}
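
To make the Length trade-off more concrete, here is a minimal sketch of what "several letters encoded as one bit" could look like. It is not the package's actual encoding: the bitmask function and the "index modulo Length" mapping are assumptions made up for illustration only.

```go
package main

import "fmt"

// bitmask builds a bit set for word: the letter at position i in letters
// sets bit i % length. Runes that are not in letters are ignored.
// HYPOTHETICAL mapping, not taken from the spellchecker source.
// length must be between 1 and 64.
func bitmask(word string, letters []rune, length int) uint64 {
	var mask uint64
	for _, r := range word {
		for i, l := range letters {
			if r == l {
				mask |= 1 << (uint(i) % uint(length))
				break
			}
		}
	}
	return mask
}

func main() {
	letters := []rune("abcdefghijklmnopqrstuvwxyz")

	// One bit per letter: "ab" and "no" get different masks.
	fmt.Println(bitmask("ab", letters, 26) == bitmask("no", letters, 26)) // false

	// Half as many bits: a/n and b/o now share bits, so the masks collide.
	fmt.Println(bitmask("ab", letters, 13) == bitmask("no", letters, 13)) // true
}
```

With fewer bits than letters, unrelated words can end up with the same mask. In a scheme like this, more candidates would have to be compared per lookup, which is one plausible reading of the performance warning in the comment above.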

type Doc

type Doc struct {
	Word  string
	Count int
}

type OptionFunc

type OptionFunc func(m *Spellchecker) error

OptionFunc is an option setter

func WithSplitter

func WithSplitter(f bufio.SplitFunc) OptionFunc

WithSplitter sets the splitter func used by AddFrom() to split the reader's input into words
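
As an illustration of what such a splitter might look like, here is a sketch of a custom bufio.SplitFunc that yields runs of letters and skips digits, punctuation and whitespace. The names splitLetters and words are made up for this example; the code uses only the standard library and is not part of the package.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

// splitLetters is a bufio.SplitFunc that returns maximal runs of letters.
func splitLetters(data []byte, atEOF bool) (advance int, token []byte, err error) {
	// Skip leading non-letter runes.
	start := 0
	for start < len(data) {
		if !atEOF && !utf8.FullRune(data[start:]) {
			break // incomplete rune at the end of the buffer: wait for more bytes
		}
		r, width := utf8.DecodeRune(data[start:])
		if unicode.IsLetter(r) {
			break
		}
		start += width
	}
	// Scan to the first non-letter rune after the word.
	for i := start; i < len(data); {
		if !atEOF && !utf8.FullRune(data[i:]) {
			break // incomplete rune: wait for more bytes
		}
		r, width := utf8.DecodeRune(data[i:])
		if !unicode.IsLetter(r) {
			return i + width, data[start:i], nil
		}
		i += width
	}
	// At EOF, return the final word if there is one.
	if atEOF && len(data) > start {
		return len(data), data[start:], nil
	}
	// Otherwise drop the skipped prefix and request more data.
	return start, nil, nil
}

// words collects all tokens that splitLetters produces for text.
func words(text string) []string {
	sc := bufio.NewScanner(strings.NewReader(text))
	sc.Split(splitLetters)
	var out []string
	for sc.Scan() {
		out = append(out, sc.Text())
	}
	return out
}

func main() {
	fmt.Println(words("Lock, stock & 2 smoking barrels!"))
	// Prints: [Lock stock smoking barrels]
}
```

Because splitLetters has the bufio.SplitFunc signature, it should be usable as spellchecker.WithSplitter(splitLetters) in the options passed to New, going by the signatures listed on this page.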

type Spellchecker

type Spellchecker struct {
	// contains filtered or unexported fields
}

func Load

func Load(reader io.Reader) (*Spellchecker, error)

Load reads spellchecker data from the provided reader and decodes it

func New

func New(alphabet Alphabet, opts ...OptionFunc) (*Spellchecker, error)

func (*Spellchecker) Add

func (m *Spellchecker) Add(words ...string)

Add adds the provided words to the dictionary

func (*Spellchecker) AddFrom

func (m *Spellchecker) AddFrom(input io.Reader) error

AddFrom reads input, splits it with the spellchecker's splitter func, and adds the resulting words to the dictionary

func (*Spellchecker) Fix

func (s *Spellchecker) Fix(word string) (string, error)

func (*Spellchecker) IsCorrect

func (s *Spellchecker) IsCorrect(word string) bool

IsCorrect checks whether the provided word is in the dictionary

func (*Spellchecker) Save

func (m *Spellchecker) Save(w io.Writer) error

Save encodes spellchecker data and writes it to the provided writer

func (*Spellchecker) Suggest

func (s *Spellchecker) Suggest(word string, n int) ([]string, error)

Suggest finds the top n suggestions for the word

func (*Spellchecker) WithOpts

func (s *Spellchecker) WithOpts(opts ...OptionFunc) error

WithOpts sets spellchecker options
