gospell

v0.0.0-...-90dfc71
Published: Mar 6, 2016 License: MIT Imports: 11 Imported by: 15

README

gospell


Pure Go spelling dictionary based on Hunspell dictionaries.

NOTE: I'm not an expert in linguistics nor spelling. Help is very welcome!

What is hunspell?

NOTE: This is not affiliated with Hunspell, although if they wanted to merge it in as an official project, I'd be happy to donate the code (it's in no shape to do so right now).

Where can I get English dictionaries?

The world of spelling dictionaries is surprisingly complicated, as "lists of words" are frequently proprietary and carry conflicting software licenses.

Kevin Atkinson

Kevin Atkinson maintains many open source lists via the SCOWL project. The source code and raw lists are available on GitHub at kevina/wordlist.

Marco A.G.Pinto

Marco maintains the released dictionaries for Firefox and Apache Open Office. The word lists appear to be actively updated.

https://github.com/marcoagpinto/aoo-mozilla-en-dict

Open Office

http://extensions.openoffice.org/en/project/english-dictionaries-apache-openoffice

The downloaded file has a .oxt extension but it's a compressed tar file. Extract the files using:

mkdir dict-en
cd dict-en
tar -xzf ../dict-en.oxt

Chromium

The Chrome/Chromium browser uses Hunspell, and its source tree contains various up-to-date dictionaries, some with additional words. You can view them at chromium.googlesource.com and check them out locally via:

git clone --depth=1 https://chromium.googlesource.com/chromium/deps/hunspell_dictionaries

More information can be found in the Chromium developer guide.

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func CaseVariations

func CaseVariations(word string, style WordCase) []string

CaseVariations returns the case variations of a word, depending on its style:

If AllUpper, or only the first letter is upcased: add the all-upper-case version.
If AllLower: add the original, the title, and the upcase forms.
If Mixed: return the original and the all-upcase form.
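The rules above can be sketched with the standard library alone. This is one possible reading of the doc comment, not the package's implementation: the lowercase names `wordCase`, `caseVariations`, and `titleCase` are hypothetical and local to this example, and the Camel style is left out because the text does not describe it.

```go
package main

import (
	"fmt"
	"strings"
)

// wordCase is a local stand-in for the package's WordCase enum.
type wordCase int

const (
	allLower wordCase = iota
	allUpper
	title
	mixed
)

// titleCase upper-cases the first rune and leaves the rest untouched.
func titleCase(word string) string {
	r := []rune(word)
	if len(r) == 0 {
		return word
	}
	return strings.ToUpper(string(r[0])) + string(r[1:])
}

// caseVariations is a hypothetical reading of the documented rules.
func caseVariations(word string, style wordCase) []string {
	upper := strings.ToUpper(word)
	switch style {
	case allLower:
		// original, title form, and all-upper form
		return []string{word, titleCase(word), upper}
	case allUpper:
		// the word is already its own all-upper form
		return []string{upper}
	case title, mixed:
		// original plus the all-upper form
		return []string{word, upper}
	}
	return []string{word}
}

func main() {
	fmt.Println(caseVariations("word", allLower)) // [word Word WORD]
	fmt.Println(caseVariations("Word", title))    // [Word WORD]
}
```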

func RemovePath

func RemovePath(s string) string

RemovePath attempts to strip away embedded file system paths, e.g.

/foo/bar or /static/myimg.png

TODO: windows style

func RemoveURL

func RemoveURL(s string) string

RemoveURL attempts to strip away obvious URLs
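A minimal sketch of this kind of URL stripping, assuming a regular-expression approach. The pattern and the name `stripURLs` are assumptions made for illustration; the package's own matching rules may be broader or narrower.

```go
package main

import (
	"fmt"
	"regexp"
)

// urlPattern is an assumption for this sketch: it only matches
// http:// and https:// URLs, up to the next whitespace.
var urlPattern = regexp.MustCompile(`https?://\S+`)

// stripURLs replaces each matched URL with a single space so the
// surrounding words stay separated.
func stripURLs(s string) string {
	return urlPattern.ReplaceAllString(s, " ")
}

func main() {
	fmt.Printf("%q\n", stripURLs("see https://example.com/docs for details"))
}
```

Replacing with a space instead of the empty string avoids gluing the neighbouring words together before they reach the spell checker.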

Types

type Affix

type Affix struct {
	Type         AffixType // either PFX or SFX
	CrossProduct bool
	Rules        []Rule
}

Affix is a rule for affixation (adding prefixes or suffixes)

func (Affix) Expand

func (a Affix) Expand(word string, out []string) []string

Expand provides all variations of a given word based on this affix rule
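To illustrate what expanding a word by an affix rule means, here is a simplified sketch for suffixes only. It assumes the usual Hunspell SFX semantics (strip some trailing characters, append the affix text, but only if a condition matches the end of the word). The type `suffixRule` and the functions `newSuffixRule` and `expand` are hypothetical and do not mirror the package's unexported fields.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// suffixRule is a hypothetical, simplified stand-in for one SFX line
// of an AFF file.
type suffixRule struct {
	strip   string         // characters removed from the end of the word
	add     string         // suffix text appended after stripping
	matcher *regexp.Regexp // condition, anchored at the end of the word
}

// newSuffixRule compiles the condition so that it must match at the
// end of the word.
func newSuffixRule(strip, add, condition string) suffixRule {
	return suffixRule{
		strip:   strip,
		add:     add,
		matcher: regexp.MustCompile(condition + "$"),
	}
}

// expand appends to out every form of word produced by a matching rule.
func expand(word string, rules []suffixRule, out []string) []string {
	for _, r := range rules {
		if !r.matcher.MatchString(word) {
			continue
		}
		out = append(out, strings.TrimSuffix(word, r.strip)+r.add)
	}
	return out
}

func main() {
	rules := []suffixRule{
		newSuffixRule("y", "ies", "[^aeiou]y"), // pony -> ponies
		newSuffixRule("", "s", "[aeiou]y"),     // boy -> boys
	}
	fmt.Println(expand("pony", rules, []string{"pony"})) // [pony ponies]
	fmt.Println(expand("boy", rules, []string{"boy"}))   // [boy boys]
}
```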

type AffixType

type AffixType int

AffixType is either an affix prefix or suffix

const (
	Prefix AffixType = iota
	Suffix
)

specific Affix types

type DictConfig

type DictConfig struct {
	Flag              string
	TryChars          string
	WordChars         string
	NoSuggestFlag     rune
	IconvReplacements []string
	Replacements      [][2]string
	AffixMap          map[rune]Affix
	CamelCase         int
	CompoundMin       int
	CompoundOnly      string
	CompoundRule      []string
	// contains filtered or unexported fields
}

DictConfig is a partial representation of a Hunspell AFF (Affix) file.

func NewDictConfig

func NewDictConfig(file io.Reader) (*DictConfig, error)

NewDictConfig reads a Hunspell AFF file

func (DictConfig) Expand

func (a DictConfig) Expand(wordAffix string, out []string) ([]string, error)

Expand expands a word/affix using dictionary/affix rules

This also supports CompoundRule flags

type Diff

type Diff struct {
	Filename string
	Path     string
	Original string
	Line     string
	LineNum  int
}

Diff represents an unknown word in a file

func SpellFile

func SpellFile(gs *GoSpell, ext plaintext.Extractor, raw []byte) []Diff

SpellFile attempts to spell-check a file. This interface is not very good, so expect changes.

type GoSpell

type GoSpell struct {
	Config DictConfig
	Dict   map[string]struct{} // likely will contain some value later
	// contains filtered or unexported fields
}

GoSpell is the main struct

func NewGoSpell

func NewGoSpell(affFile, dicFile string) (*GoSpell, error)

NewGoSpell creates a speller from AFF and DIC Hunspell filenames

func NewGoSpellReader

func NewGoSpellReader(aff, dic io.Reader) (*GoSpell, error)

NewGoSpellReader creates a speller from io.Readers for Hunspell files

func (*GoSpell) AddWordList

func (s *GoSpell) AddWordList(r io.Reader) ([]string, error)

AddWordList adds basic word lists, just one word per line

Assumed to be in UTF-8

TODO: hunspell compatible with "*" prefix for forbidden words and affix support

Returns a list of duplicated words and/or an error.
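As a rough sketch of the one-word-per-line behaviour described here, the following reads a UTF-8 word list into a set and reports the words that were already present. The names `addWordList` and `addWordsFromString` are hypothetical; trimming whitespace and skipping blank lines are assumptions of this example, not documented behaviour.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// addWordList reads one word per line into dict and returns the words
// that were already present, plus any read error.
func addWordList(dict map[string]struct{}, r io.Reader) ([]string, error) {
	var dupes []string
	scanner := bufio.NewScanner(r)
	for scanner.Scan() {
		word := strings.TrimSpace(scanner.Text())
		if word == "" {
			continue // skip blank lines (an assumption of this sketch)
		}
		if _, exists := dict[word]; exists {
			dupes = append(dupes, word)
			continue
		}
		dict[word] = struct{}{}
	}
	return dupes, scanner.Err()
}

// addWordsFromString is a convenience wrapper for in-memory lists.
func addWordsFromString(dict map[string]struct{}, text string) ([]string, error) {
	return addWordList(dict, strings.NewReader(text))
}

func main() {
	dict := map[string]struct{}{"golang": {}}
	dupes, err := addWordsFromString(dict, "golang\nhunspell\n\nscowl\n")
	fmt.Println(dupes, err, len(dict)) // [golang] <nil> 3
}
```

A `map[string]struct{}` is used as the set because the empty struct takes no storage, which matches the shape of the `Dict` field shown in the GoSpell type.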

func (*GoSpell) AddWordListFile

func (s *GoSpell) AddWordListFile(name string) ([]string, error)

AddWordListFile reads in a word list file

func (*GoSpell) AddWordRaw

func (s *GoSpell) AddWordRaw(word string) bool

AddWordRaw adds a single word to the internal dictionary without modifications. It returns true if the word was added and false if it already exists.

func (*GoSpell) InputConversion

func (s *GoSpell) InputConversion(raw []byte) string

InputConversion does any character substitution before checking

This is based on the ICONV stanza
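The ICONV stanza of an AFF file lists input substitutions as pairs. A minimal sketch of applying such pairs with `strings.NewReplacer` follows; the function name `inputConversion` and the sample pairs are assumptions for illustration, and the flat old/new layout is simply what `strings.NewReplacer` expects (it panics on an odd number of arguments).

```go
package main

import (
	"fmt"
	"strings"
)

// inputConversion applies ICONV-style substitutions to raw input.
// pairs is a flat list: old1, new1, old2, new2, ...
func inputConversion(pairs []string, raw []byte) string {
	return strings.NewReplacer(pairs...).Replace(string(raw))
}

func main() {
	// Sample pairs, assumed for illustration: map typographic
	// apostrophes to the ASCII apostrophe before lookup.
	pairs := []string{"’", "'", "‘", "'"}
	fmt.Println(inputConversion(pairs, []byte("don’t"))) // don't
}
```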

func (*GoSpell) Spell

func (s *GoSpell) Spell(word string) bool

Spell checks to see if a given word is in the internal dictionaries.

TODO: add multiple dictionaries

func (*GoSpell) Split

func (s *GoSpell) Split(text string) []string

Split splits a text into words

type Rule

type Rule struct {
	Strip     string
	AffixText string // suffix or prefix text to add
	Pattern   string // original matching pattern from AFF file
	// contains filtered or unexported fields
}

Rule is an Affix rule

type Splitter

type Splitter struct {
	// contains filtered or unexported fields
}

Splitter splits a text into words. It is highly likely that this implementation will change, so we are encapsulating it.

func NewSplitter

func NewSplitter(chars string) *Splitter

NewSplitter creates a new splitter. The input is a string in UTF-8 encoding. Each rune in the string will be considered to be a valid word character. Runes that are NOT here are deemed a word boundary. The current implementation uses https://golang.org/pkg/strings/#FieldsFunc
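The description above maps naturally onto `strings.FieldsFunc`. The sketch below follows the doc comment literally (only the listed runes count as word characters); the lowercase `splitter`, `newSplitter`, and `split` names are hypothetical, and the real implementation may treat some runes differently.

```go
package main

import (
	"fmt"
	"strings"
)

// splitter holds the set of runes that count as word characters.
type splitter struct {
	wordChars map[rune]struct{}
}

// newSplitter records every rune of chars as a valid word character.
func newSplitter(chars string) *splitter {
	s := &splitter{wordChars: make(map[rune]struct{})}
	for _, r := range chars {
		s.wordChars[r] = struct{}{}
	}
	return s
}

// split breaks the input at every rune that is not a word character.
func (s *splitter) split(in string) []string {
	return strings.FieldsFunc(in, func(r rune) bool {
		_, isWordChar := s.wordChars[r]
		return !isWordChar
	})
}

func main() {
	s := newSplitter("abcdefghijklmnopqrstuvwxyz'")
	fmt.Println(s.split("don't stop, now")) // [don't stop now]
}
```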

func (*Splitter) Split

func (s *Splitter) Split(in string) []string

Split is the function to split an input into a `[]string`

type WordCase

type WordCase int

WordCase is an enum of various word casing styles

const (
	AllLower WordCase = iota
	AllUpper
	Title
	Mixed
	Camel
)

Various WordCase types; likely not correct.

func CaseStyle

func CaseStyle(word string) WordCase

CaseStyle returns what case style a word is in
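One way such a classification could work is to count upper- and lower-case runes. This is a sketch under assumptions: the names `wordCase` and `caseStyle` are local to the example, the Camel style is not distinguished from Mixed because the text does not define it, and words without any upper-case letter (including empty or numeric ones) are reported as all-lower.

```go
package main

import (
	"fmt"
	"unicode"
)

// wordCase is a local stand-in for the package's WordCase enum.
type wordCase int

const (
	allLower wordCase = iota
	allUpper
	title
	mixed
)

// caseStyle classifies a word by counting upper- and lower-case runes.
func caseStyle(word string) wordCase {
	upper, lower := 0, 0
	firstUpper := false
	for i, r := range word {
		switch {
		case unicode.IsUpper(r):
			upper++
			if i == 0 { // byte offset 0 is the first rune
				firstUpper = true
			}
		case unicode.IsLower(r):
			lower++
		}
	}
	switch {
	case upper == 0:
		return allLower
	case lower == 0:
		return allUpper
	case upper == 1 && firstUpper:
		return title
	}
	return mixed
}

func main() {
	fmt.Println(caseStyle("word") == allLower) // true
	fmt.Println(caseStyle("WORD") == allUpper) // true
	fmt.Println(caseStyle("Word") == title)    // true
	fmt.Println(caseStyle("WoRd") == mixed)    // true
}
```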

Directories

Path Synopsis
cmd
