golem

Version: v4.0.1 | Published: Apr 5, 2020 | License: MIT | Imports: 3 | Imported by: 17

README

GoLem

This project is a dictionary-based lemmatizer written in Go.

Since v4, each dictionary must be fetched individually.

go get github.com/aaaton/golem/v4

What?

A lemmatizer is a tool that finds the base form of words.

Lang     Input       Output
English  aligning    align
Swedish  sprungit    springa
French   abattaient  abattre

It's based on the dictionaries found on michmech/lemmatization-lists, which are available under the Open Database License. This project would not be feasible without them.
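
To make the dictionary-based idea concrete, here is a minimal sketch that builds a lookup table from a few dictionary lines and resolves the words from the table above. It does not use golem itself. The `lemma<TAB>form` line layout, the `sample` data, and the helper names `buildIndex` and `lookup` are assumptions made for illustration, as is the choice to return the input word when no entry exists.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// sample is a hypothetical dictionary excerpt. The "lemma<TAB>form" layout
// is an assumption made for this sketch.
const sample = "align\taligning\nspringa\tsprungit\nabattre\tabattaient\n"

// buildIndex maps each inflected form to the lemmas listed for it.
func buildIndex(data string) map[string][]string {
	index := make(map[string][]string)
	scanner := bufio.NewScanner(strings.NewReader(data))
	for scanner.Scan() {
		parts := strings.Split(scanner.Text(), "\t")
		if len(parts) != 2 {
			continue // skip malformed lines
		}
		lemma, form := parts[0], parts[1]
		index[form] = append(index[form], lemma)
	}
	return index
}

// lookup returns the first lemma recorded for word, or word itself when
// the dictionary has no entry (a choice made for this sketch).
func lookup(index map[string][]string, word string) string {
	if lemmas, ok := index[strings.ToLower(word)]; ok {
		return lemmas[0]
	}
	return word
}

func main() {
	index := buildIndex(sample)
	for _, w := range []string{"aligning", "sprungit", "abattaient"} {
		fmt.Printf("%s -> %s\n", w, lookup(index, w))
	}
}
```

Because the whole approach is a table lookup, coverage depends entirely on the dictionary: a form that is not listed cannot be reduced.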

Languages

At the moment golem supports English, Swedish, French, Spanish, Italian, and German. Adding another language should be no more trouble than getting the dictionary for that language; some are already available on lexiconista. Please let me know if there is a language you would like to see here, or fork the project and create a pull request.

English

go get github.com/aaaton/golem/v4/dicts/en

Swedish

go get github.com/aaaton/golem/v4/dicts/sv

French

go get github.com/aaaton/golem/v4/dicts/fr

German

go get github.com/aaaton/golem/v4/dicts/de

Spanish

go get github.com/aaaton/golem/v4/dicts/es

Italian

go get github.com/aaaton/golem/v4/dicts/it

Basic usage

package main

import (
	"github.com/aaaton/golem/v4"
	"github.com/aaaton/golem/v4/dicts/en"
)

func main() {
	// Language packs are available under golem/dicts;
	// "en" is the English pack.
	lemmatizer, err := golem.New(en.New())
	if err != nil {
		panic(err)
	}
	word := lemmatizer.Lemma("Abducting")
	if word != "abduct" {
		panic("The output is not what is expected!")
	}
}

Contributors
  • axamon
  • charlesgiroux
  • glaslos

Documentation

Types

type LanguagePack

type LanguagePack interface {
	GetResource() ([]byte, error)
	GetLocale() string
}

LanguagePack is the interface that each language package should implement.
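
As a rough illustration of what satisfying this interface looks like, the sketch below declares a local copy of the interface so it compiles without importing golem. The type `toyPack`, the locale `"xx"`, and the returned bytes are all hypothetical. The encoding that golem expects from `GetResource` is not documented here, so the content is only a placeholder.

```go
package main

import "fmt"

// LanguagePack mirrors the interface documented above so that this sketch
// compiles without importing golem.
type LanguagePack interface {
	GetResource() ([]byte, error)
	GetLocale() string
}

// toyPack is a hypothetical pack; the name, locale, and data are made up.
type toyPack struct{}

// GetResource returns the raw dictionary bytes. The encoding that golem
// expects is not specified here, so the content is a placeholder.
func (toyPack) GetResource() ([]byte, error) {
	return []byte("align\taligning\n"), nil
}

// GetLocale returns the locale identifier for the pack.
func (toyPack) GetLocale() string { return "xx" }

// Compile-time check that toyPack satisfies the interface.
var _ LanguagePack = toyPack{}

func main() {
	var pack LanguagePack = toyPack{}
	data, err := pack.GetResource()
	if err != nil {
		panic(err)
	}
	fmt.Printf("locale=%s bytes=%d\n", pack.GetLocale(), len(data))
}
```

Before writing a real pack, it is worth checking one of the existing packages under dicts for the resource format they return.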

type Lemmatizer

type Lemmatizer struct {
	// contains filtered or unexported fields
}

Lemmatizer is the key to lemmatizing a word in a language

func New

func New(pack LanguagePack) (*Lemmatizer, error)

New produces a new Lemmatizer

func (*Lemmatizer) InDict

func (l *Lemmatizer) InDict(word string) bool

InDict checks if a certain word is in the dictionary

func (*Lemmatizer) Lemma

func (l *Lemmatizer) Lemma(word string) string

Lemma gets one of the base forms of a word

func (*Lemmatizer) LemmaLower

func (l *Lemmatizer) LemmaLower(word string) string

LemmaLower gets one of the base forms of a word. It expects `word` to already be lowercased.

func (*Lemmatizer) Lemmas

func (l *Lemmatizer) Lemmas(word string) (out []string)

Lemmas gets all the base forms of a word, if multiple exist
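
The differences between these methods can be sketched with a toy stand-in. The type `toyLemmatizer`, its data, and the constructor `newToy` are hypothetical and only copy the documented method names; this is not golem's implementation. Returning the input word for unknown entries is an assumption of the sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// toyLemmatizer is a hypothetical stand-in that copies the documented method
// names; it is not golem's implementation.
type toyLemmatizer struct {
	forms map[string][]string // lowercased form -> base forms
}

// newToy builds a toy lemmatizer with two made-up entries.
func newToy() *toyLemmatizer {
	return &toyLemmatizer{forms: map[string][]string{
		"abducting": {"abduct"},
		"leaves":    {"leaf", "leave"},
	}}
}

// InDict reports whether the lowercased word has an entry.
func (l *toyLemmatizer) InDict(word string) bool {
	_, ok := l.forms[strings.ToLower(word)]
	return ok
}

// Lemma lowercases the word and returns its first base form, or the word
// unchanged when there is no entry (an assumption of this sketch).
func (l *toyLemmatizer) Lemma(word string) string {
	if out, ok := l.forms[strings.ToLower(word)]; ok {
		return out[0]
	}
	return word
}

// LemmaLower skips the lowercasing step, so the caller must lowercase first.
func (l *toyLemmatizer) LemmaLower(word string) string {
	if out, ok := l.forms[word]; ok {
		return out[0]
	}
	return word
}

// Lemmas returns every base form recorded for the word.
func (l *toyLemmatizer) Lemmas(word string) []string {
	if out, ok := l.forms[strings.ToLower(word)]; ok {
		return out
	}
	return []string{word}
}

func main() {
	l := newToy()
	fmt.Println(l.Lemma("Abducting"))      // case is normalized before lookup
	fmt.Println(l.LemmaLower("Abducting")) // no normalization, so the lookup misses
	fmt.Println(l.Lemmas("leaves"))        // all recorded base forms
	fmt.Println(l.InDict("golem"))         // no entry for this word
}
```

In this sketch, Lemma is the convenient default, LemmaLower avoids a redundant lowercasing pass when the input is already normalized, and Lemmas is the one to use when a form is ambiguous.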

Directories

Path      Synopsis
cmd
dicts
  de      Module
  en      Module
  es      Module
  fr      Module
  it      Module
  ru      Module
  sv      Module
  uk      Module
