porterstemmer

v1.0.1 | Published: Jan 1, 2014 | License: MIT | Imports: 2 | Imported by: 37

Repository

github.com/reiver/go-porterstemmer

README

Go Porter Stemmer

A native Go clean room implementation of the Porter Stemming Algorithm.

This algorithm is of interest to people doing Machine Learning or Natural Language Processing (NLP).

This is NOT a port. This is a native Go implementation from the human-readable description of the algorithm.

I've tried to make it (more) efficient by NOT using strings internally, but instead using []rune slices, and by reusing the same (array) buffer behind the []rune slice (and its sub-slices) at all steps of the algorithm.
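As a rough illustration of the buffer-sharing idea (not this package's actual code), the sketch below removes a suffix by re-slicing a []rune. The helper name dropSuffix is hypothetical. The point is only that the shorter slice still refers to the same backing array, so no allocation or copy takes place.

```go
package main

import "fmt"

// dropSuffix is a hypothetical helper (not part of this package). It removes
// n runes from the end of word by re-slicing, so no new array is allocated.
func dropSuffix(word []rune, n int) []rune {
	if n < 0 {
		n = 0
	}
	if n > len(word) {
		n = len(word)
	}
	return word[:len(word)-n]
}

func main() {
	word := []rune("waxes")
	stem := dropSuffix(word, 2)

	// Both slices start at the same element of the same backing array.
	fmt.Println(string(stem), &word[0] == &stem[0])

	// A write through the sub-slice is visible through the original slice.
	stem[0] = 'W'
	fmt.Println(string(word))
}
```

The same property is what makes the side effects described under Usage possible: a result that shares its buffer with the input is cheap to produce, but the two cannot be modified independently.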

For the Porter Stemmer algorithm, see:

http://tartarus.org/martin/PorterStemmer/def.txt (URL #1)

http://tartarus.org/martin/PorterStemmer/ (URL #2)

Departures

Also, when I initially implemented it, it failed the tests at...

http://tartarus.org/martin/PorterStemmer/voc.txt (URL #3)

http://tartarus.org/martin/PorterStemmer/output.txt (URL #4)

... after reading the human-readable text over and over again to try to figure out what error I had made (and doing all sorts of things to debug it) I came to the conclusion that some of these tests were wrong according to the human-readable description of the algorithm.

This led me to wonder if maybe other people's code that was passing these tests had rules that were not in the human-readable description. Which led me to look at the source code here...

http://tartarus.org/martin/PorterStemmer/c.txt (URL #5)

... When I looked there I noticed that there are some items marked as a "DEPARTURE", which differ from the original algorithm. (There are 2 of these.)
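For reference, the two DEPARTURE comments in the C source at URL #5 appear to concern Step 2 of the algorithm: the published rule "abli -> able" is replaced by "bli -> ble", and an extra rule "logi -> log" is added. This should be checked against URL #5 before relying on it. The sketch below only illustrates those two substitutions. The helper name applyDepartures is hypothetical, it is not this package's code, and it deliberately skips the "measure > 0" condition that the real algorithm applies to the remaining stem.

```go
package main

import (
	"fmt"
	"strings"
)

// applyDepartures is a hypothetical helper, not this package's code. It applies
// only the two Step 2 departure rules and, for brevity, omits the
// "measure > 0" check that the real algorithm performs on the remaining stem.
func applyDepartures(word string) string {
	switch {
	case strings.HasSuffix(word, "bli"):
		// Published rule: abli -> able. Departure: bli -> ble.
		return strings.TrimSuffix(word, "bli") + "ble"
	case strings.HasSuffix(word, "logi"):
		// Not in the published rule list. Departure: logi -> log.
		return strings.TrimSuffix(word, "logi") + "log"
	}
	return word
}

func main() {
	for _, w := range []string{"possibli", "archaeologi", "waxes"} {
		fmt.Printf("%s -> %s\n", w, applyDepartures(w))
	}
}
```

The inputs end in "i" because, in the full algorithm, an earlier step has already turned a final "y" into "i" by the time Step 2 runs.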

I implemented these departures, and the tests at URL #3 and URL #4 all passed.

Usage

To use this Go library, use something like:

package main

import (
  "fmt"
  "github.com/reiver/go-porterstemmer"
)

func main() {
  
  word := "Waxes"
  
  stem := porterstemmer.StemString(word)
  
  fmt.Printf("The word [%s] has the stem [%s].\n", word, stem)
}

Alternatively, if you want to be a bit more efficient, use []rune slices instead, with code like:

package main

import (
  "fmt"
  "github.com/reiver/go-porterstemmer"
)

func main() {
  
  word := []rune("Waxes")
  
  stem := porterstemmer.Stem(word)
  
  fmt.Printf("The word [%s] has the stem [%s].\n", string(word), string(stem))
}

Although NOTE that the above code may modify the original slice (named "word" in the example) as a side effect, for efficiency reasons. And that the slice named "stem" in the example above may be a sub-slice of the slice named "word".
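If the original slice has to stay intact, one option is to stem a private copy. The wrapper below is a minimal sketch under that assumption. stemCopy and fakeStem are hypothetical names. fakeStem only imitates the documented behaviour (it may modify its argument and return a sub-slice of it) so that the example runs without this package.

```go
package main

import (
	"fmt"
	"unicode"
)

// stemCopy is a hypothetical wrapper. It calls stem on a private copy of word,
// so the caller's slice is never modified and the result never aliases it.
func stemCopy(stem func([]rune) []rune, word []rune) []rune {
	scratch := make([]rune, len(word))
	copy(scratch, word)
	return stem(scratch)
}

// fakeStem stands in for a real stemmer in this sketch: it lowercases its
// argument in place and returns a sub-slice of it.
func fakeStem(s []rune) []rune {
	for i, r := range s {
		s[i] = unicode.ToLower(r)
	}
	if len(s) > 2 {
		return s[:len(s)-2]
	}
	return s
}

func main() {
	word := []rune("Waxes")
	stem := stemCopy(fakeStem, word)

	// word keeps its original contents, including the upper-case "W".
	fmt.Println(string(word), string(stem))
}
```

With the real package, the call would be stemCopy(porterstemmer.Stem, word). The copy costs one allocation per word, so it gives up part of the efficiency that the []rune API is meant to provide.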

Alternatively, if you know that your word is already lowercase (and you don't need this library to lowercase it for you), you can instead use code like:

package main

import (
  "fmt"
  "github.com/reiver/go-porterstemmer"
)

func main() {
  
  word := []rune("waxes")
  
  stem := porterstemmer.StemWithoutLowerCasing(word)
  
  fmt.Printf("The word [%s] has the stem [%s].\n", string(word), string(stem))
}

Again NOTE (like with the previous example) that the above code may modify the original slice (named "word" in the example) as a side effect, for efficiency reasons. And that the slice named "stem" in the example above may be a sub-slice of the slice named "word".
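When it is unclear whether the input really is lowercase, a small check can guard the call. The helper isLowerCased below is hypothetical and not part of this package. It treats a word as lowercase when it contains no rune that unicode.IsUpper reports as upper case, which is an assumption about what "lowercase" should mean here (title-case runes, for example, are not caught).

```go
package main

import (
	"fmt"
	"unicode"
)

// isLowerCased is a hypothetical helper. It reports whether word contains
// no upper-case runes, as judged by unicode.IsUpper.
func isLowerCased(word []rune) bool {
	for _, r := range word {
		if unicode.IsUpper(r) {
			return false
		}
	}
	return true
}

func main() {
	for _, w := range []string{"waxes", "Waxes"} {
		fmt.Println(w, isLowerCased([]rune(w)))
	}
}
```

The check is itself a full pass over the word, so it is probably more useful as a sanity check in tests or at an input boundary than as a per-word test in a hot loop.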

Documentation

Index

func Stem(s []rune) []rune
func StemString(s string) string
func StemWithoutLowerCasing(s []rune) []rune
func TestHasSuffix(t *testing.T)
