porter

Version: v0.0.0-...-7097357
Published: Oct 3, 2013 License: MIT Imports: 3 Imported by: 2

README

Porter Stemmer for Go

This is a fairly straightforward port of Martin Porter's C implementation of the Porter stemming algorithm. The C version this port is based on is available for download here: http://tartarus.org/~martin/PorterStemmer/c_thread_safe.txt

The original algorithm is described in the paper:

M.F. Porter, 1980, An algorithm for suffix stripping, Program, 14(3), pp. 130-137.

While the internal implementation and its internal interfaces are nearly identical to the original C code, the public Go interface is much simplified. The stemmer can be called as follows:

import "porter"
...
stemmed := porter.Stem(word_to_stem)

Installing

go get github.com/a2800276/porter

To use the stemmer when installed using go get, import:

import "github.com/a2800276/porter"

Limitations

While the implementation is fairly robust, this is a work in progress. In particular, a new interface will likely be provided to avoid excessive conversions between string and []byte. Currently, Stem converts its string argument to a byte slice, runs the algorithm on that slice, and converts the result back into a string before returning.

Also, the implementation is not particularly robust at handling Unicode input. Currently, the only handling is that bytes with the high bit set are ignored. It's up to the caller to make sure the string contains only ASCII characters. Since the algorithm itself operates on English words only, this doesn't restrict the functionality, but it is a nuisance.
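
Since the ASCII check is left to the caller, a small guard in front of Stem may help. The following is a minimal sketch, not part of this package: isASCII and stemASCII are hypothetical helper names, and the stemmer is passed in as a plain func(string) string so the example compiles on its own. In real code the argument would presumably be porter.Stem.

```go
package main

import "fmt"

// isASCII reports whether s contains only 7-bit ASCII bytes.
func isASCII(s string) bool {
	for i := 0; i < len(s); i++ {
		if s[i] >= 0x80 { // high bit set: byte belongs to a non-ASCII sequence
			return false
		}
	}
	return true
}

// stemASCII is a hypothetical helper (not provided by the porter package).
// It calls stem only for pure-ASCII words; any other word is returned
// unchanged, and the boolean reports whether stem was applied.
func stemASCII(word string, stem func(string) string) (string, bool) {
	if !isASCII(word) {
		return word, false
	}
	return stem(word), true
}

func main() {
	// identity stands in for porter.Stem so this sketch has no external
	// dependency; it performs no stemming at all.
	identity := func(s string) string { return s }

	for _, w := range []string{"running", "naïve"} {
		out, ok := stemASCII(w, identity)
		fmt.Printf("%q -> %q (stem applied: %t)\n", w, out, ok)
	}
}
```

Whether non-ASCII words should be skipped, rejected, or transliterated first depends on the application; skipping is only one possible choice.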

TODO:

  • byte slice API to avoid round-tripping to string and back
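
To make the round trip in this TODO concrete, here is a rough sketch of the two conversions described under Limitations. Everything in it is an assumption for illustration: viaString is a hypothetical name, the stemBytes parameter stands for a byte slice API that does not exist in the package yet, and dropS is a toy placeholder rather than the Porter algorithm.

```go
package main

import "fmt"

// viaString mirrors the conversion pattern described above: the string is
// copied into a byte slice, the byte-level routine runs on that slice, and
// the result is copied back into a new string.
// stemBytes is a hypothetical byte slice stemmer supplied by the caller.
func viaString(word string, stemBytes func([]byte) []byte) string {
	buf := []byte(word)           // copy 1: string -> []byte
	return string(stemBytes(buf)) // copy 2: []byte -> string
}

func main() {
	// dropS is a toy placeholder, NOT the Porter algorithm: it only trims
	// one trailing 's' so the example produces visible output.
	dropS := func(b []byte) []byte {
		if n := len(b); n > 1 && b[n-1] == 's' {
			return b[:n-1]
		}
		return b
	}

	// String in, string out: two copies per word.
	fmt.Println(viaString("cats", dropS))

	// A caller that already holds bytes (for example from a file or a
	// network buffer) could call a byte slice API directly and skip both
	// conversions.
	token := []byte("cats")
	fmt.Println(len(dropS(token)))
}
```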

Documentation

Overview

The package `porter` implements the Porter stemming algorithm, following, for all practical purposes, the algorithm published in:

Porter, 1980, An algorithm for suffix stripping, Program, Vol. 14,
no. 3, pp 130-137

For more information on the algorithm itself, see:

http://tartarus.org/~martin/PorterStemmer/


Functions

func Stem

func Stem(word string) string

Stem stems the parameter word and returns the stemmed term.
