tokenizer

package
v0.0.0-...-6217932 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Oct 31, 2016 License: GPL-3.0 Imports: 4 Imported by: 0

Documentation

Index

Constants

This section is empty.

Variables

Functions

func LowercaseFilter

func LowercaseFilter(s string) (string, bool)

LowercaseFilter turns a token into lower case.

func RemoveDiacritics

func RemoveDiacritics(s string) string

RemoveDiacritics drops diacritic marks from the input string, using a Unicode NFKD normalization.

func RemoveDiacriticsFilter

func RemoveDiacriticsFilter(s string) (string, bool)

RemoveDiacriticsFilter drops diacritics from the token.

func Tokenize

func Tokenize(s string) []string

func TrackNumFilter

func TrackNumFilter(s string) (string, bool)

TrackNumFilter drops tokens that look like track numbers (small two-digit zero-padded values), which frequently end up in song metadata.

func WhitespaceTokenizer

func WhitespaceTokenizer(s string) []string

WhitespaceTokenizer splits a string into alphanumeric tokens.

Types

type FilterFunc

type FilterFunc func(string) (string, bool)

func MinLengthFilter

func MinLengthFilter(minLength int) FilterFunc

MinLengthFilter returns a FilterFunc which selects tokens that meet a minimum length.

type Tokenizer

type Tokenizer struct {
	T       TokenizerFunc
	Filters []FilterFunc
}

func NewTokenizer

func NewTokenizer(tokenizer TokenizerFunc, filters ...FilterFunc) *Tokenizer

func (*Tokenizer) Tokenize

func (t *Tokenizer) Tokenize(s string) []string

Tokenize applies the analysis chain to the provided string.

type TokenizerFunc

type TokenizerFunc func(string) []string

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL