package whitespacepretokenizer

v0.2.0 (latest) | Published: Dec 12, 2020 | License: BSD-2-Clause | Imports: 5 | Imported by: 0

Repository

github.com/nlpodyssey/gotokenizers

Documentation

Index

- Variables
- type WhiteSpacePreTokenizer
  - func New(r *regexp2.Regexp) *WhiteSpacePreTokenizer
  - func NewDefault() *WhiteSpacePreTokenizer
  - func (w *WhiteSpacePreTokenizer) PreTokenize(pts *pretokenizedstring.PreTokenizedString) error

Constants

This section is empty.

Variables

var DefaultWordRegexp = regexp2.MustCompile(`\w+|[^\w\s]+`, regexp2.IgnoreCase|regexp2.Multiline)

DefaultWordRegexp is read-only: callers should not modify it.
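To see which substrings DefaultWordRegexp selects, the sketch below runs the same alternation with Go's standard regexp package. This is an approximation, not the package's code: DefaultWordRegexp is compiled with regexp2, which follows .NET semantics where \w and \s are Unicode-aware, while the standard library's \w and \s are ASCII-only. The sketch therefore spells out \w roughly as .NET defines it ([\p{L}\p{Mn}\p{Nd}\p{Pc}]) and adds Unicode separators to the whitespace set; the names wordPattern and matches are hypothetical. The IgnoreCase and Multiline flags are left out because the pattern has no letters or anchors for them to affect.

```go
package main

import (
	"fmt"
	"regexp"
)

// wordPattern is a hypothetical standard-library stand-in for DefaultWordRegexp.
// Left branch: runs of word characters (approximating regexp2's Unicode \w).
// Right branch: runs of characters that are neither word characters nor
// whitespace (\s, \v, NEL U+0085 and Unicode separators \p{Z}).
var wordPattern = regexp.MustCompile(
	`[\p{L}\p{Mn}\p{Nd}\p{Pc}]+|[^\p{L}\p{Mn}\p{Nd}\p{Pc}\s\v\x{85}\p{Z}]+`,
)

// matches returns every non-overlapping match of wordPattern in s, left to right.
func matches(s string) []string {
	return wordPattern.FindAllString(s, -1)
}

func main() {
	fmt.Printf("%q\n", matches("Hello, world! Isn't it?"))
	// prints ["Hello" "," "world" "!" "Isn" "'" "t" "it" "?"]
	fmt.Printf("%q\n", matches("café-au-lait"))
	// prints ["café" "-" "au" "-" "lait"]
}
```

Note how the apostrophe in "Isn't" forms its own non-word group, splitting the contraction into three pieces.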

Functions

This section is empty.

Types

type WhiteSpacePreTokenizer

type WhiteSpacePreTokenizer struct {
	// contains filtered or unexported fields
}

WhiteSpacePreTokenizer generates pre-tokens made of distinct groups of Unicode letters (words) and of non-letter characters (such as punctuation marks or other symbols). Whitespace-like characters always act as explicit token separators.
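The grouping described above can also be sketched without a regular expression by classifying each rune. This is only an illustration of the idea, assuming a rune is a word character when it is a letter, nonspacing mark, decimal digit or connector punctuation (roughly the .NET \w class) and whitespace when unicode.IsSpace reports it; the package itself uses a regex, and splitGroups, classify and runeClass are hypothetical names.

```go
package main

import (
	"fmt"
	"unicode"
)

// runeClass labels a rune as whitespace, a word character, or another symbol.
type runeClass int

const (
	classSpace runeClass = iota
	classWord
	classOther
)

// classify approximates the word / non-word / whitespace split (hypothetical).
func classify(r rune) runeClass {
	switch {
	case unicode.IsSpace(r):
		return classSpace
	case unicode.IsLetter(r), unicode.Is(unicode.Mn, r),
		unicode.IsDigit(r), unicode.Is(unicode.Pc, r):
		return classWord
	default:
		return classOther
	}
}

// splitGroups returns maximal runs of word runes or of other non-space runes.
// Whitespace is dropped and always closes the current group.
func splitGroups(s string) []string {
	var groups []string
	start := -1           // byte offset where the current group began, or -1
	var current runeClass // class of the current group (meaningful when start >= 0)
	for i, r := range s {
		c := classify(r)
		if start >= 0 && c != current {
			groups = append(groups, s[start:i]) // class changed: close the group
			start = -1
		}
		if c != classSpace && start < 0 {
			start, current = i, c // open a new group at this rune
		}
	}
	if start >= 0 {
		groups = append(groups, s[start:])
	}
	return groups
}

func main() {
	fmt.Printf("%q\n", splitGroups("naïve\tcafé... 42€ x_y"))
	// prints ["naïve" "café" "..." "42" "€" "x_y"]
}
```

Because the underscore is connector punctuation, "x_y" stays a single word group, whereas the euro sign starts a new non-word group right after "42".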

func New

func New(r *regexp2.Regexp) *WhiteSpacePreTokenizer

New returns a new WhiteSpacePreTokenizer that uses the regular expression r to find pre-tokens.
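Since New takes the regular expression as a parameter, the choice of pattern determines the pre-tokens. The sketch below compares two candidate patterns on the same text, assuming (as the default pattern suggests) that each regex match becomes one pre-token. It uses Go's standard regexp package for a runnable example; a real call to New would need the pattern compiled with regexp2 instead. Both variable names are hypothetical, and wordOrSymbol only approximates DefaultWordRegexp.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// wordOrSymbol approximates DefaultWordRegexp: word groups, plus groups of
	// characters that are neither word characters nor whitespace.
	wordOrSymbol = regexp.MustCompile(
		`[\p{L}\p{Mn}\p{Nd}\p{Pc}]+|[^\p{L}\p{Mn}\p{Nd}\p{Pc}\s\v\x{85}\p{Z}]+`,
	)
	// nonSpaceRuns keeps every run of non-whitespace characters together,
	// so punctuation stays attached to the neighbouring word.
	nonSpaceRuns = regexp.MustCompile(`\S+`)
)

func main() {
	text := "Hello, world!"
	fmt.Printf("%q\n", wordOrSymbol.FindAllString(text, -1)) // ["Hello" "," "world" "!"]
	fmt.Printf("%q\n", nonSpaceRuns.FindAllString(text, -1)) // ["Hello," "world!"]
}
```

The first pattern isolates punctuation so a downstream model can learn it separately; the second yields fewer, coarser pre-tokens.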

func NewDefault

func NewDefault() *WhiteSpacePreTokenizer

func (*WhiteSpacePreTokenizer) PreTokenize

func (w *WhiteSpacePreTokenizer) PreTokenize(pts *pretokenizedstring.PreTokenizedString) error

PreTokenize splits the NormalizedString into word and non-word groups separated by whitespace-like characters.
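Splitting text into pre-tokens usually keeps track of where each piece came from, so later stages can map tokens back to the input. This page does not show how pretokenizedstring.PreTokenizedString stores its splits, so the sketch below uses a hypothetical PreToken type holding the matched text and its byte offsets, and a standard-library approximation of DefaultWordRegexp. Whitespace is skipped, yet the offsets still point into the original string.

```go
package main

import (
	"fmt"
	"regexp"
)

// PreToken is a hypothetical stand-in for one split: the text and its
// half-open byte range [Start, End) in the original string.
type PreToken struct {
	Text       string
	Start, End int
}

// wordPattern approximates DefaultWordRegexp with the standard library.
var wordPattern = regexp.MustCompile(
	`[\p{L}\p{Mn}\p{Nd}\p{Pc}]+|[^\p{L}\p{Mn}\p{Nd}\p{Pc}\s\v\x{85}\p{Z}]+`,
)

// preTokenize returns one PreToken per regex match, in input order.
func preTokenize(s string) []PreToken {
	var out []PreToken
	for _, loc := range wordPattern.FindAllStringIndex(s, -1) {
		out = append(out, PreToken{Text: s[loc[0]:loc[1]], Start: loc[0], End: loc[1]})
	}
	return out
}

func main() {
	for _, t := range preTokenize("Hi  there!\n") {
		fmt.Printf("%q [%d, %d)\n", t.Text, t.Start, t.End)
	}
	// prints:
	// "Hi" [0, 2)
	// "there" [4, 9)
	// "!" [9, 10)
}
```

The double space between "Hi" and "there" produces no pre-token; it only shows up as the gap between offsets 2 and 4.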

Source Files

whitespacepretokenizer.go