tokenize

package

v0.0.0-...-0d25092 Published: May 18, 2018 License: MIT Imports: 5 Imported by: 0

Repository

github.com/korobool/nlp4go

Documentation ¶

Index ¶

type EnglishContractions
- func NewEnglishContractions() *EnglishContractions
- func (c *EnglishContractions) Expand(token *SentenceToken) ([]*SentenceToken, bool)
type LangContractions
type SentenceToken
- func NewSentenceToken(str []rune, posStart, posEnd int) *SentenceToken
- func (t *SentenceToken) Equals(compare *SentenceToken) bool
- func (t *SentenceToken) String() string
type TBWordTokenizer
- func NewTBWordTokenizer(normalize, checkContr bool, langContr LangContractions) *TBWordTokenizer
- func (t *TBWordTokenizer) Tokenize(s []rune) []*SentenceToken
type TokenExtractor

Types ¶

type EnglishContractions ¶

type EnglishContractions struct {
	// contains filtered or unexported fields
}

func NewEnglishContractions ¶

func NewEnglishContractions() *EnglishContractions

func (*EnglishContractions) Expand ¶

func (c *EnglishContractions) Expand(token *SentenceToken) ([]*SentenceToken, bool)

type LangContractions ¶

type LangContractions interface {
	Expand(*SentenceToken) ([]*SentenceToken, bool)
}
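
Because LangContractions is a one-method interface, support for another language can be supplied as a custom implementation. The following is a minimal sketch, not the package's own code: it declares local stand-ins for SentenceToken and LangContractions so that it runs without the dependency, and mapContractions and its table are hypothetical names. It also assumes that Expand splits a token Treebank-style ("don't" into "do" and "n't") and that PosEnd is exclusive; the documentation above does not confirm either point.

```go
package main

import "fmt"

// SentenceToken is a local stand-in carrying a subset of the documented fields.
type SentenceToken struct {
	Text     []rune
	PosStart int
	PosEnd   int
}

// LangContractions repeats the documented interface.
type LangContractions interface {
	Expand(*SentenceToken) ([]*SentenceToken, bool)
}

// mapContractions is a hypothetical table-driven implementation.
// Each entry maps a surface form to parts that concatenate back to it.
type mapContractions struct {
	table map[string][]string
}

// Expand returns the parts of a known contraction with offsets derived from
// the original token, or (nil, false) when the token is not in the table.
func (m mapContractions) Expand(tok *SentenceToken) ([]*SentenceToken, bool) {
	parts, ok := m.table[string(tok.Text)]
	if !ok {
		return nil, false
	}
	out := make([]*SentenceToken, 0, len(parts))
	start := tok.PosStart
	for _, p := range parts {
		r := []rune(p)
		// Assumption: PosEnd is exclusive, so each part covers [start, start+len).
		out = append(out, &SentenceToken{Text: r, PosStart: start, PosEnd: start + len(r)})
		start += len(r)
	}
	return out, true
}

func main() {
	var c LangContractions = mapContractions{
		table: map[string][]string{"don't": {"do", "n't"}},
	}
	parts, ok := c.Expand(&SentenceToken{Text: []rune("don't"), PosStart: 4, PosEnd: 9})
	fmt.Println(ok)
	for _, p := range parts {
		fmt.Printf("%s [%d,%d)\n", string(p.Text), p.PosStart, p.PosEnd)
	}
}
```

A value of this shape could presumably be passed as the langContr argument of NewTBWordTokenizer, provided it is written against the package's real SentenceToken type rather than the stand-in used here.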

type SentenceToken ¶

type SentenceToken struct {
	Text          []rune `json:"text"`
	PosStart      int    `json:"pos_start"`
	PosEnd        int    `json:"pos_end"`
	IsQuoteStart  bool   `json:"is_quote_start"`
	IsQuoteEnd    bool   `json:"is_quote_end"`
	IsEllipsis    bool   `json:"is_ellipsis"`
	HasApostrophe bool   `json:"has_apostrophe"`
}
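
The struct tags indicate that tokens are meant to be serialized as JSON. One detail worth knowing: encoding/json writes a []rune as an array of numbers, not as a string. The sketch below illustrates this with a local copy of the struct, so it runs without the package. It assumes the real type defines no custom MarshalJSON (none is documented above); tokenView and the two encode helpers are hypothetical names introduced only for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// SentenceToken is a local copy of the documented struct.
type SentenceToken struct {
	Text          []rune `json:"text"`
	PosStart      int    `json:"pos_start"`
	PosEnd        int    `json:"pos_end"`
	IsQuoteStart  bool   `json:"is_quote_start"`
	IsQuoteEnd    bool   `json:"is_quote_end"`
	IsEllipsis    bool   `json:"is_ellipsis"`
	HasApostrophe bool   `json:"has_apostrophe"`
}

// tokenView is a hypothetical wrapper that exposes Text as a string.
type tokenView struct {
	Text     string `json:"text"`
	PosStart int    `json:"pos_start"`
	PosEnd   int    `json:"pos_end"`
}

// encodeRaw marshals the token as-is; Text becomes an array of code points.
func encodeRaw(tok SentenceToken) string {
	b, err := json.Marshal(tok)
	if err != nil {
		return ""
	}
	return string(b)
}

// encodeReadable converts Text to a string before marshaling.
func encodeReadable(tok SentenceToken) string {
	b, err := json.Marshal(tokenView{
		Text:     string(tok.Text),
		PosStart: tok.PosStart,
		PosEnd:   tok.PosEnd,
	})
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	tok := SentenceToken{Text: []rune("Hi"), PosStart: 0, PosEnd: 2}
	fmt.Println(encodeRaw(tok))      // "text" is [72,105]
	fmt.Println(encodeReadable(tok)) // "text" is "Hi"
}
```

If human-readable output matters, converting Text with string(...) before marshaling, as in encodeReadable, is one simple option.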

func NewSentenceToken ¶

func NewSentenceToken(str []rune, posStart, posEnd int) *SentenceToken

func (*SentenceToken) Equals ¶

func (t *SentenceToken) Equals(compare *SentenceToken) bool

func (*SentenceToken) String ¶

func (t *SentenceToken) String() string

type TBWordTokenizer ¶

type TBWordTokenizer struct {
	LangContractions  LangContractions
	ExpandContrations bool
	Normalize         bool
	// contains filtered or unexported fields
}

TBWordTokenizer mimics the Penn Treebank word tokenizer without relying on a large set of regular expressions.

func NewTBWordTokenizer ¶

func NewTBWordTokenizer(normalize, checkContr bool, langContr LangContractions) *TBWordTokenizer

func (*TBWordTokenizer) Tokenize ¶

func (t *TBWordTokenizer) Tokenize(s []rune) []*SentenceToken

type TokenExtractor ¶

type TokenExtractor func([]rune, int) (*SentenceToken, bool)
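
The page does not document how TokenExtractor is used. A plausible reading, offered here only as an assumption, is that the int argument is the offset at which to try a match and that the bool result reports whether a token was found there. Under that reading, regex-free tokenization can be sketched as a loop that tries a list of extractors at each position. The types below are local stand-ins, and extractEllipsis, extractWord, and scan are hypothetical helpers, not functions from the package.

```go
package main

import (
	"fmt"
	"unicode"
)

// SentenceToken is a local stand-in with a subset of the documented fields.
type SentenceToken struct {
	Text       []rune
	PosStart   int
	PosEnd     int
	IsEllipsis bool
}

// TokenExtractor repeats the documented signature.
type TokenExtractor func([]rune, int) (*SentenceToken, bool)

// extractEllipsis matches a run of three or more dots starting at pos.
func extractEllipsis(s []rune, pos int) (*SentenceToken, bool) {
	end := pos
	for end < len(s) && s[end] == '.' {
		end++
	}
	if end-pos < 3 {
		return nil, false
	}
	return &SentenceToken{Text: s[pos:end], PosStart: pos, PosEnd: end, IsEllipsis: true}, true
}

// extractWord matches a run of letters or digits starting at pos.
func extractWord(s []rune, pos int) (*SentenceToken, bool) {
	end := pos
	for end < len(s) && (unicode.IsLetter(s[end]) || unicode.IsDigit(s[end])) {
		end++
	}
	if end == pos {
		return nil, false
	}
	return &SentenceToken{Text: s[pos:end], PosStart: pos, PosEnd: end}, true
}

// scan skips whitespace, tries each extractor in order at the current offset,
// and falls back to a single-rune token when none matches.
func scan(s []rune, extractors ...TokenExtractor) []*SentenceToken {
	var out []*SentenceToken
	for pos := 0; pos < len(s); {
		if unicode.IsSpace(s[pos]) {
			pos++
			continue
		}
		matched := false
		for _, ex := range extractors {
			// Require progress so a misbehaving extractor cannot stall the loop.
			if tok, ok := ex(s, pos); ok && tok.PosEnd > pos {
				out = append(out, tok)
				pos = tok.PosEnd
				matched = true
				break
			}
		}
		if !matched {
			out = append(out, &SentenceToken{Text: s[pos : pos+1], PosStart: pos, PosEnd: pos + 1})
			pos++
		}
	}
	return out
}

func main() {
	for _, tok := range scan([]rune("Wait... what?"), extractEllipsis, extractWord) {
		fmt.Printf("%q [%d,%d)\n", string(tok.Text), tok.PosStart, tok.PosEnd)
	}
}
```

Ordering the extractors from most to least specific keeps the ellipsis from being split into three single-dot tokens by the fallback.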
