lex

package
v0.7.0
Published: Jan 14, 2020 License: Apache-2.0 Imports: 2 Imported by: 0

Documentation

Overview

Package lex contains base types to tokenize text for the linter. See the sub-packages for specific implementations; by default, Folx uses the GLexer in package ggl.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Document

type Document struct {
	// IANA name for the content encoding
	Encoding string

	// Text to be analysed by the Lexer
	Content string
}

Document describes the content for lexical analysis.

type EntityType

type EntityType int

EntityType describes the type of entity represented by the token. TODO: add further entities as required e.g. Organisation.

const (
	// UnknownEntity identifies a token with no entity.
	UnknownEntity EntityType = iota
	// Person identifies a person entity.
	Person
)

type Lexer

type Lexer interface {
	Init(context.Context, *Document) error
	Next() (*Token, error)
	GetExecTime() time.Duration
	GetDocument() *Document
}

Lexer performs a lexical analysis on a source text to return tokens.

type PartOfSpeechType

type PartOfSpeechType int

PartOfSpeechType describes the lexical tag of a word. TODO: add further tags as required e.g. Verb.

const (
	// UnknownPOS identifies a token with no part of speech tag.
	UnknownPOS PartOfSpeechType = iota
	// Noun identifies a word token that is a noun.
	Noun
	// Adjective identifies a word token that is an adjective.
	Adjective
)

type Sentence

type Sentence struct {
	Offset int
	Text   string
}

Sentence contains one or more tokens within Text.

type Token

type Token struct {
	Type         TokenType
	PartOfSpeech PartOfSpeechType
	Entity       EntityType
	Offset       int
	Text         string
	Lemma        string
	Sentence     Sentence
	Adjectives   []*Token
}

Token describes a discrete lexical element in a source text, returned by Lexer.Next.
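A Token carries its containing Sentence by value and links modifying adjectives through the Adjectives slice. The following sketch (with local mirrors of the documented types; the sample sentence and values are invented for illustration) shows tokens a Lexer might emit for a short phrase:

```go
package main

import "fmt"

// Local mirrors of the documented types, so the sketch is self-contained.
type (
	TokenType        int
	PartOfSpeechType int
	EntityType       int
)

const (
	EOF TokenType = iota
	Word
	NonWord
)

const (
	UnknownPOS PartOfSpeechType = iota
	Noun
	Adjective
)

const (
	UnknownEntity EntityType = iota
	Person
)

type Sentence struct {
	Offset int
	Text   string
}

type Token struct {
	Type         TokenType
	PartOfSpeech PartOfSpeechType
	Entity       EntityType
	Offset       int
	Text         string
	Lemma        string
	Sentence     Sentence
	Adjectives   []*Token
}

func main() {
	// A lexer analysing "Keen Grace spoke." might emit tokens like these:
	sent := Sentence{Offset: 0, Text: "Keen Grace spoke."}

	keen := &Token{
		Type: Word, PartOfSpeech: Adjective,
		Offset: 0, Text: "Keen", Lemma: "keen", Sentence: sent,
	}
	grace := &Token{
		Type: Word, PartOfSpeech: Noun, Entity: Person,
		Offset: 5, Text: "Grace", Lemma: "grace", Sentence: sent,
		Adjectives: []*Token{keen}, // adjectives that modify this noun
	}

	fmt.Printf("%s in %q has %d adjective(s)\n",
		grace.Text, grace.Sentence.Text, len(grace.Adjectives))
}
```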

type TokenNode

type TokenNode struct {
	Token *Token
	// contains filtered or unexported fields
}

TokenNode stores a token in a graph with an ID.

func NewTokenNode

func NewTokenNode(id int, token *Token) TokenNode

NewTokenNode returns a TokenNode wrapping token with the given graph ID.

func (TokenNode) ID

func (t TokenNode) ID() int

ID implements the Node interface of gonum.org/v1/gonum/graph.

func (*TokenNode) SetID

func (t *TokenNode) SetID(id int)

SetID updates the ID of the node.
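Note the receiver split: ID has a value receiver while SetID has a pointer receiver, so a TokenNode must be addressable (a variable, not a map value) before its ID can be changed. The sketch below uses a local mirror of TokenNode and its own Node interface, since current gonum releases declare graph.Node as ID() int64 rather than the int signature documented here:

```go
package main

import "fmt"

// Local mirrors, for illustration only; the real TokenNode keeps its
// id field unexported.
type Token struct{ Text string }

type TokenNode struct {
	Token *Token
	id    int
}

// NewTokenNode mirrors the documented factory.
func NewTokenNode(id int, token *Token) TokenNode {
	return TokenNode{Token: token, id: id}
}

func (t TokenNode) ID() int       { return t.id }
func (t *TokenNode) SetID(id int) { t.id = id }

// Node stands in for the graph node interface; ID has a value
// receiver, so a plain TokenNode value satisfies it.
type Node interface{ ID() int }

func main() {
	n := NewTokenNode(1, &Token{Text: "fox"})
	var _ Node = n // value satisfies Node

	// SetID has a pointer receiver, so n must be an addressable variable.
	n.SetID(2)
	fmt.Println(n.ID())
}
```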

type TokenType

type TokenType int

TokenType describes the type of token the Lexer will return.

const (
	// EOF identifies the end of the source.
	EOF TokenType = iota
	// Word identifies a single word token.
	Word
	// NonWord identifies a non-word token such as punctuation or a number.
	NonWord
)

Directories

Path	Synopsis
ggl	Package ggl contains an implementation of lex.Lexer using cloud.google.com/natural-language/.
