lex

package
v0.7.0
Published: Jan 14, 2020 License: Apache-2.0 Imports: 2 Imported by: 0

Documentation

Overview

Package lex contains base types to tokenize text for the linter. See the sub-packages for specific implementations; by default, Folx uses the GLexer in package ggl.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Document

type Document struct {
	// IANA name for the content encoding
	Encoding string

	// Text to be analysed by the Lexer
	Content string
}

Document describes the content for lexical analysis.

type EntityType

type EntityType int

EntityType describes the type of entity represented by the token. TODO: add further entities as required e.g. Organisation.

const (
	// UnknownEntity identifies a token with no entity.
	UnknownEntity EntityType = iota
	// Person identifies a person entity.
	Person
)

type Lexer

type Lexer interface {
	Init(context.Context, *Document) error
	Next() (*Token, error)
	GetExecTime() time.Duration
	GetDocument() *Document
}

Lexer performs a lexical analysis on a source text to return tokens.

type PartOfSpeechType

type PartOfSpeechType int

PartOfSpeechType describes the lexical tag of a word. TODO: add further tags as required e.g. Verb.

const (
	// UnknownPOS identifies a token with no part of speech tag.
	UnknownPOS PartOfSpeechType = iota
	// Noun identifies a word token that is a noun.
	Noun
	// Adjective identifies a word token that is an adjective.
	Adjective
)

type Sentence

type Sentence struct {
	Offset int
	Text   string
}

Sentence contains one or more tokens within Text.

type Token

type Token struct {
	Type         TokenType
	PartOfSpeech PartOfSpeechType
	Entity       EntityType
	Offset       int
	Text         string
	Lemma        string
	Sentence     Sentence
	Adjectives   []*Token
}

Token describes a discrete lexical element in a source text, returned by Lexer.Next.
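A Token carries its containing Sentence by value and links modifying adjectives through the Adjectives slice. The following sketch (with local mirrors of the documented types; the sample sentence and values are invented for illustration) shows tokens a Lexer might emit for a short phrase:

```go
package main

import "fmt"

// Local mirrors of the documented types, so the sketch is self-contained.
type (
	TokenType        int
	PartOfSpeechType int
	EntityType       int
)

const (
	EOF TokenType = iota
	Word
	NonWord
)

const (
	UnknownPOS PartOfSpeechType = iota
	Noun
	Adjective
)

const (
	UnknownEntity EntityType = iota
	Person
)

type Sentence struct {
	Offset int
	Text   string
}

type Token struct {
	Type         TokenType
	PartOfSpeech PartOfSpeechType
	Entity       EntityType
	Offset       int
	Text         string
	Lemma        string
	Sentence     Sentence
	Adjectives   []*Token
}

func main() {
	// A lexer analysing "Keen Grace spoke." might emit tokens like these:
	sent := Sentence{Offset: 0, Text: "Keen Grace spoke."}

	keen := &Token{
		Type: Word, PartOfSpeech: Adjective,
		Offset: 0, Text: "Keen", Lemma: "keen", Sentence: sent,
	}
	grace := &Token{
		Type: Word, PartOfSpeech: Noun, Entity: Person,
		Offset: 5, Text: "Grace", Lemma: "grace", Sentence: sent,
		Adjectives: []*Token{keen}, // adjectives that modify this noun
	}

	fmt.Printf("%s in %q has %d adjective(s)\n",
		grace.Text, grace.Sentence.Text, len(grace.Adjectives))
}
```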

type TokenNode

type TokenNode struct {
	Token *Token
	// contains filtered or unexported fields
}

TokenNode stores a token in a graph with an ID.

func NewTokenNode

func NewTokenNode(id int, token *Token) TokenNode

NewTokenNode returns a TokenNode wrapping token with the given graph ID.

func (TokenNode) ID

func (t TokenNode) ID() int

ID implements the Node interface of gonum.org/v1/gonum/graph.

func (*TokenNode) SetID

func (t *TokenNode) SetID(id int)

SetID updates the ID of the node.
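Note the receiver split: ID has a value receiver while SetID has a pointer receiver, so a TokenNode must be addressable (a variable, not a map value) before its ID can be changed. The sketch below uses a local mirror of TokenNode and its own Node interface, since current gonum releases declare graph.Node as ID() int64 rather than the int signature documented here:

```go
package main

import "fmt"

// Local mirrors, for illustration only; the real TokenNode keeps its
// id field unexported.
type Token struct{ Text string }

type TokenNode struct {
	Token *Token
	id    int
}

// NewTokenNode mirrors the documented factory.
func NewTokenNode(id int, token *Token) TokenNode {
	return TokenNode{Token: token, id: id}
}

func (t TokenNode) ID() int       { return t.id }
func (t *TokenNode) SetID(id int) { t.id = id }

// Node stands in for the graph node interface; ID has a value
// receiver, so a plain TokenNode value satisfies it.
type Node interface{ ID() int }

func main() {
	n := NewTokenNode(1, &Token{Text: "fox"})
	var _ Node = n // value satisfies Node

	// SetID has a pointer receiver, so n must be an addressable variable.
	n.SetID(2)
	fmt.Println(n.ID())
}
```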

type TokenType

type TokenType int

TokenType describes the type of token the Lexer will return.

const (
	// EOF identifies the end of the source.
	EOF TokenType = iota
	// Word identifies a single word token.
	Word
	// NonWord identifies a non-word token such as punctuation or a number.
	NonWord
)

Directories

Path	Synopsis
ggl	Package ggl contains an implementation of lex.Lexer using cloud.google.com/natural-language/.
