package lexer

v0.5.0
Published: Oct 31, 2020 License: Apache-2.0 Imports: 4 Imported by: 0

Documentation

Overview

Package lexer is a subpackage of ptk that contains various implementations of ILexer, an interface describing lexers, along with related support types and code such as Token. A lexer is an object that can be queried for the next token in a token stream; generally, a lexer takes a character stream, such as that generated by the scanners in the scanner subpackage, and groups its characters into words with semantic meaning. Each query returns a Token, which contains the token type, the token's location (to be used for error reporting), and any meaning associated with the token, such as the numerical value of a numeric literal.
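
For example, a minimal loop that drains a lexer might look like the following sketch, where lex is assumed to be any ILexer implementation:

for tok := lex.Next(); tok != nil; tok = lex.Next() {
	fmt.Printf("%s at %v: %v\n", tok.Type, tok.Loc, tok.Value)
}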

Constants

View Source
const ChanLexerSize = 20

ChanLexerSize is the buffer size of the channel used by ChanLexer.

View Source
const TrackAll = -1

TrackAll is a special value for the max argument to BackTracker.SetMax that indicates the desire to track all characters.

Variables

This section is empty.

Functions

This section is empty.

Types

type BackTracker added in v0.4.0

type BackTracker struct {
	Src scanner.Scanner // The source scanner
	// contains filtered or unexported fields
}

BackTracker is an implementation of scanner.Scanner that includes backtracking capability. A BackTracker wraps another scanner.Scanner (including another instance of BackTracker), but provides additional methods for controlling backtracking.

func NewBackTracker

func NewBackTracker(src scanner.Scanner, max int) *BackTracker

NewBackTracker wraps another scanner (which may also be a BackTracker, if desired) in a BackTracker. The max parameter indicates the maximum number of characters to track; use 0 to track no characters, and TrackAll to track all characters.

func (*BackTracker) Accept added in v0.4.0

func (bt *BackTracker) Accept(leave int)

Accept accepts characters from the backtracking queue, leaving only the specified number of characters on the queue.

func (*BackTracker) BackTrack added in v0.4.0

func (bt *BackTracker) BackTrack()

BackTrack resets to the beginning of the backtracking queue.

func (*BackTracker) Len added in v0.4.0

func (bt *BackTracker) Len() int

Len returns the number of characters saved so far on the backtracking queue.

func (*BackTracker) More added in v0.4.0

func (bt *BackTracker) More() bool

More is used to determine if there are any more characters available for Next to return, given the current state of the BackTracker.

func (*BackTracker) Next added in v0.5.0

func (bt *BackTracker) Next() (ch scanner.Char, err error)

Next returns the next character from the stream as a Char, which will include the character's location. If an error was encountered, that will also be returned.

func (*BackTracker) Pos added in v0.4.0

func (bt *BackTracker) Pos() int

Pos returns the position of the most recently returned character within the saved character list.

func (*BackTracker) SetMax added in v0.4.0

func (bt *BackTracker) SetMax(max int)

SetMax allows updating the maximum number of characters to allow backtracking over. Setting a TrackAll value will allow all newly returned characters to be backtracked over. If the new value for max is less than the previous value, characters at the front of the backtracking queue will be discarded to bring the size down to max.
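
The following sketch shows how these methods interact; src is assumed to be any scanner.Scanner obtained from the scanner subpackage:

bt := NewBackTracker(src, TrackAll)

// Read two characters; both are saved on the backtracking queue.
ch1, _ := bt.Next()
ch2, _ := bt.Next()

// Accept all but the most recent character; only the second
// character remains available for backtracking.
bt.Accept(1)

// Rewind to the beginning of the queue; Next now replays the
// saved character before drawing new ones from src.
bt.BackTrack()
replayed, _ := bt.Next() // same character as ch2

fmt.Println(ch1, ch2, replayed)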

type BaseState added in v0.5.0

type BaseState struct {
	Cls Classifier // The classifier for the lex
}

BaseState is a basic implementation of the State interface. It assumes a fixed Classifier for the lifetime of the lexer's operation.
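
A minimal sketch of constructing the state, where myClassifier is a hypothetical Classifier implementation:

state := &BaseState{Cls: myClassifier}
lex := New(src, state) // src is a scanner.Scanner from the scanner subpackage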

func (*BaseState) Classifier added in v0.5.0

func (bs *BaseState) Classifier() Classifier

Classifier returns the classifier to use. For BaseState, this is always the fixed Classifier stored in the Cls field.

type ChanLexer added in v0.5.0

type ChanLexer struct {
	Chan chan *Token // The input channel
}

ChanLexer is a trivial implementation of ILexer that uses a channel to retrieve tokens. It provides an extra Push method that allows pushing tokens onto the lexer, as well as a Done method to signal the lexer that all tokens have been pushed.

func NewChanLexer added in v0.5.0

func NewChanLexer() *ChanLexer

NewChanLexer constructs and returns a new ChanLexer.

func (*ChanLexer) Done added in v0.5.0

func (q *ChanLexer) Done()

Done indicates to the lexer that there will be no more tokens pushed onto the queue.

func (*ChanLexer) Next added in v0.5.0

func (q *ChanLexer) Next() *Token

Next returns the next token; at the end of the token stream, nil is returned.

func (*ChanLexer) Push added in v0.5.0

func (q *ChanLexer) Push(tok *Token) (ok bool)

Push pushes a token onto the lexer. It returns true if the push was successful; it will return false if Done has been called.
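
A sketch of the intended producer/consumer pattern; the token contents are illustrative only:

lex := NewChanLexer()

go func() {
	lex.Push(&Token{Type: "IDENT", Text: "foo"})
	lex.Push(&Token{Type: "NUMBER", Text: "42", Value: 42})
	lex.Done() // after this, Push returns false and Next drains to nil
}()

for tok := lex.Next(); tok != nil; tok = lex.Next() {
	fmt.Println(tok)
}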

type Classifier

type Classifier interface {
	// Classify takes the lexer (through which the state and the
	// backtracking scanner are available) and determines one or
	// more recognizers to extract a token or a set of tokens
	// from the lexer input.
	Classify(lexer *Lexer) []Recognizer

	// Error is called by the lexer if all recognizers returned by
	// Classify return without success.
	Error(lexer *Lexer)
}

Classifier represents a character classification tool. A classifier has a Classify method that takes the lexer (through which the state and the backtracking scanner are available) and returns a list of recognizers, which the lexer then runs in order until one of them succeeds.

type IBackTracker added in v0.5.0

type IBackTracker interface {
	scanner.Scanner

	// More is used to determine if there are any more characters
	// available for Next to return, given the current state of
	// the BackTracker.
	More() bool

	// SetMax allows updating the maximum number of characters to
	// allow backtracking over.  Setting a TrackAll value will
	// allow all newly returned characters to be backtracked over.
	// If the new value for max is less than the previous value,
	// characters at the front of the backtracking queue will be
	// discarded to bring the size down to max.
	SetMax(max int)

	// Accept accepts characters from the backtracking queue,
	// leaving only the specified number of characters on the
	// queue.
	Accept(leave int)

	// Len returns the number of characters saved so far on the
	// backtracking queue.
	Len() int

	// Pos returns the position of the most recently returned
	// character within the saved character list.
	Pos() int

	// BackTrack resets to the beginning of the backtracking
	// queue.
	BackTrack()
}

IBackTracker is an interface for a backtracker, a scanner.Scanner that also provides the ability to back up to an earlier character in the stream.

type ILexer added in v0.5.0

type ILexer interface {
	// Next returns the next token.  At the end of the lexer, a
	// nil should be returned.
	Next() *Token
}

ILexer presents a stream of tokens. The basic lexer does not provide token pushback.

func NewAsyncLexer added in v0.5.0

func NewAsyncLexer(ts ILexer) ILexer

NewAsyncLexer wraps another lexer and uses the ChanLexer to allow running that other lexer in a separate goroutine.
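
A sketch of wrapping an existing lexer; inner is assumed to be any ILexer:

async := NewAsyncLexer(inner)
for tok := async.Next(); tok != nil; tok = async.Next() {
	// The wrapped lexer runs in its own goroutine; handle tok here.
}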

type Lexer

type Lexer struct {
	Scanner IBackTracker // The character source, wrapped in a BackTracker
	State   State        // The state of the lexer
	// contains filtered or unexported fields
}

Lexer is an implementation of ILexer.

func New

func New(src scanner.Scanner, state State) *Lexer

New constructs a new Lexer using the provided source and state.

func (*Lexer) Next added in v0.5.0

func (l *Lexer) Next() *Token

Next returns the next token; at the end of the token stream, nil is returned.

func (*Lexer) Push added in v0.5.0

func (l *Lexer) Push(tok *Token) bool

Push pushes a token onto the list of tokens to be returned by the lexer. Recognizers should call this method with the token or tokens that they recognize from the input.

type ListLexer added in v0.5.0

type ListLexer struct {
	// contains filtered or unexported fields
}

ListLexer is an implementation of ILexer that is initialized with a list of tokens and simply returns those tokens in sequence.

func NewListLexer added in v0.5.0

func NewListLexer(toks []*Token) *ListLexer

NewListLexer returns a Lexer that retrieves its tokens from a list passed to the function. This actually uses a ChanLexer under the covers.

func (*ListLexer) Next added in v0.5.0

func (lts *ListLexer) Next() *Token

Next returns the next token; at the end of the token stream, nil is returned.
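
A sketch of constructing a ListLexer; the tokens shown are illustrative only:

lex := NewListLexer([]*Token{
	{Type: "IDENT", Text: "x"},
	{Type: "OP", Text: "="},
	{Type: "NUMBER", Text: "1", Value: 1},
})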

type Recognizer

type Recognizer interface {
	// Recognize is called by the lexer on the objects returned by
	// the Classifier.  Each will be called in turn until one of
	// the methods returns a boolean true value.  If no recognizer
	// returns true, or if the Classifier returns an empty list,
	// then the Error recognizer will be called, if one is
	// declared, after which the character will be discarded.  The
	// Recognize method is called with the lexer; the state and
	// the backtracking scanner are available through the lexer.
	Recognize(lexer *Lexer) bool
}

Recognizer describes a recognizer. A recognizer is an object returned by the Classify method of a Classifier; its Recognize method is passed the lexer, through which the state and the backtracking scanner are available, and it should read input from the backtracker until it has a complete lexeme (think "word" in your grammar). Assuming the lexeme is a valid token (a comment or a run of whitespace would not be), the Recognize method should then use Lexer.Push to push one or more tokens.
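
To show how the pieces fit together, here is a minimal sketch of a Classifier and Recognizer pair that turns runs of ASCII digits into a hypothetical "NUMBER" token. The Rune field on scanner.Char is an assumed name, and the exact backtracking discipline expected of recognizers may differ; treat this as illustrative only:

type digitClassifier struct{}

func (digitClassifier) Classify(l *Lexer) []Recognizer {
	// Always try the digit recognizer; a real classifier would
	// inspect the input to choose among several recognizers.
	return []Recognizer{digitRecognizer{}}
}

func (digitClassifier) Error(l *Lexer) {
	// Called when no recognizer succeeds; report or discard here.
}

type digitRecognizer struct{}

func (digitRecognizer) Recognize(l *Lexer) bool {
	var buf []rune
	for l.Scanner.More() {
		ch, err := l.Scanner.Next()
		if err != nil {
			break
		}
		// ch.Rune is an assumed field name on scanner.Char.
		if ch.Rune < '0' || ch.Rune > '9' {
			// Leave the non-digit on the queue and rewind so it
			// is read again as the start of the next token.
			l.Scanner.Accept(1)
			l.Scanner.BackTrack()
			break
		}
		buf = append(buf, ch.Rune)
	}
	if len(buf) == 0 {
		return false // let another recognizer (or Error) handle it
	}
	l.Push(&Token{Type: "NUMBER", Text: string(buf)})
	return true
}

The lexer would then be constructed and driven as usual:

lex := New(src, &BaseState{Cls: digitClassifier{}})
for tok := lex.Next(); tok != nil; tok = lex.Next() {
	fmt.Println(tok)
}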

type State

type State interface {
	// Classifier must return the classifier to use.  It is safe
	// for the application to return different Classifier
	// implementations depending on the lexer state.
	Classifier() Classifier
}

State represents the state of the lexer. This is an interface; an implementation must be provided by the user. A base implementation is available as BaseState.

type Token added in v0.5.0

type Token struct {
	Type  string           // The type of token
	Loc   scanner.Location // The location of the token
	Value interface{}      // The semantic value of the token; optional
	Text  string           // The original text of the token; optional
}

Token represents a single token emitted by the lexical analyzer. A token has an associated type (its grammar symbol), a location, and optionally the original text and a semantic value.

func (*Token) Location added in v0.5.0

func (t *Token) Location() scanner.Location

Location returns the token's location range.

func (*Token) String added in v0.5.0

func (t *Token) String() string

String returns a string describing the token. This includes the token's location range.
