package lexer

v0.0.0-...-bf3c7c9
Warning

This package is not in the latest version of its module.

Published: Feb 23, 2023 License: Apache-2.0 Imports: 8 Imported by: 0

Documentation

Overview

Package lexer contains a lexer for the Hydra parser. The lexer takes the output produced by the scanner and organizes the characters into labeled tokens. A token consists of a symbol, the physical file location of that symbol (a range from a starting line and column to an ending line and column, with the end position exclusive), and the semantic value of the token. For example, a string token, TokString, carries the decoded, de-escaped value of the string literal as its semantic value.
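The token structure described above can be sketched in Go as follows. The Token and Location types, their field names, and the Width helper are hypothetical illustrations, not this package's actual API; the example only shows how a half-open location range and a decoded semantic value fit together.

```go
package main

import "fmt"

// Location is a hypothetical sketch of a token's physical position. The
// range runs from (StartLine, StartCol) up to, but not including,
// (EndLine, EndCol).
type Location struct {
	File                string
	StartLine, StartCol int
	EndLine, EndCol     int
}

// Token pairs a symbol with its location and semantic value. The field
// names are illustrative only, not the package's real API.
type Token struct {
	Sym   string      // symbol, e.g. "TokString"
	Loc   Location    // where the symbol appears in the source
	Value interface{} // semantic value, e.g. the de-escaped string
}

// Width returns the number of columns a single-line token spans. Because
// the end column is exclusive, this is simply EndCol - StartCol.
func (l Location) Width() int {
	return l.EndCol - l.StartCol
}

func main() {
	// The raw literal `"a\tb"` is 6 characters long and starts at column 5,
	// so its exclusive end column is 11. Its semantic value is the
	// 3-character de-escaped string "a<TAB>b".
	tok := Token{
		Sym:   "TokString",
		Loc:   Location{File: "example", StartLine: 1, StartCol: 5, EndLine: 1, EndCol: 11},
		Value: "a\tb",
	}
	fmt.Println(tok.Sym, tok.Loc.Width(), len(tok.Value.(string)))
}
```

Because the end column is exclusive, a single-line token's width needs no off-by-one adjustment.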

To perform its work, the lexer relies on recognizers, which implement the Recognizer interface defined in recognizers.go. Splitting the work this way greatly simplifies unit testing: the code that recognizes individual token types can be mocked out when testing the lexer, and each recognizer can be tested in isolation. The recognizers are not completely independent, however, and that shapes how the work is divided: a string with flags is first passed to the recognizer for identifiers, so that recognizer must be able to interface with the recognizer for strings.
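The testing benefit can be illustrated with a minimal sketch, assuming a simplified AugChar that carries only a rune and a hypothetical dispatch function standing in for the lexer's main loop. A mock recognizer records the characters it receives, so a test can check which recognizer was chosen without running any real recognition logic. Unlike real recognizers, these mocks consume only the single character they are given.

```go
package main

import "fmt"

// AugChar is a simplified stand-in for common.AugChar; the real type
// carries more information (such as location), which is omitted here.
type AugChar struct {
	C rune
}

// Recognizer mirrors the interface documented in this package.
type Recognizer interface {
	Recognize(ch AugChar)
}

// mockRecognizer records each character it is asked to recognize and
// pushes a fixed token name onto a shared queue.
type mockRecognizer struct {
	tok   string
	seen  []rune
	queue *[]string
}

func (m *mockRecognizer) Recognize(ch AugChar) {
	m.seen = append(m.seen, ch.C)
	*m.queue = append(*m.queue, m.tok)
}

// dispatch is a hypothetical stand-in for the lexer's main loop. For
// simplicity it treats every character as the start of a construct and
// picks a recognizer based on that character.
func dispatch(input string, digits, other Recognizer) {
	for _, r := range input {
		if r >= '0' && r <= '9' {
			digits.Recognize(AugChar{C: r})
		} else {
			other.Recognize(AugChar{C: r})
		}
	}
}

func main() {
	var queue []string
	num := &mockRecognizer{tok: "TokNumber", queue: &queue}
	id := &mockRecognizer{tok: "TokIdent", queue: &queue}
	dispatch("a1", num, id)
	fmt.Println(queue, string(num.seen), string(id.seen))
}
```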

The lexer is highly configurable thanks to its use of a Profile (see hydra/parser/common.Profile), which allows string flags, string escapes, string quote characters, keywords, and operators to be specified dynamically and even changed on the fly. As a result, one lexer can process different versions of the Hydra language without a custom lexer for each version and without ad-hoc special cases added to accommodate them.
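As a rough illustration of profile-driven lexing, the sketch below reduces a profile to a keyword set. Profile here is a hypothetical stand-in for hydra/parser/common.Profile, and the keywords are invented for the example. The point is only that the same classification code yields different tokens depending on which profile is active, and that changing the profile takes effect immediately.

```go
package main

import "fmt"

// Profile is a hypothetical, heavily simplified stand-in for
// hydra/parser/common.Profile that holds only a keyword set.
type Profile struct {
	Keywords map[string]bool
}

// classify returns "keyword" or "identifier" for a word, consulting
// whatever the profile contains at the time of the call.
func classify(p *Profile, word string) string {
	if p.Keywords[word] {
		return "keyword"
	}
	return "identifier"
}

func main() {
	// Hypothetical keyword sets for two versions of a language.
	v1 := &Profile{Keywords: map[string]bool{"if": true, "else": true}}
	v2 := &Profile{Keywords: map[string]bool{"if": true, "else": true, "match": true}}

	fmt.Println(classify(v1, "match")) // identifier
	fmt.Println(classify(v2, "match")) // keyword
}
```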

Index

Constants

const (
	NumInt   uint8 = 1 << iota // Number may be an integer
	NumFloat                   // Number may be a float
	NumWhole                   // Collecting the whole part of a float/int
	NumFract                   // Collecting the fraction
	NumExp                     // Collecting the exponent
	NumSign                    // Sign allowed next

	NumType  = NumInt | NumFloat            // Number type
	NumState = NumWhole | NumFract | NumExp // Number state
)

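Flags that track the state needed by the number recognizer: which number types (integer, float) are still possible, which part of the number is being collected, and whether a sign may come next.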
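The sketch below shows one plausible way these flags could drive a scanner for decimal literals. classifyNumber is hypothetical and is not the package's number recognizer. It starts with both NumInt and NumFloat set, clears NumInt once a fraction or exponent appears, and uses NumSign to permit a sign only immediately after the exponent marker. Leading signs are rejected on the assumption that they are lexed as operators.

```go
package main

import "fmt"

// Flags copied from the package documentation.
const (
	NumInt   uint8 = 1 << iota // Number may be an integer
	NumFloat                   // Number may be a float
	NumWhole                   // Collecting the whole part of a float/int
	NumFract                   // Collecting the fraction
	NumExp                     // Collecting the exponent
	NumSign                    // Sign allowed next

	NumType  = NumInt | NumFloat            // Number type
	NumState = NumWhole | NumFract | NumExp // Number state
)

// classifyNumber is a hypothetical sketch of a flag-driven number scanner.
// It returns the remaining NumType bits, or 0 if s is not a valid number.
func classifyNumber(s string) uint8 {
	flags := NumInt | NumFloat | NumWhole
	needDigit := true // a digit is still required (at the start, after 'e')
	for _, c := range s {
		switch {
		case c >= '0' && c <= '9':
			flags &^= NumSign // a sign may only directly follow 'e'
			needDigit = false
		case c == '.' && flags&NumState == NumWhole && !needDigit:
			// A fraction rules out an integer.
			flags = flags&^(NumState|NumInt) | NumFract
		case (c == 'e' || c == 'E') && flags&NumState != NumExp && !needDigit:
			// An exponent also rules out an integer and allows a sign.
			flags = flags&^(NumState|NumInt) | NumExp | NumSign
			needDigit = true
		case (c == '+' || c == '-') && flags&NumSign != 0:
			flags &^= NumSign
		default:
			return 0
		}
	}
	if needDigit {
		return 0
	}
	return flags & NumType
}

func main() {
	for _, s := range []string{"42", "4.2", "1e-3", "1e", "-1"} {
		fmt.Printf("%-5s -> %d\n", s, classifyNumber(s))
	}
}
```

An integer literal keeps both NumInt and NumFloat (it could be read as either), a '.' or exponent leaves only NumFloat, and invalid input yields 0.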

const (
	SkipLeadFF uint8 = 1 << iota // Skip leading form feeds
	SkipNL                       // Skip newlines as well
)

Flags that may be given to skipSpaces.
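skipSpaces is unexported and its signature is not shown on this page, so the following is a sketch under assumptions: the hypothetical version below takes a string and a starting index, always skips spaces and tabs, skips form feeds only at the starting position when SkipLeadFF is set, and skips newlines only when SkipNL is set. The real helper may behave differently.

```go
package main

import "fmt"

// Flags copied from the package documentation.
const (
	SkipLeadFF uint8 = 1 << iota // Skip leading form feeds
	SkipNL                       // Skip newlines as well
)

// skipSpaces is a hypothetical sketch of the helper these flags are passed
// to. It returns the index of the first character at or after i that is
// not skipped.
func skipSpaces(s string, i int, flags uint8) int {
	// Form feeds are skipped only before any other character.
	if flags&SkipLeadFF != 0 {
		for i < len(s) && s[i] == '\f' {
			i++
		}
	}
	for i < len(s) {
		switch c := s[i]; {
		case c == ' ' || c == '\t':
			i++
		case c == '\n' && flags&SkipNL != 0:
			i++
		default:
			return i
		}
	}
	return i
}

func main() {
	src := "\f  \n x"
	fmt.Println(skipSpaces(src, 0, 0))                 // stops at the form feed
	fmt.Println(skipSpaces(src, 0, SkipLeadFF))        // stops at the newline
	fmt.Println(skipSpaces(src, 0, SkipLeadFF|SkipNL)) // stops at 'x'
}
```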

Variables

var NumFlags = utils.FlagSet8{
	NumInt:   "integer",
	NumFloat: "float",
	NumWhole: "whole state",
	NumFract: "fraction state",
	NumExp:   "exponent state",
	NumSign:  "sign allowed",
}

NumFlags provides a mapping between number flags and the string describing them.
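utils.FlagSet8 is not documented on this page. Assuming it is essentially a map from single-bit uint8 flags to names (which the composite literals above suggest), a hypothetical Describe method could render a flag value for debugging output as sketched below. The real type's methods may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// FlagSet8 is a hypothetical stand-in for utils.FlagSet8: a map from
// single-bit uint8 flags to descriptive names.
type FlagSet8 map[uint8]string

// Describe renders the names of the bits set in v, lowest bit first,
// joined by " | ". Set bits without a name are shown in hex.
func (fs FlagSet8) Describe(v uint8) string {
	var parts []string
	// bit becomes 0 after shifting past 0x80, ending the loop.
	for bit := uint8(1); bit != 0; bit <<= 1 {
		if v&bit == 0 {
			continue
		}
		if name, ok := fs[bit]; ok {
			parts = append(parts, name)
		} else {
			parts = append(parts, fmt.Sprintf("%#x", bit))
		}
	}
	return strings.Join(parts, " | ")
}

// Flags and names copied from the package documentation.
const (
	NumInt   uint8 = 1 << iota // Number may be an integer
	NumFloat                   // Number may be a float
	NumWhole                   // Collecting the whole part of a float/int
	NumFract                   // Collecting the fraction
	NumExp                     // Collecting the exponent
	NumSign                    // Sign allowed next
)

var NumFlags = FlagSet8{
	NumInt:   "integer",
	NumFloat: "float",
	NumWhole: "whole state",
	NumFract: "fraction state",
	NumExp:   "exponent state",
	NumSign:  "sign allowed",
}

func main() {
	fmt.Println(NumFlags.Describe(NumFloat | NumExp | NumSign))
}
```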

var SkipFlags = utils.FlagSet8{
	SkipLeadFF: "skip leading form feeds",
	SkipNL:     "skip newlines",
}

SkipFlags is a mapping of skip flags to names.

Functions

func Lex

func Lex(opts *common.Options, s common.Scanner) (common.Lexer, error)

Lex prepares a new lexer from the parser options and the scanner. If the scanner is nil, one will be constructed from the options.

Types

type RecogInit

type RecogInit func(l *lexer) Recognizer

RecogInit is a function that initializes a recognizer. It is passed the lexer object and must return a Recognizer.
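A minimal sketch of how initializers of this shape might be used, assuming a pared-down lexer that holds only a token queue and its recognizers. newLexer and charRecognizer are hypothetical names. The example illustrates why a RecogInit receives the lexer: each recognizer keeps a back-pointer so it can push tokens onto the shared queue.

```go
package main

import "fmt"

// AugChar is a simplified stand-in for common.AugChar.
type AugChar struct{ C rune }

// Recognizer mirrors the interface documented in this package.
type Recognizer interface {
	Recognize(ch AugChar)
}

// lexer is a hypothetical, minimal stand-in for the package's unexported
// lexer type: just a token queue and the installed recognizers.
type lexer struct {
	tokens []string
	recogs []Recognizer
}

// RecogInit matches the documented signature.
type RecogInit func(l *lexer) Recognizer

// charRecognizer keeps a back-pointer to the lexer it was initialized
// with, which lets it push tokens onto that lexer's queue.
type charRecognizer struct {
	l   *lexer
	tok string
}

func (r *charRecognizer) Recognize(ch AugChar) {
	r.l.tokens = append(r.l.tokens, fmt.Sprintf("%s(%c)", r.tok, ch.C))
}

// newLexer runs each initializer against the new lexer, so every
// recognizer is bound to the same instance.
func newLexer(inits ...RecogInit) *lexer {
	l := &lexer{}
	for _, ri := range inits {
		l.recogs = append(l.recogs, ri(l))
	}
	return l
}

func main() {
	opInit := func(l *lexer) Recognizer { return &charRecognizer{l: l, tok: "TokOp"} }
	l := newLexer(opInit)
	l.recogs[0].Recognize(AugChar{C: '+'})
	fmt.Println(l.tokens) // [TokOp(+)]
}
```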

type Recognizer

type Recognizer interface {
	// Recognize is called to recognize a lexical construct.  It is
	// called with the first character of the construct and should
	// push zero or more tokens onto the lexer's token queue.
	Recognize(ch common.AugChar)
}

Recognizer is the interface implemented by recognizers. A recognizer is initialized with the lexer object and implements the logic necessary to recognize a sequence of characters from the scanner.

Note: some recognizers implement additional state; for instance, the string recognizer has state designed to interact with the recognizer for identifiers, to allow string flags to be recognized and processed.
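The string/identifier interaction can be sketched as follows. All names here (stringRecog, IsFlags, lexWord) are hypothetical, the flag characters "r" and "b" are invented for the example (in the real lexer they would come from the Profile), scanning is ASCII-only, and escapes are not handled. The identifier recognizer scans a run of letters; if that run is followed by a quote and consists solely of string flags, it consults the string recognizer's state and emits a flagged string instead of an identifier.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// stringRecog is a hypothetical string recognizer exposing the extra state
// the identifier recognizer needs: the set of valid string flags.
type stringRecog struct {
	flags string // hypothetical flag characters, e.g. "rb"
}

// IsFlags reports whether word is non-empty and made only of flag runes.
func (s *stringRecog) IsFlags(word string) bool {
	for _, r := range word {
		if !strings.ContainsRune(s.flags, r) {
			return false
		}
	}
	return word != ""
}

// lexWord is a hypothetical identifier recognizer. It scans ASCII letters
// starting at src[i]; if they are followed by '"' and are all string
// flags, it reports a flagged string instead. It returns the token
// description and the index just past the consumed text.
func lexWord(src string, i int, sr *stringRecog) (string, int) {
	start := i
	for i < len(src) && unicode.IsLetter(rune(src[i])) {
		i++
	}
	word := src[start:i]
	if i < len(src) && src[i] == '"' && sr.IsFlags(word) {
		// Hand off to string handling: find the closing quote.
		if end := strings.IndexByte(src[i+1:], '"'); end >= 0 {
			body := src[i+1 : i+1+end]
			return fmt.Sprintf("TokString flags=%s value=%s", word, body), i + end + 2
		}
	}
	return "TokIdent " + word, i
}

func main() {
	sr := &stringRecog{flags: "rb"}
	fmt.Println(lexWord(`r"abc" foo`, 0, sr))
	fmt.Println(lexWord(`foo"x"`, 0, sr))
}
```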
