Documentation ¶
Index ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type CharacterClass ¶
type CharacterClass interface {
	// Matches returns whether the given rune is matched by this character
	// class.
	Matches(rune) bool

	String() string
}
CharacterClass is an interface providing methods for matching runes.
Implementations are

lexer.StringCharacterClass
lexer.NotStringCharacterClass
Both of these can be used to define constant character classes. See their documentation for more information.
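The idea behind the two implementations can be sketched in a few lines. The types below are minimal self-contained stand-ins for lexer.StringCharacterClass and lexer.NotStringCharacterClass, written out here only so the example runs on its own; they are not the package's own types.

```go
package main

import (
	"fmt"
	"strings"
)

// stringClass matches runes contained in its definition string,
// mirroring the idea behind lexer.StringCharacterClass.
type stringClass string

func (s stringClass) Matches(r rune) bool { return strings.ContainsRune(string(s), r) }

// notStringClass matches runes NOT contained in its definition string,
// mirroring the idea behind lexer.NotStringCharacterClass.
type notStringClass string

func (s notStringClass) Matches(r rune) bool { return !strings.ContainsRune(string(s), r) }

func main() {
	digits := stringClass("0123456789")
	nonBlank := notStringClass(" \t")
	fmt.Println(digits.Matches('7'), digits.Matches('x'))     // true false
	fmt.Println(nonBlank.Matches('a'), nonBlank.Matches(' ')) // true false
}
```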
type Lexer ¶
type Lexer interface {
	// StartLexing will cause the lexer to start pushing tokens onto the token
	// stream. See the documentation of the implementing struct for information
	// on how to use this.
	StartLexing()

	// TokenStream returns the token stream that the lexer will push tokens
	// onto.
	TokenStream() token.Stream

	// Emit pushes a token of the given type with its position and all consumed
	// runes onto the token stream.
	Emit(token.Type)

	// EmitError emits an error token with the given error token type (that was
	// defined by you) and a given error message.
	EmitError(token.Type, string)

	// IsEOF determines whether the lexer has already reached the end of the
	// input.
	IsEOF() bool

	// Peek reads the next rune, but does not consume it. Peek does not advance
	// the lexer position in the input.
	Peek() rune

	// Next reads the next rune and consumes it. Next advances the lexer
	// position in the input by the byte-width of the read rune.
	Next() rune

	// Ignore discards all consumed runes. This behaves like Emit(...), except
	// it doesn't create/push a token onto the token stream.
	Ignore()

	// Backup unreads the last consumed rune.
	Backup()

	// Accept consumes the next rune, if and only if it is matched by the given
	// character class. Accept returns true if the next rune was matched and
	// consumed.
	Accept(CharacterClass) bool

	// AcceptMultiple consumes the next N runes that are matched by the given
	// character class. AcceptMultiple returns the number of runes that were
	// matched.
	AcceptMultiple(CharacterClass) uint
}
Lexer is an interface providing all necessary methods for lexing text. There is a default implementation for UTF-8 input, which can be used as follows.
func main() {
	l := lexer.New(input, lexRoot)
	_ = l
}
See the examples in the godoc for more information.
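The contract of Peek, Next, Accept and AcceptMultiple can be illustrated with a stripped-down stand-in lexer. The miniLexer and class types below are illustrative sketches, not the package's default implementation, and cover only a subset of the interface.

```go
package main

import "fmt"

// class is a tiny character class: it matches runes contained in its string.
type class string

func (c class) matches(r rune) bool {
	for _, m := range c {
		if m == r {
			return true
		}
	}
	return false
}

// miniLexer is a minimal stand-in that walks a slice of runes.
type miniLexer struct {
	input []rune
	pos   int
}

// Peek returns the next rune without consuming it (-1 at EOF in this sketch).
func (l *miniLexer) Peek() rune {
	if l.pos >= len(l.input) {
		return -1
	}
	return l.input[l.pos]
}

// Next returns the next rune and consumes it.
func (l *miniLexer) Next() rune {
	r := l.Peek()
	if r != -1 {
		l.pos++
	}
	return r
}

// Accept consumes the next rune only if the class matches it.
func (l *miniLexer) Accept(c class) bool {
	if c.matches(l.Peek()) {
		l.pos++
		return true
	}
	return false
}

// AcceptMultiple consumes matching runes until one fails to match,
// returning how many were consumed.
func (l *miniLexer) AcceptMultiple(c class) uint {
	var n uint
	for l.Accept(c) {
		n++
	}
	return n
}

func main() {
	l := &miniLexer{input: []rune("  42")}
	fmt.Println(l.AcceptMultiple(class(" \t"))) // 2 (both leading spaces)
	fmt.Println(l.Accept(class("0123456789"))) // true (consumes '4')
	fmt.Println(string(l.Next()))              // 2
}
```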
type NotStringCharacterClass ¶
type NotStringCharacterClass string
NotStringCharacterClass is an implementation of lexer.CharacterClass, which matches runes that are NOT contained in the string used to define the character class.
const NotSpaceOrTab = lexer.NotStringCharacterClass(" \t") // will match all runes that are neither ' ' nor '\t'
func (NotStringCharacterClass) Matches ¶
func (s NotStringCharacterClass) Matches(r rune) bool
Matches returns true if the given rune is NOT contained inside the definition of this character class.
func (NotStringCharacterClass) String ¶
func (s NotStringCharacterClass) String() string
type State ¶
State is a recursive definition of a lexer state: each state returns the next state to execute. Despite this recursive definition, states are executed non-recursively.
See the following example. The goal is to lex strings that match exactly "ABC". The input should be tokenized into three tokens: TokenA, TokenB and TokenC. The following example shows how to define states to achieve this (without error handling, just to show the sequence).
const (
	TokenA MyTokenType = iota
	TokenB
	TokenC
)

const (
	CCA = lexer.StringCharacterClass("A")
	CCB = lexer.StringCharacterClass("B")
	CCC = lexer.StringCharacterClass("C")
)

func lexABCString(l lexer.Lexer) lexer.State {
	return lexA
}

func lexA(l lexer.Lexer) lexer.State {
	l.Accept(CCA)
	l.Emit(TokenA)
	return lexB
}

func lexB(l lexer.Lexer) lexer.State {
	l.Accept(CCB)
	l.Emit(TokenB)
	return lexC
}

func lexC(l lexer.Lexer) lexer.State {
	l.Accept(CCC)
	l.Emit(TokenC)
	return nil
}
The lexer will start with lexABCString (assuming that this is the start State you passed when creating the lexer) and execute it. The lexer passed in is the lexer you are working with. lexABCString does nothing with the lexer and returns lexA as the next state. The lexer will execute lexA next, passing in the same lexer as for lexABCString. lexA accepts an "A", emits a TokenA and returns lexB. The lexer will now execute lexB, which does almost the same as lexA. lexB then returns lexC, which will cause the lexer to execute lexC next. lexC returns nil as the next state, which tells the lexer that the state machine is done; the lexer will stop execution and close the token stream.
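The non-recursive execution described above boils down to a small driving loop: call the current state, follow the state it returns, and stop at nil. The sketch below assumes a function-typed state of the form func(*lexer) state; the lexer and state types here are simplified stand-ins, not the package's own.

```go
package main

import "fmt"

// lexer is a simplified stand-in that only collects emitted token names.
type lexer struct{ tokens []string }

// state mirrors the recursive shape of lexer.State: a function returning
// the next state, or nil when lexing is finished.
type state func(*lexer) state

func lexA(l *lexer) state { l.tokens = append(l.tokens, "TokenA"); return lexB }
func lexB(l *lexer) state { l.tokens = append(l.tokens, "TokenB"); return lexC }
func lexC(l *lexer) state { l.tokens = append(l.tokens, "TokenC"); return nil }

func main() {
	l := &lexer{}
	// The trampoline loop: each iteration runs one state and follows the
	// returned state. No recursion, so the call stack never grows.
	for st := state(lexA); st != nil; st = st(l) {
	}
	fmt.Println(l.tokens) // [TokenA TokenB TokenC]
}
```

This loop is why the recursive-looking definition runs in constant stack space: states hand control back to the driver instead of calling each other directly.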
type StringCharacterClass ¶
type StringCharacterClass string
StringCharacterClass is an implementation of lexer.CharacterClass, which matches runes that are contained in the string used to define the character class.
const WhitespaceNoLinefeed = lexer.StringCharacterClass(" \t") // will match all runes that are either ' ' or '\t'
func (StringCharacterClass) Matches ¶
func (s StringCharacterClass) Matches(r rune) bool
Matches returns true if the given rune is contained inside the definition of this character class.
func (StringCharacterClass) String ¶
func (s StringCharacterClass) String() string