lexer

package module
v0.2.5
Published: Jan 5, 2016 · License: MIT · Imports: 5 · Imported by: 1

README

lexer

A basic lexer toolkit

The design is essentially Rob Pike's lexer from his talk "Lexical Scanning in Go"; see https://www.youtube.com/watch?v=HxaD_trXwRE

Documentation

Overview

Package lexer implements a simple lexing toolkit.

Index

Examples

Constants

const Eof rune = -1

This is returned by Next when there are no more characters to read.

Err is returned by Next when a bad (non-UTF-8) rune is encountered.

const MaxEmitsInFunction = 10

The maximum number of emits allowed in a single state function when using Token. If this number is reached, Token returns a StateError. To emit more than this, use the Go method and read tokens off the channel directly.
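
For instance, a sketch of a state function that may exceed the limit (the state name and tokenWord are hypothetical):

// Emits one token per word. With Lexer.Go this is fine, because tokens
// are drained from the channel as they are produced; with Lexer.Iterate,
// each call of this function may Emit at most MaxEmitsInFunction times.
func state_words(l *lexer.LexInner) lexer.StateFn {
	for {
		l.Whitespace("") // skip whitespace between words
		l.Ignore()
		if l.Eof() {
			return l.EmitEof()
		}
		if l.AcceptRun("abcdefghijklmnopqrstuvwxyz") == 0 {
			return l.Errorf("expected a word")
		}
		l.Emit(tokenWord) // each Emit counts toward the limit
	}
}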

Variables

This section is empty.

Functions

This section is empty.

Types

type Channel

type Channel <-chan Token

Generates tokens asynchronously. See Lexer.Go

type Iterator

type Iterator struct {
	// contains filtered or unexported fields
}

Generates tokens synchronously. See Lexer.Iterate

func (Iterator) Token

func (it Iterator) Token() (token Token)

Get a Token from the Lexer. Note that only MaxEmitsInFunction (10) tokens can be emitted in a single state function. If you wish to emit more per function, use the Go method.
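
A minimal sketch of synchronous use (text and state_base as in the package example below):

l := lexer.New("filename", text, state_base)
it := l.Iterate()
for {
	token := it.Token()
	if token.Typ == lexer.TokenEmpty {
		break // the lexer has stopped, by an error or Eof
	}
	fmt.Printf("[%d]%q\n", token.Typ, token.Val)
}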

type LexInner

type LexInner struct {
	// contains filtered or unexported fields
}

LexInner is the inner type which is used within StateFn to do the actual lexing.

func (*LexInner) Accept

func (l *LexInner) Accept(valid string) bool

Read one character, but only if it is one of the characters in the given string.

func (*LexInner) AcceptRun

func (l *LexInner) AcceptRun(valid string) (acceptnum int)

Read as many characters as possible, but only characters that exist in the given string.
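
For example, a sketch of lexing an optionally signed integer with Accept and AcceptRun (tokenNumber and state_base as in the package example below; the state name is hypothetical):

func state_int(l *lexer.LexInner) lexer.StateFn {
	l.Accept("+-") // optional sign; fine if absent
	if l.AcceptRun("0123456789") == 0 {
		return l.Errorf("expected at least one digit")
	}
	l.Emit(tokenNumber)
	return state_base
}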

func (*LexInner) Back

func (l *LexInner) Back()

Undo the last Next. This probably won't work after calling any other lexer function. If you need to undo more, use Mark and Unmark.

func (*LexInner) Bytes added in v0.2.5

func (l *LexInner) Bytes(number int) bool

Consume the given number of bytes. Returns true if successful, false if there are not enough bytes.

func (*LexInner) Emit

func (l *LexInner) Emit(typ TokenType)

Emit the gathered token, given its type. Emits the result of ReplaceGet, then calls Ignore.

func (*LexInner) EmitEof

func (l *LexInner) EmitEof() StateFn

Emit a token of type TokenEOF. Returns nil.

func (*LexInner) EmitString

func (l *LexInner) EmitString(typ TokenType, str string)

Emit a token with the given type and string.

func (*LexInner) Eof

func (l *LexInner) Eof() bool

Return true if the lexer has reached the end of the file.

func (*LexInner) Errorf

func (l *LexInner) Errorf(format string, args ...interface{}) StateFn

Emit an Error token. Like EmitEof, Errorf returns nil.

func (*LexInner) Except

func (l *LexInner) Except(valid string) bool

Read one character, but only if it is NOT one of the characters in the given string. If Eof or Err is reached, Except fails regardless of what the given string is.

func (*LexInner) ExceptRun

func (l *LexInner) ExceptRun(valid string) (acceptnum int)

Read as many characters as possible, but only characters that do NOT exist in the given string. If Eof is reached, ExceptRun stops as though it found a successful character. Thus, ExceptRun("") accepts everything until Eof or Err.
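
For instance, a sketch that consumes the rest of the input as a single token (the state name and tokenTrailer are hypothetical):

func state_trailer(l *lexer.LexInner) lexer.StateFn {
	l.ExceptRun("") // excludes nothing, so this reads until Eof
	l.Emit(tokenTrailer)
	return l.EmitEof()
}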

func (*LexInner) Find

func (l *LexInner) Find(valid string) bool

Accepts characters until the first occurrence of the given string. The string itself is not accepted.

func (*LexInner) Get

func (l *LexInner) Get() string

Get the string of the token gathered so far.

func (*LexInner) Ignore

func (l *LexInner) Ignore()

Ignore everything gathered about the token so far. Also removes any Replaces.

func (*LexInner) Last

func (l *LexInner) Last() rune

Get the last character accepted into the token.

func (*LexInner) Len

func (l *LexInner) Len() int

Return the length of the token gathered so far.

func (*LexInner) Mark

func (l *LexInner) Mark() Mark

Store the state of the lexer.

func (*LexInner) Next

func (l *LexInner) Next() (char rune)

Read a single character. If there are no more characters, it will return Eof. If a non-utf8 character is read, it will return Err.
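
A sketch of driving Next and Back by hand inside a state function (Run or AcceptRun is usually more convenient; tokenVariable as in the package example below):

// Gather lowercase letters one rune at a time.
for {
	r := l.Next()
	if r == lexer.Eof {
		break
	}
	if !unicode.IsLower(r) {
		l.Back() // put the rune back; Back directly after Next is safe
		break
	}
}
l.Emit(tokenVariable)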

func (*LexInner) One

func (l *LexInner) One(f func(rune) bool) bool

Accept a single character and return true if f returns true. Otherwise, do nothing and return false.

func (*LexInner) Peek

func (l *LexInner) Peek() rune

Spy on the upcoming rune without consuming it.

func (*LexInner) Replace added in v0.2.2

func (l *LexInner) Replace(start Mark, with string)

Replace the text from the start Mark to the current position with the given string. The replacement may be a different length than the text being replaced, but this change is not reflected by functions like Len and Get. Call ReplaceGet to get the token including its replaces; this is how it will be sent by Emit. The replace is part of the current Mark, so Unmarking to before a replace was done will remove the replace.

func (*LexInner) ReplaceGet added in v0.2.2

func (l *LexInner) ReplaceGet() string

Get the current token with all replaces included. This can be expensive if you have many replaces. Without any replaces, it is identical to Get.
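
A sketch that decodes a backslash-n escape in place, so that the emitted token contains a real newline (a fragment of a hypothetical string-lexing state function):

esc := l.Mark() // position just before the escape
if l.String("\\n") {
	l.Replace(esc, "\n") // Emit will send the replaced text
}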

func (*LexInner) Retry

func (l *LexInner) Retry()

Rewind to the start of the current token, so that everything gathered since then will be read again.

func (*LexInner) Run

func (l *LexInner) Run(f func(rune) bool) (acceptnum int)

Read characters, feeding each to the given function, and keep reading until it returns false.

func (*LexInner) Skip

func (l *LexInner) Skip(n int) int

Read n characters. Returns the number of characters read; if it is less than n, the lexer reached Eof.

func (*LexInner) String

func (l *LexInner) String(valid string) bool

Attempt to read a string. Returns true only if the entire string was successfully accepted. If only part of the string matched, none of it is accepted.

func (*LexInner) Unmark

func (l *LexInner) Unmark(mark Mark)

Recover the state of the lexer.
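
A sketch of speculative lexing with Mark and Unmark: try to read a float, and rewind if it does not pan out (token types and states as in the package example below, with tokenNumber standing in for a float type):

func state_maybe_float(l *lexer.LexInner) lexer.StateFn {
	m := l.Mark()
	if l.AcceptRun("0123456789") > 0 &&
		l.Accept(".") &&
		l.AcceptRun("0123456789") > 0 {
		l.Emit(tokenNumber)
		return state_base
	}
	l.Unmark(m) // restore the lexer to where we started
	return state_variable
}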

func (*LexInner) Warningf

func (l *LexInner) Warningf(format string, args ...interface{})

Emit a Warning token.

func (*LexInner) Whitespace

func (l *LexInner) Whitespace(except string) (acceptnum int)

Accepts any whitespace (unicode.IsSpace), except for whitespace in except. For instance, Whitespace("\n") will accept all whitespace except newlines. Returns the number of runes read.
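
For instance, in a line-oriented grammar (a sketch; tokenEol is hypothetical):

l.Whitespace("\n") // accept all whitespace except newlines
l.Ignore()         // discard it
if l.Accept("\n") {
	l.Emit(tokenEol) // a newline becomes its own token
}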

type Lexer

type Lexer struct {
	// contains filtered or unexported fields
}

Lexer is the external type which emits tokens.

Example
package main

import (
	"fmt"
	"unicode"

	"github.com/PieterD/lexer"
)

const (
	tokenComment lexer.TokenType = 1 + iota
	tokenVariable
	tokenAssign
	tokenNumber
	tokenString
)

func main() {
	text := `
/* comment */
pie=314
// comment
string = "Hello world!"
`
	l := lexer.New("filename", text, state_base)
	tokenchan := l.Go()
	for token := range tokenchan {
		fmt.Printf("%s:%d [%d]\"%s\"\n", token.File, token.Line, token.Typ, token.Val)
	}
}

// Start parsing with this.
func state_base(l *lexer.LexInner) lexer.StateFn {
	// Ignore all whitespace.
	l.Run(unicode.IsSpace)
	l.Ignore()
	if l.String("//") {
		// We're remembering the '//' here so it gets included in the Emit
		// contained in state_comment_line.
		return state_comment_line
	}
	if l.String("/*") {
		return state_comment_block(state_base)
	}
	if l.Eof() {
		return l.EmitEof()
	}
	// It's not a comment or Eof, so it must be a variable name.
	return state_variable
}

// Parse a line comment.
func state_comment_line(l *lexer.LexInner) lexer.StateFn {
	// Eat up everything until end of line (or Eof)
	l.ExceptRun("\n")
	l.Emit(tokenComment)
	// Consume the end of line. If we reached Eof, this does nothing.
	l.Accept("\n")
	// Ignore that last newline
	l.Ignore()
	return state_base
}

// Parse a block comment.
// Since block comments may appear in different states,
// instead of defining the usual StateFn we define a function that
// returns a statefn, which in turn will return the parent state
// after its parsing is done.
func state_comment_block(parent lexer.StateFn) lexer.StateFn {
	return func(l *lexer.LexInner) lexer.StateFn {
		if !l.Find("*/") {
			// If closing statement couldn't be found, emit an error.
			// Errorf always returns nil, so parsing is done after this.
			return l.Errorf("Couldn't find end of block comment")
		}
		l.String("*/")
		l.Emit(tokenComment)
		return parent
	}
}

// Parse a variable name
func state_variable(l *lexer.LexInner) lexer.StateFn {
	if l.AcceptRun("abcdefghijklmnopqrstuvwxyz") == 0 {
		return l.Errorf("Invalid variable name")
	}
	l.Emit(tokenVariable)

	return state_operator
}

// Parse an assignment operator
func state_operator(l *lexer.LexInner) lexer.StateFn {
	l.Run(unicode.IsSpace)
	l.Ignore()
	if l.Accept("=") {
		l.Emit(tokenAssign)
		return state_value
	}
	return l.Errorf("Only '=' is a valid operator")
}

// Parse a value
func state_value(l *lexer.LexInner) lexer.StateFn {
	l.Run(unicode.IsSpace)
	l.Ignore()
	if l.AcceptRun("0123456789") > 0 {
		l.Emit(tokenNumber)
		return state_base
	}
	if l.Accept("\"") {
		return state_string
	}
	return l.Errorf("Unidentified value")
}

// Parse a string
func state_string(l *lexer.LexInner) lexer.StateFn {
	for {
		l.ExceptRun("\"\\")
		// Now we're either at a ", a \, or Eof.
		if l.Accept("\"") {
			l.Emit(tokenString)
			return state_base
		}
		if l.Accept("\\") {
			if !l.Accept("nrt\"'\\") {
				return l.Errorf("Invalid escape sequence: \"\\%c\"", l.Last())
			}
		}
		if l.Eof() {
			return l.Errorf("No closing '\"' found")
		}
	}
}
Output:

filename:2 [1]"/* comment */"
filename:3 [2]"pie"
filename:3 [3]"="
filename:3 [4]"314"
filename:4 [1]"// comment"
filename:5 [2]"string"
filename:5 [3]"="
filename:5 [5]""Hello world!""
filename:5 [-3]"EOF"

func New

func New(name string, input string, start_state StateFn) *Lexer

Create a new lexer.

func (*Lexer) Go

func (ln *Lexer) Go() Channel

Spawn a goroutine which keeps sending tokens on the returned channel until TokenEmpty would be encountered. If Go or Iterate has already been called, it will return nil.

func (*Lexer) Iterate

func (ln *Lexer) Iterate() *Iterator

Where Go starts a goroutine, Iterate returns an iterator. When using an Iterator, only MaxEmitsInFunction emits may be done in any single state function, or an error will be reported. If Go or Iterate has already been called, it will return nil.

type Mark

type Mark struct {
	// contains filtered or unexported fields
}

The Mark type (used by Mark and Unmark) can be used to save the current state of the lexer, and restore it later.

type Replacer added in v0.2.2

type Replacer struct {
	// contains filtered or unexported fields
}

type StateFn

type StateFn func(*LexInner) StateFn

StateFn is a state function: it uses the LexInner to lex part of the input and returns the next state function to run, or nil to stop lexing.

type Token

type Token struct {
	Typ  TokenType
	Val  string
	File string
	Line int
}

Tokens are emitted by the lexer. They contain a (usually) user-defined Typ, the Val of the token, and the File name and Line number where the token was generated.

func (Token) String

func (i Token) String() string

Return a simple string representation of the value contained within the token.

type TokenType

type TokenType int

TokenType is an integer representing the type of a token that has been emitted. Most TokenTypes are user-defined, and user-defined types must be greater than 0. Other than TokenEmpty, which is read when there is absolutely nothing left to read or when the channel is closed, the package-defined Error, Warning and EOF tokens are only generated by emitting them manually, or by invoking their corresponding Emit* functions.

const (
	// TokenEmpty is the TokenType with value 0.
	// Any zero-valued token will have this as its Typ.
	// It is also returned when the lexer has stopped (by an error, or Eof)
	TokenEmpty TokenType = -iota
	// TokenError is the Typ for errors reported by, for example, Lexer.Errorf.
	TokenError
	// TokenWarning is the Typ for warnings.
	TokenWarning
	// TokenEOF should be returned once per file, when the end of file has been reached.
	// This is not done automatically!
	TokenEOF
)
