lexer

v0.0.0-...-77992e8 Latest
Warning

This package is not in the latest version of its module.

Published: Jul 16, 2014 License: BSD-2-Clause Imports: 6 Imported by: 3

README

About go-lexer

Package lexer provides a simple scanner and types for handrolling lexers. The implementation is based on Rob Pike's talk.

http://www.youtube.com/watch?v=HxaD_trXwRE

Documentation

Prerequisites

Install Go.

Installation

go get github.com/bmatsuo/go-lexer

General Documentation

Use go doc to view the documentation for go-lexer:

go doc github.com/bmatsuo/go-lexer

Alternatively, visit the GoPkgDoc URL.

Author

Bryan Matsuo <bryan.matsuo [at] gmail.com>

Copyright (c) 2012, Bryan Matsuo. All rights reserved. Use of this source code is governed by a BSD-style license that can be found in the LICENSE file.

Documentation

Overview

There are some key differences from the code Pike presented. His Next method is named Advance here, which pairs more naturally with Backup. The name Next is used instead for the method the parser calls to retrieve items from the lexer.

Two APIs

The Lexer type has two APIs. One is used by StateFn functions; the other is called by the parser. These are referred to here as the scanner API and the parser API.

The parser API

The only method the parser calls on the lexer is Next, which retrieves the next token from the input stream. Eventually an item with type ItemEOF is returned, at which point there are no more tokens in the stream.

The scanner API

The lexer uses Emit to construct complete lexemes, which are returned from future or concurrent calls to Next by the parser. The scanner uses a combination of methods to manipulate its position and prepare lexemes to be emitted. Lexer errors are emitted to the parser using the Errorf method, which keeps the scanner-parser interface uniform.

Common lexer methods used in a scanner are the Accept[Run][Range] family of methods. Accept* methods take a set and advance the lexer if the incoming rune is in the set. The AcceptRun* subfamily advances the lexer as far as possible.
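The difference between the two subfamilies can be sketched without the package itself. The scanner type, its fields, and the lowercase accept and acceptRun methods below are hypothetical stand-ins written for illustration; they are not the package's implementation, and returning a rune count from acceptRun is an assumption.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// scanner is a hypothetical stand-in for the lexer's position tracking.
// It is not the package's Lexer type.
type scanner struct {
	input string
	start int // first byte of the current lexeme
	pos   int // next byte to read
}

// accept consumes one rune if it is in valid and reports whether it did.
func (s *scanner) accept(valid string) bool {
	c, n := utf8.DecodeRuneInString(s.input[s.pos:])
	if n == 0 || !strings.ContainsRune(valid, c) {
		return false
	}
	s.pos += n
	return true
}

// acceptRun consumes runes for as long as they are in valid.
// Assumption: the result counts runes consumed.
func (s *scanner) acceptRun(valid string) (n int) {
	for s.accept(valid) {
		n++
	}
	return n
}

// current returns the text consumed since start.
func (s *scanner) current() string { return s.input[s.start:s.pos] }

// scanInt consumes an optional sign followed by one or more digits.
func scanInt(input string) (string, bool) {
	s := &scanner{input: input}
	s.accept("+-") // at most one sign
	if s.acceptRun("0123456789") == 0 {
		return "", false
	}
	return s.current(), true
}

func main() {
	text, ok := scanInt("-42 apples")
	fmt.Printf("%q %v\n", text, ok) // "-42" true
}
```

The single accept call handles an optional element, while acceptRun handles repetition and lets the caller check that at least one rune matched.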

For scanning known sequences of bytes (e.g. keywords) the AcceptString method avoids a lot of branching that would be incurred using methods that match character classes.
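As a rough illustration of matching a known byte sequence in one step, the sketch below uses strings.HasPrefix on the remaining input. The scanner type and the acceptString and keyword helpers are hypothetical names introduced here, not the package's code.

```go
package main

import (
	"fmt"
	"strings"
)

// scanner is a hypothetical stand-in for the lexer's position tracking.
type scanner struct {
	input string
	pos   int // next byte to read
}

// acceptString advances len(s) bytes if the remaining input starts with s.
func (sc *scanner) acceptString(s string) bool {
	if !strings.HasPrefix(sc.input[sc.pos:], s) {
		return false
	}
	sc.pos += len(s)
	return true
}

// keyword returns the first of keywords found at the start of input, or "".
func keyword(input string, keywords ...string) string {
	sc := &scanner{input: input}
	for _, kw := range keywords {
		if sc.acceptString(kw) {
			return kw
		}
	}
	return ""
}

func main() {
	fmt.Println(keyword("return x", "func", "return")) // return
}
```

A prefix match alone does not check for a word boundary, so in this sketch "returned" also matches the keyword "return"; a real scanner would typically peek at the following rune before emitting.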

The remaining methods provide low level functionality that can be combined to address corner cases.

Constants

const EOF rune = 0x04

Functions

func IsEOF

func IsEOF(c rune, n int) bool

IsEOF returns true if n is zero.

func IsInvalid

func IsInvalid(c rune, n int) bool

IsInvalid returns true if c is utf8.RuneError and n is 1.
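The (c, n) pairs these predicates inspect have the same shape as the results of the standard unicode/utf8 decoders. The sketch below uses local stand-ins, isEOF and isInvalid, that mirror only the documented behavior; classify is a hypothetical helper.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// isEOF mirrors the documented behavior of IsEOF: a zero size means there
// was no input left to decode. (Local stand-in, not the package's code.)
func isEOF(c rune, n int) bool { return n == 0 }

// isInvalid mirrors the documented behavior of IsInvalid.
func isInvalid(c rune, n int) bool { return c == utf8.RuneError && n == 1 }

// classify decodes the first rune of s and reports which case applies.
func classify(s string) string {
	c, n := utf8.DecodeRuneInString(s)
	switch {
	case isEOF(c, n):
		return "eof"
	case isInvalid(c, n):
		return "invalid"
	default:
		return fmt.Sprintf("valid (%d bytes)", n)
	}
}

func main() {
	fmt.Println(classify(""))       // eof
	fmt.Println(classify("\xff"))   // invalid
	fmt.Println(classify("\u00e9")) // valid (2 bytes)
	fmt.Println(classify("\uFFFD")) // valid (3 bytes)
}
```

The last case shows why the size matters: a correctly encoded U+FFFD in the input decodes to utf8.RuneError with size 3, so it is not reported as invalid.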

Types

type Error

type Error Item

Error is an item of type ItemError.

func (*Error) Error

func (err *Error) Error() string

type Item

type Item struct {
	Type  ItemType
	Pos   int
	Value string
}

An individual scanned item (a lexeme).

func (*Item) Err

func (i *Item) Err() error

Err returns the error corresponding to i, if one exists.
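One way the Item, Error, and Err declarations could fit together is sketched below. The type declarations are copied from this page; the method bodies are assumptions based on the descriptions here (in particular, that the error message is the item's Value, as the Errorf documentation suggests).

```go
package main

import (
	"fmt"
	"math"
)

type ItemType uint16

const (
	ItemEOF ItemType = math.MaxUint16 - iota
	ItemError
)

type Item struct {
	Type  ItemType
	Pos   int
	Value string
}

// Error shares Item's layout, so a *Item converts to *Error without copying.
type Error Item

// Error returns the item's value as the message (assumed behavior).
func (err *Error) Error() string { return err.Value }

// Err returns a non-nil error only for items of type ItemError (assumed body).
func (i *Item) Err() error {
	if i.Type == ItemError {
		return (*Error)(i)
	}
	return nil
}

func main() {
	bad := &Item{Type: ItemError, Pos: 3, Value: "unexpected rune '?'"}
	ok := &Item{Type: 0, Pos: 0, Value: "."}
	fmt.Println(bad.Err()) // unexpected rune '?'
	fmt.Println(ok.Err())  // <nil>
}
```

Returning a literal nil in the non-error case matters: returning a nil *Error through the error interface would compare unequal to nil in the caller.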

func (*Item) String

func (i *Item) String() string

String returns the raw lexeme of i.

type ItemType

type ItemType uint16

A type for all the types of items in the language being lexed.

const (
	ItemEOF ItemType = math.MaxUint16 - iota
	ItemError
)

Special item types.
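Because the special types count down from math.MaxUint16, a client can number its own item types upward from zero with iota without colliding with them. In this sketch the ItemType declarations are copied from above, while itemNumber and itemIdent are hypothetical client-defined types.

```go
package main

import (
	"fmt"
	"math"
)

type ItemType uint16

// Special item types, declared as in the package documentation.
const (
	ItemEOF ItemType = math.MaxUint16 - iota
	ItemError
)

// Hypothetical item types for some language being lexed.
const (
	itemNumber ItemType = iota
	itemIdent
)

func main() {
	fmt.Println(ItemEOF, ItemError)    // 65535 65534
	fmt.Println(itemNumber, itemIdent) // 0 1
}
```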

type Lexer

type Lexer struct {
	// contains filtered or unexported fields
}

Lexer contains an input string and the state associated with lexing that input.

Example (Advance)

This example shows a trivial parser using Advance, the lowest level lexer function. The parser decodes a serialization format for test status messages generated by a hypothetical test suite. The rune '.' is translated using the format "%d success", the rune '!' is translated using "%d failure".

// declare token types as constants
const (
	itemOK lexer.ItemType = iota
	itemFail
)

// create a StateFn to parse the language.
var start lexer.StateFn
start = func(lex *lexer.Lexer) lexer.StateFn {
	c, n := lex.Advance()
	if lexer.IsEOF(c, n) {
		return nil
	}
	if lexer.IsInvalid(c, n) {
		return lex.Errorf("invalid utf-8 rune")
	}
	switch c {
	case '.':
		lex.Emit(itemOK)
	case '!':
		lex.Emit(itemFail)
	default:
		// lex.Backup() does not need to be called even though lex.Pos()
		// points at the next rune. The position of the error is the start
		// of the current lexeme (in this case the unexpected rune we just
		// read).
		return lex.Errorf("unexpected rune %q", c)
	}
	return start
}

// create a parser for the language.
parse := func(input string) ([]string, error) {
	lex := lexer.New(start, input)
	var status []string
	for {
		item := lex.Next()
		err := item.Err()
		if err != nil {
			return nil, fmt.Errorf("%v (pos %d)", err, item.Pos)
		}
		switch item.Type {
		case lexer.ItemEOF:
			return status, nil
		case itemOK:
			status = append(status, fmt.Sprintf("%d success", item.Pos))
		case itemFail:
			status = append(status, fmt.Sprintf("%d failure", item.Pos))
		default:
			panic(fmt.Sprintf("unexpected item %0x (pos %d)", item.Type, item.Pos))
		}
	}
}

// parse a valid string and print the status
status, err := parse(".!")
fmt.Printf("%q %v\n", status, err)

// parse an invalid string and print the error
status, err = parse("!.!?.")
fmt.Printf("%q %v\n", status, err)
Output:

["0 success" "1 failure"] <nil>
[] unexpected rune '?' (pos 3)

func New

func New(start StateFn, input string) *Lexer

New creates a new lexer. It must be given a non-nil start state.

func (*Lexer) Accept

func (l *Lexer) Accept(valid string) (ok bool)

Accept advances the lexer if the next rune is in valid.

func (*Lexer) AcceptFunc

func (l *Lexer) AcceptFunc(fn func(rune) bool) (ok bool)

AcceptFunc advances the lexer if fn returns true for the next rune.

func (*Lexer) AcceptRange

func (l *Lexer) AcceptRange(tab *unicode.RangeTable) (ok bool)

AcceptRange advances l's position if the current rune is in tab.

func (*Lexer) AcceptRun

func (l *Lexer) AcceptRun(valid string) (n int)

AcceptRun advances l's position as long as the current rune is in valid.

func (*Lexer) AcceptRunFunc

func (l *Lexer) AcceptRunFunc(fn func(rune) bool) int

AcceptRunFunc advances l's position as long as fn returns true for the next input rune.

func (*Lexer) AcceptRunRange

func (l *Lexer) AcceptRunRange(tab *unicode.RangeTable) (n int)

AcceptRunRange advances l's position as long as the current rune is in tab.

func (*Lexer) AcceptString

func (l *Lexer) AcceptString(s string) (ok bool)

AcceptString advances the lexer len(s) bytes if the next len(s) bytes equal s. AcceptString returns true if l advanced.

func (*Lexer) Advance

func (l *Lexer) Advance() (rune, int)

Advance adds one rune of input to the current lexeme, increments the lexer's position, and returns the input rune with its size in bytes (encoded as UTF-8). Invalid UTF-8 codepoints cause the current call and all subsequent calls to return (utf8.RuneError, 1). If there is no input the returned size is zero.

func (*Lexer) Backup

func (l *Lexer) Backup()

Backup removes the last rune from the current lexeme and moves l's position back in the input string accordingly. Backup should only be called after a call to Advance.
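The pairing of Advance and Backup can be sketched in the style of Pike's talk by remembering the width of the last rune read. The scanner type and its lowercase methods are hypothetical; how this package actually tracks the width is not shown on this page, and the sketch does not reproduce the documented sticky behavior after invalid UTF-8.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// scanner is a hypothetical stand-in for the lexer's position tracking.
type scanner struct {
	input string
	pos   int // next byte to read
	width int // size in bytes of the rune returned by the last advance
}

// advance reads one rune, moves pos past it, and remembers its width.
func (s *scanner) advance() (rune, int) {
	c, n := utf8.DecodeRuneInString(s.input[s.pos:])
	s.width = n
	s.pos += n
	return c, n
}

// backup undoes the last advance. The width is cleared, so calling backup
// twice in a row moves the position only once.
func (s *scanner) backup() {
	s.pos -= s.width
	s.width = 0
}

// peek returns the next rune without consuming it.
func (s *scanner) peek() (rune, int) {
	c, n := s.advance()
	s.backup()
	return c, n
}

func main() {
	s := &scanner{input: "n\u00e9"}
	c, _ := s.peek()
	fmt.Printf("%q pos=%d\n", c, s.pos) // 'n' pos=0
	s.advance()
	s.advance()
	fmt.Println(s.pos) // 3
	s.backup()
	fmt.Println(s.pos) // 1
}
```

Only one rune of history is kept, which is why Backup is meaningful only directly after a call to Advance.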

func (*Lexer) Current

func (l *Lexer) Current() string

Current returns the contents of the item currently being lexed.

func (*Lexer) Emit

func (l *Lexer) Emit(t ItemType)

Emit emits the current lexeme as an Item with the specified type.

func (*Lexer) Errorf

func (l *Lexer) Errorf(format string, vs ...interface{}) StateFn

Errorf causes an error item to be emitted from l.Next(). The item's value (and its error message) are the result of evaluating format and vs with fmt.Sprintf.

func (*Lexer) Ignore

func (l *Lexer) Ignore()

Ignore throws away the current lexeme.

func (*Lexer) Input

func (l *Lexer) Input() string

Input returns the input string being lexed by l.

func (*Lexer) Last

func (l *Lexer) Last() (r rune, width int)

Last returns the last rune read from the input stream.

func (*Lexer) Next

func (l *Lexer) Next() (i *Item)

Next is the method by which items are extracted from the input. It returns nil if the lexer has entered a nil state.

func (*Lexer) Peek

func (l *Lexer) Peek() (rune, int)

Peek returns the next rune in the input stream without adding it to the current lexeme.

func (*Lexer) Pos

func (l *Lexer) Pos() int

Pos marks the next byte to be read in the input string. The behavior of Pos is unspecified if an error previously occurred or if all input has been consumed.

func (*Lexer) Start

func (l *Lexer) Start() int

Start marks the first byte of the item currently being lexed.

type StateFn

type StateFn func(*Lexer) StateFn

StateFn functions scan runes from the lexer's input and emit items. A StateFn is responsible for emitting ItemEOF after input has been consumed.
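The state-function pattern from Pike's talk can be sketched in a few lines: each state scans some input, optionally emits an item, and returns the next state, with nil ending the run. Everything below (the lexer, item, and stateFn types, the channel, and the lexText, lexSpace, and lexWord states) is a hypothetical miniature and not this package's implementation. It scans bytes rather than runes for brevity.

```go
package main

import "fmt"

type item struct {
	typ string
	val string
}

type lexer struct {
	input string
	start int // first byte of the current lexeme
	pos   int // next byte to read
	items chan item
}

type stateFn func(*lexer) stateFn

// emit sends the current lexeme and starts a new one.
func (l *lexer) emit(typ string) {
	l.items <- item{typ, l.input[l.start:l.pos]}
	l.start = l.pos
}

// run drives the state machine until a state returns nil.
func (l *lexer) run(start stateFn) {
	for state := start; state != nil; {
		state = state(l)
	}
	close(l.items)
}

// lexText dispatches on the next byte and emits EOF at end of input.
func lexText(l *lexer) stateFn {
	if l.pos >= len(l.input) {
		l.emit("EOF") // the state function is responsible for EOF
		return nil
	}
	if l.input[l.pos] == ' ' {
		return lexSpace
	}
	return lexWord
}

// lexSpace skips a run of spaces without emitting anything.
func lexSpace(l *lexer) stateFn {
	for l.pos < len(l.input) && l.input[l.pos] == ' ' {
		l.pos++
	}
	l.start = l.pos // discard the skipped spaces
	return lexText
}

// lexWord scans up to the next space and emits a WORD item.
func lexWord(l *lexer) stateFn {
	for l.pos < len(l.input) && l.input[l.pos] != ' ' {
		l.pos++
	}
	l.emit("WORD")
	return lexText
}

// lex runs the scanner in a goroutine and collects every emitted item.
func lex(input string) []item {
	l := &lexer{input: input, items: make(chan item)}
	go l.run(lexText)
	var out []item
	for it := range l.items {
		out = append(out, it)
	}
	return out
}

func main() {
	fmt.Println(lex("go  lexer")) // [{WORD go} {WORD lexer} {EOF }]
}
```

Running the states in a goroutine lets the consumer pull items one at a time while scanning continues, which is one way to read the "future/concurrent calls to Next" wording above.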
