lexmachine: github.com/timtadh/lexmachine

package lexmachine

import "github.com/timtadh/lexmachine"

Package lexmachine is a full lexical analysis framework for the Go programming language. It supports a restricted but usable set of regular expressions appropriate for writing lexers for complex programming languages. The framework also supports sub-lexers and non-regular lexing through an "escape hatch" which allows users to consume any number of further bytes after a match. So if you want to support nested C-style comments or other paired structures, you can do so at the lexical analysis stage.

For a tutorial see http://hackthology.com/writing-a-lexer-in-go-with-lexmachine.html

Example of defining a lexer

// CreateLexer defines a lexer for the graphviz dot language. Literals,
// Keywords, and TokenIds are defined elsewhere in the example program;
// machines refers to github.com/timtadh/lexmachine/machines.
func CreateLexer() (*lexmachine.Lexer, error) {
    lexer := lexmachine.NewLexer()

    for _, lit := range Literals {
        r := "\\" + strings.Join(strings.Split(lit, ""), "\\")
        lexer.Add([]byte(r), token(lit))
    }
    for _, name := range Keywords {
        lexer.Add([]byte(strings.ToLower(name)), token(name))
    }

    lexer.Add([]byte(`//[^\n]*\n?`), token("COMMENT"))
    lexer.Add([]byte(`/\*([^*]|\r|\n|(\*+([^*/]|\r|\n)))*\*+/`), token("COMMENT"))
    lexer.Add([]byte(`([a-z]|[A-Z])([a-z]|[A-Z]|[0-9]|_)*`), token("ID"))
    lexer.Add([]byte(`"([^\\"]|(\\.))*"`), token("ID"))
    lexer.Add([]byte("( |\t|\n|\r)+"), skip)
    lexer.Add([]byte(`\<`),
        func(scan *lexmachine.Scanner, match *machines.Match) (interface{}, error) {
            str := make([]byte, 0, 10)
            str = append(str, match.Bytes...)
            brackets := 1
            match.EndLine = match.StartLine
            match.EndColumn = match.StartColumn
            for tc := scan.TC; tc < len(scan.Text); tc++ {
                str = append(str, scan.Text[tc])
                match.EndColumn += 1
                if scan.Text[tc] == '\n' {
                    match.EndLine += 1
                }
                if scan.Text[tc] == '<' {
                    brackets += 1
                } else if scan.Text[tc] == '>' {
                    brackets -= 1
                }
                if brackets == 0 {
                    match.TC = scan.TC
                    scan.TC = tc + 1
                    match.Bytes = str
                    return token("ID")(scan, match)
                }
            }
            return nil,
                fmt.Errorf("unclosed HTML literal starting at %d, (%d, %d)",
                    match.TC, match.StartLine, match.StartColumn)
        },
    )

    err := lexer.Compile()
    if err != nil {
        return nil, err
    }
    return lexer, nil
}

func token(name string) lexmachine.Action {
    return func(s *lexmachine.Scanner, m *machines.Match) (interface{}, error) {
        return s.Token(TokenIds[name], string(m.Bytes), m), nil
    }
}

Example of using a lexer

func ExampleLex() error {
    lexer, err := CreateLexer()
    if err != nil {
        return err
    }
    scanner, err := lexer.Scanner([]byte(`digraph {
      rankdir=LR;
      a [label="a" shape=box];
      c [<label>=<<u>C</u>>];
      b [label="bb"];
      a -> c;
      c -> b;
      d -> c;
      b -> a;
      b -> e;
      e -> f;
    }`))
    if err != nil {
        return err
    }
    fmt.Println("Type    | Lexeme     | Position")
    fmt.Println("--------+------------+------------")
    for tok, err, eos := scanner.Next(); !eos; tok, err, eos = scanner.Next() {
        if err != nil {
            return err
        }
        token := tok.(*lexmachine.Token)
        fmt.Printf("%-7v | %-10v | %v:%v-%v:%v\n",
            dot.Tokens[token.Type],
            string(token.Lexeme),
            token.StartLine,
            token.StartColumn,
            token.EndLine,
            token.EndColumn)
    }
    return nil
}

Index

Package Files

doc.go lexer.go

type Action Uses

type Action func(scan *Scanner, match *machines.Match) (interface{}, error)

An Action is a function which gets called when the Scanner finds a match during the lexing process. Actions turn a low level machines.Match struct into a token for the user's program. As different compilers/interpreters/parsers have different needs, Actions merely return an interface{}. This allows you to represent a token in any way you wish. The Token type documented below is provided as one example representation.
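
The whitespace rule in the CreateLexer example above passes a skip action that is not defined in this documentation. A minimal sketch of such an action, assuming the usual lexmachine convention that a match whose action returns a nil token is silently discarded:

// skip discards the match by returning a nil token; the Scanner emits
// nothing for this match and continues scanning after it.
func skip(scan *lexmachine.Scanner, match *machines.Match) (interface{}, error) {
    return nil, nil
}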

type Lexer Uses

type Lexer struct {
    // contains filtered or unexported fields
}

Lexer is a "builder" object which lets you construct a Scanner type which does the actual work of tokenizing (splitting up and categorizing) a byte string. Get a new Lexer by calling the NewLexer() function. Add patterns to match (with their callbacks) by using the Add method. Finally, construct a scanner with Scanner to tokenize a byte string.

func NewLexer Uses

func NewLexer() *Lexer

NewLexer constructs a new lexer object.

func (*Lexer) Add Uses

func (l *Lexer) Add(regex []byte, action Action)

Add registers a pattern to match on. When a match occurs during scanning, the action function will be called by the Scanner to turn the low level machines.Match struct into a token.
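
For example, a rule turning decimal integer literals into NUMBER tokens could be added like this (the tokenIds map and the "NUMBER" id are assumptions for illustration):

lexer.Add([]byte(`[0-9]+`), func(s *lexmachine.Scanner, m *machines.Match) (interface{}, error) {
    // construct a Token from the match using the Scanner's Token helper
    return s.Token(tokenIds["NUMBER"], string(m.Bytes), m), nil
})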

func (*Lexer) Compile Uses

func (l *Lexer) Compile() error

Compile compiles the supplied patterns into a DFA (the default). You don't need to call this method (it is called automatically by Scanner). However, you may want to call it if you construct a lexer once and then use it many times, as it will precompile the lexing program.
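
A sketch of that usage, compiling once and then scanning many inputs (the token and skip helpers are the ones shown above; inputs is an assumed [][]byte):

// construct and compile the lexer once, e.g. at program start-up
lexer := lexmachine.NewLexer()
lexer.Add([]byte(`([a-z]|[A-Z])([a-z]|[A-Z]|[0-9]|_)*`), token("ID"))
lexer.Add([]byte("( |\t|\n|\r)+"), skip)
if err := lexer.Compile(); err != nil {
    return err
}

// reuse the compiled lexer for every input without recompiling
for _, input := range inputs {
    scanner, err := lexer.Scanner(input)
    if err != nil {
        return err
    }
    for tok, err, eos := scanner.Next(); !eos; tok, err, eos = scanner.Next() {
        if err != nil {
            return err
        }
        fmt.Println(tok)
    }
}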

func (*Lexer) CompileDFA Uses

func (l *Lexer) CompileDFA() error

CompileDFA compiles a DFA explicitly. The DFA will be used by Scanners when they are created.

func (*Lexer) CompileNFA Uses

func (l *Lexer) CompileNFA() error

CompileNFA compiles an NFA explicitly. If no DFA has been created (which only happens explicitly), the NFA will be used by Scanners when they are created.
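
A sketch of forcing the NFA backend (token is the helper shown above; text is an assumed []byte):

lexer := lexmachine.NewLexer()
lexer.Add([]byte(`([a-z]|[A-Z])+`), token("ID"))
if err := lexer.CompileNFA(); err != nil {
    return err
}
// no DFA has been compiled, so scanners from this lexer run the NFA
scanner, err := lexer.Scanner(text)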

func (*Lexer) Scanner Uses

func (l *Lexer) Scanner(text []byte) (*Scanner, error)

Scanner creates a scanner for a particular byte string from the lexer.

type Scanner Uses

type Scanner struct {
    Text []byte
    TC   int
    // contains filtered or unexported fields
}

Scanner tokenizes a byte string based on the patterns provided to the Lexer object which constructed the scanner. This object works as a functional iterator via the Next method.

Example

lexer, err := CreateLexer()
if err != nil {
    return err
}
scanner, err := lexer.Scanner(someBytes)
if err != nil {
    return err
}
for tok, err, eos := scanner.Next(); !eos; tok, err, eos = scanner.Next() {
    if err != nil {
        return err
    }
    fmt.Println(tok)
}

func (*Scanner) Next Uses

func (s *Scanner) Next() (tok interface{}, err error, eos bool)

Next iterates through the string being scanned, returning one token at a time until either an error is encountered or the end of the string is reached. The token is returned in tok, any error in err, and eos (a bool) is true once the end of the string has been reached.

Example

for tok, err, eos := scanner.Next(); !eos; tok, err, eos = scanner.Next() {
    if err != nil {
        // handle the error and exit the loop. For example:
        return err
    }
    // do some processing on tok or store it somewhere. eg.
    fmt.Println(tok)
}

One useful error type which may be returned by Next() is *machines.UnconsumedInput, which provides the position information for where in the text the scanning failed.
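
A sketch of detecting that case with a type assertion, assuming the error is returned as a *machines.UnconsumedInput pointer:

for tok, err, eos := scanner.Next(); !eos; tok, err, eos = scanner.Next() {
    if ui, is := err.(*machines.UnconsumedInput); is {
        // the scanner could not match the input at this position; report it
        // (or resynchronize, depending on the application) and stop
        return fmt.Errorf("lexing failed: %v", ui)
    } else if err != nil {
        return err
    }
    fmt.Println(tok)
}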

For more information on functional iterators see: http://hackthology.com/functional-iteration-in-go.html

func (*Scanner) Token Uses

func (s *Scanner) Token(typ int, value interface{}, m *machines.Match) *Token

Token is a helper method for constructing a Token inside of an Action.

type Token Uses

type Token struct {
    Type        int
    Value       interface{}
    Lexeme      []byte
    TC          int
    StartLine   int
    StartColumn int
    EndLine     int
    EndColumn   int
}

Token is an optional representation you could use for the tokens produced by a lexer built with lexmachine.

Here is an example of constructing a lexer Action which turns a machines.Match struct into a token using the scanner's Token helper method.

func token(name string, tokenIds map[string]int) lexmachine.Action {
    return func(s *lexmachine.Scanner, m *machines.Match) (interface{}, error) {
        return s.Token(tokenIds[name], string(m.Bytes), m), nil
    }
}

func (*Token) Equals Uses

func (t *Token) Equals(other *Token) bool

Equals checks the equality of two tokens ignoring the Value field.
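
For instance, a test could compare a scanned token (got) against an expected one while ignoring Value; got, tokenIds, and t are assumptions for illustration:

// build the expected token with the positions we anticipate
expected := &lexmachine.Token{
    Type:        tokenIds["ID"],
    Lexeme:      []byte("graph"),
    TC:          0,
    StartLine:   1,
    StartColumn: 1,
    EndLine:     1,
    EndColumn:   5,
}
if !got.Equals(expected) {
    t.Errorf("got %v, expected %v", got, expected)
}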

func (*Token) String Uses

func (t *Token) String() string

String formats the token in a human readable form.

Directories

Path        Synopsis
dfa
frontend    Package frontend parses regular expressions and compiles them into NFA bytecode.
inst
lexc
machines    Package machines implements the lexing algorithms.
queue

Package lexmachine imports 6 packages and is imported by 2 packages. Updated 2018-03-25.