goawk: github.com/benhoyt/goawk/lexer

package lexer

import "github.com/benhoyt/goawk/lexer"

Package lexer is an AWK lexer (tokenizer).

The lexer turns a string of AWK source code into a stream of tokens for parsing.

To tokenize some source, create a new lexer with NewLexer(src) and then call Scan() until the token type is EOF or ILLEGAL.

Code:

lexer := NewLexer([]byte(`$0 { print $1 }`))
for {
    pos, tok, val := lexer.Scan()
    if tok == EOF {
        break
    }
    fmt.Printf("%d:%d %s %q\n", pos.Line, pos.Column, tok, val)
}

Output:

1:1 $ ""
1:2 number "0"
1:4 { ""
1:6 print ""
1:12 $ ""
1:13 number "1"
1:15 } ""

Package Files

lexer.go token.go

type Lexer

type Lexer struct {
    // contains filtered or unexported fields
}

Lexer tokenizes a byte string of AWK source code. Use NewLexer to actually create a lexer, and Scan() or ScanRegex() to get tokens.

func NewLexer

func NewLexer(src []byte) *Lexer

NewLexer creates a new lexer that will tokenize the given source code. See the package example above for typical usage.

func (*Lexer) HadSpace

func (l *Lexer) HadSpace() bool

HadSpace returns true if the previously-scanned token had whitespace before it. The parser uses this because the AWK grammar doesn't allow a space between a user-defined function's name and the left parenthesis in a call.

func (*Lexer) Scan

func (l *Lexer) Scan() (Position, Token, string)

Scan scans the next token and returns its position (line/column), its token type (one of the uppercased token constants), and its string value. For most tokens, the string value is empty. For NAME, NUMBER, STRING, and REGEX tokens, it's the token's value. For an ILLEGAL token, it's the error message.

func (*Lexer) ScanRegex

func (l *Lexer) ScanRegex() (Position, Token, string)

ScanRegex parses an AWK regular expression in /slash/ syntax. The AWK grammar has somewhat special handling of regex tokens, so the parser can only call this after a DIV or DIV_ASSIGN token has just been scanned.

type Position

type Position struct {
    // Line number of the token (starts at 1).
    Line int
    // Column on the line (starts at 1). Note that this is the byte
    // offset into the line, not rune offset.
    Column int
}

Position stores the source line and column where a token starts.

type Token

type Token int

Token is the type of a single token.

const (
    ILLEGAL Token = iota
    EOF
    NEWLINE
    CONCAT // Not really a token, but used as an operator

    // Symbols
    ADD
    ADD_ASSIGN
    AND
    APPEND
    ASSIGN
    COLON
    COMMA
    DECR
    DIV
    DIV_ASSIGN
    DOLLAR
    EQUALS
    GTE
    GREATER
    INCR
    LBRACE
    LBRACKET
    LESS
    LPAREN
    LTE
    MATCH
    MOD
    MOD_ASSIGN
    MUL
    MUL_ASSIGN
    NOT_MATCH
    NOT
    NOT_EQUALS
    OR
    PIPE
    POW
    POW_ASSIGN
    QUESTION
    RBRACE
    RBRACKET
    RPAREN
    SEMICOLON
    SUB
    SUB_ASSIGN

    // Keywords
    BEGIN
    BREAK
    CONTINUE
    DELETE
    DO
    ELSE
    END
    EXIT
    FOR
    FUNCTION
    GETLINE
    IF
    IN
    NEXT
    PRINT
    PRINTF
    RETURN
    WHILE

    // Built-in functions
    F_ATAN2
    F_CLOSE
    F_COS
    F_EXP
    F_GSUB
    F_INDEX
    F_INT
    F_LENGTH
    F_LOG
    F_MATCH
    F_RAND
    F_SIN
    F_SPLIT
    F_SPRINTF
    F_SQRT
    F_SRAND
    F_SUB
    F_SUBSTR
    F_SYSTEM
    F_TOLOWER
    F_TOUPPER

    // Literals and names (variables and arrays)
    NAME
    NUMBER
    STRING
    REGEX

    LAST       = REGEX
    FIRST_FUNC = F_ATAN2
    LAST_FUNC  = F_TOUPPER
)

func KeywordToken

func KeywordToken(name string) Token

KeywordToken returns the token associated with the given keyword string, or ILLEGAL if the given name is not a keyword.

func (Token) String

func (t Token) String() string

String returns the string name of this token.

Package lexer imports 1 package and is imported by 4 packages. Updated 2019-02-12.