golexers

package module
v0.0.0-...-ac6c567
Warning

This package is not in the latest version of its module.

Published: Apr 17, 2024 License: MIT Imports: 7 Imported by: 0

README

golexers

A Go package that lexes various languages well enough to support search by word type and simple syntax highlighting. The lexers themselves are built using the re2c tool (https://re2c.org/).

The generated library is pure Go, so it does not require cgo.

It is not as sophisticated as tree-sitter, but it should be good enough for highlighting and for extracting words or other tokens for simple, text-based indexing of code. There are no guarantees, though: some languages may have cases that are misinterpreted.

It includes some simple command-line tools: lex, which prints the tokens in an input file, and format, which generates HTML-formatted output based on the output of this library.

Documentation

Constants

const (
	STATE_POSSIBLEREGEX = STATE_CUSTOM // like normal but affects parsing '/'
	STATE_REGEX         = STATE_CUSTOM + 1
)
const (
	STRING_PROCESS = iota
	STRING_IGNORE
)
const (
	STATE_NORMAL = iota
	STATE_CHARLITERAL
	STATE_STRINGLITERAL
	STATE_RAWSTRINGLITERAL
	STATE_LONGSTRINGLITERAL
	STATE_EOLCOMMENT
	STATE_MLCOMMENT
	STATE_CUSTOM // keep last - first for a language specific state
)
const (
	IS_WORD    = 1 << 8
	IS_STRING  = 1 << 9
	IS_COMMENT = 1 << 10
)
const (
	INVALID      TokenType = -2
	END                    = -1
	KEYWORD                = 1 | IS_WORD
	KEYWORD_TYPE           = 2 | IS_WORD // a keyword (or known identifier) which is a type e.g. types in go
	IDENTIFIER             = 3 | IS_WORD
	BUILTIN                = 4 | IS_WORD // a built in function as used in go or perl
	PUNCTUATION            = 5
	LITERAL                = 6 // literal but not string or char
	CHARLITERAL            = 7
	STRING                 = 8 | IS_STRING // inside a string but not a word
	STRINGWORD             = 9 | IS_WORD | IS_STRING
	COMMENT                = 10 | IS_COMMENT // inside a comment but not a word
	COMMENTWORD            = 11 | IS_WORD | IS_COMMENT
	NEWLINE                = 100
)
const (
	STATE_STARTTAG   = STATE_CUSTOM
	STATE_ATTRSTRING = STATE_CUSTOM + 1
)
const STATE_PERL_AFTER_END = STATE_CUSTOM
const (
	STATE_VERBATIMSTRING = STATE_CUSTOM
)
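
The state constants above share one numbering scheme: the generic states count up from STATE_NORMAL, and every language-specific state starts at STATE_CUSTOM. As a result, several language-specific names resolve to the same number. The following is a minimal, self-contained sketch that copies the constants from the listing and prints their values. The helper isSharedState is hypothetical and not part of the package; it only illustrates that values at or above STATE_CUSTOM probably need to be interpreted per lexer.

```go
package main

import "fmt"

// Shared lexer states, copied from the constant block above.
const (
	STATE_NORMAL = iota
	STATE_CHARLITERAL
	STATE_STRINGLITERAL
	STATE_RAWSTRINGLITERAL
	STATE_LONGSTRINGLITERAL
	STATE_EOLCOMMENT
	STATE_MLCOMMENT
	STATE_CUSTOM // first value available to language-specific states
)

// Language-specific states, copied from the other constant blocks above.
const (
	STATE_POSSIBLEREGEX  = STATE_CUSTOM
	STATE_REGEX          = STATE_CUSTOM + 1
	STATE_STARTTAG       = STATE_CUSTOM
	STATE_ATTRSTRING     = STATE_CUSTOM + 1
	STATE_PERL_AFTER_END = STATE_CUSTOM
	STATE_VERBATIMSTRING = STATE_CUSTOM
)

// isSharedState is a hypothetical helper (not part of the package). It
// reports whether a state value lies in the generic range below STATE_CUSTOM.
func isSharedState(state int) bool {
	return state >= STATE_NORMAL && state < STATE_CUSTOM
}

func main() {
	fmt.Println("STATE_CUSTOM:", STATE_CUSTOM)
	fmt.Println("STATE_REGEX:", STATE_REGEX, "STATE_ATTRSTRING:", STATE_ATTRSTRING)
	fmt.Println("shared:", isSharedState(STATE_MLCOMMENT), isSharedState(STATE_REGEX))
}
```

Because STATE_POSSIBLEREGEX, STATE_STARTTAG, STATE_PERL_AFTER_END and STATE_VERBATIMSTRING all equal STATE_CUSTOM, a raw state number at or above that value does not identify a state unless you also know which lexer produced it.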

Variables

This section is empty.

Functions

func CanLex

func CanLex(filename string) bool

func Register

func Register(exts []string, lexFunc LexFunc)

func RegisterAlias

func RegisterAlias(alias string, ext string)

func TypeString

func TypeString(tt TokenType) string

Types

type Input

type Input struct {
	// contains filtered or unexported fields
}

type LexFunc

type LexFunc func(input *Input) TokenType

type Lexer

type Lexer struct {
	// contains filtered or unexported fields
}

func NewLexer

func NewLexer(filename string, input []byte) *Lexer

func (*Lexer) Lex

func (lexer *Lexer) Lex() TokenType

func (*Lexer) Line

func (lexer *Lexer) Line() int

func (*Lexer) LineText

func (lexer *Lexer) LineText() []byte

func (*Lexer) Token

func (lexer *Lexer) Token() []byte

func (*Lexer) TokenPos

func (lexer *Lexer) TokenPos() (int, int)

func (*Lexer) TokenType

func (lexer *Lexer) TokenType() TokenType
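
The Lexer methods above suggest a pull-style loop: call Lex repeatedly and read Token after each call. The sketch below shows that loop collecting word tokens for indexing. It is written against a small local interface (tokenSource) and a fake implementation (fakeLexer), both hypothetical, so that it compiles without the module's import path, which is not given here. It assumes that Lex returns END once the input is exhausted and that Token returns the bytes of the most recently lexed token; neither behaviour is spelled out in the documentation. With the real package you would use its own TokenType and the *Lexer returned by NewLexer instead of the local stand-ins.

```go
package main

import "fmt"

// TokenType and the constants below are local copies of the documented values.
type TokenType int

const (
	IS_WORD    TokenType = 1 << 8
	IS_COMMENT TokenType = 1 << 10

	END         TokenType = -1
	IDENTIFIER            = 3 | IS_WORD
	PUNCTUATION TokenType = 5
	COMMENT               = 10 | IS_COMMENT
	COMMENTWORD           = 11 | IS_WORD | IS_COMMENT
)

// tokenSource is a hypothetical interface mirroring two of the documented
// *Lexer methods.
type tokenSource interface {
	Lex() TokenType
	Token() []byte
}

// collectWords pulls tokens until a negative type (END or INVALID) appears
// and keeps the text of every token whose type carries the IS_WORD flag.
// Stopping on INVALID is an assumption made for this sketch.
func collectWords(src tokenSource) []string {
	var words []string
	for tt := src.Lex(); tt >= 0; tt = src.Lex() {
		if tt&IS_WORD != 0 {
			words = append(words, string(src.Token()))
		}
	}
	return words
}

// fakeLexer is a stand-in that replays a fixed token list.
type fakeToken struct {
	typ  TokenType
	text string
}

type fakeLexer struct {
	toks []fakeToken
	pos  int
}

func (f *fakeLexer) Lex() TokenType {
	if f.pos >= len(f.toks) {
		return END
	}
	f.pos++
	return f.toks[f.pos-1].typ
}

func (f *fakeLexer) Token() []byte {
	if f.pos == 0 {
		return nil
	}
	return []byte(f.toks[f.pos-1].text)
}

func main() {
	// Tokens a lexer might plausibly emit for: x = y // note
	src := &fakeLexer{toks: []fakeToken{
		{IDENTIFIER, "x"}, {PUNCTUATION, "="}, {IDENTIFIER, "y"},
		{COMMENT, "//"}, {COMMENTWORD, "note"},
	}}
	fmt.Println(collectWords(src)) // prints [x y note]
}
```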

type TokenType

type TokenType int

func (TokenType) IsComment

func (tt TokenType) IsComment() bool

func (TokenType) IsString

func (tt TokenType) IsString() bool

func (TokenType) IsWord

func (tt TokenType) IsWord() bool
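
The token type constants pack two things into one integer: a small kind number in the low bits and the IS_WORD, IS_STRING and IS_COMMENT flags in bits 8 to 10. The sketch below copies a few of the documented constants and shows one way such flags can be tested and stripped. The helpers hasFlag and baseKind are hypothetical; how the package's own IsWord, IsString and IsComment are implemented is not shown in the documentation. One detail worth noting is that the negative values INVALID and END have every flag bit set in two's complement, so this sketch rejects negative types before masking.

```go
package main

import "fmt"

type TokenType int

// Flag bits and a subset of the token types, copied from the constants above.
const (
	IS_WORD    = 1 << 8
	IS_STRING  = 1 << 9
	IS_COMMENT = 1 << 10
)

const (
	INVALID     TokenType = -2
	END                   = -1
	KEYWORD               = 1 | IS_WORD
	PUNCTUATION           = 5
	STRING                = 8 | IS_STRING
	STRINGWORD            = 9 | IS_WORD | IS_STRING
	COMMENTWORD           = 11 | IS_WORD | IS_COMMENT
)

// hasFlag is a hypothetical helper: it reports whether a non-negative token
// type carries the given flag bit. Negative types (INVALID, END) never match.
func hasFlag(tt, flag TokenType) bool {
	return tt > 0 && tt&flag != 0
}

// baseKind is a hypothetical helper: it clears the three flag bits and
// returns the low kind number. Negative types are returned unchanged.
func baseKind(tt TokenType) TokenType {
	if tt < 0 {
		return tt
	}
	return tt &^ (IS_WORD | IS_STRING | IS_COMMENT)
}

func main() {
	for _, tt := range []TokenType{KEYWORD, PUNCTUATION, STRINGWORD, COMMENTWORD, END} {
		fmt.Printf("%5d base=%d word=%t string=%t comment=%t\n",
			tt, baseKind(tt),
			hasFlag(tt, IS_WORD), hasFlag(tt, IS_STRING), hasFlag(tt, IS_COMMENT))
	}
}
```

With this layout a search for "words inside comments" can test two flags on a single value (COMMENTWORD has both IS_WORD and IS_COMMENT set) instead of enumerating token kinds.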

Directories

Path       Synopsis
cmd/lex
