tokenize

package
v0.0.1
Published: Nov 13, 2023 | License: BSD-2-Clause | Imports: 5 | Imported by: 0

Documentation

Functions

func IsDecimalDigit

func IsDecimalDigit(r rune) bool

func IsKeyword

func IsKeyword(r []rune) bool

Types

type ScanState

type ScanState struct {
	Input  []rune
	Cursor int
}

ScanState holds the state for scanning a sequence of runes. Input is the slice of runes to be scanned, and Cursor is the current position in Input.

func (ScanState) CountWhile

func (s ScanState) CountWhile(begin int, satisfy func(rune) bool) int

CountWhile counts the consecutive runes that satisfy the given function 'satisfy', starting at offset 'begin' relative to the current Cursor position. Counting stops at the first rune that does not satisfy the condition. 'begin' must be in [0, s.Len()).
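The following is a minimal sketch of the counting behavior described above, not the package's source. The lowercase scanState type and the isDecimalDigit predicate are local stand-ins written for this illustration.

```go
package main

import "fmt"

// scanState is a local stand-in for the documented ScanState type.
type scanState struct {
	Input  []rune
	Cursor int
}

// Len returns the number of runes remaining from Cursor.
func (s scanState) Len() int { return len(s.Input) - s.Cursor }

// CountWhile counts consecutive runes that satisfy the predicate,
// starting at offset begin relative to Cursor.
func (s scanState) CountWhile(begin int, satisfy func(rune) bool) int {
	n := 0
	for i := s.Cursor + begin; i < len(s.Input) && satisfy(s.Input[i]); i++ {
		n++
	}
	return n
}

// isDecimalDigit is a hypothetical predicate matching '0' through '9'.
func isDecimalDigit(r rune) bool { return r >= '0' && r <= '9' }

func main() {
	s := scanState{Input: []rune("ab123 xyz"), Cursor: 2}
	fmt.Println(s.Len())                         // 7
	fmt.Println(s.CountWhile(0, isDecimalDigit)) // 3
	fmt.Println(s.CountWhile(3, isDecimalDigit)) // 0
}
```

Because offsets are relative to Cursor, a caller can advance Cursor and keep using small offsets such as 0 and 1.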

func (ScanState) FindFirst

func (s ScanState) FindFirst(begin int, patternSize int, pattern func([]rune) bool) (int, bool)

FindFirst searches for the first run of 'patternSize' runes for which the 'pattern' function returns true, starting at offset 'begin' relative to the current Cursor position. It returns the index of the first occurrence and a boolean indicating whether the pattern was found. If the pattern is not found, it returns the index 's.Len()' and 'false'. 'begin' must be in [0, s.Len()).
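A sliding-window search of this kind could look roughly as follows. This is a sketch under assumptions: the returned index is taken to be relative to Cursor (suggested by the not-found value 's.Len()', but not stated outright), and scanState and isBlockCommentEnd are local names invented for the example.

```go
package main

import "fmt"

// scanState is a local stand-in for the documented ScanState type.
type scanState struct {
	Input  []rune
	Cursor int
}

// Len returns the number of runes remaining from Cursor.
func (s scanState) Len() int { return len(s.Input) - s.Cursor }

// FindFirst slides a window of patternSize runes over the remaining input,
// starting at offset begin, and returns the first offset where pattern matches.
// Assumption: the returned offset is relative to Cursor.
func (s scanState) FindFirst(begin, patternSize int, pattern func([]rune) bool) (int, bool) {
	for i := begin; i+patternSize <= s.Len(); i++ {
		if pattern(s.Input[s.Cursor+i : s.Cursor+i+patternSize]) {
			return i, true
		}
	}
	return s.Len(), false
}

// isBlockCommentEnd is a hypothetical pattern matching the two runes "*/".
func isBlockCommentEnd(w []rune) bool {
	return len(w) == 2 && w[0] == '*' && w[1] == '/'
}

func main() {
	s := scanState{Input: []rune("/* hi */ x")}
	fmt.Println(s.FindFirst(2, 2, isBlockCommentEnd)) // 6 true

	t := scanState{Input: []rune("/* hi")}
	fmt.Println(t.FindFirst(2, 2, isBlockCommentEnd)) // 5 false
}
```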

func (ScanState) Len

func (s ScanState) Len() int

Len returns the number of runes remaining in the Input slice from the current Cursor position.

func (ScanState) PeekAt

func (s ScanState) PeekAt(offset int) rune

PeekAt returns the rune at offset 'offset' relative to the current Cursor position. 'offset' must be in [0, s.Len()).

func (ScanState) PeekSlice

func (s ScanState) PeekSlice(begin int, endExclusive int) []rune

PeekSlice returns the runes from offset 'begin' up to, but not including, offset 'endExclusive', both relative to the current Cursor position. 'begin' must be in [0, s.Len()). 'endExclusive' must be in [0, s.Len()].
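As a rough illustration of Cursor-relative peeking, the sketch below re-implements PeekAt and PeekSlice on a local stand-in type. It omits bounds checks and is not the package's code; how the real methods react to out-of-range arguments is not documented here.

```go
package main

import "fmt"

// scanState is a local stand-in for the documented ScanState type.
type scanState struct {
	Input  []rune
	Cursor int
}

// PeekAt returns the rune at the given offset from Cursor.
func (s scanState) PeekAt(offset int) rune { return s.Input[s.Cursor+offset] }

// PeekSlice returns the runes in [begin, endExclusive) relative to Cursor.
func (s scanState) PeekSlice(begin, endExclusive int) []rune {
	return s.Input[s.Cursor+begin : s.Cursor+endExclusive]
}

func main() {
	s := scanState{Input: []rune("SELECT 1"), Cursor: 3}
	fmt.Println(string(s.PeekAt(0)))       // E
	fmt.Println(string(s.PeekSlice(0, 3))) // ECT
	fmt.Println(string(s.PeekSlice(4, 5))) // 1
}
```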

type Token

type Token struct {
	Kind    TokenKind
	Content []rune

	Begin  int
	End    int
	Line   int
	Column int
}

Token represents a single token identified by the token scanner, including its type and its location in the source code. A valid token has its TokenKind set to a specific type (not TokenUnspecified). It contains the following fields:

- Kind: The TokenKind representing the type of the token.
- Content: The content or value of the token.
- Begin: The starting position (index) of the token in the input sequence.
- End: The ending position (index) of the token in the input sequence.
- Line: The line number where the token starts in the input source.
- Column: The column number where the token starts in the input source.

func Tokenize

func Tokenize(input []rune) ([]Token, error)

Tokenize returns the tokens identified in the input sequence. If tokenization fails, it returns an error with a message describing the failure.

func (Token) IsValid

func (t Token) IsValid() bool

IsValid checks if the token is valid, i.e., its TokenKind is not TokenUnspecified. It returns true if the token is valid and false otherwise.
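One consequence of this rule is that a zero-value Token is invalid, because TokenUnspecified is the first constant in the iota sequence and therefore equals 0. The sketch below shows this with local stand-in types; the lowercase names are invented for the example.

```go
package main

import "fmt"

type tokenKind int

const (
	tokenUnspecified tokenKind = iota // 0, the zero value
	tokenEOF
	tokenSpace
)

// token is a trimmed local stand-in for the documented Token type.
type token struct {
	Kind    tokenKind
	Content []rune
}

// IsValid reports whether the token has a specific kind.
func (t token) IsValid() bool { return t.Kind != tokenUnspecified }

func main() {
	var zero token
	fmt.Println(zero.IsValid()) // false

	sp := token{Kind: tokenSpace, Content: []rune(" ")}
	fmt.Println(sp.IsValid()) // true
}
```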

type TokenKind

type TokenKind int

TokenKind represents the type of token identified by the token scanner.

const (
	// TokenUnspecified represents an unspecified or unknown token.
	TokenUnspecified TokenKind = iota
	// TokenEOF represents the end of the file (EOF) token.
	TokenEOF
	// TokenSpace represents a space token.
	TokenSpace
	// TokenComment represents a comment token.
	TokenComment
	// TokenIdentifier represents an identifier token.
	TokenIdentifier
	// TokenIdentifierQuoted represents a quoted identifier token.
	TokenIdentifierQuoted
	// TokenLiteralQuoted represents a quoted literal (string) token.
	TokenLiteralQuoted
	// TokenLiteralInteger represents an integer literal token.
	TokenLiteralInteger
	// TokenLiteralFloat represents a floating-point literal token.
	TokenLiteralFloat
	// TokenKeyword represents a keyword token.
	TokenKeyword
	// TokenSpecialChar represents a special character token.
	TokenSpecialChar
)

func Comment

func Comment(s *ScanState) (int, TokenKind, error)

Comment scans the input sequence represented by the ScanState 's' to identify and handle comments. It returns the count of runes in the scanned comment token and the corresponding TokenKind.

- If no comment is found at the current Cursor position, the function returns 0 for the count and TokenUnspecified for the TokenKind.
- If the comment starts with '#', '//' or '--' and extends to the end of the line, the function returns the count of runes up to the newline character.
- If the comment starts with '/*' and ends with '*/', the function returns the count of runes up to the closing '*/' sequence.
- If the comment is not properly terminated with '*/', the function returns an error with a message indicating an incomplete comment.
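The comment rules can be sketched as below. This is an illustrative re-implementation over a plain rune slice, not the package's code. It assumes the newline is excluded from a line comment's count and the closing '*/' is included in a block comment's count; the source does not spell out either detail. The constant values and error text are local to the example.

```go
package main

import (
	"errors"
	"fmt"
)

type tokenKind int

const (
	tokenUnspecified tokenKind = iota
	tokenComment               // local value; not the package's numbering
)

// hasPrefix reports whether rs begins with the runes of p.
func hasPrefix(rs []rune, p string) bool {
	pr := []rune(p)
	if len(rs) < len(pr) {
		return false
	}
	for i, r := range pr {
		if rs[i] != r {
			return false
		}
	}
	return true
}

// comment scans a comment at the start of rest.
func comment(rest []rune) (int, tokenKind, error) {
	switch {
	case hasPrefix(rest, "#"), hasPrefix(rest, "//"), hasPrefix(rest, "--"):
		n := 0
		for n < len(rest) && rest[n] != '\n' {
			n++ // assumption: the newline itself is not counted
		}
		return n, tokenComment, nil
	case hasPrefix(rest, "/*"):
		for i := 2; i+1 < len(rest); i++ {
			if rest[i] == '*' && rest[i+1] == '/' {
				return i + 2, tokenComment, nil // assumption: "*/" is counted
			}
		}
		return 0, tokenUnspecified, errors.New("incomplete comment")
	}
	return 0, tokenUnspecified, nil
}

func main() {
	for _, in := range []string{"-- x\ny", "/* a */ b", "/* a", "x"} {
		n, kind, err := comment([]rune(in))
		fmt.Printf("%q: n=%d kind=%d err=%v\n", in, n, kind, err)
	}
}
```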

func IdentifierOrKeyword

func IdentifierOrKeyword(s *ScanState) (int, TokenKind, error)

IdentifierOrKeyword scans the input sequence represented by the ScanState 's' to identify and handle identifiers or keywords. It returns the count of runes in the scanned identifier or keyword, the corresponding TokenKind, and an error if any occurs during processing.

- If no identifier or keyword is found at the current Cursor position, the function returns 0 for the count, TokenUnspecified for the TokenKind, and nil for the error.
- If the scanned token is a keyword, the function returns the count of runes in the scanned keyword and TokenKind TokenKeyword.
- If the scanned token is an identifier, the function returns the count of runes in the scanned identifier and TokenKind TokenIdentifier.
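The identifier-versus-keyword split is commonly done by scanning a word and then looking it up in a keyword set. The sketch below shows that approach under several assumptions that the documentation does not confirm: identifiers match [A-Za-z_][A-Za-z0-9_]*, keyword lookup is case-insensitive, and the keyword set is the three hypothetical entries shown.

```go
package main

import (
	"fmt"
	"strings"
)

type tokenKind int

const (
	tokenUnspecified tokenKind = iota
	tokenIdentifier            // local values; not the package's numbering
	tokenKeyword
)

// keywords is a hypothetical keyword set used only for this example.
var keywords = map[string]bool{"SELECT": true, "FROM": true, "WHERE": true}

func isIdentStart(r rune) bool {
	return r == '_' || (r >= 'a' && r <= 'z') || (r >= 'A' && r <= 'Z')
}

func isIdentPart(r rune) bool {
	return isIdentStart(r) || (r >= '0' && r <= '9')
}

// identifierOrKeyword scans a word at the start of rest and classifies it.
func identifierOrKeyword(rest []rune) (int, tokenKind, error) {
	if len(rest) == 0 || !isIdentStart(rest[0]) {
		return 0, tokenUnspecified, nil
	}
	n := 1
	for n < len(rest) && isIdentPart(rest[n]) {
		n++
	}
	if keywords[strings.ToUpper(string(rest[:n]))] {
		return n, tokenKeyword, nil
	}
	return n, tokenIdentifier, nil
}

func main() {
	for _, in := range []string{"select x", "user_id = 1", "1abc"} {
		n, kind, err := identifierOrKeyword([]rune(in))
		fmt.Printf("%q: n=%d kind=%d err=%v\n", in, n, kind, err)
	}
}
```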

func IdentifierQuoted

func IdentifierQuoted(s *ScanState) (int, TokenKind, error)

IdentifierQuoted scans the input sequence represented by the ScanState 's' to identify and handle quoted identifiers enclosed within backticks (`). It returns the count of runes in the scanned quoted identifier token and the corresponding TokenKind.

- If no quoted identifier is found at the current Cursor position, the function returns 0 for the count and TokenUnspecified for the TokenKind.
- If the quoted identifier is empty (two consecutive backticks), the function returns an error indicating an empty quoted identifier.
- If the quoted identifier is not properly enclosed within backticks, the function returns an error with a message indicating an invalid quoted identifier.
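A sketch of these three outcomes follows. It is not the package's implementation: it assumes the returned count includes both backticks and that there is no escape syntax inside the quotes, and the error messages are placeholders.

```go
package main

import (
	"errors"
	"fmt"
)

type tokenKind int

const (
	tokenUnspecified      tokenKind = iota
	tokenIdentifierQuoted           // local value; not the package's numbering
)

// identifierQuoted scans a backtick-quoted identifier at the start of rest.
func identifierQuoted(rest []rune) (int, tokenKind, error) {
	if len(rest) == 0 || rest[0] != '`' {
		return 0, tokenUnspecified, nil // not a quoted identifier
	}
	for i := 1; i < len(rest); i++ {
		if rest[i] == '`' {
			if i == 1 {
				return 0, tokenUnspecified, errors.New("empty quoted identifier")
			}
			return i + 1, tokenIdentifierQuoted, nil // count includes both backticks
		}
	}
	return 0, tokenUnspecified, errors.New("invalid quoted identifier")
}

func main() {
	for _, in := range []string{"`my col` x", "``", "`abc", "abc"} {
		n, kind, err := identifierQuoted([]rune(in))
		fmt.Printf("%q: n=%d kind=%d err=%v\n", in, n, kind, err)
	}
}
```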

func LiteralQuoted

func LiteralQuoted(s *ScanState) (int, TokenKind, error)

LiteralQuoted scans the input sequence represented by the ScanState 's' to identify and handle quoted literals (strings or bytes) with optional prefixes. It returns the count of runes in the scanned quoted literal, the corresponding TokenKind, and an error if any occurs during processing. If no quoted literal is found at the current Cursor position, the function returns 0 for the count, TokenUnspecified for the TokenKind, and nil for the error.

func NumberOrDot

func NumberOrDot(s *ScanState) (int, TokenKind, error)

NumberOrDot scans the input sequence represented by the ScanState 's' to identify and handle numbers or the dot (.) operator. It returns the count of runes in the scanned number or dot operator, the corresponding TokenKind, and an error if any occurs during processing. The function recognizes hexadecimal integers (starting with "0x"), decimals (with or without a decimal point), and floating-point numbers (with or without an exponent using 'e' or 'E').

- If no number or dot operator is found at the current Cursor position, the function returns 0 for the count, TokenUnspecified for the TokenKind, and nil for the error.
- If the scanned token is the dot (.) operator, the function returns 1 for the count and TokenKind TokenSpecialChar.
- If the scanned token is an integer (either decimal or hexadecimal), the function returns the count of runes in the scanned integer and TokenKind TokenLiteralInteger.
- If the scanned token is a floating-point number, the function returns the count of runes in the scanned number and TokenKind TokenLiteralFloat.
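The classification above can be sketched as a small hand-written scanner. The version below is an approximation under assumptions, not the package's code: it drops the error return, treats a trailing dot as in "1." as a float, accepts an optional sign in the exponent, and ignores an 'e' that is not followed by digits. All names and constant values are local to the example.

```go
package main

import "fmt"

type tokenKind int

const (
	tokenUnspecified tokenKind = iota
	tokenLiteralInteger
	tokenLiteralFloat
	tokenSpecialChar
)

func isDigit(r rune) bool { return r >= '0' && r <= '9' }

func isHexDigit(r rune) bool {
	return isDigit(r) || (r >= 'a' && r <= 'f') || (r >= 'A' && r <= 'F')
}

// digitsFrom returns the index just past the run of decimal digits starting at i.
func digitsFrom(rest []rune, i int) int {
	for i < len(rest) && isDigit(rest[i]) {
		i++
	}
	return i
}

// numberOrDot classifies a number or a lone dot at the start of rest.
func numberOrDot(rest []rune) (int, tokenKind) {
	// Hexadecimal integer: "0x" followed by at least one hex digit.
	if len(rest) >= 3 && rest[0] == '0' && rest[1] == 'x' && isHexDigit(rest[2]) {
		i := 3
		for i < len(rest) && isHexDigit(rest[i]) {
			i++
		}
		return i, tokenLiteralInteger
	}
	end := digitsFrom(rest, 0) // integer part, possibly empty
	kind := tokenLiteralInteger
	if end < len(rest) && rest[end] == '.' {
		fracEnd := digitsFrom(rest, end+1)
		if end == 0 && fracEnd == 1 {
			return 1, tokenSpecialChar // a lone dot with no digits on either side
		}
		end, kind = fracEnd, tokenLiteralFloat
	} else if end == 0 {
		return 0, tokenUnspecified // neither a number nor a dot
	}
	// Optional exponent: 'e' or 'E', optional sign, at least one digit.
	if end < len(rest) && (rest[end] == 'e' || rest[end] == 'E') {
		j := end + 1
		if j < len(rest) && (rest[j] == '+' || rest[j] == '-') {
			j++
		}
		if expEnd := digitsFrom(rest, j); expEnd > j {
			end, kind = expEnd, tokenLiteralFloat
		}
	}
	return end, kind
}

func main() {
	for _, in := range []string{"42 ", "0x1F+", "3.14e-2;", ".5", ".", "abc"} {
		n, kind := numberOrDot([]rune(in))
		fmt.Printf("%q: n=%d kind=%d\n", in, n, kind)
	}
}
```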

func Spaces

func Spaces(s *ScanState) (int, TokenKind, error)

Spaces scans the input sequence represented by the ScanState 's' to find the number of consecutive space runes at the current Cursor position. It returns the count of runes in the scanned space token and the corresponding TokenKind. If no spaces are found at the current Cursor position, the function returns 0 for the count and TokenUnspecified for the TokenKind. The third return value, the error, is always nil in this implementation.

func SpecialChar

func SpecialChar(s *ScanState) (int, TokenKind, error)

SpecialChar scans the input sequence represented by the ScanState 's' to identify and handle special characters. It returns the count of runes in the scanned special character, the corresponding TokenKind, and an error if any occurs during processing.

- If no special character is found at the current Cursor position, the function returns 0 for the count, TokenUnspecified for the TokenKind, and nil for the error.
- If the scanned token is a dot (.) character followed by a decimal digit, the function returns 0 for the count, TokenUnspecified for the TokenKind, and nil.
- For all other cases, where the current rune represents a standalone special character, the function returns 1 for the count and TokenKind TokenSpecialChar.

func (TokenKind) String

func (i TokenKind) String() string

type TokenScanner

type TokenScanner struct {
	ScanState
	// contains filtered or unexported fields
}

TokenScanner provides a tokenizer for processing a sequence of runes and identifying different types of tokens.

func (*TokenScanner) Init

func (s *TokenScanner) Init(input []rune)

func (*TokenScanner) ScanNext

func (s *TokenScanner) ScanNext() (Token, error)

ScanNext scans the next token in the input sequence and returns the Token and an error if any occurs during processing. If the end of the input sequence is reached, the method returns a special Token with TokenKind TokenEOF to indicate the end of the file.
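The scanner functions on this page all share the shape (count, TokenKind, error), which suggests a dispatch loop along the following lines. This is a guess at the general pattern, not the package's implementation: the order in which scanners are tried, the exclusive End index, and the error text are all assumptions, and only two toy scanners are included.

```go
package main

import (
	"errors"
	"fmt"
)

type tokenKind int

const (
	tokenUnspecified tokenKind = iota
	tokenEOF
	tokenSpace
	tokenLiteralInteger // local value; not the package's numbering
)

type token struct {
	Kind       tokenKind
	Content    []rune
	Begin, End int // End is exclusive in this sketch
}

// scanFunc mirrors the documented scanner shape: rune count, kind, error.
type scanFunc func(rest []rune) (int, tokenKind, error)

func countWhile(rest []rune, ok func(rune) bool) int {
	n := 0
	for n < len(rest) && ok(rest[n]) {
		n++
	}
	return n
}

func spaces(rest []rune) (int, tokenKind, error) {
	n := countWhile(rest, func(r rune) bool { return r == ' ' || r == '\t' || r == '\n' })
	if n == 0 {
		return 0, tokenUnspecified, nil
	}
	return n, tokenSpace, nil
}

func integer(rest []rune) (int, tokenKind, error) {
	n := countWhile(rest, func(r rune) bool { return r >= '0' && r <= '9' })
	if n == 0 {
		return 0, tokenUnspecified, nil
	}
	return n, tokenLiteralInteger, nil
}

type scanner struct {
	input  []rune
	cursor int
}

// scanNext tries each scanner in turn; the first non-zero count wins.
func (s *scanner) scanNext() (token, error) {
	if s.cursor >= len(s.input) {
		return token{Kind: tokenEOF, Begin: s.cursor, End: s.cursor}, nil
	}
	for _, scan := range []scanFunc{spaces, integer} {
		n, kind, err := scan(s.input[s.cursor:])
		if err != nil {
			return token{}, err
		}
		if n > 0 {
			t := token{Kind: kind, Content: s.input[s.cursor : s.cursor+n], Begin: s.cursor, End: s.cursor + n}
			s.cursor += n
			return t, nil
		}
	}
	return token{}, errors.New("failed to tokenize")
}

func main() {
	s := &scanner{input: []rune("12 7")}
	for {
		t, err := s.scanNext()
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Printf("kind=%d content=%q\n", t.Kind, string(t.Content))
		if t.Kind == tokenEOF {
			return
		}
	}
}
```

Returning a count of 0 with TokenUnspecified and a nil error lets each scanner say "not mine" without raising an error, so the loop can fall through to the next candidate.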
