Documentation ¶
Index ¶
- Constants
- Variables
- type QuoteInjectSettings
- type Stream
- func (s *Stream) Close()
- func (s *Stream) CurrentToken() *Token
- func (s *Stream) GetParsedLength() int
- func (s *Stream) GetSnippet(before, after int) []Token
- func (s *Stream) GetSnippetAsString(before, after, maxStringLength int) string
- func (s *Stream) GoNext() *Stream
- func (s *Stream) GoNextIfNextIs(key TokenKey, otherKeys ...TokenKey) bool
- func (s *Stream) GoPrev() *Stream
- func (s *Stream) GoTo(id int) *Stream
- func (s *Stream) HeadToken() *Token
- func (s *Stream) IsAnyNextSequence(keys ...[]TokenKey) bool
- func (s *Stream) IsNextSequence(keys ...TokenKey) bool
- func (s *Stream) IsValid() bool
- func (s *Stream) NextToken() *Token
- func (s *Stream) PrevToken() *Token
- func (s *Stream) SetHistorySize(size int) *Stream
- func (s *Stream) String() string
- type StringSettings
- func (q *StringSettings) AddInjection(startTokenKey, endTokenKey TokenKey) *StringSettings
- func (q *StringSettings) AddSpecialStrings(special []string) *StringSettings
- func (q *StringSettings) SetEscapeSymbol(symbol byte) *StringSettings
- func (q *StringSettings) SetSpecialSymbols(special map[byte]byte) *StringSettings
- type Token
- func (t *Token) ID() int
- func (t *Token) Indent() []byte
- func (t *Token) Is(key TokenKey, keys ...TokenKey) bool
- func (t *Token) IsFloat() bool
- func (t *Token) IsInteger() bool
- func (t *Token) IsKeyword() bool
- func (t *Token) IsNumber() bool
- func (t *Token) IsString() bool
- func (t *Token) IsValid() bool
- func (t *Token) Key() TokenKey
- func (t *Token) Line() int
- func (t *Token) Offset() int
- func (t *Token) String() string
- func (t *Token) StringKey() TokenKey
- func (t *Token) StringSettings() *StringSettings
- func (t *Token) Value() []byte
- func (t *Token) ValueFloat() float64deprecated
- func (t *Token) ValueFloat64() float64
- func (t *Token) ValueInt() int64deprecated
- func (t *Token) ValueInt64() int64
- func (t *Token) ValueString() string
- func (t *Token) ValueUnescaped() []byte
- func (t *Token) ValueUnescapedString() string
- type TokenKey
- type Tokenizer
- func (t *Tokenizer) AllowKeywordSymbols(majorSymbols []rune, minorSymbols []rune) *Tokenizer
- func (t *Tokenizer) AllowKeywordUnderscore() *Tokenizer
- func (t *Tokenizer) AllowNumberUnderscore() *Tokenizer
- func (t *Tokenizer) AllowNumbersInKeyword() *Tokenizer
- func (t *Tokenizer) DefineStringToken(key TokenKey, startToken, endToken string) *StringSettings
- func (t *Tokenizer) DefineTokens(key TokenKey, tokens []string) *Tokenizer
- func (t *Tokenizer) ParseBytes(str []byte) *Stream
- func (t *Tokenizer) ParseStream(r io.Reader, bufferSize uint) *Stream
- func (t *Tokenizer) ParseString(str string) *Stream
- func (t *Tokenizer) SetWhiteSpaces(ws []byte) *Tokenizer
- func (t *Tokenizer) StopOnUndefinedToken() *Tokenizer
Constants ¶
const BackSlash = '\\'
BackSlash just backslash byte
const DefaultChunkSize = 4096
DefaultChunkSize default chunk size for reader.
Variables ¶
var DefaultSpecialString = []string{
"\\",
"n",
"r",
"t",
}
DefaultSpecialString is default escaped symbols.
var DefaultStringEscapes = map[byte]byte{
'n': '\n',
'r': '\r',
't': '\t',
'\\': '\\',
}
DefaultStringEscapes is default escaped symbols. Those symbols are often used everywhere. Deprecated: use DefaultSpecialString and AddSpecialStrings
var DefaultWhiteSpaces = []byte{' ', '\t', '\n', '\r'}
var Numbers = []rune{'0', '1', '2', '3', '4', '5', '6', '7', '8', '9'}
var Underscore = []rune{'_'}
Functions ¶
This section is empty.
Types ¶
type QuoteInjectSettings ¶
type QuoteInjectSettings struct { // Token type which opens the quoted string. StartKey TokenKey // Token type which closes the quoted string. EndKey TokenKey }
QuoteInjectSettings describes open injection token and close injection token.
type Stream ¶
type Stream struct {
// contains filtered or unexported fields
}
Stream is an iterator over parsed tokens. If data is read from an infinite buffer, the iterator reads data from the reader chunk-by-chunk.
func NewInfStream ¶
func NewInfStream(p *parsing) *Stream
NewInfStream creates new stream with active parser.
func (*Stream) CurrentToken ¶
CurrentToken always returns the token. If the pointer is not valid (see IsValid), CurrentToken returns the TokenUndef token. Do not save the result (Token) into variables — the current token may be changed at any time.
func (*Stream) GetParsedLength ¶
GetParsedLength returns the number of bytes parsed so far.
func (*Stream) GetSnippet ¶
GetSnippet returns slice of tokens. Slice generated from current token position and include tokens before and after current token.
func (*Stream) GetSnippetAsString ¶
GetSnippetAsString returns tokens before and after the current token as a string. `maxStringLength` specifies the max length of each token string. Zero — unlimited token string length. If a string is longer than maxStringLength, the method removes some runes in the middle of the string.
func (*Stream) GoNext ¶
GoNext moves stream pointer to the next token. If there is no token, it initiates the parsing of the next chunk of data. If there is no data, the pointer will point to the TokenUndef token.
func (*Stream) GoNextIfNextIs ¶
GoNextIfNextIs moves the stream pointer to the next token if the next token has one of the specified token keys. If a key matches, the pointer is updated and the method returns true. Otherwise, it returns false.
func (*Stream) GoPrev ¶
GoPrev moves the pointer of the stream to the previous token. The number of possible calls is limited if you specified SetHistorySize. If the beginning of the stream or the end of the history is reached, the pointer will point to the TokenUndef token.
func (*Stream) GoTo ¶
GoTo moves pointer of stream to specific token. The search is done by token ID.
func (*Stream) HeadToken ¶
HeadToken returns pointer to head-token. Head-token may be changed by parser if history size is enabled.
func (*Stream) IsAnyNextSequence ¶ added in v1.3.0
IsAnyNextSequence checks that at least one token from each group is contained in a sequence of tokens
func (*Stream) IsNextSequence ¶ added in v1.3.0
IsNextSequence checks if these are next tokens in exactly the same sequence as specified.
func (*Stream) IsValid ¶
IsValid checks if stream is valid. This means that the pointer has not reached the end of the stream.
func (*Stream) NextToken ¶
NextToken returns the next token from the stream. If the next token doesn't exist, the method returns the TokenUndef token. Do not save the result (Token) into variables — the next token may be changed at any time.
func (*Stream) PrevToken ¶
PrevToken returns the previous token from the stream. If the previous token doesn't exist, the method returns the TokenUndef token. Do not save the result (Token) into variables — the previous token may be changed at any time.
func (*Stream) SetHistorySize ¶
SetHistorySize sets the number of tokens that should remain after the current token.
type StringSettings ¶
type StringSettings struct { Key TokenKey StartToken []byte EndToken []byte EscapeSymbol byte SpecSymbols [][]byte Injects []QuoteInjectSettings }
StringSettings describes framed(quoted) string tokens like quoted strings.
func (*StringSettings) AddInjection ¶
func (q *StringSettings) AddInjection(startTokenKey, endTokenKey TokenKey) *StringSettings
AddInjection configures an injection into the string. An injection is a parsable fragment of a framed (quoted) string. Often used for parsing placeholders or template expressions inside the framed string.
func (*StringSettings) AddSpecialStrings ¶ added in v1.4.0
func (q *StringSettings) AddSpecialStrings(special []string) *StringSettings
AddSpecialStrings sets the mapping of all escapable strings for the escape symbol, like \n, \t, \r.
func (*StringSettings) SetEscapeSymbol ¶
func (q *StringSettings) SetEscapeSymbol(symbol byte) *StringSettings
SetEscapeSymbol set escape symbol for framed(quoted) string. Escape symbol allows ignoring close token of framed string. Also escape symbol allows using special symbols in the frame strings, like \n, \t.
func (*StringSettings) SetSpecialSymbols ¶
func (q *StringSettings) SetSpecialSymbols(special map[byte]byte) *StringSettings
SetSpecialSymbols set mapping of all escapable symbols for escape symbol, like \n, \t, \r. Deprecated: use AddSpecialStrings
type Token ¶
type Token struct {
// contains filtered or unexported fields
}
Token struct describe one token.
func (*Token) IsNumber ¶
IsNumber checks if this token is integer or float — the key is TokenInteger or TokenFloat.
func (*Token) IsString ¶
IsString checks if current token is a quoted string. Token key may be TokenString or TokenStringFragment.
func (*Token) Key ¶
Key returns the key of the token pointed to by the pointer. If pointer is not valid (see IsValid) TokenUndef will be returned.
func (*Token) StringKey ¶
StringKey returns the key of the string. If no key is defined for the string, TokenString will be returned.
func (*Token) StringSettings ¶
func (t *Token) StringSettings() *StringSettings
StringSettings returns StringSettings structure if token is framed string.
func (*Token) Value ¶
Value returns value of current token as slice of bytes from source. If current token is invalid value returns nil.
Do not change bytes in the slice. Copy slice before change.
func (*Token) ValueFloat
deprecated
func (*Token) ValueFloat64 ¶ added in v1.4.0
ValueFloat64 returns value as float64. If the token is not TokenInteger or TokenFloat then method returns zero. Method doesn't use cache — each call starts a number parser.
func (*Token) ValueInt64 ¶ added in v1.4.0
ValueInt64 returns the value as int64. If the token is a float, the result will be rounded by math's rules. If the token is not TokenInteger or TokenFloat, the method returns zero. Method doesn't use cache — each call starts a number parser.
func (*Token) ValueString ¶
ValueString returns value of the token as string. If the token is TokenUndef method returns empty string.
func (*Token) ValueUnescaped ¶
ValueUnescaped returns clear (unquoted) string
- without edge-tokens (quotes)
- with character escaping handling
For example quoted string
"one \"two\"\t three"
transforms to
one "two" three
Method doesn't use cache. Each call starts a string parser.
func (*Token) ValueUnescapedString ¶
ValueUnescapedString like as ValueUnescaped but returns string.
type TokenKey ¶
type TokenKey int
TokenKey token type identifier
const ( // TokenUnknown means that this token not embedded token and not user defined. TokenUnknown TokenKey = -6 // TokenStringFragment means that this is only fragment of quoted string with injections // For example, "one {{ two }} three", where "one " and " three" — TokenStringFragment TokenStringFragment TokenKey = -5 // TokenString means than this token is quoted string. // For example, "one two" TokenString TokenKey = -4 // TokenFloat means that this token is float number with point and/or exponent. // For example, 1.2, 1e6, 1E-6 TokenFloat TokenKey = -3 // TokenInteger means that this token is integer number. // For example, 3, 49983 TokenInteger TokenKey = -2 // TokenKeyword means that this token is word. // For example, one, two, три TokenKeyword TokenKey = -1 // TokenUndef means that token doesn't exist. // Then stream out of range of token list any getter or checker will return TokenUndef token. TokenUndef TokenKey = 0 )
type Tokenizer ¶
type Tokenizer struct {
// contains filtered or unexported fields
}
Tokenizer stores all tokens configuration and behaviors.
func (*Tokenizer) AllowKeywordSymbols ¶ added in v1.4.0
AllowKeywordSymbols sets major and minor symbols for keywords. Major symbols (any quantity) may appear at the beginning, in the middle, and at the end of a keyword. Minor symbols (any quantity) may appear in the middle and at the end of a keyword.
parser.AllowKeywordSymbols(tokenizer.Underscore, tokenizer.Numbers) // allows: "_one23", "__one2__two3" parser.AllowKeywordSymbols([]rune{'_', '@'}, tokenizer.Numbers) // allows: "one@23", "@_one_two23", "_one23", "_one2_two3", "@@one___two@_9"
Beware, the tokenizer does not control consecutive duplicates of these runes.
func (*Tokenizer) AllowKeywordUnderscore ¶
AllowKeywordUnderscore allows underscore symbol in keywords, like `one_two` or `_three` Deprecated: use AllowKeywordSymbols
func (*Tokenizer) AllowNumberUnderscore ¶ added in v1.4.0
AllowNumberUnderscore allows underscore symbol in numbers, like `1_000`
func (*Tokenizer) AllowNumbersInKeyword ¶
AllowNumbersInKeyword allows numbers in keywords, like `one1` or `r2d2` The method allows numbers in keywords, but the keyword itself must not start with a number. There should be no spaces between letters and numbers. Deprecated: use AllowKeywordSymbols
func (*Tokenizer) DefineStringToken ¶
func (t *Tokenizer) DefineStringToken(key TokenKey, startToken, endToken string) *StringSettings
DefineStringToken defines a token string. For example, a piece of data surrounded by quotes: "string in quotes" or 'string in single quotes'. The arguments startToken and endToken define the open and close "quotes".
`t.DefineStringToken("`", "`")` - parse string "one `two three`" will be parsed as [{key: TokenKeyword, value: "one"}, {key: TokenString, value: "`two three`"}]
`t.DefineStringToken("//", "\n")` - parse string "parse // like comment\n" will be parsed as [{key: TokenKeyword, value: "parse"}, {key: TokenString, value: "// like comment"}]
func (*Tokenizer) DefineTokens ¶
DefineTokens adds a custom token. The `key` is a unique identifier for `tokens`; `tokens` is a slice of token strings. If the key already exists, the tokens will be rewritten.
func (*Tokenizer) ParseBytes ¶
ParseBytes parse and convert slice of bytes into stream of tokens.
func (*Tokenizer) ParseStream ¶
ParseStream parse and convert infinite stream of bytes into infinite stream of tokens.
func (*Tokenizer) ParseString ¶
ParseString parse string into stream of tokens.
func (*Tokenizer) SetWhiteSpaces ¶
SetWhiteSpaces sets custom whitespace symbols between tokens. By default: `{' ', '\t', '\n', '\r'}`
func (*Tokenizer) StopOnUndefinedToken ¶
StopOnUndefinedToken stops parsing if unknown token detected.