tokenizer



Overview

Package gorilla/css/tokenizer generates tokens for CSS3 input.

It follows the CSS3 specification located at:

http://www.w3.org/TR/css-syntax-3/#tokenizer-algorithms

To use it, create a new tokenizer for a given CSS input and call Next() until the token returned is a "stop token":

s := tokenizer.NewTokenizer(strings.NewReader(myCSS))
for {
	token := s.Next()
	if token.Type.StopToken() {
		break
	}
	// Do something with the token...
}

If the consumer wants to accept malformed input, use the following check instead:

token := s.Next()
if token.Type == tokenizer.TokenEOF || token.Type == tokenizer.TokenError {
	break
}

The three potential tokenization errors are a "bad-escape" (backslash-newline outside a "string" or url() in the input), a "bad-string" (unescaped newline inside a "string"), and a "bad-url" (a few different cases). Parsers can choose to abort when seeing one of these errors, or ignore the declaration and attempt to recover.

Returned tokens that carry extra information have a non-nil .Extra value. For TokenError, TokenBadEscape, TokenBadString, and TokenBadURI, the TokenExtraError type carries an `error` with informative text about the nature of the error. For TokenNumber, TokenPercentage, and TokenDimension, the TokenExtraNumeric specifies whether the number is integral, and for TokenDimension, contains the unit string (e.g. "px"). For TokenUnicodeRange, the TokenExtraUnicodeRange type contains the actual start and end values of the range.
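
For example, a consumer can inspect the extra data with a type switch. This is a sketch using only the types and fields documented below; the printed output is illustrative:

switch extra := token.Extra.(type) {
case *tokenizer.TokenExtraError:
	// TokenError, TokenBadEscape, TokenBadString, TokenBadURI.
	fmt.Println("tokenization error:", extra.Err)
case *tokenizer.TokenExtraNumeric:
	// TokenNumber, TokenPercentage, TokenDimension.
	fmt.Println("value:", token.Value, "unit:", extra.Dimension, "non-integer:", extra.NonInteger)
case *tokenizer.TokenExtraUnicodeRange:
	// TokenUnicodeRange.
	fmt.Printf("U+%04X-U+%04X\n", extra.Start, extra.End)
}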

Note: the tokenizer performs only tokenization, not parsing; it implements Section 4 of the CSS Syntax Level 3 specification. See Section 5 for the parsing rules.


Constants

const TokenChar = TokenDelim

TokenChar is an alias for TokenDelim, kept for backwards compatibility.

Variables

TokenExtraTypeLookup provides a handy check for whether a given token type should contain extra data.
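
A sketch of how a consumer might use it, assuming TokenExtraTypeLookup is a map keyed by TokenType (its declaration is not reproduced here):

// Hypothetical usage; check the source for the actual declaration.
if _, ok := tokenizer.TokenExtraTypeLookup[t.Type]; ok {
	// Tokens of this type carry a non-nil .Extra value.
}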

Functions

func Fuzz

func Fuzz(b []byte) int

Fuzz is the entry point for fuzz testing.

Types

type ParseError

type ParseError struct {
	Type    TokenType
	Message string
	Loc     int
}

ParseError represents a CSS syntax error.

func (*ParseError) Error

func (e *ParseError) Error() string

Error implements the error interface.

type Token

type Token struct {
	Type TokenType
	// A string representation of the token value that depends on the type.
	// For example, for a TokenURI, the Value is the URI itself.  For a
	// TokenPercentage, the Value is the number without the percent sign.
	Value string
	// Extra data for the token beyond a simple string.  Will always be a
	// pointer to a "TokenExtra*" type in this package.
	Extra TokenExtra
}

Token represents a token in the CSS syntax.

func (*Token) Render

func (t *Token) Render() string

Render returns the CSS source representation of the token. It is a convenience wrapper around WriteTo.

func (*Token) WriteTo

func (t *Token) WriteTo(w io.Writer) (n int64, err error)

WriteTo writes the CSS source representation of the token to the provided writer. To render a series of tokens, see the TokenRenderer type, which handles the comment insertion rules.

Tokens with type TokenError do not write anything.

type TokenExtra

type TokenExtra interface {
	String() string
}

TokenExtra fills the .Extra field of a token. Consumers should use a type assertion to the appropriate concrete type to inspect its data.

type TokenExtraError

type TokenExtraError struct {
	Err error
}

TokenExtraError is attached to a TokenError and contains the same value as Tokenizer.Err(). See also the ParseError type and ParseError.Recoverable().

func (*TokenExtraError) Cause

func (e *TokenExtraError) Cause() error

Cause implements errors.Causer.

func (*TokenExtraError) Error

func (e *TokenExtraError) Error() string

Error implements error.

func (*TokenExtraError) ParseError

func (e *TokenExtraError) ParseError() *ParseError

Returns the ParseError object, if present.
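
A sketch of recovering positional details from an error token, using the ParseError fields shown above:

if ee, ok := t.Extra.(*tokenizer.TokenExtraError); ok {
	if pe := ee.ParseError(); pe != nil {
		fmt.Printf("syntax error at offset %d: %s\n", pe.Loc, pe.Message)
	}
}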

func (*TokenExtraError) String

func (e *TokenExtraError) String() string

Returns Err.Error().

type TokenExtraHash

type TokenExtraHash struct {
	IsIdentifier bool
}

TokenExtraHash is attached to TokenHash.

func (*TokenExtraHash) String

func (e *TokenExtraHash) String() string

Returns a descriptive string, either "unrestricted" or "id".

type TokenExtraNumeric

type TokenExtraNumeric struct {
	// Value float64 // omitted from this implementation
	NonInteger bool
	Dimension  string
}

TokenExtraNumeric is attached to TokenNumber, TokenPercentage, and TokenDimension.

func (*TokenExtraNumeric) String

func (e *TokenExtraNumeric) String() string

Returns the Dimension field.

type TokenExtraUnicodeRange

type TokenExtraUnicodeRange struct {
	Start rune
	End   rune
}

TokenExtraUnicodeRange is attached to a TokenUnicodeRange.

func (*TokenExtraUnicodeRange) String

func (e *TokenExtraUnicodeRange) String() string

Returns a valid CSS representation of the token.

type TokenRenderer

type TokenRenderer struct {
	// contains filtered or unexported fields
}

TokenRenderer takes care of the comment insertion rules for serialization. This type is mostly intended for the fuzz test and not for general consumption, but it can be used by consumers that want to re-render a parse stream.

func (*TokenRenderer) WriteTokenTo

func (r *TokenRenderer) WriteTokenTo(w io.Writer, t Token) (n int64, err error)

WriteTokenTo writes a token to the given io.Writer, potentially inserting an empty comment in front based on what the previous token was.
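
A sketch of re-rendering a token stream; it assumes the zero value of TokenRenderer is ready to use, which the documentation above does not state explicitly:

var out bytes.Buffer
var r tokenizer.TokenRenderer
z := tokenizer.NewTokenizer(strings.NewReader(myCSS))
for {
	t := z.Next()
	if t.Type.StopToken() {
		break
	}
	// May insert an empty comment before t, depending on the previous token.
	r.WriteTokenTo(&out, t)
}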

type TokenType

type TokenType int

TokenType identifies the type of lexical tokens.

const (
	// Scanner flags.
	TokenError TokenType = iota
	TokenEOF

	// Tokens
	TokenIdent
	TokenFunction
	TokenURI
	TokenDelim // Single character
	TokenAtKeyword
	TokenString
	TokenS // Whitespace
	// CSS Syntax Level 3 removes comments from the token stream, but they are
	// preserved here.
	TokenComment

	// Extra data: TokenExtraHash
	TokenHash
	// Extra data: TokenExtraNumeric
	TokenNumber
	TokenPercentage
	TokenDimension
	// Extra data: TokenExtraUnicodeRange
	TokenUnicodeRange

	// Error tokens
	TokenBadString
	TokenBadURI
	TokenBadEscape // a '\' right before a newline

	// Fixed-string tokens
	TokenIncludes
	TokenDashMatch
	TokenPrefixMatch
	TokenSuffixMatch
	TokenSubstringMatch
	TokenColumn
	TokenColon
	TokenSemicolon
	TokenComma
	TokenOpenBracket
	TokenCloseBracket
	TokenOpenParen
	TokenCloseParen
	TokenOpenBrace
	TokenCloseBrace
	TokenCDO
	TokenCDC
)

The complete list of tokens in CSS Syntax Level 3.

func (TokenType) StopToken

func (t TokenType) StopToken() bool

Stop tokens are TokenError, TokenEOF, TokenBadEscape, TokenBadString, and TokenBadURI. A consumer that does not want to tolerate parsing errors should stop parsing when this returns true.

func (TokenType) String

func (t TokenType) String() string

String returns a string representation of the token type.

type Tokenizer

type Tokenizer struct {
	// contains filtered or unexported fields
}

Tokenizer scans an input and emits tokens following the CSS Syntax Level 3 specification.

func NewTokenizer

func NewTokenizer(r io.Reader) *Tokenizer

NewTokenizer constructs a Tokenizer from the given input. The input need not be 'normalized' according to the spec beforehand (newlines changed to \n, zero bytes changed to U+FFFD).
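
A complete program tying the pieces together (a sketch; the import path is assumed from the package name above):

package main

import (
	"fmt"
	"strings"

	"github.com/gorilla/css/tokenizer" // assumed import path
)

func main() {
	z := tokenizer.NewTokenizer(strings.NewReader(`a { color: #fff }`))
	for {
		t := z.Next()
		if t.Type.StopToken() {
			break
		}
		fmt.Printf("%v %q\n", t.Type, t.Value)
	}
}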

func (*Tokenizer) Err

func (z *Tokenizer) Err() error

Err returns the last error encountered while reading input. It is set when TokenError is returned.

func (*Tokenizer) Next

func (z *Tokenizer) Next() Token

Next scans for the next token and returns it.

func (*Tokenizer) Scan

func (z *Tokenizer) Scan()

Scan advances to the next token. If the tokenizer is in an error state, no input is consumed.

func (*Tokenizer) Token

func (z *Tokenizer) Token() Token

Token returns the most recently scanned token.
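
Scan, Token, and Err combine into a lower-level loop; a sketch, on the assumption that Next is equivalent to Scan followed by Token:

for {
	z.Scan()
	t := z.Token()
	if t.Type == tokenizer.TokenError {
		fmt.Println("input error:", z.Err())
		break
	}
	if t.Type == tokenizer.TokenEOF {
		break
	}
	// Use t...
}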
