tokenizer



Overview

Package gorilla/css/tokenizer generates tokens for CSS3 input.

It follows the CSS3 specification located at:

http://www.w3.org/TR/css-syntax-3/#tokenizer-algorithms

To use it, create a new tokenizer for a given CSS input and call Next() until the token returned is a "stop token":

s := tokenizer.NewTokenizer(strings.NewReader(myCSS))
for {
	token := s.Next()
	if token.Type.StopToken() {
		break
	}
	// Do something with the token...
}

If the consumer wants to accept malformed input, use the following check instead:

token := s.Next()
if token.Type == tokenizer.TokenEOF || token.Type == tokenizer.TokenError {
	break
}

The three potential tokenization errors are a "bad-escape" (backslash-newline outside a "string" or url() in the input), a "bad-string" (unescaped newline inside a "string"), and a "bad-url" (a few different cases). Parsers can choose to abort when seeing one of these errors, or ignore the declaration and attempt to recover.

Returned tokens that carry extra information have a non-nil .Extra value. For TokenError, TokenBadEscape, TokenBadString, and TokenBadURI, the TokenExtraError type carries an `error` with informative text about the nature of the error. For TokenNumber, TokenPercentage, and TokenDimension, the TokenExtraNumeric specifies whether the number is integral, and for TokenDimension, contains the unit string (e.g. "px"). For TokenUnicodeRange, the TokenExtraUnicodeRange type contains the actual start and end values of the range.
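
For example, a consumer can inspect the extra data with a type switch. This is a sketch using only the types and fields documented below; the printed output is illustrative:

switch extra := token.Extra.(type) {
case *tokenizer.TokenExtraError:
	// TokenError, TokenBadEscape, TokenBadString, TokenBadURI.
	fmt.Println("tokenization error:", extra.Err)
case *tokenizer.TokenExtraNumeric:
	// TokenNumber, TokenPercentage, TokenDimension.
	fmt.Println("value:", token.Value, "unit:", extra.Dimension, "non-integer:", extra.NonInteger)
case *tokenizer.TokenExtraUnicodeRange:
	// TokenUnicodeRange.
	fmt.Printf("U+%04X-U+%04X\n", extra.Start, extra.End)
}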

Note: the tokenizer performs only tokenization, not parsing; it implements Section 4 of the CSS Syntax Level 3 specification. See Section 5 for the parsing rules.


Constants

const TokenChar = TokenDelim

TokenChar is an alias for TokenDelim, kept for backwards compatibility.

Variables

TokenExtraTypeLookup provides a handy check for whether a given token type should contain extra data.
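
A sketch of how a consumer might use it, assuming TokenExtraTypeLookup is a map keyed by TokenType (its declaration is not reproduced here):

// Hypothetical usage; check the source for the actual declaration.
if _, ok := tokenizer.TokenExtraTypeLookup[t.Type]; ok {
	// Tokens of this type carry a non-nil .Extra value.
}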

Functions

func Fuzz

func Fuzz(b []byte) int

Fuzz is the entry point for fuzz testing.

Types

type ParseError

type ParseError struct {
	Type    TokenType
	Message string
	Loc     int
}

ParseError represents a CSS syntax error.

func (*ParseError) Error

func (e *ParseError) Error() string

Error implements the error interface.

type Token

type Token struct {
	Type TokenType
	// A string representation of the token value that depends on the type.
	// For example, for a TokenURI, the Value is the URI itself.  For a
	// TokenPercentage, the Value is the number without the percent sign.
	Value string
	// Extra data for the token beyond a simple string.  Will always be a
	// pointer to a "TokenExtra*" type in this package.
	Extra TokenExtra
}

Token represents a token in the CSS syntax.

func (*Token) Render

func (t *Token) Render() string

Render returns the CSS source representation of the token. It is a convenience wrapper around WriteTo.

func (*Token) WriteTo

func (t *Token) WriteTo(w io.Writer) (n int64, err error)

WriteTo writes the CSS source representation of the token to the provided writer. To render a series of tokens, see the TokenRenderer type, which handles the comment insertion rules.

Tokens with type TokenError do not write anything.

type TokenExtra

type TokenExtra interface {
	String() string
}

TokenExtra fills the .Extra field of a token. Consumers should use a type assertion to the appropriate concrete type to inspect its data.

type TokenExtraError

type TokenExtraError struct {
	Err error
}

TokenExtraError is attached to a TokenError and contains the same value as Tokenizer.Err(). See also the ParseError type and ParseError.Recoverable().

func (*TokenExtraError) Cause

func (e *TokenExtraError) Cause() error

Cause implements errors.Causer.

func (*TokenExtraError) Error

func (e *TokenExtraError) Error() string

Error implements error.

func (*TokenExtraError) ParseError

func (e *TokenExtraError) ParseError() *ParseError

Returns the ParseError object, if present.
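
A sketch of recovering positional details from an error token, using the ParseError fields shown above:

if ee, ok := t.Extra.(*tokenizer.TokenExtraError); ok {
	if pe := ee.ParseError(); pe != nil {
		fmt.Printf("syntax error at offset %d: %s\n", pe.Loc, pe.Message)
	}
}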

func (*TokenExtraError) String

func (e *TokenExtraError) String() string

Returns Err.Error().

type TokenExtraHash

type TokenExtraHash struct {
	IsIdentifier bool
}

TokenExtraHash is attached to TokenHash.

func (*TokenExtraHash) String

func (e *TokenExtraHash) String() string

Returns a descriptive string, either "unrestricted" or "id".

type TokenExtraNumeric

type TokenExtraNumeric struct {
	// Value float64 // omitted from this implementation
	NonInteger bool
	Dimension  string
}

TokenExtraNumeric is attached to TokenNumber, TokenPercentage, and TokenDimension.

func (*TokenExtraNumeric) String

func (e *TokenExtraNumeric) String() string

Returns the Dimension field.

type TokenExtraUnicodeRange

type TokenExtraUnicodeRange struct {
	Start rune
	End   rune
}

TokenExtraUnicodeRange is attached to a TokenUnicodeRange.

func (*TokenExtraUnicodeRange) String

func (e *TokenExtraUnicodeRange) String() string

Returns a valid CSS representation of the token.

type TokenRenderer

type TokenRenderer struct {
	// contains filtered or unexported fields
}

TokenRenderer takes care of the comment insertion rules for serialization. This type is mostly intended for the fuzz test and not for general consumption, but it can be used by consumers that want to re-render a parse stream.

func (*TokenRenderer) WriteTokenTo

func (r *TokenRenderer) WriteTokenTo(w io.Writer, t Token) (n int64, err error)

WriteTokenTo writes a token to the given io.Writer, potentially inserting an empty comment in front based on what the previous token was.
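
A sketch of re-rendering a token stream; it assumes the zero value of TokenRenderer is ready to use, which the documentation above does not state explicitly:

var out bytes.Buffer
var r tokenizer.TokenRenderer
z := tokenizer.NewTokenizer(strings.NewReader(myCSS))
for {
	t := z.Next()
	if t.Type.StopToken() {
		break
	}
	// May insert an empty comment before t, depending on the previous token.
	r.WriteTokenTo(&out, t)
}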

type TokenType

type TokenType int

TokenType identifies the type of lexical tokens.

const (
	// Scanner flags.
	TokenError TokenType = iota
	TokenEOF

	// Tokens
	TokenIdent
	TokenFunction
	TokenURI
	TokenDelim // Single character
	TokenAtKeyword
	TokenString
	TokenS // Whitespace
	// CSS Syntax Level 3 removes comments from the token stream, but they are
	// preserved here.
	TokenComment

	// Extra data: TokenExtraHash
	TokenHash
	// Extra data: TokenExtraNumeric
	TokenNumber
	TokenPercentage
	TokenDimension
	// Extra data: TokenExtraUnicodeRange
	TokenUnicodeRange

	// Error tokens
	TokenBadString
	TokenBadURI
	TokenBadEscape // a '\' right before a newline

	// Fixed-string tokens
	TokenIncludes
	TokenDashMatch
	TokenPrefixMatch
	TokenSuffixMatch
	TokenSubstringMatch
	TokenColumn
	TokenColon
	TokenSemicolon
	TokenComma
	TokenOpenBracket
	TokenCloseBracket
	TokenOpenParen
	TokenCloseParen
	TokenOpenBrace
	TokenCloseBrace
	TokenCDO
	TokenCDC
)

The complete list of tokens in CSS Syntax Level 3.

func (TokenType) StopToken

func (t TokenType) StopToken() bool

Stop tokens are TokenError, TokenEOF, TokenBadEscape, TokenBadString, and TokenBadURI. A consumer that does not want to tolerate parsing errors should stop parsing when this returns true.

func (TokenType) String

func (t TokenType) String() string

String returns a string representation of the token type.

type Tokenizer

type Tokenizer struct {
	// contains filtered or unexported fields
}

Tokenizer scans an input and emits tokens following the CSS Syntax Level 3 specification.

func NewTokenizer

func NewTokenizer(r io.Reader) *Tokenizer

NewTokenizer constructs a Tokenizer from the given input. The input need not be 'normalized' according to the spec beforehand (newlines changed to \n, zero bytes changed to U+FFFD).
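
A complete program tying the pieces together (a sketch; the import path is assumed from the package name above):

package main

import (
	"fmt"
	"strings"

	"github.com/gorilla/css/tokenizer" // assumed import path
)

func main() {
	z := tokenizer.NewTokenizer(strings.NewReader(`a { color: #fff }`))
	for {
		t := z.Next()
		if t.Type.StopToken() {
			break
		}
		fmt.Printf("%v %q\n", t.Type, t.Value)
	}
}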

func (*Tokenizer) Err

func (z *Tokenizer) Err() error

Err returns the last error encountered while reading input. It is set when TokenError is returned.

func (*Tokenizer) Next

func (z *Tokenizer) Next() Token

Next scans for the next token and returns it.

func (*Tokenizer) Scan

func (z *Tokenizer) Scan()

Scan advances to the next token. If the tokenizer is in an error state, no input is consumed.

func (*Tokenizer) Token

func (z *Tokenizer) Token() Token

Token returns the most recently scanned token.
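
Scan, Token, and Err combine into a lower-level loop; a sketch, on the assumption that Next is equivalent to Scan followed by Token:

for {
	z.Scan()
	t := z.Token()
	if t.Type == tokenizer.TokenError {
		fmt.Println("input error:", z.Err())
		break
	}
	if t.Type == tokenizer.TokenEOF {
		break
	}
	// Use t...
}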
