parse

package module
v2.7.14
Published: May 11, 2024 License: MIT Imports: 12 Imported by: 103

README

This package contains several lexers and parsers written in Go. All subpackages are built to be streaming, high performance and to be in accordance with the official (latest) specifications.

The lexers are implemented using buffer.Lexer in https://github.com/tdewolff/parse/buffer and the parsers work on top of the lexers. Some subpackages have hashes defined (using Hasher) that speed up common byte-slice comparisons.

Buffer

Reader

Reader is a wrapper around a []byte that implements the io.Reader interface. It is comparable to bytes.Reader but has slightly different semantics (and a slightly smaller memory footprint).

Writer

Writer is a buffer that implements the io.Writer interface and expands the buffer as needed. The reset functionality allows for better memory reuse. After calling Reset, it will overwrite the current buffer and thus reduce allocations.
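The memory reuse that Reset allows can be sketched with a plain byte slice. The `growWriter` type below is hypothetical and is not the package's Writer; it only shows the idea of truncating the length while keeping the capacity.

```go
package main

import "fmt"

// growWriter is a hypothetical, simplified stand-in for a resettable buffer.
type growWriter struct {
	buf []byte
}

// Write appends b and lets append grow the backing array as needed.
func (w *growWriter) Write(b []byte) (int, error) {
	w.buf = append(w.buf, b...)
	return len(b), nil
}

// Reset truncates the length to zero but keeps the capacity for reuse.
func (w *growWriter) Reset() {
	w.buf = w.buf[:0]
}

// Bytes returns the bytes written since the last Reset.
func (w *growWriter) Bytes() []byte {
	return w.buf
}

func main() {
	w := &growWriter{}
	w.Write([]byte("first payload"))
	capBefore := cap(w.buf)

	w.Reset()
	w.Write([]byte("second"))

	// The second write fits in the old backing array, so nothing is allocated.
	fmt.Println(string(w.Bytes()), cap(w.buf) == capBefore)
}
```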

Lexer

Lexer is a read buffer specifically designed for building lexers. It keeps track of two positions: a start and an end position. The start position is the beginning of the current token being parsed; the end position is moved forward until a valid token is found. Calling Shift will collapse the positions to the end and return the parsed []byte.

The end position is moved with Move(int), which also accepts negative integers. One can also save the position with Pos() int before trying to parse a token and, if parsing fails, rewind with Rewind(int), passing the previously saved position.

Peek(int) byte will peek forward (relative to the end position) and return the byte at that location. PeekRune(int) (rune, int) returns the UTF-8 rune and its length at the given byte position. Upon an error Peek will return 0; the user must peek at every character and not skip any, otherwise it may skip a 0 and panic on out-of-bounds indexing.

Lexeme() []byte will return the currently selected bytes, and Skip() will collapse the selection. Shift() []byte is a combination of Lexeme() []byte and Skip().

When the passed io.Reader returns an error, Err() error will return that error even if not at the end of the buffer.
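The two-position design described in this section can be sketched in a few lines. The `miniLexer` type and the `tokens` helper below are hypothetical and use only the standard library; they mirror the method names from the text but are not the package's implementation. A trailing 0 sentinel is assumed so that Peek can signal the end of input.

```go
package main

import "fmt"

// miniLexer is a hypothetical sketch of a two-position read buffer.
type miniLexer struct {
	buf   []byte // input followed by a 0 sentinel
	start int    // beginning of the current token
	end   int    // position that is moved forward
}

func newMiniLexer(s string) *miniLexer {
	return &miniLexer{buf: append([]byte(s), 0)}
}

// Peek returns the byte at offset i relative to the end position.
func (l *miniLexer) Peek(i int) byte { return l.buf[l.end+i] }

// Move shifts the end position by n bytes; n may be negative.
func (l *miniLexer) Move(n int) { l.end += n }

// Pos returns a mark relative to the start position.
func (l *miniLexer) Pos() int { return l.end - l.start }

// Rewind restores a mark previously returned by Pos.
func (l *miniLexer) Rewind(pos int) { l.end = l.start + pos }

// Shift returns the selected bytes and collapses start onto end.
func (l *miniLexer) Shift() []byte {
	b := l.buf[l.start:l.end]
	l.start = l.end
	return b
}

func isLetter(c byte) bool {
	return 'a' <= c && c <= 'z' || 'A' <= c && c <= 'Z'
}

// tokens splits s into runs of ASCII letters and single other bytes.
// Every byte is peeked before it is passed, so the sentinel is never skipped.
func tokens(s string) []string {
	l := newMiniLexer(s)
	var out []string
	for l.Peek(0) != 0 {
		for isLetter(l.Peek(0)) {
			l.Move(1)
		}
		if l.Pos() == 0 {
			l.Move(1) // not a letter: emit a single byte
		}
		out = append(out, string(l.Shift()))
	}
	return out
}

func main() {
	fmt.Printf("%q\n", tokens("ab+cd")) // ["ab" "+" "cd"]
}
```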

StreamLexer

StreamLexer behaves like Lexer but uses a buffer pool to read in chunks from io.Reader, retaining old buffers in memory that are still in use, and re-using old buffers otherwise. Calling Free(n int) frees up n bytes from the internal buffer(s). It holds an array of buffers to accommodate for keeping everything in-memory. Calling ShiftLen() int returns the number of bytes that have been shifted since the previous call to ShiftLen, which can be used to specify how many bytes need to be freed up from the buffer. If you don't need to keep returned byte slices around, call Free(ShiftLen()) after every Shift call.

Strconv

This package contains string conversion functions much like the standard library's strconv package, but it is specifically tailored for the performance needs within the minify package.

For example, the floating-point to string conversion function is approximately twice as fast as the standard library, but it is not as precise.

CSS

This package is a CSS3 lexer and parser. Both follow the specification at CSS Syntax Module Level 3. The lexer takes an io.Reader and converts it into tokens until the EOF. The parser returns a parse tree of the full io.Reader input stream, but the low-level Next function can be used for stream parsing to return grammar units until the EOF.

See the README in the css subpackage.

HTML

This package is an HTML5 lexer. It follows the specification at The HTML syntax. The lexer takes an io.Reader and converts it into tokens until the EOF.

See the README in the html subpackage.

JS

This package is a JS lexer (ECMA-262, edition 6.0). It follows the specification at ECMAScript Language Specification. The lexer takes an io.Reader and converts it into tokens until the EOF.

See the README in the js subpackage.

JSON

This package is a JSON parser (ECMA-404). It follows the specification at JSON. The parser takes an io.Reader and converts it into tokens until the EOF.

See the README in the json subpackage.

SVG

This package contains common hashes for SVG1.1 tags and attributes.

XML

This package is an XML1.0 lexer. It follows the specification at Extensible Markup Language (XML) 1.0 (Fifth Edition). The lexer takes an io.Reader and converts it into tokens until the EOF.

See the README in the xml subpackage.

License

Released under the MIT license.

Documentation

Overview

Package parse contains a collection of parsers for various formats in its subpackages.

Index

Constants

This section is empty.

Variables

var DataURIEncodingTable = [256]bool{}/* 256 elements not displayed */

DataURIEncodingTable is a charmap for which characters need escaping in the Data URI encoding scheme. Escape only non-printable characters, Unicode and %, #, &. IE11 additionally requires encoding of \, [, ], ", <, >, `, {, }, |, ^, which is not required by Chrome, Firefox, Opera, Edge, Safari, or Yandex. To pass the HTML validator, restricted URL characters must be escaped: non-printable characters, space, <, >, #, %, ".

var ErrBadDataURI = errors.New("not a data URI")

ErrBadDataURI is returned by DataURI when the byte slice does not start with 'data:' or is too short.

var URLEncodingTable = [256]bool{}/* 256 elements not displayed */

URLEncodingTable is a charmap for which characters need escaping in the URL encoding scheme.

Functions

func Copy

func Copy(src []byte) (dst []byte)

Copy returns a copy of the given byte slice.

func DataURI

func DataURI(dataURI []byte) ([]byte, []byte, error)

DataURI parses the given data URI and returns the mediatype, the data, and an error.
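As a rough illustration of what parsing a data URI involves, the following sketch splits the mediatype from the payload using only the standard library. The `splitDataURI` function and `errBadDataURI` value are hypothetical names; the sketch handles only the `;base64` flag and does not percent-decode plain payloads, so it should not be read as the behavior of DataURI itself.

```go
package main

import (
	"bytes"
	"encoding/base64"
	"errors"
	"fmt"
)

// errBadDataURI is a hypothetical error for inputs without a "data:" prefix.
var errBadDataURI = errors.New("not a data URI")

// splitDataURI is a simplified sketch that separates mediatype and data.
func splitDataURI(uri []byte) (mediatype, data []byte, err error) {
	if !bytes.HasPrefix(uri, []byte("data:")) {
		return nil, nil, errBadDataURI
	}
	rest := uri[len("data:"):]
	comma := bytes.IndexByte(rest, ',')
	if comma < 0 {
		return nil, nil, errBadDataURI
	}
	mediatype, data = rest[:comma], rest[comma+1:]

	// Decode the payload when the mediatype ends in the base64 flag.
	if bytes.HasSuffix(mediatype, []byte(";base64")) {
		mediatype = mediatype[:len(mediatype)-len(";base64")]
		decoded := make([]byte, base64.StdEncoding.DecodedLen(len(data)))
		n, decErr := base64.StdEncoding.Decode(decoded, data)
		if decErr != nil {
			return nil, nil, decErr
		}
		data = decoded[:n]
	}
	return mediatype, data, nil
}

func main() {
	mediatype, data, err := splitDataURI([]byte("data:text/plain;base64,SGVsbG8="))
	fmt.Println(string(mediatype), string(data), err) // text/plain Hello <nil>
}
```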

func DecodeURL added in v2.3.15

func DecodeURL(b []byte) []byte

DecodeURL decodes a URL encoded using the URL encoding scheme.

func Dimension

func Dimension(b []byte) (int, int)

Dimension parses a byte-slice and returns the length of the number and its unit.

func EncodeURL added in v2.3.15

func EncodeURL(b []byte, table [256]bool) []byte

EncodeURL encodes bytes using the URL encoding scheme, escaping the bytes marked in the given table.
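A [256]bool charmap makes the escape decision a single array lookup per byte. The sketch below shows that technique with a hypothetical `encode` function and a hypothetical `exampleTable`; the table contents are an assumption for illustration and differ from URLEncodingTable and DataURIEncodingTable.

```go
package main

import "fmt"

// encode percent-encodes every byte c for which table[c] is true.
// It is an illustrative sketch, not the package's EncodeURL.
func encode(b []byte, table [256]bool) []byte {
	const hex = "0123456789ABCDEF"
	out := make([]byte, 0, len(b))
	for _, c := range b {
		if table[c] {
			out = append(out, '%', hex[c>>4], hex[c&15])
		} else {
			out = append(out, c)
		}
	}
	return out
}

// exampleTable is a hypothetical charmap: it marks control bytes, DEL,
// non-ASCII bytes, space, '%' and '#' as needing escaping.
func exampleTable() [256]bool {
	var t [256]bool
	for c := 0; c < 256; c++ {
		t[c] = c < 0x20 || c >= 0x7F
	}
	t[' '], t['%'], t['#'] = true, true, true
	return t
}

func main() {
	fmt.Println(string(encode([]byte("a b#c"), exampleTable()))) // a%20b%23c
}
```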

func EqualFold

func EqualFold(s, targetLower []byte) bool

EqualFold returns true when s matches targetLower case-insensitively (targetLower must be lowercase).
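Requiring a lowercase target means only one side of the comparison has to be folded. A minimal sketch of that idea, assuming ASCII-only folding (the `equalFoldLower` name is hypothetical and this is not the package's code):

```go
package main

import "fmt"

// equalFoldLower reports whether s equals targetLower when the ASCII
// uppercase letters in s are folded to lowercase.
func equalFoldLower(s, targetLower []byte) bool {
	if len(s) != len(targetLower) {
		return false
	}
	for i, c := range targetLower {
		d := s[i]
		// Accept an exact match, or an uppercase letter whose lowercase form is c.
		if d != c && (d < 'A' || d > 'Z' || d+('a'-'A') != c) {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(equalFoldLower([]byte("DocType"), []byte("doctype"))) // true
	// The target is not folded, so an uppercase target never matches lowercase input.
	fmt.Println(equalFoldLower([]byte("doctype"), []byte("DOCTYPE"))) // false
}
```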

func IsAllWhitespace

func IsAllWhitespace(b []byte) bool

IsAllWhitespace returns true when the entire byte slice consists of space, \n, \r, \t, \f.

func IsNewline

func IsNewline(c byte) bool

IsNewline returns true for \n, \r.

func IsWhitespace

func IsWhitespace(c byte) bool

IsWhitespace returns true for space, \n, \r, \t, \f.

func Mediatype

func Mediatype(b []byte) ([]byte, map[string]string)

Mediatype parses a given mediatype and splits the mimetype from the parameters. It works similarly to mime.ParseMediaType but is faster.
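For reference, this is how the standard library function mentioned above splits a mediatype. The `splitMediatype` wrapper is a hypothetical helper; the example shows only mime.ParseMediaType and makes no claim about how Mediatype handles edge cases.

```go
package main

import (
	"fmt"
	"mime"
)

// splitMediatype wraps the standard library parser for illustration.
func splitMediatype(s string) (string, map[string]string, error) {
	return mime.ParseMediaType(s)
}

func main() {
	mimetype, params, err := splitMediatype("text/html; charset=utf-8")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(mimetype, params["charset"]) // text/html utf-8
}
```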

func Number

func Number(b []byte) int

Number returns the number of bytes that parse as a number of the regex format (+|-)?([0-9]+(\.[0-9]+)?|\.[0-9]+)((e|E)(+|-)?[0-9]+)?.
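The format above can be written as an anchored Go regular expression, with `[+-]` in place of `(+|-)` because a bare `+` is not valid regexp syntax. The `numberLen` helper is a hypothetical, regexp-based sketch of the same contract and will be slower than a hand-written scanner.

```go
package main

import (
	"fmt"
	"regexp"
)

// numberRe matches the number format at the start of the input.
var numberRe = regexp.MustCompile(`^[+-]?([0-9]+(\.[0-9]+)?|\.[0-9]+)([eE][+-]?[0-9]+)?`)

// numberLen returns how many leading bytes of b form a number, or 0 if none.
func numberLen(b []byte) int {
	return len(numberRe.Find(b))
}

func main() {
	fmt.Println(numberLen([]byte("12.5px"))) // 4
	fmt.Println(numberLen([]byte("-.5")))    // 3
	fmt.Println(numberLen([]byte("1e3x")))   // 3
	fmt.Println(numberLen([]byte("abc")))    // 0
}
```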

func Position

func Position(r io.Reader, offset int) (line, col int, context string)

Position returns the line and column number for a certain position in a file. It is useful for recovering the position in a file that caused an error. It only treats \n, \r, and \r\n as newlines, which might be different from some languages also recognizing \f, \u2028, and \u2029 to be newlines.
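The newline rule can be sketched as a single pass over the bytes before the offset. The `lineCol` function below is hypothetical; it assumes 1-based lines and columns, counts columns in bytes, works on a byte slice instead of an io.Reader, and omits the context string.

```go
package main

import "fmt"

// lineCol returns the 1-based line and column of offset in src, treating
// \n, \r, and \r\n as newlines. Columns are counted in bytes.
func lineCol(src []byte, offset int) (line, col int) {
	line, col = 1, 1
	for i := 0; i < offset && i < len(src); i++ {
		switch src[i] {
		case '\r':
			if i+1 < len(src) && src[i+1] == '\n' {
				continue // the following \n ends this line
			}
			line, col = line+1, 1
		case '\n':
			line, col = line+1, 1
		default:
			col++
		}
	}
	return line, col
}

func main() {
	src := []byte("ab\r\ncd\nef")
	fmt.Println(lineCol(src, 5)) // 2 2 (the 'd')
	fmt.Println(lineCol(src, 8)) // 3 2 (the 'f')
}
```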

func Printable added in v2.5.0

func Printable(r rune) string

Printable returns a printable string for the given rune.

func QuoteEntity

func QuoteEntity(b []byte) (quote byte, n int)

QuoteEntity parses the given byte slice and returns the quote that got matched (' or ") and its entity length. TODO: deprecated

func ReplaceEntities added in v2.3.11

func ReplaceEntities(b []byte, entitiesMap map[string][]byte, revEntitiesMap map[byte][]byte) []byte

ReplaceEntities replaces all occurrences of entities (such as &quot;) with their respective unencoded bytes.

func ReplaceMultipleWhitespace

func ReplaceMultipleWhitespace(b []byte) []byte

ReplaceMultipleWhitespace replaces character series of space, \n, \t, \f, \r with a single space or newline (when the series contained a \n or \r).
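The rule can be sketched as follows. The `collapseWhitespace` function is hypothetical; it allocates a new slice and emits \n as the newline, both of which are assumptions and may differ from what ReplaceMultipleWhitespace does.

```go
package main

import "fmt"

// isWhitespace reports whether c is space, \n, \r, \t, or \f.
func isWhitespace(c byte) bool {
	return c == ' ' || c == '\n' || c == '\r' || c == '\t' || c == '\f'
}

// collapseWhitespace replaces each run of whitespace by one space, or by
// one \n when the run contained a \n or \r.
func collapseWhitespace(b []byte) []byte {
	out := make([]byte, 0, len(b))
	for i := 0; i < len(b); {
		if !isWhitespace(b[i]) {
			out = append(out, b[i])
			i++
			continue
		}
		repl := byte(' ')
		for i < len(b) && isWhitespace(b[i]) {
			if b[i] == '\n' || b[i] == '\r' {
				repl = '\n'
			}
			i++
		}
		out = append(out, repl)
	}
	return out
}

func main() {
	fmt.Printf("%q\n", collapseWhitespace([]byte("a  \t b\r\n\n c"))) // "a b\nc"
}
```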

func ReplaceMultipleWhitespaceAndEntities added in v2.3.13

func ReplaceMultipleWhitespaceAndEntities(b []byte, entitiesMap map[string][]byte, revEntitiesMap map[byte][]byte) []byte

ReplaceMultipleWhitespaceAndEntities is a combination of ReplaceMultipleWhitespace and ReplaceEntities. It is faster than executing both sequentially.

func ToLower

func ToLower(src []byte) []byte

ToLower converts all characters in the byte slice from A-Z to a-z.

func TrimWhitespace

func TrimWhitespace(b []byte) []byte

TrimWhitespace removes any leading and trailing whitespace characters.

Types

type BinaryFileReader added in v2.7.14

type BinaryFileReader struct {
	Endianness binary.ByteOrder
	// contains filtered or unexported fields
}

func NewBinaryFileReader added in v2.7.14

func NewBinaryFileReader(f *os.File, chunk int) (*BinaryFileReader, error)

func (*BinaryFileReader) BufferLen added in v2.7.14

func (r *BinaryFileReader) BufferLen() int

BufferLen returns the length of the buffer.

func (*BinaryFileReader) Len added in v2.7.14

func (r *BinaryFileReader) Len() uint64

Len returns the remaining length of the buffer.

func (*BinaryFileReader) Offset added in v2.7.14

func (r *BinaryFileReader) Offset() uint64

Offset returns the offset of the buffer.

func (*BinaryFileReader) Pos added in v2.7.14

func (r *BinaryFileReader) Pos() uint64

Pos returns the reader's position.

func (*BinaryFileReader) Read added in v2.7.14

func (r *BinaryFileReader) Read(b []byte) (int, error)

Read complies with io.Reader.

func (*BinaryFileReader) ReadByte added in v2.7.14

func (r *BinaryFileReader) ReadByte() byte

ReadByte reads a single byte.

func (*BinaryFileReader) ReadBytes added in v2.7.14

func (r *BinaryFileReader) ReadBytes(n int) []byte

ReadBytes reads n bytes.

func (*BinaryFileReader) ReadInt16 added in v2.7.14

func (r *BinaryFileReader) ReadInt16() int16

ReadInt16 reads an int16.

func (*BinaryFileReader) ReadInt32 added in v2.7.14

func (r *BinaryFileReader) ReadInt32() int32

ReadInt32 reads an int32.

func (*BinaryFileReader) ReadInt64 added in v2.7.14

func (r *BinaryFileReader) ReadInt64() int64

ReadInt64 reads an int64.

func (*BinaryFileReader) ReadInt8 added in v2.7.14

func (r *BinaryFileReader) ReadInt8() int8

ReadInt8 reads an int8.

func (*BinaryFileReader) ReadString added in v2.7.14

func (r *BinaryFileReader) ReadString(n int) string

ReadString reads a string of length n.

func (*BinaryFileReader) ReadUint16 added in v2.7.14

func (r *BinaryFileReader) ReadUint16() uint16

ReadUint16 reads a uint16.

func (*BinaryFileReader) ReadUint32 added in v2.7.14

func (r *BinaryFileReader) ReadUint32() uint32

ReadUint32 reads a uint32.

func (*BinaryFileReader) ReadUint64 added in v2.7.14

func (r *BinaryFileReader) ReadUint64() uint64

ReadUint64 reads a uint64.

func (*BinaryFileReader) ReadUint8 added in v2.7.14

func (r *BinaryFileReader) ReadUint8() uint8

ReadUint8 reads a uint8.

func (*BinaryFileReader) Seek added in v2.7.14

func (r *BinaryFileReader) Seek(pos uint64) error

Seek sets the reader position in the buffer.

type BinaryReader added in v2.7.14

type BinaryReader struct {
	Endianness binary.ByteOrder
	// contains filtered or unexported fields
}

BinaryReader is a binary big endian file format reader.

func NewBinaryReader added in v2.7.14

func NewBinaryReader(buf []byte) *BinaryReader

NewBinaryReader returns a big endian binary file format reader.

func NewBinaryReaderLE added in v2.7.14

func NewBinaryReaderLE(buf []byte) *BinaryReader

NewBinaryReaderLE returns a little endian binary file format reader.

func (*BinaryReader) EOF added in v2.7.14

func (r *BinaryReader) EOF() bool

EOF returns true if we reached the end-of-file.

func (*BinaryReader) Len added in v2.7.14

func (r *BinaryReader) Len() uint32

Len returns the remaining length of the buffer.

func (*BinaryReader) Pos added in v2.7.14

func (r *BinaryReader) Pos() uint32

Pos returns the reader's position.

func (*BinaryReader) Read added in v2.7.14

func (r *BinaryReader) Read(b []byte) (int, error)

Read complies with io.Reader.

func (*BinaryReader) ReadByte added in v2.7.14

func (r *BinaryReader) ReadByte() byte

ReadByte reads a single byte.

func (*BinaryReader) ReadBytes added in v2.7.14

func (r *BinaryReader) ReadBytes(n uint32) []byte

ReadBytes reads n bytes.

func (*BinaryReader) ReadInt16 added in v2.7.14

func (r *BinaryReader) ReadInt16() int16

ReadInt16 reads an int16.

func (*BinaryReader) ReadInt32 added in v2.7.14

func (r *BinaryReader) ReadInt32() int32

ReadInt32 reads an int32.

func (*BinaryReader) ReadInt64 added in v2.7.14

func (r *BinaryReader) ReadInt64() int64

ReadInt64 reads an int64.

func (*BinaryReader) ReadInt8 added in v2.7.14

func (r *BinaryReader) ReadInt8() int8

ReadInt8 reads an int8.

func (*BinaryReader) ReadString added in v2.7.14

func (r *BinaryReader) ReadString(n uint32) string

ReadString reads a string of length n.

func (*BinaryReader) ReadUint16 added in v2.7.14

func (r *BinaryReader) ReadUint16() uint16

ReadUint16 reads a uint16.

func (*BinaryReader) ReadUint32 added in v2.7.14

func (r *BinaryReader) ReadUint32() uint32

ReadUint32 reads a uint32.

func (*BinaryReader) ReadUint64 added in v2.7.14

func (r *BinaryReader) ReadUint64() uint64

ReadUint64 reads a uint64.

func (*BinaryReader) ReadUint8 added in v2.7.14

func (r *BinaryReader) ReadUint8() uint8

ReadUint8 reads a uint8.

func (*BinaryReader) Seek added in v2.7.14

func (r *BinaryReader) Seek(pos uint32) error

Seek sets the reader position in the buffer.

type BinaryWriter added in v2.7.14

type BinaryWriter struct {
	// contains filtered or unexported fields
}

BinaryWriter is a big endian binary file format writer.

func NewBinaryWriter added in v2.7.14

func NewBinaryWriter(buf []byte) *BinaryWriter

NewBinaryWriter returns a big endian binary file format writer.

func (*BinaryWriter) Bytes added in v2.7.14

func (w *BinaryWriter) Bytes() []byte

Bytes returns the buffer's bytes.

func (*BinaryWriter) Len added in v2.7.14

func (w *BinaryWriter) Len() uint32

Len returns the buffer's length in bytes.

func (*BinaryWriter) Write added in v2.7.14

func (w *BinaryWriter) Write(b []byte) (int, error)

Write complies with io.Writer.

func (*BinaryWriter) WriteByte added in v2.7.14

func (w *BinaryWriter) WriteByte(v byte)

WriteByte writes the given byte to the buffer.

func (*BinaryWriter) WriteBytes added in v2.7.14

func (w *BinaryWriter) WriteBytes(v []byte)

WriteBytes writes the given bytes to the buffer.

func (*BinaryWriter) WriteInt16 added in v2.7.14

func (w *BinaryWriter) WriteInt16(v int16)

WriteInt16 writes the given int16 to the buffer.

func (*BinaryWriter) WriteInt32 added in v2.7.14

func (w *BinaryWriter) WriteInt32(v int32)

WriteInt32 writes the given int32 to the buffer.

func (*BinaryWriter) WriteInt64 added in v2.7.14

func (w *BinaryWriter) WriteInt64(v int64)

WriteInt64 writes the given int64 to the buffer.

func (*BinaryWriter) WriteInt8 added in v2.7.14

func (w *BinaryWriter) WriteInt8(v int8)

WriteInt8 writes the given int8 to the buffer.

func (*BinaryWriter) WriteString added in v2.7.14

func (w *BinaryWriter) WriteString(v string)

WriteString writes the given string to the buffer.

func (*BinaryWriter) WriteUint16 added in v2.7.14

func (w *BinaryWriter) WriteUint16(v uint16)

WriteUint16 writes the given uint16 to the buffer.

func (*BinaryWriter) WriteUint32 added in v2.7.14

func (w *BinaryWriter) WriteUint32(v uint32)

WriteUint32 writes the given uint32 to the buffer.

func (*BinaryWriter) WriteUint64 added in v2.7.14

func (w *BinaryWriter) WriteUint64(v uint64)

WriteUint64 writes the given uint64 to the buffer.

func (*BinaryWriter) WriteUint8 added in v2.7.14

func (w *BinaryWriter) WriteUint8(v uint8)

WriteUint8 writes the given uint8 to the buffer.

type BitmapReader added in v2.7.14

type BitmapReader struct {
	// contains filtered or unexported fields
}

BitmapReader is a binary bitmap reader.

func NewBitmapReader added in v2.7.14

func NewBitmapReader(buf []byte) *BitmapReader

NewBitmapReader returns a binary bitmap reader.

func (*BitmapReader) EOF added in v2.7.14

func (r *BitmapReader) EOF() bool

EOF returns true if we reached the buffer's end-of-file.

func (*BitmapReader) Pos added in v2.7.14

func (r *BitmapReader) Pos() uint32

Pos returns the current bit position.

func (*BitmapReader) Read added in v2.7.14

func (r *BitmapReader) Read() bool

Read reads the next bit.
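Reading a buffer bit by bit can be sketched as follows. The `bitReader` type is hypothetical, and the bit order (most significant bit first) and the false result at end-of-file are assumptions made for this example; the documentation above does not specify either.

```go
package main

import "fmt"

// bitReader is a hypothetical bit-level reader over a byte slice.
type bitReader struct {
	buf []byte
	pos uint32 // current bit position
}

// EOF reports whether all bits have been read.
func (r *bitReader) EOF() bool {
	return uint32(len(r.buf))*8 <= r.pos
}

// Pos returns the current bit position.
func (r *bitReader) Pos() uint32 {
	return r.pos
}

// Read returns the next bit, most significant bit of each byte first.
// It returns false at end-of-file (an assumption of this sketch).
func (r *bitReader) Read() bool {
	if r.EOF() {
		return false
	}
	bit := r.buf[r.pos>>3]&(0x80>>(r.pos&7)) != 0
	r.pos++
	return bit
}

func main() {
	r := &bitReader{buf: []byte{0xA0}} // 1010 0000
	for !r.EOF() {
		if r.Read() {
			fmt.Print("1")
		} else {
			fmt.Print("0")
		}
	}
	fmt.Println() // 10100000
}
```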

type BitmapWriter added in v2.7.14

type BitmapWriter struct {
	// contains filtered or unexported fields
}

BitmapWriter is a binary bitmap writer.

func NewBitmapWriter added in v2.7.14

func NewBitmapWriter(buf []byte) *BitmapWriter

NewBitmapWriter returns a binary bitmap writer.

func (*BitmapWriter) Bytes added in v2.7.14

func (w *BitmapWriter) Bytes() []byte

Bytes returns the buffer's bytes.

func (*BitmapWriter) Len added in v2.7.14

func (w *BitmapWriter) Len() uint32

Len returns the buffer's length in bytes.

func (*BitmapWriter) Write added in v2.7.14

func (w *BitmapWriter) Write(bit bool)

Write writes the next bit.

type Error

type Error struct {
	Message string
	Line    int
	Column  int
	Context string
}

Error is a parsing error returned by a parser. It contains a message and the line, column, and context at which the error occurred.

func NewError

func NewError(r io.Reader, offset int, message string, a ...interface{}) *Error

NewError creates a new error.

func NewErrorLexer

func NewErrorLexer(l *Input, message string, a ...interface{}) *Error

NewErrorLexer creates a new error from an active Lexer.

func (*Error) Error

func (e *Error) Error() string

Error returns the error string, containing the context and line + column number.

func (*Error) Position

func (e *Error) Position() (int, int, string)

Position returns the line, column, and context of the error. Context is the entire line at which the error occurred.

type Indenter added in v2.7.13

type Indenter struct {
	io.Writer
	// contains filtered or unexported fields
}

func NewIndenter added in v2.7.13

func NewIndenter(w io.Writer, n int) Indenter

func (Indenter) Write added in v2.7.13

func (in Indenter) Write(b []byte) (int, error)

type Input added in v2.5.0

type Input struct {
	// contains filtered or unexported fields
}

Input is a buffered reader that allows peeking forward and shifting, taking an io.Reader. It keeps data in-memory until Free, taking a byte length, is called to move beyond the data.

func NewInput added in v2.5.0

func NewInput(r io.Reader) *Input

NewInput returns a new Input for a given io.Reader and uses ioutil.ReadAll to read it into a byte slice. If the io.Reader implements Bytes, that is used instead. It will append a NULL at the end of the buffer.

func NewInputBytes added in v2.5.0

func NewInputBytes(b []byte) *Input

NewInputBytes returns a new Input for a given byte slice and appends NULL at the end. To avoid reallocation, make sure the capacity has room for one more byte.
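The capacity advice follows from how append works: with one spare byte of capacity the NULL is written in place, otherwise the slice is copied to a new backing array. The helpers below are hypothetical and only demonstrate that effect with plain slices.

```go
package main

import "fmt"

// withSentinel appends a NULL byte, as a sentinel-terminated buffer needs.
func withSentinel(b []byte) []byte {
	return append(b, 0)
}

// sharesBacking reports whether two non-empty slices start at the same address.
func sharesBacking(a, b []byte) bool {
	return len(a) > 0 && len(b) > 0 && &a[0] == &b[0]
}

func main() {
	roomy := make([]byte, 3, 4) // one spare byte of capacity
	copy(roomy, "abc")
	tight := make([]byte, 3) // capacity equals length
	copy(tight, "abc")

	fmt.Println(sharesBacking(roomy, withSentinel(roomy))) // true: appended in place
	fmt.Println(sharesBacking(tight, withSentinel(tight))) // false: append reallocated
}
```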

func NewInputString added in v2.5.0

func NewInputString(s string) *Input

NewInputString returns a new Input for a given string and appends NULL at the end.

func (*Input) Bytes added in v2.5.0

func (z *Input) Bytes() []byte

Bytes returns the underlying buffer.

func (*Input) Err added in v2.5.0

func (z *Input) Err() error

Err returns the error returned from io.Reader or io.EOF when the end has been reached.

func (*Input) Len added in v2.5.0

func (z *Input) Len() int

Len returns the length of the underlying buffer.

func (*Input) Lexeme added in v2.5.0

func (z *Input) Lexeme() []byte

Lexeme returns the bytes of the current selection.

func (*Input) Move added in v2.5.0

func (z *Input) Move(n int)

Move advances the position.

func (*Input) MoveRune added in v2.7.14

func (z *Input) MoveRune()

MoveRune advances the position by the length of the current rune.

func (*Input) Offset added in v2.5.0

func (z *Input) Offset() int

Offset returns the character position in the buffer.

func (*Input) Peek added in v2.5.0

func (z *Input) Peek(pos int) byte

Peek returns the ith byte relative to the end position. Peek returns 0 when an error has occurred; Err returns the error.

func (*Input) PeekErr added in v2.5.0

func (z *Input) PeekErr(pos int) error

PeekErr returns the error at position pos. When pos is zero, this is the same as calling Err().

func (*Input) PeekRune added in v2.5.0

func (z *Input) PeekRune(pos int) (rune, int)

PeekRune returns the rune and rune length of the ith byte relative to the end position.

func (*Input) Pos added in v2.5.0

func (z *Input) Pos() int

Pos returns a mark to which the position can be rewound.

func (*Input) Reset added in v2.5.0

func (z *Input) Reset()

Reset resets the position to the start of the underlying buffer.

func (*Input) Restore added in v2.5.0

func (z *Input) Restore()

Restore restores the byte past the end of the buffer that was replaced by NULL.

func (*Input) Rewind added in v2.5.0

func (z *Input) Rewind(pos int)

Rewind rewinds the position to the given position.

func (*Input) Shift added in v2.5.0

func (z *Input) Shift() []byte

Shift returns the bytes of the current selection and collapses the position to the end of the selection.

func (*Input) Skip added in v2.5.0

func (z *Input) Skip()

Skip collapses the position to the end of the selection.

Directories

Path    Synopsis
buffer  Package buffer contains buffer and wrapper types for byte slices.
css     Package css is a CSS3 lexer and parser following the specifications at http://www.w3.org/TR/css-syntax-3/.
html    Package html is an HTML5 lexer following the specifications at http://www.w3.org/TR/html5/syntax.html.
js      Package js is an ECMAScript5.1 lexer following the specifications at http://www.ecma-international.org/ecma-262/5.1/.
json    Package json is a JSON parser following the specifications at http://json.org/.
xml     Package xml is an XML1.0 lexer following the specifications at http://www.w3.org/TR/xml/.