parse

package module
v0.0.0-...-2b12e0a
Published: Mar 1, 2016 License: MIT Imports: 10 Imported by: 0

README

parse is a Go implementation of a PEG parser.

It is a simple parser that maps Go types to PEG language definitions.

Simple example:

type Hello struct {
	Hello string `regexp:"[hH]ello"`
	_     string `literal:","`
	Name  string `regexp:"[a-zA-Z]+"`
}
...
var hello Hello
newLocation, err := parse.Parse(&hello, []byte("Hello, user"), nil)
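
To make the idea of tag-driven parsing concrete, here is a minimal sketch that walks the struct fields with reflection and matches each `regexp` or `literal` tag in order. It uses only the standard library and is not how package parse is implemented; `matchFields` is a hypothetical helper, and skipping whitespace before every field is an assumption based on the default described in this documentation.

```go
package main

import (
	"fmt"
	"reflect"
	"regexp"
	"strings"
)

// Hello mirrors the structure from the README example.
type Hello struct {
	Hello string `regexp:"[hH]ello"`
	_     string `literal:","`
	Name  string `regexp:"[a-zA-Z]+"`
}

// matchFields is a hypothetical helper, not part of package parse. It matches
// the tagged fields of the struct pointed to by dst against input, one after
// another, and returns the position after the last match.
func matchFields(dst interface{}, input string) (int, error) {
	v := reflect.ValueOf(dst).Elem()
	t := v.Type()
	pos := 0
	for i := 0; i < t.NumField(); i++ {
		// Assumed default: skip spaces, tabs and line breaks before a field.
		for pos < len(input) && strings.IndexByte(" \t\r\n", input[pos]) >= 0 {
			pos++
		}
		f := t.Field(i)
		matched := ""
		if expr, ok := f.Tag.Lookup("regexp"); ok {
			// Anchor the expression so it must match at the current position.
			loc := regexp.MustCompile("^(?:" + expr + ")").FindStringIndex(input[pos:])
			if loc == nil {
				return pos, fmt.Errorf("field %d: %q does not match at offset %d", i, expr, pos)
			}
			matched = input[pos : pos+loc[1]]
		} else if lit, ok := f.Tag.Lookup("literal"); ok {
			if !strings.HasPrefix(input[pos:], lit) {
				return pos, fmt.Errorf("field %d: expected %q at offset %d", i, lit, pos)
			}
			matched = lit
		}
		pos += len(matched)
		if f.Name != "_" {
			// Blank fields cannot be set through reflection; they only consume input.
			v.Field(i).SetString(matched)
		}
	}
	return pos, nil
}

func main() {
	var hello Hello
	newLocation, err := matchFields(&hello, "Hello, user")
	fmt.Println(newLocation, hello.Hello, hello.Name, err) // prints "11 Hello user <nil>"
}
```

The sketch checks the `regexp` tag before `literal`, matching the rule in the tag table that regexp wins when both are present.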

Documentation is here: https://godoc.org/github.com/rymis/parse. User-friendly examples and a book are also available.

Documentation

Overview

Package parse is an easy-to-use PEG implementation for Go.

This package contains a PEG (Parsing Expression Grammar) implementation for Go. It differs from other libraries in that the grammar is mapped to Go types, so you need neither external grammar files nor expression combinators to specify it, as you would with pyparsing or Boost.Spirit.

For example, you can parse "Hello, World!" using this structure:

type HelloWorld struct {
	Hello string `regexp:"[hH]ello"`
	_     string `literal:","`
	World string `regexp:"[a-zA-Z]+"`
	_     string `regexp:"!?"`
}

The only thing you need to do is call the Parse function:

var hello HelloWorld
newLocation, err := parse.Parse(&hello, []byte("Hello, World!"), nil)

You can also specify a whitespace-skipping function (the default skips all spaces, tabs, newlines and carriage returns), enable packrat parsing, turn on grammar debugging, and so on.

One interesting feature of this library is the ability to parse Go basic data types using Go syntax. For example, you can parse an int64 directly with Parse:
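
As an illustration of a custom skipping function, the following sketch has the `func(str []byte, loc int) int` signature documented for Options.SkipWhite. The name `skipSpacesAndHash` and its behaviour (whitespace plus `#` comments) are assumptions made for this example, not part of the package.

```go
package main

import "fmt"

// skipSpacesAndHash is a hypothetical skip function. It takes the input and a
// location and returns the location of the first byte that is neither
// whitespace nor part of a "# ..." comment.
func skipSpacesAndHash(str []byte, loc int) int {
	for loc < len(str) {
		switch str[loc] {
		case ' ', '\t', '\n', '\r':
			loc++
		case '#':
			// Skip the comment up to the newline; the newline itself is
			// consumed as whitespace on the next iteration.
			for loc < len(str) && str[loc] != '\n' {
				loc++
			}
		default:
			return loc
		}
	}
	return loc
}

func main() {
	src := []byte("  # greeting\n\tHello")
	fmt.Println(skipSpacesAndHash(src, 0)) // prints 14, the offset of 'H'
}
```

Assuming the Options type documented below, such a function would be installed with `opts := parse.NewOptions()` followed by `opts.SkipWhite = skipSpacesAndHash`, and `opts` passed as the third argument of Parse.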

var i int64
newLocation, err := parse.Parse(&i, []byte("123"), nil)

If you need to parse variant types, insert FirstOf as the first field of your structure:

type StringOrInt struct {
	parse.FirstOf
	Str string
	Int int
}
newLocation, err := parse.Parse(new(StringOrInt), []byte(`"I can parse Go string!"`), nil)
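
The ordered-choice behaviour behind FirstOf can be sketched with the standard library alone: try each alternative in declaration order and record the name of the first one that matches. `parseStringOrInt` is a hypothetical stand-in that matches the whole input with strconv; it is not the package's implementation.

```go
package main

import (
	"fmt"
	"strconv"
)

// StringOrInt mirrors the variant structure above. Field records which
// alternative matched, in the same spirit as the Field member of FirstOf.
type StringOrInt struct {
	Field string
	Str   string
	Int   int
}

// parseStringOrInt is a hypothetical helper showing PEG ordered choice:
// alternatives are tried in order and the first success wins.
func parseStringOrInt(input string) (StringOrInt, error) {
	// First alternative: a quoted Go string.
	if s, err := strconv.Unquote(input); err == nil {
		return StringOrInt{Field: "Str", Str: s}, nil
	}
	// Second alternative: a decimal integer.
	if n, err := strconv.Atoi(input); err == nil {
		return StringOrInt{Field: "Int", Int: n}, nil
	}
	return StringOrInt{}, fmt.Errorf("no alternative matched %q", input)
}

func main() {
	for _, in := range []string{`"I can parse Go string!"`, "42"} {
		v, err := parseStringOrInt(in)
		fmt.Println(v.Field, err)
	}
}
```

Because the choice is ordered, the position of each field in the struct matters: an alternative listed later is only tried when every earlier one has failed.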

Optional fields must be of pointer type and carry the `optional:"true"` tag. Slices are parsed as ELEMENT* or, if `repeat:"+"` is set in the tag, as ELEMENT+. The other supported types and tags are listed below.

+-------------+-------------+----------------------------------------------------+
| Type        | Tag         | Description                                        |
+-------------+-------------+----------------------------------------------------+
| string      |             | Parse Go string. `string` and "string" are both    |
|             |             | supported.                                         |
+-------------+-------------+----------------------------------------------------+
| string      | regexp      | Parse regular expression in regexp module syntax.  |
+-------------+-------------+----------------------------------------------------+
| string      | literal     | Parse literal specified in tag. If there are both  |
|             |             | regexp and literal specified regexp will be used.  |
+-------------+-------------+----------------------------------------------------+
| int*        |             | Parse integer constant. Hexadecimal, octal and     |
|             |             | decimal constants are supported. int32 and rune    |
|             |             | are the same type in Go, so int32 fields parse     |
|             |             | character literals in Go syntax.                   |
+-------------+-------------+----------------------------------------------------+
| int*        | parse       | If tag parse:"#" was set parser will save current  |
|             |             | location in this field and will not advance one.   |
+-------------+-------------+----------------------------------------------------+
| uint*       |             | Same as int* but unsigned constant.                |
+-------------+-------------+----------------------------------------------------+
| float*      |             | Parse floating point number.                       |
+-------------+-------------+----------------------------------------------------+
| bool        |             | Parse boolean constant (true or false)             |
+-------------+-------------+----------------------------------------------------+
| []type      | parse       | Parse sequence of type. If parse is not specified  |
|             |             | or parse is '*' here could be zero or more         |
|             |             | elements. If parse is '+' here could be one or     |
|             |             | more elements.                                     |
+-------------+-------------+----------------------------------------------------+
| []type      | delimiter   | Parse list with delimiter literal. Lists like      |
|             |             | a DELIMITER b DELIMITER ... are very common, so    |
|             |             | they are supported out of the box.                 |
+-------------+-------------+----------------------------------------------------+
| *type       | parse       | Parse type. Element will be allocated, or set to   |
|             |             | nil for optional elements that are not present. If |
|             |             | parse is specified and set to '?' the element is   |
|             |             | optional: if it is absent from the input the field |
|             |             | will be nil.                                       |
+-------------+-------------+----------------------------------------------------+
| any         | parse       | If parse == "skip" the field is skipped while      |
|             |             | parsing or encoding. If parse == "&" it is an      |
|             |             | and-predicate: the element is parsed but position  |
|             |             | does not advance. If parse == "!" it is a          |
|             |             | not-predicate: the element must not be present at  |
|             |             | this position.                                     |
+-------------+-------------+----------------------------------------------------+
| any         | set         | If present this tag contains name of the method to |
|             |             | call after parsing of element. Method must have    |
|             |             | signature func (x element-type) error.             |
+-------------+-------------+----------------------------------------------------+

The parser supports left recursion out of the box, so you can parse expressions without rewriting the grammar. For example, you can parse this grammar:

X <- E
E <- X '-' Number / Number

with

type X struct {
	Expr E
}
type E struct {
	parse.FirstOf
	Expr struct {
		Expr *X
		_    string `regexp:"-"`
		N    uint64
	}
	N uint64
}
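
Left recursion matters because it gives left-associative grouping: with the grammar above, "10-3-2" is grouped as ((10-3)-2). The sketch below shows that grouping with a plain loop. `evalLeft` is a hypothetical helper written with the standard library, it uses int64 so that negative results are representable, and it does not use or reproduce the package's left-recursion handling.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// evalLeft evaluates a chain such as "10-3-2" the way the left-recursive rule
// E <- X '-' Number / Number groups it: every new number is subtracted from
// everything parsed so far.
func evalLeft(input string) (int64, error) {
	parts := strings.Split(input, "-")
	acc, err := strconv.ParseInt(strings.TrimSpace(parts[0]), 10, 64)
	if err != nil {
		return 0, err
	}
	for _, p := range parts[1:] {
		n, err := strconv.ParseInt(strings.TrimSpace(p), 10, 64)
		if err != nil {
			return 0, err
		}
		acc -= n // (previous expression) - Number
	}
	return acc, nil
}

func main() {
	v, err := evalLeft("10-3-2")
	fmt.Println(v, err) // prints "5 <nil>"
}
```

A right-recursive rewrite of the same grammar would group the input as 10-(3-2) and yield 9, which is why grammars for subtraction and division are usually written left-recursively.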

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func Append

func Append(array []byte, value interface{}) ([]byte, error)

Append appends the encoded value to the slice and returns the new slice.

func Parse

func Parse(result interface{}, str []byte, params *Options) (newLocation int, err error)

Parse parses a value from str using the PEG parser. result is a pointer to the value to fill, str is the input to parse, and params holds the parsing parameters. It returns newLocation, the location just past the parsed input; on failure err is non-nil.

func SkipAdaComment

func SkipAdaComment(str []byte, loc int) int

SkipAdaComment skips Ada style comment: "-- .... \n"

func SkipAll

func SkipAll(str []byte, loc int, funcs ...func([]byte, int) int) int

SkipAll skips any number of substrings recognized by the given skip functions, in any order.
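
One plausible way to combine skip functions is to apply them in a loop until none of them advances. The sketch below does that with two hypothetical helpers, `skipBlank` and `skipSemicolonComment`; `skipAllSketch` is an illustration, not the package's SkipAll, and it assumes a skip function returns its loc argument unchanged when there is nothing to skip.

```go
package main

import "fmt"

// skipBlank skips spaces, tabs and line breaks.
func skipBlank(str []byte, loc int) int {
	for loc < len(str) && (str[loc] == ' ' || str[loc] == '\t' || str[loc] == '\n' || str[loc] == '\r') {
		loc++
	}
	return loc
}

// skipSemicolonComment skips one "; ... \n" comment starting at loc, if any.
func skipSemicolonComment(str []byte, loc int) int {
	if loc >= len(str) || str[loc] != ';' {
		return loc
	}
	for loc < len(str) && str[loc] != '\n' {
		loc++
	}
	if loc < len(str) {
		loc++ // consume the newline that ends the comment
	}
	return loc
}

// skipAllSketch applies every function in funcs repeatedly and stops after a
// full pass in which the location did not move.
func skipAllSketch(str []byte, loc int, funcs ...func([]byte, int) int) int {
	for {
		start := loc
		for _, f := range funcs {
			loc = f(str, loc)
		}
		if loc == start {
			return loc
		}
	}
}

func main() {
	src := []byte("  ; one\n ; two\nx")
	fmt.Println(skipAllSketch(src, 0, skipBlank, skipSemicolonComment)) // prints 15, the offset of 'x'
}
```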

func SkipCComment

func SkipCComment(str []byte, loc int) int

SkipCComment skips C style comment: "/* ..... */"

func SkipCPPComment

func SkipCPPComment(str []byte, loc int) int

SkipCPPComment skips C++ style comment: "// ..... \n"

func SkipHTMLComment

func SkipHTMLComment(str []byte, loc int) int

SkipHTMLComment skips HTML style comment: "<!-- ... -->"

func SkipLispComment

func SkipLispComment(str []byte, loc int) int

SkipLispComment skips Lisp style comment: "; .... \n"

func SkipMultilineComment

func SkipMultilineComment(str []byte, loc int, begin, end string, recursive bool) int

SkipMultilineComment skips a multiline comment that starts with begin and ends with end. Set recursive to true to allow nested comments.
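
The effect of the recursive flag can be shown with a depth counter. The following is a sketch under stated assumptions, not the package's code: `skipNestedComment` is a hypothetical function that copies the documented parameter list, and returning loc unchanged for a missing or unterminated comment is an assumption.

```go
package main

import (
	"bytes"
	"fmt"
)

// skipNestedComment skips a comment delimited by begin and end that starts at
// loc. With recursive set, inner begin markers increase the nesting depth and
// each must be closed by its own end marker.
func skipNestedComment(str []byte, loc int, begin, end string, recursive bool) int {
	if !bytes.HasPrefix(str[loc:], []byte(begin)) {
		return loc // no comment at this location
	}
	depth := 1
	i := loc + len(begin)
	for i < len(str) {
		switch {
		case bytes.HasPrefix(str[i:], []byte(end)):
			depth--
			i += len(end)
			if depth == 0 {
				return i
			}
		case recursive && bytes.HasPrefix(str[i:], []byte(begin)):
			depth++
			i += len(begin)
		default:
			i++
		}
	}
	return loc // unterminated comment: skip nothing (an assumption)
}

func main() {
	src := []byte("(* a (* b *) c *) x")
	fmt.Println(skipNestedComment(src, 0, "(*", "*)", true))  // prints 17
	fmt.Println(skipNestedComment(src, 0, "(*", "*)", false)) // prints 12
}
```

Without nesting, the comment ends at the first end marker, so the trailing ` c *)` would be left in the input for the parser to trip over.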

func SkipOneLineComment

func SkipOneLineComment(str []byte, loc int, begin string) int

SkipOneLineComment skips a one-line comment that starts with begin and ends at a newline or at the end of the string.

func SkipPascalComment

func SkipPascalComment(str []byte, loc int) int

SkipPascalComment skips Pascal style comment: "(* ... *)"

func SkipShellComment

func SkipShellComment(str []byte, loc int) int

SkipShellComment skips shell style comment: "# .... \n"

func SkipSpaces

func SkipSpaces(str []byte, loc int) int

SkipSpaces skips spaces, tabs and newlines.

func SkipTeXComment

func SkipTeXComment(str []byte, loc int) int

SkipTeXComment skips TeX style comment: "% .... \n"

func Write

func Write(out io.Writer, value interface{}) error

Write writes the encoded value to the output stream.

Types

type Error

type Error struct {
	// Original string
	Str []byte
	// Location of this error in the original string
	Location int
	// Error message
	Message string
}

Error represents a parse error and implements the error interface. The error message contains the message text, position information and the offending line with the error position marked.

func (Error) Error

func (e Error) Error() string

Error returns the formatted text of the parse error. The result is ready to be shown to the user as is.
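
To illustrate how a byte offset such as Location can become position information with a marked line, here is a minimal sketch. `describe` and its output layout are hypothetical; the exact format produced by Error is not shown in this documentation. Columns are counted in bytes, so the caret may be misaligned for lines containing tabs or multi-byte characters.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
)

// describe formats msg together with the 1-based line and column of loc,
// the text of that line, and a caret under the offending byte.
// loc must be within 0..len(str).
func describe(str []byte, loc int, msg string) string {
	lineStart := bytes.LastIndexByte(str[:loc], '\n') + 1
	lineEnd := bytes.IndexByte(str[loc:], '\n')
	if lineEnd < 0 {
		lineEnd = len(str)
	} else {
		lineEnd += loc
	}
	line := bytes.Count(str[:loc], []byte("\n")) + 1
	col := loc - lineStart + 1
	return fmt.Sprintf("%d:%d: %s\n%s\n%s^",
		line, col, msg, str[lineStart:lineEnd], strings.Repeat(" ", col-1))
}

func main() {
	src := []byte("a = 1\nb = ?\n")
	fmt.Println(describe(src, 10, "unexpected character"))
}
```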

type FirstOf

type FirstOf struct {
	// Name of parsed field
	Field string
}

FirstOf is a structure indicating that the parser must try the fields of the enclosing structure in order and keep the first one that matches. After parsing, Field contains the name of the parsed field.

type Options

type Options struct {
	// Function to skip whitespace. If nil, nothing is skipped.
	SkipWhite func(str []byte, loc int) int
	// Flag to enable packrat parsing. If not set packrat table is used only for left recursion detection and processing.
	PackratEnabled bool
	// Enable grammar debugging messages. It is useful if you have some problems with grammar but produces a lot of output.
	Debug bool
}

Options is a structure containing the parameters of the parsing process.

func NewOptions

func NewOptions() *Options

NewOptions creates a new Options object with default parameters.

type Parser

type Parser interface {
	// This function must parse value from buffer and return length or error
	ParseValue(buf []byte, loc int) (newLocation int, err error)
	// This function must write value into the output stream.
	WriteValue(out io.Writer) error
}

Parser is the interface for types with custom parsing. The parser calls the ParseValue method to parse values of types that implement it.
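
A type with custom syntax might implement the interface as in the sketch below. The interface is repeated locally so that the example compiles on its own. `Switch` and `encode` are hypothetical names, and the example assumes that ParseValue returns the location after the parsed value, as the name of the result suggests.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// Parser repeats the interface documented above.
type Parser interface {
	ParseValue(buf []byte, loc int) (newLocation int, err error)
	WriteValue(out io.Writer) error
}

// Switch is a hypothetical type written as the words "on" or "off".
type Switch bool

// ParseValue reads "on" or "off" at loc and returns the location after it.
func (s *Switch) ParseValue(buf []byte, loc int) (int, error) {
	rest := buf[loc:]
	switch {
	case bytes.HasPrefix(rest, []byte("on")):
		*s = true
		return loc + len("on"), nil
	case bytes.HasPrefix(rest, []byte("off")):
		*s = false
		return loc + len("off"), nil
	}
	return loc, fmt.Errorf("expected \"on\" or \"off\" at offset %d", loc)
}

// WriteValue writes the value back in the same syntax.
func (s *Switch) WriteValue(out io.Writer) error {
	word := "off"
	if *s {
		word = "on"
	}
	_, err := io.WriteString(out, word)
	return err
}

// Compile-time check that *Switch satisfies Parser.
var _ Parser = (*Switch)(nil)

// encode renders any Parser to a string through WriteValue.
func encode(p Parser) (string, error) {
	var buf bytes.Buffer
	err := p.WriteValue(&buf)
	return buf.String(), err
}

func main() {
	var s Switch
	n, err := s.ParseValue([]byte("x=on"), 2)
	text, _ := encode(&s)
	fmt.Println(n, err, text) // prints "4 <nil> on"
}
```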
