parsec

package module
v0.0.0-...-23b21c2 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Apr 30, 2017 License: Apache-2.0 Imports: 7 Imported by: 0

README

Parser combinator library in Golang.
-----------------------------------

A library to construct top-down recursive backtracking parsers using
parser-combinators.  To know more about theory of parser
combinators, refer to : http://en.wikipedia.org/wiki/Parser_combinator

This package contains following components,

* standard set of combinators, parsing algorithms can create input
  specific combinator functions on top of the standard set.
* regular expression based scanner.
* standard set of terminal tokens.

**API reference**: https://godoc.org/github.com/tompaton/goparsec

combinators
~~~~~~~~~~~

Every combinator should confirm to the following signature,

.. code-block:: go

    // ParsecNode type defines a node in the AST
    type ParsecNode interface{}

    // Parser function parses input text, higher order parsers are
    // constructed using combinators.
    type Parser func(Scanner) (ParsecNode, Scanner)

    // Nodify callback function to construct custom ParsecNode.
    type Nodify func([]ParsecNode) ParsecNode

combinators take a variable number of parser functions and
return a new parser function.

`Nodify` is user supplied callback function, called by combinator when the
rule matches current input text. The callback function receives a
collection of parsecNode interface generated by `parsers`.

A `Parser` function must take a new copy of `Scanner` instance, with current
cursor, as argument, and consume the scanner as long as there is a match. If
the parser could not match with input text it returns the scanner without
consuming any input. For example, the following snippet is from `And`
combinator,

.. code-block:: go

    var ns = make([]ParsecNode, 0, len(parsers))
    var n ParsecNode
    news := s.Clone()
    for _, parser := range parsers {
        n, news = doParse(parser, news)
        if n == nil {
            return nil, s
        }
        ns = append(ns, n)
    }
    return docallback(callb, ns), news

we can see that a new instance of `s` is passed to each `parser` and when one
of the parser returns failure (where n==nil), it simply returns the scanner
without consuming any tokens. Otherwise, it returns the new-scanner `news`
returned by the last parser.

List of combinators
-------------------

And
~~~

.. code-block:: go
    func And(callb Nodify, parsers ...interface{}) Parser {

accepts a collection of parsers that must sequentially match current
input text, combinator fails when any of the parser fails to match.

nodify callback is called with a slice of ParsecNodes obtained from each
parser, otherwise callback is ignored.

OrdChoice
~~~~~~~~~

.. code-block:: go
    func OrdChoice(callb Nodify, parsers ...interface{}) Parser {

accepts a collection of parsers, where atleast one of the parser should
match current input text, combinator fails when all the parser fail to
match.

nodify callback is called with a slice of single parsecNode element when
one of the input parser matches with input text, otherwise callback is
ignored.

when a parser matches the input text remaining parsers are not tried.

Kleene
~~~~~~

.. code-block:: go
    func Kleene(callb Nodify, parsers ...interface{}) Parser {

accepts a pair of parser, where the first element must match `zero or more
times` with current input text and the second optional element acts as token
separator, kleene combinator will exit when the first parser or the
second parser, if specified, fails.

nodify callback is called with a slice of ParsecNodes obtained from every
match of the first parser, otherwise called with empty-slice of ParsecNodes.

Many
~~~~

.. code-block:: go
    func Many(callb Nodify, parsers ...interface{}) Parser {

accepts a pair of parser, where the first element must match `one or more
times` with current input text and the second optional element acts as token
separator. Note that the Many repeatition will exit when first parser or
second parser, if specified, fails.

nodify callback is called with a slice of ParsecNodes obtained from every
match of the first parser, otherwise callback is ignored.

ManyUntil
~~~~

.. code-block:: go
    func ManyUntil(callb Nodify, parsers ...interface{}) Parser {

accepts a two or three parsers, where the first element must match `one or more
times` with current input text and the second optional element acts as token
separator. The last parser specifies a final token to stop matching. Note that
the ManyUntil repetition will exit when first parser or second parser, if
specified, fails or the last parser succeeds.

nodify callback is called with a slice of ParsecNodes obtained from every
match of the first parser, otherwise callback is ignored.

Maybe
~~~~~

.. code-block:: go
    func Maybe(callb Nodify, parser interface{}) Parser {

accepts a parser that can either match or does-not-match with current
input text.

nodify callback is called with a slice of single parsecNode element if
`Maybe` succeeds, otherwise callback is ignored.

Optional
~~~~~

.. code-block:: go
    func Optional(callb Nodify, parser interface{}, default ParsecNode) Parser {

accepts a parser that can either match or does-not-match with current
input text.

nodify callback is called with a slice of single parsecNode element if
`Optional` succeeds, otherwise the default parsecNode is returned.

using the builtin scanner
-------------------------

The builtin scanner library manages the input buffer and a cursor into the
buffer. Create a new scanner instance,

.. code-block:: go

    s := parsec.NewScanner(text)

the scanner library supplies method receivers like `Match()`, `SkipWS()` and
`Endof()`. refer to scanner.go for more information on each of these methods.

Examples
~~~~~~~~

- expr/expr.go, implements a parsec grammer to parse arithmetic expressions.
- json/json.go, implements a parsec grammer to parse JSON document.

clone the repository run the benchmark suite

.. code-block:: bash

    $ cd expr/
    $ go test -test.bench=. -test.benchmem=true
    $ cd json/
    $ go test -test.bench=. -test.benchmem=true

to run the example program,

.. code-block:: bash

    # to parse expression
    $ go run tools/parsec/parsec.go -expr "10 + 29"

    # to parse JSON string
    $ go run tools/parsec/parsec.go -json '{ "key1" : [10, "hello", true, null, false] }'

Documentation

Overview

Package parsec implements a library of parser-combinators using basic recognizers like,

And OrdChoice Kleene Many Maybe

Parser combinators can be used to construct higher-order parsers using basic parser function. All parser functions are expected to follow the `Parser` type signature, accepting a `Scanner` interface and returning a `ParsecNode` and a new scanner. If a parser fails to match the input string according to its rules, then it must return nil for ParsecNode, and a new Scanner.

ParsecNode can either be a Terminal structure or NonTerminal structure or a list of Terminal/NonTerminal structure. The AST output is expected to be made up of ParsecNode.

Nodify is a callback function that every combinators use as a callback to construct a ParsecNode.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Nodify

type Nodify func([]ParsecNode) ParsecNode

Nodify callback function to construct custom ParsecNode.

type NonTerminal

type NonTerminal struct {
	Name     string       // contains terminal's token type
	Value    string       // value of the terminal
	Children []ParsecNode // list of children to this node.
}

NonTerminal structure can be used to construct a non-terminal ParsecNode.

type ParsecNode

type ParsecNode interface{}

ParsecNode type defines a node in the AST

type Parser

type Parser func(Scanner) (ParsecNode, Scanner)

Parser function parses input text, higher order parsers are constructed using combinators.

func And

func And(callb Nodify, parsers ...interface{}) Parser

And combinator accepts a list of `Parser`, or reference to a parser, that must match the input string, atleast until the last Parser argument. Returns a parser function that can be used to construct higher-level parsers.

If all parser matches, a list of ParsecNode, where each ParsecNode is constructed by matching parser, will be passed as argument to Nodify callback. Even if one of the input parser function fails then empty slice of ParsecNode will be supplied as argument to Nodify callback.

func Char

func Char() Parser

Char returns a parser function to match a single character in the input stream.

func End

func End() Parser

End is a parser function to detect end of scanner output.

func Float

func Float() Parser

Float returns a parser function to match a float literal in the input stream.

func Hex

func Hex() Parser

Hex returns a parser function to match a hexadecimal literal in the input stream.

func Ident

func Ident() Parser

Ident returns a parser function to match an identifier token in the input stream, an identifier is matched with the following pattern,

`^[A-Za-z][0-9a-zA-Z_]*`

func Int

func Int() Parser

Int returns a parser function to match an integer literal in the input stream.

func Kleene

func Kleene(callb Nodify, parsers ...interface{}) Parser

Kleene combinator accepts two parsers, or reference to parsers, namely opScan and sepScan, where opScan parser will be used to match input string and contruct ParsecNode and sepScan parser will be used to match input string and ignore the matched string. If sepScan parser is not supplied, then opScan parser will be applied on the input until it fails.

The process of matching opScan parser and sepScan parser will continue in a loop until either one of them fails on the input stream.

For every successful match of opScan, the returned ParsecNode from matching parser will be accumulated and passed as argument to Nodify callback. If there is not a single match for opScan, then an empty slice of ParsecNode will be passed as argument to Nodify callback.

func Many

func Many(callb Nodify, parsers ...interface{}) Parser

Many combinator accepts two parsers, or reference to parsers, namely opScan and sepScan, where opScan parser will be used to match input string and contruct ParsecNode and sepScan parser will be used to match input string and ignore the matched string. If sepScan parser is not supplied, then opScan parser will be applied on the input until it fails.

The process of matching opScan parser and sepScan parser will continue in a loop until either one of them fails on the input stream.

The difference between `Many` combinator and `Kleene` combinator is that there shall atleast be one match of opScan.

For every successful match of opScan, the returned ParsecNode from matching parser will be accumulated and passed as argument to Nodify callback. If there is not a single match for opScan, then `nil` will be returned for ParsecNode.

func ManyUntil

func ManyUntil(callb Nodify, parsers ...interface{}) Parser

ManyUntil combinator accepts three parsers, or references to parsers, namely opScan, sepScan and untilScan, where opScan parser will be used to match input string and contruct ParsecNode and sepScan parser will be used to match input string and ignore the matched string. If sepScan parser is not supplied, then opScan parser will be applied on the input until it fails.

The process of matching opScan parser and sepScan parser will continue in a loop until either one of them fails on the input stream or untilScan matches.

For every successful match of opScan, the returned ParsecNode from matching parser will be accumulated and passed as argument to Nodify callback. If there is not a single match for opScan, then `nil` will be returned for ParsecNode.

func Maybe

func Maybe(callb Nodify, parser interface{}) Parser

Maybe combinator accepts a single parser, or reference to a parser, and tries to match the input stream with it.

func NoEnd

func NoEnd() Parser

NoEnd is a parser function to detect not-an-end of scanner output.

func Oct

func Oct() Parser

Oct returns a parser function to match an octal number literal in the input stream.

func Optional

func Optional(callb Nodify, parser interface{}, default_node ParsecNode) Parser

Optional combinator accepts a single parser, or reference to a parser, and tries to match the input stream with it. If the input stream doesn't match, the default terminal node is returned.

func OrdChoice

func OrdChoice(callb Nodify, parsers ...interface{}) Parser

OrdChoice combinator accepts a list of `Parser`, or reference to a parser, where atleast one of the parser must match the input string. Returns a parser function that can be used to construct higher level parsers.

The first matching parser function's output is passed as argument to Nodify callback. If non of the parsers match the input, then `nil` is returned for ParsecNode

func OrdTokens

func OrdTokens(patterns []string, names []string) Parser

OrdTokens to parse a single token based on one of the specified `patterns`.

func OrdTokensCI

func OrdTokensCI(patterns []string, names []string) Parser

Case-insensitive variation of OrdTokens - automatically prefixes patterns with (?i) flag.

func String

func String() Parser

String returns a parser function to match a double quoted string in the input stream.

func Token

func Token(pattern string, name string) Parser

Token takes a pattern and returns a parser that will match the pattern with input stream. Input stream will be supplied via Scanner interface.

func TokenCI

func TokenCI(pattern string, name string) Parser

Case-insensitive variation of Token - automatically prefixes pattern with (?i) flag.

type Scanner

type Scanner interface {
	// Clone will return new clone of the underlying scanner structure.
	// This will be used by combinators to _backtrack_.
	Clone() Scanner

	// GetCursor gets the current cursor position inside input text.
	GetCursor() int

	// Match the input stream with `pattern` and return
	// matching string after advancing the cursor.
	Match(pattern string) ([]byte, Scanner)

	// SubmatchAll the input stream with a choice of `patterns`
	// and return matching string and submatches, after
	// advancing the cursor.
	SubmatchAll(pattern string) (map[string][]byte, Scanner)

	// SkipWs skips white space characters in the input stream.
	// Return skipped whitespaces as byte-slice and advance the cursor.
	SkipWS() ([]byte, Scanner)

	// Endof detects whether end-of-file is reached in the input
	// stream and return a boolean indicating the same.
	Endof() bool
}

Scanner interface supplies necessary methods to match the input stream.

func NewScanner

func NewScanner(text []byte) Scanner

NewScanner creates and returns a reference to new instance of SimpleScanner object.

type SimpleScanner

type SimpleScanner struct {
	// contains filtered or unexported fields
}

SimpleScanner implements Scanner interface based on golang's regexp module.

func (*SimpleScanner) Clone

func (s *SimpleScanner) Clone() Scanner

Clone method receiver in Scanner{} interface.

func (*SimpleScanner) Endof

func (s *SimpleScanner) Endof() bool

Endof method receiver in Scanner{} interface.

func (*SimpleScanner) GetCursor

func (s *SimpleScanner) GetCursor() int

GetCursor method receiver in Scanner{} interface.

func (*SimpleScanner) Match

func (s *SimpleScanner) Match(pattern string) ([]byte, Scanner)

Match method receiver in Scanner{} interface.

func (*SimpleScanner) SkipWS

func (s *SimpleScanner) SkipWS() ([]byte, Scanner)

SkipWS method receiver in Scanner{} interface.

func (*SimpleScanner) SubmatchAll

func (s *SimpleScanner) SubmatchAll(
	pattern string) (map[string][]byte, Scanner)

SubmatchAll method receiver in Scanner{} interface.

type Terminal

type Terminal struct {
	Name     string // contains terminal's token type
	Value    string // value of the terminal
	Position int    // Offset into the text stream where token was identified
}

Terminal structure can be used to construct a terminal ParsecNode.

Directories

Path Synopsis
Package json provide a parser to parse JSON string.
Package json provide a parser to parse JSON string.
tools

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL