rebnf

package module
v0.0.0-...-13ae749 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Mar 12, 2016 License: MIT Imports: 14 Imported by: 0

README

rebnf is a tool to generate random productions of arbitrary EBNF grammars.

Documentation

Installation

go get chrispennello.com/go/rebnf/cmd/rebnf

Tools

The tools directory contains a Python tool for fetching Unicode character ranges from fileformat.info and consolidating them into EBNF alternative range expressions, suitable for consumption by rebnf.

Test Languages

The testlang directory contains grammars for testing and experimentation. In particular, testlang/go contains a snapshot of the Go Programming Language Specification and an extracted grammar, including one in which the Unicode productions have been defined.

Future Work

  • More reliable termination soon after recursion depth limit exceeded. More reliable terminal path analysis. Graph analysis of grammar productions.
  • Non-uniform production weighting.
  • Mode to prefer non-weird characters (e.g., put it in "ASCII mode").

Documentation

Overview

Package rebnf implements random productions of arbitrary EBNF grammars.

It uses golang.org/x/exp/ebnf to parse the grammars. It also offers updated support for parsing EBNF grammars from HTML pages, a feature in exp/ebnf, but, at the time of writing, out of date with respect to the formatting of the Go Programming Language Specification HTML page.

Usage

Call Parse to extract a grammar from an input file or reader.

Create a context to specify several options to control the random productions.

First, recall EBNF repetitions.

Repetition = "{" Expression "}" .

These are expressions that can be repeated 0 or more times. The approach taken with these is to pick a random number of times to repeat such an expression, between 0 and the specified maximum number of repetitions. This is one of the arguments you must provide when creating a context. A reasonable value is 100.

The grammar is randomly explored recursively. Thus, another way you may control the random productions is via a recursion depth limit. Recall that exp/ebnf EBNF grammars distinguish between non-terminal and terminal productions by using capitalized or uncapitalized names. When the recursion depth limit is exceeded, the algorithm will make an effort to favor terminal symbols over non-terminal ones. Note that this does NOT protect, you, however from pathological grammars such as "S = S", or other grammars that necessitate infinite productions. This recursion depth limit is another one of the arguments you must specify when creating a context. A reasonable value is 30.

Several other context parameters include a whitespace padding feature for non-terminals and debug output. See the Ctx documentation for more details.

Pass the grammar and the name of the start production into Random, a function defined on contexts. The output instance of the language will be written to the given destination writer.

A command-line implementation is made available at cmd/rebnf.

Index

Constants

This section is empty.

Variables

View Source
var ErrNoStart = errors.New("start production not found")

ErrNoStart is returned by Random when the specified start production cannot be found in the grammar.

Functions

func CheckExtract

func CheckExtract(filename string, src []byte) []byte

CheckExtract checks the input filename to see if it ends in ".html" or, if not, then checks if an opening token is present. If either is true, then the source byte array is treated as an HTML document and EBNF text is extracted according to the following opening and closing tokens.

<pre class="ebnf">
</pre>

In this case, the extracted byte slice is returned instead of the original. In addition, it strips out A tags and unescapes any HTML escape sequences, such as "&amp;".

If the filename doesn't end with ".html", or the opening token isn't present, the original byte slice is returned instead.

func ExtractEBNF

func ExtractEBNF(src []byte) []byte

ExtractEBNF extracts the EBNF text from a file in which grammar productions are grouped in boxes demarcated by the following HTML elements.

<pre class="ebnf">
</pre>

An example of such a file is the Go Programming Language Specification HTML page.

ExtractEBNF was itself initially extracted from golang.org/x/exp/ebnflint. Unfortunately, it's not exported there, so we duplicate it here and export it.

func IsCapital

func IsCapital(s string) bool

IsCapital returns a boolean indicating whether or not the first rune of the given string is upper case.

func IsTerminal

func IsTerminal(expr ebnf.Expression) bool

IsTerminal returns a boolean that indicates whether the given Expression is a terminal one. Ranges and Tokens are unconditionally considered to be terminal, and Names are terminal iff they're capitalized. Productions are not considered because Alternatives contain Names, and you have to loo up the production by name in the grammar--it's just not a use case handled by this library, but could be added easily if needed.

func Parse

func Parse(filename string, r io.Reader) (ebnf.Grammar, error)

Parse parses an EBNF grammar from the file with the given filename and an optional io.Reader. If the reader is not provided, the code will attempt to open the file specified by name. In either case, the filename will be used in error output and debug messages.

The logic in Parse is extracted from golang.org/x/exp/ebnflint. Unfortunately, it's not exported there, so we duplicate it here and export it. It is modified a bit, though, to be more generic.

func StripTag

func StripTag(tagname string, src []byte) []byte

StripTag strips out the named tag from a slice of bytes. The tag is expected to open with a string of the following form.

<tagname ...>

Where ... can be anything. If any newlines appear in the ..., they will be preserved.

The tag is expected to close with a string of the following form.

</tagname>

Types

type Ctx

type Ctx struct {
	// contains filtered or unexported fields
}

Ctx is a context for making Random calls. It provides several limits to the recursive grammar-walking algorithm to help limit its output. See NewCtx.

func NewCtx

func NewCtx(maxRepetitions, maxRecursionDepth int, padding string, debug bool) *Ctx

NewCtx creates a new Ctx, ready to make Random calls. Pass in two limits.

The first is the maximum repetitions--the largest number of times that an EBNF repetition will be repeated. In its production, when Random encounters a repetition, it will choose a random number of times to repeat it in [0, maxRepetitions]. A reasoanble value is 30.

The second is the recursion depth limit. It does not truly limit the depth of the recursion, but when this limit is surpassed, the algorithm will favor pursuing terminal symbols over non-terminal ones in order to wrap up production quickly while still ensuring that the production is an instance of the language specified by the grammar. A reasonable value is 100.

Finally, pass in a string of characters to pad non-terminal productions with. The string may be empty to disable such padding.

There is also a boolean to disable or enable debug output. This can be helpful if you don't understand why the algorithm is making the grammar traversals that it is. When debug is on, the algorithm will output what ebnf.Expressions it traverses on standard error.

func (*Ctx) Random

func (c *Ctx) Random(dst io.Writer, grammar ebnf.Grammar, start string) error

Random generates random productions of the given grammar starting at the given start production, and writes them into the destination io.Writer.

Directories

Path Synopsis
cmd
extractebnf
extractebnf extracts and prints EBNF grammars.
extractebnf extracts and prints EBNF grammars.
rebnf
rebnf outputs random productions of arbitrary EBNF grammars.
rebnf outputs random productions of arbitrary EBNF grammars.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL