xre

v0.0.0-...-3a02eec
Published: Aug 4, 2020 License: BSD-2-Clause Imports: 11 Imported by: 0

README

xre is to sam as grep is to ed

xre exists to bring the awesome power of Rob Pike's Structural Regular Expressions beyond the reach of the sam editor (appropriately/coincidentally/ironically it is implemented in Go, yielding yet another Rob Pike reference).

WARNING: It is still in a primordial / experimental phase, but works well as a proof of concept.

What?

A short comparison to the grep/ed model:

  • a new x/re/ command extracts structure matched by a regular expression
  • ... x[ x{ x( and x< extract a balanced pair of braces
  • a new y/re/ command extracts structure delimited by a regular expression
  • ... y"delim" extracts structure between occurrences of a static delimiter, e.g. y"\n" for classic UNIX line-orientation
  • ... y/start/end/ extracts structure between two regular expressions
  • ... y[ y{ y( and y< extract content within a balanced pair of braces
  • the g/re/ command keeps the current buffer (as extracted by x or y) only if the given pattern matches
  • the v/re/ command keeps the current buffer (as extracted by x or y) only if the given pattern doesn't match
  • the p command prints
  • ... p"delim" prints with a delimiter, e.g. p"\n" to return to the warm embrace of classic UNIX tools
  • ... p%"format" prints with a format pattern, e.g. p%"%q\n" is particularly useful while developing an xre program

Why?

Loosely quoting from Structural Regular Expressions:

...if the interesting quantum of information isn’t a line, most of the (UNIX) tools don’t help, or at best do poorly

Example: counting Go heap allocations

For example, it is sometimes useful to deal with things like paragraphs (bytes that are delimited by a blank line, i.e. "\n\n"). For maximal self-reference, such a data set can be had from your nearest Go program, from either its /debug/pprof/heap?debug=1 endpoint, or by calling pprof.Lookup("heap").WriteTo(f, 1) yourself.
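A minimal sketch of the second option, writing the debug=1 text form of the heap profile into an in-memory buffer. The function name heapProfileText is hypothetical; pprof.Lookup and Profile.WriteTo are the standard runtime/pprof calls named above.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime/pprof"
)

// heapProfileText returns the current heap profile in the debug=1 text form.
func heapProfileText() (string, error) {
	var buf bytes.Buffer
	if err := pprof.Lookup("heap").WriteTo(&buf, 1); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	text, err := heapProfileText()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%d bytes of heap profile text\n", len(text))
}
```

For the endpoint route, importing net/http/pprof registers the /debug/pprof/ handlers on the default HTTP mux.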

The following xre program extracts just the allocation bytes from heap allocations involving a call to bytes.makeSlice (i.e. when a bytes.Buffer needs to grow):

xre 'y"\n\n" g/bytes.makeSlice/ y"\n" v/^#|^$/ x[x/^\d+: (\d+)/ p"\n"'

Breaking down the above command

  • extract paragraphs (buffers delimited by blank lines)
  • keep only the paragraphs that mention "bytes.makeSlice"
  • extract lines within those paragraphs
  • and keep only the lines that aren't blank and don't start with a "#"
  • on those lines, extract the contents of the first balanced [ ] pair
  • and then extract the "MMM" in a "NNN: MMM" match within it
  • finally, print those numbers delimited by newlines (the classic UNIX paradigm)
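The steps above can be mirrored in plain Go to make each stage concrete. This is a sketch, not how xre works internally. It assumes the first filter keeps paragraphs that mention bytes.makeSlice (as the breakdown states), uses \d+ so that multi-digit counts match, and substitutes a literal substring test for the regular expression. The names firstBracketed and allocBytes are hypothetical, and the sample input in main is made up to resemble a heap profile.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	skipLine = regexp.MustCompile(`^#|^$`)       // blank lines and "#" lines
	countRe  = regexp.MustCompile(`^\d+: (\d+)`) // "NNN: MMM", capturing MMM
)

// firstBracketed returns the contents of the first balanced [ ] pair in s.
func firstBracketed(s string) (string, bool) {
	start, depth := -1, 0
	for i, r := range s {
		switch r {
		case '[':
			if depth == 0 {
				start = i + 1
			}
			depth++
		case ']':
			if depth > 0 {
				depth--
				if depth == 0 {
					return s[start:i], true
				}
			}
		}
	}
	return "", false
}

// allocBytes walks the same steps as the breakdown over a heap profile text.
func allocBytes(profile string) []string {
	var out []string
	for _, para := range strings.Split(profile, "\n\n") { // paragraphs
		if !strings.Contains(para, "bytes.makeSlice") { // keep mentions only
			continue
		}
		for _, line := range strings.Split(para, "\n") { // lines
			if skipLine.MatchString(line) {
				continue
			}
			inner, ok := firstBracketed(line)
			if !ok {
				continue
			}
			if m := countRe.FindStringSubmatch(inner); m != nil {
				out = append(out, m[1])
			}
		}
	}
	return out
}

func main() {
	// Made-up sample data shaped like two heap profile records.
	sample := "1: 64 [3: 192] @ 0x1 0x2\n#\t0x1\tbytes.makeSlice+0x1\n\n" +
		"2: 32 [2: 32] @ 0x3\n#\t0x3\tmain.other+0x1\n"
	for _, n := range allocBytes(sample) {
		fmt.Println(n) // one number per line
	}
}
```

The nesting of loops here is what the xre program flattens into a single left-to-right chain of commands.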

As always, summing a stream of numbers is left as an exercise to the reader.

Documentation

Index

Constants

This section is empty.

Variables

var (
	// MinRead is the minimum number of bytes to attempt to read in each call
	// to io.Reader.
	MinRead = 64 * 1024
)
var Stdenv = FileEnv{
	DefaultInfile:  os.Stdin,
	DefaultOutfile: os.Stdout,
}

Stdenv is the default Environment: it reads from os.Stdin and writes to os.Stdout.

Functions

func BuildReaderFrom

func BuildReaderFrom(cmd Command, env Environment) (io.ReaderFrom, error)

BuildReaderFrom builds a Command in an Environment. The returned io.ReaderFrom will perform the processing specified by the command by reading all bytes in a given io.Reader.

Any error constructing the command's Processor is returned. Furthermore, if the resulting Processor does not implement io.ReaderFrom, and the given Environment doesn't provide any default reader semantics, then an error is returned telling the user to specify match extraction semantics (e.g. line delimiting by adding a `y/\n/` prefix to the command).

func RunCommand

func RunCommand(prog string, env Environment) (rerr error)

RunCommand parses the command, and runs it over all inputs received from the given environment's Inputs() channel, each of which is closed after processing is done. It is a convenience around ParseCommand and BuildReaderFrom. The given environment is closed before returning.

func RunReaderFrom

func RunReaderFrom(rf io.ReaderFrom, env Environment) error

RunReaderFrom runs the given io.ReaderFrom over all inputs received from env.Inputs(). Each input reader is closed after having read from it. Processing stops on the first input, read, or close error, which is returned.

Types

type BufEnv

type BufEnv struct {
	Input         bytes.Buffer
	DefaultOutput bytes.Buffer
	// contains filtered or unexported fields
}

BufEnv is an Environment that reads input from an in-memory buffer, and collects all output in another in-memory buffer; useful mainly for testing.

func (*BufEnv) Close

func (be *BufEnv) Close() error

Close does nothing.

func (*BufEnv) Default

func (be *BufEnv) Default() Processor

Default returns a processor that will write to the DefaultOutput buffer.

func (*BufEnv) Inputs

func (be *BufEnv) Inputs() <-chan Input

Inputs returns a channel which will contain a single Input, wrapping the BufEnv.Input value. It returns the same channel every time, until Reset is called.

func (*BufEnv) Reset

func (be *BufEnv) Reset()

Reset all input and output state, preparing the BufEnv for re-use under a new command/input pair.

func (*BufEnv) RunProcessor

func (be *BufEnv) RunProcessor(proc Processor, input []byte) (out []byte, err error)

RunProcessor runs the given Processor with the given input bytes, and returns any output bytes and processing error.

func (*BufEnv) RunReaderFrom

func (be *BufEnv) RunReaderFrom(rf io.ReaderFrom, inputs ...io.Reader) (out []byte, err error)

RunReaderFrom runs the given io.ReaderFrom with any given input io.Readers, and returns any output bytes and processing error. If no inputs are given, then rf is run only once with an empty io.Reader stream.

func (*BufEnv) SetInputs

func (be *BufEnv) SetInputs(rs ...io.Reader)

SetInputs stores the given io.Readers (upgraded or adapted to io.ReadCloser) for future reception under Inputs().

type Command

type Command interface {
	Create(next Command, env Environment) (Processor, error)
}

Command represents a piece of potential XRE processing; combining it with an Environment realizes said potential, resulting in a Processor.

func ParseCommand

func ParseCommand(s string) (Command, error)

ParseCommand parses an XRE command from the given string, returning any parse error if the string is invalid.

type Environment

type Environment interface {
	Inputs() <-chan Input
	Default() Processor
	Close() error
}

Environment abstracts command runtime context; currently this only means where output goes.

var NullEnv Environment = _nullEnv{}

NullEnv is an Environment that discards all output, useful mainly for examining processor structure separate from any real environment.

type FileEnv

type FileEnv struct {
	DefaultInfile  *os.File
	DefaultOutfile *os.File
	// contains filtered or unexported fields
}

FileEnv is an Environment backed directly by files; there may be a default provided input file, and output goes into a single provided file.

func (*FileEnv) AddInput

func (fe *FileEnv) AddInput(f *os.File, err error)

AddInput allocates an inputs channel, and adds any non-nil file or error as given. The channel is allocated with minimal capacity (currently 1), and so will block to avoid eagerly opening a huge backlog of inputs.

If the caller intends to add an arbitrary number of inputs (e.g. from some user-given list), it should do so in a separate goroutine from the one running the command. This also means that, before running the command, it should at least call AddInput(nil, nil), or better, open and add the first input.

func (*FileEnv) Close

func (fe *FileEnv) Close() error

Close flushes any open output buffer(s) and closes any open files.

func (*FileEnv) CloseInputs

func (fe *FileEnv) CloseInputs()

CloseInputs closes any input channel, allocating it if necessary first so that any future AddInput or CloseInputs call will panic.

func (*FileEnv) Default

func (fe *FileEnv) Default() Processor

Default returns the default output processor, which will write into the provided DefaultOutfile through a buffered writer.

func (*FileEnv) Inputs

func (fe *FileEnv) Inputs() <-chan Input

Inputs returns a channel that will contain any caller specified inputs. Returns the same channel every time, therefore it only makes sense to run a single command under a FileEnv.

Inputs may be specified in one of two ways:

  • if the caller calls AddInput() one or more times before the first call to Inputs(), then any such added inputs will be used
  • otherwise AddInput(DefaultInfile, nil) is called, and then CloseInputs() so that no other inputs may be added

type Input

type Input struct {
	io.ReadCloser
	Err error
}

Input represents either a successfully acquired input stream, or a failure to acquire one under an Environment.

type Processor

type Processor interface {
	Process(buf []byte, last bool) error
}

Processor represents a piece of structure processing logic. Process gets called for each piece of matched sub-structure within some level of structure. The last flag indicates whether this is the last piece of sub-structure. After Process has been called with last=true, it may be called again to start processing the next (semantically sibling) structure to the one just ended.

If a Processor also implements io.ReaderFrom, then it can be used as a toplevel processor; without such a toplevel processor, the Environment must provide default stream extraction semantics.
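To make the Process contract concrete, here is a small sketch of two chained processors. The Processor interface is copied locally so the example compiles without importing xre; collector and splitter are hypothetical types, and having splitter mark the final piece of each incoming buffer with last=true is one reading of the description above, not a statement about xre's own processors.

```go
package main

import (
	"bytes"
	"fmt"
)

// Processor is a local copy of the documented interface.
type Processor interface {
	Process(buf []byte, last bool) error
}

// collector records each buffer it is given along with its last flag.
type collector struct{ got []string }

func (c *collector) Process(buf []byte, last bool) error {
	c.got = append(c.got, fmt.Sprintf("%q last=%v", buf, last))
	return nil
}

// splitter passes each delim-separated piece of its buffer to next,
// setting last=true on the final piece of that buffer.
type splitter struct {
	delim []byte
	next  Processor
}

func (s splitter) Process(buf []byte, last bool) error {
	parts := bytes.Split(buf, s.delim)
	for i, part := range parts {
		if err := s.next.Process(part, i == len(parts)-1); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	c := &collector{}
	s := splitter{delim: []byte("\n"), next: c}
	if err := s.Process([]byte("a\nb\nc"), true); err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, line := range c.got {
		fmt.Println(line)
	}
}
```

Each processor only knows the next one in the chain, so stages compose by wrapping.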

type ProtoCommand

type ProtoCommand struct{ ProtoProcessor }

ProtoCommand implements Command around a ProtoProcessor; it's the simplest form of command, useful when everything is resolvable at parse time.

func (ProtoCommand) Create

func (pc ProtoCommand) Create(nc Command, env Environment) (Processor, error)

Create the next command and then pass it directly to the ProtoProcessor.

func (ProtoCommand) String

func (pc ProtoCommand) String() string

type ProtoProcessor

type ProtoProcessor interface {
	Create(next Processor) Processor
}

ProtoProcessor is a nearly constructed Processor. Useful for constructing generic Command implementations, to encapsulate some piece of processing that only needs to know the next step (doesn't need to control the creation of the next step, and doesn't need Environment access).

Directories

Path Synopsis
internal
