ata

package module
v0.0.0-...-5ee21d0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jan 23, 2021 License: Unlicense Imports: 8 Imported by: 0

README

Action Triggering Automaton

This package was made for educational purposes, it shouldn't be used in production. It uses a DFA based regular expression engine, that does not backtrack, and is guaranteed to read the input in linear time. It's primary goal is to experiment with the concept of pattern -> action, used in some tools like AWK.

The inner workings are simple, it appends an action to the accepting state of every automaton, as the Run method goes through the automaton, these actions are activated. The automaton is built by first parsing the pattern into an Abstract Syntax Tree, then with this tree, a NFA is built using Thompson's Construction. This NFA is then converted into DFA by a variation of the Powerset Algorithm, beware that this conversion can take exponential time.

Installation

To use you can simply go get the package:

go get github.com/kazhmir/ATA

Regex Syntax

The Regex engine supports:

  • alternation: "a|b", "a|b|c"
  • concatenation: "ab"
  • zero or more: "a*"
  • empty string: "\e"
  • empty set: "", "[]"
  • grouping: "(a|b)c", "(ab)*"
  • escapes: "\t", "\n", "\|"
  • convenience groups: "\n", "\N", "\w", "\W", "\s", "\S", "\d", "\D"
  • dot: '.' ('[^\n\r]')
  • sets: "[abcdef]", "[0123456789]"
  • negated sets: "[^abc]", "[^cd]"
  • ranges: "[0-9]", "[^A-Za-z0-9_]"
  • unicode: "😉", "π|τ"

Will be added in the future:

  • 0 or 1: "a?" == "a|\e"
  • 1 or more: "a+" == "aa*"

Notes: - Whenever the pattern can match an empty string, it will also match EOF. - It matches ALL SUBMATCHES (not only the longest and smallest). This means the pattern "ab|bc" running on the string "abc" will match {"ab", "bc"}. I still didn't figure out a way to find the longest match linearly, this is the only thing impeding this package to be able to generate lexers. - Performance profile is quite different from the standard regexp package, orders of magnitude slower, but still linear.

Usage

To use the package, you have to compile the Machine. You can do this with one of the two functions:

func Build(map[string]Action) *Machine
func BuildOne(string, Action) *Machine

A Machine is an automaton, it simply holds the starting state, the pattern used to create it and the structure:

type Machine struct {
	Start *state
	Pattern string
	Syntax map[string]Action
}

The Action is simply a function of type func(*Match) (stop bool).

func (*Machine) Run(input io.RuneScanner) error
func (*Machine) Debug(input string) error

Two convenience functions are also provided, but you can write others easily, here's FindAllString

func FindAllString(pattern string, txt io.RuneReader) ([]string, error) {
	out := make([]string, 0)
	act := func(mat *Match) bool {
		out = append(out, mat.S)
		return false
	}
	m := BuildOne(pattern, act)
	err := m.Run(txt)
	return out, err
}

Further examples:

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func Debug

func Debug(pattern string, inputStr string) error

Debug creates a machine that is able to step through every state of the automaton Press ENTER to procede to next steps.

func FindAllString

func FindAllString(pattern string, txt io.RuneReader) ([]string, error)

FindAllString returns all matching strings from the input.

func FindIndex

func FindIndex(pattern, input string) (start, end int, err error)

FindIndex returns the start and end index if the pattern exists in string, otherwise it returns -1 for both.

Types

type Action

type Action func(*Match) (stop bool)

Action is a function that receives a pointer to the machine Head Actions are called on accepting states, according to syntax

type Head struct {
	CurrStr string

	Start   *state
	Current *state
	// contains filtered or unexported fields
}

Head is a machine Head that holds the current state of the automaton. Multiple heads can go through the same Machine concurrently.

func (*Head) Consume

func (h *Head) Consume(r rune) (ok bool)

Consume receives a rune, and runs that rune through the machine. It returns true if the rune is valid in the current state, false if the head entered an error state.

func (*Head) Match

func (h *Head) Match() *Match

func (*Head) Reset

func (h *Head) Reset()

Reset returns the machine head to the start of the Machine. It's used when the Head finds an error or accepting state.

type Machine

type Machine struct {
	Start   *state
	Pattern string
	Syntax  map[string]Action
}

Machine is the automaton generated by the Build function It contains structural data regarding the underlying automaton

func Build

func Build(syntax map[string]Action) *Machine

Build creates a machine with many patterns and actions. The regexes are built one by one and joined through alternation '|'.

func BuildOne

func BuildOne(pattern string, act Action) *Machine

BuildOne returns a machine for a single pattern and action.

func (*Machine) Run

func (m *Machine) Run(txt io.RuneReader) error

Run creates a machine Head and runs the string through the automaton.

func (*Machine) RunStr

func (m *Machine) RunStr(str string) error

Run creates a machine Head and runs the string through the automaton.

type Match

type Match struct {
	S          string
	Start, End int
	// contains filtered or unexported fields
}

func (*Match) String

func (m *Match) String() string

Directories

Path Synopsis

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL