fastmatch

package module
v0.0.0-...-6bd661e Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Feb 22, 2017 License: BSD-3-Clause Imports: 7 Imported by: 0

README

pifke.org/fastmatch

GoDoc Build Status Test Coverage

Golang code generation tool for quickly comparing an input string to a set of possible matches, which are known at compile time.

A typical use of this would be generating a "reverse enum", such as in a parser which needs to compare a string to a list of keywords and return the corresponding lexer symbol.

For more information, see the documentation.

Downloading

If you use this library in your own code, please use the canonical URL in your Go code, instead of Github:

go get pifke.org/fastmatch

Or (until I finish setting up the self-hosted repository):

# From the root of your project:
git submodule add https://github.com/dpifke/golang-fastmatch vendor/pifke.org/fastmatch

Then:

import (
        "pifke.org/fastmatch"
)

As opposed to the pifke.org URL, I make no guarantee this Github repository will exist or be up-to-date in the future.

Documentation

Available on godoc.org.

License

Three-clause BSD. See LICENSE.txt.

Contact me if you want to use this code under different terms.

Author

Dave Pifke. My email address is my first name "at" my last name "dot org."

I'm @dpifke on Twitter. My PGP key is available on Keybase.

Documentation

Overview

Package fastmatch provides a code generation tool for quickly comparing an input string to a set of possible matches which are known at compile time.

A typical use of this would be a "reverse enum", such as in a parser which needs to compare a string to a list of keywords and return the corresponding lexer symbol.

Normally, the easiest way to do this would be with a switch statement, such as:

switch (input) {
case "foo":
	return foo
case "bar":
	return bar
case "baz":
	return baz
}

The compiled code for the above will compare the input to each string in sequence. If input doesn't match "foo", we try to match "bar", then "baz". The matching process starts anew for each case. If we have lots of possible matches, this can be a lot of wasted effort.

Another option would be to use a map, on the (probably valid) assumption that Go's map lookups are faster than executing a bunch of string comparisons in sequence:

match := map[string]int{
	"foo": foo,
	"bar": bar,
	"baz": baz,
}
return match[input]

The compiled code for the above will recreate the map at runtime. We thus have to hash each possible match every time the map is initialized, allocate memory, garbage collect it, etc. More wasted effort.

And this is all not to mention the potential complications related to case-insensitive matching, partial matches (e.g. strings.HasPrefix and strings.HasSuffix), Unicode normalization, or situations where we want to treat a class of characters (such as all numeric digits) as equivalent for matching purposes. You could use a regular expression, but now you'd have two problems, as the jwz quote goes.

The code generated by this package is theoretically more efficient than the preceding approaches. It supports partial matches, and can treat groups of characters (e.g. 'a' and 'A') as equivalent.

Under the hood, it works by partitioning the search space by the length of the input string, then updating a state machine based on each rune in the input. If the character at a given position in the input doesn't correspond to any possible match, we bail early. Otherwise, the final state is compared against possible matches using a final switch statement.

Is the code output by this package faster enough to matter? Maybe, maybe not. This is a straight port of a C code generation tool I've used on a couple of projects. In C, the difference was significant, due to strcmp() or strcasecmp() function call overhead, and GCC's ability to convert long switch statements into jump tables or binary searches.

Go (as of 1.7) doesn't yet do any optimization of switch statements. See https://github.com/golang/go/issues/5496 and https://github.com/golang/go/issues/15780. Thus, you may actually be worse off in the short-term for using this method instead of a map lookup. (Certainly in terms of code size.) But as the compiler improves, this code will become more relevant. I've played with having this package output assembler code, but it seems like the effort would be better spent improving the compiler instead.

Index

Constants

This section is empty.

Variables

View Source
var Alphanumeric = Range('0', '9', 'a', 'z', 'A', 'Z')

Alphanumeric is a predefined Range covering ASCII numeric digits and upper- and lower-case letters.

View Source
var HasPrefix = new(Flag)

HasPrefix is a flag, which can be passed to Generate, to specify that runes proceeding a match should be ignored.

Matching stops as soon as a match is found, thus "foo" and "f" are ambiguous cases when HasPrefix is specified. Generate returns an error if ambiguity is detected.

View Source
var HasSuffix = new(Flag)

HasSuffix is a flag, which can be passed to Generate, to match the end of the input string, in the same manner HasPrefix performs a match of the beginning of the string.

View Source
var Insensitive = new(Flag)

Insensitive is a flag, which can be passed to Generate, to specify that matching should be case-insensitive.

View Source
var Letters = Range('a', 'z', 'A', 'Z')

Letters is a predefined Range covering upper- and lower-case ASCII letters.

View Source
var Lowercase = Range('a', 'z')

Lowercase is a predefined Range covering lower-case ASCII letters.

View Source
var Normalize = new(Flag)

Normalize is a flag, which can be passed to Generate, to specify that matching should be done without regard to diacritics, accents, etc.

This is currently just a placeholder, and has no effect yet on the generated code.

View Source
var Numbers = Range('0', '9')

Numbers is a predefined Range covering the ASCII digits from 0 through 9.

View Source
var Uppercase = Range('A', 'Z')

Uppercase is a predefined Range covering upper-case ASCII letters.

Functions

func Generate

func Generate(w io.Writer, origCases map[string]string, none string, flags ...*Flag) error

Generate outputs Go code to compare a string to a set of possible matches which are known at compile-time.

Each entry in the map consists of a possible match as the key, and the corresponding expression to return as the value. none is the expression to return if no match is found.

Code to perform the match is written to the supplied io.Writer. Before calling this function, the caller is expected to write the method signature and any input pre-processing logic. The string to examine should be in a variable named "input".

If flags are specified, it's possible to generate ambiguous code, in which the same input string will match multiple entries in the cases map, with different return values. This function attempts to detect this and will return an error if ambiguity is detected.

The output is not buffered, and will be incomplete if an error is returned. If the caller cares about this, they should have a way to discard the written output on error. Errors writing to the supplied io.Writer will be passed back to the caller.

Example usage:

fmt.Fprintln(w, "func matchFoo(input string) int {")
fastmatch.Generate(w, map[string]string{
	"foo": "1",
	"bar": "2",
	"baz": "3",
}, "-1", fastmatch.Insensitive)

func GenerateReverse

func GenerateReverse(w io.Writer, cases map[string]string, none string, _ ...*Flag) error

GenerateReverse outputs Go code that returns the string value for a given match. The result from the generated function will be the reverse of that from a function generated with Generate.

If the supplied io.Writer is not valid, or if more than one string maps to the same value, an error is returned.

This function accepts flags (in order to match Generate's function signature), but they are currently ignored.

func GenerateTest

func GenerateTest(w io.Writer, fn, reverseFn string, cases map[string]string, _ ...*Flag) error

GenerateTest outputs a simple unit test which exercises the generated code.

An error is returned if the supplied io.Writer is not valid. As with Generate and GenerateReverse, the caller is expected to write the method signature (with a *testing.T argument named t) before calling this function.

fn and reverseFn should be the fmt.Printf-style format strings accepting a single argument, which will be replaced with the test input for the generated function. This is typically something like "Function(%q)" for the matcher and "%s.String()" for the reverse matcher. Passing "" causes the respective function to not be tested.

Flags should match what was passed to Generate, but are currently ignored. Future versions of this routine may output more sophisticated tests which take flags into account.

func Range

func Range(args ...rune) []rune

Range accepts zero or more pairs of runes, and returns a slice covering all runes between the even and odd arguments, inclusive. It can be used with flags which take a list of runes as arguments, such as Equivalent, StopUpon, Ignore, or IgnoreExcept.

For example:

fastmatch.Generate(w, cases, "nil",
	fastmatch.IgnoreExcept(Range('0', '9', 'a', 'z', 'A', 'Z')...))

If passed an odd number of arguments, this function will panic.

See also the predefined ranges: Numbers, Lowercase, Uppercase, Letters, and Alphanumeric.

Types

type ErrAmbiguous

type ErrAmbiguous struct {
	// contains filtered or unexported fields
}

ErrAmbiguous is returned when Generate or GenerateReverse is passed ambiguous matches: cases where it's possible for the same input string to match different return values.

func (*ErrAmbiguous) Error

func (e *ErrAmbiguous) Error() string

type ErrBadFlags

type ErrBadFlags struct {
	// contains filtered or unexported fields
}

ErrBadFlags is returned when nonsensical flags are passed to Generate.

func (*ErrBadFlags) Error

func (e *ErrBadFlags) Error() string

type Flag

type Flag struct {
	// contains filtered or unexported fields
}

Flag can be passed to Generate and GenerateReverse to modify the functions' behavior. Users of this package should not instantiate their own Flags. Rather, they should use one of HasPrefix, HasSuffix, Insensitive, Normalize, or the return value from Equivalent(), StopUpon(), Ignore(), or IgnoreExcept(). Unknown Flags are silently discarded.

func Equivalent

func Equivalent(runes ...rune) *Flag

Equivalent is a flag, which can be passed to Generate, to specify runes that should be treated identically when matching.

func Ignore

func Ignore(runes ...rune) *Flag

Ignore is a flag, which can be passed to Generate, to specify runes (including equivalents) which should be ignored for matching purposes.

func IgnoreExcept

func IgnoreExcept(runes ...rune) *Flag

IgnoreExcept is a flag, which can be passed to Generate, to specify which runes (including equivalents) should be examined when matching. This is similar to Ignore, except that when this flag is present, any runes not listed will be ignored.

Ignore and IgnoreExcept may not be combined.

func StopUpon

func StopUpon(runes ...rune) *Flag

StopUpon is a flag, which can be passed to Generate, to specify a set of runes (including equivalents) which get treated like a string boundary, i.e. cause matching to immediately cease.

In normal usage, this results in matches which must be immediately followed by either end-of-string or the stop character. This is similar to HasPrefix, except that it allows for matches that would be ambiguous if we stopped upon first match. Consider a matcher for URI schemes (RFC 7595), where we want to match either the bare scheme name or the beginning of a URL. We might generate a matcher as follows:

fmt.Fprintln(w, "func matchScheme(input string) Scheme {")
fastmatch.Generate(w, map[string]string{
	"http": "HTTP",
	"https": "HTTPS",
}, "nil", fastmatch.Insensitive, fastmatch.StopUpon(':'))

With the above, matchScheme("http") and matchScheme("http://example.com") both return HTTP, and matchScheme("https") and matchScheme("HTTPS://example.com") both return HTTPS. matchScheme("https+xml://example.com") would return nil.

When StopUpon is combined with HasSuffix, the stop character is treated as the beginning of the string. (This is obvious if one considers we match from right-to-left when HasSuffix is specified.) An example filename extension matcher could be generated as follows:

fmt.Fprintln(w, "func matchExt(input string) Extension {")
fastmatch.Generate(w, map[string]string{
	"exe": "EXE",
	"dll": "DLL",
}, "nil", fastmatch.StopUpon('.'), fastmatch.HasSuffix)

matchExt("foo.exe") and matchExt("exe") both return EXE, and matchExt("bar.dll") and matchExt("dll") both return DLL.

Runes from StopUpon may not also appear in Ignore. If IgnoreExcept is specified, runes from StopUpon will be treated as stop runes regardless of whether or not they appear in IgnoreExcept.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL