evalfilter

package module
v2.1.19 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: May 25, 2022 License: GPL-2.0 Imports: 13 Imported by: 2

README

GoDoc Go Report Card license

eval-filter

The evalfilter package provides an embeddable evaluation-engine, which allows simple logic which might otherwise be hardwired into your golang application to be delegated to (user-written) script(s).

There is no shortage of embeddable languages which are available to the golang world, this library is intended to be something that is:

  • Simple to embed.
  • Simple to use, as there are only three methods you need to call:
  • Simple to understand.
  • As fast as it can be, without being too magical.

The scripting language is C-like, and is generally intended to allow you to filter objects, which means you might call the same script upon multiple objects, and the script will return either true or false as appropriate to denote whether some action might be taken by your application against that particular object.

It certainly is possible for you to handle arbitrary return-values from the script(s) you execute, and indeed the script itself could call back into your application to carry out tasks, via the addition of new primitives implemented and exported by your host application, which would make the return value almost irrelevant.

If you go down that route then this repository contains a general-purpose scripting-language, which can be used to execute user-supplied scripts.

My Google GMail message labeller uses the evalfilter in such a standalone manner, executing a script for each new/unread email by default. The script can then add labels to messages based upon their sender/recipients/subjects. etc. The notion of filtering there doesn't make sense, it just wants to execute flexible operations on messages.

However the ideal use-case, for which this was designed, is that your application receives objects of some kind, perhaps as a result of incoming webhook submissions, network events, or similar, and you wish to decide how to handle those objects in a flexible fashion.

Implementation

In terms of implementation the script to be executed is split into tokens by the lexer, then those tokens are parsed into an abstract-syntax-tree. Once the AST exists it is walked by the compiler and a series of bytecode instructions are generated.

Once the bytecode has been generated it can be executed multiple times, there is no state which needs to be maintained, which makes actually executing the script (i.e. running the bytecode) a fast process.

At execution-time the bytecode which was generated is interpreted by a naive stack-based virtual machine, with some runtime support to provide the built-in functions, as well as supporting the addition of host-specific functions.

The bytecode itself is documented briefly in BYTECODE.md, but it is not something you should need to understand to use the library, only if you're interested in debugging a misbehaving script.

Scripting Facilities

Types

The scripting-language this package presents supports the basic types you'd expect:

  • Arrays.
  • Floating-point numbers.
  • Hashes.
  • Integers.
  • Regular expressions.
  • Strings.
  • Time / Date values.
    • i.e. We can use reflection to handle time.Time values in any structure/map we're operating upon.

The types are supported both in the language itself, and in the reflection-layer which is used to allow the script access to fields in the Golang object/map you supply to it.

Built-In Functions

These are the built-in functions which are always available, though your users can write their own functions within the language (see functions).

You can also easily add new primitives to the engine, by defining a function in your golang application and exporting it to the scripting-environment. For example the print function to generate output from your script is just a simple function implemented in Golang and exported to the environment. (This is true of all the built-in functions, which are registered by default.)

  • between(value, min, max);
    • Return true if the specified value is between the specified range (inclusive, so between(1, 1, 10); will return true.)
  • float(value)
    • Tries to convert the value to a floating-point number, returns Null on failure.
    • e.g. float("3.13").
  • getenv(value)
    • Return the value of the named environmental variable, or "" if not found.
  • int(value)
    • Tries to convert the value to an integer, returns Null on failure.
    • e.g. int("3").
  • join(array,deliminator)
    • Return a string consisting of the array elements joined by the given string.
  • keys
    • Returns the available keys in the specified hash, in sorted order.
  • len(field | value)
    • Returns the length of the given value, or the contents of the given field.
    • For arrays it returns the number of elements, as you'd expect.
  • lower(field | value)
    • Return the lower-case version of the given input.
  • max(a, b)
    • Return the larger number of the two parameters.
  • min(a, b)
    • Return the smaller number of the two parameters.
  • panic() / panic("Your message here");
    • These will deliberately stop execution, and return a message to the caller.
  • print(field|value [, fieldN|valueN] )
    • Print the given values.
  • printf("Format string ..", arg1, arg2 .. argN);
    • Print the given values, with the specified golang format string
      • For example printf("%s %d %t\n", "Steve", 9 / 3 , ! false );
  • replace(input, /regexp/, value)
    • Perform a replacement with value of the matches of the given regexp in the input-value.
  • reverse(["Surname", "Forename"]);
    • Sorts the given array in reverse.
    • Add true as the second argument to ignore case.
  • sort(["Surname", "Forename"]);
    • Sorts the given array.
    • Add true as the second argument to ignore case.
  • split("string", "value");
    • Splits a string into an array, by the given substring.
  • sprintf("Format string ..", arg1, arg2 .. argN);
    • Format the given values, using the specified golang format string.
  • string( )
    • Converts a value to a string. e.g. "string(3/3.4)".
  • trim(field | string)
    • Returns the given string, or the contents of the given field, with leading/trailing whitespace removed.
  • type(field | value)
    • Returns the type of the given field, as a string.
      • For example string, integer, float, array, boolean, or null.
  • upper(field | value)
    • Return the upper-case version of the given input.
  • hour(field|value), minute(field|value), seconds(field|value)
    • Allow converting a time to HH:MM:SS.
  • day(field|value), month(field|value), year(field|value)
    • Allow converting a time to DD/MM/YYYY.
  • weekday(field|value)
    • Allow converting a time to "Saturday", "Sunday", etc.
  • now() & time() both return the current time.
Conditionals

As you'd expect the facilities are pretty normal/expected:

  • Perform comparisons of strings and numbers:
    • equality:
      • "if ( Message == "test" ) { return true; }"
    • inequality:
      • "if ( Count != 3 ) { return true; }"
    • size (<, <=, >, >=):
      • "if ( Count >= 10 ) { return false; }"
      • "if ( Hour >= 8 && Hour <= 17 ) { return false; }"
    • String matching against a regular expression:
      • "if ( Content ~= /needle/ )"
      • "if ( Content ~= /needle/i )"
        • With case insensitivity
    • Does not match a regular expression:
      • "if ( Content !~ /some text we don't want/ )"
    • Test if an array contains a value:
      • "return ( Name in [ "Alice", "Bob", "Chris" ] );"
  • Ternary expressions are also supported - but nesting them is a syntax error!
    • "a = Title ? Title : Subject;"
    • "return( result == 3 ? "Three" : "Four!" );"
Loops

Our script implements a golang-style loop, using either for or while as the keyword:

count = 0;
while ( count < 10 ) {
     print( "Count: ", count, "\n" );
     count++;
}

You could use either statement to iterate over an array contents, but that would be a little inefficient:

items = [ "Some", "Content", "Here" ];
i = 0;
for ( i < len(items) ) {
   print( items[i], "\n" );
   i++
}

A more efficient and readable approach is to iterate over arrays, and the characters inside a string, via foreach. You can receive both the index and the item at each step of the iteration like so:

foreach index, item in [ "My", "name", "is", "Steve" ] {
    printf( "%d: %s\n", index, item );
}

If you don't supply an index you'll receive just the item being iterated over instead, as you would expect (i.e. we don't default to returning the index, but the value in this case):

len = 0;
foreach char in "狐犬" {
    len++;
}
return( len == 2 );

The same kind of iteration works over hashes too (the single-argument version of the foreach loop iterates over values, rather than keys. Hash keys are available via keys so that seems like a more useful thing to return):

foreach key,value in { "Name": "Steve", "Location": "Finland" } {
  printf("Key %s has value %s\n", key, value );
}

The final helper is the ability to create arrays of integers via the .. primitive:

sum = 0;
foreach item in 1..10 {
    sum += item;
}
print( "Sum is ", sum, "\n" );

Here you note that len++ and sum += item; work as you'd expect. There is support for +=, -=, *=, and /=. The ++ and -- postfix operators are both available (for integers and floating-point numbers).

Functions

You can declare functions, for example:

function sum( input ) {
   local result;
   result = 0;
   foreach item in input {
     result = result + item;
   }
   return result;
}

printf("Sum is %d\n", sum(1..10));
return false;

See _examples/scripts/scope.in for another brief example, and discussion of scopes.

Case / Switch

We support the use of switch and case to simplify the handling of some control-flow. An example would look like this:

switch( Subject ) {
  case /^Re:/i {
     printf("Reply\n");
  }
  case /^fwd:/i {
     printf("Forwarded message\n");
  }
  case "DEAR" + "  " + WINNER" {
     printf("SPAM\n");
  }
  case "YOU HAVE WON" {
     printf("SPAM\n");
  }
  default {
     printf("New message!\n");
  }
}

Note that the case expression supports the following, as demonstrated in our switch example:

  • Expression matches.
  • Literal matches.
  • Regular expression matches.

To avoid fall-through-related bugs we've explicitly designed the case-statements to take blocks as arguments, rather than statements.

NOTE: Only the first matching case-statement will execute. In the following example only one message will be output:

count = 1;

switch( count ) {
  case 1 {
     printf("Match!\n");
  }
  case 1 {
     printf("This is not a match - the previous case statement won!\n");
  }
}

Use Cases

The motivation for this project came from a problem encountered while working:

  • I wanted to implement a simple "on-call notifier".
    • When messages were posted to Slack channels I wanted to sometimes trigger a phone-call to the on-call engineer.
    • Of course not all Slack-messages were worth waking up an engineer for.

The expectation was that non-developers might want to change the matching of messages to update the messages which were deemed worthy of waking up the on-call engineer. They shouldn't need to worry about rebuilding the on-call application, nor should they need to understand Go. So the logic was moved into a script and this evaluation engine was born.

Each time a Slack message was received it would be placed into a simple structure:

type Message struct {
    Author  string
    Channel string
    Message string
    Sent    time.Time
}

Then a simple script could then be executed against that object to decide whether to initiate a phone-call:

//
// You can see that comments are prefixed with "//".
//
// In my application a phone-call would be triggered if this
// script hit `return true;`.  If the return value was `false`
// then nothing would happen.
//

//
// If this is within office hours we'll assume somebody is around to
// handle the issue, so there is no need to raise a call.
//
if ( hour(Sent) >= 9 || hour(Sent) <= 17 ) {

    // 09AM - 5PM == Working day.  No need to notify anybody.

    // Unless it is a weekend, of course!
    if ( weekday(Sent) != "Saturday" && weekday(Sent) != "Sunday" ) {
       return false;
    }
}

//
// A service crashed with a panic?
//
// If so raise the engineer.
//
if ( Message ~=  /panic/i ) { return true; }


//
// At this point we decide the message is not important, so
// we ignore it.
//
// In a real-life implementation we'd probably work the other
// way round.  Default to triggering the call unless we knew
// it was a low-priority/irrelevant message.
//
return false;

You'll notice that we test fields such as Sent and Message here which come from the object we were given. That works due to the magic of reflection. Similarly we called a number of built-in functions related to time/date. These functions understand the golang time.Time type, from which the Sent value was read via reflection.

(All time.Time values are converted to seconds-past the Unix Epoch, but you can retrieve all the appropriate fields via hour(), minute(), day(), year(), weekday(), etc, as you would expect. Using them literally will return the Epoch value.)

Security

The user-supplied script is parsed and turned into a set of bytecode-instructions which are then executed. The bytecode instruction set is pretty minimal, and specifically has zero access to:

  • Your filesystem.
    • i.e. Reading files is not possible, neither is writing them.
  • The network.
    • i.e. Making outgoing network requests is not possible.

Of course you can export functions from your host-application to the scripting environment, to allow such things. If you do add primitives that have the possibility to cause security problems then the onus is definitely on you to make sure such accesses are either heavily audited or restricted appropriately.

Denial of Service

When it comes to security problems the most obvious issue we might suffer from is denial-of-service attacks; it is certainly possible for this library to be given faulty programs, for example invalid syntax, or references to undefined functions. Failures such as those would be detected at parse/run time, as appropriate.

In short running user-supplied scripts should be safe, but there is one obvious exception, the following program is valid:

print( "Hello, I'm wasting your time\n") ;

while( 1 ) {
  // Do nothing ..
}

print( "I'm never reached!\n" );

This program will never terminate! If you're handling untrusted user-scripts, you'll want to ensure that you explicitly setup a timeout period.

The following will do what you expect:

// Create the evaluator on the (malicious) script
eval := evalfilter.New(`while( 1 ) { } `)

// Setup a timeout period of five seconds
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()
eval.SetContext(ctx)

// Now prepare as usual
err = eval.Prepare()
if ( err != nil ) { // handle error }

// Now execute as usual
ret, err = eval.Execute( object )
if ( err != nil ) { // handle error }

The program will be terminated with an error after five seconds, which means that your host application will continue to run rather than being blocked forever!

Misc.

You can find syntax-highlighters for evalfilter code beneath misc/.

  • A javascript syntax-highlighter for embedding scripts in web-pages.
  • An Emacs mode for working with evalfilter scripts.

Sample Usage

To give you a quick feel for how things look you could consult the following simple examples:

  • example_test.go.
    • This filters a list of people by their age.
  • example_function_test.go.
    • This exports a function from the golang-host application to the scripting environment.
      • This is a demonstration of how you'd provide extra functionality when embedding the engine in your own application.
    • The new function is then used to filter a list of people.
  • example_user_defined_function_test.go
    • Writing a function within the scripting-environment, and then calling it.
  • _examples/embedded/variable/
    • Shows how to pass a variable back and forth between your host application and the scripting environment

Additional Examples

Additional examples of using the library to embed scripting support into simple host applications are available beneath the _examples/embedded directory.

There are also sample scripts contained beneath _examples/scripts which you can examine.

The standalone driver located beneath cmd/evalfilter allows you to examine bytecode, tokens, and run the example scripts, as documented later in this README file.

Finally if you want to compile this library to webassembly, and use it in a web-page that is also possible! See wasm/ for details.

Standalone Use

If you wish to experiment with script-syntax, after looking at the example scripts you can install the standalone driver:

go install github.com/skx/evalfilter/v2/cmd/evalfilter@latest

This driver, contained within the repository at cmd/evalfilter has a number of sub-commands to allow you to experiment with the scripting environment:

  • Output a disassembly of the bytecode instructions the compiler generated when preparing your script.
  • Run a script.
    • Optionally with a JSON object as input.
  • View the lexer and parser outputs.

Help is available by running evalfilter help, and the sub-commands are documented thoroughly, along with sample output.

TAB-completion is supported if you're running bash, execute the following to enable it:

$ source <(evalfilter bash-completion)

Benchmarking

The scripting language should be fast enough for most purposes; it will certainly cope well with running simple scripts for every incoming HTTP-request, for example. If you wish to test the speed there are some local benchmarks available.

You can run the benchmarks as follows:

go test -test.bench=evalfilter_ -benchtime=10s -run=^t
goos: linux
goarch: amd64
pkg: github.com/skx/evalfilter/v2
Benchmark_evalfilter_complex_map-4   	 4426123	      2721 ns/op
Benchmark_evalfilter_complex_obj-4   	 7657472	      1561 ns/op
Benchmark_evalfilter_simple-4        	15309301	       818 ns/op
Benchmark_evalfilter_trivial-4       	100000000	       105 ns/op
PASS
ok  	github.com/skx/evalfilter/v2	52.258s

The examples there are not particularly representative, but they will give you an idea of the general speed. In the real world the speed of the evaluation engine is unlikely to be a significant bottleneck.

One interesting thing that shows up clearly is that working with a struct is significantly faster than working with a map. I can only assume that the reflection overhead is shorter there, but I don't know why.

Fuzz Testing

Fuzz-testing is basically magic - you run your program with random input, which stress-tests it and frequently exposes corner-cases you've not considered.

This project has been fuzz-tested repeatedly, and FUZZING.md contains notes on how you can carry out testing of your own with the integrated fuzz-testing available with the 1.18+ version of the golang release.

API Stability

The API will remain as-is for given major release number, so far we've had we've had two major releases:

  • 1.x.x
    • The initial implementation which parsed script into an AST then walked it.
  • 2.x.x
    • The updated design which parses the given script into an AST, then generates bytecode to execute when the script is actually run.

The second release was implemented to perform a significant speedup for the case where the same script might be reused multiple times.

See Also

This repository was put together after experimenting with a scripting language, and writing a BASIC interpreter along with a FORTH interpreter.

I've also played around with a couple of compilers which might be interesting to refer to:

Github Setup

This repository is configured to run tests upon every commit, and when pull-requests are created/updated. The testing is carried out via .github/run-tests.sh which is used by the github-action-tester action.

Steve

Documentation

Overview

Package evalfilter allows running a user-supplied script against an object.

We're constructed with a program, and internally we parse that to an abstract syntax-tree, then we walk that tree to generate a series of bytecodes.

The bytecode is then executed via the VM-package.

Example

Example is a function which will filter a list of people, to return only those members who are above a particular age, via the use of a simple script.

//
// This is the structure our script will operate upon.
//
type Person struct {
	Name string
	Age  int
}

//
// Here is a list of people.
//
people := []Person{
	{"Bob", 31},
	{"John", 42},
	{"Michael", 17},
	{"Jenny", 26},
}

//
// We'll run this script against each entry in the list
//
script := `

// Example filter - we only care about people over 30.
if ( Age > 30 ) { return true ; }

// Since we return false the caller will know to ignore people here.
return false;
`

//
// Create the evaluator
//
eval := New(script)

//
// Prepare the evaluator.
//
err := eval.Prepare()
if err != nil {
	fmt.Printf("Failed to compile the code:%s\n", err.Error())
	return
}

//
// Process each person.
//
for _, entry := range people {

	//
	// Call the filter
	//
	res, err := eval.Run(entry)

	//
	// Error-detection is important (!)
	//
	if err != nil {
		panic(err)
	}

	//
	// We only care about the people for whom the filter
	// returned `true`.
	//
	if res {
		fmt.Printf("%v\n", entry)
	}
}
Output:

{Bob 31}
{John 42}

Index

Examples

Constants

View Source
const (
	// Don't run the optimizer when generating bytecode.
	NoOptimize byte = iota
)

Flags which can be optionally passed to Prepare.

Variables

This section is empty.

Functions

This section is empty.

Types

type Eval

type Eval struct {
	// Script holds the script the user submitted in our constructor.
	Script string
	// contains filtered or unexported fields
}

Eval is our public-facing structure which stores our state.

func New

func New(script string) *Eval

New creates a new instance of the evaluator.

func (*Eval) AddFunction

func (e *Eval) AddFunction(name string, fun interface{})

AddFunction exposes a golang function from your host application to the scripting environment.

Once a function has been added it may be used by the filter script.

func (*Eval) Dump

func (e *Eval) Dump() error

Dump causes our bytecode to be dumped, along with the contents of the constant-pool

func (*Eval) Execute added in v2.1.9

func (e *Eval) Execute(obj interface{}) (out object.Object, error error)

Execute executes the program which the user passed in the constructor, and returns the object that the script finished with.

This function is very similar to the `Run` method, however the Run method only returns a binary/boolean result, and this method returns the actual object your script returned with.

Use of this method allows you to receive the `3` that a script such as `return 1 + 2;` would return.

func (*Eval) GetVariable

func (e *Eval) GetVariable(name string) object.Object

GetVariable retrieves the contents of a variable which has been set within a user-script.

If the variable hasn't been set then the null-value will be returned.

func (*Eval) Prepare

func (e *Eval) Prepare(flags ...[]byte) error

Prepare is the second function the caller must invoke, it compiles the user-supplied program to its final-form.

Internally this compilation process walks through the usual steps, lexing, parsing, and bytecode-compilation.

func (*Eval) Run

func (e *Eval) Run(obj interface{}) (bool, error)

Run executes the program which the user passed in the constructor.

The return value, assuming no error, is a binary/boolean result which suits the use of this package as a filter.

If you wish to return the actual value the script returned then you can use the `Execute` method instead. That doesn't attempt to determine whether the result of the script was "true" or not.

func (*Eval) SetContext added in v2.1.12

func (e *Eval) SetContext(ctx context.Context)

SetContext allows a context to be passed to the evaluator.

The context is passed down to the virtual machine, which allows you to setup a timeout/deadline for the execution of user-supplied scripts.

func (*Eval) SetVariable

func (e *Eval) SetVariable(name string, value object.Object)

SetVariable adds, or updates a variable which will be available to the filter script.

Directories

Path Synopsis
_examples
Package ast contains the structures we use for generating an AST of a program.
Package ast contains the structures we use for generating an AST of a program.
cmd
Package code contains definitions of the bytecode instructions our compiler emits, and our virtual machine executes.
Package code contains definitions of the bytecode instructions our compiler emits, and our virtual machine executes.
Package environment holds the state for our objects.
Package environment holds the state for our objects.
Package lexer contains our lexer.
Package lexer contains our lexer.
Package object contains the golang-implementation of our evalfilter object-types.
Package object contains the golang-implementation of our evalfilter object-types.
Package parser consumes tokens from the lexer and returns a program as a set of AST-nodes.
Package parser consumes tokens from the lexer and returns a program as a set of AST-nodes.
Package stack implements a stack which is used for our virtual machine.
Package stack implements a stack which is used for our virtual machine.
Package token contains identifiers for the various things we find in our source-scripts.
Package token contains identifiers for the various things we find in our source-scripts.
Package vm implements a simple stack-based virtual machine.
Package vm implements a simple stack-based virtual machine.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL