vfilter

Version: v0.0.0-...-f7fa24c
Published: Apr 17, 2024 License: Apache-2.0 Imports: 21 Imported by: 93

README

The veloci-filter (vfilter) library implements a generic SQL-like query language.

Overview::

There are many applications in which it is useful to provide a flexible query language for the end user. Velocifilter has the following design goals:

  • It should be generic and easily adaptable to be used by any project.
  • It should be fast and efficient.

An example makes the use case very clear. Suppose you are writing an archiving application. Most archiving tools require a list of files to be archived (e.g. on the command line).

You launch your tool and a user requests a new flag that allows them to specify the files using a glob expression. For example, a user might wish to select only the files ending with the ".go" extension. While on a Unix system one might use shell expansion to support this, on other operating systems (e.g. on Windows) shell expansion may not work.

You then add the ability to specify a glob expression directly to your tool (suppose you add the flag --glob). A short while later, a user requires filtering the files to archive by their size - suppose they want to only archive a file smaller than a certain size. You studiously add another set of flags (e.g. --size with a special syntax for greater than or less than semantics).

Now a user wishes to be able to combine these conditions logically (e.g. all files with ".go" extension newer than 5 days and smaller than 5kb).

Clearly this approach is limited. If we wanted to support every possible use case, our tool would need many flags with a complex syntax, making it harder for our users. One approach is to simply rely on the Unix "find" tool (with its many obscure flags) to solve the file selection problem. This is not ideal either, since the find tool may not be present on the system (e.g. on Windows) or may have varying syntax. It may also not support every possible condition the user may have in mind (e.g. files containing a RegExp or files not present in the archive).

There has to be a better way. We wish to provide our users with a powerful and flexible way to specify which files to archive, but we do not want to write complicated logic and make our tool more complex to use.

This is where velocifilter comes in. By using the library we can provide a single flag where the user may specify a flexible VQL query (Velocidex Query Language - a simplified SQL dialect), allowing the user to specify arbitrarily complex filter expressions. For example:

    SELECT file from glob(pattern=["*.go", "*.py"]) where file.Size < 5000
    and file.Mtime < now() - "5 days"

Not only does VQL allow for complex logical operators, but it is also efficient and optimized automatically. For example, consider the following query:

    SELECT file from glob(pattern="*") where grep(file=file,
    pattern="foobar") and file.Size < 5k

The grep() function will open the file and search it for the pattern. If the file is large, this might take a long time. However, velocifilter will automatically abort the grep() function if the file size is larger than 5k bytes. Velocifilter handles such cancellations automatically in order to reduce query evaluation latency.
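The cancellation described above can be pictured with a small standard-library sketch. This is not vfilter's evaluator: `slowGrep` and `andWithCancel` are hypothetical names, and the assumption here is simply that the expensive operand watches a `context.Context` and gives up once the cheap size test is known to be false.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// slowGrep stands in for an expensive predicate such as grep(). It checks
// ctx between chunks of work so that it can stop early when cancelled.
func slowGrep(ctx context.Context, chunks int) (matched bool, cancelled bool) {
	for i := 0; i < chunks; i++ {
		select {
		case <-ctx.Done():
			return false, true
		case <-time.After(time.Millisecond):
			// Pretend to scan one chunk of the file.
		}
	}
	return true, false
}

// andWithCancel evaluates "expensive AND cheap" (hypothetical helper).
// The expensive operand starts in the background and is cancelled as soon
// as the cheap size comparison turns out to be false.
func andWithCancel(size, limit int64, chunks int) (result bool, grepCancelled bool) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	type grepResult struct{ matched, cancelled bool }
	done := make(chan grepResult, 1)
	go func() {
		m, c := slowGrep(ctx, chunks)
		done <- grepResult{m, c}
	}()

	if size >= limit { // the cheap operand is false, so the AND is false
		cancel()
		r := <-done
		return false, r.cancelled
	}
	r := <-done
	return r.matched, r.cancelled
}

func main() {
	result, cancelled := andWithCancel(10000, 5000, 1000)
	fmt.Println("large file:", result, "grep cancelled:", cancelled)

	result, cancelled = andWithCancel(100, 5000, 3)
	fmt.Println("small file:", result, "grep cancelled:", cancelled)
}
```

For the large file the size test fails immediately, so the simulated grep is abandoned long before it would have finished scanning.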

Protocols - supporting custom types::

Velocifilter uses a plugin system to allow clients to define how their own custom types behave within the VQL evaluator.

Note that this is necessary because Go does not allow an external package to add methods (and therefore new interface implementations) to an existing type without creating a new type which embeds it. Clients who need to handle the original third party types must have a way to attach new protocols to existing types defined outside their own codebase. Velocifilter achieves this by implementing a registration system in the Scope{} object.

For example, consider a client of the library wishing to pass custom types in queries:

    type Foo struct {
       ...
       bar Bar
    }

Here both Foo and Bar are defined and produced by some other library which our client uses. Suppose our client wishes to allow addition of Foo objects. We would therefore need to implement the AddProtocol interface on Foo structs. Since Foo structs are defined externally, we cannot simply add a new method to the Foo struct. (We could embed the Foo struct in a new struct, but then we would also need to wrap the bar field to produce an extended Bar. This is typically impractical and not maintainable for heavily nested complex structs.) Instead, we define a FooAdder{} object which implements the Addition protocol on behalf of the Foo object.

    // This is an object which implements addition between two Foo objects.
    type FooAdder struct{}

    // This method will be run to see if this implementation is
    // applicable. We only want to run when we add two Foo objects together.
    func (self FooAdder) Applicable(a Any, b Any) bool {
        _, a_ok := a.(Foo)
        _, b_ok := b.(Foo)
        return a_ok && b_ok
    }

    // Actually implement the addition between two Foo objects.
    func (self FooAdder) Add(scope types.Scope, a Any, b Any) Any {
        // ... return a new object (it does not have to be a Foo{}).
    }

Now clients can add this protocol to the scope before evaluating a query:

    scope := NewScope().AddProtocolImpl(FooAdder{})
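To make the registration idea concrete, here is a self-contained miniature of the dispatch pattern using only the standard library. It is a sketch, not the library's code: `miniScope` is a hypothetical stand-in for the real Scope, the local `AddProtocol` interface omits the scope argument for brevity, and `Foo` is given an invented field `N` so that there is something to add.

```go
package main

import "fmt"

// Any mirrors the idea of the library's Any alias: an arbitrary value.
type Any = interface{}

// AddProtocol is a local, simplified stand-in for the addition protocol.
type AddProtocol interface {
	Applicable(a Any, b Any) bool
	Add(a Any, b Any) Any
}

// miniScope is a hypothetical registry of protocol implementations.
type miniScope struct {
	adders []AddProtocol
}

// AddProtocolImpl registers an implementation and returns the scope so
// that calls can be chained.
func (s *miniScope) AddProtocolImpl(impl AddProtocol) *miniScope {
	s.adders = append(s.adders, impl)
	return s
}

// Add asks each registered implementation in turn whether it applies to
// the operands and uses the first one that does.
func (s *miniScope) Add(a Any, b Any) (Any, bool) {
	for _, impl := range s.adders {
		if impl.Applicable(a, b) {
			return impl.Add(a, b), true
		}
	}
	return nil, false
}

// Foo stands in for a type owned by a third-party package.
type Foo struct{ N int }

// FooAdder implements addition on behalf of Foo without modifying Foo.
type FooAdder struct{}

func (FooAdder) Applicable(a Any, b Any) bool {
	_, aOK := a.(Foo)
	_, bOK := b.(Foo)
	return aOK && bOK
}

func (FooAdder) Add(a Any, b Any) Any {
	return Foo{N: a.(Foo).N + b.(Foo).N}
}

func main() {
	scope := (&miniScope{}).AddProtocolImpl(FooAdder{})

	sum, ok := scope.Add(Foo{N: 1}, Foo{N: 2})
	fmt.Println(sum, ok)

	// No registered implementation handles Foo + int.
	_, ok = scope.Add(Foo{N: 1}, 2)
	fmt.Println(ok)
}
```

The point of the design is that the behaviour lives in the registered object rather than in a method on Foo, so types from other packages can take part in queries unchanged.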

Documentation

Index

Constants

This section is empty.

Variables

var (
	ToStringOptions = FormatOptions{
		BreakLines:        false,
		MaxWidthThreshold: 1000000,
	}

	DefaultFormatOptions = FormatOptions{
		IndentWidthThreshold: 50,
		MaxWidthThreshold:    80,
		ArgsOnNewLine:        true,
		BreakLines:           true,
	}
)

Functions

func CopyFunction

func CopyFunction(in types.Any) types.FunctionInterface

func ExtractArgs

func ExtractArgs(scope types.Scope, args *ordereddict.Dict, value interface{}) error

func FormatToString

func FormatToString(scope types.Scope, node interface{}) string

func GetIntScope

func GetIntScope(scope_int types.Scope) *scope.Scope

func GetResponseChannel

func GetResponseChannel(
	vql *VQL,
	ctx context.Context,
	scope types.Scope,
	encoder RowEncoder,
	maxrows int,
	max_wait int) <-chan *VFilterJsonResult

Returns a channel over which multi part results are sent.
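The following sketch shows one way results could be split into parts. The semantics are an assumption, since the signature alone does not define them: here a part is flushed when `maxRows` rows are buffered or when a `maxWait` timer fires. `part` and `batchRows` are hypothetical names, and rows are plain strings rather than encoded payloads.

```go
package main

import (
	"fmt"
	"time"
)

// part is a hypothetical, trimmed-down analogue of VFilterJsonResult.
type part struct {
	Part      int
	TotalRows int
	Rows      []string
}

// batchRows groups rows read from in into parts. A part is sent when it
// holds maxRows rows, when the maxWait ticker fires with rows pending, or
// when the input channel is closed.
func batchRows(in <-chan string, maxRows int, maxWait time.Duration) <-chan part {
	out := make(chan part)
	go func() {
		defer close(out)

		var buf []string
		n := 0
		flush := func() {
			if len(buf) == 0 {
				return
			}
			out <- part{Part: n, TotalRows: len(buf), Rows: buf}
			n++
			buf = nil
		}

		ticker := time.NewTicker(maxWait)
		defer ticker.Stop()

		for {
			select {
			case row, ok := <-in:
				if !ok {
					flush() // send whatever is left, then stop
					return
				}
				buf = append(buf, row)
				if len(buf) >= maxRows {
					flush()
					ticker.Reset(maxWait)
				}
			case <-ticker.C:
				flush()
			}
		}
	}()
	return out
}

func main() {
	in := make(chan string)
	go func() {
		defer close(in)
		for _, r := range []string{"a", "b", "c", "d", "e"} {
			in <- r
		}
	}()

	for p := range batchRows(in, 2, time.Second) {
		fmt.Println(p.Part, p.TotalRows, p.Rows)
	}
}
```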

func MaterializedLazyRow

func MaterializedLazyRow(ctx context.Context, row Row, scope types.Scope) *ordereddict.Dict

Takes a row returned from a plugin and materializes it into basic types. Generally this should only be a LazyRow, as this is only called from the Transformer. NOTE: This function only materializes the columns - it does not recursively materialize all objects.

func NewLazyExpr

func NewLazyExpr(ctx context.Context,
	scope types.Scope, expr *_AndExpression) types.LazyExpr

func NewScope

func NewScope() types.Scope

func NewStoredQuery

func NewStoredQuery(query *_Select, name string) *_StoredQuery

func NewTimeThrottler

func NewTimeThrottler(rate float64) types.Throttler

func NewUnmarshaller

func NewUnmarshaller(ignore_vars []string) *marshal.Unmarshaller

func OutputJSON

func OutputJSON(
	vql *VQL,
	ctx context.Context,
	scope types.Scope,
	encoder RowEncoder) ([]byte, error)

A convenience function to generate JSON output from a VQL query.

func RowToDict

func RowToDict(
	ctx context.Context,
	scope types.Scope, row types.Row) *ordereddict.Dict

Types

type Any

type Any = types.Any

Aliases to public types.

type Empty

type Empty struct{}

type FormatOptions

type FormatOptions struct {
	// Threshold above which we indent more aggressively on new
	// lines. Below the threshold we try to keep lines together.
	IndentWidthThreshold int
	MaxWidthThreshold    int

	// Parameters are laid out one per line and indented at the first (
	ArgsOnNewLine bool
	BreakLines    bool
}

type FunctionInfo

type FunctionInfo = types.FunctionInfo

type FunctionInterface

type FunctionInterface = types.FunctionInterface

type GenericFunction

type GenericFunction = functions.GenericFunction

type GenericListPlugin

type GenericListPlugin = plugins.GenericListPlugin

type GroupbyActor

type GroupbyActor struct {
	// contains filtered or unexported fields
}

func (*GroupbyActor) GetNextRow

func (self *GroupbyActor) GetNextRow(ctx context.Context, scope types.Scope) (
	types.LazyRow, types.Row, string, types.Scope, error)

Pull the next row off the query possibly filtering it.

func (*GroupbyActor) MaterializeRow

func (self *GroupbyActor) MaterializeRow(ctx context.Context,
	row types.Row, scope types.Scope) *ordereddict.Dict

func (*GroupbyActor) Transform

func (self *GroupbyActor) Transform(ctx context.Context,
	scope types.Scope, row types.Row) (types.LazyRow, func())

type Lambda

type Lambda struct {
	Parameters  *_ParameterList ` @@ `
	LetOperator string          ` @"=>" `
	Expression  *_AndExpression ` @@ `
}

func ParseLambda

func ParseLambda(expression string) (*Lambda, error)

func (*Lambda) GetParameters

func (self *Lambda) GetParameters() []string

func (*Lambda) Reduce

func (self *Lambda) Reduce(ctx context.Context, scope types.Scope, parameters []Any) Any

type LazyExpr

type LazyExpr = types.LazyExpr

type LazyExprImpl

type LazyExprImpl struct {
	Value types.Any // Used to cache
	Expr  *_AndExpression
	// contains filtered or unexported fields
}

A LazyExpr may be passed into a plugin arg for later evaluation. The plugin may completely ignore the expression and so will not evaluate it at all. Once evaluated LazyExpr will cache the value and can be used again. NOTE that LazyExpr is used purely for caching and so it uses the local scope (at the point of definition) to evaluate the expression - not the scope at the point of reference!

func (*LazyExprImpl) Reduce

func (self *LazyExprImpl) Reduce(ctx context.Context) types.Any

func (*LazyExprImpl) ReduceWithScope

func (self *LazyExprImpl) ReduceWithScope(
	ctx context.Context, scope types.Scope) types.Any

type LazyRowImpl

type LazyRowImpl struct {
	// contains filtered or unexported fields
}

A LazyRow holds callbacks as columns. When a column is accessed, the LazyRow will call the callback to materialize it, then cache the results. LazyRows are used to avoid calling expensive functions when the query does not need them - LazyRows are created in the SELECT transformer to delay evaluation of column specifiers until they are accessed.
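A minimal sketch of the idea, assuming nothing about the real implementation beyond the description above: each column is a callback that runs on first access and is cached afterwards. `lazyRow` is a hypothetical type; unlike LazyRowImpl its getters take no context or scope, and it is not safe for concurrent use.

```go
package main

import "fmt"

// lazyRow is a hypothetical miniature of a lazily evaluated row.
type lazyRow struct {
	columns []string
	getters map[string]func() interface{}
	cache   map[string]interface{}
}

func newLazyRow() *lazyRow {
	return &lazyRow{
		getters: make(map[string]func() interface{}),
		cache:   make(map[string]interface{}),
	}
}

// AddColumn registers a callback for a column without calling it.
func (r *lazyRow) AddColumn(name string, getter func() interface{}) *lazyRow {
	if _, seen := r.getters[name]; !seen {
		r.columns = append(r.columns, name)
	}
	r.getters[name] = getter
	delete(r.cache, name) // drop any value cached for a replaced getter
	return r
}

// Columns lists the column names in the order they were added.
func (r *lazyRow) Columns() []string { return r.columns }

// Get materializes the column on first access and caches the result.
func (r *lazyRow) Get(name string) (interface{}, bool) {
	if v, ok := r.cache[name]; ok {
		return v, true
	}
	getter, ok := r.getters[name]
	if !ok {
		return nil, false
	}
	v := getter()
	r.cache[name] = v
	return v, true
}

func main() {
	calls := 0
	row := newLazyRow().
		AddColumn("Size", func() interface{} { calls++; return 4096 }).
		AddColumn("Hash", func() interface{} { panic("never needed") })

	size, _ := row.Get("Size")
	size, _ = row.Get("Size") // served from the cache
	fmt.Println(row.Columns(), size, "getter calls:", calls)
}
```

The "Hash" getter never runs because nothing asks for that column, which is the saving the description refers to.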

func NewLazyRow

func NewLazyRow(ctx context.Context, scope types.Scope) *LazyRowImpl

func (*LazyRowImpl) AddColumn

func (self *LazyRowImpl) AddColumn(
	name string, getter func(ctx context.Context, scope types.Scope) types.Any) types.LazyRow

func (*LazyRowImpl) Columns

func (self *LazyRowImpl) Columns() []string

func (*LazyRowImpl) Get

func (self *LazyRowImpl) Get(key string) (types.Any, bool)

func (*LazyRowImpl) Has

func (self *LazyRowImpl) Has(key string) bool

type MultiVQL

type MultiVQL struct {
	Comments  []*_Comment `{ @@ } `
	VQL1      *VQL        ` @@ `
	Comments2 []*_Comment `{ @@ } `
	VQL2      *MultiVQL   ` { @@ } `
}

func (*MultiVQL) GetStatements

func (self *MultiVQL) GetStatements() []*VQL

type Null

type Null = types.Null

type OrdereddictUnmarshaller

type OrdereddictUnmarshaller struct{}

func (OrdereddictUnmarshaller) Unmarshal

func (self OrdereddictUnmarshaller) Unmarshal(
	unmarshaller types.Unmarshaller,
	scope types.Scope, item *types.MarshalItem) (interface{}, error)

type Plugin

type Plugin struct {
	Name string `@Ident { @"." @Ident } `

	Call bool     `[ @"("`
	Args []*_Args ` [ @@  { "," @@ } ] ")" ]`
	// contains filtered or unexported fields
}

func (*Plugin) Eval

func (self *Plugin) Eval(ctx context.Context, scope types.Scope) <-chan Row

type PluginGeneratorInterface

type PluginGeneratorInterface = types.PluginGeneratorInterface

type PluginInfo

type PluginInfo = types.PluginInfo

type ReplayUnmarshaller

type ReplayUnmarshaller struct{}

func (ReplayUnmarshaller) Unmarshal

func (self ReplayUnmarshaller) Unmarshal(
	unmarshaller types.Unmarshaller,
	scope types.Scope, item *types.MarshalItem) (interface{}, error)

type Row

type Row = types.Row

type RowEncoder

type RowEncoder func(rows []Row) ([]byte, error)

type Scope

type Scope = types.Scope

type ScopeUnmarshaller

type ScopeUnmarshaller = scope.ScopeUnmarshaller

type StoredExpression

type StoredExpression struct {
	Expr *_AndExpression
	// contains filtered or unexported fields
}

Unlike the LazyExpr the value of StoredExpression is not cached - this means each time it is evaluated, the expression is fully expanded. NOTE: The StoredExpression is evaluated at the point of reference not at the point of definition - therefore when evaluated, we must provide the scope at that point.
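The contrast between the two evaluation models can be sketched in a few lines. This is an illustration only: `scope`, `expr`, `lazyExpr` and `storedExpr` are hypothetical local types, with a scope reduced to a map of integer variables.

```go
package main

import "fmt"

// scope is a hypothetical stand-in for a VQL scope: a bag of variables.
type scope map[string]int

// expr is an expression that reads variables from a scope.
type expr func(s scope) int

// lazyExpr binds the scope at the point of definition and caches the
// result of its first evaluation.
type lazyExpr struct {
	expr     expr
	defScope scope
	done     bool
	value    int
}

func newLazyExpr(e expr, definedIn scope) *lazyExpr {
	return &lazyExpr{expr: e, defScope: definedIn}
}

// Reduce evaluates once against the definition scope, then reuses the value.
func (l *lazyExpr) Reduce() int {
	if !l.done {
		l.value = l.expr(l.defScope)
		l.done = true
	}
	return l.value
}

// storedExpr keeps only the expression. The caller supplies the scope at
// the point of reference, and nothing is cached.
type storedExpr struct{ expr expr }

func (s storedExpr) Reduce(at scope) int { return s.expr(at) }

func main() {
	double := func(s scope) int { return s["x"] * 2 }

	definition := scope{"x": 1}
	lazy := newLazyExpr(double, definition)
	fmt.Println(lazy.Reduce())
	definition["x"] = 5
	fmt.Println(lazy.Reduce()) // still the cached value

	stored := storedExpr{expr: double}
	reference := scope{"x": 10}
	fmt.Println(stored.Reduce(reference))
	reference["x"] = 11
	fmt.Println(stored.Reduce(reference)) // re-evaluated each time
}
```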

func (*StoredExpression) Call

func (self *StoredExpression) Call(ctx context.Context,
	scope types.Scope, args *ordereddict.Dict) types.Any

Act as a function

func (*StoredExpression) Marshal

func (self *StoredExpression) Marshal(
	scope types.Scope) (*types.MarshalItem, error)

func (*StoredExpression) Reduce

func (self *StoredExpression) Reduce(
	ctx context.Context, scope types.Scope) types.Any

type StoredQuery

type StoredQuery = types.StoredQuery

type StoredQueryCallSite

type StoredQueryCallSite struct {
	// contains filtered or unexported fields
}

A wrapper around a stored query which captures its call site's parameters in a new scope. When the wrapper is evaluated, the call site's scope will be used.

func (*StoredQueryCallSite) Eval

func (self *StoredQueryCallSite) Eval(ctx context.Context, scope Scope) <-chan Row

type StoredQueryItem

type StoredQueryItem struct {
	Query      string   `json:"query,omitempty"`
	Name       string   `json:"name,omitempty"`
	Parameters []string `json:"parameters,omitempty"`
}

type TimeThrottler

type TimeThrottler struct {
	// contains filtered or unexported fields
}

func (*TimeThrottler) ChargeOp

func (self *TimeThrottler) ChargeOp()

func (*TimeThrottler) Close

func (self *TimeThrottler) Close()

type TypeMap

type TypeMap = types.TypeMap

type VFilterJsonResult

type VFilterJsonResult struct {
	Part      int
	TotalRows int
	Columns   []string
	Payload   []byte
}

A response from VQL queries.

type VQL

type VQL struct {
	Let         string          `LET  @Ident `
	Parameters  *_ParameterList `{ "(" @@ ")" }`
	LetOperator string          ` ( @"=" | @"<=" ) `
	StoredQuery *_Select        ` ( @@ |  `
	Expression  *_AndExpression ` @@ ) |`
	Query       *_Select        ` @@  `
	Comments    []*_Comment
}

An opaque object representing the VQL expression.

func MultiParse

func MultiParse(expression string) ([]*VQL, error)

Parse a string into multiple VQL statements.

func MultiParseWithComments

func MultiParseWithComments(expression string) ([]*VQL, error)

Parse a string into multiple VQL statements.

func Parse

func Parse(expression string) (*VQL, error)

Parse the VQL expression. Returns a VQL object which may be evaluated.

func (*VQL) Eval

func (self *VQL) Eval(ctx context.Context, scope types.Scope) <-chan Row

Evaluate the expression. Returns a channel which emits a series of rows.
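A channel of rows is typically drained with a range loop, and the context is the natural way to stop early. The sketch below imitates that shape with the standard library only: `evalRows` is a hypothetical producer, not the library's Eval, and `firstRows` shows a consumer that cancels once it has enough rows.

```go
package main

import (
	"context"
	"fmt"
)

// row is a hypothetical stand-in for a result row.
type row map[string]interface{}

// evalRows imitates an Eval-style method: it emits rows on a channel and
// closes the channel when it is finished or when ctx is cancelled.
func evalRows(ctx context.Context, names []string) <-chan row {
	out := make(chan row)
	go func() {
		defer close(out)
		for _, name := range names {
			select {
			case <-ctx.Done():
				return
			case out <- row{"file": name}:
			}
		}
	}()
	return out
}

// firstRows reads at most limit rows. Cancelling the context on return
// lets the producer goroutine exit instead of blocking on a send.
func firstRows(names []string, limit int) []string {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	var got []string
	if limit <= 0 {
		return got
	}
	for r := range evalRows(ctx, names) {
		got = append(got, r["file"].(string))
		if len(got) >= limit {
			break
		}
	}
	return got
}

func main() {
	names := []string{"a.go", "b.go", "c.go"}
	fmt.Println(firstRows(names, 2))
	fmt.Println(firstRows(names, 10))
}
```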

func (*VQL) Type

func (self *VQL) Type() string

Returns the type of statement it is:

- LAZY_LET - A lazy stored query.
- MATERIALIZED_LET - A stored materialized query.
- SELECT - A query.

type Visitor

type Visitor struct {
	Fragments []string
	// contains filtered or unexported fields
}

func NewVisitor

func NewVisitor(scope types.Scope, options FormatOptions) *Visitor

func (*Visitor) ToString

func (self *Visitor) ToString() string

func (*Visitor) Visit

func (self *Visitor) Visit(node interface{})

Directories

Path Synopsis
_examples
Utility functions for extracting and validating inputs to functions and plugins.
