hyperscan

package
v0.0.0-...-690b3be
Warning

This package is not in the latest version of its module.

Published: Oct 16, 2017 License: Apache-2.0 Imports: 11 Imported by: 0

Documentation

Overview

Hyperscan (https://github.com/01org/hyperscan) is a software regular expression matching engine designed with high performance and flexibility in mind. It is implemented as a library that exposes a straightforward C API.

Hyperscan uses hybrid automata techniques to allow simultaneous matching of large numbers (up to tens of thousands) of regular expressions and for the matching of regular expressions across streams of data.

Hyperscan is typically used in a deep packet inspection (DPI) library stack.

The Hyperscan API itself is composed of two major components:

Compilation

These functions take a group of regular expressions, along with identifiers and option flags, and compile them into an immutable database that can be used by the Hyperscan scanning API. This compilation process performs considerable analysis and optimization work in order to build a database that will match the given expressions efficiently.

If a pattern cannot be built into a database for any reason (such as the use of an unsupported expression construct, or the overflowing of a resource limit), an error will be returned by the pattern compiler.

Compiled databases can be serialized and relocated, so that they can be stored to disk or moved between hosts. They can also be targeted to particular platform features (for example, the use of Intel® Advanced Vector Extensions 2 (Intel® AVX2) instructions).

See Compiling Patterns for more detail. (http://01org.github.io/hyperscan/dev-reference/compilation.html)
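The store-to-disk workflow described above can be sketched as follows. This is a minimal illustration, not the package's own code: `marshaler` mirrors only the `Marshal() ([]byte, error)` method listed on the Database interface, while `fakeDB`, `saveDatabase`, `roundTrip`, and the file name are hypothetical names introduced so the sketch runs without Hyperscan installed.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"path/filepath"
)

// marshaler is the subset of the documented Database interface used here.
type marshaler interface {
	Marshal() ([]byte, error)
}

// fakeDB is a hypothetical stand-in for a compiled database.
type fakeDB struct{ blob []byte }

func (f fakeDB) Marshal() ([]byte, error) { return f.blob, nil }

// saveDatabase serializes db and writes the resulting bytes to path.
func saveDatabase(db marshaler, path string) error {
	data, err := db.Marshal()
	if err != nil {
		return err
	}
	return os.WriteFile(path, data, 0o600)
}

// roundTrip stores a fake database in a temporary directory and reads it back.
func roundTrip(blob []byte) ([]byte, error) {
	dir, err := os.MkdirTemp("", "hsdb")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "patterns.db")
	if err := saveDatabase(fakeDB{blob: blob}, path); err != nil {
		return nil, err
	}
	return os.ReadFile(path)
}

func main() {
	in := []byte("serialized-bytes")
	out, err := roundTrip(in)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("bytes preserved:", bytes.Equal(in, out))
}
```

With the real package, the bytes read back from disk would presumably be handed to one of the documented Unmarshal functions (for example UnmarshalBlockDatabase) on the receiving host.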

Scanning

Once a Hyperscan database has been created, it can be used to scan data in memory. Hyperscan provides several scanning modes, depending on whether the data to be scanned is available as a single contiguous block, whether it is distributed amongst several blocks in memory at the same time, or whether it is to be scanned as a sequence of blocks in a stream.

Matches are delivered to the application via a user-supplied callback function that is called synchronously for each match.
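The callback style of delivery can be pictured with a small sketch. It uses Go's standard regexp package as a stand-in for the scanning engine, so its match semantics may differ from Hyperscan's; `onMatch`, `errStop`, `scan`, and `collect` are hypothetical names and do not reflect the package's actual handler signature.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// errStop is a hypothetical sentinel a handler returns to end the scan early.
var errStop = errors.New("scan terminated by callback")

// onMatch is a hypothetical handler type receiving the offsets of one match.
type onMatch func(from, to int) error

// scan calls handler synchronously for each match of re in data and stops
// at the first error the handler returns.
func scan(re *regexp.Regexp, data []byte, handler onMatch) error {
	for _, loc := range re.FindAllIndex(data, -1) {
		if err := handler(loc[0], loc[1]); err != nil {
			return err
		}
	}
	return nil
}

// collect gathers matches of expr in data and asks the scan to stop once
// limit matches have been seen.
func collect(expr string, data []byte, limit int) ([][2]int, error) {
	var found [][2]int
	err := scan(regexp.MustCompile(expr), data, func(from, to int) error {
		found = append(found, [2]int{from, to})
		if len(found) == limit {
			return errStop
		}
		return nil
	})
	return found, err
}

func main() {
	found, err := collect(`[0-9]+`, []byte("a1 b22 c333"), 2)
	fmt.Println(found, err)
}
```

Because the handler runs inside the scan call, it can stop the scan by returning an error, which is the situation the ErrScanTerminated constant listed further down describes.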

For a given database, Hyperscan provides several guarantees:

1. No memory allocations occur at runtime with the exception of two fixed-size allocations, both of which should be done ahead of time for performance-critical applications:

  • Scratch space: temporary memory used for internal data at scan time. Structures in scratch space do not persist beyond the end of a single scan call.
  • Stream state: in streaming mode only, some state space is required to store data that persists between scan calls for each stream. This allows Hyperscan to track matches that span multiple blocks of data.

2. The sizes of the scratch space and stream state (in streaming mode) required for a given database are fixed and determined at database compile time. This means that the memory requirements of the application are known ahead of time, and these structures can be pre-allocated if required for performance reasons.

3. Any pattern that has successfully been compiled by the Hyperscan compiler can be scanned against any input. There are no internal resource limits or other limitations at runtime that could cause a scan call to return an error.

See Scanning for Patterns for more detail. (http://01org.github.io/hyperscan/dev-reference/runtime.html)

Building a Database

The Hyperscan compiler API accepts regular expressions and converts them into a compiled pattern database that can then be used to scan data.

Compilation allows the Hyperscan library to analyze the given pattern(s) and pre-determine how to scan for these patterns in an optimized fashion that would be far too expensive to compute at run-time.

When compiling expressions, a decision needs to be made whether the resulting compiled patterns are to be used in a streaming, block or vectored mode:

  • Streaming mode: the target data to be scanned is a continuous stream, not all of which is available at once; blocks of data are scanned in sequence and matches may span multiple blocks in a stream. In streaming mode, each stream requires a block of memory to store its state between scan calls.
  • Block mode: the target data is a discrete, contiguous block which can be scanned in one call and does not require state to be retained.
  • Vectored mode: the target data consists of a list of non-contiguous blocks that are available all at once. As for block mode, no retention of state is required.
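To see why streaming mode needs per-stream state while block mode does not, consider a match that straddles two blocks. The sketch below is only an illustration with a single literal and the standard library: `literalStream` and `scanStream` are hypothetical, and the retained tail is a stand-in for the stream state, not a description of how Hyperscan stores it.

```go
package main

import (
	"bytes"
	"fmt"
)

// literalStream keeps just enough history to find a literal that spans blocks.
type literalStream struct {
	needle []byte
	tail   []byte // last len(needle)-1 bytes seen so far
	offset int    // number of bytes consumed before the current block
}

// Scan returns the end offsets, in stream coordinates, of matches that end
// inside block.
func (s *literalStream) Scan(block []byte) []int {
	buf := append(append([]byte{}, s.tail...), block...)
	base := s.offset - len(s.tail) // stream offset of buf[0]

	var ends []int
	for i := 0; i < len(buf); {
		j := bytes.Index(buf[i:], s.needle)
		if j < 0 {
			break
		}
		ends = append(ends, base+i+j+len(s.needle))
		i += j + 1
	}

	// Retain the state needed by the next call.
	s.offset += len(block)
	keep := len(s.needle) - 1
	if keep < 0 {
		keep = 0
	}
	if keep > len(buf) {
		keep = len(buf)
	}
	s.tail = append([]byte{}, buf[len(buf)-keep:]...)
	return ends
}

// scanStream feeds blocks to one stream in sequence and collects end offsets.
func scanStream(needle string, blocks ...string) []int {
	s := &literalStream{needle: []byte(needle)}
	var ends []int
	for _, b := range blocks {
		ends = append(ends, s.Scan([]byte(b))...)
	}
	return ends
}

func main() {
	blocks := []string{"abc hyp", "erscan xyz"}
	for _, b := range blocks {
		// Scanning each block on its own misses the split match.
		fmt.Println("block alone:", bytes.Contains([]byte(b), []byte("hyperscan")))
	}
	fmt.Println("stream end offsets:", scanStream("hyperscan", blocks...))
}
```

Scanning each block in isolation finds nothing here, while the stream version reports the match because it carried a few bytes of state from the first call into the second.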

Constants

const (
	// Generic indicates that the compiled database should not be tuned for any particular target platform.
	Generic TuneFlag = C.HS_TUNE_FAMILY_GENERIC
	// SandyBridge indicates that the compiled database should be tuned for the Sandy Bridge microarchitecture.
	SandyBridge = C.HS_TUNE_FAMILY_SNB
	// IvyBridge indicates that the compiled database should be tuned for the Ivy Bridge microarchitecture.
	IvyBridge = C.HS_TUNE_FAMILY_IVB
	// Haswell indicates that the compiled database should be tuned for the Haswell microarchitecture.
	Haswell = C.HS_TUNE_FAMILY_HSW
	// Silvermont indicates that the compiled database should be tuned for the Silvermont microarchitecture.
	Silvermont = C.HS_TUNE_FAMILY_SLM
	// Broadwell indicates that the compiled database should be tuned for the Broadwell microarchitecture.
	Broadwell = C.HS_TUNE_FAMILY_BDW
	// Skylake indicates that the compiled database should be tuned for the Skylake microarchitecture.
	Skylake = C.HS_TUNE_FAMILY_SKL
	// SkylakeServer indicates that the compiled database should be tuned for the Skylake Server microarchitecture.
	SkylakeServer = C.HS_TUNE_FAMILY_SKX
	// Goldmont indicates that the compiled database should be tuned for the Goldmont microarchitecture.
	Goldmont = C.HS_TUNE_FAMILY_GLM
)
const (
	// MinOffset is a flag indicating that the ExprExt.MinOffset field is used.
	MinOffset ExtFlag = C.HS_EXT_FLAG_MIN_OFFSET
	// MaxOffset is a flag indicating that the ExprExt.MaxOffset field is used.
	MaxOffset = C.HS_EXT_FLAG_MAX_OFFSET
	// MinLength is a flag indicating that the ExprExt.MinLength field is used.
	MinLength = C.HS_EXT_FLAG_MIN_LENGTH
	// EditDistance is a flag indicating that the ExprExt.EditDistance field is used.
	EditDistance = C.HS_EXT_FLAG_EDIT_DISTANCE
)
const (
	// ErrSuccess is the error returned if the engine completed normally.
	ErrSuccess HsError = C.HS_SUCCESS
	// ErrInvalid is the error returned if a parameter passed to this function was invalid.
	ErrInvalid = C.HS_INVALID
	// ErrNoMemory is the error returned if a memory allocation failed.
	ErrNoMemory = C.HS_NOMEM
	// ErrScanTerminated is the error returned if the engine was terminated by callback.
	ErrScanTerminated = C.HS_SCAN_TERMINATED
	// ErrCompileError is the error returned if the pattern compiler failed.
	ErrCompileError = C.HS_COMPILER_ERROR
	// ErrDatabaseVersionError is the error returned if the given database was built for a different version of Hyperscan.
	ErrDatabaseVersionError = C.HS_DB_VERSION_ERROR
	// ErrDatabasePlatformError is the error returned if the given database was built for a different platform (i.e., CPU type).
	ErrDatabasePlatformError = C.HS_DB_PLATFORM_ERROR
	// ErrDatabaseModeError is the error returned if the given database was built for a different mode of operation.
	ErrDatabaseModeError = C.HS_DB_MODE_ERROR
	// ErrBadAlign is the error returned if a parameter passed to this function was not correctly aligned.
	ErrBadAlign = C.HS_BAD_ALIGN
	// ErrBadAlloc is the error returned if the memory allocator did not correctly return memory suitably aligned.
	ErrBadAlloc = C.HS_BAD_ALLOC
	// ErrScratchInUse is the error returned if the scratch region was already in use.
	ErrScratchInUse = C.HS_SCRATCH_IN_USE
	// ErrArchError is the error returned if the CPU architecture is unsupported.
	ErrArchError = C.HS_ARCH_ERROR
)
const (
	FloatNumber = `(?:` +
		`[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?.)`

	IPv4Address = `(?:` +
		`(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?\.){3}` +
		`(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))`

	EmailAddress = `(?:` +
		`^[A-Za-z0-9](([_\.\-]?[a-zA-Z0-9]+)*)@` +
		`([A-Za-z0-9]+)(([\.\-]?[a-zA-Z0-9]+)*)\.([A-Za-z]{2,})$)`

	CreditCard = `(?:` +
		`4[0-9]{12}(?:[0-9]{3})?|` +
		`5[1-5][0-9]{14}|` +
		`3[47][0-9]{13}|` +
		`3(?:0[0-5]|[68][0-9])[0-9]{11}|` +
		`6(?:011|5[0-9]{2})[0-9]{12}|` +
		`(?:2131|1800|35\d{3})\d{11})` // JCB
)
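A point worth checking before relying on IPv4Address as shown: alternation binds more loosely than concatenation, so the `\.` in the first group attaches only to the last alternative. The sketch below tests this with Go's standard regexp package rather than Hyperscan (an assumption: both engines treat alternation precedence the same way for this expression). `asDocumented` copies the text above, while `grouped` and `fullMatch` are hypothetical names; `grouped` is one possible variant, not the package's definition.

```go
package main

import (
	"fmt"
	"regexp"
)

// asDocumented copies the IPv4Address text shown in the constant block.
const asDocumented = `(?:` +
	`(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?\.){3}` +
	`(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))`

// grouped is a hypothetical variant that wraps the octet alternatives in
// their own group so the dot applies to all of them.
const grouped = `(?:` +
	`(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}` +
	`(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))`

// fullMatch reports whether expr matches the whole of s.
func fullMatch(expr, s string) bool {
	return regexp.MustCompile(`^` + expr + `$`).MatchString(s)
}

func main() {
	for _, s := range []string{"192.168.1.1", "255.255.255.255", "250250250250"} {
		fmt.Printf("%-16s documented=%v grouped=%v\n",
			s, fullMatch(asDocumented, s), fullMatch(grouped, s))
	}
}
```

Under Go's regexp, the documented text accepts 250250250250 and rejects 255.255.255.255 when anchored, while the grouped variant does the opposite. Both accept 192.168.1.1.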
const UnboundedMaxWidth = C.UINT_MAX

UnboundedMaxWidth is the value reported in ExprInfo.MaxWidth if the pattern expression has an unbounded maximum width.

Variables

This section is empty.

Functions

func Match

func Match(pattern string, data []byte) (bool, error)

func MatchReader

func MatchReader(pattern string, reader io.Reader) (bool, error)

func MatchString

func MatchString(pattern string, s string) (matched bool, err error)

func Quote

func Quote(s string) string

func SerializedDatabaseSize

func SerializedDatabaseSize(data []byte) (int, error)

SerializedDatabaseSize reports the size that would be required by a database if it were deserialized.

func ValidPlatform

func ValidPlatform() error

ValidPlatform tests the current system architecture.

func Version

func Version() string

Version identifies this release version. The returned value is a string containing the version number of this release build and the date of the build.

Types

type BlockDatabase

type BlockDatabase interface {
	Database
	BlockScanner
	BlockMatcher
}

BlockDatabase scans target data that is a discrete, contiguous block, which can be scanned in one call and does not require state to be retained.

func NewBlockDatabase

func NewBlockDatabase(patterns ...*Pattern) (BlockDatabase, error)

func UnmarshalBlockDatabase

func UnmarshalBlockDatabase(data []byte) (BlockDatabase, error)

UnmarshalBlockDatabase reconstructs a block database from a stream of bytes.

type BlockMatcher

type BlockMatcher interface {
	// Find returns a slice holding the text of the leftmost match in data of the regular expression.
	// A return value of nil indicates no match.
	Find(data []byte) []byte

	// FindIndex returns a two-element slice of integers defining the location of the leftmost match in data of the regular expression.
	// The match itself is at data[loc[0]:loc[1]]. A return value of nil indicates no match.
	FindIndex(data []byte) (loc []int)

	// FindAll is the 'All' version of Find; it returns a slice of all successive matches of the expression,
	// as defined by the 'All' description in the package comment. A return value of nil indicates no match.
	FindAll(data []byte, n int) [][]byte

	// FindAllIndex is the 'All' version of FindIndex; it returns a slice of all successive matches of the expression,
	// as defined by the 'All' description in the package comment. A return value of nil indicates no match.
	FindAllIndex(data []byte, n int) [][]int

	// FindString returns a string holding the text of the leftmost match in s of the regular expression.
	// If there is no match, the return value is an empty string, but it will also be empty
	// if the regular expression successfully matches an empty string. Use FindStringIndex if it is necessary to distinguish these cases.
	FindString(s string) string

	// FindStringIndex returns a two-element slice of integers defining the location of the leftmost match in s of the regular expression.
	// The match itself is at s[loc[0]:loc[1]]. A return value of nil indicates no match.
	FindStringIndex(s string) (loc []int)

	// FindAllString is the 'All' version of FindString; it returns a slice of all successive matches of the expression,
	// as defined by the 'All' description in the package comment. A return value of nil indicates no match.
	FindAllString(s string, n int) []string

	// FindAllStringIndex is the 'All' version of FindStringIndex; it returns a slice of all successive matches of the expression,
	// as defined by the 'All' description in the package comment. A return value of nil indicates no match.
	FindAllStringIndex(s string, n int) [][]int

	// Match reports whether the pattern database matches the byte slice b.
	Match(b []byte) bool

	// MatchString reports whether the pattern database matches the string s.
	MatchString(s string) bool
}

BlockMatcher implements regular expression search.

type BlockScanner

type BlockScanner interface {
	// Scan is the function call in which the actual pattern matching takes place for block-mode pattern databases.
	Scan(data []byte, scratch *Scratch, handler MatchHandler, context interface{}) error
}

BlockScanner is the block (non-streaming) regular expression scanner.

type CompileFlag

type CompileFlag uint

Pattern flags

const (
	Caseless        CompileFlag = C.HS_FLAG_CASELESS
	DotAll          CompileFlag = C.HS_FLAG_DOTALL       // Matching a `.` will not exclude newlines.
	MultiLine       CompileFlag = C.HS_FLAG_MULTILINE    // Set multi-line anchoring.
	SingleMatch     CompileFlag = C.HS_FLAG_SINGLEMATCH  // Set single-match only mode.
	AllowEmpty      CompileFlag = C.HS_FLAG_ALLOWEMPTY   // Allow expressions that can match against empty buffers.
	Utf8Mode        CompileFlag = C.HS_FLAG_UTF8         // Enable UTF-8 mode for this expression.
	UnicodeProperty CompileFlag = C.HS_FLAG_UCP          // Enable Unicode property support for this expression.
	PrefilterMode   CompileFlag = C.HS_FLAG_PREFILTER    // Enable prefiltering mode for this expression.
	SomLeftMost     CompileFlag = C.HS_FLAG_SOM_LEFTMOST // Enable leftmost start of match reporting.
)

func ParseCompileFlag

func ParseCompileFlag(s string) (CompileFlag, error)

ParseCompileFlag parses pattern compile flags from a string. Each character selects one flag:

i	Caseless
s	DotAll
m	MultiLine
o	SingleMatch
e	AllowEmpty
u	Utf8Mode
p	UnicodeProperty
f	PrefilterMode
l	SomLeftMost
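The letter table above amounts to a small lookup. The following is a minimal sketch of such a parser, assuming nothing about the package's internals: `flag`, the lowercase constants, their bit values, `letters`, and `parseFlags` are all hypothetical. How the real ParseCompileFlag treats unknown letters is not stated here, so rejecting them is this sketch's own choice.

```go
package main

import "fmt"

// flag is a hypothetical stand-in for CompileFlag with illustrative bit values.
type flag uint

const (
	caseless flag = 1 << iota
	dotAll
	multiLine
	singleMatch
	allowEmpty
	utf8Mode
	unicodeProperty
	prefilterMode
	somLeftMost
)

// letters mirrors the letter-to-flag table in the documentation.
var letters = map[rune]flag{
	'i': caseless,
	's': dotAll,
	'm': multiLine,
	'o': singleMatch,
	'e': allowEmpty,
	'u': utf8Mode,
	'p': unicodeProperty,
	'f': prefilterMode,
	'l': somLeftMost,
}

// parseFlags ORs together the flag for each letter in s and reports the
// first letter it does not recognize.
func parseFlags(s string) (flag, error) {
	var out flag
	for _, c := range s {
		f, ok := letters[c]
		if !ok {
			return 0, fmt.Errorf("unknown flag %q", c)
		}
		out |= f
	}
	return out, nil
}

func main() {
	f, err := parseFlags("im")
	fmt.Println(f == caseless|multiLine, err)
}
```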

func (CompileFlag) String

func (flags CompileFlag) String() string

type CpuFeature

type CpuFeature int

CpuFeature represents the CPU feature support flags.

const (
	// AVX2 is a CPU feature flag indicating that the target platform supports AVX2 instructions.
	AVX2 CpuFeature = C.HS_CPU_FEATURES_AVX2
	// AVX512 is a CPU feature flag indicating that the target platform supports AVX512 instructions,
	// specifically AVX-512BW. Using AVX512 implies the use of AVX2.
	AVX512 = C.HS_CPU_FEATURES_AVX512
)

type Database

type Database interface {
	// Provides information about a database.
	Info() (DbInfo, error)

	// Provides the size of the given database in bytes.
	Size() (int, error)

	// Free a compiled pattern database.
	Close() error

	// Serialize a pattern database to a stream of bytes.
	Marshal() ([]byte, error)

	// Reconstruct a pattern database from a stream of bytes at a given memory location.
	Unmarshal([]byte) error
}

func Compile

func Compile(expr string) (Database, error)

Compile parses a regular expression and returns, if successful, a block-mode pattern database that can be used to match against text.

func MustCompile

func MustCompile(expr string) Database

MustCompile is like Compile but panics if the expression cannot be parsed. It simplifies safe initialization of global variables holding compiled regular expressions.

func UnmarshalDatabase

func UnmarshalDatabase(data []byte) (Database, error)

UnmarshalDatabase reconstructs a pattern database from a stream of bytes.

type DatabaseBuilder

type DatabaseBuilder struct {
	// Array of patterns to compile.
	Patterns []*Pattern

	// Compiler mode flags that affect the database as a whole. (Default: block mode)
	Mode ModeFlag

	// If not nil, the platform structure is used to determine the target platform for the database.
	// If nil, a database suitable for running on the current host platform is produced.
	Platform Platform
}

DatabaseBuilder is a helper type for building up a database.

func (*DatabaseBuilder) AddExpressionWithFlags

func (b *DatabaseBuilder) AddExpressionWithFlags(expr Expression, flags CompileFlag) *DatabaseBuilder

func (*DatabaseBuilder) AddExpressions

func (b *DatabaseBuilder) AddExpressions(exprs ...Expression) *DatabaseBuilder

func (*DatabaseBuilder) Build

func (b *DatabaseBuilder) Build() (Database, error)

type DbInfo

type DbInfo string

DbInfo holds the version and platform information for the supplied database.

func SerializedDatabaseInfo

func SerializedDatabaseInfo(data []byte) (DbInfo, error)

SerializedDatabaseInfo provides information about a serialized database.

func (DbInfo) Mode

func (i DbInfo) Mode() (ModeFlag, error)

func (DbInfo) String

func (i DbInfo) String() string

func (DbInfo) Version

func (i DbInfo) Version() (string, error)

type ExprExt

type ExprExt struct {
	// Flags governing which parts of this structure are to be used by the compiler.
	Flags ExtFlag
	// The minimum end offset in the data stream at which this expression should match successfully.
	MinOffset uint64
	// The maximum end offset in the data stream at which this expression should match successfully.
	MaxOffset uint64
	// The minimum match length (from start to end) required to successfully match this expression.
	MinLength uint64
	// Allow patterns to approximately match within this edit distance.
	EditDistance uint
}

ExprExt is a structure containing additional parameters related to an expression.
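The field comments above can be read as a set of constraints that apply only when the matching flag is set. The sketch below spells out that reading for the three offset and length fields. It illustrates the semantics only: `extFlag`, `exprExt`, the flag values, and `accepts` are hypothetical, and presenting the constraints as a filter over finished matches is an assumption for clarity, not a description of how Hyperscan applies them.

```go
package main

import "fmt"

// extFlag is a hypothetical stand-in for ExtFlag with illustrative values.
type extFlag uint

const (
	minOffset extFlag = 1 << iota
	maxOffset
	minLength
)

// exprExt mirrors the documented offset and length fields of ExprExt.
type exprExt struct {
	Flags     extFlag
	MinOffset uint64
	MaxOffset uint64
	MinLength uint64
}

// accepts reports whether a match spanning [from, to) satisfies every
// constraint whose flag is set. Fields without their flag are ignored.
func (e exprExt) accepts(from, to uint64) bool {
	if e.Flags&minOffset != 0 && to < e.MinOffset {
		return false
	}
	if e.Flags&maxOffset != 0 && to > e.MaxOffset {
		return false
	}
	if e.Flags&minLength != 0 && to-from < e.MinLength {
		return false
	}
	return true
}

func main() {
	// MaxOffset is filled in but its flag is not set, so it has no effect.
	e := exprExt{Flags: minOffset | minLength, MinOffset: 10, MaxOffset: 5, MinLength: 3}
	fmt.Println(e.accepts(8, 12), e.accepts(2, 6), e.accepts(10, 12))
}
```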

type ExprInfo

type ExprInfo struct {
	MinWidth        uint // The minimum length in bytes of a match for the pattern.
	MaxWidth        uint // The maximum length in bytes of a match for the pattern.
	ReturnUnordered bool // Whether this expression can produce matches that are not returned in order, such as those produced by assertions.
	AtEndOfData     bool // Whether this expression can produce matches at end of data (EOD).
	OnlyAtEndOfData bool // Whether this expression can *only* produce matches at end of data (EOD).
}

ExprInfo is a type containing information related to an expression.

type Expression

type Expression string

Expression is the expression text of a pattern.

func (Expression) String

func (e Expression) String() string

type ExtFlag

type ExtFlag uint

type HsError

type HsError int

func (HsError) Error

func (e HsError) Error() string

type MatchContext

type MatchContext interface {
	Database() Database

	Scratch() Scratch

	UserData() interface{}
}

MatchContext assembles a database, its scratch space, and the user context.

type MatchEvent

type MatchEvent interface {
	Id() uint

	From() uint64

	To() uint64

	Flags() ScanFlag
}

MatchEvent is produced for each match

type MatchHandler

type MatchHandler hsMatchEventHandler

MatchHandler is the identity of the function handling matches

type ModeFlag

type ModeFlag uint

Compile mode flags

const (
	BlockMode            ModeFlag = C.HS_MODE_BLOCK              // Block scan (non-streaming) database.
	NoStreamMode         ModeFlag = C.HS_MODE_NOSTREAM           // Alias for BlockMode.
	StreamMode           ModeFlag = C.HS_MODE_STREAM             // Streaming database.
	VectoredMode         ModeFlag = C.HS_MODE_VECTORED           // Vectored scanning database.
	SomHorizonLargeMode  ModeFlag = C.HS_MODE_SOM_HORIZON_LARGE  // Use full precision to track start of match offsets in stream state.
	SomHorizonMediumMode ModeFlag = C.HS_MODE_SOM_HORIZON_MEDIUM // Use medium precision to track start of match offsets in stream state. (within 2^32 bytes)
	SomHorizonSmallMode  ModeFlag = C.HS_MODE_SOM_HORIZON_SMALL  // Use limited precision to track start of match offsets in stream state. (within 2^16 bytes)
)

func ParseModeFlag

func ParseModeFlag(s string) (ModeFlag, error)

func (ModeFlag) String

func (m ModeFlag) String() string

type Pattern

type Pattern struct {
	Expression             // The expression to parse.
	Flags      CompileFlag // Flags which modify the behaviour of the expression.
	Id         int         // The ID number to be associated with the corresponding pattern
	// contains filtered or unexported fields
}

func NewPattern

func NewPattern(expr string, flags CompileFlag) *Pattern

func ParsePattern

func ParsePattern(s string) (*Pattern, error)

ParsePattern parses a pattern from a formatted string:

/<expression>/[flags]

For example, the following pattern will match `test` in caseless and multi-line mode:

/test/im
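The `/<expression>/[flags]` layout can be taken apart with a couple of string operations. This is a minimal sketch of one way to do so, not the package's parser: `splitPattern` is a hypothetical helper, and the real ParsePattern may accept other forms that this sketch rejects.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// splitPattern separates "/<expression>/[flags]" into its two parts. The
// last slash ends the expression, so slashes inside it are preserved.
func splitPattern(s string) (expr, flags string, err error) {
	if !strings.HasPrefix(s, "/") {
		return "", "", errors.New("pattern must start with '/'")
	}
	end := strings.LastIndex(s, "/")
	if end == 0 {
		return "", "", errors.New("pattern is missing the closing '/'")
	}
	return s[1:end], s[end+1:], nil
}

func main() {
	expr, flags, err := splitPattern("/test/im")
	fmt.Printf("expr=%q flags=%q err=%v\n", expr, flags, err)
}
```

The flags part could then be interpreted letter by letter according to the table given for ParseCompileFlag.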

func (*Pattern) Info

func (p *Pattern) Info() (*ExprInfo, error)

Provides information about a regular expression.

func (*Pattern) IsValid

func (p *Pattern) IsValid() bool

func (*Pattern) String

func (p *Pattern) String() string

type Platform

type Platform interface {
	// Information about the target platform which may be used to guide the optimisation process of the compile.
	Tune() TuneFlag

	// Relevant CPU features available on the target platform
	CpuFeatures() CpuFeature
}

A type containing information on the target platform.

func NewPlatform

func NewPlatform(tune TuneFlag, cpu CpuFeature) Platform

func PopulatePlatform

func PopulatePlatform() Platform

PopulatePlatform populates the platform information based on the current host.

type ScanFlag

type ScanFlag uint

type Scratch

type Scratch struct {
	// contains filtered or unexported fields
}

Scratch is a Hyperscan scratch space.

func NewScratch

func NewScratch(db Database) (*Scratch, error)

NewScratch allocates a "scratch" space for use by Hyperscan. This is required for runtime use, and one scratch space per thread, or concurrent caller, is required.
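One way to honor the one-scratch-per-concurrent-caller rule in Go is to pre-allocate a fixed set of scratch spaces and lend them out one at a time. The sketch below shows that pattern with the standard library only: `scratch` is a hypothetical stand-in for *Scratch, and `pool`, `newPool`, `with`, and `runJobs` are illustrative names, not part of the package.

```go
package main

import (
	"fmt"
	"sync"
)

// scratch is a hypothetical stand-in for a Hyperscan scratch space.
type scratch struct{ buf []byte }

// pool lends each concurrent caller an exclusive, pre-allocated scratch.
type pool chan *scratch

// newPool allocates n scratch spaces of the given size ahead of time.
func newPool(n, size int) pool {
	p := make(pool, n)
	for i := 0; i < n; i++ {
		p <- &scratch{buf: make([]byte, size)}
	}
	return p
}

// with borrows a scratch, runs fn with it, and returns it to the pool.
// It blocks while every scratch is in use.
func (p pool) with(fn func(*scratch)) {
	s := <-p
	defer func() { p <- s }()
	fn(s)
}

// runJobs starts the given number of goroutines, each borrowing a scratch
// for its work, and returns how many of them completed.
func runJobs(p pool, jobs int) int {
	var (
		wg   sync.WaitGroup
		mu   sync.Mutex
		done int
	)
	for i := 0; i < jobs; i++ {
		wg.Add(1)
		go func(id byte) {
			defer wg.Done()
			p.with(func(s *scratch) {
				s.buf[0] = id // safe: no other goroutine holds s right now
				mu.Lock()
				done++
				mu.Unlock()
			})
		}(byte(i))
	}
	wg.Wait()
	return done
}

func main() {
	p := newPool(2, 16)
	fmt.Println("completed:", runJobs(p, 8), "scratches back in pool:", len(p))
}
```

With the real package, such a pool might be filled by calling NewScratch once and Clone for the remaining entries, since both are listed in this documentation.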

func (*Scratch) Clone

func (s *Scratch) Clone() (*Scratch, error)

Clone allocates a scratch space that is a clone of an existing scratch space.

func (*Scratch) Free

func (s *Scratch) Free() error

Free releases a previously allocated scratch space.

func (*Scratch) Realloc

func (s *Scratch) Realloc(db Database) error

Realloc reallocates the scratch space for another database.

func (*Scratch) Size

func (s *Scratch) Size() (int, error)

Size provides the size of the given scratch space.

type Stream

type Stream interface {
	Scan(data []byte) error

	Close() error

	Reset() error

	Clone() (Stream, error)
}

Streams exist in the Hyperscan library so that pattern matching state can be maintained across multiple blocks of target data.

type StreamDatabase

type StreamDatabase interface {
	Database
	StreamScanner
	StreamMatcher

	StreamSize() (int, error)
}

StreamDatabase scans target data that forms a continuous stream, not all of which is available at once; blocks of data are scanned in sequence and matches may span multiple blocks in a stream.

func NewStreamDatabase

func NewStreamDatabase(patterns ...*Pattern) (StreamDatabase, error)

func UnmarshalStreamDatabase

func UnmarshalStreamDatabase(data []byte) (StreamDatabase, error)

UnmarshalStreamDatabase reconstructs a stream database from a stream of bytes.

type StreamMatcher

type StreamMatcher interface {
}

StreamMatcher implements regular expression search.

type StreamScanner

type StreamScanner interface {
	Open(flags ScanFlag, scratch *Scratch, handler MatchHandler, context interface{}) (Stream, error)
}

StreamScanner is the streaming regular expression scanner.

type TuneFlag

type TuneFlag int

TuneFlag represents the tuning flags.

type VectoredDatabase

type VectoredDatabase interface {
	Database
	VectoredScanner
	VectoredMatcher
}

VectoredDatabase scans target data that consists of a list of non-contiguous blocks that are available all at once.

func NewVectoredDatabase

func NewVectoredDatabase(patterns ...*Pattern) (VectoredDatabase, error)

func UnmarshalVectoredDatabase

func UnmarshalVectoredDatabase(data []byte) (VectoredDatabase, error)

UnmarshalVectoredDatabase reconstructs a vectored database from a stream of bytes.

type VectoredMatcher

type VectoredMatcher interface {
}

VectoredMatcher implements regular expression search.

type VectoredScanner

type VectoredScanner interface {
	Scan(data [][]byte, scratch *Scratch, handler MatchHandler, context interface{}) error
}

VectoredScanner is the vectored regular expression scanner.
