Documentation ¶
Overview ¶
Hyperscan (https://github.com/01org/hyperscan) is a software regular expression matching engine designed with high performance and flexibility in mind. It is implemented as a library that exposes a straightforward C API.
Hyperscan uses hybrid automata techniques to allow simultaneous matching of large numbers (up to tens of thousands) of regular expressions and for the matching of regular expressions across streams of data.
Hyperscan is typically used in a deep packet inspection (DPI) library stack.
The Hyperscan API itself is composed of two major components:
Compilation ¶
These functions take a group of regular expressions, along with identifiers and option flags, and compile them into an immutable database that can be used by the Hyperscan scanning API. This compilation process performs considerable analysis and optimization work in order to build a database that will match the given expressions efficiently.
If a pattern cannot be built into a database for any reason (such as the use of an unsupported expression construct, or the overflowing of a resource limit), an error will be returned by the pattern compiler.
Compiled databases can be serialized and relocated, so that they can be stored to disk or moved between hosts. They can also be targeted to particular platform features (for example, the use of Intel® Advanced Vector Extensions 2 (Intel® AVX2) instructions).
See Compiling Patterns for more detail. (http://01org.github.io/hyperscan/dev-reference/compilation.html)
Scanning ¶
Once a Hyperscan database has been created, it can be used to scan data in memory. Hyperscan provides several scanning modes, depending on whether the data to be scanned is available as a single contiguous block, whether it is distributed amongst several blocks in memory at the same time, or whether it is to be scanned as a sequence of blocks in a stream.
Matches are delivered to the application via a user-supplied callback function that is called synchronously for each match.
For a given database, Hyperscan provides several guarantees:
1. No memory allocations occur at runtime with the exception of two fixed-size allocations, both of which should be done ahead of time for performance-critical applications:
- Scratch space: temporary memory used for internal data at scan time. Structures in scratch space do not persist beyond the end of a single scan call.
- Stream state: in streaming mode only, some state space is required to store data that persists between scan calls for each stream. This allows Hyperscan to track matches that span multiple blocks of data.
2. The sizes of the scratch space and stream state (in streaming mode) required for a given database are fixed and determined at database compile time. This means that the memory requirements of the application are known ahead of time, and these structures can be pre-allocated if required for performance reasons.
3. Any pattern that has successfully been compiled by the Hyperscan compiler can be scanned against any input. There are no internal resource limits or other limitations at runtime that could cause a scan call to return an error.
See Scanning for Patterns for more detail. (http://01org.github.io/hyperscan/dev-reference/runtime.html)
Building a Database ¶
The Hyperscan compiler API accepts regular expressions and converts them into a compiled pattern database that can then be used to scan data.
Compilation allows the Hyperscan library to analyze the given pattern(s) and pre-determine how to scan for these patterns in an optimized fashion that would be far too expensive to compute at run-time.
When compiling expressions, a decision needs to be made whether the resulting compiled patterns are to be used in a streaming, block or vectored mode:
- Streaming mode: the target data to be scanned is a continuous stream, not all of which is available at once; blocks of data are scanned in sequence and matches may span multiple blocks in a stream. In streaming mode, each stream requires a block of memory to store its state between scan calls.
- Block mode: the target data is a discrete, contiguous block which can be scanned in one call and does not require state to be retained.
- Vectored mode: the target data consists of a list of non-contiguous blocks that are available all at once. As for block mode, no retention of state is required.
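The difference between block and streaming mode comes down to whether state survives between scan calls. The sketch below illustrates this with a literal search written against the standard library only; it is not how Hyperscan stores stream state, and `countAll`, `countBlock` and `stream` are hypothetical names. The "stream" keeps the last len(needle)-1 bytes so that a match split across two blocks is still found.

```go
package main

import (
	"bytes"
	"fmt"
)

// countAll counts every (possibly overlapping) occurrence of needle in buf.
// needle must be non-empty.
func countAll(buf, needle []byte) int {
	n := 0
	for i := 0; i+len(needle) <= len(buf); i++ {
		if bytes.Equal(buf[i:i+len(needle)], needle) {
			n++
		}
	}
	return n
}

// countBlock scans each block independently, as block mode does:
// nothing is remembered from one call to the next.
func countBlock(needle []byte, blocks [][]byte) int {
	n := 0
	for _, b := range blocks {
		n += countAll(b, needle)
	}
	return n
}

// stream retains a short tail of earlier data between scan calls.
type stream struct {
	needle []byte
	tail   []byte // at most len(needle)-1 bytes, so it never holds a full match
	count  int
}

// scan processes the next block of the stream, including matches that
// started in an earlier block.
func (s *stream) scan(block []byte) {
	buf := append(append([]byte{}, s.tail...), block...)
	s.count += countAll(buf, s.needle)
	keep := len(s.needle) - 1
	if keep > len(buf) {
		keep = len(buf)
	}
	s.tail = append(s.tail[:0], buf[len(buf)-keep:]...)
}

func main() {
	needle := []byte("hyperscan")
	blocks := [][]byte{[]byte("xx hyper"), []byte("scan yy")}

	s := &stream{needle: needle}
	for _, b := range blocks {
		s.scan(b)
	}
	fmt.Printf("block mode: %d, streaming: %d\n", countBlock(needle, blocks), s.count)
	// prints "block mode: 0, streaming: 1"
}
```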
Index ¶
- Constants
- func Match(pattern string, data []byte) (bool, error)
- func MatchReader(pattern string, reader io.Reader) (bool, error)
- func MatchString(pattern string, s string) (matched bool, err error)
- func Quote(s string) string
- func SerializedDatabaseSize(data []byte) (int, error)
- func ValidPlatform() error
- func Version() string
- type BlockDatabase
- type BlockMatcher
- type BlockScanner
- type CompileFlag
- type CpuFeature
- type Database
- type DatabaseBuilder
- type DbInfo
- type ExprExt
- type ExprInfo
- type Expression
- type ExtFlag
- type HsError
- type MatchContext
- type MatchEvent
- type MatchHandler
- type ModeFlag
- type Pattern
- type Platform
- type ScanFlag
- type Scratch
- type Stream
- type StreamDatabase
- type StreamMatcher
- type StreamScanner
- type TuneFlag
- type VectoredDatabase
- type VectoredMatcher
- type VectoredScanner
Constants ¶
const (
	// Generic indicates that the compiled database should not be tuned for any particular target platform.
	Generic TuneFlag = C.HS_TUNE_FAMILY_GENERIC
	// SandyBridge indicates that the compiled database should be tuned for the Sandy Bridge microarchitecture.
	SandyBridge = C.HS_TUNE_FAMILY_SNB
	// IvyBridge indicates that the compiled database should be tuned for the Ivy Bridge microarchitecture.
	IvyBridge = C.HS_TUNE_FAMILY_IVB
	// Haswell indicates that the compiled database should be tuned for the Haswell microarchitecture.
	Haswell = C.HS_TUNE_FAMILY_HSW
	// Silvermont indicates that the compiled database should be tuned for the Silvermont microarchitecture.
	Silvermont = C.HS_TUNE_FAMILY_SLM
	// Broadwell indicates that the compiled database should be tuned for the Broadwell microarchitecture.
	Broadwell = C.HS_TUNE_FAMILY_BDW
	// Skylake indicates that the compiled database should be tuned for the Skylake microarchitecture.
	Skylake = C.HS_TUNE_FAMILY_SKL
	// SkylakeServer indicates that the compiled database should be tuned for the Skylake Server microarchitecture.
	SkylakeServer = C.HS_TUNE_FAMILY_SKX
	// Goldmont indicates that the compiled database should be tuned for the Goldmont microarchitecture.
	Goldmont = C.HS_TUNE_FAMILY_GLM
)
const (
	// MinOffset is a flag indicating that the ExprExt.MinOffset field is used.
	MinOffset ExtFlag = C.HS_EXT_FLAG_MIN_OFFSET
	// MaxOffset is a flag indicating that the ExprExt.MaxOffset field is used.
	MaxOffset = C.HS_EXT_FLAG_MAX_OFFSET
	// MinLength is a flag indicating that the ExprExt.MinLength field is used.
	MinLength = C.HS_EXT_FLAG_MIN_LENGTH
	// EditDistance is a flag indicating that the ExprExt.EditDistance field is used.
	EditDistance = C.HS_EXT_FLAG_EDIT_DISTANCE
)
const (
	// ErrSuccess is the error returned if the engine completed normally.
	ErrSuccess HsError = C.HS_SUCCESS
	// ErrInvalid is the error returned if a parameter passed to this function was invalid.
	ErrInvalid = C.HS_INVALID
	// ErrNoMemory is the error returned if a memory allocation failed.
	ErrNoMemory = C.HS_NOMEM
	// ErrScanTerminated is the error returned if the engine was terminated by the callback.
	ErrScanTerminated = C.HS_SCAN_TERMINATED
	// ErrCompileError is the error returned if the pattern compiler failed.
	ErrCompileError = C.HS_COMPILER_ERROR
	// ErrDatabaseVersionError is the error returned if the given database was built for a different version of Hyperscan.
	ErrDatabaseVersionError = C.HS_DB_VERSION_ERROR
	// ErrDatabasePlatformError is the error returned if the given database was built for a different platform (i.e., CPU type).
	ErrDatabasePlatformError = C.HS_DB_PLATFORM_ERROR
	// ErrDatabaseModeError is the error returned if the given database was built for a different mode of operation.
	ErrDatabaseModeError = C.HS_DB_MODE_ERROR
	// ErrBadAlign is the error returned if a parameter passed to this function was not correctly aligned.
	ErrBadAlign = C.HS_BAD_ALIGN
	// ErrBadAlloc is the error returned if the memory allocator did not return suitably aligned memory.
	ErrBadAlloc = C.HS_BAD_ALLOC
	// ErrScratchInUse is the error returned if the scratch region was already in use.
	ErrScratchInUse = C.HS_SCRATCH_IN_USE
	// ErrArchError is the error returned if the CPU architecture is unsupported.
	ErrArchError = C.HS_ARCH_ERROR
)
const (
	FloatNumber = `(?:` + `[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?.)`
	IPv4Address = `(?:` + `(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?\.){3}` + `(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))`
	EmailAddress = `(?:` + `^[A-Za-z0-9](([_\.\-]?[a-zA-Z0-9]+)*)@` + `([A-Za-z0-9]+)(([\.\-]?[a-zA-Z0-9]+)*)\.([A-Za-z]{2,})$)`
	CreditCard = `(?:` + `4[0-9]{12}(?:[0-9]{3})?|` + `5[1-5][0-9]{14}|` + `3[47][0-9]{13}|` + `3(?:0[0-5]|[68][0-9])[0-9]{11}|` + `6(?:011|5[0-9]{2})[0-9]{12}|` + `(?:2131|1800|35\d{3})\d{11})` // JCB
)
const UnboundedMaxWidth = C.UINT_MAX
UnboundedMaxWidth is the value reported in ExprInfo.MaxWidth if the pattern expression has an unbounded maximum width.
Variables ¶
This section is empty.
Functions ¶
func SerializedDatabaseSize ¶
SerializedDatabaseSize reports the size that would be required by a database if it were deserialized.
Types ¶
type BlockDatabase ¶
type BlockDatabase interface {
	Database
	BlockScanner
	BlockMatcher
}
BlockDatabase scans target data that is a discrete, contiguous block, which can be scanned in one call and does not require state to be retained.
func NewBlockDatabase ¶
func NewBlockDatabase(patterns ...*Pattern) (BlockDatabase, error)
func UnmarshalBlockDatabase ¶
func UnmarshalBlockDatabase(data []byte) (BlockDatabase, error)
UnmarshalBlockDatabase reconstructs a block database from a stream of bytes.
type BlockMatcher ¶
type BlockMatcher interface {
	// Find returns a slice holding the text of the leftmost match in data of the regular expression.
	// A return value of nil indicates no match.
	Find(data []byte) []byte

	// FindIndex returns a two-element slice of integers defining the location of the leftmost match in data of the regular expression.
	// The match itself is at data[loc[0]:loc[1]]. A return value of nil indicates no match.
	FindIndex(data []byte) (loc []int)

	// FindAll is the 'All' version of Find; it returns a slice of all successive matches of the expression,
	// as defined by the 'All' description in the package comment. A return value of nil indicates no match.
	FindAll(data []byte, n int) [][]byte

	// FindAllIndex is the 'All' version of FindIndex; it returns a slice of all successive matches of the expression,
	// as defined by the 'All' description in the package comment. A return value of nil indicates no match.
	FindAllIndex(data []byte, n int) [][]int

	// FindString returns a string holding the text of the leftmost match in s of the regular expression.
	// If there is no match, the return value is an empty string, but it will also be empty
	// if the regular expression successfully matches an empty string. Use FindStringIndex if it is necessary to distinguish these cases.
	FindString(s string) string

	// FindStringIndex returns a two-element slice of integers defining the location of the leftmost match in s of the regular expression.
	// The match itself is at s[loc[0]:loc[1]]. A return value of nil indicates no match.
	FindStringIndex(s string) (loc []int)

	// FindAllString is the 'All' version of FindString; it returns a slice of all successive matches of the expression,
	// as defined by the 'All' description in the package comment. A return value of nil indicates no match.
	FindAllString(s string, n int) []string

	// FindAllStringIndex is the 'All' version of FindStringIndex; it returns a slice of all successive matches of the expression,
	// as defined by the 'All' description in the package comment. A return value of nil indicates no match.
	FindAllStringIndex(s string, n int) [][]int

	// Match reports whether the pattern database matches the byte slice b.
	Match(b []byte) bool

	// MatchString reports whether the pattern database matches the string s.
	MatchString(s string) bool
}
BlockMatcher implements regular expression search.
type BlockScanner ¶
type BlockScanner interface {
	// Scan is the function call in which the actual pattern matching takes place for block-mode pattern databases.
	Scan(data []byte, scratch *Scratch, handler MatchHandler, context interface{}) error
}
BlockScanner is the block (non-streaming) regular expression scanner.
type CompileFlag ¶
type CompileFlag uint
Pattern flags
const (
	Caseless CompileFlag = C.HS_FLAG_CASELESS // Set case-insensitive matching.
	DotAll CompileFlag = C.HS_FLAG_DOTALL // Matching a `.` will not exclude newlines.
	MultiLine CompileFlag = C.HS_FLAG_MULTILINE // Set multi-line anchoring.
	SingleMatch CompileFlag = C.HS_FLAG_SINGLEMATCH // Set single-match only mode.
	AllowEmpty CompileFlag = C.HS_FLAG_ALLOWEMPTY // Allow expressions that can match against empty buffers.
	Utf8Mode CompileFlag = C.HS_FLAG_UTF8 // Enable UTF-8 mode for this expression.
	UnicodeProperty CompileFlag = C.HS_FLAG_UCP // Enable Unicode property support for this expression.
	PrefilterMode CompileFlag = C.HS_FLAG_PREFILTER // Enable prefiltering mode for this expression.
	SomLeftMost CompileFlag = C.HS_FLAG_SOM_LEFTMOST // Enable leftmost start of match reporting.
)
func ParseCompileFlag ¶
func ParseCompileFlag(s string) (CompileFlag, error)
ParseCompileFlag parses pattern compile flags from a string. Each character in the string selects one flag:
i Caseless
s DotAll
m MultiLine
o SingleMatch
e AllowEmpty
u Utf8Mode
p UnicodeProperty
f PrefilterMode
l SomLeftMost
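The letter-to-flag mapping above can be sketched as a small lookup table. This is an illustration only, not the package's implementation: `compileFlag`, `flagLetters` and `parseFlags` are hypothetical names, and the bit values are placeholders rather than the real `C.HS_FLAG_*` constants.

```go
package main

import "fmt"

// compileFlag is a stand-in for the package's CompileFlag type.
type compileFlag uint

// Placeholder bit values for illustration; the real flags come from the C headers.
const (
	caseless compileFlag = 1 << iota
	dotAll
	multiLine
	singleMatch
	allowEmpty
	utf8Mode
	unicodeProperty
	prefilterMode
	somLeftMost
)

// flagLetters maps each documented flag letter to its flag.
var flagLetters = map[rune]compileFlag{
	'i': caseless,
	's': dotAll,
	'm': multiLine,
	'o': singleMatch,
	'e': allowEmpty,
	'u': utf8Mode,
	'p': unicodeProperty,
	'f': prefilterMode,
	'l': somLeftMost,
}

// parseFlags ORs together the flags named by each letter in s and
// rejects any letter that is not in the table.
func parseFlags(s string) (compileFlag, error) {
	var flags compileFlag
	for _, c := range s {
		f, ok := flagLetters[c]
		if !ok {
			return 0, fmt.Errorf("unknown flag %q", c)
		}
		flags |= f
	}
	return flags, nil
}

func main() {
	flags, err := parseFlags("im")
	fmt.Println(flags == caseless|multiLine, err) // prints "true <nil>"
}
```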
func (CompileFlag) String ¶
func (flags CompileFlag) String() string
type CpuFeature ¶
type CpuFeature int
CpuFeature is the set of CPU feature support flags.
const (
	// AVX2 is a CPU feature flag indicating that the target platform supports AVX2 instructions.
	AVX2 CpuFeature = C.HS_CPU_FEATURES_AVX2
	// AVX512 is a CPU feature flag indicating that the target platform supports AVX512 instructions,
	// specifically AVX-512BW. Using AVX512 implies the use of AVX2.
	AVX512 = C.HS_CPU_FEATURES_AVX512
)
type Database ¶
type Database interface {
	// Info provides information about a database.
	Info() (DbInfo, error)

	// Size provides the size of the given database in bytes.
	Size() (int, error)

	// Close frees a compiled pattern database.
	Close() error

	// Marshal serializes a pattern database to a stream of bytes.
	Marshal() ([]byte, error)

	// Unmarshal reconstructs a pattern database from a stream of bytes at a given memory location.
	Unmarshal([]byte) error
}
func Compile ¶
Compile parses a regular expression and returns, if successful, a block-mode pattern database that can be used to match against text.
func MustCompile ¶
MustCompile is like Compile but panics if the expression cannot be parsed. It simplifies safe initialization of global variables holding compiled regular expressions.
func UnmarshalDatabase ¶
UnmarshalDatabase reconstructs a pattern database from a stream of bytes.
type DatabaseBuilder ¶
type DatabaseBuilder struct {
	// Array of patterns to compile.
	Patterns []*Pattern

	// Compiler mode flags that affect the database as a whole. (Default: block mode)
	Mode ModeFlag

	// If not nil, the platform structure is used to determine the target platform for the database.
	// If nil, a database suitable for running on the current host platform is produced.
	Platform Platform
}
DatabaseBuilder is a helper type for building up a database.
func (*DatabaseBuilder) AddExpressionWithFlags ¶
func (b *DatabaseBuilder) AddExpressionWithFlags(expr Expression, flags CompileFlag) *DatabaseBuilder
func (*DatabaseBuilder) AddExpressions ¶
func (b *DatabaseBuilder) AddExpressions(exprs ...Expression) *DatabaseBuilder
func (*DatabaseBuilder) Build ¶
func (b *DatabaseBuilder) Build() (Database, error)
type DbInfo ¶
type DbInfo string
DbInfo holds the version and platform information for the supplied database.
func SerializedDatabaseInfo ¶
SerializedDatabaseInfo provides information about a serialized database.
type ExprExt ¶
type ExprExt struct {
	// Flags governing which parts of this structure are to be used by the compiler.
	Flags ExtFlag
	// The minimum end offset in the data stream at which this expression should match successfully.
	MinOffset uint64
	// The maximum end offset in the data stream at which this expression should match successfully.
	MaxOffset uint64
	// The minimum match length (from start to end) required to successfully match this expression.
	MinLength uint64
	// Allow patterns to approximately match within this edit distance.
	EditDistance uint
}
ExprExt is a structure containing additional parameters related to an expression.
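The offset and length fields can be read as constraints on a match spanning the bytes from `from` up to `to`. The sketch below spells out that reading as a post-filter. It only illustrates the documented semantics: Hyperscan applies these constraints inside the compiled database rather than after the fact, the edit distance parameter is left out, and `exprExt`, `accepts` and the `flag*` constants are hypothetical stand-ins with placeholder bit values.

```go
package main

import "fmt"

// extFlag is a stand-in for the package's ExtFlag type.
type extFlag uint

// Placeholder bit values for illustration only.
const (
	flagMinOffset extFlag = 1 << iota
	flagMaxOffset
	flagMinLength
)

// exprExt mirrors the offset and length fields of ExprExt.
type exprExt struct {
	flags     extFlag
	minOffset uint64
	maxOffset uint64
	minLength uint64
}

// accepts reports whether a match starting at from and ending at to
// satisfies every constraint whose flag is set. Fields whose flag is
// not set are ignored.
func (e exprExt) accepts(from, to uint64) bool {
	if e.flags&flagMinOffset != 0 && to < e.minOffset {
		return false
	}
	if e.flags&flagMaxOffset != 0 && to > e.maxOffset {
		return false
	}
	if e.flags&flagMinLength != 0 && to-from < e.minLength {
		return false
	}
	return true
}

func main() {
	e := exprExt{flags: flagMinOffset | flagMinLength, minOffset: 10, minLength: 3}
	fmt.Println(e.accepts(8, 12), e.accepts(2, 6)) // prints "true false"
}
```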
type ExprInfo ¶
type ExprInfo struct {
	MinWidth uint // The minimum length in bytes of a match for the pattern.
	MaxWidth uint // The maximum length in bytes of a match for the pattern.
	ReturnUnordered bool // Whether this expression can produce matches that are not returned in order, such as those produced by assertions.
	AtEndOfData bool // Whether this expression can produce matches at end of data (EOD).
	OnlyAtEndOfData bool // Whether this expression can *only* produce matches at end of data (EOD).
}
ExprInfo is a type containing information related to an expression.
type Expression ¶
type Expression string
Expression is the regular expression string of a pattern.
func (Expression) String ¶
func (e Expression) String() string
type MatchContext ¶
MatchContext assembles a database, its scratch space and the user context.
type MatchEvent ¶
MatchEvent is produced for each match
type MatchHandler ¶
type MatchHandler hsMatchEventHandler
MatchHandler is the type of the callback function that handles matches.
type ModeFlag ¶
type ModeFlag uint
Compile mode flags
const (
	BlockMode ModeFlag = C.HS_MODE_BLOCK // Block scan (non-streaming) database.
	NoStreamMode ModeFlag = C.HS_MODE_NOSTREAM // Alias for Block.
	StreamMode ModeFlag = C.HS_MODE_STREAM // Streaming database.
	VectoredMode ModeFlag = C.HS_MODE_VECTORED // Vectored scanning database.
	SomHorizonLargeMode ModeFlag = C.HS_MODE_SOM_HORIZON_LARGE // Use full precision to track start of match offsets in stream state.
	SomHorizonMediumMode ModeFlag = C.HS_MODE_SOM_HORIZON_MEDIUM // Use medium precision to track start of match offsets in stream state. (within 2^32 bytes)
	SomHorizonSmallMode ModeFlag = C.HS_MODE_SOM_HORIZON_SMALL // Use limited precision to track start of match offsets in stream state. (within 2^16 bytes)
)
func ParseModeFlag ¶
type Pattern ¶
type Pattern struct {
	Expression // The expression to parse.
	Flags CompileFlag // Flags which modify the behaviour of the expression.
	Id int // The ID number to be associated with the corresponding pattern
	// contains filtered or unexported fields
}
func NewPattern ¶
func NewPattern(expr string, flags CompileFlag) *Pattern
func ParsePattern ¶
ParsePattern parses a pattern from a formatted string of the form:
/<expression>/[flags]
For example, the following pattern will match `test` in caseless and multi-line mode:
/test/im
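Splitting a string of this shape into its expression and flag letters could be sketched as follows. `splitPattern` is a hypothetical helper, not the package's ParsePattern, which may accept additional forms. The closing delimiter is taken to be the last `/`, so that the expression itself may contain slashes.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// splitPattern splits "/<expression>/[flags]" into the expression and the
// trailing flag letters. It does not validate the flag letters.
func splitPattern(s string) (expr, flags string, err error) {
	if !strings.HasPrefix(s, "/") {
		return "", "", errors.New("pattern must start with '/'")
	}
	end := strings.LastIndex(s, "/")
	if end == 0 {
		return "", "", errors.New("pattern is missing the closing '/'")
	}
	return s[1:end], s[end+1:], nil
}

func main() {
	expr, flags, err := splitPattern("/test/im")
	fmt.Printf("%q %q %v\n", expr, flags, err) // prints "test" "im" <nil>
}
```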
type Platform ¶
type Platform interface {
	// Tune returns information about the target platform which may be used to guide the optimisation process of the compile.
	Tune() TuneFlag

	// CpuFeatures returns the relevant CPU features available on the target platform.
	CpuFeatures() CpuFeature
}
Platform is a type containing information on the target platform.
func NewPlatform ¶
func NewPlatform(tune TuneFlag, cpu CpuFeature) Platform
func PopulatePlatform ¶
func PopulatePlatform() Platform
PopulatePlatform populates the platform information based on the current host.
type Scratch ¶
type Scratch struct {
// contains filtered or unexported fields
}
Scratch is a Hyperscan scratch space.
func NewScratch ¶
NewScratch allocates a "scratch" space for use by Hyperscan. This is required for runtime use, and one scratch space per thread, or concurrent caller, is required.
func (*Scratch) Clone ¶
Clone allocates a scratch space that is a clone of an existing scratch space.
type Stream ¶
type Stream interface {
	Scan(data []byte) error
	Close() error
	Reset() error
	Clone() (Stream, error)
}
Streams exist in the Hyperscan library so that pattern matching state can be maintained across multiple blocks of target data.
type StreamDatabase ¶
type StreamDatabase interface {
	Database
	StreamScanner
	StreamMatcher
	StreamSize() (int, error)
}
StreamDatabase scans target data that is a continuous stream, not all of which is available at once; blocks of data are scanned in sequence and matches may span multiple blocks in a stream.
func NewStreamDatabase ¶
func NewStreamDatabase(patterns ...*Pattern) (StreamDatabase, error)
func UnmarshalStreamDatabase ¶
func UnmarshalStreamDatabase(data []byte) (StreamDatabase, error)
UnmarshalStreamDatabase reconstructs a stream database from a stream of bytes.
type StreamMatcher ¶
type StreamMatcher interface { }
StreamMatcher implements regular expression search.
type StreamScanner ¶
type StreamScanner interface {
Open(flags ScanFlag, scratch *Scratch, handler MatchHandler, context interface{}) (Stream, error)
}
StreamScanner is the streaming regular expression scanner.
type VectoredDatabase ¶
type VectoredDatabase interface {
	Database
	VectoredScanner
	VectoredMatcher
}
VectoredDatabase scans target data that consists of a list of non-contiguous blocks that are available all at once.
func NewVectoredDatabase ¶
func NewVectoredDatabase(patterns ...*Pattern) (VectoredDatabase, error)
func UnmarshalVectoredDatabase ¶
func UnmarshalVectoredDatabase(data []byte) (VectoredDatabase, error)
UnmarshalVectoredDatabase reconstructs a vectored database from a stream of bytes.
type VectoredMatcher ¶
type VectoredMatcher interface { }
VectoredMatcher implements regular expression search.
type VectoredScanner ¶
type VectoredScanner interface {
Scan(data [][]byte, scratch *Scratch, handler MatchHandler, context interface{}) error
}
VectoredScanner is the vectored regular expression scanner.