hyperscan

package
v1.2.2

Published: Apr 18, 2023 License: Apache-2.0, MIT Imports: 9 Imported by: 18

Documentation

Overview

Package hyperscan is the Go binding for Intel's Hyperscan regex matching library: [hyperscan.io](https://www.hyperscan.io/)

Hyperscan (https://github.com/intel/hyperscan) is a software regular expression matching engine designed with high performance and flexibility in mind. It is implemented as a library that exposes a straightforward C API.

Hyperscan uses hybrid automata techniques to allow simultaneous matching of large numbers (up to tens of thousands) of regular expressions and the matching of regular expressions across streams of data. Hyperscan is typically used in a DPI (deep packet inspection) library stack. The Hyperscan API itself is composed of two major components:

Compilation

These functions take a group of regular expressions, along with identifiers and option flags, and compile them into an immutable database that can be used by the Hyperscan scanning API. This compilation process performs considerable analysis and optimization work in order to build a database that will match the given expressions efficiently. If a pattern cannot be built into a database for any reason (such as the use of an unsupported expression construct, or the overflowing of a resource limit), an error will be returned by the pattern compiler. Compiled databases can be serialized and relocated, so that they can be stored to disk or moved between hosts. They can also be targeted to particular platform features (for example, the use of Intel® Advanced Vector Extensions 2 (Intel® AVX2) instructions).

See Compiling Patterns for more detail. (http://intel.github.io/hyperscan/dev-reference/compilation.html)

Scanning

Once a Hyperscan database has been created, it can be used to scan data in memory. Hyperscan provides several scanning modes, depending on whether the data to be scanned is available as a single contiguous block, whether it is distributed amongst several blocks in memory at the same time, or whether it is to be scanned as a sequence of blocks in a stream. Matches are delivered to the application via a user-supplied callback function that is called synchronously for each match. For a given database, Hyperscan provides several guarantees:

1. No memory allocations occur at runtime with the exception of two fixed-size allocations, both of which should be done ahead of time for performance-critical applications:

1.1 Scratch space: temporary memory used for internal data at scan time. Structures in scratch space do not persist beyond the end of a single scan call.

1.2 Stream state: in streaming mode only, some state space is required to store data that persists between scan calls for each stream. This allows Hyperscan to track matches that span multiple blocks of data.

2. The sizes of the scratch space and stream state (in streaming mode) required for a given database are fixed and determined at database compile time. This means that the memory requirements of the application are known ahead of time, and these structures can be pre-allocated if required for performance reasons.

3. Any pattern that has successfully been compiled by the Hyperscan compiler can be scanned against any input. There are no internal resource limits or other limitations at runtime that could cause a scan call to return an error.

See Scanning for Patterns for more detail. (http://intel.github.io/hyperscan/dev-reference/runtime.html)

Building a Database

The Hyperscan compiler API accepts regular expressions and converts them into a compiled pattern database that can then be used to scan data. Compilation allows the Hyperscan library to analyze the given pattern(s) and pre-determine how to scan for these patterns in an optimized fashion that would be far too expensive to compute at run-time. When compiling expressions, a decision needs to be made whether the resulting compiled patterns are to be used in a streaming, block or vectored mode:

1. Streaming mode: the target data to be scanned is a continuous stream, not all of which is available at once; blocks of data are scanned in sequence and matches may span multiple blocks in a stream. In streaming mode, each stream requires a block of memory to store its state between scan calls.

2. Block mode: the target data is a discrete, contiguous block which can be scanned in one call and does not require state to be retained.

3. Vectored mode: the target data consists of a list of non-contiguous blocks that are available all at once. As for block mode, no retention of state is required.
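
The difference between block mode and streaming mode can be illustrated without Hyperscan itself. The sketch below uses only the Go standard library, with a literal search (`bytes.Contains`) standing in for a compiled database. The helpers `chunk`, `scanBlocks` and `scanStream` are hypothetical names introduced for this illustration and are not part of this package. The point is that scanning blocks independently misses a match that spans a block boundary, while carrying a small amount of state between calls finds it.

```go
package main

import (
	"bytes"
	"fmt"
)

// chunk splits data into consecutive blocks of at most size bytes.
func chunk(data []byte, size int) [][]byte {
	var blocks [][]byte
	for len(data) > 0 {
		n := size
		if n > len(data) {
			n = len(data)
		}
		blocks = append(blocks, data[:n])
		data = data[n:]
	}
	return blocks
}

// scanBlocks treats every block as independent input (no state is retained),
// which is how separate block-mode scans behave.
func scanBlocks(blocks [][]byte, needle []byte) bool {
	for _, b := range blocks {
		if bytes.Contains(b, needle) {
			return true
		}
	}
	return false
}

// scanStream keeps the last len(needle)-1 bytes between calls as "stream state",
// so a match that straddles two blocks is still found.
func scanStream(blocks [][]byte, needle []byte) bool {
	var tail []byte
	for _, b := range blocks {
		window := append(append([]byte{}, tail...), b...)
		if bytes.Contains(window, needle) {
			return true
		}
		keep := len(needle) - 1
		if keep > len(window) {
			keep = len(window)
		}
		tail = window[len(window)-keep:]
	}
	return false
}

func main() {
	blocks := chunk([]byte("hello foobar!"), 8) // "hello fo", "obar!"
	needle := []byte("foobar")
	fmt.Println(scanBlocks(blocks, needle)) // false: the match spans two blocks
	fmt.Println(scanStream(blocks, needle)) // true: state carried across blocks
}
```

Hyperscan's real stream state is derived from the compiled automata and is fixed in size per database. The byte tail kept here is only an analogy for a single literal.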

Constants

const (
	// FloatNumber for matching floating point numbers.
	FloatNumber = `(?:` +
		`[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?.)`

	// IPv4Address for matching IPv4 address.
	IPv4Address = `(?:` +
		`(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?\.){3}` +
		`(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))`

	// EmailAddress for matching email address.
	EmailAddress = `(?:` +
		`^[A-Za-z0-9](([_\.\-]?[a-zA-Z0-9]+)*)@` +
		`([A-Za-z0-9]+)(([\.\-]?[a-zA-Z0-9]+)*)\.([A-Za-z]{2,})$)`

	// CreditCard for matching credit card number.
	CreditCard = `(?:` +
		`4[0-9]{12}(?:[0-9]{3})?|` +
		`5[1-5][0-9]{14}|` +
		`3[47][0-9]{13}|` +
		`3(?:0[0-5]|[68][0-9])[0-9]{11}|` +
		`6(?:011|5[0-9]{2})[0-9]{12}|` +
		`(?:2131|1800|35\d{3})\d{11})` // JCB
)

Variables

var ErrTooManyMatches = errors.New("too many matches")

ErrTooManyMatches means too many matches.

Functions

func Match

func Match(pattern string, data []byte) (bool, error)

Match reports whether the byte slice data contains any match of the regular expression pattern.

Example
package main

import (
	"fmt"

	"github.com/flier/gohs/hyperscan"
)

func main() {
	matched, err := hyperscan.Match(`foo.*`, []byte(`seafood`))
	fmt.Println(matched, err)
	matched, err = hyperscan.Match(`bar.*`, []byte(`seafood`))
	fmt.Println(matched, err)
	matched, err = hyperscan.Match(`a(b`, []byte(`seafood`))
	fmt.Println(matched, err)
}
Output:

true <nil>
false <nil>
false parse pattern, pattern `a(b`, Missing close parenthesis for group started at index 1.

func MatchReader

func MatchReader(pattern string, reader io.Reader) (bool, error)

MatchReader reports whether the text returned by the Reader contains any match of the regular expression pattern.

Example
package main

import (
	"fmt"
	"strings"

	"github.com/flier/gohs/hyperscan"
)

func main() {
	s := strings.NewReader(strings.Repeat("a", 4096) + `seafood`)
	matched, err := hyperscan.MatchReader(`foo.*`, s)
	fmt.Println(matched, err)
	matched, err = hyperscan.MatchReader(`bar.*`, s)
	fmt.Println(matched, err)
	matched, err = hyperscan.MatchReader(`a(b`, s)
	fmt.Println(matched, err)
}
Output:

true <nil>
false <nil>
false parse pattern, pattern `a(b`, Missing close parenthesis for group started at index 1.

func MatchString

func MatchString(pattern, s string) (matched bool, err error)

MatchString reports whether the string s contains any match of the regular expression pattern.

func Quote

func Quote(s string) string

Quote returns a quoted string literal representing s.

func SerializedDatabaseSize

func SerializedDatabaseSize(data []byte) (int, error)

SerializedDatabaseSize reports the size that would be required by a database if it were deserialized.

func ValidPlatform

func ValidPlatform() error

ValidPlatform tests whether the current system architecture is supported.

func Version

func Version() string

Version identifies this release version. The returned value is a string containing the version number of this release build and the date of the build.

Types

type BlockDatabase

type BlockDatabase interface {
	Database
	BlockScanner
	BlockMatcher
}

BlockDatabase scans target data that is a discrete, contiguous block, which can be scanned in one call and does not require state to be retained.

func NewBlockDatabase

func NewBlockDatabase(patterns ...*Pattern) (bdb BlockDatabase, err error)

NewBlockDatabase creates a block database based on the patterns.

func NewManagedBlockDatabase added in v1.1.1

func NewManagedBlockDatabase(patterns ...*Pattern) (BlockDatabase, error)

NewManagedBlockDatabase is a wrapper for NewBlockDatabase that sets a finalizer on the Scratch instance so that memory is freed once the object is no longer in use.

func UnmarshalBlockDatabase

func UnmarshalBlockDatabase(data []byte) (BlockDatabase, error)

UnmarshalBlockDatabase reconstructs a block database from a stream of bytes.

type BlockMatcher

type BlockMatcher interface {
	// Find returns a slice holding the text of the leftmost match in data of the regular expression.
	// A return value of nil indicates no match.
	Find(data []byte) []byte

	// FindIndex returns a two-element slice of integers defining
	// the location of the leftmost match in data of the regular expression.
	// The match itself is at data[loc[0]:loc[1]]. A return value of nil indicates no match.
	FindIndex(data []byte) []int

	// FindAll is the 'All' version of Find; it returns a slice of all successive matches of the expression,
	// as defined by the 'All' description in the package comment. A return value of nil indicates no match.
	FindAll(data []byte, n int) [][]byte

	// FindAllIndex is the 'All' version of FindIndex; it returns a slice of all successive matches of the expression,
	// as defined by the 'All' description in the package comment. A return value of nil indicates no match.
	FindAllIndex(data []byte, n int) [][]int

	// FindString returns a string holding the text of the leftmost match in s of the regular expression.
	// If there is no match, the return value is an empty string, but it will also be empty
	// if the regular expression successfully matches an empty string.
	// Use FindStringIndex if it is necessary to distinguish these cases.
	FindString(s string) string

	// FindStringIndex returns a two-element slice of integers defining
	// the location of the leftmost match in s of the regular expression.
	// The match itself is at s[loc[0]:loc[1]]. A return value of nil indicates no match.
	FindStringIndex(s string) []int

	// FindAllString is the 'All' version of FindString; it returns a slice of all successive matches of the expression,
	// as defined by the 'All' description in the package comment. A return value of nil indicates no match.
	FindAllString(s string, n int) []string

	// FindAllStringIndex is the 'All' version of FindStringIndex;
	// it returns a slice of all successive matches of the expression,
	// as defined by the 'All' description in the package comment. A return value of nil indicates no match.
	FindAllStringIndex(s string, n int) [][]int

	// Match reports whether the pattern database matches the byte slice b.
	Match(b []byte) bool

	// MatchString reports whether the pattern database matches the string s.
	MatchString(s string) bool
}

BlockMatcher implements regular expression search.

type BlockScanner

type BlockScanner interface {
	// Scan is the function call in which the actual pattern matching takes place for block-mode pattern databases.
	Scan(data []byte, scratch *Scratch, handler MatchHandler, context interface{}) error
}

BlockScanner is the block (non-streaming) regular expression scanner.

Example
package main

import (
	"fmt"

	"github.com/flier/gohs/hyperscan"
)

func main() {
	// A pattern with the `L` flag enables leftmost start of match reporting.
	p, err := hyperscan.ParsePattern(`/foo(bar)+/L`)
	if err != nil {
		fmt.Println("parse pattern failed,", err)
		return
	}

	// Create new block database with pattern
	db, err := hyperscan.NewBlockDatabase(p)
	if err != nil {
		fmt.Println("create database failed,", err)
		return
	}
	defer db.Close()

	// Create new scratch for scanning
	s, err := hyperscan.NewScratch(db)
	if err != nil {
		fmt.Println("create scratch failed,", err)
		return
	}

	defer func() {
		_ = s.Free()
	}()

	// Record matching text
	type Match struct {
		from uint64
		to   uint64
	}

	var matches []Match

	handler := hyperscan.MatchHandler(func(id uint, from, to uint64, flags uint, context interface{}) error {
		matches = append(matches, Match{from, to})
		return nil
	})

	data := []byte("hello foobarbar!")

	// Scan data block with handler
	if err := db.Scan(data, s, handler, nil); err != nil {
		fmt.Println("database scan failed,", err)
		return
	}

	// Hyperscan reports all matches
	for _, m := range matches {
		fmt.Println("match [", m.from, ":", m.to, "]", string(data[m.from:m.to]))
	}

}
Output:

match [ 6 : 12 ] foobar
match [ 6 : 15 ] foobarbar
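
The two lines of output above reflect that a match is reported for each end offset at which the expression matches, rather than a single longest match. As a rough illustration, the sketch below reproduces the same pairs with the standard library `regexp` package by testing every prefix of the input against the expression anchored at the end. The helper `allMatches` and the type `span` are hypothetical names for this example. This is not how Hyperscan works internally and it is far slower.

```go
package main

import (
	"fmt"
	"regexp"
)

// span is a half-open [from, to) byte range of a match.
type span struct{ from, to int }

// allMatches emulates "report every end offset" semantics: for each prefix
// data[:end] it looks for the leftmost match that finishes exactly at end.
func allMatches(expr string, data []byte) []span {
	re := regexp.MustCompile(`(?:` + expr + `)$`)
	var out []span
	for end := 1; end <= len(data); end++ {
		if loc := re.FindIndex(data[:end]); loc != nil {
			out = append(out, span{loc[0], loc[1]})
		}
	}
	return out
}

func main() {
	data := []byte("hello foobarbar!")
	for _, m := range allMatches(`foo(bar)+`, data) {
		fmt.Println("match [", m.from, ":", m.to, "]", string(data[m.from:m.to]))
	}
}
```

Run against the same input, this prints the `6 : 12` and `6 : 15` pairs shown in the example output.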

type Builder added in v1.1.1

type Builder interface {
	// Build the database with the given mode.
	Build(mode ModeFlag) (Database, error)

	// ForPlatform builds the database for the given mode and target platform.
	ForPlatform(mode ModeFlag, platform Platform) (Database, error)
}

Builder creates a database with the given mode and target platform.

type CompileError added in v1.2.0

type CompileError = hs.CompileError

A type containing error details that is returned by the compile calls on failure.

The caller may inspect the values returned in this type to determine the cause of failure.

type CompileFlag

type CompileFlag = hs.CompileFlag

CompileFlag represents a pattern flag.

const (
	// Caseless sets case-insensitive matching.
	Caseless CompileFlag = hs.Caseless
	// DotAll makes `.` match newlines as well.
	DotAll CompileFlag = hs.DotAll
	// MultiLine sets multi-line anchoring.
	MultiLine CompileFlag = hs.MultiLine
	// SingleMatch sets single-match only mode.
	SingleMatch CompileFlag = hs.SingleMatch
	// AllowEmpty allows expressions that can match against empty buffers.
	AllowEmpty CompileFlag = hs.AllowEmpty
	// Utf8Mode enables UTF-8 mode for this expression.
	Utf8Mode CompileFlag = hs.Utf8Mode
	// UnicodeProperty enables Unicode property support for this expression.
	UnicodeProperty CompileFlag = hs.UnicodeProperty
	// PrefilterMode enables prefiltering mode for this expression.
	PrefilterMode CompileFlag = hs.PrefilterMode
	// SomLeftMost enables leftmost start of match reporting.
	SomLeftMost CompileFlag = hs.SomLeftMost
)
const (
	// Combination enables logical combination.
	Combination CompileFlag = hs.Combination
	// Quiet suppresses match reporting.
	Quiet CompileFlag = hs.Quiet
)

func ParseCompileFlag

func ParseCompileFlag(s string) (CompileFlag, error)

ParseCompileFlag parses the compile flags of a pattern from a string.

i	Caseless 		Case-insensitive matching
s	DotAll			Dot (.) will match newlines
m	MultiLine		Multi-line anchoring
H	SingleMatch		Report match ID at most once (`o` deprecated)
V	AllowEmpty		Allow patterns that can match against empty buffers (`e` deprecated)
8	Utf8Mode		UTF-8 mode (`u` deprecated)
W	UnicodeProperty		Unicode property support (`p` deprecated)
P	PrefilterMode		Prefiltering mode (`f` deprecated)
L	SomLeftMost		Leftmost start of match reporting (`l` deprecated)
C	Combination		Logical combination of patterns (Hyperscan 5.0)
Q	Quiet			Quiet at matching (Hyperscan 5.0)
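
The table above can be read as a lookup from flag letter to flag name. A minimal sketch of that lookup, using only the standard library, is shown below. The names `flagNames` and `describeFlags` are hypothetical and exist only for this illustration. The deprecated letters are left out on purpose, and the real ParseCompileFlag returns a CompileFlag value rather than a list of names.

```go
package main

import (
	"fmt"
	"strings"
)

// flagNames mirrors the non-deprecated letters of the table above.
var flagNames = map[rune]string{
	'i': "Caseless",
	's': "DotAll",
	'm': "MultiLine",
	'H': "SingleMatch",
	'V': "AllowEmpty",
	'8': "Utf8Mode",
	'W': "UnicodeProperty",
	'P': "PrefilterMode",
	'L': "SomLeftMost",
	'C': "Combination",
	'Q': "Quiet",
}

// describeFlags translates a flag string such as "iH" into flag names,
// returning an error for any letter that is not in the table.
func describeFlags(s string) ([]string, error) {
	names := make([]string, 0, len(s))
	for _, c := range s {
		name, ok := flagNames[c]
		if !ok {
			return nil, fmt.Errorf("unknown flag %q", c)
		}
		names = append(names, name)
	}
	return names, nil
}

func main() {
	names, err := describeFlags("iH")
	fmt.Println(strings.Join(names, "|"), err)
}
```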

type CpuFeature

type CpuFeature = hs.CpuFeature //nolint: golint,stylecheck,revive

CpuFeature is the CPU feature support flags.

const (
	// AVX2 is a CPU feature flag indicating that the target platform supports AVX2 instructions.
	AVX2 CpuFeature = hs.AVX2
	// AVX512 is a CPU feature flag indicating that the target platform supports AVX512 instructions,
	// specifically AVX-512BW. Using AVX512 implies the use of AVX2.
	AVX512 CpuFeature = hs.AVX512
)

type Database

type Database interface {
	// Provides information about a database.
	Info() (DbInfo, error)

	// Provides the size of the given database in bytes.
	Size() (int, error)

	// Free a compiled pattern database.
	Close() error

	// Serialize a pattern database to a stream of bytes.
	Marshal() ([]byte, error)

	// Reconstruct a pattern database from a stream of bytes at a given memory location.
	Unmarshal([]byte) error
}

Database is an immutable database that can be used by the Hyperscan scanning API.

func Compile

func Compile(expr string) (Database, error)

Compile parses a regular expression and returns, if successful, a block-mode pattern database that can be used to match against text.

func MustCompile

func MustCompile(expr string) Database

MustCompile is like Compile but panics if the expression cannot be parsed. It simplifies safe initialization of global variables holding compiled regular expressions.

func UnmarshalDatabase

func UnmarshalDatabase(data []byte) (Database, error)

UnmarshalDatabase reconstructs a pattern database from a stream of bytes.

type DatabaseBuilder

type DatabaseBuilder struct {
	// Array of patterns to compile.
	Patterns

	// Compiler mode flags that affect the database as a whole. (Default: block mode)
	Mode ModeFlag

	// If not nil, the platform structure is used to determine the target platform for the database.
	// If nil, a database suitable for running on the current host platform is produced.
	Platform Platform
}

DatabaseBuilder creates a database that will be used to match the patterns.

func (*DatabaseBuilder) AddExpressionWithFlags

func (b *DatabaseBuilder) AddExpressionWithFlags(expr string, flags CompileFlag) *DatabaseBuilder

AddExpressionWithFlags adds an expression with flags to the database.

func (*DatabaseBuilder) AddExpressions

func (b *DatabaseBuilder) AddExpressions(exprs ...string) *DatabaseBuilder

AddExpressions adds more expressions to the database.

func (*DatabaseBuilder) Build

func (b *DatabaseBuilder) Build() (Database, error)

Build builds a database based on the expressions and platform.

type DbInfo

type DbInfo string //nolint: stylecheck

DbInfo identifies the version and platform information for the supplied database.

func SerializedDatabaseInfo

func SerializedDatabaseInfo(data []byte) (DbInfo, error)

SerializedDatabaseInfo provides information about a serialized database.

func (DbInfo) Mode

func (i DbInfo) Mode() (ModeFlag, error)

Mode is the scanning mode for the supplied database.

func (DbInfo) Parse added in v1.2.0

func (i DbInfo) Parse() (version, features, mode string, err error)

Parse the version and platform information.

func (DbInfo) String

func (i DbInfo) String() string

func (DbInfo) Version

func (i DbInfo) Version() (string, error)

Version is the version for the supplied database.

type Error added in v1.2.0

type Error = hs.Error

Error is the type for errors returned by Hyperscan functions.

const (
	// ErrSuccess is the error returned if the engine completed normally.
	ErrSuccess Error = hs.ErrSuccess
	// ErrInvalid is the error returned if a parameter passed to this function was invalid.
	ErrInvalid Error = hs.ErrInvalid
	// ErrNoMemory is the error returned if a memory allocation failed.
	ErrNoMemory Error = hs.ErrNoMemory
	// ErrScanTerminated is the error returned if the engine was terminated by callback.
	ErrScanTerminated Error = hs.ErrScanTerminated
	// ErrCompileError is the error returned if the pattern compiler failed.
	ErrCompileError Error = hs.ErrCompileError
	// ErrDatabaseVersionError is the error returned if the given database was built for a different version of Hyperscan.
	ErrDatabaseVersionError Error = hs.ErrDatabaseVersionError
	// ErrDatabasePlatformError is the error returned if the given database was built for a different platform.
	ErrDatabasePlatformError Error = hs.ErrDatabasePlatformError
	// ErrDatabaseModeError is the error returned if the given database was built for a different mode of operation.
	ErrDatabaseModeError Error = hs.ErrDatabaseModeError
	// ErrBadAlign is the error returned if a parameter passed to this function was not correctly aligned.
	ErrBadAlign Error = hs.ErrBadAlign
	// ErrBadAlloc is the error returned if the memory allocator did not correctly return memory suitably aligned.
	ErrBadAlloc Error = hs.ErrBadAlloc
	// ErrScratchInUse is the error returned if the scratch region was already in use.
	ErrScratchInUse Error = hs.ErrScratchInUse
	// ErrArchError is the error returned if the CPU architecture is unsupported.
	ErrArchError Error = hs.ErrArchError
	// ErrInsufficientSpace is the error returned if the provided buffer was too small.
	ErrInsufficientSpace Error = hs.ErrInsufficientSpace
)

type ExprExt

type ExprExt hs.ExprExt

ExprExt is a structure containing additional parameters related to an expression.

func NewExprExt added in v1.2.0

func NewExprExt(exts ...Ext) (ext *ExprExt)

func ParseExprExt added in v1.1.0

func ParseExprExt(s string) (ext *ExprExt, err error)

ParseExprExt parses the additional parameters of an expression from a string.

func (*ExprExt) String added in v1.1.0

func (ext *ExprExt) String() string

func (*ExprExt) With added in v1.1.0

func (ext *ExprExt) With(exts ...Ext) *ExprExt

With specifies the additional parameters related to an expression.

type ExprInfo

type ExprInfo = hs.ExprInfo

ExprInfo contains information related to an expression.

type Ext added in v1.1.0

type Ext func(ext *ExprExt)

Ext is an option that sets additional parameters related to an expression.

func EditDistance

func EditDistance(n uint32) Ext

EditDistance allows patterns to approximately match within this edit distance.

func HammingDistance

func HammingDistance(n uint32) Ext

HammingDistance allows patterns to approximately match within this Hamming distance.

func MaxOffset

func MaxOffset(n uint64) Ext

MaxOffset sets the maximum end offset in the data stream at which this expression should match successfully.

func MinLength

func MinLength(n uint64) Ext

MinLength sets the minimum match length (from start to end) required to successfully match this expression.

func MinOffset

func MinOffset(n uint64) Ext

MinOffset sets the minimum end offset in the data stream at which this expression should match successfully.

type ExtFlag

type ExtFlag = hs.ExtFlag

ExtFlag values are used in ExprExt.Flags to indicate which fields are used.

const (
	// ExtMinOffset is a flag indicating that the ExprExt.MinOffset field is used.
	ExtMinOffset ExtFlag = hs.ExtMinOffset
	// ExtMaxOffset is a flag indicating that the ExprExt.MaxOffset field is used.
	ExtMaxOffset ExtFlag = hs.ExtMaxOffset
	// ExtMinLength is a flag indicating that the ExprExt.MinLength field is used.
	ExtMinLength ExtFlag = hs.ExtMinLength
	// ExtEditDistance is a flag indicating that the ExprExt.EditDistance field is used.
	ExtEditDistance ExtFlag = hs.ExtEditDistance
	// ExtHammingDistance is a flag indicating that the ExprExt.HammingDistance field is used.
	ExtHammingDistance ExtFlag = hs.ExtHammingDistance
)

type HsError

type HsError = Error

HsError is the type for errors returned by Hyperscan functions.

type MatchContext

type MatchContext interface {
	Database() Database

	Scratch() Scratch

	UserData() interface{}
}

MatchContext represents a match context.

type MatchEvent

type MatchEvent interface {
	Id() uint

	From() uint64

	To() uint64

	Flags() ScanFlag
}

MatchEvent indicates a match event.

type MatchHandler

type MatchHandler = hs.MatchEventHandler

MatchHandler handles match events.

type ModeFlag

type ModeFlag = hs.ModeFlag

ModeFlag represents the compile mode flags.

const (
	// BlockMode for the block scan (non-streaming) database.
	BlockMode ModeFlag = hs.BlockMode
	// NoStreamMode is an alias for BlockMode.
	NoStreamMode ModeFlag = hs.NoStreamMode
	// StreamMode for the streaming database.
	StreamMode ModeFlag = hs.StreamMode
	// VectoredMode for the vectored scanning database.
	VectoredMode ModeFlag = hs.VectoredMode
	// SomHorizonLargeMode uses full precision to track start of match offsets in stream state.
	SomHorizonLargeMode ModeFlag = hs.SomHorizonLargeMode
	// SomHorizonMediumMode uses medium precision to track start of match offsets in stream state (within 2^32 bytes).
	SomHorizonMediumMode ModeFlag = hs.SomHorizonMediumMode
	// SomHorizonSmallMode uses limited precision to track start of match offsets in stream state (within 2^16 bytes).
	SomHorizonSmallMode ModeFlag = hs.SomHorizonSmallMode
)

func ParseModeFlag

func ParseModeFlag(s string) (ModeFlag, error)

ParseModeFlag parses a database mode from a string.

type Pattern

type Pattern struct {
	Expression string      // The expression to parse.
	Flags      CompileFlag // Flags which modify the behaviour of the expression.
	// The ID number to be associated with the corresponding pattern
	Id int //nolint: revive,stylecheck
	// contains filtered or unexported fields
}

Pattern is a matching pattern.

Example

This example demonstrates constructing and matching a pattern.

package main

import (
	"fmt"

	"github.com/flier/gohs/hyperscan"
)

func main() {
	p := hyperscan.NewPattern(`foo.*bar`, hyperscan.Caseless)
	fmt.Println(p)

	db, err := hyperscan.NewBlockDatabase(p)
	fmt.Println(err)

	found := db.MatchString("fooxyzbarbar")
	fmt.Println(found)

}
Output:

/foo.*bar/i
<nil>
true

func NewPattern

func NewPattern(expr string, flags CompileFlag, exts ...Ext) *Pattern

NewPattern returns a new pattern based on the expression and compile flags.

func ParsePattern

func ParsePattern(s string) (*Pattern, error)

ParsePattern parses a pattern from a formatted string.

<integer id>:/<expression>/<flags>

For example, the following pattern will match `test` in caseless and multi-line mode

/test/im
Example

This example demonstrates parsing a pattern with an ID and flags.

package main

import (
	"fmt"

	"github.com/flier/gohs/hyperscan"
)

func main() {
	p, err := hyperscan.ParsePattern("3:/foobar/i8")

	fmt.Println(err)
	fmt.Println(p.Id)
	fmt.Println(p.Expression)
	fmt.Println(p.Flags)

}
Output:

<nil>
3
foobar
8i
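
To make the `<integer id>:/<expression>/<flags>` layout concrete, the sketch below splits such a string into its three parts with the standard library only. The function `splitPattern` is a hypothetical helper, not the parser this package uses. It assumes the ID is optional (as in `/test/im`) and that the flags are whatever follows the last slash, and it does not validate the flags or handle any other syntax that ParsePattern may accept.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// splitPattern splits "<id>:/<expression>/<flags>" into its parts.
// The "<id>:" prefix is treated as optional; id is 0 when it is absent.
func splitPattern(s string) (id int, expr, flags string, err error) {
	if !strings.HasPrefix(s, "/") {
		i := strings.IndexByte(s, ':')
		if i < 0 {
			return 0, "", "", fmt.Errorf("missing id separator in %q", s)
		}
		if id, err = strconv.Atoi(s[:i]); err != nil {
			return 0, "", "", fmt.Errorf("invalid id %q: %w", s[:i], err)
		}
		s = s[i+1:]
	}
	// The expression sits between the first slash and the last slash.
	last := strings.LastIndex(s, "/")
	if !strings.HasPrefix(s, "/") || last == 0 {
		return 0, "", "", fmt.Errorf("expression must be wrapped in slashes: %q", s)
	}
	return id, s[1:last], s[last+1:], nil
}

func main() {
	id, expr, flags, err := splitPattern("3:/foobar/i8")
	fmt.Println(id, expr, flags, err)
}
```

Taking the last slash as the end of the expression lets the expression itself contain slashes, which is a design choice of this sketch rather than a documented guarantee of the package.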

func (*Pattern) Build added in v1.1.1

func (p *Pattern) Build(mode ModeFlag) (Database, error)

Build the database with the given mode.

func (*Pattern) Ext added in v1.1.0

func (p *Pattern) Ext() (*ExprExt, error)

Ext provides additional parameters related to an expression.

func (*Pattern) ForPlatform added in v1.1.1

func (p *Pattern) ForPlatform(mode ModeFlag, platform Platform) (Database, error)

ForPlatform builds the database for the given mode and target platform.

func (*Pattern) Info

func (p *Pattern) Info() (*ExprInfo, error)

Info provides information about a regular expression.

func (*Pattern) IsValid

func (p *Pattern) IsValid() bool

IsValid validates that the pattern contains a regular expression.

func (*Pattern) Pattern added in v1.2.0

func (p *Pattern) Pattern() *hs.Pattern

func (*Pattern) Patterns added in v1.2.0

func (p *Pattern) Patterns() []*hs.Pattern

func (*Pattern) String

func (p *Pattern) String() string

func (*Pattern) WithExt added in v1.1.0

func (p *Pattern) WithExt(exts ...Ext) *Pattern

WithExt is used to set the additional parameters related to an expression.

type Patterns added in v1.1.0

type Patterns []*Pattern

Patterns is a set of matching patterns.

func ParsePatterns added in v1.1.0

func ParsePatterns(r io.Reader) (patterns Patterns, err error)

ParsePatterns parses lines as `Patterns`.

Example

This example demonstrates parsing patterns with comments.

package main

import (
	"fmt"
	"strings"

	"github.com/flier/gohs/hyperscan"
)

func main() {
	patterns, err := hyperscan.ParsePatterns(strings.NewReader(`
# empty line and comment will be skipped

1:/hatstand.*teakettle/s
2:/(hatstand|teakettle)/iH
3:/^.{10,20}hatstand/m
`))

	fmt.Println(err)

	for _, p := range patterns {
		fmt.Println(p)
	}

}
Output:

<nil>
1:/hatstand.*teakettle/s
2:/(hatstand|teakettle)/Hi
3:/^.{10,20}hatstand/m

func (Patterns) Build added in v1.1.1

func (p Patterns) Build(mode ModeFlag) (Database, error)

Build the database with the given mode.

func (Patterns) ForPlatform added in v1.1.1

func (p Patterns) ForPlatform(mode ModeFlag, platform Platform) (Database, error)

ForPlatform builds the database for the given mode and target platform.

func (Patterns) Patterns added in v1.2.0

func (p Patterns) Patterns() (r []*hs.Pattern)

type Platform

type Platform interface {
	// Information about the target platform which may be used to guide the optimisation process of the compile.
	Tune() TuneFlag

	// Relevant CPU features available on the target platform
	CpuFeatures() CpuFeature
}

Platform is a type containing information on the target platform.

func NewPlatform

func NewPlatform(tune TuneFlag, cpu CpuFeature) Platform

NewPlatform creates new platform information for the target platform.

func PopulatePlatform

func PopulatePlatform() Platform

PopulatePlatform populates the platform information based on the current host.

type ScanFlag

type ScanFlag = hs.ScanFlag

type Scratch

type Scratch struct {
	// contains filtered or unexported fields
}

Scratch is a Hyperscan scratch space.

func NewManagedScratch added in v1.1.1

func NewManagedScratch(db Database) (*Scratch, error)

NewManagedScratch is a wrapper for NewScratch that sets a finalizer on the Scratch instance so that memory is freed once the object is no longer in use.

func NewScratch

func NewScratch(db Database) (*Scratch, error)

NewScratch allocates a "scratch" space for use by Hyperscan. This is required for runtime use, and one scratch space per thread, or concurrent caller, is required.

func (*Scratch) Clone

func (s *Scratch) Clone() (*Scratch, error)

Clone allocates a scratch space that is a clone of an existing scratch space.

func (*Scratch) Free

func (s *Scratch) Free() error

Free frees a previously allocated scratch block.

func (*Scratch) Realloc

func (s *Scratch) Realloc(db Database) error

Realloc reallocates the scratch for another database.

func (*Scratch) Size

func (s *Scratch) Size() (int, error)

Size provides the size of the given scratch space.

type Stream

type Stream interface {
	Scan(data []byte) error

	Close() error

	Reset() error

	Clone() (Stream, error)
}

Streams exist in the Hyperscan library so that pattern matching state can be maintained across multiple blocks of target data.

type StreamCompressor

type StreamCompressor interface {
	// Creates a compressed representation of the provided stream in the buffer provided.
	Compress(stream Stream) ([]byte, error)

	// Decompresses a compressed representation created by `Compress` into a new stream.
	Expand(buf []byte, flags ScanFlag, scratch *Scratch, handler MatchHandler, context interface{}) (Stream, error)

	// Decompresses a compressed representation created by `Compress` on top of the given stream.
	ResetAndExpand(stream Stream, buf []byte, flags ScanFlag, scratch *Scratch,
		handler MatchHandler, context interface{}) (Stream, error)
}

StreamCompressor implements stream compressor.

type StreamDatabase

type StreamDatabase interface {
	Database
	StreamScanner
	StreamMatcher
	StreamCompressor

	StreamSize() (int, error)
}

StreamDatabase scans target data that is a continuous stream, not all of which is available at once; blocks of data are scanned in sequence and matches may span multiple blocks in a stream.

func NewLargeStreamDatabase

func NewLargeStreamDatabase(patterns ...*Pattern) (sdb StreamDatabase, err error)

NewLargeStreamDatabase creates a large-sized stream database based on the patterns.

func NewManagedStreamDatabase added in v1.1.1

func NewManagedStreamDatabase(patterns ...*Pattern) (StreamDatabase, error)

NewManagedStreamDatabase is a wrapper for NewStreamDatabase that sets a finalizer on the Scratch instance so that memory is freed once the object is no longer in use.

func NewMediumStreamDatabase

func NewMediumStreamDatabase(patterns ...*Pattern) (sdb StreamDatabase, err error)

NewMediumStreamDatabase creates a medium-sized stream database based on the patterns.

func NewStreamDatabase

func NewStreamDatabase(patterns ...*Pattern) (sdb StreamDatabase, err error)

NewStreamDatabase creates a stream database based on the patterns.

func UnmarshalStreamDatabase

func UnmarshalStreamDatabase(data []byte) (StreamDatabase, error)

UnmarshalStreamDatabase reconstructs a stream database from a stream of bytes.

type StreamMatcher

type StreamMatcher interface {
	// Find returns a slice holding the text of the leftmost match of the pattern database
	// in the data read from reader. A return value of nil indicates no match.
	Find(reader io.ReadSeeker) []byte

	// FindIndex returns a two-element slice of integers defining
	// the location of the leftmost match in the data read from reader.
	// The match itself spans offsets loc[0] to loc[1] of that data.
	// A return value of nil indicates no match.
	FindIndex(reader io.Reader) []int

	// FindAll is the 'All' version of Find; it returns a slice of all successive matches of the expression,
	// as defined by the 'All' description in the package comment. A return value of nil indicates no match.
	FindAll(reader io.ReadSeeker, n int) [][]byte

	// FindAllIndex is the 'All' version of FindIndex; it returns a slice of all successive matches of the expression,
	// as defined by the 'All' description in the package comment. A return value of nil indicates no match.
	FindAllIndex(reader io.Reader, n int) [][]int

	// Match reports whether the pattern database matches the data read from reader.
	Match(reader io.Reader) bool
}

StreamMatcher implements regular expression search.

type StreamScanner

type StreamScanner interface {
	Open(flags ScanFlag, scratch *Scratch, handler MatchHandler, context interface{}) (Stream, error)

	Scan(reader io.Reader, scratch *Scratch, handler MatchHandler, context interface{}) error
}

StreamScanner is the streaming regular expression scanner.

Example
package main

import (
	"fmt"

	"github.com/flier/gohs/hyperscan"
)

func main() { //nolint:funlen
	// The `L` flag enables leftmost start-of-match reporting.
	p, err := hyperscan.ParsePattern(`/foo(bar)+/L`)
	if err != nil {
		fmt.Println("parse pattern failed,", err)
		return
	}

	// Create new stream database with pattern
	db, err := hyperscan.NewStreamDatabase(p)
	if err != nil {
		fmt.Println("create database failed,", err)
		return
	}
	defer db.Close()

	// Create new scratch for scanning
	s, err := hyperscan.NewScratch(db)
	if err != nil {
		fmt.Println("create scratch failed,", err)
		return
	}

	defer func() {
		_ = s.Free()
	}()

	// Record matching text
	type Match struct {
		from uint64
		to   uint64
	}

	var matches []Match

	handler := hyperscan.MatchHandler(func(id uint, from, to uint64, flags uint, context interface{}) error {
		matches = append(matches, Match{from, to})
		return nil
	})

	data := []byte("hello foobarbar!")

	// Open stream with handler
	st, err := db.Open(0, s, handler, nil)
	if err != nil {
		fmt.Println("open streaming database failed,", err)
		return
	}

	// Scan data with stream
	for i := 0; i < len(data); i += 4 {
		start := i
		end := i + 4

		if end > len(data) {
			end = len(data)
		}

		if err = st.Scan(data[start:end]); err != nil {
			fmt.Println("streaming scan failed,", err)
			return
		}
	}

	// Close stream
	if err = st.Close(); err != nil {
		fmt.Println("close stream failed,", err)
		return
	}

	// Hyperscan reports all matches
	for _, m := range matches {
		fmt.Println("match [", m.from, ":", m.to, "]", string(data[m.from:m.to]))
	}

}
Output:

match [ 6 : 12 ] foobar
match [ 6 : 15 ] foobarbar
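
Because every match end is reported, the handler in the example sees both `foobar` and `foobarbar` for the same start offset. If only the longest match per start offset is wanted, the collected matches can be filtered afterwards. A small sketch, assuming a `Match` type with exported fields and a hypothetical `longestPerStart` helper, neither of which is part of the package:

```go
package main

import "fmt"

// Match mirrors the struct used in the example, with exported fields.
type Match struct {
	From uint64
	To   uint64
}

// longestPerStart is a hypothetical helper that keeps, for each start
// offset, only the match with the greatest end offset. The order of
// first appearance of each start offset is preserved.
func longestPerStart(ms []Match) []Match {
	best := make(map[uint64]uint64) // start offset -> greatest end offset
	var order []uint64              // start offsets in first-seen order

	for _, m := range ms {
		to, seen := best[m.From]
		if !seen {
			order = append(order, m.From)
		}
		if !seen || m.To > to {
			best[m.From] = m.To
		}
	}

	out := make([]Match, 0, len(order))
	for _, from := range order {
		out = append(out, Match{From: from, To: best[from]})
	}
	return out
}

func main() {
	data := []byte("hello foobarbar!")
	matches := []Match{{6, 12}, {6, 15}} // as reported in the example output

	for _, m := range longestPerStart(matches) {
		fmt.Println("match [", m.From, ":", m.To, "]", string(data[m.From:m.To]))
	}
}
```

Filtering after the scan keeps the match handler itself trivial; the handler only appends, and the policy for overlapping matches lives in ordinary Go code.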

type TuneFlag

type TuneFlag = hs.TuneFlag
const (
	// Generic indicates that the compiled database should not be tuned for any particular target platform.
	Generic TuneFlag = hs.Generic
	// SandyBridge indicates that the compiled database should be tuned for the Sandy Bridge microarchitecture.
	SandyBridge TuneFlag = hs.SandyBridge
	// IvyBridge indicates that the compiled database should be tuned for the Ivy Bridge microarchitecture.
	IvyBridge TuneFlag = hs.IvyBridge
	// Haswell indicates that the compiled database should be tuned for the Haswell microarchitecture.
	Haswell TuneFlag = hs.Haswell
	// Silvermont indicates that the compiled database should be tuned for the Silvermont microarchitecture.
	Silvermont TuneFlag = hs.Silvermont
	// Broadwell indicates that the compiled database should be tuned for the Broadwell microarchitecture.
	Broadwell TuneFlag = hs.Broadwell
	// Skylake indicates that the compiled database should be tuned for the Skylake microarchitecture.
	Skylake TuneFlag = hs.Skylake
	// SkylakeServer indicates that the compiled database should be tuned for the Skylake Server microarchitecture.
	SkylakeServer TuneFlag = hs.SkylakeServer
	// Goldmont indicates that the compiled database should be tuned for the Goldmont microarchitecture.
	Goldmont TuneFlag = hs.Goldmont
)

type VectoredDatabase

type VectoredDatabase interface {
	Database
	VectoredScanner
	VectoredMatcher
}

VectoredDatabase scans target data that consists of a list of non-contiguous blocks that are all available at once.
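In the VectoredScanner example in this document, the reported offsets (6:12 and 6:15) index into the blocks as if they were one concatenated buffer. When the blocks really are separate buffers, an offset has to be mapped back to a block and a position inside it. A minimal sketch, assuming a hypothetical `locate` helper that is not part of the package:

```go
package main

import "fmt"

// locate is a hypothetical helper. It maps an offset into the logical
// concatenation of blocks to the index of the block that holds that byte
// and the position inside the block. ok is false if off lies beyond the data.
func locate(blocks [][]byte, off uint64) (block, inner int, ok bool) {
	for i, b := range blocks {
		if off < uint64(len(b)) {
			return i, int(off), true
		}
		off -= uint64(len(b))
	}
	return 0, 0, false
}

func main() {
	data := []byte("hello foobarbar!")
	blocks := [][]byte{data[:8], data[8:12], data[12:]}

	// A match reported as [6, 15) starts in block 0 and its last byte
	// (offset 14, since the end offset is exclusive) is in block 2.
	startBlock, startPos, _ := locate(blocks, 6)
	endBlock, endPos, _ := locate(blocks, 14)
	fmt.Println(startBlock, startPos, endBlock, endPos) // prints 0 6 2 2
}
```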

func NewVectoredDatabase

func NewVectoredDatabase(patterns ...*Pattern) (vdb VectoredDatabase, err error)

NewVectoredDatabase creates a vectored database based on the patterns.

func UnmarshalVectoredDatabase

func UnmarshalVectoredDatabase(data []byte) (VectoredDatabase, error)

UnmarshalVectoredDatabase reconstructs a vectored database from a stream of bytes.

type VectoredMatcher

type VectoredMatcher interface{}

VectoredMatcher implements regular expression search.

type VectoredScanner

type VectoredScanner interface {
	Scan(data [][]byte, scratch *Scratch, handler MatchHandler, context interface{}) error
}

VectoredScanner is the vectored regular expression scanner.

Example
package main

import (
	"fmt"

	"github.com/flier/gohs/hyperscan"
)

func main() {
	// The `L` flag enables leftmost start-of-match reporting.
	p, err := hyperscan.ParsePattern(`/foo(bar)+/L`)
	if err != nil {
		fmt.Println("parse pattern failed,", err)
		return
	}

	// Create new vectored database with pattern
	db, err := hyperscan.NewVectoredDatabase(p)
	if err != nil {
		fmt.Println("create database failed,", err)
		return
	}
	defer db.Close()

	// Create new scratch for scanning
	s, err := hyperscan.NewScratch(db)
	if err != nil {
		fmt.Println("create scratch failed,", err)
		return
	}

	defer func() {
		_ = s.Free()
	}()

	// Record matching text
	type Match struct {
		from uint64
		to   uint64
	}

	var matches []Match

	handler := hyperscan.MatchHandler(func(id uint, from, to uint64, flags uint, context interface{}) error {
		matches = append(matches, Match{from, to})
		return nil
	})

	data := []byte("hello foobarbar!")

	// Scan vectored data with handler
	if err := db.Scan([][]byte{data[:8], data[8:12], data[12:]}, s, handler, nil); err != nil {
		fmt.Println("database scan failed,", err)
		return
	}

	// Hyperscan reports all matches
	for _, m := range matches {
		fmt.Println("match [", m.from, ":", m.to, "]", string(data[m.from:m.to]))
	}

}
Output:

match [ 6 : 12 ] foobar
match [ 6 : 15 ] foobarbar
