pcre

v1.0.1 Latest
Published: Sep 11, 2023 License: BSD-3-Clause Imports: 6 Imported by: 136

README

go-pcre

This package provides Perl-Compatible Regular Expression support in Go using libpcre or libpcre++.

This package originates from glenn-brown pcre. Unlike the other forks, this one offers JIT compilation, which makes it much faster. You can check out the benchmarks game to see the difference.

Go library                           Time (lower is better)
This one                             3.85 seconds
mdellandrea (also glenn-brown fork)  14.48 seconds
standard one                         27.87 seconds

As you can see, this library is almost an order of magnitude faster than the standard one and 3-4 times faster than pcre without JIT compilation.

Interface / API

The API this library provides is a plain copy of the C library API, which may look really ugly to Go programmers. At least it looks ugly to me: I don't think setting binary flags to use a regexp is very convenient.

I want to refactor this library and make a v2 with an API closer to the standard library's. If you are interested in such a library, hit the star button. The more stars I see, the closer I am to implementing this idea.

Documentation

Use godoc.

Installation

  1. install libpcre3-dev or libpcre++-dev

  2. go get

sudo apt-get install libpcre3-dev
go get github.com/GRbit/go-pcre

Usage

Go programs that depend on this package should import this package as follows to allow automatic downloading:

import (
  "github.com/GRbit/go-pcre"
)

Building your software

Since this package uses cgo, your program will be dynamically linked by default. If you plan to run it on systems without the libpcre dependency, you should link it statically. You can build your software with the following options:

go build -ldflags="-extldflags=-static"

More details on this here

Performance

A brief performance comparison with other Go libraries is at the beginning of the README, but if you are curious which regex library is the fastest, here is an exhaustive study of the question: https://zherczeg.github.io/sljit/regex_perf.html

The answer is: it depends. But in most cases, it's RE2 or PCRE-JIT. RE2 tends to utilize multi-core systems better, while PCRE-JIT is better at using one CPU core for almost all use cases.

LICENSE

This is a fork of hobeone pcre, which is a fork of mathpl pcre, which is a fork of glenn-brown pcre. The original package hasn't been updated for several years, but it is still used in some software despite lacking JIT compilation, which gives a huge speed-up to regexps. If you can somehow reach the original project owner, please inform him about this situation. Maybe he would like to transfer control of the repository to a maintainer who has time to review pull requests.

Documentation

Constants

const (
	ANCHORED        = C.PCRE_ANCHORED
	BSR_ANYCRLF     = C.PCRE_BSR_ANYCRLF
	BSR_UNICODE     = C.PCRE_BSR_UNICODE
	NEWLINE_ANY     = C.PCRE_NEWLINE_ANY
	NEWLINE_ANYCRLF = C.PCRE_NEWLINE_ANYCRLF
	NEWLINE_CR      = C.PCRE_NEWLINE_CR
	NEWLINE_CRLF    = C.PCRE_NEWLINE_CRLF
	NEWLINE_LF      = C.PCRE_NEWLINE_LF
	NO_UTF8_CHECK   = C.PCRE_NO_UTF8_CHECK
)

Flags for Compile and Match functions.

const (
	CASELESS          = C.PCRE_CASELESS
	DOLLAR_ENDONLY    = C.PCRE_DOLLAR_ENDONLY
	DOTALL            = C.PCRE_DOTALL
	DUPNAMES          = C.PCRE_DUPNAMES
	EXTENDED          = C.PCRE_EXTENDED
	EXTRA             = C.PCRE_EXTRA
	FIRSTLINE         = C.PCRE_FIRSTLINE
	JAVASCRIPT_COMPAT = C.PCRE_JAVASCRIPT_COMPAT
	MULTILINE         = C.PCRE_MULTILINE
	NO_AUTO_CAPTURE   = C.PCRE_NO_AUTO_CAPTURE
	UNGREEDY          = C.PCRE_UNGREEDY
	UTF8              = C.PCRE_UTF8
	UCP               = C.PCRE_UCP
)

Flags for Compile functions
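
The flag constants above are bit values meant to be combined with bitwise OR and passed as the flags argument. The following is a minimal sketch of that idea only: the constants here are stand-ins with made-up values (the real ones come from the C header through cgo), and hasFlag is a hypothetical helper, not part of this package.

```go
package main

import "fmt"

// Stand-in flag values, for illustration only. In the real package the
// values are taken from the C header (pcre.CASELESS, pcre.MULTILINE, ...).
const (
	caseless  = 1 << iota // stands in for pcre.CASELESS
	multiline             // stands in for pcre.MULTILINE
	dotall                // stands in for pcre.DOTALL
)

// hasFlag reports whether every bit of flag is set in flags.
func hasFlag(flags, flag int) bool {
	return flags&flag == flag
}

func main() {
	// Combine several options into one int with bitwise OR.
	flags := caseless | dotall

	fmt.Println(hasFlag(flags, caseless))  // true
	fmt.Println(hasFlag(flags, multiline)) // false
	fmt.Println(hasFlag(flags, dotall))    // true
}
```

With the real package, the same OR-ed value would be what you hand to a function such as Compile as its flags parameter.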

const (
	NOTBOL            = C.PCRE_NOTBOL
	NOTEOL            = C.PCRE_NOTEOL
	NOTEMPTY          = C.PCRE_NOTEMPTY
	NOTEMPTY_ATSTART  = C.PCRE_NOTEMPTY_ATSTART
	NO_START_OPTIMIZE = C.PCRE_NO_START_OPTIMIZE
	PARTIAL_HARD      = C.PCRE_PARTIAL_HARD
	PARTIAL_SOFT      = C.PCRE_PARTIAL_SOFT
)

Flags for Match functions

const (
	STUDY_JIT_COMPILE              = C.PCRE_STUDY_JIT_COMPILE
	STUDY_JIT_PARTIAL_SOFT_COMPILE = C.PCRE_STUDY_JIT_PARTIAL_SOFT_COMPILE
	STUDY_JIT_PARTIAL_HARD_COMPILE = C.PCRE_STUDY_JIT_PARTIAL_HARD_COMPILE
)

Flags for Study function

const (
	CONFIG_JIT                    = C.PCRE_CONFIG_JIT
	CONFIG_JITTARGET              = C.PCRE_CONFIG_JITTARGET
	CONFIG_LINK_SIZE              = C.PCRE_CONFIG_LINK_SIZE
	CONFIG_MATCH_LIMIT            = C.PCRE_CONFIG_MATCH_LIMIT
	CONFIG_MATCH_LIMIT_RECURSION  = C.PCRE_CONFIG_MATCH_LIMIT_RECURSION
	CONFIG_NEWLINE                = C.PCRE_CONFIG_NEWLINE
	CONFIG_BSR                    = C.PCRE_CONFIG_BSR
	CONFIG_POSIX_MALLOC_THRESHOLD = C.PCRE_CONFIG_POSIX_MALLOC_THRESHOLD
	CONFIG_STACKRECURSE           = C.PCRE_CONFIG_STACKRECURSE
	CONFIG_UTF16                  = C.PCRE_CONFIG_UTF16
	CONFIG_UTF32                  = C.PCRE_CONFIG_UTF32
	CONFIG_UTF8                   = C.PCRE_CONFIG_UTF8
	CONFIG_UNICODE_PROPERTIES     = C.PCRE_CONFIG_UNICODE_PROPERTIES
)

Flags for Config() function

const (
	ERROR_NOMATCH        = C.PCRE_ERROR_NOMATCH
	ERROR_NULL           = C.PCRE_ERROR_NULL
	ERROR_BADOPTION      = C.PCRE_ERROR_BADOPTION
	ERROR_BADMAGIC       = C.PCRE_ERROR_BADMAGIC
	ERROR_UNKNOWN_OPCODE = C.PCRE_ERROR_UNKNOWN_OPCODE
	ERROR_UNKNOWN_NODE   = C.PCRE_ERROR_UNKNOWN_NODE
	ERROR_NOMEMORY       = C.PCRE_ERROR_NOMEMORY
	ERROR_NOSUBSTRING    = C.PCRE_ERROR_NOSUBSTRING
	ERROR_MATCHLIMIT     = C.PCRE_ERROR_MATCHLIMIT
	ERROR_CALLOUT        = C.PCRE_ERROR_CALLOUT
	ERROR_BADUTF8        = C.PCRE_ERROR_BADUTF8
	ERROR_BADUTF8_OFFSET = C.PCRE_ERROR_BADUTF8_OFFSET
	ERROR_PARTIAL        = C.PCRE_ERROR_PARTIAL
	ERROR_BADPARTIAL     = C.PCRE_ERROR_BADPARTIAL
	ERROR_RECURSIONLIMIT = C.PCRE_ERROR_RECURSIONLIMIT
	ERROR_INTERNAL       = C.PCRE_ERROR_INTERNAL
	ERROR_BADCOUNT       = C.PCRE_ERROR_BADCOUNT
	ERROR_JIT_STACKLIMIT = C.PCRE_ERROR_JIT_STACKLIMIT
)

Exec-time and get/set-time error codes

Variables

This section is empty.

Functions

func Config

func Config(f int) string

Config returns information about the libpcre configuration. It passes the flag f to C.pcre_config() and converts the returned value to a string. See http://www.pcre.org/original/doc/html/pcre_config.html

func ConfigAll

func ConfigAll() string

ConfigAll returns a string containing all the information accessible through the pcre_config() function.

func ParseFlags

func ParseFlags(ptr string) (string, int)

ParseFlags returns the regex pattern as a string and the PCRE flags as an int. Flags are specified before the regex, in the form "(?flags)regex". Supported symbols: i=CASELESS; m=MULTILINE; s=DOTALL; U=UNGREEDY; J=DUPNAMES; x=EXTENDED; X=EXTRA; D=DOLLAR_ENDONLY; u=UTF8|UCP.
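
To make the "(?flags)regex" form concrete, here is a rough sketch of how such a prefix could be split into a pattern and a flag mask. It is not the package's implementation: splitFlags is a hypothetical helper, the bit values are stand-ins, and the choice to return the input unchanged on an unknown symbol is an assumption made for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// Stand-in bit values, for illustration only; the package takes the
// real values from the C header through cgo.
const (
	caseless = 1 << iota
	multiline
	dotall
	ungreedy
	dupnames
	extended
	extra
	dollarEndonly
	utf8
	ucp
)

// flagBits maps each documented symbol to its stand-in option bits.
var flagBits = map[rune]int{
	'i': caseless,
	'm': multiline,
	's': dotall,
	'U': ungreedy,
	'J': dupnames,
	'x': extended,
	'X': extra,
	'D': dollarEndonly,
	'u': utf8 | ucp,
}

// splitFlags strips a leading "(?flags)" group and returns the remaining
// pattern plus the OR of the flag bits. If the prefix is missing or holds
// an unknown symbol, the input is returned unchanged with no flags.
func splitFlags(expr string) (string, int) {
	if !strings.HasPrefix(expr, "(?") {
		return expr, 0
	}
	end := strings.IndexByte(expr, ')')
	if end < 0 {
		return expr, 0
	}
	flags := 0
	for _, c := range expr[2:end] {
		bits, ok := flagBits[c]
		if !ok {
			return expr, 0
		}
		flags |= bits
	}
	return expr[end+1:], flags
}

func main() {
	pattern, flags := splitFlags("(?is)hello.world")
	fmt.Println(pattern)                  // hello.world
	fmt.Println(flags == caseless|dotall) // true
}
```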

Types

type Matcher

type Matcher struct {
	Groups int

	Matches  bool   // last match was successful
	Partial  bool   // was the last match a partial match?
	Error    error  // pcre_exec error from last match
	SubjectS string // contain found subject as string
	SubjectB []byte // contain found subject as []byte
	// contains filtered or unexported fields
}

Matcher objects provide a place for storing match results. They can be created by the NewMatcher and NewMatcherString functions, or they can be initialized with Reset or ResetString.

func (*Matcher) Exec

func (m *Matcher) Exec(subject []byte, flags int) int

Exec tries to match the specified byte slice against the current pattern and returns the exec result. C docs: http://www.pcre.org/original/doc/html/pcre_exec.html

func (*Matcher) ExecString

func (m *Matcher) ExecString(subject string, flags int) int

ExecString is the same as Exec, but accepts a string as its argument.

func (*Matcher) Extract

func (m *Matcher) Extract() [][]byte

Extract returns the matched text and all sub-matches of the last match (performed by Matcher, MatcherString, Reset, ResetString, Match, or MatchString). Group 0 is the part of the subject which matches the whole pattern; the first actual capture group is numbered 1. Capture groups which are not present return a nil slice.

func (*Matcher) ExtractString

func (m *Matcher) ExtractString() []string

ExtractString is the same as Extract, but returns []string.

func (*Matcher) Group

func (m *Matcher) Group(group int) []byte

Group returns the numbered capture group of the last match (performed by Matcher, MatcherString, Reset, ResetString, Match, or MatchString). Group 0 is the part of the subject which matches the whole pattern; the first actual capture group is numbered 1. Capture groups which are not present return a nil slice.

func (*Matcher) GroupIndices

func (m *Matcher) GroupIndices(group int) []int

GroupIndices returns the numbered capture group positions of the last match (performed by Matcher, MatcherString, Reset, ResetString, Match, or MatchString). Group 0 is the part of the subject which matches the whole pattern; the first actual capture group is numbered 1. Capture groups which are not present return a nil slice.

func (*Matcher) GroupString

func (m *Matcher) GroupString(group int) string

GroupString is the same as Group, but returns a string.

func (*Matcher) Index

func (m *Matcher) Index() []int

Index returns the start and end of the first match, if a previous call to Matcher, MatcherString, Reset, ResetString, Match or MatchString succeeded. loc[0] is the start and loc[1] is the end.

func (*Matcher) MatchStringWFlags

func (m *Matcher) MatchStringWFlags(subject string, flags int) bool

MatchStringWFlags tries to match the specified subject string to the pattern. Returns true if the match succeeds.

func (*Matcher) MatchWFlags

func (m *Matcher) MatchWFlags(subject []byte, flags int) bool

MatchWFlags tries to match the specified byte slice to the pattern. Returns true if the match succeeds.

func (*Matcher) Named

func (m *Matcher) Named(group string) (g []byte, err error)

Named returns the value of the named capture group. This is a nil slice if the capture group is not present. Panics if the name does not refer to a group.

func (*Matcher) NamedPresent

func (m *Matcher) NamedPresent(group string) (pres bool)

NamedPresent returns true if the named capture group is present. Panics if the name does not refer to a group.

func (*Matcher) NamedString

func (m *Matcher) NamedString(group string) (g string, err error)

NamedString returns the value of the named capture group, or an empty string if the capture group is not present. Panics if the name does not refer to a group.

func (*Matcher) Present

func (m *Matcher) Present(group int) bool

Present returns true if the numbered capture group is present in the last match (performed by Matcher, MatcherString, Reset, ResetString, Match, or MatchString). Group numbers start at 1. A capture group can be present and match the empty string.

func (*Matcher) Reset

func (m *Matcher) Reset(re Regexp, subject []byte, flags int)

Reset switches the matcher object to the specified pattern and subject.

func (*Matcher) ResetString

func (m *Matcher) ResetString(re Regexp, subject string, flags int)

ResetString switches the matcher object to the specified pattern and subject string.

type Regexp

type Regexp struct {
	// contains filtered or unexported fields
}

Regexp is a reference to a compiled regular expression. Use Compile or MustCompile to create such objects.

func Compile

func Compile(pattern string, flags int) (Regexp, error)

Compile tries to compile the pattern. If an error occurs, the second return value is non-nil.

func CompileJIT

func CompileJIT(pattern string, flagsC, flagsS int) (Regexp, error)

CompileJIT compiles the pattern with JIT compilation. flagsC holds the Compile flags, flagsS holds the Study flags.

func CompileParse

func CompileParse(ptr string) (Regexp, error)

CompileParse tries to parse the flags of the regex and compile it. If an error occurs, the second return value is non-nil. Flags are specified before the regex, in the form "(?flags)regex".

func CompileParseJIT

func CompileParseJIT(ptr string, flags int) (Regexp, error)

CompileParseJIT tries to parse the flags of the regex and compile it with JIT optimization. If an error occurs, the second return value is non-nil.

func MustCompile

func MustCompile(pattern string, flag int) (re Regexp)

MustCompile is the same as Compile, but panics if compilation fails.
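
The Must* functions follow the usual Go convention of turning a (value, error) result into a value-or-panic. A minimal sketch of that convention, using the standard library's regexp.Compile as a stand-in for this package's Compile; must and compiles are hypothetical helpers, and the generic function needs Go 1.18 or later.

```go
package main

import (
	"fmt"
	"regexp"
)

// must returns v when err is nil and panics otherwise.
func must[T any](v T, err error) T {
	if err != nil {
		panic(err)
	}
	return v
}

// compiles reports whether pattern compiles, recovering from the panic
// that must raises for an invalid pattern.
func compiles(pattern string) (ok bool) {
	defer func() {
		if r := recover(); r != nil {
			ok = false
		}
	}()
	must(regexp.Compile(pattern))
	return true
}

func main() {
	fmt.Println(compiles(`a+b`)) // true
	fmt.Println(compiles(`a(b`)) // false, unbalanced parenthesis
}
```

The panicking variants suit patterns that are fixed at program start, such as package-level variables; for patterns that come from user input, the error-returning variants are usually the safer choice.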

func MustCompileJIT

func MustCompileJIT(pattern string, flagsC, flagsS int) (re Regexp)

MustCompileJIT is the same as CompileJIT, but panics if compilation fails.

func MustCompileParse

func MustCompileParse(pattern string) (re Regexp)

MustCompileParse is the same as CompileParse, but panics if compilation fails.

func MustCompileParseJIT

func MustCompileParseJIT(pattern string, flags int) (re Regexp)

MustCompileParseJIT is the same as CompileParseJIT, but panics if compilation fails.

func (*Regexp) FindAllIndex

func (re *Regexp) FindAllIndex(bytes []byte, flags int) (r [][]int)

FindAllIndex returns the start and end of every match, one []int pair per match.

func (*Regexp) FindIndex

func (re *Regexp) FindIndex(bytes []byte, flags int) []int

FindIndex returns the start and end of the first match, or nil if no match. loc[0] is the start and loc[1] is the end.

func (*Regexp) FindString

func (re *Regexp) FindString(s string, flags int) string

FindString returns the text of the first match as a string.

func (Regexp) Groups

func (re Regexp) Groups() int

Groups returns the number of capture groups in the compiled regexp pattern.

func (*Regexp) MatchStringWFlags

func (re *Regexp) MatchStringWFlags(subject string, flags int) bool

MatchStringWFlags is the same as MatchWFlags, but accepts a string as its argument.

func (*Regexp) MatchWFlags

func (re *Regexp) MatchWFlags(subject []byte, flags int) bool

MatchWFlags tries to match the specified byte slice to the pattern. Returns true if the match succeeds.

func (Regexp) NewMatcher

func (re Regexp) NewMatcher(subject []byte, flags int) *Matcher

NewMatcher returns a new matcher object, with the byte slice as its subject.

func (Regexp) NewMatcherString

func (re Regexp) NewMatcherString(subject string, flags int) *Matcher

NewMatcherString returns a new matcher object, with the string as its subject.

func (Regexp) ReplaceAll

func (re Regexp) ReplaceAll(bytes, repl []byte, flags int) []byte

ReplaceAll returns a copy of a byte slice with pattern matches replaced by repl.

func (Regexp) ReplaceAllString

func (re Regexp) ReplaceAllString(subj, repl string, flags int) string

ReplaceAllString is the same as ReplaceAll, but accepts strings as arguments.
