re

v0.1.0
Published: Nov 10, 2023 License: MIT Imports: 15 Imported by: 0

README

Starlark-re

Starlark-re is an implementation of Python's re module for Starlark. Its interface is almost entirely compatible with the Python module, so please refer to the Python documentation to learn how to use Starlark-re.

Getting started

The re.NewModule() function returns a new Starlark value that represents the re module:

import (
    "go.starlark.net/starlark"
    "go.starlark.net/syntax"
    re "github.com/magnetde/starlark-re"
)

// Add the re module to the globals dict.
globals := starlark.StringDict{
    "re": re.NewModule(),
}

// Execute a Starlark program using the re module.
// GlobalReassign is enabled because example.star rebinds p, m and s at top level.
opts := &syntax.FileOptions{GlobalReassign: true}
thread := &starlark.Thread{Name: "re thread"}
globals, err := starlark.ExecFileOptions(opts, thread, "example.star", nil, globals)
if err != nil { ... }

example.star:

p = re.compile('(a(b)c)d')
m = p.match('abcd')
print(m.group(2, 1, 0))  # prints: ("b", "abc", "abcd")

m = re.match(r'(?P<first>\w+) (?P<last>\w+)', 'Jane Doe')
print(m.groupdict())  # prints: {"first": "Jane", "last": "Doe"}

s = re.split(r'[\W]+', 'Words, words, words.', 1)
print(s)  # prints: ["Words", "words, words."]

p = re.compile('(blue|white|red)')
s = p.subn('colour', 'blue socks and red shoes')
print(s)  # prints: ("colour socks and colour shoes", 2)

p = re.compile('section{ ( [^}]* ) }', re.VERBOSE)
s = p.sub(r'subsection{\1}','section{First} section{second}')
print(s)  # prints: subsection{First} subsection{second}

p = re.compile(r'\d+')
s = p.findall('12 drummers drumming, 11 pipers piping, 10 lords a-leaping')
print(s)  # prints: ["12", "11", "10"]

s = [m.span() for m in p.finditer('12 drummers drumming, 11 ... 10 ...')]
print(s)  # prints: [(0, 2), (22, 24), (29, 31)]

plusone = lambda m: str(int(m.group(0)) + 1)
s = p.sub(plusone, '4 + 7 = 13', 2)
print(s)  # prints: 5 + 8 = 13

re.purge()

Alternatively, the module can be initialized with custom options:

options := &re.ModuleOptions{
    DisableCache:    false,
    MaxCacheSize:    128,
    DisableFallback: true,
}

m := re.NewModuleOptions(options)

How it works

When a regular expression pattern is compiled, it is first parsed using a Go implementation of the Python regex parser. This allows Starlark-re to raise the same error messages as the Python module. The parser yields a tree representation of the pattern, which is then checked for elements that the default regex engine (regexp.Regexp) does not currently support. These unsupported elements include:

  • lookahead and lookbehind: (?=...), (?<=...), (?!...) or (?<!...)
  • backreferences: e.g., \1 or (?P=name)
  • conditional expressions: (?(id/name)yes-pattern|no-pattern)
  • repetition of type {m,n} where m or n exceeds 1000
  • possessive repetition: ?+, *+, ++, {...}+
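To get a feel for which of these constructs the standard library rejects, the following minimal sketch probes Go's regexp package directly. It is only an illustration: Starlark-re makes this decision by inspecting its own parse tree, and the helper name supportedByStdlib is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// supportedByStdlib reports whether Go's standard regexp package accepts the
// pattern. Hypothetical helper for illustration only; it is not part of
// Starlark-re, which checks the parsed pattern tree instead.
func supportedByStdlib(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	patterns := []string{
		`a(b)c`,      // plain groups
		`foo(?=bar)`, // lookahead
		`(a)\1`,      // backreference
		`a{1,1001}`,  // repetition bound above 1000
		`a++`,        // possessive repetition
	}
	for _, p := range patterns {
		fmt.Printf("%-12s supported by regexp: %v\n", p, supportedByStdlib(p))
	}
}
```

Only the first pattern compiles with the standard library; the others are the kind of input that has to be routed to a different engine or rejected.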

If the regular expression pattern does not include any unsupported elements, it is preprocessed and then compiled with the default regex engine. The preprocessor modifies literals, ranges and character classes in the pattern as needed, so that matching against bytes or using flags such as re.UNICODE, re.IGNORECASE or re.ASCII works as expected.
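One reason such preprocessing is needed: in Go's regexp package \w is ASCII-only, whereas Python's \w matches Unicode word characters for str patterns by default. The sketch below shows the difference and one possible rewrite. The replacement class [\p{L}\p{N}_] is an assumption chosen for illustration and only approximates Python's definition; it is not necessarily what Starlark-re emits. The helper firstMatch is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// firstMatch returns the leftmost match of pattern in s using Go's regexp.
// Hypothetical helper for illustration only.
func firstMatch(pattern, s string) string {
	return regexp.MustCompile(pattern).FindString(s)
}

func main() {
	s := "na\u00efve" // "naïve"

	// Go's \w is equivalent to [0-9A-Za-z_], so the match stops before "ï".
	fmt.Println(firstMatch(`\w+`, s)) // na

	// An assumed Unicode-aware rewrite of \w: letters, numbers, underscore.
	fmt.Println(firstMatch(`[\p{L}\p{N}_]+`, s)) // naïve
}
```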

If the regex pattern does include unsupported elements, the regex engine regexp2.Regexp is used instead; it supports all of these elements except possessive repetition. However, using regexp2 may result in higher runtimes, so this engine serves only as a fallback for patterns that contain unsupported elements. Compiled patterns are stored in an LRU cache.

The module was tested against all supported Python tests for the re module (see test_re.py).

Limitations

Currently, there are some differences from the Python re module:

  • The re.LOCALE flag has no effect.
  • Positions are given as byte offsets instead of character offsets (which is the default for Go and Starlark).
  • The fallback engine does not support the longest match search, so some matches starting at the same position may not be found. This may result in different outcomes compared to Python, especially for the fullmatch function.
  • The default regex engine does not match \b at Unicode word boundaries, while the fallback engine does.
  • There is no support for possessive repetition operators and Pattern.scanner.

Documentation

Types

type Match

type Match struct {
	// contains filtered or unexported fields
}

Match represents a single regex match.

func (*Match) Attr

func (m *Match) Attr(name string) (starlark.Value, error)

Attr returns the member of the match with the given name. If the member exists in `matchMethods`, a bound method is returned. Alternatively, if the member exists in `matchMembers`, the member value is returned instead. If the member does not exist, `nil, nil` is returned.

func (*Match) AttrNames

func (m *Match) AttrNames() []string

AttrNames lists available dot expression members.

func (*Match) CompareSameType

func (m *Match) CompareSameType(op syntax.Token, y starlark.Value, _ int) (bool, error)

CompareSameType compares this match to another one. Matches can only be compared for equality and inequality.

func (*Match) Freeze

func (m *Match) Freeze()

Freeze marks the value and all members as frozen.

func (*Match) Get

func (m *Match) Get(v starlark.Value) (starlark.Value, bool, error)

Get returns the value corresponding to the specified key. For the match object, this is equivalent to calling the `group` function.

func (*Match) Hash

func (m *Match) Hash() (uint32, error)

Hash returns the hash value of this value.

func (*Match) String

func (m *Match) String() string

String returns the string representation of the value.

func (*Match) Truth

func (m *Match) Truth() starlark.Bool

Truth returns the truth value of the object.

func (*Match) Type

func (m *Match) Type() string

Type returns a short string describing the value's type.

type Module

type Module struct {
	// contains filtered or unexported fields
}

Module is a module type used for the "re" module. The "re" module contains an LRU cache for compiled regex patterns. This cache is implemented using a map and a linked list. When the cache exceeds the maximum size, the least recently used element is removed. The module is designed to be thread-safe.
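The map-plus-linked-list design described above can be sketched with the standard library as follows. This is a minimal illustration under stated assumptions, not the module's actual implementation: the type patternCache and its methods are hypothetical, it caches *regexp.Regexp values only, and it uses a single mutex for thread safety.

```go
package main

import (
	"container/list"
	"fmt"
	"regexp"
	"sync"
)

// entry is the value stored in each list element.
type entry struct {
	key string
	re  *regexp.Regexp
}

// patternCache is a hypothetical LRU cache for compiled patterns.
type patternCache struct {
	mu    sync.Mutex
	max   int
	order *list.List               // front = most recently used
	items map[string]*list.Element // pattern -> element in order
}

func newPatternCache(max int) *patternCache {
	return &patternCache{max: max, order: list.New(), items: make(map[string]*list.Element)}
}

// get returns the cached pattern or compiles and stores it, evicting the
// least recently used entry when the cache exceeds its maximum size.
func (c *patternCache) get(pattern string) (*regexp.Regexp, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if el, ok := c.items[pattern]; ok {
		c.order.MoveToFront(el)
		return el.Value.(*entry).re, nil
	}
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	c.items[pattern] = c.order.PushFront(&entry{key: pattern, re: re})
	if c.order.Len() > c.max {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
	return re, nil
}

// has reports whether a pattern is currently cached, without touching its recency.
func (c *patternCache) has(pattern string) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	_, ok := c.items[pattern]
	return ok
}

func main() {
	c := newPatternCache(2)
	for _, p := range []string{"a+", "b+", "a+", "c+"} {
		if _, err := c.get(p); err != nil {
			fmt.Println("compile error:", err)
		}
	}
	fmt.Println(c.has("a+"), c.has("b+"), c.has("c+")) // true false true
}
```

The map gives constant-time lookup, while the list keeps the recency order so that eviction only has to remove the back element.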

func NewModule

func NewModule() *Module

NewModule creates the Starlark "re" module with the default options returned by `DefaultOptions`.

func NewModuleOptions

func NewModuleOptions(opts *ModuleOptions) *Module

NewModuleOptions creates the Starlark "re" module with custom options. The options may be nil, in which case the default options are used. If the cache size is not a positive integer, the pattern cache is disabled.
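The rules in this description can be summarized in a short sketch. The types and functions below are local stand-ins written for illustration: moduleOptions mirrors the documented ModuleOptions fields, defaultOptions mirrors the documented defaults, and normalize is a hypothetical helper. None of this is the package's actual code.

```go
package main

import "fmt"

// moduleOptions is a local stand-in mirroring the documented ModuleOptions fields.
type moduleOptions struct {
	DisableCache    bool
	MaxCacheSize    int
	DisableFallback bool
}

// defaultOptions mirrors the documented defaults: cache enabled,
// maximum cache size 64, fallback engine enabled.
func defaultOptions() moduleOptions {
	return moduleOptions{DisableCache: false, MaxCacheSize: 64, DisableFallback: false}
}

// normalize is a hypothetical helper that applies the documented rules:
// nil options mean defaults, and a non-positive cache size disables the cache.
func normalize(opts *moduleOptions) moduleOptions {
	if opts == nil {
		return defaultOptions()
	}
	out := *opts
	if out.MaxCacheSize <= 0 {
		out.DisableCache = true
	}
	return out
}

func main() {
	fmt.Printf("%+v\n", normalize(nil))
	fmt.Printf("%+v\n", normalize(&moduleOptions{MaxCacheSize: 0}))
	fmt.Printf("%+v\n", normalize(&moduleOptions{MaxCacheSize: 128, DisableFallback: true}))
}
```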

func (*Module) Attr

func (m *Module) Attr(name string) (starlark.Value, error)

Attr returns the member of the module with the given name. If the member exists and is of the type `*Builtin`, it becomes bound to this module. Otherwise, the member is returned as normal. If the member does not exist, `nil, nil` is returned.

func (*Module) AttrNames

func (m *Module) AttrNames() []string

AttrNames lists available dot expression members.

func (*Module) Freeze

func (m *Module) Freeze()

Freeze marks the value and all members as frozen.

func (*Module) Hash

func (m *Module) Hash() (uint32, error)

Hash returns an error, because the re module is not hashable.

func (*Module) Members

func (m *Module) Members() starlark.StringDict

Members returns a dictionary containing all members of this module. This function may be useful when needing a dictionary of available members inside a `Load()` function. Note that all members of type `*Builtin` are already bound to this module, making them safe to call.

func (*Module) String

func (m *Module) String() string

String returns the string representation of the value.

func (*Module) Truth

func (m *Module) Truth() starlark.Bool

Truth returns the truth value of the object.

func (*Module) Type

func (m *Module) Type() string

Type returns a short string describing the value's type.

type ModuleOptions

type ModuleOptions struct {
	DisableCache    bool
	MaxCacheSize    int
	DisableFallback bool
}

ModuleOptions represents the available options when initializing the "re" module. There are three options:

  • `DisableCache` disables storing compiled patterns in the pattern cache, which results in higher runtimes.
  • `MaxCacheSize` sets the maximum size of the cache.
  • `DisableFallback` disables the fallback engine `regexp2.Regexp`. Compiling patterns that are not supported by `regexp.Regexp` will then fail.

func DefaultOptions

func DefaultOptions() *ModuleOptions

DefaultOptions returns the default options:

  • pattern cache is enabled
  • a maximum cache size of 64
  • the fallback regex engine is enabled

type Pattern

type Pattern struct {
	// contains filtered or unexported fields
}

Pattern is a Starlark representation of a compiled regex.

func (*Pattern) Attr

func (p *Pattern) Attr(name string) (starlark.Value, error)

Attr returns the member of the pattern with the given name. If the member exists in `patternMethods`, a bound method is returned. Alternatively, if the member exists in `patternMembers`, the member value is returned instead. If the member does not exist, `nil, nil` is returned.

func (*Pattern) AttrNames

func (p *Pattern) AttrNames() []string

AttrNames lists available dot expression members.

func (*Pattern) CompareSameType

func (p *Pattern) CompareSameType(op syntax.Token, y starlark.Value, _ int) (bool, error)

CompareSameType compares this pattern to another one. It is only possible to compare patterns for equality and inequality.

func (*Pattern) Freeze

func (p *Pattern) Freeze()

Freeze marks the value and all members as frozen.

func (*Pattern) Hash

func (p *Pattern) Hash() (uint32, error)

Hash returns the hash value of this value.

func (*Pattern) String

func (p *Pattern) String() string

String returns the string representation of the value.

func (*Pattern) Truth

func (p *Pattern) Truth() starlark.Bool

Truth returns the truth value of the object.

func (*Pattern) Type

func (p *Pattern) Type() string

Type returns a short string describing the value's type.
