hre

v0.2.3 Latest
Warning

This package is not in the latest version of its module.
Published: Dec 25, 2022 License: MIT Imports: 7 Imported by: 1

README

HRE

Regexp dialect for Hangulize

  • ^ - the beginning of a word, not of a line
  • $ - the end of a word, not of a line
  • ^^ - the beginning of a line (^ in standard regexp)
  • $$ - the end of a line ($ in standard regexp)
  • cat{dog} - "cat" followed by "dog" (positive lookahead)
  • {dog}cat - "cat" preceded by "dog" (positive lookbehind)
  • cat{~dog} - "cat" not followed by "dog" (negative lookahead)
  • {~dog}cat - "cat" not preceded by "dog" (negative lookbehind)
  • <var> - one of the letters in the variable "var"

Documentation

Overview

Package hre provides the regular expression dialect for Hangulize called HRE. HRE focuses on a very narrow usage.

The HRE syntax is based on RE2 but changes the meaning of some assertions. For example, in HRE ^ matches at the beginning of every word, not only at the beginning of the string.

RE2 does not support lookaround because no efficient algorithm is known that implements it without backtracking. HRE nevertheless provides a simplified lookaround: {...} is a positive lookaround and {~...} is a negative lookaround. A lookaround may appear only at the leftmost or rightmost position of a pattern.

"foo{bar}"
"{~bar}foo"

The negative lookbehind takes O(n²) time, while the other assertions run in O(n). Avoid the negative lookbehind on very long strings.

HRE also provides macros and variables.

macros := map[string]string {
	"@": "<vowels>",
}

vars := map[string][]string {
	"abc":    []string{"a", "b", "c"},
	"vowels": []string{"a", "e", "i", "o", "u"},
}

p, err := NewPattern("<abc>@", macros, vars)
// p matches, for example, "ai", "be", or "ci".

Constants

const Version = "0.2.2"

Version is the version number of HRE package. The version follows Semantic Versioning 2.0.0.

Variables

This section is empty.

Functions

func RegexpMaxWidth added in v0.2.0

func RegexpMaxWidth(re *syntax.Regexp) int

RegexpMaxWidth calculates the maximum width of a parsed Regexp pattern.

Types

type Pattern

type Pattern struct {
	// contains filtered or unexported fields
}

Pattern represents an HRE (Hangulize-specific Regular Expression) pattern.

The transcription logic consists of several rewriting rules. Each rule has a Pattern and an RPattern. A sub-word matched by the Pattern is rewritten according to the RPattern.

rewrite:
    "'"        -> ""
    "^gli$"    -> "li"
    "^glia$"   -> "g.lia"
    "^glioma$" -> "g.lioma"
    "^gli{@}"  -> "li"
    "{@}gli"   -> "li"
    "gn{@}"    -> "nJ"
    "gn"       -> "n"

Some expressions in Pattern have special meaning:

"^"      // start of chunk
"^^"     // start of string
"$"      // end of chunk
"$$"     // end of string
"{...}"  // zero-width match
"{~...}" // zero-width negative match
"<var>"  // one of var values (defined in spec)

func NewPattern

func NewPattern(
	expr string,

	macros map[string]string,
	vars map[string][]string,

) (*Pattern, error)

NewPattern compiles an HRE pattern from an expression.

func (*Pattern) Explain

func (p *Pattern) Explain() string

Explain shows the HRE expression with the underlying standard regexp patterns.

func (*Pattern) Find

func (p *Pattern) Find(word string, n int) [][]int

Find searches for up to n matches in the word. If n is -1, it finds all matches. The result is a slice of submatch locations.

Example
p, _ := NewPattern("^he(l+o){,}", nil, nil)
fmt.Println(p.Find("hello, helo, hellllo", -1))
Output:

[[0 5 2 5] [7 11 9 11]]

func (*Pattern) Letters

func (p *Pattern) Letters() []string

Letters returns the set of natural letters used in the expression in ascending order.

Example
p, _ := NewPattern("^hello{,}", nil, nil)
fmt.Println(p.Letters())
Output:

[, e h l o]

func (*Pattern) NegativeLookaroundWidths added in v0.2.2

func (p *Pattern) NegativeLookaroundWidths() (negAWidth int, negBWidth int)

NegativeLookaroundWidths returns the potential widths of negative lookahead and negative lookbehind.

-1 means unlimited. An unlimited negative lookaround width makes matching take polynomial time. Otherwise, matching takes only linear time.

func (*Pattern) Replace

func (p *Pattern) Replace(word string, rpat *RPattern, n int) string

Replace finds up to n matches in the word and replaces them using the given RPattern.

Example
p, _ := NewPattern("foo{~bar}", nil, nil)
rp := NewRPattern("xxx", nil, nil)
fmt.Println(p.Replace("foo foobar foobaz", rp, -1))
Output:

xxx foobar xxxbaz

func (*Pattern) String

func (p *Pattern) String() string

type RPattern

type RPattern struct {
	// contains filtered or unexported fields
}

RPattern is a dynamic replacement pattern.

Some expressions in RPattern have special meaning:

"{}"    // zero-width space
"<var>" // ...

"R" in the name means "replacement" or "right-side".

func NewRPattern

func NewRPattern(
	expr string,

	macros map[string]string,
	vars map[string][]string,

) *RPattern

NewRPattern parses the given expression and creates an RPattern.

func (*RPattern) Interpolate

func (rp *RPattern) Interpolate(
	p *Pattern, word string, m []int,
) (string, error)

Interpolate determines the final replacement based on the matched Pattern.

func (*RPattern) Letters

func (rp *RPattern) Letters() []string

Letters returns the set of natural letters used in the expression in ascending order.

func (*RPattern) String

func (rp *RPattern) String() string
