re

Version: v0.0.0-...-7eba679 (latest) | Published: Aug 25, 2022 | License: Apache-2.0 | Imports: 5 | Imported by: 0

Repository

github.com/ghemawat/re

README ¶

re package

Package re combines regular expression matching with fmt.Scan-like extraction of sub-matches into caller-supplied objects. Pointers to variables can be passed as extra arguments to re.Scan, which fills them in with the regular expression's sub-matches. Each sub-match is parsed according to the type of the variable: for example, if a *int is passed in, the sub-match is parsed as a number, and overflow is detected.

For example, the host and port portions of a URL can be extracted as follows:

var host string
var port int
reg := regexp.MustCompile(`^https?://([^/:]+):(\d+)/`)
if err := re.Scan(reg, url, &host, &port); err == nil {
	Process(host, port)
}

A "func([]byte) error" can also be passed in as an extra argument to provide custom parsing.
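
For comparison, here is a rough sketch of the same host and port extraction written against only the standard library, to show the matching, conversion, and error handling that re.Scan folds into one call. The helper name parseHostPort and the sample URL are assumptions made for illustration; this is not how package re is implemented.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

var hostPort = regexp.MustCompile(`^https?://([^/:]+):(\d+)/`)

// parseHostPort is a hypothetical helper (not part of package re) that
// extracts the host and port sub-matches by hand.
func parseHostPort(url []byte) (host string, port int, err error) {
	m := hostPort.FindSubmatch(url)
	if m == nil {
		return "", 0, fmt.Errorf("no match in %q", url)
	}
	// strconv.Atoi reports an error if the digits do not fit in an int.
	port, err = strconv.Atoi(string(m[2]))
	if err != nil {
		return "", 0, err
	}
	return string(m[1]), port, nil
}

func main() {
	host, port, err := parseHostPort([]byte("http://example.com:8080/index.html"))
	fmt.Println(host, port, err)
}
```

With re.Scan, the match check, the string conversion, and the integer parse collapse into the single call shown above.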

Installation

go get github.com/ghemawat/re

See godoc for further documentation and examples.

godoc.org/github.com/ghemawat/re

Documentation ¶

Index ¶

Variables
func Scan(re *regexp.Regexp, input []byte, output ...interface{}) error
type Span

Variables ¶

var (
	NotFound = errors.New("not found")
)

Functions ¶

func Scan ¶

func Scan(re *regexp.Regexp, input []byte, output ...interface{}) error

Scan returns nil if regular expression re matches somewhere in input, and for every non-nil entry in output, the corresponding regular expression sub-match is successfully parsed and stored into *output[i].

The following can be passed as output arguments to Scan:

nil: The corresponding sub-match is discarded without being saved.

Pointer to string or []byte: The corresponding sub-match is stored in the pointed-to object. When storing into a []byte, no copying is done, and the stored slice is an alias of the input.

Pointer to some built-in numeric types (int, int8, int16, int32, int64, uint, uintptr, uint8, uint16, uint32, uint64, float32, float64): The corresponding sub-match will be parsed as a literal of the numeric type and the result stored into *output[i]. Scan will return an error if the sub-match cannot be parsed successfully, or the parse result is out of range for the type.

Pointer to a rune or a byte: rune is an alias of int32 and byte is an alias of uint8, so the preceding rule applies; i.e., Scan treats the input as a string of digits to be parsed into the rune or byte. Therefore Scan cannot be used to directly extract a single rune or byte from the input. For that, parse into a string or []byte and use the first element, or pass in a custom parsing function (see below).

func([]byte) error: The function is passed the corresponding sub-match. If the result is a non-nil error, the Scan call fails with that error. Pass in such a function to provide custom parsing: e.g., treating a number as decimal even if it starts with "0" (normally Scan would treat such a number as octal); or parsing an otherwise unsupported type like time.Duration.

An error is returned if output[i] does not have one of the preceding types. Caveat: the set of supported types might be extended in the future.
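
The numeric rules above (a leading "0" read as octal, out-of-range values rejected) resemble what the standard strconv package does when asked to infer the base. The sketch below demonstrates that strconv behavior and a decimal-only custom parser of the func([]byte) error shape. That Scan parses numbers this way internally is an assumption, and decimalInto is a hypothetical helper name.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// decimalInto is a hypothetical helper: it returns a parser with the
// func([]byte) error shape that always reads its input as base 10.
func decimalInto(dst *int64) func([]byte) error {
	return func(b []byte) (err error) {
		*dst, err = strconv.ParseInt(string(b), 10, 64)
		return err
	}
}

func main() {
	// Base 0 infers the base from the prefix, so "010" is read as octal.
	octal, _ := strconv.ParseInt("010", 0, 64)
	fmt.Println(octal) // 8

	// The custom parser forces base 10 for the same digits.
	var n int64
	if err := decimalInto(&n)([]byte("010")); err != nil {
		panic(err)
	}
	fmt.Println(n) // 10

	// A value that does not fit the requested bit size is a range error.
	_, err := strconv.ParseInt("300", 0, 8)
	fmt.Println(errors.Is(err, strconv.ErrRange)) // true
}
```

A parser like decimalInto(&n) could be passed to Scan in place of &n wherever zero-padded decimal fields are expected.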

Extra sub-matches (ones with no corresponding output) are discarded silently.
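
Because a *rune output is parsed as digits, extracting a literal character needs a custom parsing function. The following is a minimal sketch of one such function, assuming UTF-8 input. The helper name firstRuneInto is hypothetical and not part of package re; the example calls the parser directly rather than through Scan.

```go
package main

import (
	"errors"
	"fmt"
	"unicode/utf8"
)

// firstRuneInto is a hypothetical helper: it returns a parser with the
// func([]byte) error shape that stores the first UTF-8 rune of the
// sub-match into *dst.
func firstRuneInto(dst *rune) func([]byte) error {
	return func(b []byte) error {
		r, size := utf8.DecodeRune(b)
		// DecodeRune signals empty input with size 0 and an invalid
		// encoding with size 1, both paired with utf8.RuneError.
		if r == utf8.RuneError && size <= 1 {
			return errors.New("empty or invalid UTF-8 sub-match")
		}
		*dst = r
		return nil
	}
}

func main() {
	var r rune
	if err := firstRuneInto(&r)([]byte("hello")); err != nil {
		panic(err)
	}
	fmt.Printf("%c\n", r)
}
```

Decoding with utf8.DecodeRune rather than taking the first byte keeps multi-byte characters intact.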

Example ¶

Parse a line of ls -l output into its fields.

package main

import (
	"fmt"
	"regexp"

	"github.com/ghemawat/re"
)

func main() {
	var f struct {
		mode, user, group, date, name string
		nlinks, size                  int64
	}

	// Sample output from `ls -l --time-style=iso`
	line := "-rwxr-xr-x 1 root root 110080 2014-03-24  /bin/ls"

	// A regexp that matches such lines.
	r := regexp.MustCompile(`^(.{10}) +(\d+) +(\w+) +(\w+) +(\d+) +(\S+) +(.+)$`)

	// Match line to regexp and extract properties into struct.
	if err := re.Scan(r, []byte(line), &f.mode, &f.nlinks, &f.user, &f.group, &f.size, &f.date, &f.name); err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", f)
}

Output:

{mode:-rwxr-xr-x user:root group:root date:2014-03-24 name:/bin/ls nlinks:1 size:110080}

Example (BinaryNumber) ¶

Use a custom parsing function that parses a number in binary.

package main

import (
	"fmt"
	"regexp"
	"strconv"

	"github.com/ghemawat/re"
)

func main() {
	var number uint64
	parseBinary := func(b []byte) (err error) {
		number, err = strconv.ParseUint(string(b), 2, 64)
		return err
	}

	r := regexp.MustCompile(`([01]+)`)
	if err := re.Scan(r, []byte("1001"), parseBinary); err != nil {
		panic(err)
	}
	fmt.Println(number)
}

Output:

9

Example (ParseDuration) ¶

Use a custom re-usable parser for time.Duration.

package main

import (
	"fmt"
	"regexp"
	"time"

	"github.com/ghemawat/re"
)

func main() {
	// parseDuration(&d) returns a parser that stores its result in *d.
	parseDuration := func(d *time.Duration) func([]byte) error {
		return func(b []byte) (err error) {
			*d, err = time.ParseDuration(string(b))
			return err
		}
	}

	r := regexp.MustCompile(`^elapsed: (.*)$`)
	var interval time.Duration
	if err := re.Scan(r, []byte("elapsed: 200s"), parseDuration(&interval)); err != nil {
		panic(err)
	}
	fmt.Println(interval)
}

Output:

3m20s

Example (Repeatedly) ¶

Call Scan repeatedly to extract every match in the input, using a Span to advance past each match.

package main

import (
	"errors"
	"fmt"
	"regexp"

	"github.com/ghemawat/re"
)

func main() {
	line := []byte("www.google.com:1234 www.google.com:2345")
	r := regexp.MustCompile(`((\S+):(\d+))`)

	for {
		var (
			span re.Span
			host string
			port int
		)
		err := re.Scan(r, line, &span, &host, &port)
		if errors.Is(err, re.NotFound) {
			// Terminate the loop. We're done scanning.
			break
		} else if err != nil {
			// If this wasn't example code, we'd return the error to the caller here.
			fmt.Println("Error encountered:", err)
			return
		}

		fmt.Println("host:", host, "port:", port)
		line = line[span.End:]
	}
}

Output:

host: www.google.com port: 1234
host: www.google.com port: 2345

Types ¶

type Span ¶

type Span struct {
	Start int
	End   int
}

Span is a special type designed to be passed via pointer to Scan. re.Scan will store the starting and ending offsets of the corresponding regular expression capture group into the Span.

This type can be placed anywhere within the list of output arguments to Scan. The most typical use is to find the extent of the entire match: wrap the whole regexp in parentheses and pass the *Span as the first output argument, i.e., third overall (immediately after the regular expression and the input).

Source Files ¶

re.go