re

package module
v0.0.0-...-7eba679 Latest
Warning

This package is not in the latest version of its module.

Published: Aug 25, 2022 License: Apache-2.0 Imports: 5 Imported by: 0

README

re package

Package re combines regular expression matching with fmt.Scan-like extraction of sub-matches into caller-supplied objects. Pointers to variables can be passed as extra arguments to re.Scan, which fills them in with the regular expression's sub-matches. Each sub-match is parsed according to the type of its variable. E.g., if a *int is passed in, the sub-match is parsed as a number (and overflow is detected).

For example, the host and port portions of a URL can be extracted as follows:

var host string
var port int
reg := regexp.MustCompile(`^https?://([^/:]+):(\d+)/`)
// url must be a []byte, since Scan takes its input as []byte.
if err := re.Scan(reg, url, &host, &port); err == nil {
	Process(host, port)
}

A "func([]byte) error" can also be passed in as an extra argument to provide custom parsing.

Installation

go get github.com/ghemawat/re

See godoc for further documentation and examples.

Documentation

Variables

var (
	NotFound = errors.New("not found")
)

Functions

func Scan

func Scan(re *regexp.Regexp, input []byte, output ...interface{}) error

Scan returns nil if the regular expression re matches somewhere in input and, for every non-nil entry in output, the corresponding regular expression sub-match is successfully parsed and stored into *output[i].

The following can be passed as output arguments to Scan:

nil: The corresponding sub-match is discarded without being saved.

Pointer to string or []byte: The corresponding sub-match is stored in the pointed-to object. When storing into a []byte, no copying is done, and the stored slice is an alias of the input.

Pointer to some built-in numeric types (int, int8, int16, int32, int64, uint, uintptr, uint8, uint16, uint32, uint64, float32, float64): The corresponding sub-match will be parsed as a literal of the numeric type and the result stored into *output[i]. Scan will return an error if the sub-match cannot be parsed successfully, or the parse result is out of range for the type.

Pointer to a rune or a byte: rune is an alias of int32 and byte is an alias of uint8, so the preceding rule applies; i.e., Scan treats the input as a string of digits to be parsed into the rune or byte. Therefore Scan cannot be used to directly extract a single rune or byte from the input. For that, parse into a string or []byte and use the first element, or pass in a custom parsing function (see below).

func([]byte) error: The function is passed the corresponding sub-match. If the result is a non-nil error, the Scan call fails with that error. Pass in such a function to provide custom parsing: e.g., treating a number as decimal even if it starts with "0" (normally Scan would treat such a number as octal); or parsing an otherwise unsupported type like time.Duration.

An error is returned if output[i] does not have one of the preceding types. Caveat: the set of supported types might be extended in the future.

Extra sub-matches (ones with no corresponding output) are discarded silently.
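
The two custom-parsing cases mentioned above (decimal numbers with a leading "0", and extracting a single rune) might be written as the following parser factories. This is a minimal sketch: decimalInto and firstRuneInto are hypothetical names, and the byte slices in main stand in for the sub-matches that Scan would pass to the returned functions.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"unicode/utf8"
)

// decimalInto returns a parser that stores a base-10 number in *n, so a
// sub-match such as "010" yields 10 rather than an octal value.
func decimalInto(n *int) func([]byte) error {
	return func(b []byte) error {
		v, err := strconv.Atoi(string(b))
		if err != nil {
			return err
		}
		*n = v
		return nil
	}
}

// firstRuneInto returns a parser that stores the first rune of the
// sub-match in *r and rejects empty or invalid UTF-8 input.
func firstRuneInto(r *rune) func([]byte) error {
	return func(b []byte) error {
		c, size := utf8.DecodeRune(b)
		if c == utf8.RuneError && size <= 1 {
			return errors.New("empty or invalid UTF-8 sub-match")
		}
		*r = c
		return nil
	}
}

func main() {
	var n int
	var r rune
	if err := decimalInto(&n)([]byte("010")); err != nil {
		panic(err)
	}
	if err := firstRuneInto(&r)([]byte("é!")); err != nil {
		panic(err)
	}
	fmt.Println(n, string(r)) // prints "10 é"
}
```

Because each factory returns a func([]byte) error, a call such as decimalInto(&n) could be passed in an output position of Scan in the same way as parseDuration(&interval) in the ParseDuration example.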

Example

Parse a line of ls -l output into its fields.

package main

import (
	"fmt"
	"regexp"

	"github.com/ghemawat/re"
)

func main() {
	var f struct {
		mode, user, group, date, name string
		nlinks, size                  int64
	}

	// Sample output from `ls -l --time-style=iso`
	line := "-rwxr-xr-x 1 root root 110080 2014-03-24  /bin/ls"

	// A regexp that matches such lines.
	r := regexp.MustCompile(`^(.{10}) +(\d+) +(\w+) +(\w+) +(\d+) +(\S+) +(.+)$`)

	// Match line to regexp and extract properties into struct.
	if err := re.Scan(r, []byte(line), &f.mode, &f.nlinks, &f.user, &f.group, &f.size, &f.date, &f.name); err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", f)
}

Output:

{mode:-rwxr-xr-x user:root group:root date:2014-03-24 name:/bin/ls nlinks:1 size:110080}

Example (BinaryNumber)

Use a custom parsing function that parses a number in binary.

package main

import (
	"fmt"
	"regexp"
	"strconv"

	"github.com/ghemawat/re"
)

func main() {
	var number uint64
	parseBinary := func(b []byte) (err error) {
		number, err = strconv.ParseUint(string(b), 2, 64)
		return err
	}

	r := regexp.MustCompile(`([01]+)`)
	if err := re.Scan(r, []byte("1001"), parseBinary); err != nil {
		panic(err)
	}
	fmt.Println(number)
}

Output:

9

Example (ParseDuration)

Use a custom re-usable parser for time.Duration.

package main

import (
	"fmt"
	"regexp"
	"time"

	"github.com/ghemawat/re"
)

func main() {
	// parseDuration(&d) returns a parser that stores its result in *d.
	parseDuration := func(d *time.Duration) func([]byte) error {
		return func(b []byte) (err error) {
			*d, err = time.ParseDuration(string(b))
			return err
		}
	}

	r := regexp.MustCompile(`^elapsed: (.*)$`)
	var interval time.Duration
	if err := re.Scan(r, []byte("elapsed: 200s"), parseDuration(&interval)); err != nil {
		panic(err)
	}
	fmt.Println(interval)
}

Output:

3m20s

Example (Repeatedly)

Scan the same input repeatedly, using a Span to skip past each match.

package main

import (
	"errors"
	"fmt"
	"regexp"

	"github.com/ghemawat/re"
)

func main() {
	line := []byte("www.google.com:1234 www.google.com:2345")
	r := regexp.MustCompile(`((\S+):(\d+))`)

	for {
		var (
			span re.Span
			host string
			port int
		)
		err := re.Scan(r, line, &span, &host, &port)
		if errors.Is(err, re.NotFound) {
			// Terminate the loop. We're done scanning.
			break
		} else if err != nil {
			// If this wasn't example code, we'd return the error to the caller here.
			fmt.Println("Error encountered:", err)
			return
		}

		fmt.Println("host:", host, "port:", port)
		line = line[span.End:]
	}
}

Output:

host: www.google.com port: 1234
host: www.google.com port: 2345

Types

type Span

type Span struct {
	Start int
	End   int
}

Span is a special type designed to be passed via pointer to Scan. re.Scan will store the starting and ending offsets of the corresponding regular expression capture group into the Span.

This type can be placed anywhere within the list of output arguments to Scan, but its most typical use is to find the extent of the entire match. To do that, wrap the entire regexp in parentheses and pass the Span as the third argument (immediately after the regular expression and the input), so that it is filled with the offsets of that outermost capture group.
