parser

v0.2.1 Latest
Published: Feb 2, 2024 License: Apache-2.0 Imports: 4 Imported by: 0

README

Parser


Simple, fast, zero-allocation combinatorial parsing with Go

Project Description

parser is intended to be a simple, expressive and easy to use API for all your text parsing needs. It aims to be:

  • Fast: Performant text parsing can be tricky. parser aims to be as fast as possible without compromising safety or error handling. Every parser function has a benchmark and has been written with performance in mind; almost none of them allocate on the heap ⚡️
  • Correct: You get the correct behaviour at all times, on any valid UTF-8 text. Errors are well handled and reported for easy debugging. 100% test coverage.
  • Intuitive: Some parser combinator libraries are tricky to wrap your head around. I want parser to be super simple to use so that anyone can pick it up and be productive quickly
  • Well Documented: Every combinator in parser has a comprehensive doc comment describing its entire behaviour, as well as an executable example of its use

Installation

go get github.com/FollowTheProcess/parser@latest

Quickstart

Let's borrow the nom example and parse a hex colour!

package main

import (
	"fmt"
	"log"
	"strconv"

	"github.com/FollowTheProcess/parser"
)

// RGB represents a colour.
type RGB struct {
	Red   int
	Green int
	Blue  int
}

// fromHex parses a hexadecimal string into its integer value.
func fromHex(s string) (int, error) {
	hx, err := strconv.ParseUint(s, 16, 64)
	return int(hx), err
}

// hexPair is a parser that converts a hex string into its integer value.
func hexPair(colour string) (int, string, error) {
	return parser.Map(
		parser.Take(2),
		fromHex,
	)(colour)
}

func main() {
	// Let's parse this into an RGB
	colour := "#2F14DF"

	// We don't actually care about the #
	_, colour, err := parser.Char('#')(colour)
	if err != nil {
		log.Fatalln(err)
	}

	// We want 3 hex pairs
	pairs, _, err := parser.Count(hexPair, 3)(colour)
	if err != nil {
		log.Fatalln(err)
	}

	if len(pairs) != 3 {
		log.Fatalln("Not enough pairs")
	}

	rgb := RGB{
		Red:   pairs[0],
		Green: pairs[1],
		Blue:  pairs[2],
	}

	fmt.Printf("%#v\n", rgb) // main.RGB{Red:47, Green:20, Blue:223}
}

Credits

This package was created with copier and the FollowTheProcess/go_copier project template.

It is also heavily inspired by nom, an excellent combinatorial parsing library written in Rust.

Documentation

Overview

Package parser implements simple, yet expressive mechanisms for combinatorial parsing in Go.

Example
package main

import (
	"fmt"
	"os"
	"strconv"

	"github.com/FollowTheProcess/parser"
)

// RGB represents a colour.
type RGB struct {
	Red   int
	Green int
	Blue  int
}

// fromHex parses a hexadecimal string into its integer value.
func fromHex(s string) (int, error) {
	hx, err := strconv.ParseInt(s, 16, 64)
	return int(hx), err
}

// hexPair is a parser that converts a hex string into its integer value.
func hexPair(colour string) (int, string, error) {
	return parser.Map(
		parser.Take(2),
		fromHex,
	)(colour)
}

func main() {
	// Let's parse this into an RGB
	colour := "#2F14DF"

	// We don't actually care about the #
	_, colour, err := parser.Char('#')(colour)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}

	// We want 3 hex pairs
	pairs, _, err := parser.Count(hexPair, 3)(colour)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}

	if len(pairs) != 3 {
		fmt.Fprintln(os.Stderr, "not enough pairs") // err is nil here, so report the problem explicitly
		return
	}

	rgb := RGB{
		Red:   pairs[0],
		Green: pairs[1],
		Blue:  pairs[2],
	}

	fmt.Printf("%#v\n", rgb)

}
Output:

parser_test.RGB{Red:47, Green:20, Blue:223}

Types

type Parser

type Parser[T any] func(input string) (value T, remainder string, err error)

Parser is the core parsing function that all parser functions return. Parsers can be combined and composed to parse complex grammars.

Each Parser is generic over type T and returns the parsed value from the input, the remaining unparsed input and an error.
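Because Parser is just a function type, any function with a matching signature can serve as one, which is how hexPair in the example above plugs into Count. The following is a minimal sketch of that idea. The Parser type is redeclared locally so the snippet compiles on its own, and the digit function and its error messages are hypothetical, not part of the library.

```go
package main

import (
	"errors"
	"fmt"
	"unicode/utf8"
)

// Parser mirrors the signature documented above. It is redeclared here
// only so that this sketch is self-contained.
type Parser[T any] func(input string) (value T, remainder string, err error)

// digit is a hypothetical hand-written parser that consumes one leading
// ASCII decimal digit and returns it as an int.
func digit(input string) (int, string, error) {
	if input == "" {
		return 0, "", errors.New("digit: empty input")
	}
	r, size := utf8.DecodeRuneInString(input)
	if r < '0' || r > '9' {
		return 0, "", fmt.Errorf("digit: unexpected %q", r)
	}
	return int(r - '0'), input[size:], nil
}

func main() {
	// Any func with this shape is assignable to Parser[int].
	var p Parser[int] = digit

	value, remainder, err := p("7 dwarves")
	fmt.Printf("%d %q %v\n", value, remainder, err) // 7 " dwarves" <nil>
}
```

Returning an empty remainder on failure is a choice made for this sketch; the text above does not say what accompanies an error.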

func AnyOf

func AnyOf(chars string) Parser[string]

AnyOf returns a Parser that continues taking characters so long as they are contained in the passed in set of chars.

Parsing stops at the first occurrence of a character not contained in the argument and the offending character is not included in the parsed value, but will be in the remainder.

AnyOf is the opposite of NotAnyOf.

If the input or chars is empty, an error will be returned. Likewise if none of the chars are present at the start of the input.

Example
package main

import (
	"fmt"
	"os"

	"github.com/FollowTheProcess/parser"
)

func main() {
	input := "DEADBEEF and the rest"

	chars := "1234567890ABCDEF" // Any hexadecimal digit

	value, remainder, err := parser.AnyOf(chars)(input)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
	}

	fmt.Printf("Value: %q\n", value)
	fmt.Printf("Remainder: %q\n", remainder)

}
Output:

Value: "DEADBEEF"
Remainder: " and the rest"

func Char

func Char(char rune) Parser[string]

Char returns a Parser that consumes a single exact, case-sensitive utf-8 character from the input.

If the first char in the input is not the requested char, an error will be returned.

Example
package main

import (
	"fmt"
	"os"

	"github.com/FollowTheProcess/parser"
)

func main() {
	input := "X marks the spot!"

	value, remainder, err := parser.Char('X')(input)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
	}

	fmt.Printf("Value: %q\n", value)
	fmt.Printf("Remainder: %q\n", remainder)

}
Output:

Value: "X"
Remainder: " marks the spot!"

func Count

func Count[T any](parser Parser[T], count int) Parser[[]T]

Count returns a Parser that applies another parser a certain number of times, returning the values in a slice along with any remaining input.

If the parser fails or the input is exhausted before the parser has been applied the requested number of times, an error will be returned.

Example
package main

import (
	"fmt"
	"os"

	"github.com/FollowTheProcess/parser"
)

func main() {
	input := "12345678rest..." // Pairs of digits with a bit on the end

	value, remainder, err := parser.Count(parser.Take(2), 4)(input)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
	}

	fmt.Printf("Value: %#v\n", value)
	fmt.Printf("Remainder: %q\n", remainder)

}
Output:

Value: []string{"12", "34", "56", "78"}
Remainder: "rest..."
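To make the failure case described for Count concrete, here is a rough sketch of how such a combinator could be written. It is not the library's implementation: count, takeOne, the local Parser type and the error wording are all assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// Parser is a local copy of the documented signature.
type Parser[T any] func(input string) (T, string, error)

// takeOne is a stand-in sub-parser that consumes a single byte (ASCII only).
func takeOne(input string) (string, string, error) {
	if input == "" {
		return "", "", errors.New("takeOne: input exhausted")
	}
	return input[:1], input[1:], nil
}

// count applies p exactly n times, feeding each remainder into the next call.
// It fails as soon as any application fails.
func count[T any](p Parser[T], n int) Parser[[]T] {
	return func(input string) ([]T, string, error) {
		var values []T
		rest := input
		for i := 0; i < n; i++ {
			v, next, err := p(rest)
			if err != nil {
				return nil, "", fmt.Errorf("count: application %d of %d failed: %w", i+1, n, err)
			}
			values = append(values, v)
			rest = next
		}
		return values, rest, nil
	}
}

func main() {
	values, rest, err := count(takeOne, 3)("abcdef")
	fmt.Printf("%q %q %v\n", values, rest, err) // ["a" "b" "c"] "def" <nil>

	// Only two characters are available, so the third application fails.
	_, _, err = count(takeOne, 3)("ab")
	fmt.Println(err)
}
```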

func Exact

func Exact(match string) Parser[string]

Exact returns a Parser that consumes an exact, case-sensitive string from the input.

If the string is not present at the beginning of the input, an error will be returned.

An empty match string or empty input (i.e. "") will also return an error.

Exact is case-sensitive; if you need a case-insensitive match, use ExactCaseInsensitive instead.

Example
package main

import (
	"fmt"
	"os"

	"github.com/FollowTheProcess/parser"
)

func main() {
	input := "General Kenobi! You are a bold one."

	value, remainder, err := parser.Exact("General Kenobi!")(input)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
	}

	fmt.Printf("Value: %q\n", value)
	fmt.Printf("Remainder: %q\n", remainder)

}
Output:

Value: "General Kenobi!"
Remainder: " You are a bold one."

func ExactCaseInsensitive

func ExactCaseInsensitive(match string) Parser[string]

ExactCaseInsensitive returns a Parser that consumes an exact, case-insensitive string from the input.

If the string is not present at the beginning of the input, an error will be returned.

An empty match string or empty input (i.e. "") will also return an error.

ExactCaseInsensitive is case-insensitive; if you need a case-sensitive match, use Exact instead.

Example
package main

import (
	"fmt"
	"os"

	"github.com/FollowTheProcess/parser"
)

func main() {
	input := "GENERAL KENOBI! YOU ARE A BOLD ONE."

	value, remainder, err := parser.ExactCaseInsensitive("GEnErAl KeNobI!")(input)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
	}

	fmt.Printf("Value: %q\n", value)
	fmt.Printf("Remainder: %q\n", remainder)

}
Output:

Value: "GENERAL KENOBI!"
Remainder: " YOU ARE A BOLD ONE."

func Many

func Many[T any](parsers ...Parser[T]) Parser[[]T]

Many returns a Parser that calls a series of sub-parsers, passing the remainder from one as input to the next and returning a slice of values; one from each parser, and any remaining input after applying all the parsers.

If any of the parsers fail, an error will be returned.

Note: Because Many takes a variadic argument and returns a slice, it is one of the few parser functions to allocate on the heap.

Example
package main

import (
	"fmt"
	"os"
	"unicode"

	"github.com/FollowTheProcess/parser"
)

func main() {
	input := "1234abcd\t\n日ð本rest..."

	value, remainder, err := parser.Many(
		// Can do this in a number of ways, but here's one!
		parser.TakeWhile(unicode.IsDigit),
		parser.Exact("abcd"),
		parser.TakeWhile(unicode.IsSpace),
		parser.Char('日'),
		parser.Char('ð'),
		parser.Char('本'),
	)(input)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
	}

	fmt.Printf("Value: %#v\n", value)
	fmt.Printf("Remainder: %q\n", remainder)

}
Output:

Value: []string{"1234", "abcd", "\t\n", "日", "ð", "本"}
Remainder: "rest..."
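The way Many threads the remainder from one sub-parser into the next can be sketched in a few lines. This is an illustration under assumptions, not the library's code: many, exact, the local Parser type and the error messages are hypothetical names introduced here.

```go
package main

import (
	"fmt"
	"strings"
)

// Parser is a local copy of the documented signature.
type Parser[T any] func(input string) (T, string, error)

// exact is a simplified stand-in that matches a literal prefix.
func exact(match string) Parser[string] {
	return func(input string) (string, string, error) {
		if match == "" || !strings.HasPrefix(input, match) {
			return "", "", fmt.Errorf("exact: %q not at start of %q", match, input)
		}
		return match, input[len(match):], nil
	}
}

// many runs each parser in order on whatever the previous one left behind
// and collects one value per parser.
func many[T any](parsers ...Parser[T]) Parser[[]T] {
	return func(input string) ([]T, string, error) {
		values := make([]T, 0, len(parsers))
		rest := input
		for i, p := range parsers {
			v, next, err := p(rest)
			if err != nil {
				return nil, "", fmt.Errorf("many: parser %d failed: %w", i, err)
			}
			values = append(values, v)
			rest = next
		}
		return values, rest, nil
	}
}

func main() {
	requestLine := many(exact("GET"), exact(" "), exact("/index"))

	values, rest, err := requestLine("GET /index HTTP/1.1")
	fmt.Printf("%q %q %v\n", values, rest, err) // ["GET" " " "/index"] " HTTP/1.1" <nil>
}
```

The slice of results is where the heap allocation mentioned in the note comes from in this sketch.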

func Map

func Map[T1, T2 any](parser Parser[T1], fn func(T1) (T2, error)) Parser[T2]

Map returns a Parser that applies a function to the result of another parser.

It is particularly useful for parsing a section of string input, then converting that captured string to another type.

If the provided parser or the mapping function 'fn' returns an error, Map will bubble up this error to the caller.

Example
package main

import (
	"fmt"
	"os"
	"strconv"

	"github.com/FollowTheProcess/parser"
)

func main() {
	input := "27 <- this is a number" // Let's convert it to an int!

	value, remainder, err := parser.Map(parser.Take(2), strconv.Atoi)(input)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
	}

	fmt.Printf("Value %[1]d is type %[1]T\n", value)
	fmt.Printf("Remainder: %q\n", remainder)

}
Output:

Value 27 is type int
Remainder: " <- this is a number"

func NoneOf

func NoneOf(chars string) Parser[string]

NoneOf returns a Parser that recognises any char other than any of the provided characters from the start of input.

It can be considered the opposite of OneOf.

If the input or chars is empty, an error will be returned. Likewise if one of the chars was recognised.

Example
package main

import (
	"fmt"
	"os"

	"github.com/FollowTheProcess/parser"
)

func main() {
	input := "abcdefg"

	chars := "xyz" // Match anything other than 'x', 'y', or 'z' from input

	value, remainder, err := parser.NoneOf(chars)(input)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
	}

	fmt.Printf("Value: %q\n", value)
	fmt.Printf("Remainder: %q\n", remainder)

}
Output:

Value: "a"
Remainder: "bcdefg"

func NotAnyOf

func NotAnyOf(chars string) Parser[string]

NotAnyOf returns a Parser that continues taking characters so long as they are not contained in the passed in set of chars.

Parsing stops at the first occurrence of a character contained in the argument and the offending character is not included in the parsed value, but will be in the remainder.

NotAnyOf is the opposite of AnyOf.

If the input or chars is empty, an error will be returned. Likewise if any of the chars are present at the start of the input.

Example
package main

import (
	"fmt"
	"os"

	"github.com/FollowTheProcess/parser"
)

func main() {
	input := "69 is a number"

	chars := "abcdefghijklmnopqrstuvwxyz" // Parse until we hit any lowercase letter

	value, remainder, err := parser.NotAnyOf(chars)(input)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
	}

	fmt.Printf("Value: %q\n", value)
	fmt.Printf("Remainder: %q\n", remainder)

}
Output:

Value: "69 "
Remainder: "is a number"

func OneOf

func OneOf(chars string) Parser[string]

OneOf returns a Parser that recognises one of the provided characters from the start of input.

If you want to match anything other than the provided char set, use NoneOf.

If the input or chars is empty, an error will be returned. Likewise if none of the chars was recognised.

Example
package main

import (
	"fmt"
	"os"

	"github.com/FollowTheProcess/parser"
)

func main() {
	input := "abcdefg"

	chars := "abc" // Match any of 'a', 'b', or 'c' from input

	value, remainder, err := parser.OneOf(chars)(input)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
	}

	fmt.Printf("Value: %q\n", value)
	fmt.Printf("Remainder: %q\n", remainder)

}
Output:

Value: "a"
Remainder: "bcdefg"

func Optional added in v0.2.0

func Optional(match string) Parser[string]

Optional returns a Parser that recognises an optional exact string from the start of input.

If the match is there, it is returned as the value, with the remainder being the remaining input. If the match is not there, the entire input is returned as the remainder with no value and no error.

If the input is empty or invalid utf-8, then an error will be returned.

Example
package main

import (
	"fmt"
	"os"

	"github.com/FollowTheProcess/parser"
)

func main() {
	input := "12.6.7-rc.2" // A semver, but could have an optional v

	// Doesn't matter...
	value, remainder, err := parser.Optional("v")(input)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
	}

	fmt.Printf("Value: %q\n", value)
	fmt.Printf("Remainder: %q\n", remainder)

}
Output:

Value: ""
Remainder: "12.6.7-rc.2"

func Take

func Take(n int) Parser[string]

Take returns a Parser that consumes n utf-8 chars from the input.

If n is less than or equal to 0, or greater than the number of utf-8 chars in the input, an error will be returned.

Example
package main

import (
	"fmt"
	"os"

	"github.com/FollowTheProcess/parser"
)

func main() {
	input := "Hello I am some input for you to parser"

	value, remainder, err := parser.Take(10)(input)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
	}

	fmt.Printf("Value: %q\n", value)
	fmt.Printf("Remainder: %q\n", remainder)

}
Output:

Value: "Hello I am"
Remainder: " some input for you to parser"

func TakeTo

func TakeTo(match string) Parser[string]

TakeTo returns a Parser that consumes characters until it first hits an exact string.

If the input is empty or the exact string is not in the input, an error will be returned.

The value will contain everything from the start of the input up to the first occurrence of match, and the remainder will contain the match and everything thereafter.

Example
package main

import (
	"fmt"
	"os"

	"github.com/FollowTheProcess/parser"
)

func main() {
	input := "lots of stuff KEYWORD more stuff"

	value, remainder, err := parser.TakeTo("KEYWORD")(input)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
	}

	fmt.Printf("Value: %q\n", value)
	fmt.Printf("Remainder: %q\n", remainder)

}
Output:

Value: "lots of stuff "
Remainder: "KEYWORD more stuff"

func TakeUntil

func TakeUntil(predicate func(r rune) bool) Parser[string]

TakeUntil returns a Parser that continues taking characters until the predicate returns true; parsing stops as soon as the predicate returns true for a particular character. The last character for which the predicate returns false is captured; that is, TakeUntil is inclusive.

TakeUntil can be thought of as the inverse of TakeWhile.

If the input is empty or predicate == nil, an error will be returned.

If the predicate never returns true, the entire input will be returned as the value with no remainder.

If the predicate never returns false, an error will be returned.

Example
package main

import (
	"fmt"
	"os"
	"unicode"

	"github.com/FollowTheProcess/parser"
)

func main() {
	input := "something <- first whitespace is here"

	value, remainder, err := parser.TakeUntil(unicode.IsSpace)(input)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
	}

	fmt.Printf("Value: %q\n", value)
	fmt.Printf("Remainder: %q\n", remainder)

}
Output:

Value: "something"
Remainder: " <- first whitespace is here"

func TakeWhile

func TakeWhile(predicate func(r rune) bool) Parser[string]

TakeWhile returns a Parser that continues consuming characters so long as the predicate returns true; parsing stops as soon as the predicate returns false for a particular character. The last character for which the predicate returns true is captured; that is, TakeWhile is inclusive.

TakeWhile can be thought of as the inverse of TakeUntil.

If the input is empty or predicate == nil, an error will be returned.

If the predicate doesn't return false for any char in the input, the entire input is returned as the value with no remainder.

If the predicate never returns true, an error will be returned.

Example
package main

import (
	"fmt"
	"os"

	"github.com/FollowTheProcess/parser"
)

func main() {
	input := "本本本b語ç日ð本Ê語"

	pred := func(r rune) bool { return r == '本' }

	value, remainder, err := parser.TakeWhile(pred)(input)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
	}

	fmt.Printf("Value: %q\n", value)
	fmt.Printf("Remainder: %q\n", remainder)

}
Output:

Value: "本本本"
Remainder: "b語ç日ð本Ê語"

func TakeWhileBetween

func TakeWhileBetween(lower, upper int, predicate func(r rune) bool) Parser[string]

TakeWhileBetween returns a Parser that recognises the longest (lower <= len <= upper) sequence of utf-8 characters for which the predicate returns true.

Any of the following conditions will return an error:

  • input is empty
  • input is not valid utf-8
  • predicate is nil
  • lower < 0
  • lower > upper
  • predicate never returns true
  • predicate matched some chars but fewer than the lower limit

Example
package main

import (
	"fmt"
	"os"
	"strconv"

	"github.com/FollowTheProcess/parser"
)

func main() {
	input := "2F14DF" // A hex colour (minus the #)

	isHexDigit := func(r rune) bool {
		_, err := strconv.ParseUint(string(r), 16, 64)
		return err == nil
	}

	value, remainder, err := parser.TakeWhileBetween(2, 2, isHexDigit)(input)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
	}

	fmt.Printf("Value: %q\n", value)
	fmt.Printf("Remainder: %q\n", remainder)

}
Output:

Value: "2F"
Remainder: "14DF"

func Try

func Try[T any](parsers ...Parser[T]) Parser[T]

Try returns a Parser that attempts a series of sub-parsers, returning the output from the first successful one.

If all parsers fail, an error will be returned.

Note: Because Try takes a variadic argument, it is one of the few parser functions to allocate on the heap.

Example
package main

import (
	"fmt"
	"os"

	"github.com/FollowTheProcess/parser"
)

func main() {
	input := "xyzabc日ð本Ê語"

	value, remainder, err := parser.Try(
		parser.OneOf("abc"),                // Will fail
		parser.Char('本'),                   // Same
		parser.ExactCaseInsensitive("XyZ"), // Should succeed, this is the output we'll get
	)(input)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
	}

	fmt.Printf("Value: %q\n", value)
	fmt.Printf("Remainder: %q\n", remainder)

}
Output:

Value: "xyz"
Remainder: "abc日ð本Ê語"
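The key property of Try is that every alternative is attempted against the same original input, and the first success wins. A minimal sketch of that ordered-choice behaviour follows. It is an illustration only: try, exact, the local Parser type and the error wording are assumptions, not the library's code.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Parser is a local copy of the documented signature.
type Parser[T any] func(input string) (T, string, error)

// exact is a simplified stand-in that matches a literal prefix.
func exact(match string) Parser[string] {
	return func(input string) (string, string, error) {
		if match == "" || !strings.HasPrefix(input, match) {
			return "", "", fmt.Errorf("exact: %q not at start of %q", match, input)
		}
		return match, input[len(match):], nil
	}
}

// try attempts each parser in order on the original input and returns
// the result of the first one that succeeds.
func try[T any](parsers ...Parser[T]) Parser[T] {
	return func(input string) (T, string, error) {
		var zero T
		lastErr := errors.New("no parsers supplied")
		for _, p := range parsers {
			v, rest, err := p(input) // always the untouched input
			if err == nil {
				return v, rest, nil
			}
			lastErr = err
		}
		return zero, "", fmt.Errorf("try: all parsers failed, last error: %w", lastErr)
	}
}

func main() {
	scheme := try(exact("http://"), exact("https://"))

	value, rest, err := scheme("https://go.dev")
	fmt.Printf("%q %q %v\n", value, rest, err) // "https://" "go.dev" <nil>

	_, _, err = scheme("ftp://example")
	fmt.Println(err)
}
```

Order matters in this sketch: if two alternatives could both match, the one listed first is the one whose value is returned.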
