README

byline Reader

A Go library for reading and processing data from an io.Reader line by line. It lets you apply UNIX text-processing principles to a Reader (as with awk, grep, sed, ...).

Install

go get -u github.com/msoap/byline

Usage

import "github.com/msoap/byline"

// Create new line-by-line Reader from io.Reader:
lr := byline.NewReader(reader)

// Add filter functions to the Reader's processing stack:
lr.MapString(func(line string) string {return "prefix_" + line}).GrepByRegexp(regexp.MustCompile("only this"))

// Read all content
result, err := lr.ReadAll()

// Use it anywhere an io.Reader is expected
_, err := io.Copy(os.Stdout, lr)

// Or in one place
result, err := byline.NewReader(reader).MapString(func(line string) string {return "prefix_" + line}).ReadAll()

Filter functions

  • Map(func([]byte) []byte) - process each line as []byte.
  • MapErr(func([]byte) ([]byte, error)) - process each line as []byte; the function may also return an error (io.EOF or a custom error).
  • MapString(func(string) string) - process each line as a string.
  • MapStringErr(func(string) (string, error)) - process each line as a string; the function may also return an error.
  • Each(func([]byte)) - process each line without changing it.
  • EachString(func(string)) - process each line as a string without changing it.
  • Grep(func([]byte) bool) - filter lines with a function.
  • GrepString(func(string) bool) - filter lines as strings with a function.
  • GrepByRegexp(re *regexp.Regexp) - filter lines by regexp.
  • AWKMode(func(line string, fields []string, vars AWKVars) (string, error)) - process each line in AWK mode. In addition to the current line, filterFn receives a slice of fields split by the field separator (default /\s+/) and the AWK-related variables (NR, NF, RS, FS). Attention: use AWKMode() with caution on large data sets, see Overheads below.

The Map*Err and AWKMode filter functions can return byline.ErrOmitLine to discard the current line (see the sketch below).
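
For example, a minimal sketch (the input data and the "#" comment convention are illustrative) that uses MapErr with ErrOmitLine to drop comment lines:

package main

import (
	"bytes"
	"fmt"
	"strings"

	"github.com/msoap/byline"
)

func main() {
	reader := strings.NewReader("one\n# comment\ntwo\n")

	result, err := byline.NewReader(reader).
		MapErr(func(line []byte) ([]byte, error) {
			if bytes.HasPrefix(line, []byte("#")) {
				// discard this line and continue with the next one
				return nil, byline.ErrOmitLine
			}
			return line, nil
		}).
		ReadAllString()

	if err == nil {
		fmt.Print(result) // expected: "one\ntwo\n"
	}
}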

Helper methods

  • SetRS(rs byte) - set the line (record) separator; the default is a newline (\n).
  • SetFS(fs *regexp.Regexp) - set the field separator for AWK mode; the default is \s+.
  • Discard() - read and discard all content from the Reader, running the filter functions only for their side effects.
  • ReadAll() ([]byte, error) - return all content as a slice of bytes.
  • ReadAllSlice() ([][]byte, error) - return all content as [][]byte, one element per line.
  • ReadAllString() (string, error) - return all content as a string.
  • ReadAllSliceString() ([]string, error) - return all content as a slice of strings, one element per line.
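
For example, a short sketch (the '#'-separated input is illustrative) that changes the record separator and reads all records into a string slice:

package main

import (
	"fmt"
	"strings"

	"github.com/msoap/byline"
)

func main() {
	// records are separated by '#' instead of newlines
	reader := strings.NewReader("first#second#third")

	records, err := byline.NewReader(reader).SetRS('#').ReadAllSliceString()
	if err == nil {
		// each record keeps its separator, e.g. ["first#" "second#" "third"]
		fmt.Printf("%q\n", records)
	}
}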

Examples

Add a line number to each line and append a suffix at the end of each line:

reader := strings.NewReader("111\n222\n333")
// or read file
reader, err := os.Open("file.txt")
// or process response from HTTP client
reader := httpResponse.Body

i := 0
blr := byline.NewReader(reader).MapString(func(line string) string {
	i++
	return fmt.Sprintf("(%d) %s", i, line)
}).Map(func(line []byte) []byte {
	return regexp.MustCompile(`\n?$`).ReplaceAll(line, []byte(" suf\n"))
})

result, err := blr.ReadAll()

Select all type declarations from a Go source file:

type StateMachine struct {
	beginRe *regexp.Regexp
	endRe   *regexp.Regexp
	inBlock bool
}

func (sm *StateMachine) SMFilter(line []byte) bool {
	switch {
	case sm.beginRe.Match(line):
		sm.inBlock = true
		return true
	case sm.inBlock && sm.endRe.Match(line):
		sm.inBlock = false
		return true
	default:
		return sm.inBlock
	}
}

func ExampleReader_Grep() {
	file, err := os.Open("byline.go")
	if err != nil {
		fmt.Println(err)
		return
	}

	// get all lines between "^type..." and "^}"
	sm := StateMachine{
		beginRe: regexp.MustCompile(`^type `),
		endRe:   regexp.MustCompile(`^}\s+$`),
	}

	blr := byline.NewReader(file).Grep(sm.SMFilter).Map(func(line []byte) []byte {
		// and remove comments
		return regexp.MustCompile(`\s+//.+`).ReplaceAll(line, []byte{})
	})

	result, err := blr.ReadAllString()
	if err != nil {
		fmt.Println(err)
		return
	}

	fmt.Print(result)
}

Output:

type Reader struct {
	scanner     *bufio.Scanner
	buffer      bytes.Buffer
	existsData  bool
	filterFuncs []func(line []byte) ([]byte, error)
	awkVars     AWKVars
}
type AWKVars struct {
	NR int
	NF int
	RS byte
	FS *regexp.Regexp
}

An AWK-mode example: sum the third column, keeping only values greater than 10.0:

// CSV with "#" instead of "\n"
reader := strings.NewReader(`1,name one,12.3#2,second row;7.1#3,three row;15.51`)

sum := 0.0
err := byline.NewReader(reader).
	SetRS('#').
	SetFS(regexp.MustCompile(`[,;]`)).
	AWKMode(func(line string, fields []string, vars byline.AWKVars) (string, error) {
		if vars.NF < 3 {
			return "", fmt.Errorf("csv parse failed for %q", line)
		}

	if price, err := strconv.ParseFloat(fields[2], 64); err != nil {
			return "", err
		} else if price < 10 {
			return "", byline.ErrOmitLine
		} else {
			sum += price
			return "", nil
		}
	}).Discard()

if err == nil {
	fmt.Println("Price sum:", sum)
}

Output:

Price sum: 27.81

Overheads

An example in which we select the odd-numbered lines (from an io.Reader with 10,000 lines):

❯ make benchmark
go test -benchtime 5s -benchmem -bench .
Benchmark_NativeScannerBytes-4       	   20000	    312502 ns/op	  215080 B/op	      24 allocs/op
Benchmark_NativeScannerOnlyCount-4   	   30000	    217491 ns/op	    4160 B/op	       4 allocs/op
Benchmark_MapBytes-4                 	   10000	    567421 ns/op	  135184 B/op	      17 allocs/op
Benchmark_MapString-4                	    5000	   1408956 ns/op	  374000 B/op	   15018 allocs/op
Benchmark_Grep-4                     	   10000	    592100 ns/op	  135200 B/op	      18 allocs/op
Benchmark_GrepString-4               	    5000	   1151309 ns/op	  294416 B/op	   10019 allocs/op
Benchmark_Each-4                     	   10000	    562337 ns/op	    6201 B/op	      13 allocs/op
Benchmark_EachString-4               	   10000	    991528 ns/op	  165427 B/op	   10013 allocs/op
Benchmark_AWKMode-4                  	     500	  11865482 ns/op	 3410392 B/op	   55466 allocs/op
PASS

See benchmark_test.go for the benchmark code.

See also

  • io, ioutil, bufio - standard Go packages for working with Readers.
  • go-linereader - a package that reads lines from an io.Reader and puts them onto a channel.
  • AWK - the programming language and classic UNIX tool.

Documentation

Overview

Package byline implements a Reader for processing an io.Reader line by line. It lets you apply UNIX text-processing principles to a Reader (as with awk, grep, sed, ...).

Install

go get -u github.com/msoap/byline

Usage

import "github.com/msoap/byline"

// Create new line-by-line Reader from io.Reader:
lr := byline.NewReader(reader)

// Add filter functions to the Reader's processing stack:
lr.MapString(func(line string) string {return "prefix_" + line}).GrepByRegexp(regexp.MustCompile("only this"))

// Read all content
result, err := lr.ReadAll()

// Use it anywhere an io.Reader is expected
_, err := io.Copy(os.Stdout, lr)

// Or in one place
result, err := byline.NewReader(reader).MapString(func(line string) string {return "prefix_" + line}).ReadAll()
Example
package main

import (
	"bytes"
	"fmt"
	"io"
	"regexp"
	"strings"

	"github.com/msoap/byline"
)

func main() {
	reader := strings.NewReader(`CSV Title
CSV description
ID,NAME,PRICE
A001,name one,12.3

A002,second row;7.1
A003,three row;15.51
Total: ....
Some text
`)

	lr := byline.NewReader(reader).
		GrepString(func(line string) bool {
			// skip empty lines
			return line != "" && line != "\n"
		}).
		Grep(func(line []byte) bool {
			return !bytes.HasPrefix(line, []byte("CSV"))
		}).
		SetFS(regexp.MustCompile(`[,;]`)).
		AWKMode(func(line string, fields []string, vars byline.AWKVars) (string, error) {
			// skip header
			if strings.HasPrefix(fields[0], "ID") {
				return "", byline.ErrOmitLine
			}
			// skip footer
			if strings.HasPrefix(fields[0], "Total:") {
				return "", io.EOF
			}
			return line, nil
		}).
		MapString(func(line string) string {
			return "Z" + line
		}).
		AWKMode(func(line string, fields []string, vars byline.AWKVars) (string, error) {
			if vars.NF < 3 {
				return "", fmt.Errorf("csv parse failed for %q", line)
			}

			return fmt.Sprintf("%s - %s (line:%d)", fields[0], fields[1], vars.NR), nil
		})

	result, err := lr.ReadAllString()
	fmt.Print("\n", result, err)
}
Output:

ZA001 - name one (line:4)
ZA002 - second row (line:6)
ZA003 - three row (line:7)
<nil>

Constants

This section is empty.

Variables

var (
	// ErrOmitLine - error for Map*Err/AWKMode, for omitting current line
	ErrOmitLine = errors.New("ErrOmitLine")

	// ErrNilReader - error for provided reader being nil
	ErrNilReader = errors.New("nil reader")
)

Functions

This section is empty.

Types

type AWKVars

type AWKVars struct {
	NR int            // number of the current line (begin from 1)
	NF int            // number of fields in the current line
	RS byte           // record separator, default is '\n'
	FS *regexp.Regexp // field separator, default is `\s+`
}

AWKVars - settings for AWK mode, see man awk

type Reader

type Reader struct {
	// contains filtered or unexported fields
}

Reader - line by line Reader

func NewReader

func NewReader(reader io.Reader) *Reader

NewReader - create a new line-by-line Reader

func (*Reader) AWKMode

func (lr *Reader) AWKMode(filterFn func(line string, fields []string, vars AWKVars) (string, error)) *Reader

AWKMode - process lines in an AWK-like mode

Example
package main

import (
	"fmt"
	"io"
	"regexp"
	"strconv"
	"strings"

	"github.com/msoap/byline"
)

func main() {
	reader := strings.NewReader(`ID,NAME,PRICE
A001,name one,12.3
A002,second row;7.1
A003,three row;15.51
Total: ....
Some text
`)

	sum := 0.0
	lr := byline.NewReader(reader).
		SetFS(regexp.MustCompile(`[,;]`)).
		AWKMode(func(line string, fields []string, vars byline.AWKVars) (string, error) {
			if vars.NR == 1 {
				// skip first line
				return "", byline.ErrOmitLine
			}

			if vars.NF > 0 && strings.HasPrefix(fields[0], "Total:") {
				// skip rest of file
				return "", io.EOF
			}

			if vars.NF < 3 {
				return "", fmt.Errorf("csv parse failed for %q", line)
			}

			if price, err := strconv.ParseFloat(fields[2], 64); err != nil {
				return "", err
			} else if price < 10 {
				return "", byline.ErrOmitLine
			} else {
				sum += price
			}

			return fmt.Sprintf("line:%d. %s - %s", vars.NR, fields[0], fields[1]), nil
		})

	result, err := lr.ReadAllString()
	if err != nil {
		fmt.Println(err)
		return
	}

	fmt.Print(result)
	fmt.Printf("Sum: %.2f", sum)
}
Output:

line:2. A001 - name one
line:4. A003 - three row
Sum: 27.81

func (*Reader) Discard

func (lr *Reader) Discard() error

Discard - read all content from the Reader, only for the side effects of the filter functions

func (*Reader) Each

func (lr *Reader) Each(filterFn func([]byte)) *Reader

Each - process each line. Do not retain the byte slice: it may be changed by subsequent filter steps.

Example
package main

import (
	"fmt"
	"strings"

	"github.com/msoap/byline"
)

func main() {
	reader := strings.NewReader(`1 1 1
2 2 2
3 3 3
`)

	spacesCount, bytesCount, linesCount := 0, 0, 0
	err := byline.NewReader(reader).
		Each(func(line []byte) {
			linesCount++
			bytesCount += len(line)
			for _, b := range line {
				if b == ' ' {
					spacesCount++
				}
			}
		}).Discard()

	if err == nil {
		fmt.Printf("spaces: %d, bytes: %d, lines: %d\n", spacesCount, bytesCount, linesCount)
	}
}
Output:

spaces: 6, bytes: 18, lines: 3

func (*Reader) EachString

func (lr *Reader) EachString(filterFn func(string)) *Reader

EachString - process each line as a string

Example
package main

import (
	"fmt"
	"strings"

	"github.com/msoap/byline"
)

func main() {
	reader := strings.NewReader(`111
222
333
`)

	result := []string{}
	err := byline.NewReader(reader).
		EachString(func(line string) {
			result = append(result, line)
		}).Discard()

	if err == nil {
		fmt.Printf("%q\n", result)
	}
}
Output:

["111\n" "222\n" "333\n"]

func (*Reader) Grep

func (lr *Reader) Grep(filterFn func([]byte) bool) *Reader

Grep - filter lines with a function

Example
package main

import (
	"fmt"
	"os"
	"regexp"

	"github.com/msoap/byline"
)

type StateMachine struct {
	beginRe *regexp.Regexp
	endRe   *regexp.Regexp
	inBlock bool
}

func (sm *StateMachine) SMFilter(line []byte) bool {
	switch {
	case sm.beginRe.Match(line):
		sm.inBlock = true
		return true
	case sm.inBlock && sm.endRe.Match(line):
		sm.inBlock = false
		return true
	default:
		return sm.inBlock
	}
}

func main() {
	file, err := os.Open("byline.go")
	if err != nil {
		fmt.Println(err)
		return
	}

	// get all lines between "^type..." and "^}"
	sm := StateMachine{
		beginRe: regexp.MustCompile(`^type `),
		endRe:   regexp.MustCompile(`^}\s+$`),
	}

	lr := byline.NewReader(file).Grep(sm.SMFilter).Map(func(line []byte) []byte {
		// and remove comments
		return regexp.MustCompile(`\s+//.+`).ReplaceAll(line, []byte{})
	})

	result, err := lr.ReadAllString()
	if err != nil {
		fmt.Println(err)
		return
	}

	fmt.Print("\n" + result)
}
Output:

type Reader struct {
	scanner     *bufio.Scanner
	buffer      bytes.Buffer
	existsData  bool
	filterFuncs []func(line []byte) ([]byte, error)
	awkVars     AWKVars
}
type AWKVars struct {
	NR int
	NF int
	RS byte
	FS *regexp.Regexp
}

func (*Reader) GrepByRegexp

func (lr *Reader) GrepByRegexp(re *regexp.Regexp) *Reader

GrepByRegexp - filter lines by regexp

Example
package main

import (
	"fmt"
	"regexp"
	"strings"

	"github.com/msoap/byline"
)

func main() {
	reader := strings.NewReader(`ID,NAME,PRICE
A001,name one,12.3
A002,second row;7.1
A003,three row;15.51
Total: ....
Some text
`)

	result, err := byline.NewReader(reader).GrepByRegexp(regexp.MustCompile(`^A\d+,`)).ReadAllString()
	fmt.Print("\n"+result, err)
}
Output:

A001,name one,12.3
A002,second row;7.1
A003,three row;15.51
<nil>

func (*Reader) GrepString

func (lr *Reader) GrepString(filterFn func(string) bool) *Reader

GrepString - filter lines as strings with a function
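
A minimal usage sketch (the log-like input is illustrative), keeping only lines that contain a substring:
package main

import (
	"fmt"
	"strings"

	"github.com/msoap/byline"
)

func main() {
	reader := strings.NewReader("error: disk full\nok\nerror: timeout\n")

	result, err := byline.NewReader(reader).
		GrepString(func(line string) bool {
			return strings.Contains(line, "error:")
		}).
		ReadAllString()

	if err == nil {
		fmt.Print(result) // only the two "error:" lines remain
	}
}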

func (*Reader) Map

func (lr *Reader) Map(filterFn func([]byte) []byte) *Reader

Map - set a filter function for processing each line
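
A minimal sketch (the "> " prefix is arbitrary) that rewrites each line as a byte slice:
package main

import (
	"fmt"
	"strings"

	"github.com/msoap/byline"
)

func main() {
	reader := strings.NewReader("a\nb\n")

	result, err := byline.NewReader(reader).
		Map(func(line []byte) []byte {
			// lines arrive with their trailing separator included
			return append([]byte("> "), line...)
		}).
		ReadAllString()

	if err == nil {
		fmt.Print(result) // "> a\n> b\n"
	}
}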

func (*Reader) MapErr

func (lr *Reader) MapErr(filterFn func([]byte) ([]byte, error)) *Reader

MapErr - set a filter function for processing each line; it may return an error (io.EOF, for example)
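
A short sketch (the "END" marker is illustrative) that stops reading as soon as a marker line is seen:
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"

	"github.com/msoap/byline"
)

func main() {
	reader := strings.NewReader("one\ntwo\nEND\nthree\n")

	result, err := byline.NewReader(reader).
		MapErr(func(line []byte) ([]byte, error) {
			if bytes.HasPrefix(line, []byte("END")) {
				// skip the rest of the input
				return nil, io.EOF
			}
			return line, nil
		}).
		ReadAllString()

	if err == nil {
		fmt.Print(result) // "one\ntwo\n"
	}
}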

func (*Reader) MapString

func (lr *Reader) MapString(filterFn func(string) string) *Reader

MapString - set a filter function for processing each line as a string
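
A minimal sketch, upper-casing each line:
package main

import (
	"fmt"
	"strings"

	"github.com/msoap/byline"
)

func main() {
	reader := strings.NewReader("alpha\nbeta\n")

	// strings.ToUpper already has the func(string) string signature
	result, err := byline.NewReader(reader).MapString(strings.ToUpper).ReadAllString()
	if err == nil {
		fmt.Print(result) // "ALPHA\nBETA\n"
	}
}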

func (*Reader) MapStringErr

func (lr *Reader) MapStringErr(filterFn func(string) (string, error)) *Reader

MapStringErr - set a filter function for processing each line as a string; it may return an error (io.EOF, for example)

Example
package main

import (
	"fmt"
	"io"
	"strings"

	"github.com/msoap/byline"
)

func main() {
	reader := strings.NewReader(`
100000
200000
300000
end ...
Some text
`)

	result, err := byline.NewReader(reader).
		MapStringErr(func(line string) (string, error) {
			switch {
			case line == "" || line == "\n":
				return "", byline.ErrOmitLine
			case strings.HasPrefix(line, "end "):
				return "", io.EOF
			default:
				return "<" + line, nil
			}
		}).
		ReadAllString()

	fmt.Print("\n"+result, err)
}
Output:

<100000
<200000
<300000
<nil>

func (*Reader) Read

func (lr *Reader) Read(p []byte) (n int, err error)

Read - implements the io.Reader interface
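
Because Reader implements io.Reader, it can be passed to anything that consumes one; a sketch streaming filtered lines to os.Stdout with io.Copy:
package main

import (
	"io"
	"log"
	"os"
	"strings"

	"github.com/msoap/byline"
)

func main() {
	reader := strings.NewReader("one\ntwo\n")

	lr := byline.NewReader(reader).MapString(func(line string) string {
		return "# " + line
	})

	// stream the filtered lines without buffering the whole result in memory
	if _, err := io.Copy(os.Stdout, lr); err != nil {
		log.Fatal(err)
	}
}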

func (*Reader) ReadAll

func (lr *Reader) ReadAll() ([]byte, error)

ReadAll - read all content from the Reader into a byte slice

func (*Reader) ReadAllSlice

func (lr *Reader) ReadAllSlice() ([][]byte, error)

ReadAllSlice - read all content from the Reader into a slice of []byte, one element per line
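
A minimal sketch; each element of the result is one line, including its separator:
package main

import (
	"fmt"
	"strings"

	"github.com/msoap/byline"
)

func main() {
	reader := strings.NewReader("a\nb\nc\n")

	lines, err := byline.NewReader(reader).ReadAllSlice()
	if err == nil {
		fmt.Println(len(lines)) // 3
	}
}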

func (*Reader) ReadAllSliceString

func (lr *Reader) ReadAllSliceString() ([]string, error)

ReadAllSliceString - read all content from the Reader into a slice of strings, one element per line
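
A minimal sketch (the prefix filter is illustrative) combining GrepString with ReadAllSliceString:
package main

import (
	"fmt"
	"strings"

	"github.com/msoap/byline"
)

func main() {
	reader := strings.NewReader("aa\nb\nac\n")

	lines, err := byline.NewReader(reader).
		GrepString(func(line string) bool {
			return strings.HasPrefix(line, "a")
		}).
		ReadAllSliceString()

	if err == nil {
		fmt.Printf("%q\n", lines) // ["aa\n" "ac\n"]
	}
}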

func (*Reader) ReadAllString

func (lr *Reader) ReadAllString() (string, error)

ReadAllString - read all content from the Reader into a single string

func (*Reader) SetFS

func (lr *Reader) SetFS(fs *regexp.Regexp) *Reader

SetFS - set the field separator for AWK mode

func (*Reader) SetRS

func (lr *Reader) SetRS(rs byte) *Reader

SetRS - set the line (record) separator
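
A short sketch (the ';' separator is illustrative) that splits records on ';' instead of newlines:
package main

import (
	"fmt"
	"strings"

	"github.com/msoap/byline"
)

func main() {
	reader := strings.NewReader("one;two;three")

	result, err := byline.NewReader(reader).
		SetRS(';').
		MapString(func(line string) string { return "<" + line }).
		ReadAllString()

	if err == nil {
		fmt.Println(result) // "<one;<two;<three"
	}
}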
