README

byline Reader

A Go library for reading and processing data from an io.Reader line by line. It lets you apply UNIX text-processing principles to a Reader (as with awk, grep, sed, ...).

Install

go get -u github.com/msoap/byline

Usage

import "github.com/msoap/byline"

// Create new line-by-line Reader from io.Reader:
lr := byline.NewReader(reader)

// Add filter functions to the Reader's processing stack:
lr.MapString(func(line string) string {return "prefix_" + line}).GrepByRegexp(regexp.MustCompile("only this"))

// Read all content
result, err := lr.ReadAll()

// Use it anywhere an io.Reader is expected
_, err := io.Copy(os.Stdout, lr)

// Or in one place
result, err := byline.NewReader(reader).MapString(func(line string) string {return "prefix_" + line}).ReadAll()

Filter functions

  • Map(func([]byte) []byte) - process each line as []byte.
  • MapErr(func([]byte) ([]byte, error)) - process each line as []byte; the function may also return an error (io.EOF or a custom error).
  • MapString(func(string) string) - process each line as a string.
  • MapStringErr(func(string) (string, error)) - process each line as a string; the function may also return an error.
  • Each(func([]byte)) - process each line without changing it.
  • EachString(func(string)) - process each line as a string without changing it.
  • Grep(func([]byte) bool) - filter lines with a function.
  • GrepString(func(string) bool) - filter lines as strings with a function.
  • GrepByRegexp(re *regexp.Regexp) - filter lines by regexp.
  • AWKMode(func(line string, fields []string, vars AWKVars) (string, error)) - process each line in AWK mode. In addition to the current line, filterFn receives a slice of fields split by the field separator (default /\s+/) and the AWK-related variables (NR, NF, RS, FS). Attention: use AWKMode() with caution on large data sets, see Overheads below.

The Map*Err and AWKMode filter functions can return byline.ErrOmitLine to discard the current line (see the sketch below).
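
For example, a minimal sketch (the input data and the "#" comment convention are illustrative) that uses MapErr with ErrOmitLine to drop comment lines:

package main

import (
	"bytes"
	"fmt"
	"strings"

	"github.com/msoap/byline"
)

func main() {
	reader := strings.NewReader("one\n# comment\ntwo\n")

	result, err := byline.NewReader(reader).
		MapErr(func(line []byte) ([]byte, error) {
			if bytes.HasPrefix(line, []byte("#")) {
				// discard this line and continue with the next one
				return nil, byline.ErrOmitLine
			}
			return line, nil
		}).
		ReadAllString()

	if err == nil {
		fmt.Print(result) // expected: "one\ntwo\n"
	}
}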

Helper methods

  • SetRS(rs byte) - set the line (record) separator; the default is a newline (\n).
  • SetFS(fs *regexp.Regexp) - set the field separator for AWK mode; the default is \s+.
  • Discard() - read and discard all content from the Reader, running the filter functions only for their side effects.
  • ReadAll() ([]byte, error) - return all content as a slice of bytes.
  • ReadAllSlice() ([][]byte, error) - return all content as [][]byte, one element per line.
  • ReadAllString() (string, error) - return all content as a string.
  • ReadAllSliceString() ([]string, error) - return all content as a slice of strings, one element per line.
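
For example, a short sketch (the '#'-separated input is illustrative) that changes the record separator and reads all records into a string slice:

package main

import (
	"fmt"
	"strings"

	"github.com/msoap/byline"
)

func main() {
	// records are separated by '#' instead of newlines
	reader := strings.NewReader("first#second#third")

	records, err := byline.NewReader(reader).SetRS('#').ReadAllSliceString()
	if err == nil {
		// each record keeps its separator, e.g. ["first#" "second#" "third"]
		fmt.Printf("%q\n", records)
	}
}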

Examples

Add a line number to each line and append a suffix at the end of each line:

reader := strings.NewReader("111\n222\n333")
// or read file
reader, err := os.Open("file.txt")
// or process response from HTTP client
reader := httpResponse.Body

i := 0
blr := byline.NewReader(reader).MapString(func(line string) string {
	i++
	return fmt.Sprintf("(%d) %s", i, line)
}).Map(func(line []byte) []byte {
	return regexp.MustCompile(`\n?$`).ReplaceAll(line, []byte(" suf\n"))
})

result, err := blr.ReadAll()

Select all type declarations from a Go source file:

type StateMachine struct {
	beginRe *regexp.Regexp
	endRe   *regexp.Regexp
	inBlock bool
}

func (sm *StateMachine) SMFilter(line []byte) bool {
	switch {
	case sm.beginRe.Match(line):
		sm.inBlock = true
		return true
	case sm.inBlock && sm.endRe.Match(line):
		sm.inBlock = false
		return true
	default:
		return sm.inBlock
	}
}

func ExampleReader_Grep() {
	file, err := os.Open("byline.go")
	if err != nil {
		fmt.Println(err)
		return
	}

	// get all lines between "^type..." and "^}"
	sm := StateMachine{
		beginRe: regexp.MustCompile(`^type `),
		endRe:   regexp.MustCompile(`^}\s+$`),
	}

	blr := byline.NewReader(file).Grep(sm.SMFilter).Map(func(line []byte) []byte {
		// and remove comments
		return regexp.MustCompile(`\s+//.+`).ReplaceAll(line, []byte{})
	})

	result, err := blr.ReadAllString()
	if err != nil {
		fmt.Println(err)
		return
	}

	fmt.Print(result)
}

Output:

type Reader struct {
	scanner     *bufio.Scanner
	buffer      bytes.Buffer
	existsData  bool
	filterFuncs []func(line []byte) ([]byte, error)
	awkVars     AWKVars
}
type AWKVars struct {
	NR int
	NF int
	RS byte
	FS *regexp.Regexp
}

An AWK-mode example: sum the third column, keeping only values greater than 10.0:

// CSV with "#" instead of "\n"
reader := strings.NewReader(`1,name one,12.3#2,second row;7.1#3,three row;15.51`)

sum := 0.0
err := byline.NewReader(reader).
	SetRS('#').
	SetFS(regexp.MustCompile(`[,;]`)).
	AWKMode(func(line string, fields []string, vars byline.AWKVars) (string, error) {
		if vars.NF < 3 {
			return "", fmt.Errorf("csv parse failed for %q", line)
		}

	if price, err := strconv.ParseFloat(fields[2], 64); err != nil {
			return "", err
		} else if price < 10 {
			return "", byline.ErrOmitLine
		} else {
			sum += price
			return "", nil
		}
	}).Discard()

if err == nil {
	fmt.Println("Price sum:", sum)
}

Output:

Price sum: 27.81

Overheads

An example in which we select the odd-numbered lines (from an io.Reader with 10,000 lines):

❯ make benchmark
go test -benchtime 5s -benchmem -bench .
Benchmark_NativeScannerBytes-4       	   20000	    312502 ns/op	  215080 B/op	      24 allocs/op
Benchmark_NativeScannerOnlyCount-4   	   30000	    217491 ns/op	    4160 B/op	       4 allocs/op
Benchmark_MapBytes-4                 	   10000	    567421 ns/op	  135184 B/op	      17 allocs/op
Benchmark_MapString-4                	    5000	   1408956 ns/op	  374000 B/op	   15018 allocs/op
Benchmark_Grep-4                     	   10000	    592100 ns/op	  135200 B/op	      18 allocs/op
Benchmark_GrepString-4               	    5000	   1151309 ns/op	  294416 B/op	   10019 allocs/op
Benchmark_Each-4                     	   10000	    562337 ns/op	    6201 B/op	      13 allocs/op
Benchmark_EachString-4               	   10000	    991528 ns/op	  165427 B/op	   10013 allocs/op
Benchmark_AWKMode-4                  	     500	  11865482 ns/op	 3410392 B/op	   55466 allocs/op
PASS

See benchmark_test.go for the benchmark code.

See also

  • io, ioutil, bufio - standard Go packages for working with Readers.
  • go-linereader - a package that reads lines from an io.Reader and puts them onto a channel.
  • AWK - the programming language and classic UNIX tool.

Documentation

Overview

Package byline implements a Reader for processing an io.Reader line by line. It lets you apply UNIX text-processing principles to a Reader (as with awk, grep, sed, ...).

Install

go get -u github.com/msoap/byline

Usage

import "github.com/msoap/byline"

// Create new line-by-line Reader from io.Reader:
lr := byline.NewReader(reader)

// Add filter functions to the Reader's processing stack:
lr.MapString(func(line string) string {return "prefix_" + line}).GrepByRegexp(regexp.MustCompile("only this"))

// Read all content
result, err := lr.ReadAll()

// Use it anywhere an io.Reader is expected
_, err := io.Copy(os.Stdout, lr)

// Or in one place
result, err := byline.NewReader(reader).MapString(func(line string) string {return "prefix_" + line}).ReadAll()
Example
package main

import (
	"bytes"
	"fmt"
	"io"
	"regexp"
	"strings"

	"github.com/msoap/byline"
)

func main() {
	reader := strings.NewReader(`CSV Title
CSV description
ID,NAME,PRICE
A001,name one,12.3

A002,second row;7.1
A003,three row;15.51
Total: ....
Some text
`)

	lr := byline.NewReader(reader).
		GrepString(func(line string) bool {
			// skip empty lines
			return line != "" && line != "\n"
		}).
		Grep(func(line []byte) bool {
			return !bytes.HasPrefix(line, []byte("CSV"))
		}).
		SetFS(regexp.MustCompile(`[,;]`)).
		AWKMode(func(line string, fields []string, vars byline.AWKVars) (string, error) {
			// skip header
			if strings.HasPrefix(fields[0], "ID") {
				return "", byline.ErrOmitLine
			}
			// skip footer
			if strings.HasPrefix(fields[0], "Total:") {
				return "", io.EOF
			}
			return line, nil
		}).
		MapString(func(line string) string {
			return "Z" + line
		}).
		AWKMode(func(line string, fields []string, vars byline.AWKVars) (string, error) {
			if vars.NF < 3 {
				return "", fmt.Errorf("csv parse failed for %q", line)
			}

			return fmt.Sprintf("%s - %s (line:%d)", fields[0], fields[1], vars.NR), nil
		})

	result, err := lr.ReadAllString()
	fmt.Print("\n", result, err)
}
Output:

ZA001 - name one (line:4)
ZA002 - second row (line:6)
ZA003 - three row (line:7)
<nil>

Constants

This section is empty.

Variables

var (
	// ErrOmitLine - error for Map*Err/AWKMode, for omitting current line
	ErrOmitLine = errors.New("ErrOmitLine")

	// ErrNilReader - error for provided reader being nil
	ErrNilReader = errors.New("nil reader")
)

Functions

This section is empty.

Types

type AWKVars

type AWKVars struct {
	NR int            // number of the current line (begin from 1)
	NF int            // number of fields in the current line
	RS byte           // record separator, default is '\n'
	FS *regexp.Regexp // field separator, default is `\s+`
}

AWKVars - settings for AWK mode, see man awk

type Reader

type Reader struct {
	// contains filtered or unexported fields
}

Reader - line by line Reader

func NewReader

func NewReader(reader io.Reader) *Reader

NewReader - create a new line-by-line Reader

func (*Reader) AWKMode

func (lr *Reader) AWKMode(filterFn func(line string, fields []string, vars AWKVars) (string, error)) *Reader

AWKMode - process lines in an AWK-like mode

Example
package main

import (
	"fmt"
	"io"
	"regexp"
	"strconv"
	"strings"

	"github.com/msoap/byline"
)

func main() {
	reader := strings.NewReader(`ID,NAME,PRICE
A001,name one,12.3
A002,second row;7.1
A003,three row;15.51
Total: ....
Some text
`)

	sum := 0.0
	lr := byline.NewReader(reader).
		SetFS(regexp.MustCompile(`[,;]`)).
		AWKMode(func(line string, fields []string, vars byline.AWKVars) (string, error) {
			if vars.NR == 1 {
				// skip first line
				return "", byline.ErrOmitLine
			}

			if vars.NF > 0 && strings.HasPrefix(fields[0], "Total:") {
				// skip rest of file
				return "", io.EOF
			}

			if vars.NF < 3 {
				return "", fmt.Errorf("csv parse failed for %q", line)
			}

			if price, err := strconv.ParseFloat(fields[2], 64); err != nil {
				return "", err
			} else if price < 10 {
				return "", byline.ErrOmitLine
			} else {
				sum += price
			}

			return fmt.Sprintf("line:%d. %s - %s", vars.NR, fields[0], fields[1]), nil
		})

	result, err := lr.ReadAllString()
	if err != nil {
		fmt.Println(err)
		return
	}

	fmt.Print(result)
	fmt.Printf("Sum: %.2f", sum)
}
Output:

line:2. A001 - name one
line:4. A003 - three row
Sum: 27.81

func (*Reader) Discard

func (lr *Reader) Discard() error

Discard - read all content from the Reader, only for the side effects of the filter functions

func (*Reader) Each

func (lr *Reader) Each(filterFn func([]byte)) *Reader

Each - process each line. Do not retain the byte slice: it may be changed by subsequent filter steps.

Example
package main

import (
	"fmt"
	"strings"

	"github.com/msoap/byline"
)

func main() {
	reader := strings.NewReader(`1 1 1
2 2 2
3 3 3
`)

	spacesCount, bytesCount, linesCount := 0, 0, 0
	err := byline.NewReader(reader).
		Each(func(line []byte) {
			linesCount++
			bytesCount += len(line)
			for _, b := range line {
				if b == ' ' {
					spacesCount++
				}
			}
		}).Discard()

	if err == nil {
		fmt.Printf("spaces: %d, bytes: %d, lines: %d\n", spacesCount, bytesCount, linesCount)
	}
}
Output:

spaces: 6, bytes: 18, lines: 3

func (*Reader) EachString

func (lr *Reader) EachString(filterFn func(string)) *Reader

EachString - process each line as a string

Example
package main

import (
	"fmt"
	"strings"

	"github.com/msoap/byline"
)

func main() {
	reader := strings.NewReader(`111
222
333
`)

	result := []string{}
	err := byline.NewReader(reader).
		EachString(func(line string) {
			result = append(result, line)
		}).Discard()

	if err == nil {
		fmt.Printf("%q\n", result)
	}
}
Output:

["111\n" "222\n" "333\n"]

func (*Reader) Grep

func (lr *Reader) Grep(filterFn func([]byte) bool) *Reader

Grep - filter lines with a function

Example
package main

import (
	"fmt"
	"os"
	"regexp"

	"github.com/msoap/byline"
)

type StateMachine struct {
	beginRe *regexp.Regexp
	endRe   *regexp.Regexp
	inBlock bool
}

func (sm *StateMachine) SMFilter(line []byte) bool {
	switch {
	case sm.beginRe.Match(line):
		sm.inBlock = true
		return true
	case sm.inBlock && sm.endRe.Match(line):
		sm.inBlock = false
		return true
	default:
		return sm.inBlock
	}
}

func main() {
	file, err := os.Open("byline.go")
	if err != nil {
		fmt.Println(err)
		return
	}

	// get all lines between "^type..." and "^}"
	sm := StateMachine{
		beginRe: regexp.MustCompile(`^type `),
		endRe:   regexp.MustCompile(`^}\s+$`),
	}

	lr := byline.NewReader(file).Grep(sm.SMFilter).Map(func(line []byte) []byte {
		// and remove comments
		return regexp.MustCompile(`\s+//.+`).ReplaceAll(line, []byte{})
	})

	result, err := lr.ReadAllString()
	if err != nil {
		fmt.Println(err)
		return
	}

	fmt.Print("\n" + result)
}
Output:

type Reader struct {
	scanner     *bufio.Scanner
	buffer      bytes.Buffer
	existsData  bool
	filterFuncs []func(line []byte) ([]byte, error)
	awkVars     AWKVars
}
type AWKVars struct {
	NR int
	NF int
	RS byte
	FS *regexp.Regexp
}

func (*Reader) GrepByRegexp

func (lr *Reader) GrepByRegexp(re *regexp.Regexp) *Reader

GrepByRegexp - filter lines by regexp

Example
package main

import (
	"fmt"
	"regexp"
	"strings"

	"github.com/msoap/byline"
)

func main() {
	reader := strings.NewReader(`ID,NAME,PRICE
A001,name one,12.3
A002,second row;7.1
A003,three row;15.51
Total: ....
Some text
`)

	result, err := byline.NewReader(reader).GrepByRegexp(regexp.MustCompile(`^A\d+,`)).ReadAllString()
	fmt.Print("\n"+result, err)
}
Output:

A001,name one,12.3
A002,second row;7.1
A003,three row;15.51
<nil>

func (*Reader) GrepString

func (lr *Reader) GrepString(filterFn func(string) bool) *Reader

GrepString - filter lines as strings with a function
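
A minimal usage sketch (the log-like input is illustrative), keeping only lines that contain a substring:
package main

import (
	"fmt"
	"strings"

	"github.com/msoap/byline"
)

func main() {
	reader := strings.NewReader("error: disk full\nok\nerror: timeout\n")

	result, err := byline.NewReader(reader).
		GrepString(func(line string) bool {
			return strings.Contains(line, "error:")
		}).
		ReadAllString()

	if err == nil {
		fmt.Print(result) // only the two "error:" lines remain
	}
}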

func (*Reader) Map

func (lr *Reader) Map(filterFn func([]byte) []byte) *Reader

Map - set a filter function for processing each line
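
A minimal sketch (the "> " prefix is arbitrary) that rewrites each line as a byte slice:
package main

import (
	"fmt"
	"strings"

	"github.com/msoap/byline"
)

func main() {
	reader := strings.NewReader("a\nb\n")

	result, err := byline.NewReader(reader).
		Map(func(line []byte) []byte {
			// lines arrive with their trailing separator included
			return append([]byte("> "), line...)
		}).
		ReadAllString()

	if err == nil {
		fmt.Print(result) // "> a\n> b\n"
	}
}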

func (*Reader) MapErr

func (lr *Reader) MapErr(filterFn func([]byte) ([]byte, error)) *Reader

MapErr - set a filter function for processing each line; it may return an error (io.EOF, for example)
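
A short sketch (the "END" marker is illustrative) that stops reading as soon as a marker line is seen:
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"

	"github.com/msoap/byline"
)

func main() {
	reader := strings.NewReader("one\ntwo\nEND\nthree\n")

	result, err := byline.NewReader(reader).
		MapErr(func(line []byte) ([]byte, error) {
			if bytes.HasPrefix(line, []byte("END")) {
				// skip the rest of the input
				return nil, io.EOF
			}
			return line, nil
		}).
		ReadAllString()

	if err == nil {
		fmt.Print(result) // "one\ntwo\n"
	}
}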

func (*Reader) MapString

func (lr *Reader) MapString(filterFn func(string) string) *Reader

MapString - set a filter function for processing each line as a string
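
A minimal sketch, upper-casing each line:
package main

import (
	"fmt"
	"strings"

	"github.com/msoap/byline"
)

func main() {
	reader := strings.NewReader("alpha\nbeta\n")

	// strings.ToUpper already has the func(string) string signature
	result, err := byline.NewReader(reader).MapString(strings.ToUpper).ReadAllString()
	if err == nil {
		fmt.Print(result) // "ALPHA\nBETA\n"
	}
}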

func (*Reader) MapStringErr

func (lr *Reader) MapStringErr(filterFn func(string) (string, error)) *Reader

MapStringErr - set a filter function for processing each line as a string; it may return an error (io.EOF, for example)

Example
package main

import (
	"fmt"
	"io"
	"strings"

	"github.com/msoap/byline"
)

func main() {
	reader := strings.NewReader(`
100000
200000
300000
end ...
Some text
`)

	result, err := byline.NewReader(reader).
		MapStringErr(func(line string) (string, error) {
			switch {
			case line == "" || line == "\n":
				return "", byline.ErrOmitLine
			case strings.HasPrefix(line, "end "):
				return "", io.EOF
			default:
				return "<" + line, nil
			}
		}).
		ReadAllString()

	fmt.Print("\n"+result, err)
}
Output:

<100000
<200000
<300000
<nil>

func (*Reader) Read

func (lr *Reader) Read(p []byte) (n int, err error)

Read - implements the io.Reader interface
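
Because Reader implements io.Reader, it can be passed to anything that consumes one; a sketch streaming filtered lines to os.Stdout with io.Copy:
package main

import (
	"io"
	"log"
	"os"
	"strings"

	"github.com/msoap/byline"
)

func main() {
	reader := strings.NewReader("one\ntwo\n")

	lr := byline.NewReader(reader).MapString(func(line string) string {
		return "# " + line
	})

	// stream the filtered lines without buffering the whole result in memory
	if _, err := io.Copy(os.Stdout, lr); err != nil {
		log.Fatal(err)
	}
}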

func (*Reader) ReadAll

func (lr *Reader) ReadAll() ([]byte, error)

ReadAll - read all content from the Reader into a byte slice

func (*Reader) ReadAllSlice

func (lr *Reader) ReadAllSlice() ([][]byte, error)

ReadAllSlice - read all content from the Reader into a slice of []byte, one element per line
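
A minimal sketch; each element of the result is one line, including its separator:
package main

import (
	"fmt"
	"strings"

	"github.com/msoap/byline"
)

func main() {
	reader := strings.NewReader("a\nb\nc\n")

	lines, err := byline.NewReader(reader).ReadAllSlice()
	if err == nil {
		fmt.Println(len(lines)) // 3
	}
}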

func (*Reader) ReadAllSliceString

func (lr *Reader) ReadAllSliceString() ([]string, error)

ReadAllSliceString - read all content from the Reader into a slice of strings, one element per line
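
A minimal sketch (the prefix filter is illustrative) combining GrepString with ReadAllSliceString:
package main

import (
	"fmt"
	"strings"

	"github.com/msoap/byline"
)

func main() {
	reader := strings.NewReader("aa\nb\nac\n")

	lines, err := byline.NewReader(reader).
		GrepString(func(line string) bool {
			return strings.HasPrefix(line, "a")
		}).
		ReadAllSliceString()

	if err == nil {
		fmt.Printf("%q\n", lines) // ["aa\n" "ac\n"]
	}
}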

func (*Reader) ReadAllString

func (lr *Reader) ReadAllString() (string, error)

ReadAllString - read all content from the Reader into a single string

func (*Reader) SetFS

func (lr *Reader) SetFS(fs *regexp.Regexp) *Reader

SetFS - set the field separator for AWK mode

func (*Reader) SetRS

func (lr *Reader) SetRS(rs byte) *Reader

SetRS - set the line (record) separator
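
A short sketch (the ';' separator is illustrative) that splits records on ';' instead of newlines:
package main

import (
	"fmt"
	"strings"

	"github.com/msoap/byline"
)

func main() {
	reader := strings.NewReader("one;two;three")

	result, err := byline.NewReader(reader).
		SetRS(';').
		MapString(func(line string) string { return "<" + line }).
		ReadAllString()

	if err == nil {
		fmt.Println(result) // "<one;<two;<three"
	}
}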
