gobls

v1.3.5
Warning: This package is not in the latest version of its module.
Published: Oct 3, 2019 License: MIT Imports: 3 Imported by: 4

README

gobls

Gobls is a buffered line scanner for Go.


Description

Similar to bufio.Scanner, but wraps bufio.Reader.ReadLine so lines of arbitrary length can be scanned. It uses a hybrid approach so that in most cases, when lines are not unusually long, the fast code path is taken. When lines are unusually long, it uses the per-scanner pre-allocated byte slice to reassemble the fragments into a single slice of bytes.

Example

Enumerating lines from an io.Reader (drop-in replacement for bufio.Scanner)

When you have an io.Reader whose lines you want to enumerate, you would normally wrap it in a bufio.Scanner. This library is a drop-in replacement for that case: change bufio.NewScanner(r) to gobls.NewScanner(r), and you no longer have to worry about bufio.ErrTooLong ("token too long") errors.

    var lines, nbytes int
    ls := gobls.NewScanner(os.Stdin)
    for ls.Scan() {
        lines++
        nbytes += len(ls.Bytes()) // counts bytes, not runes
    }
    if err := ls.Err(); err != nil {
        fmt.Fprintln(os.Stderr, "cannot scan:", err)
    }
    fmt.Println("Counted", lines, "lines and", nbytes, "bytes.")

Enumerating lines from []byte

If you already have a slice of bytes whose lines you want to enumerate, it is much faster to wrap that slice with gobls.NewBufferScanner(buf) than to wrap it in an io.Reader and use either gobls.NewScanner or bufio.NewScanner.

    var lines, nbytes int
    ls := gobls.NewBufferScanner(buf)
    for ls.Scan() {
        lines++
        nbytes += len(ls.Bytes()) // counts bytes, not runes
    }
    if err := ls.Err(); err != nil {
        fmt.Fprintln(os.Stderr, "cannot scan:", err)
    }
    fmt.Println("Counted", lines, "lines and", nbytes, "bytes.")

Performance

The BufferScanner is faster than bufio.Scanner in every benchmark. The regular Scanner, however, is slower: on my test system it takes from 2% to nearly 40% longer than bufio.Scanner, depending on the length of the lines being scanned. The 40% longer times were only observed when lines were bufio.MaxScanTokenSize bytes long; usually the penalty is 2% to 15% relative to bufio.Scanner.

Run go test -bench=. -benchmem on your system for comparison. I'm sure the testing method could be improved; suggestions are welcome.

If you never need to scan lines longer than bufio.MaxScanTokenSize, the maximum token size of bufio.Scanner, I recommend using the standard library.

On the other hand, if you already have a slice of bytes, this library's BufferScanner is much faster than the equivalent bufio.NewScanner(bytes.NewReader(buf)).

$ go test -bench=. -benchmem
goos: linux
goarch: amd64
pkg: github.com/karrick/gobls
BenchmarkScanner/0064/bufio-8               30000000   43.7  ns/op  0  B/op  0  allocs/op
BenchmarkScanner/0064/reader-8              20000000   59.2  ns/op  0  B/op  0  allocs/op
BenchmarkScanner/0064/buffer-8              50000000   33.7  ns/op  0  B/op  0  allocs/op
BenchmarkScanner/0128/bufio-8               30000000   54.5  ns/op  0  B/op  0  allocs/op
BenchmarkScanner/0128/reader-8              20000000   70.5  ns/op  0  B/op  0  allocs/op
BenchmarkScanner/0128/buffer-8              30000000   38.9  ns/op  0  B/op  0  allocs/op
BenchmarkScanner/0256/bufio-8               20000000   79.8  ns/op  0  B/op  0  allocs/op
BenchmarkScanner/0256/reader-8              20000000   94.9  ns/op  0  B/op  0  allocs/op
BenchmarkScanner/0256/buffer-8              30000000   50.2  ns/op  0  B/op  0  allocs/op
BenchmarkScanner/0512/bufio-8               10000000    123  ns/op  0  B/op  0  allocs/op
BenchmarkScanner/0512/reader-8              10000000    144  ns/op  0  B/op  0  allocs/op
BenchmarkScanner/0512/buffer-8              20000000   79.0  ns/op  0  B/op  0  allocs/op
BenchmarkScanner/1024/bufio-8               10000000    210  ns/op  0  B/op  0  allocs/op
BenchmarkScanner/1024/reader-8              10000000    227  ns/op  0  B/op  0  allocs/op
BenchmarkScanner/1024/buffer-8              10000000    119  ns/op  0  B/op  0  allocs/op
BenchmarkScanner/2048/bufio-8                5000000    382  ns/op  0  B/op  0  allocs/op
BenchmarkScanner/2048/reader-8               3000000    413  ns/op  0  B/op  0  allocs/op
BenchmarkScanner/2048/buffer-8               5000000    272  ns/op  0  B/op  0  allocs/op
BenchmarkScanner/4096/bufio-8                2000000    701  ns/op  0  B/op  0  allocs/op
BenchmarkScanner/4096/reader-8               2000000    733  ns/op  0  B/op  0  allocs/op
BenchmarkScanner/4096/buffer-8               3000000    517  ns/op  0  B/op  0  allocs/op
BenchmarkScanner/excessively_long/bufio-8     200000  11681  ns/op  0  B/op  0  allocs/op
BenchmarkScanner/excessively_long/reader-8    100000  14464  ns/op  2  B/op  0  allocs/op
BenchmarkScanner/excessively_long/buffer-8    200000   8688  ns/op  0  B/op  0  allocs/op
PASS
ok  	github.com/karrick/gobls	256.191s

Documentation


Constants

const DefaultBufferSize = 16 * 1024

DefaultBufferSize specifies the initial size, in bytes (16 KiB), of the buffer each gobls scanner allocates for reassembling line fragments.


Types

type BufferScanner added in v1.3.0

type BufferScanner struct {
	// contains filtered or unexported fields
}

BufferScanner enumerates newline-terminated strings from a provided slice of bytes, faster than both bufio.Scanner and gobls.Scanner. This is particularly useful when a program already holds the entire input in a slice of bytes. It uses newline as the line terminator, but returns neither the newline nor an optional preceding carriage return with each discovered string.

func (*BufferScanner) Bytes added in v1.3.0

func (b *BufferScanner) Bytes() []byte

Bytes returns the byte slice that was just scanned. It does not return the terminating newline character, nor any optional preceding carriage return character.

func (*BufferScanner) Err added in v1.3.0

func (b *BufferScanner) Err() error

Err returns nil because scanning from a slice of bytes will never cause an error.

func (*BufferScanner) Scan added in v1.3.0

func (b *BufferScanner) Scan() bool

Scan advances to the next line in the original slice of bytes. It returns true if a line is available, or false once scanning is complete because the end of the slice has been reached.

func (*BufferScanner) Text added in v1.3.0

func (b *BufferScanner) Text() string

Text returns the string representation of the byte slice returned by the most recent Scan call. It does not return the terminating newline character, nor any optional preceding carriage return character.

type Scanner

type Scanner interface {
	Bytes() []byte
	Err() error
	Scan() bool
	Text() string
}

Scanner provides an interface for reading newline-delimited lines of text. It is similar to bufio.Scanner, but wraps the ReadLine method of bufio.Reader so lines of arbitrary length can be scanned. Successive calls to the Scan method will step through the lines of a file, skipping the newline whitespace between lines.

Scanning stops unrecoverably at EOF, or at the first I/O error. Unlike bufio.Scanner, however, attempting to scan a line longer than bufio.MaxScanTokenSize will not result in an error, but will return the long line.

As with bufio.Scanner, it is not necessary to check for errors by calling the Err method until scanning stops, that is, after the Scan method returns false.

Apart from its handling of long lines, this Scanner ought to behave exactly like bufio.Scanner: all methods ought to return the same values while stepping through the provided io.Reader.

func NewBufferScanner added in v1.3.0

func NewBufferScanner(buf []byte) Scanner

NewBufferScanner returns a BufferScanner that enumerates newline-terminated strings from buf.

func NewScanner

func NewScanner(r io.Reader) Scanner

NewScanner returns a Scanner that reads from the specified `io.Reader`. It allocates a scanning buffer of DefaultBufferSize bytes, and this per-scanner buffer grows as needed to accommodate extremely long lines.

    var lines, nbytes int
    ls := gobls.NewScanner(os.Stdin)
    for ls.Scan() {
        lines++
        nbytes += len(ls.Bytes()) // counts bytes, not runes
    }
    if err := ls.Err(); err != nil {
        fmt.Fprintln(os.Stderr, "cannot scan:", err)
    }
    fmt.Println("Counted", lines, "lines and", nbytes, "bytes.")
