iterators

package v1.12.5
Published: May 26, 2023 License: MIT Imports: 5 Imported by: 1

Documentation

Overview

Package iterators is a support (base types) package for other packages in UAX29.

Index

Constants

This section is empty.

Variables

View Source
var ErrAdvanceNegative = errors.New("SplitFunc returned a negative advance, this is likely a bug in the SplitFunc")
View Source
var ErrAdvanceTooFar = errors.New("SplitFunc advanced beyond the end of the data, this is likely a bug in the SplitFunc")
View Source
var ErrorScanCalled = errors.New("cannot call Transform after Scan has been called")

Functions

func All

func All(src []byte, dest *[][]byte, split bufio.SplitFunc) error

All iterates through all tokens and collects them into a [][]byte. It is a convenience method. The downside is that it allocates, and can do so without bound: O(n) in the number of tokens (24 bytes of slice header per token). Prefer Segmenter for constant memory usage.

Types

type Scanner

type Scanner struct {
	// contains filtered or unexported fields
}

func NewScanner

func NewScanner(r io.Reader, split bufio.SplitFunc) *Scanner

NewScanner creates a new Scanner given an io.Reader and bufio.SplitFunc. To use the new scanner, iterate while Scan() is true.

func (*Scanner) Bytes added in v1.11.0

func (sc *Scanner) Bytes() []byte

Bytes returns the current token, which results from calling Scan.

func (*Scanner) Err added in v1.9.0

func (sc *Scanner) Err() error

Err returns any error that resulted from calling Scan.

func (*Scanner) Filter

func (sc *Scanner) Filter(filter filter.Func)

Filter applies one or more filters (predicates) to all tokens, returning only those tokens for which all filters evaluate true. Filters are applied after Transformers.
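Conceptually, a filter is a predicate over a token. This stdlib-only sketch assumes the func([]byte) bool shape and skips tokens for which the predicate is false; the helper names are hypothetical:

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

// startsWithLetter is a sample predicate: keep tokens whose first
// rune is a letter. (The assumed filter shape is func([]byte) bool.)
func startsWithLetter(token []byte) bool {
	r, _ := utf8.DecodeRune(token)
	return unicode.IsLetter(r)
}

// keepLetters applies the predicate while scanning, which is what
// Filter does for you on a Scanner.
func keepLetters(text string) []string {
	sc := bufio.NewScanner(strings.NewReader(text))
	sc.Split(bufio.ScanWords) // stands in for a uax29 SplitFunc
	var kept []string
	for sc.Scan() {
		if startsWithLetter(sc.Bytes()) {
			kept = append(kept, sc.Text())
		}
	}
	return kept
}

func main() {
	fmt.Println(keepLetters("abc 123 déf 456")) // [abc déf]
}
```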

func (*Scanner) Scan

func (sc *Scanner) Scan() bool

Scan advances to the next token. It returns true until the end of the data is reached or an error occurs. Use Bytes() to retrieve the token, and be sure to check Err() once the loop ends.

func (*Scanner) Text added in v1.11.0

func (sc *Scanner) Text() string

Text returns the current token as a string, which results from calling Scan.

func (*Scanner) Transform added in v1.9.0

func (sc *Scanner) Transform(transformers ...transform.Transformer)

Transform applies one or more transformers to all tokens, in order. Calling Transform overwrites any previously set transformers, so pass them all in a single call (it is variadic). Transformers are applied before Filters.

type Segmenter

type Segmenter struct {
	// contains filtered or unexported fields
}

Segmenter is an iterator for byte slices, which are segmented into tokens (segments). To use it, define a SplitFunc, call SetText with the bytes you wish to tokenize, loop over Next until it returns false, call Bytes to retrieve the current token, and check Err after the loop.

Note that Segmenter is designed for use with the SplitFuncs in the various uax29 sub-packages, and relies on assumptions about their behavior. Caveat emptor when bringing your own SplitFunc.

func NewSegmenter

func NewSegmenter(split bufio.SplitFunc) *Segmenter

NewSegmenter creates a new segmenter given a SplitFunc. To use the new segmenter, call SetText() and then iterate while Next() is true.

Note that Segmenter is designed for use with the SplitFuncs in the various uax29 sub-packages, and relies on assumptions about their behavior. Caveat emptor when bringing your own SplitFunc.

func (*Segmenter) Bytes

func (seg *Segmenter) Bytes() []byte

Bytes returns the current token.

func (*Segmenter) End added in v1.10.0

func (seg *Segmenter) End() int

End returns the position (byte index) of the first byte after the current token, in the original text.

In other words, segmenter.Bytes() == original[segmenter.Start():segmenter.End()]

func (*Segmenter) Err

func (seg *Segmenter) Err() error

Err indicates that an error occurred while calling Next; Next returns false when an error occurs.

func (*Segmenter) Filter

func (seg *Segmenter) Filter(filter filter.Func)

Filter applies a filter (predicate) to all tokens, returning only those for which the filter evaluates true. Calling Filter overwrites any previous filter.

func (*Segmenter) Next

func (seg *Segmenter) Next() bool

Next advances the Segmenter to the next token (segment). It returns false when there are no remaining segments or when an error occurs.

func (*Segmenter) SetText

func (seg *Segmenter) SetText(data []byte)

SetText sets the text for the segmenter to operate on, and resets all state.

func (*Segmenter) Start added in v1.10.0

func (seg *Segmenter) Start() int

Start returns the position (byte index) of the current token in the original text.

func (*Segmenter) Text

func (seg *Segmenter) Text() string

Text returns the current token as a newly-allocated string.

func (*Segmenter) Transform added in v1.9.0

func (seg *Segmenter) Transform(transformers ...transform.Transformer)

Transform applies one or more transformers to all tokens, in order. Calling Transform overwrites any previous transformers, so pass them all in a single call (it is variadic).

Directories

Path Synopsis
Package filter provides methods for filtering via Scanners and Segmenters.
Package transformer provides a few handy transformers, for use with Scanner and Segmenter.
