go-textseg: github.com/apparentlymart/go-textseg/textseg

package textseg

import "github.com/apparentlymart/go-textseg/textseg"

Index

Package Files

all_tokens.go generate.go grapheme_clusters.go tables.go utf8_seqs.go

Variables

var Error = errors.New("invalid UTF8 text")

func AllTokens

func AllTokens(buf []byte, splitFunc bufio.SplitFunc) ([][]byte, error)

AllTokens is a utility that uses a bufio.SplitFunc to produce a slice of all of the recognized tokens in the given buffer.

func ScanGraphemeClusters

func ScanGraphemeClusters(data []byte, atEOF bool) (int, []byte, error)

ScanGraphemeClusters is a split function for bufio.Scanner that splits on grapheme cluster boundaries.
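
To illustrate why grapheme cluster boundaries differ from byte or code point boundaries, the standard-library sketch below measures a string consisting of "e" followed by U+0301 (COMBINING ACUTE ACCENT). A reader sees one character, and a grapheme cluster splitter would be expected to treat the pair as a single token. The helper `counts` is hypothetical and does not call this package.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// counts is a hypothetical helper that reports the number of bytes
// and the number of Unicode code points in s.
func counts(s string) (byteCount, runeCount int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	// 'e' followed by U+0301 COMBINING ACUTE ACCENT: one user-perceived
	// character made up of two code points.
	s := "e\u0301"

	b, r := counts(s)
	fmt.Println("bytes:", b)       // 3
	fmt.Println("code points:", r) // 2
}
```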

func ScanUTF8Sequences

func ScanUTF8Sequences(data []byte, atEOF bool) (int, []byte, error)

ScanUTF8Sequences is a split function for bufio.Scanner that splits on UTF-8 sequence boundaries.

This is included largely for completeness, since this behavior is already built into Go when ranging over a string.
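
The built-in behavior mentioned above can be sketched as follows. The helper `utf8Sequences` is hypothetical and uses only the language's `range` loop, which decodes one UTF-8 sequence per iteration.

```go
package main

import "fmt"

// utf8Sequences is a hypothetical helper that returns each code point
// of s as its own string, using Go's built-in range-over-string decoding.
// Note that range yields U+FFFD for any invalid byte in the input.
func utf8Sequences(s string) []string {
	var seqs []string
	for _, r := range s {
		seqs = append(seqs, string(r))
	}
	return seqs
}

func main() {
	// "a", U+00E9 (two bytes in UTF-8), U+4E16 (three bytes in UTF-8).
	for _, seq := range utf8Sequences("a\u00e9\u4e16") {
		fmt.Printf("%q is %d byte(s)\n", seq, len(seq))
	}
}
```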

func TokenCount

func TokenCount(buf []byte, splitFunc bufio.SplitFunc) (int, error)

TokenCount is a utility that uses a bufio.SplitFunc to count the number of recognized tokens in the given buffer.

Package textseg imports 5 packages and is imported by 18 packages. Updated 2017-06-29.