segmenter

package
v0.1.1
Warning

This package is not in the latest version of its module.
Published: Mar 29, 2024 License: BSD-3-Clause, Unlicense Imports: 2 Imported by: 6

Documentation

Overview

Package segmenter implements Unicode rules used to segment a paragraph of text according to several criteria. In particular, it provides a way of delimiting line break opportunities.

The API of the package follows the iterator pattern proposed in github.com/npillmayer/uax, but uses a somewhat simpler internal implementation, inspired by Pango.

The reference documentation is at https://unicode.org/reports/tr14 and https://unicode.org/reports/tr29.

Types

type Grapheme

type Grapheme struct {
	// Text is a subslice of the original input slice, containing the delimited grapheme
	Text []rune
	// Offset is the start of the grapheme in the input rune slice
	Offset int
}

Grapheme is the content of a grapheme delimited by the segmenter.
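Since Offset counts runes rather than bytes, callers that started from a string may need to map a grapheme back to a byte range. The following is a minimal sketch of one way to do that. The Grapheme type is redeclared locally so the example compiles on its own, and byteRange is a hypothetical helper, not part of the package.

```go
package main

import "fmt"

// Grapheme mirrors the documented struct so this sketch is self-contained;
// real code would use the package's own Grapheme type.
type Grapheme struct {
	Text   []rune
	Offset int
}

// byteRange converts the rune-based position of g into a byte range
// [start, end) within s. It assumes the segmenter input was []rune(s).
func byteRange(s string, g Grapheme) (start, end int) {
	runeEnd := g.Offset + len(g.Text)
	// Defaults cover positions that fall at the very end of s.
	start, end = len(s), len(s)
	i := 0 // index of the current rune
	for b := range s {
		// b is the byte offset at which rune i starts.
		if i == g.Offset {
			start = b
		}
		if i == runeEnd {
			end = b
			break
		}
		i++
	}
	return start, end
}

func main() {
	s := "e\u0301a" // 'e' followed by a combining acute accent, then 'a'
	runes := []rune(s)
	// Hand-built value for illustration; in practice it would come from
	// GraphemeIterator.Grapheme.
	g := Grapheme{Text: runes[0:2], Offset: 0}
	start, end := byteRange(s, g)
	fmt.Printf("bytes [%d, %d): %q\n", start, end, s[start:end])
}
```

The scan is linear in the length of s. When many graphemes are converted in sequence, tracking a running byte position would avoid rescanning from the start.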

type GraphemeIterator

type GraphemeIterator struct {
	// contains filtered or unexported fields
}

GraphemeIterator provides a convenient way of iterating over the graphemes delimited by a `Segmenter`.

func (*GraphemeIterator) Grapheme

func (gr *GraphemeIterator) Grapheme() Grapheme

Grapheme returns the current `Grapheme`.

func (*GraphemeIterator) Next

func (gr *GraphemeIterator) Next() bool

Next advances the iterator and returns true if there is still a grapheme to process; otherwise it returns false.

type Line

type Line struct {
	// Text is a subslice of the original input slice, containing the delimited line
	Text []rune
	// Offset is the start of the line in the input rune slice
	Offset int
	// IsMandatoryBreak is true if breaking (at the end of the line)
	// is mandatory
	IsMandatoryBreak bool
}

Line is the content of a line delimited by the segmenter.
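Because Text is a subslice of the input, it shares memory with the slice passed to the segmenter. The sketch below illustrates what that implies for callers who want to keep a line after the input changes. The Line type is redeclared locally so the example is self-contained, and detach and end are hypothetical helpers, not part of the package.

```go
package main

import "fmt"

// Line mirrors the documented struct so this sketch is self-contained;
// real code would use the package's own Line type.
type Line struct {
	Text             []rune
	Offset           int
	IsMandatoryBreak bool
}

// detach returns a copy of l whose Text no longer shares memory with the
// input slice, so it is unaffected by later changes to that slice.
func detach(l Line) Line {
	l.Text = append([]rune(nil), l.Text...)
	return l
}

// end returns the index one past the last rune of the line in the input.
func end(l Line) int {
	return l.Offset + len(l.Text)
}

func main() {
	input := []rune("hello world")
	// Hand-built value for illustration; the actual boundaries are
	// whatever the segmenter computes.
	l := Line{Text: input[0:6], Offset: 0}

	kept := detach(l)
	input[0] = 'J' // modify the original input in place

	fmt.Printf("%q\n", string(l.Text))    // reflects the change to input
	fmt.Printf("%q\n", string(kept.Text)) // still holds the original runes
	fmt.Println(end(l))                   // where the line ends in input
}
```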

type LineIterator

type LineIterator struct {
	// contains filtered or unexported fields
}

LineIterator provides a convenient way of iterating over the lines delimited by a `Segmenter`.

func (*LineIterator) Line

func (li *LineIterator) Line() Line

Line returns the current `Line`.

func (*LineIterator) Next

func (li *LineIterator) Next() bool

Next advances the iterator and returns true if there is still a line to process; otherwise it returns false.

type Segmenter

type Segmenter struct {
	// contains filtered or unexported fields
}

Segmenter is the entry point of the package.

Usage:

var seg Segmenter
seg.Init(...)
iter := seg.LineIterator()
for iter.Next() {
  ... // do something with iter.Line()
}
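The loop above can be wrapped in a small generic helper when all results are wanted at once. This is a sketch only: collect and sliceIter are hypothetical names introduced for illustration, and sliceIter stands in for a real iterator so that the example runs without the package.

```go
package main

import "fmt"

// collect drains an iterator that follows the Next/value convention and
// returns every value it yields. next and value are typically method
// values taken from the same iterator.
func collect[T any](next func() bool, value func() T) []T {
	var out []T
	for next() {
		out = append(out, value())
	}
	return out
}

// sliceIter is a hypothetical stand-in used only to exercise collect;
// it is not part of the segmenter package.
type sliceIter struct {
	items []string
	pos   int // number of items already consumed
}

// Next advances the iterator and reports whether an item is available.
func (s *sliceIter) Next() bool {
	if s.pos >= len(s.items) {
		return false
	}
	s.pos++
	return true
}

// Item returns the current item; it is valid only after Next returned true.
func (s *sliceIter) Item() string {
	return s.items[s.pos-1]
}

func main() {
	it := &sliceIter{items: []string{"a", "b", "c"}}
	fmt.Println(collect(it.Next, it.Item))
}
```

Given the documented signatures, the same call with the package's types would presumably read collect(iter.Next, iter.Line) for lines, or the equivalent with Grapheme for a GraphemeIterator.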

func (*Segmenter) GraphemeIterator

func (sg *Segmenter) GraphemeIterator() *GraphemeIterator

GraphemeIterator returns an iterator over the graphemes delimited in [Init].

func (*Segmenter) Init

func (seg *Segmenter) Init(paragraph []rune)

Init resets the segmenter storage with the given input, and computes the attributes required to segment the text.

func (*Segmenter) LineIterator

func (sg *Segmenter) LineIterator() *LineIterator

LineIterator returns an iterator over the lines delimited in [Init].
