textproc

Version: v3.0.4
Warning

This package is not in the latest version of its module.

Published: Feb 19, 2021 | License: MIT | Imports: 6 | Imported by: 1

README

Text processing

For example: convert line endings to LF, remove trailing white space, sort paragraphs.

Go library and command.

https://pkg.go.dev/github.com/MihaiB/textproc/v3

go install ./textproc
`go env GOPATH`/bin/textproc -help

Documentation

Overview

Package textproc provides text processing.

On a pair of channels (chan dataType, chan error), all data is transmitted first, then the data channel is closed, then a single error is transmitted, and finally the error channel is closed. The nil error represents success. Any non-nil error (including io.EOF) represents failure.
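A minimal sketch of this protocol, independent of the package itself. The helpers `produce` and `consume` are hypothetical and not part of textproc: the producer closes the data channel before sending its single error, and the consumer drains the data channel before reading that error.

```go
package main

import (
	"fmt"
	"strings"
)

// produce is a hypothetical producer (not part of textproc). It sends every
// rune of s, closes the data channel, sends a single error and then closes
// the error channel.
func produce(s string) (<-chan rune, <-chan error) {
	runeOut := make(chan rune)
	errOut := make(chan error, 1)
	go func() {
		defer close(errOut) // closed last, after the single error
		for _, r := range s {
			runeOut <- r
		}
		close(runeOut) // all data has been transmitted
		errOut <- nil  // the nil error represents success
	}()
	return runeOut, errOut
}

// consume is a hypothetical consumer. It drains the data channel first and
// only then reads the error. Reading the error first would block forever
// whenever there is data, because the producer waits on the unbuffered
// data channel.
func consume(runeIn <-chan rune, errIn <-chan error) (string, error) {
	var b strings.Builder
	for r := range runeIn {
		b.WriteRune(r)
	}
	return b.String(), <-errIn
}

func main() {
	text, err := consume(produce("héllo\n"))
	fmt.Printf("%q %v\n", text, err) // prints "héllo\n" <nil>
}
```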

Constants

This section is empty.

Variables

var ErrInvalidUTF8 = errors.New("invalid UTF-8")

ErrInvalidUTF8 is the error returned when the input is not valid UTF-8.

Functions

func ConvertLineTerminatorsToLF

func ConvertLineTerminatorsToLF(runeIn <-chan rune, errIn <-chan error) (
	<-chan rune, <-chan error)

ConvertLineTerminatorsToLF converts "\r" and "\r\n" to "\n".
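For illustration, a minimal sketch of a processor with the same signature, assuming every "\r\n" pair and every lone "\r" becomes a single "\n" and all other runes pass through unchanged. `convertToLF`, `fromString` and `drain` are hypothetical names; this is not the package's source.

```go
package main

import (
	"fmt"
	"strings"
)

// convertToLF is a hypothetical stand-in for ConvertLineTerminatorsToLF:
// every "\r\n" pair and every lone "\r" becomes a single "\n".
func convertToLF(runeIn <-chan rune, errIn <-chan error) (<-chan rune, <-chan error) {
	runeOut := make(chan rune)
	errOut := make(chan error, 1)
	go func() {
		defer close(errOut)
		afterCR := false // true if the previous input rune was '\r'
		for r := range runeIn {
			switch {
			case r == '\r':
				runeOut <- '\n' // emit the LF as soon as a CR is seen
				afterCR = true
			case r == '\n' && afterCR:
				afterCR = false // second half of "\r\n": already emitted
			default:
				runeOut <- r
				afterCR = false
			}
		}
		close(runeOut)
		errOut <- <-errIn // forward the upstream error unchanged
	}()
	return runeOut, errOut
}

// fromString feeds s into a channel pair that follows the package's protocol.
func fromString(s string) (<-chan rune, <-chan error) {
	runes := make(chan rune)
	errs := make(chan error, 1)
	go func() {
		defer close(errs)
		for _, r := range s {
			runes <- r
		}
		close(runes)
		errs <- nil
	}()
	return runes, errs
}

// drain collects all runes, then reads the single error.
func drain(runes <-chan rune, errs <-chan error) (string, error) {
	var b strings.Builder
	for r := range runes {
		b.WriteRune(r)
	}
	return b.String(), <-errs
}

func main() {
	out, err := drain(convertToLF(fromString("a\r\nb\rc\n")))
	fmt.Printf("%q %v\n", out, err) // prints "a\nb\nc\n" <nil>
}
```

Emitting the LF as soon as a CR arrives means no rune has to be held back, so a "\r" at the very end of the input needs no special handling.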

func EnsureFinalLFIfNonEmpty

func EnsureFinalLFIfNonEmpty(runeIn <-chan rune, errIn <-chan error) (
	<-chan rune, <-chan error)

EnsureFinalLFIfNonEmpty ensures non-empty content ends with "\n".
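A minimal sketch of the same behavior, using the hypothetical names `ensureFinalLF`, `fromString` and `drain`. It assumes the final "\n" is added only when the upstream reported success; the package may handle failures differently.

```go
package main

import (
	"fmt"
	"strings"
)

// ensureFinalLF is a hypothetical stand-in for EnsureFinalLFIfNonEmpty.
func ensureFinalLF(runeIn <-chan rune, errIn <-chan error) (<-chan rune, <-chan error) {
	runeOut := make(chan rune)
	errOut := make(chan error, 1)
	go func() {
		defer close(errOut)
		var last rune
		nonEmpty := false
		for r := range runeIn {
			runeOut <- r
			last, nonEmpty = r, true
		}
		// The upstream data channel is closed, so its single error is next.
		err := <-errIn
		// Assumption: only append "\n" when the upstream succeeded.
		if err == nil && nonEmpty && last != '\n' {
			runeOut <- '\n'
		}
		close(runeOut)
		errOut <- err
	}()
	return runeOut, errOut
}

// fromString feeds s into a channel pair that follows the package's protocol.
func fromString(s string) (<-chan rune, <-chan error) {
	runes := make(chan rune)
	errs := make(chan error, 1)
	go func() {
		defer close(errs)
		for _, r := range s {
			runes <- r
		}
		close(runes)
		errs <- nil
	}()
	return runes, errs
}

// drain collects all runes, then reads the single error.
func drain(runes <-chan rune, errs <-chan error) (string, error) {
	var b strings.Builder
	for r := range runes {
		b.WriteRune(r)
	}
	return b.String(), <-errs
}

func main() {
	for _, in := range []string{"", "a", "a\n"} {
		out, err := drain(ensureFinalLF(fromString(in)))
		fmt.Printf("%q -> %q %v\n", in, out, err)
	}
}
```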

func ReadLFLineContent

func ReadLFLineContent(runeIn <-chan rune, errIn <-chan error) (
	<-chan []rune, <-chan error)

ReadLFLineContent reads the content of each line. The content does not include the line terminator. Lines are terminated by "\n".

func ReadLFParagraphContent

func ReadLFParagraphContent(runeIn <-chan rune, errIn <-chan error) (
	<-chan []rune, <-chan error)

ReadLFParagraphContent reads the content of each paragraph. The content does not include the line terminator of the paragraph's last line.

A paragraph consists of adjacent non-empty lines. Lines are terminated by "\n".

func ReadRunes

func ReadRunes(r io.Reader) (<-chan rune, <-chan error)

ReadRunes reads the runes from r. It fails with ErrInvalidUTF8 if the input is not valid UTF-8.

func SortLFLinesI

func SortLFLinesI(runeIn <-chan rune, errIn <-chan error) (
	<-chan rune, <-chan error)

SortLFLinesI reads the content of all lines using ReadLFLineContent, sorts the items in case-insensitive order and adds "\n" after each.
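A minimal sketch of this processor. `sortLinesI`, `fromString` and `drain` are hypothetical names, and several details are assumptions: "case-insensitive" is taken to mean comparing `strings.ToLower` of each line with an exact comparison as a tie-breaker, line splitting follows the same rules as ReadLFLineContent as sketched for illustration, and nothing is emitted when the upstream fails.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// sortLinesI is a hypothetical stand-in for SortLFLinesI.
func sortLinesI(runeIn <-chan rune, errIn <-chan error) (<-chan rune, <-chan error) {
	runeOut := make(chan rune)
	errOut := make(chan error, 1)
	go func() {
		defer close(errOut)
		var b strings.Builder
		for r := range runeIn {
			b.WriteRune(r)
		}
		err := <-errIn
		if err == nil { // assumption: emit nothing on upstream failure
			lines := strings.Split(b.String(), "\n")
			if lines[len(lines)-1] == "" {
				lines = lines[:len(lines)-1] // "\n" ends a line, it does not start one
			}
			sort.Slice(lines, func(i, j int) bool {
				li, lj := strings.ToLower(lines[i]), strings.ToLower(lines[j])
				if li != lj {
					return li < lj
				}
				return lines[i] < lines[j] // deterministic tie-break
			})
			for _, line := range lines {
				for _, r := range line {
					runeOut <- r
				}
				runeOut <- '\n' // "\n" after each line
			}
		}
		close(runeOut)
		errOut <- err
	}()
	return runeOut, errOut
}

// fromString feeds s into a channel pair that follows the package's protocol.
func fromString(s string) (<-chan rune, <-chan error) {
	runes := make(chan rune)
	errs := make(chan error, 1)
	go func() {
		defer close(errs)
		for _, r := range s {
			runes <- r
		}
		close(runes)
		errs <- nil
	}()
	return runes, errs
}

// drain collects all runes, then reads the single error.
func drain(runes <-chan rune, errs <-chan error) (string, error) {
	var b strings.Builder
	for r := range runes {
		b.WriteRune(r)
	}
	return b.String(), <-errs
}

func main() {
	out, err := drain(sortLinesI(fromString("banana\nApple\ncherry\napple\n")))
	fmt.Printf("%q %v\n", out, err) // prints "Apple\napple\nbanana\ncherry\n" <nil>
}
```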

func SortLFParagraphsI

func SortLFParagraphsI(runeIn <-chan rune, errIn <-chan error) (
	<-chan rune, <-chan error)

SortLFParagraphsI reads the content of all paragraphs using ReadLFParagraphContent, sorts the items in case-insensitive order, joins them with "\n\n" and adds "\n" after the last one.
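A minimal sketch of this processor, with hypothetical names (`sortParagraphsI`, `splitParagraphs`, `fromString`, `drain`). It assumes the same `strings.ToLower` comparison as the line-sorting sketch, that empty input produces empty output, and that nothing is emitted when the upstream fails; the package's exact rules may differ.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// splitParagraphs groups adjacent non-empty lines; empty lines separate them.
func splitParagraphs(s string) []string {
	var paras, cur []string
	for _, line := range strings.Split(s, "\n") {
		if line != "" {
			cur = append(cur, line)
			continue
		}
		if len(cur) > 0 {
			paras = append(paras, strings.Join(cur, "\n"))
			cur = nil
		}
	}
	if len(cur) > 0 {
		paras = append(paras, strings.Join(cur, "\n"))
	}
	return paras
}

// sortParagraphsI is a hypothetical stand-in for SortLFParagraphsI.
func sortParagraphsI(runeIn <-chan rune, errIn <-chan error) (<-chan rune, <-chan error) {
	runeOut := make(chan rune)
	errOut := make(chan error, 1)
	go func() {
		defer close(errOut)
		var b strings.Builder
		for r := range runeIn {
			b.WriteRune(r)
		}
		err := <-errIn
		if paras := splitParagraphs(b.String()); err == nil && len(paras) > 0 {
			sort.Slice(paras, func(i, j int) bool {
				pi, pj := strings.ToLower(paras[i]), strings.ToLower(paras[j])
				if pi != pj {
					return pi < pj
				}
				return paras[i] < paras[j] // deterministic tie-break
			})
			// Join with "\n\n" and add "\n" after the last paragraph.
			for _, r := range strings.Join(paras, "\n\n") + "\n" {
				runeOut <- r
			}
		}
		close(runeOut)
		errOut <- err
	}()
	return runeOut, errOut
}

// fromString feeds s into a channel pair that follows the package's protocol.
func fromString(s string) (<-chan rune, <-chan error) {
	runes := make(chan rune)
	errs := make(chan error, 1)
	go func() {
		defer close(errs)
		for _, r := range s {
			runes <- r
		}
		close(runes)
		errs <- nil
	}()
	return runes, errs
}

// drain collects all runes, then reads the single error.
func drain(runes <-chan rune, errs <-chan error) (string, error) {
	var b strings.Builder
	for r := range runes {
		b.WriteRune(r)
	}
	return b.String(), <-errs
}

func main() {
	out, err := drain(sortParagraphsI(fromString("zeta\nline 2\n\n\nAlpha\n\nbeta\n")))
	fmt.Printf("%q %v\n", out, err) // prints "Alpha\n\nbeta\n\nzeta\nline 2\n" <nil>
}
```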

func TrimLFTrailingWhiteSpace

func TrimLFTrailingWhiteSpace(runeIn <-chan rune, errIn <-chan error) (
	<-chan rune, <-chan error)

TrimLFTrailingWhiteSpace removes white space at the end of lines. Lines are terminated by "\n".
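A minimal streaming sketch with hypothetical names (`trimTrailingSpace`, `fromString`, `drain`). It assumes "white space" means `unicode.IsSpace` and that white space at the very end of unterminated input is also trailing. White space is held back until the next rune shows whether it was trailing.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// trimTrailingSpace is a hypothetical stand-in for TrimLFTrailingWhiteSpace.
func trimTrailingSpace(runeIn <-chan rune, errIn <-chan error) (<-chan rune, <-chan error) {
	runeOut := make(chan rune)
	errOut := make(chan error, 1)
	go func() {
		defer close(errOut)
		var pending []rune // white space that may turn out to be trailing
		for r := range runeIn {
			switch {
			case r == '\n':
				pending = pending[:0] // it was trailing: discard it
				runeOut <- r
			case unicode.IsSpace(r):
				pending = append(pending, r)
			default:
				for _, p := range pending { // not trailing after all
					runeOut <- p
				}
				pending = pending[:0]
				runeOut <- r
			}
		}
		// Assumption: white space left at the end of the input is dropped.
		close(runeOut)
		errOut <- <-errIn
	}()
	return runeOut, errOut
}

// fromString feeds s into a channel pair that follows the package's protocol.
func fromString(s string) (<-chan rune, <-chan error) {
	runes := make(chan rune)
	errs := make(chan error, 1)
	go func() {
		defer close(errs)
		for _, r := range s {
			runes <- r
		}
		close(runes)
		errs <- nil
	}()
	return runes, errs
}

// drain collects all runes, then reads the single error.
func drain(runes <-chan rune, errs <-chan error) (string, error) {
	var b strings.Builder
	for r := range runes {
		b.WriteRune(r)
	}
	return b.String(), <-errs
}

func main() {
	out, err := drain(trimTrailingSpace(fromString("a \t\nb c  \n  \nd ")))
	fmt.Printf("%q %v\n", out, err) // prints "a\nb c\n\nd" <nil>
}
```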

func TrimLeadingEmptyLFLines

func TrimLeadingEmptyLFLines(runeIn <-chan rune, errIn <-chan error) (
	<-chan rune, <-chan error)

TrimLeadingEmptyLFLines removes empty lines at the start of the input. Lines are terminated by "\n".
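A minimal sketch with hypothetical names (`trimLeadingEmptyLines`, `fromString`, `drain`). An empty line has no content, so the leading empty lines are exactly the "\n" runes that arrive before any other rune; a line containing only spaces is not empty and is kept.

```go
package main

import (
	"fmt"
	"strings"
)

// trimLeadingEmptyLines is a hypothetical stand-in for TrimLeadingEmptyLFLines.
func trimLeadingEmptyLines(runeIn <-chan rune, errIn <-chan error) (<-chan rune, <-chan error) {
	runeOut := make(chan rune)
	errOut := make(chan error, 1)
	go func() {
		defer close(errOut)
		started := false // true once the first non-empty line has begun
		for r := range runeIn {
			if !started && r == '\n' {
				continue // an empty line before any content
			}
			started = true
			runeOut <- r
		}
		close(runeOut)
		errOut <- <-errIn
	}()
	return runeOut, errOut
}

// fromString feeds s into a channel pair that follows the package's protocol.
func fromString(s string) (<-chan rune, <-chan error) {
	runes := make(chan rune)
	errs := make(chan error, 1)
	go func() {
		defer close(errs)
		for _, r := range s {
			runes <- r
		}
		close(runes)
		errs <- nil
	}()
	return runes, errs
}

// drain collects all runes, then reads the single error.
func drain(runes <-chan rune, errs <-chan error) (string, error) {
	var b strings.Builder
	for r := range runes {
		b.WriteRune(r)
	}
	return b.String(), <-errs
}

func main() {
	out, err := drain(trimLeadingEmptyLines(fromString("\n\n\nx\n\ny\n")))
	fmt.Printf("%q %v\n", out, err) // prints "x\n\ny\n" <nil>
}
```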

func TrimTrailingEmptyLFLines

func TrimTrailingEmptyLFLines(runeIn <-chan rune, errIn <-chan error) (
	<-chan rune, <-chan error)

TrimTrailingEmptyLFLines removes empty lines at the end of the input. Lines are terminated by "\n".
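A minimal streaming sketch with hypothetical names (`trimTrailingEmptyLines`, `fromString`, `drain`). Runs of "\n" are counted and only written out once more content follows; at the end, one "\n" is kept to terminate the last non-empty line. Treating input made only of empty lines as producing empty output is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// trimTrailingEmptyLines is a hypothetical stand-in for TrimTrailingEmptyLFLines.
func trimTrailingEmptyLines(runeIn <-chan rune, errIn <-chan error) (<-chan rune, <-chan error) {
	runeOut := make(chan rune)
	errOut := make(chan error, 1)
	go func() {
		defer close(errOut)
		pendingLF := 0 // newlines not yet known to be followed by content
		seen := false  // true once a non-'\n' rune has been emitted
		for r := range runeIn {
			if r == '\n' {
				pendingLF++
				continue
			}
			for ; pendingLF > 0; pendingLF-- {
				runeOut <- '\n' // content follows, so these were not trailing
			}
			seen = true
			runeOut <- r
		}
		if seen && pendingLF > 0 {
			runeOut <- '\n' // keep the terminator of the last non-empty line
		}
		close(runeOut)
		errOut <- <-errIn
	}()
	return runeOut, errOut
}

// fromString feeds s into a channel pair that follows the package's protocol.
func fromString(s string) (<-chan rune, <-chan error) {
	runes := make(chan rune)
	errs := make(chan error, 1)
	go func() {
		defer close(errs)
		for _, r := range s {
			runes <- r
		}
		close(runes)
		errs <- nil
	}()
	return runes, errs
}

// drain collects all runes, then reads the single error.
func drain(runes <-chan rune, errs <-chan error) (string, error) {
	var b strings.Builder
	for r := range runes {
		b.WriteRune(r)
	}
	return b.String(), <-errs
}

func main() {
	out, err := drain(trimTrailingEmptyLines(fromString("\n\nb\n\n\n")))
	fmt.Printf("%q %v\n", out, err) // prints "\n\nb\n" <nil>
}
```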

Types

type RuneProcessor

type RuneProcessor = func(runeIn <-chan rune, errIn <-chan error) (
	runeOut <-chan rune, errOut <-chan error)

A RuneProcessor consumes and produces runes.

type Tokenizer

type Tokenizer = func(runeIn <-chan rune, errIn <-chan error) (
	tokenOut <-chan []rune, errOut <-chan error)

A Tokenizer consumes runes and produces tokens.

Directories

Path      Synopsis
internal  Package internal contains textproc internals.
textproc  Textproc processes text.
