norm

Version: v0.0.0-...-9b058df
Warning

This package is not in the latest version of its module.

Published: Jul 10, 2023 | License: MIT | Imports: 8 | Imported by: 2

README

norm

Basic text normalization.

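Example:
// NewNormalizer returns (Normalizer, error); check err before using n.
n, err := norm.NewNormalizer("nfd lines collapse trim")
// input_slice is a []byte; Normalize returns the normalized bytes and an error.
output, err := n.Normalize(input_slice)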

Options

The option string passed to NewNormalizer is a space-separated list drawn from:

nfd lowercase accents quotemarks collapse trim leading-space lines

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

func Accents

func Accents(b []byte) (output []byte, err error)

Removes accents from UTF-8 text.

func AddLeadingSpace

func AddLeadingSpace(b []byte) []byte

Adds a leading space if there isn't one already.
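The behaviour described above can be sketched with the standard library alone. The name `addLeadingSpace` is hypothetical, and returning a single space for empty input is an assumption; this is an illustration, not the package's actual implementation.

```go
package main

import "fmt"

// addLeadingSpace is a hypothetical stand-in for AddLeadingSpace.
// It returns b unchanged when b already starts with a space, and
// otherwise returns a new slice with one space prepended.
// Assumption: empty input yields a single space.
func addLeadingSpace(b []byte) []byte {
	if len(b) > 0 && b[0] == ' ' {
		return b
	}
	out := make([]byte, 0, len(b)+1)
	out = append(out, ' ')
	return append(out, b...)
}

func main() {
	fmt.Printf("%q\n", addLeadingSpace([]byte("word")))  // prints " word"
	fmt.Printf("%q\n", addLeadingSpace([]byte(" word"))) // prints " word"
}
```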

func Case

func Case(b []byte) (output []byte, err error)

Lowercases UTF-8 text.
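For comparison, Unicode-aware lowercasing of a byte slice is available in the standard library as `bytes.ToLower`. The sketch below uses it under the hypothetical name `lower`; the package's own `Case` also returns an error, which suggests a different mechanism internally, so treat this only as an illustration of the expected result.

```go
package main

import (
	"bytes"
	"fmt"
)

// lower is a hypothetical stand-in for Case. bytes.ToLower maps every
// Unicode letter in the UTF-8 input to its lowercase form and returns
// a new slice.
func lower(b []byte) []byte {
	return bytes.ToLower(b)
}

func main() {
	fmt.Printf("%s\n", lower([]byte("ÉCOLE Go"))) // prints école go
}
```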

func CaseAndAccents

func CaseAndAccents(b []byte) (output []byte, err error)

Lowercases and removes accents from UTF-8 text.

func Collapse

func Collapse(input []byte) []byte

All sequences of 2 or more spaces are converted into single spaces.
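A minimal sketch of this kind of space collapsing is shown below. The name `collapse` is hypothetical and the code is not taken from the package. It works byte by byte, which is safe for UTF-8 because the space byte 0x20 never occurs inside a multi-byte sequence.

```go
package main

import "fmt"

// collapse is a hypothetical stand-in for Collapse: every run of two or
// more ASCII spaces is reduced to a single space. Other whitespace
// (tabs, newlines) is left untouched in this sketch.
func collapse(input []byte) []byte {
	out := make([]byte, 0, len(input))
	prevSpace := false
	for _, c := range input {
		if c == ' ' {
			if prevSpace {
				continue // drop the second and later spaces of a run
			}
			prevSpace = true
		} else {
			prevSpace = false
		}
		out = append(out, c)
	}
	return out
}

func main() {
	fmt.Printf("%q\n", collapse([]byte("a   b  c"))) // prints "a b c"
}
```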

func CollapseAndQuotemarks

func CollapseAndQuotemarks(input []byte) []byte

All sequences of 2 or more spaces are converted into single spaces, and curly UTF-8 apostrophes and quotes are converted into ASCII.

func CollapseAndQuotemarksAndUnixLines

func CollapseAndQuotemarksAndUnixLines(input []byte) []byte

Collapses runs of spaces, converts curly quotes to ASCII, and converts line endings to Unix style, all in a single loop.

func CollapseAndUnixLines

func CollapseAndUnixLines(input []byte) []byte

Collapses runs of spaces and converts line endings to Unix style in a single loop.
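To illustrate why a combined function can be worthwhile, here is a sketch that performs both transformations in one pass over the input rather than two. The name `collapseAndUnixLines` is hypothetical, and the handling of a lone `\r` (kept as-is) is an assumption.

```go
package main

import "fmt"

// collapseAndUnixLines is a hypothetical single-pass sketch: it collapses
// runs of spaces and rewrites "\r\n" as "\n" while scanning the input once.
func collapseAndUnixLines(input []byte) []byte {
	out := make([]byte, 0, len(input))
	for i := 0; i < len(input); i++ {
		c := input[i]
		switch {
		case c == '\r' && i+1 < len(input) && input[i+1] == '\n':
			// Drop the carriage return; the '\n' is copied on the next iteration.
			continue
		case c == ' ' && len(out) > 0 && out[len(out)-1] == ' ':
			// Skip a space that follows a space already written.
			continue
		}
		out = append(out, c)
	}
	return out
}

func main() {
	fmt.Printf("%q\n", collapseAndUnixLines([]byte("a  b\r\nc   d"))) // prints "a b\nc d"
}
```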

func NFD

func NFD(input []byte) (output []byte, err error)

Performs Unicode NFD normalization on UTF-8 text.

func NFDAndCase

func NFDAndCase(b []byte) (output []byte, err error)

Lowercases UTF-8 text and performs Unicode NFD normalization.

func Quotemarks

func Quotemarks(input []byte) []byte

Curly UTF-8 apostrophes and quotes are converted into ASCII.
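One way to sketch this conversion with the standard library is a small replacement table applied with `bytes.ReplaceAll`. The name `quotemarks` and the exact set of four characters handled here are assumptions; the package may cover more or fewer code points.

```go
package main

import (
	"bytes"
	"fmt"
)

// quoteReplacements lists the curly characters this sketch handles.
// Assumption: the real package's set is not documented on this page.
var quoteReplacements = []struct{ from, to string }{
	{"\u2018", "'"}, // left single quotation mark
	{"\u2019", "'"}, // right single quotation mark (curly apostrophe)
	{"\u201C", `"`}, // left double quotation mark
	{"\u201D", `"`}, // right double quotation mark
}

// quotemarks is a hypothetical stand-in for Quotemarks. It returns a
// copy of input with each curly quote replaced by its ASCII equivalent.
func quotemarks(input []byte) []byte {
	out := input
	for _, r := range quoteReplacements {
		out = bytes.ReplaceAll(out, []byte(r.from), []byte(r.to))
	}
	return out
}

func main() {
	fmt.Printf("%s\n", quotemarks([]byte("\u201CDon\u2019t\u201D"))) // prints "Don't"
}
```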

func Trim

func Trim(b []byte) []byte

Removes leading and trailing whitespace (and some non-printable characters).
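The page does not say which non-printable characters are trimmed. As a sketch under the assumption that they are Unicode control characters, the same effect can be obtained with `bytes.TrimFunc`; the name `trim` is hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
	"unicode"
)

// trim is a hypothetical stand-in for Trim. It strips leading and
// trailing runes that are whitespace or control characters.
// Assumption: "non-printable" is taken to mean unicode.IsControl.
func trim(b []byte) []byte {
	return bytes.TrimFunc(b, func(r rune) bool {
		return unicode.IsSpace(r) || unicode.IsControl(r)
	})
}

func main() {
	fmt.Printf("%q\n", trim([]byte("\x00 \t hello world \n"))) // prints "hello world"
}
```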

func TrimAndAddLeadingSpace

func TrimAndAddLeadingSpace(b []byte) []byte

Removes leading and trailing whitespace (and some non-printable characters), then adds a space to the front.

func UnixLines

func UnixLines(input []byte) []byte

UnixLines converts \r\n to \n.
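The conversion itself is a one-liner with the standard library. The sketch below uses the hypothetical name `unixLines` and is not the package's own code; a lone `\r` is left untouched.

```go
package main

import (
	"bytes"
	"fmt"
)

// unixLines is a hypothetical stand-in for UnixLines: it returns a copy
// of input with every "\r\n" pair replaced by "\n".
func unixLines(input []byte) []byte {
	return bytes.ReplaceAll(input, []byte("\r\n"), []byte("\n"))
}

func main() {
	fmt.Printf("%q\n", unixLines([]byte("a\r\nb\r\n"))) // prints "a\nb\n"
}
```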

Types

type Normalizer

type Normalizer struct {
	Flag uint8
}

func NewNormalizer

func NewNormalizer(s string) (Normalizer, error)

func (Normalizer) Normalize

func (n Normalizer) Normalize(data []byte) ([]byte, error)

func (Normalizer) SpecifiedAccents

func (n Normalizer) SpecifiedAccents() bool

func (Normalizer) SpecifiedCollapse

func (n Normalizer) SpecifiedCollapse() bool

func (Normalizer) SpecifiedLeadingSpace

func (n Normalizer) SpecifiedLeadingSpace() bool

func (Normalizer) SpecifiedLowercase

func (n Normalizer) SpecifiedLowercase() bool

func (Normalizer) SpecifiedNFD

func (n Normalizer) SpecifiedNFD() bool

func (Normalizer) SpecifiedQuotemarks

func (n Normalizer) SpecifiedQuotemarks() bool

func (Normalizer) SpecifiedTrim

func (n Normalizer) SpecifiedTrim() bool

func (Normalizer) SpecifiedUnixLines

func (n Normalizer) SpecifiedUnixLines() bool

func (Normalizer) String

func (n Normalizer) String() string
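The `Flag uint8` field and the eight `Specified...` methods suggest one bit per option. The following sketch shows how an option string could be parsed into such a bitmask. All names here (`normalizer`, `newNormalizer`, the `flag...` constants) and the bit assignments are hypothetical; the package's real values and error behaviour are not documented on this page.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical bit assignments, one per documented option.
const (
	flagNFD uint8 = 1 << iota
	flagLowercase
	flagAccents
	flagQuotemarks
	flagCollapse
	flagTrim
	flagLeadingSpace
	flagUnixLines
)

// optionBits maps the option names from the README to the assumed bits.
var optionBits = map[string]uint8{
	"nfd": flagNFD, "lowercase": flagLowercase, "accents": flagAccents,
	"quotemarks": flagQuotemarks, "collapse": flagCollapse, "trim": flagTrim,
	"leading-space": flagLeadingSpace, "lines": flagUnixLines,
}

// normalizer mirrors the shape of the documented Normalizer struct.
type normalizer struct {
	Flag uint8
}

// newNormalizer parses a space-separated option string into a bitmask.
// Assumption: unknown option names are reported as an error.
func newNormalizer(s string) (normalizer, error) {
	var n normalizer
	for _, name := range strings.Fields(s) {
		bit, ok := optionBits[name]
		if !ok {
			return normalizer{}, fmt.Errorf("unknown option %q", name)
		}
		n.Flag |= bit
	}
	return n, nil
}

// specifiedTrim and specifiedLowercase test a single bit each.
func (n normalizer) specifiedTrim() bool      { return n.Flag&flagTrim != 0 }
func (n normalizer) specifiedLowercase() bool { return n.Flag&flagLowercase != 0 }

func main() {
	n, err := newNormalizer("nfd lines collapse trim")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(n.specifiedTrim(), n.specifiedLowercase()) // prints true false
}
```

With eight options, a `uint8` holds exactly one bit per option, which would explain the field's type.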
