unicode

package
v0.0.0-...-4d00197 Latest
Warning

This package is not in the latest version of its module.
Published: Apr 28, 2023 License: GPL-3.0 Imports: 8 Imported by: 0

Documentation

Constants

const Newline rune = '\n'

Variables

This section is empty.

Functions

func BlockFor

func BlockFor(r rune) string

BlockFor returns the name of the Unicode script to which the input rune belongs.
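
As a rough illustration of the kind of lookup described here, the sketch below finds a script name using only the standard library's `unicode.Scripts` table. The helper name `scriptFor` is hypothetical, and this is an assumption about one possible approach, not necessarily how BlockFor is implemented in this package.

```go
package main

import (
	"fmt"
	"unicode"
)

// scriptFor is a hypothetical helper (not part of this package). It returns
// the name of the script table in unicode.Scripts that contains r, or ""
// if no table contains it.
func scriptFor(r rune) string {
	for name, table := range unicode.Scripts {
		if unicode.Is(table, r) {
			return name
		}
	}
	return ""
}

func main() {
	for _, r := range "a\u0436\u4e2d" {
		fmt.Printf("%q %s\n", r, scriptFor(r))
	}
}
```

For the Latin, Cyrillic and Han runes in `main`, the standard library tables yield `Latin`, `Cyrillic` and `Han`; the names this package returns may differ.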

func NFC

func NFC(s string) string

func NFD

func NFD(s string) string

func NFKC

func NFKC(s string) string

func NFKD

func NFKD(s string) string

func NameFor

func NameFor(r rune) string

NameFor returns the name of the input rune

func UnicodeFor

func UnicodeFor(s string) []string

UnicodeFor returns a list of Unicode numbers, one for each rune in the input string.

func UnicodeForR

func UnicodeForR(r rune) string

UnicodeForR returns the Unicode number for the input rune.
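
The page does not show the exact string format these two functions return. As a minimal sketch, assuming the conventional `U+XXXX` notation, the same idea can be expressed with `fmt`'s `%U` verb. The helper names `codePoint` and `codePoints` are hypothetical stand-ins, not this package's functions.

```go
package main

import "fmt"

// codePoint is a hypothetical helper that formats r in "U+XXXX" notation.
// The output format of UnicodeForR itself is an assumption and may differ.
func codePoint(r rune) string {
	return fmt.Sprintf("%U", r)
}

// codePoints applies codePoint to every rune in s, in order.
func codePoints(s string) []string {
	out := make([]string, 0, len(s))
	for _, r := range s {
		out = append(out, codePoint(r))
	}
	return out
}

func main() {
	fmt.Println(codePoint('A'))               // U+0041
	fmt.Println(codePoints("\u00e5\u00e4"))   // [U+00E5 U+00E4]
}
```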

Types

type Info

type Info struct {
	// A string representation of the input rune (for special newline and tab, the string representation is empty in this implementation)
	String string

	// The unicode number
	Unicode string

	// The character name
	CharName string

	// The codeblock
	CodeBlock string
}

Info holds a set of unicode-related information for a rune

type Processor

type Processor struct {
	NFC                       bool
	NFD                       bool
	ConvertFromUnicodeNumbers bool
}

Processor

func (*Processor) Normalize

func (p *Processor) Normalize(s string) string

Normalize normalizes s according to the NFC/NFD settings of the Processor.

func (*Processor) RuneInfo

func (p *Processor) RuneInfo(r rune) Info

RuneInfo returns Unicode information for the input rune.

func (*Processor) UnicodeInfo

func (p *Processor) UnicodeInfo(s string) []Info

UnicodeInfo creates a list with Unicode information for each rune in the input string.

type Token

type Token struct {
	UnicodeBlock string `json:"block"`
	String       string `json:"string"`
}

type Tokenizer

type Tokenizer struct {
	UP             Processor
	SkipWhiteSpace bool
}

Tokenizer is a simple Unicode tokenizer that groups characters by code block. A sequence of characters is treated as one token as long as they all belong to the same Unicode code block. Numerals, spacing and punctuation are treated as separate code blocks.

func (*Tokenizer) BlockFor

func (t *Tokenizer) BlockFor(r rune) string

BlockFor returns the name of the unicode block for the input rune. Numerals, spacing and punctuation are treated as separate code blocks.

func (*Tokenizer) Tokenize

func (t *Tokenizer) Tokenize(s string) []Token
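
The grouping rule described for Tokenizer can be sketched with the standard library alone. Everything named below (`token`, `classify`, `tokenize`, and the class labels `Space`, `Numeral`, `Punctuation`) is hypothetical, and the handling of `skipWhiteSpace` is an assumption about what the SkipWhiteSpace field does. Script tables from `unicode.Scripts` stand in for code blocks, so this shows the technique, not the package's actual implementation.

```go
package main

import (
	"fmt"
	"unicode"
)

// token mirrors the shape of Token above; the type itself is hypothetical.
type token struct {
	Block  string
	String string
}

// classify is a hypothetical stand-in for Tokenizer.BlockFor. Spacing,
// numerals and punctuation get their own classes; any other rune is
// classified by its script name from unicode.Scripts.
func classify(r rune) string {
	switch {
	case unicode.IsSpace(r):
		return "Space"
	case unicode.IsDigit(r):
		return "Numeral"
	case unicode.IsPunct(r):
		return "Punctuation"
	}
	for name, table := range unicode.Scripts {
		if unicode.Is(table, r) {
			return name
		}
	}
	return "Unknown"
}

// tokenize groups consecutive runes of the same class into one token.
// When skipWhiteSpace is true, whitespace tokens are dropped (assumed
// behaviour of the SkipWhiteSpace field).
func tokenize(s string, skipWhiteSpace bool) []token {
	var tokens []token
	var cur []rune
	curClass := ""
	flush := func() {
		if len(cur) > 0 && !(skipWhiteSpace && curClass == "Space") {
			tokens = append(tokens, token{Block: curClass, String: string(cur)})
		}
		cur = cur[:0]
	}
	for _, r := range s {
		c := classify(r)
		if c != curClass {
			flush() // class changed: close the current token
			curClass = c
		}
		cur = append(cur, r)
	}
	flush() // emit the final token
	return tokens
}

func main() {
	for _, t := range tokenize("abc 123, \u0434\u043e\u043c", true) {
		fmt.Printf("%-12s %q\n", t.Block, t.String)
	}
}
```

With whitespace skipped, the sample input splits into a Latin token, a numeral token, a punctuation token and a Cyrillic token. Passing `false` keeps the two single-space tokens as well.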
