unicode

package

Version: v0.0.0-...-4d00197 | Published: Apr 28, 2023 | License: GPL-3.0 | Imports: 8 | Imported by: 0

Repository

github.com/HannaLindgren/go-utils

Documentation ¶

Index ¶

Constants
func BlockFor(r rune) string
func NFC(s string) string
func NFD(s string) string
func NFKC(s string) string
func NFKD(s string) string
func NameFor(r rune) string
func UnicodeFor(s string) []string
func UnicodeForR(r rune) string
type Info
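type Processor
- func (p *Processor) Normalize(s string) string
- func (p *Processor) RuneInfo(r rune) Info
- func (p *Processor) UnicodeInfo(s string) []Info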
type Token
type Tokenizer
- func (t *Tokenizer) BlockFor(r rune) string
- func (t *Tokenizer) Tokenize(s string) []Token

Constants ¶

const Newline rune = '\n'

Variables ¶

This section is empty.

Functions ¶

func BlockFor ¶

func BlockFor(r rune) string

BlockFor returns the name of the Unicode script to which the input rune belongs.
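
The lookup that BlockFor describes can be approximated with the standard library alone. The following is a minimal sketch, not this package's implementation: `scriptFor` is a hypothetical helper, and it relies on the script tables in Go's standard `unicode` package, which may differ from the data source this package uses.

```go
package main

import (
	"fmt"
	"unicode"
)

// scriptFor is a hypothetical helper (not part of this package).
// It returns the name of the standard-library script table that
// contains r, or "Unknown" if no table matches.
func scriptFor(r rune) string {
	for name, table := range unicode.Scripts {
		if unicode.Is(table, r) {
			return name
		}
	}
	return "Unknown"
}

func main() {
	// Each code point has a single Script value, so the random
	// iteration order of the map does not change the result.
	for _, r := range "aя1" {
		fmt.Printf("%q -> %s\n", r, scriptFor(r))
	}
}
```

Digits such as '1' fall into the "Common" script in these tables, which is one reason a tokenizer might want to treat numerals as a class of their own.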

func NFC ¶

func NFC(s string) string

func NFD ¶

func NFD(s string) string

func NFKC ¶

func NFKC(s string) string

func NFKD ¶

func NFKD(s string) string

func NameFor ¶

func NameFor(r rune) string

NameFor returns the name of the input rune.

func UnicodeFor ¶

func UnicodeFor(s string) []string

UnicodeFor returns a list of Unicode code points, one for each rune in the input string.

func UnicodeForR ¶

func UnicodeForR(r rune) string

UnicodeForR returns the Unicode code point for the input rune.
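
To illustrate what a per-rune code point listing looks like, here is a minimal sketch using only the standard library. The helpers `codePoint` and `codePoints` are hypothetical, and the "U+XXXX" output format is an assumption; the documentation above does not state which string format UnicodeFor and UnicodeForR actually return.

```go
package main

import "fmt"

// codePoint is a hypothetical helper that formats r in the
// conventional "U+XXXX" notation (an assumed output format).
func codePoint(r rune) string {
	return fmt.Sprintf("U+%04X", r)
}

// codePoints returns one formatted code point per rune in s.
func codePoints(s string) []string {
	out := make([]string, 0, len(s))
	for _, r := range s {
		out = append(out, codePoint(r))
	}
	return out
}

func main() {
	// The same visible character in precomposed and decomposed form.
	fmt.Println(codePoints("\u00e5"))  // prints [U+00E5]
	fmt.Println(codePoints("a\u030a")) // prints [U+0061 U+030A]
}
```

A listing like this makes the difference between composed and decomposed input visible, which is the distinction the NFC and NFD functions above deal with.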

Types ¶

type Info ¶

type Info struct {
	// A string representation of the input rune (empty for newline and tab in this implementation)
	String string

	// The unicode number
	Unicode string

	// The character name
	CharName string

	// The code block
	CodeBlock string
}

Info holds a set of Unicode-related information for a rune.

func (*Processor) Normalize ¶

func (p *Processor) Normalize(s string) string

Normalize normalizes s according to the NFC/NFD settings of the Processor.

func (*Processor) RuneInfo ¶

func (p *Processor) RuneInfo(r rune) Info

RuneInfo returns Unicode information for the input rune.

func (*Processor) UnicodeInfo ¶

func (p *Processor) UnicodeInfo(s string) []Info

UnicodeInfo creates a list with Unicode information for each input rune.

type Token ¶

type Token struct {
	UnicodeBlock string `json:"block"`
	String       string `json:"string"`
}

type Tokenizer ¶

type Tokenizer struct {
	UP             Processor
	SkipWhiteSpace bool
}

Tokenizer is a simple Unicode tokenizer that groups characters by code block. A sequence of characters is treated as one token as long as they all belong to the same Unicode code block. Numerals, spacing and punctuation are treated as separate code blocks.

func (*Tokenizer) BlockFor ¶

func (t *Tokenizer) BlockFor(r rune) string

BlockFor returns the name of the unicode block for the input rune. Numerals, spacing and punctuation are treated as separate code blocks.

func (*Tokenizer) Tokenize ¶

func (t *Tokenizer) Tokenize(s string) []Token
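
The grouping behaviour described for Tokenizer can be sketched as follows. This is an illustration under stated assumptions, not the package's code: `token`, `classFor` and `tokenize` are hypothetical names, the labels come from the standard library's script tables rather than real Unicode block data, and the `skipWhiteSpace` flag is assumed to drop white-space tokens from the result.

```go
package main

import (
	"fmt"
	"unicode"
)

// token is a hypothetical type mirroring the shape of Token.
type token struct {
	Block  string
	String string
}

// classFor assigns a grouping label to r. Digits, white space and
// punctuation get their own labels; other runes are labelled by the
// standard-library script table that contains them (an assumption).
func classFor(r rune) string {
	switch {
	case unicode.IsDigit(r):
		return "Digit"
	case unicode.IsSpace(r):
		return "Space"
	case unicode.IsPunct(r):
		return "Punctuation"
	}
	for name, table := range unicode.Scripts {
		if unicode.Is(table, r) {
			return name
		}
	}
	return "Unknown"
}

// tokenize groups consecutive runes that share a label into one token.
// When skipWhiteSpace is true, white-space tokens are omitted.
func tokenize(s string, skipWhiteSpace bool) []token {
	var tokens []token
	var cur []rune
	curClass := ""
	flush := func() {
		if len(cur) > 0 && !(skipWhiteSpace && curClass == "Space") {
			tokens = append(tokens, token{Block: curClass, String: string(cur)})
		}
		cur = cur[:0]
	}
	for _, r := range s {
		c := classFor(r)
		if c != curClass {
			flush() // label changed: close the current token
			curClass = c
		}
		cur = append(cur, r)
	}
	flush()
	return tokens
}

func main() {
	for _, t := range tokenize("abc 123, мир", true) {
		fmt.Printf("%-12s %q\n", t.Block, t.String)
	}
}
```

With white space skipped, the input "abc 123, мир" yields four tokens in this sketch: a Latin run, a digit run, a single punctuation mark, and a Cyrillic run.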
