uax11

package
v0.2.0
Published: Dec 8, 2021 License: BSD-3-Clause, Unlicense Imports: 8 Imported by: 2

Documentation

Overview

Package uax11 provides utilities for Unicode® Standard Annex #11 “East Asian Width”.

UAX 11 Introduction

This annex presents the specifications of a normative property for Unicode characters that is useful when interoperating with East Asian Legacy character sets. […] When dealing with East Asian text, there is the concept of an inherent width of a character. This width takes on either of two values: narrow or wide.

[…]

For a traditional East Asian fixed pitch font, this width translates to a display width of either one half or a whole unit width. A common name for this unit width is “Em”. While an Em is customarily the height of the letter “M”, it is the same as the unit width in East Asian fonts, because in these fonts the standard character cell is square.

[…]

Except for a few characters, which are explicitly called out as fullwidth or halfwidth in the Unicode Standard, characters are not duplicated based on distinction in width. Some characters, such as the ideographs, are always wide; others are always narrow; and some can be narrow or wide, depending on the context. The Unicode character property East_Asian_Width provides a default classification of characters, which an implementation can use to decide at runtime whether to treat a character as narrow or wide.

Caveats

Determining the legacy fixed-width display length is not an exact science. Much depends on the properties of output devices, on fonts used, on a device's interpretation of display rules, etc. Clients should treat results of UAX#11 as heuristics. Using proportional fonts is almost always a better solution.

___________________________________________________________________________

License

This project is provided under the terms of the UNLICENSE or the 3-Clause BSD license denoted by the following SPDX identifier:

SPDX-License-Identifier: 'Unlicense' OR 'BSD-3-Clause'

You may use the project under the terms of either license.

Licenses are reproduced in the license file in the root folder of this module.

Copyright © 2021 Norbert Pillmayer <norbert@pillmayer.com>

Variables

var EastAsianContext = makeEastAsianContext()

EastAsianContext is a context for East Asian languages.

var LatinContext = makeLatinContext()

LatinContext is a context for western languages.

Functions

func StringWidth

func StringWidth(s grapheme.String, context *Context) int

StringWidth calculates the width of a grapheme.String in units of `en`, where 1 en equals 1/2 em, i.e. half the width of a fullwidth character.

If an empty context is given, LatinContext is assumed.

s := grapheme.StringFromString("A (世). 😀")
w := uax11.StringWidth(s, uax11.LatinContext)
fmt.Printf("string has fixed-width display length of %d en", w) // ⇒ 10

func Width

func Width(grphm []byte, context *Context) int

Width returns the width of a grapheme, given as a byte slice, in units of `en`, where 1 en equals 1/2 em, i.e. half the width of a fullwidth character. If grphm is invalid or consists only of a zero-width rune, a width of 0 is returned.

If an empty context is given, LatinContext is assumed.

Returns either 0, 1 (narrow character) or 2 (wide character).
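
The 0/1/2 result can be illustrated with a rough, standard-library-only sketch. It is not the package's implementation: `sketchWidth` and `sketchStringWidth` are hypothetical names, the wide class is approximated by whole scripts plus a few hard-coded ranges, and every rune is assumed to be its own grapheme. Real East_Asian_Width data is considerably more fine-grained.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// wideScripts roughly approximates the "wide" East_Asian_Width class.
// Assumption: real property data is more fine-grained than whole scripts.
var wideScripts = []*unicode.RangeTable{
	unicode.Han, unicode.Hiragana, unicode.Katakana, unicode.Hangul,
}

// sketchWidth is a hypothetical stand-in for Width: it looks only at the
// first rune of grphm and returns 0, 1 or 2 en.
func sketchWidth(grphm []byte) int {
	r, size := utf8.DecodeRune(grphm)
	switch {
	case r == utf8.RuneError && size <= 1:
		return 0 // empty or invalid UTF-8
	case unicode.In(r, unicode.Mn, unicode.Cf):
		return 0 // combining marks and format characters occupy no cell
	case r >= 0xFF61 && r <= 0xFFDC:
		return 1 // halfwidth forms, although they belong to wide scripts
	case r >= 0xFF01 && r <= 0xFF60:
		return 2 // fullwidth forms
	case r >= 0x1F600 && r <= 0x1F64F:
		return 2 // emoticons block
	case unicode.In(r, wideScripts...):
		return 2
	}
	return 1
}

// sketchStringWidth sums per-rune widths. Assumption: every rune is its
// own grapheme, which does not hold in general (e.g. emoji sequences).
func sketchStringWidth(s string) int {
	w := 0
	for _, r := range s {
		w += sketchWidth([]byte(string(r)))
	}
	return w
}

func main() {
	fmt.Println(sketchWidth([]byte("世")), sketchWidth([]byte("A"))) // 2 1
	fmt.Println(sketchStringWidth("A (世). 😀"))                      // 10
}
```

For the sample string used with StringWidth, six narrow characters and two wide ones add up to 10 en, matching the documented result.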

Types

type Context

type Context struct {
	ForceEastAsian bool            // force East Asian context
	Script         language.Script // ISO 15924 script identifier
	Locale         string          // ISO 639/3166 locale string
	// contains filtered or unexported fields
}

Context represents information about the typesetting environment.

From UAX#11: The term context as used here includes extra information such as explicit markup, knowledge of the source code page, font information, or language and script identification.

Clients may fill a context partially and hand it over to uax11. The functions in this package will try to derive a meaningful context from a partially filled one. This package relies on https://pkg.go.dev/golang.org/x/text/language/ for this to work.

context := &Context{Locale: "zh"}   // unspecified Chinese
_ = Width([]byte("世"), context)
fmt.Printf("%v", context.Script)    // ⇒ "Hans" (simplified Chinese script)

Alternatively, clients may use one of the pre-defined contexts or use `ContextFromEnvironment` to get a client-machine dependent one.
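
Why a context matters can be sketched as follows. UAX#11 classifies some characters as ambiguous: they are treated as wide in an East Asian context and as narrow otherwise. The sketch below is an illustration under stated assumptions, not the package's logic: `sketchContext`, `eastAsian` and `ambiguousWidth` are hypothetical names, and a hard-coded list of language subtags replaces the derivation via golang.org/x/text/language.

```go
package main

import (
	"fmt"
	"strings"
)

// sketchContext mirrors two of the exported Context fields. It is a
// hypothetical type for illustration, not the package's Context.
type sketchContext struct {
	ForceEastAsian bool
	Locale         string
}

// eastAsian reports whether the context should be treated as East Asian.
// Assumption: only the language subtags zh, ja and ko count as East Asian.
func (c *sketchContext) eastAsian() bool {
	if c == nil {
		return false // no context: fall back to Latin behaviour
	}
	if c.ForceEastAsian {
		return true
	}
	lang := strings.ToLower(c.Locale)
	if i := strings.IndexAny(lang, "-_"); i >= 0 {
		lang = lang[:i] // "zh-CN" or "zh_CN" -> "zh"
	}
	switch lang {
	case "zh", "ja", "ko":
		return true
	}
	return false
}

// ambiguousWidth returns the width in en of a character whose
// East_Asian_Width class is Ambiguous.
func ambiguousWidth(c *sketchContext) int {
	if c.eastAsian() {
		return 2
	}
	return 1
}

func main() {
	fmt.Println(ambiguousWidth(&sketchContext{Locale: "zh"}))    // 2
	fmt.Println(ambiguousWidth(&sketchContext{Locale: "de-DE"})) // 1
	fmt.Println(ambiguousWidth(nil))                             // 1
}
```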

func ContextFromEnvironment

func ContextFromEnvironment() *Context

ContextFromEnvironment creates a Context from the operating system environment, i.e. either from environment variables on *nix systems or from a kernel call on Windows systems. (We rely on http://github.com/cloudfoundry/jibber_jabber for this.)
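
As a rough idea of what reading a locale from *nix environment variables involves, here is a minimal sketch using only the standard library. It does not reproduce what jibber_jabber or ContextFromEnvironment actually do: `localeFromEnv` is a hypothetical helper, and the lookup order LC_ALL, LC_CTYPE, LANG is an assumption based on common POSIX conventions.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// localeFromEnv is a hypothetical helper, not part of uax11 or jibber_jabber.
// It checks LC_ALL, LC_CTYPE and LANG in that order and returns the first
// usable value as a "language-TERRITORY" string, e.g. "zh-CN".
// The getenv parameter is injected so the function can be tested.
func localeFromEnv(getenv func(string) string) (string, bool) {
	for _, name := range []string{"LC_ALL", "LC_CTYPE", "LANG"} {
		v := getenv(name)
		// Strip codeset and modifier: "zh_CN.UTF-8@pinyin" -> "zh_CN".
		if i := strings.IndexAny(v, ".@"); i >= 0 {
			v = v[:i]
		}
		if v == "" || v == "C" || v == "POSIX" {
			continue // unset or locale-neutral value, try the next variable
		}
		return strings.ReplaceAll(v, "_", "-"), true
	}
	return "", false
}

func main() {
	if loc, ok := localeFromEnv(os.Getenv); ok {
		fmt.Println("locale from environment:", loc)
	} else {
		fmt.Println("no locale found in environment")
	}
}
```

A string obtained this way could then be placed in the Locale field of a Context, leaving the remaining fields for the package to derive.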
