encoding

package

v0.0.0-...-d3199ed Latest Published: Dec 18, 2020 License: Apache-2.0 Imports: 10 Imported by: 0

Repository

github.com/badu/term

README ¶

A package for dealing with encodings

Rearranged from this package.

Documentation ¶

Index ¶

Constants
func GetEncoding(charset string) encoding.Encoding
func Register()
func RegisterEncoding(charset string, enc encoding.Encoding)
func SetEncodingFallback(fb Fallback)
type CharMap
type Fallback

Constants ¶

const (
	FallbackFail  = iota // FallbackFail behavior causes GetEncoding to fail when it cannot find an encoding.
	FallbackASCII        // FallbackASCII behavior causes GetEncoding to fall back to a 7-bit ASCII encoding, if no other encoding can be found.
	FallbackUTF8         // FallbackUTF8 behavior causes GetEncoding to assume UTF-8 can pass unmodified upon failure. Note that this behavior is not recommended unless you are sure your terminal can cope with real UTF-8 sequences.
)

const (
	Sterling = '£'
	DArrow   = '↓'
	LArrow   = '←'
	RArrow   = '→'
	UArrow   = '↑'
	Bullet   = '·'
	Board    = '░'
	CkBoard  = '▒'
	Degree   = '°'
	Diamond  = '◆'
	GEqual   = '≥'
	Pi       = 'π'
	HLine    = '─'
	Lantern  = '§'
	Plus     = '┼'
	LEqual   = '≤'
	LLCorner = '└'
	LRCorner = '┘'
	NEqual   = '≠'
	PlMinus  = '±'
	S1       = '⎺'
	S3       = '⎻'
	S7       = '⎼'
	S9       = '⎽'
	Block    = '█'
	TTee     = '┬'
	RTee     = '┤'
	LTee     = '├'
	BTee     = '┴'
	ULCorner = '┌'
	URCorner = '┐'
	VLine    = '│'
	Space    = ' '
)

The names of these constants are chosen to match Terminfo names, modulo case and with the ACS_ prefix dropped. These are the runes that receive special handling, with ASCII fallbacks for terminals that lack them.
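The idea of an ASCII fallback can be sketched as follows. This is only an illustration: the asciiFallback table, the degrade helper, and the replacement characters chosen here are hypothetical and are not taken from the package.

```go
package main

import "fmt"

// A few of the constants documented above, redeclared so that the sketch
// compiles on its own.
const (
	HLine    = '─'
	VLine    = '│'
	ULCorner = '┌'
	RArrow   = '→'
)

// asciiFallback is a hypothetical table. The replacement characters are
// illustrative choices, not values taken from the package.
var asciiFallback = map[rune]rune{
	HLine:    '-',
	VLine:    '|',
	ULCorner: '+',
	RArrow:   '>',
}

// degrade returns r unchanged if the terminal can display it, the ASCII
// stand-in if one is known, and '?' otherwise.
func degrade(r rune, canDisplay func(rune) bool) rune {
	if canDisplay(r) {
		return r
	}
	if fb, ok := asciiFallback[r]; ok {
		return fb
	}
	return '?'
}

func main() {
	// Pretend the terminal can only show 7-bit ASCII.
	asciiOnly := func(r rune) bool { return r < 0x80 }
	for _, r := range []rune{ULCorner, HLine, RArrow, 'x', 'é'} {
		fmt.Printf("%c", degrade(r, asciiOnly))
	}
	fmt.Println() // prints "+->x?"
}
```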

Variables ¶

This section is empty.

Functions ¶

func GetEncoding ¶

func GetEncoding(charset string) encoding.Encoding

GetEncoding is used by Screen implementors who want to locate an encoding for the given character set name. Note that this will return nil for either the Unicode (UTF-8) or ASCII encodings, since we don't use encodings for them but instead have our own native methods.

func Register ¶

func Register()

Register registers all known encodings. This is a shortcut to add full character set support to your program. Note that this can add several megabytes to your program's size, because some of the encodings are rather large (particularly those from East Asia).

func RegisterEncoding ¶

func RegisterEncoding(charset string, enc encoding.Encoding)

RegisterEncoding may be called by the application to register an encoding. The presence of additional encodings will facilitate application usage with terminal environments where the I/O subsystem does not support Unicode. Windows systems use Unicode natively, and do not need any of the encoding subsystem when using Windows Console screens.

Please see the Go documentation for golang.org/x/text/encoding; most of the common encodings already exist there as stock variables. For example, ISO8859-15 can be registered using the following code:

import "golang.org/x/text/encoding/charmap"
...
RegisterEncoding("ISO8859-15", charmap.ISO8859_15)

Aliases can be registered as well, for example "8859-15" could be an alias for "ISO8859-15".

For POSIX systems, the term package will check the environment variables LC_ALL, LC_CTYPE, and LANG (in that order) to determine the character set. These are expected to have the following pattern:

$language[.$codeset[@$variant]]

We extract only the $codeset part, which will usually be something like UTF-8 or ISO8859-15 or KOI8-R. Note that if the locale is either "POSIX" or "C", then we assume US-ASCII (the POSIX 'portable character set') and assume all other characters are somehow invalid.
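The lookup described above might be sketched like this, using only the standard library. The names codesetOf and charsetFromEnv are hypothetical. Treating an empty variable as unset, and returning an empty string when the locale carries no codeset, are assumptions rather than documented behavior.

```go
package main

import (
	"fmt"
	"strings"
)

// codesetOf extracts the $codeset part of a locale string of the form
// $language[.$codeset[@$variant]]. It returns "" when no codeset is
// present (an assumption; the package may choose a default instead).
func codesetOf(locale string) string {
	if locale == "POSIX" || locale == "C" {
		return "US-ASCII"
	}
	// Drop the optional @variant suffix first.
	if i := strings.IndexByte(locale, '@'); i >= 0 {
		locale = locale[:i]
	}
	// Whatever follows the first '.' is the codeset.
	if i := strings.IndexByte(locale, '.'); i >= 0 {
		return locale[i+1:]
	}
	return ""
}

// charsetFromEnv consults LC_ALL, LC_CTYPE, and LANG in that order and
// uses the first one that has a non-empty value.
func charsetFromEnv(getenv func(string) string) string {
	for _, name := range []string{"LC_ALL", "LC_CTYPE", "LANG"} {
		if v := getenv(name); v != "" {
			return codesetOf(v)
		}
	}
	return ""
}

func main() {
	env := map[string]string{"LC_CTYPE": "ru_RU.KOI8-R", "LANG": "en_US.UTF-8"}
	lookup := func(k string) string { return env[k] }

	fmt.Println(charsetFromEnv(lookup))             // KOI8-R
	fmt.Println(codesetOf("de_DE.ISO8859-15@euro")) // ISO8859-15
	fmt.Println(codesetOf("C"))                     // US-ASCII
}
```

In a real program, os.Getenv can be passed as the getenv argument; taking it as a parameter only keeps the sketch easy to test.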

Modern POSIX systems and terminal emulators may use UTF-8, and for those systems, this API is also unnecessary. For example, Darwin (MacOS X) and modern Linux running modern xterm generally will work out of the box without any of this. Use of UTF-8 is recommended when possible, as it saves quite a lot of processing overhead.

Note that some encodings are quite large (for example GB18030, which is a superset of Unicode), so the application size can be expected to increase quite a bit as each encoding is added. The East Asian encodings have been seen to add 100-200K per encoding to the application size.

func SetEncodingFallback ¶

func SetEncodingFallback(fb Fallback)

SetEncodingFallback changes the behavior of GetEncoding when a suitable encoding is not found. The default is FallbackFail, which causes GetEncoding to simply return nil.
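How registration, aliases, and the fallback setting interact can be modelled with a small stand-in registry. Everything below is a hypothetical sketch that uses only the standard library: the Encoding interface replaces encoding.Encoding from golang.org/x/text, the registry type and its methods are invented for illustration, and folding names to upper case is an assumption. What the real GetEncoding returns in the ASCII and UTF-8 fallback cases is not specified here, so the sketch simply returns named placeholders.

```go
package main

import (
	"fmt"
	"strings"
)

// Encoding stands in for encoding.Encoding from golang.org/x/text so that
// the sketch needs only the standard library.
type Encoding interface{ Name() string }

type named string

func (n named) Name() string { return string(n) }

type Fallback int

const (
	FallbackFail Fallback = iota
	FallbackASCII
	FallbackUTF8
)

// registry is a hypothetical stand-in for the package's internal state.
type registry struct {
	encodings map[string]Encoding
	fallback  Fallback // zero value is FallbackFail
}

func newRegistry() *registry {
	return &registry{encodings: map[string]Encoding{}}
}

// Register stores enc under charset. Case folding is an assumption.
func (r *registry) Register(charset string, enc Encoding) {
	r.encodings[strings.ToUpper(charset)] = enc
}

// Get returns the registered encoding, or applies the fallback policy.
func (r *registry) Get(charset string) Encoding {
	if enc, ok := r.encodings[strings.ToUpper(charset)]; ok {
		return enc
	}
	switch r.fallback {
	case FallbackASCII:
		return named("US-ASCII")
	case FallbackUTF8:
		return named("UTF-8")
	}
	return nil // FallbackFail
}

func main() {
	r := newRegistry()
	latin9 := named("ISO8859-15")
	r.Register("ISO8859-15", latin9)
	r.Register("8859-15", latin9) // alias for the same encoding

	fmt.Println(r.Get("8859-15").Name()) // ISO8859-15
	fmt.Println(r.Get("KOI8-R") == nil)  // true

	r.fallback = FallbackASCII
	fmt.Println(r.Get("KOI8-R").Name()) // US-ASCII
}
```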

Types ¶

type CharMap ¶

type CharMap struct {
	transform.NopResetter

	// The map between bytes and runes.  To indicate that a specific
	// byte value is invalid for a character set, use the rune
	// utf8.RuneError.  Values that are absent from this map will
	// be assumed to have the identity mapping -- that is the default
	// is to assume ISO8859-1, where all 8-bit characters have the same
	// numeric value as their Unicode runes.  (Not to be confused with
	// the UTF-8 values, which *will* be different for non-ASCII runes.)
	//
	// If no values less than RuneSelf are changed (or have non-identity
	// mappings), then the character set is assumed to be an ASCII
	// superset, and certain assumptions and optimizations become
	// available for ASCII bytes.
	Map map[byte]rune

	// The ReplacementChar is the byte value to use for substitution.
	// It should normally be ASCIISub for ASCII encodings.  This may be
	// unset (left to zero) for mappings that are strictly ASCII supersets.
	// In that case ASCIISub will be assumed instead.
	ReplacementChar byte
	// contains filtered or unexported fields
}

CharMap is a structure for setting up encodings for 8-bit character sets, for transforming between UTF-8 and that other character set. It has some ideas borrowed from golang.org/x/text/encoding/charmap, but it uses a different implementation. This implementation uses maps, and supports user-defined maps.

We do assume that a character map has a reasonable substitution character, and that valid encodings are stable (exactly a 1:1 map) and stateless (that is, there is no shift character or anything like that). Hence this approach will not work for many East Asian character sets.

Measurement shows little or no difference in performance between this map-based implementation and the one in golang.org/x/text/encoding/charmap. The difference was down to a couple of nsec/op, with no consistent pattern as to which ran faster. Conversion to UTF-8 takes about 25 nsec/op. Conversion in the reverse direction takes about 100 nsec/op. The larger cost for conversion from UTF-8 is most likely due to the need to decode the UTF-8 byte stream into a rune before conversion.

func (*CharMap) Init ¶

func (c *CharMap) Init()

Init initializes internal values of a character map. This should be done early, to minimize the cost of allocation of transforms later. It is not strictly necessary however, as the allocation functions will arrange to call it if it has not already been done.

func (*CharMap) NewDecoder ¶

func (c *CharMap) NewDecoder() *encoding.Decoder

NewDecoder returns a Decoder that converts from the 8-bit character set to UTF-8. Unknown mappings, if any, are mapped to '\uFFFD'.

func (*CharMap) NewEncoder ¶

func (c *CharMap) NewEncoder() *encoding.Encoder

NewEncoder returns an Encoder that converts from UTF-8 to the 8-bit character set. Unknown mappings are mapped to 0x1A.

type Fallback ¶

type Fallback int

Fallback describes how the system behaves when the locale requires a character set that we do not support. The system always supports UTF-8 and US-ASCII. On Windows consoles, UTF-16LE is also supported automatically. Other character sets must be added using the RegisterEncoding API. (Nearly all of them can be added at once using the Register function in this package.)
