utf32

package

Version: v0.14.0 (latest) | Published: Oct 11, 2023 | License: BSD-3-Clause | Imports: 5 | Imported by: 49

Repository

cs.opensource.google/go/x/text

Documentation ¶

Overview ¶

Package utf32 provides the UTF-32 Unicode encoding.

Please note that support for UTF-32 is discouraged: it is a rare and inefficient encoding, unfit for use as an interchange format. On the web, the W3C strongly discourages its use (https://www.w3.org/TR/html5/document-metadata.html#charset), while the WHATWG directly prohibits supporting it (https://html.spec.whatwg.org/multipage/syntax.html#character-encodings).

Index ¶

Variables
func UTF32(e Endianness, b BOMPolicy) encoding.Encoding
type BOMPolicy
type Endianness

Constants ¶

This section is empty.

Variables ¶

var All = []encoding.Encoding{
	UTF32(BigEndian, UseBOM),
	UTF32(BigEndian, IgnoreBOM),
	UTF32(LittleEndian, IgnoreBOM),
}

All lists a configuration for each IANA-defined UTF-32 variant.

var ErrMissingBOM = errors.New("encoding: missing byte order mark")

ErrMissingBOM means that decoding UTF-32 input with ExpectBOM did not find a starting byte order mark.

Functions ¶

func UTF32 ¶

func UTF32(e Endianness, b BOMPolicy) encoding.Encoding

UTF32 returns a UTF-32 Encoding for the given default endianness and byte order mark (BOM) policy.

When decoding from UTF-32 to UTF-8, the behavior depends on the BOMPolicy.

If the BOMPolicy is IgnoreBOM, neither BOMs (U+FEFF) nor ill-formed code units 0xFFFE0000 in the input stream affect the endianness used for decoding. Instead, BOMs are output as their standard UTF-8 encoding "\xef\xbb\xbf", while 0xFFFE0000 code units are output as "\xef\xbf\xbd", the standard UTF-8 encoding of the Unicode replacement character.

If the BOMPolicy is UseBOM or ExpectBOM, a starting BOM is not written to the UTF-8 output. Instead, it overrides the default endianness e for the remainder of the transformation. Any subsequent BOMs (U+FEFF) or ill-formed code units 0xFFFE0000 do not affect the endianness used and are output as their standard UTF-8 (replacement) encodings. If there is no starting BOM, UseBOM proceeds with the default endianness, whereas ExpectBOM returns early with an ErrMissingBOM error.

When encoding from UTF-8 to UTF-32, a BOM will be inserted at the start of the output if the BOMPolicy is UseBOM or ExpectBOM. Otherwise, a BOM will not be inserted. The UTF-8 input does not need to contain a BOM.

There is no concept of a 'native' endianness. If the UTF-32 data is produced and consumed in a greater context that implies a certain endianness, use IgnoreBOM. Otherwise, use ExpectBOM and always produce and consume a BOM.

In the language of https://www.unicode.org/faq/utf_bom.html#bom10, IgnoreBOM corresponds to "Where the precise type of the data stream is known... the BOM should not be used" and ExpectBOM corresponds to "A particular protocol... may require use of the BOM".

Types ¶

type BOMPolicy ¶

type BOMPolicy uint8

BOMPolicy is a UTF-32 encoding's byte order mark policy.

const (
	// IgnoreBOM means to ignore any byte order marks.
	IgnoreBOM BOMPolicy = 0

	// UseBOM means that the UTF-32 form may start with a byte order mark,
	// which will be used to override the default endianness.
	UseBOM BOMPolicy = writeBOM | acceptBOM

	// ExpectBOM means that the UTF-32 form must start with a byte order mark,
	// which will be used to override the default endianness.
	ExpectBOM BOMPolicy = writeBOM | acceptBOM | requireBOM
)

type Endianness ¶

type Endianness bool

Endianness is a UTF-32 encoding's default endianness.

const (
	// BigEndian is UTF-32BE.
	BigEndian Endianness = false
	// LittleEndian is UTF-32LE.
	LittleEndian Endianness = true
)

Source Files ¶

utf32.go
