mojibake

v0.0.0-...-96bec8e Latest
Warning

This package is not in the latest version of its module.

Published: May 30, 2023 License: Apache-2.0 Imports: 3 Imported by: 0

Documentation

Constants

const (
	// Limits of first byte of UTF-8 encoded Unicode codepoint outside the ASCII range
	UTF8FirstByteMin = 194 // \U00000080 in UTF-8 starts with byte 194
	UTF8FirstByteMax = 244 // \U0010FFFF in UTF-8 starts with byte 244
)
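
The two bounds can be checked with the standard library. The sketch below is only an illustration: `firstByte` is a hypothetical helper, not part of this package, and it relies on nothing but `unicode/utf8`.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// firstByte returns the first byte of the UTF-8 encoding of r.
// Hypothetical helper for illustration; it is not part of the package.
func firstByte(r rune) byte {
	buf := make([]byte, utf8.UTFMax)
	utf8.EncodeRune(buf, r)
	return buf[0]
}

func main() {
	fmt.Println(firstByte(0x80))     // lowest non-ASCII code point: 194
	fmt.Println(firstByte(0x10FFFF)) // highest Unicode code point: 244
}
```

Bytes 192 and 193 never appear as a first byte because they would only introduce overlong encodings of ASCII characters, which is presumably why the lower bound is 194.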

Variables

var (
	// ErrUTF8 is raised if the input has rune errors
	ErrUTF8 = errors.New("UTF-8 encoding error")

	// ErrImpure is raised if the string is not purely double UTF-8 encoded.
	// Impurity criteria:
	// - some runes have values above 255
	// - some consecutive runes with values < 256 do not combine into a valid rune
	ErrImpure = errors.New("FixDoubleUTF8: skip (impure input)")
)

Functions

func FixDoubleUTF8

func FixDoubleUTF8(buf []byte) ([]byte, error)

FixDoubleUTF8 fixes double UTF-8 encoding in place.

All precautions are taken: nothing is changed if the input is not purely double encoded.

On error, buf is returned unchanged. On success, if double-encoded sequences were found, the returned slice is shorter than the input.

Two errors may be returned:

  • ErrUTF8: the input is not a valid UTF-8 string
  • ErrImpure: the input is valid UTF-8, but it is not purely double encoded (see the impurity criteria under ErrImpure above)
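
The contract described above can be sketched without the package itself. The code below is a simplified illustration, not the package's implementation: `fixDouble`, `doubleEncode`, `errUTF8`, and `errImpure` are hypothetical names, the sketch allocates a new slice instead of working in place, and it assumes the double encoding went through Latin-1 (each original byte reinterpreted as a code point below 256), which matches the impurity criteria listed for ErrImpure.

```go
package main

import (
	"errors"
	"fmt"
	"unicode/utf8"
)

// Hypothetical stand-ins for the package's ErrUTF8 and ErrImpure.
var (
	errUTF8   = errors.New("UTF-8 encoding error")
	errImpure = errors.New("impure input")
)

// doubleEncode simulates the bug: every byte of an already UTF-8 encoded
// string is treated as a code point and encoded to UTF-8 a second time.
func doubleEncode(s string) []byte {
	var out []byte
	for i := 0; i < len(s); i++ {
		out = append(out, string(rune(s[i]))...)
	}
	return out
}

// fixDouble is a simplified, allocating sketch of the documented behavior.
// On any error the original buf is returned untouched.
func fixDouble(buf []byte) ([]byte, error) {
	if !utf8.Valid(buf) {
		return buf, errUTF8
	}
	out := make([]byte, 0, len(buf))
	for _, r := range string(buf) {
		if r > 255 {
			// A rune above 255 cannot be a re-encoded single byte.
			return buf, errImpure
		}
		out = append(out, byte(r))
	}
	if !utf8.Valid(out) {
		// The recovered bytes do not combine into valid runes.
		return buf, errImpure
	}
	return out, nil
}

func main() {
	garbled := doubleEncode("café")
	fixed, err := fixDouble(garbled)
	fmt.Printf("%q -> %q (err: %v)\n", garbled, fixed, err)

	// Correctly encoded text is rejected as impure and left alone.
	_, err = fixDouble([]byte("café"))
	fmt.Println(err)
}
```

Under these assumptions, pure ASCII input passes through unchanged with no error, since each ASCII byte is its own valid rune in both encodings.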

func FixHTMLEntities

func FixHTMLEntities(s string) string

Types

This section is empty.
