mycharset

Version: v0.0.1
Published: Jan 11, 2023 License: MIT Imports: 10 Imported by: 0

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func DetectAll

func DetectAll(content []byte) ([]chardet.Result, error)

DetectAll returns all chardet.Results that have non-zero Confidence. The Results are sorted by Confidence in descending order.

Equivalent to chardet.NewTextDetector().DetectAll(content) from saintfish/chardet.
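
The ordering described above (drop zero-confidence candidates, then sort by Confidence in descending order) can be sketched with the standard library alone. The result type and rankResults function below are hypothetical stand-ins for illustration; they are not part of this package or of saintfish/chardet.

```go
package main

import "sort"

// result is a hypothetical, reduced stand-in for chardet.Result.
type result struct {
	Charset    string
	Confidence int
}

// rankResults keeps only candidates with non-zero Confidence and
// sorts them so the most confident candidate comes first.
func rankResults(in []result) []result {
	out := make([]result, 0, len(in))
	for _, r := range in {
		if r.Confidence > 0 {
			out = append(out, r)
		}
	}
	// Stable sort keeps the input order for candidates with equal Confidence.
	sort.SliceStable(out, func(i, j int) bool {
		return out[i].Confidence > out[j].Confidence
	})
	return out
}
```

A caller that only needs the top candidate would read the first element of the returned slice, which is the role DetectBest plays.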

func DetectAndConvertToUtf8

func DetectAndConvertToUtf8(content []byte) (convertedContent []byte, charset string, confidence int, converted bool, err error)

DetectAndConvertToUtf8 detects the charset of content and converts content to UTF-8.

func DetectBest

func DetectBest(content []byte) (r *chardet.Result, err error)

DetectBest returns the chardet.Result with highest Confidence.

Equivalent to chardet.NewTextDetector().DetectBest(content) from saintfish/chardet.

func IsValidBig5

func IsValidBig5(content []byte) bool

IsValidBig5 reports whether content is valid under the Big5 rules. Reference: https://zh.wikipedia.org/wiki/Big5
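
As a rough illustration of what a Big5 validity check involves, the sketch below tests byte structure only: ASCII bytes stand alone, and any other character is a lead byte in 0x81-0xFE followed by a trail byte in 0x40-0x7E or 0xA1-0xFE. These ranges follow the common description of Big5; the function name looksLikeBig5 is hypothetical, and the rule IsValidBig5 actually applies may be stricter or looser.

```go
package main

// isBig5Lead reports whether b falls in the Big5 lead-byte range.
func isBig5Lead(b byte) bool { return b >= 0x81 && b <= 0xFE }

// isBig5Trail reports whether b falls in one of the two Big5 trail-byte ranges.
func isBig5Trail(b byte) bool {
	return (b >= 0x40 && b <= 0x7E) || (b >= 0xA1 && b <= 0xFE)
}

// looksLikeBig5 checks byte structure only. It does not verify that each
// two-byte sequence maps to an assigned character.
func looksLikeBig5(content []byte) bool {
	for i := 0; i < len(content); {
		b := content[i]
		switch {
		case b <= 0x7F: // single-byte ASCII
			i++
		case isBig5Lead(b) && i+1 < len(content) && isBig5Trail(content[i+1]):
			i += 2
		default: // stray byte, truncated pair, or bad trail byte
			return false
		}
	}
	return true
}
```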

func IsValidGB18030

func IsValidGB18030(content []byte) bool

IsValidGB18030 reports whether content is valid under the GB18030 rules. Reference: https://zh.wikipedia.org/wiki/GB_18030
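
GB18030 mixes one-, two-, and four-byte forms, so a validity check has to decide the form from the second byte. The sketch below is a structural check under the commonly documented ranges: a two-byte trail in 0x40-0xFE excluding 0x7F (the same layout GBK uses), and a four-byte form whose second and fourth bytes are in 0x30-0x39. The names inRange and looksLikeGB18030 are hypothetical, and IsValidGB18030 itself may apply different rules.

```go
package main

// inRange reports whether b lies in the closed interval [lo, hi].
func inRange(b, lo, hi byte) bool { return b >= lo && b <= hi }

// looksLikeGB18030 checks byte structure only. It does not verify that
// each sequence maps to an assigned code point.
func looksLikeGB18030(content []byte) bool {
	for i := 0; i < len(content); {
		rest := content[i:]
		switch {
		case rest[0] <= 0x7F: // single-byte ASCII
			i++
		case !inRange(rest[0], 0x81, 0xFE): // 0x80 and 0xFF cannot start a character
			return false
		case len(rest) >= 2 && rest[1] != 0x7F && inRange(rest[1], 0x40, 0xFE):
			i += 2 // two-byte form
		case len(rest) >= 4 && inRange(rest[1], 0x30, 0x39) &&
			inRange(rest[2], 0x81, 0xFE) && inRange(rest[3], 0x30, 0x39):
			i += 4 // four-byte form
		default:
			return false
		}
	}
	return true
}
```

Because the two-byte trail range and the four-byte second-byte range do not overlap, the second byte alone is enough to choose between the two forms.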

func IsValidGBK

func IsValidGBK(content []byte) bool

IsValidGBK reports whether content is valid under the GBK rules. Reference: https://zh.wikipedia.org/wiki/GBK

func IsValidUTF16

func IsValidUTF16(content []byte) (isUTF16 bool, BE bool, LE bool)

IsValidUTF16 reports whether content is valid under the UTF-16 rules. Reference: https://zh.wikipedia.org/wiki/UTF-16

Returns: isUTF16 bool, BE bool, LE bool

BE: true if content is valid under the UTF-16 BE rules, false if not.
LE: true if content is valid under the UTF-16 LE rules, false if not.

func IsValidUTF16BE

func IsValidUTF16BE(content []byte) bool

Check whether content is valid under UTF-16-BE rule, reference: https://zh.wikipedia.org/wiki/UTF-16

func IsValidUTF16LE

func IsValidUTF16LE(content []byte) bool

IsValidUTF16LE reports whether content is valid under the UTF-16-LE rules. Reference: https://zh.wikipedia.org/wiki/UTF-16

This function assumes content is little-endian and then validates it using the same method as IsValidUTF16BE.
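
The approach of treating little-endian input as byte-swapped big-endian input can be sketched as follows. This is an illustration under assumptions, not the package's source: validUTF16BE and validUTF16LE are hypothetical names, and the only rules checked are even length and correct surrogate pairing.

```go
package main

// validUTF16BE reports whether content is structurally valid big-endian
// UTF-16: an even number of bytes with every surrogate correctly paired.
func validUTF16BE(content []byte) bool {
	if len(content)%2 != 0 {
		return false
	}
	for i := 0; i < len(content); i += 2 {
		u := uint16(content[i])<<8 | uint16(content[i+1])
		switch {
		case u >= 0xD800 && u <= 0xDBFF: // high surrogate: a low surrogate must follow
			if i+3 >= len(content) {
				return false
			}
			next := uint16(content[i+2])<<8 | uint16(content[i+3])
			if next < 0xDC00 || next > 0xDFFF {
				return false
			}
			i += 2 // skip the low surrogate that was just checked
		case u >= 0xDC00 && u <= 0xDFFF: // low surrogate without a preceding high one
			return false
		}
	}
	return true
}

// validUTF16LE swaps each byte pair and reuses the big-endian check.
func validUTF16LE(content []byte) bool {
	if len(content)%2 != 0 {
		return false
	}
	swapped := make([]byte, len(content))
	for i := 0; i+1 < len(content); i += 2 {
		swapped[i], swapped[i+1] = content[i+1], content[i]
	}
	return validUTF16BE(swapped)
}
```

Many byte sequences pass both checks, since most 16-bit values are not surrogates in either byte order. That is consistent with IsValidUTF16 reporting BE and LE as separate flags rather than a single answer.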

func IsValidUTF8

func IsValidUTF8(content []byte) bool

IsValidUTF8 reports whether content is valid under the UTF-8 rules.
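
For comparison, the Go standard library already provides a UTF-8 validity check in unicode/utf8. The sketch below wraps it under a hypothetical name, looksLikeUTF8; whether IsValidUTF8 is implemented this way is not stated in this documentation.

```go
package main

import "unicode/utf8"

// looksLikeUTF8 reports whether content consists entirely of
// well-formed UTF-8 sequences, as defined by utf8.Valid.
func looksLikeUTF8(content []byte) bool {
	return utf8.Valid(content)
}
```

Text in a legacy double-byte encoding usually fails this check, because its byte pairs rarely form well-formed UTF-8 sequences.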

func ToUtf8WithDecoder

func ToUtf8WithDecoder(content []byte, d encoding.Decoder) ([]byte, error)

ToUtf8WithDecoder returns content converted to UTF-8 using the given encoding.Decoder.

func ToUtf8WithEncoding

func ToUtf8WithEncoding(content []byte, e encoding.Encoding) ([]byte, error)

ToUtf8WithEncoding returns content converted to UTF-8 using the given encoding.Encoding.
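
To show what converting to UTF-8 means at the byte level, here is a standard-library-only sketch for one simple source encoding, ISO-8859-1, where every byte maps directly to the Unicode code point with the same value. It is a swapped-in illustration, not this package's method: latin1ToUTF8 is a hypothetical name, and the functions above take an encoding.Decoder or encoding.Encoding instead of hard-coding a charset.

```go
package main

// latin1ToUTF8 decodes ISO-8859-1 bytes into runes and re-encodes
// them as UTF-8. Bytes above 0x7F become two-byte UTF-8 sequences.
func latin1ToUTF8(content []byte) []byte {
	runes := make([]rune, len(content))
	for i, b := range content {
		runes[i] = rune(b) // ISO-8859-1 byte value equals the code point
	}
	return []byte(string(runes))
}
```

For example, the ISO-8859-1 bytes 0x63 0x61 0x66 0xE9 ("café") come out as five UTF-8 bytes, since é (U+00E9) is encoded as 0xC3 0xA9.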

Types

type Result

type Result struct {
	// IANA name of the detected charset.
	Charset string
	// IANA name of the detected language. It may be empty for some charsets.
	Language string
	// Confidence of the Result. Scale from 1 to 100. The bigger, the more confident.
	Confidence int
	// Encoding of the Result, default encoding.Nop.
	Encoding encoding.Encoding
	// Whether the charset can be converted by this package
	Convertible bool
}

Result contains all the information that charset detector gives.

func DetectEncoding

func DetectEncoding(content []byte) (r *Result, err error)

DetectEncoding returns the Result with the highest Confidence and stores the matched encoding.Encoding in the Result if Confidence > 95.
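
The threshold rule can be sketched as below. Everything in the sketch is a hypothetical stand-in: the detection type uses a string where Result holds an encoding.Encoding, the knownEncodings table and its entries are illustrative, and only the "attach when Confidence > 95, otherwise keep the Nop default" behavior comes from the description above.

```go
package main

// confidenceThreshold mirrors the documented rule: attach an encoding
// only when Confidence is strictly greater than 95.
const confidenceThreshold = 95

// knownEncodings is an illustrative lookup from charset name to the
// name of an encoding. The real Result holds an encoding.Encoding value.
var knownEncodings = map[string]string{
	"GB-18030": "simplifiedchinese.GB18030",
	"Big5":     "traditionalchinese.Big5",
}

// detection is a hypothetical, reduced stand-in for Result.
type detection struct {
	Charset    string
	Confidence int
	Encoding   string // "Nop" stands in for the encoding.Nop default
}

// attachEncoding fills in Encoding only for confident, known charsets.
func attachEncoding(charset string, confidence int) detection {
	d := detection{Charset: charset, Confidence: confidence, Encoding: "Nop"}
	if confidence > confidenceThreshold {
		if enc, ok := knownEncodings[charset]; ok {
			d.Encoding = enc
		}
	}
	return d
}
```

Under this rule a Confidence of exactly 95 does not qualify, so the Encoding stays at its default.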
