import "github.com/henrylee2cn/mahonia"
This package is a character-set conversion library for Go.
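(DEPRECATED: use golang.org/x/text/encoding, perhaps along with golang.org/x/net/html/charset. These packages were formerly hosted at code.google.com/p/go.text/encoding and code.google.com/p/go.net/html/charset.)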
8bit.go ASCII.go big5-data.go big5.go charset.go convert_string.go cp51932.go entity.go entity_data.go euc-jp.go euc-kr-data.go euc-kr.go fallback.go gb18030-data.go gb18030.go gbk-data.go gbk.go iso2022jp.go jis0201-data.go jis0208-data.go jis0212-data.go kuten.go mbcs.go ms-jis-data.go reader.go shiftjis-data.go shiftjis.go tcvn3.go translate.go utf16.go utf8.go writer.go
const (
	// SUCCESS means that the character was converted with no problems.
	SUCCESS = Status(iota)

	// INVALID_CHAR means that the source contained invalid bytes, or that the character
	// could not be represented in the destination encoding.
	// The Encoder or Decoder should have output a substitute character.
	INVALID_CHAR

	// NO_ROOM means there were not enough input bytes to form a complete character,
	// or there was not enough room in the output buffer to write a complete character.
	// No bytes were written, and no internal state was changed in the Encoder or Decoder.
	NO_ROOM

	// STATE_ONLY means that bytes were read or written indicating a state transition,
	// but no actual character was processed. (Examples: byte order marks, ISO-2022 escape sequences)
	STATE_ONLY
)
RegisterCharset adds a charset to the charsetMap.
type Charset struct {
	// Name is the character set's canonical name.
	Name string

	// Aliases is a list of alternate names.
	Aliases []string

	// NewDecoder returns a Decoder to convert from the charset to Unicode.
	NewDecoder func() Decoder

	// NewEncoder returns an Encoder to convert from Unicode to the charset.
	NewEncoder func() Encoder
}
A Charset represents a character set that can be converted, and contains functions to create Converters to encode and decode strings in that character set.
GetCharset fetches a charset by name. If the name is not found, it returns nil.
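The registration and lookup behavior described above can be pictured with a small stand-alone sketch. It does not use the package itself: the trimmed Charset struct, the registerCharset and getCharset helpers, and the case-insensitive matching are all assumptions made for illustration, and the package's real name matching may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// Charset is a trimmed local copy of the struct shown above. The
// NewDecoder and NewEncoder fields are omitted to keep the sketch short.
type Charset struct {
	Name    string
	Aliases []string
}

// charsetMap is a hypothetical registry keyed by lower-cased name.
// Case-insensitive matching is an assumption of this sketch.
var charsetMap = map[string]*Charset{}

// registerCharset files cs under its canonical name and every alias.
func registerCharset(cs *Charset) {
	charsetMap[strings.ToLower(cs.Name)] = cs
	for _, alias := range cs.Aliases {
		charsetMap[strings.ToLower(alias)] = cs
	}
}

// getCharset returns the registered charset, or nil if the name is unknown.
func getCharset(name string) *Charset {
	return charsetMap[strings.ToLower(name)]
}

func main() {
	registerCharset(&Charset{
		Name:    "ISO-8859-1",
		Aliases: []string{"latin1", "l1"},
	})

	if cs := getCharset("Latin1"); cs != nil {
		fmt.Println("found:", cs.Name)
	}
	if getCharset("no-such-charset") == nil {
		fmt.Println("unknown charset yields nil")
	}
}
```

Because an unknown name yields nil rather than an error, callers need a nil check before using the result.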
A Decoder is a function that decodes a character set, one character at a time. It works much like utf8.DecodeRune, but has an additional status return value.
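To make the one-character-at-a-time contract concrete, here is a minimal stand-alone sketch. The Status and Decoder types are local stand-ins whose shape is assumed from the descriptions on this page, and decodeASCII and decodeAll are hypothetical helpers, not part of the package.

```go
package main

import "fmt"

// Status is a local stand-in for the package's status type.
type Status int

const (
	SUCCESS Status = iota
	INVALID_CHAR
	NO_ROOM
	STATE_ONLY
)

// Decoder is an assumed shape: it inspects p and reports the decoded rune,
// the number of bytes consumed, and a status.
type Decoder func(p []byte) (c rune, size int, status Status)

// decodeASCII is a hypothetical Decoder for 7-bit ASCII.
func decodeASCII(p []byte) (rune, int, Status) {
	if len(p) == 0 {
		return 0, 0, NO_ROOM
	}
	if p[0] > 0x7F {
		// Consume the bad byte and emit the Unicode replacement character.
		return 0xFFFD, 1, INVALID_CHAR
	}
	return rune(p[0]), 1, SUCCESS
}

// decodeAll drives a Decoder over a whole buffer. The boolean result is
// false if any character was invalid or the input ended mid-character.
func decodeAll(d Decoder, p []byte) (string, bool) {
	out := make([]rune, 0, len(p))
	ok := true
	for len(p) > 0 {
		c, size, status := d(p)
		switch status {
		case NO_ROOM:
			// Truncated input: nothing more can be decoded.
			return string(out), false
		case STATE_ONLY:
			// A state transition only: skip the bytes, emit nothing.
			p = p[size:]
			continue
		case INVALID_CHAR:
			ok = false
		}
		out = append(out, c)
		p = p[size:]
	}
	return string(out), ok
}

func main() {
	s, ok := decodeAll(decodeASCII, []byte("Go\xff!"))
	fmt.Printf("%q ok=%v\n", s, ok)
}
```

The driver loop shows why the status value matters: INVALID_CHAR still advances through the input with a substitute character, while NO_ROOM stops without consuming anything.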
EntityDecoder returns a Decoder that decodes HTML character entities. If there is no valid character entity at the current position, it returns INVALID_CHAR. So it needs to be combined with another Decoder via FallbackDecoder.
FallbackDecoder combines a series of Decoders into one. If the first Decoder returns a status of INVALID_CHAR, the others are tried as well.
Note: if the text to be decoded ends with a sequence of bytes that is not a valid character in the first charset, but it could be the beginning of a valid character, the FallbackDecoder will give a status of NO_ROOM instead of falling back to the other Decoders.
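The fallback rule and the caveat in the note can be illustrated with a stand-alone sketch. Everything here is an assumption made for illustration: the local Status and Decoder types, the fallback combinator, and the two toy decoders (UTF-8 first, Latin-1 second). It is not the package's implementation.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// Status is a local stand-in for the package's status type.
type Status int

const (
	SUCCESS Status = iota
	INVALID_CHAR
	NO_ROOM
	STATE_ONLY
)

// Decoder is an assumed shape for a one-character-at-a-time decoder.
type Decoder func(p []byte) (c rune, size int, status Status)

// fallback is a hypothetical combinator: it tries each decoder in turn and
// moves on only when a decoder reports INVALID_CHAR.
func fallback(decoders ...Decoder) Decoder {
	return func(p []byte) (rune, int, Status) {
		if len(p) == 0 {
			return 0, 0, NO_ROOM
		}
		for _, d := range decoders {
			c, size, status := d(p)
			if status != INVALID_CHAR {
				// SUCCESS, STATE_ONLY and NO_ROOM all end the search.
				return c, size, status
			}
		}
		// Every decoder rejected the input: consume one byte.
		return utf8.RuneError, 1, INVALID_CHAR
	}
}

// decodeUTF8 reports NO_ROOM for a prefix that could still become valid.
func decodeUTF8(p []byte) (rune, int, Status) {
	if !utf8.FullRune(p) {
		return 0, 0, NO_ROOM
	}
	c, size := utf8.DecodeRune(p)
	if c == utf8.RuneError && size == 1 {
		return utf8.RuneError, 1, INVALID_CHAR
	}
	return c, size, SUCCESS
}

// decodeLatin1 maps every byte directly to the same code point.
func decodeLatin1(p []byte) (rune, int, Status) {
	if len(p) == 0 {
		return 0, 0, NO_ROOM
	}
	return rune(p[0]), 1, SUCCESS
}

func main() {
	d := fallback(decodeUTF8, decodeLatin1)

	// 0xE9 followed by 'x' is invalid UTF-8, so the Latin-1 decoder is used.
	c, size, status := d([]byte{0xE9, 'x'})
	fmt.Println(string(c), size, status == SUCCESS)

	// A lone trailing 0xE9 could begin a UTF-8 sequence, so the result is
	// NO_ROOM and the Latin-1 decoder is never consulted.
	_, _, status = d([]byte{0xE9})
	fmt.Println(status == NO_ROOM)
}
```

In this sketch a caller that knows the input is complete would have to handle a trailing NO_ROOM itself, for example by decoding the remaining bytes with the second decoder directly.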
NewDecoder returns a Decoder to decode the named charset. If the name is not found, it returns nil.
ConvertString converts a string from d's encoding to UTF-8.
ConvertStringOK converts a string from d's encoding to UTF-8. It also returns a boolean indicating whether every character was converted successfully.
NewReader creates a new Reader that uses the receiver to decode text.
Translate enables a Decoder to implement go-charset's Translator interface.
An Encoder is a function that encodes a character set, one character at a time. It works much like utf8.EncodeRune, but has an additional status return value.
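The encoding direction can be sketched the same way. The Status and Encoder types below are local stand-ins with an assumed shape (write the encoding of c into p, return the byte count and a status), and encodeLatin1 and encodeAll are hypothetical helpers rather than package functions.

```go
package main

import "fmt"

// Status is a local stand-in for the package's status type.
type Status int

const (
	SUCCESS Status = iota
	INVALID_CHAR
	NO_ROOM
	STATE_ONLY
)

// Encoder is an assumed shape: it writes the encoding of c into p and
// reports the number of bytes written and a status.
type Encoder func(p []byte, c rune) (size int, status Status)

// encodeLatin1 is a hypothetical Encoder for ISO-8859-1.
func encodeLatin1(p []byte, c rune) (int, Status) {
	if len(p) == 0 {
		// No space for even one byte: write nothing.
		return 0, NO_ROOM
	}
	if c > 0xFF {
		// Not representable: write a substitute character.
		p[0] = '?'
		return 1, INVALID_CHAR
	}
	p[0] = byte(c)
	return 1, SUCCESS
}

// encodeAll encodes a UTF-8 string rune by rune. The boolean result is
// false if any character needed a substitute.
func encodeAll(e Encoder, s string) ([]byte, bool) {
	out := make([]byte, 0, len(s))
	scratch := make([]byte, 8)
	ok := true
	for _, c := range s {
		size, status := e(scratch, c)
		switch status {
		case NO_ROOM:
			// The scratch buffer was too small for this character.
			return out, false
		case INVALID_CHAR:
			ok = false
		}
		out = append(out, scratch[:size]...)
	}
	return out, ok
}

func main() {
	b, ok := encodeAll(encodeLatin1, "caf\u00e9")
	fmt.Printf("% x ok=%v\n", b, ok)
}
```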
NewEncoder returns an Encoder to encode the named charset.
ConvertString converts a string from UTF-8 to e's encoding.
ConvertStringOK converts a string from UTF-8 to e's encoding. It also returns a boolean indicating whether every character was converted successfully.
NewWriter creates a new Writer that uses the receiver to encode text.
type MBCSTable struct {
// contains filtered or unexported fields
}
An MBCSTable holds the data to convert to and from Unicode.
AddCharacter adds a character to the table. rune is its Unicode code point, and bytes contains the bytes used to encode it in the character set.
type Reader struct {
// contains filtered or unexported fields
}
Reader implements character-set decoding for an io.Reader object.
Read reads data into p. It returns the number of bytes read into p. It calls Read at most once on the underlying Reader, hence n may be less than len(p). At EOF, the count will be zero and err will be io.EOF.
ReadRune reads a single Unicode character and returns the rune and its size in bytes.
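The reading loop a caller would write around a rune-oriented reader can be sketched without the package. latin1Reader below is a hypothetical stand-in for a decoding Reader, and treating the returned size as the number of source bytes consumed is an assumption of this sketch, not something this page specifies.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// latin1Reader is a hypothetical stand-in for a decoding Reader. It decodes
// ISO-8859-1 from an underlying io.Reader.
type latin1Reader struct {
	src *bufio.Reader
}

func newLatin1Reader(r io.Reader) *latin1Reader {
	return &latin1Reader{src: bufio.NewReader(r)}
}

// ReadRune returns the next decoded rune. The size reported here is the
// number of source bytes consumed, which is an assumption of this sketch.
func (lr *latin1Reader) ReadRune() (rune, int, error) {
	b, err := lr.src.ReadByte()
	if err != nil {
		return 0, 0, err
	}
	return rune(b), 1, nil
}

// readAllRunes drains any io.RuneReader until io.EOF.
func readAllRunes(rr io.RuneReader) (string, error) {
	var sb strings.Builder
	for {
		c, _, err := rr.ReadRune()
		if err == io.EOF {
			return sb.String(), nil
		}
		if err != nil {
			return sb.String(), err
		}
		sb.WriteRune(c)
	}
}

// decodeLatin1String is a convenience wrapper used by the example.
func decodeLatin1String(s string) string {
	out, _ := readAllRunes(newLatin1Reader(strings.NewReader(s)))
	return out
}

func main() {
	fmt.Println(decodeLatin1String("caf\xe9"))
}
```

Since the stand-in satisfies io.RuneReader, the same draining loop would work with any other reader that exposes a ReadRune method of this shape.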
Status is the type for the status return value from a Decoder or Encoder.
type Writer struct {
// contains filtered or unexported fields
}
Writer implements character-set encoding for an io.Writer object.
Write encodes and writes the data from p.
Path | Synopsis |
---|---|
mahoniconv |
Package mahonia imports 8 packages and is imported by 16 packages. Updated 2016-07-16.