mahonia: github.com/yinheli/mahonia

package mahonia

import "github.com/yinheli/mahonia"

This package is a character-set conversion library for Go.

(DEPRECATED: use code.google.com/p/go.text/encoding, perhaps along with code.google.com/p/go.net/html/charset)

Package Files

8bit.go ASCII.go big5-data.go big5.go charset.go convert_string.go cp51932.go entity.go entity_data.go euc-jp.go euc-kr-data.go euc-kr.go fallback.go gb18030-data.go gb18030.go gbk-data.go gbk.go iso2022jp.go jis0201-data.go jis0208-data.go jis0212-data.go kuten.go mbcs.go ms-jis-data.go reader.go shiftjis-data.go shiftjis.go tcvn3.go translate.go utf16.go utf8.go writer.go

Constants

const (
    // SUCCESS means that the character was converted with no problems.
    SUCCESS = Status(iota)

    // INVALID_CHAR means that the source contained invalid bytes, or that the character
    // could not be represented in the destination encoding.
    // The Encoder or Decoder should have output a substitute character.
    INVALID_CHAR

    // NO_ROOM means there were not enough input bytes to form a complete character,
    // or there was not enough room in the output buffer to write a complete character.
    // No bytes were written, and no internal state was changed in the Encoder or Decoder.
    NO_ROOM

    // STATE_ONLY means that bytes were read or written indicating a state transition,
    // but no actual character was processed. (Examples: byte order marks, ISO-2022 escape sequences)
    STATE_ONLY
)
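
The four status values suggest a simple driving loop around a Decoder. The following is a minimal sketch of such a loop, not the package's own code: Status and Decoder are redeclared locally so the example compiles without the package, and asciiDecoder and decodeAll are hypothetical names introduced only for illustration.

```go
package main

import "fmt"

// Status and its constants mirror the declarations documented above.
// They are redeclared here so the sketch builds on its own.
type Status int

const (
	SUCCESS = Status(iota)
	INVALID_CHAR
	NO_ROOM
	STATE_ONLY
)

// Decoder mirrors the package's Decoder signature.
type Decoder func(p []byte) (c rune, size int, status Status)

// asciiDecoder is a hypothetical toy Decoder for 7-bit ASCII.
func asciiDecoder(p []byte) (rune, int, Status) {
	if len(p) == 0 {
		return 0, 0, NO_ROOM
	}
	if p[0] > 0x7F {
		// Consume the bad byte and hand back a substitute character.
		return 0xFFFD, 1, INVALID_CHAR
	}
	return rune(p[0]), 1, SUCCESS
}

// decodeAll drives d over src and reports whether every character
// decoded cleanly.
func decodeAll(d Decoder, src []byte) (string, bool) {
	var out []rune
	ok := true
	for len(src) > 0 {
		c, size, status := d(src)
		switch status {
		case NO_ROOM:
			// Incomplete trailing character: nothing was consumed, so stop.
			return string(out), false
		case STATE_ONLY:
			// Bytes were consumed but no character was produced.
		case INVALID_CHAR:
			ok = false
			out = append(out, c) // keep the substitute character
		case SUCCESS:
			out = append(out, c)
		}
		src = src[size:]
	}
	return string(out), ok
}

func main() {
	s, ok := decodeAll(asciiDecoder, []byte{'h', 'i', 0xE9})
	fmt.Printf("%q %v\n", s, ok)
}
```

The loop stops on NO_ROOM because, per the comment above, no bytes are consumed in that case; continuing would spin forever on the same input.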

func RegisterCharset

func RegisterCharset(cs *Charset)

RegisterCharset adds a charset to the charsetMap.

type Charset

type Charset struct {
    // Name is the character set's canonical name.
    Name string

    // Aliases is a list of alternate names.
    Aliases []string

    // NewDecoder returns a Decoder to convert from the charset to Unicode.
    NewDecoder func() Decoder

    // NewEncoder returns an Encoder to convert from Unicode to the charset.
    NewEncoder func() Encoder
}

A Charset represents a character set that can be converted, and contains functions to create the Decoders and Encoders that convert strings from and to that character set.

func GetCharset

func GetCharset(name string) *Charset

GetCharset fetches a charset by name. If the name is not found, it returns nil.
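
Because an unknown name yields nil, callers need a nil check before using the result. The sketch below illustrates the register-then-look-up pattern with a local map. It is an assumption-laden stand-in, not the package's implementation: registry, register, and lookup are hypothetical names, the local Charset omits the NewDecoder and NewEncoder fields, and names are matched exactly (whether the real package normalizes names is not stated here).

```go
package main

import "fmt"

// Charset carries only the naming fields of the struct documented above.
type Charset struct {
	Name    string
	Aliases []string
}

// registry is a hypothetical stand-in for the package's unexported charsetMap.
var registry = map[string]*Charset{}

// register indexes cs under its canonical name and under each alias.
func register(cs *Charset) {
	registry[cs.Name] = cs
	for _, alias := range cs.Aliases {
		registry[alias] = cs
	}
}

// lookup returns the charset registered under name, or nil if there is none.
func lookup(name string) *Charset {
	return registry[name]
}

func main() {
	register(&Charset{Name: "ISO-8859-1", Aliases: []string{"latin1", "L1"}})

	if cs := lookup("latin1"); cs != nil {
		fmt.Println("found:", cs.Name)
	}
	if lookup("no-such-charset") == nil {
		fmt.Println("unknown name: nil result, nothing to call")
	}
}
```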

type Decoder

type Decoder func(p []byte) (c rune, size int, status Status)

A Decoder is a function that decodes a character set, one character at a time. It works much like utf8.DecodeRune, but has an additional status return value.

func EntityDecoder

func EntityDecoder() Decoder

EntityDecoder returns a Decoder that decodes HTML character entities. If there is no valid character entity at the current position, it returns INVALID_CHAR, so it must be combined with another Decoder via FallbackDecoder.

func FallbackDecoder

func FallbackDecoder(decoders ...Decoder) Decoder

FallbackDecoder combines a series of Decoders into one. If the first Decoder returns a status of INVALID_CHAR, the others are tried as well.

Note: if the text to be decoded ends with a sequence of bytes that is not a valid character in the first charset, but it could be the beginning of a valid character, the FallbackDecoder will give a status of NO_ROOM instead of falling back to the other Decoders.
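
The fallback rule and the NO_ROOM caveat can be seen in a small stand-alone model. This is a sketch of one way such a combinator could be written, not the package's source: the types are redeclared locally, and utf8Decoder, latin1Decoder, and fallback are hypothetical names.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

type Status int

const (
	SUCCESS = Status(iota)
	INVALID_CHAR
	NO_ROOM
	STATE_ONLY
)

type Decoder func(p []byte) (c rune, size int, status Status)

// utf8Decoder is a hypothetical strict UTF-8 Decoder.
func utf8Decoder(p []byte) (rune, int, Status) {
	if !utf8.FullRune(p) {
		// Empty input, or a prefix that could still become a valid character.
		return 0, 0, NO_ROOM
	}
	c, size := utf8.DecodeRune(p)
	if c == utf8.RuneError && size == 1 {
		return utf8.RuneError, 1, INVALID_CHAR
	}
	return c, size, SUCCESS
}

// latin1Decoder is a hypothetical ISO-8859-1 Decoder; every byte is valid.
func latin1Decoder(p []byte) (rune, int, Status) {
	if len(p) == 0 {
		return 0, 0, NO_ROOM
	}
	return rune(p[0]), 1, SUCCESS
}

// fallback tries each decoder in order and returns the first result whose
// status is not INVALID_CHAR. If all fail, the last result is returned.
func fallback(decoders ...Decoder) Decoder {
	return func(p []byte) (c rune, size int, status Status) {
		for _, d := range decoders {
			c, size, status = d(p)
			if status != INVALID_CHAR {
				return c, size, status
			}
		}
		return c, size, status
	}
}

func main() {
	d := fallback(utf8Decoder, latin1Decoder)

	// 0xE9 followed by '!' is invalid UTF-8, so the Latin-1 decoder takes over.
	c, size, status := d([]byte{0xE9, '!'})
	fmt.Println(string(c), size, status == SUCCESS)

	// A lone trailing 0xE9 could begin a valid UTF-8 character, so the first
	// decoder reports NO_ROOM and no fallback happens.
	_, _, status = d([]byte{0xE9})
	fmt.Println(status == NO_ROOM)
}
```

In this model a caller that knows the input has ended would have to treat a final NO_ROOM as an invalid tail and handle those bytes itself.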

func NewDecoder

func NewDecoder(name string) Decoder

NewDecoder returns a Decoder to decode the named charset. If the name is not found, it returns nil.

func (Decoder) ConvertString

func (d Decoder) ConvertString(s string) string

ConvertString converts a string from d's encoding to UTF-8.

func (Decoder) ConvertStringOK

func (d Decoder) ConvertStringOK(s string) (result string, ok bool)

ConvertStringOK converts a string from d's encoding to UTF-8. It also returns a boolean indicating whether every character was converted successfully.

func (Decoder) NewReader

func (d Decoder) NewReader(rd io.Reader) *Reader

NewReader creates a new Reader that uses the receiver to decode text.

func (Decoder) Translate

func (d Decoder) Translate(data []byte, eof bool) (n int, cdata []byte, err error)

Translate enables a Decoder to implement go-charset's Translator interface.

type Encoder

type Encoder func(p []byte, c rune) (size int, status Status)

An Encoder is a function that encodes a character set, one character at a time. It works much like utf8.EncodeRune, but has an additional status return value.
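
To make the signature concrete, here is a minimal sketch of a function with the Encoder shape and a loop that applies it to a whole string. Everything in it is illustrative: the types are redeclared locally, latin1Encoder and encodeString are hypothetical names, and the use of '?' as the substitute character is an assumption of this sketch.

```go
package main

import "fmt"

type Status int

const (
	SUCCESS = Status(iota)
	INVALID_CHAR
	NO_ROOM
	STATE_ONLY
)

// Encoder mirrors the package's Encoder signature.
type Encoder func(p []byte, c rune) (size int, status Status)

// latin1Encoder is a hypothetical ISO-8859-1 Encoder. Like utf8.EncodeRune,
// it writes the encoded form of c into p and returns the byte count.
func latin1Encoder(p []byte, c rune) (int, Status) {
	if len(p) == 0 {
		return 0, NO_ROOM
	}
	if c > 0xFF {
		p[0] = '?' // substitute for an unrepresentable character
		return 1, INVALID_CHAR
	}
	p[0] = byte(c)
	return 1, SUCCESS
}

// encodeString encodes s with e and reports whether every character
// was representable in the target encoding.
func encodeString(e Encoder, s string) ([]byte, bool) {
	out := make([]byte, 0, len(s))
	var buf [8]byte // scratch space for one encoded character
	ok := true
	for _, c := range s {
		size, status := e(buf[:], c)
		if status == NO_ROOM {
			return out, false // scratch buffer too small for this encoder
		}
		if status == INVALID_CHAR {
			ok = false
		}
		out = append(out, buf[:size]...)
	}
	return out, ok
}

func main() {
	out, ok := encodeString(latin1Encoder, "caf\u00e9")
	fmt.Printf("% x %v\n", out, ok)
}
```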

func NewEncoder

func NewEncoder(name string) Encoder

NewEncoder returns an Encoder to encode the named charset.

func (Encoder) ConvertString

func (e Encoder) ConvertString(s string) string

ConvertString converts a string from UTF-8 to e's encoding.

func (Encoder) ConvertStringOK

func (e Encoder) ConvertStringOK(s string) (result string, ok bool)

ConvertStringOK converts a string from UTF-8 to e's encoding. It also returns a boolean indicating whether every character was converted successfully.

func (Encoder) NewWriter

func (e Encoder) NewWriter(wr io.Writer) *Writer

NewWriter creates a new Writer that uses the receiver to encode text.

type MBCSTable

type MBCSTable struct {
    // contains filtered or unexported fields
}

An MBCSTable holds the data to convert to and from Unicode.

func (*MBCSTable) AddCharacter

func (table *MBCSTable) AddCharacter(c rune, bytes string)

AddCharacter adds a character to the table. c is its Unicode code point, and bytes contains the bytes used to encode it in the character set.
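
The idea of a table that converts in both directions can be modeled with two maps. MBCSTable's fields are unexported and its real layout is not documented here, so the following is only a conceptual sketch; table, newTable, and addCharacter are hypothetical names.

```go
package main

import "fmt"

// table is a hypothetical two-way mapping between code points and their
// encoded byte sequences. It is not the layout of the real MBCSTable.
type table struct {
	toBytes map[rune]string
	toRune  map[string]rune
}

func newTable() *table {
	return &table{
		toBytes: make(map[rune]string),
		toRune:  make(map[string]rune),
	}
}

// addCharacter records that c is encoded as bytes, in both directions.
func (tb *table) addCharacter(c rune, bytes string) {
	tb.toBytes[c] = bytes
	tb.toRune[bytes] = c
}

func main() {
	tb := newTable()
	// U+3042 (HIRAGANA LETTER A) is encoded as 0x82 0xA0 in Shift_JIS.
	tb.addCharacter('\u3042', "\x82\xa0")

	fmt.Printf("% x\n", tb.toBytes['\u3042'])
	fmt.Printf("%U\n", tb.toRune["\x82\xa0"])
}
```

Passing the encoded form as a string, as AddCharacter does, makes it directly usable as a map key in a model like this one.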

func (*MBCSTable) Decoder

func (table *MBCSTable) Decoder() Decoder

func (*MBCSTable) Encoder

func (table *MBCSTable) Encoder() Encoder

type Reader

type Reader struct {
    // contains filtered or unexported fields
}

Reader implements character-set decoding for an io.Reader object.

func (*Reader) Read

func (b *Reader) Read(p []byte) (n int, err error)

Read reads data into p. It returns the number of bytes read into p. It calls Read at most once on the underlying Reader, hence n may be less than len(p). At EOF, the count will be zero and err will be io.EOF.

func (*Reader) ReadRune

func (b *Reader) ReadRune() (c rune, size int, err error)

ReadRune reads a single Unicode character and returns the rune and its size in bytes.

type Status

type Status int

Status is the type for the status return value from a Decoder or Encoder.

type Writer

type Writer struct {
    // contains filtered or unexported fields
}

Writer implements character-set encoding for an io.Writer object.

func (*Writer) Write

func (w *Writer) Write(p []byte) (n int, err error)

Write encodes and writes the data from p.

func (*Writer) WriteRune

func (w *Writer) WriteRune(c rune) (size int, err error)

Directories

Path          Synopsis
mahoniconv

Package mahonia imports 8 packages and is imported by 2 packages. Updated 2016-07-22.