xml

package
v0.2.1 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Aug 29, 2018 License: Apache-2.0 Imports: 14 Imported by: 2

Documentation

Overview

Package xml contains a parser for XML.

Package xml implements a simple XML 1.0 parser that understands XML name spaces.

Index

Constants

View Source
const MinRead = 512

MinRead is the minimum slice size passed to a Read call by Buffer.ReadFrom. As long as the Buffer has at least MinRead bytes beyond what is required to hold the contents of r, ReadFrom will not grow the underlying buffer.

Variables

View Source
var ErrTooLarge = errors.New("bytes.Buffer: too large")

ErrTooLarge is passed to panic if memory cannot be allocated to store data in a buffer.

View Source
var HTMLAutoClose = htmlAutoClose

HTMLAutoClose is the set of HTML elements that should be considered to close automatically.

View Source
var HTMLEntity = htmlEntity

HTMLEntity is an entity map containing translations for the standard HTML entity characters.

Functions

func Compare

func Compare(a, b []byte) int

Compare returns an integer comparing two byte slices lexicographically. The result will be 0 if a==b, -1 if a < b, and +1 if a > b. A nil argument is equivalent to an empty slice.

func Equal

func Equal(a, b []byte) bool

Equal returns a boolean reporting whether a and b are the same length and contain the same bytes. A nil argument is equivalent to an empty slice.

func IndexByte

func IndexByte(s []byte, c byte) int

IndexByte returns the index of the first instance of c in s, or -1 if c is not present in s.

func Unmarshal

func Unmarshal(data []byte, v interface{}) error

Unmarshal parses the XML-encoded data and stores the result in the value pointed to by v, which must be an arbitrary struct, slice, or string. Well-formed data that does not fit into v is discarded.

Because Unmarshal uses the reflect package, it can only assign to exported (upper case) fields. Unmarshal uses a case-sensitive comparison to match XML element names to tag values and struct field names.

Unmarshal maps an XML element to a struct using the following rules. In the rules, the tag of a field refers to the value associated with the key 'xml' in the struct field's tag (see the example above).

  • If the struct has a field of type []byte or string with tag ",innerxml", Unmarshal accumulates the raw XML nested inside the element in that field. The rest of the rules still apply.

  • If the struct has a field named XMLName of type xml.Name, Unmarshal records the element name in that field.

  • If the XMLName field has an associated tag of the form "name" or "namespace-URL name", the XML element must have the given name (and, optionally, name space) or else Unmarshal returns an error.

  • If the XML element has an attribute whose name matches a struct field name with an associated tag containing ",attr" or the explicit name in a struct field tag of the form "name,attr", Unmarshal records the attribute value in that field.

  • If the XML element contains character data, that data is accumulated in the first struct field that has tag ",chardata". The struct field may have type []byte or string. If there is no such field, the character data is discarded.

  • If the XML element contains comments, they are accumulated in the first struct field that has tag ",comment". The struct field may have type []byte or string. If there is no such field, the comments are discarded.

  • If the XML element contains a sub-element whose name matches the prefix of a tag formatted as "a" or "a>b>c", unmarshal will descend into the XML structure looking for elements with the given names, and will map the innermost elements to that struct field. A tag starting with ">" is equivalent to one starting with the field name followed by ">".

  • If the XML element contains a sub-element whose name matches a struct field's XMLName tag and the struct field has no explicit name tag as per the previous rule, unmarshal maps the sub-element to that struct field.

  • If the XML element contains a sub-element whose name matches a field without any mode flags (",attr", ",chardata", etc), Unmarshal maps the sub-element to that struct field.

  • If the XML element contains a sub-element that hasn't matched any of the above rules and the struct has a field with tag ",any", unmarshal maps the sub-element to that struct field.

  • An anonymous struct field is handled as if the fields of its value were part of the outer struct.

  • A struct field with tag "-" is never unmarshalled into.

Unmarshal maps an XML element to a string or []byte by saving the concatenation of that element's character data in the string or []byte. The saved []byte is never nil.

Unmarshal maps an attribute value to a string or []byte by saving the value in the string or slice.

Unmarshal maps an XML element to a slice by extending the length of the slice and mapping the element to the newly created value.

Unmarshal maps an XML element or attribute value to a bool by setting it to the boolean value represented by the string.

Unmarshal maps an XML element or attribute value to an integer or floating-point field by setting the field to the result of interpreting the string value in decimal. There is no check for overflow.

Unmarshal maps an XML element to an xml.Name by recording the element name.

Unmarshal maps an XML element to a pointer by setting the pointer to a freshly allocated value and then mapping the element to that value.

func WithAttrPrefix

func WithAttrPrefix(a string) func(x *xmlParser)

WithAttrPrefix specifies the prefix which will be added to attributes returned by the parser.

func WithElemPrefix

func WithElemPrefix(e string) func(x *xmlParser)

WithElemPrefix specifies the prefix which will be added to elements returned by the parser.

func WithTextPrefix

func WithTextPrefix(e string) func(x *xmlParser)

WithTextPrefix specifies the prefix which will be added to text returned by the parser.

Types

type Attr

type Attr struct {
	Name  Name
	Value string
}

An Attr represents an attribute in an XML element (Name=Value).

type Buffer

type Buffer struct {
	// contains filtered or unexported fields
}

A Buffer is a variable-sized buffer of bytes with Read and Write methods. The zero value for Buffer is an empty buffer ready to use.

func NewBuffer

func NewBuffer(buf []byte) *Buffer

NewBuffer creates and initializes a new Buffer using buf as its initial contents. It is intended to prepare a Buffer to read existing data. It can also be used to size the internal buffer for writing. To do that, buf should have the desired capacity but a length of zero.

In most cases, new(Buffer) (or just declaring a Buffer variable) is sufficient to initialize a Buffer.

func NewBufferString

func NewBufferString(s string) *Buffer

NewBufferString creates and initializes a new Buffer using string s as its initial contents. It is intended to prepare a buffer to read an existing string.

In most cases, new(Buffer) (or just declaring a Buffer variable) is sufficient to initialize a Buffer.

func (*Buffer) Bytes

func (b *Buffer) Bytes() []byte

Bytes returns a slice of the contents of the unread portion of the buffer; len(b.Bytes()) == b.Len(). If the caller changes the contents of the returned slice, the contents of the buffer will change provided there are no intervening method calls on the Buffer.

func (*Buffer) Cap

func (b *Buffer) Cap() int

Cap returns the capacity of the buffer's underlying byte slice, that is, the total space allocated for the buffer's data.

func (*Buffer) Grow

func (b *Buffer) Grow(n int)

Grow grows the buffer's capacity, if necessary, to guarantee space for another n bytes. After Grow(n), at least n bytes can be written to the buffer without another allocation. If n is negative, Grow will panic. If the buffer can't grow it will panic with ErrTooLarge.

func (*Buffer) Len

func (b *Buffer) Len() int

Len returns the number of bytes of the unread portion of the buffer; b.Len() == len(b.Bytes()).

func (*Buffer) Next

func (b *Buffer) Next(n int) []byte

Next returns a slice containing the next n bytes from the buffer, advancing the buffer as if the bytes had been returned by Read. If there are fewer than n bytes in the buffer, Next returns the entire buffer. The slice is only valid until the next call to a read or write method.

func (*Buffer) Read

func (b *Buffer) Read(p []byte) (n int, err error)

Read reads the next len(p) bytes from the buffer or until the buffer is drained. The return value n is the number of bytes read. If the buffer has no data to return, err is io.EOF (unless len(p) is zero); otherwise it is nil.

func (*Buffer) ReadByte

func (b *Buffer) ReadByte() (c byte, err error)

ReadByte reads and returns the next byte from the buffer. If no byte is available, it returns error io.EOF.

func (*Buffer) ReadBytes

func (b *Buffer) ReadBytes(delim byte) (line []byte, err error)

ReadBytes reads until the first occurrence of delim in the input, returning a slice containing the data up to and including the delimiter. If ReadBytes encounters an error before finding a delimiter, it returns the data read before the error and the error itself (often io.EOF). ReadBytes returns err != nil if and only if the returned data does not end in delim.

func (*Buffer) ReadFrom

func (b *Buffer) ReadFrom(r io.Reader) (n int64, err error)

ReadFrom reads data from r until EOF and appends it to the buffer, growing the buffer as needed. The return value n is the number of bytes read. Any error except io.EOF encountered during the read is also returned. If the buffer becomes too large, ReadFrom will panic with ErrTooLarge.

func (*Buffer) ReadRune

func (b *Buffer) ReadRune() (r rune, size int, err error)

ReadRune reads and returns the next UTF-8-encoded Unicode code point from the buffer. If no bytes are available, the error returned is io.EOF. If the bytes are an erroneous UTF-8 encoding, it consumes one byte and returns U+FFFD, 1.

func (*Buffer) ReadString

func (b *Buffer) ReadString(delim byte) (line string, err error)

ReadString reads until the first occurrence of delim in the input, returning a string containing the data up to and including the delimiter. If ReadString encounters an error before finding a delimiter, it returns the data read before the error and the error itself (often io.EOF). ReadString returns err != nil if and only if the returned data does not end in delim.

func (*Buffer) Reset

func (b *Buffer) Reset()

Reset resets the buffer so it has no content. b.Reset() is the same as b.Truncate(0).

func (*Buffer) String

func (b *Buffer) String() string

String returns the contents of the unread portion of the buffer as a string. If the Buffer is a nil pointer, it returns "<nil>".

func (*Buffer) Truncate

func (b *Buffer) Truncate(n int)

Truncate discards all but the first n unread bytes from the buffer. It panics if n is negative or greater than the length of the buffer.

func (*Buffer) UnreadByte

func (b *Buffer) UnreadByte() error

UnreadByte unreads the last byte returned by the most recent read operation. If write has happened since the last read, UnreadByte returns an error.

func (*Buffer) UnreadRune

func (b *Buffer) UnreadRune() error

UnreadRune unreads the last rune returned by ReadRune. If the most recent read or write operation on the buffer was not a ReadRune, UnreadRune returns an error. (In this regard it is stricter than UnreadByte, which will unread the last byte from any read operation.)

func (*Buffer) Write

func (b *Buffer) Write(p []byte) (n int, err error)

Write appends the contents of p to the buffer, growing the buffer as needed. The return value n is the length of p; err is always nil. If the buffer becomes too large, Write will panic with ErrTooLarge.

func (*Buffer) WriteByte

func (b *Buffer) WriteByte(c byte) error

WriteByte appends the byte c to the buffer, growing the buffer as needed. The returned error is always nil, but is included to match bufio.Writer's WriteByte. If the buffer becomes too large, WriteByte will panic with ErrTooLarge.

func (*Buffer) WriteRune

func (b *Buffer) WriteRune(r rune) (n int, err error)

WriteRune appends the UTF-8 encoding of Unicode code point r to the buffer, returning its length and an error, which is always nil but is included to match bufio.Writer's WriteRune. The buffer is grown as needed; if it becomes too large, WriteRune will panic with ErrTooLarge.

func (*Buffer) WriteString

func (b *Buffer) WriteString(s string) (n int, err error)

WriteString appends the contents of s to the buffer, growing the buffer as needed. The return value n is the length of s; err is always nil. If the buffer becomes too large, WriteString will panic with ErrTooLarge.

func (*Buffer) WriteTo

func (b *Buffer) WriteTo(w io.Writer) (n int64, err error)

WriteTo writes data to w until the buffer is drained or an error occurs. The return value n is the number of bytes written; it always fits into an int, but it is int64 to match the io.WriterTo interface. Any error encountered during the write is also returned.

type CharData

type CharData []byte

A CharData represents XML character data (raw text), in which XML escape sequences have been replaced by the characters they represent.

func (CharData) Copy

func (c CharData) Copy() CharData

type Comment

type Comment []byte

A Comment represents an XML comment of the form <!--comment-->. The bytes do not include the <!-- and --> comment markers.

func (Comment) Copy

func (c Comment) Copy() Comment

type Decoder

type Decoder struct {
	// Strict defaults to true, enforcing the requirements
	// of the XML specification.
	// If set to false, the parser allows input containing common
	// mistakes:
	//	* If an element is missing an end tag, the parser invents
	//	  end tags as necessary to keep the return values from Token
	//	  properly balanced.
	//	* In attribute values and character data, unknown or malformed
	//	  character entities (sequences beginning with &) are left alone.
	//
	// Setting:
	//
	//	d.Strict = false;
	//	d.AutoClose = HTMLAutoClose;
	//	d.Entity = HTMLEntity
	//
	// creates a parser that can handle typical HTML.
	//
	// Strict mode does not enforce the requirements of the XML name spaces TR.
	// In particular it does not reject name space tags using undefined prefixes.
	// Such tags are recorded with the unknown prefix as the name space URL.
	Strict bool

	// When Strict == false, AutoClose indicates a set of elements to
	// consider closed immediately after they are opened, regardless
	// of whether an end element is present.
	AutoClose []string

	// Entity can be used to map non-standard entity names to string replacements.
	// The parser behaves as if these standard mappings are present in the map,
	// regardless of the actual map content:
	//
	//	"lt": "<",
	//	"gt": ">",
	//	"amp": "&",
	//	"apos": "'",
	//	"quot": `"`,
	Entity map[string]string

	// CharsetReader, if non-nil, defines a function to generate
	// charset-conversion readers, converting from the provided
	// non-UTF-8 charset into UTF-8. If CharsetReader is nil or
	// returns an error, parsing stops with an error. One of the
	// the CharsetReader's result values must be non-nil.
	CharsetReader func(charset string, input io.Reader) (io.Reader, error)

	// DefaultSpace sets the default name space used for unadorned tags,
	// as if the entire XML stream were wrapped in an element containing
	// the attribute xmlns="DefaultSpace".
	DefaultSpace string
	// contains filtered or unexported fields
}

A Decoder represents an XML parser reading a particular input stream. The parser assumes that its input is encoded in UTF-8.

func NewDecoder

func NewDecoder(r io.Reader) *Decoder

NewDecoder creates a new XML parser reading from r. If r does not implement io.ByteReader, NewDecoder will do its own buffering.

func (*Decoder) Decode

func (d *Decoder) Decode(v interface{}) error

Decode works like xml.Unmarshal, except it reads the decoder stream to find the start element.

func (*Decoder) DecodeElement

func (d *Decoder) DecodeElement(v interface{}, start *StartElement) error

DecodeElement works like xml.Unmarshal except that it takes a pointer to the start XML element to decode into v. It is useful when a client reads some raw XML tokens itself but also wants to defer to Unmarshal for some elements.

func (*Decoder) InputOffset

func (d *Decoder) InputOffset() int64

InputOffset returns the input stream byte offset of the current decoder position. The offset gives the location of the end of the most recently returned token and the beginning of the next token.

func (*Decoder) RawToken

func (d *Decoder) RawToken() (Token, error)

RawToken is like Token but does not verify that start and end elements match and does not translate name space prefixes to their corresponding URLs.

func (*Decoder) Skip

func (d *Decoder) Skip() error

Skip reads tokens until it has consumed the end element matching the most recent start element already consumed. It recurs if it encounters a start element, so it can be used to skip nested structures. It returns nil if it finds an end element matching the start element; otherwise it returns an error describing the problem.

func (*Decoder) Token

func (d *Decoder) Token() (t Token, err error)

Token returns the next XML token in the input stream. At the end of the input stream, Token returns nil, io.EOF.

Slices of bytes in the returned token data refer to the parser's internal buffer and remain valid only until the next call to Token. To acquire a copy of the bytes, call CopyToken or the token's Copy method.

Token expands self-closing elements such as <br/> into separate start and end elements returned by successive calls.

Token guarantees that the StartElement and EndElement tokens it returns are properly nested and matched: if Token encounters an unexpected end element, it will return an error.

Token implements XML name spaces as described by http://www.w3.org/TR/REC-xml-names/. Each of the Name structures contained in the Token has the Space set to the URL identifying its name space when known. If Token encounters an unrecognized name space prefix, it uses the prefix as the Space rather than report an error.

type Directive

type Directive []byte

A Directive represents an XML directive of the form <!text>. The bytes do not include the <! and > markers.

func (Directive) Copy

func (d Directive) Copy() Directive

type EndElement

type EndElement struct {
	Name Name
}

An EndElement represents an XML end element.

type Name

type Name struct {
	Space, Local string
}

A Name represents an XML name (Local) annotated with a name space identifier (Space). In tokens returned by Decoder.Token, the Space identifier is given as a canonical URL, not the short prefix used in the document being parsed.

type Option

type Option func(x *xmlParser)

Option is used set options when creating a new XMLParser

type ProcInst

type ProcInst struct {
	Target string
	Inst   []byte
}

A ProcInst represents an XML processing instruction of the form <?target inst?>

func (ProcInst) Copy

func (p ProcInst) Copy() ProcInst

type StartElement

type StartElement struct {
	Name Name
	Attr []Attr
}

A StartElement represents an XML start element.

func (StartElement) Copy

func (e StartElement) Copy() StartElement

func (StartElement) End

func (e StartElement) End() EndElement

End returns the corresponding XML end element.

type SyntaxError

type SyntaxError struct {
	Msg  string
	Line int
}

A SyntaxError represents a syntax error in the XML input stream.

func (*SyntaxError) Error

func (e *SyntaxError) Error() string

type TagPathError

type TagPathError struct {
	Struct       reflect.Type
	Field1, Tag1 string
	Field2, Tag2 string
}

A TagPathError represents an error in the unmarshalling process caused by the use of field tags with conflicting paths.

func (*TagPathError) Error

func (e *TagPathError) Error() string

type Token

type Token interface{}

A Token is an interface holding one of the token types: StartElement, EndElement, CharData, Comment, ProcInst, or Directive.

func CopyToken

func CopyToken(t Token) Token

CopyToken returns a copy of a Token.

type UnmarshalError

type UnmarshalError string

An UnmarshalError represents an error in the unmarshalling process.

func (UnmarshalError) Error

func (e UnmarshalError) Error() string

type Unmarshaler

type Unmarshaler interface {
	UnmarshalXML(d *Decoder, start StartElement) error
}

Unmarshaler is the interface implemented by objects that can unmarshal an XML element description of themselves.

UnmarshalXML decodes a single XML element beginning with the given start element. If it returns an error, the outer call to Unmarshal stops and returns that error. UnmarshalXML must consume exactly one XML element. One common implementation strategy is to unmarshal into a separate value with a layout matching the expected XML using d.DecodeElement, and then to copy the data from that value into the receiver. Another common strategy is to use d.Token to process the XML object one token at a time. UnmarshalXML may not use d.RawToken.

type UnmarshalerAttr

type UnmarshalerAttr interface {
	UnmarshalXMLAttr(attr Attr) error
}

UnmarshalerAttr is the interface implemented by objects that can unmarshal an XML attribute description of themselves.

UnmarshalXMLAttr decodes a single XML attribute. If it returns an error, the outer call to Unmarshal stops and returns that error. UnmarshalXMLAttr is used only for struct fields with the "attr" option in the field tag.

type XMLParser

type XMLParser interface {
	parser.Interface
	//Init intialises the parser with a byte buffer containing xml.
	Init([]byte) error
	Reset() error
}

XMLParser is an xml parser.

func NewXMLParser

func NewXMLParser(options ...Option) XMLParser

NewXMLParser returns a new xml parser.

Notes

Bugs

  • Mapping between XML elements and data structures is inherently flawed: an XML element is an order-dependent collection of anonymous values, while a data structure is an order-independent collection of named values. See package json for a textual representation more suitable to data structures.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL