Documentation ¶
Overview ¶
Package xml contains a parser for XML.
Package xml implements a simple XML 1.0 parser that understands XML name spaces.
Index ¶
- Constants
- Variables
- func Compare(a, b []byte) int
- func Equal(a, b []byte) bool
- func IndexByte(s []byte, c byte) int
- func Unmarshal(data []byte, v interface{}) error
- func WithAttrPrefix(a string) func(x *xmlParser)
- func WithElemPrefix(e string) func(x *xmlParser)
- func WithTextPrefix(e string) func(x *xmlParser)
- type Attr
- type Buffer
- func (b *Buffer) Bytes() []byte
- func (b *Buffer) Cap() int
- func (b *Buffer) Grow(n int)
- func (b *Buffer) Len() int
- func (b *Buffer) Next(n int) []byte
- func (b *Buffer) Read(p []byte) (n int, err error)
- func (b *Buffer) ReadByte() (c byte, err error)
- func (b *Buffer) ReadBytes(delim byte) (line []byte, err error)
- func (b *Buffer) ReadFrom(r io.Reader) (n int64, err error)
- func (b *Buffer) ReadRune() (r rune, size int, err error)
- func (b *Buffer) ReadString(delim byte) (line string, err error)
- func (b *Buffer) Reset()
- func (b *Buffer) String() string
- func (b *Buffer) Truncate(n int)
- func (b *Buffer) UnreadByte() error
- func (b *Buffer) UnreadRune() error
- func (b *Buffer) Write(p []byte) (n int, err error)
- func (b *Buffer) WriteByte(c byte) error
- func (b *Buffer) WriteRune(r rune) (n int, err error)
- func (b *Buffer) WriteString(s string) (n int, err error)
- func (b *Buffer) WriteTo(w io.Writer) (n int64, err error)
- type CharData
- type Comment
- type Decoder
- type Directive
- type EndElement
- type Name
- type Option
- type ProcInst
- type StartElement
- type SyntaxError
- type TagPathError
- type Token
- type UnmarshalError
- type Unmarshaler
- type UnmarshalerAttr
- type XMLParser
- Bugs
Constants ¶
const MinRead = 512
MinRead is the minimum slice size passed to a Read call by Buffer.ReadFrom. As long as the Buffer has at least MinRead bytes beyond what is required to hold the contents of r, ReadFrom will not grow the underlying buffer.
Variables ¶
var ErrTooLarge = errors.New("bytes.Buffer: too large")
ErrTooLarge is passed to panic if memory cannot be allocated to store data in a buffer.
var HTMLAutoClose = htmlAutoClose
HTMLAutoClose is the set of HTML elements that should be considered to close automatically.
var HTMLEntity = htmlEntity
HTMLEntity is an entity map containing translations for the standard HTML entity characters.
Functions ¶
func Compare ¶
Compare returns an integer comparing two byte slices lexicographically. The result will be 0 if a==b, -1 if a < b, and +1 if a > b. A nil argument is equivalent to an empty slice.
func Equal ¶
Equal returns a boolean reporting whether a and b are the same length and contain the same bytes. A nil argument is equivalent to an empty slice.
func IndexByte ¶
IndexByte returns the index of the first instance of c in s, or -1 if c is not present in s.
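A brief sketch of the three byte-slice helpers above. It uses the standard library's bytes package as a stand-in, on the assumption that this package's Compare, Equal, and IndexByte behave the same way; the helper name describe is hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
)

// describe reports how a relates to b and where c first occurs in a.
// bytes.Compare, bytes.Equal, and bytes.IndexByte stand in for this
// package's functions of the same name (an assumption).
func describe(a, b []byte, c byte) (cmp int, eq bool, idx int) {
	return bytes.Compare(a, b), bytes.Equal(a, b), bytes.IndexByte(a, c)
}

func main() {
	cmp, eq, idx := describe([]byte("abc"), []byte("abd"), 'c')
	fmt.Println(cmp, eq, idx)

	// A nil argument is treated like an empty slice.
	fmt.Println(bytes.Equal(nil, []byte{}))
}
```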
func Unmarshal ¶
Unmarshal parses the XML-encoded data and stores the result in the value pointed to by v, which must be an arbitrary struct, slice, or string. Well-formed data that does not fit into v is discarded.
Because Unmarshal uses the reflect package, it can only assign to exported (upper case) fields. Unmarshal uses a case-sensitive comparison to match XML element names to tag values and struct field names.
Unmarshal maps an XML element to a struct using the following rules. In the rules, the tag of a field refers to the value associated with the key 'xml' in the struct field's tag.
If the struct has a field of type []byte or string with tag ",innerxml", Unmarshal accumulates the raw XML nested inside the element in that field. The rest of the rules still apply.
If the struct has a field named XMLName of type xml.Name, Unmarshal records the element name in that field.
If the XMLName field has an associated tag of the form "name" or "namespace-URL name", the XML element must have the given name (and, optionally, name space) or else Unmarshal returns an error.
If the XML element has an attribute whose name matches a struct field name with an associated tag containing ",attr" or the explicit name in a struct field tag of the form "name,attr", Unmarshal records the attribute value in that field.
If the XML element contains character data, that data is accumulated in the first struct field that has tag ",chardata". The struct field may have type []byte or string. If there is no such field, the character data is discarded.
If the XML element contains comments, they are accumulated in the first struct field that has tag ",comment". The struct field may have type []byte or string. If there is no such field, the comments are discarded.
If the XML element contains a sub-element whose name matches the prefix of a tag formatted as "a" or "a>b>c", Unmarshal will descend into the XML structure looking for elements with the given names, and will map the innermost elements to that struct field. A tag starting with ">" is equivalent to one starting with the field name followed by ">".
If the XML element contains a sub-element whose name matches a struct field's XMLName tag and the struct field has no explicit name tag as per the previous rule, Unmarshal maps the sub-element to that struct field.
If the XML element contains a sub-element whose name matches a field without any mode flags (",attr", ",chardata", etc), Unmarshal maps the sub-element to that struct field.
If the XML element contains a sub-element that hasn't matched any of the above rules and the struct has a field with tag ",any", Unmarshal maps the sub-element to that struct field.
An anonymous struct field is handled as if the fields of its value were part of the outer struct.
A struct field with tag "-" is never unmarshalled into.
Unmarshal maps an XML element to a string or []byte by saving the concatenation of that element's character data in the string or []byte. The saved []byte is never nil.
Unmarshal maps an attribute value to a string or []byte by saving the value in the string or slice.
Unmarshal maps an XML element to a slice by extending the length of the slice and mapping the element to the newly created value.
Unmarshal maps an XML element or attribute value to a bool by setting it to the boolean value represented by the string.
Unmarshal maps an XML element or attribute value to an integer or floating-point field by setting the field to the result of interpreting the string value in decimal. There is no check for overflow.
Unmarshal maps an XML element to an xml.Name by recording the element name.
Unmarshal maps an XML element to a pointer by setting the pointer to a freshly allocated value and then mapping the element to that value.
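Several of the mapping rules above can be seen together in a small sketch. It uses the standard library's encoding/xml as a stand-in, assuming this package's Unmarshal follows the same rules; the Person type, its fields, and parsePerson are hypothetical.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// Person is a hypothetical type exercising several mapping rules.
type Person struct {
	XMLName xml.Name `xml:"person"`        // element must be named "person"
	ID      int      `xml:"id,attr"`       // attribute value, parsed as decimal
	Name    string   `xml:"name"`          // direct sub-element
	Emails  []string `xml:"contact>email"` // nested path, one slice entry per element
	Note    string   `xml:",comment"`      // comments found inside <person>
	Skipped string   `xml:"-"`             // never unmarshalled into
}

// parsePerson decodes one <person> element.
func parsePerson(data string) (Person, error) {
	var p Person
	err := xml.Unmarshal([]byte(data), &p)
	return p, err
}

func main() {
	const doc = `<person id="7">
  <!-- imported -->
  <name>Ada</name>
  <contact><email>a@example.com</email><email>b@example.com</email></contact>
  <unknown>well-formed data that does not fit is discarded</unknown>
</person>`

	p, err := parsePerson(doc)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(p.ID, p.Name, p.Emails, p.Note)
}
```

Because XMLName carries the tag "person", passing a document whose root element has a different name makes Unmarshal return an error instead of filling the struct.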
func WithAttrPrefix ¶
func WithAttrPrefix(a string) func(x *xmlParser)
WithAttrPrefix specifies the prefix which will be added to attributes returned by the parser.
func WithElemPrefix ¶
func WithElemPrefix(e string) func(x *xmlParser)
WithElemPrefix specifies the prefix which will be added to elements returned by the parser.
func WithTextPrefix ¶
func WithTextPrefix(e string) func(x *xmlParser)
WithTextPrefix specifies the prefix which will be added to text returned by the parser.
Types ¶
type Buffer ¶
type Buffer struct {
// contains filtered or unexported fields
}
A Buffer is a variable-sized buffer of bytes with Read and Write methods. The zero value for Buffer is an empty buffer ready to use.
func NewBuffer ¶
NewBuffer creates and initializes a new Buffer using buf as its initial contents. It is intended to prepare a Buffer to read existing data. It can also be used to size the internal buffer for writing. To do that, buf should have the desired capacity but a length of zero.
In most cases, new(Buffer) (or just declaring a Buffer variable) is sufficient to initialize a Buffer.
func NewBufferString ¶
NewBufferString creates and initializes a new Buffer using string s as its initial contents. It is intended to prepare a buffer to read an existing string.
In most cases, new(Buffer) (or just declaring a Buffer variable) is sufficient to initialize a Buffer.
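The three ways of obtaining a Buffer described above might look as follows. The sketch uses the standard library's bytes.Buffer as a stand-in, assuming this package's Buffer behaves identically; join and presized are hypothetical helper names.

```go
package main

import (
	"bytes"
	"fmt"
)

// join writes each part into a zero-value Buffer, which is ready to use
// without calling a constructor.
func join(parts ...string) string {
	var b bytes.Buffer
	for _, p := range parts {
		b.WriteString(p)
	}
	return b.String()
}

// presized returns an empty Buffer whose internal slice already has room
// for n bytes: the slice passed to NewBuffer has length 0 and capacity n.
func presized(n int) *bytes.Buffer {
	return bytes.NewBuffer(make([]byte, 0, n))
}

func main() {
	fmt.Println(join("<a>", "text", "</a>"))

	b := presized(64)
	fmt.Println(b.Len(), b.Cap() >= 64)

	// NewBufferString prepares a Buffer for reading an existing string.
	r := bytes.NewBufferString("existing data")
	fmt.Println(r.Len())
}
```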
func (*Buffer) Bytes ¶
Bytes returns a slice of the contents of the unread portion of the buffer; len(b.Bytes()) == b.Len(). If the caller changes the contents of the returned slice, the contents of the buffer will change provided there are no intervening method calls on the Buffer.
func (*Buffer) Cap ¶
Cap returns the capacity of the buffer's underlying byte slice, that is, the total space allocated for the buffer's data.
func (*Buffer) Grow ¶
Grow grows the buffer's capacity, if necessary, to guarantee space for another n bytes. After Grow(n), at least n bytes can be written to the buffer without another allocation. If n is negative, Grow will panic. If the buffer can't grow it will panic with ErrTooLarge.
func (*Buffer) Len ¶
Len returns the number of bytes of the unread portion of the buffer; b.Len() == len(b.Bytes()).
func (*Buffer) Next ¶
Next returns a slice containing the next n bytes from the buffer, advancing the buffer as if the bytes had been returned by Read. If there are fewer than n bytes in the buffer, Next returns the entire buffer. The slice is only valid until the next call to a read or write method.
func (*Buffer) Read ¶
Read reads the next len(p) bytes from the buffer or until the buffer is drained. The return value n is the number of bytes read. If the buffer has no data to return, err is io.EOF (unless len(p) is zero); otherwise it is nil.
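A minimal sketch of draining a Buffer with Read until it reports io.EOF. It uses bytes.Buffer from the standard library as a stand-in (an assumption); chunks is a hypothetical helper.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// chunks reads s through a Buffer at most size bytes at a time and
// returns the pieces in order. It stops when Read reports io.EOF.
func chunks(s string, size int) ([]string, error) {
	if size <= 0 {
		// With len(p) == 0, Read never reports io.EOF, so refuse early.
		return nil, fmt.Errorf("chunk size must be positive, got %d", size)
	}
	b := bytes.NewBufferString(s)
	p := make([]byte, size)
	var out []string
	for {
		n, err := b.Read(p)
		if n > 0 {
			out = append(out, string(p[:n]))
		}
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return out, err
		}
	}
}

func main() {
	parts, err := chunks("abcdefg", 3)
	fmt.Println(parts, err)
}
```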
func (*Buffer) ReadByte ¶
ReadByte reads and returns the next byte from the buffer. If no byte is available, it returns error io.EOF.
func (*Buffer) ReadBytes ¶
ReadBytes reads until the first occurrence of delim in the input, returning a slice containing the data up to and including the delimiter. If ReadBytes encounters an error before finding a delimiter, it returns the data read before the error and the error itself (often io.EOF). ReadBytes returns err != nil if and only if the returned data does not end in delim.
func (*Buffer) ReadFrom ¶
ReadFrom reads data from r until EOF and appends it to the buffer, growing the buffer as needed. The return value n is the number of bytes read. Any error except io.EOF encountered during the read is also returned. If the buffer becomes too large, ReadFrom will panic with ErrTooLarge.
func (*Buffer) ReadRune ¶
ReadRune reads and returns the next UTF-8-encoded Unicode code point from the buffer. If no bytes are available, the error returned is io.EOF. If the bytes are an erroneous UTF-8 encoding, it consumes one byte and returns U+FFFD, 1.
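The handling of erroneous UTF-8 can be checked with a short sketch. bytes.Buffer from the standard library stands in for this package's Buffer (an assumption); decodeRunes is a hypothetical helper.

```go
package main

import (
	"bytes"
	"fmt"
	"unicode/utf8"
)

// decodeRunes reads data rune by rune and returns each rune with the
// number of bytes it consumed.
func decodeRunes(data []byte) ([]rune, []int) {
	b := bytes.NewBuffer(data)
	var rs []rune
	var sizes []int
	for {
		r, size, err := b.ReadRune()
		if err != nil {
			return rs, sizes // io.EOF: no bytes are left
		}
		rs = append(rs, r)
		sizes = append(sizes, size)
	}
}

func main() {
	// 'a', one invalid byte, then the two-byte encoding of U+00E9.
	rs, sizes := decodeRunes([]byte{'a', 0xff, 0xc3, 0xa9})
	fmt.Println(rs[1] == utf8.RuneError, sizes)
}
```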
func (*Buffer) ReadString ¶
ReadString reads until the first occurrence of delim in the input, returning a string containing the data up to and including the delimiter. If ReadString encounters an error before finding a delimiter, it returns the data read before the error and the error itself (often io.EOF). ReadString returns err != nil if and only if the returned data does not end in delim.
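A sketch of splitting input on a delimiter with ReadString, relying on the rule that err is non-nil exactly when the returned data does not end in delim. bytes.Buffer from the standard library is used as a stand-in (an assumption); lines is a hypothetical helper.

```go
package main

import (
	"bytes"
	"fmt"
)

// lines splits s on '\n', keeping the delimiter on every line that has one.
func lines(s string) []string {
	b := bytes.NewBufferString(s)
	var out []string
	for {
		line, err := b.ReadString('\n')
		if line != "" {
			out = append(out, line)
		}
		if err != nil {
			// Typically io.EOF: either the last line had no trailing
			// delimiter or the input is exhausted.
			return out
		}
	}
}

func main() {
	fmt.Printf("%q\n", lines("a\nb\nc"))
}
```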
func (*Buffer) Reset ¶
func (b *Buffer) Reset()
Reset resets the buffer so it has no content. b.Reset() is the same as b.Truncate(0).
func (*Buffer) String ¶
String returns the contents of the unread portion of the buffer as a string. If the Buffer is a nil pointer, it returns "<nil>".
func (*Buffer) Truncate ¶
Truncate discards all but the first n unread bytes from the buffer. It panics if n is negative or greater than the length of the buffer.
func (*Buffer) UnreadByte ¶
UnreadByte unreads the last byte returned by the most recent read operation. If a write has happened since the last read, UnreadByte returns an error.
func (*Buffer) UnreadRune ¶
UnreadRune unreads the last rune returned by ReadRune. If the most recent read or write operation on the buffer was not a ReadRune, UnreadRune returns an error. (In this regard it is stricter than UnreadByte, which will unread the last byte from any read operation.)
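The difference in strictness between UnreadRune and UnreadByte can be sketched as follows, again using bytes.Buffer from the standard library as a stand-in (an assumption). The helpers peek and unreadAfterReadByte are hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
)

// peek reads the first rune of s through a Buffer, pushes it back with
// UnreadRune, and reports the rune together with the unread length.
func peek(s string) (rune, int, error) {
	b := bytes.NewBufferString(s)
	r, _, err := b.ReadRune()
	if err != nil {
		return 0, b.Len(), err
	}
	if err := b.UnreadRune(); err != nil {
		return 0, b.Len(), err
	}
	return r, b.Len(), nil
}

// unreadAfterReadByte shows the asymmetry: after ReadByte, UnreadRune
// fails while UnreadByte still succeeds.
func unreadAfterReadByte(s string) (runeErr, byteErr error) {
	b := bytes.NewBufferString(s)
	if _, err := b.ReadByte(); err != nil {
		return err, err
	}
	return b.UnreadRune(), b.UnreadByte()
}

func main() {
	r, n, err := peek("h\u00e9llo")
	fmt.Println(string(r), n, err)

	runeErr, byteErr := unreadAfterReadByte("abc")
	fmt.Println(runeErr != nil, byteErr == nil)
}
```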
func (*Buffer) Write ¶
Write appends the contents of p to the buffer, growing the buffer as needed. The return value n is the length of p; err is always nil. If the buffer becomes too large, Write will panic with ErrTooLarge.
func (*Buffer) WriteByte ¶
WriteByte appends the byte c to the buffer, growing the buffer as needed. The returned error is always nil, but is included to match bufio.Writer's WriteByte. If the buffer becomes too large, WriteByte will panic with ErrTooLarge.
func (*Buffer) WriteRune ¶
WriteRune appends the UTF-8 encoding of Unicode code point r to the buffer, returning its length and an error, which is always nil but is included to match bufio.Writer's WriteRune. The buffer is grown as needed; if it becomes too large, WriteRune will panic with ErrTooLarge.
func (*Buffer) WriteString ¶
WriteString appends the contents of s to the buffer, growing the buffer as needed. The return value n is the length of s; err is always nil. If the buffer becomes too large, WriteString will panic with ErrTooLarge.
type CharData ¶
type CharData []byte
A CharData represents XML character data (raw text), in which XML escape sequences have been replaced by the characters they represent.
type Comment ¶
type Comment []byte
A Comment represents an XML comment of the form <!--comment-->. The bytes do not include the <!-- and --> comment markers.
type Decoder ¶
type Decoder struct {
	// Strict defaults to true, enforcing the requirements
	// of the XML specification.
	// If set to false, the parser allows input containing common
	// mistakes:
	//	* If an element is missing an end tag, the parser invents
	//	  end tags as necessary to keep the return values from Token
	//	  properly balanced.
	//	* In attribute values and character data, unknown or malformed
	//	  character entities (sequences beginning with &) are left alone.
	//
	// Setting:
	//
	//	d.Strict = false;
	//	d.AutoClose = HTMLAutoClose;
	//	d.Entity = HTMLEntity
	//
	// creates a parser that can handle typical HTML.
	//
	// Strict mode does not enforce the requirements of the XML name spaces TR.
	// In particular it does not reject name space tags using undefined prefixes.
	// Such tags are recorded with the unknown prefix as the name space URL.
	Strict bool

	// When Strict == false, AutoClose indicates a set of elements to
	// consider closed immediately after they are opened, regardless
	// of whether an end element is present.
	AutoClose []string

	// Entity can be used to map non-standard entity names to string replacements.
	// The parser behaves as if these standard mappings are present in the map,
	// regardless of the actual map content:
	//
	//	"lt": "<",
	//	"gt": ">",
	//	"amp": "&",
	//	"apos": "'",
	//	"quot": `"`,
	Entity map[string]string

	// CharsetReader, if non-nil, defines a function to generate
	// charset-conversion readers, converting from the provided
	// non-UTF-8 charset into UTF-8. If CharsetReader is nil or
	// returns an error, parsing stops with an error. One of the
	// CharsetReader's result values must be non-nil.
	CharsetReader func(charset string, input io.Reader) (io.Reader, error)

	// DefaultSpace sets the default name space used for unadorned tags,
	// as if the entire XML stream were wrapped in an element containing
	// the attribute xmlns="DefaultSpace".
	DefaultSpace string
	// contains filtered or unexported fields
}
A Decoder represents an XML parser reading a particular input stream. The parser assumes that its input is encoded in UTF-8.
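The non-strict configuration mentioned in the Strict field comment might be exercised like this. The sketch uses the standard library's encoding/xml as a stand-in, assuming this package's Decoder, HTMLAutoClose, and HTMLEntity behave the same way; htmlElements is a hypothetical helper.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// htmlElements tokenizes HTML-like input in non-strict mode and returns
// the start and end element names in the order Token reports them.
func htmlElements(input string) ([]string, error) {
	d := xml.NewDecoder(strings.NewReader(input))
	d.Strict = false
	d.AutoClose = xml.HTMLAutoClose // e.g. <br> is closed automatically
	d.Entity = xml.HTMLEntity       // e.g. &nbsp; is recognized

	var names []string
	for {
		tok, err := d.Token()
		if err == io.EOF {
			return names, nil
		}
		if err != nil {
			return names, err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			names = append(names, t.Name.Local)
		case xml.EndElement:
			names = append(names, "/"+t.Name.Local)
		}
	}
}

func main() {
	names, err := htmlElements("<p>one<br>two&nbsp;three</p>")
	fmt.Println(names, err)
}
```

The <br> element has no end tag in the input, yet Token reports a matching end element for it, which keeps the returned tokens balanced.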
func NewDecoder ¶
NewDecoder creates a new XML parser reading from r. If r does not implement io.ByteReader, NewDecoder will do its own buffering.
func (*Decoder) Decode ¶
Decode works like xml.Unmarshal, except it reads the decoder stream to find the start element.
func (*Decoder) DecodeElement ¶
func (d *Decoder) DecodeElement(v interface{}, start *StartElement) error
DecodeElement works like xml.Unmarshal except that it takes a pointer to the start XML element to decode into v. It is useful when a client reads some raw XML tokens itself but also wants to defer to Unmarshal for some elements.
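A sketch of the mixed approach described above: the caller walks tokens itself and hands selected elements to DecodeElement. It uses the standard library's encoding/xml as a stand-in (an assumption); Item and readItems are hypothetical.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// Item is a hypothetical record decoded from each <item> element.
type Item struct {
	ID   int    `xml:"id,attr"`
	Text string `xml:",chardata"`
}

// readItems scans doc token by token and decodes every <item> element
// with DecodeElement, ignoring everything else.
func readItems(doc string) ([]Item, error) {
	d := xml.NewDecoder(strings.NewReader(doc))
	var items []Item
	for {
		tok, err := d.Token()
		if err == io.EOF {
			return items, nil
		}
		if err != nil {
			return items, err
		}
		if se, ok := tok.(xml.StartElement); ok && se.Name.Local == "item" {
			var it Item
			if err := d.DecodeElement(&it, &se); err != nil {
				return items, err
			}
			items = append(items, it)
		}
	}
}

func main() {
	items, err := readItems(`<list><item id="1">a</item><item id="2">b</item></list>`)
	fmt.Println(items, err)
}
```

Decoding one element at a time keeps memory use bounded by the largest single item rather than by the whole document.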
func (*Decoder) InputOffset ¶
InputOffset returns the input stream byte offset of the current decoder position. The offset gives the location of the end of the most recently returned token and the beginning of the next token.
func (*Decoder) RawToken ¶
RawToken is like Token but does not verify that start and end elements match and does not translate name space prefixes to their corresponding URLs.
func (*Decoder) Skip ¶
Skip reads tokens until it has consumed the end element matching the most recent start element already consumed. It recurses if it encounters a start element, so it can be used to skip nested structures. It returns nil if it finds an end element matching the start element; otherwise it returns an error describing the problem.
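A sketch of using Skip to ignore a whole subtree while scanning. The standard library's encoding/xml stands in for this package (an assumption); elementsOutside is a hypothetical helper.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// elementsOutside returns the names of all start elements in doc,
// except those named skip and anything nested inside them.
func elementsOutside(doc, skip string) ([]string, error) {
	d := xml.NewDecoder(strings.NewReader(doc))
	var names []string
	for {
		tok, err := d.Token()
		if err == io.EOF {
			return names, nil
		}
		if err != nil {
			return names, err
		}
		se, ok := tok.(xml.StartElement)
		if !ok {
			continue
		}
		if se.Name.Local == skip {
			// Consume everything up to and including the matching end element.
			if err := d.Skip(); err != nil {
				return names, err
			}
			continue
		}
		names = append(names, se.Name.Local)
	}
}

func main() {
	const doc = `<root><meta><title>t</title></meta><body><p>x</p></body></root>`
	names, err := elementsOutside(doc, "meta")
	fmt.Println(names, err)
}
```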
func (*Decoder) Token ¶
Token returns the next XML token in the input stream. At the end of the input stream, Token returns nil, io.EOF.
Slices of bytes in the returned token data refer to the parser's internal buffer and remain valid only until the next call to Token. To acquire a copy of the bytes, call CopyToken or the token's Copy method.
Token expands self-closing elements such as <br/> into separate start and end elements returned by successive calls.
Token guarantees that the StartElement and EndElement tokens it returns are properly nested and matched: if Token encounters an unexpected end element, it will return an error.
Token implements XML name spaces as described by http://www.w3.org/TR/REC-xml-names/. Each of the Name structures contained in the Token has the Space set to the URL identifying its name space when known. If Token encounters an unrecognized name space prefix, it uses the prefix as the Space rather than report an error.
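The name space translation and the expansion of self-closing elements can be observed with a short token loop. The sketch uses the standard library's encoding/xml as a stand-in (an assumption); scan is a hypothetical helper, and urn:example is an arbitrary example URI.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// scan returns each start element as "Space|Local" together with the
// number of end elements Token reported.
func scan(doc string) ([]string, int, error) {
	d := xml.NewDecoder(strings.NewReader(doc))
	var starts []string
	ends := 0
	for {
		tok, err := d.Token()
		if err == io.EOF {
			return starts, ends, nil
		}
		if err != nil {
			return starts, ends, err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			starts = append(starts, t.Name.Space+"|"+t.Name.Local)
		case xml.EndElement:
			ends++
		}
	}
}

func main() {
	// x is a declared prefix; y is not declared anywhere.
	starts, ends, err := scan(`<a xmlns:x="urn:example"><x:b/><y:c/></a>`)
	fmt.Println(starts, ends, err)
}
```

The declared prefix x is replaced by its URL, the undeclared prefix y is kept as the Space, and each self-closing element contributes one start and one end element.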
type Directive ¶
type Directive []byte
A Directive represents an XML directive of the form <!text>. The bytes do not include the <! and > markers.
type Name ¶
type Name struct {
Space, Local string
}
A Name represents an XML name (Local) annotated with a name space identifier (Space). In tokens returned by Decoder.Token, the Space identifier is given as a canonical URL, not the short prefix used in the document being parsed.
type Option ¶
type Option func(x *xmlParser)
Option is used to set options when creating a new XMLParser.
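The Option type follows the functional-options pattern. The sketch below is not this package's implementation: only the shape func(x *xmlParser) and the names WithAttrPrefix, WithElemPrefix, and WithTextPrefix come from this page, while the struct fields and the newParser constructor are hypothetical.

```go
package main

import "fmt"

// xmlParser is a hypothetical stand-in; the real type's fields are not
// shown in this documentation.
type xmlParser struct {
	attrPrefix, elemPrefix, textPrefix string
}

// Option mutates a parser during construction.
type Option func(x *xmlParser)

// Each With* function captures its argument in a closure.
func WithAttrPrefix(a string) Option { return func(x *xmlParser) { x.attrPrefix = a } }
func WithElemPrefix(e string) Option { return func(x *xmlParser) { x.elemPrefix = e } }
func WithTextPrefix(e string) Option { return func(x *xmlParser) { x.textPrefix = e } }

// newParser is a hypothetical constructor that applies options in order,
// so a later option overrides an earlier one that sets the same field.
func newParser(opts ...Option) *xmlParser {
	x := &xmlParser{}
	for _, opt := range opts {
		opt(x)
	}
	return x
}

func main() {
	p := newParser(WithAttrPrefix("@"), WithTextPrefix("#"))
	fmt.Printf("%q %q %q\n", p.attrPrefix, p.elemPrefix, p.textPrefix)
}
```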
type StartElement ¶
A StartElement represents an XML start element.
func (StartElement) Copy ¶
func (e StartElement) Copy() StartElement
func (StartElement) End ¶
func (e StartElement) End() EndElement
End returns the corresponding XML end element.
type SyntaxError ¶
A SyntaxError represents a syntax error in the XML input stream.
func (*SyntaxError) Error ¶
func (e *SyntaxError) Error() string
type TagPathError ¶
A TagPathError represents an error in the unmarshalling process caused by the use of field tags with conflicting paths.
func (*TagPathError) Error ¶
func (e *TagPathError) Error() string
type Token ¶
type Token interface{}
A Token is an interface holding one of the token types: StartElement, EndElement, CharData, Comment, ProcInst, or Directive.
type UnmarshalError ¶
type UnmarshalError string
An UnmarshalError represents an error in the unmarshalling process.
func (UnmarshalError) Error ¶
func (e UnmarshalError) Error() string
type Unmarshaler ¶
type Unmarshaler interface {
UnmarshalXML(d *Decoder, start StartElement) error
}
Unmarshaler is the interface implemented by objects that can unmarshal an XML element description of themselves.
UnmarshalXML decodes a single XML element beginning with the given start element. If it returns an error, the outer call to Unmarshal stops and returns that error. UnmarshalXML must consume exactly one XML element. One common implementation strategy is to unmarshal into a separate value with a layout matching the expected XML using d.DecodeElement, and then to copy the data from that value into the receiver. Another common strategy is to use d.Token to process the XML object one token at a time. UnmarshalXML may not use d.RawToken.
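The first strategy described above (decode into a separate value with d.DecodeElement, then copy into the receiver) might look like this. The sketch uses the standard library's encoding/xml as a stand-in (an assumption); Point, Shape, and parseOrigin are hypothetical, as is the "x,y" text format.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// Point is a hypothetical type stored in XML as text of the form "x,y".
type Point struct{ X, Y int }

// UnmarshalXML consumes exactly one element by delegating to
// DecodeElement, then parses the collected character data.
func (p *Point) UnmarshalXML(d *xml.Decoder, start xml.StartElement) error {
	var raw string
	if err := d.DecodeElement(&raw, &start); err != nil {
		return err
	}
	var x, y int
	if _, err := fmt.Sscanf(raw, "%d,%d", &x, &y); err != nil {
		return fmt.Errorf("point %q: %w", raw, err)
	}
	p.X, p.Y = x, y
	return nil
}

// Shape is a hypothetical container with one Point field.
type Shape struct {
	Origin Point `xml:"origin"`
}

// parseOrigin decodes a <shape> document and returns its origin.
func parseOrigin(doc string) (Point, error) {
	var s Shape
	err := xml.Unmarshal([]byte(doc), &s)
	return s.Origin, err
}

func main() {
	p, err := parseOrigin(`<shape><origin>3,4</origin></shape>`)
	fmt.Println(p, err)
}
```

An error returned from UnmarshalXML stops the outer Unmarshal call, so malformed text in the origin element surfaces to the caller of parseOrigin.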
type UnmarshalerAttr ¶
UnmarshalerAttr is the interface implemented by objects that can unmarshal an XML attribute description of themselves.
UnmarshalXMLAttr decodes a single XML attribute. If it returns an error, the outer call to Unmarshal stops and returns that error. UnmarshalXMLAttr is used only for struct fields with the "attr" option in the field tag.
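A sketch of a custom attribute decoder. It uses the standard library's encoding/xml as a stand-in, assuming this package's UnmarshalerAttr has the method UnmarshalXMLAttr(attr Attr) error as in the standard library; Level, Alert, and parseLevel are hypothetical.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// Level is a hypothetical enumeration stored as a word in an attribute.
type Level int

const (
	Low Level = iota + 1
	High
)

// UnmarshalXMLAttr maps the attribute text onto a Level value.
func (l *Level) UnmarshalXMLAttr(attr xml.Attr) error {
	switch attr.Value {
	case "low":
		*l = Low
	case "high":
		*l = High
	default:
		return fmt.Errorf("unknown level %q", attr.Value)
	}
	return nil
}

// Alert uses the "attr" option, which is required for
// UnmarshalXMLAttr to be called.
type Alert struct {
	Level Level `xml:"level,attr"`
}

// parseLevel decodes an <alert> element and returns its level.
func parseLevel(doc string) (Level, error) {
	var a Alert
	err := xml.Unmarshal([]byte(doc), &a)
	return a.Level, err
}

func main() {
	lvl, err := parseLevel(`<alert level="high"/>`)
	fmt.Println(lvl == High, err)
}
```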
Notes ¶
Bugs ¶
Mapping between XML elements and data structures is inherently flawed: an XML element is an order-dependent collection of anonymous values, while a data structure is an order-independent collection of named values. See package json for a textual representation more suitable to data structures.