xmltree

package module
v0.0.0-...-13fca47
Published: Dec 28, 2023 License: MIT Imports: 10 Imported by: 0

README

XML Tree tool

The xmltree package converts XML documents to a tree data structure and provides convenient methods for manipulating and searching that tree.

Requires Go 1.9 or greater for the golang.org/x/html dependency.

This xmltree module was originally cloned from aqwari.net/xml.

Documentation

Overview

Package xmltree converts XML documents into a tree of Go values.

The xmltree package provides types and routines for accessing and manipulating XML documents as trees, along with functionality to resolve XML namespace prefixes at any point in the tree.

Constants

const (
	XML_Tag      = Kind(0)
	XML_CharData = Kind(1 << iota)
	XML_Comment
	XML_ProcInst
	XML_Directive
)
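
The numeric values of these constants follow from how iota advances inside the const block. The sketch below copies the declaration into a standalone program (with a local Kind type, so it does not import xmltree) and prints the values. It shows that XML_Tag is zero and the remaining kinds are distinct powers of two. Whether the package combines them as bit masks is not stated in this documentation, so that is left as an open assumption.

```go
package main

import "fmt"

// Kind mirrors the type declared in this documentation; it is redeclared
// here so the example compiles without the xmltree package.
type Kind uint8

const (
	XML_Tag      = Kind(0)
	XML_CharData = Kind(1 << iota) // iota is 1 on this line, so the value is 2
	XML_Comment                    // 1 << 2
	XML_ProcInst                   // 1 << 3
	XML_Directive                  // 1 << 4
)

// kindValues returns the numeric value of each constant in declaration order.
func kindValues() []uint8 {
	return []uint8{
		uint8(XML_Tag),
		uint8(XML_CharData),
		uint8(XML_Comment),
		uint8(XML_ProcInst),
		uint8(XML_Directive),
	}
}

func main() {
	fmt.Println(kindValues()) // [0 2 4 8 16]
}
```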

Functions

func Encode

func Encode(w io.Writer, el *Element) error

Encode writes the XML encoding of the Element to w. Encode returns any errors encountered writing to w.

func EncodeIndent

func EncodeIndent(w io.Writer, el *Element, prefix, indent string) error

EncodeIndent is like Encode, but adds line breaks for each successive element. Each line begins with prefix and is followed by zero or more copies of indent according to the nesting depth.
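
The prefix and indent rule described above is the same one the standard library's xml.Encoder.Indent documents, so a stdlib-only sketch can illustrate it. This example does not call xmltree.EncodeIndent, and its exact output for a given Element is not verified here; the chapter type and the indentedLines helper are hypothetical names used only for illustration.

```go
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
	"strings"
)

// chapter is a hypothetical value used only to produce nested output.
type chapter struct {
	XMLName xml.Name `xml:"chapter"`
	Title   string   `xml:"title"`
}

// indentedLines encodes a chapter with the standard library encoder and
// returns the output split into lines.
func indentedLines(prefix, indent string) ([]string, error) {
	var buf bytes.Buffer
	enc := xml.NewEncoder(&buf)
	enc.Indent(prefix, indent) // each line: prefix, then depth copies of indent
	if err := enc.Encode(chapter{Title: "Huck"}); err != nil {
		return nil, err
	}
	return strings.Split(buf.String(), "\n"), nil
}

func main() {
	lines, err := indentedLines("> ", "  ")
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, line := range lines {
		fmt.Println(line)
	}
}
```

With prefix "> " and indent "  ", the nested title line starts with the prefix followed by one copy of the indent.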

func Equal

func Equal(a, b *Element) bool

Equal returns true if two xmltree.Elements are equal, ignoring differences in white space, sub-element order, and namespace prefixes.

func Marshal

func Marshal(el *Element) []byte

Marshal produces the XML encoding of an Element as a self-contained document. The xmltree package may adjust the declarations of XML namespaces if the Element has been modified, or is part of a larger scope, such that the document produced by Marshal is a valid XML document.

The return value of Marshal will use the UTF-8 encoding regardless of the original encoding of the source document.

Example
var input = []byte(`<?xml version="1.0" encoding="UTF-8"?>
	<toc>
	  <chapter-list>
	    <chapter>
	      <title>Civilizing Huck.Miss Watson.Tom Sawyer Waits.</title>
	      <number>1</number>
	    </chapter>
	    <chapter>
	      <title>The Boys Escape Jim.Torn Sawyer's Gang.Deep-laid Plans.</title>
	      <number>2</number>
	    </chapter>
	    <chapter>
	      <title>A Good Going-over.Grace Triumphant."One of Tom Sawyers's Lies".</title>
	      <number>3</number>
	    </chapter>
	    <chapter>
	      <title>Huck and the Judge.Superstition.</title>
	      <number>4</number>
	    </chapter>
	  </chapter-list>
	</toc>`)

var chapters []xmltree.Element
root, err := xmltree.ParseXML(bytes.NewReader(input))
if err != nil {
	log.Fatal(err)
}

for _, el := range root.Find(&xmltree.Selector{Name: xml.Name{Local: "chapter"}}) {
	title := el.MatchOne(&xmltree.Selector{Name: xml.Name{Local: "title"}})
	el.Children = nil
	el.Content = title.Content
	chapters = append(chapters, *el)
}
root.Children = chapters
fmt.Printf("%s\n", xmltree.MarshalIndent(root, "", "  "))
Output:

<toc>
  <chapter>Civilizing Huck.Miss Watson.Tom Sawyer Waits.</chapter>
  <chapter>The Boys Escape Jim.Torn Sawyer's Gang.Deep-laid Plans.</chapter>
  <chapter>A Good Going-over.Grace Triumphant."One of Tom Sawyers's Lies".</chapter>
  <chapter>Huck and the Judge.Superstition.</chapter>
</toc>

func MarshalIndent

func MarshalIndent(el *Element, prefix, indent string) []byte

MarshalIndent is like Marshal, but adds line breaks for each successive element. Each line begins with prefix and is followed by zero or more copies of indent according to the nesting depth.

func Unmarshal

func Unmarshal(el *Element, v interface{}) error

Unmarshal parses the XML encoding of the Element and stores the result in the value pointed to by v. Unmarshal follows the same rules as xml.Unmarshal, but only parses the portion of the XML document contained by the Element.

Example
var input = []byte(`<mediawiki xml:lang="en">
	  <page>
	    <title>Page title</title>
	    <restrictions>edit=sysop:move=sysop</restrictions>
	    <revision>
	      <timestamp>2001-01-15T13:15:00Z</timestamp>
	      <contributor><username>Foobar</username></contributor>
	      <comment>I have just one thing to say!</comment>
	      <text>A bunch of [[text]] here.</text>
	      <minor />
	    </revision>
	    <revision>
	      <timestamp>2001-01-15T13:10:27Z</timestamp>
	      <contributor><ip>10.0.0.2</ip></contributor>
	      <comment>new!</comment>
	      <text>An earlier [[revision]].</text>
	    </revision>
	  </page>
	  
	  <page>
	    <title>Talk:Page title</title>
	    <revision>
	      <timestamp>2001-01-15T14:03:00Z</timestamp>
	      <contributor><ip>10.0.0.2</ip></contributor>
	      <comment>hey</comment>
	      <text>WHYD YOU LOCK PAGE??!!! i was editing that jerk</text>
	    </revision>
	  </page>
	</mediawiki>`)

type revision struct {
	Timestamp   string   `xml:"timestamp"`
	Contributor string   `xml:"contributor>ip"`
	Comment     string   `xml:"comment"`
	Text        []string `xml:"text"`
}

root, err := xmltree.ParseXML(bytes.NewReader(input))
if err != nil {
	log.Fatal(err)
}

// Pull all <revision> items from the input
for _, el := range root.Find(&xmltree.Selector{Name: xml.Name{Local: "revision"}}) {
	var rev revision
	if err := xmltree.Unmarshal(el, &rev); err != nil {
		log.Print(err)
		continue
	}
	fmt.Println(rev.Timestamp, rev.Comment)
}
Output:

2001-01-15T13:15:00Z I have just one thing to say!
2001-01-15T13:10:27Z new!
2001-01-15T14:03:00Z hey

Types

type Element

type Element struct {
	// The kind of this element.
	Type Kind
	// Details about the Element if it is a labeled tag.
	xml.StartElement
	// The XML namespace scope at this element's location in the
	// document.
	Scope
	// The content contained within this element's end tags if no child
	// Elements are present.
	Content string
	// Sub-elements contained within this element.
	Children []Element
}

An Element represents a single element in an XML document. Elements may have zero or more children. The byte array used by the Content field is shared among all elements in the document, and should not be modified. An Element also captures XML namespace prefixes, so that arbitrary QNames in attribute values can be resolved.

func Parse

func Parse(doc io.Reader) (*Element, error)

Parse builds a tree of Elements by reading an XML document. The reader passed to Parse is expected to be a valid XML document with a single root element. Content that is not an XML tag or tagged content, such as comments, is omitted from the tree.

func ParseXML

func ParseXML(doc io.Reader) (*Element, error)

ParseXML builds a tree of Elements by reading an XML document for tagged entities only. The reader passed to ParseXML is expected to be a valid XML document with a single root element. Content that is not an XML tag or tagged content, such as comments, is omitted from the tree.

func (*Element) AddClass

func (el *Element) AddClass(class string)

AddClass adds a class to an Element's existing class attribute. If the class is already present, nothing is done.

func (*Element) Attr

func (el *Element) Attr(space, local string) string

Attr gets the value of the first attribute whose name matches the space and local arguments. If space is the empty string, only attributes' local names are considered when looking for a match. If an attribute could not be found, the empty string is returned.

func (*Element) Each

func (el *Element) Each(fn func(*Element) error) (err error)

The Each method calls fn for each of the Element's children. If fn returns a non-nil error, Each returns it immediately.

func (*Element) Find

func (el *Element) Find(match *Selector) []*Element

Find returns a slice of the child Elements that match the Selector, found by a depth-first search.

func (*Element) FindFunc

func (el *Element) FindFunc(fn func(*Element) bool) []*Element

FindFunc traverses the Element tree in depth-first order and returns a slice of Elements for which the function fn returns true.

Example
data := `
	  <People>
        <Person>
            <FullName>Grace R. Emlin</FullName>
            <Email where="home">
                <Addr>gre@example.com</Addr>
            </Email>
            <Email where='work'>
                <Addr>gre@work.com</Addr>
            </Email>
        </Person>
        <Person>
            <FullName>Michael P. Thompson</FullName>
            <Email where="home">
                <Addr>michaelp@example.com</Addr>
            </Email>
            <Email where='work'>
                <Addr>michaelp@work.com</Addr>
                <Addr>michael.thompson@work.com</Addr>
            </Email>
        </Person>
    </People>
	`

root, err := xmltree.ParseXML(strings.NewReader(data))
if err != nil {
	log.Fatal(err)
}

workEmails := root.FindFunc(func(el *xmltree.Element) bool {
	return el.Name.Local == "Email" && el.Attr("", "where") == "work"
})

for _, el := range workEmails {
	for _, addr := range el.Children {
		fmt.Printf("%s\n", addr.Content)
	}
}
Output:

gre@work.com
michaelp@work.com
michael.thompson@work.com

func (*Element) FindOne

func (el *Element) FindOne(match *Selector) *Element

FindOne returns a pointer to the first child of the Element that matches the Selector, or nil if none matches.

func (*Element) First

func (el *Element) First() *Element

First returns a pointer to the first child of the Element.

func (*Element) Flatten

func (el *Element) Flatten() []*Element

Flatten produces a slice of Element pointers referring to the children of el, and their children, in depth-first order.
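
A depth-first flattening of this kind can be sketched on a small stand-in type. The node type and flatten function below are hypothetical and do not use xmltree. The sketch assumes pre-order traversal (each child is listed before its own children), which the description above suggests but does not state outright.

```go
package main

import "fmt"

// node is a hypothetical stand-in for xmltree.Element.
type node struct {
	Name     string
	Children []node
}

// flatten returns pointers to every descendant of n in depth-first
// (pre-order) order. n itself is not included.
func flatten(n *node) []*node {
	var out []*node
	for i := range n.Children {
		child := &n.Children[i] // point into the slice, not at a loop copy
		out = append(out, child)
		out = append(out, flatten(child)...)
	}
	return out
}

// names collects the Name of each node for printing.
func names(nodes []*node) []string {
	out := make([]string, 0, len(nodes))
	for _, n := range nodes {
		out = append(out, n.Name)
	}
	return out
}

func main() {
	root := &node{Name: "toc", Children: []node{
		{Name: "chapter", Children: []node{{Name: "title"}, {Name: "number"}}},
		{Name: "appendix"},
	}}
	fmt.Println(names(flatten(root))) // [chapter title number appendix]
}
```

Because the returned pointers refer to the nodes stored in the tree, changes made through them are visible in the tree itself.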

func (*Element) GetContent

func (el *Element) GetContent() string

GetContent returns the Element's content as a string, if available.

func (*Element) Last

func (el *Element) Last() *Element

Last returns a pointer to the last child of the Element.

func (*Element) Match

func (el *Element) Match(match *Selector) []*Element

Match returns a slice of the child Elements that match the Selector.

func (*Element) MatchOne

func (el *Element) MatchOne(match *Selector) *Element

MatchOne returns a pointer to the first child of the Element that matches the Selector, or nil if none matches.

func (*Element) RemoveAttr

func (el *Element) RemoveAttr(space, local string)

RemoveAttr removes an XML attribute from an Element's existing Attributes. If the attribute does not exist, no operation is done.

func (*Element) RemoveClass

func (el *Element) RemoveClass(class string)

RemoveClass removes a class from an Element's existing class attribute. If the class is not present, nothing is done.

func (*Element) RemoveEmpty

func (el *Element) RemoveEmpty()

RemoveEmpty cleans up the tree of any empty elements.

func (*Element) RemoveLocalNS

func (el *Element) RemoveLocalNS() error

RemoveLocalNS will try to find a namespace which is already declared and use that namespace prefix instead of a locally defined one.

func (*Element) SetAttr

func (el *Element) SetAttr(space, local, value string)

SetAttr adds an XML attribute to an Element's existing Attributes. If the attribute already exists, it is replaced.

func (*Element) SetContent

func (el *Element) SetContent(val string)

SetContent sets the Element's content, if available.

func (*Element) SimplifyNS

func (el *Element) SimplifyNS()

SimplifyNS will try to find a namespace that is already declared and is the most widely used in the file, and make it the default namespace instead of using a prefix for every entry in the XML file. Note: make sure a name is defined for every prefix used in the file, or deeply nested "xmlns=" declarations may be added.

func (*Element) String

func (el *Element) String() string

String returns the XML encoding of an Element and its children as a string.

func (*Element) WalkDepthFunc

func (el *Element) WalkDepthFunc(fn func(*Element) bool)

The WalkDepthFunc method calls fn for each of the Element's children in depth-first order. If fn returns true, that child's own children are visited as well; otherwise the walk does not descend below that child.
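
The pruning behaviour can be sketched on a small stand-in type. The node type and the walkDepth and visitedNames functions below are hypothetical names for illustration and do not use xmltree; the sketch only shows how a false return value stops the descent below one child while the walk continues with its siblings.

```go
package main

import "fmt"

// node is a hypothetical stand-in for xmltree.Element.
type node struct {
	Name     string
	Children []node
}

// walkDepth calls fn on each child of n in depth-first order and only
// descends into a child when fn returns true for it.
func walkDepth(n *node, fn func(*node) bool) {
	for i := range n.Children {
		child := &n.Children[i]
		if fn(child) {
			walkDepth(child, fn)
		}
	}
}

// visitedNames records every visited name, pruning below nodes named skip.
func visitedNames(root *node, skip string) []string {
	var seen []string
	walkDepth(root, func(n *node) bool {
		seen = append(seen, n.Name)
		return n.Name != skip
	})
	return seen
}

func main() {
	root := &node{Name: "People", Children: []node{
		{Name: "Person", Children: []node{
			{Name: "FullName"},
			{Name: "Email", Children: []node{{Name: "Addr"}}},
		}},
		{Name: "Notes"},
	}}
	// Addr is never visited because the walk stops descending at Email.
	fmt.Println(visitedNames(root, "Email")) // [Person FullName Email Notes]
}
```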

func (*Element) WalkFunc

func (el *Element) WalkFunc(fn func(*Element) error) (err error)

The WalkFunc method calls fn for each of the Element's children in depth-first order. If fn returns a non-nil error, WalkFunc returns it immediately.

type Kind

type Kind uint8

type Scope

type Scope struct {
	// contains filtered or unexported fields
}

A Scope represents the XML namespace scope at a given position in the document.

func (*Scope) JoinScope

func (outer *Scope) JoinScope(inner *Scope) *Scope

The JoinScope method joins two Scopes together. When resolving prefixes using the returned scope, the prefix list in the argument Scope is searched before that of the receiver Scope.
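
The lookup order after a join can be sketched with two plain maps. Scope's fields are unexported, so the map representation and the join function below are assumptions made for illustration, not the package's data structure. The point shown is only the precedence: a prefix bound in the inner (argument) scope shadows the same prefix in the outer (receiver) scope.

```go
package main

import "fmt"

// join returns a lookup function over two hypothetical prefix tables.
// The inner table is searched first, then the outer one.
func join(outer, inner map[string]string) func(prefix string) (string, bool) {
	return func(prefix string) (string, bool) {
		if uri, ok := inner[prefix]; ok {
			return uri, true
		}
		uri, ok := outer[prefix]
		return uri, ok
	}
}

func main() {
	outer := map[string]string{"a": "urn:example:outer-a", "b": "urn:example:outer-b"}
	inner := map[string]string{"a": "urn:example:inner-a"}

	lookup := join(outer, inner)
	uri, _ := lookup("a")
	fmt.Println(uri) // urn:example:inner-a
	uri, _ = lookup("b")
	fmt.Println(uri) // urn:example:outer-b
	_, ok := lookup("c")
	fmt.Println(ok) // false
}
```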

func (*Scope) Prefix

func (scope *Scope) Prefix(name xml.Name) (qname string)

Prefix is the inverse of Resolve. It uses the closest prefix defined for a namespace to create a string of the form prefix:local. If the namespace cannot be found, or is the default namespace, an unqualified name is returned.

func (*Scope) Resolve

func (scope *Scope) Resolve(qname string) xml.Name

Resolve translates an XML QName (namespace-prefixed string) to an xml.Name with a canonicalized namespace in its Space field. This can be used when working with XSD documents, which put QNames in attribute values. If qname does not have a prefix, the default namespace is used. If a namespace prefix cannot be resolved, the returned value's Space field will be the unresolved prefix. Use the ResolveNS method to detect when a namespace prefix cannot be resolved.

func (*Scope) ResolveDefault

func (scope *Scope) ResolveDefault(qname, defaultns string) xml.Name

ResolveDefault is like Resolve, but allows for the default namespace to be overridden. The namespace of strings without a namespace prefix (known as an NCName in XML terminology) will be defaultns.

func (*Scope) ResolveNS

func (scope *Scope) ResolveNS(qname string) (xml.Name, bool)

The ResolveNS method is like Resolve, but returns false for its second return value if a namespace prefix cannot be resolved.
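
The rules described for Resolve and ResolveNS can be sketched with a small stand-in type. The scope struct and its resolveNS method below are hypothetical and assume a simple prefix-to-URI map, which this documentation does not confirm as the real representation. The sketch covers the three documented cases: a known prefix, an unprefixed name that takes the default namespace, and an unknown prefix that is returned unresolved in the Space field.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// scope is a hypothetical stand-in for xmltree.Scope.
type scope struct {
	defaultNS string
	prefixes  map[string]string // prefix -> namespace URI
}

// resolveNS turns a QName into an xml.Name. The boolean is false when
// the prefix is not bound in the scope.
func (s scope) resolveNS(qname string) (xml.Name, bool) {
	prefix, local, found := strings.Cut(qname, ":")
	if !found {
		// No prefix: the name belongs to the default namespace.
		return xml.Name{Space: s.defaultNS, Local: qname}, true
	}
	if uri, ok := s.prefixes[prefix]; ok {
		return xml.Name{Space: uri, Local: local}, true
	}
	// Unknown prefix: keep it in Space so the caller can see what failed.
	return xml.Name{Space: prefix, Local: local}, false
}

func main() {
	s := scope{
		defaultNS: "urn:example:default",
		prefixes:  map[string]string{"xs": "http://www.w3.org/2001/XMLSchema"},
	}
	for _, q := range []string{"xs:string", "item", "bogus:thing"} {
		name, ok := s.resolveNS(q)
		fmt.Printf("%q %q %v\n", name.Space, name.Local, ok)
	}
}
```

The program prints the resolved namespace, the local name, and the success flag for each of the three cases.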

type Selector

type Selector struct {
	xml.Name
	Depth int
	Attr  []xml.Attr
}
