xmlquery

package

v0.0.0-...-34b2b92 (latest) | Published: Jun 23, 2022 | License: MIT | Imports: 14 | Imported by: 0

Repository

github.com/zhangdapeng520/zdpgo_xpath

Documentation ¶

Overview ¶

Package xmlquery extracts data from XML documents using XPath expressions.

Index ¶

Variables
func AddAttr(n *Node, key, val string)
func AddChild(parent, n *Node)
func AddSibling(sibling, n *Node)
func FindEach(top *Node, expr string, cb func(int, *Node))
func FindEachWithBreak(top *Node, expr string, cb func(int, *Node) bool)
func RemoveFromTree(n *Node)
type Attr
type DecoderOptions
type Node
- func Find(top *Node, expr string) []*Node
- func FindOne(top *Node, expr string) *Node
- func LoadURL(url string) (*Node, error)
- func Parse(r io.Reader) (*Node, error)
- func ParseWithOptions(r io.Reader, options ParserOptions) (*Node, error)
- func Query(top *Node, expr string) (*Node, error)
- func QueryAll(top *Node, expr string) ([]*Node, error)
- func QuerySelector(top *Node, selector *xpath.Expr) *Node
- func QuerySelectorAll(top *Node, selector *xpath.Expr) []*Node
- func (n *Node) InnerText() string
- func (n *Node) OutputXML(self bool) string
- func (n *Node) SelectAttr(name string) string
- func (n *Node) SelectElement(name string) *Node
- func (n *Node) SelectElements(name string) []*Node
type NodeNavigator
- func CreateXPathNavigator(top *Node) *NodeNavigator
- func (x *NodeNavigator) Copy() xpath.NodeNavigator
- func (x *NodeNavigator) Current() *Node
- func (x *NodeNavigator) LocalName() string
- func (x *NodeNavigator) MoveTo(other xpath.NodeNavigator) bool
- func (x *NodeNavigator) MoveToChild() bool
- func (x *NodeNavigator) MoveToFirst() bool
- func (x *NodeNavigator) MoveToNext() bool
- func (x *NodeNavigator) MoveToNextAttribute() bool
- func (x *NodeNavigator) MoveToParent() bool
- func (x *NodeNavigator) MoveToPrevious() bool
- func (x *NodeNavigator) MoveToRoot()
- func (x *NodeNavigator) NamespaceURL() string
- func (x *NodeNavigator) NodeType() xpath.NodeType
- func (x *NodeNavigator) Prefix() string
- func (x *NodeNavigator) String() string
- func (x *NodeNavigator) Value() string
type NodeType
type ParserOptions
type StreamParser
- func CreateStreamParser(r io.Reader, streamElementXPath string, streamElementFilter ...string) (*StreamParser, error)
- func CreateStreamParserWithOptions(r io.Reader, options ParserOptions, streamElementXPath string, ...) (*StreamParser, error)
- func (sp *StreamParser) Read() (*Node, error)

Constants ¶

This section is empty.

Variables ¶

var DisableSelectorCache = false

DisableSelectorCache disables caching for the query selector when set to true.

var SelectorCacheMaxEntries = 50

SelectorCacheMaxEntries sets the maximum number of selector objects that can be cached. The default is 50. Caching is disabled if SelectorCacheMaxEntries <= 0.

Functions ¶

func AddAttr ¶

func AddAttr(n *Node, key, val string)

AddAttr adds a new attribute specified by 'key' and 'val' to a node 'n'.

func AddChild ¶

func AddChild(parent, n *Node)

AddChild adds a new node 'n' to a node 'parent' as its last child.

func AddSibling ¶

func AddSibling(sibling, n *Node)

AddSibling adds a new node 'n' as a sibling of a given node 'sibling'. Note that 'n' is not necessarily added immediately after 'sibling': if 'sibling' isn't the last child of its parent, 'n' is added at the end of the sibling chain of their parent.
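The append-at-end behavior can be illustrated with a small sketch. It does not use the package itself: `node`, `addChild`, `addSibling`, and `childNames` are hypothetical stand-ins written only from the descriptions above, so the real implementation may differ in detail.

```go
package main

import (
	"fmt"
	"strings"
)

// node is a hypothetical, trimmed-down stand-in for the package's Node type.
type node struct {
	Parent, FirstChild, LastChild, PrevSibling, NextSibling *node
	Data                                                    string
}

// addChild appends n as the last child of parent.
func addChild(parent, n *node) {
	n.Parent = parent
	n.NextSibling = nil
	n.PrevSibling = parent.LastChild
	if parent.LastChild != nil {
		parent.LastChild.NextSibling = n
	} else {
		parent.FirstChild = n
	}
	parent.LastChild = n
}

// addSibling walks to the end of sibling's chain and links n there,
// so n lands last even when sibling is not the last child.
func addSibling(sibling, n *node) {
	for t := sibling.NextSibling; t != nil; t = t.NextSibling {
		sibling = t
	}
	n.Parent = sibling.Parent
	n.PrevSibling = sibling
	n.NextSibling = nil
	sibling.NextSibling = n
	if sibling.Parent != nil {
		sibling.Parent.LastChild = n
	}
}

// childNames lists the Data of parent's children in sibling order.
func childNames(parent *node) string {
	var names []string
	for c := parent.FirstChild; c != nil; c = c.NextSibling {
		names = append(names, c.Data)
	}
	return strings.Join(names, " ")
}

func main() {
	root := &node{Data: "root"}
	a := &node{Data: "a"}
	addChild(root, a)
	addChild(root, &node{Data: "b"})
	addSibling(a, &node{Data: "c"}) // a is the first child, not the last
	fmt.Println(childNames(root))   // a b c
}
```

In this sketch the new node ends up after `b`, not directly after `a`, which matches the note above.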

func FindEach ¶

func FindEach(top *Node, expr string, cb func(int, *Node))

FindEach searches the Node tree rooted at top for nodes matching expr and calls the function cb for each match. Deprecated: use `for ... := range Find(...) {}` instead.

func FindEachWithBreak ¶

func FindEachWithBreak(top *Node, expr string, cb func(int, *Node) bool)

FindEachWithBreak works like FindEach but lets the caller stop the loop early by returning false from the callback function `cb`. Deprecated: use `for ... := range Find(...) {}` with a `break` instead.

func RemoveFromTree ¶

func RemoveFromTree(n *Node)

RemoveFromTree removes a node and its subtree from the document tree it is in. If the node is the root of the tree, the call is a no-op.

Types ¶

type Attr ¶

type Attr struct {
	Name         xml.Name
	Value        string
	NamespaceURI string
}

type DecoderOptions ¶

type DecoderOptions struct {
	Strict    bool
	AutoClose []string
	Entity    map[string]string
}

DecoderOptions implements the same options as the Decoder in the standard encoding/xml package. Refer to its documentation: https://golang.org/pkg/encoding/xml/#Decoder
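Because the three fields mirror fields of the same names on the standard library's xml.Decoder, their effect can be sketched with encoding/xml alone. The example below does not call xmlquery; `decode` is a hypothetical helper, and how this package forwards the options to its internal decoder is assumed rather than shown.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// decode tokenizes src with the standard library decoder and returns the
// start-element names and the concatenated character data. When lenient is
// true it sets the three fields that DecoderOptions mirrors.
func decode(src string, lenient bool) ([]string, string, error) {
	d := xml.NewDecoder(strings.NewReader(src))
	if lenient {
		// Tolerate input that is not strictly well-formed XML.
		d.Strict = false
		// Treat elements such as <br> as closed immediately.
		d.AutoClose = xml.HTMLAutoClose
		// Resolve named entities such as &eacute;.
		d.Entity = xml.HTMLEntity
	}
	var names []string
	var text strings.Builder
	for {
		tok, err := d.Token()
		if err == io.EOF {
			return names, text.String(), nil
		}
		if err != nil {
			return nil, "", err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			names = append(names, t.Name.Local)
		case xml.CharData:
			text.Write(t)
		}
	}
}

func main() {
	const src = `<p>caf&eacute;<br>bar</p>`

	// Default (strict) settings reject the unknown entity.
	_, _, err := decode(src, false)
	fmt.Println("strict failed:", err != nil)

	// Lenient settings accept the same input.
	names, text, err := decode(src, true)
	fmt.Println(names, text, err)
}
```

With the standard decoder, the strict pass fails on `&eacute;`, while the lenient pass yields the elements `p` and `br` and the text `cafébar`.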

type Node ¶

type Node struct {
	Parent, FirstChild, LastChild, PrevSibling, NextSibling *Node

	Type         NodeType
	Data         string
	Prefix       string
	NamespaceURI string
	Attr         []Attr
	// contains filtered or unexported fields
}

A Node consists of a NodeType and some Data (tag name for element nodes, content for text) and is part of a tree of Nodes.

func Find ¶

func Find(top *Node, expr string) []*Node

Find is like QueryAll but panics if `expr` is not a valid XPath expression. See `QueryAll()` function.

func FindOne ¶

func FindOne(top *Node, expr string) *Node

FindOne is like Query but panics if `expr` is not a valid XPath expression. See `Query()` function.

func LoadURL ¶

func LoadURL(url string) (*Node, error)

LoadURL loads the XML document from the specified URL.

func Parse ¶

func Parse(r io.Reader) (*Node, error)

Parse returns the parse tree for the XML from the given Reader.

func ParseWithOptions ¶

func ParseWithOptions(r io.Reader, options ParserOptions) (*Node, error)

ParseWithOptions is like Parse, but with custom options.

func Query ¶

func Query(top *Node, expr string) (*Node, error)

Query searches the XML Node for matches of the specified XPath expr and returns the first matched element. Returns an error if the expression `expr` cannot be parsed.

func QueryAll ¶

func QueryAll(top *Node, expr string) ([]*Node, error)

QueryAll searches the XML Node for all matches of the specified XPath expr. Returns an error if the expression `expr` cannot be parsed.

func QuerySelector ¶

func QuerySelector(top *Node, selector *xpath.Expr) *Node

QuerySelector returns the first XML Node matched by the specified XPath selector.

func QuerySelectorAll ¶

func QuerySelectorAll(top *Node, selector *xpath.Expr) []*Node

QuerySelectorAll returns all XML Nodes that match the specified XPath selector.

func (*Node) InnerText ¶

func (n *Node) InnerText() string

InnerText returns the text between the start and end tags of the object.

func (*Node) OutputXML ¶

func (n *Node) OutputXML(self bool) string

OutputXML returns the node serialized as XML text, including tag names. With self set to true, the output includes the node's own tags.

func (*Node) SelectAttr ¶

func (n *Node) SelectAttr(name string) string

SelectAttr returns the attribute value with the specified name.

func (*Node) SelectElement ¶

func (n *Node) SelectElement(name string) *Node

SelectElement finds the first child element with the specified name.

func (*Node) SelectElements ¶

func (n *Node) SelectElements(name string) []*Node

SelectElements finds child elements with the specified name.

type NodeNavigator ¶

type NodeNavigator struct {
	// contains filtered or unexported fields
}

func CreateXPathNavigator ¶

func CreateXPathNavigator(top *Node) *NodeNavigator

CreateXPathNavigator creates a new xpath.NodeNavigator for the specified XML Node.

func (*NodeNavigator) Copy ¶

func (x *NodeNavigator) Copy() xpath.NodeNavigator

func (*NodeNavigator) Current ¶

func (x *NodeNavigator) Current() *Node

func (*NodeNavigator) LocalName ¶

func (x *NodeNavigator) LocalName() string

func (*NodeNavigator) MoveTo ¶

func (x *NodeNavigator) MoveTo(other xpath.NodeNavigator) bool

func (*NodeNavigator) MoveToChild ¶

func (x *NodeNavigator) MoveToChild() bool

func (*NodeNavigator) MoveToFirst ¶

func (x *NodeNavigator) MoveToFirst() bool

func (*NodeNavigator) MoveToNext ¶

func (x *NodeNavigator) MoveToNext() bool

func (*NodeNavigator) MoveToNextAttribute ¶

func (x *NodeNavigator) MoveToNextAttribute() bool

func (*NodeNavigator) MoveToParent ¶

func (x *NodeNavigator) MoveToParent() bool

func (*NodeNavigator) MoveToPrevious ¶

func (x *NodeNavigator) MoveToPrevious() bool

func (*NodeNavigator) MoveToRoot ¶

func (x *NodeNavigator) MoveToRoot()

func (*NodeNavigator) NamespaceURL ¶

func (x *NodeNavigator) NamespaceURL() string

func (*NodeNavigator) NodeType ¶

func (x *NodeNavigator) NodeType() xpath.NodeType

func (*NodeNavigator) Prefix ¶

func (x *NodeNavigator) Prefix() string

func (*NodeNavigator) String ¶

func (x *NodeNavigator) String() string

func (*NodeNavigator) Value ¶

func (x *NodeNavigator) Value() string

type NodeType ¶

type NodeType uint

A NodeType is the type of a Node.

const (
	// DocumentNode is a document object that, as the root of the document tree,
	// provides access to the entire XML document.
	DocumentNode NodeType = iota
	// DeclarationNode is the document type declaration, indicated by the
	// following tag (for example, <!DOCTYPE...> ).
	DeclarationNode
	// ElementNode is an element (for example, <item> ).
	ElementNode
	// TextNode is the text content of a node.
	TextNode
	// CharDataNode is a CDATA section (for example, <![CDATA[content]]> ).
	CharDataNode
	// CommentNode is a comment (for example, <!-- my comment --> ).
	CommentNode
	// AttributeNode is an attribute of an element.
	AttributeNode
)

type ParserOptions ¶

type ParserOptions struct {
	Decoder *DecoderOptions
}

type StreamParser ¶

type StreamParser struct {
	// contains filtered or unexported fields
}

StreamParser enables loading and parsing an XML document in a streaming fashion.

func CreateStreamParser ¶

func CreateStreamParser(r io.Reader, streamElementXPath string, streamElementFilter ...string) (*StreamParser, error)

CreateStreamParser creates a StreamParser. Argument streamElementXPath is required. Argument streamElementFilter is optional and should only be used in advanced scenarios.

Scenario 1: simple case:

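xml := `<AAA><BBB>b1</BBB><BBB>b2</BBB></AAA>`
sp, err := CreateStreamParser(strings.NewReader(xml), "/AAA/BBB")
if err != nil {
    panic(err)
}
for {
    // Each call to Read yields the next matching <BBB> element.
    n, err := sp.Read()
    if err == io.EOF {
        break // no more matching elements
    }
    if err != nil {
        panic(err) // XML parsing error; Read must not be called again
    }
    fmt.Println(n.OutputXML(true))
}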

Output will be:

<BBB>b1</BBB>
<BBB>b2</BBB>

Scenario 2: advanced case:

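xml := `<AAA><BBB>b1</BBB><BBB>b2</BBB></AAA>`
// The third argument filters the target elements: only <BBB> nodes whose
// text is not 'b1' are returned.
sp, err := CreateStreamParser(strings.NewReader(xml), "/AAA/BBB", "/AAA/BBB[. != 'b1']")
if err != nil {
    panic(err)
}
for {
    n, err := sp.Read()
    if err == io.EOF {
        break // no more matching elements
    }
    if err != nil {
        panic(err) // XML parsing error; Read must not be called again
    }
    fmt.Println(n.OutputXML(true))
}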

Output will be:

<BBB>b2</BBB>

As the argument names indicate, streamElementXPath should be an XPath query that points only to the target element node, with no extra filtering on the element itself or its children. streamElementFilter, if needed, provides additional filtering on the target element and its children.

CreateStreamParser returns an error if streamElementXPath, or streamElementFilter when provided, cannot be parsed and compiled into a valid XPath query.

func CreateStreamParserWithOptions ¶

func CreateStreamParserWithOptions(
	r io.Reader,
	options ParserOptions,
	streamElementXPath string,
	streamElementFilter ...string,
) (*StreamParser, error)

CreateStreamParserWithOptions is like CreateStreamParser, but with custom options.

func (*StreamParser) Read ¶

func (sp *StreamParser) Read() (*Node, error)

Read returns a target node that satisfies the XPath specified by the caller at StreamParser creation time. If there are no more matching target nodes after reading the rest of the XML document, io.EOF is returned. Any XML parsing error encountered is returned as soon as it occurs, and stream parsing stops. Calling Read() after an error has been returned (including io.EOF) results in undefined behavior. Also note that, due to the streaming nature, calling Read() automatically removes any previous target node(s) from the document tree.
