htmlnode

package module
v0.0.0-...-2915b32 Latest Latest
Published: Sep 26, 2015 License: GPL-3.0 Imports: 5 Imported by: 0

README

Htmlnode

Package htmlnode provides functions for searching, traversing and printing parsed HTML. It is based on the html.Node data type from the golang.org/x/net/html package.

Documentation at https://godoc.org/xi2.org/x/htmlnode.

Download and install with go get xi2.org/x/htmlnode.

Documentation

Overview

Package htmlnode provides functions for searching, traversing and printing parsed HTML. It is based on the html.Node data type from the golang.org/x/net/html package.

Note: The API is presently experimental and may change.

Example

The most useful function in the package is probably Find, which is best illustrated with an example.

Suppose we have a parsed HTML tree as follows:

    R
      E html
        E head
        E body
          E div id="lowframe" style="position: fixed;" ...
          C  #lowframe
          E div id="topbar"
            E div class="container"
              E div class="top-heading" id="heading-wide"
                E a href="/"
                  T The Go Programming Language
              E div class="top-heading" id="heading-narrow"
                E a href="/"
(1)               T Go
              E a href="#" id="menu-button"
                E span id="menu-button-arrow"
                  T ▽
              E form method="GET" action="/search"
                E div id="menu"
(2)               E a href="/doc/"
                    T Documents
(3)               E a href="/pkg/"
                    T Packages
(4)               E a href="/project/"
                    T The Project
(5)               E a href="/help/"
                    T Help
(6)               E a href="/blog/"
                    T Blog
(7)               E a href="http://play.golang.org/" ...
                    T Play
                  E input type="text" id="search" name="q" ...

This is a section of the golang.org front page, printed with this package's Print function. Some of the nodes are numbered so that they can be referred to below.

The following call to Find will return the nodes numbered (2) through (7) in a slice:

Find(root, `<form><div><a>`)

This is because the paths traced from the root down to each of these nodes end with the three element nodes <form>, <div> and <a>. It does not matter that the <div> specified is missing the id="menu" attribute. All that matters is that its attributes (none) are a subset of those in the tree. If, however, we were to use:

Find(root, `<form><div id="someotherid"><a>`)

we would get no results since the id="someotherid" does not match in the tree.

As another example, calling:

Find(root, `<a href=/>Go`)

returns node (1), so you can pick out non-element nodes too.

A note on fragments

The fragment passed to Find has to parse in the context of having a generic element node as its parent. So it is fine to call:

Find(root, `<table><tr><td>`)

on some document, but if you were to instead use:

Find(root, `<tr><td>`)

you would get an empty slice, since the fragment `<tr><td>` is not valid directly under an arbitrary element node and will not parse. However, it should always be possible to specify more parent nodes in the fragment, even if they are not within the tree being searched.

To illustrate this, suppose you have a subtree that looks like this:

    E tr
      E td
      E td
      E td
      E td

and you want to call Find to get the <td> elements. You cannot use

Find(subtree, `<td>`)

but it is still OK to use

Find(subtree, `<table><tr><td>`)

even though there is no <table> in subtree. The matcher will look in subtree's parents.

Functions

func Attr

func Attr(n *html.Node, key string) (string, bool)

Attr returns the Val field of the first attribute in n.Attr whose Key field is equal to key. The second return value indicates whether the node has such an attribute. If no such attribute exists, Attr returns ("", false). Note that the Namespace fields of n.Attr are not compared.

func AttrNS

func AttrNS(n *html.Node, namespace, key string) (string, bool)

AttrNS is like Attr but additionally compares the Namespace fields in the attributes.
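
The difference between Attr and AttrNS described above can be sketched as follows. This is only an illustration of the documented behaviour, not the package's source: the node and attribute types and the lower-case attr and attrNS functions are hypothetical stand-ins for html.Node, html.Attribute and the real functions, so that the example runs with the standard library alone.

```go
package main

import "fmt"

// attribute and node are hypothetical stand-ins for html.Attribute and
// html.Node, reduced to the fields this sketch needs.
type attribute struct {
	Namespace, Key, Val string
}

type node struct {
	Attr []attribute
}

// attr returns the value of the first attribute whose Key equals key,
// ignoring Namespace, and reports whether such an attribute was found.
func attr(n *node, key string) (string, bool) {
	for _, a := range n.Attr {
		if a.Key == key {
			return a.Val, true
		}
	}
	return "", false
}

// attrNS is like attr but also requires the Namespace to be equal.
func attrNS(n *node, namespace, key string) (string, bool) {
	for _, a := range n.Attr {
		if a.Namespace == namespace && a.Key == key {
			return a.Val, true
		}
	}
	return "", false
}

func main() {
	n := &node{Attr: []attribute{
		{Key: "href", Val: "/doc/"},
		{Namespace: "xlink", Key: "href", Val: "/other/"},
	}}

	v, ok := attr(n, "href") // first match wins, namespace ignored
	fmt.Printf("%q %v\n", v, ok)

	v, ok = attrNS(n, "xlink", "href") // namespace must match too
	fmt.Printf("%q %v\n", v, ok)

	v, ok = attr(n, "id") // no such attribute
	fmt.Printf("%q %v\n", v, ok)
}
```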

func Compare

func Compare(n1, n2 *html.Node) bool

Compare returns true if node n1 has the same Type, Data and Namespace fields as n2, and if the attributes of n2 are equal to or are a subset of the attributes of n1.
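
The subset rule is easy to get backwards, so here is a minimal sketch of it. The node and attribute types, the elementNode constant and the compare function are hypothetical stand-ins written for this example, and the sketch assumes that two attributes are equal when their Namespace, Key and Val all agree, which the text above does not spell out.

```go
package main

import "fmt"

// elementNode is a hypothetical stand-in for html.ElementNode.
const elementNode = 1

// attribute and node stand in for html.Attribute and html.Node.
type attribute struct {
	Namespace, Key, Val string
}

type node struct {
	Type      int
	Data      string
	Namespace string
	Attr      []attribute
}

// compare reports whether n1 and n2 agree on Type, Data and Namespace and
// every attribute of n2 also appears on n1.
func compare(n1, n2 *node) bool {
	if n1.Type != n2.Type || n1.Data != n2.Data || n1.Namespace != n2.Namespace {
		return false
	}
	for _, want := range n2.Attr {
		found := false
		for _, have := range n1.Attr {
			if have == want { // assumption: all three fields must be equal
				found = true
				break
			}
		}
		if !found {
			return false
		}
	}
	return true
}

func main() {
	inTree := &node{Type: elementNode, Data: "div", Attr: []attribute{{Key: "id", Val: "menu"}}}
	bare := &node{Type: elementNode, Data: "div"}
	other := &node{Type: elementNode, Data: "div", Attr: []attribute{{Key: "id", Val: "someotherid"}}}

	fmt.Println(compare(inTree, bare))  // true: no attributes is a subset of anything
	fmt.Println(compare(inTree, other)) // false: id="someotherid" is not on inTree
	fmt.Println(compare(bare, inTree))  // false: the relation is not symmetric
}
```

Note that the argument order matters: the node from the tree goes first and the pattern node goes second.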

func Find

func Find(root *html.Node, fragment string) []*html.Node

Find locates the nodes within root that match fragment. It first converts fragment into a leaf node (call it n2) using the Leaf function. It then does a depth-first search of root and returns the slice of all nodes n in root which satisfy Match(n, n2). If there are no such nodes it returns the empty slice.

Please note that fragment must parse in the context of having a generic element node as its parent, since it is passed to Leaf. See "A note on fragments" in the introduction for more details.

func Flatten

func Flatten(root *html.Node) string

Flatten walks the tree under root finding all html.TextNodes and returns the string resulting from appending all their Data fields.
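
A minimal sketch of this kind of text extraction is shown below. The node type, the node-type constants and the flatten function are hypothetical stand-ins for html.Node, html.NodeType and Flatten. The sketch also assumes that root itself is included if it happens to be a text node, which the description above leaves open.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical stand-ins for html.TextNode and html.ElementNode.
const (
	textNode = iota
	elementNode
)

// node stands in for html.Node, reduced to the fields used here.
type node struct {
	Type                    int
	Data                    string
	FirstChild, NextSibling *node
}

// flatten concatenates the Data of every text node at or below root,
// in document order.
func flatten(root *node) string {
	var sb strings.Builder
	var walk func(n *node)
	walk = func(n *node) {
		if n.Type == textNode {
			sb.WriteString(n.Data)
		}
		for c := n.FirstChild; c != nil; c = c.NextSibling {
			walk(c)
		}
	}
	walk(root)
	return sb.String()
}

func main() {
	// <div>Go<a> Playground</a></div>
	link := &node{Type: elementNode, Data: "a",
		FirstChild: &node{Type: textNode, Data: " Playground"}}
	root := &node{Type: elementNode, Data: "div",
		FirstChild: &node{Type: textNode, Data: "Go", NextSibling: link}}

	fmt.Println(flatten(root)) // Go Playground
}
```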

func Leaf

func Leaf(fragment string) *html.Node

Leaf converts an HTML fragment into a parse tree (without html/head/body ElementNodes or DoctypeNode), and then from the root of this tree repeatedly follows FirstChild until it finds a leaf node. This leaf node is returned as its result. In order to parse fragment, Leaf calls html.ParseFragment with a context of html.Node{Type: html.ElementNode}. If there is an error parsing fragment or no nodes are returned then Leaf returns a node of type html.ErrorNode. The return value of Leaf is intended to be passed to Match as its second argument.

func Match

func Match(n1 *html.Node, n2 *html.Node) bool

Match compares the slice of nodes obtained by tracing n1's root node down to n1 with the equivalent slice obtained by tracing n2's root down to n2. Call these slices ns1 and ns2. If the tail of ns1 matches ns2 with respect to Compare then Match returns true.
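
The tail comparison can be sketched as below. Everything here is a hypothetical stand-in written for illustration: the node type keeps attributes in a map for brevity, compare checks only Data and the attribute subset, and path, chain and match are helper names invented for this example. It assumes ns2 consists of exactly the nodes of the parsed fragment.

```go
package main

import "fmt"

// node is a simplified stand-in for html.Node.
type node struct {
	Data   string
	Attr   map[string]string
	Parent *node
}

// path returns the nodes from n's root down to n, root first.
func path(n *node) []*node {
	var p []*node
	for ; n != nil; n = n.Parent {
		p = append(p, n)
	}
	for i, j := 0, len(p)-1; i < j; i, j = i+1, j-1 {
		p[i], p[j] = p[j], p[i]
	}
	return p
}

// compare is a reduced version of the Compare rule: same Data, and every
// attribute of n2 present on n1 with the same value.
func compare(n1, n2 *node) bool {
	if n1.Data != n2.Data {
		return false
	}
	for k, v := range n2.Attr {
		if got, ok := n1.Attr[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// match reports whether the tail of n1's path matches n2's whole path.
func match(n1, n2 *node) bool {
	ns1, ns2 := path(n1), path(n2)
	if len(ns2) > len(ns1) {
		return false
	}
	tail := ns1[len(ns1)-len(ns2):]
	for i := range ns2 {
		if !compare(tail[i], ns2[i]) {
			return false
		}
	}
	return true
}

// chain links the given nodes as parent -> child and returns the last one.
func chain(nodes ...*node) *node {
	for i := 1; i < len(nodes); i++ {
		nodes[i].Parent = nodes[i-1]
	}
	return nodes[len(nodes)-1]
}

func main() {
	inTree := chain(
		&node{Data: "html"}, &node{Data: "body"}, &node{Data: "form"},
		&node{Data: "div", Attr: map[string]string{"id": "menu"}},
		&node{Data: "a", Attr: map[string]string{"href": "/doc/"}},
	)
	pattern := chain(&node{Data: "form"}, &node{Data: "div"}, &node{Data: "a"})
	wrongID := chain(&node{Data: "form"},
		&node{Data: "div", Attr: map[string]string{"id": "someotherid"}},
		&node{Data: "a"})

	fmt.Println(match(inTree, pattern)) // true
	fmt.Println(match(inTree, wrongID)) // false
}
```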

func Next

func Next(n *html.Node, root *html.Node) (*html.Node, int)

Next returns the next node in a depth-first traversal of the tree at root (where the current node is node n), together with a delta indicating by how much it has descended or ascended the tree (descending being positive). When there are no more nodes it returns nil. If a value of nil is supplied for root, the root node is assumed to be the first node encountered with Parent == nil.
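
One way such a traversal step could work is sketched below, together with a loop that uses the delta to track indentation. The node type and the next and add functions are hypothetical stand-ins for this example, not the package's code. In particular, the delta of 0 returned alongside nil at the end of the traversal is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// node is a stand-in for html.Node with only the links needed here.
type node struct {
	Data                            string
	Parent, FirstChild, NextSibling *node
}

// next returns the node after n in a pre-order walk of the tree at root,
// plus the change in depth. A nil root means "walk up until Parent is nil".
func next(n, root *node) (*node, int) {
	if n.FirstChild != nil {
		return n.FirstChild, 1
	}
	delta := 0
	for n != nil && n != root {
		if n.NextSibling != nil {
			return n.NextSibling, delta
		}
		n = n.Parent
		delta--
	}
	return nil, 0 // assumption: delta is 0 when the walk is finished
}

// add attaches children to parent in order and returns parent.
func add(parent *node, children ...*node) *node {
	var prev *node
	for _, c := range children {
		c.Parent = parent
		if prev == nil {
			parent.FirstChild = c
		} else {
			prev.NextSibling = c
		}
		prev = c
	}
	return parent
}

func main() {
	// <div><a>Go</a><span></span></div>
	a := add(&node{Data: "a"}, &node{Data: "Go"})
	div := add(&node{Data: "div"}, a, &node{Data: "span"})

	// Prints div, then a and span indented once, with Go indented twice.
	depth := 0
	for n, d := div, 0; n != nil; n, d = next(n, div) {
		depth += d
		fmt.Printf("%s%s\n", strings.Repeat("  ", depth), n.Data)
	}
}
```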

func NextSibElt

func NextSibElt(n *html.Node) *html.Node

NextSibElt returns the next sibling of node n that has type html.ElementNode, or nil if there is no such sibling.
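
The practical point is that text and comment nodes between elements are skipped. A minimal sketch of that behaviour follows, using a hypothetical node type and a hypothetical nextSibElt function in place of html.Node and NextSibElt.

```go
package main

import "fmt"

// Hypothetical stand-ins for html.TextNode and html.ElementNode.
const (
	textNode = iota
	elementNode
)

// node stands in for html.Node, reduced to the fields used here.
type node struct {
	Type        int
	Data        string
	NextSibling *node
}

// nextSibElt returns the first following sibling that is an element,
// or nil if there is none.
func nextSibElt(n *node) *node {
	for s := n.NextSibling; s != nil; s = s.NextSibling {
		if s.Type == elementNode {
			return s
		}
	}
	return nil
}

func main() {
	// Siblings: <a>, whitespace text, <input>
	input := &node{Type: elementNode, Data: "input"}
	space := &node{Type: textNode, Data: "\n", NextSibling: input}
	link := &node{Type: elementNode, Data: "a", NextSibling: space}

	fmt.Println(nextSibElt(link).Data)       // input
	fmt.Println(nextSibElt(input) == nil)    // true
}
```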

func Prev

func Prev(n *html.Node, root *html.Node) (*html.Node, int)

Prev behaves like Next, but returns the previous node instead.

func PrevSibElt

func PrevSibElt(n *html.Node) *html.Node

PrevSibElt behaves like NextSibElt, but returns the previous sibling html.ElementNode instead.

func Print

func Print(root *html.Node) error

Print calls PrintTree, using os.Stdout as the io.Writer and with colour set to true.

func PrintTree

func PrintTree(w io.Writer, root *html.Node, colour bool) error

PrintTree prints the tree at root to the supplied io.Writer using String to print the nodes. It uses indentation to convey the document structure. Like String, it can optionally colourize the output. It skips printing whitespace-only nodes of type html.TextNode.

PrintTree returns any error it gets when calling fmt.Fprintf.

func String

func String(n *html.Node, colour bool) string

String returns a human-readable representation of the single node n, with optional terminal colouring using ANSI escape codes. The representation begins with a capital letter indicating the html.NodeType. The letter is one of: X - ErrorNode, T - TextNode, R - DocumentNode, E - ElementNode, C - CommentNode, D - DoctypeNode.
