scrape: github.com/yhat/scrape

package scrape

import "github.com/yhat/scrape"

Package scrape provides a searching API on top of golang.org/x/net/html.

Package Files

scrape.go

func Attr

func Attr(node *html.Node, key string) string

Attr returns the value of an HTML attribute.
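
A minimal sketch, in the style of the examples below (resp is assumed to be an *http.Response): find the first anchor element and read its href attribute.

root, err := html.Parse(resp.Body)
if err != nil {
    // handle error
}
// find the first <a> element and read its href attribute
if a, ok := scrape.Find(root, scrape.ByTag(atom.A)); ok {
    fmt.Println(scrape.Attr(a, "href"))
}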

func Find

func Find(node *html.Node, matcher Matcher) (n *html.Node, ok bool)

Find returns the first node which matches the matcher using depth-first search. If no node is found, ok will be false.

// resp is an *http.Response, e.g. from http.Get
root, err := html.Parse(resp.Body)
if err != nil {
    // handle error
}
// match the <body> element
matcher := func(n *html.Node) bool {
    return n.DataAtom == atom.Body
}
body, ok := scrape.Find(root, matcher)

func FindAll

func FindAll(node *html.Node, matcher Matcher) []*html.Node

FindAll returns all nodes which match the provided Matcher. After discovering a matching node, it will _not_ discover matching subnodes of that node.
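
A sketch, assuming resp is an *http.Response as in the other examples: collect every paragraph in the document and print its text.

root, err := html.Parse(resp.Body)
if err != nil {
    // handle error
}
// every <p> node in the document; <p> descendants of a matching node are skipped
for _, p := range scrape.FindAll(root, scrape.ByTag(atom.P)) {
    fmt.Println(scrape.Text(p))
}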

func FindAllNested

func FindAllNested(node *html.Node, matcher Matcher) []*html.Node

FindAllNested returns all nodes which match the provided Matcher and _will_ discover matching subnodes of matching nodes.
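
The difference from FindAll only shows up when matching nodes contain other matching nodes. A sketch, assuming a parsed root as above: match every <div>, including divs nested inside other divs.

// unlike FindAll, <div> elements nested inside another matching <div> are also returned
divs := scrape.FindAllNested(root, scrape.ByTag(atom.Div))
fmt.Println(len(divs))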

func FindNextSibling

func FindNextSibling(node *html.Node, matcher Matcher) (n *html.Node, ok bool)

FindNextSibling returns the first node which matches the matcher using next sibling search. If no node is found, ok will be false.

root, err := html.Parse(resp.Body)
if err != nil {
    // handle error
}
// <head> and <body> are siblings, so start the sibling search from <head>
head, _ := scrape.Find(root, scrape.ByTag(atom.Head))
matcher := func(n *html.Node) bool {
    return n.DataAtom == atom.Body
}
body, ok := scrape.FindNextSibling(head, matcher)

func FindParent

func FindParent(node *html.Node, matcher Matcher) (n *html.Node, ok bool)

FindParent searches up the HTML tree from the current node until either a match is found or the top is hit.
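
A sketch, assuming a parsed root as above: climb from a table cell back up to its enclosing table.

// start at a <td> cell and walk up its ancestors until the enclosing <table> is found
if td, ok := scrape.Find(root, scrape.ByTag(atom.Td)); ok {
    if table, ok := scrape.FindParent(td, scrape.ByTag(atom.Table)); ok {
        fmt.Println(scrape.Text(table))
    }
}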

func FindPrevSibling

func FindPrevSibling(node *html.Node, matcher Matcher) (n *html.Node, ok bool)

FindPrevSibling returns the first node which matches the matcher using previous sibling search. If no node is found, ok will be false.

root, err := html.Parse(resp.Body)
if err != nil {
    // handle error
}
// <head> precedes <body>, so search backwards from the <body> node
body, _ := scrape.Find(root, scrape.ByTag(atom.Body))
matcher := func(n *html.Node) bool {
    return n.DataAtom == atom.Head
}
head, ok := scrape.FindPrevSibling(body, matcher)

func Text

func Text(node *html.Node) string

Text returns text from all descendant text nodes joined. For control over the join function, see TextJoin.
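
A sketch, assuming resp is an *http.Response as in the examples above: print all visible text under <body> as one string.

root, err := html.Parse(resp.Body)
if err != nil {
    // handle error
}
// all descendant text nodes under <body>, joined into a single string
if body, ok := scrape.Find(root, scrape.ByTag(atom.Body)); ok {
    fmt.Println(scrape.Text(body))
}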

func TextJoin

func TextJoin(node *html.Node, join func([]string) string) string

TextJoin returns a string from all descendant text nodes joined by a caller-provided join function.
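
A sketch, assuming a parsed root as above: join the text nodes under <body> with newlines instead of the default join.

// join the descendant text nodes with newlines rather than the default join
if body, ok := scrape.Find(root, scrape.ByTag(atom.Body)); ok {
    text := scrape.TextJoin(body, func(parts []string) string {
        return strings.Join(parts, "\n")
    })
    fmt.Println(text)
}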

type Matcher

type Matcher func(node *html.Node) bool

Matcher should return true when a desired node is found.
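
Matchers are plain functions, so they compose with ordinary Go code. A sketch of a custom matcher combining a tag check with an attribute check; it assumes Attr returns "" when the attribute is missing.

// a custom Matcher: <img> nodes whose alt attribute is non-empty
imgWithAlt := func(n *html.Node) bool {
    return n.DataAtom == atom.Img && scrape.Attr(n, "alt") != ""
}
images := scrape.FindAll(root, imgWithAlt)
fmt.Println(len(images))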

func ByClass

func ByClass(class string) Matcher

ByClass returns a Matcher which matches all nodes with the provided class.
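
A sketch, assuming a parsed root as above and a hypothetical class name "article":

// every node whose class attribute includes the (hypothetical) class "article"
for _, article := range scrape.FindAll(root, scrape.ByClass("article")) {
    fmt.Println(scrape.Text(article))
}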

func ById

func ById(id string) Matcher

ById returns a Matcher which matches all nodes with the provided id.
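
A sketch, assuming a parsed root as above and a hypothetical id "content":

// the first node with the (hypothetical) id "content"
if content, ok := scrape.Find(root, scrape.ById("content")); ok {
    fmt.Println(scrape.Text(content))
}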

func ByTag

func ByTag(a atom.Atom) Matcher

ByTag returns a Matcher which matches all nodes of the provided tag type.

root, err := html.Parse(resp.Body)
if err != nil {
    // handle error
}
title, ok := scrape.Find(root, scrape.ByTag(atom.Title))

Directories

Path        Synopsis
example

Package scrape imports 3 packages and is imported by 24 packages. Updated 2017-07-10.