soup

Version: v0.1.0
Published: Oct 13, 2023 | License: MIT | Imports: 4 | Imported by: 0

README

Soup

Soup is a tiny library for working with HTML. It provides a simple API for extracting data.

Quick Start

Install the package.

go get github.com/cfichtmueller/soup

Extract some data.

// Load a web page
res, err := http.Get("https://example.com")
if err != nil {
	return err
}
// Close the body once parsing is done
defer res.Body.Close()

// Parse the page. Parse takes an io.Reader, so pass the response body.
p, err := soup.Parse(res.Body)
if err != nil {
	return err
}

// Extract data from the page
products := p.AllWithClassNameR("product")
for _, product := range products {
	// FirstWithTag only looks at direct children; skip products without a link
	link := product.FirstWithTag("a")
	if link != nil {
		name := link.TextContent()
		url := link.Attr("href")
		fmt.Println(name, ":", url)
	}
}

The name is inspired by jsoup.

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

func AllWithClassName

func AllWithClassName(node *html.Node, className string) []*html.Node

AllWithClassName returns all children that have the given class name.

func AllWithClassNameR

func AllWithClassNameR(node *html.Node, className string) []*html.Node

AllWithClassNameR is the recursive variant of AllWithClassName.

func AllWithTag

func AllWithTag(node *html.Node, tagName string) []*html.Node

AllWithTag returns all children with the given tag.

func AllWithTagR

func AllWithTagR(node *html.Node, tagName string) []*html.Node

AllWithTagR is the recursive variant of AllWithTag.
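
The difference between a plain lookup and its R variant can be sketched without the library. The example below is an illustration only: the `node` type and the `allWithTag` and `allWithTagR` helpers are hypothetical stand-ins, not soup's implementation. It assumes that the non-recursive form inspects direct children only, and that the recursive form visits every descendant depth-first in document order. The documentation above does not state the traversal order.

```go
package main

import "fmt"

// node is a hypothetical stand-in for an HTML element; it is not soup's Node.
type node struct {
	tag      string
	children []*node
}

// allWithTag checks only the direct children of n.
func allWithTag(n *node, tag string) []*node {
	var out []*node
	for _, c := range n.children {
		if c.tag == tag {
			out = append(out, c)
		}
	}
	return out
}

// allWithTagR visits every descendant of n depth-first (assumed order).
func allWithTagR(n *node, tag string) []*node {
	var out []*node
	for _, c := range n.children {
		if c.tag == tag {
			out = append(out, c)
		}
		out = append(out, allWithTagR(c, tag)...)
	}
	return out
}

func main() {
	// Models <ul><li><a></a></li><a></a></ul>
	root := &node{tag: "ul", children: []*node{
		{tag: "li", children: []*node{{tag: "a"}}},
		{tag: "a"},
	}}
	fmt.Println(len(allWithTag(root, "a")))  // 1: only the direct child
	fmt.Println(len(allWithTagR(root, "a"))) // 2: the nested link is found too
}
```

Under these assumptions, a lookup that unexpectedly returns nothing on a real page is often a sign that the target sits deeper in the tree and the R variant is the one to call.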

func Attr

func Attr(node *html.Node, attr string) string

Attr returns the attribute value or an empty string if the attribute isn't found.
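
One consequence of this contract is that a missing attribute and an attribute with an empty value look the same to the caller. The sketch below shows this with a hypothetical `attribute` type and `attr` helper that mirror the documented behavior; neither is part of soup.

```go
package main

import "fmt"

// attribute is a hypothetical key/value pair standing in for an HTML attribute.
type attribute struct{ key, val string }

// attr mirrors the documented contract: the value, or "" when the key is absent.
func attr(attrs []attribute, key string) string {
	for _, a := range attrs {
		if a.key == key {
			return a.val
		}
	}
	return ""
}

func main() {
	// Models <a href="" class="product">
	attrs := []attribute{{"href", ""}, {"class", "product"}}
	fmt.Printf("%q\n", attr(attrs, "class")) // "product"
	fmt.Printf("%q\n", attr(attrs, "href"))  // "": present but empty
	fmt.Printf("%q\n", attr(attrs, "title")) // "": not present at all
}
```

If the distinction matters, callers of the package-level functions already hold an `*html.Node` and could inspect its `Attr` slice directly. Whether that is worth the extra code depends on the page being scraped.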

func FirstWithClassName

func FirstWithClassName(node *html.Node, className string) *html.Node

FirstWithClassName returns the first child with the given class.

func FirstWithClassNameR

func FirstWithClassNameR(node *html.Node, className string) *html.Node

FirstWithClassNameR is the recursive variant of FirstWithClassName.

func FirstWithId

func FirstWithId(node *html.Node, id string) *html.Node

FirstWithId returns the first child with the given id.

func FirstWithIdR

func FirstWithIdR(node *html.Node, id string) *html.Node

FirstWithIdR is the recursive variant of FirstWithId.

func FirstWithTag

func FirstWithTag(node *html.Node, tagName string) *html.Node

FirstWithTag returns the first child with the given tag name.

func FirstWithTagR

func FirstWithTagR(node *html.Node, tagName string) *html.Node

FirstWithTagR is the recursive variant of FirstWithTag.

func HasClass

func HasClass(node *html.Node, className string) bool

HasClass returns true if the node has the given class.

func SelectAll

func SelectAll(node *html.Node, selector Selector) []*html.Node

SelectAll selects all child nodes that match the given Selector.

func SelectFirst

func SelectFirst(node *html.Node, selector Selector) *html.Node

SelectFirst selects the first child node that matches the given selector.

func TextContent

func TextContent(node *html.Node) string

TextContent returns the text content of the node.

Types

type Node

type Node struct {
	// contains filtered or unexported fields
}

func Parse

func Parse(r io.Reader) (*Node, error)

Parse parses a node from a reader.

func (*Node) AllWithClassName

func (n *Node) AllWithClassName(className string) []*Node

AllWithClassName returns all child nodes that have the given class name.

func (*Node) AllWithClassNameR

func (n *Node) AllWithClassNameR(className string) []*Node

AllWithClassNameR is the recursive variant of AllWithClassName.

func (*Node) AllWithTag

func (n *Node) AllWithTag(tagName string) []*Node

AllWithTag returns all child nodes with the given tag.

func (*Node) AllWithTagR

func (n *Node) AllWithTagR(tagName string) []*Node

AllWithTagR is the recursive variant of AllWithTag.

func (*Node) Attr

func (n *Node) Attr(attr string) string

Attr returns the attribute value or an empty string if the attribute isn't found.

func (*Node) FirstWithClassName

func (n *Node) FirstWithClassName(className string) *Node

FirstWithClassName returns the first child with the given class.

func (*Node) FirstWithClassNameR

func (n *Node) FirstWithClassNameR(className string) *Node

FirstWithClassNameR is the recursive variant of FirstWithClassName.

func (*Node) FirstWithId

func (n *Node) FirstWithId(id string) *Node

FirstWithId returns the first child with the given id.

func (*Node) FirstWithIdR

func (n *Node) FirstWithIdR(id string) *Node

FirstWithIdR is the recursive variant of FirstWithId.

func (*Node) FirstWithTag

func (n *Node) FirstWithTag(tag string) *Node

FirstWithTag returns the first child node with the given tag.

func (*Node) FirstWithTagR

func (n *Node) FirstWithTagR(tag string) *Node

FirstWithTagR is the recursive variant of FirstWithTag.

func (*Node) HasClass

func (n *Node) HasClass(className string) bool

HasClass returns true if the node has the given class.
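
An HTML class attribute holds a whitespace-separated list of names, so a class check is a token match rather than a substring match. The sketch below shows that idea with a hypothetical `hasClass` helper that works on the raw attribute string. It is an assumption about how such a check is typically written, not a copy of soup's code.

```go
package main

import (
	"fmt"
	"strings"
)

// hasClass reports whether className is one of the whitespace-separated
// tokens in the value of a class attribute.
func hasClass(classAttr, className string) bool {
	for _, c := range strings.Fields(classAttr) {
		if c == className {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(hasClass("product featured", "product")) // true
	fmt.Println(hasClass("product-list", "product"))     // false: substring only
}
```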

func (*Node) SelectAll

func (n *Node) SelectAll(selector Selector) []*Node

SelectAll selects all child nodes that match the given Selector.

func (*Node) SelectFirst

func (n *Node) SelectFirst(selector Selector) *Node

SelectFirst selects the first child node that matches the given selector.

func (*Node) String

func (n *Node) String() string

func (*Node) TextContent

func (n *Node) TextContent() string

TextContent returns the text content of the node.

type Selector

type Selector struct {
	// Selects an element with a given id. Takes precedence over ClassName
	Id string
	// Selects an element with a given class. Takes precedence over Tag
	ClassName string
	// Selects an element with a given tag
	Tag string
	// Perform a recursive search. That is, include the node's children in the search.
	Recursive bool
}
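
The field comments describe a precedence order: Id, then ClassName, then Tag. The sketch below shows one reading of that order, in which the highest-priority non-empty field is the only one used and the others are ignored. The `selector` type and `criterion` function are hypothetical mirrors written for this example. The documentation does not say whether soup combines fields or how it treats an empty Selector, so the "none" result is a placeholder.

```go
package main

import "fmt"

// selector mirrors the matching fields of soup's Selector, for illustration only.
type selector struct {
	Id        string
	ClassName string
	Tag       string
}

// criterion reports which single field would drive the match, assuming that
// a higher-priority field makes the lower-priority ones irrelevant.
func criterion(s selector) string {
	switch {
	case s.Id != "":
		return "id=" + s.Id
	case s.ClassName != "":
		return "class=" + s.ClassName
	case s.Tag != "":
		return "tag=" + s.Tag
	}
	return "none"
}

func main() {
	fmt.Println(criterion(selector{Id: "main", ClassName: "product", Tag: "a"})) // id=main
	fmt.Println(criterion(selector{ClassName: "product", Tag: "a"}))             // class=product
	fmt.Println(criterion(selector{Tag: "a"}))                                   // tag=a
}
```

If this reading holds, setting more than one of the three fields does not narrow the search. A caller who needs both a tag and a class would filter the results a second time, for example with HasClass.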
