htmlparsing

Version: v0.0.0-...-fdc9217
Published: Dec 10, 2016
License: MIT
Imports: 14
Imported by: 0

README

htmlparsing

A convenience wrapper around the gokogiri library for fetching and parsing HTML pages.

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

func BreakSimpleCaptcha

func BreakSimpleCaptcha(image io.Reader) (string, error)

BreakSimpleCaptcha runs the tesseract OCR command on image and returns the text it recognises in a simple captcha.

func DumpHTML

func DumpHTML(node xml.Node, filename string)

DumpHTML writes an HTML node to the file named filename. It panics on error.

func First

func First(node xml.Node, expression string) (xml.Node, error)

First returns the first child node of node that matches expression.

func URLValues

func URLValues(parameters map[string]string) url.Values

URLValues converts a string map to url.Values suitable for form submission.
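As a rough illustration of the conversion described above, here is a minimal sketch using only the standard library. The function `toURLValues` is a hypothetical stand-in written for this example; the package's own URLValues may differ in detail.

```go
package main

import (
	"fmt"
	"net/url"
)

// toURLValues is a hypothetical stand-in for URLValues. It copies every
// key/value pair of a string map into a url.Values, one value per key.
func toURLValues(parameters map[string]string) url.Values {
	values := url.Values{}
	for key, value := range parameters {
		values.Set(key, value)
	}
	return values
}

func main() {
	form := toURLValues(map[string]string{"user": "alice", "lang": "en"})
	// Encode sorts by key, so the result does not depend on map order.
	fmt.Println(form.Encode()) // prints "lang=en&user=alice"
}
```

The resulting url.Values can be passed as the formData argument of a POST request.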

Types

type Client

type Client struct {
	pester.Client
	// contains filtered or unexported fields
}

Client wraps an HTTP client with error handling and retries.

func NewClient

func NewClient(settings *Settings) *Client

NewClient initialises a client from the specified settings.

func NewCookiedClient

func NewCookiedClient(settings *Settings) (*Client, error)

NewCookiedClient initialises a client with a cookie jar, so that cookies are stored between requests.

func (*Client) OpenPage

func (client *Client) OpenPage(
	url string, formData url.Values,
) ([]byte, error)

OpenPage reads the web page at the given URL and returns its body. It performs a GET request if formData is nil, and a POST request otherwise.

func (*Client) ParsePage

func (client *Client) ParsePage(
	url string, formData url.Values,
) (*htmlParser.HtmlDocument, error)

ParsePage parses the HTML page at the given URL. It performs a GET request if formData is nil, and a POST request otherwise.

func (*Client) ParsePageWithEncoding

func (client *Client) ParsePageWithEncoding(
	url string, formData url.Values, encoding []byte,
) (*htmlParser.HtmlDocument, error)

ParsePageWithEncoding parses the HTML page at the given URL, using the specified encoding to decode the page. It performs a GET request if formData is nil, and a POST request otherwise.

type Settings

type Settings struct {
	Transport                http.RoundTripper
	Timeout                  time.Duration
	MaxHttpRetries           int
	MaxServerErrorRetries    int
	HttpRetryInterval        time.Duration
	ServerErrorRetryInterval time.Duration
	Encoding                 []byte
}

Settings contains settings for making HTTP connections.

func SensibleSettings

func SensibleSettings() *Settings

SensibleSettings returns a Settings object initialised with sensible defaults.
