surf

package module
v2.0.0-...-f6105f7 Latest
Warning

This package is not in the latest version of its module.

Published: Mar 20, 2017 License: MIT Imports: 18 Imported by: 0

README

Surf v2.0

Surf is a Go (golang) library that implements a virtual web browser that you control programmatically. Surf isn't just another Go solution for downloading content from the web. Surf is designed to behave like a web browser, and includes: cookie management, history, bookmarking, user agent spoofing (with a nifty user agent builder), submitting forms, DOM selection and traversal via jQuery-style CSS selectors, scraping assets like images and stylesheets, and other features.

Installation

Download the library using go.
go get gopkg.in/headzoo/surf.v2

Import the library into your project.
import "gopkg.in/headzoo/surf.v2"

Quick Start
package main

import (
	"gopkg.in/headzoo/surf.v2"
	"fmt"
)

func main() {
	bow := surf.NewBrowser()
	err := bow.Open("http://golang.org")
	if err != nil {
		panic(err)
	}
	
	// Outputs: "The Go Programming Language"
	fmt.Println(bow.Title())
}
Documentation

Complete documentation is available on Read the Docs.

Credits

Surf was started by Sean Hickey (headzoo) to learn more about the Go programming language. The idea to create Surf was born in this Reddit thread.

Surf uses the awesome goquery by Martin Angers, and was written using Intellij and the golang plugin.

Contributions have been made to Surf by the following awesome developers:

Contributing

Issues and pull requests are always welcome. Code changes are made to the dev branch. Once a milestone has been reached, the branch will be merged into master and a new version tag created. Do not make your changes against the master branch, or they may be ignored.

See CONTRIBUTING.md for more information.

License

Surf is open source software released under The MIT License (MIT). See LICENSE.md for more information.

Documentation

Constants

const (
	OnError    = "error"
	OnLoad     = "load"
	OnUnload   = "unload"
	OnRequest  = "request"
	OnResponse = "response"
)
const (
	Name    = "Surf"
	Version = "2.0"
)
const (
	DefaultSendReferer         = true
	DefaultMetaRefreshHandling = true
	DefaultFollowRedirects     = true
	DefaultNumDownloadWorkers  = 4
)
const (
	HeaderUserAgent   = "User-Agent"
	HeaderHost        = "Host"
	HeaderReferer     = "Referer"
	HeaderContentType = "Content-Type"
)
const (
	MethodGET  = "GET"
	MethodPOST = "POST"
	MethodHEAD = "HEAD"
)
const (
	ContentTypeTextPlain = "text/plain"
	ContentTypeTextHtml  = "text/html"
)

Variables

var (
	// Debugging turns debugging messages on and off.
	Debugging bool = false

	// UserAgent is the User-Agent header value sent with requests.
	UserAgent string = agent.Create(Name, Version)

	// JarState is the current browser state.
	JarState *jar.State = &jar.State{}

	// JarCookies stores cookies for every site visited by the browser.
	JarCookies http.CookieJar = jar.NewMemoryCookies()

	// JarBookmarks stores the saved bookmarks.
	JarBookmarks jar.BookmarksJar = jar.NewMemoryBookmarks()

	// JarHistory stores the visited pages.
	JarHistory jar.History = jar.NewMemoryHistory()

	// Transport specifies the mechanism by which individual HTTP
	// requests are made.
	Transport http.RoundTripper

	// RequestHeaders are additional headers to send with each request.
	RequestHeaders http.Header = jar.NewMemoryHeaders()

	// NumDownloadWorkers is the number of workers to download page assets.
	NumDownloadWorkers int = DefaultNumDownloadWorkers

	// SendReferer instructs a Browser to send the Referer header.
	SendReferer bool = DefaultSendReferer

	// MetaRefreshHandling instructs a Browser to handle the refresh meta tag.
	MetaRefreshHandling bool = DefaultMetaRefreshHandling

	// FollowRedirects instructs a Browser to follow Location headers.
	FollowRedirects bool = DefaultFollowRedirects
)

Functions

func DownloadElement

func DownloadElement(asset DownloadableElement, out io.Writer) (int64, error)

DownloadElement copies a remote file to the given writer.

func DownloadElementAsync

func DownloadElementAsync(asset DownloadableElement, out io.Writer, c AsyncDownloadChannel)

DownloadElementAsync downloads an element asynchronously and notifies the given channel when the download is complete.

Types

type AnchorElement

type AnchorElement struct {
	BaseElement

	// Text is the text appearing between the opening and closing anchor tag.
	Text string
}

AnchorElement stores the properties of a page link.

func NewAnchorElement

func NewAnchorElement(u *gourl.URL, id, text string) *AnchorElement

NewAnchorElement creates and returns a new *AnchorElement instance.

type AsyncDownloadChannel

type AsyncDownloadChannel chan *AsyncDownloadResult

AsyncDownloadChannel is a channel upon which the results of an async download are passed.

type AsyncDownloadResult

type AsyncDownloadResult struct {
	// Element is a pointer to the Downloadable asset that was downloaded.
	Element DownloadableElement

	// Writer where the asset data was written.
	Writer io.Writer

	// Size is the number of bytes written to the io.Writer.
	Size int64

	// Error contains any error that occurred during the download or nil.
	Error error
}

AsyncDownloadResult has the results of an asynchronous download.

type BaseDownloadableElement

type BaseDownloadableElement struct {
	BaseElement
}

BaseDownloadableElement is an element that may be downloaded.

func (*BaseDownloadableElement) Download

func (at *BaseDownloadableElement) Download(out io.Writer) (int64, error)

Download writes the element to the given io.Writer type.

func (*BaseDownloadableElement) DownloadAsync

func (at *BaseDownloadableElement) DownloadAsync(out io.Writer, ch AsyncDownloadChannel)

DownloadAsync downloads the element asynchronously.

type BaseElement

type BaseElement struct {
	// contains filtered or unexported fields
}

BaseElement implements Element.

func (*BaseElement) ID

func (at *BaseElement) ID() string

ID returns the asset ID or an empty string when not available.

func (*BaseElement) Type

func (at *BaseElement) Type() ElementType

Type returns the asset type.

func (*BaseElement) URL

func (at *BaseElement) URL() *gourl.URL

URL returns the asset URL.

type Browser

type Browser struct {
	*EventTarget

	Document  *Document
	Navigator *Navigator
	Location  *gourl.URL
	Headers   http.Header
	Response  *http.Response
	// contains filtered or unexported fields
}

Browser...

func NewBrowser

func NewBrowser() *Browser

NewBrowser returns a new *Browser instance.

func (*Browser) Back

func (b *Browser) Back() bool

Back loads the previously requested page.

func (*Browser) BookmarkOpen

func (b *Browser) BookmarkOpen(name string) error

BookmarkOpen calls SendGET() with the URL for the bookmark with the given name.

func (*Browser) BookmarkSave

func (b *Browser) BookmarkSave(name string) error

BookmarkSave saves the page URL in the bookmarks with the given name.

func (*Browser) History

func (b *Browser) History() jar.History

History returns the browser history. See https://developer.mozilla.org/en-US/docs/Web/API/Window/history

func (*Browser) Reload

func (b *Browser) Reload() error

Reload duplicates the last successful request.

func (*Browser) SavePage

func (b *Browser) SavePage(dir string, perm os.FileMode) (saveFile string, errs []error)

SavePage saves the current page and all assets to the given directory.

func (*Browser) SendFormGET

func (b *Browser) SendFormGET(url string, data gourl.Values) error

SendFormGET appends the data values to the given URL and sends a GET request.

func (*Browser) SendFormPOST

func (b *Browser) SendFormPOST(url string, data gourl.Values) error

SendFormPOST requests the given URL using the POST method with the given data.

func (*Browser) SendGET

func (b *Browser) SendGET(url string) error

SendGET requests the given URL using the GET method.

func (*Browser) SendHEAD

func (b *Browser) SendHEAD(url string) error

SendHEAD requests the given URL using the HEAD method.

func (*Browser) SendMultipartPOST

func (b *Browser) SendMultipartPOST(u string, fields gourl.Values, files FileSet) error

SendMultipartPOST requests the given URL using the POST method with the given data using multipart/form-data format.

func (*Browser) SendPOST

func (b *Browser) SendPOST(url string, contentType string, body io.Reader) error

SendPOST requests the given URL using the POST method.

type Document

type Document struct {
	*EventTarget

	// Location stores the current url.
	Location *gourl.URL
	// contains filtered or unexported fields
}

Document stores the details of the current browser document.

func NewDocument

func NewDocument(b *Browser) *Document

NewDocument returns a *Document instance.

func (*Document) Anchors

func (doc *Document) Anchors() []*AnchorElement

Anchors returns an array of every anchor tag found in the page. See https://developer.mozilla.org/en-US/docs/Web/API/Document/anchors

func (*Document) Body

func (doc *Document) Body() *goquery.Selection

Body returns the page body as a *goquery.Selection. See https://developer.mozilla.org/en-US/docs/Web/API/Document/body

func (*Document) CharacterSet

func (doc *Document) CharacterSet() string

CharacterSet returns the document character set, e.g. "utf-8". See https://developer.mozilla.org/en-US/docs/Web/API/Document/characterSet

func (*Document) Click

func (doc *Document) Click(expr string) error

Click clicks on the page element matched by the given expression.

func (*Document) ContentType

func (doc *Document) ContentType() string

ContentType returns the document content type, e.g. "text/html". See https://developer.mozilla.org/en-US/docs/Web/API/Document/contentType

func (*Document) Cookie

func (doc *Document) Cookie() []*http.Cookie

Cookie returns the cookies for the document.

func (*Document) Form

func (doc *Document) Form(expr string) (Submittable, error)

Form returns the form in the current page that matches the given expr.

func (*Document) Forms

func (doc *Document) Forms() []Submittable

Forms returns an array of every form in the page. See https://developer.mozilla.org/en-US/docs/Web/API/Document/forms

func (*Document) GetElementByID

func (doc *Document) GetElementByID(id string) *goquery.Selection

GetElementByID returns a reference to the element by its ID. See https://developer.mozilla.org/en-US/docs/Web/API/Document/getElementById

func (*Document) GetElementsByClassName

func (doc *Document) GetElementsByClassName(className string) *goquery.Selection

GetElementsByClassName returns every element with the given class. See https://developer.mozilla.org/en-US/docs/Web/API/Document/getElementsByClassName

func (*Document) GetElementsByName

func (doc *Document) GetElementsByName(name string) *goquery.Selection

GetElementsByName returns the elements with the given name. See https://developer.mozilla.org/en-US/docs/Web/API/Document/getElementsByName

func (*Document) GetElementsByTagName

func (doc *Document) GetElementsByTagName(name string) *goquery.Selection

GetElementsByTagName returns the elements with the given tag. See https://developer.mozilla.org/en-US/docs/Web/API/Document/getElementsByTagName

func (*Document) Images

func (doc *Document) Images() []*ImageElement

Images returns an array of every image found in the page. See https://developer.mozilla.org/en-US/docs/Web/API/Document/images

func (*Document) InnerHTML

func (doc *Document) InnerHTML() string

InnerHTML returns the document html.

func (*Document) QuerySelector

func (doc *Document) QuerySelector(selector string) *goquery.Selection

QuerySelector returns the first element matching the given selector. See https://developer.mozilla.org/en-US/docs/Web/API/Document/querySelector

func (*Document) QuerySelectorAll

func (doc *Document) QuerySelectorAll(selector string) *goquery.Selection

QuerySelectorAll returns all of the elements matching the given selector. See https://developer.mozilla.org/en-US/docs/Web/API/Document/querySelectorAll

func (*Document) Scripts

func (doc *Document) Scripts() []*ScriptElement

Scripts returns an array of every script linked to the document. See https://developer.mozilla.org/en-US/docs/Web/API/Document/scripts

func (*Document) Stylesheets

func (doc *Document) Stylesheets() []*StylesheetElement

Stylesheets returns an array of every stylesheet linked to the document. See https://developer.mozilla.org/en-US/docs/Web/API/Document/styleSheets

func (*Document) Title

func (doc *Document) Title() string

Title returns the page title. See https://developer.mozilla.org/en-US/docs/Web/API/Document/title

func (*Document) Write

func (doc *Document) Write(w io.Writer) (int64, error)

Write writes the contents of the document to the given writer.

type DownloadableElement

type DownloadableElement interface {
	Element

	// Download writes the contents of the element to the given writer.
	//
	// Returns the number of bytes written.
	Download(out io.Writer) (int64, error)

	// DownloadAsync downloads the contents of the element asynchronously.
	//
	// An instance of AsyncDownloadResult will be sent down the given channel
	// when the download is complete.
	DownloadAsync(out io.Writer, ch AsyncDownloadChannel)
}

DownloadableElement represents an element that may be downloaded.

type Element

type Element interface {
	// URL returns the asset URL.
	URL() *gourl.URL

	// ID returns the asset ID or an empty string when not available.
	ID() string

	// Type returns the type of element.
	Type() ElementType
}

Element represents a page element, such as an image or stylesheet.

type ElementType

type ElementType uint16

ElementType describes a type of page element, such as an image or stylesheet.

const (
	// ElementTypeLink describes a *Link element.
	ElementTypeLink ElementType = iota

	// ElementTypeImage describes an *Image element.
	ElementTypeImage

	// ElementTypeStylesheet describes a *Stylesheet element.
	ElementTypeStylesheet

	// ElementTypeScript describes a *Script element.
	ElementTypeScript
)
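
The constants above rely on iota, which numbers them 0 through 3 in declaration order. The following standalone sketch repeats the pattern with local names; elementType and its String method are hypothetical and are not listed in the package documentation.

```go
package main

import "fmt"

// elementType is a local stand-in for the documented ElementType.
type elementType uint16

const (
	typeLink       elementType = iota // 0
	typeImage                         // 1
	typeStylesheet                    // 2
	typeScript                        // 3
)

// String gives each constant a readable name for logging.
func (t elementType) String() string {
	switch t {
	case typeLink:
		return "link"
	case typeImage:
		return "image"
	case typeStylesheet:
		return "stylesheet"
	case typeScript:
		return "script"
	}
	return "unknown"
}

func main() {
	fmt.Println(typeImage, uint16(typeScript)) // prints "image 3"
}
```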

type Event

type Event struct {
	Name   string
	Target interface{}
	Args   *EventArgs
}

Event stores the details of an event.

type EventArgValues

type EventArgValues map[string]interface{}

type EventArgs

type EventArgs struct {
	Values EventArgValues
	Error  error
	// contains filtered or unexported fields
}

EventArgs stores arguments to an event.

func NewEventArgs

func NewEventArgs(values EventArgValues) *EventArgs

NewEventArgs returns a new *EventArgs instance.

func (*EventArgs) GetBool

func (a *EventArgs) GetBool(key string) bool

GetBool returns the value at key as a bool.

func (*EventArgs) GetFloat64

func (a *EventArgs) GetFloat64(key string) float64

GetFloat64 returns the value at key as a float64.

func (*EventArgs) GetInt

func (a *EventArgs) GetInt(key string) int

GetInt returns the value at key as an int.

func (*EventArgs) GetInt64

func (a *EventArgs) GetInt64(key string) int64

GetInt64 returns the value at key as an int64.

func (*EventArgs) GetString

func (a *EventArgs) GetString(key string) string

GetString returns the value at key as a string.
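
Typed getters over a map[string]interface{} are usually built on type assertions. The sketch below shows one way to do that with a local values type. Returning the zero value when a key is missing or holds a different type is an assumption of this sketch; the documentation above does not say how EventArgs behaves in that case.

```go
package main

import "fmt"

// values is a local stand-in for EventArgValues.
type values map[string]interface{}

// getString returns the value at key as a string, or "" when the
// key is missing or holds another type (assumed behavior).
func (v values) getString(key string) string {
	s, _ := v[key].(string)
	return s
}

// getInt returns the value at key as an int, or 0 on a miss.
func (v values) getInt(key string) int {
	n, _ := v[key].(int)
	return n
}

// getBool returns the value at key as a bool, or false on a miss.
func (v values) getBool(key string) bool {
	b, _ := v[key].(bool)
	return b
}

func main() {
	v := values{"url": "http://example.com", "status": 200, "cached": true}
	fmt.Println(v.getString("url"), v.getInt("status"), v.getBool("cached")) // prints "http://example.com 200 true"
	fmt.Println(v.getInt("url"), v.getString("missing") == "")               // prints "0 true"
}
```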

func (*EventArgs) IsDefaultPrevented

func (a *EventArgs) IsDefaultPrevented() bool

IsDefaultPrevented returns true when PreventDefault() has been called.

func (*EventArgs) IsStopped

func (a *EventArgs) IsStopped() bool

IsStopped returns true when StopPropagation() has been called.

func (*EventArgs) PreventDefault

func (a *EventArgs) PreventDefault()

PreventDefault cancels the event if it is cancelable, without stopping further dispatching of the event.

func (*EventArgs) StopPropagation

func (a *EventArgs) StopPropagation()

StopPropagation prevents further dispatching of the event.

type EventListenerFunc

type EventListenerFunc func(e *Event)

type EventTarget

type EventTarget struct {
	// contains filtered or unexported fields
}

EventTarget represents an object which dispatches events.

func NewEventTarget

func NewEventTarget() *EventTarget

NewEventTarget returns a *EventTarget instance.

func (*EventTarget) AddEventListener

func (t *EventTarget) AddEventListener(event string, fn EventListenerFunc)

AddEventListener registers a listener on the target.

func (*EventTarget) DispatchEvent

func (t *EventTarget) DispatchEvent(event string, target interface{}, args *EventArgs) error

DispatchEvent dispatches the given event to any registered listeners.

func (*EventTarget) RemoveEventListener

func (t *EventTarget) RemoveEventListener(event string, fn EventListenerFunc)

RemoveEventListener removes a registered listener on the target.
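
The listener model can be pictured with a small standalone sketch: listeners are stored per event name, and dispatching walks the list until one of them stops propagation. The types args and target below are simplified, hypothetical stand-ins for EventArgs and EventTarget, and the dispatch order shown (registration order) is an assumption.

```go
package main

import "fmt"

// args is a simplified stand-in for EventArgs.
type args struct{ stopped bool }

// StopPropagation marks the event so later listeners are skipped.
func (a *args) StopPropagation() { a.stopped = true }

type listener func(name string, a *args)

// target is a simplified stand-in for EventTarget.
type target struct {
	listeners map[string][]listener
}

func newTarget() *target {
	return &target{listeners: map[string][]listener{}}
}

// AddEventListener registers fn for the named event.
func (tg *target) AddEventListener(event string, fn listener) {
	tg.listeners[event] = append(tg.listeners[event], fn)
}

// DispatchEvent calls listeners in registration order until one
// stops propagation, and returns how many listeners ran.
func (tg *target) DispatchEvent(event string, a *args) int {
	ran := 0
	for _, fn := range tg.listeners[event] {
		if a.stopped {
			break
		}
		fn(event, a)
		ran++
	}
	return ran
}

func main() {
	tg := newTarget()
	tg.AddEventListener("load", func(name string, a *args) {
		fmt.Println("first listener saw", name)
		a.StopPropagation()
	})
	tg.AddEventListener("load", func(name string, a *args) {
		fmt.Println("second listener is skipped")
	})
	fmt.Println("listeners run:", tg.DispatchEvent("load", &args{}))
}
```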

type File

type File struct {
	// contains filtered or unexported fields
}

File represents an input of type file. It includes the file name and an io.Reader.

type FileSet

type FileSet map[string]*File

FileSet represents a map of files used to post multipart data.

type Form

type Form struct {
	// contains filtered or unexported fields
}

Form is the default form element.

func NewForm

func NewForm(b Browser, s *goquery.Selection) *Form

NewForm creates and returns a *Form type.

func (*Form) Action

func (f *Form) Action() string

Action returns the form action URL. The URL will always be absolute.
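
Making an action URL absolute generally means resolving the form's action attribute against the URL of the page that contains it. The sketch below does this with net/url. The helper resolveAction is hypothetical and is shown only to illustrate the resolution rules, not Surf's own code.

```go
package main

import (
	"fmt"
	"net/url"
)

// resolveAction resolves a form's action attribute against the URL
// of the page that contains the form.
func resolveAction(page, action string) (string, error) {
	base, err := url.Parse(page)
	if err != nil {
		return "", err
	}
	ref, err := url.Parse(action)
	if err != nil {
		return "", err
	}
	return base.ResolveReference(ref).String(), nil
}

func main() {
	page := "http://example.com/account/form.html"
	for _, action := range []string{"/login", "submit.php", "https://other.example/post"} {
		abs, err := resolveAction(page, action)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%q -> %s\n", action, abs)
	}
}
```

A root-relative action replaces the whole path, a bare file name replaces only the last path segment, and an action that is already absolute is returned unchanged.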

func (*Form) Click

func (f *Form) Click(button string) error

Click submits the form by clicking the button with the given name.

func (*Form) ClickByValue

func (f *Form) ClickByValue(name, value string) error

ClickByValue submits the form by clicking the button with the given name and value.

func (*Form) Dom

func (f *Form) Dom() *goquery.Selection

Dom returns the inner *goquery.Selection.

func (*Form) File

func (f *Form) File(name string, fileName string, data io.Reader) error

File sets the value of a form input of type file. It returns an ElementNotFound error if the field does not exist.

func (*Form) Input

func (f *Form) Input(name, value string) error

Input sets the value of a form field. It returns an ElementNotFound error if the field does not exist.

func (*Form) Method

func (f *Form) Method() string

Method returns the form method, e.g. "GET" or "POST".

func (*Form) Set

func (f *Form) Set(name, value string) error

Set will set the value of a form field if it exists, or create and set it if it does not.

func (*Form) SetFile

func (f *Form) SetFile(name string, fileName string, data io.Reader)

SetFile sets the value of a form input of type file. It adds the field to the form if necessary.

func (*Form) Submit

func (f *Form) Submit() error

Submit submits the form. Clicks the first button in the form, or submits the form without using any button when the form does not contain any buttons.

type ImageElement

type ImageElement struct {
	BaseDownloadableElement

	// Alt is the value of the image alt attribute if available.
	Alt string

	// Title is the value of the image title attribute if available.
	Title string
}

ImageElement stores the properties of an image.

func NewImageElement

func NewImageElement(url *gourl.URL, id, alt, title string) *ImageElement

NewImageElement creates and returns a new *ImageElement instance.

type Navigator

type Navigator struct {
}

Navigator represents the state and the identity of the user agent. See https://developer.mozilla.org/en-US/docs/Web/API/Navigator

func NewNavigator

func NewNavigator() *Navigator

NewNavigator returns a *Navigator instance.

func (*Navigator) AppName

func (n *Navigator) AppName() string

AppName returns the name of the browser. See https://developer.mozilla.org/en-US/docs/Web/API/NavigatorID/appName

func (*Navigator) AppVersion

func (n *Navigator) AppVersion() string

AppVersion returns the browser version. See https://developer.mozilla.org/en-US/docs/Web/API/NavigatorID/appVersion

func (*Navigator) UserAgent

func (n *Navigator) UserAgent() string

UserAgent returns the browser user agent. See https://developer.mozilla.org/en-US/docs/Web/API/NavigatorID/userAgent

type ScriptElement

type ScriptElement struct {
	BaseDownloadableElement

	// TypeAttr is the value of the type attribute. Defaults to "text/javascript" when not specified.
	TypeAttr string
}

ScriptElement stores the properties of a linked script.

func NewScriptElement

func NewScriptElement(url *gourl.URL, id, typ string) *ScriptElement

NewScriptElement creates and returns a new *ScriptElement instance.

type StylesheetElement

type StylesheetElement struct {
	BaseDownloadableElement

	// Media is the value of the media attribute. Defaults to "all" when not specified.
	Media string

	// TypeAttr is the value of the type attribute. Defaults to "text/css" when not specified.
	TypeAttr string
}

StylesheetElement stores the properties of a linked stylesheet.

func NewStylesheetElement

func NewStylesheetElement(url *gourl.URL, id, media, typ string) *StylesheetElement

NewStylesheetElement creates and returns a new *StylesheetElement instance.

type Submittable

type Submittable interface {
	Method() string
	Action() string
	Input(name, value string) error
	Set(name, value string) error
	File(name string, fileName string, data io.Reader) error
	SetFile(name string, fileName string, data io.Reader)
	Click(button string) error
	ClickByValue(name, value string) error
	Submit() error
	Dom() *goquery.Selection
}

Submittable represents an element that may be submitted, such as a form.

Directories

Path	Synopsis
agent	Package agent generates user agent strings for well known browsers and for custom browsers.
errors	Package errors contains error types specific to the Surf library.
jar	Package jar has containers for storing data, such as bookmarks and cookies.
util	Package util contains some utility methods used by other packages.
