phantomjs

v0.0.0-...-6499a20
Warning

This package is not in the latest version of its module.

Published: Dec 11, 2018 License: MIT Imports: 11 Imported by: 3

README

Deprecation warning

Active phantomjs development has ended in favor of Chrome's new headless functionality. Instead of using this library, consider using a Go package that uses this new API, such as chromedp.


This is a Go wrapper for the phantomjs command line program. It exposes the full webpage API through a strongly typed, idiomatic Go interface, so you can communicate with the underlying WebKit and JavaScript engine seamlessly.

Installing

First, install phantomjs on your machine. This can be done using your package manager (such as apt-get or brew). Then install this package using the Go toolchain:

$ go get -u github.com/benbjohnson/phantomjs

Usage

Starting the process

This wrapper communicates with a separate phantomjs process over HTTP. Because the process can take several seconds to start up and shut down, start it once and share it across your program. The package-level variable phantomjs.DefaultProcess exists for this purpose.

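package main

import (
	"fmt"
	"os"

	"github.com/benbjohnson/phantomjs"
)

func main() {
	// Start the process once and share it for the lifetime of the program.
	if err := phantomjs.DefaultProcess.Open(); err != nil {
		fmt.Println(err)
		os.Exit(1)
	}
	defer phantomjs.DefaultProcess.Close()

	// Do other stuff in your program.
	doStuff()
}

// doStuff is a placeholder for your own code, which can create web pages
// with phantomjs.CreateWebPage() or phantomjs.DefaultProcess.CreateWebPage().
func doStuff() {}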

You can run multiple processes, but each one needs its own port so they do not conflict. This library uses port 20202 (DefaultPort) by default; set the Port field of a Process before calling Open() to change it.
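When you start several processes, a failure halfway through can leave earlier phantomjs children running. The sketch below shows one way to get all-or-nothing startup: openAll (a hypothetical helper, not part of this package) opens each process in turn and closes the ones already started if a later Open() fails. It is generic over a small local interface that *phantomjs.Process satisfies through its Open() and Close() methods, and a fake process stands in for the real one so it runs without phantomjs installed.

```go
package main

import (
	"errors"
	"fmt"
)

// opener is the subset of *phantomjs.Process used here.
type opener interface {
	Open() error
	Close() error
}

// openAll opens procs in order. If one fails, the processes that were
// already opened are closed again, newest first, and the error is returned.
func openAll[P opener](procs []P) error {
	for i, p := range procs {
		if err := p.Open(); err != nil {
			for j := i - 1; j >= 0; j-- {
				_ = procs[j].Close() // best effort; report the Open error instead
			}
			return fmt.Errorf("open process %d: %w", i, err)
		}
	}
	return nil
}

// fakeProcess is a hypothetical stand-in for *phantomjs.Process.
type fakeProcess struct {
	Port    int
	fail    bool // simulate a port that is already in use
	running bool
}

func (f *fakeProcess) Open() error {
	if f.fail {
		return errors.New("port in use")
	}
	f.running = true
	return nil
}

func (f *fakeProcess) Close() error {
	f.running = false
	return nil
}

func main() {
	// Ports start at the library default (20202) and increase by one.
	procs := []*fakeProcess{{Port: 20202}, {Port: 20203}, {Port: 20204, fail: true}}
	err := openAll(procs)
	fmt.Println(err)                                // open process 2: port in use
	fmt.Println(procs[0].running, procs[1].running) // false false
}
```

With the real package you would build the slice from phantomjs.NewProcess(), set Port to phantomjs.DefaultPort+i on each process, and defer closing all of them once openAll succeeds.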

Working with WebPage

The WebPage is the primary object you work with in phantomjs. Typically you create a web page from a Process and then either open a URL with Open() or set its content directly with SetContent():

// Create a web page.
// IMPORTANT: Always make sure you close your pages!
page, err := p.CreateWebPage()
if err != nil {
	return err
}
defer page.Close()

// Open a URL.
if err := page.Open("https://google.com"); err != nil {
	return err
}

The HTTP API uses a reference map to track references between the Go library and the phantomjs process. Because of this, it is important to always Close() your web pages or else you can experience memory leaks.
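One way to make sure a page is always closed, even when the work done with it fails, is to wrap creation and cleanup in a helper. withPage below is a hypothetical sketch, not part of this package: it takes a constructor and a callback, and defers Close() so the reference is released on every path. It is generic over a local closer interface, which *phantomjs.WebPage satisfies; a fake page replaces the real one so the sketch runs on its own.

```go
package main

import (
	"errors"
	"fmt"
)

// closer is the subset of *phantomjs.WebPage used here.
type closer interface {
	Close() error
}

// withPage creates a page, passes it to fn and always closes it afterwards,
// even when fn fails. A Close error is returned only if fn succeeded.
func withPage[P closer](create func() (P, error), fn func(P) error) (err error) {
	page, err := create()
	if err != nil {
		return err
	}
	defer func() {
		if cerr := page.Close(); cerr != nil && err == nil {
			err = fmt.Errorf("close page: %w", cerr)
		}
	}()
	return fn(page)
}

// fakePage is a hypothetical stand-in for *phantomjs.WebPage.
type fakePage struct{ closed bool }

func (f *fakePage) Close() error {
	f.closed = true
	return nil
}

var (
	errCreate   = errors.New("create failed")
	errEvaluate = errors.New("evaluate failed")
)

// lastPage remembers the page handed out by newFakePage.
var lastPage *fakePage

// newFakePage mimics Process.CreateWebPage.
func newFakePage() (*fakePage, error) {
	lastPage = &fakePage{}
	return lastPage, nil
}

func main() {
	err := withPage(newFakePage, func(p *fakePage) error { return errEvaluate })
	fmt.Println(err, lastPage.closed) // evaluate failed true
}
```

With the real package, passing the method value p.CreateWebPage (or the package function phantomjs.CreateWebPage) as create lets Go infer P as *phantomjs.WebPage.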

Executing JavaScript

You can synchronously execute JavaScript within the context of a web page by using the Evaluate() function. The example below opens Hacker News, retrieves the title and URL of the first link, and prints them to the terminal.

// Open a URL.
if err := page.Open("https://news.ycombinator.com"); err != nil {
	return err
}

// Read first link.
info, err := page.Evaluate(`function() {
	var link = document.body.querySelector('.itemlist .title a');
	return { title: link.innerText, url: link.href };
}`)
if err != nil {
	return err
}

// Print title and URL.
link, ok := info.(map[string]interface{})
if !ok {
	return fmt.Errorf("unexpected Evaluate result: %T", info)
}
fmt.Println("Hacker News Top Link:")
fmt.Println(link["title"])
fmt.Println(link["url"])
fmt.Println()

Evaluate() can return any value that can be marshaled to JSON.
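Because the result of Evaluate() arrives as a plain interface{}, an unchecked type assertion on it panics if the script returns something unexpected, such as null. The sketch below assumes the wrapper decodes results with encoding/json's default rules, under which JSON objects become map[string]interface{} and numbers become float64. It simulates an Evaluate result with json.Unmarshal so it runs without phantomjs; linkInfo and the Points field are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// link holds the title and URL read by the Hacker News script; Points is a
// hypothetical extra field showing how JSON numbers arrive.
type link struct {
	Title  string
	URL    string
	Points int
}

// linkInfo converts a decoded Evaluate result into a link, checking each type
// assertion instead of panicking on unexpected input.
func linkInfo(v interface{}) (link, error) {
	m, ok := v.(map[string]interface{})
	if !ok {
		return link{}, fmt.Errorf("expected a JSON object, got %T", v)
	}
	title, _ := m["title"].(string)
	url, _ := m["url"].(string)
	points, _ := m["points"].(float64) // JSON numbers decode to float64
	return link{Title: title, URL: url, Points: int(points)}, nil
}

func main() {
	// Simulate an Evaluate result by decoding JSON into an interface{}.
	var info interface{}
	raw := `{"title":"Show HN","url":"https://example.com","points":42}`
	if err := json.Unmarshal([]byte(raw), &info); err != nil {
		panic(err)
	}
	l, err := linkInfo(info)
	fmt.Println(l, err) // {Show HN https://example.com 42} <nil>
}
```

Missing keys simply become zero values here; return an error instead if a field is required.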

Rendering web pages

Another common task with PhantomJS is to render a web page to an image. Once you have opened your web page, simply set the viewport size and call the Render() method:

// Open a URL.
if err := page.Open("https://news.ycombinator.com"); err != nil {
	return err
}

// Set up the viewport and render the page to a PNG file.
if err := page.SetViewportSize(1024, 800); err != nil {
	return err
}
if err := page.Render("hackernews.png", "png", 100); err != nil {
	return err
}

You can also use RenderBase64() to return a base64-encoded image to your program instead of writing a file to disk.
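A minimal sketch of turning a RenderBase64() result back into image bytes, assuming the string is plain standard base64 with no data: URI prefix. decodePNG and saveRender are hypothetical helpers; checking the 8-byte PNG signature catches a wrong format string or a corrupt result before anything is written to disk.

```go
package main

import (
	"bytes"
	"encoding/base64"
	"errors"
	"fmt"
	"os"
)

// pngSignature is the fixed 8-byte header that starts every PNG file.
var pngSignature = []byte{0x89, 'P', 'N', 'G', '\r', '\n', 0x1a, '\n'}

// decodePNG decodes a base64 render result and checks that it is a PNG.
func decodePNG(encoded string) ([]byte, error) {
	data, err := base64.StdEncoding.DecodeString(encoded)
	if err != nil {
		return nil, fmt.Errorf("decode base64: %w", err)
	}
	if !bytes.HasPrefix(data, pngSignature) {
		return nil, errors.New("render result is not a PNG image")
	}
	return data, nil
}

// saveRender writes a decoded render result to filename.
func saveRender(encoded, filename string) error {
	data, err := decodePNG(encoded)
	if err != nil {
		return err
	}
	return os.WriteFile(filename, data, 0o644)
}

func main() {
	// In real use: encoded, err := page.RenderBase64("png")
	encoded := "iVBORw0KGgo=" // only the 8-byte PNG signature, for illustration
	data, err := decodePNG(encoded)
	fmt.Println(len(data), err) // 8 <nil>
}
```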

Documentation


Constants

const (
	ShiftKey = 0x02000000
	CtrlKey  = 0x04000000
	AltKey   = 0x08000000
	MetaKey  = 0x10000000
	Keypad   = 0x20000000
)

Keyboard modifiers.

const (
	DefaultPort    = 20202
	DefaultBinPath = "phantomjs"
)

Default settings.

Variables

var DefaultProcess = NewProcess()

DefaultProcess is a global, shared process. It must be opened before use.

var (
	// ErrInjectionFailed is returned by InjectJS when injection fails.
	ErrInjectionFailed = errors.New("injection failed")
)

Functions

This section is empty.

Types

type OpenWebPageSettings

type OpenWebPageSettings struct {
	Method string `json:"method"`
}

OpenWebPageSettings represents the settings object passed to WebPage.Open().

type PaperSize

type PaperSize struct {
	// Dimensions of the paper.
	// This can also be specified via Format.
	Width  string
	Height string

	// Supported formats: "A3", "A4", "A5", "Legal", "Letter", "Tabloid".
	Format string

	// Margins around the paper.
	Margin *PaperSizeMargin

	// Supported orientations: "portrait", "landscape".
	Orientation string
}

PaperSize represents the size of a webpage when rendered as a PDF.

Units can be specified in "mm", "cm", "in", or "px". If no unit is specified then "px" is used.
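Because PaperSize fields are free-form strings, a typo such as "A-4" or "10pt" compiles fine and only surfaces once the value reaches phantomjs. validate below is a hypothetical helper that checks a PaperSize against the formats, orientations and units listed above. It declares local copies of PaperSize and PaperSizeMargin so it runs without the package, and treating an empty Orientation or margin as "use the default" is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// Local copies of the PaperSize types documented above.
type PaperSizeMargin struct{ Top, Bottom, Left, Right string }

type PaperSize struct {
	Width, Height string
	Format        string
	Margin        *PaperSizeMargin
	Orientation   string
}

// formats lists the documented Format values.
var formats = map[string]bool{
	"A3": true, "A4": true, "A5": true, "Legal": true, "Letter": true, "Tabloid": true,
}

// validLength reports whether s is a non-negative number with an optional
// "mm", "cm", "in" or "px" suffix. A bare number means pixels.
func validLength(s string) bool {
	for _, unit := range []string{"mm", "cm", "in", "px"} {
		if strings.HasSuffix(s, unit) {
			s = strings.TrimSuffix(s, unit)
			break
		}
	}
	v, err := strconv.ParseFloat(s, 64)
	return err == nil && v >= 0
}

// validate checks p against the documented formats, orientations and units.
// Empty Orientation and margin values are assumed to mean the default.
func validate(p PaperSize) error {
	if p.Format != "" {
		if !formats[p.Format] {
			return fmt.Errorf("unsupported format %q", p.Format)
		}
	} else if !validLength(p.Width) || !validLength(p.Height) {
		return errors.New("set Format, or both Width and Height")
	}
	switch p.Orientation {
	case "", "portrait", "landscape":
	default:
		return fmt.Errorf("unsupported orientation %q", p.Orientation)
	}
	if m := p.Margin; m != nil {
		for _, v := range []string{m.Top, m.Bottom, m.Left, m.Right} {
			if v != "" && !validLength(v) {
				return fmt.Errorf("invalid margin %q", v)
			}
		}
	}
	return nil
}

func main() {
	a4 := PaperSize{
		Format:      "A4",
		Orientation: "landscape",
		Margin:      &PaperSizeMargin{Top: "1cm", Bottom: "1cm", Left: "1cm", Right: "1cm"},
	}
	fmt.Println(validate(a4)) // <nil>
	fmt.Println(validate(PaperSize{Width: "8.5in", Height: "11in", Orientation: "sideways"}))
}
```

With the real package you would pass the checked phantomjs.PaperSize to page.SetPaperSize() and then render with page.Render("out.pdf", "pdf", 100).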

type PaperSizeMargin

type PaperSizeMargin struct {
	Top    string
	Bottom string
	Left   string
	Right  string
}

PaperSizeMargin represents the margins around the paper.

type Position

type Position struct {
	Top  int
	Left int
}

Position represents a coordinate on the page, in pixels.

type Process

type Process struct {

	// Path to the 'phantomjs' binary.
	BinPath string

	// HTTP port used to communicate with phantomjs.
	Port int

	// Output from the process.
	Stdout io.Writer
	Stderr io.Writer
	// contains filtered or unexported fields
}

Process represents a PhantomJS process.

func NewProcess

func NewProcess() *Process

NewProcess returns a new instance of Process.

func (*Process) Close

func (p *Process) Close() (err error)

Close stops the process.

func (*Process) CreateWebPage

func (p *Process) CreateWebPage() (*WebPage, error)

CreateWebPage returns a new instance of a "webpage".

func (*Process) Open

func (p *Process) Open() error

Open starts the phantomjs process with the shim script.

func (*Process) Path

func (p *Process) Path() string

Path returns a temporary path that the process is run from.

func (*Process) URL

func (p *Process) URL() string

URL returns the process' API URL.

type Rect

type Rect struct {
	Top    int
	Left   int
	Width  int
	Height int
}

Rect represents a rectangle used by WebPage.ClipRect().

type Ref

type Ref struct {
	// contains filtered or unexported fields
}

Ref represents a reference to an object in phantomjs.

func (*Ref) ID

func (r *Ref) ID() string

ID returns the reference identifier.

type WebPage

type WebPage struct {
	// contains filtered or unexported fields
}

WebPage represents an object returned from "webpage.create()".

func CreateWebPage

func CreateWebPage() (*WebPage, error)

CreateWebPage returns a new instance of a "webpage" using the default process.

func (*WebPage) AddCookie

func (p *WebPage) AddCookie(cookie *http.Cookie) (bool, error)

AddCookie adds a cookie to the page. Returns true if the cookie was successfully added.

func (*WebPage) CanGoBack

func (p *WebPage) CanGoBack() (bool, error)

CanGoBack returns true if the page can be navigated back.

func (*WebPage) CanGoForward

func (p *WebPage) CanGoForward() (bool, error)

CanGoForward returns true if the page can be navigated forward.

func (*WebPage) ClearCookies

func (p *WebPage) ClearCookies() error

ClearCookies deletes all cookies visible to the current URL.

func (*WebPage) ClipRect

func (p *WebPage) ClipRect() (Rect, error)

ClipRect returns the clipping rectangle used when rendering. The result is a Rect value, not a pointer, so it is never nil; a zero Rect means no clipping rectangle is set.

func (*WebPage) Close

func (p *WebPage) Close() error

Close releases the web page and its resources.

func (*WebPage) Content

func (p *WebPage) Content() (string, error)

Content returns the content of the web page, enclosed in an HTML/XML element.

func (*WebPage) Cookies

func (p *WebPage) Cookies() ([]*http.Cookie, error)

Cookies returns a list of cookies visible to the current URL.

func (*WebPage) CustomHeaders

func (p *WebPage) CustomHeaders() (http.Header, error)

CustomHeaders returns a list of additional headers sent with the web page.

func (*WebPage) DeleteCookie

func (p *WebPage) DeleteCookie(name string) (bool, error)

DeleteCookie removes a cookie with a matching name. Returns true if the cookie was successfully deleted.

func (*WebPage) Evaluate

func (p *WebPage) Evaluate(script string) (interface{}, error)

Evaluate executes a JavaScript function in the context of the web page. Returns the value returned by the function.

func (*WebPage) EvaluateAsync

func (p *WebPage) EvaluateAsync(script string, delay time.Duration) error

EvaluateAsync executes a JavaScript function and returns immediately. Execution starts after the given delay, and no value is returned.

func (*WebPage) EvaluateJavaScript

func (p *WebPage) EvaluateJavaScript(script string) (interface{}, error)

EvaluateJavaScript executes a JavaScript function. Returns the value returned by the function.

func (*WebPage) FocusedFrameName

func (p *WebPage) FocusedFrameName() (string, error)

FocusedFrameName returns the name of the currently focused frame.

func (*WebPage) FrameContent

func (p *WebPage) FrameContent() (string, error)

FrameContent returns the content of the current frame.

func (*WebPage) FrameCount

func (p *WebPage) FrameCount() (int, error)

FrameCount returns the total number of frames.

func (*WebPage) FrameName

func (p *WebPage) FrameName() (string, error)

FrameName returns the name of the current frame.

func (*WebPage) FrameNames

func (p *WebPage) FrameNames() ([]string, error)

FrameNames returns a list of frame names.

func (*WebPage) FramePlainText

func (p *WebPage) FramePlainText() (string, error)

FramePlainText returns the plain text representation of the current frame content.

func (*WebPage) FrameTitle

func (p *WebPage) FrameTitle() (string, error)

FrameTitle returns the title of the current frame.

func (*WebPage) FrameURL

func (p *WebPage) FrameURL() (string, error)

FrameURL returns the URL of the current frame.

func (*WebPage) Go

func (p *WebPage) Go(index int) error

Go navigates to the page in history by relative offset. A positive index moves forward, a negative index moves backwards.

func (*WebPage) GoBack

func (p *WebPage) GoBack() error

GoBack navigates back to the previous page.

func (*WebPage) GoForward

func (p *WebPage) GoForward() error

GoForward navigates to the next page.

func (*WebPage) IncludeJS

func (p *WebPage) IncludeJS(url string) error

IncludeJS includes an external script from url. Returns after the script has been loaded.

func (*WebPage) InjectJS

func (p *WebPage) InjectJS(filename string) error

InjectJS injects an external script from the local filesystem.

The script will be loaded from the Process.Path() directory. If it cannot be found then it is loaded from the library path.

func (*WebPage) LibraryPath

func (p *WebPage) LibraryPath() (string, error)

LibraryPath returns the path used by InjectJS() to resolve scripts. Initially it is set to Process.Path().

func (*WebPage) NavigationLocked

func (p *WebPage) NavigationLocked() (bool, error)

NavigationLocked returns true if navigating away from the page is disabled.

func (*WebPage) OfflineStoragePath

func (p *WebPage) OfflineStoragePath() (string, error)

OfflineStoragePath returns the path used by offline storage.

func (*WebPage) OfflineStorageQuota

func (p *WebPage) OfflineStorageQuota() (int, error)

OfflineStorageQuota returns the number of bytes that can be used for offline storage.

func (*WebPage) Open

func (p *WebPage) Open(url string) error

Open opens a URL.

func (*WebPage) OwnsPages

func (p *WebPage) OwnsPages() (bool, error)

OwnsPages returns true if this page owns pages opened in other windows.

func (*WebPage) Page

func (p *WebPage) Page(name string) (*WebPage, error)

Page returns an owned page by window name. Returns nil if the page cannot be found.

func (*WebPage) PageWindowNames

func (p *WebPage) PageWindowNames() ([]string, error)

PageWindowNames returns a list of owned window names.

func (*WebPage) Pages

func (p *WebPage) Pages() ([]*WebPage, error)

Pages returns a list of owned pages.

func (*WebPage) PaperSize

func (p *WebPage) PaperSize() (PaperSize, error)

PaperSize returns the size of the web page when rendered as a PDF.

func (*WebPage) PlainText

func (p *WebPage) PlainText() (string, error)

PlainText returns the plain text representation of the page.

func (*WebPage) Reload

func (p *WebPage) Reload() error

Reload reloads the current web page.

func (*WebPage) Render

func (p *WebPage) Render(filename, format string, quality int) error

Render renders the web page to a file with the given format and quality settings. This supports the "PDF", "PNG", "JPEG", "BMP", "PPM", and "GIF" formats.

func (*WebPage) RenderBase64

func (p *WebPage) RenderBase64(format string) (string, error)

RenderBase64 renders the web page to a base64 encoded string.

func (*WebPage) ScrollPosition

func (p *WebPage) ScrollPosition() (Position, error)

ScrollPosition returns the current scroll position of the page.

func (*WebPage) SendKeyboardEvent

func (p *WebPage) SendKeyboardEvent(eventType string, key string, modifier int) error

SendKeyboardEvent sends a keyboard event as if it came from the user. It is not a synthetic event.

The eventType can be "keyup", "keypress", or "keydown".

The key argument is a string or a key listed here: https://github.com/ariya/phantomjs/commit/cab2635e66d74b7e665c44400b8b20a8f225153a

Keyboard modifiers can be joined together using the bitwise OR operator.
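A short sketch of combining and testing modifier flags. The constants are copied from the Keyboard modifiers block above so the example runs without the package; hasModifier is a hypothetical helper.

```go
package main

import "fmt"

// Modifier values copied from the constants documented above.
const (
	ShiftKey = 0x02000000
	CtrlKey  = 0x04000000
	AltKey   = 0x08000000
	MetaKey  = 0x10000000
	Keypad   = 0x20000000
)

// hasModifier reports whether mask contains every bit of m.
func hasModifier(mask, m int) bool {
	return mask&m == m
}

func main() {
	// Ctrl+Shift, combined with bitwise OR.
	mods := ShiftKey | CtrlKey
	fmt.Printf("%#x\n", mods)                                          // 0x6000000
	fmt.Println(hasModifier(mods, CtrlKey), hasModifier(mods, AltKey)) // true false
}
```

With the real package the same mask is passed as the modifier argument, for example page.SendKeyboardEvent("keypress", "A", phantomjs.ShiftKey|phantomjs.CtrlKey).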

func (*WebPage) SendMouseEvent

func (p *WebPage) SendMouseEvent(eventType string, mouseX, mouseY int, button string) error

SendMouseEvent sends a mouse event as if it came from the user. It is not a synthetic event.

The eventType can be "mouseup", "mousedown", "mousemove", "doubleclick", or "click". The mouseX and mouseY specify the position of the mouse on the screen. The button argument specifies the mouse button clicked (e.g. "left").
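A minimal sketch of clicking the middle of a rectangle, for example an element's bounding box read with Evaluate(). clickCenter is a hypothetical helper that takes any value with the SendMouseEvent signature above, so a *phantomjs.WebPage can be passed directly; Rect is copied locally, and treating the box as viewport coordinates is an assumption. A recording fake replaces the page so the sketch runs on its own.

```go
package main

import "fmt"

// Rect mirrors the Rect type documented in this package.
type Rect struct{ Top, Left, Width, Height int }

// mouseSender is the subset of *phantomjs.WebPage used here.
type mouseSender interface {
	SendMouseEvent(eventType string, mouseX, mouseY int, button string) error
}

// clickCenter sends a left "click" event at the center of r.
func clickCenter(p mouseSender, r Rect) error {
	x := r.Left + r.Width/2
	y := r.Top + r.Height/2
	return p.SendMouseEvent("click", x, y, "left")
}

// recorder is a fake page that records the events it receives.
type recorder struct{ events []string }

func (r *recorder) SendMouseEvent(eventType string, x, y int, button string) error {
	r.events = append(r.events, fmt.Sprintf("%s %d,%d %s", eventType, x, y, button))
	return nil
}

func main() {
	rec := &recorder{}
	_ = clickCenter(rec, Rect{Top: 100, Left: 40, Width: 200, Height: 30})
	fmt.Println(rec.events) // [click 140,115 left]
}
```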

func (*WebPage) SetClipRect

func (p *WebPage) SetClipRect(rect Rect) error

SetClipRect sets the clipping rectangle used when rendering. Pass a zero Rect to render the entire web page.

func (*WebPage) SetContent

func (p *WebPage) SetContent(content string) error

SetContent sets the content of the webpage.

func (*WebPage) SetContentAndURL

func (p *WebPage) SetContentAndURL(content, url string) error

SetContentAndURL sets the content and URL of the page.

func (*WebPage) SetCookies

func (p *WebPage) SetCookies(cookies []*http.Cookie) error

SetCookies sets a list of cookies visible to the current URL.

func (*WebPage) SetCustomHeaders

func (p *WebPage) SetCustomHeaders(header http.Header) error

SetCustomHeaders sets a list of additional headers sent with the web page.

This function does not support multiple headers with the same name. Only the first value for a header key will be used.
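If a header really needs several values, one workaround is to join them into a single comma-separated value before calling SetCustomHeaders(), so the first (and only) value carries all of them. flattenHeader is a hypothetical helper; the comma-joined form is equivalent only for headers defined as comma-separated lists (RFC 7230, section 3.2.2), which excludes headers such as Set-Cookie.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// flattenHeader returns a copy of h in which every header has exactly one
// value: repeated values are joined with ", ". Only safe for list-valued
// headers; Set-Cookie, for example, must not be combined this way.
func flattenHeader(h http.Header) http.Header {
	out := make(http.Header, len(h))
	for key, values := range h {
		out[key] = []string{strings.Join(values, ", ")}
	}
	return out
}

func main() {
	h := http.Header{}
	h.Add("Accept-Language", "en-US")
	h.Add("Accept-Language", "fr;q=0.8")
	h.Set("X-Request-Id", "abc123")

	// In real use: page.SetCustomHeaders(flattenHeader(h))
	flat := flattenHeader(h)
	fmt.Println(flat.Values("Accept-Language")) // [en-US, fr;q=0.8]
}
```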

func (*WebPage) SetFrameContent

func (p *WebPage) SetFrameContent(content string) error

SetFrameContent sets the content of the current frame.

func (*WebPage) SetLibraryPath

func (p *WebPage) SetLibraryPath(path string) error

SetLibraryPath sets the library path used by InjectJS().

func (*WebPage) SetNavigationLocked

func (p *WebPage) SetNavigationLocked(value bool) error

SetNavigationLocked sets whether navigation away from the page should be disabled.

func (*WebPage) SetOwnsPages

func (p *WebPage) SetOwnsPages(v bool) error

SetOwnsPages sets whether this page owns pages opened in other windows.

func (*WebPage) SetPaperSize

func (p *WebPage) SetPaperSize(size PaperSize) error

SetPaperSize sets the size of the web page when rendered as a PDF.

func (*WebPage) SetScrollPosition

func (p *WebPage) SetScrollPosition(pos Position) error

SetScrollPosition sets the current scroll position of the page.

func (*WebPage) SetSettings

func (p *WebPage) SetSettings(settings WebPageSettings) error

SetSettings sets various settings on the web page.

The settings apply only during the initial call to Open(), so call SetSettings() before opening the page. Modifying the settings afterwards has no effect.
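A sketch of applying settings in the right order. openWithSettings is a hypothetical helper that reads the current settings, lets a callback change them, applies them, and only then opens the URL; starting from Settings() keeps every field you do not touch at its current value. It is generic so that *phantomjs.WebPage with phantomjs.WebPageSettings satisfies it, and a trimmed-down fake page and settings type stand in for the real ones.

```go
package main

import (
	"fmt"
	"time"
)

// settingsPage is the subset of *phantomjs.WebPage used here; S stands for
// phantomjs.WebPageSettings.
type settingsPage[S any] interface {
	Settings() (S, error)
	SetSettings(S) error
	Open(url string) error
}

// openWithSettings applies modified settings before Open, because changes
// made after the page has been opened are ignored.
func openWithSettings[S any, P settingsPage[S]](p P, url string, modify func(*S)) error {
	s, err := p.Settings()
	if err != nil {
		return err
	}
	modify(&s)
	if err := p.SetSettings(s); err != nil {
		return err
	}
	return p.Open(url)
}

// fakeSettings is a hypothetical, trimmed-down stand-in for WebPageSettings.
type fakeSettings struct {
	JavascriptEnabled bool
	ResourceTimeout   time.Duration
}

// fakePage records which settings were in effect when Open was called.
type fakePage struct {
	current  fakeSettings
	openedAs fakeSettings
}

func (f *fakePage) Settings() (fakeSettings, error) { return f.current, nil }
func (f *fakePage) SetSettings(s fakeSettings) error { f.current = s; return nil }
func (f *fakePage) Open(url string) error            { f.openedAs = f.current; return nil }

func main() {
	page := &fakePage{current: fakeSettings{JavascriptEnabled: true}}
	err := openWithSettings(page, "https://example.com", func(s *fakeSettings) {
		s.ResourceTimeout = 10 * time.Second
	})
	fmt.Println(err, page.openedAs) // <nil> {true 10s}
}
```

With the real package the callback would take a *phantomjs.WebPageSettings, for example func(s *phantomjs.WebPageSettings) { s.LoadImages = false }.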

func (*WebPage) SetViewportSize

func (p *WebPage) SetViewportSize(width, height int) error

SetViewportSize sets the size of the viewport.

func (*WebPage) SetZoomFactor

func (p *WebPage) SetZoomFactor(factor float64) error

SetZoomFactor sets the zoom factor when rendering the page.

func (*WebPage) Settings

func (p *WebPage) Settings() (WebPageSettings, error)

Settings returns the settings used on the web page.

func (*WebPage) Stop

func (p *WebPage) Stop() error

Stop stops loading the web page.

func (*WebPage) SwitchToFocusedFrame

func (p *WebPage) SwitchToFocusedFrame() error

SwitchToFocusedFrame changes the current frame to the frame that is in focus.

func (*WebPage) SwitchToFrameName

func (p *WebPage) SwitchToFrameName(name string) error

SwitchToFrameName changes the current frame to a frame with a given name.

func (*WebPage) SwitchToFramePosition

func (p *WebPage) SwitchToFramePosition(pos int) error

SwitchToFramePosition changes the current frame to the frame at the given position.

func (*WebPage) SwitchToMainFrame

func (p *WebPage) SwitchToMainFrame() error

SwitchToMainFrame switches the current frame to the main frame.

func (*WebPage) SwitchToParentFrame

func (p *WebPage) SwitchToParentFrame() error

SwitchToParentFrame switches the current frame to the parent of the current frame.

func (*WebPage) Title

func (p *WebPage) Title() (string, error)

Title returns the title of the web page.

func (*WebPage) URL

func (p *WebPage) URL() (string, error)

URL returns the current URL of the web page.

func (*WebPage) UploadFile

func (p *WebPage) UploadFile(selector, filename string) error

UploadFile uploads a file to a form element specified by selector.

func (*WebPage) ViewportSize

func (p *WebPage) ViewportSize() (width, height int, err error)

ViewportSize returns the size of the browser viewport.

func (*WebPage) WindowName

func (p *WebPage) WindowName() (string, error)

WindowName returns the window name of the web page.

func (*WebPage) ZoomFactor

func (p *WebPage) ZoomFactor() (float64, error)

ZoomFactor returns the zoom factor used when rendering the page.

type WebPageSettings

type WebPageSettings struct {
	JavascriptEnabled             bool
	LoadImages                    bool
	LocalToRemoteURLAccessEnabled bool
	UserAgent                     string
	Username                      string
	Password                      string
	XSSAuditingEnabled            bool
	WebSecurityEnabled            bool
	ResourceTimeout               time.Duration
}

WebPageSettings represents various settings on a web page.
