screenshot

v0.4.7
Published: Dec 19, 2023 License: GPL-3.0 Imports: 24 Imported by: 1

README

ScreenShot


Purpose

Sometimes you don't want just standard web links in your web presentation but a preview image showing the page you're linking to. That is where this package comes in. It generates – by way of calling the external Chrome browser – an image of the web page a given URL addresses. Those image files are stored locally and may be used as often as you want without additional external network traffic.

Installation

You can use Go to install this package for you:

go get github.com/mwat56/screenshot

After that you can import it the usual Go way to use the library.

Usage

There are only two functions you have to worry about:

// SetImageDir sets the directory to use for storing the generated
// screenshot images.
//
// If `aDirectory` is empty or invalid the system's temp directory is used.
//
// `aDirectory` The directory to store the generated images.
func SetImageDir(aDirectory string) { … }

This function should be called before any other one to make sure the generated screenshots end up where you want them to be. The default is the system's temp directory (e.g. /tmp under GNU/Linux).

To actually create the screenshot image you'd call:

// CreateImage generates an image of `aURL` and stores it in `ImageDir()`,
// returning the file name of the saved image or an error in case of problems.
//
//	`aURL` The address of the web page to process.
func CreateImage(aURL string) (string, error) { … }

The returned string is the name of the generated image file (without its path). If you combine it with the directory returned by ImageDir() you get the complete path/filename to locally access the image.
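A minimal sketch of that combination follows. To keep it runnable without Chrome or network access, the helper receives the two package functions as parameters; `previewPath` and the stub closures in `main` are hypothetical names used only for illustration. In a real program you would presumably pass `screenshot.CreateImage` and `screenshot.ImageDir` (after importing github.com/mwat56/screenshot) instead of the stubs.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// previewPath calls create for aURL and joins the returned file name
// with the directory reported by dir. With the real package the two
// arguments would be screenshot.CreateImage and screenshot.ImageDir.
func previewPath(create func(string) (string, error), dir func() string, aURL string) (string, error) {
	name, err := create(aURL)
	if err != nil {
		// On failure the file name is empty, so there is no path to build.
		return "", err
	}
	return filepath.Join(dir(), name), nil
}

func main() {
	// Stand-ins for the package functions (assumed values, illustration only).
	create := func(string) (string, error) { return "example.png", nil }
	dir := func() string { return "/tmp" }

	path, err := previewPath(create, dir, "https://example.com/")
	if err != nil {
		fmt.Println("no screenshot:", err)
		return
	}
	fmt.Println("image stored at", path)
}
```

Passing the functions in as values also makes the helper easy to test without starting a browser.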

Generating a screenshot image usually takes between one and five seconds, depending on the actual web-page in question; however, it can take considerably longer. To avoid hanging the program, the CreateImage() function applies a timeout of about half a minute (32 seconds by default; see MaxProcessTime()).

And, finally, not all web-pages can be rendered properly and turned into an image. In case of errors (like network errors or problems while storing the image file) CreateImage() returns an empty filename and an error.

There are a couple more functions (mostly property GETters and SETters) which you will probably barely need; for details refer to the source code documentation.

Libraries

The Go library ChromeDP, which controls a headless instance of the Chrome browser, is required for this package to work. Under Linux the browser itself is usually part of your distribution (as chromium-browser).

To resize the screenshot if required by the ImageHeight()/ImageWidth() values, the golang.org/x/image/draw package must be part of your Go installation (if not, run: go get -u golang.org/x/image/draw).

Example

In the source code's sub-directory app/ there's a demo program (screenshot.go) that lets you generate a screenshot image of a URL given on the command line.

To run it call e.g.

#> cd app
#> go build screenshot.go
#> ./screenshot

It will show you all available commandline options e.g.:

Usage: ./screenshot [OPTIONS]

-bc
	allow the browser to handle web cookies (default false)
-be
	skip sites with Certificate errors (default false)
-bm
	let browser emulate a mobile device (default false)
-bs
	let browser show scrollbars if available (default false)
-bt int
	max. time (seconds) allowed to process a single web page (default 32)
-ia
	accept the respective other image format (default true)
-id string
	directory for storing the screenshot image (default "/tmp")
-ih int
	max. height of the screenshot image (default 768)
-io
	overwrite an existing image (default false)
-iq int
	quality of the screenshot image (default 75)
-is float
	the browser's scale factor for the screenshot image (default 0.00)
-iw int
	max. width of the screenshot image (default 896)
-ja string
	name of text-file that contains sites better avoiding JavaScript
	(default "/home/matthias/devel/Go/src/github.com/mwat56/screenshot/app/hostsavoidjs.list")
-jn string
	name of text-file that contains sites needing JavaScript
	(default "/home/matthias/devel/Go/src/github.com/mwat56/screenshot/app/hostsneedjs.list")
-jp navigator.platform
	Identifier the JavaScript navigator.platform should use (default "Linux x86_64")
-js
	allow browser's use of JavaScript (default false)
-ju string
	description of the UserAgent the browser should report
	(default "Mozilla/5.0 (X11; Linux x86_64; rv:89.0) Gecko/20100101 Firefox/89.0")
-u string
	(*required*) the URL for the browser's screenshot
-v	verbose (default false)

As the listing shows, only the -u option is required; all other options have defaults.

You can use this program to generate screenshot images "by hand" and fiddle with the various commandline options to see what difference it makes if you change them.

History

A few years before this package I wrote the pageview package, which used the external wkhtmltoimage program, and in most cases it worked just fine. However, once in a while wkhtmltoimage produced a reproducible segmentation fault (core dumped). For a while I thought I could live with it, but over time it happened more often (i.e. with additional URLs). Fiddling around with various commandline options provided no improvement. In the end I started to look around for alternative approaches, short of writing my own URL retrieval and rendering system. That's when I found ChromeDP, and hence this package came into existence.

Licence

    Copyright © 2022 M.Watermann, 10247 Berlin, Germany
                    All rights reserved
                EMail : <support@mwat.de>

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version.

This software is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

You should have received a copy of the GNU General Public License along with this program. If not, see the GNU General Public License for details.


Documentation

Overview

Package screenshot implements a web page link preview (snapshot image).

Copyright © 2022 M.Watermann, 10247 Berlin, Germany
                All rights reserved
            EMail : <support@mwat.de>

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version.

This software is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

You should have received a copy of the GNU General Public License along with this program. If not, see the [GNU General Public License](http://www.gnu.org/licenses/gpl.html) for details.


Constants

const (
	// Default `UserAgent` string:
	DefaultAgent = `Mozilla/5.0 (X11; Linux x86_64; rv:89.0) Gecko/20100101 Firefox/89.0`

	// Default `Platform` string to use by JavaScript:
	DefaultPlatform = `Linux x86_64`

	// Filename of list of hosts/domains where JS should be avoided:
	HostsAvoidJS = `hostsavoidjs.list`

	// Filename of list of hosts/domains where JS is needed:
	HostsNeedJS = `hostsneedjs.list`
)

Variables

This section is empty.

Functions

func AcceptOther added in v0.2.0

func AcceptOther() bool

AcceptOther returns whether to accept the respective other image format.

The `CreateImage()` function checks whether a screenshot image already exists and, if so, doesn't create a new one. The filename extension (and its image format) is determined by the `ImageQuality()` setting; see the note on `ImageType()`. Now, assume the current `ImageType()` is `png` and `CreateImage()` is called: to check whether a screenshot is already present, it looks for the appropriate image file with a `png` extension. If that file exists, no further work is done. However, if `AcceptOther()` is `true` (i.e. the default), the other `ImageType()` (`jpeg` in this example) is checked as well; if that file exists, no further work is done and `CreateImage()` returns the already existing filename.
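The lookup order described above can be sketched as follows. This is an illustration under assumptions, not the package's code: `findExisting` is a hypothetical helper, the file extension is assumed to equal the type name (`.png` / `.jpeg`), and the file system check is passed in as a function so the example runs without touching the disk.

```go
package main

import "fmt"

// findExisting reports the name of an already stored screenshot, if any.
// base is the file name without extension, current is "png" or "jpeg",
// and exists abstracts the file check (e.g. a wrapper around os.Stat).
func findExisting(base, current string, acceptOther bool, exists func(string) bool) (string, bool) {
	if name := base + "." + current; exists(name) {
		return name, true
	}
	if !acceptOther {
		return "", false
	}
	// Fall back to the respective other format.
	other := "jpeg"
	if current == "jpeg" {
		other = "png"
	}
	if name := base + "." + other; exists(name) {
		return name, true
	}
	return "", false
}

func main() {
	// Pretend only a JPEG screenshot is stored.
	stored := map[string]bool{"example.jpeg": true}
	exists := func(name string) bool { return stored[name] }

	name, ok := findExisting("example", "png", true, exists)
	fmt.Println(name, ok) // prints: example.jpeg true

	_, ok = findExisting("example", "png", false, exists)
	fmt.Println(ok) // prints: false
}
```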

See also `ImageOverwrite()`.

func AvoidJSfile added in v0.4.0

func AvoidJSfile() string

AvoidJSfile returns the name of the path/file containing hosts/domains where to avoid running JavaScript.

NOTE: This value is used only if the `JavaScript()` property is `true`.

func CertErrors

func CertErrors() bool

CertErrors returns whether to skip sites with certificate errors; defaults to `false` which in consequence ignores such errors.

func Cookies

func Cookies() bool

Cookies returns whether to allow web cookies during page retrieval; defaults to `false` for safety and speed reasons.

func CreateImage

func CreateImage(aURL string) (string, error)

CreateImage generates an image of `aURL` and stores it in `ImageDir()`, returning the file name of the saved image or an error in case of problems.

In case the `ImageAge()` or `AcceptOther()` properties determine that the requested screenshot image already exists this function does not in fact create another screenshot but returns that existing filename. See also the comments to the `SetAcceptOther()` function.

`aURL` The address of the web page to process.

func ImageAge

func ImageAge() int64

ImageAge returns the maximum age (in hours) of the locally stored screenshot images.

func ImageDir

func ImageDir() string

ImageDir returns the directory to store the generated screenshot images.

func ImageHeight

func ImageHeight() int

ImageHeight is the max. height of the virtual screen used to render. The initial default value is `768`.

NOTE: This is the max. height of the screenshot. Depending on the actual web-site and its rendering by the used 'Chrome' instance the generated image's height could be less.

The value `0` (zero) renders the entire page top to bottom, calculating the actual height from the page content.

func ImageOverwrite added in v0.3.0

func ImageOverwrite() bool

ImageOverwrite returns whether an existing file should be overwritten.

By default (i.e. with this value `false`) `CreateImage()` will not replace an already existing image file with a new screenshot. With this property set to `true` the `CreateImage()` function will overwrite any existing file regardless of e.g. age (see `ImageAge()`) or quality (see `ImageQuality()`).

func ImageQuality

func ImageQuality() int

ImageQuality returns the desired image quality.

func ImageScale

func ImageScale() float64

ImageScale returns the virtual browser's scale factor for the generated screenshot image.

func ImageType

func ImageType() string

ImageType returns the type/format of the screenshot file generated.

NOTE: The image type/format depends on the given `ImageQuality()`: `quality == 100` results in a `png` image, `quality < 100` results in a `jpeg` image.

If the URL to shoot points to an image file (i.e. ".gif", ".jpeg", ".jpg", ".png", ".svg") the result of this function might be _wrong_ because the actually generated image depends on the type of the requested image.
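The two rules above can be expressed in a few lines. The helpers `imageTypeFor` and `pointsToImage` are hypothetical names for this sketch; the extension list is the one quoted above, and quality values outside `1`..`100` are not covered by the documentation, so they are not treated specially here.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
	"strings"
)

// imageTypeFor maps a quality setting to the documented image format.
func imageTypeFor(quality int) string {
	if quality == 100 {
		return "png"
	}
	return "jpeg"
}

// pointsToImage reports whether aURL addresses an image file, in which
// case the format predicted by imageTypeFor might be wrong.
func pointsToImage(aURL string) bool {
	u, err := url.Parse(aURL)
	if err != nil {
		return false
	}
	switch strings.ToLower(path.Ext(u.Path)) {
	case ".gif", ".jpeg", ".jpg", ".png", ".svg":
		return true
	}
	return false
}

func main() {
	fmt.Println(imageTypeFor(100), imageTypeFor(75))           // prints: png jpeg
	fmt.Println(pointsToImage("https://example.com/logo.svg")) // prints: true
}
```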

func ImageWidth

func ImageWidth() int

ImageWidth is the width in pixels of the imaginary screen used to render. The default value is `896`.

NOTE: This is the max. width of the screenshot. Depending on the actual web-site and its rendering by the running 'Chrome' instance the generated image could be smaller.

func JavaScript

func JavaScript() bool

JavaScript returns whether to allow JavaScript during page retrieval; defaults to `false` for safety and speed reasons.

func MaxProcessTime

func MaxProcessTime() int64

MaxProcessTime returns the timeout (in seconds) used to retrieve & render a requested web page. The initial default value is `32`.

func Mobile

func Mobile() bool

Mobile returns whether the virtual browser should emulate a mobile device.

func NeedJSfile added in v0.4.0

func NeedJSfile() string

NeedJSfile returns the name of the path/file containing hosts/domains requiring JavaScript to be active/working.

NOTE: This value is used only if the `JavaScript()` option is set `false`.
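Taken together, the notes on `AvoidJSfile()` and `NeedJSfile()` suggest a per-host decision like the one sketched below. How the list files are parsed and how hosts are matched (exact name or domain suffix) is not described here, so this example assumes exact matches against in-memory sets; `jsEnabledFor` is a hypothetical name.

```go
package main

import "fmt"

// jsEnabledFor decides whether JavaScript should run for host.
// jsDefault mirrors the JavaScript() setting; avoid and need stand in
// for the contents of the two host list files.
func jsEnabledFor(host string, jsDefault bool, avoid, need map[string]bool) bool {
	if jsDefault {
		// JavaScript is on: only the avoid list can switch it off.
		return !avoid[host]
	}
	// JavaScript is off: only the need list can switch it on.
	return need[host]
}

func main() {
	avoid := map[string]bool{"heavy.example.org": true}
	need := map[string]bool{"app.example.com": true}

	fmt.Println(jsEnabledFor("heavy.example.org", true, avoid, need))  // false
	fmt.Println(jsEnabledFor("app.example.com", false, avoid, need))   // true
	fmt.Println(jsEnabledFor("plain.example.net", false, avoid, need)) // false
}
```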

func PathFile

func PathFile(aURL string) string

PathFile returns the complete local path/file of `aURL`.

NOTE: This function does not check whether the image file for `aURL` actually exists in the local filesystem but just reports the default path-/filename computed by string operations.

`aURL` The address of the web page to process.

func Platform

func Platform() string

Platform returns the text the JS `navigator.platform` should return.

NOTE: This value is used only if the `JavaScript()` option is set `true`.

func ReadWaitTime added in v0.1.2

func ReadWaitTime() int64

ReadWaitTime returns the number of minutes to wait before an Avoid/Need hosts file is re-read. The initial default value is `1`.

func Scrollbars

func Scrollbars() bool

Scrollbars returns whether the virtual browser will show scrollbars (if available in web-page).

func SetAcceptOther added in v0.2.0

func SetAcceptOther(doUse bool)

SetAcceptOther sets whether to accept the respective other image format.

(See comments to the `AcceptOther()` function.)

`doUse` If `true` (i.e. the default) an existing screenshot image of the "other" format will be accepted.

func SetAvoidJSfile added in v0.4.0

func SetAvoidJSfile(aFilename string)

SetAvoidJSfile configures the name of the file containing hosts/domains where to avoid running JavaScript.

NOTE: This value is used only if the `JavaScript()` property is `true`. An invalid filename disables the feature.

`aFilename` The path/filename of sites with JavaScript to avoid.

func SetCertErrors

func SetCertErrors(doIgnore bool)

SetCertErrors determines whether to skip sites with certificate errors or process the respective page anyway.

`doIgnore` If `false` (i.e. the default) all certificate errors will be ignored and web-sites will be processed regardless of such errors.

func SetCookies

func SetCookies(doAllow bool)

SetCookies determines whether to allow web cookies during page retrieval or not.

`doAllow` If `false` (i.e. the default) no cookies will be available during page retrieval, otherwise (i.e. `true`) they will be used.

func SetImageAge

func SetImageAge(aMaxAge int64)

SetImageAge sets the maximum age of locally stored screenshot images before they may get updated by a new call to `CreateImage(…)`.

Usually you'll want this property at its default value (`0`, zero) which disables an age check because usually you want an image of the page at the time you linked to it.

`aMaxAge` is the age (in hours) a page image can have before requesting it again.

func SetImageDir

func SetImageDir(aDirectory string)

SetImageDir sets the directory to use for storing the generated screenshot images.

If `aDirectory` is empty or invalid the system's temp directory is used.

`aDirectory` The directory to store the generated images.

func SetImageHeight

func SetImageHeight(aHeight int)

SetImageHeight sets the height in pixels of the screenshot images to generate. The initial default value is `768`.

See comments of `ImageHeight()`.

Setting this value to `0` will result in an image containing the whole web-page (which might be quite long); so the actual height of the generated screenshot would be unpredictable.

`aHeight` The new height of the images to generate.

func SetImageOverwrite added in v0.3.0

func SetImageOverwrite(doAllow bool)

SetImageOverwrite sets whether an existing file should be overwritten.

See comments of `ImageOverwrite()`.

`doAllow` Whether an existing file should be overwritten.

func SetImageQuality

func SetImageQuality(aQuality int)

SetImageQuality changes the quality of the screenshot image to be generated. Values are supported between `1` and `100`; default is `75`.

`aQuality` the new desired image quality.

func SetImageScale

func SetImageScale(aFactor float64)

SetImageScale sets the virtual browser's scale factor for the generated screenshot image.

`aFactor` the new scale factor; `0` disables scaling.

func SetImageWidth

func SetImageWidth(aWidth int)

SetImageWidth sets the width of the images to generate. The initial default value is `896`.

See comments of `ImageWidth()`.

`aWidth` The new width of the images to generate.

func SetJavaScript

func SetJavaScript(doAllow bool)

SetJavaScript determines whether to activate the JavaScript engine during page retrieval or not.

`doAllow` If `false` (i.e. the default) no JavaScript will be available during page retrieval, otherwise (i.e. `true`) it will be activated.

func SetMaxProcessTime

func SetMaxProcessTime(aProcessTime int64)

SetMaxProcessTime sets the timeout used to retrieve & render a requested web page.

NOTE: An invalid (i.e. negative) value or `0` (zero) resets the timeout to its default of 32 seconds.

`aProcessTime` The new max. seconds allowed to process a web page.

func SetMobile

func SetMobile(aMobile bool)

SetMobile sets whether to emulate a mobile device. This includes viewport meta tag, overlay scrollbars, text autosizing and more.

`aMobile` Whether the virtual browser should emulate a mobile device.

func SetNeedJSfile added in v0.4.0

func SetNeedJSfile(aFilename string)

SetNeedJSfile configures the name of the file containing hosts/domains requiring JavaScript to be active/working.

NOTE: This value is used only if the `JavaScript()` option is set `false`. An invalid filename disables the feature.

`aFilename` The path/filename of sites with required JavaScript.

func SetPlatform

func SetPlatform(aPlatform string)

SetPlatform sets the text the JS `navigator.platform` should return.

NOTE: This value is used only if the `JavaScript()` option is set `true`.

`aPlatform` The platform identifier to use for `navigator.platform`.

func SetReadWaitTime added in v0.1.2

func SetReadWaitTime(aMinutes int64)

SetReadWaitTime sets the number of minutes to wait before an Avoid/Need hosts file is re-read.

Usually you'll want this property at its default value (`1`, one), which seems to be a reasonable compromise between batch processing (i.e. looping through a list of URLs to process) and limiting disk accesses. An invalid (i.e. negative) value or `0` (zero) resets this property to its default of `1` (one) minute.

`aMinutes` is the number of minutes to wait before an Avoid/Need hosts file is re-read.

func SetScrollbars

func SetScrollbars(aScrollbar bool)

SetScrollbars sets whether the virtual browser will show scrollbars (if available in web-page).

NOTE: This feature is currently considered EXPERIMENTAL and might not work as expected.

`aScrollbar` Flag whether to show scrollbars (if available).

func SetUserAgent

func SetUserAgent(anAgent string)

SetUserAgent changes the current `User Agent` setting to `anAgent`.

NOTE: This value is used by the virtual browser in its page requests (and showing up in the page provider's logfile); if the `JavaScript()` option is set `true` the JS-engine will return this value if requested.

An invalid (empty) value resets this property to its current default of `Mozilla/5.0 (X11; Linux x86_64; rv:89.0) Gecko/20100101 Firefox/89.0`.

`anAgent` The new `User Agent` setting.

func String

func String() string

String returns a string of lines showing the currently configured screenshot options.

func UserAgent

func UserAgent() string

UserAgent returns the current `User Agent` setting.

NOTE: This value is used only if the `JavaScript()` option is set `true`.

Types

type TScreenshotParams added in v0.4.2

type TScreenshotParams struct {
	// Flag whether to accept the respective other image format
	AcceptOther bool

	// Flag whether certificate errors should be ignored.
	CertErrors bool

	// Dis-/Allow use of web cookies
	Cookies bool

	// Path/filename of a list of web hosts/domains where JavaScript
	// running should be avoided (defaults to a file in user's homedir).
	HostsAvoidJSfile string

	// Path/filename of a list of web hosts/domains where JavaScript
	// is required to work (defaults to a file in user's homedir).
	HostsNeedJSfile string

	// Max. age of cached page screenshot images (in hours).
	ImageAge int64

	// Directory to store the generated screenshot images.
	ImageDir string

	// Max. height of the screenshot image to generate.
	ImageHeight int

	// Dis-/Allow to overwrite pre-existing screenshot files.
	ImageOverwrite bool

	// Quality (in percent) of the screenshot image to generate.
	ImageQuality int

	// The virtual browser's scale factor value.
	// 0 disables the override.
	ImageScale float64

	// Max. width of the screenshot image to generate.
	ImageWidth int

	// Flag whether to dis-/allow JavaScript in retrieved pages.
	JavaScript bool

	// Timeout (in seconds) for page processing.
	MaxProcessTime int64

	// Flag whether to emulate a mobile device or not.
	// This includes viewport meta tag, overlay scrollbars, text
	// autosizing and more.
	Mobile bool

	// The identifier the JavaScript `navigator.platform` should return.
	Platform string

	// Flag whether to show the scraped web-page's scrollbars.
	Scrollbars bool

	// User Agent to use when querying external sites.
	UserAgent string
}

TScreenshotParams bundles all available configuration options so that they can be applied in a single call to its `Do()` method.

func Options

func Options() *TScreenshotParams

Options returns the currently configured screenshot options.

func (*TScreenshotParams) Do added in v0.4.2

Do uses its options' values to configure the runtime options for taking screenshots.

NOTE: While it is perfectly legal (from Go's point of view) to omit the fields you don't care about, please be aware that those missing fields will nevertheless be set by Go to the respective data type's zero value. Since there's no way to distinguish the automatically set zero value of a missing field from a user-provided value, you have to handle such a situation carefully.

Depending on the number of options you want to set, you might prefer calling the various `SetXxxx()` functions (if fewer than half of the available options are to be set). Or, if you want to set the majority of the options, provide the options you do not want to change with their existing values by calling the respective getter function of the option in question, like:

myOptions := &screenshot.TScreenshotParams{
	// set fields …
	ImageHeight:  myHeightValue,
	ImageQuality: myQualityValue,
	// …
	// say, you don't want to change the width option
	ImageWidth:   screenshot.ImageWidth(),
}
myOptions.Do()
// continue with your program …
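The zero-value pitfall can be reproduced with a self-contained stand-in. In the sketch below `params`, `options()` and `do()` are hypothetical, cut-down counterparts of `TScreenshotParams`, `Options()` and `Do()`; the defaults are the documented ones, while the copy semantics of `options()` are an assumption made for the example.

```go
package main

import "fmt"

// params is a cut-down stand-in for TScreenshotParams.
type params struct {
	ImageHeight  int
	ImageQuality int
	ImageWidth   int
}

// current holds the active settings, initialised with the documented defaults.
var current = params{ImageHeight: 768, ImageQuality: 75, ImageWidth: 896}

// options returns a copy of the active settings.
func options() *params {
	c := current
	return &c
}

// do makes the receiver's values the active settings, zero values included.
func (p *params) do() { current = *p }

func main() {
	// Start from the current values, then change only what you need.
	opts := options()
	opts.ImageHeight = 1024
	opts.do()
	fmt.Println(current.ImageWidth) // prints: 896

	// A partial literal silently resets the omitted fields to zero.
	(&params{ImageHeight: 1024}).do()
	fmt.Println(current.ImageWidth) // prints: 0
}
```

With the real package, the first variant would presumably start from `screenshot.Options()`, change individual fields and then call `Do()`, which avoids listing every unchanged option by hand.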

