chromedl

package module
v0.1.1
Published: May 8, 2021 License: MIT Imports: 15 Imported by: 0

README

========================
 Chrome File Downloader
========================

.. contents::
   :depth: 2

The sole purpose of this package is to download files from the Internet with
headless Chrome, bypassing Cloudflare and possibly some other annoying browser
checks.

It does so by implementing the solutions posted in "`bypass headless chrome
detection issue`_" for chromedp_.

This library may help when other download methods, e.g. curl or the standard
`http.Get()`, don't work.

The implementation is based on this `chromedp example`_.

Thanks to `@ZekeLu`_ for huge help in getting this going.

Compatibility
-------------

Tested with:

* Chrome (stable) v90.0.4430.93.
* github.com/chromedp/chromedp v0.6.12
* github.com/chromedp/cdproto v0.0.0-20210323015217-0942afbea50e

Newer versions of Chrome will require some code changes, as described in
`this issue`_. The package uses calls that are deprecated in newer protocol
versions in order to stay compatible with the stable version of Chrome listed
above.

When using the headless-shell Docker image, please use the following tag::

  FROM chromedp/headless-shell:90.0.4430.93


LICENCES
--------
chromedp_: Copyright (c) 2016-2020 Kenneth Shaw


.. _`this issue`: https://github.com/chromedp/chromedp/issues/807
.. _`chromedp example`: https://github.com/chromedp/examples/tree/master/download_file
.. _`@ZekeLu`: https://github.com/ZekeLu
.. _chromedp: https://github.com/chromedp/chromedp
.. _`bypass headless chrome detection issue`: https://github.com/chromedp/chromedp/issues/396

Documentation

Overview

Package chromedl uses chromedp to download files. It may come in handy when one needs to get a file from a protected website that blocks regular methods such as curl or http.Get().

It is heavily based on https://github.com/chromedp/examples/tree/master/download_file with minor modifications.

Constants

const DefaultUA = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36"

DefaultUA is the default user agent string used by the browser instance. It can be changed, for example with the OptUserAgent option.

Variables

var ErrNoChrome = errors.New("no chrome instance in the context")

ErrNoChrome indicates that there's no chrome instance in the context.

Functions

func Download added in v0.1.1

func Download(ctx context.Context, uri string, opts ...Option) (io.Reader, error)

Download downloads a file from the provided uri using chromedp. It returns a reader with the buffered file contents, and an error if any. If the error is non-nil, the reader may still be non-nil when the file was downloaded and read successfully. The file is first stored in a temporary directory; once the download completes, it is buffered and Download attempts to clean up. No timeout is set by default, so set one on the context if required. Configuration options for the downloader can optionally be passed.

Example
const rbnzRates = "https://www.rbnz.govt.nz/-/media/ReserveBank/Files/Statistics/tables/b1/hb1-daily.xlsx?revision=5fa61401-a877-4607-b7ae-2e060c09935d"
r, err := Download(context.Background(), rbnzRates)
if err != nil {
	log.Fatal(err)
}
data, err := ioutil.ReadAll(r)
if err != nil {
	log.Fatal(err)
}
fmt.Printf("file size > 0: %v\n", len(data) > 0)
fmt.Printf("file signature: %s\n", string(data[0:2]))
Output:

file size > 0: true
file signature: PK

func Get

func Get(url string) (*http.Response, error)

Get is a drop-in replacement for http.Get.

Types

type Instance added in v0.1.0

type Instance struct {
	// contains filtered or unexported fields
}

Instance is the browser instance that will be used for downloading files.

func New added in v0.1.0

func New(options ...Option) (*Instance, error)

New creates a new Instance and starts a headless Chrome to perform the downloads. Once finished, call Stop to terminate the browser.

func NewWithChromeCtx added in v0.1.1

func NewWithChromeCtx(taskCtx context.Context, options ...Option) (*Instance, error)

NewWithChromeCtx creates a new Instance for an existing browser instance. Stop will not terminate the browser, but will cancel the event listener.

func (*Instance) Download added in v0.1.1

func (bi *Instance) Download(ctx context.Context, uri string) (io.Reader, error)

Download downloads the file and returns a reader with its contents.

func (*Instance) Get added in v0.1.0

func (bi *Instance) Get(url string) (*http.Response, error)

Get partly emulates http.Get and is meant to be a drop-in replacement for http.Get in the caller's code.

func (*Instance) Stop added in v0.1.0

func (bi *Instance) Stop() error

type Option added in v0.1.0

type Option func(*config)

func OptUserAgent added in v0.1.0

func OptUserAgent(ua string) Option

OptUserAgent allows setting the user agent for the browser.
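The Option type follows the common functional-options pattern: each option is a function that mutates an unexported config. The sketch below illustrates the pattern in isolation. The config field, the default value, optUserAgent and apply are all assumptions made for illustration, not the package's actual internals.

```go
package main

import "fmt"

// config is unexported in the real package; the single field here is an
// assumption made for this sketch.
type config struct {
	userAgent string
}

// Option has the same shape as the documented type.
type Option func(*config)

// optUserAgent is a hypothetical re-implementation of an option constructor.
func optUserAgent(ua string) Option {
	return func(c *config) { c.userAgent = ua }
}

// apply starts from defaults and lets each option override them in order.
func apply(opts ...Option) config {
	c := config{userAgent: "default-agent"} // placeholder default
	for _, opt := range opts {
		opt(&c)
	}
	return c
}

func main() {
	fmt.Println(apply().userAgent)
	fmt.Println(apply(optUserAgent("my-agent/1.0")).userAgent)
}
```

With the real package, options are passed to New, NewWithChromeCtx or Download, for example New(OptUserAgent("my-agent/1.0")).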
