grab

v1.0.0
Published: Apr 9, 2017 License: MIT Imports: 19 Imported by: 0

README

grab

Downloading the internet, one goroutine at a time!

$ go get github.com/cavaliercoder/grab

Grab is a Go package for downloading files from the internet with the following rad features:

  • Monitor download progress asynchronously
  • Auto-resume incomplete downloads
  • Guess filename from content header or URL path
  • Safely cancel downloads
  • Validate downloads using checksums
  • Download batches of files asynchronously

For a full walkthrough, see: http://cavaliercoder.com/blog/downloading-large-files-in-go.html

Requires Go v1.4+

Example

The following code can be used to create a cut-down 'wget'-like binary which simply downloads each URL given on the command line to the current working directory.

Files are downloaded three at a time with progress updates printed periodically.

package main

import (
	"fmt"
	"github.com/cavaliercoder/grab"
	"os"
	"time"
)

func main() {
	// validate command args
	if len(os.Args) < 2 {
		fmt.Fprintf(os.Stderr, "usage: %s url [url]...\n", os.Args[0])
		os.Exit(1)
	}

	// create a custom client
	client := grab.NewClient()
	client.UserAgent = "Grab example"

	// create request for each URL given on the command line
	reqs := make([]*grab.Request, 0)
	for _, url := range os.Args[1:] {
		req, err := grab.NewRequest(url)
		if err != nil {
			fmt.Fprintf(os.Stderr, "%v\n", err)
			os.Exit(1)
		}

		reqs = append(reqs, req)
	}

	// start file downloads, 3 at a time
	fmt.Printf("Downloading %d files...\n", len(reqs))
	respch := client.DoBatch(3, reqs...)

	// start a ticker to update progress every 200ms
	t := time.NewTicker(200 * time.Millisecond)

	// monitor downloads
	completed := 0
	inProgress := 0
	responses := make([]*grab.Response, 0)
	for completed < len(reqs) {
		select {
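		case resp := <-respch:
			// a new response has been received and has started downloading
			if resp != nil {
				responses = append(responses, resp)
			} else {
				// grab has closed the channel: every download has started.
				// A closed channel is always ready to receive, so set it to
				// nil to stop this case from firing on every loop iteration.
				respch = nil
			}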

		case <-t.C:
			// clear lines
			if inProgress > 0 {
				fmt.Printf("\033[%dA\033[K", inProgress)
			}

			// update completed downloads
			for i, resp := range responses {
				if resp != nil && resp.IsComplete() {
					// print final result
					if resp.Error != nil {
						fmt.Fprintf(os.Stderr, "Error downloading %s: %v\n", resp.Request.URL(), resp.Error)
					} else {
						fmt.Printf("Finished %s %d / %d bytes (%d%%)\n", resp.Filename, resp.BytesTransferred(), resp.Size, int(100*resp.Progress()))
					}

					// mark completed
					responses[i] = nil
					completed++
				}
			}

			// update downloads in progress
			inProgress = 0
			for _, resp := range responses {
				if resp != nil {
					inProgress++
					fmt.Printf("Downloading %s %d / %d bytes (%d%%)\033[K\n", resp.Filename, resp.BytesTransferred(), resp.Size, int(100*resp.Progress()))
				}
			}
		}
	}

	t.Stop()

	fmt.Printf("%d downloads finished.\n", completed)
}

License

Copyright (c) 2015 Ryan Armstrong

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Documentation

Overview

Package grab provides an HTTP client implementation specifically geared for downloading large files, with progress feedback, pause and resume, and checksum validation features.

For a full walkthrough, see: http://cavaliercoder.com/blog/downloading-large-files-in-go.html

Please log any issues at: https://github.com/cavaliercoder/grab/issues

If the given destination path for a transfer request is a directory, the file transfer will be stored in that directory and the file's name will be determined using Content-Disposition headers in the server's response or from the last segment of the path of the URL.

An empty destination string or "." means the transfer will be stored in the current working directory.
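The filename fallback described above can be sketched with the standard library alone. This is a minimal illustration, not grab's actual implementation: `guessFilename` and `usable` are hypothetical helpers that prefer the `filename` parameter of a Content-Disposition header and fall back to the last segment of the URL path.

```go
package main

import (
	"fmt"
	"mime"
	"net/url"
	"path"
)

// usable reports whether path.Base produced a real file name rather than one
// of its placeholder results for empty, root or parent paths.
func usable(name string) bool {
	return name != "" && name != "." && name != "/" && name != ".."
}

// guessFilename is a hypothetical helper (an assumption, not part of grab).
// It tries the Content-Disposition header first, then the URL path.
func guessFilename(contentDisposition, rawURL string) (string, error) {
	if contentDisposition != "" {
		if _, params, err := mime.ParseMediaType(contentDisposition); err == nil {
			// path.Base strips any directory components a server might send.
			if name := path.Base(params["filename"]); usable(name) {
				return name, nil
			}
		}
	}
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	if name := path.Base(u.Path); usable(name) {
		return name, nil
	}
	return "", fmt.Errorf("no filename could be determined for %q", rawURL)
}

func main() {
	name, err := guessFilename(`attachment; filename="report.pdf"`, "http://example.com/download?id=1")
	fmt.Println(name, err)

	name, err = guessFilename("", "http://example.com/files/archive.tar.gz")
	fmt.Println(name, err)

	// Neither source yields a name here, which is the situation that
	// IsNoFilename is documented to report.
	_, err = guessFilename("", "http://example.com/")
	fmt.Println(err)
}
```

Checking the header before the URL matters for download endpoints such as `/download?id=1`, where the path segment says nothing about the file being served.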

If a destination file already exists, grab will assume it is a complete or partially complete download of the requested file. If the remote server supports resuming interrupted downloads, grab will resume downloading from the end of the partial file. If the server does not support resumed downloads, the file will be retransferred in its entirety. If the file is already complete, grab will return successfully.

Variables

var DefaultClient = NewClient()

DefaultClient is the default client and is used by all Get convenience functions.

Functions

func GetAsync

func GetAsync(dst, src string) (<-chan *Response, error)

GetAsync sends a file transfer request and returns a channel to receive the file transfer response context.

The Response is sent via the returned channel, and the channel is closed, as soon as the HTTP/1.1 GET request has been served and before the file transfer begins.

The Response may then be used to monitor the progress of the file transfer while it is in process.

Any error which occurs during the file transfer will be set in the returned Response.Error field as soon as the Response.IsComplete method returns true.

GetAsync is a wrapper for DefaultClient.DoAsync.

func GetBatch

func GetBatch(workers int, dst string, sources ...string) (<-chan *Response, error)

GetBatch executes multiple requests with the given number of workers and immediately returns a channel to receive the Responses as they become available. Excess requests are queued until a worker becomes available. The channel is closed once all responses have been sent.

GetBatch requires that the destination path is an existing directory. If not, an error is returned which may be identified with IsBadDestination.

If zero is given as the worker count, one worker will be created for each given request and all requests will start at the same time.

Each response is sent through the channel once the request is initiated via HTTP GET or an error has occurred, but before the file transfer begins.

Any error which occurs during any of the file transfers will be set in the associated Response.Error field as soon as the Response.IsComplete method returns true.

GetBatch is a wrapper for DefaultClient.DoBatch.

func IsBadDestination

func IsBadDestination(err error) bool

IsBadDestination returns a boolean indicating whether the error is known to report that the given destination path is not valid for the requested operation.

func IsChecksumMismatch

func IsChecksumMismatch(err error) bool

IsChecksumMismatch returns a boolean indicating whether the error is known to report that the downloaded file did not match the expected checksum value.

func IsContentLengthMismatch

func IsContentLengthMismatch(err error) bool

IsContentLengthMismatch returns a boolean indicating whether the error is known to report that an HTTP response indicated that the requested file is not the expected length.

func IsNoFilename

func IsNoFilename(err error) bool

IsNoFilename returns a boolean indicating whether the error is known to report that a destination filename could not be determined from the Content-Disposition headers of an HTTP response or the requested URL path.

Types

type Client

type Client struct {
	// HTTPClient specifies the http.Client which will be used for communicating
	// with the remote server during the file transfer.
	HTTPClient *http.Client

	// UserAgent specifies the User-Agent string which will be set in the
	// headers of all requests made by this client.
	//
	// The user agent string may be overridden in the headers of each request.
	UserAgent string
}

A Client is a file download client.

Clients are safe for concurrent use by multiple goroutines.

func NewClient

func NewClient() *Client

NewClient returns a new file download Client, using default configuration.

func (*Client) CancelRequest

func (c *Client) CancelRequest(req *Request)

CancelRequest cancels an in-flight request by closing its connection.

func (*Client) Do

func (c *Client) Do(req *Request) (*Response, error)

Do sends a file transfer request and returns a file transfer response context, following policy (e.g. redirects, cookies, auth) as configured on the client's HTTPClient.

An error is returned if caused by client policy (such as CheckRedirect), or if there was an HTTP protocol error.

Do is a synchronous, blocking operation which returns only once a download request is completed or fails. For non-blocking operations which enable the monitoring of transfers in process, see DoAsync and DoBatch.

func (*Client) DoAsync

func (c *Client) DoAsync(req *Request) <-chan *Response

DoAsync sends a file transfer request and returns a channel to receive the file transfer response context.

The Response is sent via the returned channel, and the channel is closed, as soon as the HTTP/1.1 GET request has been served and before the file transfer begins.

The Response may then be used to monitor the progress of the file transfer while it is in process.

Any error which occurs during the file transfer will be set in the returned Response.Error field as soon as the Response.IsComplete method returns true.

func (*Client) DoBatch

func (c *Client) DoBatch(workers int, reqs ...*Request) <-chan *Response

DoBatch executes multiple requests with the given number of workers and immediately returns a channel to receive the Responses as they become available. Excess requests are queued until a worker becomes available. The channel is closed once all responses have been sent.

If zero is given as the worker count, one worker will be created for each given request and all requests will start at the same time.

Each response is sent through the channel once the request is initiated via HTTP GET or an error has occurred, but before the file transfer begins.

Any error which occurs during any of the file transfers will be set in the associated Response.Error field as soon as the Response.IsComplete method returns true.
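The worker semantics described above follow the usual bounded worker-pool pattern in Go. The following sketch illustrates that pattern only and makes no claim about how grab implements it: `runBatch` is a hypothetical function in which strings stand in for requests and responses.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// runBatch is a hypothetical stand-in for a batch call. It processes jobs
// with the given number of workers, returns the results channel immediately,
// and closes it once every result has been sent. A worker count of zero
// starts one worker per job.
func runBatch(workers int, jobs []string, work func(string) string) <-chan string {
	if workers <= 0 {
		workers = len(jobs)
	}
	queue := make(chan string)
	results := make(chan string)

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Excess jobs wait in the queue until a worker is free.
			for job := range queue {
				results <- work(job)
			}
		}()
	}

	go func() {
		for _, job := range jobs {
			queue <- job
		}
		close(queue)
		wg.Wait()
		close(results) // signals the caller that nothing more will arrive
	}()

	return results
}

func main() {
	urls := []string{"a.iso", "b.iso", "c.iso", "d.iso"}
	for res := range runBatch(3, urls, strings.ToUpper) {
		fmt.Println("started:", res)
	}
}
```

Because the results channel is closed at the end, the caller can simply range over it. Results arrive in completion order, not in the order the jobs were submitted.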

func (*Client) DoChannel

func (c *Client) DoChannel(workers int, reqs <-chan *Request) <-chan *Response

DoChannel executes multiple requests with the given number of workers and immediately returns a channel to receive the Responses as they become available. Excess requests are queued until a worker becomes available. The channel is closed once the reqs channel is closed and all responses have been sent.

If zero is given as the worker count, one worker will be created.

Each response is sent through the channel once the request is initiated via HTTP GET or an error has occurred, but before the file transfer begins.

Any error which occurs during any of the file transfers will be set in the associated Response.Error field as soon as the Response.IsComplete method returns true.

type Request

type Request struct {
	// Label is an arbitrary string which may be used to label a Request with a
	// user friendly name.
	Label string

	// Tag is an arbitrary interface which may be used to relate a Request to
	// other data.
	Tag interface{}

	// HTTPRequest specifies the http.Request to be sent to the remote server to
	// initiate a file transfer. It includes request configuration such as URL,
	// protocol version, HTTP method, request headers and authentication.
	HTTPRequest *http.Request

	// Filename specifies the path where the file transfer will be stored in
	// local storage.
	//
	// An empty string means the transfer will be stored in the current working
	// directory.
	Filename string

	// CreateMissing specifies that any missing directories in the Filename path
	// should be automatically created.
	CreateMissing bool

	// SkipExisting specifies that any file which already exists at the given
	// Filename path will be naively skipped, without checking file size or
	// checksum.
	SkipExisting bool

	// Size specifies the expected size of the file transfer if known. If the
	// server response size does not match, the transfer is cancelled and an
	// error returned.
	Size uint64

	// BufferSize specifies the size in bytes of the buffer that is used for
	// transferring the requested file. Larger buffers may result in faster
	// throughput but will use more memory and result in less frequent updates
	// to the transfer progress statistics. Default: 4096.
	BufferSize uint

	// Hash specifies the hashing algorithm that will be used to compute the
	// checksum value of the transferred file.
	//
	// If Checksum or Hash is nil, no checksum validation occurs.
	Hash hash.Hash

	// Checksum specifies the expected checksum value of the transferred file.
	//
	// If Checksum or Hash is nil, no checksum validation occurs.
	Checksum []byte

	// RemoveOnError specifies that any completed download should be deleted if
	// it fails checksum validation.
	RemoveOnError bool

	// NotifyOnClose specifies a channel that will be notified when the requested
	// transfer is completed, either successfully or with an error.
	NotifyOnClose chan<- *Response
	// contains filtered or unexported fields
}

A Request represents an HTTP file transfer request to be sent by a Client.

func NewRequest

func NewRequest(urlStr string) (*Request, error)

NewRequest returns a new file transfer Request suitable for use with Client.Do.

func (*Request) SetChecksum

func (c *Request) SetChecksum(algorithm string, checksum []byte) error

SetChecksum sets the expected checksum value and hashing algorithm to use when validating a completed file transfer.

The following hashing algorithms are supported:

md5
sha1
sha256
sha512

func (*Request) URL

func (c *Request) URL() *url.URL

URL returns the URL to be requested from the remote server.

type Response

type Response struct {

	// The Request that was sent to obtain this Response.
	Request *Request

	// HTTPResponse specifies the HTTP response received from the remote server.
	//
	// The response Body should not be used as it will be consumed and closed by
	// grab.
	HTTPResponse *http.Response

	// Filename specifies the path where the file transfer is stored in local
	// storage.
	Filename string

	// Size specifies the total expected size of the file transfer.
	Size uint64

	// Error specifies any error that may have occurred during the file
	// transfer.
	//
	// This should not be read until IsComplete returns true.
	Error error

	// Start specifies the time at which the file transfer started.
	Start time.Time

	// End specifies the time at which the file transfer completed.
	//
	// This should not be read until IsComplete returns true.
	End time.Time

	// DidResume specifies that the file transfer resumed a previously
	// incomplete transfer.
	DidResume bool
	// contains filtered or unexported fields
}

Response represents the response to a completed or in-process download request.

For asynchronous operations, the Response also provides context for the file transfer while it is in process. All functions are safe to use from multiple goroutines.

func Get

func Get(dst, src string) (*Response, error)

Get sends a file transfer request and returns a file transfer response context, following policy (e.g. redirects, cookies, auth) as configured on the client's HTTPClient.

An error is returned if caused by client policy (such as CheckRedirect), or if there was an HTTP protocol error.

Get is a synchronous, blocking operation which returns only once a download request is completed or fails. For non-blocking operations which enable the monitoring of transfers in process, see GetAsync, GetBatch or use a Client.

Get is a wrapper for DefaultClient.Do.

func (*Response) AverageBytesPerSecond

func (c *Response) AverageBytesPerSecond() float64

AverageBytesPerSecond returns the average bytes transferred per second over the duration of the file transfer.

func (*Response) BytesTransferred

func (c *Response) BytesTransferred() uint64

BytesTransferred returns the number of bytes which have already been downloaded, including any data used to resume a previous download.

func (*Response) Duration

func (c *Response) Duration() time.Duration

Duration returns the duration of a file transfer. If the transfer is in process, the duration will be between now and the start of the transfer. If the transfer is complete, the duration will be between the start and end of the completed transfer process.

func (*Response) ETA

func (c *Response) ETA() time.Time

ETA returns the estimated time at which the download will complete. If the transfer has already completed, the actual end time will be returned.

func (*Response) IsComplete

func (c *Response) IsComplete() bool

IsComplete indicates whether the Response transfer context has completed with either a success or failure. If the transfer was unsuccessful, Response.Error will be non-nil.

func (*Response) Progress

func (c *Response) Progress() float64

Progress returns the ratio of bytes which have already been downloaded over the total file size as a fraction of 1.00.

Multiply the returned value by 100 to return the percentage completed.
