httpstream

Version: v0.0.0-...-0f8caf3
Published: Oct 13, 2021 License: Apache-2.0 Imports: 16 Imported by: 0

README

httpstream

httpstream provides parallel download of a single HTTP file that can be streamed incrementally. It uses multiple HTTP connections and byte-range GET requests to do so. The downloaded portions of the file are reassembled in their original order and streamed as soon as a contiguous run of blocks, starting from the beginning of the file, has been received. This allows a simple reader to consume the data as if it had been read serially. Typical use of the package is:

    dl := httpstream.New(ctx, url)
    _, err := io.Copy(os.Stdout, dl)
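
The mechanism described above (byte-range GETs issued in parallel, with the results streamed in their original order) can be illustrated with the standard library alone. The following is a minimal sketch, not the package's implementation: the names splitRanges, fetchRange and download are hypothetical, the file size is assumed to be known in advance, and a local test server stands in for the remote site.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
	"time"
)

// byteRange is an inclusive range of byte offsets, as used by the
// HTTP Range header.
type byteRange struct{ from, to int64 }

// splitRanges divides size bytes into consecutive ranges of at most
// chunk bytes each.
func splitRanges(size, chunk int64) []byteRange {
	var out []byteRange
	for from := int64(0); from < size; from += chunk {
		to := from + chunk - 1
		if to >= size {
			to = size - 1
		}
		out = append(out, byteRange{from, to})
	}
	return out
}

// fetchRange issues a single byte-range GET and returns the body.
func fetchRange(url string, r byteRange) ([]byte, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Range", fmt.Sprintf("bytes=%d-%d", r.from, r.to))
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusPartialContent {
		return nil, fmt.Errorf("unexpected status: %s", resp.Status)
	}
	return io.ReadAll(resp.Body)
}

// download fetches every range in its own goroutine and writes the
// results to w in their original order, as each one becomes available.
func download(w io.Writer, url string, size, chunk int64) error {
	type result struct {
		data []byte
		err  error
	}
	ranges := splitRanges(size, chunk)
	results := make([]chan result, len(ranges))
	for i, r := range ranges {
		results[i] = make(chan result, 1) // buffered so fetchers never block
		go func(ch chan<- result, r byteRange) {
			data, err := fetchRange(url, r)
			ch <- result{data, err}
		}(results[i], r)
	}
	for _, ch := range results {
		res := <-ch
		if res.err != nil {
			return res.err
		}
		if _, err := w.Write(res.data); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	content := strings.Repeat("0123456789", 10)
	// http.ServeContent honours Range requests, so it can stand in for a real site.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		http.ServeContent(w, r, "data.txt", time.Time{}, strings.NewReader(content))
	}))
	defer srv.Close()

	var buf bytes.Buffer
	if err := download(&buf, srv.URL, int64(len(content)), 16); err != nil {
		fmt.Println("download failed:", err)
		return
	}
	fmt.Println("reassembled correctly:", buf.String() == content)
}
```

For brevity the sketch starts one goroutine per range; a real downloader would bound the number of requests in flight and would typically obtain the size from the Content-Length header.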

A command line tool is also provided, github.com/cosnicolaou/httpstream/cmd/httpstream, which can be used as follows:

$ go run ./cmd/httpstream --url=https://dumps.wikimedia.org/wikidatawiki/entities/20191202/wikidata-20191202-all.json.bz2 --output=20191202.bz2 --md5=316e7d034c50f072a1b2738ef366bd76

The combination of parallel download and streaming can significantly reduce the latency of large file downloads when, on a multi-core machine, the download is the first step in a more complex pipeline. For example, this package can be used in conjunction with github.com/cosnicolaou/pbzip2 to overlap download and decompression. For the example file above, the download completes in around 30 minutes vs 2-3 hours. As always, some degree of tuning will be required for each site. Both the number of parallel connections and the size of each byte range request can be controlled, and different values can have dramatic effects on the achieved throughput. Larger values also decrease the frequency of status updates and hence of any progress bar updates.
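
To get a feel for the tuning trade-off, it can help to work out how many byte-range requests a given chunk size implies. The sketch below is illustrative only: numRequests is a hypothetical helper, the 1 GiB file size is made up, and the link to status updates assumes roughly one update per completed chunk.

```go
package main

import "fmt"

// numRequests returns how many byte-range requests are needed to fetch
// size bytes in chunks of at most chunk bytes (ceiling division).
func numRequests(size, chunk int64) int64 {
	if size <= 0 || chunk <= 0 {
		return 0
	}
	return (size + chunk - 1) / chunk
}

func main() {
	const size = 1 << 30 // hypothetical 1 GiB file
	for _, chunk := range []int64{1 << 20, 16 << 20, 64 << 20} {
		fmt.Printf("chunk=%3d MiB -> %4d requests\n", chunk>>20, numRequests(size, chunk))
	}
}
```

Under that assumption, moving from 1 MiB to 64 MiB chunks cuts the number of requests, and therefore of updates, from 1024 to 16 for this file size.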

The package provides support for:

  • verifying the md5 or sha1 of the downloaded file
  • implementing a progress bar by sending progress updates over a channel.

A planned additional feature is the ability to checkpoint and resume a given download.

Documentation

Overview

Copyright 2020 Cosmos Nicolaou. All rights reserved. Use of this source code is governed by the Apache-2.0 license that can be found in the LICENSE file.



Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Downloader

type Downloader struct {
	// contains filtered or unexported fields
}

Downloader represents a concurrent, streaming, http downloader.

func New

func New(ctx context.Context, url string, opts ...Option) *Downloader

New returns a new instance of Downloader.

func (*Downloader) ContentLength

func (dl *Downloader) ContentLength() int64

ContentLength returns the content length header for the file being downloaded.

func (*Downloader) Finish

func (dl *Downloader) Finish()

func (*Downloader) Read

func (dl *Downloader) Read(buf []byte) (int, error)

Read implements io.Reader.

func (*Downloader) Reader

func (dl *Downloader) Reader() io.Reader

Reader returns an io.Reader.

type Option

type Option func(*options)

Option represents an option to New.
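
The signature func(*options) is the usual Go functional-options pattern. The sketch below shows how such options are typically applied; the fields of options, the default values and the applyOptions helper are all hypothetical, since the real struct is unexported and not shown in this documentation.

```go
package main

import "fmt"

// options holds the configurable settings. The field names and the
// defaults used below are assumptions made for illustration.
type options struct {
	concurrency int
	chunksize   int64
}

// Option mutates an options value, matching the documented signature.
type Option func(*options)

// Concurrency returns an Option that sets the number of parallel downloads.
func Concurrency(n int) Option { return func(o *options) { o.concurrency = n } }

// Chunksize returns an Option that sets the size of each byte range.
func Chunksize(n int64) Option { return func(o *options) { o.chunksize = n } }

// applyOptions starts from defaults and applies each option in order,
// so later options override earlier ones.
func applyOptions(opts ...Option) options {
	o := options{concurrency: 4, chunksize: 1 << 20}
	for _, fn := range opts {
		fn(&o)
	}
	return o
}

func main() {
	o := applyOptions(Concurrency(8))
	fmt.Println(o.concurrency, o.chunksize)
}
```

The appeal of the pattern is that a constructor such as New can gain new settings without changing its signature, and callers only name the settings they want to override.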

func Chunksize

func Chunksize(n int64) Option

Chunksize sets the size of each byte range chunk to be requested.

func Concurrency

func Concurrency(n int) Option

Concurrency sets the degree of concurrency to use, that is, the number of download threads.

func SendUpdates

func SendUpdates(ch chan<- Progress) Option

SendUpdates enables sending progress updates to the specified channel.

func Verbose

func Verbose(v bool) Option

Verbose controls verbose logging.

func VerifyMD5

func VerifyMD5(sum string) Option

func VerifySHA1

func VerifySHA1(sum string) Option

type Progress

type Progress struct {
	Size int
}

Progress represents a progress update.
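
A consumer of such updates might look like the sketch below. It makes several assumptions that this documentation does not confirm: that Size is the number of bytes covered by one update, and that the channel is closed once the download ends. Progress is redeclared locally to keep the example self-contained, and trackProgress is a hypothetical helper.

```go
package main

import "fmt"

// Progress mirrors the documented struct. Treating Size as a byte count
// per update is an assumption.
type Progress struct {
	Size int
}

// trackProgress drains ch until it is closed, calling report after each
// update, and returns the total number of bytes reported.
func trackProgress(ch <-chan Progress, total int64, report func(done, total int64)) int64 {
	var done int64
	for p := range ch {
		done += int64(p.Size)
		report(done, total)
	}
	return done
}

func main() {
	ch := make(chan Progress, 4)
	// Simulated producer standing in for the downloader.
	go func() {
		defer close(ch)
		for i := 0; i < 4; i++ {
			ch <- Progress{Size: 256}
		}
	}()
	trackProgress(ch, 1024, func(done, total int64) {
		fmt.Printf("\r%3d%%", done*100/total)
	})
	fmt.Println()
}
```

If the channel is not closed by the producer, the consumer would instead need to stop when the reported total reaches the content length or when the context is cancelled.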

Directories

Path Synopsis
cmd
