chonker

v1.2.9
Published: Feb 16, 2024 License: MIT Imports: 13 Imported by: 0

README

Chonker

Download large files as parallel chunks using HTTP Range Requests in Go.

Chonker works on Go 1.19, oldstable, and stable releases.

What does Chonker do?

Chonker speeds up downloads from cloud services like Amazon S3 & CloudFront. It does this in two ways.

  1. Downloads small pieces of a file (chunks) using HTTP Range requests.
  2. Downloads chunks in parallel.
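
As a rough illustration of the first point, the sketch below sends a single HTTP Range request using only the Go standard library. It does not use Chonker itself. The helper names fetchRange and newTestServer are hypothetical, and the local test server stands in for a real origin such as S3.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
	"time"
)

// fetchRange requests the inclusive byte range [first, last] of url and
// returns the status code and the bytes the server sent back.
// (Hypothetical helper, not part of chonker.)
func fetchRange(client *http.Client, url string, first, last int) (int, []byte, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return 0, nil, err
	}
	req.Header.Set("Range", fmt.Sprintf("bytes=%d-%d", first, last))

	resp, err := client.Do(req)
	if err != nil {
		return 0, nil, err
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	return resp.StatusCode, body, err
}

// newTestServer serves a fixed payload. http.ServeContent honors Range headers.
func newTestServer(payload string) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		http.ServeContent(w, r, "file.txt", time.Time{}, strings.NewReader(payload))
	}))
}

func main() {
	srv := newTestServer("0123456789abcdef")
	defer srv.Close()

	status, body, err := fetchRange(srv.Client(), srv.URL, 4, 7)
	if err != nil {
		panic(err)
	}
	fmt.Println(status, string(body)) // prints "206 4567"
}
```

A server that supports range requests answers with 206 Partial Content and only the requested bytes, which is what makes it possible to fetch a file piece by piece.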

Why?

Chonker allows CDN services to cache and serve files even if the entire file is bigger than the individual object cache limit.

It also overcomes the per-connection limit that blob storage services often have by opening connections in parallel.

The Go standard library HTTP Client downloads files as buffered streams of bytes. The Client fetches bytes into a request buffer as fast as it can, and you read bytes from the buffer as fast as you can.

Why is this a problem?

This works great when one beefy connection to an origin server can use the entire available network bandwidth between you and it. However, blob storage services like Amazon S3 and caching CDNs like Amazon CloudFront impose per-connection limits while supporting an almost unlimited number of connections from each client.

If you are downloading a large file from S3, it's almost always better to download the file in chunks, using parallel connections.

See the S3 Developer Guide and the CloudFront Developer Guide for more information on cache sizes and parallel GETs.

Use Chonker

Use chonker.Do to fetch a response for a request, or create a "chonky" http.Transport that fetches requests using HTTP Range sub-requests.

Chonker integrates well with Go download libraries. Grab and other download managers can use an http.Client with a "chonky" http.Transport. In turn, Chonker functions accept HTTP clients that could provide automatic retries or detailed logs. See Heimdall or go-retryablehttp for more.

Chonk

Chonk is a Go program that uses the chonker library to download a URL into a local file. Run chonk -h for usage details.

go build -o chonk ./cmd/chonk

./chonk https://example.com

test.sh

test.sh is a Bash shell script that exercises the chonk program with a list of files of varying sizes. Run test.sh -h for usage details.

License

The Chonker cat illustration is from Freepik.

Chonker is available under the terms of the MIT license.

See LICENSE for the full license text.

Documentation

Overview

Package chonker implements automatic ranged HTTP requests.

A ranged request is a request that is fetched in chunks using several HTTP requests. Chunks are fetched in separate goroutines by sending HTTP Range requests to the server. Chunks are then concatenated and returned as a single io.Reader. Chunks are chunkSize bytes long. A maximum of workers chunks are fetched concurrently. If the server does not support range requests, the request fails.

Variables

var (
	ErrInvalidArgument = errors.New(
		"chonker: chunkSize and workers must be greater than zero",
	)
	ErrMultipleRangesUnsupported = errors.New("chonker: multiple ranges not supported")
)
var (
	// ErrRangeNoOverlap is returned by ParseRange if first-byte-pos of
	// all of the byte-range-spec values is greater than the content size.
	ErrRangeNoOverlap = errors.New("chonker: ranges failed to overlap")

	// ErrInvalidRange is returned by ParseRange on invalid input.
	ErrInvalidRange = errors.New("chonker: invalid range")

	// ErrUnsatisfiedRange is returned by ParseContentRange if the range is not satisfied.
	ErrUnsatisfiedRange = errors.New("chonker: unsatisfied range")
)
var ErrRangeUnsupported = errors.New("chonker: server does not support range requests")
var (
	// StatsForNerds exposes Prometheus metrics for chonker requests.
	// Metric names are prefixed with "chonker_".
	// Metrics are labeled with and grouped by request host URL.
	//
	// The following metrics are exposed for a request to https://example.com:
	//
	// chonker_http_requests_fetching{host="example.com"}
	// chonker_http_requests_total{host="example.com"}
	// chonker_http_requests_total{host="example.com",range="false"}
	// chonker_http_request_chunks_fetching{host="example.com",stage="do"}
	// chonker_http_request_chunks_fetching{host="example.com",stage="copy"}
	// chonker_http_request_chunks_total{host="example.com"}
	// chonker_http_request_chunk_duration_seconds{host="example.com"}
	// chonker_http_request_chunk_bytes{host="example.com"}
	//
	// You can surface these metrics in your application using the
	// [metrics.RegisterSet] function.
	//
	// [metrics.RegisterSet]: https://pkg.go.dev/github.com/VictoriaMetrics/metrics#RegisterSet
	StatsForNerds = metrics.NewSet()
)

Functions

func Do

func Do(c *http.Client, r *Request) (*http.Response, error)

Do sends an HTTP request and returns an HTTP response, following policy (such as redirects, cookies, auth) as configured on the client. It is a wrapper around http.Client.Do that adds support for ranged requests. A ranged request is a request that is fetched in chunks using several HTTP requests. Chunks are chunkSize bytes long. A maximum of workers chunks are fetched concurrently. HTTP HEAD requests are not fetched in chunks.

Example
// Build the ranged request with NewRequest: the chunkSize and workers
// fields of Request are unexported and cannot be set from other packages.
req, err := NewRequest(http.MethodGet, "http://example.com", nil, 64, 8)
if err != nil {
	panic(err)
}

resp, err := Do(nil, req)
if err != nil {
	panic(err)
}
defer resp.Body.Close()

func NewClient

func NewClient(c *http.Client, chunkSize uint64, workers uint) (*http.Client, error)

NewClient returns a new http.Client configured with an http.RoundTripper transport that fetches requests in chunks.

Example
client, err := NewClient(nil, 64, 8)
if err != nil {
	panic(err)
}

// Use the client.
resp, err := client.Get("http://example.com")
if err != nil {
	panic(err)
}
defer resp.Body.Close()

func NewRoundTripper

func NewRoundTripper(c *http.Client, chunkSize uint64, workers uint) (http.RoundTripper, error)

NewRoundTripper returns a new http.RoundTripper that fetches requests in chunks.

Example
transport, err := NewRoundTripper(nil, 64, 8)
if err != nil {
	panic(err)
}

// Use the transport with an http.Client.
client := &http.Client{Transport: transport}
resp, err := client.Get("http://example.com")
if err != nil {
	panic(err)
}
defer resp.Body.Close()

Types

type Chunk

type Chunk struct {
	Start  uint64
	Length uint64
}

Chunk represents a byte range.

func Chunks

func Chunks(chunkSize, offset, size uint64) []Chunk

Chunks divides the range [offset, size) into chunks of size chunkSize.
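
To make the splitting concrete, here is a minimal standalone sketch of dividing [offset, size) into fixed-size pieces. The chunk type and split function are hypothetical stand-ins that mirror the documented signature. The assumption that the final chunk is simply shorter when the range does not divide evenly is not confirmed by the documentation.

```go
package main

import "fmt"

// chunk mirrors the documented Chunk fields (hypothetical local type).
type chunk struct {
	Start  uint64
	Length uint64
}

// split divides [offset, size) into pieces of at most chunkSize bytes.
// Assumption: the last piece is shorter when the range does not divide evenly.
func split(chunkSize, offset, size uint64) []chunk {
	if chunkSize == 0 || offset >= size {
		return nil
	}
	var out []chunk
	for start := offset; start < size; {
		length := chunkSize
		if remaining := size - start; remaining < length {
			length = remaining
		}
		out = append(out, chunk{Start: start, Length: length})
		start += length // never exceeds size, so it cannot overflow
	}
	return out
}

func main() {
	fmt.Println(split(4, 2, 12)) // prints "[{2 4} {6 4} {10 2}]"
}
```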

func ParseContentRange

func ParseContentRange(s string) (*Chunk, uint64, error)

ParseContentRange parses a Content-Range header string as per RFC 7233. It returns the chunk describing the returned content range, and the size of the content. ErrUnsatisfiedRange is returned if the range is not satisfied.
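
For readers unfamiliar with the header format, the following sketch parses values such as "bytes 0-63/1000" with the standard library. It is an illustration only. The function parseContentRange and its error values are hypothetical, it returns plain integers instead of a *Chunk, and it rejects an unknown total size ("/*"), which the real function may handle differently.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

var (
	errInvalid     = errors.New("invalid content range")
	errUnsatisfied = errors.New("unsatisfied range")
)

// parseContentRange returns the start offset, the length of the returned
// range, and the total content size. (Hypothetical sketch, not chonker's code.)
func parseContentRange(s string) (uint64, uint64, uint64, error) {
	const prefix = "bytes "
	if !strings.HasPrefix(s, prefix) {
		return 0, 0, 0, errInvalid
	}
	rangePart, sizePart, ok := strings.Cut(s[len(prefix):], "/")
	if !ok {
		return 0, 0, 0, errInvalid
	}
	if rangePart == "*" {
		// "bytes */1000" is how a server reports an unsatisfied range.
		return 0, 0, 0, errUnsatisfied
	}
	size, err := strconv.ParseUint(sizePart, 10, 64)
	if err != nil {
		return 0, 0, 0, errInvalid
	}
	firstStr, lastStr, ok := strings.Cut(rangePart, "-")
	if !ok {
		return 0, 0, 0, errInvalid
	}
	first, err := strconv.ParseUint(firstStr, 10, 64)
	if err != nil {
		return 0, 0, 0, errInvalid
	}
	last, err := strconv.ParseUint(lastStr, 10, 64)
	if err != nil || last < first || last >= size {
		return 0, 0, 0, errInvalid
	}
	// Both positions are inclusive, hence the +1.
	return first, last - first + 1, size, nil
}

func main() {
	fmt.Println(parseContentRange("bytes 0-63/1000")) // prints "0 64 1000 <nil>"
	fmt.Println(parseContentRange("bytes */1000"))    // prints "0 0 0 unsatisfied range"
}
```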

func ParseRange

func ParseRange(s string, size uint64) ([]Chunk, error)

ParseRange parses a Range header string as per RFC 7233. ErrRangeNoOverlap is returned if none of the ranges fit inside content size. This function is a copy of the parseRange function from the Go standard library net/http/fs.go with minor modifications.
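
The three forms a single byte range can take ("bytes=0-99", "bytes=900-", and the suffix form "bytes=-50") are sketched below. This is a deliberately reduced illustration, not the standard library's parseRange or the package's copy of it: parseSingleRange and its error values are hypothetical, and it rejects multi-range headers outright.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

var (
	errInvalidRange = errors.New("invalid range")
	errNoOverlap    = errors.New("range does not overlap content")
)

// parseSingleRange parses one byte range against content of the given size
// and returns its start offset and length. (Hypothetical sketch.)
func parseSingleRange(s string, size uint64) (uint64, uint64, error) {
	const prefix = "bytes="
	if !strings.HasPrefix(s, prefix) {
		return 0, 0, errInvalidRange
	}
	spec := strings.TrimSpace(s[len(prefix):])
	if strings.Contains(spec, ",") {
		return 0, 0, errInvalidRange // multiple ranges are out of scope here
	}
	firstStr, lastStr, ok := strings.Cut(spec, "-")
	if !ok {
		return 0, 0, errInvalidRange
	}
	if firstStr == "" {
		// Suffix form "-n": the last n bytes of the content.
		n, err := strconv.ParseUint(lastStr, 10, 64)
		if err != nil {
			return 0, 0, errInvalidRange
		}
		if n > size {
			n = size
		}
		return size - n, n, nil
	}
	first, err := strconv.ParseUint(firstStr, 10, 64)
	if err != nil {
		return 0, 0, errInvalidRange
	}
	if first >= size {
		return 0, 0, errNoOverlap
	}
	last := size - 1 // open-ended form "n-" runs to the end of the content
	if lastStr != "" {
		l, err := strconv.ParseUint(lastStr, 10, 64)
		if err != nil || l < first {
			return 0, 0, errInvalidRange
		}
		if l < last {
			last = l
		}
	}
	return first, last - first + 1, nil
}

func main() {
	fmt.Println(parseSingleRange("bytes=0-99", 1000))  // prints "0 100 <nil>"
	fmt.Println(parseSingleRange("bytes=900-", 1000))  // prints "900 100 <nil>"
	fmt.Println(parseSingleRange("bytes=-50", 1000))   // prints "950 50 <nil>"
	fmt.Println(parseSingleRange("bytes=2000-", 1000)) // prints "0 0 range does not overlap content"
}
```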

func (Chunk) ContentRangeHeader added in v1.0.1

func (c Chunk) ContentRangeHeader(size uint64) string

ContentRangeHeader returns a Content-Range header value. Size is the total size of the content. Calling this method on a zero-value Chunk will return an unsatisfied range. For more information on the Content-Range header, see the MDN article on the Content-Range header.

func (Chunk) RangeHeader added in v1.0.1

func (c Chunk) RangeHeader() string

RangeHeader returns a Range header value. A zero length is treated as a single byte range. For more information on the Range header, see the MDN article on the Range header.
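
The two header formats can be illustrated with a pair of small formatting functions. These are hypothetical sketches written from the descriptions above, not the package's methods. In particular, emitting "bytes */size" for a zero-length chunk is an assumption based on how RFC 7233 spells an unsatisfied range.

```go
package main

import "fmt"

// rangeHeader formats a request Range value. Byte positions are inclusive,
// and a zero length is treated as a single byte, as the documentation states.
func rangeHeader(start, length uint64) string {
	if length == 0 {
		length = 1
	}
	return fmt.Sprintf("bytes=%d-%d", start, start+length-1)
}

// contentRangeHeader formats a response Content-Range value.
// Assumption: a zero-length chunk yields the unsatisfied form "bytes */size".
func contentRangeHeader(start, length, size uint64) string {
	if length == 0 {
		return fmt.Sprintf("bytes */%d", size)
	}
	return fmt.Sprintf("bytes %d-%d/%d", start, start+length-1, size)
}

func main() {
	fmt.Println(rangeHeader(10, 4))             // prints "bytes=10-13"
	fmt.Println(rangeHeader(0, 0))              // prints "bytes=0-0"
	fmt.Println(contentRangeHeader(10, 4, 100)) // prints "bytes 10-13/100"
	fmt.Println(contentRangeHeader(0, 0, 100))  // prints "bytes */100"
}
```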

type Request

type Request struct {
	*http.Request
	// contains filtered or unexported fields
}

Request is a ranged http.Request. It is a wrapper around http.Request that adds support for ranged requests. If the server does not support range requests, the request fails. To succeed even if the server does not support range requests, use WithContinueSansRange.

func NewRequest

func NewRequest(
	method, url string,
	body io.Reader,
	chunkSize uint64,
	workers uint,
) (*Request, error)

NewRequest returns a new Request. See NewRequestWithContext for more information.

Example
req, err := NewRequest(http.MethodGet, "http://example.com", nil, 64, 8)
if err != nil {
	panic(err)
}
fmt.Println(req.URL)
Output:

http://example.com

func NewRequestWithContext

func NewRequestWithContext(
	ctx context.Context,
	method, url string,
	body io.Reader,
	chunkSize uint64, workers uint,
) (*Request, error)

NewRequestWithContext returns a new Request. It is a wrapper around http.NewRequestWithContext that adds support for ranged requests. A ranged request is a request that is fetched in chunks using several HTTP requests. Chunks are chunkSize bytes long. A maximum of workers chunks are fetched concurrently.

func (*Request) WithContinueSansRange added in v1.1.11

func (r *Request) WithContinueSansRange() *Request

WithContinueSansRange configures r to use ranged sub-requests opportunistically. If the server does not support range requests, the request succeeds anyway.
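
The general idea of an opportunistic range request can be sketched with the standard library: send a Range header, then accept either 206 Partial Content or a plain 200 OK. How the package detects missing range support internally is not documented here, so treat this as one plausible approach. The helpers fetchPrefix and newPlainServer are hypothetical, and n is assumed to be positive.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// fetchPrefix asks for the first n bytes of url. The boolean result is true
// when the server answered 206 Partial Content, and false when it ignored
// the Range header and sent the whole body with 200 OK.
func fetchPrefix(client *http.Client, url string, n int) ([]byte, bool, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, false, err
	}
	req.Header.Set("Range", fmt.Sprintf("bytes=0-%d", n-1))

	resp, err := client.Do(req)
	if err != nil {
		return nil, false, err
	}
	defer resp.Body.Close()

	var ranged bool
	switch resp.StatusCode {
	case http.StatusPartialContent:
		ranged = true
	case http.StatusOK:
		ranged = false // fall back to the full, unranged body
	default:
		return nil, false, fmt.Errorf("unexpected status: %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	return body, ranged, err
}

// newPlainServer returns a test server that ignores Range headers.
func newPlainServer(payload string) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		io.WriteString(w, payload)
	}))
}

func main() {
	srv := newPlainServer("no ranges here")
	defer srv.Close()

	body, ranged, err := fetchPrefix(srv.Client(), srv.URL, 4)
	if err != nil {
		panic(err)
	}
	fmt.Println(ranged, string(body)) // prints "false no ranges here"
}
```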

Directories

Path Synopsis
cmd
