fetch

Version: v0.3.0
Warning: this package is not in the latest version of its module.

Published: Jun 11, 2023 License: AGPL-3.0 Imports: 28 Imported by: 4

Documentation

Overview

Package fetch fetches HTTP resources.

Index

Constants

const (
	// DefaultMaxBodySize is the default maximum size of a fetch.Response body (1 GiB).
	DefaultMaxBodySize int64 = 1024 * 1024 * 1024
	// DefaultRetryTimes is the default number of times a request is retried.
	DefaultRetryTimes = 3
	// DefaultTimeout is the default request timeout.
	DefaultTimeout = time.Minute
)

Variables

var (
	// DefaultRetryHTTPCodes lists the response status codes that cause a request to be retried.
	DefaultRetryHTTPCodes = []int{http.StatusInternalServerError, http.StatusBadGateway, http.StatusServiceUnavailable,
		http.StatusGatewayTimeout, http.StatusRequestTimeout}
	// DefaultHeaders are the headers set on requests by default.
	DefaultHeaders = map[string]string{
		"Accept":          "*/*",
		"Accept-Encoding": "gzip, deflate, br",
		"Accept-Language": "en-US,en;",
		"User-Agent":      "cloudcat",
	}
)
var ErrNoDateHeader = errors.New("no Date header")

ErrNoDateHeader indicates that the HTTP headers contained no Date header.

Functions

func CachedResponse

func CachedResponse(c cloudcat.Cache, req *http.Request) (resp *http.Response, err error)

CachedResponse returns the cached http.Response for req if present, and nil otherwise.

func Date

func Date(respHeaders http.Header) (date time.Time, err error)

Date parses and returns the value of the Date header.

func DefaultRoundTripper

func DefaultRoundTripper() http.RoundTripper

DefaultRoundTripper returns the default http.RoundTripper used by fetch.

func DefaultTemplateFuncMap

func DefaultTemplateFuncMap() template.FuncMap

DefaultTemplateFuncMap returns the default template function map.

func DoByte

func DoByte(fetch cloudcat.Fetch, req *http.Request) ([]byte, error)

DoByte sends the request and returns the response body as a byte slice.

func DoString

func DoString(fetch cloudcat.Fetch, req *http.Request) (string, error)

DoString sends the request and returns the response body as a string.

func NewFetcher

func NewFetcher(opt Options) cloudcat.Fetch

NewFetcher returns a new cloudcat.Fetch instance configured by opt.

func NewRequest

func NewRequest(method, u string, body any, headers map[string]string) (*http.Request, error)

NewRequest returns a new *http.Request given a method, a URL, an optional body, and optional headers.

func NewTemplateRequest

func NewTemplateRequest(funcs template.FuncMap, tpl string, arg any) (*http.Request, error)

NewTemplateRequest returns a new *http.Request built from an HTTP request template rendered with arg.

func RoundRobinProxy

func RoundRobinProxy(req *http.Request) (*url.URL, error)

RoundRobinProxy returns the proxy URL to use for req, cycling through the available proxies in round-robin order.
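Round-robin proxy selection can be sketched as a counter that advances on every call. The returned function has the shape that http.Transport.Proxy expects, which is also the shape SetProxy accepts. The helper roundRobin is hypothetical: it takes the proxy list from a closure, whereas RoundRobinProxy presumably reads the list attached with WithRequestProxy.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"sync/atomic"
)

// roundRobin (hypothetical) returns a proxy function that cycles through
// proxies in order. It is safe for concurrent use.
func roundRobin(proxies []string) func(*http.Request) (*url.URL, error) {
	var next uint32
	return func(*http.Request) (*url.URL, error) {
		if len(proxies) == 0 {
			return nil, nil // no proxy: connect directly
		}
		i := atomic.AddUint32(&next, 1) - 1
		return url.Parse(proxies[i%uint32(len(proxies))])
	}
}

func main() {
	proxy := roundRobin([]string{"http://127.0.0.1:8080", "http://127.0.0.1:8081"})
	// In a client this would be installed as &http.Transport{Proxy: proxy}.

	req, _ := http.NewRequest(http.MethodGet, "https://example.com", nil)
	for i := 0; i < 3; i++ {
		u, _ := proxy(req)
		fmt.Println(u.Host)
	}
	// 127.0.0.1:8080
	// 127.0.0.1:8081
	// 127.0.0.1:8080
}
```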

func WithRequestConfig

func WithRequestConfig(req *http.Request, c RequestConfig) *http.Request

WithRequestConfig returns a shallow copy of req whose context carries the RequestConfig c.

func WithRequestProxy

func WithRequestProxy(req *http.Request, proxy ...string) *http.Request

WithRequestProxy returns a shallow copy of req whose context carries the given proxies.

Types

type CacheTransport

type CacheTransport struct {
	Policy Policy
	// The RoundTripper interface actually used to make requests
	// If nil, http.DefaultTransport is used
	Transport http.RoundTripper
	Cache     cloudcat.Cache
	// If true, responses returned from the cache will be given an extra header, X-From-Cache
	MarkCachedResponses bool
}

CacheTransport is an implementation of http.RoundTripper that returns responses from a cache where possible (avoiding a network request). It also adds validators (ETag/If-Modified-Since) to repeated requests, which allows servers to respond with 304 Not Modified.

func NewTransport

func NewTransport(c cloudcat.Cache) *CacheTransport

NewTransport returns a new CacheTransport with the provided Cache implementation and MarkCachedResponses set to true.

func (*CacheTransport) Client

func (t *CacheTransport) Client() *http.Client

Client returns an *http.Client that caches responses.

func (*CacheTransport) RoundTrip

func (t *CacheTransport) RoundTrip(req *http.Request) (resp *http.Response, err error)

RoundTrip is a wrapper for caching requests. If a fresh Response is already in the cache, it is returned without connecting to the server.

func (*CacheTransport) RoundTripDummy

func (t *CacheTransport) RoundTripDummy(req *http.Request) (resp *http.Response, err error)

RoundTripDummy has no awareness of any HTTP Cache-Control directives. Every request and its corresponding response are cached. When the same request is seen again, the response is returned without transferring anything from the Internet.

func (*CacheTransport) RoundTripRFC2616

func (t *CacheTransport) RoundTripRFC2616(req *http.Request) (resp *http.Response, err error)

RoundTripRFC2616 provides an RFC 2616-compliant HTTP cache, that is, one aware of HTTP Cache-Control directives. It is aimed at production use in continuous runs, where it avoids downloading unmodified data to save bandwidth and speed up crawls.

If there is a stale Response, then any validators it contains will be set on the new request to give the server a chance to respond with NotModified. If this happens, then the cached Response will be returned.

func (*CacheTransport) SetProxy

func (t *CacheTransport) SetProxy(proxy func(*http.Request) (*url.URL, error))

SetProxy specifies a function that returns the proxy for a given request.

type Options

type Options struct {
	CharsetDetectDisabled bool              `yaml:"charset-detect-disabled"`
	MaxBodySize           int64             `yaml:"max-body-size"`
	RetryTimes            int               `yaml:"retry-times"`
	RetryHTTPCodes        []int             `yaml:"retry-http-codes"`
	Timeout               time.Duration     `yaml:"timeout"`
	CachePolicy           Policy            `yaml:"cache-policy"`
	RoundTripper          http.RoundTripper `yaml:"-"`
}

Options configures a Fetch instance.

type Policy

type Policy string

Policy selects how CacheTransport caches responses. Dummy ignores HTTP Cache-Control directives, while RFC2616 honors them.

const (

	// Dummy policy is useful for testing spiders faster (without having to wait for downloads every time)
	// and for trying your spider offline, when an Internet connection is not available.
	// The goal is to be able to “replay” a spider run exactly as it ran before.
	Dummy Policy = "dummy"

	// RFC2616 provides an RFC 2616-compliant HTTP cache, i.e. with HTTP Cache-Control awareness,
	// aimed at production and used in continuous runs to avoid downloading unmodified data
	// (to save bandwidth and speed up crawls).
	RFC2616 Policy = "rfc2616"
)

type RequestConfig

type RequestConfig struct {
	// Proxy lists the proxy URLs to use for this request.
	Proxy []string

	// Optional response body encoding. Leave empty for automatic detection.
	// If you're having issues with auto-detection, set this.
	Encoding string
	// contains filtered or unexported fields
}

RequestConfig holds per-request settings attached to an *http.Request.

func GetRequestConfig

func GetRequestConfig(req *http.Request) RequestConfig

GetRequestConfig returns the RequestConfig attached to req.
