wayback

package
v1.1.3
Warning

This package is not in the latest version of its module.

Published: Jun 4, 2023 License: MIT Imports: 4 Imported by: 0

Documentation

Constants

const CRAWL_STORAGE = "https://web.archive.org/web"
const INDEX_SERVER = "https://web.archive.org/cdx/search/cdx"
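
As a rough illustration of how these two endpoints are typically used, the sketch below builds a CDX query URL and a snapshot URL. The helper names `cdxQueryURL` and `snapshotURL` are hypothetical and not part of this package; the query parameters `url`, `output`, and `limit` are those of the public Wayback CDX server, and whether this package sends exactly these parameters is an assumption.

```go
package main

import (
	"fmt"
	"net/url"
)

// Local copies of the package constants, so the sketch is self-contained.
const (
	crawlStorage = "https://web.archive.org/web"
	indexServer  = "https://web.archive.org/cdx/search/cdx"
)

// cdxQueryURL is a hypothetical helper (not part of the wayback package).
// It builds a CDX index query asking for JSON output with a row limit.
func cdxQueryURL(target string, limit int) string {
	q := url.Values{}
	q.Set("url", target)
	q.Set("output", "json")
	q.Set("limit", fmt.Sprint(limit))
	// Encode sorts parameters by key, so the result is deterministic.
	return indexServer + "?" + q.Encode()
}

// snapshotURL is a hypothetical helper (not part of the wayback package).
// It builds the address of one archived capture from a CDX timestamp
// (YYYYMMDDhhmmss) and the original URL.
func snapshotURL(timestamp, original string) string {
	return crawlStorage + "/" + timestamp + "/" + original
}

func main() {
	fmt.Println(cdxQueryURL("example.com", 5))
	fmt.Println(snapshotURL("20230604000000", "http://example.com/"))
}
```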

Variables

This section is empty.

Functions

This section is empty.

Types

type Wayback

type Wayback struct {
	MaxTimeout int // Request timeout
	MaxRetries int // Max number of request retries after a timeout
}

func New

func New(timeout, retries int) (*Wayback, error)

func (*Wayback) FetchPages

func (wb *Wayback) FetchPages(config common.RequestConfig, results chan []*common.CdxResponse, errors chan error)

FetchPages is a concurrent variant of GetPages. It makes requests to the WebArchive CDX API and returns observations through the results channel.
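
A caller of a function with this channel shape typically starts it in a goroutine and drains both channels. The sketch below shows one way to do that under stated assumptions: `observation` and `fetchPages` are hypothetical stand-ins for `common.CdxResponse` and `FetchPages`, and the sketch assumes the producer does not close the channels itself, so the caller closes them once all producers have returned.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// observation is a hypothetical stand-in for common.CdxResponse.
type observation struct {
	Original string
}

// fetchPages is a hypothetical stand-in with the same channel shape as
// FetchPages: it sends one batch on results, or one error on errs.
func fetchPages(target string, results chan []*observation, errs chan error) {
	if target == "" {
		errs <- errors.New("empty url")
		return
	}
	results <- []*observation{{Original: target}}
}

// collect starts one producer per target and drains both channels
// until every producer has finished.
func collect(targets []string) ([]*observation, []error) {
	results := make(chan []*observation)
	errs := make(chan error)

	var wg sync.WaitGroup
	for _, t := range targets {
		wg.Add(1)
		go func(t string) {
			defer wg.Done()
			fetchPages(t, results, errs)
		}(t)
	}
	// Close the channels only after all producers are done sending.
	go func() {
		wg.Wait()
		close(results)
		close(errs)
	}()

	var all []*observation
	var failures []error
	// A nil channel blocks forever in select, which disables its case.
	for results != nil || errs != nil {
		select {
		case batch, ok := <-results:
			if !ok {
				results = nil
				continue
			}
			all = append(all, batch...)
		case err, ok := <-errs:
			if !ok {
				errs = nil
				continue
			}
			failures = append(failures, err)
		}
	}
	return all, failures
}

func main() {
	all, failures := collect([]string{"example.com", "", "example.org"})
	fmt.Println(len(all), "observations,", len(failures), "errors")
}
```

Draining both channels in a single `select` loop keeps a producer from blocking on an unread error while the consumer waits on results.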

func (*Wayback) GetFile

func (wb *Wayback) GetFile(page *common.CdxResponse) ([]byte, error)

GetFile downloads a file from WebArchive using a link from a CDX response.

func (*Wayback) GetNumPages

func (wb *Wayback) GetNumPages(url string) (int, error)

GetNumPages returns the number of pages located in WebArchive for the given URL.
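
The public CDX server reports a page count when queried with `showNumPages=true`, answering with a bare integer in the response body. The sketch below shows how such a query and its response could be handled; `numPagesURL` and `parseNumPages` are hypothetical helpers, and whether GetNumPages works exactly this way is an assumption.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
	"strings"
)

const indexServer = "https://web.archive.org/cdx/search/cdx"

// numPagesURL is a hypothetical helper (not part of the wayback package).
// It builds a CDX query that asks only for the number of result pages.
func numPagesURL(target string) string {
	q := url.Values{}
	q.Set("url", target)
	q.Set("showNumPages", "true")
	return indexServer + "?" + q.Encode()
}

// parseNumPages is a hypothetical helper. It converts the plain-text
// response body (an integer, usually followed by a newline) to an int.
func parseNumPages(body []byte) (int, error) {
	n, err := strconv.Atoi(strings.TrimSpace(string(body)))
	if err != nil {
		return 0, fmt.Errorf("unexpected page count %q: %w", body, err)
	}
	return n, nil
}

func main() {
	fmt.Println(numPagesURL("example.com"))
	n, err := parseNumPages([]byte("12\n"))
	fmt.Println(n, err)
}
```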

func (*Wayback) GetPages

func (wb *Wayback) GetPages(config common.RequestConfig) ([]*common.CdxResponse, error)

GetPages makes requests to the WebArchive CDX API to gather all observations of a URL.

func (Wayback) Name

func (Wayback) Name() string

func (*Wayback) ParseResponse

func (wb *Wayback) ParseResponse(resp []byte) ([]*common.CdxResponse, error)

ParseResponse parses a response from the CDX server at https://web.archive.org/cdx/search/cdx.
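
With `output=json`, the CDX server returns an array of rows whose first row is a header naming the columns. The sketch below parses that shape into a local struct. It is an illustration only: `cdxRecord` and `parseCDX` are hypothetical, the real fields of `common.CdxResponse` are not shown in this documentation, and it is an assumption that ParseResponse consumes the JSON format.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// cdxRecord is a hypothetical stand-in for common.CdxResponse.
type cdxRecord struct {
	Timestamp  string
	Original   string
	MimeType   string
	StatusCode string
}

// parseCDX is a hypothetical parser for CDX JSON output: a JSON array
// of string arrays, where the first row names the columns.
func parseCDX(resp []byte) ([]cdxRecord, error) {
	var rows [][]string
	if err := json.Unmarshal(resp, &rows); err != nil {
		return nil, fmt.Errorf("decode CDX response: %w", err)
	}
	if len(rows) == 0 {
		return nil, nil // the server found no captures
	}

	// Map each header name to its column index.
	header := rows[0]
	col := make(map[string]int, len(header))
	for i, name := range header {
		col[name] = i
	}
	for _, name := range []string{"timestamp", "original", "mimetype", "statuscode"} {
		if _, ok := col[name]; !ok {
			return nil, fmt.Errorf("missing column %q", name)
		}
	}

	records := make([]cdxRecord, 0, len(rows)-1)
	for _, row := range rows[1:] {
		if len(row) != len(header) {
			return nil, fmt.Errorf("row has %d fields, want %d", len(row), len(header))
		}
		records = append(records, cdxRecord{
			Timestamp:  row[col["timestamp"]],
			Original:   row[col["original"]],
			MimeType:   row[col["mimetype"]],
			StatusCode: row[col["statuscode"]],
		})
	}
	return records, nil
}

func main() {
	sample := []byte(`[
		["urlkey","timestamp","original","mimetype","statuscode","digest","length"],
		["com,example)/","20230604000000","http://example.com/","text/html","200","ABC","1234"]
	]`)
	records, err := parseCDX(sample)
	fmt.Println(len(records), err)
}
```

Looking columns up by header name rather than by fixed position keeps the parser working if the query changes the field order.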
