Documentation ¶
Index ¶
- Constants
- type Wayback
- func (wb *Wayback) FetchPages(config common.RequestConfig, results chan []*common.CdxResponse, ...)
- func (wb *Wayback) GetFile(page *common.CdxResponse) ([]byte, error)
- func (wb *Wayback) GetNumPages(url string) (int, error)
- func (wb *Wayback) GetPages(config common.RequestConfig) ([]*common.CdxResponse, error)
- func (Wayback) Name() string
- func (wb *Wayback) ParseResponse(resp []byte) ([]*common.CdxResponse, error)
Constants ¶
const CRAWL_STORAGE = "https://web.archive.org/web"
const INDEX_SERVER = "https://web.archive.org/cdx/search/cdx"
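The two constants are base URLs: one for the CDX index and one for stored captures. How this package combines them with query parameters is not shown on this page, so the following is only a minimal sketch. It assumes the publicly documented CDX parameters `url`, `output` and `limit`, and the Wayback `id_` modifier for fetching the original bytes of a capture. The helper names `buildIndexQuery` and `buildSnapshotURL` are hypothetical.

```go
package main

import (
	"fmt"
	"net/url"
)

const CRAWL_STORAGE = "https://web.archive.org/web"
const INDEX_SERVER = "https://web.archive.org/cdx/search/cdx"

// buildIndexQuery (hypothetical helper) returns a CDX search URL that asks
// for at most limit captures of target in JSON form.
func buildIndexQuery(target string, limit int) string {
	q := url.Values{}
	q.Set("url", target)
	q.Set("output", "json")
	q.Set("limit", fmt.Sprint(limit))
	return INDEX_SERVER + "?" + q.Encode()
}

// buildSnapshotURL (hypothetical helper) returns the storage URL of one
// capture. The "id_" suffix asks the archive for the unmodified payload;
// whether this package uses it is an assumption.
func buildSnapshotURL(timestamp, original string) string {
	return fmt.Sprintf("%s/%sid_/%s", CRAWL_STORAGE, timestamp, original)
}

func main() {
	fmt.Println(buildIndexQuery("example.com", 10))
	fmt.Println(buildSnapshotURL("20200101000000", "http://example.com/"))
}
```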
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type Wayback ¶
type Wayback struct {
	MaxTimeout int // Request timeout
	MaxRetries int // Maximum number of retries after a request times out
}
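The field comments suggest that a request is retried only when it times out, up to MaxRetries times. The package's actual retry loop is not shown here, so this is a sketch of that policy under stated assumptions: `retryOnTimeout`, `isTimeout` and `flaky` are hypothetical names, and the operation is passed in as a function so the example runs without network access. In a real client, MaxTimeout would presumably feed something like `http.Client.Timeout`; its unit is not documented on this page.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
)

// isTimeout reports whether err, or an error it wraps, is a timeout.
func isTimeout(err error) bool {
	var ne net.Error
	return errors.As(err, &ne) && ne.Timeout()
}

// retryOnTimeout calls op once and then up to maxRetries more times while
// it keeps timing out. It returns the number of calls made and the last error.
func retryOnTimeout(maxRetries int, op func() error) (int, error) {
	calls := 0
	for {
		err := op()
		calls++
		if err == nil || !isTimeout(err) || calls > maxRetries {
			return calls, err
		}
	}
}

// flaky returns a demo operation that times out `failures` times and then
// succeeds. context.DeadlineExceeded is used because it reports Timeout() == true.
func flaky(failures int) func() error {
	return func() error {
		if failures > 0 {
			failures--
			return context.DeadlineExceeded
		}
		return nil
	}
}

func main() {
	calls, err := retryOnTimeout(3, flaky(2))
	fmt.Println(calls, err)
}
```

Errors that are not timeouts are returned immediately in this sketch, which matches the wording of the MaxRetries comment but may differ from what the package does.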
func (*Wayback) FetchPages ¶
func (wb *Wayback) FetchPages(config common.RequestConfig, results chan []*common.CdxResponse, errors chan error)
FetchPages is the concurrent counterpart of GetPages. It makes requests to the WebArchive CDX API and returns observations on the results channel.
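A caller would typically start FetchPages in a goroutine and read both channels in a select loop. The sketch below reproduces only the shape of that signature with local stand-ins: `record` replaces common.CdxResponse, and `fetchPages`, `collect` and `stubFetch` are hypothetical. How the real method signals completion is not documented on this page; this version assumes the producer closes the results channel when it is done.

```go
package main

import (
	"errors"
	"fmt"
)

// record is a local stand-in for common.CdxResponse.
type record struct{ Original string }

// fetchPages sends one batch per page on results, or the first error on
// errs. Closing results on success is an assumption of this sketch.
func fetchPages(pages int, fetch func(int) ([]*record, error), results chan []*record, errs chan error) {
	for p := 0; p < pages; p++ {
		batch, err := fetch(p)
		if err != nil {
			errs <- err
			return
		}
		results <- batch
	}
	close(results)
}

// collect drains both channels until results is closed or an error arrives.
func collect(results chan []*record, errs chan error) ([]*record, error) {
	var all []*record
	for {
		select {
		case batch, ok := <-results:
			if !ok {
				return all, nil
			}
			all = append(all, batch...)
		case err := <-errs:
			return all, err
		}
	}
}

// stubFetch returns a fake page fetcher that fails on page failAt
// (use -1 for a fetcher that never fails).
func stubFetch(failAt int) func(int) ([]*record, error) {
	return func(page int) ([]*record, error) {
		if page == failAt {
			return nil, errors.New("stub: page failed")
		}
		return []*record{{Original: fmt.Sprintf("http://example.com/%d", page)}}, nil
	}
}

func main() {
	results := make(chan []*record)
	errs := make(chan error)
	go fetchPages(3, stubFetch(-1), results, errs)
	all, err := collect(results, errs)
	fmt.Println(len(all), err)
}
```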
func (*Wayback) GetFile ¶
func (wb *Wayback) GetFile(page *common.CdxResponse) ([]byte, error)
GetFile downloads a file from WebArchive using the link in a CDX response.
func (*Wayback) GetNumPages ¶
func (wb *Wayback) GetNumPages(url string) (int, error)
GetNumPages returns the number of pages available in WebArchive for the given URL.
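The CDX server's pagination API reports a page count when a query carries `showNumPages=true`, answering with a bare integer in the response body. Whether GetNumPages uses exactly this request is an assumption; the sketch below only shows how such a query could be built and its body parsed. `numPagesURL` and `parseNumPages` are hypothetical names.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
	"strings"
)

const INDEX_SERVER = "https://web.archive.org/cdx/search/cdx"

// numPagesURL builds a CDX query that asks only for the page count.
func numPagesURL(target string) string {
	q := url.Values{}
	q.Set("url", target)
	q.Set("showNumPages", "true")
	return INDEX_SERVER + "?" + q.Encode()
}

// parseNumPages converts the plain-text response body into an int,
// ignoring surrounding whitespace such as a trailing newline.
func parseNumPages(body []byte) (int, error) {
	return strconv.Atoi(strings.TrimSpace(string(body)))
}

func main() {
	fmt.Println(numPagesURL("example.com"))
	n, err := parseNumPages([]byte("12\n"))
	fmt.Println(n, err)
}
```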
func (*Wayback) GetPages ¶
func (wb *Wayback) GetPages(config common.RequestConfig) ([]*common.CdxResponse, error)
GetPages makes requests to the WebArchive CDX API to gather all observations of a URL.
func (*Wayback) ParseResponse ¶
func (wb *Wayback) ParseResponse(resp []byte) ([]*common.CdxResponse, error)
ParseResponse parses a response from the CDX server at https://web.archive.org/cdx/search/cdx.
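With `output=json`, the CDX server returns an array of rows whose first row names the columns. The response format this package requests is not stated on this page, so the following is a sketch assuming that JSON layout. The `capture` type is a local stand-in for common.CdxResponse, `parseCDX` is a hypothetical name, and the sample row is made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// capture is a local stand-in holding a few CDX columns.
type capture struct {
	Timestamp  string
	Original   string
	MimeType   string
	StatusCode string
}

// parseCDX decodes a JSON CDX response. The first row is treated as the
// header, and columns are looked up by name so their order does not matter.
func parseCDX(resp []byte) ([]capture, error) {
	var rows [][]string
	if err := json.Unmarshal(resp, &rows); err != nil {
		return nil, err
	}
	if len(rows) < 2 {
		return nil, nil // header only, or no rows at all
	}
	col := map[string]int{}
	for i, name := range rows[0] {
		col[name] = i
	}
	field := func(row []string, name string) string {
		if i, ok := col[name]; ok && i < len(row) {
			return row[i]
		}
		return ""
	}
	out := make([]capture, 0, len(rows)-1)
	for _, row := range rows[1:] {
		out = append(out, capture{
			Timestamp:  field(row, "timestamp"),
			Original:   field(row, "original"),
			MimeType:   field(row, "mimetype"),
			StatusCode: field(row, "statuscode"),
		})
	}
	return out, nil
}

func main() {
	// Made-up sample in the assumed JSON layout.
	sample := []byte(`[["urlkey","timestamp","original","mimetype","statuscode","digest","length"],
["com,example)/","20200101000000","http://example.com/","text/html","200","EXAMPLEDIGEST","1234"]]`)
	caps, err := parseCDX(sample)
	fmt.Println(len(caps), err)
}
```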