crawler

package v1.1.0

Published: Sep 23, 2020 License: MIT Imports: 14 Imported by: 0

Documentation

Overview

Package crawler provides a website crawler.
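
A minimal end-to-end sketch of the API documented below. The import path (example.com/crawler) is a placeholder, configuring the Crawler through a struct literal is an assumption (the type has unexported fields, so a constructor may be the intended entry point), and the sketch also assumes the channel returned by Resources is closed when crawling finishes:

package main

import (
	"context"
	"fmt"
	"log"
	"net/url"

	"example.com/crawler" // placeholder; substitute the real module path
)

func main() {
	root, err := url.Parse("https://example.com")
	if err != nil {
		log.Fatal(err)
	}

	c := &crawler.Crawler{
		URL:         root,
		Concurrency: 4,
	}

	// Consume results as they arrive. This assumes Resources is closed
	// once the crawl completes.
	go func() {
		for res := range c.Resources() {
			if res.Body != nil {
				res.Body.Close()
			}
			fmt.Println(res.StatusCode, res.URL)
		}
	}()

	// Run blocks until all pending targets have been crawled.
	if err := c.Run(context.Background()); err != nil {
		log.Fatal(err)
	}
}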

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Crawler

type Crawler struct {
	URL         *url.URL
	Concurrency int
	Allow404    bool
	HTTPClient  *http.Client
	// contains filtered or unexported fields
}

A Crawler is in charge of visiting, or "crawling," all pages and assets reachable from a particular URL.
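
The exported fields suggest the crawler can be tuned directly. A sketch of one plausible configuration; the field semantics noted in the comments are assumptions, as is the struct-literal construction:

package example

import (
	"log"
	"net/http"
	"net/url"
	"time"

	"example.com/crawler" // placeholder import path
)

func newCrawler() *crawler.Crawler {
	root, err := url.Parse("https://example.com")
	if err != nil {
		log.Fatal(err)
	}
	return &crawler.Crawler{
		URL:         root,                                    // root URL to crawl
		Concurrency: 8,                                       // assumed: number of concurrent workers
		Allow404:    true,                                    // assumed: 404 responses are not treated as failures
		HTTPClient:  &http.Client{Timeout: 10 * time.Second}, // custom client for timeouts, proxies, etc.
	}
}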

func (*Crawler) Queue

func (c *Crawler) Queue(u *url.URL)

Queue a given URL. This method is non-blocking.
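
Because Queue is non-blocking, it is safe to call from any goroutine while the crawl is running, for example to enqueue URLs discovered out of band. A sketch:

package example

import (
	"log"
	"net/url"

	"example.com/crawler" // placeholder import path
)

// queueExtra enqueues one additional URL on a running crawler.
func queueExtra(c *crawler.Crawler, raw string) {
	u, err := url.Parse(raw)
	if err != nil {
		log.Printf("skipping %q: %v", raw, err)
		return
	}
	c.Queue(u) // returns immediately; a worker picks the URL up later
}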

func (*Crawler) Resources

func (c *Crawler) Resources() <-chan Resource

Resources returns a channel of resources visited by the crawler.
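
A typical consumer ranges over the channel in its own goroutine. Whether the channel is closed when crawling completes is not documented here; the loop below assumes it is, and would otherwise block after the crawl ends:

package example

import (
	"fmt"

	"example.com/crawler" // placeholder import path
)

// drain logs every visited resource and releases response bodies.
func drain(c *crawler.Crawler) {
	for res := range c.Resources() {
		if res.Body != nil {
			res.Body.Close() // always release the body
		}
		if res.Error != nil {
			fmt.Println("error:", res.URL, res.Error)
			continue
		}
		fmt.Println(res.StatusCode, res.URL, res.Duration)
	}
}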

func (*Crawler) Run

func (c *Crawler) Run(ctx context.Context) error

Run starts the crawling process and waits for completion.
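
Since Run blocks until the crawl finishes, a context deadline or cancellation is the natural way to bound it. How Run reports cancellation (which error it returns) is not documented here:

package example

import (
	"context"
	"time"

	"example.com/crawler" // placeholder import path
)

// runWithTimeout bounds a blocking crawl with a five-minute deadline.
func runWithTimeout(c *crawler.Crawler) error {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Minute)
	defer cancel()
	return c.Run(ctx)
}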

func (*Crawler) Start

func (c *Crawler) Start(ctx context.Context) error

Start crawling workers asynchronously. Use Wait() to block until completion.

func (*Crawler) Wait

func (c *Crawler) Wait() error

Wait for all pending targets to be crawled.
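
Start and Wait split Run into two phases, leaving room to consume Resources or queue more URLs in between. A sketch, again assuming the Resources channel is closed on completion:

package example

import (
	"context"
	"fmt"

	"example.com/crawler" // placeholder import path
)

// startThenWait launches the workers, consumes results concurrently,
// and blocks until all pending targets have been crawled.
func startThenWait(ctx context.Context, c *crawler.Crawler) error {
	if err := c.Start(ctx); err != nil {
		return err
	}
	go func() {
		for res := range c.Resources() {
			if res.Body != nil {
				res.Body.Close()
			}
			fmt.Println(res.StatusCode, res.URL)
		}
	}()
	return c.Wait()
}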

type Resource

type Resource struct {
	Target
	StatusCode int
	Duration   time.Duration
	Body       io.ReadCloser
	Error      error
}

A Resource is a representation of the response to a Target request, for a particular page or asset.
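
A consumer should check Error before reading and always release Body. That Body may be nil on failed requests (and non-nil on successful ones) is an assumption, not documented behavior:

package example

import (
	"fmt"
	"io"

	"example.com/crawler" // placeholder import path
)

// handle inspects one crawled resource defensively.
func handle(res crawler.Resource) {
	if res.Body != nil {
		defer res.Body.Close()
	}
	if res.Error != nil {
		fmt.Printf("failed %s: %v\n", res.URL, res.Error)
		return
	}
	// Drain the body while measuring its size, as an example of use.
	// Assumes Body is non-nil whenever Error is nil.
	n, _ := io.Copy(io.Discard, res.Body)
	fmt.Printf("%d %s (%d bytes in %s)\n", res.StatusCode, res.URL, n, res.Duration)
}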

type Target

type Target struct {
	Parent *url.URL
	URL    *url.URL
}

A Target is a target URL to crawl, with an optional Parent page URL.
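
Because Target is embedded in Resource, its fields are promoted, and Parent makes it possible to attribute a result to the page that linked to it:

package example

import (
	"fmt"
	"net/http"

	"example.com/crawler" // placeholder import path
)

// reportBrokenLink attributes a 404 to the page that linked to it,
// using the Target fields promoted into Resource.
func reportBrokenLink(res crawler.Resource) {
	if res.StatusCode != http.StatusNotFound || res.Parent == nil {
		return
	}
	fmt.Printf("broken link: %s (found on %s)\n", res.URL, res.Parent)
}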
