Documentation ¶
Overview ¶
Package crawler implements an internal website crawler, that is, one that limits itself to a single domain.
The crawler emits page.Pages, each of which includes its own URI and the URIs of its connected resources. A resource may be an asset or a link to another page.
Beyond the set of visited pages, the crawler stores nothing.
Index ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type Crawler ¶
type Crawler struct {
	Site   *url.URL
	Pages  chan *page.Page
	Errors chan error
	Done   chan bool
	// contains filtered or unexported fields
}
Crawler contains the site URL it is running on and the set of visited pages. To obtain results, the caller should `select` over Crawler.Pages, Crawler.Errors, and Crawler.Done. When a value is received on Crawler.Done, the crawler has finished all work and closed all channels.
func New ¶
New creates a Crawler that works on a given domain. The domain may be given with a scheme (http://www.website.com) or without one (www.website.com).