crawler

package v0.0.0-...-7a5efd7
Published: Jun 2, 2014 License: MIT Imports: 9 Imported by: 0

Documentation

Overview

Package crawler implements an internal website crawler. In other words, it limits itself to a single domain.

The crawler will emit page.Pages, which include URIs to themselves and their connected resources. Resources may be assets or links to other pages.

It doesn't store anything beyond the set of visited pages.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Crawler

type Crawler struct {
	Site   *url.URL
	Pages  chan *page.Page
	Errors chan error
	Done   chan bool
	// contains filtered or unexported fields
}

Crawler contains the site URL it is crawling and the set of visited pages. To receive results, the caller should select over Crawler.Pages, Crawler.Errors, and Crawler.Done. When a value arrives on Crawler.Done, the crawler has finished all its work and closed all channels.
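
A minimal consumption sketch under a few assumptions: the import path below is hypothetical, Start is assumed to return once its workers are launched, and the page.Page values are only printed because their fields are not documented here.

package main

import (
	"fmt"
	"log"

	"example.com/crawler" // hypothetical import path; substitute the real module path
)

func main() {
	c, err := crawler.New("www.website.com")
	if err != nil {
		log.Fatal(err)
	}
	c.Start(4) // four download workers; assumed to return after launching them

	for {
		select {
		case p := <-c.Pages:
			fmt.Println("crawled:", p) // p is a *page.Page
		case crawlErr := <-c.Errors:
			log.Println("error:", crawlErr)
		case <-c.Done:
			return // all work is finished and all channels are closed
		}
	}
}

Selecting over all three channels keeps the caller responsive whether the crawler is currently producing pages, errors, or nothing at all, and the Done case gives a clean exit point.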

func New

func New(siteURL string) (*Crawler, error)

New creates a Crawler that works on a given domain. The domain may be given with a scheme, http://www.website.com, or without one, www.website.com.
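
For illustration, both forms below should be accepted (a fragment only; www.website.com is a placeholder host and error handling is elided):

withScheme, err := crawler.New("http://www.website.com") // scheme included
bareDomain, err := crawler.New("www.website.com")        // scheme omitted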

func (*Crawler) Start

func (c *Crawler) Start(workers int)

Start begins the web crawling process. It initializes the crawler's data structures and launches the given number of download workers.
