crawler

package v0.0.0-...-7a5efd7
Published: Jun 2, 2014 License: MIT Imports: 9 Imported by: 0

Documentation

Overview

Package crawler implements an internal website crawler. In other words, it limits itself to a single domain.

The crawler will emit page.Pages, which include URIs to themselves and their connected resources. Resources may be assets or links to other pages.

It doesn't store anything beyond the set of visited pages.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Crawler

type Crawler struct {
	Site   *url.URL
	Pages  chan *page.Page
	Errors chan error
	Done   chan bool
	// contains filtered or unexported fields
}

Crawler contains the site URL it is crawling and the set of visited pages. To receive results, the caller should select over Crawler.Pages, Crawler.Errors, and Crawler.Done. When a value arrives on Crawler.Done, the crawler has finished all its work and closed all channels.
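
A minimal consumption sketch under a few assumptions: the import path below is hypothetical, Start is assumed to return once its workers are launched, and the page.Page values are only printed because their fields are not documented here.

package main

import (
	"fmt"
	"log"

	"example.com/crawler" // hypothetical import path; substitute the real module path
)

func main() {
	c, err := crawler.New("www.website.com")
	if err != nil {
		log.Fatal(err)
	}
	c.Start(4) // four download workers; assumed to return after launching them

	for {
		select {
		case p := <-c.Pages:
			fmt.Println("crawled:", p) // p is a *page.Page
		case crawlErr := <-c.Errors:
			log.Println("error:", crawlErr)
		case <-c.Done:
			return // all work is finished and all channels are closed
		}
	}
}

Selecting over all three channels keeps the caller responsive whether the crawler is currently producing pages, errors, or nothing at all, and the Done case gives a clean exit point.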

func New

func New(siteURL string) (*Crawler, error)

New creates a Crawler that works on a given domain. The domain may be given with a scheme, http://www.website.com, or without one, www.website.com.
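
For illustration, both forms below should be accepted (a fragment only; www.website.com is a placeholder host and error handling is elided):

withScheme, err := crawler.New("http://www.website.com") // scheme included
bareDomain, err := crawler.New("www.website.com")        // scheme omitted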

func (*Crawler) Start

func (c *Crawler) Start(workers int)

Start begins the web crawling process. It initializes the crawler's data structures and launches the given number of download workers.
