spider

package
v0.0.0-...-fd45b34 Latest
Warning

This package is not in the latest version of its module.

Published: Feb 26, 2022 License: MIT Imports: 10 Imported by: 0

Documentation

Index

Constants

const (
	StateCrawling = SpiderState(0)
	StateStopped  = SpiderState(1)
)

Variables

This section is empty.

Functions

This section is empty.

Types

type CrawledSite

type CrawledSite struct {
	*Site
	CrawledAt time.Time
	Response  *http.Response
}

CrawledSite represents a crawled site.

type CrawledSiteHandler

type CrawledSiteHandler func(site CrawledSite, spider *Spider)

CrawledSiteHandler is a function type that the spider calls after it crawls a site.

type ShouldCrawlURLHandler

type ShouldCrawlURLHandler func(foundAt, url *url.URL, spider *Spider) bool

ShouldCrawlURLHandler is a function type that the spider calls to decide whether or not it should crawl a URL.
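A minimal sketch of a predicate with this signature follows. It restricts crawling to URLs on the same host as the page they were found on. `Spider` is declared locally as a stand-in so the example is self-contained; `sameHostOnly` and `mustParse` are hypothetical names, and treating a nil `foundAt` as an externally sent URL is an assumption this page does not confirm.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// Spider is a local stand-in for the package's Spider type.
type Spider struct{}

// ShouldCrawlURLHandler mirrors the signature documented on this page.
type ShouldCrawlURLHandler func(foundAt, url *url.URL, spider *Spider) bool

// sameHostOnly is a hypothetical handler that only allows URLs whose host
// matches the host of the page they were found on.
func sameHostOnly(foundAt, u *url.URL, _ *Spider) bool {
	if u == nil {
		return false
	}
	if foundAt == nil {
		// Assumption: no referrer means the URL was sent externally, so allow it.
		return true
	}
	// Host names are case-insensitive, so compare with EqualFold.
	return strings.EqualFold(u.Hostname(), foundAt.Hostname())
}

// mustParse parses a URL and panics on failure; acceptable for fixed example input.
func mustParse(raw string) *url.URL {
	u, err := url.Parse(raw)
	if err != nil {
		panic(err)
	}
	return u
}

func main() {
	var handler ShouldCrawlURLHandler = sameHostOnly
	page := mustParse("https://example.com/index.html")
	fmt.Println(handler(page, mustParse("https://example.com/about"), nil))   // true
	fmt.Println(handler(page, mustParse("https://other.example.org/"), nil)) // false
}
```

With the real package, such a function would presumably be assigned to the spider's `ShouldCrawl` field.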

type Site

type Site struct {
	URL     string
	FoundAt *string
}

Site represents a found site.
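Because `FoundAt` is a `*string`, it can be left nil. The sketch below shows one way to build `Site` values with and without a referrer. `Site` is redeclared locally so the example compiles on its own, `newSite` is a hypothetical helper, and the reading that a nil `FoundAt` marks a site with no known referrer is an assumption.

```go
package main

import "fmt"

// Site is a local stand-in mirroring the struct documented on this page.
type Site struct {
	URL     string
	FoundAt *string
}

// newSite is a hypothetical constructor. An empty foundAt leaves FoundAt nil,
// which this sketch assumes means "no known referrer".
func newSite(rawURL, foundAt string) Site {
	s := Site{URL: rawURL}
	if foundAt != "" {
		s.FoundAt = &foundAt
	}
	return s
}

func main() {
	seed := newSite("https://example.com/", "")
	child := newSite("https://example.com/about", seed.URL)

	fmt.Println(seed.FoundAt == nil) // true
	fmt.Println(*child.FoundAt)      // https://example.com/
}
```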

type Spider

type Spider struct {
	OnCrawl     CrawledSiteHandler
	Logger      *log.Logger
	WorkerCount uint
	SendDelay   *time.Duration
	CrawlDelay  *time.Duration
	ShouldCrawl ShouldCrawlURLHandler
	// contains filtered or unexported fields
}

Spider defines an instance of a web crawler.

func New

func New(workerCount uint, args ...interface{}) (*Spider, error)

New creates a new spider. workerCount defines how many workers are used for both crawling and handling crawled sites.

func (*Spider) Send

func (s *Spider) Send(urls ...string) (err error)

Send allows you to externally send URLs to the spider for handling.

func (*Spider) SendSites

func (s *Spider) SendSites(sites ...Site) (err error)

SendSites allows you to externally send sites to the spider for handling.

func (*Spider) SendSitesMap

func (s *Spider) SendSitesMap(sites map[string]Site) (err error)

SendSitesMap allows you to externally send a map of sites to the spider for handling.

func (*Spider) Start

func (s *Spider) Start() error

Start will start the spider.

func (*Spider) Stop

func (s *Spider) Stop()

Stop safely stops the spider.

type SpiderState

type SpiderState uint

SpiderState represents the current state of the spider.
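The two constants listed under Constants are the only documented values of this type, and `StateCrawling` is the zero value. As a small sketch, the hypothetical helper `stateName` below maps a state to a readable label. The type and constants are redeclared locally so the example is self-contained, and the page does not say whether the package provides anything like it.

```go
package main

import "fmt"

// Local stand-ins mirroring the type and constants documented on this page.
type SpiderState uint

const (
	StateCrawling = SpiderState(0)
	StateStopped  = SpiderState(1)
)

// stateName is a hypothetical helper that returns a label for a SpiderState.
// Values other than the two documented constants are reported as unknown.
func stateName(s SpiderState) string {
	switch s {
	case StateCrawling:
		return "crawling"
	case StateStopped:
		return "stopped"
	default:
		return fmt.Sprintf("unknown(%d)", uint(s))
	}
}

func main() {
	var zero SpiderState
	fmt.Println(stateName(zero))           // crawling
	fmt.Println(stateName(StateStopped))   // stopped
	fmt.Println(stateName(SpiderState(7))) // unknown(7)
}
```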
