talpa

package
v0.0.0-...-8b9afd7 Latest
Warning

This package is not in the latest version of its module.

Published: May 10, 2017 License: GPL-3.0 Imports: 7 Imported by: 0

Documentation


Constants

This section is empty.

Variables

var DefaultErrBack = func(res *gen.Response) {
	Logger.Errorln(res.Error)
}

DefaultErrBack is the default handler for request errors. It logs the response's Error through Logger.

var Logger = logrus.New()

Functions

This section is empty.

Types

type Crawler

type Crawler struct {
	// contains filtered or unexported fields
}

Crawler provides the core mechanism for dispatching crawl work. A Crawler can only be run once.

func NewCrawler

func NewCrawler(spiders []Spider, rs RequestScheduler, d Downloader, is JobScheduler, s Scraper) *Crawler

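NewCrawler initializes a Crawler instance. The original comment also describes a limit parameter (a concurrency limit, where 0 means unlimited), but the signature shown here takes no such parameter; NewDownloader and NewScraper each accept a limit.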

func (*Crawler) Closed

func (c *Crawler) Closed() bool

func (*Crawler) Start

func (c *Crawler) Start()

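Start starts a work queue. The original comment says that starting fails and returns an error if background work has not finished, although the signature shown here has no return value.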

func (*Crawler) Stop

func (c *Crawler) Stop()

Stop forcibly stops the work, even if tasks are still running.

func (*Crawler) Wait

func (c *Crawler) Wait()

Wait blocks until the work has finished.
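The Start, Wait and Closed methods suggest a run-once lifecycle. The following is a minimal sketch of that calling pattern only. The `runner` type and `runOnce` function are hypothetical stand-ins, not talpa's implementation, and treating a second Start as a no-op is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"sync"
)

// runner is a hypothetical stand-in with part of the Crawler method set.
// It is not talpa's implementation.
type runner struct {
	mu      sync.Mutex
	started bool
	closed  bool
	wg      sync.WaitGroup
	work    func()
}

// Start launches the work in the background once; later calls are ignored
// (an assumption of this sketch).
func (r *runner) Start() {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.started {
		return
	}
	r.started = true
	r.wg.Add(1)
	go func() {
		defer r.wg.Done()
		r.work()
		r.mu.Lock()
		r.closed = true
		r.mu.Unlock()
	}()
}

// Wait blocks until the background work has finished.
func (r *runner) Wait() { r.wg.Wait() }

// Closed reports whether the work has finished.
func (r *runner) Closed() bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.closed
}

// runOnce drives Start -> Wait -> Closed and reports how often the work ran.
func runOnce() (runs int, closed bool) {
	r := &runner{}
	r.work = func() { runs++ }
	r.Start()
	r.Start() // ignored in this sketch
	r.Wait()
	return runs, r.Closed()
}

func main() {
	runs, closed := runOnce()
	fmt.Println(runs, closed)
}
```

With a real Crawler the same order would presumably apply: construct it with NewCrawler, call Start, then Wait, and use Stop only to abort early.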

type Downloader

type Downloader interface {
	Open()
	Close()
	Fetch(req *gen.Request, h Helper)
	NumWaitingJobs() int
	NumWorkers() int
}

func NewDownloader

func NewDownloader(limit int) Downloader

type Helper

type Helper interface {
	PutRequest(reqs ...*gen.Request)
	PutJob(jobs ...func())
}

Helper is the argument passed to response callbacks. It is used to enqueue new requests or content that needs processing.
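As an illustration of how a callback might use a Helper, here is a minimal sketch. `Request` is a hypothetical stand-in for gen.Request, and `memHelper` and `parseListing` are invented for this example; none of them are part of talpa.

```go
package main

import "fmt"

// Request is a hypothetical stand-in for gen.Request.
type Request struct{ URL string }

// Helper mirrors the documented interface, using the stand-in Request.
type Helper interface {
	PutRequest(reqs ...*Request)
	PutJob(jobs ...func())
}

// memHelper is a hypothetical in-memory Helper that records what is enqueued.
type memHelper struct {
	reqs []*Request
	jobs []func()
}

func (m *memHelper) PutRequest(reqs ...*Request) { m.reqs = append(m.reqs, reqs...) }
func (m *memHelper) PutJob(jobs ...func())       { m.jobs = append(m.jobs, jobs...) }

// parseListing imitates a response callback: it enqueues one follow-up
// request per link and one processing job that writes a summary to sink.
func parseListing(links []string, h Helper, sink *[]string) {
	for _, l := range links {
		h.PutRequest(&Request{URL: l})
	}
	h.PutJob(func() {
		*sink = append(*sink, fmt.Sprintf("processed %d links", len(links)))
	})
}

func main() {
	h := &memHelper{}
	var out []string
	parseListing([]string{"https://example.com/a", "https://example.com/b"}, h, &out)
	for _, job := range h.jobs {
		job() // a Scraper would normally run these
	}
	fmt.Println(len(h.reqs), out)
}
```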

type JobScheduler

type JobScheduler interface {
	Put(job ...func())
	Get(number int64) []func()
	// contains filtered or unexported methods
}

func NewJobScheduler

func NewJobScheduler(hint int64) JobScheduler
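The exported part of JobScheduler (and of RequestScheduler, which has the same shape) can be pictured as a queue with batched retrieval. The sketch below is a hypothetical FIFO, not talpa's scheduler: the interface also has unexported methods, so it cannot be implemented outside the package. Using `hint` as an initial capacity, FIFO ordering, and a non-blocking Get are all assumptions.

```go
package main

import (
	"fmt"
	"sync"
)

// jobQueue is a hypothetical FIFO with the exported shape of JobScheduler.
type jobQueue struct {
	mu   sync.Mutex
	jobs []func()
}

// newJobQueue treats hint as an initial capacity (an assumption).
func newJobQueue(hint int64) *jobQueue {
	return &jobQueue{jobs: make([]func(), 0, hint)}
}

// Put appends jobs to the end of the queue.
func (q *jobQueue) Put(job ...func()) {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.jobs = append(q.jobs, job...)
}

// Get removes and returns up to number jobs from the front of the queue.
// In this sketch it never blocks and may return fewer jobs than requested.
func (q *jobQueue) Get(number int64) []func() {
	q.mu.Lock()
	defer q.mu.Unlock()
	if number > int64(len(q.jobs)) {
		number = int64(len(q.jobs))
	}
	if number < 0 {
		number = 0
	}
	out := make([]func(), number)
	copy(out, q.jobs[:number])
	q.jobs = q.jobs[number:]
	return out
}

func main() {
	q := newJobQueue(4)
	q.Put(
		func() { fmt.Println("job 1") },
		func() { fmt.Println("job 2") },
		func() { fmt.Println("job 3") },
	)
	for _, job := range q.Get(2) {
		job()
	}
	fmt.Println("remaining:", len(q.Get(10)))
}
```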

type RequestScheduler

type RequestScheduler interface {
	Put(reqs ...*gen.Request)
	Get(number int64) []*gen.Request
	// contains filtered or unexported methods
}

func NewRequestScheduler

func NewRequestScheduler(hint int64) RequestScheduler

type Scraper

type Scraper interface {
	Open()
	Close()
	Send(job func())
	NumWaitingJobs() int
	NumWorkers() int
}

func NewScraper

func NewScraper(limit int) Scraper
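The Open, Send and Close methods together with a `limit` argument read like a bounded worker pool. The following sketch shows that general pattern under stated assumptions: `pool`, `newPool` and `sumSquares` are hypothetical names, `limit` is taken to be the number of workers and must be at least 1 here, and nothing in it is talpa's code.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// pool is a hypothetical worker pool with part of the Scraper method set.
type pool struct {
	limit int
	jobs  chan func()
	wg    sync.WaitGroup
}

func newPool(limit int) *pool {
	return &pool{limit: limit, jobs: make(chan func())}
}

// Open starts limit workers that run jobs until the pool is closed.
func (p *pool) Open() {
	for i := 0; i < p.limit; i++ {
		p.wg.Add(1)
		go func() {
			defer p.wg.Done()
			for job := range p.jobs {
				job()
			}
		}()
	}
}

// Send hands a job to a free worker, blocking while all workers are busy.
func (p *pool) Send(job func()) { p.jobs <- job }

// Close stops accepting jobs and waits for running jobs to finish.
func (p *pool) Close() {
	close(p.jobs)
	p.wg.Wait()
}

// NumWorkers reports the configured number of workers.
func (p *pool) NumWorkers() int { return p.limit }

// sumSquares adds 1*1 + 2*2 + ... + n*n using at most limit concurrent jobs.
func sumSquares(limit, n int) int64 {
	var total int64
	p := newPool(limit)
	p.Open()
	for i := 1; i <= n; i++ {
		i := i // capture the current value for the closure
		p.Send(func() { atomic.AddInt64(&total, int64(i*i)) })
	}
	p.Close()
	return total
}

func main() {
	fmt.Println(sumSquares(3, 10))
}
```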

type Spider

type Spider interface {
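	// StartRequests generates the initial requests.
	// Every generated request must set a "CallBack" in its Context, which is used to parse the response.
	// Optionally, an "ErrBack" can be set to handle errors that may occur while sending the request.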
	StartRequests() []*gen.Request
}

Spider is the interface type used to define a spider.
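A Spider implementation might look roughly like the sketch below. The shape of gen.Request is not shown on this page, so `Request`, `Response`, the `Context` map, and the `func(*Response)` signature used for "CallBack" are assumptions made for illustration (only DefaultErrBack's `func(res *gen.Response)` shape comes from the documentation above). `listSpider` is a hypothetical type.

```go
package main

import "fmt"

// Response and Request are hypothetical stand-ins for gen.Response and
// gen.Request. Context as a string-keyed map is an assumption.
type Response struct {
	Body  string
	Error error
}

type Request struct {
	URL     string
	Context map[string]interface{}
}

// Spider mirrors the documented interface, using the stand-in Request.
type Spider interface {
	StartRequests() []*Request
}

// listSpider is a hypothetical spider that seeds one request per URL and
// records what its callbacks observe in seen.
type listSpider struct {
	urls []string
	seen *[]string
}

// StartRequests sets the required "CallBack" and the optional "ErrBack"
// on every request it generates.
func (s listSpider) StartRequests() []*Request {
	reqs := make([]*Request, 0, len(s.urls))
	for _, u := range s.urls {
		u := u // capture the current URL for the closures
		reqs = append(reqs, &Request{
			URL: u,
			Context: map[string]interface{}{
				"CallBack": func(res *Response) {
					*s.seen = append(*s.seen, u+": "+res.Body)
				},
				"ErrBack": func(res *Response) {
					*s.seen = append(*s.seen, u+" failed: "+res.Error.Error())
				},
			},
		})
	}
	return reqs
}

func main() {
	var seen []string
	var sp Spider = listSpider{urls: []string{"https://example.com/"}, seen: &seen}
	for _, req := range sp.StartRequests() {
		// A downloader would normally invoke the callback with a real response.
		if cb, ok := req.Context["CallBack"].(func(*Response)); ok {
			cb(&Response{Body: "hello"})
		}
	}
	fmt.Println(seen)
}
```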
