Documentation ¶
Index ¶
Constants ¶
This section is empty.
Variables ¶
var DefaultErrBack = func(res *gen.Response) { Logger.Errorln(res.Error) }
DefaultErrBack is the default error-handling callback for failed requests.
var Logger = logrus.New()
Functions ¶
This section is empty.
Types ¶
type Crawler ¶
type Crawler struct {
// contains filtered or unexported fields
}
Crawler provides the core work-dispatch mechanism of the crawler; an instance can be run only once.
func NewCrawler ¶
func NewCrawler(spiders []Spider, rs RequestScheduler, d Downloader, is JobScheduler, s Scraper) *Crawler
NewCrawler initializes a Crawler instance. limit is the concurrency limit; 0 means unlimited.
type Downloader ¶
type Downloader interface {
	Open()
	Close()
	Fetch(req *gen.Request, h Helper)
	NumWaitingJobs() int
	NumWorkers() int
}
func NewDownloader ¶
func NewDownloader(limit int) Downloader
type JobScheduler ¶
type JobScheduler interface {
	Put(job ...func())
	Get(number int64) []func()
	// contains filtered or unexported methods
}
func NewJobScheduler ¶
func NewJobScheduler(hint int64) JobScheduler
type RequestScheduler ¶
type RequestScheduler interface {
	Put(reqs ...*gen.Request)
	Get(number int64) []*gen.Request
	// contains filtered or unexported methods
}
func NewRequestScheduler ¶
func NewRequestScheduler(hint int64) RequestScheduler
type Scraper ¶
func NewScraper ¶