Documentation
Overview
Package fetcher implements a crawler fetcher.
Index
Constants
This section is empty.
Variables
This section is empty.
Functions
Types
type BrowserFetcher
BrowserFetcher is a fetcher that simulates a browser.
type Context added in v0.1.0
Context is the crawling context
func (*Context) Output added in v0.1.1
func (c *Context) Output(data interface{}) *collector.OutputData
func (*Context) OutputJS added in v0.1.0
func (c *Context) OutputJS(reg string) ParseResult
OutputJS is for use in JS code: it parses a regular expression to obtain the crawl results.
func (*Context) OutputStruct added in v0.1.3
func (c *Context) OutputStruct(dataStruct collector.DataStruct) *collector.OutputData
func (*Context) ParseJSReg added in v0.1.0
func (c *Context) ParseJSReg(ruleName string, reg string) ParseResult
ParseJSReg is for use in JS code: it parses a regular expression to obtain the list of request tasks.
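OutputJS and ParseJSReg are documented only as regex helpers for JS rules. A minimal sketch of what they might do, assuming the context exposes the fetched body and the current request (the `Body` and `Req` fields are hypothetical): ParseJSReg turns each match of the pattern's first capture group into a follow-up `Request` one level deeper, tagged with the rule name, and OutputJS emits the page URL as an item when the pattern matches. This is an illustration, not the package's code.

```go
package main

import (
	"fmt"
	"regexp"
)

// Request and ParseResult are trimmed-down, hypothetical stand-ins
// for the package's types.
type Request struct {
	Url      string
	Method   string
	Depth    int64
	RuleName string
}

type ParseResult struct {
	Requests []*Request
	Items    []interface{}
}

// Context is a hypothetical stand-in: it assumes the crawled body and
// the current request are available to the parsing helpers.
type Context struct {
	Body []byte
	Req  *Request
}

// ParseJSReg turns every match of reg's first capture group into a
// follow-up request one level deeper, tagged with ruleName.
// regexp.MustCompile panics on an invalid pattern; it is used here
// because the documented signatures return no error.
func (c *Context) ParseJSReg(ruleName string, reg string) ParseResult {
	re := regexp.MustCompile(reg)
	var result ParseResult
	for _, m := range re.FindAllSubmatch(c.Body, -1) {
		if len(m) < 2 { // pattern has no capture group
			continue
		}
		result.Requests = append(result.Requests, &Request{
			Url:      string(m[1]),
			Method:   "GET",
			Depth:    c.Req.Depth + 1,
			RuleName: ruleName,
		})
	}
	return result
}

// OutputJS emits the current URL as an item when the body matches reg.
func (c *Context) OutputJS(reg string) ParseResult {
	re := regexp.MustCompile(reg)
	if !re.Match(c.Body) {
		return ParseResult{}
	}
	return ParseResult{Items: []interface{}{c.Req.Url}}
}

func main() {
	ctx := &Context{
		Body: []byte(`<a href="/a">A</a> <a href="/b">B</a> <h1>Sale</h1>`),
		Req:  &Request{Url: "https://example.com/", Depth: 1},
	}
	for _, r := range ctx.ParseJSReg("detail", `href="([^"]+)"`).Requests {
		fmt.Println(r.Url, r.Depth, r.RuleName)
	}
	fmt.Println(ctx.OutputJS(`<h1>Sale</h1>`).Items)
}
```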
type Fetcher
type Fetcher interface {
	// Get fetches the HTML content for the given request.
	Get(url *Request) ([]byte, error)
}
Fetcher defines the fetching behavior of the crawler engine.
type ParseResult
type ParseResult struct {
	Requests []*Request
	Items    []interface{}
}
ParseResult holds the result of parsing a crawled response: the follow-up Requests and the extracted Items.
type Property added in v0.1.0
type Property struct {
	// Name is the unique signature of the Task.
	Name     string            `json:"name"`
	Url      string            `json:"url"`
	Cookie   string            `json:"cookie"`
	WaitTime time.Duration     `json:"wait_time"`
	// Reload marks whether the site may be crawled repeatedly.
	Reload   bool              `json:"reload"`
	MaxDepth int64             `json:"max_depth"`
	// Headers lists the headers to add to the HTTP request.
	Headers  map[string]string `json:"headers"`
}
type RedirectFetcher added in v0.1.3
RedirectFetcher is a fetcher that handles redirected links.
type Request
type Request struct {
	Task     *Task
	Url      string
	Method   string
	Depth    int64
	Priority int64
	RuleName string
	TempData *Temp
	// contains filtered or unexported fields
}
Request represents a single crawler request.
func (Request) UniqueSign added in v0.0.9
UniqueSign builds a unique signature for each request.
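The doc does not say which fields feed the signature. Assuming it is derived from the URL and the HTTP method, a common choice for deduplication, a value-receiver method matching the documented `func (Request) UniqueSign` could look like this. The trimmed `Request`, the MD5 hash, and the separator are assumptions.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
)

// Request is a trimmed-down, hypothetical stand-in for fetcher.Request.
type Request struct {
	Url    string
	Method string
}

// UniqueSign hashes the method and URL so that the same page fetched
// with the same method maps to the same key. MD5 serves only as a
// compact fingerprint here, not for security.
func (r Request) UniqueSign() string {
	sum := md5.Sum([]byte(r.Method + " " + r.Url))
	return hex.EncodeToString(sum[:])
}

func main() {
	a := Request{Url: "https://example.com/a", Method: "GET"}
	b := Request{Url: "https://example.com/a", Method: "GET"}
	c := Request{Url: "https://example.com/a", Method: "POST"}
	fmt.Println(a.UniqueSign() == b.UniqueSign()) // true
	fmt.Println(a.UniqueSign() == c.UniqueSign()) // false
	fmt.Println(len(a.UniqueSign()))              // 32
}
```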
type Rule added in v0.1.0
type Rule struct {
	ItemFields []string
	ParseFunc  func(*Context) (ParseResult, error)
}
Rule represents the parsing rule that applies to a request.
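A sketch of how a Rule might be used, assuming requests are routed to a rule by their `RuleName` and that ParseFunc sees the fetched body through the Context. The `rules` map, the `dispatch` function, and the Context fields `Body` and `Req` are hypothetical stand-ins (the real Task holds rules in a `RuleTree`, whose layout these docs do not show), and treating `ItemFields` as the keys of each output item is also an assumption.

```go
package main

import (
	"fmt"
	"regexp"
)

// Trimmed-down, hypothetical stand-ins for the package's types.
type Request struct {
	Url      string
	RuleName string
}

type ParseResult struct {
	Requests []*Request
	Items    []interface{}
}

// Context is hypothetical: it assumes the parse function can see the
// fetched body and the request that produced it.
type Context struct {
	Body []byte
	Req  *Request
}

type Rule struct {
	ItemFields []string
	ParseFunc  func(*Context) (ParseResult, error)
}

var titleRe = regexp.MustCompile(`<title>([^<]*)</title>`)

// rules maps a rule name to its Rule; a request's RuleName selects
// which ParseFunc handles its response.
var rules = map[string]*Rule{
	"page": {
		ItemFields: []string{"title"},
		ParseFunc: func(ctx *Context) (ParseResult, error) {
			m := titleRe.FindSubmatch(ctx.Body)
			if m == nil {
				return ParseResult{}, fmt.Errorf("no <title> in %s", ctx.Req.Url)
			}
			item := map[string]interface{}{"title": string(m[1])}
			return ParseResult{Items: []interface{}{item}}, nil
		},
	},
}

// dispatch runs the rule named by the request against the body.
func dispatch(req *Request, body []byte) (ParseResult, error) {
	rule, ok := rules[req.RuleName]
	if !ok {
		return ParseResult{}, fmt.Errorf("unknown rule %q", req.RuleName)
	}
	return rule.ParseFunc(&Context{Body: body, Req: req})
}

func main() {
	req := &Request{Url: "https://example.com/", RuleName: "page"}
	res, err := dispatch(req, []byte("<html><title>Example</title></html>"))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(res.Items)
}
```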
type Task added in v0.0.9
type Task struct {
	Property
	Visited     map[string]bool
	VisitedLock sync.Mutex
	RootReq     *Request
	Fetcher     Fetcher
	Rule        RuleTree
	Logger      *log.Logger
	Storage     collector.Store
	Limiter     limiter.MultiLimiter
}
Task represents a complete crawl task.
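Task pairs `Visited` with `VisitedLock`, which suggests a mutex-guarded visited set shared by concurrent workers. A minimal sketch of that check, assuming `Reload` (from the embedded Property) disables deduplication; `tryVisit` and the simplified `sign` helper (standing in for `UniqueSign`) are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// Trimmed-down, hypothetical stand-ins: only the fields this sketch uses.
type Request struct {
	Url    string
	Method string
}

// sign is a simplified stand-in for Request.UniqueSign.
func (r *Request) sign() string { return r.Method + " " + r.Url }

type Property struct {
	Reload bool // allow crawling the same page again
}

type Task struct {
	Property
	Visited     map[string]bool
	VisitedLock sync.Mutex
}

// tryVisit reports whether req should be fetched and records it as
// visited. Holding the lock across check and set makes it atomic.
func (t *Task) tryVisit(req *Request) bool {
	if t.Reload {
		return true
	}
	t.VisitedLock.Lock()
	defer t.VisitedLock.Unlock()
	key := req.sign()
	if t.Visited[key] {
		return false
	}
	t.Visited[key] = true
	return true
}

func main() {
	task := &Task{Visited: make(map[string]bool)}
	req := &Request{Url: "https://example.com/", Method: "GET"}

	var wg sync.WaitGroup
	var mu sync.Mutex
	fetched := 0
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if task.tryVisit(req) {
				mu.Lock()
				fetched++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	fmt.Println("fetched:", fetched) // fetched: 1
}
```

Doing the lookup and the insert under one lock matters: checking and setting in two separately locked steps would let two goroutines both see an unvisited page and fetch it twice.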