crawler

package module
v0.0.0-...-aeae1c0 Latest
Published: Sep 6, 2016 License: MIT Imports: 11 Imported by: 0

Documentation

Constants

This section is empty.

Variables

var (
	// DefaultMethod is "GET"
	DefaultMethod = "GET"
	// DefaultTimeout is 30s
	DefaultTimeout = 30 * time.Second
)
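The page does not say where these defaults are applied. A minimal sketch of the usual fallback pattern, assuming an empty method or a zero timeout means "use the default"; `methodOrDefault` and `timeoutOrDefault` are hypothetical helpers, not part of this package:

```go
package main

import (
	"fmt"
	"time"
)

// Local copies of the documented package-level defaults, so the sketch
// compiles on its own.
var (
	DefaultMethod  = "GET"
	DefaultTimeout = 30 * time.Second
)

// methodOrDefault (hypothetical) returns method, or DefaultMethod when
// method is empty.
func methodOrDefault(method string) string {
	if method == "" {
		return DefaultMethod
	}
	return method
}

// timeoutOrDefault (hypothetical) returns d, or DefaultTimeout when d is
// zero or negative.
func timeoutOrDefault(d time.Duration) time.Duration {
	if d <= 0 {
		return DefaultTimeout
	}
	return d
}

func main() {
	fmt.Println(methodOrDefault(""), timeoutOrDefault(0))
	fmt.Println(methodOrDefault("POST"), timeoutOrDefault(5*time.Second))
}
```

Because both defaults are declared with var rather than const, a caller can reassign them before creating a Crawler.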

Functions

func FileDownload

func FileDownload(fileURL, distPath string) (filePath string, err error)

FileDownload downloads the file at fileURL, saves it under distPath, and returns the path of the saved file.
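The source of FileDownload is not shown here, so the following is only a sketch of how a function with the same signature could be written with the standard library. `downloadTo` and `demo` are hypothetical names, and the choice to name the file after the last URL path segment (with an `index.html` fallback) is an assumption:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/url"
	"os"
	"path"
	"path/filepath"
)

// downloadTo (hypothetical) fetches fileURL and writes the body to a file
// inside distPath, returning the path of the written file.
func downloadTo(fileURL, distPath string) (filePath string, err error) {
	u, err := url.Parse(fileURL)
	if err != nil {
		return "", err
	}
	name := path.Base(u.Path)
	if name == "." || name == "/" {
		name = "index.html" // assumed fallback for URLs without a file name
	}
	resp, err := http.Get(fileURL)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("download %s: unexpected status %s", fileURL, resp.Status)
	}
	if err := os.MkdirAll(distPath, 0o755); err != nil {
		return "", err
	}
	filePath = filepath.Join(distPath, name)
	out, err := os.Create(filePath)
	if err != nil {
		return "", err
	}
	if _, err := io.Copy(out, resp.Body); err != nil {
		out.Close()
		return "", err
	}
	// Close can report a failed flush, so its error is returned too.
	return filePath, out.Close()
}

// demo serves a fixed body from a local test server, downloads it into a
// temporary directory, and returns the saved content.
func demo() (string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "hello")
	}))
	defer srv.Close()

	dir, err := os.MkdirTemp("", "download")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	p, err := downloadTo(srv.URL+"/files/hello.txt", dir)
	if err != nil {
		return "", err
	}
	data, err := os.ReadFile(p)
	return string(data), err
}

func main() {
	content, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(content)
}
```

The demo uses a local httptest server, so the sketch can be run without network access.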

Types

type Context

type Context struct {
	Crawler *Crawler

	Request  *http.Request
	Response *http.Response
	Document *goquery.Document
	Param    map[string]interface{}
	// contains filtered or unexported fields
}

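Context carries the state for one request and its response: the owning Crawler, the Request and Response, the parsed goquery Document, and the Param map.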
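Param is a map[string]interface{}, so values read from it need a type assertion before use. A small sketch of reading it safely; `paramString` and `paramInt` are hypothetical helpers, not part of this package, and the keys are made up for illustration:

```go
package main

import "fmt"

// paramString (hypothetical) returns the string stored under key, and false
// when the key is missing or holds a value of another type.
func paramString(param map[string]interface{}, key string) (string, bool) {
	v, ok := param[key]
	if !ok {
		return "", false
	}
	s, ok := v.(string)
	return s, ok
}

// paramInt (hypothetical) does the same for int values.
func paramInt(param map[string]interface{}, key string) (int, bool) {
	v, ok := param[key]
	if !ok {
		return 0, false
	}
	n, ok := v.(int)
	return n, ok
}

func main() {
	// Stand-in for a Context.Param value.
	param := map[string]interface{}{"category": "news", "page": 2}

	if category, ok := paramString(param, "category"); ok {
		fmt.Println("category:", category)
	}
	if page, ok := paramInt(param, "page"); ok {
		fmt.Println("page:", page)
	}
	if _, ok := paramInt(param, "category"); !ok {
		fmt.Println("category is not an int")
	}
}
```

The two-value form of the type assertion avoids a panic when a value has an unexpected type.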

type Crawler

type Crawler struct {
	sync.RWMutex
	// contains filtered or unexported fields
}

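Crawler is the crawler itself. It holds the queues, rules, and storer data added through the methods below, and it embeds sync.RWMutex.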

func New

func New(option Option) *Crawler

New returns a new Crawler configured by option.

func (*Crawler) AddDataToStorer

func (c *Crawler) AddDataToStorer(name string, data interface{})

AddDataToStorer adds data to the storer under name.

func (*Crawler) AddQueue

func (c *Crawler) AddQueue(queue Queue)

AddQueue adds a queue to the Crawler.

func (*Crawler) AddRule

func (c *Crawler) AddRule(name string, rule Rule)

AddRule adds a rule to the Crawler under name.

func (*Crawler) Run

func (c *Crawler) Run()

Run initializes and starts the Crawler.

func (*Crawler) Stop

func (c *Crawler) Stop()

Stop stops the Crawler.

type Option

type Option struct {
	Name            string
	LogDisable      bool
	AutoStopDisable bool
	PauseTime       []int

	DefaultMethod string
	StorerWork    func(map[string][]interface{}) map[string]bool
	// contains filtered or unexported fields
}

Option holds the configuration of a Crawler and is passed to New.
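StorerWork is the one callback in Option, and the page does not document its contract. The sketch below only shows a function with the matching type. It assumes the argument maps each storer name to the items collected under it, and that a true value in the result marks that name as handled; both points are assumptions, and `storeAll` is a hypothetical name:

```go
package main

import "fmt"

// storeAll (hypothetical) has the type of Option.StorerWork. It reports how
// many items were collected under each name and marks every non-empty name
// as handled. The meaning of the returned bool is an assumption.
func storeAll(data map[string][]interface{}) map[string]bool {
	handled := make(map[string]bool, len(data))
	for name, items := range data {
		if len(items) == 0 {
			handled[name] = false
			continue
		}
		fmt.Printf("storing %d item(s) for %q\n", len(items), name)
		handled[name] = true
	}
	return handled
}

func main() {
	result := storeAll(map[string][]interface{}{
		"titles": {"first", "second"},
		"links":  {},
	})
	fmt.Println(result["titles"], result["links"])
}
```

A function like this could be assigned to the StorerWork field of an Option literal before calling New.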

type Queue

type Queue struct {
	URL    string
	Method string
	Rule   string
	Param  map[string]interface{}
}

Queue describes one request queued for the Crawler: the URL, the HTTP method, the name of the Rule to apply, and extra parameters.
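A sketch of building several Queue values for a paginated listing. The Queue type is redeclared locally so the example compiles on its own; `pageQueues`, the `page` query parameter, and the example URL are hypothetical:

```go
package main

import "fmt"

// Queue mirrors the documented struct so this sketch is self-contained.
type Queue struct {
	URL    string
	Method string
	Rule   string
	Param  map[string]interface{}
}

// pageQueues (hypothetical) builds one Queue per page, numbered 1..n, all
// pointing at the same rule name and carrying the page number in Param.
func pageQueues(baseURL, rule string, n int) []Queue {
	var queues []Queue
	for page := 1; page <= n; page++ {
		queues = append(queues, Queue{
			URL:    fmt.Sprintf("%s?page=%d", baseURL, page),
			Method: "GET",
			Rule:   rule,
			Param:  map[string]interface{}{"page": page},
		})
	}
	return queues
}

func main() {
	for _, q := range pageQueues("http://example.com/list", "list", 3) {
		fmt.Println(q.Method, q.URL, q.Rule, q.Param["page"])
	}
}
```

With the real package, each value would be handed to AddQueue in turn.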

type Rule

type Rule struct {
	Timeout       time.Duration
	BeforeRequest func(*http.Request)
	Parse         func(*Context) bool
	// contains filtered or unexported fields
}

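Rule describes how the Crawler handles the requests registered under it: a Timeout, a BeforeRequest hook that receives the outgoing *http.Request, and a Parse callback that receives the Context.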
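BeforeRequest has the type func(*http.Request), which suits adjusting headers before a request is sent. A sketch of such a hook using only net/http; `setUserAgent`, `prepare`, and the header value are hypothetical, and `prepare` merely stands in for whatever the Crawler does before sending:

```go
package main

import (
	"fmt"
	"net/http"
)

// setUserAgent (hypothetical) has the type of Rule.BeforeRequest and sets a
// custom User-Agent header on the outgoing request.
func setUserAgent(req *http.Request) {
	req.Header.Set("User-Agent", "example-crawler/0.1")
}

// prepare (hypothetical) builds a request and runs the hook on it, if one is
// given. It does not send the request.
func prepare(method, rawURL string, before func(*http.Request)) (*http.Request, error) {
	req, err := http.NewRequest(method, rawURL, nil)
	if err != nil {
		return nil, err
	}
	if before != nil {
		before(req)
	}
	return req, nil
}

func main() {
	req, err := prepare("GET", "http://example.com/", setUserAgent)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.Header.Get("User-Agent"))
}
```

The meaning of the bool returned by Parse is not documented on this page, so no Parse example is given.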
