Documentation ¶
Index ¶
- Constants
- type Crawler
- func (this *Crawler) AddBaseTask(task model.Task)
- func (this *Crawler) AddParser(parser model.Parser)
- func (this *Crawler) Run()
- func (this *Crawler) SetPProfPort(port string)
- func (this *Crawler) SetProxyGenerator(generater model.ProxyGenerator)
- func (this *Crawler) SetProxyTimeOut(timeout time.Duration)
- func (this *Crawler) WaitForShutDown()
Constants ¶
const (
ErrShutDownCrawler string = "Cannot ShutDown the crwaler when "
)
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type Crawler ¶
type Crawler struct {
// contains filtered or unexported fields
}
func NewDistributedSqlCrawler ¶
func NewDistributedSqlCrawler(db *gorm.DB, config *model.DistributedConfig) *Crawler
Used for distributed mode. It needs ZooKeeper and a SQL database to store the internal data. Make sure your SQL database can be accessed by all the servers.
func NewLocalMemCrawler ¶
Local mode. It stores the internal data in an in-memory queue and is suitable for simple applications.
func NewLocalSqlCrawler ¶
Local mode. It needs a SQL database to store the internal data, which reduces memory use.
func (*Crawler) AddBaseTask ¶
func (this *Crawler) AddBaseTask(task model.Task)
Only the master crawler executes Crawler.AddBaseTask. So, under distributed mode, you only need to change config.json to make your crawler work in a distributed fashion.
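The master-only behaviour described above can be pictured with a small, self-contained sketch. It is not the package's implementation: the `node` type, its `isMaster` flag, and the `addBaseTask` method are hypothetical stand-ins. It only shows why the same seeding call can run on every process while the base task is enqueued once.

```go
package main

import "fmt"

// node is a hypothetical stand-in for one crawler process. In a real
// deployment the master role would come from the distributed configuration;
// here it is a plain flag (an assumption made for illustration).
type node struct {
	isMaster bool
	queue    []string
}

// addBaseTask enqueues a seed task only on the master node and reports
// whether the task was actually enqueued.
func (n *node) addBaseTask(url string) bool {
	if !n.isMaster {
		return false // non-master nodes ignore the seed task
	}
	n.queue = append(n.queue, url)
	return true
}

func main() {
	master := &node{isMaster: true}
	worker := &node{}

	// Every node runs the same code; only the master keeps the seed.
	for _, n := range []*node{master, worker} {
		n.addBaseTask("https://example.com/start")
	}
	fmt.Println("master tasks:", len(master.queue), "worker tasks:", len(worker.queue))
}
```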
func (*Crawler) SetProxyGenerator ¶
func (this *Crawler) SetProxyGenerator(generater model.ProxyGenerator)
func (*Crawler) SetProxyTimeOut ¶
func (this *Crawler) SetProxyTimeOut(timeout time.Duration)
func (*Crawler) WaitForShutDown ¶
func (this *Crawler) WaitForShutDown()
Shuts down when the task list becomes empty.
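As a rough illustration of "shut down when the task list becomes empty", the sketch below drains a task list with a few workers and stops once no task is queued or in progress. It is a generic pattern, not the package's internals: `taskList`, `newTaskList`, `add`, and `run` are hypothetical names, and the buffered channel is assumed to be large enough that workers never block while adding follow-up tasks.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// taskList is a hypothetical in-memory task list (not the package's type).
type taskList struct {
	tasks   chan string
	pending sync.WaitGroup // counts tasks that are queued or being handled
}

// newTaskList creates a list whose buffer must exceed the number of tasks
// that can be outstanding at once (an assumption of this sketch).
func newTaskList(capacity int) *taskList {
	return &taskList{tasks: make(chan string, capacity)}
}

// add marks a task as pending and enqueues it.
func (l *taskList) add(task string) {
	l.pending.Add(1)
	l.tasks <- task
}

// run starts the workers, waits until the list is empty and idle, shuts the
// workers down, and returns the number of tasks handled. handle may return
// follow-up tasks, which are added before the current task is marked done.
func (l *taskList) run(workers int, handle func(task string) []string) int {
	var done sync.WaitGroup
	var handled int64
	for i := 0; i < workers; i++ {
		done.Add(1)
		go func() {
			defer done.Done()
			for task := range l.tasks {
				for _, next := range handle(task) {
					l.add(next)
				}
				atomic.AddInt64(&handled, 1)
				l.pending.Done()
			}
		}()
	}
	l.pending.Wait() // nothing queued and nothing in progress
	close(l.tasks)   // lets every worker leave its range loop
	done.Wait()
	return int(handled)
}

func main() {
	list := newTaskList(16)
	list.add("a")
	n := list.run(2, func(task string) []string {
		if task == "a" {
			return []string{"b", "c"} // the seed task discovers two more tasks
		}
		return nil
	})
	fmt.Println("handled:", n)
}
```

Counting pending tasks rather than checking the queue length avoids stopping early while a worker is still handling a task that may add more work.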
Directories ¶
Path | Synopsis |
---|---|
example | |
douban_movie_top250 | the local memory mode crawler |
douban_movie_top250/distributed_sql_mode | the distributed sql mode crawler |
douban_movie_top250/local_sql_mode | the local sql mode crawler |
github_stars | the local memory mode crawler |
github_stars/distributed_sql_mode | the distributed sql mode crawler |
github_stars/local_sql_mode | the local sql mode crawler |