Documentation ¶
Index ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
Types ¶
type IFSiteCfg ¶
type IFSiteCfg interface {
	GetStartUrls() []string     // start pages
	GetDefaultFileName() string // the site's default index file name, e.g. index.html
	GetHostList() []string      // list of hosts to crawl
	CheckHost(host string) bool // reports whether a host is in the list of hosts to crawl
	ForEachSearchNodes(param interface{}, cbFunc func(nodeName string, attrName string, attrType string, param interface{}))
}
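As an illustration of how this interface might be satisfied, the following is a minimal sketch of an in-memory implementation. The interface is copied from the declaration above; everything else (staticSiteCfg, searchNode, newExampleCfg, the example URLs, and the attrType values "link" and "resource") is hypothetical and not part of this package. The case-insensitive host comparison in CheckHost is also an assumption; the package does not document how hosts are matched.

```go
package main

import (
	"fmt"
	"strings"
)

// IFSiteCfg is copied from the package documentation.
type IFSiteCfg interface {
	GetStartUrls() []string
	GetDefaultFileName() string
	GetHostList() []string
	CheckHost(host string) bool
	ForEachSearchNodes(param interface{}, cbFunc func(nodeName string, attrName string, attrType string, param interface{}))
}

// searchNode is a hypothetical record naming one element/attribute pair to inspect.
type searchNode struct {
	nodeName, attrName, attrType string
}

// staticSiteCfg is a hypothetical in-memory implementation of IFSiteCfg.
type staticSiteCfg struct {
	startUrls       []string
	defaultFileName string
	hosts           []string
	nodes           []searchNode
}

func (c *staticSiteCfg) GetStartUrls() []string     { return c.startUrls }
func (c *staticSiteCfg) GetDefaultFileName() string { return c.defaultFileName }
func (c *staticSiteCfg) GetHostList() []string      { return c.hosts }

// CheckHost reports whether host is in the crawl list.
// Assumption: hosts are compared case-insensitively.
func (c *staticSiteCfg) CheckHost(host string) bool {
	for _, h := range c.hosts {
		if strings.EqualFold(h, host) {
			return true
		}
	}
	return false
}

// ForEachSearchNodes calls cbFunc once per configured node, in order,
// passing param through to the callback unchanged.
func (c *staticSiteCfg) ForEachSearchNodes(param interface{}, cbFunc func(nodeName string, attrName string, attrType string, param interface{})) {
	for _, n := range c.nodes {
		cbFunc(n.nodeName, n.attrName, n.attrType, param)
	}
}

// newExampleCfg builds a configuration with made-up values for illustration.
func newExampleCfg() IFSiteCfg {
	return &staticSiteCfg{
		startUrls:       []string{"http://example.com/"},
		defaultFileName: "index.html",
		hosts:           []string{"example.com"},
		nodes: []searchNode{
			{"a", "href", "link"},       // attrType values are assumptions
			{"img", "src", "resource"},
		},
	}
}

func main() {
	cfg := newExampleCfg()
	fmt.Println("start URLs:", cfg.GetStartUrls())
	fmt.Println("crawl example.com:", cfg.CheckHost("example.com"))

	// Collect the configured node/attribute pairs through the callback.
	var seen []string
	cfg.ForEachSearchNodes(&seen, func(nodeName, attrName, attrType string, param interface{}) {
		out := param.(*[]string)
		*out = append(*out, nodeName+"."+attrName)
	})
	fmt.Println("search nodes:", seen)
}
```

The param argument lets a caller thread its own state (here a pointer to a slice) through the iteration without relying on a closure, which appears to be the intent of the callback signature.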
type MyPageProcesser ¶
type MyPageProcesser struct {
// contains filtered or unexported fields
}
func NewMyPageProcesser ¶
func NewMyPageProcesser(configerIn interface{}) *MyPageProcesser
func (*MyPageProcesser) Finish ¶
func (this *MyPageProcesser) Finish()
func (*MyPageProcesser) Process ¶
func (this *MyPageProcesser) Process(p *page.Page)
Process parses the page's HTML DOM and records the desired parse results in the Page. Package goquery (http://godoc.org/github.com/PuerkitoBio/goquery) is used to parse the HTML.
type MyPipeline ¶
type MyPipeline struct { }
func (*MyPipeline) Process ¶
func (this *MyPipeline) Process(items *page_items.PageItems, t com_interfaces.Task)