spiderIns

package
v0.0.0-...-da0e060
Warning

This package is not in the latest version of its module.

Published: Jul 13, 2017 | License: Apache-2.0 | Imports: 17 | Imported by: 0

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func Run

func Run()

Types

type IFSiteCfg

type IFSiteCfg interface {
	GetStartUrls() []string     // start (seed) pages
	GetDefaultFileName() string // the site's default index file name, e.g. index.html
	GetHostList() []string      // list of hosts to crawl
	CheckHost(host string) bool // reports whether a host is in the list of hosts to crawl
	ForEachSearchNodes(param interface{}, cbFunc func(nodeName string, attrName string, attrType string, param interface{}))
}
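The interface is documented only by its method comments, so the following is a minimal, hypothetical implementation to show how the pieces might fit together. The type names `staticSiteCfg` and `searchNode`, the case-insensitive host match, and the attribute-type labels `"link"` and `"resource"` are all assumptions. The semantics of `ForEachSearchNodes` (call `cbFunc` once per configured element/attribute pair, passing `param` through unchanged) are inferred from its signature, not confirmed by the package.

```go
package main

import (
	"fmt"
	"strings"
)

// IFSiteCfg mirrors the documented interface so this sketch compiles on its own.
type IFSiteCfg interface {
	GetStartUrls() []string
	GetDefaultFileName() string
	GetHostList() []string
	CheckHost(host string) bool
	ForEachSearchNodes(param interface{}, cbFunc func(nodeName string, attrName string, attrType string, param interface{}))
}

// searchNode is a hypothetical record naming one HTML element/attribute pair
// to scan, e.g. <a href>, plus a free-form type label (assumed).
type searchNode struct {
	nodeName, attrName, attrType string
}

// staticSiteCfg is a hypothetical in-memory implementation of IFSiteCfg.
type staticSiteCfg struct {
	startUrls   []string
	defaultFile string
	hosts       []string
	nodes       []searchNode
}

func (c *staticSiteCfg) GetStartUrls() []string     { return c.startUrls }
func (c *staticSiteCfg) GetDefaultFileName() string { return c.defaultFile }
func (c *staticSiteCfg) GetHostList() []string      { return c.hosts }

// CheckHost reports whether host is in the crawl list (case-insensitive here by assumption).
func (c *staticSiteCfg) CheckHost(host string) bool {
	for _, h := range c.hosts {
		if strings.EqualFold(h, host) {
			return true
		}
	}
	return false
}

// ForEachSearchNodes calls cbFunc once per configured node, forwarding param unchanged.
func (c *staticSiteCfg) ForEachSearchNodes(param interface{}, cbFunc func(nodeName, attrName, attrType string, param interface{})) {
	for _, n := range c.nodes {
		cbFunc(n.nodeName, n.attrName, n.attrType, param)
	}
}

// Compile-time check that staticSiteCfg satisfies IFSiteCfg.
var _ IFSiteCfg = (*staticSiteCfg)(nil)

func main() {
	cfg := &staticSiteCfg{
		startUrls:   []string{"http://example.com/"},
		defaultFile: "index.html",
		hosts:       []string{"example.com"},
		nodes: []searchNode{
			{"a", "href", "link"},
			{"img", "src", "resource"},
		},
	}

	// Use param as an accumulator: collect "element[attribute]" selectors.
	var selectors []string
	cfg.ForEachSearchNodes(&selectors, func(nodeName, attrName, attrType string, param interface{}) {
		out := param.(*[]string)
		*out = append(*out, nodeName+"["+attrName+"]")
	})
	fmt.Println(selectors, cfg.CheckHost("EXAMPLE.com"), cfg.CheckHost("other.org"))
	// prints: [a[href] img[src]] true false
}
```

Passing `param` through as an opaque `interface{}` lets a caller thread its own state (here, a slice pointer) into the callback without a closure, which is consistent with the pre-generics style of the signature.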

type MyPageProcesser

type MyPageProcesser struct {
	// contains filtered or unexported fields
}

func NewMyPageProcesser

func NewMyPageProcesser(configerIn interface{}) *MyPageProcesser

func (*MyPageProcesser) Finish

func (this *MyPageProcesser) Finish()

func (*MyPageProcesser) Process

func (this *MyPageProcesser) Process(p *page.Page)

Process parses the HTML DOM and records the desired parse results in the Page. Package goquery (http://godoc.org/github.com/PuerkitoBio/goquery) is used to parse the HTML.
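Process is described only as parsing the DOM with goquery. A crawler configured through `IFSiteCfg` would plausibly need to turn each extracted `href`/`src` value into an absolute, in-scope URL before queuing it. The sketch below shows that step using only the standard `net/url` and `path` packages. The helper name `resolveLink`, the http/https-only rule, fragment stripping, and the rewrite of directory URLs to the default index file (the value `GetDefaultFileName` returns) are assumptions for illustration, not the package's confirmed behavior.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
	"strings"
)

// resolveLink is a hypothetical helper. It resolves href against pageURL,
// rejects non-HTTP(S) links and hosts that allowHost refuses, drops the
// fragment, and maps directory-style paths to defaultFile.
func resolveLink(pageURL, href string, allowHost func(string) bool, defaultFile string) (string, bool) {
	base, err := url.Parse(pageURL)
	if err != nil {
		return "", false
	}
	ref, err := url.Parse(strings.TrimSpace(href))
	if err != nil {
		return "", false
	}
	abs := base.ResolveReference(ref)

	// Skip mailto:, javascript:, and other non-web schemes.
	if abs.Scheme != "http" && abs.Scheme != "https" {
		return "", false
	}
	// Hostname() strips any port before the host check.
	if !allowHost(abs.Hostname()) {
		return "", false
	}
	// A fragment never changes which document is fetched.
	abs.Fragment = ""

	// Treat "" and ".../" as directories served by the default index file.
	if abs.Path == "" {
		abs.Path = "/"
	}
	if strings.HasSuffix(abs.Path, "/") {
		abs.Path = path.Join(abs.Path, defaultFile)
	}
	return abs.String(), true
}

func main() {
	allow := func(h string) bool { return strings.EqualFold(h, "example.com") }
	for _, href := range []string{"b.html#top", "../img/", "https://other.org/x", "mailto:me@example.com"} {
		u, ok := resolveLink("http://example.com/docs/a.html", href, allow, "index.html")
		fmt.Println(href, "->", u, ok)
	}
}
```

In a real processor, `allowHost` would likely be the configuration's `CheckHost` method value and `defaultFile` its `GetDefaultFileName()` result; passing them as plain values keeps the helper testable without a full configuration.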

type MyPipeline

type MyPipeline struct {
}

func (*MyPipeline) Process

func (this *MyPipeline) Process(items *page_items.PageItems, t com_interfaces.Task)
