fisher

package module
v0.2.3 Latest
Warning

This package is not in the latest version of its module.

Published: Jun 6, 2023 License: BSD-3-Clause Imports: 18 Imported by: 0

README

fisher

Crawl utils.

Documentation


Constants

const COMPLETE_OPT_SIZE = 128
const PARA_OPT_SIZE = 16

Variables

This section is empty.

Functions

func ExtractJsonListFromFile added in v0.1.7

func ExtractJsonListFromFile(srcFile string) ([]map[string]interface{}, error)

func GetFullLink

func GetFullLink(ctx context.Context, itemSel string) (hrefUrl string, err error)

Uses chromedp.Run to obtain the full link. The target element itself must have an href attribute.

func GetProxyAddress added in v0.1.9

func GetProxyAddress(ctx context.Context, url string) (rtn string, err error)

Gets a single SOCKS5 proxy address.

func GetProxyAddressArray added in v0.1.9

func GetProxyAddressArray(ctx context.Context, url string) (rtn []string, err error)

Gets a list of SOCKS5 proxy addresses.

func GetProxyAddressString added in v0.1.9

func GetProxyAddressString(obj ProxyInfo) (result string)

func GetProxyArrayWithConvertFunc added in v0.1.9

func GetProxyArrayWithConvertFunc(ctx context.Context, url string, convertFunc ConvertFunc) (rtn []string, err error)

Gets a list of SOCKS5 proxy addresses, formatting each entry with convertFunc.

func GetSocks5Proxy

func GetSocks5Proxy(ctx context.Context, url string) (rtn string, err error)

Gets a single SOCKS5 proxy address.

func GetSocks5ProxyArray added in v0.1.8

func GetSocks5ProxyArray(ctx context.Context, url string) (rtn []string, err error)

Gets a list of SOCKS5 proxy addresses.

func GetSocks5ProxyUrl added in v0.1.8

func GetSocks5ProxyUrl(obj ProxyInfo) (result string)

func GetTbodyDom added in v0.1.3

func GetTbodyDom(content string) (*goquery.Document, error)

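If the content is a tbody element, this replacement is required before goquery can read and parse it correctly. When reading the parsed result, the element is still addressed as tbody.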

func GetTheadDom added in v0.1.6

func GetTheadDom(content string) (*goquery.Document, error)

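If the content is a thead element, this replacement is required before goquery can read and parse it correctly. When reading the parsed result, the element must be addressed as tbody instead.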

func NewChromedpContext added in v0.1.0

func NewChromedpContext(ctx context.Context) (context.Context, context.CancelFunc)

func RecordData

func RecordData(data interface{}) (err error)

Writes a byte array that is already JSON-serialized, a string, or an object that can be JSON-serialized.
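
The three accepted input kinds can be illustrated with a type switch. This is only a sketch of the described behavior, not fisher's code: the helper toJSONBytes is hypothetical, and it assumes byte and string inputs are passed through unchanged.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// toJSONBytes is a hypothetical helper, not part of fisher.
// It normalizes the three documented input kinds to bytes.
func toJSONBytes(data interface{}) ([]byte, error) {
	switch v := data.(type) {
	case []byte:
		return v, nil // assumed to be serialized JSON already
	case string:
		return []byte(v), nil // assumed to be serialized JSON already
	default:
		return json.Marshal(v) // anything else must be JSON-serializable
	}
}

func main() {
	type item struct {
		Title string `json:"title"`
	}
	inputs := []interface{}{
		[]byte(`{"title":"raw bytes"}`),
		`{"title":"raw string"}`,
		item{Title: "struct"},
	}
	for _, in := range inputs {
		b, err := toJSONBytes(in)
		if err != nil {
			panic(err)
		}
		fmt.Println(string(b))
	}
}
```

Values that encoding/json cannot serialize, such as channels, surface as an error from the default branch.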

func RunWithCrawler

func RunWithCrawler(ctx context.Context, crawler Crawler)

func SetRecordWriter added in v0.2.0

func SetRecordWriter(writer ioutils.Writer)

func StartRecoder added in v0.0.8

func StartRecoder(ctx context.Context, writer ioutils.Writer)

Starts the data-recording service so that data can be written from anywhere in the program.

func ToCrawl

func ToCrawl(action CrawlAction)

Types

type ConvertFunc added in v0.1.9

type ConvertFunc func(obj ProxyInfo) (result string)
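
A ConvertFunc turns one ProxyInfo into a string, which lets the caller choose the address format. The sketch below redeclares the documented types locally so it compiles on its own. The converters hostPort and socks5URL and the helper convertAll are hypothetical, and the "socks5://" format is an assumption, since the output of fisher's own converters is not documented here.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// Local copies of the documented types so the sketch is self-contained.
type ProxyInfo struct {
	IP   string `json:"ip"`
	Port int    `json:"port"`
}

type ConvertFunc func(obj ProxyInfo) (result string)

// hostPort is a hypothetical converter that yields "ip:port".
// net.JoinHostPort also brackets IPv6 literals.
func hostPort(obj ProxyInfo) string {
	return net.JoinHostPort(obj.IP, strconv.Itoa(obj.Port))
}

// socks5URL is a hypothetical converter; the URL format is an assumption.
func socks5URL(obj ProxyInfo) string {
	return "socks5://" + hostPort(obj)
}

// convertAll applies convert to every entry, preserving order.
func convertAll(infos []ProxyInfo, convert ConvertFunc) []string {
	out := make([]string, 0, len(infos))
	for _, info := range infos {
		out = append(out, convert(info))
	}
	return out
}

func main() {
	infos := []ProxyInfo{{IP: "127.0.0.1", Port: 1080}, {IP: "::1", Port: 1081}}
	fmt.Println(convertAll(infos, hostPort))
	fmt.Println(convertAll(infos, socks5URL))
}
```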

type CrawlAction

type CrawlAction = func(proxyReqUrl string, isHeadless bool, customMap map[string]string, outputFile string)

func GetAction

func GetAction(crawler Crawler) CrawlAction

type Crawler

type Crawler interface {
	Crawl(ctx context.Context) error
	ParseArgs(argMap map[string]string) error
}

type ProxyInfo

type ProxyInfo struct {
	IP   string `json:"ip"`
	Port int    `json:"port"`
}

Defines the struct that mirrors the JSON format of a proxy entry.

func GetProxyInfo

func GetProxyInfo(ctx context.Context, url string) (ret []ProxyInfo, err error)

Gets a list of proxy address entries.

type ProxyResponse

type ProxyResponse struct {
	Code    int         `json:"code"`
	Data    []ProxyInfo `json:"data"`
	Msg     string      `json:"msg"`
	Success bool        `json:"success"`
}
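
The struct tags imply a JSON envelope with code, data, msg, and success fields. The sketch below decodes such a payload with encoding/json. The sample payload is invented for illustration, decodeProxies is a hypothetical helper, and treating success=false as an error is an assumption rather than documented behavior.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Local copies of the documented types so the sketch is self-contained.
type ProxyInfo struct {
	IP   string `json:"ip"`
	Port int    `json:"port"`
}

type ProxyResponse struct {
	Code    int         `json:"code"`
	Data    []ProxyInfo `json:"data"`
	Msg     string      `json:"msg"`
	Success bool        `json:"success"`
}

// decodeProxies is a hypothetical helper, not fisher's code.
// Assumption: a response with success=false should be reported as an error.
func decodeProxies(body []byte) ([]ProxyInfo, error) {
	var resp ProxyResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, fmt.Errorf("decode proxy response: %w", err)
	}
	if !resp.Success {
		return nil, fmt.Errorf("proxy service error %d: %s", resp.Code, resp.Msg)
	}
	return resp.Data, nil
}

func main() {
	// Invented sample payload matching the struct tags above.
	body := []byte(`{"code":0,"data":[{"ip":"10.0.0.1","port":1080}],"msg":"ok","success":true}`)
	infos, err := decodeProxies(body)
	if err != nil {
		panic(err)
	}
	for _, p := range infos {
		fmt.Printf("%s:%d\n", p.IP, p.Port)
	}
}
```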

type Root added in v0.1.0

type Root struct {
	// contains filtered or unexported fields
}
