crawler

package
v0.0.0-...-ed9fe1d
Warning

This package is not in the latest version of its module.

Published: Aug 2, 2018 License: BSD-3-Clause Imports: 6 Imported by: 0

Documentation

Overview

Package crawler provides types and functions to run a crawler.

Constants

This section is empty.

Variables

This section is empty.

Functions

func Run

func Run(fetcher Fetcher, scraper Scraper) (interface{}, error)

Run executes the fetcher and passes the fetched content to the scraper.
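
As a rough illustration of the fetch-then-scrape flow described above, the sketch below re-declares Fetcher and Scraper locally with the documented signatures so that it compiles without importing the package. The `run` helper is an assumed equivalent of Run, not the package's actual source, and `stringFetcher` and `wordCounter` are names invented for this example.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// Fetcher and Scraper are local copies of the documented interfaces.
type Fetcher interface {
	Fetch() (io.Reader, error)
}

type Scraper interface {
	Scrape(r io.Reader) (interface{}, error)
}

// stringFetcher is a hypothetical Fetcher that serves a fixed string.
type stringFetcher string

func (s stringFetcher) Fetch() (io.Reader, error) {
	return strings.NewReader(string(s)), nil
}

// wordCounter is a hypothetical Scraper that counts whitespace-separated words.
type wordCounter struct{}

func (wordCounter) Scrape(r io.Reader) (interface{}, error) {
	b, err := io.ReadAll(r)
	if err != nil {
		return nil, err
	}
	return len(strings.Fields(string(b))), nil
}

// run is an assumed equivalent of Run: fetch first, stop on error,
// otherwise hand the reader to the scraper.
func run(f Fetcher, s Scraper) (interface{}, error) {
	r, err := f.Fetch()
	if err != nil {
		return nil, err
	}
	return s.Scrape(r)
}

func main() {
	result, err := run(stringFetcher("a b c"), wordCounter{})
	if err != nil {
		fmt.Println("crawl failed:", err)
		return
	}
	fmt.Println(result) // prints 3
}
```

Because the result is returned as interface{}, callers typically need a type assertion (here `result.(int)`) before using it.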

Types

type ErrHTTP

type ErrHTTP struct {
	Response *http.Response
	Content  []byte
}

ErrHTTP is the error returned when a fetch fails. It carries the HTTP response and the content that was read.

func (*ErrHTTP) Error

func (e *ErrHTTP) Error() string
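
Since ErrHTTP exposes the response and body, a caller may want to recover it from a returned error. The sketch below uses a local copy of the struct and the standard `errors.As`. The message format in `Error`, and the helpers `statusOf` and `sampleErr`, are assumptions made for this example; the documentation does not show the real message or which functions return this error.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// ErrHTTP is a local copy of the documented struct.
type ErrHTTP struct {
	Response *http.Response
	Content  []byte
}

// Error uses an assumed message format; the package's wording may differ.
func (e *ErrHTTP) Error() string {
	if e.Response == nil {
		return "http error"
	}
	return fmt.Sprintf("http error: %s", e.Response.Status)
}

// statusOf reports the HTTP status code carried by err, or 0 when err is
// not (and does not wrap) an *ErrHTTP.
func statusOf(err error) int {
	var httpErr *ErrHTTP
	if errors.As(err, &httpErr) && httpErr.Response != nil {
		return httpErr.Response.StatusCode
	}
	return 0
}

// sampleErr builds a wrapped *ErrHTTP for demonstration purposes.
func sampleErr() error {
	return fmt.Errorf("crawl: %w", &ErrHTTP{
		Response: &http.Response{Status: "404 Not Found", StatusCode: http.StatusNotFound},
		Content:  []byte("not found"),
	})
}

func main() {
	fmt.Println(statusOf(sampleErr()))         // prints 404
	fmt.Println(statusOf(errors.New("other"))) // prints 0
}
```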

type Fetcher

type Fetcher interface {
	Fetch() (io.Reader, error)
}

Fetcher is an interface to get a raw resource for a crawled target.

type FileFetcher

type FileFetcher struct {
	// contains filtered or unexported fields
}

FileFetcher is an implementation that fetches content from a local file path.

func NewFileFetcher

func NewFileFetcher(path string) *FileFetcher

NewFileFetcher returns a new *FileFetcher for the given file path.

func (*FileFetcher) Fetch

func (f *FileFetcher) Fetch() (io.Reader, error)

Fetch implements Fetcher#Fetch
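
FileFetcher's fields are unexported, so its internals are not visible here. As one possible shape for a file-backed Fetcher, the sketch below defines a hypothetical `fileFetcher` that reads the whole file into memory; whether the real type buffers the file or returns an open handle is not stated in this documentation. The `fetchString` helper exists only to exercise the sketch.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"os"
	"path/filepath"
)

// fileFetcher is a hypothetical stand-in for FileFetcher.
type fileFetcher struct {
	path string
}

func newFileFetcher(path string) *fileFetcher {
	return &fileFetcher{path: path}
}

// Fetch reads the whole file into memory so no file handle stays open
// after the call returns.
func (f *fileFetcher) Fetch() (io.Reader, error) {
	b, err := os.ReadFile(f.path)
	if err != nil {
		return nil, err
	}
	return bytes.NewReader(b), nil
}

// fetchString writes content to a temporary file, fetches it back through
// fileFetcher, and returns what was read.
func fetchString(content string) (string, error) {
	dir, err := os.MkdirTemp("", "crawler-example")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "page.html")
	if err := os.WriteFile(path, []byte(content), 0o600); err != nil {
		return "", err
	}
	r, err := newFileFetcher(path).Fetch()
	if err != nil {
		return "", err
	}
	b, err := io.ReadAll(r)
	return string(b), err
}

func main() {
	got, err := fetchString("<p>hello</p>")
	fmt.Println(got, err)
}
```

A file-backed fetcher like this is convenient for testing a Scraper against saved pages without making network requests.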

type HTMLScraper

type HTMLScraper func(doc *goquery.Document) (interface{}, error)

HTMLScraper is a function wrapper for the Scraper interface using goquery.

Deprecated: use github.com/speedland/go/crawler/html.HTMLScraper instead.

func (HTMLScraper) Scrape

func (f HTMLScraper) Scrape(r io.Reader) (interface{}, error)

Scrape implements Scraper#Scrape.

Deprecated: use github.com/speedland/go/crawler/html.HTMLScraper#Scrape instead.

type HTTPFetcher

type HTTPFetcher struct {
	// contains filtered or unexported fields
}

HTTPFetcher is an implementation that fetches content from a URL.

func NewHTTPFetcher

func NewHTTPFetcher(url string, client *http.Client) *HTTPFetcher

NewHTTPFetcher returns a new *HTTPFetcher for the given url and http.Client. client can be nil, in which case http.DefaultClient is used.

func (*HTTPFetcher) Fetch

func (f *HTTPFetcher) Fetch() (io.Reader, error)

Fetch implements Fetcher#Fetch
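
To illustrate the nil-client rule documented for NewHTTPFetcher, the sketch below defines a hypothetical `httpFetcher` with the same constructor shape and runs it against a local `httptest` server. The struct fields, the buffering of the response body, and the helpers `fetchFromTestServer` and `usesDefaultClient` are assumptions for this example. Handling of non-2xx responses is left out because the documentation above does not specify it.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// httpFetcher is a hypothetical stand-in for HTTPFetcher.
type httpFetcher struct {
	url    string
	client *http.Client
}

// newHTTPFetcher mirrors the documented rule: a nil client means
// http.DefaultClient.
func newHTTPFetcher(url string, client *http.Client) *httpFetcher {
	if client == nil {
		client = http.DefaultClient
	}
	return &httpFetcher{url: url, client: client}
}

// Fetch performs a GET, buffers the body, and closes the response so the
// returned io.Reader needs no cleanup by the caller.
func (f *httpFetcher) Fetch() (io.Reader, error) {
	resp, err := f.client.Get(f.url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	b, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	return bytes.NewReader(b), nil
}

// fetchFromTestServer serves body from a local test server and fetches it
// back using a fetcher built with a nil client.
func fetchFromTestServer(body string) (string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		fmt.Fprint(w, body)
	}))
	defer srv.Close()

	r, err := newHTTPFetcher(srv.URL, nil).Fetch()
	if err != nil {
		return "", err
	}
	b, err := io.ReadAll(r)
	return string(b), err
}

// usesDefaultClient reports whether a nil client falls back to
// http.DefaultClient in this sketch.
func usesDefaultClient() bool {
	return newHTTPFetcher("http://example.invalid", nil).client == http.DefaultClient
}

func main() {
	fmt.Println(usesDefaultClient()) // prints true
	got, err := fetchFromTestServer("hello")
	fmt.Println(got, err)
}
```

Passing a custom `*http.Client` instead of nil is the usual way to set a timeout or transport, since http.DefaultClient has no timeout.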

type Scraper

type Scraper interface {
	Scrape(r io.Reader) (interface{}, error)
}

Scraper is an interface to scrape content from a reader.

type ScraperFunc

type ScraperFunc func(r io.Reader) (interface{}, error)

ScraperFunc is a function wrapper for the Scraper interface.

func (ScraperFunc) Scrape

func (f ScraperFunc) Scrape(r io.Reader) (interface{}, error)

Scrape implements Scraper#Scrape
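
ScraperFunc follows the same adapter idea as http.HandlerFunc: converting a plain function to the named type makes it satisfy Scraper. The sketch below uses local copies of both types. The body of `Scrape` is the conventional adapter implementation and is assumed rather than taken from the package source; `firstLine` and `scrapeWith` are hypothetical helpers.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// Scraper and ScraperFunc are local copies of the documented types.
type Scraper interface {
	Scrape(r io.Reader) (interface{}, error)
}

type ScraperFunc func(r io.Reader) (interface{}, error)

// Scrape calls the underlying function (assumed implementation).
func (f ScraperFunc) Scrape(r io.Reader) (interface{}, error) {
	return f(r)
}

// firstLine is a hypothetical scraping function that returns the first
// line of the input, or an empty string when there is none.
func firstLine(r io.Reader) (interface{}, error) {
	sc := bufio.NewScanner(r)
	if sc.Scan() {
		return sc.Text(), nil
	}
	if err := sc.Err(); err != nil {
		return nil, err
	}
	return "", nil
}

// scrapeWith accepts any Scraper, which a converted function satisfies.
func scrapeWith(s Scraper, input string) (interface{}, error) {
	return s.Scrape(strings.NewReader(input))
}

func main() {
	v, err := scrapeWith(ScraperFunc(firstLine), "title\nbody")
	fmt.Println(v, err) // prints: title <nil>
}
```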
