Documentation ¶
Index ¶
- Constants
- Variables
- type Context
- func (c *Context) Attr(expr string) string
- func (c *Context) Build(input q.InputFunc, processors ...q.ProcessorFunc) *QueryValue
- func (c *Context) Each(expr string, fn func(int, *Context))
- func (c *Context) Find(expr string) string
- func (c *Context) FindAll(expr string) []string
- func (c *Context) Or(values ...*QueryValue) *QueryValue
- type Domain
- type QueryValue
- type Scraper
Constants ¶
const DefaultUserAgent = "Flexiscraper (https://github.com/harrisbaird/flexiscraper)"
DefaultUserAgent is the default user agent string. It's used in all http requests and during robots.txt validation.
Variables ¶
var ErrDisallowedByRobots = errors.New("HTTP request disallowed by robots.txt")
ErrDisallowedByRobots is returned when the requested URL is disallowed by robots.txt.
var ErrNoMatches = errors.New("No matching queries")
ErrNoMatches is returned when no queries match.
Functions ¶
This section is empty.
Types ¶
type Context ¶
func (*Context) Attr ¶
func (c *Context) Attr(expr string) string
func (*Context) Build ¶
func (c *Context) Build(input q.InputFunc, processors ...q.ProcessorFunc) *QueryValue
func (*Context) Each ¶
func (c *Context) Each(expr string, fn func(int, *Context))
Each finds nodes matching an XPath expression and calls the given function for each node.
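For illustration, a minimal sketch of iterating over matched nodes with Each and Find. The XPath expressions, the printProducts helper, and the assumption that ctx wraps an already-fetched page are all illustrative, not part of the package:
package scrape

import (
	"fmt"

	"github.com/harrisbaird/flexiscraper"
)

// printProducts is a hypothetical helper; ctx is assumed to wrap a
// fetched, parsed page.
func printProducts(ctx *flexiscraper.Context) {
	// Each calls fn once per node matching the expression, passing the
	// node's index and a *Context scoped to that node.
	ctx.Each("//ul[@id='products']/li", func(i int, product *flexiscraper.Context) {
		fmt.Printf("product %d: %s\n", i, product.Find(".//h2"))
	})
}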
func (*Context) Find ¶
func (c *Context) Find(expr string) string
func (*Context) FindAll ¶
func (c *Context) FindAll(expr string) []string
func (*Context) Or ¶
func (c *Context) Or(values ...*QueryValue) *QueryValue
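A sketch of falling back across queries. Building the individual QueryValues depends on the q subpackage and isn't shown on this page, so the title and ogTitle values below are assumed to come from earlier Build calls:
package scrape

import "github.com/harrisbaird/flexiscraper"

// pickTitle is a hypothetical helper; title and ogTitle are assumed to
// be *QueryValue results from earlier c.Build calls (construction
// depends on the q subpackage and is omitted here).
func pickTitle(c *flexiscraper.Context, title, ogTitle *flexiscraper.QueryValue) string {
	// Or yields the first value that matched; judging by ErrNoMatches
	// above, a value reflecting that error presumably results when
	// none matched.
	return c.Or(title, ogTitle).String()
}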
type Domain ¶
Domain defines the implementation for scraping a single domain.
func (*Domain) Fetch ¶
Fetch fetches and parses HTML from the given URL. If ObeyRobots is true on the Scraper, it checks and obeys robots.txt.
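As a hedged sketch (this page doesn't show Fetch's signature or how a Domain is obtained, so both are assumed here), fetching a page and handling the robots.txt error might look like this:
package scrape

import (
	"errors"
	"log"

	"github.com/harrisbaird/flexiscraper"
)

// fetchProducts assumes Fetch has the shape
// Fetch(url string) (*flexiscraper.Context, error); check the source
// for the real signature.
func fetchProducts(domain *flexiscraper.Domain) *flexiscraper.Context {
	ctx, err := domain.Fetch("https://example.com/products")
	if err != nil {
		if errors.Is(err, flexiscraper.ErrDisallowedByRobots) {
			log.Println("skipping: disallowed by robots.txt")
			return nil
		}
		log.Fatal(err)
	}
	return ctx
}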
type QueryValue ¶
func (*QueryValue) Int ¶
func (qv *QueryValue) Int() int
func (*QueryValue) IntSlice ¶
func (qv *QueryValue) IntSlice() (s []int)
func (*QueryValue) String ¶
func (qv *QueryValue) String() string
func (*QueryValue) StringSlice ¶
func (qv *QueryValue) StringSlice() []string
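A short sketch of the conversion methods, again assuming the QueryValues come from earlier queries (not shown):
package scrape

import (
	"fmt"

	"github.com/harrisbaird/flexiscraper"
)

// report is a hypothetical helper; price and tags are assumed to be
// *QueryValue results from earlier queries.
func report(price, tags *flexiscraper.QueryValue) {
	fmt.Println(price.Int())        // the matched text parsed as an integer
	fmt.Println(tags.StringSlice()) // all matched strings
}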
type Scraper ¶
type Scraper struct {
	// The user agent string sent during http requests and when checking
	// robots.txt.
	UserAgent string

	// The http client to use when fetching, defaults to http.DefaultClient.
	HTTPClient *http.Client

	// ObeyRobots enables robots.txt policy checking.
	// Default: true
	ObeyRobots bool
}
A Scraper defines the parameters for running a web scraper.
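For example, a Scraper can be configured through its exported fields (a minimal sketch; the package may also provide a constructor, which isn't shown on this page):
package scrape

import (
	"net/http"
	"time"

	"github.com/harrisbaird/flexiscraper"
)

func newScraper() flexiscraper.Scraper {
	return flexiscraper.Scraper{
		UserAgent:  flexiscraper.DefaultUserAgent,
		HTTPClient: &http.Client{Timeout: 10 * time.Second}, // timeout is illustrative
		ObeyRobots: true,
	}
}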