Documentation ¶
Index ¶
Constants ¶
This section is empty.
Variables ¶
View Source
var ErrQuit = errors.New("quit requested")
Functions ¶
func CompressSpace ¶
CompressSpace reduces all whitespace sequences (space, tabs, newlines etc) in a string to a single space. Leading/trailing space is trimmed. Has the effect of converting multiline strings to one line.
func GetAttr ¶
GetAttr retrieved the value of an attribute on a node. Returns empty string if attribute doesn't exist.
func GetTextContent ¶
GetTextContent recursively fetches the text for a node
Types ¶
type DiscoverStats ¶
type Discoverer ¶
type Discoverer struct { Name string StartURL url.URL ArtPats []*regexp.Regexp XArtPats []*regexp.Regexp CruftSel cascadia.Selector BaseErrorThreshold int StripFragments bool StripQuery bool HostPat *regexp.Regexp UserAgent string ErrorLog Logger InfoLog Logger Stats DiscoverStats }
func NewDiscoverer ¶
func NewDiscoverer(cfg DiscovererDef) (*Discoverer, error)
func (*Discoverer) CookArticleURL ¶
type DiscovererDef ¶
type DiscovererDef struct { Name string URL string // article urls to include - regexes ArtPat []string // article urls to exclude - regexes XArtPat []string // article url forms to include (eg "/YYYY/MM/SLUG.html") ArtForm []string // article url forms to exclude XArtForm []string NavSel string XNavPat []string // css selector for elements to cull during article discovery CruftSel string // BaseErrorThreshold is starting number of http errors to accept before // bailing out. default is 5 (and 0 is considered as unset, so default is applied) // error threshold formula: base + 10% of successful request count BaseErrorThreshold int // Hostpat is a regex matching accepted domains // if empty, reject everything on a different domain HostPat string // If NoStripQuery is set then article URLs won't have the query part zapped NoStripQuery bool // UserAgent string to use in HTTP requests UserAgent string }
type LinkSet ¶
thin map wrapper for some set operations
type NullLogger ¶
type NullLogger struct{}
func (NullLogger) Printf ¶
func (l NullLogger) Printf(format string, v ...interface{})
Click to show internal directories.
Click to hide internal directories.