Documentation ¶
Overview ¶
Package crawler provides the functionality for crawling web pages.
Index ¶
- type CrawlResult
- type Crawler
- func (c *Crawler) Crawl() error
- func (c *Crawler) CrawlPage(url string) ([]string, CrawlResult, error)
- func (c *Crawler) FormatRelative(urls map[string]int) (formatedUrls []string)
- func (c *Crawler) GetLinks(doc *goquery.Document) []string
- func (c *Crawler) GetRequest(url string) (*goquery.Document, error)
- func (c *Crawler) GetResult(doc *goquery.Document, url string) CrawlResult
- func (c *Crawler) ParseBase() error
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type CrawlResult ¶
CrawlResult defines the result of crawling a single page.
type Crawler ¶
type Crawler struct {
	ID         string        `json:"ID"`
	BaseURL    string        `json:"BaseURL"`
	StartURL   string        `json:"StartURL"`
	PagesLimit int           `json:"PagesLimit"`
	Results    []CrawlResult `json:"Results"`
}
Crawler defines a default crawler.
func (*Crawler) Crawl ¶
Crawl crawls the whole host of the given StartURL and saves the collected data (URLs and titles) to the Crawler struct.
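The crawl described above can be sketched as a breadth-first traversal with a visited set and a page limit. This is a minimal stand-in, not the package's implementation: the hypothetical fetchLinks callback replaces the real HTTP fetch and parse step.

```go
package main

import "fmt"

// crawl walks pages breadth-first starting from startURL, following links
// returned by fetchLinks, and visits at most pagesLimit pages.
// fetchLinks is a hypothetical stand-in for the real fetch + parse step.
func crawl(startURL string, pagesLimit int, fetchLinks func(string) []string) []string {
	visited := map[string]bool{}
	queue := []string{startURL}
	var order []string
	for len(queue) > 0 && len(order) < pagesLimit {
		url := queue[0]
		queue = queue[1:]
		if visited[url] {
			continue
		}
		visited[url] = true
		order = append(order, url)
		queue = append(queue, fetchLinks(url)...)
	}
	return order
}

func main() {
	// A tiny in-memory "site": each page maps to the links it contains.
	site := map[string][]string{
		"/":  {"/a", "/b"},
		"/a": {"/b", "/c"},
		"/b": {"/"},
		"/c": {},
	}
	pages := crawl("/", 10, func(u string) []string { return site[u] })
	fmt.Println(pages) // [/ /a /b /c]
}
```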
func (*Crawler) CrawlPage ¶
func (c *Crawler) CrawlPage(url string) ([]string, CrawlResult, error)
CrawlPage crawls a single page and returns the discovered links as []string, a CrawlResult (page URL and title), and an error.
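The per-page contract, extract the link targets and the title from one page, can be sketched as follows. The real package parses HTML with goquery; the regular expressions here are only a simplified stand-in for this illustration, and pageResult is a hypothetical mirror of CrawlResult.

```go
package main

import (
	"fmt"
	"regexp"
)

// pageResult mirrors the shape of CrawlResult: the page URL and its title.
type pageResult struct {
	URL, Title string
}

var (
	titleRe = regexp.MustCompile(`<title>(.*?)</title>`)
	hrefRe  = regexp.MustCompile(`href="([^"]+)"`)
)

// crawlPage extracts the title and link targets from raw HTML.
// Regex-based extraction is a sketch only; real code should use an HTML parser.
func crawlPage(url, html string) ([]string, pageResult) {
	var links []string
	for _, m := range hrefRe.FindAllStringSubmatch(html, -1) {
		links = append(links, m[1])
	}
	title := ""
	if m := titleRe.FindStringSubmatch(html); m != nil {
		title = m[1]
	}
	return links, pageResult{URL: url, Title: title}
}

func main() {
	html := `<html><head><title>Home</title></head>` +
		`<body><a href="/about">About</a><a href="/blog">Blog</a></body></html>`
	links, res := crawlPage("https://example.com/", html)
	fmt.Println(links, res.Title) // [/about /blog] Home
}
```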
func (*Crawler) FormatRelative ¶
FormatRelative converts relative links encountered during crawling into absolute links.
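Resolving relative links against a base URL is what the standard library's net/url package is for. This sketch shows the idea behind FormatRelative; it takes a plain slice rather than the map[string]int of the real signature, whose counts this illustration ignores.

```go
package main

import (
	"fmt"
	"net/url"
)

// formatRelative resolves each link against baseURL, turning relative
// links into absolute ones; already-absolute links pass through unchanged.
func formatRelative(baseURL string, links []string) []string {
	base, err := url.Parse(baseURL)
	if err != nil {
		return links
	}
	out := make([]string, 0, len(links))
	for _, l := range links {
		ref, err := url.Parse(l)
		if err != nil {
			continue // skip unparseable links
		}
		out = append(out, base.ResolveReference(ref).String())
	}
	return out
}

func main() {
	abs := formatRelative("https://example.com/docs/",
		[]string{"../about", "intro", "https://other.org/"})
	fmt.Println(abs)
	// [https://example.com/about https://example.com/docs/intro https://other.org/]
}
```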
func (*Crawler) GetRequest ¶
GetRequest is a helper function for CrawlPage. It makes a request to a page and returns a goquery.Document and an error.