package scrape

import "github.com/vedhavyas/scrape"

Package Files

gru.go minion.go processors.go scrape.go sitemap.go utils.go

func Sitemap

func Sitemap(resp *Response, file string) error

Sitemap generates a sitemap from the given response

type Response

type Response struct {
    BaseURL      *url.URL            // BaseURL is the starting URL, at depth 0
    UniqueURLs   map[string]int      // UniqueURLs maps each unique URL crawled to the number of times it was encountered
    URLsPerDepth map[int][]*url.URL  // URLsPerDepth holds the URLs found at each depth
    SkippedURLs  map[string][]string // SkippedURLs holds URLs from other domains (if DomainRegex is given) and invalid URLs
    ErrorURLs    map[string]error    // ErrorURLs holds the reason each failed URL was not crawled
    DomainRegex  *regexp.Regexp      // DomainRegex restricts crawling to URLs in the given domain
    MaxDepth     int                 // MaxDepth is the maximum crawl depth; -1 means no limit
    Interrupted  bool                // Interrupted reports whether gru was interrupted while scraping
}

Response holds the result of a scrape.

func Start

func Start(ctx context.Context, url string) (resp *Response, err error)

Start starts scraping with no depth limit (-1), using the base URL's domain as the domain restriction.

func StartWithDepth

func StartWithDepth(ctx context.Context, url string, maxDepth int) (resp *Response, err error)

StartWithDepth starts scraping with the given maximum depth, using the base URL's domain as the domain restriction.
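
To make the depth semantics concrete, here is a minimal sketch of a depth-limited breadth-first walk over an in-memory link graph. It is not the package's crawler: crawlGraph and the links map are hypothetical stand-ins for fetching pages, and the sketch assumes that links found on pages at the maximum depth are simply not followed or counted. The two return values loosely mirror URLsPerDepth and UniqueURLs.

```go
package main

import "fmt"

// crawlGraph walks links breadth-first from start. links maps a URL to the
// URLs it points at and stands in for fetching and parsing a page.
// maxDepth == -1 means no depth limit, matching the MaxDepth field.
func crawlGraph(links map[string][]string, start string, maxDepth int) (map[int][]string, map[string]int) {
	perDepth := map[int][]string{} // URLs first discovered at each depth
	seen := map[string]int{start: 1} // how often each URL was encountered
	frontier := []string{start}

	for depth := 0; len(frontier) > 0; depth++ {
		perDepth[depth] = frontier
		if maxDepth != -1 && depth >= maxDepth {
			break // depth limit reached: do not expand this level
		}
		var next []string
		for _, u := range frontier {
			for _, v := range links[u] {
				seen[v]++
				if seen[v] == 1 {
					next = append(next, v) // first sighting: visit at depth+1
				}
			}
		}
		frontier = next
	}
	return perDepth, seen
}

func main() {
	links := map[string][]string{
		"a": {"b", "c"},
		"b": {"c", "d"},
		"c": {"a"},
		"d": {"e"},
	}
	for _, maxDepth := range []int{-1, 1} {
		perDepth, seen := crawlGraph(links, "a", maxDepth)
		fmt.Printf("maxDepth=%d levels=%d unique=%d\n", maxDepth, len(perDepth), len(seen))
	}
}
```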

func StartWithDepthAndDomainRegex

func StartWithDepthAndDomainRegex(ctx context.Context, url string, maxDepth int, domainRegex string) (resp *Response, err error)

StartWithDepthAndDomainRegex starts scraping with the given maximum depth and domain regex.

func StartWithDomainRegex

func StartWithDomainRegex(ctx context.Context, url, domainRegex string) (resp *Response, err error)

StartWithDomainRegex starts scraping with no depth limit (-1) and the given domain regex.

func (Response) String

func (r Response) String() string

String returns a human-readable representation of the response.

Package scrape imports 15 packages. Updated 2017-10-28.