crawl

package
v0.0.0-...-2c0fe92
Warning

This package is not in the latest version of its module.

Published: Feb 9, 2016 License: MIT Imports: 7 Imported by: 0

Documentation


Constants

This section is empty.

Variables

This section is empty.

Functions

func AbsURL

func AbsURL(href, parent string) (string, error)

Transforms a URL into an absolute URL by resolving it against its parent URL. If the URL is already absolute (possibly on a different domain), it is returned as is.
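The behaviour described above can be approximated with the standard library. The following is a minimal sketch, not the package's implementation: `absURL` is a hypothetical stand-in that assumes resolution follows the usual relative-reference rules of `net/url`.

```go
package main

import (
	"fmt"
	"net/url"
)

// absURL is a hypothetical stand-in for AbsURL, not the package's code.
// It resolves href against parent using the standard library.
func absURL(href, parent string) (string, error) {
	base, err := url.Parse(parent)
	if err != nil {
		return "", err
	}
	ref, err := url.Parse(href)
	if err != nil {
		return "", err
	}
	// An href that already has its own scheme and host keeps them,
	// so an absolute URL on another domain is not rewritten.
	return base.ResolveReference(ref).String(), nil
}

func main() {
	parent := "https://example.com/blog/post"
	for _, href := range []string{"/about", "img/logo.png", "https://cdn.example.net/app.js"} {
		abs, err := absURL(href, parent)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(abs)
	}
}
```

With this sketch, `/about` resolves to `https://example.com/about`, `img/logo.png` resolves to `https://example.com/blog/img/logo.png`, and the CDN URL is returned unchanged.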

func GetDomain

func GetDomain(href string) (string, error)

Returns the domain of the given URL.
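The documentation does not say exactly what "domain" means here. As a rough sketch, the hypothetical `getDomain` below assumes it is the host name without the port, and that a URL with no host is an error. The real function may behave differently.

```go
package main

import (
	"fmt"
	"net/url"
)

// getDomain is a hypothetical stand-in for GetDomain. Assumption: "domain"
// means the URL's host name with any port stripped.
func getDomain(href string) (string, error) {
	u, err := url.Parse(href)
	if err != nil {
		return "", err
	}
	host := u.Hostname()
	if host == "" {
		// Assumption: a URL without a host (e.g. a relative path) is an error.
		return "", fmt.Errorf("no host in URL %q", href)
	}
	return host, nil
}

func main() {
	for _, href := range []string{"https://example.com:8080/docs", "/relative/path"} {
		domain, err := getDomain(href)
		fmt.Printf("%q -> %q, err=%v\n", href, domain, err)
	}
}
```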

Types

type Link

type Link struct {
	// contains filtered or unexported fields
}

func ExtractLinks

func ExtractLinks(url string, body io.Reader) []*Link

Extracts the links and assets from an HTML document and returns them as a list of absolute URLs. Accepts a reader, such as the response body returned by the HTTP client.
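One way such an extraction could work is sketched below. It is an illustration under stated assumptions, not the package's code: the `link` type and its fields are hypothetical (the real `Link` fields are unexported), every `href` or `src` attribute is treated as a link, and anything outside an `<a>` element is counted as an asset. To stay within the standard library it uses the lenient HTML settings of `encoding/xml`, which cope with simple markup only; a dedicated HTML parser would be more robust on real pages.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"net/url"
	"strings"
)

// link is a hypothetical stand-in for Link, whose real fields are unexported.
type link struct {
	URL     string
	IsAsset bool
}

// extractLinks collects href/src attribute values from body and resolves
// them against pageURL.
func extractLinks(pageURL string, body io.Reader) []*link {
	base, err := url.Parse(pageURL)
	if err != nil {
		return nil
	}
	dec := xml.NewDecoder(body)
	// Lenient settings that encoding/xml documents for HTML-like input.
	dec.Strict = false
	dec.AutoClose = xml.HTMLAutoClose
	dec.Entity = xml.HTMLEntity

	var links []*link
	for {
		tok, err := dec.Token()
		if err != nil {
			break // io.EOF or unparseable markup: return what was found so far
		}
		start, ok := tok.(xml.StartElement)
		if !ok {
			continue
		}
		// Assumption: anything that is not an <a> element counts as an asset.
		isAsset := !strings.EqualFold(start.Name.Local, "a")
		for _, attr := range start.Attr {
			name := strings.ToLower(attr.Name.Local)
			if name != "href" && name != "src" {
				continue
			}
			ref, err := url.Parse(attr.Value)
			if err != nil {
				continue
			}
			links = append(links, &link{URL: base.ResolveReference(ref).String(), IsAsset: isAsset})
		}
	}
	return links
}

const samplePage = "https://example.com/blog/post"

const sampleHTML = `<html><body>
<a href="/about">About</a>
<img src="img/logo.png">
<script src="https://cdn.example.net/app.js"></script>
</body></html>`

// sampleLinks runs extractLinks over the sample document above.
func sampleLinks() []*link {
	return extractLinks(samplePage, strings.NewReader(sampleHTML))
}

func main() {
	for _, l := range sampleLinks() {
		fmt.Printf("%s asset=%v\n", l.URL, l.IsAsset)
	}
}
```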

type Sitemap

type Sitemap struct {
	// contains filtered or unexported fields
}

func GetSitemap

func GetSitemap(url string) (*Sitemap, error)

GetSitemap crawls a start URL for all links and assets and builds a sitemap listing the pages and assets found on each crawled link. Links are restricted to the same domain as the start URL, but assets are not, since they are likely to be served by a CDN.
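The same-domain rule can be illustrated with a small breadth-first crawl over an in-memory site. This is a sketch under assumptions, not the package's algorithm: the `page` type, the `fetch` callback (standing in for the HTTP request plus link extraction), and the sample URLs are all hypothetical, and "same domain" is taken to mean an identical host name.

```go
package main

import (
	"fmt"
	"net/url"
)

// page is a hypothetical parsed page: outgoing links and referenced assets.
type page struct {
	Links  []string
	Assets []string
}

// sameHost reports whether a and b parse as URLs with the same host name.
func sameHost(a, b string) bool {
	ua, errA := url.Parse(a)
	ub, errB := url.Parse(b)
	return errA == nil && errB == nil && ua.Hostname() == ub.Hostname()
}

// crawl visits pages breadth-first from start. fetch is a hypothetical
// callback standing in for the HTTP request plus link extraction.
func crawl(start string, fetch func(string) (page, bool)) map[string]page {
	sitemap := map[string]page{}
	queue := []string{start}
	for len(queue) > 0 {
		u := queue[0]
		queue = queue[1:]
		if _, seen := sitemap[u]; seen {
			continue
		}
		p, ok := fetch(u)
		if !ok {
			continue
		}
		entry := page{Assets: p.Assets} // assets are kept whatever their host
		for _, l := range p.Links {
			if sameHost(l, start) { // links must stay on the start URL's host
				entry.Links = append(entry.Links, l)
				queue = append(queue, l)
			}
		}
		sitemap[u] = entry
	}
	return sitemap
}

// site is a fake three-page web used instead of real HTTP requests.
var site = map[string]page{
	"https://example.com/": {
		Links:  []string{"https://example.com/about", "https://other.org/"},
		Assets: []string{"https://cdn.example.net/app.js"},
	},
	"https://example.com/about": {Links: []string{"https://example.com/"}},
	"https://other.org/":        {},
}

func fetchSite(u string) (page, bool) {
	p, ok := site[u]
	return p, ok
}

func main() {
	for u, p := range crawl("https://example.com/", fetchSite) {
		fmt.Printf("%s: %d links, %d assets\n", u, len(p.Links), len(p.Assets))
	}
}
```

In this sketch the link to `other.org` is dropped and never visited, while the asset on `cdn.example.net` is still recorded for the page that references it.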

func NewSitemap

func NewSitemap() *Sitemap

Constructs a new Sitemap.

func (*Sitemap) AddEntry

func (s *Sitemap) AddEntry(url, parentUrl string, isAsset bool)

Adds a `Link` to the sitemap in a thread-safe manner.
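The documentation does not say how thread safety is achieved. A common approach, shown here purely as an assumption, is to guard the underlying map with a `sync.Mutex`. The `sitemap` and `entry` types, their fields, and the `count` and `addConcurrently` helpers are hypothetical names for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// entry and sitemap are hypothetical stand-ins for Link and Sitemap,
// whose real fields are unexported.
type entry struct {
	URL     string
	IsAsset bool
}

type sitemap struct {
	mu      sync.Mutex         // guards entries
	entries map[string][]entry // keyed by parent URL
}

func newSitemap() *sitemap {
	return &sitemap{entries: make(map[string][]entry)}
}

// addEntry records url under parentURL while holding the lock.
func (s *sitemap) addEntry(url, parentURL string, isAsset bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.entries[parentURL] = append(s.entries[parentURL], entry{URL: url, IsAsset: isAsset})
}

// count returns how many entries are stored under parentURL.
func (s *sitemap) count(parentURL string) int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return len(s.entries[parentURL])
}

// addConcurrently calls addEntry from n goroutines and waits for them all.
func addConcurrently(s *sitemap, parentURL string, n int) {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			s.addEntry(fmt.Sprintf("%spage-%d", parentURL, i), parentURL, false)
		}(i)
	}
	wg.Wait()
}

func main() {
	s := newSitemap()
	addConcurrently(s, "https://example.com/", 100)
	fmt.Println(s.count("https://example.com/")) // prints 100
}
```

Without the lock, concurrent writes to the map would be a data race; with it, all 100 entries are recorded.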

func (*Sitemap) PrettyPrint

func (s *Sitemap) PrettyPrint()

A convenience method to pretty-print a sitemap.
