sitemap

package module
v0.0.0-...-3607cf1 Latest
Warning

This package is not in the latest version of its module.

Published: Jan 2, 2017 License: MIT Imports: 8 Imported by: 0

README

sitemap

sitemap is a library that crawls a given URL and produces a map of the sites within that URL's domain and its subdomains. It honours rel="nofollow" attributes on links.

sitemap will attempt to gather information on links, scripts, images, videos, and audio.

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Site

type Site struct {
	URL           url.URL
	Links         []url.URL
	NoFollowLinks []url.URL
	Scripts       []url.URL
	Images        []url.URL
	Videos        []url.URL
	Audio         []url.URL
	CSS           []url.URL
}

Site contains information about a site: its own URL, plus the URLs of the sites it links to (followed and rel="nofollow" links are kept separately) and of the scripts, images, videos, audio, and CSS it references.

func (*Site) Crawl

func (s *Site) Crawl()

Crawl populates the resources of this Site instance by loading the associated link and processing the page. After processing the document, links are pushed into a channel for processing into the sitemap.

func (*Site) MarshalJSON

func (s *Site) MarshalJSON() ([]byte, error)

MarshalJSON is a helper method that makes Site instances more JSON-friendly.

type SiteMap

type SiteMap struct {
	Sites map[string]*Site
	// contains filtered or unexported fields
}

SiteMap is a container for the set of sites found during a crawl, keyed by string in the Sites map, together with a mutex that allows safe access from multiple goroutines.
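
The mutex-guarded map pattern mentioned above can be sketched as follows. The `registry` type and its `add` method are hypothetical stand-ins; SiteMap's unexported fields and internal methods are not shown in this documentation.

```go
package main

import (
	"fmt"
	"sync"
)

// registry is a hypothetical stand-in for SiteMap: a map guarded by a mutex.
type registry struct {
	mu    sync.Mutex
	sites map[string]bool
}

// add records key and reports whether it was newly added.
func (r *registry) add(key string) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.sites[key] {
		return false
	}
	r.sites[key] = true
	return true
}

func main() {
	r := &registry{sites: make(map[string]bool)}
	var wg sync.WaitGroup
	// Several goroutines try to register the same URL concurrently.
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			r.add("https://example.com/")
		}()
	}
	wg.Wait()
	fmt.Println(len(r.sites)) // prints 1
}
```

Doing the check and the write under one lock is what lets concurrent workers agree on which of them saw a URL first.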

func NewSiteMap

func NewSiteMap(origin *url.URL) *SiteMap

NewSiteMap is a constructor function used to return a SiteMap instance with an initialized sites map.

func (*SiteMap) Crawl

func (m *SiteMap) Crawl(workers int, done chan<- bool)

Crawl begins the crawl. It hides the use of a WaitGroup behind a synchronization channel.
