scrape

package
v0.0.0-...-15a4ce3 Latest
Warning

This package is not in the latest version of its module.

Published: May 18, 2017 License: Unlicense Imports: 8 Imported by: 0

Documentation

Constants

const (
	// AssetTypeLink is used for <link> assets
	AssetTypeLink = "link"
	// AssetTypeImage is used for <image> assets
	AssetTypeImage = "image"
	// AssetTypeScript is used for <script> assets
	AssetTypeScript = "script"
)
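As a minimal sketch of how these constants might be used, the example below switches on an asset's Type field. The constants and the Asset struct are redeclared locally so the snippet compiles on its own; the helper name describe and the sample URL are illustrative and not part of the package.

```go
package main

import "fmt"

// Local copies of the package's constants and Asset type, redeclared here
// only so this sketch is self-contained.
const (
	AssetTypeLink   = "link"
	AssetTypeImage  = "image"
	AssetTypeScript = "script"
)

type Asset struct {
	Type string `json:"type"`
	URL  string `json:"url"`
}

// describe is a hypothetical helper that labels an asset by its Type.
func describe(a Asset) string {
	switch a.Type {
	case AssetTypeLink:
		return "link: " + a.URL
	case AssetTypeImage:
		return "image: " + a.URL
	case AssetTypeScript:
		return "script: " + a.URL
	default:
		// Any value outside the three documented constants lands here.
		return "unknown asset type " + a.Type + ": " + a.URL
	}
}

func main() {
	fmt.Println(describe(Asset{Type: AssetTypeScript, URL: "/static/app.js"}))
}
```

Comparing against the named constants rather than string literals keeps a typo such as "scirpt" from compiling silently.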

Variables

var (
	// ErrURLInvalid is given when the URL provided to the 'Site'
	// method is empty or invalid
	ErrURLInvalid = errors.New("The given URL is invalid")

	// ErrHTTPError is given when the URL provided results in a
	// HTTP error code or could not be reached.
	ErrHTTPError = errors.New("The given URL gave a http error code")

	// ErrParseError is given when a page gave a HTML response that
	// could not be parsed.
	ErrParseError = errors.New("Failed to parse link")
)
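One way a caller might distinguish these sentinel errors is sketched below. The variables are redeclared locally so the snippet runs by itself, and explain is a hypothetical helper. The documentation does not say whether Site returns the sentinels directly or wrapped; errors.Is (Go 1.13+) matches a sentinel returned directly and one wrapped with %w.

```go
package main

import (
	"errors"
	"fmt"
)

// Local copies of the package's sentinel errors, for illustration only.
var (
	ErrURLInvalid = errors.New("The given URL is invalid")
	ErrHTTPError  = errors.New("The given URL gave a http error code")
	ErrParseError = errors.New("Failed to parse link")
)

// explain is a hypothetical helper mapping each sentinel to a short hint.
func explain(err error) string {
	switch {
	case err == nil:
		return "ok"
	case errors.Is(err, ErrURLInvalid):
		return "fix the URL and retry"
	case errors.Is(err, ErrHTTPError):
		return "site unreachable or returned an HTTP error"
	case errors.Is(err, ErrParseError):
		return "a page returned HTML that could not be parsed"
	default:
		return "unexpected error: " + err.Error()
	}
}

func main() {
	// Simulate an error that a caller has wrapped with extra context.
	wrapped := fmt.Errorf("crawling example.com: %w", ErrHTTPError)
	fmt.Println(explain(wrapped))
}
```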

Functions

This section is empty.

Types

type Asset

type Asset struct {
	Type string `json:"type"`
	URL  string `json:"url"`
}

Asset represents a reference to a piece of static content, such as a stylesheet, image or script. Asset URLs may point to external domains because the scraper records them but does not follow them. An Asset describes the reference found on the page, not the content that was served.

type Page

type Page struct {
	URL    string   `json:"url"`
	Assets []*Asset `json:"assets"`
	Pages  []string `json:"pages"`
}

Page represents a location within a sitemap. It should correspond to a page within the website.
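The struct tags determine the JSON shape of a Page. The sketch below redeclares Asset and Page locally and encodes a sample value with encoding/json; the helper name encode and the example.com URLs are illustrative assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Local copies of the documented types, redeclared so the sketch is
// self-contained.
type Asset struct {
	Type string `json:"type"`
	URL  string `json:"url"`
}

type Page struct {
	URL    string   `json:"url"`
	Assets []*Asset `json:"assets"`
	Pages  []string `json:"pages"`
}

// encode is a hypothetical helper that renders a Page as compact JSON.
func encode(p Page) (string, error) {
	b, err := json.Marshal(p)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	p := Page{
		URL:    "https://example.com/",
		Assets: []*Asset{{Type: "link", URL: "/style.css"}},
		Pages:  []string{"https://example.com/about"},
	}
	out, err := encode(p)
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Println(out)
}
```

Because none of the tags use omitempty, a Page whose Assets or Pages slice is nil encodes those fields as null rather than omitting them.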

type Sitemap

type Sitemap struct {
	Pages []*Page `json:"pages"`
}

Sitemap represents a hierarchy of pages within a website.
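A Sitemap is a flat slice of pages, each carrying its own assets and outgoing links, so most consumers will walk it with nested loops. The sketch below collects the distinct asset URLs across all pages. The types are redeclared locally, and uniqueAssets is a hypothetical helper, not part of the package.

```go
package main

import (
	"fmt"
	"sort"
)

// Local copies of the documented types, redeclared so the sketch is
// self-contained.
type Asset struct {
	Type string `json:"type"`
	URL  string `json:"url"`
}

type Page struct {
	URL    string   `json:"url"`
	Assets []*Asset `json:"assets"`
	Pages  []string `json:"pages"`
}

type Sitemap struct {
	Pages []*Page `json:"pages"`
}

// uniqueAssets is a hypothetical helper returning every distinct asset URL
// referenced anywhere in the sitemap, sorted alphabetically.
func uniqueAssets(sm *Sitemap) []string {
	seen := make(map[string]bool)
	var urls []string
	for _, p := range sm.Pages {
		if p == nil {
			continue // guard against nil entries in the pointer slice
		}
		for _, a := range p.Assets {
			if a == nil || seen[a.URL] {
				continue
			}
			seen[a.URL] = true
			urls = append(urls, a.URL)
		}
	}
	sort.Strings(urls)
	return urls
}

func main() {
	sm := &Sitemap{Pages: []*Page{
		{URL: "/", Assets: []*Asset{{Type: "link", URL: "/style.css"}, {Type: "script", URL: "/app.js"}}},
		{URL: "/about", Assets: []*Asset{{Type: "link", URL: "/style.css"}}},
	}}
	fmt.Println(uniqueAssets(sm))
}
```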

func Site

func Site(ctx context.Context, site string) (*Sitemap, error)

Site will generate a sitemap for the given URL. The sitemap is constrained to the domain of that URL; external links will not be followed.

An error is returned if the URL is invalid or the site cannot be reached for any reason. Partial sitemaps will not be returned.
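A caller would typically pass a context with a deadline and check the error before touching the result. The sketch below shows that pattern under several assumptions: the package's import path is not shown here, so a stand-in named fakeSite with the same signature takes the place of Site; crawl, siteFunc and defaultTimeout are hypothetical names; and whether the real Site stops early when the context is cancelled is not documented.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// Local copies of the documented types and one sentinel error.
type Asset struct {
	Type string `json:"type"`
	URL  string `json:"url"`
}

type Page struct {
	URL    string   `json:"url"`
	Assets []*Asset `json:"assets"`
	Pages  []string `json:"pages"`
}

type Sitemap struct {
	Pages []*Page `json:"pages"`
}

var ErrURLInvalid = errors.New("The given URL is invalid")

// siteFunc mirrors the documented signature of Site.
type siteFunc func(ctx context.Context, site string) (*Sitemap, error)

// defaultTimeout is an arbitrary value chosen for this example.
const defaultTimeout = 5 * time.Second

// crawl is a hypothetical wrapper: it applies a deadline and adds context
// to any error, returning a nil sitemap on failure.
func crawl(site siteFunc, url string) (*Sitemap, error) {
	ctx, cancel := context.WithTimeout(context.Background(), defaultTimeout)
	defer cancel()

	sm, err := site(ctx, url)
	if err != nil {
		return nil, fmt.Errorf("crawl %q: %w", url, err)
	}
	return sm, nil
}

// fakeSite is a stand-in for the real Site and performs no network access.
func fakeSite(ctx context.Context, site string) (*Sitemap, error) {
	if site == "" {
		return nil, ErrURLInvalid
	}
	if err := ctx.Err(); err != nil {
		return nil, err
	}
	return &Sitemap{Pages: []*Page{{URL: site}}}, nil
}

func main() {
	sm, err := crawl(fakeSite, "https://example.com/")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("pages found:", len(sm.Pages))
}
```

In real use, the package's Site function would be passed in place of fakeSite.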
