sitemap

Version: v0.0.0-...-9c8e08e
Warning

This package is not in the latest version of its module.

Published: Jan 3, 2023 License: MIT Imports: 11 Imported by: 1

README

sitemap

A Golang parser and client for the Sitemap XML format:

urls, err := sitemap.Fetch(context.TODO(), "https://sitemaps.org/sitemap.xml")
if err != nil {
    panic(err)
}
for _, url := range urls {
    log.Println(url.LastModification, url.Location)
}

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Alternate

type Alternate map[string]string

Alternate is a map of language codes to the corresponding localized URLs (see https://developers.google.com/search/docs/specialty/international/localized-versions).

func (*Alternate) UnmarshalXML

func (a *Alternate) UnmarshalXML(d *xml.Decoder, start xml.StartElement) error

UnmarshalXML implements xml.Unmarshaler.
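The package does not document how UnmarshalXML fills the map, so the following is only a sketch of one plausible approach, using encoding/xml from the standard library. Because the URL field is tagged `xml:"link,omitempty"`, encoding/xml calls (*Alternate).UnmarshalXML once per <link> element, so the method can lazily allocate the map and add one hreflang/href pair per call. The `entry` type and `parseEntry` function are hypothetical stand-ins, not part of the package.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// Alternate mirrors the documented type: language code -> URL.
type Alternate map[string]string

// UnmarshalXML is a hypothetical implementation that records one
// <xhtml:link rel="alternate" hreflang="..." href="..."/> element per call.
func (a *Alternate) UnmarshalXML(d *xml.Decoder, start xml.StartElement) error {
	var rel, lang, href string
	for _, attr := range start.Attr {
		switch attr.Name.Local {
		case "rel":
			rel = attr.Value
		case "hreflang":
			lang = attr.Value
		case "href":
			href = attr.Value
		}
	}
	if *a == nil {
		*a = Alternate{} // allocate once; later <link> elements reuse this map
	}
	if rel == "alternate" && lang != "" && href != "" {
		(*a)[lang] = href
	}
	// encoding/xml requires an Unmarshaler to consume the whole element.
	return d.Skip()
}

// entry is a trimmed, hypothetical stand-in for the package's URL type.
type entry struct {
	Location  string    `xml:"loc"`
	Alternate Alternate `xml:"link,omitempty"`
}

func parseEntry(doc string) (entry, error) {
	var e entry
	err := xml.Unmarshal([]byte(doc), &e)
	return e, err
}

func main() {
	doc := `<url xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <loc>https://example.com/en/</loc>
  <xhtml:link rel="alternate" hreflang="de" href="https://example.com/de/"/>
  <xhtml:link rel="alternate" hreflang="fr" href="https://example.com/fr/"/>
</url>`
	e, err := parseEntry(doc)
	if err != nil {
		panic(err)
	}
	fmt.Println(e.Location, len(e.Alternate), e.Alternate["de"])
}
```

A field tag without a namespace matches elements in any namespace, which is why `link` also matches the prefixed `xhtml:link`.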

type ChangeFrequency

type ChangeFrequency string

OPTIONAL: Indicates how frequently the content at a particular URL is likely to change. The value "always" should be used to describe documents that change each time they are accessed. The value "never" should be used to describe archived URLs. Please note that web crawlers may not necessarily crawl pages marked "always" more often. Consider this element as a friendly suggestion and not a command.

const (
	ChangeFrequencyAlways  ChangeFrequency = "always"
	ChangeFrequencyHourly  ChangeFrequency = "hourly"
	ChangeFrequencyDaily   ChangeFrequency = "daily"
	ChangeFrequencyWeekly  ChangeFrequency = "weekly"
	ChangeFrequencyMonthly ChangeFrequency = "monthly"
	ChangeFrequencyYearly  ChangeFrequency = "yearly"
	ChangeFrequencyNever   ChangeFrequency = "never"
)
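ChangeFrequency is a plain string type, and the index lists no UnmarshalXML method for it, so a <changefreq> value outside the seven constants above presumably decodes without error. A caller that wants to reject such values could use a check like the hypothetical `isValid` helper below, which is not part of the package.

```go
package main

import "fmt"

// ChangeFrequency and its constants mirror the documented declarations.
type ChangeFrequency string

const (
	ChangeFrequencyAlways  ChangeFrequency = "always"
	ChangeFrequencyHourly  ChangeFrequency = "hourly"
	ChangeFrequencyDaily   ChangeFrequency = "daily"
	ChangeFrequencyWeekly  ChangeFrequency = "weekly"
	ChangeFrequencyMonthly ChangeFrequency = "monthly"
	ChangeFrequencyYearly  ChangeFrequency = "yearly"
	ChangeFrequencyNever   ChangeFrequency = "never"
)

// isValid is a hypothetical helper: it reports whether f is empty (the element
// is optional) or exactly matches one of the seven defined values.
func (f ChangeFrequency) isValid() bool {
	switch f {
	case "", ChangeFrequencyAlways, ChangeFrequencyHourly, ChangeFrequencyDaily,
		ChangeFrequencyWeekly, ChangeFrequencyMonthly, ChangeFrequencyYearly,
		ChangeFrequencyNever:
		return true
	}
	return false
}

func main() {
	for _, f := range []ChangeFrequency{"daily", "", "fortnightly"} {
		fmt.Printf("%q valid: %v\n", f, f.isValid())
	}
}
```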

type FilterFunc

type FilterFunc func([]Sitemap) []Sitemap

FilterFunc reduces the list of sitemaps to fetch.

type Option

type Option func(*options)

Option changes the behavior of the Fetch function.

func WithFilter

func WithFilter(f FilterFunc) Option

WithFilter provides a custom function for filtering the list of sitemaps to fetch.

func WithHTTPClient

func WithHTTPClient(client *http.Client) Option

WithHTTPClient replaces the default http.Client (cleanhttp.DefaultClient) used by Fetch.

func WithParallelism

func WithParallelism(limit int) Option

WithParallelism sets the maximum number of parallel fetches.

func WithProcessor

func WithProcessor(f ProcessFunc) Option

WithProcessor provides a custom function for processing results.

type ProcessFunc

type ProcessFunc func(
	ctx context.Context,
	sitemap *Sitemap,
	urls []URL,
) error

ProcessFunc processes each <urlset> as it is identified.

type Sitemap

type Sitemap struct {
	// REQUIRED: The location URI of a document.
	// The URI must conform to RFC 2396 (http://www.ietf.org/rfc/rfc2396.txt).
	Location string `xml:"loc"`
	// OPTIONAL: The date the document was last modified.
	// The date must conform to the W3C DATETIME format (http://www.w3.org/TR/NOTE-datetime).
	// Example: 2005-05-10. Lastmod may also contain a timestamp, for example 2005-05-10T17:33:30+08:00.
	LastModification datetime.Time `xml:"lastmod,omitempty"`
}

Container for the data needed to describe a sitemap.

type SitemapIndex

type SitemapIndex struct {
	// Container for the data needed to describe a sitemap.
	Sitemaps []Sitemap `xml:"sitemap"`
}

Container for a set of up to 50,000 sitemap URLs. This is the root element of the XML file.

func (*SitemapIndex) ReadFrom

func (s *SitemapIndex) ReadFrom(r io.Reader) (int64, error)

ReadFrom implements io.ReaderFrom.

type URL

type URL struct {
	// REQUIRED: The location URI of a document.
	// The URI must conform to RFC 2396 (http://www.ietf.org/rfc/rfc2396.txt).
	Location string `xml:"loc"`
	// OPTIONAL: The date the document was last modified.
	// The date must conform to the W3C DATETIME format (http://www.w3.org/TR/NOTE-datetime).
	// Example: 2005-05-10. Lastmod may also contain a timestamp, for example 2005-05-10T17:33:30+08:00.
	LastModification datetime.Time `xml:"lastmod,omitempty"`
	// OPTIONAL: Indicates how frequently the content at a particular URL is likely to change.
	// The value "always" should be used to describe documents that change each time they are accessed.
	// The value "never" should be used to describe archived URLs.
	// Please note that web crawlers may not necessarily crawl pages marked "always" more often.
	// Consider this element as a friendly suggestion and not a command.
	ChangeFrequency ChangeFrequency `xml:"changefreq,omitempty"`
	// OPTIONAL: The priority of a particular URL relative to other pages on the same site.
	// The value for this element is a number between 0.0 and 1.0 where 0.0 identifies the lowest priority page(s).
	// The default priority of a page is 0.5. Priority is used to select between pages on your site.
	// Setting a priority of 1.0 for all URLs will not help you, as the relative priority of pages on your site is what will be considered.
	Priority float64 `xml:"priority,omitempty"`
	// https://developers.google.com/search/docs/specialty/international/localized-versions
	Alternate Alternate `xml:"link,omitempty"`
}

Container for the data needed to describe a document to crawl.
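Because Priority is a plain float64, a missing <priority> element decodes to 0, not to the protocol's default of 0.5, and an explicit 0.0 looks the same. The sketch below decodes a urlset into trimmed stand-in types that use the documented tags, then applies the default with a hypothetical `effectivePriority` helper that is not part of the package.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// Trimmed stand-ins using the documented XML tags.
type URL struct {
	Location        string  `xml:"loc"`
	ChangeFrequency string  `xml:"changefreq,omitempty"`
	Priority        float64 `xml:"priority,omitempty"`
}

type URLSet struct {
	URLs []URL `xml:"url"`
}

const sampleDoc = `<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc><changefreq>daily</changefreq><priority>0.8</priority></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>`

func parseURLSet(doc string) (URLSet, error) {
	var set URLSet
	err := xml.Unmarshal([]byte(doc), &set)
	return set, err
}

// effectivePriority applies the protocol's 0.5 default. It cannot tell an
// absent <priority> from an explicit 0.0; both become 0.5 here.
func effectivePriority(u URL) float64 {
	if u.Priority == 0 {
		return 0.5
	}
	return u.Priority
}

func main() {
	set, err := parseURLSet(sampleDoc)
	if err != nil {
		panic(err)
	}
	for _, u := range set.URLs {
		fmt.Println(u.Location, u.ChangeFrequency, effectivePriority(u))
	}
}
```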

func Fetch

func Fetch(ctx context.Context, sitemap string, opts ...Option) (urls []URL, err error)

Fetch retrieves the URLs in the given sitemap using reasonable defaults, which can be changed by passing Option values.

type URLSet

type URLSet struct {
	// Container for the data needed to describe a document to crawl.
	URLs []URL `xml:"url"`
}

Container for a set of up to 50,000 document elements. This is the root element of the XML file.

func (*URLSet) ReadFrom

func (u *URLSet) ReadFrom(r io.Reader) (int64, error)

ReadFrom implements io.ReaderFrom.
