pageinfo

package
v0.0.0-...-f77d796
Warning

This package is not in the latest version of its module.

Published: May 8, 2022 | License: MIT | Imports: 12 | Imported by: 0

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

func GetHTMLTitle

func GetHTMLTitle(doc *html.Node) (string, error)

GetHTMLTitle extracts the title from a parsed document.

func IsURL

func IsURL(str string) bool

IsURL reports whether a string parses as a URL.
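
The documentation does not say which parsing rules IsURL applies. As a rough sketch, a check of this kind could be built on net/url from the standard library. The name looksLikeURL and the rule of requiring both a scheme and a host are assumptions made for illustration, not the package's confirmed behavior.

```go
package main

import (
	"fmt"
	"net/url"
)

// looksLikeURL is a hypothetical stand-in for IsURL. It assumes that a
// usable URL must parse cleanly and carry both a scheme and a host; the
// real implementation may apply different rules.
func looksLikeURL(str string) bool {
	u, err := url.ParseRequestURI(str)
	if err != nil {
		return false
	}
	// Reject bare paths such as "/tmp/file", which parse without error
	// but have no scheme or host.
	return u.Scheme != "" && u.Host != ""
}

func main() {
	for _, s := range []string{"https://example.com/page", "not a url", "/tmp/file"} {
		fmt.Printf("%q -> %v\n", s, looksLikeURL(s))
	}
}
```

url.ParseRequestURI is used here rather than url.Parse because url.Parse accepts almost any string as a relative reference, which would make the check too permissive.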

Types

type Fetcher

type Fetcher interface {
	FetchPage(string) (*PageInfo, error)
}

Fetcher is an interface for structs that fetch pages from the internet.
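
Because Fetcher is a one-method interface, callers can substitute a test double for HTMLFetcher. The sketch below redeclares PageInfo and Fetcher locally so that it compiles on its own; stubFetcher and describe are hypothetical names introduced only for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// PageInfo and Fetcher are redeclared here, mirroring the documented
// shapes, so the example is self-contained.
type PageInfo struct {
	URL   string
	Title *string
	Body  []byte
}

type Fetcher interface {
	FetchPage(string) (*PageInfo, error)
}

// stubFetcher is a hypothetical test double that serves canned pages
// from a map instead of fetching them from the internet.
type stubFetcher struct {
	pages map[string]*PageInfo
}

func (s stubFetcher) FetchPage(url string) (*PageInfo, error) {
	page, ok := s.pages[url]
	if !ok {
		return nil, errors.New("stub: no page for " + url)
	}
	return page, nil
}

// Compile-time check that stubFetcher satisfies Fetcher.
var _ Fetcher = stubFetcher{}

// describe depends only on the interface, so it works with any Fetcher.
func describe(f Fetcher, url string) string {
	page, err := f.FetchPage(url)
	if err != nil {
		return "error: " + err.Error()
	}
	if page.Title == nil {
		return url + " (untitled)"
	}
	return url + ": " + *page.Title
}

func main() {
	title := "Example Domain"
	f := stubFetcher{pages: map[string]*PageInfo{
		"https://example.com": {URL: "https://example.com", Title: &title},
	}}
	fmt.Println(describe(f, "https://example.com"))
	fmt.Println(describe(f, "https://missing.example"))
}
```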

type HTMLFetcher

type HTMLFetcher struct {
	// contains filtered or unexported fields
}

HTMLFetcher is the default fetcher.

func New

func New(client *http.Client) *HTMLFetcher

New returns an HTMLFetcher initialized with the http.Client passed in.

func (*HTMLFetcher) FetchPage

func (f *HTMLFetcher) FetchPage(url string) (*PageInfo, error)

FetchPage satisfies the Fetcher interface.
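
Since New accepts an *http.Client, the transport can be swapped out, for example to exercise fetching code without network access. The following sketch uses only the standard library. roundTripFunc, cannedClient, and fetchBody are hypothetical helpers, and fetchBody is only a guess at the body-reading part of what FetchPage might do.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
)

// roundTripFunc adapts an ordinary function to http.RoundTripper.
type roundTripFunc func(*http.Request) (*http.Response, error)

func (f roundTripFunc) RoundTrip(r *http.Request) (*http.Response, error) {
	return f(r)
}

// cannedClient returns a client that answers every request with body
// and never touches the network.
func cannedClient(body string) *http.Client {
	return &http.Client{Transport: roundTripFunc(func(r *http.Request) (*http.Response, error) {
		return &http.Response{
			StatusCode: http.StatusOK,
			Status:     "200 OK",
			Header:     make(http.Header),
			Body:       io.NopCloser(bytes.NewBufferString(body)),
			Request:    r,
		}, nil
	})}
}

// fetchBody is a hypothetical stand-in for the part of FetchPage that
// downloads the page; the real method may behave differently.
func fetchBody(client *http.Client, url string) ([]byte, error) {
	resp, err := client.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %s", resp.Status)
	}
	return io.ReadAll(resp.Body)
}

func main() {
	client := cannedClient("<html><title>Hi</title></html>")
	body, err := fetchBody(client, "https://example.com")
	if err != nil {
		fmt.Println("fetch failed:", err)
		return
	}
	fmt.Printf("fetched %d bytes\n", len(body))
}
```

With the real package, a client built this way could presumably be passed to New in place of http.DefaultClient.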

type PageInfo

type PageInfo struct {
	URL   string
	Title *string
	Body  []byte
}

PageInfo holds information about a page that has been fetched.

func (*PageInfo) MissingTitle

func (p *PageInfo) MissingTitle() bool

MissingTitle returns true if no title was found.
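
Title is a *string rather than a string, which lets a page with no title be told apart from a page whose title is empty. A minimal sketch of that distinction follows, assuming MissingTitle is a nil check on Title (the documentation does not confirm this). missingTitle and displayTitle are hypothetical helpers.

```go
package main

import "fmt"

// PageInfo is redeclared here, mirroring the documented struct.
type PageInfo struct {
	URL   string
	Title *string
	Body  []byte
}

// missingTitle is a hypothetical stand-in for MissingTitle, assuming a
// nil Title means that no title was found.
func missingTitle(p *PageInfo) bool {
	return p.Title == nil
}

// displayTitle falls back to the page URL when there is no title, and
// only dereferences Title after the nil check.
func displayTitle(p *PageInfo) string {
	if missingTitle(p) {
		return p.URL
	}
	return *p.Title
}

func main() {
	empty := ""
	untitled := &PageInfo{URL: "https://example.com/a"}
	blank := &PageInfo{URL: "https://example.com/b", Title: &empty}
	fmt.Println(missingTitle(untitled), missingTitle(blank))
	fmt.Printf("%q %q\n", displayTitle(untitled), displayTitle(blank))
}
```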

func (*PageInfo) Sha1String

func (p *PageInfo) Sha1String() string

Sha1String returns a SHA-1 hash of the contents of the fetched page, as a string.
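
The string encoding of the hash is not documented. A common choice is lowercase hex, which the sketch below assumes; sha1Hex is a hypothetical helper, not part of the package.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
)

// sha1Hex is a hypothetical stand-in for Sha1String. It assumes the
// digest of the page body is rendered as lowercase hex.
func sha1Hex(body []byte) string {
	sum := sha1.Sum(body) // [20]byte digest
	return hex.EncodeToString(sum[:])
}

func main() {
	fmt.Println(sha1Hex([]byte("hello")))
}
```

A hash like this works as a change detector or content key. SHA-1 is not collision resistant, so it should not be relied on where an attacker controls the page contents.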

func (*PageInfo) WriteToFile

func (p *PageInfo) WriteToFile(path string) error

WriteToFile writes the contents of the fetched page to the given path.
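
A method like this can be a thin wrapper over os.WriteFile. The sketch below assumes the file is created or truncated with mode 0644, which the documentation does not specify. writeBody and roundTrip are hypothetical helpers.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// writeBody is a hypothetical stand-in for WriteToFile. It assumes the
// file is created (or truncated) with mode 0644.
func writeBody(body []byte, path string) error {
	return os.WriteFile(path, body, 0o644)
}

// roundTrip writes body into a fresh temporary directory, reads it
// back, and removes the directory again.
func roundTrip(body []byte) ([]byte, error) {
	dir, err := os.MkdirTemp("", "pageinfo")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "page.html")
	if err := writeBody(body, path); err != nil {
		return nil, err
	}
	return os.ReadFile(path)
}

func main() {
	got, err := roundTrip([]byte("<html></html>"))
	if err != nil {
		fmt.Println("round trip failed:", err)
		return
	}
	fmt.Printf("read back %d bytes\n", len(got))
}
```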

type URL

type URL struct {
	CanonicalURL string
	Input        string
	Sha1         string
}

URL holds information about a URL that has been upserted.

func NewURL

func NewURL(providedURL string) (*URL, error)

NewURL returns a URL with a canonicalized form and a SHA1.

func NormalizeURL

func NormalizeURL(rawURL string) (*URL, error)

NormalizeURL normalizes a URL before it is stored in the database.
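
The exact normalization rules are not documented. As an illustration of the general idea, the sketch below lowercases the scheme and host, drops the fragment, and hashes the canonical form, so that equivalent spellings of a URL map to the same key. These rules, the local URL type, and the function name normalize are all assumptions; the package may normalize differently.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// URL is redeclared here, mirroring the documented struct.
type URL struct {
	CanonicalURL string
	Input        string
	Sha1         string
}

// normalize is a hypothetical stand-in for NormalizeURL. The rules it
// applies (lowercase scheme and host, no fragment) are assumptions.
func normalize(rawURL string) (*URL, error) {
	parsed, err := url.Parse(strings.TrimSpace(rawURL))
	if err != nil {
		return nil, err
	}
	if parsed.Scheme == "" || parsed.Host == "" {
		return nil, errors.New("normalize: URL needs a scheme and a host")
	}

	parsed.Scheme = strings.ToLower(parsed.Scheme)
	parsed.Host = strings.ToLower(parsed.Host)
	parsed.Fragment = "" // fragments are not sent to the server

	canonical := parsed.String()
	sum := sha1.Sum([]byte(canonical))
	return &URL{
		CanonicalURL: canonical,
		Input:        rawURL,
		Sha1:         hex.EncodeToString(sum[:]),
	}, nil
}

func main() {
	u, err := normalize("HTTPS://Example.COM/Path#top")
	if err != nil {
		fmt.Println("normalize failed:", err)
		return
	}
	fmt.Println(u.CanonicalURL, u.Sha1)
}
```

The path is left untouched because paths are case sensitive on many servers. Keeping Input next to CanonicalURL preserves what the user actually typed.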
