scraper

package
v0.0.0-...-519d24f Latest
Warning

This package is not in the latest version of its module.

Published: May 5, 2020 License: BSD-3-Clause Imports: 7 Imported by: 0

Documentation


Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Func

type Func func(r io.Reader) (interface{}, error)

Func is a function adapter that lets an ordinary function be used as a Scraper.

func (Func) Scrape

func (f Func) Scrape(r io.Reader) (interface{}, error)

Scrape implements the Scrape method of the Scraper interface.

type Head

type Head struct {
	Title     string     `json:"title"`
	Canonical string     `json:"canonical"`
	Meta      url.Values `json:"meta"`
	Rel       url.Values `json:"rel"`
}

type Rss2Channel

type Rss2Channel struct {
	Title          string       `xml:"title"`
	Link           string       `xml:"link"`
	Description    string       `xml:"description"`
	Language       string       `xml:"language"`
	Copyright      string       `xml:"copyright"`
	ManagingEditor string       `xml:"managingEditor"`
	WebMaster      string       `xml:"webMaster"`
	Images         []*Rss2Image `xml:"images"`
	LastBuildDate  Time         `xml:"lastBuildDate"`
	Category       string       `xml:"category"`
	Generator      string       `xml:"generator"`
	Items          []*Rss2Item  `xml:"item"`
}

Rss2Channel represents the `rss>channel` element of an RSS 2.0 document.

type Rss2Doc

type Rss2Doc struct {
	XMLName xml.Name     `xml:"rss"`
	Channel *Rss2Channel `xml:"channel"`
}

Rss2Doc represents an RSS 2.0 document.

type Rss2Image

type Rss2Image struct {
	URL    string `xml:"url"`
	Title  string `xml:"title"`
	Link   string `xml:"link"`
	Width  int64  `xml:"width"`
	Height int64  `xml:"height"`
}

Rss2Image represents the `rss>channel>image` element.

type Rss2Item

type Rss2Item struct {
	Title          string `xml:"title"`
	Link           string `xml:"link"`
	Description    string `xml:"description"`
	Author         string `xml:"author"`
	Category       string `xml:"category"`
	Comments       string `xml:"comments"`
	GUID           string `xml:"guid"`
	PubDate        Time   `xml:"pubDate"`
	ContentEncoded string `xml:"encoded"`
}

Rss2Item represents the `rss>channel>item` element.
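To show how struct tags like these map onto a feed, here is a minimal sketch that decodes a small RSS 2.0 document with encoding/xml. The structs are trimmed local copies of the ones above (date fields are omitted), and decodeFeed and the sample feed are hypothetical names and data introduced only for illustration.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// Trimmed local copies of the documented structs, so the sketch is
// self-contained. Only a subset of the fields is kept.
type Rss2Item struct {
	Title          string `xml:"title"`
	Link           string `xml:"link"`
	GUID           string `xml:"guid"`
	ContentEncoded string `xml:"encoded"`
}

type Rss2Channel struct {
	Title string      `xml:"title"`
	Link  string      `xml:"link"`
	Items []*Rss2Item `xml:"item"`
}

type Rss2Doc struct {
	XMLName xml.Name     `xml:"rss"`
	Channel *Rss2Channel `xml:"channel"`
}

// feed is made-up sample data.
const feed = `<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Example Feed</title>
    <link>https://example.com/</link>
    <item>
      <title>First post</title>
      <link>https://example.com/1</link>
      <guid>post-1</guid>
      <content:encoded><![CDATA[<p>Hello</p>]]></content:encoded>
    </item>
  </channel>
</rss>`

// decodeFeed is a hypothetical helper that unmarshals s into an Rss2Doc.
func decodeFeed(s string) (*Rss2Doc, error) {
	var doc Rss2Doc
	if err := xml.NewDecoder(strings.NewReader(s)).Decode(&doc); err != nil {
		return nil, err
	}
	return &doc, nil
}

func main() {
	doc, err := decodeFeed(feed)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(doc.Channel.Title)
	for _, it := range doc.Channel.Items {
		fmt.Println(it.Title, it.Link, it.ContentEncoded)
	}
}
```

A tag without a namespace, such as `xml:"encoded"`, matches on the element's local name, which is why it picks up `<content:encoded>` here.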

type Scraper

type Scraper interface {
	Scrape(r io.Reader) (interface{}, error)
}

Scraper is an interface for scraping content from an io.Reader.

func Html

func Html(f func(*goquery.Document, *Head) (interface{}, error)) Scraper

Html returns a Scraper for HTML that applies the custom logic f to the parsed goquery document and its Head.

func Rss2

func Rss2(f func(*Rss2Doc) (interface{}, error)) Scraper

Rss2 returns a Scraper for RSS 2.0 that applies the custom logic f to the parsed Rss2Doc.

type Time

type Time time.Time

Time is a defined type based on time.Time that can be unmarshaled from an XML document.

func (*Time) UnmarshalXML

func (t *Time) UnmarshalXML(d *xml.Decoder, start xml.StartElement) error

UnmarshalXML implements the xml.Unmarshaler interface.
