feeder

package
v0.0.0-...-eaf77a1
Warning

This package is not in the latest version of its module.

Published: Oct 23, 2014
License: CC0-1.0, MIT
Imports: 9
Imported by: 0

README

RSS

This package allows us to fetch RSS and Atom feeds from the internet. They are parsed into an object tree that is a hybrid of the RSS and Atom standards.

Supported feeds are:

  • RSS v0.91, 0.92 and 2.0
  • Atom 1.0

The package manages cache timeouts. This prevents us from querying servers for feed updates too often and risking IP bans. Apart from setting a cache timeout manually, the package can also optionally adhere to the TTL, SkipDays and SkipHours values specified in the feeds themselves.

Note that the TTL, SkipDays and SkipHours fields are part of the RSS spec only. For Atom feeds, the CacheTimeout field of the Feed struct is used.

Because the object structure is a hybrid of the RSS and Atom specs, not every field will be filled for a given RSS or Atom feed. I have tried to create as many shared fields as possible, but some of them simply do not occur in one spec or the other.

The Feed object can notify you of new channels and items. To use this, pass two handler functions (a ChannelHandler and an ItemHandler) to feeder.New(). They are called whenever the feed is updated from the remote source and a channel or item is found that did not exist before. This makes it easy to monitor a feed for changes. See src/feed_test.go for an example of how this works.

DEPENDENCIES

github.com/jteeuwen/go-pkg-xmlx

USAGE

An idiomatic example program can be found in testdata/example.go.

Documentation

Overview

Credits go to github.com/SlyMarbo/rss for inspiring this solution.

Author: jim teeuwen <jimteeuwen@gmail.com>
Dependencies: go-pkg-xmlx (http://github.com/jteeuwen/go-pkg-xmlx)

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func NewDatabase

func NewDatabase() *database

Types

type Author

type Author struct {
	Name  string
	Uri   string
	Email string
}

type Category

type Category struct {
	Domain string
	Text   string
}

type Channel

type Channel struct {
	Title          string
	Links          []Link
	Description    string
	Language       string
	Copyright      string
	ManagingEditor string
	WebMaster      string
	PubDate        string
	LastBuildDate  string
	Docs           string
	Categories     []*Category
	Generator      Generator
	TTL            int
	Rating         string
	SkipHours      []int
	SkipDays       []int
	Image          Image
	Items          []*Item
	Cloud          Cloud
	TextInput      Input
	Extensions     map[string]map[string][]Extension

	// Atom fields
	Id       string
	Rights   string
	Author   Author
	SubTitle SubTitle
}

func (*Channel) Key

func (c *Channel) Key() string

type ChannelHandler

type ChannelHandler func(f *Feed, newchannels []*Channel)

type Cloud

type Cloud struct {
	Domain            string
	Port              int
	Path              string
	RegisterProcedure string
	Protocol          string
}

type Content

type Content struct {
	Type string
	Lang string
	Base string
	Text string
}

type Enclosure

type Enclosure struct {
	Url    string
	Length int64
	Type   string
}

type Extension

type Extension struct {
	Name      string
	Value     string
	Attrs     map[string]string
	Childrens map[string][]Extension
}

type Feed

type Feed struct {
	// Custom cache timeout in minutes.
	CacheTimeout int

	// Make sure we adhere to the cache timeout specified in the feed. If
	// our CacheTimeout is higher than that, we will use that instead.
	EnforceCacheLimit bool

	// Type of feed. Rss, Atom, etc
	Type string

	// Version of the feed. Major and Minor.
	Version [2]int

	// Channels with content.
	Channels []*Channel

	// Url from which this feed was created.
	Url string
	// contains filtered or unexported fields
}

func New

func New(cachetimeout int, enforcecachelimit bool, ch ChannelHandler, ih ItemHandler) *Feed

func (*Feed) CanUpdate

func (this *Feed) CanUpdate() bool

CanUpdate reports whether the CacheTimeout value has expired. When Feed.EnforceCacheLimit is true, it also adheres to the RSS spec's SkipDays and SkipHours values. If CanUpdate returns true, you can be sure that a fresh feed update will be performed.

func (*Feed) Fetch

func (this *Feed) Fetch(uri string, charset xmlx.CharsetFunc) (err error)

Fetch retrieves the feed's latest content if necessary.

The charset parameter overrides the XML decoder's CharsetReader. This allows us to specify a custom character encoding conversion routine when dealing with non-UTF-8 input. Pass nil to use the default from Go's xml package.

This is equivalent to calling FetchClient with http.DefaultClient.

func (*Feed) FetchBytes

func (this *Feed) FetchBytes(uri string, content []byte, charset xmlx.CharsetFunc) (err error)

func (*Feed) FetchClient

func (this *Feed) FetchClient(uri string, client *http.Client, charset xmlx.CharsetFunc) (err error)

FetchClient retrieves the feed's latest content if necessary, using the supplied HTTP client.

The charset parameter overrides the XML decoder's CharsetReader. This allows us to specify a custom character encoding conversion routine when dealing with non-UTF-8 input. Pass nil to use the default from Go's xml package.

The client parameter allows the use of arbitrary network connections, for example the Google App Engine "URL Fetch" service.

func (*Feed) GetVersionInfo

func (this *Feed) GetVersionInfo(doc *xmlx.Document) (ftype string, fversion [2]int)

func (*Feed) LastUpdate

func (this *Feed) LastUpdate() int64

LastUpdate returns a timestamp of the feed's most recent update, in seconds.

func (*Feed) SecondsTillUpdate

func (this *Feed) SecondsTillUpdate() int64

SecondsTillUpdate returns the number of seconds that must elapse before the feed should be updated again.
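The arithmetic behind such a countdown can be sketched as follows, assuming the wait is measured from the last update time (in seconds, as LastUpdate reports it) against CacheTimeout alone; the package may also take a feed's TTL into account. The function name and parameters are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// secondsTillUpdate returns how many seconds remain before a feed last
// updated at lastUpdate (Unix seconds) may be fetched again under a cache
// timeout given in minutes. It never returns a negative value.
func secondsTillUpdate(lastUpdate, now int64, cacheTimeoutMinutes int) int64 {
	remaining := lastUpdate + int64(cacheTimeoutMinutes)*60 - now
	if remaining < 0 {
		return 0
	}
	return remaining
}

func main() {
	last := time.Date(2014, 10, 23, 12, 0, 0, 0, time.UTC).Unix()
	now := last + 90 // 90 seconds after the last update

	wait := secondsTillUpdate(last, now, 5)
	fmt.Println(wait) // 210
	// A polling loop could sleep for this long before trying again.
	fmt.Println(time.Duration(wait) * time.Second) // 3m30s
}
```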

type Generator

type Generator struct {
	Uri     string
	Version string
	Text    string
}

type Image

type Image struct {
	Title       string
	Url         string
	Link        string
	Width       int
	Height      int
	Description string
}

type Input

type Input struct {
	Title       string
	Description string
	Name        string
	Link        string
}

type Item

type Item struct {
	// RSS and Shared fields
	Title       string
	Links       []*Link
	Description string
	Author      Author
	Categories  []*Category
	Comments    string
	Enclosures  []*Enclosure
	Guid        *string
	PubDate     string
	Source      *Source

	// Atom specific fields
	Id           string
	Generator    *Generator
	Contributors []string
	Content      *Content

	Extensions map[string]map[string][]Extension
}

func (*Item) Key

func (i *Item) Key() string

func (*Item) ParsedPubDate

func (i *Item) ParsedPubDate() (time.Time, error)

type ItemHandler

type ItemHandler func(f *Feed, ch *Channel, newitems []*Item)

type Link

type Link struct {
	Href     string
	Rel      string
	Type     string
	HrefLang string
}

type Source

type Source struct {
	Url  string
	Text string
}

type SubTitle

type SubTitle struct {
	Type string
	Text string
}
