feedcrawler

package module
v2.2.0+incompatible Latest
Warning

This package is not in the latest version of its module.

Published: Sep 28, 2018 License: MIT Imports: 14 Imported by: 1

README

go-feedcrawler

Feed (RSS and Atom) crawler library (an example application included).

Features

  • Support RSS and Atom
  • Filtering entries
    • Regexp based filter for title, description, content, author and categories
    • Callback function filter
  • State management (keep published date and detect new entries)
  • Multiple workers

Examples

See _example directory.

  • TOML based configuration file
  • Fake feed server (dynamic entries feed)

TODO

  • Support local files (local path and/or file scheme such as "file://")

License

MIT

Author

Yuki (@yukithm)

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func SaveStates

func SaveStates(states States, w io.Writer) error

SaveStates writes current states to io.Writer.

func SaveStatesFile

func SaveStatesFile(states States, file string) error

SaveStatesFile saves the current states to a JSON file.

Types

type Crawler

type Crawler struct {
	Subscriptions []Subscription
	States        States
	NumWorkers    int
	Parser        *gofeed.Parser
}

Crawler is a crawler for RSS and Atom feeds.

func (*Crawler) Crawl

func (fc *Crawler) Crawl() ([]Result, error)

Crawl crawls subscribed feeds.

func (*Crawler) CrawlFunc

func (fc *Crawler) CrawlFunc(f func(Result)) error

CrawlFunc crawls subscribed feeds and calls the func with each result.
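The callback style of CrawlFunc combined with NumWorkers suggests a worker-pool pattern. The following is a minimal sketch of that pattern using only the standard library; it is not the library's implementation. The names `result`, `crawlFunc`, and `fetch` are hypothetical stand-ins, and calling the callback from a single goroutine is an assumption made here for simplicity.

```go
package main

import (
	"fmt"
	"sync"
)

// result is a hypothetical stand-in for the package's Result type.
type result struct {
	URI string
	Err error
}

// crawlFunc fetches each URI using numWorkers goroutines and calls f once per
// result. f is always called from the caller's goroutine (an assumption of
// this sketch), so it needs no locking of its own.
func crawlFunc(uris []string, numWorkers int, fetch func(string) error, f func(result)) {
	if numWorkers < 1 {
		numWorkers = 1
	}
	jobs := make(chan string)
	results := make(chan result)

	var wg sync.WaitGroup
	for i := 0; i < numWorkers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for uri := range jobs {
				results <- result{URI: uri, Err: fetch(uri)}
			}
		}()
	}

	// Feed the jobs, then close results once every worker has finished.
	go func() {
		for _, uri := range uris {
			jobs <- uri
		}
		close(jobs)
		wg.Wait()
		close(results)
	}()

	for r := range results {
		f(r)
	}
}

func main() {
	uris := []string{"https://example.com/a.xml", "https://example.com/b.xml"}
	fetch := func(uri string) error { return nil } // placeholder for a real HTTP fetch

	count := 0
	crawlFunc(uris, 2, fetch, func(r result) { count++ })
	fmt.Println(count)
}
```

Results arrive in completion order, not subscription order, so a callback that needs ordering has to sort or key the results itself.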

type EnhancedAtomTranslator

type EnhancedAtomTranslator struct {
	// contains filtered or unexported fields
}

func NewEnhancedAtomTranslator

func NewEnhancedAtomTranslator() *EnhancedAtomTranslator

func (*EnhancedAtomTranslator) Translate

func (ct *EnhancedAtomTranslator) Translate(feed interface{}) (*gofeed.Feed, error)

type Feed

type Feed struct {
	ID                string `toml:"id"`
	URI               string `toml:"uri"`
	TitleFilter       string `toml:"title_filter,omitempty"`
	DescriptionFilter string `toml:"description_filter,omitempty"`
	ContentFilter     string `toml:"content_filter,omitempty"`
	AuthorFilter      string `toml:"author_filter,omitempty"`
	CategoryFilter    string `toml:"category_filter,omitempty"`
}

Feed is a feed configuration to be subscribed.

func LoadFeeds

func LoadFeeds(r io.Reader) ([]Feed, error)

LoadFeeds loads feeds from io.Reader.

func LoadFeedsFile

func LoadFeedsFile(file string) ([]Feed, error)

LoadFeedsFile loads feeds from a file.

func (*Feed) Subscription

func (f *Feed) Subscription() (Subscription, error)

Subscription returns a Subscription.
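Feed stores its filters as strings while SimpleSubscription holds *regexp.Regexp values, so a plausible reason for the error return is a pattern that fails to compile. The sketch below illustrates that conversion with a hypothetical helper, `compileFilter`; treating an empty pattern as "no filter" is an assumption, not documented behavior.

```go
package main

import (
	"fmt"
	"regexp"
)

// compileFilter is a hypothetical helper. It returns a nil regexp for an
// empty pattern (assumed to mean "no filter") and an error for an invalid one.
func compileFilter(pattern string) (*regexp.Regexp, error) {
	if pattern == "" {
		return nil, nil
	}
	return regexp.Compile(pattern)
}

func main() {
	// A valid pattern compiles without error.
	re, err := compileFilter(`^\[release\]`)
	fmt.Println(re != nil, err == nil)

	// An unbalanced parenthesis is rejected by regexp.Compile.
	_, err = compileFilter(`(unclosed`)
	fmt.Println(err != nil)
}
```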

type FeedID

type FeedID string

FeedID is an identifier of a feed.

type Result

type Result struct {
	Subscription Subscription
	Feed         *gofeed.Feed
	NewItems     []*gofeed.Item
	Err          error
}

Result is a result of a feed crawling.

type SimpleSubscription

type SimpleSubscription struct {
	FeedID            FeedID
	FeedURI           string
	TitleFilter       *regexp.Regexp
	DescriptionFilter *regexp.Regexp
	ContentFilter     *regexp.Regexp
	AuthorFilter      *regexp.Regexp
	CategoryFilter    *regexp.Regexp
	FilterFunc        func(*gofeed.Item) bool
}

SimpleSubscription is a feed configuration to be subscribed.

func (*SimpleSubscription) Filter

func (s *SimpleSubscription) Filter(item *gofeed.Item) bool

Filter returns true if the item is acceptable.

func (*SimpleSubscription) ID

func (s *SimpleSubscription) ID() FeedID

ID returns the feed ID.

func (*SimpleSubscription) URI

func (s *SimpleSubscription) URI() string

URI returns the feed URI.

type State

type State struct {
	CrawledAt time.Time `json:"crawled_at,omitempty"`
	UpdatedAt time.Time `json:"updated_at,omitempty"`
}

State is a feed's crawling state.

type States

type States map[FeedID]*State

States maps each FeedID to its State.

func LoadStates

func LoadStates(r io.Reader) (States, error)

LoadStates loads states from io.Reader.

func LoadStatesFile

func LoadStatesFile(file string) (States, error)

LoadStatesFile loads states from a JSON file.

func (States) UpdateState

func (s States) UpdateState(result Result)

UpdateState updates states by the result.

type Subscription

type Subscription interface {
	ID() FeedID
	URI() string
	Filter(*gofeed.Item) bool
}

Subscription holds the subscription information for a feed.

Directories

Path Synopsis
_example
