linksrc

package
v0.0.0-...-c05b5d5
Warning

This package is not in the latest version of its module.

Published: Jan 6, 2024 License: MIT Imports: 21 Imported by: 0

Documentation

Types

type Config

type Config struct {
	// The name of the source, e.g., "New York Magazine"
	Name string
	// URL of the site containing links
	URL url.URL
	// CSS selector for a link within a list of links.
	ItemSelector css.Selector
	// CSS selector for a caption within a link item.
	// Relative to ItemSelector
	CaptionSelector css.Selector
	// CSS selector for the actual link within a link item. Should be an
	// "a" element. Relative to ItemSelector.
	LinkSelector css.Selector
	// Maximum number of Items in a Set. If a scraper returns more than this
	// within a link site, Items will be chosen arbitrarily.
	MaxItems uint
	// The minimum number of words that a block-level HTML element must
	// contain for it to be included in a link item's caption. Used to
	// exclude short pieces of text like blog tags, bylines, or anything
	// else that can get in the way of a caption's substance.
	//
	// Must be greater than zero. The default is three.
	ShortElementFilter int
}

Config stores options for the link source container.
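
The ShortElementFilter option described in the struct comments can be pictured with a small sketch. This is only an illustration, assuming captions are assembled from a slice of block-level text strings and that words are separated by whitespace; filterShortElements is a hypothetical helper and is not part of this package.

```go
package main

import (
	"fmt"
	"strings"
)

// filterShortElements is a hypothetical helper, not part of linksrc.
// It keeps only the text blocks that contain at least minWords
// whitespace-separated words.
func filterShortElements(blocks []string, minWords int) []string {
	var kept []string
	for _, b := range blocks {
		if len(strings.Fields(b)) >= minWords {
			kept = append(kept, b)
		}
	}
	return kept
}

func main() {
	blocks := []string{
		"By Staff",
		"politics",
		"A long read on how cities plan for flooding.",
	}
	// With a minimum of three words, the byline and the tag are dropped.
	fmt.Println(filterShortElements(blocks, 3))
}
```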

There is no support for grouped (i.e., comma-separated) selectors. This is because, while grouped selectors are useful for applying styles to generalized sets of elements, the HTML parser needs to locate elements individually.
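
As a rough illustration of the restriction above, a caller could reject grouped selectors before building a Config. The sketch below assumes selectors are available as plain strings; checkNotGrouped is a hypothetical helper, and the package's own validation may work differently.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// checkNotGrouped is a hypothetical check, not the package's own validation.
// It rejects any selector string that contains a comma. This is deliberately
// naive: a comma inside a quoted attribute value would also be rejected.
func checkNotGrouped(selector string) error {
	if strings.Contains(selector, ",") {
		return errors.New("grouped selectors are not supported: " + selector)
	}
	return nil
}

func main() {
	for _, s := range []string{"li.item a", "h2, h3"} {
		fmt.Printf("%q: %v\n", s, checkNotGrouped(s))
	}
}
```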

func (*Config) CheckAndSetDefaults

func (c *Config) CheckAndSetDefaults() (Config, error)

CheckAndSetDefaults validates c and returns either a copy of c with default settings applied or an error describing the invalid configuration.
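
The validate-then-copy pattern can be sketched as follows. This is not the package's implementation: config is a local stand-in with a single field, and the sketch assumes that a zero ShortElementFilter means "unset" (so the documented default of three applies) while a negative value is invalid.

```go
package main

import (
	"errors"
	"fmt"
)

// config is a hypothetical stand-in for Config with only one field.
type config struct {
	ShortElementFilter int
}

const defaultShortElementFilter = 3

// checkAndSetDefaults returns a copy of c with defaults applied, leaving
// c itself untouched. Assumption: zero means "unset", negative is invalid.
func (c *config) checkAndSetDefaults() (config, error) {
	out := *c
	if out.ShortElementFilter < 0 {
		return config{}, errors.New("ShortElementFilter must be greater than zero")
	}
	if out.ShortElementFilter == 0 {
		out.ShortElementFilter = defaultShortElementFilter
	}
	return out, nil
}

func main() {
	c := config{}
	withDefaults, err := c.checkAndSetDefaults()
	fmt.Println(withDefaults.ShortElementFilter, err)
}
```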

func (*Config) UnmarshalYAML

func (c *Config) UnmarshalYAML(unmarshal func(interface{}) error) error

UnmarshalYAML implements the yaml.Unmarshaler interface. Validation is performed here.

type LinkItem

type LinkItem struct {
	// using a string here because we'll let the downstream context deal
	// with parsing URLs etc. This comes from a website so we can't really
	// trust it.
	LinkURL string
	Caption string
}

LinkItem represents the data for a single link item found within a list of links.

func (LinkItem) Key

func (li LinkItem) Key() []byte

Key returns the key used to determine whether a LinkItem has already been stored in the database.

func (LinkItem) NewKVEntry

func (li LinkItem) NewKVEntry() storage.KVEntry

NewKVEntry prepares the LinkItem to be saved in the KV database. Keys are SHA256 hashes of the entire LinkItem. Values are timestamps in seconds since the Unix epoch. Usually we'll just be checking whether newly fetched LinkItems are already saved. Eventually we might want to use the timestamp.

type Set

type Set struct {
	// The publication that the links came from
	Name string
	// contains filtered or unexported fields
}

Set represents a set of link items. It's not meant to be modified by concurrent goroutines.

func NewSet

func NewSet(ctx context.Context, r io.Reader, conf Config, code int) Set

NewSet initializes a new collection of link items from an HTML document Reader, a link source configuration, and an HTTP status code (treated as 200 OK if not set).
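
The status-code fallback might look like the sketch below. It assumes that "not set" means the int zero value; effectiveStatus is a hypothetical helper, not a function exported by this package.

```go
package main

import (
	"fmt"
	"net/http"
)

// effectiveStatus is a hypothetical helper. It treats the zero value as
// "not set" and falls back to 200 OK; any other code is passed through.
func effectiveStatus(code int) int {
	if code == 0 {
		return http.StatusOK
	}
	return code
}

func main() {
	fmt.Println(effectiveStatus(0))
	fmt.Println(effectiveStatus(http.StatusNotFound))
}
```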

func (*Set) AddMessage

func (s *Set) AddMessage(msg string)

AddMessage adds a message to the Set for displaying later in an email. These messages are used only for ad hoc notes that don't belong in a LinkItem, such as error messages. Messages should be complete sentences.

func (*Set) CountLinkItems

func (s *Set) CountLinkItems() int

CountLinkItems returns the number of LinkItems managed by the Set.

func (*Set) LinkItems

func (s *Set) LinkItems() []LinkItem

LinkItems returns all of the LinkItems managed by the Set.

func (*Set) Messages

func (s *Set) Messages() []string

Messages returns all of the ad hoc messages for the Set.

func (*Set) RemoveLinkItem

func (s *Set) RemoveLinkItem(li LinkItem)

RemoveLinkItem removes the LinkItem from the Set. It is not safe for concurrent use.
