fftr

package
v0.0.0-...-d1a9080
Published: May 5, 2021 | License: AGPL-3.0 | Imports: 15 | Imported by: 0

Documentation

Constants

This section is empty.

Variables

var DefaultConfigurationFolders = ConfigFolderList{
	{siteConfigFS("custom"), "custom"},
	{siteConfigFS("standard"), "standard"},
}

DefaultConfigurationFolders is a list of default locations with configuration files.

Functions

func ExtractAuthor

func ExtractAuthor(m *extract.ProcessMessage, next extract.Processor) extract.Processor

ExtractAuthor applies the "author" directives to find an author.

func ExtractBody

ExtractBody tries to find a body as defined by the "body" directives in the configuration file.

func ExtractDate

ExtractDate applies the "date" directives to find a date. If a date is found we try to parse it.

func FindContentPage

func FindContentPage(m *extract.ProcessMessage, next extract.Processor) extract.Processor

FindContentPage searches the page for SinglePageLinkSelectors and, if it finds one, resets the process to its beginning with the newly found URL.

func FindNextPage

FindNextPage looks for NextPageLinkSelectors and, if it finds a URL, adds it to the message so that it can be processed later by GoToNextPage.

func GoToNextPage

GoToNextPage checks if there is a "next_page" value in the process message. It then creates a new drop with the URL.

func LoadConfiguration

func LoadConfiguration(m *extract.ProcessMessage, next extract.Processor) extract.Processor

LoadConfiguration will try to find a matching fftr configuration for the first Drop (the extraction starting point).

If a configuration is found, it will be added to the context.

If the configuration indicates custom HTTP headers, they'll be added to the client.
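
As an illustration of the last point, here is a minimal sketch of setting a map of configured headers (shaped like the HTTPHeaders field of Config) on an outgoing request. It uses only the standard net/http package. The helper name newRequestWithHeaders and the sample header values are assumptions made for this example; how fftr actually wires headers into its client is not shown on this page.

```go
package main

import (
	"fmt"
	"net/http"
)

// newRequestWithHeaders is a hypothetical helper (not part of fftr).
// It builds a GET request and sets every configured header on it.
func newRequestWithHeaders(rawURL string, headers map[string]string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, rawURL, nil)
	if err != nil {
		return nil, err
	}
	for name, value := range headers {
		// Set replaces any existing value and canonicalizes the header name.
		req.Header.Set(name, value)
	}
	return req, nil
}

func main() {
	// Sample values, invented for the example.
	headers := map[string]string{"User-Agent": "example-bot/1.0"}

	req, err := newRequestWithHeaders("https://example.com/article", headers)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.Header.Get("User-Agent"))
}
```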

func ReplaceStrings

func ReplaceStrings(m *extract.ProcessMessage, next extract.Processor) extract.Processor

ReplaceStrings applies all the replace_string directives of the fftr configuration file to the received body.
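
The following sketch shows one plausible reading of that behavior: each [find, replace] pair (the shape of the ReplaceStrings field in Config) is applied to the body in order. The helper applyReplacements is hypothetical, and plain substring replacement is an assumption; the real processor works on an extract.ProcessMessage and may differ in its details.

```go
package main

import (
	"fmt"
	"strings"
)

// applyReplacements is a hypothetical helper, not part of fftr.
// It applies each [find, replace] pair to body, in order, so a later
// pair sees the result of the earlier ones.
func applyReplacements(body string, pairs [][2]string) string {
	for _, p := range pairs {
		body = strings.ReplaceAll(body, p[0], p[1])
	}
	return body
}

func main() {
	// Sample directives, invented for the example.
	pairs := [][2]string{
		{"<noscript>", ""},
		{"</noscript>", ""},
	}
	body := `<noscript><img src="a.png"></noscript>`
	fmt.Println(applyReplacements(body, pairs))
}
```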

func StripTags

StripTags removes the tags from the DOM root node, according to "strip_tags" configuration directives.

Types

type Config

type Config struct {
	Files []string `json:"-"`

	TitleSelectors          []string          `json:"title_selectors"`
	BodySelectors           []string          `json:"body_selectors"`
	DateSelectors           []string          `json:"date_selectors"`
	AuthorSelectors         []string          `json:"author_selectors"`
	StripSelectors          []string          `json:"strip_selectors"`
	StripIDOrClass          []string          `json:"strip_id_or_class"`
	StripImageSrc           []string          `json:"strip_image_src"`
	NativeAdSelectors       []string          `json:"native_ad_selectors"`
	Tidy                    bool              `json:"tidy"`
	Prune                   bool              `json:"prune"`
	AutoDetectOnFailure     bool              `json:"autodetect_on_failure"`
	SinglePageLinkSelectors []string          `json:"single_page_link_selectors"`
	NextPageLinkSelectors   []string          `json:"next_page_link_selectors"`
	ReplaceStrings          [][2]string       `json:"replace_strings"`
	HTTPHeaders             map[string]string `json:"http_headers"`
	Tests                   []FilterTest      `json:"tests"`
}

Config holds the fivefilters configuration.
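
To make the JSON tags concrete, here is a minimal sketch that decodes a document into a local struct mirroring a subset of the Config fields. The type siteConfig, the helper decodeSiteConfig, and all sample values are assumptions for illustration. The sketch uses encoding/json directly rather than NewConfig, because the formats NewConfig accepts are not listed on this page.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// siteConfig mirrors a subset of the Config fields and JSON tags.
// It is a local stand-in for the example, not the fftr type.
type siteConfig struct {
	TitleSelectors []string          `json:"title_selectors"`
	BodySelectors  []string          `json:"body_selectors"`
	Prune          bool              `json:"prune"`
	ReplaceStrings [][2]string       `json:"replace_strings"`
	HTTPHeaders    map[string]string `json:"http_headers"`
}

// decodeSiteConfig is a hypothetical helper that decodes one JSON document.
func decodeSiteConfig(doc string) (*siteConfig, error) {
	var cfg siteConfig
	if err := json.NewDecoder(strings.NewReader(doc)).Decode(&cfg); err != nil {
		return nil, err
	}
	return &cfg, nil
}

func main() {
	// Sample document, invented for the example.
	doc := `{
		"title_selectors": ["//h1"],
		"body_selectors": ["//article"],
		"prune": true,
		"replace_strings": [["<noscript>", ""]],
		"http_headers": {"User-Agent": "example-bot/1.0"}
	}`

	cfg, err := decodeSiteConfig(doc)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(cfg.BodySelectors, cfg.Prune, len(cfg.ReplaceStrings))
}
```

Note that each entry of replace_strings must be a two-element JSON array to fill the [2]string pair completely.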

func NewConfig

func NewConfig(r io.Reader, format string) (*Config, error)

NewConfig loads a configuration file from an io.Reader.

func NewConfigForURL

func NewConfigForURL(src *url.URL, folders ConfigFolderList) (*Config, error)

NewConfigForURL loads the site configuration file(s) for a given URL.

func (*Config) Merge

func (cf *Config) Merge(new *Config)

Merge merges a new configuration into the current one.

type ConfigFolder

type ConfigFolder struct {
	fs.FS
	Name string
}

ConfigFolder is an fs.FS with a name.
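
A minimal sketch of the same pattern, a named wrapper around an embedded fs.FS, using in-memory file systems from testing/fstest. The type namedFolder, the helpers findFirst and sampleFolders, the file names, and the "first folder wins" lookup order are all assumptions for illustration; how fftr searches its folders is not described on this page.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"testing/fstest"
)

// namedFolder is a local stand-in with the same shape as ConfigFolder.
// Embedding fs.FS promotes Open, so *namedFolder is itself an fs.FS.
type namedFolder struct {
	fs.FS
	Name string
}

// findFirst is a hypothetical helper. It returns the name of the first
// folder that contains file, together with the file's content.
func findFirst(folders []*namedFolder, file string) (string, []byte, error) {
	for _, f := range folders {
		data, err := fs.ReadFile(f, file)
		if errors.Is(err, fs.ErrNotExist) {
			continue // not in this folder, try the next one
		}
		if err != nil {
			return "", nil, err
		}
		return f.Name, data, nil
	}
	return "", nil, fs.ErrNotExist
}

// sampleFolders builds two in-memory folders, invented for the example.
func sampleFolders() []*namedFolder {
	custom := fstest.MapFS{
		"example.com.json": {Data: []byte(`{"prune": false}`)},
	}
	standard := fstest.MapFS{
		"example.com.json": {Data: []byte(`{"prune": true}`)},
		"example.org.json": {Data: []byte(`{"tidy": true}`)},
	}
	return []*namedFolder{
		{custom, "custom"},
		{standard, "standard"},
	}
}

func main() {
	name, data, err := findFirst(sampleFolders(), "example.com.json")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(name, string(data))
}
```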

type ConfigFolderList

type ConfigFolderList []*ConfigFolder

ConfigFolderList is a list of configuration folders.

type FilterTest

type FilterTest struct {
	URL      string   `json:"url"`
	Contains []string `json:"contains"`
}

FilterTest holds the values for a filter's test.
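
One way such a test could be evaluated is to check that every string in Contains appears in the extracted content for URL. The sketch below assumes plain substring matching; the type filterTest and the helper missingSnippets are hypothetical, and this page does not say how fftr runs these tests.

```go
package main

import (
	"fmt"
	"strings"
)

// filterTest is a local stand-in with the same fields as FilterTest.
type filterTest struct {
	URL      string   `json:"url"`
	Contains []string `json:"contains"`
}

// missingSnippets is a hypothetical helper. It returns the expected
// strings that do not occur in body; an empty result means a pass.
func missingSnippets(body string, want []string) []string {
	var missing []string
	for _, s := range want {
		if !strings.Contains(body, s) {
			missing = append(missing, s)
		}
	}
	return missing
}

func main() {
	// Sample test case and body, invented for the example.
	t := filterTest{
		URL:      "https://example.com/article",
		Contains: []string{"first paragraph", "closing remarks"},
	}
	body := "<p>The first paragraph of the article.</p>"

	if missing := missingSnippets(body, t.Contains); len(missing) > 0 {
		fmt.Printf("%s: missing %q\n", t.URL, missing)
		return
	}
	fmt.Printf("%s: ok\n", t.URL)
}
```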
