ratt

package module
v0.0.0-...-7433a87
Published: Apr 20, 2024 License: MIT Imports: 25 Imported by: 1

README

ratt

RSS all the things!

ratt is a tool for converting websites to RSS/Atom feeds. It uses Lua config files that define how the feed data is extracted, using CSS selectors or Lua functions.

Config files are in Lua format:

--for automatic extraction, ratt checks all config files and matches the regex
ratt.add(
	--regex
	"https://github.com/trending",
	--css selectors table
	{
		--settings for all http requests for the website
		httpsettings = {
			cookie = {},
			header = {},
			useragent = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.72 Safari/537.36"
		},
		--css selectors to get the feed data
		feed = {
			title = ".h1",
		},
		--css selectors to get item data
		item = {
			--the item container
			container = "article.Box-row",
			--selector can be a function which gets the item container selection object
			title = function(sel, _)
				return sel:find("h1.h3 > a[data-hydro-click]"):text():gsub("%s+", "")
			end,
			link = function(sel, _)
				return "https://github.com" .. sel:find("a[data-hydro-click]"):attr("href")
			end,
			description = "p.color-fg-muted",
		}
	}
)

Configs

Config files are Lua files. ratt ships with some configs embedded. When called with a URL, e.g. ratt https://1337x.to/top-100, ratt tries to find a config for that URL: it searches the embedded config files, the current directory and ~/.config/ratt/*.lua.
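
The lookup described above can be pictured with a small Go sketch. This is not ratt's actual code: configEntry and findConfig are hypothetical names, and the assumption that the first matching regex wins, in the order embedded, current directory, ~/.config/ratt, is ours rather than something the README states.

```go
package main

import (
	"fmt"
	"regexp"
)

// configEntry is a hypothetical stand-in for one ratt.add(...) call:
// the regex from the config file plus where the file came from.
type configEntry struct {
	Pattern string // regex passed as the first argument to ratt.add
	Source  string // e.g. "embedded" or a file path
}

// findConfig returns the first entry whose pattern matches inputURL.
// Assumption: entries are listed in search order and the first match wins.
func findConfig(entries []configEntry, inputURL string) (configEntry, bool) {
	for _, e := range entries {
		re, err := regexp.Compile(e.Pattern)
		if err != nil {
			continue // skip configs with an invalid regex
		}
		if re.MatchString(inputURL) {
			return e, true
		}
	}
	return configEntry{}, false
}

func main() {
	entries := []configEntry{
		{Pattern: `https://github.com/trending`, Source: "embedded"},
		{Pattern: `https://1337x.to/top-100`, Source: "~/.config/ratt/1337x.lua"},
	}
	if e, ok := findConfig(entries, "https://github.com/trending/go"); ok {
		fmt.Println("using config from", e.Source)
	}
}
```

Because the pattern is an unanchored regex, a config registered for https://github.com/trending also matches https://github.com/trending/go.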

Installation

First install go, git and scdoc, then:

git clone https://git.sr.ht/~ghost08/ratt
cd ratt
sudo make install

Install on Arch Linux from AUR with your favorite helper:

yay -S ratt-git

Issues

File bugs and TODOs through the issue tracker or send an email to ~ghost08/ratt@todo.sr.ht. For general discussion, use the mailing list: ~ghost08/ratt@lists.sr.ht.

Usage

Just call ratt with the URL of the web page:

ratt https://github.com/trending/go

Documentation

man ratt.5

What will I do with this RSS feed?

That's a very good question. I'm happy you asked :)

You might pipe the feed directly to photon, which is a modern RSS/Atom reader. photon will play the media from your feed. It uses mpv and youtube-dl to automatically play videos, download torrents, view images and much more :)

So try this out:

ratt https://1337x.to/top-100 | photon -


Lua

If a CSS selector isn't enough to select the needed data, every feed and item attribute can be a Lua function.

The function gets two arguments by default:

  • sel is the selection object of the feed/item container, which can be queried with selectors.
  • index is the number of the item being processed.

The Lua script gets some modules to help with the extraction:

  • goquery is a module imported by default; it is a subset of the goquery library.
  • gojq is a module imported by default; it is the gojq library.

ratt takes the return value of the Lua function and inserts it as the data of the feed/item. When an error has occurred, just call the error function.

For more documentation see ratt(5)

Examples

Calling another link, parsing it to a goquery.Document and querying the new doc:

item = {
  --select the item container html element
  container = ".table-list-wrap tbody tr",
  --select the title element in the item container
  title = "a:nth-child(2)",
  --lua script
  link = function(sel, _)
    --sel is the item container element, find <a/>
    local a = sel:find("a:nth-child(2)")
    --get the href attribute of <a/> and build the item url from it
    local itemURL = "https://1337x.to" .. a:attr("href")
    --request and parse the document
    local doc, err = goquery.newDocFromURL(itemURL)
    if err ~= nil then
      --raise an error if the request was unsuccessful
      error(err)
    end
    --find the item link you want
    local link = doc:find("ul li a[onclick]"):first():attr("href")
    --trim space characters
    link = link:gsub("%s+", "")
    --and finally return the link so ratt can include it in the item.link
    return link
  end,
}

You can also parse and query JSON data with the help of the awesome gojq library:

feed = {
  title = ".title",
  description = function(sel, _)
    --find the <script> element where the json data is
    local script = sel:find("script"):first():text()
    local index = script:find("var myJsonData =")
    --cut off the "var myJsonData =" prefix (16 characters)
    local jsonData = script:sub(index+16)
    --parse a gojq query, that will find the obj["description"] value
    local query, err = gojq.parse(".description")
    if err ~= nil then
      error(err)
    end
    --expecting that the input data is a map/object (otherwise if it's an array use runArray)
    local desc
    desc, err = query.runMap(jsonData)
    if err ~= nil then
      error(err)
    end
    return desc[1]["description"]
  end,
}

Check the confs dir for other examples.

Contribution

ratt needs config files for it to run. I really rely on the community to create configs for all the sites!

So please create config files, send them here, then everybody can make the world RSS again!

Anyone can contribute to ratt:

  • Clone the repository.
  • Patch the code.
  • Make some tests.
  • Ensure that your code is properly formatted with gofmt.
  • Ensure that everything works as expected.
  • Ensure that you did not break anything.
  • Do not forget to update the docs.

Once you are happy with your work, you can create a commit (or several commits). Follow these general rules:

  • Limit the first line (title) of the commit message to 60 characters.
  • Use a short prefix for the commit title for readability with git log --oneline.
  • Use the body of the commit message to actually explain what your patch does and why it is useful.
  • Address only one issue/topic per commit.
  • If you are fixing a ticket, use appropriate commit trailers.
  • If you are fixing a regression introduced by another commit, add a Fixes: trailer with the commit id and its title.

There is a great reference for commit messages in the Linux kernel documentation.

Before sending the patch, you should configure your local clone with sane defaults:

git config format.subjectPrefix "PATCH ratt"
git config sendemail.to "~ghost08/ratt@lists.sr.ht"

And send the patch to the mailing list:

git sendemail --annotate -1

Wait for feedback. Address comments and amend changes to your original commit. Then you should send a v2:

git sendemail --in-reply-to=$first_message_id --annotate -v2 -1

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type ByCreated

type ByCreated []*feeds.Item

func (ByCreated) Len

func (a ByCreated) Len() int

func (ByCreated) Less

func (a ByCreated) Less(i, j int) bool

func (ByCreated) Swap

func (a ByCreated) Swap(i, j int)

type ByTitle

type ByTitle []*feeds.Item

func (ByTitle) Len

func (a ByTitle) Len() int

func (ByTitle) Less

func (a ByTitle) Less(i, j int) bool

func (ByTitle) Swap

func (a ByTitle) Swap(i, j int)

type Feed

type Feed struct {
	Title           lua.LValue
	TitleAttr       string
	Description     lua.LValue
	DescriptionAttr string
	AuthorName      lua.LValue
	AuthorNameAttr  string
	AuthorEmail     lua.LValue
	AuthorEmailAttr string
}

type Item

type Item struct {
	Container       lua.LValue
	Title           lua.LValue
	TitleAttr       string
	Link            lua.LValue
	LinkAttr        string
	Created         lua.LValue
	CreatedAttr     string
	CreatedFormat   string
	Description     lua.LValue
	DescriptionAttr string
	Content         lua.LValue
	ContentAttr     string
	Image           lua.LValue
	ImageAttr       string
}

type OutputEnum

type OutputEnum string
const (
	OutputEnumRSS  OutputEnum = "rss"
	OutputEnumAtom OutputEnum = "atom"
	OutputEnumJson OutputEnum = "json"
)
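
Init takes the output type as a plain string, so a caller may want to validate it against these constants first. The following is a minimal sketch of such a check; parseOutput is a hypothetical helper, not part of the package, and the constants are redeclared locally so the example compiles on its own.

```go
package main

import "fmt"

// OutputEnum and its values are copied from the declarations above.
type OutputEnum string

const (
	OutputEnumRSS  OutputEnum = "rss"
	OutputEnumAtom OutputEnum = "atom"
	OutputEnumJson OutputEnum = "json"
)

// parseOutput is a hypothetical helper: it accepts only the three
// values listed above and rejects everything else.
func parseOutput(s string) (OutputEnum, error) {
	switch o := OutputEnum(s); o {
	case OutputEnumRSS, OutputEnumAtom, OutputEnumJson:
		return o, nil
	default:
		return "", fmt.Errorf("unknown output type %q", s)
	}
}

func main() {
	for _, s := range []string{"atom", "xml"} {
		o, err := parseOutput(s)
		fmt.Println(o, err)
	}
}
```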

type Ratt

type Ratt struct {
	// contains filtered or unexported fields
}

func Init

func Init(inputURL *url.URL, outputType string, verbose bool) (*Ratt, error)

func (*Ratt) FindSelectors

func (r *Ratt) FindSelectors() (*Selectors, error)

func (*Ratt) NewSelectorsFromLua

func (r *Ratt) NewSelectorsFromLua(sels *lua.LTable) *Selectors

type Selectors

type Selectors struct {
	*Ratt
	HTTPSettings  httpsettings.HTTPSettings
	Feed          Feed
	Item          Item
	NextPage      lua.LValue
	NextPageAttr  string
	NextPageCount int
	Sort          SortEnum
}

func (*Selectors) ConstructFeed

func (s *Selectors) ConstructFeed(doc *goquery.Document, inputURL string) (feed *feeds.Feed, err error)

func (*Selectors) ConstructFeedFromURL

func (s *Selectors) ConstructFeedFromURL(inputURL *url.URL) (feed *feeds.Feed, err error)

func (*Selectors) Extract

func (s *Selectors) Extract(out io.Writer)

type SortEnum

type SortEnum string
const (
	SortDontSort    SortEnum = ""
	SortReverse     SortEnum = "REVERSE"
	SortCreatedASD  SortEnum = "CREATED_ASD"
	SortCreatedDESC SortEnum = "CREATED_DESC"
	SortTitleASD    SortEnum = "TITLE_ASD"
	SortTitleDESC   SortEnum = "TITLE_DESC"
)
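
The index does not document what each sort mode does. As a rough illustration only, the sketch below applies the title-based modes and REVERSE to a slice of strings. applySort is a hypothetical helper, and the semantics (REVERSE flips the extracted order, ASD is ascending, DESC is descending) are assumptions based on the constant names; the CREATED_* modes are left out for brevity.

```go
package main

import (
	"fmt"
	"sort"
)

// SortEnum and a subset of its values, copied from the declarations above.
type SortEnum string

const (
	SortDontSort  SortEnum = ""
	SortReverse   SortEnum = "REVERSE"
	SortTitleASD  SortEnum = "TITLE_ASD"
	SortTitleDESC SortEnum = "TITLE_DESC"
)

// applySort is a hypothetical helper that reorders titles in place.
// Unknown modes and SortDontSort leave the slice untouched.
func applySort(titles []string, mode SortEnum) {
	switch mode {
	case SortReverse:
		for i, j := 0, len(titles)-1; i < j; i, j = i+1, j-1 {
			titles[i], titles[j] = titles[j], titles[i]
		}
	case SortTitleASD:
		sort.Strings(titles)
	case SortTitleDESC:
		sort.Sort(sort.Reverse(sort.StringSlice(titles)))
	}
}

func main() {
	titles := []string{"beta", "gamma", "alpha"}
	applySort(titles, SortTitleASD)
	fmt.Println(titles)
}
```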

Directories

Path Synopsis
cmd
lib
