ratt
RSS all the things!
ratt is a tool for converting websites to RSS/Atom feeds. It uses config files that define how the feed data is extracted, using CSS selectors or Lua scripts.
Config files are in YAML format:
#for automatic extraction, ratt checks all config files and matches the regex against the url
regex: https://videoportal.joj.sk/.*
selectors:
    #settings for all http requests for the website
    httpsettings:
        cookie: {}
        header: {}
        useragent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.72 Safari/537.36
    #css selectors to get the feed data
    feed:
        title: .title.my-2
        description: .description
        authorname:
        authoremail:
    #css selectors to get item data
    item:
        #the item container
        container: article.b-article.title-xs.article-lp
        #all subsequent attributes of the item are selected from the subtree of the item container
        title: div.content > h3
        link: a
        linkattr: href
        created: .date
        createdformat: 2.1.2006
        description: div.col > .date
        image: img.img-fluid
        imageattr: data-original
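The createdformat value above (2.1.2006) looks like a Go reference-time layout: day, month and year of the reference date 2 January 2006, without zero padding. The sketch below shows how such a layout behaves with Go's standard time.Parse. That ratt hands createdformat to time.Parse unchanged is an assumption, and parseCreated is a hypothetical helper, not part of ratt.

```go
package main

import (
	"fmt"
	"time"
)

// parseCreated is a hypothetical helper: it parses the text of the
// "created" element using a Go reference-time layout such as "2.1.2006".
// Assumption: ratt treats createdformat the same way.
func parseCreated(layout, value string) (time.Time, error) {
	return time.Parse(layout, value)
}

func main() {
	// "2" = day, "1" = month, "2006" = four-digit year, separated by dots.
	t, err := parseCreated("2.1.2006", "24.4.2021")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(t.Format("2006-01-02")) // prints 2021-04-24
}
```

If the dates on a site look different, e.g. 2021-04-24, the layout would be 2006-01-02 instead.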
Configs
Config files are YAML files, and ratt ships with some configs embedded. When you call, e.g., ratt auto https://1337x.to/top-100
ratt tries to find a config whose regex matches the website URL. It searches the embedded config files, the current directory and ~/.config/ratt/*.yml.
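To make the lookup a bit more concrete, here is a minimal sketch of how a URL could be matched against the regex field of each config. It is not ratt's actual code: the config struct, the findConfig function and the first-match-wins rule are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// config is a hypothetical, trimmed-down stand-in for a ratt config file.
type config struct {
	Name  string // file name, for illustration only
	Regex string // value of the top-level "regex" key
}

// findConfig returns the name of the first config whose regex matches url.
// Assumption: the first match wins; ratt may order or anchor things differently.
func findConfig(configs []config, url string) (string, bool) {
	for _, c := range configs {
		re, err := regexp.Compile(c.Regex)
		if err != nil {
			continue // skip configs with an invalid pattern
		}
		if re.MatchString(url) {
			return c.Name, true
		}
	}
	return "", false
}

func main() {
	configs := []config{
		{Name: "joj.yml", Regex: `https://videoportal.joj.sk/.*`},
		{Name: "1337x.yml", Regex: `https://1337x.to/.*`},
	}
	name, ok := findConfig(configs, "https://1337x.to/top-100")
	fmt.Println(name, ok) // prints: 1337x.yml true
}
```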
Installation
go get gitlab.com/microo8/ratt@latest
Usage
ratt has three commands:
auto
- automatically searches for the config that will be used.
extract
- with other arguments, ratt will scrape the website to generate the RSS/Atom feed.
save
- when you have the correct css selectors/lua scripts, save the config to a yaml file
ratt save --feed-title=".featured-heading strong" --item-container=".table-list-wrap tbody tr" --item-title="a:nth-child(2)" --item-link='a = sel:find("a:nth-child(2)")
itemURL = "https://1337x.to" .. a:attr("href")
doc, err = goquery.newDocFromURL(itemURL)
if err ~= nil then
error(err)
end
link = doc:find("ul li a[onclick]"):first():attr("href")
link = link:gsub("%s+", "")
print(link)' --item-created=".coll-date" --item-created-format="" "https://1337x.to/.*" 1337x.yml
That's a very good question. I'm happy you asked :)
You can pipe the feed directly to photon, which is a modern RSS/Atom reader. photon will play the media from your feed. It uses mpv and youtube-dl to automatically play videos, download torrents, view images and much more :)
So try this out:
ratt auto https://1337x.to/top-100 | photon -
Lua
If a CSS selector isn't enough to select the needed data, any feed or item attribute can be written as a multiline value, and ratt will interpret it as a Lua script.
The Lua script gets some global variables to help with the extraction:
goquery
is a module imported by default and it is a subset of the famous goquery library
sel
is the selection object of the feed/item container, which can be queried with selectors
gojq
is a module imported by default, it is the gojq library
setGlobal
sets a global variable that will be visible in other Lua scripts. E.g. if setGlobal("myvar", 1) is called in the feed title,
then the variable will be visible in every subsequent item title, item link, ..., item image: print(myvar)
index
number of the item processed
ratt will take the stdout of the Lua script and insert it as the data of the feed/item. When an error occurs, just call the error function.
examples
Calling another link, parsing it to a goquery.Document and querying the new doc:
item:
    #select the item container html element
    container: .table-list-wrap tbody tr
    #select the title element in the item container
    title: a:nth-child(2)
    #lua script
    link: |-
        --sel is the item container element, find <a/>
        a = sel:find("a:nth-child(2)")
        --get the href attribute of <a/> and make an item url from it
        itemURL = "https://1337x.to" .. a:attr("href")
        --request and parse the document
        doc, err = goquery.newDocFromURL(itemURL)
        if err ~= nil then
            --raise an error if the request was unsuccessful
            error(err)
        end
        --find the item link you want
        link = doc:find("ul li a[onclick]"):first():attr("href")
        --remove whitespace characters
        link = link:gsub("%s+", "")
        --and finally print the link out so ratt can include it in the item.link
        print(link)
You can also parse and query JSON data, with the help of the awesome gojq library:
feed:
    title: .title
    description: |-
        --find the <script> element where the json data is
        script = sel:find("script"):first():text()
        --position of the prefix (named start so it doesn't overwrite the index global)
        start = script:find("var myJsonData =")
        --cut off the "var myJsonData =" prefix (16 characters)
        jsonData = script:sub(start+16)
        --parse a gojq query that will find the obj["description"] value
        query, err = gojq.parse(".description")
        if err ~= nil then
            error(err)
        end
        --expecting that the input data is a map/object (otherwise, if it's an array, use runArray)
        desc, err = query.runMap(jsonData)
        if err ~= nil then
            error(err)
        end
        print(desc[1]["description"])
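If the steps in the script above are hard to follow, this is roughly the same extraction written in plain Go with only the standard library. It is just a sketch to show the idea (find the prefix, cut it off, parse the JSON, read one key). It does not use ratt's gojq module, and extractDescription and the sample script text are made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// extractDescription is a hypothetical helper: it cuts the JSON object out of
// a <script> body and returns the value of its "description" key.
func extractDescription(script string) (string, error) {
	const prefix = "var myJsonData ="
	i := strings.Index(script, prefix)
	if i < 0 {
		return "", fmt.Errorf("prefix %q not found", prefix)
	}
	// Drop the prefix, surrounding whitespace and a trailing semicolon, if any.
	raw := strings.TrimSpace(script[i+len(prefix):])
	raw = strings.TrimSuffix(raw, ";")

	var obj map[string]interface{}
	if err := json.Unmarshal([]byte(raw), &obj); err != nil {
		return "", err
	}
	desc, ok := obj["description"].(string)
	if !ok {
		return "", fmt.Errorf("description is missing or not a string")
	}
	return desc, nil
}

func main() {
	// Made-up sample input, standing in for the text of the <script> element.
	script := `var myJsonData = {"description": "Daily news videos"};`
	desc, err := extractDescription(script)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(desc) // prints: Daily news videos
}
```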
Check the confs dir for other examples.
Contribution
ratt needs config files to run. I really rely on the community to create configs for all the sites!
So please create config files and push them here, then everybody can make the world RSS again!