tumblr-scraper

command module
v0.0.0-...-5085e7c
Published: Mar 9, 2021 | License: MIT | Imports: 4 | Imported by: 0

This project is a black-box, from-scratch reimplementation of Liru/tumblr-downloader that recreates and improves upon its features.

Features

  • Downloads all photos and videos of a blog, including those inlined into posts
  • Automatically stops scraping a blog at the point where the previous run left off
  • Allows filtering out reblogs
  • Uses Tumblr's v2 API, which is more robust and significantly faster
  • Simulates Tumblr's private API so that even private blogs can be scraped if needed
  • All downloads are parallelized
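
The "stop where it left off" and reblog-filtering features can be pictured with a small sketch. This is not the project's actual code: the Post type, the selectNew function, and the idea of remembering the highest post ID from the previous run are all assumptions made for illustration.

```go
package main

import "fmt"

// Post is a hypothetical, trimmed-down view of a blog post.
// The real tool's types are not documented.
type Post struct {
	ID       int64
	IsReblog bool
}

// selectNew expects posts ordered newest first. It returns the posts that
// are newer than lastSeenID, optionally dropping reblogs, and reports
// whether the position reached by the previous run was hit.
// Assumption: post IDs grow over time, so "ID <= lastSeenID" means
// "already scraped".
func selectNew(posts []Post, lastSeenID int64, skipReblogs bool) (fresh []Post, reachedOld bool) {
	for _, p := range posts {
		if p.ID <= lastSeenID {
			return fresh, true // everything from here on was scraped before
		}
		if skipReblogs && p.IsReblog {
			continue
		}
		fresh = append(fresh, p)
	}
	return fresh, false
}

func main() {
	page := []Post{{ID: 50}, {ID: 40, IsReblog: true}, {ID: 30}, {ID: 20}}
	fresh, done := selectNew(page, 30, true)
	fmt.Println(len(fresh), done) // prints "1 true"
}
```

A scraper built this way would request further pages only while reachedOld is false, so a second run over an unchanged blog stops after the first page.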

TODOs

  • Documentation (until now this has strictly been a private project)
  • Crawling more than 5000 posts per day leads to rate limiting
  • Continuing a previously failed crawl/scrape is not supported
    As a workaround, setting the before field in the config lets you scrape backwards starting at a date in the past.
    That way you can manually and iteratively scrape a huge blog in "sane" chunks (e.g. first everything before 2014, then before 2015, 2016, ...).
  • Support for youtube-dl would be nice

Documentation

There is no documentation for this package.

Directories

Path Synopsis
Package cookiejar implements an in-memory RFC 6265-compliant http.CookieJar.
