rotten_tomato

v0.0.2
Published: Sep 13, 2023 License: MIT Imports: 12 Imported by: 0

README

🎥 Rotten Tomatoes in Golang (and API) 🎬

Note: This is a Go rewrite of the rottentomatoes-python project, made for use by cinemascan.org.

Disclaimer: If this library stops working at any point in your project, 99% of the time it is because Rotten Tomatoes is IP-blocking the server (every request scrapes the Rotten Tomatoes /search endpoint) or because the Rotten Tomatoes site schema has changed. In the second case, the web scraping and extraction under the hood will need changes to make everything work again.

This package lets you fetch Rotten Tomatoes scores and other movie data, such as genres, without using the official Rotten Tomatoes API; it scrapes their website for the data instead. It was rewritten in Go from rottentomatoes-python for higher performance and to store movie ratings info for cinemascan.org.

The package now, by default, scrapes the Rotten Tomatoes search page to find the true URL of the first valid movie result (one that is a movie and has a tomatometer). Queries that previously failed because their URLs had a unique identifier or a year-released prefix now work. The limitation of this mechanism is that you only get the top result, and when searching for specific movies (sequels, by year, etc.) Rotten Tomatoes seems to return the same results as for the original query. It is therefore difficult to use a more specific query to make the desired movie the top result. See #4 for more info on this.
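The selection rule described above (take the first search result that is a movie and has a tomatometer) can be sketched as follows. This is only an illustration, not the package's actual code: `searchListing` is a local copy of the fields of the documented `SearchListing` type, `firstValidMovie` is a hypothetical helper, and the sample listings are invented.

```go
package main

import "fmt"

// searchListing mirrors the fields of the package's SearchListing type.
type searchListing struct {
	Title          string
	HasTomatometer bool
	IsMovie        bool
	Year           int
	Url            string
}

// firstValidMovie (hypothetical helper) returns the first listing that is a
// movie and has a tomatometer; ok is false when no listing qualifies.
func firstValidMovie(listings []searchListing) (match searchListing, ok bool) {
	for _, l := range listings {
		if l.IsMovie && l.HasTomatometer {
			return l, true
		}
	}
	return searchListing{}, false
}

func main() {
	// Invented search results, in the order a search page might list them.
	results := []searchListing{
		{Title: "Example Show", IsMovie: false, HasTomatometer: true, Url: "/tv/example_show"},
		{Title: "Example Movie", IsMovie: true, HasTomatometer: false, Url: "/m/example_movie"},
		{Title: "Example Movie", IsMovie: true, HasTomatometer: true, Year: 2010, Url: "/m/example_movie_2010"},
	}
	if m, ok := firstValidMovie(results); ok {
		fmt.Println(m.Url) // prints "/m/example_movie_2010"
	}
}
```

Because only the first match is returned, the order in which Rotten Tomatoes lists its results decides which movie you get, which is the limitation noted above.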

There is now a deployed API that makes querying movies and getting responses easier. The endpoint is https://rottentomato.cinemascan.org, and it is open and free to use. Visit the swagger docs in the browser to view the endpoints. Both endpoints that are currently live are browser-accessible, meaning you don't need an HTTP client to use the API.

Usage

Basic usage example:

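import (
    "fmt"
    "os"

    "github.com/cinemascan/rottentomato-go/rotten_tomato"
)

movieName := "Oppenheimer"
year := 2023
proxyUrl := os.Getenv("PROXY_URL")
// GetMovieInfo takes the year as *int, so pass the variable's address.
scrapedRtInfo, err := rotten_tomato.GetMovieInfo(movieName, &year, proxyUrl)
if err != nil {
    fmt.Println("lookup failed:", err)
    return
}

fmt.Printf("%v", scrapedRtInfo)
// OUTPUT (fields shown as JSON for readability)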
// {
//     "audienceScore": {
//         "averageRating": "4.5",
//         "bandedRatingCount": "10,000+",
//         "likedCount": 12460,
//         "notLikedCount": 1248,
//         "ratingCount": 13708,
//         "reviewCount": 5583,
//         "state": "upright",
//         "value": 91
//     },
//     "rating": "R",
//     "tomatometerScore": {
//         "averageRating": "8.60",
//         "bandedRatingCount": "",
//         "likedCount": 429,
//         "notLikedCount": 31,
//         "ratingCount": 460,
//         "reviewCount": 460,
//         "state": "certified-fresh",
//         "value": 93
//     },
//     "title": "Oppenheimer",
//     "year": 2023,
//     "runtime": "3h 0m",
//     "genres": [
//         "History",
//         "Drama"
//     ]
// }

Performance

Since every request queries the Rotten Tomatoes search endpoint, response times typically range from 2-3 s and can reach 10 s in rarer cases.

If performance is important, you may use cinemascan's private API: https://api.cinemascan.org/search/movies

We store the ratings for the top movies from 1999 to 2023 in our database, so response times range from 50 ms to a little over 100 ms depending on location.

API

Try out via swagger: https://rottentomato.cinemascan.org/swagger/index.html#/

Documentation

Constants

const RT_HOST string = "www.rottentomatoes.com"

Variables

var REQUEST_HEADERS http.Header = http.Header{
	"User-Agent":      {"Mozilla/5.0 (X11; Linux x86_64; rv:12.0) Gecko/20100101 Firefox/12.0"},
	"Accept-Language": {"en-US"},
	"Accept":          {"text/html"},
	"Referer":         {"https://www.google.com"},
}
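As an illustration of how browser-like headers such as these are typically attached to an outgoing request with `net/http`, here is a minimal sketch. It does not reproduce the package's internals: `requestHeaders` and `rtHost` are local copies of the documented `REQUEST_HEADERS` and `RT_HOST` values, and `newScrapeRequest` is a hypothetical helper. No request is actually sent.

```go
package main

import (
	"fmt"
	"net/http"
)

// rtHost and requestHeaders copy the values of the package's RT_HOST constant
// and REQUEST_HEADERS variable.
const rtHost = "www.rottentomatoes.com"

var requestHeaders = http.Header{
	"User-Agent":      {"Mozilla/5.0 (X11; Linux x86_64; rv:12.0) Gecko/20100101 Firefox/12.0"},
	"Accept-Language": {"en-US"},
	"Accept":          {"text/html"},
	"Referer":         {"https://www.google.com"},
}

// newScrapeRequest (hypothetical helper) builds a GET request that carries a
// copy of the shared headers, so later changes to one request's headers do
// not leak into the shared map.
func newScrapeRequest(url string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header = requestHeaders.Clone()
	return req, nil
}

func main() {
	req, err := newScrapeRequest("https://" + rtHost + "/search")
	if err != nil {
		fmt.Println("building request failed:", err)
		return
	}
	fmt.Println(req.Method, req.URL.Host) // prints "GET www.rottentomatoes.com"
	fmt.Println(req.Header.Get("Accept")) // prints "text/html"
}
```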

Functions

func GetChunksFromString

func GetChunksFromString(startTag string, endTag string, content string) []string
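The documentation gives only the signature of this function. One plausible reading, and it is only an assumption, is that it returns every substring of content that sits between a startTag and the next endTag. The sketch below implements that reading under the hypothetical name `chunksBetween`; the real function may behave differently, for example around nesting or unterminated tags.

```go
package main

import (
	"fmt"
	"strings"
)

// chunksBetween (hypothetical) collects each substring of content found
// between startTag and the next endTag, scanning left to right. An
// unterminated final chunk is dropped.
func chunksBetween(startTag, endTag, content string) []string {
	if startTag == "" || endTag == "" {
		return nil // empty delimiters would never advance the scan
	}
	var chunks []string
	for {
		start := strings.Index(content, startTag)
		if start == -1 {
			break
		}
		content = content[start+len(startTag):]
		end := strings.Index(content, endTag)
		if end == -1 {
			break
		}
		chunks = append(chunks, content[:end])
		content = content[end+len(endTag):]
	}
	return chunks
}

func main() {
	html := "<ul><li>History</li><li>Drama</li></ul>"
	fmt.Println(chunksBetween("<li>", "</li>", html)) // prints "[History Drama]"
}
```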

func GetGenres

func GetGenres(movieName string, year *int, proxyUrl string) ([]string, error)

func GetMovieTitle

func GetMovieTitle(movieName string, year *int, proxyUrl string) (string, error)

func RemoveSpecialChars

func RemoveSpecialChars(raw string) string

Types

type RTMovieInfo

type RTMovieInfo struct {
	AudienceScore    RTScore  `json:"audienceScore"`
	Rating           string   `json:"rating"`
	TomatometerScore RTScore  `json:"tomatometerScore"`
	Title            string   `json:"title"`
	Year             int      `json:"year"`
	Runtime          string   `json:"runtime"`
	Genres           []string `json:"genres"`
}

func GetMovieInfo

func GetMovieInfo(movieName string, year *int, proxyUrl string) (*RTMovieInfo, error)

type RTSchemaAggregateRating

type RTSchemaAggregateRating struct {
	Type        string `json:"@type"`
	BestRating  string `json:"bestRating"`
	Description string `json:"description"`
	Name        string `json:"name"`
	RatingCount int    `json:"ratingCount"`
	RatingValue string `json:"ratingValue"`
	ReviewCount int    `json:"reviewCount"`
	WorstRating string `json:"worstRating"`
}

type RTSchemaCompany

type RTSchemaCompany struct {
	Type string `json:"@type"`
	Name string `json:"name"`
}

type RTSchemaJson

type RTSchemaJson struct {
	Context           string                  `json:"@context"`
	Type              string                  `json:"@type"`
	Actors            []RTSchemaPerson        `json:"actors"`
	AggregateRating   RTSchemaAggregateRating `json:"aggregateRating"`
	Author            []RTSchemaPerson        `json:"author"`
	Character         []string                `json:"character"`
	ContentRating     string                  `json:"contentRating"`
	DateCreated       string                  `json:"dateCreated"`
	DateModified      string                  `json:"dateModified"`
	Director          []RTSchemaPerson        `json:"director"`
	Genre             []string                `json:"genre"`
	Image             string                  `json:"image"`
	Name              string                  `json:"name"`
	ProductionCompany RTSchemaCompany         `json:"productionCompany"`
	Url               string                  `json:"url"`
}

type RTSchemaPerson

type RTSchemaPerson struct {
	Name  string `json:"name"`
	Url   string `json:"sameAs"`
	Image string `json:"image"`
}
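The RTSchema types follow the shape of schema.org JSON-LD data, with tags such as `@type` and `sameAs`. As a rough illustration of how such a document maps onto these structs, the sketch below decodes a small JSON-LD snippet with `encoding/json`. The types are trimmed local copies of `RTSchemaPerson` and `RTSchemaJson`, `parseSchema` is a hypothetical helper, and the sample document is invented rather than taken from Rotten Tomatoes.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// schemaPerson mirrors RTSchemaPerson; note that Url is filled from "sameAs".
type schemaPerson struct {
	Name  string `json:"name"`
	Url   string `json:"sameAs"`
	Image string `json:"image"`
}

// schemaMovie is a trimmed copy of RTSchemaJson with the same JSON tags.
type schemaMovie struct {
	Type     string         `json:"@type"`
	Name     string         `json:"name"`
	Genre    []string       `json:"genre"`
	Director []schemaPerson `json:"director"`
}

// parseSchema (hypothetical helper) decodes a JSON-LD document into schemaMovie.
// Keys without a matching field, such as "@context" here, are ignored.
func parseSchema(raw []byte) (schemaMovie, error) {
	var m schemaMovie
	err := json.Unmarshal(raw, &m)
	return m, err
}

// sample is invented data in the general shape of a schema.org Movie.
const sample = `{
  "@context": "http://schema.org",
  "@type": "Movie",
  "name": "Example Movie",
  "genre": ["Drama"],
  "director": [
    {"@type": "Person", "name": "Jane Doe", "sameAs": "https://example.com/jane_doe"}
  ]
}`

func main() {
	m, err := parseSchema([]byte(sample))
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(m.Name, m.Genre, m.Director[0].Name, m.Director[0].Url)
}
```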

func GetActors

func GetActors(movieName string, year *int, maxActors int, proxyUrl string) ([]RTSchemaPerson, error)

func GetDirectors

func GetDirectors(movieName string, year *int, maxDirectors int, proxyUrl string) ([]RTSchemaPerson, error)

type RTScore

type RTScore struct {
	AverageRating     string `json:"averageRating"`
	BandedRatingCount string `json:"bandedRatingCount"`
	LikedCount        int    `json:"likedCount"`
	NotLikedCount     int    `json:"notLikedCount"`
	RatingCount       int    `json:"ratingCount"`
	ReviewCount       int    `json:"reviewCount"`
	State             string `json:"state"`
	Value             int    `json:"value"`
}

type RTScoreDetails

type RTScoreDetails struct {
	MediaType       string       `json:"mediaType"`
	PrimaryImageUrl string       `json:"primaryImageUrl"`
	Scoreboard      RTScoreboard `json:"scoreboard"`
}

type RTScoreboard

type RTScoreboard struct {
	AudienceCountHref string  `json:"audienceCountHref"`
	AudienceScore     RTScore `json:"audienceScore"`
	Rating            string  `json:"rating"`
	TomatometerScore  RTScore `json:"tomatometerScore"`
	Title             string  `json:"title"`
	Info              string  `json:"info"`
}

type SearchListing

type SearchListing struct {
	Title          string
	HasTomatometer bool
	IsMovie        bool
	Year           int
	Url            string
}

func FromHtml

func FromHtml(htmlSnippet string) (*SearchListing, error)
