twitterscraper

package module
v0.0.0-...-715658a Latest Latest
Published: Apr 25, 2024 License: MIT Imports: 24 Imported by: 0

README

Twitter Scraper

Twitter’s API is pricey and has lots of limitations. But their frontend has its own API, which was reverse-engineered by @n0madic and maintained by @imperatrona. Some endpoints require authentication, but it is easy to scale by buying new accounts and proxies.

You can use this library to get tweets, profiles, and trends trivially.

Installation

go get -u github.com/imperatrona/twitter-scraper

Quick start

package main

import (
    "context"
    "fmt"
    twitterscraper "github.com/imperatrona/twitter-scraper"
)

func main() {
    scraper := twitterscraper.New()
    scraper.SetAuthToken(twitterscraper.AuthToken{Token: "auth_token", CSRFToken: "ct0"})

    // After setting cookies or an AuthToken, call the IsLoggedIn method.
    // Without that call, the scraper cannot make requests that require authentication.
    if !scraper.IsLoggedIn() {
        panic("Invalid AuthToken")
    }

    for tweet := range scraper.GetTweets(context.Background(), "x", 50) {
        if tweet.Error != nil {
            panic(tweet.Error)
        }
        fmt.Println(tweet.Text)
    }
}

Rate limits

The API has a global limit on requests per second: don’t make more than one request every 1.5 seconds from a single account. Each endpoint also has its own limit; for most of them it is 150 requests per 15 minutes.
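The two limits combine into a minimum pause between requests. Here is a minimal sketch of that arithmetic, assuming the figures quoted in this README (a 1.5 second floor per account and N requests per 15-minute window). The `requestSpacing` helper and its constants are illustrative and not part of the library.

```go
package main

import "time"

// globalMinInterval is the per-account floor quoted in this README.
const globalMinInterval = 1500 * time.Millisecond

// rateWindow is the window in which the per-endpoint limits are expressed.
const rateWindow = 15 * time.Minute

// requestSpacing returns the pause between requests that keeps one account
// under both the global floor and an endpoint limit of perWindow requests
// per 15 minutes. Hypothetical helper, not a library function.
func requestSpacing(perWindow int) time.Duration {
	if perWindow <= 0 {
		// Defensive choice: treat an unknown limit as one request per window.
		return rateWindow
	}
	spacing := rateWindow / time.Duration(perWindow)
	if spacing < globalMinInterval {
		return globalMinInterval
	}
	return spacing
}
```

For the 150-request endpoints this works out to one request every 6 seconds per account if you want to sustain traffic for the whole window. For the 500-request endpoints it is 1.8 seconds.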

Apparently Twitter doesn’t limit the number of accounts that can be used from one IP address, though this could change at any time. As of February 2024, I have been managing 20 accounts per IP address for several months without receiving a ban.

OpenAccount was great in the past, but Twitter has since nerfed it. It allows 180 requests instead of 150, but you can only create one account per month per IP address. If you use OpenAccount, save your credentials and reuse them later with the WithOpenAccount method.

Authentication

Most endpoints require authentication. The preferred way is to use SetCookies. You can also use SetAuthToken, but POST endpoints will not work with it. Logging in with a password may require email confirmation and is a frequent cause of account bans.

Endpoints that work without authentication will not return sensitive content. To get sensitive content you need to authenticate with any available method including OpenAccount.

Using cookies
// Deserialize from JSON
var cookies []*http.Cookie
f, err := os.Open("cookies.json")
if err != nil {
    panic(err)
}
defer f.Close()
if err := json.NewDecoder(f).Decode(&cookies); err != nil {
    panic(err)
}

scraper.SetCookies(cookies)
if !scraper.IsLoggedIn() {
    panic("Invalid cookies")
}

To save cookies from an authorized client to a file, use GetCookies:

cookies := scraper.GetCookies()

data, err := json.Marshal(cookies)
if err != nil {
    panic(err)
}
if err := os.WriteFile("cookies.json", data, 0o644); err != nil {
    panic(err)
}
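
A saved cookie file can go stale or be truncated, so it may be worth validating it before handing it to SetCookies. The sketch below uses only the standard library and assumes, as the AuthToken section states, that auth_token and ct0 are the cookies that matter. `decodeCookies` is a hypothetical helper, not part of the library.

```go
package main

import (
	"encoding/json"
	"errors"
	"net/http"
)

// decodeCookies parses JSON produced by json.Marshal on []*http.Cookie and
// checks that both session cookies are present with non-empty values.
func decodeCookies(data []byte) ([]*http.Cookie, error) {
	var cookies []*http.Cookie
	if err := json.Unmarshal(data, &cookies); err != nil {
		return nil, err
	}
	seen := map[string]bool{}
	for _, c := range cookies {
		if c != nil && c.Value != "" {
			seen[c.Name] = true
		}
	}
	if !seen["auth_token"] || !seen["ct0"] {
		return nil, errors.New("auth_token or ct0 cookie missing")
	}
	return cookies, nil
}
```

The returned slice has the type SetCookies accepts, so the contents of cookies.json can be read with os.ReadFile and passed through this check first.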
Using AuthToken

The SetAuthToken method simply sets the required auth_token and ct0 cookies.

scraper.SetAuthToken(twitterscraper.AuthToken{Token: "auth_token", CSRFToken: "ct0"})
if !scraper.IsLoggedIn() {
    panic("Invalid AuthToken")
}
OpenAccount

[!WARNING]
Deprecated. Nerfed by twitter, doesn't support new endpoints.

LoginOpenAccount is now limited to one new account per month per IP address.

account, err := scraper.LoginOpenAccount()

Save the OpenAccount value returned by LoginOpenAccount so you can reuse it later.

scraper.WithOpenAccount(twitterscraper.OpenAccount{
    OAuthToken: "TOKEN",
    OAuthTokenSecret: "TOKEN_SECRET",
})
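
Since new OpenAccount credentials are scarce, persisting them as JSON is one simple option. The sketch below mirrors the documented OpenAccount fields and JSON tags in a local stand-in type so that it compiles without the library. The `encodeOpenAccount` and `decodeOpenAccount` helpers are hypothetical.

```go
package main

import "encoding/json"

// openAccount is a stand-in that copies the documented fields and JSON tags
// of twitterscraper.OpenAccount.
type openAccount struct {
	OAuthToken       string `json:"oauth_token"`
	OAuthTokenSecret string `json:"oauth_token_secret"`
}

// encodeOpenAccount serializes the credentials for storage.
func encodeOpenAccount(a openAccount) (string, error) {
	data, err := json.Marshal(a)
	return string(data), err
}

// decodeOpenAccount restores credentials written by encodeOpenAccount.
func decodeOpenAccount(s string) (openAccount, error) {
	var a openAccount
	err := json.Unmarshal([]byte(s), &a)
	return a, err
}
```

Because the real type carries the same JSON tags, json.Marshal and json.Unmarshal should work on twitterscraper.OpenAccount directly in the same way.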
Login & Password

To log in, you have to use your username, not the email!

err := scraper.Login("username", "password")

If your account requires email confirmation, pass your email address as well:

err := scraper.Login("username", "password", "email")

If your account has two-factor authentication, pass the code:

err := scraper.Login("username", "password", "code")
Check login status

Use the IsLoggedIn method to check the login status:

scraper.IsLoggedIn()
Log out
scraper.Logout()

Methods

Get tweet

150 requests / 15 minutes

The TweetDetail endpoint requires authentication, so when none is provided the TweetResultByRestId endpoint is used instead. That endpoint doesn't return InReplyToStatus or Thread tweets.

tweet, err := scraper.GetTweet("1328684389388185600")
Get user tweets

150 requests / 15 minutes

GetTweets returns a channel with the specified number of user tweets. It’s using the FetchTweets method under the hood.

for tweet := range scraper.GetTweets(context.Background(), "taylorswift13", 50) {
    if tweet.Error != nil {
        panic(tweet.Error)
    }
    fmt.Println(tweet.Text)
}
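
Panicking on the first error is fine for a demo, but a long-running job usually wants to keep what it has already received. Here is a minimal sketch of that pattern. It uses a stand-in `tweetResult` type that only mirrors the Text and Error fields of TweetResult, and `collectTexts` is a hypothetical helper.

```go
package main

// tweetResult is a stand-in for twitterscraper.TweetResult, reduced to the
// two fields this sketch needs.
type tweetResult struct {
	Text  string
	Error error
}

// collectTexts reads from ch until it is closed or a result carries an error.
// It returns the texts received so far together with that error (nil if the
// channel closed normally).
func collectTexts(ch <-chan *tweetResult) ([]string, error) {
	var texts []string
	for r := range ch {
		if r.Error != nil {
			return texts, r.Error
		}
		texts = append(texts, r.Text)
	}
	return texts, nil
}
```

When you stop reading early like this, you will probably also want to cancel the context passed to GetTweets so that the producing side can stop. This README does not spell out that behaviour, so treat it as an assumption to verify.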

FetchTweets returns tweets and cursor for fetching the next page. Each request returns up to 20 tweets.

var cursor string
tweets, cursor, err := scraper.FetchTweets("taylorswift13", 20, cursor)
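
To walk through several pages, feed each returned cursor into the next call. The sketch below shows that loop in a generic form. It ASSUMES that an empty page, an empty cursor, or an unchanged cursor means there is nothing more to fetch, which this README does not state. `collectPages` is a hypothetical helper, not a library function.

```go
package main

// collectPages calls fetch repeatedly, passing each returned cursor to the
// next call, until limit items have been gathered. It also stops on an empty
// page or on an empty or unchanged cursor (assumed end-of-data signals).
func collectPages[T any](fetch func(cursor string) ([]T, string, error), limit int) ([]T, error) {
	var all []T
	cursor := ""
	for len(all) < limit {
		items, next, err := fetch(cursor)
		if err != nil {
			// Return what was gathered so far along with the error.
			return all, err
		}
		all = append(all, items...)
		if len(items) == 0 || next == "" || next == cursor {
			break
		}
		cursor = next
	}
	if len(all) > limit {
		all = all[:limit]
	}
	return all, nil
}
```

With the library, fetch would be a closure around a Fetch method, for example `func(c string) ([]*twitterscraper.Tweet, string, error) { return scraper.FetchTweets("taylorswift13", 20, c) }`. Remember to pause between calls to stay within the rate limits.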
Get user medias

500 requests / 15 minutes

GetMediaTweets returns a channel with the specified number of user tweets that contain media. It’s using the FetchMediaTweets method under the hood.

for tweet := range scraper.GetMediaTweets(context.Background(), "taylorswift13", 50) {
    if tweet.Error != nil {
        panic(tweet.Error)
    }
    fmt.Println(tweet.Text)
}

FetchMediaTweets returns tweets and cursor for fetching the next page. Each request returns up to 20 tweets.

var cursor string
tweets, cursor, err := scraper.FetchMediaTweets("taylorswift13", 20, cursor)
Get bookmarks

[!IMPORTANT]
Requires authentication!

500 requests / 15 minutes

GetBookmarks returns a channel with the specified number of bookmarked tweets. It’s using the FetchBookmarks method under the hood.

for tweet := range scraper.GetBookmarks(context.Background(), 50) {
    if tweet.Error != nil {
        panic(tweet.Error)
    }
    fmt.Println(tweet.Text)
}

FetchBookmarks returns bookmarked tweets and cursor for fetching the next page. Each request returns up to 20 tweets.

var cursor string
tweets, cursor, err := scraper.FetchBookmarks(20, cursor)
Search tweets

[!IMPORTANT]
Requires authentication!

150 requests / 15 minutes

SearchTweets returns a channel with the specified number of tweets that match the search query. It’s using the FetchSearchTweets method under the hood.

for tweet := range scraper.SearchTweets(context.Background(),
    "twitter scraper data -filter:retweets", 50) {
    if tweet.Error != nil {
        panic(tweet.Error)
    }
    fmt.Println(tweet.Text)
}

FetchSearchTweets returns tweets and cursor for fetching the next page. Each request returns up to 20 tweets.

var cursor string
tweets, cursor, err := scraper.FetchSearchTweets("taylorswift13", 20, cursor)

By default, search returns top tweets. You can change it by specifying the search mode before making requests. Supported modes are SearchTop, SearchLatest, SearchPhotos, SearchVideos, and SearchUsers.

scraper.SetSearchMode(twitterscraper.SearchLatest)
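
If the mode comes from a flag or a config file, you need a mapping from names to constants. The sketch below copies the documented SearchMode constants, in the same order, into a local stand-in type so that it compiles on its own. The accepted names ("top", "latest" and so on) and the `parseSearchMode` helper are this sketch's own convention.

```go
package main

import "strings"

// searchMode is a stand-in for twitterscraper.SearchMode with the constants
// in the documented order.
type searchMode int

const (
	searchTop searchMode = iota // default mode
	searchLatest
	searchPhotos
	searchVideos
	searchUsers
)

// parseSearchMode maps a user-supplied name to a mode. An empty name selects
// the default. The second result is false for unknown names.
func parseSearchMode(name string) (searchMode, bool) {
	switch strings.ToLower(strings.TrimSpace(name)) {
	case "", "top":
		return searchTop, true
	case "latest":
		return searchLatest, true
	case "photos":
		return searchPhotos, true
	case "videos":
		return searchVideos, true
	case "users":
		return searchUsers, true
	}
	return searchTop, false
}
```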
Search params

See Rules and filtering for how to build standard queries.
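
Queries like the one in the SearchTweets example are plain strings, so a small builder is enough. This sketch only covers the `-filter:retweets` operator that appears in this README. `buildQuery` is a hypothetical helper, not a library function.

```go
package main

import "strings"

// buildQuery joins the non-empty search terms with spaces and appends
// "-filter:retweets" when retweets should be excluded.
func buildQuery(terms []string, excludeRetweets bool) string {
	parts := make([]string, 0, len(terms)+1)
	for _, t := range terms {
		if t = strings.TrimSpace(t); t != "" {
			parts = append(parts, t)
		}
	}
	if excludeRetweets {
		parts = append(parts, "-filter:retweets")
	}
	return strings.Join(parts, " ")
}
```

Calling `buildQuery([]string{"twitter", "scraper", "data"}, true)` produces the query string used in the SearchTweets example.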

Get profile

95 requests / 15 minutes

profile, err := scraper.GetProfile("taylorswift13")
Search profile

[!IMPORTANT]
Requires authentication!

150 requests / 15 minutes

SearchProfiles returns a channel with the specified number of profiles that match the search query. It’s using the FetchSearchProfiles method under the hood.

for profile := range scraper.SearchProfiles(context.Background(), "Twitter", 50) {
    if profile.Error != nil {
        panic(profile.Error)
    }
    fmt.Println(profile.Name)
}

FetchSearchProfiles returns profiles and cursor for fetching the next page. Each request returns up to 20 profiles.

var cursor string
profiles, cursor, err := scraper.FetchSearchProfiles("taylorswift13", 20, cursor)
Get trends

trends, err := scraper.GetTrends()
Get following

[!IMPORTANT]
Requires authentication!

500 requests / 15 minutes

var cursor string
users, cursor, err := scraper.FetchFollowing("Support", 20, cursor)
Get followers

[!IMPORTANT]
Requires authentication!

50 requests / 15 minutes

var cursor string
users, cursor, err := scraper.FetchFollowers("Support", 20, cursor)
Get scheduled tweets

[!IMPORTANT]
Requires authentication!

500 requests / 15 minutes

tweets, err := scraper.FetchScheduledTweets()
Create scheduled tweet

[!IMPORTANT]
Requires authentication!

500 requests / 15 minutes

id, err := scraper.CreateScheduledTweet(twitterscraper.TweetSchedule{
    Text:   "New scheduled tweet text",
    Date:   time.Now().Add(time.Hour * 24 * 31),
    Medias: nil,
})
Delete scheduled tweet

[!IMPORTANT]
Requires authentication!

500 requests / 15 minutes

err := scraper.DeleteScheduledTweet("123")
Upload media

[!IMPORTANT]
Requires authentication!

50 requests / 15 minutes

Uploads photo, video or gif for further posting or scheduling. Expires in 24 hours if not used.

media, err := scraper.UploadMedia("./files/movie.mp4")

Connection

Proxy
HTTP(s)
err := scraper.SetProxy("http://localhost:3128")
SOCKS5
err := scraper.SetProxy("socks5://localhost:1080")

SOCKS5 proxies support authentication.

err := scraper.SetProxy("socks5://user:pass@localhost:1080")
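
When proxy addresses come from a list of purchased proxies, it can help to reject malformed entries before calling SetProxy. The sketch below accepts only the schemes shown in this section and requires both host and port, mirroring the HOST:PORT format. That strictness is this sketch's choice, and `checkProxyURL` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"net/url"
)

// checkProxyURL reports whether raw looks like one of the proxy address forms
// shown above: http(s)://HOST:PORT or socks5://[user:pass@]HOST:PORT.
func checkProxyURL(raw string) error {
	u, err := url.Parse(raw)
	if err != nil {
		return fmt.Errorf("proxy address %q: %w", raw, err)
	}
	switch u.Scheme {
	case "http", "https", "socks5":
		// Supported schemes from this section.
	default:
		return fmt.Errorf("proxy address %q: unsupported scheme %q", raw, u.Scheme)
	}
	if u.Hostname() == "" || u.Port() == "" {
		return fmt.Errorf("proxy address %q: host and port are required", raw)
	}
	return nil
}
```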
Delay

Add delay between API requests (in seconds)

scraper.WithDelay(5)
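
WithDelay takes whole seconds, so a fractional pause such as the 1.5 seconds from the rate limits section has to be rounded. Rounding up is the safer direction. A minimal sketch, where `delaySeconds` is a hypothetical helper:

```go
package main

import "time"

// delaySeconds converts d to whole seconds, rounding up so the resulting
// pause is never shorter than d. Non-positive durations give 0.
func delaySeconds(d time.Duration) int64 {
	if d <= 0 {
		return 0
	}
	return int64((d + time.Second - 1) / time.Second)
}
```

With this, `scraper.WithDelay(delaySeconds(1500 * time.Millisecond))` would set a delay of 2 seconds.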
Load timeline with tweet replies
scraper.WithReplies(true)

Contributing

Testing

To run some tests, you need to set up some form of authentication via environment variables. You can see all possible variables in the .vscode/settings.json file. You can also set them in that file so VS Code uses them automatically; just make sure you don’t commit them in your contribution.

Documentation

Constants

const DefaultClientTimeout = 10 * time.Second

default http client timeout

Variables

This section is empty.

Functions

This section is empty.

Types

type AuthToken

type AuthToken struct {
	Token     string
	CSRFToken string
}

Use auth_token cookie as Token and ct0 cookie as CSRFToken

type GIF

type GIF struct {
	ID      string
	Preview string
	URL     string
}

GIF type.

type Media

type Media struct {
	ID        int
	Type      string
	Size      int
	Parts     int
	ExpiresAt time.Time
}

type Mention

type Mention struct {
	ID       string
	Username string
	Name     string
}

Mention type.

type OpenAccount

type OpenAccount struct {
	OAuthToken       string `json:"oauth_token"`
	OAuthTokenSecret string `json:"oauth_token_secret"`
}

type Photo

type Photo struct {
	ID  string
	URL string
}

Photo type.

type Place

type Place struct {
	ID          string `json:"id"`
	PlaceType   string `json:"place_type"`
	Name        string `json:"name"`
	FullName    string `json:"full_name"`
	CountryCode string `json:"country_code"`
	Country     string `json:"country"`
	BoundingBox struct {
		Type        string        `json:"type"`
		Coordinates [][][]float64 `json:"coordinates"`
	} `json:"bounding_box"`
}

type ProcessingInfo

type ProcessingInfo struct {
	State      string `json:"state"`
	CheckAfter int    `json:"check_after_secs"`
	Progress   int    `json:"progress_percent"`
}

type Profile

type Profile struct {
	Avatar         string
	Banner         string
	Biography      string
	Birthday       string
	FollowersCount int
	FollowingCount int
	FriendsCount   int
	IsPrivate      bool
	IsVerified     bool
	Joined         *time.Time
	LikesCount     int
	ListedCount    int
	Location       string
	Name           string
	PinnedTweetIDs []string
	TweetsCount    int
	URL            string
	UserID         string
	Username       string
	Website        string
	Sensitive      bool
	Following      bool
	FollowedBy     bool
}

Profile of twitter user.

type ProfileResult

type ProfileResult struct {
	Profile
	Error error
}

ProfileResult of scraping.

type ScheduledTweet

type ScheduledTweet struct {
	ID        string
	State     string
	ExecuteAt time.Time
	Text      string
	Videos    []Video
	Photos    []Photo
	GIFs      []GIF
}

type Scraper

type Scraper struct {
	// contains filtered or unexported fields
}

Scraper object

func New

func New() *Scraper

New creates a Scraper object

func (*Scraper) ClearCookies

func (s *Scraper) ClearCookies()

func (*Scraper) ClearGuestToken

func (s *Scraper) ClearGuestToken() error

func (*Scraper) CreateScheduledTweet

func (s *Scraper) CreateScheduledTweet(schedule TweetSchedule) (string, error)

CreateScheduledTweet schedules a new tweet.

func (*Scraper) DeleteScheduledTweet

func (s *Scraper) DeleteScheduledTweet(id string) error

DeleteScheduledTweet removes tweet from scheduled.

func (*Scraper) FetchBookmarks

func (s *Scraper) FetchBookmarks(maxTweetsNbr int, cursor string) ([]*Tweet, string, error)

FetchBookmarks gets bookmarked tweets via the Twitter frontend GraphQL API.

func (*Scraper) FetchFollowers

func (s *Scraper) FetchFollowers(user string, maxUsersNbr int, cursor string) ([]*Profile, string, error)

FetchFollowers gets followers profiles list for a given user, via the Twitter frontend GraphQL API.

func (*Scraper) FetchFollowersByUserID

func (s *Scraper) FetchFollowersByUserID(userID string, maxUsersNbr int, cursor string) ([]*Profile, string, error)

FetchFollowersByUserID gets followers profiles list for a given userID, via the Twitter frontend GraphQL API.

func (*Scraper) FetchFollowing

func (s *Scraper) FetchFollowing(user string, maxUsersNbr int, cursor string) ([]*Profile, string, error)

FetchFollowing gets following profiles list for a given user, via the Twitter frontend GraphQL API.

func (*Scraper) FetchFollowingByUserID

func (s *Scraper) FetchFollowingByUserID(userID string, maxUsersNbr int, cursor string) ([]*Profile, string, error)

FetchFollowingByUserID gets following profiles list for a given userID, via the Twitter frontend GraphQL API.

func (*Scraper) FetchMediaTweets

func (s *Scraper) FetchMediaTweets(user string, maxTweetsNbr int, cursor string) ([]*Tweet, string, error)

FetchMediaTweets gets tweets with medias for a given user, via the Twitter frontend API.

func (*Scraper) FetchMediaTweetsByUserID

func (s *Scraper) FetchMediaTweetsByUserID(userID string, maxTweetsNbr int, cursor string) ([]*Tweet, string, error)

FetchMediaTweetsByUserID gets tweets with medias for a given userID, via the Twitter frontend GraphQL API.

func (*Scraper) FetchScheduledTweets

func (s *Scraper) FetchScheduledTweets() ([]*ScheduledTweet, error)

FetchScheduledTweets gets scheduled tweets via the Twitter frontend GraphQL API.

func (*Scraper) FetchSearchProfiles

func (s *Scraper) FetchSearchProfiles(query string, maxProfilesNbr int, cursor string) ([]*Profile, string, error)

FetchSearchProfiles gets users for a given search query, via the Twitter frontend API

func (*Scraper) FetchSearchTweets

func (s *Scraper) FetchSearchTweets(query string, maxTweetsNbr int, cursor string) ([]*Tweet, string, error)

FetchSearchTweets gets tweets for a given search query, via the Twitter frontend API

func (*Scraper) FetchTweets

func (s *Scraper) FetchTweets(user string, maxTweetsNbr int, cursor string) ([]*Tweet, string, error)

FetchTweets gets tweets for a given user, via the Twitter frontend API.

func (*Scraper) FetchTweetsByUserID

func (s *Scraper) FetchTweetsByUserID(userID string, maxTweetsNbr int, cursor string) ([]*Tweet, string, error)

FetchTweetsByUserID gets tweets for a given userID, via the Twitter frontend GraphQL API.

func (*Scraper) FetchTweetsByUserIDLegacy

func (s *Scraper) FetchTweetsByUserIDLegacy(userID string, maxTweetsNbr int, cursor string) ([]*Tweet, string, error)

FetchTweetsByUserIDLegacy gets tweets for a given userID, via the Twitter frontend legacy API.

func (*Scraper) GetBookmarks

func (s *Scraper) GetBookmarks(ctx context.Context, maxTweetsNbr int) <-chan *TweetResult

GetBookmarks returns channel with tweets from user bookmarks.

func (*Scraper) GetCookies

func (s *Scraper) GetCookies() []*http.Cookie

func (*Scraper) GetGuestToken

func (s *Scraper) GetGuestToken() error

GetGuestToken from Twitter API

func (*Scraper) GetMediaTweets

func (s *Scraper) GetMediaTweets(ctx context.Context, user string, maxTweetsNbr int) <-chan *TweetResult

GetMediaTweets returns channel with media tweets for a given user.

func (*Scraper) GetProfile

func (s *Scraper) GetProfile(username string) (Profile, error)

GetProfile returns the parsed user profile.

func (*Scraper) GetTrends

func (s *Scraper) GetTrends() ([]string, error)

GetTrends returns a list of trends.

func (*Scraper) GetTweet

func (s *Scraper) GetTweet(id string) (*Tweet, error)

GetTweet gets a single tweet by ID.

func (*Scraper) GetTweets

func (s *Scraper) GetTweets(ctx context.Context, user string, maxTweetsNbr int) <-chan *TweetResult

GetTweets returns channel with tweets for a given user.

func (*Scraper) GetUserIDByScreenName

func (s *Scraper) GetUserIDByScreenName(screenName string) (string, error)

GetUserIDByScreenName from API

func (*Scraper) IsGuestToken

func (s *Scraper) IsGuestToken() bool

IsGuestToken checks whether the guest token is not empty

func (*Scraper) IsLoggedIn

func (s *Scraper) IsLoggedIn() bool

IsLoggedIn checks whether the scraper is logged in

func (*Scraper) Login

func (s *Scraper) Login(credentials ...string) error

Login to Twitter. Use Login(username, password) for an ordinary login, Login(username, password, email) if you have email confirmation, or Login(username, password, code_for_2FA) if you have two-factor authentication.

func (*Scraper) LoginOpenAccount

func (s *Scraper) LoginOpenAccount() (OpenAccount, error)

LoginOpenAccount as Twitter app

func (*Scraper) Logout

func (s *Scraper) Logout() error

Logout resets the session

func (*Scraper) RequestAPI

func (s *Scraper) RequestAPI(req *http.Request, target interface{}) error

RequestAPI gets JSON from the frontend API and decodes it

func (*Scraper) SearchProfiles

func (s *Scraper) SearchProfiles(ctx context.Context, query string, maxProfilesNbr int) <-chan *ProfileResult

SearchProfiles returns channel with profiles for a given search query

func (*Scraper) SearchTweets

func (s *Scraper) SearchTweets(ctx context.Context, query string, maxTweetsNbr int) <-chan *TweetResult

SearchTweets returns channel with tweets for a given search query

func (*Scraper) SetAuthToken

func (s *Scraper) SetAuthToken(token AuthToken)

Auth using auth_token and ct0 cookies

func (*Scraper) SetCookies

func (s *Scraper) SetCookies(cookies []*http.Cookie)

func (*Scraper) SetProxy

func (s *Scraper) SetProxy(proxyAddr string) error

SetProxy sets an HTTP proxy in the format `http://HOST:PORT` or a SOCKS5 proxy in the format `socks5://HOST:PORT`

func (*Scraper) SetSearchMode

func (s *Scraper) SetSearchMode(mode SearchMode) *Scraper

SetSearchMode switcher

func (*Scraper) UploadMedia

func (s *Scraper) UploadMedia(filePath string) (*Media, error)

Uploads photo, video or gif for further posting or scheduling. Expires in 24 hours if not used.

func (*Scraper) WithClientTimeout

func (s *Scraper) WithClientTimeout(timeout time.Duration) *Scraper

client timeout

func (*Scraper) WithDelay

func (s *Scraper) WithDelay(seconds int64) *Scraper

WithDelay add delay between API requests (in seconds)

func (*Scraper) WithOpenAccount

func (s *Scraper) WithOpenAccount(openAccount OpenAccount)

func (*Scraper) WithReplies

func (s *Scraper) WithReplies(b bool) *Scraper

WithReplies enable/disable load timeline with tweet replies

type SearchMode

type SearchMode int

SearchMode type

const (
	// SearchTop - default mode
	SearchTop SearchMode = iota
	// SearchLatest - live mode
	SearchLatest
	// SearchPhotos - image mode
	SearchPhotos
	// SearchVideos - video mode
	SearchVideos
	// SearchUsers - user mode
	SearchUsers
)

type Tweet

type Tweet struct {
	ConversationID    string
	GIFs              []GIF
	Hashtags          []string
	HTML              string
	ID                string
	InReplyToStatus   *Tweet
	InReplyToStatusID string
	IsQuoted          bool
	IsPin             bool
	IsReply           bool
	IsRetweet         bool
	IsSelfThread      bool
	Likes             int
	Name              string
	Mentions          []Mention
	PermanentURL      string
	Photos            []Photo
	Place             *Place
	QuotedStatus      *Tweet
	QuotedStatusID    string
	Replies           int
	Retweets          int
	RetweetedStatus   *Tweet
	RetweetedStatusID string
	Text              string
	Thread            []*Tweet
	TimeParsed        time.Time
	Timestamp         int64
	URLs              []string
	UserID            string
	Username          string
	Videos            []Video
	Views             int
	SensitiveContent  bool
}

Tweet type.

type TweetResult

type TweetResult struct {
	Tweet
	Error error
}

TweetResult of scraping.

type TweetSchedule

type TweetSchedule struct {
	Text   string
	Date   time.Time
	Medias []*Media
}

type Video

type Video struct {
	ID      string
	Preview string
	URL     string
	HLSURL  string
}

Video type.
