lib

package
v0.3.2
Published: Jan 23, 2024 License: MIT Imports: 19 Imported by: 0

Documentation

Constants

const DefaultRatePerSecond = 2

DefaultRatePerSecond defines the default request rate per second when creating a new Fetcher.

Variables

This section is empty.

Functions

This section is empty.

Types

type DateFilterFunc

type DateFilterFunc func(string) bool

type ExtractResult

type ExtractResult struct {
	Post Post
	Err  error
}

type Extractor

type Extractor struct {
	// contains filtered or unexported fields
}

Extractor is a utility for extracting Substack posts from URLs.

func NewExtractor

func NewExtractor(f *Fetcher) *Extractor

NewExtractor creates a new Extractor with the provided Fetcher. If the Fetcher is nil, a default Fetcher will be used.

func (*Extractor) ExtractAllPosts

func (e *Extractor) ExtractAllPosts(ctx context.Context, urls []string) <-chan ExtractResult

func (*Extractor) ExtractPost

func (e *Extractor) ExtractPost(ctx context.Context, pageUrl string) (Post, error)

func (*Extractor) GetAllPostsURLs

func (e *Extractor) GetAllPostsURLs(ctx context.Context, pubUrl string, f DateFilterFunc) ([]string, error)

type FetchError

type FetchError struct {
	TooManyRequests bool
	RetryAfter      int
}

FetchError is the error returned when a request fails because of too many requests. RetryAfter holds the accompanying Retry-After value.

func (*FetchError) Error

func (e *FetchError) Error() string

Error returns the error message for the FetchError, indicating the retry wait time.

type FetchResult

type FetchResult struct {
	Url   string
	Body  io.ReadCloser
	Error error
}

FetchResult represents the result of a URL fetch operation.

type Fetcher

type Fetcher struct {
	Client      *http.Client
	RateLimiter *rate.Limiter
	BackoffCfg  backoff.BackOff
	Cookie      *http.Cookie
}

Fetcher represents a URL fetcher with rate limiting and retry mechanisms.

func NewFetcher

func NewFetcher(opts ...FetcherOption) *Fetcher

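NewFetcher creates a new Fetcher with the provided options. If no rate is set (or the rate set with WithRatePerSecond is 0), the default rate (DefaultRatePerSecond) is used. If no backoff is set (or the value passed to WithBackOffConfig is nil), the default backoff configuration is used.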

func (*Fetcher) FetchURL

func (f *Fetcher) FetchURL(ctx context.Context, url string) (io.ReadCloser, error)

FetchURL fetches the specified URL and returns the response body as io.ReadCloser and any encountered error. It uses rate limiting and retry mechanisms to handle rate limits and transient failures.

func (*Fetcher) FetchURLs

func (f *Fetcher) FetchURLs(ctx context.Context, urls []string) <-chan FetchResult

FetchURLs concurrently fetches the specified URLs and returns a channel to receive the FetchResults. The returned channel will be closed once all fetch operations are completed.

type FetcherOption

type FetcherOption func(*FetcherOptions)

FetcherOption defines a function that applies a specific option to FetcherOptions.

func WithBackOffConfig

func WithBackOffConfig(b backoff.BackOff) FetcherOption

WithBackOffConfig sets the backoff configuration for the Fetcher.

func WithCookie

func WithCookie(cookie *http.Cookie) FetcherOption

WithCookie sets the cookie for the Fetcher.

func WithProxyURL

func WithProxyURL(proxyURL *url.URL) FetcherOption

WithProxyURL sets the proxy URL for the Fetcher.

func WithRatePerSecond

func WithRatePerSecond(rate int) FetcherOption

WithRatePerSecond sets the rate per second for the Fetcher.

type FetcherOptions

type FetcherOptions struct {
	RatePerSecond int
	ProxyURL      *url.URL
	BackOffConfig backoff.BackOff
	Cookie        *http.Cookie
}

FetcherOptions holds configurable options for Fetcher.
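
FetcherOption, the With... functions, and FetcherOptions together follow the functional options pattern: each option is a function that mutates an options struct before the Fetcher is built. The sketch below illustrates the pattern with a single field. It is not the package's implementation; `resolve` is a hypothetical stand-in for the option handling inside NewFetcher, and the zero-means-default rule is taken from the NewFetcher description.

```go
package main

import "fmt"

// DefaultRatePerSecond mirrors the documented constant.
const DefaultRatePerSecond = 2

// FetcherOptions is trimmed to a single field for this sketch.
type FetcherOptions struct {
	RatePerSecond int
}

// FetcherOption mirrors the documented function type.
type FetcherOption func(*FetcherOptions)

// WithRatePerSecond returns an option that sets the request rate.
func WithRatePerSecond(rate int) FetcherOption {
	return func(o *FetcherOptions) { o.RatePerSecond = rate }
}

// resolve is a hypothetical helper: it applies every option in order, then
// falls back to the default rate when none (or zero) was supplied.
func resolve(opts ...FetcherOption) FetcherOptions {
	var o FetcherOptions
	for _, opt := range opts {
		opt(&o)
	}
	if o.RatePerSecond == 0 {
		o.RatePerSecond = DefaultRatePerSecond
	}
	return o
}

func main() {
	fmt.Println(resolve().RatePerSecond)                     // 2
	fmt.Println(resolve(WithRatePerSecond(5)).RatePerSecond) // 5
}
```

The pattern lets callers set only the options they care about, and lets new options be added later without changing the NewFetcher signature.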

type Post

type Post struct {
	Id               int    `json:"id"`
	PublicationId    int    `json:"publication_id"`
	Type             string `json:"type"`
	Slug             string `json:"slug"`
	PostDate         string `json:"post_date"`
	CanonicalUrl     string `json:"canonical_url"`
	PreviousPostSlug string `json:"previous_post_slug"`
	NextPostSlug     string `json:"next_post_slug"`
	CoverImage       string `json:"cover_image"`
	Description      string `json:"description"`
	WordCount        int    `json:"wordcount"`
	//PostTags         []string `json:"postTags"`
	Title    string `json:"title"`
	BodyHTML string `json:"body_html"`
}

Post represents a structured Substack post with various fields.

func (*Post) ToHTML

func (p *Post) ToHTML(withTitle bool) string

ToHTML returns the Post's HTML body as-is, or preceded by a title header when withTitle is true.

func (*Post) ToJSON

func (p *Post) ToJSON() (string, error)

ToJSON converts the Post to a JSON string.

func (*Post) ToMD

func (p *Post) ToMD(withTitle bool) (string, error)

ToMD converts the Post's HTML body to Markdown format.

func (*Post) ToText

func (p *Post) ToText(withTitle bool) string

ToText converts the Post's HTML body to plain text format.

func (*Post) WriteToFile

func (p *Post) WriteToFile(path string, format string) error

WriteToFile writes the Post's content to a file in the specified format (html, md, or txt).

type PostWrapper

type PostWrapper struct {
	Post Post `json:"post"`
}

PostWrapper wraps a Post object for JSON unmarshaling.
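
The struct tags on Post and PostWrapper describe a JSON document with a top-level `post` object. The following sketch decodes such a document with `encoding/json`. Post is trimmed to a few of its documented fields, `decodePost` is a hypothetical helper, and the sample payload is invented for illustration rather than taken from a real Substack response.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Post is a trimmed local mirror keeping a few of the documented fields.
type Post struct {
	Id        int    `json:"id"`
	Slug      string `json:"slug"`
	WordCount int    `json:"wordcount"`
	Title     string `json:"title"`
	BodyHTML  string `json:"body_html"`
}

// PostWrapper mirrors the documented wrapper type.
type PostWrapper struct {
	Post Post `json:"post"`
}

// decodePost is a hypothetical helper that unmarshals a wrapped post.
func decodePost(raw string) (Post, error) {
	var w PostWrapper
	if err := json.Unmarshal([]byte(raw), &w); err != nil {
		return Post{}, err
	}
	return w.Post, nil
}

func main() {
	// Invented sample payload, shaped after the struct tags.
	raw := `{"post":{"id":1,"slug":"hello","wordcount":2,"title":"Hello","body_html":"<p>Hi there</p>"}}`
	p, err := decodePost(raw)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(p.Title, p.WordCount) // Hello 2
}
```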

type RawPost

type RawPost struct {
	// contains filtered or unexported fields
}

RawPost represents a raw Substack post in string format.

func (*RawPost) ToPost

func (r *RawPost) ToPost() (Post, error)

ToPost converts the RawPost to a structured Post object.
