providers

package
v0.0.0-...-69560be Latest
Warning

This package is not in the latest version of its module.

Published: Sep 26, 2022 License: MIT Imports: 15 Imported by: 0

README

Providers

Providers are the "back ends" that pixdl can fetch images from. There are lots of different types of image hosting services, so the Provider abstraction has to be somewhat flexible in how it works.

Broadly speaking, there are two kinds of providers. The first are URLProviders - providers that can download an album just given the URL. For example, if the user gives us a URL like https://imgur.com/gallery/88wOh, then we can fetch https://imgur.com/gallery/88wOh.json and grab all the images for this album without ever having to parse any HTML.
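As a minimal sketch of the URLProvider idea, the check and the metadata URL can both be derived from the URL alone. This assumes, as the Imgur example suggests, that a gallery's JSON form is obtained by appending `.json` to its path. It also assumes that a leading `www.` should be ignored. The helper names `canDownloadImgur` and `imgurJSONURL` are hypothetical and are not part of pixdl.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// canDownloadImgur is a hypothetical CanDownload-style check: it accepts
// imgur.com (optionally www.imgur.com) URLs whose path starts with /gallery/.
func canDownloadImgur(raw string) bool {
	u, err := url.Parse(raw)
	if err != nil {
		return false
	}
	host := strings.TrimPrefix(strings.ToLower(u.Hostname()), "www.")
	return host == "imgur.com" && strings.HasPrefix(u.Path, "/gallery/")
}

// imgurJSONURL derives the JSON metadata URL for a gallery by appending
// ".json" to the path and dropping any query string or fragment.
func imgurJSONURL(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	u.Path = strings.TrimSuffix(u.Path, "/") + ".json"
	u.RawQuery = ""
	u.Fragment = ""
	return u.String(), nil
}

func main() {
	in := "https://imgur.com/gallery/88wOh"
	fmt.Println(canDownloadImgur(in)) // true
	out, _ := imgurJSONURL(in)
	fmt.Println(out) // https://imgur.com/gallery/88wOh.json
}
```

No HTML is fetched or parsed at any point, which is what distinguishes a URLProvider.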

The second kind of provider - HTMLProvider - is one that needs to scrape HTML. Take a message board powered by XenForo as an example - the HTML will have a certain characteristic structure that's the same on all XenForo boards, so we can have a single XenForo provider that downloads images from all of them. We could list all the URLs for known XenForo boards in that Provider, but if a URL isn't on our list, a Provider could still peek at the HTML and try to figure out if the structure is something it recognizes.

These two cases are different because in the first case, we just need to ask the Provider "Can you fetch albums from this URL?" In the second case, the provider needs to look at parsed HTML, and we don't want every provider to re-parse the same document over and over again. We therefore have two different Provider interfaces, URLProvider and HTMLProvider (although a given provider can implement both), and the overall algorithm for finding a provider is broadly:

  • For each URLProvider, call CanDownload(url). If the provider returns true, call FetchAlbum() to download the album.
  • If no URLProvider matches, send a HEAD request to the URL.
  • If the URL is for an image, download the image directly.
  • If the URL is for an HTML file, parse the HTML once, and then call FetchAlbumFromHTML() on each HTMLProvider until one returns true, indicating that it found some images to download.

Note that the last HTMLProvider is the "web" provider, which should be able to download images from just about any page.
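The lookup order above can be sketched as follows. This is a simplified, self-contained model, not pixdl's actual code, and it makes several substitutions:

  • The interfaces are trimmed-down stand-ins with no Env, params, or callback.
  • The HEAD request is replaced by an injected `headContentType` function.
  • The page is passed as a raw string instead of a parsed `*html.Node`.
  • `prefixProvider` and `markerProvider` are hypothetical test providers.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// urlProvider is a trimmed-down stand-in for URLProvider.
type urlProvider interface {
	Name() string
	CanDownload(url string) bool
}

// htmlProvider is a trimmed-down stand-in for HTMLProvider; it reports
// whether it found any images in the page.
type htmlProvider interface {
	Name() string
	FetchAlbumFromHTML(url, page string) bool
}

// resolve follows the lookup order and describes how the URL would be handled.
func resolve(
	url string,
	urlProviders []urlProvider,
	htmlProviders []htmlProvider,
	headContentType func(url string) (string, error), // stand-in for a HEAD request
	getPage func(url string) (string, error), // stand-in for fetching the page once
) (string, error) {
	// 1. Ask each URLProvider whether it recognizes the URL.
	for _, p := range urlProviders {
		if p.CanDownload(url) {
			return "url:" + p.Name(), nil
		}
	}
	// 2. No URLProvider matched, so inspect the Content-Type.
	ct, err := headContentType(url)
	if err != nil {
		return "", err
	}
	switch {
	case strings.HasPrefix(ct, "image/"):
		// 3. Download the image directly.
		return "image", nil
	case strings.HasPrefix(ct, "text/html"):
		// 4. Fetch the page once and offer it to each HTMLProvider in order.
		page, err := getPage(url)
		if err != nil {
			return "", err
		}
		for _, p := range htmlProviders {
			if p.FetchAlbumFromHTML(url, page) {
				return "html:" + p.Name(), nil
			}
		}
	}
	return "", errors.New("no provider could handle " + url)
}

// prefixProvider matches URLs by prefix (hypothetical).
type prefixProvider struct{ name, prefix string }

func (p prefixProvider) Name() string              { return p.name }
func (p prefixProvider) CanDownload(u string) bool { return strings.HasPrefix(u, p.prefix) }

// markerProvider "finds images" when the page contains a marker; an empty
// marker always matches, which models a catch-all "web" provider.
type markerProvider struct{ name, marker string }

func (p markerProvider) Name() string                            { return p.name }
func (p markerProvider) FetchAlbumFromHTML(_, page string) bool { return strings.Contains(page, p.marker) }

func main() {
	urlPs := []urlProvider{prefixProvider{"imgur", "https://imgur.com/"}}
	htmlPs := []htmlProvider{markerProvider{"xenforo", "hypothetical-xf-marker"}, markerProvider{"web", ""}}
	head := func(string) (string, error) { return "text/html; charset=utf-8", nil }
	get := func(string) (string, error) { return "<html><body></body></html>", nil }
	fmt.Println(resolve("https://imgur.com/gallery/88wOh", urlPs, htmlPs, head, get))
	fmt.Println(resolve("https://example.com/thread/1", urlPs, htmlPs, head, get))
}
```

Registry order matters here. Because the catch-all provider accepts everything, it has to be last, or it would shadow the more specific HTMLProviders.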

Documentation

Index

Constants

This section is empty.

Variables

var HTMLProviderRegistry = []HTMLProvider{
	xenforoProvider{},

	webProvider{},
}

HTMLProviderRegistry is a list of all HTMLProviders.

var URLProviderRegistry = []URLProvider{
	imgurProvider{},
	gofileProvider{},
	singleimageProvider{},
}

URLProviderRegistry is a list of all URLProviders.

Functions

func IsImageByExtension

func IsImageByExtension(url string) bool

IsImageByExtension returns true if the given URL appears to point to an image, based on the file extension.
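The following is a minimal sketch of an extension check in the spirit of IsImageByExtension. It assumes that only the URL path is examined and that query strings and fragments are ignored. It also assumes that matching is case-insensitive against a fixed list. The extension list and the name `isImageByExtension` are hypothetical, and the real function's list may differ.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
	"strings"
)

// imageExtensions is a hypothetical set of extensions treated as images.
var imageExtensions = map[string]bool{
	".jpg": true, ".jpeg": true, ".png": true, ".gif": true, ".webp": true,
}

// isImageByExtension reports whether the URL's path ends in a known image
// extension. Query strings and fragments are ignored; case is ignored.
func isImageByExtension(raw string) bool {
	u, err := url.Parse(raw)
	if err != nil {
		return false
	}
	ext := strings.ToLower(path.Ext(u.Path))
	return imageExtensions[ext]
}

func main() {
	fmt.Println(isImageByExtension("https://i.example.com/a/B.PNG?w=100")) // true
	fmt.Println(isImageByExtension("https://imgur.com/gallery/88wOh"))     // false
}
```

Parsing the URL first keeps a query string such as `?w=100` from hiding the extension.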

Types

type Env

type Env struct {
	// DownloadClient is the client that will be used to download files.
	// This must be provided.
	DownloadClient *download.Client
}

Env is a common "environment" object with utility functions and settings information that is passed to all providers.

func (*Env) Get

func (env *Env) Get(url string) (*http.Response, error)

Get will fetch the contents of a URL via HTTP GET.

func (*Env) GetFileInfo

func (env *Env) GetFileInfo(url string) (*download.RemoteFileInfo, error)

GetFileInfo returns information about a file on a server.

func (*Env) GetHTML

func (env *Env) GetHTML(url string) (*html.Node, error)

GetHTML will fetch the HTML contents of a URL via HTTP GET, and return the parsed HTML.

func (*Env) NewGetRequest

func (*Env) NewGetRequest(url string) (*http.Request, error)

NewGetRequest creates a new HTTP GET request for the given URL.

type HTMLProvider

type HTMLProvider interface {
	// Name is the name of this provider.
	Name() string
	// FetchAlbumFromHTML will fetch all images in an album, and pass them to the ImageCallback.
	// If this provider cannot download images from this album, it returns `false`
	// immediately. If any images were successfully fetched, it returns `true`.
	FetchAlbumFromHTML(env *Env, params map[string]string, url string, node *html.Node, callback ImageCallback) bool
}

HTMLProvider represents a back-end which can figure out if a given HTML document represents an image album, and find images in that album.

type ImageCallback

type ImageCallback func(
	album *meta.AlbumMetadata,
	image *meta.ImageMetadata,
	err error,
) (wantMore bool)

ImageCallback is a function called by a Provider for each image in an album. This will be called once for each image, and then with `album, nil, nil` when there are no more images.

If an error occurs fetching images, this will be called with err set.

Implementations can return false to stop the Provider from providing any further images.

type URLImageProvider

type URLImageProvider interface {
	// Name is the name of this provider.
	Name() string
	// CanFetchImage returns true if this Provider can fetch the specified URL.
	CanFetchImage(url string) bool
	// FetchImage will get information about the image at the specified URL.
	FetchImage(
		env *Env,
		params map[string]string,
		album *meta.AlbumMetadata,
		url string,
	) (image *meta.ImageMetadata, err error)
}

URLImageProvider downloads a single image. This is used when a given website (say, a XenForo forum) has an album containing links to images on external image hosts.
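To illustrate how an HTML-based provider might hand off external image links, here is a sketch that gives each link to the first URLImageProvider whose CanFetchImage returns true. The interface is trimmed down, with no Env, params, or album argument. `FetchImage` here returns a plain string rather than `*meta.ImageMetadata`. `directImage` and `resolveLinks` are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// imageProvider is a trimmed-down stand-in for URLImageProvider.
type imageProvider interface {
	Name() string
	CanFetchImage(url string) bool
	FetchImage(url string) (string, error) // returns a resolved image URL
}

// directImage is a hypothetical provider for links that already point at a
// .jpg or .png file.
type directImage struct{}

func (directImage) Name() string { return "direct" }
func (directImage) CanFetchImage(url string) bool {
	return strings.HasSuffix(url, ".jpg") || strings.HasSuffix(url, ".png")
}
func (directImage) FetchImage(url string) (string, error) { return url, nil }

// resolveLinks gives each link to the first provider that claims it and
// collects the links that no provider could resolve.
func resolveLinks(links []string, providers []imageProvider) (resolved, skipped []string) {
	for _, link := range links {
		ok := false
		for _, p := range providers {
			if p.CanFetchImage(link) {
				if img, err := p.FetchImage(link); err == nil {
					resolved = append(resolved, img)
					ok = true
				}
				break // only the first matching provider is tried
			}
		}
		if !ok {
			skipped = append(skipped, link)
		}
	}
	return resolved, skipped
}

func main() {
	links := []string{"https://host.example/a.jpg", "https://host.example/page"}
	r, s := resolveLinks(links, []imageProvider{directImage{}})
	fmt.Println(r, s)
}
```

Trying only the first matching provider mirrors the first-match order used for URLProviders. Falling through to later providers on error would be an equally reasonable choice.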

type URLProvider

type URLProvider interface {
	// Name is the name of this provider.
	Name() string
	// CanDownload returns true if this Provider can fetch the specified URL.
	CanDownload(url string) bool
	// FetchAlbum will fetch all images in an album, and pass them to the ImageCallback.
	FetchAlbum(env *Env, params map[string]string, url string, callback ImageCallback)
}

URLProvider represents a back-end which can read album and image metadata from a server. URLProvider differs from HTMLProvider in that it can decide whether or not it can fetch an album given only a URL.

func SingleImageProvider

func SingleImageProvider() URLProvider

SingleImageProvider returns a new instance of the singleimage provider.

Directories

Path Synopsis
internal
