scraper

package
v1.2.0
Published: Feb 29, 2024 License: AGPL-3.0 Imports: 13 Imported by: 1

Documentation

Constants

This section is empty.

Variables

var ContentProviders = []ContentProvider{}

ContentProviders is the registry of all supported content providers.

Functions

This section is empty.

Types

type ContentProvider

type ContentProvider interface {
	Get(ctx context.Context, rawurl string) (*WebPage, error)
	Match(url string) bool
}

ContentProvider is the interface that every content provider implements.

func GetContentProvider

func GetContentProvider(rawurl string) ContentProvider

GetContentProvider returns the content provider that matches the given URL.
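The page does not say how the lookup is performed or what is returned when nothing matches. One plausible shape, sketched here purely as an assumption, is a first-match scan over a registry slice that yields nil when no provider claims the URL. The names matcher, prefixProvider, registry, and lookup are all hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// matcher is a hypothetical, reduced form of ContentProvider (Match only).
type matcher interface {
	Match(url string) bool
}

// prefixProvider is a hypothetical provider that claims URLs by prefix.
type prefixProvider struct{ prefix string }

func (p prefixProvider) Match(url string) bool {
	return strings.HasPrefix(url, p.prefix)
}

// registry stands in for the package-level ContentProviders variable.
var registry = []matcher{
	prefixProvider{"https://video.example/"},
	prefixProvider{"https://photo.example/"},
}

// lookup returns the first registered provider whose Match accepts rawurl,
// or nil when none does. This behavior is assumed, not taken from the source.
func lookup(rawurl string) matcher {
	for _, p := range registry {
		if p.Match(rawurl) {
			return p
		}
	}
	return nil
}

func main() {
	fmt.Println(lookup("https://photo.example/42") != nil)
	fmt.Println(lookup("https://blog.example/") != nil)
}
```

With a first-match scan, registration order decides which provider wins when several match the same URL.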

type WebPage

type WebPage struct {
	URL      string `json:"url,omitempty"`
	Title    string `json:"title,omitempty"`
	HTML     string `json:"html,omitempty"`
	Text     string `json:"text,omitempty"`
	Length   int    `json:"length,omitempty"`
	Excerpt  string `json:"excerpt,omitempty"`
	SiteName string `json:"sitename,omitempty"`
	Image    string `json:"image,omitempty"`
	Favicon  string `json:"favicon,omitempty"`
}

WebPage is the result of scraping a web page.
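Every field carries an omitempty JSON tag, so zero-valued fields are left out of the encoded output. The sketch below redeclares the struct locally to show the effect with the standard encoding/json package; the encode helper and the sample values are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// WebPage is copied from the type documented above.
type WebPage struct {
	URL      string `json:"url,omitempty"`
	Title    string `json:"title,omitempty"`
	HTML     string `json:"html,omitempty"`
	Text     string `json:"text,omitempty"`
	Length   int    `json:"length,omitempty"`
	Excerpt  string `json:"excerpt,omitempty"`
	SiteName string `json:"sitename,omitempty"`
	Image    string `json:"image,omitempty"`
	Favicon  string `json:"favicon,omitempty"`
}

// encode is a hypothetical helper that marshals a page to a JSON string.
func encode(p WebPage) (string, error) {
	b, err := json.Marshal(p)
	return string(b), err
}

func main() {
	// Only URL and Title are set; the other seven fields are omitted.
	out, err := encode(WebPage{URL: "https://example.com", Title: "Example"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out) // {"url":"https://example.com","title":"Example"}
}
```

One consequence of omitempty is that a Length of 0 is indistinguishable from an absent length in the encoded form.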

type WebScraper

type WebScraper interface {
	Scrap(ctx context.Context, rawurl string) (*WebPage, error)
}

WebScraper is the interface implemented by web scraping providers.

func NewExternalWebScraper

func NewExternalWebScraper(httpClient *http.Client, uri string) (WebScraper, error)

NewExternalWebScraper creates an external web scraping service.

func NewInternalWebScraper

func NewInternalWebScraper(httpClient *http.Client) WebScraper

NewInternalWebScraper creates an internal web scraping service.

func NewWebScraper

func NewWebScraper(httpClient *http.Client, uri string) (WebScraper, error)

NewWebScraper creates a new web scraping service.

Directories

Path                       Synopsis
content-provider/all       Code generated by go generate; DO NOT EDIT.
content-provider/oembed    Code generated by go generate; DO NOT EDIT.
