extractor

package
v0.0.0-...-1e67cb1 Latest
Warning

This package is not in the latest version of its module.

Published: Jan 2, 2024 License: MIT Imports: 18 Imported by: 0

Documentation

Overview

Package extractor uses an altered version of go-readability and local rules to extract articles.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Response

type Response struct {
	Content     string   `json:"content"`
	Rich        string   `json:"rich_content"`
	Domain      string   `json:"domain"`
	URL         string   `json:"url"`
	Title       string   `json:"title"`
	Excerpt     string   `json:"excerpt"`
	Image       string   `json:"lead_image_url"`
	AllImages   []string `json:"images"`
	AllLinks    []string `json:"links"`
	ContentType string   `json:"type"`
	Charset     string   `json:"charset"`
}

Response is the result returned from API calls.
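The JSON tags on Response determine the key names a client sees. The following is a minimal sketch of decoding such a payload, assuming the API returns a single JSON object with these keys. The Response type is copied locally so the example compiles on its own; decodeResponse and the sample payload are hypothetical and not part of the package.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Response is a local copy of the documented struct, so this file builds
// without importing the package.
type Response struct {
	Content     string   `json:"content"`
	Rich        string   `json:"rich_content"`
	Domain      string   `json:"domain"`
	URL         string   `json:"url"`
	Title       string   `json:"title"`
	Excerpt     string   `json:"excerpt"`
	Image       string   `json:"lead_image_url"`
	AllImages   []string `json:"images"`
	AllLinks    []string `json:"links"`
	ContentType string   `json:"type"`
	Charset     string   `json:"charset"`
}

// decodeResponse (hypothetical helper) parses a JSON payload into a Response.
func decodeResponse(payload string) (Response, error) {
	var r Response
	if err := json.Unmarshal([]byte(payload), &r); err != nil {
		return Response{}, fmt.Errorf("decode response: %w", err)
	}
	return r, nil
}

func main() {
	// Made-up payload, for illustration only.
	payload := `{"title":"Example","lead_image_url":"https://example.com/lead.png","type":"text/html","images":["https://example.com/a.png"]}`
	r, err := decodeResponse(payload)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(r.Title, r.Image, r.ContentType, len(r.AllImages))
}
```

Note that several Go field names differ from their JSON keys: Rich maps to rich_content, Image to lead_image_url, AllImages to images, AllLinks to links, and ContentType to type.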

type Rules

type Rules interface {
	Get(ctx context.Context, rURL string) (datastore.Rule, bool)
	GetByID(ctx context.Context, id primitive.ObjectID) (datastore.Rule, bool)
	Save(ctx context.Context, rule datastore.Rule) (datastore.Rule, error)
	Disable(ctx context.Context, id primitive.ObjectID) error
	All(ctx context.Context) []datastore.Rule
}

Rules is the interface with all methods needed to access the rule datastore.

type UReadability

type UReadability struct {
	TimeOut     time.Duration
	SnippetSize int
	Rules       Rules
}

UReadability implements fetcher & extractor for local readability-like functionality

func (UReadability) Extract

func (f UReadability) Extract(ctx context.Context, reqURL string) (rb *Response, err error)

Extract fetches the page at reqURL and retrieves the article.
