pagination

package
v0.0.0-...-977eb4a
Warning

This package is not in the latest version of its module.

Published: Feb 10, 2023 License: MIT Imports: 15 Imported by: 0

Documentation

Constants

const (
	// If the numeric value of a link's anchor text is greater than this number,
	// we don't think it represents the page number of the link.
	MaxNumForPageParam = 100
)
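As an illustration of how such a cutoff might be applied, the following is a minimal sketch, not the package's actual code. The helper plausiblePageNumber is hypothetical, and rejecting negative values is an assumption that the constant's comment does not state.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// MaxNumForPageParam mirrors the constant documented above.
const MaxNumForPageParam = 100

// plausiblePageNumber is a hypothetical helper (not part of this package).
// It reports whether a link's anchor text looks like a page number:
// the text must parse as an integer no greater than MaxNumForPageParam.
// Rejecting negative values is an assumption made for this sketch.
func plausiblePageNumber(anchorText string) (int, bool) {
	n, err := strconv.Atoi(strings.TrimSpace(anchorText))
	if err != nil || n < 0 || n > MaxNumForPageParam {
		return 0, false
	}
	return n, true
}

func main() {
	for _, text := range []string{"2", " 15 ", "2023", "next"} {
		n, ok := plausiblePageNumber(text)
		fmt.Printf("%q: number=%d plausible=%v\n", text, n, ok)
	}
}
```

Under this reading, anchor text such as "2023" (likely a year) is filtered out, while small numbers remain candidates for page links.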

Variables

This section is empty.

Functions

This section is empty.

Types

type PageNumberFinder

type PageNumberFinder struct {
	// contains filtered or unexported fields
}

PageNumberFinder parses the document to collect groups of adjacent plain-text numbers and outlinks with numeric anchor text.

func NewPageNumberFinder

func NewPageNumberFinder(wc stringutil.WordCounter, timingInfo *data.TimingInfo, logger logutil.Logger) *PageNumberFinder

func (*PageNumberFinder) FindOutlink

func (pnf *PageNumberFinder) FindOutlink(root *html.Node, pageURL *nurl.URL) *info.PageParamInfo

FindOutlink parses the document to collect outlinks with numeric anchor text and the numeric text around them. It always returns a non-nil PageParamInfo. If no page parameter is detected, or none is determined to be best, its Type is info.Unset.

func (*PageNumberFinder) FindPagination

func (pnf *PageNumberFinder) FindPagination(root *html.Node, pageURL *nurl.URL) (pagination data.PaginationInfo)

type PrevNextFinder

type PrevNextFinder struct {
	// contains filtered or unexported fields
}

PrevNextFinder finds the next and previous page links for the distilled document. The functionality for next page links is migrated from readability.getArticleTitle() in the Chromium codebase's third_party/readability/js/readability.js, and then expanded to cover previous page links; boilerpipe has no such capability. First, it determines the prefix URL of the document. Then, for each anchor in the document, its href and text are compared against the prefix URL and examined for information related to next or previous paging. If the anchor passes, its score is determined by applying various heuristics to its href, text, class name and ID. Lastly, the link with the highest score, provided that score is at least 50, is considered to have enough confidence to be the next or previous page link.
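The final selection step described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the package's implementation: the candidate type, the bestLink function and the example scores are hypothetical, the heuristics that produce each score are omitted, and returning an empty string when no candidate qualifies is an assumption.

```go
package main

import "fmt"

// minConfidence is the threshold stated in the description above.
const minConfidence = 50

// candidate is a hypothetical type pairing a link with the score
// it received from the (omitted) heuristics.
type candidate struct {
	href  string
	score int
}

// bestLink returns the href of the highest-scoring candidate whose
// score is at least minConfidence. It returns "" when no candidate
// qualifies (an assumption made for this sketch).
func bestLink(cands []candidate) string {
	best := ""
	bestScore := minConfidence - 1
	for _, c := range cands {
		if c.score > bestScore {
			best, bestScore = c.href, c.score
		}
	}
	return best
}

func main() {
	cands := []candidate{
		{href: "http://example.com/article?page=2", score: 75},
		{href: "http://example.com/about", score: 10},
		{href: "http://example.com/article?page=3", score: 60},
	}
	fmt.Println(bestLink(cands))
}
```

Keeping the threshold check inside the selection loop means a low-scoring link is never returned merely because it happens to be the only candidate.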

func NewPrevNextFinder

func NewPrevNextFinder(logger logutil.Logger) *PrevNextFinder

func (*PrevNextFinder) FindOutlink

func (pnf *PrevNextFinder) FindOutlink(root *html.Node, pageURL *nurl.URL, findNext bool) string

func (*PrevNextFinder) FindPagination

func (pnf *PrevNextFinder) FindPagination(root *html.Node, pageURL *nurl.URL) data.PaginationInfo
