extractor

package
v0.0.0-...-71af719
Warning

This package is not in the latest version of its module.

Published: May 15, 2023 License: MIT Imports: 18 Imported by: 0

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type ArticleExtractor

type ArticleExtractor struct {
	// contains filtered or unexported fields
}

func NewArticleExtractor

func NewArticleExtractor(logger logutil.Logger) *ArticleExtractor

func (*ArticleExtractor) Extract

func (ae *ArticleExtractor) Extract(doc *webdoc.TextDocument, wc stringutil.WordCounter, candidateTitles []string) bool

Extract performs article extraction on the given TextDocument. It is tuned for news articles.

type ContentExtractor

type ContentExtractor struct {
	Parser      *markup.Parser
	TimingInfo  *data.TimingInfo
	ImageURLs   []string
	WordCounter stringutil.WordCounter
	// contains filtered or unexported fields
}

func NewContentExtractor

func NewContentExtractor(root *html.Node, pageURL *nurl.URL, logger logutil.Logger) *ContentExtractor

func (*ContentExtractor) ExtractContent

func (ce *ContentExtractor) ExtractContent() (*webdoc.Document, int)

func (*ContentExtractor) ExtractTitle

func (ce *ContentExtractor) ExtractTitle() string
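The sketch below shows one way to call the two ContentExtractor methods through a small interface, so it runs without an HTML page. In real code you would construct the extractor with NewContentExtractor(root, pageURL, logger), where root is the parsed page HTML. Several names here are hypothetical and not part of this package: Document (a stand-in for *webdoc.Document), pageExtractor, fakeExtractor and summarize. Treating the int returned by ExtractContent as a word count is also an assumption. The sketch calls ExtractTitle first, but whether the real methods depend on call order is not documented here.

```go
package main

import "fmt"

// Document is a hypothetical stand-in for *webdoc.Document.
type Document struct {
	HTML string
}

// pageExtractor is a hypothetical interface matching the shape of the two
// documented ContentExtractor methods (with the stand-in Document type).
type pageExtractor interface {
	ExtractTitle() string
	ExtractContent() (*Document, int)
}

// fakeExtractor returns canned values so the sketch runs on its own.
type fakeExtractor struct{}

func (fakeExtractor) ExtractTitle() string { return "Example headline" }

func (fakeExtractor) ExtractContent() (*Document, int) {
	return &Document{HTML: "<p>Body text here</p>"}, 3
}

// summarize extracts the title, then the content, and formats both.
// The int from ExtractContent is ASSUMED to be a word count.
func summarize(e pageExtractor) string {
	title := e.ExtractTitle()
	doc, words := e.ExtractContent()
	if doc == nil {
		return title + " (no content)"
	}
	return fmt.Sprintf("%s (%d words)", title, words)
}

func main() {
	fmt.Println(summarize(fakeExtractor{})) // prints "Example headline (3 words)"
}
```

Programming against a narrow interface like this makes the calling code testable with a fake. The real *ContentExtractor can then be wrapped behind the same method shape.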
