service

package
v0.0.0-...-acd0fa8 Latest
Warning

This package is not in the latest version of its module.

Published: Jul 28, 2020 License: MIT Imports: 10 Imported by: 0

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func GetHTMLDoc

func GetHTMLDoc(url string) *goquery.Document

GetHTMLDoc gets the HTML document at url.

func GetLinks

func GetLinks(doc *goquery.Document, filterRegexp string) []string

GetLinks gets all links in the HTML document.
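
The page does not say how filterRegexp is applied. A minimal sketch, assuming it is a regular expression that each link must match to be kept, could look like the following. filterLinks is a hypothetical stand-in written against plain strings with the standard library only; it is not the package's implementation, and the example URLs are made up.

```go
package main

import (
	"fmt"
	"regexp"
)

// filterLinks is a hypothetical helper, not part of the documented package.
// Assumption: a link is kept when it matches pattern anywhere in the string.
func filterLinks(hrefs []string, pattern string) ([]string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, fmt.Errorf("compile %q: %w", pattern, err)
	}
	var kept []string
	for _, h := range hrefs {
		if re.MatchString(h) {
			kept = append(kept, h)
		}
	}
	return kept, nil
}

func main() {
	// Illustrative input only.
	hrefs := []string{
		"https://example.com/articles/1",
		"https://example.com/about",
		"https://example.com/articles/2",
	}
	links, err := filterLinks(hrefs, `/articles/\d+$`)
	if err != nil {
		panic(err)
	}
	for _, l := range links {
		fmt.Println(l)
	}
}
```

Returning the compile error instead of panicking is a choice made for this sketch; the documented GetLinks signature has no error return, so the real function presumably handles an invalid pattern some other way.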

func GetTexts

func GetTexts(doc *goquery.Document, selector string) []string

GetTexts gets the text of all elements in the HTML document that match selector.

func OutputJSONL

func OutputJSONL(rows []string)

OutputJSONL outputs rows as JSONL to the dataset directory.
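
JSONL (JSON Lines) stores one JSON value per line. The page does not say how each row is serialized or what the output file is called, so the sketch below covers only the encoding step. It assumes each row is written as a JSON string. encodeJSONL is a hypothetical helper, not the package's implementation, and writing the result into the dataset directory is left out.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// encodeJSONL is a hypothetical helper, not part of the documented package.
// Assumption: every row becomes one JSON string on its own line.
func encodeJSONL(rows []string) (string, error) {
	var sb strings.Builder
	enc := json.NewEncoder(&sb)
	for _, row := range rows {
		// Encode appends a newline after each value, which gives the
		// one-value-per-line layout that JSONL requires.
		if err := enc.Encode(row); err != nil {
			return "", err
		}
	}
	return sb.String(), nil
}

func main() {
	out, err := encodeJSONL([]string{"first row", "second row"})
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```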

func SanitizeHTML

func SanitizeHTML(html string) []string

SanitizeHTML sanitizes HTML text without a policy.

func UniqStr

func UniqStr(stringSlice []string) []string

UniqStr removes duplicate strings from stringSlice.
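
Deduplicating a string slice is commonly done with a map used as a set. The sketch below shows that approach. uniqStr is a hypothetical reimplementation, and keeping the first occurrence of each string in its original order is an assumption; the page does not say whether UniqStr preserves order.

```go
package main

import "fmt"

// uniqStr is a hypothetical sketch, not the package's UniqStr.
// Assumption: the first occurrence of each string is kept, in input order.
func uniqStr(in []string) []string {
	seen := make(map[string]struct{}, len(in))
	out := make([]string, 0, len(in))
	for _, s := range in {
		if _, ok := seen[s]; ok {
			continue // already emitted
		}
		seen[s] = struct{}{}
		out = append(out, s)
	}
	return out
}

func main() {
	fmt.Println(uniqStr([]string{"a", "b", "a", "c", "b"}))
}
```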

Types

This section is empty.
