cleanhtml

package
v0.0.0-...-496bd57 Latest
Warning: This package is not in the latest version of its module.
Published: Nov 23, 2020 License: MIT Imports: 10 Imported by: 0

Documentation

Overview

Package cleanhtml provides a toolset for reading source HTML documents and attempting to render them into more human-readable output.

Although this package is meant to be consumed by the cleanpg utility (http://github.com/scu/cleanpg), it may also be useful in other applications.

// Read the raw, unfiltered document.
url := "http://example.com"
sourceData, err := cleanhtml.ReadHTML(url)
if err != nil {
	panic(fmt.Sprintf("Could not read document at %q: %s", url, err))
}

// Render the raw data into more readable HTML.
cleanData, err := cleanhtml.CleanHTML(sourceData)
if err != nil {
	panic(fmt.Sprintf("Could not transform data: %s", err))
}

// Use the result, for example by printing it.
fmt.Println(cleanData)

Disclaimer: this library outputs a document whose layout and content differ from what the original page designer intended. These re-rendered documents are not intended for republishing, circumventing content protection mechanisms, or violating the copyright of the original content authors.

Constants

This section is empty.

Variables

This section is empty.

Functions

func CleanHTML

func CleanHTML(data []byte) (string, error)

CleanHTML returns a rendered HTML document. It accepts document data (normally obtained from cleanhtml.ReadHTML), then parses and renders that data through a set of filters to produce readable HTML output, which is returned as a string.
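The idea of passing a document "through a set of filters" can be pictured with the following minimal sketch. It is not the package's implementation: the filter type, stripScripts, collapseSpace, and applyFilters are hypothetical names, and the regular-expression filters are toy stand-ins for whatever parsing the real renderer performs.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// filter is a hypothetical pipeline stage: it takes a document and
// returns a transformed copy. The real package's filters are not documented.
type filter func(string) string

var (
	scriptRE = regexp.MustCompile(`(?is)<script\b.*?</script>`)
	spaceRE  = regexp.MustCompile(`\s+`)
)

// stripScripts removes <script>...</script> blocks (toy, regex-based).
func stripScripts(doc string) string {
	return scriptRE.ReplaceAllString(doc, "")
}

// collapseSpace squeezes runs of whitespace into single spaces and
// trims the ends.
func collapseSpace(doc string) string {
	return strings.TrimSpace(spaceRE.ReplaceAllString(doc, " "))
}

// applyFilters runs the document through each filter in order.
func applyFilters(doc string, filters ...filter) string {
	for _, f := range filters {
		doc = f(doc)
	}
	return doc
}

func main() {
	src := "<body>\n  <script>track()</script>\n  <p>Hello,   world</p>\n</body>"
	fmt.Println(applyFilters(src, stripScripts, collapseSpace))
	// prints: <body> <p>Hello, world</p> </body>
}
```

Keeping each stage as a plain function makes the order of the filters explicit and lets individual stages be tested in isolation.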

func ReadHTML

func ReadHTML(url string) ([]byte, error)

ReadHTML reads a web page and returns a byte slice containing the unfiltered document, which can then be passed to cleanhtml.CleanHTML to render the result.
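For readers who want to see what reading a page into a byte slice involves, here is a minimal sketch using only the standard library. It is an illustration under assumptions, not the source of ReadHTML: fetchPage and demoFetch are hypothetical names, and the timeout and status check are choices made for this example. A local test server stands in for a real site so the sketch runs without network access.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// fetchPage is a hypothetical stand-in for a reader such as ReadHTML:
// it downloads url and returns the raw, unfiltered body as a byte slice.
func fetchPage(url string) ([]byte, error) {
	client := &http.Client{Timeout: 10 * time.Second}
	resp, err := client.Get(url)
	if err != nil {
		return nil, fmt.Errorf("fetching %q: %w", url, err)
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("fetching %q: unexpected status %s", url, resp.Status)
	}
	return io.ReadAll(resp.Body)
}

// demoFetch serves a fixed page from a local test server and fetches it.
func demoFetch() (string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "<html><body><h1>Hello</h1></body></html>")
	}))
	defer srv.Close()

	data, err := fetchPage(srv.URL)
	if err != nil {
		return "", err
	}
	return string(data), nil
}

func main() {
	page, err := demoFetch()
	if err != nil {
		panic(err)
	}
	fmt.Println(page)
}
```

Returning []byte rather than string matches the documented signature and avoids a copy when the data is handed straight to a parser.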

func SetLinksRender

func SetLinksRender(flag bool)

SetLinksRender sets a flag indicating whether links (<a ... href=...>) will be rendered. [default = true]
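Because the setter takes a bool and returns nothing, it presumably changes package-level state that later rendering calls consult, so it would be called before CleanHTML. The sketch below illustrates that pattern only. The names renderLinks, setLinksRender, and render are hypothetical, and the choice to keep the link text when links are disabled is an assumption, not documented behavior.

```go
package main

import (
	"fmt"
	"regexp"
)

// renderLinks mimics a package-level switch of the kind a setter such as
// SetLinksRender might control (assumption; the real internals are not shown).
var renderLinks = true

// setLinksRender is a hypothetical setter following the documented signature.
func setLinksRender(flag bool) { renderLinks = flag }

var anchorRE = regexp.MustCompile(`(?is)<a\b[^>]*>(.*?)</a>`)

// render is a toy renderer: when links are disabled it replaces each
// anchor element with its inner text; otherwise it returns doc unchanged.
func render(doc string) string {
	if renderLinks {
		return doc
	}
	return anchorRE.ReplaceAllString(doc, "$1")
}

func main() {
	doc := `<p>See <a href="http://example.com">the site</a>.</p>`

	fmt.Println(render(doc)) // default: links kept

	setLinksRender(false) // set the flag before rendering
	fmt.Println(render(doc))
}
```

Package-level flags are simple to use but are shared by every caller, so concurrent renders with different settings would need additional coordination.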

func SetPostH1Render

func SetPostH1Render(flag bool)

SetPostH1Render sets a flag indicating whether the renderer will process BODY elements until the first H1 tag is reached.

func SetStyleRender

func SetStyleRender(flag bool)

SetStyleRender sets a flag indicating whether the renderer embeds tag-level styles automatically. [default = true]

Types

This section is empty.
