package html

v0.5.13 Published: May 1, 2024 License: MIT Imports: 10 Imported by: 0

Repository

github.com/go-go-golems/glazed

Documentation

Index

func NewHTMLCommand() (*cobra.Command, error)
type HTMLSplitParser
- func NewHTMLHeadingSplitParser(gp middlewares.Processor, removeTags []string) *HTMLSplitParser
- func NewHTMLSplitParser(gp middlewares.Processor, removeTags, splitTags []string, extractTitle bool) *HTMLSplitParser
- func (hsp *HTMLSplitParser) ProcessNode(ctx context.Context, n *html.Node) (*html.Node, error)

Constants

This section is empty.

Variables

This section is empty.

Functions

func NewHTMLCommand

func NewHTMLCommand() (*cobra.Command, error)

Types

type HTMLSplitParser

type HTMLSplitParser struct {
	// contains filtered or unexported fields
}

HTMLSplitParser is a GlazeProcessor that splits an HTML document into sections. When it encounters one of the tags in splitTags, it extracts that tag's content as the Title (only if extractTitle is true) and the siblings that follow, up to the next split tag, as the body.
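
The splitting rule described above can be illustrated with a minimal, standard-library-only sketch. It is not the package's implementation: the `node` and `section` types and the `splitSections` function are hypothetical stand-ins, the input is a flat list of siblings rather than a parsed HTML tree, and the choice to discard content that appears before the first split tag is an assumption made for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// node is a hypothetical stand-in for a parsed HTML element:
// a tag name plus its text content.
type node struct {
	Tag  string
	Text string
}

// section is one split result: an optional title and the joined body text.
type section struct {
	Title string
	Body  string
}

// splitSections walks a flat list of sibling nodes and starts a new section
// at every node whose tag is listed in splitTags. Nodes that appear before
// the first split tag are dropped (an assumption of this sketch).
func splitSections(nodes []node, splitTags []string, extractTitle bool) []section {
	isSplit := make(map[string]bool, len(splitTags))
	for _, t := range splitTags {
		isSplit[t] = true
	}

	var sections []section
	var body []string
	var cur *section

	// flush closes the section currently being built, if any.
	flush := func() {
		if cur != nil {
			cur.Body = strings.Join(body, "\n")
			sections = append(sections, *cur)
		}
		body = nil
	}

	for _, n := range nodes {
		if isSplit[n.Tag] {
			flush()
			cur = &section{}
			if extractTitle {
				cur.Title = n.Text
			}
			continue
		}
		if cur != nil {
			body = append(body, n.Text)
		}
	}
	flush()
	return sections
}

func main() {
	doc := []node{
		{"h1", "Intro"},
		{"p", "First paragraph."},
		{"p", "Second paragraph."},
		{"h2", "Details"},
		{"p", "More text."},
	}
	for _, s := range splitSections(doc, []string{"h1", "h2"}, true) {
		fmt.Printf("%q: %q\n", s.Title, s.Body)
	}
}
```

With extractTitle set to false, the same sections are produced but every Title stays empty, which mirrors the role the flag plays in the description above.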

func NewHTMLHeadingSplitParser

func NewHTMLHeadingSplitParser(gp middlewares.Processor, removeTags []string) *HTMLSplitParser

NewHTMLHeadingSplitParser creates a new HTMLSplitParser that splits the document into sections at heading tags (h1, h2, h3...) and keeps the headings as titles.

func NewHTMLSplitParser

func NewHTMLSplitParser(gp middlewares.Processor, removeTags, splitTags []string, extractTitle bool) *HTMLSplitParser

func (*HTMLSplitParser) ProcessNode

func (hsp *HTMLSplitParser) ProcessNode(ctx context.Context, n *html.Node) (*html.Node, error)

ProcessNode extracts the content below a header tag and sends it to the GlazeProcessor. The header tag's content becomes the Title, and the siblings that follow, up to the next header tag, become the body.

Because a single call consumes a variable number of sibling nodes, it returns the next node to be parsed.
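
The "return the next node" contract can be sketched as a driver loop that resumes from whatever node the previous call handed back. This is only an illustration under stated assumptions: `elem`, `isHeader`, `processNode`, and `link` are hypothetical names, the node type is a simplified sibling-linked list rather than `*html.Node`, the header set is limited to h1 through h3, and results are returned to the caller instead of being sent to a processor.

```go
package main

import "fmt"

// elem is a hypothetical, simplified node with a sibling pointer,
// loosely modeled on how parsed HTML trees link siblings.
type elem struct {
	Tag, Text string
	Next      *elem
}

// isHeader reports whether tag is one of the header tags this sketch splits on.
func isHeader(tag string) bool {
	return tag == "h1" || tag == "h2" || tag == "h3"
}

// processNode consumes one header plus the siblings that follow it, up to
// (but not including) the next header. It returns the title, the collected
// body texts, and the node at which the caller should resume.
func processNode(n *elem) (title string, body []string, next *elem) {
	if n == nil {
		return "", nil, nil
	}
	if !isHeader(n.Tag) {
		// Not a header: nothing to emit, advance by a single sibling.
		return "", nil, n.Next
	}
	title = n.Text
	cur := n.Next
	for cur != nil && !isHeader(cur.Tag) {
		body = append(body, cur.Text)
		cur = cur.Next
	}
	return title, body, cur
}

// link chains the given nodes as siblings and returns the first one.
func link(nodes ...*elem) *elem {
	if len(nodes) == 0 {
		return nil
	}
	for i := 0; i+1 < len(nodes); i++ {
		nodes[i].Next = nodes[i+1]
	}
	return nodes[0]
}

func main() {
	head := link(
		&elem{Tag: "h1", Text: "Intro"},
		&elem{Tag: "p", Text: "one"},
		&elem{Tag: "p", Text: "two"},
		&elem{Tag: "h2", Text: "Next"},
		&elem{Tag: "p", Text: "three"},
	)
	// The loop has no post statement: each call decides how far to advance.
	for n := head; n != nil; {
		var title string
		var body []string
		title, body, n = processNode(n)
		if title != "" {
			fmt.Println(title, body)
		}
	}
}
```

Letting the callee report where to resume keeps the caller from revisiting siblings that were already folded into a section body.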
