html_util

package

Version: v0.0.5 (latest) | Published: Oct 1, 2022 | License: MIT | Imports: 9 | Imported by: 0

Repository

github.com/rbnbr/go-html-utils

Documentation ¶

Index ¶

Variables
func GetAttributeByKey(node *html.Node, key string) (html.Attribute, error)
func GetChildren(node *html.Node) []*html.Node
func GetElementNodeByTagName(name string, startNode *html.Node) *html.Node
func GetElementsInTableRowByConditionForOneOfTheElements(tableNode *html.Node, cond func(n *html.Node) bool) []*html.Node
func GetFirstTextNode(startNode *html.Node) *html.Node
func GetFirstTextNodeWithCondition(startNode *html.Node, cond func(s string) bool) *html.Node
func GetNextNodeByCondition(startNode *html.Node, cond func(node *html.Node) bool) *html.Node
func GetNextNodesByCondition(startNode *html.Node, cond func(node *html.Node) bool) []*html.Node
func GetNodeByCondition(startNode *html.Node, cond func(node *html.Node) bool) *html.Node
func GetNodesByCondition(startNode *html.Node, cond func(node *html.Node) bool) []*html.Node
func GetTextNodes(startNode *html.Node) []*html.Node
func GetTextNodesByCondition(startNode *html.Node, cond func(s string) bool) []*html.Node
func MakeByAttributeNameAndValueCondition(attributeName, attributeValue string) func(node *html.Node) bool
func MakeByClassNameCondition(className string) func(node *html.Node) bool
func MakeByIdCondition(id string) func(node *html.Node) bool
func MakeByTagNameCondition(name string) func(node *html.Node) bool
func MakeTextNodeComposite(textNodes []*html.Node, compositeRune string) string
func MakeTextNodeCompositeWithNormalizerFunc(textNodes []*html.Node, compositeDelimiter string, ...) string
func ParseSelectHTMLNode(selectNode *html.Node) (map[string]string, string, error)
func WalkHtmlTree(node *html.Node, f func(n *html.Node) bool)
type HtmlTable
- func ParseHtmlTable(tableNode *html.Node, hasHeaderRow bool, hasIndexColumn bool, suffix string) (*HtmlTable, error)
- func ParseHtmlTableWithNormalizer(tableNode *html.Node, hasHeaderRow bool, hasIndexColumn bool, suffix string, ...) (*HtmlTable, error)
- func (ht HtmlTable) GetColumnByIndex(j int) ([]string, string)
- func (ht HtmlTable) GetColumnByKey(key string) ([]string, int, bool)
- func (ht HtmlTable) GetColumnByKeyNum(key string, occurrence int) ([]string, int, bool)
- func (ht HtmlTable) GetElementByIndex(i, j int) string
- func (ht HtmlTable) GetElementByKeys(rowKey, columnKey string) (string, int, int, bool)
- func (ht HtmlTable) GetElementByKeysNum(rowKey, columnKey string, rowOccurrence, columnOccurrence int) (string, int, int, bool)
- func (ht HtmlTable) GetRowByIndex(i int) ([]string, string)
- func (ht HtmlTable) GetRowByKey(key string) ([]string, int, bool)
- func (ht HtmlTable) GetRowByKeyNum(key string, occurrence int) ([]string, int, bool)

Constants ¶

This section is empty.

Variables ¶

var TextRegex = regexp.MustCompile("[^!-~]") // without space
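
The character class `[^!-~]` matches any single character outside the printable ASCII range `!` (0x21) through `~` (0x7E), so spaces, tabs, newlines, and non-ASCII runes all match. The sketch below only illustrates what the pattern matches, using a local copy of the expression. How the package itself applies TextRegex is not documented here, and `stripNonPrintable` is a hypothetical helper, not part of the package.

```go
package main

import (
	"fmt"
	"regexp"
)

// textRegex is a local copy of the pattern documented for TextRegex.
var textRegex = regexp.MustCompile("[^!-~]")

// stripNonPrintable is a hypothetical helper (not part of html_util):
// it removes every character the pattern matches.
func stripNonPrintable(s string) string {
	return textRegex.ReplaceAllString(s, "")
}

func main() {
	// Space, tab, the euro sign, and the newline are all outside '!'..'~'.
	fmt.Println(stripNonPrintable("Total: 42\t€\n")) // prints "Total:42"
}
```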

Functions ¶

func GetAttributeByKey ¶

func GetAttributeByKey(node *html.Node, key string) (html.Attribute, error)

func GetChildren ¶

func GetChildren(node *html.Node) []*html.Node

GetChildren returns the children of node as a slice of pointers. Returning a slice of pointers is sometimes considered bad practice, but it makes it possible to directly modify substructures of a bigger tree.

func GetElementNodeByTagName ¶

func GetElementNodeByTagName(name string, startNode *html.Node) *html.Node

GetElementNodeByTagName returns the first node with the given tag name, searching from the provided start node. It returns nil if none is found.

func GetElementsInTableRowByConditionForOneOfTheElements ¶

func GetElementsInTableRowByConditionForOneOfTheElements(tableNode *html.Node, cond func(n *html.Node) bool) []*html.Node

GetElementsInTableRowByConditionForOneOfTheElements returns all child elements (with tag <td>) of the table row node (with tag <tr>) for which at least one child fulfills the provided condition cond.

func GetFirstTextNode ¶

func GetFirstTextNode(startNode *html.Node) *html.Node

func GetFirstTextNodeWithCondition ¶

func GetFirstTextNodeWithCondition(startNode *html.Node, cond func(s string) bool) *html.Node

func GetNextNodeByCondition ¶

func GetNextNodeByCondition(startNode *html.Node, cond func(node *html.Node) bool) *html.Node

GetNextNodeByCondition returns the first node for which the provided condition yields true, excluding the start node.

func GetNextNodesByCondition ¶

func GetNextNodesByCondition(startNode *html.Node, cond func(node *html.Node) bool) []*html.Node

GetNextNodesByCondition returns all nodes in the tree of startNode for which the provided condition yields true, excluding startNode. Note that this returns a slice of pointers to structs, which is considered bad practice. However, we do not want copies of the nodes but the actual pointers, in case we want to modify nodes that are part of a bigger tree structure.

func GetNodeByCondition ¶

func GetNodeByCondition(startNode *html.Node, cond func(node *html.Node) bool) *html.Node

GetNodeByCondition returns the first node for which the provided condition yields true, including the start node.

func GetNodesByCondition ¶

func GetNodesByCondition(startNode *html.Node, cond func(node *html.Node) bool) []*html.Node

GetNodesByCondition returns all nodes in the tree of startNode for which the provided condition yields true, including startNode. Note that this returns a slice of pointers to structs, which is considered bad practice. However, we do not want copies of the nodes but the actual pointers, in case we want to modify nodes that are part of a bigger tree structure.
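
The difference between the "including startNode" and "excluding startNode" variants, and the reason for returning pointers, can be sketched roughly as follows. This is not the package's implementation: `node` is a minimal stand-in for html.Node, `collect` and `nestedDivs` are hypothetical names, and the depth-first, pre-order traversal is an assumption.

```go
package main

import "fmt"

// node is a minimal stand-in for html.Node (golang.org/x/net/html),
// keeping only the fields this sketch needs.
type node struct {
	Data        string
	FirstChild  *node
	NextSibling *node
}

// collect gathers pointers to every node in the tree rooted at start for
// which cond is true. includeStart selects between the "including" and
// "excluding" start node behaviors. Traversal order is an assumption.
func collect(start *node, cond func(*node) bool, includeStart bool) []*node {
	var out []*node
	var visit func(n *node)
	visit = func(n *node) {
		if (n != start || includeStart) && cond(n) {
			out = append(out, n)
		}
		for c := n.FirstChild; c != nil; c = c.NextSibling {
			visit(c)
		}
	}
	if start != nil {
		visit(start)
	}
	return out
}

// nestedDivs builds the small tree div > (div, span).
func nestedDivs() *node {
	span := &node{Data: "span"}
	inner := &node{Data: "div", NextSibling: span}
	return &node{Data: "div", FirstChild: inner}
}

func main() {
	root := nestedDivs()
	isDiv := func(n *node) bool { return n.Data == "div" }

	fmt.Println(len(collect(root, isDiv, true)))  // prints 2
	fmt.Println(len(collect(root, isDiv, false))) // prints 1

	// The results point into the tree, so edits are visible there.
	collect(root, isDiv, false)[0].Data = "section"
	fmt.Println(root.FirstChild.Data) // prints section
}
```

Because the slice holds pointers rather than copies, the assignment in the last step changes the tree itself, which is the use case the note above describes.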

func GetTextNodes ¶

func GetTextNodes(startNode *html.Node) []*html.Node

func GetTextNodesByCondition ¶

func GetTextNodesByCondition(startNode *html.Node, cond func(s string) bool) []*html.Node

func MakeByAttributeNameAndValueCondition ¶

func MakeByAttributeNameAndValueCondition(attributeName, attributeValue string) func(node *html.Node) bool

func MakeByClassNameCondition ¶

func MakeByClassNameCondition(className string) func(node *html.Node) bool

func MakeByIdCondition ¶

func MakeByIdCondition(id string) func(node *html.Node) bool

func MakeByTagNameCondition ¶

func MakeByTagNameCondition(name string) func(node *html.Node) bool

func MakeTextNodeComposite ¶

func MakeTextNodeComposite(textNodes []*html.Node, compositeRune string) string

func MakeTextNodeCompositeWithNormalizerFunc ¶

func MakeTextNodeCompositeWithNormalizerFunc(textNodes []*html.Node, compositeDelimiter string, normalizerFunc func(string) string) string

func ParseSelectHTMLNode ¶

func ParseSelectHTMLNode(selectNode *html.Node) (map[string]string, string, error)

ParseSelectHTMLNode parses the HTML node with tag 'select' into its different options. It returns a map of strings in which each key is the text content of an option and each value is the content of that option's 'value' attribute.

If multiple options have the same text content, earlier entries are overridden and only the last one is kept. It also returns the currently selected option, which is the option with the 'selected' attribute if one exists, otherwise the first occurring option.

If multiple options have the 'selected' attribute, the last option that has it is returned as "selectedOption". It returns a nil map and a nil error if no options were found.
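
The override and selection rules above can be illustrated with a small sketch that works on already-extracted options instead of an html.Node. The `option` type and `summarize` function are hypothetical, and the sketch assumes the selected option is reported by its text content, which the documentation does not specify.

```go
package main

import "fmt"

// option is a hypothetical, already-extracted view of an <option> element.
type option struct {
	Text     string // text content of the option
	Value    string // content of its 'value' attribute
	Selected bool   // whether the 'selected' attribute is present
}

// summarize applies the documented rules to a list of options.
// Assumption: the selected option is reported by its text content.
func summarize(opts []option) (map[string]string, string) {
	if len(opts) == 0 {
		return nil, "" // no options: nil map
	}
	values := make(map[string]string, len(opts))
	selected := opts[0].Text // default: the first occurring option
	for _, o := range opts {
		values[o.Text] = o.Value // later duplicates override earlier ones
		if o.Selected {
			selected = o.Text // the last option with 'selected' wins
		}
	}
	return values, selected
}

func main() {
	values, selected := summarize([]option{
		{Text: "Red", Value: "1"},
		{Text: "Blue", Value: "2", Selected: true},
		{Text: "Red", Value: "3"},
		{Text: "Green", Value: "4", Selected: true},
	})
	fmt.Println(len(values), values["Red"], selected) // prints 3 3 Green
}
```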

func WalkHtmlTree ¶

func WalkHtmlTree(node *html.Node, f func(n *html.Node) bool)

WalkHtmlTree calls f on node. If f returns true, WalkHtmlTree is called on all of node's children.
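
The pruning behavior described here (descend only when f returns true) can be sketched as below. This is an illustration under assumptions, not the package source: `node` is a minimal stand-in for html.Node, and `walk`, `visited`, and `sampleTree` are hypothetical names.

```go
package main

import "fmt"

// node is a minimal stand-in for html.Node (golang.org/x/net/html),
// keeping only the fields this sketch needs.
type node struct {
	Data        string
	FirstChild  *node
	NextSibling *node
}

// walk calls f on n and descends into n's children only if f returned true.
func walk(n *node, f func(n *node) bool) {
	if n == nil || !f(n) {
		return
	}
	for c := n.FirstChild; c != nil; c = c.NextSibling {
		walk(c, f)
	}
}

// visited returns the Data of every node the walk reaches, without
// descending below any node whose Data equals prune.
func visited(root *node, prune string) []string {
	var out []string
	walk(root, func(n *node) bool {
		out = append(out, n.Data)
		return n.Data != prune
	})
	return out
}

// sampleTree builds body > (table > tr > td, p).
func sampleTree() *node {
	td := &node{Data: "td"}
	tr := &node{Data: "tr", FirstChild: td}
	p := &node{Data: "p"}
	table := &node{Data: "table", FirstChild: tr, NextSibling: p}
	return &node{Data: "body", FirstChild: table}
}

func main() {
	// Returning false at "table" skips tr and td but still reaches p.
	fmt.Println(visited(sampleTree(), "table")) // prints [body table p]
}
```

Returning false from the callback skips only the subtree below that node; siblings of the pruned node are still visited.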

Types ¶

type HtmlTable ¶

type HtmlTable struct {
	Headers   []string   // Headers, equal to TableData[0, :] in numpy expression
	Index     []string   // Index, equal to TableData[:, 0] in numpy expression
	TableData [][]string // All data excluding headers and index
	// contains filtered or unexported fields
}

HtmlTable represents an HTML table as a struct. It contains only text content.

func ParseHtmlTable ¶

func ParseHtmlTable(tableNode *html.Node, hasHeaderRow bool, hasIndexColumn bool, suffix string) (*HtmlTable, error)

ParseHtmlTable parses a given html.Node, which should point to a <table> ElementNode in an HTML tree, into an HtmlTable struct that can be used to easily look up existing indices, headers, and values. Content is set after normalizing with the identity normalizer func, normalizer(s) = s. Keys that appear multiple times get '{suffix}_{keyCount}' appended to make them unique; the first occurrence is left unchanged.
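
The key de-duplication rule can be illustrated with the following sketch. It is not taken from the package: `uniqueKeys` is a hypothetical helper, and the numbering is an assumption (the second occurrence gets count 1, the third gets 2, and so on), since the documentation only states that the first occurrence is left unchanged.

```go
package main

import "fmt"

// uniqueKeys returns a copy of keys in which repeated keys have
// suffix + "_" + count appended. Assumption: the second occurrence is
// numbered 1, the third 2, and so on; the first occurrence is unchanged.
func uniqueKeys(keys []string, suffix string) []string {
	seen := make(map[string]int, len(keys))
	out := make([]string, len(keys))
	for i, k := range keys {
		if n := seen[k]; n > 0 {
			out[i] = fmt.Sprintf("%s%s_%d", k, suffix, n)
		} else {
			out[i] = k
		}
		seen[k]++
	}
	return out
}

func main() {
	headers := []string{"Name", "Price", "Price", "Price"}
	fmt.Println(uniqueKeys(headers, "")) // prints [Name Price Price_1 Price_2]
}
```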

func ParseHtmlTableWithNormalizer ¶

func ParseHtmlTableWithNormalizer(tableNode *html.Node, hasHeaderRow bool, hasIndexColumn bool, suffix string, normalizerFunc func(string) string, allowCompositeTexts bool, compositeDelimiter string) (*HtmlTable, error)

ParseHtmlTableWithNormalizer parses a given html.Node, which should point to a <table> ElementNode in an HTML tree, into an HtmlTable struct that can be used to easily look up existing indices, headers, and values. Content is set after normalizing with normalizerFunc. Keys that appear multiple times get '{suffix}_{keyCount}' appended to make them unique; the first occurrence is left unchanged. TODO: describe the meaning of the allowCompositeTexts and compositeDelimiter parameters.

func (HtmlTable) GetColumnByIndex ¶

func (ht HtmlTable) GetColumnByIndex(j int) ([]string, string)

GetColumnByIndex is analogous to GetRowByIndex, but for columns. The number of columns can be checked via the length of Headers. GetColumnByIndex(0) returns the index column.

func (HtmlTable) GetColumnByKey ¶

func (ht HtmlTable) GetColumnByKey(key string) ([]string, int, bool)

GetColumnByKey is analogous to GetRowByKey, but for columns.

func (HtmlTable) GetColumnByKeyNum ¶

func (ht HtmlTable) GetColumnByKeyNum(key string, occurrence int) ([]string, int, bool)

GetColumnByKeyNum returns the column for the given occurrence of the original key, which may appear multiple times.

func (HtmlTable) GetElementByIndex ¶

func (ht HtmlTable) GetElementByIndex(i, j int) string

GetElementByIndex returns the element in the table data at row i and column j. It panics if either index is out of bounds.

func (HtmlTable) GetElementByKeys ¶

func (ht HtmlTable) GetElementByKeys(rowKey, columnKey string) (string, int, int, bool)

GetElementByKeys returns the element in the table data with the provided row key and column key. If at least one key is missing, the returned string is "" and the returned bool is false.

func (HtmlTable) GetElementByKeysNum ¶

func (ht HtmlTable) GetElementByKeysNum(rowKey, columnKey string, rowOccurrence, columnOccurrence int) (string, int, int, bool)

GetElementByKeysNum returns the element in the table data with the provided row key and column key and the corresponding occurrences. If at least one key is missing, the returned string is "" and the returned bool is false.

func (HtmlTable) GetRowByIndex ¶

func (ht HtmlTable) GetRowByIndex(i int) ([]string, string)

GetRowByIndex returns a copy of the table row with index i, as well as the key of the corresponding index. It panics if the row is out of bounds; the number of rows can be checked via the length of Index. GetRowByIndex(0) returns the header row, GetRowByIndex(1) returns the first row below the header row, and so on. Note: there is always a header row. Even if no header row was specified during parsing, the resulting table has an artificial header row like (Index 1 2 3 4 ...).

func (HtmlTable) GetRowByKey ¶

func (ht HtmlTable) GetRowByKey(key string) ([]string, int, bool)

GetRowByKey returns a copy of the row with the given key as index if it exists. Otherwise, the returned slice is nil and the returned bool is false.

func (HtmlTable) GetRowByKeyNum ¶

func (ht HtmlTable) GetRowByKeyNum(key string, occurrence int) ([]string, int, bool)

GetRowByKeyNum returns the row for the given occurrence of the original key, which may appear multiple times.

Source Files ¶

html.go