parser

package
v0.0.0-...-f41957a Latest
Warning

This package is not in the latest version of its module.

Published: Nov 21, 2021 License: MIT Imports: 6 Imported by: 0

Documentation

Types

type HTML

type HTML struct{}

HTML is an HTML parser implementation.

func (*HTML) FindAttrMap

func (h *HTML) FindAttrMap(r io.Reader, q QueryAttrMap, res crawler.QueryResult) error

FindAttrMap parses an HTML document, runs every query in the QueryAttrMap against it, and retrieves the mapped attribute from each element found. The queries and their attributes are stored in the QueryAttrMap; the results are stored in the QueryResult.

Example: err := p.FindAttrMap(body, QueryAttrMap{"div": "class"}, crawler.QueryResult{}).

func (*HTML) FindContent

func (h *HTML) FindContent(r io.Reader, query string) string

FindContent searches for an element that satisfies the query and returns the content of its text node. If several elements match, the first one is used.

type Parser

type Parser interface {
	// FindContent searches for only one element and returns its text content.
	FindContent(r io.Reader, query string) string

	// FindAttrMap searches by multiple queries and retrieves the corresponding
	// attributes of the elements found. The queries and their attributes are
	// stored in QueryAttrMap. The results are stored in QueryResult.
	FindAttrMap(io.Reader, QueryAttrMap, crawler.QueryResult) error
}

Parser describes a generic set of parser functions.

type QueryAttrMap

type QueryAttrMap map[string]string

QueryAttrMap maps each parser query to the attribute whose value should be retrieved. Example: QueryAttrMap{"div": "class"} finds the div elements and collects the class value of each.
