htmlquery

package
v0.0.3
Published: Jan 4, 2022 License: MIT Imports: 13 Imported by: 0

Documentation

Overview

Package htmlquery extracts data from HTML documents using XPath expressions.

Index

Constants

const (
	SPACES = iota
	COMMA
	UNIVERSAL
	TYPE
	ELEMENT
	CLASS
	ID
	LBRACKET
	RBRACKET
	AttrName
	AttrValue
	EQUALS
	ContainsClass
	DashPrefixed
	StartsWith
	EndsWith
	CONTAINS
	MatchOp
	PseudoClass
	FirstChild
	FirstOfType
	NthChild
	NthOfType
	OnlyChild
	OnlyOfType
	LastChild
	LastOfType
	NOT
	LPAREN
	RPAREN
	COEFFICIENT
	SIGNED
	UNSIGNED
	ODD
	EVEN
	N
	OPERATOR
	PLUS
	MINUS
	BINOMIAL
	AdjacentTo
	PRECEDES
	ParentOf
	AncestorOf
	// and a counter ... I can't believe I didn't think of this sooner
	NumLexemes
)

All lexeme types.
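The trailing `NumLexemes` entry relies on a common `iota` idiom: a constant placed last in the block takes a value equal to the number of entries before it. The sketch below illustrates the idiom with a small hypothetical `Token` enum (the names `Token`, `NumTokens`, and `names` are invented for this example and are not part of the package).

```go
package main

import "fmt"

// Token is a hypothetical enum type used only for this illustration.
type Token int

const (
	Space Token = iota
	Comma
	Star
	// NumTokens is not a real token. Because it is declared last,
	// its value equals the number of tokens declared above it.
	NumTokens
)

// names is sized by the counter, so it always has one slot per token.
var names = [NumTokens]string{Space: "space", Comma: "comma", Star: "star"}

// String returns a readable name, guarding against out-of-range values.
func (t Token) String() string {
	if t < 0 || t >= NumTokens {
		return "unknown"
	}
	return names[t]
}

func main() {
	fmt.Println(int(NumTokens)) // prints 3
	// The counter also serves as a loop bound over every token.
	for t := Token(0); t < NumTokens; t++ {
		fmt.Println(t)
	}
}
```

Using the counter as an array length or loop bound means the size never has to be updated by hand when a new constant is added before it.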

const (
	GLOBAL = iota
	LOCAL
)

All scopes.

Variables

var DisableSelectorCache = false

DisableSelectorCache disables caching of query selectors when set to true.

var SelectorCacheMaxEntries = 100

SelectorCacheMaxEntries sets the maximum number of selector objects that can be cached. The default is 100. Caching is disabled if SelectorCacheMaxEntries <= 0.
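How the package implements its cache is not shown on this page. As a rough illustration of how two such variables could govern a bounded cache, here is a minimal least-recently-used sketch built on the standard library. The `lruCache` type, the `compile` stand-in, and the `compiles` counter are all hypothetical; only the two variable names come from the documentation above.

```go
package main

import (
	"container/list"
	"fmt"
	"strings"
)

// These two variables mirror the documented package variables.
// Everything else in this sketch is hypothetical.
var (
	DisableSelectorCache    = false
	SelectorCacheMaxEntries = 100
)

type entry struct {
	key   string
	value string
}

// lruCache keeps the most recently used entries at the front of order.
type lruCache struct {
	order *list.List
	items map[string]*list.Element
}

func newLRU() *lruCache {
	return &lruCache{order: list.New(), items: map[string]*list.Element{}}
}

// compiles counts how often the expensive step actually runs.
var compiles int

// compile stands in for an expensive XPath compilation step.
func compile(expr string) string {
	compiles++
	return strings.TrimSpace(expr)
}

// get returns the compiled form of expr, consulting the cache unless
// caching is switched off by either variable.
func (c *lruCache) get(expr string) string {
	if DisableSelectorCache || SelectorCacheMaxEntries <= 0 {
		return compile(expr)
	}
	if el, ok := c.items[expr]; ok {
		c.order.MoveToFront(el)
		return el.Value.(*entry).value
	}
	v := compile(expr)
	c.items[expr] = c.order.PushFront(&entry{key: expr, value: v})
	// Evict least recently used entries beyond the configured limit.
	for c.order.Len() > SelectorCacheMaxEntries {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
	return v
}

func main() {
	c := newLRU()
	c.get("//a")
	c.get("//a") // served from the cache
	fmt.Println(compiles)

	DisableSelectorCache = true
	c.get("//a") // cache bypassed, compiled again
	fmt.Println(compiles)
}
```

Checking both variables on every lookup lets a caller toggle caching at runtime without rebuilding the cache.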

Functions

func CSS2Xpath

func CSS2Xpath(css string, scope Scope) string

CSS2Xpath converts a CSS selector to an XPath expression.
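The exact XPath that CSS2Xpath emits is not documented here. To give a feel for what a CSS-to-XPath translation involves, the toy converter below handles only tag names, `#id`, `.class`, and the descendant combinator. The function names `toyCSS2XPath` and `compound` are invented for this sketch, and the choice of a `.//` prefix for a local search versus `//` for a global one is an assumption about what the two scopes might mean, not a statement about the package's behavior.

```go
package main

import (
	"fmt"
	"strings"
)

// compound translates one compound selector such as "div.item#x"
// into a single XPath step. Hypothetical helper for illustration.
func compound(part string) string {
	// The text before the first '#' or '.' is the tag name.
	tag, rest := part, ""
	if i := strings.IndexAny(part, "#."); i >= 0 {
		tag, rest = part[:i], part[i:]
	}
	if tag == "" {
		tag = "*" // no tag name means "any element"
	}
	var b strings.Builder
	b.WriteString(tag)
	for rest != "" {
		kind := rest[0]
		rest = rest[1:]
		name := rest
		if j := strings.IndexAny(rest, "#."); j >= 0 {
			name, rest = rest[:j], rest[j:]
		} else {
			rest = ""
		}
		if kind == '#' {
			fmt.Fprintf(&b, "[@id='%s']", name)
		} else {
			// Match a whole class token, not a substring of another class.
			fmt.Fprintf(&b, "[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]", name)
		}
	}
	return b.String()
}

// toyCSS2XPath joins compound selectors with the descendant axis.
// The local flag is an assumed analogue of a LOCAL scope.
func toyCSS2XPath(css string, local bool) string {
	prefix := "//"
	if local {
		prefix = ".//"
	}
	var steps []string
	for _, part := range strings.Fields(css) {
		steps = append(steps, compound(part))
	}
	return prefix + strings.Join(steps, "//")
}

func main() {
	fmt.Println(toyCSS2XPath("div.item a", false))
	fmt.Println(toyCSS2XPath("#main", true))
}
```

A real converter also has to cover attribute operators, pseudo-classes such as `:nth-child`, and the child and sibling combinators, which is what the lexeme constants listed earlier suggest the package tokenizes.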

func Find

func Find(top *html.Node, expr string) []*html.Node

Find is like QueryAll but panics if the expression `expr` cannot be parsed.

See `QueryAll()` function.

func FindOne

func FindOne(top *html.Node, expr string) *html.Node

FindOne is like Query but panics if the expression `expr` cannot be parsed. See `Query()` function.
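Find and FindOne follow the same convention as `regexp.MustCompile` in the standard library: convenient when the expression is a literal known to be valid, risky when it comes from user input. The sketch below shows the relationship between the two styles using stand-in functions (`query`, `findOne`, and `safely` are hypothetical and only imitate the shape of the documented API).

```go
package main

import (
	"errors"
	"fmt"
)

// query is a stand-in for an error-returning lookup such as Query.
func query(expr string) (string, error) {
	if expr == "" {
		return "", errors.New("empty expression")
	}
	return "node for " + expr, nil
}

// findOne imitates the panicking variant: same lookup, but a parse
// failure becomes a panic instead of a returned error.
func findOne(expr string) string {
	v, err := query(expr)
	if err != nil {
		panic(err)
	}
	return v
}

// safely runs f and converts any panic back into an error, which is
// one way to guard a panicking call on untrusted input.
func safely(f func() string) (s string, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	return f(), nil
}

func main() {
	// A literal expression known to be valid: the panicking form is concise.
	fmt.Println(findOne("//title"))

	// An expression that fails to parse: recover or use the error form.
	_, err := safely(func() string { return findOne("") })
	fmt.Println(err)
}
```

When the expression is not a compile-time literal, calling the error-returning form directly is usually simpler than recovering from a panic.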

func InnerText

func InnerText(n *html.Node) string

InnerText returns the text between the start and end tags of the object.

func LoadDoc

func LoadDoc(path string) (*html.Node, error)

LoadDoc loads the HTML document from the specified file path.

func LoadURL

func LoadURL(url string) (*html.Node, error)

LoadURL loads the HTML document from the specified URL.

func OutputHTML

func OutputHTML(n *html.Node, self bool) string

OutputHTML returns the HTML text of the node, including tag names.

func Parse

func Parse(r io.Reader) (*html.Node, error)

Parse returns the parse tree for the HTML from the given Reader.

func Query

func Query(top *html.Node, expr string) (*html.Node, error)

Query searches for html.Node values that match the specified XPath expr and returns the first match.

Returns an error if the expression `expr` cannot be parsed.

func QueryAll

func QueryAll(top *html.Node, expr string) ([]*html.Node, error)

QueryAll searches for all html.Node values that match the specified XPath expr. Returns an error if the expression `expr` cannot be parsed.

func QuerySelector

func QuerySelector(top *html.Node, selector *xpath.Expr) *html.Node

QuerySelector returns the first html.Node matched by the specified XPath selector.

func QuerySelectorAll

func QuerySelectorAll(top *html.Node, selector *xpath.Expr) []*html.Node

QuerySelectorAll returns all html.Node values that match the specified XPath selector.

func SelectAttr

func SelectAttr(n *html.Node, name string) (val string)

SelectAttr returns the attribute value with the specified name.

Types

type Lexeme

type Lexeme int

Lexeme Lexeme

type NodeNavigator

type NodeNavigator struct {
	// contains filtered or unexported fields
}

func CreateXPathNavigator

func CreateXPathNavigator(top *html.Node) *NodeNavigator

CreateXPathNavigator creates a new xpath.NodeNavigator for the specified html.Node.

func (*NodeNavigator) Copy

func (h *NodeNavigator) Copy() xpath.NodeNavigator

func (*NodeNavigator) Current

func (h *NodeNavigator) Current() *html.Node

func (*NodeNavigator) LocalName

func (h *NodeNavigator) LocalName() string

func (*NodeNavigator) MoveTo

func (h *NodeNavigator) MoveTo(other xpath.NodeNavigator) bool

func (*NodeNavigator) MoveToChild

func (h *NodeNavigator) MoveToChild() bool

func (*NodeNavigator) MoveToFirst

func (h *NodeNavigator) MoveToFirst() bool

func (*NodeNavigator) MoveToNext

func (h *NodeNavigator) MoveToNext() bool

func (*NodeNavigator) MoveToNextAttribute

func (h *NodeNavigator) MoveToNextAttribute() bool

func (*NodeNavigator) MoveToParent

func (h *NodeNavigator) MoveToParent() bool

func (*NodeNavigator) MoveToPrevious

func (h *NodeNavigator) MoveToPrevious() bool

func (*NodeNavigator) MoveToRoot

func (h *NodeNavigator) MoveToRoot()

func (*NodeNavigator) NodeType

func (h *NodeNavigator) NodeType() xpath.NodeType

func (*NodeNavigator) Prefix

func (*NodeNavigator) Prefix() string

func (*NodeNavigator) String

func (h *NodeNavigator) String() string

func (*NodeNavigator) Value

func (h *NodeNavigator) Value() string

type Scope

type Scope int

Scope Scope

type Selector

type Selector struct {
	Node   *html.Node
	IsRoot bool
}

Selector Selector

func NewSelector

func NewSelector(content []byte) *Selector

NewSelector creates a Selector from bytes.

func (*Selector) Attr

func (s *Selector) Attr(name string) string

Attr returns the node's attribute.

func (*Selector) CSS

func (s *Selector) CSS(css string) Selectors

CSS selects nodes using a CSS selector.

func (*Selector) HTML

func (s *Selector) HTML() string

HTML returns the node's complete HTML.

func (*Selector) InnerHTML

func (s *Selector) InnerHTML() string

InnerHTML returns the HTML inside the node.

func (*Selector) Text

func (s *Selector) Text() string

Text returns all text inside the node.

func (*Selector) Xpath

func (s *Selector) Xpath(path string) Selectors

Xpath selects nodes using an XPath expression.

type Selectors

type Selectors []*Selector

Selectors is a slice of Selector.

func (Selectors) Attrs

func (ss Selectors) Attrs(name string) []string

Attrs returns the node attributes.

func (Selectors) HTMLs

func (ss Selectors) HTMLs() []string

HTMLs returns the complete HTML of the nodes.

func (Selectors) InnerHTMLs

func (ss Selectors) InnerHTMLs() []string

InnerHTMLs returns the HTML inside the nodes.

func (Selectors) Texts

func (ss Selectors) Texts() []string

Texts returns the list of texts of all Selectors.
