scrappy

package module
v0.0.0-...-4fc3b5b
Published: Mar 27, 2018 License: GPL-3.0 Imports: 5 Imported by: 0

README


Fast and high-level web scraper


Quickstart

go get github.com/oxequa/scrappy

Documentation

You can read the full documentation of Scrappy here.

Contributing

Please read our guideline here.

License

Scrappy is licensed under the GNU GENERAL PUBLIC LICENSE V3.

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type A

type A struct {
	*Scrappy
}

A (All) groups the methods that return every matching occurrence.

func (*A) Breadth

func (a *A) Breadth(node *html.Node, filters ...FilterFunc) []*html.Node

Breadth returns the matching nodes using a breadth-first search.

func (*A) Child

func (a *A) Child(node *html.Node, filters ...FilterFunc) []*html.Node

Child returns the child nodes that match the given filters.

func (*A) Depth

func (a *A) Depth(node *html.Node, filters ...FilterFunc) []*html.Node

Depth returns the matching nodes using a depth-first search.
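The difference between the two traversal orders can be sketched with the standard library alone. The block below is not Scrappy's implementation: `Node` is a simplified stand-in for `html.Node`, and `depthAll`, `breadthAll`, `matches`, `ids` and `sampleTree` are hypothetical names. It also assumes that a node must pass every filter and that the starting node itself is a candidate, neither of which the documentation states.

```go
package main

import (
	"fmt"
	"strings"
)

// Node is a simplified stand-in for html.Node, keeping only what traversal needs.
type Node struct {
	Tag, ID     string
	FirstChild  *Node
	NextSibling *Node
}

// FilterFunc mirrors the documented signature over the stand-in Node.
type FilterFunc func(n *Node) bool

// matches reports whether n passes every filter (AND semantics are assumed).
func matches(n *Node, filters []FilterFunc) bool {
	for _, f := range filters {
		if !f(n) {
			return false
		}
	}
	return true
}

// depthAll collects matches in depth-first (pre-order) order.
func depthAll(n *Node, filters ...FilterFunc) []*Node {
	if n == nil {
		return nil
	}
	var out []*Node
	if matches(n, filters) {
		out = append(out, n)
	}
	for c := n.FirstChild; c != nil; c = c.NextSibling {
		out = append(out, depthAll(c, filters...)...)
	}
	return out
}

// breadthAll collects matches level by level, using a FIFO queue.
func breadthAll(root *Node, filters ...FilterFunc) []*Node {
	if root == nil {
		return nil
	}
	var out []*Node
	queue := []*Node{root}
	for len(queue) > 0 {
		n := queue[0]
		queue = queue[1:]
		if matches(n, filters) {
			out = append(out, n)
		}
		for c := n.FirstChild; c != nil; c = c.NextSibling {
			queue = append(queue, c)
		}
	}
	return out
}

// ids joins node IDs so the visiting order is easy to compare.
func ids(nodes []*Node) string {
	parts := make([]string, 0, len(nodes))
	for _, n := range nodes {
		parts = append(parts, n.ID)
	}
	return strings.Join(parts, ",")
}

// sampleTree builds: div#root > (section#s > a#deep), a#shallow
func sampleTree() *Node {
	deep := &Node{Tag: "a", ID: "deep"}
	shallow := &Node{Tag: "a", ID: "shallow"}
	section := &Node{Tag: "section", ID: "s", FirstChild: deep, NextSibling: shallow}
	return &Node{Tag: "div", ID: "root", FirstChild: section}
}

func main() {
	isLink := func(n *Node) bool { return n.Tag == "a" }
	root := sampleTree()
	fmt.Println(ids(depthAll(root, isLink)))   // deep,shallow
	fmt.Println(ids(breadthAll(root, isLink))) // shallow,deep
}
```

Both functions return the same set of nodes; only the order differs, which matters when the caller relies on the position of a result in the slice.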

func (*A) NextSibling

func (a *A) NextSibling(root *html.Node, filters ...FilterFunc) []*html.Node

NextSibling returns the next sibling nodes that match the given filters.

func (*A) Parent

func (a *A) Parent(node *html.Node, filters ...FilterFunc) []*html.Node

Parent returns the parent nodes that match the given filters.

func (*A) PrevSibling

func (a *A) PrevSibling(node *html.Node, filters ...FilterFunc) []*html.Node

PrevSibling returns the previous sibling nodes that match the given filters.
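A sibling walk of this kind might look like the following sketch. It uses a stand-in `Node` with only sibling links instead of `html.Node`; `nextSiblingsAll`, `prevSiblingsAll`, `matches` and `link` are hypothetical helpers, and excluding the starting node from the result is an assumption.

```go
package main

import "fmt"

// Node is a simplified stand-in for html.Node with sibling links only.
type Node struct {
	Tag, Class               string
	PrevSibling, NextSibling *Node
}

// FilterFunc mirrors the documented signature over the stand-in Node.
type FilterFunc func(n *Node) bool

// matches reports whether n passes every filter (AND semantics are assumed).
func matches(n *Node, filters []FilterFunc) bool {
	for _, f := range filters {
		if !f(n) {
			return false
		}
	}
	return true
}

// nextSiblingsAll walks forward from n (exclusive) and keeps every match.
func nextSiblingsAll(n *Node, filters ...FilterFunc) []*Node {
	var out []*Node
	if n == nil {
		return out
	}
	for s := n.NextSibling; s != nil; s = s.NextSibling {
		if matches(s, filters) {
			out = append(out, s)
		}
	}
	return out
}

// prevSiblingsAll walks backward from n (exclusive) and keeps every match.
func prevSiblingsAll(n *Node, filters ...FilterFunc) []*Node {
	var out []*Node
	if n == nil {
		return out
	}
	for s := n.PrevSibling; s != nil; s = s.PrevSibling {
		if matches(s, filters) {
			out = append(out, s)
		}
	}
	return out
}

// link chains the nodes into a doubly linked sibling list and returns them.
func link(nodes ...*Node) []*Node {
	for i := 1; i < len(nodes); i++ {
		nodes[i-1].NextSibling = nodes[i]
		nodes[i].PrevSibling = nodes[i-1]
	}
	return nodes
}

func main() {
	items := link(
		&Node{Tag: "li", Class: "odd"},
		&Node{Tag: "li", Class: "even"},
		&Node{Tag: "li", Class: "odd"},
		&Node{Tag: "li", Class: "even"},
	)
	isEven := func(n *Node) bool { return n.Class == "even" }
	fmt.Println(len(nextSiblingsAll(items[0], isEven))) // 2
	fmt.Println(len(prevSiblingsAll(items[3], isEven))) // 1
}
```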

type F

type F struct {
	*Scrappy
	// contains filtered or unexported fields
}

F (First) groups the methods that return only one occurrence.

func (*F) Breadth

func (f *F) Breadth(node *html.Node, filters ...FilterFunc) *html.Node

Breadth returns a single node using a breadth-first search, scanning all nodes.

func (*F) Depth

func (f *F) Depth(node *html.Node, filters ...FilterFunc) *html.Node

Depth returns a single node using a depth-first search, scanning all nodes.
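A single-result search can be sketched as a depth-first walk that stops at the first hit. This is an illustration under assumptions, not the library's code: `Node` stands in for `html.Node`, `depthFirst`, `matches` and `sampleTree` are hypothetical names, and returning `nil` when nothing matches is assumed rather than documented.

```go
package main

import "fmt"

// Node is a simplified stand-in for html.Node, keeping only what traversal needs.
type Node struct {
	Tag, ID     string
	FirstChild  *Node
	NextSibling *Node
}

// FilterFunc mirrors the documented signature over the stand-in Node.
type FilterFunc func(n *Node) bool

// matches reports whether n passes every filter (AND semantics are assumed).
func matches(n *Node, filters []FilterFunc) bool {
	for _, f := range filters {
		if !f(n) {
			return false
		}
	}
	return true
}

// depthFirst returns the first node in pre-order that passes every filter,
// or nil when no node does.
func depthFirst(n *Node, filters ...FilterFunc) *Node {
	if n == nil {
		return nil
	}
	if matches(n, filters) {
		return n
	}
	for c := n.FirstChild; c != nil; c = c.NextSibling {
		if found := depthFirst(c, filters...); found != nil {
			return found
		}
	}
	return nil
}

// sampleTree builds: div#root > (section#s > a#deep), a#shallow
func sampleTree() *Node {
	deep := &Node{Tag: "a", ID: "deep"}
	shallow := &Node{Tag: "a", ID: "shallow"}
	section := &Node{Tag: "section", ID: "s", FirstChild: deep, NextSibling: shallow}
	return &Node{Tag: "div", ID: "root", FirstChild: section}
}

func main() {
	isLink := func(n *Node) bool { return n.Tag == "a" }
	isTable := func(n *Node) bool { return n.Tag == "table" }

	fmt.Println(depthFirst(sampleTree(), isLink).ID)      // deep
	fmt.Println(depthFirst(sampleTree(), isTable) == nil) // true
}
```

Because the result may be `nil`, callers of a function shaped like this should check it before dereferencing.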

func (*F) FirstChild

func (f *F) FirstChild(node *html.Node, filters ...FilterFunc) *html.Node

FirstChild returns the first child node that matches.

func (*F) FirstSibling

func (f *F) FirstSibling(node *html.Node, filters ...FilterFunc) (result *html.Node)

FirstSibling returns the first sibling node that matches.

func (*F) Index

func (f *F) Index(index int) *F

func (*F) LastChild

func (f *F) LastChild(node *html.Node, filters ...FilterFunc) *html.Node

LastChild returns the last child; it is equivalent to taking the last sibling of the first child.

func (*F) LastSibling

func (f *F) LastSibling(node *html.Node, filters ...FilterFunc) (result *html.Node)

LastSibling returns the last sibling node that matches.

func (*F) NextSibling

func (f *F) NextSibling(node *html.Node, filters ...FilterFunc) *html.Node

NextSibling returns the next sibling that matches.

func (*F) Parent

func (f *F) Parent(root *html.Node, filters ...FilterFunc) *html.Node

Parent returns the first parent node that matches.

func (*F) PrevSibling

func (f *F) PrevSibling(node *html.Node, filters ...FilterFunc) *html.Node

PrevSibling returns the previous sibling that matches.

type FilterFunc

type FilterFunc func(node *html.Node) bool

FilterFunc is the general definition of a node filter
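Since a filter is just a function from a node to a bool, custom filters can be written as ordinary functions and combined with small wrappers. The sketch below illustrates the idea over a stand-in `Node` type rather than `html.Node`; `isLeaf`, `not` and `keep` are hypothetical helpers that do not come from the package.

```go
package main

import "fmt"

// Node is a simplified stand-in for html.Node.
type Node struct {
	Tag         string
	FirstChild  *Node
	NextSibling *Node
}

// FilterFunc mirrors the documented signature over the stand-in Node.
type FilterFunc func(n *Node) bool

// isLeaf is a custom filter: it accepts nodes that have no children.
func isLeaf(n *Node) bool { return n.FirstChild == nil }

// not wraps a filter and inverts its result.
func not(f FilterFunc) FilterFunc {
	return func(n *Node) bool { return !f(n) }
}

// keep returns the nodes accepted by f, preserving their order.
func keep(nodes []*Node, f FilterFunc) []*Node {
	var out []*Node
	for _, n := range nodes {
		if f(n) {
			out = append(out, n)
		}
	}
	return out
}

func main() {
	leaf := &Node{Tag: "span"}
	parent := &Node{Tag: "div", FirstChild: leaf}
	nodes := []*Node{parent, leaf}

	fmt.Println(keep(nodes, isLeaf)[0].Tag)      // span
	fmt.Println(keep(nodes, not(isLeaf))[0].Tag) // div
}
```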

func Attr

func Attr(val string) FilterFunc

Attr returns a FilterFunc that matches a node against the given string.

func AttrVal

func AttrVal(attr string, val string) FilterFunc

AttrVal returns a FilterFunc that matches a node against the given attribute/value pair.
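Filter constructors of this shape are typically closures that capture their arguments. The following sketch assumes that an attribute filter compares against the attribute key and that an attribute/value filter requires both to be equal; `hasAttr` and `hasAttrVal` are hypothetical names, and `Node` and `Attribute` are stand-ins for the `html` package types.

```go
package main

import "fmt"

// Attribute is a simplified stand-in for html.Attribute.
type Attribute struct {
	Key, Val string
}

// Node is a simplified stand-in for html.Node.
type Node struct {
	Tag  string
	Attr []Attribute
}

// FilterFunc mirrors the documented signature over the stand-in Node.
type FilterFunc func(n *Node) bool

// hasAttr builds a filter that accepts nodes carrying an attribute named key.
func hasAttr(key string) FilterFunc {
	return func(n *Node) bool {
		for _, a := range n.Attr {
			if a.Key == key {
				return true
			}
		}
		return false
	}
}

// hasAttrVal builds a filter that accepts nodes where attribute key equals val.
func hasAttrVal(key, val string) FilterFunc {
	return func(n *Node) bool {
		for _, a := range n.Attr {
			if a.Key == key && a.Val == val {
				return true
			}
		}
		return false
	}
}

func main() {
	link := &Node{Tag: "a", Attr: []Attribute{
		{Key: "href", Val: "/docs"},
		{Key: "class", Val: "nav"},
	}}

	fmt.Println(hasAttr("href")(link))               // true
	fmt.Println(hasAttrVal("class", "nav")(link))    // true
	fmt.Println(hasAttrVal("class", "footer")(link)) // false
}
```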

func ContainsAttr

func ContainsAttr(val string) FilterFunc

ContainsAttr returns a FilterFunc that matches a node with an attribute that contains the given string.

func ContainsTag

func ContainsTag(val string) FilterFunc

ContainsTag returns a FilterFunc that matches a node with a tag that contains the given string.

func ContainsText

func ContainsText(val string) FilterFunc

ContainsText returns a FilterFunc that matches a node that contains the given string.
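The exact and Contains variants presumably differ in equality versus substring matching. A minimal sketch of that distinction follows, assuming case-sensitive comparison and a stand-in `Node` that stores its text directly; `textEquals` and `textContains` are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// Node is a simplified stand-in for a text-bearing html.Node.
type Node struct {
	Text string
}

// FilterFunc mirrors the documented signature over the stand-in Node.
type FilterFunc func(n *Node) bool

// textEquals accepts nodes whose text is exactly val.
func textEquals(val string) FilterFunc {
	return func(n *Node) bool { return n.Text == val }
}

// textContains accepts nodes whose text includes val as a substring.
func textContains(val string) FilterFunc {
	return func(n *Node) bool { return strings.Contains(n.Text, val) }
}

func main() {
	n := &Node{Text: "Fast and high-level web scraper"}

	fmt.Println(textEquals("web scraper")(n))   // false
	fmt.Println(textContains("web scraper")(n)) // true
}
```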

func ContainsValue

func ContainsValue(val string) FilterFunc

ContainsValue returns a FilterFunc that matches a node with an attribute value that contains the given string.

func Tag

func Tag(val string) FilterFunc

Tag returns a FilterFunc that matches a node against the given string.

func Text

func Text(val string) FilterFunc

Text returns a FilterFunc that matches a node against the given string.

func Value

func Value(val string) FilterFunc

Value returns a FilterFunc that matches a node against the given string.

type Scrappy

type Scrappy struct {
	*A
	*F
	// contains filtered or unexported fields
}

Scrappy is the main struct of the library.

func New

func New() *Scrappy

New returns a blank Scrappy instance.

func (*Scrappy) Deep

func (s *Scrappy) Deep(val int) *Scrappy

Deep sets the deep option and returns a new, isolated Scrappy.

func (*Scrappy) Get

func (s *Scrappy) Get(url string) (*html.Node, error)

Get returns the content of the given URL.

func (*Scrappy) Nest

func (s *Scrappy) Nest() *Scrappy

Nest sets the nested option and returns a new, isolated Scrappy.
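Returning "a new isolated" instance suggests a copy-on-configure pattern, where each option call leaves the receiver untouched. The sketch below shows that pattern in general terms; the `Scraper` type and its `depth` and `nested` fields are hypothetical and say nothing about how Scrappy stores or interprets these options.

```go
package main

import "fmt"

// Scraper is a hypothetical type used only to illustrate copy-on-configure.
type Scraper struct {
	depth  int
	nested bool
}

// Deep returns a copy with the depth option set; the receiver is unchanged.
func (s *Scraper) Deep(val int) *Scraper {
	c := *s // shallow copy of the option fields
	c.depth = val
	return &c
}

// Nest returns a copy with the nested option enabled; the receiver is unchanged.
func (s *Scraper) Nest() *Scraper {
	c := *s
	c.nested = true
	return &c
}

func main() {
	base := &Scraper{}
	tuned := base.Deep(3).Nest()

	fmt.Println(base.depth, base.nested)   // 0 false
	fmt.Println(tuned.depth, tuned.nested) // 3 true
}
```

With this shape, a configured value can be shared between call sites without one caller's options leaking into another's.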

func (*Scrappy) Parse

func (s *Scrappy) Parse(reader io.Reader) (*html.Node, error)

Parse can be used with any reader

func (*Scrappy) Proxy

func (s *Scrappy) Proxy(proxy string) error

Proxy sets a proxy for all requests.
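With the standard library, routing every request through a proxy usually means parsing the proxy URL and attaching it to the client's transport. The sketch below shows that approach; `newProxiedClient` and `proxyFor` are hypothetical helpers, and whether Scrappy configures its client this way is an assumption. The error return mirrors the documented signature by rejecting a string that is not a usable URL.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

// newProxiedClient builds an http.Client whose requests all go through proxy.
func newProxiedClient(proxy string) (*http.Client, error) {
	u, err := url.Parse(proxy)
	if err != nil {
		return nil, err
	}
	if u.Scheme == "" || u.Host == "" {
		return nil, fmt.Errorf("invalid proxy URL %q", proxy)
	}
	return &http.Client{
		Transport: &http.Transport{Proxy: http.ProxyURL(u)},
	}, nil
}

// proxyFor reports which proxy the client would use for target,
// without sending any request.
func proxyFor(c *http.Client, target string) (string, error) {
	req, err := http.NewRequest(http.MethodGet, target, nil)
	if err != nil {
		return "", err
	}
	tr, ok := c.Transport.(*http.Transport)
	if !ok || tr.Proxy == nil {
		return "", fmt.Errorf("client has no proxy configured")
	}
	u, err := tr.Proxy(req)
	if err != nil {
		return "", err
	}
	if u == nil {
		return "", nil
	}
	return u.String(), nil
}

func main() {
	client, err := newProxiedClient("http://127.0.0.1:8080")
	if err != nil {
		fmt.Println("proxy error:", err)
		return
	}
	used, err := proxyFor(client, "http://example.com/")
	fmt.Println(used, err)
}
```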

func (*Scrappy) Validate

func (s *Scrappy) Validate(node *html.Node, filters ...FilterFunc) bool

Validate validates a node against a list of filters.
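One plausible reading is that a node is valid only when every filter accepts it. The sketch below implements that reading over stand-in types; `validate`, `tag` and `attrVal` are hypothetical names, and both the AND semantics and the treatment of an empty filter list as valid are assumptions.

```go
package main

import "fmt"

// Attribute is a simplified stand-in for html.Attribute.
type Attribute struct {
	Key, Val string
}

// Node is a simplified stand-in for html.Node.
type Node struct {
	Tag  string
	Attr []Attribute
}

// FilterFunc mirrors the documented signature over the stand-in Node.
type FilterFunc func(n *Node) bool

// tag builds a filter that accepts nodes whose tag equals val.
func tag(val string) FilterFunc {
	return func(n *Node) bool { return n.Tag == val }
}

// attrVal builds a filter that accepts nodes where attribute key equals val.
func attrVal(key, val string) FilterFunc {
	return func(n *Node) bool {
		for _, a := range n.Attr {
			if a.Key == key && a.Val == val {
				return true
			}
		}
		return false
	}
}

// validate reports whether n is non-nil and passes every filter.
// With no filters, any non-nil node is considered valid.
func validate(n *Node, filters ...FilterFunc) bool {
	if n == nil {
		return false
	}
	for _, f := range filters {
		if !f(n) {
			return false
		}
	}
	return true
}

func main() {
	link := &Node{Tag: "a", Attr: []Attribute{{Key: "rel", Val: "nofollow"}}}

	fmt.Println(validate(link, tag("a"), attrVal("rel", "nofollow"))) // true
	fmt.Println(validate(link, tag("a"), attrVal("rel", "next")))     // false
}
```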
