extractor

package
v0.0.0-...-9ce7f06 Latest
Warning

This package is not in the latest version of its module.

Published: Aug 7, 2019 License: MIT Imports: 8 Imported by: 0

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

func GetImg

func GetImg(t html.Token, tokenType html.TokenType) string

GetImg is an `extractor.CheckFunc` used to retrieve image URLs from a web page. It analyses the token `t` together with its `tokenType` and returns the image URL, or an empty `string` if `t` does not correspond to an image.

func GetLinkBasic

func GetLinkBasic(t html.Token, tokenType html.TokenType) string

GetLinkBasic is an `extractor.CheckFunc` used to retrieve link URLs from a web page. It analyses the token `t` together with its `tokenType` and returns the link value, or an empty `string` if `t` does not correspond to a link.

NOTE: This function ignores the `nofollow` meta tag.

func GetLinkNoFollow

func GetLinkNoFollow(t html.Token, tokenType html.TokenType) string

GetLinkNoFollow is an `extractor.CheckFunc` used to retrieve link URLs from a web page. It analyses the token `t` together with its `tokenType` and returns the link value, or an empty `string` if `t` does not correspond to a link.

NOTE: This function respects the `nofollow` meta tag.

Types

type CheckFunc

type CheckFunc func(html.Token, html.TokenType) string

CheckFunc is a named type representing a function that checks if an `html.Token` has a link that can be crawled.
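
To illustrate the shape of a `CheckFunc`, here is a minimal sketch of a custom check. It assumes the `html` package in the signature is `golang.org/x/net/html`. So that the sketch runs with the standard library alone, `Token`, `Attribute` and `TokenType` are local stand-ins that mirror only the fields used here, and `getScriptSrc` is a hypothetical function that is not part of this package.

```go
package main

import "fmt"

// TokenType, Attribute and Token are local stand-ins mirroring the parts of
// golang.org/x/net/html used below. They are not the real types.
type TokenType uint32

const (
	ErrorToken TokenType = iota
	TextToken
	StartTagToken
	EndTagToken
	SelfClosingTagToken
)

type Attribute struct {
	Key, Val string
}

type Token struct {
	Data string // tag name for tag tokens
	Attr []Attribute
}

// CheckFunc has the same shape as extractor.CheckFunc, over the stand-in types.
type CheckFunc func(Token, TokenType) string

// getScriptSrc is a hypothetical check: it returns the src attribute of a
// <script> start or self-closing tag, or "" for any other token.
func getScriptSrc(t Token, tokenType TokenType) string {
	if tokenType != StartTagToken && tokenType != SelfClosingTagToken {
		return ""
	}
	if t.Data != "script" {
		return ""
	}
	for _, a := range t.Attr {
		if a.Key == "src" {
			return a.Val
		}
	}
	return ""
}

func main() {
	var check CheckFunc = getScriptSrc
	tok := Token{Data: "script", Attr: []Attribute{{Key: "src", Val: "/app.js"}}}
	fmt.Println(check(tok, StartTagToken))
	fmt.Println(check(tok, EndTagToken) == "")
}
```

With the real types, a function of this shape could presumably be passed to `NewExtractor` alongside the checks this package provides.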

type Extractor

type Extractor struct {
	// contains filtered or unexported fields
}

Extractor is a `struct` that extracts links found in a web page according to the results of its inner `CheckFunc` functions.

func NewExtractor

func NewExtractor(checkFuncs ...CheckFunc) *Extractor

NewExtractor returns a new `*extractor.Extractor` that uses `checkFuncs` as its inner `CheckFunc` functions.

func (*Extractor) ExtractLinks

func (e *Extractor) ExtractLinks(baseURL string, content []byte) []string

ExtractLinks extracts, cleans and returns a `[]string` of the links found in `content` that match any of the `CheckFunc` functions held in `e.cf`.
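
This documentation does not say what cleaning involves. As one plausible reading, the sketch below resolves each raw link against `baseURL`, strips fragments and removes duplicates. The `cleanLinks` helper is hypothetical and only shows how such a step might look with `net/url`; the package itself may behave differently.

```go
package main

import (
	"fmt"
	"net/url"
)

// cleanLinks is a hypothetical helper, not part of the extractor package.
// It resolves each raw link against baseURL, drops the fragment, removes
// duplicates and skips links that fail to parse.
func cleanLinks(baseURL string, raw []string) []string {
	base, err := url.Parse(baseURL)
	if err != nil {
		return nil
	}
	seen := make(map[string]bool)
	var out []string
	for _, r := range raw {
		ref, err := url.Parse(r)
		if err != nil {
			continue
		}
		abs := base.ResolveReference(ref)
		abs.Fragment = "" // "page.html#top" and "page.html" become the same link
		s := abs.String()
		if !seen[s] {
			seen[s] = true
			out = append(out, s)
		}
	}
	return out
}

func main() {
	raw := []string{"page.html", "/about", "page.html#top"}
	for _, l := range cleanLinks("https://example.com/docs/", raw) {
		fmt.Println(l)
	}
}
```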

func (*Extractor) Pipe

func (e *Extractor) Pipe(wg *sync.WaitGroup, in <-chan *domain.Target, out chan<- *domain.Target)

Pipe connects `in` and `out` together. Each `*domain.Target` received from `in` is parsed, and the links extracted from it are sent to `out`.

NOTE: This function loops over `in` until it is closed, then closes `out`.
