parser

Version: v0.0.0-...-04f6dc1
Warning

This package is not in the latest version of its module.

Published: Oct 16, 2017 License: MIT Imports: 4 Imported by: 0

Documentation


Constants

const (
	TagA      = "a"
	TagLink   = "link"
	TagImg    = "img"
	TagScript = "script"
)

HTML tags whose URL attributes we collect.

const (
	AttrHref = "href"
	AttrSrc  = "src"
)

Attribute names we look for on those tags.

Variables

var ByToken = Func(func(body []byte) (Results, error) {
	tokenizer := html.NewTokenizer(bytes.NewReader(body))
	results := Results{}
	for {
		tokenType := tokenizer.Next()
		switch tokenType {

		case html.ErrorToken:
			err := tokenizer.Err()
			if err == io.EOF {
				return results, nil
			}
			return results, err

		case html.StartTagToken:
			token := tokenizer.Token()

			if isTag(token, TagA) {
				href := filterAttrByName(token, AttrHref)
				if href == nil {
					continue
				}
				uri, err := url.Parse(*href)
				if err != nil {
					continue
				}
				results.Links = append(results.Links, uri)
				continue
			}

			if isTag(token, TagImg) || isTag(token, TagScript) {
				src := filterAttrByName(token, AttrSrc)
				if src == nil {
					continue
				}
				results.Assets = append(results.Assets, *src)
			}

			if isTag(token, TagLink) {
				href := filterAttrByName(token, AttrHref)
				if href == nil {
					continue
				}
				results.Assets = append(results.Assets, *href)
				continue
			}

		}
	}
})

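ByToken iterates over the HTML tokens in the response body using the golang.org/x/net/html tokenizer. It collects the href of each <a> tag as a parsed link (skipping hrefs that url.Parse rejects), and the src of <img> and <script> tags and the href of <link> tags as raw asset strings. It returns the results gathered so far, together with any tokenizer error other than io.EOF.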
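The per-tag rules in ByToken can be restated without the tokenizer as a small, standard-library-only function, which makes the dispatch easy to test in isolation. `collect` and its map-of-attributes input are hypothetical and not part of the package; they stand in for the unexported isTag and filterAttrByName helpers, whose source is not shown on this page. The constants and Results type are re-declared so the sketch compiles on its own.

```go
package main

import (
	"fmt"
	"net/url"
)

// Tag and attribute names, mirroring the package's TagX and AttrX constants.
const (
	TagA      = "a"
	TagLink   = "link"
	TagImg    = "img"
	TagScript = "script"
	AttrHref  = "href"
	AttrSrc   = "src"
)

// Results mirrors the package's Results type.
type Results struct {
	Assets []string
	Links  []*url.URL
}

// collect is a hypothetical helper that applies ByToken's rules to one start
// tag, given its name and attributes as the tokenizer would report them.
func collect(results *Results, tag string, attrs map[string]string) {
	switch tag {
	case TagA:
		// <a href>: keep the link only if the href parses as a URL.
		href, ok := attrs[AttrHref]
		if !ok {
			return
		}
		uri, err := url.Parse(href)
		if err != nil {
			return
		}
		results.Links = append(results.Links, uri)
	case TagImg, TagScript:
		// <img src> and <script src>: record the raw src string as an asset.
		if src, ok := attrs[AttrSrc]; ok {
			results.Assets = append(results.Assets, src)
		}
	case TagLink:
		// <link href> (stylesheets, icons, ...): record the raw href as an asset.
		if href, ok := attrs[AttrHref]; ok {
			results.Assets = append(results.Assets, href)
		}
	}
}

func main() {
	var r Results
	collect(&r, "a", map[string]string{"href": "/about"})
	collect(&r, "a", map[string]string{"href": "%zz"}) // invalid escape: skipped
	collect(&r, "img", map[string]string{"src": "logo.png"})
	collect(&r, "link", map[string]string{"rel": "stylesheet", "href": "site.css"})
	collect(&r, "div", map[string]string{"src": "ignored"}) // not a tag we care about
	fmt.Println(len(r.Links), r.Assets) // 1 [logo.png site.css]
}
```

Note that links and assets are treated differently: only <a> hrefs go through url.Parse, while asset values are kept verbatim.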

Functions

This section is empty.

Types

type Func

type Func func([]byte) (Results, error)

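Func is the signature of a parser function. Any function of type func([]byte) (Results, error) can be converted to a Func and used as a Parser.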

func (Func) Parse

func (f Func) Parse(body []byte) (Results, error)

Parse adapts a Func to the Parser interface by calling f(body).
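Func follows the same adapter pattern as net/http's HandlerFunc: a method on a function type lets plain functions satisfy a one-method interface. The sketch below re-declares the types so it compiles on its own and assumes Parse simply forwards to f, as its doc comment suggests; `emptyCheck` is a hypothetical parser function used only for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// Results, Func and Parser re-declare the package's types for this sketch.
type Results struct {
	Assets []string
	Links  []*url.URL
}

type Func func([]byte) (Results, error)

// Parse forwards to f itself, so every Func is also a Parser.
func (f Func) Parse(body []byte) (Results, error) {
	return f(body)
}

type Parser interface {
	Parse([]byte) (Results, error)
}

// emptyCheck is a hypothetical parser function: it rejects empty bodies and
// otherwise returns no results.
func emptyCheck(body []byte) (Results, error) {
	if len(body) == 0 {
		return Results{}, errors.New("empty body")
	}
	return Results{}, nil
}

// Compile-time check that the conversion satisfies the interface.
var _ Parser = Func(emptyCheck)

func main() {
	var p Parser = Func(emptyCheck) // plain function converted to Func
	_, err := p.Parse(nil)
	fmt.Println(err) // empty body
}
```

This is why ByToken is declared as a Func value: callers can hold it as a Parser and swap in another implementation without changing their code.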

type Parser

type Parser interface {
	Parse([]byte) (Results, error)
}

Parser allows for different parser implementations. For example, a parser based on regular expressions might run faster than the tokenizer-based ByToken, at the cost of accuracy.
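A regex-based alternative of the kind described above might look like the following. `ByRegex` and its patterns are hypothetical, not part of the package, and they illustrate the accuracy trade-off: they only match double-quoted values, also match attributes such as data-href, ignore HTML comments, and do not decode entities like &amp; in attribute values as the tokenizer does. Assets are also grouped by pattern rather than kept in document order. Wrapping it as Func(ByRegex) would make it a Parser.

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
)

// Results mirrors the package's Results type.
type Results struct {
	Assets []string
	Links  []*url.URL
}

// Hypothetical patterns; each captures a double-quoted attribute value.
var (
	anchorHref = regexp.MustCompile(`(?i)<a\s[^>]*\bhref="([^"]*)"`)
	assetSrc   = regexp.MustCompile(`(?i)<(?:img|script)\s[^>]*\bsrc="([^"]*)"`)
	linkHref   = regexp.MustCompile(`(?i)<link\s[^>]*\bhref="([^"]*)"`)
)

// ByRegex is a hypothetical parser with the same signature as ByToken.
func ByRegex(body []byte) (Results, error) {
	var r Results
	for _, m := range anchorHref.FindAllSubmatch(body, -1) {
		uri, err := url.Parse(string(m[1]))
		if err != nil {
			continue // skip unparseable hrefs, as ByToken does
		}
		r.Links = append(r.Links, uri)
	}
	for _, m := range assetSrc.FindAllSubmatch(body, -1) {
		r.Assets = append(r.Assets, string(m[1]))
	}
	for _, m := range linkHref.FindAllSubmatch(body, -1) {
		r.Assets = append(r.Assets, string(m[1]))
	}
	return r, nil
}

func main() {
	body := []byte(`<a href="/about">About</a><img src="logo.png"><link rel="stylesheet" href="site.css">`)
	r, _ := ByRegex(body)
	fmt.Println(r.Links[0], r.Assets) // /about [logo.png site.css]
}
```

Whether this is actually faster than the tokenizer would need to be measured on real pages.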

type Results

type Results struct {
	Assets []string
	Links  []*url.URL
}

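Results holds the data extracted by a parser. For ByToken, Assets are the raw src/href strings of <img>, <script>, and <link> tags, and Links are the parsed href values of <a> tags.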
