tokeq

v0.0.0-...-97e91f5 Latest
Published: Oct 15, 2016 License: BSD-3-Clause Imports: 8 Imported by: 1

README

tokeq

Match & Callback iterator for html.Node, aimed at performance & HTML document reusability

Benchmark

go test -run=10000 -bench=. -benchmem
Finding HTML paragraph (<p>) nodes

Benchmark                            operations   time / operation   bytes / operation   allocations / operation
BenchmarkGoQueryFindP                      5000       277185 ns/op          46059 B/op             929 allocs/op
BenchmarkStandardLibraryTokenFindP          500      3698731 ns/op         291695 B/op            9240 allocs/op
BenchmarkStandardLibraryNodeFindP         20000        78340 ns/op             64 B/op               0 allocs/op
BenchmarkDissectNodesFindP                20000        68640 ns/op             64 B/op               0 allocs/op

Installation

go get github.com/linkosmos/tokeq

Godeps

godep get github.com/linkosmos/tokeq

Usage

package main

import (
	"fmt"
	"net/http"

	"github.com/linkosmos/tokeq"
	"golang.org/x/net/html"
)

func main() {
	// Fetch the page to scan.
	response, err := http.Get("https://golang.org/pkg/fmt/")
	if err != nil {
		fmt.Printf("http.Get error: %s\n", err)
		return
	}

	// ParseResponseWithDefer closes response.Body itself.
	// The callback runs for every node accepted by MatchP and prints its text.
	err = tokeq.ParseResponseWithDefer(response, tokeq.Tok{
		Match: tokeq.MatchP,
		Callback: func(n *html.Node) {
			fmt.Println(tokeq.FindText(n))
		},
	})
	if err != nil {
		fmt.Printf("tokeq.ParseResponseWithDefer error: %s\n", err)
	}
}

Contributing

  1. Fork it ( https://github.com/linkosmos/tokeq/fork )
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Add tokeq feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create a new Pull Request

Documentation

Constants

This section is empty.

Variables

var (
	ErrResponseBodyIsEmpty   = errors.New("http.Response Body is empty")
	ErrResponseBodyIsNotHTML = errors.New("http.Response Body is not HTML type or malformed")
)

Functions

func DissectNodes

func DissectNodes(input *html.Node, toks ...Tok)

DissectNodes - ranges over toks, running each one recursively through FindNodes

func FindDeepText

func FindDeepText(n *html.Node) (data string)

FindDeepText - finds text in given & child nodes

func FindNodes

func FindNodes(input *html.Node, match Matcher, callback MatcherCallback)

FindNodes - recursively find nodes

func FindText

func FindText(input *html.Node) (data string)

FindText - finds text in given node

func IsTextHTML

func IsTextHTML(contentType string) bool

IsTextHTML - checks whether the given Content-Type header value is an HTML type
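
One way such a check could be written with the standard library is sketched below. This is not tokeq's implementation: isTextHTML is a hypothetical re-implementation that parses the header value with mime.ParseMediaType and accepts only text/html, ignoring parameters such as charset.

```go
package main

import (
	"fmt"
	"mime"
)

// isTextHTML is a hypothetical re-implementation, not tokeq's code.
// It reports whether a Content-Type header value names the text/html media type.
func isTextHTML(contentType string) bool {
	mediaType, _, err := mime.ParseMediaType(contentType)
	if err != nil {
		// Empty or malformed header values are treated as non-HTML.
		return false
	}
	return mediaType == "text/html"
}

func main() {
	fmt.Println(isTextHTML("text/html; charset=utf-8"))
	fmt.Println(isTextHTML("application/json"))
}
```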

func MatchA

func MatchA(t html.NodeType, a atom.Atom) bool

func MatchAbbr

func MatchAbbr(t html.NodeType, a atom.Atom) bool

func MatchAddress

func MatchAddress(t html.NodeType, a atom.Atom) bool

func MatchArea

func MatchArea(t html.NodeType, a atom.Atom) bool

func MatchArticle

func MatchArticle(t html.NodeType, a atom.Atom) bool

func MatchAside

func MatchAside(t html.NodeType, a atom.Atom) bool

func MatchAudio

func MatchAudio(t html.NodeType, a atom.Atom) bool

func MatchB

func MatchB(t html.NodeType, a atom.Atom) bool

func MatchBase

func MatchBase(t html.NodeType, a atom.Atom) bool

func MatchBdi

func MatchBdi(t html.NodeType, a atom.Atom) bool

func MatchBdo

func MatchBdo(t html.NodeType, a atom.Atom) bool

func MatchBlockquote

func MatchBlockquote(t html.NodeType, a atom.Atom) bool

func MatchBody

func MatchBody(t html.NodeType, a atom.Atom) bool

func MatchBr

func MatchBr(t html.NodeType, a atom.Atom) bool

func MatchButton

func MatchButton(t html.NodeType, a atom.Atom) bool

func MatchCanvas

func MatchCanvas(t html.NodeType, a atom.Atom) bool

func MatchCaption

func MatchCaption(t html.NodeType, a atom.Atom) bool

func MatchCite

func MatchCite(t html.NodeType, a atom.Atom) bool

func MatchCode

func MatchCode(t html.NodeType, a atom.Atom) bool

func MatchCol

func MatchCol(t html.NodeType, a atom.Atom) bool

func MatchColgroup

func MatchColgroup(t html.NodeType, a atom.Atom) bool

func MatchCommand

func MatchCommand(t html.NodeType, a atom.Atom) bool

func MatchData

func MatchData(t html.NodeType, a atom.Atom) bool

func MatchDatalist

func MatchDatalist(t html.NodeType, a atom.Atom) bool

func MatchDd

func MatchDd(t html.NodeType, a atom.Atom) bool

func MatchDel

func MatchDel(t html.NodeType, a atom.Atom) bool

func MatchDetails

func MatchDetails(t html.NodeType, a atom.Atom) bool

func MatchDfn

func MatchDfn(t html.NodeType, a atom.Atom) bool

func MatchDialog

func MatchDialog(t html.NodeType, a atom.Atom) bool

func MatchDiv

func MatchDiv(t html.NodeType, a atom.Atom) bool

func MatchDl

func MatchDl(t html.NodeType, a atom.Atom) bool

func MatchDt

func MatchDt(t html.NodeType, a atom.Atom) bool

func MatchEm

func MatchEm(t html.NodeType, a atom.Atom) bool

func MatchEmbed

func MatchEmbed(t html.NodeType, a atom.Atom) bool

func MatchFieldset

func MatchFieldset(t html.NodeType, a atom.Atom) bool

func MatchFigcaption

func MatchFigcaption(t html.NodeType, a atom.Atom) bool

func MatchFigure

func MatchFigure(t html.NodeType, a atom.Atom) bool

func MatchFooter

func MatchFooter(t html.NodeType, a atom.Atom) bool

func MatchForm

func MatchForm(t html.NodeType, a atom.Atom) bool

func MatchH1

func MatchH1(t html.NodeType, a atom.Atom) bool

func MatchH2

func MatchH2(t html.NodeType, a atom.Atom) bool

func MatchH3

func MatchH3(t html.NodeType, a atom.Atom) bool

func MatchH4

func MatchH4(t html.NodeType, a atom.Atom) bool

func MatchH5

func MatchH5(t html.NodeType, a atom.Atom) bool

func MatchH6

func MatchH6(t html.NodeType, a atom.Atom) bool

func MatchHead

func MatchHead(t html.NodeType, a atom.Atom) bool

func MatchHeader

func MatchHeader(t html.NodeType, a atom.Atom) bool

func MatchHgroup

func MatchHgroup(t html.NodeType, a atom.Atom) bool

func MatchHr

func MatchHr(t html.NodeType, a atom.Atom) bool

func MatchHtml

func MatchHtml(t html.NodeType, a atom.Atom) bool

func MatchI

func MatchI(t html.NodeType, a atom.Atom) bool

func MatchIframe

func MatchIframe(t html.NodeType, a atom.Atom) bool

func MatchImg

func MatchImg(t html.NodeType, a atom.Atom) bool

func MatchInput

func MatchInput(t html.NodeType, a atom.Atom) bool

func MatchIns

func MatchIns(t html.NodeType, a atom.Atom) bool

func MatchKbd

func MatchKbd(t html.NodeType, a atom.Atom) bool

func MatchKeygen

func MatchKeygen(t html.NodeType, a atom.Atom) bool

func MatchLabel

func MatchLabel(t html.NodeType, a atom.Atom) bool

func MatchLegend

func MatchLegend(t html.NodeType, a atom.Atom) bool

func MatchLi

func MatchLi(t html.NodeType, a atom.Atom) bool

func MatchLink

func MatchLink(t html.NodeType, a atom.Atom) bool

func MatchMap

func MatchMap(t html.NodeType, a atom.Atom) bool

func MatchMark

func MatchMark(t html.NodeType, a atom.Atom) bool

func MatchMenu

func MatchMenu(t html.NodeType, a atom.Atom) bool

func MatchMenuitem

func MatchMenuitem(t html.NodeType, a atom.Atom) bool

func MatchMeta

func MatchMeta(t html.NodeType, a atom.Atom) bool

func MatchMeter

func MatchMeter(t html.NodeType, a atom.Atom) bool

func MatchNav

func MatchNav(t html.NodeType, a atom.Atom) bool

func MatchNoscript

func MatchNoscript(t html.NodeType, a atom.Atom) bool

func MatchObject

func MatchObject(t html.NodeType, a atom.Atom) bool

func MatchOl

func MatchOl(t html.NodeType, a atom.Atom) bool

func MatchOptgroup

func MatchOptgroup(t html.NodeType, a atom.Atom) bool

func MatchOption

func MatchOption(t html.NodeType, a atom.Atom) bool

func MatchOutput

func MatchOutput(t html.NodeType, a atom.Atom) bool

func MatchP

func MatchP(t html.NodeType, a atom.Atom) bool

func MatchParam

func MatchParam(t html.NodeType, a atom.Atom) bool

func MatchPre

func MatchPre(t html.NodeType, a atom.Atom) bool

func MatchProgress

func MatchProgress(t html.NodeType, a atom.Atom) bool

func MatchQ

func MatchQ(t html.NodeType, a atom.Atom) bool

func MatchRp

func MatchRp(t html.NodeType, a atom.Atom) bool

func MatchRt

func MatchRt(t html.NodeType, a atom.Atom) bool

func MatchRuby

func MatchRuby(t html.NodeType, a atom.Atom) bool

func MatchS

func MatchS(t html.NodeType, a atom.Atom) bool

func MatchSamp

func MatchSamp(t html.NodeType, a atom.Atom) bool

func MatchScript

func MatchScript(t html.NodeType, a atom.Atom) bool

func MatchSection

func MatchSection(t html.NodeType, a atom.Atom) bool

func MatchSelect

func MatchSelect(t html.NodeType, a atom.Atom) bool

func MatchSmall

func MatchSmall(t html.NodeType, a atom.Atom) bool

func MatchSource

func MatchSource(t html.NodeType, a atom.Atom) bool

func MatchSpan

func MatchSpan(t html.NodeType, a atom.Atom) bool

func MatchStrong

func MatchStrong(t html.NodeType, a atom.Atom) bool

func MatchStyle

func MatchStyle(t html.NodeType, a atom.Atom) bool

func MatchSub

func MatchSub(t html.NodeType, a atom.Atom) bool

func MatchSummary

func MatchSummary(t html.NodeType, a atom.Atom) bool

func MatchSup

func MatchSup(t html.NodeType, a atom.Atom) bool

func MatchTable

func MatchTable(t html.NodeType, a atom.Atom) bool

func MatchTbody

func MatchTbody(t html.NodeType, a atom.Atom) bool

func MatchTd

func MatchTd(t html.NodeType, a atom.Atom) bool

func MatchTemplate

func MatchTemplate(t html.NodeType, a atom.Atom) bool

func MatchTextarea

func MatchTextarea(t html.NodeType, a atom.Atom) bool

func MatchTfoot

func MatchTfoot(t html.NodeType, a atom.Atom) bool

func MatchTh

func MatchTh(t html.NodeType, a atom.Atom) bool

func MatchThead

func MatchThead(t html.NodeType, a atom.Atom) bool

func MatchTime

func MatchTime(t html.NodeType, a atom.Atom) bool

func MatchTitle

func MatchTitle(t html.NodeType, a atom.Atom) bool

func MatchTr

func MatchTr(t html.NodeType, a atom.Atom) bool

func MatchTrack

func MatchTrack(t html.NodeType, a atom.Atom) bool

func MatchU

func MatchU(t html.NodeType, a atom.Atom) bool

func MatchUl

func MatchUl(t html.NodeType, a atom.Atom) bool

func MatchVar

func MatchVar(t html.NodeType, a atom.Atom) bool

func MatchVideo

func MatchVideo(t html.NodeType, a atom.Atom) bool

func MatchWbr

func MatchWbr(t html.NodeType, a atom.Atom) bool

func ParseReader

func ParseReader(input io.Reader, toks ...Tok) error

ParseReader - parses io.Reader, expected input is HTML page

func ParseResponse

func ParseResponse(response *http.Response, toks ...Tok) error

ParseResponse - wraps the sequence of functions that process a fetched http.Response. The caller is responsible for closing the body: defer response.Body.Close()

func ParseResponseWithDefer

func ParseResponseWithDefer(response *http.Response, toks ...Tok) error

ParseResponseWithDefer - same as ParseResponse but with defer response.Body.Close()

func PrettyPrint

func PrettyPrint(n *html.Node) string

PrettyPrint - pretty print node

func PrintNode

func PrintNode(n *html.Node)

PrintNode - convenient to use as Tok.Callback to see html.Node dump

func PrintNodes

func PrintNodes(n *html.Node)

PrintNodes - convenient to use as Tok.Callback to see html.Node siblings & parents

Types

type Matcher

type Matcher func(tt html.NodeType, a atom.Atom) bool

Matcher - used in Tok

type MatcherCallback

type MatcherCallback func(t *html.Node)

MatcherCallback - used in Tok as a callback when a match occurs. Contents of the (t *html.Token) may change on the next call to Next.

type Tok

type Tok struct {
	Match    Matcher
	Callback MatcherCallback
}

Tok - contains html.Node Matcher & Callback

type Toks

type Toks []Tok

Toks - slice of Tok values

func (*Toks) Add

func (toks *Toks) Add(tok ...Tok)

Add - add new Tok to Toks

func (*Toks) Iterate

func (toks *Toks) Iterate(t *html.Node)

Iterate - iterates through the toks and, if there is a match, sends the node to the callback

Directories

Path Synopsis
Godeps
_workspace/src/github.com/PuerkitoBio/goquery
Package goquery implements features similar to jQuery, including the chainable syntax, to manipulate and query an HTML document.
_workspace/src/github.com/andybalholm/cascadia
The cascadia package is an implementation of CSS selectors.
_workspace/src/golang.org/x/net/html
Package html implements an HTML5-compliant tokenizer and parser.
_workspace/src/golang.org/x/net/html/atom
Package atom provides integer codes (also known as atoms) for a fixed set of frequently occurring HTML strings: tag names and attribute keys such as "p" and "id".
_workspace/src/golang.org/x/net/html/charset
Package charset provides common text encodings for HTML documents.
_workspace/src/golang.org/x/text/encoding
Package encoding defines an interface for character encodings, such as Shift JIS and Windows 1252, that can convert to and from UTF-8.
_workspace/src/golang.org/x/text/encoding/charmap
Package charmap provides simple character encodings such as IBM Code Page 437 and Windows 1252.
_workspace/src/golang.org/x/text/encoding/htmlindex
Package htmlindex maps character set encoding names to Encodings as recommended by the W3C for use in HTML 5.
_workspace/src/golang.org/x/text/encoding/ianaindex
Package ianaindex maps names to Encodings as specified by the IANA registry.
_workspace/src/golang.org/x/text/encoding/internal
Package internal contains code that is shared among encoding implementations.
_workspace/src/golang.org/x/text/encoding/internal/identifier
Package identifier defines the contract between implementations of Encoding and Index by defining identifiers that uniquely identify standardized coded character sets (CCS) and character encoding schemes (CES), which we will together refer to as encodings, for which Encoding implementations provide converters to and from UTF-8.
_workspace/src/golang.org/x/text/encoding/japanese
Package japanese provides Japanese encodings such as EUC-JP and Shift JIS.
_workspace/src/golang.org/x/text/encoding/korean
Package korean provides Korean encodings such as EUC-KR.
_workspace/src/golang.org/x/text/encoding/simplifiedchinese
Package simplifiedchinese provides Simplified Chinese encodings such as GBK.
_workspace/src/golang.org/x/text/encoding/traditionalchinese
Package traditionalchinese provides Traditional Chinese encodings such as Big5.
_workspace/src/golang.org/x/text/encoding/unicode
Package unicode provides Unicode encodings such as UTF-16.
_workspace/src/golang.org/x/text/transform
Package transform provides reader and writer wrappers that transform the bytes passing through as well as various transformations.
