crawler

package
v0.0.0-...-58590c3 Latest
Warning

This package is not in the latest version of its module.

Published: Mar 9, 2018 License: MIT Imports: 14 Imported by: 0

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Crawler

type Crawler struct {
	// contains filtered or unexported fields
}

func New

func New(options ...Option) (*Crawler, error)

func (*Crawler) Crawl

func (c *Crawler) Crawl(q *Queue, workers int) *Stats

func (*Crawler) Do

func (c *Crawler) Do(req *Request) (*Response, error)

type Option

type Option func(*Crawler) error

func FollowHosts

func FollowHosts(hosts ...string) Option

func NoFollow

func NoFollow() Option

func UserAgentString

func UserAgentString(s string) Option

func WithContext

func WithContext(ctx context.Context) Option
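The signatures above follow the functional-options pattern: New takes any number of Option values, each a function that configures the Crawler and may return an error. The sketch below shows how such a constructor is commonly written. It uses lowercase stand-in types (crawler, option, newCrawler), because the real Crawler's fields are unexported and not documented here. The field names, the default user agent, and the empty-string check are all assumptions for illustration, not the package's actual behavior.

```go
package main

import (
	"errors"
	"fmt"
)

// crawler is a hypothetical stand-in for the package's Crawler; the real
// struct has unexported fields that this documentation does not show.
type crawler struct {
	userAgent   string
	followHosts []string
}

// option mirrors the documented shape of Option: a function that
// configures the crawler and may reject invalid input.
type option func(*crawler) error

// userAgentString sets the user agent. Rejecting an empty string is an
// assumption made for this sketch.
func userAgentString(s string) option {
	return func(c *crawler) error {
		if s == "" {
			return errors.New("user agent must not be empty")
		}
		c.userAgent = s
		return nil
	}
}

// followHosts records the hosts whose links may be followed.
func followHosts(hosts ...string) option {
	return func(c *crawler) error {
		c.followHosts = append(c.followHosts, hosts...)
		return nil
	}
}

// newCrawler applies the options in order and stops at the first error.
func newCrawler(opts ...option) (*crawler, error) {
	c := &crawler{userAgent: "example-crawler"} // hypothetical default
	for _, opt := range opts {
		if err := opt(c); err != nil {
			return nil, err
		}
	}
	return c, nil
}

func main() {
	c, err := newCrawler(userAgentString("docs-bot/1.0"), followHosts("example.com"))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(c.userAgent, c.followHosts)
}
```

Returning an error from each option lets the constructor report bad configuration at construction time, which would explain why New returns (*Crawler, error).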

type Queue

type Queue struct {
	sync.Mutex
	// contains filtered or unexported fields
}

func NewQueue

func NewQueue(size int, ctx context.Context) *Queue

func (*Queue) Close

func (q *Queue) Close()

func (*Queue) Dequeue

func (q *Queue) Dequeue() (*Request, bool)

func (*Queue) Done

func (q *Queue) Done()

Done decrements the number of queue items in-process.

func (*Queue) Enqueue

func (q *Queue) Enqueue(reqs ...*Request)
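The Queue methods suggest a work queue that tracks in-process items, so that workers can tell "temporarily empty" apart from "finished". The following is a minimal sketch of that idea, not the package's implementation: workQueue is a hypothetical, unbounded stand-in built on sync.Cond, and it ignores the size and context parameters that NewQueue accepts. It assumes a worker enqueues any follow-up work before calling Done, so the in-process count cannot reach zero too early.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// workQueue is a hypothetical stand-in for Queue, holding plain strings.
type workQueue struct {
	mu       sync.Mutex
	cond     *sync.Cond
	items    []string
	inFlight int // items enqueued but not yet marked Done
}

func newWorkQueue() *workQueue {
	q := &workQueue{}
	q.cond = sync.NewCond(&q.mu)
	return q
}

// Enqueue adds items and counts each one as in-process.
func (q *workQueue) Enqueue(items ...string) {
	q.mu.Lock()
	q.items = append(q.items, items...)
	q.inFlight += len(items)
	q.mu.Unlock()
	q.cond.Broadcast()
}

// Dequeue blocks until an item is available. It returns false once the
// queue is empty and no item is still in-process.
func (q *workQueue) Dequeue() (string, bool) {
	q.mu.Lock()
	defer q.mu.Unlock()
	for len(q.items) == 0 && q.inFlight > 0 {
		q.cond.Wait()
	}
	if len(q.items) == 0 {
		return "", false
	}
	item := q.items[0]
	q.items = q.items[1:]
	return item, true
}

// Done decrements the number of items in-process and wakes any waiters.
func (q *workQueue) Done() {
	q.mu.Lock()
	q.inFlight--
	q.mu.Unlock()
	q.cond.Broadcast()
}

// processAll walks an acyclic link graph with the given number of workers
// and returns how many items were processed.
func processAll(links map[string][]string, start string, workers int) int {
	q := newWorkQueue()
	q.Enqueue(start)
	var processed int64
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for {
				item, ok := q.Dequeue()
				if !ok {
					return
				}
				q.Enqueue(links[item]...) // follow-up work first, then Done
				atomic.AddInt64(&processed, 1)
				q.Done()
			}
		}()
	}
	wg.Wait()
	return int(processed)
}

func main() {
	links := map[string][]string{"a": {"b", "c"}, "b": {"d"}}
	fmt.Println(processAll(links, "a", 3))
}
```

The sample graph has no cycles; a real crawler would also need to remember visited URLs to avoid processing the same page twice.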

type Request

type Request struct {
	URL    *url.URL
	Follow bool
}

A Request represents a single URL in a crawling queue.

func NewRequest

func NewRequest(urlStr string, follow bool) (*Request, error)

func (*Request) String

func (c *Request) String() string
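Because Request stores a parsed *url.URL and NewRequest returns an error, the constructor presumably parses urlStr with net/url. The sketch below shows one way to do that with a lowercase stand-in type. Whether the real NewRequest also rejects relative or non-HTTP URLs is not documented, so the extra validation here is an assumption.

```go
package main

import (
	"fmt"
	"net/url"
)

// request is a hypothetical stand-in with the same exported fields as Request.
type request struct {
	URL    *url.URL
	Follow bool
}

// newRequest parses urlStr and wraps it in a request.
func newRequest(urlStr string, follow bool) (*request, error) {
	u, err := url.Parse(urlStr)
	if err != nil {
		return nil, err
	}
	// Assumption: only absolute http(s) URLs are worth crawling.
	if (u.Scheme != "http" && u.Scheme != "https") || u.Host == "" {
		return nil, fmt.Errorf("not an absolute http(s) URL: %q", urlStr)
	}
	return &request{URL: u, Follow: follow}, nil
}

// String returns the URL in its string form.
func (r *request) String() string {
	return r.URL.String()
}

func main() {
	req, err := newRequest("https://example.com/docs?page=2", true)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req, req.URL.Host, req.Follow)
}
```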

type Response

type Response struct {
	Request       *Request
	Duration      time.Duration
	StatusCode    int
	ContentLength int64
	ContentType   string
	URLs          []string
}

A Response represents the outcome of crawling a Request.

func (*Response) String

func (c *Response) String() string
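The Response fields map naturally onto what net/http reports for a single fetch. As a rough illustration, the sketch below fills a stand-in result struct from an HTTP GET against a local httptest server. The names result, fetch, and fetchFromTestServer are hypothetical, and the source does not say how Do measures Duration or ContentLength; here the duration covers reading the whole body and the length is the number of bytes actually read.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// result is a hypothetical stand-in carrying a subset of Response's fields.
type result struct {
	Duration      time.Duration
	StatusCode    int
	ContentLength int64
	ContentType   string
}

// fetch performs a GET and records timing, status, size, and content type.
func fetch(client *http.Client, rawURL string) (*result, error) {
	start := time.Now()
	resp, err := client.Get(rawURL)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	// Drain the body so the byte count reflects what was transferred.
	n, err := io.Copy(io.Discard, resp.Body)
	if err != nil {
		return nil, err
	}
	return &result{
		Duration:      time.Since(start),
		StatusCode:    resp.StatusCode,
		ContentLength: n,
		ContentType:   resp.Header.Get("Content-Type"),
	}, nil
}

// fetchFromTestServer runs fetch against a throwaway local server.
func fetchFromTestServer() (*result, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "text/plain; charset=utf-8")
		io.WriteString(w, "hello")
	}))
	defer srv.Close()
	return fetch(srv.Client(), srv.URL)
}

func main() {
	r, err := fetchFromTestServer()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(r.StatusCode, r.ContentLength, r.ContentType)
}
```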

type Stats

type Stats struct {
	sync.Mutex
	TotalRequests int32
	TotalBytes    int64
	StatusCodes   map[int]int32
	MimeTypes     map[string]int32
}

Stats collects transfer statistics for a Crawler.

func (*Stats) AddResponse

func (c *Stats) AddResponse(resp *Response)

AddResponse increments all statistics according to the given response.

func (*Stats) JSON

func (c *Stats) JSON() string

JSON returns a JSON representation of the crawler statistics.
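Stats embeds a sync.Mutex, which suggests that AddResponse is meant to be called from several workers at once. The sketch below shows one plausible shape for such a collector using lowercase stand-in types. Keying MimeTypes by the media type without its parameters and skipping negative content lengths are assumptions of this sketch, not documented behavior.

```go
package main

import (
	"encoding/json"
	"fmt"
	"mime"
	"sync"
)

// response is a hypothetical stand-in with the fields the statistics need.
type response struct {
	StatusCode    int
	ContentLength int64
	ContentType   string
}

// stats is a hypothetical stand-in for Stats. The mutex is unexported,
// so encoding/json leaves it out of the output.
type stats struct {
	mu            sync.Mutex
	TotalRequests int32
	TotalBytes    int64
	StatusCodes   map[int]int32
	MimeTypes     map[string]int32
}

func newStats() *stats {
	return &stats{
		StatusCodes: make(map[int]int32),
		MimeTypes:   make(map[string]int32),
	}
}

// AddResponse updates every counter under the lock.
func (s *stats) AddResponse(resp *response) {
	// Assumption: count by media type only, dropping "; charset=..." parameters.
	mediaType, _, err := mime.ParseMediaType(resp.ContentType)
	if err != nil {
		mediaType = resp.ContentType
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	s.TotalRequests++
	if resp.ContentLength > 0 { // assumption: unknown (negative) lengths are skipped
		s.TotalBytes += resp.ContentLength
	}
	s.StatusCodes[resp.StatusCode]++
	s.MimeTypes[mediaType]++
}

// JSON returns the counters as a JSON object, or "{}" if encoding fails.
func (s *stats) JSON() string {
	s.mu.Lock()
	defer s.mu.Unlock()
	b, err := json.Marshal(s)
	if err != nil {
		return "{}"
	}
	return string(b)
}

func main() {
	s := newStats()
	s.AddResponse(&response{StatusCode: 200, ContentLength: 100, ContentType: "text/html; charset=utf-8"})
	s.AddResponse(&response{StatusCode: 404, ContentLength: 200, ContentType: "text/html"})
	fmt.Println(s.JSON())
}
```

encoding/json writes integer map keys as strings, so a status code such as 200 appears as the key "200" in the output.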
