getgo: github.com/h12w/getgo

package getgo

import "github.com/h12w/getgo"

Package getgo is a concurrent web scraping framework.

Index

Package Files

adapter.go doc.go group.go interface.go logger.go runner.go util.go

Variables

var RetryNum = 3

RetryNum is the number of retries when fetching a page fails.

func Run

func Run(runner Runner, tx Tx, tasks ...interface{}) error

Run runs the given tasks, each of which may be an HTMLTask, a TextTask or a Task. tx is committed on success and rolled back on failure.

type Atomized

type Atomized struct {
    StorableTask
    Tx
}

Atomized is an adapter that converts a StorableTask to an atomized Task that supports transactions.

func (Atomized) Handle

func (h Atomized) Handle(resp *http.Response) error

Handle implements the Handle method of the Task interface.
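
The documentation does not spell out how Atomized ties a StorableTask to its Tx, so the following is only a sketch of one plausible reading: handle the response with the transaction as the Storer, commit if that succeeds, and roll back otherwise (including when the response is nil). The interfaces are local copies of the ones documented on this page; atomizedSketch, funcTask, countingTx and atomizedDemo are hypothetical names used for illustration.

```go
package main

import (
	"errors"
	"net/http"
)

// Local copies of the documented interfaces so the sketch compiles on its own.
type Storer interface {
	Store(v interface{}) error
}

type Tx interface {
	Storer
	Commit() error
	Rollback() error
}

type StorableTask interface {
	Request() *http.Request
	Handle(resp *http.Response, s Storer) error
}

// atomizedSketch has the same shape as Atomized; its behavior is an assumption.
type atomizedSketch struct {
	StorableTask
	Tx
}

// Handle stores through the transaction, then commits or rolls back.
func (a atomizedSketch) Handle(resp *http.Response) error {
	if resp == nil {
		_ = a.Rollback() // a nil response signals that fetching failed
		return errors.New("no response")
	}
	if err := a.StorableTask.Handle(resp, a.Tx); err != nil {
		_ = a.Rollback()
		return err
	}
	return a.Commit()
}

// funcTask is a hypothetical StorableTask backed by a function.
type funcTask struct {
	handle func(*http.Response, Storer) error
}

func (t funcTask) Request() *http.Request {
	req, _ := http.NewRequest(http.MethodGet, "http://example.invalid/", nil)
	return req
}

func (t funcTask) Handle(resp *http.Response, s Storer) error { return t.handle(resp, s) }

// countingTx is a hypothetical Tx that only counts calls.
type countingTx struct{ stored, commits, rollbacks int }

func (c *countingTx) Store(v interface{}) error { c.stored++; return nil }
func (c *countingTx) Commit() error             { c.commits++; return nil }
func (c *countingTx) Rollback() error           { c.rollbacks++; return nil }

// atomizedDemo runs one task and reports how often the Tx was committed or rolled back.
func atomizedDemo(fail bool) (commits, rollbacks int) {
	tx := &countingTx{}
	task := funcTask{handle: func(_ *http.Response, s Storer) error {
		if fail {
			return errors.New("parse error")
		}
		return s.Store("item")
	}}
	resp := &http.Response{StatusCode: http.StatusOK, Body: http.NoBody}
	_ = atomizedSketch{StorableTask: task, Tx: tx}.Handle(resp)
	return tx.commits, tx.rollbacks
}
```

Calling atomizedDemo(false) exercises the commit path and atomizedDemo(true) the rollback path.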

type ConcurrentRunner

type ConcurrentRunner struct {
    // contains filtered or unexported fields
}

ConcurrentRunner runs tasks concurrently.

func NewConcurrentRunner

func NewConcurrentRunner(workerNum int, client Doer, errHandler ErrorHandler) ConcurrentRunner

NewConcurrentRunner creates a concurrent runner.

func (ConcurrentRunner) Close

func (r ConcurrentRunner) Close()

Close implements the Close method of the Runner interface.

func (ConcurrentRunner) Run

func (r ConcurrentRunner) Run(task Task) error

Run implements the Run method of the Runner interface.

type Doer

type Doer interface {
    Do(req *http.Request) (resp *http.Response, err error)
}

Doer processes an HTTP request and returns an HTTP response.
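
Because Doer's single method has the same signature as the Do method of *http.Client, a standard client can be passed wherever a Doer is expected, and a stub can stand in for it in tests. The sketch below uses a local copy of the interface; stubDoer and fetchStatus are hypothetical helpers, not part of getgo.

```go
package main

import "net/http"

// Doer is copied from the documentation above so the sketch compiles on its own.
type Doer interface {
	Do(req *http.Request) (resp *http.Response, err error)
}

// Compile-time check: *http.Client satisfies Doer.
var _ Doer = (*http.Client)(nil)

// stubDoer is a hypothetical Doer that answers every request with a fixed
// status code and an empty body, without touching the network.
type stubDoer struct{ status int }

func (s stubDoer) Do(req *http.Request) (*http.Response, error) {
	return &http.Response{StatusCode: s.status, Body: http.NoBody, Request: req}, nil
}

// fetchStatus is a hypothetical helper that sends a GET request through any
// Doer and returns the status code of the response.
func fetchStatus(d Doer, url string) (int, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return 0, err
	}
	resp, err := d.Do(req)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}
```

In production code the first argument to fetchStatus could be http.DefaultClient; in tests, stubDoer{status: 404} exercises the same path offline.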

type ErrorHandler

type ErrorHandler interface {
    HandleError(request *http.Request, err error) error
}

ErrorHandler is used to call back an external error handler when a task fails.

type ErrorHandlerFunc

type ErrorHandlerFunc func(*http.Request, error) error

ErrorHandlerFunc converts a function to an ErrorHandler interface.

func (ErrorHandlerFunc) HandleError

func (f ErrorHandlerFunc) HandleError(request *http.Request, err error) error

HandleError implements the ErrorHandler interface.
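
ErrorHandlerFunc follows the same function-adapter pattern as http.HandlerFunc. The sketch below re-declares the two types locally; the method body (calling the function itself) and the meaning given to the returned error are assumptions, and skipNotFound and handlerDemo are hypothetical.

```go
package main

import (
	"errors"
	"net/http"
)

// Local copies of the documented types so the sketch compiles on its own.
type ErrorHandler interface {
	HandleError(request *http.Request, err error) error
}

type ErrorHandlerFunc func(*http.Request, error) error

// HandleError calls f; this body is an assumption modeled on http.HandlerFunc.
func (f ErrorHandlerFunc) HandleError(request *http.Request, err error) error {
	return f(request, err)
}

// errNotFound is a hypothetical sentinel error used by the example policy.
var errNotFound = errors.New("not found")

// skipNotFound is a hypothetical policy: swallow errNotFound and pass every
// other error through unchanged.
func skipNotFound(_ *http.Request, err error) error {
	if errors.Is(err, errNotFound) {
		return nil
	}
	return err
}

// handlerDemo reports how the adapted function treats two different errors.
func handlerDemo() (swallowed, propagated bool) {
	var h ErrorHandler = ErrorHandlerFunc(skipNotFound)
	other := errors.New("timeout")
	return h.HandleError(nil, errNotFound) == nil, h.HandleError(nil, other) == other
}
```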

type HTMLTask

type HTMLTask interface {
    Requester
    Handle(root *query.Node, s Storer) error
}

HTMLTask is an HTML task that should be able to parse an HTML node tree into a slice of objects.

type HTTPLogger

type HTTPLogger struct {
    // contains filtered or unexported fields
}

HTTPLogger wraps an HTTP client and logs the request and network speed.

func NewHTTPLogger

func NewHTTPLogger(client *http.Client) *HTTPLogger

NewHTTPLogger creates an HTTPLogger by inspecting the connection's Read method of an http.Client.

func (*HTTPLogger) Do

func (l *HTTPLogger) Do(req *http.Request) (resp *http.Response, err error)

Do implements the Doer interface.

type Requester

type Requester interface {
    Request() *http.Request
}

Requester is the interface that returns an HTTP request from its Request method. Implementations of Request must allow repeated calls.
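
One way to satisfy the repeated-call requirement is to build a fresh *http.Request on every call, so that a request body consumed by one attempt is not reused by the next. This is a sketch under that assumption; searchTask and readBody are hypothetical, and the endpoint is a placeholder.

```go
package main

import (
	"io"
	"net/http"
	"net/url"
	"strings"
)

// Requester is copied from the documentation above.
type Requester interface {
	Request() *http.Request
}

// searchTask is a hypothetical task that POSTs a search form.
type searchTask struct {
	endpoint string
	query    string
}

// Request builds a new request, with a new body reader, on every call.
func (t searchTask) Request() *http.Request {
	form := url.Values{"q": {t.query}}
	req, err := http.NewRequest(http.MethodPost, t.endpoint, strings.NewReader(form.Encode()))
	if err != nil {
		return nil
	}
	req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
	return req
}

// Compile-time check that searchTask is a Requester.
var _ Requester = searchTask{}

// readBody drains a request body, as sending the request would.
func readBody(req *http.Request) string {
	b, _ := io.ReadAll(req.Body)
	return string(b)
}
```

Storing a single *http.Request in the task and returning it each time would hand the second caller an already-drained body, which is what this pattern avoids.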

type RetryDoer

type RetryDoer struct {
    Doer
    RetryTime int
}

RetryDoer wraps a Doer and implements the retry operation for Do method.

func (RetryDoer) Do

func (d RetryDoer) Do(req *http.Request) (resp *http.Response, err error)

Do implements the Doer interface.
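
The page does not say whether RetryTime counts total attempts or additional attempts, nor whether RetryDoer waits between them. The sketch below shows the general wrap-and-retry technique with a locally defined retryingDoer, assuming RetryTime is the total number of attempts and that no delay is inserted; flakyDoer and retryDemo are hypothetical.

```go
package main

import (
	"errors"
	"net/http"
)

// Doer is copied from the documentation above.
type Doer interface {
	Do(req *http.Request) (resp *http.Response, err error)
}

// retryingDoer has the same shape as RetryDoer; its semantics are an assumption.
type retryingDoer struct {
	Doer
	RetryTime int // assumed here to be the total number of attempts
}

// Do calls the wrapped Doer until it succeeds or RetryTime attempts have
// failed, in which case the last error is returned.
func (d retryingDoer) Do(req *http.Request) (*http.Response, error) {
	err := errors.New("retryingDoer: RetryTime must be positive")
	for i := 0; i < d.RetryTime; i++ {
		var resp *http.Response
		if resp, err = d.Doer.Do(req); err == nil {
			return resp, nil
		}
	}
	return nil, err
}

// flakyDoer is a hypothetical Doer that fails a fixed number of times
// before succeeding, and counts how often it was called.
type flakyDoer struct {
	failures int
	calls    int
}

func (f *flakyDoer) Do(req *http.Request) (*http.Response, error) {
	f.calls++
	if f.failures > 0 {
		f.failures--
		return nil, errors.New("temporary failure")
	}
	return &http.Response{StatusCode: http.StatusOK, Body: http.NoBody, Request: req}, nil
}

// retryDemo returns how many calls reached the inner Doer and the final error.
func retryDemo(failures, retryTime int) (int, error) {
	inner := &flakyDoer{failures: failures}
	req, err := http.NewRequest(http.MethodGet, "http://example.invalid/", nil)
	if err != nil {
		return 0, err
	}
	_, err = retryingDoer{Doer: inner, RetryTime: retryTime}.Do(req)
	return inner.calls, err
}
```

Requests with a body would need the body to be recreated before each attempt; the sketch sidesteps this by using a bodyless GET.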

type Runner

type Runner interface {
    Run(task Task) error // Run runs a task
    Close()              // Close closes the runner
}

Runner runs Tasks. A Runner gets an HTTP request from a Task, fetches the HTTP response and passes the response to the Task's Handle method. When a Runner fails to get a response, it must still pass a nil response to the Handle method so that the Task knows to roll back its transaction, if any.
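
The contract described above, including the nil response on failure, can be sketched as a minimal runner. This is not the code of SequentialRunner or ConcurrentRunner; simpleRunner, doerFunc, recordingTask and runDemo are hypothetical, and the interfaces are local copies.

```go
package main

import (
	"errors"
	"net/http"
)

// Local copies of the documented interfaces.
type Doer interface {
	Do(req *http.Request) (resp *http.Response, err error)
}

type Task interface {
	Request() *http.Request
	Handle(resp *http.Response) error
}

// simpleRunner is a hypothetical runner that follows the documented contract.
type simpleRunner struct{ client Doer }

// Run fetches the task's request and hands the response to the task.
func (r simpleRunner) Run(task Task) error {
	resp, err := r.client.Do(task.Request())
	if err != nil {
		// Still notify the task, with a nil response, so it can roll back.
		_ = task.Handle(nil)
		return err
	}
	defer resp.Body.Close()
	return task.Handle(resp)
}

// Close has nothing to release in this sketch.
func (r simpleRunner) Close() {}

// doerFunc adapts a function to the Doer interface.
type doerFunc func(*http.Request) (*http.Response, error)

func (f doerFunc) Do(req *http.Request) (*http.Response, error) { return f(req) }

// recordingTask is a hypothetical Task that records what Handle received.
type recordingTask struct{ sawNil, sawResp bool }

func (t *recordingTask) Request() *http.Request {
	req, _ := http.NewRequest(http.MethodGet, "http://example.invalid/", nil)
	return req
}

func (t *recordingTask) Handle(resp *http.Response) error {
	if resp == nil {
		t.sawNil = true
		return errors.New("no response")
	}
	t.sawResp = true
	return nil
}

// runDemo runs one task against a client that either fails or succeeds.
func runDemo(fail bool) (sawNil, sawResp bool, err error) {
	client := doerFunc(func(req *http.Request) (*http.Response, error) {
		if fail {
			return nil, errors.New("connection refused")
		}
		return &http.Response{StatusCode: http.StatusOK, Body: http.NoBody, Request: req}, nil
	})
	task := &recordingTask{}
	err = simpleRunner{client: client}.Run(task)
	return task.sawNil, task.sawResp, err
}
```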

type SequentialRunner

type SequentialRunner struct {
    Client Doer
    ErrorHandler
}

SequentialRunner is a simple single-threaded task runner.

func (SequentialRunner) Close

func (r SequentialRunner) Close()

Close implements the Close method of the Runner interface.

func (SequentialRunner) Run

func (r SequentialRunner) Run(task Task) error

Run implements the Run method of the Runner interface.

type Storable

type Storable struct {
    TextTask
}

Storable is an adapter that converts a TextTask to a StorableTask.

func (Storable) Handle

func (b Storable) Handle(resp *http.Response, s Storer) error

Handle implements the Handle method of the StorableTask interface.

type StorableTask

type StorableTask interface {
    Requester
    Handle(resp *http.Response, s Storer) error
}

StorableTask is a task that should be able to store data with a Storer passed to the Handle method.

type Storer

type Storer interface {
    Store(v interface{}) error
}

Storer provides the Store method to store an object parsed from an HTTP response.

type Task

type Task interface {
    Requester
    Handle(resp *http.Response) error
}

Task is an HTTP crawler task. It must provide an HTTP request and a method to handle an HTTP response.

func ToTask

func ToTask(t interface{}, tx Tx) Task

ToTask adapts an HTMLTask, TextTask or Task itself to a Task.

type TaskGroup

type TaskGroup struct {
    Tx
    // contains filtered or unexported fields
}

TaskGroup runs a group of StorableTasks as a single transaction.

func NewTaskGroup

func NewTaskGroup(tx Tx) *TaskGroup

NewTaskGroup creates a TaskGroup from a transaction object.

func (*TaskGroup) Add

func (g *TaskGroup) Add(task StorableTask)

Add adds a StorableTask to the TaskGroup.

func (*TaskGroup) Run

func (g *TaskGroup) Run(runner Runner) error

Run runs all tasks within the TaskGroup.

type Text

type Text struct {
    HTMLTask
}

Text is an adapter that converts an HTMLTask to a TextTask.

func (Text) Handle

func (t Text) Handle(r io.Reader, s Storer) error

Handle implements the Handle method of the TextTask interface.

type TextTask

type TextTask interface {
    Requester
    Handle(r io.Reader, s Storer) error
}

TextTask is a task that only retrieves a Response's body.
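
As an illustration of the TextTask shape, the sketch below defines a task that reads the response body line by line and stores each non-empty line. The interfaces are local copies; lineTask, sliceStorer and parseLines are hypothetical, and the URL is a placeholder.

```go
package main

import (
	"bufio"
	"io"
	"net/http"
	"strings"
)

// Local copies of the documented interfaces.
type Storer interface {
	Store(v interface{}) error
}

type TextTask interface {
	Request() *http.Request
	Handle(r io.Reader, s Storer) error
}

// lineTask is a hypothetical TextTask that stores every non-empty line.
type lineTask struct{ url string }

func (t lineTask) Request() *http.Request {
	req, _ := http.NewRequest(http.MethodGet, t.url, nil)
	return req
}

// Handle scans the body and stores each trimmed, non-empty line.
func (t lineTask) Handle(r io.Reader, s Storer) error {
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		if err := s.Store(line); err != nil {
			return err
		}
	}
	return sc.Err()
}

// Compile-time check that lineTask is a TextTask.
var _ TextTask = lineTask{}

// sliceStorer is a hypothetical Storer that keeps values in memory.
type sliceStorer struct{ items []interface{} }

func (s *sliceStorer) Store(v interface{}) error {
	s.items = append(s.items, v)
	return nil
}

// parseLines feeds a literal body to lineTask and returns what was stored.
func parseLines(body string) ([]interface{}, error) {
	s := &sliceStorer{}
	err := lineTask{url: "http://example.invalid/list.txt"}.Handle(strings.NewReader(body), s)
	return s.items, err
}
```

Because Handle only needs an io.Reader, the parsing logic can be tested with strings.NewReader and no HTTP traffic.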

type Tx

type Tx interface {
    Storer
    Commit() error
    Rollback() error
}

Tx is a transaction interface that provides methods for storing objects and for committing or rolling back changes. Notice that there is no Delete method defined. Tx's implementation must allow concurrent use.
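
A minimal in-memory Tx shows one way to meet the concurrent-use requirement: guard all state with a mutex. This is a sketch for tests and experiments, not one of the implementations under the db directory; memTx, errTxDone and storeConcurrently are hypothetical.

```go
package main

import (
	"errors"
	"sync"
)

// Local copies of the documented interfaces.
type Storer interface {
	Store(v interface{}) error
}

type Tx interface {
	Storer
	Commit() error
	Rollback() error
}

// errTxDone is returned once the transaction has been committed or rolled back.
var errTxDone = errors.New("transaction already finished")

// memTx buffers stored values until Commit and discards them on Rollback.
type memTx struct {
	mu      sync.Mutex
	pending []interface{}
	saved   []interface{}
	done    bool
}

func (tx *memTx) Store(v interface{}) error {
	tx.mu.Lock()
	defer tx.mu.Unlock()
	if tx.done {
		return errTxDone
	}
	tx.pending = append(tx.pending, v)
	return nil
}

func (tx *memTx) Commit() error {
	tx.mu.Lock()
	defer tx.mu.Unlock()
	if tx.done {
		return errTxDone
	}
	tx.saved, tx.pending, tx.done = tx.pending, nil, true
	return nil
}

func (tx *memTx) Rollback() error {
	tx.mu.Lock()
	defer tx.mu.Unlock()
	if tx.done {
		return errTxDone
	}
	tx.pending, tx.done = nil, true
	return nil
}

// Compile-time check that *memTx is a Tx.
var _ Tx = (*memTx)(nil)

// storeConcurrently stores n values from n goroutines, commits, and returns
// how many values were saved.
func storeConcurrently(n int) (int, error) {
	tx := &memTx{}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(v int) {
			defer wg.Done()
			_ = tx.Store(v)
		}(i)
	}
	wg.Wait()
	if err := tx.Commit(); err != nil {
		return 0, err
	}
	return len(tx.saved), nil
}
```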

Directories

Path           Synopsis
db             Package db contains common interface that all implementations under db directory must satisfy.
db/postgres
db/schema
util

Package getgo imports 9 packages. Updated 2018-11-16.