import "github.com/h12w/getgo"
Package getgo is a concurrent web scraping framework.
adapter.go doc.go group.go interface.go logger.go runner.go util.go
var RetryNum = 3
RetryNum is the number of retries when fetching a page fails.
Run runs an HTMLTask, a TextTask or a Task. tx is committed on success and rolled back on failure.
type Atomized struct {
	StorableTask
	Tx
}
Atomized is an adapter that converts a StorableTask to an atomized Task that supports transactions.
Handle implements the Handle method of Task interface.
type ConcurrentRunner struct {
// contains filtered or unexported fields
}
ConcurrentRunner runs tasks concurrently.
func NewConcurrentRunner(workerNum int, client Doer, errHandler ErrorHandler) ConcurrentRunner
NewConcurrentRunner creates a concurrent runner.
func (r ConcurrentRunner) Close()
Close implements the Close method of the Runner interface.
func (r ConcurrentRunner) Run(task Task) error
Run implements the Run method of the Runner interface.
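As a rough illustration of what running tasks concurrently with a fixed number of workers can look like, here is a minimal worker-pool sketch. It is not getgo's implementation: `job`, `pool`, `newPool` and `runAll` are hypothetical names, and a plain function stands in for a Task.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// job is a hypothetical unit of work standing in for a getgo Task.
type job func() error

// pool runs jobs on a fixed number of worker goroutines.
type pool struct {
	jobs    chan job
	wg      sync.WaitGroup
	onError func(error) // called from worker goroutines, so it must be concurrency-safe
}

// newPool starts workerNum workers that read jobs until the pool is closed.
func newPool(workerNum int, onError func(error)) *pool {
	p := &pool{jobs: make(chan job), onError: onError}
	p.wg.Add(workerNum)
	for i := 0; i < workerNum; i++ {
		go func() {
			defer p.wg.Done()
			for j := range p.jobs {
				if err := j(); err != nil {
					p.onError(err)
				}
			}
		}()
	}
	return p
}

// Run hands a job to the next free worker, blocking until one is available.
func (p *pool) Run(j job) { p.jobs <- j }

// Close stops accepting jobs and waits for the workers to finish.
func (p *pool) Close() {
	close(p.jobs)
	p.wg.Wait()
}

// runAll submits n jobs; every fifth job fails to simulate a fetch error.
func runAll(workerNum, n int) (done, failed int) {
	var mu sync.Mutex
	p := newPool(workerNum, func(error) {
		mu.Lock()
		failed++
		mu.Unlock()
	})
	for i := 0; i < n; i++ {
		i := i
		p.Run(func() error {
			if i%5 == 0 {
				return errors.New("simulated fetch failure")
			}
			mu.Lock()
			done++
			mu.Unlock()
			return nil
		})
	}
	p.Close()
	return done, failed
}

func main() {
	done, failed := runAll(4, 20)
	fmt.Println("done:", done, "failed:", failed)
}
```

In this sketch errors are reported through a callback instead of being returned from Run, which mirrors the role the package assigns to an ErrorHandler.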
Doer processes an HTTP request and returns an HTTP response.
ErrorHandler is used to call back an external error handler when a task fails.
ErrorHandlerFunc converts a function object to an ErrorHandler interface.
HandleError implements ErrorHandler interface.
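The function-to-interface adapter described here follows the same pattern as http.HandlerFunc. A minimal sketch with local stand-in types; the `HandleError(err error) error` signature is an assumption, since the actual signature is not shown on this page, and `countErrors` is a hypothetical helper.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrorHandler is a local stand-in; the real method signature in getgo may differ.
type ErrorHandler interface {
	HandleError(err error) error
}

// ErrorHandlerFunc lets a plain function satisfy ErrorHandler.
type ErrorHandlerFunc func(err error) error

// HandleError calls f(err).
func (f ErrorHandlerFunc) HandleError(err error) error { return f(err) }

// countErrors returns a handler that increments *n for each call and swallows the error.
func countErrors(n *int) ErrorHandler {
	return ErrorHandlerFunc(func(err error) error {
		*n++
		return nil
	})
}

func main() {
	var n int
	h := countErrors(&n)
	h.HandleError(errors.New("fetch failed"))
	h.HandleError(errors.New("parse failed"))
	fmt.Println("errors seen:", n)
}
```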
HTMLTask is an HTML task that should be able to parse an HTML node tree into a slice of objects.
type HTTPLogger struct {
// contains filtered or unexported fields
}
HTTPLogger wraps an HTTP client and logs the request and network speed.
func NewHTTPLogger(client *http.Client) *HTTPLogger
NewHTTPLogger creates an HTTPLogger by inspecting the connection's Read method of an http.Client.
Do implements the Doer interface.
Requester is the interface that returns an HTTP request via its Request method. The Request method must be safe to call repeatedly.
RetryDoer wraps a Doer and adds retries to the Do method.
Do implements the Doer interface.
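A retrying wrapper around a Doer can be sketched as follows. This is an illustration, not the package's code: `Doer` is redeclared locally with the assumed shape of (*http.Client).Do, and `retryDoer`, its `Attempts` field, `flakyDoer` and `fetchWithRetry` are hypothetical. Whether RetryNum counts total attempts or retries after the first attempt is not stated on this page; the sketch counts total attempts.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// Doer is a local stand-in, assumed to match the shape of (*http.Client).Do.
type Doer interface {
	Do(req *http.Request) (*http.Response, error)
}

// retryDoer is a hypothetical wrapper that calls the inner Doer up to Attempts times.
type retryDoer struct {
	Doer
	Attempts int
}

// Do returns the first successful response, or the last error once attempts run out.
// Resending the same request is only safe when it has no body, as with a plain GET.
func (d retryDoer) Do(req *http.Request) (*http.Response, error) {
	attempts := d.Attempts
	if attempts < 1 {
		attempts = 1
	}
	var err error
	for i := 0; i < attempts; i++ {
		var resp *http.Response
		resp, err = d.Doer.Do(req)
		if err == nil {
			return resp, nil
		}
	}
	return nil, err
}

// flakyDoer fails a fixed number of calls and then succeeds, without touching the network.
type flakyDoer struct {
	failures int // remaining calls that should fail
	calls    int // total calls seen
}

func (f *flakyDoer) Do(req *http.Request) (*http.Response, error) {
	f.calls++
	if f.failures > 0 {
		f.failures--
		return nil, errors.New("temporary network error")
	}
	return &http.Response{StatusCode: http.StatusOK, Body: http.NoBody}, nil
}

// fetchWithRetry reports how many calls reached the inner Doer and the final error.
func fetchWithRetry(failures, attempts int) (calls int, err error) {
	inner := &flakyDoer{failures: failures}
	req, err := http.NewRequest("GET", "http://example.com/", nil)
	if err != nil {
		return 0, err
	}
	resp, err := retryDoer{Doer: inner, Attempts: attempts}.Do(req)
	if err == nil {
		resp.Body.Close()
	}
	return inner.calls, err
}

func main() {
	calls, err := fetchWithRetry(2, 3)
	fmt.Println("calls:", calls, "err:", err)
}
```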
type Runner interface {
	Run(task Task) error // Run runs a task
	Close()              // Close closes the runner
}
Runner runs Tasks. A Runner gets an HTTP request from a Task, gets the HTTP response and passes the response to the Task's Handle method. When a runner fails to get a response object, it must still pass a nil response to the Handle method to signal that any transaction must be rolled back.
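The nil-response contract can be sketched with local stand-in types. The `Doer` and `Task` method signatures below are assumptions, and `runOnce`, `doerFunc`, `recordingTask` and `demo` are hypothetical names used only for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// Doer and Task are local stand-ins; the real getgo signatures may differ.
type Doer interface {
	Do(req *http.Request) (*http.Response, error)
}

type Task interface {
	Request() *http.Request
	Handle(resp *http.Response) error
}

// runOnce fetches the task's request and always calls Handle,
// passing nil when no response could be obtained.
func runOnce(client Doer, task Task) error {
	resp, err := client.Do(task.Request())
	if err != nil {
		_ = task.Handle(nil) // notify the task so it can roll back
		return err
	}
	defer resp.Body.Close() // assumption: the runner owns the body
	return task.Handle(resp)
}

// doerFunc adapts a function to Doer so the demo needs no network access.
type doerFunc func(*http.Request) (*http.Response, error)

func (f doerFunc) Do(r *http.Request) (*http.Response, error) { return f(r) }

// recordingTask remembers whether it was told to roll back.
type recordingTask struct {
	url        string
	rolledBack bool
}

// Request builds a fresh request on every call, so repeated calls are safe.
func (t *recordingTask) Request() *http.Request {
	req, _ := http.NewRequest("GET", t.url, nil) // error ignored: the URL is a fixed, valid literal
	return req
}

func (t *recordingTask) Handle(resp *http.Response) error {
	if resp == nil {
		t.rolledBack = true
		return errors.New("no response")
	}
	return nil
}

// demo runs one task against a client that either fails or succeeds.
func demo(fail bool) (rolledBack bool, err error) {
	client := doerFunc(func(*http.Request) (*http.Response, error) {
		if fail {
			return nil, errors.New("connection refused")
		}
		return &http.Response{StatusCode: http.StatusOK, Body: http.NoBody}, nil
	})
	task := &recordingTask{url: "http://example.com/"}
	err = runOnce(client, task)
	return task.rolledBack, err
}

func main() {
	rolledBack, err := demo(true)
	fmt.Println("rolled back:", rolledBack, "err:", err)
}
```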
type SequentialRunner struct {
	Client Doer
	ErrorHandler
}
SequentialRunner is a simple single threaded task runner.
func (r SequentialRunner) Close()
Close implements the Close method of the Runner interface.
func (r SequentialRunner) Run(task Task) error
Run implements the Run method of the Runner interface.
Storable is an adapter that converts a TextTask to a StorableTask.
Handle implements the Handle method of StorableTask interface.
StorableTask is a task that should be able to store data with a Storer passed to the Handle method.
Storer provides the Store method to store an object parsed from an HTTP response.
Task is an HTTP crawler task. It must provide an HTTP request and a method to handle an HTTP response.
ToTask adapts an HTMLTask, TextTask or Task itself to a Task.
TaskGroup groups StorableTasks into a single transaction.
NewTaskGroup creates a TaskGroup from a transaction object.
func (g *TaskGroup) Add(task StorableTask)
Add a StorableTask to TaskGroup.
Run all tasks within a TaskGroup.
Text is an adapter that converts an HTMLTask to a TextTask.
Handle implements the Handle method of TextTask interface.
TextTask is a task that only retrieves a Response's body.
Tx is a transaction interface that provides methods for storing objects and for committing or rolling back changes. Note that no Delete method is defined. Implementations of Tx must be safe for concurrent use.
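To make the concurrency requirement concrete, here is a minimal in-memory sketch guarded by a mutex. The method set of `Tx` (Store, Commit, Rollback) is assumed from the description above; `memTx` and `committedAfter` are hypothetical and not part of the package.

```go
package main

import (
	"fmt"
	"sync"
)

// Tx is a local stand-in; the method set is assumed from the description, not copied from getgo.
type Tx interface {
	Store(v interface{}) error
	Commit() error
	Rollback() error
}

// memTx buffers stored values until Commit; a mutex makes it safe for concurrent use.
type memTx struct {
	mu        sync.Mutex
	pending   []interface{}
	committed []interface{}
}

var _ Tx = (*memTx)(nil) // compile-time check that memTx satisfies Tx

func (t *memTx) Store(v interface{}) error {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.pending = append(t.pending, v)
	return nil
}

// Commit makes all pending values permanent.
func (t *memTx) Commit() error {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.committed = append(t.committed, t.pending...)
	t.pending = nil
	return nil
}

// Rollback discards all pending values.
func (t *memTx) Rollback() error {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.pending = nil
	return nil
}

// Len returns the number of committed values.
func (t *memTx) Len() int {
	t.mu.Lock()
	defer t.mu.Unlock()
	return len(t.committed)
}

// committedAfter stores n values from n goroutines, then commits or rolls back.
func committedAfter(n int, commit bool) int {
	tx := &memTx{}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(v int) {
			defer wg.Done()
			tx.Store(v)
		}(i)
	}
	wg.Wait()
	if commit {
		tx.Commit()
	} else {
		tx.Rollback()
	}
	return tx.Len()
}

func main() {
	fmt.Println("after commit:", committedAfter(50, true))
	fmt.Println("after rollback:", committedAfter(50, false))
}
```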
Path | Synopsis |
---|---|
db | Package db contains common interface that all implementations under db directory must satisfy. |
db/postgres | |
db/schema | |
util | |
Package getgo imports 9 packages. Updated 2018-11-16.