mellivora

Version: v0.0.0-...-17a28f7
Published: Apr 19, 2024 License: MIT Imports: 19 Imported by: 0

README

mellivora

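Program design

  1. The spider generates requests and sends them to the engine.
  2. The engine passes the requests to the scheduler for scheduling.
  3. The engine takes the next pending request from the scheduler.
  4. The engine sends the request to the middleware for wrapping.
  5. The middleware sends the request to the downloader.
  6. The downloader downloads the page.
  7. The middleware processes the response.
  8. The middleware sends the response to the engine.
  9. The engine returns the response to the spider, and the cycle starts again.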
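
The cycle above can be sketched in a few lines. This is a minimal illustration, not the package's implementation: `request`, `response`, `parse`, `fetchFunc`, `record`, `crawl` and `followOnce` are hypothetical names, the downloader is faked, and a plain FIFO slice stands in for the scheduler.

```go
package main

import "fmt"

// request and response are hypothetical stand-ins, not mellivora types.
type request struct {
	url   string
	depth int
}

type response struct {
	req  request
	body string
}

// parse plays the spider's role: it reads a response and returns follow-up requests.
type parse func(response) []request

// fetchFunc is the shape shared by the downloader and any middleware around it.
type fetchFunc func(request) response

// download is a fake downloader; it fabricates a body instead of using the network.
func download(r request) response {
	return response{req: r, body: "body of " + r.url}
}

// record is a middleware that notes each URL before passing the request on.
func record(next fetchFunc, seen *[]string) fetchFunc {
	return func(r request) response {
		*seen = append(*seen, r.url) // step 4: wrap the request
		return next(r)               // steps 5-8: download and hand back the response
	}
}

// crawl is the engine loop. It returns the URLs in the order they were fetched.
func crawl(start []request, p parse) []string {
	var seen []string
	fetch := record(download, &seen)
	queue := append([]request(nil), start...) // steps 1-2: initial requests are queued
	for len(queue) > 0 {
		r := queue[0] // step 3: take the next pending request
		queue = queue[1:]
		resp := fetch(r)
		queue = append(queue, p(resp)...) // step 9: the spider may produce new requests
	}
	return seen
}

// followOnce is a sample spider callback: it follows one link from depth-0 pages only.
func followOnce(resp response) []request {
	if resp.req.depth >= 1 {
		return nil
	}
	return []request{{url: resp.req.url + "/next", depth: resp.req.depth + 1}}
}

func main() {
	fmt.Println(crawl([]request{{url: "https://example.com"}}, followOnce))
}
```

The loop ends when the spider stops producing requests and the queue drains.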

TODO

  • Context serialization, to support remote tasks
  • Improve the Spider interface
  • Add extension support
  • Add more common middleware

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func NewMiddleware

func NewMiddleware(next func(MiddlewareFunc) MiddlewareFunc) *middleware

NewMiddleware creates a middleware instance.

Types

type BloomFilter

type BloomFilter struct {
	*bloom.BloomFilter
}

func NewBloomFilter

func NewBloomFilter() *BloomFilter

func (*BloomFilter) Add

func (b *BloomFilter) Add(c *Context)

func (*BloomFilter) Exist

func (b *BloomFilter) Exist(c *Context) bool

type Closable

type Closable interface {
	// Close releases all resources used by this object, including goroutines.
	Close() error
}

Closable is the interface for objects that can release their resources.

type Context

type Context struct {
	*Response
	// contains filtered or unexported fields
}

Context represents the context of the current HTTP request. It holds request and response objects, path, path parameters, data and registered handler.

func NewContext

func NewContext(core *Engine, request *http.Request, handler HandleFunc) *Context

NewContext returns a Context instance.

func (*Context) Engine

func (c *Context) Engine() *Engine

Engine returns the `Engine` instance.

func (*Context) GetDepth

func (c *Context) GetDepth() int64

GetDepth returns `depth`.

func (*Context) GetDontFilter

func (c *Context) GetDontFilter() bool

GetDontFilter returns `dontFilter`.

func (*Context) GetRequest

func (c *Context) GetRequest() *http.Request

GetRequest returns `*http.Request`.

func (*Context) MarshalText

func (c *Context) MarshalText() ([]byte, error)

func (*Context) MustValue

func (c *Context) MustValue(k string, v interface{})

func (*Context) Set

func (c *Context) Set(k string, v interface{})

func (*Context) SetDepth

func (c *Context) SetDepth(depth int64)

SetDepth sets `depth`.

func (*Context) SetDontFilter

func (c *Context) SetDontFilter(dontFilter bool)

SetDontFilter sets `dontFilter`.

func (*Context) SetRequest

func (c *Context) SetRequest(req *http.Request)

SetRequest sets `*http.Request`.

func (*Context) SetResponse

func (c *Context) SetResponse(response *Response)

SetResponse sets `*Response`.

func (*Context) SetRoundTripper

func (c *Context) SetRoundTripper(rt http.RoundTripper)

SetRoundTripper sets `http.RoundTripper`.

func (*Context) UnmarshalText

func (c *Context) UnmarshalText(text []byte) error

func (*Context) Value

func (c *Context) Value(k string, v interface{}) error
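
Set and Value have no descriptions here, so the following is only a guess at the pattern their signatures suggest: Set stores a value under a key, and Value decodes it into a pointer and reports an error. The `store` type and the choice of JSON as the stored form are assumptions made for this sketch, not the package's behavior.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// store is a hypothetical stand-in for the key/value part of Context.
type store struct {
	data map[string][]byte
}

func newStore() *store { return &store{data: make(map[string][]byte)} }

// Set encodes v as JSON so the stored form stays serializable.
// Values that cannot be encoded are dropped silently in this sketch.
func (s *store) Set(k string, v interface{}) {
	if bs, err := json.Marshal(v); err == nil {
		s.data[k] = bs
	}
}

// Value decodes the entry stored under k into v, which must be a pointer.
func (s *store) Value(k string, v interface{}) error {
	bs, ok := s.data[k]
	if !ok {
		return fmt.Errorf("key %q not found", k)
	}
	return json.Unmarshal(bs, v)
}

func main() {
	s := newStore()
	s.Set("page", 3)

	var page int
	if err := s.Value("page", &page); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(page)
}
```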

type ContextSerializable

type ContextSerializable struct {
	HandlerName  string
	RequestBytes []byte
	URL          *url.URL
	Setter       setter
}

type ContextSerializer

type ContextSerializer struct {
	// contains filtered or unexported fields
}

func NewContextSerializer

func NewContextSerializer() *ContextSerializer

func (*ContextSerializer) Marshal

func (cs *ContextSerializer) Marshal(c *Context) (bs []byte, err error)

func (*ContextSerializer) Unmarshal

func (cs *ContextSerializer) Unmarshal(bs []byte) (c *Context, err error)
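
ContextSerializable keeps both RequestBytes and a URL. One plausible reason, shown here with the standard library only, is that the HTTP/1.1 wire format carries the path and the Host header but not the scheme, so the absolute URL has to be restored after parsing. The helpers `marshalRequest`, `unmarshalRequest` and `roundTrip` are hypothetical; how ContextSerializer actually encodes a request is not documented here.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"net/http"
	"net/url"
)

// marshalRequest writes req in HTTP/1.1 wire format.
func marshalRequest(req *http.Request) ([]byte, error) {
	var buf bytes.Buffer
	if err := req.Write(&buf); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// unmarshalRequest parses wire-format bytes and restores the absolute URL,
// which the wire format does not carry in full.
func unmarshalRequest(bs []byte, u *url.URL) (*http.Request, error) {
	req, err := http.ReadRequest(bufio.NewReader(bytes.NewReader(bs)))
	if err != nil {
		return nil, err
	}
	req.URL = u
	req.RequestURI = "" // must be empty on requests sent by a client
	return req, nil
}

// roundTrip builds a GET request, serializes it, and parses it back.
func roundTrip(rawURL string) (*http.Request, error) {
	orig, err := http.NewRequest(http.MethodGet, rawURL, nil)
	if err != nil {
		return nil, err
	}
	bs, err := marshalRequest(orig)
	if err != nil {
		return nil, err
	}
	return unmarshalRequest(bs, orig.URL)
}

func main() {
	req, err := roundTrip("https://example.com/list?page=2")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.Method, req.URL)
}
```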

type Downloader

type Downloader struct{}

func NewDownloader

func NewDownloader() *Downloader

func (*Downloader) Next

func (d *Downloader) Next(handler MiddlewareFunc) MiddlewareFunc

type Engine

type Engine struct {
	// contains filtered or unexported fields
}

Engine is the top-level framework instance.

func NewEngine

func NewEngine(concurrency uint64) *Engine

NewEngine creates an instance of Engine.

func (*Engine) Close

func (e *Engine) Close()

Close immediately stops the engine.

func (*Engine) Logger

func (e *Engine) Logger() *zap.Logger

Logger returns `*zap.Logger`.

func (*Engine) Run

func (e *Engine) Run(spider Spider)

Run runs a spider.

func (*Engine) SetLogger

func (e *Engine) SetLogger(l *zap.Logger)

SetLogger sets `*zap.Logger`.

func (*Engine) Shutdown

func (e *Engine) Shutdown()

Shutdown stops the engine gracefully.

func (*Engine) Use

func (e *Engine) Use(middlewares ...Middleware)

Use adds middleware to the chain, which runs before the spider.

type Extension

type Extension interface{}

type Filter

type Filter interface {
	Exist(c *Context) bool
	Add(c *Context)
}
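
Any type with these two methods can act as a duplicate filter. BloomFilter trades exactness for memory, since a Bloom filter can report false positives. As a contrast, here is a sketch of an exact filter backed by a map. `setFilter`, `shouldFetch` and the one-field `Context` are hypothetical stand-ins written for this example.

```go
package main

import (
	"fmt"
	"sync"
)

// Context is a stand-in that carries only the request URL.
type Context struct{ URL string }

// Filter mirrors the interface documented above.
type Filter interface {
	Exist(c *Context) bool
	Add(c *Context)
}

// setFilter is a hypothetical exact-match Filter backed by a map.
type setFilter struct {
	mu   sync.Mutex
	seen map[string]struct{}
}

func newSetFilter() *setFilter {
	return &setFilter{seen: make(map[string]struct{})}
}

// Exist reports whether the URL of c was added before.
func (f *setFilter) Exist(c *Context) bool {
	f.mu.Lock()
	defer f.mu.Unlock()
	_, ok := f.seen[c.URL]
	return ok
}

// Add records the URL of c.
func (f *setFilter) Add(c *Context) {
	f.mu.Lock()
	defer f.mu.Unlock()
	f.seen[c.URL] = struct{}{}
}

// shouldFetch reports whether c is new and records it if so.
// The check and the insert are two separate calls, so this helper
// is not atomic when several goroutines share one filter.
func shouldFetch(f Filter, c *Context) bool {
	if f.Exist(c) {
		return false
	}
	f.Add(c)
	return true
}

func main() {
	f := newSetFilter()
	c := &Context{URL: "https://example.com"}
	fmt.Println(shouldFetch(f, c), shouldFetch(f, c))
}
```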

type HandleFunc

type HandleFunc func(c *Context) Task

HandleFunc defines a function that handles a *Context and returns a Task.

type LifoScheduler

type LifoScheduler struct {
	// contains filtered or unexported fields
}

func NewLifoScheduler

func NewLifoScheduler() *LifoScheduler

func (*LifoScheduler) Close

func (l *LifoScheduler) Close()

func (*LifoScheduler) Pop

func (l *LifoScheduler) Pop() (c []byte)

func (*LifoScheduler) Push

func (l *LifoScheduler) Push(c []byte)

type Middleware

type Middleware interface {
	Next(MiddlewareFunc) MiddlewareFunc
}

Middleware defines the interface that middleware must implement.

type MiddlewareFunc

type MiddlewareFunc func(c *Context) error

MiddlewareFunc defines a function to serve *Context.
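
The Middleware and MiddlewareFunc signatures describe the usual wrapping pattern: each middleware receives the next function and returns a new one around it. The sketch below shows how such a chain might be composed. The `tag` type, the `chain` helper and the simplified `Context` are assumptions for illustration; the order in which Engine.Use applies its middleware is not documented here.

```go
package main

import "fmt"

// Context is a simplified stand-in for the package's Context type.
type Context struct{ Trace []string }

// MiddlewareFunc and Middleware mirror the documented signatures.
type MiddlewareFunc func(c *Context) error

type Middleware interface {
	Next(MiddlewareFunc) MiddlewareFunc
}

// tag is a hypothetical middleware that records its name before and after next.
type tag string

func (t tag) Next(next MiddlewareFunc) MiddlewareFunc {
	return func(c *Context) error {
		c.Trace = append(c.Trace, string(t)+":before")
		err := next(c)
		c.Trace = append(c.Trace, string(t)+":after")
		return err
	}
}

// chain wraps final so that the first middleware in ms runs outermost.
func chain(final MiddlewareFunc, ms ...Middleware) MiddlewareFunc {
	for i := len(ms) - 1; i >= 0; i-- {
		final = ms[i].Next(final)
	}
	return final
}

// download stands in for the innermost step, such as the downloader.
func download(c *Context) error {
	c.Trace = append(c.Trace, "download")
	return nil
}

func main() {
	c := &Context{}
	if err := chain(download, tag("a"), tag("b"))(c); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(c.Trace)
}
```

With this composition, request-side code runs in registration order and response-side code runs in reverse.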

type Pipeline

type Pipeline interface {
	ProcessItems(items ...interface{})
}

type RequestOptions

type RequestOptions struct {
	// contains filtered or unexported fields
}

func NewRequestOptions

func NewRequestOptions() *RequestOptions

func (*RequestOptions) GetDepth

func (c *RequestOptions) GetDepth() int64

GetDepth returns `depth`.

func (*RequestOptions) GetDontFilter

func (c *RequestOptions) GetDontFilter() bool

GetDontFilter returns `dontFilter`.

func (*RequestOptions) MarshalText

func (c *RequestOptions) MarshalText() ([]byte, error)

func (*RequestOptions) MustValue

func (c *RequestOptions) MustValue(k string, v interface{})

func (*RequestOptions) Set

func (c *RequestOptions) Set(k string, v interface{})

func (*RequestOptions) SetDepth

func (c *RequestOptions) SetDepth(depth int64)

SetDepth sets `depth`.

func (*RequestOptions) SetDontFilter

func (c *RequestOptions) SetDontFilter(dontFilter bool)

SetDontFilter sets `dontFilter`.

func (*RequestOptions) UnmarshalText

func (c *RequestOptions) UnmarshalText(text []byte) error

func (*RequestOptions) Value

func (c *RequestOptions) Value(k string, v interface{}) error

type RequestOptionsFunc

type RequestOptionsFunc func(options *RequestOptions)

func DontFilter

func DontFilter() RequestOptionsFunc

DontFilter returns a RequestOptionsFunc that sets the dontFilter flag.

func WithValue

func WithValue(k string, v interface{}) RequestOptionsFunc

WithValue returns a RequestOptionsFunc that sets k, v in the setter. The provided value must be serializable.
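
RequestOptionsFunc follows the functional-options pattern: each constructor returns a function that mutates an options value, and the caller applies them in order. A minimal sketch of the pattern, using hypothetical lowercase stand-ins (`options`, `optionFunc`, `dontFilter`, `withValue`, `apply`) rather than the package's own types:

```go
package main

import "fmt"

// options is a stand-in for RequestOptions with just two fields.
type options struct {
	dontFilter bool
	values     map[string]interface{}
}

// optionFunc has the same shape as RequestOptionsFunc.
type optionFunc func(o *options)

// dontFilter marks the request as exempt from duplicate filtering.
func dontFilter() optionFunc {
	return func(o *options) { o.dontFilter = true }
}

// withValue attaches a key/value pair to the request.
func withValue(k string, v interface{}) optionFunc {
	return func(o *options) { o.values[k] = v }
}

// apply builds an options value and runs every option against it in order.
func apply(fns ...optionFunc) *options {
	o := &options{values: make(map[string]interface{})}
	for _, fn := range fns {
		fn(o)
	}
	return o
}

func main() {
	o := apply(dontFilter(), withValue("category", "news"))
	fmt.Println(o.dontFilter, o.values["category"])
}
```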

type Response

type Response struct {
	*http.Response
}

func NewResponse

func NewResponse(response *http.Response) *Response

func (*Response) Bytes

func (resp *Response) Bytes() (bodyBytes []byte, err error)

Bytes reads response.Body and returns it as []byte.

func (*Response) JSON

func (resp *Response) JSON(i interface{}) error

JSON parses the resp.Body data and stores the result in the value pointed to by i.

func (*Response) String

func (resp *Response) String() (str string, err error)

String reads response.Body and returns it as a string.

func (*Response) Tokenizer

func (resp *Response) Tokenizer() (tokenizer *html.Tokenizer)

Tokenizer returns an *html.Tokenizer for response.Body.
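
Helpers like these all consume response.Body, which is a stream and can be read only once. The sketch below shows the JSON case with the standard library alone. `decodeJSON`, `fakeResponse` and `item` are hypothetical names, and whether the package's own methods close the body is not stated here; this sketch does.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"strings"
)

// item is a sample target type for decoding.
type item struct {
	Title string `json:"title"`
}

// decodeJSON reads the whole body, closes it, and decodes it into v.
func decodeJSON(resp *http.Response, v interface{}) error {
	defer resp.Body.Close()
	bs, err := io.ReadAll(resp.Body)
	if err != nil {
		return err
	}
	return json.Unmarshal(bs, v)
}

// fakeResponse builds an *http.Response without touching the network.
func fakeResponse(body string) *http.Response {
	return &http.Response{
		StatusCode: http.StatusOK,
		Body:       io.NopCloser(strings.NewReader(body)),
	}
}

func main() {
	var it item
	if err := decodeJSON(fakeResponse(`{"title":"hello"}`), &it); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(it.Title)
}
```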

type Runnable

type Runnable interface {
	// Start starts the runnable object. Upon the method returning nil, the object begins to function properly.
	Start() error

	Closable
}

Runnable is the interface for objects that can start to work and stop on demand.
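
A sketch of a type that satisfies Runnable, to show the contract in the comments above: Start returns once the object is working, and Close releases everything, including the goroutine. The `worker` type and its helpers are hypothetical and exist only for this example.

```go
package main

import (
	"fmt"
	"sync"
)

// Closable and Runnable mirror the interfaces documented above.
type Closable interface {
	Close() error
}

type Runnable interface {
	Start() error
	Closable
}

// worker is a hypothetical Runnable that owns one background goroutine.
type worker struct {
	stop chan struct{} // closed to ask the goroutine to exit
	done chan struct{} // closed by the goroutine when it has exited
	once sync.Once
}

func newWorker() *worker {
	return &worker{stop: make(chan struct{}), done: make(chan struct{})}
}

// Start launches the goroutine. This sketch expects it to be called once.
func (w *worker) Start() error {
	go func() {
		defer close(w.done)
		<-w.stop // a real worker would do its job here until told to stop
	}()
	return nil
}

// Close signals the goroutine and waits for it to exit.
// It must follow Start; repeated calls are safe.
func (w *worker) Close() error {
	w.once.Do(func() { close(w.stop) })
	<-w.done
	return nil
}

// finished reports whether the goroutine has exited.
func (w *worker) finished() bool {
	select {
	case <-w.done:
		return true
	default:
		return false
	}
}

func main() {
	var r Runnable = newWorker()
	if err := r.Start(); err != nil {
		fmt.Println("start failed:", err)
		return
	}
	if err := r.Close(); err != nil {
		fmt.Println("close failed:", err)
		return
	}
	fmt.Println("worker stopped")
}
```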

type Scheduler

type Scheduler interface {
	// Push adds a serialized *Context to the queue.
	Push([]byte)
	// Pop removes and returns a serialized *Context.
	// It returns nil if the queue is empty.
	Pop() []byte
	// Close closes the queue.
	Close()
}
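
A scheduler with these three methods can be as small as a mutex-guarded slice. The sketch below is a last-in, first-out version in the spirit of LifoScheduler. The `stack` type is hypothetical and is not the package's implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// stack is a hypothetical LIFO queue with the Scheduler method set.
type stack struct {
	mu    sync.Mutex
	items [][]byte
}

// Push adds an entry on top of the stack.
func (s *stack) Push(c []byte) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.items = append(s.items, c)
}

// Pop removes and returns the most recently pushed entry, or nil if empty.
func (s *stack) Pop() []byte {
	s.mu.Lock()
	defer s.mu.Unlock()
	n := len(s.items)
	if n == 0 {
		return nil
	}
	c := s.items[n-1]
	s.items[n-1] = nil // drop the reference so it can be collected
	s.items = s.items[:n-1]
	return c
}

// Close discards any entries that are still queued.
func (s *stack) Close() {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.items = nil
}

func main() {
	s := &stack{}
	s.Push([]byte("first"))
	s.Push([]byte("second"))

	top := s.Pop()
	next := s.Pop()
	fmt.Println(string(top), string(next), s.Pop() == nil)
}
```

Popping the newest entry first makes a crawl go depth-first; a FIFO queue would make it breadth-first.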

type Spider

type Spider interface {
	// StartRequests generates the first Task.
	StartRequests() Task
}

type Task

type Task interface {
	Requests() []*http.Request
	Items() []interface{}
	RequestOptions() []RequestOptionsFunc
	Handler() HandleFunc
}
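
A Task bundles what a handler hands back to the engine: new requests, scraped items, and the handler for the responses. The sketch below shows how a value of that shape might be built. It is trimmed to three of the four documented methods (RequestOptions is left out), and `task`, `get`, `newItems`, `parsePage` and the empty `Context` are hypothetical stand-ins, not the package's code.

```go
package main

import (
	"fmt"
	"net/http"
)

// Context is an empty stand-in for the package's Context type.
type Context struct{}

// HandleFunc mirrors the documented signature.
type HandleFunc func(c *Context) Task

// Task is a trimmed version of the documented interface.
type Task interface {
	Requests() []*http.Request
	Items() []interface{}
	Handler() HandleFunc
}

// task is a plain struct that satisfies the trimmed Task interface.
type task struct {
	requests []*http.Request
	items    []interface{}
	handler  HandleFunc
}

func (t *task) Requests() []*http.Request { return t.requests }
func (t *task) Items() []interface{}      { return t.items }
func (t *task) Handler() HandleFunc       { return t.handler }

// get builds a task holding one GET request and the handler for its response.
func get(url string, h HandleFunc) (Task, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	return &task{requests: []*http.Request{req}, handler: h}, nil
}

// newItems builds a task that carries results only.
func newItems(items ...interface{}) Task {
	return &task{items: items}
}

// parsePage is a sample handler that emits a single item.
func parsePage(c *Context) Task {
	return newItems("scraped title")
}

func main() {
	start, err := get("https://example.com", parsePage)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	result := start.Handler()(&Context{})
	fmt.Println(len(start.Requests()), result.Items())
}
```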

func Get

func Get(url string, handler HandleFunc, options ...RequestOptionsFunc) (Task, error)

func Gets

func Gets(urls []string, handler HandleFunc, options ...RequestOptionsFunc) (t Task, err error)

func MustGet

func MustGet(url string, handler HandleFunc, options ...RequestOptionsFunc) Task

func NewItems

func NewItems(items ...interface{}) Task

func Request

func Request(request *http.Request, handler HandleFunc, options ...RequestOptionsFunc) Task

Directories

Path Synopsis
example
library
testing
core
Package core is a generated GoMock package.
roundtripper
Package roundtripper is a generated GoMock package.
