remilia

Version: v0.5.4 (Latest)
Warning

This package is not in the latest version of its module.

Published: May 1, 2024 License: MIT Imports: 22 Imported by: 0

README

Remilia


Remilia is a web scraping framework for Go, built for efficiency. It lets you focus on extracting and using web content while the framework handles the mechanics of scraping, such as concurrent fetching, retries with backoff, and request/response hooks.

Features

  • Clean API & elegant mental model
  • Concurrency support
  • Configurable backoff retry algorithm
  • Pre-request and post-response hooks

Example

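package main

import (
	"fmt"

	"github.com/PuerkitoBio/goquery"
	"github.com/ShroXd/remilia"
)

func main() {
	// titleParser prints the text of every <h1> element on the fetched page.
	titleParser := func(in *goquery.Document, put remilia.Put[string]) {
		in.Find("h1").Each(func(i int, s *goquery.Selection) {
			fmt.Println(s.Text())
		})
	}

	// Create a scraper with default options, then run one parsing stage
	// over a single seed URL. Only call Do if New succeeded.
	rem, err := remilia.New()
	if err == nil {
		err = rem.Do(rem.Just("https://go.dev/"), rem.Unit(titleParser))
	}
	if err != nil {
		fmt.Println("Error:", err)
	}
}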

Install

go get -u github.com/ShroXd/remilia

License

This project is licensed under the MIT License. See the LICENSE file for details.

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func WithApiKeyAuth added in v0.5.2

func WithApiKeyAuth(apiKey string) clientOptionFunc

func WithBaseURL added in v0.3.0

func WithBaseURL(url string) clientOptionFunc

func WithBasicAuth added in v0.5.2

func WithBasicAuth(username string, password string) clientOptionFunc

func WithBearerAuth added in v0.5.2

func WithBearerAuth(token string) clientOptionFunc
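
The three auth options take only credentials, and the documentation does not say how they are applied. Assuming Basic and Bearer auth end up as the standard HTTP `Authorization` header (an assumption, not confirmed here), the values they would need to produce look like this with plain `net/http`. The API-key variant is left out because its header name is not documented. The helper names are hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
)

// basicAuthHeader returns the Authorization value net/http produces for
// HTTP Basic auth: "Basic " + base64("username:password").
func basicAuthHeader(username, password string) string {
	req, _ := http.NewRequest(http.MethodGet, "https://example.com/", nil)
	req.SetBasicAuth(username, password)
	return req.Header.Get("Authorization")
}

// bearerAuthHeader returns a bearer-token Authorization value; the token
// is sent as-is after "Bearer ".
func bearerAuthHeader(token string) string {
	return "Bearer " + token
}

func main() {
	fmt.Println(basicAuthHeader("user", "pass")) // Basic dXNlcjpwYXNz
	fmt.Println(bearerAuthHeader("my-token"))    // Bearer my-token
}
```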

func WithClientOptions added in v0.5.0

func WithClientOptions(opts ...clientOptionFunc) remiliaOption

func WithConcurrency added in v0.5.0

func WithConcurrency(concurrency uint) stageOptionFunc

func WithCookie added in v0.5.2

func WithCookie(cookie string) clientOptionFunc

func WithHeaders added in v0.3.0

func WithHeaders(headers map[string]string) clientOptionFunc

func WithInputBufferSize added in v0.5.0

func WithInputBufferSize(size uint) stageOptionFunc

func WithLinearAttempt added in v0.3.0

func WithLinearAttempt(a uint8) clientOptionFunc

func WithLogger added in v0.2.0

func WithLogger(logger Logger) remiliaOption

func WithMaxAttempt added in v0.3.0

func WithMaxAttempt(a uint8) clientOptionFunc

func WithMaxDelay added in v0.3.0

func WithMaxDelay(d time.Duration) clientOptionFunc

func WithMinDelay added in v0.3.0

func WithMinDelay(d time.Duration) clientOptionFunc

func WithMultiplier added in v0.3.0

func WithMultiplier(m float64) clientOptionFunc

func WithPostResponseHooks added in v0.3.0

func WithPostResponseHooks(hooks ...ResponseHook) clientOptionFunc

func WithPreRequestHooks added in v0.3.0

func WithPreRequestHooks(hooks ...RequestHook) clientOptionFunc
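
Pre-request hooks have type `RequestHook func(*Request) error`, and post-response hooks `ResponseHook func(*Response) error`. A reasonable assumption, not stated in the docs, is that hooks run in the order given and the first error aborts the request. The sketch shows that behaviour for request hooks, using a local copy of the documented `Request` fields; `runHooks`, `addHeader` and `httpsOnly` are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Request copies the exported fields of remilia.Request.
type Request struct {
	Method      string
	URL         string
	Headers     map[string]string
	Body        []byte
	QueryParams map[string]string
}

type RequestHook func(*Request) error

// runHooks applies hooks in order and stops at the first error (assumed semantics).
func runHooks(req *Request, hooks ...RequestHook) error {
	for _, h := range hooks {
		if err := h(req); err != nil {
			return err
		}
	}
	return nil
}

// addHeader returns a hook that sets a header, creating the map on first use.
func addHeader(key, value string) RequestHook {
	return func(r *Request) error {
		if r.Headers == nil {
			r.Headers = map[string]string{}
		}
		r.Headers[key] = value
		return nil
	}
}

// httpsOnly rejects plain-HTTP URLs.
func httpsOnly(r *Request) error {
	if !strings.HasPrefix(r.URL, "https://") {
		return errors.New("refusing non-HTTPS URL: " + r.URL)
	}
	return nil
}

func main() {
	req := &Request{Method: "GET", URL: "https://go.dev/"}
	err := runHooks(req, addHeader("Accept", "text/html"), httpsOnly)
	fmt.Println(err, req.Headers["Accept"]) // <nil> text/html
}
```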

func WithTimeout added in v0.3.0

func WithTimeout(timeout time.Duration) clientOptionFunc

func WithTransformer added in v0.5.0

func WithTransformer(transformer transform.Transformer) clientOptionFunc

func WithUnitOptions added in v0.5.2

func WithUnitOptions(opts ...unitOptionFunc) remiliaOption

func WithUserAgentGenerator added in v0.5.2

func WithUserAgentGenerator(fn func() string) clientOptionFunc
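
`WithUserAgentGenerator` takes a `func() string`, so any closure with that signature should fit. Below is a minimal round-robin generator that is safe for concurrent calls via `sync/atomic` (Go 1.19+). The user-agent strings are placeholders, `roundRobinUA` is a hypothetical name, and whether Remilia calls the generator once per request is an assumption.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// roundRobinUA returns a func() string that cycles through agents, which
// must be non-empty. The atomic counter keeps it safe when several workers
// call it at once.
func roundRobinUA(agents ...string) func() string {
	var n atomic.Uint64
	return func() string {
		i := n.Add(1) - 1
		return agents[i%uint64(len(agents))]
	}
}

func main() {
	next := roundRobinUA("example-bot/1.0", "example-bot/2.0")
	for i := 0; i < 3; i++ {
		fmt.Println(next())
	}
	// example-bot/1.0, example-bot/2.0, example-bot/1.0
}
```

Given the listed signature, the result could be passed as `remilia.WithUserAgentGenerator(roundRobinUA(...))` inside `WithClientOptions`.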

func WithWorkLinearAttempt added in v0.5.4

func WithWorkLinearAttempt(a uint8) exponentialBackoffOptionFunc

func WithWorkMaxAttempt added in v0.5.4

func WithWorkMaxAttempt(a uint8) exponentialBackoffOptionFunc

func WithWorkMaxDelay added in v0.5.4

func WithWorkMaxDelay(d time.Duration) exponentialBackoffOptionFunc

func WithWorkMinDelay added in v0.5.4

func WithWorkMinDelay(d time.Duration) exponentialBackoffOptionFunc

func WithWorkMultiplier added in v0.5.4

func WithWorkMultiplier(m float64) exponentialBackoffOptionFunc

Types

type Client added in v0.2.0

type Client struct {
	// contains filtered or unexported fields
}

type Get added in v0.2.0

type Get[T any] func() (T, bool)

type Logger added in v0.2.0

type Logger interface {
	Debug(msg string, context ...logContext)
	Info(msg string, context ...logContext)
	Warn(msg string, context ...logContext)
	Error(msg string, context ...logContext)
	Panic(msg string, context ...logContext)
}

type Put added in v0.2.0

type Put[T any] func(T)
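
`Put[T]` and `Get[T]` are plain function types: a sink that accepts a value, and a source that returns a value plus an ok flag. `UnitFunc` receives a `Put[string]`, presumably to hand results (such as discovered URLs) to the next stage. Assuming nothing about Remilia's internals, here is a minimal sketch that backs a Put/Get pair with a buffered channel; `newPipe` is hypothetical, and ok=false means the channel is closed and drained.

```go
package main

import "fmt"

// Put and Get copy the documented generic function types.
type Put[T any] func(T)
type Get[T any] func() (T, bool)

// newPipe returns a Put/Get pair sharing a buffered channel, plus a close
// function. Get blocks while the channel is open and empty, and reports
// false once it is closed and drained.
func newPipe[T any](size int) (Put[T], Get[T], func()) {
	ch := make(chan T, size)
	put := func(v T) { ch <- v }
	get := func() (T, bool) {
		v, ok := <-ch
		return v, ok
	}
	return put, get, func() { close(ch) }
}

func main() {
	put, get, done := newPipe[string](4)
	put("https://go.dev/doc/")
	put("https://go.dev/blog/")
	done()
	for v, ok := get(); ok; v, ok = get() {
		fmt.Println(v)
	}
}
```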

type Remilia

type Remilia struct {
	ID   string
	Name string
	// contains filtered or unexported fields
}

func New

func New(opts ...remiliaOption) (*Remilia, error)

func (*Remilia) Do added in v0.2.0

func (r *Remilia) Do(producerDef processorDef[*Request], stageDefs ...stageDef[*Request]) error

func (*Remilia) Just added in v0.2.0

func (r *Remilia) Just(urlStr string) processorDef[*Request]

func (*Remilia) Unit added in v0.3.0

func (r *Remilia) Unit(fn UnitFunc, opts ...unitOptionFunc) stageDef[*Request]
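
The signatures of `Do`, `Just` and `Unit` suggest a pipeline: `Just` defines a producer that emits one starting `*Request`, and each `Unit` is a stage that consumes inputs and, through its `Put`, may feed new items downstream. The self-contained sketch below models that shape with strings in place of `*Request` and a plain function in place of a parsed `goquery.Document`. The lowercase `just`/`unit`/`do` helpers, the one-goroutine-per-stage wiring, and the termination rule are all assumptions, not Remilia's implementation.

```go
package main

import "fmt"

// stageFunc mirrors UnitFunc's shape: consume one input, emit any number of outputs.
type stageFunc func(in string, put func(string))

// just returns a channel that yields a single seed value, like Just(urlStr).
func just(seed string) <-chan string {
	ch := make(chan string, 1)
	ch <- seed
	close(ch)
	return ch
}

// unit runs fn on every value from in (in its own goroutine) and forwards
// whatever fn puts; the output closes when the input is exhausted.
func unit(in <-chan string, fn stageFunc) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		for v := range in {
			fn(v, func(s string) { out <- s })
		}
	}()
	return out
}

// do chains the producer through each stage and collects the final outputs.
func do(producer <-chan string, stages ...stageFunc) []string {
	ch := producer
	for _, s := range stages {
		ch = unit(ch, s)
	}
	var results []string
	for v := range ch {
		results = append(results, v)
	}
	return results
}

func main() {
	// Stage 1: pretend to find two links on the seed page.
	findLinks := func(page string, put func(string)) {
		put(page + "doc/")
		put(page + "blog/")
	}
	// Stage 2: pretend to extract a title from each linked page.
	title := func(page string, put func(string)) {
		put("title of " + page)
	}
	for _, r := range do(just("https://go.dev/"), findLinks, title) {
		fmt.Println(r)
	}
}
```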

type Request added in v0.2.0

type Request struct {
	Method      string
	URL         string
	Headers     map[string]string
	Body        []byte
	QueryParams map[string]string
}
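
`Request` carries plain method, URL, header, body and query-parameter fields. How Remilia turns it into an HTTP call is not documented; one straightforward mapping onto `net/http`, merging `QueryParams` into any query already in the URL, is sketched below. The `toHTTP` helper and the GET default are hypothetical, and the struct is a local copy of the documented fields.

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
	"net/url"
)

// Request copies the documented exported fields of remilia.Request.
type Request struct {
	Method      string
	URL         string
	Headers     map[string]string
	Body        []byte
	QueryParams map[string]string
}

// toHTTP builds an *http.Request. QueryParams are added to (and override)
// any query already present in URL; Encode sorts the keys.
func toHTTP(r Request) (*http.Request, error) {
	u, err := url.Parse(r.URL)
	if err != nil {
		return nil, err
	}
	q := u.Query()
	for k, v := range r.QueryParams {
		q.Set(k, v)
	}
	u.RawQuery = q.Encode()

	method := r.Method
	if method == "" {
		method = http.MethodGet // default to GET when Method is empty
	}
	req, err := http.NewRequest(method, u.String(), bytes.NewReader(r.Body))
	if err != nil {
		return nil, err
	}
	for k, v := range r.Headers {
		req.Header.Set(k, v)
	}
	return req, nil
}

func main() {
	req, err := toHTTP(Request{
		URL:         "https://go.dev/search?lang=en",
		QueryParams: map[string]string{"q": "goquery"},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String()) // GET https://go.dev/search?lang=en&q=goquery
}
```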

type RequestHook added in v0.2.0

type RequestHook func(*Request) error

type Response added in v0.2.0

type Response struct {
	// contains filtered or unexported fields
}

type ResponseHook added in v0.2.0

type ResponseHook func(*Response) error

type UnitFunc added in v0.5.0

type UnitFunc func(in *goquery.Document, put Put[string])

Directories

Path Synopsis
cmd
dev
internal
