remilia

v0.5.1 (Latest)
Warning

This package is not in the latest version of its module.

Published: Feb 16, 2024 | License: MIT | Imports: 22 | Imported by: 0

README

Remilia

Remilia is a high-performance web scraping framework. It lets users focus on extracting and using web content while the framework handles the mechanics of scraping, such as concurrent requests and retries.

Features

  • Clean API & elegant mental model
  • Concurrency support
  • Configurable backoff retry algorithm
  • Pre-request and post-response hooks

Example

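package main

import (
	"fmt"

	"github.com/PuerkitoBio/goquery"
	"github.com/ShroXd/remilia"
)

func main() {
	// titleParser prints the text of every <h1> element on the fetched page.
	titleParser := func(in *goquery.Document, put remilia.Put[string]) {
		in.Find("h1").Each(func(i int, s *goquery.Selection) {
			fmt.Println(s.Text())
		})
	}

	rem, err := remilia.New()
	if err != nil {
		fmt.Println("Error:", err)
		return
	}
	// Just produces the start URL; Unit adds a stage that parses each page.
	if err := rem.Do(rem.Just("https://go.dev/"), rem.Unit(titleParser)); err != nil {
		fmt.Println("Error:", err)
	}
}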

Install

go get -u github.com/ShroXd/remilia

License

This project is licensed under the MIT License. See the LICENSE file for details.

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func WithBackoffOptions added in v0.5.0

func WithBackoffOptions(opts ...exponentialBackoffOptionFunc) remiliaOption

func WithBaseURL added in v0.3.0

func WithBaseURL(url string) clientOptionFunc

func WithClientOptions added in v0.5.0

func WithClientOptions(opts ...clientOptionFunc) remiliaOption

func WithConcurrency added in v0.5.0

func WithConcurrency(concurrency uint) stageOptionFunc

func WithHeaders added in v0.3.0

func WithHeaders(headers map[string]string) clientOptionFunc

func WithInputBufferSize added in v0.5.0

func WithInputBufferSize(size uint) stageOptionFunc

func WithLinearAttempt added in v0.3.0

func WithLinearAttempt(a uint8) exponentialBackoffOptionFunc

func WithLogger added in v0.2.0

func WithLogger(logger Logger) remiliaOption

func WithMaxAttempt added in v0.3.0

func WithMaxAttempt(a uint8) exponentialBackoffOptionFunc

func WithMaxDelay added in v0.3.0

func WithMaxDelay(d time.Duration) exponentialBackoffOptionFunc

func WithMinDelay added in v0.3.0

func WithMinDelay(d time.Duration) exponentialBackoffOptionFunc

func WithMultiplier added in v0.3.0

func WithMultiplier(m float64) exponentialBackoffOptionFunc

func WithPostResponseHooks added in v0.3.0

func WithPostResponseHooks(hooks ...responseHook) clientOptionFunc

func WithPreRequestHooks added in v0.3.0

func WithPreRequestHooks(hooks ...requestHook) clientOptionFunc

func WithStageOptions added in v0.5.0

func WithStageOptions(opts ...stageOptionFunc) remiliaOption

func WithTimeout added in v0.3.0

func WithTimeout(timeout time.Duration) clientOptionFunc

func WithTransformer added in v0.5.0

func WithTransformer(transformer transform.Transformer) clientOptionFunc

Types

type Get added in v0.2.0

type Get[T any] func() (T, bool)

type Logger added in v0.2.0

type Logger interface {
	Debug(msg string, context ...logContext)
	Info(msg string, context ...logContext)
	Warn(msg string, context ...logContext)
	Error(msg string, context ...logContext)
	Panic(msg string, context ...logContext)
}

type Put added in v0.2.0

type Put[T any] func(T)
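
Get and Put are plain function types: a Put[T] accepts a value, and a Get[T] returns a value plus a boolean. In the README example, titleParser receives a Put[string], which suggests that a unit uses put to emit results. That is an inference; this page does not say so. The sketch below copies the two type definitions and pairs them over a buffered channel. The newChanPair helper is hypothetical, and the meaning given to the boolean (false once the channel is closed and drained) is an assumption modeled on Go's two-value channel receive.

```go
package main

import "fmt"

// Get and Put copy the documented shapes; newChanPair is a hypothetical helper.
type Get[T any] func() (T, bool)
type Put[T any] func(T)

// newChanPair returns a Put that sends on a buffered channel, a Get that
// receives from it (ok is false once the channel is closed and drained),
// and a function that closes the channel.
func newChanPair[T any](size int) (Put[T], Get[T], func()) {
	ch := make(chan T, size)
	put := func(v T) { ch <- v }
	get := func() (T, bool) {
		v, ok := <-ch
		return v, ok
	}
	return put, get, func() { close(ch) }
}

func main() {
	put, get, done := newChanPair[string](2)
	put("a")
	put("b")
	done()
	for v, ok := get(); ok; v, ok = get() {
		fmt.Println(v) // a, then b
	}
}
```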

type Remilia

type Remilia struct {
	ID   string
	Name string
	// contains filtered or unexported fields
}

func New

func New(opts ...remiliaOption) (*Remilia, error)

func (*Remilia) Do added in v0.2.0

func (r *Remilia) Do(producerDef processorDef[*Request], stageDefs ...stageDef[*Request]) error

func (*Remilia) Just added in v0.2.0

func (r *Remilia) Just(urlStr string) processorDef[*Request]

func (*Remilia) Unit added in v0.3.0

func (r *Remilia) Unit(fn UnitFunc, opts ...stageOptionFunc) stageDef[*Request]

type Request added in v0.2.0

type Request struct {
	Method      string
	URL         string
	Headers     map[string]string
	Body        []byte
	QueryParams map[string]string
}
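
This page does not explain how a Request's fields become an HTTP request. The sketch below shows one plausible mapping onto net/http: QueryParams are merged into the URL's query string, and Headers and Body are copied over. The struct copies the documented exported fields, but the toHTTPRequest helper is hypothetical, and remilia's own conversion may differ (for example, in whether QueryParams override parameters already present in URL).

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
	"net/url"
)

// Request copies the exported fields documented for remilia.Request.
type Request struct {
	Method      string
	URL         string
	Headers     map[string]string
	Body        []byte
	QueryParams map[string]string
}

// toHTTPRequest is a hypothetical helper: it merges QueryParams into the URL
// (overriding same-named keys) and copies Headers and Body.
func toHTTPRequest(r Request) (*http.Request, error) {
	u, err := url.Parse(r.URL)
	if err != nil {
		return nil, err
	}
	q := u.Query()
	for k, v := range r.QueryParams {
		q.Set(k, v)
	}
	u.RawQuery = q.Encode() // Encode sorts keys

	// net/http treats an empty method as GET.
	req, err := http.NewRequest(r.Method, u.String(), bytes.NewReader(r.Body))
	if err != nil {
		return nil, err
	}
	for k, v := range r.Headers {
		req.Header.Set(k, v)
	}
	return req, nil
}

func main() {
	req, err := toHTTPRequest(Request{
		URL:         "https://go.dev/doc?x=1",
		Headers:     map[string]string{"Accept": "text/html"},
		QueryParams: map[string]string{"lang": "en"},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String(), req.Header.Get("Accept"))
	// GET https://go.dev/doc?lang=en&x=1 text/html
}
```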

type Response added in v0.2.0

type Response struct {
	// contains filtered or unexported fields
}

type UnitFunc added in v0.5.0

type UnitFunc func(in *goquery.Document, put Put[string])

Directories

Path Synopsis
cmd
dev
internal
