walker

package module
v0.1.0
Warning

This package is not in the latest version of its module.

Published: Feb 17, 2023 | License: MIT | Imports: 9 | Imported by: 0

README

Seamlessly fetch paginated data from any source!


walker

Walker simplifies fetching paginated data from any data source. You can configure the start position and the number of documents to fetch on each call, depending on your needs, and Walker can process pages in parallel to fetch data faster.

The main purpose of the library is walking through the pages of paginated API endpoints. With NewApiWalker, you can fetch data from any paginated API endpoint and process it concurrently. You can also create a custom walker with New to fit your specific use case.

Features

  • An API walker that pages through paginated API endpoints, effectively scraping an API.
  • Cursor and offset pagination strategies.
  • Concurrent fetching and processing of data with no extra effort.
  • Limiting the total number of documents fetched.
  • Rate limiting.

Examples

Basic Usage
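package main

import (
	"fmt"
	// plus this module's walker package (its import path is not shown on this page)
)

// source is called with a start position and a page size; here it simply
// echoes them back so you can see which pages were requested.
func source(start, fetchCount int) ([]int, error) {
	return []int{start, fetchCount}, nil
}

// sink receives each result returned by source. Call stop() to end the walk.
func sink(result []int, stop func()) error {
	fmt.Println(result)
	return nil
}

func main() {
	walker.New(source, sink).Walk()
}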

Output:

[0 10]
[1 10]
[4 10]
[2 10]
[3 10]
[5 10]
[8 10]
[9 10]
[7 10]
[6 10]
... (the walk continues forever)

  • The source function receives start as the page number and fetchCount as the number of documents to fetch. Use these values to fetch data from your source.
  • The sink function receives the result returned by source and a stop function. Save the results here, and call stop when the results show there is nothing more to fetch; otherwise the walker keeps fetching forever unless a limit is configured.
  • Beware that ordering is not guaranteed, because source and sink are called concurrently.
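The source/sink/stop contract described above can be illustrated with a simplified sketch. It is not the library's implementation: the real walker calls source and sink concurrently, while the hypothetical walkSequential below requests pages 0, 1, 2, ... one at a time from a hypothetical 25-item in-memory dataset and stops at the first empty page. Source and Sink are redeclared locally with the shapes documented for this package.

```go
package main

import "fmt"

// Source and Sink mirror the shapes documented for this package.
type Source[T any] func(start, fetchCount int) (T, error)
type Sink[T any] func(result T, stop func()) error

// walkSequential is a hypothetical, single-goroutine stand-in for Walk:
// it requests page 0, 1, 2, ... until sink calls stop or an error occurs.
func walkSequential[T any](source Source[T], sink Sink[T], fetchCount int) error {
	stopped := false
	stop := func() { stopped = true }
	for page := 0; !stopped; page++ {
		result, err := source(page, fetchCount)
		if err != nil {
			return err
		}
		if err := sink(result, stop); err != nil {
			return err
		}
	}
	return nil
}

// data is a hypothetical in-memory dataset holding the values 0..24.
var data = func() []int {
	d := make([]int, 25)
	for i := range d {
		d[i] = i
	}
	return d
}()

// pageOf treats start as a page number and returns up to fetchCount items.
func pageOf(start, fetchCount int) ([]int, error) {
	from := start * fetchCount
	if from >= len(data) {
		return nil, nil // past the end: an empty page
	}
	to := from + fetchCount
	if to > len(data) {
		to = len(data)
	}
	return data[from:to], nil
}

// collect walks all pages and gathers every item, stopping at the first empty page.
func collect() ([]int, error) {
	var all []int
	err := walkSequential[[]int](pageOf, func(page []int, stop func()) error {
		if len(page) == 0 {
			stop() // nothing left to fetch
			return nil
		}
		all = append(all, page...)
		return nil
	}, 10)
	return all, err
}

func main() {
	all, err := collect()
	fmt.Println(len(all), err)
}
```

Because this sketch is sequential, items arrive in order; with the concurrent walker you should not rely on that.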

Walking through the pagination of API endpoints

Fetching all the breweries from Open Brewery DB:

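package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	// plus this module's walker package (its import path is not shown on this page)
)

// buildRequest builds the request for one page; start is used as the page number.
func buildRequest(start, fetchCount int) (*http.Request, error) {
	url := fmt.Sprintf("https://api.openbrewerydb.org/breweries?page=%d&per_page=%d", start, fetchCount)
	return http.NewRequest(http.MethodGet, url, http.NoBody)
}

// sink decodes one page of breweries and stops the walk at the first empty page.
func sink(res *http.Response, stop func()) error {
	defer res.Body.Close()

	var payload []map[string]any
	if err := json.NewDecoder(res.Body).Decode(&payload); err != nil {
		return err // don't mistake a decode failure for an empty page
	}

	if len(payload) == 0 {
		stop()
		return nil
	}

	// saveBreweries is your own persistence function (not part of walker).
	return saveBreweries(payload)
}

func main() {
	walker.NewApiWalker(http.DefaultClient, buildRequest, sink).Walk()
}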

To create an API walker, you only need to provide:

  • an *http.Client that sends the requests
  • a RequestBuilder function that builds the HTTP request from the provided start and fetchCount values
  • a sink function that processes each HTTP response

See the example directory for more use cases.
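To make the request-builder and sink roles concrete without calling the real Open Brewery DB, here is a self-contained sketch. It assumes, as the description above suggests, that an API walker executes each built request with the client and hands the response to sink. walkAPI, newFakeAPI and the /items endpoint are hypothetical, and unlike the package this loop runs requests one at a time rather than concurrently.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strconv"
)

// RequestBuilder has the same shape as the type documented for this package.
type RequestBuilder func(start, fetchCount int) (*http.Request, error)

// newFakeAPI starts a hypothetical test server holding `total` items and
// serving them in pages selected by the page and per_page query parameters.
func newFakeAPI(total int) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		page, _ := strconv.Atoi(r.URL.Query().Get("page"))
		perPage, _ := strconv.Atoi(r.URL.Query().Get("per_page"))
		items := []int{}
		for i := page * perPage; i < (page+1)*perPage && i < total; i++ {
			items = append(items, i)
		}
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(items)
	}))
}

// walkAPI is a sequential stand-in for an API walker: build a request for each
// page, execute it with client, and pass the response to sink until stop is called.
func walkAPI(client *http.Client, build RequestBuilder, sink func(*http.Response, func()) error, fetchCount int) error {
	stopped := false
	stop := func() { stopped = true }
	for page := 0; !stopped; page++ {
		req, err := build(page, fetchCount)
		if err != nil {
			return err
		}
		res, err := client.Do(req)
		if err != nil {
			return err
		}
		err = sink(res, stop)
		res.Body.Close()
		if err != nil {
			return err
		}
	}
	return nil
}

// fetchAll walks the fake API page by page and collects every item.
func fetchAll(total, perPage int) ([]int, error) {
	srv := newFakeAPI(total)
	defer srv.Close()

	build := func(start, fetchCount int) (*http.Request, error) {
		url := fmt.Sprintf("%s/items?page=%d&per_page=%d", srv.URL, start, fetchCount)
		return http.NewRequest(http.MethodGet, url, http.NoBody)
	}

	var all []int
	err := walkAPI(srv.Client(), build, func(res *http.Response, stop func()) error {
		var page []int
		if err := json.NewDecoder(res.Body).Decode(&page); err != nil {
			return err
		}
		if len(page) == 0 {
			stop() // an empty page means there is nothing left
			return nil
		}
		all = append(all, page...)
		return nil
	}, perPage)
	return all, err
}

func main() {
	all, err := fetchAll(23, 5)
	fmt.Println(len(all), err)
}
```

The same pattern of stopping on the first empty page is what the Open Brewery DB example relies on.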

Configuration

| Option           | Description                                                       | Default                   | Available values                                       |
|------------------|-------------------------------------------------------------------|---------------------------|--------------------------------------------------------|
| WithPagination   | Pagination strategy                                               | walker.OffsetPagination{} | walker.OffsetPagination{}, walker.CursorPagination{}   |
| WithMaxBatchSize | Maximum number of documents requested per source call (fetchCount) | 10                       | int                                                    |
| WithParallelism  | Number of workers that call source concurrently                   | runtime.NumCPU()          | int                                                    |
| WithLimiter      | Total number of documents to fetch before stopping                | walker.InfiniteLimiter()  | walker.InfiniteLimiter(), walker.ConstantLimiter(int)  |
| WithRateLimit    | Rate limit, given as a count per duration                         | unlimited                 | (int, time.Duration)                                   |
| WithContext      | Context used for the walk                                         | context.Background()      | context.Context                                        |

Contribution

Contributions that make walker better and more feature-rich are welcome. Feel free to contribute your use case!

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Batch

type Batch struct {
	Size  int
	Count int
}

func NewBatch

func NewBatch(batchSize, limit, parallelism int) Batch

type CursorPagination

type CursorPagination struct{}

func (CursorPagination) FetchCount

func (c CursorPagination) FetchCount(totalCount, start, batchSize int) int

func (CursorPagination) StartIndex

func (c CursorPagination) StartIndex(batchStart, workerNumber, batchSize int) int

type FailedTask

type FailedTask struct {
	Start      int
	FetchCount int
	Err        error
}

type Limiter

type Limiter func() int

func ConstantLimiter

func ConstantLimiter(limit int) Limiter

func InfiniteLimiter

func InfiniteLimiter() Limiter

type OffsetPagination

type OffsetPagination struct{}

func (OffsetPagination) FetchCount

func (o OffsetPagination) FetchCount(totalCount, start, batchSize int) int

func (OffsetPagination) StartIndex

func (o OffsetPagination) StartIndex(batchStart, workerNumber, batchSize int) int

type Option

type Option func(*config)

func WithContext

func WithContext(ctx context.Context) Option

func WithLimiter

func WithLimiter(limiter Limiter) Option

func WithMaxBatchSize

func WithMaxBatchSize(size int) Option

func WithPagination

func WithPagination(pagination Pagination) Option

func WithParallelism

func WithParallelism(parallelism int) Option

func WithRateLimit

func WithRateLimit(count int, per time.Duration) Option

type Pagination

type Pagination interface {
	StartIndex(batchStart, workerNumber, batchSize int) int
	FetchCount(limit, start, batchSize int) int
}
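The Pagination interface is small enough to implement yourself. As an illustration only, the hypothetical itemOffsetPagination below treats start as an item offset: each worker in a batch starts one page further along, and FetchCount trims the last request so the walk never asks for items past limit. How the package's own OffsetPagination and CursorPagination compute these values is not documented here, so this does not describe them.

```go
package main

import "fmt"

// Pagination has the same method set as the interface documented for this package.
type Pagination interface {
	StartIndex(batchStart, workerNumber, batchSize int) int
	FetchCount(limit, start, batchSize int) int
}

// itemOffsetPagination is a hypothetical strategy (not part of the package)
// in which start is an item offset rather than a page number.
type itemOffsetPagination struct{}

// StartIndex places worker n of a batch n pages after batchStart.
func (itemOffsetPagination) StartIndex(batchStart, workerNumber, batchSize int) int {
	return batchStart + workerNumber*batchSize
}

// FetchCount requests a full page unless fewer than batchSize items remain
// before limit, in which case it requests only the remainder (never negative).
func (itemOffsetPagination) FetchCount(limit, start, batchSize int) int {
	remaining := limit - start
	switch {
	case remaining <= 0:
		return 0
	case remaining < batchSize:
		return remaining
	default:
		return batchSize
	}
}

// Compile-time check that the type satisfies the interface.
var _ Pagination = itemOffsetPagination{}

func main() {
	var p Pagination = itemOffsetPagination{}
	for worker := 0; worker < 3; worker++ {
		start := p.StartIndex(20, worker, 10)
		fmt.Println(start, p.FetchCount(45, start, 10))
	}
	// Prints: 20 10, then 30 10, then 40 5 (one pair per line).
}
```

A type like this satisfies the documented interface, so it could be passed to WithPagination; whether the real walker's batchStart is an item offset is an assumption of this sketch.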

type RequestBuilder

type RequestBuilder func(start, fetchCount int) (*http.Request, error)

type Sink

type Sink[T any] func(result T, stop func()) error

type Source

type Source[T any] func(start, fetchCount int) (T, error)

type Walker

type Walker[T any] struct {
	// contains filtered or unexported fields
}

func New

func New[T any](source Source[T], sink Sink[T], options ...Option) *Walker[T]

func NewApiWalker

func NewApiWalker(client *http.Client, requestBuilder RequestBuilder, sink Sink[*http.Response], options ...Option) *Walker[*http.Response]

func (*Walker[T]) FailedTasks

func (w *Walker[T]) FailedTasks() []FailedTask

func (*Walker[T]) IsStopped

func (w *Walker[T]) IsStopped() bool

func (*Walker[T]) Stop

func (w *Walker[T]) Stop()

func (*Walker[T]) Walk

func (w *Walker[T]) Walk()

Directories

Path Synopsis
example
