crawler

package
v0.25.1

This package is not in the latest version of its module.
Published: Jan 22, 2024 | License: Apache-2.0 | Imports: 18 | Imported by: 0

Documentation

Overview

Package crawler implements a STAC resource crawler.

Constants

This section is empty.

Variables

var DefaultOptions = &Options{
	ErrorHandler: func(err error) error { return err },
}

DefaultOptions are used when creating a new crawler if no options are provided.

var ErrStopRecursion = errors.New("stop recursion")

ErrStopRecursion is returned by the visitor when it wants to stop recursing.

Functions

func Crawl added in v0.11.0

func Crawl(resource string, visitor Visitor, options ...*Options) error

Crawl calls the visitor for each resolved resource.

The resource can be a file path or a URL. Any error returned by visitor will stop crawling and be returned by this function. Context cancellation will also stop crawling and the context error will be returned.

This is a shorthand for calling New, Add, and Wait when you only need to crawl a single entry.

func LinkTypeAnyJSON added in v0.10.0

func LinkTypeAnyJSON(link Link) bool

func LinkTypeApplicationJSON added in v0.10.0

func LinkTypeApplicationJSON(link Link) bool

func LinkTypeGeoJSON added in v0.10.0

func LinkTypeGeoJSON(link Link) bool

func LinkTypeNone added in v0.10.0

func LinkTypeNone(link Link) bool

Types

type Asset added in v0.12.0

type Asset map[string]interface{}

Asset provides metadata about data for an item.

func (Asset) Description added in v0.12.0

func (a Asset) Description() string

Description returns the asset's description.

func (Asset) Href added in v0.12.0

func (a Asset) Href() string

Href returns the asset's href.

func (Asset) Roles added in v0.12.0

func (a Asset) Roles() []string

Roles returns the asset's roles.

func (Asset) Title added in v0.12.0

func (a Asset) Title() string

Title returns the asset's title.

func (Asset) Type added in v0.12.0

func (a Asset) Type() string

Type returns the asset's type.
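Because Asset is a plain map, its fields can also be read directly. The following is a minimal sketch, not the package's implementation: it declares a local stand-in Asset type with the documented underlying type, and a hypothetical helper, stringField, that shows a type-asserted lookup tolerating missing or non-string values.

```go
package main

import "fmt"

// Asset is a local stand-in with the same underlying type as the documented
// Asset type, so this sketch runs without importing the package.
type Asset map[string]interface{}

// stringField is a hypothetical helper (not part of the package). It returns
// the value stored under key if it is a string, and "" otherwise.
func stringField(a Asset, key string) string {
	value, ok := a[key].(string)
	if !ok {
		return ""
	}
	return value
}

func main() {
	asset := Asset{
		"href":  "https://example.com/image.tif",
		"title": "Example image",
		"roles": []interface{}{"data"},
	}
	fmt.Println(stringField(asset, "href"))              // https://example.com/image.tif
	fmt.Println(stringField(asset, "title"))             // Example image
	fmt.Println(stringField(asset, "description") == "") // true
}
```

The comma-ok form of the type assertion avoids a panic when a field is absent or holds an unexpected type, which is common with loosely validated JSON.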

type Crawler

type Crawler struct {
	// contains filtered or unexported fields
}

Crawler crawls STAC resources.

func New

func New(visitor Visitor, options ...*Options) (*Crawler, error)

New creates a crawler with the provided options (or DefaultOptions if none are provided).

The visitor will be called for each resource added and for every additional resource linked from the initial entry.

func (*Crawler) Add added in v0.15.0

func (c *Crawler) Add(resource string) error

Add adds a new resource entry to crawl.

The resource can be a file path or a URL.

func (*Crawler) Wait added in v0.15.0

func (c *Crawler) Wait() error

Wait waits for a crawl to finish.

type ErrorHandler added in v0.11.0

type ErrorHandler func(error) error

ErrorHandler is called with any errors during a crawl. If the function returns nil, the crawl will continue. If the function returns an error, the crawl will stop.

type Handler added in v0.15.0

type Handler func(task *Task) error

type Link

type Link map[string]string

Link represents a link to a resource.

type LinkMatcher added in v0.10.0

type LinkMatcher func(link Link) bool

type Links

type Links []Link

Links is a slice of links.

func (Links) Rel added in v0.10.0

func (links Links) Rel(rel string, matchers ...LinkMatcher) Link
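A LinkMatcher is an ordinary predicate over a Link, so custom matchers can be written alongside the provided LinkType functions. The sketch below declares local stand-ins for Link and LinkMatcher and a hypothetical helper, typeIs, that matches on the link's "type" field (the media type field of a STAC link object). It does not reproduce how Rel selects among matching links, which is not documented here.

```go
package main

import "fmt"

// Link and LinkMatcher are local stand-ins with the documented definitions.
type Link map[string]string
type LinkMatcher func(link Link) bool

// typeIs is a hypothetical helper (not part of the package). It returns a
// matcher that accepts links whose "type" field equals mediaType.
func typeIs(mediaType string) LinkMatcher {
	return func(link Link) bool {
		return link["type"] == mediaType
	}
}

func main() {
	links := []Link{
		{"rel": "child", "href": "./a/catalog.json", "type": "application/json"},
		{"rel": "item", "href": "./item.json", "type": "application/geo+json"},
	}
	isGeoJSON := typeIs("application/geo+json")
	for _, link := range links {
		fmt.Println(link["href"], isGeoJSON(link))
	}
	// Prints:
	// ./a/catalog.json false
	// ./item.json true
}
```

A matcher built this way has the same signature as the functions accepted by the variadic matchers parameter of Rel.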

type Options

type Options struct {
	// Optional function to limit which resources to crawl.  If provided, the function
	// will be called with the URL or absolute path to a resource before it is crawled.
	// If the function returns false, the resource will not be read and the visitor will
	// not be called.
	Filter func(string) bool

	// Optional function to handle any errors during the crawl.  By default, any error
	// will stop the crawl.  To continue crawling on error, provide a function that
	// returns nil.  The special ErrStopRecursion will stop the crawler from recursing deeper
	// but will not stop the crawl altogether.
	ErrorHandler ErrorHandler

	// Optional queue to use for crawling tasks.  If not provided, an in-memory queue
	// will be used.  When running a crawl across multiple processes, it can be useful
	// to provide a queue that is shared across processes.
	Queue Queue
}

Options for creating a crawler.
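For example, a Filter can keep a crawl inside a single catalog tree. This is a minimal sketch under the assumption that matching on a location prefix is sufficient; underPrefix is a hypothetical helper, and the URLs are placeholders.

```go
package main

import (
	"fmt"
	"strings"
)

// underPrefix is a hypothetical helper (not part of the package). It returns
// a function with the documented Filter signature, func(string) bool, that
// accepts only locations starting with prefix.
func underPrefix(prefix string) func(string) bool {
	return func(location string) bool {
		return strings.HasPrefix(location, prefix)
	}
}

func main() {
	filter := underPrefix("https://example.com/stac/")
	fmt.Println(filter("https://example.com/stac/catalog.json")) // true
	fmt.Println(filter("https://other.example/catalog.json"))    // false
}
```

Because the filter receives either a URL or an absolute file path, a prefix chosen for one kind of location will reject the other.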

type Queue added in v0.15.0

type Queue interface {
	Add(tasks []*Task) error
	Handle(handler Handler)
	Wait() error
}

func NewMemoryQueue added in v0.15.0

func NewMemoryQueue(ctx context.Context, limit int) Queue

NewMemoryQueue is used if a custom queue is not provided for a crawl.

The crawl will stop if the provided context is cancelled. The limit is used to control the number of resources that will be visited concurrently.

type Resource

type Resource map[string]interface{}

Resource represents a STAC catalog, collection, or item.

func (Resource) Assets added in v0.12.0

func (r Resource) Assets() map[string]Asset

Assets returns the assets (if any).

func (Resource) ConformsTo added in v0.6.0

func (r Resource) ConformsTo() []string

ConformsTo returns the STAC / OGC Features API conformance classes (if any).

func (Resource) Extensions

func (r Resource) Extensions() []string

Extensions returns the resource extension URLs.

func (Resource) Links

func (r Resource) Links() Links

Links returns the resource links.

func (Resource) Type

func (r Resource) Type() ResourceType

Type returns the specific resource type.

func (Resource) Version

func (r Resource) Version() string

Version returns the STAC version.

type ResourceInfo added in v0.15.0

type ResourceInfo struct {
	// Location is the URL or file path of the resource.
	Location string

	// Entry is the URL or file path of the initial resource that was crawled and pointed to this resource.
	Entry string
}

ResourceInfo includes information about how the resource was accessed.

type ResourceType

type ResourceType string

ResourceType indicates the STAC resource type.

const (
	Item       ResourceType = "item"
	Catalog    ResourceType = "catalog"
	Collection ResourceType = "collection"
)

type Task added in v0.9.0

type Task struct {
	// contains filtered or unexported fields
}

func (*Task) Entry added in v0.16.0

func (t *Task) Entry() string

func (*Task) MarshalJSON added in v0.15.0

func (t *Task) MarshalJSON() ([]byte, error)

func (*Task) Resource added in v0.16.0

func (t *Task) Resource() string

func (*Task) UnmarshalJSON added in v0.15.0

func (t *Task) UnmarshalJSON(data []byte) error

type Visitor

type Visitor func(Resource, *ResourceInfo) error

Visitor is called for each resource during crawling.

Any returned error will stop crawling and be returned by Crawl.
