package crawler

Version: v0.0.0-...-0a5fad8

Warning: This package is not in the latest version of its module.

Published: Nov 24, 2021 License: MIT Imports: 16 Imported by: 0

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Config

type Config struct {
	PrivateNetworkDetector PrivateNetworkDetector
	URLGetter              URLGetter
	Graph                  Graph
	Indexer                Indexer
	FetchWorkers           int
}

type Crawler

type Crawler struct {
	// contains filtered or unexported fields
}

func NewCrawler

func NewCrawler(cfg Config) *Crawler

func (*Crawler) Crawl

func (c *Crawler) Crawl(ctx context.Context, linkIt graph.LinkIterator) (int, error)

type Graph

type Graph interface {
	UpsertLink(link *graph.Link) error
	UpsertEdge(edge *graph.Edge) error
	RemoveStaleEdges(fromID uuid.UUID, updateBefore time.Time) error
}

type Indexer

type Indexer interface {
	Index(doc *index.Document) error
}

Indexer is implemented by objects that can index the contents of web pages retrieved by the crawler pipeline.
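For tests, an in-memory fake is often enough to satisfy Indexer. The sketch below assumes a stand-in Document with URL, Title and Content fields; index.Document's real fields are not listed on this page, so Document and memIndexer are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Document is a hypothetical stand-in for index.Document; its fields are assumptions.
type Document struct {
	URL     string
	Title   string
	Content string
}

// Indexer mirrors the interface documented above, using the stand-in Document.
type Indexer interface {
	Index(doc *Document) error
}

// memIndexer keeps the most recent copy of each document, keyed by URL.
type memIndexer struct {
	mu   sync.Mutex
	docs map[string]Document
}

// Compile-time check that memIndexer satisfies Indexer.
var _ Indexer = (*memIndexer)(nil)

// Index stores a copy of doc, replacing any earlier version with the same URL.
func (m *memIndexer) Index(doc *Document) error {
	if doc == nil || doc.URL == "" {
		return errors.New("memIndexer: document must have a URL")
	}
	m.mu.Lock()
	defer m.mu.Unlock()
	if m.docs == nil {
		m.docs = make(map[string]Document)
	}
	m.docs[doc.URL] = *doc
	return nil
}

func main() {
	idx := &memIndexer{}
	_ = idx.Index(&Document{URL: "https://example.com", Title: "v1"})
	_ = idx.Index(&Document{URL: "https://example.com", Title: "v2"})
	fmt.Println(len(idx.docs), idx.docs["https://example.com"].Title, idx.Index(nil) != nil)
}
```

Storing a copy rather than the pointer means a caller that reuses or mutates its Document after calling Index cannot silently change what was indexed.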

type PrivateNetworkDetector

type PrivateNetworkDetector interface {
	IsPrivate(host string) (bool, error)
}

PrivateNetworkDetector is implemented by objects that can detect whether a host resolves to a private network address.

type URLGetter

type URLGetter interface {
	Get(url string) (*http.Response, error)
}
