linkprocessor

package
v0.1.0 Latest
Warning

This package is not in the latest version of its module.

Published: Mar 21, 2021 License: AGPL-3.0 Imports: 13 Imported by: 0

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type LinkProcessor

type LinkProcessor struct {
	// contains filtered or unexported fields
}

LinkProcessor holds the connections needed to access the cache and the db, along with the channel used to send urls back to RabbitMQ.

func NewLinkProcessor

func NewLinkProcessor(
	storage *linkstorage.Storage,
	batchSize int,
	queue *linkqueue.LinkQueue,
	numWorkers int,
) (*LinkProcessor, error)

NewLinkProcessor creates a LinkProcessor from the given storage, batch size, queue and number of workers.

func (*LinkProcessor) CheckURLExists

func (lp *LinkProcessor) CheckURLExists(u *url.URL) (bool, error)

CheckURLExists first checks the in-memory cache for the url and returns true if it is found. If the url is not in the cache, it checks the db; on a hit it updates the cache and returns true. If the url is in neither the cache nor the db, it returns false.
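The lookup order described above can be sketched as follows. This is a minimal illustration, not the package's implementation: `urlStore`, `mapStore`, `checker` and `lookup` are hypothetical names, and the real db access goes through `*linkstorage.Storage`, whose API is not shown on this page.

```go
package main

import (
	"fmt"
	"net/url"
	"sync"
)

// urlStore is a hypothetical stand-in for the db layer.
type urlStore interface {
	Exists(u string) (bool, error)
}

// mapStore is an in-memory fake that only exists to make the sketch runnable.
type mapStore map[string]bool

func (m mapStore) Exists(u string) (bool, error) { return m[u], nil }

// checker pairs an in-memory cache with a slower backing store.
type checker struct {
	mu    sync.Mutex
	cache map[string]bool
	db    urlStore
}

func newChecker(db urlStore) *checker {
	return &checker{cache: make(map[string]bool), db: db}
}

// checkURLExists consults the cache first, then the db, and records
// db hits in the cache so later lookups skip the db.
func (c *checker) checkURLExists(u *url.URL) (bool, error) {
	key := u.String()

	c.mu.Lock()
	hit := c.cache[key]
	c.mu.Unlock()
	if hit {
		return true, nil
	}

	found, err := c.db.Exists(key)
	if err != nil {
		return false, err
	}
	if found {
		c.mu.Lock()
		c.cache[key] = true
		c.mu.Unlock()
	}
	return found, nil
}

// lookup parses a raw url string and runs the two-tier check on it.
func (c *checker) lookup(raw string) (bool, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return false, err
	}
	return c.checkURLExists(u)
}

func main() {
	c := newChecker(mapStore{"https://example.com/a": true})
	for _, raw := range []string{"https://example.com/a", "https://example.com/b"} {
		ok, err := c.lookup(raw)
		fmt.Println(raw, ok, err)
	}
}
```

Misses are deliberately not cached in this sketch, so a url that is stored later is still found on the next db check.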

func (*LinkProcessor) Close

func (lp *LinkProcessor) Close()

Close stops the batching workers immediately.

func (*LinkProcessor) GracefulShutdown

func (lp *LinkProcessor) GracefulShutdown() <-chan bool

GracefulShutdown returns a channel that receives true once the db batching cache has been flushed and writes to the queue have finished.
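A caller would typically block on the returned channel, ideally with a timeout so that a stuck flush cannot hang the process. The sketch below shows that pattern under stated assumptions: `flusher`, `gracefulShutdown`, `waitForShutdown` and `defaultTimeout` are hypothetical stand-ins, and how the real LinkProcessor flushes is not documented here.

```go
package main

import (
	"fmt"
	"time"
)

// defaultTimeout is an arbitrary value chosen for this sketch.
const defaultTimeout = time.Second

// flusher is a hypothetical stand-in for a type with buffered work to flush.
type flusher struct {
	pending []string
	flushed []string
}

// gracefulShutdown flushes pending items in the background and sends true
// on the returned channel when it is done.
func (f *flusher) gracefulShutdown() <-chan bool {
	// Buffered so the goroutine can exit even if nobody reads the result.
	done := make(chan bool, 1)
	go func() {
		f.flushed = append(f.flushed, f.pending...)
		f.pending = nil
		done <- true
	}()
	return done
}

// waitForShutdown blocks until done delivers a value or the timeout
// elapses. It reports whether shutdown completed in time.
func waitForShutdown(done <-chan bool, timeout time.Duration) bool {
	select {
	case ok := <-done:
		return ok
	case <-time.After(timeout):
		return false
	}
}

func main() {
	f := &flusher{pending: []string{"a", "b"}}
	if waitForShutdown(f.gracefulShutdown(), defaultTimeout) {
		fmt.Println("flushed", len(f.flushed), "items")
	} else {
		fmt.Println("shutdown timed out")
	}
}
```

If the timeout fires, falling back to Close is one option, at the cost of dropping whatever was still buffered.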

func (*LinkProcessor) MarkURLVisited

func (lp *LinkProcessor) MarkURLVisited(u *url.URL)

MarkURLVisited marks the url as visited in the cache.

func (*LinkProcessor) ProcessURL

func (lp *LinkProcessor) ProcessURL(u *url.URL) error

ProcessURL takes a url and processes it.

func (*LinkProcessor) ScrapeLinksFromURL

func (lp *LinkProcessor) ScrapeLinksFromURL(u *url.URL) ([]*linkstorage.Link, error)

ScrapeLinksFromURL takes a url to scrape, retrieves the page and returns all links found.
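The fetch-then-extract flow can be sketched with the standard library alone. Everything here is an assumption made for illustration: `extractLinks`, `scrapeLinks` and `extractFrom` are hypothetical names, links are returned as strings rather than `[]*linkstorage.Link`, and hrefs are found with a simple regexp only to avoid dependencies. A real scraper would more likely use an HTML parser such as golang.org/x/net/html.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/url"
	"regexp"
)

// hrefPattern matches double-quoted href attributes only.
var hrefPattern = regexp.MustCompile(`href="([^"]+)"`)

// extractLinks resolves every href found in body against base and
// returns the resulting absolute urls in document order.
func extractLinks(base *url.URL, body string) []string {
	var links []string
	for _, m := range hrefPattern.FindAllStringSubmatch(body, -1) {
		ref, err := url.Parse(m[1])
		if err != nil {
			continue // skip malformed hrefs
		}
		links = append(links, base.ResolveReference(ref).String())
	}
	return links
}

// scrapeLinks retrieves the page at u and returns the links found in it.
func scrapeLinks(client *http.Client, u *url.URL) ([]string, error) {
	resp, err := client.Get(u.String())
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	return extractLinks(u, string(body)), nil
}

// extractFrom is a string-based wrapper around extractLinks.
func extractFrom(rawBase, body string) ([]string, error) {
	base, err := url.Parse(rawBase)
	if err != nil {
		return nil, err
	}
	return extractLinks(base, body), nil
}

func main() {
	// A local test server keeps the example runnable without network access.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, `<a href="/about">About</a> <a href="https://example.org/x">X</a>`)
	}))
	defer srv.Close()

	u, err := url.Parse(srv.URL)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	links, err := scrapeLinks(srv.Client(), u)
	fmt.Println(links, err)
}
```

Resolving each href against the page url turns relative links into absolute ones, so they can be queued or checked for existence without further context.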

func (*LinkProcessor) SpawnWorkers

func (lp *LinkProcessor) SpawnWorkers(n int) chan *url.URL

SpawnWorkers spawns n workers and returns a channel; urls pushed to that channel are picked up by the workers.
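The channel-fed worker pattern this describes looks roughly like the sketch below. The names `pool`, `process`, `spawnWorkers` and `run` are hypothetical, and stopping the workers by closing the channel is an assumption of this sketch. The real package exposes Close and GracefulShutdown for that purpose.

```go
package main

import (
	"fmt"
	"net/url"
	"sync"
)

// pool is a hypothetical stand-in for a type that owns a set of workers.
type pool struct {
	wg      sync.WaitGroup
	mu      sync.Mutex
	handled []string
}

// process is a placeholder for the per-url work; it only records the url.
func (p *pool) process(u *url.URL) {
	p.mu.Lock()
	p.handled = append(p.handled, u.String())
	p.mu.Unlock()
}

// spawnWorkers starts n goroutines that consume urls from the returned
// channel until the channel is closed.
func (p *pool) spawnWorkers(n int) chan *url.URL {
	jobs := make(chan *url.URL)
	for i := 0; i < n; i++ {
		p.wg.Add(1)
		go func() {
			defer p.wg.Done()
			for u := range jobs {
				p.process(u)
			}
		}()
	}
	return jobs
}

// run pushes each raw url to n workers, waits for them to drain the
// channel, and returns how many urls were handled.
func run(n int, raws []string) (int, error) {
	p := &pool{}
	jobs := p.spawnWorkers(n)
	for _, raw := range raws {
		u, err := url.Parse(raw)
		if err != nil {
			close(jobs)
			p.wg.Wait()
			return 0, err
		}
		jobs <- u
	}
	close(jobs)
	p.wg.Wait()
	return len(p.handled), nil
}

func main() {
	n, err := run(3, []string{
		"https://example.com/1",
		"https://example.com/2",
		"https://example.com/3",
		"https://example.com/4",
	})
	fmt.Println(n, err)
}
```

Because the channel is unbuffered, a send blocks until a worker is free, which gives the producer natural backpressure.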
