httpsyet

package
v0.2.2
Published: Jun 30, 2018 License: MIT Imports: 11 Imported by: 0

Documentation

Overview

Package httpsyet provides the configuration and execution for crawling a list of sites for links that can be updated to HTTPS.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func DoneFunc

func DoneFunc(inp resultFrom, act func(a result)) (done <-chan struct{})

DoneFunc returns a channel which receives one signal and is then closed after `act` has been applied to every item received on `inp`.

Types

type Crawler

type Crawler struct {
	Sites    []string                             // At least one URL.
	Out      io.Writer                            // Required. Writes one detected site per line.
	Log      *log.Logger                          // Required. Errors are reported here.
	Depth    int                                  // Optional. Limit depth. Set to >= 1.
	Parallel int                                  // Optional. Set how many sites to crawl in parallel.
	Delay    time.Duration                        // Optional. Set delay between crawls.
	Get      func(string) (*http.Response, error) // Optional. Defaults to http.Get.
	Verbose  bool                                 // Optional. If set, status updates are written to logger.
}

Crawler is the configuration for Run and is validated in Run().

func (Crawler) Run

func (c Crawler) Run() error

Run runs the crawler. It can return validation errors. All crawling errors are reported via the logger, and output is written to the writer. Run crawls the sites recursively and reports every external link that can be changed to HTTPS. Broken links are also reported via the error logger.

type Rake

type Rake struct {
	// contains filtered or unexported fields
}

Rake represents a fanned-out circular pipe network with a flexibly adjusting buffer. Each item is processed only once - items seen before are filtered out.

A Rake may be used e.g. as a crawler where every link shall be visited only once.

func New

func New(
	rake func(a item),
	attr func(a item) interface{},
	somany int,
) (
	my *Rake,
)

New returns a (pointer to a) new operational Rake.

`rake` is the operation to be executed in parallel on any item which has not been seen before. Have it use `myrake.Feed(items...)` to provide feedback.

`attr` allows you to specify an attribute for the 'seen' filter. Pass `nil` to filter on the item itself.

`somany` is the number of parallel processes - the parallelism of the network built by Rake, i.e. the number of its parallel raking endpoints.

func (*Rake) Attr

func (my *Rake) Attr(attr func(a item) interface{}) *Rake

Attr sets the (optional) attribute to discriminate 'seen'.

`attr` allows you to specify an attribute for the 'seen' filter. If not set, 'seen' will discriminate any item by itself.

Attr panics if called after the first nonempty `Feed(...)`.

func (*Rake) Done

func (my *Rake) Done() (done <-chan struct{})

Done returns a channel which will be signalled and closed when traffic has subsided, nothing is left to be processed and consequently all goroutines have terminated.

func (*Rake) Feed

func (my *Rake) Feed(items ...item) *Rake

Feed registers new items on the network.

func (*Rake) Rake

func (my *Rake) Rake(rake func(a item)) *Rake

Rake sets the rake function to be applied (in parallel).

`rake` is the operation to be executed in parallel on any item which has not been seen before.

You may pass `nil` to New and call `Rake(..)` later to provide the operation. Have it use `myrake.Feed(items...)` to provide feedback.

Rake panics if called after the first nonempty `Feed(...)`.
