package httpsyet

v0.2.2
Published: Jun 30, 2018 License: MIT Imports: 11 Imported by: 0

README

v4 - Teach traffic to be gentler and easier to use

Overview

Proper use of the new traffic struct is still awkward for crawling.

Thus, it is time to teach traffic to behave better and more robustly. Specifically:

  • new constructor New(): the client is no longer bothered with initialisation - and thus no longer needs to import "sync"
  • have Processor return a signal channel that broadcasts "traffic has subsided and nothing is left to be processed"
  • lazy initialisation of this mechanism upon the first Feed, done only once via sync.Once
  • new method Done() - just a convenience for receiving the broadcast channel another way
  • wrap the crawl function passed to Processor and have the wrapper register that the site has left - thus crawling no longer needs to do so in its crawl method (see the sketch after this list)
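
To make this concrete, here is a rough sketch of how such a traffic type could look. Only the names New, Feed, Processor and Done are taken from this README; the site type, the WaitGroup-based counting and all signatures are assumptions for illustration, not the actual traffic.go:

package traffic

import "sync"

// site stands in for the (now private) site type generated via genny.go.
type site struct{ URL string }

// Traffic sketches the improved type; the real fields may differ.
type Traffic struct {
	sites   chan site      // the network of sites still to be visited
	done    chan struct{}  // closed once traffic has subsided
	pending sync.WaitGroup // counts sites fed but not yet crawled
	once    sync.Once      // guards the lazy initialisation of the closer
}

// New keeps all initialisation away from the client - no "sync" import needed there.
func New() *Traffic {
	return &Traffic{sites: make(chan site), done: make(chan struct{})}
}

// Feed hands sites into the network. The closing mechanism is set up lazily,
// on the first call only, via sync.Once.
func (t *Traffic) Feed(sites ...site) {
	t.pending.Add(len(sites))
	t.once.Do(func() {
		go func() {
			t.pending.Wait() // traffic has subsided and nothing is left to be processed ...
			close(t.sites)
			close(t.done) // ... broadcast it
		}()
	})
	go func() {
		for _, s := range sites {
			t.sites <- s
		}
	}()
}

// Processor wraps crawl so that every site is registered as "left" when crawl
// returns - the client no longer does that itself - and returns the signal channel.
// (A single worker loop here; the real version may fan out in parallel.)
func (t *Traffic) Processor(crawl func(s site)) <-chan struct{} {
	go func() {
		for s := range t.sites {
			crawl(s)
			t.pending.Done()
		}
	}()
	return t.done
}

// Done returns the same broadcast channel another way - familiar from "context".
func (t *Traffic) Done() <-chan struct{} { return t.done }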

So, crawling is now 20% shorter and more focused on its own subject, is it not?

Please also note: "launch the results closer" now happily happens before the first "feed initial urls" - no more need to worry about something like "goWaitAndClose is to be used after initial traffic has been added".

The client (crawling) is free to use the channel returned from Processor (as it does now), or may even use <-crawling.Done() at any time it sees fit (even before(!) the Processor is built).

And: Done() is a method familiar e.g. from the "context" package - thus easy to use and understand. Easier than <-sites.SiteDoneWait(c.Travel, c), is it not?
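
For illustration, in terms of the hypothetical sketch above, the client side could then read roughly like this (names and signatures are, again, assumptions):

// A sketch of the client side, in the same package as the Traffic sketch above.
func exampleCrawl() {
	t := New()
	crawl := func(s site) {
		// visit s.URL; further links could be fed back via t.Feed(...)
	}
	done := t.Processor(crawl)

	// "launch the results closer" - safe to do even before the first Feed:
	go func() {
		<-done
		// close(results) would go here
	}()

	t.Feed(site{URL: "http://example.com"}) // feed initial urls

	<-t.Done() // blocks until traffic has subsided; same broadcast as done
}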


Some remarks regarding changes to source files compared with the previous version:

traffic.go

Implements the above-mentioned improvements in a straightforward way. Note: the network itself remains as is.

genny.go in traffic/

Just changed to use the (now private) site type.

site.go

Just makes its (previously public) methods (Attr & Print) private.

crawling.go

Much more focused and compact now.

crawler_test.go

Just the import path.

crawler.go

No need to touch.


Back to Overview

Documentation

Overview

Package httpsyet provides the configuration and execution for crawling a list of sites for links that can be updated to HTTPS.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Crawler

type Crawler struct {
	Sites    []string                             // At least one URL.
	Out      io.Writer                            // Required. Writes one detected site per line.
	Log      *log.Logger                          // Required. Errors are reported here.
	Depth    int                                  // Optional. Limit depth. Set to >= 1.
	Parallel int                                  // Optional. Set how many sites to crawl in parallel.
	Delay    time.Duration                        // Optional. Set delay between crawls.
	Get      func(string) (*http.Response, error) // Optional. Defaults to http.Get.
	Verbose  bool                                 // Optional. If set, status updates are written to logger.
}

Crawler is used as configuration for Run. It is validated in Run().

func (Crawler) Run

func (c Crawler) Run() error

Run the crawler. Can return validation errors. All crawling errors are reported via logger. Output is written to writer. Crawls sites recursively and reports all external links that can be changed to HTTPS. Also reports broken links via error logger.
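
A minimal usage sketch; the import path is a placeholder (adjust it to the real module path), the field values are arbitrary examples, and only the fields documented above are used:

package main

import (
	"log"
	"os"
	"time"

	"example.com/httpsyet" // assumed import path - replace with the real module path
)

func main() {
	c := httpsyet.Crawler{
		Sites:    []string{"http://example.com"},        // at least one URL
		Out:      os.Stdout,                             // one detected site per line
		Log:      log.New(os.Stderr, "", log.LstdFlags), // errors are reported here
		Depth:    2,                                     // optional: limit crawl depth
		Parallel: 4,                                     // optional: crawl 4 sites in parallel
		Delay:    time.Second,                           // optional: delay between crawls
		Verbose:  true,                                  // optional: log status updates
	}
	if err := c.Run(); err != nil {
		log.Fatal(err) // only validation errors are returned; crawl errors go to Log
	}
}

Get is left at its default (http.Get) here; it can be set to inject a custom HTTP client.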

Directories

