sks_spider

v0.3.1 · Published: Mar 9, 2018 · License: Apache-2.0 · Imports: 30 · Imported by: 1

README

sks_spider

Tool to spider the PGP SKS keyserver mesh.


This code-base is horrible; it was predominantly written in a weekend, porting from some very organic Python. Do not use this as an example of how to do things in Golang.

Overview

If you don't know what PGP is or anything about the PGP keyservers, then this tool is not for you. Otherwise, read on.

This is a package which produces one binary, sks_stats_daemon. This is a web-server which goes to a seed SKS server, grabs stats, and spiders out from there.

The resulting daemon should be set behind a front-end web-server such as nginx, with the /sks-peers location dispatched to it. If you run this daemon listening on a publicly reachable port, or dispatch more of the URI namespace to the daemon, you may have issues, as administrative URIs can live outside of that prefix.

As well as a stats overview page, there is also an interface to grab lists of IPs meeting various serving criteria; I use that to build DNS zones automatically, from cron, as a client of this service. The client was unperturbed by the migration.

The original version was written in Python as a WSGI app and grew organically. This version is written in Golang (the Go programming language) and makes fairly decent use of Go's concurrency features. It uses well under a fifth of the total RAM, is similarly smaller in RSS, uses less CPU (when busy, 10% of an ancient CPU instead of all of it; when "idle", it no longer sits at the top of top(1) output, using only fractionally more CPU than a truly idle process) and is significantly more responsive. These improvements are in part because of Golang and in very large part because of the ugliness of the old code. Python's good, I'm bad.

All the production serving interface features have now been copied across; all that's left are some admin hooks which aren't really applicable (eg, a list of Python threads for introspection). Those remaining hooks should not significantly impact resource consumption.

To-Do

  • Preserve more errors for the front-page?
  • Look over the admin interfaces, probably want /rescanz back
  • If rescanz is added, locking is needed around spider start-up; while at it, preserve the spider handle and make it possible to, eg, kill an existing scan using a random nonce to authenticate, where the nonce has to be retrieved from the logfile.

Building

To fetch the code and all dependencies, update them, and install the command, run:

go get -u -v github.com/philpennock/sks_spider/...
# optionally:
cd ~/go/src/github.com/philpennock/sks_spider
make

You don't have to use make, but it does embed a version string into the binary which can be operationally useful. Note that GNU make should be used (or any other make implementation which handles GNUmakefile adequately).

If you encounter problems, look at the .travis.yml file which is used for running the Travis Continuous Integration tests: https://travis-ci.org/philpennock/sks_spider. That assumes some other prep steps run automatically by Travis, but the test log should show everything in context.

Running

You can see the accepted parameters with the -help flag:

sks_stats_daemon -help

You might run, as an unprivileged user:

sks_stats_daemon -log-file /var/log/sks-stats.log

Note that this tool does not self-detach from the terminal: I prefer to leave it where a supervising agent tool can easily watch it. If you want it to detach, then your OS should have available a wrapper command which will handle that for you.

The log-file will need to exist and be writeable by that unprivileged user (or be in a directory in which that user can create new files).

Note that the logging does not currently log all HTTP requests; that's the responsibility of the front-end (for now?). Actually, the logging isn't production-grade. It "logs", but that doesn't mean the logs have proven themselves adequate at crunch time.

The horrible HTML templates (translated directly from my horrible Python ones ... I'm definitely not a UI designer) expect a style-sheet and a favicon.ico to be provided as part of the namespace; they are not served by this daemon.

Yes, this is a toy program. It's a useful toy, but definitely not a shipping product.

My start-up script (OS-specific, not included) touches and chowns the log-file before starting the program. It then runs, as the same run-time user as is used for sks itself (for my convenience in user management):

sks_stats_daemon -log-file /var/log/sks-stats.log \
  -json-persist /var/sks/stats-persist.json \
  -started-file /var/sks/stats.started

The -json-persist flag causes sks_stats_daemon to register a handler for SIGUSR1; receipt of that signal causes the current mesh to be written to the named file (removing any previous content), before exiting.

The start-up script takes a quickrestart argument, which sends SIGUSR1, waits for the process to disappear, then starts sks_stats_daemon once more. It then waits for the -started-file flag-file to appear, then removes it and exits.

nginx configuration

It's as simple as:

location /sks-peers {
    proxy_pass          http://127.0.0.1:8001;
    proxy_set_header    X-Real-IP $remote_addr;
}

In fact, you don't even need the X-Real-IP pass-through, but set it up now and it'll be easier to deal with a future change which logs the origin IP.

Note especially that, as suggested in the Overview above, we're only passing through the /sks-peers part of the namespace; this avoids exposing the /debug hierarchy, amongst others.

License

Apache 2.0.

Most people are nice and sane and in a world without subversion and lawyers, this next bit wouldn't be necessary. It's butt-covering, that's all.

If you send me a patch or a pull request, then by default:

  • I will add you to a CONTRIBUTORS file
  • You are assumed to be implicitly granting a license to me for your work to be distributed under the same license, as part of a larger work
  • You are assumed to have the authority to submit the modification under these terms and are implicitly testifying to this by making the submission.

In other words: please don't be a jackass, contributions are expected to contribute towards the codebase, not take away. Thanks.

That's about it.
-Phil

Copyright 2012,2013,2016 Phil Pennock.

Documentation

Overview

sks_spider is a tool to spider the PGP SKS keyserver mesh.

A more introductory overview for usage should be in "README.md", as this code is geared for use as a program, not as a library.

At present, the code is heavily geared towards providing one daemon, sks_stats_daemon. This is a web-server which goes to a seed SKS server, grabs stats, and spiders out from there.

The results are available over HTTP, with pages for humans and pages for automated retrieval. The author successfully builds DNS zones using tools running out of cron which get their data from this server.

Index

Constants

View Source
const (
	ContentTypeTextPlain = "text/plain; charset=UTF-8"
	ContentTypeJson      = "application/json"
)
View Source
const QUEUE_DEPTH int = 100
View Source
const SERVE_PREFIX = "/sks-peers"

Variables

View Source
var BlacklistedHosts = map[string]bool{}

slow slow slow to fail

View Source
var VersionString string

Functions

func CountryForIPString

func CountryForIPString(ipstr string) (country string, err error)

func DummySpiderForDiagnosticsChannel

func DummySpiderForDiagnosticsChannel()

func GenerateDepthSorted

func GenerateDepthSorted(hostmap HostMap) []string

func GenerateHostlistSorted

func GenerateHostlistSorted(hostMap HostMap) []string

func GetCurrentHostlist

func GetCurrentHostlist() []string

func GetMembershipAsNodemap

func GetMembershipAsNodemap() (map[string]*SksNode, error)

func GetMembershipHosts

func GetMembershipHosts() ([]string, error)

func HostSort

func HostSort(victim []string)

Sort a list of strings in host order, ie by DNS label from right to left

func IPDisallowed

func IPDisallowed(ipstr string) bool

func KillDummySpiderForDiagnosticsChannel

func KillDummySpiderForDiagnosticsChannel()

func Main

func Main()

func NodeUrl

func NodeUrl(name string, sn *SksNode) string

func ReverseStringSlice

func ReverseStringSlice(a []string)

func SetCurrentPersisted

func SetCurrentPersisted(p *PersistedHostInfo)

func SpiderDiagnostics

func SpiderDiagnostics(out io.Writer)

Types

type AliasMap

type AliasMap map[string]string

func GetAliasMapForHostmap

func GetAliasMapForHostmap(hostMap HostMap) AliasMap

type CountryResult

type CountryResult struct {
	// contains filtered or unexported fields
}

type CountrySet

type CountrySet sortedSet

func NewCountrySet

func NewCountrySet(s string) CountrySet

func (CountrySet) HasCountry

func (cs CountrySet) HasCountry(s string) bool

func (CountrySet) Initialized

func (cs CountrySet) Initialized() bool

func (CountrySet) String

func (cs CountrySet) String() string

type DnsResult

type DnsResult struct {
	// contains filtered or unexported fields
}

type GraphvizAttributes

type GraphvizAttributes map[string]interface{}

func (GraphvizAttributes) String

func (ga GraphvizAttributes) String() string

type HostGraph

type HostGraph struct {
	// contains filtered or unexported fields
}

func GenerateGraph

func GenerateGraph(names []string, sksnodes HostMap, aliases AliasMap) *HostGraph

func NewHostGraph

func NewHostGraph(count int, aliasMap AliasMap) *HostGraph

func (*HostGraph) AllPeersOf

func (hg *HostGraph) AllPeersOf(name string) []string

func (*HostGraph) ExistsLink

func (hg *HostGraph) ExistsLink(from, to string) bool

func (*HostGraph) Inbound

func (hg *HostGraph) Inbound(name string) <-chan string

func (*HostGraph) LabelMutualWithBase

func (hg *HostGraph) LabelMutualWithBase(name string) string

func (*HostGraph) Len

func (hg *HostGraph) Len() int

func (*HostGraph) Outbound

func (hg *HostGraph) Outbound(name string) <-chan string

type HostMap

type HostMap map[string]*SksNode

func GetCurrentHosts

func GetCurrentHosts() HostMap

func LoadJSONFromFile

func LoadJSONFromFile(filename string) (HostMap, error)

func (HostMap) DumpJSON

func (hostmap HostMap) DumpJSON(out io.Writer) error

func (HostMap) DumpJSONToFile

func (hostmap HostMap) DumpJSONToFile(filename string) error

type HostResult

type HostResult struct {
	// contains filtered or unexported fields
}

type HostsRequest

type HostsRequest struct {
	// contains filtered or unexported fields
}

type IPCountryMap

type IPCountryMap map[string]string

func GetFreshCountryForHostmap

func GetFreshCountryForHostmap(hostMap HostMap) IPCountryMap

type PersistedHostInfo

type PersistedHostInfo struct {
	HostMap      HostMap
	AliasMap     AliasMap
	IPCountryMap IPCountryMap
	Sorted       []string
	DepthSorted  []string
	Graph        *HostGraph
	Timestamp    time.Time
}

func GeneratePersistedInformation

func GeneratePersistedInformation(spider *Spider) *PersistedHostInfo

func GetCurrentPersisted

func GetCurrentPersisted() *PersistedHostInfo

func (*PersistedHostInfo) LogInformation

func (p *PersistedHostInfo) LogInformation()

func (*PersistedHostInfo) UpdateStatsCounters

func (p *PersistedHostInfo) UpdateStatsCounters(spider *Spider)

type SksNode

type SksNode struct {
	// Be sure that types of Exported fields are loadable from JSON!
	Hostname string
	Port     int

	Status         string
	ServerHeader   string
	ViaHeader      string
	Settings       map[string]string
	GossipPeers    map[string]string
	GossipPeerList []string
	MailsyncPeers  []string
	Version        string
	Software       string
	Keycount       int

	// And these are populated when converted into a HostMap
	AnalyzeError string
	IpList       []string
	Aliases      []string
	Distance     int
	// contains filtered or unexported fields
}

func (*SksNode) Analyze

func (sn *SksNode) Analyze()

func (*SksNode) Dump

func (sn *SksNode) Dump(out io.Writer)

func (*SksNode) Fetch

func (sn *SksNode) Fetch() error

func (*SksNode) Minimize

func (sn *SksNode) Minimize()

Dump the large content, let garbage collection reclaim space

func (*SksNode) Normalize

func (sn *SksNode) Normalize() bool

func (*SksNode) Url

func (sn *SksNode) Url() string

type SksVersion

type SksVersion struct {
	Major, Minor, Release uint
	Tag                   string
}

func NewSksVersion

func NewSksVersion(s string) *SksVersion

func (*SksVersion) IsAtLeast

func (sv *SksVersion) IsAtLeast(min *SksVersion) bool

func (*SksVersion) String

func (sv *SksVersion) String() string

type Spider

type Spider struct {
	// contains filtered or unexported fields
}

This persists for the length of one data gathering run.

func StartSpider

func StartSpider() *Spider

func (*Spider) AddHost

func (spider *Spider) AddHost(hostname string, distance int)

func (*Spider) BatchAddHost

func (spider *Spider) BatchAddHost(origin string, hostlist []string)

func (*Spider) Terminate

func (spider *Spider) Terminate()

func (*Spider) Wait

func (spider *Spider) Wait()

Directories

Path Synopsis
cmd
internal
string_set
Package b implements a B+tree.
Package b implements a B+tree.
