ukpr

command module

v0.0.0-...-d4dc811 | Published: Feb 3, 2014 | License: AGPL-3.0 | Imports: 2 | Imported by: 0

Repository

github.com/bcampbell/ukpr

README

ukpr (working title)

Ben Campbell (ben@scumways.com), at the Media Standards Trust

Overview

This program:

periodically scrapes a bunch of press release sources
stores them in a database
serves up the press releases to any interested clients via HTTP (as server-sent events).

Eventually, the idea is to keep an archive of a week or so, giving clients a chance to catch up if they go down.

When ukpr is running, clients can connect to:

http://<host>:<port>/<source>/

where source is one of the scrapers. You can get a list using:

$ ukpr -l

Connected clients receive a stream of press releases as they are scraped. Clients can send a Last-Event-ID header to access archived press releases, or to resume after a disconnection.
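As a rough illustration of the replay behaviour described above, here is a minimal sketch of a handler that serves archived releases newer than the client's Last-Event-ID. It is not ukpr's actual implementation: the `release` type, the in-memory `archive`, the integer ID scheme and the `serveArchived` and `replay` names are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strconv"
)

// release is a hypothetical stored press release; ukpr's real schema may differ.
type release struct {
	ID    int
	Title string
}

// archive stands in for the database, ordered by ascending ID (an assumption).
var archive = []release{
	{1, "First release"},
	{2, "Second release"},
	{3, "Third release"},
}

// serveArchived writes every archived release whose ID is greater than the
// client's Last-Event-ID as a server-sent event. Without the header nothing
// is replayed; a real server would then wait for newly scraped releases and
// flush each one to the client via http.Flusher.
func serveArchived(w http.ResponseWriter, r *http.Request) {
	hdr := r.Header.Get("Last-Event-ID")
	lastID := 0
	if hdr != "" {
		n, err := strconv.Atoi(hdr)
		if err != nil {
			http.Error(w, "bad Last-Event-ID", http.StatusBadRequest)
			return
		}
		lastID = n
	}
	w.Header().Set("Content-Type", "text/event-stream")
	if hdr == "" {
		return
	}
	for _, rel := range archive {
		if rel.ID > lastID {
			// Titles are assumed to be single-line; multi-line data
			// would need one "data:" line per line of text.
			fmt.Fprintf(w, "id: %d\ndata: %s\n\n", rel.ID, rel.Title)
		}
	}
}

// replay runs the handler in-process and returns the response body.
func replay(lastEventID string) string {
	req := httptest.NewRequest("GET", "/72point/", nil)
	if lastEventID != "" {
		req.Header.Set("Last-Event-ID", lastEventID)
	}
	rec := httptest.NewRecorder()
	serveArchived(rec, req)
	return rec.Body.String()
}

func main() {
	fmt.Print(replay("1")) // replays releases 2 and 3
}
```

With this ID scheme, a Last-Event-ID of 0 replays the whole archive, which matches the curl example in this README.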

You can connect and view the raw stream using any HTTP client, e.g.:

$ curl http://localhost:9998/72point/ -H "Last-Event-ID: 0"

This serves up all the stored 72point press releases.

Without a Last-Event-ID header, the client is served only new press releases as they come in.
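For clients written in Go, the same request can be sketched as below. This is an illustrative sketch only: `Event`, `newStreamRequest`, `readEvents` and `parseString` are names invented for the example, and the parser handles just the `id:` and `data:` fields of the server-sent events format.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"net/http"
	"strings"
)

// Event holds the two SSE fields this sketch cares about.
type Event struct {
	ID   string
	Data string
}

// newStreamRequest builds a GET request for a ukpr source stream.
// An empty lastID omits the header, so only new releases would arrive.
func newStreamRequest(url, lastID string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Accept", "text/event-stream")
	if lastID != "" {
		req.Header.Set("Last-Event-ID", lastID)
	}
	return req, nil
}

// readEvents parses "id:" and "data:" lines from r and calls emit each
// time a blank line terminates an event that carries data.
func readEvents(r io.Reader, emit func(Event)) error {
	sc := bufio.NewScanner(r)
	// Allow lines up to 1 MiB; the scanner's default limit is 64 KiB.
	sc.Buffer(make([]byte, 0, 64*1024), 1<<20)
	var ev Event
	var data []string
	for sc.Scan() {
		line := sc.Text()
		switch {
		case line == "":
			if len(data) > 0 {
				ev.Data = strings.Join(data, "\n")
				emit(ev)
			}
			// The ID is kept until the stream sets a new one.
			ev.Data, data = "", nil
		case strings.HasPrefix(line, "id:"):
			ev.ID = strings.TrimPrefix(line[3:], " ")
		case strings.HasPrefix(line, "data:"):
			data = append(data, strings.TrimPrefix(line[5:], " "))
		}
	}
	return sc.Err()
}

// parseString is a convenience wrapper for trying the parser on a literal.
func parseString(raw string) []Event {
	var out []Event
	_ = readEvents(strings.NewReader(raw), func(e Event) { out = append(out, e) })
	return out
}

func main() {
	req, err := newStreamRequest("http://localhost:9998/72point/", "0")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(req.Method, req.URL, req.Header.Get("Last-Event-ID"))

	// Parse a canned stream instead of contacting a live server.
	for _, e := range parseString("id: 7\ndata: Example release\n\n") {
		fmt.Printf("%s: %q\n", e.ID, e.Data)
	}
}
```

Against a running ukpr instance, the request would be sent with `http.DefaultClient.Do(req)` and the response body passed to `readEvents`. Remembering the ID of the last event received lets the client resume from that point after a disconnection.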

Usage

ukpr <flags> [scraper1 scraper2 ...]

Specific scrapers can be listed after the flags - only those scrapers will be used. By default all scrapers will be used.

flags:

-l
list available scrapers and exit

-historical
use the history-collecting version of all scrapers which
have one (only 72point at the moment)

-t
test mode. Scrape, but output to stdout and don't touch
the database. Also turns off the SSE serving.

-b
brief output (for test mode only) - just dump the titles of press
releases to stdout rather than the whole thing.

It uses glog for logging, so also supports all the standard glog flags.
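The command-line shape above maps naturally onto Go's standard flag package. The following is a sketch under assumptions rather than the code in main.go: the `options` struct and `parseArgs` function are hypothetical, and a private FlagSet is used so the example stays self-contained, which means the glog flags are not registered here.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// options mirrors the documented flags; the struct itself is hypothetical.
type options struct {
	list       bool
	historical bool
	test       bool
	brief      bool
	scrapers   []string // empty means "use all scrapers"
}

// parseArgs parses flags followed by an optional list of scraper names.
func parseArgs(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("ukpr", flag.ContinueOnError)
	fs.BoolVar(&o.list, "l", false, "list available scrapers and exit")
	fs.BoolVar(&o.historical, "historical", false, "use history-collecting scrapers where available")
	fs.BoolVar(&o.test, "t", false, "test mode: scrape to stdout, no database, no SSE serving")
	fs.BoolVar(&o.brief, "b", false, "brief output (test mode only)")
	if err := fs.Parse(args); err != nil {
		return o, err
	}
	// Anything left after the flags is treated as a scraper name.
	o.scrapers = fs.Args()
	return o, nil
}

func main() {
	o, err := parseArgs(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	fmt.Printf("%+v\n", o)
}
```

For example, `parseArgs([]string{"-t", "-b", "72point"})` yields test and brief mode restricted to the 72point scraper.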

TODOs

we've already got an HTTP server running, so we should implement a simple browsing interface for visual sanity-checking of press releases.
implement a proper config file system
run the scrapers in parallel with proper interval timing
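One possible shape for the last item, running scrapers in parallel on their own intervals, is sketched below with one goroutine and one time.Ticker per scraper. Everything here is hypothetical: the `scraper` struct, `runAll` and `demo` are invented for illustration and do not reflect how prscrape defines or schedules scrapers.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// scraper is a hypothetical description of one press release source.
type scraper struct {
	name     string
	interval time.Duration
	fetch    func() error
}

// runAll starts one goroutine per scraper. Each scrapes immediately and
// then once per interval until ctx is cancelled. runAll returns after
// every goroutine has stopped.
func runAll(ctx context.Context, scrapers []scraper) {
	var wg sync.WaitGroup
	for _, s := range scrapers {
		wg.Add(1)
		go func(s scraper) {
			defer wg.Done()
			ticker := time.NewTicker(s.interval)
			defer ticker.Stop()
			for {
				if err := s.fetch(); err != nil {
					fmt.Printf("%s: %v\n", s.name, err)
				}
				select {
				case <-ctx.Done():
					return
				case <-ticker.C:
				}
			}
		}(s)
	}
	wg.Wait()
}

// demo runs two fake scrapers briefly and reports how often each ran.
func demo() map[string]int {
	var mu sync.Mutex
	counts := map[string]int{}
	mk := func(name string, every time.Duration) scraper {
		return scraper{name: name, interval: every, fetch: func() error {
			mu.Lock()
			counts[name]++
			mu.Unlock()
			return nil
		}}
	}
	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()
	runAll(ctx, []scraper{
		mk("fast", 10*time.Millisecond),
		mk("slow", 20*time.Millisecond),
	})
	return counts
}

func main() {
	fmt.Println(demo())
}
```

A slow source does not delay the others because each scraper owns its ticker, and cancelling the context gives a clean shutdown path.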

Motivation & Goals

The main aim for this is to provide press releases for use by http://churnalism.com, hence the UK bias.

A major goal is to make it simple enough to customise for coverage of any set of press release sources you like.

It's not designed to be a full historical archive of press releases, merely a conduit to stream them out to interested clients, with a bit of buffering to make things more fault-tolerant.

Documentation

Overview

This program runs a server which:

scrapes UK press releases
serves them up as HTTP server-sent events
stashes them in a database for persistence, keeping a few days' worth of history (at least)

For more details, see prscrape (which provides all implementation).

Source Files

main.go

Directories

Path	Synopsis
hammer
prscrape
ukscrapers