newspaper-curation-app

Module: v2.5.4+incompatible
This package is not in the latest version of its module.
Published: Sep 7, 2018
License: Apache-2.0

README

Newspaper Curation App

Note: this project should not be considered production-ready unless you have a developer who can make sense of its inner workings. The application suite works, but there are quite a few situations where somebody has to dig deep to deal with problems.

For instance, a scanner error may require deleting the issue record from the database, moving the issue's TIFF/PDF files somewhere they can be examined and fixed by the scanning team, etc.

There are also improvements that need to be made to automate more of the process. For instance, if an issue has errors but manages to slip through to the batching phase, fixing the batch currently requires low-level database fixes and running a command-line utility.

In general, undocumented problems can arise outside the application's scope that can only be fixed by manual intervention, either because of features we haven't had time to build or because of the human error inherent in publisher-uploaded PDFs and scanned+OCRed historic titles.

Ye be warned.

Project

This project consists of various scripts for converting born-digital PDF newspapers, as well as scanned newspapers, into a one-batch bag which can be ingested into ONI and chronam.

Apologies: this toolsuite was built to meet our needs. It's very likely some of our assumptions won't work for everybody, and it's certain that there are pieces of the suite which need better documentation.

Please refer to the wiki for detailed documentation.

Directories

Path Synopsis
src
chronam
Package chronam describes various internal structures for deserializing data from the live app's APIs and data
cli
Package cli provides helpers for common NCA command-line tools' needs
cmd/server/internal/responder
Package responder contains all the general functionality necessary for responding to a given server request: template setup, user auth checks, rendering of pages to an http.ResponseWriter
cmd/server/internal/settings
Package settings just holds global variables all internal packages need to access, such as whether debug is on
config
Package config is the project-specific configuration reader / parser / validator.
db
derivatives/jp2
Package jp2 converts a PDF or TIFF into a JP2.
issuefinder
Package issuefinder sets up a process for finding all issues across the filesystem and live sites to allow for other tools to get fairly comprehensive information: where in the workflow an issue resides, which batches contain a certain LCCN, which issues have dupes, etc.
issuewatcher
Package issuewatcher wraps the issuefinder.Finder with some app-specific know-how in order to layer on top of the generic issuefinder to include behaviors necessary for finding issues from all known locations by reading our settings file and running the appropriate searches.
mods
Package mods holds data structures to simplify unmarshaling the Issue XML which holds lots of MODS structures
schema
Package schema houses simple data types for titles, issues, batches, etc.
shell
Package shell centralizes common exec.Cmd functionality
uploads
Package uploads is for the one-off validation / queue processing which only applies to issues which aren't yet in the workflow.
version
Package version is just for holding high-level versioning information for the project as a whole
web
Package web is a namespace-only package for various web helpers specific to the NCA tools
web/tmpl
Package tmpl wraps a lot of html/template for easier use with common layout setup and sub-templates
web/webutil
Package webutil holds functions and data that other packages may need in order to generate URLs, find static files, etc.
