span-crossref-sync

command
v0.1.361
Published: Feb 9, 2024 License: GPL-3.0 Imports: 21 Imported by: 0

Documentation

Overview

span-crossref-sync downloads and caches raw crossref messages from the crossref works API: https://www.crossref.org/documentation/retrieve-metadata/rest-api/

Example usage:

   $ span-crossref-sync \
         -p zstd \
         -P feed-1- \
         -i d \
         -verbose \
         -t 30m \
         -s 2022-01-01 \
         -e 2023-05-01 \
         -c /data/finc/crossref/

Flags used above:

   -p         compress program
   -P         file prefix (to separate different runs)
   -i         interval (d: daily)
   -verbose   verbose output
   -t         timeout
   -s         start
   -e         end (leave out for default: yesterday)
   -c         cache dir
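
To make the start, end and interval flags more concrete, here is a minimal sketch of how a date range could be split into daily slices. It assumes the end date is inclusive, which the text above does not state; `dailyIntervals` is a hypothetical helper and not the tool's own code.

```go
package main

import (
	"fmt"
	"time"
)

// dailyIntervals returns one date (YYYY-MM-DD) per day from start to end.
// Assumption: both bounds are inclusive.
func dailyIntervals(start, end string) ([]string, error) {
	const layout = "2006-01-02"
	s, err := time.Parse(layout, start)
	if err != nil {
		return nil, err
	}
	e, err := time.Parse(layout, end)
	if err != nil {
		return nil, err
	}
	var days []string
	// AddDate steps by calendar day, so month and year boundaries are handled.
	for d := s; !d.After(e); d = d.AddDate(0, 0, 1) {
		days = append(days, d.Format(layout))
	}
	return days, nil
}

func main() {
	days, err := dailyIntervals("2022-01-01", "2023-05-01")
	if err != nil {
		panic(err)
	}
	fmt.Println(len(days), "daily slices from", days[0], "to", days[len(days)-1])
}
```

Under this assumption, each slice would correspond to one download and one cached file.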

Space requirements: one day yields about 1M update documents, which compress to a file of roughly 2GB. A year amounts to about 800GB of compressed data.
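
For planning disk space, the per-day figure can be turned into a back-of-the-envelope estimate. The sketch below only multiplies the rough 2GB per day quoted above by a number of days; `compressedGB` is a hypothetical helper, and real daily files vary in size.

```go
package main

import "fmt"

// compressedGB estimates the compressed size in GB for a number of days,
// given an average compressed file size per day.
func compressedGB(days int, gbPerDay float64) float64 {
	return float64(days) * gbPerDay
}

func main() {
	// 2 GB per day is the rough figure from the text; treat it as an estimate.
	fmt.Printf("one year: ~%.0f GB\n", compressedGB(365, 2.0))
}
```

At exactly 2GB per day a year comes to 730GB, in the same range as the figure of about 800GB given above.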

span-crossref-sync can run independently of other conversion processes, e.g. in a daily cron job. Processes that need the data can locate the cached files themselves or create a snapshot from them.

Data point: https://github.com/miku/filterline#data-point-crossref-snapshot

As of 02/2024 we have 768 files (for "feed-1-") using 2.1TB (zstd, est. 12TB uncompressed).
