doicache

Version: v0.0.0-...-cdc2417
Published: Jul 22, 2023 License: GPL-3.0 Imports: 13 Imported by: 0

README

doicache

Keep a local cache of DOI API responses.

$ doicache 10.1103/PhysRevLett.118.140402
http://link.aps.org/doi/10.1103/PhysRevLett.118.140402

Installation

$ go install github.com/miku/doicache/cmd/doicache@latest

Usage

Usage of doicache:
  -db string
        leveldb directory (default "/tmp/.doicache/default")
  -dk
        dump keys
  -dkv
        dump keys and redirects
  -ttl duration
        entry expiration (default 5760h0m0s)
  -verbose
        be verbose
  -version
        show version

Dump all keys:

$ doicache -dk
10.1103/PhysRevLett.118.140402

Adjust expiration date:

$ doicache -verbose -ttl 1s 10.1103/PhysRevLett.118.140402
INFO[0000] entry expired
INFO[0000] https://doi.org/api/handles/10.1103/PhysRevLett.118.140402
INFO[0001] {"Date":"2018-05-25T01:19:02.177003048+02:00","Blob":"eyJyZ..."}
http://link.aps.org/doi/10.1103/PhysRevLett.118.140402

Read input from a file:

$ doicache < file

Example:

$ doicache < fixtures/10 | column -t
OK    10.2307/2546078                          https://www.jstor.org/stable/2546078?origin=crossref
OK    10.9783/9780812207729.91                 http://www.degruyter.com/view/books/9780812207729/97...
OK    10.1590/S0100-40422009000900046          http://www.scielo.br/scielo.php?script=sci_arttext&p...
OK    10.1097/00043764-199710000-00015         https://insights.ovid.com/crossref?an=00043764-19971...
H404  10.1016/jpaid.2003.07.001                NOTAVAILABLE
OK    10.1093/acrefore/9780199381135.013.205   http://classics.oxfordre.com/view/10.1093/acrefore/9...
OK    10.1037/h0050516                         http://content.apa.org/journals/ccp/17/3/232b
OK    10.1016/j.avb.2016.06.006                http://linkinghub.elsevier.com/retrieve/pii/S135917...
OK    10.4028/www.scientific.net/amm.29-32.61  https://www.scientific.net/AMM.29-32.61
OK    10.1136/bmj.2.1493.309                   http://www.bmj.com/cgi/doi/10.1136/bmj.2.1493.309

Status codes:

  • OK
  • H404 (invalid DOI)
  • EURL (invalid URL)

Limitation

Because the cache is backed by LevelDB, only one process can access it at a time.


API docs: https://www.doi.org/factsheets/DOIProxy.html#rest-api

An example response:

{
  "responseCode": 1,
  "handle": "10.1103/PhysRevLett.118.140402",
  "values": [
    {
      "index": 1,
      "type": "URL",
      "data": {
        "format": "string",
        "value": "http://link.aps.org/doi/10.1103/PhysRevLett.118.140402"
      },
      "ttl": 86400,
      "timestamp": "2017-04-06T02:10:03Z"
    },
    {
      "index": 700050,
      "type": "700050",
      "data": {
        "format": "string",
        "value": "20170405220855"
      },
      "ttl": 86400,
      "timestamp": "2017-04-06T02:10:03Z"
    },
    {
      "index": 100,
      "type": "HS_ADMIN",
      "data": {
        "format": "admin",
        "value": {
          "handle": "0.na/10.1103",
          "index": 200,
          "permissions": "111111110010"
        }
      },
      "ttl": 86400,
      "timestamp": "2017-04-06T02:10:03Z"
    }
  ]
}

TODO

  • threaded requests
  • possibly distribute requests among machines: create a drone (a binary for the target system), send the binary to the target, and let it send back the results.

Distops

Examples:

  • Distributed harvesters

Interface:

func Mapper(b []byte) ([]byte, error) { ... }

This function can be distributed among threads or machines.

A single reducer is run on the host:

func Reducer(b []byte) error { ... }

Input is a sequence of items, each represented at the lowest level as bytes, e.g. reading a file off disk line by line.

  • create target programs that can communicate (HTTP, gRPC)
  • host reads input, sends data to minions
  • host receives results and runs a reducer
  • various error policies
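The Mapper/Reducer split above can be sketched on a single machine with goroutines. This is only an illustration, not the package's implementation: the Run helper, the worker count, the uppercase Mapper body and the "skip failed items" error policy are all assumptions made for the example.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// Mapper transforms one input item. It has the signature from the text above;
// the uppercase body is only a placeholder.
func Mapper(b []byte) ([]byte, error) {
	return bytes.ToUpper(b), nil
}

// Run is a hypothetical driver: it fans items out to n workers running Mapper
// and feeds each result to reduce, which is only called from the caller's goroutine.
func Run(items [][]byte, n int, reduce func([]byte) error) error {
	in := make(chan []byte)
	out := make(chan []byte)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for b := range in {
				// Assumed error policy: silently skip items that fail to map.
				if r, err := Mapper(b); err == nil {
					out <- r
				}
			}
		}()
	}
	go func() {
		for _, it := range items {
			in <- it
		}
		close(in)
		wg.Wait()
		close(out)
	}()
	var firstErr error
	for r := range out {
		if firstErr != nil {
			continue // keep draining so the workers can exit
		}
		if err := reduce(r); err != nil {
			firstErr = err
		}
	}
	return firstErr
}

func main() {
	items := [][]byte{[]byte("a"), []byte("b"), []byte("c")}
	n := 0
	err := Run(items, 2, func(b []byte) error { n++; return nil })
	fmt.Println(n, err)
}
```

Results reach the reducer in no particular order. Distributing across machines would replace the channels with a transport such as HTTP or gRPC, as the list above suggests.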

Documentation

Index

Constants

This section is empty.

Variables

var (
	ErrCannotResolve   = errors.New("resolution failed")
	ErrMissingURLValue = errors.New("missing URL redirect entry")
	ErrMissingValueKey = errors.New("missing value key")
	ErrInvalidURL      = errors.New("invalid URL")

	Endpoint = "https://doi.org/api/handles"
)

Functions

func UserHomeDir

func UserHomeDir() string

UserHomeDir returns the home directory of the user.

Types

type Cache

type Cache struct {
	Endpoint string
	TTL      time.Duration
	Verbose  bool
	// contains filtered or unexported fields
}

Cache wraps the backend. XXX: Try to mitigate hot DNS servers by hardcoding a few of the doi.org IPs.

func New

func New(filename string) *Cache

New returns a new cache ready to be queried.

func NewTTL

func NewTTL(filename string, ttl time.Duration) *Cache

NewTTL creates a new cache whose entries expire after the given TTL.

func (*Cache) Close

func (c *Cache) Close() error

Close closes the underlying resources.

func (*Cache) DumpKeyValues

func (c *Cache) DumpKeyValues(w io.Writer) error

DumpKeyValues writes status, key and redirect URL as tab-separated values to the writer.

func (*Cache) DumpKeys

func (c *Cache) DumpKeys(w io.Writer) error

DumpKeys writes all keys to the writer, one per line.

func (*Cache) Get

func (c *Cache) Get(key string) ([]byte, error)

Get retrieves the blob associated with a key. If the value is not in the local database, or the local copy has expired, Get fetches it from doi.org.

func (*Cache) Name

func (c *Cache) Name() string

Name returns the path to the database file.

func (*Cache) Resolve

func (c *Cache) Resolve(doi string) (string, error)

Resolve returns the redirect URL for a given DOI.

type Entry

type Entry struct {
	Date time.Time
	Blob []byte
}

Entry to cache. Contains raw bytes of response and some metadata.

type ProtocolError

type ProtocolError struct {
	Location   string
	Message    string
	StatusCode int
}

ProtocolError keeps HTTP status codes.

func (ProtocolError) Error

func (e ProtocolError) Error() string

type Response

type Response struct {
	Handle       string `json:"handle"`
	ResponseCode int64  `json:"responseCode"`
	Values       []struct {
		Data      interface{} `json:"data"`
		Index     int64       `json:"index"`
		Timestamp string      `json:"timestamp"`
		TTL       int64       `json:"ttl"`
		Type      string      `json:"type"`
	} `json:"values"`
}

Response from doi.org/api/handles endpoint.

func (Response) RedirectURL

func (r Response) RedirectURL() (string, error)

RedirectURL returns the data value of the first entry whose type is URL.

Directories

Path Synopsis
cmd
