prepare-csv

command

Version: v0.0.0-...-3814f9c | Published: Dec 7, 2016 | License: MIT | Imports: 9 | Imported by: 0

Repository

github.com/dspace-fi/saf-archiver

README

Prepare CSV

This is a simple program that transforms a CSV file into the CSV form required by the saf-packager program. The transformation consists mainly of rearranging columns and changing the separator tokens. It is probably useful mainly in the context of the University of Eastern Finland's SoleCRIS to DSpace import process, but it is provided here in case it proves useful for others.

Installation

prepare-csv is a Go program with no external dependencies. To build it, go to the source code folder and type

$ go build

which should produce the executable prepare-csv.

Usage

$ prepare-csv config.json input-file.csv

prepare-csv requires two input files: a configuration file (see below) and the data file to be processed. The data file should be in CSV format; the default separator is ';', but another can be specified in the configuration file. The file should not contain a header row, since all lines are processed as data. The processed CSV is written to stdout and can be redirected to a file if necessary, e.g.

$ prepare-csv config.json input-file.csv > output-file.csv

Configuration file

The configuration file is a JSON map with the following contents:

{
    "input-separator": ";",
    "output-separator": ";",
    "split-separator": "||",
    "columns": [
        { "from": 0, "title": "solecris.id" },
        { "from": 1, "title": "dc.title" },
        { "from": 2, "title": "dc.author", "split-by": ";" },
        { "from": 3, "title": "dc.language.iso", "filters": ["uef.isolang"] },
        { "from": 9, "title": "dc.identifier.issn" },
        { "from": 12, "discard": true, "title": "dc.identifier.issue" }
    ],
    "new-columns": [
        { "title": "dc.citation", "generator": "uef.dc-citation" }
    ]
}

input-separator is a string (only the first character is used) specifying the CSV separator of the input file (default: ";").
output-separator is a string used to separate fields in the output CSV. If a field itself contains this character, its content is quoted with double quotes (default: ";").
split-separator is a string used in the output to separate items within fields that have split-by defined (default: ";").
columns is a list of column maps. Columns are output in the order they appear in this list.
A column map is a map containing the following keys:
- from: an integer (starting from zero) that specifies the input column
- discard: a boolean; if true, the column is discarded (useful for temporarily disabling a column, as JSON doesn't have comments)
- title: a string specifying the title of this column in the output
- split-by: a string that separates items within the field in the input file; it is replaced with split-separator in the output
- filters: a list of strings naming the filters applied to the column. Filtering takes place after the splitter string (split-by) has been replaced, and the filters are applied in the order in which they are listed. The up-to-date names can be found in the source code file filter.go; they are also listed below (hopefully up to date as well):
  - uef.isolang: replaces a language string with its ISO-639-1 code, e.g. "suomi" -> "FI". The source languages are primarily those found in UEF's SoleCRIS system.
  - uef.peerreview: maps the peer review status (eprint.status) 0/1 to either http://purl.org/eprint/status/PeerReviewed or http://purl.org/eprint/status/NonPeerReviewed
  - uef.type: tries to map document types used in UEF's SoleCRIS system into ePrintTypes.
  - uef.doi: tries to format DOIs into URLs (10.1111/etc -> http://doi.org/doi:10.1111/etc) if it seems likely to succeed.
new-columns is a list of new-column maps.
A new-column map is a map containing the following keys:
- title: a string containing the title for the generated column
- generator: a string specifying the generator to use. Generator names can be found in the source code file generators.go.

Author & License

The program was written in 2016 in the SURIMA project (Suomi rinnakkaistallennuksen mallimaaksi, "Finland as a model country for parallel publishing") at the University of Eastern Finland by Ilja Sidoroff (ilja.sidoroff@uef.fi). It is licensed under the MIT License.

Source Files

prepare-csv.go

Directories

Path	Synopsis
filter
generator
