chessdatamanagement

command module

v0.0.0-...-c15ad15 | Published: Sep 15, 2020 | License: MIT | Imports: 19 | Imported by: 0

Repository

github.com/vkuznet/chessdatamanagement

README

Chess Data Management service

Introduction

The CHESS data flow has been discussed in this document.

Here we propose a possible architecture for CHESS data management based on gradual enhancement of the existing infrastructure:

[Figure: ChessDataManagement architecture diagram]

In particular, we propose to introduce the following components:

MetaData DB, based on MongoDB or a similar document-oriented database. Such a solution should:
- be able to handle free-structured text documents
- provide a rich query language (QL)
Files DB, based on any relational database, e.g. MySQL or its free alternative MariaDB. The purpose of this database is to provide data bookkeeping capabilities and to organize meta-data in the following form:
- a dataset is a collection of files (or blocks)
- each dataset name may carry an Experiment name and additional meta-data information
- files are organized in specific data-tiers, e.g. RAW for raw data, AOD for processed data, etc.
- as such, each dataset name takes the form of a path: /Experiment/Processing/Tier
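
The dataset naming and bookkeeping described above can be sketched in a few lines of Python. This is only an illustration: the table layout, the helper names (`dataset_path`, `split_dataset_path`, `register_files`, `files_in_dataset`) and the file names are assumptions made for this example, not the project's actual schema or API.

```python
import sqlite3


def dataset_path(experiment, processing, tier):
    """Compose a dataset name of the form /Experiment/Processing/Tier."""
    parts = (experiment, processing, tier)
    for part in parts:
        if not part or "/" in part:
            raise ValueError(f"invalid dataset path component: {part!r}")
    return "/" + "/".join(parts)


def split_dataset_path(path):
    """Split /Experiment/Processing/Tier back into its three components."""
    parts = path.strip("/").split("/")
    if not path.startswith("/") or len(parts) != 3 or not all(parts):
        raise ValueError(f"not a /Experiment/Processing/Tier path: {path!r}")
    return tuple(parts)


# Hypothetical schema, for illustration only (not the project's real schema).
SCHEMA = """
CREATE TABLE datasets (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE files (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    dataset_id INTEGER NOT NULL REFERENCES datasets(id)
);
"""


def register_files(conn, dataset, file_names):
    """Record that the given files belong to the dataset."""
    conn.execute("INSERT OR IGNORE INTO datasets (name) VALUES (?)", (dataset,))
    (dataset_id,) = conn.execute(
        "SELECT id FROM datasets WHERE name = ?", (dataset,)
    ).fetchone()
    conn.executemany(
        "INSERT INTO files (name, dataset_id) VALUES (?, ?)",
        [(name, dataset_id) for name in file_names],
    )
    conn.commit()


def files_in_dataset(conn, dataset):
    """Return the sorted file names that belong to the dataset."""
    rows = conn.execute(
        "SELECT f.name FROM files f JOIN datasets d ON f.dataset_id = d.id "
        "WHERE d.name = ? ORDER BY f.name",
        (dataset,),
    )
    return [name for (name,) in rows]


conn = sqlite3.connect(":memory:")  # in-memory DB, nothing is written to disk
conn.executescript(SCHEMA)
name = dataset_path("Titanium", "FirstPass", "RAW")
register_files(conn, name, ["scan_75.dat", "scan_74.dat"])  # made-up file names
print(name, files_in_dataset(conn, name))
```

Keeping the dataset name as a single path string makes it easy to use as a lookup key, while the three components stay recoverable with `split_dataset_path`.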

Both databases may reside in their own data service, called the MetaData Service. Such a service can provide RESTful APIs for end-users to:

- inject data into the DBs
- fetch results
- update data in the DBs
- delete data in the DBs
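
To make the four operations concrete, here is a minimal in-memory sketch. The class name `InMemoryMetaDataService`, the `ROUTES` table and the endpoint paths are hypothetical and chosen only for illustration; a real service would sit behind an HTTP server and talk to MongoDB and the Files DB instead of a Python dictionary.

```python
import itertools

# Hypothetical mapping of the four operations onto REST verbs and endpoints.
ROUTES = {
    "inject": ("POST", "/meta"),
    "fetch": ("GET", "/meta/{id}"),
    "update": ("PUT", "/meta/{id}"),
    "delete": ("DELETE", "/meta/{id}"),
}


class InMemoryMetaDataService:
    """Toy stand-in for the MetaData Service, backed by a dictionary."""

    def __init__(self):
        self._records = {}
        self._ids = itertools.count(1)

    def inject(self, record):
        """Store a new record and return its identifier."""
        rid = next(self._ids)
        self._records[rid] = dict(record)
        return rid

    def fetch(self, rid):
        """Return a copy of the stored record (KeyError if it is unknown)."""
        return dict(self._records[rid])

    def update(self, rid, changes):
        """Merge the given changes into the record and return the result."""
        self._records[rid].update(changes)
        return self.fetch(rid)

    def delete(self, rid):
        """Remove the record (KeyError if it is unknown)."""
        del self._records[rid]


service = InMemoryMetaDataService()
rid = service.inject({"experiment": "Titanium", "tier": "RAW"})
service.update(rid, {"processing": "FirstPass"})
print(ROUTES["fetch"], service.fetch(rid))
```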

In addition, we suggest introducing an Input Data Service, which can take care of standardizing user inputs, e.g. key-value pairs, tagging, etc. It is not required initially, but will help in the long run to provide a uniform data representation for the MetaData Service.
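
As a rough sketch of what such standardization could mean in practice, the function below normalizes keys, trims string values, and turns tags into a sorted, de-duplicated list. The function name `standardize` and the specific rules (lower-casing keys, a `tags` field) are assumptions made for this example, not part of the proposal.

```python
def standardize(raw):
    """Return a copy of user input with normalized keys, values and tags."""
    record = {}
    for key, value in raw.items():
        # Normalize keys: trim, lower-case, and replace spaces with underscores.
        key = key.strip().lower().replace(" ", "_")
        if isinstance(value, str):
            value = value.strip()
        record[key] = value
    # Accept tags either as a list or as a comma-separated string.
    tags = record.get("tags", [])
    if isinstance(tags, str):
        tags = tags.split(",")
    record["tags"] = sorted({t.strip().lower() for t in tags if t.strip()})
    return record


print(standardize({" Experiment ": " Titanium ", "Tags": "RAW, beamtime ,raw"}))
```

Applying the same rules to every submission means that later queries do not have to guess between variants such as `Experiment` and `experiment`.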

Finally, data access can be organized via an XRootD service.

Insert data into MetaData DB:

We provide a chess_parser.py script that parses input Microsoft Word documents, extracts their content, and injects it into MongoDB. Here is an example of such an operation:

# prepare files.db (so far we use SQLite)
rm -f files.db; sqlite3 files.db < doc/schema.sql

# start MongoDB

# prepare a parameter file which contains the DB and experiment parameters
# so far I have chosen the JSON data-format, but it can be replaced with the
# YAML data-format, which is much simpler and more intuitive for end-users
cat doc/params.json
{
    "fname": "doc/miller-774-1_beamtime_notes.docx",
    "path": "files",
    "dburi": "mongodb://localhost:8230",
    "dbname": "chess",
    "dbcoll": "meta",
    "filesdb": "files.db",
    "experiment": "Titanium",
    "processing": "FirstPass",
    "tier": "RAW"
}

# inject data into MongoDB (MetaData DB) and SQLite (Files DB)
./chess_parser.py --params=doc/params.json --verbose
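
For readers who want to check a parameter file before running the parser, the following sketch loads the JSON and verifies that the keys from the example above are present. Treating every key as required is an assumption of this example, as are the names `REQUIRED` and `load_params`; chess_parser.py itself may behave differently.

```python
import json

# Keys taken from the example parameter file; treating all of them as
# mandatory is an assumption of this sketch.
REQUIRED = (
    "fname", "path", "dburi", "dbname", "dbcoll",
    "filesdb", "experiment", "processing", "tier",
)


def load_params(text):
    """Parse a JSON parameter document and check that no key is missing."""
    params = json.loads(text)
    missing = [key for key in REQUIRED if key not in params]
    if missing:
        raise ValueError("missing parameters: " + ", ".join(missing))
    return params


EXAMPLE = """
{
    "fname": "doc/miller-774-1_beamtime_notes.docx",
    "path": "files",
    "dburi": "mongodb://localhost:8230",
    "dbname": "chess",
    "dbcoll": "meta",
    "filesdb": "files.db",
    "experiment": "Titanium",
    "processing": "FirstPass",
    "tier": "RAW"
}
"""

params = load_params(EXAMPLE)
# The dataset name follows the /Experiment/Processing/Tier convention.
dataset = "/" + "/".join(params[k] for k in ("experiment", "processing", "tier"))
print(dataset)
```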

Find documents in MetaData DB

We provide a basic implementation of a finder script, chess_finder.py, which should be able to find the required meta-data in MongoDB via a provided free-text query:

# find meta-data information
./chess_finder.py --params=doc/params.json --query="scan 74-77"
# find corresponding files
./chess_finder.py --params=doc/params.json --query="scan 74-77" --list-files --verbose

Please note that to perform free-text search queries we need to define a text index, e.g.

db.meta.createIndex( { description: "text" } )
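
The same kind of search can be sketched from Python. The helper names `text_query` and `find_documents` are hypothetical, and this is not necessarily how chess_finder.py is implemented; the sketch only shows the shape of a MongoDB `$text` filter and works with any object that exposes a pymongo-style `find()` method.

```python
def text_query(query):
    """Build the filter document for a MongoDB $text search."""
    return {"$text": {"$search": query}}


def find_documents(collection, query):
    """Run a free-text search on an object exposing a pymongo-style find()."""
    return list(collection.find(text_query(query)))


# With a running MongoDB and pymongo installed, usage could look like this
# (URI, database and collection names are taken from the example parameters):
#
#   from pymongo import MongoClient, TEXT
#   coll = MongoClient("mongodb://localhost:8230")["chess"]["meta"]
#   coll.create_index([("description", TEXT)])  # same text index as above
#   for doc in find_documents(coll, "scan 74-77"):
#       print(doc)
```

Without the text index on the collection, MongoDB rejects `$text` queries, which is why the index has to be created first.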


Source Files

chess_client.go