srasearch

command module
v0.0.0-...-98814c4
Published: Nov 2, 2015 License: ISC Imports: 9 Imported by: 0

README

SRASearch

An NCBI Sequence Read Archive (SRA) upload management search utility.

The SRA produces a regularly updated set of batch telemetry files for laboratories that submit data to its repository. These files help a submitting center reconcile what it has sent with what the SRA has received and processed.

This utility processes these telemetry files and presents the data through a Google-style search interface.

BUILDING

These instructions assume that you have a Go compiler installed.

Set up a Go workspace

Initialize a proper Go development workspace:

mkdir /path/to/project
cd /path/to/project
export GOPATH=$PWD
export GOROOT=$(go env GOROOT)

Set up the git repository:

mkdir -p $GOPATH/src/github.com/indraniel/
cd $GOPATH/src/github.com/indraniel/
git clone git@github.com:indraniel/srasearch.git
cd srasearch/

Initialize the dependencies

make prepare

Build the app

make

You should now see a srasearch executable inside the $GOPATH/src/github.com/indraniel/srasearch directory. You can move that file wherever you please.

USAGE

The examples below use the SRA batch telemetry files available to the McDonnell Genome Institute, which has the NCBI center abbreviation WUGSC.

The SRA provides monthly full and daily incremental versions of the batch telemetry files. Using these we can create an "SRA Dump" file, an intermediary file which is a collection of JSON documents.

srasearch will build upon prior "SRA Dumps" as new incremental telemetry files are obtained.

Initializing an SRA Dump

This command uses the full metadata and data transfer telemetry files produced at the beginning of the month. For example, here is how the command would run on May 1, 2015:

srasearch init-dump -m data/SRA/NCBI_SRA_Metadata_Full_WUGSC_20150501.tar.gz -u data/SRA/NCBI_SRA_Files_Full_WUGSC_20150501.gz -o 2015-05-01.dump.dat.gz

Here we initialized an SRA Dump file named 2015-05-01.dump.dat.gz.
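
To get a feel for what a freshly created dump contains, it can help to look at the first few bytes of the decompressed file. The helper below is a minimal sketch: peek_dump is a hypothetical name, not part of srasearch, and it assumes only that the dump is gzip-compressed, which is inferred from the .gz suffix. The internal layout of the JSON documents is not described here, so the function prints raw bytes rather than trying to parse them.

```shell
# peek_dump FILE [BYTES]
# Hypothetical helper (not part of srasearch): print the first BYTES
# (default 300) of the decompressed dump for inspection by eye.
# Assumes the dump is a gzip file, as its .gz suffix suggests.
peek_dump() {
    local file="$1"
    local bytes="${2:-300}"
    # gzip -t checks the integrity of the archive before anything is printed.
    gzip -t "$file" || return 1
    gzip -dc "$file" | head -c "$bytes"
}
```

For example, `peek_dump 2015-05-01.dump.dat.gz 500` would show the first 500 decompressed bytes, or fail if the file is not a valid gzip archive.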

Incrementing an existing SRA Dump

This command uses the incremental metadata and data transfer telemetry files and a prior existing SRA Dump file. For example, this is how the command would run on May 2, 2015:

srasearch increment-dump -i 2015-05-01.dump.dat.gz -m /path/to/NCBI_SRA_Metadata_WUGSC_20150502.tar.gz -u data/SRA/NCBI_SRA_Files_WUGSC_20150502.gz -o 2015-05-02.dump.dat.gz

Here we created an updated SRA Dump file named 2015-05-02.dump.dat.gz.

On May 3, 2015, the command to create a new updated "SRA Dump" file would look like this:

srasearch increment-dump -i 2015-05-02.dump.dat.gz -m /path/to/NCBI_SRA_Metadata_WUGSC_20150503.tar.gz -u data/SRA/NCBI_SRA_Files_WUGSC_20150503.gz -o 2015-05-03.dump.dat.gz

We can proceed similarly through the rest of the month. Once the next month arrives, we can initialize a brand new SRA Dump again.
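
The monthly cycle described above can be sketched as a small shell function that prints the command for a given day. This is only an illustration: dump_command is a hypothetical helper, and the file naming patterns, the data/SRA default directory, and the WUGSC default are copied from the examples in this section rather than from any documented convention. It assumes the previous day's dump exists under the same naming scheme and does not validate the date.

```shell
# dump_command DATE [CENTER] [DATA_DIR]
# Hypothetical helper: print (not run) the srasearch command for DATE,
# given as YYYY-MM-DD. File names follow the patterns used in the
# examples above, which is an assumption, not a documented rule.
dump_command() {
    local date="$1"
    local center="${2:-WUGSC}"
    local dir="${3:-data/SRA}"
    local year rest month day stamp prev

    year=${date%%-*}      # 2015-05-03 -> 2015
    rest=${date#*-}       # 2015-05-03 -> 05-03
    month=${rest%-*}      # 05-03 -> 05
    day=${rest#*-}        # 05-03 -> 03
    stamp="${year}${month}${day}"

    if [ "$day" = "01" ]; then
        # First of the month: start over from the full telemetry files.
        printf 'srasearch init-dump -m %s -u %s -o %s\n' \
            "${dir}/NCBI_SRA_Metadata_Full_${center}_${stamp}.tar.gz" \
            "${dir}/NCBI_SRA_Files_Full_${center}_${stamp}.gz" \
            "${date}.dump.dat.gz"
    else
        # Any other day: build on the previous day's dump.
        # Strip a leading zero so that 08 and 09 are not parsed as octal.
        prev=$(printf '%s-%s-%02d' "$year" "$month" $(( ${day#0} - 1 )))
        printf 'srasearch increment-dump -i %s -m %s -u %s -o %s\n' \
            "${prev}.dump.dat.gz" \
            "${dir}/NCBI_SRA_Metadata_${center}_${stamp}.tar.gz" \
            "${dir}/NCBI_SRA_Files_${center}_${stamp}.gz" \
            "${date}.dump.dat.gz"
    fi
}
```

Running `dump_command 2015-05-03` prints an increment-dump command of the same shape as the May 3 example, using 2015-05-02.dump.dat.gz as its input. Printing the command instead of executing it leaves room to review the paths before anything is written.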

Creating a search index from a given SRA Dump

Given an SRA Dump file, a primary search index database can be constructed. This sub-command creates the primary search index database:

srasearch make-index -i 2015-05-03.dump.dat.gz -o /path/to/db/srasearch0503.idx
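
If indexes are rebuilt regularly, the index path can be derived from the dump name. The sketch below follows the srasearchMMDD.idx pattern seen in the example above; index_command is a hypothetical helper, and both that naming pattern and the /path/to/db default are assumptions taken from the example, not requirements of srasearch.

```shell
# index_command DUMP [DB_DIR]
# Hypothetical helper: print (not run) a make-index command whose
# output path mirrors the srasearchMMDD.idx naming used above.
index_command() {
    local dump="$1"
    local db_dir="${2:-/path/to/db}"
    local base date rest month day

    base=${dump##*/}      # strip any leading directories
    date=${base%%.*}      # 2015-05-03.dump.dat.gz -> 2015-05-03
    rest=${date#*-}       # 2015-05-03 -> 05-03
    month=${rest%-*}      # 05-03 -> 05
    day=${rest#*-}        # 05-03 -> 03

    printf 'srasearch make-index -i %s -o %s\n' \
        "$dump" "${db_dir}/srasearch${month}${day}.idx"
}
```

With the defaults, `index_command 2015-05-03.dump.dat.gz` prints the same make-index command as the example above.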

Creating a recent uploads file (optional)

This sub-command creates an abbreviated "recent uploads" TSV file:

srasearch-noweb make-uploads --ncbi-uploads="/path/to/SRA/NCBI_SRA_Files_WUGSC_20150503.gz" --output-dir="/path/to/db/srasearch0503.idx" --threshold=4000 

In this example, we've placed the 4000 most recent uploads (from May 3, 2015) in a file inside the primary search index database directory called /path/to/db/srasearch0503.idx/recent-4000-sra-uploads-20150523.tsv.

This file is generally placed within a search index directory. It provides the data for the "Recent Uploads" link in the web app.

Start up the web app

srasearch web --host="0.0.0.0" --port=9999 --index-path="/path/to/db/srasearch0503.idx"

NOTES

srasearch uses the bleve text indexing library for its underlying search engine, with BoltDB as bleve's key/value store.

All the dependencies of this app are stored within this repository and are managed by godep.

LICENSE

ISC

Directories

Path Synopsis
Godeps
_workspace/src/github.com/arschles/go-bindata-html-template
Package template allows standard html/template templates to be rendered from contents embedded with the go-bindata tool instead of the filesystem See https://github.com/jteeuwen/go-bindata for more information about embedding binary data with go-bindata.
_workspace/src/github.com/blevesearch/bleve
Package bleve is a library for indexing and searching text.
_workspace/src/github.com/blevesearch/bleve/index/store/cznicb
Package cznicb provides an in-memory implementation of the KVStore interfaces using the cznic/b in-memory btree.
_workspace/src/github.com/blevesearch/bleve/index/store/gtreap
Package gtreap provides an in-memory implementation of the KVStore interfaces using the gtreap balanced-binary treap, copy-on-write data structure.
_workspace/src/github.com/blevesearch/bleve/index/upside_down
Package upside_down is a generated protocol buffer package.
_workspace/src/github.com/blevesearch/segment
Package segment is a library for performing Unicode Text Segmentation as described in Unicode Standard Annex #29 http://www.unicode.org/reports/tr29/ Currently only segmentation at Word Boundaries is supported.
_workspace/src/github.com/boltdb/bolt
Package bolt implements a low-level key/value store in pure Go.
_workspace/src/github.com/cznic/b
Package b implements a B+tree.
_workspace/src/github.com/cznic/b/example
Package b implements a int->int B+tree.
_workspace/src/github.com/golang/protobuf/proto
Package proto converts data structures to and from the wire format of protocol buffers.
_workspace/src/github.com/indraniel/go-sra-schemas-1.5/SRA.analysis.xsd_go
Auto-generated by the "go-xsd" package located at: github.com/metaleap/go-xsd Comments on types and fields (if any) are from the XSD file located at: SRA.analysis.xsd
_workspace/src/github.com/indraniel/go-sra-schemas-1.5/SRA.experiment.xsd_go
Auto-generated by the "go-xsd" package located at: github.com/metaleap/go-xsd Comments on types and fields (if any) are from the XSD file located at: SRA.experiment.xsd
_workspace/src/github.com/indraniel/go-sra-schemas-1.5/SRA.run.xsd_go
Auto-generated by the "go-xsd" package located at: github.com/metaleap/go-xsd Comments on types and fields (if any) are from the XSD file located at: SRA.run.xsd
_workspace/src/github.com/indraniel/go-sra-schemas-1.5/SRA.sample.xsd_go
Auto-generated by the "go-xsd" package located at: github.com/metaleap/go-xsd Comments on types and fields (if any) are from the XSD file located at: SRA.sample.xsd
_workspace/src/github.com/indraniel/go-sra-schemas-1.5/SRA.study.xsd_go
Auto-generated by the "go-xsd" package located at: github.com/metaleap/go-xsd Comments on types and fields (if any) are from the XSD file located at: SRA.study.xsd
_workspace/src/github.com/indraniel/go-sra-schemas-1.5/SRA.submission.xsd_go
Auto-generated by the "go-xsd" package located at: github.com/metaleap/go-xsd Comments on types and fields (if any) are from the XSD file located at: SRA.submission.xsd
_workspace/src/github.com/metaleap/go-xsd/types
A tiny package imported by all "go-xsd"-generated packages.
_workspace/src/github.com/ryszard/goskiplist/skiplist
Package skiplist implements skip list based maps and sets.
_workspace/src/github.com/spf13/cobra
Package cobra is a commander providing a simple interface to create powerful modern CLI interfaces.
_workspace/src/github.com/spf13/pflag
pflag is a drop-in replacement for Go's flag package, implementing POSIX/GNU-style --flags.
_workspace/src/github.com/syndtr/goleveldb/leveldb
Package leveldb provides implementation of LevelDB key/value database.
_workspace/src/github.com/syndtr/goleveldb/leveldb/cache
Package cache provides interface and implementation of a cache algorithms.
_workspace/src/github.com/syndtr/goleveldb/leveldb/comparer
Package comparer provides interface and implementation for ordering sets of data.
_workspace/src/github.com/syndtr/goleveldb/leveldb/errors
Package errors provides common error types used throughout leveldb.
_workspace/src/github.com/syndtr/goleveldb/leveldb/filter
Package filter provides interface and implementation of probabilistic data structure.
_workspace/src/github.com/syndtr/goleveldb/leveldb/iterator
Package iterator provides interface and implementation to traverse over contents of a database.
_workspace/src/github.com/syndtr/goleveldb/leveldb/journal
Package journal reads and writes sequences of journals.
_workspace/src/github.com/syndtr/goleveldb/leveldb/memdb
Package memdb provides in-memory key/value database implementation.
_workspace/src/github.com/syndtr/goleveldb/leveldb/opt
Package opt provides sets of options used by LevelDB.
_workspace/src/github.com/syndtr/goleveldb/leveldb/storage
Package storage provides storage abstraction for LevelDB.
_workspace/src/github.com/syndtr/goleveldb/leveldb/table
Package table allows read and write sorted key/value.
_workspace/src/github.com/syndtr/goleveldb/leveldb/util
Package util provides utilities used throughout leveldb.
_workspace/src/github.com/syndtr/gosnappy/snappy
Package snappy implements the snappy block-based compression format.
_workspace/src/github.com/willf/bitset
Package bitset implements bitsets, a mapping between non-negative integers and boolean values.
_workspace/src/github.com/zenazn/goji
Package goji provides an out-of-box web server with reasonable defaults.
_workspace/src/github.com/zenazn/goji/bind
Package bind provides a convenient way to bind to sockets.
_workspace/src/github.com/zenazn/goji/example
Command example is a sample application built with Goji.
_workspace/src/github.com/zenazn/goji/graceful
Package graceful implements graceful shutdown for HTTP servers by closing idle connections after receiving a signal.
_workspace/src/github.com/zenazn/goji/graceful/listener
Package listener provides a way to incorporate graceful shutdown to any net.Listener.
_workspace/src/github.com/zenazn/goji/web
Package web provides a fast and flexible middleware stack and mux.
_workspace/src/github.com/zenazn/goji/web/middleware
Package middleware provides several standard middleware implementations.
_workspace/src/github.com/zenazn/goji/web/mutil
Package mutil contains various functions that are helpful when writing http middleware.
_workspace/src/golang.org/x/text/transform
Package transform provides reader and writer wrappers that transform the bytes passing through as well as various transformations.
_workspace/src/golang.org/x/text/unicode/norm
Package norm contains types and functions for normalizing Unicode strings.
web
