muscato

Version: v0.0.0-...-468f5dc
Published: Jan 7, 2019 License: MIT

Muscato (Multi-Genome Scalable Alignment Tool)

Muscato is a software tool for matching a collection of read sequences to a collection of target sequences (e.g. gene sequences). The approach scales efficiently to hundreds of millions of reads and target sequences. A major goal of Muscato is to perform exhaustive multi-mapping, meaning that each read is mapped to as many target sequences as possible, subject to specified match quality constraints.
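To make the term "exhaustive multi-mapping" concrete, the following is a brute-force sketch in Go. It is not Muscato's algorithm, which is designed to scale far beyond what this loop can handle; it only illustrates the goal of reporting every target position where a read matches within a mismatch limit. The names Target, Hit, and mapRead are hypothetical, and the sketch assumes that match quality is measured by substitutions only (no insertions or deletions).

```go
package main

import "fmt"

// Target is a hypothetical container for one target sequence.
type Target struct {
	ID  string
	Seq string
}

// Hit records one placement of a read on a target.
type Hit struct {
	TargetID   string
	Position   int // 0-based offset within the target
	Mismatches int
}

// countMismatches compares two equal-length strings and stops early
// once the count exceeds limit. The boolean reports whether the
// comparison stayed within the limit.
func countMismatches(a, b string, limit int) (int, bool) {
	n := 0
	for i := 0; i < len(a); i++ {
		if a[i] != b[i] {
			n++
			if n > limit {
				return n, false
			}
		}
	}
	return n, true
}

// mapRead returns every position in every target where the read
// matches with at most maxMismatch substitutions.
func mapRead(read string, targets []Target, maxMismatch int) []Hit {
	var hits []Hit
	for _, t := range targets {
		for pos := 0; pos+len(read) <= len(t.Seq); pos++ {
			if n, ok := countMismatches(read, t.Seq[pos:pos+len(read)], maxMismatch); ok {
				hits = append(hits, Hit{TargetID: t.ID, Position: pos, Mismatches: n})
			}
		}
	}
	return hits
}

func main() {
	// Made-up sequences for illustration only.
	targets := []Target{
		{ID: "g1", Seq: "AACGTTT"},
		{ID: "g2", Seq: "TTACGAAA"},
	}
	for _, h := range mapRead("ACGT", targets, 1) {
		fmt.Printf("%s\t%d\t%d\n", h.TargetID, h.Position, h.Mismatches)
	}
}
```

The cost of this loop grows with the product of the number of reads, the number of targets, and the target length, which is why a tool aimed at hundreds of millions of sequences needs a different strategy.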

Installation

Muscato is written in Go and uses several of the GNU core utilities. It should run on any Unix-like system on which the Go tool and the GNU utilities are available.

In most cases, installation of Muscato should only require running the following commands in the shell:

go get github.com/kshedden/muscato/...

go get github.com/kshedden/sztool/...

The executables for Muscato and its auxiliary scripts should appear in your GOBIN directory (usually ${HOME}/go/bin if installed in a user account). You will need to add GOBIN to your PATH environment variable when using Muscato. If you are using the Bash shell, enter the following lines at the shell prompt, or add them to your .bashrc file to make the changes permanent.

export GOPATH=${HOME}/go
export GOBIN=${GOPATH}/bin
export PATH=${GOBIN}:${PATH}

The easiest way to update Muscato is to run rm -r on your muscato source directory (in go/src/github.com/kshedden), then reinstall as above.

Basic usage

Before running Muscato, you should prepare a version of your target sequence file using the muscato_prep_targets program. If your targets are in a fasta format file, you can simply run:

muscato_prep_targets genes.fasta

Instead of using a fasta input file, it is also possible to use a plain text file with the format id<tab>sequence<newline> for each target sequence. The sequence should consist of the upper-case characters A, T, G, and C. Any other letters are replaced with 'X'.
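If your targets are held in memory rather than in a fasta file, the plain text format can be written with a few lines of Go. This is a minimal sketch, not part of Muscato: the names sanitizeSequence and writeTargets are hypothetical, and upper-casing the input before writing is an assumption of the sketch. It applies the same replacement rule described above so that the file contents are predictable.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
	"strings"
)

// sanitizeSequence upper-cases an ASCII sequence and replaces every
// character other than A, T, G, and C with 'X'.
func sanitizeSequence(seq string) string {
	up := strings.ToUpper(seq)
	out := make([]byte, len(up))
	for i := 0; i < len(up); i++ {
		switch up[i] {
		case 'A', 'T', 'G', 'C':
			out[i] = up[i]
		default:
			out[i] = 'X'
		}
	}
	return string(out)
}

// writeTargets writes one id<tab>sequence<newline> record per target.
func writeTargets(w io.Writer, ids, seqs []string) error {
	if len(ids) != len(seqs) {
		return fmt.Errorf("got %d ids but %d sequences", len(ids), len(seqs))
	}
	bw := bufio.NewWriter(w)
	for i, id := range ids {
		if _, err := fmt.Fprintf(bw, "%s\t%s\n", id, sanitizeSequence(seqs[i])); err != nil {
			return err
		}
	}
	return bw.Flush()
}

func main() {
	// Made-up identifiers and sequences for illustration only.
	ids := []string{"gene1", "gene2"}
	seqs := []string{"ATGCCGTA", "atgnnrcg"}
	if err := writeTargets(os.Stdout, ids, seqs); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Redirecting the output to a file gives an input that should be usable in place of the fasta file.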

The muscato_prep_targets script accepts a -rev flag, which causes the reverse complement of each target sequence to be added to the database along with the original sequence.
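For readers unfamiliar with the term, a reverse complement is obtained by swapping A with T and G with C, then reversing the result. The sketch below shows the standard definition in Go. It is not taken from the Muscato source; the function name reverseComplement is hypothetical, and mapping any other character to 'X' is an assumption of the sketch.

```go
package main

import "fmt"

// reverseComplement returns the reverse complement of a DNA sequence.
// A pairs with T and G pairs with C. Any other byte becomes 'X'
// (an assumption of this sketch).
func reverseComplement(seq string) string {
	n := len(seq)
	out := make([]byte, n)
	for i := 0; i < n; i++ {
		var c byte
		switch seq[i] {
		case 'A':
			c = 'T'
		case 'T':
			c = 'A'
		case 'G':
			c = 'C'
		case 'C':
			c = 'G'
		default:
			c = 'X'
		}
		// Writing from the far end performs the reversal.
		out[n-1-i] = c
	}
	return string(out)
}

func main() {
	fmt.Println(reverseComplement("AAGC")) // prints GCTT
}
```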

After building the target datafile, you can run muscato. A basic invocation is:

muscato --ReadFileName=reads.fastq --GeneFileName=genes.fasta.sz --GeneIdFileName=genes_ids.txt.sz\
        --Windows=0,20 --WindowWidth=15 --MaxReadLength=100

Note that the target files genes.fasta.sz and genes_ids.txt.sz were produced by the muscato_prep_targets script, run as shown above.

Many other command-line flags are available; run muscato --help for more information. The output of muscato --help is here.

By default, the results are written to a file named results.txt, a tab-delimited file with the following columns:

  1. Read sequence

  2. Matching subsequence of a target sequence

  3. Position within the target where the read matches (counting from 0)

  4. Number of mismatches

  5. Target sequence identifier

  6. Target sequence length

  7. Number of copies of the read in the read pool

  8. Read identifier
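The eight columns listed above can be read into a Go struct for downstream processing. The following is a minimal sketch under stated assumptions: the type Match, its field names, and the functions parseLine and readMatches are hypothetical; the file is assumed to have no header row; and the numeric columns are assumed to be plain base-10 integers. The sample line in main is invented for illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strconv"
	"strings"
)

// Match holds one row of the results file, in column order.
type Match struct {
	ReadSeq      string
	TargetSeq    string // matching subsequence of the target
	Position     int    // 0-based position within the target
	Mismatches   int
	TargetID     string
	TargetLength int
	ReadCopies   int // copies of the read in the read pool
	ReadID       string
}

// parseLine splits one tab-delimited row into a Match.
func parseLine(line string) (Match, error) {
	fields := strings.Split(strings.TrimRight(line, "\r\n"), "\t")
	if len(fields) != 8 {
		return Match{}, fmt.Errorf("expected 8 tab-separated fields, got %d", len(fields))
	}
	// Columns 3, 4, 6, and 7 (1-based) are numeric.
	ints := make([]int, 0, 4)
	for _, i := range []int{2, 3, 5, 6} {
		v, err := strconv.Atoi(fields[i])
		if err != nil {
			return Match{}, fmt.Errorf("column %d: %v", i+1, err)
		}
		ints = append(ints, v)
	}
	return Match{
		ReadSeq: fields[0], TargetSeq: fields[1],
		Position: ints[0], Mismatches: ints[1],
		TargetID: fields[4], TargetLength: ints[2],
		ReadCopies: ints[3], ReadID: fields[7],
	}, nil
}

// readMatches parses every non-empty line from r.
func readMatches(r io.Reader) ([]Match, error) {
	var out []Match
	sc := bufio.NewScanner(r)
	sc.Buffer(make([]byte, 64*1024), 1024*1024) // allow long lines
	for sc.Scan() {
		if sc.Text() == "" {
			continue
		}
		m, err := parseLine(sc.Text())
		if err != nil {
			return nil, err
		}
		out = append(out, m)
	}
	return out, sc.Err()
}

func main() {
	sample := "ACGTA\tACGTT\t12\t1\tgene1\t300\t2\tread7\n" // invented row
	matches, err := readMatches(strings.NewReader(sample))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, m := range matches {
		fmt.Printf("%s maps to %s at %d with %d mismatch(es)\n",
			m.ReadID, m.TargetID, m.Position, m.Mismatches)
	}
}
```

To process a real run, pass the opened results.txt file to readMatches instead of the sample string.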

The tool also generates a fastq file containing all non-matching reads.

Logging

Several log files are written to the directory muscato_logs/#####, where ##### is the same unique id used for the temporary files. High-level logging messages are written to 'muscato.log'. More detailed logging information is written to logs specific to each component of the tool, e.g. 'muscato_screen.log'.

Temporary workspace

Muscato uses a temporary directory for intermediate and logging files, by default named muscato_tmp/######, where ###### is a unique id generated by Muscato. If NoCleanTemp is set to false (the default), this directory is automatically deleted after the muscato run completes; otherwise it is retained. A retained temporary directory can be safely deleted at any time after the run.

Testing

There is currently a small collection of unit tests in the tests directory. To run the tests, enter that directory and type:

go run test.go

Any errors will be printed to the terminal. Detailed results of the tests are written to the file test.log.

Dependencies

Muscato has the following dependencies. The sztool package must be installed manually with go get, as shown above. All other dependencies should be installed automatically by go get when installing Muscato.

github.com/kshedden/sztool

github.com/chmduquesne/rollinghash

github.com/golang-collections/go-datastructures/bitarray

github.com/golang/snappy

github.com/willf/bloom

Issues and feedback

Please file an issue if you encounter any difficulties.
