prohits-viz-analysis

v1.0.0-alpha
Published: Feb 18, 2021 License: BSD-3-Clause, MIT

README

ProHits-viz analysis scripts

Golang scripts for analyzing data at ProHits-viz.

Prerequisites

Install

go get github.com/knightjdr/prohits-viz-analysis
cd $HOME/go/src/github.com/knightjdr/prohits-viz-analysis
go get -d ./...
go install ./...

Nomenclature

We consider screen data to consist of a series of "conditions" that could be, for example, experimental timepoints, treatments or proteomic baits. For each condition there are "readouts" with associated data for the condition. Readouts will typically be genes or proteins. Readouts have an "abundance" that could be a gene expression value, spectral count or peptide intensity. The abundance is always assumed to be a non-negative number. Finally, readouts have a "score", which is an indication of the confidence in the abundance.

All of the tools use this nomenclature. Input data files are expected to be in tabular format with a row for each data point, with each data point having a value for the condition, readout, abundance and score. Files can have additional columns and they will be ignored.

condition    readout    abundance  score
condition a  readout x  5          0.23
condition a  readout y  15         0.04
condition a  readout z  47         0.01
condition b  readout x  8          0.21
condition b  readout y  13         0.06
condition b  readout z  35         0.02
condition c  readout x  15         0.04
condition c  readout y  5          0.23
condition c  readout z  93         0.00

A sample file for testing can be found in sample-files/analysis-file.txt

Command line arguments

Use a JSON file to specify all analysis settings. The type field specifies the type of analysis to perform.

pvanalyze --settings="settings.json"

Settings file format

{
  "fileList": ["file1.txt", "file2.txt"],
  "primaryFilter": 0.01,
  "type": "dotplot"
}
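For readers working with settings files from Go, the example above can be decoded with the standard encoding/json package. This is only a sketch: the Settings struct and parseSettings function are hypothetical and cover just the three fields shown in the example, not the full set of options the tool accepts.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Settings is a hypothetical struct mirroring the three fields shown in the
// example settings file. It is not the type used by the tool itself.
type Settings struct {
	FileList      []string `json:"fileList"`
	PrimaryFilter float64  `json:"primaryFilter"`
	Type          string   `json:"type"`
}

// parseSettings decodes the JSON contents of a settings file.
func parseSettings(data []byte) (Settings, error) {
	var s Settings
	err := json.Unmarshal(data, &s)
	return s, err
}

func main() {
	raw := []byte(`{
  "fileList": ["file1.txt", "file2.txt"],
  "primaryFilter": 0.01,
  "type": "dotplot"
}`)
	s, err := parseSettings(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%s analysis of %d file(s), primary filter %g\n",
		s.Type, len(s.FileList), s.PrimaryFilter)
}
```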

Dot plot analysis

Required arguments

Argument         Description
fileList         list of files in csv or tsv format
abundance        name of column containing abundance values
condition        name of column containing condition names
readout          name of column containing readout names
score            name of column containing readout scores
primaryFilter    score filter for readouts, i.e. a readout must pass this filter to be included
secondaryFilter  secondary filter for visually marking readouts that fail the primary filter but pass this second threshold

Output options

Argument       Description
png            output PNGs in addition to SVGs
writeDotplot   output a dot plot image
writeHeatmap   output a heat map image
writeDistance  output condition-condition and readout-readout distance matrices

Image options

Argument      Description
abundanceCap  threshold for capping abundances on the image
edgeColor     edge color on dot plots; options: blue, green, grey, red, yellow
fillColor     fill color on dot plots and heat maps; options: blue, green, grey, red, yellow

Data filtering

Argument          Description
minimumAbundance  minimum threshold a readout must satisfy
scoreType         specify if smaller (lte) or larger (gte) scores are better
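The interaction between scoreType and the primary and secondary filters can be sketched as follows. The function names are hypothetical, and the sketch assumes that thresholds are inclusive (as the names lte and gte suggest) and that any scoreType other than "gte" is treated as "lte"; the tool's actual handling may differ.

```go
package main

import "fmt"

// passesFilter reports whether a score satisfies a threshold. With scoreType
// "gte" larger scores are better; otherwise smaller scores are better.
// Inclusive comparison is an assumption of this sketch.
func passesFilter(score, threshold float64, scoreType string) bool {
	if scoreType == "gte" {
		return score >= threshold
	}
	return score <= threshold
}

// classify returns "primary" if the score passes the primary filter,
// "secondary" if it only passes the secondary filter, and "none" otherwise.
func classify(score, primary, secondary float64, scoreType string) string {
	switch {
	case passesFilter(score, primary, scoreType):
		return "primary"
	case passesFilter(score, secondary, scoreType):
		return "secondary"
	default:
		return "none"
	}
}

func main() {
	// Scores taken from the example table, with smaller scores being better.
	for _, score := range []float64{0.01, 0.04, 0.23} {
		fmt.Printf("score %g: %s\n", score, classify(score, 0.01, 0.05, "lte"))
	}
}
```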
Data transformation

Data can be transformed prior to analysis:

  1. Control values can be subtracted from readout abundances.
  2. Abundances can be adjusted to the "length" of a readout. For example, if a readout is a protein, the abundance can be adjusted to the protein length, with abundances from smaller proteins increased and abundances from larger proteins reduced. Specifically, the median readout length is divided by each readout's length to calculate the multiplier to use for adjustment.
  3. Abundances can be normalized by condition, using either (a) the total condition abundance or (b) a specific readout. For (a), the readout abundances are summed for each condition, and each condition's readouts are then normalized relative to the median of these sums. For (b), the median abundance of a specific readout is used to normalize all other readouts between conditions.
  4. Abundances can be log transformed.
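The readout length adjustment in step 2 can be sketched in Go as below. The function names are hypothetical. The sketch assumes the median is taken over one length per readout and that readouts with a missing or non-positive length are left unchanged; neither detail is stated in this README.

```go
package main

import (
	"fmt"
	"sort"
)

// median returns the median of values, or 0 for an empty slice.
func median(values []float64) float64 {
	sorted := append([]float64(nil), values...)
	sort.Float64s(sorted)
	n := len(sorted)
	if n == 0 {
		return 0
	}
	if n%2 == 1 {
		return sorted[n/2]
	}
	return (sorted[n/2-1] + sorted[n/2]) / 2
}

// adjustByLength multiplies each readout's abundance by
// (median readout length / readout length). Readouts without a positive
// length are returned unchanged (an assumption of this sketch).
func adjustByLength(abundance, length map[string]float64) map[string]float64 {
	lengths := make([]float64, 0, len(length))
	for _, l := range length {
		lengths = append(lengths, l)
	}
	med := median(lengths)

	adjusted := make(map[string]float64, len(abundance))
	for readout, value := range abundance {
		l, ok := length[readout]
		if !ok || l <= 0 {
			adjusted[readout] = value
			continue
		}
		adjusted[readout] = value * med / l
	}
	return adjusted
}

func main() {
	abundance := map[string]float64{"readout x": 10, "readout y": 10, "readout z": 10}
	length := map[string]float64{"readout x": 100, "readout y": 200, "readout z": 400}
	adjusted := adjustByLength(abundance, length)
	for _, readout := range []string{"readout x", "readout y", "readout z"} {
		fmt.Printf("%s: %g\n", readout, adjusted[readout])
	}
}
```

With a median length of 200, the shorter readout x is scaled up and the longer readout z is scaled down, matching the behaviour described in step 2.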
Argument              Description
control               name of column with control values to subtract from abundances (must be a pipe-separated list)
logBase               log transform data; options: 2, e, 10
normalization         normalize data; options: none, readout, total
normalizationReadout  readout to use for normalization
readoutLength         name of column with readout lengths for abundance adjustment
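Normalization by total abundance (the "total" option) can be sketched as follows. The function names and the nested map layout (condition, then readout) are hypothetical. The sketch also assumes that "normalized relative to the median of these sums" means multiplying each condition by (median of sums / condition sum), so that every condition ends up with the same total; the tool may implement this differently.

```go
package main

import (
	"fmt"
	"sort"
)

// median returns the median of values, or 0 for an empty slice.
func median(values []float64) float64 {
	sorted := append([]float64(nil), values...)
	sort.Float64s(sorted)
	n := len(sorted)
	if n == 0 {
		return 0
	}
	if n%2 == 1 {
		return sorted[n/2]
	}
	return (sorted[n/2-1] + sorted[n/2]) / 2
}

// normalizeByTotal scales every readout in a condition by
// (median of all condition sums / this condition's sum).
// Conditions with a zero sum are left unchanged.
func normalizeByTotal(data map[string]map[string]float64) map[string]map[string]float64 {
	sums := make(map[string]float64, len(data))
	totals := make([]float64, 0, len(data))
	for condition, readouts := range data {
		for _, value := range readouts {
			sums[condition] += value
		}
		totals = append(totals, sums[condition])
	}
	med := median(totals)

	normalized := make(map[string]map[string]float64, len(data))
	for condition, readouts := range data {
		multiplier := 1.0
		if sums[condition] > 0 {
			multiplier = med / sums[condition]
		}
		normalized[condition] = make(map[string]float64, len(readouts))
		for readout, value := range readouts {
			normalized[condition][readout] = value * multiplier
		}
	}
	return normalized
}

func main() {
	// Abundances from the example table in the Nomenclature section.
	data := map[string]map[string]float64{
		"condition a": {"readout x": 5, "readout y": 15, "readout z": 47},
		"condition b": {"readout x": 8, "readout y": 13, "readout z": 35},
		"condition c": {"readout x": 15, "readout y": 5, "readout z": 93},
	}
	normalized := normalizeByTotal(data)
	for _, condition := range []string{"condition a", "condition b", "condition c"} {
		fmt.Printf("%s, readout z: %.2f\n", condition, normalized[condition]["readout z"])
	}
}
```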
Clustering options

Argument    Description
clustering  options: biclustering, hierarchical, none

Hierarchical

Argument            Description
clusteringMethod    linkage method; options: average, centroid, complete, mcquitty, median, single and ward
clusteringOptimize  optimize leaf order using the method of Bar-Joseph, et al.
distance            distance metric; options: binary, canberra, euclidean, jaccard, manhattan and maximum
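For reference, four of the distance metrics listed above have simple textbook definitions, sketched here in Go. These are standard formulas rather than the package's own code; the binary and jaccard metrics are omitted. The sketch assumes both vectors have the same length and, for canberra, skips terms whose denominator is zero.

```go
package main

import (
	"fmt"
	"math"
)

// euclidean returns the square root of the summed squared differences.
func euclidean(a, b []float64) float64 {
	sum := 0.0
	for i := range a {
		d := a[i] - b[i]
		sum += d * d
	}
	return math.Sqrt(sum)
}

// manhattan returns the sum of absolute differences.
func manhattan(a, b []float64) float64 {
	sum := 0.0
	for i := range a {
		sum += math.Abs(a[i] - b[i])
	}
	return sum
}

// maximum returns the largest absolute difference (Chebyshev distance).
func maximum(a, b []float64) float64 {
	largest := 0.0
	for i := range a {
		largest = math.Max(largest, math.Abs(a[i]-b[i]))
	}
	return largest
}

// canberra returns the sum of |a-b| / (|a|+|b|), skipping terms where the
// denominator is zero (an assumption of this sketch).
func canberra(a, b []float64) float64 {
	sum := 0.0
	for i := range a {
		denom := math.Abs(a[i]) + math.Abs(b[i])
		if denom == 0 {
			continue
		}
		sum += math.Abs(a[i]-b[i]) / denom
	}
	return sum
}

func main() {
	// Abundances of readouts x, y and z for conditions a and b in the
	// example table.
	a := []float64{5, 15, 47}
	b := []float64{8, 13, 35}
	fmt.Printf("euclidean: %.3f\n", euclidean(a, b))
	fmt.Printf("manhattan: %.3f\n", manhattan(a, b))
	fmt.Printf("maximum:   %.3f\n", maximum(a, b))
	fmt.Printf("canberra:  %.3f\n", canberra(a, b))
}
```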
Biclustering

Argument            Description
biclusteringApprox  perform approximate biclustering (faster)

No clustering

Argument             Description
conditionClustering  cluster by condition; options: none or conditions
conditionList        ordered and comma separated list of conditions
readoutClustering    cluster by readouts; options: none or readouts
readoutList          ordered and comma separated list of readouts

To create images with conditions in a specific order, set conditionClustering to "none" and specify a list of conditions in the order you wish them to appear. If conditionClustering is set to "conditions", conditions will be hierarchically clustered.

You can control what conditions and readouts are shown on the image by setting both conditionClustering and readoutClustering to "none" and supplying lists for each. Alternatively, if you only want to specify a list of conditions, set readoutClustering to "readouts" and all readouts will be included in the analysis and they will be hierarchically clustered.
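The effect of supplying an ordered list can be sketched as follows. The orderFromList function is hypothetical, and the sketch assumes that names in the list that do not occur in the data are dropped and that repeated names are used only once; the tool's behaviour in those cases is not described here.

```go
package main

import (
	"fmt"
	"strings"
)

// orderFromList parses a comma-separated list and returns the names that are
// present in the data, in the order requested. Surrounding whitespace is
// trimmed, and unknown or repeated names are skipped.
func orderFromList(list string, available []string) []string {
	present := make(map[string]bool, len(available))
	for _, name := range available {
		present[name] = true
	}

	ordered := []string{}
	for _, name := range strings.Split(list, ",") {
		name = strings.TrimSpace(name)
		if present[name] {
			ordered = append(ordered, name)
			present[name] = false // use each name at most once
		}
	}
	return ordered
}

func main() {
	conditions := []string{"condition a", "condition b", "condition c"}
	order := orderFromList("condition c, condition a", conditions)
	fmt.Println(strings.Join(order, " | "))
}
```

Only the listed conditions appear in the result, which is how a list both orders and restricts what is shown on the image.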

Tests

go test

Directories

Path Synopsis
cmd
pkg
color
Package color has functions for creating color scales and color transformations.
correlation
Package correlation calculates the correlation between matrices or vectors.
data/filter
Package filter filters data based on conditions, readouts and score.
data/parser
Package parser reads csv formatted files and returns specified columns.
data/transform
Package transform will adjust readout values to the user's requirements.
downsample
Package downsample will downsample a matrix using area averaging.
files
Package files has functions for interacting with the file system.
flags
Package flags handles flag parsing.
float
Package float defines functions for float transformations.
font
Package font loads and creates a context for rendering text in Arial.
fs
Package fs creates a filesystem to use (for easy mocking). This replaces most of the filesystem and io methods from os and io.
heatmap/dimensions
Package dimensions calculates the dimensions for a heat map.
interactive
Package interactive generates files for the interactive viewer.
interactive/heatmap
Package heatmap creates an interactive heatmap/dotplot file.
interactive/scatter
Package scatter creates an interactive scatter plot file.
log
Package log writes a message to a log file or console.
mapf
Package mapf contains functions for manipulating maps.
math
Package math defines common math operations.
matrix
Package matrix contains methods for operating on matrices.
matrix/convert
Package convert has functions for converting to matrix format.
matrix/frontend
Package frontend creates matrices from frontend format.
minimap
Package minimap creates a "small" png for dotplots and heatmaps.
normalize
Package normalize has functions for normalization.
parse
Package parse has functions for parsing data columns.
png
Package png has functions for generating and converting png images.
png/heatmap
Package heatmap draws a png heatmap.
png/uri
Package uri converts a png to a data uri.
read/csv
Package csv reads csv files.
slice
Package slice contains functions for manipulating slices.
sort
Package sort contains functions for sorting slices.
svg
Package svg has functions for generating and converting svg files.
svg/convert
Package convert turns an SVG into a PNG.
svg/dotplot
Package dotplot draws a svg dotplot.
svg/heatmap
Package heatmap draws a svg heatmap.
svg/scatter
Package scatter draws a svg scatter plot.
system
Package system has functions for profiling resource usage.
tools/analyze
Package analyze runs main analysis programs at ProHits-viz.
tools/analyze/arguments
Package arguments parses and validates command line arguments.
tools/analyze/cc
Package cc creates a scatter plot between two conditions.
tools/analyze/correlation
Package correlation calculates the correlation between conditions and readouts.
tools/analyze/dotplot
Package dotplot clusters conditions and readouts for visualization as a dotplot.
tools/analyze/dotplot/biclustering
Package biclustering clusters data using nestedcluster by H. Choi.
tools/analyze/dotplot/hierarchical
Package hierarchical clusters data for dot plots.
tools/analyze/dotplot/nocluster
Package nocluster generates dot plots based on requested condition and readout ordering.
tools/analyze/settings
Package settings logs analysis settings.
tools/analyze/specificity
Package specificity creates scatter plots for a condition showing readout specificity.
tools/analyze/validate/data
Package data ensures that the input file (passed as slice map) has no errors in formatting.
tools/analyze/validate/settings
Package settings validates user analysis settings.
tools/convert
Package convert takes a file from ProHits-viz V1 and converts it to V2 JSON.
tools/convert/settings
Package settings parses and/or infers settings for files to convert.
tools/export
Package export creates images in png or svg format.
tools/export/dotplot
Package dotplot exports images as a dotplot in svg or png format.
tools/export/heatmap
Package heatmap exports images in png or svg format.
tools/sync
Package sync will create a minimap from settings.
treeview
Package treeview exports a matrix and tree to java treeview format.
types
Package types contains type declarations.
