bigly

package module
v0.2.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Nov 9, 2016 License: Apache-2.0 Imports: 13 Imported by: 0

README

bigly: a pileup library that embraces the huge

Build Status ![GoDoc] (https://godoc.org/github.com/brentp/bigly?status.png)

bigly is an API and a command-line app. It is similar to samtools mpileup but it reports a huge number of additional variables that are useful for structural variant calling and visualization.

Each for each requested position, the struct below is filled with by the appropriate position in any overlapping alignment that meets the requested filters:

// Pile holds the information about a single base.
type Pile struct {
    Chrom                  string
    Pos                    int
    Depth                  int    // count of reads passing filters.
    RefBase                byte   // Reference base a this position.
    MisMatches             uint32 // number of mismatches .
    ProperPairs            int    // count of reads with paired flag
    SoftStarts             uint32 // counts of base preceding an 'S' cigar op
    SoftEnds               uint32 // ...  following ...
    HardStarts             uint32 // counts of base preceding an 'H' cigar op
    HardEnds               uint32
    InsertionStarts        uint32 // counts of base preceding an 'I' cigar op
    InsertionEnds          uint32
    Deletions              uint32  // counts of deletions 'D' at this base
    Heads                  uint32  // counts of starts of reads at this base
    Tails                  uint32  // counts of ends of reads at this base
    Splitters              uint32  // count of non-secondary reads with SA tags.
    Splitters1             uint32  // count of non-secondary reads with exactly 1 SA tag.
    Bases                  []byte  // All bases from reads covering this position
    Quals                  []uint8 // All quals from reads covering this position
    MeanInsertSizeLP       uint32  // Calcuated with left-most of pair
    MeanInsertSizeRM       uint32  // Calcuated with right-most of pair
    WeirdCount             uint32  // Calcuated with right-most of pair
    Discordant             uint32  // Number of reads with insert size > ConcordantCutoff
    DiscordantChrom        uint32  // Number of reads mapping on different chroms
    DiscordantChromEntropy float32 // high value means all discordants came from same chrom.
    GC65                   uint32  // GC content in a 65 base window centered on the current base.
    GC257                  uint32  
    Duplicity65            float32 // measure of lack of sequence entropy.
    Duplicity257           float32 // measure of lack of sequence entropy.
    SplitterPositions      []int   // Start, End Positions of splitters for reads overlapping this base.
    SplitterStrings        []string
}

The program in cmd/bigly/main.go is distributed as an example program of what one can do with this library--namely make an enhanced pileup in a few lines of code.

Usage

At this time, the usage of the example program is very simple. Default exclude flags are (sam.Unmapped | sam.QCFail | sam.Duplicate)

bigly $bam $chrom:$start-$end > o

if a reference is specified with -r it will report statistics about GC content in windows surrounding each base.

help:

bigly 0.2.0
usage: bigly [--minbasequality MINBASEQUALITY] [--minmappingquality MINMAPPINGQUALITY] [--excludeflag EXCLUDEFLAG] [--includeflag INCLUDEFLAG] [--mincliplength MINCLIPLENGTH] [--includebases] [--splitterverbosity SPLITTERVERBOSITY] [--reference REFERENCE] BAMPATH REGION

positional arguments:
  bampath
  region

options:
  --minbasequality MINBASEQUALITY, -q MINBASEQUALITY
                         base quality threshold [default: 10]
  --minmappingquality MINMAPPINGQUALITY, -Q MINMAPPINGQUALITY
                         mapping quality threshold [default: 5]
  --excludeflag EXCLUDEFLAG, -F EXCLUDEFLAG
  --includeflag INCLUDEFLAG, -f INCLUDEFLAG
  --mincliplength MINCLIPLENGTH, -c MINCLIPLENGTH
                         only count H/S clips of at least this length [default: 15]
  --includebases, -b     output each base and base quality score
  --splitterverbosity SPLITTERVERBOSITY, -s SPLITTERVERBOSITY
                         0-only count; 1:count and single most frequent; 2:all SAs; 3:dont shorten positions
  --reference REFERENCE, -r REFERENCE
                         optional path to reference fasta.
  --help, -h             display this help and exit
  --version              display version and exit

API

GoDoc Documentation is here: ![GoDoc] (https://godoc.org/github.com/brentp/bigly?status.png)

With a supporting library easing regional bam queries here: ![GoDoc] (https://godoc.org/github.com/brentp/bigly/bamat?status.png)

View the tests for more examples. bigly uses biogo library for bam access.

Plotting

python scripts/plotter.py bigly.output

will make a plot with python+matplotlib on output from bigly.

An example image looks like: Example

This is a homozygous deletion easily seen from the depth, but we can see that the deletion is nicely delineated by soft-clips and we see aberrant insert sizes bounding the deletion.

TODO

  • Track strand of bases.
  • Report the 5th and 95th percentile of insert size.
  • More efficient insert-size calc

Credits

Ryan Layer's svv was the inspiration for the plotter.

Obviously, samtools is the reference pileup implementation.

Documentation

Index

Constants

This section is empty.

Variables

View Source
var ConcordantCutoff = 10000

ConcordantCutoff excludes any pairs with a distance larger than this from the calculation of MeanInsertSize

View Source
var SkipBase byte = '.'

SkipBase holds the byte used for a Skip Operation.

Functions

func FirstMatch

func FirstMatch(c sam.Cigar) int

FirstMatch reports the first base in the read that matches the reference.

func ReadPieces

func ReadPieces(c sam.Cigar) []int

ReadPieces gives the offsets into the cigar that consome reference bases.

func RefPieces

func RefPieces(pos int, c sam.Cigar) []int

RefPieces returns an array of start, end pairs of places where the given Cigar covers the reference. E.g. 6, 8M2I4M1D3M will return {6, 18, 19, 22} meaning 6-18, 19-22.

Types

type Align

type Align struct {
	*sam.Record
	// track the into the Cigar Ops.
	CursorCigar int
	// track basepairs into the Reference
	CursorPos int
	// track basepairs into the read
	CursorRead int

	// Sequence holds the expanded sequence.
	Sequence []byte
	// contains filtered or unexported fields
}

Align is a sam.Record with a cursor to track position in the read and reference.

func (*Align) At

func (a *Align) At(pos0 int) *CigarSummary

At returns the CigarOp for a particular genomic position of the given read.

type CigarSummary

type CigarSummary struct {
	At        sam.CigarOp
	Left      sam.CigarOp
	Right     sam.CigarOp
	Qual      uint8
	Head      bool
	Tail      bool
	Base      byte
	Insertion []byte
}

CigarSummary is a summary of the context of a position for 1 alignment

type Iterator

type Iterator struct {
	// contains filtered or unexported fields
}

Iterator generates piles the order they were queried. Internally, it holds a cache of alignment objects that overlap the current position. It requests new ones and drops old ones as they start or end overlapping with the (updated) pile position.

func Up

func Up(bampath string, opts Options, pos Position, fai *faidx.Faidx) *Iterator

Up is the user-facing function that performs the pileup.

func (*Iterator) Close

func (it *Iterator) Close() error

Close the underlying bam iterator and bam file.

func (*Iterator) Error

func (it *Iterator) Error() error

Error returns any error encountered by the Iterator

func (*Iterator) Next

func (it *Iterator) Next() bool

Next returns true as long as any remaning pileups are available.

func (*Iterator) Pile

func (it *Iterator) Pile() *Pile

Pile returns the next pile from the iterator.

type Options

type Options struct {
	MinBaseQuality    uint8  `arg:"-q,help:base quality threshold"`
	MinMappingQuality uint8  `arg:"-Q,help:mapping quality threshold"`
	ExcludeFlag       uint16 `arg:"-F"`
	IncludeFlag       uint16 `arg:"-f"`
	MinClipLength     int    `arg:"-c,help:only count H/S clips of at least this length"`
	IncludeBases      bool   `arg:"-b,help:output each base and base quality score"`
	SplitterVerbosity int    `arg:"-s,help:0-only count; 1:count and single most frequent; 2:all SAs; 3:dont shorten positions"`
}

Options holds information about what is reported in the Pile

type Pile

type Pile struct {
	Chrom                  string
	Pos                    int
	Depth                  int    // count of reads passing filters.
	RefBase                byte   // Reference base a this position.
	MisMatches             uint32 // number of mismatches .
	ProperPairs            int    // count of reads with paired flag
	SoftStarts             uint32 // counts of base preceding an 'S' cigar op
	SoftEnds               uint32 // ...  following ...
	HardStarts             uint32 // counts of base preceding an 'H' cigar op
	HardEnds               uint32
	InsertionStarts        uint32 // counts of base preceding an 'I' cigar op
	InsertionEnds          uint32
	Deletions              uint32  // counts of deletions 'D' at this base
	Heads                  uint32  // counts of starts of reads at this base
	Tails                  uint32  // counts of ends of reads at this base
	Splitters              uint32  // count of non-secondary reads with SA tags.
	Splitters1             uint32  // count of non-secondary reads with exactly 1 SA tag.
	Bases                  []byte  // All bases from reads covering this position
	Quals                  []uint8 // All quals from reads covering this position
	MeanInsertSizeLP       uint32  // Calcuated with left-most of pair
	MeanInsertSizeRM       uint32  // Calcuated with right-most of pair
	WeirdCount             uint32  // Calcuated with right-most of pair
	Discordant             uint32  // Number of reads with insert size > ConcordantCutoff
	DiscordantChrom        uint32  // Number of reads mapping on different chroms
	DiscordantChromEntropy float32 // high value means all discordants came from same chrom.
	GC65                   uint32
	GC257                  uint32
	Duplicity65            float32 // measure of lack of sequence entropy.
	Duplicity257           float32 // measure of lack of sequence entropy.
	SplitterPositions      []int
	SplitterStrings        []string
}

Pile holds the information about a single base.

func (Pile) TabString

func (p Pile) TabString(o Options) string

TabString prints a tab-delimited version of the Pile

func (*Pile) Update

func (p *Pile) Update(o Options, alns []*Align)

Update the Pile with info from the Alignment if it meets the requirements in Options

type Position

type Position struct {
	Chrom string
	Start int
	End   int
}

Position indicates where the pileup is performed

type SA

type SA struct {
	Chrom  []byte
	Pos    int
	Strand int8
	Cigar  []byte
	MapQ   uint8
	NM     uint16

	Parsed sam.Cigar // parsed Cigar
	// contains filtered or unexported fields
}

SA holds the attributes in a SAM SA:Z tag.

func AsSAs

func AsSAs(r *sam.Record, cigs []byte) []*SA

func ParseSA

func ParseSA(sa []byte) SA

ParseSA returns an SA struct from the bytes

func ParseSAs

func ParseSAs(s []byte) []*SA

func RecordToSA

func RecordToSA(r *sam.Record) *SA

func (*SA) End

func (s *SA) End() int

End returns the end of the Cigar string

Directories

Path Synopsis
cmd

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL