bigly

package module
v0.0.0-...-d281f1e Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Dec 9, 2022 License: Apache-2.0 Imports: 13 Imported by: 0

README

bigly: a pileup library that embraces the huge

Build Status GoDoc

bigly is an API and a command-line app (binaries here). It is similar to samtools mpileup but it reports a huge number of additional variables that are useful for structural variant calling and visualization.

For each requested position, the struct below is filled by the appropriate position in any overlapping alignment that meets the requested filters:

// Pile holds the information about a single base.
type Pile struct {
	Chrom                  string
	Pos                    int
	Depth                  int    // count of reads passing filters.
	RefBase                byte   // Reference base a this position.
	MisMatches             uint32 // number of mismatches .
	ProperPairs            int    // count of reads with paired flag
	SoftStarts             uint32 // counts of base preceding an 'S' cigar op
	SoftEnds               uint32 // ...  following ...
	HardStarts             uint32 // counts of base preceding an 'H' cigar op
	HardEnds               uint32
	InsertionStarts        uint32 // counts of base preceding an 'I' cigar op
	InsertionEnds          uint32
	Deletions              uint32  // counts of deletions 'D' at this base
	Heads                  uint32  // counts of starts of reads at this base
	Tails                  uint32  // counts of ends of reads at this base
	Splitters              uint32  // count of non-secondary reads with SA tags.
	Splitters1             uint32  // count of non-secondary reads with exactly 1 SA tag.
	Bases                  []byte  // All bases from reads covering this position
	Quals                  []uint8 // All quals from reads covering this position
	MeanInsertSizeLP       uint32  // Calculated with left-most of pair
	MeanInsertSizeRM       uint32  // Calculated with right-most of pair
	OrientationPlusPlus    uint32  // Paired reads mapped in +/+ orientation
	OrientationMinusMinus  uint32  // Paired reads mapped in -/- orientation
	OrientationMinusPlus   uint32  // Paired reads mapped in -/+ orientation
    OrientationSplitter    uint32  // Count of +/- or -/+ splitters.
	Discordant             uint32  // Number of reads with insert size > ConcordantCutoff
	DiscordantChrom        uint32  // Number of reads mapping on different chroms
	DiscordantChromEntropy float32 // high value means all discordants came from same chrom.
	GC65                   uint32
	GC257                  uint32
	Duplicity65            float32 // measure of lack of sequence entropy.
	Duplicity257           float32 // measure of lack of sequence entropy.
	SplitterPositions      []int
	SplitterStrings        []string
}

The program in cmd/bigly/main.go is distributed as an example program of what one can do with this library--namely make an enhanced pileup in a few lines of code.

Usage

At this time, the usage of the example program is very simple. Default exclude flags are (sam.Unmapped | sam.QCFail | sam.Duplicate)

bigly $bam $chrom:$start-$end > o

if a reference is specified with -r it will report statistics about GC content in windows surrounding each base.

help:

bigly 0.2.0
usage: bigly [--minbasequality MINBASEQUALITY] [--minmappingquality MINMAPPINGQUALITY] [--excludeflag EXCLUDEFLAG] [--includeflag INCLUDEFLAG] [--mincliplength MINCLIPLENGTH] [--includebases] [--splitterverbosity SPLITTERVERBOSITY] [--reference REFERENCE] BAMPATH REGION

positional arguments:
  bampath
  region

options:
  --minbasequality MINBASEQUALITY, -q MINBASEQUALITY
                         base quality threshold [default: 10]
  --minmappingquality MINMAPPINGQUALITY, -Q MINMAPPINGQUALITY
                         mapping quality threshold [default: 5]
  --excludeflag EXCLUDEFLAG, -F EXCLUDEFLAG
  --includeflag INCLUDEFLAG, -f INCLUDEFLAG
  --mincliplength MINCLIPLENGTH, -c MINCLIPLENGTH
                         only count H/S clips of at least this length [default: 15]
  --includebases, -b     output each base and base quality score
  --splitterverbosity SPLITTERVERBOSITY, -s SPLITTERVERBOSITY
                         0-only count; 1:count and single most frequent; 2:all SAs; 3:dont shorten positions
  --reference REFERENCE, -r REFERENCE
                         optional path to reference fasta.
  --help, -h             display this help and exit
  --version              display version and exit

API

GoDoc Documentation is here: ![GoDoc] (https://godoc.org/github.com/brentp/bigly?status.png)

With a supporting library easing regional bam queries here: ![GoDoc] (https://godoc.org/github.com/brentp/bigly/bamat?status.png)

View the tests for more examples. bigly uses biogo library for bam access.

Plotting

python scripts/plotter.py bigly.output

will make a plot with python+matplotlib on output from bigly.

An example image looks like: Example

This is a homozygous deletion easily seen from the depth, but we can see that the deletion is nicely delineated by soft-clips and we see aberrant insert sizes bounding the deletion. In most libraries, we would also see splitters flanking the region.

TODO

  • Track strand of bases.
  • Report the 5th and 95th percentile of insert size.
  • More efficient insert-size calc

Credits

Ryan Layer's svv was the inspiration for the plotter.

Obviously, samtools is the reference pileup implementation.

Documentation

Index

Constants

This section is empty.

Variables

View Source
var SkipBase byte = '.'

SkipBase holds the byte used for a Skip Operation.

Functions

func FirstMatch

func FirstMatch(c sam.Cigar) int

FirstMatch reports the first base in the read that matches the reference.

func Mode

func Mode(a []int) (int, int)

Mode returns the most frequent value and the count of times it was seen.

func ReadPieces

func ReadPieces(c sam.Cigar) []int

ReadPieces gives the offsets into the cigar that consome reference bases.

func RefPieces

func RefPieces(pos int, c sam.Cigar) []int

RefPieces returns an array of start, end pairs of places where the given Cigar covers the reference. E.g. 6, 8M2I4M1D3M will return {6, 18, 19, 22} meaning 6-18, 19-22.

Types

type Align

type Align struct {
	*sam.Record
	// track the into the Cigar Ops.
	CursorCigar int
	// track basepairs into the Reference
	CursorPos int
	// track basepairs into the read
	CursorRead int

	// Sequence holds the expanded sequence.
	Sequence []byte
	// contains filtered or unexported fields
}

Align is a sam.Record with a cursor to track position in the read and reference.

func (*Align) At

func (a *Align) At(pos0 int) *CigarSummary

At returns the CigarOp for a particular genomic position of the given read.

type CigarSummary

type CigarSummary struct {
	At        sam.CigarOp
	Left      sam.CigarOp
	Right     sam.CigarOp
	Qual      uint8
	Head      bool
	Tail      bool
	Base      byte
	Insertion []byte
}

CigarSummary is a summary of the context of a position for 1 alignment

type Iterator

type Iterator struct {
	// contains filtered or unexported fields
}

Iterator generates piles the order they were queried. Internally, it holds a cache of alignment objects that overlap the current position. It requests new ones and drops old ones as they start or end overlapping with the (updated) pile position.

func AtUp

func AtUp(b *bamat.BamAt, opts Options, pos Position, fai *faidx.Faidx) *Iterator

AtUp performs the pileup given a BamAt object.

func Up

func Up(bampath string, opts Options, pos Position, fai *faidx.Faidx) *Iterator

Up performs the pileup given a path to a bam

func (*Iterator) Close

func (it *Iterator) Close() error

Close the underlying bam iterator and bam file.

func (*Iterator) Error

func (it *Iterator) Error() error

Error returns any error encountered by the Iterator

func (*Iterator) Next

func (it *Iterator) Next() bool

Next returns true as long as any remaning pileups are available.

func (*Iterator) Pile

func (it *Iterator) Pile() *Pile

Pile returns the next pile from the iterator.

type Options

type Options struct {
	MinBaseQuality    uint8  `arg:"-q,help:base quality threshold"`
	MinMappingQuality uint8  `arg:"-Q,help:mapping quality threshold"`
	ExcludeFlag       uint16 `arg:"-F"`
	IncludeFlag       uint16 `arg:"-f"`
	MinClipLength     int    `arg:"-c,help:only count H/S clips of at least this length"`
	IncludeBases      bool   `arg:"-b,help:output each base and base quality score"`
	SplitterVerbosity int    `arg:"-s,help:0-only count; 1:count and single most frequent; 2:all SAs; 3:dont shorten positions"`
	ConcordantCutoff  int    `arg:"-o,help:distance beyond which mates are called discordant"`
}

Options holds information about what is reported in the Pile

type Pile

type Pile struct {
	Chrom                 string
	Pos                   int
	Depth                 int    // count of reads passing filters.
	RefBase               byte   // Reference base a this position.
	MisMatches            uint32 // number of mismatches .
	ProperPairs           int    // count of reads with paired flag
	SoftStarts            uint32 // counts of base preceding an 'S' cigar op
	SoftEnds              uint32 // ...  following ...
	HardStarts            uint32 // counts of base preceding an 'H' cigar op
	HardEnds              uint32
	InsertionStarts       uint32 // counts of base preceding an 'I' cigar op
	InsertionEnds         uint32
	Deletions             uint32  // counts of deletions 'D' at this base
	Skips                 uint32  // counts of skips 'N' at this base
	Heads                 uint32  // counts of starts of reads at this base
	Tails                 uint32  // counts of ends of reads at this base
	Splitters             uint32  // count of non-secondary reads with SA tags.
	Splitters1            uint32  // count of non-secondary reads with exactly 1 SA tag.
	Bases                 []byte  // All bases from reads covering this position
	Quals                 []uint8 // All quals from reads covering this position
	InsertSizeLPs         []int32 // All insert size for left-most of pair
	InsertSizeRMs         []int32 // ...                 right-most of pair
	OrientationPlusPlus   uint32  // Paired reads mapped in +/+ orientation
	OrientationMinusMinus uint32  // Paired reads mapped in -/- orientation
	OrientationMinusPlus  uint32  // Paired reads mapped in -/+ orientation
	// count of reads where splitters were in different orientation than read.
	// only set if SplitterVerbosity > 1
	OrientationSplitter uint32
	/*
		TODO: figure out how to do this because these are excluded by ExcludeFlag.
			   could try to calculate and it will ony be non-zero if these are not excluded.C
		Duplicates             uint32  // reads counted as duplicates
		Supplementary          uint32  // reads counted as supplementary
	*/
	Discordant             uint32  // Number of reads with insert size > ConcordantCutoff
	DiscordantChrom        uint32  // Number of reads mapping on different chroms
	DiscordantChromEntropy float32 // high value means all discordants came from same chrom.
	GC65                   uint32
	GC257                  uint32
	Duplicity65            float32 // measure of lack of sequence entropy.
	Duplicity257           float32 // measure of lack of sequence entropy.
	SplitterPositions      []Position
}

Pile holds the information about a single base.

func (Pile) TabString

func (p Pile) TabString(o Options) string

TabString prints a tab-delimited version of the Pile

func (*Pile) Update

func (p *Pile) Update(o Options, alns []*Align)

Update the Pile with info from the Alignment if it meets the requirements in Options

type Position

type Position struct {
	Chrom  string
	Start  int
	End    int
	Strand bool // + == True
}

Position is a chrom, start, end (0-based, half-open)

func (Position) String

func (p Position) String() string

type SA

type SA struct {
	Chrom  []byte
	Pos    int
	Strand bool
	Cigar  []byte
	MapQ   uint8
	NM     uint16

	Parsed sam.Cigar // parsed Cigar
	// contains filtered or unexported fields
}

SA holds the attributes in a SAM SA:Z tag.

func AsSAs

func AsSAs(r *sam.Record, cigs []byte) []*SA

func ParseSA

func ParseSA(sa []byte) SA

ParseSA returns an SA struct from the bytes

func ParseSAs

func ParseSAs(s []byte) []*SA

func RecordToSA

func RecordToSA(r *sam.Record) *SA

func (*SA) End

func (s *SA) End() int

End returns the end of the Cigar string

Directories

Path Synopsis
cmd

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL