faidx

package module
v0.0.0-...-c39eb85 Latest
Published: Mar 1, 2020 License: MIT Imports: 8 Imported by: 12

README

faidx reader for Go using biogo's io/seqio/fai package

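// check is a placeholder for your own error handling.
f, err := faidx.New("some.fasta")
check(err)
defer f.Close()

seq, err := f.Get("chr1", 1234, 4444)
check(err)

st, err := f.Stats("chr1", 1234, 4444)
check(err)

// fractions of GC content, CpG content and masked (lower-case) bases
fmt.Println(seq, st.GC, st.CpG, st.Masked)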

Documentation

Index

Constants

This section is empty.

Variables

var ErrorNoFai = errors.New("no fai for fasta")

ErrorNoFai is returned if the fasta doesn't have an associated .fai index file.

Functions

This section is empty.

Types

type FaPos

type FaPos struct {
	Chrom string
	Start int
	End   int

	As uint32
	Cs uint32
	Gs uint32
	Ts uint32
	// contains filtered or unexported fields
}

FaPos lets the user specify a position; internally, faidx stores information in it to speed up GC calculations for adjacent regions. This is useful when sweeping along the genome one base at a time while computing the GC content of a window around each base.

func (*FaPos) Duplicity

func (p *FaPos) Duplicity() float32

Duplicity returns a scaled entropy value of the counts of each base in p. Values approaching 1 indicate repetitive sequence; values close to 0 indicate a more even distribution among the bases. This is likely to be called after `Q()`, which populates the base counts.
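
The exact scaling used by Duplicity is not given here. As a rough sketch of one way to get a value with the described behaviour, the hypothetical scaledEntropy function below computes 1 - H/2, where H is the Shannon entropy in bits of the four base counts (at most 2 bits for four symbols). This formula is an assumption for illustration, not necessarily the one faidx uses.

```go
package main

import (
	"fmt"
	"math"
)

// scaledEntropy returns 1 - H/2, where H is the Shannon entropy (in bits)
// of the four base counts. Assumption: this scaling is illustrative only;
// the formula faidx uses is not stated in its documentation.
func scaledEntropy(as, cs, gs, ts uint32) float32 {
	total := float64(as) + float64(cs) + float64(gs) + float64(ts)
	if total == 0 {
		return 0
	}
	var h float64
	for _, n := range []uint32{as, cs, gs, ts} {
		if n == 0 {
			continue // a zero count contributes nothing to the entropy
		}
		p := float64(n) / total
		h -= p * math.Log2(p)
	}
	return float32(1 - h/2)
}

func main() {
	fmt.Println(scaledEntropy(10, 0, 0, 0)) // one base only: most repetitive
	fmt.Println(scaledEntropy(5, 5, 5, 5))  // even mix: least repetitive
}
```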

type Faidx

type Faidx struct {
	Index fai.Index
	// contains filtered or unexported fields
}

Faidx is used to provide random access to the sequence data.

func New

func New(fasta string) (*Faidx, error)

New returns a faidx object from a fasta file that has an existing index.

func (*Faidx) At

func (f *Faidx) At(chrom string, pos int) (byte, error)

At takes a single point and returns the single base.

func (*Faidx) Close

func (f *Faidx) Close()

Close closes the associated Reader.

func (*Faidx) Get

func (f *Faidx) Get(chrom string, start int, end int) (string, error)

Get takes a position and returns the string sequence. Start and end are 0-based.

func (*Faidx) GetRaw

func (f *Faidx) GetRaw(chrom string, start int, end int) ([]byte, error)

GetRaw takes a position and returns the raw bytes of the sequence, including the newlines. Start and end are 0-based.
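
To make the relationship between raw bytes and sequence coordinates concrete, here is a minimal sketch of the standard .fai arithmetic: a 0-based position maps to a byte offset using the record's offset, bases per line and bytes per line. The record type and both helper functions are hypothetical names, and the half-open [start, end) slice is this sketch's convention; whether faidx treats end as inclusive is not stated above.

```go
package main

import (
	"bytes"
	"fmt"
)

// record mirrors the numeric columns of one .fai line (hypothetical type).
type record struct {
	Length    int // number of bases in the sequence
	Offset    int // byte offset of the first base in the file
	LineBases int // bases per line
	LineWidth int // bytes per line, including the newline
}

// byteOffset maps a 0-based base position to a byte offset in the file.
func byteOffset(r record, pos int) int {
	return r.Offset + pos/r.LineBases*r.LineWidth + pos%r.LineBases
}

// stripNewlines removes line terminators from a raw slice of the file.
func stripNewlines(raw []byte) string {
	raw = bytes.ReplaceAll(raw, []byte("\r"), nil)
	return string(bytes.ReplaceAll(raw, []byte("\n"), nil))
}

func main() {
	data := []byte(">chr1\nACGT\nACGT\nAC\n")
	r := record{Length: 10, Offset: 6, LineBases: 4, LineWidth: 5}

	// Raw bytes for bases [2, 6) still contain the embedded newline.
	raw := data[byteOffset(r, 2):byteOffset(r, 6)]
	fmt.Printf("%q\n", raw)
	fmt.Println(stripNewlines(raw))
}
```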

func (*Faidx) Q

func (f *Faidx) Q(pos *FaPos) (uint32, error)

Q returns only the count of GCs; it can do the calculation quickly for repeated calls marching to higher bases along the genome. It also updates the number of As, Cs, Ts, and Gs in the FaPos so the user can then calculate entropy or use Duplicity above.
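
The speed-up for adjacent windows can be pictured as a sliding-window update: when the window moves one base to the right, only the base that leaves and the base that enters need to be examined. The sketch below shows that general idea on an in-memory byte slice. The window type and its methods are hypothetical, the window is half-open [start, end), and this is not a copy of faidx's internals.

```go
package main

import "fmt"

// isGC reports whether b is a G or C in either case.
func isGC(b byte) bool {
	return b == 'G' || b == 'C' || b == 'g' || b == 'c'
}

// window tracks the GC count of seq[start:end] (hypothetical type).
type window struct {
	start, end int
	gc         uint32
}

// newWindow counts GC bases from scratch, in time proportional to the width.
func newWindow(seq []byte, start, end int) window {
	w := window{start: start, end: end}
	for _, b := range seq[start:end] {
		if isGC(b) {
			w.gc++
		}
	}
	return w
}

// slide moves the window one base to the right in constant time.
// It does nothing once the window has reached the end of seq.
func (w *window) slide(seq []byte) {
	if w.end >= len(seq) {
		return
	}
	if isGC(seq[w.start]) {
		w.gc-- // base leaving on the left
	}
	if isGC(seq[w.end]) {
		w.gc++ // base entering on the right
	}
	w.start++
	w.end++
}

func main() {
	seq := []byte("ACGTGGCA")
	w := newWindow(seq, 0, 4)
	fmt.Println(w.start, w.end, w.gc)
	w.slide(seq)
	fmt.Println(w.start, w.end, w.gc)
}
```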

func (*Faidx) Stats

func (f *Faidx) Stats(chrom string, start int, end int) (Stats, error)

Stats returns the proportion of GCs (GgCc), the CpG content (Cc followed by Gg) and the proportion of lower-case bases (masked). CpG will be 1.0 if the requested sequence is CGC and the base that follows is G.
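
As a rough illustration of the three quantities, the sketch below computes the GC and masked fractions and a raw count of CpG dinucleotides for an in-memory string. The seqStats type and computeStats function are hypothetical. Because the normalisation faidx applies to CpG is not spelled out above, the sketch reports a count rather than a fraction, and unlike the CGC example it does not look at the base following the requested range.

```go
package main

import "fmt"

// seqStats holds illustrative per-sequence statistics (hypothetical type).
type seqStats struct {
	GC     float64 // fraction of G/C bases, either case
	Masked float64 // fraction of lower-case bases
	CpG    int     // raw count of C immediately followed by G, either case
}

// computeStats scans seq once and tallies all three quantities.
func computeStats(seq string) seqStats {
	if len(seq) == 0 {
		return seqStats{}
	}
	var gc, masked, cpg int
	for i := 0; i < len(seq); i++ {
		b := seq[i]
		switch b {
		case 'G', 'C', 'g', 'c':
			gc++
		}
		if b >= 'a' && b <= 'z' {
			masked++
		}
		if (b == 'C' || b == 'c') && i+1 < len(seq) &&
			(seq[i+1] == 'G' || seq[i+1] == 'g') {
			cpg++
		}
	}
	n := float64(len(seq))
	return seqStats{GC: float64(gc) / n, Masked: float64(masked) / n, CpG: cpg}
}

func main() {
	st := computeStats("ACgtCGAA")
	fmt.Println(st.GC, st.Masked, st.CpG)
}
```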

type Stats

type Stats struct {
	// GC content fraction
	GC float64
	// CpG content fraction
	CpG float64
	// masked (lower-case) fraction
	Masked float64
}

Stats holds sequence information.
