package seq

Version: v0.13.3 (latest) | Published: Mar 11, 2024 | License: MIT | Imports: 10 | Imported by: 17

Repository

github.com/shenwei356/bio

README ¶

seq

This package defines the Seq and Alphabet types and provides basic sequence operations, such as validating DNA/RNA/protein sequences, computing reverse complements, and translating DNA/RNA to protein.

This package was inspired by biogo.

Documentation ¶

Overview ¶

Package seq defines the Seq type and provides basic sequence operations, such as validating DNA/RNA/protein sequences and computing reverse complements.

This package was inspired by [biogo](https://code.google.com/p/biogo/source/browse/#git%2Falphabet).

IUPAC nucleotide code: ACGTURYSWKMBDHVN

http://droog.gs.washington.edu/parc/images/iupac.html

Code	Base(s)	Complement
A	A	T
C	C	G
G	G	C
T/U	T	A

M	A/C	K
R	A/G	Y
W	A/T	W
S	C/G	S
Y	C/T	R
K	G/T	M

V	A/C/G	B
H	A/C/T	D
D	A/G/T	H
B	C/G/T	V

X/N	A/C/G/T	X
.	not A/C/G/T
- or space	gap

IUPAC amino acid code

Code	Abbr.	Amino acid
A	Ala	Alanine
B	Asx	Aspartic acid or asparagine [2]
C	Cys	Cysteine
D	Asp	Aspartic acid
E	Glu	Glutamic acid
F	Phe	Phenylalanine
G	Gly	Glycine
H	His	Histidine
I	Ile	Isoleucine
J	Xle	Isoleucine or leucine [4]
K	Lys	Lysine
L	Leu	Leucine
M	Met	Methionine
N	Asn	Asparagine
O	Pyl	Pyrrolysine [6]
P	Pro	Proline
Q	Gln	Glutamine
R	Arg	Arginine
S	Ser	Serine
T	Thr	Threonine
U	Sec	Selenocysteine [5,6]
V	Val	Valine
W	Trp	Tryptophan
Y	Tyr	Tyrosine
Z	Glx	Glutamine or glutamic acid [2]

X		Unknown amino acid
.		Gap
*		Stop (end)

Reference:

https://www.ncbi.nlm.nih.gov/Taxonomy/taxonomyhome.html/index.cgi?chapter=tgencodes

Index ¶

Constants
Variables
func AmbBase2Bases0(b byte) ([]byte, error)
func Bases2AmbBase(bs []byte) (byte, error)
func Codes2AmbCode(codes []int) (int, error)
func Degenerate2Seqs(s []byte) (dseqs [][]byte, err error)
func Phred2Solexa(q float64) (float64, error)
func QualityConvert(from, to QualityEncoding, quality []byte, force bool) ([]byte, error)
func QualityValue(encoding QualityEncoding, quality []byte) ([]int, error)
func Solexa2Phred(q float64) (float64, error)
func SubLocation(length, start, end int) (int, int, bool)
type Alphabet
- func GuessAlphabet(seqs []byte) *Alphabet
- func GuessAlphabetLessConservatively(seqs []byte) *Alphabet
- func NewAlphabet(t string, isUnlimit bool, letters []byte, pairs []byte, gap []byte, ...) (*Alphabet, error)
- func (a *Alphabet) AllLetters() []byte
- func (a *Alphabet) AmbiguousLetters() []byte
- func (a *Alphabet) Clone() *Alphabet
- func (a *Alphabet) Gaps() []byte
- func (a *Alphabet) IsValid(s []byte) error
- func (a *Alphabet) IsValidLetter(b byte) bool
- func (a *Alphabet) Letters() []byte
- func (a *Alphabet) PairLetter(b byte) (byte, error)
- func (a *Alphabet) String() string
- func (a *Alphabet) Type() string
type CodonTable
- func NewCodonTable(id int, name string) *CodonTable
- func (t *CodonTable) Clone() CodonTable
- func (t *CodonTable) Get(codon []byte, allowUnknownCodon bool) (byte, error)
- func (t *CodonTable) Get2(codon string, allowUnknownCodon bool) (byte, error)
- func (t *CodonTable) Set(codon []byte, aminoAcid byte) error
- func (t *CodonTable) Set2(codon string, aminoAcid byte) error
- func (t CodonTable) String() string
- func (t CodonTable) StringWithAmbiguousCodons() string
- func (t *CodonTable) Translate(sequence []byte, frame int, trim bool, clean bool, allowUnknownCodon bool, ...) ([]byte, error)
type QualityEncoding
- func GuessQualityEncoding(quality []byte) []QualityEncoding
- func (qe QualityEncoding) IsSolexa() bool
- func (qe QualityEncoding) Offset() int
- func (qe QualityEncoding) QualityRange() []int
- func (qe QualityEncoding) String() string
type Seq
- func NewSeq(t *Alphabet, s []byte) (*Seq, error)
- func NewSeqWithQual(t *Alphabet, s []byte, q []byte) (*Seq, error)
- func NewSeqWithQualWithoutValidation(t *Alphabet, s []byte, q []byte) (*Seq, error)
- func NewSeqWithoutValidation(t *Alphabet, s []byte) (*Seq, error)
- func (seq *Seq) AvgQual(asciiBase int) float64
- func (seq *Seq) BaseContent(list string) float64
- func (seq *Seq) BaseContentCaseSensitive(list string) float64
- func (seq *Seq) BaseCount(list string) int
- func (seq *Seq) BaseCountCaseSensitive(list string) int
- func (seq *Seq) Bases(gapLetters string) int
- func (seq *Seq) Clone() *Seq
- func (seq *Seq) Clone2() *Seq
- func (seq *Seq) Complement() *Seq
- func (seq *Seq) ComplementInplace() *Seq
- func (seq *Seq) Degenerate2Regexp() string
- func (seq *Seq) FormatSeq(width int) []byte
- func (seq *Seq) GC() float64
- func (seq *Seq) Length() int
- func (seq *Seq) ParseQual(asciiBase int)
- func (seq *Seq) RemoveGaps(letters string) *Seq
- func (seq *Seq) RemoveGapsInplace(letters string) *Seq
- func (seq *Seq) RevCom() *Seq
- func (seq *Seq) RevComInplace() *Seq
- func (seq *Seq) Reverse() *Seq
- func (seq *Seq) ReverseInplace() *Seq
- func (seq *Seq) Slider(window int, step int, circular bool, greedy bool) func() (*Seq, bool)
- func (seq *Seq) String() string
- func (seq *Seq) SubSeq(start int, end int) *Seq
- func (seq *Seq) SubSeqInplace(start int, end int) *Seq
- func (seq *Seq) Translate(transl_table int, frame int, trim bool, clean bool, allowUnknownCodon bool, ...) (*Seq, error)

Constants ¶

const NQualityEncoding int = 6

NQualityEncoding is the number of known quality encodings plus one (for Unknown): 5 + 1 = 6.

Variables ¶

var AlphabetGuessSeqLengthThreshold = 10000

AlphabetGuessSeqLengthThreshold is the length of the sequence prefix of the first FASTA record that FastaRecord uses to guess the sequence type. A value of 0 means the whole sequence is used.

var AmbBase2Bases = map[byte][]byte{
	'A': {'A'},
	'a': {'A'},
	'C': {'C'},
	'c': {'C'},
	'G': {'G'},
	'g': {'G'},
	'T': {'T'},
	't': {'T'},
	'U': {'T'},
	'u': {'T'},

	'M': {'A', 'C', 'M'},
	'm': {'A', 'C', 'M'},
	'R': {'A', 'G', 'R'},
	'r': {'A', 'G', 'R'},
	'W': {'A', 'T', 'W'},
	'w': {'A', 'T', 'W'},
	'S': {'C', 'G', 'S'},
	's': {'C', 'G', 'S'},
	'Y': {'C', 'T', 'Y'},
	'y': {'C', 'T', 'Y'},
	'K': {'G', 'T', 'K'},
	'k': {'G', 'T', 'K'},

	'V': {'A', 'C', 'G', 'M', 'R', 'S', 'V'},
	'v': {'A', 'C', 'G', 'M', 'R', 'S', 'V'},
	'H': {'A', 'C', 'T', 'M', 'W', 'Y', 'H'},
	'h': {'A', 'C', 'T', 'M', 'W', 'Y', 'H'},
	'D': {'A', 'G', 'T', 'R', 'W', 'K', 'D'},
	'd': {'A', 'G', 'T', 'R', 'W', 'K', 'D'},
	'B': {'C', 'G', 'T', 'S', 'Y', 'K', 'B'},
	'b': {'C', 'G', 'T', 'S', 'Y', 'K', 'B'},

	'N': {'A', 'C', 'M', 'G', 'R', 'S', 'V', 'T', 'W', 'Y', 'H', 'K', 'D', 'B', 'N'},
	'n': {'A', 'C', 'M', 'G', 'R', 'S', 'V', 'T', 'W', 'Y', 'H', 'K', 'D', 'B', 'N'},
}

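AmbBase2Bases maps each base, including ambiguous ones, to the bases it represents. Each entry also lists the ambiguity codes it subsumes, including itself (e.g. 'V' maps to A, C, G, M, R, S and V). Looking up this map is faster than calling AmbBase2Bases0.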

var AmbCodes2Codes = map[int][]int{
	1: {1},
	2: {2},
	4: {4},
	8: {8},

	3:  {1, 2, 3},
	5:  {1, 4, 5},
	9:  {1, 8, 9},
	6:  {2, 4, 6},
	10: {2, 8, 10},
	12: {4, 8, 12},

	7:  {1, 2, 4, 3, 5, 6, 7},
	11: {1, 2, 8, 3, 9, 10, 11},
	13: {1, 4, 8, 5, 9, 12, 13},
	14: {2, 4, 8, 6, 10, 12, 14},

	15: {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15},
}

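AmbCodes2Codes is the integer-code version of AmbBase2Bases: A, C, G and T are encoded as 1, 2, 4 and 8, and an ambiguous base is the bitwise OR of the bases it represents (e.g. M = A|C = 3).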

var CodonTables map[int]*CodonTable

CodonTables contains the following NCBI codon tables, keyed by table ID:

1: The Standard Code
2: The Vertebrate Mitochondrial Code
3: The Yeast Mitochondrial Code
4: The Mold, Protozoan, and Coelenterate Mitochondrial Code and the Mycoplasma/Spiroplasma Code
5: The Invertebrate Mitochondrial Code
6: The Ciliate, Dasycladacean and Hexamita Nuclear Code
9: The Echinoderm and Flatworm Mitochondrial Code
10: The Euplotid Nuclear Code
11: The Bacterial, Archaeal and Plant Plastid Code
12: The Alternative Yeast Nuclear Code
13: The Ascidian Mitochondrial Code
14: The Alternative Flatworm Mitochondrial Code
16: Chlorophycean Mitochondrial Code
21: Trematode Mitochondrial Code
22: Scenedesmus obliquus Mitochondrial Code
23: Thraustochytrium Mitochondrial Code
24: Pterobranchia Mitochondrial Code
25: Candidate Division SR1 and Gracilibacteria Code
26: Pachysolen tannophilus Nuclear Code
27: Karyorelict Nuclear Code
28: Condylostoma Nuclear Code
29: Mesodinium Nuclear Code
30: Peritrich Nuclear Code
31: Blastocrithidia Nuclear Code

var ComplementSeqLenThreshold = 1000000

ComplementSeqLenThreshold is the sequence length above which sequences are complemented in parallel.

var ComplementThreads = runtime.NumCPU()

ComplementThreads is the number of threads used for parallel complementing.
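A minimal sketch of how a length threshold and a thread count like these are typically combined: short sequences are complemented serially, and long ones are split into chunks that are complemented concurrently by goroutines. The names `seqLenThreshold`, `threads` and `complementParallel`, the simplified ACGTN-only complement table, and the exact `>` comparison are assumptions for illustration, not the package's implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// Hypothetical stand-ins for ComplementSeqLenThreshold and ComplementThreads.
var (
	seqLenThreshold = 8
	threads         = 4
)

// comp is a simplified complement table covering only ACGTN (both cases).
var comp [256]byte

func init() {
	for i := range comp {
		comp[i] = byte(i) // unknown letters are left unchanged
	}
	pairs := map[byte]byte{'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C', 'N': 'N',
		'a': 't', 't': 'a', 'c': 'g', 'g': 'c', 'n': 'n'}
	for k, v := range pairs {
		comp[k] = v
	}
}

// complementParallel complements s in place, using goroutines only when
// the sequence is longer than seqLenThreshold.
func complementParallel(s []byte) {
	if len(s) <= seqLenThreshold || threads < 2 {
		for i, b := range s {
			s[i] = comp[b]
		}
		return
	}
	chunk := (len(s) + threads - 1) / threads
	var wg sync.WaitGroup
	for start := 0; start < len(s); start += chunk {
		end := start + chunk
		if end > len(s) {
			end = len(s)
		}
		wg.Add(1)
		// Each goroutine owns a disjoint sub-slice, so no locking is needed.
		go func(part []byte) {
			defer wg.Done()
			for i, b := range part {
				part[i] = comp[b]
			}
		}(s[start:end])
	}
	wg.Wait()
}

func main() {
	s := []byte("ACGTNacgtnACGT")
	complementParallel(s)
	fmt.Println(string(s)) // TGCANtgcanTGCA
}
```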

var DegenerateBaseMapNucl = map[byte]string{
	'A': "A",
	'T': "[TU]",
	'U': "[TU]",
	'C': "C",
	'G': "G",
	'R': "[AG]",
	'Y': "[CTU]",
	'M': "[AC]",
	'K': "[GTU]",
	'S': "[CG]",
	'W': "[ATU]",
	'H': "[ACTU]",
	'B': "[CGTU]",
	'V': "[ACG]",
	'D': "[AGTU]",
	'N': "[ACGTU]",
	'a': "a",
	't': "[tu]",
	'u': "[tu]",
	'c': "c",
	'g': "g",
	'r': "[ag]",
	'y': "[ctu]",
	'm': "[ac]",
	'k': "[gtu]",
	's': "[cg]",
	'w': "[atu]",
	'h': "[actu]",
	'b': "[cgtu]",
	'v': "[acg]",
	'd': "[agtu]",
	'n': "[acgtu]",
}

DegenerateBaseMapNucl maps each nucleic-acid degenerate base to a regular-expression character class; T and U are treated as equivalent.

var DegenerateBaseMapNucl2 = map[byte]string{
	'A': "A",
	'T': "TU",
	'U': "TU",
	'C': "C",
	'G': "G",
	'R': "AG",
	'Y': "CTU",
	'M': "AC",
	'K': "GTU",
	'S': "CG",
	'W': "ATU",
	'H': "ACTU",
	'B': "CGTU",
	'V': "ACG",
	'D': "AGTU",
	'N': "ACGTU",
	'a': "a",
	't': "tu",
	'u': "tu",
	'c': "c",
	'g': "g",
	'r': "ag",
	'y': "ctu",
	'm': "ac",
	'k': "gtu",
	's': "cg",
	'w': "atu",
	'h': "actu",
	'b': "cgtu",
	'v': "acg",
	'd': "agtu",
	'n': "acgtu",
}

DegenerateBaseMapNucl2 maps each nucleic-acid degenerate base to a string of all the bases it represents.

var DegenerateBaseMapProt = map[byte]string{
	'A': "A",
	'B': "[DN]",
	'C': "C",
	'D': "D",
	'E': "E",
	'F': "F",
	'G': "G",
	'H': "H",
	'I': "I",
	'J': "[IL]",
	'K': "K",
	'L': "L",
	'M': "M",
	'N': "N",
	'P': "P",
	'Q': "Q",
	'R': "R",
	'S': "S",
	'T': "T",
	'V': "V",
	'W': "W",
	'X': "[A-Z]",
	'Y': "Y",
	'Z': "[QE]",
	'a': "a",
	'b': "[dn]",
	'c': "c",
	'd': "d",
	'e': "e",
	'f': "f",
	'g': "g",
	'h': "h",
	'i': "i",
	'j': "[il]",
	'k': "k",
	'l': "l",
	'm': "m",
	'n': "n",
	'p': "p",
	'q': "q",
	'r': "r",
	's': "s",
	't': "t",
	'v': "v",
	'w': "w",
	'x': "[a-z]",
	'y': "y",
	'z': "[qe]",
}

DegenerateBaseMapProt maps each amino-acid letter, including the degenerate codes B, J, Z and X, to a regular expression.

var ErrInvalidCodon = errors.New("seq: invalid codon")

ErrInvalidCodon means that the codon length is not 3.

var ErrInvalidDNABase = errors.New("seq: invalid DNA base")

ErrInvalidDNABase means that an invalid DNA base was found.

var ErrInvalidPhredQuality = errors.New("seq: invalid Phred quality")

ErrInvalidPhredQuality is returned for a Phred quality less than 0.

var ErrInvalidSolexaQuality = errors.New("seq: invalid Solexa quality")

ErrInvalidSolexaQuality is returned for a Solexa quality less than -5.

var ErrUnknownCodon = errors.New("seq: unknown codon")

ErrUnknownCodon means that the codon is not in the codon table, or that it contains bases other than A, C, G, T and U.

var ErrUnknownQualityEncoding = errors.New("unknown quality encoding")

ErrUnknownQualityEncoding is returned for an unknown quality encoding type.

var NMostCommonThreshold = 2

NMostCommonThreshold is used when guessing Illumina 1.5+: the guess checks whether 'B' is among the N most common quality characters, where N is this threshold.

var QUAL_MAP [256]float64

var ValidSeqLengthThreshold = 10000

ValidSeqLengthThreshold is the sequence length above which sequences are validated in parallel.

var ValidSeqThreads = runtime.NumCPU()

ValidSeqThreads is the number of threads used for parallel sequence validation.

var ValidateSeq = true

ValidateSeq controls whether sequences are validated.

Functions ¶

func AmbBase2Bases0 ¶

func AmbBase2Bases0(b byte) ([]byte, error)

AmbBase2Bases0 converts an ambiguous base to the bases it represents. It is slower than looking up AmbBase2Bases.

func Bases2AmbBase ¶

func Bases2AmbBase(bs []byte) (byte, error)

Bases2AmbBase converts a list of bases to the ambiguous base that represents them.

func Codes2AmbCode ¶

func Codes2AmbCode(codes []int) (int, error)

Codes2AmbCode converts a list of base codes to the code of the corresponding ambiguous base.
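With A, C, G and T encoded as the bits 1, 2, 4 and 8 (the encoding visible in AmbCodes2Codes), both conversions reduce to a bitwise OR followed by a lookup of the IUPAC letter. A minimal sketch; `basesToAmb`, `codesToAmbCode` and their tables are hypothetical, accept upper-case ACGT only, and may differ from the package in error handling.

```go
package main

import (
	"errors"
	"fmt"
)

// Bit code of each unambiguous base (assumed encoding: A=1, C=2, G=4, T=8).
var baseCode = map[byte]int{'A': 1, 'C': 2, 'G': 4, 'T': 8}

// IUPAC letter for each non-zero OR-combination of base codes.
var codeLetter = [16]byte{0, 'A', 'C', 'M', 'G', 'R', 'S', 'V', 'T', 'W', 'Y', 'H', 'K', 'D', 'B', 'N'}

// codesToAmbCode ORs base codes together, in the spirit of Codes2AmbCode.
func codesToAmbCode(codes []int) (int, error) {
	amb := 0
	for _, c := range codes {
		if c < 1 || c > 15 {
			return 0, fmt.Errorf("invalid base code %d", c)
		}
		amb |= c
	}
	if amb == 0 {
		return 0, errors.New("no bases given")
	}
	return amb, nil
}

// basesToAmb maps a list of bases to its IUPAC ambiguity letter,
// in the spirit of Bases2AmbBase.
func basesToAmb(bases []byte) (byte, error) {
	codes := make([]int, 0, len(bases))
	for _, b := range bases {
		c, ok := baseCode[b]
		if !ok {
			return 0, fmt.Errorf("invalid base %q", b)
		}
		codes = append(codes, c)
	}
	amb, err := codesToAmbCode(codes)
	if err != nil {
		return 0, err
	}
	return codeLetter[amb], nil
}

func main() {
	b, _ := basesToAmb([]byte("AG"))
	fmt.Printf("%c\n", b) // R
	b, _ = basesToAmb([]byte("CGT"))
	fmt.Printf("%c\n", b) // B
}
```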

func Degenerate2Seqs ¶

func Degenerate2Seqs(s []byte) (dseqs [][]byte, err error)

Degenerate2Seqs expands a sequence containing degenerate bases into all possible sequences.
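Expansion is a Cartesian product over the bases each position can take. A minimal sketch, assuming a table like DegenerateBaseMapNucl2 but restricted to upper-case input and to A/C/G/T (U omitted); `expand` and `degen` are hypothetical names. The output grows multiplicatively: a sequence containing k N's yields 4^k sequences.

```go
package main

import "fmt"

// Bases represented by each upper-case IUPAC code (T only, U omitted).
var degen = map[byte]string{
	'A': "A", 'C': "C", 'G': "G", 'T': "T",
	'R': "AG", 'Y': "CT", 'M': "AC", 'K': "GT", 'S': "CG", 'W': "AT",
	'H': "ACT", 'B': "CGT", 'V': "ACG", 'D': "AGT", 'N': "ACGT",
}

// expand returns every concrete sequence matched by the degenerate sequence s.
func expand(s []byte) ([][]byte, error) {
	seqs := [][]byte{{}} // start from a single empty prefix
	for _, b := range s {
		choices, ok := degen[b]
		if !ok {
			return nil, fmt.Errorf("unknown base %q", b)
		}
		next := make([][]byte, 0, len(seqs)*len(choices))
		for _, prefix := range seqs {
			for i := 0; i < len(choices); i++ {
				// Copy the prefix so extended sequences never share memory.
				ext := make([]byte, len(prefix)+1)
				copy(ext, prefix)
				ext[len(prefix)] = choices[i]
				next = append(next, ext)
			}
		}
		seqs = next
	}
	return seqs, nil
}

func main() {
	seqs, _ := expand([]byte("ARY"))
	for _, s := range seqs {
		fmt.Println(string(s)) // AAC, AAT, AGC, AGT (one per line)
	}
}
```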

func Phred2Solexa ¶

func Phred2Solexa(q float64) (float64, error)

Phred2Solexa converts Phred quality to Solexa quality.

func QualityConvert ¶

func QualityConvert(from, to QualityEncoding, quality []byte, force bool) ([]byte, error)

QualityConvert converts quality from one encoding to another. If force is true, scores above 40 are truncated to 40 when converting from Illumina 1.8+ to Sanger.
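For Phred-scaled encodings, conversion is a change of ASCII offset: Sanger and Illumina 1.8+ use Phred+33, while Illumina 1.3+ and 1.5+ use Phred+64 (per the QualityEncoding descriptions). A minimal sketch; `reencode` is hypothetical and handles Phred-scaled data only. Solexa scores use a different scale and would additionally need a Phred/Solexa conversion, which this sketch does not perform.

```go
package main

import "fmt"

// reencode converts Phred-scaled quality characters from one ASCII offset
// to another (e.g. 64 -> 33). If maxScore >= 0, scores above it are
// truncated, mimicking the documented force option (40 for Sanger).
func reencode(quality []byte, fromOffset, toOffset, maxScore int) ([]byte, error) {
	out := make([]byte, len(quality))
	for i, c := range quality {
		score := int(c) - fromOffset
		if score < 0 {
			return nil, fmt.Errorf("character %q is below offset %d", c, fromOffset)
		}
		if maxScore >= 0 && score > maxScore {
			score = maxScore
		}
		v := score + toOffset
		if v > 126 { // beyond printable ASCII
			return nil, fmt.Errorf("score %d cannot be encoded with offset %d", score, toOffset)
		}
		out[i] = byte(v)
	}
	return out, nil
}

func main() {
	q, _ := reencode([]byte("hI@"), 64, 33, -1) // Illumina 1.3+ -> Sanger
	fmt.Println(string(q))                      // I*!
	q, _ = reencode([]byte("K"), 33, 33, 40)    // score 42, truncated to 40
	fmt.Println(string(q))                      // I
}
```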

func QualityValue ¶

func QualityValue(encoding QualityEncoding, quality []byte) ([]int, error)

QualityValue returns the quality values for the given encoding and quality string.

func Solexa2Phred ¶

func Solexa2Phred(q float64) (float64, error)

Solexa2Phred converts Solexa quality to Phred quality.

func SubLocation ¶

func SubLocation(length, start, end int) (int, int, bool)

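SubLocation implements the author's sub-location strategy: start and end, as well as the returned start and end, are all 1-based. Negative values count from the end of the sequence (-1 is the last position), and out-of-range values are clamped, as the examples below show.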

 1-based index    1  2  3  4  5  6  7  8  9 10
negative index  -10 -9 -8 -7 -6 -5 -4 -3 -2 -1
           seq    A  C  G  T  N  a  c  g  t  n
           1:1    A
           2:4       C  G  T
         -4:-2                      c  g  t
         -4:-1                      c  g  t  n
         -1:-1                               n
          2:-2       C  G  T  N  a  c  g  t
          1:-1    A  C  G  T  N  a  c  g  t  n
          1:12    A  C  G  T  N  a  c  g  t  n
        -12:-1    A  C  G  T  N  a  c  g  t  n
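Every row of the SubLocation examples can be reproduced with two rules: a negative index i refers to position length+i+1, and out-of-range positions are clamped to [1, length]. A minimal sketch; `subLocation` is hypothetical, and its handling of 0 and of inverted ranges is an assumption, since the examples do not cover those cases.

```go
package main

import "fmt"

// subLocation resolves 1-based (possibly negative) start/end positions
// against a sequence of the given length. ok is false when the resulting
// range is empty or inverted.
func subLocation(length, start, end int) (int, int, bool) {
	if length <= 0 {
		return 0, 0, false
	}
	if start < 0 {
		start = length + start + 1 // -1 -> length
	}
	if end < 0 {
		end = length + end + 1
	}
	if start < 1 {
		start = 1 // clamp, e.g. -12 on a length-10 sequence
	}
	if end > length {
		end = length // clamp, e.g. 12 on a length-10 sequence
	}
	if start > end {
		return start, end, false
	}
	return start, end, true
}

func main() {
	seq := "ACGTNacgtn"
	for _, r := range [][2]int{{1, 1}, {2, 4}, {-4, -2}, {2, -2}, {-12, -1}} {
		if s, e, ok := subLocation(len(seq), r[0], r[1]); ok {
			fmt.Printf("%d:%d -> %s\n", r[0], r[1], seq[s-1:e])
		}
	}
	// 1:1 -> A, 2:4 -> CGT, -4:-2 -> cgt, 2:-2 -> CGTNacgt, -12:-1 -> ACGTNacgtn
}
```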

Types ¶

type Alphabet ¶

type Alphabet struct {
	// contains filtered or unexported fields
}

An Alphabet can be user-defined. Note that the letters are case sensitive.

For example, DNA:

DNA, _ = NewAlphabet(
	"DNA",
	false,
	[]byte("acgtACGT"),
	[]byte("tgcaTGCA"),
	[]byte(" -"),
	[]byte("nN"))

var (
	DNA          *Alphabet
	DNAredundant *Alphabet
	RNA          *Alphabet
	RNAredundant *Alphabet
	Protein      *Alphabet
	Unlimit      *Alphabet
)

Six alphabets are predefined:

DNA           Deoxyribonucleotide code
DNAredundant  DNA + ambiguity codes
RNA           Ribonucleotide code
RNAredundant  RNA + ambiguity codes
Protein       Amino acid single-letter code
Unlimit       Self-defined, including all 26 English letters
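In the DNA example, pairs[i] appears to give the complement of letters[i] (a/t, c/g, and so on, in both cases). A minimal sketch of turning such parallel letters/pairs slices into a byte lookup table and using it for a reverse complement; `pairTable` and `revCom` are hypothetical, skip the validation NewAlphabet performs, and leave unlisted bytes such as gaps and N unchanged.

```go
package main

import "fmt"

// pairTable builds a byte lookup from parallel letters/pairs slices;
// bytes not listed (e.g. gaps, N) map to themselves.
func pairTable(letters, pairs []byte) ([256]byte, error) {
	var tbl [256]byte
	for i := range tbl {
		tbl[i] = byte(i)
	}
	if len(letters) != len(pairs) {
		return tbl, fmt.Errorf("letters and pairs differ in length: %d vs %d", len(letters), len(pairs))
	}
	for i, l := range letters {
		tbl[l] = pairs[i]
	}
	return tbl, nil
}

// revCom returns the reverse complement of s without modifying it.
func revCom(tbl *[256]byte, s []byte) []byte {
	out := make([]byte, len(s))
	for i, b := range s {
		out[len(s)-1-i] = tbl[b]
	}
	return out
}

func main() {
	tbl, err := pairTable([]byte("acgtACGT"), []byte("tgcaTGCA"))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(revCom(&tbl, []byte("ACGTn-a")))) // t-nACGT
}
```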

func GuessAlphabet ¶

func GuessAlphabet(seqs []byte) *Alphabet

GuessAlphabet guesses the alphabet of the given sequence.

func GuessAlphabetLessConservatively ¶

func GuessAlphabetLessConservatively(seqs []byte) *Alphabet

GuessAlphabetLessConservatively works like GuessAlphabet, but returns DNAredundant instead of DNA and RNAredundant instead of RNA.

func NewAlphabet ¶

func NewAlphabet(
	t string,
	isUnlimit bool,
	letters []byte,
	pairs []byte,
	gap []byte,
	ambiguous []byte,
) (*Alphabet, error)

NewAlphabet is the constructor for type Alphabet.

func (*Alphabet) AllLetters ¶

func (a *Alphabet) AllLetters() []byte

AllLetters returns all letters of the alphabet.

func (*Alphabet) AmbiguousLetters ¶

func (a *Alphabet) AmbiguousLetters() []byte

AmbiguousLetters returns the ambiguous letters of the alphabet.

func (*Alphabet) Clone ¶

func (a *Alphabet) Clone() *Alphabet

Clone returns a copy of the Alphabet.

func (*Alphabet) Gaps ¶

func (a *Alphabet) Gaps() []byte

Gaps returns the gap letters of the alphabet.

func (*Alphabet) IsValid ¶

func (a *Alphabet) IsValid(s []byte) error

IsValid checks whether every letter of s belongs to the alphabet and returns an error if not.

func (*Alphabet) IsValidLetter ¶

func (a *Alphabet) IsValidLetter(b byte) bool

IsValidLetter reports whether b is a valid letter of the alphabet.

func (*Alphabet) Letters ¶

func (a *Alphabet) Letters() []byte

Letters returns the letters of the alphabet.

func (*Alphabet) PairLetter ¶

func (a *Alphabet) PairLetter(b byte) (byte, error)

PairLetter returns the paired (complementary) letter of b.

func (*Alphabet) String ¶

func (a *Alphabet) String() string

String returns type of the alphabet

func (*Alphabet) Type ¶

func (a *Alphabet) Type() string

Type returns type of the alphabet

type CodonTable ¶

type CodonTable struct {
	ID         int
	Name       string
	InitCodons map[string]struct{} // upper-case of codon as string, map for fast querying
	StopCodons map[string]struct{} // upper-case of codon as string, map for fast querying
	// contains filtered or unexported fields
}

CodonTable represents a codon table

func NewCodonTable ¶

func NewCodonTable(id int, name string) *CodonTable

NewCodonTable constructs a CodonTable with the given ID and Name. The codon-to-amino-acid mappings must then be set by calling Set or Set2.

func (*CodonTable) Clone ¶

func (t *CodonTable) Clone() CodonTable

Clone returns a deep copy of the CodonTable.

func (*CodonTable) Get ¶

func (t *CodonTable) Get(codon []byte, allowUnknownCodon bool) (byte, error)

Get returns the amino acid of the codon ([]byte); the codon can be DNA or RNA. When allowUnknownCodon is true, codons that are not in the codon table are still translated to 'X', and "---" is translated to "-".

func (*CodonTable) Get2 ¶

func (t *CodonTable) Get2(codon string, allowUnknownCodon bool) (byte, error)

Get2 returns the amino acid of the codon (string); the codon can be DNA or RNA.

func (*CodonTable) Set ¶

func (t *CodonTable) Set(codon []byte, aminoAcid byte) error

Set sets the amino acid of a codon given as a byte slice.

func (*CodonTable) Set2 ¶

func (t *CodonTable) Set2(codon string, aminoAcid byte) error

Set2 sets the amino acid of a codon given as a string.

func (CodonTable) String ¶

func (t CodonTable) String() string

String returns details of the CodonTable.

func (CodonTable) StringWithAmbiguousCodons ¶

func (t CodonTable) StringWithAmbiguousCodons() string

StringWithAmbiguousCodons returns details of the CodonTable, including ambiguous codons.

func (*CodonTable) Translate ¶

func (t *CodonTable) Translate(sequence []byte, frame int, trim bool, clean bool, allowUnknownCodon bool, markInitCodonAsM bool) ([]byte, error)

Translate translates a DNA/RNA sequence to an amino acid sequence. Available frames: 1, 2, 3, -1, -2, -3. If trim is true, all 'X' and '*' characters are removed from the right end of the translation. If clean is true, all STOP codon positions are changed from '*' to 'X' (an unknown residue). If allowUnknownCodon is true, codons not in the codon table are translated to 'X'. If markInitCodonAsM is true, an initiation codon at the beginning is represented as 'M'.
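A minimal sketch of frame handling with the standard code (NCBI table 1), written as NCBI's compact 64-letter amino-acid string in TCAG codon order. `translate` is hypothetical and omits the trim, clean and markInitCodonAsM options. Treating frame -k as translation of the reverse complement starting at its k-th base is an assumption about the frame convention, and codons containing letters other than A/C/G/T/U become 'X', similar to allowUnknownCodon.

```go
package main

import (
	"fmt"
	"strings"
)

// NCBI standard code (table 1): amino acids for codons in TCAG order.
const stdAA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

var baseIdx = map[byte]int{'T': 0, 'U': 0, 'C': 1, 'A': 2, 'G': 3}

// compl swaps bases in a single pass (no chained replacements).
var compl = strings.NewReplacer("A", "T", "T", "A", "U", "A", "C", "G", "G", "C")

func revComp(s string) string {
	b := []byte(compl.Replace(s))
	for i, j := 0, len(b)-1; i < j; i, j = i+1, j-1 {
		b[i], b[j] = b[j], b[i]
	}
	return string(b)
}

// translate translates seq in frame 1, 2, 3, -1, -2 or -3.
// An incomplete trailing codon is ignored.
func translate(seq string, frame int) (string, error) {
	if frame == 0 || frame < -3 || frame > 3 {
		return "", fmt.Errorf("invalid frame %d", frame)
	}
	s := strings.ToUpper(seq)
	if frame < 0 {
		s = revComp(s)
		frame = -frame
	}
	var out strings.Builder
	for i := frame - 1; i+3 <= len(s); i += 3 {
		idx, ok := 0, true
		for j := 0; j < 3; j++ {
			v, found := baseIdx[s[i+j]]
			if !found {
				ok = false
				break
			}
			idx = idx*4 + v // base-4 index into stdAA
		}
		if ok {
			out.WriteByte(stdAA[idx])
		} else {
			out.WriteByte('X') // unknown codon
		}
	}
	return out.String(), nil
}

func main() {
	for _, f := range []int{1, 2, -1} {
		p, _ := translate("ATGGCCTAA", f)
		fmt.Println(f, p) // 1 MA*, then 2 WP, then -1 LGH
	}
}
```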

type QualityEncoding ¶

type QualityEncoding int

QualityEncoding is the type of quality encoding

const (
	// Unknown quality encoding
	Unknown QualityEncoding = iota
	// Sanger format can encode a Phred quality score from 0 to 93 using
	// ASCII 33 to 126 (although in raw read data the Phred quality score
	// rarely exceeds 60, higher scores are possible in assemblies or read maps).
	Sanger
	// Solexa/Illumina 1.0 format can encode a Solexa/Illumina quality score
	// from -5 to 62 using ASCII 59 to 126 (although in raw read data Solexa
	// scores from -5 to 40 only are expected).
	Solexa
	// Illumina1p3 means Illumina 1.3+.
	// Starting with Illumina 1.3 and before Illumina 1.8, the format
	// encoded a Phred quality score from 0 to 62 using ASCII 64 to 126
	// (although in raw read data Phred scores from 0 to 40 only are expected).
	Illumina1p3
	// Illumina1p5 means Illumina 1.5+.
	// Starting in Illumina 1.5 and before Illumina 1.8, the Phred scores
	// 0 to 2 have a slightly different meaning. The values 0 and 1 are
	// no longer used and the value 2, encoded by ASCII 66 "B", is used
	// also at the end of reads as a Read Segment Quality Control Indicator.
	Illumina1p5
	// Illumina1p8 means Illumina 1.8+.
	// Starting in Illumina 1.8, the quality scores have basically
	// returned to the use of the Sanger format (Phred+33)
	Illumina1p8
)

func GuessQualityEncoding ¶

func GuessQualityEncoding(quality []byte) []QualityEncoding

GuessQualityEncoding returns potential quality encodings.

func (QualityEncoding) IsSolexa ¶

func (qe QualityEncoding) IsSolexa() bool

IsSolexa tells whether the encoding is Solexa

func (QualityEncoding) Offset ¶

func (qe QualityEncoding) Offset() int

Offset returns the ASCII offset of the encoding.

func (QualityEncoding) QualityRange ¶

func (qe QualityEncoding) QualityRange() []int

QualityRange returns the typical quality range of the encoding.

func (QualityEncoding) String ¶

func (qe QualityEncoding) String() string

type Seq ¶

type Seq struct {
	Alphabet  *Alphabet
	Seq       []byte
	Qual      []byte
	QualValue []int
}

Seq represents a FASTA/FASTQ record.

func NewSeq ¶

func NewSeq(t *Alphabet, s []byte) (*Seq, error)

NewSeq is the constructor for type Seq.

func NewSeqWithQual ¶

func NewSeqWithQual(t *Alphabet, s []byte, q []byte) (*Seq, error)

NewSeqWithQual creates a Seq with quality scores, e.g. for a FASTQ record.

func NewSeqWithQualWithoutValidation ¶

func NewSeqWithQualWithoutValidation(t *Alphabet, s []byte, q []byte) (*Seq, error)

NewSeqWithQualWithoutValidation creates a Seq with quality scores without validating the sequence.

func NewSeqWithoutValidation ¶

func NewSeqWithoutValidation(t *Alphabet, s []byte) (*Seq, error)

NewSeqWithoutValidation creates a Seq without validating the sequence.

func (*Seq) AvgQual ¶

func (seq *Seq) AvgQual(asciiBase int) float64

AvgQual calculates average quality value.

func (*Seq) BaseContent ¶

func (seq *Seq) BaseContent(list string) float64

BaseContent returns the content of the given bases, ignoring case. For example:

seq.BaseContent("gc")

func (*Seq) BaseContentCaseSensitive ¶

func (seq *Seq) BaseContentCaseSensitive(list string) float64

BaseContentCaseSensitive returns base content for given case sensitive bases.

func (*Seq) BaseCount ¶

func (seq *Seq) BaseCount(list string) int

BaseCount counts the given bases, ignoring case.

func (*Seq) BaseCountCaseSensitive ¶

func (seq *Seq) BaseCountCaseSensitive(list string) int

BaseCountCaseSensitive counts the given bases; case is not ignored.

func (*Seq) Bases ¶ added in v0.1.1

func (seq *Seq) Bases(gapLetters string) int

Bases counts non-gap bases, treating the letters in gapLetters as gaps.

func (*Seq) Clone ¶

func (seq *Seq) Clone() *Seq

Clone returns a copy of the Seq.

func (*Seq) Clone2 ¶

func (seq *Seq) Clone2() *Seq

Clone2 clones the sequence data but not the alphabet.

func (*Seq) Complement ¶

func (seq *Seq) Complement() *Seq

Complement returns the complement of the sequence.

func (*Seq) ComplementInplace ¶

func (seq *Seq) ComplementInplace() *Seq

ComplementInplace complements the sequence in place and returns it.

func (*Seq) Degenerate2Regexp ¶

func (seq *Seq) Degenerate2Regexp() string

Degenerate2Regexp converts a sequence containing degenerate bases to a regular expression.

func (*Seq) FormatSeq ¶

func (seq *Seq) FormatSeq(width int) []byte

FormatSeq wraps the sequence into lines of the given width.
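A minimal sketch of line wrapping; `formatSeq` is hypothetical, and treating width <= 0 as "no wrapping" and adding no trailing newline are assumptions about FormatSeq's behavior.

```go
package main

import (
	"bytes"
	"fmt"
)

// formatSeq inserts a newline after every width bytes of s.
func formatSeq(s []byte, width int) []byte {
	if width <= 0 || len(s) <= width {
		return append([]byte(nil), s...) // copy; no wrapping needed
	}
	var buf bytes.Buffer
	buf.Grow(len(s) + len(s)/width)
	for i := 0; i < len(s); i += width {
		end := i + width
		if end > len(s) {
			end = len(s)
		}
		if i > 0 {
			buf.WriteByte('\n')
		}
		buf.Write(s[i:end])
	}
	return buf.Bytes()
}

func main() {
	fmt.Printf("%s\n", formatSeq([]byte("ACGTACGTAC"), 4))
	// ACGT
	// ACGT
	// AC
}
```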

func (*Seq) GC ¶

func (seq *Seq) GC() float64

GC returns the GC content

func (*Seq) Length ¶

func (seq *Seq) Length() int

Length returns the length of sequence

func (*Seq) ParseQual ¶

func (seq *Seq) ParseQual(asciiBase int)

ParseQual parses sequence quality, asciiBase = 33 for Phred+33.

func (*Seq) RemoveGaps ¶

func (seq *Seq) RemoveGaps(letters string) *Seq

RemoveGaps returns a new Seq with the given gap letters removed.

func (*Seq) RemoveGapsInplace ¶

func (seq *Seq) RemoveGapsInplace(letters string) *Seq

RemoveGapsInplace removes gaps in place

func (*Seq) RevCom ¶

func (seq *Seq) RevCom() *Seq

RevCom returns the reverse complement of the sequence.

func (*Seq) RevComInplace ¶

func (seq *Seq) RevComInplace() *Seq

RevComInplace reverse-complements the sequence in place.

func (*Seq) Reverse ¶

func (seq *Seq) Reverse() *Seq

Reverse returns the reversed sequence.

func (*Seq) ReverseInplace ¶

func (seq *Seq) ReverseInplace() *Seq

ReverseInplace reverses the sequence in place.

func (*Seq) Slider ¶

func (seq *Seq) Slider(window int, step int, circular bool, greedy bool) func() (*Seq, bool)

Slider returns a function for sliding a window along the sequence. circular is for circular genomes and overrides greedy. If circular is false and greedy is true, the last fragment, which is shorter than window, is also returned.
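A minimal sketch of a closure-based sliding window over a string, following the documented circular/greedy rules; `slider` is hypothetical. The wrap-around behavior (windows starting near the end are completed from the start of the sequence, and window must not exceed the sequence length) is an assumption about how the circular option works.

```go
package main

import "fmt"

// slider returns an iterator over windows of s. If circular is true,
// windows that run past the end wrap around to the start (requires
// window <= len(s)); otherwise, if greedy is true, one final fragment
// shorter than window is also returned.
func slider(s string, window, step int, circular, greedy bool) func() (string, bool) {
	n, start := len(s), 0
	return func() (string, bool) {
		if window <= 0 || step <= 0 || start >= n {
			return "", false
		}
		end := start + window
		var frag string
		switch {
		case end <= n:
			frag = s[start:end]
		case circular && window <= n:
			frag = s[start:] + s[:end-n] // wrap around the origin
		case greedy && !circular:
			frag = s[start:]
			start = n // stop after this shorter fragment
			return frag, true
		default:
			return "", false
		}
		start += step
		return frag, true
	}
}

func main() {
	next := slider("ACGTAC", 4, 2, false, true)
	for frag, ok := next(); ok; frag, ok = next() {
		fmt.Println(frag) // ACGT, GTAC, AC (one per line)
	}
}
```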

func (*Seq) String ¶

func (seq *Seq) String() string

func (*Seq) SubSeq ¶

func (seq *Seq) SubSeq(start int, end int) *Seq

SubSeq returns a subsequence. start and end are 1-based and may be negative.

Examples:

 1-based index    1  2  3  4  5  6  7  8  9 10
negative index  -10 -9 -8 -7 -6 -5 -4 -3 -2 -1
           seq    A  C  G  T  N  a  c  g  t  n
           1:1    A
           2:4       C  G  T
         -4:-2                      c  g  t
         -4:-1                      c  g  t  n
         -1:-1                               n
          2:-2       C  G  T  N  a  c  g  t
          1:-1    A  C  G  T  N  a  c  g  t  n
          1:12    A  C  G  T  N  a  c  g  t  n
        -12:-1    A  C  G  T  N  a  c  g  t  n

func (*Seq) SubSeqInplace ¶

func (seq *Seq) SubSeqInplace(start int, end int) *Seq

SubSeqInplace returns the subsequence, modifying the Seq in place.

func (*Seq) Translate ¶

func (seq *Seq) Translate(transl_table int, frame int, trim bool, clean bool, allowUnknownCodon bool, markInitCodonAsM bool) (*Seq, error)

Translate translates the RNA/DNA sequence to an amino acid sequence; transl_table is the NCBI codon table ID (see CodonTables). Available frames: 1, 2, 3, -1, -2, -3. If trim is true, all 'X' and '*' characters are removed from the right end of the translation. If clean is true, all STOP codon positions are changed from '*' to 'X' (an unknown residue). If allowUnknownCodon is true, codons not in the codon table are translated to 'X'. If markInitCodonAsM is true, an initiation codon at the beginning is represented as 'M'.
