Documentation ¶
Overview ¶
Package poly is a Go package for engineering organisms.
Poly can be used in two ways.
- As a Go library where you have finer control and can make magical things happen.
- As a command line utility where you can bash script your way to greatness and make DNA go brrrrrrrr.
Installation ¶
These instructions assume that you already have a working Go environment. If not, see:
https://golang.org/doc/install
Building Poly CLI and package from scratch:
git clone https://github.com/TimothyStiles/poly.git && cd poly && go build ./... && go install ./...
Installing latest release of poly as a go package:
go get github.com/TimothyStiles/poly
For CLI-only instructions, please check out: https://pkg.go.dev/github.com/TimothyStiles/poly/poly
Index ¶
- func AllVariantsIUPAC(seq string) ([]string, error)
- func BuildGbk(sequence Sequence) []byte
- func BuildGff(sequence Sequence) []byte
- func ComplementBase(basePair rune) rune
- func CreateBarcodes(length int, maxSubSequence int) []string
- func CreateBarcodesWithBannedSequences(length int, maxSubSequence int, bannedSequences []string, ...) []string
- func FindBsaI(sequence string, c chan DnaSuggestion, wg *sync.WaitGroup)
- func FindTypeIIS(sequence string, c chan DnaSuggestion, wg *sync.WaitGroup)
- func FixCds(sqlitePath string, sequence string, codontable CodonTable, ...) (string, error)
- func Hash(sequence string, sequenceType string, circular bool, doubleStranded bool) (string, error)
- func MarmurDoty(sequence string) float64
- func MeltingTemp(sequence string) float64
- func NucleobaseDeBruijnSequence(substringLength int) string
- func Optimize(aminoAcids string, codonTable CodonTable) (string, error)
- func ParseFASTAGz(r io.Reader, sequences chan<- Fasta)
- func RandomProteinSequence(length int, seed int64) (string, error)
- func ReadFASTAConcurrent(path string, sequences chan<- Fasta)
- func ReadFASTAGz(path string, sequences chan<- Fasta)
- func ReverseComplement(sequence string) string
- func RotateSequence(sequence string) string
- func SantaLucia(sequence string, ...) (meltingTemp, dH, dS float64)
- func Translate(sequence string, codonTable CodonTable) (string, error)
- func WriteCodonJSON(codontable CodonTable, path string)
- func WriteGbk(sequence Sequence, path string)
- func WriteGff(sequence Sequence, path string)
- func WriteJSON(sequence Sequence, path string)
- type AminoAcid
- type CloneSequence
- type Codon
- type CodonTable
- func AddCodonTable(firstCodonTable CodonTable, secondCodonTable CodonTable) CodonTable
- func CompromiseCodonTable(firstCodonTable CodonTable, secondCodonTable CodonTable, cutOff float64) (CodonTable, error)
- func GetCodonTable(index int) CodonTable
- func ParseCodonJSON(file []byte) CodonTable
- func ReadCodonJSON(path string) CodonTable
- type DnaSuggestion
- type Enzyme
- type Fasta
- type Feature
- type Fragment
- type Location
- type Locus
- type Meta
- type Overhang
- type Reference
- type Sequence
- func ParseGbk(file []byte) Sequence
- func ParseGbkFlat(file []byte) []Sequence
- func ParseGbkMulti(file []byte) []Sequence
- func ParseGff(file []byte) Sequence
- func ParseJSON(file []byte) Sequence
- func ReadGbk(path string) Sequence
- func ReadGbkFlat(path string) []Sequence
- func ReadGbkFlatGz(path string) []Sequence
- func ReadGbkMulti(path string) []Sequence
- func ReadGff(path string) Sequence
- func ReadJSON(path string) Sequence
Examples ¶
- AddCodonTable
- AllVariantsIUPAC
- BuildGbk
- BuildGff
- CompromiseCodonTable
- CreateBarcodes
- CreateBarcodesWithBannedSequences
- FixCds
- GoldenGate
- Hash
- MarmurDoty
- MeltingTemp
- NucleobaseDeBruijnSequence
- Optimize
- ParseCodonJSON
- ParseGbk
- ParseGbkFlat
- ParseGbkMulti
- ParseGff
- ParseJSON
- RandomProteinSequence
- ReadCodonJSON
- ReadFASTAConcurrent
- ReadFASTAGz
- ReadGbk
- ReadGbkFlat
- ReadGbkFlatGz
- ReadGbkMulti
- ReadGff
- ReadJSON
- RotateSequence
- SantaLucia
- Sequence.Hash
- Translate
- WriteCodonJSON
- WriteGbk
- WriteGff
- WriteJSON
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func AllVariantsIUPAC ¶ added in v0.12.0
AllVariantsIUPAC takes a string as input and returns all IUPAC variants as output.
Example ¶
// AllVariantsIUPAC takes a string as input
// and returns all iupac variants as output
mendelIUPAC := "ATGGARAAYGAYGARCTN" // ambiguous IUPAC codes for most of the sequences that code for the protein MENDEL
mendelIUPACvariants, _ := AllVariantsIUPAC(mendelIUPAC)
fmt.Println(mendelIUPACvariants)
Output: [ATGGAGAATGATGAGCTG ATGGAGAATGATGAGCTA ATGGAGAATGATGAGCTT ATGGAGAATGATGAGCTC ATGGAGAATGATGAACTG ATGGAGAATGATGAACTA ATGGAGAATGATGAACTT ATGGAGAATGATGAACTC ATGGAGAATGACGAGCTG ATGGAGAATGACGAGCTA ATGGAGAATGACGAGCTT ATGGAGAATGACGAGCTC ATGGAGAATGACGAACTG ATGGAGAATGACGAACTA ATGGAGAATGACGAACTT ATGGAGAATGACGAACTC ATGGAGAACGATGAGCTG ATGGAGAACGATGAGCTA ATGGAGAACGATGAGCTT ATGGAGAACGATGAGCTC ATGGAGAACGATGAACTG ATGGAGAACGATGAACTA ATGGAGAACGATGAACTT ATGGAGAACGATGAACTC ATGGAGAACGACGAGCTG ATGGAGAACGACGAGCTA ATGGAGAACGACGAGCTT ATGGAGAACGACGAGCTC ATGGAGAACGACGAACTG ATGGAGAACGACGAACTA ATGGAGAACGACGAACTT ATGGAGAACGACGAACTC ATGGAAAATGATGAGCTG ATGGAAAATGATGAGCTA ATGGAAAATGATGAGCTT ATGGAAAATGATGAGCTC ATGGAAAATGATGAACTG ATGGAAAATGATGAACTA ATGGAAAATGATGAACTT ATGGAAAATGATGAACTC ATGGAAAATGACGAGCTG ATGGAAAATGACGAGCTA ATGGAAAATGACGAGCTT ATGGAAAATGACGAGCTC ATGGAAAATGACGAACTG ATGGAAAATGACGAACTA ATGGAAAATGACGAACTT ATGGAAAATGACGAACTC ATGGAAAACGATGAGCTG ATGGAAAACGATGAGCTA ATGGAAAACGATGAGCTT ATGGAAAACGATGAGCTC ATGGAAAACGATGAACTG ATGGAAAACGATGAACTA ATGGAAAACGATGAACTT ATGGAAAACGATGAACTC ATGGAAAACGACGAGCTG ATGGAAAACGACGAGCTA ATGGAAAACGACGAGCTT ATGGAAAACGACGAGCTC ATGGAAAACGACGAACTG ATGGAAAACGACGAACTA ATGGAAAACGACGAACTT ATGGAAAACGACGAACTC]
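The expansion described above can be sketched in a few lines. This is a minimal, self-contained illustration and not poly's implementation: `expandIUPAC` and `iupacCodes` are hypothetical names, and the order in which variants are emitted is an assumption that may differ from the library's output.

```go
package main

import "fmt"

// iupacCodes maps each IUPAC nucleotide code to the bases it can stand for.
var iupacCodes = map[byte]string{
	'A': "A", 'C': "C", 'G': "G", 'T': "T",
	'R': "AG", 'Y': "CT", 'S': "CG", 'W': "AT", 'K': "GT", 'M': "AC",
	'B': "CGT", 'D': "AGT", 'H': "ACT", 'V': "ACG", 'N': "ACGT",
}

// expandIUPAC is a hypothetical helper, not part of poly. It returns every
// unambiguous sequence that the IUPAC string seq can represent.
func expandIUPAC(seq string) ([]string, error) {
	variants := []string{""}
	for i := 0; i < len(seq); i++ {
		bases, ok := iupacCodes[seq[i]]
		if !ok {
			return nil, fmt.Errorf("unknown IUPAC code %q at position %d", seq[i], i)
		}
		// Extend every variant built so far with each base allowed here.
		next := make([]string, 0, len(variants)*len(bases))
		for _, prefix := range variants {
			for j := 0; j < len(bases); j++ {
				next = append(next, prefix+string(bases[j]))
			}
		}
		variants = next
	}
	return variants, nil
}

func main() {
	variants, err := expandIUPAC("ATGGARAAYGAYGARCTN")
	if err != nil {
		fmt.Println(err)
		return
	}
	// Four two-way codes (R, Y, Y, R) and one four-way code (N): 2*2*2*2*4.
	fmt.Println(len(variants))
}
```

The number of variants is the product of the number of bases each position allows, so sequences with many `N` positions grow quickly.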
func BuildGbk ¶
BuildGbk builds a GBK string to be written out to db or file.
Example ¶
sequence := ReadGbk("data/puc19.gbk")
gbkBytes := BuildGbk(sequence)
testSequence := ParseGbk(gbkBytes)
fmt.Println(testSequence.Meta.Locus.ModificationDate)
Output: 22-OCT-2019
func BuildGff ¶
BuildGff takes an Annotated sequence and returns a byte array representing a gff to be written out.
Example ¶
sequence := ReadGff("data/ecoli-mg1655-short.gff")
gffBytes := BuildGff(sequence)
reparsedSequence := ParseGff(gffBytes)
fmt.Println(reparsedSequence.Meta.Name)
Output: U00096.3
func ComplementBase ¶
ComplementBase accepts a base and returns its complementary base.
func CreateBarcodes ¶ added in v0.12.0
CreateBarcodes is a simplified version of CreateBarcodesWithBannedSequences with sane defaults.
Example ¶
barcodes := CreateBarcodes(20, 4)
fmt.Println(barcodes[0])
Output: AAAATAAAGAAACAATTAAT
func CreateBarcodesWithBannedSequences ¶ added in v0.12.0
func CreateBarcodesWithBannedSequences(length int, maxSubSequence int, bannedSequences []string, bannedFunctions []func(string) bool) []string
CreateBarcodesWithBannedSequences creates a list of barcodes given a desired barcode length and the maxSubSequence shared in each barcode. Sequences may be marked as banned by passing a static list, `bannedSequences`, or, if more flexibility is needed, through a list of `bannedFunctions` that dynamically generates bannedSequences. If a sequence is banned, it will not appear within a barcode. A `bannedFunctions` function can determine on the fly whether a barcode should be banned. If it is banned, iteration continues until a barcode is found that satisfies the bannedFunction requirement.
Example ¶
barcodes := CreateBarcodesWithBannedSequences(20, 4, []string{"CTCTCGGTCGCTCC"}, []func(string) bool{})
fmt.Println(barcodes[0])
Output: AAAATAAAGAAACAATTAAT
func FindBsaI ¶ added in v0.11.2
func FindBsaI(sequence string, c chan DnaSuggestion, wg *sync.WaitGroup)
FindBsaI is a simple problematicSequenceFunc, for use in testing
func FindTypeIIS ¶ added in v0.11.2
func FindTypeIIS(sequence string, c chan DnaSuggestion, wg *sync.WaitGroup)
FindTypeIIS is a problematicSequenceFunc used for finding TypeIIS restriction enzymes. It finds BbsI, BsaI, BtgZI, BsmBI, SapI, and PaqCI(AarI)
func FixCds ¶ added in v0.11.2
func FixCds(sqlitePath string, sequence string, codontable CodonTable, problematicSequenceFuncs []func(string, chan DnaSuggestion, *sync.WaitGroup)) (string, error)
FixCds fixes a CDS given the CDS sequence, a codon table, and a list of functions to solve for.
Example ¶
bla := "ATGAGTATTCAACATTTCCGTGTCGCCCTTATTCCCTTTTTTGCGGCATTTTGCCTTCCTGTTTTTGCTCACCCAGAAACGCTGGTGAAAGTAAAAGATGCTGAAGATCAGTTGGGTGCACGAGTGGGTTACATCGAACTGGATCTCAACAGCGGTAAGATCCTTGAGAGTTTTCGCCCCGAAGAACGTTTTCCAATGATGAGCACTTTTAAAGTTCTGCTATGTGGCGCGGTATTATCCCGTATTGACGCCGGGCAAGAGCAACTCGGTCGCCGCATACACTATTCTCAGAATGACTTGGTTGAGTACTCACCAGTCACAGAAAAGCATCTTACGGATGGCATGACAGTAAGAGAATTATGCAGTGCTGCCATAACCATGAGTGATAACACTGCGGCCAACTTACTTCTGACAACGATCGGAGGACCGAAGGAGCTAACCGCTTTTTTGCACAACATGGGGGATCATGTAACTCGCCTTGATCGTTGGGAACCGGAGCTGAATGAAGCCATACCAAACGACGAGCGTGACACCACGATGCCTGTAGCAATGGCAACAACGTTGCGCAAACTATTAACTGGCGAACTACTTACTCTAGCTTCCCGGCAACAATTAATAGACTGGATGGAGGCGGATAAAGTTGCAGGACCACTTCTGCGCTCGGCCCTTCCGGCTGGCTGGTTTATTGCTGATAAATCTGGAGCCGGTGAGCGTGGGTCTCGCGGTATCATTGCAGCACTGGGGCCAGATGGTAAGCCCTCCCGTATCGTAGTTATCTACACGACGGGGAGTCAGGCAACTATGGATGAACGAAATAGACAGATCGCTGAGATAGGTGCCTCACTGATTAAGCATTGGTAA" sequence := ReadGbk("data/ecoli-mg1655.gff") codonTable := GetCodonTable(11) optimizationTable := sequence.GetOptimizationTable(codonTable) var functions []func(string, chan DnaSuggestion, *sync.WaitGroup) //functions = append(functions, FindBsaI) functions = append(functions, FindTypeIIS) fixedSeq, _ := FixCds(":memory:", bla, optimizationTable, functions) fmt.Println(fixedSeq)
Output: ATGAGTATTCAACATTTCCGTGTCGCCCTTATTCCCTTTTTTGCGGCATTTTGCCTTCCTGTTTTTGCTCACCCAGAAACGCTGGTGAAAGTAAAAGATGCTGAAGATCAGTTGGGTGCACGAGTGGGTTACATCGAACTGGATCTCAACAGCGGTAAGATCCTTGAGAGTTTTCGCCCCGAAGAACGTTTTCCAATGATGAGCACTTTTAAAGTTCTGCTATGTGGCGCGGTATTATCCCGTATTGACGCCGGGCAAGAGCAACTCGGTCGCCGCATACACTATTCTCAGAATGACTTGGTTGAGTACTCACCAGTCACAGAAAAGCATCTTACGGATGGCATGACAGTAAGAGAATTATGCAGTGCTGCCATAACCATGAGTGATAACACTGCGGCCAACTTACTTCTGACAACGATCGGAGGACCGAAGGAGCTAACCGCTTTTTTGCACAACATGGGGGATCATGTAACTCGCCTTGATCGTTGGGAACCGGAGCTGAATGAAGCCATACCAAACGACGAGCGTGACACCACGATGCCTGTAGCAATGGCAACAACGTTGCGCAAACTATTAACTGGCGAACTACTTACTCTAGCTTCCCGGCAACAATTAATAGACTGGATGGAGGCGGATAAAGTTGCAGGACCACTTCTGCGCTCGGCCCTTCCGGCTGGCTGGTTTATTGCTGATAAATCTGGAGCCGGTGAGCGTGGATCTCGCGGTATCATTGCAGCACTGGGGCCAGATGGTAAGCCCTCCCGTATCGTAGTTATCTACACGACGGGGAGTCAGGCAACTATGGATGAACGAAATAGACAGATCGCTGAGATAGGTGCCTCACTGATTAAGCATTGG
func Hash ¶ added in v0.11.3
Hash is a function to create Seqhashes, a specific kind of identifier.
Example ¶
sequence := ReadGbk("data/puc19.gbk")
seqhash, _ := Hash(sequence.Sequence, "DNA", true, true)
fmt.Println(seqhash)
Output: v1_DCD_4b0616d1b3fc632e42d78521deb38b44fba95cca9fde159e01cd567fa996ceb9
func MarmurDoty ¶
MarmurDoty calculates the melting point of an extremely short DNA sequence (<15 bp) using a modified Marmur Doty formula [Marmur J & Doty P (1962). Determination of the base composition of deoxyribonucleic acid from its thermal denaturation temperature. J Mol Biol, 5, 109-118.]
Example ¶
sequenceString := "ACGTCCGGACTT"
meltingTemp := MarmurDoty(sequenceString)
fmt.Println(meltingTemp)
Output: 31
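The arithmetic behind this result can be sketched as follows. The formula used here, Tm = 2(A+T) + 4(G+C) - 7, is an assumption about what "modified Marmur Doty" means; it is consistent with the example output above, but the source does not state it. `marmurDotySketch` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// marmurDotySketch is a hypothetical stand-in, not poly's code. It applies
// Tm = 2*(A+T) + 4*(G+C) - 7, which is an assumption made for this sketch.
func marmurDotySketch(sequence string) float64 {
	sequence = strings.ToUpper(sequence)
	at := strings.Count(sequence, "A") + strings.Count(sequence, "T")
	gc := strings.Count(sequence, "G") + strings.Count(sequence, "C")
	return float64(2*at+4*gc) - 7.0
}

func main() {
	// 5 A/T bases and 7 G/C bases: 2*5 + 4*7 - 7 = 31.
	fmt.Println(marmurDotySketch("ACGTCCGGACTT"))
}
```

Because the formula only counts bases, it ignores sequence order, which is one reason it is suggested only for very short sequences.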
func MeltingTemp ¶
MeltingTemp calls SantaLucia with default inputs for primer and salt concentration. TODO: make a custom function for Phusion according to https://tmcalculator.neb.com/#!/help
Example ¶
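sequenceString := "GTAAAACGACGGCCAGT" // M13 fwd
expectedTM := 52.8
meltingTemp := MeltingTemp(sequenceString)
// The expression is true when the result is off by 2% or more, so false means it is within the margin.
withinMargin := math.Abs(expectedTM-meltingTemp)/expectedTM >= 0.02
fmt.Println(withinMargin)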
Output: false
func NucleobaseDeBruijnSequence ¶ added in v0.12.0
NucleobaseDeBruijnSequence generates a DNA De Bruijn sequence with alphabet ATGC. A De Bruijn sequence is a string in which every possible substring of a given length (here substringLength) over the alphabet appears exactly once. Code is adapted from https://rosettacode.org/wiki/De_Bruijn_sequences#Go
Example ¶
a := NucleobaseDeBruijnSequence(4)
fmt.Println(a)
Output: AAAATAAAGAAACAATTAATGAATCAAGTAAGGAAGCAACTAACGAACCATATAGATACATTTATTGATTCATGTATGGATGCATCTATCGATCCAGAGACAGTTAGTGAGTCAGGTAGGGAGGCAGCTAGCGAGCCACACTTACTGACTCACGTACGGACGCACCTACCGACCCTTTTGTTTCTTGGTTGCTTCGTTCCTGTGTCTGGGTGGCTGCGTGCCTCTCGGTCGCTCCGTCCCGGGGCGGCCGCGCCCCAAA
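For readers who want to see how such a sequence can be built, here is a minimal sketch of the standard recursive construction (concatenating Lyndon words). It is an illustration under assumptions, not poly's code: `deBruijn` is a hypothetical name, and appending the first n-1 characters to "unroll" the cyclic sequence is inferred from the length of the example output.

```go
package main

import "fmt"

// deBruijn is a hypothetical helper. It returns a linear De Bruijn sequence
// of order n over alphabet: every length-n string appears exactly once.
func deBruijn(alphabet string, n int) string {
	k := len(alphabet)
	a := make([]int, k*n)
	var seq []byte

	var db func(t, p int)
	db = func(t, p int) {
		if t > n {
			// Emit the current word only when its period divides n.
			if n%p == 0 {
				for _, i := range a[1 : p+1] {
					seq = append(seq, alphabet[i])
				}
			}
			return
		}
		a[t] = a[t-p]
		db(t+1, p)
		for j := a[t-p] + 1; j < k; j++ {
			a[t] = j
			db(t+1, t)
		}
	}
	db(1, 1)

	// The construction is cyclic; repeat the first n-1 symbols so that the
	// substrings wrapping around the end also appear in the linear string.
	seq = append(seq, seq[:n-1]...)
	return string(seq)
}

func main() {
	s := deBruijn("ATGC", 4)
	fmt.Println(len(s)) // 4^4 cyclic positions plus 3 wrap-around bases
	fmt.Println(s[:20])
}
```

The cyclic sequence has exactly 4^4 = 256 positions, one per 4-mer, so the linear form is 259 bases long.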
func Optimize ¶
func Optimize(aminoAcids string, codonTable CodonTable) (string, error)
Optimize takes an amino acid sequence and CodonTable and returns an optimized codon sequence
Example ¶
gfpTranslation := "MASKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTFSYGVQCFSRYPDHMKRHDFFKSAMPEGYVQERTISFKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNYNSHNVYITADKQKNGIKANFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITHGMDELYK*" sequence := ReadGbk("data/puc19.gbk") codonTable := GetCodonTable(11) optimizationTable := sequence.GetOptimizationTable(codonTable) optimizedSequence, _ := Optimize(gfpTranslation, optimizationTable) optimizedSequenceTranslation, _ := Translate(optimizedSequence, optimizationTable) fmt.Println(optimizedSequenceTranslation == gfpTranslation)
Output: true
func ParseFASTAGz ¶ added in v0.9.0
func RandomProteinSequence ¶ added in v0.11.13
RandomProteinSequence returns a random protein sequence as a string of the given length that starts with the amino acid M (methionine) and ends with * (stop codon). The random generator uses the seed provided as a parameter.
Example ¶
// RandomProteinSequence builds a protein sequence from just two arguments: a length
// and a seed that is used to generate the sequence randomly. The length needs to be
// greater than two because every sequence already has a start and a stop codon.
// The seed makes this test deterministic.
randomProtein, _ := RandomProteinSequence(15, 2)
fmt.Println(randomProtein)
Output: MHHPAFRMFNTMYG*
func ReadFASTAConcurrent ¶ added in v0.9.5
Example ¶
fastas := make(chan Fasta, 1000)
go ReadFASTAConcurrent("data/smallfasta.fasta", fastas)
var name string
for fasta := range fastas {
	name = fasta.Name
}
fmt.Println(name)
Output: camR-2|AGAC,AGGT
func ReadFASTAGz ¶ added in v0.9.0
Example ¶
fastas := make(chan Fasta, 1000)
go ReadFASTAGz("data/uniprot_1mb_test.fasta.gz", fastas)
var name string
for fasta := range fastas {
	name = fasta.Name
}
fmt.Println(name)
Output: sp|P86857|AGP_MYTCA Alanine and glycine-rich protein (Fragment) OS=Mytilus californianus OX=6549 PE=1 SV=1
func ReverseComplement ¶
ReverseComplement takes the reverse complement of a sequence
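As an illustration of what complementing and reverse-complementing mean, here is a minimal sketch. `complement` and `reverseComplement` are hypothetical stand-ins for ComplementBase and ReverseComplement; leaving unrecognized characters unchanged is an assumption of this sketch, not documented library behavior.

```go
package main

import "fmt"

// complements lists Watson-Crick partners for upper- and lowercase bases.
var complements = map[rune]rune{
	'A': 'T', 'T': 'A', 'G': 'C', 'C': 'G',
	'a': 't', 't': 'a', 'g': 'c', 'c': 'g',
}

// complement returns the partner of base, or base itself when it is not
// one of A, T, G, C (an assumption made for this sketch).
func complement(base rune) rune {
	if partner, ok := complements[base]; ok {
		return partner
	}
	return base
}

// reverseComplement complements every base and reverses the order, giving
// the opposite strand read in its own 5' to 3' direction.
func reverseComplement(sequence string) string {
	bases := []rune(sequence)
	out := make([]rune, len(bases))
	for i, base := range bases {
		out[len(bases)-1-i] = complement(base)
	}
	return string(out)
}

func main() {
	fmt.Println(reverseComplement("GGTCTC")) // prints GAGACC
}
```

Applying the function twice returns the original sequence, which makes a convenient sanity check.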
func RotateSequence ¶
RotateSequence rotates circular sequences to a deterministic point.
Example ¶
sequence := ReadGbk("data/puc19.gbk")
sequenceLength := len(sequence.Sequence)
testSequence := sequence.Sequence[sequenceLength/2:] + sequence.Sequence[0:sequenceLength/2]
fmt.Println(RotateSequence(sequence.Sequence) == RotateSequence(testSequence))
Output: true
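The source does not say which point RotateSequence picks. One common way to make the choice deterministic is to take the lexicographically smallest rotation, and the sketch below assumes exactly that. `leastRotation` is a hypothetical name, and the quadratic comparison is chosen for clarity rather than speed.

```go
package main

import "fmt"

// leastRotation is a hypothetical helper. It returns the lexicographically
// smallest rotation of a circular sequence, so that every rotation of the
// same circle maps to the same string.
func leastRotation(sequence string) string {
	n := len(sequence)
	if n == 0 {
		return ""
	}
	// Every rotation is a length-n window of the doubled sequence.
	doubled := sequence + sequence
	best := 0
	for i := 1; i < n; i++ {
		if doubled[i:i+n] < doubled[best:best+n] {
			best = i
		}
	}
	return doubled[best : best+n]
}

func main() {
	// TACAGAT is GATTACA rotated by three bases.
	fmt.Println(leastRotation("GATTACA"))
	fmt.Println(leastRotation("GATTACA") == leastRotation("TACAGAT"))
}
```

For plasmid-sized inputs a linear-time algorithm such as Booth's least rotation would be the more practical choice.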
func SantaLucia ¶
func SantaLucia(sequence string, primerConcentration, saltConcentration, magnesiumConcentration float64) (meltingTemp, dH, dS float64)
SantaLucia calculates the melting point of a short DNA sequence (15-200 bp), using the Nearest Neighbors method [SantaLucia, J. (1998) PNAS, doi:10.1073/pnas.95.4.1460]
Example ¶
sequenceString := "ACGATGGCAGTAGCATGC" //"GTAAAACGACGGCCAGT" // M13 fwd
testCPrimer := 0.1e-6 // primer concentration
testCNa := 350e-3 // salt concentration
testCMg := 0.0 // magnesium concentration
expectedTM := 62.7 // roughly what we're expecting with a margin of error
meltingTemp, _, _ := SantaLucia(sequenceString, testCPrimer, testCNa, testCMg)
withinMargin := math.Abs(expectedTM-meltingTemp)/expectedTM >= 0.02 // checking margin of error: false means the result is within 2%
fmt.Println(withinMargin)
Output: false
func Translate ¶
func Translate(sequence string, codonTable CodonTable) (string, error)
Translate translates a codon sequence to an amino acid sequence
Example ¶
gfpTranslation := "MASKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTFSYGVQCFSRYPDHMKRHDFFKSAMPEGYVQERTISFKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNYNSHNVYITADKQKNGIKANFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITHGMDELYK*" gfpDnaSequence := "ATGGCTAGCAAAGGAGAAGAACTTTTCACTGGAGTTGTCCCAATTCTTGTTGAATTAGATGGTGATGTTAATGGGCACAAATTTTCTGTCAGTGGAGAGGGTGAAGGTGATGCTACATACGGAAAGCTTACCCTTAAATTTATTTGCACTACTGGAAAACTACCTGTTCCATGGCCAACACTTGTCACTACTTTCTCTTATGGTGTTCAATGCTTTTCCCGTTATCCGGATCATATGAAACGGCATGACTTTTTCAAGAGTGCCATGCCCGAAGGTTATGTACAGGAACGCACTATATCTTTCAAAGATGACGGGAACTACAAGACGCGTGCTGAAGTCAAGTTTGAAGGTGATACCCTTGTTAATCGTATCGAGTTAAAAGGTATTGATTTTAAAGAAGATGGAAACATTCTCGGACACAAACTCGAGTACAACTATAACTCACACAATGTATACATCACGGCAGACAAACAAAAGAATGGAATCAAAGCTAACTTCAAAATTCGCCACAACATTGAAGATGGATCCGTTCAACTAGCAGACCATTATCAACAAAATACTCCAATTGGCGATGGCCCTGTCCTTTTACCAGACAACCATTACCTGTCGACACAATCTGCCCTTTCGAAAGATCCCAACGAAAAGCGTGACCACATGGTCCTTCTTGAGTTTGTAACTGCTGCTGGGATTACACATGGCATGGATGAGCTCTACAAATAA" testTranslation, _ := Translate(gfpDnaSequence, GetCodonTable(11)) // need to specify which codons map to which amino acids per NCBI table fmt.Println(gfpTranslation == testTranslation)
Output: true
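The core of translation is a codon lookup performed three bases at a time. The sketch below shows that loop with a deliberately partial table; `translate` and `codonToAminoAcid` are hypothetical names, the table holds only the codons this example needs, and ignoring trailing bases that do not fill a codon is an assumption of the sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// codonToAminoAcid is a partial lookup table covering only the codons used
// in this example; a real codon table has 64 entries.
var codonToAminoAcid = map[string]byte{
	"ATG": 'M', "GCT": 'A', "AGC": 'S', "AAA": 'K',
	"GGA": 'G', "GAA": 'E', "TAA": '*',
}

// translate is a hypothetical helper that reads sequence codon by codon and
// returns the amino acid string. Leftover bases at the end are ignored.
func translate(sequence string) (string, error) {
	var protein strings.Builder
	for i := 0; i+3 <= len(sequence); i += 3 {
		codon := strings.ToUpper(sequence[i : i+3])
		aminoAcid, ok := codonToAminoAcid[codon]
		if !ok {
			return "", fmt.Errorf("codon %q is not in the sketch table", codon)
		}
		protein.WriteByte(aminoAcid)
	}
	return protein.String(), nil
}

func main() {
	// The first seven codons of the GFP sequence above, followed by a stop.
	protein, err := translate("ATGGCTAGCAAAGGAGAAGAATAA")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(protein)
}
```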
func WriteCodonJSON ¶
func WriteCodonJSON(codontable CodonTable, path string)
WriteCodonJSON writes a CodonTable struct out to JSON.
Example ¶
codontable := ReadCodonJSON("data/bsub_codon_test.json")
WriteCodonJSON(codontable, "data/codon_test.json")
testCodonTable := ReadCodonJSON("data/codon_test.json")
// cleaning up test data
os.Remove("data/codon_test.json")
fmt.Println(testCodonTable.AminoAcids[0].Codons[0].Weight)
Output: 28327
func WriteGbk ¶
WriteGbk takes a Sequence struct and a path string and writes out a gbk to that path.
Example ¶
tmpDataDir, err := ioutil.TempDir("", "data-*")
if err != nil {
	fmt.Println(err.Error())
}
defer os.RemoveAll(tmpDataDir)
sequence := ReadGbk("data/puc19.gbk")
tmpGbkFilePath := filepath.Join(tmpDataDir, "puc19.gbk")
WriteGbk(sequence, tmpGbkFilePath)
testSequence := ReadGbk(tmpGbkFilePath)
fmt.Println(testSequence.Meta.Locus.ModificationDate)
Output: 22-OCT-2019
func WriteGff ¶
WriteGff takes a Sequence struct and a path string and writes out a gff to that path.
Example ¶
tmpDataDir, err := ioutil.TempDir("", "data-*")
if err != nil {
	fmt.Println(err.Error())
}
defer os.RemoveAll(tmpDataDir)
sequence := ReadGff("data/ecoli-mg1655-short.gff")
tmpGffFilePath := filepath.Join(tmpDataDir, "ecoli-mg1655-short.gff")
WriteGff(sequence, tmpGffFilePath)
testSequence := ReadGff(tmpGffFilePath)
fmt.Println(testSequence.Meta.Name)
Output: U00096.3
func WriteJSON ¶
WriteJSON writes a Sequence struct out to JSON.
Example ¶
tmpDataDir, err := ioutil.TempDir("", "data-*")
if err != nil {
	fmt.Println(err.Error())
}
defer os.RemoveAll(tmpDataDir)
sequence := ReadJSON("data/sample.json")
tmpJSONFilePath := filepath.Join(tmpDataDir, "sample.json")
WriteJSON(sequence, tmpJSONFilePath)
testSequence := ReadJSON(tmpJSONFilePath)
fmt.Println(testSequence.Meta.Source)
Output: Saccharomyces cerevisiae (baker's yeast)
Types ¶
type CloneSequence ¶ added in v0.11.0
func GoldenGate ¶ added in v0.11.0
func GoldenGate(sequences []CloneSequence, enzymeStr string) ([]CloneSequence, error)
Example ¶
// Fragment 1 has a palindrome at the end fragment1 := CloneSequence{"GAAGTGCCATTCCGCCTGACCTGAAGACCAGGAGAAACACGTGGCAAACATTCCGGTCTCAAATGGAAAAGAGCAACGAAACCAACGGCTACCTTGACAGCGCTCAAGCCGGCCCTGCAGCTGGCCCGGGCGCTCCGGGTACCGCCGCGGGTCGTGCACGTCGTTGCGCGGGCTTCCTGCGGCGCCAAGCGCTGGTGCTGCTCACGGTGTCTGGTGTTCTGGCAGGCGCCGGTTTGGGCGCGGCACTGCGTGGGCTCAGCCTGAGCCGCACCCAGGTCACCTACCTGGCCTTCCCCGGCGAGATGCTGCTCCGCATGCTGCGCATGATCATCCTGCCGCTGGTGGTCTGCAGCCTGGTGTCGGGCGCCGCCTCCCTCGATGCCAGCTGCCTCGGGCGTCTGGGCGGTATCGCTGTCGCCTACTTTGGCCTCACCACACTGAGTGCCTCGGCGCTCGCCGTGGCCTTGGCGTTCATCATCAAGCCAGGATCCGGTGCGCAGACCCTTCAGTCCAGCGACCTGGGGCTGGAGGACTCGGGGCCTCCTCCTGTCCCCAAAGAAACGGTGGACTCTTTCCTCGACCTGGCCAGAAACCTGTTTCCCTCCAATCTTGTGGTTGCAGCTTTCCGTACGTATGCAACCGATTATAAAGTCGTGACCCAGAACAGCAGCTCTGGAAATGTAACCCATGAAAAGATCCCCATAGGCACTGAGATAGAAGGGATGAACATTTTAGGATTGGTCCTGTTTGCTCTGGTGTTAGGAGTGGCCTTAAAGAAACTAGGCTCCGAAGGAGAGGACCTCATCCGTTTCTTCAATTCCCTCAACGAGGCGACGATGGTGCTGGTGTCCTGGATTATGTGGTACGCGTCTTCAGGCTAGGTGGAGGCTCAGTG", false} fragment2 := CloneSequence{"GAAGTGCCATTCCGCCTGACCTGAAGACCAGTACGTACCTGTGGGCATCATGTTCCTTGTTGGAAGCAAGATCGTGGAAATGAAAGACATCATCGTGCTGGTGACCAGCCTGGGGAAATACATCTTCGCATCTATATTGGGCCACGTCATTCATGGTGGTATCGTCCTGCCGCTGATTTATTTTGTTTTCACACGAAAAAACCCATTCAGATTCCTCCTGGGCCTCCTCGCCCCATTTGCGACAGCATTTGCTACGTGCTCCAGCTCAGCGACCCTTCCCTCTATGATGAAGTGCATTGAAGAGAACAATGGTGTGGACAAGAGGATCTCCAGGTTTATTCTCCCCATCGGGGCCACCGTGAACATGGACGGAGCAGCCATCTTCCAGTGTGTGGCCGCGGTGTTCATTGCGCAACTCAACAACGTAGAGCTCAACGCAGGACAGATTTTCACCATTCTAGTGACTGCCACAGCGTCCAGTGTTGGAGCAGCAGGCGTGCCAGCTGGAGGGGTCCTCACCATTGCCATTATCCTGGAGGCCATTGGGCTGCCTACTCATGATCTGCCTCTGATCCTGGCTGTGGACTGGATTGTGGACCGGACCACCACGGTGGTGAATGTGGAAGGGGATGCCCTGGGTGCAGGCATTCTCCACCACCTGAATCAGAAGGCAACAAAGAAAGGCGAGCAGGAACTTGCTGAGGTGAAAGTGGAAGCCATCCCCAACTGCAAGTCTGAGGAGGAAACCTCGCCCCTGGTGACACACCAGAACCCCGCTGGCCCCGTGGCCAGTGCCCCAGAACTGGAATCCAAGGAGTCGGTTCTGTGAAGAGCTTAGAGACCGACGACTGCCTAAGGACATTCGCTGCGTCTTCAGGCTAGGTGGAGGCTCAGTG", false} popen := 
CloneSequence{"TAACTATCGTCTTGAGTCCAACCCGGTAAGACACGACTTATCGCCACTGGCAGCAGCCACTGGTAACAGGATTAGCAGAGCGAGGTATGTAGGCGGTGCTACAGAGTTCTTGAAGTGGTGGCCTAACTACGGCTACACTAGAAGAACAGTATTTGGTATCTGCGCTCTGCTGAAGCCAGTTACCTTCGGAAAAAGAGTTGGTAGCTCTTGATCCGGCAAACAAACCACCGCTGGTAGCGGTGGTTTTTTTGTTTGCAAGCAGCAGATTACGCGCAGAAAAAAAGGATCTCAAGAAGGCCTACTATTAGCAACAACGATCCTTTGATCTTTTCTACGGGGTCTGACGCTCAGTGGAACGAAAACTCACGTTAAGGGATTTTGGTCATGAGATTATCAAAAAGGATCTTCACCTAGATCCTTTTAAATTAAAAATGAAGTTTTAAATCAATCTAAAGTATATATGAGTAAACTTGGTCTGACAGTTACCAATGCTTAATCAGTGAGGCACCTATCTCAGCGATCTGTCTATTTCGTTCATCCATAGTTGCCTGACTCCCCGTCGTGTAGATAACTACGATACGGGAGGGCTTACCATCTGGCCCCAGTGCTGCAATGATACCGCGAGAACCACGCTCACCGGCTCCAGATTTATCAGCAATAAACCAGCCAGCCGGAAGGGCCGAGCGCAGAAGTGGTCCTGCAACTTTATCCGCCTCCATCCAGTCTATTAATTGTTGCCGGGAAGCTAGAGTAAGTAGTTCGCCAGTTAATAGTTTGCGCAACGTTGTTGCCATTGCTACAGGCATCGTGGTGTCACGCTCGTCGTTTGGTATGGCTTCATTCAGCTCCGGTTCCCAACGATCAAGGCGAGTTACATGATCCCCCATGTTGTGCAAAAAAGCGGTTAGCTCCTTCGGTCCTCCGATCGTTGTCAGAAGTAAGTTGGCCGCAGTGTTATCACTCATGGTTATGGCAGCACTGCATAATTCTCTTACTGTCATGCCATCCGTAAGATGCTTTTCTGTGACTGGTGAGTACTCAACCAAGTCATTCTGAGAATAGTGTATGCGGCGACCGAGTTGCTCTTGCCCGGCGTCAATACGGGATAATACCGCGCCACATAGCAGAACTTTAAAAGTGCTCATCATTGGAAAACGTTCTTCGGGGCGAAAACTCTCAAGGATCTTACCGCTGTTGAGATCCAGTTCGATGTAACCCACTCGTGCACCCAACTGATCTTCAGCATCTTTTACTTTCACCAGCGTTTCTGGGTGAGCAAAAACAGGAAGGCAAAATGCCGCAAAAAAGGGAATAAGGGCGACACGGAAATGTTGAATACTCATACTCTTCCTTTTTCAATATTATTGAAGCATTTATCAGGGTTATTGTCTCATGAGCGGATACATATTTGAATGTATTTAGAAAAATAAACAAATAGGGGTTCCGCGCACCTGCACCAGTCAGTAAAACGACGGCCAGTAGTCAAAAGCCTCCGACCGGAGGCTTTTGACTTGGTTCAGGTGGAGTGGGAGTAgtcttcGCcatcgCtACTAAAagccagataacagtatgcgtatttgcgcgctgatttttgcggtataagaatatatactgatatgtatacccgaagtatgtcaaaaagaggtatgctatgaagcagcgtattacagtgacagttgacagcgacagctatcagttgctcaaggcatatatgatgtcaatatctccggtctggtaagcacaaccatgcagaatgaagcccgtcgtctgcgtgccgaacgctggaaagcggaaaatcaggaagggatggctgaggtcgcccggtttattgaaatgaacggctcttttgctgacgagaacagggGCTGGTGAAATGCAGTTTAAGGTTTACACCTATAAAAGAGAGAGCCGTTATCGTCTGTTTGTGGATGTACAGAGTGATATTATTGACACGCCCGGGCGACGGATGGTGATCCCCCTGGCCA
GTGCACGTCTGCTGTCAGATAAAGTCTCCCGTGAACTTTACCCGGTGGTGCATATCGGGGATGAAAGCTGGCGCATGATGACCACCGATATGGCCAGTGTGCCGGTCTCCGTTATCGGGGAAGAAGTGGCTGATCTCAGCCACCGCGAAAATGACATCAAAAACGCCATTAACCTGATGTTCTGGGGAATATAAATGTCAGGCTCCCTTATACACAGgcgatgttgaagaccaCGCTGAGGTGTCAATCGTCGGAGCCGCTGAGCAATAACTAGCATAACCCCTTGGGGCCTCTAAACGGGTCTTGAGGGGTTTTTTGCATGGTCATAGCTGTTTCCTGAGAGCTTGGCAGGTGATGACACACATTAACAAATTTCGTGAGGAGTCTCCAGAAGAATGCCATTAATTTCCATAGGCTCCGCCCCCCTGACGAGCATCACAAAAATCGACGCTCAAGTCAGAGGTGGCGAAACCCGACAGGACTATAAAGATACCAGGCGTTTCCCCCTGGAAGCTCCCTCGTGCGCTCTCCTGTTCCGACCCTGCCGCTTACCGGATACCTGTCCGCCTTTCTCCCTTCGGGAAGCGTGGCGCTTTCTCATAGCTCACGCTGTAGGTATCTCAGTTCGGTGTAGGTCGTTCGCTCCAAGCTGGGCTGTGTGCACGAACCCCCCGTTCAGCCCGACCGCTGCGCCTTATCCGG", true} Clones, _ := GoldenGate([]CloneSequence{fragment1, fragment2, popen}, "BbsI") fmt.Println(Clones[0].Sequence)
Output: AAAAAAAGGATCTCAAGAAGGCCTACTATTAGCAACAACGATCCTTTGATCTTTTCTACGGGGTCTGACGCTCAGTGGAACGAAAACTCACGTTAAGGGATTTTGGTCATGAGATTATCAAAAAGGATCTTCACCTAGATCCTTTTAAATTAAAAATGAAGTTTTAAATCAATCTAAAGTATATATGAGTAAACTTGGTCTGACAGTTACCAATGCTTAATCAGTGAGGCACCTATCTCAGCGATCTGTCTATTTCGTTCATCCATAGTTGCCTGACTCCCCGTCGTGTAGATAACTACGATACGGGAGGGCTTACCATCTGGCCCCAGTGCTGCAATGATACCGCGAGAACCACGCTCACCGGCTCCAGATTTATCAGCAATAAACCAGCCAGCCGGAAGGGCCGAGCGCAGAAGTGGTCCTGCAACTTTATCCGCCTCCATCCAGTCTATTAATTGTTGCCGGGAAGCTAGAGTAAGTAGTTCGCCAGTTAATAGTTTGCGCAACGTTGTTGCCATTGCTACAGGCATCGTGGTGTCACGCTCGTCGTTTGGTATGGCTTCATTCAGCTCCGGTTCCCAACGATCAAGGCGAGTTACATGATCCCCCATGTTGTGCAAAAAAGCGGTTAGCTCCTTCGGTCCTCCGATCGTTGTCAGAAGTAAGTTGGCCGCAGTGTTATCACTCATGGTTATGGCAGCACTGCATAATTCTCTTACTGTCATGCCATCCGTAAGATGCTTTTCTGTGACTGGTGAGTACTCAACCAAGTCATTCTGAGAATAGTGTATGCGGCGACCGAGTTGCTCTTGCCCGGCGTCAATACGGGATAATACCGCGCCACATAGCAGAACTTTAAAAGTGCTCATCATTGGAAAACGTTCTTCGGGGCGAAAACTCTCAAGGATCTTACCGCTGTTGAGATCCAGTTCGATGTAACCCACTCGTGCACCCAACTGATCTTCAGCATCTTTTACTTTCACCAGCGTTTCTGGGTGAGCAAAAACAGGAAGGCAAAATGCCGCAAAAAAGGGAATAAGGGCGACACGGAAATGTTGAATACTCATACTCTTCCTTTTTCAATATTATTGAAGCATTTATCAGGGTTATTGTCTCATGAGCGGATACATATTTGAATGTATTTAGAAAAATAAACAAATAGGGGTTCCGCGCACCTGCACCAGTCAGTAAAACGACGGCCAGTAGTCAAAAGCCTCCGACCGGAGGCTTTTGACTTGGTTCAGGTGGAGTGGGAGAAACACGTGGCAAACATTCCGGTCTCAAATGGAAAAGAGCAACGAAACCAACGGCTACCTTGACAGCGCTCAAGCCGGCCCTGCAGCTGGCCCGGGCGCTCCGGGTACCGCCGCGGGTCGTGCACGTCGTTGCGCGGGCTTCCTGCGGCGCCAAGCGCTGGTGCTGCTCACGGTGTCTGGTGTTCTGGCAGGCGCCGGTTTGGGCGCGGCACTGCGTGGGCTCAGCCTGAGCCGCACCCAGGTCACCTACCTGGCCTTCCCCGGCGAGATGCTGCTCCGCATGCTGCGCATGATCATCCTGCCGCTGGTGGTCTGCAGCCTGGTGTCGGGCGCCGCCTCCCTCGATGCCAGCTGCCTCGGGCGTCTGGGCGGTATCGCTGTCGCCTACTTTGGCCTCACCACACTGAGTGCCTCGGCGCTCGCCGTGGCCTTGGCGTTCATCATCAAGCCAGGATCCGGTGCGCAGACCCTTCAGTCCAGCGACCTGGGGCTGGAGGACTCGGGGCCTCCTCCTGTCCCCAAAGAAACGGTGGACTCTTTCCTCGACCTGGCCAGAAACCTGTTTCCCTCCAATCTTGTGGTTGCAGCTTTCCGTACGTATGCAACCGATTATAAAGTCGTGACCCAGAACAGCAGCTCTGGAAATGTAACCCATGAAAAGATCCCCATAGGCACTGAGATAGAAGGGATGAACATTTTAGGATTGGTCCTGTTTGCTCTGGTGTTAGGAGTGGCC
TTAAAGAAACTAGGCTCCGAAGGAGAGGACCTCATCCGTTTCTTCAATTCCCTCAACGAGGCGACGATGGTGCTGGTGTCCTGGATTATGTGGTACGTACCTGTGGGCATCATGTTCCTTGTTGGAAGCAAGATCGTGGAAATGAAAGACATCATCGTGCTGGTGACCAGCCTGGGGAAATACATCTTCGCATCTATATTGGGCCACGTCATTCATGGTGGTATCGTCCTGCCGCTGATTTATTTTGTTTTCACACGAAAAAACCCATTCAGATTCCTCCTGGGCCTCCTCGCCCCATTTGCGACAGCATTTGCTACGTGCTCCAGCTCAGCGACCCTTCCCTCTATGATGAAGTGCATTGAAGAGAACAATGGTGTGGACAAGAGGATCTCCAGGTTTATTCTCCCCATCGGGGCCACCGTGAACATGGACGGAGCAGCCATCTTCCAGTGTGTGGCCGCGGTGTTCATTGCGCAACTCAACAACGTAGAGCTCAACGCAGGACAGATTTTCACCATTCTAGTGACTGCCACAGCGTCCAGTGTTGGAGCAGCAGGCGTGCCAGCTGGAGGGGTCCTCACCATTGCCATTATCCTGGAGGCCATTGGGCTGCCTACTCATGATCTGCCTCTGATCCTGGCTGTGGACTGGATTGTGGACCGGACCACCACGGTGGTGAATGTGGAAGGGGATGCCCTGGGTGCAGGCATTCTCCACCACCTGAATCAGAAGGCAACAAAGAAAGGCGAGCAGGAACTTGCTGAGGTGAAAGTGGAAGCCATCCCCAACTGCAAGTCTGAGGAGGAAACCTCGCCCCTGGTGACACACCAGAACCCCGCTGGCCCCGTGGCCAGTGCCCCAGAACTGGAATCCAAGGAGTCGGTTCTGTGAAGAGCTTAGAGACCGACGACTGCCTAAGGACATTCGCTGAGGTGTCAATCGTCGGAGCCGCTGAGCAATAACTAGCATAACCCCTTGGGGCCTCTAAACGGGTCTTGAGGGGTTTTTTGCATGGTCATAGCTGTTTCCTGAGAGCTTGGCAGGTGATGACACACATTAACAAATTTCGTGAGGAGTCTCCAGAAGAATGCCATTAATTTCCATAGGCTCCGCCCCCCTGACGAGCATCACAAAAATCGACGCTCAAGTCAGAGGTGGCGAAACCCGACAGGACTATAAAGATACCAGGCGTTTCCCCCTGGAAGCTCCCTCGTGCGCTCTCCTGTTCCGACCCTGCCGCTTACCGGATACCTGTCCGCCTTTCTCCCTTCGGGAAGCGTGGCGCTTTCTCATAGCTCACGCTGTAGGTATCTCAGTTCGGTGTAGGTCGTTCGCTCCAAGCTGGGCTGTGTGCACGAACCCCCCGTTCAGCCCGACCGCTGCGCCTTATCCGGTAACTATCGTCTTGAGTCCAACCCGGTAAGACACGACTTATCGCCACTGGCAGCAGCCACTGGTAACAGGATTAGCAGAGCGAGGTATGTAGGCGGTGCTACAGAGTTCTTGAAGTGGTGGCCTAACTACGGCTACACTAGAAGAACAGTATTTGGTATCTGCGCTCTGCTGAAGCCAGTTACCTTCGGAAAAAGAGTTGGTAGCTCTTGATCCGGCAAACAAACCACCGCTGGTAGCGGTGGTTTTTTTGTTTGCAAGCAGCAGATTACGCGCAG
func Ligate ¶ added in v0.11.0
func Ligate(fragments []Fragment, maxClones int) []CloneSequence
type Codon ¶
type Codon struct {
	Triplet string `json:"triplet"`
	Weight  int    `json:"weight"` // needs to be set to 1 for random chooser
}
Codon holds information for a codon triplet in a struct
type CodonTable ¶
type CodonTable struct {
	StartCodons []string    `json:"start_codons"`
	StopCodons  []string    `json:"stop_codons"`
	AminoAcids  []AminoAcid `json:"amino_acids"`
}
CodonTable holds information for a codon table.
func AddCodonTable ¶ added in v0.11.2
func AddCodonTable(firstCodonTable CodonTable, secondCodonTable CodonTable) CodonTable
AddCodonTable takes 2 CodonTables and adds them together to create a new CodonTable.
Example ¶
sequence := ReadGbk("data/puc19.gbk")
codonTable := GetCodonTable(11)
optimizationTable := sequence.GetOptimizationTable(codonTable)
sequence2 := ReadGbk("data/phix174.gb")
codonTable2 := GetCodonTable(11)
optimizationTable2 := sequence2.GetOptimizationTable(codonTable2)
finalTable := AddCodonTable(optimizationTable, optimizationTable2)
for _, aa := range finalTable.AminoAcids {
	for _, codon := range aa.Codons {
		if codon.Triplet == "GGC" {
			fmt.Println(codon.Weight)
		}
	}
}
Output: 90
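Conceptually, adding two codon tables means summing the weights of matching triplets. The sketch below shows that idea on local types. It is not poly's code: `addWeights` is a hypothetical name, the `AminoAcid` shape (a letter plus a list of codons) is an assumption because that type is not shown in this section, and the sketch assumes both inputs list the same triplets.

```go
package main

import "fmt"

// Codon mirrors the documented Triplet and Weight fields, without JSON tags.
type Codon struct {
	Triplet string
	Weight  int
}

// AminoAcid is an assumed shape for this sketch: a one-letter code and the
// codons that encode it.
type AminoAcid struct {
	Letter string
	Codons []Codon
}

// addWeights is a hypothetical helper. It returns a new list in which each
// codon weight is the sum of the weights found in first and second.
func addWeights(first, second []AminoAcid) []AminoAcid {
	secondWeights := make(map[string]int)
	for _, aminoAcid := range second {
		for _, codon := range aminoAcid.Codons {
			secondWeights[codon.Triplet] += codon.Weight
		}
	}
	result := make([]AminoAcid, 0, len(first))
	for _, aminoAcid := range first {
		summed := AminoAcid{Letter: aminoAcid.Letter}
		for _, codon := range aminoAcid.Codons {
			summed.Codons = append(summed.Codons, Codon{
				Triplet: codon.Triplet,
				Weight:  codon.Weight + secondWeights[codon.Triplet],
			})
		}
		result = append(result, summed)
	}
	return result
}

func main() {
	// Made-up weights, chosen only for illustration.
	first := []AminoAcid{{Letter: "G", Codons: []Codon{{Triplet: "GGC", Weight: 40}}}}
	second := []AminoAcid{{Letter: "G", Codons: []Codon{{Triplet: "GGC", Weight: 50}}}}
	fmt.Println(addWeights(first, second)[0].Codons[0].Weight)
}
```

Building a new slice rather than modifying `first` keeps both inputs usable afterwards.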
func CompromiseCodonTable ¶ added in v0.11.2
func CompromiseCodonTable(firstCodonTable CodonTable, secondCodonTable CodonTable, cutOff float64) (CodonTable, error)
CompromiseCodonTable takes 2 CodonTables and makes a new CodonTable that is an equal compromise between the two tables.
Example ¶
sequence := ReadGbk("data/puc19.gbk")
codonTable := GetCodonTable(11)
optimizationTable := sequence.GetOptimizationTable(codonTable)
sequence2 := ReadGbk("data/phix174.gb")
codonTable2 := GetCodonTable(11)
optimizationTable2 := sequence2.GetOptimizationTable(codonTable2)
finalTable, _ := CompromiseCodonTable(optimizationTable, optimizationTable2, 0.1)
for _, aa := range finalTable.AminoAcids {
	for _, codon := range aa.Codons {
		if codon.Triplet == "TAA" {
			fmt.Println(codon.Weight)
		}
	}
}
Output: 2727
func GetCodonTable ¶
func GetCodonTable(index int) CodonTable
GetCodonTable takes the index of desired NCBI codon table and returns it.
func ParseCodonJSON ¶
func ParseCodonJSON(file []byte) CodonTable
ParseCodonJSON parses a CodonTable JSON file.
Example ¶
file, _ := ioutil.ReadFile("data/bsub_codon_test.json")
codontable := ParseCodonJSON(file)
fmt.Println(codontable.AminoAcids[0].Codons[0].Weight)
Output: 28327
func ReadCodonJSON ¶
func ReadCodonJSON(path string) CodonTable
ReadCodonJSON reads a CodonTable JSON file.
Example ¶
codontable := ReadCodonJSON("data/bsub_codon_test.json")
fmt.Println(codontable.AminoAcids[0].Codons[0].Weight)
Output: 28327
func (CodonTable) OptimizeTable ¶
func (codonTable CodonTable) OptimizeTable(sequence string) CodonTable
OptimizeTable weights each codon in a codon table according to input string codon frequency. This function actually mutates the CodonTable struct itself.
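Weighting by codon frequency starts with counting how often each triplet occurs in a coding sequence. The following minimal sketch shows only that counting step; `countCodons` is a hypothetical name, and reading the input in frame from its first base is an assumption. How poly turns such counts into table weights is not shown in this section.

```go
package main

import (
	"fmt"
	"strings"
)

// countCodons is a hypothetical helper. It walks sequence in frame, three
// bases at a time, and counts each triplet. Trailing bases that do not
// fill a whole codon are ignored.
func countCodons(sequence string) map[string]int {
	counts := make(map[string]int)
	sequence = strings.ToUpper(sequence)
	for i := 0; i+3 <= len(sequence); i += 3 {
		counts[sequence[i:i+3]]++
	}
	return counts
}

func main() {
	// Codons in frame: ATG GGC GGC TAA.
	counts := countCodons("ATGGGCGGCTAA")
	fmt.Println(counts["GGC"])
}
```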
type DnaSuggestion ¶ added in v0.11.2
type DnaSuggestion struct {
	Start          int    `db:"start"`
	End            int    `db:"end"`
	Bias           string `db:"gcbias"`
	QuantityFixes  int    `db:"quantityfixes"`
	SuggestionType string `db:"suggestiontype"`
	Step           int    `db:"step"`
	Id             int    `db:"id"`
}
DnaSuggestion is a suggestion of a fixer, generated by a problematicSequenceFunc.
type Feature ¶
type Feature struct {
	Name string // Seqid in gff, name in gbk
	// gff specific
	Source               string            `json:"source"`
	Type                 string            `json:"type"`
	Score                string            `json:"score"`
	Strand               string            `json:"strand"`
	Phase                string            `json:"phase"`
	Attributes           map[string]string `json:"attributes"`
	GbkLocationString    string            `json:"gbk_location_string"`
	Sequence             string            `json:"sequence"`
	SequenceLocation     Location          `json:"sequence_location"`
	SequenceHash         string            `json:"sequence_hash"`
	Description          string            `json:"description"`
	SequenceHashFunction string            `json:"hash_function"`
	ParentSequence       *Sequence         `json:"-"`
}
Feature holds a single annotation in a struct. From https://github.com/blachlylab/gff3/blob/master/gff3.go
func (Feature) GetSequence ¶
GetSequence is a method wrapper to get a Feature's sequence. Mutates with Sequence.
type Fragment ¶ added in v0.11.0
func RestrictionEnzymeCut ¶ added in v0.11.0
func RestrictionEnzymeCut(seq CloneSequence, enzymeStr string) ([]Fragment, error)
func RestrictionEnzymeCutEnzymeStruct ¶ added in v0.11.0
func RestrictionEnzymeCutEnzymeStruct(seq CloneSequence, enzyme Enzyme) []Fragment
type Location ¶
type Location struct {
	Start             int        `json:"start"`
	End               int        `json:"end"`
	Complement        bool       `json:"complement"`
	Join              bool       `json:"join"`
	FivePrimePartial  bool       `json:"five_prime_partial"`
	ThreePrimePartial bool       `json:"three_prime_partial"`
	SubLocations      []Location `json:"sub_locations"`
}
Location holds nested location info for sequence region.
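To illustrate how a nested Location might be turned into a sequence, here is a minimal sketch. It is not poly's implementation. It assumes Start and End are zero-based, half-open offsets into the parent sequence, and that Complement applies to the joined result of all SubLocations. The local `Location` type copies only the fields it needs, and `resolve` and `reverseComplement` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"strings"
)

// Location mirrors a subset of the fields documented above.
type Location struct {
	Start        int
	End          int
	Complement   bool
	SubLocations []Location
}

// reverseComplement is a local helper for upper-case DNA.
// Bases other than A, T, G and C are mapped to 'N'.
func reverseComplement(seq string) string {
	pairs := map[byte]byte{'A': 'T', 'T': 'A', 'G': 'C', 'C': 'G'}
	out := make([]byte, len(seq))
	for i := 0; i < len(seq); i++ {
		base, ok := pairs[seq[i]]
		if !ok {
			base = 'N'
		}
		out[len(seq)-1-i] = base
	}
	return string(out)
}

// resolve is a hypothetical helper. A leaf location slices the parent
// directly; a location with sub-locations concatenates their resolved
// sequences in order. The complement flag is applied last.
func resolve(loc Location, parent string) string {
	var b strings.Builder
	if len(loc.SubLocations) == 0 {
		b.WriteString(parent[loc.Start:loc.End])
	} else {
		for _, sub := range loc.SubLocations {
			b.WriteString(resolve(sub, parent))
		}
	}
	if loc.Complement {
		return reverseComplement(b.String())
	}
	return b.String()
}

func main() {
	parent := "AAACCCGGGTTT"
	join := Location{SubLocations: []Location{{Start: 0, End: 3}, {Start: 6, End: 9}}}
	fmt.Println(resolve(join, parent)) // prints "AAAGGG"

	join.Complement = true
	fmt.Println(resolve(join, parent)) // prints "CCCTTT"
}
```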
type Locus ¶
type Locus struct {
    Name             string `json:"name"`
    SequenceLength   string `json:"sequence_length"`
    MoleculeType     string `json:"molecule_type"`
    GenbankDivision  string `json:"genbank_division"`
    ModificationDate string `json:"modification_date"`
    SequenceCoding   string `json:"sequence_coding"`
    Circular         bool   `json:"circular"`
    Linear           bool   `json:"linear"`
}
Locus holds Locus information in a Meta struct.
type Meta ¶
type Meta struct {
    Name       string            `json:"name"`
    GffVersion string            `json:"gff_version"`
    RegionStart int              `json:"region_start"`
    RegionEnd  int               `json:"region_end"`
    Size       int               `json:"size"`
    Type       string            `json:"type"`
    Date       string            `json:"date"`
    Definition string            `json:"definition"`
    Accession  string            `json:"accession"`
    Version    string            `json:"version"`
    Keywords   string            `json:"keywords"`
    Organism   string            `json:"organism"`
    Source     string            `json:"source"`
    Origin     string            `json:"origin"`
    Locus      Locus             `json:"locus"`
    References []Reference       `json:"references"`
    Other      map[string]string `json:"other"`
}
Meta holds all the meta information of a Sequence struct.
type Reference ¶
type Reference struct {
    Index   string `json:"index"`
    Authors string `json:"authors"`
    Title   string `json:"title"`
    Journal string `json:"journal"`
    PubMed  string `json:"pub_med"`
    Remark  string `json:"remark"`
    Range   string `json:"range"`
}
Reference holds information for one reference in a Meta struct.
type Sequence ¶
type Sequence struct {
    Meta                 Meta      `json:"meta"`
    Description          string    `json:"description"`
    SequenceHash         string    `json:"sequence_hash"`
    SequenceHashFunction string    `json:"hash_function"`
    Sequence             string    `json:"sequence"`
    Features             []Feature `json:"features"`
}
Sequence holds all sequence information in a single struct.
func ParseGbk ¶
ParseGbk takes in a string representing a gbk/gb/genbank file and parses it into a Sequence object.
Example ¶
file, _ := ioutil.ReadFile("data/puc19.gbk")
sequence := ParseGbk(file)
fmt.Println(sequence.Meta.Locus.ModificationDate)
Output: 22-OCT-2019
func ParseGbkFlat ¶
ParseGbkFlat specifically takes the contents of a Genbank flat file from the Genbank FTP dumps. These files have 10-line headers, which are removed entirely.
Example ¶
file, _ := ioutil.ReadFile("data/flatGbk_test.seq")
sequences := ParseGbkFlat(file)
var locus []string
for _, sequence := range sequences {
    locus = append(locus, sequence.Meta.Locus.Name)
}
fmt.Println(strings.Join(locus, ", "))
Output: AB000100, AB000106
func ParseGbkMulti ¶
ParseGbkMulti parses multiple Genbank files in a byte array into multiple sequences.
Example ¶
file, _ := ioutil.ReadFile("data/multiGbk_test.seq")
sequences := ParseGbkMulti(file)
var locus []string
for _, sequence := range sequences {
    locus = append(locus, sequence.Meta.Locus.Name)
}
fmt.Println(strings.Join(locus, ", "))
Output: AB000100, AB000106
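As an illustration of how several records in one file can be told apart, the sketch below splits text on the `//` line that terminates each GenBank record. This is an assumption about how such a split could be done, not a description of what ParseGbkMulti does internally. `splitRecords` is a hypothetical helper.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// splitRecords is a hypothetical helper, not part of poly. It groups lines
// into records and closes a record at each line that is exactly "//"
// (ignoring surrounding whitespace). Lines after the last terminator are
// dropped because they do not form a complete record.
func splitRecords(text string) []string {
	var records []string
	var current []string
	scanner := bufio.NewScanner(strings.NewReader(text))
	for scanner.Scan() {
		line := scanner.Text()
		current = append(current, line)
		if strings.TrimSpace(line) == "//" {
			records = append(records, strings.Join(current, "\n"))
			current = nil
		}
	}
	return records
}

func main() {
	text := "LOCUS       AB000100\nORIGIN\n//\nLOCUS       AB000106\nORIGIN\n//\n"
	for _, record := range splitRecords(text) {
		firstLine := strings.SplitN(record, "\n", 2)[0]
		fmt.Println(strings.Fields(firstLine)[1])
	}
}
```

Note that `bufio.Scanner` has a default per-line size limit, so very long lines would need a larger buffer.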
func ParseGff ¶
ParseGff takes in a string representing a gffv3 file and parses it into a Sequence object.
Example ¶
file, _ := ioutil.ReadFile("data/ecoli-mg1655-short.gff")
sequence := ParseGff(file)
fmt.Println(sequence.Meta.Name)
Output: U00096.3
func ParseJSON ¶
ParseJSON parses a Sequence JSON file and adds the appropriate pointers to the struct.
Example ¶
file, _ := ioutil.ReadFile("data/sample.json")
sequence := ParseJSON(file)
fmt.Println(sequence.Meta.Source)
Output: Saccharomyces cerevisiae (baker's yeast)
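The "pointers" mentioned above plausibly refer to Feature.ParentSequence, which is tagged `json:"-"` and therefore never written to or read from JSON. The sketch below shows one way such back-pointers can be restored after decoding. The types are simplified stand-ins and `parse` is a hypothetical helper, so treat this as an illustration rather than poly's code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Simplified stand-ins for the documented types.
type Feature struct {
	Name           string    `json:"name"`
	ParentSequence *Sequence `json:"-"` // skipped by encoding/json
}

type Sequence struct {
	Sequence string    `json:"sequence"`
	Features []Feature `json:"features"`
}

// parse is a hypothetical helper. It decodes the JSON and then points every
// feature back at the sequence that owns it, since the decoder leaves the
// `json:"-"` field nil.
func parse(data []byte) (*Sequence, error) {
	var seq Sequence
	if err := json.Unmarshal(data, &seq); err != nil {
		return nil, err
	}
	for i := range seq.Features {
		seq.Features[i].ParentSequence = &seq
	}
	return &seq, nil
}

func main() {
	seq, err := parse([]byte(`{"sequence":"ATGC","features":[{"name":"cds"}]}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(seq.Features[0].ParentSequence.Sequence) // prints "ATGC"
}
```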
func ReadGbk ¶
ReadGbk reads a Gbk file from path and parses it into an annotated Sequence struct.
Example ¶
sequence := ReadGbk("data/puc19.gbk")
fmt.Println(sequence.Meta.Locus.ModificationDate)
Output: 22-OCT-2019
func ReadGbkFlat ¶
ReadGbkFlat reads flat Genbank files, like the ones provided by the NCBI FTP server (after decompression).
Example ¶
sequences := ReadGbkFlat("data/long_comment.seq")
var locus []string
for _, sequence := range sequences {
    locus = append(locus, sequence.Meta.Locus.Name)
}
fmt.Println(strings.Join(locus, ", "))
Output: AB000100, AB000106
func ReadGbkFlatGz ¶
ReadGbkFlatGz reads flat gzip'd Genbank files, like the ones provided by the NCBI FTP server.
Example ¶
sequences := ReadGbkFlatGz("data/flatGbk_test.seq.gz")
//sequences := ReadGbkFlatGz("data/gbbct358.seq.gz")
var locus []string
for _, sequence := range sequences {
    locus = append(locus, sequence.Meta.Locus.Name)
}
fmt.Println(strings.Join(locus, ", "))
Output: AB000100, AB000106
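Reading a gzip'd flat file implies a decompression step before parsing. The sketch below shows that step alone with the standard library's `compress/gzip`. It works on in-memory bytes so it runs without a file on disk, and it makes no claim about how ReadGbkFlatGz is implemented. `gunzip` and `gzipBytes` are hypothetical helpers.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// gunzip is a hypothetical helper. It decompresses gzip data held in memory
// and returns the raw bytes, which could then be handed to a parser.
func gunzip(compressed []byte) ([]byte, error) {
	reader, err := gzip.NewReader(bytes.NewReader(compressed))
	if err != nil {
		return nil, err
	}
	defer reader.Close()
	return io.ReadAll(reader)
}

// gzipBytes exists only so the example can round-trip without test data.
func gzipBytes(raw []byte) ([]byte, error) {
	var buf bytes.Buffer
	writer := gzip.NewWriter(&buf)
	if _, err := writer.Write(raw); err != nil {
		return nil, err
	}
	if err := writer.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	compressed, err := gzipBytes([]byte("LOCUS       AB000100"))
	if err != nil {
		panic(err)
	}
	raw, err := gunzip(compressed)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", raw)
}
```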
func ReadGbkMulti ¶
ReadGbkMulti reads multiple Genbank records from a single file.
Example ¶
sequences := ReadGbkMulti("data/multiGbk_test.seq")
var locus []string
for _, sequence := range sequences {
    locus = append(locus, sequence.Meta.Locus.Name)
}
fmt.Println(strings.Join(locus, ", "))
Output: AB000100, AB000106
func ReadGff ¶
ReadGff takes in a filepath for a .gffv3 file and parses it into an Annotated Sequence struct.
Example ¶
sequence := ReadGff("data/ecoli-mg1655-short.gff")
fmt.Println(sequence.Meta.Name)
Output: U00096.3
func ReadJSON ¶
ReadJSON reads a Sequence JSON file.
Example ¶
sequence := ReadJSON("data/sample.json")
fmt.Println(sequence.Meta.Source)
Output: Saccharomyces cerevisiae (baker's yeast)
func (*Sequence) AddFeature ¶
AddFeature is the canonical way to add a Feature to a Sequence struct. Appending a Feature struct directly to Sequence.Features will break the .GetSequence() method.
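The warning above makes sense given that Feature carries a ParentSequence pointer: a feature appended by hand has no link back to its sequence. The sketch below reproduces that situation with simplified stand-in types. `addFeature` and `getSequence` are hypothetical lower-case analogues, and returning an error for an unlinked feature is a choice made for this sketch, not documented poly behaviour.

```go
package main

import (
	"errors"
	"fmt"
)

// Simplified stand-ins for the documented types.
type Feature struct {
	Name           string
	Start, End     int
	ParentSequence *Sequence
}

type Sequence struct {
	Sequence string
	Features []Feature
}

// addFeature sets the back-pointer before appending, so the feature can
// later look up its bases in the parent sequence.
func (s *Sequence) addFeature(f Feature) {
	f.ParentSequence = s
	s.Features = append(s.Features, f)
}

// getSequence slices the parent sequence. Without a parent there is nothing
// to slice, so this sketch reports an error.
func (f Feature) getSequence() (string, error) {
	if f.ParentSequence == nil {
		return "", errors.New("feature has no parent sequence")
	}
	return f.ParentSequence.Sequence[f.Start:f.End], nil
}

func main() {
	seq := &Sequence{Sequence: "ATGCATGC"}
	seq.addFeature(Feature{Name: "linked", Start: 0, End: 3})
	// Appending directly skips the back-pointer.
	seq.Features = append(seq.Features, Feature{Name: "orphan", Start: 3, End: 6})

	for _, f := range seq.Features {
		bases, err := f.getSequence()
		fmt.Println(f.Name, bases, err)
	}
}
```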
func (Sequence) GetOptimizationTable ¶
func (sequence Sequence) GetOptimizationTable(codonTable CodonTable) CodonTable
GetOptimizationTable is a Sequence method that takes a CodonTable and weights it to be used to optimize inserts.
func (Sequence) GetSequence ¶
GetSequence is a method to get the full sequence of an annotated sequence.
func (Sequence) Hash ¶
Hash is a method wrapper for hashing Sequence structs. Note that all sequence structs are, by default, double-stranded sequences, since Genbank does not track whether or not a given sequence in their database is single stranded or double stranded.
Example ¶
sequence := ReadGbk("data/puc19.gbk")
// Seqhash assumes doubleStranded sequence and defaults to linear
// if sequence.Meta.Locus.Circular is not set
seqhash, _ := sequence.Hash()
fmt.Println(seqhash)
Output: v1_DCD_4b0616d1b3fc632e42d78521deb38b44fba95cca9fde159e01cd567fa996ceb9
Source Files ¶
Directories ¶
Path | Synopsis |
---|---|
parsers | |
| Poly command line utility installation instructions. Mac OSX: brew install timothystiles/poly/poly. Linux (deb/rpm): download the .deb or .rpm from the releases page https://github.com/TimothyStiles/poly/releases and install with `dpkg -i` or `rpm -i` respectively. Windows: coming soon. |