poly

v0.12.0
Warning

This package is not in the latest version of its module.

Published: Jun 5, 2021 License: MIT Imports: 23 Imported by: 2

README

(Poly)merase

Poly is a Go package for engineering organisms.

  • Fast: Poly is fast and scalable.

  • Modern: Poly tackles issues that other libraries and utilities just don't, from general codon optimization and primer design to circular sequence hashing. It's all written in a language that was designed to be fast, scalable, and easy to develop in and maintain. Did we say it was fast?

  • Reproducible: Poly is well tested and designed to be used in industrial, academic, and hobbyist settings. No more copying and pasting strings into random websites to process the data you need.

  • Ambitious: Poly's goal is to be the most complete, open, and well used collection of computational synthetic biology tools ever assembled. If you like our dream and want to support us, please star this repo, request a feature, open a pull request, or sponsor the project.

Documentation

Community

  • Discord: Chat about Poly and join us for game nights on our Discord server!

Contributing

  • Code of conduct: Please read the full text so you can understand what we're all about and remember to be excellent to each other!

  • Contributor's guide: Please read through it before you start hacking away and pushing contributions to this fine codebase.

Sponsor

  • Sponsor: 🤘 Thanks for your support 🤘

License

  • MIT

  • Copyright (c) 2021 Timothy Stiles

Documentation

Overview

Package poly is a Go package for engineering organisms.

Poly can be used in two ways.

  1. As a Go library where you have finer control and can make magical things happen.
  2. As a command line utility where you can bash script your way to greatness and make DNA go brrrrrrrr.

Installation

These instructions assume that you already have a working Go environment. If not, see:

https://golang.org/doc/install

Building Poly CLI and package from scratch:

git clone https://github.com/TimothyStiles/poly.git && cd poly && go build ./... && go install ./...

Installing the latest release of Poly as a Go package:

go get github.com/TimothyStiles/poly

For CLI-only instructions, please check out: https://pkg.go.dev/github.com/TimothyStiles/poly/poly

Constants

This section is empty.

Variables

This section is empty.

Functions

func AllVariantsIUPAC added in v0.12.0

func AllVariantsIUPAC(seq string) ([]string, error)

AllVariantsIUPAC takes a sequence containing IUPAC ambiguity codes and returns every unambiguous sequence it can represent.

Example
// AllVariantsIUPAC takes a string as input
// and returns all iupac variants as output
mendelIUPAC := "ATGGARAAYGAYGARCTN"
// ambiguous IUPAC codes for most of the sequences that code for the protein MENDEL
mendelIUPACvariants, _ := AllVariantsIUPAC(mendelIUPAC)
fmt.Println(mendelIUPACvariants)
Output:

[ATGGAGAATGATGAGCTG ATGGAGAATGATGAGCTA ATGGAGAATGATGAGCTT ATGGAGAATGATGAGCTC ATGGAGAATGATGAACTG ATGGAGAATGATGAACTA ATGGAGAATGATGAACTT ATGGAGAATGATGAACTC ATGGAGAATGACGAGCTG ATGGAGAATGACGAGCTA ATGGAGAATGACGAGCTT ATGGAGAATGACGAGCTC ATGGAGAATGACGAACTG ATGGAGAATGACGAACTA ATGGAGAATGACGAACTT ATGGAGAATGACGAACTC ATGGAGAACGATGAGCTG ATGGAGAACGATGAGCTA ATGGAGAACGATGAGCTT ATGGAGAACGATGAGCTC ATGGAGAACGATGAACTG ATGGAGAACGATGAACTA ATGGAGAACGATGAACTT ATGGAGAACGATGAACTC ATGGAGAACGACGAGCTG ATGGAGAACGACGAGCTA ATGGAGAACGACGAGCTT ATGGAGAACGACGAGCTC ATGGAGAACGACGAACTG ATGGAGAACGACGAACTA ATGGAGAACGACGAACTT ATGGAGAACGACGAACTC ATGGAAAATGATGAGCTG ATGGAAAATGATGAGCTA ATGGAAAATGATGAGCTT ATGGAAAATGATGAGCTC ATGGAAAATGATGAACTG ATGGAAAATGATGAACTA ATGGAAAATGATGAACTT ATGGAAAATGATGAACTC ATGGAAAATGACGAGCTG ATGGAAAATGACGAGCTA ATGGAAAATGACGAGCTT ATGGAAAATGACGAGCTC ATGGAAAATGACGAACTG ATGGAAAATGACGAACTA ATGGAAAATGACGAACTT ATGGAAAATGACGAACTC ATGGAAAACGATGAGCTG ATGGAAAACGATGAGCTA ATGGAAAACGATGAGCTT ATGGAAAACGATGAGCTC ATGGAAAACGATGAACTG ATGGAAAACGATGAACTA ATGGAAAACGATGAACTT ATGGAAAACGATGAACTC ATGGAAAACGACGAGCTG ATGGAAAACGACGAGCTA ATGGAAAACGACGAGCTT ATGGAAAACGACGAGCTC ATGGAAAACGACGAACTG ATGGAAAACGACGAACTA ATGGAAAACGACGAACTT ATGGAAAACGACGAACTC]
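The expansion above is a Cartesian product over the bases each IUPAC code allows. Below is a minimal, self-contained sketch of that idea. It assumes the standard IUPAC nucleotide codes. expandIUPAC and iupacBases are hypothetical names and not poly's implementation. The sketch also emits variants in a different order from the documented output.

```go
package main

import (
	"fmt"
	"strings"
)

// iupacBases maps each IUPAC nucleotide code to the bases it stands for.
var iupacBases = map[rune]string{
	'A': "A", 'C': "C", 'G': "G", 'T': "T",
	'R': "AG", 'Y': "CT", 'S': "GC", 'W': "AT",
	'K': "GT", 'M': "AC", 'B': "CGT", 'D': "AGT",
	'H': "ACT", 'V': "ACG", 'N': "ACGT",
}

// expandIUPAC (hypothetical helper) returns every unambiguous sequence
// encoded by seq, building the Cartesian product one position at a time.
func expandIUPAC(seq string) ([]string, error) {
	variants := []string{""}
	for i, code := range strings.ToUpper(seq) {
		bases, ok := iupacBases[code]
		if !ok {
			return nil, fmt.Errorf("position %d: %q is not an IUPAC nucleotide code", i, code)
		}
		next := make([]string, 0, len(variants)*len(bases))
		for _, prefix := range variants {
			for _, base := range bases {
				next = append(next, prefix+string(base))
			}
		}
		variants = next
	}
	return variants, nil
}

func main() {
	variants, err := expandIUPAC("ATGGARAAYGAYGARCTN")
	if err != nil {
		fmt.Println(err)
		return
	}
	// 64 = 2 (R) * 2 (Y) * 2 (Y) * 2 (R) * 4 (N)
	fmt.Println(len(variants))
}
```

The number of variants grows multiplicatively with each ambiguous position, so long, highly ambiguous inputs can produce very large outputs.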

func BuildGbk

func BuildGbk(sequence Sequence) []byte

BuildGbk builds a GenBank (GBK) file as a byte slice, ready to be written out to a database or file.

Example
sequence := ReadGbk("data/puc19.gbk")
gbkBytes := BuildGbk(sequence)
testSequence := ParseGbk(gbkBytes)

fmt.Println(testSequence.Meta.Locus.ModificationDate)
Output:

22-OCT-2019

func BuildGff

func BuildGff(sequence Sequence) []byte

BuildGff takes an annotated Sequence and returns a byte slice representing a GFF file, ready to be written out.

Example
sequence := ReadGff("data/ecoli-mg1655-short.gff")
gffBytes := BuildGff(sequence)
reparsedSequence := ParseGff(gffBytes)

fmt.Println(reparsedSequence.Meta.Name)
Output:

U00096.3

func ComplementBase

func ComplementBase(basePair rune) rune

ComplementBase accepts a nucleotide base and returns its complementary base.

func CreateBarcodes added in v0.12.0

func CreateBarcodes(length int, maxSubSequence int) []string

CreateBarcodes is a simplified version of CreateBarcodesWithBannedSequences with sane defaults.

Example
barcodes := CreateBarcodes(20, 4)

fmt.Println(barcodes[0])
Output:

AAAATAAAGAAACAATTAAT

func CreateBarcodesWithBannedSequences added in v0.12.0

func CreateBarcodesWithBannedSequences(length int, maxSubSequence int, bannedSequences []string, bannedFunctions []func(string) bool) []string

CreateBarcodesWithBannedSequences creates a list of barcodes given a desired barcode length and maxSubSequence, which limits the length of subsequences shared between barcodes. Sequences may be banned either by passing a static list, `bannedSequences`, or, if more flexibility is needed, through a list of `bannedFunctions` that decide on the fly whether a candidate barcode should be banned. A banned sequence will not appear within any barcode. If a `bannedFunctions` function bans a candidate, iteration continues until a barcode is found that satisfies every bannedFunction requirement.

Example
barcodes := CreateBarcodesWithBannedSequences(20, 4, []string{"CTCTCGGTCGCTCC"}, []func(string) bool{})

fmt.Println(barcodes[0])
Output:

AAAATAAAGAAACAATTAAT

func FindBsaI added in v0.11.2

func FindBsaI(sequence string, c chan DnaSuggestion, wg *sync.WaitGroup)

FindBsaI is a simple problematicSequenceFunc, for use in testing

func FindTypeIIS added in v0.11.2

func FindTypeIIS(sequence string, c chan DnaSuggestion, wg *sync.WaitGroup)

FindTypeIIS is a problematicSequenceFunc used for finding Type IIS restriction enzyme recognition sites. It finds sites for BbsI, BsaI, BtgZI, BsmBI, SapI, and PaqCI (AarI).
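Type IIS recognition sites are not palindromic, so a search has to scan both the sequence and its reverse complement. The sketch below shows that two-strand scan for the enzymes named above, using their standard recognition sequences. findTypeIIS and siteHit are hypothetical. The sketch does not use poly's DnaSuggestion channel and WaitGroup plumbing and is not its implementation.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// typeIISSites holds the recognition sequences of the enzymes listed above.
// PaqCI and AarI recognize the same site, CACCTGC.
var typeIISSites = map[string]string{
	"BbsI":  "GAAGAC",
	"BsaI":  "GGTCTC",
	"BtgZI": "GCGATG",
	"BsmBI": "CGTCTC",
	"SapI":  "GCTCTTC",
	"PaqCI": "CACCTGC",
}

// siteHit records one recognition site found on either strand.
type siteHit struct {
	Enzyme  string
	Start   int  // 0-based position on the forward strand
	Reverse bool // true if the site was found on the reverse strand
}

// reverseComplement handles uppercase A, C, G and T only.
func reverseComplement(s string) string {
	pairs := map[byte]byte{'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C'}
	out := make([]byte, len(s))
	for i := 0; i < len(s); i++ {
		out[len(s)-1-i] = pairs[s[i]]
	}
	return string(out)
}

// findTypeIIS (hypothetical) reports every forward and reverse-strand site.
func findTypeIIS(sequence string) []siteHit {
	sequence = strings.ToUpper(sequence)
	var hits []siteHit
	for enzyme, site := range typeIISSites {
		patterns := map[bool]string{false: site, true: reverseComplement(site)}
		for reverse, pattern := range patterns {
			for offset := 0; ; {
				i := strings.Index(sequence[offset:], pattern)
				if i < 0 {
					break
				}
				hits = append(hits, siteHit{Enzyme: enzyme, Start: offset + i, Reverse: reverse})
				offset += i + 1
			}
		}
	}
	// Map iteration order is random, so sort for stable output.
	sort.Slice(hits, func(a, b int) bool {
		if hits[a].Start != hits[b].Start {
			return hits[a].Start < hits[b].Start
		}
		return hits[a].Enzyme < hits[b].Enzyme
	})
	return hits
}

func main() {
	fmt.Println(findTypeIIS("ttGGTCTCaaGAGACCtt")) // [{BsaI 2 false} {BsaI 10 true}]
}
```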

func FixCds added in v0.11.2

func FixCds(sqlitePath string, sequence string, codontable CodonTable, problematicSequenceFuncs []func(string, chan DnaSuggestion, *sync.WaitGroup)) (string, error)

FixCds fixes problematic regions of a CDS, given a SQLite database path, the CDS sequence, a codon table, and a list of problematicSequenceFuncs that find the regions to fix.

Example
bla := "ATGAGTATTCAACATTTCCGTGTCGCCCTTATTCCCTTTTTTGCGGCATTTTGCCTTCCTGTTTTTGCTCACCCAGAAACGCTGGTGAAAGTAAAAGATGCTGAAGATCAGTTGGGTGCACGAGTGGGTTACATCGAACTGGATCTCAACAGCGGTAAGATCCTTGAGAGTTTTCGCCCCGAAGAACGTTTTCCAATGATGAGCACTTTTAAAGTTCTGCTATGTGGCGCGGTATTATCCCGTATTGACGCCGGGCAAGAGCAACTCGGTCGCCGCATACACTATTCTCAGAATGACTTGGTTGAGTACTCACCAGTCACAGAAAAGCATCTTACGGATGGCATGACAGTAAGAGAATTATGCAGTGCTGCCATAACCATGAGTGATAACACTGCGGCCAACTTACTTCTGACAACGATCGGAGGACCGAAGGAGCTAACCGCTTTTTTGCACAACATGGGGGATCATGTAACTCGCCTTGATCGTTGGGAACCGGAGCTGAATGAAGCCATACCAAACGACGAGCGTGACACCACGATGCCTGTAGCAATGGCAACAACGTTGCGCAAACTATTAACTGGCGAACTACTTACTCTAGCTTCCCGGCAACAATTAATAGACTGGATGGAGGCGGATAAAGTTGCAGGACCACTTCTGCGCTCGGCCCTTCCGGCTGGCTGGTTTATTGCTGATAAATCTGGAGCCGGTGAGCGTGGGTCTCGCGGTATCATTGCAGCACTGGGGCCAGATGGTAAGCCCTCCCGTATCGTAGTTATCTACACGACGGGGAGTCAGGCAACTATGGATGAACGAAATAGACAGATCGCTGAGATAGGTGCCTCACTGATTAAGCATTGGTAA"
sequence := ReadGbk("data/ecoli-mg1655.gff")
codonTable := GetCodonTable(11)
optimizationTable := sequence.GetOptimizationTable(codonTable)

var functions []func(string, chan DnaSuggestion, *sync.WaitGroup)
//functions = append(functions, FindBsaI)
functions = append(functions, FindTypeIIS)

fixedSeq, _ := FixCds(":memory:", bla, optimizationTable, functions)
fmt.Println(fixedSeq)
Output:

ATGAGTATTCAACATTTCCGTGTCGCCCTTATTCCCTTTTTTGCGGCATTTTGCCTTCCTGTTTTTGCTCACCCAGAAACGCTGGTGAAAGTAAAAGATGCTGAAGATCAGTTGGGTGCACGAGTGGGTTACATCGAACTGGATCTCAACAGCGGTAAGATCCTTGAGAGTTTTCGCCCCGAAGAACGTTTTCCAATGATGAGCACTTTTAAAGTTCTGCTATGTGGCGCGGTATTATCCCGTATTGACGCCGGGCAAGAGCAACTCGGTCGCCGCATACACTATTCTCAGAATGACTTGGTTGAGTACTCACCAGTCACAGAAAAGCATCTTACGGATGGCATGACAGTAAGAGAATTATGCAGTGCTGCCATAACCATGAGTGATAACACTGCGGCCAACTTACTTCTGACAACGATCGGAGGACCGAAGGAGCTAACCGCTTTTTTGCACAACATGGGGGATCATGTAACTCGCCTTGATCGTTGGGAACCGGAGCTGAATGAAGCCATACCAAACGACGAGCGTGACACCACGATGCCTGTAGCAATGGCAACAACGTTGCGCAAACTATTAACTGGCGAACTACTTACTCTAGCTTCCCGGCAACAATTAATAGACTGGATGGAGGCGGATAAAGTTGCAGGACCACTTCTGCGCTCGGCCCTTCCGGCTGGCTGGTTTATTGCTGATAAATCTGGAGCCGGTGAGCGTGGATCTCGCGGTATCATTGCAGCACTGGGGCCAGATGGTAAGCCCTCCCGTATCGTAGTTATCTACACGACGGGGAGTCAGGCAACTATGGATGAACGAAATAGACAGATCGCTGAGATAGGTGCCTCACTGATTAAGCATTGG

func Hash added in v0.11.3

func Hash(sequence string, sequenceType string, circular bool, doubleStranded bool) (string, error)

Hash creates a Seqhash, an identifier for a sequence that also takes into account its sequence type and whether it is circular and double stranded.

Example
sequence := ReadGbk("data/puc19.gbk")

seqhash, _ := Hash(sequence.Sequence, "DNA", true, true)
fmt.Println(seqhash)
Output:

v1_DCD_4b0616d1b3fc632e42d78521deb38b44fba95cca9fde159e01cd567fa996ceb9

func MarmurDoty

func MarmurDoty(sequence string) float64

MarmurDoty calculates the melting point of an extremely short DNA sequence (<15 bp) using a modified Marmur Doty formula [Marmur J & Doty P (1962). Determination of the base composition of deoxyribonucleic acid from its thermal denaturation temperature. J Mol Biol, 5, 109-118.]

Example
sequenceString := "ACGTCCGGACTT"
meltingTemp := MarmurDoty(sequenceString)

fmt.Println(meltingTemp)
Output:

31
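The modified Marmur Doty estimate counts bases: 2 °C per A or T and 4 °C per G or C, minus a correction. The sketch below assumes that correction is -7 °C. This value is inferred from the documented example, where 2·5 + 4·7 - 7 = 31 for ACGTCCGGACTT, and is not taken from poly's source. marmurDoty is a hypothetical name.

```go
package main

import "fmt"

// marmurDoty estimates Tm (°C) for very short oligos as 2*(A+T) + 4*(G+C) - 7.
// The -7 offset is an assumption inferred from the documented example output.
func marmurDoty(sequence string) float64 {
	var at, gc float64
	for _, base := range sequence {
		switch base {
		case 'A', 'T', 'a', 't':
			at++
		case 'G', 'C', 'g', 'c':
			gc++
		}
	}
	return 2*at + 4*gc - 7
}

func main() {
	fmt.Println(marmurDoty("ACGTCCGGACTT")) // 31
}
```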

func MeltingTemp

func MeltingTemp(sequence string) float64

MeltingTemp calls SantaLucia with default inputs for primer and salt concentration. (TODO: make a custom function for Phusion, following https://tmcalculator.neb.com/#!/help)

Example
sequenceString := "GTAAAACGACGGCCAGT" // M13 fwd
expectedTM := 52.8
meltingTemp := MeltingTemp(sequenceString)
withinMargin := math.Abs(expectedTM-meltingTemp)/expectedTM < 0.02

fmt.Println(withinMargin)
Output:

true

func NucleobaseDeBruijnSequence added in v0.12.0

func NucleobaseDeBruijnSequence(substringLength int) string

NucleobaseDeBruijnSequence generates a DNA de Bruijn sequence over the alphabet ATGC. A de Bruijn sequence of order n contains every possible length-n substring over its alphabet exactly once. The code is adapted from https://rosettacode.org/wiki/De_Bruijn_sequences#Go

Example
a := NucleobaseDeBruijnSequence(4)

fmt.Println(a)
Output:

AAAATAAAGAAACAATTAATGAATCAAGTAAGGAAGCAACTAACGAACCATATAGATACATTTATTGATTCATGTATGGATGCATCTATCGATCCAGAGACAGTTAGTGAGTCAGGTAGGGAGGCAGCTAGCGAGCCACACTTACTGACTCACGTACGGACGCACCTACCGACCCTTTTGTTTCTTGGTTGCTTCGTTCCTGTGTCTGGGTGGCTGCGTGCCTCTCGGTCGCTCCGTCCCGGGGCGGCCGCGCCCCAAA
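The construction behind this output can be sketched with the FKM algorithm, which concatenates Lyndon words in lexicographic order and then appends the first n-1 characters. That last step makes every length-n substring appear exactly once when the sequence is read linearly: 4^4 = 256 substrings, 259 characters. This is an independent sketch in the spirit of the Rosetta Code solution. deBruijn is a hypothetical name, not poly's code.

```go
package main

import "fmt"

// deBruijn builds a de Bruijn sequence of order n over alphabet by
// concatenating Lyndon words (FKM algorithm), then appends the first
// n-1 characters so the result can be read as a linear string.
func deBruijn(alphabet string, n int) string {
	if n < 1 {
		return ""
	}
	k := len(alphabet)
	a := make([]int, n+1)
	var seq []int
	var db func(t, p int)
	db = func(t, p int) {
		if t > n {
			if n%p == 0 {
				seq = append(seq, a[1:p+1]...)
			}
			return
		}
		a[t] = a[t-p]
		db(t+1, p)
		for j := a[t-p] + 1; j < k; j++ {
			a[t] = j
			db(t+1, t)
		}
	}
	db(1, 1)
	out := make([]byte, 0, len(seq)+n-1)
	for _, i := range seq {
		out = append(out, alphabet[i])
	}
	return string(out) + string(out[:n-1])
}

func main() {
	s := deBruijn("ATGC", 4)
	fmt.Println(len(s), s[:20]) // 259 AAAATAAAGAAACAATTAAT
}
```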

func Optimize

func Optimize(aminoAcids string, codonTable CodonTable) (string, error)

Optimize takes an amino acid sequence and a CodonTable and returns an optimized codon sequence.

Example
gfpTranslation := "MASKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTFSYGVQCFSRYPDHMKRHDFFKSAMPEGYVQERTISFKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNYNSHNVYITADKQKNGIKANFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITHGMDELYK*"

sequence := ReadGbk("data/puc19.gbk")
codonTable := GetCodonTable(11)

optimizationTable := sequence.GetOptimizationTable(codonTable)

optimizedSequence, _ := Optimize(gfpTranslation, optimizationTable)
optimizedSequenceTranslation, _ := Translate(optimizedSequence, optimizationTable)

fmt.Println(optimizedSequenceTranslation == gfpTranslation)
Output:

true

func ParseFASTAGz added in v0.9.0

func ParseFASTAGz(r io.Reader, sequences chan<- Fasta)

func RandomProteinSequence added in v0.11.13

func RandomProteinSequence(length int, seed int64) (string, error)

RandomProteinSequence returns a random protein sequence of the given length as a string. The sequence starts with the amino acid M (methionine) and ends with * (stop codon). The random generator is seeded with the provided seed.

Example
// RandomProteinSequence builds a protein sequence from just two arguments: a length and a seed used to generate the sequence randomly. The length needs to be greater than two because every sequence already has a start and a stop codon. The seed makes this test deterministic.
randomProtein, _ := RandomProteinSequence(15, 2)
fmt.Println(randomProtein)
Output:

MHHPAFRMFNTMYG*

func ReadFASTAConcurrent added in v0.9.5

func ReadFASTAConcurrent(path string, sequences chan<- Fasta)
Example
fastas := make(chan Fasta, 1000)
go ReadFASTAConcurrent("data/smallfasta.fasta", fastas)
var name string
for fasta := range fastas {
	name = fasta.Name
}

fmt.Println(name)
Output:

camR-2|AGAC,AGGT

func ReadFASTAGz added in v0.9.0

func ReadFASTAGz(path string, sequences chan<- Fasta)
Example
fastas := make(chan Fasta, 1000)
go ReadFASTAGz("data/uniprot_1mb_test.fasta.gz", fastas)
var name string
for fasta := range fastas {
	name = fasta.Name
}

fmt.Println(name)
Output:

sp|P86857|AGP_MYTCA Alanine and glycine-rich protein (Fragment) OS=Mytilus californianus OX=6549 PE=1 SV=1

func ReverseComplement

func ReverseComplement(sequence string) string

ReverseComplement returns the reverse complement of a sequence.
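Taking a reverse complement means complementing each base (as ComplementBase does for a single base) and reversing the order. Below is a minimal sketch covering A, C, G, T and N in either case. complementBase and reverseComplement here are hypothetical stand-ins. Poly's own functions may also handle IUPAC ambiguity codes, which this sketch leaves unchanged.

```go
package main

import "fmt"

// complements maps each supported base to its Watson-Crick partner.
// N (any base) complements to N; other IUPAC codes are not handled here.
var complements = map[rune]rune{
	'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C', 'N': 'N',
	'a': 't', 't': 'a', 'c': 'g', 'g': 'c', 'n': 'n',
}

// complementBase returns the complement of base, or base itself if unknown.
func complementBase(base rune) rune {
	if c, ok := complements[base]; ok {
		return c
	}
	return base
}

// reverseComplement reverses the sequence and complements every base.
func reverseComplement(sequence string) string {
	runes := []rune(sequence)
	out := make([]rune, len(runes))
	for i, r := range runes {
		out[len(runes)-1-i] = complementBase(r)
	}
	return string(out)
}

func main() {
	fmt.Println(reverseComplement("GGTCTC")) // GAGACC
}
```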

func RotateSequence

func RotateSequence(sequence string) string

RotateSequence rotates a circular sequence to a deterministic starting point, so that every rotation of the same sequence gives the same result.

Example
sequence := ReadGbk("data/puc19.gbk")
sequenceLength := len(sequence.Sequence)
testSequence := sequence.Sequence[sequenceLength/2:] + sequence.Sequence[0:sequenceLength/2]

fmt.Println(RotateSequence(sequence.Sequence) == RotateSequence(testSequence))
Output:

true
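One common way to pick such a deterministic point is to take the lexicographically smallest rotation. The sketch below assumes that choice. Poly's actual starting point is not stated here, and minimalRotation is a hypothetical name. This naive version is O(n^2), while Booth's algorithm finds the same rotation in O(n).

```go
package main

import "fmt"

// minimalRotation returns the lexicographically smallest rotation of a
// circular sequence. Any rotation of the same molecule yields the same
// result, so it can serve as a canonical starting point.
func minimalRotation(sequence string) string {
	n := len(sequence)
	doubled := sequence + sequence // every rotation is a length-n window
	best := sequence
	for i := 1; i < n; i++ {
		if candidate := doubled[i : i+n]; candidate < best {
			best = candidate
		}
	}
	return best
}

func main() {
	fmt.Println(minimalRotation("GATC") == minimalRotation("TCGA")) // true
}
```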

func SantaLucia

func SantaLucia(sequence string, primerConcentration, saltConcentration, magnesiumConcentration float64) (meltingTemp, dH, dS float64)

SantaLucia calculates the melting point of a short DNA sequence (15-200 bp), using the Nearest Neighbors method [SantaLucia, J. (1998) PNAS, doi:10.1073/pnas.95.4.1460]
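The two-state nearest-neighbor relation from SantaLucia (1998) is written out below. It shows how the returned dH and dS relate to the melting temperature and where the salt concentration enters. The notation is ours, and poly's exact handling of salt and magnesium may differ.

```latex
\begin{aligned}
T_m &= \frac{\Delta H^\circ}{\Delta S^\circ + R \ln\left(C_T / x\right)} - 273.15 \\
\Delta S^\circ([\mathrm{Na}^+]) &= \Delta S^\circ(1\,\mathrm{M\ NaCl}) + 0.368\, N \ln[\mathrm{Na}^+]
\end{aligned}
```

Here ΔH° (cal/mol) and ΔS° (cal/(K·mol)) are sums of nearest-neighbor stack and initiation terms, and R = 1.987 cal/(K·mol). C_T is the total strand concentration. We assume this corresponds to the primerConcentration argument. The divisor x is 4 for non-self-complementary duplexes and 1 for self-complementary ones. N is the number of phosphates in the duplex divided by 2.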

Example
sequenceString := "ACGATGGCAGTAGCATGC"
testCPrimer := 0.1e-6 // primer concentration (M)
testCNa := 350e-3     // salt concentration (M)
testCMg := 0.0        // magnesium concentration (M)
expectedTM := 62.7    // roughly what we're expecting, within a margin of error
meltingTemp, _, _ := SantaLucia(sequenceString, testCPrimer, testCNa, testCMg)
withinMargin := math.Abs(expectedTM-meltingTemp)/expectedTM < 0.02 // checking the margin of error

fmt.Println(withinMargin)
Output:

true

func Translate

func Translate(sequence string, codonTable CodonTable) (string, error)

Translate translates a codon sequence to an amino acid sequence

Example
gfpTranslation := "MASKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTFSYGVQCFSRYPDHMKRHDFFKSAMPEGYVQERTISFKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNYNSHNVYITADKQKNGIKANFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITHGMDELYK*"
gfpDnaSequence := "ATGGCTAGCAAAGGAGAAGAACTTTTCACTGGAGTTGTCCCAATTCTTGTTGAATTAGATGGTGATGTTAATGGGCACAAATTTTCTGTCAGTGGAGAGGGTGAAGGTGATGCTACATACGGAAAGCTTACCCTTAAATTTATTTGCACTACTGGAAAACTACCTGTTCCATGGCCAACACTTGTCACTACTTTCTCTTATGGTGTTCAATGCTTTTCCCGTTATCCGGATCATATGAAACGGCATGACTTTTTCAAGAGTGCCATGCCCGAAGGTTATGTACAGGAACGCACTATATCTTTCAAAGATGACGGGAACTACAAGACGCGTGCTGAAGTCAAGTTTGAAGGTGATACCCTTGTTAATCGTATCGAGTTAAAAGGTATTGATTTTAAAGAAGATGGAAACATTCTCGGACACAAACTCGAGTACAACTATAACTCACACAATGTATACATCACGGCAGACAAACAAAAGAATGGAATCAAAGCTAACTTCAAAATTCGCCACAACATTGAAGATGGATCCGTTCAACTAGCAGACCATTATCAACAAAATACTCCAATTGGCGATGGCCCTGTCCTTTTACCAGACAACCATTACCTGTCGACACAATCTGCCCTTTCGAAAGATCCCAACGAAAAGCGTGACCACATGGTCCTTCTTGAGTTTGTAACTGCTGCTGGGATTACACATGGCATGGATGAGCTCTACAAATAA"
testTranslation, _ := Translate(gfpDnaSequence, GetCodonTable(11)) // need to specify which codons map to which amino acids per NCBI table

fmt.Println(gfpTranslation == testTranslation)
Output:

true
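Translation reads the sequence three bases at a time and looks each codon up in a table. The sketch below encodes the standard NCBI table as a 64-character string indexed with bases ordered T, C, A, G. Tables 1 and 11 share the same amino acid assignments and differ only in their start codons. translate and standardCode are hypothetical names. Unlike poly's Translate, this sketch takes no CodonTable argument and translates alternative start codons literally rather than as M.

```go
package main

import (
	"fmt"
	"strings"
)

// standardCode lists amino acids for NCBI tables 1 and 11, indexed by
// codon with bases ordered T, C, A, G (TTT, TTC, TTA, TTG, TCT, ...).
const standardCode = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

var baseIndex = map[byte]int{'T': 0, 'C': 1, 'A': 2, 'G': 3}

// translate converts a DNA coding sequence to a protein string; '*' marks stops.
func translate(sequence string) (string, error) {
	sequence = strings.ToUpper(sequence)
	if len(sequence)%3 != 0 {
		return "", fmt.Errorf("length %d is not a multiple of 3", len(sequence))
	}
	var protein strings.Builder
	for i := 0; i < len(sequence); i += 3 {
		index := 0
		for _, b := range []byte(sequence[i : i+3]) {
			v, ok := baseIndex[b]
			if !ok {
				return "", fmt.Errorf("invalid base %q in codon starting at %d", b, i)
			}
			index = index*4 + v // index = 16*first + 4*second + third
		}
		protein.WriteByte(standardCode[index])
	}
	return protein.String(), nil
}

func main() {
	protein, err := translate("ATGGCTAGCAAATAA")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(protein) // MASK*
}
```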

func WriteCodonJSON

func WriteCodonJSON(codontable CodonTable, path string)

WriteCodonJSON writes a CodonTable struct out to JSON.

Example
codontable := ReadCodonJSON("data/bsub_codon_test.json")
WriteCodonJSON(codontable, "data/codon_test.json")
testCodonTable := ReadCodonJSON("data/codon_test.json")

// cleaning up test data
os.Remove("data/codon_test.json")

fmt.Println(testCodonTable.AminoAcids[0].Codons[0].Weight)
Output:

28327

func WriteGbk

func WriteGbk(sequence Sequence, path string)

WriteGbk takes a Sequence struct and a path string and writes out a GenBank (GBK) file to that path.

Example
tmpDataDir, err := ioutil.TempDir("", "data-*")
if err != nil {
	fmt.Println(err.Error())
	return
}
defer os.RemoveAll(tmpDataDir)

sequence := ReadGbk("data/puc19.gbk")

tmpGbkFilePath := filepath.Join(tmpDataDir, "puc19.gbk")
WriteGbk(sequence, tmpGbkFilePath)

testSequence := ReadGbk(tmpGbkFilePath)

fmt.Println(testSequence.Meta.Locus.ModificationDate)
Output:

22-OCT-2019

func WriteGff

func WriteGff(sequence Sequence, path string)

WriteGff takes a Sequence struct and a path string and writes out a GFF file to that path.

Example
tmpDataDir, err := ioutil.TempDir("", "data-*")
if err != nil {
	fmt.Println(err.Error())
	return
}
defer os.RemoveAll(tmpDataDir)

sequence := ReadGff("data/ecoli-mg1655-short.gff")

tmpGffFilePath := filepath.Join(tmpDataDir, "ecoli-mg1655-short.gff")
WriteGff(sequence, tmpGffFilePath)

testSequence := ReadGff(tmpGffFilePath)

fmt.Println(testSequence.Meta.Name)
Output:

U00096.3

func WriteJSON

func WriteJSON(sequence Sequence, path string)

WriteJSON writes a Sequence struct out to JSON.

Example
tmpDataDir, err := ioutil.TempDir("", "data-*")
if err != nil {
	fmt.Println(err.Error())
	return
}
defer os.RemoveAll(tmpDataDir)

sequence := ReadJSON("data/sample.json")

tmpJSONFilePath := filepath.Join(tmpDataDir, "sample.json")
WriteJSON(sequence, tmpJSONFilePath)

testSequence := ReadJSON(tmpJSONFilePath)

fmt.Println(testSequence.Meta.Source)
Output:

Saccharomyces cerevisiae (baker's yeast)

Types

type AminoAcid

type AminoAcid struct {
	Letter string  `json:"letter"`
	Codons []Codon `json:"codons"`
}

AminoAcid holds information for an amino acid and related codons in a struct

type CloneSequence added in v0.11.0

type CloneSequence struct {
	Sequence string
	Circular bool
}

func GoldenGate added in v0.11.0

func GoldenGate(sequences []CloneSequence, enzymeStr string) ([]CloneSequence, error)
Example
// Fragment 1 has a palindrome at the end
fragment1 := CloneSequence{"GAAGTGCCATTCCGCCTGACCTGAAGACCAGGAGAAACACGTGGCAAACATTCCGGTCTCAAATGGAAAAGAGCAACGAAACCAACGGCTACCTTGACAGCGCTCAAGCCGGCCCTGCAGCTGGCCCGGGCGCTCCGGGTACCGCCGCGGGTCGTGCACGTCGTTGCGCGGGCTTCCTGCGGCGCCAAGCGCTGGTGCTGCTCACGGTGTCTGGTGTTCTGGCAGGCGCCGGTTTGGGCGCGGCACTGCGTGGGCTCAGCCTGAGCCGCACCCAGGTCACCTACCTGGCCTTCCCCGGCGAGATGCTGCTCCGCATGCTGCGCATGATCATCCTGCCGCTGGTGGTCTGCAGCCTGGTGTCGGGCGCCGCCTCCCTCGATGCCAGCTGCCTCGGGCGTCTGGGCGGTATCGCTGTCGCCTACTTTGGCCTCACCACACTGAGTGCCTCGGCGCTCGCCGTGGCCTTGGCGTTCATCATCAAGCCAGGATCCGGTGCGCAGACCCTTCAGTCCAGCGACCTGGGGCTGGAGGACTCGGGGCCTCCTCCTGTCCCCAAAGAAACGGTGGACTCTTTCCTCGACCTGGCCAGAAACCTGTTTCCCTCCAATCTTGTGGTTGCAGCTTTCCGTACGTATGCAACCGATTATAAAGTCGTGACCCAGAACAGCAGCTCTGGAAATGTAACCCATGAAAAGATCCCCATAGGCACTGAGATAGAAGGGATGAACATTTTAGGATTGGTCCTGTTTGCTCTGGTGTTAGGAGTGGCCTTAAAGAAACTAGGCTCCGAAGGAGAGGACCTCATCCGTTTCTTCAATTCCCTCAACGAGGCGACGATGGTGCTGGTGTCCTGGATTATGTGGTACGCGTCTTCAGGCTAGGTGGAGGCTCAGTG", false}
fragment2 := CloneSequence{"GAAGTGCCATTCCGCCTGACCTGAAGACCAGTACGTACCTGTGGGCATCATGTTCCTTGTTGGAAGCAAGATCGTGGAAATGAAAGACATCATCGTGCTGGTGACCAGCCTGGGGAAATACATCTTCGCATCTATATTGGGCCACGTCATTCATGGTGGTATCGTCCTGCCGCTGATTTATTTTGTTTTCACACGAAAAAACCCATTCAGATTCCTCCTGGGCCTCCTCGCCCCATTTGCGACAGCATTTGCTACGTGCTCCAGCTCAGCGACCCTTCCCTCTATGATGAAGTGCATTGAAGAGAACAATGGTGTGGACAAGAGGATCTCCAGGTTTATTCTCCCCATCGGGGCCACCGTGAACATGGACGGAGCAGCCATCTTCCAGTGTGTGGCCGCGGTGTTCATTGCGCAACTCAACAACGTAGAGCTCAACGCAGGACAGATTTTCACCATTCTAGTGACTGCCACAGCGTCCAGTGTTGGAGCAGCAGGCGTGCCAGCTGGAGGGGTCCTCACCATTGCCATTATCCTGGAGGCCATTGGGCTGCCTACTCATGATCTGCCTCTGATCCTGGCTGTGGACTGGATTGTGGACCGGACCACCACGGTGGTGAATGTGGAAGGGGATGCCCTGGGTGCAGGCATTCTCCACCACCTGAATCAGAAGGCAACAAAGAAAGGCGAGCAGGAACTTGCTGAGGTGAAAGTGGAAGCCATCCCCAACTGCAAGTCTGAGGAGGAAACCTCGCCCCTGGTGACACACCAGAACCCCGCTGGCCCCGTGGCCAGTGCCCCAGAACTGGAATCCAAGGAGTCGGTTCTGTGAAGAGCTTAGAGACCGACGACTGCCTAAGGACATTCGCTGCGTCTTCAGGCTAGGTGGAGGCTCAGTG", false}
popen := CloneSequence{"TAACTATCGTCTTGAGTCCAACCCGGTAAGACACGACTTATCGCCACTGGCAGCAGCCACTGGTAACAGGATTAGCAGAGCGAGGTATGTAGGCGGTGCTACAGAGTTCTTGAAGTGGTGGCCTAACTACGGCTACACTAGAAGAACAGTATTTGGTATCTGCGCTCTGCTGAAGCCAGTTACCTTCGGAAAAAGAGTTGGTAGCTCTTGATCCGGCAAACAAACCACCGCTGGTAGCGGTGGTTTTTTTGTTTGCAAGCAGCAGATTACGCGCAGAAAAAAAGGATCTCAAGAAGGCCTACTATTAGCAACAACGATCCTTTGATCTTTTCTACGGGGTCTGACGCTCAGTGGAACGAAAACTCACGTTAAGGGATTTTGGTCATGAGATTATCAAAAAGGATCTTCACCTAGATCCTTTTAAATTAAAAATGAAGTTTTAAATCAATCTAAAGTATATATGAGTAAACTTGGTCTGACAGTTACCAATGCTTAATCAGTGAGGCACCTATCTCAGCGATCTGTCTATTTCGTTCATCCATAGTTGCCTGACTCCCCGTCGTGTAGATAACTACGATACGGGAGGGCTTACCATCTGGCCCCAGTGCTGCAATGATACCGCGAGAACCACGCTCACCGGCTCCAGATTTATCAGCAATAAACCAGCCAGCCGGAAGGGCCGAGCGCAGAAGTGGTCCTGCAACTTTATCCGCCTCCATCCAGTCTATTAATTGTTGCCGGGAAGCTAGAGTAAGTAGTTCGCCAGTTAATAGTTTGCGCAACGTTGTTGCCATTGCTACAGGCATCGTGGTGTCACGCTCGTCGTTTGGTATGGCTTCATTCAGCTCCGGTTCCCAACGATCAAGGCGAGTTACATGATCCCCCATGTTGTGCAAAAAAGCGGTTAGCTCCTTCGGTCCTCCGATCGTTGTCAGAAGTAAGTTGGCCGCAGTGTTATCACTCATGGTTATGGCAGCACTGCATAATTCTCTTACTGTCATGCCATCCGTAAGATGCTTTTCTGTGACTGGTGAGTACTCAACCAAGTCATTCTGAGAATAGTGTATGCGGCGACCGAGTTGCTCTTGCCCGGCGTCAATACGGGATAATACCGCGCCACATAGCAGAACTTTAAAAGTGCTCATCATTGGAAAACGTTCTTCGGGGCGAAAACTCTCAAGGATCTTACCGCTGTTGAGATCCAGTTCGATGTAACCCACTCGTGCACCCAACTGATCTTCAGCATCTTTTACTTTCACCAGCGTTTCTGGGTGAGCAAAAACAGGAAGGCAAAATGCCGCAAAAAAGGGAATAAGGGCGACACGGAAATGTTGAATACTCATACTCTTCCTTTTTCAATATTATTGAAGCATTTATCAGGGTTATTGTCTCATGAGCGGATACATATTTGAATGTATTTAGAAAAATAAACAAATAGGGGTTCCGCGCACCTGCACCAGTCAGTAAAACGACGGCCAGTAGTCAAAAGCCTCCGACCGGAGGCTTTTGACTTGGTTCAGGTGGAGTGGGAGTAgtcttcGCcatcgCtACTAAAagccagataacagtatgcgtatttgcgcgctgatttttgcggtataagaatatatactgatatgtatacccgaagtatgtcaaaaagaggtatgctatgaagcagcgtattacagtgacagttgacagcgacagctatcagttgctcaaggcatatatgatgtcaatatctccggtctggtaagcacaaccatgcagaatgaagcccgtcgtctgcgtgccgaacgctggaaagcggaaaatcaggaagggatggctgaggtcgcccggtttattgaaatgaacggctcttttgctgacgagaacagggGCTGGTGAAATGCAGTTTAAGGTTTACACCTATAAAAGAGAGAGCCGTTATCGTCTGTTTGTGGATGTACAGAGTGATATTATTGACACGCCCGGGCGACGGATGGTGATCC
CCCTGGCCAGTGCACGTCTGCTGTCAGATAAAGTCTCCCGTGAACTTTACCCGGTGGTGCATATCGGGGATGAAAGCTGGCGCATGATGACCACCGATATGGCCAGTGTGCCGGTCTCCGTTATCGGGGAAGAAGTGGCTGATCTCAGCCACCGCGAAAATGACATCAAAAACGCCATTAACCTGATGTTCTGGGGAATATAAATGTCAGGCTCCCTTATACACAGgcgatgttgaagaccaCGCTGAGGTGTCAATCGTCGGAGCCGCTGAGCAATAACTAGCATAACCCCTTGGGGCCTCTAAACGGGTCTTGAGGGGTTTTTTGCATGGTCATAGCTGTTTCCTGAGAGCTTGGCAGGTGATGACACACATTAACAAATTTCGTGAGGAGTCTCCAGAAGAATGCCATTAATTTCCATAGGCTCCGCCCCCCTGACGAGCATCACAAAAATCGACGCTCAAGTCAGAGGTGGCGAAACCCGACAGGACTATAAAGATACCAGGCGTTTCCCCCTGGAAGCTCCCTCGTGCGCTCTCCTGTTCCGACCCTGCCGCTTACCGGATACCTGTCCGCCTTTCTCCCTTCGGGAAGCGTGGCGCTTTCTCATAGCTCACGCTGTAGGTATCTCAGTTCGGTGTAGGTCGTTCGCTCCAAGCTGGGCTGTGTGCACGAACCCCCCGTTCAGCCCGACCGCTGCGCCTTATCCGG", true}

clones, _ := GoldenGate([]CloneSequence{fragment1, fragment2, popen}, "BbsI")

fmt.Println(clones[0].Sequence)
Output:

AAAAAAAGGATCTCAAGAAGGCCTACTATTAGCAACAACGATCCTTTGATCTTTTCTACGGGGTCTGACGCTCAGTGGAACGAAAACTCACGTTAAGGGATTTTGGTCATGAGATTATCAAAAAGGATCTTCACCTAGATCCTTTTAAATTAAAAATGAAGTTTTAAATCAATCTAAAGTATATATGAGTAAACTTGGTCTGACAGTTACCAATGCTTAATCAGTGAGGCACCTATCTCAGCGATCTGTCTATTTCGTTCATCCATAGTTGCCTGACTCCCCGTCGTGTAGATAACTACGATACGGGAGGGCTTACCATCTGGCCCCAGTGCTGCAATGATACCGCGAGAACCACGCTCACCGGCTCCAGATTTATCAGCAATAAACCAGCCAGCCGGAAGGGCCGAGCGCAGAAGTGGTCCTGCAACTTTATCCGCCTCCATCCAGTCTATTAATTGTTGCCGGGAAGCTAGAGTAAGTAGTTCGCCAGTTAATAGTTTGCGCAACGTTGTTGCCATTGCTACAGGCATCGTGGTGTCACGCTCGTCGTTTGGTATGGCTTCATTCAGCTCCGGTTCCCAACGATCAAGGCGAGTTACATGATCCCCCATGTTGTGCAAAAAAGCGGTTAGCTCCTTCGGTCCTCCGATCGTTGTCAGAAGTAAGTTGGCCGCAGTGTTATCACTCATGGTTATGGCAGCACTGCATAATTCTCTTACTGTCATGCCATCCGTAAGATGCTTTTCTGTGACTGGTGAGTACTCAACCAAGTCATTCTGAGAATAGTGTATGCGGCGACCGAGTTGCTCTTGCCCGGCGTCAATACGGGATAATACCGCGCCACATAGCAGAACTTTAAAAGTGCTCATCATTGGAAAACGTTCTTCGGGGCGAAAACTCTCAAGGATCTTACCGCTGTTGAGATCCAGTTCGATGTAACCCACTCGTGCACCCAACTGATCTTCAGCATCTTTTACTTTCACCAGCGTTTCTGGGTGAGCAAAAACAGGAAGGCAAAATGCCGCAAAAAAGGGAATAAGGGCGACACGGAAATGTTGAATACTCATACTCTTCCTTTTTCAATATTATTGAAGCATTTATCAGGGTTATTGTCTCATGAGCGGATACATATTTGAATGTATTTAGAAAAATAAACAAATAGGGGTTCCGCGCACCTGCACCAGTCAGTAAAACGACGGCCAGTAGTCAAAAGCCTCCGACCGGAGGCTTTTGACTTGGTTCAGGTGGAGTGGGAGAAACACGTGGCAAACATTCCGGTCTCAAATGGAAAAGAGCAACGAAACCAACGGCTACCTTGACAGCGCTCAAGCCGGCCCTGCAGCTGGCCCGGGCGCTCCGGGTACCGCCGCGGGTCGTGCACGTCGTTGCGCGGGCTTCCTGCGGCGCCAAGCGCTGGTGCTGCTCACGGTGTCTGGTGTTCTGGCAGGCGCCGGTTTGGGCGCGGCACTGCGTGGGCTCAGCCTGAGCCGCACCCAGGTCACCTACCTGGCCTTCCCCGGCGAGATGCTGCTCCGCATGCTGCGCATGATCATCCTGCCGCTGGTGGTCTGCAGCCTGGTGTCGGGCGCCGCCTCCCTCGATGCCAGCTGCCTCGGGCGTCTGGGCGGTATCGCTGTCGCCTACTTTGGCCTCACCACACTGAGTGCCTCGGCGCTCGCCGTGGCCTTGGCGTTCATCATCAAGCCAGGATCCGGTGCGCAGACCCTTCAGTCCAGCGACCTGGGGCTGGAGGACTCGGGGCCTCCTCCTGTCCCCAAAGAAACGGTGGACTCTTTCCTCGACCTGGCCAGAAACCTGTTTCCCTCCAATCTTGTGGTTGCAGCTTTCCGTACGTATGCAACCGATTATAAAGTCGTGACCCAGAACAGCAGCTCTGGAAATGTAACCCATGAAAAGATCCCCATAGGCACTGAGATAGAAGGGATGAACATTTTAGGATTGGTCCTGTTTGCTCTGGTGTTAGGAGTGGCCTTAAAGAA
ACTAGGCTCCGAAGGAGAGGACCTCATCCGTTTCTTCAATTCCCTCAACGAGGCGACGATGGTGCTGGTGTCCTGGATTATGTGGTACGTACCTGTGGGCATCATGTTCCTTGTTGGAAGCAAGATCGTGGAAATGAAAGACATCATCGTGCTGGTGACCAGCCTGGGGAAATACATCTTCGCATCTATATTGGGCCACGTCATTCATGGTGGTATCGTCCTGCCGCTGATTTATTTTGTTTTCACACGAAAAAACCCATTCAGATTCCTCCTGGGCCTCCTCGCCCCATTTGCGACAGCATTTGCTACGTGCTCCAGCTCAGCGACCCTTCCCTCTATGATGAAGTGCATTGAAGAGAACAATGGTGTGGACAAGAGGATCTCCAGGTTTATTCTCCCCATCGGGGCCACCGTGAACATGGACGGAGCAGCCATCTTCCAGTGTGTGGCCGCGGTGTTCATTGCGCAACTCAACAACGTAGAGCTCAACGCAGGACAGATTTTCACCATTCTAGTGACTGCCACAGCGTCCAGTGTTGGAGCAGCAGGCGTGCCAGCTGGAGGGGTCCTCACCATTGCCATTATCCTGGAGGCCATTGGGCTGCCTACTCATGATCTGCCTCTGATCCTGGCTGTGGACTGGATTGTGGACCGGACCACCACGGTGGTGAATGTGGAAGGGGATGCCCTGGGTGCAGGCATTCTCCACCACCTGAATCAGAAGGCAACAAAGAAAGGCGAGCAGGAACTTGCTGAGGTGAAAGTGGAAGCCATCCCCAACTGCAAGTCTGAGGAGGAAACCTCGCCCCTGGTGACACACCAGAACCCCGCTGGCCCCGTGGCCAGTGCCCCAGAACTGGAATCCAAGGAGTCGGTTCTGTGAAGAGCTTAGAGACCGACGACTGCCTAAGGACATTCGCTGAGGTGTCAATCGTCGGAGCCGCTGAGCAATAACTAGCATAACCCCTTGGGGCCTCTAAACGGGTCTTGAGGGGTTTTTTGCATGGTCATAGCTGTTTCCTGAGAGCTTGGCAGGTGATGACACACATTAACAAATTTCGTGAGGAGTCTCCAGAAGAATGCCATTAATTTCCATAGGCTCCGCCCCCCTGACGAGCATCACAAAAATCGACGCTCAAGTCAGAGGTGGCGAAACCCGACAGGACTATAAAGATACCAGGCGTTTCCCCCTGGAAGCTCCCTCGTGCGCTCTCCTGTTCCGACCCTGCCGCTTACCGGATACCTGTCCGCCTTTCTCCCTTCGGGAAGCGTGGCGCTTTCTCATAGCTCACGCTGTAGGTATCTCAGTTCGGTGTAGGTCGTTCGCTCCAAGCTGGGCTGTGTGCACGAACCCCCCGTTCAGCCCGACCGCTGCGCCTTATCCGGTAACTATCGTCTTGAGTCCAACCCGGTAAGACACGACTTATCGCCACTGGCAGCAGCCACTGGTAACAGGATTAGCAGAGCGAGGTATGTAGGCGGTGCTACAGAGTTCTTGAAGTGGTGGCCTAACTACGGCTACACTAGAAGAACAGTATTTGGTATCTGCGCTCTGCTGAAGCCAGTTACCTTCGGAAAAAGAGTTGGTAGCTCTTGATCCGGCAAACAAACCACCGCTGGTAGCGGTGGTTTTTTTGTTTGCAAGCAGCAGATTACGCGCAG

func Ligate added in v0.11.0

func Ligate(fragments []Fragment, maxClones int) []CloneSequence

type Codon

type Codon struct {
	Triplet string `json:"triplet"`
	Weight  int    `json:"weight"` // needs to be set to 1 for random chooser
}

Codon holds information for a codon triplet in a struct

type CodonTable

type CodonTable struct {
	StartCodons []string    `json:"start_codons"`
	StopCodons  []string    `json:"stop_codons"`
	AminoAcids  []AminoAcid `json:"amino_acids"`
}

CodonTable holds information for a codon table.

func AddCodonTable added in v0.11.2

func AddCodonTable(firstCodonTable CodonTable, secondCodonTable CodonTable) CodonTable

AddCodonTable takes 2 CodonTables and adds them together to create a new CodonTable.

Example
sequence := ReadGbk("data/puc19.gbk")
codonTable := GetCodonTable(11)
optimizationTable := sequence.GetOptimizationTable(codonTable)

sequence2 := ReadGbk("data/phix174.gb")
codonTable2 := GetCodonTable(11)
optimizationTable2 := sequence2.GetOptimizationTable(codonTable2)

finalTable := AddCodonTable(optimizationTable, optimizationTable2)
for _, aa := range finalTable.AminoAcids {
	for _, codon := range aa.Codons {
		if codon.Triplet == "GGC" {
			fmt.Println(codon.Weight)
		}
	}
}
Output:

90
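One plausible reading of "adding" two codon tables is summing the weights of codons with the same triplet. The sketch below assumes that reading. It mirrors the AminoAcid, Codon, and CodonTable shapes listed under Types, keeps the layout and start and stop codons of the first table, and builds a fresh table rather than modifying either input. addCodonWeights is hypothetical, and poly's exact merge rules are not documented here. The weights in main are illustrative only.

```go
package main

import "fmt"

// Local copies of the struct shapes documented under Types.
type Codon struct {
	Triplet string
	Weight  int
}

type AminoAcid struct {
	Letter string
	Codons []Codon
}

type CodonTable struct {
	StartCodons []string
	StopCodons  []string
	AminoAcids  []AminoAcid
}

// addCodonWeights (hypothetical) sums the weights of matching triplets.
// Amino acids, codon order, and start/stop codons are copied from first.
func addCodonWeights(first, second CodonTable) CodonTable {
	secondWeights := map[string]int{}
	for _, aa := range second.AminoAcids {
		for _, codon := range aa.Codons {
			secondWeights[codon.Triplet] += codon.Weight
		}
	}
	result := CodonTable{
		StartCodons: append([]string(nil), first.StartCodons...),
		StopCodons:  append([]string(nil), first.StopCodons...),
	}
	for _, aa := range first.AminoAcids {
		merged := AminoAcid{Letter: aa.Letter}
		for _, codon := range aa.Codons {
			merged.Codons = append(merged.Codons, Codon{
				Triplet: codon.Triplet,
				Weight:  codon.Weight + secondWeights[codon.Triplet],
			})
		}
		result.AminoAcids = append(result.AminoAcids, merged)
	}
	return result
}

func main() {
	a := CodonTable{AminoAcids: []AminoAcid{{Letter: "G", Codons: []Codon{{Triplet: "GGC", Weight: 3}}}}}
	b := CodonTable{AminoAcids: []AminoAcid{{Letter: "G", Codons: []Codon{{Triplet: "GGC", Weight: 4}}}}}
	fmt.Println(addCodonWeights(a, b).AminoAcids[0].Codons[0].Weight) // 7
}
```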

func CompromiseCodonTable added in v0.11.2

func CompromiseCodonTable(firstCodonTable CodonTable, secondCodonTable CodonTable, cutOff float64) (CodonTable, error)

CompromiseCodonTable takes 2 CodonTables and makes a new CodonTable that is an equal compromise between the two tables.

Example
sequence := ReadGbk("data/puc19.gbk")
codonTable := GetCodonTable(11)
optimizationTable := sequence.GetOptimizationTable(codonTable)

sequence2 := ReadGbk("data/phix174.gb")
codonTable2 := GetCodonTable(11)
optimizationTable2 := sequence2.GetOptimizationTable(codonTable2)

finalTable, _ := CompromiseCodonTable(optimizationTable, optimizationTable2, 0.1)
for _, aa := range finalTable.AminoAcids {
	for _, codon := range aa.Codons {
		if codon.Triplet == "TAA" {
			fmt.Println(codon.Weight)
		}
	}
}
Output:

2727

func GetCodonTable

func GetCodonTable(index int) CodonTable

GetCodonTable takes the index of the desired NCBI codon table (for example, 11 for the bacterial, archaeal, and plant plastid code) and returns it.

func ParseCodonJSON

func ParseCodonJSON(file []byte) CodonTable

ParseCodonJSON parses a CodonTable JSON file.

Example
file, _ := ioutil.ReadFile("data/bsub_codon_test.json")
codontable := ParseCodonJSON(file)

fmt.Println(codontable.AminoAcids[0].Codons[0].Weight)
Output:

28327

func ReadCodonJSON

func ReadCodonJSON(path string) CodonTable

ReadCodonJSON reads a CodonTable JSON file.

Example
codontable := ReadCodonJSON("data/bsub_codon_test.json")

fmt.Println(codontable.AminoAcids[0].Codons[0].Weight)
Output:

28327

func (CodonTable) OptimizeTable

func (codonTable CodonTable) OptimizeTable(sequence string) CodonTable

OptimizeTable weights each codon in a codon table according to its frequency in the input sequence. Note that this function mutates the CodonTable struct itself.
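Weighting by codon frequency starts from counting in-frame codons. The sketch below shows only that counting step. It assumes the input is a single coding sequence read from position 0 and ignores a trailing partial codon. codonFrequencies is hypothetical. How poly then turns counts into weights, and which regions of a GenBank file it counts, is not shown here.

```go
package main

import (
	"fmt"
	"strings"
)

// codonFrequencies counts in-frame codons starting at position 0,
// ignoring any trailing partial codon.
func codonFrequencies(sequence string) map[string]int {
	sequence = strings.ToUpper(sequence)
	counts := map[string]int{}
	for i := 0; i+3 <= len(sequence); i += 3 {
		counts[sequence[i:i+3]]++
	}
	return counts
}

func main() {
	fmt.Println(codonFrequencies("ATGGGCGGCTAA")) // map[ATG:1 GGC:2 TAA:1]
}
```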

type DnaSuggestion added in v0.11.2

type DnaSuggestion struct {
	Start          int    `db:"start"`
	End            int    `db:"end"`
	Bias           string `db:"gcbias"`
	QuantityFixes  int    `db:"quantityfixes"`
	SuggestionType string `db:"suggestiontype"`
	Step           int    `db:"step"`
	Id             int    `db:"id"`
}

DnaSuggestion is a suggested fix for a problematic region of DNA, generated by a problematicSequenceFunc.

type Enzyme added in v0.11.0

type Enzyme struct {
	EnzymeName        string
	RegexpFor         *regexp.Regexp
	RegexpRev         *regexp.Regexp
	EnzymeSkip        int
	EnzymeOverhangLen int
	Directional       bool
}

type Fasta added in v0.9.0

type Fasta struct {
	Name     string `json:"name"`
	Sequence string `json:"sequence"`
}

type Feature

type Feature struct {
	Name string //Seqid in gff, name in gbk
	//gff specific
	Source               string            `json:"source"`
	Type                 string            `json:"type"`
	Score                string            `json:"score"`
	Strand               string            `json:"strand"`
	Phase                string            `json:"phase"`
	Attributes           map[string]string `json:"attributes"`
	GbkLocationString    string            `json:"gbk_location_string"`
	Sequence             string            `json:"sequence"`
	SequenceLocation     Location          `json:"sequence_location"`
	SequenceHash         string            `json:"sequence_hash"`
	Description          string            `json:"description"`
	SequenceHashFunction string            `json:"hash_function"`
	ParentSequence       *Sequence         `json:"-"`
}

Feature holds a single annotation in a struct. Adapted from https://github.com/blachlylab/gff3/blob/master/gff3.go

func (Feature) GetSequence

func (feature Feature) GetSequence() string

GetSequence is a method wrapper to get a Feature's sequence. The result is derived from the parent Sequence, so it changes when that Sequence is mutated.

type Fragment added in v0.11.0

type Fragment struct {
	Sequence        string
	ForwardOverhang string
	ReverseOverhang string
}

func RestrictionEnzymeCut added in v0.11.0

func RestrictionEnzymeCut(seq CloneSequence, enzymeStr string) ([]Fragment, error)

func RestrictionEnzymeCutEnzymeStruct added in v0.11.0

func RestrictionEnzymeCutEnzymeStruct(seq CloneSequence, enzyme Enzyme) []Fragment

type Location

type Location struct {
	Start             int        `json:"start"`
	End               int        `json:"end"`
	Complement        bool       `json:"complement"`
	Join              bool       `json:"join"`
	FivePrimePartial  bool       `json:"five_prime_partial"`
	ThreePrimePartial bool       `json:"three_prime_partial"`
	SubLocations      []Location `json:"sub_locations"`
}

Location holds nested location information for a sequence region.
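A nested Location has to be resolved against the parent sequence to produce a Feature's sequence. The sketch below shows one plausible resolution: a leaf is a plain slice, a location with SubLocations is the concatenation of its parts, and a Complement location is reverse-complemented. It defines its own lowercase location type instead of importing poly, assumes 0-based, end-exclusive coordinates, and ignores the Join, FivePrimePartial and ThreePrimePartial flags; poly's own conventions may differ. The names extract and reverseComplement are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// location mirrors a subset of the Location fields above. This sketch
// assumes Start is a 0-based inclusive index and End is exclusive.
type location struct {
	Start, End   int
	Complement   bool
	SubLocations []location
}

// reverseComplement returns the reverse complement of an uppercase DNA
// string. Characters other than A, T, G, C are kept as-is.
func reverseComplement(seq string) string {
	pairs := map[byte]byte{'A': 'T', 'T': 'A', 'G': 'C', 'C': 'G'}
	out := make([]byte, len(seq))
	for i := 0; i < len(seq); i++ {
		b := seq[i]
		if c, ok := pairs[b]; ok {
			b = c
		}
		out[len(seq)-1-i] = b
	}
	return string(out)
}

// extract resolves a (possibly nested) location against a parent sequence.
func extract(parent string, loc location) string {
	var seq string
	if len(loc.SubLocations) == 0 {
		seq = parent[loc.Start:loc.End]
	} else {
		var b strings.Builder
		for _, sub := range loc.SubLocations {
			b.WriteString(extract(parent, sub))
		}
		seq = b.String()
	}
	if loc.Complement {
		seq = reverseComplement(seq)
	}
	return seq
}

func main() {
	parent := "AAATTTGGGCCC"
	// GenBank join(1..3,7..9) is 0-based [0,3) and [6,9) here.
	joined := location{SubLocations: []location{{Start: 0, End: 3}, {Start: 6, End: 9}}}
	fmt.Println(extract(parent, joined)) // AAAGGG
	// complement(1..4) is the reverse complement of "AAAT".
	fmt.Println(extract(parent, location{Start: 0, End: 4, Complement: true})) // ATTT
}
```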

type Locus

type Locus struct {
	Name             string `json:"name"`
	SequenceLength   string `json:"sequence_length"`
	MoleculeType     string `json:"molecule_type"`
	GenbankDivision  string `json:"genbank_division"`
	ModificationDate string `json:"modification_date"`
	SequenceCoding   string `json:"sequence_coding"`
	Circular         bool   `json:"circular"`
	Linear           bool   `json:"linear"`
}

Locus holds the information from a GenBank LOCUS line and is stored in a Meta struct.
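For orientation, the Locus fields map onto the tokens of a LOCUS line roughly as in the sketch below. This is a simplified, whitespace-based reader, not poly's parser: real LOCUS lines are column-oriented, and the helper name parseLocusLine and the sample lines are illustrative assumptions. It treats the topology token (circular/linear) as optional.

```go
package main

import (
	"fmt"
	"strings"
)

// locus mirrors a few of the Locus fields above.
type locus struct {
	Name             string
	SequenceLength   string
	MoleculeType     string
	GenbankDivision  string
	ModificationDate string
	Circular         bool
	Linear           bool
}

// parseLocusLine is a hypothetical, whitespace-based reader for a LOCUS
// line. It assumes the tokens LOCUS, name, length, "bp", molecule type,
// an optional topology, division, and date, in that order.
func parseLocusLine(line string) (locus, error) {
	f := strings.Fields(line)
	if len(f) < 7 || f[0] != "LOCUS" {
		return locus{}, fmt.Errorf("not a LOCUS line: %q", line)
	}
	l := locus{
		Name:             f[1],
		SequenceLength:   f[2],
		MoleculeType:     f[4],
		GenbankDivision:  f[len(f)-2],
		ModificationDate: f[len(f)-1],
	}
	// Anything between the molecule type and the division is a modifier.
	for _, tok := range f[5 : len(f)-2] {
		switch tok {
		case "circular":
			l.Circular = true
		case "linear":
			l.Linear = true
		}
	}
	return l, nil
}

func main() {
	l, err := parseLocusLine("LOCUS       EXAMPLE     2686 bp    DNA     circular SYN 22-OCT-2019")
	if err != nil {
		panic(err)
	}
	fmt.Println(l.Name, l.Circular, l.ModificationDate) // EXAMPLE true 22-OCT-2019
}
```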

type Meta

type Meta struct {
	Name        string            `json:"name"`
	GffVersion  string            `json:"gff_version"`
	RegionStart int               `json:"region_start"`
	RegionEnd   int               `json:"region_end"`
	Size        int               `json:"size"`
	Type        string            `json:"type"`
	Date        string            `json:"date"`
	Definition  string            `json:"definition"`
	Accession   string            `json:"accession"`
	Version     string            `json:"version"`
	Keywords    string            `json:"keywords"`
	Organism    string            `json:"organism"`
	Source      string            `json:"source"`
	Origin      string            `json:"origin"`
	Locus       Locus             `json:"locus"`
	References  []Reference       `json:"references"`
	Other       map[string]string `json:"other"`
}

Meta holds all the metadata of a Sequence struct.

type Overhang added in v0.11.0

type Overhang struct {
	Length   int
	Position int
	Forward  bool
}

type Reference

type Reference struct {
	Index   string `json:"index"`
	Authors string `json:"authors"`
	Title   string `json:"title"`
	Journal string `json:"journal"`
	PubMed  string `json:"pub_med"`
	Remark  string `json:"remark"`
	Range   string `json:"range"`
}

Reference holds the information for one reference in a Meta struct.

type Sequence

type Sequence struct {
	Meta                 Meta      `json:"meta"`
	Description          string    `json:"description"`
	SequenceHash         string    `json:"sequence_hash"`
	SequenceHashFunction string    `json:"hash_function"`
	Sequence             string    `json:"sequence"`
	Features             []Feature `json:"features"`
}

Sequence holds all sequence information in a single struct.

func ParseGbk

func ParseGbk(file []byte) Sequence

ParseGbk takes the contents of a gbk/gb/GenBank file as a byte slice and parses it into a Sequence struct.

Example
file, _ := ioutil.ReadFile("data/puc19.gbk")
sequence := ParseGbk(file)

fmt.Println(sequence.Meta.Locus.ModificationDate)
Output:

22-OCT-2019

func ParseGbkFlat

func ParseGbkFlat(file []byte) []Sequence

ParseGbkFlat parses GenBank flat files such as those in the GenBank FTP dumps. These files start with a 10-line header, which is removed entirely before parsing.

Example
file, _ := ioutil.ReadFile("data/flatGbk_test.seq")
sequences := ParseGbkFlat(file)
var locus []string
for _, sequence := range sequences {
	locus = append(locus, sequence.Meta.Locus.Name)
}

fmt.Println(strings.Join(locus, ", "))
Output:

AB000100, AB000106

func ParseGbkMulti

func ParseGbkMulti(file []byte) []Sequence

ParseGbkMulti parses a byte slice containing multiple concatenated GenBank records into a slice of Sequence structs.

Example
file, _ := ioutil.ReadFile("data/multiGbk_test.seq")
sequences := ParseGbkMulti(file)
var locus []string
for _, sequence := range sequences {
	locus = append(locus, sequence.Meta.Locus.Name)
}

fmt.Println(strings.Join(locus, ", "))
Output:

AB000100, AB000106

func ParseGff

func ParseGff(file []byte) Sequence

ParseGff takes the contents of a GFF3 file as a byte slice and parses it into a Sequence struct.
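Each GFF3 feature line has nine tab-separated columns (seqid, source, type, start, end, score, strand, phase, attributes), which correspond to the gff-specific fields of Feature (poly stores the seqid in Feature.Name). The sketch below parses one such line into a standalone type; it is not poly's Feature, the helper name parseGffLine is hypothetical, the sample line is illustrative, and attribute values are not percent-decoded.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// gffFeature holds the nine columns of a GFF3 feature line.
type gffFeature struct {
	SeqID      string
	Source     string
	Type       string
	Start      int
	End        int
	Score      string
	Strand     string
	Phase      string
	Attributes map[string]string
}

// parseGffLine is a hypothetical helper that parses one GFF3 feature line.
func parseGffLine(line string) (gffFeature, error) {
	cols := strings.Split(strings.TrimRight(line, "\r\n"), "\t")
	if len(cols) != 9 {
		return gffFeature{}, fmt.Errorf("expected 9 columns, got %d", len(cols))
	}
	start, err := strconv.Atoi(cols[3])
	if err != nil {
		return gffFeature{}, fmt.Errorf("bad start: %w", err)
	}
	end, err := strconv.Atoi(cols[4])
	if err != nil {
		return gffFeature{}, fmt.Errorf("bad end: %w", err)
	}
	// Attributes are "key=value" pairs separated by ";". A "." has no "="
	// and is therefore skipped.
	attrs := make(map[string]string)
	for _, pair := range strings.Split(cols[8], ";") {
		if kv := strings.SplitN(pair, "=", 2); len(kv) == 2 {
			attrs[kv[0]] = kv[1]
		}
	}
	return gffFeature{
		SeqID: cols[0], Source: cols[1], Type: cols[2],
		Start: start, End: end,
		Score: cols[5], Strand: cols[6], Phase: cols[7],
		Attributes: attrs,
	}, nil
}

func main() {
	line := "U00096.3\tRefSeq\tgene\t190\t255\t.\t+\t.\tID=gene-b0001;Name=thrL"
	f, err := parseGffLine(line)
	if err != nil {
		panic(err)
	}
	fmt.Println(f.SeqID, f.Type, f.Start, f.End, f.Attributes["Name"]) // U00096.3 gene 190 255 thrL
}
```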

Example
file, _ := ioutil.ReadFile("data/ecoli-mg1655-short.gff")
sequence := ParseGff(file)

fmt.Println(sequence.Meta.Name)
Output:

U00096.3

func ParseJSON

func ParseJSON(file []byte) Sequence

ParseJSON parses the contents of a Sequence JSON file and restores each Feature's ParentSequence pointer, which is not stored in the JSON.

Example
file, _ := ioutil.ReadFile("data/sample.json")
sequence := ParseJSON(file)

fmt.Println(sequence.Meta.Source)
Output:

Saccharomyces cerevisiae (baker's yeast)

func ReadGbk

func ReadGbk(path string) Sequence

ReadGbk reads a GenBank file from path and parses it into a Sequence struct.

Example
sequence := ReadGbk("data/puc19.gbk")
fmt.Println(sequence.Meta.Locus.ModificationDate)
Output:

22-OCT-2019

func ReadGbkFlat

func ReadGbkFlat(path string) []Sequence

ReadGbkFlat reads flat GenBank files, like the ones provided by the NCBI FTP server (after decompression).

Example
sequences := ReadGbkFlat("data/long_comment.seq")
var locus []string
for _, sequence := range sequences {
	locus = append(locus, sequence.Meta.Locus.Name)
}

fmt.Println(strings.Join(locus, ", "))
Output:

AB000100, AB000106

func ReadGbkFlatGz

func ReadGbkFlatGz(path string) []Sequence

ReadGbkFlatGz reads gzip-compressed flat GenBank files, like the ones provided by the NCBI FTP server.

Example
sequences := ReadGbkFlatGz("data/flatGbk_test.seq.gz")
var locus []string
for _, sequence := range sequences {
	locus = append(locus, sequence.Meta.Locus.Name)
}
fmt.Println(strings.Join(locus, ", "))
Output:

AB000100, AB000106

func ReadGbkMulti

func ReadGbkMulti(path string) []Sequence

ReadGbkMulti reads multiple GenBank records from a single file.

Example
sequences := ReadGbkMulti("data/multiGbk_test.seq")
var locus []string
for _, sequence := range sequences {
	locus = append(locus, sequence.Meta.Locus.Name)
}

fmt.Println(strings.Join(locus, ", "))
Output:

AB000100, AB000106

func ReadGff

func ReadGff(path string) Sequence

ReadGff takes a file path to a GFF3 file and parses it into a Sequence struct.

Example
sequence := ReadGff("data/ecoli-mg1655-short.gff")
fmt.Println(sequence.Meta.Name)
Output:

U00096.3

func ReadJSON

func ReadJSON(path string) Sequence

ReadJSON reads a Sequence JSON file from path.

Example
sequence := ReadJSON("data/sample.json")

fmt.Println(sequence.Meta.Source)
Output:

Saccharomyces cerevisiae (baker's yeast)

func (*Sequence) AddFeature

func (sequence *Sequence) AddFeature(feature Feature) []Feature

AddFeature is the canonical way to add a Feature to a Sequence struct. Appending a Feature directly to Sequence.Features will break the Feature's GetSequence() method, because the Feature is not linked to its parent Sequence.
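The failure mode is easiest to see with minimal stand-in types. In this sketch (lowercase sequence and feature types with hypothetical addFeature and getSequence methods, not poly's API, with 0-based end-exclusive coordinates assumed), addFeature sets the parent pointer before appending, while a direct append leaves it nil. Here getSequence returns "" for an unlinked feature to make the problem visible. The last lines also show why the result tracks later changes to the parent sequence.

```go
package main

import "fmt"

// feature is a stand-in with only the fields needed to show the link.
type feature struct {
	Name           string
	Start, End     int // assumed 0-based, end-exclusive
	ParentSequence *sequence
}

type sequence struct {
	Sequence string
	Features []feature
}

// addFeature links the feature to its parent, then appends it.
func (s *sequence) addFeature(f feature) []feature {
	f.ParentSequence = s
	s.Features = append(s.Features, f)
	return s.Features
}

// getSequence reads from the parent, so it reflects later edits to it.
// An unlinked feature has no parent to read from and yields "".
func (f feature) getSequence() string {
	if f.ParentSequence == nil {
		return ""
	}
	return f.ParentSequence.Sequence[f.Start:f.End]
}

func main() {
	seq := &sequence{Sequence: "ATGAAATAA"}
	seq.addFeature(feature{Name: "start", Start: 0, End: 3})
	seq.Features = append(seq.Features, feature{Name: "unlinked", Start: 0, End: 3})

	fmt.Println(seq.Features[0].getSequence())        // ATG
	fmt.Printf("%q\n", seq.Features[1].getSequence()) // ""

	seq.Sequence = "CCCAAATAA"
	fmt.Println(seq.Features[0].getSequence()) // CCC
}
```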

func (Sequence) GetOptimizationTable

func (sequence Sequence) GetOptimizationTable(codonTable CodonTable) CodonTable

GetOptimizationTable takes a CodonTable and weights it by the Sequence's codon usage, so that the resulting table can be used to optimize inserts.

func (Sequence) GetSequence

func (sequence Sequence) GetSequence() string

GetSequence returns the full sequence of an annotated Sequence.

func (Sequence) Hash

func (sequence Sequence) Hash() (string, error)

Hash is a method wrapper for hashing Sequence structs. Note that Hash treats every Sequence as double-stranded by default, because GenBank does not record whether a given sequence in its database is single- or double-stranded.
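The key idea behind a double-stranded, topology-aware hash is to choose one canonical representative before hashing: the lexicographically smaller of the two strands, after rotating each strand to its minimal rotation when the sequence is circular. The sketch below illustrates that idea only. It uses SHA-256 and omits any version or metadata prefix, so it will not reproduce the seqhash output above; the names canonicalHash and minRotation are hypothetical, and the O(n²) rotation scan stands in for a linear-time algorithm such as Booth's.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// reverseComplement returns the reverse complement of an uppercase DNA string.
func reverseComplement(seq string) string {
	comp := map[byte]byte{'A': 'T', 'T': 'A', 'G': 'C', 'C': 'G'}
	out := make([]byte, len(seq))
	for i := 0; i < len(seq); i++ {
		c, ok := comp[seq[i]]
		if !ok {
			c = seq[i] // leave non-ACGT characters unchanged
		}
		out[len(seq)-1-i] = c
	}
	return string(out)
}

// minRotation returns the lexicographically smallest rotation of s.
// This O(n^2) scan is for illustration only.
func minRotation(s string) string {
	best := s
	for i := 1; i < len(s); i++ {
		if r := s[i:] + s[:i]; r < best {
			best = r
		}
	}
	return best
}

// canonicalHash hashes one representative of a double-stranded sequence,
// so both strands (and, if circular, every rotation) hash identically.
func canonicalHash(seq string, circular bool) string {
	fwd := strings.ToUpper(seq)
	rev := reverseComplement(fwd)
	if circular {
		fwd, rev = minRotation(fwd), minRotation(rev)
	}
	canonical := fwd
	if rev < canonical {
		canonical = rev
	}
	sum := sha256.Sum256([]byte(canonical))
	return hex.EncodeToString(sum[:])
}

func main() {
	// "GCAT" is the reverse complement of "ATGC".
	fmt.Println(canonicalHash("ATGC", false) == canonicalHash("GCAT", false)) // true
	// "GCCAT" is a rotation of "ATGCC".
	fmt.Println(canonicalHash("ATGCC", true) == canonicalHash("GCCAT", true)) // true
}
```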

Example
sequence := ReadGbk("data/puc19.gbk")

// Seqhash assumes doubleStranded sequence and defaults to linear
// if sequence.Meta.Locus.Circular is not set
seqhash, _ := sequence.Hash()
fmt.Println(seqhash)
Output:

v1_DCD_4b0616d1b3fc632e42d78521deb38b44fba95cca9fde159e01cd567fa996ceb9

Directories

Path Synopsis
parsers
Poly command line utility installation instructions: Mac OSX: `brew install timothystiles/poly/poly`. Linux (deb/rpm): download the .deb or .rpm from the releases page https://github.com/TimothyStiles/poly/releases and install with `dpkg -i` or `rpm -i`, respectively. Windows: coming soon.
