Documentation ¶
Overview ¶
Package fasta provides routines for reading and writing FASTA files and aligned FASTA files.
The format used is the one described by NCBI: http://blast.ncbi.nlm.nih.gov/blastcgihelp.shtml
By default, sequences are checked to make sure they contain only valid characters: a-z, A-Z, * and -. All lowercase letters are translated to their uppercase equivalents.
Functions ¶
func QuickSequenceCount ¶
QuickSequenceCount consumes the given reader and returns the number of times ">" appears at the start of a line.
func SequenceFasta ¶
SequenceFasta returns the FASTA string corresponding to a sequence with the sequence wrapped at the number of columns given.
If cols is <= 0, then no wrapping is done.
func SequenceString ¶
SequenceString chops up one long sequence into multiple strings based on the number of columns provided.
If cols is <= 0, then no wrapping is done and a single string is returned.
Types ¶
type Reader ¶
type Reader struct {
	// When set to true, the sequences will not be checked for errors.
	// If you trust the data, this may improve performance.
	// This may be set at any time.
	TrustSequences bool
	// contains filtered or unexported fields
}
A Reader reads entries from FASTA encoded input.
If TrustSequences is true, then sequence data will not be checked to make sure that it conforms to the NCBI spec. (See the Read method for details.) By default, TrustSequences is false.
func NewReader ¶
NewReader creates a new Reader that is ready to read sequences from some io.Reader.
func (*Reader) Read ¶
Read will read the next entry in the FASTA input. The format roughly corresponds to that described by NCBI: http://blast.ncbi.nlm.nih.gov/blastcgihelp.shtml
In particular, the only characters allowed in the sequence section are a-z, A-Z, * and -. Any other character will result in an error.
All lower case letters in the sequence section are translated to upper case.
Blank lines and leading and trailing whitespace are always ignored, wherever they occur.
No distinction is made between DNA/RNA or amino acid sequences. (Currently.)
It is NOT safe to call this function from multiple goroutines.
If the underlying reader is seekable, it is OK to use its seek operation provided that you call (*Reader).SeekerReset before the next call to Read. If you don't, the behavior is undefined. Moreover, seeking will result in erroneous line numbers in error messages. Finally, you MUST seek to a location that corresponds precisely to an entry boundary; i.e., the file pointer should be at a '>' character.
func (*Reader) ReadAll ¶
ReadAll will read all entries in the FASTA input and return them as a slice. If an error is encountered, processing is stopped, and the error is returned.
func (*Reader) ReadSequence ¶
func (r *Reader) ReadSequence(translate Translator) (seq.Sequence, error)
ReadSequence is exported for use in other packages that read FASTA-like files.
The 'translate' function is used when sequences are checked for valid characters.
If you're just reading FASTA files, this method SHOULD NOT be used.
func (*Reader) SeekerReset ¶
func (r *Reader) SeekerReset()
SeekerReset will reset the internal state of Reader to allow Read to be called at arbitrary entry boundaries in the input.
See the comments for Read for more details.
type Translator ¶
A Translator is a function that accepts a single character, checks whether it's valid, and optionally maps it to a new character. Additionally, if the zero byte is returned, then the character should not be included in the final sequence.
Translators are ONLY applicable to developers writing their own parsers for FASTA-like files. They should not be used to read regular FASTA files.
type Writer ¶
type Writer struct {
	// The number of columns to wrap a sequence at. By default, this
	// is set to 60. A value <= 0 will result in no wrapping.
	Columns int
	// Whether to add a '*' at the end of each sequence.
	// By default, this is false.
	Asterisk bool
	// contains filtered or unexported fields
}
A Writer writes entries to a FASTA encoded file.
The 'Columns' field is the number of columns at which a sequence is wrapped. If it's <= 0, then no wrapping is used.
The header text is never wrapped.