fastq

package
v0.0.0-...-85e8820 Latest
Warning

This package is not in the latest version of its module.

Published: Apr 2, 2024 License: MIT Imports: 6 Imported by: 1

Documentation

Overview

Package fastq contains fastq parsers and writers.

Fastq is a flat text file format developed in ~2000 to store nucleotide sequencing data. While similar to fasta, fastq has a few differences. First, the sequence identifier begins with @ instead of >. Second, each record includes quality values for its sequence.
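
To make the quality line concrete, here is a minimal sketch that turns a quality string into numeric scores. It assumes Phred+33 encoding (score = ASCII value minus 33), which this page does not state, and the record and the phredScores helper are hypothetical, not part of this package.

```go
package main

import "fmt"

// phredScores converts a fastq quality string into numeric Phred scores.
// Assumption: Phred+33 encoding, where score = ASCII value - 33.
func phredScores(quality string) []int {
	scores := make([]int, len(quality))
	for i := 0; i < len(quality); i++ {
		scores[i] = int(quality[i]) - 33
	}
	return scores
}

func main() {
	// A hypothetical four-line record: identifier, sequence, separator, quality.
	record := []string{"@read1", "ACGT", "+", "!5I~"}
	// Strip the leading '@' from the identifier line before printing.
	fmt.Println(record[0][1:], record[1], phredScores(record[3]))
}
```

Each base in the sequence line has exactly one quality character in the same position, so the two lines have equal length.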

This package provides a parser and writer for working with Fastq formatted sequencing data.

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Header struct{}

Header is a blank struct, needed for compatibility with bio parsers. It contains nothing.

func (*Header) WriteTo

func (header *Header) WriteTo(w io.Writer) (int64, error)

WriteTo is a blank function, needed for compatibility with bio parsers. It doesn't do anything.

type Parser

type Parser struct {
	// contains filtered or unexported fields
}

Parser is a flexible parser that provides ample control over reading fastq-formatted sequences. It is initialized with NewParser.

Example
package main

import (
	_ "embed"
	"fmt"
	"strings"

	"github.com/koeng101/dnadesign/lib/bio/fastq"
)

//go:embed data/nanosavseq.fastq
var baseFastq string

func main() {
	parser := fastq.NewParser(strings.NewReader(baseFastq), 2*32*1024)
	for {
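		// Named read rather than fastq to avoid shadowing the package name.
		read, err := parser.Next()
		if err != nil {
			// Once the input is exhausted, Next returns EOF, which ends the loop.
			fmt.Println(err)
			break
		}
		fmt.Println(read.Identifier)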
	}
}
Output:

e3cc70d5-90ef-49b6-bbe1-cfef99537d73
92728f25-b658-426c-8cd7-d82dc70dbf71
60907b6b-5e38-498e-9c07-f036ebd8c658
990e110e-5e50-41a2-8ad5-92044d4465b8
EOF

func NewParser

func NewParser(r io.Reader, maxLineSize int) *Parser

NewParser returns a Parser that uses r as the source from which to parse fastq formatted sequences.

func (*Parser) Header

func (parser *Parser) Header() (Header, error)

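Header returns an empty Header and a nil error. Like the Header type itself, it exists only for compatibility with bio parsers.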

func (*Parser) Next

func (parser *Parser) Next() (Read, error)

Next reads the next fastq read from the underlying reader and returns it. Next returns an error only in the following cases:

  • It attempts to read and fails to find a valid fastq sequence.
  • It is called after the reader has been exhausted, in which case it returns the reader's EOF.
  • An EOF is encountered immediately after a sequence with no trailing newline. In this case the Read up to that point is returned together with an EOF error.

It is worth noting that the bytes consumed by each call always end right before the next fastq record starts, which means this function can effectively be used to index where fastq records start in a file or string.

Next is simpler for fastq files than for fasta files. A fasta sequence spans a variable number of lines, conventionally wrapped at 80 characters, whereas a fastq record always consists of exactly 4 consecutive lines. So instead of looping until the next record begins, the parser can just read 4 lines at once.

type Read

type Read struct {
	Identifier string            `json:"identifier"`
	Optionals  map[string]string `json:"optionals"` // Nanopore, for example, carries along data like: `read=13956 ch=53 start_time=2020-11-11T01:49:01Z`
	Sequence   string            `json:"sequence"`
	Quality    string            `json:"quality"`
}

Read is a struct representing a single Fastq read element with an Identifier, its corresponding sequence, its quality score, and any optional pieces of data.
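
As a rough illustration of how an identifier line could map onto Identifier and Optionals, the sketch below splits a header into its first field and any key=value pairs. parseHeaderLine is a hypothetical helper and the header line is made up; the package's actual parsing rules may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// parseHeaderLine is a hypothetical helper, not part of the fastq package.
// It splits "@id key=value key=value" into an identifier and key/value pairs.
func parseHeaderLine(line string) (string, map[string]string) {
	fields := strings.Fields(strings.TrimPrefix(line, "@"))
	optionals := make(map[string]string)
	if len(fields) == 0 {
		return "", optionals
	}
	for _, field := range fields[1:] {
		// Only fields of the form key=value are kept; others are ignored.
		if key, value, ok := strings.Cut(field, "="); ok {
			optionals[key] = value
		}
	}
	return fields[0], optionals
}

func main() {
	// Hypothetical header in the Nanopore style mentioned in the struct comment.
	id, optionals := parseHeaderLine("@read1 read=13956 ch=53 start_time=2020-11-11T01:49:01Z")
	fmt.Println(id, optionals["read"], optionals["ch"], optionals["start_time"])
}
```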

func (*Read) DeepCopy

func (read *Read) DeepCopy() Read

DeepCopy returns a deep copy of a read. Use it when you want to modify Optionals and then pipe the read elsewhere.
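
The reason a deep copy matters is that a plain struct assignment shares the Optionals map. The sketch below shows the difference using a local mirror of the documented fields; the read type and deepCopy function are hypothetical, and the package's DeepCopy may be implemented differently.

```go
package main

import "fmt"

// read is a local mirror of the documented Read fields, for illustration only.
type read struct {
	Identifier string
	Optionals  map[string]string
	Sequence   string
	Quality    string
}

// deepCopy copies the struct and allocates a fresh Optionals map.
// Hypothetical; the package's DeepCopy may differ.
func deepCopy(r read) read {
	out := r // copies the string fields; the map is still shared at this point
	out.Optionals = make(map[string]string, len(r.Optionals))
	for key, value := range r.Optionals {
		out.Optionals[key] = value
	}
	return out
}

func main() {
	original := read{Identifier: "read1", Optionals: map[string]string{"ch": "53"}}
	shallow := original
	deep := deepCopy(original)
	shallow.Optionals["ch"] = "1" // also changes original, because the map is shared
	deep.Optionals["ch"] = "99"   // leaves original untouched
	fmt.Println(original.Optionals["ch"], deep.Optionals["ch"])
}
```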

func (*Read) WriteTo

func (read *Read) WriteTo(w io.Writer) (int64, error)
