bgzf

package v1.4.5
Published: Mar 28, 2024 License: BSD-3-Clause Imports: 11 Imported by: 87

Documentation

Overview

Package bgzf implements BGZF format reading and writing according to the SAM specification.

The specification is available at https://github.com/samtools/hts-specs.

Index

Examples

Constants

const (
	BlockSize    = 0x0ff00 // The maximum size of an uncompressed input data block.
	MaxBlockSize = 0x10000 // The maximum size of a compressed output block.
)

Variables

var (
	ErrClosed            = errors.New("bgzf: use of closed writer")
	ErrCorrupt           = errors.New("bgzf: corrupt block")
	ErrBlockOverflow     = errors.New("bgzf: block overflow")
	ErrWrongFileType     = errors.New("bgzf: file is a directory")
	ErrNoEnd             = errors.New("bgzf: cannot determine offset from end")
	ErrNotASeeker        = errors.New("bgzf: not a seeker")
	ErrContaminatedCache = errors.New("bgzf: cache owner mismatch")
	ErrNoBlockSize       = errors.New("bgzf: could not determine block size")
	ErrBlockSizeMismatch = errors.New("bgzf: unexpected block size")
)

Functions

func HasEOF

func HasEOF(r io.ReaderAt) (bool, error)

HasEOF checks for the presence of a BGZF magic EOF block. The magic block is defined in the SAM specification. A magic block is written by a Writer on calling Close. The ReaderAt must provide some method for determining valid ReadAt offsets.
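
For example, a quick truncation check on a BGZF file might look like the following sketch ("reads.bam" is a hypothetical input file; *os.File implements io.ReaderAt and can report its size, so it satisfies the requirement above):

f, err := os.Open("reads.bam")
if err != nil {
	log.Fatal(err)
}
defer f.Close()
ok, err := bgzf.HasEOF(f)
if err != nil {
	log.Fatal(err)
}
if !ok {
	// A missing magic block often indicates a truncated file.
	log.Println("no BGZF magic EOF block")
}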

Types

type Block

type Block interface {
	// Base returns the file offset of the start of
	// the gzip member from which the Block data was
	// decompressed.
	Base() int64

	io.Reader
	io.ByteReader

	// Used returns whether one or more bytes have
	// been read from the Block.
	Used() bool

	// NextBase returns the expected position of the next
	// BGZF block. It returns -1 if the Block is not valid.
	NextBase() int64
	// contains filtered or unexported methods
}

Block wraps interaction with decompressed BGZF data blocks.

type Cache

type Cache interface {
	// Get returns the Block in the Cache with the specified
	// base or a nil Block if it does not exist. The returned
	// Block must be removed from the Cache.
	Get(base int64) Block

	// Put inserts a Block into the Cache, returning the Block
	// that was evicted or nil if no eviction was necessary and
	// a boolean indicating whether the put Block was retained
	// by the Cache.
	Put(Block) (evicted Block, retained bool)

	// Peek returns whether a Block exists in the cache for the
	// given base. If a Block satisfies the request, then exists
	// is returned as true with the offset for the next Block in
	// the stream, otherwise false and -1.
	Peek(base int64) (exists bool, next int64)
}

Cache is a Block caching type. Basic cache implementations are provided in the cache package. A Cache must be safe for concurrent use.

If a Cache is a Wrapper, its Wrap method is called on newly created blocks.
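
As an illustration, a minimal single-slot Cache satisfying this contract might look like the following sketch (singleCache is hypothetical and uses a sync.Mutex for the required concurrency safety; the cache package provides real implementations):

// singleCache is a sketch of a one-slot Cache.
type singleCache struct {
	mu    sync.Mutex
	block bgzf.Block
}

func (c *singleCache) Get(base int64) bgzf.Block {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.block == nil || c.block.Base() != base {
		return nil
	}
	b := c.block
	c.block = nil // The returned Block must be removed from the Cache.
	return b
}

func (c *singleCache) Put(b bgzf.Block) (evicted bgzf.Block, retained bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	evicted, c.block = c.block, b
	return evicted, true
}

func (c *singleCache) Peek(base int64) (exists bool, next int64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.block == nil || c.block.Base() != base {
		return false, -1
	}
	return true, c.block.NextBase()
}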

type Chunk

type Chunk struct {
	Begin Offset
	End   Offset
}

Chunk is a region of a BGZF file.

type Offset

type Offset struct {
	File  int64
	Block uint16
}

Offset is a BGZF virtual offset. File is the file offset of the start of the containing compressed BGZF block and Block is the offset into that block's uncompressed data.
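
A virtual offset is conventionally packed into a single 64-bit integer, with the compressed file offset in the upper 48 bits and the within-block offset in the lower 16. A sketch of that packing (this is the BGZF convention, not an API of this package):

// Pack an Offset into a 64-bit BGZF virtual offset.
voff := uint64(off.File)<<16 | uint64(off.Block)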

type Reader

type Reader struct {
	gzip.Header

	// Blocked specifies the behaviour of the
	// Reader at the end of a BGZF member.
	// If the Reader is Blocked, a Read that
	// reaches the end of a BGZF block will
	// return io.EOF. This error is not sticky,
	// so a subsequent Read will progress to
	// the next block if it is available.
	Blocked bool
	// contains filtered or unexported fields
}

Reader implements BGZF blocked gzip decompression.

func NewReader

func NewReader(r io.Reader, rd int) (*Reader, error)

NewReader returns a new BGZF reader.

The number of concurrent read decompressors is specified by rd. If rd is 0, GOMAXPROCS concurrent decompressors will be created. If rd is 1, blocks will be read synchronously without readahead. The returned Reader should be closed after use to avoid leaking resources.

func (*Reader) Begin

func (bg *Reader) Begin() Tx

Begin returns a Tx that starts at the current virtual offset.

func (*Reader) BlockLen

func (bg *Reader) BlockLen() int

BlockLen returns the number of bytes remaining to be read from the current BGZF block.

func (*Reader) Close

func (bg *Reader) Close() error

Close closes the reader and releases resources.

func (*Reader) LastChunk

func (bg *Reader) LastChunk() Chunk

LastChunk returns the region of the BGZF file read by the last successful read operation or the resulting virtual offset of the last successful seek operation.

func (*Reader) Read

func (bg *Reader) Read(p []byte) (int, error)

Read implements the io.Reader interface.

func (*Reader) ReadByte added in v1.4.0

func (bg *Reader) ReadByte() (byte, error)

ReadByte implements the io.ByteReader interface.

Example
package main

import (
	"bytes"
	"fmt"
	"io"
	"log"
	"os"

	"github.com/biogo/hts/bgzf"
)

func main() {
	// Write Tom Sawyer into a bgzf buffer.
	var buf bytes.Buffer
	w := bgzf.NewWriter(&buf, 1)
	f, err := os.Open("testdata/Mark.Twain-Tom.Sawyer.txt")
	if err != nil {
		log.Fatalf("failed to open file: %v", err)
	}
	defer f.Close()
	_, err = io.Copy(w, f)
	if err != nil {
		log.Fatalf("failed to copy file: %v", err)
	}
	err = w.Close()
	if err != nil {
		log.Fatalf("failed to close bgzf writer: %v", err)
	}

	// The text to search for.
	const line = `"It ain't any use, Huck, we're wrong again."`

	// Read the data until the line is found and output the line
	// number and bgzf.Chunk corresponding to the line's position
	// in the compressed data.
	r, err := bgzf.NewReader(&buf, 1)
	if err != nil {
		log.Fatal(err)
	}
	var n int
	for {
		n++
		b, chunk, err := readLine(r)
		if err != nil {
			if err == io.EOF {
				break
			}
			log.Fatal(err)
		}
		// Make sure we trim the trailing newline.
		if bytes.Equal(bytes.TrimSpace(b), []byte(line)) {
			fmt.Printf("line:%d chunk:%+v\n", n, chunk)
			break
		}
	}

}

// readLine returns a line terminated by a '\n' and the bgzf.Chunk that contains
// the line, including the newline character. If the end of file is reached before
// a newline, the unterminated line and corresponding chunk are returned.
func readLine(r *bgzf.Reader) ([]byte, bgzf.Chunk, error) {
	tx := r.Begin()
	var (
		data []byte
		b    byte
		err  error
	)
	for {
		b, err = r.ReadByte()
		if err != nil {
			break
		}
		data = append(data, b)
		if b == '\n' {
			break
		}
	}
	chunk := tx.End()
	return data, chunk, err
}
Output:

line:5986 chunk:{Begin:{File:112534 Block:11772} End:{File:112534 Block:11818}}

func (*Reader) Seek

func (bg *Reader) Seek(off Offset) error

Seek performs a seek operation to the given virtual offset.
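
Continuing the Reader example above, the recorded Chunk can be used to return directly to the matched line (a sketch assuming r and chunk from that example are in scope; the chunk there lies within a single block, so its length is the difference of the two Block offsets):

err = r.Seek(chunk.Begin)
if err != nil {
	log.Fatal(err)
}
b := make([]byte, chunk.End.Block-chunk.Begin.Block)
_, err = io.ReadFull(r, b)
if err != nil {
	log.Fatal(err)
}
fmt.Printf("%s", b)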

func (*Reader) SetCache

func (bg *Reader) SetCache(c Cache)

SetCache sets the cache to be used by the Reader.

type Tx

type Tx struct {
	// contains filtered or unexported fields
}

Tx represents a multi-read transaction.

func (*Tx) End

func (t *Tx) End() Chunk

End returns the Chunk spanning the transaction. After End returns, the Tx is no longer valid.

type Wrapper

type Wrapper interface {
	Wrap(Block) Block
}

Wrapper defines Cache types that need to modify a Block at its creation.
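
For example, a Cache could wrap each new Block to count reads. Embedding the original Block satisfies the Block interface, including its unexported methods. A sketch building on the hypothetical singleCache above (uses sync/atomic):

// countingCache extends singleCache with a count of Read calls made
// against the Blocks it has wrapped.
type countingCache struct {
	singleCache
	reads int64
}

// Wrap is called by the Reader on newly created blocks because
// countingCache implements Wrapper.
func (c *countingCache) Wrap(b bgzf.Block) bgzf.Block {
	return countingBlock{Block: b, reads: &c.reads}
}

// countingBlock forwards all Block methods, counting calls to Read.
type countingBlock struct {
	bgzf.Block
	reads *int64
}

func (b countingBlock) Read(p []byte) (int, error) {
	atomic.AddInt64(b.reads, 1)
	return b.Block.Read(p)
}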

type Writer

type Writer struct {
	gzip.Header
	// contains filtered or unexported fields
}

Writer implements BGZF blocked gzip compression.

Because the SAM specification requires that the RFC1952 FLG header field be set to 0x04, a Writer's Name and Comment fields should not be set if its output is to be read by another BGZF decompressor implementation.

func NewWriter

func NewWriter(w io.Writer, wc int) *Writer

NewWriter returns a new Writer. Writes to the returned writer are compressed and written to w.

The number of concurrent write compressors is specified by wc.

func NewWriterLevel

func NewWriterLevel(w io.Writer, level, wc int) (*Writer, error)

NewWriterLevel returns a new Writer using the specified compression level instead of gzip.DefaultCompression. Allowable level options are integer values between gzip.BestSpeed and gzip.BestCompression inclusive.

The number of concurrent write compressors is specified by wc.
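
For example, to favour compression ratio over speed (a sketch; buf is any io.Writer and the level constant comes from compress/gzip):

w, err := bgzf.NewWriterLevel(buf, gzip.BestCompression, 1)
if err != nil {
	log.Fatal(err)
}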

func (*Writer) Close

func (bg *Writer) Close() error

Close closes the Writer, waiting for any pending writes before returning the final error of the Writer.

func (*Writer) Error

func (bg *Writer) Error() error

Error returns the error state of the Writer.

func (*Writer) Flush

func (bg *Writer) Flush() error

Flush writes unwritten data to the underlying io.Writer. Flush does not block.

func (*Writer) Next

func (bg *Writer) Next() (int, error)

Next returns the index of the start of the next write within the decompressed data block.

func (*Writer) Wait

func (bg *Writer) Wait() error

Wait waits for all pending writes to complete and returns the subsequent error state of the Writer.

func (*Writer) Write

func (bg *Writer) Write(b []byte) (int, error)

Write writes the compressed form of b to the underlying io.Writer. Decompressed data blocks are limited to BlockSize, so individual byte slices may span block boundaries; however, the Writer attempts to keep each write within a single data block.
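
One way to keep a record from spanning a block boundary is to consult Next before writing and Flush when the record would not fit (a sketch; rec is a hypothetical record payload no larger than BlockSize):

n, err := w.Next()
if err != nil {
	log.Fatal(err)
}
if n+len(rec) > bgzf.BlockSize {
	// Close out the current block so rec starts a fresh one.
	err = w.Flush()
	if err != nil {
		log.Fatal(err)
	}
}
_, err = w.Write(rec)
if err != nil {
	log.Fatal(err)
}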

Directories

Path	Synopsis
cache	Package cache provides basic block cache types for the bgzf package.
index	Package index provides common code for CSI and tabix BGZF indexing.
