bgzf

package v1.4.5
Published: Mar 28, 2024 License: BSD-3-Clause Imports: 11 Imported by: 87

Documentation

Overview

Package bgzf implements BGZF format reading and writing according to the SAM specification.

The specification is available at https://github.com/samtools/hts-specs.

Index

Examples

Constants

const (
	BlockSize    = 0x0ff00 // The maximum size of an uncompressed input data block.
	MaxBlockSize = 0x10000 // The maximum size of a compressed output block.
)

Variables

var (
	ErrClosed            = errors.New("bgzf: use of closed writer")
	ErrCorrupt           = errors.New("bgzf: corrupt block")
	ErrBlockOverflow     = errors.New("bgzf: block overflow")
	ErrWrongFileType     = errors.New("bgzf: file is a directory")
	ErrNoEnd             = errors.New("bgzf: cannot determine offset from end")
	ErrNotASeeker        = errors.New("bgzf: not a seeker")
	ErrContaminatedCache = errors.New("bgzf: cache owner mismatch")
	ErrNoBlockSize       = errors.New("bgzf: could not determine block size")
	ErrBlockSizeMismatch = errors.New("bgzf: unexpected block size")
)

Functions

func HasEOF

func HasEOF(r io.ReaderAt) (bool, error)

HasEOF checks for the presence of a BGZF magic EOF block. The magic block is defined in the SAM specification. A magic block is written by a Writer on calling Close. The ReaderAt must provide some method for determining valid ReadAt offsets.
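
For example, a quick truncation check on a BGZF file might look like the following sketch ("reads.bam" is a hypothetical input file; *os.File implements io.ReaderAt and can report its size, so it satisfies the requirement above):

f, err := os.Open("reads.bam")
if err != nil {
	log.Fatal(err)
}
defer f.Close()
ok, err := bgzf.HasEOF(f)
if err != nil {
	log.Fatal(err)
}
if !ok {
	// A missing magic block often indicates a truncated file.
	log.Println("no BGZF magic EOF block")
}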

Types

type Block

type Block interface {
	// Base returns the file offset of the start of
	// the gzip member from which the Block data was
	// decompressed.
	Base() int64

	io.Reader
	io.ByteReader

	// Used returns whether one or more bytes have
	// been read from the Block.
	Used() bool

	// NextBase returns the expected position of the next
	// BGZF block. It returns -1 if the Block is not valid.
	NextBase() int64
	// contains filtered or unexported methods
}

Block wraps interaction with decompressed BGZF data blocks.

type Cache

type Cache interface {
	// Get returns the Block in the Cache with the specified
	// base or a nil Block if it does not exist. The returned
	// Block must be removed from the Cache.
	Get(base int64) Block

	// Put inserts a Block into the Cache, returning the Block
	// that was evicted or nil if no eviction was necessary and
	// a boolean indicating whether the put Block was retained
	// by the Cache.
	Put(Block) (evicted Block, retained bool)

	// Peek returns whether a Block exists in the cache for the
	// given base. If a Block satisfies the request, then exists
	// is returned as true with the offset for the next Block in
	// the stream, otherwise false and -1.
	Peek(base int64) (exists bool, next int64)
}

Cache is a Block caching type. Basic cache implementations are provided in the cache package. A Cache must be safe for concurrent use.

If a Cache is a Wrapper, its Wrap method is called on newly created blocks.
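
As an illustration, a minimal single-slot Cache satisfying this contract might look like the following sketch (singleCache is hypothetical and uses a sync.Mutex for the required concurrency safety; the cache package provides real implementations):

// singleCache is a sketch of a one-slot Cache.
type singleCache struct {
	mu    sync.Mutex
	block bgzf.Block
}

func (c *singleCache) Get(base int64) bgzf.Block {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.block == nil || c.block.Base() != base {
		return nil
	}
	b := c.block
	c.block = nil // The returned Block must be removed from the Cache.
	return b
}

func (c *singleCache) Put(b bgzf.Block) (evicted bgzf.Block, retained bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	evicted, c.block = c.block, b
	return evicted, true
}

func (c *singleCache) Peek(base int64) (exists bool, next int64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.block == nil || c.block.Base() != base {
		return false, -1
	}
	return true, c.block.NextBase()
}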

type Chunk

type Chunk struct {
	Begin Offset
	End   Offset
}

Chunk is a region of a BGZF file.

type Offset

type Offset struct {
	File  int64
	Block uint16
}

Offset is a BGZF virtual offset. File is the file offset of the start of the containing compressed BGZF block and Block is the offset into that block's uncompressed data.
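
A virtual offset is conventionally packed into a single 64-bit integer, with the compressed file offset in the upper 48 bits and the within-block offset in the lower 16. A sketch of that packing (this is the BGZF convention, not an API of this package):

// Pack an Offset into a 64-bit BGZF virtual offset.
voff := uint64(off.File)<<16 | uint64(off.Block)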

type Reader

type Reader struct {
	gzip.Header

	// Blocked specifies the behaviour of the
	// Reader at the end of a BGZF member.
	// If the Reader is Blocked, a Read that
	// reaches the end of a BGZF block will
	// return io.EOF. This error is not sticky,
	// so a subsequent Read will progress to
	// the next block if it is available.
	Blocked bool
	// contains filtered or unexported fields
}

Reader implements BGZF blocked gzip decompression.

func NewReader

func NewReader(r io.Reader, rd int) (*Reader, error)

NewReader returns a new BGZF reader.

The number of concurrent read decompressors is specified by rd. If rd is 0, GOMAXPROCS concurrent decompressors will be created. If rd is 1, blocks will be read synchronously without readahead. The returned Reader should be closed after use to avoid leaking resources.

func (*Reader) Begin

func (bg *Reader) Begin() Tx

Begin returns a Tx that starts at the current virtual offset.

func (*Reader) BlockLen

func (bg *Reader) BlockLen() int

BlockLen returns the number of bytes remaining to be read from the current BGZF block.

func (*Reader) Close

func (bg *Reader) Close() error

Close closes the reader and releases resources.

func (*Reader) LastChunk

func (bg *Reader) LastChunk() Chunk

LastChunk returns the region of the BGZF file read by the last successful read operation or the resulting virtual offset of the last successful seek operation.

func (*Reader) Read

func (bg *Reader) Read(p []byte) (int, error)

Read implements the io.Reader interface.

func (*Reader) ReadByte added in v1.4.0

func (bg *Reader) ReadByte() (byte, error)

ReadByte implements the io.ByteReader interface.

Example
package main

import (
	"bytes"
	"fmt"
	"io"
	"log"
	"os"

	"github.com/biogo/hts/bgzf"
)

func main() {
	// Write Tom Sawyer into a bgzf buffer.
	var buf bytes.Buffer
	w := bgzf.NewWriter(&buf, 1)
	f, err := os.Open("testdata/Mark.Twain-Tom.Sawyer.txt")
	if err != nil {
		log.Fatalf("failed to open file: %v", err)
	}
	defer f.Close()
	_, err = io.Copy(w, f)
	if err != nil {
		log.Fatalf("failed to copy file: %v", err)
	}
	err = w.Close()
	if err != nil {
		log.Fatalf("failed to close bgzf writer: %v", err)
	}

	// The text to search for.
	const line = `"It ain't any use, Huck, we're wrong again."`

	// Read the data until the line is found and output the line
	// number and bgzf.Chunk corresponding to the line's position
	// in the compressed data.
	r, err := bgzf.NewReader(&buf, 1)
	if err != nil {
		log.Fatal(err)
	}
	var n int
	for {
		n++
		b, chunk, err := readLine(r)
		if err != nil {
			if err == io.EOF {
				break
			}
			log.Fatal(err)
		}
		// Make sure we trim the trailing newline.
		if bytes.Equal(bytes.TrimSpace(b), []byte(line)) {
			fmt.Printf("line:%d chunk:%+v\n", n, chunk)
			break
		}
	}

}

// readLine returns a line terminated by a '\n' and the bgzf.Chunk that contains
// the line, including the newline character. If the end of file is reached before
// a newline, the unterminated line and corresponding chunk are returned.
func readLine(r *bgzf.Reader) ([]byte, bgzf.Chunk, error) {
	tx := r.Begin()
	var (
		data []byte
		b    byte
		err  error
	)
	for {
		b, err = r.ReadByte()
		if err != nil {
			break
		}
		data = append(data, b)
		if b == '\n' {
			break
		}
	}
	chunk := tx.End()
	return data, chunk, err
}
Output:

line:5986 chunk:{Begin:{File:112534 Block:11772} End:{File:112534 Block:11818}}

func (*Reader) Seek

func (bg *Reader) Seek(off Offset) error

Seek performs a seek operation to the given virtual offset.
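
Continuing the Reader example above, the recorded Chunk can be used to return directly to the matched line (a sketch assuming r and chunk from that example are in scope; the chunk there lies within a single block, so its length is the difference of the two Block offsets):

err = r.Seek(chunk.Begin)
if err != nil {
	log.Fatal(err)
}
b := make([]byte, chunk.End.Block-chunk.Begin.Block)
_, err = io.ReadFull(r, b)
if err != nil {
	log.Fatal(err)
}
fmt.Printf("%s", b)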

func (*Reader) SetCache

func (bg *Reader) SetCache(c Cache)

SetCache sets the cache to be used by the Reader.

type Tx

type Tx struct {
	// contains filtered or unexported fields
}

Tx represents a multi-read transaction.

func (*Tx) End

func (t *Tx) End() Chunk

End returns the Chunk spanning the transaction. After End returns, the Tx is no longer valid.

type Wrapper

type Wrapper interface {
	Wrap(Block) Block
}

Wrapper defines Cache types that need to modify a Block at its creation.
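
For example, a Cache could wrap each new Block to count reads. Embedding the original Block satisfies the Block interface, including its unexported methods. A sketch building on the hypothetical singleCache above (uses sync/atomic):

// countingCache extends singleCache with a count of Read calls made
// against the Blocks it has wrapped.
type countingCache struct {
	singleCache
	reads int64
}

// Wrap is called by the Reader on newly created blocks because
// countingCache implements Wrapper.
func (c *countingCache) Wrap(b bgzf.Block) bgzf.Block {
	return countingBlock{Block: b, reads: &c.reads}
}

// countingBlock forwards all Block methods, counting calls to Read.
type countingBlock struct {
	bgzf.Block
	reads *int64
}

func (b countingBlock) Read(p []byte) (int, error) {
	atomic.AddInt64(b.reads, 1)
	return b.Block.Read(p)
}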

type Writer

type Writer struct {
	gzip.Header
	// contains filtered or unexported fields
}

Writer implements BGZF blocked gzip compression.

Because the SAM specification requires that the RFC1952 FLG header field be set to 0x04, a Writer's Name and Comment fields should not be set if its output is to be read by another BGZF decompressor implementation.

func NewWriter

func NewWriter(w io.Writer, wc int) *Writer

NewWriter returns a new Writer. Writes to the returned writer are compressed and written to w.

The number of concurrent write compressors is specified by wc.

func NewWriterLevel

func NewWriterLevel(w io.Writer, level, wc int) (*Writer, error)

NewWriterLevel returns a new Writer using the specified compression level instead of gzip.DefaultCompression. Allowable level options are integer values between gzip.BestSpeed and gzip.BestCompression inclusive.

The number of concurrent write compressors is specified by wc.
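
For example, to favour compression ratio over speed (a sketch; buf is any io.Writer and the level constant comes from compress/gzip):

w, err := bgzf.NewWriterLevel(buf, gzip.BestCompression, 1)
if err != nil {
	log.Fatal(err)
}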

func (*Writer) Close

func (bg *Writer) Close() error

Close closes the Writer, waiting for any pending writes before returning the final error of the Writer.

func (*Writer) Error

func (bg *Writer) Error() error

Error returns the error state of the Writer.

func (*Writer) Flush

func (bg *Writer) Flush() error

Flush writes unwritten data to the underlying io.Writer. Flush does not block.

func (*Writer) Next

func (bg *Writer) Next() (int, error)

Next returns the index of the start of the next write within the decompressed data block.

func (*Writer) Wait

func (bg *Writer) Wait() error

Wait waits for all pending writes to complete and returns the subsequent error state of the Writer.

func (*Writer) Write

func (bg *Writer) Write(b []byte) (int, error)

Write writes the compressed form of b to the underlying io.Writer. Decompressed data blocks are limited to BlockSize, so individual byte slices may span block boundaries; however, the Writer attempts to keep each write within a single data block.
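
One way to keep a record from spanning a block boundary is to consult Next before writing and Flush when the record would not fit (a sketch; rec is a hypothetical record payload no larger than BlockSize):

n, err := w.Next()
if err != nil {
	log.Fatal(err)
}
if n+len(rec) > bgzf.BlockSize {
	// Close out the current block so rec starts a fresh one.
	err = w.Flush()
	if err != nil {
		log.Fatal(err)
	}
}
_, err = w.Write(rec)
if err != nil {
	log.Fatal(err)
}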

Directories

Path	Synopsis
cache	Package cache provides basic block cache types for the bgzf package.
index	Package index provides common code for CSI and tabix BGZF indexing.
