pamutil

package

v0.0.0-...-d966d87 (latest) | Published: Aug 18, 2020 | License: Apache-2.0 | Imports: 15 | Imported by: 1

Repository

github.com/grailbio/bio

Documentation ¶

Index ¶

Constants
func BlockIntersectsRange(startAddr, endAddr biopb.Coord, userRange biopb.CoordRange) bool
func CoordPathString(r biopb.Coord) string
func CoordRangePathString(r biopb.CoordRange) string
func FieldDataPath(dir string, recRange biopb.CoordRange, field string) string
func GenerateReadShards(opts GenerateReadShardsOpts, indexes []ShardIndex) ([]biopb.CoordRange, error)
func NewShardIndex(shardRange biopb.CoordRange, h *sam.Header) biopb.PAMShardIndex
func ReadShardIndex(ctx context.Context, dir string, recRange biopb.CoordRange) (index biopb.PAMShardIndex, err error)
func Remove(dir string) error
func ShardIndexPath(dir string, recRange biopb.CoordRange) string
func ValidateCoordRange(r *biopb.CoordRange) error
func WriteShardIndex(ctx context.Context, dir string, coordRange biopb.CoordRange, ...) error
type FileInfo
type FileType
type GenerateReadShardsOpts
type ShardIndex
- func ReadIndexes(ctx context.Context, path string, rng biopb.CoordRange, fields []string) ([]ShardIndex, error)

Constants ¶

const DefaultVersion = "PAM2"

DefaultVersion is the string embedded in ShardIndex.version.

const ShardIndexMagic = uint64(0x725c7226be794c60)

ShardIndexMagic is the value of ShardIndex.Magic.
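A minimal sketch of how a reader might use these two constants to sanity-check a decoded index. The `indexHeader` type and `checkHeader` function are hypothetical stand-ins introduced only for illustration; whether the library performs exactly this check is an assumption.

```go
package main

import "errors"

// Values copied from the constants documented above.
const (
	DefaultVersion  = "PAM2"
	ShardIndexMagic = uint64(0x725c7226be794c60)
)

// indexHeader is a hypothetical stand-in for the magic and version fields
// of a decoded shard index. The real message type is biopb.PAMShardIndex.
type indexHeader struct {
	Magic   uint64
	Version string
}

// checkHeader returns an error if h does not carry the expected magic
// number or version string.
func checkHeader(h indexHeader) error {
	if h.Magic != ShardIndexMagic {
		return errors.New("shard index: bad magic number")
	}
	if h.Version != DefaultVersion {
		return errors.New("shard index: unsupported version " + h.Version)
	}
	return nil
}
```

Calling `checkHeader(indexHeader{Magic: ShardIndexMagic, Version: DefaultVersion})` returns nil; any other magic number or version yields an error.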

Variables ¶

This section is empty.

Functions ¶

func BlockIntersectsRange ¶

func BlockIntersectsRange(startAddr, endAddr biopb.Coord, userRange biopb.CoordRange) bool

BlockIntersectsRange checks if userRange and [startAddr, endAddr] intersect.
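The intersection test can be sketched as follows. This is an illustration under stated assumptions, not the package's implementation: `Coord` and `CoordRange` are simplified local stand-ins for the biopb types (only a reference ID and a position), the block is treated as the closed interval [start, end], and the user range is assumed to be half-open, [Start, Limit).

```go
package main

// Coord is a simplified stand-in for biopb.Coord (assumption: only a
// reference ID and a position are compared).
type Coord struct{ RefID, Pos int32 }

// CoordRange is a simplified stand-in for biopb.CoordRange, assumed to be
// the half-open interval [Start, Limit).
type CoordRange struct{ Start, Limit Coord }

// less orders coordinates by reference ID, then by position.
func (c Coord) less(o Coord) bool {
	if c.RefID != o.RefID {
		return c.RefID < o.RefID
	}
	return c.Pos < o.Pos
}

// blockIntersects reports whether the closed block [start, end] overlaps
// the half-open range r: the block must begin before r.Limit and must not
// end before r.Start.
func blockIntersects(start, end Coord, r CoordRange) bool {
	return start.less(r.Limit) && !end.less(r.Start)
}
```

Under these assumptions, a block spanning positions 10 through 20 on reference 0 intersects the range [0:20, 0:30) but not [0:21, 0:30) or [0:0, 0:10).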

func CoordPathString ¶

func CoordPathString(r biopb.Coord) string

CoordPathString generates a string that can be embedded in a pathname. Use ParsePath() to parse such a string.

func CoordRangePathString ¶

func CoordRangePathString(r biopb.CoordRange) string

CoordRangePathString returns a string that can be used as part of a pathname.

func FieldDataPath ¶

func FieldDataPath(dir string, recRange biopb.CoordRange, field string) string

FieldDataPath returns the path of the file storing data for the given record range and the field.
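As a rough illustration of the naming scheme, the sketch below builds a path of the form "dir/0:0,46:1653469.mapq", the example used elsewhere in this documentation. The `rangeString` and `fieldDataPath` helpers and the two-field `Coord` type are assumptions made for this example; the real path functions may encode additional information.

```go
package main

import "fmt"

// Coord and CoordRange are simplified stand-ins for the biopb types
// (assumption: only a reference ID and a position).
type Coord struct{ RefID, Pos int32 }
type CoordRange struct{ Start, Limit Coord }

// rangeString renders a range as "startRef:startPos,limitRef:limitPos".
func rangeString(r CoordRange) string {
	return fmt.Sprintf("%d:%d,%d:%d",
		r.Start.RefID, r.Start.Pos, r.Limit.RefID, r.Limit.Pos)
}

// fieldDataPath joins the directory, the range string, and the field name
// into "dir/<range>.<field>".
func fieldDataPath(dir string, r CoordRange, field string) string {
	return fmt.Sprintf("%s/%s.%s", dir, rangeString(r), field)
}
```

With these helpers, `fieldDataPath("dir", CoordRange{Coord{0, 0}, Coord{46, 1653469}}, "mapq")` yields "dir/0:0,46:1653469.mapq".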

func GenerateReadShards ¶

func GenerateReadShards(
	opts GenerateReadShardsOpts,
	indexes []ShardIndex) ([]biopb.CoordRange, error)

GenerateReadShards returns a list of biopb.CoordRanges. The biopb.CoordRanges can be passed to NewReader for parallel, sharded record reads. The returned list satisfies the following conditions.

1. The ranges in the list fill opts.Range (or the UniversalRange if not set) exactly, without an overlap or a gap.

2. The length of the list is at least the requested number of shards. The length may exceed that number because this function tries to split a range at a rowshard boundary.

3. The byte size of the file region(s) that cover each biopb.CoordRange is roughly the same.

4. The ranges are sorted in increasing order of biopb.Coord.

opts.NumShards specifies the number of shards. It should generally be zero, in which case the function picks an appropriate default.
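The "no overlap, no gap, sorted" conditions listed above can be checked mechanically. The sketch below is one way to express that check; the `tiles` function and the simplified `Coord` and `CoordRange` types are hypothetical and are not part of this package.

```go
package main

// Coord and CoordRange are simplified stand-ins for the biopb types
// (assumption: only a reference ID and a position).
type Coord struct{ RefID, Pos int32 }
type CoordRange struct{ Start, Limit Coord }

// less orders coordinates by reference ID, then by position.
func (c Coord) less(o Coord) bool {
	if c.RefID != o.RefID {
		return c.RefID < o.RefID
	}
	return c.Pos < o.Pos
}

// tiles reports whether shards cover target exactly: each shard is
// non-empty, starts where the previous one ended, and the last one ends
// at target.Limit. A chain like that is necessarily sorted and free of
// gaps and overlaps.
func tiles(target CoordRange, shards []CoordRange) bool {
	if len(shards) == 0 {
		return false
	}
	next := target.Start
	for _, s := range shards {
		if s.Start != next || !s.Start.less(s.Limit) {
			return false
		}
		next = s.Limit
	}
	return next == target.Limit
}
```

A caller could run such a check on the result of GenerateReadShards in a test before handing the ranges to parallel readers.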

func NewShardIndex ¶

func NewShardIndex(shardRange biopb.CoordRange, h *sam.Header) biopb.PAMShardIndex

NewShardIndex creates a new PAMShardIndex object with the given arguments.

func ReadShardIndex ¶

func ReadShardIndex(ctx context.Context, dir string, recRange biopb.CoordRange) (index biopb.PAMShardIndex, err error)

ReadShardIndex reads the index file, "dir/<recRange>.index".

func Remove ¶

func Remove(dir string) error

Remove deletes the files in the given PAM directory. It returns an error if any of the existing files cannot be deleted.

func ShardIndexPath ¶

func ShardIndexPath(dir string, recRange biopb.CoordRange) string

ShardIndexPath returns the path of the shard index file.

func ValidateCoordRange ¶

func ValidateCoordRange(r *biopb.CoordRange) error

ValidateCoordRange validates "r" and normalizes its fields, if necessary. In particular, if the range fields are all zeros, the range is replaced by UniversalRange.
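The normalization step can be sketched as below. This is an illustration only: the value of `universalRange`, the simplified types, and the rule that a limit preceding the start is invalid are all assumptions, since the documentation does not spell out the validation rules.

```go
package main

import (
	"errors"
	"math"
)

// Coord and CoordRange are simplified stand-ins for the biopb types
// (assumption: only a reference ID and a position).
type Coord struct{ RefID, Pos int32 }
type CoordRange struct{ Start, Limit Coord }

// universalRange is a hypothetical "everything" range; the real
// UniversalRange value may differ.
var universalRange = CoordRange{
	Start: Coord{0, 0},
	Limit: Coord{math.MaxInt32, math.MaxInt32},
}

// less orders coordinates by reference ID, then by position.
func (c Coord) less(o Coord) bool {
	if c.RefID != o.RefID {
		return c.RefID < o.RefID
	}
	return c.Pos < o.Pos
}

// validate replaces an all-zero range with universalRange and rejects a
// range whose limit precedes its start (the latter rule is an assumption).
func validate(r *CoordRange) error {
	if *r == (CoordRange{}) {
		*r = universalRange
		return nil
	}
	if r.Limit.less(r.Start) {
		return errors.New("coord range: limit precedes start")
	}
	return nil
}
```

Because the function takes a pointer, the caller's zero-valued range is rewritten in place, mirroring the "normalizes its fields" behavior described above.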

func WriteShardIndex ¶

func WriteShardIndex(ctx context.Context, dir string, coordRange biopb.CoordRange, msg *biopb.PAMShardIndex) error

WriteShardIndex serializes "msg" into a single-block recordio file "dir/<coordRange>.index". Any existing contents of the file are clobbered.

Types ¶

type FileInfo ¶

type FileInfo struct {
	// Path is the value passed to ParsePath.
	Path string

	// Type is the type of the file. For "dir/0:0,46:1653469.mapq", the type
	// is FileTypeFieldData. For "dir/0:0,46:1653469.index", the type is
	// FileTypeShardIndex.
	Type FileType

	// Field stores the field part of the filename. Field=="mapq" if the pathname
	// is "dir/0:0,46:1653469.mapq". It is meaningful iff Type ==
	// FileTypeFieldData.
	Field string

	// Dir is the directory under which the file is stored. Dir="dir" if the
	// pathname is "dir/0:0,46:1653469.mapq".
	Dir string
	// Range is the record range that the file stores. Range={Start:{0,0},
	// Limit:{46,1653469}} if the pathname is "dir/0:0,46:1653469.mapq".
	Range biopb.CoordRange
}

FileInfo is the result of parsing a pathname.

A PAM pathname looks like "dir/0:0,46:1653469.mapq" or "dir/0:0,46:1653469.index".

func ChooseIndexFilesInRange ¶

func ChooseIndexFilesInRange(allIndexFiles []FileInfo, recRange biopb.CoordRange) ([]FileInfo, error)

ChooseIndexFilesInRange returns the subset of allIndexFiles that overlap recRange. REQUIRES: allIndexFiles[i].Type == FileTypeShardIndex for all i.

func FindIndexFilesInRange ¶

func FindIndexFilesInRange(ctx context.Context, dir string, recRange biopb.CoordRange) ([]FileInfo, error)

FindIndexFilesInRange lists all *.index files that store a record that intersects "recRange".

func ListIndexes ¶

func ListIndexes(ctx context.Context, dir string) ([]FileInfo, error)

ListIndexes lists shard index files found for the given PAM files. The returned list will be sorted by positions.

func ParsePath ¶

func ParsePath(path string) (FileInfo, error)

ParsePath parses a PAM path into its constituent parts. For example, ParsePath("foo/0:1,3:4.index") will result in FileInfo{Path: "foo/0:1,3:4.index", Type: FileTypeShardIndex, Dir: "foo", Range: {biopb.Coord{0,1,0}, biopb.Coord{3,4,0}}}.
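To make the pathname layout concrete, here is a standard-library-only sketch that splits a path such as "dir/0:0,46:1653469.mapq" into its parts. The `pamPath` type and `parsePAMPath` function are hypothetical and much simpler than ParsePath; in particular, they assume the range is always written as four decimal integers.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// pamPath is a hypothetical, simplified counterpart of FileInfo.
type pamPath struct {
	Dir      string // directory part, e.g. "dir"
	Field    string // field name, empty for index files
	IsIndex  bool   // true when the extension is "index"
	StartRef int
	StartPos int
	LimitRef int
	LimitPos int
}

// parsePAMPath splits "dir/<startRef>:<startPos>,<limitRef>:<limitPos>.<ext>".
func parsePAMPath(p string) (pamPath, error) {
	dir, base := path.Split(p)
	dot := strings.LastIndexByte(base, '.')
	if dot < 0 {
		return pamPath{}, fmt.Errorf("pam path %q has no extension", p)
	}
	out := pamPath{Dir: path.Clean(dir)}
	if ext := base[dot+1:]; ext == "index" {
		out.IsIndex = true
	} else {
		out.Field = ext
	}
	// Parse the range portion that precedes the extension.
	_, err := fmt.Sscanf(base[:dot], "%d:%d,%d:%d",
		&out.StartRef, &out.StartPos, &out.LimitRef, &out.LimitPos)
	if err != nil {
		return pamPath{}, fmt.Errorf("pam path %q: bad range: %v", p, err)
	}
	return out, nil
}
```

For "dir/0:0,46:1653469.mapq" this sketch reports Dir "dir", Field "mapq", and a limit of 46:1653469; for the matching ".index" path it sets IsIndex and leaves Field empty.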

type FileType ¶

type FileType int

FileType defines the type of the file, either data or index.

const (
	// FileTypeUnknown is a sentinel
	FileTypeUnknown FileType = iota
	// FileTypeShardIndex represents a *.index file
	FileTypeShardIndex
	// FileTypeFieldData represents a *.<fieldname> file
	FileTypeFieldData
)

type GenerateReadShardsOpts ¶

type GenerateReadShardsOpts struct {
	// Range defines an optional row shard range. Only records in this range will
	// be returned by Scan() and Read(). If Range is unset, the universal range is
	// assumed. See also ReadOpts.Range.
	Range biopb.CoordRange

	// SplitMappedCoords allows GenerateReadShards to split mapped reads of
	// the same <refid, alignment position> into multiple shards. Setting
	// this flag true will cause shard size to be more even, but the caller
	// must be able to handle split reads.
	SplitMappedCoords bool
	// SplitUnmappedCoords allows GenerateReadShards to split unmapped
	// reads into multiple shards. Setting this flag true will cause shard
	// size to be more even, but the caller must be able to handle split
	// unmapped reads.
	SplitUnmappedCoords bool
	// AlwaysSplitMappedAndUnmappedCoords requires shards to be split at the
	// start of unmapped reads, so that no shard contains both mapped and
	// unmapped reads. If this flag is false, a shard may contain both.
	AlwaysSplitMappedAndUnmappedCoords bool

	// BytesPerShard is the target shard size, in bytes across all fields.  If
	// this field is set, NumShards is ignored.
	BytesPerShard int64
	// NumShards specifies the number of shards to create. This field is ignored
	// if BytesPerShard>0. If neither BytesPerShard nor NumShards is set,
	// runtime.NumCPU()*4 shards will be created.
	NumShards int
}

GenerateReadShardsOpts defines options to GenerateReadShards.
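The precedence between BytesPerShard and NumShards described in the field comments can be sketched as follows. The `shardOpts` type and `numShards` function are hypothetical; in particular, deriving the count from a total byte size by ceiling division is an assumption about how a byte target might translate into a shard count.

```go
package main

import "runtime"

// shardOpts mirrors only the two sizing fields of GenerateReadShardsOpts.
type shardOpts struct {
	BytesPerShard int64
	NumShards     int
}

// numShards picks a shard count for totalBytes of data: BytesPerShard
// wins if set, then NumShards, then the documented default of
// runtime.NumCPU()*4.
func numShards(o shardOpts, totalBytes int64) int {
	switch {
	case o.BytesPerShard > 0:
		// Ceiling division (assumption), with at least one shard.
		n := int((totalBytes + o.BytesPerShard - 1) / o.BytesPerShard)
		if n < 1 {
			n = 1
		}
		return n
	case o.NumShards > 0:
		return o.NumShards
	default:
		return runtime.NumCPU() * 4
	}
}
```

For 250 bytes of data, `shardOpts{BytesPerShard: 100, NumShards: 7}` gives 3 shards because the byte target takes precedence, while `shardOpts{NumShards: 7}` gives 7.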

type ShardIndex ¶

type ShardIndex struct {
	// Range is the coordinate range that this object represents. Records and indexes from the
	// source PAM that don't intersect this range were ignored.
	Range biopb.CoordRange
	// ApproxFileBytes is an estimate of the total file size of records in Range (in the
	// underlying PAM)
	ApproxFileBytes int64
	// Blocks is a sequence of index entries from one PAM field that span Range.
	Blocks []biopb.PAMBlockIndexEntry
}

ShardIndex holds data derived from the index information of one PAM file. It is used by the sharder.

func ReadIndexes ¶

func ReadIndexes(ctx context.Context, path string, rng biopb.CoordRange, fields []string) ([]ShardIndex, error)

ReadIndexes reads the ShardIndexes for the PAM file at path, within rng. If the PAM contains no records in rng, it returns an empty slice.
