Documentation
Overview
Package vcf provides an API for parsing genomic data that complies with the Variant Call Format (VCF) 4.2 specification.
The API is built around channels and assumes asynchronous consumption. Each successfully parsed variant is sent to the consumer through one channel as soon as it is parsed, and lines that fail to parse are sent, together with their error, through a second channel.
Example
Channels should be initialized by the client and passed to the ToChannel function. The client should not close the channels; ToChannel closes them when the input is exhausted.
```go
validVariants := make(chan *Variant, 100)      // buffered channel for correctly parsed variants
invalidVariants := make(chan InvalidLine, 100) // buffered channel for variants that fail to parse

filename := "example_vcfs/test.vcf"
vcfFile, err := os.Open(filename)
if err != nil {
	log.Fatalln("can't open file", filename)
}
defer vcfFile.Close()

go func() {
	// ToChannel closes both channels once the input is exhausted.
	if err := ToChannel(vcfFile, validVariants, invalidVariants); err != nil {
		log.Fatalln(err)
	}
}()

done := make(chan struct{})
go func() {
	// Consume the invalid variants channel asynchronously.
	defer close(done)
	for invalid := range invalidVariants {
		fmt.Println("failed to parse line", invalid.Line, "with error", invalid.Err)
	}
}()

for variant := range validVariants {
	fmt.Println(variant)
	if variant.Qual != nil {
		fmt.Println("Quality:", *variant.Qual)
	}
	fmt.Println("Filter:", variant.Filter)
	// The helper fields below are pointers that are nil when absent or malformed;
	// dereferencing them directly assumes this input file sets all of them.
	fmt.Println("Allele Count:", *variant.AlleleCount)
	fmt.Println("Allele Frequency:", *variant.AlleleFrequency)
	fmt.Println("Total Alleles:", *variant.TotalAlleles)
	fmt.Println("Depth:", *variant.Depth)
	fmt.Println("Mapping Quality:", *variant.MappingQuality)
	fmt.Println("MAPQ0 Reads:", *variant.MAPQ0Reads)
	rawInfo := variant.Info
	vqslod := rawInfo["VQSLOD"]
	fmt.Println("VQSLOD:", vqslod)
}
<-done // wait until every invalid line has been reported
```
Output:

```
Chromosome: 1 Position: 762588 Reference: G Alternative: C
Quality: 40
Filter: PASS
Allele Count: 2
Allele Frequency: 1
Total Alleles: 2
Depth: 5
Mapping Quality: 43.32
MAPQ0 Reads: 0
VQSLOD: 1.18
```
Functions
func SampleIDs
SampleIDs reads a VCF header from an io.Reader and returns a slice with all the sample IDs contained in that header. If the header contains no samples, a nil slice is returned.
Example
```go
filename := "example_vcfs/testsamples.vcf"
vcfFile, err := os.Open(filename)
if err != nil {
	log.Fatalln("can't open file", filename)
}
defer vcfFile.Close()

sampleIDs, err := SampleIDs(vcfFile)
if err == nil && sampleIDs != nil {
	for i, sample := range sampleIDs {
		fmt.Printf("sample %d: %s\n", i, sample)
	}
}
```
Output:

```
sample 0: 111222
```
func ToChannel
func ToChannel(reader io.Reader, output chan<- *Variant, invalids chan<- InvalidLine) error
ToChannel reads from an io.Reader and sends every successfully parsed variant to output, which must already be initialized. Lines whose parsing fails are sent to invalids. If either channel is full, ToChannel blocks, so the consumer must keep receiving from both channels (or guarantee enough buffer space); if one channel is left unread and fills up, ToChannel blocks indefinitely. Both channels are closed once the reader has been fully scanned.
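Because ToChannel blocks whenever a channel it sends on is full, one way to meet the contract without relying on buffer sizes is to receive from both channels in a single select loop. The sketch below is self-contained: Variant and InvalidLine are hypothetical, trimmed-down stand-ins declaring only the fields it uses, and produce is a hypothetical function that mimics ToChannel's documented contract (send on both channels, then close both) so the example runs without the vcf package.

```go
package main

import "fmt"

// Variant and InvalidLine are hypothetical, trimmed-down stand-ins for the
// vcf package types; only the fields used below are declared.
type Variant struct {
	Chrom string
	Pos   int
}

type InvalidLine struct {
	Line string
	Err  error
}

// produce is a hypothetical stand-in for ToChannel with the same contract:
// it sends on both channels and closes both once the input is exhausted.
func produce(lines []string, output chan<- *Variant, invalids chan<- InvalidLine) {
	defer close(output)
	defer close(invalids)
	for i, line := range lines {
		if line == "" {
			invalids <- InvalidLine{Line: line, Err: fmt.Errorf("line %d is empty", i+1)}
			continue
		}
		output <- &Variant{Chrom: line, Pos: i + 1}
	}
}

// drain receives from both channels in one goroutine until both are closed.
func drain(output <-chan *Variant, invalids <-chan InvalidLine) (valid, invalid int) {
	for output != nil || invalids != nil {
		select {
		case v, ok := <-output:
			if !ok {
				output = nil // closed: a nil channel is never selected again
				break
			}
			fmt.Println("variant:", v.Chrom, v.Pos)
			valid++
		case bad, ok := <-invalids:
			if !ok {
				invalids = nil
				break
			}
			fmt.Println("invalid:", bad.Err)
			invalid++
		}
	}
	return valid, invalid
}

func main() {
	// Unbuffered channels are enough here because drain services both of them.
	output := make(chan *Variant)
	invalids := make(chan InvalidLine)
	go produce([]string{"chr1", "", "chr2"}, output, invalids)

	valid, invalid := drain(output, invalids)
	fmt.Println("valid:", valid, "invalid:", invalid)
}
```

Setting a closed channel to nil is the usual Go idiom here: receiving from a nil channel blocks forever, so select stops choosing that case, and the loop ends once both channels have been closed.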
Types
type InvalidLine
InvalidLine represents a VCF line that could not be parsed. It encapsulates the problematic line with its corresponding error.
type Variant
```go
type Variant struct {
	// Required fields
	Chrom  string
	Pos    int
	Ref    string
	Alt    string
	ID     string
	// Qual is a pointer so that it can be set to nil when it is a dot '.'
	Qual   *float64
	Filter string

	// Info is a map containing all the keys present in the INFO field, with their corresponding value.
	// For keys without corresponding values, the value is a `true` bool.
	// No attempt at parsing is made on this field; the data is raw.
	// The only exception is data for multiple alternatives, which is reported separately for each variant.
	Info map[string]interface{}

	// Genotype fields for each sample
	Samples []map[string]string

	// Optional info fields. These are the reserved fields listed in the VCF 4.2 spec, section 1.4.1, item 8.
	// The parsing is lenient: fields that do not conform to the expected type listed here are set to nil.
	// The fields are meant as helpers for common scenarios, since the generic usage is covered by the Info map.
	// Definitions in the metadata section of the header are not taken into account.
	AncestralAllele *string
	Depth           *int
	AlleleFrequency *float64
	AlleleCount     *int
	TotalAlleles    *int
	End             *int
	MAPQ0Reads      *int
	NumberOfSamples *int
	MappingQuality  *float64
	Cigar           *string
	InDBSNP         *bool
	InHapmap2       *bool
	InHapmap3       *bool
	IsSomatic       *bool
	IsValidated     *bool
	In1000G         *bool
	BaseQuality     *float64
	StrandBias      *float64

	// Structural variants
	Imprecise                        *bool
	Novel                            *bool
	StructuralVariantType            *SVType
	StructuralVariantLength          *int
	ConfidenceIntervalAroundPosition *int
	ConfidenceIntervalAroundEnd      *int
}
```
Variant represents the fields specified in the VCF 4.2 spec.
When a Variant is produced through the API of the vcf package, its required fields are guaranteed to be valid; if they are not, parsing of that line fails and the failure is reported on the invalid-lines channel.
A line with multiple alternative alleles is parsed into separate Variant instances, one per alternative. All other fields are optional, and missing or non-conformant values do not cause parsing to fail.
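The field comments on Variant describe three conventions: a '.' QUAL becomes a nil Qual, INFO entries are kept as raw values with value-less flags mapped to true, and the reserved helper fields are parsed leniently, falling back to nil. The sketch below illustrates those conventions on a single line's QUAL and INFO columns. It is an assumption-based illustration, not the package's parser: parseQual, parseInfo and lenientInt are hypothetical names, and treating a '.' INFO column as empty and returning an error for a malformed QUAL are choices made for this sketch.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseQual mirrors the documented Qual semantics: "." (missing) becomes nil.
// Returning an error for other non-numeric input is an assumption of this sketch.
func parseQual(field string) (*float64, error) {
	if field == "." {
		return nil, nil
	}
	q, err := strconv.ParseFloat(field, 64)
	if err != nil {
		return nil, err
	}
	return &q, nil
}

// parseInfo builds a raw INFO map: "KEY=VALUE" entries keep VALUE as an
// unparsed string, and flag entries without "=" map to true.
func parseInfo(field string) map[string]interface{} {
	info := make(map[string]interface{})
	if field == "." {
		return info // missing INFO column (assumption: treated as empty)
	}
	for _, entry := range strings.Split(field, ";") {
		if entry == "" {
			continue
		}
		if key, value, found := strings.Cut(entry, "="); found {
			info[key] = value
		} else {
			info[key] = true
		}
	}
	return info
}

// lenientInt mimics a lenient helper field such as Depth (DP): it returns
// nil when the key is absent or its raw value is not an integer.
func lenientInt(info map[string]interface{}, key string) *int {
	raw, ok := info[key].(string)
	if !ok {
		return nil
	}
	n, err := strconv.Atoi(raw)
	if err != nil {
		return nil
	}
	return &n
}

func main() {
	qual, err := parseQual("40")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	info := parseInfo("AC=2;DP=5;DB;MQ0=x")
	fmt.Println("Quality:", *qual)
	fmt.Println("DB flag:", info["DB"])
	if depth := lenientInt(info, "DP"); depth != nil {
		fmt.Println("Depth:", *depth)
	}
	fmt.Println("MAPQ0 Reads is nil:", lenientInt(info, "MQ0") == nil)
}
```

Keeping the Info map raw and layering typed, nil-able helpers on top means a malformed reserved field never rejects the whole line, which is consistent with the documented rule that only the required fields can make parsing fail.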