popgen

package
v1.0.1 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: May 1, 2024 License: BSD-3-Clause Imports: 19 Imported by: 0

Documentation

Overview

Package popgen contains functions for analyzing population genetic statistics from variation or alignment data.

Package popgen contains tools for population genetic analysis, specifically for selection and population structure.

Index

Constants

View Source
const IntegralBound float64 = 1e-12

Variables

This section is empty.

Functions

func AfsDivergenceAscertainmentFixedAlpha

func AfsDivergenceAscertainmentFixedAlpha(afs Afs, alpha float64, binomMap [][]float64, d int, integralError float64) float64

AfsDivergenceAscertainmentFixedAlpha returns the likelihood of observing a particular derived allele frequency spectrum for a given selection parameter alpha. This is the special case where every segregating site has the same value for selection. Also applies a correction for divergence-based ascertainment bias.

func AfsDivergenceAscertainmentFixedAlphaClosure

func AfsDivergenceAscertainmentFixedAlphaClosure(afs Afs, binomMap [][]float64, d int, integralError float64) func(float64) float64

AfsDivergenceAscertainmentFixedAlphaClosure returns a func(float64) float64 representing the likelihood function for a specific derived allele frequency spectrum with a single selection parameter alpha. Incorporates a site-by-site correction for divergence-based ascertainment bias.

func AfsDivergenceAscertainmentLikelihood

func AfsDivergenceAscertainmentLikelihood(afs Afs, alpha []float64, binomMap [][]float64, d int, integralError float64) float64

AfsDivergenceAscertainmentLikelihood is like AfsLikelihood, but makes a correction for divergence-based ascertainment when variant sets were selected for divergence or identity between two groups of d individuals.

func AfsLikelihood

func AfsLikelihood(afs Afs, alpha []float64, binomMap [][]float64, integralError float64) float64

AfsLikelihood returns P(Data|alpha), or the likelihood of observing a particular allele frequency spectrum given alpha, a vector of selection parameters.

func AfsLikelihoodFixedAlpha

func AfsLikelihoodFixedAlpha(afs Afs, alpha float64, binomMap [][]float64, integralError float64) float64

AfsLikelihoodFixedAlpha calculates the likelihood of observing a particular frequency spectrum for a given alpha, a selection parameter. This represents the special case where every segregating site has the same value for selection, which enables faster, more simplified calculation.

func AfsLikelihoodFixedAlphaClosure

func AfsLikelihoodFixedAlphaClosure(afs Afs, binomMap [][]float64, integralError float64) func(float64) float64

AfsLikelihoodFixedAlphaClosure returns a func(float64) float64 representing the likelihood function for a specific derived allele frequency spectrum with a single selection parameter alpha.

func AfsSampleDensity

func AfsSampleDensity(n int, k int, alpha float64, binomMap [][]float64, integralError float64) float64

AfsSampleDensity (also referred to as the F function) is the product of the stationarity and binomial distributions integrated over p, the allele frequency.

func AfsStationarity

func AfsStationarity(p float64, alpha float64) float64

AfsStationarity returns the function value from a stationarity distribution with selection parameter alpha from a particular input allele frequency p.

func AfsStationarityClosure

func AfsStationarityClosure(alpha float64) func(float64) float64

AfsStationarityClosure returns a func(float64)float64 for a stationarity distribution with a fixed alpha value for subsequent integration.

func AfsToFrequency

func AfsToFrequency(a Afs) []float64

AfsToFrequency converts an allele frequency spectrum into allele frequencies. Useful for constructing subsequent AFS histograms.

func AlleleFrequencyProbability

func AlleleFrequencyProbability(i int, n int, alpha float64, binomMap [][]float64, integralError float64) float64

AlleleFrequencyProbability returns the probability of observing i out of n alleles from a stationarity distribution with selection parameter alpha.

func AlleleFrequencyProbabilityAncestralAscertainment

func AlleleFrequencyProbabilityAncestralAscertainment(alpha float64, i int, n int, d int, binomCache [][]float64, integralError float64) float64

AlleleFrequencyProbabilityAncestralAscertainment returns P(i | Asc, alpha) when the variant set has an ancestral allele ascertainment bias.

func AlleleFrequencyProbabilityDerivedAscertainment

func AlleleFrequencyProbabilityDerivedAscertainment(alpha float64, i int, n int, d int, binomCache [][]float64, integralError float64) float64

AlleleFrequencyProbabilityDerivedAscertainment returns P(i | Asc, alpha) when the variant set has a derived allele ascertainment bias.

func AncestralAscertainmentDenominator

func AncestralAscertainmentDenominator(fCache []float64, fCacheSum float64, d int) float64

AncestralAscertainmentDenominator is P(Asc | alpha), a constant normalizing factor for P(i | n, alpha) with the ancestral ascertainment correction.

func AncestralAscertainmentProbability

func AncestralAscertainmentProbability(n int, i int, d int) float64

AncestralAscertainmentProbability returns P(Asc | i) for ancestral allele ascertainment corrections.

func BuildBinomCache

func BuildBinomCache(allN []int) [][]float64

BuildBinomCache makes a 2D matrix where each entry binomCache[n][k] is equal to [n choose k] in logSpace.

func BuildFCache

func BuildFCache(n int, alpha float64, binomCache [][]float64, integralError float64) []float64

BuildFCache builds a slice of len(n) where each index i contains log(F(i | n, alpha)), where F is popgen.AFSSampleDensity.

func BuildLikelihoodCache

func BuildLikelihoodCache(allN []int) [][]float64

BuildLikelihoodCache constructs a cache of likelihood values for MLE. likelihoodCache[n][i] stores the likelihood for a segregating site of n alleles with i in the derived state for a particular selection parameter alpha.

func DerivedAscertainmentDenominator

func DerivedAscertainmentDenominator(fCache []float64, fCacheSum float64, d int) float64

DerivedAscertainmentDenominator is P(Asc | alpha), a constant normalizing factor for P(i | n, alpha) with the derived ascertainment correction.

func DerivedAscertainmentProbability

func DerivedAscertainmentProbability(n int, i int, d int) float64

DerivedAscertainmentProbability returns P(Asc | i) for derived allele ascertainment corrections.

func Dunn

func Dunn(b bed.Bed, aln []fasta.Fasta, g []*Group, realign bool) (float64, int, string)

Dunn takes a region of the genome, a multiple alignment, a set of groups, and an option to realign the bed region of the alignment before calculating the Dunn Index. The returns are the dunn index as a float64, the number of segregating sites considered, and missing group members as a string. Mathematical details of the Dunn Index are described at https://en.wikipedia.org/wiki/Dunn_index.

func FIntegralComponent

func FIntegralComponent(n int, k int, alpha float64, binomMap [][]float64) func(float64) float64

FIntegralComponent is a helper function of AfsSampleDensity and represents the component within the integral.

func FilterMultByGroup

func FilterMultByGroup(aln []fasta.Fasta, g []*Group) []fasta.Fasta

FilterMultByGroup takes in a multiFa alignment returns a multiFa containing only the entries that are contained in an input slice of Group structs.

func FindMissingGroupMembers

func FindMissingGroupMembers(aln []fasta.Fasta, g []*Group) string

FindMissingGroupMembers returns a string of all of the entries in a Group slice that are not contained in the names of a multiFa alignment.

func GetFCacheSum

func GetFCacheSum(fCache []float64) float64

GetFCacheSum uses the fCAche built in BuildFCache and calculates the sum from j=1 to n-1.

func GroupCompare

func GroupCompare(a *Group, b *Group) int

GroupCompare compares two Groups a and b for sorting or equality testing.

func GroupContains

func GroupContains(g *Group, s string) bool

GroupContains returns true if an input string s is contained within the members of group g, false otherwise.

func GroupListsAreEqual

func GroupListsAreEqual(a []*Group, b []*Group) bool

GroupListsAreEqual returns true if all the groups in a list of groups are equal to another list of groups, false otherwise.

func GroupListsCompare

func GroupListsCompare(a []*Group, b []*Group) int

GroupListsCompare compares two slices of Groups a and b for sorting and equality testing.

func GroupsContains

func GroupsContains(g []*Group, s string) bool

GroupsContains returns true if any groups within a slice of groups g contains a string s, false otherwise.

func InvertSegSite

func InvertSegSite(s *SegSite)

InvertSegSite reverses the polarity of a segregating site in place.

func MetropolisAccept

func MetropolisAccept(old Theta, thetaPrime Theta, s McmcSettings) bool

MetropolisAccept is a helper function of MetropolisHastings that determines whether to accept or reject a candidate parameter set.

func MetropolisHastings

func MetropolisHastings(data Afs, outFile string, s McmcSettings)

MetropolisHastings implements the MH algorithm for Markov Chain Monte Carlo approximation of the posterior distribution for selection based on an input allele frequency spectrum. muZero and sigmaZero represent the starting hyperparameter values.

func PlotAfsF

func PlotAfsF(alpha float64, n int, outFile string, integralError float64)

PlotAfsF writes the Allele Frequency F function (AfsSampleDensity) to an output file for downstream visualization.

func PlotAfsLikelihood

func PlotAfsLikelihood(afs Afs, outFile string, leftBound float64, rightBound float64, numPoints int, integralError float64, divergenceAscertainment bool, D int)

PlotAfsLikelihood plots the likelihood function from input Afs data. Specifies a left and right bound for plotting and a number of points to be plotted. Optiosn available for applying the divergence based ascertainment bias correction, changing the ascertainment subset size, and changing the integral error. This program will fatal if it is asked to plot the likelihood when alpha is equal to exactly 0.

func PlotAfsPmf

func PlotAfsPmf(alpha float64, n int, outFile string, integralError float64, derived bool, ancestral bool)

PlotAfsPmf writes the allele frequency probability mass function (AlleleFrequencyProbability) to an output file for downstream visualization. Derived or ancestral flag enable visualization of the AfsPmf with the divergence-based ascertainment correction.

func PlotAncestralAscertainmentDenominator

func PlotAncestralAscertainmentDenominator(outFile string, n int, d int, alpha float64, integralError float64)

PlotAncestralAscertainmentDenominator takes the path to an output file and prints the values of AncestralAscertainmentDenominator to the file for the parameters provided.

func PlotAncestralAscertainmentProbability

func PlotAncestralAscertainmentProbability(outFile string, n int, d int)

PlotAncestralAscertainmentProbability takes the path to an output file and prints the values of AncestralAscertainmentProbability(n, i, d) for all values of i to the file.

func PlotDerivedAscertainmentDenominator

func PlotDerivedAscertainmentDenominator(outFile string, n int, d int, alpha float64, integralError float64)

PlotDerivedAscertainmentDenominator takes the path to an output file and prints the values of DerivedAscertainmentDenominator to the file, given the input parameters.

func PlotDerivedAscertainmentProbability

func PlotDerivedAscertainmentProbability(outFile string, n int, d int)

PlotDerivedAscertainmentProbability takes the path to an output file and prints the values of DerivedAscertainmentProbability(n, i, d) for all values of i to the file.

func PosteriorOdds

func PosteriorOdds(old Theta, thetaPrime Theta) float64

PosteriorOdds is a helper function of MetropolisAccept that returns the Bayes Factor times the Prior Odds this should be the probability of accepting (can be greater than 1) if the Hastings Ratio is one.

func SegSiteToAlleleArray

func SegSiteToAlleleArray(s *SegSite) []int16

SegSiteToAlleleArray is a helper function of SimulateGenotype that takes a SegSite, constructs and array of values with i values set to 1 and n-i values set to 0. The array is then shuffled and returned.

func SelectionMaximumLikelihoodEstimate

func SelectionMaximumLikelihoodEstimate(data Afs, s MleSettings) float64

SelectionMaximumLikelihoodEstimate performs MLE on an input allele frequency spectrum and writes the result to an output file.

func SimulateGenotype

func SimulateGenotype(alpha float64, n int, boundAlpha float64, boundBeta float64, boundMultiplier float64) ([]vcf.Sample, bool)

SimulateGenotype returns a slice of type vcf.GenomeSample, representing a Sample field of a vcf struct, with an allele frequency drawn from a stationarity distribution with selection parameter alpha. Second return is true if the current genotype is a divergent base.

func StationaritySampler

func StationaritySampler(alpha float64, samples int, maxSampleDepth int, bins int, xLeft float64, xRight float64) []float64

StationaritySampler returns an allele frequency i out of n individuals sampled from a stationarity distribution with selection parameter alpha.

func VcfSampleDerivedAlleleFrequency

func VcfSampleDerivedAlleleFrequency(v vcf.Vcf) float64

VcfSampleDerivedAlleleFrequency returns the derived allele frequency based on the sample columns of a VCF variant.

func WriteTSV

func WriteTSV(outFile string, wf WrightFisherPopData)

WriteTSV writes a .tsv file based on WrightFisherPopData, including: Comments starts with # that include metadata about the parameters of the simulation Header line: Gen Site Freq.A Freq.C Freq.G Freq.T Ancestral Frequencies table.

Types

type Afs

type Afs struct {
	Sites []*SegSite
}

Afs is a struct that contains an Allele Frequency Spectrum, or a set of segregating sites.

func MultiFaToAfs

func MultiFaToAfs(aln []fasta.Fasta) Afs

MultiFaToAfs constructs an allele frequency spectrum struct from a multiFa alignment block.

func VcfToAfs

func VcfToAfs(filename string, UnPolarized bool, DivergenceAscertainment bool, IncludeRef bool) (*Afs, error)

VcfToAfs reads in a vcf file, parses the genotype information, and constructs an AFS struct. Polarized flag, when true, returns only variants with the ancestor annotated in terms of polarized, derived allele frequencies. IncludeRef, when true, uses the reference genome as an additional data point for the output AFS.

type Group

type Group struct {
	Name    string
	Members []string
}

Group is a struct with associated functions that provide a system to partition a list of samples or species for subsequent analysis of subsections of the overall data.

func ReadGroups

func ReadGroups(filename string) []*Group

ReadGroups parses a Group format file into a slice of Group structs.

type LikelihoodFunction

type LikelihoodFunction byte
const (
	Uncorrected LikelihoodFunction = iota
	Ancestral
	Derived
)

type McmcSettings

type McmcSettings struct {
	Iterations              int
	MuStep                  float64
	MuZero                  float64
	SigmaStep               float64
	SigmaZero               float64
	RandSeed                bool  //pseudorandom number generation control: set the seed as current unix time
	SetSeed                 int64 //pseudorandom number generation control: set the seed to a user-specified int64.
	UnPolarized             bool
	DivergenceAscertainment bool
	FixedSigma              bool
	D                       int //D is the size of the ascertainment subset.
	IntegralError           float64
	Verbose                 int
	SigmaPriorAlpha         float64 //defines the alpha parameter for the prior distribution for the Theta hyperparameter sigma.
	SigmaPriorBeta          float64 //defines the beta parameter for the prior distribution for the Theta hyperparameter sigma.
	MuPriorMean             float64 //defines the mean of the prior distribution for the Theta hyperparameter mu.
	MuPriorSigma            float64 //defines the standard deviation of the prior distribution for the Theta hyperparameter mu.
	IncludeRef              bool    //If true, includes the reference genome allele state as a datapoint in the allele frequency spectrum.
}

McmcSettings is a struct that stores settings for the various Mcmc helper functions.`.

type MleSettings

type MleSettings struct {
	Left                    float64 //left bound of MLE search space
	Right                   float64 //right bound of MLE search space
	Error                   float64 //desired accuracy in output maximum likelihood estimate
	UnPolarized             bool    //disable the need for ancestor annotation in the vcf file. Assumes the reference allele is in the ancestral state. Use with caution.
	DivergenceAscertainment bool    //if true, use the divergence-based ascertainment bias-corrected likelihood function for selection.
	D                       int     //for DivergenceAscertainment, set the size of the ascertainment subset.
	IntegralError           float64 //set the acceptable error in the internal integral calculations in the likelihood function.
	Verbose                 int     //default 0. When set to 1, debug prints appear in standard output.
	IncludeRef              bool    //Includes the reference genome allele state in the derived allele frequency spectrum.
}

MleSettings delineates the experimental parameters for running maximum likelihood estimation on a set of variants using selectionMLE.

type SegSite

type SegSite struct {
	I int                //individuals with the allele
	N int                //total number of individuals
	L LikelihoodFunction //specifies with likelihood function to use. 1 for ancestral, 0 for uncorrected, 2 for derived.
}

SegSite is the basic struct for segregating sites, used to construct allele frequency spectra.

func SimulateSegSite

func SimulateSegSite(alpha float64, n int, boundAlpha float64, boundBeta float64, boundMultiplier float64) (*SegSite, bool)

SimulateSegSite returns a segregating site with a non-zero allele frequency sampled from a stationarity distribution with selection parameter alpha. the second returns true if the site is divergent.

func VcfSampleToSegSite

func VcfSampleToSegSite(i vcf.Vcf, DivergenceAscertainment bool, UnPolarized bool, IncludeRef bool) (*SegSite, bool)

VcfSampleToSegSite returns a SegSite struct from an input Vcf entry. Enables flag for divergenceBasedAscertainment correction conditions and the unPolarized condition. Two returns: a pointer to the SegSite struct, and a bool that is true if the SegSite was made without issue, false for soft errors. If IncludeRef is true, adds the reference allele as an extra data point to the SegSite.

type Theta

type Theta struct {
	// contains filtered or unexported fields
}

Theta is a struct that stores parameter sets, including the alpha vector, mu, and sigma parameters, along with the likelihood of a particular parameter set for MCMC.

func GenerateCandidateThetaPrime

func GenerateCandidateThetaPrime(t Theta, data Afs, binomCache [][]float64, s McmcSettings) Theta

GenerateCandidateThetaPrime is a helper function of Metropolis Hastings that picks a new set of parameters based on the state of the current parameter set t. TODO: We could avoid some memory allocations by passing in an "old" theta and overwriting the values.

func InitializeTheta

func InitializeTheta(m float64, sig float64, data Afs, binomCache [][]float64, s McmcSettings) Theta

InitializeTheta is a helper function of Metropolis Hastings that generates the initial value of theta based on argument values.

type WrightFisherPopData

type WrightFisherPopData struct {
	Fasta     []fasta.Fasta
	Vcf       vcf.Vcf
	Meta      []string
	Freq      [][][]float64
	Ancestral []string //Ancestral states of each allele on each site (aka replicates)
	Settings  WrightFisherSettings
}

type WrightFisherSettings

type WrightFisherSettings struct {
	PopSize         int
	MutRate         float64
	NumGen          int
	GenomeSize      int
	RFitness        float64
	GcContent       float64
	InitFreq        string
	FitnessString   string
	SetSeed         int64
	Verbose         bool
	Fasta           bool
	Vcf             bool
	AncestralAllele string
}

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL