glm

Version: v0.0.0-...-ee97d3e

Note: this package is not in the latest version of its module.

Published: May 19, 2021 | License: BSD-3-Clause | Imports: 12 | Imported by: 1

README

glm estimates generalized linear models (GLMs) in Go.

See the examples directory for examples of the results this package can produce.

Supported features

  • Estimation via IRLS and gonum optimizers

  • Supports many GLM families, links and variance functions

  • Supports estimation for case-weighted datasets

  • Regularized (ridge/LASSO/elastic net) estimation

  • Offsets

  • Unit tests covering all families with their default links and variance functions, and some of the more common non-canonical links

Missing features

  • Performance assessments

  • Model diagnostics

  • Marginalization

  • Missing data handling

  • GEE

  • Inference for survey data

Documentation

Overview

Package glm implements procedures for fitting generalized linear models (GLMs) in Go.

The data are provided to the models using the dstream package; see http://github.com/kshedden/dstream.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Config

type Config struct {

	// A logger to which logging information is written
	Log *log.Logger

	// FitMethod is the numerical approach for fitting the model.  Allowed
	// values include IRLS, gradient, and coordinate.
	FitMethod string

	// ConcurrentIRLS is the number of concurrent goroutines used in IRLS
	// fitting.
	ConcurrentIRLS int

	// Start contains starting values for the regression parameter estimates
	Start []float64

	// WeightVar is the name of the variable used to frequency-weight the cases; if it is
	// the empty string, all weights are equal to 1.
	WeightVar string

	// OffsetVar is the name of a variable providing an offset
	OffsetVar string

	// Family defines a GLM family.
	Family *Family

	// Link defines a GLM link function; if not provided the default link for the family is used.
	Link *Link

	// VarFunc defines how the variance relates to the mean; if not provided, the default
	// variance function for the family is used.
	VarFunc *Variance

	// L1Penalty gives the level of penalization for each variable, by name
	L1Penalty map[string]float64

	// L2Penalty gives the level of penalization for each variable, by name
	L2Penalty map[string]float64

	// DispersionForm determines how the dispersion parameter is handled
	DispersionForm DispersionForm
}

Config defines configuration parameters for a GLM.
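Config.L1Penalty and Config.L2Penalty assign penalty weights by variable name. The following is a minimal sketch, not the package's code, of how such per-variable weights could be combined into an elastic-net penalty term. The helper `penalty`, the variable names, and the exact form sum(l1*|b| + l2*b^2) are assumptions; some implementations scale the L2 term by 1/2.

```go
package main

import (
	"fmt"
	"math"
)

// penalty combines per-variable L1 and L2 weights, keyed by variable name,
// into sum_j l1[name_j]*|b_j| + l2[name_j]*b_j^2. Hypothetical helper: the glm
// package may scale the terms differently. Names missing from a map (or a nil
// map) contribute nothing, because looking up a missing key yields 0.
func penalty(names []string, coeff []float64, l1, l2 map[string]float64) float64 {
	var p float64
	for j, name := range names {
		p += l1[name] * math.Abs(coeff[j])
		p += l2[name] * coeff[j] * coeff[j]
	}
	return p
}

func main() {
	// Hypothetical variable names and penalty weights.
	names := []string{"age", "income"}
	coeff := []float64{-2, 3}
	l1 := map[string]float64{"age": 0.5}      // LASSO-type penalty on age only
	l2 := map[string]float64{"income": 0.1}   // ridge-type penalty on income only
	fmt.Println(penalty(names, coeff, l1, l2)) // 0.5*|-2| + 0.1*3^2
}
```

Keying penalties by name lets a variable be left unpenalized simply by omitting it from both maps.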

func DefaultConfig

func DefaultConfig() *Config

DefaultConfig returns default configuration values for a GLM.

type DevianceFunc

type DevianceFunc func([]statmodel.Dtype, []float64, []statmodel.Dtype, float64) float64

DevianceFunc evaluates and returns the deviance for a GLM. The arguments are the data, the mean values, the weights, and the scale parameter. The weights may be nil in which case all weights are taken to be 1.
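As an illustration of the DevianceFunc argument order (data, means, weights, scale), here is a sketch of a Poisson deviance. It assumes, for illustration only, that statmodel.Dtype can be replaced by float64 and that the returned value is the deviance divided by the scale parameter. `poissonDeviance` is a hypothetical name, and the package's own Poisson deviance may treat the scale differently.

```go
package main

import (
	"fmt"
	"math"
)

// poissonDeviance follows the DevianceFunc argument order but uses float64 in
// place of statmodel.Dtype. A nil weights slice means every weight is 1.
func poissonDeviance(y, mu, wt []float64, scale float64) float64 {
	var dev float64
	for i := range y {
		w := 1.0
		if wt != nil {
			w = wt[i]
		}
		// Unit deviance: 2*[y*log(y/mu) - (y - mu)], with y*log(y/mu) taken as 0 when y == 0.
		d := -(y[i] - mu[i])
		if y[i] > 0 {
			d += y[i] * math.Log(y[i]/mu[i])
		}
		dev += 2 * w * d
	}
	return dev / scale
}

func main() {
	y := []float64{0, 2, 5}
	mu := []float64{1, 2, 4}
	fmt.Println(poissonDeviance(y, mu, nil, 1))
}
```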

type DispersionForm

type DispersionForm uint8

DispersionForm indicates an approach for handling the dispersion parameter.

const (
	DispersionFixed DispersionForm
	DispersionFree
	DispersionEstimate
)

DispersionFixed, ... define ways to handle the dispersion parameter in a GLM.

type Family

type Family struct {

	// The name of the family
	Name string

	// The numeric code for the family
	TypeCode FamilyType

	// The log-likelihood function for the family
	LogLike LogLikeFunc

	// The deviance function for the family
	Deviance DevianceFunc
	// contains filtered or unexported fields
}

Family represents a generalized linear model family.

func NewFamily

func NewFamily(fam FamilyType) *Family

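NewFamily returns a family object corresponding to the given family type. Supported values are BinomialFamily, GammaFamily, GaussianFamily, InvGaussianFamily, PoissonFamily, and QuasiPoissonFamily. The negative binomial and Tweedie families take an additional parameter and are constructed with NewNegBinomFamily and NewTweedieFamily.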

func NewNegBinomFamily

func NewNegBinomFamily(alpha float64, link *Link) *Family

NewNegBinomFamily returns a new family object for the negative binomial family, using the given dispersion parameter alpha and link function. The variance for mean m is m + alpha*m^2.

func NewTweedieFamily

func NewTweedieFamily(pw float64, link *Link) *Family

NewTweedieFamily returns a new family object for the Tweedie family, using the given variance power and link function. The variance power determines the mean/variance relationship, variance = mean^pw. If link is nil, the canonical link is used, which is a power function with exponent 1 - pw. Passing NewLink(LogLink) as the link gives the log link, which avoids domain violations.
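A small sketch of the Tweedie mean/variance relationship and the canonical power link described above. The function names are hypothetical, and the pw == 1 case follows the NewPowerLink convention that a power link with exponent 0 is the log link.

```go
package main

import (
	"fmt"
	"math"
)

// tweedieVariance returns the Tweedie variance mu^pw for mean mu.
func tweedieVariance(mu, pw float64) float64 {
	return math.Pow(mu, pw)
}

// tweedieCanonicalLink returns eta = mu^(1-pw). When pw == 1 the exponent is 0,
// and, as with a power link of exponent 0, the log link is used instead.
func tweedieCanonicalLink(mu, pw float64) float64 {
	if pw == 1 {
		return math.Log(mu)
	}
	return math.Pow(mu, 1-pw)
}

func main() {
	pw := 1.5 // 1 < pw < 2 corresponds to a compound Poisson-gamma distribution
	for _, mu := range []float64{0.5, 1, 4} {
		fmt.Printf("mu=%.1f  var=%.4f  eta=%.4f\n",
			mu, tweedieVariance(mu, pw), tweedieCanonicalLink(mu, pw))
	}
}
```

With 1 < pw < 2 the exponent 1 - pw is negative, so recovering the mean as eta^(1/(1-pw)) requires eta > 0. A linear predictor that reaches zero or below has no valid mean, which is one way the domain violations mentioned above can arise.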

func (*Family) IsValidLink

func (fam *Family) IsValidLink(link *Link) bool

IsValidLink reports whether the link is valid for the family.

type FamilyType

type FamilyType uint8

FamilyType is the type of GLM family used in a model.

const (
	BinomialFamily FamilyType = iota
	PoissonFamily
	QuasiPoissonFamily
	GaussianFamily
	GammaFamily
	InvGaussianFamily
	NegBinomFamily
	TweedieFamily
)

BinomialFamily, ... are families for a GLM.

type GLM

type GLM struct {
	// contains filtered or unexported fields
}

GLM represents a generalized linear model.

func NewGLM

func NewGLM(data statmodel.Dataset, outcome string, predictors []string, config *Config) (*GLM, error)

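NewGLM creates a new generalized linear model for the given outcome and predictor variables in data, using the family specified in config. If config does not provide a link or variance function, the family's default link and variance function are used.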

func (*GLM) ConcurrentIRLS

func (model *GLM) ConcurrentIRLS(n int) *GLM

ConcurrentIRLS sets the minimum chunk size for which concurrent calculations are used during IRLS.

func (*GLM) Dataset

func (model *GLM) Dataset() [][]statmodel.Dtype

Dataset returns the data columns that are used to fit the model.

func (*GLM) EstimateScale

func (model *GLM) EstimateScale(params []float64) float64

EstimateScale returns an estimate of the GLM scale parameter at the given parameter values.

func (*GLM) Fit

func (model *GLM) Fit() *GLMResults

Fit estimates the parameters of the GLM and returns a results object. Unregularized fits and fits involving L2 regularization can be obtained, but if L1 regularization is desired use FitRegularized instead of Fit.
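Config.FitMethod lists IRLS as one of the fitting approaches. The following is a self-contained sketch of iteratively reweighted least squares for a Poisson model with log link, an intercept, and a single covariate. It illustrates the algorithm rather than reproducing the package's implementation, and it solves the 2x2 weighted normal equations directly instead of calling a general linear solver.

```go
package main

import (
	"fmt"
	"math"
)

// irlsPoisson fits log(E[y]) = b0 + b1*x by IRLS, starting from b = (0, 0).
func irlsPoisson(x, y []float64, maxIter int, tol float64) [2]float64 {
	var beta [2]float64
	for iter := 0; iter < maxIter; iter++ {
		// Accumulate the weighted normal equations X'WX b = X'Wz.
		var a00, a01, a11, b0, b1 float64
		for i := range y {
			eta := beta[0] + beta[1]*x[i]
			mu := math.Exp(eta)
			// For the log link dmu/deta = mu and the Poisson variance is mu,
			// so the IRLS weight is mu and the working response is eta + (y-mu)/mu.
			w := mu
			z := eta + (y[i]-mu)/mu
			a00 += w
			a01 += w * x[i]
			a11 += w * x[i] * x[i]
			b0 += w * z
			b1 += w * x[i] * z
		}
		// Solve the symmetric 2x2 system by Cramer's rule.
		det := a00*a11 - a01*a01
		next := [2]float64{
			(a11*b0 - a01*b1) / det,
			(a00*b1 - a01*b0) / det,
		}
		change := math.Abs(next[0]-beta[0]) + math.Abs(next[1]-beta[1])
		beta = next
		if change < tol {
			break
		}
	}
	return beta
}

func main() {
	// Two groups with means 2 and 4, so the MLE is (log 2, log 4 - log 2) = (log 2, log 2).
	x := []float64{0, 0, 1, 1}
	y := []float64{1, 3, 4, 4}
	beta := irlsPoisson(x, y, 100, 1e-12)
	fmt.Printf("intercept=%.6f slope=%.6f\n", beta[0], beta[1])
}
```

Each iteration is a weighted least squares fit of the working response z, which is why IRLS parallelizes naturally over chunks of observations.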

func (*GLM) Focus

func (model *GLM) Focus(pos int, coeff []float64, offset []float64) statmodel.RegFitter

Focus returns a new GLM instance with a single variable, namely the variable at position pos in the original model. The effects of the remaining covariates are captured through the offset.

func (*GLM) Hessian

func (model *GLM) Hessian(param statmodel.Parameter, ht statmodel.HessType, hess []float64)

Hessian calculates the Hessian matrix for the model at the given parameter and stores it in hess, a one-dimensional slice holding the vectorized form of the matrix. The ht argument selects whether the observed or expected Hessian is calculated.
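To illustrate filling a caller-provided, vectorized Hessian, here is a sketch for a Poisson model with log link. Because the log link is canonical for the Poisson family, the observed and expected Hessians coincide, both equal to -X' diag(mu) X. The name `poissonHessian` and the row-major design layout are assumptions; since a Hessian is symmetric, row-major and column-major vectorization give the same slice.

```go
package main

import (
	"fmt"
	"math"
)

// poissonHessian writes -X' diag(mu) X into hess (length at least p*p),
// where x is the n-by-p design matrix in row-major order.
func poissonHessian(x []float64, n, p int, beta, hess []float64) {
	for k := range hess[:p*p] {
		hess[k] = 0
	}
	for i := 0; i < n; i++ {
		row := x[i*p : (i+1)*p]
		var eta float64
		for j := 0; j < p; j++ {
			eta += row[j] * beta[j]
		}
		mu := math.Exp(eta)
		for j := 0; j < p; j++ {
			for k := 0; k < p; k++ {
				hess[j*p+k] -= mu * row[j] * row[k]
			}
		}
	}
}

func main() {
	// Design matrix rows (1, 0) and (1, 1), evaluated at beta = (0, 0).
	x := []float64{1, 0, 1, 1}
	hess := make([]float64, 4)
	poissonHessian(x, 2, 2, []float64{0, 0}, hess)
	fmt.Println(hess)
}
```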

func (*GLM) LinearPredictor

func (model *GLM) LinearPredictor(params *GLMParams, lp []float64) []float64

LinearPredictor returns the linear combination of the model covariates based on the provided parameter vector. The provided slice is used if it is large enough, otherwise a new slice is allocated. The linear predictor is returned.
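The "reuse the slice if it is large enough" convention used by LinearPredictor and several other methods can be sketched as follows. This is an illustration, not the package's code: `linearPredictor` is hypothetical, the covariates are stored column by column (as Dataset returns data columns), "large enough" is interpreted as sufficient capacity, and an optional offset is added as in GLMs with Config.OffsetVar set.

```go
package main

import "fmt"

// linearPredictor computes lp[i] = offset[i] + sum_j coeff[j]*x[j][i].
// x holds one slice per covariate and must have at least one column.
// lp is reused when its capacity suffices; otherwise a new slice is allocated.
// offset may be nil.
func linearPredictor(x [][]float64, coeff, offset, lp []float64) []float64 {
	n := len(x[0])
	if cap(lp) >= n {
		lp = lp[:n]
	} else {
		lp = make([]float64, n)
	}
	for i := range lp {
		lp[i] = 0
		if offset != nil {
			lp[i] = offset[i]
		}
	}
	for j, col := range x {
		for i, v := range col {
			lp[i] += coeff[j] * v
		}
	}
	return lp
}

func main() {
	x := [][]float64{{1, 1, 1}, {0, 1, 2}} // intercept column, then one covariate
	buf := make([]float64, 3)
	lp := linearPredictor(x, []float64{1, 2}, []float64{0, 0, 10}, buf)
	fmt.Println(lp)
}
```

Reusing the caller's buffer avoids an allocation on every iteration of a fitting loop.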

func (*GLM) LogLike

func (model *GLM) LogLike(params statmodel.Parameter, exact bool) float64

LogLike returns the log-likelihood value for the generalized linear model at the given parameter values. If exact is false, multiplicative factors that are constant with respect to the parameter may be omitted.
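The effect of the exact flag can be seen in the Poisson case. A multiplicative factor in the likelihood that does not depend on the parameters becomes an additive term in the log-likelihood, here -log(y!). Dropping it changes the value but not the maximizer or differences between parameter values. The sketch below is illustrative; `poissonLogLike` is a hypothetical name.

```go
package main

import (
	"fmt"
	"math"
)

// poissonLogLike returns sum_i [y*log(mu) - mu - log(y!)]. When exact is
// false, the parameter-free log(y!) term is omitted.
func poissonLogLike(y, mu []float64, exact bool) float64 {
	var ll float64
	for i := range y {
		ll += y[i]*math.Log(mu[i]) - mu[i]
		if exact {
			lg, _ := math.Lgamma(y[i] + 1) // log(y!) for integer y
			ll -= lg
		}
	}
	return ll
}

func main() {
	y := []float64{2, 0, 5}
	mu := []float64{2, 1, 4}
	fmt.Println(poissonLogLike(y, mu, true), poissonLogLike(y, mu, false))
}
```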

func (*GLM) Mean

func (model *GLM) Mean(pa *GLMParams, mn []float64) []float64

Mean returns the fitted mean of the GLM for the given parameter. If the provided slice 'mn' is large enough to hold the result, it is used, otherwise a new slice is allocated. The fitted means are returned.

func (*GLM) NumObs

func (model *GLM) NumObs() int

NumObs returns the number of observations used to fit the model.

func (*GLM) NumParams

func (model *GLM) NumParams() int

NumParams returns the number of covariates in the model.

func (*GLM) OptMethod

func (model *GLM) OptMethod(method optimize.Method) *GLM

OptMethod sets the optimization method, taken from gonum's optimize package.

func (*GLM) OptSettings

func (model *GLM) OptSettings(s *optimize.Settings) *GLM

OptSettings allows the caller to provide an optimization settings value.

func (*GLM) PearsonResid

func (model *GLM) PearsonResid(pa *GLMParams, resid []float64) []float64

PearsonResid calculates the Pearson residuals at the given parameter value. The Pearson residuals are the standardized residuals, using the model standard deviation to standardize. If the provided slice is large enough to hold the result, it is used, otherwise a new slice is allocated. The Pearson standardized residuals are returned.
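A sketch of the Pearson residual (y - mu) / sqrt(V(mu)), where V is the variance function. Whether the package also divides by the square root of the scale parameter is not stated above, so this sketch leaves the scale out; `pearsonResid` is a hypothetical name.

```go
package main

import (
	"fmt"
	"math"
)

// pearsonResid returns (y - mu)/sqrt(v(mu)) for each observation. resid is
// reused if its capacity suffices; otherwise a new slice is allocated.
func pearsonResid(y, mu []float64, v func(float64) float64, resid []float64) []float64 {
	if cap(resid) >= len(y) {
		resid = resid[:len(y)]
	} else {
		resid = make([]float64, len(y))
	}
	for i := range y {
		resid[i] = (y[i] - mu[i]) / math.Sqrt(v(mu[i]))
	}
	return resid
}

func main() {
	poissonVar := func(m float64) float64 { return m } // Poisson: V(mu) = mu
	fmt.Println(pearsonResid([]float64{3, 1, 0}, []float64{1, 4, 2}, poissonVar, nil))
}
```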

func (*GLM) Resid

func (model *GLM) Resid(pa *GLMParams, resid []float64) []float64

Resid returns the residuals (observed minus fitted values) for the model, at the given parameter vector.

func (*GLM) Score

func (model *GLM) Score(params statmodel.Parameter, score []float64)

Score evaluates the score function (the gradient of the log-likelihood) for the GLM at the given parameter values and stores the result in score.
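For a Poisson model with the canonical log link, the score reduces to X'(y - mu). The sketch below writes it into a caller-provided slice, mirroring the Score signature. `poissonScore` and the row-major design layout are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"math"
)

// poissonScore writes X'(y - mu) into score, where mu = exp(X beta) and x is
// the n-by-p design matrix in row-major order.
func poissonScore(x []float64, n, p int, y, beta, score []float64) {
	for j := range score[:p] {
		score[j] = 0
	}
	for i := 0; i < n; i++ {
		row := x[i*p : (i+1)*p]
		var eta float64
		for j, v := range row {
			eta += v * beta[j]
		}
		r := y[i] - math.Exp(eta) // raw residual y - mu
		for j, v := range row {
			score[j] += r * v
		}
	}
}

func main() {
	x := []float64{1, 0, 1, 1} // rows (1, 0) and (1, 1)
	score := make([]float64, 2)
	poissonScore(x, 2, 2, []float64{2, 3}, []float64{0, 0}, score)
	fmt.Println(score)
}
```

The score is zero at the maximum likelihood estimate, so its norm is a natural convergence check.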

func (*GLM) SetFamily

func (model *GLM) SetFamily(fam FamilyType) *GLM

SetFamily is a convenience method that sets the family, link, and variance function based on the given family type. The link and variance functions are set to their canonical values.

func (*GLM) Variance

func (model *GLM) Variance(pa *GLMParams, va []float64) []float64

Variance returns the model-based variance of the GLM responses for the given parameter. If the provided slice is large enough to hold the variances, it is used, otherwise a new slice is allocated. The variances are returned.

func (*GLM) Xpos

func (model *GLM) Xpos() []int

Xpos returns the positions of the covariates in the model's data stream.

type GLMParams

type GLMParams struct {
	// contains filtered or unexported fields
}

GLMParams represents the model parameters for a GLM.

func (*GLMParams) Clone

func (p *GLMParams) Clone() statmodel.Parameter

Clone produces a deep copy of the parameter value.

func (*GLMParams) GetCoeff

func (p *GLMParams) GetCoeff() []float64

GetCoeff returns the coefficients (slopes for individual covariates) from the parameter.

func (*GLMParams) SetCoeff

func (p *GLMParams) SetCoeff(coeff []float64)

SetCoeff sets the coefficients (slopes for individual covariates) for the parameter.

type GLMResults

type GLMResults struct {
	statmodel.BaseResults
	// contains filtered or unexported fields
}

GLMResults describes the results of a fitted generalized linear model.

func (*GLMResults) LinearPredictor

func (rslt *GLMResults) LinearPredictor(lp []float64) []float64

LinearPredictor returns the fitted linear predictor. If the provided slice is large enough, it is used, otherwise a new allocation is made. The fitted linear predictor is returned.

func (*GLMResults) Mean

func (rslt *GLMResults) Mean() []float64

Mean returns the fitted mean of the GLM at the estimated parameters.

func (*GLMResults) PearsonResid

func (rslt *GLMResults) PearsonResid(resid []float64) []float64

PearsonResid calculates the Pearson residuals at the fitted parameter value. The Pearson residuals are the standardized residuals, using the model standard deviation to standardize. If the provided slice is large enough to hold the result, it is used, otherwise a new slice is allocated. The Pearson standardized residuals are returned.

func (*GLMResults) Resid

func (rslt *GLMResults) Resid(resid []float64) []float64

Resid returns the residuals (observed minus fitted values) at the fitted parameter value.

func (*GLMResults) Scale

func (rslt *GLMResults) Scale() float64

Scale returns the estimated scale parameter.

func (*GLMResults) Summary

func (rslt *GLMResults) Summary() *GLMSummary

Summary returns a GLMSummary value describing the model results; its String method renders a summary table.

type GLMSummary

type GLMSummary struct {
	// contains filtered or unexported fields
}

GLMSummary summarizes a fitted generalized linear model.

func (*GLMSummary) SetScale

func (gs *GLMSummary) SetScale(xf func(float64) float64, msg string) *GLMSummary

SetScale sets the scale on which the parameter results are displayed in the summary. 'xf' is a function that maps parameters and confidence limits from the linear scale to the desired scale. 'msg' is a message that is appended to the summary table.
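For example, passing math.Exp as xf reports logistic regression coefficients as odds ratios. The sketch below only shows what such a transform does to an estimate and its confidence limits. `transformCI` and the numbers are hypothetical; the commented SetScale call uses the signature shown above.

```go
package main

import (
	"fmt"
	"math"
)

// transformCI applies xf to an estimate and its confidence limits, as a
// function passed to SetScale would.
func transformCI(xf func(float64) float64, est, lo, hi float64) (float64, float64, float64) {
	return xf(est), xf(lo), xf(hi)
}

func main() {
	est, se := 0.5, 0.1 // hypothetical log-odds coefficient and standard error
	ratio, lo, hi := transformCI(math.Exp, est, est-1.96*se, est+1.96*se)
	fmt.Printf("OR %.3f (%.3f, %.3f)\n", ratio, lo, hi)
	// With a fitted *GLMResults value rslt, the analogous call would be:
	//   rslt.Summary().SetScale(math.Exp, "Parameters are shown as odds ratios")
}
```

An increasing transform such as math.Exp keeps the lower and upper limits in order.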

func (*GLMSummary) String

func (gs *GLMSummary) String() string

String returns a string representation of a summary table for the model.

type Link

type Link struct {
	Name string

	TypeCode LinkType

	// Link calculates the link function (usually mapping the mean
	// value to the linear predictor).
	Link VecFunc

	// InvLink calculates the inverse of the link function
	// (usually mapping the linear predictor to the mean value).
	InvLink VecFunc

	// Deriv calculates the derivative of the link function.
	Deriv VecFunc

	// Deriv2 calculates the second derivative of the link function.
	Deriv2 VecFunc
}

Link specifies a GLM link function.
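The four VecFunc fields of a Link can be illustrated for the logit link g(p) = log(p/(1-p)), whose derivatives are 1/(p(1-p)) and (2p-1)/(p(1-p))^2. This sketch defines its own types. The assumption that a VecFunc's first argument is the input and its second receives the output is not documented here, and `logitLink` and `newLogitLink` are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// vecFunc mirrors the shape of VecFunc. Assumption: x is input, y is output.
type vecFunc func(x, y []float64)

// logitLink holds the four functions a Link carries.
type logitLink struct {
	Link, InvLink, Deriv, Deriv2 vecFunc
}

func newLogitLink() logitLink {
	return logitLink{
		// Link maps a mean p in (0, 1) to eta = log(p/(1-p)).
		Link: func(p, eta []float64) {
			for i, v := range p {
				eta[i] = math.Log(v / (1 - v))
			}
		},
		// InvLink maps eta back to p = 1/(1+exp(-eta)).
		InvLink: func(eta, p []float64) {
			for i, v := range eta {
				p[i] = 1 / (1 + math.Exp(-v))
			}
		},
		// Deriv is g'(p) = 1/(p(1-p)).
		Deriv: func(p, d []float64) {
			for i, v := range p {
				d[i] = 1 / (v * (1 - v))
			}
		},
		// Deriv2 is g''(p) = (2p-1)/(p(1-p))^2.
		Deriv2: func(p, d []float64) {
			for i, v := range p {
				q := v * (1 - v)
				d[i] = (2*v - 1) / (q * q)
			}
		},
	}
}

func main() {
	l := newLogitLink()
	p := []float64{0.1, 0.5, 0.9}
	eta := make([]float64, len(p))
	l.Link(p, eta)
	fmt.Println(eta)
}
```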

func NewLink

func NewLink(link LinkType) *Link

NewLink returns a link function object corresponding to the given link type. Supported values are LogLink, IdentityLink, CloglogLink, LogitLink, RecipLink, and RecipSquaredLink. Power links are constructed with NewPowerLink.

func NewPowerLink

func NewPowerLink(pw float64) *Link

NewPowerLink returns the power link eta = mu^pw. If pw is 0, the log link is returned instead.
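The pw = 0 case can be motivated by the Box-Cox form (mu^pw - 1)/pw, which tends to log(mu) as pw approaches 0. The sketch below implements the power link and its inverse with that fallback. The function names are hypothetical; note that pw = -1 gives the reciprocal link and pw = 1 the identity link.

```go
package main

import (
	"fmt"
	"math"
)

// powerLink returns eta = mu^pw, or log(mu) when pw == 0.
func powerLink(mu, pw float64) float64 {
	if pw == 0 {
		return math.Log(mu)
	}
	return math.Pow(mu, pw)
}

// invPowerLink inverts powerLink: mu = eta^(1/pw), or exp(eta) when pw == 0.
func invPowerLink(eta, pw float64) float64 {
	if pw == 0 {
		return math.Exp(eta)
	}
	return math.Pow(eta, 1/pw)
}

func main() {
	for _, pw := range []float64{-1, 0, 0.5, 2} {
		eta := powerLink(3, pw)
		fmt.Printf("pw=%4.1f  eta=%.4f  mu back=%.4f\n", pw, eta, invPowerLink(eta, pw))
	}
}
```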

type LinkType

type LinkType uint8

LinkType is used to specify a GLM link function.

const (
	LogLink LinkType = iota
	IdentityLink
	LogitLink
	CloglogLink
	RecipLink
	RecipSquaredLink
	PowerLink
)

LogLink, ... are used to specify GLM link functions.

type LogLikeFunc

type LogLikeFunc func([]statmodel.Dtype, []float64, []statmodel.Dtype, float64, bool) float64

LogLikeFunc evaluates and returns the log-likelihood for a GLM. The arguments are the data, the mean values, the weights, the scale parameter, and the 'exact flag'. If the exact flag is false, multiplicative factors that are constant with respect to the mean may be omitted. The weights may be nil in which case all weights are taken to be 1.

type NegBinomProfiler

type NegBinomProfiler struct {

	// A sequence of (dispersion, log-likelihood) values that lie on
	// the profile curve.
	Profile [][2]float64
	// contains filtered or unexported fields
}

NegBinomProfiler conducts profile likelihood analyses on a GLM with the negative binomial family.

func NewNegBinomProfiler

func NewNegBinomProfiler(result *GLMResults) *NegBinomProfiler

NewNegBinomProfiler returns a NegBinomProfiler that can be used to profile the dispersion parameter of a negative binomial GLM.

func (*NegBinomProfiler) ConfInt

func (nb *NegBinomProfiler) ConfInt(prob float64) (float64, float64)

ConfInt identifies dispersion parameters disp1, disp2 that define a profile confidence interval for the dispersion parameter. All points on the profile likelihood visited during the search are added to the Profile field of the NegBinomProfiler value.

func (*NegBinomProfiler) DispersionMLE

func (nb *NegBinomProfiler) DispersionMLE() float64

DispersionMLE returns the maximum likelihood estimate of the dispersion parameter.

func (*NegBinomProfiler) LogLike

func (nb *NegBinomProfiler) LogLike(disp float64) float64

LogLike returns the profile log likelihood value at the given dispersion parameter value.

type ScaleProfiler

type ScaleProfiler struct {

	// A sequence of (scale, log-likelihood) values that lie on
	// the profile curve.
	Profile [][2]float64
	// contains filtered or unexported fields
}

ScaleProfiler is used to conduct likelihood profile analysis on the scale parameter. A ScaleProfiler is constructed from a fitted GLMResults value using NewScaleProfiler. This is suitable for models with no additional parameters; if there are other parameters (e.g. in the Tweedie or negative binomial case), they are held fixed at their values from the provided fit.

func NewScaleProfiler

func NewScaleProfiler(result *GLMResults) *ScaleProfiler

NewScaleProfiler returns a ScaleProfiler value that can be used to profile the scale parameter.

func (*ScaleProfiler) ConfInt

func (ps *ScaleProfiler) ConfInt(prob float64) (float64, float64)

ConfInt identifies scale parameters scale1, scale2 that define a profile confidence interval for the scale parameter. All points on the profile likelihood visited during the search are added to the Profile field of the ScaleProfiler value.
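The search that ConfInt performs is not documented here. A common construction for a profile likelihood interval, assumed in this sketch, keeps the values s for which 2*(l(s_hat) - l(s)) does not exceed the chi-square(1) quantile for the desired probability (about 3.8415 for 0.95). The sketch locates both endpoints by bisection on a toy quadratic log-likelihood. `profileConfInt` and the toy function are hypothetical.

```go
package main

import "fmt"

// profileConfInt returns the endpoints where 2*(loglike(mle) - loglike(s))
// equals cutoff. It assumes a unimodal profile and that lo < mle < hi lie on
// either side of both endpoints.
func profileConfInt(loglike func(float64) float64, mle, lo, hi, cutoff float64) (float64, float64) {
	lmax := loglike(mle)
	// g is non-negative inside the interval and negative outside it.
	g := func(s float64) float64 { return cutoff - 2*(lmax-loglike(s)) }
	bisect := func(in, out float64) float64 {
		for i := 0; i < 100; i++ {
			mid := (in + out) / 2
			if g(mid) >= 0 {
				in = mid
			} else {
				out = mid
			}
		}
		return (in + out) / 2
	}
	return bisect(mle, lo), bisect(mle, hi)
}

func main() {
	// Toy profile log-likelihood maximized at s = 2.
	ll := func(s float64) float64 { return -(s - 2) * (s - 2) / 2 }
	const chi2_95 = 3.841458820694124 // chi-square(1) 0.95 quantile
	lower, upper := profileConfInt(ll, 2, -10, 10, chi2_95)
	fmt.Printf("(%.6f, %.6f)\n", lower, upper)
}
```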

func (*ScaleProfiler) LogLike

func (ps *ScaleProfiler) LogLike(scale float64) float64

LogLike returns the profile log likelihood value at the given scale parameter value.

func (*ScaleProfiler) ScaleMLE

func (ps *ScaleProfiler) ScaleMLE() float64

ScaleMLE returns the maximum likelihood estimate of the scale parameter.

type TweedieProfiler

type TweedieProfiler struct {
	// contains filtered or unexported fields
}

TweedieProfiler conducts profile likelihood analyses on a GLM with the Tweedie family.

func NewTweedieProfiler

func NewTweedieProfiler(result *GLMResults) *TweedieProfiler

NewTweedieProfiler returns a TweedieProfiler that can be used to profile the variance power parameter of a Tweedie GLM.

func (*TweedieProfiler) LogLike

func (tp *TweedieProfiler) LogLike(pw, scale float64) float64

LogLike returns the profile log likelihood value at the given variance power and scale parameter.

func (*TweedieProfiler) ScaleMLE

func (tp *TweedieProfiler) ScaleMLE() float64

ScaleMLE returns the maximum likelihood estimate of the scale parameter.

func (*TweedieProfiler) VarPowerMLE

func (tp *TweedieProfiler) VarPowerMLE() float64

VarPowerMLE returns the maximum likelihood estimate of the variance power parameter.

type Variance

type Variance struct {
	Name  string
	Var   VecFunc
	Deriv VecFunc
}

Variance represents a GLM variance function.

func NewNegBinomVariance

func NewNegBinomVariance(alpha float64) *Variance

NewNegBinomVariance returns a variance function for the negative binomial family, using the given parameter alpha to determine the mean/variance relationship. The variance for mean m is m + alpha*m^2.
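A sketch of the negative binomial variance m + alpha*m^2 and its derivative with respect to the mean, 1 + 2*alpha*m. That the Variance.Deriv field holds this derivative is an assumption, since the field is undocumented above. The function names are hypothetical. With alpha = 0 the variance reduces to the Poisson variance m.

```go
package main

import "fmt"

// negBinomVar returns the negative binomial variance m + alpha*m^2.
func negBinomVar(m, alpha float64) float64 {
	return m + alpha*m*m
}

// negBinomVarDeriv returns d/dm (m + alpha*m^2) = 1 + 2*alpha*m.
func negBinomVarDeriv(m, alpha float64) float64 {
	return 1 + 2*alpha*m
}

func main() {
	for _, alpha := range []float64{0, 0.5, 2} {
		fmt.Printf("alpha=%.1f  var(m=3)=%.1f  deriv=%.1f\n",
			alpha, negBinomVar(3, alpha), negBinomVarDeriv(3, alpha))
	}
}
```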

func NewTweedieVariance

func NewTweedieVariance(pw float64) *Variance

NewTweedieVariance returns a variance function for the Tweedie family, using the given parameter pw to determine the mean/variance relationship. The variance for mean m is m^pw.

func NewVariance

func NewVariance(vartype VarianceType) *Variance

NewVariance returns a new variance function object corresponding to the given variance type. Supported values are BinomialVar, ConstantVar, CubedVar, IdentityVar, and SquaredVar.

type VarianceType

type VarianceType uint8

VarianceType is used to specify a GLM variance function.

const (
	BinomialVar VarianceType = iota
	IdentityVar
	ConstantVar
	SquaredVar
	CubedVar
)

BinomialVar, ... define variance functions for a GLM.

type VecFunc

type VecFunc func([]float64, []float64)

VecFunc is a function that takes two float64 slices as arguments.

Directories

Path Synopsis
examples
