In statistics, a generalized linear mixed model (GLMM) is an extension to the generalized linear model (GLM) in which the linear predictor contains random effects in addition to the usual fixed effects. GLMMs also inherit from GLMs the idea of extending linear mixed models to non-normal data.
GLMMs provide a broad range of models for the analysis of grouped data, since the differences between groups can be modelled as a random effect. These models are useful in the analysis of many kinds of data, including longitudinal data.
Model
GLMMs are generally defined such that, conditioned on the random effects $u$, the dependent variable $y$ is distributed according to the exponential family with its expectation related to the linear predictor $X\beta + Zu$ via a link function $g$:
: $g(E[y \mid u]) = X\beta + Zu$.
Here $X$ and $\beta$ are the fixed effects design matrix and the fixed effects, respectively; $Z$ and $u$ are the random effects design matrix and the random effects, respectively. This brief definition builds on the definitions of a generalized linear model and of a mixed model.
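As an illustration of this definition, the following minimal sketch simulates data from one particular GLMM: a logistic model with a single random intercept per group, where the linear predictor is written as a fixed-effects part (design matrix `X`, coefficients `beta`) plus a random-effects part (design matrix `Z`, normally distributed effects `u`). The function name, group sizes, coefficient values and the use of NumPy are all assumptions made for the example, not part of any standard definition.

```python
import numpy as np

def simulate_logistic_glmm(n_groups=30, n_per_group=20, beta=(-0.5, 1.0),
                           sigma_u=0.8, seed=0):
    """Simulate y | u ~ Bernoulli(mu) with logit(mu) = X @ beta + Z @ u."""
    rng = np.random.default_rng(seed)
    n = n_groups * n_per_group
    group = np.repeat(np.arange(n_groups), n_per_group)

    # Fixed-effects design matrix X: an intercept column and one covariate.
    X = np.column_stack([np.ones(n), rng.normal(size=n)])

    # Random-effects design matrix Z: one indicator column per group,
    # so Z @ u adds the group's random intercept to each observation.
    Z = np.zeros((n, n_groups))
    Z[np.arange(n), group] = 1.0

    # Random effects: normally distributed with standard deviation sigma_u.
    u = rng.normal(0.0, sigma_u, size=n_groups)

    # Linear predictor and inverse logit link give the conditional mean.
    eta = X @ np.asarray(beta) + Z @ u
    mu = 1.0 / (1.0 + np.exp(-eta))

    # Conditional on u, the response is Bernoulli (an exponential family).
    y = rng.binomial(1, mu)
    return X, Z, u, y, group
```

Other exponential family distributions and link functions (for example Poisson with a log link) would follow the same pattern, changing only the last two steps.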
Generalized linear mixed models are a special case of hierarchical generalized linear models in which the random effects are normally distributed.
The complete likelihood
: $\ln p(y) = \ln \int p(y \mid u)\, p(u)\, du$
has no general closed form, and integrating over the random effects is usually extremely computationally intensive. In addition to numerically approximating this integral (e.g. via Gauss–Hermite quadrature), methods motivated by Laplace approximation have been proposed. For example, the penalized quasi-likelihood method, which essentially involves repeatedly fitting a weighted normal mixed model with a working variate (i.e. it is doubly iterative), is implemented by various commercial and open source statistical programs.
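The quadrature approach can be sketched for a simple case. The example below assumes a logistic model with one normally distributed random intercept per group, so that the integral over the random effects factorizes into one-dimensional integrals, one per group. It approximates a single group's contribution with Gauss–Hermite quadrature using NumPy's `hermgauss`. The function name `group_marginal_likelihood` and its arguments are hypothetical, chosen only for this illustration.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

def group_marginal_likelihood(y, eta_fixed, sigma_u, n_nodes=20):
    """Approximate the integral of p(y | u) N(u; 0, sigma_u^2) du for one group.

    y         : array of 0/1 responses in the group
    eta_fixed : fixed-effects part of the linear predictor, one value per response
    sigma_u   : standard deviation of the random intercept
    """
    # Nodes and weights for integrals of the form int exp(-x^2) f(x) dx.
    x, w = hermgauss(n_nodes)

    # The substitution u = sqrt(2) * sigma_u * x turns the normal density
    # into the exp(-x^2) weight, leaving a constant factor 1 / sqrt(pi).
    u = np.sqrt(2.0) * sigma_u * x

    # Conditional mean of every response at every quadrature node.
    eta = eta_fixed[:, None] + u[None, :]
    mu = 1.0 / (1.0 + np.exp(-eta))

    # Conditional likelihood of the whole group at each node.
    cond = np.prod(np.where(y[:, None] == 1, mu, 1.0 - mu), axis=0)

    # Weighted sum over nodes approximates the integral.
    return float(np.sum(w * cond) / np.sqrt(np.pi))
```

The log-likelihood of the full data set is then the sum of the logarithms of these group contributions. With several correlated random effects per group the integral becomes multi-dimensional, and the number of quadrature nodes grows quickly, which is one reason approximations such as Laplace's method are used instead.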
Fitting a model
Fitting GLMMs via maximum likelihood (as required, for example, to compute AIC) involves integrating over the random effects. In general, those integrals cannot be expressed in analytical form. Various approximate methods have been developed, but none has good properties for all possible models and data sets (e.g. ungrouped binary data are particularly problematic). For this reason, methods involving numerical quadrature or Markov chain Monte Carlo have increased in use, as increasing computing power and advances in methods have made them more practical.
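A rough sketch of maximum likelihood fitting by numerical quadrature is given below. It assumes the simplest possible case, a logistic model with a fixed intercept `beta0` and one normally distributed random intercept per group with standard deviation `sigma_u`, and it maximizes the quadrature-approximated log-likelihood by a plain grid search. The grid search stands in for the iterative optimizers that real software uses, and all names, grids and simulated data are hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

def marginal_loglik(beta0, sigma_u, y, group, n_nodes=20):
    """Quadrature-approximated log-likelihood of a random-intercept logistic model."""
    x, w = hermgauss(n_nodes)
    u = np.sqrt(2.0) * sigma_u * x              # quadrature nodes on the u scale
    p = 1.0 / (1.0 + np.exp(-(beta0 + u)))      # P(y = 1 | u) at each node
    total = 0.0
    for g in np.unique(group):
        yg = y[group == g]
        k, n = yg.sum(), yg.size
        # p(y_g | u) at each node, then integrate u out for this group.
        cond = p**k * (1.0 - p)**(n - k)
        total += np.log(np.sum(w * cond) / np.sqrt(np.pi))
    return total

def fit_by_grid(y, group, beta_grid, sigma_grid):
    """Return (beta0, sigma_u, loglik) maximizing the approximate likelihood."""
    best = None
    for b in beta_grid:
        for s in sigma_grid:
            ll = marginal_loglik(b, s, y, group)
            if best is None or ll > best[2]:
                best = (float(b), float(s), float(ll))
    return best

# Hypothetical data: 40 groups of 10 binary observations each,
# simulated with beta0 = 0.5 and sigma_u = 1.0.
rng = np.random.default_rng(1)
group = np.repeat(np.arange(40), 10)
u_true = rng.normal(0.0, 1.0, size=40)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(0.5 + u_true[group]))))

beta_grid = np.linspace(-1.0, 2.0, 31)
sigma_grid = np.linspace(0.1, 2.5, 25)
beta_hat, sigma_hat, ll_hat = fit_by_grid(y, group, beta_grid, sigma_grid)
```

A grid search is only workable here because there are two parameters. Models with more fixed effects or several variance components need a gradient-based or derivative-free optimizer, and the cost of each likelihood evaluation is what motivates the approximate methods described above.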
The Akaike information criterion (AIC) is a common criterion for model selection. Estimates of AIC for GLMMs based on certain exponential family distributions have recently been obtained.
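For reference, the standard definition is AIC = 2k − 2 ln(L), where k is the number of estimated parameters and L is the maximized likelihood; lower values indicate a preferred model. The short sketch below applies it to two candidate models. The log-likelihood values and parameter counts are made up for illustration, and the example assumes that the marginal likelihood is used and that k counts the fixed effects plus the variance parameters, which is only one of the conventions in use for mixed models.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical maximized log-likelihoods for two candidate GLMMs.
aic_random_intercept = aic(-250.0, 2)   # fixed intercept + one variance
aic_random_slope = aic(-249.5, 4)       # adds a slope variance and a covariance

# The model with the lower AIC is preferred under this criterion.
preferred = min(
    [("random intercept", aic_random_intercept),
     ("random slope", aic_random_slope)],
    key=lambda item: item[1],
)[0]
```

In this made-up comparison the small gain in log-likelihood does not offset the two extra parameters, so the random-intercept model is preferred.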
Software
* Several contributed packages in R provide GLMM functionality, including lme4 and glmm.
* GLMMs can be fitted using SAS and SPSS.
* MATLAB also provides a function called "fitglme" to fit GLMMs.
* The Python package Statsmodels supports binomial and Poisson implementations.
* The Julia package MixedModels.jl provides a function called GeneralizedLinearMixedModel that fits a GLMM to provided data.
* DHARMa: residual diagnostics for hierarchical (multi-level/mixed) regression models (utk.edu)
See also
* Generalized estimating equation
* Hierarchical generalized linear model