The general linear model or general multivariate regression model is a compact way of simultaneously writing several multiple linear regression models. In that sense it is not a separate statistical linear model. The various multiple linear regression models may be compactly written as

$$Y = XB + U,$$
where Y is a matrix of multivariate measurements (each column being a set of measurements on one of the dependent variables), X is a matrix of observations on independent variables that might be a design matrix (each column being a set of observations on one of the independent variables), B is a matrix containing parameters that are usually to be estimated, and U is a matrix containing errors (noise).
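As a sketch of the dimensions involved, assume ''n'' observations, ''m'' dependent variables and ''p'' independent variables, with X carrying a leading column of ones for an intercept (an assumption made here; a design matrix need not contain one). When XᵀX is invertible, the ordinary least-squares estimate of B is the normal-equations solution applied to all ''m'' columns of Y at once:

```latex
\underbrace{Y}_{n \times m}
  = \underbrace{X}_{n \times (p+1)} \, \underbrace{B}_{(p+1) \times m}
  + \underbrace{U}_{n \times m},
\qquad
\hat{B} = (X^{\top} X)^{-1} X^{\top} Y
```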
The errors are usually assumed to be uncorrelated across measurements and to follow a multivariate normal distribution. If the errors do not follow a multivariate normal distribution, generalized linear models may be used to relax assumptions about Y and U.
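The model and its error assumptions can be illustrated with a small simulation. The following is a minimal sketch using NumPy, not a reference implementation: the sizes, the seed, the true B and the error covariance `Sigma` are arbitrary illustrative choices. Each row of U is drawn independently from a multivariate normal distribution, so errors are uncorrelated across measurements while the ''m'' columns may be correlated with each other.

```python
import numpy as np

# Simulate Y = XB + U and recover B by least squares (illustrative values).
rng = np.random.default_rng(0)
n, p, m = 200, 2, 3  # observations, independent variables, dependent variables

# Design matrix: a column of ones (intercept) followed by p predictors.
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
B_true = np.array([[1.0, -2.0, 0.5],
                   [0.3, 0.0, 1.5],
                   [-1.0, 2.0, 0.0]])  # shape (p + 1, m)

# Rows of U are independent multivariate normal draws: uncorrelated across
# measurements (rows), correlated across dependent variables (columns).
Sigma = np.array([[1.0, 0.5, 0.2],
                  [0.5, 1.0, 0.3],
                  [0.2, 0.3, 1.0]])
U = rng.multivariate_normal(np.zeros(m), Sigma, size=n)
Y = X @ B_true + U

# Least-squares estimate of B for all m columns in a single call.
B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
residuals = Y - X @ B_hat
Sigma_hat = residuals.T @ residuals / (n - X.shape[1])  # error covariance estimate
```

With this many observations `B_hat` lands close to `B_true`, and the residuals are orthogonal to every column of X, which is the defining property of the least-squares fit.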
The general linear model incorporates a number of different statistical models: ANOVA, ANCOVA, MANOVA, MANCOVA, ordinary linear regression, ''t''-test and ''F''-test. The general linear model is a generalization of multiple linear regression to the case of more than one dependent variable. If Y, B, and U were column vectors, the matrix equation above would represent multiple linear regression.
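To make the claim that a ''t''-test is a special case concrete, the following minimal NumPy sketch (simulated data; group sizes and means are arbitrary) fits a model with an intercept and a single 0/1 group indicator. The ''t'' statistic of the indicator's coefficient reproduces the classical pooled-variance two-sample ''t'' statistic.

```python
import numpy as np

# Two-sample pooled t-test expressed as a general linear model.
rng = np.random.default_rng(1)
a = rng.normal(10.0, 2.0, size=30)  # group A
b = rng.normal(11.0, 2.0, size=25)  # group B

y = np.concatenate([a, b])
d = np.concatenate([np.zeros(a.size), np.ones(b.size)])  # 0 = A, 1 = B
X = np.column_stack([np.ones(y.size), d])                # intercept + dummy

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
dof = y.size - X.shape[1]
sigma2 = resid @ resid / dof                              # residual variance
se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])       # SE of the dummy slope
t_glm = beta[1] / se

# Classical pooled two-sample t statistic for comparison.
sp2 = ((a.size - 1) * a.var(ddof=1) + (b.size - 1) * b.var(ddof=1)) / (a.size + b.size - 2)
t_classic = (b.mean() - a.mean()) / np.sqrt(sp2 * (1 / a.size + 1 / b.size))
```

The dummy coefficient equals the difference in group means and the residual variance equals the pooled variance, so the two statistics agree. The equivalence in this simple form holds for the equal-variance test, not for Welch's unequal-variance version.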
Hypothesis tests with the general linear model can be made in two ways: as a multivariate test or as several independent univariate tests. In multivariate tests the columns of Y are tested together, whereas in univariate tests the columns of Y are tested independently, i.e., as multiple univariate tests with the same design matrix.
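The two approaches can be contrasted on the same hypothesis. The sketch below is a minimal NumPy illustration on simulated data (sizes and effects are arbitrary, and `residual_sscp` is a helper introduced here): it tests whether all slopes are zero by comparing a full model with an intercept-only model. The univariate route yields one ''F'' statistic per column of Y from the diagonals of the sums-of-squares matrices; the multivariate route uses Wilks' lambda, one common multivariate statistic, which works with the full matrices.

```python
import numpy as np

# Univariate vs multivariate tests of "all slopes are zero", one shared design.
rng = np.random.default_rng(2)
n, p, m = 100, 2, 3
X_full = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
X_null = X_full[:, :1]  # reduced model: intercept only
B = np.array([[0.0, 0.0, 0.0],
              [0.5, 0.0, 0.2],
              [0.0, 0.4, 0.0]])
Y = X_full @ B + rng.normal(size=(n, m))

def residual_sscp(X, Y):
    """Residual sums of squares and cross-products matrix, shape (m, m)."""
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    R = Y - X @ coef
    return R.T @ R

E = residual_sscp(X_full, Y)           # error SSCP, full model
E0 = residual_sscp(X_null, Y)          # error SSCP, reduced model
H = E0 - E                             # hypothesis SSCP
q = X_full.shape[1] - X_null.shape[1]  # number of constraints tested
dof_err = n - X_full.shape[1]

# Univariate: one F statistic per column of Y, using only the diagonals.
F_univariate = (np.diag(H) / q) / (np.diag(E) / dof_err)

# Multivariate: Wilks' lambda uses the full matrices, so it accounts for
# correlations between the columns of Y. Smaller values are evidence against H0.
wilks_lambda = np.linalg.det(E) / np.linalg.det(E + H)
```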
Comparison to multiple linear regression
Multiple linear regression is a generalization of simple linear regression to the case of more than one independent variable, and a special case of general linear models, restricted to one dependent variable. The basic model for multiple linear regression is

$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \ldots + \beta_p X_{ip} + \epsilon_i$$

or more compactly

$$Y_i = \beta_0 + \sum_{j=1}^{p} \beta_j X_{ij} + \epsilon_i$$

for each observation ''i'' = 1, ... , ''n''.
In the formula above we consider ''n'' observations of one dependent variable and ''p'' independent variables. Thus, $Y_i$ is the ''i''th observation of the dependent variable, and $X_{ij}$ is the ''i''th observation of the ''j''th independent variable, ''j'' = 1, 2, ..., ''p''. The values $\beta_j$ represent parameters to be estimated (with $\beta_0$ the intercept), and $\epsilon_i$ is the ''i''th independent identically distributed normal error.
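A minimal NumPy sketch of this single-response model follows, on simulated data with arbitrary coefficients. Prepending a column of ones to the predictors lets $\beta_0$ be estimated together with $\beta_1, \ldots, \beta_p$, and the error variance is estimated with ''n'' − ''p'' − 1 degrees of freedom.

```python
import numpy as np

# Multiple linear regression for one dependent variable:
# Y_i = beta_0 + sum_j beta_j X_ij + eps_i (simulated data).
rng = np.random.default_rng(3)
n, p = 50, 3
X_vars = rng.normal(size=(n, p))             # X_ij: n observations of p variables
beta_true = np.array([2.0, 1.0, -0.5, 0.0])  # beta_0 (intercept), beta_1..beta_p
eps = rng.normal(scale=0.5, size=n)          # i.i.d. normal errors
y = beta_true[0] + X_vars @ beta_true[1:] + eps

# Prepend a column of ones so beta_0 is estimated alongside beta_1..beta_p.
X = np.column_stack([np.ones(n), X_vars])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (n - p - 1)     # unbiased estimate of Var(eps_i)
```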
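In the more general multivariate linear regression, there is one equation of the above form for each of ''m'' > 1 dependent variables that share the same set of explanatory variables and hence are estimated simultaneously with each other:

$$Y_{ij} = \beta_{0j} + \beta_{1j} X_{i1} + \beta_{2j} X_{i2} + \ldots + \beta_{pj} X_{ip} + \epsilon_{ij}$$

or more compactly

$$Y_{ij} = \beta_{0j} + \sum_{k=1}^{p} \beta_{kj} X_{ik} + \epsilon_{ij}$$

for all observations indexed as ''i'' = 1, ... , ''n'' and for all dependent variables indexed as ''j'' = 1, ... , ''m'' (the independent variables are now indexed by ''k'').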
Note that, since each dependent variable has its own set of regression parameters to be fitted, from a computational point of view the general multivariate regression is simply a sequence of standard multiple linear regressions using the same explanatory variables.
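The computational point can be checked directly. In this minimal NumPy sketch on simulated data, one least-squares solve with an ''n'' × ''m'' right-hand side gives the same coefficients as ''m'' separate multiple regressions that share the design matrix X.

```python
import numpy as np

# Joint multivariate fit vs m separate multiple regressions (simulated data).
rng = np.random.default_rng(4)
n, p, m = 80, 3, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
Y = X @ rng.normal(size=(p + 1, m)) + rng.normal(size=(n, m))

# Joint fit: one least-squares solve with an n x m right-hand side.
B_joint, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Separate fits: one standard multiple regression per column of Y.
B_separate = np.column_stack(
    [np.linalg.lstsq(X, Y[:, j], rcond=None)[0] for j in range(m)]
)
```

The point estimates coincide exactly; what the joint formulation adds is an estimate of the error covariance across the ''m'' columns, which multivariate tests make use of.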
Comparison to generalized linear model
The general linear model and the generalized linear model (GLM) are two commonly used families of statistical methods to relate some number of continuous and/or categorical predictors to a single outcome variable.
The main difference between the two approaches is that the general linear model strictly assumes that the residuals will follow a conditionally normal distribution,[Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences.] while the GLM loosens this assumption and allows for a variety of other distributions from the exponential family for the residuals.
Of note, the general linear model is a special case of the GLM in which the distribution of the residuals follows a conditionally normal distribution and the link function is the identity.
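To see why the normal case sits inside the exponential family used by the GLM, the normal density can be rewritten in exponential-dispersion form. The symbols θ (canonical parameter), φ (dispersion), ''b'' and ''c'' follow the usual GLM convention and are introduced here only for this derivation. Since the mean is $b'(\theta) = \theta = \mu$, the canonical link is the identity, which recovers the general linear model.

```latex
f(y;\mu,\sigma^2)
  = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(y-\mu)^2}{2\sigma^2}\right)
  = \exp\!\left(\frac{y\mu - \mu^2/2}{\sigma^2}
      - \frac{y^2}{2\sigma^2} - \tfrac{1}{2}\log(2\pi\sigma^2)\right)
\\[4pt]
\text{so } \theta = \mu,\quad b(\theta) = \tfrac{\theta^2}{2},\quad \phi = \sigma^2,\quad
c(y,\phi) = -\frac{y^2}{2\phi} - \tfrac{1}{2}\log(2\pi\phi),
\qquad \mathbb{E}[Y] = b'(\theta) = \mu .
```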
The distribution of the residuals largely depends on the type and distribution of the outcome variable; different types of outcome variables lead to the variety of models within the GLM family. Commonly used models in the GLM family include binary logistic regression for binary or dichotomous outcomes, Poisson regression for count outcomes, and linear regression for continuous, normally distributed outcomes. This means that GLM may be spoken of as a general family of statistical models or as specific models for specific outcome types.
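One common way to fit members of the GLM family is iteratively reweighted least squares (IRLS). The following is a minimal NumPy sketch, not production code (no step halving, offsets or convergence diagnostics), assuming canonical links: identity for the normal family and log for the Poisson family. The function name `irls` and the `family` strings are hypothetical. For the normal family the weights are constant and the working response is ''y'' itself, so the procedure reduces to ordinary least squares.

```python
import numpy as np

def irls(X, y, family, n_iter=50, tol=1e-10):
    """Fit a GLM with a canonical link by iteratively reweighted least squares."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta                       # linear predictor
        if family == "gaussian":
            w = np.ones_like(y)              # constant variance, identity link
            z = y                            # working response equals y
        elif family == "poisson":
            mu = np.exp(eta)                 # inverse of the log link
            w = mu                           # (d mu/d eta)^2 / Var(Y) = mu
            z = eta + (y - mu) / mu          # working response
        else:
            raise ValueError(f"unsupported family: {family}")
        # Weighted least squares step: solve (X^T W X) beta = X^T W z.
        XtW = X.T * w
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

# Usage on simulated data: a count outcome and a continuous outcome.
rng = np.random.default_rng(5)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y_count = rng.poisson(np.exp(X @ np.array([0.5, 0.3]))).astype(float)
beta_pois = irls(X, y_count, "poisson")

y_cont = X @ np.array([1.0, -2.0]) + rng.normal(size=n)
beta_gauss = irls(X, y_cont, "gaussian")
beta_ols, *_ = np.linalg.lstsq(X, y_cont, rcond=None)  # matches beta_gauss
```

Logistic regression fits the same loop with the logit link, weights μ(1 − μ) and working response η + (''y'' − μ)/(μ(1 − μ)); it is omitted to keep the sketch short.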
Applications
An application of the general linear model appears in the analysis of multiple brain scans in scientific experiments, where Y contains data from brain scanners and X contains experimental design variables and confounds. It is usually tested in a univariate way (usually referred to as ''mass-univariate'' in this setting) and is often referred to as statistical parametric mapping.
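The mass-univariate idea can be sketched with simulated data: every voxel (column of Y) is fitted with the same design matrix X, and a ''t'' statistic for one contrast is computed per voxel. This is a hypothetical NumPy illustration only; the on/off block design, the linear drift confound, the array sizes and the 50 "active" voxels are all assumptions, and real neuroimaging pipelines involve considerably more modelling.

```python
import numpy as np

# Mass-univariate GLM: one shared design, one t statistic per voxel.
rng = np.random.default_rng(6)
n_scans, n_voxels = 120, 1000
task = np.tile(np.repeat([0.0, 1.0], 10), n_scans // 20)  # on/off blocks
drift = np.linspace(-1.0, 1.0, n_scans)                   # a linear confound
X = np.column_stack([np.ones(n_scans), task, drift])

Y = rng.normal(size=(n_scans, n_voxels))
Y[:, :50] += 0.8 * task[:, None]     # first 50 voxels respond to the task

B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)   # shape (3, n_voxels)
resid = Y - X @ B_hat
dof = n_scans - np.linalg.matrix_rank(X)
sigma2 = (resid ** 2).sum(axis=0) / dof         # residual variance per voxel

c = np.array([0.0, 1.0, 0.0])                   # contrast: task effect
var_c = c @ np.linalg.inv(X.T @ X) @ c
t_map = (c @ B_hat) / np.sqrt(sigma2 * var_c)   # one t value per voxel
```

A single least-squares call handles all voxels because they share X; only the residual variance differs from voxel to voxel.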
See also
* Bayesian multivariate linear regression
* ''F''-test
* ''t''-test