In statistics, the variance inflation factor (VIF) is the ratio (quotient) of the variance of a parameter estimate in a model that includes multiple other terms (parameters) to the variance of that estimate in a model constructed using only one term. It quantifies the severity of multicollinearity in an ordinary least squares regression analysis. It provides an index that measures how much the variance (the square of the estimate's standard deviation) of an estimated regression coefficient is increased because of collinearity.
Cuthbert Daniel claims to have invented the concept behind the variance inflation factor, but did not come up with the name.
Definition
Consider the following linear model with $k$ independent variables:

: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon.$
The standard error of the estimate of $\beta_j$ is the square root of the $(j+1)$-th diagonal element of $s^2(X^\mathsf{T}X)^{-1}$, where $s$ is the root mean squared error (RMSE); note that $\mathrm{RMSE}^2$ is a consistent estimator of $\sigma^2$, the true variance of the error term. $X$ is the regression design matrix: a matrix such that $X_{i,\,j+1}$ is the value of the $j$-th independent variable for the $i$-th case or observation, and such that $X_{i,\,1}$, the predictor associated with the intercept term, equals 1 for all $i$. It turns out that the square of this standard error, the estimated variance of the estimate of $\beta_j$, can be equivalently expressed as:
: $\widehat{\operatorname{var}}(\hat\beta_j) = \frac{s^2}{(n-1)\,\widehat{\operatorname{var}}(X_j)} \cdot \frac{1}{1 - R_j^2},$
where $R_j^2$ is the multiple $R^2$ for the regression of $X_j$ on the other covariates (a regression that does not involve the response variable $Y$), and $\widehat{\operatorname{var}}(X_j)$ is the sample variance of $X_j$. This identity separates the influences of several distinct factors on the variance of the coefficient estimate:
* $s^2$: greater scatter in the data around the regression surface leads to proportionately more variance in the coefficient estimates
* $n$: greater sample size results in proportionately less variance in the coefficient estimates
* $\widehat{\operatorname{var}}(X_j)$: greater variability in a particular covariate leads to proportionately less variance in the corresponding coefficient estimate
The remaining term, $1/(1 - R_j^2)$, is the VIF. It reflects all other factors that influence the uncertainty in the coefficient estimates. The VIF equals 1 when the vector $X_j$ is orthogonal to each column of the design matrix for the regression of $X_j$ on the other covariates. By contrast, the VIF is greater than 1 when the vector $X_j$ is not orthogonal to all columns of the design matrix for the regression of $X_j$ on the other covariates. Finally, note that the VIF is invariant to the scaling of the variables (that is, we could scale each variable $X_j$ by a constant $c_j$ without changing the VIF).
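The identity above is easy to check numerically. The following is a minimal sketch in Python (NumPy only; the synthetic data and all variable names are assumptions for illustration) that computes both sides for one coefficient and confirms they agree:

```python
import numpy as np

# Synthetic data (illustrative): three predictors, two of them correlated.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 3))
X[:, 2] += 0.8 * X[:, 1]                      # induce collinearity
y = 1.0 + X @ np.array([0.5, -1.0, 2.0]) + rng.normal(size=n)

D = np.column_stack([np.ones(n), X])          # design matrix, intercept first
beta, *_ = np.linalg.lstsq(D, y, rcond=None)
resid = y - D @ beta
s2 = resid @ resid / (n - D.shape[1])         # s^2 = RMSE^2

col = 1                                       # column of X_1 in D
# Left-hand side: estimated variance of beta_1 from s^2 (X'X)^{-1}.
lhs = s2 * np.linalg.inv(D.T @ D)[col, col]

# R_1^2 from regressing X_1 on the intercept and the other predictors.
xj = D[:, col]
others = np.delete(D, col, axis=1)
gamma, *_ = np.linalg.lstsq(others, xj, rcond=None)
r = xj - others @ gamma
R2 = 1.0 - (r @ r) / np.sum((xj - xj.mean()) ** 2)

# Right-hand side: s^2 / ((n-1) var(X_1)) times the VIF, 1/(1 - R_1^2).
rhs = s2 / ((n - 1) * xj.var(ddof=1)) / (1.0 - R2)
assert np.isclose(lhs, rhs)
print(f"var(beta_1) = {lhs:.6f}, VIF_1 = {1 / (1 - R2):.3f}")
```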
To see where this factor comes from, start from the estimated covariance matrix of the coefficient estimates,

: $\widehat{\operatorname{var}}(\hat\beta) = s^2 (X^\mathsf{T}X)^{-1}.$

Now let $X_{-j}$ denote the design matrix with the column $X_j$ removed and, without losing generality, reorder the columns of $X$ to set the first column to be $X_j$:

: $X = [X_j,\ X_{-j}].$
By using
Schur complement In linear algebra and the theory of matrices, the Schur complement of a block matrix is defined as follows.
Suppose ''p'', ''q'' are nonnegative integers, and suppose ''A'', ''B'', ''C'', ''D'' are respectively ''p'' × ''p'', ''p'' × ''q'', ''q'' ...
, the element in the first row and first column in
is,
:
Then we have

: $\begin{aligned}
\widehat{\operatorname{var}}(\hat\beta_j) &= s^2 \left[(X^\mathsf{T}X)^{-1}\right]_{11} \\
&= \frac{s^2}{X_j^\mathsf{T}X_j - X_j^\mathsf{T}X_{-j}\left(X_{-j}^\mathsf{T}X_{-j}\right)^{-1}X_{-j}^\mathsf{T}X_j} \\
&= \frac{s^2}{X_j^\mathsf{T}X_j - X_j^\mathsf{T}X_{-j}\hat\beta_{*j}} \\
&= \frac{s^2}{\mathrm{RSS}_j}.
\end{aligned}$

Here $\hat\beta_{*j} = \left(X_{-j}^\mathsf{T}X_{-j}\right)^{-1}X_{-j}^\mathsf{T}X_j$ is the coefficient vector from the regression of the dependent variable $X_j$ over the covariates $X_{-j}$, and $\mathrm{RSS}_j$ is the corresponding residual sum of squares. Since $\mathrm{RSS}_j = (n-1)\,\widehat{\operatorname{var}}(X_j)\,(1 - R_j^2)$, this recovers the expression for $\widehat{\operatorname{var}}(\hat\beta_j)$ given above.
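The last equality can be checked directly. Below is a minimal sketch in Python (NumPy only; the synthetic data are an assumption for illustration) confirming that the first diagonal element of $(X^\mathsf{T}X)^{-1}$ equals $1/\mathrm{RSS}_j$:

```python
import numpy as np

# Synthetic data (illustrative): X_{-j} holds the intercept and two covariates.
rng = np.random.default_rng(1)
n = 100
X_minus_j = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
x_j = 0.6 * X_minus_j[:, 1] + rng.normal(size=n)   # correlated with the others
X = np.column_stack([x_j, X_minus_j])              # X = [X_j, X_{-j}]

first_diag = np.linalg.inv(X.T @ X)[0, 0]

# RSS_j from regressing X_j on X_{-j}.
beta_star, *_ = np.linalg.lstsq(X_minus_j, x_j, rcond=None)
rss_j = np.sum((x_j - X_minus_j @ beta_star) ** 2)

assert np.isclose(first_diag, 1.0 / rss_j)
```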
Calculation and analysis
We can calculate $k$ different VIFs (one for each $X_i$) in three steps:
Step one
First we run an ordinary least squares regression that has $X_i$ as a function of all the other explanatory variables in the first equation.

If $i = 1$, for example, the equation would be

: $X_1 = \alpha_0 + \alpha_2 X_2 + \alpha_3 X_3 + \cdots + \alpha_k X_k + e,$

where $\alpha_0$ is a constant and $e$ is the error term.
Step two
Then, calculate the VIF for $\hat\beta_i$ with the following formula:

: $\mathrm{VIF}_i = \frac{1}{1 - R_i^2},$
where $R_i^2$ is the coefficient of determination of the regression equation in step one, with $X_i$ on the left-hand side and all other predictor variables (all the other $X$ variables) on the right-hand side.
Step three
Analyze the magnitude of multicollinearity by considering the size of $\operatorname{VIF}(\hat\beta_i)$. A rule of thumb is that multicollinearity is high if $\operatorname{VIF}(\hat\beta_i) > 10$ (a cutoff of 5 is also commonly used). However, there is no value of the VIF greater than 1 at which the variance of the slopes of predictors is not inflated. As a result, including two or more variables in a multiple regression that are not orthogonal (i.e., that have nonzero correlation) will alter each other's slope, the standard error of the slope, and the ''p''-value, because there is shared variance between the predictors that cannot be uniquely attributed to any one of them.
Some software instead calculates the tolerance, which is simply the reciprocal of the VIF. The choice of which to use is a matter of personal preference.
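As a concrete illustration of the three steps, here is a minimal sketch in Python (NumPy only; the function name vif and the synthetic data are assumptions for illustration, not a library API):

```python
import numpy as np

def vif(X: np.ndarray) -> np.ndarray:
    """Return one VIF per column of the (n, k) predictor matrix X."""
    n, k = X.shape
    out = np.empty(k)
    for i in range(k):
        # Step one: OLS of X_i on an intercept plus all other predictors.
        xi = X[:, i]
        others = np.column_stack([np.ones(n), np.delete(X, i, axis=1)])
        coef, *_ = np.linalg.lstsq(others, xi, rcond=None)
        resid = xi - others @ coef
        # Step two: R_i^2 of that regression, then VIF_i = 1 / (1 - R_i^2).
        r2 = 1.0 - resid @ resid / np.sum((xi - xi.mean()) ** 2)
        out[i] = 1.0 / (1.0 - r2)
    return out

# Step three: inspect the magnitudes against the rule-of-thumb cutoff.
rng = np.random.default_rng(2)
X = rng.normal(size=(500, 3))
X[:, 2] = X[:, 0] + 0.3 * rng.normal(size=500)   # strong collinearity
for i, v in enumerate(vif(X), start=1):
    print(f"VIF_{i} = {v:.2f}" + ("  <- high (> 10)" if v > 10 else ""))
```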
Interpretation
The square root of the variance inflation factor indicates how much larger the standard error is, compared with what it would be if that variable were uncorrelated with the other predictor variables in the model.
Example
If the variance inflation factor of a predictor variable were 5.27 ($\sqrt{5.27} \approx 2.3$), this means that the standard error for the coefficient of that predictor variable is 2.3 times as large as it would be if that predictor variable were uncorrelated with the other predictor variables.
Implementation
* vif function in the car R package
* ols_vif_tol function in the olsrr R package
* PROC REG in the SAS System
* variance_inflation_factor function in the statsmodels Python package (see the sketch after this list)
* estat vif in Stata
* an addon for GRASS GIS
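As an example, the statsmodels function listed above can be used as follows (a brief sketch; the data here are synthetic, and the exog array must contain the constant column, whose index-0 entry is skipped):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Synthetic predictors (illustrative).
rng = np.random.default_rng(3)
X = rng.normal(size=(200, 3))
exog = sm.add_constant(X)        # design matrix with intercept column

vifs = [variance_inflation_factor(exog, i) for i in range(1, exog.shape[1])]
print(vifs)                      # one VIF per predictor
```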
Further reading
* Zuur, A.F.; Ieno, E.N.; Elphick, C.S. (2010). "A protocol for data exploration to avoid common statistical problems". ''Methods in Ecology and Evolution''. 1: 3–14. doi:10.1111/j.2041-210X.2009.00001.x.
See also
* Design effect