Variance inflation factor

In statistics, the variance inflation factor (VIF) is the ratio (quotient) of the variance of a parameter estimate when fitting a full model that includes other parameters to the variance of the parameter estimate if the model is fit with only the parameter on its own. The VIF provides an index that measures how much the variance (the square of the estimate's standard deviation) of an estimated regression coefficient is increased because of collinearity. Cuthbert Daniel claims to have invented the concept behind the variance inflation factor, but did not come up with the name.


Definition

Consider the following linear model with ''k'' independent variables:

: ''Y'' = ''β''0 + ''β''1 ''X''1 + ''β''2 ''X''2 + ... + ''β''''k'' ''X''''k'' + ''ε''.

The standard error of the estimate of ''β''''j'' is the square root of the (''j'' + 1)-th diagonal element of ''s''2(''X''′''X'')−1, where ''s'' is the root mean squared error (RMSE) (note that RMSE2 is a consistent estimator of the true variance of the error term, \sigma^2); ''X'' is the regression design matrix, a matrix such that ''X''''i'', ''j''+1 is the value of the ''j''th independent variable for the ''i''th case or observation, and such that ''X''''i'',1, the predictor vector associated with the intercept term, equals 1 for all ''i''. It turns out that the square of this standard error, the estimated variance of the estimate of ''β''''j'', can be equivalently expressed as:

: \widehat{\operatorname{var}}(\hat{\beta}_j) = \frac{s^2}{(n-1)\widehat{\operatorname{var}}(X_j)}\cdot \frac{1}{1-R_j^2},

where ''R''''j''2 is the multiple ''R''2 for the regression of ''X''''j'' on the other covariates (a regression that does not involve the response variable ''Y'') and \hat{\beta}_j are the coefficient estimates, that is, the estimates of \beta_j. This identity separates the influences of several distinct factors on the variance of the coefficient estimate:

* ''s''2: greater scatter in the data around the regression surface leads to proportionately more variance in the coefficient estimates
* ''n'': greater sample size results in proportionately less variance in the coefficient estimates
* \widehat{\operatorname{var}}(X_j): greater variability in a particular covariate leads to proportionately less variance in the corresponding coefficient estimate

The remaining term, 1 / (1 − ''R''''j''2), is the VIF. It reflects all other factors that influence the uncertainty in the coefficient estimates. The VIF equals 1 when the vector ''X''''j'' is orthogonal to each column of the design matrix for the regression of ''X''''j'' on the other covariates. By contrast, the VIF is greater than 1 when the vector ''X''''j'' is not orthogonal to all columns of the design matrix for the regression of ''X''''j'' on the other covariates. Finally, note that the VIF is invariant to the scaling of the variables (that is, we could scale each variable ''X''''j'' by a constant ''c''''j'' without changing the VIF).

: \widehat{\operatorname{var}}(\hat{\beta}_j) = s^2 [(X^T X)^{-1}]_{jj}

Now let r = X^T X and, without loss of generality, reorder the columns of ''X'' so that the first column is X_j:

: r^{-1} = \begin{pmatrix} r_{11} & r_{12} \\ r_{21} & r_{22} \end{pmatrix}^{-1}

: r_{11} = X_j^T X_j, \quad r_{12} = X_j^T X_{-j}, \quad r_{21} = X_{-j}^T X_j, \quad r_{22} = X_{-j}^T X_{-j}.

By using the Schur complement, the element in the first row and first column of r^{-1} is

: r^{-1}_{11} = [r_{11} - r_{12} r_{22}^{-1} r_{21}]^{-1}.

Then we have

: \begin{align}
\widehat{\operatorname{var}}(\hat{\beta}_j) &= s^2 [(X^T X)^{-1}]_{jj} = s^2 r^{-1}_{11} \\
&= s^2 [X_j^T X_j - X_j^T X_{-j} (X_{-j}^T X_{-j})^{-1} X_{-j}^T X_j]^{-1} \\
&= s^2 [X_j^T X_j - X_j^T X_{-j} (X_{-j}^T X_{-j})^{-1} (X_{-j}^T X_{-j}) (X_{-j}^T X_{-j})^{-1} X_{-j}^T X_j]^{-1} \\
&= s^2 [X_j^T X_j - \hat{\beta}_{*j}^T (X_{-j}^T X_{-j}) \hat{\beta}_{*j}]^{-1} \\
&= s^2 \frac{1}{\mathrm{RSS}_j} \\
&= \frac{s^2}{(n-1)\widehat{\operatorname{var}}(X_j)}\cdot \frac{1}{1-R^2_j}
\end{align}

Here \hat{\beta}_{*j} is the vector of coefficients from the regression of the dependent variable X_j on the covariates X_{-j}, and \mathrm{RSS}_j is the corresponding residual sum of squares.
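
The identity above can be checked numerically. Below is a minimal sketch in Python with NumPy, using simulated data and variable names that are illustrative assumptions rather than anything from the text: it computes \widehat{\operatorname{var}}(\hat{\beta}_j) directly as s^2 [(X^T X)^{-1}]_{jj} and again through the decomposition s^2 / ((n-1)\widehat{\operatorname{var}}(X_j)) \cdot 1/(1-R_j^2), and the two values agree.

<syntaxhighlight lang="python">
import numpy as np

# Illustrative simulated data (an assumption for this sketch, not from the text)
rng = np.random.default_rng(1)
n = 400
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(scale=0.6, size=n)     # correlated with x1
x3 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2, x3])     # design matrix with intercept column
y = 1.0 + 2.0 * x1 - 1.0 * x2 + 0.5 * x3 + rng.normal(size=n)

# Full fit: estimated error variance s^2 and var-hat of beta_j as s^2 [(X'X)^-1]_jj
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
s2 = np.sum((y - X @ beta) ** 2) / (n - X.shape[1])
XtX_inv = np.linalg.inv(X.T @ X)

j = 1                                             # coefficient of x1 (column index 1)
var_direct = s2 * XtX_inv[j, j]

# Auxiliary regression of X_j on the other columns (intercept included) gives R_j^2
Z = np.delete(X, j, axis=1)
gamma, *_ = np.linalg.lstsq(Z, X[:, j], rcond=None)
resid = X[:, j] - Z @ gamma
r2_j = 1.0 - resid @ resid / np.sum((X[:, j] - X[:, j].mean()) ** 2)
vif_j = 1.0 / (1.0 - r2_j)

# The decomposition from the text: s^2 / ((n-1) var-hat(X_j)) times the VIF
var_decomposed = s2 / ((n - 1) * np.var(X[:, j], ddof=1)) * vif_j

print(var_direct, var_decomposed, vif_j)          # the two variance expressions agree
</syntaxhighlight>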


Calculation and analysis

We can calculate ''k'' different VIFs (one for each ''X''''i'') in three steps:


Step one

First we run an ordinary least squares regression that has ''X''''i'' as a function of all the other explanatory variables in the first equation.

If ''i'' = 1, for example, the equation would be

:X_1=\alpha_0 + \alpha_2 X_2 + \alpha_3 X_3 + \cdots + \alpha_k X_k +\varepsilon

where \alpha_0 is a constant and \varepsilon is the error term.


Step two

Then, calculate the VIF factor for \hat\alpha_i with the following formula:

: \mathrm{VIF}_i = \frac{1}{1-R^2_i}

where ''R''2''i'' is the coefficient of determination of the regression equation in step one, with X_i on the left hand side and all other predictor variables (all the other X variables) on the right hand side.


Step three

Analyze the magnitude of multicollinearity by considering the size of \operatorname{VIF}(\hat \alpha_i). A rule of thumb is that if \operatorname{VIF}(\hat \alpha_i) > 10 then multicollinearity is high (a cutoff of 5 is also commonly used). However, for any VIF greater than 1 the variances of the predictors' slope estimates are inflated to some degree. As a result, including two or more variables in a multiple regression that are not orthogonal (i.e., that have nonzero correlation) will alter each other's slope estimates, standard errors, and p-values, because there is shared variance between the predictors that cannot be uniquely attributed to any one of them. Some software instead calculates the tolerance, which is simply the reciprocal of the VIF. The choice of which to use is a matter of personal preference.
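
The three steps can be put together in a short sketch. Assuming NumPy, with an illustrative helper name and simulated data that are not from the original text, the following runs the step-one auxiliary regression for each predictor, forms the step-two VIF as 1/(1 − R_i^2), and applies the step-three rule of thumb, reporting the tolerance as well.

<syntaxhighlight lang="python">
import numpy as np

def vif_table(X, cutoff=10.0):
    """Steps one-three: auxiliary OLS for each column, VIF = 1/(1 - R_i^2), flag high values."""
    n, k = X.shape
    rows = []
    for i in range(k):
        y = X[:, i]
        # Step one: regress X_i on a constant and all the other explanatory variables
        Z = np.column_stack([np.ones(n), np.delete(X, i, axis=1)])
        alpha, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ alpha
        # Step two: coefficient of determination of that regression, then the VIF
        r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
        vif = 1.0 / (1.0 - r2)
        # Step three: compare against the rule-of-thumb cutoff; tolerance is the reciprocal
        rows.append((i, vif, 1.0 / vif, vif > cutoff))
    return rows

# Illustrative data: x2 is nearly a linear function of x1, so both get large VIFs
rng = np.random.default_rng(0)
x1 = rng.normal(size=300)
x2 = x1 + rng.normal(scale=0.1, size=300)
x3 = rng.normal(size=300)
for i, vif, tol, high in vif_table(np.column_stack([x1, x2, x3])):
    print(f"X{i + 1}: VIF = {vif:.2f}, tolerance = {tol:.3f}, high multicollinearity: {high}")
</syntaxhighlight>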


Interpretation

The square root of the variance inflation factor indicates how much larger the standard error is, compared with what it would be if that variable were uncorrelated with the other predictor variables in the model.

Example: if the variance inflation factor of a predictor variable is 5.27 (√5.27 ≈ 2.3), the standard error for the coefficient of that predictor variable is 2.3 times larger than it would be if that predictor variable were uncorrelated with the other predictor variables.


Implementation

* vif function in the car R package
* ols_vif_tol function in the olsrr R package
* PROC REG in the SAS System
* variance_inflation_factor function in the statsmodels Python package
* estat vif in Stata
* an addon for GRASS GIS
* vif (non-categorical) and gvif (categorical data) functions in the StatsModels package for the Julia programming language
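
As a brief usage sketch for the statsmodels function listed above (the data frame here is an illustrative assumption), variance_inflation_factor takes a design matrix that includes the constant column together with a column index:

<syntaxhighlight lang="python">
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools.tools import add_constant

# Illustrative predictors (an assumption for this sketch)
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
df = pd.DataFrame({
    "x1": x1,
    "x2": x1 + rng.normal(scale=0.2, size=200),   # nearly collinear with x1
    "x3": rng.normal(size=200),
})

exog = add_constant(df)                           # VIFs are computed on a design matrix with a constant
vifs = {col: variance_inflation_factor(exog.values, i)
        for i, col in enumerate(exog.columns) if col != "const"}
print(vifs)                                       # large VIFs for x1 and x2, roughly 1 for x3
</syntaxhighlight>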


References


Further reading

* {{cite journal |last1=Zuur |first1=A.F. |last2=Ieno |first2=E.N. |last3=Elphick |first3=C.S. |year=2010 |title=A protocol for data exploration to avoid common statistical problems |journal=Methods in Ecology and Evolution |volume=1 |pages=3–14 |doi=10.1111/j.2041-210X.2009.00001.x |s2cid=18814132 |doi-access=free}}


See also

* Design effect