In statistics, the fraction of variance unexplained (FVU) in the context of a regression task is the fraction of variance of the regressand (dependent variable) ''Y'' which cannot be explained, i.e., which is not correctly predicted, by the explanatory variables ''X''.

Formal definition

Suppose we are given a regression function ''f'' yielding for each ''y''_''i'' an estimate \widehat{y}_i = f(x_i), where ''x''_''i'' is the vector of the ''i''th observations on all the explanatory variables. We define the fraction of variance unexplained (FVU) as:

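\begin{align}
\text{FVU} & = \frac{\text{VAR}_\text{err}}{\text{VAR}_\text{tot}} = \frac{\text{SS}_\text{err}/N}{\text{SS}_\text{tot}/N} = \frac{\text{SS}_\text{err}}{\text{SS}_\text{tot}} \left( = 1 - \frac{\text{SS}_\text{reg}}{\text{SS}_\text{tot}}, \text{ only true in some cases such as linear regression} \right) \\
& = 1 - R^2
\end{align}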

where ''R''² is the coefficient of determination, and ''VAR''_err and ''VAR''_tot are the variance of the residuals and the sample variance of the dependent variable, respectively. ''SS''_err (the sum of squared prediction errors, equivalently the residual sum of squares), ''SS''_tot (the total sum of squares), and ''SS''_reg (the sum of squares of the regression, equivalently the explained sum of squares) are given by:

\begin{align}
\text{SS}_\text{err} & = \sum_{i=1}^N\;(y_i - \widehat{y}_i)^2 \\
\text{SS}_\text{tot} & = \sum_{i=1}^N\;(y_i-\bar{y})^2 \\
\text{SS}_\text{reg} & = \sum_{i=1}^N\;(\widehat{y}_i-\bar{y})^2 \\
\bar{y} & = \frac 1 N \sum_{i=1}^N\;y_i.
\end{align}
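The sums of squares above translate directly into code. The following is a minimal sketch in plain Python, not a reference implementation: the function names (`sums_of_squares`, `fvu`, `fit_line`) and the five data points are hypothetical and chosen only for illustration. It fits a straight line by ordinary least squares and checks that SS_err / SS_tot agrees with 1 - SS_reg / SS_tot, which is expected here because the model is a linear least-squares fit with an intercept.

```python
def sums_of_squares(y, y_hat):
    """Return (SS_err, SS_tot, SS_reg) for observations y and predictions y_hat."""
    y_bar = sum(y) / len(y)
    ss_err = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    ss_reg = sum((fi - y_bar) ** 2 for fi in y_hat)
    return ss_err, ss_tot, ss_reg


def fvu(y, y_hat):
    """Fraction of variance unexplained, computed as SS_err / SS_tot."""
    ss_err, ss_tot, _ = sums_of_squares(y, y_hat)
    return ss_err / ss_tot


def fit_line(x, y):
    """Ordinary least-squares fit of y = a + b*x for one explanatory variable."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b


# Hypothetical data, chosen only for illustration.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 2.0, 5.0, 4.0]

a, b = fit_line(x, y)
y_hat = [a + b * xi for xi in x]
ss_err, ss_tot, ss_reg = sums_of_squares(y, y_hat)

print(fvu(y, y_hat))        # approximately 0.36
print(1 - ss_reg / ss_tot)  # approximately 0.36 as well for this linear fit
```

For predictions that do not come from a least-squares fit with an intercept, the two printed quantities can differ, so SS_err / SS_tot is the safer form to compute.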

Alternatively, the fraction of variance unexplained can be defined as follows:

\text{FVU} = \frac{\operatorname{MSE}(f)}{\operatorname{var}[Y]}

where MSE(''f'') is the mean squared error of the regression function ''f''.
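As a rough illustration of this second definition, the sketch below computes the ratio of mean squared error to variance using Python's standard `statistics` module. The helper names `mse` and `fvu_from_mse`, the data, and the predictions are hypothetical. It assumes the variance of ''Y'' is the population variance (divisor ''N''), so that the ratio matches SS_err / SS_tot.

```python
from statistics import fmean, pvariance


def mse(y, y_hat):
    """Mean squared error: the average of the squared prediction errors."""
    return fmean([(yi - fi) ** 2 for yi, fi in zip(y, y_hat)])


def fvu_from_mse(y, y_hat):
    """FVU as MSE(f) / var[Y], with var[Y] taken as the population variance."""
    return mse(y, y_hat) / pvariance(y)


# Hypothetical observations and predictions, for illustration only.
y = [1.0, 3.0, 2.0, 5.0, 4.0]
y_hat = [1.4, 2.2, 3.0, 3.8, 4.6]

print(mse(y, y_hat))           # approximately 0.72  (SS_err / N = 3.6 / 5)
print(pvariance(y))            # 2.0                 (SS_tot / N = 10 / 5)
print(fvu_from_mse(y, y_hat))  # approximately 0.36
```

If the sample variance with divisor ''N'' - 1 were used in the denominator instead, the result would be scaled by (''N'' - 1)/''N'' and would no longer equal SS_err / SS_tot exactly.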

Explanation

It is useful to consider the second definition to understand FVU. When trying to predict ''Y'', the most naïve regression function that we can think of is the constant function predicting the mean of ''Y'', i.e., f(x_i)=\bar{y}. It follows that the MSE of this function equals the variance of ''Y''; that is, ''SS''_err = ''SS''_tot, and ''SS''_reg = 0. In this case, no variation in ''Y'' can be accounted for, and the FVU then has its maximum value of 1. More generally, the FVU will be 1 if the explanatory variables ''X'' tell us nothing about ''Y'' in the sense that the predicted values of ''Y'' do not covary with ''Y''. But as prediction gets better and the MSE can be reduced, the FVU goes down. In the case of perfect prediction where \hat{y}_i = y_i for all ''i'', the MSE is 0, ''SS''_err = 0, ''SS''_reg = ''SS''_tot, and the FVU is 0.
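The two limiting cases described above can be checked numerically. This is a small sketch with hypothetical data and a hypothetical helper `fvu`: predicting the mean of ''Y'' for every observation gives an FVU of 1, and predicting every observation exactly gives an FVU of 0.

```python
def fvu(y, y_hat):
    """Fraction of variance unexplained, computed as SS_err / SS_tot."""
    y_bar = sum(y) / len(y)
    ss_err = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return ss_err / ss_tot


# Hypothetical observations, for illustration only.
y = [1.0, 3.0, 2.0, 5.0, 4.0]

# Naive constant predictor: every prediction equals the mean of y.
y_bar = sum(y) / len(y)
mean_predictions = [y_bar] * len(y)

# Perfect predictor: every prediction equals the observed value.
perfect_predictions = list(y)

print(fvu(y, mean_predictions))     # 1.0
print(fvu(y, perfect_predictions))  # 0.0
```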
