In statistics, explained variation measures the proportion to which a mathematical model accounts for the variation (dispersion) of a given data set. Often, variation is quantified as variance; then, the more specific term explained variance can be used.
The complementary part of the total variation is called unexplained or residual variation.
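As an illustration of the split between explained and residual variation, the following minimal sketch fits a straight line by ordinary least squares and reports both fractions of the variance of the dependent variable. The linear model, the function name `explained_variance_fraction`, and the sample data are assumptions made for this example only; the definition above does not prescribe any particular model.

```python
from statistics import fmean, pvariance


def explained_variance_fraction(x, y):
    """Fit y ~ a + b*x by least squares; return (explained, unexplained) fractions."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx          # slope
    a = my - b * mx        # intercept
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    total = pvariance(y)                 # total variation, quantified as variance
    unexplained = pvariance(residuals)   # residual variation
    explained = total - unexplained      # variation accounted for by the fit
    return explained / total, unexplained / total


# Hypothetical data, roughly linear with some noise.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
explained, unexplained = explained_variance_fraction(x, y)
print(f"explained: {explained:.4f}, unexplained: {unexplained:.4f}")
```

For a least-squares line with an intercept the two fractions sum to one, and the explained fraction coincides with the familiar coefficient of determination.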
Definition in terms of information gain
Information gain by better modelling
Following Kent (1983), we use the Fraser information (Fraser 1965)
:<math>F(\theta) = \int \mathrm{d}r\, g(r)\, \ln f(r;\theta)</math>
where <math>g(r)</math> is the probability density of a random variable <math>R</math>, and <math>f(r;\theta)</math> with <math>\theta \in \Theta_i</math> (<math>i = 0, 1</math>) are two families of parametric models. Model family 0 is the simpler one, with a restricted parameter space <math>\Theta_0 \subset \Theta_1</math>.
Parameters are determined by maximum likelihood estimation,
:<math>\theta_i = \operatorname{arg\,max}_{\theta \in \Theta_i} F(\theta).</math>
The information gain of model 1 over model 0 is written as
:<math>\Gamma(\theta_1 : \theta_0) = 2\,[F(\theta_1) - F(\theta_0)]</math>
where a factor of 2 is included for convenience. Γ is always nonnegative; it measures the extent to which the best model of family 1 is better than the best model of family 0 in explaining ''g''(''r'').
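The information gain can be made concrete with a small numerical sketch. It assumes two Gaussian families chosen purely for illustration: family 0 fixes the mean at zero and fits only the variance, while family 1 fits both mean and variance. The unknown density ''g'' is replaced by the empirical distribution of a simulated sample, so the Fraser information becomes the average log-density over the sample. The function names and the simulated data are hypothetical.

```python
import math
import random


def fraser_information(sample, mu, var):
    """Average Gaussian log-density over the sample (empirical stand-in for g)."""
    return sum(
        -0.5 * math.log(2 * math.pi * var) - (r - mu) ** 2 / (2 * var)
        for r in sample
    ) / len(sample)


def information_gain(sample):
    """Return (Gamma, var0, var1) for the two illustrative Gaussian families."""
    n = len(sample)
    # Family 0 (restricted): mean fixed at 0, maximum likelihood variance.
    var0 = sum(r * r for r in sample) / n
    # Family 1 (full): maximum likelihood mean and variance.
    mu1 = sum(sample) / n
    var1 = sum((r - mu1) ** 2 for r in sample) / n
    f0 = fraser_information(sample, 0.0, var0)
    f1 = fraser_information(sample, mu1, var1)
    # Twice the difference in Fraser information between the best models.
    return 2.0 * (f1 - f0), var0, var1


random.seed(0)
sample = [random.gauss(1.0, 2.0) for _ in range(1000)]
gamma, var0, var1 = information_gain(sample)
print(gamma, math.log(var0 / var1))
```

For these two families the gain reduces to the logarithm of the ratio of the fitted variances, which is nonnegative because the restricted variance equals the full-model variance plus the squared sample mean.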
Information gain by a conditional model
Assume a two-dimensional random variable <math>R = (X, Y)</math>, where ''X'' is treated as an explanatory variable and ''Y'' as a dependent variable. Models of family 1 "explain" ''Y'' in terms of ''X'',
:<math>f(y \mid x; \theta),</math>
whereas in family 0, ''X'' and ''Y'' are assumed to be independent. We define the randomness of ''Y'' by