Inverse-variance weighting

In statistics, inverse-variance weighting is a method of aggregating two or more random variables to minimize the variance of the weighted average. Each random variable is weighted in inverse proportion to its variance, i.e. proportional to its precision.

Given a sequence of independent observations y_i with variances \sigma_i^2, the inverse-variance weighted average is given by

: \hat{y} = \frac{\sum_i y_i / \sigma_i^2}{\sum_i 1/\sigma_i^2}.

The inverse-variance weighted average has the least variance among all weighted averages, which can be calculated as

: Var(\hat{y}) = \frac{1}{\sum_i 1/\sigma_i^2}.

If the variances of the measurements are all equal, then the inverse-variance weighted average becomes the simple average. Inverse-variance weighting is typically used in statistical meta-analysis or sensor fusion to combine the results from independent measurements.
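
As a minimal sketch, the two formulas above can be computed directly; the function name and sample values below are illustrative, and NumPy is assumed to be available:

```python
import numpy as np

def inverse_variance_mean(y, var):
    """Inverse-variance weighted average of measurements y with variances var.

    Returns the weighted mean and the variance of that weighted mean.
    """
    y = np.asarray(y, dtype=float)
    w = 1.0 / np.asarray(var, dtype=float)  # weights proportional to precision
    mean = np.sum(w * y) / np.sum(w)        # hat{y} = sum(y_i/s_i^2) / sum(1/s_i^2)
    variance = 1.0 / np.sum(w)              # Var(hat{y}) = 1 / sum(1/s_i^2)
    return mean, variance

# Three hypothetical independent measurements of the same quantity
mean, variance = inverse_variance_mean([9.7, 9.9, 9.6], [0.04, 0.01, 0.09])
print(mean, variance)
```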


Context

Suppose an experimenter wishes to measure the value of a quantity, say the acceleration due to gravity of Earth, whose true value happens to be \mu. A careful experimenter makes multiple measurements, which we denote with n random variables X_1, X_2, \ldots, X_n. If they are all noisy but unbiased, i.e., the measuring device does not systematically overestimate or underestimate the true value and the errors are scattered symmetrically, then the expectation value E[X_i] = \mu for all i. The scatter in the measurement is then characterised by the variance of the random variables Var(X_i) := \sigma_i^2, and if the measurements are performed under identical scenarios, then all the \sigma_i are the same, which we shall refer to by \sigma. Given the n measurements, a typical estimator for \mu, denoted as \hat{\mu}, is given by the simple average

: \overline{X} = \frac{1}{n} \sum_i X_i.

Note that this empirical average is also a random variable, whose expectation value E[\overline{X}] is \mu but which also has a scatter. If the individual measurements are uncorrelated, the square of the error in the estimate is given by

: Var(\overline{X}) = \frac{1}{n^2} \sum_i \sigma_i^2 = \left(\frac{\sigma}{\sqrt{n}}\right)^2.

Hence, if all the \sigma_i are equal, then the error in the estimate decreases with increasing n as 1/\sqrt{n}, thus making more observations preferred.

Instead of n repeated measurements with one instrument, if the experimenter makes n measurements of the same quantity with n different instruments with varying quality of measurements, then there is no reason to expect the different \sigma_i to be the same. Some instruments could be noisier than others. In the example of measuring the acceleration due to gravity, the different "instruments" could be measuring g from a simple pendulum, from analysing a projectile motion, etc. The simple average is no longer an optimal estimator, since the error in \overline{X} might actually exceed the error in the least noisy measurement if different measurements have very different errors. Instead of discarding the noisy measurements that increase the final error, the experimenter can combine all the measurements with appropriate weights so as to give more importance to the least noisy measurements and vice versa. Given the knowledge of \sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2, an optimal estimator for \mu would be a weighted mean of the measurements

: \hat{\mu} = \frac{\sum_i w_i X_i}{\sum_i w_i},

for the particular choice of the weights w_i = 1/\sigma_i^2. The variance of the estimator is

: Var(\hat{\mu}) = \frac{\sum_i w_i^2 \sigma_i^2}{\left(\sum_i w_i\right)^2},

which for the optimal choice of the weights becomes

: Var(\hat{\mu}_\text{opt}) = \left(\sum_i \sigma_i^{-2}\right)^{-1}.

Note that since Var(\hat{\mu}_\text{opt}) < \min_j \sigma_j^2, the estimator has a scatter smaller than the scatter in any individual measurement. Furthermore, the scatter in \hat{\mu}_\text{opt} decreases with the addition of more measurements, however noisy those measurements may be.
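
A short numerical illustration of this point (the instrument noise levels below are hypothetical) compares the error of the simple average with the error of the inverse-variance weighted average:

```python
import numpy as np

# Hypothetical standard deviations of n = 3 instruments of varying quality
sigma = np.array([0.1, 0.5, 1.0])

# Error of the simple average: Var = (1/n^2) * sum(sigma_i^2)
var_simple = np.sum(sigma**2) / len(sigma)**2

# Error of the inverse-variance weighted average: Var = 1 / sum(1/sigma_i^2)
var_ivw = 1.0 / np.sum(sigma**-2)

print(var_simple)        # ~0.14   -- worse than the best instrument alone
print(np.min(sigma**2))  # 0.01    -- variance of the least noisy measurement
print(var_ivw)           # ~0.0095 -- smaller than any single measurement's variance
```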


Derivation

Consider a generic weighted sum Y = \sum_i w_i X_i, where the weights w_i are normalised such that \sum_i w_i = 1. If the X_i are all independent, the variance of Y is given by

: Var(Y) = \sum_i w_i^2 \sigma_i^2.

For optimality, we wish to minimise Var(Y), which can be done by equating the gradient of Var(Y) with respect to the weights to zero, while maintaining the constraint that \sum_i w_i = 1. Using a Lagrange multiplier w_0 to enforce the constraint, we express the variance

: Var(Y) = \sum_i w_i^2 \sigma_i^2 - w_0\left(\sum_i w_i - 1\right).

For k > 0,

: 0 = \frac{\partial}{\partial w_k} Var(Y) = 2 w_k \sigma_k^2 - w_0,

which implies that

: w_k = \frac{w_0/2}{\sigma_k^2}.

The main takeaway here is that w_k \propto 1/\sigma_k^2. Since \sum_i w_i = 1,

: \frac{2}{w_0} = \sum_i \frac{1}{\sigma_i^2} := \frac{1}{\sigma_0^2}.

The individual normalised weights are

: w_k = \frac{1}{\sigma_k^2} \left(\sum_i \frac{1}{\sigma_i^2}\right)^{-1}.

It is easy to see that this extremum solution corresponds to the minimum from the second partial derivative test, by noting that the variance is a quadratic function of the weights. Thus, the minimum variance of the estimator is then given by

: Var(Y) = \sum_i \frac{\sigma_0^4}{\sigma_i^4} \sigma_i^2 = \sigma_0^4 \sum_i \frac{1}{\sigma_i^2} = \sigma_0^4 \frac{1}{\sigma_0^2} = \sigma_0^2 = \frac{1}{\sum_i 1/\sigma_i^2}.
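
The minimum can also be checked numerically. The sketch below, with illustrative variances, verifies that no other normalised weight vector achieves a smaller Var(Y) than the inverse-variance weights:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2 = np.array([0.01, 0.25, 1.0])       # illustrative variances sigma_i^2

# Optimal weights: w_k = sigma_k^-2 / sum_i sigma_i^-2
w_opt = (1 / sigma2) / np.sum(1 / sigma2)

def var_Y(w):
    return np.sum(w**2 * sigma2)           # Var(Y) = sum_i w_i^2 sigma_i^2

# Any other normalised weight vector gives a larger variance
for _ in range(1000):
    w = rng.random(3)
    w /= w.sum()                           # enforce the constraint sum_i w_i = 1
    assert var_Y(w) >= var_Y(w_opt)

print(var_Y(w_opt), 1 / np.sum(1 / sigma2))  # both equal sigma_0^2
```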


Normal Distributions

For normally distributed random variables, inverse-variance weighted averages can also be derived as the maximum likelihood estimate for the true value. Furthermore, from a Bayesian perspective, the posterior distribution for the true value given normally distributed observations y_i and a flat prior is a normal distribution with the inverse-variance weighted average as its mean and variance

: Var(\hat{y}) = \frac{1}{\sum_i 1/\sigma_i^2}.
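
As an illustration with hypothetical values, the following sketch checks that the maximiser of the joint normal likelihood coincides with the inverse-variance weighted mean:

```python
import numpy as np

# Two hypothetical normal observations of the same quantity
y      = np.array([10.2, 9.8])
sigma2 = np.array([0.04, 0.01])

# Posterior under a flat prior: normal with inverse-variance weighted mean/variance
post_var  = 1.0 / np.sum(1.0 / sigma2)
post_mean = post_var * np.sum(y / sigma2)

# Numerical check: the joint log-likelihood peaks at the weighted mean
mu = np.linspace(9.0, 11.0, 200001)
loglik = -0.5 * np.sum((y[:, None] - mu[None, :])**2 / sigma2[:, None], axis=0)
print(post_mean, mu[np.argmax(loglik)])  # both ~9.88
print(post_var)                          # 0.008
```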


Multivariate Case

For multivariate distributions an equivalent argument leads to an optimal weighting based on the covariance matrices \Sigma_i of the individual estimates x_i:

: \hat{x} = \left(\sum_i \Sigma_i^{-1}\right)^{-1} \sum_i \Sigma_i^{-1} x_i

: Var(\hat{x}) = \left(\sum_i \Sigma_i^{-1}\right)^{-1}

For multivariate distributions the term "precision-weighted" average is more commonly used.
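
A minimal sketch of this precision-weighted fusion for two hypothetical 2-dimensional estimates:

```python
import numpy as np

# Two hypothetical 2-D estimates with their covariance matrices
x1, S1 = np.array([1.0, 2.0]), np.array([[0.10, 0.02], [0.02, 0.20]])
x2, S2 = np.array([1.2, 1.8]), np.array([[0.30, 0.00], [0.00, 0.05]])

P1, P2 = np.linalg.inv(S1), np.linalg.inv(S2)  # precision matrices Sigma_i^-1

# Precision-weighted average and its covariance
cov  = np.linalg.inv(P1 + P2)
xhat = cov @ (P1 @ x1 + P2 @ x2)

print(xhat)  # fused estimate, pulled toward each input along its precise directions
print(cov)   # "smaller" than either input covariance in the positive-definite sense
```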


See also

* Weighted least squares
* Portfolio theory

