In decision theory and estimation theory, Stein's example (also known as Stein's phenomenon or Stein's paradox) is the observation that when three or more parameters are estimated simultaneously, there exist combined estimators more accurate on average (that is, having lower expected mean squared error) than any method that handles the parameters separately. It is named after Charles Stein of Stanford University, who discovered the phenomenon in 1955.
An intuitive explanation is that optimizing for the mean-squared error of a ''combined'' estimator is not the same as optimizing for the errors of separate estimators of the individual parameters. In practical terms, if the combined error is in fact of interest, then a combined estimator should be used, even if the underlying parameters are independent. If one is instead interested in estimating an individual parameter, then using a combined estimator does not help and is in fact worse.
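The phenomenon can be seen in a short simulation. The combined rule below is the well-known James–Stein shrinkage estimator; the particular parameter values and trial count are arbitrary choices made for illustration, not taken from the source.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10                              # number of parameters; the paradox needs n >= 3
trials = 100_000
theta = np.linspace(-2.0, 2.0, n)   # arbitrary true parameter vector (an assumption)

# One N(theta_i, 1) measurement per parameter, repeated over many trials.
X = rng.normal(theta, 1.0, size=(trials, n))

# "Ordinary" rule: estimate each parameter by its own measurement.
mse_separate = np.mean(np.sum((X - theta) ** 2, axis=1))

# James-Stein rule: shrink all coordinates toward zero jointly.
shrink = 1.0 - (n - 2) / np.sum(X ** 2, axis=1, keepdims=True)
mse_combined = np.mean(np.sum((shrink * X - theta) ** 2, axis=1))

print(mse_separate > mse_combined)   # True: the combined estimator wins on average
```

Note that the shrinkage factor couples the coordinates: every measurement influences every estimate, even though the underlying parameters are independent. This is exactly the trade-off described above, since any single coordinate of the shrunken estimate can be worse than the corresponding raw measurement.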
Formal statement
The following is the simplest form of the paradox, the special case in which the number of observations is equal to the number of parameters to be estimated. Let \boldsymbol\theta = (\theta_1, \theta_2, \dots, \theta_n) be a vector consisting of n \ge 3 unknown parameters. To estimate these parameters, a single measurement X_i is performed for each parameter \theta_i, resulting in a vector \mathbf{X} of length n. Suppose the measurements are known to be independent Gaussian random variables, with mean \boldsymbol\theta and variance 1, i.e., X_i \sim N(\theta_i, 1). Thus, each parameter is estimated using a single noisy measurement, and each measurement is equally inaccurate.
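The measurement model above can be sketched directly; the particular parameter values are an arbitrary assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
theta = np.array([1.0, -0.5, 2.0])     # hypothetical unknown parameters, n = 3
X = rng.normal(loc=theta, scale=1.0)   # one measurement X_i ~ N(theta_i, 1) each

print(X)   # a noisy estimate of theta, one coordinate per parameter
```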
Under these conditions, it is intuitive and common to use each measurement as an estimate of its corresponding parameter. This so-called "ordinary" decision rule can be written as \hat{\boldsymbol\theta} = \mathbf{X}, which is the maximum likelihood estimator (MLE). The quality of such an estimator is measured by its risk function. A commonly used risk function is the mean squared error, defined as

:R(\boldsymbol\theta, \hat{\boldsymbol\theta}) = \operatorname{E}\left[ \| \boldsymbol\theta - \hat{\boldsymbol\theta} \|^2 \right]
(Samworth, R. J. (2012), "Stein's Paradox", Eureka, 62, pp. 38–41, http://www.statslab.cam.ac.uk/~rjs57/SteinParadox.pdf)
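As a rough sketch, this risk can be estimated numerically. The helper name, parameter values, and trial count below are illustrative assumptions, not from the source.

```python
import numpy as np

def mse_risk(estimator, theta, trials=100_000, seed=0):
    """Monte Carlo estimate of R(theta, theta_hat) = E[ ||theta - theta_hat||^2 ]."""
    rng = np.random.default_rng(seed)
    X = rng.normal(theta, 1.0, size=(trials, len(theta)))  # X_i ~ N(theta_i, 1)
    theta_hat = estimator(X)                               # rule applied to each trial row
    return float(np.mean(np.sum((theta - theta_hat) ** 2, axis=1)))

theta = np.array([0.5, -1.0, 2.0])
print(mse_risk(lambda X: X, theta))   # ordinary rule theta_hat = X: risk close to n = 3
```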
. We will show that the ordinary decision rule is suboptimal with respect to this risk when n \ge 3. The risk function of the ordinary decision rule is

:R(\boldsymbol\theta, \mathbf{X}) = \operatorname{E}\left[ \| \boldsymbol\theta - \mathbf{X} \|^2 \right] = n,

since each coordinate contributes \operatorname{E}[(\theta_i - X_i)^2] = \operatorname{Var}(X_i) = 1.