The James–Stein estimator is a biased estimator of the mean, $\boldsymbol\theta$, of (possibly) correlated Gaussian distributed random vectors with unknown means $\{\theta_1, \theta_2, \dots, \theta_m\}$.
It arose sequentially in two main published papers. The earlier version of the estimator was developed by Charles Stein in 1956, who reached the relatively shocking conclusion that while the then-usual estimate of the mean, the sample mean (written by Stein and James as $\bar{y}$), is admissible when $m \le 2$, it is inadmissible when $m \ge 3$. Stein proposed a possible improvement: an estimator that shrinks the sample means $\bar{y}_i$ towards a more central mean vector $\boldsymbol\nu$ (which can be chosen a priori, or commonly taken as the "average of averages" of the sample means, given that all samples share the same size). This result is commonly referred to as Stein's example or paradox. The earlier result was improved later by Willard James and Charles Stein in 1961 through simplifying the original process.
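As a rough numerical sketch of the shrinkage idea only: the sample means and the fixed shrinkage factor below are arbitrary illustrative choices and do not reproduce Stein's construction, which derives a data-dependent factor instead.

```python
import numpy as np

# Hypothetical sample means for m = 5 groups (illustrative values only).
sample_means = np.array([1.8, 0.4, -0.9, 2.3, 0.1])

# "Average of averages": the grand mean of the sample means, usable as the
# central vector when all samples share the same size.
nu = sample_means.mean()

# Shrink each sample mean part of the way towards the central value.
# The factor 0.7 is an arbitrary illustrative choice; the James-Stein
# estimator derives its shrinkage factor from the data.
c = 0.7
shrunk = nu + c * (sample_means - nu)

print("central value nu:", nu)
print("shrunk means:   ", shrunk)
```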
It can be shown that the James–Stein estimator dominates the "ordinary" least squares approach, meaning that the James–Stein estimator has a mean squared error lower than or equal to that of the "ordinary" least squares estimator.
Setting
Let $\mathbf{y} \sim N_m(\boldsymbol\theta, \sigma^2 I)$, where the vector $\boldsymbol\theta$ is the unknown mean of $\mathbf{y}$, which is $m$-variate normally distributed with known covariance matrix $\sigma^2 I$.
We are interested in obtaining an estimate, $\widehat{\boldsymbol\theta}$, of $\boldsymbol\theta$, based on a single observation, $\mathbf{y}$, of $\mathbf{y}$.
In real-world application, this is a common situation in which a set of parameters is sampled, and the samples are corrupted by independent Gaussian noise. Since this noise has mean zero, it may be reasonable to use the samples themselves as an estimate of the parameters. This approach is the least squares estimator, which is $\widehat{\boldsymbol\theta}_{LS} = \mathbf{y}$.
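A concrete sketch of this setup follows, assuming an illustrative mean vector and $\sigma^2 = 1$ (values chosen only for the example): a single noisy observation is drawn from the model, and the least squares estimate is simply that observation.

```python
import numpy as np

rng = np.random.default_rng(1)

m = 5
theta = np.array([2.0, -1.0, 0.5, 3.0, 0.0])  # unknown in practice; fixed here for illustration
sigma2 = 1.0                                   # known noise variance

# A single observation y ~ N_m(theta, sigma^2 * I).
y = theta + rng.normal(scale=np.sqrt(sigma2), size=m)

# The "ordinary" least squares estimator uses the observation itself.
theta_ls = y
print("observation / LS estimate:", theta_ls)
```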
Stein demonstrated that in terms of
mean squared error