Stein's Unbiased Risk Estimate

In statistics, Stein's unbiased risk estimate (SURE) is an unbiased estimator of the mean-squared error of "a nearly arbitrary, nonlinear biased estimator." In other words, it provides an indication of the accuracy of a given estimator. This is important since the true mean-squared error of an estimator is a function of the unknown parameter to be estimated, and thus cannot be determined exactly. The technique is named after its discoverer, Charles Stein.


Formal statement

Let \mu \in \mathbb{R}^d be an unknown parameter and let x \in \mathbb{R}^d be a measurement vector whose components are independent and distributed normally with mean \mu_i, i=1,\ldots,d, and variance \sigma^2. Suppose h(x) is an estimator of \mu from x, and can be written h(x) = x + g(x), where g is weakly differentiable. Then Stein's unbiased risk estimate is given by

:\operatorname{SURE}(h) = d\sigma^2 + \|g(x)\|^2 + 2\sigma^2 \sum_{i=1}^d \frac{\partial}{\partial x_i} g_i(x) = -d\sigma^2 + \|g(x)\|^2 + 2\sigma^2 \sum_{i=1}^d \frac{\partial}{\partial x_i} h_i(x),

where g_i(x) is the ith component of the function g(x), and \|\cdot\| is the Euclidean norm.

The importance of SURE is that it is an unbiased estimate of the mean-squared error (or squared error risk) of h(x), i.e.

:\operatorname{E}_\mu\{\operatorname{SURE}(h)\} = \operatorname{MSE}(h),

with

:\operatorname{MSE}(h) = \operatorname{E}_\mu \|h(x)-\mu\|^2.

Thus, minimizing SURE can act as a surrogate for minimizing the MSE. Note that there is no dependence on the unknown parameter \mu in the expression for SURE above. Thus, it can be manipulated (e.g., to determine optimal estimation settings) without knowledge of \mu.
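To make the formula concrete, here is a minimal numerical sketch (an illustration, not part of the original article) using the simple linear shrinkage estimator h(x) = c\,x, so that g(x) = (c-1)x and each \partial g_i/\partial x_i = c-1. The Monte Carlo average of SURE should match the empirical MSE up to sampling error:

```python
import numpy as np

rng = np.random.default_rng(0)
d, sigma, c = 50, 1.0, 0.8
mu = rng.normal(size=d)  # the "unknown" parameter, used here only to check SURE

# Draw many independent measurement vectors x ~ N(mu, sigma^2 I)
n_trials = 100_000
x = mu + sigma * rng.normal(size=(n_trials, d))

# SURE for h(x) = c*x: g(x) = (c-1)*x, so the divergence term is d*(c-1)
sure = d * sigma**2 + (c - 1)**2 * np.sum(x**2, axis=1) + 2 * sigma**2 * d * (c - 1)

# True squared error of h(x) = c*x, computable here because mu is known
err = np.sum((c * x - mu)**2, axis=1)

print(f"empirical MSE : {err.mean():.3f}")
print(f"mean SURE     : {sure.mean():.3f}")  # should agree up to Monte Carlo error
```

Because SURE does not involve \mu, the quantity `sure` above is computed from the data alone; the known \mu is used only to form the reference MSE.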


Proof

We wish to show that

:\operatorname{E}_\mu \|h(x)-\mu\|^2 = \operatorname{E}_\mu\{\operatorname{SURE}(h)\}.

We start by expanding the MSE as

:\begin{align} \operatorname{E}_\mu \|h(x) - \mu\|^2 & = \operatorname{E}_\mu \|g(x) + x - \mu\|^2 \\ & = \operatorname{E}_\mu \|g(x)\|^2 + \operatorname{E}_\mu \|x - \mu\|^2 + 2 \operatorname{E}_\mu\, g(x)^T (x - \mu) \\ & = \operatorname{E}_\mu \|g(x)\|^2 + d \sigma^2 + 2 \operatorname{E}_\mu\, g(x)^T(x - \mu). \end{align}

Now we use integration by parts to rewrite the last term, noting that (x_i - \mu_i) times the Gaussian density equals -\sigma^2 times the partial derivative of the density with respect to x_i, and that the boundary terms vanish because the density decays at infinity:

:\begin{align} \operatorname{E}_\mu\, g(x)^T(x - \mu) & = \int_{\mathbb{R}^d} \frac{1}{(2\pi\sigma^2)^{d/2}} \exp\left(-\frac{\|x-\mu\|^2}{2\sigma^2}\right) \sum_{i=1}^d g_i(x) (x_i - \mu_i) \, d^d x \\ & = \sigma^2 \sum_{i=1}^d \int_{\mathbb{R}^d} \frac{1}{(2\pi\sigma^2)^{d/2}} \exp\left(-\frac{\|x-\mu\|^2}{2\sigma^2}\right) \frac{\partial g_i}{\partial x_i} \, d^d x \\ & = \sigma^2 \sum_{i=1}^d \operatorname{E}_\mu \frac{\partial g_i}{\partial x_i}. \end{align}

Substituting this into the expression for the MSE, we arrive at

:\operatorname{E}_\mu \|h(x) - \mu\|^2 = \operatorname{E}_\mu \left( d\sigma^2 + \|g(x)\|^2 + 2\sigma^2 \sum_{i=1}^d \frac{\partial g_i}{\partial x_i}\right).
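The only non-trivial step above is the integration-by-parts identity, a form of Stein's lemma. The following sketch checks it numerically for an arbitrary illustrative choice (not from the source) of g_i(x) = \tanh(x_i), for which \partial g_i/\partial x_i = 1 - \tanh^2(x_i):

```python
import numpy as np

rng = np.random.default_rng(1)
d, sigma = 5, 1.3
mu = rng.normal(size=d)

# Monte Carlo samples x ~ N(mu, sigma^2 I)
n = 500_000
x = mu + sigma * rng.normal(size=(n, d))
g = np.tanh(x)  # smooth test function; g_i depends only on x_i here

lhs = np.mean(np.sum(g * (x - mu), axis=1))         # E[g(x)^T (x - mu)]
rhs = sigma**2 * np.mean(np.sum(1 - g**2, axis=1))  # sigma^2 * sum_i E[dg_i/dx_i]
print(f"lhs = {lhs:.4f}, rhs = {rhs:.4f}")  # should agree up to Monte Carlo error
```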


Applications

A standard application of SURE is to choose a parametric form for an estimator, and then optimize the values of the parameters to minimize the risk estimate. This technique has been applied in several settings. For example, a variant of the James–Stein estimator can be derived by finding the optimal shrinkage estimator. The technique has also been used by Donoho and Johnstone to determine the optimal shrinkage factor in a wavelet denoising setting.
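As an illustration of the denoising application, the sketch below (assumptions mine, in the spirit of Donoho and Johnstone's SureShrink rather than their exact procedure) selects a soft-thresholding level by minimizing SURE. For h_i(x) = \operatorname{sign}(x_i)\max(|x_i|-t, 0), one has g_i(x) = -\operatorname{sign}(x_i)\min(|x_i|,t), whose weak derivative is -1 on \{|x_i|\le t\} and 0 elsewhere, giving \operatorname{SURE}(t) = d\sigma^2 - 2\sigma^2 \,\#\{i : |x_i|\le t\} + \sum_i \min(|x_i|,t)^2:

```python
import numpy as np

rng = np.random.default_rng(2)
d, sigma = 1000, 1.0
mu = np.zeros(d)
mu[:50] = 5.0  # a sparse signal, the regime where thresholding is effective
x = mu + sigma * rng.normal(size=d)

def sure_soft(x, t, sigma):
    """SURE for coordinatewise soft thresholding at level t."""
    clipped = np.minimum(np.abs(x), t)
    return (x.size * sigma**2
            - 2 * sigma**2 * np.sum(np.abs(x) <= t)
            + np.sum(clipped**2))

def soft(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

ts = np.linspace(0.0, 4.0, 401)
t_sure = min(ts, key=lambda t: sure_soft(x, t, sigma))

# Oracle threshold minimizing the true squared error (uses the known mu)
t_oracle = min(ts, key=lambda t: np.sum((soft(x, t) - mu)**2))

print(f"SURE-chosen threshold : {t_sure:.2f}")
print(f"oracle threshold      : {t_oracle:.2f}")
```

The SURE-chosen threshold typically lands close to the oracle one, even though it is computed from the noisy observation alone.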

