Stein's lemma, named in honor of Charles Stein, is a theorem of probability theory that is of interest primarily because of its applications to statistical inference, in particular to James–Stein estimation and empirical Bayes methods, and to portfolio choice theory. The theorem gives a formula for the covariance of one random variable with the value of a function of another, when the two random variables are jointly normally distributed.


Statement of the lemma

Suppose ''X'' is a normally distributed random variable with expectation μ and variance σ². Further suppose ''g'' is a function for which the two expectations E(''g''(''X'')(''X'' − μ)) and E(''g''′(''X'')) both exist. (The existence of the expectation of any random variable is equivalent to the finiteness of the expectation of its absolute value.) Then

:E\bigl(g(X)(X-\mu)\bigr)=\sigma^2 E\bigl(g'(X)\bigr).

In general, suppose ''X'' and ''Y'' are jointly normally distributed. Then

:\operatorname{Cov}(g(X),Y)= \operatorname{Cov}(X,Y)\, E\bigl(g'(X)\bigr).

For a general multivariate Gaussian random vector (X_1, \ldots, X_n) \sim N(\mu, \Sigma) it follows that

:E\bigl(g(X)(X-\mu)\bigr)=\Sigma\cdot E\bigl(\nabla g(X)\bigr).
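
As an illustrative aside (an addition, not part of the original statement), both the univariate identity and the covariance form can be checked by Monte Carlo simulation. The sketch below uses the test function g(x) = x³, an arbitrary choice for which both expectations exist.

```python
import numpy as np

# Monte Carlo check of Stein's lemma for X ~ N(mu, sigma^2):
#   E[g(X)(X - mu)] = sigma^2 * E[g'(X)]
# with the arbitrary test function g(x) = x**3, so g'(x) = 3*x**2.
rng = np.random.default_rng(0)
mu, sigma = 1.5, 2.0
x = rng.normal(mu, sigma, size=10_000_000)

lhs = np.mean(x**3 * (x - mu))       # E[g(X)(X - mu)]
rhs = sigma**2 * np.mean(3 * x**2)   # sigma^2 * E[g'(X)]
print(lhs, rhs)                      # agree up to Monte Carlo error

# Covariance form: Cov(g(X), Y) = Cov(X, Y) * E[g'(X)]
# for (X, Y) jointly normal; here both margins are standard
# normal with Cov(X, Y) = 0.8.
cov_xy = 0.8
xy = rng.multivariate_normal([0.0, 0.0],
                             [[1.0, cov_xy], [cov_xy, 1.0]],
                             size=2_000_000)
X, Y = xy[:, 0], xy[:, 1]
lhs2 = np.mean(X**3 * Y) - np.mean(X**3) * np.mean(Y)  # Cov(g(X), Y)
rhs2 = cov_xy * np.mean(3 * X**2)                      # Cov(X, Y) * E[g'(X)]
print(lhs2, rhs2)                                      # both near 3 * 0.8 = 2.4
```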


Proof

The probability density function for the univariate normal distribution with expectation 0 and variance 1 is

:\varphi(x)=\frac{1}{\sqrt{2\pi}}e^{-x^2/2}.

Since \int x \exp(-x^2/2)\,dx = -\exp(-x^2/2), we get from integration by parts:

:E\bigl(g(X)X\bigr)= \frac{1}{\sqrt{2\pi}}\int g(x)\, x \exp(-x^2/2)\,dx = \frac{1}{\sqrt{2\pi}}\int g'(x) \exp(-x^2/2)\,dx = E\bigl(g'(X)\bigr),

where the boundary term g(x)\exp(-x^2/2) vanishes as x \to \pm\infty under the stated integrability assumptions. The case of general variance \sigma^2 follows by substitution.
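
As a concrete illustration of this integration-by-parts step (an addition to the original text), the unit-variance identity can be verified symbolically; the sketch below uses sympy with the arbitrary test function g(x) = x³.

```python
import sympy as sp

# Symbolic verification of the unit-variance case E[g(X) X] = E[g'(X)]
# for X ~ N(0, 1), with the arbitrary test function g(x) = x**3.
x = sp.symbols('x', real=True)
phi = sp.exp(-x**2 / 2) / sp.sqrt(2 * sp.pi)  # standard normal density

g = x**3
lhs = sp.integrate(g * x * phi, (x, -sp.oo, sp.oo))          # E[g(X) X]
rhs = sp.integrate(sp.diff(g, x) * phi, (x, -sp.oo, sp.oo))  # E[g'(X)]
print(lhs, rhs)  # both evaluate to 3
```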


More general statement

Isserlis' theorem is equivalently stated as

:\operatorname{E}(X_1 f(X_1,\ldots,X_n))=\sum_{i=1}^{n} \operatorname{E}(X_1X_i)\,\operatorname{E}(\partial_{X_i}f(X_1,\ldots,X_n)),

where (X_1,\dots, X_n) is a zero-mean multivariate normal random vector.

Suppose ''X'' is in an exponential family, that is, ''X'' has the density

:f_\eta(x)=\exp(\eta'T(x) - \Psi(\eta))h(x).

Suppose this density has support (a,b), where a,b could be -\infty, \infty, and such that, as x\rightarrow a \text{ or } b, \exp(\eta'T(x))h(x)\, g(x) \rightarrow 0, where g is any differentiable function with E|g'(X)|<\infty, or \exp(\eta'T(x))h(x) \rightarrow 0 if a,b are finite. Then

:E\left[\left(\frac{h'(X)}{h(X)} + \sum_i \eta_i T_i'(X)\right)\cdot g(X)\right]= -E\bigl(g'(X)\bigr).

The derivation is the same as in the special case above, namely, integration by parts.

If we only know that X has support \mathbb{R}, then it could be the case that E|g(X)|<\infty \text{ and } E|g'(X)|<\infty but \lim_{x\to\infty} f_\eta(x)\, g(x) \neq 0. To see this, simply put g(x)=1 and let f_\eta(x) have infinitely many spikes towards infinity while remaining integrable. One such example could be adapted from

:f(x) = \begin{cases} 1 & x \in [n, n + 2^{-n}) \text{ for some } n \in \mathbb{N} \\ 0 & \text{otherwise} \end{cases}

smoothed so that f is smooth. Extensions to elliptically-contoured distributions also exist.
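
As an illustrative sanity check (an addition to the original text), the exponential-family identity can be verified by simulation for a concrete family. For the exponential distribution with rate λ one has η = −λ, T(x) = x, h(x) = 1 on support (0, ∞), so the identity reduces to λ E(g(X)) = E(g′(X)); the test function g(x) = x², which satisfies the boundary condition since g(0) = 0, is an arbitrary choice.

```python
import numpy as np

# Simulation check of the exponential-family identity for the
# exponential distribution f(x) = lam * exp(-lam * x) on (0, inf):
# here eta = -lam, T(x) = x, h(x) = 1, so the identity reduces to
#   lam * E[g(X)] = E[g'(X)].
# The test function g(x) = x**2 (g(0) = 0, so the boundary
# condition at a = 0 holds) is an arbitrary illustrative choice.
rng = np.random.default_rng(0)
lam = 1.7
x = rng.exponential(scale=1 / lam, size=10_000_000)

lhs = lam * np.mean(x**2)  # lam * E[g(X)]
rhs = np.mean(2 * x)       # E[g'(X)]
print(lhs, rhs)            # both approximately 2 / lam
```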


See also

*Stein's method
*Taylor expansions for the moments of functions of random variables
*Stein discrepancy

