Stein's lemma, named in honor of
Charles Stein, is a
theorem
In mathematics and formal logic, a theorem is a statement (logic), statement that has been Mathematical proof, proven, or can be proven. The ''proof'' of a theorem is a logical argument that uses the inference rules of a deductive system to esta ...
of
probability theory
Probability theory or probability calculus is the branch of mathematics concerned with probability. Although there are several different probability interpretations, probability theory treats the concept in a rigorous mathematical manner by expre ...
that is of interest primarily because of its applications to
statistical inference
Statistical inference is the process of using data analysis to infer properties of an underlying probability distribution.Upton, G., Cook, I. (2008) ''Oxford Dictionary of Statistics'', OUP. . Inferential statistical analysis infers properties of ...
— in particular, to
James–Stein estimation and
empirical Bayes method
Empirical Bayes methods are procedures for statistical inference in which the prior probability distribution is estimated from the data. This approach stands in contrast to standard Bayesian methods, for which the prior distribution is fixed ...
s — and its applications to
portfolio choice theory. The theorem gives a formula for the
covariance
In probability theory and statistics, covariance is a measure of the joint variability of two random variables.
The sign of the covariance, therefore, shows the tendency in the linear relationship between the variables. If greater values of one ...
of one
random variable
A random variable (also called random quantity, aleatory variable, or stochastic variable) is a Mathematics, mathematical formalization of a quantity or object which depends on randomness, random events. The term 'random variable' in its mathema ...
with the value of a function of another, when the two random variables are
jointly normally distributed.
Note that the name "Stein's lemma" is also commonly used to refer to a different result in the area of
statistical hypothesis testing
A statistical hypothesis test is a method of statistical inference used to decide whether the data provide sufficient evidence to reject a particular hypothesis. A statistical hypothesis test typically involves a calculation of a test statistic. T ...
, which connects the
error exponents in hypothesis testing In statistical hypothesis testing, the error exponent of a hypothesis testing procedure is the rate at which the probabilities of Type I and Type II decay exponentially with the size of the sample used in the test. For example, if the probability of ...
with the
Kullback–Leibler divergence
In mathematical statistics, the Kullback–Leibler (KL) divergence (also called relative entropy and I-divergence), denoted D_\text(P \parallel Q), is a type of statistical distance: a measure of how much a model probability distribution is diff ...
. This result is also known as the Chernoff–Stein lemma and is not related to the lemma discussed in this article.
Statement
Suppose ''X'' is a
normally distributed
In probability theory and statistics, a normal distribution or Gaussian distribution is a type of continuous probability distribution for a real number, real-valued random variable. The general form of its probability density function is
f(x ...
random variable
A random variable (also called random quantity, aleatory variable, or stochastic variable) is a Mathematics, mathematical formalization of a quantity or object which depends on randomness, random events. The term 'random variable' in its mathema ...
with
expectation μ and
variance
In probability theory and statistics, variance is the expected value of the squared deviation from the mean of a random variable. The standard deviation (SD) is obtained as the square root of the variance. Variance is a measure of dispersion ...
σ
2.
Further suppose ''g'' is a differentiable function for which the two expectations
and
both exist.
(The existence of the expectation of any random variable is equivalent to the finiteness of the expectation of its
absolute value
In mathematics, the absolute value or modulus of a real number x, is the non-negative value without regard to its sign. Namely, , x, =x if x is a positive number, and , x, =-x if x is negative (in which case negating x makes -x positive), ...
.)
Then
:
Multidimensional
In general, suppose ''X'' and ''Y'' are jointly normally distributed. Then
:
For a general multivariate Gaussian random vector
it follows that
:
Similarly, when
,
Gradient descent
Stein's lemma can be used to stochastically estimate gradient:
where
are IID samples from the standard normal distribution
. This form has applications in Stein variational
gradient descent
Gradient descent is a method for unconstrained mathematical optimization. It is a first-order iterative algorithm for minimizing a differentiable multivariate function.
The idea is to take repeated steps in the opposite direction of the gradi ...
and Stein variational
policy gradient.
Proof
The univariate
probability density function
In probability theory, a probability density function (PDF), density function, or density of an absolutely continuous random variable, is a Function (mathematics), function whose value at any given sample (or point) in the sample space (the s ...
for the univariate normal distribution with expectation 0 and variance 1 is
:
Since
we get from
integration by parts
In calculus, and more generally in mathematical analysis, integration by parts or partial integration is a process that finds the integral of a product of functions in terms of the integral of the product of their derivative and antiderivati ...
:
: