Law of the unconscious statistician
In probability theory and statistics, the law of the unconscious statistician, or LOTUS, is a theorem used to calculate the expected value of a function ''g''(''X'') of a random variable ''X'' when one knows the probability distribution of ''X'' but one does not know the distribution of ''g''(''X''). The form of the law can depend on the form in which one states the probability distribution of the random variable ''X''. If it is a discrete distribution and one knows its probability mass function f_X (but not f_{g(X)}), then the expected value of ''g''(''X'') is

: \operatorname{E}[g(X)] = \sum_x g(x) f_X(x),

where the sum is over all possible values ''x'' of ''X''. If it is a continuous distribution and one knows its probability density function f_X (but not f_{g(X)}), then the expected value of ''g''(''X'') is

: \operatorname{E}[g(X)] = \int_{-\infty}^\infty g(x) f_X(x) \, \mathrm{d}x.

If one knows the cumulative distribution function F_X (but not F_{g(X)}), then the expected value of ''g''(''X'') is given by a Riemann–Stieltjes integral

: \operatorname{E}[g(X)] = \int_{-\infty}^\infty g(x) \, \mathrm{d}F_X(x)

(again assuming ''X'' is real-valued).
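As a numerical illustration of the discrete form, the following sketch (a minimal example, not part of the statement; the fair-die pmf and the choice g(x) = x^2 are arbitrary) computes E[g(X)] from the pmf of ''X'' alone, never constructing the distribution of ''g''(''X''), and checks the result by Monte Carlo:

    import numpy as np

    # A fair six-sided die: f_X(x) = 1/6 for x = 1, ..., 6 (an arbitrary choice).
    values = np.arange(1, 7)
    pmf = np.full(6, 1 / 6)

    def g(x):
        return x ** 2  # any function of X will do; g(x) = x^2 here

    # LOTUS, discrete form: E[g(X)] = sum over x of g(x) * f_X(x).
    lotus = np.sum(g(values) * pmf)

    # Monte Carlo check: sample X and average g(X) directly.
    rng = np.random.default_rng(0)
    samples = rng.choice(values, size=1_000_000, p=pmf)

    print(lotus)              # 91/6 = 15.1666...
    print(g(samples).mean())  # ≈ 15.17

Both printed values agree with the exact answer 91/6, even though the distribution of g(X) was never written down.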


Etymology

This proposition is known as the law of the unconscious statistician because of a purported tendency to use the identity without realizing that it must be treated as the result of a rigorously proved theorem, not merely a definition.


Joint distributions

A similar property holds for joint distributions. For discrete random variables ''X'' and ''Y'', a function of two variables ''g'', and joint probability mass function f(x, y):

: \operatorname{E}[g(X, Y)] = \sum_y \sum_x g(x, y) f(x, y).

In the absolutely continuous case, with f(x, y) being the joint probability density function,

: \operatorname{E}[g(X, Y)] = \int_{-\infty}^\infty \int_{-\infty}^\infty g(x, y) f(x, y) \, \mathrm{d}x \, \mathrm{d}y.
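The joint discrete form can be illustrated the same way. In the sketch below (a hypothetical example; the 3×2 joint pmf and g(x, y) = xy + y^2 are arbitrary choices), the double sum is evaluated directly and checked by sampling pairs from the joint distribution:

    import numpy as np

    # Arbitrary joint pmf f(x, y) on a small grid; entries sum to 1.
    xs = np.array([0, 1, 2])
    ys = np.array([0, 1])
    f = np.array([[0.10, 0.15],
                  [0.20, 0.25],
                  [0.05, 0.25]])   # rows index x, columns index y

    def g(x, y):
        return x * y + y ** 2  # any function of two variables

    # LOTUS for joint distributions: E[g(X, Y)] = sum_y sum_x g(x, y) f(x, y).
    X, Y = np.meshgrid(xs, ys, indexing="ij")
    lotus = np.sum(g(X, Y) * f)

    # Monte Carlo check: sample (x, y) pairs with the joint probabilities.
    rng = np.random.default_rng(0)
    idx = rng.choice(f.size, size=1_000_000, p=f.ravel())
    print(lotus, g(X.ravel()[idx], Y.ravel()[idx]).mean())  # both ≈ 1.40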


Proof

This law is not the trivial consequence of definitions that it might at first appear to be; rather, it must be proved.


Continuous case

For a continuous random variable ''X'', let ''Y'' = ''g''(''X''), and suppose that ''g'' is differentiable and strictly increasing, so that its inverse g^{-1} exists (a strictly decreasing ''g'' is handled analogously, with the sign change absorbed by reversing the limits of integration). By the formula for inverse functions and differentiation,

:: \frac{\mathrm{d}}{\mathrm{d}y}\left(g^{-1}(y)\right) = \frac{1}{g'(g^{-1}(y))}.

Because x = g^{-1}(y),

:: \mathrm{d}x = \frac{\mathrm{d}}{\mathrm{d}y}\left(g^{-1}(y)\right) \mathrm{d}y.

So that by a change of variables,

:: \int_{-\infty}^\infty g(x) f_X(x) \, \mathrm{d}x = \int_{-\infty}^\infty y \, f_X(g^{-1}(y)) \frac{\mathrm{d}}{\mathrm{d}y}\left(g^{-1}(y)\right) \mathrm{d}y.

Now, notice that because the cumulative distribution function F_Y(y) = P(Y \leq y), substituting in Y = g(X), applying g^{-1} to both sides of the inequality, and rearranging yields F_Y(y) = F_X(g^{-1}(y)). Then, by the chain rule,

:: f_Y(y) = f_X(g^{-1}(y)) \frac{\mathrm{d}}{\mathrm{d}y}\left(g^{-1}(y)\right).

Combining these expressions, we find

:: \int_{-\infty}^\infty g(x) f_X(x) \, \mathrm{d}x = \int_{-\infty}^\infty y \, f_Y(y) \, \mathrm{d}y.

The right-hand side is \operatorname{E}[Y] by the definition of expected value, so

:: \operatorname{E}[g(X)] = \int_{-\infty}^\infty g(x) f_X(x) \, \mathrm{d}x.
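The identity derived above can be checked numerically for a specific monotonic transformation. The sketch below assumes X exponential with rate 1 and g(x) = \sqrt{x} (both arbitrary choices satisfying the hypotheses) and evaluates both sides of the final equality, using the transformed density f_Y(y) = f_X(g^{-1}(y)) \, (\mathrm{d}/\mathrm{d}y) g^{-1}(y) from the proof:

    import numpy as np
    from scipy.integrate import quad

    # Assumed example: X ~ Exponential(1), so f_X(x) = exp(-x) on [0, inf),
    # and g(x) = sqrt(x), which is differentiable and strictly increasing.
    def f_X(x):
        return np.exp(-x)

    g = np.sqrt
    g_inv = lambda y: y ** 2     # g^{-1}(y) = y^2
    dg_inv = lambda y: 2 * y     # (d/dy) g^{-1}(y)

    # Left side: LOTUS integral of g(x) f_X(x) dx.
    lhs, _ = quad(lambda x: g(x) * f_X(x), 0, np.inf)

    # Right side: integral of y f_Y(y) dy with the density from the proof.
    f_Y = lambda y: f_X(g_inv(y)) * dg_inv(y)
    rhs, _ = quad(lambda y: y * f_Y(y), 0, np.inf)

    print(lhs, rhs)   # both ≈ 0.8862

Both integrals evaluate to \sqrt{\pi}/2 \approx 0.8862, the exact value of \operatorname{E}[\sqrt{X}] for this distribution.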


Discrete case

Let Y = g(X). Then begin with the definition of expected value:

: \operatorname{E}[Y] = \sum_y y f_Y(y)
: \operatorname{E}[g(X)] = \sum_y y P(g(X) = y)
: \operatorname{E}[g(X)] = \sum_y y \sum_{x : g(x) = y} f_X(x)

Since the sets \{x : g(x) = y\} partition the support of ''X'' as ''y'' ranges over the possible values of g(X), the double sum "over all y, then over all x such that g(x) = y" can be rewritten as a single sum "over all x":

: \operatorname{E}[g(X)] = \sum_x g(x) f_X(x)
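The regrouping step is easy to see computationally. In the sketch below (an arbitrary pmf with a non-injective g, so several values of ''x'' share each ''y''), the double sum over y and over \{x : g(x) = y\} is built explicitly and compared with the single sum over x:

    from collections import defaultdict

    # Arbitrary pmf and a non-injective g (several x map to the same y).
    f_X = {-2: 0.1, -1: 0.2, 0: 0.3, 1: 0.2, 2: 0.2}
    g = lambda x: x ** 2

    # Inner sums: P(g(X) = y) = sum of f_X(x) over all x with g(x) = y.
    pmf_Y = defaultdict(float)
    for x, p in f_X.items():
        pmf_Y[g(x)] += p

    # Outer sum over y of y * P(g(X) = y).
    double_sum = sum(y * p for y, p in pmf_Y.items())

    # Single sum over x of g(x) * f_X(x): the LOTUS form.
    single_sum = sum(g(x) * p for x, p in f_X.items())

    print(double_sum, single_sum)   # both 1.6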


From measure theory

A technically complete derivation of the result is available using arguments in measure theory, in which the probability space of a transformed random variable ''g''(''X'') is related to that of the original random variable ''X''. The steps here involve defining a pushforward measure for the transformed space, and the result is then an example of a change of variables formula:

: \int_\Omega g \circ X \, \mathrm{d}P = \int_{\Omega_X} g \, \mathrm{d}(X_* P)

We say X : (\Omega, \Sigma, P) \to (\Omega_X, \Sigma_X) has a density if the pushforward measure X_* P is absolutely continuous with respect to the Lebesgue measure \mu. In that case,

: \mathrm{d}(X_* P) = f \, \mathrm{d}\mu,

where f : \Omega_X \to \mathbb{R} is the density (see Radon–Nikodym derivative). So the above can be rewritten as the more familiar

: \operatorname{E}[g(X)] = \int_\Omega g \circ X \, \mathrm{d}P = \int_{\Omega_X} g(x) f(x) \, \mathrm{d}x.
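In computational terms, integrating over the base space (\Omega, P) amounts to averaging g(X) over draws of ''X'', while integrating over the image space uses the density ''f'' of the pushforward X_* P. The sketch below (a loose numerical analogy, not a derivation; the standard normal ''X'' and test function g(x) = x^2 are arbitrary assumptions) illustrates the equality of the two sides:

    import numpy as np
    from scipy.integrate import quad
    from scipy.stats import norm

    g = lambda x: x ** 2   # arbitrary measurable test function

    # Base-space side: a Monte Carlo average of g(X(omega)) over
    # draws omega ~ P, i.e. over samples of X itself.
    rng = np.random.default_rng(0)
    base_side = g(rng.standard_normal(1_000_000)).mean()

    # Image-space side: integrate g against the density f of the
    # pushforward measure X_*P (here the standard normal pdf).
    image_side, _ = quad(lambda x: g(x) * norm.pdf(x), -np.inf, np.inf)

    print(base_side, image_side)   # both ≈ 1.0, since E[X^2] = Var(X) = 1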

