In probability theory, the expected value (also called expectation, expectancy, expectation operator, mathematical expectation, mean, expectation value, or first moment) is a generalization of the weighted average. Informally, the expected value is the mean of the possible values a random variable can take, weighted by the probability of those outcomes. Since it is obtained through arithmetic, the expected value sometimes may not even be included in the sample data set; it is not the value you would expect to get in reality.

The expected value of a random variable with a finite number of outcomes is a weighted average of all possible outcomes. In the case of a continuum of possible outcomes, the expectation is defined by integration. In the axiomatic foundation for probability provided by measure theory, the expectation is given by Lebesgue integration.

The expected value of a random variable $X$ is often denoted by $\operatorname{E}(X)$, $\operatorname{E}[X]$, or $\operatorname{E}X$, with $\operatorname{E}$ also often stylized as $\mathbb{E}$ or $E$.

History

The idea of the expected value originated in the middle of the 17th century from the study of the so-called problem of points, which seeks to divide the stakes ''in a fair way'' between two players who have to end their game before it is properly finished. This problem had been debated for centuries. Many conflicting proposals and solutions had been suggested over the years when it was posed to Blaise Pascal by French writer and amateur mathematician Chevalier de Méré in 1654. Méré claimed that this problem could not be solved and that it showed just how flawed mathematics was when it came to its application to the real world. Pascal, being a mathematician, was provoked and determined to solve the problem once and for all.

He began to discuss the problem in the famous series of letters to Pierre de Fermat. Soon enough, they both independently came up with a solution. They solved the problem in different computational ways, but their results were identical because their computations were based on the same fundamental principle. The principle is that the value of a future gain should be directly proportional to the chance of getting it. This principle seemed to have come naturally to both of them. They were very pleased by the fact that they had found essentially the same solution, and this in turn made them absolutely convinced that they had solved the problem conclusively; however, they did not publish their findings. They only informed a small circle of mutual scientific friends in Paris about it.

The Dutch mathematician Christiaan Huygens considered the problem of points in his book and presented a solution based on the same principle as the solutions of Pascal and Fermat. Huygens published his treatise on probability theory, "De ratiociniis in ludo aleæ", in 1657 (see Huygens (1657)), just after visiting Paris. The book extended the concept of expectation by adding rules for how to calculate expectations in more complicated situations than the original problem (e.g., for three or more players), and can be seen as the first successful attempt at laying down the foundations of the theory of probability.

In the foreword to his treatise, Huygens wrote:

In the mid-nineteenth century, Pafnuty Chebyshev became the first person to think systematically in terms of the expectations of random variables.

Etymology

Neither Pascal nor Huygens used the term "expectation" in its modern sense. In particular, Huygens writes:

More than a hundred years later, in 1814, Pierre-Simon Laplace published his tract "Théorie analytique des probabilités", where the concept of expected value was defined explicitly:

Notations

The use of the letter $\operatorname{E}$ to denote "expected value" goes back to W. A. Whitworth in 1901. The symbol has since become popular for English writers. In German, $\operatorname{E}$ stands for ''Erwartungswert'', in Spanish for ''esperanza matemática'', and in French for ''espérance mathématique''.

When "E" is used to denote "expected value", authors use a variety of stylizations: the expectation operator can be stylized as $\operatorname{E}$ (upright), $E$ (italic), or $\mathbb{E}$ (in blackboard bold), while a variety of bracket notations (such as $\operatorname{E}(X)$, $\operatorname{E}[X]$, and $\operatorname{E}X$) are all used.

Another popular notation is $\mu_X$. The notations $\langle X \rangle$, $\langle X \rangle_{\mathrm{av}}$, and $\overline{X}$ are commonly used in physics. $\operatorname{M}(X)$ is used in Russian-language literature.

Definition

As discussed above, there are several context-dependent ways of defining the expected value. The simplest and original definition deals with the case of finitely many possible outcomes, such as in the flip of a coin. With the theory of infinite series, this can be extended to the case of countably many possible outcomes. It is also very common to consider the distinct case of random variables dictated by (piecewise-)continuous probability density functions, as these arise in many natural contexts. All of these specific definitions may be viewed as special cases of the general definition based upon the mathematical tools of measure theory and Lebesgue integration, which provide these different contexts with an axiomatic foundation and common language.

Any definition of expected value may be extended to define an expected value of a multidimensional random variable, i.e. a random vector $X$. It is defined component by component, as $\operatorname{E}[X]_i = \operatorname{E}[X_i]$. Similarly, one may define the expected value of a random matrix $X$ with components $X_{ij}$ by $\operatorname{E}[X]_{ij} = \operatorname{E}[X_{ij}]$.

Random variables with finitely many outcomes

Consider a random variable $X$ with a ''finite'' list $x_1, \ldots, x_k$ of possible outcomes, each of which (respectively) has probability $p_1, \ldots, p_k$ of occurring. The expectation of $X$ is defined as

$$\operatorname{E}[X] = x_1p_1 + x_2p_2 + \cdots + x_kp_k.$$

Since the probabilities must satisfy $p_1 + \cdots + p_k = 1$, it is natural to interpret $\operatorname{E}[X]$ as a weighted average of the $x_i$ values, with weights given by their probabilities $p_i$. In the special case that all possible outcomes are equiprobable (that is, $p_1 = \cdots = p_k$), the weighted average is given by the standard average. In the general case, the expected value takes into account the fact that some outcomes are more likely than others.
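The finite definition translates directly into a few lines of code. The following is a minimal sketch, not a reference implementation: the function name `expected_value` and the loaded-coin payoffs are hypothetical, and exact `Fraction` weights are assumed so that the check that the probabilities sum to 1 is not disturbed by rounding.

```python
from fractions import Fraction

def expected_value(outcomes, probabilities):
    """Weighted average x_1*p_1 + ... + x_k*p_k of a finite list of outcomes."""
    if len(outcomes) != len(probabilities):
        raise ValueError("need exactly one probability per outcome")
    # Exact comparison is safe here only because Fractions are assumed.
    if sum(probabilities) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in zip(outcomes, probabilities))

# Hypothetical example: a loaded coin paying 10 on heads and -5 on tails.
payoffs = [10, -5]
weights = [Fraction(1, 4), Fraction(3, 4)]
loaded_coin_mean = expected_value(payoffs, weights)
print(loaded_coin_mean)  # 10/4 - 15/4 = -5/4

# Equiprobable case: the weighted average reduces to the ordinary average.
values = [2, 3, 7]
uniform = [Fraction(1, len(values))] * len(values)
uniform_mean = expected_value(values, uniform)
print(uniform_mean == Fraction(sum(values), len(values)))  # True
```

With floating-point weights the sum-to-one check would need a tolerance instead of exact equality.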

Examples

* Let $X$ represent the outcome of a roll of a fair six-sided die. More specifically, $X$ will be the number of pips showing on the top face of the die after the toss. The possible values for $X$ are 1, 2, 3, 4, 5, and 6, all of which are equally likely with a probability of $\tfrac{1}{6}$. The expectation of $X$ is

$$\operatorname{E}[X] = 1 \cdot \frac{1}{6} + 2 \cdot \frac{1}{6} + 3\cdot\frac{1}{6} + 4\cdot\frac{1}{6} + 5\cdot\frac{1}{6} + 6\cdot\frac{1}{6} = 3.5.$$

If one rolls the die $n$ times and computes the average (arithmetic mean) of the results, then as $n$ grows, the average will almost surely converge to the expected value, a fact known as the strong law of large numbers.

* The roulette game consists of a small ball and a wheel with 38 numbered pockets around the edge. As the wheel is spun, the ball bounces around randomly until it settles down in one of the pockets. Suppose random variable $X$ represents the (monetary) outcome of a \$1 bet on a single number ("straight up" bet). If the bet wins (which happens with probability $\tfrac{1}{38}$ in American roulette), the payoff is \$35; otherwise the player loses the bet. The expected profit from such a bet will be

$$\operatorname{E}[\,\text{gain from }\$1\text{ bet}\,] = -\$1 \cdot \frac{37}{38} + \$35 \cdot \frac{1}{38} = -\$\frac{1}{19}.$$

That is, the expected value to be won from a \$1 bet is $-\$\tfrac{1}{19}$. Thus, in 190 bets, the net loss will probably be about \$10.
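Both examples can be checked with exact rational arithmetic. This is only an illustrative sketch; the variable names are made up for the purpose, and the step from one bet to 190 bets assumes that expectations of repeated bets simply add.

```python
from fractions import Fraction

# Fair six-sided die: each face 1..6 has probability 1/6.
die_mean = sum(face * Fraction(1, 6) for face in range(1, 7))

# American roulette, $1 straight-up bet: 38 pockets, one of them pays $35,
# the other 37 lose the $1 stake.
roulette_mean = -1 * Fraction(37, 38) + 35 * Fraction(1, 38)

# Expected net result of 190 such bets (expectations of the bets add up).
net_after_190_bets = 190 * roulette_mean

print(die_mean, roulette_mean, net_after_190_bets)  # 7/2 -1/19 -10
```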

Random variables with countably infinitely many outcomes

Informally, the expectation of a random variable with a countably infinite set of possible outcomes is defined analogously as the weighted average of all possible outcomes, where the weights are given by the probabilities of realizing each given value. This is to say that

$$\operatorname{E}[X] = \sum_{i=1}^\infty x_i\, p_i,$$

where $x_1, x_2, \ldots$ are the possible outcomes of the random variable $X$ and $p_1, p_2, \ldots$ are their corresponding probabilities. In many non-mathematical textbooks, this is presented as the full definition of expected values in this context.

However, there are some subtleties with infinite summation, so the above formula is not suitable as a mathematical definition. In particular, the Riemann series theorem of mathematical analysis illustrates that the value of certain infinite sums involving positive and negative summands depends on the order in which the summands are given. Since the outcomes of a random variable have no naturally given order, this creates a difficulty in defining expected value precisely.

For this reason, many mathematical textbooks only consider the case that the infinite sum given above converges absolutely, which implies that the infinite sum is a finite number independent of the ordering of summands. In the alternative case that the infinite sum does not converge absolutely, one says the random variable ''does not have finite expectation.''
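The order dependence can be seen numerically. The sketch below assumes a hypothetical random variable with outcomes $x_i = (-1)^{i+1} 2^i / i$ and probabilities $p_i = 2^{-i}$, so that the summands $x_i p_i$ form the alternating harmonic series; this choice is made here for illustration and is not taken from the text. Summed in the natural order the series approaches $\ln 2$, while the same terms taken as two positive summands followed by one negative approach $\tfrac{3}{2}\ln 2$.

```python
import math

def natural_order_sum(n_terms):
    """Partial sum of 1 - 1/2 + 1/3 - 1/4 + ... in its usual order."""
    return sum((-1) ** (k + 1) / k for k in range(1, n_terms + 1))

def rearranged_sum(n_groups):
    """Same summands, reordered as two positive terms then one negative term."""
    total = 0.0
    for j in range(1, n_groups + 1):
        # j-th group: 1/(4j-3) + 1/(4j-1) - 1/(2j)
        total += 1 / (4 * j - 3) + 1 / (4 * j - 1) - 1 / (2 * j)
    return total

natural = natural_order_sum(200_000)
rearranged = rearranged_sum(100_000)

print(natural, math.log(2))           # both close to 0.6931
print(rearranged, 1.5 * math.log(2))  # both close to 1.0397
```

Because the two orderings give different limits, neither value can be called "the" expectation, which is why absolute convergence is required.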

Examples

* Suppose $x_i = i$ and $p_i = \tfrac{c}{i \cdot 2^i}$ for $i = 1, 2, 3, \ldots,$ where $c = \tfrac{1}{\ln 2}$ is the scaling factor which makes the probabilities sum to 1. Then we have

$$\operatorname{E}[X] \,= \sum_i x_i p_i = 1(\tfrac{c}{2}) + 2(\tfrac{c}{8}) + 3 (\tfrac{c}{24}) + \cdots \,= \, \tfrac{c}{2} + \tfrac{c}{4} + \tfrac{c}{8} + \cdots \,=\, c \,=\, \tfrac{1}{\ln 2}.$$
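A quick numerical check of this example, as a sketch only: the series is truncated at 60 terms (an arbitrary cut-off chosen here), after which the neglected tail of the expectation is on the order of $2^{-60}$.

```python
import math

c = 1 / math.log(2)  # scaling factor from the example

def p(i):
    """Probability of the outcome x_i = i."""
    return c / (i * 2 ** i)

n_terms = 60  # truncation point chosen for illustration
total_probability = sum(p(i) for i in range(1, n_terms + 1))
partial_expectation = sum(i * p(i) for i in range(1, n_terms + 1))

print(total_probability)        # close to 1
print(partial_expectation, c)   # both close to 1.4427
```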

Random variables with density

Now consider a random variable $X$ which has a probability density function given by a function $f$ on the real number line. This means that the probability of $X$ taking on a value in any given open interval is given by the integral of $f$ over that interval. The expectation of $X$ is then given by the integral

$$\operatorname{E}[X] = \int_{-\infty}^\infty x f(x)\, dx.$$

A general and mathematically precise formulation of this definition uses measure theory and Lebesgue integration, and the corresponding theory of ''absolutely continuous random variables'' is described in the next section. The density functions of many common distributions are piecewise continuous, and as such the theory is often developed in this restricted setting. For such functions, it is sufficient to only consider the standard Riemann integration. Sometimes ''continuous random variables'' are defined as those corresponding to this special class of densities, although the term is used differently by various authors.

Analogously to the countably-infinite case above, there are subtleties with this expression due to the infinite region of integration. Such subtleties can be seen concretely if the distribution of $X$ is given by the Cauchy distribution $\operatorname{Cauchy}(0, \pi)$, so that $f(x) = (x^2 + \pi^2)^{-1}$. It is straightforward to compute in this case that

$$\int_a^b xf(x)\,dx=\int_a^b \frac{x}{x^2+\pi^2}\,dx=\frac{1}{2}\ln\frac{b^2+\pi^2}{a^2+\pi^2}.$$

The limit of this expression as $a \to -\infty$ and $b \to \infty$ does not exist: if the limits are taken so that $a = -b$, then the limit is zero, while if the constraint $2a = -b$ is taken, then the limit is $\ln(2)$.

To avoid such ambiguities, in mathematical textbooks it is common to require that the given integral converges absolutely, with $\operatorname{E}[X]$ left undefined otherwise. However, measure-theoretic notions as given below can be used to give a systematic definition of $\operatorname{E}[X]$ for more general random variables $X$.
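The dependence on how the limits are taken can be made concrete by evaluating the closed form of the truncated integral along the two paths mentioned above. This is a small sketch; the function name `truncated_integral` and the sample values of $b$ are chosen here for illustration.

```python
import math

def truncated_integral(a, b):
    """Closed form of the integral of x / (x**2 + pi**2) over [a, b]."""
    return 0.5 * math.log((b ** 2 + math.pi ** 2) / (a ** 2 + math.pi ** 2))

for b in (1e2, 1e4, 1e6):
    symmetric = truncated_integral(-b, b)      # path a = -b
    lopsided = truncated_integral(-b / 2, b)   # path 2a = -b
    print(b, symmetric, lopsided)

# The symmetric path stays at 0, the lopsided path approaches ln 2.
print(math.log(2))
```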

Arbitrary real-valued random variables

All definitions of the expected value may be expressed in the language of measure theory. In general, if $X$ is a real-valued random variable defined on a probability space $(\Omega, \Sigma, \operatorname{P})$, then the expected value of $X$, denoted by $\operatorname{E}[X]$, is defined as the Lebesgue integral

$$\operatorname{E}[X] = \int_\Omega X\,d\operatorname{P}.$$

Despite the newly abstract situation, this definition is extremely similar in nature to the very simplest definition of expected values, given above, as certain weighted averages. This is because, in measure theory, the value of the Lebesgue integral of $X$ is defined via weighted averages of ''approximations'' of $X$ which take on finitely many values. Moreover, if given a random variable with finitely or countably many possible values, the Lebesgue theory of expectation is identical to the summation formulas given above. However, the Lebesgue theory clarifies the scope of the theory of probability density functions. A random variable $X$ is said to be ''absolutely continuous'' if any of the following conditions are satisfied:

* there is a nonnegative measurable function $f$ on the real line such that

$$\operatorname{P}(X \in A) = \int_A f(x) \, dx,$$

for any Borel set $A$, in which the integral is Lebesgue.
* the cumulative distribution function of $X$ is absolutely continuous.
* for any Borel set $A$ of real numbers with Lebesgue measure equal to zero, the probability of $X$ being valued in $A$ is also equal to zero.
* for any positive number $\varepsilon$ there is a positive number $\delta$ such that: if $A$ is a Borel set with Lebesgue measure less than $\delta$, then the probability of $X$ being valued in $A$ is less than $\varepsilon$.

These conditions are all equivalent, although this is nontrivial to establish. In this definition, $f$ is called the ''probability density function'' of $X$ (relative to Lebesgue measure). According to the change-of-variables formula for Lebesgue integration, combined with the law of the unconscious statistician, it follows that

$$\operatorname{E}[X] \equiv \int_\Omega X\,d\operatorname{P} = \int_\Reals x f(x)\, dx$$

for any absolutely continuous random variable $X$. The above discussion of continuous random variables is thus a special case of the general Lebesgue theory, due to the fact that every piecewise-continuous function is measurable.
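The idea of approximating a random variable by ones taking finitely many values can be sketched for a simple case. The example below assumes $U$ uniform on $[0, 1)$ and the dyadic approximations $X_n = \lfloor 2^n U \rfloor / 2^n$; both are choices made here for illustration. Each $X_n$ takes the $2^n$ values $k/2^n$ with equal probability, so its expectation is a finite weighted average, and these averages increase towards $\operatorname{E}[U] = \tfrac{1}{2}$.

```python
from fractions import Fraction

def dyadic_approximation_mean(n):
    """E[X_n] for X_n = floor(2**n * U) / 2**n with U uniform on [0, 1)."""
    cells = 2 ** n
    weight = Fraction(1, cells)  # P(U falls into one dyadic cell)
    return sum(Fraction(k, cells) * weight for k in range(cells))

approximations = [dyadic_approximation_mean(n) for n in range(1, 11)]
gaps = [Fraction(1, 2) - m for m in approximations]

# The gap to 1/2 halves at every refinement step: 1/4, 1/8, 1/16, ...
print([str(g) for g in gaps[:4]])
```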


The expected value of any real-valued random variable $X$ can also be defined on the graph of its cumulative distribution function $F$ by a nearby equality of areas. In fact, $\operatorname{E}[X] = \mu$ with a real number $\mu$ if and only if the two surfaces in the $x$-$y$-plane, described by

$$x \le \mu, \;\, 0\le y \le F(x) \quad\text{or}\quad x \ge \mu, \;\, F(x) \le y \le 1$$

respectively, have the same finite area, i.e. if

$$\int_{-\infty}^\mu F(x)\,dx = \int_\mu^\infty \big(1 - F(x)\big)\,dx$$

and both improper Riemann integrals converge. Finally, this is equivalent to the representation

$$\operatorname{E}[X] = \int_0^\infty \bigl(1 - F(x)\bigr) \, dx - \int_{-\infty}^0 F(x) \, dx,$$

also with convergent integrals (Uhl 2023, pp. 2–4).
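The last representation can be tested numerically for a distribution whose mean is known. The sketch below assumes a normal distribution with mean 1.5 and standard deviation 2 (parameters picked arbitrarily here), uses the standard expression of the normal CDF through `math.erf`, and replaces the improper integrals by midpoint sums over $[-60, 60]$, where the neglected tails are negligible.

```python
import math

def normal_cdf(x, mu, sigma):
    """CDF of the normal distribution with mean mu and std. deviation sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def midpoint(f, a, b, n):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

def mean_from_cdf(cdf, lower=-60.0, upper=60.0, steps=60_000):
    """Approximate int_0^upper (1 - F) dx  -  int_lower^0 F dx."""
    right = midpoint(lambda x: 1.0 - cdf(x), 0.0, upper, steps)
    left = midpoint(cdf, lower, 0.0, steps)
    return right - left

estimate = mean_from_cdf(lambda x: normal_cdf(x, mu=1.5, sigma=2.0))
print(estimate)  # close to 1.5
```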

Infinite expected values

Expected values as defined above are automatically finite numbers. However, in many cases it is fundamental to be able to consider expected values of $\pm\infty$. This is intuitive, for example, in the case of the St. Petersburg paradox, in which one considers a random variable with possible outcomes $x_i = 2^i$, with associated probabilities $p_i = 2^{-i}$, for $i$ ranging over all positive integers. According to the summation formula in the case of random variables with countably many outcomes, one has

$$\operatorname{E}[X]= \sum_{i=1}^\infty x_i\,p_i = 2\cdot \frac{1}{2}+4\cdot\frac{1}{4} + 8\cdot\frac{1}{8}+ 16\cdot\frac{1}{16}+ \cdots = 1 + 1 + 1 + 1 + \cdots.$$

It is natural to say that the expected value equals $+\infty$.

There is a rigorous mathematical theory underlying such ideas, which is often taken as part of the definition of the Lebesgue integral. The first fundamental observation is that, whichever of the above definitions are followed, any ''nonnegative'' random variable whatsoever can be given an unambiguous expected value; whenever absolute convergence fails, then the expected value can be defined as $+\infty$. The second fundamental observation is that any random variable can be written as the difference of two nonnegative random variables. Given a random variable $X$, one defines the positive and negative parts by $X^+ = \max(X, 0)$ and $X^- = -\min(X, 0)$. These are nonnegative random variables, and it can be directly checked that $X = X^+ - X^-$. Since $\operatorname{E}[X^+]$ and $\operatorname{E}[X^-]$ are both then defined as either nonnegative numbers or $+\infty$, it is then natural to define:

$$\operatorname{E}[X] = \begin{cases}
\operatorname{E}[X^+] - \operatorname{E}[X^-] & \text{if } \operatorname{E}[X^+] < \infty \text{ and } \operatorname{E}[X^-] < \infty;\\
+\infty & \text{if } \operatorname{E}[X^+] = \infty \text{ and } \operatorname{E}[X^-] < \infty;\\
-\infty & \text{if } \operatorname{E}[X^+] < \infty \text{ and } \operatorname{E}[X^-] = \infty;\\
\text{undefined} & \text{if } \operatorname{E}[X^+] = \infty \text{ and } \operatorname{E}[X^-] = \infty.
\end{cases}$$

According to this definition, $\operatorname{E}[X]$ exists and is finite if and only if $\operatorname{E}[X^+]$ and $\operatorname{E}[X^-]$ are both finite. Due to the formula $|X| = X^+ + X^-$, this is the case if and only if $\operatorname{E}|X|$ is finite, and this is equivalent to the absolute convergence conditions in the definitions above. As such, the present considerations do not define finite expected values in any cases not previously considered; they are only useful for infinite expectations.

* In the case of the St. Petersburg paradox, one has $X^- = 0$ and so $\operatorname{E}[X] = +\infty$ as desired.
* Suppose the random variable $X$ takes values $1, -2, 3, -4, \ldots$ with respective probabilities $6\pi^{-2}, 6(2\pi)^{-2}, 6(3\pi)^{-2}, 6(4\pi)^{-2}, \ldots$. Then it follows that $X^+$ takes value $2k-1$ with probability $6((2k-1)\pi)^{-2}$ for each positive integer $k$, and takes value $0$ with remaining probability. Similarly, $X^-$ takes value $2k$ with probability $6(2k\pi)^{-2}$ for each positive integer $k$ and takes value $0$ with remaining probability. Using the definition for non-negative random variables, one can show that both $\operatorname{E}[X^+] = \infty$ and $\operatorname{E}[X^-] = \infty$ (see Harmonic series). Hence, in this case the expectation of $X$ is undefined.
* Similarly, the Cauchy distribution, as discussed above, has undefined expectation.
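The divergence in the first two examples can be observed in the partial sums. This is an illustrative sketch with function names invented here: the St. Petersburg partial sums grow linearly in the number of terms, while the partial sums for $\operatorname{E}[X^+]$ and $\operatorname{E}[X^-]$ in the second example grow without bound, only logarithmically.

```python
import math

def st_petersburg_partial(n):
    """Sum of x_i * p_i for i = 1..n with x_i = 2**i and p_i = 2**-i."""
    return sum((2 ** i) * (2.0 ** -i) for i in range(1, n + 1))

def positive_part_partial(n):
    """Partial sum for E[X+]: value 2k-1 with probability 6 / ((2k-1)*pi)**2."""
    return sum((2 * k - 1) * 6 / ((2 * k - 1) * math.pi) ** 2
               for k in range(1, n + 1))

def negative_part_partial(n):
    """Partial sum for E[X-]: value 2k with probability 6 / (2k*pi)**2."""
    return sum((2 * k) * 6 / ((2 * k) * math.pi) ** 2
               for k in range(1, n + 1))

print(st_petersburg_partial(50))  # every summand equals 1, so this is 50.0
for n in (10 ** 2, 10 ** 4, 10 ** 6):
    print(n, positive_part_partial(n), negative_part_partial(n))
```

Since both parts diverge, the difference $\operatorname{E}[X^+] - \operatorname{E}[X^-]$ has no meaning, matching the "undefined" case of the definition.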

Expected values of common distributions

The following table gives the expected values of some commonly occurring probability distributions. The third column gives the expected values both in the form immediately given by the definition, as well as in the simplified form obtained by computation therefrom. The details of these computations, which are not always straightforward, can be found in the indicated references.

Properties

The basic properties below replicate or follow immediately from those of the Lebesgue integral. Note that the letters "a.s." stand for "almost surely", a central property of the Lebesgue integral. Basically, one says that an inequality like $X \geq 0$ is true almost surely, when the probability measure attributes zero-mass to the complementary event $\left\{X < 0\right\}$.

* Non-negativity: If $X \geq 0$ (a.s.), then $\operatorname{E}[X] \geq 0$.
* Linearity of expectation: The expected value operator (or ''expectation operator'') $\operatorname{E}[\cdot]$ is linear in the sense that, for any random variables $X$ and $Y$, and a constant $a$,

$$\begin{aligned}
\operatorname{E}[X + Y] &= \operatorname{E}[X] + \operatorname{E}[Y], \\
\operatorname{E}[aX] &= a \operatorname{E}[X],
\end{aligned}$$

whenever the right-hand side is well-defined. By induction, this means that the expected value of the sum of any finite number of random variables is the sum of the expected values of the individual random variables, and the expected value scales linearly with a multiplicative constant. Symbolically, for $N$ random variables $X_i$ and constants $a_i$ $(1\leq i \leq N)$, we have

$$\operatorname{E}\left[\sum_{i=1}^{N}a_{i}X_{i}\right] = \sum_{i=1}^{N}a_{i}\operatorname{E}[X_{i}].$$

If we think of the set of random variables with finite expected value as forming a vector space, then the linearity of expectation implies that the expected value is a linear form on this vector space.
* Monotonicity: If $X\leq Y$ (a.s.), and both $\operatorname{E}[X]$ and $\operatorname{E}[Y]$ exist, then $\operatorname{E}[X]\leq\operatorname{E}[Y]$. The proof follows from linearity and the non-negativity property applied to $Z=Y-X$, since $Z\geq 0$ (a.s.).
* Non-degeneracy: If $\operatorname{E}[|X|] = 0$, then $X = 0$ (a.s.).
* If $X = Y$ (a.s.), then $\operatorname{E}[X] = \operatorname{E}[Y]$. In other words, if $X$ and $Y$ are random variables that take different values with probability zero, then the expectation of $X$ will equal the expectation of $Y$.
* If $X = c$ (a.s.) for some real number $c$, then $\operatorname{E}[X] = c$. In particular, for a random variable $X$ with well-defined expectation, $\operatorname{E}[\operatorname{E}[X]] = \operatorname{E}[X]$. A well-defined expectation is a single number, that is, a constant, so the expectation of this constant is just the original expected value.
* As a consequence of the formula $|X| = X^+ + X^-$ as discussed above, together with the triangle inequality, it follows that for any random variable $X$ with well-defined expectation, one has $|\operatorname{E}[X]| \leq \operatorname{E}|X|$.
* Let $\mathbf{1}_A$ denote the indicator function of an event $A$, then $\operatorname{E}[\mathbf{1}_A]$ is given by the probability of $A$. This is nothing but a different way of stating the expectation of a Bernoulli random variable, as calculated in the table above.
* Formulas in terms of CDF: If $F(x)$ is the cumulative distribution function of a random variable $X$, then

$$\operatorname{E}[X] = \int_{-\infty}^\infty x\,dF(x),$$

where the values on both sides are well defined or not well defined simultaneously, and the integral is taken in the sense of Lebesgue-Stieltjes. As a consequence of integration by parts as applied to this representation of $\operatorname{E}[X]$, it can be proved that

$$\operatorname{E}[X] = \int_0^\infty (1-F(x))\,dx - \int^0_{-\infty} F(x)\,dx,$$

with the integrals taken in the sense of Lebesgue. As a special case, for any random variable $X$ valued in the nonnegative integers $\{0, 1, 2, 3, \ldots\}$, one has

$$\operatorname{E}[X] = \sum _{n=0}^\infty \Pr(X>n),$$

where $\Pr$ denotes the underlying probability measure.
* Non-multiplicativity: In general, the expected value is not multiplicative, i.e. $\operatorname{E}[XY]$ is not necessarily equal to $\operatorname{E}[X]\cdot \operatorname{E}[Y]$. If $X$ and $Y$ are independent, then one can show that $\operatorname{E}[XY]=\operatorname{E}[X] \operatorname{E}[Y]$. If the random variables are dependent, then generally $\operatorname{E}[XY] \neq \operatorname{E}[X] \operatorname{E}[Y]$, although in special cases of dependency the equality may hold.
* Law of the unconscious statistician: The expected value of a measurable function of $X$, $g(X)$, given that $X$ has a probability density function $f(x)$, is given by the inner product of $f$ and $g$:

$$\operatorname{E}[g(X)] = \int_{\Reals} g(x) f(x)\, dx .$$

This formula also holds in the multidimensional case, when $g$ is a function of several random variables, and $f$ is their joint density.
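Several of the properties above can be verified exactly on a small discrete model. The sketch below assumes two independent fair dice, with $X$ the first die and $Y$ the second; the helper `E` and the dictionary `joint` are names introduced here. It uses the discrete analogue of the law of the unconscious statistician (summing $g$ against the joint probabilities) rather than the density form stated in the text.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
# Joint distribution of two independent fair dice: 36 equally likely pairs.
joint = {(a, b): Fraction(1, 36) for a, b in product(faces, faces)}

def E(g):
    """Expectation of g(X, Y): sum of g(a, b) times the probability of (a, b)."""
    return sum(g(a, b) * prob for (a, b), prob in joint.items())

mean_x = E(lambda a, b: a)
mean_y = E(lambda a, b: b)

linear = E(lambda a, b: 3 * a + b)            # linearity: 3*E[X] + E[Y]
independent_product = E(lambda a, b: a * b)   # X, Y independent
dependent_product = E(lambda a, b: a * a)     # X times itself (dependent)

# E[1_A] = P(A) combined with the tail-sum formula E[X] = sum_n P(X > n).
tail_sum = sum(E(lambda a, b, n=n: 1 if a > n else 0) for n in range(6))

print(linear, 3 * mean_x + mean_y)            # 14 14
print(independent_product, mean_x * mean_y)   # 49/4 49/4
print(dependent_product, mean_x * mean_x)     # 91/6 49/4
print(tail_sum, mean_x)                       # 7/2 7/2
```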

Inequalities

Concentration inequalities control the likelihood of a random variable taking on large values.

Markov's inequality In probability theory, Markov's inequality gives an upper bound on the probability that a non-negative random variable is greater than or equal to some positive Constant (mathematics), constant. Markov's inequality is tight in the sense that for e ...

is among the best-known and simplest to prove: for a ''nonnegative'' random variable and any positive number , it states that

\operatorname(X\geq a)\leq\frac.

If is any random variable with finite expectation, then Markov's inequality may be applied to the random variable to obtain Chebyshev's inequality

\geq a)\leq\frac,

where is the

variance. These inequalities are significant for their nearly complete lack of conditional assumptions. For example, for any random variable with finite expectation and variance, the Chebyshev inequality implies that there is at least a 75% probability of an outcome being within two standard deviations of the expected value. However, in special cases the Markov and Chebyshev inequalities often give much weaker information than is otherwise available. For example, in the case of a fair die, Chebyshev's inequality says that the probability of rolling between 1 and 6 is at least 53%; in reality, the probability is of course 100%. The Kolmogorov inequality extends the Chebyshev inequality to the context of sums of random variables.

The following three inequalities are of fundamental importance in the field of mathematical analysis and its applications to probability theory.

* Jensen's inequality: Let f: \mathbb{R} \to \mathbb{R} be a convex function and X a random variable with finite expectation. Then

f(\operatorname{E}[X]) \leq \operatorname{E}[f(X)].

Part of the assertion is that the negative part of f(X) has finite expectation, so that the right-hand side is well-defined (possibly infinite). Convexity of f can be phrased as saying that the output of the weighted average of ''two'' inputs under-estimates the same weighted average of the two outputs; Jensen's inequality extends this to the setting of completely general weighted averages, as represented by the expectation. In the special case that f(x) = |x|^{t/s} for positive numbers s < t, one obtains the Lyapunov inequality

\left(\operatorname{E}|X|^s\right)^{1/s} \leq \left(\operatorname{E}|X|^t\right)^{1/t}.

This can also be proved by the Hölder inequality. In measure theory, this is particularly notable for proving the inclusion L^t \subset L^s of L^p spaces, in the special case of probability spaces.

* Hölder's inequality: if p > 1 and q > 1 are numbers satisfying 1/p + 1/q = 1, then

\operatorname{E}|XY| \leq \left(\operatorname{E}|X|^p\right)^{1/p} \left(\operatorname{E}|Y|^q\right)^{1/q}

for any random variables X and Y. The special case of p = q = 2 is called the Cauchy–Schwarz inequality, and is particularly well-known.

* Minkowski inequality: given any number p \geq 1, for any random variables X and Y with \operatorname{E}|X|^p and \operatorname{E}|Y|^p both finite, it follows that \operatorname{E}|X+Y|^p is also finite and

\Bigl(\operatorname{E}|X+Y|^p\Bigr)^{1/p} \leq \Bigl(\operatorname{E}|X|^p\Bigr)^{1/p} + \Bigl(\operatorname{E}|Y|^p\Bigr)^{1/p}.

The Hölder and Minkowski inequalities can be extended to general measure spaces, and are often given in that context. By contrast, the Jensen inequality is special to the case of probability spaces.
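The fair-die figures quoted for Chebyshev's inequality can be reproduced with exact rational arithmetic. The following is a minimal sketch; the helper names `mean_and_variance` and `chebyshev_within` are illustrative choices, not part of any library. For the closed event |X − E[X]| ≤ a, the bound 1 − Var[X]/a² is obtained by applying Chebyshev's inequality with thresholds slightly larger than a and letting them decrease to a.

```python
from fractions import Fraction

def mean_and_variance(values, probs):
    """Exact mean and variance of a finite discrete distribution."""
    mean = sum(p * x for x, p in zip(values, probs))
    var = sum(p * (x - mean) ** 2 for x, p in zip(values, probs))
    return mean, var

def chebyshev_within(var, a_squared):
    """Chebyshev lower bound on P(|X - E[X]| <= a), given a**2."""
    return max(Fraction(0), 1 - var / a_squared)

faces = [Fraction(k) for k in range(1, 7)]
probs = [Fraction(1, 6)] * 6
mean, var = mean_and_variance(faces, probs)

# Every face lies within 5/2 of the mean 7/2.
half_width = Fraction(5, 2)
bound_all_faces = chebyshev_within(var, half_width ** 2)
exact_all_faces = sum(p for x, p in zip(faces, probs) if abs(x - mean) <= half_width)

# Two standard deviations: a**2 = 4 * Var[X], for any distribution with finite variance.
bound_two_sd = chebyshev_within(var, 4 * var)

print(mean, var)         # 7/2 35/12
print(bound_all_faces)   # 8/15
print(exact_all_faces)   # 1
print(bound_two_sd)      # 3/4
```

The bound 8/15 is about 53%, while the exact probability is 1, which illustrates how loose the inequality can be for a specific distribution.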

Expectations under convergence of random variables

In general, it is not the case that \operatorname{E}[X_n] \to \operatorname{E}[X] even if X_n \to X pointwise. Thus, one cannot interchange limits and expectation without additional conditions on the random variables. To see this, let U be a random variable distributed uniformly on [0,1]. For n \geq 1, define a sequence of random variables

X_n = n \cdot \mathbf{1}\left\{ U \in \left(0, \tfrac{1}{n}\right) \right\},

with \mathbf{1}\{A\} being the indicator function of the event A. Then, it follows that X_n \to 0 pointwise. But \operatorname{E}[X_n] = n \cdot \Pr\left(U \in \left(0, \tfrac{1}{n}\right)\right) = n \cdot \tfrac{1}{n} = 1 for each n. Hence,

\lim_{n \to \infty} \operatorname{E}[X_n] = 1 \neq 0 = \operatorname{E}\left[\lim_{n \to \infty} X_n\right].

Analogously, for a general sequence of random variables \{Y_n : n \geq 0\}, the expected value operator is not \sigma-additive, i.e.

\operatorname{E}\left[\sum^\infty_{n=0} Y_n\right] \neq \sum^\infty_{n=0} \operatorname{E}[Y_n].

An example is easily obtained by setting Y_0 = X_1 and Y_n = X_{n+1} - X_n for n \geq 1, where X_n is as in the previous example.
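The counterexample can be explored numerically. The sketch below assumes a fixed seed and a sample of 200,000 uniform draws (both arbitrary choices), and the function names are illustrative. It estimates E[X_n] by a sample mean for several n, and then follows a single outcome U = 0.37 to show that X_n is eventually 0 along that outcome.

```python
import random

def x_n(u, n):
    """X_n = n * 1{U in (0, 1/n)} evaluated at the draw u."""
    return n if 0 < u < 1 / n else 0

def sample_mean_x_n(n, draws):
    """Sample-mean estimate of E[X_n] from a list of uniform draws."""
    return sum(x_n(u, n) for u in draws) / len(draws)

rng = random.Random(0)  # fixed seed, chosen only for reproducibility
draws = [rng.random() for _ in range(200_000)]

# The estimates stay close to 1 for every n.
for n in (1, 10, 100):
    print(n, sample_mean_x_n(n, draws))

# Along one fixed outcome the sequence is eventually 0.
u = 0.37
path = [x_n(u, n) for n in range(1, 8)]
print(path)  # [1, 2, 0, 0, 0, 0, 0]
```

The sample means hover around 1 while every individual path dies out, which is the failure of interchange described above.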

A number of convergence results specify exact conditions which allow one to interchange limits and expectations, as specified below.
* Monotone convergence theorem: Let \{X_n : n \geq 0\} be a sequence of random variables, with 0 \leq X_n \leq X_{n+1} (a.s.) for each n \geq 0. Furthermore, let X_n \to X pointwise. Then, the monotone convergence theorem states that

\lim_n \operatorname{E}[X_n] = \operatorname{E}[X].

Using the monotone convergence theorem, one can show that expectation indeed satisfies countable additivity for non-negative random variables. In particular, let \{X_i\}_{i=0}^\infty be non-negative random variables. It follows from the monotone convergence theorem that

\operatorname{E}\left[\sum^\infty_{i=0} X_i\right] = \sum^\infty_{i=0} \operatorname{E}[X_i].

* Fatou's lemma: Let \{X_n \geq 0 : n \geq 0\} be a sequence of non-negative random variables. Fatou's lemma states that

\operatorname{E}[\liminf_n X_n] \leq \liminf_n \operatorname{E}[X_n].

Corollary. Let X_n \geq 0 with \operatorname{E}[X_n] \leq C for all n \geq 0. If X_n \to X (a.s.), then \operatorname{E}[X] \leq C.

The proof is by observing that X = \liminf_n X_n (a.s.) and applying Fatou's lemma.

* Dominated convergence theorem: Let \{X_n : n \geq 0\} be a sequence of random variables. If X_n \to X pointwise (a.s.), |X_n| \leq Y \leq +\infty (a.s.), and \operatorname{E}[Y] < \infty, then, according to the dominated convergence theorem,
** \operatorname{E}|X| \leq \operatorname{E}[Y] < \infty;
** \lim_n \operatorname{E}[X_n] = \operatorname{E}[X];
** \lim_n \operatorname{E}|X_n - X| = 0.

* Uniform integrability: In some cases, the equality \lim_n \operatorname{E}[X_n] = \operatorname{E}[\lim_n X_n] holds when the sequence \{X_n\} is ''uniformly integrable''.
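As an illustration of the monotone convergence theorem, consider a hypothetical example that is not taken from the text: U uniform on (0, 1) and X_n = min(U^{-1/2}, n). These variables are non-negative and non-decreasing in n, with pointwise limit X = U^{-1/2}, for which E[X] = 2. A direct calculation gives E[X_n] = 2 − 1/n. The sketch below approximates each expectation with a midpoint rule (an assumption about adequate accuracy, with an arbitrary grid size) and compares it with that closed form.

```python
def expect_uniform(g, cells=100_000):
    """Midpoint-rule approximation of E[g(U)] for U uniform on (0, 1)."""
    h = 1.0 / cells
    return h * sum(g((k + 0.5) * h) for k in range(cells))

def truncated(n):
    """X_n = min(U**-0.5, n): non-negative and non-decreasing in n."""
    return lambda u: min(u ** -0.5, n)

levels = (1, 2, 10, 100)
approx = [expect_uniform(truncated(n)) for n in levels]
closed_form = [2 - 1 / n for n in levels]

# The approximations increase with n and approach E[U**-0.5] = 2.
for n, a, c in zip(levels, approx, closed_form):
    print(n, round(a, 4), c)
```

Unlike the indicator counterexample, this sequence is monotone, so the expectations are guaranteed to converge to the expectation of the limit.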

Relationship with characteristic function

The probability density function f_X of a scalar random variable X is related to its characteristic function \varphi_X by the inversion formula:

f_X(x) = \frac{1}{2\pi} \int_\Reals e^{-itx} \varphi_X(t) \, dt.

For the expected value of g(X) (where g: \Reals \to \Reals is a Borel function), we can use this inversion formula to obtain

\operatorname{E}[g(X)] = \frac{1}{2\pi} \int_\Reals g(x) \left[ \int_\Reals e^{-itx} \varphi_X(t) \, dt \right] dx.

If \operatorname{E}[g(X)] is finite, changing the order of integration, we get, in accordance with the Fubini–Tonelli theorem,

\operatorname{E}[g(X)] = \frac{1}{2\pi} \int_\Reals G(t) \varphi_X(t) \, dt,

where

G(t) = \int_\Reals g(x) e^{-itx} \, dx

is the Fourier transform of g(x). The expression for \operatorname{E}[g(X)] also follows directly from the Plancherel theorem.
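The identity can be checked numerically in a simple case. The sketch below assumes X is standard normal, so that \varphi_X(t) = e^{-t^2/2}, and takes the illustrative choice g(x) = e^{-x^2/2}; both are assumptions made for this example only. The integrals are approximated by a midpoint rule on [−10, 10], which is adequate here because all integrands decay like Gaussians. Since g is even, only the cosine part of e^{-itx} contributes to G(t).

```python
import math

def integrate(func, lo=-10.0, hi=10.0, cells=400):
    """Midpoint rule on [lo, hi]."""
    h = (hi - lo) / cells
    return h * sum(func(lo + (k + 0.5) * h) for k in range(cells))

def g(x):
    return math.exp(-x * x / 2)

def density(x):
    """Standard normal density f_X."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def phi(t):
    """Characteristic function of the standard normal distribution."""
    return math.exp(-t * t / 2)

def G(t):
    """Fourier transform of g; the sine part vanishes because g is even."""
    return integrate(lambda x: g(x) * math.cos(t * x))

direct = integrate(lambda x: g(x) * density(x))                 # E[g(X)] from the density
via_phi = integrate(lambda t: G(t) * phi(t)) / (2 * math.pi)    # E[g(X)] from G and phi

print(direct, via_phi, 1 / math.sqrt(2))
```

Both routes agree with the exact value 1/\sqrt{2} for this choice of g and X.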

Uses and applications

The expectation of a random variable plays an important role in a variety of contexts.

In statistics, where one seeks estimates for unknown parameters based on available data gained from samples, the sample mean serves as an estimate for the expectation, and is itself a random variable. In such settings, the sample mean is considered to meet the desirable criterion for a "good" estimator in being unbiased; that is, the expected value of the estimate is equal to the true value of the underlying parameter.

For a different example, in decision theory, an agent making an optimal choice in the context of incomplete information is often assumed to maximize the expected value of their utility function.

It is possible to construct an expected value equal to the probability of an event by taking the expectation of an indicator function that is one if the event has occurred and zero otherwise. This relationship can be used to translate properties of expected values into properties of probabilities, e.g. using the law of large numbers to justify estimating probabilities by frequencies.

The expected values of the powers of ''X'' are called the moments of ''X''; the moments about the mean of ''X'' are expected values of powers of ''X'' − E[''X'']. The moments of some random variables can be used to specify their distributions, via their moment generating functions.

To empirically estimate the expected value of a random variable, one repeatedly measures observations of the variable and computes the arithmetic mean of the results. If the expected value exists, this procedure estimates the true expected value in an unbiased manner and has the property of minimizing the sum of the squares of the residuals (the sum of the squared differences between the observations and the estimate). The law of large numbers demonstrates (under fairly mild conditions) that, as the size of the sample gets larger, the variance of this estimate gets smaller.

This property is often exploited in a wide variety of applications, including general problems of statistical estimation and machine learning, to estimate (probabilistic) quantities of interest via Monte Carlo methods, since most quantities of interest can be written in terms of expectation, e.g.

\operatorname{P}(X \in \mathcal{A}) = \operatorname{E}[\mathbf{1}_{\mathcal{A}}(X)],

where \mathbf{1}_{\mathcal{A}} is the indicator function of the set \mathcal{A}.
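The indicator identity is what makes Monte Carlo estimation of probabilities work in practice. The following is a minimal sketch under assumptions chosen only for illustration: X is standard normal, the set \mathcal{A} is the interval [−1, 1], and the seed and sample sizes are arbitrary. The estimate is the sample mean of the indicator, compared against the exact value computed with the error function.

```python
import math
import random

def indicator(x, lo=-1.0, hi=1.0):
    """1_A(x) for the illustrative set A = [lo, hi]."""
    return 1.0 if lo <= x <= hi else 0.0

def monte_carlo_probability(sample_size, seed=0):
    """Estimate P(X in A) as the sample mean of 1_A(X) for X ~ N(0, 1)."""
    rng = random.Random(seed)
    total = sum(indicator(rng.gauss(0.0, 1.0)) for _ in range(sample_size))
    return total / sample_size

exact = math.erf(1 / math.sqrt(2))  # P(-1 <= X <= 1) for a standard normal X

for size in (100, 10_000, 200_000):
    print(size, monte_carlo_probability(size), exact)
```

As the sample size grows, the estimates tend to settle near the exact value, in line with the law of large numbers.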

In classical mechanics, the center of mass is an analogous concept to expectation. For example, suppose ''X'' is a discrete random variable with values ''x_i'' and corresponding probabilities ''p_i''. Now consider a weightless rod on which are placed weights, at locations ''x_i'' along the rod and having masses ''p_i'' (whose sum is one). The point at which the rod balances is E[''X''].

Expected values can also be used to compute the variance, by means of the computational formula for the variance

\operatorname{Var}(X) = \operatorname{E}[X^2] - (\operatorname{E}[X])^2.
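Both statements can be verified on a small example. The sketch below uses a hypothetical three-point distribution (locations 0, 1, 4 with masses 1/2, 1/4, 1/4), chosen only for illustration. It checks that the net torque about E[X] is zero, which is the balance condition for the rod, and that the computational formula agrees with the definition of the variance.

```python
from fractions import Fraction

# Hypothetical rod: locations x_i carrying masses p_i that sum to one.
locations = [Fraction(0), Fraction(1), Fraction(4)]
masses = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]

mean = sum(p * x for x, p in zip(locations, masses))

# Net torque about the point E[X]; zero means the rod balances there.
torque = sum(p * (x - mean) for x, p in zip(locations, masses))

second_moment = sum(p * x ** 2 for x, p in zip(locations, masses))
var_computational = second_moment - mean ** 2
var_definition = sum(p * (x - mean) ** 2 for x, p in zip(locations, masses))

print(mean)               # 5/4
print(torque)             # 0
print(var_computational)  # 43/16
print(var_definition)     # 43/16
```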

An important application of the expectation value is in the field of quantum mechanics. The expectation value of a quantum mechanical operator \hat{A} operating on a quantum state vector |\psi\rangle is written as

\langle\hat{A}\rangle = \langle\psi|\hat{A}|\psi\rangle.

The uncertainty in \hat{A} can be calculated by the formula

(\Delta A)^2 = \langle\hat{A}^2\rangle - \langle\hat{A}\rangle^2.
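For a finite-dimensional system these formulas reduce to matrix arithmetic. The sketch below is an illustration under assumptions not found in the text: a two-level system, the Pauli Z matrix as the operator, and the normalized state (0.6, 0.8i). The helper names are hypothetical and only plain Python lists and complex numbers are used.

```python
def mat_vec(matrix, vector):
    """Apply a matrix (list of rows) to a vector."""
    return [sum(a * v for a, v in zip(row, vector)) for row in matrix]

def inner(bra, ket):
    """<bra|ket>, conjugating the first argument."""
    return sum(b.conjugate() * k for b, k in zip(bra, ket))

def expectation(operator, state):
    """<psi|A|psi> for a normalized state; real when A is Hermitian."""
    return inner(state, mat_vec(operator, state)).real

def uncertainty_squared(operator, state):
    """(Delta A)^2 = <A^2> - <A>^2."""
    a_psi = mat_vec(operator, state)
    mean_of_square = inner(state, mat_vec(operator, a_psi)).real
    return mean_of_square - expectation(operator, state) ** 2

pauli_z = [[1, 0], [0, -1]]
psi = [0.6, 0.8j]  # normalized: 0.36 + 0.64 = 1

print(round(expectation(pauli_z, psi), 6))          # -0.28
print(round(uncertainty_squared(pauli_z, psi), 6))  # 0.9216
```

The structure mirrors the computational formula for the variance: the uncertainty is the variance of the measurement outcomes under the probabilities determined by the state.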

