In probability theory, the expected value (also called expectation, expectancy, mathematical expectation, mean, average, or first moment) is a generalization of the weighted average. Informally, the expected value is the arithmetic mean of a large number of independently selected outcomes of a random variable.
The expected value of a random variable with a finite number of outcomes is a weighted average of all possible outcomes. In the case of a continuum of possible outcomes, the expectation is defined by integration. In the axiomatic foundation for probability provided by measure theory, the expectation is given by Lebesgue integration.
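The expected value of a random variable <math>X</math> is often denoted by <math>\operatorname{E}(X)</math>, <math>\operatorname{E}[X]</math>, or <math>\operatorname{E}X</math>, with <math>\operatorname{E}</math> also often stylized as <math>E</math> or <math>\mathbb{E}</math>.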
History
The idea of the expected value originated in the middle of the 17th century from the study of the so-called problem of points, which seeks to divide the stakes ''in a fair way'' between two players who have to end their game before it is properly finished. This problem had been debated for centuries, and many conflicting proposals and solutions had been suggested by the time it was posed to Blaise Pascal by the French writer and amateur mathematician Chevalier de Méré in 1654. Méré claimed that this problem could not be solved and that it showed just how flawed mathematics was when it came to its application to the real world. Pascal, being a mathematician, was provoked and determined to solve the problem once and for all.
He began to discuss the problem in the famous series of letters to Pierre de Fermat. Soon enough, they both independently came up with a solution. They solved the problem in different computational ways, but their results were identical because their computations were based on the same fundamental principle. The principle is that the value of a future gain should be directly proportional to the chance of getting it. This principle seemed to have come naturally to both of them. They were very pleased by the fact that they had found essentially the same solution, and this in turn made them absolutely convinced that they had solved the problem conclusively; however, they did not publish their findings. They only informed a small circle of mutual scientific friends in Paris about it.
The Dutch mathematician Christiaan Huygens also considered the problem of points, and presented a solution based on the same principle as the solutions of Pascal and Fermat. Huygens published his treatise on probability theory, ''De ratiociniis in ludo aleæ'', in 1657 (see Huygens (1657)), just after visiting Paris. The book extended the concept of expectation by adding rules for how to calculate expectations in more complicated situations than the original problem (e.g., for three or more players), and can be seen as the first successful attempt at laying down the foundations of the theory of probability.
In the foreword to his treatise, Huygens wrote:
During his visit to France in 1655, Huygens learned about de Méré's problem. From his correspondence with Carcavi a year later (in 1656), he realized his method was essentially the same as Pascal's. Therefore, he knew about Pascal's priority in this subject before his book went to press in 1657.
In the mid-nineteenth century, Pafnuty Chebyshev became the first person to think systematically in terms of the expectations of random variables.
Etymology
Neither Pascal nor Huygens used the term "expectation" in its modern sense. In particular, Huygens writes:
More than a hundred years later, in 1814, Pierre-Simon Laplace published his tract ''Théorie analytique des probabilités'', where the concept of expected value was defined explicitly:
Notations
The use of the letter <math>\operatorname{E}</math> to denote expected value goes back to W. A. Whitworth in 1901. The symbol has become popular since then for English writers. In German, <math>\operatorname{E}</math> stands for "Erwartungswert", in Spanish for "Esperanza matemática", and in French for "Espérance mathématique".
When "E" is used to denote expected value, authors use a variety of stylizations: the expectation operator can be stylized as <math>\mathrm{E}</math> (upright), <math>E</math> (italic), or <math>\mathbb{E}</math> (in blackboard bold), while a variety of bracket notations (such as <math>\operatorname{E}(X)</math>, <math>\operatorname{E}[X]</math>, and <math>\operatorname{E}X</math>) are all used.
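Another popular notation is <math>\mu_X</math>, whereas <math>\langle X \rangle</math>, <math>\langle X \rangle_{\mathrm{av}}</math>, and <math>\overline{X}</math> are commonly used in physics, and <math>\mathrm{M}(X)</math> in Russian-language literature.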
Definition
As discussed below, there are several context-dependent ways of defining the expected value. The simplest and original definition deals with the case of finitely many possible outcomes, such as in the flip of a coin. With the theory of infinite series, this can be extended to the case of countably many possible outcomes. It is also very common to consider the distinct case of random variables dictated by (piecewise-)continuous probability density functions, as these arise in many natural contexts. All of these specific definitions may be viewed as special cases of the general definition based upon the mathematical tools of measure theory and Lebesgue integration, which provide these different contexts with an axiomatic foundation and common language.
Any definition of expected value may be extended to define an expected value of a multidimensional random variable, i.e. a random vector <math>X</math>. It is defined component by component, as <math>\operatorname{E}[X]_i = \operatorname{E}[X_i]</math>. Similarly, one may define the expected value of a random matrix <math>X</math> with components <math>X_{ij}</math> by <math>\operatorname{E}[X]_{ij} = \operatorname{E}[X_{ij}]</math>.
Random variables with finitely many outcomes
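Consider a random variable <math>X</math> with a ''finite'' list <math>x_1, \ldots, x_k</math> of possible outcomes, each of which (respectively) has probability <math>p_1, \ldots, p_k</math> of occurring. The expectation of <math>X</math> is defined as
: <math>\operatorname{E}[X] = x_1 p_1 + x_2 p_2 + \cdots + x_k p_k.</math>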
Since the probabilities must satisfy <math>p_1 + \cdots + p_k = 1</math>, it is natural to interpret <math>\operatorname{E}[X]</math> as a weighted average of the <math>x_i</math> values, with weights given by their probabilities <math>p_i</math>.
In the special case that all possible outcomes are equiprobable (that is, <math>p_1 = \cdots = p_k</math>), the weighted average is given by the standard average. In the general case, the expected value takes into account the fact that some outcomes are more likely than others.
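The finite definition can be illustrated with a minimal Python sketch. The function name <code>expected_value</code> and the sample outcomes are hypothetical and chosen only for illustration; exact fractions are used so that the check that the probabilities sum to 1 is not affected by rounding.

```python
from fractions import Fraction

def expected_value(outcomes, probabilities):
    """Weighted average sum(x_i * p_i) for a finite list of outcomes."""
    if len(outcomes) != len(probabilities):
        raise ValueError("outcomes and probabilities must have the same length")
    if sum(probabilities) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in zip(outcomes, probabilities))

# Unequal weights: a hypothetical three-outcome random variable.
skewed = expected_value(
    [0, 1, 10],
    [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)],
)

# Equal weights: the weighted average reduces to the ordinary mean.
values = [2, 3, 7]
uniform = expected_value(values, [Fraction(1, 3)] * 3)

print(skewed, uniform)
```

In the equiprobable case the result agrees with the sum of the values divided by their count, which is the sense in which the expected value generalizes the ordinary average.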
Examples
*Let <math>X</math> represent the outcome of a roll of a fair six-sided die. More specifically, <math>X</math> will be the number of pips showing on the top face of the die after the toss. The possible values for <math>X</math> are 1, 2, 3, 4, 5, and 6, all of which are equally likely with a probability of <math>\tfrac{1}{6}</math>. The expectation of <math>X</math> is
:: <math>\operatorname{E}[X] = 1 \cdot \frac{1}{6} + 2 \cdot \frac{1}{6} + 3 \cdot \frac{1}{6} + 4 \cdot \frac{1}{6} + 5 \cdot \frac{1}{6} + 6 \cdot \frac{1}{6} = 3.5.</math>
:If one rolls the die <math>n</math> times and computes the average (arithmetic mean) of the results, then as <math>n</math> grows, the average will almost surely converge to the expected value, a fact known as the strong law of large numbers.
*The roulette game consists of a small ball and a wheel with 38 numbered pockets around the edge. As the wheel is spun, the ball bounces around randomly until it settles down in one of the pockets. Suppose random variable <math>X</math> represents the (monetary) outcome of a $1 bet on a single number ("straight up" bet). If the bet wins (which happens with probability <math>\tfrac{1}{38}</math> in American roulette), the payoff is $35; otherwise the player loses the bet. The expected profit from such a bet will be
:: <math>\operatorname{E}[X] = -\$1 \cdot \frac{37}{38} + \$35 \cdot \frac{1}{38} = -\$\frac{1}{19}.</math>
:That is, the expected value to be won from a $1 bet is −$<math>\tfrac{1}{19}</math>. Thus, in 190 bets, the net loss will probably be about $10.
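Both examples can be checked with a short Python sketch. It assumes a fair die and an American wheel with 38 pockets, as in the text; the helper <code>sample_mean_of_rolls</code> and the fixed seed are arbitrary choices made for illustration, and the simulated averages only approximate the exact value.

```python
import random
from fractions import Fraction

# Fair six-sided die: outcomes 1..6, each with probability 1/6.
die_mean = sum(Fraction(face, 6) for face in range(1, 7))

# American roulette, $1 straight-up bet: win $35 with probability 1/38,
# lose the $1 stake with probability 37/38.
roulette_mean = 35 * Fraction(1, 38) + (-1) * Fraction(37, 38)

def sample_mean_of_rolls(n, seed=0):
    """Average of n simulated die rolls (the seed is an arbitrary choice)."""
    rng = random.Random(seed)
    return sum(rng.randint(1, 6) for _ in range(n)) / n

# The sample mean drifts towards the exact expectation as n grows.
for n in (10, 1_000, 100_000):
    print(n, sample_mean_of_rolls(n))

print("exact die mean:", float(die_mean))
print("expected net result of 190 bets:", float(190 * roulette_mean))
```

The simulation illustrates the law of large numbers only empirically; a different seed gives slightly different sample means.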
Random variables with countably many outcomes
Informally, the expectation of a random variable with a countable set of possible outcomes is defined analogously as the weighted average of all possible outcomes, where the weights are given by the probabilities of realizing each given value. This is to say that
: <math>\operatorname{E}[X] = \sum_{i=1}^\infty x_i p_i,</math>
where <math>x_1, x_2, \ldots</math> are the possible outcomes of the random variable <math>X</math> and <math>p_1, p_2, \ldots</math> are their corresponding probabilities. In many non-mathematical textbooks, this is presented as the full definition of expected values in this context.
However, there are some subtleties with infinite summation, so the above formula is not suitable as a mathematical definition. In particular, the Riemann series theorem of mathematical analysis illustrates that the value of certain infinite sums involving positive and negative summands depends on the order in which the summands are given. Since the outcomes of a random variable have no naturally given order, this creates a difficulty in defining expected value precisely.
For this reason, many mathematical textbooks only consider the case that the infinite sum given above converges absolutely, which implies that the infinite sum is a finite number independent of the ordering of summands. In the alternative case that the infinite sum does not converge absolutely, one says the random variable ''does not have finite expectation.''
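The order dependence can be seen numerically. The following Python sketch uses the alternating harmonic series, which converges but not absolutely; this series is an assumed illustration rather than an example from the text, and the function names are hypothetical. Summed in the usual order its partial sums approach <math>\ln 2</math>, while a rearrangement of the same terms approaches <math>\tfrac{1}{2} \ln 2</math>.

```python
import math

def alternating_harmonic(n_terms):
    """1 - 1/2 + 1/3 - 1/4 + ... summed in the usual order."""
    return sum((-1) ** (k + 1) / k for k in range(1, n_terms + 1))

def rearranged(n_blocks):
    """The same terms, but each positive term is followed by two negative
    ones: 1 - 1/2 - 1/4 + 1/3 - 1/6 - 1/8 + ..."""
    total = 0.0
    for k in range(1, n_blocks + 1):
        total += 1 / (2 * k - 1) - 1 / (4 * k - 2) - 1 / (4 * k)
    return total

print(alternating_harmonic(100_000), math.log(2))
print(rearranged(100_000), math.log(2) / 2)
```

If the terms were read as products <math>x_i p_i</math>, the two orderings would suggest two different "expected values", which is why absolute convergence is required.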
Examples
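*Suppose <math>x_i = i</math> and <math>p_i = \tfrac{c}{i \cdot 2^i}</math> for <math>i = 1, 2, 3, \ldots,</math> where <math>c = \tfrac{1}{\ln 2}</math> is the scaling factor which makes the probabilities sum to 1. Then, using the direct definition for non-negative random variables, we have
:: <math>\operatorname{E}[X] = \sum_i x_i p_i = 1 \cdot \frac{c}{2} + 2 \cdot \frac{c}{8} + 3 \cdot \frac{c}{24} + \cdots = \frac{c}{2} + \frac{c}{4} + \frac{c}{8} + \cdots = c = \frac{1}{\ln 2}.</math>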
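As a numerical sanity check, the following Python sketch assumes outcomes <math>x_i = i</math> with probabilities <math>p_i = c / (i \cdot 2^i)</math> and <math>c = 1/\ln 2</math>, and truncates both series at a hypothetical cut-off <code>N</code>. Under these assumptions the probabilities sum to 1 and the expectation equals <math>c</math>.

```python
import math

c = 1 / math.log(2)  # scaling factor, assumed to be 1/ln 2

def p(i):
    """Probability of the outcome x_i = i."""
    return c / (i * 2 ** i)

N = 60  # truncation point; the neglected tail is far below double precision
total_probability = sum(p(i) for i in range(1, N + 1))
expectation = sum(i * p(i) for i in range(1, N + 1))

print(total_probability)   # close to 1
print(expectation, c)      # both close to 1/ln 2
```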
Random variables with density
Now consider a random variable <math>X</math> which has a probability density function given by a function <math>f</math> on the real number line. This means that the probability of <math>X</math> taking on a value in any given open interval is given by the integral of <math>f</math> over that interval. The expectation of <math>X</math> is then given by the integral
: <math>\operatorname{E}[X] = \int_{-\infty}^\infty x f(x)\, dx.</math>
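The integral of <math>x f(x)</math> can be approximated numerically. The Python sketch below assumes, purely for illustration, an exponential density <math>f(x) = \lambda e^{-\lambda x}</math> on <math>x \geq 0</math> with <math>\lambda = 2</math>, whose expectation is <math>1/\lambda</math>; the midpoint rule, the cut-off <code>upper</code>, and the step count are arbitrary choices, not part of the definition.

```python
import math

RATE = 2.0  # assumed rate parameter of the exponential density

def density(x):
    """Exponential density; zero for negative x."""
    return RATE * math.exp(-RATE * x) if x >= 0 else 0.0

def expectation_midpoint(f, lower, upper, steps):
    """Midpoint-rule approximation of the integral of x * f(x)."""
    h = (upper - lower) / steps
    total = 0.0
    for k in range(steps):
        x = lower + (k + 0.5) * h
        total += x * f(x)
    return total * h

# The density is negligible beyond x = 40, so the range is truncated there.
approx = expectation_midpoint(density, 0.0, 40.0, 200_000)
print(approx, 1 / RATE)
```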
A general and mathematically precise formulation of this definition uses measure theory and Lebesgue integration, and the corresponding theory of ''absolutely continuous random variables'' is described in the next section. The density functions of many common distributions are piecewise continuous, and as such the theory is often developed in this restricted setting. For such functions, it is sufficient to only consider the standard Riemann integration. Sometimes ''continuous random variables'' are defined as those corresponding to this special class of densities, although the term is used differently by various authors.
Analogously to the countably-infinite case above, there are subtleties with this expression due to the infinite region of integration. Such subtleties can be seen concretely if the distribution of <math>X</math> is given by the Cauchy distribution <math>\operatorname{Cauchy}(0, \pi)</math>, so that <math>f(x) = (x^2 + \pi^2)^{-1}</math>. It is straightforward to compute in this case that
: <math>\int_a^b x f(x)\, dx = \int_a^b \frac{x}{x^2 + \pi^2}\, dx = \frac{1}{2} \ln \frac{b^2 + \pi^2}{a^2 + \pi^2}.</math>
The limit of this expression as <math>a \to -\infty</math> and <math>b \to \infty</math> does not exist: if the limits are taken so that <math>a = -b</math>, then the limit is zero, while if the constraint <math>2a = -b</math> is taken, then the limit is <math>\ln 2</math>.
To avoid such ambiguities, in mathematical textbooks it is common to require that the given integral converges absolutely, with <math>\operatorname{E}[X]</math> left undefined otherwise. However, measure-theoretic notions as given below can be used to give a systematic definition of <math>\operatorname{E}[X]</math> for more general random variables <math>X</math>.
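The dependence on how the limits are taken can be tabulated directly. The Python sketch below assumes the Cauchy density <math>f(x) = (x^2 + \pi^2)^{-1}</math> (location 0, scale <math>\pi</math>) and evaluates the closed form of the truncated integral of <math>x f(x)</math> over <math>[a, b]</math>; the function name is hypothetical.

```python
import math

def truncated_integral(a, b):
    """Closed form of the integral of x / (x**2 + pi**2) over [a, b]."""
    return 0.5 * math.log((b * b + math.pi ** 2) / (a * a + math.pi ** 2))

for b in (1e2, 1e4, 1e6):
    symmetric = truncated_integral(-b, b)      # limits tied by a = -b
    lopsided = truncated_integral(-b / 2, b)   # limits tied by 2a = -b
    print(b, symmetric, lopsided, math.log(2))
```

The symmetric truncation stays at zero while the lopsided one approaches <math>\ln 2</math>, so no single value can be assigned to the improper integral.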
Arbitrary real-valued random variables
All definitions of the expected value may be expressed in the language of measure theory. In general, if <math>X</math> is a real-valued random variable defined on a probability space <math>(\Omega, \Sigma, \operatorname{P})</math>, then the expected value of <math>X</math>, denoted by <math>\operatorname{E}[X]</math>, is defined as the Lebesgue integral
: <math>\operatorname{E}[X] = \int_\Omega X\, d\operatorname{P}.</math>
Despite the newly abstract situation, this definition is extremely similar in nature to the very simplest definition of expected values, given above, as certain weighted averages. This is because, in measure theory, the value of the Lebesgue integral of <math>X</math> is defined via weighted averages of ''approximations'' of <math>X</math> which take on finitely many values. Moreover, if given a random variable with finitely or countably many possible values, the Lebesgue theory of expectation is identical with the summation formulas given above. However, the Lebesgue theory clarifies the scope of the theory of probability density functions. A random variable <math>X</math> is said to be ''absolutely continuous'' if any of the following conditions are satisfied:
* there is a nonnegative measurable function <math>f</math> on the real line such that
:: <math>\operatorname{P}(X \in A) = \int_A f(x)\, dx,</math>
:for any Borel set <math>A</math>, in which the integral is Lebesgue.
* the cumulative distribution function of <math>X</math> is absolutely continuous.
* for any Borel set <math>A</math> of real numbers with Lebesgue measure equal to zero, the probability of <math>X</math> being valued in <math>A</math> is also equal to zero.
* for any positive number <math>\varepsilon</math> there is a positive number <math>\delta</math> such that: if <math>A</math> is a Borel set with Lebesgue measure less than <math>\delta</math>, then the probability of <math>X</math> being valued in <math>A</math> is less than <math>\varepsilon</math>.
These conditions are all equivalent, although this is nontrivial to establish. In this definition, <math>f</math> is called the ''probability density function'' of <math>X</math> (relative to Lebesgue measure). According to the change-of-variables formula for Lebesgue integration, combined with the law of the unconscious statistician, it follows that
: <math>\operatorname{E}[X] \equiv \int_\Omega X\, d\operatorname{P} = \int_{\mathbb{R}} x f(x)\, dx</math>
for any absolutely continuous random variable <math>X</math>. The above discussion of continuous random variables is thus a special case of the general Lebesgue theory, due to the fact that every piecewise-continuous function is measurable.
Infinite expected values
Expected values as defined above are automatically finite numbers. However, in many cases it is fundamental to be able to consider expected values of <math>\pm\infty</math>. This is intuitive, for example, in the case of the St. Petersburg paradox, in which one considers a random variable with possible outcomes <math>x_i = 2^i</math>, with associated probabilities <math>p_i = 2^{-i}</math>, for <math>i</math> ranging over all positive integers. According to the summation formula in the case of random variables with countably many outcomes, one has
: <math>\operatorname{E}[X] = \sum_{i=1}^\infty x_i p_i = 2 \cdot \frac{1}{2} + 4 \cdot \frac{1}{4} + 8 \cdot \frac{1}{8} + \cdots = 1 + 1 + 1 + \cdots.</math>
It is natural to say that the expected value equals <math>+\infty</math>.
There is a rigorous mathematical theory underlying such ideas, which is often taken as part of the definition of the Lebesgue integral. The first fundamental observation is that, whichever of the above definitions is followed, any ''nonnegative'' random variable whatsoever can be given an unambiguous expected value; whenever absolute convergence fails, then the expected value can be defined as <math>+\infty</math>. The second fundamental observation is that any random variable can be written as the difference of two nonnegative random variables. Given a random variable <math>X</math>, one defines the positive and negative parts by <math>X^+ = \max(X, 0)</math> and <math>X^- = -\min(X, 0)</math>. These are nonnegative random variables, and it can be directly checked that <math>X = X^+ - X^-</math>. Since <math>\operatorname{E}[X^+]</math> and <math>\operatorname{E}[X^-]</math> are both then defined as either nonnegative numbers or <math>+\infty</math>, it is then natural to define:
: <math>\operatorname{E}[X] = \begin{cases} \operatorname{E}[X^+] - \operatorname{E}[X^-] & \text{if } \operatorname{E}[X^+] < \infty \text{ and } \operatorname{E}[X^-] < \infty; \\ +\infty & \text{if } \operatorname{E}[X^+] = \infty \text{ and } \operatorname{E}[X^-] < \infty; \\ -\infty & \text{if } \operatorname{E}[X^+] < \infty \text{ and } \operatorname{E}[X^-] = \infty; \\ \text{undefined} & \text{if } \operatorname{E}[X^+] = \infty \text{ and } \operatorname{E}[X^-] = \infty. \end{cases}</math>
According to this definition, <math>\operatorname{E}[X]</math> exists and is finite if and only if <math>\operatorname{E}[X^+]</math> and <math>\operatorname{E}[X^-]</math> are both finite. Due to the formula <math>|X| = X^+ + X^-</math>, this is the case if and only if <math>\operatorname{E}|X|</math> is finite, and this is equivalent to the absolute convergence conditions in the definitions above. As such, the present considerations do not define finite expected values in any cases not previously considered; they are only useful for infinite expectations.
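*In the case of the St. Petersburg paradox, one has <math>X^- = 0</math> and so <math>\operatorname{E}[X] = +\infty</math> as desired.
* Suppose the random variable <math>X</math> takes values <math>1, -2, 3, -4, \ldots</math> with respective probabilities <math>6\pi^{-2}, 6(2\pi)^{-2}, 6(3\pi)^{-2}, 6(4\pi)^{-2}, \ldots</math>. Then it follows that <math>X^+</math> takes value <math>2k - 1</math> with probability <math>6((2k-1)\pi)^{-2}</math> for each positive integer <math>k</math>, and takes value <math>0</math> with remaining probability. Similarly, <math>X^-</math> takes value <math>2k</math> with probability <math>6(2k\pi)^{-2}</math> for each positive integer <math>k</math> and takes value <math>0</math> with remaining probability. Using the definition for non-negative random variables, one can show that both <math>\operatorname{E}[X^+] = \infty</math> and <math>\operatorname{E}[X^-] = \infty</math> (see Harmonic series). Hence, in this case the expectation of <math>X</math> is undefined.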
* Similarly, the Cauchy distribution, as discussed above, has undefined expectation.
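The divergence in the first two examples can be observed through partial sums. The Python sketch below assumes the St. Petersburg outcomes <math>x_i = 2^i</math> with probabilities <math>2^{-i}</math>, and a random variable taking the values <math>1, -2, 3, -4, \ldots</math> with probabilities <math>6/(k\pi)^2</math>; the function names are hypothetical. A finite computation can only suggest divergence, not prove it.

```python
import math

def st_petersburg_partial(n):
    """Sum of x_i * p_i for i = 1..n with x_i = 2**i and p_i = 2**-i."""
    return sum((2 ** i) * (2.0 ** -i) for i in range(1, n + 1))

def parts_partial(n):
    """Partial sums for E[X+] and E[X-] when X takes the values
    1, -2, 3, -4, ... with probabilities 6 / (k * pi)**2, k = 1, 2, ..."""
    pos = neg = 0.0
    for k in range(1, n + 1):
        weight = k * 6 / (k * math.pi) ** 2   # |x_k| * p_k = 6 / (k * pi**2)
        if k % 2 == 1:
            pos += weight   # odd k: the outcome +k contributes to X+
        else:
            neg += weight   # even k: the outcome -k contributes to X-
    return pos, neg

print(st_petersburg_partial(50))    # every term equals 1, so the sum is n
for n in (10 ** 3, 10 ** 5):
    print(n, parts_partial(n))      # both parts keep growing, roughly like log n
```

Because both partial sums grow without bound, the difference <math>\operatorname{E}[X^+] - \operatorname{E}[X^-]</math> has the form <math>\infty - \infty</math>, which is why the expectation is left undefined in the second example.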
Expected values of common distributions
The following table gives the expected values of some commonly occurring probability distributions. The third column gives the expected values both in the form immediately given by the definition, as well as in the simplified form obtained by computation therefrom. The details of these computations, which are not always straightforward, can be found in the indicated references.
Properties
The basic properties below (and their names in bold) replicate or follow immediately from those of the Lebesgue integral. Note that the letters "a.s." stand for "almost surely", a central property of the Lebesgue integral. Basically, one says that an inequality like <math>X \geq 0</math> is true almost surely when the probability measure attributes zero mass to the complementary event <math>\{X < 0\}</math>.
*Non-negativity: If <math>X \geq 0</math> (a.s.), then <math>\operatorname{E}[X] \geq 0</math>.
*Linearity of expectation: The expected value operator (or expectation operator)