In probability theory, the expected value (also called expectation, expectancy, mathematical expectation, mean, average, or first moment) is a generalization of the weighted average. Informally, the expected value is the arithmetic mean of a large number of independently selected outcomes of a random variable. The expected value of a random variable with a finite number of outcomes is a weighted average of all possible outcomes. In the case of a continuum of possible outcomes, the expectation is defined by integration. In the axiomatic foundation for probability provided by measure theory, the expectation is given by Lebesgue integration. The expected value of a random variable X is often denoted by \operatorname{E}(X), \operatorname{E}[X], or \operatorname{E}X, with \operatorname{E} also often stylized as \mathbf{E} or \mathbb{E}.


History

The idea of the expected value originated in the middle of the 17th century from the study of the so-called problem of points, which seeks to divide the stakes ''in a fair way'' between two players who have to end their game before it is properly finished. This problem had been debated for centuries, and many conflicting proposals and solutions had been suggested over the years, when it was posed to Blaise Pascal by French writer and amateur mathematician Chevalier de Méré in 1654. Méré claimed that this problem could not be solved and that it showed just how flawed mathematics was when it came to its application to the real world. Pascal, being a mathematician, was provoked and determined to solve the problem once and for all. He began to discuss the problem in the famous series of letters to Pierre de Fermat. Soon enough, they both independently came up with a solution. They solved the problem in different computational ways, but their results were identical because their computations were based on the same fundamental principle: the value of a future gain should be directly proportional to the chance of getting it. This principle seemed to have come naturally to both of them. They were very pleased to have found essentially the same solution, and this in turn convinced them that they had solved the problem conclusively; however, they did not publish their findings, informing only a small circle of mutual scientific friends in Paris. The Dutch mathematician Christiaan Huygens also considered the problem of points and presented a solution based on the same principle as those of Pascal and Fermat. Huygens published his treatise ''De ratiociniis in ludo aleæ'' on probability theory in 1657 (see Huygens (1657)), just after visiting Paris. The book extended the concept of expectation by adding rules for calculating expectations in more complicated situations than the original problem (e.g., for three or more players), and can be seen as the first successful attempt at laying down the foundations of the theory of probability. During his visit to France in 1655, Huygens had learned about de Méré's problem, and from his correspondence with Carcavine a year later (in 1656) he realized that his method was essentially the same as Pascal's; he therefore knew about Pascal's priority in this subject before his book went to press in 1657. In the mid-nineteenth century, Pafnuty Chebyshev became the first person to think systematically in terms of the expectations of random variables.


Etymology

Neither Pascal nor Huygens used the term "expectation" in its modern sense. More than a hundred years later, in 1814, Pierre-Simon Laplace published his tract ''Théorie analytique des probabilités'', where the concept of expected value was defined explicitly.


Notations

The use of the letter E to denote "expected value" goes back to W. A. Whitworth in 1901. The symbol has since become popular for English writers. In German, E stands for "Erwartungswert", in Spanish for "Esperanza matemática", and in French for "Espérance mathématique". When "E" is used to denote expected value, authors use a variety of stylizations: the expectation operator can be stylized as E (upright), ''E'' (italic), or \mathbb{E} (in blackboard bold), while a variety of bracket notations (such as \operatorname{E}(X), \operatorname{E}[X], and \operatorname{E}X) are all used. Another popular notation is \operatorname{E}_X, whereas \langle X \rangle, \langle X \rangle_{\text{av}}, and \overline{X} are commonly used in physics, and \operatorname{M}(X) in Russian-language literature.


Definition

As discussed below, there are several context-dependent ways of defining the expected value. The simplest and original definition deals with the case of finitely many possible outcomes, such as in the flip of a coin. With the theory of infinite series, this can be extended to the case of countably many possible outcomes. It is also very common to consider the distinct case of random variables dictated by (piecewise-)continuous probability density functions, as these arise in many natural contexts. All of these specific definitions may be viewed as special cases of the general definition based upon the mathematical tools of measure theory and Lebesgue integration, which provide these different contexts with an axiomatic foundation and common language.

Any definition of expected value may be extended to define an expected value of a multidimensional random variable, i.e. a random vector X. It is defined component by component, as \operatorname{E}[X]_i = \operatorname{E}[X_i]. Similarly, one may define the expected value of a random matrix X with components X_{ij} by \operatorname{E}[X]_{ij} = \operatorname{E}[X_{ij}].
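The component-wise definition is easy to check empirically. The following sketch (the distributions and sample size are illustrative choices, not from the original text) approximates the expectation of a two-dimensional random vector by averaging samples component by component:

```python
import numpy as np

rng = np.random.default_rng(0)
# Two independent components: Uniform[0, 1] and a standard normal.
samples = np.column_stack([rng.uniform(0, 1, 100_000),
                           rng.normal(0, 1, 100_000)])
# E[X] is computed component by component: E[X]_i = E[X_i].
print(samples.mean(axis=0))   # ≈ [0.5, 0.0]
```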


Random variables with finitely many outcomes

Consider a random variable X with a ''finite'' list x_1, \ldots, x_k of possible outcomes, each of which (respectively) has probability p_1, \ldots, p_k of occurring. The expectation of X is defined as

:\operatorname{E}[X] = x_1p_1 + x_2p_2 + \cdots + x_kp_k.

Since the probabilities must satisfy p_1 + \cdots + p_k = 1, it is natural to interpret \operatorname{E}[X] as a weighted average of the x_i values, with weights given by their probabilities p_i. In the special case that all possible outcomes are equiprobable (that is, p_1 = \cdots = p_k = 1/k), the weighted average is given by the standard average. In the general case, the expected value takes into account the fact that some outcomes are more likely than others.
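As a concrete sketch of this definition (the helper name expected_value is hypothetical), the finite-outcome expectation is just a dot product of outcomes and probabilities:

```python
def expected_value(outcomes, probabilities):
    """E[X] = x_1*p_1 + ... + x_k*p_k for finitely many outcomes."""
    assert abs(sum(probabilities) - 1.0) < 1e-9, "probabilities must sum to 1"
    return sum(x * p for x, p in zip(outcomes, probabilities))

print(expected_value([1, 2, 3, 4, 5, 6], [1 / 6] * 6))   # 3.5, a fair die
```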


Examples

*Let X represent the outcome of a roll of a fair six-sided die. More specifically, X will be the number of pips showing on the top face of the die after the toss. The possible values for X are 1, 2, 3, 4, 5, and 6, all of which are equally likely with a probability of \tfrac{1}{6}. The expectation of X is
:: \operatorname{E}[X] = 1\cdot\frac16 + 2\cdot\frac16 + 3\cdot\frac16 + 4\cdot\frac16 + 5\cdot\frac16 + 6\cdot\frac16 = 3.5.
:If one rolls the die n times and computes the average (arithmetic mean) of the results, then as n grows, the average will almost surely converge to the expected value, a fact known as the strong law of large numbers.
*The roulette game consists of a small ball and a wheel with 38 numbered pockets around the edge. As the wheel is spun, the ball bounces around randomly until it settles down in one of the pockets. Suppose random variable X represents the (monetary) outcome of a $1 bet on a single number ("straight up" bet). If the bet wins (which happens with probability \tfrac{1}{38} in American roulette), the payoff is $35; otherwise the player loses the bet. The expected profit from such a bet will be
:: \operatorname{E}[\,\text{gain from }\$1\text{ bet}\,] = -\$1 \cdot \frac{37}{38} + \$35 \cdot \frac{1}{38} = -\$\frac{1}{19}.
:That is, the expected value to be won from a $1 bet is −$\tfrac{1}{19}. Thus, in 190 bets, the net loss will probably be about $10. A simulation of this bet is sketched below.
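The following Monte Carlo sketch (function name, sample size, and seed are arbitrary choices) estimates the roulette expectation empirically; by the law of large numbers, the running average approaches −1/19 ≈ −$0.0526 per bet:

```python
import random

def simulate_straight_up(n_bets, seed=0):
    """Average gain of a $1 straight-up bet: win $35 w.p. 1/38, else lose $1."""
    rng = random.Random(seed)
    total = sum(35 if rng.randrange(38) == 0 else -1 for _ in range(n_bets))
    return total / n_bets

print(simulate_straight_up(1_000_000))   # ≈ -1/19 ≈ -0.0526 per bet
```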


Random variables with countably many outcomes

Informally, the expectation of a random variable with a countable set of possible outcomes is defined analogously as the weighted average of all possible outcomes, where the weights are given by the probabilities of realizing each given value. This is to say that

: \operatorname{E}[X] = \sum_{i=1}^\infty x_i\, p_i,

where x_1, x_2, \ldots are the possible outcomes of the random variable X and p_1, p_2, \ldots are their corresponding probabilities. In many non-mathematical textbooks, this is presented as the full definition of expected values in this context. However, there are some subtleties with infinite summation, so the above formula is not suitable as a mathematical definition. In particular, the Riemann series theorem of mathematical analysis illustrates that the value of certain infinite sums involving positive and negative summands depends on the order in which the summands are given. Since the outcomes of a random variable have no naturally given order, this creates a difficulty in defining expected value precisely. For this reason, many mathematical textbooks only consider the case that the infinite sum given above converges absolutely, which implies that the infinite sum is a finite number independent of the ordering of summands. In the alternative case that the infinite sum does not converge absolutely, one says the random variable ''does not have finite expectation''.


Examples

*Suppose x_i = i and p_i = \tfrac{c}{i \cdot 2^i} for i = 1, 2, 3, \ldots, where c = \tfrac{1}{\ln 2} is the scaling factor which makes the probabilities sum to 1. Then, using the direct definition for non-negative random variables, we have
: \operatorname{E}[X] = \sum_i x_i p_i = 1\left(\tfrac{c}{2}\right) + 2\left(\tfrac{c}{8}\right) + 3\left(\tfrac{c}{24}\right) + \cdots = \tfrac{c}{2} + \tfrac{c}{4} + \tfrac{c}{8} + \cdots = c = \tfrac{1}{\ln 2}.
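The same computation can be checked numerically. In the sketch below (truncation at 59 terms is an arbitrary choice), the partial sums of the probabilities approach 1 and the partial sums of i·p_i approach 1/ln 2 ≈ 1.4427:

```python
import math

c = 1 / math.log(2)                       # scaling factor, c = 1/ln(2)
terms = range(1, 60)                      # truncation point is arbitrary
p = [c / (i * 2 ** i) for i in terms]
print(sum(p))                             # ≈ 1.0: probabilities sum to 1
print(sum(i * pi for i, pi in zip(terms, p)))   # ≈ c ≈ 1.4427 = E[X]
```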


Random variables with density

Now consider a random variable X which has a probability density function given by a function f on the real number line. This means that the probability of X taking on a value in any given open interval is given by the integral of f over that interval. The expectation of X is then given by the integral

: \operatorname{E}[X] = \int_{-\infty}^\infty x f(x)\, dx.

A general and mathematically precise formulation of this definition uses measure theory and Lebesgue integration, and the corresponding theory of ''absolutely continuous random variables'' is described in the next section. The density functions of many common distributions are piecewise continuous, and as such the theory is often developed in this restricted setting. For such functions, it is sufficient to only consider the standard Riemann integration. Sometimes ''continuous random variables'' are defined as those corresponding to this special class of densities, although the term is used differently by various authors.

Analogously to the countably-infinite case above, there are subtleties with this expression due to the infinite region of integration. Such subtleties can be seen concretely if the distribution of X is given by the Cauchy distribution, so that f(x) = \tfrac{1}{\pi(1+x^2)}. It is straightforward to compute in this case that

:\int_a^b xf(x)\,dx = \int_a^b \frac{x}{\pi(1+x^2)}\,dx = \frac{1}{2\pi}\ln\frac{1+b^2}{1+a^2}.

The limit of this expression as a \to -\infty and b \to \infty does not exist: if the limits are taken so that a = -b, then the limit is zero, while if the constraint 2a = -b is taken, then the limit is \tfrac{\ln 2}{\pi}. To avoid such ambiguities, in mathematical textbooks it is common to require that the given integral converges absolutely, with \operatorname{E}[X] left undefined otherwise. However, measure-theoretic notions as given below can be used to give a systematic definition of \operatorname{E}[X] for more general random variables X.
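The failure of absolute convergence for the Cauchy distribution is easy to observe empirically. In the following sketch (sampling method and sample sizes are illustrative choices), running means of standard Cauchy samples fail to settle down, in contrast to the dice example above:

```python
import math
import random
import statistics

rng = random.Random(1)
# Standard Cauchy samples via the inverse CDF: tan(pi * (U - 1/2)).
samples = [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(10**5)]
for n in (100, 1000, 10_000, 100_000):
    print(n, statistics.fmean(samples[:n]))   # running means do not settle
```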


Arbitrary real-valued random variables

All definitions of the expected value may be expressed in the language of measure theory. In general, if X is a real-valued random variable defined on a probability space (\Omega, \Sigma, \operatorname{P}), then the expected value of X, denoted by \operatorname{E}[X], is defined as the Lebesgue integral

:\operatorname{E}[X] = \int_\Omega X\,d\operatorname{P}.

Despite the newly abstract situation, this definition is extremely similar in nature to the very simplest definition of expected values, given above, as certain weighted averages. This is because, in measure theory, the value of the Lebesgue integral of X is defined via weighted averages of ''approximations'' of X which take on finitely many values. Moreover, if given a random variable with finitely or countably many possible values, the Lebesgue theory of expectation is identical to the summation formulas given above. However, the Lebesgue theory clarifies the scope of the theory of probability density functions. A random variable X is said to be ''absolutely continuous'' if any of the following conditions are satisfied:

* there is a nonnegative measurable function f on the real line such that
::\operatorname{P}(X\in A)=\int_A f(x)\,dx,
:for any Borel set A, in which the integral is taken in the sense of Lebesgue;
* the cumulative distribution function of X is absolutely continuous;
* for any Borel set A of real numbers with Lebesgue measure equal to zero, the probability of X being valued in A is also equal to zero;
* for any positive number \varepsilon there is a positive number \delta such that: if A is a Borel set with Lebesgue measure less than \delta, then the probability of X being valued in A is less than \varepsilon.

These conditions are all equivalent, although this is nontrivial to establish. In this definition, f is called the ''probability density function'' of X (relative to Lebesgue measure). According to the change-of-variables formula for Lebesgue integration, combined with the law of the unconscious statistician, it follows that

:\operatorname{E}[X] \equiv \int_\Omega X\,d\operatorname{P} = \int_{\mathbb{R}} x f(x)\,dx

for any absolutely continuous random variable X. The above discussion of continuous random variables is thus a special case of the general Lebesgue theory, due to the fact that every piecewise-continuous function is measurable.


Infinite expected values

Expected values as defined above are automatically finite numbers. However, in many cases it is fundamental to be able to consider expected values of \pm\infty. This is intuitive, for example, in the case of the St. Petersburg paradox, in which one considers a random variable with possible outcomes x_i = 2^i, with associated probabilities p_i = 2^{-i}, for i ranging over all positive integers. According to the summation formula in the case of random variables with countably many outcomes, one has

:\operatorname{E}[X] = \sum_{i=1}^\infty x_i\,p_i = 2\cdot \frac{1}{2} + 4\cdot\frac{1}{4} + 8\cdot\frac{1}{8} + 16\cdot\frac{1}{16} + \cdots = 1 + 1 + 1 + 1 + \cdots.

It is natural to say that the expected value equals +\infty. There is a rigorous mathematical theory underlying such ideas, which is often taken as part of the definition of the Lebesgue integral. The first fundamental observation is that, whichever of the above definitions are followed, any ''nonnegative'' random variable whatsoever can be given an unambiguous expected value; whenever absolute convergence fails, then the expected value can be defined as +\infty. The second fundamental observation is that any random variable can be written as the difference of two nonnegative random variables. Given a random variable X, one defines the positive and negative parts by X^+ = \max(X, 0) and X^- = -\min(X, 0). These are nonnegative random variables, and it can be directly checked that X = X^+ - X^-. Since \operatorname{E}[X^+] and \operatorname{E}[X^-] are both then defined as either nonnegative numbers or +\infty, it is then natural to define:

:\operatorname{E}[X] = \begin{cases} \operatorname{E}[X^+] - \operatorname{E}[X^-] & \text{if } \operatorname{E}[X^+] < \infty \text{ and } \operatorname{E}[X^-] < \infty;\\ +\infty & \text{if } \operatorname{E}[X^+] = \infty \text{ and } \operatorname{E}[X^-] < \infty;\\ -\infty & \text{if } \operatorname{E}[X^+] < \infty \text{ and } \operatorname{E}[X^-] = \infty;\\ \text{undefined} & \text{if } \operatorname{E}[X^+] = \infty \text{ and } \operatorname{E}[X^-] = \infty. \end{cases}

According to this definition, \operatorname{E}[X] exists and is finite if and only if \operatorname{E}[X^+] and \operatorname{E}[X^-] are both finite. Due to the formula |X| = X^+ + X^-, this is the case if and only if \operatorname{E}|X| is finite, and this is equivalent to the absolute convergence conditions in the definitions above. As such, the present considerations do not define finite expected values in any cases not previously considered; they are only useful for infinite expectations.

*In the case of the St. Petersburg paradox, one has X^- = 0 and so \operatorname{E}[X] = +\infty, as desired (the diverging partial sums are sketched numerically after this list).
*Suppose the random variable X takes values 1, -2, 3, -4, \ldots with respective probabilities \tfrac{6}{\pi^2 \cdot 1^2}, \tfrac{6}{\pi^2 \cdot 2^2}, \tfrac{6}{\pi^2 \cdot 3^2}, \tfrac{6}{\pi^2 \cdot 4^2}, \ldots. Then it follows that X^+ takes value 2k-1 with probability \tfrac{6}{\pi^2(2k-1)^2} for each positive integer k, and takes value 0 with remaining probability. Similarly, X^- takes value 2k with probability \tfrac{6}{\pi^2(2k)^2} for each positive integer k and takes value 0 with remaining probability. Using the definition for non-negative random variables, one can show that both \operatorname{E}[X^+] = \infty and \operatorname{E}[X^-] = \infty (see Harmonic series). Hence, in this case the expectation of X is undefined.
*Similarly, the Cauchy distribution, as discussed above, has undefined expectation.
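As promised above, here is a short numerical sketch of the St. Petersburg computation (the helper name is hypothetical): the truncated sums of x_i p_i grow without bound, one unit per term:

```python
def st_petersburg_partial(n_terms):
    """Truncated E[X] = sum of 2^i * 2^(-i) over i = 1..n_terms."""
    return sum((2 ** i) * (0.5 ** i) for i in range(1, n_terms + 1))

for n in (10, 100, 1000):
    print(n, st_petersburg_partial(n))   # 10.0, 100.0, 1000.0: diverges
```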


Expected values of common distributions

The expected values of some commonly occurring probability distributions are listed below, in the simplified form obtained by computation from the definition. The details of these computations, which are not always straightforward, can be found in standard references.

* Bernoulli with parameter p: \operatorname{E}[X] = 0\cdot(1-p) + 1\cdot p = p
* Binomial with parameters n, p: \operatorname{E}[X] = np
* Poisson with parameter \lambda: \operatorname{E}[X] = \lambda
* Geometric with parameter p (number of trials): \operatorname{E}[X] = \tfrac{1}{p}
* Uniform on [a, b]: \operatorname{E}[X] = \tfrac{a+b}{2}
* Exponential with parameter \lambda: \operatorname{E}[X] = \tfrac{1}{\lambda}
* Normal with parameters \mu, \sigma^2: \operatorname{E}[X] = \mu
* Cauchy: the expected value is undefined


Properties

The basic properties below (and their names in bold) replicate or follow immediately from those of the Lebesgue integral. Note that the letters "a.s." stand for "almost surely"—a central property of the Lebesgue integral. Basically, one says that an inequality like X \geq 0 is true almost surely when the probability measure attributes zero-mass to the complementary event \{X < 0\}.
*'''Non-negativity''': If X \geq 0 (a.s.), then \operatorname{E}[X] \geq 0.
*'''Linearity of expectation''': The expected value operator (or expectation operator) \operatorname{E}[\cdot] is linear in the sense that, for any random variables X and Y and a constant a,
::\operatorname{E}[X + Y] = \operatorname{E}[X] + \operatorname{E}[Y],
::\operatorname{E}[aX] = a\operatorname{E}[X],
:whenever the right-hand side is well-defined. By induction, this means that the expected value of the sum of any finite number of random variables is the sum of the expected values of the individual random variables, and the expected value scales linearly with a multiplicative constant. Symbolically, for N random variables X_i and constants a_i (1 \leq i \leq N), we have \operatorname{E}\left[\sum_{i=1}^N a_i X_i\right] = \sum_{i=1}^N a_i \operatorname{E}[X_i]. If we think of the set of random variables with finite expected value as forming a vector space, then the linearity of expectation implies that the expected value is a linear form on this vector space (a numerical illustration follows this list).
*'''Monotonicity''': If X \leq Y (a.s.), and both \operatorname{E}[X] and \operatorname{E}[Y] exist, then \operatorname{E}[X] \leq \operatorname{E}[Y]. The proof follows from the linearity and the non-negativity property applied to Z = Y - X, since Z \geq 0 (a.s.).
*'''Non-degeneracy''': If \operatorname{E}[|X|] = 0, then X = 0 (a.s.).
*If X = Y (a.s.), then \operatorname{E}[X] = \operatorname{E}[Y]. In other words, if X and Y are random variables that take different values with probability zero, then the expectation of X will equal the expectation of Y.
*If X = c (a.s.) for some real number c, then \operatorname{E}[X] = c. In particular, for a random variable X with well-defined expectation, \operatorname{E}[\operatorname{E}[X]] = \operatorname{E}[X]. A well-defined expectation implies that there is one number, or rather, one constant that defines the expected value; it follows that the expectation of this constant is just the original expected value.
*As a consequence of the formula |X| = X^+ + X^- as discussed above, together with the triangle inequality, it follows that for any random variable X with well-defined expectation, one has |\operatorname{E}[X]| \leq \operatorname{E}|X|.
*Let 1_A denote the indicator function of an event A; then \operatorname{E}[1_A] is given by the probability of A. This is nothing but a different way of stating the expectation of a Bernoulli random variable, as calculated for the Bernoulli distribution above.
*'''Formulas in terms of CDF''': If F(x) is the cumulative distribution function of a random variable X, then
::\operatorname{E}[X] = \int_{-\infty}^\infty x\,dF(x),
:where the values on both sides are well defined or not well defined simultaneously, and the integral is taken in the sense of Lebesgue–Stieltjes. As a consequence of integration by parts as applied to this representation of \operatorname{E}[X], it can be proved that
::\operatorname{E}[X] = \int_0^\infty (1-F(x))\,dx - \int_{-\infty}^0 F(x)\,dx,
:with the integrals taken in the sense of Lebesgue. As a special case, for any random variable X valued in the nonnegative integers \{0, 1, 2, 3, \ldots\}, one has
::\operatorname{E}[X] = \sum_{n=0}^\infty \operatorname{P}(X>n),
:where \operatorname{P} denotes the underlying probability measure.
*'''Non-multiplicativity''': In general, the expected value is not multiplicative, i.e. \operatorname{E}[XY] is not necessarily equal to \operatorname{E}[X] \cdot \operatorname{E}[Y]. If X and Y are independent, then one can show that \operatorname{E}[XY] = \operatorname{E}[X] \operatorname{E}[Y]. If the random variables are dependent, then generally \operatorname{E}[XY] \neq \operatorname{E}[X] \operatorname{E}[Y], although in special cases of dependency the equality may hold.
*'''Law of the unconscious statistician''': The expected value of a measurable function of X, g(X), given that X has a probability density function f(x), is given by the inner product of f and g:
::\operatorname{E}[g(X)] = \int_{\mathbb{R}} g(x) f(x)\, dx.
:This formula also holds in the multidimensional case, when g is a function of several random variables, and f is their joint density.
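As noted in the linearity item above, several of these properties can be verified by exact enumeration on a small finite probability space. The following sketch (all names are illustrative) checks linearity, the indicator identity E[1_A] = P(A), and multiplicativity under independence for two fair dice:

```python
from itertools import product

# Finite probability space: two independent fair dice, each outcome 1/36.
outcomes = list(product(range(1, 7), repeat=2))
p = 1 / len(outcomes)

def E(f):
    """Expectation over the finite space: sum of f(w) * P(w)."""
    return sum(f(w) * p for w in outcomes)

def X(w): return w[0]
def Y(w): return w[1]

assert abs(E(lambda w: 2 * X(w) + Y(w)) - (2 * E(X) + E(Y))) < 1e-12  # linearity
assert abs(E(lambda w: 1 if X(w) == 6 else 0) - 1 / 6) < 1e-12        # E[1_A] = P(A)
assert abs(E(lambda w: X(w) * Y(w)) - E(X) * E(Y)) < 1e-12            # independence
print(E(X), E(Y))   # 3.5 3.5
```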


Inequalities

Concentration inequalities control the likelihood of a random variable taking on large values. Markov's inequality is among the best-known and simplest to prove: for a ''nonnegative'' random variable X and any positive number a, it states that
:\operatorname{P}(X\geq a)\leq\frac{\operatorname{E}[X]}{a}.
If X is any random variable with finite expectation, then Markov's inequality may be applied to the random variable |X-\operatorname{E}[X]|^2 to obtain Chebyshev's inequality
:\operatorname{P}(|X-\operatorname{E}[X]| \geq a)\leq\frac{\operatorname{Var}[X]}{a^2},
where \operatorname{Var} is the variance. These inequalities are significant for their nearly complete lack of conditional assumptions. For example, for any random variable with finite expectation, the Chebyshev inequality implies that there is at least a 75% probability of an outcome being within two standard deviations of the expected value. However, in special cases the Markov and Chebyshev inequalities often give much weaker information than is otherwise available. For example, in the case of an unweighted die, Chebyshev's inequality says that the odds of rolling between 1 and 6 are at least 53%; in reality, the odds are of course 100% (this case is checked numerically at the end of this section). The Kolmogorov inequality extends the Chebyshev inequality to the context of sums of random variables.

The following three inequalities are of fundamental importance in the field of mathematical analysis and its applications to probability theory.
*Jensen's inequality: Let f be a convex function and X a random variable with finite expectation. Then
:f(\operatorname{E}(X)) \leq \operatorname{E}(f(X)).
:Part of the assertion is that the negative part of f(X) has finite expectation, so that the right-hand side is well-defined (possibly infinite). Convexity of f can be phrased as saying that the output of the weighted average of ''two'' inputs under-estimates the same weighted average of the two outputs; Jensen's inequality extends this to the setting of completely general weighted averages, as represented by the expectation. In the special case that f(x) = |x|^{t/s} for positive numbers s < t, one obtains the Lyapunov inequality
:\left(\operatorname{E}|X|^s\right)^{1/s}\leq\left(\operatorname{E}|X|^t\right)^{1/t}.
:This can also be proved by the Hölder inequality. In measure theory, this is particularly notable for proving the inclusion L^t \subseteq L^s of L^p spaces, in the special case of probability spaces.
*Hölder's inequality: if p and q are numbers satisfying \tfrac{1}{p} + \tfrac{1}{q} = 1, then
:\operatorname{E}|XY|\leq(\operatorname{E}|X|^p)^{1/p}(\operatorname{E}|Y|^q)^{1/q}
:for any random variables X and Y. The special case of p = q = 2 is called the Cauchy–Schwarz inequality, and is particularly well-known.
*Minkowski inequality: given any number p \geq 1, for any random variables X and Y with \operatorname{E}|X|^p and \operatorname{E}|Y|^p both finite, it follows that \operatorname{E}|X+Y|^p is also finite and
:\Bigl(\operatorname{E}|X+Y|^p\Bigr)^{1/p}\leq\Bigl(\operatorname{E}|X|^p\Bigr)^{1/p}+\Bigl(\operatorname{E}|Y|^p\Bigr)^{1/p}.

The Hölder and Minkowski inequalities can be extended to general measure spaces, and are often given in that context. By contrast, the Jensen inequality is special to the case of probability spaces.
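Here is the promised numerical check of Chebyshev's inequality for a fair die (a sketch; the threshold a = 2.5 is chosen to match the 53% claim above):

```python
faces = range(1, 7)
mean = sum(faces) / 6                          # E[X] = 3.5
var = sum((x - mean) ** 2 for x in faces) / 6  # Var[X] = 35/12

a = 2.5                                        # deviation threshold from the mean
chebyshev_lower = 1 - var / a ** 2             # P(|X - mean| < a) is at least this
exact = sum(1 for x in faces if abs(x - mean) < a) / 6
print(chebyshev_lower, exact)                  # ≈ 0.533 bound vs exact 2/3
```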


Expectations under convergence of random variables

In general, it is not the case that \operatorname{E}[X_n] \to \operatorname{E}[X] even if X_n \to X pointwise. Thus, one cannot interchange limits and expectation without additional conditions on the random variables. To see this, let U be a random variable distributed uniformly on [0,1]. For n\geq 1, define a sequence of random variables
:X_n = n \cdot \mathbf{1}\left\{ U \in \left[0, \tfrac{1}{n}\right]\right\},
with \mathbf{1}\{A\} being the indicator function of the event A. Then, it follows that X_n \to 0 pointwise. But, \operatorname{E}[X_n] = n \cdot \operatorname{P}\left(U \in \left[0, \tfrac{1}{n}\right]\right) = n \cdot \tfrac{1}{n} = 1 for each n. Hence,
:\lim_{n\to\infty} \operatorname{E}[X_n] = 1 \neq 0 = \operatorname{E}\left[\lim_{n\to\infty} X_n\right].
(This counterexample is simulated numerically after the list below.) Analogously, for a general sequence of random variables \{Y_n : n \geq 0\}, the expected value operator is not \sigma-additive, i.e.
:\operatorname{E}\left[\sum_{n=0}^\infty Y_n\right] \neq \sum_{n=0}^\infty \operatorname{E}[Y_n]
in general. An example is easily obtained by setting Y_0 = X_1 and Y_n = X_{n+1} - X_n for n \geq 1, where X_n is as in the previous example.

A number of convergence results specify exact conditions which allow one to interchange limits and expectations, as specified below.
*Monotone convergence theorem: Let \{X_n : n \geq 0\} be a sequence of random variables, with 0 \leq X_n \leq X_{n+1} (a.s.) for each n \geq 0. Furthermore, let X_n \to X pointwise. Then, the monotone convergence theorem states that \lim_n\operatorname{E}[X_n] = \operatorname{E}[X]. Using the monotone convergence theorem, one can show that expectation indeed satisfies countable additivity for non-negative random variables. In particular, let \{X_i\}_{i=0}^\infty be non-negative random variables. It follows from the monotone convergence theorem that
:\operatorname{E}\left[\sum_{i=0}^\infty X_i\right] = \sum_{i=0}^\infty \operatorname{E}[X_i].
*Fatou's lemma: Let \{X_n \geq 0 : n \geq 0\} be a sequence of non-negative random variables. Fatou's lemma states that
:\operatorname{E}[\liminf_n X_n] \leq \liminf_n \operatorname{E}[X_n].
:Corollary. Let X_n \geq 0 with \operatorname{E}[X_n] \leq C for all n \geq 0. If X_n \to X (a.s.), then \operatorname{E}[X] \leq C. The proof is by observing that X = \liminf_n X_n (a.s.) and applying Fatou's lemma.
*Dominated convergence theorem: Let \{X_n\} be a sequence of random variables. If X_n \to X pointwise (a.s.), |X_n| \leq Y \leq +\infty (a.s.), and \operatorname{E}[Y] < \infty, then, according to the dominated convergence theorem,
**\operatorname{E}|X| \leq \operatorname{E}[Y] < \infty;
**\lim_n\operatorname{E}[X_n] = \operatorname{E}[X];
**\lim_n\operatorname{E}|X_n - X| = 0.
*Uniform integrability: In some cases, the equality \lim_n\operatorname{E}[X_n] = \operatorname{E}[\lim_n X_n] holds when the sequence \{X_n\} is ''uniformly integrable''.
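Here is the promised simulation of the counterexample (sample size and seed are arbitrary): the empirical mean of X_n stays near 1 for every n, even though each individual sample path is eventually 0:

```python
import random

rng = random.Random(0)
us = [rng.random() for _ in range(10**6)]   # draws of U ~ Uniform[0, 1]

def mean_Xn(n):
    """Empirical E[X_n] for X_n = n * 1{U <= 1/n}."""
    return sum(n if u <= 1 / n else 0 for u in us) / len(us)

for n in (1, 10, 100, 1000):
    print(n, mean_Xn(n))   # stays near 1, although X_n -> 0 pointwise
```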


Relationship with characteristic function

The probability density function f_X of a scalar random variable X is related to its characteristic function \varphi_X by the inversion formula:
: f_X(x) = \frac{1}{2\pi}\int_{\mathbb{R}} e^{-itx}\varphi_X(t) \, \mathrm{d}t.
For the expected value of g(X) (where g:\mathbb{R}\to\mathbb{R} is a Borel function), we can use this inversion formula to obtain
: \operatorname{E}[g(X)] = \frac{1}{2\pi} \int_{\mathbb{R}} g(x)\left[ \int_{\mathbb{R}} e^{-itx}\varphi_X(t) \, \mathrm{d}t \right]\mathrm{d}x.
If \operatorname{E}[g(X)] is finite, changing the order of integration, we get, in accordance with the Fubini–Tonelli theorem,
: \operatorname{E}[g(X)] = \frac{1}{2\pi} \int_{\mathbb{R}} G(t) \varphi_X(t) \, \mathrm{d}t,
where
:G(t) = \int_{\mathbb{R}} g(x) e^{-itx} \, \mathrm{d}x
is the Fourier transform of g(x). The expression for \operatorname{E}[g(X)] also follows directly from the Plancherel theorem.
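As an illustrative numerical check (a sketch; the grid bounds and spacing are arbitrary choices), the inversion formula recovers the standard normal density from its characteristic function \varphi_X(t) = e^{-t^2/2}:

```python
import numpy as np

t, dt = np.linspace(-40, 40, 20001, retstep=True)  # quadrature grid in t
phi = np.exp(-t ** 2 / 2)                          # char. function of N(0, 1)

def density(x):
    """f_X(x) = (1/2pi) * integral of exp(-i t x) phi(t) dt, via a Riemann sum."""
    return float(np.real(np.sum(np.exp(-1j * t * x) * phi) * dt)) / (2 * np.pi)

for x in (0.0, 1.0, 2.0):
    exact = np.exp(-x ** 2 / 2) / np.sqrt(2 * np.pi)  # standard normal pdf
    print(x, density(x), exact)                       # the columns agree closely
```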


Uses and applications

The expectation of a random variable plays an important role in a variety of contexts. For example, in decision theory, an agent making an optimal choice in the context of incomplete information is often assumed to maximize the expected value of their utility function. For a different example, in statistics, where one seeks estimates for unknown parameters based on available data, the estimate itself is a random variable. In such settings, a desirable criterion for a "good" estimator is that it is ''unbiased''; that is, the expected value of the estimate is equal to the true value of the underlying parameter.

It is possible to construct an expected value equal to the probability of an event by taking the expectation of an indicator function that is one if the event has occurred and zero otherwise. This relationship can be used to translate properties of expected values into properties of probabilities, e.g. using the law of large numbers to justify estimating probabilities by frequencies.

The expected values of the powers of ''X'' are called the moments of ''X''; the moments about the mean of ''X'' are expected values of powers of X - \operatorname{E}[X]. The moments of some random variables can be used to specify their distributions, via their moment generating functions.

To empirically estimate the expected value of a random variable, one repeatedly measures observations of the variable and computes the arithmetic mean of the results. If the expected value exists, this procedure estimates the true expected value in an unbiased manner and has the property of minimizing the sum of the squares of the residuals (the sum of the squared differences between the observations and the estimate). The law of large numbers demonstrates (under fairly mild conditions) that, as the size of the sample gets larger, the variance of this estimate gets smaller.

This property is often exploited in a wide variety of applications, including general problems of statistical estimation and machine learning, to estimate (probabilistic) quantities of interest via Monte Carlo methods, since most quantities of interest can be written in terms of expectation, e.g. \operatorname{P}(X \in \mathcal{A}) = \operatorname{E}[\mathbf{1}_{\mathcal{A}}(X)], where \mathbf{1}_{\mathcal{A}}(X) is the indicator function of the set \mathcal{A} (a minimal Monte Carlo sketch is given at the end of this section).

In classical mechanics, the center of mass is an analogous concept to expectation. For example, suppose ''X'' is a discrete random variable with values ''x_i'' and corresponding probabilities ''p_i''. Now consider a weightless rod on which are placed weights, at locations ''x_i'' along the rod and having masses ''p_i'' (whose sum is one). The point at which the rod balances is \operatorname{E}[X].

Expected values can also be used to compute the variance, by means of the computational formula for the variance
:\operatorname{Var}(X) = \operatorname{E}[X^2] - (\operatorname{E}[X])^2.
A very important application of the expectation value is in the field of quantum mechanics. The expectation value of a quantum mechanical operator \hat{A} operating on a quantum state vector |\psi\rangle is written as \langle\hat{A}\rangle = \langle\psi|\hat{A}|\psi\rangle. The uncertainty in \hat{A} can be calculated by the formula (\Delta A)^2 = \langle\hat{A}^2\rangle - \langle\hat{A}\rangle^2.
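Following the Monte Carlo remark above, here is a minimal sketch (names, seed, and sample size are illustrative choices): the probability P(Z > 1) for a standard normal Z is estimated as the empirical mean of indicator samples and compared with the closed-form value.

```python
import random
from math import erf, sqrt

rng = random.Random(42)
n = 10**6
# E[1_{Z > 1}] = P(Z > 1): average indicator samples for standard normal Z.
estimate = sum(1 for _ in range(n) if rng.gauss(0, 1) > 1) / n
exact = 0.5 * (1 - erf(1 / sqrt(2)))   # closed form: P(Z > 1) ≈ 0.1587
print(estimate, exact)
```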


See also

*Center of mass
*Central tendency
*Chebyshev's inequality (an inequality on location and scale parameters)
*Conditional expectation
*Expectation (the general term)
*Expectation value (quantum mechanics)
*Law of total expectation—the expected value of the conditional expected value of ''X'' given ''Y'' is the same as the expected value of ''X''
*Moment (mathematics)
*Nonlinear expectation (a generalization of the expected value)
*Sample mean
*Population mean
*Wald's equation—an equation for calculating the expected value of a random number of random variables


    __References_


    _Literature

    *_ *_ * * * * * *_ *


    _External_Links

    {{DEFAULTSORT:Expected_Value Theory_of_probability_distributions Gambling_terminology Articles_containing_proofshtml" ;"title="X, ]=0, then X=0 (a.s.). * If X = Y (a.s.), then \operatorname X= \operatorname Y/math>. In other words, if X and Y are random variables that take different values with probability zero, then the expectation of X will equal the expectation of Y. * If X=c (a.s.) for some real number , then \operatorname = c. In particular, for a random variable X with well-defined expectation, \operatorname operatorname[X = \operatorname /math>. A well defined expectation implies that there is one number, or rather, one constant that defines the expected value. Thus follows that the expectation of this constant is just the original expected value. * As a consequence of the formula as discussed above, together with the triangle inequality, it follows that for any random variable X with well-defined expectation, one has , \operatorname \leq \operatorname, X, . *Let denote the indicator function of an event , then is given by the probability of . This is nothing but a different way of stating the expectation of a
    Bernoulli random variable In probability theory and statistics, the Bernoulli distribution, named after Swiss mathematician Jacob Bernoulli,James Victor Uspensky: ''Introduction to Mathematical Probability'', McGraw-Hill, New York 1937, page 45 is the discrete probabili ...
    , as calculated in the table above. *
  • Formulas in terms of CDF: If F(x) is the cumulative distribution function of a random variable , then : \operatorname = \int_^\infty x\,dF(x), where the values on both sides are well defined or not well defined simultaneously, and the integral is taken in the sense of Lebesgue-Stieltjes. As a consequence of
    integration by parts In calculus, and more generally in mathematical analysis, integration by parts or partial integration is a process that finds the integral of a product of functions in terms of the integral of the product of their derivative and antiderivative. ...
    as applied to this representation of , it can be proved that \operatorname = \int_0^\infty (1-F(x))\,dx - \int^0_ F(x)\,dx, with the integrals taken in the sense of Lebesgue. As a special case, for any random variable valued in the nonnegative integers , one has \operatorname \sum _^\infty \operatorname(X>n), :where denotes the underlying probability measure. *Non-multiplicativity: In general, the expected value is not multiplicative, i.e. \operatorname Y/math> is not necessarily equal to \operatorname cdot \operatorname /math>. If X and Y are
    independent Independent or Independents may refer to: Arts, entertainment, and media Artist groups * Independents (artist group), a group of modernist painters based in the New Hope, Pennsylvania, area of the United States during the early 1930s * Independ ...
    , then one can show that \operatorname Y\operatorname \operatorname /math>. If the random variables are dependent, then generally \operatorname Y\neq \operatorname \operatorname /math>, although in special cases of dependency the equality may hold. *
    Law of the unconscious statistician In probability theory and statistics, the law of the unconscious statistician, or LOTUS, is a theorem used to calculate the expected value of a function ''g''(''X'') of a random variable ''X'' when one knows the probability distribution of ''X'' but ...
    : The expected value of a measurable function of X, g(X), given that X has a probability density function f(x), is given by the
    inner product In mathematics, an inner product space (or, rarely, a Hausdorff pre-Hilbert space) is a real vector space or a complex vector space with an operation called an inner product. The inner product of two vectors in the space is a scalar, often ...
    of f and g: \operatorname (X)= \int_ g(x) f(x)\, dx . This formula also holds in multidimensional case, when g is a function of several random variables, and f is their joint density.


    Inequalities

    Concentration inequalities control the likelihood of a random variable taking on large values.
    Markov's inequality In probability theory, Markov's inequality gives an upper bound for the probability that a non-negative function of a random variable is greater than or equal to some positive constant. It is named after the Russian mathematician Andrey Markov, ...
    is among the best-known and simplest to prove: for a ''nonnegative'' random variable and any positive number , it states that \operatorname(X\geq a)\leq\frac. If is any random variable with finite expectation, then Markov's inequality may be applied to the random variable to obtain Chebyshev's inequality \operatorname(, X-\text \geq a)\leq\frac, where is the
    variance In probability theory and statistics, variance is the expectation of the squared deviation of a random variable from its population mean or sample mean. Variance is a measure of dispersion, meaning it is a measure of how far a set of numbe ...
    . These inequalities are significant for their nearly complete lack of conditional assumptions. For example, for any random variable with finite expectation, the Chebyshev inequality implies that there is at least a 75% probability of an outcome being within two standard deviations of the expected value. However, in special cases the Markov and Chebyshev inequalities often give much weaker information than is otherwise available. For example, in the case of an unweighted dice, Chebyshev's inequality says that odds of rolling between 1 and 6 is at least 53%; in reality, the odds are of course 100%. The Kolmogorov inequality extends the Chebyshev inequality to the context of sums of random variables. The following three inequalities are of fundamental importance in the field of
    mathematical analysis Analysis is the branch of mathematics dealing with continuous functions, limit (mathematics), limits, and related theories, such as Derivative, differentiation, Integral, integration, measure (mathematics), measure, infinite sequences, series (m ...
* Jensen's inequality: Let f be a convex function and X a random variable with finite expectation. Then f(\operatorname{E}(X)) \leq \operatorname{E}(f(X)). Part of the assertion is that the negative part of f(X) has finite expectation, so that the right-hand side is well-defined (possibly infinite). Convexity of f can be phrased as saying that the output of the weighted average of ''two'' inputs under-estimates the same weighted average of the two outputs; Jensen's inequality extends this to the setting of completely general weighted averages, as represented by the expectation. In the special case that f(x) = |x|^{t/s} for positive numbers s < t, one obtains the Lyapunov inequality \left(\operatorname{E}|X|^s\right)^{1/s}\leq\left(\operatorname{E}|X|^t\right)^{1/t}. This can also be proved by the Hölder inequality. In measure theory, this is particularly notable for proving the inclusion L^t \subseteq L^s for 0 < s < t, in the special case of probability spaces. (A simulated check of Jensen's inequality follows this list.)
* Hölder's inequality: if p and q are numbers satisfying \tfrac{1}{p} + \tfrac{1}{q} = 1 (with p, q > 1), then \operatorname{E}|XY| \leq (\operatorname{E}|X|^p)^{1/p}(\operatorname{E}|Y|^q)^{1/q} for any random variables X and Y. The special case p = q = 2 is called the Cauchy–Schwarz inequality, and is particularly well-known.
* Minkowski inequality: given any number p \geq 1, for any random variables X and Y with \operatorname{E}|X|^p and \operatorname{E}|Y|^p both finite, it follows that \operatorname{E}|X+Y|^p is also finite and \Bigl(\operatorname{E}|X+Y|^p\Bigr)^{1/p}\leq\Bigl(\operatorname{E}|X|^p\Bigr)^{1/p}+\Bigl(\operatorname{E}|Y|^p\Bigr)^{1/p}.

The Hölder and Minkowski inequalities can be extended to general measure spaces, and are often given in that context. By contrast, the Jensen inequality is special to the case of probability spaces.
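The following sketch is our own illustration of Jensen's inequality, with f = exp chosen as the convex function and an arbitrary sample of standard normal draws; the inequality f(E[X]) ≤ E[f(X)] holds exactly for the empirical distribution, so the assertion never fails.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(100_000)  # draws of a random variable X

# Jensen with the convex function f(x) = exp(x):
# f(E[X]) should not exceed E[f(X)].
lhs = np.exp(x.mean())   # f(E[X]), close to exp(0) = 1
rhs = np.exp(x).mean()   # E[f(X)], close to exp(1/2) ~ 1.65 for standard normal
assert lhs <= rhs
print(lhs, rhs)
```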


Expectations under convergence of random variables

In general, it is not the case that \operatorname{E}[X_n] \to \operatorname{E}[X] even if X_n \to X pointwise. Thus, one cannot interchange limits and expectation without additional conditions on the random variables. To see this, let U be a random variable distributed uniformly on [0,1]. For n \geq 1, define a sequence of random variables X_n = n \cdot \mathbf{1}\left\{U \in \left(0, \tfrac{1}{n}\right)\right\}, with \mathbf{1}\{A\} being the indicator function of the event A. Then X_n \to 0 pointwise. But \operatorname{E}[X_n] = n \cdot \operatorname{P}\left(U \in \left(0, \tfrac{1}{n}\right)\right) = n \cdot \tfrac{1}{n} = 1 for each n. Hence, \lim_{n\to\infty} \operatorname{E}[X_n] = 1 \neq 0 = \operatorname{E}\left[\lim_{n\to\infty} X_n\right]. (A simulation of this counterexample is sketched below.) Analogously, for a general sequence of random variables \{Y_n : n \geq 0\}, the expected value operator is not \sigma-additive, i.e. in general \operatorname{E}\left[\sum_{n=0}^\infty Y_n\right] \neq \sum_{n=0}^\infty \operatorname{E}[Y_n]. An example is easily obtained by setting Y_0 = X_1 and Y_n = X_{n+1} - X_n for n \geq 1, where X_n is as in the previous example. A number of convergence results specify exact conditions which allow one to interchange limits and expectations, as specified below.
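A small simulation of this counterexample (our own sketch; the seed and sample size are arbitrary) shows the sample mean of X_n hovering near 1 for every n, even though X_n converges to 0 pointwise.

```python
import numpy as np

rng = np.random.default_rng(2)
u = rng.uniform(size=1_000_000)          # draws of U ~ Uniform[0, 1]

for n in (1, 10, 100, 1_000):
    x_n = n * ((0 < u) & (u < 1 / n))    # X_n = n * 1{U in (0, 1/n)}
    print(n, x_n.mean())                 # every sample mean is close to 1
```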
* Monotone convergence theorem: Let \{X_n : n \geq 0\} be a sequence of random variables, with 0 \leq X_n \leq X_{n+1} (a.s.) for each n \geq 0. Furthermore, let X_n \to X pointwise. Then the monotone convergence theorem states that \lim_n \operatorname{E}[X_n] = \operatorname{E}[X]. Using the monotone convergence theorem, one can show that expectation indeed satisfies countable additivity for non-negative random variables. In particular, let \{X_i\}_{i=0}^\infty be non-negative random variables. It follows from the monotone convergence theorem that \operatorname{E}\left[\sum_{i=0}^\infty X_i\right] = \sum_{i=0}^\infty \operatorname{E}[X_i]. (A numerical sketch of monotone convergence follows this list.)
* Fatou's lemma: Let \{X_n \geq 0 : n \geq 0\} be a sequence of non-negative random variables. Fatou's lemma states that \operatorname{E}[\liminf_n X_n] \leq \liminf_n \operatorname{E}[X_n]. Corollary: let X_n \geq 0 with \operatorname{E}[X_n] \leq C for all n \geq 0. If X_n \to X (a.s.), then \operatorname{E}[X] \leq C. The proof is by observing that X = \liminf_n X_n (a.s.) and applying Fatou's lemma.
* Dominated convergence theorem: Let \{X_n : n \geq 0\} be a sequence of random variables. If X_n \to X pointwise (a.s.), |X_n| \leq Y \leq +\infty (a.s.), and \operatorname{E}[Y] < \infty, then, according to the dominated convergence theorem,
** \operatorname{E}|X| \leq \operatorname{E}[Y] < \infty;
** \lim_n \operatorname{E}[X_n] = \operatorname{E}[X];
** \lim_n \operatorname{E}|X_n - X| = 0.
* Uniform integrability: In some cases, the equality \lim_n \operatorname{E}[X_n] = \operatorname{E}[\lim_n X_n] holds when the sequence \{X_n\} is ''uniformly integrable''.
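As a sketch of the monotone convergence theorem in action (our own example, with an arbitrary seed and sample size), take X exponential with mean 1 and the truncations X_n = min(X, n), which increase pointwise to X; the sample means climb toward E[X] = 1.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.exponential(scale=1.0, size=1_000_000)  # X >= 0 with E[X] = 1

# X_n = min(X, n) is non-negative, non-decreasing in n, and converges
# pointwise to X, so monotone convergence gives E[X_n] -> E[X] = 1.
for n in (1, 2, 4, 8):
    print(n, np.minimum(x, n).mean())           # climbs toward 1
```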


Relationship with characteristic function

The probability density function f_X of a scalar random variable X is related to its characteristic function \varphi_X by the inversion formula:
: f_X(x) = \frac{1}{2\pi}\int_{\mathbb{R}} e^{-itx}\varphi_X(t) \, \mathrm{d}t.
For the expected value of g(X) (where g:\mathbb{R}\to\mathbb{R} is a Borel function), we can use this inversion formula to obtain
: \operatorname{E}[g(X)] = \frac{1}{2\pi} \int_{\mathbb{R}} g(x)\left[ \int_{\mathbb{R}} e^{-itx}\varphi_X(t) \, \mathrm{d}t \right]\mathrm{d}x.
If \operatorname{E}[g(X)] is finite, changing the order of integration, we get, in accordance with the Fubini–Tonelli theorem,
: \operatorname{E}[g(X)] = \frac{1}{2\pi} \int_{\mathbb{R}} G(t) \varphi_X(t) \, \mathrm{d}t,
where
: G(t) = \int_{\mathbb{R}} g(x) e^{-itx} \, \mathrm{d}x
is the Fourier transform of g(x). The expression for \operatorname{E}[g(X)] also follows directly from the Plancherel theorem.
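A numerical sketch of the inversion formula (our own illustration; the grid and test points are arbitrary): for a standard normal X, \varphi_X(t) = e^{-t^2/2}, and discretizing the integral \tfrac{1}{2\pi}\int e^{-itx}\varphi_X(t)\,\mathrm{d}t should reproduce the normal density at each x.

```python
import numpy as np

# Characteristic function of the standard normal: phi(t) = exp(-t^2 / 2).
t = np.linspace(-40.0, 40.0, 400_001)
dt = t[1] - t[0]
phi = np.exp(-t**2 / 2)

for x in (0.0, 1.0, 2.0):
    # Inversion formula: f_X(x) = (1 / 2pi) * integral of e^{-itx} phi(t) dt.
    f_x = (np.exp(-1j * t * x) * phi).sum().real * dt / (2 * np.pi)
    exact = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
    print(x, f_x, exact)   # the reconstructed and exact densities agree
```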


Uses and applications

The expectation of a random variable plays an important role in a variety of contexts. For example, in decision theory, an agent making an optimal choice in the context of incomplete information is often assumed to maximize the expected value of their utility function. For a different example, in statistics, where one seeks estimates for unknown parameters based on available data, the estimate itself is a random variable. In such settings, a desirable criterion for a "good" estimator is that it is ''unbiased''; that is, the expected value of the estimate is equal to the true value of the underlying parameter. It is possible to construct an expected value equal to the probability of an event by taking the expectation of an indicator function that is one if the event has occurred and zero otherwise. This relationship can be used to translate properties of expected values into properties of probabilities, e.g. using the law of large numbers to justify estimating probabilities by frequencies. The expected values of the powers of ''X'' are called the moments of ''X''; the moments about the mean of ''X'' are expected values of powers of X - \operatorname{E}[X]. The moments of some random variables can be used to specify their distributions, via their moment generating functions. To empirically estimate the expected value of a random variable, one repeatedly measures observations of the variable and computes the arithmetic mean of the results. If the expected value exists, this procedure estimates the true expected value in an unbiased manner and has the property of minimizing the sum of the squares of the residuals (the sum of the squared differences between the observations and the estimate). The law of large numbers demonstrates (under fairly mild conditions) that, as the size of the sample gets larger, the variance of this estimate gets smaller. This property is often exploited in a wide variety of applications, including general problems of statistical estimation and machine learning, to estimate (probabilistic) quantities of interest via Monte Carlo methods, since most quantities of interest can be written in terms of expectation, e.g. \operatorname{P}(X \in \mathcal{A}) = \operatorname{E}[\mathbf{1}_{\mathcal{A}}(X)], where \mathbf{1}_{\mathcal{A}} is the indicator function of the set \mathcal{A}. (A minimal Monte Carlo sketch follows.)
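A minimal Monte Carlo sketch (our own example; the event X > 1, the seed, and the sample size are arbitrary choices): writing a probability as the expectation of an indicator lets the sample mean of the indicator estimate it.

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.standard_normal(1_000_000)   # draws of X

# P(X > 1) = E[1{X > 1}]: the sample mean of the indicator estimates it.
estimate = (x > 1.0).mean()
print(estimate)  # close to 1 - Phi(1), about 0.1587, for a standard normal X
```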
In classical mechanics, the center of mass is an analogous concept to expectation. For example, suppose ''X'' is a discrete random variable with values ''x_i'' and corresponding probabilities ''p_i''. Now consider a weightless rod on which are placed weights at locations ''x_i'' along the rod, having masses ''p_i'' (whose sum is one). The point at which the rod balances is \operatorname{E}[X]. Expected values can also be used to compute the variance, by means of the computational formula for the variance
: \operatorname{Var}(X) = \operatorname{E}[X^2] - (\operatorname{E}[X])^2.
A very important application of the expectation value is in the field of quantum mechanics. The expectation value of a quantum mechanical operator \hat{A} operating on a quantum state vector |\psi\rangle is written as \langle\hat{A}\rangle = \langle\psi|\hat{A}|\psi\rangle. The uncertainty in \hat{A} can be calculated by the formula (\Delta A)^2 = \langle\hat{A}^2\rangle - \langle\hat{A}\rangle^2. (A toy numerical example follows.)
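A small sketch of the quantum-mechanical expectation value (our own illustration using a finite-dimensional toy system, not from the original article): with a Hermitian operator represented as a matrix and a normalized state vector, \langle\hat{A}\rangle = \langle\psi|\hat{A}|\psi\rangle and (\Delta A)^2 = \langle\hat{A}^2\rangle - \langle\hat{A}\rangle^2 reduce to matrix-vector arithmetic.

```python
import numpy as np

# Toy observable: the Pauli-Z matrix, a Hermitian operator on a qubit.
A = np.array([[1.0, 0.0],
              [0.0, -1.0]])

# Normalized state |psi> = (|0> + |1>) / sqrt(2).
psi = np.array([1.0, 1.0]) / np.sqrt(2)

expectation = psi.conj() @ A @ psi          # <psi|A|psi> = 0 for this state
second_moment = psi.conj() @ (A @ A) @ psi  # <psi|A^2|psi> = 1, since Z^2 = I
uncertainty_sq = second_moment - expectation**2

print(expectation, uncertainty_sq)          # 0.0 and 1.0
```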


See also

* Center of mass
* Central tendency
* Chebyshev's inequality (an inequality on location and scale parameters)
* Conditional expectation
* Expectation (the general term)
* Expectation value (quantum mechanics)
* Law of total expectation—the expected value of the conditional expected value of ''X'' given ''Y'' is the same as the expected value of ''X''
* Moment (mathematics)
* Nonlinear expectation (a generalization of the expected value)
* Sample mean
* Population mean
* Wald's equation—an equation for calculating the expected value of a random number of random variables

