Jensen's inequality

In mathematics, Jensen's inequality, named after the Danish mathematician Johan Jensen, relates the value of a convex function of an integral to the integral of the convex function. It was proved by Jensen in 1906, building on an earlier proof of the same inequality for doubly-differentiable functions by Otto Hölder in 1889. Given its generality, the inequality appears in many forms depending on the context, some of which are presented below. In its simplest form the inequality states that the convex transformation of a mean is less than or equal to the mean applied after convex transformation; it is a simple corollary that the opposite is true of concave transformations.

Jensen's inequality generalizes the statement that the secant line of a convex function lies ''above'' the graph of the function, which is Jensen's inequality for two points: the secant line consists of weighted means of the convex function (for ''t'' ∈ [0,1]),

:t f(x_1) + (1-t) f(x_2),

while the graph of the function is the convex function of the weighted means,

:f(t x_1 + (1-t) x_2).

Thus, Jensen's inequality is

:f(t x_1 + (1-t) x_2) \leq t f(x_1) + (1-t) f(x_2).

In the context of probability theory, it is generally stated in the following form: if ''X'' is a random variable and \varphi is a convex function, then

:\varphi(\operatorname{E}[X]) \leq \operatorname{E}\left[\varphi(X)\right].

The difference between the two sides of the inequality, \operatorname{E}\left[\varphi(X)\right] - \varphi\left(\operatorname{E}[X]\right), is called the Jensen gap.
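For a concrete instance of the two-point form, take the convex function f(x) = x^2 and t = 1/2; the inequality then states that the square of an average never exceeds the average of the squares:

:\left(\frac{x_1+x_2}{2}\right)^2 \leq \frac{x_1^2+x_2^2}{2},

which rearranges to the evidently true statement \tfrac{1}{4}(x_1-x_2)^2 \geq 0.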


Statements

The classical form of Jensen's inequality involves several numbers and weights. The inequality can be stated quite generally using either the language of measure theory or (equivalently) probability. In the probabilistic setting, the inequality can be further generalized to its ''full strength''.


Finite form

For a real convex function \varphi, numbers x_1, x_2, \ldots, x_n in its domain, and positive weights a_i, Jensen's inequality can be stated as:

:\varphi\left(\frac{\sum a_i x_i}{\sum a_i}\right) \le \frac{\sum a_i \varphi(x_i)}{\sum a_i} \qquad (1)

and the inequality is reversed if \varphi is concave, which is

:\varphi\left(\frac{\sum a_i x_i}{\sum a_i}\right) \ge \frac{\sum a_i \varphi(x_i)}{\sum a_i} \qquad (2)

Equality holds if and only if x_1 = x_2 = \cdots = x_n or \varphi is linear on a domain containing x_1, x_2, \ldots, x_n.

As a particular case, if the weights a_i are all equal, then (1) and (2) become

:\varphi\left(\frac{\sum x_i}{n}\right) \le \frac{\sum \varphi(x_i)}{n} \qquad (3)

:\varphi\left(\frac{\sum x_i}{n}\right) \ge \frac{\sum \varphi(x_i)}{n} \qquad (4)

For instance, the function \log(x) is ''concave'', so substituting \varphi(x) = \log(x) in the previous formula (4) establishes the (logarithm of the) familiar arithmetic-mean/geometric-mean inequality:

:\log\!\left(\frac{x_1 + x_2 + \cdots + x_n}{n}\right) \geq \frac{\log(x_1) + \log(x_2) + \cdots + \log(x_n)}{n} \quad \text{that is} \quad \frac{x_1 + x_2 + \cdots + x_n}{n} \geq \sqrt[n]{x_1 x_2 \cdots x_n}.

A common application has ''x'' as a function of another variable (or set of variables) ''t'', that is, x_i = g(t_i). All of this carries directly over to the general continuous case: the weights a_i are replaced by a non-negative integrable function f(x), such as a probability distribution, and the summations are replaced by integrals.
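As a numerical sanity check of the finite form, the Python sketch below (the helper name jensen_finite is ours, not a standard one) draws random points and positive weights and verifies (1) for the convex square function and (2) for the concave logarithm, the latter being the weighted AM–GM inequality in logarithmic form:

    import math
    import random

    random.seed(0)

    def jensen_finite(phi, xs, ws):
        # Weighted mean of the points, and weighted mean of their images under phi.
        total = sum(ws)
        mean = sum(w * x for w, x in zip(ws, xs)) / total
        mean_phi = sum(w * phi(x) for w, x in zip(ws, xs)) / total
        return phi(mean), mean_phi

    xs = [random.uniform(0.1, 10.0) for _ in range(5)]
    ws = [random.uniform(0.1, 1.0) for _ in range(5)]

    # Convex phi (x -> x^2): phi of the mean is at most the mean of phi, as in (1).
    lhs, rhs = jensen_finite(lambda x: x * x, xs, ws)
    assert lhs <= rhs

    # Concave phi = log: the inequality reverses, as in (2); exponentiating
    # both sides gives the weighted AM-GM inequality.
    lhs, rhs = jensen_finite(math.log, xs, ws)
    assert lhs >= rhs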


Measure-theoretic and probabilistic form

Let (\Omega, A, \mu) be a probability space. Let f : \Omega \to \mathbb{R} be a \mu-measurable function and \varphi : \mathbb{R} \to \mathbb{R} be convex. Then:

:\varphi\left(\int_\Omega f \,\mathrm{d}\mu\right) \leq \int_\Omega \varphi \circ f \,\mathrm{d}\mu.

In real analysis, we may require an estimate on

:\varphi\left(\int_a^b f(x)\, dx\right)

where a, b \in \mathbb{R}, and f\colon [a, b] \to \R is a non-negative Lebesgue-integrable function. In this case, the Lebesgue measure of [a, b] need not be unity. However, by integration by substitution, the interval can be rescaled so that it has measure unity. Then Jensen's inequality can be applied to get

:\varphi\left(\frac{1}{b-a}\int_a^b f(x)\, dx\right) \le \frac{1}{b-a} \int_a^b \varphi(f(x)) \,dx.

The same result can be equivalently stated in a probability theory setting, by a simple change of notation. Let (\Omega, \mathfrak{F}, \operatorname{P}) be a probability space, ''X'' an integrable real-valued random variable and \varphi a convex function. Then:

:\varphi\left(\operatorname{E}[X]\right) \leq \operatorname{E}\left[\varphi(X)\right].

In this probability setting, the measure \mu is intended as a probability \operatorname{P}, the integral with respect to \mu as an expected value \operatorname{E}, and the function f as a random variable ''X''.

Note that the equality holds if and only if \varphi is a linear function on some convex set A such that \operatorname{P}(X \in A) = 1 (which follows by inspecting the measure-theoretic proof below).
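A minimal Monte Carlo sketch of the probabilistic form, assuming nothing beyond the Python standard library: ''X'' is drawn from an Exponential(1) distribution and \varphi(x) = e^{x/2} is a convex function with finite expectation under this law.

    import math
    import random

    random.seed(0)

    # X ~ Exponential(1), phi(x) = exp(x/2), which is convex; phi is chosen so
    # that E[phi(X)] = 1/(1 - 1/2) = 2 is finite (the MGF of Exp(1) at 1/2).
    samples = [random.expovariate(1.0) for _ in range(100_000)]
    phi = lambda x: math.exp(x / 2)

    mean_x = sum(samples) / len(samples)
    mean_phi = sum(phi(x) for x in samples) / len(samples)

    # phi(E[X]) = exp(1/2) ~ 1.65, E[phi(X)] ~ 2.0: Jensen's inequality holds.
    print(phi(mean_x), "<=", mean_phi)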


General inequality in a probabilistic setting

More generally, let ''T'' be a real topological vector space, and ''X'' a ''T''-valued integrable random variable. In this general setting, ''integrable'' means that there exists an element \operatorname{E}[X] in ''T'', such that for any element ''z'' in the dual space of ''T'': \operatorname{E}|\langle z, X \rangle| < \infty, and \langle z, \operatorname{E}[X]\rangle = \operatorname{E}[\langle z, X \rangle]. Then, for any measurable convex function \varphi and any sub-σ-algebra \mathfrak{G} of \mathfrak{F}:

:\varphi\left(\operatorname{E}\left[X \mid \mathfrak{G}\right]\right) \leq \operatorname{E}\left[\varphi(X) \mid \mathfrak{G}\right].

Here \operatorname{E}[\,\cdot \mid \mathfrak{G}] stands for the expectation conditioned to the σ-algebra \mathfrak{G}. This general statement reduces to the previous ones when the topological vector space ''T'' is the real axis, and \mathfrak{G} is the trivial σ-algebra \{\varnothing, \Omega\} (where \varnothing is the empty set, and \Omega is the sample space).
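The conditional form can be checked empirically on a finite sub-σ-algebra, where conditional expectation reduces to a within-group average. A minimal sketch (the variable names are illustrative), conditioning on the σ-algebra generated by the event \{Z \geq 0\}:

    import random
    from collections import defaultdict

    random.seed(0)

    # Condition on the sub-sigma-algebra generated by the event {Z >= 0}: the
    # conditional expectation is then just a within-group average.
    phi = lambda x: x * x  # convex
    groups = defaultdict(list)
    for _ in range(100_000):
        z = random.gauss(0.0, 1.0)
        x = z + random.gauss(0.0, 1.0)  # X correlated with the conditioning event
        groups[z >= 0].append(x)

    for event, xs in groups.items():
        cond_mean = sum(xs) / len(xs)
        cond_mean_phi = sum(phi(x) for x in xs) / len(xs)
        assert phi(cond_mean) <= cond_mean_phi
        print(event, round(phi(cond_mean), 4), "<=", round(cond_mean_phi, 4))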


A sharpened and generalized form

Let ''X'' be a one-dimensional random variable with mean \mu and variance \sigma^2 \ge 0. Let \varphi(x) be a twice differentiable function, and define the function

:h(x) \triangleq \frac{\varphi(x) - \varphi(\mu)}{(x-\mu)^2} - \frac{\varphi'(\mu)}{x-\mu}.

Then

:\sigma^2 \inf_x \frac{\varphi''(x)}{2} \le \sigma^2 \inf_x h(x) \le \operatorname{E}\left[\varphi(X)\right] - \varphi\left(\operatorname{E}[X]\right) \le \sigma^2 \sup_x h(x) \le \sigma^2 \sup_x \frac{\varphi''(x)}{2}.

In particular, when \varphi(x) is convex, then \varphi''(x) \ge 0, and the standard form of Jensen's inequality immediately follows for the case where \varphi(x) is additionally assumed to be twice differentiable.
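As a worked check of these bounds, take \varphi(x) = x^2. Then h(x) simplifies to a constant:

:h(x) = \frac{x^2-\mu^2}{(x-\mu)^2} - \frac{2\mu}{x-\mu} = \frac{(x-\mu)(x+\mu) - 2\mu(x-\mu)}{(x-\mu)^2} = 1,

so \sigma^2 \inf h = \sigma^2 \sup h = \sigma^2, and the chain of inequalities collapses to the exact identity \operatorname{E}[X^2] - (\operatorname{E}[X])^2 = \sigma^2: for the square function the sharpened bounds are tight.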


Proofs

Jensen's inequality can be proved in several ways, and three proofs corresponding to the different statements above will be offered. Before embarking on these mathematical derivations, however, it is worth analyzing an intuitive graphical argument based on the probabilistic case where ''X'' is a real number. Assuming a hypothetical distribution of ''X'' values, one can immediately identify the position of \operatorname{E}[X] and its image \varphi(\operatorname{E}[X]) in the graph. Noticing that for convex mappings Y = \varphi(X) the corresponding distribution of ''Y'' values is increasingly "stretched out" for increasing values of ''X'', it is easy to see that the distribution of ''Y'' is broader in the interval corresponding to X > X_0 and narrower in X < X_0 for any X_0; in particular, this is also true for X_0 = \operatorname{E}[X]. Consequently, in this picture the expectation of ''Y'' will always shift upwards with respect to the position of \varphi(\operatorname{E}[X]). A similar reasoning holds if the distribution of ''X'' covers a decreasing portion of the convex function, or both a decreasing and an increasing portion of it. This "proves" the inequality, i.e.

:\varphi(\operatorname{E}[X]) \leq \operatorname{E}[\varphi(X)] = \operatorname{E}[Y],

with equality when \varphi(X) is not strictly convex, e.g. when it is a straight line, or when ''X'' follows a degenerate distribution (i.e. ''X'' is a constant). The proofs below formalize this intuitive notion.


Proof 1 (finite form)

If \lambda_1 and \lambda_2 are two arbitrary nonnegative real numbers such that \lambda_1 + \lambda_2 = 1, then convexity of \varphi implies

:\forall x_1, x_2: \qquad \varphi \left(\lambda_1 x_1 + \lambda_2 x_2 \right) \leq \lambda_1\,\varphi(x_1) + \lambda_2\,\varphi(x_2).

This can be generalized: if \lambda_1, \ldots, \lambda_n are nonnegative real numbers such that \lambda_1 + \cdots + \lambda_n = 1, then

:\varphi(\lambda_1 x_1 + \lambda_2 x_2 + \cdots + \lambda_n x_n) \leq \lambda_1\,\varphi(x_1) + \lambda_2\,\varphi(x_2) + \cdots + \lambda_n\,\varphi(x_n),

for any x_1, \ldots, x_n.

The ''finite form'' of Jensen's inequality can be proved by induction: by the convexity hypothesis, the statement is true for ''n'' = 2. Suppose the statement is true for some ''n'', so

:\varphi\left(\sum_{i=1}^{n}\lambda_i x_i\right) \leq \sum_{i=1}^{n}\lambda_i \varphi\left(x_i\right)

for any \lambda_1, \ldots, \lambda_n such that \lambda_1 + \cdots + \lambda_n = 1. One needs to prove it for n + 1. At least one of the \lambda_i is strictly smaller than 1, say \lambda_{n+1}; therefore by the convexity inequality:

:\begin{align} \varphi\left(\sum_{i=1}^{n+1}\lambda_i x_i\right) &= \varphi\left((1-\lambda_{n+1})\sum_{i=1}^{n} \frac{\lambda_i}{1-\lambda_{n+1}} x_i + \lambda_{n+1} x_{n+1} \right) \\ &\leq (1-\lambda_{n+1}) \varphi\left(\sum_{i=1}^{n} \frac{\lambda_i}{1-\lambda_{n+1}} x_i \right) + \lambda_{n+1}\,\varphi(x_{n+1}). \end{align}

Since \lambda_1 + \cdots + \lambda_n + \lambda_{n+1} = 1,

:\sum_{i=1}^{n} \frac{\lambda_i}{1-\lambda_{n+1}} = 1,

applying the induction hypothesis gives

:\varphi\left(\sum_{i=1}^{n}\frac{\lambda_i}{1-\lambda_{n+1}} x_i\right) \leq \sum_{i=1}^{n}\frac{\lambda_i}{1-\lambda_{n+1}} \varphi(x_i),

therefore

:\varphi\left(\sum_{i=1}^{n+1}\lambda_i x_i\right) \leq (1-\lambda_{n+1}) \sum_{i=1}^{n}\frac{\lambda_i}{1-\lambda_{n+1}} \varphi(x_i) + \lambda_{n+1}\,\varphi(x_{n+1}) = \sum_{i=1}^{n+1}\lambda_i \varphi(x_i).

We deduce that the inequality is true for n + 1; by the principle of mathematical induction it follows that the result holds for every integer ''n'' ≥ 2.

In order to obtain the general inequality from this finite form, one needs to use a density argument. The finite form can be rewritten as:

:\varphi\left(\int x\,d\mu_n(x) \right) \leq \int \varphi(x)\,d\mu_n(x),

where \mu_n is a measure given by an arbitrary convex combination of Dirac deltas:

:\mu_n = \sum_{i=1}^n \lambda_i \delta_{x_i}.

Since convex functions are continuous, and since convex combinations of Dirac deltas are weakly dense in the set of probability measures (as could be easily verified), the general statement is obtained simply by a limiting procedure.


Proof 2 (measure-theoretic form)

Let g be a real-valued \mu-integrable function on a probability space \Omega, and let \varphi be a convex function on the real numbers. Since \varphi is convex, at each real number x we have a nonempty set of subderivatives, which may be thought of as lines touching the graph of \varphi at x, but which are at or below the graph of \varphi at all points (support lines of the graph).

Now, if we define

:x_0 := \int_\Omega g\, d\mu,

because of the existence of subderivatives for convex functions, we may choose a and b such that

:ax + b \leq \varphi(x),

for all real x and

:ax_0 + b = \varphi(x_0).

But then we have that

:\varphi \circ g(\omega) \geq ag(\omega) + b

for almost all \omega \in \Omega. Since we have a probability measure, the integral is monotone with \mu(\Omega) = 1 so that

:\int_\Omega \varphi \circ g\, d\mu \geq \int_\Omega (ag + b)\, d\mu = a\int_\Omega g\, d\mu + b\int_\Omega d\mu = ax_0 + b = \varphi(x_0) = \varphi\left(\int_\Omega g\, d\mu \right),

as desired.
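A small numeric illustration of this support-line construction, for the convex but non-differentiable \varphi(x) = |x| at x_0 = 0, where the subderivatives form the whole interval [-1, 1]:

    # Support-line construction for the convex, non-differentiable phi(x) = |x|
    # at x0 = 0: every slope a in the subdifferential [-1, 1] yields a line
    # a*x + b with b = phi(x0) - a*x0 that touches the graph at x0 and lies
    # below it everywhere, exactly as the proof requires.
    phi = abs
    x0 = 0.0
    grid = [x0 + 0.01 * k for k in range(-500, 501)]
    for a in (-1.0, -0.3, 0.0, 0.5, 1.0):
        b = phi(x0) - a * x0
        assert all(a * x + b <= phi(x) + 1e-12 for x in grid)
    print("all chosen support lines lie on or below the graph of |x|")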


Proof 3 (general inequality in a probabilistic setting)

Let ''X'' be an integrable random variable that takes values in a real topological vector space ''T''. Since \varphi: T \to \R is convex, for any x, y \in T, the quantity

:\frac{\varphi(x+\theta\,y)-\varphi(x)}{\theta}

is decreasing as \theta approaches 0^+. In particular, the ''subdifferential'' of \varphi evaluated at x in the direction y is well-defined by

:(D\varphi)(x)\cdot y := \lim_{\theta \downarrow 0} \frac{\varphi(x+\theta\,y)-\varphi(x)}{\theta} = \inf_{\theta > 0} \frac{\varphi(x+\theta\,y)-\varphi(x)}{\theta}.

It is easily seen that the subdifferential is linear in y (in fact this step is not obvious: proving the assertion requires the Hahn–Banach theorem) and, since the infimum taken in the right-hand side of the previous formula is smaller than the value of the same term for \theta = 1, one gets

:\varphi(x) \leq \varphi(x+y) - (D\varphi)(x)\cdot y.

In particular, for an arbitrary sub-σ-algebra \mathfrak{G} we can evaluate the last inequality when x = \operatorname{E}[X \mid \mathfrak{G}],\ y = X - \operatorname{E}[X \mid \mathfrak{G}] to obtain

:\varphi\left(\operatorname{E}[X \mid \mathfrak{G}]\right) \leq \varphi(X) - (D\varphi)\left(\operatorname{E}[X \mid \mathfrak{G}]\right)\cdot \left(X - \operatorname{E}[X \mid \mathfrak{G}]\right).

Now, if we take the expectation conditioned to \mathfrak{G} on both sides of the previous expression, we get the result since:

:\operatorname{E}\left[\left[(D\varphi)\left(\operatorname{E}[X \mid \mathfrak{G}]\right)\cdot \left(X - \operatorname{E}[X \mid \mathfrak{G}]\right)\right] \mid \mathfrak{G}\right] = (D\varphi)\left(\operatorname{E}[X \mid \mathfrak{G}]\right)\cdot \operatorname{E}\left[\left(X - \operatorname{E}[X \mid \mathfrak{G}]\right) \mid \mathfrak{G}\right] = 0,

by the linearity of the subdifferential in the ''y'' variable, and the following well-known property of the conditional expectation:

:\operatorname{E}\left[\left(\operatorname{E}[X \mid \mathfrak{G}]\right) \mid \mathfrak{G}\right] = \operatorname{E}[X \mid \mathfrak{G}].


Applications and special cases


Form involving a probability density function

Suppose \Omega is a measurable subset of the real line and ''f''(''x'') is a non-negative function such that

:\int_{-\infty}^\infty f(x)\,dx = 1.

In probabilistic language, ''f'' is a probability density function. Then Jensen's inequality becomes the following statement about convex integrals: if ''g'' is any real-valued measurable function and \varphi is convex over the range of ''g'', then

:\varphi\left(\int_{-\infty}^\infty g(x)f(x)\, dx\right) \le \int_{-\infty}^\infty \varphi(g(x)) f(x)\, dx.

If ''g''(''x'') = ''x'', then this form of the inequality reduces to a commonly used special case:

:\varphi\left(\int_{-\infty}^\infty x\, f(x)\, dx\right) \le \int_{-\infty}^\infty \varphi(x)\,f(x)\, dx.

This is applied in variational Bayesian methods.
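As a rough numerical check of the density form, a Riemann-sum sketch with ''f'' the Exp(1) density, ''g''(''x'') = ''x'' and \varphi(x) = x^2; the truncation point and step size below are arbitrary choices:

    import math

    # Rough Riemann-sum check of the density form with f the Exp(1) density,
    # g(x) = x and phi(x) = x^2; the integral is truncated at x = 40, beyond
    # which the density is negligible.
    dx = 1e-3
    xs = [k * dx for k in range(1, 40_000)]
    f = lambda x: math.exp(-x)
    phi = lambda x: x * x

    mean_g = sum(x * f(x) * dx for x in xs)           # ~ E[X] = 1
    mean_phi_g = sum(phi(x) * f(x) * dx for x in xs)  # ~ E[X^2] = 2
    print(phi(mean_g), "<=", mean_phi_g)              # ~1.0 <= ~2.0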


Example: even moments of a random variable

If ''g''(''x'') = x^{2n}, and ''X'' is a random variable, then ''g'' is convex as

:\frac{d^2 g}{dx^2}(x) = 2n(2n - 1)x^{2n-2} \geq 0 \quad \forall\ x \in \R

and so

:g(\operatorname{E}[X]) = (\operatorname{E}[X])^{2n} \leq \operatorname{E}[X^{2n}].

In particular, if some even moment 2''n'' of ''X'' is finite, ''X'' has a finite mean. An extension of this argument shows that ''X'' has finite moments of every order l \in \N dividing ''n''.
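For ''n'' = 1 this reduces to (\operatorname{E}[X])^2 \leq \operatorname{E}[X^2], which is just the statement that the variance \operatorname{Var}(X) = \operatorname{E}[X^2] - (\operatorname{E}[X])^2 is non-negative.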


Alternative finite form

Let \Omega = \{x_1, \ldots, x_n\}, and take \mu to be the counting measure on \Omega; then the general form reduces to a statement about sums:

:\varphi\left(\sum_{i=1}^{n} g(x_i)\lambda_i \right) \le \sum_{i=1}^{n} \varphi(g(x_i)) \lambda_i,

provided that \lambda_i \ge 0 and

:\lambda_1 + \cdots + \lambda_n = 1.

There is also an infinite discrete form.


Statistical physics

Jensen's inequality is of particular importance in statistical physics when the convex function is an exponential, giving:

:e^{\operatorname{E}[X]} \leq \operatorname{E}\left[e^X\right],

where the expected values are with respect to some probability distribution in the random variable ''X''.

Proof: let \varphi(x) = e^x in \varphi\left(\operatorname{E}[X]\right) \leq \operatorname{E}\left[\varphi(X)\right].
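A Monte Carlo sketch of this exponential form, using a Gaussian ''X'' for which the right-hand side is known in closed form:

    import math
    import random

    random.seed(0)

    # For Gaussian X ~ N(mu, s^2) the right-hand side is known exactly:
    # E[exp(X)] = exp(mu + s^2/2), so the Jensen gap is a factor exp(s^2/2).
    mu, s = -1.0, 0.7
    xs = [random.gauss(mu, s) for _ in range(200_000)]

    lhs = math.exp(sum(xs) / len(xs))             # exp(E[X])
    rhs = sum(math.exp(x) for x in xs) / len(xs)  # E[exp(X)]
    print(lhs, "<=", rhs, "; exact rhs:", math.exp(mu + s * s / 2))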


Information theory

If ''p''(''x'') is the true probability density for ''X'', and ''q''(''x'') is another density, then applying Jensen's inequality for the random variable ''Y''(''X'') = ''q''(''X'')/''p''(''X'') and the convex function \varphi(y) = -\log(y) gives

:\operatorname{E}[\varphi(Y)] \ge \varphi(\operatorname{E}[Y]).

Therefore:

:-D(p(x)\,\|\,q(x)) = \int p(x) \log \left(\frac{q(x)}{p(x)} \right) \, dx \le \log \left( \int p(x) \frac{q(x)}{p(x)}\,dx \right) = \log \left(\int q(x)\,dx \right) = 0,

a result called Gibbs' inequality. It shows that the average message length is minimised when codes are assigned on the basis of the true probabilities ''p'' rather than any other distribution ''q''. The quantity that is non-negative is called the Kullback–Leibler divergence of ''q'' from ''p''. Since -\log(x) is a strictly convex function for x > 0, it follows that equality holds when ''p''(''x'') equals ''q''(''x'') almost everywhere.
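A small discrete check of Gibbs' inequality (the distributions p and q below are arbitrary examples):

    import math

    # Discrete check of Gibbs' inequality: D(p || q) >= 0, with equality iff p = q.
    def kl(p, q):
        return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

    p = [0.2, 0.5, 0.3]
    q = [0.4, 0.4, 0.2]
    print("D(p||q) =", kl(p, q))   # strictly positive since p != q
    print("D(p||p) =", kl(p, p))   # exactly 0.0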


Rao–Blackwell theorem

If ''L'' is a convex function and \mathfrak{G} a sub-sigma-algebra, then, from the conditional version of Jensen's inequality, we get

:L(\operatorname{E}[\delta(X) \mid \mathfrak{G}]) \le \operatorname{E}[L(\delta(X)) \mid \mathfrak{G}] \quad \Longrightarrow \quad \operatorname{E}[L(\operatorname{E}[\delta(X) \mid \mathfrak{G}])] \le \operatorname{E}[L(\delta(X))].

So if δ(''X'') is some estimator of an unobserved parameter θ given a vector of observables ''X''; and if ''T''(''X'') is a sufficient statistic for θ; then an improved estimator, in the sense of having a smaller expected loss ''L'', can be obtained by calculating

:\delta_1(X) = \operatorname{E}_{\theta}[\delta(X') \mid T(X') = T(X)],

the expected value of δ with respect to θ, taken over all possible vectors of observations ''X''′ compatible with the same value of ''T''(''X'') as that observed. Further, because ''T'' is a sufficient statistic, \delta_1(X) does not depend on θ and hence is itself a statistic. This result is known as the Rao–Blackwell theorem.
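A minimal simulation sketch of the Rao–Blackwell improvement, for a hypothetical Bernoulli setup: the crude unbiased estimator \delta(X) = X_1 is replaced by \delta_1(X) = \operatorname{E}[X_1 \mid T] = T/n, where T = \sum_i X_i is sufficient for ''p''.

    import random
    import statistics

    random.seed(0)

    # Estimate p from n Bernoulli(p) flips. The crude unbiased estimator
    # delta(X) = X_1 uses only the first flip; conditioning on the sufficient
    # statistic T = sum(X) gives E[X_1 | T] = T/n, the sample mean, whose
    # variance is smaller by a factor of n.
    p, n, reps = 0.3, 20, 50_000
    crude, improved = [], []
    for _ in range(reps):
        flips = [1 if random.random() < p else 0 for _ in range(n)]
        crude.append(flips[0])
        improved.append(sum(flips) / n)

    print("var(delta)   ~", statistics.pvariance(crude))     # ~ p(1-p)   = 0.21
    print("var(delta_1) ~", statistics.pvariance(improved))  # ~ p(1-p)/n = 0.0105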


Financial performance simulation

A popular method of measuring the performance of an investment is the internal rate of return (IRR), the rate at which a series of uncertain future cash flows is discounted, using present value theory, so that the sum of the discounted future cash flows equals the initial investment. While it is tempting to estimate the expected IRR by Monte Carlo simulation of the cash flows, Jensen's inequality introduces a bias, because the IRR is a curved (nonlinear) function of the cash flows while the expectation operator is linear: in general, the expected IRR differs from the IRR of the expected cash flows.
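A sketch of this bias, with a hypothetical three-period investment and a hand-rolled bisection IRR solver (the function irr below is our own, not a library call; all numbers are illustrative):

    import random

    random.seed(0)

    def irr(cashflows, lo=-0.99, hi=10.0, tol=1e-9):
        # Bisection on NPV(rate) = 0; assumes a single sign change
        # (an initial outlay followed by positive inflows).
        npv = lambda r: sum(cf / (1 + r) ** t for t, cf in enumerate(cashflows))
        while hi - lo > tol:
            mid = (lo + hi) / 2
            lo, hi = (mid, hi) if npv(mid) > 0 else (lo, mid)
        return (lo + hi) / 2

    # Hypothetical project: outlay of 100 now, three uncertain future cash flows.
    scenarios = [[-100.0] + [random.uniform(20.0, 80.0) for _ in range(3)]
                 for _ in range(20_000)]

    mean_of_irrs = sum(irr(cf) for cf in scenarios) / len(scenarios)
    irr_of_means = irr([sum(col) / len(scenarios) for col in zip(*scenarios)])
    print("E[IRR] =", mean_of_irrs, " vs  IRR(E[cash flows]) =", irr_of_means)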


See also

* Karamata's inequality for a more general inequality
* Popoviciu's inequality
* Law of averages
* A proof without words of Jensen's inequality




References

* Tristan Needham (1993). "A Visual Explanation of Jensen's Inequality". ''American Mathematical Monthly'' 100(8):768–771.
* Sam Savage (2012). ''The Flaw of Averages: Why We Underestimate Risk in the Face of Uncertainty'' (1st ed.). Wiley. ISBN 978-0471381976.


External links


* Jensen's Operator Inequality of Hansen and Pedersen.