In probability theory and theoretical computer science, McDiarmid's inequality is a concentration inequality which bounds the deviation between the sampled value and the expected value of certain functions when they are evaluated on independent random variables. McDiarmid's inequality applies to functions that satisfy a ''bounded differences'' property, meaning that replacing a single argument to the function while leaving all other arguments unchanged cannot cause too large of a change in the value of the function.
=Statement=
A function f: \mathcal{X}_1 \times \mathcal{X}_2 \times \cdots \times \mathcal{X}_n \rightarrow \mathbb{R} satisfies the ''bounded differences property'' if substituting the value of the ith coordinate x_i changes the value of f by at most c_i. More formally, if there are constants c_1, c_2, \dots, c_n such that for all i \in [n] and all x_1 \in \mathcal{X}_1, x_2 \in \mathcal{X}_2, \ldots, x_n \in \mathcal{X}_n,
:
\sup_{x_i' \in \mathcal{X}_i} \left| f(x_1, \dots, x_{i-1}, x_i, x_{i+1}, \ldots, x_n) - f(x_1, \dots, x_{i-1}, x_i', x_{i+1}, \ldots, x_n) \right| \leq c_i.

'''McDiarmid's Inequality.''' Let f: \mathcal{X}_1 \times \mathcal{X}_2 \times \cdots \times \mathcal{X}_n \rightarrow \mathbb{R} satisfy the bounded differences property with bounds c_1, c_2, \dots, c_n. Consider independent random variables X_1, X_2, \dots, X_n where X_i \in \mathcal{X}_i for all i. Then, for any \varepsilon > 0,
:
\operatorname{P}\left(f(X_1, X_2, \ldots, X_n) - \mathbb{E}[f(X_1, X_2, \ldots, X_n)] \geq \varepsilon\right) \leq \exp\left(-\frac{2\varepsilon^2}{\sum_{i=1}^{n} c_i^2}\right),

and, by symmetry and a union bound,
:
\operatorname{P}\left(|f(X_1, X_2, \ldots, X_n) - \mathbb{E}[f(X_1, X_2, \ldots, X_n)]| \geq \varepsilon\right) \leq 2\exp\left(-\frac{2\varepsilon^2}{\sum_{i=1}^{n} c_i^2}\right).
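As a numeric sanity check (an illustration of ours, not part of the standard exposition; the function and parameter names below are hypothetical), consider the empirical mean of n independent Uniform[0,1] variables. Changing one coordinate moves the mean by at most c_i = 1/n, so the one-sided bound becomes exp(-2nε²):

```python
import math
import random

# Monte Carlo sanity check of McDiarmid's inequality for the empirical mean
# f(x_1, ..., x_n) = (x_1 + ... + x_n) / n of n independent Uniform[0,1]
# variables. Replacing one coordinate moves f by at most c_i = 1/n, so the
# one-sided bound is exp(-2 eps^2 / sum_i c_i^2) = exp(-2 n eps^2).

def deviation_frequency(n, eps, trials, seed=0):
    """Estimate P(f - E[f] >= eps) by simulation."""
    rng = random.Random(seed)
    expected = 0.5  # E[f] when each X_i ~ Uniform[0,1]
    hits = sum(
        (sum(rng.random() for _ in range(n)) / n) - expected >= eps
        for _ in range(trials)
    )
    return hits / trials

n, eps = 100, 0.1
observed = deviation_frequency(n, eps, trials=20000)
bound = math.exp(-2 * n * eps**2)
print(observed <= bound)  # the empirical frequency respects the bound
```

In this case the bound exp(-2) ≈ 0.135 is loose: the true deviation probability is far smaller, as is typical for concentration inequalities that use only the bounded-differences constants.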
=Extensions=
==Unbalanced distributions==
A stronger bound may be given when the arguments to the function are sampled from unbalanced distributions, such that resampling a single argument rarely causes a large change to the function value.
This may be used to characterize, for example, the value of a function on graphs when evaluated on sparse random graphs and hypergraphs, since in a sparse random graph, it is much more likely for any particular edge to be missing than to be present.
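To make the sparsity intuition concrete, here is a small simulation (our own sketch; names such as change_frequency are hypothetical) estimating how often resampling a single edge indicator of a sparse Erdős–Rényi graph changes its triangle count:

```python
import itertools
import random

# In a sparse Erdos-Renyi graph G(n, p), resampling a single edge
# indicator rarely changes the triangle count: with small p, the edge
# is very likely absent both before and after, and even a flipped edge
# rarely participates in a triangle.

def triangle_count(edges, n):
    """Count triangles, given a dict mapping each vertex pair to 0/1."""
    return sum(
        edges[(a, b)] and edges[(a, c)] and edges[(b, c)]
        for a, b, c in itertools.combinations(range(n), 3)
    )

def change_frequency(n, p, trials, seed=0):
    """Estimate P(resampling one random edge changes the triangle count)."""
    rng = random.Random(seed)
    pairs = list(itertools.combinations(range(n), 2))
    changed = 0
    for _ in range(trials):
        edges = {e: int(rng.random() < p) for e in pairs}
        base = triangle_count(edges, n)
        e = rng.choice(pairs)               # resample a single coordinate
        edges[e] = int(rng.random() < p)
        changed += triangle_count(edges, n) != base
    return changed / trials

print(change_frequency(n=12, p=0.05, trials=500))  # typically well below 1%
```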
==Differences bounded with high probability==
McDiarmid's inequality may be extended to the case where the function being analyzed does not strictly satisfy the bounded differences property, but large differences remain very rare.
There exist stronger refinements to this analysis in some distribution-dependent scenarios, such as those that arise in
learning theory.
==Sub-Gaussian and sub-exponential norms==
Let the ''kth centered conditional version'' of a function f be
:
f_k(X)(x) := f(x_1, \ldots, x_{k-1}, X_k, x_{k+1}, \ldots, x_n) - \mathbb{E}_{X'_k} f(x_1, \ldots, x_{k-1}, X'_k, x_{k+1}, \ldots, x_n),
so that f_k(X)(x) is a random variable (through X_k) for each fixed choice of the remaining coordinates x_1, \ldots, x_{k-1}, x_{k+1}, \ldots, x_n.
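As a concrete illustration (our own worked example), take f to be the sum of its arguments; then the kth centered conditional version reduces to the centered variable itself, so that its sub-Gaussian or sub-exponential norm is just that of X_k:

```latex
f_k(X)(x) = \Bigl( \sum_{i \neq k} x_i + X_k \Bigr)
          - \mathbb{E}_{X'_k} \Bigl( \sum_{i \neq k} x_i + X'_k \Bigr)
          = X_k - \mathbb{E}[X_k].
```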
==Bennett and Bernstein forms==
Refinements to McDiarmid's inequality in the style of Bennett's inequality and Bernstein inequalities are made possible by defining a variance term for each function argument. Let
:
\begin{align}
B &:= \max_{1 \le k \le n} \sup_{x_1, \dots, x_{k-1}, x_{k+1}, \dots, x_n} \left| f(x_1, \dots, x_{k-1}, X_k, x_{k+1}, \dots, x_n) - \mathbb{E}_{X_k} f(x_1, \dots, x_{k-1}, X_k, x_{k+1}, \dots, x_n) \right|, \\
V_k &:= \sup_{x_1, \dots, x_{k-1}, x_{k+1}, \dots, x_n} \mathbb{E}_{X_k} \left( f(x_1, \dots, x_{k-1}, X_k, x_{k+1}, \dots, x_n) - \mathbb{E}_{X_k} f(x_1, \dots, x_{k-1}, X_k, x_{k+1}, \dots, x_n) \right)^2, \\
\tilde{\sigma}^2 &:= \sum_{k=1}^{n} V_k.
\end{align}
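For intuition (again our own worked example, not part of the general definitions), specializing to the sum f(x) = x_1 + \cdots + x_n recovers the familiar variance proxy of Bernstein's inequality:

```latex
V_k = \sup_{x_1, \dots, x_{k-1}, x_{k+1}, \dots, x_n}
      \mathbb{E}_{X_k} \bigl( X_k - \mathbb{E}[X_k] \bigr)^2
    = \operatorname{Var}(X_k),
\qquad
\tilde{\sigma}^2 = \sum_{k=1}^{n} \operatorname{Var}(X_k).
```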
=Proof=
The following proof of McDiarmid's inequality constructs the Doob martingale tracking the conditional expected value of the function as more and more of its arguments are sampled and conditioned on, and then applies a martingale concentration inequality (Azuma's inequality). An alternate argument avoiding the use of martingales also exists, taking advantage of the independence of the function arguments to provide a Chernoff-bound-like argument.
For better readability, we will introduce a notational shorthand: z_{(i:j)} will denote z_i, \dots, z_j for any z \in \mathcal{X}^n and integers 1 \le i \le j \le n, so that, for example,
: f(X_{(1:i-1)}, y, x_{(i+1:n)}) := f(X_1, \ldots, X_{i-1}, y, x_{i+1}, \ldots, x_n).
Pick any x_1', x_2', \ldots, x_n'. Then, for any x_1, x_2, \ldots, x_n, by the triangle inequality,
:
\begin{align}
\left| f(x_{(1:n)}) - f(x'_{(1:n)}) \right| &\leq \left| f(x_{(1:n)}) - f(x'_{(1:n-1)}, x_n) \right| + c_n \\
&\leq \left| f(x_{(1:n)}) - f(x'_{(1:n-2)}, x_{(n-1:n)}) \right| + c_{n-1} + c_n \\
&\leq \ldots \\
&\leq \sum_{i=1}^{n} c_i,
\end{align}
and thus f is bounded.
Since f is bounded, define the Doob martingale \{Z_i\} (each Z_i being a random variable depending on the random values of X_1, \ldots, X_i) as
: Z_i := \mathbb{E}[f(X_{(1:n)}) \mid X_{(1:i)}]
for all i \geq 1 and Z_0 := \mathbb{E}[f(X_{(1:n)})], so that Z_n = f(X_{(1:n)}).
Now define the random variables for each i
:
\begin{align}
U_i &:= \sup_{x \in \mathcal{X}_i} \mathbb{E}[f(X_{(1:i-1)}, x, X_{(i+1:n)}) \mid X_{(1:i-1)}, X_i = x] - \mathbb{E}[f(X_{(1:i-1)}, X_{(i:n)}) \mid X_{(1:i-1)}], \\
L_i &:= \inf_{x \in \mathcal{X}_i} \mathbb{E}[f(X_{(1:i-1)}, x, X_{(i+1:n)}) \mid X_{(1:i-1)}, X_i = x] - \mathbb{E}[f(X_{(1:i-1)}, X_{(i:n)}) \mid X_{(1:i-1)}].
\end{align}
Since X_i, \ldots, X_n are independent of each other, conditioning on X_i = x does not affect the probabilities of the other variables, so these are equal to the expressions
:
\begin{align}
U_i &= \sup_{x \in \mathcal{X}_i} \mathbb{E}[f(X_{(1:i-1)}, x, X_{(i+1:n)}) - f(X_{(1:i-1)}, X_{(i:n)}) \mid X_{(1:i-1)}], \\
L_i &= \inf_{x \in \mathcal{X}_i} \mathbb{E}[f(X_{(1:i-1)}, x, X_{(i+1:n)}) - f(X_{(1:i-1)}, X_{(i:n)}) \mid X_{(1:i-1)}].
\end{align}
Note that L_i \leq Z_i - Z_{i-1} \leq U_i. In addition,
:
\begin{align}
U_i - L_i &= \sup_{u \in \mathcal{X}_i,\, \ell \in \mathcal{X}_i} \mathbb{E}[f(X_{(1:i-1)}, u, X_{(i+1:n)}) \mid X_{(1:i-1)}] - \mathbb{E}[f(X_{(1:i-1)}, \ell, X_{(i+1:n)}) \mid X_{(1:i-1)}] \\
&= \sup_{u \in \mathcal{X}_i,\, \ell \in \mathcal{X}_i} \mathbb{E}[f(X_{(1:i-1)}, u, X_{(i+1:n)}) - f(X_{(1:i-1)}, \ell, X_{(i+1:n)}) \mid X_{(1:i-1)}] \\
&\leq \sup_{u \in \mathcal{X}_i,\, \ell \in \mathcal{X}_i} \mathbb{E}[c_i \mid X_{(1:i-1)}] \\
&\leq c_i.
\end{align}
Then, applying the general form of Azuma's inequality to \left\{Z_i\right\}, we have
:
\operatorname{P}\left(f(X_1, \ldots, X_n) - \mathbb{E}[f(X_1, \ldots, X_n)] \geq \varepsilon\right)
= \operatorname{P}(Z_n - Z_0 \geq \varepsilon)
\leq \exp\left(-\frac{2\varepsilon^2}{\sum_{i=1}^{n} c_i^2}\right).
The one-sided bound in the other direction is obtained by applying Azuma's inequality to \left\{-Z_i\right\}, and the two-sided bound follows from a union bound. \square
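The Doob martingale construction above can be checked numerically in a small case (our own sketch; names are hypothetical): for f equal to the fraction of heads among n fair coin flips, Z_i = \mathbb{E}[f \mid X_{(1:i)}] has a closed form, and every increment is bounded by c_i = 1/n.

```python
import random

# For f = fraction of heads among n fair coin flips, the Doob martingale is
#   Z_i = E[f | X_1, ..., X_i] = (X_1 + ... + X_i + 0.5 * (n - i)) / n,
# since the n - i unseen flips each contribute 0.5 / n in expectation.
# Each increment satisfies |Z_i - Z_{i-1}| = |X_i - 0.5| / n <= 1/n = c_i.

def doob_martingale(flips):
    """Return [Z_0, Z_1, ..., Z_n] for the fraction-of-heads function."""
    n = len(flips)
    return [(sum(flips[:i]) + 0.5 * (n - i)) / n for i in range(n + 1)]

rng = random.Random(0)
n = 20
flips = [rng.randint(0, 1) for _ in range(n)]
Z = doob_martingale(flips)

assert Z[0] == 0.5                # Z_0 = E[f]
assert Z[n] == sum(flips) / n     # Z_n = f(X_1, ..., X_n)
assert all(abs(Z[i] - Z[i - 1]) <= 1 / n + 1e-12 for i in range(1, n + 1))
print("bounded increments verified")
```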
=References=
{{Portal bar|Mathematics|Science|Technology}}
Probabilistic inequalities
Statistical inequalities
Martingale theory