McDiarmid's Inequality

In probability theory and theoretical computer science, McDiarmid's inequality is a concentration inequality which bounds the deviation between the sampled value and the expected value of certain functions when they are evaluated on independent random variables. McDiarmid's inequality applies to functions that satisfy a ''bounded differences'' property, meaning that replacing a single argument to the function while leaving all other arguments unchanged cannot cause too large a change in the value of the function.

=Statement=

A function f: \mathcal{X}_1 \times \mathcal{X}_2 \times \cdots \times \mathcal{X}_n \rightarrow \mathbb{R} satisfies the ''bounded differences property'' if substituting the value of the ith coordinate x_i changes the value of f by at most c_i. More formally, there are constants c_1, c_2, \dots, c_n such that for all i \in [n] and all x_1 \in \mathcal{X}_1,\, x_2 \in \mathcal{X}_2,\, \ldots,\, x_n \in \mathcal{X}_n,

: \sup_{x_i' \in \mathcal{X}_i} \left| f(x_1, \dots, x_{i-1}, x_i, x_{i+1}, \ldots, x_n) - f(x_1, \dots, x_{i-1}, x_i', x_{i+1}, \ldots, x_n) \right| \leq c_i.

McDiarmid's inequality states that if f satisfies the bounded differences property with constants c_1, c_2, \dots, c_n, and X_1, X_2, \ldots, X_n are independent random variables with X_i \in \mathcal{X}_i for all i, then for any \varepsilon > 0,

: \operatorname{P}(f(X_1, \ldots, X_n) - \mathbb{E} f(X_1, \ldots, X_n) \geq \varepsilon) \leq \exp\left(-\frac{2\varepsilon^2}{\sum_{i=1}^n c_i^2}\right),

and, by symmetry together with a union bound,

: \operatorname{P}(|f(X_1, \ldots, X_n) - \mathbb{E} f(X_1, \ldots, X_n)| \geq \varepsilon) \leq 2\exp\left(-\frac{2\varepsilon^2}{\sum_{i=1}^n c_i^2}\right).
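As a quick numerical sanity check (a minimal sketch, assuming i.i.d. Uniform[0,1] inputs and taking f to be the empirical mean, so that each c_i = 1/n; these choices are illustrative, not part of the statement), the following compares a Monte Carlo estimate of the deviation probability against the two-sided McDiarmid bound:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 100          # number of independent arguments
eps = 0.1        # deviation threshold
trials = 20000   # Monte Carlo samples of the whole argument vector

# f(x) = mean(x) on [0,1]^n: changing one coordinate moves f by at
# most 1/n, so the bounded differences property holds with c_i = 1/n.
samples = rng.uniform(0.0, 1.0, size=(trials, n))
values = samples.mean(axis=1)
deviation = np.abs(values - 0.5)   # E[f] = 1/2 for uniform inputs

empirical = (deviation >= eps).mean()
# Two-sided bound: 2 exp(-2 eps^2 / sum c_i^2) = 2 exp(-2 n eps^2).
bound = 2 * np.exp(-2 * n * eps**2)

print(f"empirical P(|f - E f| >= {eps}): {empirical:.5f}")
print(f"two-sided McDiarmid bound:       {bound:.5f}")
```

The bound holds but is far from tight for this particular f; its strength is that it requires only independence and bounded differences, not any further structure of f.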


Extensions


Unbalanced distributions

A stronger bound may be given when the arguments to the function are sampled from unbalanced distributions, such that resampling a single argument rarely causes a large change to the function value. This may be used to characterize, for example, the value of a function on graphs when evaluated on sparse random graphs and hypergraphs, since in a sparse random graph it is much more likely for any particular edge to be missing than to be present.
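As a rough illustration (a sketch with an assumed edge probability, not the formal refined bound), resampling a single Bernoulli(p) edge indicator of a sparse random graph changes it only with probability 2p(1-p), so a function of the edge set usually does not move at all:

```python
import numpy as np

rng = np.random.default_rng(1)

# Each argument is an independent Bernoulli(p) edge indicator of a
# sparse random graph. Under this unbalanced distribution, a resampled
# edge almost always repeats the common value 0, leaving any function
# of the edge set unchanged.
p = 0.01
trials = 200000

edge = rng.random(trials) < p
resampled = rng.random(trials) < p

print(f"P(resample changes the edge) ~ {(edge != resampled).mean():.5f}")
print(f"theory, 2p(1-p):               {2 * p * (1 - p):.5f}")
```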


Differences bounded with high probability

McDiarmid's inequality may be extended to the case where the function being analyzed does not strictly satisfy the bounded differences property, but large differences remain very rare. There exist stronger refinements to this analysis in some distribution-dependent scenarios, such as those that arise in learning theory.


Sub-Gaussian and sub-exponential norms

Let the ''kth centered conditional version'' of a function f be

: f_k(X)(x) := f(x_1, \ldots, x_{k-1}, X_k, x_{k+1}, \ldots, x_n) - \mathbb{E}_{X'_k} f(x_1, \ldots, x_{k-1}, X'_k, x_{k+1}, \ldots, x_n),

so that f_k(X) is a random variable through X_k for each fixed choice of the remaining arguments x_1, \ldots, x_{k-1}, x_{k+1}, \ldots, x_n. Bounds on the sub-Gaussian or sub-exponential norms of these centered conditional versions then yield corresponding extensions of McDiarmid's inequality.
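For concreteness, the centered conditional version can be estimated by Monte Carlo; this sketch uses a hypothetical f and Uniform[0,1] coordinate distributions chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical three-argument function (illustration only).
def f(x1, x2, x3):
    return x1 * x2 + np.sin(x3)

# Estimate the k = 2 centered conditional version f_2(X)(x): fix the
# other coordinates, draw X_2 once, and subtract an empirical average
# over independent resamples X'_2 (assumed Uniform[0,1]).
x1, x3 = 0.4, 1.2
X2 = rng.uniform(0.0, 1.0)
resamples = rng.uniform(0.0, 1.0, size=100_000)

f_2 = f(x1, X2, x3) - f(x1, resamples, x3).mean()
print(f"f_2(X)(x) ~ {f_2:.4f}")   # random through X_2; zero mean over X_2
```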


Bennett and Bernstein forms

Refinements to McDiarmid's inequality in the style of Bennett's inequality and Bernstein inequalities are made possible by defining a variance term for each function argument. Let

: \begin{align}
B &:= \max_k \sup_{x_1, \dots, x_{k-1}, x_{k+1}, \dots, x_n} \left\| f(x_1, \dots, x_{k-1}, X_k, x_{k+1}, \dots, x_n) - \mathbb{E}_{X_k} f(x_1, \dots, x_{k-1}, X_k, x_{k+1}, \dots, x_n) \right\|_\infty, \\
V_k &:= \sup_{x_1, \dots, x_{k-1}, x_{k+1}, \dots, x_n} \mathbb{E}_{X_k} \left( f(x_1, \dots, x_{k-1}, X_k, x_{k+1}, \dots, x_n) - \mathbb{E}_{X_k} f(x_1, \dots, x_{k-1}, X_k, x_{k+1}, \dots, x_n) \right)^2, \\
\tilde\sigma^2 &:= \sum_{k=1}^n V_k.
\end{align}
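As a small worked example (reusing the empirical-mean toy function from the Statement section, an assumption made purely for illustration), these quantities have closed forms and can be compared with the worst-case term \sum_i c_i^2 from the basic inequality:

```python
# f = mean of n i.i.d. Uniform[0,1] arguments. Conditionally on the
# other coordinates, f - E_{X_k} f = (X_k - 1/2)/n, so B and V_k are
# exact rather than estimated.
n = 100
B = 0.5 / n                   # ||(X_k - 1/2)/n||_inf
V_k = (1.0 / 12.0) / n**2     # E[((X_k - 1/2)/n)^2] = Var(X_k)/n^2
sigma2 = n * V_k              # tilde sigma^2 = sum_k V_k = 1/(12 n)

print(f"B           = {B:.4f}")
print(f"sigma~^2    = {sigma2:.2e}")
print(f"sum_i c_i^2 = {n * (1.0 / n) ** 2:.2e}")  # worst-case term: 1/n
```

Here \tilde\sigma^2 is twelve times smaller than \sum_i c_i^2, which is exactly the kind of gap a Bernstein-style refinement can exploit.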
=Proof=

The following proof of McDiarmid's inequality constructs the Doob martingale tracking the conditional expected value of the function as more and more of its arguments are sampled and conditioned on, and then applies a martingale concentration inequality (Azuma's inequality). An alternate argument avoiding the use of martingales also exists, taking advantage of the independence of the function arguments to provide a Chernoff-bound-like argument.

For better readability, we introduce a notational shorthand: z_{i \dots j} will denote z_i, \dots, z_j for any z \in \mathcal{X}^n and integers 1 \le i \le j \le n, so that, for example,

: f(X_{1 \dots (i-1)}, y, x_{(i+1) \dots n}) := f(X_1, \ldots, X_{i-1}, y, x_{i+1}, \ldots, x_n).

Pick any x_1', x_2', \ldots, x_n'. Then, for any x_1, x_2, \ldots, x_n, by the triangle inequality,

: \begin{align}
&\left| f(x_{1 \dots n}) - f(x'_{1 \dots n}) \right| \\
\leq{} & \left| f(x_{1 \dots n}) - f(x'_{1 \dots (n-1)}, x_n) \right| + c_n \\
\leq{} & \left| f(x_{1 \dots n}) - f(x'_{1 \dots (n-2)}, x_{(n-1) \dots n}) \right| + c_{n-1} + c_n \\
\leq{} & \ldots \\
\leq{} & \sum_{i=1}^n c_i,
\end{align}

and thus f is bounded.

Since f is bounded, define the Doob martingale \{Z_i\} (each Z_i being a random variable depending on the random values of X_1, \ldots, X_i) as

: Z_i := \mathbb{E}[f(X_{1 \dots n}) \mid X_{1 \dots i}]

for all i \geq 1 and Z_0 := \mathbb{E}[f(X_{1 \dots n})], so that Z_n = f(X_{1 \dots n}).

Now define the random variables for each i

: \begin{align}
U_i &:= \sup_{x \in \mathcal{X}_i} \mathbb{E}[f(X_{1 \dots (i-1)}, x, X_{(i+1) \dots n}) \mid X_{1 \dots (i-1)}, X_i = x] - \mathbb{E}[f(X_{1 \dots (i-1)}, X_{i \dots n}) \mid X_{1 \dots (i-1)}], \\
L_i &:= \inf_{x \in \mathcal{X}_i} \mathbb{E}[f(X_{1 \dots (i-1)}, x, X_{(i+1) \dots n}) \mid X_{1 \dots (i-1)}, X_i = x] - \mathbb{E}[f(X_{1 \dots (i-1)}, X_{i \dots n}) \mid X_{1 \dots (i-1)}].
\end{align}

Since X_i, \ldots, X_n are independent of each other, conditioning on X_i = x does not affect the probabilities of the other variables, so these are equal to the expressions

: \begin{align}
U_i &= \sup_{x \in \mathcal{X}_i} \mathbb{E}[f(X_{1 \dots (i-1)}, x, X_{(i+1) \dots n}) - f(X_{1 \dots (i-1)}, X_{i \dots n}) \mid X_{1 \dots (i-1)}], \\
L_i &= \inf_{x \in \mathcal{X}_i} \mathbb{E}[f(X_{1 \dots (i-1)}, x, X_{(i+1) \dots n}) - f(X_{1 \dots (i-1)}, X_{i \dots n}) \mid X_{1 \dots (i-1)}].
\end{align}

Note that L_i \leq Z_i - Z_{i-1} \leq U_i. In addition,

: \begin{align}
U_i - L_i &= \sup_{u \in \mathcal{X}_i,\, \ell \in \mathcal{X}_i} \mathbb{E}[f(X_{1 \dots (i-1)}, u, X_{(i+1) \dots n}) \mid X_{1 \dots (i-1)}] - \mathbb{E}[f(X_{1 \dots (i-1)}, \ell, X_{(i+1) \dots n}) \mid X_{1 \dots (i-1)}] \\
&= \sup_{u \in \mathcal{X}_i,\, \ell \in \mathcal{X}_i} \mathbb{E}[f(X_{1 \dots (i-1)}, u, X_{(i+1) \dots n}) - f(X_{1 \dots (i-1)}, \ell, X_{(i+1) \dots n}) \mid X_{1 \dots (i-1)}] \\
&\leq \sup_{x_i \in \mathcal{X}_i,\, x_i' \in \mathcal{X}_i} \mathbb{E}[c_i \mid X_{1 \dots (i-1)}] \\
&\leq c_i.
\end{align}

Then, applying the general form of Azuma's inequality to \{Z_i\}, we have

: \operatorname{P}(f(X_1, \ldots, X_n) - \mathbb{E} f(X_1, \ldots, X_n) \geq \varepsilon) = \operatorname{P}(Z_n - Z_0 \geq \varepsilon) \leq \exp\left(-\frac{2\varepsilon^2}{\sum_{i=1}^n c_i^2}\right).

The one-sided bound in the other direction is obtained by applying Azuma's inequality to \{-Z_i\}, and the two-sided bound follows from a union bound. \square
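To make the construction concrete, here is a toy numerical check (f = sum of four fair coin flips, chosen as an assumption so the conditional expectations have closed form) that the Doob martingale has mean-zero increments bounded as in the proof:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy f: sum of n = 4 fair coin flips. The Doob martingale is
# Z_i = E[f(X) | X_1..X_i] = X_1 + ... + X_i + (n - i)/2.
n = 4
trials = 50000
X = rng.integers(0, 2, size=(trials, n))

Z = np.zeros((trials, n + 1))
for i in range(n + 1):
    Z[:, i] = X[:, :i].sum(axis=1) + (n - i) * 0.5

increments = np.diff(Z, axis=1)   # Z_i - Z_{i-1} = X_i - 1/2

# Increments have mean ~ 0 (martingale property) and satisfy
# L_i <= Z_i - Z_{i-1} <= U_i with U_i - L_i <= c_i = 1.
print("mean increments:", increments.mean(axis=0).round(3))
print("max |increment|:", np.abs(increments).max())  # 0.5, within c_i = 1
```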

