McDiarmid's Inequality

In probability theory and theoretical computer science, McDiarmid's inequality is a concentration inequality which bounds the deviation between the sampled value and the expected value of certain functions when they are evaluated on independent random variables. McDiarmid's inequality applies to functions that satisfy a ''bounded differences'' property, meaning that replacing a single argument to the function while leaving all other arguments unchanged cannot cause too large a change in the value of the function.

=Statement=

A function f: \mathcal{X}_1 \times \mathcal{X}_2 \times \cdots \times \mathcal{X}_n \rightarrow \mathbb{R} satisfies the ''bounded differences property'' if substituting the value of the ith coordinate x_i changes the value of f by at most c_i. More formally, there are constants c_1, c_2, \dots, c_n such that for all i \in [n] and all x_1 \in \mathcal{X}_1,\, x_2 \in \mathcal{X}_2,\, \ldots,\, x_n \in \mathcal{X}_n,

: \sup_{x_i' \in \mathcal{X}_i} \left| f(x_1, \dots, x_{i-1}, x_i, x_{i+1}, \ldots, x_n) - f(x_1, \dots, x_{i-1}, x_i', x_{i+1}, \ldots, x_n) \right| \leq c_i.

McDiarmid's inequality states that if f satisfies the bounded differences property with constants c_1, c_2, \dots, c_n, and X_1, X_2, \ldots, X_n are independent random variables with X_i \in \mathcal{X}_i for all i, then for any \varepsilon > 0,

: \operatorname{P}(f(X_1, \ldots, X_n) - \mathbb{E} f(X_1, \ldots, X_n) \geq \varepsilon) \leq \exp\left(-\frac{2\varepsilon^2}{\sum_{i=1}^n c_i^2}\right),

and, by symmetry together with a union bound,

: \operatorname{P}(|f(X_1, \ldots, X_n) - \mathbb{E} f(X_1, \ldots, X_n)| \geq \varepsilon) \leq 2\exp\left(-\frac{2\varepsilon^2}{\sum_{i=1}^n c_i^2}\right).
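As a quick numerical sanity check (a minimal sketch, assuming i.i.d. Uniform[0,1] inputs and taking f to be the empirical mean, so that each c_i = 1/n; these choices are illustrative, not part of the statement), the following compares a Monte Carlo estimate of the deviation probability against the two-sided McDiarmid bound:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 100          # number of independent arguments
eps = 0.1        # deviation threshold
trials = 20000   # Monte Carlo samples of the whole argument vector

# f(x) = mean(x) on [0,1]^n: changing one coordinate moves f by at
# most 1/n, so the bounded differences property holds with c_i = 1/n.
samples = rng.uniform(0.0, 1.0, size=(trials, n))
values = samples.mean(axis=1)
deviation = np.abs(values - 0.5)   # E[f] = 1/2 for uniform inputs

empirical = (deviation >= eps).mean()
# Two-sided bound: 2 exp(-2 eps^2 / sum c_i^2) = 2 exp(-2 n eps^2).
bound = 2 * np.exp(-2 * n * eps**2)

print(f"empirical P(|f - E f| >= {eps}): {empirical:.5f}")
print(f"two-sided McDiarmid bound:       {bound:.5f}")
```

The bound holds but is far from tight for this particular f; its strength is that it requires only independence and bounded differences, not any further structure of f.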


Extensions


Unbalanced distributions

A stronger bound may be given when the arguments to the function are sampled from unbalanced distributions, such that resampling a single argument rarely causes a large change to the function value. This may be used to characterize, for example, the value of a function on graphs when evaluated on sparse random graphs and hypergraphs, since in a sparse random graph it is much more likely for any particular edge to be missing than to be present.
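As a rough illustration (a sketch with an assumed edge probability, not the formal refined bound), resampling a single Bernoulli(p) edge indicator of a sparse random graph changes it only with probability 2p(1-p), so a function of the edge set usually does not move at all:

```python
import numpy as np

rng = np.random.default_rng(1)

# Each argument is an independent Bernoulli(p) edge indicator of a
# sparse random graph. Under this unbalanced distribution, a resampled
# edge almost always repeats the common value 0, leaving any function
# of the edge set unchanged.
p = 0.01
trials = 200000

edge = rng.random(trials) < p
resampled = rng.random(trials) < p

print(f"P(resample changes the edge) ~ {(edge != resampled).mean():.5f}")
print(f"theory, 2p(1-p):               {2 * p * (1 - p):.5f}")
```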


Differences bounded with high probability

McDiarmid's inequality may be extended to the case where the function being analyzed does not strictly satisfy the bounded differences property, but large differences remain very rare. There exist stronger refinements to this analysis in some distribution-dependent scenarios, such as those that arise in learning theory.


Sub-Gaussian and sub-exponential norms

Let the ''kth centered conditional version'' of a function f be

: f_k(X)(x) := f(x_1, \ldots, x_{k-1}, X_k, x_{k+1}, \ldots, x_n) - \mathbb{E}_{X'_k} f(x_1, \ldots, x_{k-1}, X'_k, x_{k+1}, \ldots, x_n),

so that f_k(X) is a random variable through X_k for each fixed choice of the remaining arguments x_1, \ldots, x_{k-1}, x_{k+1}, \ldots, x_n. Bounds on the sub-Gaussian or sub-exponential norms of these centered conditional versions then yield corresponding extensions of McDiarmid's inequality.
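For concreteness, the centered conditional version can be estimated by Monte Carlo; this sketch uses a hypothetical f and Uniform[0,1] coordinate distributions chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical three-argument function (illustration only).
def f(x1, x2, x3):
    return x1 * x2 + np.sin(x3)

# Estimate the k = 2 centered conditional version f_2(X)(x): fix the
# other coordinates, draw X_2 once, and subtract an empirical average
# over independent resamples X'_2 (assumed Uniform[0,1]).
x1, x3 = 0.4, 1.2
X2 = rng.uniform(0.0, 1.0)
resamples = rng.uniform(0.0, 1.0, size=100_000)

f_2 = f(x1, X2, x3) - f(x1, resamples, x3).mean()
print(f"f_2(X)(x) ~ {f_2:.4f}")   # random through X_2; zero mean over X_2
```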


Bennett and Bernstein forms

Refinements to McDiarmid's inequality in the style of Bennett's inequality and Bernstein inequalities are made possible by defining a variance term for each function argument. Let

: \begin{align}
B &:= \max_k \sup_{x_1, \dots, x_{k-1}, x_{k+1}, \dots, x_n} \left\| f(x_1, \dots, x_{k-1}, X_k, x_{k+1}, \dots, x_n) - \mathbb{E}_{X_k} f(x_1, \dots, x_{k-1}, X_k, x_{k+1}, \dots, x_n) \right\|_\infty, \\
V_k &:= \sup_{x_1, \dots, x_{k-1}, x_{k+1}, \dots, x_n} \mathbb{E}_{X_k} \left( f(x_1, \dots, x_{k-1}, X_k, x_{k+1}, \dots, x_n) - \mathbb{E}_{X_k} f(x_1, \dots, x_{k-1}, X_k, x_{k+1}, \dots, x_n) \right)^2, \\
\tilde\sigma^2 &:= \sum_{k=1}^n V_k.
\end{align}
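As a small worked example (reusing the empirical-mean toy function from the Statement section, an assumption made purely for illustration), these quantities have closed forms and can be compared with the worst-case term \sum_i c_i^2 from the basic inequality:

```python
# f = mean of n i.i.d. Uniform[0,1] arguments. Conditionally on the
# other coordinates, f - E_{X_k} f = (X_k - 1/2)/n, so B and V_k are
# exact rather than estimated.
n = 100
B = 0.5 / n                   # ||(X_k - 1/2)/n||_inf
V_k = (1.0 / 12.0) / n**2     # E[((X_k - 1/2)/n)^2] = Var(X_k)/n^2
sigma2 = n * V_k              # tilde sigma^2 = sum_k V_k = 1/(12 n)

print(f"B           = {B:.4f}")
print(f"sigma~^2    = {sigma2:.2e}")
print(f"sum_i c_i^2 = {n * (1.0 / n) ** 2:.2e}")  # worst-case term: 1/n
```

Here \tilde\sigma^2 is twelve times smaller than \sum_i c_i^2, which is exactly the kind of gap a Bernstein-style refinement can exploit.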
=Proof=

The following proof of McDiarmid's inequality constructs the Doob martingale tracking the conditional expected value of the function as more and more of its arguments are sampled and conditioned on, and then applies a martingale concentration inequality (Azuma's inequality). An alternate argument avoiding the use of martingales also exists, taking advantage of the independence of the function arguments to provide a Chernoff-bound-like argument.

For better readability, we introduce a notational shorthand: z_{i \dots j} will denote z_i, \dots, z_j for any z \in \mathcal{X}^n and integers 1 \le i \le j \le n, so that, for example,

: f(X_{1 \dots (i-1)}, y, x_{(i+1) \dots n}) := f(X_1, \ldots, X_{i-1}, y, x_{i+1}, \ldots, x_n).

Pick any x_1', x_2', \ldots, x_n'. Then, for any x_1, x_2, \ldots, x_n, by the triangle inequality,

: \begin{align}
&\left| f(x_{1 \dots n}) - f(x'_{1 \dots n}) \right| \\
\leq{} & \left| f(x_{1 \dots n}) - f(x'_{1 \dots (n-1)}, x_n) \right| + c_n \\
\leq{} & \left| f(x_{1 \dots n}) - f(x'_{1 \dots (n-2)}, x_{(n-1) \dots n}) \right| + c_{n-1} + c_n \\
\leq{} & \ldots \\
\leq{} & \sum_{i=1}^n c_i,
\end{align}

and thus f is bounded.

Since f is bounded, define the Doob martingale \{Z_i\} (each Z_i being a random variable depending on the random values of X_1, \ldots, X_i) as

: Z_i := \mathbb{E}[f(X_{1 \dots n}) \mid X_{1 \dots i}]

for all i \geq 1 and Z_0 := \mathbb{E}[f(X_{1 \dots n})], so that Z_n = f(X_{1 \dots n}).

Now define the random variables for each i

: \begin{align}
U_i &:= \sup_{x \in \mathcal{X}_i} \mathbb{E}[f(X_{1 \dots (i-1)}, x, X_{(i+1) \dots n}) \mid X_{1 \dots (i-1)}, X_i = x] - \mathbb{E}[f(X_{1 \dots (i-1)}, X_{i \dots n}) \mid X_{1 \dots (i-1)}], \\
L_i &:= \inf_{x \in \mathcal{X}_i} \mathbb{E}[f(X_{1 \dots (i-1)}, x, X_{(i+1) \dots n}) \mid X_{1 \dots (i-1)}, X_i = x] - \mathbb{E}[f(X_{1 \dots (i-1)}, X_{i \dots n}) \mid X_{1 \dots (i-1)}].
\end{align}

Since X_i, \ldots, X_n are independent of each other, conditioning on X_i = x does not affect the probabilities of the other variables, so these are equal to the expressions

: \begin{align}
U_i &= \sup_{x \in \mathcal{X}_i} \mathbb{E}[f(X_{1 \dots (i-1)}, x, X_{(i+1) \dots n}) - f(X_{1 \dots (i-1)}, X_{i \dots n}) \mid X_{1 \dots (i-1)}], \\
L_i &= \inf_{x \in \mathcal{X}_i} \mathbb{E}[f(X_{1 \dots (i-1)}, x, X_{(i+1) \dots n}) - f(X_{1 \dots (i-1)}, X_{i \dots n}) \mid X_{1 \dots (i-1)}].
\end{align}

Note that L_i \leq Z_i - Z_{i-1} \leq U_i. In addition,

: \begin{align}
U_i - L_i &= \sup_{u \in \mathcal{X}_i,\, \ell \in \mathcal{X}_i} \mathbb{E}[f(X_{1 \dots (i-1)}, u, X_{(i+1) \dots n}) \mid X_{1 \dots (i-1)}] - \mathbb{E}[f(X_{1 \dots (i-1)}, \ell, X_{(i+1) \dots n}) \mid X_{1 \dots (i-1)}] \\
&= \sup_{u \in \mathcal{X}_i,\, \ell \in \mathcal{X}_i} \mathbb{E}[f(X_{1 \dots (i-1)}, u, X_{(i+1) \dots n}) - f(X_{1 \dots (i-1)}, \ell, X_{(i+1) \dots n}) \mid X_{1 \dots (i-1)}] \\
&\leq \sup_{x_i \in \mathcal{X}_i,\, x_i' \in \mathcal{X}_i} \mathbb{E}[c_i \mid X_{1 \dots (i-1)}] \\
&\leq c_i.
\end{align}

Then, applying the general form of Azuma's inequality to \{Z_i\}, we have

: \operatorname{P}(f(X_1, \ldots, X_n) - \mathbb{E} f(X_1, \ldots, X_n) \geq \varepsilon) = \operatorname{P}(Z_n - Z_0 \geq \varepsilon) \leq \exp\left(-\frac{2\varepsilon^2}{\sum_{i=1}^n c_i^2}\right).

The one-sided bound in the other direction is obtained by applying Azuma's inequality to \{-Z_i\}, and the two-sided bound follows from a union bound. \square
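To make the construction concrete, here is a toy numerical check (f = sum of four fair coin flips, chosen as an assumption so the conditional expectations have closed form) that the Doob martingale has mean-zero increments bounded as in the proof:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy f: sum of n = 4 fair coin flips. The Doob martingale is
# Z_i = E[f(X) | X_1..X_i] = X_1 + ... + X_i + (n - i)/2.
n = 4
trials = 50000
X = rng.integers(0, 2, size=(trials, n))

Z = np.zeros((trials, n + 1))
for i in range(n + 1):
    Z[:, i] = X[:, :i].sum(axis=1) + (n - i) * 0.5

increments = np.diff(Z, axis=1)   # Z_i - Z_{i-1} = X_i - 1/2

# Increments have mean ~ 0 (martingale property) and satisfy
# L_i <= Z_i - Z_{i-1} <= U_i with U_i - L_i <= c_i = 1.
print("mean increments:", increments.mean(axis=0).round(3))
print("max |increment|:", np.abs(increments).max())  # 0.5, within c_i = 1
```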

