Concentration Inequality
In probability theory, concentration inequalities provide bounds on how a random variable deviates from some value (typically, its expected value). The law of large numbers of classical probability theory states that sums of independent random variables are, under very mild conditions, close to their expectation with large probability. Such sums are the most basic examples of random variables concentrated around their mean. Recent results show that such behavior is shared by other functions of independent random variables. Concentration inequalities can be sorted according to how much information about the random variable is needed in order to use them.

Markov's inequality: Let X be a random variable that is non-negative (almost surely). Then, for every constant a > 0,

:\Pr(X \geq a) \leq \frac{\operatorname{E}(X)}{a}.

Note the following extension to Markov's inequality: if \Phi is a strictly increasing and non-negative function, then

:\Pr(X \geq a) = \Pr(\Phi(X) \geq \Phi(a)) \leq \frac{\operatorname{E}(\Phi(X))}{\Phi(a)}.

Chebyshev's inequality ...
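As a quick numerical sketch of Markov's inequality (not part of the original excerpt; the exponential distribution is an arbitrary illustrative choice of a non-negative variable), the following Python snippet compares the empirical tail probability \Pr(X \geq a) with the bound \operatorname{E}(X)/a:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.exponential(scale=1.0, size=1_000_000)  # non-negative samples, E[X] = 1

for a in [1.0, 2.0, 5.0]:
    empirical = np.mean(X >= a)   # Monte Carlo estimate of Pr(X >= a)
    markov = X.mean() / a         # Markov bound E[X] / a
    print(f"a={a}: Pr(X >= a) ~ {empirical:.4f} <= {markov:.4f}")
```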


Probability Theory
Probability theory is the branch of mathematics concerned with probability. Although there are several different probability interpretations, probability theory treats the concept in a rigorous mathematical manner by expressing it through a set of axioms. Typically these axioms formalise probability in terms of a probability space, which assigns a measure taking values between 0 and 1, termed the probability measure, to a set of outcomes called the sample space. Any specified subset of the sample space is called an event. Central subjects in probability theory include discrete and continuous random variables, probability distributions, and stochastic processes (which provide mathematical abstractions of non-deterministic or uncertain processes or measured quantities that may either be single occurrences or evolve over time in a random fashion). Although it is not possible to perfectly predict random events, much can be said about their behavior. Two major results in probability ...
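To make the axiomatic vocabulary concrete, here is a minimal sketch (my own illustrative example, not from the excerpt) of a finite probability space: a sample space, events as subsets of it, and a probability measure taking values between 0 and 1:

```python
from fractions import Fraction

# Sample space for one fair die roll; any subset of Omega is an event.
Omega = {1, 2, 3, 4, 5, 6}

def P(event):
    """Uniform probability measure: weight 1/6 on each outcome."""
    assert event <= Omega          # an event must be a subset of the sample space
    return Fraction(len(event), len(Omega))

print(P({2, 4, 6}))                # 1/2, the event "the roll is even"
print(P(Omega), P(set()))          # 1 and 0, as the axioms require
```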


Bernstein Inequalities (Probability Theory)
In probability theory, Bernstein inequalities give bounds on the probability that the sum of random variables deviates from its mean. In the simplest case, let X_1, \ldots, X_n be independent Bernoulli random variables taking values +1 and −1 with probability 1/2 (this distribution is also known as the Rademacher distribution); then for every positive \varepsilon,

:\mathbb{P}\left(\left|\frac{1}{n}\sum_{i=1}^n X_i\right| > \varepsilon\right) \leq 2\exp\left(-\frac{n\varepsilon^2}{2(1+\varepsilon/3)}\right).

Bernstein inequalities were proved and published by Sergei Bernstein in the 1920s and 1930s (J. V. Uspensky, ''Introduction to Mathematical Probability'', McGraw-Hill Book Company, 1937). Later, these inequalities were rediscovered several times in various forms. Thus, special cases of the Bernstein inequalities are also known as the Chernoff bound, Hoeffding's inequality and Azuma's inequality.

Some of the inequalities: 1. Let X_1, \ldots, X_n be independent zero-mean random variables. Suppose that |X_i| ...
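A minimal simulation sketch (assuming the Rademacher special case stated above) that checks the bound 2\exp(-n\varepsilon^2/(2(1+\varepsilon/3))) against an empirical deviation frequency:

```python
import numpy as np

rng = np.random.default_rng(1)
n, trials, eps = 200, 100_000, 0.2

X = rng.choice([-1, 1], size=(trials, n))   # Rademacher variables, +/-1 w.p. 1/2
means = X.mean(axis=1)                      # (1/n) * sum of the X_i, per trial

empirical = np.mean(np.abs(means) > eps)
bound = 2 * np.exp(-n * eps**2 / (2 * (1 + eps / 3)))
print(f"empirical Pr(|mean| > {eps}) ~ {empirical:.4f}")
print(f"Bernstein bound              = {bound:.4f}")
```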


Khintchine Inequality
In mathematics, the Khintchine inequality, named after Aleksandr Khinchin and spelled in multiple ways in the Latin alphabet, is a theorem from probability, and is also frequently used in analysis. Heuristically, it says that if we pick N complex numbers x_1, \dots, x_N \in \mathbb{C}, and add them together each multiplied by a random sign \pm 1, then the expected value of the sum's modulus, or the modulus it will be closest to on average, will be not too far off from \sqrt{|x_1|^2 + \cdots + |x_N|^2}.

Statement: Let \{\varepsilon_n\}_{n=1}^N be i.i.d. random variables with P(\varepsilon_n = \pm 1) = \frac{1}{2} for n = 1, \ldots, N, i.e., a sequence with Rademacher distribution. Let 0 < p < \infty and let x_1, \ldots, x_N \in \mathbb{C}. Then

:A_p \left(\sum_{n=1}^N |x_n|^2\right)^{1/2} \leq \left(\operatorname{E}\left|\sum_{n=1}^N \varepsilon_n x_n\right|^p\right)^{1/p} \leq B_p \left(\sum_{n=1}^N |x_n|^2\right)^{1/2}

for some constants A_p, B_p > 0 depending only on p (see
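A small Monte Carlo sketch (the coefficient values are my illustrative choice) estimating the middle quantity (\operatorname{E}|\sum_n \varepsilon_n x_n|^p)^{1/p} for p = 1 and comparing it with the \ell^2 norm that the inequality says it must track:

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.array([3.0, 1.0, -2.0, 0.5])                 # fixed coefficients x_1..x_N
trials, p = 200_000, 1

signs = rng.choice([-1, 1], size=(trials, len(x)))  # Rademacher signs eps_n
sums = signs @ x                                    # sum of eps_n * x_n, per trial

moment = np.mean(np.abs(sums) ** p) ** (1 / p)      # (E|sum|^p)^(1/p)
l2 = np.linalg.norm(x)                              # (sum |x_n|^2)^(1/2)
print(f"(E|S|^{p})^(1/{p}) ~ {moment:.3f}, ||x||_2 = {l2:.3f}")
```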


Paley–Zygmund Inequality
In mathematics, the Paley–Zygmund inequality bounds the probability that a positive random variable is small, in terms of its first two moments. The inequality was proved by Raymond Paley and Antoni Zygmund.

Theorem: If Z ≥ 0 is a random variable with finite variance, and if 0 \le \theta \le 1, then

:\operatorname{P}(Z > \theta\operatorname{E}[Z]) \ge (1-\theta)^2 \, \frac{\operatorname{E}[Z]^2}{\operatorname{E}[Z^2]}.

Proof: First,

:\operatorname{E}[Z] = \operatorname{E}[Z \, \mathbf{1}_{Z \le \theta\operatorname{E}[Z]}] + \operatorname{E}[Z \, \mathbf{1}_{Z > \theta\operatorname{E}[Z]}].

The first addend is at most \theta\operatorname{E}[Z], while the second is at most \operatorname{E}[Z^2]^{1/2} \operatorname{P}(Z > \theta\operatorname{E}[Z])^{1/2} by the Cauchy–Schwarz inequality. The desired inequality then follows. ∎

Related inequalities: The Paley–Zygmund inequality can be written as

:\operatorname{P}(Z > \theta\operatorname{E}[Z]) \ge \frac{(1-\theta)^2 \, \operatorname{E}[Z]^2}{\operatorname{Var} Z + \operatorname{E}[Z]^2}.

This can be improved. By the Cauchy–Schwarz inequality,

:\operatorname{E}[Z] - \theta\operatorname{E}[Z] \le \operatorname{E}\left[(Z - \theta\operatorname{E}[Z]) \, \mathbf{1}_{Z > \theta\operatorname{E}[Z]}\right] \le ...
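A quick numerical check of the theorem (using an exponential variable as an arbitrary non-negative example with finite variance; not from the original excerpt):

```python
import numpy as np

rng = np.random.default_rng(3)
Z = rng.exponential(scale=1.0, size=1_000_000)  # Z >= 0 with finite variance

EZ, EZ2 = Z.mean(), np.mean(Z**2)
for theta in [0.0, 0.25, 0.5, 0.75]:
    empirical = np.mean(Z > theta * EZ)         # Pr(Z > theta * E[Z])
    bound = (1 - theta) ** 2 * EZ**2 / EZ2      # Paley-Zygmund lower bound
    print(f"theta={theta}: {empirical:.3f} >= {bound:.3f}")
```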


Graph Theory
In mathematics, graph theory is the study of ''graphs'', which are mathematical structures used to model pairwise relations between objects. A graph in this context is made up of ''vertices'' (also called ''nodes'' or ''points'') which are connected by ''edges'' (also called ''links'' or ''lines''). A distinction is made between undirected graphs, where edges link two vertices symmetrically, and directed graphs, where edges link two vertices asymmetrically. Graphs are one of the principal objects of study in discrete mathematics.

Definitions: Definitions in graph theory vary. The following are some of the more basic ways of defining graphs and related mathematical structures.

Graph: In one restricted but very common sense of the term, a graph is an ordered pair G = (V, E) comprising:
* V, a set of vertices (also called nodes or points);
* E \subseteq \{\{x, y\} \mid x, y \in V \text{ and } x \neq y\}, a set of edges (also called links or lines), which are unordered pairs of vertices (that is, an edge is associated with t ...
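A minimal sketch of the ordered-pair definition G = (V, E) in code (my own illustrative encoding: edges as frozensets of two distinct vertices):

```python
# G = (V, E): a vertex set and a set of unordered pairs of distinct vertices.
V = {1, 2, 3, 4}
E = {frozenset({1, 2}), frozenset({2, 3}), frozenset({3, 4})}

assert all(len(e) == 2 and e <= V for e in E)  # each edge joins two distinct vertices

def neighbors(v):
    """Vertices that share an edge with v."""
    return {w for e in E if v in e for w in e if w != v}

print(neighbors(2))  # {1, 3}
```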




Theory Of Computing
''Theory of Computing'' is a peer-reviewed open access scientific journal covering theoretical computer science. The journal was established in 2005 and is published by the Department of Computer Science of the University of Chicago. The editor-in-chief is László Babai (University of Chicago). László "Laci" Babai (born July 20, 1950, in Budapest) is a fellow of the American Academy of Arts and Sciences and won the Knuth Prize; he was an invited speaker at the International Congresses of Mathematicians in Kyoto (1990) and Zürich (1994, ...


Gap-Hamming Problem
In communication complexity, the gap-Hamming problem asks: if Alice and Bob are each given a (potentially different) string, what is the minimal number of bits that they need to exchange in order for Alice to approximately compute the Hamming distance between their strings? The solution to the problem roughly states that, if Alice and Bob are each given a string, then any communication protocol used to compute the Hamming distance between their strings does (asymptotically) no better than Bob sending his whole string to Alice. More specifically, if Alice and Bob are each given n-bit strings, there exists no communication protocol that lets Alice compute the Hamming distance between their strings to within \pm\sqrt{n} using fewer than \Omega(n) bits. The gap-Hamming problem has applications to proving lower bounds for many streaming algorithms, including moment frequency estimation and entropy estimation.

Formal statement: In this problem, Alice and Bob each receive a string, x \in \ ...
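For concreteness, a toy sketch (not a communication protocol, just the quantity being approximated) of the Hamming distance between two random ±1 strings and the \pm\sqrt{n} slack the problem allows:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 64
x = rng.choice([-1, 1], size=n)   # Alice's string
y = rng.choice([-1, 1], size=n)   # Bob's string

hamming = int(np.sum(x != y))     # number of coordinates where x and y differ
print(f"Hamming distance = {hamming}; allowed error +/- sqrt(n) = {n ** 0.5:.0f}")
```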


Communication Complexity
In theoretical computer science, communication complexity studies the amount of communication required to solve a problem when the input to the problem is distributed among two or more parties. The study of communication complexity was first introduced by Andrew Yao in 1979, while studying the problem of computation distributed among several machines. The problem is usually stated as follows: two parties (traditionally called Alice and Bob) each receive a (potentially different) n-bit string, x and y respectively. The ''goal'' is for Alice to compute the value of a certain function, f(x, y), that depends on both x and y, with the least amount of communication between them. While Alice and Bob can always succeed by having Bob send his whole n-bit string to Alice (who then computes the function f), the idea here is to find clever ways of calculating ''f'' with fewer than n bits of communication. Note that, unlike in computational complexity theory, communication complexity is not concerned with th ...
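As a sketch of the setup (the equality function is my illustrative choice, not from the excerpt), here is the trivial protocol mentioned above, in which Bob sends Alice his whole string at a cost of n bits:

```python
def trivial_protocol(x: str, y: str, f):
    """Bob sends all of y to Alice, who evaluates f(x, y) locally."""
    transcript = y                      # the n bits communicated
    return f(x, transcript), len(transcript)

equality = lambda x, y: x == y          # an example function f(x, y)
value, bits_used = trivial_protocol("1011", "1011", equality)
print(value, bits_used)                 # True, 4 -- i.e., n bits of communication
```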


Empirical Distribution Function
In statistics, an empirical distribution function (commonly also called an empirical cumulative distribution function, eCDF) is the distribution function associated with the empirical measure of a sample. This cumulative distribution function is a step function that jumps up by 1/n at each of the n data points. Its value at any specified value of the measured variable is the fraction of observations of the measured variable that are less than or equal to the specified value.

The empirical distribution function is an estimate of the cumulative distribution function that generated the points in the sample. It converges with probability 1 to that underlying distribution, according to the Glivenko–Cantelli theorem. A number of results exist to quantify the rate of convergence of the empirical distribution function to the underlying cumulative distribution function.

Definition: Let (X_1, \ldots, X_n) be independent, identically distributed real random variables with the common cumulative distribut ...
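A minimal implementation sketch of the empirical distribution function as defined above (the standard normal sample is an arbitrary illustrative choice):

```python
import numpy as np

def ecdf(sample, t):
    """Fraction of observations <= t; a step function jumping by 1/n at each point."""
    return np.mean(np.asarray(sample) <= t)

rng = np.random.default_rng(5)
data = rng.normal(size=1000)                   # sample from a standard normal
for t in [-1.0, 0.0, 1.0]:
    print(f"F_n({t}) = {ecdf(data, t):.3f}")   # approaches the true CDF as n grows
```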


Independent And Identically Distributed
In probability theory and statistics, a collection of random variables is independent and identically distributed if each random variable has the same probability distribution as the others and all are mutually independent. This property is usually abbreviated as ''i.i.d.'', ''iid'', or ''IID''. IID was first defined in statistics and finds application in different fields such as data mining and signal processing.

Introduction: In statistics, we commonly deal with random samples. A random sample can be thought of as a set of objects that are chosen randomly. More formally, it is “a sequence of independent, identically distributed (IID) random variables”. In other words, the terms ''random sample'' and ''IID'' are basically one and the same. In statistics, we usually say “random sample,” but in probability it is more common to say “IID.”

* Identically distributed means that there are no overall trends: the distribution does not fluctuate, and all items in the ...


Cumulative Distribution Function
In probability theory and statistics, the cumulative distribution function (CDF) of a real-valued random variable X, or just distribution function of X, evaluated at x, is the probability that X will take a value less than or equal to x.

Every probability distribution supported on the real numbers, discrete or "mixed" as well as continuous, is uniquely identified by a ''right-continuous'', ''monotonically increasing'' cumulative distribution function F : \mathbb{R} \rightarrow [0,1] satisfying \lim_{x \to -\infty} F(x) = 0 and \lim_{x \to +\infty} F(x) = 1. In the case of a scalar continuous distribution, it gives the area under the probability density function from minus infinity to x. Cumulative distribution functions are also used to specify the distribution of multivariate random variables.

Definition: The cumulative distribution function of a real-valued random variable X is the function given by

:F_X(x) = \operatorname{P}(X \leq x),

where the right-hand side represents the probability that the random variable X takes on a value less tha ...
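A small sketch of the definition F_X(x) = P(X ≤ x) for one concrete distribution (the exponential is my illustrative choice), showing the monotonicity and the limiting values 0 and 1:

```python
import math

def cdf_exponential(x, rate=1.0):
    """CDF of an Exponential(rate) variable: P(X <= x) = 1 - exp(-rate * x) for x >= 0."""
    return 0.0 if x < 0 else 1.0 - math.exp(-rate * x)

# Monotonically increasing, with limits 0 at -infinity and 1 at +infinity:
print([round(cdf_exponential(x), 3) for x in (-1, 0, 1, 5, 50)])
# [0.0, 0.0, 0.632, 0.993, 1.0]
```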




Total Variation Distance Of Probability Measures
In probability theory, the total variation distance is a distance measure for probability distributions. It is an example of a statistical distance metric, and is sometimes called the statistical distance, statistical difference or variational distance.

Definition: Consider a measurable space (\Omega, \mathcal{F}) and probability measures P and Q defined on (\Omega, \mathcal{F}). The total variation distance between P and Q is defined as

:\delta(P,Q) = \sup_{A \in \mathcal{F}} \left| P(A) - Q(A) \right|.

Informally, this is the largest possible difference between the probabilities that the two probability distributions can assign to the same event.

Properties: Relation to other distances. The total variation distance is related to the Kullback–Leibler divergence by Pinsker's inequality:

:\delta(P,Q) \le \sqrt{\tfrac{1}{2} D_{\mathrm{KL}}(P \parallel Q)}.

One also has the following inequality, due to Bretagnolle and Huber (see also Tsybakov), which has the advantage of providing a non-vacuous bound even when D_{\mathrm{KL}}(P \parallel Q) > 2:

:\delta(P,Q) \le \sqrt{1 - e^{-D_{\mathrm{KL}}(P \parallel Q)}} ...
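A small sketch over a finite space (the two distributions are chosen arbitrarily for illustration), where the supremum over events reduces to half the L1 distance between the probability vectors, checked against Pinsker's inequality:

```python
import numpy as np

P = np.array([0.5, 0.3, 0.2])           # two distributions on a 3-point space
Q = np.array([0.4, 0.4, 0.2])

tv = 0.5 * np.abs(P - Q).sum()          # total variation distance on a finite space
kl = np.sum(P * np.log(P / Q))          # Kullback-Leibler divergence D_KL(P || Q)
print(f"TV = {tv:.3f} <= Pinsker bound = {np.sqrt(kl / 2):.3f}")
```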