Bretagnolle–Huber Inequality
In information theory, the Bretagnolle–Huber inequality bounds the total variation distance between two probability distributions P and Q by a concave and bounded function of the Kullback–Leibler divergence D_\mathrm{KL}(P \parallel Q). The bound can be viewed as an alternative to the well-known Pinsker's inequality: when D_\mathrm{KL}(P \parallel Q) is large (larger than 2, for instance), Pinsker's inequality is vacuous, while the Bretagnolle–Huber bound remains bounded and hence non-vacuous. It is used in statistics and machine learning to prove information-theoretic lower bounds relying on hypothesis testing.
Formal statement
Preliminary definitions
Let P and Q be two probability distributions on a measurable space (\mathcal{X}, \mathcal{A}). Recall that the total variation distance between P and Q is defined by
:d_\mathrm{TV}(P,Q) = \sup_{A \in \mathcal{A}} \bigl\{ |P(A) - Q(A)| \bigr\}.
The Kullback–Leibler divergence is defined as follows:
:D_\mathrm{KL}(P \parallel Q) = \begin{cases} \int_{\mathcal{X}} \log\bigl(\frac{dP}{dQ}\bigr)\, dP & \text{if } P \ll Q, \\ +\infty & \text{otherwise.} \end{cases} ...
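As a concrete comparison, here is a minimal Python sketch (our own illustration, not part of the article) that evaluates the Bretagnolle–Huber bound \sqrt{1 - e^{-D_\mathrm{KL}(P \parallel Q)}} and Pinsker's bound \sqrt{D_\mathrm{KL}(P \parallel Q)/2} for two Bernoulli distributions; the function names are ours.

```python
# Minimal numerical sketch (not from the article): compare the Bretagnolle-Huber
# bound sqrt(1 - exp(-D)) with Pinsker's bound sqrt(D / 2) for two Bernoulli laws.
import math

def kl_bernoulli(p, q):
    """KL divergence D_KL(Ber(p) || Ber(q)) in nats."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def tv_bernoulli(p, q):
    """Total variation distance between Ber(p) and Ber(q)."""
    return abs(p - q)

p, q = 0.5, 0.001
d = kl_bernoulli(p, q)                    # about 2.76 nats here
bh_bound = math.sqrt(1 - math.exp(-d))    # Bretagnolle-Huber bound, always <= 1
pinsker_bound = math.sqrt(d / 2)          # Pinsker bound, here > 1 and so vacuous
print(f"TV = {tv_bernoulli(p, q):.3f}, BH = {bh_bound:.3f}, Pinsker = {pinsker_bound:.3f}")
```

With these values the divergence is roughly 2.76 nats, so Pinsker's bound exceeds 1 and carries no information, while the Bretagnolle–Huber bound stays just below 1.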


Total Variation
In mathematics, the total variation identifies several slightly different concepts, related to the (local or global) structure of the codomain of a function or a measure. For a real-valued continuous function ''f'', defined on an interval [''a'', ''b''] ⊂ R, its total variation on the interval of definition is a measure of the one-dimensional arclength of the curve with parametric equation ''x'' ↦ ''f''(''x''), for ''x'' ∈ [''a'', ''b'']. Functions whose total variation is finite are called functions of bounded variation.
Historical note
The concept of total variation for functions of one real variable was first introduced by Camille Jordan, who used the new concept to prove a convergence theorem for Fourier series of discontinuous periodic functions whose variation is bounded. The extension of the concept to functions of more than one variable, however, is not simple, for various reasons.
Definitions
Total variation for functions of one real variable ...
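A rough numerical sketch of the one-variable notion (our own illustration, with made-up names): the total variation of a sampled function on [a, b] can be approximated by summing the absolute differences of consecutive values on a fine grid.

```python
# Illustrative sketch: approximate the total variation of f on [a, b] by summing
# |f(x_{i+1}) - f(x_i)| over a uniform grid. Function and variable names are ours.
import math

def total_variation(f, a, b, n=10_000):
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    ys = [f(x) for x in xs]
    return sum(abs(y1 - y0) for y0, y1 in zip(ys, ys[1:]))

print(total_variation(math.sin, 0.0, 2 * math.pi))  # close to 4, the exact value
```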



Kullback–Leibler Divergence
In mathematical statistics, the Kullback–Leibler divergence (also called relative entropy and I-divergence), denoted D_\text{KL}(P \parallel Q), is a type of statistical distance: a measure of how one probability distribution ''P'' is different from a second, reference probability distribution ''Q''. A simple interpretation of the KL divergence of ''P'' from ''Q'' is the expected excess surprise from using ''Q'' as a model when the actual distribution is ''P''. While it is a distance, it is not a metric, the most familiar type of distance: it is not symmetric in the two distributions (in contrast to variation of information), and does not satisfy the triangle inequality. Instead, in terms of information geometry, it is a type of divergence, a generalization of squared distance, and for certain classes of distributions (notably an exponential family), it satisfies a generalized Pythagorean theorem (which applies to squared distances). In the simple case, a relative entropy of 0 ...
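A minimal Python sketch of the discrete case (our own illustration; names are ours), showing in particular that the divergence is not symmetric:

```python
# Relative entropy D_KL(P || Q) in nats for discrete distributions given as
# probability lists over the same support. Illustrative sketch only.
import math

def kl_divergence(p, q):
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0.0:
            if qi == 0.0:
                return math.inf   # P is not absolutely continuous w.r.t. Q
            total += pi * math.log(pi / qi)
    return total

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
print(kl_divergence(p, q), kl_divergence(q, p))  # the two values differ: not symmetric
```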


Pinsker's Inequality
In information theory, Pinsker's inequality, named after its inventor Mark Semenovich Pinsker, is an inequality that bounds the total variation distance (or statistical distance) in terms of the Kullback–Leibler divergence. The inequality is tight up to constant factors.
Formal statement
Pinsker's inequality states that, if P and Q are two probability distributions on a measurable space (X, \Sigma), then
:\delta(P,Q) \le \sqrt{\tfrac{1}{2} D_\mathrm{KL}(P \parallel Q)},
where
:\delta(P,Q) = \sup \bigl\{ |P(A) - Q(A)| : A \in \Sigma \bigr\}
is the total variation distance (or statistical distance) between P and Q and
:D_\mathrm{KL}(P \parallel Q) = \operatorname{E}_P \left( \log \frac{dP}{dQ} \right) = \int_X \left( \log \frac{dP}{dQ} \right) \, \mathrm{d}P
is the Kullback–Leibler divergence in nats. When the sample space X is a finite set, the Kullback–Leibler divergence is given by
:D_\mathrm{KL}(P \parallel Q) = \sum_{i \in X} \left( \log \frac{P(i)}{Q(i)} \right) P(i).
Note that in terms of the total variation norm \| P - Q \| of the signed measure P - Q, Pinsker's inequality differs from the one given a ...
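A small numerical check of the inequality for discrete distributions (our own sketch, not from the article; function names are illustrative):

```python
# Check delta(P, Q) <= sqrt(D_KL(P || Q) / 2) on a three-point sample space.
import math

def tv_distance(p, q):
    """Total variation distance, i.e. half the L1 distance for discrete P, Q."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

def kl_divergence(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.6, 0.3, 0.1]
q = [0.4, 0.4, 0.2]
lhs = tv_distance(p, q)
rhs = math.sqrt(kl_divergence(p, q) / 2)
print(lhs, "<=", rhs, lhs <= rhs)   # Pinsker's inequality holds
```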



Hypothesis Testing
A statistical hypothesis test is a method of statistical inference used to decide whether the data at hand sufficiently support a particular hypothesis. Hypothesis testing allows us to make probabilistic statements about population parameters.
History
Early use
While hypothesis testing was popularized early in the 20th century, early forms were used in the 1700s. The first use is credited to John Arbuthnot (1710), followed by Pierre-Simon Laplace (1770s), in analyzing the human sex ratio at birth.
Modern origins and early controversy
Modern significance testing is largely the product of Karl Pearson (''p''-value, Pearson's chi-squared test), William Sealy Gosset (Student's t-distribution), and Ronald Fisher ("null hypothesis", analysis of variance, "significance test"), while hypothesis testing was developed by Jerzy Neyman and Egon Pearson (son of Karl). Ronald Fisher began his life in statistics as a Bayesian (Zabell 1992), but Fisher soon grew disenchanted with t ...



Probability Distribution
In probability theory and statistics, a probability distribution is the mathematical function that gives the probabilities of occurrence of different possible outcomes for an experiment. It is a mathematical description of a random phenomenon in terms of its sample space and the probabilities of events (subsets of the sample space). For instance, if ''X'' is used to denote the outcome of a coin toss ("the experiment"), then the probability distribution of ''X'' would take the value 0.5 (1 in 2 or 1/2) for ''X'' = heads, and 0.5 for ''X'' = tails (assuming that the coin is fair). Examples of random phenomena include the weather conditions at some future date, the height of a randomly selected person, the fraction of male students in a school, the results of a survey to be conducted, etc.
Introduction
A probability distribution is a mathematical description of the probabilities of events, subsets of the sample space. The sample space, often denoted by \Omega, is the set of all possible outcomes of a random phe ...
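A minimal sketch (our own illustration) of a discrete probability distribution as a map from outcomes to probabilities, here for the fair-coin example above:

```python
# A discrete probability distribution represented as outcome -> probability.
fair_coin = {"heads": 0.5, "tails": 0.5}
assert abs(sum(fair_coin.values()) - 1.0) < 1e-12   # probabilities sum to 1

def prob(event, distribution):
    """Probability of an event, i.e. a subset of the sample space."""
    return sum(distribution[outcome] for outcome in event)

print(prob({"heads"}, fair_coin))            # 0.5
print(prob({"heads", "tails"}, fair_coin))   # 1.0, the whole sample space
```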


Measurable Space
In mathematics, a measurable space or Borel space is a basic object in measure theory. It consists of a set and a σ-algebra, which defines the subsets that will be measured.
Definition
Consider a set X and a σ-algebra \mathcal A on X. Then the tuple (X, \mathcal A) is called a measurable space. Note that in contrast to a measure space, no measure is needed for a measurable space.
Example
Look at the set X = \{1,2,3\}. One possible \sigma-algebra would be \mathcal A_1 = \{\varnothing, X\}. Then \left(X, \mathcal A_1\right) is a measurable space. Another possible \sigma-algebra would be the power set on X: \mathcal A_2 = \mathcal P(X). With this, a second measurable space on the set X is given by \left(X, \mathcal A_2\right).
Common measurable spaces
If X is finite or countably infinite, the \sigma-algebra is most often the power set on X, so \mathcal A = \mathcal P(X). This leads to the measurable space (X, \mathcal P(X)). If X is a topological space ...
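A small Python sketch (our own, using the finite example above) that represents a \sigma-algebra as a collection of frozensets and checks the closure properties; on a finite set, closure under finite unions is enough.

```python
# Represent sigma-algebras on a finite set and verify the defining properties.
from itertools import combinations

X = frozenset({1, 2, 3})
A1 = {frozenset(), X}                              # the trivial sigma-algebra
A2 = {frozenset(s) for r in range(len(X) + 1)      # the power set of X
      for s in combinations(X, r)}

def is_sigma_algebra(X, A):
    """Check the sigma-algebra axioms; finite unions suffice on a finite set."""
    if X not in A:
        return False
    if any(X - S not in A for S in A):             # closed under complement
        return False
    if any(S | T not in A for S in A for T in A):  # closed under union
        return False
    return True

print(is_sigma_algebra(X, A1), is_sigma_algebra(X, A2))  # True True
```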


Absolute Continuity
In calculus, absolute continuity is a smoothness property of functions that is stronger than continuity and uniform continuity. The notion of absolute continuity allows one to obtain generalizations of the relationship between the two central operations of calculus: differentiation and integration. This relationship is commonly characterized (by the fundamental theorem of calculus) in the framework of Riemann integration, but with absolute continuity it may be formulated in terms of Lebesgue integration. For real-valued functions on the real line, two interrelated notions appear: absolute continuity of functions and absolute continuity of measures. These two notions are generalized in different directions. The usual derivative of a function is related to the ''Radon–Nikodym derivative'', or ''density'', of a measure. We have the following chains of inclusions for functions over a compact subset of the real line:
: ''absolutely continuous'' ⊆ ''uniformly continuous'' = ''continuous'' ...


Bhattacharyya Distance
In statistics, the Bhattacharyya distance measures the similarity of two probability distributions. It is closely related to the Bhattacharyya coefficient, which is a measure of the amount of overlap between two statistical samples or populations. It is not a metric, despite being named a "distance", since it does not obey the triangle inequality.
Definition
For probability distributions P and Q on the same domain \mathcal{X}, the Bhattacharyya distance is defined as
:D_B(P,Q) = -\ln \left( BC(P,Q) \right)
where
:BC(P,Q) = \sum_{x \in \mathcal{X}} \sqrt{P(x) Q(x)}
is the Bhattacharyya coefficient for discrete probability distributions. For continuous probability distributions, with P(dx) = p(x)\,dx and Q(dx) = q(x)\,dx where p(x) and q(x) are the probability density functions, the Bhattacharyya coefficient is defined as
:BC(P,Q) = \int_{\mathcal{X}} \sqrt{p(x) q(x)}\, dx.
More generally, given two probability measures P, Q on a measurable space (\mathcal X, \mathcal B), let \lambda be a (sigma-finite) measure such that P and Q are absolutely ...
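A minimal sketch of the discrete definitions (our own illustration; function names are ours):

```python
# Bhattacharyya coefficient and distance for two discrete distributions
# given over the same finite domain.
import math

def bhattacharyya_coefficient(p, q):
    """Overlap of two discrete distributions on the same support."""
    return sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))

def bhattacharyya_distance(p, q):
    return -math.log(bhattacharyya_coefficient(p, q))

p = [0.5, 0.3, 0.2]
q = [0.1, 0.4, 0.5]
print(bhattacharyya_distance(p, q))   # 0 only when P == Q; grows as overlap shrinks
```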



Jensen's Inequality
In mathematics, Jensen's inequality, named after the Danish mathematician Johan Jensen, relates the value of a convex function of an integral to the integral of the convex function. It was proved by Jensen in 1906, building on an earlier proof of the same inequality for doubly-differentiable functions by Otto Hölder in 1889. Given its generality, the inequality appears in many forms depending on the context, some of which are presented below. In its simplest form the inequality states that the convex transformation of a mean is less than or equal to the mean applied after convex transformation; it is a simple corollary that the opposite is true of concave transformations. Jensen's inequality generalizes the statement that the secant line of a convex function lies ''above'' the graph of the function, which is Jensen's inequality for two points: the secant line consists of weighted means of the convex function (for ''t'' ∈ [0,1]),
:t f(x_1) + (1-t) f(x_2),
while t ...
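A quick numerical illustration (our own sketch) with the convex function f = exp and a small finite distribution: the convex transformation of the mean stays below the mean of the transformed values.

```python
# Jensen's inequality for a convex f: f(E[X]) <= E[f(X)], checked on a finite sample.
import math

xs = [0.0, 1.0, 2.0, 5.0]
weights = [0.1, 0.4, 0.3, 0.2]                 # a probability distribution on xs

mean_x = sum(w * x for w, x in zip(weights, xs))
f_of_mean = math.exp(mean_x)                   # convex function of the mean
mean_of_f = sum(w * math.exp(x) for w, x in zip(weights, xs))  # mean of the convex function

print(f_of_mean, "<=", mean_of_f)              # the left-hand side never exceeds the right
```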



Bernoulli Distribution
In probability theory and statistics, the Bernoulli distribution, named after Swiss mathematician Jacob Bernoulli (James Victor Uspensky: ''Introduction to Mathematical Probability'', McGraw-Hill, New York 1937, page 45), is the discrete probability distribution of a random variable which takes the value 1 with probability p and the value 0 with probability q = 1-p. Less formally, it can be thought of as a model for the set of possible outcomes of any single experiment that asks a yes–no question. Such questions lead to outcomes that are boolean-valued: a single bit whose value is success/yes/true/one with probability ''p'' and failure/no/false/zero with probability ''q''. It can be used to represent a (possibly biased) coin toss where 1 and 0 would represent "heads" and "tails", respectively, and ''p'' would be the probability of the coin landing on heads (or vice versa where 1 would represent tails and ''p'' would be the probability of tails). In particular, unfair coins ...
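A minimal sketch (our own illustration) of the probability mass function and of sampling from a Bernoulli(p) variable, viewed as a biased coin:

```python
# Probability mass function and sampling for a Bernoulli(p) random variable.
import random

def bernoulli_pmf(k, p):
    """P(X = 1) = p and P(X = 0) = 1 - p."""
    return p if k == 1 else 1 - p

def bernoulli_sample(p, n, seed=0):
    rng = random.Random(seed)
    return [1 if rng.random() < p else 0 for _ in range(n)]

p = 0.3
draws = bernoulli_sample(p, 100_000)
print(bernoulli_pmf(1, p), sum(draws) / len(draws))  # the empirical mean approaches p
```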