Total Variation Distance Of Probability Measures
In probability theory, the total variation distance is a distance measure for probability distributions. It is an example of a statistical distance metric, and is sometimes called the statistical distance, statistical difference or variational distance.

Definition
Consider a measurable space (\Omega, \mathcal{F}) and probability measures P and Q defined on (\Omega, \mathcal{F}). The total variation distance between P and Q is defined as:
:\delta(P,Q) = \sup_{A \in \mathcal{F}} \left| P(A) - Q(A) \right|.
Informally, this is the largest possible difference between the probabilities that the two probability distributions can assign to the same event.

Properties
Relation to other distances
The total variation distance is related to the Kullback–Leibler divergence by Pinsker's inequality:
:\delta(P,Q) \le \sqrt{\tfrac{1}{2} D_{\mathrm{KL}}(P \parallel Q)}.
One also has the following inequality, due to Bretagnolle and Huber (see also Tsybakov), which has the advantage of providing a non-vacuous bound even when D_{\mathrm{KL}}(P \parallel Q) > 2:
:\delta(P,Q) \le \sqrt{1 - e^{-D_{\mathrm{KL}}(P \parallel Q)}} ...
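A minimal Python sketch (my own illustration, not part of the article): it computes the total variation distance between two small discrete distributions using the standard identity \delta(P,Q) = \tfrac{1}{2}\sum_i |p_i - q_i|, and compares it with the Pinsker and Bretagnolle–Huber upper bounds above. The example vectors are arbitrary.

import numpy as np

def total_variation(p, q):
    # delta(P, Q) = (1/2) * sum_i |p_i - q_i| for discrete distributions
    p, q = np.asarray(p, float), np.asarray(q, float)
    return 0.5 * np.abs(p - q).sum()

def kl_divergence(p, q):
    # D_KL(P || Q) in nats; assumes the support of P is contained in that of Q
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])

tv = total_variation(p, q)
kl = kl_divergence(p, q)
print(f"TV distance          = {tv:.4f}")
print(f"Pinsker bound         = {np.sqrt(kl / 2):.4f}")            # delta <= sqrt(KL/2)
print(f"Bretagnolle-Huber     = {np.sqrt(1 - np.exp(-kl)):.4f}")   # delta <= sqrt(1 - exp(-KL))

Both printed bounds should dominate the printed total variation distance, as the inequalities above require.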



Probability Theory
Probability theory is the branch of mathematics concerned with probability. Although there are several different probability interpretations, probability theory treats the concept in a rigorous mathematical manner by expressing it through a set of axioms. Typically these axioms formalise probability in terms of a probability space, which assigns a measure taking values between 0 and 1, termed the probability measure, to a set of outcomes called the sample space. Any specified subset of the sample space is called an event. Central subjects in probability theory include discrete and continuous random variables, probability distributions, and stochastic processes (which provide mathematical abstractions of non-deterministic or uncertain processes or measured quantities that may either be single occurrences or evolve over time in a random fashion). Although it is not possible to perfectly predict random events, much can be said about their behavior. Two major results in probability ...


Markov Chains And Mixing Times
''Markov Chains and Mixing Times'' is a book on Markov chain mixing times. The second edition was written by David A. Levin and Yuval Peres. Elizabeth Wilmer was a co-author on the first edition and is credited as a contributor to the second edition. The first edition was published in 2009 by the American Mathematical Society, with an expanded second edition in 2017.

Background
A Markov chain is a stochastic process defined by a set of states and, for each state, a probability distribution on the states. Starting from an initial state, it follows a sequence of states where each state in the sequence is chosen randomly from the distribution associated with the previous state. In that sense, it is "memoryless": each random choice depends only on the current state, and not on the past history of states. Under mild restrictions, a Markov chain with a finite set of states will have a stationary distribution that it converges to, meaning that, after a sufficiently large number of steps, ...
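A minimal Python sketch (my own illustration, with an arbitrarily chosen transition matrix): it tracks how fast a small chain approaches its stationary distribution, measured in total variation distance; mixing times are often defined as the first step at which this distance drops below 1/4.

import numpy as np

# Transition matrix of a 3-state chain (rows sum to 1); chosen arbitrarily here.
P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.6, 0.2],
              [0.0, 0.5, 0.5]])

# Stationary distribution: left eigenvector of P for eigenvalue 1, normalized.
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmin(np.abs(w - 1))])
pi /= pi.sum()

mu = np.array([1.0, 0.0, 0.0])   # start deterministically in state 0
for t in range(1, 11):
    mu = mu @ P                  # distribution of the chain after t steps
    d = 0.5 * np.abs(mu - pi).sum()
    print(f"t={t:2d}  TV distance to stationarity = {d:.5f}")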



Wasserstein Metric
In mathematics, the Wasserstein distance or Kantorovich–Rubinstein metric is a distance function defined between probability distributions on a given metric space M. It is named after Leonid Vaseršteĭn. Intuitively, if each distribution is viewed as a unit amount of earth (soil) piled on ''M'', the metric is the minimum "cost" of turning one pile into the other, which is assumed to be the amount of earth that needs to be moved times the mean distance it has to be moved. This problem was first formalised by Gaspard Monge in 1781. Because of this analogy, the metric is known in computer science as the earth mover's distance. The name "Wasserstein distance" was coined by R. L. Dobrushin in 1970, after learning of it in the work of Leonid Vaseršteĭn on Markov processes describing large systems of automata (Russian, 1969). However the metric was first defined by Leonid Kantorovich in ''The Mathematical Method of Production Planning and Organization'' (Russian original 1939 ...
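A minimal Python sketch (my own illustration, assuming equal-size samples on the real line): in one dimension the Wasserstein-1 distance between two empirical distributions reduces to the mean absolute difference between sorted samples, i.e. the area between the two empirical CDFs.

import numpy as np

def wasserstein_1d(x, y):
    # 1-D W1 for equal-size empirical samples: pair sorted values and average the gaps
    x, y = np.sort(np.asarray(x, float)), np.sort(np.asarray(y, float))
    assert len(x) == len(y), "this shortcut assumes equal sample sizes"
    return np.mean(np.abs(x - y))

rng = np.random.default_rng(0)
a = rng.normal(loc=0.0, scale=1.0, size=1000)
b = rng.normal(loc=0.5, scale=1.0, size=1000)
print(wasserstein_1d(a, b))   # close to the mean shift of 0.5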



Kolmogorov–Smirnov Test
In statistics, the Kolmogorov–Smirnov test (K–S test or KS test) is a nonparametric test of the equality of continuous (or discontinuous, see Section 2.2), one-dimensional probability distributions that can be used to compare a sample with a reference probability distribution (one-sample K–S test), or to compare two samples (two-sample K–S test). In essence, the test answers the question "What is the probability that this collection of samples could have been drawn from that probability distribution?" or, in the second case, "What is the probability that these two sets of samples were drawn from the same (but unknown) probability distribution?". It is named after Andrey Kolmogorov and Nikolai Smirnov. The Kolmogorov–Smirnov statistic quantifies a distance between the empirical distribution function of the sample and the cumulative distribution function of the reference distribution, or between the empirical distribution functions of two samples. The null distributio ...
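A minimal Python sketch (my own illustration): the two-sample Kolmogorov–Smirnov statistic computed directly as the largest gap between the two empirical distribution functions, evaluated at all sample points.

import numpy as np

def ks_statistic(x, y):
    x, y = np.sort(x), np.sort(y)
    grid = np.concatenate([x, y])                       # evaluate both ECDFs at every sample point
    ecdf_x = np.searchsorted(x, grid, side="right") / len(x)
    ecdf_y = np.searchsorted(y, grid, side="right") / len(y)
    return np.max(np.abs(ecdf_x - ecdf_y))

rng = np.random.default_rng(1)
print(ks_statistic(rng.normal(size=500), rng.normal(0.3, 1.0, size=500)))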


Total Variation
In mathematics, the total variation identifies several slightly different concepts, related to the (local or global) structure of the codomain of a function or a measure. For a real-valued continuous function ''f'', defined on an interval [''a'', ''b''] ⊂ R, its total variation on the interval of definition is a measure of the one-dimensional arclength of the curve with parametric equation ''x'' ↦ ''f''(''x''), for ''x'' ∈ [''a'', ''b'']. Functions whose total variation is finite are called functions of bounded variation.

Historical note
The concept of total variation for functions of one real variable was first introduced by Camille Jordan in the paper . He used the new concept in order to prove a convergence theorem for Fourier series of discontinuous periodic functions whose variation is bounded. The extension of the concept to functions of more than one variable however is not simple for various reasons.

Definitions
Total variation for functions of one real variable
Th ...
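A minimal Python sketch (my own illustration): the total variation of a function of one real variable approximated on a fine partition of [a, b] by summing |f(x_{i+1}) - f(x_i)|.

import numpy as np

def total_variation_of(f, a, b, n=100_000):
    x = np.linspace(a, b, n)
    return np.sum(np.abs(np.diff(f(x))))

# sin on [0, 2*pi] rises by 1, falls by 2, then rises by 1: total variation 4.
print(total_variation_of(np.sin, 0.0, 2 * np.pi))   # approximately 4.0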


Transportation Theory (mathematics)
In mathematics and economics, transportation theory or transport theory is a name given to the study of optimal transportation and allocation of resources. The problem was formalized by the French mathematician Gaspard Monge in 1781 (G. Monge, ''Mémoire sur la théorie des déblais et des remblais'', Histoire de l'Académie Royale des Sciences de Paris, avec les Mémoires de Mathématique et de Physique pour la même année, pages 666–704, 1781). In the 1920s A.N. Tolstoi was one of the first to study the transportation problem mathematically. In 1930, in the collection ''Transportation Planning Volume I'' for the National Commissariat of Transportation of the Soviet Union, he published a paper "Methods of Finding the Minimal Kilometrage in Cargo-transportation in space". Major advances were made in the field during World War II by the Soviet mathematician and economist Leonid Kantorovich (L. Kantorovich, ''On the translocation of masses'', C.R. (Doklady) Acad. Sci. URSS (N.S. ...



Lp Space
In mathematics, the L^p spaces are function spaces defined using a natural generalization of the p-norm for finite-dimensional vector spaces. They are sometimes called Lebesgue spaces, named after Henri Lebesgue, although according to the Bourbaki group they were first introduced by Frigyes Riesz. L^p spaces form an important class of Banach spaces in functional analysis, and of topological vector spaces. Because of their key role in the mathematical analysis of measure and probability spaces, Lebesgue spaces are used also in the theoretical discussion of problems in physics, statistics, economics, finance, engineering, and other disciplines.

Applications
Statistics
In statistics, measures of central tendency and statistical dispersion, such as the mean, median, and standard deviation, are defined in terms of L^p metrics, and measures of central tendency can be characterized as solutions to ...
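A minimal Python sketch (my own illustration): the p-norm of a finite-dimensional vector for several values of p, the quantity that L^p spaces generalize to functions.

import numpy as np

def p_norm(x, p):
    x = np.asarray(x, float)
    if np.isinf(p):
        return np.max(np.abs(x))          # sup-norm as the limiting case
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

v = np.array([3.0, -4.0, 1.0])
for p in (1, 2, 3, np.inf):
    print(f"||v||_{p} = {p_norm(v, p):.4f}")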




Hellinger Distance
In probability and statistics, the Hellinger distance (closely related to, although different from, the Bhattacharyya distance) is used to quantify the similarity between two probability distributions. It is a type of ''f''-divergence. The Hellinger distance is defined in terms of the Hellinger integral, which was introduced by Ernst Hellinger in 1909. It is sometimes called the Jeffreys distance.

Definition
Measure theory
To define the Hellinger distance in terms of measure theory, let P and Q denote two probability measures on a measure space \mathcal{X} that are absolutely continuous with respect to an auxiliary measure \lambda. Such a measure always exists, e.g. \lambda = \tfrac{1}{2}(P + Q). The square of the Hellinger distance between P and Q is defined as the quantity
:H^2(P,Q) = \frac{1}{2} \displaystyle \int_{\mathcal{X}} \left(\sqrt{p(x)} - \sqrt{q(x)}\right)^2 \lambda(dx).
Here, P(dx) = p(x)\lambda(dx) and Q(dx) = q(x)\lambda(dx), i.e. p and q are the Radon–Nikodym derivatives of ''P'' and ''Q'' respe ...
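A minimal Python sketch (my own illustration): for discrete distributions the squared Hellinger distance above reduces to H(P,Q) = (1/\sqrt{2}) \|\sqrt{p} - \sqrt{q}\|_2, computed here for the same example vectors used earlier.

import numpy as np

def hellinger(p, q):
    p, q = np.asarray(p, float), np.asarray(q, float)
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])
print(hellinger(p, q))   # always between 0 and 1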


L1 Norm
In mathematics, the L^p spaces are function spaces defined using a natural generalization of the p-norm for finite-dimensional vector spaces. They are sometimes called Lebesgue spaces, named after Henri Lebesgue, although according to the Bourbaki group they were first introduced by Frigyes Riesz. L^p spaces form an important class of Banach spaces in functional analysis, and of topological vector spaces. Because of their key role in the mathematical analysis of measure and probability spaces, Lebesgue spaces are used also in the theoretical discussion of problems in physics, statistics, economics, finance, engineering, and other disciplines.

Applications
Statistics
In statistics, measures of central tendency and statistical dispersion, such as the mean, median, and standard deviation, are defined in terms of L^p metrics, and measures of central tendency can be characterized as solutions to variational problems. In penalized regression, "L1 penalty" and "L2 penalty" refer to penaliz ...
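A minimal Python sketch (my own illustration): the connection back to the main article, namely that for discrete distributions the total variation distance equals half the L1 distance between the probability vectors, \delta(P,Q) = \tfrac{1}{2}\|p - q\|_1.

import numpy as np

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])

l1 = np.sum(np.abs(p - q))
print("L1 distance :", l1)        # 0.2
print("TV distance :", 0.5 * l1)  # 0.1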


Statistical Distance
In statistics, probability theory, and information theory, a statistical distance quantifies the distance between two statistical objects, which can be two random variables, or two probability distributions or samples, or the distance can be between an individual sample point and a population or a wider sample of points. A distance between populations can be interpreted as measuring the distance between two probability distributions and hence they are essentially measures of distances between probability measures. Where statistical distance measures relate to the differences between random variables, these may have statistical dependence (Dodge, Y. (2003), entry for distance), and hence these distances are not directly related to measures of distances between probability measures. Again, a measure of distance between random variables may relate to the extent of dependence between them, rather than to their individual values. Statistical distance measures are not typically m ...


Bretagnolle–Huber Inequality
In information theory, the Bretagnolle–Huber inequality bounds the total variation distance between two probability distributions P and Q by a concave and bounded function of the Kullback–Leibler divergence D_\mathrm{KL}(P \parallel Q). The bound can be viewed as an alternative to the well-known Pinsker's inequality: when D_\mathrm{KL}(P \parallel Q) is large (larger than 2, for instance), Pinsker's inequality is vacuous, while Bretagnolle–Huber remains bounded and hence non-vacuous. It is used in statistics and machine learning to prove information-theoretic lower bounds relying on hypothesis testing.

Formal statement
Preliminary definitions
Let P and Q be two probability distributions on a measurable space (\mathcal{X}, \mathcal{A}). Recall that the total variation between P and Q is defined by
:d_\mathrm{TV}(P,Q) = \sup_{A \in \mathcal{A}} \bigl\{ |P(A) - Q(A)| \bigr\}.
The Kullback–Leibler divergence is defined as follows:
:D_\mathrm{KL}(P \parallel Q) = \begin{cases} \int_{\mathcal{X}} \log\bigl(\frac{dP}{dQ}\bigr)\, dP & \text{if } P \ll Q, \\ +\infty & \text{otherwise.} \end{cases} ...
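A minimal Python sketch (my own illustration): tabulating the Pinsker bound sqrt(KL/2) against the Bretagnolle–Huber bound sqrt(1 - exp(-KL)) for a few values of the Kullback–Leibler divergence. Pinsker is tighter for small KL but exceeds 1 (hence is vacuous) once KL > 2, while Bretagnolle–Huber stays below 1 for every KL.

import numpy as np

for kl in (0.1, 0.5, 1.0, 2.0, 5.0, 10.0):
    pinsker = np.sqrt(kl / 2.0)
    bh = np.sqrt(1.0 - np.exp(-kl))
    print(f"KL={kl:5.1f}  Pinsker={pinsker:6.3f}  Bretagnolle-Huber={bh:6.3f}")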




Pinsker's Inequality
In information theory, Pinsker's inequality, named after its inventor Mark Semenovich Pinsker, is an inequality that bounds the total variation distance (or statistical distance) in terms of the Kullback–Leibler divergence. The inequality is tight up to constant factors.

Formal statement
Pinsker's inequality states that, if P and Q are two probability distributions on a measurable space (X, \Sigma), then
:\delta(P,Q) \le \sqrt{\tfrac{1}{2} D_{\mathrm{KL}}(P \parallel Q)},
where
:\delta(P,Q) = \sup \bigl\{ |P(A) - Q(A)| : A \in \Sigma \bigr\}
is the total variation distance (or statistical distance) between P and Q and
:D_{\mathrm{KL}}(P \parallel Q) = \operatorname{E}_P \left( \log \frac{\mathrm{d}P}{\mathrm{d}Q} \right) = \int_X \left( \log \frac{\mathrm{d}P}{\mathrm{d}Q} \right) \, \mathrm{d}P
is the Kullback–Leibler divergence in nats. When the sample space X is a finite set, the Kullback–Leibler divergence is given by
:D_{\mathrm{KL}}(P \parallel Q) = \sum_{i \in X} \left( \log \frac{P(i)}{Q(i)} \right) P(i).
Note that in terms of the total variation norm \| P - Q \| of the signed measure P - Q, Pinsker's inequality differs from the one given a ...
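A minimal Python sketch (my own illustration): a numerical check of Pinsker's inequality, \delta(P,Q) \le \sqrt{D_{\mathrm{KL}}(P \parallel Q)/2}, on randomly drawn discrete distributions.

import numpy as np

rng = np.random.default_rng(42)
for _ in range(5):
    p = rng.dirichlet(np.ones(4))            # random point on the probability simplex
    q = rng.dirichlet(np.ones(4))
    tv = 0.5 * np.abs(p - q).sum()
    kl = np.sum(p * np.log(p / q))           # in nats; Dirichlet draws are a.s. strictly positive
    print(f"TV={tv:.4f}  sqrt(KL/2)={np.sqrt(kl / 2):.4f}  holds={tv <= np.sqrt(kl / 2)}")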