Kullback's Inequality
In information theory and statistics, Kullback's inequality is a lower bound on the Kullback–Leibler divergence expressed in terms of the large deviations rate function. If ''P'' and ''Q'' are probability distributions on the real line such that ''P'' is absolutely continuous with respect to ''Q'' (i.e. ''P'' << ''Q'') and whose first moments exist, then
:D_{KL}(P \parallel Q) \ge \Psi_Q^*(\mu'_1(P)),
where \Psi_Q^* is the rate function, i.e. the convex conjugate of the cumulant-generating function, of ''Q'', and \mu'_1(P) is the first moment of ''P''. The Cramér–Rao bound is a corollary of this result.
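As a minimal worked example (taking ''Q'' to be the standard normal distribution, a choice not fixed by the statement above), the cumulant-generating function of ''Q'' and its convex conjugate have simple closed forms:
:\Psi_Q(t) = \log \operatorname{E}_Q\!\left[e^{tX}\right] = \frac{t^2}{2}, \qquad \Psi_Q^*(\mu) = \sup_t \left( \mu t - \frac{t^2}{2} \right) = \frac{\mu^2}{2},
so Kullback's inequality gives D_{KL}(P \parallel Q) \ge \mu^2/2 for any ''P'' with mean μ. When ''P'' is the normal distribution with mean μ and unit variance, the divergence equals μ^2/2 exactly, so the bound is attained in that case.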


Information Theory
Information theory is the scientific study of the quantification, storage, and communication of information. The field was originally established by the works of Harry Nyquist and Ralph Hartley in the 1920s, and Claude Shannon in the 1940s. The field is at the intersection of probability theory, statistics, computer science, statistical mechanics, information engineering, and electrical engineering. A key measure in information theory is entropy. Entropy quantifies the amount of uncertainty involved in the value of a random variable or the outcome of a random process. For example, identifying the outcome of a fair coin flip (with two equally likely outcomes) provides less information (lower entropy) than specifying the outcome from a roll of a die (with six equally likely outcomes). Some other important measures in information theory are mutual information ...
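As a quick check of the coin/die comparison, measured in bits:
:H(\text{fair coin}) = \log_2 2 = 1 \text{ bit}, \qquad H(\text{fair die}) = \log_2 6 \approx 2.585 \text{ bits},
so learning the outcome of a fair die roll resolves more uncertainty than learning the outcome of a fair coin flip.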


Moment-generating Function
In probability theory and statistics, the moment-generating function of a real-valued random variable is an alternative specification of its probability distribution. Thus, it provides the basis of an alternative route to analytical results compared with working directly with probability density functions or cumulative distribution functions. There are particularly simple results for the moment-generating functions of distributions defined by the weighted sums of random variables. However, not all random variables have moment-generating functions. As its name implies, the moment-generating function can be used to compute a distribution’s moments: the ''n''th moment about 0 is the ''n''th derivative of the moment-generating function, evaluated at 0. In addition to real-valued distributions (univariate distributions), moment-generating functions can be defined for vector- or matrix-valued random variables, and can even be extended to more general cases. The moment-generating func ...
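A small worked example, using the exponential distribution with rate λ (one convenient case where the moment-generating function exists in a neighbourhood of 0):
:M_X(t) = \operatorname{E}\!\left[e^{tX}\right] = \frac{\lambda}{\lambda - t} \quad (t < \lambda), \qquad M_X'(0) = \frac{1}{\lambda}, \qquad M_X''(0) = \frac{2}{\lambda^2},
recovering the first and second moments about 0 by differentiation at 0, as described above.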




Rate Function
In mathematics — specifically, in large deviations theory — a rate function is a function used to quantify the probabilities of rare events. It is required to have several properties which assist in the formulation of the large deviation principle. In some sense, the large deviation principle is an analogue of weak convergence of probability measures, but one which takes account of how well the rare events behave. A rate function is also called a Cramér function, after the Swedish probabilist Harald Cramér. Definitions Rate function An extended real-valued function ''I'' : ''X'' → [0, +∞] defined on a Hausdorff topological space ''X'' is said to be a rate function if it is not identically +∞ and is lower semi-continuous, i.e. all the sub-level sets
:\{ x \in X \mid I(x) \leq c \} \quad \text{for } c \geq 0
are closed in ''X''. If, furthermore, they are compact, then ''I'' is said to be a good rate function. A family of probability measures (''μ''_''δ'')_(''δ'' > 0) ...
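As a minimal illustration (assuming the quadratic function that arises from Gaussian tails), let
:I(x) = \frac{x^2}{2}, \qquad \{ x \mid I(x) \leq c \} = \left[ -\sqrt{2c}, \sqrt{2c} \right] \quad \text{for } c \geq 0;
every sub-level set is closed and compact, so this ''I'' is a good rate function. It is also the rate function \Psi_Q^* that appears in Kullback's inequality when ''Q'' is the standard normal distribution.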




Convex Conjugate
In mathematics and mathematical optimization, the convex conjugate of a function is a generalization of the Legendre transformation which applies to non-convex functions. It is also known as the Legendre–Fenchel transformation, Fenchel transformation, or Fenchel conjugate (after Adrien-Marie Legendre and Werner Fenchel). It allows in particular for a far-reaching generalization of Lagrangian duality. Definition Let X be a real topological vector space and let X^* be the dual space to X. Denote by
:\langle \cdot , \cdot \rangle : X^* \times X \to \mathbb{R}
the canonical dual pairing, which is defined by \left( x^*, x \right) \mapsto x^*(x). For a function f : X \to \mathbb{R} \cup \{ -\infty, +\infty \} taking values on the extended real number line, its convex conjugate is the function
:f^* : X^* \to \mathbb{R} \cup \{ -\infty, +\infty \}
whose value at x^* \in X^* is defined to be the supremum:
:f^*\left( x^* \right) := \sup \left\{ \langle x^*, x \rangle - f(x) : x \in X \right\},
or, equivalently, in terms of the infimum:
:f^*\left( x^* \right) := - \inf \left\{ f(x) - \langle x^*, x \rangle : x \in X \right\}.
This definition can be ...
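A one-dimensional sanity check (taking X = X^* = \mathbb{R} with \langle x^*, x \rangle = x^* x, and f(x) = e^x):
:f^*(x^*) = \sup_x \left( x^* x - e^x \right) = \begin{cases} x^* \ln x^* - x^* & x^* > 0, \\ 0 & x^* = 0, \\ +\infty & x^* < 0, \end{cases}
which is convex and lower semi-continuous, as conjugates always are. In Kullback's inequality, the conjugate is applied to the cumulant-generating function \Psi_Q, and the result is the rate function \Psi_Q^*.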


Large Deviations Theory
In probability theory, the theory of large deviations concerns the asymptotic behaviour of remote tails of sequences of probability distributions. While some basic ideas of the theory can be traced to Laplace, the formalization started with insurance mathematics, namely ruin theory with Cramér and Lundberg. A unified formalization of large deviation theory was developed in 1966, in a paper by Varadhan. Large deviations theory formalizes the heuristic ideas of ''concentration of measures'' and widely generalizes the notion of convergence of probability measures. Roughly speaking, large deviations theory concerns itself with the exponential decline of the probability measures of certain kinds of extreme or ''tail'' events. Introductory examples An elementary example Consider a sequence of independent tosses of a fair coin. The possible outcomes could be heads or tails. Let us denote the possible outcome of the ''i''-th trial by X_i, where we encode heads as 1 and tails as 0. Now let M_N ...
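Sketching the tail estimate this leads to (assuming, as is standard in this example, that M_N denotes the fraction of heads in the first N tosses), Cramér's theorem gives
:P(M_N \geq x) \approx e^{-N I(x)}, \qquad I(x) = x \ln(2x) + (1 - x)\ln\big(2(1 - x)\big) \quad \text{for } x \in [\tfrac{1}{2}, 1],
so, for instance, the probability that more than 60% of a long run of fair-coin tosses come up heads decays exponentially in N, with rate I(0.6) \approx 0.020.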


Fisher Information
In mathematical statistics, the Fisher information (sometimes simply called information) is a way of measuring the amount of information that an observable random variable ''X'' carries about an unknown parameter ''θ'' of a distribution that models ''X''. Formally, it is the variance of the score, or the expected value of the observed information. In Bayesian statistics, the asymptotic distribution of the posterior mode depends on the Fisher information and not on the prior (according to the Bernstein–von Mises theorem, which was anticipated by Laplace for exponential families). The role of the Fisher information in the asymptotic theory of maximum-likelihood estimation was emphasized by the statistician Ronald Fisher (following some initial results by Francis Ysidro Edgeworth). The Fisher information is also used in the calculation of the Jeffreys prior, which is used in Bayesian statistics. The Fisher information matrix is used to calculate the covariance matrices associat ...
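A standard illustration, assuming a single observation X from a Bernoulli(θ) distribution:
:\mathcal{I}(\theta) = \operatorname{E}\!\left[ \left( \frac{\partial}{\partial \theta} \ln f(X; \theta) \right)^{\!2} \right] = \frac{1}{\theta(1 - \theta)},
which is the variance of the score and grows without bound as θ approaches 0 or 1, where a single observation is most informative about θ.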


Cumulant
In probability theory and statistics, the cumulants of a probability distribution are a set of quantities that provide an alternative to the ''moments'' of the distribution. Any two probability distributions whose moments are identical will have identical cumulants as well, and vice versa. The first cumulant is the mean, the second cumulant is the variance, and the third cumulant is the same as the third central moment. But fourth and higher-order cumulants are not equal to central moments. In some cases theoretical treatments of problems in terms of cumulants are simpler than those using moments. In particular, when two or more random variables are statistically independent, the ''n''-th-order cumulant of their sum is equal to the sum of their ''n''-th-order cumulants. As well, the third and higher-order cumulants of a normal distribution are zero, and it is the only distribution with this property. Just as for moments, where ''joint moments'' are used for collections of random variables ...
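A compact example, assuming the normal distribution N(μ, σ²), whose cumulant-generating function is a quadratic polynomial:
:K(t) = \ln \operatorname{E}\!\left[e^{tX}\right] = \mu t + \frac{\sigma^2 t^2}{2}, \qquad \kappa_1 = \mu, \quad \kappa_2 = \sigma^2, \quad \kappa_3 = \kappa_4 = \cdots = 0,
since the ''n''-th cumulant is the ''n''-th derivative of K at 0; this illustrates the statement above that all cumulants of order three and higher vanish for the normal distribution.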




Cramér–Rao Bound
In estimation theory and statistics, the Cramér–Rao bound (CRB) expresses a lower bound on the variance of unbiased estimators of a deterministic (fixed, though unknown) parameter: the variance of any such estimator is at least as high as the inverse of the Fisher information. Equivalently, it expresses an upper bound on the precision (the inverse of variance) of unbiased estimators: the precision of any such estimator is at most the Fisher information. The result is named in honor of Harald Cramér and C. R. Rao, but it was also derived independently by Maurice Fréchet, Georges Darmois, Alexander Aitken, and Harold Silverstone. An unbiased estimator that achieves this lower bound is said to be (fully) ''efficient''. Such a solution achieves the lowest possible mean squared error among all unbiased methods, and is therefore the minimum variance unbiased (MVU) estimator. However, in some cases, no unbiased technique exists which achieves the bound. This may occur ...
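A minimal sketch, assuming ''n'' independent observations from N(θ, σ²) with σ² known:
:\operatorname{Var}(\hat\theta) \;\ge\; \frac{1}{n\,\mathcal{I}(\theta)} = \frac{\sigma^2}{n},
and the sample mean is an unbiased estimator whose variance equals σ²/n, so it attains the bound and is therefore the minimum variance unbiased estimator of θ in this model.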


Gibbs' Inequality
In information theory, Gibbs' inequality is a statement about the information entropy of a discrete probability distribution. Several other bounds on the entropy of probability distributions are derived from Gibbs' inequality, including Fano's inequality. It was first presented by J. Willard Gibbs in the 19th century. Gibbs' inequality Suppose that
: P = \{ p_1, \ldots, p_n \}
is a discrete probability distribution. Then for any other probability distribution
: Q = \{ q_1, \ldots, q_n \}
the following inequality between positive quantities (since the p_i and q_i are between zero and one) holds:
: - \sum_{i=1}^n p_i \log p_i \leq - \sum_{i=1}^n p_i \log q_i
with equality if and only if
: p_i = q_i
for all ''i''. Put in words, the information entropy of a distribution P is less than or equal to its cross entropy with any other distribution Q. The difference between the two quantities is the Kullback–Leibler divergence or relative entropy, so the inequality can also be written:
: D_{KL}(P \parallel Q) \equiv ...
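A quick numerical check with P = (1/2, 1/2) and Q = (1/4, 3/4), using natural logarithms:
:-\sum_i p_i \ln p_i = \ln 2 \approx 0.693, \qquad -\sum_i p_i \ln q_i = \tfrac{1}{2}\ln 4 + \tfrac{1}{2}\ln\tfrac{4}{3} \approx 0.837,
so the entropy of ''P'' is indeed no greater than its cross entropy with ''Q''; the gap, about 0.144 nats, is D_{KL}(P \parallel Q).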


Natural Exponential Family
In probability and statistics, a natural exponential family (NEF) is a class of probability distributions that is a special case of an exponential family (EF). Definition Univariate case The natural exponential families (NEF) are a subset of the exponential families. A NEF is an exponential family in which the natural parameter ''η'' and the natural statistic ''T''(''x'') are both the identity. A distribution in an exponential family with parameter ''θ'' can be written with probability density function (PDF) : f_X(x\mid \theta) = h(x)\ \exp\Big(\ \eta(\theta) T(x) - A(\theta)\ \Big) \,\! , where h(x) and A(\theta) are known functions. A distribution in a natural exponential family with parameter θ can thus be written with PDF : f_X(x\mid \theta) = h(x)\ \exp\Big(\ \theta x - A(\theta)\ \Big) \,\! . [Note that slightly different notation is used by the originator of the NEF, Carl Morris. Morris C. (2006) "Natural exponential families", ''Encyclopedia of Statis ...
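As one concrete member of this class (taking the Poisson distribution with natural parameter θ = ln λ):
:f_X(x \mid \theta) = \frac{1}{x!} \exp\Big(\ \theta x - e^{\theta}\ \Big), \qquad h(x) = \frac{1}{x!}, \quad A(\theta) = e^{\theta},
which has the NEF form above; the normal family with known variance and the binomial family with known number of trials can be written the same way.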