Sanov's Theorem
In mathematics and information theory, Sanov's theorem gives a bound on the probability of observing an atypical sequence of samples from a given probability distribution. In the language of large deviations theory, Sanov's theorem identifies the rate function for large deviations of the empirical measure of a sequence of i.i.d. random variables. Let ''A'' be a set of probability distributions over an alphabet ''X'', and let ''q'' be an arbitrary distribution over ''X'' (where ''q'' may or may not be in ''A''). Suppose we draw ''n'' i.i.d. samples from ''q'', represented by the vector x^n = x_1, x_2, \ldots, x_n. Then, we have the following bound on the probability that the empirical measure \hat{p}_{x^n} of the samples falls within the set ''A'':
:q^n(\hat{p}_{x^n} \in A) \le (n+1)^{|X|} 2^{-n D_{\mathrm{KL}}(p^* \| q)},
where
* q^n is the joint probability distribution on X^n, and
* p^* is the information projection of ''q'' onto ''A''.
In words, the probability of drawing an atypical empirical distribution decays exponentially in ''n'': up to the polynomial factor (n+1)^{|X|}, it is bounded by 2^{-n D_{\mathrm{KL}}(p^* \| q)}, where p^* is the distribution in ''A'' closest to ''q'' in KL divergence.
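As a rough numerical illustration of the bound (an assumed toy setting, not part of the theorem's statement): take ''q'' to be a fair coin, so |X| = 2, and let ''A'' be the set of Bernoulli distributions with parameter at least 0.7, whose information projection is p^* = Bernoulli(0.7). The Python sketch below compares the Sanov bound with the exact probability that the empirical frequency of heads reaches 0.7.

    from math import comb, log2, ceil

    # Illustrative setting (assumed for this sketch): q = Bernoulli(0.5), a fair
    # coin, and A = { Bernoulli(p) : p >= 0.7 }.  The information projection of q
    # onto A is p* = Bernoulli(0.7), because D_KL(Bernoulli(p) || Bernoulli(0.5))
    # is increasing in p on [0.5, 1].

    def kl_bernoulli(p, q):
        """D_KL(Bernoulli(p) || Bernoulli(q)) in bits."""
        out = 0.0
        for a, b in ((p, q), (1.0 - p, 1.0 - q)):
            if a > 0.0:
                out += a * log2(a / b)
        return out

    n = 200
    threshold = 0.7
    d_star = kl_bernoulli(threshold, 0.5)

    # Sanov bound: q^n(empirical measure in A) <= (n+1)^{|X|} * 2^{-n D_KL(p*||q)}
    sanov_bound = (n + 1) ** 2 * 2.0 ** (-n * d_star)   # |X| = 2 for a coin

    # Exact probability that the empirical frequency of heads is >= 0.7,
    # i.e. the Binomial(n, 1/2) upper tail.
    k_min = ceil(threshold * n)
    exact = sum(comb(n, k) for k in range(k_min, n + 1)) / 2.0 ** n

    print(f"D_KL(p* || q)          = {d_star:.4f} bits")
    print(f"Sanov upper bound      = {sanov_bound:.3e}")
    print(f"exact tail probability = {exact:.3e}")
    print(f"empirical exponent     = {-log2(exact) / n:.4f}  (close to D_KL)")

The exact tail sits well below the bound because of the polynomial factor (n+1)^{|X|}, but the exponential rate -(1/n) log2 of the probability approaches D_{\mathrm{KL}}(p^* \| q) as ''n'' grows.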
Mathematics
Mathematics is an area of knowledge that includes the topics of numbers, formulas and related structures, shapes and the spaces in which they are contained, and quantities and their changes. These topics are represented in modern mathematics with the major subdisciplines of number theory, algebra, geometry, and analysis, respectively. There is no general consensus among mathematicians about a common definition for their academic discipline. Most mathematical activity involves the discovery of properties of abstract objects and the use of pure reason to prove them. These objects consist of either abstractions from nature or (in modern mathematics) entities that are stipulated to have certain properties, called axioms. A ''proof'' consists of a succession of applications of deductive rules to already established results. These results include previously proved theorems, axioms, and (in the case of abstraction from nature) some basic properties that are considered true starting points of ...
Information Theory
Information theory is the scientific study of the quantification, storage, and communication of information. The field was originally established by the works of Harry Nyquist and Ralph Hartley in the 1920s, and Claude Shannon in the 1940s. The field is at the intersection of probability theory, statistics, computer science, statistical mechanics, information engineering, and electrical engineering. A key measure in information theory is entropy. Entropy quantifies the amount of uncertainty involved in the value of a random variable or the outcome of a random process. For example, identifying the outcome of a fair coin flip (with two equally likely outcomes) provides less information (lower entropy) than specifying the outcome from a roll of a die (with six equally likely outcomes). Some other important measures in information theory are mutual information ...
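As a quick check of the coin-versus-die comparison above, here is a minimal Python sketch (the helper name entropy is ours, chosen for illustration) computing the Shannon entropy of both distributions.

    from math import log2

    def entropy(probs):
        """Shannon entropy in bits of a discrete distribution given as probabilities."""
        return -sum(p * log2(p) for p in probs if p > 0)

    coin = [0.5, 0.5]          # fair coin: two equally likely outcomes
    die = [1 / 6] * 6          # fair die: six equally likely outcomes

    print(f"H(coin) = {entropy(coin):.3f} bits")   # 1.000
    print(f"H(die)  = {entropy(die):.3f} bits")    # 2.585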
Typical Set
In information theory, the typical set is a set of sequences whose probability is close to two raised to the negative power of the entropy of their source distribution. That this set has total probability close to one is a consequence of the asymptotic equipartition property (AEP), which is a kind of law of large numbers. The notion of typicality is only concerned with the probability of a sequence and not the actual sequence itself. This has great use in compression theory as it provides a theoretical means for compressing data, allowing us to represent any sequence ''X''''n'' using ''nH''(''X'') bits on average, and, hence, justifying the use of entropy as a measure of information from a source. The AEP can also be proven for a large class of stationary ergodic processes, allowing the typical set to be defined in more general cases.

(Weakly) typical sequences (weak typicality, entropy typicality)
If a sequence ''x''1, ..., ''x''''n'' is drawn from an i.i.d. distribution ''X ...
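The AEP statement above can be illustrated numerically. The sketch below (an assumed Bernoulli(0.3) source; the helper names are ours) draws i.i.d. sequences of increasing length and checks that the per-symbol log-probability -(1/n) log2 p(x^n) settles near the source entropy H(X), which is what makes long sequences "typical".

    import random
    from math import log2

    # Assumed setup: an i.i.d. Bernoulli(0.3) source.  By the AEP,
    # -(1/n) log2 p(x_1, ..., x_n) concentrates around H(X) as n grows.

    p = 0.3
    H = -(p * log2(p) + (1 - p) * log2(1 - p))   # entropy of the source, ~0.881 bits

    def per_symbol_log_prob(seq, p):
        """-(1/n) log2 of the probability of an i.i.d. Bernoulli(p) sequence."""
        n = len(seq)
        ones = sum(seq)
        return -(ones * log2(p) + (n - ones) * log2(1 - p)) / n

    random.seed(0)
    for n in (100, 1_000, 10_000):
        seq = [1 if random.random() < p else 0 for _ in range(n)]
        print(f"n={n:6d}:  -(1/n) log2 p(x^n) = {per_symbol_log_prob(seq, p):.3f}   H(X) = {H:.3f}")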
Probability Distribution
In probability theory and statistics, a probability distribution is the mathematical function that gives the probabilities of occurrence of different possible outcomes for an experiment. It is a mathematical description of a random phenomenon in terms of its sample space and the probabilities of events (subsets of the sample space). For instance, if ''X'' is used to denote the outcome of a coin toss ("the experiment"), then the probability distribution of ''X'' would take the value 0.5 (1 in 2 or 1/2) for ''X'' = heads, and 0.5 for ''X'' = tails (assuming that the coin is fair). Examples of random phenomena include the weather conditions at some future date, the height of a randomly selected person, the fraction of male students in a school, the results of a survey to be conducted, etc.

Introduction
A probability distribution is a mathematical description of the probabilities of events, subsets of the sample space. The sample space, often denoted by \Omega, is the set of all possible outcomes of a random phe ...
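A minimal Python sketch of the coin-toss distribution from the example above, represented as a mapping from outcomes to probabilities (the variable names are ours, chosen for illustration):

    import random

    # The fair coin toss X from the text: each outcome has probability 0.5.
    coin_toss = {"heads": 0.5, "tails": 0.5}

    assert abs(sum(coin_toss.values()) - 1.0) < 1e-12   # probabilities sum to 1

    # Draw a few samples according to the distribution.
    random.seed(0)
    outcomes, weights = zip(*coin_toss.items())
    print(random.choices(outcomes, weights=weights, k=10))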
Large Deviations Theory
In probability theory, the theory of large deviations concerns the asymptotic behaviour of remote tails of sequences of probability distributions. While some basic ideas of the theory can be traced to Laplace, the formalization started with insurance mathematics, namely ruin theory with Cramér and Lundberg. A unified formalization of large deviation theory was developed in 1966, in a paper by Varadhan. Large deviations theory formalizes the heuristic ideas of ''concentration of measures'' and widely generalizes the notion of convergence of probability measures. Roughly speaking, large deviations theory concerns itself with the exponential decline of the probability measures of certain kinds of extreme or ''tail'' events.

Introductory examples

An elementary example
Consider a sequence of independent tosses of a fair coin. The possible outcomes could be heads or tails. Let us denote the possible outcome of the i-th trial by X_i, where we encode head as 1 and tail as 0. Now let M_N ...
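To make the coin-toss example concrete, the sketch below (an assumed setup: X_i in {0, 1} and M_N the sample mean, with the rate function supplied by Cramér's theorem for the fair coin) computes the exact tail probability P(M_N >= 0.6) for growing N and checks that -(1/N) ln P approaches the rate I(0.6).

    from math import comb, log, ceil

    # Assumed setup: fair-coin tosses X_i in {0, 1}, sample mean M_N.
    # Cramer's theorem gives P(M_N >= x) ~ exp(-N I(x)) for x > 1/2, with rate
    # function I(x) = x ln(2x) + (1 - x) ln(2(1 - x)).

    def rate(x):
        return x * log(2 * x) + (1 - x) * log(2 * (1 - x))

    def tail_prob(N, x):
        """Exact P(M_N >= x) for N fair-coin tosses."""
        k_min = ceil(x * N)
        return sum(comb(N, k) for k in range(k_min, N + 1)) / 2 ** N

    x = 0.6
    print(f"rate function I({x}) = {rate(x):.5f}")
    for N in (50, 200, 1000):
        p = tail_prob(N, x)
        print(f"N={N:5d}:  P(M_N >= {x}) = {p:.3e},   -(1/N) ln P = {-log(p) / N:.5f}")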
Rate Function
In mathematics, specifically in large deviations theory, a rate function is a function used to quantify the probabilities of rare events. It is required to have several properties which assist in the formulation of the large deviation principle. In some sense, the large deviation principle is an analogue of weak convergence of probability measures, but one which takes account of how well the rare events behave. A rate function is also called a Cramér function, after the Swedish probabilist Harald Cramér.

Definitions

Rate function
An extended real-valued function ''I'' : ''X'' → [0, +∞] defined on a Hausdorff topological space ''X'' is said to be a rate function if it is not identically +∞ and is lower semi-continuous, i.e. all the sub-level sets
:\{ x \in X \mid I(x) \le c \} \mbox{ for } c \geq 0
are closed in ''X''. If, furthermore, they are compact, then ''I'' is said to be a good rate function. A family of probability measures (''μ''''δ'')''δ'' > 0 ...
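As a concrete example (a standard one, stated here for illustration), the Cramér rate function for the fair-coin sample mean of the previous snippet is
:I(x) = x \log(2x) + (1 - x)\log(2(1 - x)) \text{ for } x \in [0, 1], \qquad I(x) = +\infty \text{ otherwise}.
It is not identically +∞, each sub-level set \{x : I(x) \le c\} is a closed subinterval of [0, 1] containing 1/2, and these sets are also bounded, hence compact, so ''I'' is a good rate function.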
Empirical Measure
In probability theory, an empirical measure is a random measure arising from a particular realization of a (usually finite) sequence of random variables. The precise definition is found below. Empirical measures are relevant to mathematical statistics. The motivation for studying empirical measures is that it is often impossible to know the true underlying probability measure P. We collect observations X_1, X_2, \dots , X_n and compute relative frequencies. We can estimate P, or a related distribution function F, by means of the empirical measure or empirical distribution function, respectively. These are uniformly good estimates under certain conditions. Theorems in the area of empirical processes provide rates of this convergence.

Definition

Let X_1, X_2, \dots be a sequence of independent identically distributed random variables with values in the state space ''S'' with probability distribution ''P''.

Definition: The ''empirical measure'' P_n is defined for meas ...
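A short Python sketch of the definition (the distribution P and the state space below are illustrative assumptions): the empirical measure assigns to each outcome its relative frequency among the first n observations, and it approaches P as n grows.

    import random
    from collections import Counter

    # Assumed setup: i.i.d. draws from a known discrete distribution P on
    # S = {"a", "b", "c"}; the empirical measure P_n gives relative frequencies.

    P = {"a": 0.5, "b": 0.3, "c": 0.2}

    random.seed(0)
    outcomes, weights = zip(*P.items())

    def empirical_measure(samples):
        n = len(samples)
        counts = Counter(samples)
        return {s: counts[s] / n for s in outcomes}

    for n in (10, 100, 10_000):
        samples = random.choices(outcomes, weights=weights, k=n)
        print(n, empirical_measure(samples))   # approaches P as n grows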
Independent And Identically Distributed Random Variables
In probability theory and statistics, a collection of random variables is independent and identically distributed if each random variable has the same probability distribution as the others and all are mutually independent. This property is usually abbreviated as ''i.i.d.'', ''iid'', or ''IID''. IID was first defined in statistics and finds application in different fields such as data mining and signal processing.

Introduction
In statistics, we commonly deal with random samples. A random sample can be thought of as a set of objects that are chosen randomly. Or, more formally, it is "a sequence of independent, identically distributed (IID) random variables". In other words, the terms ''random sample'' and ''IID'' are basically one and the same. In statistics, we usually say "random sample," but in probability it is more common to say "IID."
* Identically distributed means that there are no overall trends: the distribution doesn't fluctuate and all items in t ...
Information Projection
In information theory, the information projection or I-projection of a probability distribution ''q'' onto a set of distributions ''P'' is
:p^* = \underset{p \in P}{\arg\min} \, D_{\mathrm{KL}}(p \| q),
where D_{\mathrm{KL}} is the Kullback–Leibler divergence from ''q'' to ''p''. Viewing the Kullback–Leibler divergence as a measure of distance, the I-projection p^* is the "closest" distribution to ''q'' of all the distributions in ''P''. The I-projection is useful in setting up information geometry, notably because of the following inequality, valid when ''P'' is convex:
:D_{\mathrm{KL}}(p \| q) \geq D_{\mathrm{KL}}(p \| p^*) + D_{\mathrm{KL}}(p^* \| q).
This inequality can be interpreted as an information-geometric version of Pythagoras' triangle-inequality theorem, where KL divergence is viewed as squared distance in a Euclidean space. It is worthwhile to note that since D_{\mathrm{KL}}(p \| q) \geq 0 and is continuous in p, if ''P'' is closed and non-empty, then there exists at least one minimizer to the optim ...
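As a small numerical sketch (the alphabet, the constraint set, and the bisection routine below are illustrative assumptions): when ''P'' is defined by a linear constraint on the mean, the I-projection of ''q'' is an exponential tilt of ''q'', and the tilting parameter can be found by bisection.

    from math import exp, log

    # Assumed setup: alphabet X = {0, 1, 2}, q uniform, and the convex set
    # P = { p : E_p[X] >= 1.5 }.  For a linear-mean constraint the I-projection
    # of q is an exponential tilt, p*(x) proportional to q(x) * exp(t * x),
    # with t chosen so the constraint holds with equality.

    xs = [0, 1, 2]
    q = [1 / 3, 1 / 3, 1 / 3]
    target_mean = 1.5

    def tilt(t):
        w = [qi * exp(t * x) for qi, x in zip(q, xs)]
        z = sum(w)
        return [wi / z for wi in w]

    def mean(p):
        return sum(pi * x for pi, x in zip(p, xs))

    lo, hi = 0.0, 50.0                      # mean(tilt(t)) is increasing in t
    for _ in range(100):                    # bisection on the tilting parameter
        mid = (lo + hi) / 2
        if mean(tilt(mid)) < target_mean:
            lo = mid
        else:
            hi = mid

    p_star = tilt((lo + hi) / 2)
    d_kl = sum(pi * log(pi / qi) for pi, qi in zip(p_star, q) if pi > 0)
    print("p* =", [round(pi, 4) for pi in p_star], "  mean =", round(mean(p_star), 4))
    print("D_KL(p* || q) =", round(d_kl, 4), "nats")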
Kullback–Leibler Divergence
In mathematical statistics, the Kullback–Leibler divergence (also called relative entropy and I-divergence), denoted D_\text{KL}(P \parallel Q), is a type of statistical distance: a measure of how one probability distribution ''P'' is different from a second, reference probability distribution ''Q''. A simple interpretation of the KL divergence of ''P'' from ''Q'' is the expected excess surprise from using ''Q'' as a model when the actual distribution is ''P''. While it is a distance, it is not a metric, the most familiar type of distance: it is not symmetric in the two distributions (in contrast to variation of information), and does not satisfy the triangle inequality. Instead, in terms of information geometry, it is a type of divergence, a generalization of squared distance, and for certain classes of distributions (notably an exponential family), it satisfies a generalized Pythagorean theorem (which applies to squared distances). In the simple case, a relative entropy of 0 ...
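A minimal Python sketch (the two distributions are arbitrary illustrative choices) computing the divergence in both directions, which makes the asymmetry mentioned above visible.

    from math import log2

    def kl(p, q):
        """D_KL(P || Q) in bits for discrete distributions given as lists."""
        return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

    # Illustrative pair of distributions on a three-letter alphabet.
    P = [0.5, 0.4, 0.1]
    Q = [1 / 3, 1 / 3, 1 / 3]

    print(f"D_KL(P || Q) = {kl(P, Q):.4f} bits")
    print(f"D_KL(Q || P) = {kl(Q, P):.4f} bits   # not symmetric")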
Interior (topology)
In mathematics, specifically in general topology, the interior of a subset ''S'' of a topological space ''X'' is the union of all subsets of ''S'' that are open in ''X''. A point that is in the interior of ''S'' is an interior point of ''S''. The interior of ''S'' is the complement of the closure of the complement of ''S''. In this sense interior and closure are dual notions. The exterior of a set ''S'' is the complement of the closure of ''S''; it consists of the points that are in neither the set nor its boundary. The interior, boundary, and exterior of a subset together partition the whole space into three blocks (or fewer when one or more of these is empty).

Definitions

Interior point
If ''S'' is a subset of a Euclidean space, then ''x'' is an interior point of ''S'' if there exists an open ball centered at ''x'' which is completely contained in ''S''. (This is i ...
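A standard worked example (stated here for illustration): for the half-open interval S = [0, 1) regarded as a subset of the real line with its usual topology,
:\operatorname{int} S = (0, 1), \quad \operatorname{cl} S = [0, 1], \quad \partial S = \{0, 1\}, \quad \operatorname{ext} S = (-\infty, 0) \cup (1, \infty),
and the interior, boundary, and exterior indeed partition \mathbb{R} into three blocks.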
Closure (topology)
In topology, the closure of a subset ''S'' of points in a topological space consists of all points in ''S'' together with all limit points of ''S''. The closure of ''S'' may equivalently be defined as the union of ''S'' and its boundary, and also as the intersection of all closed sets containing ''S''. Intuitively, the closure can be thought of as all the points that are either in ''S'' or "near" ''S''. A point which is in the closure of ''S'' is a point of closure of ''S''. The notion of closure is in many ways dual to the notion of interior.

Definitions

Point of closure
For S as a subset of a Euclidean space, x is a point of closure of S if every open ball centered at x contains a point of S (this point can be x itself). This definition generalizes to any subset S of a metric space X. Fully expressed, for X as a metric space with metric d, x is a point of closure of S if for every r > 0 there exists some s \in S such that the distance d(x, s) < r (x = s is allowed). Another way to express this is to ...
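A standard worked example of the point-of-closure criterion (stated here for illustration): every open ball centered at a real number contains a rational, so every x \in \mathbb{R} is a point of closure of the rationals \mathbb{Q}; hence
:\operatorname{cl} \mathbb{Q} = \mathbb{R},
even though \mathbb{Q} has empty interior in \mathbb{R}.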