Stein's Method
Stein's method is a general method in probability theory for obtaining bounds on the distance between two probability distributions with respect to a probability metric. It was introduced by Charles Stein, who first published it in 1972, to obtain a bound between the distribution of a sum of an m-dependent sequence of random variables and a standard normal distribution in the Kolmogorov (uniform) metric, and hence to prove not only a central limit theorem but also bounds on the rates of convergence for the given metric.

History

At the end of the 1960s, unsatisfied with the then-known proofs of a specific central limit theorem, Charles Stein developed a new way of proving the theorem for his statistics lecture. (Charles Stein: The Invariant, the Direct and the "Pretentious". Interview given ...)
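A minimal sketch (not from the source) of the identity at the heart of Stein's method: a random variable Z has the standard normal distribution exactly when E[f'(Z) - Z f(Z)] = 0 for all suitably smooth f. The test function f = sin is an arbitrary choice for illustration.

    import numpy as np

    rng = np.random.default_rng(0)
    z = rng.standard_normal(1_000_000)  # samples from N(0, 1)

    # Stein's characterizing identity: E[f'(Z) - Z f(Z)] = 0 for Z ~ N(0, 1).
    f, f_prime = np.sin, np.cos
    print(np.mean(f_prime(z) - z * f(z)))  # close to 0.0

Bounding how far this expectation is from zero for distributions other than the normal is what yields the convergence rates mentioned above.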


Probability Theory
Probability theory is the branch of mathematics concerned with probability. Although there are several different probability interpretations, probability theory treats the concept in a rigorous mathematical manner by expressing it through a set of axioms. Typically these axioms formalise probability in terms of a probability space, which assigns a measure taking values between 0 and 1, termed the probability measure, to a set of outcomes called the sample space. Any specified subset of the sample space is called an event. Central subjects in probability theory include discrete and continuous random variables, probability distributions, and stochastic processes (which provide mathematical abstractions of non-deterministic or uncertain processes or measured quantities that may either be single occurrences or evolve over time in a random fashion). Although it is not possible to perfectly predict random events, much can be said about their behavior. Two major results in probability ...


Stein Discrepancy
A Stein discrepancy is a statistical divergence between two probability measures that is rooted in Stein's method. It was first formulated as a tool to assess the quality of Markov chain Monte Carlo samplers (J. Gorham and L. Mackey. Measuring Sample Quality with Stein's Method. Advances in Neural Information Processing Systems, 2015), but has since been used in diverse settings in statistics, machine learning and computer science.

Definition

Let \mathcal{X} be a measurable space and let \mathcal{M} be a set of measurable functions of the form m : \mathcal{X} \rightarrow \mathbb{R}. A natural notion of distance between two probability distributions P, Q, defined on \mathcal{X}, is provided by an integral probability metric

:(1.1) \quad d_{\mathcal{M}}(P, Q) := \sup_{m \in \mathcal{M}} \left| \mathbb{E}_{X \sim P}[m(X)] - \mathbb{E}_{Y \sim Q}[m(Y)] \right|,

where for the purposes of exposition we assume that the expectations exist, and that the set \mathcal{M} is sufficiently rich that (1.1) is indeed a metric on the set of probability distributions on \mathcal{X}, ...
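A minimal sketch (not from the source) of estimating an integral probability metric of the form (1.1) from samples. The small hand-picked dictionary of test functions stands in for the class \mathcal{M}; an actual Stein discrepancy would instead use functions produced by a Stein operator, so that expectations under P never need to be computed.

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.normal(0.0, 1.0, size=10_000)  # samples from P
    y = rng.normal(0.5, 1.0, size=10_000)  # samples from Q

    # Crude stand-in for the function class M in (1.1).
    test_fns = [np.sin, np.cos, np.tanh, lambda t: t, lambda t: t ** 2]

    # Empirical version of (1.1): sup over M of |E_P[m(X)] - E_Q[m(Y)]|.
    ipm_estimate = max(abs(m(x).mean() - m(y).mean()) for m in test_fns)
    print(ipm_estimate)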


Independent And Identically Distributed Random Variables
In probability theory and statistics, a collection of random variables is independent and identically distributed if each random variable has the same probability distribution as the others and all are mutually independent. This property is usually abbreviated as ''i.i.d.'', ''iid'', or ''IID''. IID was first defined in statistics and finds application in different fields such as data mining and signal processing.

Introduction

In statistics, we commonly deal with random samples. A random sample can be thought of as a set of objects that are chosen randomly. Or, more formally, it is "a sequence of independent, identically distributed (IID) random variables". In other words, the terms ''random sample'' and ''IID'' are essentially one and the same. In statistics, we usually say "random sample," but in probability it is more common to say "IID."
* Identically distributed means that there are no overall trends: the distribution does not fluctuate, and all items in t ...
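A minimal sketch (not from the source) of the IID idea in code: every draw comes from the same fixed distribution and is generated independently of the others, so the sample mean settles near the true mean as the sample grows.

    import numpy as np

    rng = np.random.default_rng(2)

    # IID sample: identical Exponential(1) distribution, independent draws.
    for n in (100, 10_000, 1_000_000):
        sample = rng.exponential(scale=1.0, size=n)
        print(n, sample.mean())  # approaches the true mean 1.0 as n grows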


Variance
In probability theory and statistics, variance is the expectation of the squared deviation of a random variable from its population mean or sample mean. Variance is a measure of dispersion, meaning it is a measure of how far a set of numbers is spread out from its average value. Variance has a central role in statistics, where some ideas that use it include descriptive statistics, statistical inference, hypothesis testing, goodness of fit, and Monte Carlo sampling. Variance is an important tool in the sciences, where statistical analysis of data is common. The variance is the square of the standard deviation, the second central moment of a distribution, and the covariance of the random variable with itself, and it is often represented by \sigma^2, s^2, \operatorname{Var}(X), V(X), or \mathbb{V}(X). An advantage of variance as a measure of dispersion is that it is more amenable to algebraic manipulation than other measures of dispersion such as the expected absolute deviation; for e ...
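A minimal sketch (not from the source) computing the two equivalent forms of the definition, \operatorname{Var}(X) = \mathbb{E}[(X - \mu)^2] = \mathbb{E}[X^2] - \mu^2, on a small sample.

    import numpy as np

    x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

    mu = x.mean()
    var_deviation = ((x - mu) ** 2).mean()   # E[(X - mu)^2]
    var_moments = (x ** 2).mean() - mu ** 2  # E[X^2] - (E[X])^2

    print(var_deviation, var_moments, np.var(x))  # all equal 4.0 here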


Supremum Norm
In mathematical analysis, the uniform norm (or sup norm) assigns to real- or complex-valued bounded functions f defined on a set S the non-negative number

:\|f\|_\infty = \|f\|_{\infty, S} = \sup\{\, |f(s)| : s \in S \,\}.

This norm is also called the supremum norm, the Chebyshev norm, the infinity norm, or, when the supremum is in fact the maximum, the max norm. The name "uniform norm" derives from the fact that a sequence of functions (f_n) converges to f under the metric derived from the uniform norm if and only if (f_n) converges to f uniformly. If f is a continuous function on a closed and bounded interval, or more generally a compact set, then it is bounded and the supremum in the above definition is attained by the Weierstrass extreme value theorem, so we can replace the supremum by the maximum. In this case, the norm is also called the maximum norm. In particular, if x is some vector such that x = \left(x_1, x_2, \ldots, x_n\right) in finite dimensional coordinate space, it takes the form:

:\|x\|_\infty := \max\left(|x_1|, \ldots, |x_n|\right).

Metric and ...
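A minimal sketch (not from the source) of both cases: the coordinate form \|x\|_\infty = \max_i |x_i|, and a grid approximation of the uniform norm of a function on an interval.

    import numpy as np

    x = np.array([3.0, -7.5, 2.0])
    print(np.max(np.abs(x)), np.linalg.norm(x, ord=np.inf))  # both 7.5

    # Uniform norm of f(t) = t*(1 - t) on [0, 1], approximated on a grid;
    # the supremum 0.25 is attained at t = 0.5 (so it is a maximum).
    t = np.linspace(0.0, 1.0, 10_001)
    print(np.max(np.abs(t * (1 - t))))  # ~0.25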




Difference Operator
In mathematics, a recurrence relation is an equation according to which the nth term of a sequence of numbers is equal to some combination of the previous terms. Often, only k previous terms of the sequence appear in the equation, for a parameter k that is independent of n; this number k is called the ''order'' of the relation. If the values of the first k numbers in the sequence have been given, the rest of the sequence can be calculated by repeatedly applying the equation. In ''linear recurrences'', the nth term is equated to a linear function of the k previous terms. A famous example is the recurrence for the Fibonacci numbers, F_n = F_{n-1} + F_{n-2}, where the order k is two and the linear function merely adds the two previous terms. This example is a linear recurrence with constant coefficients, because the coefficients of the linear function (1 and 1) are constants that do not depend on n. For these recurrences, one can express the general term of the sequence as a closed-form expression of ...
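A minimal sketch (not from the source) of "repeatedly applying the equation" for the order-2 Fibonacci recurrence F_n = F_{n-1} + F_{n-2}, starting from the given initial values F_0 = 0 and F_1 = 1.

    def fibonacci(n: int) -> int:
        """n-th Fibonacci number via the recurrence F_n = F_{n-1} + F_{n-2}."""
        f_prev, f_curr = 0, 1  # the given initial values F_0, F_1
        for _ in range(n):
            f_prev, f_curr = f_curr, f_prev + f_curr
        return f_prev

    print([fibonacci(n) for n in range(10)])  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]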


Differential Operator
In mathematics, a differential operator is an operator defined as a function of the differentiation operator. It is helpful, as a matter of notation first, to consider differentiation as an abstract operation that accepts a function and returns another function (in the style of a higher-order function in computer science). This article considers mainly linear differential operators, which are the most common type. However, non-linear differential operators also exist, such as the Schwarzian derivative.

Definition

An order-m linear differential operator is a map A from a function space \mathcal{F}_1 to another function space \mathcal{F}_2 that can be written as:

:A = \sum_{|\alpha| \le m} a_\alpha(x) D^\alpha\ ,

where \alpha = (\alpha_1, \alpha_2, \cdots, \alpha_n) is a multi-index of non-negative integers, |\alpha| = \alpha_1 + \alpha_2 + \cdots + \alpha_n, and for each \alpha, a_\alpha(x) is a function on some open domain in ''n''-dimensional space. The operator D^\alpha is interpreted as D^\alp ...
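A minimal sketch (not from the source, using sympy) of a one-variable linear differential operator in this sense, A = x D^2 + 3 D + 2, written as a higher-order function that accepts a function and returns another function. The particular coefficients are arbitrary.

    import sympy as sp

    x = sp.symbols('x')

    def A(f):
        # Order-2 linear differential operator A = x*D^2 + 3*D + 2,
        # with coefficients a_2(x) = x, a_1(x) = 3, a_0(x) = 2.
        return x * sp.diff(f, x, 2) + 3 * sp.diff(f, x) + 2 * f

    print(A(x ** 3))     # 2*x**3 + 15*x**2
    print(A(sp.sin(x)))  # -x*sin(x) + 3*cos(x) + 2*sin(x)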


CRC Press
The CRC Press, LLC is an American publishing group that specializes in producing technical books. Many of their books relate to engineering, science and mathematics. Their scope also includes books on business, forensics and information technology. CRC Press is now a division of Taylor & Francis, itself a subsidiary of Informa. History The CRC Press was founded as the Chemical Rubber Company (CRC) in 1903 by brothers Arthur, Leo and Emanuel Friedman in Cleveland, Ohio, based on an earlier enterprise by Arthur, who had begun selling rubber laboratory aprons in 1900. The company gradually expanded to include sales of laboratory equipment to chemists. In 1913 the CRC offered a short (116-page) manual called the ''Rubber Handbook'' as an incentive for any purchase of a dozen aprons. Since then the ''Rubber Handbook'' has evolved into the CRC's flagship book, the '' CRC Handbook of Chemistry and Physics''. In 1964, Chemical Rubber decided to focus on its publishing ventures ...


Stein's Lemma
Stein's lemma, named in honor of Charles Stein, is a theorem of probability theory that is of interest primarily because of its applications to statistical inference (in particular, to James–Stein estimation and empirical Bayes methods) and its applications to portfolio choice theory. The theorem gives a formula for the covariance of one random variable with the value of a function of another, when the two random variables are jointly normally distributed.

Statement of the lemma

Suppose ''X'' is a normally distributed random variable with expectation μ and variance σ². Further suppose ''g'' is a function for which the two expectations E(''g''(''X'') (''X'' − μ)) and E(''g''′(''X'')) both exist. (The existence of the expectation of any random variable is equivalent to the finiteness of the expectation of its absolute value.) Then

:E\bigl(g(X)(X-\mu)\bigr)=\sigma^2 E\bigl(g'(X)\bigr).

In general, suppose ''X'' and ''Y'' are jointly normally d ...
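A minimal sketch (not from the source) verifying the lemma by Monte Carlo for X ~ N(μ, σ²), with the arbitrary test function g = sin, whose derivative is cos.

    import numpy as np

    rng = np.random.default_rng(3)
    mu, sigma = 1.0, 2.0
    x = rng.normal(mu, sigma, size=2_000_000)

    g, g_prime = np.sin, np.cos

    lhs = np.mean(g(x) * (x - mu))          # E[g(X)(X - mu)]
    rhs = sigma ** 2 * np.mean(g_prime(x))  # sigma^2 * E[g'(X)]
    print(lhs, rhs)  # the two estimates agree up to Monte Carlo error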


Lipschitz Continuity
In mathematical analysis, Lipschitz continuity, named after German mathematician Rudolf Lipschitz, is a strong form of uniform continuity for functions. Intuitively, a Lipschitz continuous function is limited in how fast it can change: there exists a real number such that, for every pair of points on the graph of this function, the absolute value of the slope of the line connecting them is not greater than this real number; the smallest such bound is called the ''Lipschitz constant'' of the function (or '' modulus of uniform continuity''). For instance, every function that has bounded first derivatives is Lipschitz continuous. In the theory of differential equations, Lipschitz continuity is the central condition of the Picard–Lindelöf theorem which guarantees the existence and uniqueness of the solution to an initial value problem. A special type of Lipschitz continuity, called contraction, is used in the Banach fixed-point theorem. We have the following chain of strict inclus ...
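A minimal sketch (not from the source) of the "bounded slope" intuition: empirically bounding the Lipschitz constant of f = sin on an interval by maximizing the absolute slope over pairs of grid points. Since |f'| ≤ 1 everywhere, the true Lipschitz constant is 1.

    import numpy as np

    f = np.sin
    x = np.linspace(-5.0, 5.0, 400)
    y = f(x)

    # Absolute slope |f(a) - f(b)| / |a - b| over all pairs of distinct
    # grid points: an empirical lower bound on the Lipschitz constant.
    dx = x[:, None] - x[None, :]
    dy = y[:, None] - y[None, :]
    slopes = np.abs(dy[dx != 0] / dx[dx != 0])
    print(slopes.max())  # ~1.0 for sin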


Wasserstein Metric
In mathematics, the Wasserstein distance or Kantorovich–Rubinstein metric is a distance function defined between probability distributions on a given metric space M. It is named after Leonid Vaseršteĭn. Intuitively, if each distribution is viewed as a unit amount of earth (soil) piled on ''M'', the metric is the minimum "cost" of turning one pile into the other, which is assumed to be the amount of earth that needs to be moved times the mean distance it has to be moved. This problem was first formalised by Gaspard Monge in 1781. Because of this analogy, the metric is known in computer science as the earth mover's distance. The name "Wasserstein distance" was coined by R. L. Dobrushin in 1970, after learning of it in the work of Leonid Vaseršteĭn on Markov processes describing large systems of automata (Russian, 1969). However the metric was first defined by Leonid Kantorovich in ''The Mathematical Method of Production Planning and Organization'' (Russian original 1939 ...
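A minimal sketch (not from the source) of the one-dimensional case. For empirical distributions with equally many equally weighted samples, the Wasserstein-1 distance is the mean distance between sorted samples (each grain of "earth" moves to its matched position); scipy.stats.wasserstein_distance computes the same quantity.

    import numpy as np
    from scipy.stats import wasserstein_distance

    rng = np.random.default_rng(4)
    u = rng.normal(0.0, 1.0, size=5_000)
    v = rng.normal(1.0, 1.0, size=5_000)

    # W1 in 1-D with equal sample counts: match sorted samples.
    w1 = np.mean(np.abs(np.sort(u) - np.sort(v)))
    print(w1, wasserstein_distance(u, v))  # both ~1.0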


Indicator Function
In mathematics, an indicator function or a characteristic function of a subset of a set is a function that maps elements of the subset to one, and all other elements to zero. That is, if A is a subset of some set X, one has \mathbf{1}_A(x) = 1 if x \in A, and \mathbf{1}_A(x) = 0 otherwise, where \mathbf{1}_A is a common notation for the indicator function. Other common notations are I_A and \chi_A. The indicator function of A is the Iverson bracket of the property of belonging to A; that is,

:\mathbf{1}_A(x) = [x \in A].

For example, the Dirichlet function is the indicator function of the rational numbers as a subset of the real numbers.

Definition

The indicator function of a subset A of a set X is a function \mathbf{1}_A \colon X \to \{0, 1\} defined as

:\mathbf{1}_A(x) := \begin{cases} 1 & \text{if } x \in A, \\ 0 & \text{if } x \notin A. \end{cases}

The Iverson bracket provides the equivalent notation [x \in A] to be used instead of \mathbf{1}_A(x). The function \mathbf{1}_A is sometimes denoted I_A, \chi_A, or even just A. Nota ...
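A minimal sketch (not from the source): the indicator function of a small subset A of X, with Python's bool-to-int conversion playing the role of the Iverson bracket [x \in A].

    def indicator(A):
        """Return the indicator function 1_A of the set A."""
        return lambda x: 1 if x in A else 0

    A = {0, 2, 4, 6}
    X = range(8)

    one_A = indicator(A)
    print([one_A(x) for x in X])     # [1, 0, 1, 0, 1, 0, 1, 0]
    print([int(x in A) for x in X])  # Iverson bracket [x in A]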