Conjugate Prior Distribution

	In Bayesian probability theory, if the posterior distribution p(\theta \mid x) is in the same probability distribution family as the prior probability distribution p(\theta), the prior and posterior are called conjugate distributions, and the prior is called a conjugate prior for the likelihood function p(x \mid \theta). A conjugate prior is an algebraic convenience, giving a closed-form expression for the posterior; otherwise, numerical integration may be necessary. Further, conjugate priors may give intuition by showing more transparently how a likelihood function updates a prior distribution. The concept, as well as the term "conjugate prior", was introduced by Howard Raiffa and Robert Schlaifer in their work on Bayesian decision theory (Howard Raiffa and Robert Schlaifer, Applied Statistical Decision Theory, Division of Research, Graduate School of Business Administration, Harvard University, 1961). A similar concept had been discovered independently by George Alfred ...
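As a minimal sketch of the closed-form update that conjugacy provides, consider a Beta prior on the success probability of a Bernoulli likelihood, a standard conjugate pair. The function names and the data below are illustrative assumptions, not taken from the source. The posterior is again a Beta distribution whose parameters are the prior parameters plus the observed counts, so no numerical integration is needed.

```python
from fractions import Fraction


def beta_bernoulli_update(alpha, beta, observations):
    """Return the posterior Beta parameters after observing 0/1 outcomes.

    Assumes a Beta(alpha, beta) prior and a Bernoulli likelihood.
    """
    successes = sum(observations)
    failures = len(observations) - successes
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution, kept exact as a fraction."""
    return Fraction(alpha, alpha + beta)


# Hypothetical data: Beta(1, 1) prior (uniform on [0, 1]), six coin flips.
prior = (1, 1)
data = [1, 0, 1, 1, 0, 1]
posterior = beta_bernoulli_update(*prior, data)
print(posterior)              # (5, 3)
print(beta_mean(*posterior))  # 5/8
```

The prior and posterior differ only in their two parameters, which is the sense in which they belong to the same family.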
Bayesian Probability
	Bayesian probability is an interpretation of the concept of probability, in which, instead of frequency or propensity of some phenomenon, probability is interpreted as reasonable expectation representing a state of knowledge or as quantification of a personal belief. The Bayesian interpretation of probability can be seen as an extension of propositional logic that enables reasoning with hypotheses; that is, with propositions whose truth or falsity is unknown. In the Bayesian view, a probability is assigned to a hypothesis, whereas under frequentist inference, a hypothesis is typically tested without being assigned a probability. Bayesian probability belongs to the category of evidential probabilities; to evaluate the probability of a hypothesis, the Bayesian probabilist specifies a prior probability. This, in turn, is then updated to a posterior probability in the light of new, re ...
Uniform Distribution (continuous)
	In probability theory and statistics, the continuous uniform distribution or rectangular distribution is a family of symmetric probability distributions. The distribution describes an experiment where there is an arbitrary outcome that lies between certain bounds. The bounds are defined by the parameters, a and b, which are the minimum and maximum values. The interval can either be closed (e.g. [a, b]) or open (e.g. (a, b)). Therefore, the distribution is often abbreviated U(a, b), where U stands for uniform distribution. The difference between the bounds defines the interval length; all intervals of the same length on the distribution's support are equally probable. It is the maximum entropy probability distribution for a random variable X under no constraint other than that it is contained in the distribution's support.
	Definitions: Probability density function
	The probability density function of the continuous uniform distribution is: f(x)=\begin ...
Hyperprior
	In Bayesian statistics, a hyperprior is a prior distribution on a hyperparameter, that is, on a parameter of a prior distribution. As with the term hyperparameter, the use of hyper is to distinguish it from a prior distribution of a parameter of the model for the underlying system. They arise particularly in the use of hierarchical models. For example, if one is using a beta distribution to model the distribution of the parameter p of a Bernoulli distribution, then:
	* The Bernoulli distribution (with parameter p) is the model of the underlying system;
	* p is a parameter of the underlying system (Bernoulli distribution);
	* The beta distribution (with parameters α and β) is the prior distribution of p;
	* α and β are parameters of the prior distribution (beta distribution), hence hyperparameters;
	* A prior distribution of α and β is thus a hyperprior.
	In principle, one can iterate the above: if the hyperprior ...
Diagonalized
	In linear algebra, a square matrix A is called diagonalizable or non-defective if it is similar to a diagonal matrix, i.e., if there exists an invertible matrix P and a diagonal matrix D such that P^{-1}AP = D, or equivalently A = PDP^{-1}. (Such P and D are not unique.) For a finite-dimensional vector space V, a linear map T:V\to V is called diagonalizable if there exists an ordered basis of V consisting of eigenvectors of T. These definitions are equivalent: if T has a matrix representation A = PDP^{-1} as above, then the column vectors of P form a basis consisting of eigenvectors of T, and the diagonal entries of D are the corresponding eigenvalues of T; with respect to this eigenvector basis, T is represented by D. Diagonalization is the process of finding the above P and D. Diagonalizable matrices and maps are especially easy for computations, once their eigenvalues and eigenvectors are known. One can raise a diagonal matrix D to a power by simply raising the diagonal entries to that power, and the determina ...
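The computational convenience of a diagonalization can be sketched as follows, assuming the standard relation A = P D P^{-1}, from which A^k = P D^k P^{-1}. The helper names and the 2 x 2 example matrix are hypothetical choices for illustration; exact fractions are used so the result can be compared with direct multiplication.

```python
from fractions import Fraction


def matmul2(X, Y):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def inverse2(M):
    """Invert a 2x2 matrix exactly; raises if the matrix is singular."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]


def power_via_diagonalization(P, eigenvalues, k):
    """Compute A^k = P D^k P^{-1}, where D = diag(eigenvalues)."""
    D_k = [[Fraction(eigenvalues[0]) ** k, Fraction(0)],
           [Fraction(0), Fraction(eigenvalues[1]) ** k]]
    return matmul2(matmul2(P, D_k), inverse2(P))


# Illustrative matrix A = [[2, 1], [1, 2]] with eigenvalues 3 and 1
# and eigenvectors (1, 1) and (1, -1) as the columns of P.
A = [[2, 1], [1, 2]]
P = [[1, 1], [1, -1]]
A_cubed = power_via_diagonalization(P, (3, 1), 3)
print(A_cubed == matmul2(matmul2(A, A), A))  # True
```

Only the two diagonal entries are raised to the power; the matrices P and P^{-1} are reused unchanged for any k.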
Convex Combination
	In convex geometry and vector algebra, a convex combination is a linear combination of points (which can be vectors, scalars, or more generally points in an affine space) where all coefficients are non-negative and sum to 1. In other words, the operation is equivalent to a standard weighted average, but whose weights are expressed as a percent of the total weight, instead of as a fraction of the count of the weights as in a standard weighted average. More formally, given a finite number of points x_1, x_2, \dots, x_n in a real vector space, a convex combination of these points is a point of the form \alpha_1x_1+\alpha_2x_2+\cdots+\alpha_nx_n where the real numbers \alpha_i satisfy \alpha_i\ge 0 and \alpha_1+\alpha_2+\cdots+\alpha_n=1. As a particular example, every convex combination of two points lies on the line segment between the points. A set is convex if it contains all convex combinations of its points. The convex hull of a given set of points is identical ...
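The definition can be illustrated with a short sketch that checks the two conditions on the coefficients (non-negative, summing to 1) before forming the weighted sum. The function name, the tolerance, and the sample points are assumptions made for this example.

```python
def convex_combination(points, weights, tol=1e-12):
    """Return sum_i weights[i] * points[i] after validating the weights.

    points: sequence of equal-length tuples; weights: sequence of floats.
    """
    if len(points) != len(weights):
        raise ValueError("need exactly one weight per point")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    if abs(sum(weights) - 1.0) > tol:
        raise ValueError("weights must sum to 1")
    dim = len(points[0])
    return tuple(sum(w * p[i] for w, p in zip(weights, points))
                 for i in range(dim))


# Two points: the result lies on the segment between them.
print(convex_combination([(0.0, 0.0), (4.0, 0.0)], [0.25, 0.75]))  # (3.0, 0.0)
```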
Operator Theory
	In mathematics, operator theory is the study of linear operators on function spaces, beginning with differential operators and integral operators. The operators may be presented abstractly by their characteristics, such as bounded linear operators or closed operators, and consideration may be given to nonlinear operators. The study, which depends heavily on the topology of function spaces, is a branch of functional analysis. If a collection of operators forms an algebra over a field, then it is an operator algebra. The description of operator algebras is part of operator theory.
	Single operator theory
	Single operator theory deals with the properties and classification of operators, considered one at a time. For example, the classification of normal operators in terms of their spectra falls into this category.
	Spectrum of operators
	The spectral theorem is any of a number of results about linear operators or about matrices. In broad terms the spectral theorem provides cond ...
Eigenfunctions
	In mathematics, an eigenfunction of a linear operator D defined on some function space is any non-zero function f in that space that, when acted upon by D, is only multiplied by some scaling factor called an eigenvalue. As an equation, this condition can be written as Df = \lambda f for some scalar eigenvalue \lambda. The solutions to this equation may also be subject to boundary conditions that limit the allowable eigenvalues and eigenfunctions. An eigenfunction is a type of eigenvector.
	Eigenfunctions
	In general, an eigenvector of a linear operator D defined on some vector space is a nonzero vector in the domain of D that, when D acts upon it, is simply scaled by some scalar value called an eigenvalue. In the special case where D is defined on a function space, the eigenvectors are referred to as eigenfunctions. That is, a function f is an eigenfunction of D if it satisfies the equation Df = \lambda f, where \lambda is a scalar. The solutions to this equation may also ...
Multivariate Normal Distribution
	In probability theory and statistics, the multivariate normal distribution, multivariate Gaussian distribution, or joint normal distribution is a generalization of the one-dimensional (univariate) normal distribution to higher dimensions. One definition is that a random vector is said to be k-variate normally distributed if every linear combination of its k components has a univariate normal distribution. Its importance derives mainly from the multivariate central limit theorem. The multivariate normal distribution is often used to describe, at least approximately, any set of (possibly) correlated real-valued random variables each of which clusters around a mean value.
	Definitions: Notation and parameterization
	The multivariate normal distribution of a k-dimensional random vector \mathbf{X} = (X_1,\ldots,X_k)^{\mathrm{T}} can be written in the following notation: \mathbf{X} \sim \mathcal{N}(\boldsymbol\mu,\, \boldsymbol\Sigma), or to make it explicitly known that X i ...
Covariance Matrix
	In probability theory and statistics, a covariance matrix (also known as auto-covariance matrix, dispersion matrix, variance matrix, or variance–covariance matrix) is a square matrix giving the covariance between each pair of elements of a given random vector. Any covariance matrix is symmetric and positive semi-definite and its main diagonal contains variances (i.e., the covariance of each element with itself). Intuitively, the covariance matrix generalizes the notion of variance to multiple dimensions. As an example, the variation in a collection of random points in two-dimensional space cannot be characterized fully by a single number, nor would the variances in the x and y directions contain all of the necessary information; a 2 \times 2 matrix would be necessary to fully characterize the two-dimensional variation. The covariance matrix of a random vector \mathbf{X} is typically denoted by \operatorname{K}_{\mathbf{X}\mathbf{X}} or \Sigma.
	Definition
	Throughout this article, boldfaced unsubsc ...
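The two-dimensional example can be made concrete with a small sketch that estimates a covariance matrix from a list of points. It assumes the usual unbiased sample estimator (division by n - 1), which is a choice made for this illustration and not something the text specifies; the function name and data are likewise hypothetical.

```python
def sample_covariance(points):
    """Estimate the covariance matrix of equal-length point tuples.

    Uses the unbiased estimator (divides by n - 1); needs n >= 2 points.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    dim = len(points[0])
    means = [sum(p[i] for p in points) / n for i in range(dim)]
    return [[sum((p[i] - means[i]) * (p[j] - means[j]) for p in points) / (n - 1)
             for j in range(dim)]
            for i in range(dim)]


# Points on the line y = 2x: the off-diagonal entries capture the dependence
# that the two variances alone would miss.
cov = sample_covariance([(1, 2), (2, 4), (3, 6)])
print(cov)  # [[1.0, 2.0], [2.0, 4.0]]
```

The result is symmetric, and its diagonal holds the variances in the x and y directions.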
Wishart Distribution
	In statistics, the Wishart distribution is a generalization to multiple dimensions of the gamma distribution. It is named in honor of John Wishart, who first formulated the distribution in 1928. It is a family of probability distributions defined over symmetric, nonnegative-definite random matrices (i.e. matrix-valued random variables). In random matrix theory, the space of Wishart matrices is called the Wishart ensemble. These distributions are of great importance in the estimation of covariance matrices in multivariate statistics. In Bayesian statistics, the Wishart distribution is the conjugate prior of the inverse covariance matrix of a multivariate normal random vector.
	Definition
	Suppose G is a p \times n matrix, each column of which is independently drawn from a p-variate normal distribution with zero mean: G_{(i)} = (g_i^1,\dots,g_i^p)^T \sim \mathcal{N}_p(0,V). Then the Wishart distribution is the probability distribution of the random matrix S = G G^T = \sum_{i=1}^n G_{(i)}G_{(i)}^T kno ...
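The construction S = G G^T can be sketched directly. For simplicity this example assumes the scale matrix V is the identity, so every entry of G is an independent standard normal draw; the function name, the seed, and the sizes p = 2 and n = 5 are illustrative assumptions.

```python
import random


def sample_wishart_identity(p, n, rng):
    """Draw S = G G^T, where G is p x n with i.i.d. N(0, 1) entries.

    This corresponds to the special case V = identity.
    """
    G = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(p)]
    return [[sum(G[i][k] * G[j][k] for k in range(n)) for j in range(p)]
            for i in range(p)]


rng = random.Random(0)  # fixed seed so the draw is reproducible
S = sample_wishart_identity(p=2, n=5, rng=rng)

# S is symmetric with non-negative diagonal entries, as expected of a
# nonnegative-definite matrix.
print(S[0][1] == S[1][0], S[0][0] >= 0, S[1][1] >= 0)
```

A general scale matrix V would require transforming the columns of G by a factor of V (for example a Cholesky factor), which this sketch leaves out.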
Exponential Family
	In probability and statistics, an exponential family is a parametric set of probability distributions of a certain form, specified below. This special form is chosen for mathematical convenience, including enabling the user to calculate expectations and covariances using differentiation based on some useful algebraic properties, as well as for generality, as exponential families are in a sense very natural sets of distributions to consider. The term exponential class is sometimes used in place of "exponential family", or the older term Koopman–Darmois family. The terms "distribution" and "family" are often used loosely: specifically, an exponential family is a set of distributions, where the specific distribution varies with the parameter; however, a parametric family of distributions is often referred to as "a distribution" (like "the normal distribution", meaning "the family of normal distributions"), and the set of all exponential families is sometimes l ...