Hyperprior

In Bayesian statistics, a hyperprior is a prior distribution on a hyperparameter, that is, on a parameter of a prior distribution. As with the term ''hyperparameter,'' the use of ''hyper'' is to distinguish it from a prior distribution of a parameter of the model for the underlying system. Hyperpriors arise particularly in the use of hierarchical models.

For example, if one is using a beta distribution to model the distribution of the parameter ''p'' of a Bernoulli distribution, then:
* The Bernoulli distribution (with parameter ''p'') is the ''model'' of the underlying system;
* ''p'' is a ''parameter'' of the underlying system (Bernoulli distribution);
* The beta distribution (with parameters ''α'' and ''β'') is the ''prior'' distribution of ''p'';
* ''α'' and ''β'' are parameters of the prior distribution (beta distribution), hence ''hyperparameters;''
* A prior distribution of ''α'' and ''β'' is thus a ''hyperprior.''

In principle, one can iterate the above: if the hyperprior itself has hyperparameters, these may be called hyperhyperparameters, and so forth. One can analogously call the posterior distribution on the hyperparameter the hyperposterior and, if these are in the same family, call them conjugate hyperdistributions or a conjugate hyperprior. However, this rapidly becomes very abstract and removed from the original problem.
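The three-level hierarchy above can be sketched as forward sampling in a few lines of Python. The exponential hyperprior with rate 1.0 and the sample size of 10 are illustrative assumptions, not part of the definition:

```python
import random

random.seed(0)

# Hyperprior: a distribution over the hyperparameters alpha and beta.
# (The choice of an exponential hyperprior here is purely illustrative.)
alpha = random.expovariate(1.0)  # hyperprior draw for alpha
beta = random.expovariate(1.0)   # hyperprior draw for beta

# Prior: Beta(alpha, beta) over the Bernoulli parameter p.
p = random.betavariate(alpha, beta)

# Model: Bernoulli(p) for the observed data.
observations = [1 if random.random() < p else 0 for _ in range(10)]
```

Each draw from the hyperprior selects one member of the Beta family, which in turn generates a value of ''p'', which in turn generates the data.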


Purpose

Hyperpriors, like conjugate priors, are a computational convenience – they do not change the process of Bayesian inference, but simply allow one to more easily describe and compute with the prior.


Uncertainty

First, use of a hyperprior allows one to express uncertainty in a hyperparameter: taking a fixed prior is an assumption; varying a hyperparameter of the prior allows one to perform sensitivity analysis on this assumption; and taking a distribution on this hyperparameter allows one to express uncertainty about it: "assume that the prior is of this form (this parametric family), but that we are uncertain as to precisely what the values of the parameters should be".
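The contrast between the two approaches can be sketched for the Beta-Bernoulli case, where the posterior mean has a closed form. The data (7 successes in 10 trials), the grid of ''α'' values, and the exponential hyperprior are all illustrative assumptions:

```python
import random

heads, n = 7, 10  # assumed Bernoulli data: 7 successes in 10 trials

def posterior_mean(alpha, beta):
    # A Beta(alpha, beta) prior with this data gives a
    # Beta(alpha + heads, beta + n - heads) posterior, whose mean is:
    return (alpha + heads) / (alpha + beta + n)

# Sensitivity analysis: recompute the answer for a few fixed values
# of the hyperparameter alpha (with beta held at 1.0).
sensitivity = {a: posterior_mean(a, 1.0) for a in (0.5, 1.0, 5.0)}

# Hyperprior: instead, average the answer over a distribution on alpha.
random.seed(0)
draws = [posterior_mean(random.expovariate(1.0), 1.0) for _ in range(10_000)]
hyperprior_mean = sum(draws) / len(draws)
```

Sensitivity analysis reports how the conclusion moves as the hyperparameter varies; the hyperprior folds that variation into a single averaged answer.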


Mixture distribution

More abstractly, if one uses a hyperprior, then the prior distribution (on the parameter of the underlying model) itself is a mixture density: it is the weighted average of the various prior distributions (over different hyperparameters), with the hyperprior being the weighting. This adds additional possible distributions (beyond the parametric family one is using), because parametric families of distributions are generally not convex sets – since a mixture density is a convex combination of distributions, it will in general lie ''outside'' the family. For instance, the mixture of two normal distributions is not a normal distribution: if one takes two normals with sufficiently distant means and mixes 50% of each, one obtains a bimodal distribution, which is thus not normal. In fact, the convex hull of the normal distributions is dense in the set of all distributions, so in some cases one can arbitrarily closely approximate a given prior by using a family with a suitable hyperprior.

What makes this approach particularly useful is if one uses conjugate priors: individual conjugate priors have easily computed posteriors, and thus a mixture of conjugate priors yields the same mixture of posteriors: one only needs to know how each conjugate prior changes. Using a single conjugate prior may be too restrictive, but using a mixture of conjugate priors may give one the desired distribution in a form that is easy to compute with. This is similar to decomposing a function in terms of eigenfunctions – see Conjugate prior: Analogy with eigenfunctions.
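A mixture of conjugate priors can be updated as described: each component is updated conjugately, and the mixture weights are reweighted by each component's marginal likelihood. A minimal sketch for a bimodal 50/50 mixture of two Beta priors on a Bernoulli parameter (the components and the data are illustrative assumptions):

```python
import math

def log_beta(a, b):
    # log of the Beta function B(a, b), via log-gamma for stability
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def update(components, heads, tails):
    """Posterior of a mixture of Beta(a, b) priors given Bernoulli data.

    components: list of (weight, a, b) tuples summing to weight 1.
    """
    new = []
    for w, a, b in components:
        a2, b2 = a + heads, b + tails  # conjugate update of this component
        # marginal likelihood of the data under this component
        log_ml = log_beta(a2, b2) - log_beta(a, b)
        new.append((math.log(w) + log_ml, a2, b2))
    # renormalize the mixture weights (in log space for stability)
    z = max(lw for lw, _, _ in new)
    total = sum(math.exp(lw - z) for lw, _, _ in new)
    return [(math.exp(lw - z) / total, a, b) for lw, a, b in new]

prior = [(0.5, 2.0, 8.0), (0.5, 8.0, 2.0)]  # bimodal mixture of Betas
posterior = update(prior, heads=9, tails=1)
# The component centred near p = 0.8 now carries most of the weight.
```

Note that the posterior is again a mixture of Betas, so the same `update` can be applied to it as more data arrives.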


Dynamical system

A hyperprior is a distribution on the space of possible hyperparameters. If one is using conjugate priors, then this space is preserved by moving to posteriors – thus as data arrives the distribution changes but remains on this space: it evolves as a dynamical system (each point of hyperparameter space evolving to the updated hyperparameters), over time converging, just as the prior itself converges.
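For the Beta-Bernoulli pair this dynamical system is especially simple: the conjugate update is the same translation of every point of hyperparameter space. A sketch, with an illustrative "hyperprior" represented as a small cloud of equally weighted points:

```python
def step(point, heads, tails):
    # The conjugate-update map on Beta hyperparameter space:
    # (alpha, beta) -> (alpha + heads, beta + tails).
    alpha, beta = point
    return (alpha + heads, beta + tails)

# A crude discrete hyperprior: equally weighted points in hyperparameter space.
# (These particular points, and the data batches below, are illustrative.)
cloud = [(1.0, 1.0), (2.0, 5.0), (5.0, 2.0)]

# As batches of data arrive, the whole cloud evolves under the same map,
# staying inside the space of valid Beta hyperparameters.
for heads, tails in [(3, 1), (2, 2), (4, 0)]:
    cloud = [step(pt, heads, tails) for pt in cloud]
```

After the 12 observations above (9 heads, 3 tails), every point has been translated by (9, 3); the relative spread of the cloud shrinks compared to the growing counts, mirroring the convergence of the prior itself.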


References


Further reading

* {{cite book |last=Bernardo |first=J. M. |last2=Smith |first2=A. F. M. |year=2000 |title=Bayesian Theory |location=New York |publisher=Wiley |isbn=0-471-49464-X |url=https://books.google.com/books?id=11nSgIcd7xQC}}