HOME

TheInfoList



OR:

In
probability theory Probability theory is the branch of mathematics concerned with probability. Although there are several different probability interpretations, probability theory treats the concept in a rigorous mathematical manner by expressing it through a set o ...
and
statistics Statistics (from German language, German: ''wikt:Statistik#German, Statistik'', "description of a State (polity), state, a country") is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of ...
, a concentration parameter is a special kind of numerical parameter of a
parametric family In mathematics and its applications, a parametric family or a parameterized family is a indexed family, family of objects (a set of related objects) whose differences depend only on the chosen values for a set of parameters. Common examples are p ...
of
probability distribution In probability theory and statistics, a probability distribution is the mathematical function that gives the probabilities of occurrence of different possible outcomes for an experiment. It is a mathematical description of a random phenomenon i ...
s. Concentration parameters occur in two kinds of distribution: In the
Von Mises–Fisher distribution In directional statistics, the von Mises–Fisher distribution (named after Richard von Mises and Ronald Fisher), is a probability distribution on the (p-1)-sphere in \mathbb^. If p=2 the distribution reduces to the von Mises distribution on the ...
, and in conjunction with distributions whose domain is a probability distribution, such as the symmetric Dirichlet distribution and the
Dirichlet process In probability theory, Dirichlet processes (after the distribution associated with Peter Gustav Lejeune Dirichlet) are a family of stochastic processes whose realizations are probability distributions. In other words, a Dirichlet process is a p ...
. The rest of this article focuses on the latter usage. The larger the value of the concentration parameter, the more evenly distributed is the resulting distribution (the more it tends towards the uniform distribution). The smaller the value of the concentration parameter, the more sparsely distributed is the resulting distribution, with most values or ranges of values having a probability near zero (in other words, the more it tends towards a distribution concentrated on a single point, the
degenerate distribution In mathematics, a degenerate distribution is, according to some, a probability distribution in a space with support only on a manifold of lower dimension, and according to others a distribution with support only at a single point. By the latter ...
defined by the
Dirac delta function In mathematics, the Dirac delta distribution ( distribution), also known as the unit impulse, is a generalized function or distribution over the real numbers, whose value is zero everywhere except at zero, and whose integral over the entire ...
).


Dirichlet distribution

In the case of multivariate Dirichlet distributions, there is some confusion over how to define the concentration parameter. In the topic modelling literature, it is often defined as the sum of the individual Dirichlet parameters, when discussing symmetric Dirichlet distributions (where the parameters are the same for all dimensions) it is often defined to be the value of the single Dirichlet parameter used in all dimensions. This second definition is smaller by a factor of the dimension of the distribution. A concentration parameter of 1 (or ''k'', the dimension of the Dirichlet distribution, by the definition used in the topic modelling literature) results in all sets of probabilities being equally likely, i.e., in this case the Dirichlet distribution of dimension ''k'' is equivalent to a uniform distribution over a ''k-1''-dimensional simplex. Note that this is ''not'' the same as what happens when the concentration parameter tends towards infinity. In the former case, all resulting distributions are equally likely (the distribution over distributions is uniform). In the latter case, only near-uniform distributions are likely (the distribution over distributions is highly peaked around the uniform distribution). Meanwhile, in the limit as the concentration parameter tends towards zero, only distributions with nearly all mass concentrated on one of their components are likely (the distribution over distributions is highly peaked around the ''k'' possible
Dirac delta distribution In mathematics, the Dirac delta distribution ( distribution), also known as the unit impulse, is a generalized function or distribution over the real numbers, whose value is zero everywhere except at zero, and whose integral over the entire ...
s centered on one of the components, or in terms of the ''k''-dimensional simplex, is highly peaked at corners of the simplex).


Sparse prior

An example of where a sparse prior (concentration parameter much less than 1) is called for, consider a
topic model In statistics and natural language processing, a topic model is a type of statistical model for discovering the abstract "topics" that occur in a collection of documents. Topic modeling is a frequently used text-mining tool for discovery of hidden ...
, which is used to learn the topics that are discussed in a set of documents, where each "topic" is described using a
categorical distribution In probability theory and statistics, a categorical distribution (also called a generalized Bernoulli distribution, multinoulli distribution) is a discrete probability distribution that describes the possible results of a random variable that can ...
over a vocabulary of words. A typical vocabulary might have 100,000 words, leading to a 100,000-dimensional categorical distribution. The
prior distribution In Bayesian statistical inference, a prior probability distribution, often simply called the prior, of an uncertain quantity is the probability distribution that would express one's beliefs about this quantity before some evidence is taken int ...
for the parameters of the categorical distribution would likely be a symmetric Dirichlet distribution. However, a coherent topic might only have a few hundred words with any significant probability mass. Accordingly, a reasonable setting for the concentration parameter might be 0.01 or 0.001. With a larger vocabulary of around 1,000,000 words, an even smaller value, e.g. 0.0001, might be appropriate.


See also

*
Dirichlet distribution In probability and statistics, the Dirichlet distribution (after Peter Gustav Lejeune Dirichlet), often denoted \operatorname(\boldsymbol\alpha), is a family of continuous multivariate probability distributions parameterized by a vector \bold ...
*
Dirichlet process In probability theory, Dirichlet processes (after the distribution associated with Peter Gustav Lejeune Dirichlet) are a family of stochastic processes whose realizations are probability distributions. In other words, a Dirichlet process is a p ...
* Pitman–Yor process *
Location parameter In geography, location or place are used to denote a region (point, line, or area) on Earth's surface or elsewhere. The term ''location'' generally implies a higher degree of certainty than ''place'', the latter often indicating an entity with an ...
*
Scale parameter In probability theory and statistics, a scale parameter is a special kind of numerical parameter of a parametric family of probability distributions. The larger the scale parameter, the more spread out the distribution. Definition If a family o ...


References

{{reflist Statistical parameters