The concept of a normalizing constant arises in probability theory and a variety of other areas of mathematics. The normalizing constant is used to reduce any probability function to a probability density function with total probability of one.
Definition
In probability theory, a normalizing constant is a constant by which an everywhere non-negative function must be multiplied so the area under its graph is 1, e.g., to make it a probability density function or a probability mass function.
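The definition can be written out in symbols. This is only a sketch: the names <math>f</math>, <math>c</math> and <math>p</math> are introduced here for illustration, and it assumes that the integral of <math>f</math> is finite and strictly positive.

```latex
c = \left( \int_{-\infty}^{\infty} f(x)\,dx \right)^{-1},
\qquad
p(x) = c\,f(x),
\qquad
\int_{-\infty}^{\infty} p(x)\,dx = 1 .
```

For a probability mass function the integral is replaced by a sum over the possible values.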
Examples
If we start from the simple Gaussian function
: <math>p(x) = e^{-x^2/2}, \quad x \in (-\infty, \infty),</math>
we have the corresponding Gaussian integral
: <math>\int_{-\infty}^\infty p(x)\,dx = \int_{-\infty}^\infty e^{-x^2/2}\,dx = \sqrt{2\pi}.</math>
Now if we use the latter's reciprocal value as a normalizing constant for the former, defining a function <math>\varphi(x)</math> as
: <math>\varphi(x) = \frac{1}{\sqrt{2\pi}}\,p(x) = \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}</math>
so that its integral is unity,
: <math>\int_{-\infty}^\infty \varphi(x)\,dx = \int_{-\infty}^\infty \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}\,dx = 1,</math>
then the function <math>\varphi(x)</math> is a probability density function. This is the density of the standard normal distribution. (''Standard'', in this case, means the expected value is 0 and the variance is 1.)
The constant <math>\frac{1}{\sqrt{2\pi}}</math> is the normalizing constant of the function <math>p(x)</math>.
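The Gaussian example can be checked numerically. The sketch below integrates the unnormalized Gaussian exp(−x²/2) with a midpoint rule, takes the reciprocal of the result as the normalizing constant, and then checks that the normalized function has total mass 1, mean 0 and variance 1. The helper names (`integrate`, `phi`) and the truncation of the real line to [−12, 12] are assumptions made for this illustration.

```python
import math


def p(x):
    """Unnormalized Gaussian function p(x) = exp(-x^2 / 2)."""
    return math.exp(-x * x / 2.0)


def integrate(f, a, b, n=20_000):
    """Composite midpoint rule on [a, b] with n equal subintervals."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


# Assumption: the tails beyond |x| = 12 contribute a negligible amount.
A, B = -12.0, 12.0

area = integrate(p, A, B)   # approximates sqrt(2 * pi)
c = 1.0 / area              # normalizing constant, approximates 1 / sqrt(2 * pi)


def phi(x):
    """Normalized function c * p(x)."""
    return c * p(x)


total = integrate(phi, A, B)
mean = integrate(lambda x: x * phi(x), A, B)
variance = integrate(lambda x: (x - mean) ** 2 * phi(x), A, B)

print("area     :", area, "vs sqrt(2*pi) =", math.sqrt(2.0 * math.pi))
print("total    :", total)
print("mean     :", mean)
print("variance :", variance)
```

The constant is computed from the numerical area rather than hard-coded, so the same pattern should carry over to other non-negative integrands whose integral has no closed form.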
Similarly,
: <math>\sum_{n=0}^\infty \frac{\lambda^n}{n!} = e^\lambda,</math>
and consequently
: <math>f(n) = \frac{\lambda^n e^{-\lambda}}{n!}</math>
is a probability mass function on the set of all nonnegative integers. This is the probability mass function of the Poisson distribution with expected value λ.
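A short numerical check of the Poisson case, written as a sketch: the series of λⁿ/n! is truncated after a fixed number of terms (an assumption that is harmless for moderate λ), and the rate `lam = 3.5` is an arbitrary illustrative value.

```python
import math


def poisson_pmf(n, lam):
    """f(n) = lam**n * exp(-lam) / n!, where exp(-lam) is the normalizing constant."""
    return lam ** n * math.exp(-lam) / math.factorial(n)


lam = 3.5        # hypothetical rate, chosen only for illustration
N_TERMS = 100    # truncation point of the infinite sums (assumption)

# Sum of the unnormalized weights lam**n / n!, which should approach exp(lam).
series = sum(lam ** n / math.factorial(n) for n in range(N_TERMS))

# Total mass and expected value of the normalized function.
total = sum(poisson_pmf(n, lam) for n in range(N_TERMS))
mean = sum(n * poisson_pmf(n, lam) for n in range(N_TERMS))

print("series:", series, "vs exp(lam) =", math.exp(lam))
print("total :", total)
print("mean  :", mean)
```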
Note that if the probability density function is a function of various parameters, so too will be its normalizing constant. The parametrised normalizing constant for the Boltzmann distribution plays a central role in statistical mechanics. In that context, the normalizing constant is called the partition function.
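To illustrate how the constant depends on a parameter, the sketch below computes the partition function for a small discrete system with Boltzmann weights exp(−E/kT). The three energy levels and the values of `kT` are hypothetical, chosen only to show that the sum changes with temperature; the weights are divided by this sum to obtain probabilities.

```python
import math


def partition_function(energies, kT):
    """Z(kT) = sum_i exp(-E_i / kT) over the discrete energy levels."""
    return sum(math.exp(-e / kT) for e in energies)


def boltzmann_probabilities(energies, kT):
    """Normalized probabilities p_i = exp(-E_i / kT) / Z(kT)."""
    z = partition_function(energies, kT)
    return [math.exp(-e / kT) / z for e in energies]


# Hypothetical energy levels, in the same units as kT.
energies = [0.0, 1.0, 2.0]

for kT in (0.5, 1.0, 5.0):
    z = partition_function(energies, kT)
    probs = boltzmann_probabilities(energies, kT)
    print(f"kT={kT}: Z={z:.4f}, probabilities={[round(q, 4) for q in probs]}")
```

At low `kT` the lowest level dominates, while at high `kT` the probabilities move toward uniform; in both cases the weights are divided by a different value of Z.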
Bayes' theorem
Bayes' theorem says that the posterior probability measure is proportional to the product of the prior probability measure and the likelihood function. ''Proportional to'' implies that one must multiply or divide by a normalizing constant to assign measure 1 to the whole space, i.e., to get a probability measure. In a simple discrete case we have
: <math>P(H_0|D) = \frac{P(D|H_0)\,P(H_0)}{P(D)}</math>
where <math>P(H_0)</math> is the prior probability that the hypothesis is true; <math>P(D|H_0)</math> is the conditional probability of the data given that the hypothesis is true, but given that the data are known it is the likelihood of the hypothesis (or its parameters) given the data; <math>P(H_0|D)</math> is the posterior probability that the hypothesis is true given the data. <math>P(D)</math> should be the probability of producing the data, but on its own is difficult to calculate, so an alternative way to describe this relationship is as one of proportionality:
: <math>P(H_0|D) \propto P(D|H_0)\,P(H_0).</math>
Since <math>P(H|D)</math> is a probability, the sum over all possible (mutually exclusive) hypotheses should be 1, leading to the conclusion that
: <math>P(H_0|D) = \frac{P(D|H_0)\,P(H_0)}{\sum_i P(D|H_i)\,P(H_i)}.</math>
In this case, the reciprocal of the value
: <math>P(D) = \sum_i P(D|H_i)\,P(H_i)</math>
is the ''normalizing constant''. It can be extended from countably many hypotheses to uncountably many by replacing the sum by an integral.
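The discrete case can be sketched in a few lines. The function below multiplies each prior by its likelihood, sums the products to obtain P(D), and divides by that sum. The three coin hypotheses, their equal priors and the single observed "heads" are hypothetical values used only for illustration.

```python
def posterior(priors, likelihoods):
    """Return (posterior probabilities, P(D)) for mutually exclusive hypotheses.

    priors[i] is P(H_i) and likelihoods[i] is P(D | H_i).
    """
    unnormalized = [p * l for p, l in zip(priors, likelihoods)]
    evidence = sum(unnormalized)  # P(D) = sum_i P(D | H_i) P(H_i)
    if evidence <= 0:
        raise ValueError("P(D) must be positive to normalize")
    return [u / evidence for u in unnormalized], evidence


# Hypothetical setup: three coins with P(heads) = 0.25, 0.5, 0.75,
# equal prior probability for each, and one observed toss showing heads.
priors = [1 / 3, 1 / 3, 1 / 3]
likelihoods = [0.25, 0.5, 0.75]

post, evidence = posterior(priors, likelihoods)
print("P(D)      :", evidence)
print("posterior :", post)
print("sum       :", sum(post))
```

The priors and likelihoods fix the posterior only up to proportionality; dividing by `evidence` is the step that makes the values sum to 1.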
In practice, the normalizing constant is often difficult to calculate exactly and has to be estimated. Methods for doing so include the bridge sampling technique, the naive Monte Carlo estimator, the generalized harmonic mean estimator, and importance sampling.
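As an illustration of one of these methods, the following is a minimal importance sampling sketch. It estimates the integral of the unnormalized Gaussian exp(−x²/2), whose exact value √(2π) is known, so the estimate can be compared against it. The proposal distribution (a zero-mean normal with standard deviation 2), the sample size and the fixed seed are all assumptions made for this example, not a recommended configuration.

```python
import math
import random


def f(x):
    """Unnormalized target: exp(-x^2 / 2)."""
    return math.exp(-x * x / 2.0)


PROPOSAL_SD = 2.0  # assumption: a proposal wider than the target


def q(x):
    """Density of the proposal distribution N(0, PROPOSAL_SD^2)."""
    return math.exp(-x * x / (2.0 * PROPOSAL_SD ** 2)) / (
        PROPOSAL_SD * math.sqrt(2.0 * math.pi)
    )


def importance_sampling_estimate(n_samples, seed=0):
    """Estimate the integral of f as the mean of f(x) / q(x) for x drawn from q."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(0.0, PROPOSAL_SD)
        total += f(x) / q(x)
    return total / n_samples


estimate = importance_sampling_estimate(100_000)
exact = math.sqrt(2.0 * math.pi)
print("estimate:", estimate)
print("exact   :", exact)
```

The reciprocal of `estimate` is then an estimate of the normalizing constant. The estimate is random, so it agrees with the exact value only up to sampling error.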
Non-probabilistic uses
The Legendre polynomials are characterized by orthogonality with respect to the uniform measure on the interval [−1, 1] and the fact that they are normalized so that their value at 1 is 1. The constant by which one multiplies a polynomial so that its value at 1 is 1 is a normalizing constant.
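A small sketch of this kind of normalization: the polynomial 3x² − 1 is orthogonal to 1 and to x on [−1, 1] but takes the value 2 at x = 1, so multiplying it by 1/2 gives the degree-2 Legendre polynomial (3x² − 1)/2. The coefficient-list representation and the helper names are assumptions made for this illustration.

```python
from fractions import Fraction


def evaluate(coeffs, x):
    """Evaluate a polynomial by Horner's rule; coeffs[k] multiplies x**k."""
    result = Fraction(0)
    for c in reversed(coeffs):
        result = result * x + c
    return result


def normalize_at_one(coeffs):
    """Return (constant, scaled coefficients) so the scaled polynomial is 1 at x = 1."""
    value = evaluate(coeffs, Fraction(1))
    if value == 0:
        raise ValueError("polynomial vanishes at 1 and cannot be normalized there")
    constant = 1 / value
    return constant, [constant * c for c in coeffs]


# 3x^2 - 1, stored as coefficients of x^0, x^1, x^2.
unnormalized = [Fraction(-1), Fraction(0), Fraction(3)]

constant, p2 = normalize_at_one(unnormalized)
print("normalizing constant:", constant)
print("normalized coefficients:", p2)
print("value at 1:", evaluate(p2, Fraction(1)))
```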
Orthonormal functions are normalized such that
: <math>\langle f_i, f_j \rangle = \delta_{i,j}</math>
with respect to some inner product <math>\langle f, g \rangle</math>.
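As a worked example of this second kind of normalization, take the Legendre polynomials <math>P_n</math> with the inner product given by integration over [−1, 1]. They are orthogonal but not of unit norm, so each one is rescaled by a constant; the symbol <math>\tilde P_n</math> for the rescaled polynomial is notation introduced here for illustration.

```latex
\langle P_m, P_n \rangle = \int_{-1}^{1} P_m(x)\,P_n(x)\,dx = \frac{2}{2n+1}\,\delta_{m,n},
\qquad
\tilde P_n(x) = \sqrt{\frac{2n+1}{2}}\;P_n(x),
\qquad
\langle \tilde P_m, \tilde P_n \rangle = \delta_{m,n}.
```

Here the normalizing constant <math>\sqrt{(2n+1)/2}</math> depends on the degree <math>n</math>, and it differs from the constant that makes the value at 1 equal to 1.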
The constant <math>1/\sqrt{2}</math> is used to establish the hyperbolic functions cosh and sinh from the lengths of the adjacent and opposite sides of a hyperbolic triangle.
See also
* Normalization (statistics)