In probability theory, a normalizing constant or normalizing factor is used to reduce any probability function to a probability density function with total probability of one. For example, a Gaussian function can be normalized into a probability density function, which gives the standard normal distribution. In Bayes' theorem, a normalizing constant is used to ensure that the posterior probabilities over all possible hypotheses sum to 1. Other uses of normalizing constants include making the value of a Legendre polynomial at 1 equal to 1, and in the orthogonality of orthonormal functions. A similar concept has been used in areas other than probability, such as for polynomials.


Definition

In probability theory, a normalizing constant is a constant by which an everywhere non-negative function must be multiplied so the area under its graph is 1, e.g., to make it a probability density function or a probability mass function.
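The definition amounts to dividing the function by the area under its graph. As a minimal illustration, the following Python sketch (the helper name normalize and the use of scipy.integrate.quad are illustrative choices, not part of the original text) rescales a non-negative function so that it integrates to 1:

import numpy as np
from scipy.integrate import quad

def normalize(f, a=-np.inf, b=np.inf):
    """Rescale a non-negative function f so it integrates to 1 on (a, b)."""
    area, _ = quad(f, a, b)      # area under the graph of f
    c = 1.0 / area               # the normalizing constant
    return (lambda x: c * f(x)), c

# Example: 1/(1 + x^2) integrates to pi over the real line, so its
# normalizing constant is 1/pi (yielding the standard Cauchy density).
pdf, c = normalize(lambda x: 1.0 / (1.0 + x**2))
print(c)                              # ~0.3183 (= 1/pi)
print(quad(pdf, -np.inf, np.inf)[0])  # ~1.0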


Examples

If we start from the simple Gaussian function p(x) = e^{-x^2/2}, \quad x\in(-\infty,\infty), we have the corresponding Gaussian integral \int_{-\infty}^\infty p(x) \, dx = \int_{-\infty}^\infty e^{-x^2/2} \, dx = \sqrt{2\pi}. Now if we use the latter's reciprocal value as a normalizing constant for the former, defining a function \varphi(x) as \varphi(x) = \frac{1}{\sqrt{2\pi}} p(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}, so that its integral is 1, \int_{-\infty}^\infty \varphi(x) \, dx = \int_{-\infty}^\infty \frac{1}{\sqrt{2\pi}} e^{-x^2/2} \, dx = 1, then the function \varphi(x) is a probability density function. This is the density of the standard normal distribution. (''Standard'', in this case, means the expected value is 0 and the variance is 1.) The constant \frac{1}{\sqrt{2\pi}} is the normalizing constant of the function p(x).

Similarly, \sum_{n=0}^\infty \frac{\lambda^n}{n!} = e^{\lambda}, and consequently f(n) = \frac{\lambda^n e^{-\lambda}}{n!} is a probability mass function on the set of all nonnegative integers. This is the probability mass function of the Poisson distribution with expected value \lambda.
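Both normalizing constants above can be checked numerically. A minimal Python sketch (the library choices are illustrative), taking \lambda = 3 for the Poisson case:

import math
import numpy as np
from scipy.integrate import quad

# Gaussian: the integral of exp(-x^2/2) over the real line is sqrt(2*pi),
# so 1/sqrt(2*pi) is the normalizing constant of p(x).
area, _ = quad(lambda x: np.exp(-x**2 / 2.0), -np.inf, np.inf)
print(area, math.sqrt(2.0 * math.pi))   # both ~2.50663

# Poisson: multiplying lambda^n/n! by e^(-lambda) makes the terms sum to 1.
lam = 3.0
total = sum(lam**n * math.exp(-lam) / math.factorial(n) for n in range(100))
print(total)                             # ~1.0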
Note that if the probability density function is a function of various parameters, so too will be its normalizing constant. The parametrised normalizing constant for the Boltzmann distribution plays a central role in statistical mechanics. In that context, the normalizing constant is called the partition function.
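As a concrete sketch of that role, the following Python example (the energy levels and temperature are made-up illustrative values, not from the original text) computes the partition function Z as the normalizing constant of a Boltzmann distribution over a finite set of states:

import numpy as np

energies = np.array([0.0, 1.0, 2.5, 4.0])  # hypothetical energy levels
beta = 1.0                                 # inverse temperature 1/(kT)

weights = np.exp(-beta * energies)  # unnormalized Boltzmann weights
Z = weights.sum()                   # partition function = normalizing constant
probabilities = weights / Z         # Boltzmann distribution over the states

print(Z, probabilities.sum())       # probabilities sum to 1

Note that Z depends on the parameter beta, illustrating the point above that a parametrised density has a parametrised normalizing constant.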


Bayes' theorem

Bayes' theorem says that the posterior probability measure is proportional to the product of the prior probability measure and the likelihood function. ''Proportional to'' implies that one must multiply or divide by a normalizing constant to assign measure 1 to the whole space, i.e., to get a probability measure. In a simple discrete case we have P(H_0 \mid D) = \frac{P(D \mid H_0) P(H_0)}{P(D)} where P(H_0) is the prior probability that the hypothesis is true; P(D \mid H_0) is the conditional probability of the data given that the hypothesis is true, but given that the data are known it is the likelihood of the hypothesis (or its parameters) given the data; P(H_0 \mid D) is the posterior probability that the hypothesis is true given the data. P(D) should be the probability of producing the data, but on its own it is difficult to calculate, so an alternative way to describe this relationship is as one of proportionality: P(H_0 \mid D) \propto P(D \mid H_0) P(H_0). Since P(H \mid D) is a probability, the sum over all possible (mutually exclusive) hypotheses should be 1, leading to the conclusion that P(H_0 \mid D) = \frac{P(D \mid H_0) P(H_0)}{\sum_i P(D \mid H_i) P(H_i)}. In this case, the reciprocal of the value P(D) = \sum_i P(D \mid H_i) P(H_i) is the ''normalizing constant''. It can be extended from countably many hypotheses to uncountably many by replacing the sum with an integral. In practice, there are many methods of estimating the normalizing constant; these include the bridge sampling technique, the naive Monte Carlo estimator, the generalized harmonic mean estimator, and importance sampling.
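The discrete case, and the naive Monte Carlo estimator mentioned above, can both be sketched in a few lines of Python (the priors, likelihoods, and the normal prior/likelihood pair below are made-up illustrative choices):

import numpy as np

# Three mutually exclusive hypotheses with hypothetical priors and likelihoods.
prior = np.array([0.5, 0.3, 0.2])          # P(H_i)
likelihood = np.array([0.10, 0.40, 0.25])  # P(D | H_i)

unnormalized = likelihood * prior   # proportional to the posterior
p_data = unnormalized.sum()         # P(D) = sum_i P(D | H_i) P(H_i) = 0.22
posterior = unnormalized / p_data   # multiply by normalizing constant 1/P(D)
print(posterior, posterior.sum())   # posterior sums to 1

# Continuous case: P(D) is the integral of P(D | theta) p(theta) d(theta).
# The naive Monte Carlo estimator averages the likelihood over prior draws.
rng = np.random.default_rng(0)
theta = rng.normal(0.0, 1.0, size=100_000)  # hypothetical N(0, 1) prior
lik = lambda t: np.exp(-0.5 * (1.2 - t) ** 2) / np.sqrt(2.0 * np.pi)
print(lik(theta).mean())            # estimate of P(D), ~0.197 here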


Non-probabilistic uses

The Legendre polynomials are characterized by orthogonality with respect to the uniform measure on the interval [−1, 1] and the fact that they are normalized so that their value at 1 is 1. The constant by which one multiplies a polynomial so its value at 1 will be 1 is a normalizing constant. Orthonormal functions are normalized such that \langle f_i, f_j \rangle = \delta_{i,j} with respect to some inner product \langle f, g \rangle. The constant \frac{1}{\sqrt{2}} is used to establish the hyperbolic functions cosh and sinh from the lengths of the adjacent and opposite sides of a hyperbolic triangle.
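These normalizations are easy to verify numerically. A short Python sketch (the use of numpy.polynomial.legendre is an illustrative choice) checks that the first few Legendre polynomials take the value 1 at 1 and are orthogonal on [−1, 1]:

import numpy as np
from numpy.polynomial import legendre as L
from scipy.integrate import quad

# Standard Legendre polynomials P_n are normalized so that P_n(1) = 1.
for n in range(5):
    coeffs = [0] * n + [1]           # selects P_n in the Legendre basis
    print(n, L.legval(1.0, coeffs))  # prints 1.0 for every n

# Orthogonality with respect to the uniform measure on [-1, 1]:
p2 = lambda x: L.legval(x, [0, 0, 1])     # P_2
p3 = lambda x: L.legval(x, [0, 0, 0, 1])  # P_3
print(quad(lambda x: p2(x) * p3(x), -1, 1)[0])  # ~0.0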


See also

* Normalization (statistics)

