
The partition function or configuration integral, as used in probability theory, information theory and dynamical systems, is a generalization of the definition of a partition function in statistical mechanics. It is a special case of a normalizing constant in probability theory, for the Boltzmann distribution. The partition function occurs in many problems of probability theory because, in situations where there is a natural symmetry, its associated probability measure, the Gibbs measure, has the Markov property. This means that the partition function occurs not only in physical systems with translation symmetry, but also in such varied settings as neural networks (the Hopfield network) and applications such as genomics, corpus linguistics and artificial intelligence, which employ Markov networks and Markov logic networks. The Gibbs measure is also the unique measure that has the property of maximizing the entropy for a fixed expectation value of the energy; this underlies the appearance of the partition function in maximum entropy methods and the algorithms derived therefrom.

The partition function ties together many different concepts, and thus offers a general framework in which many different kinds of quantities may be calculated. In particular, it shows how to calculate expectation values and Green's functions, forming a bridge to Fredholm theory. It also provides a natural setting for the information geometry approach to information theory, where the Fisher information metric can be understood to be a correlation function derived from the partition function; it happens to define a Riemannian manifold.

When the setting for random variables is complex projective space or projective Hilbert space, geometrized with the Fubini–Study metric, one obtains the theory of quantum mechanics and, more generally, quantum field theory. In these theories, the partition function is heavily exploited in the path integral formulation, with great success, leading to many formulas nearly identical to those reviewed here. However, because the underlying measure space is complex-valued, as opposed to the real-valued simplex of probability theory, an extra factor of ''i'' appears in many formulas. Tracking this factor is troublesome, and is not done here. This article focuses primarily on classical probability theory, where the sum of probabilities totals to one.


Definition

Given a set of random variables X_i taking on values x_i, and some sort of potential function or Hamiltonian H(x_1,x_2,\dots), the partition function is defined as

Z(\beta) = \sum_{x_i} \exp \left(-\beta H(x_1,x_2,\dots) \right)

The function ''H'' is understood to be a real-valued function on the space of states \{x_1, x_2, \dots\}, while \beta is a real-valued free parameter (conventionally, the inverse temperature). The sum over the x_i is understood to be a sum over all possible values that each of the random variables X_i may take. The sum is to be replaced by an integral when the X_i are continuous rather than discrete; in that case, one writes

Z(\beta) = \int \exp \left(-\beta H(x_1,x_2,\dots) \right) \, dx_1 \, dx_2 \cdots

for the case of continuously-varying X_i.

When ''H'' is an observable, such as a finite-dimensional matrix or an infinite-dimensional Hilbert space operator or an element of a C*-algebra, it is common to express the summation as a trace, so that

Z(\beta) = \operatorname{tr}\left(\exp\left(-\beta H\right)\right)

When ''H'' is infinite-dimensional, then, for the above notation to be valid, the argument must be trace class, that is, of a form such that the summation exists and is bounded.

The number of variables X_i need not be countable, in which case the sums are to be replaced by functional integrals. Although there are many notations for functional integrals, a common one would be

Z = \int \mathcal{D}\varphi \, \exp \left(- \beta H[\varphi]\right)

Such is the case for the partition function in quantum field theory. A common, useful modification to the partition function is to introduce auxiliary functions. This allows, for example, the partition function to be used as a generating function for correlation functions. This is discussed in greater detail below.
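For a finite, discrete state space, the definition can be evaluated by direct enumeration. The following is a minimal Python sketch of that computation; the nearest-neighbour Hamiltonian used here is a hypothetical toy choice, not anything fixed by the definition above.

    import itertools
    import math

    def hamiltonian(x):
        # toy nearest-neighbour coupling: H(x) = -(x_1 x_2 + x_2 x_3 + ...)
        return -sum(x[i] * x[i + 1] for i in range(len(x) - 1))

    def partition_function(beta, n=3, values=(-1, +1)):
        # Z(beta) = sum over all len(values)^n configurations of exp(-beta * H(x))
        return sum(math.exp(-beta * hamiltonian(x))
                   for x in itertools.product(values, repeat=n))

    print(partition_function(1.0))   # 2e^2 + 4 + 2e^(-2), approximately 19.05

The sum has exponentially many terms, which is why closed forms, traces, and the generating-function techniques described below are used in practice rather than brute-force enumeration.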


The parameter ''β''

The role or meaning of the parameter \beta can be understood in a variety of different ways. In classical thermodynamics, it is an inverse temperature. More generally, one would say that it is the variable that is conjugate to some (arbitrary) function H of the random variables X. The word ''conjugate'' here is used in the sense of conjugate generalized coordinates in Lagrangian mechanics; thus, properly, \beta is a Lagrange multiplier. It is not uncommonly called the generalized force. All of these concepts have in common the idea that one value is meant to be kept fixed, as others, interconnected in some complicated way, are allowed to vary. In the current case, the value to be kept fixed is the expectation value of H, even as many different probability distributions can give rise to exactly this same (fixed) value.

For the general case, one considers a set of functions \{H_k\} that each depend on the random variables X_i. These functions are chosen because one wants to hold their expectation values constant, for one reason or another. To constrain the expectation values in this way, one applies the method of Lagrange multipliers. In the general case, maximum entropy methods illustrate the manner in which this is done.

Some specific examples are in order. In basic thermodynamics problems, when using the canonical ensemble, the use of just one parameter \beta reflects the fact that there is only one expectation value that must be held constant: the free energy (due to conservation of energy). For chemistry problems involving chemical reactions, the grand canonical ensemble provides the appropriate foundation, and there are two Lagrange multipliers. One is to hold the energy constant, and another, the fugacity, is to hold the particle count constant (as chemical reactions involve the recombination of a fixed number of atoms).

For the general case, one has

Z(\beta) = \sum_{x_i} \exp \left(-\sum_k \beta_k H_k(x_i) \right)

with \beta = (\beta_1, \beta_2, \dots) a point in a space. For a collection of observables H_k, one would write

Z(\beta) = \operatorname{tr}\left[\exp \left(-\sum_k \beta_k H_k\right)\right]

As before, it is presumed that the argument of the trace is trace class. The corresponding Gibbs measure then provides a probability distribution such that the expectation value of each H_k is a fixed value. More precisely, one has

\frac{\partial}{\partial \beta_k} \left(- \log Z \right) = \langle H_k\rangle = \mathrm{E}\left[H_k\right]

with the angle brackets \langle H_k \rangle denoting the expected value of H_k, and \mathrm{E}[\,\cdot\,] being a common alternative notation. A precise definition of this expectation value is given below.

Although the value of \beta is commonly taken to be real, it need not be, in general; this is discussed in the section Normalization below. The values of \beta can be understood to be the coordinates of points in a space; this space is in fact a manifold, as sketched below. The study of these spaces as manifolds constitutes the field of information geometry.
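The conjugacy between each multiplier \beta_k and its observable H_k can be checked numerically. The sketch below (in Python, with two hypothetical toy observables standing in for energy and particle count) verifies that -\partial \log Z / \partial \beta_k reproduces the expectation value \langle H_k \rangle.

    import itertools, math

    STATES = list(itertools.product((0, 1), repeat=4))    # occupation numbers
    H1 = lambda x: sum(i * xi for i, xi in enumerate(x))  # toy "energy"
    H2 = lambda x: sum(x)                                 # "particle count"

    def log_Z(b1, b2):
        return math.log(sum(math.exp(-b1 * H1(x) - b2 * H2(x)) for x in STATES))

    def expectation(f, b1, b2):
        Z = math.exp(log_Z(b1, b2))
        return sum(f(x) * math.exp(-b1 * H1(x) - b2 * H2(x)) for x in STATES) / Z

    b1, b2, eps = 0.7, 0.3, 1e-6
    # central finite difference of -log Z with respect to beta_2:
    dlogZ = -(log_Z(b1, b2 + eps) - log_Z(b1, b2 - eps)) / (2 * eps)
    print(dlogZ, expectation(H2, b1, b2))   # the two numbers agree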


Symmetry

The potential function itself commonly takes the form of a sum:

H(x_1,x_2,\dots) = \sum_s V(s)

where the sum over ''s'' is a sum over some subset of the power set ''P''(''X'') of the set X = \lbrace x_1,x_2,\dots \rbrace. For example, in statistical mechanics, such as the Ising model, the sum is over pairs of nearest neighbors. In probability theory, such as Markov networks, the sum might be over the cliques of a graph; so, for the Ising model and other lattice models, the maximal cliques are edges.

The fact that the potential function can be written as a sum usually reflects the fact that it is invariant under the action of a group symmetry, such as translational invariance. Such symmetries can be discrete or continuous; they materialize in the correlation functions for the random variables (discussed below). Thus a symmetry in the Hamiltonian becomes a symmetry of the correlation function (and vice versa). This symmetry has a critically important interpretation in probability theory: it implies that the Gibbs measure has the Markov property; that is, it is independent of the random variables in a certain way, or, equivalently, the measure is identical on the equivalence classes of the symmetry. This leads to the widespread appearance of the partition function in problems with the Markov property, such as Hopfield networks.
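The decomposition of H into a sum over cliques, and its invariance under the symmetry group, can be made concrete in a few lines. The Python sketch below (the ring graph and the Ising pair potential are illustrative choices) writes H as a sum of edge terms and checks invariance under the translation that generates the ring.

    import itertools, math

    N = 4
    EDGES = [(i, (i + 1) % N) for i in range(N)]   # ring: a translation-invariant edge set

    def V(xi, xj):
        return -xi * xj                            # Ising pair potential on one edge

    def H(x):
        return sum(V(x[i], x[j]) for i, j in EDGES)

    x = (1, -1, 1, 1)
    assert H(x) == H(x[1:] + x[:1])                # invariant under cyclic translation

    Z = sum(math.exp(-1.0 * H(x)) for x in itertools.product((-1, 1), repeat=N))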


As a measure

The value of the expression

\exp \left(-\beta H(x_1,x_2,\dots) \right)

can be interpreted as a likelihood that a specific configuration of values (x_1,x_2,\dots) occurs in the system. Thus, given a specific configuration (x_1,x_2,\dots),

P(x_1,x_2,\dots) = \frac{1}{Z(\beta)} \exp \left(-\beta H(x_1,x_2,\dots) \right)

is the probability of the configuration (x_1,x_2,\dots) occurring in the system, which is now properly normalized so that 0\le P(x_1,x_2,\dots)\le 1, and such that the sum over all configurations totals to one. As such, the partition function can be understood to provide a measure (a probability measure) on the probability space; formally, it is called the Gibbs measure. It generalizes the narrower concepts of the grand canonical ensemble and canonical ensemble in statistical mechanics.

There exists at least one configuration (x_1,x_2,\dots) for which the probability is maximized; this configuration is conventionally called the ground state. If the configuration is unique, the ground state is said to be non-degenerate, and the system is said to be ergodic; otherwise the ground state is degenerate. The ground state may or may not commute with the generators of the symmetry; if it commutes, it is said to be an invariant measure. When it does not commute, the symmetry is said to be spontaneously broken.

Conditions under which a ground state exists and is unique are given by the Karush–Kuhn–Tucker conditions; these conditions are commonly used to justify the use of the Gibbs measure in maximum-entropy problems.
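A short numerical sketch of these statements (Python, with the same hypothetical toy Hamiltonian as earlier): the normalized weights form a probability distribution, and here the maximizing configuration is not unique, so the ground state is degenerate.

    import itertools, math

    beta = 1.0
    states = list(itertools.product((-1, 1), repeat=3))
    H = lambda x: -(x[0] * x[1] + x[1] * x[2])     # toy nearest-neighbour Hamiltonian

    Z = sum(math.exp(-beta * H(x)) for x in states)
    P = {x: math.exp(-beta * H(x)) / Z for x in states}

    assert abs(sum(P.values()) - 1.0) < 1e-12      # the Gibbs measure is normalized
    Emin = min(H(x) for x in states)
    ground = [x for x in states if H(x) == Emin]   # (1,1,1) and (-1,-1,-1): degenerate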


Normalization

The values taken by \beta depend on the mathematical space over which the random field varies. Thus, real-valued random fields take values on a simplex: this is the geometrical way of saying that the sum of probabilities must total to one. For quantum mechanics, the random variables range over complex projective space (or complex-valued projective Hilbert space), where the random variables are interpreted as probability amplitudes. The emphasis here is on the word ''projective'', as the amplitudes are still normalized to one. The normalization for the potential function is the Jacobian for the appropriate mathematical space: it is 1 for ordinary probabilities, and ''i'' for Hilbert space; thus, in quantum field theory, one sees itH in the exponential, rather than \beta H. The partition function is very heavily exploited in the path integral formulation of quantum field theory, to great effect. The theory there is very nearly identical to that presented here, aside from this difference, and the fact that it is usually formulated on four-dimensional space-time, rather than in a general way.


Expectation values

The partition function is commonly used as a probability-generating function for expectation values of various functions of the random variables. So, for example, taking \beta as an adjustable parameter, the derivative of \log(Z(\beta)) with respect to \beta

\mathrm{E}[H] = \langle H \rangle = -\frac{\partial \log Z(\beta)}{\partial \beta}

gives the average (expectation value) of ''H''. In physics, this would be called the average energy of the system.

Given the definition of the probability measure above, the expectation value of any function ''f'' of the random variables ''X'' may now be written as expected: so, for discrete-valued ''X'', one writes

\begin{align} \langle f\rangle & = \sum_{x_i} f(x_1,x_2,\dots) P(x_1,x_2,\dots) \\ & = \frac{1}{Z(\beta)} \sum_{x_i} f(x_1,x_2,\dots) \exp \left(-\beta H(x_1,x_2,\dots) \right) \end{align}

The above notation makes sense for a finite number of discrete random variables. In more general settings, the summations should be replaced with integrals over a probability space. Thus, for example, the entropy is given by

\begin{align} S & = -k_\text{B} \langle\ln P\rangle \\ & = -k_\text{B} \sum_{x_i} P(x_1, x_2, \dots) \ln P(x_1,x_2,\dots) \\ & = k_\text{B} \left(\beta \langle H\rangle + \log Z(\beta)\right) \end{align}

The Gibbs measure is the unique statistical distribution that maximizes the entropy for a fixed expectation value of the energy; this underlies its use in maximum entropy methods.
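Both identities, \langle H \rangle = -\partial \log Z/\partial\beta and S = \beta\langle H\rangle + \log Z (with k_\text{B} = 1), can be verified directly on a small system. A minimal Python sketch, again using a hypothetical toy Hamiltonian:

    import itertools, math

    states = list(itertools.product((-1, 1), repeat=3))
    H = lambda x: -(x[0] * x[1] + x[1] * x[2])

    def log_Z(beta):
        return math.log(sum(math.exp(-beta * H(x)) for x in states))

    beta, eps = 0.5, 1e-6
    Z = math.exp(log_Z(beta))
    P = [math.exp(-beta * H(x)) / Z for x in states]
    avg_H = sum(p * H(x) for p, x in zip(P, states))

    # <H> = -d(log Z)/d(beta), checked by central finite difference:
    assert abs(avg_H + (log_Z(beta + eps) - log_Z(beta - eps)) / (2 * eps)) < 1e-6

    # S = -<ln P> = beta*<H> + log Z  (k_B = 1):
    S_direct = -sum(p * math.log(p) for p in P)
    assert abs(S_direct - (beta * avg_H + log_Z(beta))) < 1e-12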


Information geometry

The points \beta can be understood to form a space, and specifically, a manifold. Thus, it is reasonable to ask about the structure of this manifold; this is the task of information geometry.

Multiple derivatives with respect to the Lagrange multipliers give rise to a positive semi-definite covariance matrix

g_{ij}(\beta) = \frac{\partial^2}{\partial \beta^i \partial \beta^j} \left(-\log Z(\beta)\right) = \langle \left(H_i-\langle H_i\rangle\right)\left( H_j-\langle H_j\rangle\right)\rangle

This matrix is positive semi-definite, and may be interpreted as a metric tensor, specifically, a Riemannian metric. Equipping the space of Lagrange multipliers with a metric in this way turns it into a Riemannian manifold. The study of such manifolds is referred to as information geometry; the metric above is the Fisher information metric. Here, \beta serves as a coordinate on the manifold. It is interesting to compare the above definition to the simpler Fisher information, from which it is inspired.

That the above defines the Fisher information metric can be readily seen by explicitly substituting for the expectation value:

\begin{align} g_{ij}(\beta) & = \left\langle \left(H_i - \left\langle H_i \right\rangle\right) \left( H_j - \left\langle H_j \right\rangle\right) \right\rangle \\ & = \sum_x P(x) \left(H_i - \left\langle H_i \right\rangle\right) \left( H_j - \left\langle H_j \right\rangle\right) \\ & = \sum_x P(x) \left(H_i + \frac{\partial \log Z}{\partial \beta^i}\right) \left(H_j + \frac{\partial \log Z}{\partial \beta^j}\right) \\ & = \sum_x P(x) \frac{\partial \log P(x)}{\partial \beta^i} \frac{\partial \log P(x)}{\partial \beta^j} \end{align}

where we've written P(x) for P(x_1,x_2,\dots) and the summation is understood to be over all values of all random variables X_k. For continuous-valued random variables, the summations are replaced by integrals, of course.

Curiously, the Fisher information metric can also be understood as the flat-space Euclidean metric, after an appropriate change of variables, as described in the main article on it. When the \beta are complex-valued, the resulting metric is the Fubini–Study metric. When written in terms of mixed states, instead of pure states, it is known as the Bures metric.
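The covariance form of the metric lends itself to direct computation. The Python sketch below builds g_{ij}(\beta) for two hypothetical toy observables by taking the covariance of the H_k under the Gibbs measure; by the derivation above, this equals the Fisher information metric at the point \beta.

    import itertools, math

    states = list(itertools.product((0, 1), repeat=3))
    Hs = [lambda x: sum(x),                                  # H_1: particle count
          lambda x: sum(i * xi for i, xi in enumerate(x))]   # H_2: toy energy

    def gibbs(beta):
        w = [math.exp(-sum(b * h(x) for b, h in zip(beta, Hs))) for x in states]
        Z = sum(w)
        return [wi / Z for wi in w]

    def fisher_metric(beta):
        P = gibbs(beta)
        means = [sum(p * h(x) for p, x in zip(P, states)) for h in Hs]
        return [[sum(p * (Hs[i](x) - means[i]) * (Hs[j](x) - means[j])
                     for p, x in zip(P, states))
                 for j in range(2)] for i in range(2)]

    g = fisher_metric((0.4, 0.9))   # a symmetric, positive semi-definite 2x2 matrix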


Correlation functions

By introducing artificial auxiliary functions J_k into the partition function, it can then be used to obtain the expectation value of the random variables. Thus, for example, by writing

\begin{align} Z(\beta,J) & = Z(\beta,J_1,J_2,\dots) \\ & = \sum_{x_i} \exp \left(-\beta H(x_1,x_2,\dots) + \sum_n J_n x_n \right) \end{align}

one then has

\mathrm{E}[x_k] = \langle x_k \rangle = \left. \frac{\partial}{\partial J_k} \log Z(\beta,J)\right|_{J=0}

as the expectation value of x_k. In the path integral formulation of quantum field theory, these auxiliary functions are commonly referred to as source fields.

Multiple differentiations lead to the connected correlation functions of the random variables. Thus the correlation function C(x_j,x_k) between variables x_j and x_k is given by:

C(x_j,x_k) = \left. \frac{\partial}{\partial J_j} \frac{\partial}{\partial J_k} \log Z(\beta,J)\right|_{J=0}
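A finite-difference sketch of the source-field trick in Python (the toy Hamiltonian is again a hypothetical choice): differentiating \log Z(\beta, J) at J = 0 once gives \langle x_k \rangle, and the mixed second derivative gives the connected correlation.

    import itertools, math

    states = list(itertools.product((-1, 1), repeat=3))
    H = lambda x: -(x[0] * x[1] + x[1] * x[2])
    beta, eps = 1.0, 1e-4

    def log_Z(J):
        return math.log(sum(
            math.exp(-beta * H(x) + sum(j * xi for j, xi in zip(J, x)))
            for x in states))

    def d_logZ(k):                    # first derivative at J = 0: <x_k>
        Jp = [eps if i == k else 0.0 for i in range(3)]
        Jm = [-eps if i == k else 0.0 for i in range(3)]
        return (log_Z(Jp) - log_Z(Jm)) / (2 * eps)

    def d2_logZ(j, k):                # mixed second derivative at J = 0: C(x_j, x_k)
        def J(a, b):
            v = [0.0] * 3; v[j] += a; v[k] += b; return v
        return (log_Z(J(eps, eps)) - log_Z(J(eps, -eps))
                - log_Z(J(-eps, eps)) + log_Z(J(-eps, -eps))) / (4 * eps * eps)

    print(d_logZ(0))      # ~0 here, by the spin-flip symmetry of the toy H
    print(d2_logZ(0, 1))  # <x_0 x_1> - <x_0><x_1> > 0: neighbours are correlated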


Gaussian integrals

For the case where ''H'' can be written as a quadratic form involving a differential operator, that is, as

H = \frac{1}{2} \sum_n x_n D x_n

then the partition function can be understood to be a sum or integral over Gaussians. The correlation function C(x_j,x_k) can be understood to be the Green's function for the differential operator (generally giving rise to Fredholm theory). In the quantum field theory setting, such functions are referred to as propagators; higher order correlators are called n-point functions; working with them defines the effective action of a theory.

When the random variables are anti-commuting Grassmann numbers, then the partition function can be expressed as a determinant of the operator ''D''. This is done by writing it as a Berezin integral (also called a Grassmann integral).
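The closed forms in the Gaussian case can be checked numerically. The sketch below (Python with NumPy; the 2×2 positive-definite matrix D is an illustrative stand-in for the differential operator) computes Z = (2\pi)^{n/2} \det(\beta D)^{-1/2} and verifies by Monte Carlo that the correlation matrix is (\beta D)^{-1}, i.e. the Green's function (inverse) of the operator.

    import numpy as np

    rng = np.random.default_rng(0)
    D = np.array([[2.0, -1.0],
                  [-1.0, 2.0]])              # positive-definite stand-in for the operator
    beta = 1.5
    A = beta * D

    # closed form of the Gaussian integral Z = integral of exp(-x^T A x / 2) dx:
    Z = (2 * np.pi) ** (D.shape[0] / 2) / np.sqrt(np.linalg.det(A))

    # the Gibbs measure exp(-x^T A x / 2)/Z is the Gaussian N(0, inv(A));
    # its empirical covariance matches the inverse operator:
    samples = rng.multivariate_normal(np.zeros(2), np.linalg.inv(A), size=200_000)
    C_empirical = samples.T @ samples / len(samples)
    print(np.round(C_empirical, 3))          # ~ inv(beta * D), the Green's function
    print(np.round(np.linalg.inv(A), 3))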


General properties

Partition functions are used to discuss critical scaling and universality, and they are subject to the renormalization group.


See also

* Exponential family
* Partition function (statistical mechanics)
* Partition problem
* Markov random field

