The partition function or configuration integral, as used in probability theory, information theory and dynamical systems, is a generalization of the definition of a partition function in statistical mechanics. It is a special case of a normalizing constant in probability theory, for the Boltzmann distribution. The partition function occurs in many problems of probability theory because, in situations where there is a natural symmetry, its associated probability measure, the Gibbs measure, has the Markov property. This means that the partition function occurs not only in physical systems with translation symmetry, but also in such varied settings as neural networks (the Hopfield network) and applications such as genomics, corpus linguistics and artificial intelligence, which employ Markov networks and Markov logic networks. The Gibbs measure is also the unique measure that has the property of maximizing the entropy for a fixed expectation value of the energy; this underlies the appearance of the partition function in maximum entropy methods and the algorithms derived therefrom.
The partition function ties together many different concepts, and thus offers a general framework in which many different kinds of quantities may be calculated. In particular, it shows how to calculate expectation values and Green's functions, forming a bridge to Fredholm theory. It also provides a natural setting for the information geometry approach to information theory, where the Fisher information metric can be understood to be a correlation function derived from the partition function; it happens to define a Riemannian manifold.
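As a small numerical illustration of how expectation values and the Fisher information come out of the partition function, the sketch below uses a hypothetical four-state system (the energies in `ENERGIES` are arbitrary) and checks two standard identities for the Gibbs distribution: $\langle H\rangle = -\partial \ln Z/\partial\beta$ and $\operatorname{Var}(H) = \partial^2 \ln Z/\partial\beta^2$. The second quantity is the Fisher information with respect to $\beta$, i.e. the one-parameter case of the Fisher information metric. Derivatives are approximated by central finite differences, a choice made only for simplicity.

```python
import math

# Hypothetical toy system: energies H(x) of four discrete states (values are arbitrary).
ENERGIES = [0.0, 1.0, 1.0, 2.5]


def log_partition(beta, energies=ENERGIES):
    """ln Z(beta) = ln sum_x exp(-beta * H(x)), using a log-sum-exp shift for stability."""
    exponents = [-beta * e for e in energies]
    top = max(exponents)
    return top + math.log(sum(math.exp(a - top) for a in exponents))


def moments_direct(beta, energies=ENERGIES):
    """<H> and Var(H) computed directly from the Gibbs probabilities exp(-beta H) / Z."""
    log_z = log_partition(beta, energies)
    probs = [math.exp(-beta * e - log_z) for e in energies]
    mean = sum(p * e for p, e in zip(probs, energies))
    var = sum(p * (e - mean) ** 2 for p, e in zip(probs, energies))
    return mean, var


def moments_from_log_z(beta, h=1e-4, energies=ENERGIES):
    """The same moments from derivatives of ln Z (central finite differences):
    <H> = -d lnZ/d beta and Var(H) = d^2 lnZ/d beta^2 (the Fisher information in beta)."""
    up = log_partition(beta + h, energies)
    mid = log_partition(beta, energies)
    down = log_partition(beta - h, energies)
    mean = -(up - down) / (2 * h)
    var = (up - 2 * mid + down) / h ** 2
    return mean, var


print(moments_direct(0.7))
print(moments_from_log_z(0.7))
```

The two printed pairs should agree up to finite-difference error; in practice one would differentiate $\ln Z$ analytically or with automatic differentiation instead.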
When the setting for random variables is on complex projective space or projective Hilbert space, geometrized with the Fubini–Study metric, the theory of quantum mechanics and more generally quantum field theory results. In these theories, the partition function is heavily exploited in the path integral formulation, with great success, leading to many formulas nearly identical to those reviewed here. However, because the underlying measure space is complex-valued, as opposed to the real-valued simplex of probability theory, an extra factor of ''i'' appears in many formulas. Tracking this factor is troublesome, and is not done here. This article focuses primarily on classical probability theory, where the probabilities sum to one.
Definition
Given a set of random variables $X_i$ taking on values $x_i$, and some sort of potential function or Hamiltonian $H(x_1, x_2, \dots)$, the partition function is defined as
: $$Z(\beta) = \sum_{x_i} \exp\left(-\beta H(x_1, x_2, \dots)\right)$$
The function ''H'' is understood to be a real-valued function on the space of states $\{X_1, X_2, \dots\}$, while $\beta$ is a real-valued free parameter (conventionally, the inverse temperature). The sum over the $x_i$ is understood to be a sum over all possible values that each of the random variables $X_i$ may take. When the $X_i$ are continuous rather than discrete, the sum is replaced by an integral, and one writes
: $$Z(\beta) = \int \exp\left(-\beta H(x_1, x_2, \dots)\right) \, dx_1 \, dx_2 \cdots$$
for the case of continuously-varying $X_i$.
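A minimal sketch of both forms of the definition, assuming two hypothetical Hamiltonians chosen only because their partition functions have closed forms: an open 1-D Ising chain of $\pm 1$ spins for the discrete sum, and a single harmonic variable $H(x) = kx^2/2$ for the integral. The discrete case enumerates every configuration, so it only scales to a handful of variables; the continuous case uses a plain midpoint rule on a finite window, which is an approximation.

```python
import itertools
import math


def ising_chain_energy(spins, coupling=1.0):
    """Hypothetical example Hamiltonian: open 1-D Ising chain, H = -J * sum_i s_i * s_{i+1}."""
    return -coupling * sum(a * b for a, b in zip(spins, spins[1:]))


def partition_discrete(beta, n_vars, hamiltonian, values=(-1, 1)):
    """Z(beta) = sum over all assignments (x_1, ..., x_n) of exp(-beta * H(x_1, ..., x_n))."""
    return sum(
        math.exp(-beta * hamiltonian(state))
        for state in itertools.product(values, repeat=n_vars)
    )


def partition_continuous(beta, hamiltonian, lo=-20.0, hi=20.0, steps=20_000):
    """Z(beta) = integral of exp(-beta * H(x)) dx for one real variable x, approximated by
    the midpoint rule on [lo, hi]; assumes the integrand is negligible outside that window."""
    dx = (hi - lo) / steps
    return dx * sum(
        math.exp(-beta * hamiltonian(lo + (i + 0.5) * dx)) for i in range(steps)
    )


beta = 0.5
# Open chain of 4 spins: the closed form is 2 * (2 cosh(beta J))^3.
print(partition_discrete(beta, 4, ising_chain_energy), 2 * (2 * math.cosh(beta)) ** 3)
# Harmonic potential H(x) = k x^2 / 2 with k = 2: the closed form is sqrt(2 pi / (beta k)).
print(partition_continuous(beta, lambda x: x * x), math.sqrt(2 * math.pi / (beta * 2)))
```

The brute-force sum grows as $2^n$ in the number of spins, which is why practical work relies on transfer matrices, Monte Carlo sampling or other approximations.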
When ''H'' is an observable, such as a finite-dimensional matrix, an infinite-dimensional Hilbert space operator or an element of a C-star algebra, it is common to express the summation as a trace, so that
: $$Z(\beta) = \operatorname{tr}\left(\exp(-\beta H)\right)$$
When ''H'' acts on an infinite-dimensional space, then, for the above notation to be valid, the operator $\exp(-\beta H)$ must be trace class, that is, its trace must exist and be finite.
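For a finite-dimensional matrix the trace form can be evaluated directly. The sketch below assumes a small real symmetric matrix `h` and computes $\operatorname{tr}\exp(-\beta H)$ with a hand-written matrix exponential (scaling and squaring of a truncated Taylor series) so that it needs only the standard library. For the Pauli matrix $\sigma_x$, whose eigenvalues are $\pm 1$, the result should equal $e^{\beta} + e^{-\beta} = 2\cosh\beta$, i.e. the same sum over eigenvalues as the ordinary definition.

```python
import math


def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mat_exp(a, terms=20):
    """exp(A) by scaling and squaring a truncated Taylor series (fine for small matrices)."""
    n = len(a)
    norm = max(sum(abs(x) for x in row) for row in a)  # infinity norm of A
    s = max(0, math.ceil(math.log2(norm)) + 1) if norm > 0 else 0  # makes ||A / 2^s|| <= 1/2
    scaled = [[x / 2 ** s for x in row] for row in a]
    result = [[float(i == j) for j in range(n)] for i in range(n)]  # identity
    term = [row[:] for row in result]
    for k in range(1, terms):
        # term now holds (A / 2^s)^k / k!
        term = [[x / k for x in row] for row in mat_mul(term, scaled)]
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(s):
        result = mat_mul(result, result)  # undo the scaling: exp(A) = exp(A / 2^s)^(2^s)
    return result


def partition_trace(beta, h):
    """Z(beta) = tr exp(-beta H) for a finite-dimensional real symmetric matrix H."""
    e = mat_exp([[-beta * x for x in row] for row in h])
    return sum(e[i][i] for i in range(len(h)))


pauli_x = [[0.0, 1.0], [1.0, 0.0]]  # eigenvalues +1 and -1
print(partition_trace(1.3, pauli_x), 2 * math.cosh(1.3))
```

In numerical practice one would rather use a library routine (for instance NumPy's `numpy.linalg.eigvalsh` and sum $e^{-\beta\lambda}$ over the eigenvalues, or SciPy's `scipy.linalg.expm`); the pure-Python version is only meant to make the formula concrete.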
The number of variables $X_i$ need not be countable, in which case the sums are to be replaced by functional integrals. Although there are many notations for functional integrals, a common one would be
: $$Z = \int \mathcal{D}\varphi \, \exp\left(-\beta H[\varphi]\right)$$
Such is the case for the partition function in quantum field theory.
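Functional integrals are rarely evaluated directly; a common practical route is to place the field on a finite lattice, which turns $\int \mathcal{D}\varphi$ back into an ordinary integral over finitely many variables. The sketch below assumes a hypothetical free (Gaussian) lattice field with zero boundary values, $H[\varphi] = \tfrac12\sum_i(\varphi_{i+1}-\varphi_i)^2 + \tfrac12 m^2\sum_i\varphi_i^2 = \tfrac12\varphi^{\mathsf T}A\varphi$, with $A$ tridiagonal ($2+m^2$ on the diagonal, $-1$ off it). For this Gaussian case the discretized integral has the closed form $Z = (2\pi/\beta)^{n/2}\det(A)^{-1/2}$, which the code compares against a direct two-site numerical integration.

```python
import math


def tridiagonal_det(n, diag, off):
    """det of the n x n (n >= 1) symmetric tridiagonal matrix with constant diagonal `diag`
    and constant off-diagonal `off`, via the recurrence d_k = diag*d_{k-1} - off^2*d_{k-2}."""
    prev, cur = 1.0, diag  # d_0 = 1, d_1 = diag
    for _ in range(n - 1):
        prev, cur = cur, diag * cur - off * off * prev
    return cur


def lattice_gaussian_partition(beta, n_sites, mass):
    """Closed-form Z for the discretized free field H = 1/2 phi^T A phi with
    A = tridiag(-1, 2 + m^2, -1): Z = (2 pi / beta)^(n/2) / sqrt(det A)."""
    det_a = tridiagonal_det(n_sites, 2.0 + mass ** 2, -1.0)
    return (2 * math.pi / beta) ** (n_sites / 2) / math.sqrt(det_a)


def brute_force_two_sites(beta, mass, half_width=8.0, steps=300):
    """Direct midpoint-rule evaluation of the same integral for n_sites = 2, for comparison."""
    dx = 2 * half_width / steps
    a = 2.0 + mass ** 2
    total = 0.0
    for i in range(steps):
        x = -half_width + (i + 0.5) * dx
        for j in range(steps):
            y = -half_width + (j + 0.5) * dx
            energy = 0.5 * (a * x * x + a * y * y - 2 * x * y)  # 1/2 phi^T A phi
            total += math.exp(-beta * energy)
    return total * dx * dx


print(lattice_gaussian_partition(1.0, 2, 1.0), brute_force_two_sites(1.0, 1.0))
```

Only Gaussian (free) fields admit such a simple closed form; interacting lattice theories are typically handled with Monte Carlo sampling or perturbation theory.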
A common, useful modification to the partition function is to introduce auxiliary functions. This allows, for example, the partition function to be used as a generating function for correlation functions. This is discussed in greater detail below.
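One common choice of auxiliary function is a linear source term, $Z(\beta, J) = \sum_{x_i} \exp\left(-\beta H + \sum_n J_n x_n\right)$; second derivatives of $\ln Z$ with respect to the sources at $J = 0$ then give connected correlations $\langle x_i x_k\rangle - \langle x_i\rangle\langle x_k\rangle$. The sketch below assumes that form and a hypothetical open Ising chain, for which the exact answer in zero field is $\tanh(\beta J)^{|i-k|}$. Derivatives are taken by finite differences purely for illustration.

```python
import itertools
import math


def log_z_with_sources(beta, sources, coupling=1.0):
    """ln Z(beta, J) for a hypothetical open Ising chain of len(sources) spins, with an
    auxiliary source term added: Z = sum_x exp(-beta * H(x) + sum_n J_n * x_n)."""
    total = 0.0
    for spins in itertools.product((-1, 1), repeat=len(sources)):
        energy = -coupling * sum(a * b for a, b in zip(spins, spins[1:]))
        source_term = sum(j * s for j, s in zip(sources, spins))
        total += math.exp(-beta * energy + source_term)
    return math.log(total)


def connected_correlation(beta, n_spins, i, k, h=1e-3):
    """<x_i x_k> - <x_i><x_k> = d^2 ln Z / (dJ_i dJ_k) at J = 0, by central differences."""
    def shifted_log_z(di, dk):
        sources = [0.0] * n_spins
        sources[i] += di
        sources[k] += dk
        return log_z_with_sources(beta, sources)

    return (shifted_log_z(h, h) - shifted_log_z(h, -h)
            - shifted_log_z(-h, h) + shifted_log_z(-h, -h)) / (4 * h * h)


# For an open chain in zero field, <s_i s_k> = tanh(beta J)^|i-k| and <s_i> = 0.
print(connected_correlation(0.8, 5, 0, 2), math.tanh(0.8) ** 2)
```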
The parameter β
The role or meaning of the parameter $\beta$ can be understood in a variety of different ways. In classical thermodynamics, it is an inverse temperature. More generally, one would say that it is the variable that is conjugate to some (arbitrary) function $H$ of the random variables $X$. The word ''conjugate'' here is used in the sense of conjugate generalized coordinates in Lagrangian mechanics; thus, properly, $\beta$ is a Lagrange multiplier. It is not uncommonly called the generalized force. All of these concepts have in common the idea that one value is meant to be kept fixed, as others, interconnected in some complicated way, are allowed to vary. In the current case, the value to be kept fixed is the expectation value of $H$, even as many different probability distributions can give rise to exactly this same (fixed) value.
For the general case, one considers a set of functions $\{f_k(x_1, x_2, \dots)\}$ that each depend on the random variables $X_i$. These functions are chosen because one wants to hold their expectation values constant, for one reason or another. To constrain the expectation values in this way, one applies the method of Lagrange multipliers. In the general case, maximum entropy methods illustrate the manner in which this is done.
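To make the Lagrange-multiplier reading of $\beta$ concrete, the sketch below fixes the expectation value of a single function and solves for the multiplier. It assumes a hypothetical finite state space (the faces of a die, in the spirit of Jaynes' dice example) with $f(x) = x$ constrained to $\langle f\rangle = 4.5$. Because $d\langle f\rangle/d\beta = -\operatorname{Var}(f) \le 0$, a simple bisection finds $\beta$; the bracket $[-50, 50]$ is an assumption.

```python
import math


def gibbs_probs(beta, values):
    """Gibbs distribution p(x) = exp(-beta f(x)) / Z over a finite list of values f(x)."""
    exponents = [-beta * v for v in values]
    top = max(exponents)  # log-sum-exp shift avoids overflow for large |beta|
    weights = [math.exp(a - top) for a in exponents]
    z = sum(weights)
    return [w / z for w in weights]


def expectation(probs, values):
    return sum(p * v for p, v in zip(probs, values))


def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)


def solve_multiplier(values, target, lo=-50.0, hi=50.0, iters=200):
    """Find the Lagrange multiplier beta for which the Gibbs distribution has <f> = target.
    <f>(beta) is non-increasing (its derivative is -Var f), so bisection works;
    the bracket [lo, hi] is an assumption and must contain the answer."""
    if not min(values) < target < max(values):
        raise ValueError("target must lie strictly between min(values) and max(values)")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if expectation(gibbs_probs(mid, values), values) > target:
            lo = mid  # mean still too large: increase beta
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Jaynes-style dice example: faces 1..6, constrain the mean to 4.5.
faces = [1, 2, 3, 4, 5, 6]
beta = solve_multiplier(faces, 4.5)
probs = gibbs_probs(beta, faces)
print(beta, expectation(probs, faces), entropy(probs))
```

Any other distribution on the same faces with mean 4.5 should have strictly lower entropy, which is the maximum-entropy property of the Gibbs measure mentioned earlier. A negative $\beta$ appears here simply because the target mean lies above the unconstrained mean of 3.5.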
Some specific examples are in order. In basic thermodynamics problems, when using the canonical ensemble, the use of just one parameter $\beta$ reflects the fact that there is only one expectation value that must be held constant: the average energy (due to conservation of energy). For chemistry problems involving chemical reactions, the grand canonical ensemble provides the appropriate foundation, and there are two Lagrange multipliers. One is to hold the average energy constant, and another, the fugacity, is to hold the particle count constant (as chemical reactions involve the recombination of a fixed number of atoms).
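A minimal grand canonical sketch with two multipliers, assuming a hypothetical lattice gas in which each of several independent sites is either empty ($E = 0$, $N = 0$) or holds one particle ($E = \varepsilon$, $N = 1$). Each microstate is weighted by $e^{-\beta(E - \mu N)} = z^N e^{-\beta E}$, where $z = e^{\beta\mu}$ is the fugacity. Brute-force enumeration is compared with the closed form $\mathcal{Z} = (1 + z e^{-\beta\varepsilon})^{M}$ for $M$ independent sites.

```python
import itertools
import math


def grand_canonical(beta, mu, n_sites, eps):
    """Brute-force grand partition function for a hypothetical lattice gas in which each of
    n_sites independent sites is empty (E = 0, N = 0) or holds one particle (E = eps, N = 1).
    Each microstate is weighted by exp(-beta * (E - mu * N)) = z**N * exp(-beta * E),
    where z = exp(beta * mu) is the fugacity.  Returns (Z, <N>)."""
    z_grand, n_sum = 0.0, 0.0
    for occupation in itertools.product((0, 1), repeat=n_sites):
        n = sum(occupation)
        energy = eps * n
        weight = math.exp(-beta * (energy - mu * n))
        z_grand += weight
        n_sum += n * weight
    return z_grand, n_sum / z_grand


beta, mu, n_sites, eps = 1.2, -0.3, 6, 0.5
z_grand, mean_n = grand_canonical(beta, mu, n_sites, eps)
x = math.exp(beta * mu) * math.exp(-beta * eps)  # fugacity times the Boltzmann factor
print(z_grand, (1 + x) ** n_sites)                # closed form for independent sites
print(mean_n, n_sites * x / (1 + x))
```

Here $\beta$ controls the mean energy and $\beta\mu$ (equivalently the fugacity) controls the mean particle number; changing $\mu$ at fixed $\beta$ shifts $\langle N\rangle$ without changing the single-site energies.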
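For the general case, one has
: $$Z(\beta) = \int \exp\left(-\sum_k \beta_k f_k(x_1, x_2, \dots)\right) \, dx_1 \, dx_2 \cdots$$
with $\beta = (\beta_1, \beta_2, \dots)$ a point in a space.
For a collection of observables $H_k$, one would write
: $$Z(\beta) = \operatorname{tr}\left[\exp\left(-\sum_k \beta_k H_k\right)\right]$$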
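For a finite state space, the multi-parameter form can be sketched directly. The example below assumes hypothetical states $x = 0, \dots, 5$ with two observables $f_1(x) = x$ and $f_2(x) = x^2$, and computes $\ln Z(\beta_1, \beta_2)$, the means $\langle f_k\rangle = -\partial \ln Z/\partial\beta_k$, and the covariance matrix $\partial^2 \ln Z/\partial\beta_j\partial\beta_k = \operatorname{Cov}(f_j, f_k)$. That covariance matrix is the Fisher information metric on the space of points $\beta = (\beta_1, \beta_2)$, which links this general form back to the information-geometry remarks at the start.

```python
import math


def multi_parameter_gibbs(betas, features):
    """Gibbs distribution p(x) proportional to exp(-sum_k beta_k f_k(x)) over finitely many
    states; features[x] is the tuple (f_1(x), f_2(x), ...).  Returns (ln Z, probs)."""
    exponents = [-sum(b * f for b, f in zip(betas, fx)) for fx in features]
    top = max(exponents)
    log_z = top + math.log(sum(math.exp(a - top) for a in exponents))
    probs = [math.exp(a - log_z) for a in exponents]
    return log_z, probs


def means_and_covariance(betas, features):
    """<f_k> = -d lnZ/d beta_k, and Cov(f_j, f_k) = d^2 lnZ/(d beta_j d beta_k),
    i.e. the Fisher information metric at the point beta = (beta_1, beta_2, ...)."""
    _, probs = multi_parameter_gibbs(betas, features)
    m = len(betas)
    means = [sum(p * fx[k] for p, fx in zip(probs, features)) for k in range(m)]
    cov = [[sum(p * (fx[j] - means[j]) * (fx[k] - means[k]) for p, fx in zip(probs, features))
            for k in range(m)] for j in range(m)]
    return means, cov


# Hypothetical example: states x = 0..5 with observables f_1(x) = x and f_2(x) = x^2.
features = [(x, x * x) for x in range(6)]
means, cov = means_and_covariance([0.3, 0.05], features)
print(means, cov)
```

The moments are computed directly from the probabilities rather than by differentiating $\ln Z$; the two routes agree, which is a convenient consistency check when implementing such models.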