
In protein structure prediction, statistical potentials or knowledge-based potentials are scoring functions derived from an analysis of known protein structures in the Protein Data Bank (PDB).
The original method to obtain such potentials is the ''quasi-chemical approximation'', due to Miyazawa and Jernigan. It was later followed by the ''potential of mean force'' (statistical PMF), developed by Sippl. Although the obtained scores are often considered approximations of the free energy, and are thus referred to as ''pseudo-energies'', this physical interpretation is incorrect. Nonetheless, they are applied with success in many cases, because they frequently correlate with actual Gibbs free energy differences.
Overview
Possible features to which a pseudo-energy can be assigned include:
* interatomic distances,
* torsion angles,
* solvent exposure,
* or hydrogen bond geometry.
The classic application is, however, based on pairwise amino acid contacts or distances, thus producing statistical interatomic potentials. For pairwise amino acid contacts, a statistical potential is formulated as an interaction matrix that assigns a weight or energy value to each possible pair of standard amino acids. The energy of a particular structural model is then the combined energy of all pairwise contacts (defined as two amino acids within a certain distance of each other) in the structure. The energies are determined using statistics on amino acid contacts in a database of known protein structures (obtained from the PDB).
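The contact-based scoring described above can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration: the function name `contact_energy`, the use of one representative point per residue, the 6.5 Å cutoff, and the toy matrix values, which are not taken from any published potential.

```python
import itertools
import math


def contact_energy(residues, coords, matrix, cutoff=6.5):
    """Sum the matrix energies of all residue pairs that are in contact.

    residues: sequence of one-letter amino acid codes
    coords:   one representative (x, y, z) point per residue (assumption)
    matrix:   dict mapping an alphabetically sorted residue pair to an energy
    cutoff:   contact distance threshold (hypothetical value)
    """
    total = 0.0
    for i, j in itertools.combinations(range(len(residues)), 2):
        # Two residues count as a contact if their points lie within the cutoff.
        if math.dist(coords[i], coords[j]) <= cutoff:
            pair = tuple(sorted((residues[i], residues[j])))
            total += matrix[pair]
    return total


# Toy interaction matrix with invented values, for illustration only.
toy_matrix = {("S", "V"): -0.5, ("V", "V"): -1.0, ("S", "S"): 0.2}
toy_residues = ["V", "S", "V"]
toy_coords = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
toy_score = contact_energy(toy_residues, toy_coords, toy_matrix)
```

In this toy model only the first two residues are within the cutoff, so the score consists of the single valine-serine entry. Practical implementations often also skip residues that are adjacent in the sequence; that refinement is omitted here.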
History
Initial development
Many textbooks present the statistical PMFs as proposed by Sippl as a simple consequence of the Boltzmann distribution, as applied to pairwise distances between amino acids. This is incorrect, but it is a useful starting point for introducing how the potential is constructed in practice.
The Boltzmann distribution applied to a specific pair of amino acids is given by:
:<math>P(r) = \frac{1}{Z} e^{-\frac{F(r)}{kT}}</math>
where <math>r</math> is the distance, <math>k</math> is the Boltzmann constant, <math>T</math> is the temperature and <math>Z</math> is the partition function, with
:<math>Z = \int e^{-\frac{F(r)}{kT}} \, dr</math>
The quantity <math>F(r)</math> is the free energy assigned to the pairwise system.
Simple rearrangement results in the ''inverse Boltzmann formula'', which expresses the free energy <math>F(r)</math> as a function of <math>P(r)</math>:
:<math>F(r) = -kT \ln P(r) - kT \ln Z</math>
To construct a PMF, one then introduces a so-called ''reference state'' with a corresponding distribution <math>Q_R</math> and partition function <math>Z_R</math>, and calculates the following free energy difference:
:<math>\Delta F(r) = -kT \ln \frac{P(r)}{Q_R(r)} - kT \ln \frac{Z}{Z_R}</math>
The reference state typically results from a hypothetical system in which the specific interactions between the amino acids are absent. The second term involving <math>Z</math> and <math>Z_R</math> can be ignored, as it is a constant.
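As a rough illustration of this construction, the following Python sketch turns binned distance counts for one amino acid pair into a free energy difference per bin: minus kT times the logarithm of the ratio of the observed to the reference distribution, with the constant partition-function term dropped. The function name `pmf_from_counts`, the optional pseudocount (a common guard against empty bins, not part of the derivation above), and the example counts are assumptions made for illustration.

```python
import math


def pmf_from_counts(observed_counts, reference_counts, kT=1.0, pseudocount=0.0):
    """Return one free energy difference per distance bin.

    observed_counts:  distance histogram for one amino acid pair, as counted
                      in a set of known structures
    reference_counts: histogram of the same bins in the reference state
    kT:               energy unit; 1.0 gives results in units of kT
    pseudocount:      value added to every bin to avoid log(0) (assumption)
    """
    if len(observed_counts) != len(reference_counts):
        raise ValueError("histograms must use the same bins")
    n_bins = len(observed_counts)
    obs_total = sum(observed_counts) + pseudocount * n_bins
    ref_total = sum(reference_counts) + pseudocount * n_bins
    energies = []
    for obs, ref in zip(observed_counts, reference_counts):
        p = (obs + pseudocount) / obs_total  # observed distribution
        q = (ref + pseudocount) / ref_total  # reference distribution
        # Inverse Boltzmann formula without the constant partition term.
        energies.append(-kT * math.log(p / q))
    return energies


# Invented counts for two distance bins, for illustration only.
example_energies = pmf_from_counts([30, 10], [20, 20])
```

A bin that is observed more often than in the reference state receives a negative (favourable) value, and a bin observed less often receives a positive one. With the default pseudocount of zero the function raises an error on empty bins, so sparse histograms need a positive pseudocount or coarser binning.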
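In practice, <math>P(r)</math> is estimated from the database of known protein structures, while <math>Q_R(r)</math> typically results from calculations or simulations. For example, <math>P(r)</math> could be the conditional probability of finding the <math>C\beta</math> atoms of a valine and a serine at a given distance <math>r</math> from each other, giving rise to the free energy difference <math>\Delta F</math>. The total free energy difference of a protein, <math>\Delta F_T</math>, is then claimed to be the sum of all the pairwise free energies:
:<math>\Delta F_T = \sum_{i<j} \Delta F(r_{ij} \mid a_i, a_j) = -kT \sum_{i<j} \ln \frac{P(r_{ij} \mid a_i, a_j)}{Q_R(r_{ij} \mid a_i, a_j)}</math>
where the sum runs over all amino acid pairs <math>a_i, a_j</math>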
(with