In computational biology, protein p''K''_a calculations are used to estimate the p''K''_a values of amino acids as they exist within proteins. These calculations complement the p''K''_a values reported for amino acids in their free state, and are used frequently within the fields of molecular modeling and structural bioinformatics.

Amino acid p''K''_a values

p''K''_a values of amino acid side chains play an important role in defining the pH-dependent characteristics of a protein. The pH-dependence of the activity displayed by enzymes and the pH-dependence of protein stability, for example, are properties that are determined by the p''K''_a values of amino acid side chains. The p''K''_a value of an amino acid side chain in solution is typically inferred from the p''K''_a values of model compounds (compounds that are similar to the side chains of amino acids). See the article on amino acids for the p''K''_a values of all amino acid side chains inferred in such a way. There are also numerous experimental studies that have yielded such values, for example by use of NMR spectroscopy. The table below lists the model p''K''_a values that are often used in a protein p''K''_a calculation, and contains a third column based on protein studies. [Hass and Mulder (2015) ''Annu. Rev. Biophys.'' vol 44 pp. 53–7, doi 10.1146/annurev-biophys-083012-130351]

The effect of the protein environment

When a protein folds, the titratable amino acids in the protein are transferred from a solution-like environment to an environment determined by the three-dimensional structure of the protein. For example, in an unfolded protein an aspartic acid is typically in an environment that exposes the titratable side chain to water. When the protein folds, the aspartic acid could find itself buried deep in the protein interior with no exposure to solvent. Furthermore, in the folded protein the aspartic acid will be closer to other titratable groups in the protein and will also interact with permanent charges (e.g. ions) and dipoles in the protein. All of these effects alter the p''K''_a value of the amino acid side chain, and p''K''_a calculation methods generally calculate the effect of the protein environment on the model p''K''_a value of an amino acid side chain. [Bashford (2004) ''Front Biosci.'' vol. 9 pp. 1082–9, doi 10.2741/1187; Gunner et al. (2006) ''Biochim. Biophys. Acta'' vol. 1757 (8) pp. 942–6, doi 10.1016/j.bbabio.2006.06.005; Ullmann et al. (2008) ''Photosynth. Res.'' 97 vol. 112 pp. 33–5, doi 10.1007/s11120-008-9306-1; Antosiewicz et al. (2011) ''Mol. BioSyst.'' vol. 7 pp. 2923–294, doi 10.1039/C1MB05170A]

Typically the effects of the protein environment on the amino acid p''K''_a value are divided into pH-independent effects and pH-dependent effects. The pH-independent effects (desolvation, interactions with permanent charges and dipoles) are added to the model p''K''_a value to give the intrinsic p''K''_a value. The pH-dependent effects cannot be added in the same straightforward way and have to be accounted for using Boltzmann summation, Tanford–Roxby iterations or other methods.

The interplay of the intrinsic p''K''_a values of a system with the electrostatic interaction energies between titratable groups can produce striking effects such as non-Henderson–Hasselbalch titration curves and even back-titration effects. [A. Onufriev, D.A. Case and G. M. Ullmann (2001). ''Biochemistry'' 40: 3413–341, doi 10.1021/bi002740q] The image below shows a theoretical system consisting of three acidic residues. One group (shown in blue) displays a back-titration event.
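
The Boltzmann summation mentioned above can be illustrated with a minimal sketch. It assumes a deliberately simplified model that is not taken from any of the cited programs: all sites are acidic, each site has an intrinsic p''K''_a, and two deprotonated (charged) sites interact through a pairwise energy expressed in p''K'' units (energy divided by ''RT'' ln 10). The function name, the energy model and the parameter values in the demo are hypothetical and chosen only for illustration.

```python
from itertools import product


def protonation_probabilities(pk_intr, w, ph):
    """Exact Boltzmann-weighted protonation probability of each site at one pH.

    pk_intr : intrinsic pKa values, one per acidic site (assumed model input)
    w       : symmetric matrix of interaction energies between deprotonated
              sites, in pK units (energy / (RT ln 10)); only i < j is read
    ph      : pH at which the probabilities are evaluated
    """
    n = len(pk_intr)
    partition_sum = 0.0
    occupancy = [0.0] * n
    # Enumerate all 2**n protonation states (1 = protonated, 0 = deprotonated).
    for state in product((0, 1), repeat=n):
        g = 0.0  # state free energy in pK units, relative to fully protonated
        for i in range(n):
            if state[i] == 0:
                g += pk_intr[i] - ph          # cost of deprotonating site i
                for j in range(i + 1, n):
                    if state[j] == 0:
                        g += w[i][j]          # charge-charge interaction
        weight = 10.0 ** (-g)                 # Boltzmann factor exp(-G/RT)
        partition_sum += weight
        for i in range(n):
            occupancy[i] += state[i] * weight
    return [o / partition_sum for o in occupancy]


if __name__ == "__main__":
    # Hypothetical three-site system; values are illustrative only.
    pk = [4.0, 4.5, 6.0]
    w = [[0.0, 1.5, 2.0],
         [1.5, 0.0, 1.0],
         [2.0, 1.0, 0.0]]
    for ph in range(0, 15, 2):
        probs = protonation_probabilities(pk, w, float(ph))
        print(ph, [round(p, 3) for p in probs])
```

Because every state is enumerated, the cost grows as 2^N with the number of sites, which is why approximate or Monte Carlo treatments are used for larger systems. With all interaction energies set to zero, each site reduces to an ordinary Henderson–Hasselbalch curve.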

p''K''_a calculation methods

Several software packages and web servers are available for the calculation of protein p''K''_a values. See the links below.

Using the Poisson–Boltzmann equation

Some methods are based on solutions to the Poisson–Boltzmann equation (PBE), and are often referred to as FDPB-based methods (''FDPB'' stands for "finite difference Poisson–Boltzmann"). The PBE is a modification of Poisson's equation that incorporates a description of the effect of solvent ions on the electrostatic field around a molecule. The H++ web server, the pKD web server, MCCE, Karlsberg+, PETIT and GMCT use the FDPB method to compute p''K''_a values of amino acid side chains.

FDPB-based methods calculate the change in the p''K''_a value of an amino acid side chain when that side chain is moved from a hypothetical fully solvated state to its position in the protein. To perform such a calculation, one needs theoretical methods that can calculate the effect of the protein interior on a p''K''_a value, and knowledge of the p''K''_a values of amino acid side chains in their fully solvated states.
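
The transfer from the fully solvated state into the protein can be written as a thermodynamic cycle, sketched below. This is an illustration under stated assumptions, not the procedure of any particular program: the two transfer free energies are taken as given inputs (in a real FDPB calculation they would come from solving the PBE), the function name is hypothetical, and the temperature defaults to 298.15 K. The shift follows from p''K''_a = Δ''G''(deprotonation) / (''RT'' ln 10).

```python
import math

R_KJ_PER_MOL_K = 8.314462618e-3  # gas constant in kJ/(mol K)


def pka_in_protein(pka_model, dg_transfer_deprot, dg_transfer_prot,
                   temperature=298.15):
    """Shift a model pKa by the solvent-to-protein transfer free energies.

    pka_model          : pKa of the fully solvated model compound
    dg_transfer_deprot : free energy (kJ/mol) of moving the deprotonated
                         form from solvent into the protein site (assumed input)
    dg_transfer_prot   : same for the protonated form (assumed input)
    """
    # Extra cost of deprotonating inside the protein compared with solvent.
    ddg = dg_transfer_deprot - dg_transfer_prot
    return pka_model + ddg / (R_KJ_PER_MOL_K * temperature * math.log(10))


if __name__ == "__main__":
    # Hypothetical numbers: a buried acidic group whose charged form is
    # destabilised by 10 kJ/mol more than its neutral form.
    print(pka_in_protein(4.0, dg_transfer_deprot=12.0, dg_transfer_prot=2.0))
```

At 298.15 K one p''K'' unit corresponds to about 5.7 kJ/mol, so a group whose charged form is penalised more strongly than its neutral form on burial ends up with a raised p''K''_a if it is an acid.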

Empirical methods

A set of empirical rules relating the protein structure to the p''K''_a values of ionizable residues has been developed by Li, Robertson, and Jensen. These rules form the basis for the web-accessible program PROPKA for rapid predictions of p''K''_a values. Another empirical p''K''_a prediction program was released by Tan KP ''et al.'' and is available through the online DEPTH web server.

Molecular dynamics (MD)-based methods

Molecular dynamics methods of calculating p''K''_a values make it possible to include full flexibility of the titrated molecule. [Donnini et al. (2011) ''J. Chem. Theory Comp.'' vol 7 pp. 1962–7, doi 10.1021/ct200061r; Wallace et al. (2011) ''J. Chem. Theory Comp.'' vol 7 pp. 2617–262, doi 10.1021/ct200146j; Goh et al. (2012) ''J. Chem. Theory Comp.'' vol 8 pp. 36–4, doi 10.1021/ct2006314]

Molecular dynamics based methods are typically much more computationally expensive, and not necessarily more accurate, than approaches based on the Poisson–Boltzmann equation. Limited conformational flexibility can also be realized within a continuum electrostatics approach, e.g., by considering multiple amino acid side chain rotamers. In addition, commonly used molecular force fields do not take electronic polarizability into account, which could be an important property in determining protonation energies.

Determining p''K''_a values from titration curves or free energy calculations

From the titration of a protonatable group, one can read the so-called half-point p''K''_a (p''K''_1/2), which is equal to the pH value at which the group is half-protonated. The p''K''_1/2 is equal to the Henderson–Hasselbalch p''K''_a (p''K''_a^HH) if the titration curve follows the Henderson–Hasselbalch equation. [Ullmann (2003) ''J. Phys. Chem. B'' vol 107 pp. 1263–7, doi 10.1021/jp026454v] Most p''K''_a calculation methods silently assume that all titration curves are Henderson–Hasselbalch shaped, and p''K''_a values in p''K''_a calculation programs are therefore often determined in this way. In the general case of multiple interacting protonatable sites, the p''K''_1/2 value is not thermodynamically meaningful. In contrast, the Henderson–Hasselbalch p''K''_a value can be computed from the protonation free energy via

<math>\mathrm{p}K_\mathrm{a}^\mathrm{HH}(\mathrm{pH}) = \mathrm{pH} - \frac{\Delta G^\mathrm{prot}(\mathrm{pH})}{RT \ln 10}</math>

and is thus in turn related to the protonation free energy of the site via

<math>\Delta G^\mathrm{prot}(\mathrm{pH}) = RT \ln 10 \; \left( \mathrm{pH} - \mathrm{p}K_\mathrm{a}^\mathrm{HH} \right).</math>

The protonation free energy can in principle be computed from the protonation probability of the group, ⟨''x''⟩(pH), which can be read from its titration curve:

<math>\Delta G^\mathrm{prot}(\mathrm{pH}) = -RT \ln\left[ \frac{\langle x \rangle}{1 - \langle x \rangle} \right]</math>
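
The relations above, Δ''G'' = −''RT'' ln[⟨''x''⟩ / (1 − ⟨''x''⟩)] and p''K''_a^HH = pH − Δ''G'' / (''RT'' ln 10), can be turned into a short numerical sketch. The function names and the default temperature of 298.15 K are assumptions made for this illustration; energies are in kJ/mol.

```python
import math

R_KJ_PER_MOL_K = 8.314462618e-3  # gas constant in kJ/(mol K)


def protonation_free_energy(x, temperature=298.15):
    """Protonation free energy (kJ/mol) from the protonation probability x."""
    if not 0.0 < x < 1.0:
        # The logarithm diverges at 0 and 1, where the estimate is unusable.
        raise ValueError("protonation probability must lie strictly in (0, 1)")
    return -R_KJ_PER_MOL_K * temperature * math.log(x / (1.0 - x))


def henderson_hasselbalch_pka(ph, x, temperature=298.15):
    """Henderson-Hasselbalch pKa at a given pH from the protonation probability."""
    rt_ln10 = R_KJ_PER_MOL_K * temperature * math.log(10)
    return ph - protonation_free_energy(x, temperature) / rt_ln10


if __name__ == "__main__":
    # A group that is 1/11 protonated at pH 5 (illustrative value).
    print(henderson_hasselbalch_pka(5.0, 1.0 / 11.0))
```

For a group that follows the Henderson–Hasselbalch equation, the value returned is the same at every pH. For interacting groups it generally varies with pH.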

Titration curves can be computed within a continuum electrostatics approach with formally exact but more elaborate analytical or Monte Carlo (MC) methods, or with inexact but fast approximate methods. MC methods that have been used to compute titration curves [Ullmann et al. (2012) ''J. Comput. Chem.'' vol 33 pp. 887–90, doi 10.1002/jcc.22919] are Metropolis MC [Metropolis et al. (1953) ''J. Chem. Phys.'' vol 21 pp. 1087–109, doi 10.1063/1.1699114; Beroza et al. (1991) ''Proc. Natl. Acad. Sci. USA'' vol 88 pp. 5804–580, doi 10.1073/pnas.88.13.5804] and Wang–Landau MC [Wang and Landau (2001) ''Phys. Rev. E'' vol 64 pp 05610, doi 10.1103/PhysRevE.64.056101]. Approximate methods that use a mean-field approach for computing titration curves are the Tanford–Roxby method and hybrids of this method that combine an exact statistical mechanics treatment within clusters of strongly interacting sites with a mean-field treatment of intercluster interactions. [Tanford and Roxby (1972) ''Biochemistry'' vol 11 pp. 2192–219, doi 10.1021/bi00761a029; Bashford and Karplus (1991) ''J. Phys. Chem.'' vol 95 pp. 9556–6, doi 10.1021/j100176a093; Gilson (1993) ''Proteins'' vol 15 pp. 266–8, doi 10.1002/prot.340150305; Antosiewicz et al. (1994) ''J. Mol. Biol.'' vol 238 pp. 415–3, doi 10.1006/jmbi.1994.1301; Spassov and Bashford (1999) ''J. Comput. Chem.'' vol 20 pp. 1091–111, doi 10.1002/(SICI)1096-987X(199908)20:11<1091::AID-JCC1>3.0.CO;2-3]

In practice, it can be difficult to obtain statistically converged and accurate protonation free energies from titration curves if the protonation probability ⟨''x''⟩ is close to 1 or 0. In this case, one can use various free energy calculation methods to obtain the protonation free energy, such as biased Metropolis MC [Beroza et al. (1995) ''Biophys. J.'' vol 68 pp. 2233–225, doi 10.1016/S0006-3495(95)80406-6], free-energy perturbation [Zwanzig (1954) ''J. Chem. Phys.'' vol 22 pp. 1420–142, doi 10.1063/1.1740409; Ullmann et al. (2011) ''J. Phys. Chem. B'' vol 68 pp. 507–52, doi 10.1021/jp1093838], thermodynamic integration [Kirkwood (1935) ''J. Chem. Phys.'' vol 3 pp. 300–31, doi 10.1063/1.1749657; Bruckner and Boresch (2011) ''J. Comput. Chem.'' vol 32 pp. 1303–131, doi 10.1002/jcc.21713; Bruckner and Boresch (2011) ''J. Comput. Chem.'' vol 32 pp. 1320–133, doi 10.1002/jcc.21712], the non-equilibrium work method [Jarzynski (1997) ''Phys. Rev. E'' vol 56 p. 5018, doi 10.1103/PhysRevE.56.5018] or the Bennett acceptance ratio method [Bennett (1976) ''J. Comput. Phys.'' vol 22 pp. 245–26, doi 10.1016/0021-9991(76)90078-4].

Note that the Henderson–Hasselbalch p''K''_a value does in general depend on the pH value. [Bombarda et al. (2010) ''J. Phys. Chem. B'' vol 114 pp. 1994–200, doi 10.1021/jp908926w] This dependence is small for weakly interacting groups like well solvated amino acid side chains on the protein surface, but can be large for strongly interacting groups like those buried in enzyme active sites or integral membrane proteins. [Bashford and Gerwert (1992) ''J. Mol. Biol.'' vol 224 pp. 473–8, doi 10.1016/0022-2836(92)91009-E; Spassov et al. (2001) ''J. Mol. Biol.'' vol 312 pp. 203–1, doi 10.1006/jmbi.2001.4902; Ullmann et al. (2011) ''J. Phys. Chem. B'' vol 115 pp. 10346–5, doi 10.1021/jp204644h]
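
As a rough illustration of how Metropolis MC can sample a titration curve, the sketch below estimates protonation probabilities at a single pH. It is not the algorithm of any of the cited packages. It assumes a simplified model in which all sites are acidic, each has an intrinsic p''K''_a, and pairs of deprotonated sites interact through energies given in p''K'' units (energy divided by ''RT'' ln 10). The function name, the single-site flip move, the sweep counts and the seed are all hypothetical choices.

```python
import random


def metropolis_titration(pk_intr, w, ph, sweeps=20000, equilibration=2000,
                         seed=0):
    """Estimate protonation probabilities at one pH by Metropolis sampling.

    pk_intr : intrinsic pKa values of the acidic sites (assumed model input)
    w       : symmetric interaction matrix between deprotonated sites,
              in pK units (energy / (RT ln 10))
    """
    rng = random.Random(seed)
    n = len(pk_intr)
    state = [1] * n          # 1 = protonated, 0 = deprotonated
    counts = [0] * n
    for sweep in range(equilibration + sweeps):
        for _ in range(n):   # one sweep = n attempted single-site flips
            i = rng.randrange(n)
            # Interaction of site i with all currently deprotonated sites.
            coupling = sum(w[i][j] for j in range(n)
                           if j != i and state[j] == 0)
            cost_deprot = pk_intr[i] - ph + coupling
            # Energy change of the proposed flip, in pK units.
            dg = cost_deprot if state[i] == 1 else -cost_deprot
            # Metropolis criterion: accept with min(1, exp(-dG/RT)).
            if dg <= 0.0 or rng.random() < 10.0 ** (-dg):
                state[i] = 1 - state[i]
        if sweep >= equilibration:
            for i in range(n):
                counts[i] += state[i]
    return [c / sweeps for c in counts]


if __name__ == "__main__":
    # Hypothetical two-site system with repulsion between the charged forms.
    print(metropolis_titration([4.0, 4.0], [[0.0, 2.0], [2.0, 0.0]], 5.0))
```

Repeating the call over a range of pH values yields a sampled titration curve. The statistical error grows in relative terms when a probability approaches 0 or 1, which is the convergence problem that motivates the free energy methods listed above.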


Software for protein p''K''_a calculations

AccelrysPKA: Accelrys CHARMm based p''K''_a calculation
H++: Poisson–Boltzmann based p''K''_a calculations
MCCE2: Multi-Conformation Continuum Electrostatics (Version 2)
Karlsberg+: p''K''_a computation with multiple pH adapted conformations
PETIT: Proton and Electron TITration
GMCT: Generalized Monte Carlo Titration
DEPTH web server: Empirical calculation of p''K''_a values using residue depth as a major feature
