In the fields of computational chemistry and molecular modelling, scoring functions are mathematical functions used to approximately predict the binding affinity between two molecules after they have been docked. Most commonly one of the molecules is a small organic compound such as a drug and the second is the drug's biological target such as a protein receptor. Scoring functions have also been developed to predict the strength of intermolecular interactions between two proteins or between protein and DNA.


Utility

Scoring functions are widely used in drug discovery and other molecular modelling applications. These include:
* Virtual screening of small molecule databases of candidate ligands to identify novel small molecules that bind to a protein target of interest and are therefore useful starting points for drug discovery
* De novo design (design "from scratch") of novel small molecules that bind to a protein target
* Lead optimization of screening hits to optimize their affinity and selectivity

A potentially more reliable but much more computationally demanding alternative to scoring functions is free energy perturbation calculations.


Prerequisites

Scoring functions are normally parameterized (or trained) against a data set consisting of experimentally determined binding affinities between molecular species similar to the species that one wishes to predict. For currently used methods aiming to predict affinities of ligands for proteins, the following must first be known or predicted:
* Protein tertiary structure – arrangement of the protein atoms in three-dimensional space. Protein structures may be determined by experimental techniques such as X-ray crystallography or solution phase NMR methods, or predicted by homology modelling.
* Ligand active conformation – three-dimensional shape of the ligand when bound to the protein
* Binding mode – orientation of the two binding partners relative to each other in the complex

The above information yields the three-dimensional structure of the complex. Based on this structure, the scoring function can then estimate the strength of the association between the two molecules in the complex using one of the methods outlined below. Finally, the scoring function itself may be used to help predict both the binding mode and the active conformation of the small molecule in the complex, or alternatively a simpler and computationally faster function may be utilised within the docking run.
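
The step from a three-dimensional structure of the complex to a single score can be illustrated with a minimal sketch. It assumes a force-field-style pairwise sum of a Lennard-Jones term and a Coulomb term; the atom record layout, the combining rules, the dielectric constant and every parameter value are hypothetical choices made for illustration and are not taken from any particular docking program.

```python
import math

# Coulomb conversion factor in kcal*Angstrom/(mol*e^2).
COULOMB_CONSTANT = 332.06

# Hypothetical atom record: (x, y, z, partial_charge, lj_sigma, lj_epsilon).
# Coordinates and sigma are in Angstrom, epsilon in kcal/mol.

def pair_energy(a, b, dielectric=4.0):
    """Return (van der Waals, electrostatic) energy of one atom pair in kcal/mol."""
    r = math.dist(a[:3], b[:3])
    sigma = 0.5 * (a[4] + b[4])        # arithmetic-mean combining rule (assumed)
    epsilon = math.sqrt(a[5] * b[5])   # geometric-mean combining rule (assumed)
    sr6 = (sigma / r) ** 6
    vdw = 4.0 * epsilon * (sr6 * sr6 - sr6)
    elec = COULOMB_CONSTANT * a[3] * b[3] / (dielectric * r)
    return vdw, elec

def interaction_score(ligand_atoms, protein_atoms):
    """Sum intermolecular pair energies; lower (more negative) is more favorable."""
    total = 0.0
    for lig in ligand_atoms:
        for prot in protein_atoms:
            vdw, elec = pair_energy(lig, prot)
            total += vdw + elec
    return total

if __name__ == "__main__":
    # Toy two-atom "ligand" and two-atom "protein"; values are illustrative only.
    ligand = [(0.0, 0.0, 0.0, 0.30, 3.4, 0.10), (1.5, 0.0, 0.0, -0.10, 3.4, 0.10)]
    protein = [(0.0, 4.0, 0.0, -0.40, 3.2, 0.15), (1.5, 4.0, 0.0, 0.20, 3.4, 0.10)]
    print(f"interaction score: {interaction_score(ligand, protein):.3f} kcal/mol")
```

The score depends on the ligand conformation and binding mode only through the atom coordinates, which is why both must be known or predicted before scoring.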


Classes

There are four general classes of scoring functions:
* Force field – affinities are estimated by summing the strength of intermolecular van der Waals and electrostatic interactions between all atoms of the two molecules in the complex using a force field. The intramolecular energies (also referred to as strain energy) of the two binding partners are also frequently included. Finally, since binding normally takes place in the presence of water, the desolvation energies of the ligand and of the protein are sometimes taken into account using implicit solvation methods such as GBSA or PBSA.
* Empirical – based on counting the number of various types of interactions between the two binding partners. Counting may be based on the number of ligand and receptor atoms in contact with each other or on the change in solvent accessible surface area (ΔSASA) in the complex compared to the uncomplexed ligand and protein. The coefficients of the scoring function are usually fit using multiple linear regression methods. The interaction terms of the function may include, for example:
** hydrophobic–hydrophobic contacts (favorable),
** hydrophobic–hydrophilic contacts (unfavorable; accounts for unmet hydrogen bonds, which are an important enthalpic contribution to binding. One lost hydrogen bond can account for 1–2 orders of magnitude in binding affinity),
** number of hydrogen bonds (favorable contribution to affinity, especially if shielded from solvent; no contribution if solvent exposed),
** number of rotatable bonds immobilized in complex formation (unfavorable conformational entropy contribution).
* Knowledge-based – based on statistical observations of intermolecular close contacts in large 3D databases (such as the Cambridge Structural Database or Protein Data Bank), which are used to derive statistical "potentials of mean force". This method is founded on the assumption that close intermolecular interactions between certain types of atoms or functional groups that occur more frequently than one would expect from a random distribution are likely to be energetically favorable and therefore contribute favorably to binding affinity.
* Machine-learning – unlike the classical scoring functions above, machine-learning scoring functions do not assume a predetermined functional form for the relationship between binding affinity and the structural features describing the protein-ligand complex. Instead, the functional form is inferred directly from the data. Machine-learning scoring functions have consistently been found to outperform classical scoring functions at binding affinity prediction of diverse protein-ligand complexes. This has also been the case for target-specific complexes, although the advantage is target-dependent and mainly depends on the volume of relevant data available. When appropriate care is taken, machine-learning scoring functions tend to strongly outperform classical scoring functions at the related problem of structure-based virtual screening. Furthermore, if data specific to the target is available, this performance gap widens. Review articles provide a broader overview of machine-learning scoring functions for structure-based drug design. The choice of decoys for a given target is one of the most important factors for training and testing any scoring function.

The first three types (force-field, empirical and knowledge-based) are commonly referred to as classical scoring functions and are characterized by assuming that their contributions to binding are linearly combined. Due to this constraint, classical scoring functions are unable to take advantage of large amounts of training data.
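
The multiple linear regression fit described for empirical scoring functions can be sketched as follows. This is a minimal illustration under stated assumptions: the three features (hydrogen bond count, hydrophobic contact count, immobilized rotatable bond count), the function names and the training values are all hypothetical, and the affinities are synthetic numbers generated from known weights rather than experimental data.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - known) / aug[row][row]
    return x

def fit_empirical_weights(features, affinities):
    """Least-squares fit of affinity = w0 + sum_i w_i * feature_i (normal equations)."""
    rows = [[1.0] + list(f) for f in features]  # prepend the intercept column
    m = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(m)] for i in range(m)]
    xty = [sum(r[i] * y for r, y in zip(rows, affinities)) for i in range(m)]
    return solve_linear_system(xtx, xty)

def empirical_score(weights, feature_vector):
    """Linear combination of interaction counts, as assumed by classical functions."""
    return weights[0] + sum(w * f for w, f in zip(weights[1:], feature_vector))

# Synthetic toy data: (hydrogen bonds, hydrophobic contacts, rotatable bonds).
TRAINING_FEATURES = [(2, 10, 3), (4, 5, 1), (1, 20, 6), (3, 8, 2), (0, 15, 4), (5, 12, 0)]
# Synthetic affinities (kcal/mol, more negative = tighter binding).
TRAINING_AFFINITIES = [-2.7, -4.4, -2.0, -3.6, -1.3, -6.2]

if __name__ == "__main__":
    weights = fit_empirical_weights(TRAINING_FEATURES, TRAINING_AFFINITIES)
    print("fitted weights:", [round(w, 3) for w in weights])
    print("score for a new complex:", round(empirical_score(weights, (3, 14, 2)), 3))
```

In this sketch a favorable term such as the hydrogen bond count receives a negative weight and an unfavorable term such as the rotatable bond count receives a positive one, mirroring the signs listed above. The fixed linear form is the constraint that machine-learning scoring functions relax.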


Refinement

Since different scoring functions are relatively co-linear, consensus scoring functions may not improve accuracy significantly. This claim went somewhat against the prevailing view in the field, since previous studies had suggested that consensus scoring was beneficial.

A perfect scoring function would be able to predict the binding free energy between the ligand and its target. In practice, both the computational methods and the available computational resources constrain this goal, so methods are most often selected to minimize the number of false positive and false negative ligands. In cases where an experimental training set of binding constants and structures is available, a simple method has been developed to refine the scoring function used in molecular docking.
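
The co-linearity argument can be made concrete with a short sketch. It assumes that lower scores are more favorable and uses rank averaging as one simple consensus scheme among several possible ones; the function names and the score values are hypothetical and chosen only for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long score lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def ranks(scores):
    """Rank 1 = lowest (most favorable) score; ties are broken by input order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    result = [0] * len(scores)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def rank_average_consensus(score_lists):
    """Average each ligand's rank over several scoring functions."""
    all_ranks = [ranks(scores) for scores in score_lists]
    n_ligands = len(score_lists[0])
    return [sum(r[i] for r in all_ranks) / len(all_ranks) for i in range(n_ligands)]

if __name__ == "__main__":
    # Hypothetical scores from two scoring functions for the same four ligands.
    scores_a = [-9.1, -7.4, -8.2, -5.0]
    scores_b = [-8.8, -7.0, -8.5, -5.5]
    print("correlation:", round(pearson(scores_a, scores_b), 3))
    print("consensus ranks:", rank_average_consensus([scores_a, scores_b]))
```

When two scoring functions are strongly correlated, as in this toy input, they rank the ligands almost identically, so averaging their ranks adds little information beyond either function alone.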

