The global distance test (GDT), also written as GDT_TS to represent "total score", is a measure of similarity between two protein structures with known amino acid correspondences (e.g. identical amino acid sequences) but different tertiary structures. It is most commonly used to compare the results of protein structure prediction to the experimentally determined structure as measured by X-ray crystallography, protein NMR, or, increasingly, cryoelectron microscopy. The metric was developed by Adam Zemla at Lawrence Livermore National Laboratory and originally implemented in the Local-Global Alignment (LGA) program.
It is intended as a more accurate measurement than the common root-mean-square deviation (RMSD) metric, which is sensitive to outlier regions created, for example, by poor modeling of individual loop regions in a structure that is otherwise reasonably accurate.
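The outlier sensitivity described above can be illustrated with a small sketch. The per-residue deviations below are invented for illustration (a hypothetical 100-residue model with one misplaced 10-residue loop), and the helper names `rmsd` and `fraction_within` are assumptions of this example, not part of any published tool.

```python
import math

def rmsd(deviations):
    """Root-mean-square of per-residue distance deviations (in Å)."""
    return math.sqrt(sum(d * d for d in deviations) / len(deviations))

def fraction_within(deviations, cutoff):
    """Percentage of residues whose deviation is at most `cutoff` Å."""
    return 100.0 * sum(d <= cutoff for d in deviations) / len(deviations)

# Hypothetical data: every residue 0.5 Å from its reference position.
good_model = [0.5] * 100

# Hypothetical data: same model, but a 10-residue loop is displaced by 15 Å.
bad_loop_model = [0.5] * 90 + [15.0] * 10

for name, devs in (("good", good_model), ("bad loop", bad_loop_model)):
    print(name, round(rmsd(devs), 2), fraction_within(devs, 2.0))
```

In this toy case the misplaced loop raises the RMSD from 0.5 Å to roughly 4.8 Å, while the share of residues within 2 Å only drops from 100% to 90%, which is the behavior a cutoff-based measure is meant to capture.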
The conventional GDT_TS score is computed over the alpha carbon atoms and is reported as a percentage, ranging from 0 to 100. In general, the higher the GDT_TS score, the more closely a model approximates a given reference structure.
GDT_TS measurements are used as major assessment criteria in the production of results from the
Critical Assessment of Structure Prediction (CASP), a large-scale experiment in the structure prediction community dedicated to assessing current modeling techniques.
The metric was first introduced as an evaluation standard in the third iteration of the biennial experiment (CASP3) in 1998.
Various extensions to the original method have been developed; variations that account for the positions of the
side chains are known as global distance calculations (GDC).
Calculation
The GDT score is calculated as the largest set of amino acid residues' alpha carbon atoms in the model structure falling within a defined distance cutoff of their position in the experimental structure, after iteratively superimposing the two structures. By the original design the GDT algorithm calculates 20 GDT scores, i.e. one for each of 20 consecutive distance cutoffs (0.5 Å, 1.0 Å, 1.5 Å, ... 10.0 Å).
Structure similarity assessment is intended to use the GDT scores from several cutoff distances, and scores generally increase with increasing cutoff. A plateau in this increase may indicate an extreme divergence between the experimental and predicted structures, such that no additional atoms are included in any cutoff of a reasonable distance. The conventional GDT_TS total score in CASP is the average result of cutoffs at 1, 2, 4, and 8 Å.
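The calculation above can be sketched in a few lines of Python. This is a simplified illustration, not the LGA implementation: it assumes the two structures are already superimposed and scores that single superposition, whereas the actual algorithm searches iteratively over many superpositions to maximize the residue count at each cutoff, so a fixed superposition can only give a lower bound. The coordinates and function names are hypothetical.

```python
import math

def ca_distances(model, reference):
    """Per-residue Cα-Cα distances (Å) between two already-superimposed
    structures, given as equal-length lists of (x, y, z) tuples."""
    return [math.dist(m, r) for m, r in zip(model, reference)]

def gdt_at_cutoff(dists, cutoff):
    """Percentage of residues within `cutoff` Å of their reference position."""
    return 100.0 * sum(d <= cutoff for d in dists) / len(dists)

def gdt_curve(dists):
    """Scores at the 20 cutoffs 0.5, 1.0, ..., 10.0 Å."""
    return [(0.5 * k, gdt_at_cutoff(dists, 0.5 * k)) for k in range(1, 21)]

def gdt_ts(dists):
    """Average of the scores at the 1, 2, 4, and 8 Å cutoffs."""
    return sum(gdt_at_cutoff(dists, c) for c in (1.0, 2.0, 4.0, 8.0)) / 4.0

# Hypothetical four-residue example: the model is shifted along y by
# 0.5, 1.5, 3.0, and 9.0 Å relative to the reference.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
model = [(0.0, 0.5, 0.0), (3.8, 1.5, 0.0), (7.6, 3.0, 0.0), (11.4, 9.0, 0.0)]

dists = ca_distances(model, reference)
curve = gdt_curve(dists)
score = gdt_ts(dists)
print(score)  # (25 + 50 + 75 + 75) / 4 = 56.25
```

In this toy case the curve stays at 75% from the 3 Å cutoff through the 8.5 Å cutoff, because the fourth residue is 9 Å away; a flat stretch of this kind is the plateau mentioned above.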
Variations and extensions
The original GDT_TS is calculated based on the superimpositions and GDT scores produced by the Local-Global Alignment (LGA) program.
A "high accuracy" version called GDT_HA is computed with smaller cutoff distances (half those of GDT_TS, i.e. 0.5, 1, 2, and 4 Å) and thus more heavily penalizes larger deviations from the reference structure. It was used in the high accuracy category of CASP7.
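As a rough illustration of how halving the cutoffs changes the score, the sketch below averages the within-cutoff percentages for both cutoff sets over the same list of per-residue distances. The distances are made up, the function names are assumptions of this example, and a single fixed superposition is assumed rather than the iterative search performed by LGA.

```python
GDT_TS_CUTOFFS = (1.0, 2.0, 4.0, 8.0)   # Å
GDT_HA_CUTOFFS = (0.5, 1.0, 2.0, 4.0)   # Å, half of the GDT_TS cutoffs

def percent_within(dists, cutoff):
    """Percentage of residues whose Cα deviation is at most `cutoff` Å."""
    return 100.0 * sum(d <= cutoff for d in dists) / len(dists)

def gdt_score(dists, cutoffs):
    """Mean of the within-cutoff percentages over the given cutoffs."""
    return sum(percent_within(dists, c) for c in cutoffs) / len(cutoffs)

# Hypothetical per-residue Cα deviations (Å) for a five-residue model.
example_dists = [0.3, 0.8, 1.5, 3.0, 6.0]

ts = gdt_score(example_dists, GDT_TS_CUTOFFS)
ha = gdt_score(example_dists, GDT_HA_CUTOFFS)
print(ts, ha)  # 70.0 50.0
```

For a fixed superposition the GDT_HA value can never exceed the GDT_TS value, since each of its cutoffs is smaller than the corresponding GDT_TS cutoff.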
CASP8 defined a new "TR score", which is GDT_TS minus a penalty for residues clustered too close together. The penalty is meant to discourage steric clashes in the predicted structure, which are sometimes introduced to game the cutoff measure of GDT.
The primary GDT assessment uses only the alpha carbon atoms. To apply superposition-based scoring to the amino acid residue side chains, a GDT-like score called "global distance calculation for sidechains" (GDC_sc) was designed and implemented within the LGA program in 2008.
Instead of comparing residue positions on the basis of alpha carbons, GDC_sc uses a predefined "characteristic atom" near the end of each residue for the evaluation of inter-residue distance deviations. An "all atoms" variant of the GDC score (GDC_all) is calculated using full-model information, and is one of the standard measures used by CASP's organizers and assessors to evaluate accuracy of predicted structural models.
GDT scores are generally computed with respect to a single reference structure. In some cases, structural models with lower GDT scores against a reference structure determined by protein NMR are nevertheless better fits to the underlying experimental data.
Methods have been developed to estimate the uncertainty of GDT scores due to
protein flexibility and uncertainty in the reference structure.
See also
* Root mean square deviation (bioinformatics), a different structure comparison measure.
* TM-score, a different structure comparison measure.
External links
CASP14 results: summary tables of the CASP14 experiment run in 2020, including example plots of GDT score as a function of cutoff distance
services and documentation on structure comparison and similarity measures.