HOME

TheInfoList



OR:

Structural alignment attempts to establish homology between two or more
polymer A polymer (; Greek '' poly-'', "many" + '' -mer'', "part") is a substance or material consisting of very large molecules called macromolecules, composed of many repeating subunits. Due to their broad spectrum of properties, both synthetic a ...
structures based on their shape and three-dimensional conformation. This process is usually applied to
protein Proteins are large biomolecules and macromolecules that comprise one or more long chains of amino acid residues. Proteins perform a vast array of functions within organisms, including catalysing metabolic reactions, DNA replication, res ...
tertiary structure Protein tertiary structure is the three dimensional shape of a protein. The tertiary structure will have a single polypeptide chain "backbone" with one or more protein secondary structures, the protein domains. Amino acid side chains may i ...
s but can also be used for large RNA molecules. In contrast to simple structural superposition, where at least some equivalent residues of the two structures are known, structural alignment requires no ''a priori'' knowledge of equivalent positions. Structural alignment is a valuable tool for the comparison of proteins with low sequence similarity, where evolutionary relationships between proteins cannot be easily detected by standard
sequence alignment In bioinformatics, a sequence alignment is a way of arranging the sequences of DNA, RNA, or protein to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships between the sequences. Al ...
techniques. Structural alignment can therefore be used to imply
evolution Evolution is change in the heritable characteristics of biological populations over successive generations. These characteristics are the expressions of genes, which are passed on from parent to offspring during reproduction. Variation ...
ary relationships between proteins that share very little common sequence. However, caution should be used in using the results as evidence for shared evolutionary ancestry because of the possible confounding effects of
convergent evolution Convergent evolution is the independent evolution of similar features in species of different periods or epochs in time. Convergent evolution creates analogous structures that have similar form or function but were not present in the last com ...
by which multiple unrelated
amino acid Amino acids are organic compounds that contain both amino and carboxylic acid functional groups. Although hundreds of amino acids exist in nature, by far the most important are the alpha-amino acids, which comprise proteins. Only 22 alpha ...
sequences converge on a common
tertiary structure Protein tertiary structure is the three dimensional shape of a protein. The tertiary structure will have a single polypeptide chain "backbone" with one or more protein secondary structures, the protein domains. Amino acid side chains may i ...
. Structural alignments can compare two sequences or multiple sequences. Because these alignments rely on information about all the query sequences' three-dimensional conformations, the method can only be used on sequences where these structures are known. These are usually found by
X-ray crystallography X-ray crystallography is the experimental science determining the atomic and molecular structure of a crystal, in which the crystalline structure causes a beam of incident X-rays to diffract into many specific directions. By measuring the angles ...
or
NMR spectroscopy Nuclear magnetic resonance spectroscopy, most commonly known as NMR spectroscopy or magnetic resonance spectroscopy (MRS), is a spectroscopic technique to observe local magnetic fields around atomic nuclei. The sample is placed in a magnetic fi ...
. It is possible to perform a structural alignment on structures produced by structure prediction methods. Indeed, evaluating such predictions often requires a structural alignment between the model and the true known structure to assess the model's quality. Structural alignments are especially useful in analyzing data from
structural genomics Structural genomics seeks to describe the 3-dimensional structure of every protein encoded by a given genome. This genome-based approach allows for a high-throughput method of structure determination by a combination of experimental and modeling ...
and
proteomics Proteomics is the large-scale study of proteins. Proteins are vital parts of living organisms, with many functions such as the formation of structural fibers of muscle tissue, enzymatic digestion of food, or synthesis and replication of DNA. I ...
efforts, and they can be used as comparison points to evaluate alignments produced by purely sequence-based
bioinformatics Bioinformatics () is an interdisciplinary field that develops methods and software tools for understanding biological data, in particular when the data sets are large and complex. As an interdisciplinary field of science, bioinformatics combi ...
methods. The outputs of a structural alignment are a superposition of the atomic coordinate sets and a minimal
root mean square In mathematics and its applications, the root mean square of a set of numbers x_i (abbreviated as RMS, or rms and denoted in formulas as either x_\mathrm or \mathrm_x) is defined as the square root of the mean square (the arithmetic mean of the ...
deviation (
RMSD The root-mean-square deviation (RMSD) or root-mean-square error (RMSE) is a frequently used measure of the differences between values (sample or population values) predicted by a model or an estimator and the values observed. The RMSD represents ...
) between the structures. The RMSD of two aligned structures indicates their divergence from one another. Structural alignment can be complicated by the existence of multiple
protein domain In molecular biology, a protein domain is a region of a protein's polypeptide chain that is self-stabilizing and that folds independently from the rest. Each domain forms a compact folded three-dimensional structure. Many proteins consist of ...
s within one or more of the input structures, because changes in relative orientation of the domains between two structures to be aligned can artificially inflate the RMSD.


Data produced by structural alignment

The minimum information produced from a successful structural alignment is a set of residues that are considered equivalent between the structures. This set of equivalences is then typically used to superpose the three-dimensional coordinates for each input structure. (Note that one input element may be fixed as a reference and therefore its superposed coordinates do not change.) The fitted structures can be used to calculate mutual RMSD values, as well as other more sophisticated measures of structural similarity such as the global distance test (GDT, the metric used in
CASP Critical Assessment of Structure Prediction (CASP), sometimes called Critical Assessment of Protein Structure Prediction, is a community-wide, worldwide experiment for protein structure prediction taking place every two years since 1994. CASP prov ...
). The structural alignment also implies a corresponding one-dimensional
sequence alignment In bioinformatics, a sequence alignment is a way of arranging the sequences of DNA, RNA, or protein to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships between the sequences. Al ...
from which a sequence identity, or the percentage of residues that are identical between the input structures, can be calculated as a measure of how closely the two sequences are related.


Types of comparisons

Because protein structures are composed of
amino acid Amino acids are organic compounds that contain both amino and carboxylic acid functional groups. Although hundreds of amino acids exist in nature, by far the most important are the alpha-amino acids, which comprise proteins. Only 22 alpha ...
s whose
side chain In organic chemistry and biochemistry, a side chain is a chemical group that is attached to a core part of the molecule called the "main chain" or backbone. The side chain is a hydrocarbon branching element of a molecule that is attached to a ...
s are linked by a common protein backbone, a number of different possible subsets of the atoms that make up a protein macromolecule can be used in producing a structural alignment and calculating the corresponding RMSD values. When aligning structures with very different sequences, the side chain atoms generally are not taken into account because their identities differ between many aligned residues. For this reason it is common for structural alignment methods to use by default only the backbone atoms included in the
peptide bond In organic chemistry, a peptide bond is an amide type of covalent chemical bond linking two consecutive alpha-amino acids from C1 (carbon number one) of one alpha-amino acid and N2 (nitrogen number two) of another, along a peptide or protein cha ...
. For simplicity and efficiency, often only the
alpha carbon In the nomenclature of organic chemistry, a locant is a term to indicate the position of a functional group or substituent within a molecule. Numeric locants The International Union of Pure and Applied Chemistry (IUPAC) recommends the us ...
positions are considered, since the peptide bond has a minimally variant planar conformation. Only when the structures to be aligned are highly similar or even identical is it meaningful to align side-chain atom positions, in which case the RMSD reflects not only the conformation of the protein backbone but also the rotameric states of the side chains. Other comparison criteria that reduce noise and bolster positive matches include
secondary structure Protein secondary structure is the three dimensional form of ''local segments'' of proteins. The two most common secondary structural elements are alpha helices and beta sheets, though beta turns and omega loops occur as well. Secondary struct ...
assignment,
native contact In protein folding, a native contact is a contact between the side chains of two amino acids that are not neighboring in the amino acid sequence (i.e., they are more than four residues apart in the primary sequence in order to remove trivial i to ...
maps or residue interaction patterns, measures of side chain packing, and measures of
hydrogen bond In chemistry, a hydrogen bond (or H-bond) is a primarily electrostatic force of attraction between a hydrogen (H) atom which is covalently bound to a more electronegative "donor" atom or group (Dn), and another electronegative atom bearing a l ...
retention.


Structural superposition

The most basic possible comparison between protein structures makes no attempt to align the input structures and requires a precalculated alignment as input to determine which of the residues in the sequence are intended to be considered in the RMSD calculation. Structural superposition is commonly used to compare multiple conformations of the same protein (in which case no alignment is necessary, since the sequences are the same) and to evaluate the quality of alignments produced using only sequence information between two or more sequences whose structures are known. This method traditionally uses a simple least-squares fitting algorithm, in which the optimal rotations and translations are found by minimizing the sum of the squared distances among all structures in the superposition. More recently, maximum likelihood and Bayesian methods have greatly increased the accuracy of the estimated rotations, translations, and covariance matrices for the superposition. Algorithms based on multidimensional rotations and modified
quaternion In mathematics, the quaternion number system extends the complex numbers. Quaternions were first described by the Irish mathematician William Rowan Hamilton in 1843 and applied to mechanics in three-dimensional space. Hamilton defined a quater ...
s have been developed to identify topological relationships between protein structures without the need for a predetermined alignment. Such algorithms have successfully identified canonical folds such as the four-helix bundle. Th
SuperPose
method is sufficiently extensible to correct for relative domain rotations and other structural pitfalls.


Evaluating similarity

Often the purpose of seeking a structural superposition is not so much the superposition itself, but an evaluation of the similarity of two structures or a confidence in a remote alignment. A subtle but important distinction from maximal structural superposition is the conversion of an alignment to a meaningful similarity score. Most methods output some sort of "score" indicating the quality of the superposition. However, what one actually wants is ''not'' merely an ''estimated'' "Z-score" or an ''estimated'' E-value of seeing the observed superposition by chance but instead one desires that the ''estimated'' E-value is tightly correlation to the true E-value. Critically, even if a method's estimated E-value is precisely correct ''on average'', if it lacks a low standard deviation on its estimated value generation process, then the rank ordering of the relative similarities of a query protein to a comparison set will rarely agree with the "true" ordering. Different methods will superimpose different numbers of residues because they use different quality assurances and different definitions of "overlap"; some only include residues meeting multiple local and global superposition criteria and others are more greedy, flexible, and promiscuous. A greater number of atoms superposed can mean more similarity but it may not always produce the best E-value quantifying the unlikeliness of the superposition and thus not as useful for assessing similarity, especially in remote homologs.


Algorithmic complexity


Optimal solution

The optimal " threading" of a protein sequence onto a known structure and the production of an optimal multiple sequence alignment have been shown to be
NP-complete In computational complexity theory, a problem is NP-complete when: # it is a problem for which the correctness of each solution can be verified quickly (namely, in polynomial time) and a brute-force search algorithm can find a solution by trying ...
. However, this does not imply that the structural alignment problem is NP-complete. Strictly speaking, an optimal solution to the protein structure alignment problem is only known for certain protein structure similarity measures, such as the measures used in protein structure prediction experiments, GDT_TS and MaxSub. These measures can be rigorously optimized using an algorithm capable of maximizing the number of atoms in two proteins that can be superimposed under a predefined distance cutoff. Unfortunately, the algorithm for optimal solution is not practical, since its running time depends not only on the lengths but also on the intrinsic geometry of input proteins.


Approximate solution

Approximate
polynomial-time In computer science, the time complexity is the computational complexity that describes the amount of computer time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations performed by t ...
algorithms for structural alignment that produce a family of "optimal" solutions within an approximation parameter for a given scoring function have been developed. Although these algorithms theoretically classify the approximate protein structure alignment problem as "tractable", they are still computationally too expensive for large-scale protein structure analysis. As a consequence, practical algorithms that converge to the global solutions of the alignment, given a scoring function, do not exist. Most algorithms are, therefore, heuristic, but algorithms that guarantee the convergence to at least local maximizers of the scoring functions, and are practical, have been developed.


Representation of structures

Protein structures have to be represented in some coordinate-independent space to make them comparable. This is typically achieved by constructing a sequence-to-sequence matrix or series of matrices that encompass comparative metrics: rather than absolute distances relative to a fixed coordinate space. An intuitive representation is the
distance matrix In mathematics, computer science and especially graph theory, a distance matrix is a square matrix (two-dimensional array) containing the distances, taken pairwise, between the elements of a set. Depending upon the application involved, the ''dist ...
, which is a two-dimensional
matrix Matrix most commonly refers to: * ''The Matrix'' (franchise), an American media franchise ** '' The Matrix'', a 1999 science-fiction action film ** "The Matrix", a fictional setting, a virtual reality environment, within ''The Matrix'' (franchi ...
containing all pairwise distances between some subset of the atoms in each structure (such as the
alpha carbon In the nomenclature of organic chemistry, a locant is a term to indicate the position of a functional group or substituent within a molecule. Numeric locants The International Union of Pure and Applied Chemistry (IUPAC) recommends the us ...
s). The matrix increases in dimensionality as the number of structures to be simultaneously aligned increases. Reducing the protein to a coarse metric such as
secondary structure Protein secondary structure is the three dimensional form of ''local segments'' of proteins. The two most common secondary structural elements are alpha helices and beta sheets, though beta turns and omega loops occur as well. Secondary struct ...
elements (SSEs) or structural fragments can also produce sensible alignments, despite the loss of information from discarding distances, as
noise Noise is unwanted sound considered unpleasant, loud or disruptive to hearing. From a physics standpoint, there is no distinction between noise and desired sound, as both are vibrations through a medium, such as air or water. The difference aris ...
is also discarded.Mount DM. (2004). ''Bioinformatics: Sequence and Genome Analysis'' 2nd ed. Cold Spring Harbor Laboratory Press: Cold Spring Harbor, NY Choosing a representation to facilitate computation is critical to developing an efficient alignment mechanism.


Methods

Structural alignment techniques have been used in comparing individual structures or sets of structures as well as in the production of "all-to-all" comparison databases that measure the divergence between every pair of structures present in the
Protein Data Bank The Protein Data Bank (PDB) is a database for the three-dimensional structural data of large biological molecules, such as proteins and nucleic acids. The data, typically obtained by X-ray crystallography, NMR spectroscopy, or, increasingly, cr ...
(PDB). Such databases are used to classify proteins by their fold.


DALI

A common and popular structural alignment method is the DALI, or Distance-matrix ALIgnment method, which breaks the input structures into hexapeptide fragments and calculates a distance matrix by evaluating the contact patterns between successive fragments.
Secondary structure Protein secondary structure is the three dimensional form of ''local segments'' of proteins. The two most common secondary structural elements are alpha helices and beta sheets, though beta turns and omega loops occur as well. Secondary struct ...
features that involve residues that are contiguous in sequence appear on the matrix's
main diagonal In linear algebra, the main diagonal (sometimes principal diagonal, primary diagonal, leading diagonal, major diagonal, or good diagonal) of a matrix A is the list of entries a_ where i = j. All off-diagonal elements are zero in a diagonal matri ...
; other diagonals in the matrix reflect spatial contacts between residues that are not near each other in the sequence. When these diagonals are parallel to the main diagonal, the features they represent are parallel; when they are perpendicular, their features are antiparallel. This representation is memory-intensive because the features in the square matrix are symmetrical (and thus redundant) about the main diagonal. When two proteins' distance matrices share the same or similar features in approximately the same positions, they can be said to have similar folds with similar-length loops connecting their secondary structure elements. DALI's actual alignment process requires a similarity search after the two proteins' distance matrices are built; this is normally conducted via a series of overlapping submatrices of size 6x6. Submatrix matches are then reassembled into a final alignment via a standard score-maximization algorithm — the original version of DALI used a
Monte Carlo Monte Carlo (; ; french: Monte-Carlo , or colloquially ''Monte-Carl'' ; lij, Munte Carlu ; ) is officially an administrative area of the Principality of Monaco, specifically the ward of Monte Carlo/Spélugues, where the Monte Carlo Casino is ...
simulation to maximize a structural similarity score that is a function of the distances between putative corresponding atoms. In particular, more distant atoms within corresponding features are exponentially downweighted to reduce the effects of noise introduced by loop mobility, helix torsions, and other minor structural variations. Because DALI relies on an all-to-all distance matrix, it can account for the possibility that structurally aligned features might appear in different orders within the two sequences being compared. The DALI method has also been used to construct a database known as FSSP (Fold classification based on Structure-Structure alignment of Proteins, or Families of Structurally Similar Proteins) in which all known protein structures are aligned with each other to determine their structural neighbors and fold classification. There is a
searchable database
based on DALI as well as

an
web search
based on a standalone version known as DaliLite.


Combinatorial extension

The combinatorial extension (CE) method is similar to DALI in that it too breaks each structure in the query set into a series of fragments that it then attempts to reassemble into a complete alignment. A series of pairwise combinations of fragments called aligned fragment pairs, or AFPs, are used to define a similarity matrix through which an optimal path is generated to identify the final alignment. Only AFPs that meet given criteria for local similarity are included in the matrix as a means of reducing the necessary search space and thereby increasing efficiency. A number of similarity metrics are possible; the original definition of the CE method included only structural superpositions and inter-residue distances but has since been expanded to include local environmental properties such as secondary structure, solvent exposure, hydrogen-bonding patterns, and
dihedral angle A dihedral angle is the angle between two intersecting planes or half-planes. In chemistry, it is the clockwise angle between half-planes through two sets of three atoms, having two atoms in common. In solid geometry, it is defined as the un ...
s. An alignment path is calculated as the optimal path through the similarity matrix by linearly progressing through the sequences and extending the alignment with the next possible high-scoring AFP pair. The initial AFP pair that nucleates the alignment can occur at any point in the sequence matrix. Extensions then proceed with the next AFP that meets given distance criteria restricting the alignment to low gap sizes. The size of each AFP and the maximum gap size are required input parameters but are usually set to empirically determined values of 8 and 30 respectively. Like DALI and SSAP, CE has been used to construct an all-to-all fold classificatio
database
from the known protein structures in the PDB. The RCSB PDB has recently released an updated version of CE, Mammoth, and FATCAT as part of th
RCSB PDB Protein Comparison Tool
It provides a new variation of CE that can detect circular permutations in protein structures.


Mammoth

MAMMOTH approaches the alignment problem from a different objective than almost all other methods. Rather than trying to find an alignment that maximally superimposes the largest number of residues, it seeks the subset of the structural alignment least likely to occur by chance. To do this it marks a local motif alignment with flags to indicate which residues simultaneously satisfy more stringent criteria: 1) Local structure overlap 2) regular secondary structure 3) 3D-superposition 4) same ordering in primary sequence. It converts the statistics of the number of residues with high-confidence matches and the size of the protein to compute an Expectation value for the outcome by chance. It excels at matching remote homologs, particularly structures generated by ab initio structure prediction to structure families such as SCOP, because it emphasizes extracting a statistically reliable sub alignment and not in achieving the maximal sequence alignment or maximal 3D superposition. For every overlapping window of 7 consecutive residues it computes the set of displacement direction unit vectors between adjacent C-alpha residues. All-against-all local motifs are compared based on the URMS score. These values becomes the pair alignment score entries for dynamic programming which produces a seed pair-wise residue alignment. The second phase uses a modified MaxSub algorithm: a single 7 reside aligned pair in each proteins is used to orient the two full length protein structures to maximally superimpose these just these 7 C-alpha, then in this orientation it scans for any additional aligned pairs that are close in 3D. It re-orients the structures to superimpose this expanded set and iterates until no more pairs coincide in 3D. This process is restarted for every 7 residue window in the seed alignment. The output is the maximal number of atoms found from any of these initial seeds. This statistic is converted to a calibrated E-value for the similarity of the proteins. Mammoth makes no attempt to re-iterate the initial alignment or extend the high quality sub-subset. Therefore, the seed alignment it displays can't be fairly compared to DALI or TM align as it was formed simply as a heuristic to prune the search space. (It can be used if one wants an alignment based solely on local structure-motif similarity agnostic of long range rigid body atomic alignment.) Because of that same parsimony, it is well over ten times faster than DALI, CE and TM-align. It is often used in conjunction with these slower tools to pre-screen large data bases to extract the just the best E-value related structures for more exhaustive superposition or expensive calculations. It has been particularly successful at analyzing "decoy" structures from ab initio structure prediction. These decoys are notorious for getting local fragment motif structure correct, and forming some kernels of correct 3D tertiary structure but getting the full length tertiary structure wrong. In this twilight remote homology regime, Mammoth's e-values for the CASP protein structure prediction evaluation have been show to be significantly more correlated with human ranking than SSAP or DALI. Mammoths ability to extract the multi-criteria partial overlaps with proteins of known structure and rank these with proper E-values, combined with its speed facilitates scanning vast numbers of decoy models against the PDB data base for identifying the most likely correct decoys based on their remote homology to known proteins.


SSAP

The SSAP (Sequential Structure Alignment Program) method uses double
dynamic programming Dynamic programming is both a mathematical optimization method and a computer programming method. The method was developed by Richard Bellman in the 1950s and has found applications in numerous fields, from aerospace engineering to economics. ...
to produce a structural alignment based on atom-to-atom vectors in structure space. Instead of the alpha carbons typically used in structural alignment, SSAP constructs its vectors from the beta carbons for all residues except glycine, a method which thus takes into account the rotameric state of each residue as well as its location along the backbone. SSAP works by first constructing a series of inter-residue distance vectors between each residue and its nearest non-contiguous neighbors on each protein. A series of matrices are then constructed containing the vector differences between neighbors for each pair of residues for which vectors were constructed. Dynamic programming applied to each resulting matrix determines a series of optimal local alignments which are then summed into a "summary" matrix to which dynamic programming is applied again to determine the overall structural alignment. SSAP originally produced only pairwise alignments but has since been extended to multiple alignments as well. It has been applied in an all-to-all fashion to produce a hierarchical fold classification scheme known as
CATH The CATH Protein Structure Classification database is a free, publicly available online resource that provides information on the evolutionary relationships of protein domains. It was created in the mid-1990s by Professor Christine Orengo and coll ...
(Class, Architecture, Topology, Homology), which has been used to construct th
CATH Protein Structure Classification
database.


Recent developments

Improvements in structural alignment methods constitute an active area of research, and new or modified methods are often proposed that are claimed to offer advantages over the older and more widely distributed techniques. A recent example, TM-align, uses a novel method for weighting its distance matrix, to which standard
dynamic programming Dynamic programming is both a mathematical optimization method and a computer programming method. The method was developed by Richard Bellman in the 1950s and has found applications in numerous fields, from aerospace engineering to economics. ...
is then applied. The weighting is proposed to accelerate the convergence of dynamic programming and correct for effects arising from alignment lengths. In a benchmarking study, TM-align has been reported to improve in both speed and accuracy over DALI and CE. Other promising methods of structural alignment are local structural alignment methods. These provide comparison of pre-selected parts of proteins (e.g. binding sites, user-defined structural motifs) against binding sites or whole-protein structural databases. The MultiBind and MAPPIS servers allow the identification of common spatial arrangements of physicochemical properties such as H-bond donor, acceptor, aliphatic, aromatic or hydrophobic in a set of user provided protein binding sites defined by interactions with small molecules (MultiBind) or in a set of user-provided protein–protein interfaces (MAPPIS). Others provide comparison of entire protein structures against a number of user submitted structures or against a large database of protein structures in reasonable time ( ProBiS). Unlike global alignment approaches, local structural alignment approaches are suited to detection of locally conserved patterns of functional groups, which often appear in binding sites and have significant involvement in ligand binding. As an example, comparing G-Losa, a local structure alignment tool, with TM-align, a global structure alignment based method. While G-Losa predicts drug-like ligands’ positions in single-chain protein targets more precisely than TM-align, the overall success rate of TM-align is better. However, as algorithmic improvements and computer performance have erased purely technical deficiencies in older approaches, it has become clear that there is no one universal criterion for the 'optimal' structural alignment. TM-align, for instance, is particularly robust in quantifying comparisons between sets of proteins with great disparities in sequence lengths, but it only indirectly captures hydrogen bonding or secondary structure order conservation which might be better metrics for alignment of evolutionarily related proteins. Thus recent developments have focused on optimizing particular attributes such as speed, quantification of scores, correlation to alternative gold standards, or tolerance of imperfection in structural data or ab initio structural models. An alternative methodology that is gaining popularity is to use the ''consensus'' of various methods to ascertain proteins structural similarities.


RNA structural alignment

Structural alignment techniques have traditionally been applied exclusively to proteins, as the primary biological
macromolecule A macromolecule is a very large molecule important to biophysical processes, such as a protein or nucleic acid. It is composed of thousands of covalently bonded atoms. Many macromolecules are polymers of smaller molecules called monomers. The ...
s that assume characteristic three-dimensional structures. However, large RNA molecules also form characteristic tertiary structures, which are mediated primarily by
hydrogen bond In chemistry, a hydrogen bond (or H-bond) is a primarily electrostatic force of attraction between a hydrogen (H) atom which is covalently bound to a more electronegative "donor" atom or group (Dn), and another electronegative atom bearing a l ...
s formed between
base pair A base pair (bp) is a fundamental unit of double-stranded nucleic acids consisting of two nucleobases bound to each other by hydrogen bonds. They form the building blocks of the DNA double helix and contribute to the folded structure of both D ...
s as well as base stacking. Functionally similar noncoding RNA molecules can be especially difficult to extract from
genomics Genomics is an interdisciplinary field of biology focusing on the structure, function, evolution, mapping, and editing of genomes. A genome is an organism's complete set of DNA, including all of its genes as well as its hierarchical, three-dim ...
data because structure is more strongly conserved than sequence in RNA as well as in proteins, and the more limited alphabet of RNA decreases the
information content In information theory, the information content, self-information, surprisal, or Shannon information is a basic quantity derived from the probability of a particular event occurring from a random variable. It can be thought of as an alternative wa ...
of any given
nucleotide Nucleotides are organic molecules consisting of a nucleoside and a phosphate. They serve as monomeric units of the nucleic acid polymers – deoxyribonucleic acid (DNA) and ribonucleic acid (RNA), both of which are essential biomolecu ...
at any given position. However, because of the increasing interest in RNA structures and because of the growth of the number of experimentally determined 3D RNA structures, few RNA structure similarity methods have been developed recently. One of those methods is, e.g., SETTER which decomposes each RNA structure into smaller parts called general secondary structure units (GSSUs). GSSUs are subsequently aligned and these partial alignments are merged into the final RNA structure alignment and scored. The method has been implemented into th
SETTER webserver
A recent method for pairwise structural alignment of RNA sequences with low sequence identity has been published and implemented in the progra
FOLDALIGN
However, this method is not truly analogous to protein structural alignment techniques because it computationally predicts the structures of the RNA input sequences rather than requiring experimentally determined structures as input. Although computational prediction of the
protein folding Protein folding is the physical process by which a protein chain is translated to its native three-dimensional structure, typically a "folded" conformation by which the protein becomes biologically functional. Via an expeditious and reproduc ...
process has not been particularly successful to date, RNA structures without
pseudoknot __NOTOC__ A pseudoknot is a nucleic acid secondary structure containing at least two stem-loop structures in which half of one stem is intercalated between the two halves of another stem. The pseudoknot was first recognized in the Turnip yellow m ...
s can often be sensibly predicted using free energy-based scoring methods that account for base pairing and stacking.


Software

Choosing a software tool for structural alignment can be a challenge due to the large variety of available packages that differ significantly in methodology and reliability. A partial solution to this problem was presented in and made publicly accessible through the ProCKSI webserver. A more complete list of currently available and freely distributed structural alignment software can be found in
structural alignment software This list of structural comparison and alignment software is a compilation of software tools and web portals used in pairwise or multiple structural comparison and structural alignment. Structural comparison and alignment Key map: * Class: :* C� ...
. Properties of some structural alignment servers and software packages are summarized and tested with examples a
Structural Alignment Tools in Proteopedia.Org


See also

*
Multiple sequence alignment Multiple sequence alignment (MSA) may refer to the process or the result of sequence alignment of three or more biological sequences, generally protein, DNA, or RNA. In many cases, the input set of query sequences are assumed to have an evolutio ...
*
List of sequence alignment software This list of sequence alignment software is a compilation of software tools and web portals used in pairwise sequence alignment and multiple sequence alignment. See structural alignment software for structural alignment of proteins. Database searc ...
* Structural Classification of Proteins * SuperPose *
Protein superfamily A protein superfamily is the largest grouping (clade) of proteins for which common ancestry can be inferred (see homology). Usually this common ancestry is inferred from structural alignment and mechanistic similarity, even if no sequence similar ...


References


Further reading

* Bourne PE, Shindyalov IN. (2003): ''Structure Comparison and Alignment''. In: Bourne, P.E., Weissig, H. (Eds): ''Structural Bioinformatics''. Hoboken NJ: Wiley-Liss. * Yuan X, Bystroff C. (2004) "Non-sequential Structure-based Alignments Reveal Topology-independent Core Packing Arrangements in Proteins", ''Bioinformatics''. Nov 5, 2004 * * * {{DEFAULTSORT:Structural Alignment Protein methods