Protein secondary structure is the three-dimensional form of ''local segments'' of proteins. The two most common
secondary structural elements are
alpha helices and
beta sheets, though
beta turns and
omega loops occur as well. Secondary structure elements typically spontaneously form as an intermediate before the protein
folds into its three-dimensional
tertiary structure.
Secondary structure is formally defined by the pattern of
hydrogen bonds between the
amino hydrogen and
carboxyl oxygen atoms in the peptide
backbone. Secondary structure may alternatively be defined based on the regular pattern of backbone
dihedral angles in a particular region of the
Ramachandran plot regardless of whether it has the correct hydrogen bonds.
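The backbone dihedral angles used in this second definition (φ over the atoms C–N–Cα–C and ψ over N–Cα–C–N) are ordinary torsion angles defined by four consecutive atoms. The following is a minimal sketch of that calculation, assuming atom positions are given as plain (x, y, z) tuples; the function name `dihedral` is illustrative and not taken from any particular package.

```python
import math

def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees for four 3D points p0-p1-p2-p3."""
    def sub(a, b):
        return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

    def dot(a, b):
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    b0 = sub(p1, p0)   # first bond vector
    b1 = sub(p2, p1)   # central bond vector (rotation axis)
    b2 = sub(p3, p2)   # last bond vector
    n1 = cross(b0, b1)  # normal of the plane through p0, p1, p2
    n2 = cross(b1, b2)  # normal of the plane through p1, p2, p3
    x = dot(n1, n2)
    y = math.sqrt(dot(b1, b1)) * dot(b0, n2)
    return math.degrees(math.atan2(y, x))

# A planar "trans" arrangement gives 180 degrees, a "cis" one gives 0.
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0))
cis = dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0))
```

Plotting the resulting (φ, ψ) pair for each residue gives that residue's position on the Ramachandran plot.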
The concept of secondary structure was first introduced by
Kaj Ulrik Linderstrøm-Lang at
Stanford in 1952.
Other types of
biopolymers such as
nucleic acids also possess characteristic
secondary structures.
Types
The most common secondary structures are
alpha helices and
beta sheets. Other helices, such as the
3₁₀ helix and π helix, are calculated to have energetically favorable hydrogen-bonding patterns but are rarely observed in natural proteins except at the ends of α helices due to unfavorable backbone packing in the center of the helix. Other extended structures such as the
polyproline helix and
alpha sheet are rare in
native state proteins but are often hypothesized as important
protein folding intermediates. Tight
turns and loose, flexible loops link the more "regular" secondary structure elements. The
random coil is not a true secondary structure, but is the class of conformations that indicate an absence of regular secondary structure.
Amino acids vary in their ability to form the various secondary structure elements.
Proline and
glycine are sometimes known as "helix breakers" because they disrupt the regularity of the α helical backbone conformation; however, both have unusual conformational abilities and are commonly found in
turns. Amino acids that prefer to adopt
helical conformations in proteins include
methionine,
alanine,
leucine,
glutamate and
lysine ("MALEK" in
amino-acid 1-letter codes); by contrast, the large aromatic residues (tryptophan, tyrosine and phenylalanine) and Cβ-branched amino acids (isoleucine, valine, and threonine) prefer to adopt β-strand conformations. However, these preferences are not strong enough to produce a reliable method of predicting secondary structure from sequence alone.
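As a toy illustration of these residue preferences, the sketch below simply counts how many residues in a sequence belong to the helix-favoring group (M, A, L, E, K) and the strand-favoring group (W, Y, F, I, V, T) named above. The function name and the example sequence are made up, and the tally is not a predictor: as noted, such preferences are too weak to predict secondary structure reliably.

```python
# Residue groups taken from the text, in one-letter codes.
HELIX_FAVORING = set("MALEK")    # Met, Ala, Leu, Glu, Lys
STRAND_FAVORING = set("WYFIVT")  # Trp, Tyr, Phe, Ile, Val, Thr

def tally_preferences(sequence):
    """Count residues in each preference group (hypothetical helper)."""
    seq = sequence.upper()
    helix = sum(1 for aa in seq if aa in HELIX_FAVORING)
    strand = sum(1 for aa in seq if aa in STRAND_FAVORING)
    return {
        "helix_favoring": helix,
        "strand_favoring": strand,
        "other": len(seq) - helix - strand,
    }

# Made-up example sequence, for illustration only.
counts = tally_preferences("MALEKGPVIT")
```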
Low-frequency collective vibrations are thought to be sensitive to local rigidity within proteins, revealing beta structures to be generically more rigid than alpha or disordered proteins. Neutron scattering measurements have directly connected the spectral feature at ~1 THz to collective motions of the secondary structure of the beta-barrel protein GFP.
Hydrogen bonding patterns in secondary structures may be significantly distorted, which makes automatic determination of secondary structure difficult. There are several methods for formally defining protein secondary structure (e.g., DSSP, DEFINE, STRIDE, ScrewFit, and SST).
DSSP classification
The Dictionary of Protein Secondary Structure, DSSP for short, is commonly used to describe protein secondary structure with single-letter codes. The secondary structure is assigned based on hydrogen bonding patterns such as those initially proposed by Pauling et al. in 1951 (before any
protein structure had ever been experimentally determined). There are eight types of secondary structure that DSSP defines:
* G = 3-turn helix (3₁₀ helix). Minimum length 3 residues.
* H = 4-turn helix (α helix). Minimum length 4 residues.
* I = 5-turn helix (π helix). Minimum length 5 residues.
* T = hydrogen-bonded turn (3, 4 or 5 turn)
* E = extended strand in parallel and/or anti-parallel β-sheet conformation. Minimum length 2 residues.
* B = residue in isolated β-bridge (single pair β-sheet hydrogen bond formation)
* S = bend (the only non-hydrogen-bond based assignment).
* C = coil (residues which are not in any of the above conformations).
'Coil' is often codified as ' ' (space), C (coil) or '–' (dash). The helices (G, H and I) and sheet conformations are all required to have a reasonable length. This means that 2 adjacent residues in the primary structure must form the same hydrogen bonding pattern. If the helix or sheet hydrogen bonding pattern is too short, the residues are designated as T or B, respectively. Other protein secondary structure assignment categories exist (sharp turns, omega loops, etc.), but they are less frequently used.
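Because coil may be written as a space, 'C' or a dash, it is often convenient to normalize a per-residue DSSP string before working with it. The sketch below assumes the assignment is available as a plain string of one-letter codes; the helper names `normalize` and `segments` are illustrative and not part of DSSP itself.

```python
from itertools import groupby

# Symbols that the text lists as alternative spellings of "coil"
# (space, hyphen, en dash, and the letter C).
COIL_SYMBOLS = {" ", "-", "\u2013", "C"}

def normalize(ss):
    """Rewrite every coil spelling as 'C', leaving other codes untouched."""
    return "".join("C" if ch in COIL_SYMBOLS else ch for ch in ss)

def segments(ss):
    """Split a per-residue string into (code, run_length) pairs."""
    return [(code, len(list(run))) for code, run in groupby(normalize(ss))]

# Example: two coil residues, a four-residue helix, one coil, a two-residue strand.
runs = segments("  HHHH-EE")
```

Run lengths obtained this way can be compared against the minimum lengths listed for G, H, I and E.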
Secondary structure is defined by
hydrogen bonding, so the exact definition of a hydrogen bond is critical. The standard hydrogen-bond definition for secondary structure is that of
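DSSP, which is a purely electrostatic model. It assigns charges of ±''q''₁ ≈ 0.42''e'' to the carbonyl carbon and oxygen, respectively, and charges of ±''q''₂ ≈ 0.20''e'' to the amide hydrogen and nitrogen, respectively. The electrostatic energy is
: $E = q_1 q_2 \left( \frac{1}{r_{ON}} + \frac{1}{r_{CH}} - \frac{1}{r_{OH}} - \frac{1}{r_{CN}} \right) \cdot 332\ \mathrm{kcal/mol}$
where each $r_{AB}$ is the distance in ångströms between atoms A and B, and $q_1 q_2 = 0.084$ in units of $e^2$. According to DSSP, a hydrogen bond exists if and only if ''E'' is less than −0.5 kcal/mol. Although the DSSP formula is a relatively crude approximation of the ''physical'' hydrogen-bond energy, it is generally accepted as a tool for defining secondary structure.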
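The DSSP criterion can be sketched in a few lines. The example assumes the published DSSP form of the energy, E = q₁q₂ (1/r_ON + 1/r_CH − 1/r_OH − 1/r_CN) · 332 kcal/mol with distances in ångströms, and the published cutoff of −0.5 kcal/mol. The function names and the idealized collinear test geometry are illustrative assumptions, not output of the DSSP program.

```python
import math

Q1 = 0.42          # partial charge on carbonyl C (+) and O (-), in units of e
Q2 = 0.20          # partial charge on amide H (+) and N (-), in units of e
FACTOR = 332.0     # converts e^2 / angstrom to kcal/mol
CUTOFF = -0.5      # kcal/mol; a bond is assigned when E is below this value

def dssp_hbond_energy(c, o, n, h):
    """Electrostatic C=O ... H-N energy in kcal/mol; coordinates in angstroms."""
    r_on = math.dist(o, n)
    r_ch = math.dist(c, h)
    r_oh = math.dist(o, h)
    r_cn = math.dist(c, n)
    return Q1 * Q2 * FACTOR * (1 / r_on + 1 / r_ch - 1 / r_oh - 1 / r_cn)

def is_hbond(c, o, n, h):
    """True if the energy falls below the cutoff."""
    return dssp_hbond_energy(c, o, n, h) < CUTOFF

# Idealized collinear geometry (an assumption for illustration):
# C=O bond 1.23 A, O...H distance 1.9 A, N-H bond 1.0 A.
c, o = (0.0, 0.0, 0.0), (1.23, 0.0, 0.0)
h, n = (3.13, 0.0, 0.0), (4.13, 0.0, 0.0)
energy = dssp_hbond_energy(c, o, n, h)
```

Moving the N–H group far from the C=O group drives the energy toward zero, so the pair no longer counts as hydrogen bonded.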
SST classification
SST is a Bayesian method to assign secondary structure to protein coordinate data using the Shannon information criterion of Minimum Message Length (MML) inference.
SST treats any assignment of secondary structure as a potential hypothesis that attempts to explain (compress) given protein coordinate data. The core idea is that the ''best'' secondary structural assignment is the one that can explain (compress) the coordinates of a given protein in the most economical way, thus linking the inference of secondary structure to lossless data compression. SST delineates any protein chain into regions associated with the following assignment types:
* E = (Extended) strand of a β-pleated sheet
* G = Right-handed 3₁₀ helix
* H = Right-handed α-helix
* I = Right-handed π-helix
* g = Left-handed 3₁₀ helix
* h = Left-handed α-helix
* i = Left-handed π-helix
* 3 = 3₁₀-like turn
* 4 = α-like turn
* 5 = π-like turn
* T = Unspecified turn
* C = Coil
* - = Unassigned residue
SST detects π and 3₁₀ helical caps to standard α-helices, and automatically assembles the various extended strands into consistent β-pleated sheets. It provides a readable output of dissected secondary structural elements, and a corresponding PyMol-loadable script to visualize the assigned secondary structural elements individually.
Experimental determination
The rough secondary-structure content of a biopolymer (e.g., "this protein is 40% α-helix and 20% β-sheet") can be estimated spectroscopically.
For proteins, a common method is far-ultraviolet (far-UV, 170–250 nm) circular dichroism. A pronounced double minimum at 208 and 222 nm indicates α-helical structure, whereas a single minimum at 204 nm or 217 nm reflects random-coil or β-sheet structure, respectively. A less common method is infrared spectroscopy, which detects differences in the bond oscillations of amide groups due to hydrogen bonding. Finally, secondary-structure contents may be estimated accurately using the chemical shifts of an initially unassigned NMR spectrum.
Prediction
Predicting protein tertiary structure from only its amino acid sequence is a very challenging problem (see
protein structure prediction), but using the simpler secondary structure definitions is more tractable.
Early methods of secondary-structure prediction were restricted to predicting the three predominant states: helix, sheet, or random coil. These methods were based on the helix- or sheet-forming propensities of individual amino acids, sometimes coupled with rules for estimating the free energy of forming secondary structure elements. The first widely used techniques to predict protein secondary structure from the amino acid sequence were the Chou–Fasman method and the GOR method.
Although such methods claimed to achieve ~60% accuracy in predicting which of the three states (helix/sheet/coil) a residue adopts, blind computing assessments later showed that the actual accuracy was much lower.
A significant increase in accuracy (to nearly 80%) was made by exploiting multiple sequence alignment; knowing the full distribution of amino acids that occur at a position (and in its vicinity, typically ~7 residues on either side) throughout evolution provides a much better picture of the structural tendencies near that position.
For illustration, a given protein might have a
glycine at a given position, which by itself might suggest a random coil there. However, multiple sequence alignment might reveal that helix-favoring amino acids occur at that position (and nearby positions) in 95% of homologous proteins spanning nearly a billion years of evolution. Moreover, by examining the average
hydrophobicity at that and nearby positions, the same alignment might also suggest a pattern of residue
solvent accessibility consistent with an α-helix. Taken together, these factors would suggest that the glycine of the original protein adopts α-helical structure, rather than random coil. Several types of methods are used to combine all the available data to form a 3-state prediction, including
neural networks, hidden Markov models and support vector machines. Modern prediction methods also provide a confidence score for their predictions at every position.
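The alignment-derived input described above can be sketched as a per-column amino-acid frequency table collected over a window around the position of interest. The example assumes the alignment is a list of equal-length strings with '-' marking gaps; the function names and the tiny alignment are made up for illustration, and real predictors use more elaborate profiles.

```python
from collections import Counter

def column_frequencies(alignment, position):
    """Fraction of each amino acid in one alignment column, ignoring gaps."""
    column = [seq[position] for seq in alignment if seq[position] != "-"]
    if not column:
        return {}
    counts = Counter(column)
    return {aa: n / len(column) for aa, n in counts.items()}

def window_profile(alignment, center, half_width=7):
    """Column frequencies for up to half_width residues either side of center."""
    length = len(alignment[0])
    start = max(0, center - half_width)
    end = min(length, center + half_width + 1)
    return [column_frequencies(alignment, i) for i in range(start, end)]

# Made-up alignment: the query has glycine at index 2, while the
# homologs carry alanine or leucine (or a gap) at that position.
alignment = ["MAGEK", "MAAEK", "MALEK", "MA-EK"]
profile = window_profile(alignment, center=2, half_width=1)
```

A window of 7 residues on either side gives up to 15 columns per position, which would then be fed to a classifier such as a neural network.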
Secondary-structure prediction methods were evaluated by the Critical Assessment of protein Structure Prediction (CASP) experiments and continuously benchmarked, e.g. by EVA. Based on these tests, the most accurate methods were Psipred, SAM, PORTER, PROF, and SABLE.
The chief area for improvement appears to be the prediction of β-strands; residues confidently predicted as β-strand are likely to be so, but the methods are apt to overlook some β-strand segments (false negatives). There is likely an upper limit of ~90% prediction accuracy overall, due to the idiosyncrasies of the standard method (
DSSP) for assigning secondary-structure classes (helix/strand/coil) to PDB structures, against which the predictions are benchmarked.
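Benchmarking against DSSP usually means reducing its eight codes to three classes and counting per-residue agreement, a score conventionally called Q3. The sketch below assumes one common reduction (G, H, I to helix; E, B to strand; everything else to coil); other reductions are in use, and the choice affects the reported accuracy. The function names are illustrative.

```python
# One common 8-to-3 state reduction (an assumption; conventions differ).
THREE_STATE = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}

def to_three_state(dssp):
    """Map a DSSP string to helix (H), strand (E) or coil (C)."""
    return "".join(THREE_STATE.get(code, "C") for code in dssp)

def q3(predicted, observed_dssp):
    """Fraction of residues whose predicted 3-state class matches DSSP."""
    observed = to_three_state(observed_dssp)
    if len(predicted) != len(observed):
        raise ValueError("prediction and observation differ in length")
    if not observed:
        raise ValueError("empty sequences have no defined accuracy")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)

# Made-up example: the prediction extends the helix one residue too far.
score = q3("HHHHHEEECC", "GGGHTEEBS ")
```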
Accurate secondary-structure prediction is a key element in the prediction of
tertiary structure, in all but the simplest (
homology modeling) cases. For example, a confidently predicted pattern of six secondary structure elements βαββαβ is the signature of a
ferredoxin fold.
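To show how a per-residue prediction turns into such a signature, the sketch below collapses a 3-state string (H, E, C) into its sequence of elements and compares it with βαββαβ. The minimum run length of 3 used to discard very short runs is an arbitrary assumption, as are the function name and the example string.

```python
from itertools import groupby

def element_pattern(three_state, min_len=3):
    """Collapse a 3-state string into a string of alpha/beta elements.

    Runs shorter than min_len (an assumed threshold) are ignored.
    """
    symbols = {"H": "α", "E": "β"}
    pattern = []
    for state, run in groupby(three_state):
        if state in symbols and len(list(run)) >= min_len:
            pattern.append(symbols[state])
    return "".join(pattern)

# Made-up prediction with six elements separated by coil.
prediction = "CEEEECCHHHHHHCCEEEECCCEEEECHHHHHHCEEEEC"
looks_like_ferredoxin = element_pattern(prediction) == "βαββαβ"
```

Matching the element order alone is only a hint; fold assignment in practice also depends on how the strands pair and pack.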
Applications
Both protein and nucleic acid secondary structures can be used to aid in
multiple sequence alignment. These alignments can be made more accurate by the inclusion of secondary structure information in addition to simple sequence information. This is sometimes less useful in RNA because base pairing is much more highly conserved than sequence. Distant relationships between proteins whose primary structures are unalignable can sometimes be found by secondary structure.
It has been shown that α-helices are more stable, more robust to mutations, and more designable than β-strands in natural proteins; designing functional all-α proteins is therefore likely to be easier than designing proteins with both helices and strands. This has recently been confirmed experimentally.
See also
* Folding (chemistry)
* Nucleic acid secondary structure
* Translation
* Structural motif
* Protein circular dichroism data bank
* WHAT IF software
* List of protein secondary structure prediction programs
External links
NetSurfP – Secondary Structure and Surface Accessibility predictorPROFScrewFitPSSpredA multiple neural network training program for protein secondary structure prediction
Genesilico metaserverMetaserver which allows to run over 20 different secondary structure predictors by one click
SST webserver: An information-theoretic (compression-based) secondary structural assignment.