In molecular biology, an intrinsically disordered protein (IDP) is a protein that lacks a fixed or ordered three-dimensional structure, typically in the absence of its macromolecular interaction partners, such as other proteins or RNA. IDPs range from fully unstructured to partially structured and include random coil, molten globule-like aggregates, or flexible linkers in large multi-domain proteins. They are sometimes considered a separate class of proteins along with globular, fibrous and membrane proteins.
IDPs are a very large and functionally important class of proteins, and their discovery has disproved the idea that the three-dimensional structures of proteins must be fixed to accomplish their biological functions. For example, IDPs have been shown to participate in weak multivalent interactions that are highly cooperative and dynamic, lending them importance in DNA regulation and in cell signaling. Many IDPs can also adopt a fixed three-dimensional structure after binding to other macromolecules. Overall, IDPs differ from structured proteins in many ways and tend to have distinctive function, structure, sequence, interactions, evolution and regulation.
History
In the 1930s-1950s, the first protein structures were solved by protein crystallography. These early structures suggested that a fixed three-dimensional structure might be generally required to mediate the biological functions of proteins. These publications solidified the sequence-structure-function paradigm, in which the amino acid sequence of a protein determines its structure, which in turn determines its function. In 1950, Karush wrote about 'Configurational Adaptability', contradicting this assumption. He was convinced that proteins have more than one configuration at the same energy level and can choose one when binding to other substrates. In the 1960s, Levinthal's paradox suggested that a systematic conformational search of a long polypeptide is unlikely to yield a single folded protein structure on biologically relevant timescales (i.e. microseconds to minutes). Curiously, for many (small) proteins or protein domains, relatively rapid and efficient refolding can be observed in vitro. As stated in Anfinsen's dogma from 1973, the fixed 3D structure of these proteins is uniquely encoded in their primary structure (the amino acid sequence), is kinetically accessible and stable under a range of (near) physiological conditions, and can therefore be considered the native state of such "ordered" proteins.
During the subsequent decades, however, many large protein regions could not be assigned in X-ray datasets, indicating that they occupy multiple positions, which average out in electron density maps. The lack of fixed, unique positions relative to the crystal lattice suggested that these regions were "disordered". Nuclear magnetic resonance (NMR) spectroscopy of proteins also demonstrated the presence of large flexible linkers and termini in many solved structural ensembles.
In 2001, Dunker questioned whether this newly found information had been ignored for 50 years; more quantitative analyses became available in the 2000s.
In the 2010s it became clear that IDPs are common among disease-related proteins, such as alpha-synuclein and tau.
Abundance
It is now generally accepted that proteins exist as an ensemble of similar structures, with some regions more constrained than others. IDPs occupy the extreme end of this spectrum of flexibility and include proteins with a considerable tendency toward local structure as well as flexible multidomain assemblies.
Bioinformatic predictions indicated that intrinsic disorder is more common in genomes and proteomes than in known structures in the Protein Data Bank. Based on DISOPRED2 prediction, long (>30 residue) disordered segments occur in 2.0% of archaean, 4.2% of eubacterial and 33.0% of eukaryotic proteins, including certain disease-related proteins.
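Proteome-wide percentages of this kind come from counting proteins that contain at least one long disordered run in a per-residue prediction. The sketch below shows only that counting step, assuming each protein's prediction has already been converted into a list of booleans (True = disordered); parsing the output of DISOPRED2 or any other predictor is not shown, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def long_disordered_segments(mask: Sequence[bool], min_len: int = 31) -> List[Tuple[int, int]]:
    """Return half-open (start, end) index pairs of disordered runs of at least min_len residues."""
    segments: List[Tuple[int, int]] = []
    start = None  # index where the current disordered run began, if any
    for i, disordered in enumerate(mask):
        if disordered and start is None:
            start = i
        elif not disordered and start is not None:
            if i - start >= min_len:
                segments.append((start, i))
            start = None
    # A run that reaches the C-terminus is closed here.
    if start is not None and len(mask) - start >= min_len:
        segments.append((start, len(mask)))
    return segments


def fraction_with_long_disorder(masks: Sequence[Sequence[bool]], min_len: int = 31) -> float:
    """Fraction of proteins (one mask each) containing at least one long disordered segment."""
    if not masks:
        return 0.0
    hits = sum(1 for m in masks if long_disordered_segments(m, min_len))
    return hits / len(masks)
```

The default `min_len=31` encodes the ">30 residue" cutoff quoted above; changing it shows how sensitive such percentages are to the chosen length threshold.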
Biological roles
Highly dynamic disordered regions of proteins have been linked to functionally important phenomena such as allosteric regulation and enzyme catalysis.
In many disordered proteins, binding affinity for their receptors is regulated by post-translational modification. It has therefore been proposed that the flexibility of disordered proteins accommodates the different conformational requirements for binding both the modifying enzymes and their receptors. Intrinsic disorder is particularly enriched in proteins implicated in cell signaling, transcription and chromatin remodeling functions. Genes that have recently been born de novo tend to encode proteins with higher disorder.
Flexible linkers
Disordered regions are often found as flexible linkers or loops connecting domains. Linker sequences vary greatly in length but are typically rich in polar uncharged amino acids. Flexible linkers allow the connected domains to twist and rotate freely to recruit their binding partners via protein domain dynamics. They also allow their binding partners to induce larger-scale conformational changes by long-range allostery.
The flexible linker connecting the two domains of FKBP25 is important for FKBP25 binding to DNA.
Linear motifs
Linear motifs are short disordered segments of proteins that mediate functional interactions with other proteins or other biomolecules (RNA, DNA, sugars, etc.). Many roles of linear motifs are associated with cell regulation, for instance in the control of cell shape, the subcellular localisation of individual proteins and regulated protein turnover. Post-translational modifications such as phosphorylation often tune the affinity of individual linear motifs for specific interactions, not infrequently by several orders of magnitude. Relatively rapid evolution and a relatively small number of structural restraints for establishing novel (low-affinity) interfaces make linear motifs particularly challenging to detect, but their widespread biological roles, and the fact that many viruses mimic or hijack linear motifs to efficiently recode infected cells, underline the importance of research on this topic. Unlike globular proteins, IDPs do not have spatially disposed active pockets. Nevertheless, in 80% of the IDPs (about three dozen) subjected to detailed structural characterization by NMR, there are linear motifs termed PreSMos (pre-structured motifs), which are transient secondary structural elements primed for target recognition. In several cases it has been demonstrated that these transient structures become full and stable secondary structures, e.g. helices, upon target binding. Hence, PreSMos are the putative active sites in IDPs.
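Because linear motifs are short and sequence-defined, a common first pass is a regular-expression scan restricted to predicted disordered regions. The sketch below illustrates that idea only. The pattern `RR.[ST]` (a basophilic, PKA-type phosphorylation consensus) is an illustrative assumption, not a curated motif definition, and the disorder mask is assumed to come from some external predictor; the function name is hypothetical.

```python
import re
from typing import List, Pattern, Sequence, Tuple

# Illustrative pattern only (assumption): R-R-x-[S/T], a commonly cited basophilic
# kinase consensus. Curated motif resources use more specific definitions.
EXAMPLE_MOTIF = re.compile(r"RR.[ST]")


def motif_hits_in_disorder(
    sequence: str,
    disorder_mask: Sequence[bool],
    pattern: Pattern[str] = EXAMPLE_MOTIF,
) -> List[Tuple[int, str]]:
    """Return (start, matched text) for motif matches lying entirely in disordered residues."""
    if len(sequence) != len(disorder_mask):
        raise ValueError("sequence and disorder_mask must have the same length")
    hits = []
    for m in pattern.finditer(sequence):  # non-overlapping matches, left to right
        if all(disorder_mask[m.start():m.end()]):
            hits.append((m.start(), m.group(0)))
    return hits
```

Requiring every motif residue to be disordered is a deliberately strict filter; it discards matches buried in folded domains, which are less likely to be accessible, at the cost of missing motifs at domain boundaries.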
Coupled folding and binding
Many unstructured proteins undergo transitions to more ordered states upon binding to their targets (e.g. Molecular Recognition Features (MoRFs)). The coupled folding and binding may be local, involving only a few interacting residues, or it might involve an entire protein domain. It has been shown that coupled folding and binding allows the burial of a large surface area that would be possible for fully structured proteins only if they were much larger. Moreover, certain disordered regions might serve as "molecular switches" that regulate biological functions by switching to an ordered conformation upon molecular recognition events such as small-molecule binding, DNA/RNA binding or ion interactions.
The ability of disordered proteins to bind, and thus to exert a function, shows that stability is not a required condition. Many short functional sites, for example short linear motifs, are over-represented in disordered proteins. Disordered proteins and short linear motifs are particularly abundant in many viruses, such as Hendra virus, HCV, HIV-1 and human papillomaviruses. This enables such viruses to overcome their informationally limited genomes by facilitating the binding and manipulation of a large number of host cell proteins.
Disorder in the bound state (fuzzy complexes)
Intrinsically disordered proteins can retain their conformational freedom even when they bind specifically to other proteins. The structural disorder in the bound state can be static or dynamic. In fuzzy complexes, structural multiplicity is required for function, and manipulation of the bound disordered region changes activity. The conformational ensemble of the complex is modulated via post-translational modifications or protein interactions. The specificity of DNA-binding proteins often depends on the length of fuzzy regions, which is varied by alternative splicing. Some fuzzy complexes may exhibit high binding affinity, although other studies have reported different affinity values for the same system in a different concentration regime.
Structural aspects
Intrinsically disordered proteins adopt many different structures in vivo according to the cell's conditions, creating a structural or conformational ensemble. Therefore, their structures are strongly function-related. However, only a few proteins are fully disordered in their native state. Disorder is mostly found in intrinsically disordered regions (IDRs) within an otherwise well-structured protein. The term intrinsically disordered protein (IDP) therefore includes proteins that contain IDRs as well as fully disordered proteins.
The existence and kind of protein disorder are encoded in the amino acid sequence.
In general, IDPs are characterized by a low content of bulky hydrophobic amino acids and a high proportion of polar and charged amino acids, usually referred to as low hydrophobicity. This property leads to favorable interactions with water. Furthermore, high net charges promote disorder because of electrostatic repulsion between like-charged residues. Thus, disordered sequences cannot sufficiently bury a hydrophobic core to fold into stable globular proteins. In some cases, hydrophobic clusters in disordered sequences provide the clues for identifying the regions that undergo coupled folding and binding (see Biological roles). Many disordered proteins reveal regions without any regular secondary structure. These regions can be termed flexible, in contrast to structured loops: while the latter are rigid and contain only one set of Ramachandran angles, IDPs involve multiple sets of angles.
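The combination of low hydrophobicity and high net charge can be turned into a simple whole-sequence test. The sketch below follows the idea of the charge-hydropathy plot of Uversky and colleagues, which is not described in this article: the boundary constants (2.785 and 1.151) are taken from that work, while averaging per-residue Kyte-Doolittle values rescaled to 0-1 (instead of the original windowed averaging) and treating histidine as neutral are simplifying assumptions. Function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values per residue.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_normalized_hydropathy(seq: str) -> float:
    """Mean Kyte-Doolittle value rescaled from [-4.5, 4.5] to [0, 1] (simplified, no window)."""
    seq = seq.upper()
    return sum((KD[aa] + 4.5) / 9.0 for aa in seq) / len(seq)


def mean_net_charge(seq: str) -> float:
    """Absolute net charge per residue near neutral pH, counting K/R as +1 and D/E as -1."""
    seq = seq.upper()
    positive = sum(seq.count(aa) for aa in "KR")
    negative = sum(seq.count(aa) for aa in "DE")
    return abs(positive - negative) / len(seq)


def charge_hydropathy_call(seq: str) -> str:
    """Classify a sequence by whether it falls below the charge-hydropathy boundary."""
    h = mean_normalized_hydropathy(seq)
    r = mean_net_charge(seq)
    boundary_h = (r + 1.151) / 2.785  # hydropathy below this value suggests disorder
    return "disordered" if h < boundary_h else "ordered"
```

This is a whole-protein heuristic: it cannot locate disordered regions within an otherwise folded protein, which is why per-residue predictors are generally preferred for IDRs.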
The term flexibility is also used for well-structured proteins, but describes a different phenomenon in the context of disordered proteins. Flexibility in structured proteins is bound to an equilibrium state, while it is not so in IDPs.
Many disordered proteins also contain low-complexity sequences, i.e. sequences with an over-representation of a few residues. While low-complexity sequences are a strong indication of disorder, the reverse is not necessarily true: not all disordered proteins have low-complexity sequences. Disordered proteins have a low content of predicted secondary structure.
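Sequence complexity is often quantified as the Shannon entropy of residue composition within a sliding window; a window dominated by a few residue types has low entropy. The sketch below is a simplified stand-in for dedicated low-complexity tools, not a reimplementation of any of them. The window length (12) and entropy threshold (2.2 bits) are assumptions, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import List


def window_entropy(window: str) -> float:
    """Shannon entropy (bits) of the residue composition of one window."""
    n = len(window)
    counts = Counter(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def low_complexity_mask(seq: str, window: int = 12, threshold: float = 2.2) -> List[bool]:
    """Mark every residue covered by at least one window whose entropy is below threshold."""
    mask = [False] * len(seq)
    for i in range(len(seq) - window + 1):
        if window_entropy(seq[i:i + window]) < threshold:
            for j in range(i, i + window):
                mask[j] = True
    return mask
```

A window of 12 distinct residues has the maximum entropy log2(12) ≈ 3.58 bits, while a homopolymeric stretch has 0 bits, so the threshold sits between those extremes. Consistent with the text above, a residue marked here is a hint of disorder, and an unmarked residue is not evidence of order.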
Experimental validation
IDPs can be validated in several contexts. Most approaches for the experimental validation of IDPs are restricted to extracted or purified proteins, while some newer experimental strategies aim to explore the in vivo conformations and structural variations of IDPs inside intact living cells and to systematically compare their dynamics in vivo and in vitro.
In vivo approaches
The first direct evidence for the in vivo persistence of intrinsic disorder was obtained by in-cell NMR after electroporation of a purified IDP and recovery of the cells to an intact state.
Larger-scale ''in vivo'' validation of IDR predictions is now possible using biotin 'painting'.
In vitro approaches
Intrinsically unfolded proteins, once purified, can be identified by various experimental methods. The primary method to obtain information on disordered regions of a protein is NMR spectroscopy. The lack of electron density in X-ray crystallographic studies may also be a sign of disorder.
Folded proteins have a high density (partial specific volume of 0.72-0.74 mL/g) and a commensurately small radius of gyration. Hence, unfolded proteins can be detected by methods that are sensitive to molecular size, density or hydrodynamic drag, such as size exclusion chromatography, analytical ultracentrifugation, small angle X-ray scattering (SAXS), and measurements of the diffusion constant. Unfolded proteins are also characterized by their lack of secondary structure, as assessed by far-UV (170-250 nm) circular dichroism (especially a pronounced minimum at ~200 nm) or infrared spectroscopy. Unfolded proteins also have backbone peptide groups exposed to solvent, so that they are readily cleaved by proteases, undergo rapid hydrogen-deuterium exchange and exhibit a small dispersion (<1 ppm) in their ¹H amide chemical shifts as measured by NMR. (Folded proteins typically show dispersions as large as 5 ppm for the amide protons.)
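The amide ¹H dispersion criterion can be applied directly to a list of assigned chemical shifts. The sketch below assumes dispersion is measured as the full range (maximum minus minimum) and uses the 1 ppm figure above as the cutoff; both choices, and the function names, are illustrative assumptions.

```python
from typing import Sequence


def amide_1h_dispersion(shifts_ppm: Sequence[float]) -> float:
    """Spread of amide 1H chemical shifts, taken here as max minus min (in ppm)."""
    if not shifts_ppm:
        raise ValueError("at least one chemical shift is required")
    return max(shifts_ppm) - min(shifts_ppm)


def looks_disordered(shifts_ppm: Sequence[float], cutoff_ppm: float = 1.0) -> bool:
    """True if the amide 1H dispersion is below the cutoff, as expected for unfolded proteins."""
    return amide_1h_dispersion(shifts_ppm) < cutoff_ppm
```

Because the full range is sensitive to a single outlying peak, a percentile-based spread may be more robust for real spectra; the range is used here only because it matches the simple "dispersion" description in the text.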
Recently, new methods including fast parallel proteolysis (FASTpp) have been introduced, which allow the fraction of folded versus disordered protein to be determined without the need for purification.
Even subtle differences in the stability of missense mutants, protein partner binding and (self-)polymerisation-induced folding of (e.g.) coiled coils can be detected using FASTpp, as recently demonstrated for the tropomyosin-troponin protein interaction.
Fully unstructured protein regions can be experimentally validated by their hypersusceptibility to proteolysis using short digestion times and low protease concentrations.
Bulk methods to study IDP structure and dynamics include SAXS for ensemble shape information, NMR for atomistic ensemble refinement, fluorescence for visualising molecular interactions and conformational transitions, X-ray crystallography to highlight more mobile regions in otherwise rigid protein crystals, cryo-EM to reveal less fixed parts of proteins, light scattering to monitor size distributions of IDPs or their aggregation kinetics, and NMR chemical shifts and circular dichroism to monitor the secondary structure of IDPs.
Single-molecule methods to study IDPs include spFRET to study the conformational flexibility of IDPs and the kinetics of structural transitions, optical tweezers for high-resolution insights into the ensembles of IDPs and their oligomers or aggregates, nanopores to reveal global shape distributions of IDPs, magnetic tweezers to study structural transitions for long times at low forces, and high-speed AFM to visualise the spatio-temporal flexibility of IDPs directly.
Disorder annotation
Intrinsic disorder can be either annotated from experimental information or predicted with specialized software.
Disorder prediction algorithms can predict intrinsic disorder (ID) propensity with high accuracy (approaching 80%) based on primary sequence composition, similarity to unassigned segments in protein X-ray datasets, flexible regions in NMR studies and the physico-chemical properties of amino acids.
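Accuracy figures for disorder predictors depend on how they are computed. Disordered residues are usually a minority in benchmark sets, so raw per-residue accuracy can look high even for a predictor that calls everything ordered; balanced accuracy (the mean of sensitivity and specificity) is one common way to correct for this. The sketch below compares a per-residue prediction with an experimental annotation, both assumed to be boolean lists (True = disordered); it is not the scoring protocol behind the figure quoted above, and the function name is hypothetical.

```python
from typing import Dict, Sequence


def disorder_scores(pred: Sequence[bool], truth: Sequence[bool]) -> Dict[str, float]:
    """Per-residue accuracy, sensitivity, specificity and balanced accuracy."""
    if len(pred) != len(truth) or not truth:
        raise ValueError("pred and truth must be non-empty and of equal length")
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    tn = sum(1 for p, t in zip(pred, truth) if not p and not t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # fraction of disordered residues found
    specificity = tn / (tn + fp) if tn + fp else 0.0  # fraction of ordered residues kept ordered
    return {
        "accuracy": (tp + tn) / len(truth),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }
```

For example, with 2 disordered and 8 ordered residues, a predictor that labels every residue as ordered reaches 0.8 raw accuracy but only 0.5 balanced accuracy.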
Disorder databases
Databases have been established to annotate protein sequences with intrinsic disorder information. The DisProt database contains a collection of manually curated protein segments which have been experimentally determined to be disordered.
MobiDB is a database combining experimentally curated disorder annotations (e.g. from DisProt) with data derived from missing residues in X-ray crystallographic structures and flexible regions in NMR structures.
Predicting IDPs by sequence
Separating disordered from ordered proteins is essential for disorder prediction. One of the first steps to find a factor that distinguishes IDPs from non-IDPs is to specify biases within the amino acid composition. The amino acids A, R, G, Q, S, P, E and K, which are mostly hydrophilic and/or charged, have been characterized as disorder-promoting, while the order-promoting amino acids W, C, F, I, Y, V, L and N are uncharged and mostly hydrophobic. The remaining amino acids H, M, T and D are ambiguous, found in both ordered and unstructured regions.
A more recent analysis ranked amino acids by their propensity to form disordered regions as follows (order promoting to disorder promoting): W, F, Y, I, M, L, V, N, C, T, A, G, R, D, H, Q, K, S, E, P.
This information is the basis of most sequence-based predictors. Regions with little to no secondary structure, also known as NORS (NO Regular Secondary structure) regions, and low-complexity regions can easily be detected. However, not all disordered proteins contain such low complexity sequences.
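Low-complexity regions are often detected by measuring how diverse the residues in a short window are. The sketch below uses the Shannon entropy of each window as a simplified stand-in for such a complexity measure (it is not the algorithm of any particular tool), flags windows below a hypothetical entropy cut-off, and merges overlapping hits into regions.

```python
import math
from collections import Counter


def shannon_entropy(segment: str) -> float:
    """Shannon entropy, in bits, of the residue composition of a segment."""
    n = len(segment)
    return sum((c / n) * math.log2(n / c) for c in Counter(segment).values())


def low_complexity_windows(sequence: str, window: int = 12,
                           max_entropy: float = 2.2) -> list[tuple[int, int]]:
    """Half-open (start, end) windows whose entropy is below max_entropy.

    Both defaults are illustrative assumptions, not values from a published tool."""
    seq = sequence.upper()
    return [
        (start, start + window)
        for start in range(len(seq) - window + 1)
        if shannon_entropy(seq[start:start + window]) < max_entropy
    ]


def merge_intervals(intervals: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Merge overlapping or touching half-open intervals into contiguous regions."""
    merged: list[tuple[int, int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

For example, a polyglutamine stretch embedded between two copies of all 20 amino acids is reported as a single low-complexity region that extends slightly into its flanks.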
Prediction methods
Determining disordered regions by biochemical methods is very costly and time-consuming. Because of the variable nature of IDPs, only certain aspects of their structure can be detected, so a full characterization requires many different methods and experiments, which further increases the cost of IDP determination. To overcome this obstacle, computer-based methods have been created for predicting protein structure and function. Deriving knowledge by prediction is one of the main goals of bioinformatics. Predictors for IDP function are also being developed, but they mainly use structural information such as linear motif sites.
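Linear motifs are short sequence patterns, commonly written as regular expressions. The sketch below scans a sequence for overlapping matches of such a pattern and optionally keeps only matches lying inside positions already predicted to be disordered. The example pattern P..P (two prolines separated by any two residues) is a simplified, illustrative motif, not an entry copied from a motif database, and the function names are hypothetical.

```python
import re


def find_motifs(sequence: str, pattern: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched text) for every match of `pattern`.

    Wrapping the pattern in a zero-width lookahead lets overlapping matches
    be reported; group 1 is the outer capture around the whole pattern."""
    regex = re.compile(f"(?=({pattern}))")
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence.upper())]


def motifs_in_regions(sequence: str, pattern: str,
                      disordered: set[int]) -> list[tuple[int, str]]:
    """Keep only matches whose every position is in the set of disordered positions."""
    return [
        (start, text)
        for start, text in find_motifs(sequence, pattern)
        if all(i in disordered for i in range(start, start + len(text)))
    ]
```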
There are different approaches for predicting IDP structure, such as neural networks or matrix calculations, based on different structural and/or biophysical properties.
Many computational methods exploit sequence information to predict whether a protein is disordered. Notable examples of such software include IUPred and DISOPRED. Different methods may use different definitions of disorder. Meta-predictors take a different approach, combining several primary predictors into a more accurate consensus predictor.
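A minimal sketch of the meta-predictor idea, assuming each primary predictor returns one disorder score between 0 and 1 per residue. Two simple combination rules are shown: majority voting on scores binarised at a threshold, and plain averaging. The predictor names and scores in any example are made up, and real meta-predictors may combine their inputs differently, for instance with learned weights.

```python
from statistics import mean


def majority_consensus(predictions: dict[str, list[float]],
                       threshold: float = 0.5) -> list[bool]:
    """Call a residue disordered when a strict majority of predictors score it
    above `threshold`."""
    lengths = {len(scores) for scores in predictions.values()}
    if len(lengths) != 1:
        raise ValueError("all predictors must score the same residues")
    n_predictors = len(predictions)
    return [
        sum(score > threshold for score in column) * 2 > n_predictors
        for column in zip(*predictions.values())  # one column per residue
    ]


def mean_consensus(predictions: dict[str, list[float]]) -> list[float]:
    """Average the per-residue scores of all predictors."""
    return [mean(column) for column in zip(*predictions.values())]
```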
Because the approaches to predicting disordered proteins differ, estimating their relative accuracy is fairly difficult. For example, neural networks are often trained on different datasets. The biennial CASP experiment has included a disorder prediction category, designed to test methods according to their accuracy in finding regions with missing 3D structure (marked in PDB files as REMARK 465, i.e. missing electron density in X-ray structures).
Disorder and disease
Intrinsically unstructured proteins have been implicated in a number of diseases.
Aggregation of misfolded proteins is the cause of many synucleinopathies and of toxicity, as those proteins start binding to each other randomly, and can lead to cancer or cardiovascular diseases. Misfolding can happen spontaneously because millions of copies of proteins are made during the lifetime of an organism. The aggregation of the intrinsically unstructured protein α-synuclein is thought to be responsible for synucleinopathies. The structural flexibility of this protein, together with its susceptibility to modification in the cell, leads to misfolding and aggregation. Genetics, oxidative and nitrative stress, and mitochondrial impairment all affect the structural flexibility of the unstructured α-synuclein protein and the associated disease mechanisms.
Many key tumour suppressors, for example p53 and BRCA1, have large intrinsically unstructured regions. These regions mediate many of the proteins' interactions. Taking the cell's native defense mechanisms as a model, drugs can be developed that block the binding sites of noxious substrates and inhibit them, thus counteracting the disease.
Computer simulations
Owing to their high structural heterogeneity, the experimental parameters obtained for IDPs by NMR or SAXS are averages over a large number of highly diverse disordered states (an ensemble of disordered states). Hence, to understand the structural implications of these parameters, the ensembles must be represented accurately by computer simulations. All-atom molecular dynamics simulations can be used for this purpose, but their use is limited by the accuracy of current force fields in representing disordered proteins. Nevertheless, some force fields have been developed explicitly for studying disordered proteins by optimising force-field parameters against available NMR data for disordered proteins (examples include CHARMM 22*, CHARMM 32 and Amber ff03*).
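To illustrate what an average over an ensemble means, the sketch below computes the radius of gyration of each conformer (the root-mean-square distance of its atoms from their centroid, with all atoms weighted equally) and then averages it over the ensemble with optional weights. Conformers are plain lists of (x, y, z) tuples here as a simplifying assumption; real analyses would read trajectories with a dedicated library. Whether an experiment corresponds to the mean of Rg or to the square root of the mean of Rg² has to be checked for the observable at hand, and the two generally differ.

```python
import math
from typing import Optional

Conformer = list[tuple[float, float, float]]


def radius_of_gyration(conformer: Conformer) -> float:
    """Unweighted radius of gyration: RMS distance of atoms from their centroid."""
    n = len(conformer)
    cx = sum(x for x, _, _ in conformer) / n
    cy = sum(y for _, y, _ in conformer) / n
    cz = sum(z for _, _, z in conformer) / n
    squared = sum((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 for x, y, z in conformer)
    return math.sqrt(squared / n)


def ensemble_average(values: list[float], weights: Optional[list[float]] = None) -> float:
    """Weighted average of a per-conformer observable (uniform weights by default)."""
    if weights is None:
        weights = [1.0] * len(values)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Two toy "conformers" with Rg = 1 and Rg = 2.
ensemble = [[(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)], [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)]]
rgs = [radius_of_gyration(c) for c in ensemble]
mean_rg = ensemble_average(rgs)                             # 1.5
rms_rg = math.sqrt(ensemble_average([r * r for r in rgs]))  # sqrt(2.5), about 1.58
```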
MD simulations restrained by experimental parameters (restrained MD) have also been used to characterise disordered proteins. In principle, an MD simulation with an accurate force field can sample the whole conformational space if it is run long enough. Because of the very high structural heterogeneity, however, the required time scales are very long and are limited by computational power. Other computational techniques, such as accelerated MD simulations, replica exchange simulations, metadynamics, multicanonical MD simulations, or methods using a coarse-grained representation with implicit and explicit solvents, have therefore been used to sample a broader conformational space on shorter time scales.
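As an example of one of these techniques, replica exchange runs copies of the system at several temperatures and periodically attempts to swap configurations between neighbouring temperatures. A swap between replicas i and j, with inverse temperatures β = 1/(k_B T) and potential energies E, is accepted with the Metropolis probability min(1, exp[(β_i - β_j)(E_i - E_j)]). The sketch below implements only this acceptance step; energies are assumed to be in kcal/mol and the function names are hypothetical.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K); energies assumed in kcal/mol


def swap_probability(energy_i: float, temp_i: float,
                     energy_j: float, temp_j: float) -> float:
    """Metropolis acceptance probability for exchanging the configurations
    held at temperatures temp_i and temp_j."""
    beta_i = 1.0 / (K_B * temp_i)
    beta_j = 1.0 / (K_B * temp_j)
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0 else math.exp(delta)


def attempt_neighbour_swaps(energies: list[float], temperatures: list[float],
                            rng: random.Random, offset: int = 0) -> list[tuple[int, int]]:
    """Try swaps between pairs (k, k+1), starting at `offset` (0 or 1).

    energies[k] is the energy of the configuration currently at temperatures[k].
    Accepted swaps exchange the energies in place; a real simulation would also
    exchange coordinates and rescale velocities. Returns the accepted pairs."""
    accepted = []
    for k in range(offset, len(temperatures) - 1, 2):
        p = swap_probability(energies[k], temperatures[k],
                             energies[k + 1], temperatures[k + 1])
        if rng.random() < p:
            energies[k], energies[k + 1] = energies[k + 1], energies[k]
            accepted.append((k, k + 1))
    return accepted
```

Alternating `offset` between 0 and 1 on successive exchange attempts lets every neighbouring pair of temperatures be tried, so configurations can diffuse across the whole temperature ladder.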
Moreover, various protocols and methods of analyzing IDPs, such as studies based on quantitative analysis of GC content in genes and their respective chromosomal bands, have been used to understand functional IDP segments.
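A minimal sketch of the quantitative core of such an analysis: GC content (the fraction of G and C among A, C, G and T) computed for a whole sequence and over sliding windows. The window size and step are arbitrary illustrative defaults, and ambiguous bases such as N are simply excluded from the denominator as a simplifying assumption.

```python
def gc_content(sequence: str) -> float:
    """Fraction of G and C among the unambiguous bases A, C, G and T."""
    seq = sequence.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("no A, C, G or T bases in sequence")
    return (seq.count("G") + seq.count("C")) / acgt


def windowed_gc(sequence: str, window: int = 100, step: int = 50) -> list[tuple[int, float]]:
    """(start, GC fraction) for each full window; a trailing partial window is skipped."""
    return [
        (start, gc_content(sequence[start:start + window]))
        for start in range(0, len(sequence) - window + 1, step)
    ]
```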
See also
* IDPbyNMR
* DisProt database
* MobiDB database
* Molten globule
* Random coil
* Dark proteome
External links
* Intrinsically disordered protein at Proteopedia
* MobiDB: a comprehensive database of intrinsic protein disorder annotations
* IDEAL - Intrinsically Disordered proteins with Extensive Annotations and Literature
* D2P2 Database of Disordered Protein Predictions
* First IDP journal covering all topics of IDP research
* IDP Journal
* Database of experimentally validated IDPs
* IDP ensemble database