Biomolecular structure is the intricate folded, three-dimensional shape that is formed by a molecule of protein, DNA, or RNA, and that is important to its function. The structure of these molecules may be considered at any of several length scales ranging from the level of individual atoms to the relationships among entire protein subunits. This useful distinction among scales is often expressed as a decomposition of molecular structure into four levels: primary, secondary, tertiary, and quaternary. The scaffold for this multiscale organization of the molecule arises at the secondary level, where the fundamental structural elements are the molecule's various hydrogen bonds. This leads to several recognizable ''domains'' of protein structure and nucleic acid structure, including such secondary-structure features as alpha helices and beta sheets for proteins, and hairpin loops, bulges, and internal loops for nucleic acids.
The terms ''primary'', ''secondary'', ''tertiary'', and ''quaternary structure'' were introduced by Kaj Ulrik Linderstrøm-Lang in his 1951 Lane Medical Lectures at Stanford University.
Primary structure
The primary structure of a biopolymer is the exact specification of its atomic composition and the chemical bonds connecting those atoms (including stereochemistry). For a typical unbranched, un-crosslinked biopolymer (such as a molecule of a typical intracellular protein, or of DNA or RNA), the primary structure is equivalent to specifying the sequence of its monomeric subunits, such as amino acids or nucleotides.
The primary structure of a protein is reported starting from the amino N-terminus to the carboxyl C-terminus, while the primary structure of a DNA or RNA molecule is known as the nucleic acid sequence and is reported from the 5' end to the 3' end.
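As a small illustration of the 5'-to-3' reporting convention, the sketch below computes the reverse complement of a DNA strand, so that the partner strand is also written 5' to 3'. The function name `reverse_complement` is an illustrative choice, and the sketch assumes an unambiguous DNA alphabet (A, C, G, T only).

```python
# Translation table mapping each DNA base to its Watson-Crick partner.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq_5_to_3: str) -> str:
    """Return the partner strand of a DNA sequence, also written 5' to 3'."""
    seq = seq_5_to_3.upper()
    if set(seq) - set("ACGT"):
        raise ValueError("sequence may contain only A, C, G and T")
    # Complement each base, then reverse so the result again reads 5' -> 3'.
    return seq.translate(_COMPLEMENT)[::-1]
```

Reversing after complementing matters because the two strands of a duplex run antiparallel: the base paired with the 5' end of one strand sits at the 3' end of the other.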
The nucleic acid sequence refers to the exact sequence of nucleotides that comprise the whole molecule. Often, the primary structure encodes sequence motifs that are of functional importance. Some examples of such motifs are: the C/D and H/ACA boxes of snoRNAs, the LSm binding site found in spliceosomal RNAs such as U1, U2, U4, U5, U6, U12 and U3, the Shine-Dalgarno sequence, the Kozak consensus sequence and the RNA polymerase III terminator.
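Locating a sequence motif in a primary structure can be sketched as a pattern search. The example below scans an RNA string for the core Shine-Dalgarno consensus (AGGAGG) using a regular expression. The helper `find_motif` and the mRNA fragment are invented for illustration, and real motif finders usually allow mismatches or use position weight matrices rather than exact matching.

```python
import re


def find_motif(sequence: str, motif_regex: str) -> list[int]:
    """Return 0-based start positions of every (possibly overlapping) match."""
    # A zero-width lookahead lets the regex engine report overlapping hits.
    pattern = re.compile(f"(?=({motif_regex}))")
    return [m.start() for m in pattern.finditer(sequence.upper())]


# Core Shine-Dalgarno consensus, written in the RNA alphabet.
SHINE_DALGARNO = "AGGAGG"

# Hypothetical mRNA fragment made up for this example.
mrna = "UUAAGGAGGUAACAUAUGGCU"
sd_hits = find_motif(mrna, SHINE_DALGARNO)
```

Because the motif is passed as a regular expression, a degenerate consensus can be expressed with character classes, for example `"[AG]GGAGG"`.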
Secondary structure

Secondary structure is the pattern of hydrogen bonds in a biopolymer. These bonds determine the general three-dimensional form of ''local segments'' of the biopolymer, but secondary structure does not describe the global arrangement of specific atomic positions in three-dimensional space, which is considered to be tertiary structure. Secondary structure is formally defined by the hydrogen bonds of the biopolymer, as observed in an atomic-resolution structure. In proteins, the secondary structure is defined by patterns of hydrogen bonds between backbone amine and carboxyl groups (sidechain–mainchain and sidechain–sidechain hydrogen bonds are irrelevant), where the DSSP definition of a hydrogen bond is used. The secondary structure of a nucleic acid is defined by the hydrogen bonding between the nitrogenous bases.
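The DSSP hydrogen-bond criterion mentioned above is an electrostatic energy between a backbone C=O group and a backbone N-H group. The sketch below evaluates it under the commonly quoted constants (partial charges of 0.42e and 0.20e, a factor of 332, and a cutoff of -0.5 kcal/mol). The function names and the coordinates used to try it out are illustrative assumptions, and the sketch does not reproduce the DSSP program itself.

```python
import math

# Constants of the DSSP electrostatic model (distances in angstroms).
Q1, Q2 = 0.42, 0.20   # partial charges, in units of the electron charge
F = 332.0             # dimensional factor that yields kcal/mol
CUTOFF = -0.5         # kcal/mol; lower energies count as a hydrogen bond


def dssp_hbond_energy(c, o, n, h):
    """Electrostatic energy between a C=O group and an N-H group.

    Each argument is an (x, y, z) coordinate tuple in angstroms.
    """
    r_on = math.dist(o, n)
    r_ch = math.dist(c, h)
    r_oh = math.dist(o, h)
    r_cn = math.dist(c, n)
    return Q1 * Q2 * F * (1 / r_on + 1 / r_ch - 1 / r_oh - 1 / r_cn)


def is_hbond(c, o, n, h):
    """True when the pair satisfies the energy cutoff."""
    return dssp_hbond_energy(c, o, n, h) < CUTOFF


# Idealised collinear geometry, invented for illustration: C=O ... H-N.
c_atom, o_atom = (0.0, 0.0, 0.0), (1.23, 0.0, 0.0)
h_atom, n_atom = (3.13, 0.0, 0.0), (4.13, 0.0, 0.0)
energy = dssp_hbond_energy(c_atom, o_atom, n_atom, h_atom)
```

Applying this test to every backbone C=O and N-H pair gives the bond pattern from which helices and sheets are then assigned.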
For proteins, however, the hydrogen bonding is correlated with other structural features, which has given rise to less formal definitions of secondary structure. For example, helices can adopt backbone dihedral angles in some regions of the Ramachandran plot; thus, a segment of residues with such dihedral angles is often called a ''helix'', regardless of whether it has the correct hydrogen bonds. Many other less formal definitions have been proposed, often applying concepts from the differential geometry of curves, such as curvature and torsion. Structural biologists solving a new atomic-resolution structure will sometimes assign its secondary structure ''by eye'' and record their assignments in the corresponding Protein Data Bank (PDB) file.
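The dihedral-angle definition of a helix described above can be sketched as a simple window test on backbone (phi, psi) pairs. The region centre of about (-60°, -45°) is the usual location of the right-handed alpha helix on the Ramachandran plot; the 30° tolerance and the minimum run length of four residues are arbitrary assumptions chosen for illustration, as are the function names.

```python
def in_helical_region(phi, psi, phi_center=-60.0, psi_center=-45.0, tolerance=30.0):
    """True if (phi, psi), in degrees, lies in a box around the alpha-helix region."""
    return abs(phi - phi_center) <= tolerance and abs(psi - psi_center) <= tolerance


def helical_segments(dihedrals, min_length=4):
    """Return (start, end) index pairs, end exclusive, of runs of helical residues."""
    segments = []
    start = None
    for i, (phi, psi) in enumerate(dihedrals):
        if in_helical_region(phi, psi):
            if start is None:
                start = i          # a new run begins here
        else:
            if start is not None and i - start >= min_length:
                segments.append((start, i))
            start = None
    # Close a run that extends to the end of the chain.
    if start is not None and len(dihedrals) - start >= min_length:
        segments.append((start, len(dihedrals)))
    return segments
```

Note that this test never inspects hydrogen bonds, which is exactly why such assignments can disagree with the formal hydrogen-bond-based definition.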
The secondary structure of a nucleic acid molecule refers to the base pairing interactions within one molecule or set of interacting molecules. The secondary structure of biological RNAs can often be uniquely decomposed into stems and loops. Often, these elements or combinations of them can be further classified, e.g. tetraloops, pseudoknots and stem loops. There are many secondary structure elements of functional importance to biological RNA. Famous examples include the Rho-independent terminator stem loops and the transfer RNA (tRNA) cloverleaf. Determining the secondary structure of RNA molecules is an active area of research. Approaches include both experimental and computational methods (see also the List of RNA structure prediction software).
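A base-pairing pattern of this kind is often written in dot-bracket notation, where matching parentheses mark paired positions and dots mark unpaired ones. The notation is not introduced in the text above and is used here as an assumption. The sketch below recovers the list of base pairs from such a string; `parse_dot_bracket` is an illustrative name.

```python
def parse_dot_bracket(structure: str) -> list[tuple[int, int]]:
    """Return sorted (i, j) index pairs, 0-based, for a dot-bracket string."""
    stack = []   # positions of '(' still waiting for a partner
    pairs = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


# A stem of four base pairs closing a loop of four unpaired bases.
hairpin_pairs = parse_dot_bracket("((((....))))")
```

A single bracket type can only express nested pairs, so pseudoknots need an extended notation with additional bracket characters, which this sketch rejects.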
Tertiary structure
The ''tertiary structure'' of a protein or any other macromolecule is its three-dimensional structure, as defined by the atomic coordinates. Proteins and nucleic acids fold into complex three-dimensional structures that give rise to the molecules' functions. While such structures are diverse and complex, they are often composed of recurring, recognizable tertiary structure motifs and domains that serve as molecular building blocks. Tertiary structure is considered to be largely determined by the biomolecule's primary structure (its sequence of amino acids or nucleotides).
Quaternary structure
The ''protein quaternary structure'' refers to the number and arrangement of multiple protein molecules in a multi-subunit complex.
For nucleic acids, the term is less common, but can refer to the higher-level organization of DNA in chromatin, including its interactions with histones, or to the interactions between separate RNA units in the ribosome or spliceosome.
Viruses, in general, can be regarded as molecular machines. Bacteriophage T4 is a particularly well studied virus and its protein quaternary structure is relatively well defined. A study by Floor (1970) showed that, during the ''in vivo'' construction of the virus by specific morphogenetic proteins, these proteins need to be produced in balanced proportions for proper assembly of the virus to occur. Insufficiency (due to mutation) in the production of one particular morphogenetic protein (e.g. a critical tail fiber protein) can lead to the production of progeny viruses almost all of which have too few of the particular protein component to properly function, i.e. to infect host cells. However, a second mutation that reduces another morphogenetic component (e.g. in the base plate or head of the phage) could in some cases restore a balance such that a higher proportion of the virus particles produced are able to function. Thus it was found that a mutation that reduces expression of one gene, whose product is employed in morphogenesis, may be partially suppressed by a mutation that reduces expression of a second morphogenetic gene, resulting in a more balanced production of the virus gene products. The concept that, ''in vivo'', a balanced availability of components is necessary for proper molecular morphogenesis may have general applicability for understanding the assembly of protein molecular machines.
Structure determination
Structure probing is the process by which biochemical techniques are used to determine biomolecular structure. The resulting data can be used to infer molecular structure, to support experimental analysis of molecular structure and function, and to guide the development of smaller molecules for further biological research. Structure probing can be carried out by several methods, including chemical probing, hydroxyl radical probing, nucleotide analog interference mapping (NAIM), and in-line probing.
Protein and nucleic acid structures can be determined using nuclear magnetic resonance spectroscopy (NMR), X-ray crystallography, or single-particle cryo-electron microscopy (cryo-EM). The first published reports of A-DNA X-ray diffraction patterns, and also of B-DNA patterns (by Rosalind Franklin and Raymond Gosling in 1953), used analyses based on Patterson function transforms that provided only a limited amount of structural information for oriented fibers of DNA isolated from calf thymus. An alternate analysis was then proposed by Wilkins et al. in 1953 for B-DNA X-ray diffraction and scattering patterns of hydrated, bacterial-oriented DNA fibers and trout sperm heads in terms of squares of Bessel functions. Although the ''B-DNA form'' is most common under the conditions found in cells, it is not a well-defined conformation but a family or fuzzy set of DNA conformations that occur at the high hydration levels present in a wide variety of living cells. Their corresponding X-ray diffraction and scattering patterns are characteristic of molecular paracrystals with a significant degree of disorder (over 20%), and the structure is not tractable using only the standard analysis.
In contrast, the standard analysis, involving only Fourier transforms of Bessel functions and DNA molecular models, is still routinely used to analyze A-DNA and Z-DNA X-ray diffraction patterns.
Structure prediction
Biomolecular structure prediction is the prediction of the three-dimensional structure of a protein from its amino acid sequence, or of a nucleic acid from its nucleobase (base) sequence. In other words, it is the prediction of secondary and tertiary structure from its primary structure. Structure prediction is the inverse of biomolecular design, as in rational design, protein design, nucleic acid design, and biomolecular engineering.
Protein structure prediction is one of the most important goals pursued by bioinformatics and theoretical chemistry. It is of high importance in medicine (for example, in drug design) and biotechnology (for example, in the design of novel enzymes). Every two years, the performance of current methods is assessed in the ''Critical Assessment of protein Structure Prediction'' (CASP) experiment.
There has also been a significant amount of bioinformatics research directed at the RNA structure prediction problem. A common problem for researchers working with RNA is to determine the three-dimensional structure of the molecule given only the nucleic acid sequence. However, in the case of RNA, much of the final structure is determined by the secondary structure or intra-molecular base-pairing interactions of the molecule. This is shown by the high conservation of base pairings across diverse species.
Secondary structure of small nucleic acid molecules is determined largely by strong, local interactions such as hydrogen bonds and base stacking. Summing the free energy for such interactions, usually using a nearest-neighbor method, provides an approximation for the stability of a given structure. The most straightforward way to find the lowest free energy structure would be to generate all possible structures and calculate the free energy for each of them, but the number of possible structures for a sequence increases exponentially with the length of the molecule. For longer molecules, exhaustive enumeration is therefore impractical.
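The combinatorial growth described above can be made concrete with a short counting recurrence. The sketch below counts pseudoknot-free secondary structures on a chain of `n` bases under two simplifying assumptions that are not from the text: any two bases are allowed to pair (so the result is an upper bound that ignores the actual sequence), and a hairpin loop must enclose at least `min_loop` unpaired bases. The function name is illustrative.

```python
def count_secondary_structures(n: int, min_loop: int = 3) -> int:
    """Count nested (pseudoknot-free) pairings of n bases, ignoring sequence."""
    counts = [1] * (n + 1)   # counts[0] = 1: the empty chain has one structure
    for length in range(1, n + 1):
        # Case 1: the last base is unpaired.
        total = counts[length - 1]
        # Case 2: the last base pairs with base k (1-based), which splits the
        # chain into k - 1 bases before the pair and length - k - 1 bases
        # enclosed by it; the enclosed part must hold at least min_loop bases.
        for k in range(1, length - min_loop):
            total += counts[k - 1] * counts[length - k - 1]
        counts[length] = total
    return counts[n]
```

The same decomposition, with counts replaced by energies and the sum replaced by a minimum, underlies dynamic-programming folding algorithms, which avoid enumerating structures one by one.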
Sequence covariation methods rely on the existence of a data set composed of multiple homologous RNA sequences with related but dissimilar sequences. These methods analyze the covariation of individual base sites in evolution; maintenance at two widely separated sites of a pair of base-pairing nucleotides indicates the presence of a structurally required hydrogen bond between those positions. The general problem of pseudoknot prediction has been shown to be NP-complete.
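One common way to quantify covariation between two alignment columns is their mutual information. This specific measure is an assumption here, since the text does not name one. The sketch below computes it for a toy alignment invented for illustration; it assumes equal-length sequences, treats every character (including gaps) as an ordinary symbol, and applies no correction for phylogenetic relatedness.

```python
import math
from collections import Counter


def column_mutual_information(alignment: list[str], i: int, j: int) -> float:
    """Mutual information, in bits, between alignment columns i and j."""
    n = len(alignment)
    col_i = Counter(seq[i] for seq in alignment)
    col_j = Counter(seq[j] for seq in alignment)
    joint = Counter((seq[i], seq[j]) for seq in alignment)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        # Ratio of the joint frequency to the product of the marginals.
        mi += p_ab * math.log2(p_ab / ((col_i[a] / n) * (col_j[b] / n)))
    return mi


# Toy alignment: columns 0 and 4 always hold complementary bases
# (G-C, C-G, A-U, U-A), while the middle columns never vary.
toy_alignment = ["GAAAC", "CAAAG", "AAAAU", "UAAAA"]
paired_mi = column_mutual_information(toy_alignment, 0, 4)
unpaired_mi = column_mutual_information(toy_alignment, 1, 2)
```

A high score marks two columns that change together, which is the signature of a conserved base pair; fully conserved columns score zero even if they are paired, so covariation analysis needs sequences that have actually diverged.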
Design
Biomolecular design can be considered the inverse of structure prediction. In structure prediction, the structure is determined from a known sequence, whereas, in protein or nucleic acid design, a sequence that will form a desired structure is generated.
Other biomolecules
Other biomolecules, such as polysaccharides, polyphenols and lipids, can also have higher-order structure of biological consequence.
See also
* Biomolecular
* Comparison of nucleic acid simulation software
* Gene structure
* List of RNA structure prediction software
* Non-coding RNA