Biomolecular structure is the intricate folded, three-dimensional shape that is formed by a molecule of protein, DNA, or RNA, and that is important to its function. The structure of these molecules may be considered at any of several length scales ranging from the level of individual atoms to the relationships among entire protein subunits. This useful distinction among scales is often expressed as a decomposition of molecular structure into four levels: primary, secondary, tertiary, and quaternary. The scaffold for this multiscale organization of the molecule arises at the secondary level, where the fundamental structural elements are the molecule's various hydrogen bonds. This leads to several recognizable ''domains'' of protein structure and nucleic acid structure, including such secondary-structure features as alpha helices and beta sheets for proteins, and hairpin loops, bulges, and internal loops for nucleic acids. The terms ''primary'', ''secondary'', ''tertiary'', and ''quaternary structure'' were introduced by Kaj Ulrik Linderstrøm-Lang in his 1951 Lane Medical Lectures at Stanford University.

Primary structure

The primary structure of a biopolymer is the exact specification of its atomic composition and the chemical bonds connecting those atoms (including stereochemistry). For a typical unbranched, un-crosslinked biopolymer (such as a molecule of a typical intracellular protein, or of DNA or RNA), the primary structure is equivalent to specifying the sequence of its monomeric subunits, such as amino acids or nucleotides.

The primary structure of a protein is reported from the amino N-terminus to the carboxyl C-terminus, while the primary structure of a DNA or RNA molecule, known as the nucleic acid sequence, is reported from the 5' end to the 3' end. The nucleic acid sequence is the exact sequence of nucleotides that make up the whole molecule. Often, the primary structure encodes sequence motifs that are of functional importance. Some examples of such motifs are the C/D and H/ACA boxes of snoRNAs, the LSm binding site found in spliceosomal RNAs such as U1, U2, U4, U5, U6, U12 and U3, the Shine-Dalgarno sequence, the Kozak consensus sequence and the RNA polymerase III terminator.
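Because primary structure reduces to a sequence written in a fixed direction, finding a sequence motif can be treated as a plain string search. The following is a minimal sketch only: the mRNA fragment is invented for illustration, `AGGAGG` is assumed here as the core of the Shine-Dalgarno consensus (real sites vary), and the function names are hypothetical.

```python
import re

# Hypothetical mRNA fragment written 5' -> 3' (invented for illustration).
MRNA = "GGCUAAGGAGGUAACAUAUGGCUAAAUGA"

# Assumed core of the Shine-Dalgarno consensus; real sites deviate from it.
SHINE_DALGARNO_CORE = "AGGAGG"


def find_motif(sequence: str, motif: str) -> list[int]:
    """Return 0-based start positions of every (possibly overlapping) exact hit."""
    # A zero-width lookahead lets overlapping occurrences be reported too.
    pattern = re.compile(f"(?={re.escape(motif)})")
    return [match.start() for match in pattern.finditer(sequence)]


def first_start_codon_after(sequence: str, position: int) -> int:
    """Return the index of the first AUG at or after position, or -1 if none."""
    return sequence.find("AUG", position)


hits = find_motif(MRNA, SHINE_DALGARNO_CORE)
for hit in hits:
    downstream = first_start_codon_after(MRNA, hit + len(SHINE_DALGARNO_CORE))
    print(f"motif at {hit}, next AUG at {downstream}")
```

Exact matching is the simplest possible choice; practical motif searches usually tolerate mismatches or use position weight matrices, which this sketch does not attempt.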

Secondary structure

The secondary structure of a biopolymer is the pattern of its hydrogen bonds. These determine the general three-dimensional form of ''local segments'' of the biopolymer, but do not describe the global structure of specific atomic positions in three-dimensional space, which is considered to be tertiary structure. Secondary structure is formally defined by the hydrogen bonds of the biopolymer, as observed in an atomic-resolution structure. In proteins, the secondary structure is defined by patterns of hydrogen bonds between backbone amide (N–H) and carbonyl (C=O) groups (sidechain–mainchain and sidechain–sidechain hydrogen bonds are irrelevant), where the DSSP definition of a hydrogen bond is used. The secondary structure of a nucleic acid is defined by the hydrogen bonding between the nitrogenous bases.

For proteins, however, the hydrogen bonding is correlated with other structural features, which has given rise to less formal definitions of secondary structure. For example, helices can adopt backbone dihedral angles in some regions of the Ramachandran plot; thus, a segment of residues with such dihedral angles is often called a ''helix'', regardless of whether it has the correct hydrogen bonds. Many other less formal definitions have been proposed, often applying concepts from the differential geometry of curves, such as curvature and torsion. Structural biologists solving a new atomic-resolution structure will sometimes assign its secondary structure ''by eye'' and record their assignments in the corresponding Protein Data Bank (PDB) file.

The secondary structure of a nucleic acid molecule refers to the base pairing interactions within one molecule or set of interacting molecules. The secondary structure of biological RNAs can often be uniquely decomposed into stems and loops. Often, these elements or combinations of them can be further classified, e.g. as tetraloops, pseudoknots and stem loops. Many secondary structure elements are of functional importance to biological RNA. Famous examples include the Rho-independent terminator stem loops and the transfer RNA (tRNA) cloverleaf. Considerable research effort goes into determining the secondary structure of RNA molecules. Approaches include both experimental and computational methods (see also the List of RNA structure prediction software).
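The decomposition of an RNA secondary structure into stems and loops can be sketched in a few lines. The example below assumes the structure is given in dot-bracket notation, a common convention not described in this article, in which matching parentheses mark paired bases and dots mark unpaired ones. The function names are hypothetical, and a single bracket type cannot represent pseudoknots.

```python
def parse_dot_bracket(structure: str) -> list[tuple[int, int]]:
    """Return base pairs (i, j), 0-based with i < j, sorted by i."""
    open_positions = []  # stack of indices of unmatched '('
    pairs = []
    for index, symbol in enumerate(structure):
        if symbol == "(":
            open_positions.append(index)
        elif symbol == ")":
            if not open_positions:
                raise ValueError(f"unmatched ')' at position {index}")
            pairs.append((open_positions.pop(), index))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {index}")
    if open_positions:
        raise ValueError(f"unmatched '(' at position {open_positions[-1]}")
    return sorted(pairs)


def group_into_stems(pairs: list[tuple[int, int]]) -> list[list[tuple[int, int]]]:
    """Group sorted pairs into stems: runs of stacked pairs (i, j), (i+1, j-1), ..."""
    stems = []
    for i, j in pairs:
        # A pair extends the current stem only if it stacks directly inside it.
        if stems and stems[-1][-1] == (i - 1, j + 1):
            stems[-1].append((i, j))
        else:
            stems.append([(i, j)])
    return stems


hairpin = "((((....))))"  # one stem of four pairs closing a four-base loop
for stem in group_into_stems(parse_dot_bracket(hairpin)):
    print(len(stem), "stacked pairs, outermost pair", stem[0])
```

Everything not covered by a stem is a loop region of some kind (hairpin loop, bulge, internal loop or multiloop); classifying those regions is left out to keep the sketch short.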

Tertiary structure

The ''tertiary structure'' of a protein or any other macromolecule is its three-dimensional structure, as defined by the atomic coordinates. Proteins and nucleic acids fold into complex three-dimensional structures, which give rise to the molecules' functions. While such structures are diverse and complex, they are often composed of recurring, recognizable tertiary structure motifs and domains that serve as molecular building blocks. Tertiary structure is considered to be largely determined by the biomolecule's primary structure (its sequence of amino acids or nucleotides).
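Since tertiary structure is defined by atomic coordinates, a simple way to summarize it is a contact map: the set of residue pairs that are far apart in sequence but close in space. The sketch below is illustrative only. The coordinates are invented rather than taken from a real structure, one representative atom per residue is assumed, and the 8 Å cutoff and minimum sequence separation are common conventions chosen here as assumptions.

```python
import math

# Hypothetical per-residue coordinates in angstroms (invented for illustration).
COORDS = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (7.6, 0.0, 0.0),
    (7.6, 3.8, 0.0),
    (3.8, 3.8, 0.0),
]


def contact_map(coords, cutoff=8.0, min_separation=3):
    """Return (i, j) pairs at least min_separation apart in sequence
    whose representative atoms lie closer than cutoff angstroms."""
    contacts = []
    for i in range(len(coords)):
        for j in range(i + min_separation, len(coords)):
            if math.dist(coords[i], coords[j]) < cutoff:
                contacts.append((i, j))
    return contacts


print(contact_map(COORDS))
```

In this toy chain the backbone doubles back on itself, so residues 0 and 1 come into contact with residue 4 even though they are separated in sequence; that kind of long-range contact is what distinguishes tertiary from secondary structure.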

Quaternary structure

The ''protein quaternary structure'' refers to the number and arrangement of multiple protein molecules in a multi-subunit complex. For nucleic acids, the term is less common, but can refer to the higher-level organization of DNA in chromatin, including its interactions with histones, or to the interactions between separate RNA units in the ribosome or spliceosome.

Structure determination

Structure probing is the process by which biochemical techniques are used to determine biomolecular structure. The results can be used to define patterns from which the molecular structure is inferred, to support experimental analysis of molecular structure and function, and to inform the development of smaller molecules for further biological research. Structure probing can be done through many different methods, including chemical probing, hydroxyl radical probing, nucleotide analog interference mapping (NAIM), and in-line probing.

Protein and nucleic acid structures can be determined using nuclear magnetic resonance spectroscopy (NMR), X-ray crystallography, or single-particle cryo-electron microscopy (cryoEM). The first published reports of A-DNA (and also B-DNA) X-ray diffraction patterns, by Rosalind Franklin and Raymond Gosling in 1953, used analyses based on Patterson function transforms that provided only a limited amount of structural information for oriented fibers of DNA isolated from calf thymus. An alternate analysis was then proposed by Wilkins et al. in 1953 for B-DNA X-ray diffraction and scattering patterns of hydrated, bacterial-oriented DNA fibers and trout sperm heads in terms of squares of Bessel functions.

Although the ''B-DNA form'' is most common under the conditions found in cells, it is not a well-defined conformation but a family or fuzzy set of DNA conformations that occur at the high hydration levels present in a wide variety of living cells. Their corresponding X-ray diffraction and scattering patterns are characteristic of molecular paracrystals with a significant degree of disorder (over 20%), and the structure is not tractable using only the standard analysis. In contrast, the standard analysis, involving only Fourier transforms of Bessel functions and DNA molecular models, is still routinely used to analyze A-DNA and Z-DNA X-ray diffraction patterns.

Structure prediction

Biomolecular structure prediction is the prediction of the three-dimensional structure of a protein from its amino acid sequence, or of a nucleic acid from its nucleobase (base) sequence. In other words, it is the prediction of secondary and tertiary structure from primary structure. Structure prediction is the inverse of biomolecular design, as in rational design, protein design, nucleic acid design, and biomolecular engineering.

Protein structure prediction is one of the most important goals pursued by bioinformatics and theoretical chemistry. It is of high importance in medicine (for example, in drug design) and biotechnology (for example, in the design of novel enzymes). Every two years, the performance of current methods is assessed in the ''Critical Assessment of protein Structure Prediction'' (CASP) experiment.

There has also been a significant amount of bioinformatics research directed at the RNA structure prediction problem. A common problem for researchers working with RNA is to determine the three-dimensional structure of the molecule given only the nucleic acid sequence. However, in the case of RNA, much of the final structure is determined by the secondary structure, that is, the intra-molecular base-pairing interactions of the molecule. This is shown by the high conservation of base pairings across diverse species. The secondary structure of small nucleic acid molecules is determined largely by strong, local interactions such as hydrogen bonds and base stacking. Summing the free energy for such interactions, usually using a nearest-neighbor method, provides an approximation for the stability of a given structure. The most straightforward way to find the lowest free energy structure would be to generate all possible structures and calculate the free energy for each of them, but the number of possible structures for a sequence increases exponentially with the length of the molecule, so exhaustive enumeration is impractical for longer molecules.

Sequence covariation methods rely on the existence of a data set composed of multiple homologous RNA sequences with related but dissimilar sequences. These methods analyze the covariation of individual base sites in evolution; maintenance at two widely separated sites of a pair of base-pairing nucleotides indicates the presence of a structurally required hydrogen bond between those positions. The general problem of pseudoknot prediction has been shown to be NP-complete.
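Dynamic programming is one way to avoid enumerating an exponential number of candidate secondary structures. The sketch below uses a Nussinov-style recurrence, which is a deliberate simplification: it maximizes the number of base pairs instead of minimizing a nearest-neighbor free energy, it allows only nested (pseudoknot-free) structures, and the allowed pairs and minimum hairpin loop length of 3 are assumptions chosen for illustration.

```python
# Assumed set of allowed pairs: Watson-Crick plus the G-U wobble pair.
ALLOWED_PAIRS = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),
}


def max_base_pairs(sequence: str, min_loop: int = 3) -> int:
    """Return the maximum number of nested base pairs in sequence,
    requiring at least min_loop unpaired bases inside every hairpin."""
    n = len(sequence)
    if n == 0:
        return 0
    # best[i][j] holds the optimum for the subsequence i..j inclusive.
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            # Case 1: base j is left unpaired.
            score = best[i][j - 1]
            # Case 2: base j pairs with some k, splitting the problem in two.
            for k in range(i, j - min_loop):
                if (sequence[k], sequence[j]) in ALLOWED_PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    inside = best[k + 1][j - 1]
                    score = max(score, left + 1 + inside)
            best[i][j] = score
    return best[0][n - 1]


print(max_base_pairs("GGGAAAACCC"))
```

The table has on the order of n² entries and each takes up to n steps to fill, so the running time grows cubically with sequence length rather than exponentially. Free-energy minimization methods follow the same overall scheme but score stacked pairs and loops with experimentally derived parameters, which are not reproduced here.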

Design

Biomolecular design can be considered the inverse of structure prediction. In structure prediction, the structure is determined from a known sequence, whereas, in protein or nucleic acid design, a sequence that will form a desired structure is generated.

Other biomolecules

Other biomolecules, such as polysaccharides, polyphenols and lipids, can also have higher-order structure of biological consequence.
