Non-canonical base pairing occurs when nucleobases hydrogen bond, or base pair, to one another in schemes other than the standard Watson-Crick base pairs (which are adenine (A) with thymine (T) in DNA, adenine (A) with uracil (U) in RNA, and guanine (G) with cytosine (C)). Non-canonical base pairs include those stabilized by polar hydrogen bonds, those having interactions among C−H and O/N groups, and those that have hydrogen bonds between the bases themselves. The first non-canonical base pairs to be discovered were Hoogsteen base pairs, first described by the American biochemist Karst Hoogsteen. Non-canonical base pairings commonly occur in the secondary structure of RNA (e.g. pairing of G with U), and in tRNA recognition. They are typically less stable than standard base pairings. The presence of non-canonical base pairs in double-stranded DNA results in a disrupted double helix.

History

James Watson and Francis Crick published the double helical structure of DNA and proposed the canonical Watson-Crick base pairs in 1953. Ten years later, in 1963, Karst Hoogsteen reported that he had used single crystal X-ray diffraction to investigate alternative base pair structures, and he found an alternative structure for the nucleobase pair adenine-thymine in which the purine (A) takes on an alternative conformation with respect to the pyrimidine (T). Five years after Hoogsteen proposed the A-T Hoogsteen base pair, optical rotatory dispersion spectra that provided evidence for a G-C Hoogsteen base pair were reported. The G-C Hoogsteen base pair was first observed via X-ray crystallography years later, in 1986, by co-crystallizing DNA with triostin A (an antibiotic). Ultimately, after years of studying both Watson-Crick and Hoogsteen base pairs, it has been determined that both occur naturally in DNA and that they exist in equilibrium with one another; the conditions in which the DNA exists determine which form is favored. Since the structures of the canonical Watson-Crick and non-canonical Hoogsteen base pairs were determined, many other types of non-canonical base pairs have been described.

Structure

Base pairing

An estimated 60% of bases in structured RNA participate in canonical Watson-Crick base pairs. Base pairing occurs when two bases form hydrogen bonds with each other. These hydrogen bonds can be either polar or non-polar interactions. The polar hydrogen bonds are formed by N-H...O/N and/or O-H...O/N interactions. Non-polar hydrogen bonds are formed between C-H...O/N.
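The polar versus non-polar distinction above can be illustrated with a small sketch. The function name and the element-symbol encoding of donors and acceptors are assumptions made for this example; only the three interaction patterns (N-H...O/N, O-H...O/N, C-H...O/N) come from the text.

```python
# Illustrative sketch: classify a hydrogen bond by the element that carries the hydrogen.
# The patterns follow the text: N-H...O/N and O-H...O/N are polar, C-H...O/N is non-polar.
POLAR_DONORS = {"N", "O"}
NONPOLAR_DONORS = {"C"}
ACCEPTORS = {"O", "N"}


def classify_hydrogen_bond(donor: str, acceptor: str) -> str:
    """Return 'polar' or 'non-polar' for a donor-H...acceptor interaction."""
    if acceptor not in ACCEPTORS:
        raise ValueError(f"unsupported acceptor element: {acceptor!r}")
    if donor in POLAR_DONORS:
        return "polar"
    if donor in NONPOLAR_DONORS:
        return "non-polar"
    raise ValueError(f"unsupported donor element: {donor!r}")


if __name__ == "__main__":
    print(classify_hydrogen_bond("N", "O"))  # N-H...O
    print(classify_hydrogen_bond("C", "N"))  # C-H...N
```

The sketch deliberately ignores geometry (distances and angles), which real base-pair annotation criteria also take into account.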

Edge interactions

Each base has three potential edges where it can hydrogen bond with another base. For the purine bases, these are known as the Watson-Crick edge (WC), the Hoogsteen edge (H), and the Sugar edge (S). Pyrimidine bases also have three hydrogen-bonding edges. As in the purines, there are the Watson-Crick edge (WC) and the Sugar edge (S), but the third edge is referred to as the "C-H" edge (H) on the pyrimidine bases. This C-H edge is sometimes also referred to as the Hoogsteen edge for simplicity. The various edges for the purine and pyrimidine bases are shown in Figure 2.


Besides the three edges of interaction, base pairs can also vary in their cis/trans forms. The cis and trans structures depend on the orientation of the ribose sugars relative to the hydrogen bond interaction. These various orientations are shown in Figure 3. Therefore, with the cis/trans forms and the three hydrogen-bonding edges, there are 12 basic types of base pairing geometries which can be found in RNA structures. Those 12 types are WC:WC (cis/trans), WC:H (cis/trans), WC:S (cis/trans), H:S (cis/trans), H:H (cis/trans), and S:S (cis/trans).
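The count of 12 follows from simple combinatorics: three edges give six unordered edge pairings, and each pairing can be cis or trans. A minimal sketch of that enumeration, with function and variable names chosen only for illustration:

```python
from itertools import combinations_with_replacement

# Edge labels as used in the text: Watson-Crick, Hoogsteen (or C-H), Sugar.
EDGES = ("WC", "H", "S")
ORIENTATIONS = ("cis", "trans")


def base_pair_geometries() -> list[str]:
    """Enumerate every unordered edge pairing in both cis and trans forms."""
    geometries = []
    for edge_a, edge_b in combinations_with_replacement(EDGES, 2):
        for orientation in ORIENTATIONS:
            geometries.append(f"{orientation} {edge_a}:{edge_b}")
    return geometries


if __name__ == "__main__":
    for name in base_pair_geometries():
        print(name)
    print(len(base_pair_geometries()), "geometries")
```

Treating the pairings as unordered is what keeps the total at 12; counting WC:H and H:WC separately would give 18 instead.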

Classification

These 12 types can be further divided into subgroups that depend on the directionality of the glycosidic bonds and steric extensions. Taking all of the variations together, there are 169 theoretically possible base pair combinations. The actual number of base-pair combinations is lower because some combinations result in unfavorable interactions. The number of possible non-canonical base pairs is still being determined, as it depends strongly on the base pairing criteria used. Understanding base pair configuration is similarly difficult, since the pairing depends on the bases' surroundings. These surroundings can consist of adjacent base pairs, adjacent loops, or third interactions (such as a base triple).

The bonds between various bases are well defined because of their rigid and planar shape. The spatial relationship between the two bases can be described by six rigid-body parameters, or intra-base pair parameters (three translational, three rotational), as shown in Figure 4. These parameters describe the base pair's three-dimensional conformation. The three translational parameters are known as shear, stretch, and stagger, and are directly related to the proximity and direction of the hydrogen bonds. The rotational parameters are buckle, propeller, and opening, and relate to the non-planar conformation (as compared to the ideal coplanar geometry). Intra-base pair parameters are used to determine the structure and stabilities of non-canonical base pairs. They were originally created for the base pairings in DNA, but were found to also fit the non-canonical base pair models.
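As a rough illustration of how the six intra-base pair parameters group together, the following sketch stores them in a small record type. The class name, the units (ångströms for translations, degrees for rotations), and the example values are assumptions made for this example and are not taken from any particular structure or software package.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntraBasePairParameters:
    """Six rigid-body parameters describing one base relative to its partner."""

    # Translational parameters (assumed here to be in angstroms).
    shear: float
    stretch: float
    stagger: float
    # Rotational parameters (assumed here to be in degrees).
    buckle: float
    propeller: float
    opening: float

    def translational(self) -> dict[str, float]:
        """Return the three translational parameters by name."""
        return {"shear": self.shear, "stretch": self.stretch, "stagger": self.stagger}

    def rotational(self) -> dict[str, float]:
        """Return the three rotational parameters by name."""
        return {"buckle": self.buckle, "propeller": self.propeller, "opening": self.opening}


if __name__ == "__main__":
    # Made-up values, for demonstration only.
    example = IntraBasePairParameters(
        shear=-2.3, stretch=-0.6, stagger=0.1,
        buckle=4.0, propeller=-12.0, opening=1.5,
    )
    print(example.translational())
    print(example.rotational())
```

Computing these values from atomic coordinates requires fitting a reference frame to each base, which is outside the scope of this sketch.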

Types

The most common non-canonical base pairs are trans A:G Hoogsteen/sugar edge, A:U Hoogsteen/WC, and G:U Wobble pairs.

Hoogsteen base pairs

Hoogsteen base pairs occur between adenine (A) and thymine (T), and between guanine (G) and cytosine (C), similarly to Watson-Crick base pairs. However, the purine (A or G) takes on an alternative conformation with respect to the pyrimidine. In the A-T Hoogsteen base pair, the adenine is rotated 180° about the glycosidic bond, resulting in an alternative hydrogen bonding scheme which has one hydrogen bond in common with the Watson-Crick base pair (adenine N6 and thymine O4), while the other hydrogen bond, instead of occurring between adenine N1 and thymine N3 as in the Watson-Crick base pair, occurs between adenine N7 and thymine N3. The A-U base pair is shown in Figure 5. In the G-C Hoogsteen base pair, as in the A-T Hoogsteen base pair, the purine (guanine) is rotated 180° about the glycosidic bond while the pyrimidine (cytosine) remains in place. One hydrogen bond from the Watson-Crick base pair is maintained (guanine O6 and cytosine N4) and the other occurs between guanine N7 and a protonated cytosine N3 (note that the Hoogsteen G-C base pair has two hydrogen bonds, while the Watson-Crick G-C base pair has three).
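The difference between the two hydrogen bonding schemes can be tabulated in a short sketch. The atom labels use standard nucleobase numbering, with thymine's carbonyl oxygen written as O4; the two Watson-Crick G-C bonds not named in the text (N1 with N3, N2 with O2) are the usual textbook assignment and are included here as an assumption. The dictionary layout and function name are illustrative only.

```python
# Each hydrogen bond is written as (purine atom, pyrimidine atom).
WATSON_CRICK = {
    "A-T": {("A:N6", "T:O4"), ("A:N1", "T:N3")},
    # The N1/N3 and N2/O2 bonds are textbook assignments, not stated in the text.
    "G-C": {("G:O6", "C:N4"), ("G:N1", "C:N3"), ("G:N2", "C:O2")},
}

HOOGSTEEN = {
    "A-T": {("A:N6", "T:O4"), ("A:N7", "T:N3")},
    # Cytosine N3 must be protonated for this bond to form.
    "G-C": {("G:O6", "C:N4"), ("G:N7", "C:N3")},
}


def shared_bonds(pair: str) -> set[tuple[str, str]]:
    """Hydrogen bonds present in both the Watson-Crick and Hoogsteen schemes."""
    return WATSON_CRICK[pair] & HOOGSTEEN[pair]


if __name__ == "__main__":
    for pair in ("A-T", "G-C"):
        print(pair, "shared:", sorted(shared_bonds(pair)))
        print(pair, "bond counts (WC, Hoogsteen):",
              len(WATSON_CRICK[pair]), len(HOOGSTEEN[pair]))
```

In both pairs exactly one bond is shared, and the purine N7 replaces the Watson-Crick edge atom (N1) as the partner of the pyrimidine N3.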

Wobble base pairs

Wobble base pairing occurs between two nucleotides that do not form a Watson-Crick base pair, and was proposed by Francis Crick in 1966. The four main examples are guanine-uracil (G-U), hypoxanthine-uracil (I-U), hypoxanthine-adenine (I-A), and hypoxanthine-cytosine (I-C). These wobble base pairs are very important in tRNA. Most organisms have fewer than 45 types of tRNA molecule, even though 61 would technically be necessary to pair canonically with every codon. Wobble base pairing allows the 5' base of the anticodon to form a non-standard base pair with the codon. Examples of wobble base pairs are given in Figure 6.
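A simplified sketch of why wobble pairing lowers the number of tRNAs needed: it lists which codon bases each 5' anticodon base could read when the four wobble pairs named above are allowed in addition to Watson-Crick pairs, and counts the 61 sense codons of the standard genetic code. The names are illustrative, inosine (I) stands for the hypoxanthine nucleoside, and other modified bases and organism-specific decoding rules are ignored.

```python
from itertools import product

RNA_BASES = "ACGU"
STOP_CODONS = {"UAA", "UAG", "UGA"}  # standard genetic code

# Pairs are stored unordered so that G-U also covers U-G.
WATSON_CRICK_PAIRS = {frozenset("AU"), frozenset("GC")}
WOBBLE_PAIRS = {frozenset("GU"), frozenset("IU"), frozenset("IA"), frozenset("IC")}


def can_pair(anticodon_base: str, codon_base: str) -> bool:
    """True if the two bases form a Watson-Crick or one of the listed wobble pairs."""
    pair = frozenset((anticodon_base, codon_base))
    return pair in WATSON_CRICK_PAIRS or pair in WOBBLE_PAIRS


def codon_bases_read(anticodon_base: str) -> list[str]:
    """Codon bases that a given 5' anticodon base could pair with."""
    return [base for base in RNA_BASES if can_pair(anticodon_base, base)]


def sense_codons() -> list[str]:
    """All 64 triplets minus the three stop codons."""
    triplets = ("".join(t) for t in product(RNA_BASES, repeat=3))
    return [codon for codon in triplets if codon not in STOP_CODONS]


if __name__ == "__main__":
    print(len(sense_codons()), "sense codons")
    for base in "ACGUI":
        print(base, "reads", codon_bases_read(base))
```

Under these rules an anticodon beginning with inosine can read three different codons, which is how a single tRNA can serve several synonymous codons.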

3-D Structure

The secondary and three-dimensional structures of RNA are formed and stabilized through non-canonical base pairs. Base pairs make up many secondary structural blocks which aid the folding of RNA complexes and three-dimensional structures. The overall folded RNA is stabilized by the tertiary and secondary structures canonically base pairing together. Because so many non-canonical base pairs are possible, a very large number of structures can form, which allows for the diverse functions of RNA. The arrangement of the non-canonical bases also allows long-range RNA interactions, recognition of proteins and other molecules, and structural stabilizing elements. Many of the common non-canonical base pairs can be added to a stacked RNA stem without disturbing its helical character.

Secondary

Basic secondary structural elements of RNA include bulges, double helices, hairpin loops, and internal loops. An example of a hairpin loop of RNA is given in Figure 7. As shown in the figure, hairpin loops and internal loops require a sudden change in backbone direction. Non-canonical base pairing allows for the increased flexibility at junctions or turns required in the secondary structure.

Tertiary

Three-dimensional structures are formed through long-range intra-molecular interactions between the secondary structures. This leads to the formation of pseudoknots, ribose zippers, kissing hairpin loops, or co-axial pseudocontinuous helices. The three-dimensional structures of RNA are primarily determined through molecular simulations or computationally guided measurements. An example of a pseudoknot is given in Figure 8.

Experimental Methods

Watson-Crick canonical base pairing is not the only edge-to-edge conformation possible for a nucleotide, since non-canonical pairing can take place as well. The sugar-phosphate backbone has an ionic character, which makes the bases sensitive to their environment and leads to conformational changes such as non-canonical pairing. There are various methods for determining these conformations, such as NMR structure determination and X-ray crystallography.

Applications

RNA has a multitude of purposes throughout the cell, including regulating many important steps in gene expression. Various conformations of the non-Watson-Crick base pairs allow for a multitude of biological functions such as mRNA splicing, siRNA, transport, protein recognition, protein binding, and translation. One common example of a biological application of non-canonical base pairs is the kink-turn, which is found in many functional RNA species. It consists of a three-nucleotide bulge due to three Hoogsteen base pairs. This kink-turn acts as a marker where various proteins, such as the human 15.5K protein or proteins in the L7Ae family, can bind. A similar scenario is described for the binding of the HIV-1 Rev-response element (RRE) RNA. RRE RNA has an extra wide deep groove that is caused by a cis Watson-Crick G:A pair followed by a trans Watson-Crick G:G pair. The HIV-1 Rev protein is then able to bind the RRE because of this widened deep groove.
