
A base pair (bp) is a fundamental unit of double-stranded nucleic acids consisting of two nucleobases bound to each other by hydrogen bonds. They form the building blocks of the DNA double helix and contribute to the folded structure of both DNA and RNA. Dictated by specific hydrogen bonding patterns, "Watson–Crick" (or "Watson–Crick–Franklin") base pairs (guanine–cytosine and adenine–thymine) allow the DNA helix to maintain a regular helical structure that is subtly dependent on its nucleotide sequence. The complementary nature of this base-paired structure provides a redundant copy of the genetic information encoded within each strand of DNA. The regular structure and data redundancy provided by the DNA double helix make DNA well suited to the storage of genetic information, while base-pairing between DNA and incoming nucleotides provides the mechanism through which DNA polymerase replicates DNA and RNA polymerase transcribes DNA into RNA. Many DNA-binding proteins can recognize specific base-pairing patterns that identify particular regulatory regions of genes.

Intramolecular base pairs can occur within single-stranded nucleic acids. This is particularly important in RNA molecules (e.g., transfer RNA), where Watson–Crick base pairs (guanine–cytosine and adenine–uracil) permit the formation of short double-stranded helices, and a wide variety of non–Watson–Crick interactions (e.g., G–U or A–A) allow RNAs to fold into a vast range of specific three-dimensional structures. In addition, base-pairing between transfer RNA (tRNA) and messenger RNA (mRNA) forms the basis for the molecular recognition events that result in the nucleotide sequence of mRNA being translated into the amino acid sequence of proteins via the genetic code.

The size of an individual gene or an organism's entire genome is often measured in base pairs because DNA is usually double-stranded. Hence, the total number of base pairs is equal to the number of nucleotides in one of the strands (with the exception of non-coding single-stranded regions of telomeres). The haploid human genome (23 chromosomes) is estimated to be about 3.2 billion base pairs long and to contain 20,000–25,000 distinct protein-coding genes. A kilobase (kb) is a unit of measurement in molecular biology equal to 1000 base pairs of DNA or RNA. The total number of DNA base pairs on Earth is estimated at 5.0 × 10³⁷, with a weight of 50 billion tonnes. In comparison, the total mass of the biosphere has been estimated to be as much as 4 TtC (trillion tons of carbon).
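Because each strand fully determines its partner, the redundancy described above can be illustrated in a few lines of code. The following is a minimal Python sketch, assuming an idealized duplex made only of A–T and G–C pairs with no single-stranded overhangs; the function names and the example sequence are hypothetical. It rebuilds one strand from the other and counts the duplex length in base pairs as the number of nucleotides in one strand.

```python
# Watson–Crick complements in DNA: A pairs with T, G pairs with C.
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand: str) -> str:
    """Return the partner strand (5'->3') of a DNA strand given 5'->3'."""
    # Complement each base, then reverse, because the two strands are antiparallel.
    return "".join(DNA_COMPLEMENT[base] for base in reversed(strand.upper()))


def length_in_bp(strand: str) -> int:
    """In a fully paired duplex, base pairs = nucleotides in one strand."""
    return len(strand)


top = "ATGCGTAC"                          # hypothetical strand, 5'->3'
bottom = reverse_complement(top)          # "GTACGCAT"
assert reverse_complement(bottom) == top  # either strand recovers the other
print(bottom, length_in_bp(top))          # GTACGCAT 8
```

Reversing the complemented string reflects the antiparallel orientation of the strands, so both strings read 5′ to 3′.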


Hydrogen bonding and stability

Top, a G.C base pair with three hydrogen bonds. Bottom, an A.T base pair with two hydrogen bonds. Non-covalent hydrogen bonds between the bases are shown as dashed lines. The wiggly lines stand for the connection to the pentose sugar and point in the direction of the minor groove.
Hydrogen bonding is the chemical interaction that underlies the base-pairing rules described above. Appropriate geometrical correspondence of hydrogen bond donors and acceptors allows only the "right" pairs to form stably. DNA with high GC-content is more stable than DNA with low GC-content. However, contrary to popular belief, the hydrogen bonds do not stabilize the DNA significantly; stabilization is mainly due to stacking interactions.

The bigger nucleobases, adenine and guanine, are members of a class of double-ringed chemical structures called purines; the smaller nucleobases, cytosine and thymine (and uracil), are members of a class of single-ringed chemical structures called pyrimidines. Purines are complementary only with pyrimidines: pyrimidine–pyrimidine pairings are energetically unfavorable because the molecules are too far apart for hydrogen bonding to be established; purine–purine pairings are energetically unfavorable because the molecules are too close, leading to overlap repulsion. Purine–pyrimidine base-pairing of AT or GC or UA (in RNA) results in proper duplex structure. The only other purine–pyrimidine pairings would be AC and GT and UG (in RNA); these pairings are mismatches because the patterns of hydrogen donors and acceptors do not correspond. The GU pairing, with two hydrogen bonds, does occur fairly often in RNA (see wobble base pair).

Paired DNA and RNA molecules are comparatively stable at room temperature, but the two nucleotide strands will separate above a melting point that is determined by the length of the molecules, the extent of mispairing (if any), and the GC content. Higher GC content results in higher melting temperatures; it is, therefore, unsurprising that the genomes of extremophile organisms such as ''Thermus thermophilus'' are particularly GC-rich. Conversely, regions of a genome that need to separate frequently, for example the promoter regions of often-transcribed genes, are comparatively GC-poor (for example, see TATA box). GC content and melting temperature must also be taken into account when designing primers for PCR.
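The pairing rules and the link between GC content and melting temperature can be sketched as follows. This is a simplified Python illustration, not a thermodynamic model. `classify_pair` encodes only the purine/pyrimidine and Watson–Crick logic described above. `wallace_tm` uses the Wallace rule (Tm ≈ 2 °C per A or T plus 4 °C per G or C), an empirical rule of thumb for short oligonucleotides that is swapped in here and is not taken from this article. All names and sequences are hypothetical.

```python
PURINES = {"A", "G"}            # double-ringed bases
PYRIMIDINES = {"C", "T", "U"}   # single-ringed bases
WATSON_CRICK = {frozenset("AT"), frozenset("GC"), frozenset("AU")}


def classify_pair(x: str, y: str) -> str:
    """Classify a pair of bases using the purine/pyrimidine rules described above."""
    x, y = x.upper(), y.upper()
    for base in (x, y):
        if base not in PURINES | PYRIMIDINES:
            raise ValueError(f"unknown base: {base!r}")
    pair = frozenset((x, y))
    if pair in WATSON_CRICK:
        return "watson-crick"
    if pair == frozenset("GU"):
        return "wobble"                 # G-U: two hydrogen bonds, common in RNA
    if x in PURINES and y in PURINES:
        return "purine-purine"          # too close: overlap repulsion
    if x in PYRIMIDINES and y in PYRIMIDINES:
        return "pyrimidine-pyrimidine"  # too far apart to hydrogen-bond
    return "mismatch"                   # purine-pyrimidine, donors/acceptors don't match


def gc_content(seq: str) -> float:
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Rough melting temperature (deg C) of a short oligo: 2*(A+T) + 4*(G+C).

    An empirical rule of thumb (assumption), only meaningful for short primers.
    """
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


print(classify_pair("G", "C"), classify_pair("A", "G"), classify_pair("G", "U"))
# watson-crick purine-purine wobble
for primer in ("ATATATATATATATAT", "ATGCATGCATGCATGC", "GCGCGCGCGCGCGCGC"):
    print(primer, gc_content(primer), wallace_tm(primer))
# ATATATATATATATAT 0.0 32
# ATGCATGCATGCATGC 0.5 48
# GCGCGCGCGCGCGCGC 1.0 64
```

Dedicated melting-temperature tools generally use more detailed models that also account for experimental conditions such as salt concentration; this rule only reproduces the qualitative trend that GC-rich sequences melt at higher temperatures.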


Examples

The following DNA sequences illustrate double-stranded base-pairing patterns. By convention, the top strand is written from the 5′-end to the 3′-end; thus, the bottom strand is written 3′ to 5′.

A base-paired DNA sequence:

The corresponding RNA sequence, in which uracil is substituted for thymine in the RNA strand:
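The layout convention can be reproduced programmatically. The Python sketch below takes a hypothetical top strand written 5′ to 3′, writes the complementary bottom strand base by base so that it reads 3′ to 5′ beneath it, and then substitutes uracil for thymine to obtain the RNA form. The sequence and function names are illustrative assumptions.

```python
# Base-by-base complements for DNA.
DNA_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def duplex(top: str) -> tuple[str, str]:
    """Given a top strand 5'->3', return (top, bottom) with bottom written 3'->5'.

    Writing the bottom strand 3'->5' lets each base sit directly beneath its partner.
    """
    top = top.upper()
    bottom = "".join(DNA_PAIR[base] for base in top)
    return top, bottom


def to_rna(strand: str) -> str:
    """Substitute uracil for thymine."""
    return strand.replace("T", "U")


top, bottom = duplex("GATTACAGC")   # hypothetical sequence
print(f"5'-{top}-3'")                # 5'-GATTACAGC-3'
print(f"3'-{bottom}-5'")             # 3'-CTAATGTCG-5'
print(f"5'-{to_rna(top)}-3'")        # 5'-GAUUACAGC-3'
print(f"3'-{to_rna(bottom)}-5'")     # 3'-CUAAUGUCG-5'
```

Read right to left, the bottom strand is the ordinary 5′-to-3′ reverse complement of the top strand.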


Base analogs and intercalators

Chemical analogs of nucleotides can take the place of proper nucleotides and establish non-canonical base-pairing, leading to errors (mostly point mutations) in DNA replication and DNA transcription. This is due to their isosteric chemistry. One common mutagenic base analog is 5-bromouracil, which resembles thymine but can base-pair to guanine in its enol form. Other chemicals, known as DNA intercalators, fit into the gap between adjacent bases on a single strand and induce frameshift mutations by "masquerading" as a base, causing the DNA replication machinery to skip or insert additional nucleotides at the intercalated site. Most intercalators are large polyaromatic compounds and are known or suspected carcinogens. Examples include ethidium bromide and acridine.
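Why a single inserted or skipped nucleotide is so disruptive can be seen by splitting a sequence into triplet codons before and after the change. This Python sketch uses a hypothetical sequence and hypothetical helper names; it models only the reading-frame arithmetic, not the chemistry of intercalation.

```python
def codons(seq: str) -> list[str]:
    """Split seq into consecutive triplets from the first base; drop an incomplete tail."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def insert_at(seq: str, pos: int, extra: str) -> str:
    """Return seq with `extra` inserted before index pos (a hypothetical insertion)."""
    return seq[:pos] + extra + seq[pos:]


original = "ATGGCTGAACTG"               # hypothetical gene fragment
print(codons(original))                 # ['ATG', 'GCT', 'GAA', 'CTG']

# One extra nucleotide shifts every downstream codon.
shifted = insert_at(original, 4, "A")
print(codons(shifted))                  # ['ATG', 'GAC', 'TGA', 'ACT']
# TGA is a stop codon in the standard genetic code.

# Three extra nucleotides add one codon but leave the downstream frame intact.
in_frame = insert_at(original, 3, "GGG")
print(codons(in_frame))                 # ['ATG', 'GGG', 'GCT', 'GAA', 'CTG']
```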


Mismatch repair

Mismatched base pairs can be generated by errors of DNA replication and as intermediates during homologous recombination. The process of mismatch repair ordinarily must recognize and correctly repair a small number of base mispairs within a long sequence of normal DNA base pairs. To repair mismatches formed during DNA replication, several distinctive repair processes have evolved to distinguish between the template strand and the newly formed strand so that only the newly inserted incorrect nucleotide is removed (in order to avoid generating a mutation). The proteins employed in mismatch repair during DNA replication, and the clinical significance of defects in this process, are described in the article DNA mismatch repair. The process of mispair correction during recombination is described in the article gene conversion.
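The strand-discrimination idea can be sketched as a comparison of two aligned strands in which only the new strand is ever edited. In this Python sketch the helper names and sequences are hypothetical, and the template strand is simply assumed to be known; identifying it is exactly the problem that real mismatch repair systems solve with dedicated proteins, which the sketch does not model.

```python
DNA_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def find_mismatches(new: str, template: str) -> list[int]:
    """Indices where the new strand is not complementary to the template.

    Strands are given position-aligned: new 5'->3', template 3'->5' beneath it.
    """
    if len(new) != len(template):
        raise ValueError("strands must be aligned and of equal length")
    return [i for i, (n, t) in enumerate(zip(new, template)) if DNA_PAIR[t] != n]


def repair_new_strand(new: str, template: str) -> str:
    """Correct only the new strand, treating the template as the trusted copy."""
    fixed = list(new)
    for i in find_mismatches(new, template):
        fixed[i] = DNA_PAIR[template[i]]  # replace the misincorporated base
    return "".join(fixed)


template = "TACCGGAT"   # hypothetical parental strand, written 3'->5'
new = "ATGGCTTA"        # hypothetical daughter strand, 5'->3', one wrong base
print(find_mismatches(new, template))    # [5]
print(repair_new_strand(new, template))  # ATGGCCTA
```

Editing the template instead would "fix" the mismatch just as neatly but would fix the error into both copies, which is the mutation the text says repair must avoid.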


Unnatural base pair (UBP)

An unnatural base pair (UBP) is a designed subunit (or nucleobase) of DNA which is created in a laboratory and does not occur in nature. DNA sequences have been described which use newly created nucleobases to form a third base pair, in addition to the two base pairs found in nature, A-T (adenine–thymine) and G-C (guanine–cytosine). A few research groups have been searching for a third base pair for DNA, including teams led by Steven A. Benner, Philippe Marliere, Floyd E. Romesberg and Ichiro Hirao. Some new base pairs based on alternative hydrogen bonding, hydrophobic interactions and metal coordination have been reported.

In 1989 Steven Benner (then working at the Swiss Federal Institute of Technology in Zurich) and his team incorporated modified forms of cytosine and guanine into DNA molecules ''in vitro''. The nucleotides, which encoded RNA and proteins, were successfully replicated ''in vitro''. Since then, Benner's team has been trying to engineer cells that can make foreign bases from scratch, obviating the need for a feedstock.

In 2002, Ichiro Hirao's group in Japan developed an unnatural base pair between 2-amino-8-(2-thienyl)purine (s) and pyridine-2-one (y) that functions in transcription and translation, for the site-specific incorporation of non-standard amino acids into proteins. In 2006, they created 7-(2-thienyl)imidazo[4,5-b]pyridine (Ds) and pyrrole-2-carbaldehyde (Pa) as a third base pair for replication and transcription. Afterward, Ds and 4-[3-(6-aminohexanamido)-1-propynyl]-2-nitropyrrole (Px) were found to form a high-fidelity pair in PCR amplification. In 2013, they applied the Ds–Px pair to DNA aptamer generation by ''in vitro'' selection (SELEX) and demonstrated that genetic alphabet expansion significantly augments DNA aptamer affinities to target proteins.

In 2012, a group of American scientists led by Floyd Romesberg, a chemical biologist at the Scripps Research Institute in San Diego, California, reported that his team had designed an unnatural base pair (UBP). The two new artificial nucleotides forming this unnatural base pair were named d5SICS and dNaM. More technically, these artificial nucleotides bearing hydrophobic nucleobases feature two fused aromatic rings that form a (d5SICS–dNaM) complex or base pair in DNA. His team designed a variety of ''in vitro'' or "test tube" templates containing the unnatural base pair and confirmed that it was efficiently replicated with high fidelity in virtually all sequence contexts using the modern standard ''in vitro'' techniques, namely PCR amplification of DNA and PCR-based applications. Their results show that for PCR and PCR-based applications, the d5SICS–dNaM unnatural base pair is functionally equivalent to a natural base pair, and when combined with the other two natural base pairs used by all organisms, A–T and G–C, they provide a fully functional and expanded six-letter "genetic alphabet".

In 2014 the same team from the Scripps Research Institute reported that they had synthesized a stretch of circular DNA known as a plasmid containing natural T-A and C-G base pairs along with the best-performing UBP Romesberg's laboratory had designed, and had inserted it into cells of the common bacterium ''E. coli'', which successfully replicated the unnatural base pairs through multiple generations. The transfection did not hamper the growth of the ''E. coli'' cells, and the plasmid showed no sign of losing its unnatural base pairs to the cells' natural DNA repair mechanisms. This is the first known example of a living organism passing along an expanded genetic code to subsequent generations. Romesberg said he and his colleagues created 300 variants to refine the design of nucleotides that would be stable enough and would be replicated as easily as the natural ones when the cells divide. This was in part achieved by the addition of a supportive algal gene that expresses a nucleotide triphosphate transporter which efficiently imports the triphosphates of both nucleotides, d5SICSTP and dNaMTP, into ''E. coli'' bacteria. The natural bacterial replication pathways then use them to accurately replicate a plasmid containing d5SICS–dNaM. Other researchers were surprised that the bacteria replicated these human-made DNA subunits.

The successful incorporation of a third base pair is a significant breakthrough toward the goal of greatly expanding the number of amino acids which can be encoded by DNA, from the existing 20 amino acids to a theoretically possible 172, thereby expanding the potential for living organisms to produce novel proteins. The artificial strings of DNA do not yet encode anything, but scientists speculate they could be designed to manufacture new proteins which could have industrial or pharmaceutical uses. Experts said the synthetic DNA incorporating the unnatural base pair raises the possibility of life forms based on a different DNA code.
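The growth in coding capacity from one extra base pair follows from simple counting: a codon of three positions drawn from an alphabet of n letters has n³ possible sequences. The Python sketch below assumes a six-letter alphabet (A, T, G, C plus the two letters of one unnatural pair) and only counts codons. It does not reproduce the figure of 172 amino acids quoted above, which depends on further assumptions not stated here.

```python
def codon_count(alphabet_size: int, codon_length: int = 3) -> int:
    """Number of distinct codons of a given length over an alphabet of letters."""
    return alphabet_size ** codon_length


natural = codon_count(4)    # A, T (U in RNA), G, C
expanded = codon_count(6)   # assumed: the four natural letters plus one unnatural pair
# The difference counts codons that contain at least one unnatural letter.
print(natural, expanded, expanded - natural)  # 64 216 152
```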


Non-canonical base pairing

In addition to the canonical pairing, some conditions can also favour base-pairing with alternative base orientations and with different numbers and geometries of hydrogen bonds. These pairings are accompanied by alterations to the local backbone shape. The most common of these is the wobble base pairing that occurs between tRNAs and mRNAs at the third base position of many codons during translation and during the charging of tRNAs by some tRNA synthetases. Wobble pairs have also been observed in the secondary structures of some RNA sequences. Additionally, Hoogsteen base pairing (typically written as A•U/T and G•C) can exist in some DNA sequences (e.g. CA and TA dinucleotides) in dynamic equilibrium with standard Watson–Crick pairing. Hoogsteen pairs have also been observed in some protein–DNA complexes. In addition to these alternative base pairings, a wide range of base-base hydrogen bonding is observed in RNA secondary and tertiary structure. These bonds are often necessary for the precise, complex shape of an RNA, as well as its binding to interaction partners.
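The wobble idea can be made concrete with Crick's wobble rules, a standard textbook summary that is brought in here rather than taken from this article. In this simplified Python sketch (hypothetical names), the first two codon positions must form Watson–Crick pairs with the anticodon, while the anticodon's 5′ base (position 34) may pair with several bases at the codon's third position. Modified nucleosides other than inosine are ignored.

```python
# Watson–Crick pairs in RNA.
WC_RNA = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Crick's wobble rules: anticodon 5' base (position 34) -> codon 3rd-position bases it reads.
WOBBLE = {
    "G": {"C", "U"},
    "C": {"G"},
    "A": {"U"},
    "U": {"A", "G"},
    "I": {"A", "C", "U"},   # inosine
}


def can_decode(anticodon: str, codon: str) -> bool:
    """True if anticodon (5'->3') can pair with codon (5'->3') under Crick's wobble rules."""
    a34, a35, a36 = anticodon.upper()
    c1, c2, c3 = codon.upper()
    # The strands are antiparallel: anticodon 36 pairs with codon 1 and 35 with codon 2
    # (strict Watson-Crick); anticodon 34 pairs with codon 3, the wobble position.
    return WC_RNA[a36] == c1 and WC_RNA[a35] == c2 and c3 in WOBBLE[a34]


for codon in ("GCU", "GCC", "GCA", "GCG"):
    print(codon, can_decode("IGC", codon))
# GCU True
# GCC True
# GCA True
# GCG False
```

For example, a single tRNA with the anticodon 5′-IGC-3′ reads three of the four alanine codons (GCU, GCC and GCA) but not GCG.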


Length measurements

The following abbreviations are commonly used to describe the length of a DNA/RNA molecule:
* bp = base pair: one bp corresponds to approximately 3.4 Å (340 pm) of length along the strand, and to roughly 618 or 643 daltons for DNA and RNA respectively.
* kb (= kbp) = kilo–base-pair = 1,000 bp
* Mb (= Mbp) = mega–base-pair = 1,000,000 bp
* Gb = giga–base-pair = 1,000,000,000 bp

For single-stranded DNA/RNA, units of nucleotides are used, abbreviated nt (or knt, Mnt, Gnt), as they are not paired. To distinguish between units of computer storage and bases, kbp, Mbp, Gbp, etc. may be used for base pairs. The centimorgan is also often used to imply distance along a chromosome, but the number of base pairs it corresponds to varies widely. In the human genome, the centimorgan is about 1 million base pairs.
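The per-base-pair figures in the list above allow quick conversions between sequence length, physical length and mass. The Python sketch below uses the approximate values from the text (3.4 Å and 618 or 643 daltons per base pair) together with the atomic mass constant (1 Da ≈ 1.66054 × 10⁻²⁴ g). Applied to the roughly 3.2 billion base pairs of the haploid human genome, it gives a contour length of about 1.09 m and a mass of about 3.3 pg. The helper names are hypothetical.

```python
RISE_PER_BP_M = 3.4e-10                     # ~3.4 angstrom per base pair, in metres
DALTONS_PER_BP = {"DNA": 618, "RNA": 643}   # approximate, from the list above
GRAMS_PER_DALTON = 1.66054e-24              # atomic mass constant
UNIT_BP = {"bp": 1, "kb": 10**3, "Mb": 10**6, "Gb": 10**9}


def to_bp(value: float, unit: str) -> float:
    """Convert a length in bp, kb, Mb or Gb to base pairs."""
    return value * UNIT_BP[unit]


def contour_length_m(n_bp: float) -> float:
    """Approximate length of a duplex along its axis, in metres."""
    return n_bp * RISE_PER_BP_M


def mass_g(n_bp: float, kind: str = "DNA") -> float:
    """Approximate mass of a double-stranded molecule, in grams."""
    return n_bp * DALTONS_PER_BP[kind] * GRAMS_PER_DALTON


genome = to_bp(3.2, "Gb")                   # haploid human genome, ~3.2 billion bp
print(f"{contour_length_m(genome):.2f} m")  # 1.09 m
print(f"{mass_g(genome) * 1e12:.1f} pg")    # 3.3 pg
```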


See also

* List of Y-DNA single-nucleotide polymorphisms
* Non-canonical base pairing
* Chargaff's rules





External links


* DAN: webserver version of the EMBOSS tool for calculating melting temperatures