DNA synthesis is the natural or artificial creation of deoxyribonucleic acid (DNA) molecules. DNA is a macromolecule made up of nucleotide units, which are linked by covalent bonds and hydrogen bonds in a repeating structure. DNA synthesis occurs when these nucleotide units are joined to form DNA; this can occur artificially (''in vitro'') or naturally (''in vivo''). Nucleotide units are made up of a nitrogenous base (cytosine, guanine, adenine or thymine), a pentose sugar (deoxyribose) and a phosphate group. Each unit is joined when a covalent bond forms between its phosphate group and the pentose sugar of the next nucleotide, forming a sugar-phosphate backbone. DNA is a complementary, double-stranded structure because specific base pairing (adenine with thymine, guanine with cytosine) occurs naturally when hydrogen bonds form between the nucleotide bases.
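The base-pairing rule above is simple enough to state as code. The following is a minimal Python sketch, not a bioinformatics library: it assumes sequences are plain strings of A, C, G and T, and it adds one convention not stated in the text, namely that strands are written 5'->3' and the two strands of a duplex are antiparallel, which is why the complement is read in reverse. The function names `complementary_strand` and `is_complementary` are illustrative.

```python
# Base-pairing rules described above: A pairs with T, G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complementary_strand(strand: str) -> str:
    """Return the complementary strand, written 5'->3'.

    Assumes `strand` is written 5'->3'. Because the two strands of a double
    helix are antiparallel, the partner strand read 5'->3' is the reverse of
    the base-by-base complement.
    """
    try:
        return "".join(PAIRS[base] for base in reversed(strand.upper()))
    except KeyError as err:
        raise ValueError(f"not a DNA base: {err.args[0]!r}") from None


def is_complementary(top: str, bottom: str) -> bool:
    """True if `bottom` (5'->3') can pair with every base of `top` (5'->3')."""
    return len(top) == len(bottom) and complementary_strand(top) == bottom.upper()


print(complementary_strand("ATGCGT"))  # ACGCAT
```

Applying `complementary_strand` twice returns the original sequence, reflecting the fact that each strand fully specifies its partner.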
There are several different definitions for DNA synthesis: it can refer to DNA replication (DNA biosynthesis, or ''in vivo'' DNA amplification), the polymerase chain reaction (enzymatic DNA synthesis, or ''in vitro'' DNA amplification) or gene synthesis (physically creating artificial gene sequences). Though each type of synthesis is very different, they do share some features.
Nucleotides that have been joined to form polynucleotides can act as a DNA template for one form of DNA synthesis, PCR. DNA replication also uses a DNA template: the DNA double helix unwinds during replication, exposing unpaired bases to which new nucleotides can hydrogen bond. Gene synthesis, however, does not require a DNA template; genes are instead assembled ''de novo''.
DNA synthesis occurs in all eukaryotes and prokaryotes, as well as in some viruses. Accurate DNA synthesis is important to avoid mutations in DNA. In humans, mutations can lead to diseases such as cancer, so DNA synthesis and the machinery involved ''in vivo'' have been studied extensively over the decades. In the future, these studies may be used to develop technologies involving DNA synthesis, such as DNA data storage.
DNA replication

In nature, DNA molecules are synthesised by all living cells through the process of DNA replication. This typically occurs as a part of cell division.
DNA replication occurs so that, during cell division, each daughter cell contains an accurate copy of the genetic material of the cell. ''In vivo'' DNA synthesis (DNA replication) depends on a complex set of enzymes which have evolved to act in a concerted fashion during the S phase of the cell cycle. In both eukaryotes and prokaryotes, DNA replication begins when replication initiator proteins, helicases and topoisomerases (including DNA gyrase in prokaryotes) act together to unwind the double-stranded DNA, exposing the nitrogenous bases.
These enzymes, along with accessory proteins, form a macromolecular machine that ensures accurate duplication of DNA sequences. Complementary base pairing then takes place, forming a new double-stranded DNA molecule. This is known as semi-conservative replication because each new DNA molecule contains one strand from the 'parent' molecule and one newly synthesized strand.
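Semi-conservative replication can be illustrated by tracking only the origin of each strand rather than its sequence. This hypothetical Python sketch labels each strand with the replication round in which it was made (0 for the original parental strands) and assumes every molecule replicates in every round. It shows a direct consequence of the model: however many rounds pass, the two original strands persist, each paired with a newer strand, in exactly two molecules.

```python
from typing import List, Tuple

# A duplex is modelled only by where its two strands came from:
# generation 0 marks an original parental strand, generation g marks a strand
# synthesized during round g. Base sequences are ignored in this sketch.
Duplex = Tuple[int, int]


def replicate(duplexes: List[Duplex], round_number: int) -> List[Duplex]:
    """One round of semi-conservative replication.

    Each duplex separates, and each old strand is paired with one newly
    synthesized complementary strand, giving two daughter duplexes.
    """
    daughters: List[Duplex] = []
    for old_a, old_b in duplexes:
        daughters.append((old_a, round_number))
        daughters.append((old_b, round_number))
    return daughters


def simulate(rounds: int) -> List[Duplex]:
    """Replicate one fully parental molecule for `rounds` rounds."""
    population: List[Duplex] = [(0, 0)]
    for r in range(1, rounds + 1):
        population = replicate(population, r)
    return population


molecules = simulate(3)
hybrids = sum(1 for d in molecules if 0 in d)
print(len(molecules), hybrids)  # 8 2
```

Every daughter duplex contains exactly one strand from the most recent round, which is the defining feature of semi-conservative copying.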
Eukaryotic replication enzymes continually encounter DNA damage, which can perturb DNA replication. This damage takes the form of DNA lesions that arise spontaneously or are caused by DNA-damaging agents. The DNA replication machinery is therefore tightly controlled to prevent collapse when it encounters damage. Control of the DNA replication system also ensures that the genome is replicated only once per cell cycle; over-replication induces DNA damage. Deregulation of DNA replication is a key factor in genomic instability during cancer development.
This highlights the specificity of DNA synthesis machinery ''in vivo''. Various means exist to artificially stimulate the replication of naturally occurring DNA, or to create artificial gene sequences. However, DNA synthesis ''in vitro'' can be a very error-prone process.
DNA repair synthesis
Damaged DNA is subject to repair by several different enzymatic repair processes, each specialized to repair particular types of damage. Human DNA is subject to damage from multiple natural sources, and insufficient repair is associated with disease and premature aging.[Tiwari V, Wilson DM 3rd. DNA Damage and Associated DNA Repair Defects in Disease and Premature Aging. Am J Hum Genet. 2019 Aug 1;105(2):237-257. doi: 10.1016/j.ajhg.2019.06.005. Review. PMID 31374202] Most DNA repair processes form single-strand gaps in DNA during an intermediate stage of the repair, and these gaps are filled in by repair synthesis. The specific repair processes that require gap filling by DNA synthesis include nucleotide excision repair, base excision repair, mismatch repair, homologous recombinational repair, non-homologous end joining and microhomology-mediated end joining.
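Gap filling during repair synthesis relies on the same pairing rules as replication: the intact strand opposite the gap dictates which nucleotides are inserted. The sketch below is a simplified, hypothetical model of that final step only. It assumes the two strands are supplied already aligned base-by-base, marks excised nucleotides with `-`, and does not model damage recognition, excision or ligation.

```python
# Base-pairing rules used to choose each inserted nucleotide.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}
GAP = "-"  # hypothetical marker for a nucleotide removed during repair


def fill_gaps(template: str, damaged: str) -> str:
    """Fill single-strand gaps in `damaged` using the intact `template`.

    The strands are given aligned base-by-base (position i of `damaged`
    pairs with position i of `template`), so strand orientation is ignored.
    Only gap positions are changed; existing bases are kept as they are.
    """
    if len(template) != len(damaged):
        raise ValueError("strands must be aligned and of equal length")
    repaired = []
    for t_base, d_base in zip(template, damaged):
        repaired.append(PAIRS[t_base] if d_base == GAP else d_base)
    return "".join(repaired)


template = "ATGCCGTA"
damaged = "TAC---AT"  # three nucleotides excised
print(fill_gaps(template, damaged))  # TACGGCAT
```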
Reverse transcription
Reverse transcription is part of the replication cycle of particular virus families, including retroviruses. It involves copying RNA into double-stranded complementary DNA (cDNA) using reverse transcriptase enzymes. In retroviruses, the viral RNA genome enters a host cell, where a viral reverse transcriptase uses it as a template to synthesize cDNA. The enzyme integrase then inserts this cDNA into the host cell genome, where it encodes viral proteins.
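The base-pairing step of reverse transcription can be sketched as a mapping from RNA bases to DNA bases. This minimal Python example assumes a 5'->3' RNA string and produces only the first cDNA strand; uracil (U) in the RNA pairs with adenine in the new DNA, and the new strand uses thymine rather than uracil. It does not model second-strand synthesis, priming or integration, and the name `first_strand_cdna` is illustrative.

```python
# RNA-to-DNA pairing used when DNA is built on an RNA template:
# RNA U pairs with DNA A, and the new DNA strand uses T instead of U.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def first_strand_cdna(rna: str) -> str:
    """Return the first-strand cDNA (5'->3') for an RNA written 5'->3'.

    The new DNA strand is antiparallel to its RNA template, so the complement
    is read in reverse. Second-strand synthesis (which makes the cDNA
    double-stranded) is not modelled here.
    """
    return "".join(RNA_TO_DNA[base] for base in reversed(rna.upper()))


print(first_strand_cdna("AUGGCU"))  # AGCCAT
```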
Polymerase chain reaction
A polymerase chain reaction is a form of enzymatic DNA synthesis in the laboratory, using cycles of repeated heating and cooling of the reaction for DNA melting and enzymatic replication of the DNA.
DNA synthesis during PCR is very similar to that in living cells, but it uses very specific reagents and conditions. Before amplification, DNA is chemically extracted and separated from the host proteins bound to it. It is then heated, causing thermal dissociation of the two DNA strands. Each separated strand acts as a template on which a new complementary strand is built, and the resulting double strands can be separated again to act as templates for further PCR products. Because the amount of target DNA can roughly double with each cycle, the original DNA is multiplied through many rounds of PCR, and more than a billion copies of the original DNA strand can be made.
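The figure of more than a billion copies follows from the doubling of target DNA in each idealized cycle: after n cycles, one starting molecule yields up to 2^n copies, and 2^30 is just over a billion. The sketch below generalizes this with an assumed per-cycle `efficiency` parameter (not from the text), using copies = N0 × (1 + E)^n, and ignores the plateau that real reactions reach as reagents are used up.

```python
def pcr_copies(initial_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Expected number of target copies after `cycles` rounds of PCR.

    `efficiency` is an assumed fraction (0..1) of templates copied per cycle;
    1.0 is the idealized case in which every strand is copied, so the amount
    of target DNA doubles each cycle. Real reactions plateau as reagents
    are used up, which this simple model ignores.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


def cycles_needed(initial_copies: int, target: float, efficiency: float = 1.0) -> int:
    """Smallest number of cycles whose expected yield reaches `target`."""
    if initial_copies < 1 or efficiency <= 0.0:
        raise ValueError("need at least one starting copy and a positive efficiency")
    cycles = 0
    while pcr_copies(initial_copies, cycles, efficiency) < target:
        cycles += 1
    return cycles


print(pcr_copies(1, 30))      # 1073741824.0
print(cycles_needed(1, 1e9))  # 30
```

At any efficiency below 1.0 the same target takes more cycles, so the idealized doubling figure is best read as an upper bound.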
Random mutagenesis
For many experiments, such as structural and evolutionary studies, scientists need to produce a large library of variants of a particular DNA sequence. Random mutagenesis takes place ''in vitro'', when mutagenic replication with a low-fidelity DNA polymerase is combined with selective PCR amplification to produce many copies of mutant DNA.
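A low-fidelity polymerase can be caricatured as a copying step that substitutes bases at random. The Python sketch below is a toy model under stated assumptions: a uniform, hypothetical per-base `error_rate`, substitutions only (no insertions or deletions), and no selection step. It illustrates how a variant library arises rather than how any particular protocol behaves; a fixed `seed` makes the library reproducible.

```python
import random

BASES = "ACGT"


def mutagenic_copy(template: str, error_rate: float, rng: random.Random) -> str:
    """Copy `template`, substituting each base with probability `error_rate`.

    A crude stand-in for a low-fidelity polymerase: only substitutions are
    modelled, and each error swaps in one of the other bases with equal
    probability.
    """
    copy = []
    for base in template:
        if rng.random() < error_rate:
            copy.append(rng.choice([b for b in BASES if b != base]))
        else:
            copy.append(base)
    return "".join(copy)


def variant_library(template: str, size: int, error_rate: float, seed: int = 0) -> list:
    """Generate `size` independently mutated copies (the seed makes runs repeatable)."""
    rng = random.Random(seed)
    return [mutagenic_copy(template, error_rate, rng) for _ in range(size)]


library = variant_library("ATGGCTAGCTAA", size=5, error_rate=0.1, seed=42)
for variant in library:
    print(variant)
```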
RT-PCR
RT-PCR differs from conventional PCR in that it starts from an RNA template, usually mRNA, rather than template DNA. The technique couples a reverse transcription reaction, in which the enzyme reverse transcriptase uses the RNA sequence as a template to synthesize cDNA, with PCR-based amplification of that cDNA. RT-PCR is often used to measure gene expression in particular tissue or cell types at various developmental stages, or to test for genetic disorders.
Gene synthesis
Artificial gene synthesis is the process of synthesizing a gene ''in vitro'' without the need for initial template DNA samples.
In 2010 J. Craig Venter and his team were the first to use entirely synthesized DNA to create a self-replicating microbe, dubbed Mycoplasma laboratorium.
Oligonucleotide synthesis
Oligonucleotide synthesis is the chemical synthesis of sequences of nucleic acids. The majority of biological research and bioengineering involves synthetic DNA, which can include oligonucleotides, synthetic genes, or even chromosomes. Today, most synthetic DNA is custom-built using the phosphoramidite method developed by Marvin H. Caruthers. Oligos are synthesized from building blocks that mimic the natural bases. Other techniques for synthesising DNA have also been made commercially available, including Short Oligo Ligation Assembly. The process has been automated since the late 1970s and can be used to form desired genetic sequences as well as for other uses in medicine and molecular biology. However, chemical synthesis becomes impractical beyond about 200–300 bases, and the process is environmentally hazardous. These oligos, of around 200 bases, can be joined using DNA assembly methods to create larger DNA molecules.
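Two quantitative points help explain the length limit on chemical synthesis and the need for assembly. First, if each coupling step succeeds with probability p, the fraction of chains reaching full length n is about p^(n-1), so yield falls exponentially with length; the 99% coupling efficiency used below is an assumed illustrative value, not a figure from the text. Second, a long target can be divided into shorter fragments whose ends overlap so they can be joined; the overlap length here is likewise a hypothetical parameter, and real assembly methods impose further design rules.

```python
def full_length_yield(length: int, coupling_efficiency: float) -> float:
    """Fraction of chains expected to reach full length.

    An n-base oligo needs n - 1 coupling steps after the first base is
    attached, and each step is assumed to succeed independently with
    probability `coupling_efficiency`.
    """
    if length < 1:
        raise ValueError("length must be positive")
    return coupling_efficiency ** (length - 1)


def split_with_overlaps(sequence: str, max_len: int, overlap: int) -> list:
    """Split `sequence` into fragments of at most `max_len` bases.

    Consecutive fragments share `overlap` bases at their ends; the overlap
    length is an assumption of this sketch, not a rule of any specific method.
    """
    if not 0 <= overlap < max_len:
        raise ValueError("need 0 <= overlap < max_len")
    step = max_len - overlap
    fragments = []
    start = 0
    while True:
        fragments.append(sequence[start:start + max_len])
        if start + max_len >= len(sequence):
            break
        start += step
    return fragments


print(f"{full_length_yield(200, 0.99):.1%}")  # 13.5%
print(f"{full_length_yield(300, 0.99):.1%}")  # 5.0%
print(len(split_with_overlaps("ACGT" * 125, 200, 20)))  # 3
```

Under the assumed 99% efficiency, only about one chain in seven reaches 200 bases and about one in twenty reaches 300, which illustrates why longer molecules are usually built by joining shorter oligos.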
Some studies have explored the possibility of enzymatic synthesis using terminal deoxynucleotidyl transferase (TdT), a DNA polymerase that requires no template. However, this method is not yet as effective as chemical synthesis and is not yet widely available commercially.
With advances in artificial DNA synthesis, the possibility of DNA data storage is being explored. With its ultrahigh storage density and long-term stability, synthetic DNA is an attractive option for storing large amounts of data. Although information can be retrieved from DNA very quickly using next-generation sequencing technologies, ''de novo'' synthesis of DNA is a major bottleneck in the process. Only one nucleotide can be added per cycle, with each cycle taking seconds, so overall synthesis is very time-consuming as well as highly error-prone. If synthesis technology improves, however, synthetic DNA could one day be used for data storage.
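The density argument rests on each nucleotide encoding up to 2 bits. The following hypothetical scheme maps each pair of bits to one base (00 to A, 01 to C, 10 to G, 11 to T) and estimates serial synthesis time at one nucleotide per cycle. Both the mapping and the `seconds_per_cycle` value are assumptions for illustration; practical DNA storage encodings add error correction and sequence constraints that are omitted here.

```python
# Hypothetical mapping of 2-bit values to bases; real DNA storage schemes add
# error correction and avoid problematic runs (e.g. long homopolymers),
# which this sketch does not.
TO_BASE = "ACGT"  # 00->A, 01->C, 10->G, 11->T
FROM_BASE = {b: i for i, b in enumerate(TO_BASE)}


def encode(data: bytes) -> str:
    """Encode each byte as four bases (2 bits per nucleotide), high bits first."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(TO_BASE[(byte >> shift) & 0b11])
    return "".join(bases)


def decode(dna: str) -> bytes:
    """Inverse of encode(); expects a length that is a multiple of 4."""
    if len(dna) % 4:
        raise ValueError("length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(dna), 4):
        byte = 0
        for base in dna[i:i + 4]:
            byte = (byte << 2) | FROM_BASE[base]
        out.append(byte)
    return bytes(out)


def synthesis_hours(n_bases: int, seconds_per_cycle: float) -> float:
    """Time to build one strand serially when one nucleotide is added per cycle."""
    return n_bases * seconds_per_cycle / 3600


print(encode(b"Hi"))  # CAGACGGC
print(synthesis_hours(len(encode(bytes(1_000_000))), seconds_per_cycle=10.0))
```

The second line estimates how long a megabyte would take if written as a single strand at an assumed 10 seconds per cycle; it deliberately ignores parallel synthesis of many strands, so it shows the scaling rather than a realistic throughput.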
Base pair synthesis
It has been reported that new nucleobase pairs can be synthesized in addition to the natural A-T (adenine-thymine) and G-C (guanine-cytosine) pairs. Synthetic nucleotides can be used to expand the genetic alphabet and allow specific modification of DNA sites. Even a single third base pair would expand the number of amino acids that can be encoded by DNA from the existing 20 to a possible 172.
Hachimoji DNA is built from eight nucleotide letters, forming four possible base pairs. It therefore doubles the size of the natural four-letter genetic alphabet, raising the maximum information each position can carry from 2 bits to 3 bits. In studies, hachimoji DNA has even been transcribed into RNA. This technology could also be used for data storage in DNA.
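The effect of a larger alphabet can be checked with two short formulas: an alphabet of k letters can carry at most log2(k) bits per position and can spell k^3 three-letter codons. The Python sketch below compares four letters (natural DNA), six (one extra base pair) and eight (hachimoji). It counts codons only and does not reproduce the amino-acid figure quoted for a third base pair, which depends on considerations not described in the text.

```python
import math


def bits_per_position(alphabet_size: int) -> float:
    """Maximum information one nucleotide position can carry, in bits."""
    return math.log2(alphabet_size)


def triplet_codons(alphabet_size: int) -> int:
    """Number of distinct three-letter codons that an alphabet can spell."""
    return alphabet_size ** 3


for name, letters in [("natural DNA", 4), ("one extra pair", 6), ("hachimoji", 8)]:
    print(f"{name}: {letters} letters, "
          f"{bits_per_position(letters):.2f} bits/position, "
          f"{triplet_codons(letters)} codons")
```

This gives 64, 216 and 512 codons and 2.00, 2.58 and 3.00 bits per position respectively, so an eight-letter alphabet carries 1.5 times the information per position of the natural four.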