Transcription is the process of copying a segment of DNA into RNA. The segments of DNA transcribed into RNA molecules that can encode proteins are said to produce messenger RNA (mRNA). Other segments of DNA are copied into RNA molecules called non-coding RNAs (ncRNAs). Averaged over multiple cell types in a given tissue, the quantity of mRNA is more than 10 times the quantity of ncRNA (though in particular single cell types ncRNAs may exceed mRNAs).
The general preponderance of mRNA in cells holds even though less than 2% of the human genome can be transcribed into mRNA (see Human genome#Coding vs. noncoding DNA), while at least 80% of mammalian genomic DNA can be actively transcribed (in one or more types of cells), with the majority of this 80% considered to be ncRNA.
Both DNA and RNA are nucleic acids, which use base pairs of nucleotides as a complementary language. During transcription, a DNA sequence is read by an RNA polymerase, which produces a complementary, antiparallel RNA strand called a primary transcript.
Transcription proceeds in the following general steps:
#RNA polymerase, together with one or more general transcription factors, binds to promoter DNA.
#RNA polymerase generates a transcription bubble, which separates the two strands of the DNA helix. This is done by breaking the hydrogen bonds between complementary DNA nucleotides.
#RNA polymerase adds RNA nucleotides (which are complementary to the nucleotides of one DNA strand).
#RNA sugar-phosphate backbone forms with assistance from RNA polymerase to form an RNA strand.
#Hydrogen bonds of the RNA–DNA helix break, freeing the newly synthesized RNA strand.
#If the cell has a nucleus, the RNA may be further processed. This may include polyadenylation, capping, and splicing.
#The RNA may remain in the nucleus or exit to the cytoplasm through the nuclear pore.
If the stretch of DNA is transcribed into an RNA molecule that encodes a protein, the RNA is termed messenger RNA (mRNA); the mRNA, in turn, serves as a template for the protein's synthesis through translation. Other stretches of DNA may be transcribed into small non-coding RNAs such as microRNA, transfer RNA (tRNA), small nucleolar RNA (snoRNA), small nuclear RNA (snRNA), or enzymatic RNA molecules called ribozymes [Eldra P. Solomon, Linda R. Berg, Diana W. Martin. ''Biology, 8th Edition, International Student Edition''. Thomson Brooks/Cole.], as well as larger non-coding RNAs such as ribosomal RNA (rRNA) and long non-coding RNA (lncRNA). Overall, RNA helps synthesize, regulate, and process proteins; it therefore plays a fundamental role in performing functions within a cell.
In virology, the term transcription may also be used when referring to mRNA synthesis from an RNA molecule (i.e., equivalent to RNA replication). For instance, the genome of a negative-sense single-stranded RNA (ssRNA -) virus may be a template for a positive-sense single-stranded RNA (ssRNA +). This is because the positive-sense strand contains the sequence information needed to translate the viral proteins needed for viral replication. This process is catalyzed by a viral RNA replicase.
A DNA transcription unit encoding for a protein may contain both a ''coding sequence'', which will be translated into the protein, and ''regulatory sequences'', which direct and regulate the synthesis of that protein. The regulatory sequence before ("upstream" from) the coding sequence is called the five prime untranslated region (5'UTR); the sequence after ("downstream" from) the coding sequence is called the three prime untranslated region (3'UTR).
As opposed to DNA replication, transcription results in an RNA complement that includes the nucleotide uracil (U) in all instances where thymine (T) would have occurred in a DNA complement.
Only one of the two DNA strands serves as a template for transcription. The antisense strand of DNA is read by RNA polymerase from the 3' end to the 5' end during transcription (3' → 5'). The complementary RNA is created in the opposite direction, in the 5' → 3' direction, matching the sequence of the sense strand with the exception of switching uracil for thymine. This directionality arises because RNA polymerase can only add nucleotides to the 3' end of the growing mRNA chain. This use of only the 3' → 5' DNA strand eliminates the need for the Okazaki fragments that are seen in DNA replication.
Unlike DNA replication, transcription also does not need an RNA primer to initiate RNA synthesis.
The ''non''-template (sense) strand of DNA is called the coding strand, because its sequence is the same as the newly created RNA transcript (except for the substitution of uracil for thymine). This is the strand that is used by convention when presenting a DNA sequence.
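The relationship between the template strand, the coding strand and the RNA transcript can be illustrated with a minimal Python sketch. The function names and the example sequence are hypothetical and chosen only for illustration; the sketch assumes an unambiguous A/C/G/T alphabet and ignores promoters, processing and termination.

```python
# Base-pairing rules: each DNA template base specifies one RNA base.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3to5: str) -> str:
    """Return the RNA (written 5'->3') copied from a template strand written 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3to5.upper())


def coding_strand(template_3to5: str) -> str:
    """Return the coding (sense) DNA strand, written 5'->3', for the same template."""
    return "".join(DNA_TO_DNA[base] for base in template_3to5.upper())


template = "TACGGATTC"  # hypothetical template strand, written 3'->5'
rna = transcribe(template)
sense = coding_strand(template)

# The transcript matches the coding strand except that U replaces T.
assert rna == sense.replace("T", "U")
```

Because the template is read 3' → 5' while the RNA grows 5' → 3', writing the template in the 3' → 5' orientation lets the sketch pair bases position by position without reversing the string.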
Transcription has some proofreading mechanisms, but they are fewer and less effective than the controls for copying DNA. As a result, transcription has a lower copying fidelity than DNA replication.
Transcription is divided into ''initiation'', ''promoter escape'', ''elongation,'' and ''termination''.
Setting up for transcription
Enhancers, transcription factors, Mediator complex and DNA loops in mammalian transcription
Setting up for transcription in mammals is regulated by many cis-regulatory elements, including core promoter and promoter-proximal elements that are located near the transcription start sites of genes. Core promoters combined with general transcription factors are sufficient to direct transcription initiation, but generally have low basal activity.
Other important cis-regulatory modules are localized in DNA regions that are distant from the transcription start sites. These include enhancers, silencers, insulators and tethering elements.
Among this constellation of elements, enhancers and their associated transcription factors have a leading role in the initiation of gene transcription.
An enhancer localized in a DNA region distant from the promoter of a gene can have a very large effect on gene transcription, with some genes undergoing up to 100-fold increased transcription due to an activated enhancer.
Enhancers are regions of the genome that are major gene-regulatory elements. Enhancers control cell-type-specific gene transcription programs, most often by looping through long distances to come in physical proximity with the promoters of their target genes.
While there are hundreds of thousands of enhancer DNA regions, for a particular type of tissue only specific enhancers are brought into proximity with the promoters that they regulate. In a study of brain cortical neurons, 24,937 loops were found, bringing enhancers to their target promoters.
Multiple enhancers, each often tens or hundreds of thousands of nucleotides distant from their target genes, loop to their target gene promoters and can coordinate with each other to control transcription of their common target gene.
The schematic illustration in this section shows an enhancer looping around to come into close physical proximity with the promoter of a target gene. The loop is stabilized by a dimer of a connector protein (e.g. dimer of CTCF), with one member of the dimer anchored to its binding motif on the enhancer and the other member anchored to its binding motif on the promoter (represented by the red zigzags in the illustration).
Several cell function specific transcription factors (there are about 1,600 transcription factors in a human cell) generally bind to specific motifs on an enhancer, and a small combination of these enhancer-bound transcription factors, when brought close to a promoter by a DNA loop, governs the level of transcription of the target gene. Mediator (a complex usually consisting of about 26 proteins in an interacting structure) communicates regulatory signals from enhancer DNA-bound transcription factors directly to the RNA polymerase II (pol II) enzyme bound to the promoter.
Enhancers, when active, are generally transcribed from both strands of DNA with RNA polymerases acting in two different directions, producing two enhancer RNAs (eRNAs) as illustrated in the figure.
An inactive enhancer may be bound by an inactive transcription factor. Phosphorylation of the transcription factor may activate it and that activated transcription factor may then activate the enhancer to which it is bound (see small red star representing phosphorylation of transcription factor bound to enhancer in the illustration).
An activated enhancer begins transcription of its RNA before activating transcription of messenger RNA from its target gene.
Transcription begins with the binding of RNA polymerase, together with one or more general transcription factors, to a specific DNA sequence referred to as a "promoter" to form an RNA polymerase-promoter "closed complex". In the "closed complex" the promoter DNA is still fully double-stranded.
RNA polymerase, assisted by one or more general transcription factors, then unwinds approximately 14 base pairs of DNA to form an RNA polymerase-promoter "open complex". In the "open complex" the promoter DNA is partly unwound and single-stranded. The exposed, single-stranded DNA is referred to as the "transcription bubble."
RNA polymerase, assisted by one or more general transcription factors, then selects a transcription start site in the transcription bubble, binds to an initiating NTP and an extending NTP (or a short RNA primer and an extending NTP) complementary to the transcription start site sequence, and catalyzes bond formation to yield an initial RNA product.
In bacteria, the RNA polymerase core enzyme consists of five subunits: 2 α subunits, 1 β subunit, 1 β' subunit, and 1 ω subunit. In bacteria, there is one general RNA transcription factor known as a sigma factor. RNA polymerase core enzyme binds to the bacterial general transcription (sigma) factor to form RNA polymerase holoenzyme and then binds to a promoter.
(RNA polymerase is called a holoenzyme when the sigma subunit is attached to the core enzyme.)
In archaea and eukaryotes, RNA polymerase contains subunits homologous to each of the five RNA polymerase subunits in bacteria and also contains additional subunits. In archaea and eukaryotes, the functions of the bacterial general transcription factor sigma are performed by multiple general transcription factors that work together.
In archaea, there are three general transcription factors: TBP, TFB, and TFE. In eukaryotes, in RNA polymerase II-dependent transcription, there are six general transcription factors: TFIIA, TFIIB (an ortholog of archaeal TFB), TFIID (a multisubunit factor in which the key subunit, TBP, is an ortholog of archaeal TBP), TFIIE (an ortholog of archaeal TFE), TFIIF, and TFIIH. TFIID is the first component to bind to DNA, through binding of TBP, while TFIIH is the last component to be recruited. In archaea and eukaryotes, the RNA polymerase-promoter closed complex is usually referred to as the "preinitiation complex."
Transcription initiation is regulated by additional proteins, known as activators and repressors, and, in some cases, associated coactivators or corepressors, which modulate formation and function of the transcription initiation complex.
After the first bond is synthesized, the RNA polymerase must escape the promoter. During this time there is a tendency to release the RNA transcript and produce truncated transcripts. This is called abortive initiation, and is common for both eukaryotes and prokaryotes. Abortive initiation continues to occur until an RNA product of a threshold length of approximately 10 nucleotides is synthesized, at which point promoter escape occurs and a transcription elongation complex is formed.
Mechanistically, promoter escape occurs through DNA scrunching, providing the energy needed to break interactions between RNA polymerase holoenzyme and the promoter.
In bacteria, it was historically thought that the sigma factor is always released after promoter clearance occurs. This theory had been known as the ''obligate release model.'' However, later data showed that upon and following promoter clearance, the sigma factor is released according to a stochastic model known as the ''stochastic release model''.
In eukaryotes, at an RNA polymerase II-dependent promoter, upon promoter clearance, TFIIH phosphorylates serine 5 on the carboxy terminal domain of RNA polymerase II, leading to the recruitment of capping enzyme (CE). The exact mechanism of how CE induces promoter clearance in eukaryotes is not yet known.
One strand of the DNA, the ''template strand'' (or noncoding strand), is used as a template for RNA synthesis. As transcription proceeds, RNA polymerase traverses the template strand and uses base pairing complementarity with the DNA template to create an RNA copy (which elongates during the traversal). Although RNA polymerase traverses the template strand from 3' → 5', the coding (non-template) strand and newly formed RNA can also be used as reference points, so transcription can be described as occurring 5' → 3'. This produces an RNA molecule from 5' → 3', an exact copy of the coding strand (except that thymines are replaced with uracils, and the nucleotides are composed of a ribose (5-carbon) sugar where DNA has deoxyribose (one fewer oxygen atom) in its sugar-phosphate backbone).
mRNA transcription can involve multiple RNA polymerases on a single DNA template and multiple rounds of transcription (amplification of particular mRNA), so many mRNA molecules can be rapidly produced from a single copy of a gene. The characteristic elongation rates in prokaryotes and eukaryotes are about 10–100 nucleotides per second. In eukaryotes, however, nucleosomes act as major barriers to transcribing polymerases during transcription elongation. In these organisms, the pausing induced by nucleosomes can be regulated by transcription elongation factors such as TFIIS.
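To give a feel for what an elongation rate of roughly 10 to 100 nucleotides per second implies, the following Python sketch estimates how long a single polymerase would need to traverse a gene. The function name and the 30,000-nucleotide gene length are hypothetical, and the estimate assumes a constant rate with no pausing, which nucleosome barriers make unrealistic in eukaryotes.

```python
def elongation_time_range(gene_length_nt: float,
                          slow_rate: float = 10.0,
                          fast_rate: float = 100.0) -> tuple[float, float]:
    """Return (fastest, slowest) transcription time in seconds for one polymerase.

    Rates are in nucleotides per second; the defaults are the approximate
    bounds quoted in the text, treated here as constant (an assumption).
    """
    if gene_length_nt < 0 or slow_rate <= 0 or fast_rate < slow_rate:
        raise ValueError("expected gene_length_nt >= 0 and 0 < slow_rate <= fast_rate")
    return gene_length_nt / fast_rate, gene_length_nt / slow_rate


# Hypothetical 30,000-nucleotide transcription unit.
fastest, slowest = elongation_time_range(30_000)
print(f"between {fastest / 60:.0f} and {slowest / 60:.0f} minutes")
```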
Elongation also involves a proofreading mechanism that can replace incorrectly incorporated bases. In eukaryotes, this may correspond with short pauses during transcription that allow appropriate RNA editing factors to bind. These pauses may be intrinsic to the RNA polymerase or due to chromatin structure.
Bacteria use two different strategies for transcription termination – Rho-independent termination and Rho-dependent termination. In Rho-independent transcription termination, RNA transcription stops when the newly synthesized RNA molecule forms a G-C-rich hairpin loop followed by a run of Us. When the hairpin forms, the mechanical stress breaks the weak rU-dA bonds, now filling the DNA–RNA hybrid. This pulls the poly-U transcript out of the active site of the RNA polymerase, terminating transcription. In the "Rho-dependent" type of termination, a protein factor called "Rho" destabilizes the interaction between the template and the mRNA, thus releasing the newly synthesized mRNA from the elongation complex.
Transcription termination in eukaryotes is less well understood than in bacteria, but involves cleavage of the new transcript followed by template-independent addition of adenines at its new 3' end, in a process called polyadenylation.
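The signature of bacterial Rho-independent termination described above (a G-C-rich hairpin followed by a run of Us) can be searched for with a simple pattern scan. The Python sketch below is a toy heuristic, not a validated terminator predictor: the function names, the fixed stem length, the loop-size range and the thresholds are all assumptions made for illustration, and it requires a perfectly base-paired stem.

```python
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA string (Watson-Crick pairs only)."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(rna))


def find_terminators(rna: str, stem_len: int = 6, min_loop: int = 3,
                     max_loop: int = 8, min_gc: float = 0.6,
                     min_u_run: int = 4) -> list[tuple[int, int]]:
    """Return (start, end) slices covering a perfect G-C-rich hairpin plus a U run."""
    rna = rna.upper()
    n = len(rna)
    hits = []
    for start in range(n - 2 * stem_len - min_loop - min_u_run + 1):
        left = rna[start:start + stem_len]
        # Skip stems that are not G-C rich enough.
        if sum(base in "GC" for base in left) / stem_len < min_gc:
            continue
        for loop in range(min_loop, max_loop + 1):
            right_start = start + stem_len + loop
            right_end = right_start + stem_len
            tail_end = right_end + min_u_run
            if tail_end > n:
                break
            pairs = rna[right_start:right_end] == reverse_complement(left)
            u_run = rna[right_end:tail_end] == "U" * min_u_run
            if pairs and u_run:
                hits.append((start, tail_end))
    return hits


# Hypothetical transcript: 6-nt stem, 4-nt loop, matching stem, then six Us.
example = "CC" + "GCCGCC" + "UUCG" + "GGCGGC" + "UUUUUU"
candidates = find_terminators(example)
```

Real terminators tolerate mismatches and bulges in the stem, so a practical tool would score hairpin stability rather than demand exact complementarity.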
Role of RNA Polymerase in Post-Transcriptional changes in RNA
RNA polymerase plays a crucial role in all steps, including post-transcriptional changes in RNA.
As shown in the accompanying images, the CTD (C-terminal domain) is a tail that changes its shape; this tail is used as a carrier of splicing, capping and polyadenylation factors.
Transcription inhibitors can be used as antibiotics against, for example, pathogenic bacteria (antibacterials) and fungi (antifungals). An example of such an antibacterial is rifampicin, which inhibits bacterial transcription of DNA into mRNA by inhibiting DNA-dependent RNA polymerase by binding its beta-subunit, while 8-hydroxyquinoline is an antifungal transcription inhibitor. The effects of histone methylation may also work to inhibit the action of transcription. Triptolide, a potent bioactive natural product that inhibits mammalian transcription via inhibition of the XPB subunit of the general transcription factor TFIIH, has been reported as a glucose conjugate for targeting hypoxic cancer cells with increased glucose transporter expression.
In vertebrates, the majority of gene promoters contain a CpG island with numerous CpG sites. When many of a gene's promoter CpG sites are methylated, the gene becomes inhibited (silenced).
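Whether a promoter region is CpG-rich can be checked numerically. The Python sketch below computes the GC fraction and the observed/expected CpG ratio of a sequence and applies one commonly used rule of thumb (length of at least 200 bp, GC content above 50%, observed/expected ratio above 0.6). These thresholds are not taken from this article and are an assumption here, as are the function names; the sketch says nothing about methylation status, which cannot be read from sequence alone.

```python
def cpg_stats(seq: str) -> tuple[float, float]:
    """Return (GC fraction, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("sequence must not be empty")
    c_count = seq.count("C")
    g_count = seq.count("G")
    cpg_count = seq.count("CG")  # CpG dinucleotides read 5'->3'
    gc_fraction = (c_count + g_count) / n
    expected = c_count * g_count / n  # CpGs expected if C and G were independent
    ratio = cpg_count / expected if expected else 0.0
    return gc_fraction, ratio


def looks_like_cpg_island(seq: str, min_len: int = 200,
                          min_gc: float = 0.5, min_ratio: float = 0.6) -> bool:
    """Apply the rule-of-thumb thresholds (assumed, not from the article)."""
    gc_fraction, ratio = cpg_stats(seq)
    return len(seq) >= min_len and gc_fraction > min_gc and ratio > min_ratio


# Hypothetical extreme cases: a pure CG repeat versus a pure AT repeat.
rich = "CG" * 100
poor = "AT" * 100
print(looks_like_cpg_island(rich), looks_like_cpg_island(poor))
```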
Colorectal cancers typically have 3 to 6 driver mutations and 33 to 66 hitchhiker or passenger mutations.
However, transcriptional inhibition (silencing) may be of more importance than mutation in causing progression to cancer. For example, in colorectal cancers about 600 to 800 genes are transcriptionally inhibited by CpG island methylation (see regulation of transcription in cancer). Transcriptional repression in cancer can also occur by other epigenetic mechanisms, such as altered expression of microRNAs. In breast cancer, transcriptional repression of BRCA1 may occur more frequently by over-expressed microRNA-182 than by hypermethylation of the BRCA1 promoter (see Low expression of BRCA1 in breast and ovarian cancers).
Active transcription units are clustered in the nucleus, in discrete sites called transcription factories. Such sites can be visualized by allowing engaged polymerases to extend their transcripts in tagged precursors (Br-UTP or Br-U) and immuno-labeling the tagged nascent RNA. Transcription factories can also be localized using fluorescence in situ hybridization or marked by antibodies directed against polymerases. There are ~10,000 factories in the nucleoplasm of a HeLa cell, among which are ~8,000 polymerase II factories and ~2,000 polymerase III factories. Each polymerase II factory contains ~8 polymerases. As most active transcription units are associated with only one polymerase, each factory usually contains ~8 different transcription units. These units might be associated through promoters and/or enhancers, with loops forming a "cloud" around the factory.
A molecule that allows the genetic material to be realized as a protein was first hypothesized by François Jacob and Jacques Monod. Severo Ochoa won a Nobel Prize in Physiology or Medicine in 1959 for developing a process for synthesizing RNA ''in vitro'' with polynucleotide phosphorylase, which was useful for cracking the genetic code. RNA synthesis by RNA polymerase was established ''in vitro'' by several laboratories by 1965; however, the RNA synthesized by these enzymes had properties that suggested the existence of an additional factor needed to terminate transcription correctly.
In 1972, Walter Fiers became the first person to actually prove the existence of the terminating enzyme.
Roger D. Kornberg won the 2006 Nobel Prize in Chemistry "for his studies of the molecular basis of eukaryotic transcription".
Measuring and detecting
Transcription can be measured and detected in a variety of ways:
* G-Less Cassette transcription assay: measures promoter strength
* Run-off transcription assay: identifies transcription start sites (TSS)
* Nuclear run-on assay: measures the relative abundance of newly formed transcripts
: measures single-stranded DNA generated by RNA polymerases; can work with 1,000 cells.
* RNase protection assay: detects active transcription sites
: measures the absolute abundance of total or nuclear RNA levels, which may however differ from transcription rates
* DNA microarrays: measure the relative abundance of the global total or nuclear RNA levels; however, these may differ from transcription rates
* In situ hybridization: detects the presence of a transcript
* MS2 tagging: by incorporating RNA stem loops, such as MS2, into a gene, these become incorporated into newly synthesized RNA. The stem loops can then be detected using a fusion of GFP and the MS2 coat protein, which has a high affinity, sequence-specific interaction with the MS2 stem loops. The recruitment of GFP to the site of transcription is visualized as a single fluorescent spot. This approach has revealed that transcription occurs in discontinuous bursts, or pulses (see Transcriptional bursting). With the notable exception of in situ techniques, most other methods provide cell population averages, and are not capable of detecting this fundamental property of genes.
* Northern blot: the traditional method, and until the advent of RNA-Seq, the most quantitative
* RNA-Seq: applies next-generation sequencing techniques to sequence whole transcriptomes, which allows the measurement of relative abundance of RNA, as well as the detection of additional variations such as fusion genes, post-transcriptional edits and novel splice sites
* Single cell RNA-Seq: amplifies and reads partial transcriptomes from isolated cells, allowing for detailed analyses of RNA in tissues, embryos, and cancers
Some viruses (such as HIV, the cause of AIDS) have the ability to transcribe RNA into DNA. HIV has an RNA genome that is ''reverse transcribed'' into DNA. The resulting DNA can be merged with the DNA genome of the host cell. The main enzyme responsible for synthesis of DNA from an RNA template is called reverse transcriptase.
In the case of HIV, reverse transcriptase is responsible for synthesizing a complementary DNA strand (cDNA) to the viral RNA genome. The enzyme ribonuclease H then digests the RNA strand, and reverse transcriptase synthesises a complementary strand of DNA to form a double helix DNA structure ("cDNA"). The cDNA is integrated into the host cell's genome by the enzyme integrase, which causes the host cell to generate viral proteins that reassemble into new viral particles. In HIV, subsequent to this, the host T cell undergoes programmed cell death, or apoptosis. However, in other retroviruses, the host cell remains intact as the virus buds out of the cell.
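The two synthesis steps of reverse transcription described above (a first DNA strand copied from the RNA, then a second DNA strand copied from the first) can be sketched at the sequence level in Python. The function names and the example sequence are hypothetical, and the sketch models base pairing only; priming, template switching and RNase H digestion are not represented.

```python
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}


def first_strand_cdna(rna_5to3: str) -> str:
    """Return the DNA strand complementary to the RNA, written 5'->3'."""
    return "".join(RNA_TO_DNA[base] for base in reversed(rna_5to3.upper()))


def second_strand(cdna_5to3: str) -> str:
    """Return the DNA strand complementary to the first cDNA strand, written 5'->3'."""
    return "".join(DNA_TO_DNA[base] for base in reversed(cdna_5to3.upper()))


viral_rna = "AUGCCUAAG"  # hypothetical RNA fragment, written 5'->3'
minus_strand = first_strand_cdna(viral_rna)
plus_strand = second_strand(minus_strand)

# The second DNA strand carries the RNA's sequence with T in place of U.
assert plus_strand == viral_rna.replace("U", "T")
```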
Some eukaryotic cells contain an enzyme with reverse transcription activity called telomerase. Telomerase is a reverse transcriptase that lengthens the ends of linear chromosomes. Telomerase carries an RNA template from which it synthesizes a repeating sequence of DNA, or "junk" DNA. This repeated sequence of DNA is called a telomere and can be thought of as a "cap" for a chromosome. It is important because every time a linear chromosome is duplicated, it is shortened. With this "junk" DNA or "cap" at the ends of chromosomes, the shortening eliminates some of the non-essential, repeated sequence rather than the protein-encoding DNA sequence that is farther away from the chromosome end.
Telomerase is often activated in cancer cells to enable cancer cells to duplicate their genomes indefinitely without losing important protein-coding DNA sequence. Activation of telomerase could be part of the process that allows cancer cells to become ''immortal''. Telomere lengthening by telomerase has been reported as the immortalizing factor in 90% of all carcinogenic tumors ''in vivo'', with the remaining 10% using an alternative telomere maintenance route called ALT, or Alternative Lengthening of Telomeres.
* Cell (biology)
* Cell division
* Gene expression
* Crick's central dogma, in which the product of transcription, mRNA, is translated to form polypeptides, and where it is asserted that the reverse processes never occur
* Gene regulation
* Long non-coding RNA
* Missense mRNA
* Splicing - process of removing introns from precursor messenger RNA (pre-mRNA) to make messenger RNA (mRNA)
* Translation (biology)
Interactive Java simulation of transcription initiation.
From the Center for Models of Life (http://cmol.nbi.dk/) at the Niels Bohr Institute.
Virtual Cell Animation Collection, Introducing Transcription