The 5′ untranslated region (also known as 5′ UTR, leader sequence, transcript leader, or leader RNA) is the region of a messenger RNA (mRNA) that is directly upstream from the initiation codon. This region is important for the regulation of translation of a transcript by differing mechanisms in viruses, prokaryotes and eukaryotes. While called untranslated, the 5′ UTR or a portion of it is sometimes translated into a protein product. This product can then regulate the translation of the main coding sequence of the mRNA. In many organisms, however, the 5′ UTR is completely untranslated, instead forming a complex secondary structure to regulate translation.
The 5′ UTR has been found to interact with proteins relating to metabolism, and some sequences within the 5′ UTR are themselves translated. In addition, this region has been involved in transcription regulation, as with the sex-lethal gene in ''Drosophila''. Regulatory elements within 5′ UTRs have also been linked to mRNA export.
General structure
Length
The 5′ UTR begins at the transcription start site and ends one nucleotide (nt) before the initiation sequence (usually AUG) of the coding region. In prokaryotes, the 5′ UTR tends to be 3–10 nucleotides long, while in eukaryotes it tends to be anywhere from 100 to several thousand nucleotides long. For example, the ''ste11'' transcript in ''Schizosaccharomyces pombe'' has a 2273-nucleotide 5′ UTR, while the ''lac'' operon in ''Escherichia coli'' has only seven nucleotides in its 5′ UTR. The differing sizes are likely due to the complexity of the eukaryotic regulation that the 5′ UTR carries, as well as the larger pre-initiation complex that must form to begin translation.
The 5′ UTR can also be completely missing, in the case of leaderless mRNAs. Ribosomes of all three domains of life accept and translate such mRNAs. Such sequences are naturally found in all three domains of life. Humans have many pressure-related genes under a 2–3 nucleotide leader. Mammals also have other types of ultra-short leaders like the TISU sequence.
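As a minimal sketch of the boundaries described above, the 5′ UTR can be sliced out of a transcript once the position of the initiation codon is known. The function name `five_prime_utr`, the 0-based `cds_start` index, and the toy sequences are illustrative assumptions, not taken from any database or library.

```python
def five_prime_utr(mrna: str, cds_start: int) -> str:
    """Return the 5' UTR of a transcript.

    The UTR runs from the transcription start site (index 0) up to, but not
    including, the initiation codon. `cds_start` is the 0-based index of the
    first nucleotide of that codon. It is supplied by the caller (an
    assumption of this sketch), because the first AUG in a transcript is not
    always the one used for the main coding sequence.
    """
    if not 0 <= cds_start <= len(mrna) - 3:
        raise ValueError("cds_start must point at a codon inside the transcript")
    return mrna[:cds_start]


# Hypothetical toy transcript with a 7-nucleotide leader.
toy = "GGAUCCA" + "AUGGCUUAA"
print(len(five_prime_utr(toy, 7)))           # 7

# A leaderless mRNA starts directly with the initiation codon.
print(five_prime_utr("AUGGCUUAA", 0) == "")  # True
```

Passing the start position explicitly keeps the sketch honest about upstream AUGs, which would mislead a naive "first AUG" search.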
Elements
The elements of a eukaryotic and prokaryotic 5′ UTR differ greatly. The prokaryotic 5′ UTR contains a ribosome binding site (RBS), also known as the Shine–Dalgarno sequence (AGGAGGU), which is usually 3–10 base pairs upstream from the initiation codon. In contrast, the eukaryotic 5′ UTR contains the Kozak consensus sequence (ACCAUGG), which contains the initiation codon. The eukaryotic 5′ UTR also contains ''cis''-acting regulatory elements called upstream open reading frames (uORFs), as well as upstream AUGs (uAUGs) and termination codons, which have a great impact on the regulation of translation (see below). Unlike in prokaryotes, 5′ UTRs in eukaryotes can harbor introns. In humans, ~35% of all genes harbor introns within the 5′ UTR.
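The two consensus motifs quoted above can be looked up with plain string matching. The sketch below is illustrative only: it assumes exact matches to the consensus strings given in the text, whereas real motifs are degenerate, and the function names and toy sequences are hypothetical.

```python
SHINE_DALGARNO = "AGGAGGU"  # consensus quoted in the text
KOZAK = "ACCAUGG"           # consensus quoted in the text; the AUG sits at offset 3


def shine_dalgarno_spacing(mrna: str, cds_start: int):
    """Return the number of nucleotides between the end of the closest
    upstream exact Shine-Dalgarno match and the initiation codon,
    or None if no exact match lies fully inside the 5' UTR."""
    pos = mrna.rfind(SHINE_DALGARNO, 0, cds_start)
    if pos == -1:
        return None
    return cds_start - (pos + len(SHINE_DALGARNO))


def has_kozak_context(mrna: str, cds_start: int) -> bool:
    """True if the initiation codon is embedded in an exact ACCAUGG match."""
    begin = cds_start - 3
    return begin >= 0 and mrna[begin:begin + len(KOZAK)] == KOZAK


# Hypothetical prokaryote-like transcript: SD motif, 5-nt spacer, start codon.
bacterial = "UU" + "AGGAGGU" + "AACAU" + "AUGAAAUAA"
print(shine_dalgarno_spacing(bacterial, 14))  # 5

# Hypothetical eukaryote-like transcript with the start codon in Kozak context.
eukaryotic = "GCC" + "ACCAUGG" + "CUUAA"
print(has_kozak_context(eukaryotic, 6))       # True
```

A position weight matrix or a regular expression would be a more realistic way to score degenerate motifs; exact matching is used here only to keep the example short.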
Secondary structure
As the 5′ UTR has high GC content, secondary structures often occur within it. Hairpin loops are one such secondary structure that can be located within the 5′ UTR. These secondary structures also impact the regulation of translation.
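GC content, mentioned above as a driver of secondary structure, is simply the fraction of G and C nucleotides in a sequence. The following is a minimal sketch; the function name `gc_content` and the toy sequence are assumptions for illustration, and the value alone does not predict whether a hairpin actually forms.

```python
def gc_content(seq: str) -> float:
    """Return the fraction of nucleotides in `seq` that are G or C.

    An empty sequence (for example, the 5' UTR of a leaderless mRNA)
    is reported as 0.0 rather than raising a division error.
    """
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


# Hypothetical 8-nucleotide leader: four of eight nucleotides are G or C.
print(gc_content("GGCCAUAU"))  # 0.5
```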
Role in translational regulation
Prokaryotes
In bacteria, the initiation of translation occurs when IF-3, along with the 30S ribosomal subunit, binds to the Shine–Dalgarno (SD) sequence of the 5′ UTR. This then recruits many other proteins, such as the 50S ribosomal subunit, which allows for translation to begin. Each of these steps regulates the initiation of translation.
Initiation in Archaea is less understood. SD sequences are much rarer, and the initiation factors have more in common with eukaryotic ones. There is no homolog of bacterial IF3. Some mRNAs are leaderless.
In both domains, genes without Shine–Dalgarno sequences are also translated in a less understood manner. A requirement seems to be a lack of secondary structure near the initiation codon.
Eukaryotes
Pre-initiation complex regulation
The regulation of translation in eukaryotes is more complex than in prokaryotes. Initially, the eIF4F complex is recruited to the 5′ cap, which in turn recruits the ribosomal complex to the 5′ UTR. Both eIF4E and eIF4G bind the 5′ UTR, which limits the rate at which translational initiation can occur. However, this is not the only regulatory step of translation that involves the 5′ UTR.
RNA-binding proteins sometimes serve to prevent the pre-initiation complex from forming. An example is regulation of the ''msl-2'' gene. The protein SXL attaches to an intron segment located within the 5′ UTR segment of the primary transcript, which leads to the inclusion of the intron after processing. This sequence allows the recruitment of proteins that bind simultaneously to both the 5′ and 3′ UTR, not allowing translation proteins to assemble. However, it has also been noted that SXL can repress translation of RNAs that do not contain a poly(A) tail, or more generally, a 3′ UTR.
Closed-loop regulation
Another important regulator of translation is the interaction between the 3′ UTR and the 5′ UTR. The closed-loop structure inhibits translation. This has been observed in ''Xenopus laevis'', in which eIF4E bound to the 5′ cap interacts with Maskin bound to CPEB on the 3′ UTR, creating translationally inactive transcripts. This translational inhibition is lifted once CPEB is phosphorylated, displacing the Maskin binding site and allowing for the polymerization of the poly(A) tail, which can recruit the translational machinery by means of PABP. However, it is important to note that this mechanism has been under great scrutiny.
Ferritin regulation
Iron levels in cells are maintained by translational regulation of many proteins involved in iron storage and metabolism. The 5′ UTR has the ability to form a hairpin loop secondary structure (known as the iron response element or IRE) that is recognized by iron-regulatory proteins (IRP1 and IRP2). When iron levels are low, the ORF of the target mRNA is blocked as a result of steric hindrance from the binding of IRP1 and IRP2 to the IRE. When iron is high, the two iron-regulatory proteins do not bind as strongly and allow proteins to be expressed that have a role in iron concentration control. This function has gained some interest after it was revealed that the translation of amyloid precursor protein may be disrupted due to a single-nucleotide polymorphism in the IRE found in the 5′ UTR of its mRNA, leading to a spontaneous increased risk of Alzheimer's disease.
uORFs and reinitiation
Another form of translational regulation in eukaryotes comes from unique elements on the 5′ UTR called upstream open reading frames (uORFs). These elements are fairly common, occurring in 35–49% of all human genes. A uORF is a coding sequence located in the 5′ UTR, upstream of the main coding sequence's initiation site. These uORFs contain their own initiation codon, known as an upstream AUG (uAUG). This codon can be scanned for by ribosomes and then translated to create a product, which can regulate the translation of the main protein coding sequence or other uORFs that may exist on the same transcript.
The translation of the protein within the main ORF after a uORF sequence has been translated is known as reinitiation.
The process of reinitiation is known to reduce the translation of the ORF protein. Control of protein regulation is determined by the distance between the uORF and the first codon in the main ORF.
Reinitiation has been found to increase with a longer distance between the uAUG of a uORF and the start codon of the main ORF, which indicates that the ribosome needs to reacquire translation factors before it can carry out translation of the main protein.
For example, ''ATF4'' regulation is performed by two uORFs further upstream, named uORF1 and uORF2, which contain three amino acids and fifty-nine amino acids, respectively. The location of uORF2 overlaps with the ''ATF4'' ORF. During normal conditions, the uORF1 is translated, and then translation of uORF2 occurs only after eIF2-TC has been reacquired. Translation of the uORF2 requires that the ribosomes pass by the ''ATF4'' ORF, whose start codon is located within uORF2. This leads to its repression. However, during stress conditions, the 40S ribosome will bypass uORF2 because of a decrease in concentration of eIF2-TC, which means the ribosome does not acquire one in time to translate uORF2. Instead, ''ATF4'' is translated.
Other mechanisms
In addition to reinitiation, uORFs contribute to translation initiation based on:
* The nucleotides of a uORF may code for a codon that leads to a highly structured mRNA, causing the ribosome to stall.
* ''cis''- and ''trans''-regulation of translation of the main protein coding sequence.
* Interactions with IRES sites.
Internal ribosome entry sites and viruses
Viral (as well as some eukaryotic) 5′ UTRs contain internal ribosome entry sites (IRES), which provide a cap-independent method of translational activation. Instead of building up a complex at the 5′ cap, the IRES allows for direct binding of the ribosomal complexes to the transcript to begin translation. The IRES enables the viral transcript to be translated more efficiently because no pre-initiation complex is needed, allowing the virus to replicate quickly.
Role in transcriptional regulation
''msl-2'' transcript
Transcription of the ''msl-2'' transcript is regulated by multiple binding sites for fly ''Sxl'' at the 5′ UTR. In particular, these poly-uracil sites are located close to a small intron that is spliced in males, but kept in females through splicing inhibition. This splicing inhibition is maintained by ''Sxl''. When present, ''Sxl'' will repress the translation of ''msl-2'' by increasing translation of a start codon located in a uORF in the 5′ UTR (see above for more information on uORFs). Also, ''Sxl'' outcompetes TIA-1 for a poly(U) region and prevents recruitment of snRNP (a step in alternative splicing) to the 5′ splice site.
See also
* Three prime untranslated region
* uORF
* Iron-responsive element-binding protein
* Iron response element
* Trans-splicing
* UTRdb