ORF1ab (also ORF1a/b) refers collectively to two open reading frames (ORFs), ORF1a and ORF1b, that are conserved in the genomes of nidoviruses, a group of viruses that includes coronaviruses. The genes express large polyproteins that undergo proteolysis to form several nonstructural proteins with various functions in the viral life cycle, including proteases and the components of the replicase-transcriptase complex (RTC). Together the two ORFs are sometimes referred to as the replicase gene. They are related by a programmed ribosomal frameshift that allows the ribosome to continue translating past the stop codon at the end of ORF1a, in a -1 reading frame. The resulting polyproteins are known as pp1a and pp1ab.
Expression
ORF1a is the first open reading frame at the 5' end of the genome. Together, ORF1a and ORF1b occupy about two thirds of the genome, with the remaining third at the 3' end encoding the structural proteins and accessory proteins. ORF1ab is translated from a 5' capped RNA by cap-dependent translation.
Nidoviruses have a complex system of discontinuous subgenomic RNA production to enable expression of genes in their relatively large RNA genomes (typically 27-32 kb for coronaviruses), but ORF1ab is translated directly from the genomic RNA. ORF1ab sequences have been observed in noncanonical subgenomic RNAs, though their functional significance is unclear.
A programmed ribosomal frameshift allows the ribosome to bypass the stop codon that terminates ORF1a and continue in a -1 reading frame, producing the longer polyprotein pp1ab. The frameshift occurs at a slippery sequence, which is followed by a pseudoknot RNA secondary structure. Frameshift efficiency has been measured at 20-50% for murine coronavirus and 45-70% for SARS-CoV-2, yielding a stoichiometry of roughly 1.5 to 2 times as much pp1a as pp1ab protein expressed.
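The frameshift mechanism described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the RNA is an invented toy sequence (not a viral genome), the slippery sequence is taken to be UUUAAAC with its last six nucleotides in the original reading frame, and the function names are hypothetical. The ratio helper further assumes an idealized model in which every ribosome either frameshifts (giving pp1ab) or terminates at the ORF1a stop codon (giving pp1a); measured efficiencies and observed protein levels need not agree with this idealization.

```python
from itertools import product

# Standard genetic code, built from the conventional U, C, A, G ordering.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(rna, start=0):
    """Translate codon by codon from `start` until a stop codon or the end."""
    protein = []
    for i in range(start, len(rna) - 2, 3):
        aa = CODON_TABLE[rna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def translate_with_frameshift(rna, slippery="UUUAAAC"):
    """Translate in frame 0 through the slippery site, then slip back 1 nt.

    Assumes (for this toy model) that the slippery sequence ends on a
    codon boundary of the original frame, e.g. U UUA AAC.
    """
    site = rna.find(slippery)
    if site == -1:
        raise ValueError("no slippery sequence found")
    end = site + len(slippery)
    if end % 3 != 0:
        raise ValueError("slippery sequence is not in the expected frame")
    upstream = translate(rna[:end])
    # -1 frameshift: the last nucleotide of the slippery site is read again
    # as the first nucleotide of the next codon.
    downstream = translate(rna, start=end - 1)
    return upstream + downstream


def pp1a_to_pp1ab_ratio(efficiency):
    """Idealized pp1a:pp1ab ratio for a given frameshift efficiency (0-1)."""
    return (1 - efficiency) / efficiency


# Invented toy sequence: AUG GCU UUA AAC GGG UUU GCG GUG UAA GUGCAGCCUGA
TOY_RNA = "AUGGCUUUAAACGGGUUUGCGGUGUAAGUGCAGCCUGA"
pp1a_like = translate(TOY_RNA)                   # stops at the in-frame UAA
pp1ab_like = translate_with_frameshift(TOY_RNA)  # bypasses it in the -1 frame
print(pp1a_like, pp1ab_like)
```

In this toy case the unshifted product is MALNGFAV, while the frameshifted product is MALNRVCGVSAA: the two share their N-terminal residues and diverge after the slippery site, mirroring the relationship between pp1a and pp1ab. Under the idealized ratio model, a frameshift efficiency of 40% corresponds to 1.5 times as much pp1a as pp1ab.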
Processing
The polyproteins pp1a and pp1ab contain about 13 to 17 nonstructural proteins. They undergo auto-proteolysis, releasing the nonstructural proteins through the action of internal cysteine protease domains.
In coronaviruses, there are a total of 16 nonstructural proteins; the pp1a protein contains nonstructural proteins nsp1-11 and the pp1ab protein contains nsp1-10 and nsp12-16. Proteolytic processing is performed by two proteases: the papain-like protease protein domain located in the multidomain protein nsp3 cleaves up to nsp4, and the 3CL protease (also known as the main protease, nsp5) performs the remaining cleavages of nsp5 through the polyprotein C-terminus.
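The division of labor between the two proteases can be written out as a small lookup, sketched here in Python. The sketch rests on one interpretation that should be treated as an assumption: "cleaves up to nsp4" is read as the papain-like protease handling the nsp1/2, nsp2/3, and nsp3/4 junctions, with the 3CL protease handling every junction from nsp4/5 onward. The labels "PLpro" and "3CLpro" and all function names are illustrative, not taken from the text above.

```python
def nsp_names(numbers):
    """Build names such as 'nsp1' from an iterable of integers."""
    return [f"nsp{n}" for n in numbers]


# Polyprotein contents as described for coronaviruses.
PP1A = nsp_names(range(1, 12))                               # nsp1-11
PP1AB = nsp_names(list(range(1, 11)) + list(range(12, 17)))  # nsp1-10, nsp12-16


def protease_for(downstream_nsp):
    """Return the protease assumed to cut the junction before `downstream_nsp`."""
    number = int(downstream_nsp[3:])
    # Assumption: junctions up to and including nsp3/nsp4 belong to the
    # papain-like protease; all later junctions belong to the 3CL protease.
    return "PLpro (nsp3)" if number <= 4 else "3CLpro (nsp5)"


def cleavage_map(polyprotein):
    """List (upstream, downstream, protease) for each junction in order."""
    return [
        (up, down, protease_for(down))
        for up, down in zip(polyprotein, polyprotein[1:])
    ]


for up, down, protease in cleavage_map(PP1AB):
    print(f"{up} | {down}: {protease}")
```

Under this reading, pp1ab has 14 junctions, three assigned to the papain-like protease and eleven to the 3CL protease. Note that nsp10 is followed directly by nsp12 in pp1ab, because nsp11 is present only in pp1a.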
Proteins nsp12-16, the C-terminal components of the pp1ab polyprotein, contain the core enzymatic activities necessary for viral replication.
After proteolytic processing, several of the nonstructural proteins assemble into a large protein complex known as the replicase-transcriptase complex (RTC), which performs genome replication and transcription.
Components
Core replicase domains
A set of five conserved "core replicase" protein domains is present in all nidovirus lineages (arteriviruses, mesoniviruses, roniviruses, and coronaviruses): from ORF1a, the main protease flanked on either end by transmembrane domains; and from ORF1b, a nucleotidyltransferase domain known as NiRAN, RNA-dependent RNA polymerase (RdRp), a zinc-binding domain, and a helicase.
(This is sometimes considered seven domains, counting the transmembrane regions separately.) In addition, an endoribonuclease domain is found in all nidoviruses that infect vertebrate hosts. Arteriviruses, which have smaller genomes than the other nidovirus lineages, lack methyltransferases as well as a proofreading exoribonuclease, a domain that is conserved in nidoviruses with larger genomes. This proofreading functionality is thought to be required for sufficient fidelity to replicate large RNA genomes, but may also play additional roles in some viruses.
Coronaviruses
In coronaviruses, pp1a and pp1ab together contain sixteen nonstructural proteins, which have the following functions:
Evolution
The structure and organization of the genome, including ORF1a, ORF1b, and the frameshift separating them, is conserved among nidoviruses. Some "non-canonical" nidovirus genome structures have been described, mainly involving gene fusions. The largest known nidovirus, planarian secretory cell nidovirus (PSCNV), with a 41 kb genome, has a non-canonical genome structure in which ORF1a, ORF1b, and downstream ORFs encoding structural proteins are fused and expressed as a single large ORF encoding a polyprotein of over 13,000 amino acids. In these non-canonical genomes, other frameshift locations or stop codon readthrough may be used to regulate the stoichiometry of viral proteins.
Nidoviruses vary widely in genome size, from arteriviruses with typically 12-15 kb genomes to coronaviruses at 27-32 kb. Their evolutionary history has been of research interest in understanding the replication of very large RNA genomes despite the relatively low-fidelity replication mechanism of the viral RNA-dependent RNA polymerase (RdRp). The larger nidovirus genomes (above around 20 kb) encode a proofreading exoribonuclease (nsp14 in coronaviruses) thought to be required for replication fidelity.
Among coronaviruses, ORF1ab is more highly conserved than the 3' ORFs encoding structural proteins. Throughout the COVID-19 pandemic, the genome of SARS-CoV-2 viruses has been sequenced many times, resulting in identification of thousands of distinct variants. In a World Health Organization analysis from July 2020, ORF1ab was the most frequently mutated gene, followed by the S gene encoding the spike protein. The most commonly mutated protein within ORF1ab was papain-like protease (nsp3), and the single most commonly observed missense mutation was in RNA-dependent RNA polymerase. Some PCR tests that detect COVID-19 analyze the specimen for the ORF1ab gene, among others.