ORF1ab (also ORF1a/b) refers collectively to two open reading frames (ORFs), ORF1a and ORF1b, that are conserved in the genomes of nidoviruses, a group of viruses that includes coronaviruses. The genes express large polyproteins that undergo proteolysis to form several nonstructural proteins with various functions in the viral life cycle, including proteases and the components of the replicase-transcriptase complex (RTC). Together the two ORFs are sometimes referred to as the replicase gene. They are related by a programmed ribosomal frameshift that allows the ribosome to continue translating past the stop codon at the end of ORF1a, in a -1 reading frame. The resulting polyproteins are known as pp1a and pp1ab.
Expression
ORF1a is the first open reading frame at the 5' end of the genome. Together, ORF1a and ORF1b occupy about two thirds of the genome, with the remaining third at the 3' end encoding the structural proteins and accessory proteins. ORF1ab is translated from a 5' capped RNA by cap-dependent translation. Nidoviruses have a complex system of discontinuous subgenomic RNA production to enable expression of genes in their relatively large RNA genomes (typically 27-32 kb for coronaviruses), but ORF1ab is translated directly from the genomic RNA. ORF1ab sequences have been observed in noncanonical subgenomic RNAs, though their functional significance is unclear.
A programmed ribosomal frameshift allows the ribosome to bypass the stop codon that terminates ORF1a and continue in a -1 reading frame, producing the longer polyprotein pp1ab. The frameshift occurs at a slippery sequence which is followed by a pseudoknot RNA secondary structure. Frameshift efficiency has been measured at 20-50% for murine coronavirus and 45-70% for SARS-CoV-2, yielding a stoichiometry of roughly 1.5 to 2 times as much pp1a as pp1ab protein expressed.
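The effect of a -1 frameshift on the translated product can be illustrated with a minimal Python sketch. It rests on several assumptions that are not part of the text above: the slippery heptamer is taken to be UUUAAAC (the sequence reported for SARS-CoV-2), the toy mRNA is invented purely for illustration, and the downstream pseudoknot is not modeled at all. The function `translate` and the name `TOY_MRNA` are hypothetical.

```python
from itertools import product

# Standard genetic code, built from the conventional U, C, A, G ordering.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Assumption: slippery heptamer of the form X_XXY_YYZ (here U_UUA_AAC).
SLIPPERY = "UUUAAAC"

# Invented toy mRNA: start codon, slippery site, then a stop codon in frame 0.
TOY_MRNA = "AUGGCUUUAAACGGGUAACUGGAUAA"


def translate(mrna, frameshift=False):
    """Translate from the first AUG until a stop codon.

    If frameshift is True, the ribosome slips back one nucleotide right
    after decoding the last codon of the slippery site, so the final
    nucleotide of the heptamer is read a second time (-1 frame).
    """
    start = mrna.find("AUG")
    if start < 0:
        raise ValueError("no start codon found")
    site = mrna.find(SLIPPERY)
    slip_end = site + len(SLIPPERY) if site >= 0 else None

    protein = []
    i = start
    while i + 3 <= len(mrna):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":  # stop codon ends translation
            break
        protein.append(aa)
        i += 3
        if frameshift and i == slip_end:
            i -= 1  # -1 slip: re-read the last base of the slippery site
    return "".join(protein)


short_product = translate(TOY_MRNA)                  # analogous to pp1a
long_product = translate(TOY_MRNA, frameshift=True)  # analogous to pp1ab
print(short_product, long_product)
```

In this toy case the unshifted ribosome stops shortly after the slippery site, while the shifted ribosome shares the same N-terminal residues and then continues in the new frame, which mirrors how pp1a and pp1ab share nsp1-10 but differ downstream.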
Processing
The polyproteins pp1a and pp1ab contain about 13 to 17 nonstructural proteins. They undergo auto-proteolysis to release the nonstructural proteins due to the actions of internal cysteine protease domains.
In coronaviruses, there are a total of 16 nonstructural proteins; the pp1a protein contains nonstructural proteins nsp1-11 and the pp1ab protein contains nsp1-10 and nsp12-16. Proteolytic processing is performed by two proteases: the papain-like protease domain located in the multidomain protein nsp3 performs the cleavages up to nsp4, and the 3CL protease (also known as the main protease, nsp5) performs the remaining cleavages from nsp5 through the polyprotein C-terminus. Proteins nsp12-16, the C-terminal components of the pp1ab polyprotein, contain the core enzymatic activities necessary for viral replication. After proteolytic processing, several of the nonstructural proteins assemble into a large protein complex known as the replicase-transcriptase complex (RTC), which performs genome replication and transcription.
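The division of labor between the two coronavirus proteases can be sketched as a small lookup in Python. This is only an illustration under a stated assumption: "up to nsp4" is read as meaning that the papain-like protease cuts the three junctions ending at nsp3|nsp4, and every later junction is assigned to the 3CL protease. The names `PP1A`, `PP1AB`, `junction_protease`, and `cleavage_map` are hypothetical.

```python
# nsp composition of the two coronavirus polyproteins, as described above.
PP1A = list(range(1, 12))                          # nsp1-11
PP1AB = list(range(1, 11)) + list(range(12, 17))   # nsp1-10 and nsp12-16


def junction_protease(upstream_nsp):
    """Return the protease assumed to cut the junction after `upstream_nsp`.

    Assumption: junctions nsp1|2, nsp2|3 and nsp3|4 belong to the
    papain-like protease (in nsp3); all others belong to 3CLpro (nsp5).
    """
    return "PLpro" if upstream_nsp <= 3 else "3CLpro"


def cleavage_map(nsps):
    """List (upstream, downstream, protease) for each junction in order."""
    return [(a, b, junction_protease(a)) for a, b in zip(nsps, nsps[1:])]


pp1ab_map = cleavage_map(PP1AB)
pp1a_map = cleavage_map(PP1A)

for up, down, protease in pp1ab_map:
    print(f"nsp{up}|nsp{down}: {protease}")
```

Under this assumption, pp1ab has 14 junctions, of which 11 fall to the 3CL protease, and pp1a has 10 junctions, of which 7 fall to the 3CL protease.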
Components
Core replicase domains
A set of five conserved "core replicase" protein domains is present in all nidovirus lineages (arteriviruses, mesoniviruses, roniviruses, and coronaviruses): from ORF1a, the main protease flanked on either end by transmembrane domains; and from ORF1b, a nucleotidyltransferase domain known as NiRAN, RNA-dependent RNA polymerase (RdRp), a zinc-binding domain, and a helicase. (This is sometimes considered seven domains, counting the transmembrane regions separately.) In addition, an endoribonuclease domain is found in all nidoviruses that infect vertebrate hosts. Arteriviruses, which have smaller genomes than the other nidovirus lineages, lack methyltransferases as well as a proofreading exoribonuclease, a domain that is conserved in nidoviruses with larger genomes. This proofreading functionality is thought to be required for sufficient fidelity to replicate large RNA genomes, but may also play additional roles in some viruses.
Coronaviruses
In coronaviruses, pp1a and pp1ab together contain sixteen nonstructural proteins, which have the following functions:
Evolution
The structure and organization of the genome, including ORF1a, ORF1b, and the frameshift separating them, is conserved among nidoviruses. Some "non-canonical" nidovirus structures have been described, mainly involving gene fusions. The largest known nidovirus, planarian secretory cell nidovirus (PSCNV), with a 41 kb genome, has a non-canonical genome structure in which ORF1a, ORF1b, and downstream ORFs containing structural proteins are fused and expressed as a single large ORF encoding a polyprotein of over 13,000 amino acids. In these non-canonical genomes, other frameshift locations or stop codon readthrough may be used to regulate the stoichiometry of viral proteins.
Nidoviruses vary widely in genome size, from arteriviruses with typically 12-15 kb genomes to coronaviruses at 27-32 kb. Their evolutionary history has been of research interest in understanding the replication of very large RNA genomes despite the relatively low-fidelity replication mechanism of the viral RNA-dependent RNA polymerase (RdRp). The larger nidovirus genomes (above around 20 kb) encode a proofreading exoribonuclease (nsp14 in coronaviruses) thought to be required for replication fidelity.
Among coronaviruses, ORF1ab is more highly conserved than the 3' ORFs encoding structural proteins. Throughout the COVID-19 pandemic, the genome of SARS-CoV-2 viruses has been sequenced many times, resulting in identification of thousands of distinct variants. In a World Health Organization analysis from July 2020, ORF1ab was the most frequently mutated gene, followed by the S gene encoding the spike protein. The most commonly mutated protein within ORF1ab was papain-like protease (nsp3), and the single most commonly observed missense mutation was in RNA-dependent RNA polymerase. Some PCR tests that detect COVID-19 analyze the specimen for the ORF1ab gene, among others.