ORF1a

ORF1ab (also ORF1a/b) refers collectively to two open reading frames (ORFs), ORF1a and ORF1b, that are conserved in the genomes of nidoviruses, a group of viruses that includes coronaviruses. The genes express large polyproteins that undergo proteolysis to form several nonstructural proteins with various functions in the viral life cycle, including proteases and the components of the replicase-transcriptase complex (RTC). Together the two ORFs are sometimes referred to as the replicase gene. They are related by a programmed ribosomal frameshift that allows the ribosome to continue translating past the stop codon at the end of ORF1a, in a -1 reading frame. The resulting polyproteins are known as pp1a and pp1ab.


Expression

ORF1a is the first open reading frame at the 5' end of the genome. Together, ORF1a and ORF1b occupy about two thirds of the genome, with the remaining third at the 3' end encoding the structural proteins and accessory proteins. ORF1ab is translated from a 5' capped RNA by cap-dependent translation. Nidoviruses have a complex system of discontinuous subgenomic RNA production to enable expression of genes in their relatively large RNA genomes (typically 27-32 kb for coronaviruses), but ORF1ab is translated directly from the genomic RNA. ORF1ab sequences have been observed in noncanonical subgenomic RNAs, though their functional significance is unclear. A programmed ribosomal frameshift allows the ribosome to read past the stop codon that terminates ORF1a and continue in a -1 reading frame, producing the longer polyprotein pp1ab. The frameshift occurs at a slippery sequence which is followed by a pseudoknot RNA secondary structure. Frameshifting efficiency has been measured at 20-50% for murine coronavirus and 45-70% in SARS-CoV-2, yielding a stoichiometry of roughly 1.5 to 2 times as much pp1a as pp1ab protein expressed.
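
The effect of a -1 frameshift on the translated product can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the sequence `TOY_MRNA` is a hypothetical toy, not a viral sequence, the motif `TTTAAAC` simply marks where the slip happens in this toy, and the function names are invented here. The sketch uses the standard genetic code and ignores the pseudoknot and frameshifting efficiency entirely.

```python
from itertools import product

# Standard genetic code, built compactly (bases ordered T, C, A, G; "*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq, start=0, end=None):
    """Translate codon by codon from `start`, stopping at `end` or a stop codon."""
    end = len(seq) if end is None else end
    protein = []
    for pos in range(start, end - 2, 3):
        amino_acid = CODON_TABLE[seq[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def translate_with_minus1_shift(seq, shift_point):
    """Read frame 0 up to `shift_point`, then step back one nucleotide
    and keep translating in the -1 frame until a stop codon."""
    upstream = translate(seq, 0, shift_point)
    downstream = translate(seq, shift_point - 1)
    return upstream + downstream


# Hypothetical toy mRNA (assumption): frame 0 hits a stop codon shortly
# after the slip motif, while the -1 frame stays open for longer.
TOY_MRNA = "ATGGCTTTAAACGGGTAAGTGCAGCCTGA"
SLIP_MOTIF = "TTTAAAC"  # assumed marker for the slip site in this toy
shift_point = TOY_MRNA.find(SLIP_MOTIF) + len(SLIP_MOTIF)

short_product = translate(TOY_MRNA)                              # analogous to pp1a
long_product = translate_with_minus1_shift(TOY_MRNA, shift_point)  # analogous to pp1ab

print(short_product)
print(long_product)
```

In this toy, both products share the same N-terminal residues up to the slip site, and only the frameshifted product continues beyond the frame-0 stop codon, which mirrors how pp1ab extends pp1a.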


Processing

The polyproteins pp1a and pp1ab contain about 13 to 17 nonstructural proteins. They undergo autoproteolysis, releasing the nonstructural proteins through the action of internal cysteine protease domains. In coronaviruses, there are a total of 16 nonstructural proteins; the pp1a protein contains nonstructural proteins nsp1-11 and the pp1ab protein contains nsp1-10 and nsp12-16. Proteolytic processing is performed by two proteases: the papain-like protease domain located in the multidomain protein nsp3 cleaves up to nsp4, and the 3CL protease (also known as the main protease, nsp5) performs the remaining cleavages, from nsp5 through to the polyprotein C-terminus. Proteins nsp12-16, the C-terminal components of the pp1ab polyprotein, contain the core enzymatic activities necessary for viral replication. After proteolytic processing, several of the nonstructural proteins assemble into a large protein complex known as the replicase-transcriptase complex (RTC), which performs genome replication and transcription.
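
The division of labour between the two proteases described above can be sketched as a small lookup in Python. This is a sketch under stated assumptions, not a definitive map: the names `PP1A`, `PP1AB`, `cleavage_plan` and `count_sites` are invented for illustration, and the rule that the papain-like protease handles exactly the junctions releasing nsp1 to nsp3 is one reading of "cleaves up to nsp4".

```python
PLPRO = "papain-like protease (nsp3)"
MPRO = "3CL / main protease (nsp5)"

# Coronavirus nonstructural proteins by number, as listed in the text.
PP1A = list(range(1, 12))                         # nsp1-11
PP1AB = list(range(1, 11)) + list(range(12, 17))  # nsp1-10 and nsp12-16


def cleavage_plan(nsps):
    """Return (junction, protease) pairs for one polyprotein."""
    plan = []
    for left, right in zip(nsps, nsps[1:]):
        # Assumption: "cleaves up to nsp4" means the junctions that
        # release nsp1, nsp2 and nsp3; all later junctions go to 3CLpro.
        protease = PLPRO if right <= 4 else MPRO
        plan.append((f"nsp{left}|nsp{right}", protease))
    return plan


def count_sites(plan, protease):
    """Count the junctions in `plan` assigned to `protease`."""
    return sum(1 for _, assigned in plan if assigned == protease)


for name, nsps in (("pp1a", PP1A), ("pp1ab", PP1AB)):
    plan = cleavage_plan(nsps)
    print(name, "PLpro sites:", count_sites(plan, PLPRO),
          "3CLpro sites:", count_sites(plan, MPRO))
```

Under this reading, pp1ab has three papain-like protease junctions and eleven 3CL protease junctions, and the two polyproteins together account for all sixteen nonstructural proteins.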


Components


Core replicase domains

A set of five conserved "core replicase" protein domains is present in all nidovirus lineages (arteriviruses, mesoniviruses, roniviruses, and coronaviruses): from ORF1a, the main protease flanked on either end by transmembrane domains; and from ORF1b, a nucleotidyltransferase domain known as NiRAN, RNA-dependent RNA polymerase (RdRp), a zinc-binding domain, and a helicase. (This is sometimes considered seven domains, counting the transmembrane regions separately.) In addition, an endoribonuclease domain is found in all nidoviruses that infect vertebrate hosts. Arteriviruses, which have smaller genomes than the other nidovirus lineages, lack methyltransferases as well as a proofreading exoribonuclease, a domain that is conserved in nidoviruses with larger genomes. This proofreading functionality is thought to be required for sufficient fidelity to replicate large RNA genomes, but may also play additional roles in some viruses.


Coronaviruses

In coronaviruses, pp1a and pp1ab together contain sixteen nonstructural proteins, which have the following functions:


Evolution

The structure and organization of the genome, including ORF1a, ORF1b, and the frameshift separating them, is conserved among nidoviruses. Some "non-canonical" nidovirus structures have been described, mainly involving gene fusions. The largest known nidovirus, planarian secretory cell nidovirus (PSCNV), with a 41 kb genome, has a non-canonical genome structure in which ORF1a, ORF1b, and downstream ORFs containing structural proteins are fused and expressed as a single large ORF encoding a polyprotein of over 13,000 amino acids. In these non-canonical genomes, other frameshift locations or stop codon readthrough may be used to regulate the stoichiometry of viral proteins. Nidoviruses vary widely in genome size, from arteriviruses with typically 12-15 kb genomes to coronaviruses at 27-32 kb. Their evolutionary history has been of research interest for understanding how very large RNA genomes are replicated despite the relatively low-fidelity replication mechanism of the viral RNA-dependent RNA polymerase (RdRp). The larger nidovirus genomes (above around 20 kb) encode a proofreading exoribonuclease (nsp14 in coronaviruses) thought to be required for replication fidelity. Among coronaviruses, ORF1ab is more highly conserved than the 3' ORFs encoding structural proteins.

Throughout the COVID-19 pandemic, the genome of SARS-CoV-2 has been sequenced many times, resulting in identification of thousands of distinct variants. In a World Health Organization analysis from July 2020, ORF1ab was the most frequently mutated gene, followed by the S gene encoding the spike protein. The most commonly mutated protein within ORF1ab was papain-like protease (nsp3), and the single most commonly observed missense mutation was in RNA-dependent RNA polymerase. Some PCR tests that detect COVID-19 analyze the specimen for the ORF1ab gene, among others.

