Coronavirus Nucleocapsid Protein

The nucleocapsid (N) protein is a protein that packages the positive-sense RNA genome of coronaviruses to form ribonucleoprotein structures enclosed within the viral capsid. The N protein is the most highly expressed of the four major coronavirus structural proteins. In addition to its interactions with RNA, N forms protein-protein interactions with the coronavirus membrane protein (M) during the process of viral assembly. N also has additional functions in manipulating the cell cycle of the host cell. The N protein is highly immunogenic, and antibodies to N are found in patients recovered from SARS and COVID-19.


History

The coronavirus from Wuhan, China was first identified in January 2020. A patient in the state of Washington was given a diagnosis of coronavirus infection on 20 January. A group of scientists based at the Centers for Disease Control and Prevention in Atlanta, Georgia isolated the virus from nasopharyngeal and oropharyngeal swabs and were able to characterize its genomic sequence, replication properties and cell culture tropism. They made the virus available to the wider scientific community shortly thereafter "by depositing it into two virus reagent repositories".


Structure

The N protein is composed of two main protein domains connected by an intrinsically disordered region (IDR) known as the linker region, with additional disordered segments at each terminus. A third small domain at the C-terminal tail appears to have an ordered alpha-helical secondary structure and may be involved in the formation of higher-order oligomeric assemblies. In SARS-CoV, the causative agent of SARS, the N protein is 422 amino acid residues long, and in SARS-CoV-2, the causative agent of COVID-19, it is 419 residues long. Both the N-terminal and C-terminal domains are capable of binding RNA. The C-terminal domain forms a dimer that is likely to be the native functional state. Parts of the IDR, particularly a conserved sequence motif rich in serine and arginine residues (the SR-rich region), may also be implicated in dimer formation, though reports on this vary. Although higher-order oligomers formed through the C-terminal domain have been observed crystallographically, it is unclear whether these structures have a physiological role. The C-terminal dimer has been structurally characterized by X-ray crystallography for several coronaviruses and has a highly conserved structure. The N-terminal domain (sometimes known as the RNA-binding domain, though other parts of the protein also interact with RNA) has also been crystallized and has been studied by nuclear magnetic resonance spectroscopy in the presence of RNA.


Post-translational modifications

The N protein is post-translationally modified by phosphorylation at sites located in the IDR, particularly in the SR-rich region. The SARS-CoV-2 N protein is arginine-methylated by protein arginine methyltransferase 1 (PRMT1) at residues R95 and R177. Treatment with the type I PRMT inhibitor MS023, or substitution of R95 or R177 with lysine, inhibited the interaction of N protein with the 5'-UTR of SARS-CoV-2 genomic RNA, a property required for viral packaging (doi: 10.1016/j.jbc.2021.100821, PMID: 34029587). In several coronaviruses, ADP-ribosylation of the N protein has also been reported. The SARS-CoV N protein has been observed to be SUMOylated, and the N proteins of several coronaviruses including SARS-CoV-2 have been observed to be proteolytically cleaved; the functional significance of these modifications is unclear.


Expression and localization

Of the four major structural proteins, the N protein is the most highly expressed in host cells. Like the other structural proteins, the gene encoding the N protein is located toward the 3' end of the genome. N protein is localized primarily to the cytoplasm. In many coronaviruses, a population of N protein is localized to the nucleolus, which is thought to be associated with its effects on the cell cycle.


Function


Genome packaging and viral assembly

The N protein binds to RNA to form ribonucleoprotein (RNP) structures for packaging the genome into the viral capsid. The RNP particles formed are roughly spherical and are organized in flexible helical structures inside the virus. Formation of RNPs is thought to involve allosteric interactions between RNA and multiple RNA-binding regions of the protein. Dimerization of N is important for assembly of RNPs. Encapsidation of the genome occurs through interactions between N and M. N is essential for viral assembly. N also serves as a chaperone protein for the formation of RNA structure in the genomic RNA.


Genomic and subgenomic RNA synthesis

Synthesis of genomic RNA appears to involve participation by the N protein. N is physically colocalized with the viral RNA-dependent RNA polymerase early in the replication cycle and forms interactions with non-structural protein 3, a component of the replicase-transcriptase complex. Although N appears to facilitate efficient replication of genomic RNA, it is not required for RNA transcription in all coronaviruses. In at least one coronavirus, transmissible gastroenteritis virus (TGEV), N is involved in template switching in the production of subgenomic mRNAs, a process that is a distinctive feature of viruses in the order Nidovirales.


Cell cycle effects

Coronaviruses manipulate the cell cycle of the host cell through various mechanisms. In several coronaviruses, including SARS-CoV, the N protein has been reported to cause cell cycle arrest in S phase through interactions with cyclin-CDK complexes. In SARS-CoV, a cyclin box-binding region in the N protein can serve as a cyclin-CDK phosphorylation substrate. Trafficking of N to the nucleolus may also play a role in cell cycle effects. More broadly, N may be involved in the reduction of host cell protein translation activity.


Immune system effects

The N protein is involved in viral pathogenesis via its effects on components of the immune system. In SARS-CoV, MERS-CoV, and SARS-CoV-2, N has been reported to suppress interferon responses.


Evolution and conservation

The sequences and structures of N proteins from different coronaviruses, particularly the C-terminal domains, appear to be well conserved. Similarities between the structure and topology of the N proteins of coronaviruses and arteriviruses suggest a common evolutionary origin and support the classification of these two groups in the common order Nidovirales. Examination of SARS-CoV-2 sequences collected during the COVID-19 pandemic found that missense mutations were most common in the central linker region of the protein, suggesting this relatively unstructured region is more tolerant of mutations than the structured domains. A separate study of SARS-CoV-2 sequences identified at least one site in the N protein under positive selection. Because the N protein is well conserved, does not appear to recombine frequently, and produces a strong T-cell response, it has been studied as a potential target for coronavirus vaccines. The vaccine candidate UB-612 is one such experimental vaccine; it targets the N protein, along with other viral proteins, in an attempt to induce broad immunity.

