Micropeptides (also referred to as microproteins) are polypeptides shorter than 100-150 amino acids that are encoded by short open reading frames (sORFs). In this respect, they differ from many other active small polypeptides, which are produced through the posttranslational cleavage of larger polypeptides. In terms of size, micropeptides are considerably shorter than "canonical" proteins, which have an average length of 330 and 449 amino acids in prokaryotes and eukaryotes, respectively. Micropeptides are sometimes named according to their genomic location. For example, the translated product of an upstream open reading frame (uORF) might be called a uORF-encoded peptide (uPEP). Micropeptides generally lack N-terminal signal sequences, suggesting that they are likely to be localized to the cytoplasm. However, some micropeptides have been found in other cell compartments, as indicated by the existence of transmembrane micropeptides. They are found in both prokaryotes and eukaryotes. The sORFs from which micropeptides are translated can be encoded in 5' UTRs, small genes, or polycistronic mRNAs. Some micropeptide-coding genes were originally mis-annotated as long non-coding RNAs (lncRNAs).

Given their small size, sORFs were originally overlooked. However, hundreds of thousands of putative micropeptides have since been identified through various techniques in a multitude of organisms. Only a small fraction of these putative micropeptides have had their expression and function confirmed. Those that have been functionally characterized generally have roles in cell signaling, organogenesis, and cellular physiology. As more micropeptides are discovered, so are more of their functions. One regulatory function is that of peptoswitches, which inhibit expression of downstream coding sequences by stalling ribosomes, through their direct or indirect activation by small molecules.


Identification

Various experimental techniques exist for identifying potential sORFs and their translational products. These techniques are useful only for identifying sORFs that may produce micropeptides, not for direct functional characterization.


RNA sequencing

One method for finding potential sORFs, and therefore micropeptides, is RNA sequencing (RNA-Seq). RNA-Seq uses next-generation sequencing (NGS) to determine which RNAs are expressed in a given cell, tissue, or organism at a specific point in time. This collection of data, known as a transcriptome, can then be used as a resource for finding potential sORFs. Because sORFs encoding fewer than 100 aa are very likely to occur by chance, further study is necessary to determine the validity of data obtained using this method.
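
As a rough illustration of how a transcript sequence might be scanned for candidate sORFs, the following Python sketch searches the three forward reading frames for an ATG followed by an in-frame stop codon. The function name `find_sorfs` and the `min_aa` and `max_aa` cut-offs are assumptions made for this example, not part of any published pipeline, and a hit is only a candidate, since short ORFs frequently arise by chance.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_sorfs(transcript, min_aa=10, max_aa=100):
    """Return (start, end, n_aa) for candidate sORFs on the forward strand.

    start/end are 0-based, end-exclusive nucleotide coordinates that include
    the stop codon; n_aa counts codons from ATG up to, but not including,
    the stop. The length cut-offs are illustrative assumptions.
    """
    seq = transcript.upper().replace("U", "T")
    hits = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                n_aa = (i - start) // 3
                if min_aa <= n_aa <= max_aa:
                    hits.append((start, i + 3, n_aa))
                start = None  # look for the next ORF in this frame
    return hits


if __name__ == "__main__":
    toy = "CC" + "ATG" + "GCT" * 11 + "TAA" + "GG"
    print(find_sorfs(toy))
```

A complete scan would also cover the reverse strand and, depending on the organism, alternative start codons; both are omitted here to keep the sketch short.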


Ribosome profiling (Ribo-Seq)

Ribosome profiling has been used to identify potential micropeptides in a growing number of organisms, including fruit flies, zebrafish, mice and humans. One method uses compounds such as harringtonine, puromycin or lactimidomycin to stop ribosomes at translation initiation sites, which indicates where active translation is taking place. Translation elongation inhibitors, such as emetine or cycloheximide, may also be used to obtain ribosome footprints, which are more likely to correspond to a translated ORF. If a ribosome is bound at or near a sORF, the sORF putatively encodes a micropeptide.
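
The idea that a ribosome footprint at or near a sORF marks it as putatively translated can be sketched as a simple interval-overlap count. The Python example below is a toy under stated assumptions: footprints and sORFs are given as half-open coordinate pairs on the same transcript, and the function name `count_footprints` is hypothetical. Read alignment and quality filtering, which a real analysis would need, are left out.

```python
def count_footprints(sorfs, footprints):
    """Count ribosome footprints overlapping each candidate sORF.

    sorfs: dict mapping a sORF name to (start, end), end-exclusive.
    footprints: iterable of (start, end) footprint coordinates, end-exclusive.
    Both are assumed to lie on the same transcript and strand.
    """
    counts = {name: 0 for name in sorfs}
    for f_start, f_end in footprints:
        for name, (s_start, s_end) in sorfs.items():
            # Two half-open intervals overlap when each starts before the other ends.
            if f_start < s_end and s_start < f_end:
                counts[name] += 1
    return counts


if __name__ == "__main__":
    candidate_sorfs = {"sORF_a": (100, 160), "sORF_b": (300, 330)}
    reads = [(90, 118), (150, 180), (200, 228), (160, 188)]
    print(count_footprints(candidate_sorfs, reads))
```

A sORF with no overlapping footprints is not thereby shown to be untranslated; it may simply be expressed at a low level in the sampled condition.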


Mass spectrometry

Mass spectrometry (MS) is the gold standard for identifying and sequencing proteins. Using this technique, investigators can determine whether polypeptides are, in fact, translated from a sORF.


Proteogenomic applications

Proteogenomics combines proteomics, genomics, and transcriptomics, which is important when looking for potential micropeptides. One method of using proteogenomics entails using RNA-Seq data to create a custom database of all possible polypeptides. Liquid chromatography followed by tandem MS (LC-MS/MS) is then performed to provide sequence information for translation products. Comparison of the transcriptomic and proteomic data can be used to confirm the presence of micropeptides.
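
The workflow described above, building a custom polypeptide database from transcript sequences and then comparing it with MS-derived peptide sequences, can be sketched in Python as follows. This is a minimal illustration: the function names, the 100 aa cut-off, and the substring matching are all assumptions made for the example. Real search engines score observed spectra against theoretical ones rather than comparing strings.

```python
from itertools import product

# Standard genetic code, built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate_orfs(transcript, max_aa=100):
    """Return peptides (M...stop) of at most max_aa residues, forward frames only."""
    seq = transcript.upper().replace("U", "T")
    peptides = set()
    for frame in range(3):
        protein = "".join(
            CODON_TABLE.get(seq[i:i + 3], "X")  # X marks an ambiguous codon
            for i in range(frame, len(seq) - 2, 3)
        )
        # Drop the final segment: it is not terminated by a stop codon.
        for segment in protein.split("*")[:-1]:
            start = segment.find("M")
            if start != -1 and len(segment) - start <= max_aa:
                peptides.add(segment[start:])
    return peptides


def build_database(transcripts):
    """Map each predicted peptide to the transcript names that could encode it."""
    database = {}
    for name, seq in transcripts.items():
        for peptide in translate_orfs(seq):
            database.setdefault(peptide, set()).add(name)
    return database


def match_peptides(database, observed):
    """Return observed peptide sequences found within a predicted peptide."""
    hits = {}
    for fragment in observed:
        sources = {pep for pep in database if fragment in pep}
        if sources:
            hits[fragment] = sources
    return hits


if __name__ == "__main__":
    db = build_database({"tx1": "GGATGAAACCCGGGTTTTAGCC"})
    print(match_peptides(db, ["KPGF", "WWW"]))
```

Keeping the transcript names alongside each predicted peptide makes it possible to trace a matched peptide back to the RNA-Seq evidence that predicted it.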


Phylogenetic conservation

Phylogenetic conservation can be a useful tool, particularly when sifting through a large database of sORFs. A sORF is more likely to encode a functional micropeptide if it is conserved across numerous species. However, this approach will not work for all sORFs. For example, those that are encoded by lncRNAs are less likely to be conserved, given that lncRNAs themselves do not have high sequence conservation. Further experimentation is necessary to determine whether a functional micropeptide is in fact produced.
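
One simple way to express conservation across species is the fraction of identical residues between a reference peptide and each aligned ortholog. The Python sketch below assumes the sequences have already been aligned to equal length, with `-` marking gaps. The function names and the 0.7 identity threshold are arbitrary choices for illustration, not established standards.

```python
def percent_identity(reference, other):
    """Fraction of identical residues over aligned, ungapped positions."""
    if len(reference) != len(other):
        raise ValueError("sequences must be pre-aligned to equal length")
    compared = [(a, b) for a, b in zip(reference, other) if a != "-" and b != "-"]
    if not compared:
        return 0.0
    return sum(a == b for a, b in compared) / len(compared)


def conserved_species(reference, orthologs, threshold=0.7):
    """Return, sorted by name, the species whose ortholog meets the threshold.

    orthologs: dict mapping a species name to its aligned peptide sequence.
    threshold: illustrative cut-off, not a published value.
    """
    return sorted(
        species
        for species, seq in orthologs.items()
        if percent_identity(reference, seq) >= threshold
    )


if __name__ == "__main__":
    aligned = {"mouse": "MKPGFLA", "fish": "MKAGF-A", "fly": "MTTTTTT"}
    print(conserved_species("MKPGFLA", aligned))
```

A low score does not rule out function, which is consistent with the caveat above about poorly conserved lncRNA-encoded sORFs.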


Validating protein-coding potential


Antibodies

Custom antibodies targeted to the micropeptide of interest can be useful for quantifying expression or determining intracellular localization. As is the case with most proteins, low expression may make detection difficult. The small size of the micropeptide can also make it difficult to find an epitope against which to target the antibody.


Tagging with CRISPR-Cas9

Genome editing can be used to add FLAG/MYC or other small peptide tags to an endogenous sORF, thus creating fusion proteins. In most cases, this method is beneficial in that it can be performed more quickly than developing a custom antibody. It is also useful for micropeptides for which no epitope can be targeted.


In vitro translation

This process entails cloning the full-length micropeptide cDNA into a plasmid containing a T7 or SP6 promoter. This method utilizes a cell-free protein-synthesizing system in the presence of 35S-methionine to produce the peptide of interest. The products can then be analyzed by gel electrophoresis and the 35S-labeled peptide is visualized using autoradiography.


Databases and repositories

Several repositories and databases have been created for both sORFs and micropeptides. A repository of small ORFs discovered by ribosome profiling can be found at sORFs.org. A repository of putative sORF-encoded peptides in ''Arabidopsis thaliana'' can be found at ARA-PEPs. A database of small proteins, especially those encoded by non-coding RNAs, can be found at SmProt.


Prokaryotic examples

To date, most micropeptides have been identified in prokaryotic organisms. While most have yet to be fully characterized, many of those that have been studied appear to be critical to the survival of these organisms. Because of their small size, prokaryotes are particularly susceptible to changes in their environment, and as such have developed methods to ensure their survival.


''Escherichia coli'' ''(E. coli)''

Micropeptides expressed in ''E. coli'' exemplify bacterial environmental adaptations. Most of these have been classified into three groups: leader peptides, ribosomal proteins, and toxic proteins. Leader peptides regulate transcription and/or translation of proteins involved in amino acid metabolism when amino acids are scarce. Ribosomal proteins include L36 (''rpmJ'') and L34 (''rpmH''), two components of the 50S ribosomal subunit. Toxic proteins, such as the product of ''ldrD'', can kill cells or inhibit growth at high levels, reducing the host cell's viability.


''Salmonella enterica (S. enterica)''

In ''S. enterica'', the MgtC virulence factor is involved in adaptation to low-magnesium environments. The hydrophobic peptide MgtR binds to MgtC, causing its degradation by the FtsH protease.


''Bacillus subtilis (B. subtilis)''

The 46 aa Sda micropeptide, expressed by ''B. subtilis'', represses sporulation when replication initiation is impaired. By inhibiting the histidine kinase KinA, Sda prevents the activation of the transcription factor Spo0A, which is required for sporulation.


''Staphylococcus aureus (S. aureus)''

In ''S. aureus'', a group of micropeptides, 20-22 aa in length, is secreted during host infection to disrupt neutrophil membranes, causing cell lysis. These micropeptides allow the bacterium to avoid degradation by the human immune system's main defenses.


Eukaryotic examples

Micropeptides have been discovered in eukaryotic organisms from ''Arabidopsis thaliana'' to humans. They play diverse roles in tissue and organ development, as well as in the maintenance and function of tissues and organs once fully developed. While many are yet to be functionally characterized, and more likely remain to be discovered, below is a summary of recently identified eukaryotic micropeptide functions.


''Arabidopsis thaliana'' (''A. thaliana'')

The ''POLARIS'' (''PLS'') gene encodes a 36 aa micropeptide. It is necessary for proper vascular leaf patterning and cell expansion in the root. This micropeptide interacts with developmental PIN proteins to form a critical network for hormonal crosstalk between auxin, ethylene, and cytokinin.

''ROTUNDIFOLIA4'' (''ROT4'') in ''A. thaliana'' encodes a 53 aa peptide, which localizes to the plasma membrane of leaf cells. The mechanism of ROT4 function is not well understood, but mutants have short rounded leaves, indicating that this peptide may be important in leaf morphogenesis.


''Zea mays'' (''Z. mays'')

''Brick1'' (''Brk1'') encodes a 76 aa micropeptide, which is highly conserved in both plants and animals. In ''Z. mays'', it was found to be involved in morphogenesis of leaf epithelia, by promoting multiple actin-dependent cell polarization events in the developing leaf epidermis. Zm401p10 is an 89 aa micropeptide, which plays a role in normal pollen development in the tapetum. After mitosis, it is also essential for the degradation of the tapetum. Zm908p11 is a micropeptide 97 aa in length, encoded by the ''Zm908'' gene, which is expressed in mature pollen grains. It localizes to the cytoplasm of pollen tubes, where it aids in their growth and development.


''Drosophila melanogaster'' (''D. melanogaster'')

The evolutionarily conserved ''polished rice'' (''pri'') gene, known as ''tarsal-less'' (''tal'') in ''D. melanogaster'', is involved in epidermal differentiation. This polycistronic transcript encodes four similar peptides, which range from 11 to 32 aa in length. They function to truncate the transcription factor Shavenbaby (Svb). This converts Svb into an activator that directly regulates the expression of target effectors, including ''miniature'' (''m'') and ''shavenoid'' (''sha''), which are together responsible for trichome formation.


''Danio rerio'' (''D. rerio'')

The ''Elabela'' gene (''Ela'') (a.k.a. Apela, Toddler) is important for embryogenesis. It is specifically expressed during late blastula and gastrula stages. During gastrulation, it is critical in promoting the internalization and animal-pole directed movement of mesendodermal cells. After gastrulation, Ela is expressed in the lateral mesoderm and endoderm, as well as in the anterior and posterior notochord. Although it was annotated as a lncRNA in zebrafish, mouse, and human, the 58-aa ORF was found to be highly conserved among vertebrate species. Ela is processed by removal of its N-terminal signal peptide and then secreted into the extracellular space. Its 34-aa mature peptide serves as the first endogenous ligand to a GPCR known as the Apelin Receptor. The genetic inactivation of Ela or Aplnr in zebrafish results in heartless phenotypes.


''Mus musculus'' (''M. musculus'')

Myoregulin (Mln) is encoded by a gene originally annotated as a lncRNA. Mln is expressed in all three types of skeletal muscle, and works similarly to the micropeptides phospholamban (Pln) in cardiac muscle and sarcolipin (Sln) in slow (Type I) skeletal muscle. These micropeptides interact with sarcoplasmic reticulum Ca2+-ATPase (SERCA), a membrane pump responsible for Ca2+ uptake into the sarcoplasmic reticulum (SR). By inhibiting Ca2+ uptake into the SR, they regulate muscle relaxation. Similarly, the endoregulin (ELN) and another-regulin (ALN) genes code for transmembrane micropeptides that contain the SERCA binding motif, and are conserved in mammals.

Myomixer (Mymx), encoded by the gene ''Gm7325'', is a muscle-specific peptide, 84 aa in length, which plays a role during embryogenesis in myoblast fusion and skeletal muscle formation. It localizes to the plasma membrane, associating with a fusogenic membrane protein, Myomaker (Mymk). In humans, the gene encoding Mymx is annotated as uncharacterized ''LOC101929726''. Orthologs are found in the turtle, frog and fish genomes as well.


''Homo sapiens'' (''H. sapiens'')

In humans, NoBody (non-annotated P-body dissociating polypeptide), a 68 aa micropeptide, was discovered in the long intervening noncoding RNA (lincRNA) ''LINC01420''. It has high sequence conservation among mammals, and localizes to P-bodies. It associates with proteins involved in 5’ mRNA decapping and is thought to interact directly with Enhancer of mRNA Decapping 4 (EDC4).

''ELABELA'' (''ELA'') (a.k.a. APELA) is an endogenous hormone that is secreted as a 32 amino acid micropeptide by human embryonic stem cells. It is essential for maintaining the self-renewal and pluripotency of these cells. It signals in an autocrine fashion through the PI3K/AKT pathway via an as yet unidentified cell surface receptor. In differentiating mesendodermal cells, ELA binds to, and signals via, APLNR, a GPCR which can also respond to the hormonal peptide APLN.

The ''C7orf49'' gene, conserved in mammals, is predicted to produce three micropeptides when alternatively spliced. MRI-1 was previously found to be a modulator of retrovirus infection. The second predicted micropeptide, MRI-2, may be important in non-homologous end joining (NHEJ) of DNA double-strand breaks. In co-immunoprecipitation experiments, MRI-2 bound to Ku70 and Ku80, the two subunits of Ku, which plays a major role in the NHEJ pathway.

The 24 amino acid micropeptide Humanin (HN) interacts with the apoptosis-inducing protein Bcl2-associated X protein (Bax). In its active state, Bax undergoes a conformational change which exposes membrane-targeting domains. This causes it to move from the cytosol to the mitochondrial membrane, where it inserts and releases apoptogenic proteins such as cytochrome c. By interacting with Bax, HN prevents Bax from targeting the mitochondria, thereby blocking apoptosis.

A 90 aa micropeptide, "small regulatory polypeptide of amino acid response" (SPAAR), was found to be encoded in the lncRNA ''LINC00961''. It is conserved between human and mouse, and localizes to the late endosome/lysosome. SPAAR interacts with four subunits of the v-ATPase complex, inhibiting mTORC1 translocation to the lysosomal surface, where it is activated. Down-regulation of this micropeptide enables mTORC1 activation by amino acid stimulation, promoting muscle regeneration.

