Chimeric RNA

Chimeric RNA, sometimes referred to as a fusion transcript, is composed of exons from two or more different genes that have the potential to encode novel proteins. These mRNAs differ from those produced by conventional splicing in that they derive from two or more gene loci.


Review of RNA Production

In 1956, Francis Crick proposed what is now known as the "central dogma" of biology: DNA encodes the genetic information required for an organism to carry out its life cycle. In effect, DNA serves as the "hard drive" which stores genetic data. DNA is replicated and serves as its own template for replication. DNA forms a double helix structure and is composed of a sugar-phosphate backbone and nitrogenous bases; this can be thought of as a ladder structure where the sides of the ladder are constructed of deoxyribose sugar and phosphate while the rungs of the ladder are composed of paired nitrogenous bases. There are four bases in a DNA molecule: adenine (A), cytosine (C), thymine (T), and guanine (G). Nucleotides are the structural components of DNA and RNA, each consisting of a sugar molecule, a phosphate group, and a nitrogenous base. The double helix of DNA is composed of two antiparallel strands, that is, strands oriented in opposite directions. The strands are held together by base pairs in which adenine pairs with thymine and guanine pairs with cytosine.

While DNA serves as the template for production of ribonucleic acid (RNA), RNA is usually responsible for making protein. The process of making RNA from DNA is called transcription. RNA uses a similar set of bases except that thymine is replaced with uracil. A group of enzymes called RNA polymerases (isolated by biochemists Jerard Hurwitz and Samuel B. Weiss) function in the presence of DNA. These enzymes produce RNA using segments of chromosomal DNA as a template. Unlike replication, where a complete copy of DNA is made, transcription copies only the gene that is to be expressed as a protein.

Initially, it was thought that RNA served as a structural template for protein synthesis, essentially ordering amino acids by a series of cavities shaped specifically so that only specific amino acids would fit. Crick was not satisfied with this hypothesis given that the four bases of RNA are hydrophilic and that many amino acids prefer interactions with hydrophobic groups. Additionally, some amino acids are structurally very similar, and Crick felt that accurate discrimination between them would not be possible. Crick then proposed that prior to incorporation into proteins, amino acids are first attached to adapter molecules which have unique surface features that can bind to specific bases on the RNA templates. These adapter molecules are called transfer RNA (tRNA). Through a series of experiments involving E. coli and the T4 phage in 1960, it was shown that messenger RNA (mRNA) carries information from DNA to the ribosomal sites of protein synthesis. The tRNA-amino acid precursors are brought into position by ribosomes, where they can read the information provided by mRNA templates to synthesize protein.
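
The base-pairing, transcription, and codon-reading steps described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function names and the example sequence are invented for this sketch, and the codon table is a three-entry excerpt of the standard genetic code rather than the full table.

```python
# Watson-Crick pairing in DNA: A pairs with T, G pairs with C.
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Tiny excerpt of the standard genetic code (illustration only).
# None marks a stop codon.
CODON_TABLE = {"AUG": "Met", "GCC": "Ala", "UGA": None}


def reverse_complement(dna: str) -> str:
    """Return the antiparallel partner strand, written 5'->3'."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(dna.upper()))


def transcribe(coding_strand: str) -> str:
    """Return the RNA matching the coding strand, with U in place of T."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> list[str]:
    """Read the mRNA codon by codon until a stop codon is reached."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid is None:
            break
        peptide.append(amino_acid)
    return peptide


coding = "ATGGCCTGA"                      # hypothetical example gene
template = reverse_complement(coding)     # "TCAGGCCAT"
mrna = transcribe(coding)                 # "AUGGCCUGA"
peptide = translate(mrna)                 # ["Met", "Ala"]
```

The lookup in `CODON_TABLE` stands in for the adapter role of tRNA: each three-base codon is matched to one amino acid.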


RNA Splicing

Creating a protein consists of two main steps: transcription of DNA into RNA and translation of RNA into protein. After DNA is transcribed into RNA, the molecule is known as pre-messenger RNA (pre-mRNA) and it consists of exons and introns that can be split apart and rearranged in many different ways. Historically, exons were considered the coding sequence and introns were considered "junk" DNA. Although this has been shown to be false, it is true that exons are often joined together. Depending on the needs of the cell, regulatory mechanisms choose which exons, and sometimes introns, to join. This process of removing pieces of a pre-mRNA transcript and combining them with other pieces is called splicing. The human genome encodes approximately 25,000 genes, but significantly more proteins are produced. This is accomplished through RNA splicing. The exons of these 25,000 genes can be spliced in many different ways to create a very large number of RNA transcripts and, ultimately, proteins. Normally, exons from the same pre-mRNA transcript are spliced together. Occasionally, however, gene products or pre-mRNA transcripts are spliced together so that exons from different transcripts are mixed in a fusion product known as chimeric RNA. Chimeric RNA often incorporates exons from highly expressed genes, but the chimeric transcript itself is usually expressed at low levels. This chimeric RNA can then be translated into a fusion protein. Fusion proteins are very tissue-specific and are frequently associated with cancers such as colorectal cancer, prostate cancer, and mesothelioma. They make significant use of signal peptides and transmembrane proteins, which can alter the localization of proteins, possibly contributing to the disease phenotype.
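
The difference between conventional splicing and the formation of a chimeric RNA can be made concrete with a small Python sketch. It is a toy model, not a description of the spliceosome: the `Exon` class, the helper functions, the gene names `geneA` and `geneB`, and all sequences are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exon:
    gene: str       # gene locus the exon comes from
    number: int     # position of the exon within that gene
    sequence: str   # RNA sequence of the exon


def splice(exons, keep):
    """Conventional splicing: keep selected exons of ONE pre-mRNA, in order."""
    return [exon for exon in exons if exon.number in keep]


def join_transcripts(five_prime, three_prime):
    """Join exons from two different transcripts into one fusion product."""
    return list(five_prime) + list(three_prime)


def is_chimeric(transcript):
    """A transcript is chimeric if its exons come from more than one gene."""
    return len({exon.gene for exon in transcript}) > 1


def sequence(transcript):
    return "".join(exon.sequence for exon in transcript)


# Hypothetical genes with three exons each.
gene_a = [Exon("geneA", 1, "AUGGC"), Exon("geneA", 2, "CUUAG"), Exon("geneA", 3, "GGAUC")]
gene_b = [Exon("geneB", 1, "AUGAA"), Exon("geneB", 2, "CCGUA"), Exon("geneB", 3, "UUUGA")]

# Alternative splicing of a single gene: exon 2 is skipped.
isoform = splice(gene_a, {1, 3})

# Chimeric RNA: 5' exons of geneA joined to the 3' exon of geneB.
chimera = join_transcripts(splice(gene_a, {1, 2}), splice(gene_b, {3}))
```

Here `isoform` is an ordinary splice variant (all exons share one gene), while `chimera` mixes exons from two loci, which is the defining property of a chimeric RNA.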


Discovery of Chimeric RNA

One of the first studies to investigate the generation of chimeric RNA examined the fusion of the first three exons of a gene known as JAZF1 to the last 15 exons of a gene known as JJAZ1. This exact transcript, and the resulting protein, was found specifically in endometrial tissue. While often found in endometrial cancers, these transcripts are expressed in normal tissue as well. They were originally thought to be the result of chromosomal fusions, and one group investigated whether this was accurate. Using Southern blotting and fluorescence in situ hybridization (FISH) on the genome, the researchers found no evidence of DNA rearrangement. They decided to investigate further by combining human endometrial cells with rhesus fibroblasts and found chimeric products containing sequences from both species. These data suggested that chimeric RNA is generated by splicing parts of genes together rather than by chromosomal rearrangements. They also performed mass spectrometry on the translated protein to verify that the chimeric RNA is translated into protein. Recently, advances in next-generation sequencing have decreased the cost of sequencing significantly, allowing more RNA-seq projects to be conducted. Unlike traditional microarrays, which can detect only known transcripts, RNA-seq can detect novel RNA transcripts. Deep sequencing enables detection of transcripts even at very low levels. This has allowed researchers to detect many more chimeric RNAs and fusion proteins and has facilitated understanding of their role in health and disease.


Chimeric protein products

Numerous putative chimeric transcripts have been identified among the expressed sequence tags using high-throughput RNA sequencing technology. In humans, chimeric transcripts can be generated in several ways: by trans-splicing of pre-mRNAs, by RNA transcription runoff, by other errors in RNA transcription, or as the result of gene fusion following inter-chromosomal translocations or rearrangements. Among the few corresponding protein products that have been characterized so far, most result from chromosomal translocations and are associated with cancer. For instance, gene fusion in chronic myelogenous leukemia (CML) leads to an mRNA transcript that encompasses the 5′ end of the breakpoint cluster region protein (BCR) gene and the 3′ end of the Abelson murine leukemia viral oncogene homolog 1 (ABL) gene. Translation of this transcript results in a chimeric BCR–ABL protein that possesses increased tyrosine kinase activity. Chimeric transcripts characterize specific cellular phenotypes and are suspected to function not only in cancer, but also in normal cells. One example of a chimera in normal human cells is generated by trans-splicing of the 5′ exons of the JAZF1 gene on chromosome 7p15 and the 3′ exons of JJAZ1 (SUZ12) on chromosome 17q1. This chimeric RNA is translated in endometrial stroma cells and encodes an anti-apoptotic protein. Notable examples of chimeric genes in cancer are the fused BCR-ABL, FUS-ERG, MLL-AF6, and MOZ-CBP genes expressed in acute myeloid leukemia (AML), and the TMPRSS2-ETS chimera associated with overexpression of the oncogene in prostate cancer.


Characteristics of chimeric proteins

Frenkel-Morgenstern et al. have defined two main features of chimeric proteins. First, chimeras exploit signal peptides and transmembrane domains to alter the cellular localization of the associated activities. Second, chimeras incorporate parental genes that are expressed at a high level. In addition, a survey of all the functional domains in proteins encoded by chimeric transcripts demonstrated that chimeras contain complete protein domains significantly more often than would be expected in random data sets.


Databases of chimeric transcripts

Several databases have been constructed to incorporate chimeric transcripts from different resources using a variety of computational procedures:
* ChiTaRS
* ChimerDB 2.0
* HybridDB
* TICdb
* dbCrid


Computational tools for detecting chimeric RNA

Recent advances in high-throughput transcriptome sequencing have paved the way for new computational methods for fusion discovery. The following computational tools are available for detection of fusion transcripts from RNA-seq data:
* Fusim is a software tool for simulating fusion transcripts for comprehensive comparison across fusion discovery methods.
* CRAC integrates genomic locations and local coverage to enable splice junction or fusion RNA predictions directly from RNA-seq read analysis.
* TopHat-Fusion can discover fusion products deriving from known genes, unknown genes and unannotated splice variants of known genes.
* FusionAnalyser is a tool dedicated to the identification of driver fusion rearrangements in human cancer through the analysis of paired-end high-throughput transcriptome sequencing data.
* ChimeraScan offers discovery of chimeric transcription between two independent transcripts in high-throughput transcriptome sequencing data by providing features such as the ability to process long (>75 bp) paired-end reads, processing of ambiguously mapping reads and detection of reads spanning a fusion junction.
* FusionHunter identifies fusion transcripts from transcriptional analysis of paired-end RNA-seq reads.
* SplitSeek allows de novo prediction of splice junctions in short-read RNA-seq data, suitable for detection of novel splicing events and chimeric transcripts.
* Trans-ABySS is a de novo short-read transcriptome assembly and analysis pipeline that helps in the identification of known, new and alternative structures in expressed transcripts such as chimeric transcripts.
* FusionSeq identifies fusion transcripts from paired-end RNA sequencing. It includes filters to remove spurious candidate fusions with artifacts, such as misalignment or random pairing of transcript fragments.

Some caution needs to be applied in the interpretation of trans-splicing events detected in high-throughput sequencing experiments, as the reverse transcriptase enzymes ubiquitously used to determine RNA sequences are capable of introducing apparent trans-splicing events that were not present in the original RNA. However, some chimeric RNAs have been confirmed by other methods.
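
The general idea behind read-based fusion detection can be illustrated with a minimal Python sketch. It is not the algorithm of any tool named above: the `SplitRead` record, the `candidate_fusions` function, the `min_support` threshold, and the toy reads are all assumptions made for illustration. The sketch presumes that an aligner has already assigned the 5′ and 3′ parts of each read to a gene, and it simply counts reads whose two parts map to different genes.

```python
from collections import Counter
from typing import NamedTuple


class SplitRead(NamedTuple):
    read_id: str
    five_prime_gene: str    # gene hit by the 5' part of the read
    three_prime_gene: str   # gene hit by the 3' part of the read


def candidate_fusions(reads, min_support=2):
    """Count reads whose two parts map to different genes.

    Gene pairs supported by fewer than `min_support` reads are dropped,
    a crude guard against one-off artifacts such as template switching
    during reverse transcription.
    """
    support = Counter(
        (read.five_prime_gene, read.three_prime_gene)
        for read in reads
        if read.five_prime_gene != read.three_prime_gene
    )
    return {pair: count for pair, count in support.items() if count >= min_support}


# Toy input, invented for this example (not real sequencing data).
reads = [
    SplitRead("r1", "BCR", "ABL"),
    SplitRead("r2", "BCR", "ABL"),
    SplitRead("r3", "JAZF1", "JAZF1"),   # ordinary read within one gene
    SplitRead("r4", "JAZF1", "SUZ12"),   # single supporting read
    SplitRead("r5", "BCR", "ABL"),
]

fusions = candidate_fusions(reads)   # {("BCR", "ABL"): 3}
```

With the default threshold, the single JAZF1/SUZ12 read is filtered out; calling `candidate_fusions(reads, min_support=1)` would keep it. Real tools apply far more elaborate filters, and a read-count threshold alone cannot distinguish genuine low-abundance chimeras from artifacts.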


Chimeric RNA in lower eukaryotes

Although rare in higher eukaryotes, trans-splicing is used extensively by various lower eukaryotes, including nematodes and trypanosomes, to generate chimeric RNAs. In these organisms, splicing reactions between a protein-coding RNA and a universal sequence result in the attachment of a spliced leader to the 5′ end of the RNA, generating a functional messenger RNA. This system allows the use of operons: collections of protein-coding genes with a shared function that are simultaneously transcribed into a single RNA and then spliced into individual messenger RNAs, each of which codes for a single protein.
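
As a rough illustration of how an operon transcript is resolved into individual messages, the following Python sketch attaches the same leader to every coding region of a polycistronic RNA. The leader sequence, gene names, and coding sequences are placeholders invented for this example, not sequences from any organism.

```python
# Placeholder leader; real spliced-leader sequences are organism-specific.
SPLICED_LEADER = "ACGUACGU"


def resolve_operon(coding_regions, leader=SPLICED_LEADER):
    """Turn one polycistronic transcript into one mRNA per gene.

    `coding_regions` is an ordered list of (gene_name, rna_sequence) pairs
    representing the genes transcribed together in a single RNA. Each region
    receives the same leader at its 5' end.
    """
    return {name: leader + rna for name, rna in coding_regions}


# Hypothetical two-gene operon.
operon = [("gene1", "AUGAAAUAA"), ("gene2", "AUGCCCUGA")]
mrnas = resolve_operon(operon)
```

Every resulting mRNA is chimeric in the sense used here: its 5′ end comes from the leader and the remainder from the protein-coding gene.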

