The transcriptome is the set of all RNA transcripts, including coding and non-coding, in an individual or a population of cells. The term can also sometimes be used to refer to all RNAs, or just mRNA, depending on the particular experiment. The term ''transcriptome'' is a portmanteau of the words ''transcript'' and ''genome''; it is associated with the process of transcript production during the biological process of transcription.
The early stages of transcriptome annotations began with cDNA libraries published in the 1980s. Subsequently, the advent of high-throughput technology led to faster and more efficient ways of obtaining data about the transcriptome. Two biological techniques are used to study the transcriptome, namely DNA microarray, a hybridization-based technique, and RNA-seq, a sequence-based approach. RNA-seq is the preferred method and has been the dominant transcriptomics technique since the 2010s. Single-cell transcriptomics allows tracking of transcript changes over time within individual cells.
Data obtained from the transcriptome is used in research to gain insight into processes such as cellular differentiation, carcinogenesis, transcription regulation and biomarker discovery, among others. Transcriptome-obtained data also finds applications in establishing phylogenetic relationships during the process of evolution and in ''in vitro'' fertilization. The transcriptome is closely related to other -ome based biological fields of study; it is complementary to the proteome and the metabolome and encompasses the translatome, exome, meiome and thanatotranscriptome, which can be seen as -ome fields studying specific types of RNA transcripts. There are quantifiable and conserved relationships between the transcriptome and other -omes, and transcriptomics data can be used effectively to predict other molecular species, such as metabolites. There are numerous publicly available transcriptome databases.
Etymology and history
The word ''transcriptome'' is a portmanteau of the words ''transcript'' and ''genome''. It appeared along with other neologisms formed using the suffixes ''-ome'' and ''-omics'' to denote all studies conducted on a genome-wide scale in the fields of life sciences and technology. As such, ''transcriptome'' and ''transcriptomics'' were among the first such words to emerge, along with ''genome'' and ''proteome''.
The first study to present a cDNA library collection for silk moth mRNA was published in 1979. The first seminal study to mention and investigate the transcriptome of an organism was published in 1997, and it described 60,633 transcripts expressed in ''S. cerevisiae'' using serial analysis of gene expression (SAGE). With the rise of high-throughput technologies and bioinformatics, and the subsequent increase in computational power, it became increasingly efficient and easy to characterize and analyze enormous amounts of data.
Attempts to characterize the transcriptome became more prominent with the advent of automated DNA sequencing during the 1980s. During the 1990s, expressed sequence tag sequencing was used to identify genes and their fragments. This was followed by techniques such as serial analysis of gene expression (SAGE), cap analysis of gene expression (CAGE), and massively parallel signature sequencing (MPSS).
Transcription
The transcriptome encompasses all the ribonucleic acid (RNA) transcripts present in a given organism or experimental sample. RNA is the main carrier of genetic information that is responsible for the process of converting DNA into an organism's phenotype. A gene can give rise to a single-stranded messenger RNA (mRNA) through a molecular process known as transcription; this mRNA is complementary to the strand of DNA it originated from.
The enzyme RNA polymerase II attaches to the template DNA strand and catalyzes the addition of ribonucleotides to the 3' end of the growing sequence of the mRNA transcript. In order to initiate its function, RNA polymerase II needs to recognize a promoter sequence, located upstream (5') of the gene. In eukaryotes, this process is mediated by transcription factors, most notably transcription factor II D (TFIID), which recognizes the TATA box and aids in the positioning of RNA polymerase at the appropriate start site. To finish the production of the RNA transcript, termination usually takes place several hundred nucleotides away from the termination sequence, and cleavage takes place.
This process occurs in the nucleus of a cell along with RNA processing, by which mRNA molecules are capped, spliced and polyadenylated to increase their stability before being subsequently taken to the cytoplasm. The mRNA gives rise to proteins through the process of translation that takes place in ribosomes.
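The complementarity between the template DNA strand and the mRNA described above can be illustrated with a minimal sketch. The function name `transcribe` and the example sequence are hypothetical, and the model ignores promoters, processing and termination entirely; it only applies base pairing.

```python
# Toy model of transcription: each template DNA base is paired with its
# complementary ribonucleotide (A->U, T->A, G->C, C->G).
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5: str) -> str:
    """Return the mRNA (written 5'->3') for a template strand written 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5.upper())


# Hypothetical template strand, read in the 3'->5' direction.
template = "TACGGCATT"
mrna = transcribe(template)
print(mrna)  # AUGCCGUAA
```

Because the polymerase adds ribonucleotides to the 3' end of the growing chain, reading the template 3'->5' yields the mRNA in its conventional 5'->3' orientation.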
Types of RNA transcripts
Almost all functional transcripts are derived from known genes. The only exceptions are a small number of transcripts that might play a direct role in regulating gene expression near the promoters of known genes. (See Enhancer RNA.)
Genes occupy most of a prokaryotic genome, so most of the genome is transcribed. Many eukaryotic genomes are very large, and known genes may take up only a fraction of the genome. In mammals, for example, known genes account for only 40-50% of the genome.
Nevertheless, identified transcripts often map to a much larger fraction of the genome, suggesting that the transcriptome contains spurious transcripts that do not come from genes. Some of these transcripts are known to be non-functional because they map to transcribed pseudogenes or degenerate transposons and viruses. Others map to unidentified regions of the genome that may be junk DNA.
Spurious transcription is very common in eukaryotes, especially those with large genomes that might contain a lot of junk DNA.
Some scientists claim that if a transcript has not been assigned to a known gene, then the default assumption must be that it is junk RNA until it has been shown to be functional. This would mean that much of the transcriptome in species with large genomes is probably junk RNA. (See Non-coding RNA.)
The transcriptome includes the transcripts of protein-coding genes (mRNA plus introns) as well as the transcripts of non-coding genes (functional RNAs plus introns).
* Ribosomal RNA/rRNA: Usually the most abundant RNA in the transcriptome.
* Long non-coding RNA/lncRNA: Non-coding RNA transcripts that are more than 200 nucleotides long. Members of this group comprise the largest fraction of the non-coding transcriptome other than introns. It is not known how many of these transcripts are functional and how many are junk RNA.
* Transfer RNA/tRNA
* Micro RNA/miRNA: 19-24 nucleotides (nt) long. Micro RNAs up- or downregulate expression levels of mRNAs by the process of RNA interference at the post-transcriptional level.
* Small interfering RNA/siRNA: 20-24 nt
* Small nucleolar RNA/snoRNA
* Piwi-interacting RNA/piRNA: 24-31 nt. They interact with Piwi proteins of the Argonaute family and have a function in targeting and cleaving transposons.
* Enhancer RNA/eRNA
Scope of study
In the human genome, all genes get transcribed into RNA because that is how the molecular gene is defined. (See Gene.) The transcriptome consists of coding regions of mRNA plus non-coding UTRs, introns, non-coding RNAs, and spurious non-functional transcripts.
Several factors render the content of the transcriptome difficult to establish. These include alternative splicing, RNA editing and alternative transcription, among others. Additionally, transcriptome techniques capture the transcription occurring in a sample only at a specific time point, whereas the content of the transcriptome can change during differentiation. The main aims of transcriptomics are the following: "catalogue all species of transcript, including mRNAs, non-coding RNAs and small RNAs; to determine the transcriptional structure of genes, in terms of their start sites, 5′ and 3′ ends, splicing patterns and other post-transcriptional modifications; and to quantify the changing expression levels of each transcript during development and under different conditions".
The term can be applied to the total set of transcripts in a given organism, or to the specific subset of transcripts present in a particular cell type. Unlike the genome, which is roughly fixed for a given cell line (excluding mutations), the transcriptome can vary with external environmental conditions. Because it includes all mRNA transcripts in the cell, the transcriptome reflects the genes that are being actively expressed at any given time, with the exception of mRNA degradation phenomena such as transcriptional attenuation. The study of transcriptomics (which includes expression profiling, splice variant analysis, etc.) examines the expression level of RNAs in a given cell population, often focusing on mRNA, but sometimes including others such as tRNAs and sRNAs.
Methods of construction
Transcriptomics is the quantitative science that encompasses the assignment of a list of strings ("reads") to objects ("transcripts" in the genome). To calculate the expression strength, the density of reads corresponding to each object is counted. Initially, transcriptomes were analyzed and studied using expressed sequence tag libraries and serial and cap analysis of gene expression (SAGE and CAGE).
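The idea of counting read density per transcript can be sketched as follows. This is a minimal illustration, assuming the reads have already been assigned to transcripts; the transcript IDs, lengths and the helper name `read_density` are hypothetical, and density is expressed here as reads per kilobase of transcript.

```python
from collections import Counter

# Hypothetical assignments: one transcript ID per read.
read_assignments = ["tx1", "tx2", "tx1", "tx3", "tx1", "tx2"]
# Hypothetical transcript lengths in nucleotides.
transcript_lengths = {"tx1": 1500, "tx2": 500, "tx3": 2000}


def read_density(assignments, lengths):
    """Return reads per kilobase for every transcript in `lengths`."""
    counts = Counter(assignments)
    return {tx: counts.get(tx, 0) / (length / 1000) for tx, length in lengths.items()}


density = read_density(read_assignments, transcript_lengths)
for tx, value in density.items():
    print(tx, value)
```

Dividing by length matters because a long transcript collects more reads than a short one at the same expression level; raw counts alone would overstate it.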
Currently, the two main transcriptomics techniques are DNA microarrays and RNA-Seq. Both techniques require RNA isolation through RNA extraction techniques, followed by its separation from other cellular components and enrichment of mRNA.
There are two general methods of inferring transcriptome sequences. One approach maps sequence reads onto a reference genome, either of the organism itself (whose transcriptome is being studied) or of a closely related species. The other approach, ''de novo'' transcriptome assembly, uses software to infer transcripts directly from short sequence reads and is used in organisms with genomes that are not sequenced.
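The reference-based approach can be illustrated with a deliberately simplified sketch that places reads on a reference by exact string matching. The function `map_reads` and the sequences are hypothetical; real aligners use indexed data structures and tolerate mismatches, gaps and splice junctions, none of which is modeled here.

```python
def map_reads(reads, reference):
    """Return {read: [0-based start positions]} for exact matches in the reference."""
    positions = {}
    for read in reads:
        hits = []
        start = reference.find(read)
        while start != -1:
            hits.append(start)
            # Continue searching one base further to catch overlapping matches.
            start = reference.find(read, start + 1)
        positions[read] = hits
    return positions


# Hypothetical reference sequence and short reads.
reference = "ATGGCCATTGTAATGGGCCGC"
reads = ["ATGG", "TTGT", "CCCC"]
hits = map_reads(reads, reference)
print(hits)  # {'ATGG': [0, 12], 'TTGT': [7], 'CCCC': []}
```

A read with several hits (here `ATGG`) shows why ambiguous placement is a practical concern, and a read with no hit (`CCCC`) is the kind of sequence a ''de novo'' assembly would have to account for without a reference.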
DNA microarrays
The first transcriptome studies were based on microarray techniques (also known as DNA chips). Microarrays consist of thin glass layers with spots on which oligonucleotides, known as "probes", are arrayed; each spot contains a known DNA sequence.
When performing microarray analyses, mRNA is collected from a control and an experimental sample, the latter usually representative of a disease. The RNA of interest is converted to cDNA to increase its stability and marked with fluorophores of two colors, usually green and red, for the two groups. The cDNA is spread onto the surface of the microarray, where it hybridizes with oligonucleotides on the chip, and a laser is used to scan it. The fluorescence intensity on each spot of the microarray corresponds to the level of gene expression, and based on the color of the fluorophores selected, it can be determined which of the samples exhibits higher levels of the mRNA of interest.
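The two-color comparison is commonly summarized per spot as a log ratio of the two intensities. The sketch below assumes, purely for illustration, that red labels the experimental sample and green the control, and that the intensities are already background-corrected; the gene names and values are hypothetical.

```python
import math

# Hypothetical background-corrected intensities per spot:
# (red = experimental sample, green = control sample).
spots = {
    "geneA": (2000.0, 500.0),
    "geneB": (300.0, 1200.0),
    "geneC": (800.0, 800.0),
}


def log2_ratio(red, green):
    """Positive values: higher in the red sample; negative: higher in the green sample."""
    return math.log2(red / green)


ratios = {gene: log2_ratio(red, green) for gene, (red, green) in spots.items()}
for gene, ratio in ratios.items():
    print(gene, ratio)
```

The logarithm makes the scale symmetric: a fourfold increase and a fourfold decrease give +2 and -2, while equal intensities give 0.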
One microarray usually contains enough oligonucleotides to represent all known genes; however, data obtained using microarrays does not provide information about unknown genes. During the 2010s, microarrays were almost completely replaced by next-generation techniques that are based on DNA sequencing.
RNA sequencing
RNA sequencing is a next-generation sequencing technology; as such it requires only a small amount of RNA and no previous knowledge of the genome. It allows for both qualitative and quantitative analysis of RNA transcripts, the former allowing discovery of new transcripts and the latter a measure of relative quantities for transcripts in a sample.
The three main steps of sequencing the transcriptome of any biological sample are RNA purification, the synthesis of an RNA or cDNA library, and sequencing of the library. The RNA purification process is different for short and long RNAs. This step is usually followed by an assessment of RNA quality, with the purpose of avoiding contaminants such as DNA or technical contaminants related to sample processing. RNA quality is measured using UV spectrometry, with an absorbance peak at 260 nm. RNA integrity can also be analyzed quantitatively by comparing the ratio and intensity of 28S RNA to 18S RNA, reported as the RNA Integrity Number (RIN) score. Since mRNA is the species of interest and it represents only 3% of the total RNA content, the RNA sample should be treated to remove rRNA and tRNA and tissue-specific RNA transcripts.
The library preparation step, which aims to produce short cDNA fragments, begins with RNA fragmentation into pieces between 50 and 300 base pairs in length. Fragmentation can be enzymatic (RNA endonucleases), chemical (tris-magnesium salt buffer, chemical hydrolysis) or mechanical (sonication, nebulisation). Reverse transcription is used to convert the RNA templates into cDNA, and three priming methods can be used to achieve it: oligo-dT priming, random primers, or ligation of special adaptor oligos.
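The two library-preparation operations described above, fragmentation and reverse transcription, can be mimicked on strings. This is a toy sketch under stated assumptions: the functions `fragment` and `reverse_transcribe` are hypothetical, fragment sizes are drawn uniformly at random, and priming, adaptors and second-strand synthesis are ignored.

```python
import random

RNA_TO_CDNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def fragment(rna, min_len=50, max_len=300, seed=0):
    """Cut an RNA string into consecutive pieces of random size.

    Every piece except possibly the last is between min_len and max_len long;
    the last piece is whatever remains and may be shorter.
    """
    rng = random.Random(seed)
    pieces, pos = [], 0
    while pos < len(rna):
        size = rng.randint(min_len, max_len)
        pieces.append(rna[pos:pos + size])
        pos += size
    return pieces


def reverse_transcribe(rna_fragment):
    """Return first-strand cDNA written 5'->3' (reverse complement of the RNA)."""
    return "".join(RNA_TO_CDNA[base] for base in reversed(rna_fragment))


rna = "AUGC" * 250  # hypothetical 1000-nt transcript
pieces = fragment(rna)
cdna_library = [reverse_transcribe(piece) for piece in pieces]
print(len(pieces), [len(p) for p in pieces])
```

Reversing the fragment before complementing reflects that the cDNA strand is antiparallel to its RNA template.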
Single-cell transcriptomics
Transcription can also be studied at the level of individual cells by single-cell transcriptomics. Single-cell RNA sequencing (scRNA-seq) is a recently developed technique that allows the analysis of the transcriptome of single cells, including bacteria. With single-cell transcriptomics, subpopulations of cell types that constitute the tissue of interest are also taken into consideration. This approach makes it possible to identify whether changes in experimental samples are due to phenotypic cellular changes as opposed to proliferation, with which a specific cell type might be overrepresented in the sample. Additionally, when assessing cellular progression through differentiation, average expression profiles are only able to order cells by time rather than their stage of development and are consequently unable to show trends in gene expression levels specific to certain stages. Single-cell transcriptomic techniques have been used to characterize rare cell populations such as circulating tumor cells, cancer stem cells in solid tumors, and embryonic stem cells (ESCs) in mammalian blastocysts.
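The distinction between a change in per-cell expression and a change in cell-type proportions can be made concrete with a small numerical sketch. All values, cell-type labels and function names below are hypothetical; the point is only that a bulk average shifts when one cell type proliferates, even though no individual cell changes its expression.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical expression of a single gene, one (cell_type, value) pair per cell.
control = [("typeA", 10.0)] * 5 + [("typeB", 2.0)] * 5
treated = [("typeA", 10.0)] * 8 + [("typeB", 2.0)] * 2  # typeA has proliferated


def bulk_average(cells):
    """What a bulk measurement sees: one mean over all cells."""
    return mean(value for _, value in cells)


def per_type_average(cells):
    """What single-cell data allows: one mean per cell type."""
    grouped = defaultdict(list)
    for cell_type, value in cells:
        grouped[cell_type].append(value)
    return {cell_type: mean(values) for cell_type, values in grouped.items()}


print(bulk_average(control), bulk_average(treated))
print(per_type_average(control), per_type_average(treated))
```

Here the bulk mean rises from 6.0 to about 8.4 while the per-type means stay at 10.0 and 2.0, so the apparent change is explained entirely by composition.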
Although there are no standardized techniques for single-cell transcriptomics, several steps need to be undertaken. The first step includes cell isolation, which can be performed using low- and high-throughput techniques. This is followed by a qPCR step and then single-cell RNA-seq, where the RNA of interest is converted into cDNA. Newer developments in single-cell transcriptomics allow for tissue and sub-cellular localization preservation through cryo-sectioning thin slices of tissues and sequencing the transcriptome in each slice. Another technique allows the visualization of single transcripts under a microscope while preserving the spatial information of each individual cell where they are expressed.
Analysis
A number of organism-specific transcriptome databases have been constructed and annotated to aid in the identification of genes that are differentially expressed in distinct cell populations.
RNA-seq emerged around 2013 as the method of choice for measuring transcriptomes of organisms, though the older technique of DNA microarrays remained in use. RNA-seq measures the transcription of a specific gene by converting long RNAs into a library of cDNA fragments. The cDNA fragments are then sequenced using high-throughput sequencing technology and aligned to a reference genome or transcriptome, and the alignments are then used to create an expression profile of the genes.
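The text does not say how aligned reads are turned into an expression profile. One widely used normalization, shown here as an illustrative choice rather than as the method the text refers to, is transcripts per million (TPM): counts are first divided by gene length, then rescaled so that the values in a sample sum to one million. The gene names, counts and lengths are hypothetical.

```python
def tpm(counts, lengths):
    """Compute transcripts per million from read counts and gene lengths (nt)."""
    # Step 1: reads per kilobase, correcting for gene length.
    rpk = {gene: counts[gene] / (lengths[gene] / 1000) for gene in counts}
    # Step 2: rescale so that all values in the sample sum to one million.
    total = sum(rpk.values())
    return {gene: value * 1_000_000 / total for gene, value in rpk.items()}


# Hypothetical aligned-read counts and gene lengths.
counts = {"geneA": 100, "geneB": 300, "geneC": 600}
lengths = {"geneA": 1000, "geneB": 3000, "geneC": 2000}

profile = tpm(counts, lengths)
for gene, value in profile.items():
    print(gene, value)
```

In this example geneA and geneB end up with the same TPM despite a threefold difference in raw counts, because geneB is three times longer.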
Applications
Mammals
The transcriptomes of stem cells and cancer cells are of particular interest to researchers who seek to understand the processes of cellular differentiation and carcinogenesis. A pipeline using RNA-seq or gene array data can be used to track genetic changes occurring in stem and precursor cells, and requires at least three independent gene expression datasets from the former cell type and mature cells.
Analysis of the transcriptomes of human oocytes and embryos is used to understand the molecular mechanisms and signaling pathways controlling early embryonic development, and could theoretically be a powerful tool in making proper embryo selection in ''in vitro'' fertilisation. Analyses of the transcriptome content of the placenta in the first trimester of pregnancy in ''in vitro'' fertilization and embryo transfer (IVF-ET) revealed differences in gene expression which are associated with a higher frequency of adverse perinatal outcomes. Such insight can be used to optimize the practice. Transcriptome analyses can also be used to optimize cryopreservation of oocytes, by lowering injuries associated with the process.
Transcriptomics is an emerging and continually growing field in biomarker discovery for use in assessing the safety of drugs or chemical risk assessment.
Transcriptomes may also be used to infer phylogenetic relationships among individuals or to detect evolutionary patterns of transcriptome conservation.
Transcriptome analyses were used to discover the incidence of antisense transcripts, their role in gene expression through interaction with surrounding genes, and their abundance in different chromosomes. RNA-seq was also used to show how RNA isoforms, transcripts stemming from the same gene but with different structures, can produce complex phenotypes from limited genomes.
Plants
Transcriptome analysis has been used to study the evolution and diversification process of plant species. In 2014, the 1000 Plant Genomes Project was completed, in which the transcriptomes of 1,124 plant species from the clades Viridiplantae, Glaucophyta and Rhodophyta were sequenced. The protein coding sequences were subsequently compared to infer phylogenetic relationships between plants and to characterize the time of their diversification in the process of evolution. Transcriptome studies have been used to characterize and quantify gene expression in mature pollen. Genes involved in cell wall metabolism and cytoskeleton were found to be overexpressed. Transcriptome approaches also made it possible to track changes in gene expression through different developmental stages of pollen, ranging from microspore to mature pollen grains; additionally, such stages could be compared across different plant species, including ''Arabidopsis'', rice and tobacco.
Relation to other ome fields
As with other -omics technologies, analysis of the transcriptome allows an unbiased approach to validating hypotheses experimentally. It also allows the discovery of novel mediators in signaling pathways. Like other -omics data, the transcriptome can be analyzed as part of a multiomics approach. It is complementary to metabolomics, but, in contrast to proteomics, a direct association between a transcript and a metabolite cannot be established.
There are several -ome fields that can be seen as subcategories of the transcriptome. The transcriptome differs from the exome in that it includes only those RNA molecules found in a specified cell population, and usually includes the amount or concentration of each RNA molecule in addition to the molecular identities. The transcriptome also differs from the translatome, which is the set of RNAs undergoing translation.
The term meiome is used in functional genomics to describe the meiotic transcriptome, that is, the set of RNA transcripts produced during meiosis. Meiosis is a key feature of sexually reproducing eukaryotes, and involves the pairing of homologous chromosomes, synapsis and recombination. Since meiosis in most organisms occurs over a short time period, meiotic transcript profiling is difficult because meiotic cells (meiocytes) are hard to isolate or enrich. As with transcriptome analyses in general, the meiome can be studied at a whole-genome level using large-scale transcriptomic techniques. The meiome has been well characterized in mammal and yeast systems and somewhat less extensively in plants.
The thanatotranscriptome consists of all RNA transcripts that continue to be expressed, or that start being re-expressed, in the internal organs of a dead body 24–48 hours following death. Some of the genes involved are ones that are inhibited after fetal development. If the thanatotranscriptome is related to the process of programmed cell death (apoptosis), it can be referred to as the apoptotic thanatotranscriptome. Analyses of the thanatotranscriptome are used in forensic medicine.
Expression quantitative trait locus (eQTL) mapping can be used to complement genomics with transcriptomics by relating genetic variants at the DNA level to gene expression measurements at the RNA level.
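The idea of relating a variant to an expression measurement can be illustrated with a minimal sketch. It assumes the simplest possible model, an ordinary least-squares regression of one gene's expression on the genotype dosage (0, 1 or 2 copies of the alternate allele) at one variant. The function name `eqtl_slope` and the toy data are hypothetical, and real eQTL analyses typically add covariates, normalization and multiple-testing correction that are omitted here.

```python
from statistics import mean


def eqtl_slope(genotypes, expression):
    """Least-squares fit of expression on genotype dosage.

    genotypes  -- per-individual allele dosage (0, 1 or 2); illustrative encoding
    expression -- per-individual expression level of one gene
    Returns (slope, intercept); the slope is the estimated change in
    expression per additional copy of the alternate allele.
    """
    if len(genotypes) != len(expression):
        raise ValueError("genotypes and expression must have the same length")
    g_bar = mean(genotypes)
    e_bar = mean(expression)
    # Sum of squared deviations of the genotype dosage.
    sxx = sum((g - g_bar) ** 2 for g in genotypes)
    if sxx == 0:
        raise ValueError("genotype shows no variation across individuals")
    # Sum of cross-deviations between dosage and expression.
    sxy = sum((g - g_bar) * (e - e_bar) for g, e in zip(genotypes, expression))
    slope = sxy / sxx
    intercept = e_bar - slope * g_bar
    return slope, intercept


# Hypothetical toy data: six individuals, one variant, one gene.
genotypes = [0, 0, 1, 1, 2, 2]
expression = [1.0, 1.2, 2.1, 1.9, 3.0, 3.2]
slope, intercept = eqtl_slope(genotypes, expression)
print(f"slope={slope:.3f} intercept={intercept:.3f}")
```

A slope clearly different from zero would suggest that the variant is associated with the gene's expression in this toy sample; whether it is statistically significant is a separate question that this sketch does not address.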
Relation to proteome
The protein-coding part of the transcriptome can be seen as a precursor of the proteome, that is, the entire set of proteins expressed by a genome.
However, the analysis of relative mRNA expression levels can be complicated by the fact that relatively small changes in mRNA expression can produce large changes in the total amount of the corresponding protein present in the cell. One analysis method, known as gene set enrichment analysis, identifies coregulated gene networks rather than individual genes that are up- or down-regulated in different cell populations.
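The core statistic of gene set enrichment analysis can be sketched as a running sum over a ranked gene list. The example below assumes the unweighted variant of the enrichment score: walking down the ranking, the sum rises by 1/(number of set members) at each gene in the set and falls by 1/(number of non-members) otherwise, and the score is the largest deviation from zero. The function name `enrichment_score` and the gene identifiers are hypothetical; the weighting by correlation and the permutation-based significance test used in practice are left out.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score for one gene set.

    ranked_genes -- genes ordered from most up- to most down-regulated
                    (assumed to contain no duplicates)
    gene_set     -- collection of genes forming the set of interest
    Returns the signed maximum deviation of the running sum from zero.
    """
    members = set(gene_set) & set(ranked_genes)
    n_hit = len(members)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")

    running = 0.0
    best = 0.0
    for gene in ranked_genes:
        if gene in members:
            running += 1.0 / n_hit   # step up at a set member
        else:
            running -= 1.0 / n_miss  # step down at a non-member
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical ranking of six genes and two toy gene sets.
ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]
top_set_score = enrichment_score(ranked, {"g1", "g2"})
bottom_set_score = enrichment_score(ranked, {"g5", "g6"})
print(top_set_score, bottom_set_score)
```

A set concentrated at the top of the ranking gives a positive score and a set concentrated at the bottom gives a negative one, which is how the method can flag coordinated shifts in a group of genes even when no single gene changes strongly.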
Although microarray studies can reveal the relative amounts of different mRNAs in the cell, levels of mRNA are not directly proportional to the expression levels of the proteins they code for. The number of protein molecules synthesized using a given mRNA molecule as a template is highly dependent on translation-initiation features of the mRNA sequence; in particular, the strength of the translation initiation sequence is a key determinant in recruiting ribosomes for protein translation.
Transcriptome databases
*Ensembl
*OmicTools
*Transcriptome Browser
*ArrayExpress
Further reading
* Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, Mesirov JP. (2005). Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. ''Proc Natl Acad Sci USA'' 102(43):15545-50.
* Laule O, Hirsch-Hoffmann M, Hruz T, Gruissem W, Zimmermann P. (2006). Web-based analysis of the mouse transcriptome using Genevestigator. ''BMC Bioinformatics'' 7:311.