Transcriptomics technologies

Transcriptomics technologies are the techniques used to study an organism's transcriptome, the sum of all of its RNA transcripts. The information content of an organism is recorded in the DNA of its genome and expressed through transcription. Here, mRNA serves as a transient intermediary molecule in the information network, whilst non-coding RNAs perform additional diverse functions. A transcriptome captures a snapshot in time of the total transcripts present in a cell. Transcriptomics technologies provide a broad account of which cellular processes are active and which are dormant. A major challenge in molecular biology is to understand how a single genome gives rise to a variety of cells. Another is to understand how gene expression is regulated.

The first attempts to study whole transcriptomes began in the early 1990s. Technological advances since the late 1990s have repeatedly transformed the field and made transcriptomics a widespread discipline in the biological sciences. There are two key contemporary techniques in the field: microarrays, which quantify a set of predetermined sequences, and RNA-Seq, which uses high-throughput sequencing to record all transcripts. As the technology improved, the volume of data produced by each transcriptome experiment increased. As a result, data analysis methods have steadily been adapted to analyse increasingly large volumes of data more accurately and efficiently. Transcriptome databases have grown larger and more useful as transcriptomes continue to be collected and shared by researchers. It would be almost impossible to interpret the information contained in a transcriptome without the knowledge gained from previous experiments.

Measuring the expression of an organism's genes in different tissues or conditions, or at different times, gives information on how genes are regulated and reveals details of an organism's biology. It can also be used to infer the functions of previously unannotated genes. Transcriptome analysis has enabled the study of how gene expression changes in different organisms and has been instrumental in the understanding of human disease. An analysis of gene expression in its entirety allows detection of broad coordinated trends which cannot be discerned by more targeted assays.


History

Transcriptomics has been characterised by the development of new techniques which have redefined what is possible every decade or so and rendered previous technologies obsolete. The first attempt at capturing a partial human transcriptome was published in 1991 and reported 609 mRNA sequences from the human brain. In 2008, two human transcriptomes, composed of millions of transcript-derived sequences covering 16,000 genes, were published, and by 2015 transcriptomes had been published for hundreds of individuals. Transcriptomes of different disease states, tissues, or even single cells are now routinely generated. This explosion in transcriptomics has been driven by the rapid development of new technologies with improved sensitivity and economy.


Before transcriptomics

Studies of individual transcripts were being performed several decades before any transcriptomics approaches were available. Libraries of silkmoth mRNA transcripts were collected and converted to complementary DNA (cDNA) for storage using reverse transcriptase in the late 1970s. In the 1980s, low-throughput sequencing using the Sanger method was used to sequence random transcripts, producing expressed sequence tags (ESTs). The Sanger method of sequencing was predominant until the advent of high-throughput methods such as sequencing by synthesis (Solexa/Illumina). ESTs came to prominence during the 1990s as an efficient method to determine the gene content of an organism without sequencing the entire genome. Amounts of individual transcripts were quantified using Northern blotting, nylon membrane arrays, and later reverse transcriptase quantitative PCR (RT-qPCR) methods, but these methods are laborious and can only capture a tiny subsection of a transcriptome. Consequently, the manner in which a transcriptome as a whole is expressed and regulated remained unknown until higher-throughput techniques were developed.


Early attempts

The word "transcriptome" was first used in the 1990s. In 1995, one of the earliest sequencing-based transcriptomic methods was developed, serial analysis of gene expression (SAGE), which worked by Sanger sequencing of concatenated random transcript fragments. Transcripts were quantified by matching the fragments to known genes. A variant of SAGE using high-throughput sequencing techniques, called digital gene expression analysis, was also briefly used. However, these methods were largely overtaken by high-throughput sequencing of entire transcripts, which provided additional information on transcript structure such as splice variants.


Development of contemporary techniques

The dominant contemporary techniques, microarrays and RNA-Seq, were developed in the mid-1990s and 2000s. Microarrays that measure the abundances of a defined set of transcripts via their hybridisation to an array of complementary probes were first published in 1995. Microarray technology allowed thousands of transcripts to be assayed simultaneously, at a greatly reduced cost per gene and with less labour. Both spotted oligonucleotide arrays and Affymetrix high-density arrays were the methods of choice for transcriptional profiling until the late 2000s. Over this period, a range of microarrays were produced to cover known genes in model or economically important organisms. Advances in design and manufacture of arrays improved the specificity of probes and allowed more genes to be tested on a single array. Advances in fluorescence detection increased the sensitivity and measurement accuracy for low-abundance transcripts.

RNA-Seq is accomplished by reverse transcribing RNA in vitro and sequencing the resulting cDNAs. Transcript abundance is derived from the number of counts from each transcript. The technique has therefore been heavily influenced by the development of high-throughput sequencing technologies. Massively parallel signature sequencing (MPSS) was an early example based on generating 16–20 bp sequences via a complex series of hybridisations (in molecular biology, hybridisation is the annealing of single-stranded DNA or RNA molecules to complementary DNA or RNA), and was used in 2004 to validate the expression of ten thousand genes in Arabidopsis thaliana. The earliest RNA-Seq work was published in 2006 with one hundred thousand transcripts sequenced using 454 technology. This was sufficient coverage to quantify relative transcript abundance. RNA-Seq began to increase in popularity after 2008 when new Solexa/Illumina technologies allowed one billion transcript sequences to be recorded. This yield now allows for the quantification and comparison of human transcriptomes.
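
The idea that RNA-Seq abundance comes from counts per transcript can be illustrated with a minimal sketch. It assumes each sequenced read has already been assigned to a transcript; the function name and the transcript labels are hypothetical, and the sketch deliberately ignores corrections such as transcript length that real analyses usually apply.

```python
from collections import Counter


def relative_abundance(read_assignments):
    """Return each transcript's share of all assigned reads.

    read_assignments: iterable of transcript names, one per read
    (hypothetical input; real pipelines derive this from alignments).
    """
    counts = Counter(read_assignments)
    total = sum(counts.values())
    return {transcript: n / total for transcript, n in counts.items()}


# Six illustrative reads assigned to three made-up transcripts.
reads = ["geneA", "geneB", "geneA", "geneC", "geneA", "geneB"]
abundance = relative_abundance(reads)
```

Here `abundance` maps each transcript to its fraction of the reads, so the values are relative measures that sum to one rather than absolute molecule counts.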


Data gathering

Generating data on RNA transcripts can be achieved via either of two main principles: sequencing of individual transcripts (ESTs or RNA-Seq) or hybridisation of transcripts to an ordered array of nucleotide probes (microarrays).


Isolation of RNA

All transcriptomic methods require RNA to first be isolated from the experimental organism before transcripts can be recorded. Although biological systems are incredibly diverse, RNA extraction techniques are broadly similar and involve mechanical disruption of cells or tissues, inactivation of RNases with chaotropic salts, disruption of macromolecules and nucleotide complexes, separation of RNA from undesired biomolecules including DNA, and concentration of the RNA via precipitation from solution or elution from a solid matrix. Isolated RNA may additionally be treated with DNase to digest any traces of DNA. It is necessary to enrich messenger RNA as total RNA extracts are typically 98% ribosomal RNA. Enrichment for transcripts can be performed by poly-A affinity methods or by depletion of ribosomal RNA using sequence-specific probes. Degraded RNA may affect downstream results; for example, mRNA enrichment from degraded samples will result in the depletion of 5’ mRNA ends and an uneven signal across the length of a transcript. Snap-freezing of tissue prior to RNA isolation is typical, and care is taken to reduce exposure to RNase enzymes once isolation is complete.


Expressed sequence tags

An expressed sequence tag (EST) is a short nucleotide sequence generated from a single RNA transcript. RNA is first copied as complementary DNA (cDNA) by a reverse transcriptase enzyme before the resultant cDNA is sequenced. Because ESTs can be collected without prior knowledge of the organism from which they come, they can be made from mixtures of organisms or environmental samples. Although higher-throughput methods are now used, EST libraries commonly provided sequence information for early microarray designs; for example, a barley microarray was designed from 350,000 previously sequenced ESTs.


Serial and cap analysis of gene expression (SAGE/CAGE)

Serial analysis of gene expression (SAGE) was a development of EST methodology to increase the throughput of the tags generated and allow some quantitation of transcript abundance. cDNA is generated from the RNA but is then digested into 11 bp "tag" fragments using restriction enzymes that cut DNA at a specific sequence, and 11 base pairs along from that sequence. These cDNA tags are then joined head-to-tail into long strands (>500 bp) and sequenced using low-throughput, but long read-length methods such as Sanger sequencing. The sequences are then divided back into their original 11 bp tags using computer software in a process called deconvolution. If a high-quality reference genome is available, these tags may be matched to their corresponding gene in the genome. If a reference genome is unavailable, the tags can be directly used as diagnostic markers if found to be differentially expressed in a disease state.

The cap analysis gene expression (CAGE) method is a variant of SAGE that sequences tags from the 5’ end of an mRNA transcript only. Therefore, the transcriptional start site of genes can be identified when the tags are aligned to a reference genome. Identifying gene start sites is of use for promoter analysis and for the cloning of full-length cDNAs. SAGE and CAGE methods produce information on more genes than was possible when sequencing single ESTs, but sample preparation and data analysis are typically more labour-intensive.
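
The SAGE deconvolution step described above, splitting sequenced concatemers back into 11 bp tags and counting them, can be sketched as follows. This is a simplified illustration, not the software used in practice: it assumes tags are joined directly end to end with no linker or restriction-site sequence between them, and the function names and example sequences are hypothetical.

```python
from collections import Counter

TAG_LENGTH = 11  # tag length in base pairs, as stated in the text


def split_tags(concatemer, tag_length=TAG_LENGTH):
    """Split one concatemer into consecutive fixed-length tags.

    Any trailing bases that do not fill a whole tag are discarded.
    """
    usable = len(concatemer) - len(concatemer) % tag_length
    return [concatemer[i:i + tag_length] for i in range(0, usable, tag_length)]


def count_tags(concatemers):
    """Count how often each tag occurs across all sequenced concatemers."""
    counts = Counter()
    for concatemer in concatemers:
        counts.update(split_tags(concatemer))
    return counts


# Two made-up 11 bp tags joined head-to-tail into two short concatemers.
tag_a = "ACGTACGTACG"
tag_b = "TTGCAAGGCTA"
concatemers = [tag_a + tag_b + tag_a, tag_b + tag_a]
tag_counts = count_tags(concatemers)
```

The resulting counts give a rough measure of how often each tag was observed; matching tags to genes would be a separate lookup against a reference genome, which this sketch does not attempt.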


Microarrays


Principles and advances

Microarrays usually consist of a grid of short nucleotide oligomers, known as "probes", typically arranged on a glass slide. Transcript abundance is determined by hybridisation of fluorescently labelled transcripts to these probes. The fluorescence intensity at each probe location on the array indicates the transcript abundance for that probe sequence. Groups of probes designed to measure the same transcript (i.e., hybridising to a specific transcript at different positions) are usually referred to as "probesets". Microarrays require some genomic knowledge from the organism of interest, for example, in the form of an annotated genome sequence, or a library of ESTs that can be used to generate the probes for the array.
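
To make the probe and probeset relationship concrete, the sketch below combines the fluorescence intensities of all probes in a probeset into one value per transcript. Taking the median is an assumption made here for illustration; the text does not say how probesets are summarised, and the probe identifiers, transcript names and intensities are invented.

```python
from statistics import median


def summarise_probesets(probe_intensities, probesets):
    """Return one summary intensity per probeset.

    probe_intensities: dict mapping probe ID -> fluorescence intensity
    probesets: dict mapping transcript name -> list of probe IDs
    The median is used as an illustrative summary, not a prescribed method.
    """
    return {
        transcript: median(probe_intensities[p] for p in probes)
        for transcript, probes in probesets.items()
    }


# Invented intensities for five probes grouped into two probesets.
intensities = {"p1": 100.0, "p2": 120.0, "p3": 400.0, "p4": 30.0, "p5": 50.0}
probesets = {
    "transcript_X": ["p1", "p2", "p3"],
    "transcript_Y": ["p4", "p5"],
}
summary = summarise_probesets(intensities, probesets)
```

A median is less affected by a single outlying probe (such as `p3` here) than a mean would be, which is one reason it was chosen for this sketch.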


Methods

Microarrays for transcriptomics typically fall into one of two broad categories: low-density spotted arrays or high-density short probe arrays. Transcript abundance is inferred from the intensity of fluorescence derived from fluorophore-tagged transcripts that bind to the array. Spotted low-density arrays typically feature picolitre drops (one picolitre is about 30 million times smaller than a drop of water) of a range of purified cDNAs arrayed on the surface of a glass slide. These probes are longer than those of high-density arrays and cannot identify alternative splicing events. Spotted arrays use two different fluorophores to label the test and control samples, and the ratio of fluorescence is used to calculate a relative measure of abundance. High-density arrays use a single fluorescent label, and each sample is hybridised and detected individually. High-density arrays were popularised by the Affymetrix GeneChip array, where each transcript is quantified by several short 25-mer probes.
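
The two-colour ratio used by spotted arrays can be illustrated with a short sketch. Expressing the ratio on a log2 scale is an assumption made here (a common convention, but not stated in the text), and the gene names and intensities are invented.

```python
import math


def log2_ratio(test_intensity, control_intensity):
    """Relative abundance as log2(test / control) for one spot.

    Positive values mean more signal in the test sample,
    negative values mean more signal in the control sample.
    """
    return math.log2(test_intensity / control_intensity)


# Invented (test, control) fluorescence intensities for three spots.
spots = {
    "gene1": (800.0, 200.0),
    "gene2": (150.0, 600.0),
    "gene3": (500.0, 500.0),
}
ratios = {gene: log2_ratio(test, control) for gene, (test, control) in spots.items()}
```

On this scale a four-fold increase and a four-fold decrease are symmetric around zero (2.0 and -2.0), which makes up- and down-regulation easier to compare than raw ratios.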