Cap analysis gene expression (CAGE) is a gene expression technique used in molecular biology to produce a snapshot of the 5′ ends of the messenger RNA population in a biological sample (the transcriptome). Small fragments from the very beginnings of mRNAs (the 5′ ends of capped transcripts), historically 27 nucleotides long but now limited only by sequencing technology, are extracted, reverse-transcribed to cDNA, PCR-amplified (if needed), and sequenced. CAGE was first published by Hayashizaki, Carninci and co-workers in 2003.
CAGE has been extensively used within the FANTOM (Functional Annotation of the Mouse/Mammalian Genome) research projects.
Analysis
The output of CAGE is a set of short nucleotide sequences (often called ''tags'', by analogy to expressed sequence tags) together with their observed counts. Copy numbers of CAGE tags provide a digital quantification of RNA transcript abundance in biological samples. Using a reference genome, a researcher can usually determine, with some confidence, the original mRNA (and therefore the gene) from which each tag was extracted.
Unlike the similar technique serial analysis of gene expression (SAGE), in which tags come from other parts of transcripts, CAGE is primarily used to locate exact transcription start sites (TSSs) in the genome. This knowledge in turn allows researchers to investigate the promoter structure necessary for gene expression.
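Locating a TSS from a mapped tag depends on strand: the 5′ end of the RNA is the leftmost aligned base for a plus-strand tag but the rightmost aligned base for a minus-strand tag. The sketch below assumes the read is in the sense orientation of the RNA (so its first base is the capped 5′ end) and uses a hypothetical `Alignment` record with 0-based, half-open coordinates; real aligner output formats use their own conventions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alignment:
    """A mapped CAGE tag (hypothetical record; 0-based, half-open coordinates)."""
    chrom: str
    start: int   # leftmost aligned base, 0-based
    end: int     # one past the rightmost aligned base
    strand: str  # "+" or "-"


def tss_position(aln: Alignment) -> int:
    """Return the 0-based coordinate of the tag's 5' end, i.e. the candidate TSS."""
    if aln.strand == "+":
        return aln.start    # sense read starts at the cap on the plus strand
    if aln.strand == "-":
        return aln.end - 1  # on the minus strand the 5' end is the rightmost base
    raise ValueError(f"unknown strand: {aln.strand!r}")


print(tss_position(Alignment("chr1", 100, 127, "+")))  # 100
print(tss_position(Alignment("chr1", 500, 527, "-")))  # 526
```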
CAGE tags tend to start with an extra guanine (G) that is not encoded in the genome. This is attributed either to template-free 5′ extension during first-strand cDNA synthesis or to reverse transcription of the cap itself.
If not corrected, the extra G can cause CAGE tags to be mapped erroneously, for instance to non-transcribed pseudogenes.
On the other hand, this addition of Gs has also been used as a signal for filtering more reliable TSS peaks.
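One simple way to correct for the extra G is to compare the tag's leading bases with the reference at the mapped position and trim Gs that the genome does not support, shifting the 5′ end accordingly. This is a minimal heuristic sketch, not the method of any particular pipeline: it assumes a plus-strand, ungapped alignment in which the extra G was counted as a mismatch, and the function name `trim_extra_g` is hypothetical. An added G that happens to match the reference cannot be detected this way.

```python
from typing import Tuple


def trim_extra_g(tag: str, genome: str, pos: int) -> Tuple[str, int]:
    """Drop leading Gs that mismatch the plus-strand reference.

    tag:    CAGE tag in sense orientation (hypothetical input).
    genome: plus-strand reference sequence.
    pos:    0-based position where the tag's first base was aligned.
    Returns the trimmed tag and the corrected 5' end position.
    """
    # Each unencoded G removed moves the true 5' end one base downstream.
    while tag and tag[0] == "G" and pos < len(genome) and genome[pos] != "G":
        tag = tag[1:]
        pos += 1
    return tag, pos


print(trim_extra_g("GCGTA", "ACGTACGT", 0))  # ('CGTA', 1)
print(trim_extra_g("GGCA", "GGCA", 0))       # ('GGCA', 0)
```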
History
The original CAGE method (Shiraki ''et al.'', 2003) used CAP Trapper to capture the 5′ ends, oligo-dT primers to synthesize the cDNAs, the type IIS restriction enzyme MmeI to cleave the tags, and the Sanger method to sequence them.
Random reverse-transcription primers were introduced in 2006 by Kodzius ''et al.'' to better detect non-polyadenylated RNAs.
In ''DeepCAGE'' (Valen ''et al.'', 2008), the tag concatemers were sequenced at higher throughput on the 454 “''next-generation''” sequencing platform.
In 2008, barcode multiplexing was added to the DeepCAGE protocol (Maeda ''et al.'', 2008).
In ''nanoCAGE'' (Plessy ''et al.'', 2010), the 5′ ends of RNAs were captured with the template-switching method instead of CAP Trapper, in order to analyze smaller starting amounts of total RNA. Longer tags were cleaved with the type III restriction enzyme EcoP15I and sequenced directly on the Solexa (by then Illumina) platform without concatenation.
The ''CAGEscan'' method, in which enzymatic tag cleavage is skipped and the 5′ cDNAs are sequenced paired-end, was introduced in the same article (Plessy ''et al.'', 2010) to connect novel promoters to known annotations.
With ''HeliScopeCAGE'' (Kanamori-Katayama ''et al.'', 2011), the CAP-trapped CAGE protocol was modified to skip enzymatic tag cleavage and to sequence the capped 5′ ends directly on the HeliScope platform, without PCR amplification. This protocol was then automated by Itoh ''et al.'' in 2012.
In 2012, the standard CAGE protocol was updated by Takahashi ''et al.''
to cleave tags with EcoP15I and sequence them on the Illumina-Solexa platform.
In 2013, Batut ''et al.''
combined CAP trapper, template switching, and 5′-phosphate-dependent exonuclease digestion in ''RAMPAGE'' to maximize promoter specificity.
In 2014, Murata ''et al.''
published the ''nAnTi-CAGE'' protocol, where capped 5′ ends are sequenced on the Illumina platform with no PCR amplification and no tag cleavage.
In 2017, Poulain ''et al.''
updated the ''nanoCAGE'' protocol to use the ''tagmentation'' method (based on
Tn5 transposition) for multiplexing.
In 2018, Cvetesic ''et al.''
increased the sensitivity of CAP-trapped CAGE by introducing selectively degradable carrier RNA (SLIC-CAGE, "Super-Low Input Carrier-CAGE").
In 2021, Takahashi ''et al.''
simplified the sequencing of CAGE libraries on Illumina sequencers by skipping second-strand synthesis and loading single-stranded cDNAs directly (Low Quantity Single Strand CAGE, "LQ-ssCAGE").
See also
* Serial analysis of gene expression
* RNA-Seq
* Transcriptomics
References
External links
CAGE homepage at the RIKEN Omics Science Center.
Protocols page on the FANTOM5 website.