TopHat (bioinformatics)

TopHat is an open-source bioinformatics tool for the high-throughput alignment of shotgun cDNA sequencing reads generated by transcriptomics technologies (e.g. RNA-Seq). It first aligns reads to a reference genome using Bowtie and then analyzes the mapping results to discover RNA splice sites de novo. TopHat aligns RNA-Seq reads to mammalian-sized genomes.


History

TopHat was originally developed in 2009 by Cole Trapnell, Lior Pachter and Steven Salzberg at the Center for Bioinformatics and Computational Biology at the University of Maryland, College Park and at the Mathematics Department, UC Berkeley. TopHat2 was a collaborative effort of Daehwan Kim and Steven Salzberg, initially at the University of Maryland, College Park and later at the Center for Computational Biology at Johns Hopkins University. Kim rewrote some of Trapnell's original TopHat code in C++ to make it much faster, and added many heuristics to improve its accuracy, in a collaboration with Cole Trapnell and others. Kim and Salzberg also developed TopHat-Fusion, which used transcriptome data to discover gene fusions in cancer tissues.


Uses

TopHat is a read-mapping tool that aligns reads from an RNA-Seq experiment to a reference genome. It is useful because it does not need to rely on known splice sites. TopHat can be used as part of the Tuxedo pipeline, and is frequently used with Bowtie.
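The idea of finding splice junctions without a list of known splice sites can be illustrated with a toy sketch. The code below is not TopHat's algorithm and does not call Bowtie; it is a deliberately simplified illustration that uses exact string matching, and the function names (find_all, align_read), the parameters (min_anchor, min_intron) and the example sequences are all hypothetical. It first tries to place a read contiguously on the reference; if that fails, it splits the read in two and looks for placements separated by a gap, which it reports as a candidate intron.

```python
def find_all(genome, pattern):
    """Return every start position where pattern occurs exactly in genome."""
    positions = []
    start = genome.find(pattern)
    while start != -1:
        positions.append(start)
        start = genome.find(pattern, start + 1)
    return positions


def align_read(genome, read, min_anchor=4, min_intron=5):
    """Toy two-stage alignment (illustrative only, exact matches only).

    Stage 1: try to place the whole read contiguously.
    Stage 2: if that fails, split the read into a left and a right part
    and look for placements separated by a gap (a candidate intron).
    """
    hits = find_all(genome, read)
    if hits:
        return {"type": "contiguous", "start": hits[0]}

    # Require at least min_anchor bases on each side of the split.
    for split in range(min_anchor, len(read) - min_anchor + 1):
        left, right = read[:split], read[split:]
        for left_start in find_all(genome, left):
            left_end = left_start + len(left)
            for right_start in find_all(genome, right):
                if right_start - left_end >= min_intron:
                    return {
                        "type": "spliced",
                        "start": left_start,
                        # Half-open interval of the skipped reference bases.
                        "intron": (left_end, right_start),
                    }
    return {"type": "unaligned"}


# Hypothetical reference: two exons separated by a 12-base intron.
exon1 = "ACGTACGGAT"
intron = "GTAAGTTTTCAG"
exon2 = "CCTGAATGCA"
genome = exon1 + intron + exon2

# A read taken from inside exon 1 aligns contiguously.
print(align_read(genome, "GTACGGAT"))

# A read spanning the exon-exon junction only aligns when split.
junction_read = exon1[-5:] + exon2[:5]
print(align_read(genome, junction_read))
```

In this example the junction-spanning read is reported as spliced, and the reported gap corresponds to the intron placed between the two exons. A real spliced aligner additionally has to handle mismatches, repeats, very long introns and genome-scale indexing, none of which this sketch attempts.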


Advantages/Disadvantages


Advantages

When TopHat was first released, it was faster than previous systems, mapping more than 2.2 million reads per CPU hour. That speed allowed a user to process an entire RNA-Seq experiment in less than a day, even on a standard desktop computer. TopHat first uses Bowtie to align the reads, and then performs additional analysis on the reads that span exon-exon junctions. As a result, more RNA-Seq reads are aligned against the reference genome. Another advantage of TopHat is that it does not need to rely on known splice sites when aligning reads to a reference genome.
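The throughput figure can be turned into a rough time estimate. The following is a minimal sketch of that arithmetic: only the rate of 2.2 million reads per CPU hour comes from the text, while the function names, the example read count of 40 million and the assumption of perfect scaling across cores are hypothetical.

```python
READS_PER_CPU_HOUR = 2_200_000  # rate quoted in the text


def cpu_hours(total_reads, reads_per_cpu_hour=READS_PER_CPU_HOUR):
    """CPU hours needed to map total_reads at the given rate."""
    return total_reads / reads_per_cpu_hour


def wall_clock_hours(total_reads, cores=1):
    """Elapsed hours, assuming (optimistically) perfect scaling across cores."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return cpu_hours(total_reads) / cores


# Hypothetical experiment size; not a figure from the text.
reads = 40_000_000
print(f"{cpu_hours(reads):.1f} CPU hours on one core")
print(f"{wall_clock_hours(reads, cores=4):.1f} hours on four cores")
```

Under these assumptions, an experiment of a few tens of millions of reads needs on the order of 18 CPU hours, which is consistent with the claim of processing an experiment in less than a day.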


Disadvantages

TopHat is in a low-maintenance, low-support stage, and contains software bugs that have prompted the development of third-party post-processing software to correct them. It has been superseded by HISAT2, which is more efficient and accurate and provides the same core functionality (spliced alignment of RNA-Seq reads).


See also

* Bowtie (sequence analysis)
* List of RNA-Seq bioinformatics tools
* Microarray analysis techniques
* Next-generation sequencing
* RNA-Seq


External links


TopHat page on Center for Computational Biology at JHU