Bowtie (sequence Analysis)
   HOME
*





Bowtie (sequence Analysis)
Bowtie is a software package commonly used for sequence alignment and sequence analysis in bioinformatics. The source code for the package is distributed freely and compiled binaries are available for Linux, macOS and Windows platforms. As of 2017, the ''Genome Biology'' paper describing the original Bowtie method has been cited more than 11,000 times. Bowtie is open-source software and is currently maintained by Johns Hopkins University. History The Bowtie sequence aligner was originally developed by Ben Langmead ''et al.'' at the University of Maryland in 2009. The aligner is typically used with short reads and a large reference genome, or for whole genome analysis. Bowtie is promoted as "an ultrafast, memory-efficient short aligner for short DNA sequences." The speed increase of Bowtie is partly due to implementing the Burrows–Wheeler transform for aligning, which reduces the memory footprint (typically to around 2.2GB for the human genome); a similar method is used by t ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Ben Langmead
Ben Langmead is a computational biologist and associate professor in the Computational Biology & Medicine Group at Johns Hopkins University. Education Langmead gained his Bachelor of Arts degree in computer science from Columbia College, Columbia University in 2003. He gained both his Master of Science and PhD degrees in computer science from the University of Maryland, supervised by Steven Salzberg, in 2009 and 2012, respectively. Career Langmead is known for developing the Bowtie sequence alignment algorithm, which implements the Burrows–Wheeler transform in order to improve the scalability of sequence alignment. , the original ''Genome Biology ''Genome Biology'' is a peer-reviewed open access scientific journal covering research in genomics. It was established in 2000 and is published by BioMed Central. The chief editor is currently Andrew Cosgrove (BioMed Central, New York). Abstractin ...'' paper which describes the Bowtie software package has been cited over 20,000 tim ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Reference Genome
A reference genome (also known as a reference assembly) is a digital nucleic acid sequence database, assembled by scientists as a representative example of the set of genes in one idealized individual organism of a species. As they are assembled from the sequencing of DNA from a number of individual donors, reference genomes do not accurately represent the set of genes of any single individual organism. Instead a reference provides a haploid mosaic of different DNA sequences from each donor. For example, the most recent human reference genome (assembly '' GRCh38/hg38'') is derived from >60 genomic clone libraries. There are reference genomes for multiple species of viruses, bacteria, fungus, plants, and animals. Reference genomes are typically used as a guide on which new genomes are built, enabling them to be assembled much more quickly and cheaply than the initial Human Genome Project. Reference genomes can be accessed online at several locations, using dedicated browsers s ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Bioinformatics Software
The list of bioinformatics software tools can be split up according to the license used: *List of proprietary bioinformatics software *List of open-source bioinformatics software Alternatively, here is a categorization according to the respective bioinformatics subfield specialized on: *Sequence analysis software **List of sequence alignment software ** List of alignment visualization software **Alignment-free sequence analysis **De novo sequence assemblers **List of gene prediction software ** List of disorder prediction software ** List of Protein subcellular localization prediction tools **List of phylogenetics software **List of phylogenetic tree visualization software ** :Metagenomics_software *Structural biology software **List of molecular graphics systems **List of protein-ligand docking software **List of RNA structure prediction software **List of software for protein model error verification **List of protein secondary structure prediction programs **List of protein struct ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Bioinformatics Algorithms
Bioinformatics () is an interdisciplinary field that develops methods and software tools for understanding biological data, in particular when the data sets are large and complex. As an interdisciplinary field of science, bioinformatics combines biology, chemistry, physics, computer science, information engineering, mathematics and statistics to analyze and interpret the biological data. Bioinformatics has been used for '' in silico'' analyses of biological queries using computational and statistical techniques. Bioinformatics includes biological studies that use computer programming as part of their methodology, as well as specific analysis "pipelines" that are repeatedly used, particularly in the field of genomics. Common uses of bioinformatics include the identification of candidates genes and single nucleotide polymorphisms (SNPs). Often, such identification is made with the aim to better understand the genetic basis of disease, unique adaptations, desirable properties ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Suffix Array
In computer science, a suffix array is a sorted array of all suffixes of a string. It is a data structure used in, among others, full-text indices, data-compression algorithms, and the field of bibliometrics. Suffix arrays were introduced by as a simple, space efficient alternative to suffix trees. They had independently been discovered by Gaston Gonnet in 1987 under the name ''PAT array'' . gave the first in-place \mathcal(n) time suffix array construction algorithm that is optimal both in time and space, where ''in-place'' means that the algorithm only needs \mathcal(1) additional space beyond the input string and the output suffix array. Enhanced suffix arrays (ESAs) are suffix arrays with additional tables that reproduce the full functionality of suffix trees preserving the same time and memory complexity. The suffix array for a subset of all suffixes of a string is called sparse suffix array. Multiple probabilistic algorithms have been developed to minimize the additiona ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  




FM-index
In computer science, an FM-index is a compressed full-text substring index based on the Burrows–Wheeler transform, with some similarities to the suffix array. It was created by Paolo Ferragina and Giovanni Manzini,Paolo Ferragina and Giovanni Manzini (2000)"Opportunistic Data Structures with Applications".Proceedings of the 41st Annual Symposium on Foundations of Computer Science. p.390. who describe it as an opportunistic data structure as it allows compression of the input text while still permitting fast substring queries. The name stands for Full-text index in Minute space. It can be used to efficiently find the number of occurrences of a pattern within the compressed text, as well as locate the position of each occurrence. The query time, as well as the required storage space, has a sublinear complexity with respect to the size of the input data. The original authors have devised improvements to their original approach and dubbed it "FM-Index version 2". A further improvemen ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Fork (software Development)
In software engineering, a project fork happens when developers take a copy of source code from one software package and start independent development on it, creating a distinct and separate piece of software. The term often implies not merely a development branch, but also a split in the developer community; as such, it is a form of schism. Grounds for forking are varying user preferences and stagnated or discontinued development of the original software. Free and open-source software is that which, by definition, may be forked from the original development team without prior permission, and without violating copyright law. However, licensed forks of proprietary software (''e.g.'' Unix) also happen. Etymology The word "fork" has been used to mean "to divide in branches, go separate ways" as early as the 14th century. In the software environment, the word evokes the fork system call, which causes a running process to split itself into two (almost) identical copies that (ty ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Bioconductor
Bioconductor is a Free software, free, Open-source software, open source and Open source software development, open development software project for the analysis and comprehension of Genome, genomic data generated by Wet laboratory, wet lab experiments in molecular biology. Bioconductor is based primarily on the statistics, statistical R (programming language), R programming language, but does contain contributions in other programming languages. It has two Software release life cycle, releases each year that follow the semiannual releases of R. At any one time there is a Software versioning, release version, which corresponds to the released version of R, and a Software versioning, development version, which corresponds to the development version of R. Most users will find the release version appropriate for their needs. In addition there are many genome annotation packages available that are mainly, but not solely, oriented towards different types of microarrays. While computati ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


TopHat (bioinformatics)
TopHat is an open-source bioinformatics tool for the throughput alignment of shotgun cDNA sequencing reads generated by transcriptomics technologies (e.g. RNA-Seq) using Bowtie first and then mapping to a reference genome to discover RNA splice sites ''de novo''. TopHat aligns RNA-Seq reads to mammalian-sized genomes. History TopHat was originally developed in 2009 by Cole Trapnell, Lior Pachter and Steven Salzberg at the Center for Bioinformatics and Computational Biology at the University of Maryland, College Park and at the Mathematics Department, UC Berkeley. TopHat2 was a collaborative effort of Daehwan Kim and Steven Salzberg, initially at the University of Maryland, College Park and later at the Center for Computational Biology at Johns Hopkins University. Kim re-wrote some of Trapnell's original TopHat code in C++ to make it much faster, and added many heuristics to improve its accuracy, in a collaboration with Cole Trapnell and others. Kim and Salzberg also developed TopH ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Depth-first Search
Depth-first search (DFS) is an algorithm for traversing or searching tree or graph data structures. The algorithm starts at the root node (selecting some arbitrary node as the root node in the case of a graph) and explores as far as possible along each branch before backtracking. Extra memory, usually a stack, is needed to keep track of the nodes discovered so far along a specified branch which helps in backtracking of the graph. A version of depth-first search was investigated in the 19th century by French mathematician Charles Pierre Trémaux as a strategy for solving mazes. Properties The time and space analysis of DFS differs according to its application area. In theoretical computer science, DFS is typically used to traverse an entire graph, and takes time where , V, is the number of vertices and , E, the number of edges. This is linear in the size of the graph. In these applications it also uses space O(, V, ) in the worst case to store the stack of vertices on th ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Short Oligonucleotide Analysis Package
SOAP (Short Oligonucleotide Analysis Package) is a suite of bioinformatics software tools from the BGI Bioinformatics department enabling the assembly, alignment, and analysis of next generation DNA sequencing data. It is particularly suited to short read sequencing data. All programs in the SOAP package may be used free of charge and are distributed under the GPL open source software license. Functionality The SOAP suite of tools can be used to perform the following genome assembly tasks: Sequence Alignment ''SOAPaligner'' (SOAP2) is specifically designed for fast alignment of short reads and performs favorably with respect to similar alignment tools such as Bowtie and MAQ. Genome Assembly ''SOAPdenovo'' is a short read ''de novo'' assembler utilizing De Bruijn graph construction. It is optimized for short reads such as that generated by Illumina and is capable of assembling large genomes such as the human genome. ''SOAPdenovo'' was used to assemble the genome of the gi ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  




Memory Footprint
Memory footprint refers to the amount of main memory that a program uses or references while running. The word footprint generally refers to the extent of physical dimensions that an object occupies, giving a sense of its size. In computing, the memory footprint of a software application indicates its runtime memory requirements, while the program executes. This includes all sorts of active memory regions like code segment containing (mostly) program instructions (and occasionally constants), data segment (both initialized and uninitialized), heap memory, call stack, plus memory required to hold any additional data structures, such as symbol tables, debugging data structures, open files, shared libraries mapped to the current process, etc., that the program ever needs while executing and will be loaded at least once during the entire run. Larger programs have larger memory footprints. An application's memory footprint is roughly proportionate to the number and sizes of shared lib ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]