SAMtools
   HOME
*





SAMtools
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM (Sequence Alignment/Map), BAM (Binary Alignment/Map) and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA. Both simple and advanced tools are provided, supporting complex tasks like variant calling and alignment viewing as well as sorting, indexing, data extraction and format conversion. SAM files can be very large (10s of Gigabytes is common), so compression is used to save space. SAM files are human-readable text files, and BAM files are simply their binary equivalent, whilst CRAM files are a restructured column-oriented binary container format. BAM files are typically compressed and more efficient for software to work with than SAM. SAMtools makes it possible to work directly with a compressed BAM file, without having to uncompress the whole file. Additionally, since the format for a SAM/BAM file is somewha ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


SAM (file Format)
Sequence Alignment Map (SAM) is a text-based format originally for storing biological sequences aligned to a reference sequence developed by Heng Li and Bob Handsaker ''et al''. It was developed when the 1000 Genomes Project wanted to move away from the MAQ mapper format and decided to design a new format. The overall TAB-delimited flavour of the format came from an earlier format inspired by BLAT’s PSL. The name of SAM came from Gabor Marth from University of Utah, who originally had a format under the same name but with a different syntax more similar to a BLAST output. It is widely used for storing data, such as nucleotide sequences, generated by next generation sequencing technologies, and the standard has been broadened to include unmapped sequences. The format supports short and long reads (up to 128 Mbp) produced by different sequencing platforms and is used to hold mapped data within the Genome Analysis Toolkit (GATK) and across the Broad Institute, the Wellco ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Heng Li
Heng Li is a Chinese bioinformatics scientist. He is an associate professor at the department of Biomedical Informatics of Harvard Medical School and the department of Data Science of Dana-Farber Cancer Institute. He was previously a research scientist working at the Broad Institute in Cambridge, Massachusetts with David Reich and David Altshuler. Li's work has made several important contributions in the field of next generation sequencing. Education Li majored in physics at Nanjing University during 1997–2001. He received his PhD from the Institute of Theoretical Physics at the Chinese Academy of Sciences in 2006. His thesis, titled "Constructing the TreeFam database", was supervised by Wei-Mou Zheng. Research Li was involved in a number of projects while working at the Beijing Genomics Institute from 2002 to 2006. These included studying rice finishing, silkworm sequencing, and genetic variation in chickens. From 2006 to 2009, Li worked on a postdoctoral research fellow ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Pileup Format
Pileup format is a text-based format for summarizing the base calls of aligned reads to a reference sequence. This format facilitates visual display of SNP/indel calling and alignment. It was first used by Tony Cox and Zemin Ning at the Wellcome Trust Sanger Institute, and became widely known through its implementation within the SAMtools software suite. Format Example The columns Each line consists of 5 (or optionally 6) tab-separated columns: #Sequence identifier #Position in sequence (starting from 1) #Reference nucleotide at that position #Number of aligned reads covering that position (depth of coverage) #Bases at that position from aligned reads #Phred Quality of those bases, represented in ASCII with -33 offset (OPTIONAL) Column 5: The bases string *. (dot) means a base that matched the reference on the forward strand *, (comma) means a base that matched the reference on the reverse strand * (less-/greater-than sign) denotes a reference skip. This occurs, for example, i ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  




Pileup Format
Pileup format is a text-based format for summarizing the base calls of aligned reads to a reference sequence. This format facilitates visual display of SNP/indel calling and alignment. It was first used by Tony Cox and Zemin Ning at the Wellcome Trust Sanger Institute, and became widely known through its implementation within the SAMtools software suite. Format Example The columns Each line consists of 5 (or optionally 6) tab-separated columns: #Sequence identifier #Position in sequence (starting from 1) #Reference nucleotide at that position #Number of aligned reads covering that position (depth of coverage) #Bases at that position from aligned reads #Phred Quality of those bases, represented in ASCII with -33 offset (OPTIONAL) Column 5: The bases string *. (dot) means a base that matched the reference on the forward strand *, (comma) means a base that matched the reference on the reverse strand * (less-/greater-than sign) denotes a reference skip. This occurs, for example, i ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


CRAM (file Format)
Compressed Reference-oriented Alignment Map (CRAM) is a compressed columnar file format for storing biological sequences aligned to a reference sequence, initially devised by Markus Hsi-Yang Fritz ''et al''. CRAM was designed to be an efficient reference-based alternative to the Sequence Alignment Map (SAM) and Binary Alignment Map (BAM) file formats. It optionally uses a genomic reference to describe differences between the aligned sequence fragments and the reference sequence, reducing storage costs. Additionally each column in the SAM format is separated into its own blocks, improving compression ratio. CRAM files typically vary from 30 to 60% smaller than BAM, depending on the data held within them. Implementations of CRAM exist in htsjdk, htslib, JBrowse, and Scramble. The file format specification is maintained by the Global Alliance for Genomics and Health (GA4GH) with the specification document available from the EBI cram toolkit page. File format The basic stru ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

BAM (file Format)
Binary Alignment Map (BAM) is the comprehensive raw data of genome sequencing; it consists of the lossless, compressed binary Binary may refer to: Science and technology Mathematics * Binary number, a representation of numbers using only two digits (0 and 1) * Binary function, a function that takes two arguments * Binary operation, a mathematical operation that ta ... representation of the Sequence Alignment Map-files. BAM is the compressed binary representation of SAM (Sequence Alignment Map), a compact and index-able representation of nucleotide sequence alignments. The goal of indexing is to retrieve alignments that overlap a specific location quickly without having to go through all of them. Before indexing, BAM must be sorted by reference ID and then leftmost coordinate. BAM is in compressed BGZF format. The structure of BAM files include a header section and an alignment section: * Header—The sample name, sample length, and alignment method are all included in th ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


List Of Sequence Alignment Software
This list of sequence alignment software is a compilation of software tools and web portals used in pairwise sequence alignment and multiple sequence alignment. See structural alignment software for structural alignment of proteins. Database search only *Sequence type: protein or nucleotide Pairwise alignment *Sequence type: protein or nucleotide **Alignment type: local or global Multiple sequence alignment *Sequence type: protein or nucleotide. **Alignment type: local or global Genomics analysis *Sequence type: protein or nucleotide Motif finding *Sequence type: protein or nucleotide Benchmarking Alignment viewers, editors Please see List of alignment visualization software. Short-read sequence alignment See also * List of open source bioinformatics software References {{Reflist Sequence Sequence alignment software This list of sequence alignment software is a compilation of software tools and web portals used in pairwise sequence alignment and multiple sequen ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Burrows-Wheeler Aligner
This list of sequence alignment software is a compilation of software tools and web portals used in pairwise sequence alignment and multiple sequence alignment. See structural alignment software for structural alignment of proteins. Database search only *Sequence type: protein or nucleotide Pairwise alignment *Sequence type: protein or nucleotide **Alignment type: local or global Multiple sequence alignment *Sequence type: protein or nucleotide. **Alignment type: local or global Genomics analysis *Sequence type: protein or nucleotide Motif finding *Sequence type: protein or nucleotide Benchmarking Alignment viewers, editors Please see List of alignment visualization software. Short-read sequence alignment See also * List of open source bioinformatics software References {{Reflist Sequence Sequence alignment software This list of sequence alignment software is a compilation of software tools and web portals used in pairwise sequence alignment and multiple seq ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

DNA Sequencing
DNA sequencing is the process of determining the nucleic acid sequence – the order of nucleotides in DNA. It includes any method or technology that is used to determine the order of the four bases: adenine, guanine, cytosine, and thymine. The advent of rapid DNA sequencing methods has greatly accelerated biological and medical research and discovery. Knowledge of DNA sequences has become indispensable for basic biological research, DNA Genographic Projects and in numerous applied fields such as medical diagnosis, biotechnology, forensic biology, virology and biological systematics. Comparing healthy and mutated DNA sequences can diagnose different diseases including various cancers, characterize antibody repertoire, and can be used to guide patient treatment. Having a quick way to sequence DNA allows for faster and more individualized medical care to be administered, and for more organisms to be identified and cataloged. The rapid speed of sequencing attained with modern ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  




MIT License
The MIT License is a permissive free software license originating at the Massachusetts Institute of Technology (MIT) in the late 1980s. As a permissive license, it puts only very limited restriction on reuse and has, therefore, high license compatibility. Unlike copyleft software licenses, the MIT License also permits reuse within proprietary software, provided that all copies of the software or its substantial portions include a copy of the terms of the MIT License and also a copyright notice. , the MIT License was the most popular software license found in one analysis, continuing from reports in 2015 that the MIT License was the most popular software license on GitHub. Notable projects that use the MIT License include the X Window System, Ruby on Rails, Nim, Node.js, Lua, and jQuery. Notable companies using the MIT License include Microsoft ( .NET), Google ( Angular), and Meta (React). License terms The MIT License has the identifier MIT in the SPDX License List. It is ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


BSD Licenses
BSD licenses are a family of permissive free software licenses, imposing minimal restrictions on the use and distribution of covered software. This is in contrast to copyleft licenses, which have share-alike requirements. The original BSD license was used for its namesake, the Berkeley Software Distribution (BSD), a Unix-like operating system. The original version has since been revised, and its descendants are referred to as modified BSD licenses. BSD is both a license and a class of license (generally referred to as BSD-like). The modified BSD license (in wide use today) is very similar to the license originally used for the BSD version of Unix. The BSD license is a simple license that merely requires that all code retain the BSD license notice if redistributed in source code format, or reproduce the notice if redistributed in binary format. The BSD license (unlike some other licenses e.g. GPL) does not require that source code be distributed at all. Terms In additi ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

C Language
C (''pronounced like the letter c'') is a general-purpose computer programming language. It was created in the 1970s by Dennis Ritchie, and remains very widely used and influential. By design, C's features cleanly reflect the capabilities of the targeted CPUs. It has found lasting use in operating systems, device drivers, protocol stacks, though decreasingly for application software. C is commonly used on computer architectures that range from the largest supercomputers to the smallest microcontrollers and embedded systems. A successor to the programming language B, C was originally developed at Bell Labs by Ritchie between 1972 and 1973 to construct utilities running on Unix. It was applied to re-implementing the kernel of the Unix operating system. During the 1980s, C gradually gained popularity. It has become one of the most widely used programming languages, with C compilers available for practically all modern computer architectures and operating systems. C has be ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]