BAM Format
   HOME
*





BAM Format
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM (Sequence Alignment/Map), BAM (Binary Alignment/Map) and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA. Both simple and advanced tools are provided, supporting complex tasks like variant calling and alignment viewing as well as sorting, indexing, data extraction and format conversion. SAM files can be very large (10s of Gigabytes is common), so compression is used to save space. SAM files are human-readable text files, and BAM files are simply their binary equivalent, whilst CRAM files are a restructured column-oriented binary container format. BAM files are typically compressed and more efficient for software to work with than SAM. SAMtools makes it possible to work directly with a compressed BAM file, without having to uncompress the whole file. Additionally, since the format for a SAM/BAM file is somewhat ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Heng Li
Heng Li is a Chinese bioinformatics scientist. He is an associate professor at the department of Biomedical Informatics of Harvard Medical School and the department of Data Science of Dana-Farber Cancer Institute. He was previously a research scientist working at the Broad Institute in Cambridge, Massachusetts with David Reich and David Altshuler. Li's work has made several important contributions in the field of next generation sequencing. Education Li majored in physics at Nanjing University during 1997–2001. He received his PhD from the Institute of Theoretical Physics at the Chinese Academy of Sciences in 2006. His thesis, titled "Constructing the TreeFam database", was supervised by Wei-Mou Zheng. Research Li was involved in a number of projects while working at the Beijing Genomics Institute from 2002 to 2006. These included studying rice finishing, silkworm sequencing, and genetic variation in chickens. From 2006 to 2009, Li worked on a postdoctoral research fellowship ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Format Conversion
Data conversion is the conversion of computer data from one format to another. Throughout a computer environment, data is encoded in a variety of ways. For example, computer hardware is built on the basis of certain standards, which requires that data contains, for example, parity bit checks. Similarly, the operating system is predicated on certain standards for data and file handling. Furthermore, each computer program handles data in a different manner. Whenever any one of these variables is changed, data must be converted in some way before it can be used by a different computer, operating system or program. Even different versions of these elements usually involve different data structures. For example, the changing of bits from one format to another, usually for the purpose of application interoperability or of the capability of using new features, is merely a data conversion. Data conversions may be as simple as the conversion of a text file from one character encoding syste ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Bioinformatics Algorithms
Bioinformatics () is an interdisciplinary field that develops methods and software tools for understanding biological data, in particular when the data sets are large and complex. As an interdisciplinary field of science, bioinformatics combines biology, chemistry, physics, computer science, information engineering, mathematics and statistics to analyze and interpret the biological data. Bioinformatics has been used for '' in silico'' analyses of biological queries using computational and statistical techniques. Bioinformatics includes biological studies that use computer programming as part of their methodology, as well as specific analysis "pipelines" that are repeatedly used, particularly in the field of genomics. Common uses of bioinformatics include the identification of candidates genes and single nucleotide polymorphisms (SNPs). Often, such identification is made with the aim to better understand the genetic basis of disease, unique adaptations, desirable properties ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Pileup Format
Pileup format is a text-based format for summarizing the base calls of aligned reads to a reference sequence. This format facilitates visual display of SNP/indel calling and alignment. It was first used by Tony Cox and Zemin Ning at the Wellcome Trust Sanger Institute, and became widely known through its implementation within the SAMtools software suite. Format Example The columns Each line consists of 5 (or optionally 6) tab-separated columns: #Sequence identifier #Position in sequence (starting from 1) #Reference nucleotide at that position #Number of aligned reads covering that position (depth of coverage) #Bases at that position from aligned reads #Phred Quality of those bases, represented in ASCII with -33 offset (OPTIONAL) Column 5: The bases string *. (dot) means a base that matched the reference on the forward strand *, (comma) means a base that matched the reference on the reverse strand * (less-/greater-than sign) denotes a reference skip. This occurs, for example, if ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  




Pileup Format
Pileup format is a text-based format for summarizing the base calls of aligned reads to a reference sequence. This format facilitates visual display of SNP/indel calling and alignment. It was first used by Tony Cox and Zemin Ning at the Wellcome Trust Sanger Institute, and became widely known through its implementation within the SAMtools software suite. Format Example The columns Each line consists of 5 (or optionally 6) tab-separated columns: #Sequence identifier #Position in sequence (starting from 1) #Reference nucleotide at that position #Number of aligned reads covering that position (depth of coverage) #Bases at that position from aligned reads #Phred Quality of those bases, represented in ASCII with -33 offset (OPTIONAL) Column 5: The bases string *. (dot) means a base that matched the reference on the forward strand *, (comma) means a base that matched the reference on the reverse strand * (less-/greater-than sign) denotes a reference skip. This occurs, for example, if ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Standard Streams
In computer programming, standard streams are interconnected input and output communication channels between a computer program and its environment when it begins execution. The three input/output (I/O) connections are called standard input (stdin), standard output (stdout) and standard error (stderr). Originally I/O happened via a physically connected system console (input via keyboard, output via monitor), but standard streams abstract this. When a command is executed via an interactive shell, the streams are typically connected to the text terminal on which the shell is running, but can be changed with redirection or a pipeline. More generally, a child process inherits the standard streams of its parent process. Application Users generally know standard streams as input and output channels that handle data coming from an input device, or that write data from the application. The data may be text with any encoding, or binary data. In many modern systems, the standard error str ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Conveyor Belt
A conveyor belt is the carrying medium of a belt conveyor system (often shortened to belt conveyor). A belt conveyor system is one of many types of conveyor systems. A belt conveyor system consists of two or more pulleys (sometimes referred to as drums), with a closed loop of carrying medium—the conveyor belt—that rotates about them. One or both of the pulleys are powered, moving the belt and the material on the belt forward. The powered pulley is called the drive pulley while the unpowered pulley is called the idler pulley. There are two main industrial classes of belt conveyors; Those in general material handling such as those moving boxes along inside a factory and bulk material handling such as those used to transport large volumes of resources and agricultural materials, such as grain, salt, coal, ore, sand, overburden and more. Overview Conveyors are durable and reliable components used in automated distribution and warehousing, as well as manufacturing and produ ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Stream (computing)
In computer science, a stream is a sequence of data elements made available over time. A stream can be thought of as items on a conveyor belt being processed one at a time rather than in large batches. Streams are processed differently from batch data – normal functions cannot operate on streams as a whole, as they have potentially unlimited data, and formally, streams are '' codata'' (potentially unlimited), not data (which is finite). Functions that operate on a stream, producing another stream, are known as filters, and can be connected in pipelines, analogously to function composition. Filters may operate on one item of a stream at a time, or may base an item of output on multiple items of input, such as a moving average. Examples The term "stream" is used in a number of similar ways: * "Stream editing", as with sed, awk, and perl. Stream editing processes a file or files, in-place, without having to load the file(s) into a user interface. One example of such use is to ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Unix
Unix (; trademarked as UNIX) is a family of multitasking, multiuser computer operating systems that derive from the original AT&T Unix, whose development started in 1969 at the Bell Labs research center by Ken Thompson, Dennis Ritchie, and others. Initially intended for use inside the Bell System, AT&T licensed Unix to outside parties in the late 1970s, leading to a variety of both academic and commercial Unix variants from vendors including University of California, Berkeley (Berkeley Software Distribution, BSD), Microsoft (Xenix), Sun Microsystems (SunOS/Solaris (operating system), Solaris), Hewlett-Packard, HP/Hewlett Packard Enterprise, HPE (HP-UX), and IBM (IBM AIX, AIX). In the early 1990s, AT&T sold its rights in Unix to Novell, which then sold the UNIX trademark to The Open Group, an industry consortium founded in 1996. The Open Group allows the use of the mark for certified operating systems that comply with the Single UNIX Specification (SUS). Unix systems are chara ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Sequence Alignment
In bioinformatics, a sequence alignment is a way of arranging the sequences of DNA, RNA, or protein to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships between the sequences. Aligned sequences of nucleotide or amino acid residues are typically represented as rows within a matrix. Gaps are inserted between the residues so that identical or similar characters are aligned in successive columns. Sequence alignments are also used for non-biological sequences, such as calculating the distance cost between strings in a natural language or in financial data. Interpretation If two sequences in an alignment share a common ancestor, mismatches can be interpreted as point mutations and gaps as indels (that is, insertion or deletion mutations) introduced in one or both lineages in the time since they diverged from one another. In sequence alignments of proteins, the degree of similarity between amino acids occupying a parti ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Gigabyte
The gigabyte () is a multiple of the unit byte for digital information. The prefix ''giga'' means 109 in the International System of Units (SI). Therefore, one gigabyte is one billion bytes. The unit symbol for the gigabyte is GB. This definition is used in all contexts of science (especially data science), engineering, business, and many areas of computing, including storage capacities of hard drives, solid state drives, and tapes, as well as data transmission speeds. However, the term is also used in some fields of computer science and information technology to denote (10243 or 230) bytes, particularly for sizes of RAM. Thus, prior to 1998, some usage of ''gigabyte'' has been ambiguous. To resolve this difficulty, IEC 80000-13 clarifies that a ''gigabyte'' (GB) is 109 bytes and specifies the term ''gibibyte'' (GiB) to denote 230 bytes. These differences are still readily seen for example, when a 400 GB drive's capacity is displayed by Microsoft Windows as 372 G ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Burrows-Wheeler Aligner
This list of sequence alignment software is a compilation of software tools and web portals used in pairwise sequence alignment and multiple sequence alignment. See structural alignment software for structural alignment of proteins. Database search only *Sequence type: protein or nucleotide Pairwise alignment *Sequence type: protein or nucleotide **Alignment type: local or global Multiple sequence alignment *Sequence type: protein or nucleotide. **Alignment type: local or global Genomics analysis *Sequence type: protein or nucleotide Motif finding *Sequence type: protein or nucleotide Benchmarking Alignment viewers, editors Please see List of alignment visualization software. Short-read sequence alignment See also * List of open source bioinformatics software References {{Reflist Sequence Sequence alignment software This list of sequence alignment software is a compilation of software tools and web portals used in pairwise sequence alignment and multiple sequen ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]