Pileup Format
   HOME
*





Pileup Format
Pileup format is a text-based format for summarizing the base calls of aligned reads to a reference sequence. This format facilitates visual display of SNP/indel calling and alignment. It was first used by Tony Cox and Zemin Ning at the Wellcome Trust Sanger Institute, and became widely known through its implementation within the SAMtools software suite. Format Example The columns Each line consists of 5 (or optionally 6) tab-separated columns: #Sequence identifier #Position in sequence (starting from 1) #Reference nucleotide at that position #Number of aligned reads covering that position (depth of coverage) #Bases at that position from aligned reads #Phred Quality of those bases, represented in ASCII with -33 offset (OPTIONAL) Column 5: The bases string *. (dot) means a base that matched the reference on the forward strand *, (comma) means a base that matched the reference on the reverse strand * (less-/greater-than sign) denotes a reference skip. This occurs, for example, if ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


File Format
A file format is a standard way that information is encoded for storage in a computer file. It specifies how bits are used to encode information in a digital storage medium. File formats may be either proprietary or free. Some file formats are designed for very particular types of data: PNG files, for example, store bitmapped images using lossless data compression. Other file formats, however, are designed for storage of several different types of data: the Ogg format can act as a container for different types of multimedia including any combination of audio and video, with or without text (such as subtitles), and metadata. A text file can contain any stream of characters, including possible control characters, and is encoded in one of various character encoding schemes. Some file formats, such as HTML, scalable vector graphics, and the source code of computer software are text files with defined syntaxes that allow them to be used for specific purposes. Specifications ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Single-nucleotide Polymorphism
In genetics, a single-nucleotide polymorphism (SNP ; plural SNPs ) is a germline substitution of a single nucleotide at a specific position in the genome. Although certain definitions require the substitution to be present in a sufficiently large fraction of the population (e.g. 1% or more), many publications do not apply such a frequency threshold. For example, at a specific base position in the human genome, the G nucleotide may appear in most individuals, but in a minority of individuals, the position is occupied by an A. This means that there is a SNP at this specific position, and the two possible nucleotide variations – G or A – are said to be the alleles for this specific position. SNPs pinpoint differences in our susceptibility to a wide range of diseases, for example age-related macular degeneration (a common SNP in the CFH gene is associated with increased risk of the disease) or nonalcoholic fatty liver disease (a SNP in the PNPLA3 gene is associated with inc ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Wellcome Trust Sanger Institute
The Wellcome Sanger Institute, previously known as The Sanger Centre and Wellcome Trust Sanger Institute, is a non-profit British genomics and genetics research institute, primarily funded by the Wellcome Trust. It is located on the Wellcome Genome Campus by the village of Hinxton, outside Cambridge. It shares this location with the European Bioinformatics Institute. It was established in 1992 and named after double Nobel Laureate Frederick Sanger. It was conceived as a large scale DNA sequencing centre to participate in the Human Genome Project, and went on to make the largest single contribution to the gold standard sequence of the human genome. From its inception the institute established and has maintained a policy of data sharing, and does much of its research in collaboration. Since 2000, the institute expanded its mission to understand "the role of genetics in health and disease". The institute now employs around 900 people and engages in five main areas of research ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


SAMtools
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM (Sequence Alignment/Map), BAM (Binary Alignment/Map) and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA. Both simple and advanced tools are provided, supporting complex tasks like variant calling and alignment viewing as well as sorting, indexing, data extraction and format conversion. SAM files can be very large (10s of Gigabytes is common), so compression is used to save space. SAM files are human-readable text files, and BAM files are simply their binary equivalent, whilst CRAM files are a restructured column-oriented binary container format. BAM files are typically compressed and more efficient for software to work with than SAM. SAMtools makes it possible to work directly with a compressed BAM file, without having to uncompress the whole file. Additionally, since the format for a SAM/BAM file is somewhat ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

The Base Quality String
''The'' () is a grammatical article in English, denoting persons or things that are already or about to be mentioned, under discussion, implied or otherwise presumed familiar to listeners, readers, or speakers. It is the definite article in English. ''The'' is the most frequently used word in the English language; studies and analyses of texts have found it to account for seven percent of all printed English-language words. It is derived from gendered articles in Old English which combined in Middle English and now has a single form used with nouns of any gender. The word can be used with both singular and plural nouns, and with a noun that starts with any letter. This is different from many other languages, which have different forms of the definite article for different genders or numbers. Pronunciation In most dialects, "the" is pronounced as (with the voiced dental fricative followed by a schwa) when followed by a consonant sound, and as (homophone of the archaic pron ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Regular Expression
A regular expression (shortened as regex or regexp; sometimes referred to as rational expression) is a sequence of characters that specifies a search pattern in text. Usually such patterns are used by string-searching algorithms for "find" or "find and replace" operations on strings, or for input validation. Regular expression techniques are developed in theoretical computer science and formal language theory. The concept of regular expressions began in the 1950s, when the American mathematician Stephen Cole Kleene formalized the concept of a regular language. They came into common use with Unix text-processing utilities. Different syntaxes for writing regular expressions have existed since the 1980s, one being the POSIX standard and another, widely used, being the Perl syntax. Regular expressions are used in search engines, in search and replace dialogs of word processors and text editors, in text processing utilities such as sed and AWK, and in lexical analysis. Most gener ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

ASCII
ASCII ( ), abbreviated from American Standard Code for Information Interchange, is a character encoding standard for electronic communication. ASCII codes represent text in computers, telecommunications equipment, and other devices. Because of technical limitations of computer systems at the time it was invented, ASCII has just 128 code points, of which only 95 are , which severely limited its scope. All modern computer systems instead use Unicode, which has millions of code points, but the first 128 of these are the same as the ASCII set. The Internet Assigned Numbers Authority (IANA) prefers the name US-ASCII for this character encoding. ASCII is one of the List of IEEE milestones, IEEE milestones. Overview ASCII was developed from telegraph code. Its first commercial use was as a seven-bit teleprinter code promoted by Bell data services. Work on the ASCII standard began in May 1961, with the first meeting of the American Standards Association's (ASA) (now the American Nat ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  




Phred Quality Score
A Phred quality score is a measure of the quality of the identification of the nucleobases generated by automated DNA sequencing. It was originally developed for the computer program Phred (software), Phred to help in the automation of DNA sequencing in the Human Genome Project. Phred quality scores are assigned to each nucleotide base call in automated sequencer traces. The FASTQ format encodes phred scores as ASCII characters alongside the read sequences. Phred quality scores have become widely accepted to characterize the quality of DNA sequences, and can be used to compare the efficacy of different sequencing methods. Perhaps the most important use of Phred quality scores is the automatic determination of accurate, quality-based consensus sequences. Definition Phred quality scores Q are logarithmically related to the base-calling error probabilities P and defined as Q = -10 \ \log_ P. This relation can be also be written as P = 10^. For example, if Phred assigns a quali ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

FASTQ Format
FASTQ format is a text-based format for storing both a biological sequence (usually nucleotide sequence) and its corresponding quality scores. Both the sequence letter and quality score are each encoded with a single ASCII character for brevity. It was originally developed at the Wellcome Trust Sanger Institute to bundle a FASTA formatted sequence and its quality data, but has recently become the ''de facto'' standard for storing the output of high-throughput sequencing instruments such as the Illumina Genome Analyzer. Format A FASTQ file has four line-separated fields per sequence: * Field 1 begins with a '@' character and is followed by a sequence identifier and an ''optional'' description (like a FASTA title line). * Field 2 is the raw sequence letters. * Field 3 begins with a '+' character and is ''optionally'' followed by the same sequence identifier (and any description) again. * Field 4 encodes the quality values for the sequence in Field 2, and must contain the same num ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


File Extension
A filename extension, file name extension or file extension is a suffix to the name of a computer file (e.g., .txt, .docx, .md). The extension indicates a characteristic of the file contents or its intended use. A filename extension is typically delimited from the rest of the filename with a full stop (period), but in some systems it is separated with spaces. Other extension formats include dashes and/or underscores on early versions of Linux and some versions of IBM AIX. Some file systems implement filename extensions as a feature of the file system itself and may limit the length and format of the extension, while others treat filename extensions as part of the filename without special distinction. Usage Filename extensions may be considered a type of metadata. They are commonly used to imply information about the way data might be stored in the file. The exact definition, giving the criteria for deciding what part of the file name is its extension, belongs to the rules of the ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Accelrys
BIOVIA is a software company headquartered in the United States, with representation in Europe and Asia. It provides software for chemical, materials and bioscience research for the pharmaceutical, biotechnology, consumer packaged goods, aerospace, energy and chemical industries. Previously named Accelrys, it is a wholly owned subsidiary of Dassault Systèmes after an April 2014 acquisition and has been renamed BIOVIA. History Accelrys was formed in 2001 as a wholly owned subsidiary of Pharmacopeia, Inc. from the fusion of five companies: Molecular Simulations Inc., Synopsys Scientific Systems, Oxford Molecular, the Genetics Computer Group (GCG), and Synomics Ltd. MSI, itself a result of the combination of Biodesign, Cambridge Molecular Design, Polygen and, later, Biocad and Biosym Technologies. In late 2003, Pharmacopeia, Inc. separated its drug discovery and software development businesses. The drug discovery company retained the name Pharmacopeia and remained in Princeton ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Université De Montréal
The Université de Montréal (UdeM; ; translates to University of Montreal) is a French-language public research university in Montreal, Quebec, Canada. The university's main campus is located in the Côte-des-Neiges neighborhood of Côte-des-Neiges–Notre-Dame-de-Grâce on Mount Royal near the Outremont Summit (also called Mount Murray), in the borough of Outremont. The institution comprises thirteen faculties, more than sixty departments and two affiliated schools: the Polytechnique Montréal (School of Engineering; formerly the École polytechnique de Montréal) and HEC Montréal (School of Business). It offers more than 650 undergraduate programmes and graduate programmes, including 71 doctoral programmes. The university was founded as a satellite campus of the Université Laval in 1878. It became an independent institution after it was issued a papal charter in 1919 and a provincial charter in 1920. Université de Montréal moved from Montreal's Quartier Latin to its pr ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]