N50, L50, And Related Statistics
   HOME
*





N50, L50, And Related Statistics
In computational biology, N50 and L50 are statistics of a set of contig or scaffold lengths. The ''N50'' is similar to a mean or median of lengths, but has greater weight given to the longer contigs. It is used widely in genome assembly, especially in reference to contig lengths within a draft assembly. There are also the related U50, UL50, UG50, UG50%, N90, NG50, and D50 statistics. To provide a better assessment of assembly output for viral and microbial datasets, a new metric called U50 should be used. The ''U50'' identifies unique, target-specific contigs by using a reference genome as baseline, aiming at circumventing some limitations that are inherent to the ''N50'' metric. The use of the ''U50'' metric allows for a more accurate measure of assembly performance by analyzing only the unique, non-overlapping contigs. Most viral and microbial sequencing have high background noise (i.e., host and other non-targets), which contributes to having a skewed, misrepresented ''N50'' v ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Computational Biology
Computational biology refers to the use of data analysis, mathematical modeling and computational simulations to understand biological systems and relationships. An intersection of computer science, biology, and big data, the field also has foundations in applied mathematics, chemistry, and genetics. It differs from biological computing, a subfield of computer engineering which uses bioengineering to build computers. History Bioinformatics, the analysis of informatics processes in biological systems, began in the early 1970s. At this time, research in artificial intelligence was using network models of the human brain in order to generate new algorithms. This use of biological data pushed biological researchers to use computers to evaluate and compare large data sets in their own field. By 1982, researchers shared information via punch cards. The amount of data grew exponentially by the end of the 1980s, requiring new computational methods for quickly interpreting ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Contig
A contig (from ''contiguous'') is a set of overlapping DNA segments that together represent a consensus region of DNA.Gregory, S. ''Contig Assembly''. Encyclopedia of Life Sciences, 2005. In bottom-up sequencing projects, a contig refers to overlapping sequence data ( reads); in top-down sequencing projects, contig refers to the overlapping clones that form a physical map of the genome that is used to guide sequencing and assembly.Dear, P. H. ''Genome Mapping''. Encyclopedia of Life Sciences, 2005. . Contigs can thus refer both to overlapping DNA sequences and to overlapping physical segments (fragments) contained in clones depending on the context. Original definition of contig In 1980, Staden wrote: ''In order to make it easier to talk about our data gained by the shotgun method of sequencing we have invented the word "contig". A contig is a set of gel readings that are related to one another by overlap of their sequences. All gel readings belong to one and only one con ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Scaffolding (bioinformatics)
Scaffolding is a technique used in bioinformatics. It is defined as follows: Link together a non-contiguous series of genomic sequences into a scaffold, consisting of sequences separated by gaps of known length. The sequences that are linked are typically contiguous sequences corresponding to read overlaps. When creating a draft genome, individual reads of DNA are second assembled into contigs, which, by the nature of their assembly, have gaps between them. The next step is to then bridge the gaps between these contigs to create a scaffold. This can be done using either optical mapping or mate-pair sequencing. Assembly software The sequencing of the ''Haemophilus influenzae'' genome marked the advent of scaffolding. That project generated a total of 140 contigs, which were oriented and linked using paired end reads. The success of this strategy prompted the creation of the software, Grouper, which was included in genome assemblers. Until 2001, this was the only scaffolding soft ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Mean
There are several kinds of mean in mathematics, especially in statistics. Each mean serves to summarize a given group of data, often to better understand the overall value (magnitude and sign) of a given data set. For a data set, the ''arithmetic mean'', also known as "arithmetic average", is a measure of central tendency of a finite set of numbers: specifically, the sum of the values divided by the number of values. The arithmetic mean of a set of numbers ''x''1, ''x''2, ..., x''n'' is typically denoted using an overhead bar, \bar. If the data set were based on a series of observations obtained by sampling from a statistical population, the arithmetic mean is the ''sample mean'' (\bar) to distinguish it from the mean, or expected value, of the underlying distribution, the ''population mean'' (denoted \mu or \mu_x).Underhill, L.G.; Bradfield d. (1998) ''Introstat'', Juta and Company Ltd.p. 181/ref> Outside probability and statistics, a wide range of other notions of mean are o ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Median
In statistics and probability theory, the median is the value separating the higher half from the lower half of a data sample, a population, or a probability distribution. For a data set, it may be thought of as "the middle" value. The basic feature of the median in describing data compared to the mean (often simply described as the "average") is that it is not skewed by a small proportion of extremely large or small values, and therefore provides a better representation of a "typical" value. Median income, for example, may be a better way to suggest what a "typical" income is, because income distribution can be very skewed. The median is of central importance in robust statistics, as it is the most resistant statistic, having a breakdown point of 50%: so long as no more than half the data are contaminated, the median is not an arbitrarily large or small result. Finite data set of numbers The median of a finite list of numbers is the "middle" number, when those numbers are list ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Genome Assembly
In bioinformatics, sequence assembly refers to aligning and merging fragments from a longer DNA sequence in order to reconstruct the original sequence. This is needed as DNA sequencing technology might not be able to 'read' whole genomes in one go, but rather reads small pieces of between 20 and 30,000 bases, depending on the technology used. Typically, the short fragments (reads) result from shotgun sequencing genomic DNA, or gene transcript ( ESTs). The problem of sequence assembly can be compared to taking many copies of a book, passing each of them through a shredder with a different cutter, and piecing the text of the book back together just by looking at the shredded pieces. Besides the obvious difficulty of this task, there are some extra practical issues: the original may have many repeated paragraphs, and some shreds may be modified during shredding to have typos. Excerpts from another book may also be added in, and some shreds may be completely unrecognizable. Gen ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Contiguity
Contiguity or contiguous may refer to: *Contiguous data storage, in computer science *Contiguity (probability theory) *Contiguity (psychology) *Contiguous distribution of species, in biogeography *Geographic contiguity of territorial land *Contiguous zone The term territorial waters is sometimes used informally to refer to any area of water over which a sovereign state has jurisdiction, including internal waters, the territorial sea, the contiguous zone, the exclusive economic zone, and potenti ... in territorial waters See also

* * {{disambig ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Nucleotide
Nucleotides are organic molecules consisting of a nucleoside and a phosphate. They serve as monomeric units of the nucleic acid polymers – deoxyribonucleic acid (DNA) and ribonucleic acid (RNA), both of which are essential biomolecules within all life-forms on Earth. Nucleotides are obtained in the diet and are also synthesized from common nutrients by the liver. Nucleotides are composed of three subunit molecules: a nucleobase, a five-carbon sugar (ribose or deoxyribose), and a phosphate group consisting of one to three phosphates. The four nucleobases in DNA are guanine, adenine, cytosine and thymine; in RNA, uracil is used in place of thymine. Nucleotides also play a central role in metabolism at a fundamental, cellular level. They provide chemical energy—in the form of the nucleoside triphosphates, adenosine triphosphate (ATP), guanosine triphosphate (GTP), cytidine triphosphate (CTP) and uridine triphosphate (UTP)—throughout the cell for the many cellular func ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Kilobase
A base pair (bp) is a fundamental unit of double-stranded nucleic acids consisting of two nucleobases bound to each other by hydrogen bonds. They form the building blocks of the DNA double helix and contribute to the folded structure of both DNA and RNA. Dictated by specific hydrogen bonding patterns, "Watson–Crick" (or "Watson–Crick–Franklin") base pairs (guanine–cytosine and adenine–thymine) allow the DNA helix to maintain a regular helical structure that is subtly dependent on its nucleotide sequence. The complementary nature of this based-paired structure provides a redundant copy of the genetic information encoded within each strand of DNA. The regular structure and data redundancy provided by the DNA double helix make DNA well suited to the storage of genetic information, while base-pairing between DNA and incoming nucleotides provides the mechanism through which DNA polymerase replicates DNA and RNA polymerase transcribes DNA into RNA. Many DNA-binding proteins ca ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Quantile
In statistics and probability, quantiles are cut points dividing the range of a probability distribution into continuous intervals with equal probabilities, or dividing the observations in a sample in the same way. There is one fewer quantile than the number of groups created. Common quantiles have special names, such as ''quartiles'' (four groups), ''deciles'' (ten groups), and ''percentiles'' (100 groups). The groups created are termed halves, thirds, quarters, etc., though sometimes the terms for the quantile are used for the groups created, rather than for the cut points. -quantiles are values that partition a finite set of values into subsets of (nearly) equal sizes. There are partitions of the -quantiles, one for each integer satisfying . In some cases the value of a quantile may not be uniquely determined, as can be the case for the median (2-quantile) of a uniform probability distribution on a set of even size. Quantiles can also be applied to continuous distributi ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Broad Institute
The Eli and Edythe L. Broad Institute of MIT and Harvard (IPA: , pronunciation respelling: ), often referred to as the Broad Institute, is a biomedical and genomic research center located in Cambridge, Massachusetts, Cambridge, Massachusetts, United States. The institute is independently governed and supported as a 501(c)(3) nonprofit research organization under the name Broad Institute Inc., and it partners with the Massachusetts Institute of Technology, Harvard University, and the five Harvard teaching hospitals. History The Broad Institute evolved from a decade of research collaborations among MIT and Harvard scientists. One cornerstone was the Center for Genome Research of Whitehead Institute at MIT. Founded in 1982, the Whitehead became a major center for genomics and the Human Genome Project. As early as 1995, scientists at the Whitehead started pilot projects in genomic medicine, forming an unofficial collaborative network among young scientists interested in genomic a ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]