In
genetics
Genetics is the study of genes, genetic variation, and heredity in organisms.Hartl D, Jones E (2005) It is an important branch in biology because heredity is vital to organisms' evolution. Gregor Mendel, a Moravian Augustinians, Augustinian ...
, coverage is one of several measures of the depth or completeness of
DNA sequencing
DNA sequencing is the process of determining the nucleic acid sequence – the order of nucleotides in DNA. It includes any method or technology that is used to determine the order of the four bases: adenine, thymine, cytosine, and guanine. The ...
, and is more specifically expressed in any of the following terms:
*Sequence coverage (or depth) is the number of unique reads that include a given
nucleotide
Nucleotides are Organic compound, organic molecules composed of a nitrogenous base, a pentose sugar and a phosphate. They serve as monomeric units of the nucleic acid polymers – deoxyribonucleic acid (DNA) and ribonucleic acid (RNA), both o ...
in the reconstructed sequence.
Deep sequencing refers to the general concept of aiming for high number of unique reads of each region of a sequence.
*Physical coverage, the cumulative length of reads or read pairs expressed as a multiple of genome size.
[
*Genomic coverage, the percentage of all ]base pair
A base pair (bp) is a fundamental unit of double-stranded nucleic acids consisting of two nucleobases bound to each other by hydrogen bonds. They form the building blocks of the DNA double helix and contribute to the folded structure of both DNA ...
s or loci of the genome
A genome is all the genetic information of an organism. It consists of nucleotide sequences of DNA (or RNA in RNA viruses). The nuclear genome includes protein-coding genes and non-coding genes, other functional regions of the genome such as ...
covered by sequencing.
Sequence coverage
Rationale
Even though the sequencing accuracy for each individual nucleotide is very high, the very large number of nucleotides in the genome means that if an individual genome is only sequenced once, there will be a significant number of sequencing errors. Furthermore, many positions in a genome contain rare single-nucleotide polymorphism
In genetics and bioinformatics, a single-nucleotide polymorphism (SNP ; plural SNPs ) is a germline substitution of a single nucleotide at a specific position in the genome. Although certain definitions require the substitution to be present in a ...
s (SNPs). Hence to distinguish between sequencing errors and true SNPs, it is necessary to increase the sequencing accuracy even further by sequencing individual genomes a large number of times.
Ultra-deep sequencing
The term "ultra-deep" can sometimes also refer to higher coverage (>100-fold), which allows for detection of sequence variants in mixed populations. In the extreme, error-corrected sequencing approaches such as Maximum-Depth Sequencing can make it so that coverage of a given region approaches the throughput of a sequencing machine, allowing coverages of >10^8.
Transcriptome sequencing
Deep sequencing of transcriptome
The transcriptome is the set of all RNA transcripts, including coding and non-coding, in an individual or a population of cells. The term can also sometimes be used to refer to all RNAs, or just mRNA, depending on the particular experiment. The ...
s, also known as RNA-Seq
RNA-Seq (named as an abbreviation of RNA sequencing) is a technique that uses next-generation sequencing to reveal the presence and quantity of RNA molecules in a biological sample, providing a snapshot of gene expression in the sample, also k ...
, provides both the sequence and frequency of RNA molecules that are present at any particular time in a specific cell type, tissue or organ. Counting the number of mRNAs that are encoded by individual genes provides an indicator of protein-coding potential, a major contributor to phenotype
In genetics, the phenotype () is the set of observable characteristics or traits of an organism. The term covers the organism's morphology (physical form and structure), its developmental processes, its biochemical and physiological propert ...
. Improving methods for RNA sequencing is an active area of research both in terms of experimental and computational methods.
Calculation
The average coverage for a whole genome can be calculated from the length of the original genome
A genome is all the genetic information of an organism. It consists of nucleotide sequences of DNA (or RNA in RNA viruses). The nuclear genome includes protein-coding genes and non-coding genes, other functional regions of the genome such as ...
(''G''), the number of reads (''N''), and the average read length (''L'') as . For example, a hypothetical genome with 2,000 base pairs reconstructed from 8 reads with an average length of 500 nucleotides will have 2× redundancy. This parameter also enables one to estimate other quantities, such as the percentage of the genome covered by reads (sometimes also called breadth of coverage). A high coverage in shotgun sequencing is desired because it can overcome errors in base calling and assembly. The subject of DNA sequencing theory addresses the relationships of such quantities.
Physical coverage
Sometimes a distinction is made between ''sequence coverage'' and ''physical coverage''. Where sequence coverage is the average number of times a base is read, physical coverage is the average number of times a base is read or spanned by mate paired reads.
Genomic coverage
In terms of genomic coverage and accuracy, whole genome sequencing
Whole genome sequencing (WGS), also known as full genome sequencing or just genome sequencing, is the process of determining the entirety of the DNA sequence of an organism's genome at a single time. This entails sequencing all of an organism's ...
can broadly be classified into either of the following:[ Last updated: November 1, 2021]
*A ''draft sequence'', covering approximately 90% of the genome at approximately 99.9% accuracy
*A ''finished sequence'', covering more than 95% of the genome at approximately 99.99% accuracy
Producing a truly high-quality ''finished'' sequence by this definition is very expensive. Thus, most human "whole genome sequencing
Whole genome sequencing (WGS), also known as full genome sequencing or just genome sequencing, is the process of determining the entirety of the DNA sequence of an organism's genome at a single time. This entails sequencing all of an organism's ...
" results are ''draft sequences'' (sometimes above and sometimes below the accuracy defined above).[
]
References
{{Reflist
Molecular biology
DNA sequencing