Circular Consensus Sequencing

Circular consensus sequencing (CCS) is a DNA sequencing method that is used in conjunction with single-molecule real-time sequencing to yield highly accurate long-read sequencing datasets, with read lengths averaging 15–25 kb and median accuracy greater than 99.9%. These long reads are consensus sequences built from multiple passes over a single DNA molecule, and they can improve results for complex applications such as single nucleotide and structural variant detection, genome assembly, assembly of difficult polyploid or highly repetitive genomes, and assembly of metagenomes. CCS allows resolution of large or complex genomes of any species, such as the California redwood genome, which is nine times the size of the human genome, and supports high-precision variant detection from single nucleotide variants (SNVs) to structural variants. CCS also enables separation of the different copies of each chromosome (e.g., maternal and paternal for a diploid organism), known as haplotypes. CCS reads offer accuracy equivalent to short-read sequencing data, but with the length necessary for complex genome assemblies and phasing of variants across the genome.
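
Sequencing accuracy is often quoted on the Phred quality scale rather than as a percentage. The sketch below is illustrative only: it uses the standard Phred convention, Q = -10 log10(error probability), which the article itself does not mention, and the function names are hypothetical. Under that convention, the 99.9% accuracy figure corresponds to roughly Q30.

```python
import math


def accuracy_to_phred(accuracy: float) -> float:
    """Convert a per-base accuracy (strictly between 0 and 1) to a Phred score."""
    if not 0.0 < accuracy < 1.0:
        raise ValueError("accuracy must be strictly between 0 and 1")
    error_probability = 1.0 - accuracy
    # Standard Phred definition: Q = -10 * log10(P_error)
    return -10.0 * math.log10(error_probability)


def phred_to_accuracy(q: float) -> float:
    """Convert a Phred score back to a per-base accuracy."""
    return 1.0 - 10.0 ** (-q / 10.0)


if __name__ == "__main__":
    # 99.9% accuracy is one expected error per 1,000 bases
    print(round(accuracy_to_phred(0.999), 2))
    print(phred_to_accuracy(30))
```

On this scale, each additional 10 points means a tenfold drop in the expected error rate.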


Technology

In this method, circularized fragments of DNA in solution float across the surface of a nanofluidic chip called a SMRT (Single Molecule, Real-Time) Cell. The surface of the chip is covered with millions of wells called zero-mode waveguides (ZMWs), each tens of nanometers wide. To prepare a sample for CCS/HiFi sequencing, primers and DNA polymerase are added to SMRTbell libraries. The circularized DNA becomes trapped in the ZMW, nucleotides are added, and the DNA polymerase enzyme begins to copy the molecule base by base. As each base is incorporated, a tiny amount of light is released and read by a detector, which helps the sequencer’s computer determine the order of bases present in the sample. The circularized DNA is sequenced in repeated passes to ensure accuracy, hence the name “circular” consensus sequencing, and the primers and adapters are then removed bioinformatically to deliver a highly accurate consensus DNA read. In CCS, the genomic DNA is prepared without amplification, so individual base modifications such as methylation can be detected during sequencing. This allows the capture of both sequence and methylation information in a single experiment.
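
The idea of building one accurate read from repeated passes can be illustrated with a toy majority vote. This is a deliberately simplified sketch, not the algorithm used by any real CCS software: it assumes the passes are already aligned and of equal length, ignores insertions, deletions, and base qualities, and the function name `majority_consensus` is hypothetical.

```python
from collections import Counter


def majority_consensus(passes: list[str]) -> str:
    """Return the per-position majority base across pre-aligned passes."""
    if not passes:
        raise ValueError("at least one pass is required")
    length = len(passes[0])
    if any(len(p) != length for p in passes):
        # Assumption of this toy example: equal-length, pre-aligned passes
        raise ValueError("all passes must have the same length")

    consensus = []
    for column in zip(*passes):
        # most_common(1) returns the most frequent base in this column;
        # on a tie, the base seen first in the column wins.
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


if __name__ == "__main__":
    # Five noisy passes over the same molecule; errors fall at different positions
    passes = ["ACGTTGCA", "ACGTAGCA", "ACCTTGCA", "ACGTTGCA", "ACGTTGGA"]
    print(majority_consensus(passes))  # prints ACGTTGCA
```

Because the errors in each pass fall at different positions, they are outvoted by the other passes, which is the intuition behind consensus accuracy improving as the number of passes grows.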


History

This sequencing method was first described by Travers, K.J., et al. in ''Nucleic Acids Research'' in 2010. It was later commercialized by Pacific Biosciences in 2018 and made available on Sequel II and Revio long-read sequencing instruments. CCS technology has subsequently been used in numerous studies across several fields, including human telomere-to-telomere whole-genome assembly and pangenome research, pediatric rare disease genomic analysis, DNA methylation in rare disease cohorts, assembly of whole genomes of non-human vertebrates and of other agriculturally significant species, analysis of cancer genomes, and metagenomics and microbial research. Recognizing the importance of this technology in future genomic exploration and discovery, the editors of ''Nature Methods'' named long-read sequencing its method of the year for 2022.


Applications


Human and conservation biology

CCS can be useful to researchers performing ''de novo'' sequence assembly or studying haplotype-phased sequences from each chromosomal copy, regardless of how many chromosomes are present in the species. Many biodiversity-oriented consortia have used the technology in their conservation biology studies, including the African Biogenome Project, California Conservation Genomics Project, Darwin Tree of Life, Desert Agriculture Initiative, Earth Biogenome Project, Global Ant Genomics Alliance, Human Pangenome, Telomere-to-Telomere Consortium, The 10,000 Fish Genomes Project, and Vertebrate Genomes Project.


Human health

Circular consensus sequencing helps researchers identify and characterize rare or structural variants with high confidence, and so better understand the underlying genomics of a given phenotype. Its applications to human health include rare disease research, microbiology and infectious disease, cancer research, and other genetic disease research areas.


Rare diseases

Although each occurs with low frequency in the human population, rare diseases are collectively common, and most have a genetic cause, presenting unique diagnostic challenges. An estimated 50–80% of structural variants are tandem repeats. Because CCS provides a comprehensive view of variation in the human genome, producing complete, accurate, and phased assemblies for variant calling and for identifying repeat expansions and medically relevant interruption sequences, it enables the identification of causative pathogenic variants and helps researchers discover novel disease-associated genes.


Microbiology and infectious diseases

Circular consensus sequencing can rapidly identify emerging pathogens and detect changes in pathogen genomes as part of regional or global surveillance operations. Where other molecular technologies for public health surveillance may require re-validation or the development of new panels, the unbiased nature of circular consensus sequencing delivers comprehensive genetic information to further characterize global outbreaks, pandemics, and epidemics.


Cancer research

Comprehensive resolution of structural variants enables researchers to better study and detect somatic variants driving cancer. Because of their size (>50 bp), structural variants and tandem repeats account for much of the genomic variation between individuals. Long-read RNA sequencing can be useful in cancer research to uncover alternative splicing and fusion events that drive cancer growth. CCS also has an advantage over other sequencing technologies in that it can provide phasing information for expressed mutations.

