DNA nanoball sequencing is a high-throughput sequencing technology that is used to determine the entire genomic sequence of an organism. The method uses rolling circle replication to amplify small fragments of genomic DNA into ''DNA nanoballs''. Fluorescent nucleotides bind to complementary nucleotides and are then polymerized to anchor sequences bound to known sequences on the DNA template. The base order is determined via the fluorescence of the bound nucleotides. This DNA sequencing method allows large numbers of DNA nanoballs to be sequenced per run at lower reagent costs compared to other next generation sequencing platforms. However, a limitation of this method is that it generates only short sequences of DNA, which presents challenges to mapping its reads to a reference genome. After purchasing Complete Genomics, the Beijing Genomics Institute (BGI) refined ''DNA nanoball sequencing'' to sequence nucleotide samples on their own platform.


Procedure

DNA nanoball sequencing involves isolating the DNA that is to be sequenced, shearing it into small fragments of 100 to 350 base pairs (bp), ligating adapter sequences to the fragments, and circularizing the fragments. The circular fragments are copied by rolling circle replication, resulting in many single-stranded copies of each fragment. The DNA copies concatenate head to tail in a long strand and are compacted into a DNA nanoball. The nanoballs are then adsorbed onto a sequencing flow cell. The color of the fluorescence at each interrogated position is recorded through a high-resolution camera. Bioinformatics tools are used to analyze the fluorescence data and make a base call, and to map or quantify the 50 bp, 100 bp, or 150 bp single- or paired-end reads.


DNA Isolation, fragmentation, and size capture

Cells are lysed and DNA is extracted from the cell lysate. The high-molecular-weight DNA, often several megabase pairs long, is fragmented by physical or enzymatic methods to break the DNA double-strands at random intervals. Bioinformatic mapping of the sequencing reads is most efficient when the sample DNA contains a narrow length range. For small RNA sequencing, selection of the ideal fragment lengths for sequencing is performed by gel electrophoresis; for sequencing of larger fragments, DNA fragments are separated by bead-based size selection.
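The effect of random fragmentation followed by size selection can be illustrated with a small simulation. This is only a sketch: the function names, the uniform placement of breaks, and the mean fragment size of 225 bp are assumptions made for illustration, and the 100 to 350 bp window is taken from the procedure overview rather than from any particular protocol.

```python
import random

def fragment(length, mean_size, rng):
    """Break a molecule of `length` bp at random positions; return fragment sizes."""
    n_cuts = max(1, length // mean_size)
    # Distinct break points, uniformly distributed along the molecule (an assumption).
    cuts = sorted(rng.sample(range(1, length), n_cuts))
    edges = [0] + cuts + [length]
    return [b - a for a, b in zip(edges, edges[1:])]

def size_select(sizes, low=100, high=350):
    """Keep only fragments whose length lies within [low, high] bp."""
    return [s for s in sizes if low <= s <= high]

rng = random.Random(0)                 # fixed seed so the run is repeatable
sizes = fragment(1_000_000, 225, rng)  # a hypothetical 1 Mbp molecule
selected = size_select(sizes)
print(f"{len(selected)} of {len(sizes)} fragments fall in the 100-350 bp window")
```

Because the breaks are random, many fragments land outside the window, which is why a size-selection step is needed before library construction.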


Attaching adapter sequences

Adapter DNA sequences must be attached to the unknown DNA fragment so that DNA segments with known sequences flank the unknown DNA. In the first round of adapter ligation, right (Ad153_right) and left (Ad153_left) adapters are attached to the right and left flanks of the fragmented DNA, and the DNA is amplified by PCR. A splint oligo then hybridizes to the ends of the fragments, which are ligated to form a circle. An exonuclease is added to remove all remaining linear single-stranded and double-stranded DNA products. The result is a completed circular DNA template.


Rolling circle replication

Once a single-stranded circular DNA template containing sample DNA ligated to two unique adapter sequences has been generated, the full sequence is amplified into a long string of DNA. This is accomplished by rolling circle replication with the Phi 29 DNA polymerase, which binds and replicates the DNA template. The newly synthesized strand is released from the circular template, resulting in a long single-stranded DNA comprising several head-to-tail copies of the circular template. The resulting nanoparticle self-assembles into a tight ball of DNA approximately 300 nanometers (nm) across. Nanoballs remain separated from each other because they are negatively charged and naturally repel each other, reducing any tangling between different single-stranded DNA lengths.
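The head-to-tail structure of the amplified strand can be sketched in a few lines. The adapter and insert sequences below are short hypothetical placeholders, not the real Ad153 adapters, and the copy number of 300 is an arbitrary illustrative value. The sketch models the product as repeats of the reverse complement of the template, since a polymerase synthesizes the complementary strand.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def rolling_circle(template, copies):
    """Model the single-stranded product of rolling circle replication.

    The polymerase travels around the circle repeatedly, so the product is
    `copies` head-to-tail repeats of the strand complementary to the template.
    """
    return reverse_complement(template) * copies

# Hypothetical placeholder sequences (not the real adapters).
left_adapter, insert, right_adapter = "AAGTC", "GATTACAGATTACA", "CCTGA"
circle = left_adapter + insert + right_adapter   # linear view of the circle
nanoball = rolling_circle(circle, 300)

print(len(circle), len(nanoball))   # 24 7200
```

Every repeat carries the same adapters, so a single anchor sequence can find its binding site in each copy within the nanoball.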


DNA nanoball patterned array

To obtain the DNA sequence, the DNA nanoballs are attached to a patterned array flow cell. The flow cell is a silicon wafer coated with silicon dioxide, titanium, hexamethyldisilazane (HMDS), and a photoresist material. The DNA nanoballs are added to the flow cell and selectively bind to the positively charged aminosilane in a highly ordered pattern, allowing a very high density of DNA nanoballs to be sequenced.


Imaging

After each DNA nucleotide incorporation step, the flow cell is imaged to determine which nucleotide base bound to the DNA nanoball. The fluorophore is excited with a laser that emits light at specific wavelengths. The emission of fluorescence from each DNA nanoball is captured on a high-resolution CCD camera. The image is then processed to remove background noise and assess the intensity of each point. The color of each DNA nanoball corresponds to a base at the interrogated position, and a computer records the base position information.
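The step from measured intensities to a base call can be illustrated with a deliberately simple sketch. It assumes one intensity value per base-specific color channel, a fixed channel order of A, C, G, T, and a constant background level; these are all assumptions for illustration, and production base callers are considerably more elaborate.

```python
CHANNEL_TO_BASE = ("A", "C", "G", "T")   # assumed channel order

def call_base(intensities, background=0.0):
    """Call one base from four channel intensities measured at one spot.

    Returns (base, purity), where purity is the fraction of the
    background-corrected signal found in the brightest channel.
    """
    corrected = [max(0.0, x - background) for x in intensities]
    total = sum(corrected)
    if total == 0:
        return "N", 0.0                  # no usable signal at this cycle
    best = max(range(4), key=corrected.__getitem__)
    return CHANNEL_TO_BASE[best], corrected[best] / total

def call_read(cycles, background=0.0):
    """Concatenate the per-cycle base calls for one nanoball."""
    return "".join(call_base(c, background)[0] for c in cycles)

# Three imaging cycles for a single hypothetical nanoball.
cycles = [[900, 50, 40, 30], [20, 30, 800, 10], [5, 5, 5, 5]]
print(call_read(cycles, background=5))   # AGN
```

The purity value is one simple way to express confidence in a call; it is the kind of quantity that could be converted into a per-base quality score.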


Sequencing data format

The data generated from the DNA nanoballs is formatted as standard FASTQ files with contiguous bases (no gaps). These files can be used in any data analysis pipeline that is configured to read single-end or paired-end FASTQ files. For example, Read 1 from a 100 bp paired-end run:

  @CL100011513L1C001R013_126365/1
  CTAGGCAACTATAGGTCTCAGTTAAGTCAAATAAAATTCACATCAAATTTTTACTCCCACCATCCCAACACTTTCCTGCCTGGCATATGCCGTGTCTGCC
  +
  FFFFFFFFFFFGFGFFFFFF;FFFFFFFGFGFGFFFFFF;FFFFGFGFGFFEFFFFFEDGFDFF@FCFGFGCFFFFFEFFEGDFDFFFFFGDAFFEFGFF

Corresponding Read 2:

  @CL100011513L1C001R013_126365/2
  TGTCTACCATATTCTACATTCCACACTCGGTGAGGGAAGGTAGGCACATAAAGCAATGGCAGTACGGTGTAATACATGCTAATGTAGAGTAAGCACTCAG
  +
  3E9EE?FD<<@EFE>>ECEF5CE:B6E:CEE?6B>B+@??31/FD:0?@:E9<3FE2/A:/8>9CB&=E<7:-+>;29:7+/5D9)?5F/:
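A minimal reader for four-line FASTQ records might look like the sketch below. The function names are hypothetical, the demonstration record is made up, and the Phred+33 quality offset is an assumption: the article does not state the encoding, although the low-valued characters in the example quality strings are consistent with it.

```python
import io

def read_fastq(handle):
    """Yield (name, sequence, quality) tuples from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return                       # end of file
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        yield header[1:], seq, qual

def phred_scores(qual, offset=33):
    """Decode a quality string into integer scores (Phred+33 assumed)."""
    return [ord(c) - offset for c in qual]

# A short made-up record, used only to exercise the functions.
demo = io.StringIO("@demo_read/1\nACGT\n+\nFF;F\n")
for name, seq, qual in read_fastq(demo):
    print(name, seq, phred_scores(qual))   # demo_read/1 ACGT [37, 37, 26, 37]
```

For paired-end data, the two files can be read in lockstep with `zip`, since mates appear in the same order and differ only in the `/1` and `/2` suffixes.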


Informatics Tips


Reference Genome Alignment

Default parameters for the popular aligners are sufficient.


Read Names

In the FASTQ file created by BGI/MGI sequencers using DNA nanoballs on a patterned array flow cell, the read names look like this:

  BGISEQ-500: CL100025298L1C002R050_244547
  MGISEQ-2000: V100006430L1C001R018613883

Read names can be parsed to extract three variables describing the physical location of the read on the patterned array: (1) tile/region, (2) x coordinate, and (3) y coordinate. Note that, due to the order of these variables, these read names cannot be natively parsed by Picard MarkDuplicates in order to identify optical duplicates. However, as there are none on this platform, this poses no problem to Picard-based data analysis.
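One way to split such read names into fields is sketched below. The field layout (a flow cell identifier, then `L` plus a lane digit, `C` plus a three-digit column, `R` plus a three-digit row, and a trailing numeric index with an optional underscore) is an assumption inferred from the two example names, and so are the field names. How these fields correspond to the tile, x, and y variables is not specified here and should be checked against the instrument documentation.

```python
import re

# Assumed layout, inferred from the two example names only.
READ_NAME = re.compile(
    r"^(?P<flowcell>[A-Z]+\d+)"
    r"L(?P<lane>\d)"
    r"C(?P<column>\d{3})"
    r"R(?P<row>\d{3})"
    r"_?(?P<index>\d+)$"
)

def parse_read_name(name):
    """Split a BGI/MGI-style read name into its assumed components."""
    core = name.lstrip("@").split("/")[0]   # drop '@' and any /1 or /2 mate suffix
    match = READ_NAME.match(core)
    if match is None:
        raise ValueError(f"unrecognized read name: {name!r}")
    fields = match.groupdict()
    for key in ("lane", "column", "row", "index"):
        fields[key] = int(fields[key])
    return fields

print(parse_read_name("CL100025298L1C002R050_244547"))
print(parse_read_name("V100006430L1C001R018613883"))
```

Parsed fields like these could be used to rewrite read names into a colon-separated form that other tools expect, which is what "Picard-friendly" reformatting would involve.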


Duplicates

Because DNA nanoballs remain confined to their spots on the patterned array, there are no optical duplicates to contend with during bioinformatics analysis of sequencing reads. It is suggested to run Picard MarkDuplicates as follows:

  java -jar picard.jar MarkDuplicates I=input.bam O=marked_duplicates.bam M=marked_dup_metrics.txt READ_NAME_REGEX=null

A test with Picard-friendly, reformatted read names demonstrates the absence of this class of duplicate read. The single read marked as an optical duplicate is most assuredly artefactual. In any case, the effect on the estimated library size is negligible.


Advantages

DNA nanoball sequencing technology offers some advantages over other sequencing platforms. One advantage is the absence of optical duplicates: DNA nanoballs remain in place on the patterned array and do not interfere with neighboring nanoballs. Other advantages include the use of the high-fidelity Phi 29 DNA polymerase to ensure accurate amplification of the circular template; the compaction of several hundred copies of the circular template into a small area, which produces an intense signal; and the attachment of the fluorophore to the probe at a long distance from the ligation point, which improves ligation.


Disadvantages

The main disadvantage of DNA nanoball sequencing is the short read length of the DNA sequences obtained with this method. Short reads, especially from DNA rich in repeats, may map to two or more regions of the reference genome. A second disadvantage of this method is that multiple rounds of PCR have to be used. This can introduce PCR bias and possibly amplify contaminants in the template construction phase. However, these disadvantages are common to all short-read sequencing platforms and are not specific to DNA nanoballs.
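The mapping ambiguity of short reads in repetitive sequence can be shown with a toy exact-match search. The reference and reads below are made up for illustration, and real aligners use indexed, mismatch-tolerant algorithms rather than this naive scan.

```python
def find_all(reference, read):
    """Return every 0-based position where `read` matches `reference` exactly."""
    hits = []
    start = reference.find(read)
    while start != -1:
        hits.append(start)
        start = reference.find(read, start + 1)   # allow overlapping matches
    return hits

# Toy reference containing a short tandem repeat (CAT repeated four times).
reference = "GGCATCATCATCATTTGACCGTA"

print(find_all(reference, "CATCAT"))         # [2, 5, 8]
print(find_all(reference, "CATCATTTGACC"))   # [8]
```

The six-base read lies entirely inside the repeat and matches three positions, while the longer read extends into unique flanking sequence and matches only once.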


Applications

DNA nanoball sequencing has been used in a number of studies. Lee ''et al.'' used this technology to find mutations that were present in a lung cancer and compared them to normal lung tissue. They were able to identify over 50,000 single nucleotide variants. Roach ''et al.'' used DNA nanoball sequencing to sequence the genomes of a family of four relatives; they were able to identify SNPs that may be responsible for a Mendelian disorder and to estimate the inter-generation mutation rate. The Institute for Systems Biology has used this technology to sequence 615 complete human genome samples as part of a survey studying neurodegenerative diseases, and the National Cancer Institute is using DNA nanoball sequencing to sequence 50 tumours and matched normal tissues from pediatric cancers.


Significance

Massively parallel next generation sequencing platforms like DNA nanoball sequencing may contribute to the diagnosis and treatment of many genetic diseases. The cost of sequencing an entire human genome fell from about one million dollars in 2008 to $4,400 in 2010 with the DNA nanoball technology. By sequencing the entire genomes of patients with heritable diseases or cancer, mutations associated with these diseases have been identified, opening up strategies such as targeted therapeutics for at-risk people and genetic counseling. As the price of sequencing an entire human genome approaches the $1000 mark, genomic sequencing of every individual may become feasible as part of normal preventative medicine.

