Illumina dye sequencing

Illumina dye sequencing is a technique used to determine the series of base pairs in DNA, a process also known as DNA sequencing. The reversible terminator chemistry concept was invented by Bruno Canard and Simon Sarfati at the Pasteur Institute in Paris. It was developed by Shankar Balasubramanian and David Klenerman of Cambridge University, who subsequently founded Solexa, a company later acquired by Illumina. This sequencing method is based on reversible dye-terminators that enable the identification of single nucleotides as they are washed over DNA strands. It can also be used for whole-genome and region sequencing, transcriptome analysis, metagenomics, small RNA discovery, methylation profiling, and genome-wide protein-nucleic acid interaction analysis.


Overview

The method works in three basic steps: amplify, sequence, and analyze. The process begins with purified DNA. The DNA is fragmented, and adapters are added that contain segments that act as reference points during amplification, sequencing, and analysis. The modified DNA is loaded onto a flow cell, where amplification and sequencing take place. The flow cell contains nanowells that space out the fragments and help prevent overcrowding. Each nanowell contains oligonucleotides that provide an anchoring point for the adapters. Once the fragments have attached, a phase called cluster generation begins. This step makes about a thousand copies of each DNA fragment by bridge amplification PCR. Next, primers and modified nucleotides are washed onto the chip. These nucleotides carry a reversible fluorescent blocker, so the DNA polymerase can add only one nucleotide at a time to the DNA fragment. After each round of synthesis, a camera takes a picture of the chip. A computer determines which base was added from the wavelength of the fluorescent tag and records it for every spot on the chip. After each round, non-incorporated molecules are washed away. A chemical deblocking step then removes the 3' fluorescent terminal blocking group. The process continues until the full DNA molecule is sequenced. With this technology, thousands of places throughout the genome are sequenced at once via massively parallel sequencing.
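The one-base-per-cycle loop described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions: the function name `sequence_by_synthesis` is hypothetical, strand direction is ignored, and imaging is reduced to recording the complementary base.

```python
# Toy model of the cycle described above: one base is read per cycle.
# Assumptions: strand orientation is ignored and imaging is error-free.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def sequence_by_synthesis(template, cycles):
    """Return the bases recorded while copying `template` for `cycles` rounds."""
    read = []
    for cycle in range(min(cycles, len(template))):
        # The reversible blocker allows exactly one incorporation per cycle.
        incorporated = COMPLEMENT[template[cycle]]
        # Imaging step: the fluorescent tag identifies the incorporated base.
        read.append(incorporated)
        # Deblocking (not modelled) removes the blocker so that the next
        # cycle can extend the strand by one more base.
    return "".join(read)


print(sequence_by_synthesis("ATGCCGTA", cycles=5))
```

The number of cycles sets the read length, which is why the read stops after five bases here even though the template is longer.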


Procedure


Genomic library

After the DNA is purified, a DNA library (genomic library) needs to be generated. There are two ways to create a genomic library: sonication and tagmentation. With tagmentation, transposases randomly cut the DNA into fragments of 50 to 500 bp and add adapters at the same time. A genomic library can also be generated by using sonication to fragment genomic DNA. Sonication fragments DNA into similar sizes using ultrasonic sound waves. After sonication, right and left adapters need to be attached by T7 DNA polymerase and T4 DNA ligase. Strands that fail to have adapters ligated are washed away.
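As a rough illustration of random fragmentation into the 50 to 500 bp range, the sketch below cuts a sequence into consecutive pieces of random length. The function `fragment` and its uniform length distribution are assumptions made for illustration; real tagmentation does not cut uniformly, and adapter addition is not modelled.

```python
import random


def fragment(dna, min_len=50, max_len=500, seed=0):
    """Cut `dna` into consecutive pieces of random length (toy model)."""
    rng = random.Random(seed)  # seeded so that the example is reproducible
    fragments = []
    pos = 0
    while pos < len(dna):
        size = rng.randint(min_len, max_len)
        # The final piece may be shorter than min_len if little DNA is left.
        fragments.append(dna[pos:pos + size])
        pos += size
    return fragments


genome = "ACGT" * 1000  # hypothetical 4,000 bp input
pieces = fragment(genome)
print(len(pieces), [len(p) for p in pieces[:5]])
```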


Adapters

Adapters contain three different segments: the sequence complementary to the solid support (oligonucleotides on the flow cell), the barcode sequence (indices), and the binding site for the sequencing primer. Indices are usually six base pairs long and are used during DNA sequence analysis to identify samples. Indices allow up to 96 different samples to be run together, which is known as multiplexing. During analysis, the computer groups all reads with the same index together.

Illumina uses a "sequence by synthesis" approach. This process takes place inside an acrylamide-coated glass flow cell. Oligonucleotides (short nucleotide sequences) coat the bottom of the flow cell and serve as the solid support that holds the DNA strands in place during sequencing. As the fragmented DNA is washed over the flow cell, the appropriate adapter attaches to the complementary solid support.
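Grouping reads by index can be sketched as follows. The index sequences, sample names, and the `demultiplex` function are hypothetical, and only exact index matches are assigned to a sample.

```python
from collections import defaultdict


def demultiplex(reads, sample_by_index):
    """Group (index, sequence) pairs by the sample that owns the index."""
    groups = defaultdict(list)
    for index, sequence in reads:
        # Reads whose index is not in the sample sheet are kept separately.
        sample = sample_by_index.get(index, "undetermined")
        groups[sample].append(sequence)
    return dict(groups)


# Hypothetical six-base indices and sample names.
sample_by_index = {"ATCACG": "sample_1", "CGATGT": "sample_2"}
reads = [
    ("ATCACG", "GGCTA"),
    ("CGATGT", "TTAGC"),
    ("ATCACG", "GGCTT"),
    ("NNNNNN", "ACGTA"),
]
groups = demultiplex(reads, sample_by_index)
print(groups)
```

Exact matching is the simplest choice; a tool that tolerates a mismatch in the index would need a distance check instead of a dictionary lookup.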


Bridge amplification

Once the fragments are attached, cluster generation can begin. The goal is to create hundreds of identical strands of DNA. Some will be the forward strand; the rest, the reverse. This is why right and left adapters are used. Clusters are generated through bridge amplification. DNA polymerase moves along a strand of DNA, creating its complementary strand. The original strand is washed away, leaving only the reverse strand. At the top of the reverse strand there is an adapter sequence. The DNA strand bends and attaches to the oligo that is complementary to the top adapter sequence. Polymerases attach to the reverse strand, and its complementary strand (which is identical to the original) is made. The now double-stranded DNA is denatured so that each strand can separately attach to an oligonucleotide sequence anchored to the flow cell. One will be the reverse strand; the other, the forward. This process is called bridge amplification, and it happens for thousands of clusters all over the flow cell at once.
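A back-of-the-envelope sketch shows how quickly repeated copying builds a cluster. It assumes, purely for illustration, that every bridge amplification cycle doubles the number of strands; real efficiency is lower, and the helper name `cycles_to_reach` is hypothetical.

```python
def cycles_to_reach(target_copies):
    """Count ideal doubling cycles needed to reach `target_copies` strands."""
    copies, cycles = 1, 0
    while copies < target_copies:
        copies *= 2  # assumption: each cycle copies every strand once
        cycles += 1
    return cycles


# About a thousand copies per fragment are mentioned in the overview.
print(cycles_to_reach(1000))
```

Under this idealized assumption, ten cycles would already exceed a thousand strands per cluster.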


Clonal amplification

Over and over again, DNA strands bend and attach to the solid support. DNA polymerase synthesizes a new strand to create a double-stranded segment, which is then denatured, so that all of the DNA strands in one area come from a single source (clonal amplification). Clonal amplification is important for quality control. If a strand is found to have an odd sequence, scientists can check the reverse strand to make sure that it has the complement of the same oddity. The forward and reverse strands act as checks to guard against artefacts. Because Illumina sequencing uses DNA polymerase, base substitution errors have been observed, especially at the 3' end. Paired-end reads combined with cluster generation can confirm that an error took place. The reverse and forward strands should be complementary to each other, all reverse reads should match each other, and all forward reads should match each other. If a read is not similar enough to its counterparts (with which it should be a clone), an error may have occurred. A minimum threshold of 97% similarity has been used in some labs' analyses.
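The consistency checks described above can be sketched in a few lines. The helper names are hypothetical, similarity is measured here as simple per-position identity between equal-length sequences (an assumption; real pipelines typically align reads first), and the 0.97 default mirrors the threshold quoted in the text.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def passes_threshold(read, reference, threshold=0.97):
    """True if `read` is at least `threshold` identical to `reference`."""
    return identity(read, reference) >= threshold


forward = "ACGT" * 25                      # hypothetical 100-base forward read
reverse = reverse_complement(forward)      # what the reverse read should be
three_errors = "TTT" + forward[3:]         # 3 substitutions: 97% identity
four_errors = "TTTA" + forward[4:]         # 4 substitutions: 96% identity

print(identity(forward, reverse_complement(reverse)))
print(passes_threshold(three_errors, forward))
print(passes_threshold(four_errors, forward))
```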


Sequence by synthesis

At the end of clonal amplification, all of the reverse strands are washed off the flow cell, leaving only forward strands. A primer attaches to the primer binding site in the forward strand's adapter, and a polymerase adds a fluorescently tagged dNTP to the DNA strand. Only one base can be added per round because the fluorophore acts as a blocking group; however, the blocking group is reversible. With four-color chemistry, each of the four bases has a unique emission, and after each round the machine records which base was added. Once the color is recorded, the fluorophore is washed away, another dNTP is washed over the flow cell, and the process is repeated. Starting with the launch of the NextSeq and later the MiniSeq, Illumina introduced a new two-color sequencing chemistry. Nucleotides are distinguished by one of two colors (red or green), no color ("black"), or both colors combined (appearing orange as a mixture of red and green).

Once the DNA strand has been read, the strand that was just added is washed away. Then the index 1 primer attaches, the index 1 sequence is polymerized, and the product is washed away. The strand forms a bridge again, and the 3' end of the DNA strand attaches to an oligo on the flow cell. The index 2 primer attaches, the index 2 sequence is polymerized, and the product is washed away. A polymerase synthesizes the complementary strand along the arched strand. The two strands separate, and the 3' end of each strand is blocked. The forward strand is washed away, and the process of sequence by synthesis repeats for the reverse strand.
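Two-color base calling can be sketched as a lookup from the pair of channel signals to a base. The text does not say which base corresponds to which signal, so the mapping below (both colors for A, red only for C, green only for T, no signal for G) is an assumption based on the commonly described two-channel scheme; the names `TWO_COLOR_CALLS` and `call_bases` are hypothetical.

```python
# (red_detected, green_detected) -> base.
# Assumed mapping; the source text does not specify it.
TWO_COLOR_CALLS = {
    (True, True): "A",    # both colors (appears orange)
    (True, False): "C",   # red only
    (False, True): "T",   # green only
    (False, False): "G",  # no signal ("black")
}


def call_bases(signals):
    """Translate per-cycle (red, green) detections into a base sequence."""
    return "".join(TWO_COLOR_CALLS[(red, green)] for red, green in signals)


cycles = [(True, True), (False, False), (True, False), (False, True)]
print(call_bases(cycles))
```

Two images per cycle are enough because two binary signals encode four states, one per base.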


Data analysis

The sequencing occurs for millions of clusters at once, and each cluster has ~1,000 identical copies of a DNA insert. The sequence data is analyzed by finding fragments with overlapping regions, which together form contigs, and lining them up. If a reference sequence is known, the contigs are then compared to it for variant identification. This piecemeal process allows scientists to see the complete sequence even though an unfragmented sequence was never run. However, because Illumina reads are not very long (HiSeq sequencing can produce reads around 90 bp long), it can be a struggle to resolve short tandem repeat regions. In addition, if the sequencing is de novo and no reference exists, repeated regions can make sequence assembly very difficult. Further difficulties include base substitutions (especially at the 3' end of reads) by inaccurate polymerases, chimeric sequences, and PCR bias, all of which can contribute to generating an incorrect sequence.
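Lining up overlapping fragments can be illustrated with a minimal greedy merge. This is a sketch only: the function names are hypothetical, overlaps must match exactly, and the reads are toy sequences. Real assemblers must also handle sequencing errors, reverse complements, and repeats, which is where the difficulties mentioned above arise.

```python
def overlap_length(left, right, min_overlap=3):
    """Length of the longest suffix of `left` that is a prefix of `right`."""
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def merge_reads(reads, min_overlap=3):
    """Greedily merge the pair with the largest overlap until none remain."""
    contigs = list(reads)
    while True:
        best_size, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    size = overlap_length(a, b, min_overlap)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]  # toy reads with 4-base overlaps
print(merge_reads(reads))
```

If the same short sequence occurred at several places in the genome, more than one merge would look equally good, which is the ambiguity that repeats introduce.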


Comparison with other sequencing methods

This technique offers several advantages over traditional sequencing methods such as Sanger sequencing. Sanger sequencing requires two reactions, one for the forward primer and another for the reverse primer. Unlike Illumina, Sanger sequencing uses fluorescently labeled dideoxynucleoside triphosphates (ddNTPs) to determine the sequence of the DNA fragment. ddNTPs lack the 3' OH group and terminate DNA synthesis permanently. In each reaction tube, dNTPs and ddNTPs are added, along with DNA polymerase and primers. The ratio of ddNTPs to dNTPs matters because the template DNA needs to be completely synthesized, and an overabundance of ddNTPs will create many fragments of the same size that terminate at the same position on the DNA template. When the DNA polymerase adds a ddNTP, the fragment is terminated and a new fragment is synthesized. Each fragment synthesized is one nucleotide longer than the last. Once the DNA template has been completely synthesized, the fragments are separated by capillary electrophoresis. At the bottom of the capillary tube, a laser excites the fluorescently labeled ddNTPs and a camera captures the color emitted.

Due to the automated nature of Illumina dye sequencing, it is possible to sequence multiple strands at once and obtain sequencing data quickly. With Sanger sequencing, only one strand can be sequenced at a time, and the process is relatively slow. Illumina uses only DNA polymerase, as opposed to the multiple, expensive enzymes required by other sequencing techniques (e.g. pyrosequencing).

