Epigenomics

Epigenomics is the study of the complete set of epigenetic modifications on the genetic material of a cell, known as the epigenome. The field is analogous to genomics and proteomics, which are the study of the genome and proteome of a cell. Epigenetic modifications are reversible modifications on a cell's DNA or histones that affect gene expression without altering the DNA sequence. Epigenomic maintenance is a continuous process and plays an important role in the stability of eukaryotic genomes by taking part in crucial biological mechanisms such as DNA repair. Plant flavones have been reported to inhibit epigenomic marks associated with cancer. Two of the best characterized epigenetic modifications are DNA methylation and histone modification. Epigenetic modifications play an important role in gene expression and regulation, and are involved in numerous cellular processes such as differentiation, development and tumorigenesis. The study of epigenetics on a global level has been made possible only recently through the adaptation of genomic high-throughput assays.


Introduction to epigenetics

The mechanisms governing phenotypic plasticity, or the capacity of a cell to change its state in response to stimuli, have long been the subject of research (Phenotypic plasticity 1). The traditional central dogma of biology states that the DNA of a cell is transcribed to RNA, which is translated to proteins, which perform cellular processes and functions. A paradox exists, however, in that cells exhibit diverse responses to varying stimuli, and cells sharing identical sets of DNA, such as those in a multicellular organism, can have a variety of distinct functions and phenotypes. Classical views have attributed phenotypic variation to differences in primary DNA structure, be it through aberrant mutation or an inherited sequence allele. However, while this explains some aspects of variation, it does not explain how tightly coordinated and regulated cellular responses, such as differentiation, are carried out.

A more likely source of cellular plasticity is the regulation of gene expression, such that while two cells may have near-identical DNA, the differential expression of certain genes results in variation. Research has shown that cells are capable of regulating gene expression at several stages: mRNA transcription, processing and transport, as well as protein translation, post-translational processing and degradation. Regulatory proteins that bind to DNA, RNA, and/or proteins are key effectors in these processes and function by positively or negatively regulating the level and function of specific proteins in a cell. While DNA-binding transcription factors provide a mechanism for specific control of cellular responses, a model in which DNA-binding transcription factors are the sole regulators of gene activity is also unlikely. For example, in a study of somatic-cell nuclear transfer, it was demonstrated that stable features of differentiation remain after the nucleus is transferred to a new cellular environment, suggesting that a stable and heritable mechanism of gene regulation was involved in maintaining the differentiated state in the absence of the DNA-binding transcription factors.

The finding that DNA methylation and histone modifications are stable, heritable, and also reversible processes that influence gene expression without altering DNA primary structure provided a mechanism for the observed variability in cell gene expression. These modifications were termed epigenetic, from epi “on top of” the genetic material “DNA” (Epigenetics 1). The mechanisms governing epigenetic modifications are complex, but with the advent of high-throughput sequencing technology they are now becoming better understood.


Epigenetics

Genomic modifications that alter gene expression, that cannot be attributed to modification of the primary DNA sequence, and that are heritable mitotically and meiotically are classified as epigenetic modifications. DNA methylation and histone modification are among the best characterized epigenetic processes.


DNA methylation

The first epigenetic modification to be characterized in depth was DNA methylation. As its name implies, DNA methylation is the process by which a methyl group is added to DNA. The enzymes responsible for catalyzing this reaction are the DNA methyltransferases (DNMTs). While DNA methylation is stable and heritable, it can be reversed by an antagonistic group of enzymes known as DNA demethylases. In eukaryotes, methylation is most commonly found on the carbon 5 position of cytosine residues (5mC) adjacent to guanine, termed CpG dinucleotides.

DNA methylation patterns vary greatly between species and even within the same organism. The usage of methylation among animals is quite different, with vertebrates exhibiting the highest levels of 5mC and invertebrates more moderate levels. Some organisms, such as Caenorhabditis elegans, have not been demonstrated to have 5mC or a conventional DNA methyltransferase; this suggests that mechanisms other than DNA methylation are also involved. Within an organism, DNA methylation levels can also vary throughout development and by region. For example, in mouse primordial germ cells, a genome-wide demethylation event occurs; by the implantation stage, methylation levels return to their previous somatic values.

When DNA methylation occurs at promoter regions, the sites of transcription initiation, it has the effect of repressing gene expression. This is in contrast to unmethylated promoter regions, which are associated with actively expressed genes. The mechanism by which DNA methylation represses gene expression is a multi-step process. The distinction between methylated and unmethylated cytosine residues is made by specific DNA-binding proteins. Binding of these proteins recruits histone deacetylase (HDAC) enzymes, which initiate chromatin remodeling such that the DNA becomes less accessible to transcriptional machinery, such as RNA polymerase, effectively repressing gene expression.
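
To make the notion of a CpG dinucleotide concrete, the following minimal Python sketch locates every cytosine that is immediately followed by a guanine on one strand of a sequence. The function name and the example sequence are illustrative assumptions, not taken from any particular tool, and the sketch says nothing about whether a given site is actually methylated.

```python
def cpg_positions(seq: str) -> list[int]:
    """Return the 0-based index of each cytosine directly followed by a guanine.

    Only the given strand is scanned; the sequence is treated as plain text
    and upper-cased first so that lower-case input is handled too.
    """
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i] == "C" and seq[i + 1] == "G"]


if __name__ == "__main__":
    example = "ACGTCGGCCG"  # hypothetical sequence, for illustration only
    sites = cpg_positions(example)
    print(f"{len(sites)} CpG sites at positions {sites}")
```

Each reported position is a candidate site for 5mC; an assay is still needed to determine its methylation state.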


Histone modification

In eukaryotes, genomic DNA is coiled into protein-DNA complexes called chromatin. Histones, which are the most prevalent type of protein found in chromatin, function to condense the DNA; the net positive charge on histones facilitates their binding to DNA, which is negatively charged. The basic and repeating units of chromatin, nucleosomes, consist of an octamer of histone proteins (two copies each of H2A, H2B, H3 and H4) and a 146 bp length of DNA wrapped around it. Nucleosomes and the DNA connecting them form a 10 nm diameter chromatin fiber, which can be further condensed. Chromatin packaging of DNA varies depending on the cell cycle stage and by local DNA region. The degree to which chromatin is condensed is associated with a certain transcriptional state. Unpackaged or loose chromatin is more transcriptionally active than tightly packaged chromatin because it is more accessible to transcriptional machinery. By remodeling chromatin structure and changing the density of DNA packaging, gene expression can thus be modulated.

Chromatin remodeling occurs via post-translational modifications of the N-terminal tails of core histone proteins. The collective set of histone modifications in a given cell is known as the histone code. Many different types of histone modification are known, including acetylation, methylation, phosphorylation, ubiquitination, SUMOylation, ADP-ribosylation, deimination and proline isomerization. Acetylation, methylation, phosphorylation and ubiquitination have been implicated in gene activation, whereas methylation, ubiquitination, SUMOylation, deimination and proline isomerization have been implicated in gene repression. Note that several modification types, including methylation, phosphorylation and ubiquitination, can be associated with different transcriptional states depending on the specific amino acid on the histone being modified. Furthermore, the DNA region where histone modification occurs can also elicit different effects; an example is methylation of histone H3 at lysine residue 36 (H3K36). When H3K36 methylation occurs in the coding sections of a gene, it is associated with gene activation, but the opposite is found when it occurs within the promoter region.

Histone modifications regulate gene expression by two mechanisms: by disrupting the contact between nucleosomes and by recruiting chromatin remodeling ATPases. An example of the first mechanism is the acetylation of lysine residues in the histone tails, which is catalyzed by histone acetyltransferases (HATs). HATs are part of a multiprotein complex that is recruited to chromatin when activators bind to DNA binding sites. Acetylation effectively neutralizes the basic charge on lysine, which is involved in stabilizing chromatin through its affinity for negatively charged DNA. Acetylated histones therefore favor the dissociation of nucleosomes, allowing chromatin to unwind. In a loose chromatin state, DNA is more accessible to transcriptional machinery and expression is activated. The process can be reversed through removal of histone acetyl groups by deacetylases.

The second mechanism involves the recruitment of chromatin remodeling complexes by the binding of activator molecules to corresponding enhancer regions. The nucleosome remodeling complexes reposition nucleosomes by several mechanisms, enabling or disabling access of transcriptional machinery to DNA. The SWI/SNF protein complex in yeast is one example of a chromatin remodeling complex that regulates the expression of many genes in this way.


Relation to other genomic fields

Epigenomics shares many commonalities with other genomics fields, both in methodology and in its abstract purpose. Epigenomics seeks to identify and characterize epigenetic modifications on a global level, similar to the study of the complete set of DNA in genomics or the complete set of proteins in a cell in proteomics. The logic behind performing epigenetic analysis on a global level is that inferences can be made about epigenetic modifications that might not otherwise be possible through analysis of specific loci. As in the other genomics fields, epigenomics relies heavily on bioinformatics, which combines the disciplines of biology, mathematics and computer science. Although epigenetic modifications had been known and studied for decades, it is these advances in bioinformatics technology that have allowed analyses on a global scale. Many current techniques still draw on older methods, often adapting them to genomic assays, as described in the next section.


Methods


Histone modification assays

The cellular processes of transcription, DNA replication and DNA repair involve the interaction between genomic DNA and nuclear proteins. It had been known that certain regions within chromatin were extremely susceptible to digestion by DNase I, which cleaves DNA with low sequence specificity. Such hypersensitive sites were thought to be transcriptionally active regions, as evidenced by their association with RNA polymerase and topoisomerases I and II. It is now known that DNase I-sensitive regions correspond to regions of chromatin with loose DNA-histone association. Hypersensitive sites most often represent promoter regions, where DNA must be accessible for DNA-binding transcriptional machinery to function.


ChIP-Chip and ChIP-Seq

Histone modification was first detected on a genome-wide level through the coupling of chromatin immunoprecipitation (ChIP) technology with DNA microarrays, termed ChIP-chip. However, instead of isolating a DNA-binding transcription factor or enhancer protein through chromatin immunoprecipitation, the proteins of interest are the modified histones themselves. First, histones are cross-linked to DNA in vivo through light chemical treatment (e.g., formaldehyde). The cells are next lysed, allowing the chromatin to be extracted and fragmented, either by sonication or by treatment with a non-specific nuclease (e.g., micrococcal nuclease). Modification-specific antibodies, in turn, are used to immunoprecipitate the DNA-histone complexes. Following immunoprecipitation, the DNA is purified from the histones, amplified via PCR and labeled with a fluorescent tag (e.g., Cy5, Cy3). The final step involves hybridization of the labeled DNA, both immunoprecipitated and non-immunoprecipitated, onto a microarray containing immobilized gDNA. Analysis of the relative signal intensity allows the sites of histone modification to be determined.

ChIP-chip was used extensively to characterize the global histone modification patterns of yeast. From these studies, inferences on the function of histone modifications were made: transcriptional activation or repression was associated with certain histone modifications and varied by region. While this method was effective, providing near-full coverage of the yeast epigenome, its use in larger genomes such as that of humans is limited. In order to study histone modifications on a truly genome-wide level, other high-throughput methods were coupled with chromatin immunoprecipitation, namely serial analysis of gene expression (ChIP-SAGE), paired-end ditag sequencing (ChIP-PET) and, more recently, next-generation sequencing (ChIP-Seq). ChIP-Seq follows the same protocol for chromatin immunoprecipitation, but instead of amplification of purified DNA and hybridization to a microarray, the DNA fragments are directly sequenced using next-generation parallel re-sequencing. It has proven to be an effective method for analyzing global histone modification patterns and protein target sites, providing higher resolution than previous methods.
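
The relative signal intensity comparison used in ChIP-chip can be illustrated with a small Python sketch. It assumes each microarray probe has one intensity for the immunoprecipitated channel and one for the non-immunoprecipitated (input) channel, and computes a log2 ratio per probe. The probe names, the pseudocount and the threshold are hypothetical choices made for illustration; real analyses also involve normalization and statistical testing that are not shown here.

```python
import math


def log2_enrichment(ip_signal: dict[str, float],
                    input_signal: dict[str, float],
                    pseudocount: float = 1.0) -> dict[str, float]:
    """Per-probe log2 ratio of immunoprecipitated signal over input signal.

    The pseudocount (an assumption of this sketch) avoids division by zero
    for probes with no input signal.
    """
    return {
        probe: math.log2((ip_signal[probe] + pseudocount)
                         / (input_signal[probe] + pseudocount))
        for probe in ip_signal
    }


def enriched_probes(ip_signal: dict[str, float],
                    input_signal: dict[str, float],
                    threshold: float = 1.0) -> list[str]:
    """Names of probes whose log2 ratio meets or exceeds the threshold."""
    ratios = log2_enrichment(ip_signal, input_signal)
    return sorted(p for p, r in ratios.items() if r >= threshold)


if __name__ == "__main__":
    # Hypothetical intensities for two probes.
    ip = {"probe_a": 7.0, "probe_b": 1.0}
    control = {"probe_a": 1.0, "probe_b": 1.0}
    print(log2_enrichment(ip, control))
    print(enriched_probes(ip, control))
```

Probes with a high ratio mark candidate sites carrying the histone modification recognized by the antibody.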


DNA methylation assays

Techniques for characterizing primary DNA sequences could not be directly applied to methylation assays. For example, when DNA was amplified by PCR or bacterial cloning techniques, the methylation pattern was not copied and the information was therefore lost. The DNA hybridization technique used in DNA assays, in which radioactive probes were used to map and identify DNA sequences, could not be used to distinguish between methylated and non-methylated DNA.


Restriction endonuclease based methods


Non genome-wide approaches

The earliest methylation detection assays used methylation-sensitive restriction endonucleases. Genomic DNA was digested with both methylation-sensitive and methylation-insensitive restriction enzymes recognizing the same restriction site. The idea was that whenever the site was methylated, only the methylation-insensitive enzyme could cleave at that position. By comparing restriction fragment sizes generated by the methylation-sensitive enzyme with those generated by the methylation-insensitive enzyme, it was possible to determine the methylation pattern of the region. This analysis step was done by amplifying the restriction fragments via PCR, separating them through gel electrophoresis and analyzing them via Southern blot with probes for the region of interest.

This technique was used to compare DNA methylation patterns at the human hemoglobin gene loci. Different regions of the gene (gamma delta beta globin) were known to be expressed at different stages of development. Consistent with a role of DNA methylation in gene repression, regions that were associated with high levels of DNA methylation were not actively expressed.

This method was limited and not suitable for studies of the global methylation pattern, or ‘methylome’. Even within specific loci it was not fully representative of the true methylation pattern, as only those restriction sites with corresponding methylation-sensitive and methylation-insensitive restriction assays could provide useful information. Further complications could arise when incomplete digestion of DNA by restriction enzymes generated false negative results.
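
The comparison of fragment sizes can be sketched in Python as follows. As an illustrative assumption, the sketch uses the recognition site CCGG, which is cut by the isoschizomer pair HpaII (blocked when the internal cytosine is methylated) and MspI (not blocked by that methylation); the text above does not name specific enzymes. The sequence and methylated positions are hypothetical, and digestion is idealized as complete.

```python
def fragment_lengths(seq: str,
                     methylated: set[int],
                     sensitive: bool,
                     site: str = "CCGG",
                     cut_offset: int = 1) -> list[int]:
    """Fragment lengths after an idealized, complete digestion of one strand.

    methylated: 0-based indices of methylated cytosines.
    sensitive:  True models an enzyme blocked by methylation of the internal
                cytosine of the site; False models its insensitive partner.
    cut_offset: the cut falls this many bases after the start of the site.
    """
    seq = seq.upper()
    cuts = []
    i = seq.find(site)
    while i != -1:
        internal_c = i + 1  # second base of CCGG
        blocked = sensitive and internal_c in methylated
        if not blocked:
            cuts.append(i + cut_offset)
        i = seq.find(site, i + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    dna = "AACCGGTTTTCCGGAA"   # hypothetical sequence with two CCGG sites
    marks = {11}               # internal cytosine of the second site is methylated
    print("insensitive:", fragment_lengths(dna, marks, sensitive=False))
    print("sensitive:  ", fragment_lengths(dna, marks, sensitive=True))
```

A fragment that appears only in the insensitive digest indicates a site where methylation protected the DNA from the sensitive enzyme.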


Genome wide approaches

DNA methylation profiling on a large scale was first made possible through the Restriction Landmark Genome Scanning (RLGS) technique. Like the locus-specific DNA methylation assay, the technique identified methylated DNA via digestion with methylation-sensitive enzymes. However, it was the use of two-dimensional gel electrophoresis that allowed methylation to be characterized on a broader scale.

It was not until the advent of microarray and next-generation sequencing technology that truly high-resolution, genome-wide DNA methylation profiling became possible. As with RLGS, the endonuclease component is retained in the method, but it is coupled to new technologies. One such approach is differential methylation hybridization (DMH), in which one set of genomic DNA is digested with methylation-sensitive restriction enzymes and a parallel set of DNA is not digested. Both sets of DNA are subsequently amplified, each labelled with a fluorescent dye, and used in two-colour array hybridization. The level of DNA methylation at a given locus is determined by the relative intensity ratio of the two dyes.

Adaptation of next-generation sequencing to DNA methylation assays provides several advantages over array hybridization. Sequence-based technology provides higher resolution for allele-specific DNA methylation, can be performed on larger genomes, and does not require the creation of DNA microarrays, which need adjustments based on CpG density to function properly.


Bisulfite sequencing

Bisulfite sequencing relies on the chemical conversion of unmethylated cytosines exclusively, such that they can be identified through standard DNA sequencing techniques. Sodium bisulfite and alkaline treatment does this by converting unmethylated cytosine residues into uracil while leaving methylated cytosine unaltered. Subsequent amplification and sequencing of untreated DNA and sodium bisulfite-treated DNA allows methylated sites to be identified. Bisulfite sequencing, like the traditional restriction-based methods, was historically limited to the methylation patterns of specific gene loci, until whole genome sequencing technologies became available. However, unlike traditional restriction-based methods, bisulfite sequencing provided resolution at the nucleotide level. Limitations of the bisulfite technique include the incomplete conversion of cytosine to uracil, which is a source of false positives. Further, bisulfite treatment also causes DNA degradation and requires an additional purification step to remove the sodium bisulfite. Next-generation sequencing is well suited to complementing bisulfite sequencing in genome-wide methylation analysis. While this now allows methylation patterns to be determined at the highest possible resolution, the single-nucleotide level, challenges remain in the assembly step because of the reduced sequence complexity of bisulfite-treated DNA. Increases in read length seek to address this challenge, allowing whole genome shotgun bisulfite sequencing (WGBS) to be performed. The WGBS approach, using an Illumina Genome Analyzer platform, has already been implemented in ''Arabidopsis thaliana''. Reduced-representation genomic methods based on bisulfite sequencing exist as well, and they are particularly suitable for species with large genome sizes.
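The comparison of untreated and bisulfite-treated sequence can be sketched as follows. This is a minimal toy model, assuming complete conversion, a single strand, and perfectly aligned reads; the function names and the example sequence are hypothetical. After amplification, uracil is read as thymine, so an unmethylated cytosine appears as T in the treated read.

```python
def bisulfite_convert(seq, methylated_positions):
    """Simulate ideal bisulfite conversion plus amplification on one strand.

    Unmethylated C is read as T; methylated C (0-based positions in
    `methylated_positions`) stays C. Incomplete conversion is not modelled.
    """
    methylated = set(methylated_positions)
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


def call_methylation(untreated, treated):
    """Compare aligned untreated and treated reads position by position.

    Returns {position: True/False} for every cytosine in `untreated`:
    True if it is still C after treatment (called methylated),
    False if it reads as T (called unmethylated).
    """
    if len(untreated) != len(treated):
        raise ValueError("sequences must be aligned and of equal length")
    calls = {}
    for i, (ref, obs) in enumerate(zip(untreated, treated)):
        if ref != "C":
            continue
        if obs == "C":
            calls[i] = True
        elif obs == "T":
            calls[i] = False
        # any other base is ambiguous (e.g. a sequencing error); skip it
    return calls


reference = "ACGTCCGATCG"  # hypothetical sequence, cytosines at 1, 4, 5, 9
treated = bisulfite_convert(reference, methylated_positions={1, 5})
calls = call_methylation(reference, treated)
```

In this toy model, a cytosine that failed to convert would be indistinguishable from a methylated one, which is why incomplete conversion produces false positives.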


Chromatin accessibility assays

Chromatin accessibility is a measure of how "accessible" or "open" a region of the genome is to transcription or to the binding of transcription factors. Inaccessible regions (for example, those bound by nucleosomes) are not actively transcribed by the cell, while open and accessible regions are. Changes in chromatin accessibility are important epigenetic regulatory processes that govern cell- or context-specific expression of genes. Assays such as MNase-seq, DNase-seq, ATAC-seq and FAIRE-seq are routinely used to map the accessible chromatin landscape of cells. The main feature of all these methods is that they selectively isolate either the DNA sequences that are bound to histones or those that are not. These sequences are then compared to a reference genome, which allows their relative positions to be identified.

MNase-seq and DNase-seq follow the same principle: both employ lytic enzymes that target nucleic acids to cut the DNA not bound by nucleosomes or other protein factors, while the bound pieces are sheltered and can be retrieved and analysed. Since active, unbound regions are destroyed, they can only be detected indirectly, by sequencing with a next-generation sequencing technique and comparing with a reference. MNase-seq uses a micrococcal nuclease that produces a single-strand cleavage on the opposite strand of the target sequence. DNase-seq employs DNase I, a non-specific double-strand-cleaving endonuclease. This technique has been used to such an extent that nucleosome-free regions have been labelled DNase I hypersensitive sites (DHSs), and it has been the ENCODE consortium's method of choice for genome-wide chromatin accessibility analyses. The main issue with this technique is that the cleavage distribution can be biased, lowering the quality of the results.

FAIRE-seq (Formaldehyde-Assisted Isolation of Regulatory Elements) requires, as its first step, crosslinking of the DNA with nucleosomes, followed by DNA shearing by sonication. The free and linked fragments are separated with a traditional phenol-chloroform extraction, since the protein fraction is stuck in the interphase while the unlinked DNA shifts to the aqueous phase and can be analysed with various methods. Sonication produces random breaks and is therefore not subject to the cleavage bias of the enzymatic methods; the greater length of the fragments (200-700 nt) also makes this technique suitable for wider regions, although it cannot resolve a single nucleosome. Unlike the nuclease-based methods, FAIRE-seq allows the direct identification of transcriptionally active sites and requires less laborious sample preparation.

ATAC-seq is based on the activity of Tn5 transposase. The transposase is used to insert tags into the genome, with higher frequency in regions not covered by protein factors. The tags are then used as adapters for PCR or other analytical tools.
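To make the idea of "higher insertion frequency in open regions" concrete, the sketch below bins hypothetical Tn5 insertion coordinates into fixed-size windows and flags windows whose count reaches a threshold. The bin size, threshold, and coordinates are illustrative assumptions; real peak callers model background and sequencing depth rather than applying a fixed cut-off.

```python
from collections import Counter


def bin_insertions(positions, bin_size):
    """Count insertion sites per fixed-size genomic bin (bin index -> count)."""
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    return dict(Counter(pos // bin_size for pos in positions))


def accessible_bins(counts, min_count):
    """Return sorted bin indices whose insertion count is at least min_count."""
    return sorted(b for b, n in counts.items() if n >= min_count)


# Hypothetical insertion coordinates on a single chromosome
insertions = [105, 110, 118, 122, 131, 560, 1210, 1215, 1222, 1240]
counts = bin_insertions(insertions, bin_size=100)
open_bins = accessible_bins(counts, min_count=4)
```

Here the windows starting at 100 and 1200 collect most insertions and would be treated as candidate accessible regions, while the isolated insertion at 560 would not.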


Direct detection

Polymerase sensitivity in single-molecule real-time sequencing made it possible for scientists to directly detect epigenetic marks such as methylation as the polymerase moves along the DNA molecule being sequenced. Several projects have demonstrated the ability to collect genome-wide epigenetic data in bacteria.

Nanopore sequencing is based on changes in electrolytic current signals caused by base modifications (e.g. methylation). A polymerase mediates the entry of ssDNA into the pore; the ion current is modulated by a section of the pore, and the resulting difference is recorded, revealing the position of CpG sites. Discrimination between hydroxymethylation and methylation is possible with solid-state nanopores, even though the current passing through the high-field region of the pore may be slightly affected. Amplified DNA is used as a reference, since it does not retain methylated sites after PCR. The Oxford Nanopore Technologies MinION sequencer is a technology in which, using a hidden Markov model, unmethylated cytosine can be distinguished from methylated cytosine even without chemical treatment to enhance the signal of the modification. The data are commonly recorded in picoamperes over a set time. Other tools are Nanopolish and SignalAlign: the former expresses the frequency of methylation in a read, while the latter gives a probability of methylation derived from the sum of all the reads.

Single-molecule real-time (SMRT) sequencing is a single-molecule DNA sequencing method that uses a zero-mode waveguide (ZMW). A single DNA polymerase enzyme is bound to the bottom of a ZMW with a single molecule of DNA as a template. Each of the four DNA bases is attached to one of four different fluorescent dyes. When a nucleotide is incorporated by the DNA polymerase, the fluorescent tag is cleaved off and the detector registers the fluorescent signal of the nucleotide incorporation. As sequencing proceeds, the kinetics of the polymerase shift when it encounters a region of methylation or any other base modification: at chemically modified bases, the enzyme slows down or speeds up in a uniquely identifiable way. Fluorescence pulses in SMRT sequencing are characterized not only by their emission spectra but also by their duration and by the interval between successive pulses. These metrics, defined as pulse width and interpulse duration (IPD), add valuable information about DNA polymerase kinetics. Pulse width is a function of all kinetic steps after nucleotide binding and up to fluorophore release, and IPD is determined by the kinetics of nucleotide binding and polymerase translocation.

In 2010, a team of scientists demonstrated the use of single-molecule real-time sequencing for direct detection of modified nucleotides in the DNA template, including N6-methyladenosine, 5-methylcytosine and 5-hydroxymethylcytosine. These modifications affect polymerase kinetics differently, allowing discrimination between them. In 2017, another team proposed combining bisulfite conversion with third-generation single-molecule real-time sequencing. The resulting method, called single-molecule real-time bisulfite sequencing (SMRT-BS), is an accurate targeted CpG methylation analysis method capable of a high degree of multiplexing and long read lengths (1.5 kb) without the need for PCR amplicon sub-cloning.
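One way to picture how interpulse duration can reveal a modified base is to compare, position by position, the mean IPD observed on native DNA with that observed on an unmodified control. The sketch below is a simplified illustration under that assumption; the function names, the threshold of 2.0, and the IPD values (in seconds) are hypothetical and do not reproduce any vendor's analysis software.

```python
from statistics import mean


def ipd_ratios(native_ipds, control_ipds):
    """Per-position ratio of mean native IPD to mean control IPD.

    Both arguments map template position -> list of observed IPDs.
    Positions missing from either input, or with a zero control mean,
    are skipped.
    """
    ratios = {}
    for pos, native in native_ipds.items():
        control = control_ipds.get(pos)
        if not native or not control:
            continue
        control_mean = mean(control)
        if control_mean == 0:
            continue
        ratios[pos] = mean(native) / control_mean
    return ratios


def flag_modified(ratios, threshold=2.0):
    """Return sorted positions whose IPD ratio meets the (assumed) threshold."""
    return sorted(pos for pos, r in ratios.items() if r >= threshold)


# Hypothetical IPD observations from several reads covering positions 10-12
native = {10: [0.4, 0.5, 0.6], 11: [1.8, 2.2, 2.0], 12: [0.5, 0.5]}
control = {10: [0.5, 0.5, 0.5], 11: [0.5, 0.4, 0.6], 12: [0.5, 0.5]}

ratios = ipd_ratios(native, control)
candidates = flag_modified(ratios)
```

Because different modifications shift the kinetics by different amounts, a real analysis would model the size and sequence context of the shift rather than rely on a single cut-off.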


Theoretical modeling approaches

The first mathematical models of different nucleosome states affecting gene expression were introduced in the 1980s. The idea was later almost forgotten, until experimental evidence indicated a possible role of covalent histone modifications as an epigenetic code. Over the next several years, high-throughput data uncovered the abundance of epigenetic modifications and their relation to chromatin function, which motivated new theoretical models for the appearance, maintenance and change of these patterns. These models are usually formulated in the framework of one-dimensional lattice approaches.


See also

* Epigenetics
* Epigenetic clock
* Genomics
* Human Epigenome Project
* Epigenomics AG
* Single cell epigenomics

