Bayesian Tool For Methylation Analysis

Bayesian tool for methylation analysis, also known as BATMAN, is a statistical tool for analysing methylated DNA immunoprecipitation (MeDIP) profiles. It can be applied to large datasets generated using either oligonucleotide arrays (MeDIP-chip) or next-generation sequencing (MeDIP-seq), providing a quantitative estimate of the absolute methylation state in a region of interest.


Theory

MeDIP (methylated DNA immunoprecipitation) is an experimental technique used to assess DNA methylation levels by using an antibody to isolate methylated DNA sequences. The isolated DNA fragments are either hybridized to a microarray chip (MeDIP-chip) or sequenced by next-generation sequencing (MeDIP-seq). While this shows which areas of the genome are methylated, it does not give absolute methylation levels. Imagine two different genomic regions, ''A'' and ''B''. Region ''A'' has six CpGs (DNA methylation in mammalian somatic cells generally occurs at CpG dinucleotides), three of which are methylated. Region ''B'' has three CpGs, all of which are methylated. As the antibody simply recognizes methylated DNA, it will bind both regions equally, and subsequent steps will therefore show equal signals for the two regions. This does not give the full picture of methylation in these regions: in region ''A'' only half the CpGs are methylated, whereas in region ''B'' all the CpGs are methylated. Therefore, to get the full picture of methylation for a given region, the signal from the MeDIP experiment has to be normalized to the number of CpGs in the region, and this is what the Batman algorithm does. Analysing the MeDIP signal of the above example would give Batman scores of 0.5 for region ''A'' (i.e. the region is 50% methylated) and 1 for region ''B'' (i.e. the region is 100% methylated). In this way Batman converts the signals from MeDIP experiments to absolute methylation levels.
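The normalization idea behind the two-region example can be illustrated with a toy calculation. This is only a sketch of the intuition, not the Batman algorithm itself: it assumes the MeDIP signal is exactly proportional to the number of methylated CpGs, and the function name `normalize_signal` and the parameter `signal_per_methylated_cpg` are hypothetical.

```python
def normalize_signal(signal, n_cpgs, signal_per_methylated_cpg=1.0):
    """Toy conversion of a MeDIP signal into a methylation fraction.

    Assumes (for illustration only) that each methylated CpG contributes
    the same fixed amount of signal.
    """
    if n_cpgs <= 0:
        raise ValueError("region must contain at least one CpG")
    return signal / (signal_per_methylated_cpg * n_cpgs)


# Regions A and B both carry three methylated CpGs, so both give signal 3.0.
region_a = normalize_signal(signal=3.0, n_cpgs=6)  # six CpGs in total
region_b = normalize_signal(signal=3.0, n_cpgs=3)  # three CpGs in total
print(region_a, region_b)  # 0.5 1.0
```

The equal raw signals become 0.5 and 1.0 once divided by the CpG count, matching the scores quoted for regions ''A'' and ''B''.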


Development of Batman

The core principle of the Batman algorithm is to model the effect of varying CpG dinucleotide density on the MeDIP enrichment of DNA fragments.

The basic assumptions of Batman:
# Almost all DNA methylation in mammals happens at CpG dinucleotides.
# Most CpG-poor regions are constitutively methylated, while most CpG-rich regions (CpG islands) are constitutively unmethylated.
# There are no fragment biases in the MeDIP experiment (the approximate range of DNA fragment sizes is 400–700 bp).
# The errors on the microarray are normally distributed with precision ''ν''.
# Only methylated CpGs contribute to the observed signal.
# CpG methylation state is generally highly correlated over hundreds of bases, so CpGs grouped together in 50- or 100-bp windows are assumed to have the same methylation state.

Basic parameters in Batman:
# C_{cp}: the coupling factor between probe ''p'' and CpG dinucleotide ''c'', defined as the fraction of DNA molecules hybridizing to probe ''p'' that contain the CpG ''c''.
# C_{tot}: the total CpG influence parameter, defined as the sum of the coupling factors for a given probe; it provides a measure of local CpG density.
# m_c: the methylation status at position ''c'', which represents the fraction of chromosomes in the sample on which that position is methylated. m_c is treated as a continuous variable, since the majority of samples used in MeDIP studies contain multiple cell types.

Based on these assumptions, the signal from the MeDIP channel of a MeDIP-chip or MeDIP-seq experiment depends on the degree of enrichment of DNA fragments overlapping that probe, which in turn depends on the amount of antibody binding, and thus on the number of methylated CpGs on those fragments. In the Batman model, the complete dataset from a MeDIP-chip experiment, ''A'', can be represented by a statistical model in the form of the following probability distribution:

: f(A \mid m) = \prod_p \phi\left(A_p \mid A_\text{base} + r \sum_c C_{cp} m_c,\ \nu^{-1}\right)

where \phi(x \mid \mu, \sigma^2) is a Gaussian probability density function with mean ''μ'' and variance ''σ''^2. Standard Bayesian techniques can be used to infer f(m \mid A), that is, the distribution of likely methylation states given one or more sets of MeDIP-chip/MeDIP-seq outputs. To solve this inference problem, Batman uses nested sampling (http://www.inference.phy.cam.ac.uk/bayesys/) to generate 100 independent samples from f(m \mid A) for each tiled region of the genome, then summarizes the most likely methylation state in 100-bp windows by fitting beta distributions to these samples. The modes of the most likely beta distributions are used as the final methylation calls.
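To make the probability model concrete, the following minimal sketch evaluates the logarithm of f(A | m) for a small made-up example. It is not Batman's own code: the function names, the layout of the coupling table as `coupling[p][c]`, and all numerical values are assumptions for illustration, and the baseline, the scale factor r and the precision are simply passed in as known constants. The nested sampling step itself is not shown.

```python
import math


def gaussian_log_pdf(x, mean, variance):
    """Log of the Gaussian density phi(x | mean, variance)."""
    return -0.5 * (math.log(2.0 * math.pi * variance) + (x - mean) ** 2 / variance)


def batman_log_likelihood(signals, coupling, methylation, a_base, r, precision):
    """Sum over probes p of log phi(A_p | A_base + r * sum_c C_cp * m_c, 1/precision).

    signals[p]      observed signal A_p for probe p
    coupling[p][c]  coupling factor C_cp between probe p and CpG c
    methylation[c]  methylation fraction m_c in [0, 1]
    """
    variance = 1.0 / precision
    log_f = 0.0
    for a_p, couplings_for_probe in zip(signals, coupling):
        expected = a_base + r * sum(
            c_cp * m_c for c_cp, m_c in zip(couplings_for_probe, methylation)
        )
        log_f += gaussian_log_pdf(a_p, expected, variance)
    return log_f


# Hypothetical data: two probes, three CpGs (all numbers are made up).
coupling = [[0.5, 0.5, 0.0], [0.0, 0.5, 1.0]]
signals = [2.0, 1.0]
m_good = [1.0, 1.0, 0.0]  # reproduces the observed signals exactly
m_bad = [0.0, 0.0, 1.0]   # predicts signals of 0.0 and 2.0 instead

ll_good = batman_log_likelihood(signals, coupling, m_good, a_base=0.0, r=2.0, precision=4.0)
ll_bad = batman_log_likelihood(signals, coupling, m_bad, a_base=0.0, r=2.0, precision=4.0)
print(ll_good > ll_bad)  # True
```

A sampler such as nested sampling would call a function of this kind repeatedly, combining it with a prior on m to draw samples from the posterior f(m | A).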


Limitations

It may be useful to take the following points into account when considering using Batman:
# Batman is not a piece of software; it is an algorithm run from the command prompt. As such, it is not especially user-friendly, and using it is a computationally technical process.
# Because it is non-commercial, there is very little support for Batman beyond what is in the manual.
# It is quite time-consuming (it can take several days to analyse one chromosome). (Note: in one government lab, running Batman on a set of 100 Agilent Human DNA Methylation Arrays (about 250,000 probes per array) took less than an hour in Agilent's Genomic Workbench software, on a computer with a 2 GHz processor, 24 GB of RAM and 64-bit Windows 7.)
# Copy number variation (CNV) has to be accounted for. For example, the score for a region with a CNV value of 1.6 in a cancer (a loss of 0.4 compared to normal) would have to be multiplied by 1.25 (= 2/1.6) to compensate for the loss.
# One of the basic assumptions of Batman is that almost all DNA methylation occurs at CpG dinucleotides. While this is generally the case for vertebrate somatic cells, there are situations where there is widespread non-CpG methylation, such as in plant cells and embryonic stem cells.
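The copy-number correction described in the list above is a simple rescaling, sketched below. The function name `cnv_corrected_score` and the example score of 0.4 are hypothetical; the sketch assumes a normal copy number of 2 and applies only the multiplicative factor stated in the text.

```python
def cnv_corrected_score(score, copy_number, normal_copy_number=2.0):
    """Rescale a score by normal_copy_number / copy_number."""
    if copy_number <= 0:
        raise ValueError("copy number must be positive")
    return score * (normal_copy_number / copy_number)


factor = 2.0 / 1.6  # the 1.25 multiplier from the example in the text
corrected = cnv_corrected_score(score=0.4, copy_number=1.6)
print(round(factor, 2), round(corrected, 3))
```

A region with the normal copy number of 2 is left unchanged, since the factor is then 1.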

