Volcano Plot (statistics)
   HOME

TheInfoList



OR:

In statistics, a volcano plot is a type of scatter-plot that is used to quickly identify changes in large data sets composed of replicate data. It plots significance versus fold-change on the y and x axes, respectively. These plots are increasingly common in omic experiments such as
genomics Genomics is an interdisciplinary field of biology focusing on the structure, function, evolution, mapping, and editing of genomes. A genome is an organism's complete set of DNA, including all of its genes as well as its hierarchical, three-dim ...
,
proteomics Proteomics is the large-scale study of proteins. Proteins are vital parts of living organisms, with many functions such as the formation of structural fibers of muscle tissue, enzymatic digestion of food, or synthesis and replication of DNA. In ...
, and
metabolomics Metabolomics is the scientific study of chemical processes involving metabolites, the small molecule substrates, intermediates, and products of cell metabolism. Specifically, metabolomics is the "systematic study of the unique chemical fingerprin ...
where one often has a list of many thousands of replicate data points between two conditions and one wishes to quickly identify the most meaningful changes. A volcano plot combines a measure of statistical significance from a statistical test (e.g., a
p value In null-hypothesis significance testing, the ''p''-value is the probability of obtaining test results at least as extreme as the result actually observed, under the assumption that the null hypothesis is correct. A very small ''p''-value means ...
from an
ANOVA Analysis of variance (ANOVA) is a collection of statistical models and their associated estimation procedures (such as the "variation" among and between groups) used to analyze the differences among means. ANOVA was developed by the statistician ...
model) with the magnitude of the change, enabling quick visual identification of those data-points (genes, etc.) that display large magnitude changes that are also
statistically significant In statistical hypothesis testing, a result has statistical significance when it is very unlikely to have occurred given the null hypothesis (simply by chance alone). More precisely, a study's defined significance level, denoted by \alpha, is the p ...
. A volcano plot is constructed by plotting the negative logarithm of the
p value In null-hypothesis significance testing, the ''p''-value is the probability of obtaining test results at least as extreme as the result actually observed, under the assumption that the null hypothesis is correct. A very small ''p''-value means ...
on the
y axis A Cartesian coordinate system (, ) in a plane is a coordinate system that specifies each point uniquely by a pair of numerical coordinates, which are the signed distances to the point from two fixed perpendicular oriented lines, measured in ...
(usually base 10). This results in data points with low p values (highly significant) appearing toward the top of the plot. The
x axis A Cartesian coordinate system (, ) in a plane is a coordinate system that specifies each point uniquely by a pair of numerical coordinates, which are the signed distances to the point from two fixed perpendicular oriented lines, measured in t ...
is the logarithm of the
fold change Fold change is a measure describing how much a quantity changes between an original and a subsequent measurement. It is defined as the ratio between the two quantities; for quantities ''A'' and ''B'' the fold change of ''B'' with respect to ''A'' ...
between the two conditions. The logarithm of the fold change is used so that changes in both directions appear equidistant from the center. Plotting points in this way results in two regions of interest in the plot: those points that are found toward the top of the plot that are far to either the left- or right-hand sides. These represent values that display large magnitude fold changes (hence being left or right of center) as well as high
statistical significance In statistical hypothesis testing, a result has statistical significance when it is very unlikely to have occurred given the null hypothesis (simply by chance alone). More precisely, a study's defined significance level, denoted by \alpha, is the p ...
(hence being toward the top). Additional information can be added by coloring the points according to a third dimension of data (such as signal intensity), but this is not uniformly employed. Volcano plots are also used to graphically display a
significance analysis of microarrays Microarray analysis techniques are used in interpreting the data generated from experiments on DNA (Gene chip analysis), RNA, and protein microarrays, which allow researchers to investigate the expression state of a large number of genes - in many ...
(SAM) gene selection criterion, an example of
regularization Regularization may refer to: * Regularization (linguistics) * Regularization (mathematics) * Regularization (physics) * Regularization (solid modeling) * Regularization Law, an Israeli law intended to retroactively legalize settlements See also ...
. The concept of volcano plot can be generalized to other applications, where the
x axis A Cartesian coordinate system (, ) in a plane is a coordinate system that specifies each point uniquely by a pair of numerical coordinates, which are the signed distances to the point from two fixed perpendicular oriented lines, measured in t ...
is related to a measure of the strength of a statistical signal, and
y axis A Cartesian coordinate system (, ) in a plane is a coordinate system that specifies each point uniquely by a pair of numerical coordinates, which are the signed distances to the point from two fixed perpendicular oriented lines, measured in ...
is related to a measure of the
statistical significance In statistical hypothesis testing, a result has statistical significance when it is very unlikely to have occurred given the null hypothesis (simply by chance alone). More precisely, a study's defined significance level, denoted by \alpha, is the p ...
of the signal. For example, in a
genetic association Genetic association is when one or more genotypes within a population co-occur with a phenotypic trait more often than would be expected by chance occurrence. Studies of genetic association aim to test whether single-locus alleles or genotype fre ...
case-control study, such as
Genome-wide association study In genomics, a genome-wide association study (GWA study, or GWAS), also known as whole genome association study (WGA study, or WGAS), is an observational study of a genome-wide set of Single-nucleotide polymorphism, genetic variants in different i ...
, a point in a volcano plot represents a
single-nucleotide polymorphism In genetics, a single-nucleotide polymorphism (SNP ; plural SNPs ) is a germline substitution of a single nucleotide at a specific position in the genome. Although certain definitions require the substitution to be present in a sufficiently lar ...
. Its x value can be the logarithm of the
odds ratio An odds ratio (OR) is a statistic that quantifies the strength of the association between two events, A and B. The odds ratio is defined as the ratio of the odds of A in the presence of B and the odds of A in the absence of B, or equivalently (due ...
and its y value can be -log10 of the
p value In null-hypothesis significance testing, the ''p''-value is the probability of obtaining test results at least as extreme as the result actually observed, under the assumption that the null hypothesis is correct. A very small ''p''-value means ...
from a
Chi-square test A chi-squared test (also chi-square or test) is a statistical hypothesis test used in the analysis of contingency tables when the sample sizes are large. In simpler terms, this test is primarily used to examine whether two categorical variables ...
or a Chi-square
test statistic A test statistic is a statistic (a quantity derived from the sample) used in statistical hypothesis testing.Berger, R. L.; Casella, G. (2001). ''Statistical Inference'', Duxbury Press, Second Edition (p.374) A hypothesis test is typically specif ...
. Volcano plots show a characteristic upwards two arm shape because the x axis, i.e. the underlying log2-fold changes, are generally
normal distribution In statistics, a normal distribution or Gaussian distribution is a type of continuous probability distribution for a real-valued random variable. The general form of its probability density function is : f(x) = \frac e^ The parameter \mu ...
whereas the y axis, the log10-p values, tend toward greater significance for fold-changes that deviate more strongly from zero. The density of the normal distribution takes the form : y = e^. So the \ln of that is :\ln(y) = -x^2 and the negative \ln is :-\ln(y) = x^2 which is a parabola whose arms reach upwards on the left and right sides. The upper bound of the data is one parabola and the lower bound is another parabola.


References


External links


NCI Documentation describing statistical methods to analyze microarrays, including volcano plots


{{DEFAULTSORT:Volcano Plot (Statistics) Bioinformatics Statistical charts and diagrams