Detrended correspondence analysis (DCA) is a multivariate statistical technique widely used by ecologists to find the main factors or gradients in the large, species-rich but usually sparse data matrices that typify ecological community data. DCA is frequently used to suppress artifacts inherent in most other multivariate analyses when applied to gradient data.
History
DCA, a correspondence analysis method, was created in 1979 by Mark Hill of the United Kingdom's Institute for Terrestrial Ecology (now merged into the Centre for Ecology and Hydrology) and implemented in a FORTRAN code package called DECORANA (Detrended Correspondence Analysis). DCA is sometimes erroneously referred to as DECORANA; however, DCA is the underlying algorithm, while DECORANA is a tool implementing it.
Issues addressed
According to Hill and Gauch, DCA suppresses two artifacts inherent in most other multivariate analyses when applied to gradient data. An example is a time-series of plant species colonising a new habitat; early successional species are replaced by mid-successional species, then by late successional ones (see example below). When such data are analysed by a standard ordination such as a correspondence analysis:
* the ordination scores of the samples will exhibit the 'edge effect', i.e. the variance of the scores at the beginning and the end of a regular succession of species will be considerably smaller than that in the middle,
* when presented as a graph the points will be seen to follow a horseshoe-shaped curve rather than a straight line ('arch effect'), even though the process under analysis is a steady and continuous change that human intuition would prefer to see as a linear trend.
Outside ecology, the same artifacts occur when gradient data are analysed (e.g. soil properties along a transect running between two different geologies, or behavioural data over the lifespan of an individual), because the curved projection is an accurate representation of the shape of the data in multivariate space.
Ter Braak and Prentice (1987, p. 121) cite a simulation study of two-dimensional species packing models in which DCA performed better than CA.
Method
DCA is an iterative algorithm that has shown itself to be a highly reliable and useful tool for data exploration and summary in community ecology (Shaw 2003). It starts by running a standard ordination (CA or reciprocal averaging) on the data, which produces the initial horseshoe curve in which the first ordination axis distorts into the second axis. It then divides the first axis into segments (default = 26) and rescales each segment to have a mean value of zero on the second axis, which effectively squashes the curve flat. It also rescales the axis so that the ends are no longer compressed relative to the middle, so that one DCA unit approximates the same rate of turnover all the way through the data. The rule of thumb is that 4 DCA units mean that there has been a total turnover in the community.
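The segment step can be illustrated with a minimal sketch. It is a simplified, one-pass illustration of detrending by segments applied to a synthetic arch, not the DECORANA procedure itself, which is more involved. The function name `detrend_by_segments` and the synthetic data are assumptions made for this example only.

```python
def detrend_by_segments(axis1, axis2, n_segments=26):
    """Subtract, within each segment of axis 1, the mean axis-2 score.

    Simplified illustration only: a single pass over fixed-width segments.
    """
    lo, hi = min(axis1), max(axis1)
    # Fall back to a single effective segment if all axis-1 scores are equal.
    width = (hi - lo) / n_segments or 1.0

    def segment(value):
        # Clamp so that the maximum score falls in the last segment.
        return min(int((value - lo) / width), n_segments - 1)

    sums = [0.0] * n_segments
    counts = [0] * n_segments
    for a, b in zip(axis1, axis2):
        k = segment(a)
        sums[k] += b
        counts[k] += 1

    # Each point loses the mean axis-2 score of its own segment.
    return [b - sums[segment(a)] / counts[segment(a)]
            for a, b in zip(axis1, axis2)]


# Synthetic arch: axis 2 is a quadratic function of axis 1.
arch_x = [i / 259 for i in range(260)]
arch_y = [(a - 0.5) ** 2 for a in arch_x]
flat = detrend_by_segments(arch_x, arch_y)

print("largest |axis 2| before:", max(abs(v) for v in arch_y))
print("largest |axis 2| after: ", max(abs(v) for v in flat))
```

After detrending, every segment has a mean of zero on the second axis, so only the small within-segment variation remains.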
Ter Braak and Prentice (1987, p. 122) warn against the non-linear rescaling of the axes due to robustness issues and recommend using detrending-by-polynomials only.
Drawbacks
No significance tests are available with DCA, although there is a constrained (canonical) version called DCCA in which the axes are forced by multiple linear regression to correlate optimally with a linear combination of other (usually environmental) variables; this allows testing of a null model by Monte Carlo permutation analysis.
Example
The example shows an ideal data set: the species data are in rows and the samples in columns. For each sample along the gradient, a new species is introduced and another species is no longer present. The result is a sparse matrix in which ones indicate the presence of a species in a sample. Except at the edges, each sample contains five species.
The plot of the first two axes of the correspondence analysis result on the right hand side clearly shows the disadvantages of this procedure: the edge effect, i.e. the points are clustered at the edges of the first axis, and the arch effect.
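A data set of this kind can be generated and ordinated in a few lines. The sketch below builds such a presence/absence matrix and computes the first two correspondence analysis axes for the samples by reciprocal averaging, written as a plain power iteration. The matrix size (20 species), the helper names `ideal_gradient` and `ra_axis`, and the fixed iteration count are assumptions chosen for illustration; this is not the DECORANA code.

```python
def ideal_gradient(n_species=20, width=5):
    """Species in rows, samples in columns; species i occurs in
    samples i .. i + width - 1, so each new sample gains one species
    and loses another."""
    n_samples = n_species + width - 1
    return [[1 if i <= j < i + width else 0 for j in range(n_samples)]
            for i in range(n_species)]


def ra_axis(Y, start, previous=(), n_iter=500):
    """Sample scores on one CA axis by reciprocal averaging."""
    n_sp, n_sa = len(Y), len(Y[0])
    sp_tot = [sum(row) for row in Y]
    sa_tot = [sum(Y[i][j] for i in range(n_sp)) for j in range(n_sa)]
    total = sum(sa_tot)

    def dot(a, b):
        # Inner product weighted by sample totals.
        return sum(w * p * q for w, p, q in zip(sa_tot, a, b))

    x = list(start)
    for _ in range(n_iter):
        # Species score = mean score of the samples containing the species.
        u = [sum(Y[i][j] * x[j] for j in range(n_sa)) / sp_tot[i]
             for i in range(n_sp)]
        # Sample score = mean score of the species present in the sample.
        x = [sum(Y[i][j] * u[i] for i in range(n_sp)) / sa_tot[j]
             for j in range(n_sa)]
        # Remove the trivial constant solution (weighted centring).
        mean = sum(w * v for w, v in zip(sa_tot, x)) / total
        x = [v - mean for v in x]
        # Remove earlier axes so the iteration converges to the next one.
        for p in previous:
            c = dot(x, p) / dot(p, p)
            x = [v - c * q for v, q in zip(x, p)]
        # Rescale to unit weighted variance.
        norm = (dot(x, x) / total) ** 0.5
        x = [v / norm for v in x]
    return x


Y = ideal_gradient()
n = len(Y[0])
axis1 = ra_axis(Y, start=[float(j) for j in range(n)])
axis2 = ra_axis(Y, start=[(j - (n - 1) / 2) ** 2 for j in range(n)],
                previous=[axis1])

# Spacing between neighbouring samples along axis 1.
gaps = [b - a for a, b in zip(axis1, axis1[1:])]
print("gap at start, middle, end:", gaps[0], gaps[len(gaps) // 2], gaps[-1])
print("axis 2 at start, middle, end:", axis2[0], axis2[n // 2], axis2[-1])
```

The spacing between neighbouring samples on the first axis is smaller at both ends than in the middle (edge effect), and the second axis has one sign at both ends and the opposite sign in the middle (arch effect), although the underlying gradient is perfectly regular.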
Software
An open source implementation of DCA, based on the original FORTRAN code, is available in the vegan package for R.
See also
* Eigenanalysis
* Ordination (statistics)
* Seriation (archaeology), including additional examples of the arch effect
* Principal Component Analysis
References
* Hill, M.O. (1979). ''DECORANA — A FORTRAN program for Detrended Correspondence Analysis and Reciprocal Averaging''. Section of Ecology and Systematics, Cornell University, Ithaca, New York, 52pp.
* Hill, M.O. and Gauch, H.G. (1980). Detrended Correspondence Analysis: An Improved Ordination Technique. ''Vegetatio'' 42, 47–58.
* Oksanen, J. and Minchin, P.R. (1997). Instability of ordination results under changes in input data order: explanation and remedies. ''Journal of Vegetation Science'' 8, 447–454.
* Shaw, P.J.A. (2003). ''Multivariate Statistics for the Environmental Sciences''. London: Hodder Arnold.
* Ter Braak, C.J.F. and Prentice, I.C. (1988). A Theory of Gradient Analysis. ''Advances in Ecological Research'' 18, 271–371. ISBN 0-12-013918-9. Reprinted in: Ter Braak, C.J.F. (1987). ''Unimodal models to relate species to environment''. Wageningen: PhD thesis Agricultural Mathematics Group, 101–146.
External links
* PAST (PAlaeontological STatistics): free software including DCA with modifications according to Oksanen and Minchin (1997)
* WINBASP: free software including DCA with detrending-by-polynomials according to Ter Braak and Prentice (1988)
* vegan: Community Ecology Package for R, free software including the function decorana: Detrended Correspondence Analysis and Basic Reciprocal Averaging, from Hill and Gauch (1980)