Cophenetic Correlation
   HOME

TheInfoList



OR:

In
statistics Statistics (from German language, German: ''wikt:Statistik#German, Statistik'', "description of a State (polity), state, a country") is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of ...
, and especially in
biostatistics Biostatistics (also known as biometry) are the development and application of statistical methods to a wide range of topics in biology. It encompasses the design of biological experiments, the collection and analysis of data from those experime ...
, cophenetic correlation (more precisely, the cophenetic correlation coefficient) is a measure of how faithfully a
dendrogram A dendrogram is a diagram representing a tree. This diagrammatic representation is frequently used in different contexts: * in hierarchical clustering, it illustrates the arrangement of the clusters produced by the corresponding analyses. ...
preserves the pairwise distances between the original unmodeled data points. Although it has been most widely applied in the field of biostatistics (typically to assess cluster-based models of DNA sequences, or other
taxonomic Taxonomy is the practice and science of categorization or classification. A taxonomy (or taxonomical classification) is a scheme of classification, especially a hierarchical classification, in which things are organized into groups or types. ...
models), it can also be used in other fields of inquiry where raw data tend to occur in clumps, or clusters. This coefficient has also been proposed for use as a test for nested clusters.


Calculating the cophenetic correlation coefficient

Suppose that the original data have been modeled using a cluster method to produce a dendrogram ; that is, a simplified model in which data that are "close" have been grouped into a hierarchical tree. Define the following distance measures. * x(i,j) = , X_i-X_j, , the Euclidean distance between the ''i''th and ''j''th observations. * t(i,j), the dendrogrammatic distance between the model points T_i and T_j. This distance is the height of the node at which these two points are first joined together. Then, letting \bar be the average of the ''x''(''i'', ''j''), and letting \bar be the average of the ''t''(''i'', ''j''), the cophenetic correlation coefficient ''c'' is given by : c = \frac .


Software implementation

It is possible to calculate the cophenetic correlation in R using the dendextend R package. In
Python Python may refer to: Snakes * Pythonidae, a family of nonvenomous snakes found in Africa, Asia, and Australia ** ''Python'' (genus), a genus of Pythonidae found in Africa and Asia * Python (mythology), a mythical serpent Computing * Python (pro ...
, the
SciPy SciPy (pronounced "sigh pie") is a free and open-source Python library used for scientific computing and technical computing. SciPy contains modules for optimization, linear algebra, integration, interpolation, special functions, FFT, signal ...
package also has an implementation. In
MATLAB MATLAB (an abbreviation of "MATrix LABoratory") is a proprietary multi-paradigm programming language and numeric computing environment developed by MathWorks. MATLAB allows matrix manipulations, plotting of functions and data, implementation ...
, the Statistic and Machine Learning toolbox contains an implementation.


See also

*
Cophenetic In the clustering of biological information such as data from microarray experiments, the cophenetic similarity or cophenetic distanceSokal, R. R. and F. J. Rohlf. 1962. The comparison of dendrograms by objective methods. Taxon, 11:33-40 of two ob ...


References


External links


Numerical example of cophenetic correlation

Computing and displaying Cophenetic distances
{{DEFAULTSORT:Cophenetic Correlation Covariance and correlation