A correlation coefficient is a numerical measure of some type of correlation, meaning a statistical relationship between two variables. The variables may be two columns of a given data set of observations, often called a sample, or two components of a multivariate random variable with a known distribution.
Several types of correlation coefficient exist, each with its own definition, range of usability, and characteristics. They all assume values in the range from −1 to +1, where ±1 indicates the strongest possible correlation (perfect positive or perfect negative association) and 0 indicates no correlation of the kind being measured. As tools of analysis, correlation coefficients present certain problems, including the propensity of some types to be distorted by outliers and the possibility of being used incorrectly to infer a causal relationship between the variables (for more, see Correlation does not imply causation).
Types
There are several different measures of the degree of correlation in data, depending on the kind of data: principally whether the data are measurements (on an interval or ratio scale), ordinal, or categorical.
Pearson
The Pearson product-moment correlation coefficient, also known as r, R, or Pearson's r, is a measure of the strength and direction of the linear relationship between two variables that is defined as the covariance of the variables divided by the product of their standard deviations.
This is the best-known and most commonly used type of correlation coefficient. When the term "correlation coefficient" is used without further qualification, it usually refers to the Pearson product-moment correlation coefficient.
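The definition above (covariance divided by the product of the standard deviations) can be sketched in a few lines of Python. This is a minimal illustration for two equal-length samples, not a reference implementation; the function name `pearson_r` and the sample data are assumptions made for the example.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation: covariance over the product of standard deviations.

    The 1/n (or 1/(n-1)) factors cancel between numerator and denominator,
    so plain sums of squares and cross-products are used.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations (proportional to the covariance).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations (proportional to the variances).
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("undefined when either variable is constant")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: a perfectly decreasing linear relationship.
x = [1, 2, 3, 4, 5]
y = [5, 4, 3, 2, 1]
r_clean = pearson_r(x, y)

# The same data plus one extreme point, to show sensitivity to outliers.
r_outlier = pearson_r(x + [20], y + [20])
```

With these made-up values, `r_clean` is −1, while the single added point (20, 20) moves the coefficient to roughly +0.92, which illustrates how strongly one outlier can distort this measure.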
Intra-class
The intraclass correlation coefficient (ICC) is a descriptive statistic that can be used when quantitative measurements are made on units that are organized into groups; it describes how strongly units in the same group resemble each other.
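Several ICC estimators exist, and the text above does not single one out. As one possible illustration, the sketch below uses the one-way analysis-of-variance form, (MSB − MSW) / (MSB + (k − 1)·MSW), for balanced data with k measurements per group. The choice of estimator, the function name `icc_oneway`, and the data are assumptions made for the example.

```python
def icc_oneway(groups):
    """One-way ANOVA-based ICC for balanced groups (all groups the same size).

    groups: list of lists, one inner list of measurements per group.
    """
    n = len(groups)                      # number of groups
    k = len(groups[0]) if groups else 0  # measurements per group
    if n < 2 or k < 2 or any(len(g) != k for g in groups):
        raise ValueError("need at least 2 groups of equal size >= 2")
    grand_mean = sum(sum(g) for g in groups) / (n * k)
    group_means = [sum(g) / k for g in groups]
    # Mean square between groups.
    ms_between = k * sum((m - grand_mean) ** 2 for m in group_means) / (n - 1)
    # Mean square within groups.
    ms_within = sum(
        (v - m) ** 2 for g, m in zip(groups, group_means) for v in g
    ) / (n * (k - 1))
    denom = ms_between + (k - 1) * ms_within
    if denom == 0:
        raise ValueError("undefined when all values are identical")
    return (ms_between - ms_within) / denom

# Hypothetical data: members of each group are identical, groups differ.
icc_high = icc_oneway([[1, 1], [2, 2], [3, 3]])
```

Here `icc_high` equals 1, because all variation lies between groups and none within them, so units in the same group resemble each other as strongly as possible.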
Rank
Rank correlation is a measure of the relationship between the rankings of two variables, or two rankings of the same variable:
* Spearman's rank correlation coefficient is a measure of how well the relationship between two variables can be described by a monotonic function.
* The Kendall tau rank correlation coefficient is a measure of ordinal association based on the numbers of concordant and discordant pairs of observations, that is, pairs whose two rankings agree or disagree in order.
* Goodman and Kruskal's gamma is a measure of the strength of association of cross-tabulated data when both variables are measured at the ordinal level.
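The three rank measures listed above can be sketched with textbook formulas. This is a simplified illustration under stated assumptions: the Spearman version uses the shortcut formula that is valid only when there are no ties, and the Kendall version is the tau-a variant, which makes no correction for ties. All function names and the sample data are hypothetical.

```python
from itertools import combinations

def concordance_counts(x, y):
    """Count concordant and discordant pairs; tied pairs count as neither."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return concordant, discordant

def kendall_tau_a(x, y):
    """Kendall tau-a: (C - D) divided by the total number of pairs."""
    n = len(x)
    c, d = concordance_counts(x, y)
    return (c - d) / (n * (n - 1) / 2)

def goodman_kruskal_gamma(x, y):
    """Gamma: (C - D) / (C + D), ignoring tied pairs."""
    c, d = concordance_counts(x, y)
    if c + d == 0:
        raise ValueError("undefined when every pair is tied")
    return (c - d) / (c + d)

def spearman_rho_no_ties(x, y):
    """Spearman's rho via 1 - 6*sum(d^2) / (n*(n^2 - 1)); requires no ties."""
    n = len(x)
    if len(set(x)) != n or len(set(y)) != n:
        raise ValueError("shortcut formula assumes no tied values")
    rank_x = {v: i + 1 for i, v in enumerate(sorted(x))}
    rank_y = {v: i + 1 for i, v in enumerate(sorted(y))}
    d2 = sum((rank_x[a] - rank_y[b]) ** 2 for a, b in zip(x, y))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical data: two rankings that differ by two adjacent swaps.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
tau = kendall_tau_a(x, y)
gamma = goodman_kruskal_gamma(x, y)
rho = spearman_rho_no_ties(x, y)
```

For this data, 8 of the 10 pairs are concordant and 2 are discordant, so `tau` and `gamma` are both 0.6, while `rho` is 0.8. Without ties, tau-a and gamma coincide; they differ once tied pairs appear, because gamma drops them from the denominator.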
Tetrachoric and polychoric
The polychoric correlation coefficient measures association between two ordered-categorical variables. It is technically defined as the estimate of the Pearson correlation coefficient one would obtain if:
# The two variables were measured on a continuous scale, instead of as ordered-category variables.
# The two continuous variables followed a bivariate normal distribution.
When both variables are dichotomous instead of ordered-categorical, the polychoric correlation coefficient is called the tetrachoric correlation coefficient.
See also
* Correlation disattenuation
* Coefficient of determination
* Correlation and dependence
* Correlation ratio
* Distance correlation
* Goodness of fit, any of several measures of how well a statistical model fits observations by summarizing the discrepancy between observed values and the values expected under the model
* Multiple correlation
* Partial correlation