A correlation coefficient is a numerical measure of some type of correlation, meaning a statistical relationship between two variables. The variables may be two columns of a given data set of observations, often called a sample, or two components of a multivariate random variable with a known distribution.
Several types of correlation coefficient exist, each with its own definition, range of usability, and characteristics. They all assume values in the range from −1 to +1, where +1 indicates the strongest possible positive association, −1 the strongest possible negative association, and 0 the absence of the type of association being measured. As tools of analysis, correlation coefficients present certain problems, including the propensity of some types to be distorted by outliers and the possibility of being used incorrectly to infer a causal relationship between the variables (for more, see Correlation does not imply causation).
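The sensitivity to outliers mentioned above can be illustrated with a small sketch. The data values and the helper name `pearson_r` are made up for this illustration, and the Pearson coefficient is used only as one example of a coefficient that is affected in this way.

```python
import math

def pearson_r(x, y):
    """Pearson's r from sums of products of deviations (illustrative helper)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: y rises with x, apart from small local swaps.
x_clean = [1, 2, 3, 4, 5, 6]
y_clean = [2, 1, 4, 3, 6, 5]

# The same data with one extreme observation appended.
x_outlier = x_clean + [7]
y_outlier = y_clean + [-20]

r_clean = pearson_r(x_clean, y_clean)
r_outlier = pearson_r(x_outlier, y_outlier)

print(f"without outlier: r = {r_clean:.3f}")
print(f"with outlier:    r = {r_outlier:.3f}")
```

In this made-up example a single added point is enough to turn a strongly positive coefficient into a negative one, even though the other six observations are unchanged.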
Types
There are several different measures for the degree of correlation in data, depending on the kind of data: principally whether the data is a measurement, ordinal, or categorical.
Pearson
The Pearson product-moment correlation coefficient, also known as ''r'', ''R'', or ''Pearson's r'', is a measure of the strength and direction of the ''linear'' relationship between two variables that is defined as the covariance of the variables divided by the product of their standard deviations.
This is the best-known and most commonly used type of correlation coefficient. When the term "correlation coefficient" is used without further qualification, it usually refers to the Pearson product-moment correlation coefficient.
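As a minimal sketch of the definition above (covariance divided by the product of the standard deviations), the following uses only Python's standard `statistics` module. The function name `pearson_r` and the sample values are assumptions made for illustration. Population (divide-by-n) formulas are used for both covariance and standard deviation; the sample (divide-by-n−1) versions give the same ratio because the factors cancel.

```python
from statistics import fmean, pstdev

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = fmean(x), fmean(y)
    # Population covariance: mean of the products of deviations from the means.
    covariance = fmean([(a - mean_x) * (b - mean_y) for a, b in zip(x, y)])
    sd_x, sd_y = pstdev(x), pstdev(y)
    if sd_x == 0 or sd_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return covariance / (sd_x * sd_y)

# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pearson_r(x, y), 4))
```

A perfectly linear increasing relationship, such as `pearson_r([1, 2, 3, 4], [2, 4, 6, 8])`, gives a value of 1 up to floating-point rounding, and reversing one of the sequences gives −1.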
Intra-class
Intraclass correlation (ICC) is a descriptive statistic that can be used when quantitative measurements are made on units that are organized into groups; it describes how strongly units in the same group resemble each other.
Rank
Rank correlation is a measure of the relationship between the rankings of two variables, or two rankings of the same variable:
* Spearman's rank correlation coefficient is a measure of how well the relationship between two variables can be described by a monotonic function.
* The Kendall tau rank correlation coefficient is a measure of the agreement between two rankings, based on how many pairs of observations are ordered the same way (concordant) versus the opposite way (discordant) in the two data sets.
* Goodman and Kruskal's gamma is a measure of the strength of association of cross-tabulated data when both variables are measured at the ordinal level.
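The three rank measures listed above can be sketched in plain Python. This is an illustrative sketch rather than a reference implementation: all function names are hypothetical, Spearman's coefficient is computed as the Pearson coefficient of average ranks, the Kendall variant shown is tau-a (which makes no correction for ties), and gamma is computed from paired observations rather than from a cross-tabulation.

```python
import math
from itertools import combinations

def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sxx = sum((a - mean_x) ** 2 for a in rx)
    syy = sum((b - mean_y) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)

def concordance_counts(x, y):
    """Count pairs ordered the same way (concordant) or oppositely (discordant)."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # sign == 0 means a tie in x or y; such pairs are counted in neither group
    return concordant, discordant

def kendall_tau_a(x, y):
    """(concordant - discordant) divided by the total number of pairs."""
    c, d = concordance_counts(x, y)
    n = len(x)
    return (c - d) / (n * (n - 1) / 2)

def goodman_kruskal_gamma(x, y):
    """(concordant - discordant) divided by the number of untied pairs."""
    c, d = concordance_counts(x, y)
    if c + d == 0:
        raise ValueError("undefined when every pair is tied")
    return (c - d) / (c + d)

# Hypothetical data: y = x**3 is monotonic but not linear in x.
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
print(spearman_rho(x, y), kendall_tau_a(x, y), goodman_kruskal_gamma(x, y))
```

Because the example relationship is strictly increasing, all three measures reach their maximum here even though the relationship is not linear. Without ties, tau-a and gamma coincide; they differ only when tied pairs are present.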
Tetrachoric and polychoric
The polychoric correlation coefficient measures association between two ordered-categorical variables. It is technically defined as the estimate of the Pearson correlation coefficient one would obtain if:
# The two variables were measured on a continuous scale, instead of as ordered-category variables.
# The two continuous variables followed a bivariate normal distribution.
When both variables are dichotomous instead of ordered-categorical, the polychoric correlation coefficient is called the tetrachoric correlation coefficient.
See also
* Correlation disattenuation
* Coefficient of determination
* Correlation and dependence
* Correlation ratio
* Distance correlation
* Goodness of fit, any of several measures of how well a statistical model fits observations by summarizing the discrepancy between observed values and the values expected under the model
* Multiple correlation
* Partial correlation