In statistics, the Pearson correlation coefficient (PCC) ― also known as Pearson's ''r'', the Pearson product-moment correlation coefficient (PPMCC), the bivariate correlation, or colloquially simply as the correlation coefficient ― is a measure of linear correlation between two sets of data. It is the ratio between the covariance of two variables and the product of their
standard deviations; thus, it is essentially a normalized measurement of the covariance, such that the result always has a value between −1 and 1. As with covariance itself, the measure can only reflect a linear correlation of variables, and ignores many other types of relationships or correlations. As a simple example, one would expect the age and height of a sample of teenagers from a high school to have a Pearson correlation coefficient significantly greater than 0, but less than 1 (as 1 would represent an unrealistically perfect correlation).
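The bounds described above can be checked with a short sketch; the age/height values below are invented purely for illustration, computed with population (divide-by-n) moments:

```python
# Minimal sketch: Pearson r as a normalized covariance, always in [-1, 1].
# The age/height values below are invented purely for illustration.
import math

ages    = [13, 14, 15, 16, 17, 18]
heights = [150, 155, 163, 168, 170, 172]  # cm

n = len(ages)
mean_a = sum(ages) / n
mean_h = sum(heights) / n

# Population covariance and standard deviations (divide by n).
cov = sum((a - mean_a) * (h - mean_h) for a, h in zip(ages, heights)) / n
std_a = math.sqrt(sum((a - mean_a) ** 2 for a in ages) / n)
std_h = math.sqrt(sum((h - mean_h) ** 2 for h in heights) / n)

r = cov / (std_a * std_h)
# r is strongly positive for this sample, but strictly less than 1.
assert -1.0 <= r <= 1.0
```

Because the covariance is divided by the product of the standard deviations, rescaling either variable (say, measuring height in metres instead of centimetres) leaves ''r'' unchanged.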
Naming and history
It was developed by Karl Pearson from a related idea introduced by Francis Galton in the 1880s, and for which the mathematical formula was derived and published by Auguste Bravais in 1844. The naming of the coefficient is thus an example of Stigler's Law.
Definition
Pearson's correlation coefficient is the covariance of the two variables divided by the product of their standard deviations. The form of the definition involves a "product moment", that is, the mean (the first
moment about the origin) of the product of the mean-adjusted random variables; hence the modifier ''product-moment'' in the name.
For a population
Pearson's correlation coefficient, when applied to a population, is commonly represented by the Greek letter ''ρ'' (rho) and may be referred to as the ''population correlation coefficient'' or the ''population Pearson correlation coefficient''. Given a pair of random variables (X, Y), the formula for ''ρ''[Real Statistics Using Excel: Correlation: Basic Concepts, retrieved 22 February 2015] is:

: \rho_{X,Y} = \frac{\operatorname{cov}(X,Y)}{\sigma_X \sigma_Y}
where:
* \operatorname{cov}(X,Y) is the covariance of X and Y
* \sigma_X is the standard deviation of X
* \sigma_Y is the standard deviation of Y
The formula for \rho can be expressed in terms of mean and expectation. Since

: \operatorname{cov}(X,Y) = \operatorname{E}[(X - \mu_X)(Y - \mu_Y)],

the formula for \rho can also be written as

: \rho_{X,Y} = \frac{\operatorname{E}[(X - \mu_X)(Y - \mu_Y)]}{\sigma_X \sigma_Y}
where:
* \sigma_X and \sigma_Y are defined as above
* \mu_X is the mean of X
* \mu_Y is the mean of Y
* \operatorname{E} is the expectation.
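The mean-and-expectation form can be sketched directly, approximating each expectation by a population (divide-by-n) average over illustrative data values:

```python
# Sketch: rho written with means and expectations,
# E[(X - mu_X)(Y - mu_Y)] / (sigma_X * sigma_Y),
# approximating each expectation by a divide-by-n average.
# Data values are arbitrary illustration values.
import math

x = [2.0, 4.0, 6.0, 8.0]
y = [1.0, 3.0, 2.0, 5.0]
n = len(x)

mu_x = sum(x) / n  # mu_X = E[X]
mu_y = sum(y) / n  # mu_Y = E[Y]

# E[(X - mu_X)(Y - mu_Y)], i.e. the covariance
exp_cross = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
sigma_x = math.sqrt(sum((a - mu_x) ** 2 for a in x) / n)
sigma_y = math.sqrt(sum((b - mu_y) ** 2 for b in y) / n)

rho = exp_cross / (sigma_x * sigma_y)
assert -1.0 <= rho <= 1.0
```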
The formula for \rho can be expressed in terms of uncentered moments. Since

: \mu_X = \operatorname{E}[X], \qquad \sigma_X^2 = \operatorname{E}[(X - \operatorname{E}[X])^2] = \operatorname{E}[X^2] - (\operatorname{E}[X])^2,

and likewise for Y, and since

: \operatorname{E}[(X - \mu_X)(Y - \mu_Y)] = \operatorname{E}[XY] - \operatorname{E}[X]\operatorname{E}[Y],
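The uncentered-moment route can be sketched numerically: only the averages E[X], E[Y], E[XY], E[X²], E[Y²] are accumulated, with no explicit centering step (data values below are arbitrary illustration values):

```python
# Sketch: rho from uncentered moments only, using the identities
#   cov(X, Y)  = E[XY]  - E[X] * E[Y]
#   sigma_X^2  = E[X^2] - E[X]^2
# Each expectation is approximated by a divide-by-n average.
# Data values are arbitrary illustration values.
import math

x = [1.0, 3.0, 4.0, 7.0]
y = [2.0, 2.0, 5.0, 8.0]
n = len(x)

e_x  = sum(x) / n                              # E[X]
e_y  = sum(y) / n                              # E[Y]
e_xy = sum(a * b for a, b in zip(x, y)) / n    # E[XY]
e_x2 = sum(a * a for a in x) / n               # E[X^2]
e_y2 = sum(b * b for b in y) / n               # E[Y^2]

rho = (e_xy - e_x * e_y) / (
    math.sqrt(e_x2 - e_x ** 2) * math.sqrt(e_y2 - e_y ** 2)
)
assert -1.0 <= rho <= 1.0
```

This single-pass form is convenient for streaming data, though subtracting large nearly-equal quantities can lose precision compared with the centered formula.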