In statistics, Spearman's rank correlation coefficient or Spearman's ''ρ'', named after Charles Spearman and often denoted by the Greek letter $\rho$ (rho) or as $r_s$, is a nonparametric measure of rank correlation (statistical dependence between the rankings of two variables). It assesses how well the relationship between two variables can be described using a monotonic function.
The Spearman correlation between two variables is equal to the Pearson correlation between the rank values of those two variables; while Pearson's correlation assesses linear relationships, Spearman's correlation assesses monotonic relationships (whether linear or not). If there are no repeated data values, a perfect Spearman correlation of +1 or −1 occurs when each of the variables is a perfect monotone function of the other.
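The contrast between linear and monotonic association can be illustrated with a minimal sketch in plain Python. The helper names `ranks`, `pearson` and `spearman` are illustrative choices rather than a library API, and the sample is assumed to contain no repeated values.

```python
import math

def ranks(values):
    """Return 1-based ranks; assumes no repeated values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the rank values."""
    return pearson(ranks(xs), ranks(ys))

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [math.exp(v) for v in x]  # strictly increasing in x, but not linear

print(pearson(x, y))   # noticeably below 1: the relationship is not linear
print(spearman(x, y))  # 1.0: the relationship is perfectly monotone
```

Because `y` is a strictly increasing function of `x`, both variables receive identical ranks, so the rank-based coefficient reaches +1 even though the raw values do not lie on a straight line.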
Intuitively, the Spearman correlation between two variables will be high when observations have a similar (or identical for a correlation of 1) rank (i.e. relative position label of the observations within the variable: 1st, 2nd, 3rd, etc.) between the two variables, and low when observations have a dissimilar (or fully opposed for a correlation of −1) rank between the two variables.
Spearman's coefficient is appropriate for both continuous and discrete ordinal variables. Both Spearman's $\rho$ and Kendall's $\tau$ can be formulated as special cases of a more general correlation coefficient.
Definition and calculation
The Spearman correlation coefficient is defined as the Pearson correlation coefficient between the rank variables.
For a sample of size ''n'', the ''n'' raw scores $X_i, Y_i$ are converted to ranks $\operatorname{R}(X_i), \operatorname{R}(Y_i)$, and $r_s$ is computed as
: $r_s = \rho_{\operatorname{R}(X),\operatorname{R}(Y)} = \frac{\operatorname{cov}(\operatorname{R}(X), \operatorname{R}(Y))}{\sigma_{\operatorname{R}(X)}\, \sigma_{\operatorname{R}(Y)}},$
where
: $\rho$ denotes the usual Pearson correlation coefficient, but applied to the rank variables,
: $\operatorname{cov}(\operatorname{R}(X), \operatorname{R}(Y))$ is the covariance of the rank variables,
: $\sigma_{\operatorname{R}(X)}$ and $\sigma_{\operatorname{R}(Y)}$ are the standard deviations of the rank variables.
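The definition as a covariance of ranks divided by the product of their standard deviations can be sketched as follows. The function names are hypothetical, and the handling of repeated values (each tied value receives the mean of the positions it occupies) is an assumption of this sketch rather than something the definition above prescribes.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy
    (an assumed tie-handling convention for this sketch)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            result[order[k]] = shared
        start = end + 1
    return result

def spearman_from_definition(xs, ys):
    """cov(R(X), R(Y)) / (sigma_R(X) * sigma_R(Y)) computed on the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_rx, mean_ry = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_rx) * (b - mean_ry) for a, b in zip(rx, ry)) / n
    sd_rx = math.sqrt(sum((a - mean_rx) ** 2 for a in rx) / n)
    sd_ry = math.sqrt(sum((b - mean_ry) ** 2 for b in ry) / n)
    return cov / (sd_rx * sd_ry)

scores_x = [10, 20, 20, 40, 50]  # made-up data with one tie
scores_y = [1, 3, 2, 5, 4]

print(average_ranks(scores_x))  # [1.0, 2.5, 2.5, 4.0, 5.0]
print(spearman_from_definition(scores_x, scores_y))
```

Dividing the covariance and both variances by ''n'' (rather than ''n'' − 1) makes no difference to the result, since the factor cancels in the ratio.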
Only if all ''n'' ranks are ''distinct integers'' can it be computed using the popular formula
: $r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)},$
where
: $d_i = \operatorname{R}(X_i) - \operatorname{R}(Y_i)$ is the difference between the two ranks of each observation,
: ''n'' is the number of observations.
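As a hedged illustration of the shortcut formula, the sketch below computes $r_s$ from the squared rank differences. The function name `spearman_shortcut` and the data are made up for this example, and the function deliberately refuses input with repeated values, since the formula holds only when all ranks are distinct.

```python
def ranks(values):
    """Return 1-based ranks; assumes no repeated values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result

def spearman_shortcut(xs, ys):
    """r_s = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)); valid only without ties."""
    if len(set(xs)) != len(xs) or len(set(ys)) != len(ys):
        raise ValueError("shortcut formula requires distinct values (no ties)")
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))

x = [35, 23, 47, 17, 10]  # ranks: 4, 3, 5, 2, 1
y = [30, 33, 45, 23, 8]   # ranks: 3, 4, 5, 2, 1

# Rank differences are 1, -1, 0, 0, 0, so sum(d_i^2) = 2 and
# r_s = 1 - 6*2 / (5 * 24) = 0.9 (up to floating-point rounding).
print(spearman_shortcut(x, y))
```

For this sample the Pearson correlation of the two rank lists gives the same value, 9/10, which is what the equivalence of the two formulas in the tie-free case predicts.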
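Consider a bivariate sample $(x_i, y_i),\ i = 1, \dots, n$ with corresponding ranks $(\operatorname{R}(x_i), \operatorname{R}(y_i)) = (R_i, S_i)$.
Then the Spearman correlation coefficient of $x, y$ is
: $r_s = \frac{\frac{1}{n} \sum_{i=1}^n R_i S_i - \overline{R}\,\overline{S}}{\sigma_R \sigma_S},$
where, as usual,
$\overline{R} = \frac{1}{n} \sum_{i=1}^n R_i$,
$\overline{S} = \frac{1}{n} \sum_{i=1}^n S_i$,
$\sigma_R^2 = \frac{1}{n} \sum_{i=1}^n (R_i - \overline{R})^2$,
and
$\sigma_S^2 = \frac{1}{n} \sum_{i=1}^n (S_i - \overline{S})^2$.
We shall show that $r_s$ can be expressed purely in terms of $d_i := R_i - S_i$, provided we assume that there are no ties within each sample.
Under this assumption, $R, S$ can be viewed as random variables distributed like a uniformly distributed random variable, $U$, on $\{1, 2, \ldots, n\}$.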
Hence