In statistics, a rank correlation is any of several statistics that measure an ordinal association: the relationship between rankings of different ordinal variables or different rankings of the same variable, where a "ranking" is the assignment of the ordering labels "first", "second", "third", etc. to different observations of a particular variable. A rank correlation coefficient measures the degree of similarity between two rankings, and can be used to assess the significance of the relation between them. For example, two common nonparametric methods of significance that use rank correlation are the Mann–Whitney U test and the Wilcoxon signed-rank test.
Context
If, for example, one variable is the identity of a college basketball program and another variable is the identity of a college football program, one could test for a relationship between the poll rankings of the two types of program: do colleges with a higher-ranked basketball program tend to have a higher-ranked football program? A rank correlation coefficient can measure that relationship, and the measure of significance of the rank correlation coefficient can show whether the measured relationship is small enough to likely be a coincidence.
If there is only one variable, the identity of a college football program, but it is subject to two different poll rankings (say, one by coaches and one by sportswriters), then the similarity of the two different polls' rankings can be measured with a rank correlation coefficient.
As another example, in a contingency table with ''low income'', ''medium income'', and ''high income'' in the row variable and educational level (''no high school'', ''high school'', ''university'') in the column variable, a rank correlation measures the relationship between income and educational level.
Correlation coefficients
Some of the more popular rank correlation statistics include
# Spearman's ρ
# Kendall's τ
# Goodman and Kruskal's γ
# Somers' D
An increasing rank correlation coefficient implies increasing agreement between rankings. The coefficient lies in the interval [−1, 1] and assumes the value:
* 1 if the agreement between the two rankings is perfect; the two rankings are the same.
* 0 if the rankings are completely independent.
* −1 if the disagreement between the two rankings is perfect; one ranking is the reverse of the other.
A ranking can be seen as a permutation of a set of objects. Thus we can look at observed rankings as data obtained when the sample space is (identified with) a symmetric group. We can then introduce a metric, making the symmetric group into a metric space. Different metrics will correspond to different rank correlations.
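As a minimal sketch of how a distance between rankings leads to a rank correlation, the snippet below treats two untied rankings of the same items as lists of ranks 1..n. It assumes two standard distances (the number of discordant pairs, and the Euclidean distance between rank vectors, used here through its square) and rescales each to the interval [−1, 1]. All function names are illustrative and not taken from any library.

```python
from itertools import combinations


def kendall_distance(p, q):
    """Number of item pairs that the two rankings order differently."""
    return sum(
        1
        for i, j in combinations(range(len(p)), 2)
        if (p[i] - p[j]) * (q[i] - q[j]) < 0
    )


def squared_euclidean_distance(p, q):
    """Sum of squared rank differences (square of the Euclidean distance)."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))


def tau_from_distance(p, q):
    """Kendall's tau for untied rankings, rescaled from the pair-count distance."""
    n = len(p)
    return 1 - 4 * kendall_distance(p, q) / (n * (n - 1))


def rho_from_distance(p, q):
    """Spearman's rho for untied rankings, rescaled from the squared distance."""
    n = len(p)
    return 1 - 6 * squared_euclidean_distance(p, q) / (n * (n * n - 1))


p = [1, 2, 3, 4]
q = [2, 1, 3, 4]      # one adjacent swap relative to p
rev = [4, 3, 2, 1]    # p reversed

print(tau_from_distance(p, q), rho_from_distance(p, q))
print(tau_from_distance(p, rev), rho_from_distance(p, rev))
```

Under these assumptions, identical rankings are at distance zero and give a coefficient of 1, while a fully reversed ranking is at maximal distance and gives −1.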
General correlation coefficient
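Kendall (1970) showed that his $\tau$ (tau) and Spearman's $\rho$ (rho) are particular cases of a general correlation coefficient.
Suppose we have a set of $n$ objects, which are being considered in relation to two properties, represented by $x$ and $y$, forming the sets of values $\{x_i\}_{i \le n}$ and $\{y_i\}_{i \le n}$. To any pair of individuals, say the $i$-th and the $j$-th, we assign an $x$-score, denoted by $a_{ij}$, and a $y$-score, denoted by $b_{ij}$. The only requirement for these functions is that they be anti-symmetric, so $a_{ij} = -a_{ji}$ and $b_{ij} = -b_{ji}$. (Note that in particular $a_{ij} = b_{ij} = 0$ if $i = j$.) Then the generalized correlation coefficient $\Gamma$ is defined as
: $\Gamma = \frac{\sum_{i,j=1}^n a_{ij} b_{ij}}{\sqrt{\sum_{i,j=1}^n a_{ij}^2 \, \sum_{i,j=1}^n b_{ij}^2}}$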
Equivalently, if all coefficients are collected into matrices $A = (a_{ij})$ and $B = (b_{ij})$, with $A^\mathsf{T} = -A$ and $B^\mathsf{T} = -B$, then
: $\Gamma = \frac{\langle A, B \rangle_\mathrm{F}}{\|A\|_\mathrm{F} \, \|B\|_\mathrm{F}}$
where $\langle A, B \rangle_\mathrm{F}$ is the Frobenius inner product and $\|A\|_\mathrm{F} = \sqrt{\langle A, A \rangle_\mathrm{F}}$ the Frobenius norm. In particular, the general correlation coefficient is the cosine of the angle between the matrices $A$ and $B$.
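The definition can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a reference implementation: the function name `general_correlation` and the example score function `difference` are hypothetical, the score functions are assumed to be anti-symmetric, and the data are made up for the example.

```python
import math


def general_correlation(x, y, a_score, b_score):
    """Cosine of the angle between the score matrices A and B.

    a_score and b_score must be anti-symmetric: score(u, v) == -score(v, u).
    Raises ZeroDivisionError if either score matrix is identically zero.
    """
    n = len(x)
    A = [[a_score(x[i], x[j]) for j in range(n)] for i in range(n)]
    B = [[b_score(y[i], y[j]) for j in range(n)] for i in range(n)]
    # Frobenius inner product: sum of entrywise products.
    inner = sum(A[i][j] * B[i][j] for i in range(n) for j in range(n))
    # Frobenius norms: square root of the sum of squared entries.
    norm_a = math.sqrt(sum(v * v for row in A for v in row))
    norm_b = math.sqrt(sum(v * v for row in B for v in row))
    return inner / (norm_a * norm_b)


def difference(u, v):
    """Example anti-symmetric score: the signed difference v - u."""
    return v - u


x = [1.0, 2.0, 3.0, 4.0]          # hypothetical data
y = [2.0, 4.0, 6.0, 9.0]          # hypothetical data
neg_x = [-v for v in x]

gamma_xy = general_correlation(x, y, difference, difference)
gamma_xx = general_correlation(x, x, difference, difference)
gamma_opposite = general_correlation(x, neg_x, difference, difference)
print(gamma_xy, gamma_xx, gamma_opposite)
```

Because the coefficient is a cosine, it always lies between −1 and 1; comparing a variable with itself gives 1, and comparing it with its negation gives −1, up to floating-point rounding.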
Kendall's τ as a particular case
If $r_i$, $s_i$ are the ranks of the $i$-th member according to the $x$-quality and $y$-quality respectively, then we can define
: $a_{ij} = \operatorname{sgn}(r_j - r_i), \quad b_{ij} = \operatorname{sgn}(s_j - s_i).$
The sum $\sum_{i<j} a_{ij} b_{ij}$ is the number of concordant pairs minus the number of discordant pairs (see Kendall tau rank correlation coefficient). In the absence of ties, the sum $\sum_{i<j} a_{ij}^2$ is just $n(n-1)/2$, the number of terms $a_{ij}$ with $i < j$, as is $\sum_{i<j} b_{ij}^2$. Summing over all $i, j$ instead doubles both the numerator and the denominator of $\Gamma$, so the ratio is unchanged. Thus in this case,
: $\Gamma = \frac{2\,((\text{number of concordant pairs}) - (\text{number of discordant pairs}))}{n(n-1)} = \text{Kendall's } \tau.$
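The following sketch checks this special case numerically. It assumes untied rankings given as lists of ranks; the function names and the example rankings are made up for illustration. One function counts concordant and discordant pairs directly, the other evaluates the general coefficient with sign scores over all index pairs.

```python
import math
from itertools import combinations


def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def kendall_tau_pairs(r, s):
    """tau = 2 * (concordant - discordant) / (n * (n - 1)), assuming no ties."""
    n = len(r)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        product = sign(r[j] - r[i]) * sign(s[j] - s[i])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    return 2 * (concordant - discordant) / (n * (n - 1))


def kendall_tau_gamma(r, s):
    """General coefficient with sign scores, summed over all i, j."""
    n = len(r)
    pairs = [(i, j) for i in range(n) for j in range(n)]
    num = sum(sign(r[j] - r[i]) * sign(s[j] - s[i]) for i, j in pairs)
    sum_a2 = sum(sign(r[j] - r[i]) ** 2 for i, j in pairs)
    sum_b2 = sum(sign(s[j] - s[i]) ** 2 for i, j in pairs)
    return num / math.sqrt(sum_a2 * sum_b2)


r = [1, 2, 3, 4, 5]
s = [2, 1, 4, 3, 5]   # 8 concordant pairs, 2 discordant pairs

print(kendall_tau_pairs(r, s), kendall_tau_gamma(r, s))
```

Both routes give the same value for this example, (8 − 2) / 10 = 0.6.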
Spearman’s ρ as a particular case
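If $r_i$, $s_i$ are the ranks of the $i$-th member according to the $x$-quality and the $y$-quality respectively, we may consider the matrices $A = (a_{ij})$ and $B = (b_{ij})$ defined by
: $a_{ij} := r_j - r_i$
: $b_{ij} := s_j - s_i$
The sums $\sum_{i,j=1}^n a_{ij}^2$ and $\sum_{i,j=1}^n b_{ij}^2$ are equal, since both $r_i$ and $s_i$ range from $1$ to $n$. Hence
: $\Gamma = \frac{\sum_{i,j=1}^n (r_j - r_i)(s_j - s_i)}{\sum_{i,j=1}^n (r_j - r_i)^2}$
To simplify this expression, let $d_i := r_i - s_i$ denote the difference in the ranks for each $i$. Further, let $U$ be a uniformly distributed discrete random variable on $\{1, 2, \ldots, n\}$. Since the ranks $r, s$ are just permutations of $1, 2, \ldots, n$, we can view both as being random variables distributed like $U$. Using basic summation results from discrete mathematics, it is easy to see that for the uniformly distributed random variable $U$ we have $\mathbb{E}[U] = \frac{n+1}{2}$ and $\mathbb{E}[U^2] = \frac{(n+1)(2n+1)}{6}$, and thus $\operatorname{Var}(U) = \frac{(n+1)(2n+1)}{6} - \left(\frac{n+1}{2}\right)^2 = \frac{n^2 - 1}{12}$.
Now, observing symmetries allows us to compute the parts of $\Gamma$ as follows:
: $$\begin{aligned} \frac{1}{n^2} \sum_{i,j=1}^n (r_j - r_i)(s_j - s_i) &= 2 \left( \frac{1}{n} \sum_{i=1}^n r_i s_i - \left( \frac{1}{n} \sum_{i=1}^n r_i \right) \left( \frac{1}{n} \sum_{j=1}^n s_j \right) \right) \\ &= \frac{1}{n} \sum_{i=1}^n \left( r_i^2 + s_i^2 - d_i^2 \right) - 2 \left( \mathbb{E}[U] \right)^2 \\ &= 2 \left( \mathbb{E}[U^2] - \left( \mathbb{E}[U] \right)^2 \right) - \frac{1}{n} \sum_{i=1}^n d_i^2 \end{aligned}$$
and
: $$\frac{1}{n^2} \sum_{i,j=1}^n (r_j - r_i)^2 = 2 \left( \frac{1}{n} \sum_{i=1}^n r_i^2 - \left( \frac{1}{n} \sum_{i=1}^n r_i \right)^2 \right) = 2 \left( \mathbb{E}[U^2] - \left( \mathbb{E}[U] \right)^2 \right)$$
Hence
: $$\Gamma = 1 - \frac{\sum_{i=1}^n d_i^2}{2n \operatorname{Var}(U)} = 1 - \frac{6 \sum_{i=1}^n d_i^2}{n(n^2 - 1)}$$
where $d_i = r_i - s_i$ is the difference between ranks, which is exactly Spearman's rank correlation coefficient $\rho$.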
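The identity can be checked numerically with a short sketch. It assumes untied rankings given as lists of ranks 1..n; the function names and example rankings are illustrative only. The first function evaluates the general coefficient with rank-difference scores, the second uses the closed form in terms of the squared rank differences.

```python
import math


def gamma_rank_differences(r, s):
    """General coefficient with scores a_ij = r_j - r_i and b_ij = s_j - s_i."""
    n = len(r)
    pairs = [(i, j) for i in range(n) for j in range(n)]
    num = sum((r[j] - r[i]) * (s[j] - s[i]) for i, j in pairs)
    sum_a2 = sum((r[j] - r[i]) ** 2 for i, j in pairs)
    sum_b2 = sum((s[j] - s[i]) ** 2 for i, j in pairs)
    return num / math.sqrt(sum_a2 * sum_b2)


def spearman_closed_form(r, s):
    """1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), valid for untied ranks 1..n."""
    n = len(r)
    d2 = sum((ri - si) ** 2 for ri, si in zip(r, s))
    return 1 - 6 * d2 / (n * (n * n - 1))


r = [1, 2, 3, 4, 5]
s = [2, 1, 4, 3, 5]   # rank differences d = -1, 1, -1, 1, 0

print(gamma_rank_differences(r, s), spearman_closed_form(r, s))
```

For this example the squared differences sum to 4, so both expressions evaluate to 1 − 24/120 = 0.8.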
Rank-biserial correlation
Gene Glass (1965) noted that the rank-biserial can be derived from Spearman's ρ. "One can derive a coefficient defined on X, the dichotomous variable, and Y, the ranking variable, which estimates Spearman's rho between X and Y in the same way that biserial r estimates Pearson's r between two normal variables" (p. 91). The rank-biserial correlation had been introduced nine years before by Edward Cureton (1956) as a measure of rank correlation when the ranks are in two groups.
Kerby simple difference formula
Dave Kerby (2014) recommended the rank-biserial as the measure to introduce students to rank correlation, because the general logic can be explained at an introductory level. The rank-biserial is the correlation used with the Mann–Whitney U test, a method commonly covered in introductory college courses on statistics. The data for this test consist of two groups, and the outcome for each member of the groups is ranked within the study as a whole.
Kerby showed that this rank correlation can be expressed in terms of two concepts: the percent of data that support a stated hypothesis, and the percent of data that do not support it. The Kerby simple difference formula states that the rank correlation is the difference between the proportion of favorable evidence (''f'') and the proportion of unfavorable evidence (''u''):
: $r = f - u$
Example and interpretation
To illustrate the computation, suppose a coach trains long-distance runners for one month using two methods. Group A has 5 runners, and Group B has 4 runners. The stated hypothesis is that method A produces faster runners. The race to assess the results finds that the runners from Group A do indeed run faster, with the following ranks: 1, 2, 3, 4, and 6. The slower runners from Group B thus have ranks of 5, 7, 8, and 9.
The analysis is conducted on pairs, defined as a member of one group compared to a member of the other group. For example, the fastest runner in the study is a member of four pairs: (1,5), (1,7), (1,8), and (1,9). All four of these pairs support the hypothesis, because in each pair the runner from Group A is faster than the runner from Group B. There are a total of 20 pairs, and 19 pairs support the hypothesis. The only pair that does not support the hypothesis is the pair of runners with ranks 5 and 6, because in this pair the runner from Group B had the faster time. By the Kerby simple difference formula, 95% of the data support the hypothesis (19 of 20 pairs), and 5% do not support it (1 of 20 pairs), so the rank correlation is r = .95 − .05 = .90.
The maximum value for the correlation is r = 1, which means that 100% of the pairs favor the hypothesis. A correlation of r = 0 indicates that half the pairs favor the hypothesis and half do not; in other words, the sample groups do not differ in ranks, so there is no evidence that they come from two different populations. An effect size of r = 0 can be said to describe no relationship between group membership and the members' ranks.
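The runner example can be reproduced with a short sketch of the simple difference formula. The function name `simple_difference` is hypothetical, a lower rank is taken to mean a faster runner, and tied pairs (none occur here) are assumed to count as neither favorable nor unfavorable.

```python
from fractions import Fraction


def simple_difference(group_a_ranks, group_b_ranks):
    """Return (f, u, r) with r = f - u over all cross-group pairs.

    A pair is favorable when the Group A member has the lower (faster) rank.
    Tied pairs stay in the total but count toward neither f nor u.
    """
    favorable = unfavorable = 0
    for a in group_a_ranks:
        for b in group_b_ranks:
            if a < b:
                favorable += 1
            elif a > b:
                unfavorable += 1
    total = len(group_a_ranks) * len(group_b_ranks)
    f = Fraction(favorable, total)
    u = Fraction(unfavorable, total)
    return f, u, f - u


# Ranks from the example: Group A has 5 runners, Group B has 4.
f, u, r = simple_difference([1, 2, 3, 4, 6], [5, 7, 8, 9])
print(float(f), float(u), float(r))  # 0.95 0.05 0.9
```

Exact fractions are used so that 19/20 − 1/20 comes out as exactly 9/10 rather than a rounded floating-point value.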
External links
Brief guide by experimental psychologist Karl L. Weunsch- Nonparametric effect sizes (Copyright 2015 by Karl L. Weunsch)