In statistics, Tschuprow's ''T'' is a measure of association between two nominal variables, giving a value between 0 and 1 (inclusive). It is closely related to Cramér's V, coinciding with it for square contingency tables.
It was published by
Alexander Tschuprow (alternative spelling: Chuprov) in 1939.
[Tschuprow, A. A. (1939) ''Principles of the Mathematical Theory of Correlation''; translated by M. Kantorowitsch. W. Hodge & Co.]
Definition
For an ''r'' × ''c'' contingency table with ''r'' rows and ''c'' columns, let <math>\pi_{ij}</math> be the proportion of the population in cell <math>(i,j)</math> and let

:<math>\pi_{i+}=\sum_{j=1}^{c}\pi_{ij}</math> and <math>\pi_{+j}=\sum_{i=1}^{r}\pi_{ij}.</math>

Then the mean square contingency is given as

:<math>\phi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c}\frac{(\pi_{ij}-\pi_{i+}\pi_{+j})^2}{\pi_{i+}\pi_{+j}},</math>

and Tschuprow's ''T'' as

:<math>T = \sqrt{\frac{\phi^2}{\sqrt{(r-1)(c-1)}}}.</math>
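As a minimal illustrative sketch (not from the source), the definition above can be computed directly: form the marginal proportions, accumulate the mean square contingency <math>\phi^2</math>, and normalize by <math>\sqrt{(r-1)(c-1)}</math>. The function name `tschuprow_t` is an assumption for this example.

```python
import math

def tschuprow_t(pi):
    """Tschuprow's T for a table pi[i][j] of population proportions."""
    r, c = len(pi), len(pi[0])
    row = [sum(pi[i][j] for j in range(c)) for i in range(r)]  # pi_{i+}
    col = [sum(pi[i][j] for i in range(r)) for j in range(c)]  # pi_{+j}
    # mean square contingency phi^2
    phi2 = sum((pi[i][j] - row[i] * col[j]) ** 2 / (row[i] * col[j])
               for i in range(r) for j in range(c))
    return math.sqrt(phi2 / math.sqrt((r - 1) * (c - 1)))

# 2x2 table with perfect dependence (all mass on the diagonal):
print(tschuprow_t([[0.5, 0.0], [0.0, 0.5]]))  # 1.0
```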
Properties
''T'' equals zero if and only if independence holds in the table, i.e., if and only if <math>\pi_{ij}=\pi_{i+}\pi_{+j}</math> for all ''i'' and ''j''. ''T'' equals one if and only if there is perfect dependence in the table, i.e., if and only if for each ''i'' there is only one ''j'' such that <math>\pi_{ij}>0</math>, and vice versa. Hence, it can only equal 1 for square tables. In this it differs from Cramér's V, which can be equal to 1 for any rectangular table.
Estimation
If we have a multinomial sample of size ''n'', the usual way to estimate ''T'' from the data is via the formula

:<math>\hat{T} = \sqrt{\frac{\sum_{i=1}^{r}\sum_{j=1}^{c}(p_{ij}-p_{i+}p_{+j})^2/(p_{i+}p_{+j})}{\sqrt{(r-1)(c-1)}}},</math>

where <math>p_{ij}=n_{ij}/n</math> is the proportion of the sample in cell <math>(i,j)</math>. This is the empirical value of ''T''. With <math>\chi^2</math> the Pearson chi-square statistic, this formula can also be written as

:<math>\hat{T} = \sqrt{\frac{\chi^2/n}{\sqrt{(r-1)(c-1)}}}.</math>
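As a hedged sketch of the estimation step (the function name `tschuprow_t_hat` and the example counts are assumptions for illustration), the empirical value can be obtained from a table of observed counts by computing the Pearson chi-square statistic and dividing by <math>n\sqrt{(r-1)(c-1)}</math>:

```python
import math

def tschuprow_t_hat(counts):
    """Empirical Tschuprow's T from a table counts[i][j] of observed counts."""
    r, c = len(counts), len(counts[0])
    n = sum(sum(row) for row in counts)
    row = [sum(counts[i]) for i in range(r)]                    # row totals
    col = [sum(counts[i][j] for i in range(r)) for j in range(c)]  # column totals
    # Pearson chi-squared statistic against the independence expectation
    chi2 = sum((counts[i][j] - row[i] * col[j] / n) ** 2 / (row[i] * col[j] / n)
               for i in range(r) for j in range(c))
    return math.sqrt(chi2 / (n * math.sqrt((r - 1) * (c - 1))))

# hypothetical 2x2 sample: moderate positive association
print(tschuprow_t_hat([[30, 10], [10, 30]]))  # 0.5
```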
See also
Other measures of correlation for nominal data:
*Cramér's V
*Phi coefficient
*Uncertainty coefficient
*Lambda coefficient
Other related articles:
*Effect size
References
{{Reflist}}
* Liebetrau, A. (1983). ''Measures of Association'' (Quantitative Applications in the Social Sciences). Sage Publications.