In statistics, Tschuprow's ''T'' is a measure of association between two nominal variables, giving a value between 0 and 1 (inclusive). It is closely related to Cramér's V, coinciding with it for square contingency tables.
It was published by
Alexander Tschuprow (alternative spelling: Chuprov) in 1939.
[Tschuprow, A. A. (1939) ''Principles of the Mathematical Theory of Correlation''; translated by M. Kantorowitsch. W. Hodge & Co.]
Definition
For an ''r'' × ''c'' contingency table with ''r'' rows and ''c'' columns, let <math>\pi_{ij}</math> be the proportion of the population in cell <math>(i,j)</math> and let

:<math>\pi_{i+}=\sum_{j=1}^{c}\pi_{ij}</math> and <math>\pi_{+j}=\sum_{i=1}^{r}\pi_{ij}.</math>

Then the mean square contingency is given as

:<math>\phi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c}\frac{(\pi_{ij}-\pi_{i+}\pi_{+j})^2}{\pi_{i+}\pi_{+j}},</math>

and Tschuprow's ''T'' as

:<math>T = \sqrt{\frac{\phi^2}{\sqrt{(r-1)(c-1)}}}.</math>
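As a minimal illustrative sketch (not from the source), the definition above can be computed directly: form the marginal proportions, accumulate the mean square contingency <math>\phi^2</math>, and normalize by <math>\sqrt{(r-1)(c-1)}</math>. The function name `tschuprow_t` is an assumption for this example.

```python
import math

def tschuprow_t(pi):
    """Tschuprow's T for a table pi[i][j] of population proportions."""
    r, c = len(pi), len(pi[0])
    row = [sum(pi[i][j] for j in range(c)) for i in range(r)]  # pi_{i+}
    col = [sum(pi[i][j] for i in range(r)) for j in range(c)]  # pi_{+j}
    # mean square contingency phi^2
    phi2 = sum((pi[i][j] - row[i] * col[j]) ** 2 / (row[i] * col[j])
               for i in range(r) for j in range(c))
    return math.sqrt(phi2 / math.sqrt((r - 1) * (c - 1)))

# 2x2 table with perfect dependence (all mass on the diagonal):
print(tschuprow_t([[0.5, 0.0], [0.0, 0.5]]))  # 1.0
```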
Properties
''T'' equals zero if and only if independence holds in the table, i.e., if and only if <math>\pi_{ij}=\pi_{i+}\pi_{+j}</math> for all ''i'' and ''j''. ''T'' equals one if and only if there is perfect dependence in the table, i.e., if and only if for each ''i'' there is only one ''j'' such that <math>\pi_{ij}>0</math>, and vice versa. Hence, it can only equal 1 for square tables. In this it differs from Cramér's V, which can be equal to 1 for any rectangular table.
Estimation
If we have a multinomial sample of size ''n'', the usual way to estimate ''T'' from the data is via the formula

:<math>\hat{T} = \sqrt{\frac{\sum_{i=1}^{r}\sum_{j=1}^{c}(p_{ij}-p_{i+}p_{+j})^2/(p_{i+}p_{+j})}{\sqrt{(r-1)(c-1)}}},</math>

where <math>p_{ij}=n_{ij}/n</math> is the proportion of the sample in cell <math>(i,j)</math>. This is the empirical value of ''T''. With <math>\chi^2</math> the Pearson chi-square statistic, this formula can also be written as

:<math>\hat{T} = \sqrt{\frac{\chi^2/n}{\sqrt{(r-1)(c-1)}}}.</math>
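As a hedged sketch of the estimation step (the function name `tschuprow_t_hat` and the example counts are assumptions for illustration), the empirical value can be obtained from a table of observed counts by computing the Pearson chi-square statistic and dividing by <math>n\sqrt{(r-1)(c-1)}</math>:

```python
import math

def tschuprow_t_hat(counts):
    """Empirical Tschuprow's T from a table counts[i][j] of observed counts."""
    r, c = len(counts), len(counts[0])
    n = sum(sum(row) for row in counts)
    row = [sum(counts[i]) for i in range(r)]                    # row totals
    col = [sum(counts[i][j] for i in range(r)) for j in range(c)]  # column totals
    # Pearson chi-squared statistic against the independence expectation
    chi2 = sum((counts[i][j] - row[i] * col[j] / n) ** 2 / (row[i] * col[j] / n)
               for i in range(r) for j in range(c))
    return math.sqrt(chi2 / (n * math.sqrt((r - 1) * (c - 1))))

# hypothetical 2x2 sample: moderate positive association
print(tschuprow_t_hat([[30, 10], [10, 30]]))  # 0.5
```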
See also
Other measures of correlation for nominal data:
*Cramér's V
*Phi coefficient
*Uncertainty coefficient
*Lambda coefficient
Other related articles:
*Effect size
References
{{Reflist}}
* Liebetrau, A. (1983). ''Measures of Association'' (Quantitative Applications in the Social Sciences). Sage Publications.