This is a list of statistical procedures which can be used for the analysis of categorical data, also known as data on the nominal scale and as categorical variables.

General tests

* Bowker's test of symmetry
* Categorical distribution, general model
* Chi-squared test
* Cochran–Armitage test for trend
* Cochran–Mantel–Haenszel statistics
* Correspondence analysis
* Cronbach's alpha
* Diagnostic odds ratio
* G-test
* Generalized estimating equations
* Generalized linear models
* Krichevsky–Trofimov estimator
* Kuder–Richardson Formula 20
* Linear discriminant analysis
* Multinomial distribution
* Multinomial logit
* Multinomial probit
* Multiple correspondence analysis
* Odds ratio
* Poisson regression
* Powered partial least squares discriminant analysis
* Qualitative variation
* Randomization test for goodness of fit
* Relative risk
* Stratified analysis
* Tetrachoric correlation
* Uncertainty coefficient
* Wald test

Binomial data

* Bernstein inequalities (probability theory)
* Binomial regression
* Binomial proportion confidence interval
* Chebyshev's inequality
* Chernoff bound
* Gauss's inequality
* Markov's inequality
* Rule of succession
* Rule of three (medicine)
* Vysochanskiï–Petunin inequality

2 × 2 tables

* Diagnostic odds ratio
* Fisher's exact test
* G-test
* McNemar's test
* Yates's correction for continuity
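As an illustration of two entries in this section, the following is a minimal sketch (not a reference implementation) of the odds ratio and the Pearson chi-squared statistic for a 2 × 2 table, with an optional Yates's correction for continuity. The function names and the cell layout `[[a, b], [c, d]]` are assumptions made for this example, and the example counts are arbitrary.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio (a*d)/(b*c) for the 2x2 table [[a, b], [c, d]].

    The cell layout is an assumption of this sketch.
    """
    if b == 0 or c == 0:
        raise ZeroDivisionError("odds ratio is undefined when b or c is zero")
    return (a * d) / (b * c)


def chi_squared_2x2(a, b, c, d, yates=False):
    """Pearson chi-squared statistic for the 2x2 table [[a, b], [c, d]].

    With yates=True, |ad - bc| is reduced by n/2 (floored at zero)
    before squaring, which is Yates's correction for continuity.
    """
    n = a + b + c + d
    # Product of the two row totals and the two column totals.
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    diff = abs(a * d - b * c)
    if yates:
        diff = max(0.0, diff - n / 2)
    return n * diff ** 2 / denom


# Arbitrary example counts, chosen only to exercise the functions.
table = (10, 20, 30, 40)
print(odds_ratio(*table))
print(chi_squared_2x2(*table))
print(chi_squared_2x2(*table, yates=True))
```

The statistic is returned without a p-value; converting it would require the chi-squared distribution with one degree of freedom, which this sketch leaves out.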

Measures of association

* Aickin's α
* Andres and Marzo's delta
* Bangdiwala's B
* Bennett, Alpert, and Goldstein's S
* Brennan and Prediger's κ
* Coefficient of colligation – Yule's Y
* Coefficient of consistency
* Coefficient of raw agreement
* Conger's Kappa
* Contingency coefficient – Pearson's C
* Cramér's V
* Dice's coefficient
* Fleiss' kappa
* Goodman and Kruskal's lambda
* Guilford's G
* Gwet's AC1
* Hanssen–Kuipers discriminant
* Heidke skill score
* Jaccard index
* Janson and Vegelius' C
* Kappa statistics
* Klecka's tau
* Krippendorff's Alpha
* Kuipers performance index
* Matthews correlation coefficient
* Phi coefficient
* Press' Q
* Renkonen similarity index
* Prevalence adjusted bias adjusted kappa
* Sakoda's adjusted Pearson's C
* Scott's Pi
* Sørensen similarity index
* Stouffer's Z
* True skill statistic
* Tschuprow's T
* Tversky index
* Von Eye's kappa
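Two of the set-based measures listed above, the Jaccard index and Dice's coefficient, can be sketched in a few lines. This is an illustrative example only: the function names are hypothetical, and returning 1.0 when both inputs are empty is a convention assumed here, not something the list specifies.

```python
def jaccard_index(x, y):
    """Jaccard index: size of the intersection over size of the union."""
    x, y = set(x), set(y)
    union = x | y
    if not union:
        return 1.0  # assumed convention for two empty sets
    return len(x & y) / len(union)


def dice_coefficient(x, y):
    """Dice's coefficient: twice the intersection size over the summed set sizes."""
    x, y = set(x), set(y)
    total = len(x) + len(y)
    if total == 0:
        return 1.0  # assumed convention for two empty sets
    return 2 * len(x & y) / total


# Arbitrary example sets sharing two of their three elements.
first = {1, 2, 3}
second = {2, 3, 4}
print(jaccard_index(first, second))
print(dice_coefficient(first, second))
```

The two measures are monotonically related: if J is the Jaccard index, Dice's coefficient equals 2J / (1 + J), so they rank pairs of sets identically.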

Categorical manifest variables as latent variable

* Latent variable model
** Item response theory
*** Rasch model
** Latent class analysis

