This a list of
statistical
Statistics (from German: '' Statistik'', "description of a state, a country") is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data. In applying statistics to a scientific, industr ...
procedures which can be used for the analysis of categorical data, also known as data on the
nominal scale
Level of measurement or scale of measure is a classification that describes the nature of information within the values assigned to variables. Psychologist Stanley Smith Stevens developed the best-known classification with four levels, or scale ...
and as
categorical variables.
General tests
* Bowker's test of symmetry
*
Categorical distribution
In probability theory and statistics, a categorical distribution (also called a generalized Bernoulli distribution, multinoulli distribution) is a discrete probability distribution that describes the possible results of a random variable that ca ...
, general model
*
Chi-squared test
A chi-squared test (also chi-square or test) is a statistical hypothesis test used in the analysis of contingency tables
In statistics, a contingency table (also known as a cross tabulation or crosstab) is a type of table in a matrix format ...
*
Cochran–Armitage test for trend
The Cochran–Armitage test for trend, named for William Cochran and Peter Armitage, is used in categorical data analysis when the aim is to assess for the presence of an association between a variable with two categories and an ordinal variable ...
*
Cochran–Mantel–Haenszel statistics
In statistics, the Cochran–Mantel–Haenszel test (CMH) is a test used in the analysis of stratified or matched categorical data. It allows an investigator to test the association between a binary predictor or treatment and a binary outcome suc ...
*
Correspondence analysis Correspondence analysis (CA) is a multivariate statistical technique proposed by Herman Otto Hartley (Hirschfeld) and later developed by Jean-Paul Benzécri. It is conceptually similar to principal component analysis, but applies to categorical rat ...
*
Cronbach's alpha
Cronbach's alpha (Cronbach's \alpha), also known as tau-equivalent reliability (\rho_T) or coefficient alpha (coefficient \alpha), is a reliability coefficient that provides a method of measuring internal consistency of tests and measures. Numer ...
*
Diagnostic odds ratio
*
G-test
*
Generalized estimating equations
*
Generalized linear models
In statistics, a generalized linear model (GLM) is a flexible generalization of ordinary linear regression. The GLM generalizes linear regression by allowing the linear model to be related to the response variable via a ''link function'' and b ...
*
Krichevsky–Trofimov estimator In information theory, given an unknown stationary source with alphabet ''A'' and a sample ''w'' from , the Krichevsky–Trofimov (KT) estimator produces an estimate ''p'i''(''w'') of the probability of each symbol ''i'' ∈ ''A''. ...
*
Kuder–Richardson Formula 20
*
Linear discriminant analysis
Linear discriminant analysis (LDA), normal discriminant analysis (NDA), or discriminant function analysis is a generalization of Fisher's linear discriminant, a method used in statistics and other fields, to find a linear combination of features ...
*
Multinomial distribution
In probability theory, the multinomial distribution is a generalization of the binomial distribution. For example, it models the probability of counts for each side of a ''k''-sided dice rolled ''n'' times. For ''n'' independent trials each of w ...
*
Multinomial logit
In statistics, multinomial logistic regression is a classification method that generalizes logistic regression to multiclass problems, i.e. with more than two possible discrete outcomes. That is, it is a model that is used to predict the pro ...
*
Multinomial probit
In statistics and econometrics, the multinomial probit model is a generalization of the probit model used when there are several possible categories that the dependent variable can fall into. As such, it is an alternative to the multinomial logi ...
*
Multiple correspondence analysis In statistics, multiple correspondence analysis (MCA) is a data analysis technique for nominal categorical data, used to detect and represent underlying structures in a data set. It does this by representing data as points in a low-dimensional Eucl ...
*
Odds ratio
An odds ratio (OR) is a statistic that quantifies the strength of the association between two events, A and B. The odds ratio is defined as the ratio of the odds of A in the presence of B and the odds of A in the absence of B, or equivalently (du ...
*
Poisson regression
In statistics, Poisson regression is a generalized linear model form of regression analysis used to model count data and contingency tables. Poisson regression assumes the response variable ''Y'' has a Poisson distribution, and assumes the loga ...
* Powered partial least squares discriminant analysis
*
Qualitative variation
* Randomization test for goodness of fit
*
Relative risk
The relative risk (RR) or risk ratio is the ratio of the probability of an outcome in an exposed group to the probability of an outcome in an unexposed group. Together with risk difference and odds ratio, relative risk measures the association b ...
*
Stratified analysis
In statistics, correlation or dependence is any statistical relationship, whether causal or not, between two random variables or bivariate data. Although in the broadest sense, "correlation" may indicate any type of association, in statistic ...
*
Tetrachoric correlation
*
Uncertainty coefficient
In statistics, the uncertainty coefficient, also called proficiency, entropy coefficient or Theil's U, is a measure of nominal association. It was first introduced by Henri Theil and is based on the concept of information entropy.
Definition
...
*
Wald test
In statistics, the Wald test (named after Abraham Wald) assesses constraints on statistical parameters based on the weighted distance between the unrestricted estimate and its hypothesized value under the null hypothesis, where the weight is th ...
Binomial data
*
Bernstein inequalities (probability theory) In probability theory, Bernstein inequalities give bounds on the probability that the sum of random variables deviates from its mean. In the simplest case, let ''X''1, ..., ''X'n'' be independent Bernoulli trial, Bernoulli random variab ...
*
Binomial regression
In statistics, binomial regression is a regression analysis technique in which the response (often referred to as ''Y'') has a binomial distribution: it is the number of successes in a series of independent Bernoulli trials, where each trial ha ...
*
Binomial proportion confidence interval
In statistics, a binomial proportion confidence interval is a confidence interval for the probability of success calculated from the outcome of a series of success–failure experiments (Bernoulli trial, Bernoulli trials). In other words, a binomia ...
*
Chebyshev's inequality
In probability theory, Chebyshev's inequality (also called the Bienaymé–Chebyshev inequality) guarantees that, for a wide class of probability distributions, no more than a certain fraction of values can be more than a certain distance from th ...
*
Chernoff bound
In probability theory, the Chernoff bound gives exponentially decreasing bounds on tail distributions of sums of independent random variables. Despite being named after Herman Chernoff, the author of the paper it first appeared in, the result is d ...
*
Gauss's inequality
In probability theory, Gauss's inequality (or the Gauss inequality) gives an upper bound on the probability that a unimodal random variable lies more than any given distance from its mode.
Let ''X'' be a unimodal random variable with mode ''m'', ...
*
Markov's inequality
In probability theory, Markov's inequality gives an upper bound for the probability that a non-negative function of a random variable is greater than or equal to some positive constant. It is named after the Russian mathematician Andrey Marko ...
*
Rule of succession
In probability theory, the rule of succession is a formula introduced in the 18th century by Pierre-Simon Laplace in the course of treating the sunrise problem. The formula is still used, particularly to estimate underlying probabilities when ...
*
Rule of three (medicine)
*
Vysochanskiï–Petunin inequality
2 × 2 tables
*
Chi-squared test
A chi-squared test (also chi-square or test) is a statistical hypothesis test used in the analysis of contingency tables
In statistics, a contingency table (also known as a cross tabulation or crosstab) is a type of table in a matrix format ...
*
Diagnostic odds ratio
*
Fisher's exact test
Fisher's exact test is a statistical significance test used in the analysis of contingency tables. Although in practice it is employed when sample sizes are small, it is valid for all sample sizes. It is named after its inventor, Ronald Fisher, a ...
*
G-test
*
Odds ratio
An odds ratio (OR) is a statistic that quantifies the strength of the association between two events, A and B. The odds ratio is defined as the ratio of the odds of A in the presence of B and the odds of A in the absence of B, or equivalently (du ...
*
Relative risk
The relative risk (RR) or risk ratio is the ratio of the probability of an outcome in an exposed group to the probability of an outcome in an unexposed group. Together with risk difference and odds ratio, relative risk measures the association b ...
*
McNemar's test
In statistics, McNemar's test is a statistical test used on paired nominal data. It is applied to 2 × 2 contingency tables with a dichotomous trait, with matched pairs of subjects, to determine whether the row and column marginal fre ...
*
Yates's correction for continuity
Measures of association
* Aickin's α
*
Andres and Marzo's delta
*
Bangdiwala's B
*
Bennett, Alpert, and Goldstein’s S
*
Brennan and Prediger’s κ
*
Coefficient of colligation - Yule's Y
*
Coefficient of consistency
*
Coefficient of raw agreement
*
Conger’s Kappa
*
Contingency coefficient – Pearson's C
*
Cramér's V In statistics, Cramér's V (sometimes referred to as Cramér's phi and denoted as φ''c'') is a measure of association between two nominal variables, giving a value between 0 and +1 (inclusive). It is based on Pearson's chi-squared statistic an ...
*
Dice's coefficient
*
Fleiss' kappa
Fleiss' kappa (named after Joseph L. Fleiss) is a statistical measure for assessing the reliability of agreement between a fixed number of raters when assigning categorical ratings to a number of items or classifying items. This contrasts with o ...
*
Goodman and Kruskal's lambda
In probability theory and statistics, Goodman & Kruskal's lambda (\lambda) is a measure of proportional reduction in error in cross tabulation analysis. For any sample with a nominal independent variable and dependent variable (or ones that can ...
*
Guilford’s G
*
Gwet’s AC1
*
Hanssen–Kuipers discriminant
*
Heidke skill score
*
Jaccard index
The Jaccard index, also known as the Jaccard similarity coefficient, is a statistic used for gauging the similarity and diversity of sample sets. It was developed by Grove Karl Gilbert in 1884 as his ratio of verification (v) and now is fre ...
*
Janson and Vegelius’ C
*
Kappa statistics
*
Klecka's tau
*
Krippendorff's Alpha
*
Kuipers performance index
*
Matthews correlation coefficient
In statistics, the phi coefficient (or mean square contingency coefficient and denoted by φ or rφ) is a measure of association for two binary variables. In machine learning, it is known as the Matthews correlation coefficient (MCC) and used as a ...
*
Phi coefficient
In statistics, the phi coefficient (or mean square contingency coefficient and denoted by φ or rφ) is a measure of association for two binary variables. In machine learning, it is known as the Matthews correlation coefficient (MCC) and used as a ...
*
Press' Q
*
Renkonen similarity index
*
Prevalence adjusted bias adjusted kappa
*
Sakoda's adjusted Pearson's C
*
Scott's Pi
*
Sørensen similarity index
*
Stouffer’s Z
*
True skill statistic
*
Tschuprow's T
In statistics, Tschuprow's ''T'' is a measure of association between two nominal variables, giving a value between 0 and 1 (inclusive). It is closely related to Cramér's V, coinciding with it for square contingency tables.
It was published by ...
*
Tversky index
*
Von Eye's kappa
Categorical manifest variables as latent variable
*
Latent variable model
A latent variable model is a statistical model that relates a set of observable variables (also called ''manifest variables'' or ''indicators'') to a set of latent variables.
It is assumed that the responses on the indicators or manifest variabl ...
**
Item response theory
In psychometrics, item response theory (IRT) (also known as latent trait theory, strong true score theory, or modern mental test theory) is a paradigm for the design, analysis, and scoring of tests, questionnaires, and similar instruments measur ...
***
Rasch model
The Rasch model, named after Georg Rasch, is a psychometric model for analyzing categorical data, such as answers to questions on a reading assessment or questionnaire responses, as a function of the trade-off between the respondent's abilities, ...
**
Latent class analysis In statistics, a latent class model (LCM) relates a set of observed (usually discrete) multivariate variables to a set of latent variables. It is a type of latent variable model. It is called a latent class model because the latent variable is di ...
See also
*
Categorical distribution
In probability theory and statistics, a categorical distribution (also called a generalized Bernoulli distribution, multinoulli distribution) is a discrete probability distribution that describes the possible results of a random variable that ca ...
{{statistics
Categorical data
In statistics, a categorical variable (also called qualitative variable) is a variable that can take on one of a limited, and usually fixed, number of possible values, assigning each individual or other unit of observation to a particular group o ...
Analysis
Analysis ( : analyses) is the process of breaking a complex topic or substance into smaller parts in order to gain a better understanding of it. The technique has been applied in the study of mathematics and logic since before Aristotle (3 ...