In
statistics
Statistics (from German language, German: ''wikt:Statistik#German, Statistik'', "description of a State (polity), state, a country") is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of ...
, specifically
regression analysis
In statistical modeling, regression analysis is a set of statistical processes for estimating the relationships between a dependent variable (often called the 'outcome' or 'response' variable, or a 'label' in machine learning parlance) and one ...
, a binary regression estimates a relationship between one or more
explanatory variable
Dependent and independent variables are variables in mathematical modeling, statistical modeling and experimental sciences. Dependent variables receive this name because, in an experiment, their values are studied under the supposition or deman ...
s and a single output
binary variable
Binary data is data whose unit can take on only two possible states. These are often labelled as 0 and 1 in accordance with the binary numeral system and Boolean algebra.
Binary data occurs in many different technical and scientific fields, wher ...
. Generally the probability of the two alternatives is modeled, instead of simply outputting a single value, as in
linear regression.
Binary regression is usually analyzed as a special case of
binomial regression
In statistics, binomial regression is a regression analysis technique in which the response (often referred to as ''Y'') has a binomial distribution: it is the number of successes in a series of independent Bernoulli trials, where each trial h ...
, with a single outcome (
), and one of the two alternatives considered as "success" and coded as 1: the value is the
count
Count (feminine: countess) is a historical title of nobility in certain European countries, varying in relative status, generally of middling rank in the hierarchy of nobility. Pine, L. G. ''Titles: How the King Became His Majesty''. New York: ...
of successes in 1 trial, either 0 or 1. The most common binary regression models are the
logit model
In statistics, the logistic model (or logit model) is a statistical model that models the probability of an event taking place by having the log-odds for the event be a linear combination of one or more independent variables. In regression ana ...
(
logistic regression
In statistics, the logistic model (or logit model) is a statistical model that models the probability of an event taking place by having the log-odds for the event be a linear function (calculus), linear combination of one or more independent var ...
) and the
probit model
In statistics, a probit model is a type of regression where the dependent variable can take only two values, for example married or not married. The word is a portmanteau, coming from ''probability'' + ''unit''. The purpose of the model is to est ...
(
probit regression
In statistics, a probit model is a type of regression where the dependent variable can take only two values, for example married or not married. The word is a portmanteau, coming from ''probability'' + ''unit''. The purpose of the model is to e ...
).
Applications
Binary regression is principally applied either for prediction (
binary classification
Binary classification is the task of classifying the elements of a set into two groups (each called ''class'') on the basis of a classification rule. Typical binary classification problems include:
* Medical testing to determine if a patient has c ...
), or for estimating the
association
Association may refer to:
*Club (organization), an association of two or more people united by a common interest or goal
*Trade association, an organization founded and funded by businesses that operate in a specific industry
*Voluntary associatio ...
between the explanatory variables and the output. In economics, binary regressions are used to model
binary choice
In economics, discrete choice models, or qualitative choice models, describe, explain, and predict choices between two or more discrete alternatives, such as entering or not entering the labor market, or choosing between modes of transport. Such ...
.
Interpretations
Binary regression models can be interpreted as
latent variable model
A latent variable model is a statistical model that relates a set of observable variables (also called ''manifest variables'' or ''indicators'') to a set of latent variables.
It is assumed that the responses on the indicators or manifest variabl ...
s, together with a measurement model; or as probabilistic models, directly modeling the probability.
Latent variable model
The latent variable interpretation has traditionally been used in
bioassay
A bioassay is an analytical method to determine the concentration or potency of a substance by its effect on living animals or plants (''in vivo''), or on living cells or tissues(''in vitro''). A bioassay can be either quantal or quantitative, dir ...
, yielding the
probit model
In statistics, a probit model is a type of regression where the dependent variable can take only two values, for example married or not married. The word is a portmanteau, coming from ''probability'' + ''unit''. The purpose of the model is to est ...
, where normal variance and a cutoff are assumed. The latent variable interpretation is also used in
item response theory
In psychometrics, item response theory (IRT) (also known as latent trait theory, strong true score theory, or modern mental test theory) is a paradigm for the design, analysis, and scoring of tests, questionnaires, and similar instruments measuring ...
(IRT).
Formally, the latent variable interpretation posits that the outcome ''y'' is related to a vector of explanatory variables ''x'' by
: