statistics Statistics (from German language, German: ''wikt:Statistik#German, Statistik'', "description of a State (polity), state, a country") is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of ...

, specifically

regression analysis In statistical modeling, regression analysis is a set of statistical processes for estimating the relationships between a dependent variable (often called the 'outcome' or 'response' variable, or a 'label' in machine learning parlance) and one ...

, a binary regression estimates a relationship between one or more

explanatory variable Dependent and independent variables are variables in mathematical modeling, statistical modeling and experimental sciences. Dependent variables receive this name because, in an experiment, their values are studied under the supposition or deman ...

s and a single output

binary variable Binary data is data whose unit can take on only two possible states. These are often labelled as 0 and 1 in accordance with the binary numeral system and Boolean algebra. Binary data occurs in many different technical and scientific fields, wher ...

. Generally the probability of the two alternatives is modeled, instead of simply outputting a single value, as in linear regression. Binary regression is usually analyzed as a special case of

binomial regression In statistics, binomial regression is a regression analysis technique in which the response (often referred to as ''Y'') has a binomial distribution: it is the number of successes in a series of independent Bernoulli trials, where each trial h ...

, with a single outcome (

n = 1

), and one of the two alternatives considered as "success" and coded as 1: the value is the

count Count (feminine: countess) is a historical title of nobility in certain European countries, varying in relative status, generally of middling rank in the hierarchy of nobility. Pine, L. G. ''Titles: How the King Became His Majesty''. New York: ...

of successes in 1 trial, either 0 or 1. The most common binary regression models are the

logit model In statistics, the logistic model (or logit model) is a statistical model that models the probability of an event taking place by having the log-odds for the event be a linear combination of one or more independent variables. In regression ana ...

(

logistic regression In statistics, the logistic model (or logit model) is a statistical model that models the probability of an event taking place by having the log-odds for the event be a linear function (calculus), linear combination of one or more independent var ...

) and the

probit model In statistics, a probit model is a type of regression where the dependent variable can take only two values, for example married or not married. The word is a portmanteau, coming from ''probability'' + ''unit''. The purpose of the model is to est ...

(

probit regression In statistics, a probit model is a type of regression where the dependent variable can take only two values, for example married or not married. The word is a portmanteau, coming from ''probability'' + ''unit''. The purpose of the model is to e ...

Applications

Binary regression is principally applied either for prediction (

binary classification Binary classification is the task of classifying the elements of a set into two groups (each called ''class'') on the basis of a classification rule. Typical binary classification problems include: * Medical testing to determine if a patient has c ...

), or for estimating the

association Association may refer to: *Club (organization), an association of two or more people united by a common interest or goal *Trade association, an organization founded and funded by businesses that operate in a specific industry *Voluntary associatio ...

between the explanatory variables and the output. In economics, binary regressions are used to model

binary choice In economics, discrete choice models, or qualitative choice models, describe, explain, and predict choices between two or more discrete alternatives, such as entering or not entering the labor market, or choosing between modes of transport. Such ...

Interpretations

Binary regression models can be interpreted as

latent variable model A latent variable model is a statistical model that relates a set of observable variables (also called ''manifest variables'' or ''indicators'') to a set of latent variables. It is assumed that the responses on the indicators or manifest variabl ...

s, together with a measurement model; or as probabilistic models, directly modeling the probability.

Latent variable model

The latent variable interpretation has traditionally been used in

bioassay A bioassay is an analytical method to determine the concentration or potency of a substance by its effect on living animals or plants (''in vivo''), or on living cells or tissues(''in vitro''). A bioassay can be either quantal or quantitative, dir ...

, yielding the

, where normal variance and a cutoff are assumed. The latent variable interpretation is also used in

item response theory In psychometrics, item response theory (IRT) (also known as latent trait theory, strong true score theory, or modern mental test theory) is a paradigm for the design, analysis, and scoring of tests, questionnaires, and similar instruments measuring ...

(IRT). Formally, the latent variable interpretation posits that the outcome ''y'' is related to a vector of explanatory variables ''x'' by :

y=1^*>0 /math>

where y^*=x\beta +\varepsilon and \varepsilon \mid x\sim G,  is a vector of

parameters A parameter (), generally, is any characteristic that can help in defining or classifying a particular system (meaning an event, project, object, situation, etc.). That is, a parameter is an element of a system that is useful, or critical, when ...

and ''G'' is a

probability distribution In probability theory and statistics, a probability distribution is the mathematical function that gives the probabilities of occurrence of different possible outcomes for an experiment. It is a mathematical description of a random phenomenon i ...

. This model can be applied in many economic contexts. For instance, the outcome can be the decision of a manager whether invest to a program,

y^*

is the expected net discounted cash flow and ''x'' is a vector of variables which can affect the cash flow of this program. Then the manager will invest only when she expects the net discounted cash flow to be positive. Often, the

error term In mathematics and statistics, an error term is an additive type of error. Common examples include: * errors and residuals in statistics, e.g. in linear regression * the error term in numerical integration In analysis, numerical integration ...

\varepsilon

is assumed to follow a

normal distribution In statistics, a normal distribution or Gaussian distribution is a type of continuous probability distribution for a real-valued random variable. The general form of its probability density function is : f(x) = \frac e^ The parameter \mu ...

conditional on the explanatory variables ''x''. This generates the standard

.Bliss, C. I. (1934). "The Method of Probits". Science 79 (2037): 38–39.

Probabilistic model

The simplest direct probabilistic model is the

, which models the

log-odds In statistics, the logit ( ) function is the quantile function associated with the standard logistic distribution. It has many uses in data analysis and machine learning, especially in data transformations. Mathematically, the logit is the ...

as a linear function of the explanatory variable or variables. The logit model is "simplest" in the sense of

generalized linear model In statistics, a generalized linear model (GLM) is a flexible generalization of ordinary linear regression. The GLM generalizes linear regression by allowing the linear model to be related to the response variable via a ''link function'' and b ...

s (GLIM): the log-odds are the natural parameter for the

exponential family In probability and statistics, an exponential family is a parametric set of probability distributions of a certain form, specified below. This special form is chosen for mathematical convenience, including the enabling of the user to calculate ...

of the Bernoulli distribution, and thus it is the simplest to use for computations. Another direct probabilistic model is the

linear probability model In statistics, a linear probability model (LPM) is a special case of a binary regression model. Here the dependent variable for each observation takes values which are either 0 or 1. The probability of observing a 0 or 1 in any one case is treated ...

, which models the probability itself as a linear function of the explanatory variables. A drawback of the linear probability model is that, for some values of the explanatory variables, the model will predict probabilities less than zero or greater than one.

References

* * {{statistics-stub Regression analysis

Applications

Interpretations

Latent variable model

Probabilistic model

See also

References