In statistics, a probit model is a type of regression where the dependent variable can take only two values, for example married or not married. The word is a portmanteau of ''probability'' and ''unit''. Because observations can be classified on the basis of their predicted probabilities, a probit model is also a type of binary classification model.
A probit model is a popular specification for a binary response model. As such it treats the same set of problems as does logistic regression using similar techniques. When viewed in the generalized linear model framework, the probit model employs a probit link function. It is most often estimated using the maximum likelihood procedure, such an estimation being called a probit regression.
Conceptual framework
Suppose a response variable ''Y'' is ''binary'', that is, it can have only two possible outcomes, which we will denote as 1 and 0. For example, ''Y'' may represent presence/absence of a certain condition, success/failure of some device, answer yes/no on a survey, etc. We also have a vector of regressors ''X'', which are assumed to influence the outcome ''Y''. Specifically, we assume that the model takes the form
:<math>P(Y = 1 \mid X) = \Phi(X^\mathsf{T}\beta),</math>
where ''P'' is the probability and ''Φ'' is the cumulative distribution function of the standard normal distribution. The parameters ''β'' are typically estimated by maximum likelihood.
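As a rough illustration of the model form, the following sketch evaluates P(''Y'' = 1 | ''X'') as the standard normal CDF of the linear index. The function name `probit_probability` and the coefficient values are assumptions chosen for the example, not part of the model definition.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def probit_probability(x, beta):
    """Return P(Y = 1 | X = x) = Phi(x' beta) for one observation."""
    index = sum(x_k * b_k for x_k, b_k in zip(x, beta))  # linear index x' beta
    return STD_NORMAL.cdf(index)


# Hypothetical coefficients: intercept -1.0 and slope 0.5.
beta = [-1.0, 0.5]
for x_value in (0.0, 2.0, 4.0):
    # The leading 1.0 is the constant regressor for the intercept.
    p = probit_probability([1.0, x_value], beta)
    print(f"x = {x_value}: P(Y = 1 | x) = {p:.4f}")
```

With these assumed coefficients the index is zero at ''x'' = 2, so the predicted probability there is one half.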
It is possible to motivate the probit model as a latent variable model. Suppose there exists an auxiliary random variable
:<math>Y^\ast = X^\mathsf{T}\beta + \varepsilon,</math>
where ''ε'' ~ ''N''(0, 1). Then ''Y'' can be viewed as an indicator for whether this latent variable is positive:
:<math>Y = \begin{cases} 1 & \text{if } Y^\ast > 0, \\ 0 & \text{otherwise.} \end{cases}</math>
The use of the standard normal distribution causes no loss of generality compared with the use of a normal distribution with an arbitrary mean and standard deviation, because adding a fixed amount to the mean can be compensated by subtracting the same amount from the intercept, and multiplying the standard deviation by a fixed amount can be compensated by multiplying the weights by the same amount.
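To see that the two models are equivalent, note that
:<math>\begin{align}
P(Y = 1 \mid X) &= P(Y^\ast > 0 \mid X) \\
&= P(X^\mathsf{T}\beta + \varepsilon > 0 \mid X) \\
&= P(\varepsilon > -X^\mathsf{T}\beta \mid X) \\
&= P(\varepsilon < X^\mathsf{T}\beta \mid X) && \text{by symmetry of the normal distribution} \\
&= \Phi(X^\mathsf{T}\beta).
\end{align}</math>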
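The equivalence of the latent variable formulation and the probit form can also be checked by simulation. The sketch below draws the latent variable many times for one fixed index value and compares the share of positive draws with the standard normal CDF. The index value 0.7, the number of draws, and the function name are assumptions for illustration only.

```python
import random
from statistics import NormalDist


def simulate_share_positive(index, draws=200_000, seed=0):
    """Share of draws in which Y* = index + eps is positive, eps ~ N(0, 1)."""
    rng = random.Random(seed)
    positive = sum(1 for _ in range(draws) if index + rng.gauss(0.0, 1.0) > 0)
    return positive / draws


index = 0.7  # hypothetical value of the linear index X' beta
simulated = simulate_share_positive(index)
exact = NormalDist().cdf(index)
print(f"simulated share: {simulated:.4f}, Phi(index): {exact:.4f}")
```

The two numbers should agree up to simulation noise, which shrinks as the number of draws grows.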
Model estimation
Maximum likelihood estimation
Suppose data set <math>\{y_i, x_i\}_{i=1}^n</math> contains ''n'' independent statistical units corresponding to the model above.
For a single observation, conditional on the vector of inputs of that observation, we have:
:<math>P(y_i = 1 \mid x_i) = \Phi(x_i^\mathsf{T}\beta)</math>
:<math>P(y_i = 0 \mid x_i) = 1 - \Phi(x_i^\mathsf{T}\beta)</math>
where <math>x_i</math> is a <math>K \times 1</math> vector of inputs, and <math>\beta</math> is a <math>K \times 1</math> vector of coefficients.
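The likelihood of a single observation <math>(y_i, x_i)</math> is then
:<math>\mathcal{L}(\beta; y_i, x_i) = \Phi(x_i^\mathsf{T}\beta)^{y_i}\,[1 - \Phi(x_i^\mathsf{T}\beta)]^{(1 - y_i)}</math>
In fact, if <math>y_i = 1</math>, then <math>\mathcal{L}(\beta; y_i, x_i) = \Phi(x_i^\mathsf{T}\beta)</math>, and if <math>y_i = 0</math>, then <math>\mathcal{L}(\beta; y_i, x_i) = 1 - \Phi(x_i^\mathsf{T}\beta)</math>.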
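Since the observations are independent and identically distributed, the likelihood of the entire sample, or the joint likelihood, will be equal to the product of the likelihoods of the single observations:
:<math>\mathcal{L}(\beta; Y, X) = \prod_{i=1}^n \Phi(x_i^\mathsf{T}\beta)^{y_i}\,[1 - \Phi(x_i^\mathsf{T}\beta)]^{(1 - y_i)}</math>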
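The joint log-likelihood function is thus
:<math>\ln\mathcal{L}(\beta; Y, X) = \sum_{i=1}^n \Big( y_i \ln\Phi(x_i^\mathsf{T}\beta) + (1 - y_i)\ln\big(1 - \Phi(x_i^\mathsf{T}\beta)\big) \Big)</math>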
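The joint log-likelihood is a sum over observations of the log of Φ at the linear index when the outcome is 1, and the log of one minus Φ when it is 0. It can be sketched in a few lines. The function name and the small data set are assumptions made up for the example.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def probit_log_likelihood(beta, X, y):
    """Sum over i of y_i ln Phi(x_i'b) + (1 - y_i) ln(1 - Phi(x_i'b))."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        index = sum(x_k * b_k for x_k, b_k in zip(x_i, beta))
        # 1 - Phi(t) equals Phi(-t) by symmetry; Phi(-t) avoids
        # cancellation when t is large.
        p = STD_NORMAL.cdf(index) if y_i == 1 else STD_NORMAL.cdf(-index)
        total += math.log(p)
    return total


# Hypothetical data: a constant column and one regressor.
X = [[1.0, -1.0], [1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
y = [0, 0, 1, 1]
print(probit_log_likelihood([0.0, 0.0], X, y))  # all probabilities equal 1/2
print(probit_log_likelihood([0.0, 1.0], X, y))  # a positive slope fits better
```

For very large absolute index values the probability can underflow to zero and `math.log` will raise an error; a production implementation would need a more careful log-CDF.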
The estimator <math>\hat\beta</math> which maximizes this function will be consistent, asymptotically normal and efficient provided that <math>\operatorname{E}[XX^\mathsf{T}]</math> exists and is not singular. It can be shown that this log-likelihood function is globally concave in ''β'', and therefore standard numerical algorithms for optimization will converge rapidly to the unique maximum.
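One such numerical algorithm is Fisher scoring, a variant of Newton's method that uses the expected information matrix in place of the observed Hessian. The sketch below is a minimal pure-Python version under stated assumptions: the helper names, the tolerance, the starting value of zero, and the small data set are all chosen for illustration, and the data are deliberately not perfectly separated, since otherwise no finite maximum exists.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def solve_linear(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(M[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (M[r][n] - tail) / M[r][r]
    return z


def score_and_information(beta, X, y):
    """Gradient of the log-likelihood and the expected information matrix."""
    k = len(beta)
    score = [0.0] * k
    info = [[0.0] * k for _ in range(k)]
    for x_i, y_i in zip(X, y):
        index = sum(a * b for a, b in zip(x_i, beta))
        pdf = STD_NORMAL.pdf(index)
        cdf = STD_NORMAL.cdf(index)
        upper = STD_NORMAL.cdf(-index)  # 1 - Phi(index)
        w = pdf / cdf if y_i == 1 else -pdf / upper
        h = pdf * pdf / (cdf * upper)
        for a in range(k):
            score[a] += w * x_i[a]
            for c in range(k):
                info[a][c] += h * x_i[a] * x_i[c]
    return score, info


def fit_probit(X, y, tol=1e-10, max_iter=100):
    """Fisher scoring: beta <- beta + I(beta)^{-1} score(beta)."""
    beta = [0.0] * len(X[0])
    for _ in range(max_iter):
        score, info = score_and_information(beta, X, y)
        step = solve_linear(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


# Hypothetical data: a constant column and one regressor.
X = [[1.0, v] for v in (-2, -1, -1, 0, 0, 1, 1, 2)]
y = [0, 0, 1, 0, 1, 0, 1, 1]
beta_hat = fit_probit(X, y)
print("estimated coefficients:", [round(b, 4) for b in beta_hat])
```

Because the example data are symmetric around zero, the estimated intercept should be close to zero and the slope positive. The sketch has no safeguards such as step halving or a singularity check.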
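The asymptotic distribution of <math>\hat\beta</math> is given by
:<math>\sqrt{n}(\hat\beta - \beta)\ \xrightarrow{d}\ \mathcal{N}(0, \Omega^{-1}),</math>
where
:<math>\Omega = \operatorname{E}\left[\frac{\varphi^2(X^\mathsf{T}\beta)}{\Phi(X^\mathsf{T}\beta)\,(1 - \Phi(X^\mathsf{T}\beta))}\, XX^\mathsf{T}\right], \qquad \hat\Omega = \frac{1}{n}\sum_{i=1}^n \frac{\varphi^2(x_i^\mathsf{T}\hat\beta)}{\Phi(x_i^\mathsf{T}\hat\beta)\,(1 - \Phi(x_i^\mathsf{T}\hat\beta))}\, x_i x_i^\mathsf{T},</math>
and <math>\varphi = \Phi'</math> is the probability density function (PDF) of the standard normal distribution.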
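In practice the asymptotic result is used to attach standard errors to the estimated coefficients. The sketch below computes the sample analogue of Ω as an average of weighted outer products of the regressors, with weight φ² / (Φ(1 − Φ)) evaluated at each fitted index. It then takes the square roots of the diagonal of its inverse divided by ''n''. The coefficient vector used here is a hypothetical placeholder rather than an actual estimate, and the inversion helper is restricted to two coefficients to keep the example short.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def omega_hat(beta_hat, X):
    """Average over i of phi^2 / (Phi (1 - Phi)) * x_i x_i'."""
    n, k = len(X), len(beta_hat)
    omega = [[0.0] * k for _ in range(k)]
    for x_i in X:
        index = sum(a * b for a, b in zip(x_i, beta_hat))
        pdf = STD_NORMAL.pdf(index)
        cdf = STD_NORMAL.cdf(index)
        weight = pdf * pdf / (cdf * STD_NORMAL.cdf(-index))
        for a in range(k):
            for c in range(k):
                omega[a][c] += weight * x_i[a] * x_i[c] / n
    return omega


def standard_errors_2x2(omega, n):
    """Square roots of the diagonal of Omega^{-1} / n (two coefficients only)."""
    (a, b), (c, d) = omega
    det = a * d - b * c
    inverse_diagonal = (d / det, a / det)
    return [math.sqrt(v / n) for v in inverse_diagonal]


# Hypothetical data and a hypothetical coefficient vector (not a fitted MLE).
X = [[1.0, v] for v in (-2, -1, -1, 0, 0, 1, 1, 2)]
beta_hat = [0.0, 0.5]
omega = omega_hat(beta_hat, X)
print("standard errors:", standard_errors_2x2(omega, len(X)))
```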
Semi-parametric and non-parametric maximum likelihood methods for probit-type and other related models are also available.
Berkson's minimum chi-square method
This method can be applied only when there are many observations of the response variable <math>y_i</math> having the same value of the vector of regressors <math>x_i</math> (such a situation may be referred to as "many observations per cell"). More specifically, the model can be formulated as follows.
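Suppose among ''n'' observations <math>\{y_i, x_i\}_{i=1}^n</math> there are only ''T'' distinct values of the regressors, which can be denoted as <math>\{x_{(1)}, \ldots, x_{(T)}\}</math>. Let <math>n_t</math> be the number of observations with <math>x_i = x_{(t)}</math>, and <math>r_t</math> the number of such observations with <math>y_i = 1</math>. We assume that there are indeed "many" observations per each "cell": for each <math>t</math>, <math>\lim_{n \to \infty} n_t / n = c_t > 0</math>.
Denote
:<math>\hat p_t = r_t / n_t</math>
:<math>\hat\sigma_t^2 = \frac{1}{n_t}\, \frac{\hat p_t (1 - \hat p_t)}{\varphi^2\big(\Phi^{-1}(\hat p_t)\big)}</math>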
Then Berkson's minimum chi-square estimator is a generalized least squares estimator in a regression of <math>\Phi^{-1}(\hat p_t)</math> on <math>x_{(t)}</math> with weights <math>\hat\sigma_t^{-2}</math>:
:<math>\hat\beta = \left( \sum_{t=1}^T \hat\sigma_t^{-2}\, x_{(t)} x_{(t)}^\mathsf{T} \right)^{-1} \sum_{t=1}^T \hat\sigma_t^{-2}\, x_{(t)}\, \Phi^{-1}(\hat p_t)</math>
It can be shown that this estimator is consistent (as ''n''→∞ and ''T'' fixed), asymptotically normal and efficient. Its advantage is the presence of a closed-form formula for the estimator. However, it is only meaningful to carry out this analysis when individual observations are not available, only their aggregated counts <math>r_t</math>, <math>n_t</math>, and <math>x_{(t)}</math> (for example in the analysis of voting behavior).
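The closed form makes the estimator easy to sketch for grouped data. The version below is restricted, as a simplifying assumption, to an intercept plus a single regressor, so the weighted normal equations reduce to a 2×2 system that can be solved explicitly. The function name and the cell counts are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def berkson_probit(cells):
    """Minimum chi-square estimate for an intercept and one regressor.

    `cells` holds tuples (x_t, n_t, r_t): regressor value, number of
    observations in the cell, and number of those with y = 1.
    Every cell needs 0 < r_t < n_t so that the inverse CDF is finite.
    """
    s_w = s_wx = s_wxx = s_wz = s_wxz = 0.0
    for x_t, n_t, r_t in cells:
        p_hat = r_t / n_t
        z_t = STD_NORMAL.inv_cdf(p_hat)  # transformed response Phi^{-1}(p_hat)
        var_t = p_hat * (1.0 - p_hat) / (n_t * STD_NORMAL.pdf(z_t) ** 2)
        w_t = 1.0 / var_t  # GLS weight, the inverse of sigma_t^2
        s_w += w_t
        s_wx += w_t * x_t
        s_wxx += w_t * x_t * x_t
        s_wz += w_t * z_t
        s_wxz += w_t * x_t * z_t
    # Solve the 2x2 weighted normal equations explicitly.
    det = s_w * s_wxx - s_wx * s_wx
    intercept = (s_wxx * s_wz - s_wx * s_wxz) / det
    slope = (s_w * s_wxz - s_wx * s_wz) / det
    return intercept, slope


# Hypothetical aggregated counts: (x_t, n_t, r_t) for four cells.
cells = [(0.0, 50, 8), (1.0, 50, 17), (2.0, 50, 29), (3.0, 50, 41)]
print(berkson_probit(cells))
```

No iteration is needed, which is the practical appeal of the method when only aggregated counts are available.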
Gibbs sampling
Gibbs sampling of a probit model is possible because regression models typically use normal prior distributions over the weights, and this distribution is conjugate with the normal distribution of the errors (and hence of the latent variables <math>Y^\ast</math>). The model can be described as
:<math>\begin{align}
\beta &\sim \mathcal{N}(b_0, B_0) \\
y_i^\ast \mid x_i, \beta &\sim \mathcal{N}(x_i^\mathsf{T}\beta, 1) \\
y_i &= \begin{cases} 1 & \text{if } y_i^\ast > 0 \\ 0 & \text{otherwise} \end{cases}
\end{align}</math>
From this, we can determine the full conditional densities needed:
:<math>\begin{align}
B &= (B_0^{-1} + X^\mathsf{T}X)^{-1} \\
\beta \mid y^\ast &\sim \mathcal{N}\big(B(B_0^{-1} b_0 + X^\mathsf{T} y^\ast),\, B\big) \\
y_i^\ast \mid y_i = 0, x_i, \beta &\sim \mathcal{N}(x_i^\mathsf{T}\beta, 1)\,[y_i^\ast \le 0] \\
y_i^\ast \mid y_i = 1, x_i, \beta &\sim \mathcal{N}(x_i^\mathsf{T}\beta, 1)\,[y_i^\ast > 0]
\end{align}</math>
Here <math>X</math> is the matrix with rows <math>x_i^\mathsf{T}</math> and <math>y^\ast</math> is the vector of the <math>y_i^\ast</math>.
The result for ''β'' is given in the article on Bayesian linear regression, although specified with different notation.
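A sampler built from these conditionals alternates between drawing each latent variable from a normal distribution truncated according to the observed outcome, and drawing the coefficient from its normal conditional given the latent variables. The sketch below makes several simplifying assumptions that are not part of the description above: a single regressor with no intercept (so ''β'' and the prior parameters are scalars), truncated draws by the inverse-CDF method (which is not robust for indices far from zero), and hypothetical data, prior values, and chain length.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def draw_truncated(mean, positive, rng):
    """Draw from N(mean, 1) restricted to (0, inf) if positive, else (-inf, 0]."""
    p_nonpositive = STD_NORMAL.cdf(-mean)  # P(N(mean, 1) <= 0)
    if positive:
        u = rng.uniform(p_nonpositive, 1.0)
    else:
        u = rng.uniform(0.0, p_nonpositive)
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep inv_cdf away from 0 and 1
    return mean + STD_NORMAL.inv_cdf(u)


def gibbs_probit(x, y, b0=0.0, B0=100.0, draws=2000, burn_in=500, seed=0):
    """Gibbs sampler for a scalar coefficient with prior beta ~ N(b0, B0)."""
    rng = random.Random(seed)
    # Conditional variance of beta given the latent variables (fixed across sweeps).
    B = 1.0 / (1.0 / B0 + sum(v * v for v in x))
    beta = 0.0
    kept = []
    for sweep in range(burn_in + draws):
        # Step 1: latent variables given beta and the observed outcomes.
        y_star = [draw_truncated(v * beta, y_i == 1, rng) for v, y_i in zip(x, y)]
        # Step 2: beta given the latent variables.
        mean = B * (b0 / B0 + sum(v * s for v, s in zip(x, y_star)))
        beta = rng.gauss(mean, B ** 0.5)
        if sweep >= burn_in:
            kept.append(beta)
    return kept


# Hypothetical data: one regressor, no intercept.
x = [-2, -1, -1, 0, 0, 1, 1, 2]
y = [0, 0, 1, 0, 1, 0, 1, 1]
samples = gibbs_probit(x, y)
print("posterior mean of beta (approx.):", sum(samples) / len(samples))
```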
The only trickiness is in the last two equations. The notation