In statistics, a probit model is a type of regression where the dependent variable can take only two values, for example married or not married. The word is a portmanteau, coming from ''prob''ability + un''it''. The purpose of the model is to estimate the probability that an observation with particular characteristics will fall into a specific one of the categories; moreover, classifying observations based on their predicted probabilities is a type of binary classification model.
A probit model is a popular specification for a binary response model. As such it treats the same set of problems as does logistic regression using similar techniques. It is most often estimated using the maximum likelihood procedure, such an estimation being called a probit regression.
Conceptual framework
Suppose a response variable ''Y'' is ''binary'', that is, it can have only two possible outcomes, which we will denote as 1 and 0. For example, ''Y'' may represent presence/absence of a certain condition, success/failure of some device, answer yes/no on a survey, etc. We also have a vector of regressors ''X'', which are assumed to influence the outcome ''Y''. Specifically, we assume that the model takes the form
: P(Y = 1 \mid X) = \Phi(X^\mathsf{T}\beta),
where ''P'' denotes probability and \Phi is the cumulative distribution function (CDF) of the standard normal distribution. The parameters ''β'' are typically estimated by maximum likelihood.
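As a concrete illustration, a fitted probability under this model is just the standard normal CDF evaluated at the linear index; a minimal sketch (the coefficient and input values below are made up for illustration):

```python
import numpy as np
from scipy.stats import norm

# Hypothetical estimated coefficients; the first entry acts as the intercept.
beta = np.array([-1.0, 2.0])
x = np.array([1.0, 0.8])   # input vector, with a leading 1 for the intercept

# P(Y = 1 | X = x) = Phi(x' beta): the standard normal CDF of the linear index
p = norm.cdf(x @ beta)
print(p)   # Phi(0.6) ≈ 0.726
```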
It is possible to motivate the probit model as a latent variable model. Suppose there exists an auxiliary random variable
: Y^* = X^\mathsf{T}\beta + \varepsilon,
where ''ε'' ~ ''N''(0, 1). Then ''Y'' can be viewed as an indicator for whether this latent variable is positive:
: Y = \begin{cases} 1 & Y^* > 0 \\ 0 & \text{otherwise.} \end{cases}
The use of the standard normal distribution causes no loss of generality compared with the use of a normal distribution with an arbitrary mean and standard deviation, because adding a fixed amount to the mean can be compensated by subtracting the same amount from the intercept, and multiplying the standard deviation by a fixed amount can be compensated by multiplying the weights by the same amount.
To see that the two models are equivalent, note that
: \begin{aligned} P(Y = 1 \mid X) &= P(Y^* > 0) = P(X^\mathsf{T}\beta + \varepsilon > 0) \\ &= P(\varepsilon > -X^\mathsf{T}\beta) \\ &= P(\varepsilon < X^\mathsf{T}\beta) \quad \text{(by symmetry of the normal distribution)} \\ &= \Phi(X^\mathsf{T}\beta). \end{aligned}
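The equivalence can also be checked by simulation: drawing the latent variable and thresholding it at zero reproduces the probit probabilities. A sketch with arbitrary illustrative coefficients:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
beta = np.array([0.5, -1.0])              # illustrative coefficients
n = 200_000
X = np.column_stack([np.ones(n), rng.normal(size=n)])

# Latent variable Y* = X beta + eps with eps ~ N(0, 1); observe Y = 1{Y* > 0}
y_star = X @ beta + rng.normal(size=n)
y = (y_star > 0).astype(int)

# The empirical frequency of Y = 1 should match the sample average of Phi(X beta)
print(y.mean(), norm.cdf(X @ beta).mean())
```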
Model estimation
Maximum likelihood estimation
Suppose the data set \{y_i, x_i\}_{i=1}^n contains ''n'' independent statistical units corresponding to the model above.
For the single observation, conditional on the vector of inputs of that observation, we have:
: P(y_i = 1 \mid x_i) = \Phi(x_i^\mathsf{T}\beta),
: P(y_i = 0 \mid x_i) = 1 - \Phi(x_i^\mathsf{T}\beta),
where x_i is a vector of inputs, and \beta is a vector of coefficients.
The likelihood of a single observation is then
: \mathcal{L}(\beta; y_i, x_i) = \Phi(x_i^\mathsf{T}\beta)^{y_i} \left[1 - \Phi(x_i^\mathsf{T}\beta)\right]^{1 - y_i}.
In fact, if y_i = 1, then \mathcal{L} = \Phi(x_i^\mathsf{T}\beta), and if y_i = 0, then \mathcal{L} = 1 - \Phi(x_i^\mathsf{T}\beta).
Since the observations are independent and identically distributed, then the likelihood of the entire sample, or the joint likelihood, will be equal to the product of the likelihoods of the single observations:
: \mathcal{L}(\beta; Y, X) = \prod_{i=1}^n \Phi(x_i^\mathsf{T}\beta)^{y_i} \left[1 - \Phi(x_i^\mathsf{T}\beta)\right]^{1 - y_i}.
The joint log-likelihood function is thus
: \ln\mathcal{L}(\beta; Y, X) = \sum_{i=1}^n \left( y_i \ln\Phi(x_i^\mathsf{T}\beta) + (1 - y_i)\ln\!\left(1 - \Phi(x_i^\mathsf{T}\beta)\right) \right).
The estimator \hat\beta which maximizes this function will be consistent, asymptotically normal and efficient provided that \operatorname{E}[X X^\mathsf{T}] exists and is not singular. It can be shown that this log-likelihood function is globally concave in ''β'', and therefore standard numerical algorithms for optimization will converge rapidly to the unique maximum.
The asymptotic distribution of \hat\beta is given by
: \sqrt{n}\,(\hat\beta - \beta) \ \xrightarrow{d}\ \mathcal{N}(0, \Omega^{-1}),
where
: \Omega = \operatorname{E}\!\left[ \frac{\varphi^2(X^\mathsf{T}\beta)}{\Phi(X^\mathsf{T}\beta)\left(1 - \Phi(X^\mathsf{T}\beta)\right)} X X^\mathsf{T} \right], \qquad \hat\Omega = \frac{1}{n}\sum_{i=1}^n \frac{\varphi^2(x_i^\mathsf{T}\hat\beta)}{\Phi(x_i^\mathsf{T}\hat\beta)\left(1 - \Phi(x_i^\mathsf{T}\hat\beta)\right)} x_i x_i^\mathsf{T},
and \varphi = \Phi' is the probability density function (PDF) of the standard normal distribution.
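The plug-in estimate \hat\Omega translates directly into code; inverting it and dividing by ''n'' gives an estimated covariance matrix for \hat\beta, whose diagonal yields standard errors. A sketch (the function name and the assumed value of the estimate are my own):

```python
import numpy as np
from scipy.stats import norm

def probit_cov(X, beta_hat):
    """Plug-in estimate of the asymptotic covariance of the probit MLE."""
    z = X @ beta_hat
    # Weight phi(z)^2 / (Phi(z)(1 - Phi(z))); note that 1 - Phi(z) = Phi(-z)
    w = norm.pdf(z) ** 2 / (norm.cdf(z) * norm.cdf(-z))
    omega_hat = (X * w[:, None]).T @ X / len(X)   # sample analogue of Omega
    return np.linalg.inv(omega_hat) / len(X)      # Omega^{-1} / n

# Example on synthetic inputs with an assumed coefficient estimate
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(1000), rng.normal(size=1000)])
cov = probit_cov(X, np.array([0.2, -0.5]))
print(np.sqrt(np.diag(cov)))   # standard errors of the two coefficients
```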
Semi-parametric and non-parametric maximum likelihood methods for probit-type and other related models are also available.
Berkson's minimum chi-square method
This method can be applied only when there are many observations of the response variable having the same value of the vector of regressors (a situation that may be referred to as "many observations per cell"). More specifically, the model can be formulated as follows.
Suppose among the ''n'' observations \{y_i, x_i\} there are only ''T'' distinct values of the regressors, which can be denoted as \{x_{(1)}, \ldots, x_{(T)}\}. Let n_t be the number of observations with x_i = x_{(t)}, and r_t the number of such observations with y_i = 1. We assume that there are indeed "many" observations per each "cell": for each t, \lim_{n\to\infty} n_t/n = c_t > 0.
Denote
: \hat{p}_t = r_t / n_t,
: \hat\sigma_t^2 = \frac{1}{n_t}\,\frac{\hat{p}_t (1 - \hat{p}_t)}{\varphi^2\!\left(\Phi^{-1}(\hat{p}_t)\right)}.
Then Berkson's minimum chi-square estimator is a generalized least squares estimator in a regression of \Phi^{-1}(\hat{p}_t) on x_{(t)} with weights \hat\sigma_t^{-2}:
: \hat\beta = \left( \sum_{t=1}^T \hat\sigma_t^{-2}\, x_{(t)} x_{(t)}^\mathsf{T} \right)^{-1} \sum_{t=1}^T \hat\sigma_t^{-2}\, x_{(t)}\, \Phi^{-1}(\hat{p}_t).
It can be shown that this estimator is consistent (as ''n''→∞ and ''T'' fixed), asymptotically normal and efficient. Its advantage is the presence of a closed-form formula for the estimator. However, it is only meaningful to carry out this analysis when individual observations are not available, only their aggregated counts r_t, n_t, and x_{(t)} (for example in the analysis of voting behavior).
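With the aggregated counts in hand, the estimator reduces to a single weighted least-squares solve. A sketch on made-up cell counts (all numbers below are illustrative):

```python
import numpy as np
from scipy.stats import norm

# Hypothetical aggregated data: T = 4 cells with regressors x_t,
# cell sizes n_t, and success counts r_t
x = np.column_stack([np.ones(4), np.array([0.0, 1.0, 2.0, 3.0])])
n_t = np.array([500, 500, 500, 500])
r_t = np.array([80, 190, 330, 450])

p_hat = r_t / n_t
z = norm.ppf(p_hat)                                      # Phi^{-1}(p_hat)
sigma2 = p_hat * (1 - p_hat) / (n_t * norm.pdf(z) ** 2)  # cell variances

# Weighted least squares of Phi^{-1}(p_hat) on x_t with weights 1/sigma2
W = 1.0 / sigma2
beta_hat = np.linalg.solve((x * W[:, None]).T @ x, (x * W[:, None]).T @ z)
print(beta_hat)
```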
Gibbs sampling
Gibbs sampling of a probit model is possible because regression models typically use normal prior distributions over the weights, and this distribution is conjugate with the normal distribution of the errors (and hence of the latent variables ''Y''*). The model can be described as
: \begin{aligned} \beta &\sim \mathcal{N}(b_0, B_0) \\ y_i^* \mid x_i, \beta &\sim \mathcal{N}(x_i^\mathsf{T}\beta, 1) \\ y_i &= \begin{cases} 1 & y_i^* > 0 \\ 0 & \text{otherwise.} \end{cases} \end{aligned}
From this, we can determine the full conditional densities needed:
: \begin{aligned} B &= \left(B_0^{-1} + X^\mathsf{T} X\right)^{-1} \\ \beta \mid y^* &\sim \mathcal{N}\!\left( B\left(B_0^{-1} b_0 + X^\mathsf{T} y^*\right),\; B \right) \\ y_i^* \mid y_i = 0, x_i, \beta &\sim \mathcal{N}(x_i^\mathsf{T}\beta, 1)\,[y_i^* < 0] \\ y_i^* \mid y_i = 1, x_i, \beta &\sim \mathcal{N}(x_i^\mathsf{T}\beta, 1)\,[y_i^* \ge 0] \end{aligned}
The result for β is given in the article on Bayesian linear regression, although specified with different notation.
The only trickiness is in the last two equations. The notation \mathcal{N}(\cdot, 1)\,[y_i^* < 0] denotes a normal distribution truncated to the indicated range: each latent y_i^* is drawn from a truncated normal consistent with the observed y_i.
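The two conditional draws alternate in the usual Gibbs fashion: sample each latent y_i^* from its truncated normal given the current β, then sample β from its normal conditional given the latent values. A sketch on simulated data (the prior hyperparameters b0 and B0 below are illustrative choices, not prescribed by the model):

```python
import numpy as np
from scipy.stats import truncnorm

# Simulated probit data
rng = np.random.default_rng(2)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.5, 1.0])
y = (X @ beta_true + rng.normal(size=n) > 0).astype(int)

# Prior beta ~ N(b0, B0); here b0 = 0 and B0 = 10 I (an arbitrary vague prior)
b0 = np.zeros(2)
B0_inv = np.eye(2) / 10.0
B = np.linalg.inv(B0_inv + X.T @ X)    # conditional covariance of beta | y*
chol_B = np.linalg.cholesky(B)

beta = np.zeros(2)
draws = []
for it in range(1500):
    mu = X @ beta
    # y*_i | y_i, beta is N(mu_i, 1) truncated to (0, inf) if y_i = 1
    # and to (-inf, 0] if y_i = 0; truncnorm bounds are in standardized units
    lo = np.where(y == 1, -mu, -np.inf)
    hi = np.where(y == 1, np.inf, -mu)
    y_star = mu + truncnorm.rvs(lo, hi, random_state=rng)
    # beta | y* ~ N(B (B0^{-1} b0 + X' y*), B)
    m = B @ (B0_inv @ b0 + X.T @ y_star)
    beta = m + chol_B @ rng.normal(size=2)
    if it >= 500:                      # discard burn-in draws
        draws.append(beta)

print(np.mean(draws, axis=0))          # posterior mean of the coefficients
```

Since X is fixed, the conditional covariance B and its Cholesky factor can be precomputed once outside the loop, which is what the sketch does.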