In statistics and econometrics, the multivariate probit model is a generalization of the probit model used to estimate several correlated binary outcomes jointly. For example, if it is believed that the decisions of sending at least one child to public school and that of voting in favor of a school budget are correlated (both decisions are binary), then the multivariate probit model would be appropriate for jointly predicting these two choices on an individual-specific basis. J.R. Ashford and R.R. Sowden initially proposed an approach for multivariate probit analysis. Siddhartha Chib and Edward Greenberg extended this idea and also proposed simulation-based inference methods for the multivariate probit model which simplified and generalized parameter estimation.
Example: bivariate probit
In the ordinary probit model, there is only one binary dependent variable $Y$, and so only one latent variable $Y^*$ is used. In contrast, in the bivariate probit model there are two binary dependent variables $Y_1$ and $Y_2$, so there are two latent variables: $Y_1^*$ and $Y_2^*$.
It is assumed that each observed variable takes on the value 1 if and only if its underlying continuous latent variable takes on a positive value:
$$Y_1 = \begin{cases} 1 & \text{if } Y_1^* > 0, \\ 0 & \text{otherwise,} \end{cases}$$
$$Y_2 = \begin{cases} 1 & \text{if } Y_2^* > 0, \\ 0 & \text{otherwise,} \end{cases}$$
with
$$\begin{cases} Y_1^* = X_1\beta_1 + \varepsilon_1 \\ Y_2^* = X_2\beta_2 + \varepsilon_2 \end{cases}$$
and
$$\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \end{bmatrix} \mid X \sim \mathcal{N}\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix} \right)$$
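The latent-variable setup above can be illustrated by simulating data from it. The following is a minimal sketch, assuming NumPy is available; the function name `simulate_bivariate_probit`, the design matrix, and the parameter values are hypothetical choices made only for illustration.

```python
import numpy as np

def simulate_bivariate_probit(X1, X2, beta1, beta2, rho, rng):
    """Draw (Y1, Y2) from the latent-variable model described above."""
    n = X1.shape[0]
    # Errors are standard bivariate normal with correlation rho.
    cov = np.array([[1.0, rho], [rho, 1.0]])
    eps = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=n)
    y1_star = X1 @ beta1 + eps[:, 0]
    y2_star = X2 @ beta2 + eps[:, 1]
    # Each outcome is 1 exactly when its latent variable is positive.
    return (y1_star > 0).astype(int), (y2_star > 0).astype(int)

# Illustrative (made-up) design: an intercept and one regressor.
rng = np.random.default_rng(0)
n = 5000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta1 = np.array([0.2, 1.0])
beta2 = np.array([-0.3, 0.5])
y1, y2 = simulate_bivariate_probit(X, X, beta1, beta2, 0.6, rng)
print(y1.mean(), y2.mean(), np.corrcoef(y1, y2)[0, 1])
```

With a positive $\rho$ and regressors that push both latent variables in the same direction, the two simulated outcomes should be positively correlated, which is the dependence the bivariate model is meant to capture.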
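Fitting the bivariate probit model involves estimating the values of $\beta_1$, $\beta_2$, and $\rho$. To do so, the likelihood of the model has to be maximized. This likelihood, with the product taken over observations, is
$$L(\beta_1,\beta_2,\rho) = \prod \Big( P(Y_1=1,Y_2=1\mid\beta_1,\beta_2,\rho)^{Y_1Y_2}\, P(Y_1=0,Y_2=1\mid\beta_1,\beta_2,\rho)^{(1-Y_1)Y_2}\, P(Y_1=1,Y_2=0\mid\beta_1,\beta_2,\rho)^{Y_1(1-Y_2)}\, P(Y_1=0,Y_2=0\mid\beta_1,\beta_2,\rho)^{(1-Y_1)(1-Y_2)} \Big)$$
Substituting the latent variables $Y_1^*$ and $Y_2^*$ in the probability functions and taking logs gives
$$\sum \Big( Y_1Y_2\ln P(\varepsilon_1 > -X_1\beta_1,\ \varepsilon_2 > -X_2\beta_2) + (1-Y_1)Y_2\ln P(\varepsilon_1 < -X_1\beta_1,\ \varepsilon_2 > -X_2\beta_2) + Y_1(1-Y_2)\ln P(\varepsilon_1 > -X_1\beta_1,\ \varepsilon_2 < -X_2\beta_2) + (1-Y_1)(1-Y_2)\ln P(\varepsilon_1 < -X_1\beta_1,\ \varepsilon_2 < -X_2\beta_2) \Big)$$
After some rewriting, which uses the symmetry of the normal distribution, the log-likelihood function becomes:
$$\sum \Big( Y_1Y_2\ln\Phi(X_1\beta_1, X_2\beta_2, \rho) + (1-Y_1)Y_2\ln\Phi(-X_1\beta_1, X_2\beta_2, -\rho) + Y_1(1-Y_2)\ln\Phi(X_1\beta_1, -X_2\beta_2, -\rho) + (1-Y_1)(1-Y_2)\ln\Phi(-X_1\beta_1, -X_2\beta_2, \rho) \Big)$$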
Note that $\Phi$ is the cumulative distribution function of the bivariate normal distribution, here with zero means, unit variances, and the correlation given as its third argument. $Y_1$ and $Y_2$ in the log-likelihood function are observed variables, each equal to one or zero.
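The bivariate log-likelihood can be evaluated numerically with a standard bivariate normal CDF. The sketch below assumes SciPy's `multivariate_normal.cdf`; the helper names `bvn_cdf` and `bivariate_probit_loglik` and the toy data are hypothetical. It uses the sign variables $q_k = 2Y_k - 1$, a compact way of writing the four cases of the log-likelihood as the single term $\ln\Phi(q_1X_1\beta_1,\ q_2X_2\beta_2,\ q_1q_2\rho)$.

```python
import numpy as np
from scipy.stats import multivariate_normal

def bvn_cdf(a, b, rho):
    """Phi(a, b, rho): standard bivariate normal CDF with correlation rho."""
    cov = [[1.0, rho], [rho, 1.0]]
    return multivariate_normal.cdf([a, b], mean=[0.0, 0.0], cov=cov)

def bivariate_probit_loglik(beta1, beta2, rho, X1, X2, y1, y2):
    """Sum over observations of the log joint probability of (Y1, Y2)."""
    q1 = 2 * y1 - 1          # +1 if Y1 = 1, -1 if Y1 = 0
    q2 = 2 * y2 - 1
    a = q1 * (X1 @ beta1)
    b = q2 * (X2 @ beta2)
    r = q1 * q2 * rho        # the sign of rho flips when exactly one outcome is 0
    return sum(np.log(bvn_cdf(ai, bi, ri)) for ai, bi, ri in zip(a, b, r))

# Toy data (made up for illustration only).
X = np.array([[1.0, 0.5], [1.0, -1.0], [1.0, 0.2], [1.0, 1.5]])
y1 = np.array([1, 0, 1, 1])
y2 = np.array([1, 0, 0, 1])
beta1 = np.array([0.1, 0.8])
beta2 = np.array([-0.2, 0.5])
ll = bivariate_probit_loglik(beta1, beta2, 0.4, X, X, y1, y2)
print(ll)
```

In practice this function would be passed, with its sign flipped, to a numerical optimizer; $\rho$ is often reparameterized (for example through a hyperbolic tangent) so that it stays inside $(-1, 1)$ during the search.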
Multivariate Probit
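For the general case, $\mathbf{y}_i = (y_{i1}, \dots, y_{iJ})$ for $i = 1, \dots, N$, where we can take $j = 1, \dots, J$ as choices and $i$ as individuals or observations, the probability of observing choice $\mathbf{y}_i$ is
$$\Pr(\mathbf{y}_i \mid \mathbf{X}_i\boldsymbol\beta, \Sigma) = \int_{A_J}\cdots\int_{A_1} f_N(\mathbf{y}_i^* \mid \mathbf{X}_i\boldsymbol\beta, \Sigma)\, dy_1^* \cdots dy_J^* = \int \mathbb{1}_{\mathbf{y}_i^* \in A}\, f_N(\mathbf{y}_i^* \mid \mathbf{X}_i\boldsymbol\beta, \Sigma)\, d\mathbf{y}_i^*$$
where $f_N$ is the multivariate normal density with mean $\mathbf{X}_i\boldsymbol\beta$ and covariance $\Sigma$, $A = A_1 \times \cdots \times A_J$, and
$$A_j = \begin{cases} (-\infty, 0] & \text{if } y_{ij} = 0, \\ (0, \infty) & \text{if } y_{ij} = 1. \end{cases}$$
The log-likelihood function in this case would be $\sum_{i=1}^N \log \Pr(\mathbf{y}_i \mid \mathbf{X}_i\boldsymbol\beta, \Sigma)$.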
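The integral over the region $A$ can be approximated by drawing latent vectors and counting how often they land in $A$. The following is a minimal sketch of such a crude frequency simulator, assuming NumPy; the names `mvp_prob_frequency` and `mvp_loglik_frequency`, the covariance matrix, and the mean vector are hypothetical and chosen only for illustration.

```python
import numpy as np

def mvp_prob_frequency(y, mu, Sigma, n_draws, rng):
    """Estimate Pr(y | mu, Sigma) as the share of latent draws that fall in A."""
    y = np.asarray(y)
    draws = rng.multivariate_normal(mean=mu, cov=Sigma, size=n_draws)
    # A draw is in A when every coordinate is positive exactly where y_j = 1.
    in_A = np.all((draws > 0) == (y == 1), axis=1)
    return in_A.mean()

def mvp_loglik_frequency(Y, M, Sigma, n_draws, rng):
    """Simulated log-likelihood; row i of M plays the role of X_i beta.

    Caveat: if no draw lands in A for some observation, the estimate is 0
    and its log is -inf, which is one reason smoother simulators are preferred.
    """
    return sum(np.log(mvp_prob_frequency(y, mu, Sigma, n_draws, rng))
               for y, mu in zip(Y, M))

rng = np.random.default_rng(1)
Sigma = np.array([[1.0, 0.5, 0.2],
                  [0.5, 1.0, 0.3],
                  [0.2, 0.3, 1.0]])
mu = np.array([0.3, -0.1, 0.4])   # stands in for X_i beta
p_hat = mvp_prob_frequency([1, 0, 1], mu, Sigma, 100_000, rng)
print(p_hat)
```

The estimate is unbiased for the choice probability, but it is a step function of the parameters and can be noisy for rare outcome patterns.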
Except for $J \leq 2$ (at most two binary outcomes), there is typically no closed-form solution to the integrals in the log-likelihood equation. Instead, simulation methods can be used to simulate the choice probabilities. Methods using importance sampling include the GHK algorithm, AR (accept-reject), and Stern's method. There are also MCMC approaches to this problem, including CRB (Chib's method with Rao–Blackwellization), CRT (Chib, Ritter, Tanner), ARK (accept-reject kernel), and ASK (adaptive sampling kernel). A variational approach scaling to large datasets is proposed in Probit-LMM.
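As an illustration of the first of these methods, the sketch below follows the usual textbook form of the GHK recursion: the latent vector is sign-flipped so that the event becomes "all coordinates of $L\eta$ lie below an upper bound", and each coordinate of $\eta$ is then drawn from a truncated standard normal conditional on the previous ones. It assumes NumPy and SciPy; the function name `ghk_prob` and the numerical values are hypothetical, and this is not claimed to be the exact variant used in any particular reference.

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

def ghk_prob(y, mu, Sigma, n_draws, rng):
    """GHK estimate of Pr(y | mu, Sigma) for a multivariate probit outcome y."""
    q = 2 * np.asarray(y) - 1                 # +1 where y_j = 1, -1 where y_j = 0
    b = q * np.asarray(mu)                    # event is L @ eta < b, eta ~ N(0, I)
    L = np.linalg.cholesky(np.outer(q, q) * Sigma)   # Cholesky factor of D Sigma D
    J = len(q)
    eta = np.zeros((n_draws, J))
    weight = np.ones(n_draws)
    for j in range(J):
        # Upper bound for eta_j given the coordinates already drawn.
        upper = (b[j] - eta[:, :j] @ L[j, :j]) / L[j, j]
        p = norm.cdf(upper)
        weight *= p
        # Inverse-CDF draw from N(0, 1) truncated to (-inf, upper).
        u = rng.uniform(size=n_draws)
        eta[:, j] = norm.ppf(np.maximum(u * p, np.finfo(float).tiny))
    return weight.mean()

rng = np.random.default_rng(2)
Sigma = np.array([[1.0, 0.5, 0.2],
                  [0.5, 1.0, 0.3],
                  [0.2, 0.3, 1.0]])
mu = np.array([0.3, -0.1, 0.4])   # stands in for X_i beta
y = np.array([1, 0, 1])
p_ghk = ghk_prob(y, mu, Sigma, 50_000, rng)

# Reference value from SciPy's numerical multivariate normal CDF.
q = 2 * y - 1
p_ref = multivariate_normal.cdf(q * mu, mean=np.zeros(3), cov=np.outer(q, q) * Sigma)
print(p_ghk, p_ref)
```

Unlike a frequency count, each GHK draw contributes a product of univariate normal probabilities, so the estimate is strictly positive and varies smoothly with the parameters, which makes it better suited to simulated maximum likelihood.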
The multivariate probit model has been applied to simultaneously analyze consumer choice of multiple brands. It has been demonstrated that the multivariate probit model extends research possibilities in the demand area by relaxing the restrictive assumption of mutually exclusive alternatives, which characterizes multinomial discrete choice methods.
Further reading
* Greene, William H. (2012). "Bivariate and Multivariate Probit Models". Econometric Analysis (Seventh ed.). Prentice-Hall. pp. 778–799. ISBN 978-0-13-139538-1.