Multinomial Probit
In statistics and econometrics, the multinomial probit model is a generalization of the probit model used when there are several possible categories that the dependent variable can fall into. As such, it is an alternative to the multinomial logit model as one method of multiclass classification. It is not to be confused with the ''multivariate'' probit model, which is used to model correlated binary outcomes for more than one dependent variable.


General specification

It is assumed that we have a series of observations ''Y''<sub>''i''</sub>, for ''i'' = 1...''n'', of the outcomes of multi-way choices from a categorical distribution of size ''m'' (there are ''m'' possible choices). Along with each observation ''Y''<sub>''i''</sub> is a set of ''k'' observed values ''x''<sub>''1,i''</sub>, ..., ''x''<sub>''k,i''</sub> of explanatory variables (also known as independent variables, predictor variables, features, etc.). Some examples:
*The observed outcomes might be "has disease A, has disease B, has disease C, has none of the diseases" for a set of rare diseases with similar symptoms, and the explanatory variables might be characteristics of the patients thought to be pertinent (sex, race, age, blood pressure, body-mass index, presence or absence of various symptoms, etc.).
*The observed outcomes are the votes of people for a given party or candidate in a multi-way election, and the explanatory variables are the demographic characteristics of each person (e.g. sex, race, age, income).

The multinomial probit model is a statistical model that can be used to predict the likely outcome of an unobserved multi-way trial given the associated explanatory variables. In the process, the model attempts to explain the relative effect of differing explanatory variables on the different outcomes.

Formally, the outcomes ''Y''<sub>''i''</sub> are described as being categorically distributed data, where each outcome value ''h'' for observation ''i'' occurs with an unobserved probability ''p''<sub>''i,h''</sub> that is specific to the observation ''i'' at hand because it is determined by the values of the explanatory variables associated with that observation. That is:
:Y_i \mid x_{1,i},\ldots,x_{k,i} \ \sim \operatorname{Categorical}(p_{i,1},\ldots,p_{i,m}),\text{ for }i = 1, \dots , n
or equivalently
:\Pr[Y_i=h \mid x_{1,i},\ldots,x_{k,i}] = p_{i,h},\text{ for }i = 1, \dots , n,
for each of ''m'' possible values of ''h''.
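
The categorical formulation above can be illustrated with a minimal Python sketch. The probability matrix `p` and the helper name `draw_outcomes` are assumptions made for illustration only; in the model the probabilities are unobserved and depend on the explanatory variables.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical probabilities p[i, h-1] for n = 2 observations and m = 3 outcomes.
# Each row sums to 1. These values are illustrative, not estimated from data.
p = np.array([[0.2, 0.5, 0.3],
              [0.7, 0.1, 0.2]])


def draw_outcomes(p, rng):
    """Draw Y_i in {1, ..., m} with Pr[Y_i = h] = p[i, h-1], one draw per row."""
    m = p.shape[1]
    # rng.choice returns a value in 0..m-1, so add 1 to code outcomes as 1..m.
    return np.array([rng.choice(m, p=row) + 1 for row in p])


y = draw_outcomes(p, rng)
```

Repeating the draw many times and tabulating the results for one observation should give relative frequencies close to the corresponding row of `p`.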


Latent variable model

Multinomial probit is often written in terms of a latent variable model:
: \begin{align} Y_i^{1\ast} &= \boldsymbol\beta_1 \cdot \mathbf{x}_i + \varepsilon_1 \, \\ Y_i^{2\ast} &= \boldsymbol\beta_2 \cdot \mathbf{x}_i + \varepsilon_2 \, \\ \ldots & \ldots \\ Y_i^{m\ast} &= \boldsymbol\beta_m \cdot \mathbf{x}_i + \varepsilon_m \, \\ \end{align}
where \mathbf{x}_i = (x_{1,i},\ldots,x_{k,i}) is the vector of explanatory variables for observation ''i'' and
: \boldsymbol\varepsilon \sim \mathcal{N}(0,\boldsymbol\Sigma)
Then
: Y_i = \begin{cases} 1 & \text{if }Y_i^{1\ast} > Y_i^{2\ast},\ldots,Y_i^{m\ast} \\ 2 & \text{if }Y_i^{2\ast} > Y_i^{1\ast},Y_i^{3\ast},\ldots,Y_i^{m\ast} \\ \ldots & \ldots \\ m &\text{otherwise.} \end{cases}
That is,
: Y_i = \arg\max_{h=1}^m Y_i^{h\ast}
Note that this model allows for arbitrary correlation between the error variables, so that it does not necessarily respect independence of irrelevant alternatives. When \scriptstyle\boldsymbol\Sigma is the identity matrix (such that there is no correlation or heteroscedasticity), the model is called independent probit.
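
As a rough illustration of the latent variable formulation, the following Python sketch draws the error vector from a multivariate normal distribution, forms the latent values, and takes the index of the largest one as the observed choice. The coefficient matrix `betas`, the covariance `Sigma`, the vector `x`, and the function names are all hypothetical values chosen for the example. Averaging many simulated choices gives a simple Monte Carlo approximation of the probabilities ''p''<sub>''i,h''</sub>; this is shown only to make the model concrete and is not presented as a recommended estimation procedure.

```python
import numpy as np


def simulate_choices(X, betas, Sigma, rng):
    """Draw one choice per row of X from the latent-variable model.

    X:     (n, k) explanatory variables, one row per observation
    betas: (m, k) one coefficient vector per alternative
    Sigma: (m, m) covariance matrix of the error vector
    Returns an (n,) integer array of choices coded 1..m.
    """
    n = X.shape[0]
    m = betas.shape[0]
    # One correlated error vector per observation, shape (n, m).
    eps = rng.multivariate_normal(np.zeros(m), Sigma, size=n)
    # Latent values Y_i^{h*} = beta_h . x_i + eps_h, shape (n, m).
    latent = X @ betas.T + eps
    # The observed choice is the alternative with the largest latent value.
    return latent.argmax(axis=1) + 1


def choice_probabilities(x, betas, Sigma, rng, draws=20000):
    """Monte Carlo approximation of p_{i,h} for a single observation x."""
    X = np.tile(x, (draws, 1))
    y = simulate_choices(X, betas, Sigma, rng)
    m = betas.shape[0]
    # Count how often each alternative 1..m was chosen.
    return np.bincount(y, minlength=m + 1)[1:] / draws


rng = np.random.default_rng(0)

# Hypothetical parameters: m = 3 alternatives, k = 2 explanatory variables.
betas = np.array([[0.5, 1.0],
                  [0.0, 0.0],
                  [-0.5, 0.5]])
# Errors of alternatives 1 and 2 are correlated; alternative 3 is independent.
Sigma = np.array([[1.0, 0.6, 0.0],
                  [0.6, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])
x = np.array([1.0, 0.2])

p_hat = choice_probabilities(x, betas, Sigma, rng)
```

Replacing `Sigma` with `np.eye(3)` gives the independent probit case mentioned above.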


Estimation

For details on how the equations are estimated, see the article Probit model.


References

* Greene, William H. (2012). ''Econometric Analysis'' (Seventh ed.). Boston: Pearson Education. pp. 810–811. ISBN 978-0-273-75356-8.