In statistics, a tobit model is any of a class of regression models in which the observed range of the dependent variable is censored in some way. The term was coined by Arthur Goldberger in reference to James Tobin, who developed the model in 1958 to mitigate the problem of zero-inflated data for observations of household expenditure on durable goods. Because Tobin's method can be easily extended to handle truncated and other non-randomly selected samples, some authors adopt a broader definition of the tobit model that includes these cases.
Tobin's idea was to modify the likelihood function so that it reflects the unequal sampling probability for each observation depending on whether the latent dependent variable fell above or below the determined threshold. For a sample that, as in Tobin's original case, was censored from below at zero, the sampling probability for each non-limit observation is simply the height of the appropriate density function. For any limit observation, it is the cumulative distribution, i.e. the integral below zero of the appropriate density function. The tobit likelihood function is thus a mixture of densities and cumulative distribution functions.
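For Tobin's original case of censoring from below at zero, these two kinds of per-observation contributions can be illustrated numerically. The following sketch uses Python with SciPy; the parameter values, regressor value, and single observation are hypothetical and chosen only to make the example concrete.

<syntaxhighlight lang="python">
# Illustration of the two kinds of sampling probabilities described above,
# for censoring from below at zero.  All numbers here are hypothetical.
from scipy.stats import norm

beta, sigma = 2.0, 1.5      # hypothetical latent-model coefficient and scale
x_j = 1.0                   # hypothetical regressor value
mu_j = x_j * beta           # conditional mean of the latent variable

# Non-limit observation (y_j > 0): the height of the latent variable's
# density at the observed value.
y_j = 3.2
contrib_uncensored = norm.pdf(y_j, loc=mu_j, scale=sigma)

# Limit observation (y_j = 0): the cumulative probability of the latent
# variable falling at or below zero, i.e. the integral of the density below zero.
contrib_censored = norm.cdf(0.0, loc=mu_j, scale=sigma)

print(contrib_uncensored, contrib_censored)
</syntaxhighlight>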
The likelihood function
Below are the likelihood and log likelihood functions for a type I tobit. This is a tobit that is censored from below at $y_L$ when the latent variable $y_j^* \leq y_L$. In writing out the likelihood function, we first define an indicator function $I$:

: $I(y_j) = \begin{cases} 0 & \text{if } y_j \leq y_L, \\ 1 & \text{if } y_j > y_L. \end{cases}$
Next, let $\Phi$ be the standard normal cumulative distribution function and $\varphi$ be the standard normal probability density function. For a data set with ''N'' observations, the likelihood function for a type I tobit is

: $\mathcal{L}(\beta, \sigma) = \prod_{j=1}^{N} \left( \frac{1}{\sigma} \varphi\!\left( \frac{y_j - X_j \beta}{\sigma} \right) \right)^{I(y_j)} \left( 1 - \Phi\!\left( \frac{X_j \beta - y_L}{\sigma} \right) \right)^{1 - I(y_j)}$
and the log likelihood is given by
: $\log \mathcal{L}(\beta, \sigma) = \sum_{j=1}^{N} \left[ I(y_j) \log\!\left( \frac{1}{\sigma} \varphi\!\left( \frac{y_j - X_j \beta}{\sigma} \right) \right) + \bigl(1 - I(y_j)\bigr) \log\!\left( 1 - \Phi\!\left( \frac{X_j \beta - y_L}{\sigma} \right) \right) \right]$
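As a concrete sketch, the log likelihood above can be written directly in Python with NumPy and SciPy. The function name tobit_loglik, the simulated data, and the optimizer settings below are illustrative choices rather than part of the model's definition; the censoring point defaults to $y_L = 0$, as in Tobin's original case.

<syntaxhighlight lang="python">
# A sketch of the type I tobit log-likelihood, censored from below at y_L.
# Notation (y, X, beta, sigma, y_L) follows the formula above.
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

def tobit_loglik(beta, sigma, y, X, y_L=0.0):
    """Log-likelihood of a type I tobit censored from below at y_L."""
    xb = X @ beta
    uncensored = y > y_L                                   # I(y_j) = 1
    # Non-limit observations: log[(1/sigma) * phi((y_j - X_j beta) / sigma)]
    ll_unc = norm.logpdf(y[uncensored], loc=xb[uncensored], scale=sigma)
    # Limit observations: log[1 - Phi((X_j beta - y_L) / sigma)],
    # rewritten as log Phi((y_L - X_j beta) / sigma) by symmetry.
    ll_cen = norm.logcdf((y_L - xb[~uncensored]) / sigma)
    return ll_unc.sum() + ll_cen.sum()

# Usage sketch: simulate censored data and maximize over (beta, log sigma).
rng = np.random.default_rng(0)
N = 500
X = np.column_stack([np.ones(N), rng.normal(size=N)])
y_star = X @ np.array([1.0, 2.0]) + rng.normal(scale=1.5, size=N)
y = np.maximum(y_star, 0.0)                                # censor from below at 0

neg_loglik = lambda p: -tobit_loglik(p[:2], np.exp(p[2]), y, X)
result = minimize(neg_loglik, x0=np.zeros(3), method="BFGS")
beta_hat, sigma_hat = result.x[:2], np.exp(result.x[2])
</syntaxhighlight>

The optimization in $(\beta, \sigma)$ is handled here by a generic quasi-Newton routine; the reparametrization discussed in the next section is one way to make the problem better behaved.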
Reparametrization
The log-likelihood as stated above is not globally concave, which complicates maximum likelihood estimation. Olsen suggested the simple reparametrization $\beta = \delta/\gamma$ and $\sigma^2 = \gamma^{-2}$, resulting in a transformed log-likelihood,

: $\log \mathcal{L}(\delta, \gamma) = \sum_{y_j > y_L} \left\{ \log \gamma + \log \varphi(\gamma y_j - X_j \delta) \right\} + \sum_{y_j = y_L} \log \Phi(\gamma y_L - X_j \delta)$

which is globally concave in terms of the transformed parameters.
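Continuing the sketch above, the transformed log-likelihood can be implemented in the same way; the function name tobit_loglik_olsen is an illustrative choice. The original parameters are recovered from the transformed ones as $\beta = \delta/\gamma$ and $\sigma = 1/\gamma$.

<syntaxhighlight lang="python">
# A sketch of the Olsen-reparametrized log-likelihood in terms of
# delta = beta/sigma and gamma = 1/sigma.
import numpy as np
from scipy.stats import norm

def tobit_loglik_olsen(delta, gamma, y, X, y_L=0.0):
    """Transformed type I tobit log-likelihood, concave in (delta, gamma)."""
    xd = X @ delta
    uncensored = y > y_L
    # Sum over y_j > y_L: log gamma + log phi(gamma * y_j - X_j delta)
    ll_unc = np.log(gamma) + norm.logpdf(gamma * y[uncensored] - xd[uncensored])
    # Sum over y_j = y_L: log Phi(gamma * y_L - X_j delta)
    ll_cen = norm.logcdf(gamma * y_L - xd[~uncensored])
    return ll_unc.sum() + ll_cen.sum()

# Given a maximizer (delta_hat, gamma_hat), recover the original parameters:
#   beta_hat = delta_hat / gamma_hat,  sigma_hat = 1.0 / gamma_hat
</syntaxhighlight>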