In probability theory, the Mills ratio (or Mills's ratio) of a continuous random variable <math>X</math> is the function
:<math>m(x) := \frac{\bar{F}(x)}{f(x)},</math>
where <math>f(x)</math> is the probability density function, and
:<math>\bar{F}(x) := \Pr[X>x] = \int_x^{+\infty} f(u)\, du</math>
is the complementary cumulative distribution function (also called the survival function). The concept is named after John P. Mills. The Mills ratio is related to the hazard rate ''h''(''x''), which is defined as
:<math>h(x) := \lim_{\delta\to 0} \frac{1}{\delta}\Pr[x < X \leq x + \delta \mid X > x],</math>
by
:<math>m(x) = \frac{1}{h(x)}.</math>
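As a small illustration of the reciprocal relationship between the Mills ratio and the hazard rate, the following sketch uses the exponential distribution, which is chosen here only for convenience and is not discussed in the article. The function names and the step size `delta` are illustrative assumptions. For this distribution the Mills ratio is constant and equal to one over the rate parameter.

```python
import math


def exp_density(x, rate):
    """Density f(x) of an exponential distribution with the given rate."""
    return rate * math.exp(-rate * x)


def exp_survival(x, rate):
    """Survival function Pr[X > x] of the same distribution."""
    return math.exp(-rate * x)


def mills_ratio(x, rate):
    """Mills ratio m(x) = survival / density."""
    return exp_survival(x, rate) / exp_density(x, rate)


def hazard_numeric(x, rate, delta=1e-6):
    """Finite-difference approximation of the limit that defines h(x)."""
    prob_in_window = exp_survival(x, rate) - exp_survival(x + delta, rate)
    return prob_in_window / (delta * exp_survival(x, rate))


if __name__ == "__main__":
    rate = 2.0
    for x in (0.0, 0.5, 3.0):
        m = mills_ratio(x, rate)
        h = hazard_numeric(x, rate)
        # The product m(x) * h(x) should be close to 1.
        print(x, m, h, m * h)
```

The finite-difference hazard is only an approximation, so the product is close to, not exactly, one.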


Example

If <math>X</math> has a standard normal distribution, then
:<math>m(x) \sim 1/x,</math>
where the sign <math>\sim</math> means that the quotient of the two functions converges to 1 as <math>x\to+\infty</math>; see Q-function for details. More precise asymptotics can be given.
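The asymptotic statement can be checked numerically. The sketch below computes the Mills ratio of the standard normal distribution from the standard library's complementary error function and prints the product of x and m(x), which should approach 1 as x grows. The function name is an illustrative assumption.

```python
import math


def mills_ratio_std_normal(x):
    """m(x) = Pr[X > x] / f(x) for a standard normal X."""
    # Survival function via erfc, which stays accurate far in the tail.
    survival = 0.5 * math.erfc(x / math.sqrt(2.0))
    density = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    return survival / density


if __name__ == "__main__":
    for x in (1.0, 2.0, 5.0, 10.0, 20.0):
        # The product x * m(x) tends to 1 as x increases.
        print(x, x * mills_ratio_std_normal(x))
```

Using `erfc` instead of computing one minus the cumulative distribution function avoids cancellation error for large x.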


Inverse Mills ratio

The inverse Mills ratio is the ratio of the probability density function to the complementary cumulative distribution function of a distribution. Its use is often motivated by the following property of the truncated normal distribution. If ''X'' is a random variable having a normal distribution with mean ''μ'' and variance ''σ''<sup>2</sup>, then
:<math>\begin{align} & \operatorname{E}[\,X \mid X > \alpha\,] = \mu + \sigma \frac{\phi\left(\tfrac{\alpha-\mu}{\sigma}\right)}{1-\Phi\left(\tfrac{\alpha-\mu}{\sigma}\right)}, \\ & \operatorname{E}[\,X \mid X < \alpha\,] = \mu - \sigma \frac{\phi\left(\tfrac{\alpha-\mu}{\sigma}\right)}{\Phi\left(\tfrac{\alpha-\mu}{\sigma}\right)}, \end{align}</math>
where <math>\alpha</math> is a constant, <math>\phi</math> denotes the standard normal density function, and <math>\Phi</math> is the standard normal cumulative distribution function. The two fractions are the inverse Mills ratios.
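The two truncated-mean formulas can be compared against a simple simulation. The following is a minimal sketch; the function names, sample size, and seed are illustrative assumptions. It evaluates the closed-form expressions and estimates the upper truncated mean by discarding draws that fall at or below the cutoff.

```python
import math
import random


def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def Phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def survival(z):
    """1 - Phi(z), computed directly for accuracy in the upper tail."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def mean_above(mu, sigma, alpha):
    """E[X | X > alpha] for X ~ N(mu, sigma^2)."""
    z = (alpha - mu) / sigma
    return mu + sigma * phi(z) / survival(z)


def mean_below(mu, sigma, alpha):
    """E[X | X < alpha] for X ~ N(mu, sigma^2)."""
    z = (alpha - mu) / sigma
    return mu - sigma * phi(z) / Phi(z)


def simulated_mean_above(mu, sigma, alpha, n=200_000, seed=0):
    """Monte Carlo estimate of E[X | X > alpha] by rejection."""
    rng = random.Random(seed)
    kept = [x for x in (rng.gauss(mu, sigma) for _ in range(n)) if x > alpha]
    return sum(kept) / len(kept)


if __name__ == "__main__":
    mu, sigma, alpha = 1.0, 2.0, 2.0
    print("formula:   ", mean_above(mu, sigma, alpha))
    print("simulation:", simulated_mean_above(mu, sigma, alpha))
```

A useful sanity check is that weighting the two conditional means by the probabilities of falling below and above the cutoff recovers the unconditional mean.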


Use in regression

A common application of the inverse Mills ratio (sometimes also called the “non-selection hazard”) arises in regression analysis to take account of possible selection bias. If a dependent variable is censored (i.e., a positive outcome is not observed for all observations), observations become concentrated at zero. This problem was first acknowledged by Tobin (1958), who showed that if it is not taken into account in the estimation procedure, ordinary least squares estimation will produce biased parameter estimates. With censored dependent variables, the Gauss–Markov assumption of zero correlation between the independent variables and the error term is violated.

James Heckman proposed a two-stage estimation procedure using the inverse Mills ratio to correct for the selection bias. In the first step, a regression for observing a positive outcome of the dependent variable is modeled with a probit model. The inverse Mills ratio must be generated from the estimation of a probit model; a logit cannot be used. The probit model assumes that the error term follows a standard normal distribution. In the second step, the estimated parameters are used to calculate the inverse Mills ratio, which is then included as an additional explanatory variable in the OLS estimation.
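The second stage of the procedure can be sketched on simulated data. This is a simplified illustration under stated assumptions, not Heckman's full estimator: the first-stage probit is not estimated, and its index is instead treated as known. The data-generating process, all function names, and all parameter values are hypothetical. Because the sample is selected on a positive index, the regressor is written as the density over the cumulative distribution function evaluated at the index, which equals the density over the survival function evaluated at the negated index.

```python
import math
import random


def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def Phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def inverse_mills(z):
    """phi(z) / Phi(z), the regressor added in the second stage."""
    return phi(z) / Phi(z)


def ols(rows, y):
    """Least-squares coefficients from the normal equations (Gauss-Jordan)."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(k):
            if row != col:
                factor = a[row][col] / a[col][col]
                a[row] = [v - factor * p for v, p in zip(a[row], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def simulate(n=40000, rho=0.8, beta0=1.0, beta1=2.0, seed=0):
    """Return (naive, corrected) OLS coefficients on the selected sample."""
    rng = random.Random(seed)
    naive_rows, corrected_rows, ys = [], [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        z = x + rng.gauss(0.0, 1.0)   # selection index, ASSUMED known here
        u = rng.gauss(0.0, 1.0)       # selection error, standard normal
        # Outcome error with correlation rho to the selection error.
        e = rho * u + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        if z + u > 0:                 # outcome observed only when selected
            ys.append(beta0 + beta1 * x + e)
            naive_rows.append([1.0, x])
            corrected_rows.append([1.0, x, inverse_mills(z)])
    return ols(naive_rows, ys), ols(corrected_rows, ys)


if __name__ == "__main__":
    naive, corrected = simulate()
    print("naive slope:    ", naive[1])
    print("corrected slope:", corrected[1])
    print("coefficient on inverse Mills ratio:", corrected[2])
```

In an actual application the index would be replaced by the fitted values of an estimated probit model, and the second-stage standard errors would need adjusting because the added regressor is itself estimated.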


See also

* Heckman correction


External links

* {{mathworld|id=MillsRatio|title=Mills Ratio}}