
In
probability theory
and
statistics
, the probit function is the
quantile function associated with the standard
normal distribution
. It has applications in data analysis and machine learning, in particular
exploratory statistical graphics and specialized
regression modeling of binary response variables.
Mathematically, the probit is the
inverse of the
cumulative distribution function
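of the standard normal distribution, which is denoted as <math>\Phi(z)</math>, so the probit is defined as
:<math>\operatorname{probit}(p) = \Phi^{-1}(p) \quad \text{for} \quad p \in (0,1)</math>.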
Largely because of the
central limit theorem
, the standard normal distribution plays a fundamental role in probability theory and statistics. If we consider the familiar fact that the standard normal distribution places 95% of probability between −1.96 and 1.96 and is symmetric around zero, it follows that
:<math>\Phi(-1.96) = 0.025 = 1 - \Phi(1.96).</math>
The probit function gives the 'inverse' computation, generating a value of a standard normal random variable, associated with specified cumulative probability. Continuing the example,
:<math>\operatorname{probit}(0.025) = -1.96 = -\operatorname{probit}(0.975).</math>
In general,
:<math>\Phi(\operatorname{probit}(p)) = p</math>
:and
:<math>\operatorname{probit}(\Phi(z)) = z.</math>
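The identities above can be checked numerically. The following is a minimal sketch that assumes Python's standard-library `statistics.NormalDist` as the source of the normal CDF and its inverse; the helper name `probit` and the variable names are illustrative and not part of any standard API.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0.0, sigma=1.0)


def probit(p: float) -> float:
    """Quantile function of the standard normal distribution, for 0 < p < 1."""
    return std_normal.inv_cdf(p)


# Cumulative probability below -1.96 (close to 0.025).
lower_tail = std_normal.cdf(-1.96)

# The inverse computation: the value whose cumulative probability is 0.025.
z = probit(0.025)

# Round trips in both directions should return the starting value.
round_trip_p = std_normal.cdf(probit(0.3))
round_trip_z = probit(std_normal.cdf(1.25))
```

Because the standard normal density is symmetric around zero, `probit(0.975)` should equal `-probit(0.025)` up to floating-point error.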
Conceptual development
The idea of the probit function was published by
Chester Ittner Bliss in a 1934 article in ''
Science
'' on how to treat data such as the percentage of a pest killed by a
pesticide
. Bliss proposed transforming the percentage killed into a "probability unit" (or "probit") which was linearly related to the modern definition (he defined it arbitrarily as equal to 0 for 0.0001 and 1 for 0.9999).
He included a table to aid other researchers to convert their kill percentages to his probit, which they could then plot against the logarithm of the dose and thereby, it was hoped, obtain a more or less straight line. Such a so-called
probit model
is still important in toxicology, as well as other fields. The approach is justified in particular if response variation can be rationalized as a
lognormal
distribution of tolerances among subjects on test, where the tolerance of a particular subject is the dose just sufficient for the response of interest.
The method introduced by Bliss was carried forward in ''Probit Analysis'', an important text on toxicological applications by
D. J. Finney
. Values tabled by Finney can be derived from probits as defined here by adding a value of 5. This distinction is summarized by Collett (p. 55): "The original definition of a probit [with 5 added] was primarily to avoid having to work with negative probits; ... This definition is still used in some quarters, but in the major statistical software packages for what is referred to as probit analysis, probits are defined without the addition of 5." Probit methodology, including numerical optimization for fitting of probit functions, was introduced before widespread availability of electronic computing. When using tables, it was convenient to have probits uniformly positive. Common areas of application do not require positive probits.
Diagnosing deviation of a distribution from normality
In addition to providing a basis for important types of regression, the probit function is useful in statistical analysis for diagnosing deviation from normality, according to the method of Q–Q plotting. If a set of data is actually a
sample of a
normal distribution
, a plot of the values against their probit scores will be approximately linear. Specific deviations from normality such as
asymmetry
,
heavy tails, or
bimodality can be diagnosed based on detection of specific deviations from linearity. While the Q–Q plot can be used for comparison to any distribution family (not only the normal), the normal Q–Q plot is a relatively standard exploratory data analysis procedure because the assumption of normality is often a starting point for analysis.
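As a rough illustration of the normal Q–Q idea, the sketch below pairs each sorted observation with the probit of a plotting position and summarizes linearity with a correlation coefficient instead of drawing the plot. The plotting positions (i − 0.5)/n, the helper names, and the simulated samples are all assumptions made for the example. Other plotting-position conventions are in common use.

```python
import math
import random
from statistics import NormalDist, fmean


def normal_qq_points(data):
    """Pair the probit score of each plotting position with the sorted data."""
    n = len(data)
    std_normal = NormalDist()
    # Assumed plotting positions (i - 0.5) / n for i = 1..n.
    scores = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(scores, sorted(data)))


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def qq_correlation(data):
    """Correlation between probit scores and sorted data (near 1 if linear)."""
    points = normal_qq_points(data)
    return pearson([x for x, _ in points], [y for _, y in points])


# Simulated data, for illustration only.
rng = random.Random(0)
normal_sample = [rng.gauss(10.0, 2.0) for _ in range(500)]
skewed_sample = [rng.expovariate(1.0) for _ in range(500)]

r_normal = qq_correlation(normal_sample)  # expected to be close to 1
r_skewed = qq_correlation(skewed_sample)  # expected to be noticeably lower
```

A single correlation value hides the shape of the departure from linearity, so in practice the points returned by `normal_qq_points` would normally be plotted and inspected.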
Computation
The normal distribution CDF and its inverse are not available in
closed form, and computation requires careful use of numerical procedures. However, the functions are widely available in software for statistics and probability modeling, and in spreadsheets. In
Microsoft Excel
, for example, the probit function is available as norm.s.inv(p). In computing environments where numerical implementations of the inverse
error function
are available, the probit function may be obtained as
:<math>\operatorname{probit}(p) = \sqrt{2}\,\operatorname{erf}^{-1}(2p-1).</math>
An example is
MATLAB
, where an 'erfinv' function is available. The language
Mathematica
implements 'InverseErf'. Other environments directly implement the probit function as is shown in the following session in the
R programming language
.
> qnorm(0.025)
-1.959964
> pnorm(-1.96)
0.02499790
Details for computing the inverse error function can be found a
Wichura gives a fast algorithm for computing the probit function to 16 decimal places; this is used in R to generate random variates for the normal distribution.
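The relationship between the probit and the inverse error function, probit(p) = √2 · erf⁻¹(2p − 1), can be sketched in Python. The standard library provides `math.erf` but no inverse, so the example inverts it with a simple Newton iteration. This is an illustrative stand-in, not Wichura's algorithm and not how any particular package computes the function. The names `erfinv_newton` and `probit_via_erfinv` are hypothetical.

```python
import math
from statistics import NormalDist


def erfinv_newton(y: float, tol: float = 1e-14, max_iter: int = 100) -> float:
    """Invert math.erf on (-1, 1) with Newton's method (illustrative only)."""
    if not -1.0 < y < 1.0:
        raise ValueError("y must lie strictly between -1 and 1")
    if y < 0.0:
        # erf is an odd function, so only non-negative targets need solving.
        return -erfinv_newton(-y, tol, max_iter)
    x = 0.0
    for _ in range(max_iter):
        # Derivative of erf: d/dx erf(x) = (2 / sqrt(pi)) * exp(-x^2).
        slope = 2.0 / math.sqrt(math.pi) * math.exp(-x * x)
        step = (math.erf(x) - y) / slope
        x -= step
        if abs(step) < tol:
            break
    return x


def probit_via_erfinv(p: float) -> float:
    """probit(p) = sqrt(2) * erfinv(2p - 1), for 0 < p < 1."""
    return math.sqrt(2.0) * erfinv_newton(2.0 * p - 1.0)


# Compare with the standard library's own inverse normal CDF.
reference = NormalDist().inv_cdf(0.025)
via_erfinv = probit_via_erfinv(0.025)
```

The Newton iteration is adequate for moderate probabilities; for values of p extremely close to 0 or 1, a purpose-built routine is the safer choice.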
An ordinary differential equation for the probit function
Another means of computation is based on forming a non-linear ordinary differential equation (ODE) for probit, as per the Steinbrecher and Shaw method.
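Abbreviating the probit function as <math>w(p)</math>, the ODE is
:<math>\frac{dw}{dp} = \frac{1}{f(w)}</math>
where <math>f(w)</math> is the probability density function of <math>w</math>.
In the case of the Gaussian:
:<math>\frac{dw}{dp} = \sqrt{2\pi}\, e^{w^2/2}</math>
Differentiating again:
:<math>\frac{d^2 w}{dp^2} = w \left(\frac{dw}{dp}\right)^2</math>
with the centre (initial) conditions
:<math>w(1/2) = 0,</math>
:<math>w'(1/2) = \sqrt{2\pi}.</math>
This equation may be solved by several methods, including the classical power series approach. From this, solutions of arbitrarily high accuracy may be developed based on Steinbrecher's approach to the series for the inverse error function. The power series solution is given by
:<math>w(p) = \sqrt{\frac{\pi}{2}} \sum_{k=0}^{\infty} \frac{d_k}{2k+1} (2p-1)^{2k+1}</math>
where the coefficients <math>d_k</math> satisfy the non-linear recurrence
:<math>d_{k+1} = \frac{\pi}{4} \sum_{j=0}^{k} \frac{d_j d_{k-j}}{(j+1)(2j+1)}</math>
with <math>d_0 = 1</math>. In this form the ratio <math>d_{k+1}/d_k \to 1</math> as <math>k \to \infty</math>.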
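A truncated version of the power series can be evaluated directly. The sketch below assumes the series in powers of (2p − 1) with coefficients d_0 = 1 and d_{k+1} = (π/4) Σ_{j=0..k} d_j d_{k−j} / ((j+1)(2j+1)), scaled by √(π/2). The function names and the default of 200 terms are arbitrary choices for illustration. Convergence slows as p approaches 0 or 1, so the truncation is only reliable away from the tails.

```python
import math
from statistics import NormalDist


def probit_series_coefficients(n_terms: int) -> list:
    """Coefficients d_0 .. d_{n_terms-1} from the non-linear recurrence."""
    d = [1.0]  # d_0 = 1
    for k in range(n_terms - 1):
        total = sum(
            d[j] * d[k - j] / ((j + 1) * (2 * j + 1)) for j in range(k + 1)
        )
        d.append(math.pi / 4.0 * total)
    return d


def probit_series(p: float, n_terms: int = 200) -> float:
    """Truncated power series for probit(p) in powers of (2p - 1)."""
    d = probit_series_coefficients(n_terms)
    x = 2.0 * p - 1.0
    series = sum(d[k] / (2 * k + 1) * x ** (2 * k + 1) for k in range(n_terms))
    return math.sqrt(math.pi / 2.0) * series


# Compare the truncated series with the standard library's inverse normal CDF.
approx = probit_series(0.75)
reference = NormalDist().inv_cdf(0.75)
```

Recomputing the coefficients on every call keeps the example short. A practical implementation would cache them.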
Logit

Closely related to the probit function (and
probit model
) are the
logit
function and
logit model. The inverse of the logistic function is given by
:<math>\operatorname{logit}(p) = \log\left(\frac{p}{1-p}\right).</math>
Analogously to the probit model, we may assume that such a quantity is related linearly to a set of predictors, resulting in the
logit model, the basis in particular of the
logistic regression
model, the most prevalent form of
regression analysis for categorical response data. In current statistical practice, probit and logit regression models are often handled as cases of the
generalized linear model
.
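To make the comparison concrete, the sketch below evaluates both transformations at a few probabilities. It assumes the logit is log(p / (1 − p)), the inverse of the logistic function 1 / (1 + e^(−x)), and uses the standard library's `statistics.NormalDist` for the probit. The function names are illustrative.

```python
import math
from statistics import NormalDist


def logistic(x: float) -> float:
    """Logistic function, mapping the real line to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def logit(p: float) -> float:
    """Inverse of the logistic function: the log-odds of p, for 0 < p < 1."""
    return math.log(p / (1.0 - p))


def probit(p: float) -> float:
    """Inverse of the standard normal CDF, for 0 < p < 1."""
    return NormalDist().inv_cdf(p)


# Both transformations map p = 0.5 to 0 and are antisymmetric around it,
# but they differ in scale, with the logit larger in magnitude away from 0.5.
comparison = [(p, logit(p), probit(p)) for p in (0.1, 0.25, 0.5, 0.75, 0.9)]
```

Each entry of `comparison` holds a probability together with its logit and probit values, which can be tabulated or plotted side by side.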
See also
* Detection error tradeoff graphs (DET graphs, an alternative to the ROC)
* Logistic regression (a.k.a. logit model)
* Logit
* Probit model
* Multinomial probit
* Q–Q plot
* Continuous function
* Monotonic function
* Quantile function
* Sigmoid function
* Rankit analysis, also developed by Chester Bliss
* Ridit scoring
External links
Which Link Function — Logit, Probit, or Cloglog? 12.04.2023