The Pearson distribution is a family of continuous probability distributions. It was first published by Karl Pearson in 1895 and subsequently extended by him in 1901 and 1916 in a series of articles on biostatistics.
History
The Pearson system was originally devised in an effort to model visibly skewed observations. It was well known at the time how to adjust a theoretical model to fit the first two cumulants or moments of observed data: any probability distribution can be extended straightforwardly to form a location-scale family. Except in pathological cases, a location-scale family can be made to fit the observed mean (first cumulant) and variance (second cumulant) arbitrarily well. However, it was not known how to construct probability distributions in which the skewness (standardized third cumulant) and kurtosis (standardized fourth cumulant) could be adjusted equally freely. This need became apparent when trying to fit known theoretical models to observed data that exhibited skewness. Pearson's examples include survival data, which are usually asymmetric.
In his original paper, Pearson (1895, p. 360) identified four types of distributions (numbered I through IV) in addition to the normal distribution (which was originally known as type V). The classification depended on whether the distributions were supported on a bounded interval, on a half-line, or on the whole real line; and whether they were potentially skewed or necessarily symmetric. A second paper (Pearson 1901) fixed two omissions: it redefined the type V distribution (originally just the normal distribution, but now the inverse-gamma distribution) and introduced the type VI distribution. Together the first two papers cover the five main types of the Pearson system (I, III, IV, V, and VI). In a third paper, Pearson (1916) introduced further special cases and subtypes (VII through XII).
Rhind (1909, pp. 430–432) devised a simple way of visualizing the parameter space of the Pearson system, which was subsequently adopted by Pearson (1916, plate 1 and pp. 430ff., 448ff.). The Pearson types are characterized by two quantities, commonly referred to as β₁ and β₂. The first is the square of the skewness:
: $\beta_1 = \gamma_1^2$
where γ₁ is the skewness, or third standardized moment. The second is the traditional kurtosis, or fourth standardized moment: β₂ = γ₂ + 3. (Modern treatments define kurtosis γ₂ in terms of cumulants instead of moments, so that for a normal distribution we have γ₂ = 0 and β₂ = 3. Here we follow the historical precedent and use β₂.) The diagram on the right shows which Pearson type a given concrete distribution (identified by a point (β₁, β₂)) belongs to.
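To make the two coordinates concrete, the following minimal sketch estimates β₁ and β₂ from a sample. It assumes plain (biased) central sample moments; the function name `pearson_betas` and the example data are hypothetical and serve only as an illustration.

```python
def pearson_betas(data):
    """Return (beta1, beta2) for a sample.

    beta1 is the squared skewness and beta2 the traditional kurtosis
    (gamma2 + 3). Plain biased central moments are used, which is an
    assumption of this sketch rather than a prescribed estimator.
    """
    n = len(data)
    mean = sum(data) / n
    # Central sample moments of order 2, 3 and 4.
    mu2 = sum((x - mean) ** 2 for x in data) / n
    mu3 = sum((x - mean) ** 3 for x in data) / n
    mu4 = sum((x - mean) ** 4 for x in data) / n
    beta1 = mu3 ** 2 / mu2 ** 3  # equals gamma1 squared
    beta2 = mu4 / mu2 ** 2       # equals gamma2 + 3
    return beta1, beta2


# Hypothetical right-skewed sample.
beta1, beta2 = pearson_betas([1.0, 2.0, 2.0, 3.0, 3.0, 3.0, 4.0, 9.0])
print(beta1, beta2)
```

The resulting point (β₁, β₂) is what one would look up in a diagram of the Pearson types.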
Many of the skewed and/or non-mesokurtic distributions familiar to us today were still unknown in the early 1890s. What is now known as the beta distribution had been used by Thomas Bayes as a posterior distribution of the parameter of a Bernoulli distribution in his 1763 work on inverse probability. The beta distribution gained prominence due to its membership in Pearson's system and was known until the 1940s as the Pearson type I distribution. (Pearson's type II distribution is a special case of type I, but is usually no longer singled out.) The gamma distribution originated from Pearson's work (Pearson 1893, p. 331; Pearson 1895, pp. 357, 360, 373–376) and was known as the Pearson type III distribution, before acquiring its modern name in the 1930s and 1940s. Pearson's 1895 paper introduced the type IV distribution, which contains Student's ''t''-distribution as a special case, predating William Sealy Gosset's subsequent use by several years. His 1901 paper introduced the inverse-gamma distribution (type V) and the beta prime distribution (type VI).
Definition
A Pearson density ''p'' is defined to be any valid solution to the differential equation (cf. Pearson 1895, p. 381)
: $\frac{p'(x)}{p(x)} + \frac{a + (x - \lambda)}{b_2 (x - \lambda)^2 + b_1 (x - \lambda) + b_0} = 0 \qquad (1)$
with:
: $b_0 = \frac{4 \beta_2 - 3 \beta_1}{10 \beta_2 - 12 \beta_1 - 18} \, \mu_2,$
: $a = b_1 = \sqrt{\mu_2} \, \sqrt{\beta_1} \, \frac{\beta_2 + 3}{10 \beta_2 - 12 \beta_1 - 18},$
: $b_2 = \frac{2 \beta_2 - 3 \beta_1 - 6}{10 \beta_2 - 12 \beta_1 - 18}.$
Here μ₂ denotes the variance (second central moment).
According to Ord, Pearson devised the underlying form of Equation (1) on the basis of, firstly, the formula for the derivative of the logarithm of the density function of the normal distribution (which gives a linear function) and, secondly, a recurrence relation for values in the probability mass function of the hypergeometric distribution (which yields the linear-divided-by-quadratic structure).
In Equation (1), the parameter ''a'' determines a stationary point, and hence under some conditions a mode of the distribution, since
: $p'(\lambda - a) = 0$
follows directly from the differential equation.
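Since we are confronted with a first-order linear differential equation with variable coefficients, its solution is straightforward. Measuring ''x'' from the location parameter (that is, taking ''λ'' = 0), we obtain:
: $p(x) \propto \exp\left( -\int \frac{x + a}{b_2 x^2 + b_1 x + b_0} \, dx \right).$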
The integral in this solution simplifies considerably when certain special cases of the integrand are considered. Pearson (1895, p. 367) distinguished two main cases, determined by the sign of the discriminant (and hence the number of real roots) of the quadratic function
: $f(x) = b_2 x^2 + b_1 x + b_0. \qquad (2)$
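The case distinction can be sketched in a few lines. This is only an illustration: it assumes the quadratic is f(x) = b₂x² + b₁x + b₀ and that the coefficients follow the standard moment-based expressions in terms of β₁, β₂ and the variance μ₂. The function names are hypothetical.

```python
def pearson_coefficients(beta1, beta2, mu2=1.0):
    """Coefficients (b0, b1, b2) of the quadratic from beta1, beta2 and mu2.

    Assumes the standard moment-based expressions; the common denominator
    10*beta2 - 12*beta1 - 18 must be nonzero.
    """
    denom = 10 * beta2 - 12 * beta1 - 18
    b0 = mu2 * (4 * beta2 - 3 * beta1) / denom
    b1 = (mu2 ** 0.5) * (beta1 ** 0.5) * (beta2 + 3) / denom
    b2 = (2 * beta2 - 3 * beta1 - 6) / denom
    return b0, b1, b2


def discriminant(b0, b1, b2):
    """Discriminant of b2*x**2 + b1*x + b0."""
    return b1 ** 2 - 4 * b2 * b0


# Symmetric and heavy-tailed (beta2 > 3): negative discriminant, case 1.
print(discriminant(*pearson_coefficients(0.0, 4.5)))
# Symmetric and light-tailed (beta2 < 3): positive discriminant, case 2.
print(discriminant(*pearson_coefficients(0.0, 2.0)))
```

For the normal distribution (β₁ = 0, β₂ = 3) the coefficient b₂ vanishes, so the quadratic degenerates to a constant and the discriminant is zero.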
Particular types of distribution
Case 1, negative discriminant
The Pearson type IV distribution
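If the discriminant of the quadratic function (2) is negative ($b_1^2 - 4 b_2 b_0 < 0$), it has no real roots. Then define
: $y = x + \frac{b_1}{2 b_2},$
: $\alpha = \frac{\sqrt{4 b_2 b_0 - b_1^2}}{2 b_2}.$
Observe that ''α'' is a well-defined real number and ''α'' ≠ 0, because by assumption $4 b_2 b_0 - b_1^2 > 0$ and therefore $b_2 \neq 0$. Applying these substitutions, the quadratic function (2) is transformed into
: $f(x) = b_2 \left( y^2 + \alpha^2 \right).$
The absence of real roots is obvious from this formulation, because α² is necessarily positive.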
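We now express the solution to the differential equation (1) as a function of ''y'':
: $p(y) \propto \exp\left( -\frac{1}{b_2} \int \frac{y - \frac{b_1}{2 b_2} + a}{y^2 + \alpha^2} \, dy \right).$
Pearson (1895, p. 362) called this the "trigonometrical case", because the integral
: $\int \frac{y + \frac{2 b_2 a - b_1}{2 b_2}}{y^2 + \alpha^2} \, dy = \frac{1}{2} \ln\left( y^2 + \alpha^2 \right) + \frac{2 b_2 a - b_1}{2 b_2 \alpha} \arctan\left( \frac{y}{\alpha} \right) + C_0$
involves the inverse trigonometric arctan function. Then
: $p(y) \propto \exp\left[ -\frac{1}{2 b_2} \ln\left( 1 + \frac{y^2}{\alpha^2} \right) - \frac{\ln \alpha}{b_2} - \frac{2 b_2 a - b_1}{2 b_2^2 \alpha} \arctan\left( \frac{y}{\alpha} \right) + C_1 \right].$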
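Finally, let
: $m = \frac{1}{2 b_2},$
: $\nu = \frac{2 b_2 a - b_1}{2 b_2^2 \alpha}.$
Applying these substitutions, we obtain the parametric function:
: $p(y) \propto \left[ 1 + \frac{y^2}{\alpha^2} \right]^{-m} \exp\left[ -\nu \arctan\left( \frac{y}{\alpha} \right) \right].$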
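The derivation can be checked numerically. The sketch below assumes Equation (1) is taken with ''λ'' = 0 in the form p′(x)/p(x) = −(x + a)/(b₂x² + b₁x + b₀), together with the substitutions y = x + b₁/(2b₂), α = √(4b₂b₀ − b₁²)/(2b₂), m = 1/(2b₂) and ν = (2b₂a − b₁)/(2b₂²α). The coefficient values are hypothetical and chosen only so that the discriminant is negative.

```python
import math


def type_iv_unnormalized(y, alpha, m, nu):
    """Unnormalized type IV density as a function of the shifted variable y."""
    return (1 + (y / alpha) ** 2) ** (-m) * math.exp(-nu * math.atan(y / alpha))


def log_derivative(f, t, h=1e-6):
    """Central-difference estimate of d/dt log f(t)."""
    return (math.log(f(t + h)) - math.log(f(t - h))) / (2 * h)


# Hypothetical coefficients with b1**2 - 4*b2*b0 < 0 (case 1).
a, b0, b1, b2 = 0.3, 1.0, 0.2, 0.25

# Substitutions assumed in the lead-in above.
alpha = math.sqrt(4 * b2 * b0 - b1 ** 2) / (2 * b2)
m = 1 / (2 * b2)
nu = (2 * b2 * a - b1) / (2 * b2 ** 2 * alpha)

x = 0.7
y = x + b1 / (2 * b2)

# Left side: p'/p from the closed form. Right side: the assumed ODE.
lhs = log_derivative(lambda t: type_iv_unnormalized(t, alpha, m, nu), y)
rhs = -(x + a) / (b2 * x ** 2 + b1 * x + b0)
print(lhs, rhs)
```

The two printed values should agree up to finite-difference error, which is a quick way to catch sign slips in ''ν''.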
This unnormalized density has support on the entire real line. It depends on a scale parameter ''α'' > 0 and shape parameters ''m'' > 1/2 and ''ν''. One parameter was lost when we chose to find the solution to the differential equation (1) as a function of ''y'' rather than ''x''. We therefore reintroduce a fourth parameter, namely the location parameter ''λ''. We have thus derived the density of the Pearson type IV distribution:
: $p(x) = \frac{\left| \frac{\Gamma\left( m + \frac{\nu}{2} i \right)}{\Gamma(m)} \right|^2}{\alpha \, \mathrm{B}\left( m - \frac{1}{2}, \frac{1}{2} \right)} \left[ 1 + \left( \frac{x - \lambda}{\alpha} \right)^2 \right]^{-m} \exp\left[ -\nu \arctan\left( \frac{x - \lambda}{\alpha} \right) \right].$
The normalizing constant involves the complex Gamma function (Γ) and the Beta function (B).
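Python's standard library has no complex Gamma function, so the following sketch normalizes the type IV density numerically instead. It assumes the unnormalized form [1 + (y/α)²]^(−m) exp(−ν arctan(y/α)) and uses the substitution y = α tan θ, which turns the integral over the real line into a finite one. The function name is hypothetical, and the sketch assumes ''m'' > 1 so that the transformed integrand stays bounded.

```python
import math


def type_iv_norm_integral(alpha, m, nu, steps=2000):
    """Integral of the unnormalized type IV density over the real line.

    With y = alpha * tan(theta) the integral becomes
        alpha * int_{-pi/2}^{pi/2} cos(theta)**(2m - 2) * exp(-nu * theta) d(theta),
    evaluated here with composite Simpson's rule (steps must be even).
    Assumes m > 1.
    """
    lo, hi = -math.pi / 2, math.pi / 2
    h = (hi - lo) / steps

    def g(theta):
        c = max(math.cos(theta), 0.0)  # guard against tiny negative round-off
        return c ** (2 * m - 2) * math.exp(-nu * theta)

    total = g(lo) + g(hi)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * g(lo + k * h)
    return alpha * total * h / 3


# For nu = 0 the result should match alpha * B(m - 1/2, 1/2).
alpha, m = 1.5, 2.0
beta_fn = math.gamma(m - 0.5) * math.gamma(0.5) / math.gamma(m)
print(type_iv_norm_integral(alpha, m, 0.0), alpha * beta_fn)
```

The reciprocal of the returned value is the normalizing constant. For ''ν'' ≠ 0 it can serve as a cross-check on an implementation based on the complex Gamma function.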
Notice that the location parameter ''λ'' here is not the same as the original location parameter introduced in the general formulation, but is related via
: $\lambda = \lambda_{\text{original}} + \frac{\alpha \nu}{2 (m - 1)}.$
The Pearson type VII distribution
The shape parameter ''ν'' of the Pearson type IV distribution controls its skewness. If we fix its value at zero, we obtain a symmetric three-parameter family. This special case is known as the Pearson type VII distribution (cf. Pearson 1916, p. 450). Its density is
: $p(x) = \frac{1}{\alpha \, \mathrm{B}\left( m - \frac{1}{2}, \frac{1}{2} \right)} \left[ 1 + \left( \frac{x - \lambda}{\alpha} \right)^2 \right]^{-m},$
where B is the Beta function.
An alternative parameterization (and slight specialization) of the type VII distribution is obtained by letting
: $\alpha = \sigma \sqrt{2m - 3},$
which requires ''m'' > 3/2. This entails a minor loss of generality but ensures that the variance of the distribution exists and is equal to σ². Now the parameter ''m'' only controls the kurtosis of the distribution. If ''m'' approaches infinity as ''λ'' and ''σ'' are held constant, the normal distribution arises as a special case:
: $\lim_{m \to \infty} \frac{1}{\sigma \sqrt{2m - 3} \, \mathrm{B}\left( m - \frac{1}{2}, \frac{1}{2} \right)} \left[ 1 + \left( \frac{x - \lambda}{\sigma \sqrt{2m - 3}} \right)^2 \right]^{-m}$
: $= \frac{1}{\sigma \sqrt{2} \, \Gamma\left( \frac{1}{2} \right)} \cdot \lim_{m \to \infty} \frac{\Gamma(m)}{\Gamma\left( m - \frac{1}{2} \right) \sqrt{m - \frac{3}{2}}} \cdot \lim_{m \to \infty} \left[ 1 + \frac{\left( \frac{x - \lambda}{\sigma} \right)^2}{2m - 3} \right]^{-m}$
: $= \frac{1}{\sigma \sqrt{2 \pi}} \cdot 1 \cdot \exp\left[ -\frac{1}{2} \left( \frac{x - \lambda}{\sigma} \right)^2 \right].$
This is the density of a normal distribution with mean ''λ'' and standard deviation ''σ''.
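The limit can be illustrated numerically. This sketch assumes the type VII density [1 + ((x − λ)/α)²]^(−m) / (α B(m − 1/2, 1/2)) with α = σ√(2m − 3), and works in log space so that large values of ''m'' do not overflow. The function names are hypothetical.

```python
import math


def type_vii_pdf(x, lam, sigma, m):
    """Type VII density in the (lam, sigma, m) parameterization; needs m > 3/2."""
    alpha = sigma * math.sqrt(2 * m - 3)
    # log B(m - 1/2, 1/2) via log-gamma, which stays stable for large m.
    log_beta = math.lgamma(m - 0.5) + math.lgamma(0.5) - math.lgamma(m)
    z2 = ((x - lam) / alpha) ** 2
    return math.exp(-math.log(alpha) - log_beta - m * math.log1p(z2))


def normal_pdf(x, mean, sd):
    """Density of the normal distribution."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


# The gap to the normal density shrinks as m grows.
for m in (2.0, 10.0, 1000.0, 1e6):
    gap = abs(type_vii_pdf(1.0, 0.0, 1.0, m) - normal_pdf(1.0, 0.0, 1.0))
    print(m, gap)
```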
It is convenient to require that ''m'' > 5/2 and to let
: $m = \frac{5}{2} + \frac{3}{\gamma_2}.$
This is another specialization, and it guarantees that the first four moments of the distribution exist. More specifically, the Pearson type VII distribution parameterized in terms of (''λ'', ''σ'', γ₂) has a mean of ''λ'', standard deviation of ''σ'', skewness of zero, and positive excess kurtosis of γ₂.
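As a brief consistency check, the relation between ''m'' and γ₂ can be recovered from the Student's ''t'' case. This is a sketch under two assumptions: the standard excess kurtosis 6/(n − 4) of a ''t''-distribution with n > 4 degrees of freedom, and its identification with type VII through m = (n + 1)/2. The symbol n is introduced here only for illustration.

```latex
\gamma_2 = \frac{6}{n - 4}, \qquad n = 2m - 1
\;\Longrightarrow\;
\gamma_2 = \frac{6}{2m - 5}
\;\Longrightarrow\;
m = \frac{5}{2} + \frac{3}{\gamma_2},
\qquad
m > \frac{5}{2} \iff \gamma_2 > 0 .
```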
Student's ''t''-distribution
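The Pearson type VII distribution is equivalent to the non-standardized Student's ''t''-distribution with parameters ''ν'' > 0, ''μ'', σ² by applying the following substitutions to its original parameterization:
: $\lambda = \mu,$
: $\alpha = \sqrt{\nu \sigma^2},$
: $m = \frac{\nu + 1}{2}.$
(Here ''ν'' denotes the degrees of freedom, not the type IV shape parameter, which is zero for type VII.) Observe that the constraint ''m'' > 1/2 is satisfied.
The resulting density is
: $p(x \mid \mu, \sigma^2, \nu) = \frac{1}{\sqrt{\nu \sigma^2} \, \mathrm{B}\left( \frac{\nu}{2}, \frac{1}{2} \right)} \left( 1 + \frac{1}{\nu} \frac{(x - \mu)^2}{\sigma^2} \right)^{-\frac{\nu + 1}{2}},$
which is easily recognized as the density of a Student's ''t''-distribution.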
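The equivalence can be checked numerically. The sketch below assumes the type VII density in its (λ, α, m) parameterization and the substitutions λ = μ, α = √(νσ²), m = (ν + 1)/2, and compares the result with the usual Gamma-function form of the non-standardized Student's ''t'' density. The function names are hypothetical.

```python
import math


def type_vii_pdf(x, lam, alpha, m):
    """Type VII density with location lam, scale alpha and shape m > 1/2."""
    beta_fn = math.gamma(m - 0.5) * math.gamma(0.5) / math.gamma(m)
    return (1 + ((x - lam) / alpha) ** 2) ** (-m) / (alpha * beta_fn)


def student_t_pdf(x, nu, mu=0.0, sigma=1.0):
    """Non-standardized Student's t density with nu degrees of freedom."""
    z = (x - mu) / sigma
    coef = math.gamma((nu + 1) / 2) / (
        math.sqrt(nu * math.pi) * math.gamma(nu / 2) * sigma
    )
    return coef * (1 + z * z / nu) ** (-(nu + 1) / 2)


nu, mu, sigma = 5.0, 1.0, 2.0
x = 0.3
via_type_vii = type_vii_pdf(x, lam=mu, alpha=math.sqrt(nu) * sigma, m=(nu + 1) / 2)
print(via_type_vii, student_t_pdf(x, nu, mu, sigma))
```

Setting ''ν'' = 1, ''μ'' = 0 and ''σ'' = 1 (that is, ''m'' = 1 and ''α'' = 1) gives the standard Cauchy density.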
This implies that the Pearson type VII distribution subsumes the standard Student's ''t''-distribution and also the standard Cauchy distribution. In particular, the standard Student's ''t''-distribution arises as a subcase, when ''μ'' = 0 and σ² = 1, equivalent to the following substitutions:
: $\lambda = 0,$
: $\alpha = \sqrt{\nu},$
: $m = \frac{\nu + 1}{2}.$
The density of this restricted one-parameter family is a standard Student's ''t'':
: $p(x) = \frac{1}{\sqrt{\nu} \, \mathrm{B}\left( \frac{\nu}{2}, \frac{1}{2} \right)} \left( 1 + \frac{x^2}{\nu} \right)^{-\frac{\nu + 1}{2}}.$
Case 2, non-negative discriminant
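If the quadratic function (2) has a non-negative discriminant ($b_1^2 - 4 b_2 b_0 \geq 0$), it has real roots ''a''₁ and ''a''₂ (not necessarily distinct):
: $a_1 = \frac{-b_1 - \sqrt{b_1^2 - 4 b_2 b_0}}{2 b_2},$
: $a_2 = \frac{-b_1 + \sqrt{b_1^2 - 4 b_2 b_0}}{2 b_2}.$
In the presence of real roots the quadratic function (2) can be written as
: $f(x) = b_2 (x - a_1)(x - a_2),$
and the solution to the differential equation is therefore
: $p(x) \propto \exp\left( -\frac{1}{b_2} \int \frac{x + a}{(x - a_1)(x - a_2)} \, dx \right).$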
Pearson (1895, p. 362) called this the "logarithmic case", because the integral
: $\int \frac{x + a}{(x - a_1)(x - a_2)} \, dx = \frac{(a_1 + a) \ln(x - a_1) - (a_2 + a) \ln(x - a_2)}{a_1 - a_2} + C$
involves only the logarithm function and not the arctan function as in the previous case.
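Using the substitution
: $\nu = \frac{1}{b_2 (a_1 - a_2)},$
we obtain the following solution to the differential equation (1):
: $p(x) \propto (x - a_1)^{-\nu (a_1 + a)} (x - a_2)^{\nu (a_2 + a)}.$
Since this density is only known up to a hidden constant of proportionality, that constant can be changed and the density written as follows:
: $p(x) \propto \left( 1 - \frac{x}{a_1} \right)^{-\nu (a_1 + a)} \left( 1 - \frac{x}{a_2} \right)^{\nu (a_2 + a)}.$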
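The logarithmic case can be checked numerically in the same way. The sketch assumes Equation (1) with ''λ'' = 0 in the form p′(x)/p(x) = −(x + a)/(b₂x² + b₁x + b₀), the substitution ν = 1/(b₂(a₁ − a₂)), and the power form p(x) ∝ (1 − x/a₁)^(−ν(a₁ + a)) (1 − x/a₂)^(ν(a₂ + a)). The coefficient values are hypothetical and chosen so that the roots have opposite signs.

```python
import math

# Hypothetical coefficients with b1**2 - 4*b2*b0 >= 0 (case 2).
a, b0, b1, b2 = 0.2, 1.0, 0.05, -0.1

# Real roots of b2*x**2 + b1*x + b0, labelled so that a1 < a2.
root = math.sqrt(b1 ** 2 - 4 * b2 * b0)
a1, a2 = sorted(((-b1 - root) / (2 * b2), (-b1 + root) / (2 * b2)))

nu = 1 / (b2 * (a1 - a2))


def log_p(x):
    """Log of the unnormalized density in power form, valid for a1 < x < a2."""
    return (-nu * (a1 + a) * math.log(1 - x / a1)
            + nu * (a2 + a) * math.log(1 - x / a2))


x, h = 0.5, 1e-6
# Left side: central-difference log-derivative. Right side: the assumed ODE.
lhs = (log_p(x + h) - log_p(x - h)) / (2 * h)
rhs = -(x + a) / (b2 * x ** 2 + b1 * x + b0)
print(lhs, rhs)
```

With these values both exponents are positive, so the density vanishes at the two roots, as expected for a beta-like shape.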
The Pearson type I distribution
The Pearson type I distribution (a generalization of the beta distribution) arises when the roots of the quadratic equation (2) are of opposite sign, that is, $a_1 < 0 < a_2$. Then the solution ''p'' is supported on the interval $(a_1, a_2)$. Apply the substitution
: $x = a_1 + y (a_2 - a_1),$
where