A quantile-parameterized distribution (QPD) is a probability distribution that is directly parameterized by data. QPDs were created to meet the need for easy-to-use continuous probability distributions flexible enough to represent a wide range of uncertainties, such as those commonly encountered in business, economics, engineering, and science. Because QPDs are directly parameterized by data, they have the practical advantage of avoiding the intermediate step of parameter estimation, a time-consuming process that typically requires non-linear iterative methods to estimate probability-distribution parameters from data. Some QPDs have virtually unlimited shape flexibility and closed-form moments as well.

History

The development of quantile-parameterized distributions was inspired by the practical need for flexible continuous probability distributions that are easy to fit to data. Historically, the Pearson and Johnson families of distributions have been used when shape flexibility is needed, because both families can match the first four moments (mean, variance, skewness, and kurtosis) of any data set. In many cases, however, these distributions are either difficult to fit to data or not flexible enough to fit the data appropriately.

For example, the beta distribution is a flexible Pearson distribution that is frequently used to model percentages of a population. However, if the characteristics of this population are such that the desired cumulative distribution function (CDF) should run through certain specific CDF points, there may be no beta distribution that meets this need. Because the beta distribution has only two shape parameters, it cannot, in general, match even three specified CDF points. Moreover, the beta parameters that best fit such data can be found only by nonlinear iterative methods.

Practitioners of decision analysis, needing distributions easily parameterized by three or more CDF points (e.g., because such points were specified as the result of an expert-elicitation process), originally invented quantile-parameterized distributions for this purpose. Keelin and Powley (2011) provided the original definition. Subsequently, Keelin (2016) developed the metalog distributions, a family of quantile-parameterized distributions that has virtually unlimited shape flexibility, simple equations, and closed-form moments.

Definition

Keelin and Powley define a quantile-parameterized distribution as one whose quantile function (inverse CDF) can be written in the form

F^{-1}(y) = \left\{
\begin{array}{cl}
L_0 & \text{for } y=0\\
\sum_{i=1}^n a_i g_i(y) & \text{for } 0<y<1\\
L_1 & \text{for } y=1
\end{array}
\right.

where

\begin{array}{rcl}
L_0 &=& \lim_{y\to 0^+} F^{-1}(y) \\
L_1 &=& \lim_{y\to 1^-} F^{-1}(y)
\end{array}

and the functions g_i(y) are continuously differentiable and linearly independent basis functions. Here, essentially, L_0 and L_1 are the lower and upper bounds (if they exist) of a random variable with quantile function F^{-1}(y). These distributions are called quantile-parameterized because, for a given set of quantile pairs \{(x_i, y_i) \mid i=1,\ldots,n\}, where x_i=F^{-1}(y_i), and a set of n basis functions g_i(y), the coefficients a_i can be determined by solving a set of linear equations. If one desires to use more quantile pairs than basis functions, then the coefficients a_i can be chosen to minimize the sum of squared errors between the stated quantiles x_i and F^{-1}(y_i). Keelin and Powley illustrate this concept for a specific choice of basis functions that is a generalization of the quantile function of the

normal distribution, x=\mu+\sigma \Phi^{-1}(y), for which the mean \mu and standard deviation \sigma are linear functions of cumulative probability y:

\mu(y)=a_1+a_4 y

\sigma(y)=a_2+a_3 y

Substituting these into the normal quantile function gives x=a_1+a_2 \Phi^{-1}(y)+a_3 y \Phi^{-1}(y)+a_4 y, which is linear in the coefficients a_i. The result is a four-parameter distribution that can be fit to a set of four quantile/probability pairs exactly, or to any number of such pairs by linear least squares. Keelin and Powley call this the Simple Q-Normal distribution. Some skewed and symmetric Simple Q-Normal PDFs are shown in the figures below.

[Figures: skewed and symmetric Simple Q-Normal PDFs]
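As a rough illustration of the exact four-point fit, the sketch below solves the four linear equations for the Simple Q-Normal coefficients using only the Python standard library. The quantile pairs are made-up values, and the helper names (`basis`, `solve_linear`, `quantile`) are assumptions of this sketch rather than anything defined by Keelin and Powley.

```python
from statistics import NormalDist

def basis(y):
    """Simple Q-Normal basis functions (1, z, y*z, y), with z = Phi^{-1}(y)."""
    z = NormalDist().inv_cdf(y)
    return [1.0, z, y * z, y]

def solve_linear(A, b):
    """Solve the square system A a = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    a = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(M[r][c] * a[c] for c in range(r + 1, n))
        a[r] = (M[r][n] - tail) / M[r][r]
    return a

def quantile(a, y):
    """Evaluate x = a_1 + a_2 z + a_3 y z + a_4 y."""
    return sum(ai * gi for ai, gi in zip(a, basis(y)))

# Illustrative (made-up) quantile pairs: x_i is the y_i-quantile.
ys = [0.1, 0.3, 0.7, 0.9]
xs = [2.0, 4.0, 7.0, 12.0]

a = solve_linear([basis(y) for y in ys], xs)
for y, x in zip(ys, xs):
    print(y, x, round(quantile(a, y), 6))
```

The fitted coefficients reproduce the four inputs, but that alone does not establish that the fitted function is a valid distribution; the feasibility condition described under Properties still has to be checked.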

Properties

QPDs that meet Keelin and Powley’s definition have the following properties.

Probability density function

Differentiating x=F^{-1}(y)=\sum_{i=1}^n a_i g_i(y) with respect to y yields dx/dy. The reciprocal of this quantity, dy/dx, is the probability density function (PDF):

f(y) = \left( \sum_{i=1}^n a_i \frac{d g_i(y)}{dy} \right)^{-1}

where 0<y<1. Note that this PDF is expressed as a function of cumulative probability y rather than x. To plot it, as shown in the figures, vary y\in(0,1) parametrically. Plot x=F^{-1}(y) on the horizontal axis and f(y) on the vertical axis.
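The parametric plotting recipe can be sketched for the Simple Q-Normal distribution, whose derivative dx/dy follows from the product rule. The coefficient values and function names below are illustrative assumptions, and only the standard library is used.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def quantile(a, y):
    """x = a_1 + a_2 z + a_3 y z + a_4 y, with z = Phi^{-1}(y)."""
    z = STD_NORMAL.inv_cdf(y)
    return a[0] + a[1] * z + a[2] * y * z + a[3] * y

def pdf_of_y(a, y):
    """f(y) = 1 / (dx/dy), expressed in terms of cumulative probability y."""
    z = STD_NORMAL.inv_cdf(y)
    dz_dy = 1.0 / STD_NORMAL.pdf(z)  # derivative of Phi^{-1}(y)
    dx_dy = a[1] * dz_dy + a[2] * (z + y * dz_dy) + a[3]
    return 1.0 / dx_dy

# Assumed example coefficients (a_3 = a_4 = 0 would give a plain normal).
a = [10.0, 2.0, 1.0, 0.5]

# Parametric curve: each y yields one (x, f) point of the PDF plot.
curve = [(quantile(a, k / 100), pdf_of_y(a, k / 100)) for k in range(1, 100)]
```

Passing the first and second elements of each pair in `curve` to a plotting tool as horizontal and vertical coordinates would trace the density.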

Feasibility

A function of the form of F^{-1}(y) is a feasible probability distribution if and only if f(y)>0 for all y \in (0,1). This implies a feasibility constraint on the set of coefficients \boldsymbol a=(a_1,\ldots,a_n) \in \R^n:

\sum_{i=1}^n a_i \frac{d g_i(y)}{dy} > 0 \quad \text{for all } y \in (0,1)

In practical applications, feasibility must generally be checked rather than assumed.
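One way to check feasibility in practice is to evaluate the constraint on a grid of probabilities. The sketch below does this for an assumed three-term polynomial basis g = (1, y, y^2); the function names and the grid size are choices made for this example. A grid check can miss violations between grid points, so it is a screening step rather than a proof.

```python
def basis_derivatives(y):
    """Derivatives dg_i/dy for the assumed polynomial basis g = (1, y, y**2)."""
    return [0.0, 1.0, 2.0 * y]

def is_feasible_on_grid(a, num_points=999):
    """Return True if sum_i a_i * dg_i/dy > 0 at every interior grid point."""
    for k in range(1, num_points + 1):
        y = k / (num_points + 1)  # strictly inside (0, 1)
        slope = sum(ai * di for ai, di in zip(a, basis_derivatives(y)))
        if slope <= 0.0:
            return False
    return True

print(is_feasible_on_grid([0.0, 1.0, 1.0]))   # slope 1 + 2y, positive on (0, 1)
print(is_feasible_on_grid([0.0, 1.0, -1.0]))  # slope 1 - 2y, non-positive for y >= 0.5
```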

Convexity

A QPD’s set of feasible coefficients S_\boldsymbol a=\{\boldsymbol a\in\R^n \mid \sum_{i=1}^n a_i \, d g_i(y)/dy > 0 \text{ for all } y\in (0,1)\} is convex. Because convex optimization requires convex feasible sets, this property simplifies optimization applications involving QPDs.
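The convexity claim follows in one line from the fact that the feasibility constraint is linear in the coefficients. As a sketch, take any two feasible coefficient vectors \boldsymbol a and \boldsymbol b (symbols introduced here for illustration) and any \lambda \in [0,1]:

```latex
\sum_{i=1}^n \bigl(\lambda a_i + (1-\lambda) b_i\bigr) \frac{d g_i(y)}{dy}
  = \lambda \sum_{i=1}^n a_i \frac{d g_i(y)}{dy}
  + (1-\lambda) \sum_{i=1}^n b_i \frac{d g_i(y)}{dy} > 0
  \quad \text{for all } y \in (0,1)
```

Both sums on the right are positive, and the weights \lambda and 1-\lambda are non-negative and sum to one, so \lambda \boldsymbol a + (1-\lambda) \boldsymbol b also lies in S_\boldsymbol a.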

Fitting to data

The coefficients \boldsymbol a can be determined from data by linear least squares. Given m data points (x_i,y_i) that are intended to characterize the CDF of a QPD, and the m \times n matrix \boldsymbol Y whose elements consist of g_j(y_i), then, so long as \boldsymbol Y^T \boldsymbol Y is invertible, the column vector of coefficients \boldsymbol a can be determined as

\boldsymbol a=(\boldsymbol Y^T \boldsymbol Y)^{-1} \boldsymbol Y^T \boldsymbol x

where m\geq n and \boldsymbol x=(x_1,\ldots,x_m) is a column vector. If m=n, this equation reduces to \boldsymbol a=\boldsymbol Y^{-1} \boldsymbol x, where the resulting CDF runs through all data points exactly. An alternate method, implemented as a linear program, determines the coefficients by minimizing the sum of absolute distances between the CDF and the data subject to feasibility constraints.
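A minimal sketch of the least-squares formula, assuming the two-term logistic basis g = (1, \ln(y/(1-y))) and made-up data with more points than coefficients (m = 5, n = 2). Because n = 2, the normal equations are solved here with an explicit 2 × 2 inverse; a general implementation would more likely call a linear-algebra library.

```python
import math

def basis(y):
    """Assumed logistic basis: g_1(y) = 1, g_2(y) = ln(y / (1 - y))."""
    return [1.0, math.log(y / (1.0 - y))]

def fit_least_squares(xs, ys):
    """Return a = (Y^T Y)^{-1} Y^T x for the two-term basis above."""
    Y = [basis(y) for y in ys]  # m x 2 matrix
    # Entries of the symmetric 2 x 2 matrix Y^T Y and of the vector Y^T x.
    s11 = sum(r[0] * r[0] for r in Y)
    s12 = sum(r[0] * r[1] for r in Y)
    s22 = sum(r[1] * r[1] for r in Y)
    t1 = sum(r[0] * x for r, x in zip(Y, xs))
    t2 = sum(r[1] * x for r, x in zip(Y, xs))
    det = s11 * s22 - s12 * s12
    if det == 0.0:
        raise ValueError("Y^T Y is not invertible")
    return [(s22 * t1 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det]

# Made-up CDF data: probabilities y_i and observed quantiles x_i.
ys = [0.05, 0.25, 0.50, 0.75, 0.95]
xs = [-2.8, 0.9, 3.1, 5.2, 8.7]

a = fit_least_squares(xs, ys)
residuals = [x - (a[0] + a[1] * basis(y)[1]) for x, y in zip(xs, ys)]
print(a, residuals)
```

With more data points than coefficients the fitted quantile function generally does not pass through every point, which is why the residuals are reported alongside the coefficients.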

Shape flexibility

A QPD with n terms, where n\ge 2, has n-2 shape parameters. Thus, QPDs can be far more flexible than the Pearson distributions, which have at most two shape parameters. For example, ten-term metalog distributions parameterized by 105 CDF points from 30 traditional source distributions (including normal, Student's t, lognormal, gamma, beta, and extreme value) have been shown to approximate each such source distribution within a K–S distance of 0.001 or less.

Transformations

QPD transformations are governed by a general property of quantile functions: for any quantile function x=Q(y) and increasing function t(x), x=t^{-1}(Q(y)) is a quantile function. For example, the quantile function of the normal distribution, x=\mu+\sigma \Phi^{-1}(y), is a QPD by the Keelin and Powley definition. The natural logarithm, t(x)=\ln(x-b_l), is an increasing function, so x=b_l+e^{\mu+\sigma \Phi^{-1}(y)} is the quantile function of the lognormal distribution with lower bound b_l. Importantly, this transformation converts an unbounded QPD into a semi-bounded QPD. Similarly, applying this log transformation to the unbounded metalog distribution yields the semi-bounded (log) metalog distribution; likewise, applying the logit transformation, t(x)=\ln((x-b_l)/(b_u-x)), yields the bounded (logit) metalog distribution with lower and upper bounds b_l and b_u, respectively.

Moreover, by considering t(x) to be F^{-1}(y) distributed, where F^{-1}(y) is any QPD that meets Keelin and Powley’s definition, the transformed variable maintains the above properties of feasibility, convexity, and fitting to data. Such transformed QPDs have greater shape flexibility than the underlying F^{-1}(y), which has n-2 shape parameters; the log-transformed QPD has n-1 shape parameters, and the logit-transformed QPD has n shape parameters. Moreover, such transformed QPDs share the same set of feasible coefficients as the underlying untransformed QPD.
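The log and logit transforms can be sketched by composing the inverse transform t^{-1} with an underlying quantile function. The example below uses the normal quantile function as the underlying QPD; the parameter values, bounds, and function names are assumptions made for illustration.

```python
import math
from statistics import NormalDist

def normal_quantile(y, mu, sigma):
    """Underlying unbounded QPD: Q(y) = mu + sigma * Phi^{-1}(y)."""
    return mu + sigma * NormalDist().inv_cdf(y)

def log_transformed(y, mu, sigma, b_l):
    """Semi-bounded quantile function: inverse of t(x) = ln(x - b_l)."""
    return b_l + math.exp(normal_quantile(y, mu, sigma))

def logit_transformed(y, mu, sigma, b_l, b_u):
    """Bounded quantile function: inverse of t(x) = ln((x - b_l) / (b_u - x))."""
    e = math.exp(normal_quantile(y, mu, sigma))
    return (b_l + b_u * e) / (1.0 + e)

probs = [0.01, 0.25, 0.50, 0.75, 0.99]
semi = [log_transformed(y, 0.0, 1.0, b_l=5.0) for y in probs]
bounded = [logit_transformed(y, 0.0, 1.0, b_l=5.0, b_u=9.0) for y in probs]
print(semi)     # every value lies above the lower bound 5
print(bounded)  # every value lies strictly between 5 and 9
```

In both cases the coefficients \mu and \sigma enter only through the underlying quantile function, which is consistent with the statement that the transformed and untransformed QPDs share the same feasible coefficient set.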

Moments

The k^{th} moment of a QPD is:

E[x^k] = \int_0^1 \left( \sum_{i=1}^n a_i g_i(y) \right)^k dy

Whether such moments exist in closed form depends on the choice of QPD basis functions g_i(y). The unbounded metalog distribution and polynomial QPDs are examples of QPDs for which moments exist in closed form as functions of the coefficients a_i.
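For a polynomial QPD the moment integral can be done by hand, which gives a convenient check. The sketch below assumes the three-term basis g = (1, y, y^2), compares the closed-form mean a_1 + a_2/2 + a_3/3 with a midpoint-rule approximation of the integral, and uses illustrative coefficient values.

```python
def quantile(a, y):
    """Polynomial QPD with assumed basis (1, y, y**2)."""
    return a[0] + a[1] * y + a[2] * y * y

def moment_numeric(a, k, steps=10_000):
    """Midpoint-rule approximation of the k-th moment: integral of Q(y)**k over (0, 1)."""
    h = 1.0 / steps
    return h * sum(quantile(a, (j + 0.5) * h) ** k for j in range(steps))

def mean_closed_form(a):
    """Integral of a_1 + a_2 y + a_3 y^2 over (0, 1)."""
    return a[0] + a[1] / 2.0 + a[2] / 3.0

a = [1.0, 2.0, 3.0]  # illustrative coefficients; slope 2 + 6y > 0, so feasible
print(mean_closed_form(a), moment_numeric(a, 1))
```

Higher moments follow the same pattern: expanding the k-th power of the polynomial and integrating term by term gives a closed form in the coefficients.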

Simulation

Since the quantile function x=F^{-1}(y) is expressed in closed form, Keelin and Powley QPDs facilitate Monte Carlo simulation. Substituting in uniformly distributed random samples of y produces random samples of x in closed form, thereby eliminating the need to invert a CDF expressed as y=F(x).
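Sampling from a QPD in this way amounts to evaluating the quantile function at uniform random numbers. A minimal sketch, assuming the Gumbel quantile function x = \mu - \beta \ln(-\ln(y)) with illustrative parameter values and a fixed seed:

```python
import math
import random

def gumbel_quantile(y, mu, beta):
    """Closed-form quantile function x = mu - beta * ln(-ln(y)), for 0 < y < 1."""
    return mu - beta * math.log(-math.log(y))

def sample(n, mu, beta, seed=0):
    """Draw n samples by substituting uniform random y into the quantile function."""
    rng = random.Random(seed)
    draws = []
    while len(draws) < n:
        y = rng.random()  # uniform on [0, 1)
        if y > 0.0:       # skip the endpoint, where the quantile is unbounded
            draws.append(gumbel_quantile(y, mu, beta))
    return draws

xs = sample(20_000, mu=1.0, beta=2.0)
print(sum(xs) / len(xs))  # should be near mu + 0.5772 * beta for a Gumbel distribution
```

The same loop works for any QPD once `gumbel_quantile` is swapped for the relevant closed-form quantile function.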

Related distributions

The following probability distributions are QPDs according to Keelin and Powley’s definition:

* The normal distribution, with quantile function x=\mu+\sigma \Phi^{-1}(y).
* The Gumbel distribution, with quantile function x=\mu - \beta \ln(-\ln(y)).
* The Cauchy distribution, with quantile function x=x_0+\gamma \tan[\pi(y-0.5)].
* The logistic distribution, with quantile function x=\mu+s \ln(y/(1-y)).
* The unbounded metalog distribution, which is a power series expansion of the \mu and s parameters of the logistic quantile function.
* The semi-bounded and bounded metalog distributions, which are the log and logit transforms, respectively, of the unbounded metalog distribution.
* The SPT (symmetric-percentile triplet) unbounded, semi-bounded, and bounded metalog distributions, which are parameterized by three CDF points and optional upper and lower bounds.
* The Simple Q-Normal distribution.
* The metadistributions, including the meta-normal.
* Quantile functions expressed as polynomial functions of cumulative probability y, including Chebyshev polynomial functions.

Like the SPT metalog distributions, the Johnson Quantile-Parameterized Distributions (JQPDs) are parameterized by three quantiles. JQPDs do not meet Keelin and Powley’s QPD definition, but rather have their own properties. JQPDs are feasible for all SPT parameter sets that are consistent with the rules of probability.

Applications

The original applications of QPDs were by decision analysts wishing to conveniently convert expert-assessed quantiles (e.g., 10th, 50th, and 90th quantiles) into smooth continuous probability distributions. QPDs have also been used to fit output data from simulations in order to represent those outputs (both CDFs and PDFs) as closed-form continuous distributions. Used in this way, they are typically more stable and smoother than histograms. Similarly, since QPDs can impose fewer shape constraints than traditional distributions, they have been used to fit a wide range of empirical data in order to represent those data sets as continuous distributions (e.g., reflecting bimodality that may exist in the data in a straightforward manner). Quantile parameterization enables a closed-form QPD representation of known distributions whose CDFs otherwise have no closed-form expression. Keelin et al. (2019) apply this to the sum of independent identically distributed lognormal distributions, where quantiles of the sum can be determined by a large number of simulations. Nine such quantiles are used to parameterize a semi-bounded metalog distribution that runs through each of these nine quantiles exactly. QPDs have also been applied to assess the risks of asteroid impact, cybersecurity, biases in projections of oil-field production when compared to observed production after the fact, and future Canadian population projections based on combining the probabilistic views of multiple experts. See metalog distributions and Keelin (2016) for additional applications of the metalog distribution.

External links

* The Metalog Distributions, www.metalogs.org
