
The metalog distribution is a flexible continuous probability distribution designed for ease of use in practice. Together with its transforms, the metalog family of continuous distributions is unique because it embodies all of the following properties: virtually unlimited shape flexibility; a choice among unbounded, semi-bounded, and bounded distributions; ease of fitting to data with linear least squares; simple, closed-form quantile function (inverse CDF) equations that facilitate simulation; a simple, closed-form probability density function (PDF); and Bayesian updating in closed form in light of new data. Moreover, like a Taylor series, metalog distributions may have any number of terms, depending on the degree of shape flexibility desired and other application needs.
Applications where metalog distributions can be useful typically involve fitting empirical data, simulated data, or expert-elicited quantiles to smooth, continuous probability distributions. Fields of application are wide-ranging and include economics, science, and engineering, among many others. The metalog distributions, also known as the Keelin distributions, were first published in 2016 by Tom Keelin.
History
The history of probability distributions can be viewed, in part, as a progression of developments towards greater flexibility in shape and bounds when fitting to data. The normal distribution was first published in 1756, and Bayes' theorem in 1763. The normal distribution laid the foundation for much of the development of classical statistics. In contrast, Bayes' theorem laid the foundation for state-of-information, belief-based probability representations. Because belief-based probabilities can take on any shape and may have natural bounds, probability distributions flexible enough to accommodate both were needed. Moreover, many empirical and experimental data sets exhibited shapes that could not be well matched by the normal or other continuous distributions. So began the search for continuous probability distributions with flexible shapes and bounds.
Early in the 20th century, the Pearson family of distributions emerged as a major advance in shape flexibility. It includes the normal, beta, uniform, gamma, Student's t, chi-square, and F distributions, along with five others. These were followed by the Johnson family of distributions. Both families can represent the first four moments of data (mean, variance, skewness, and kurtosis) with smooth continuous curves. However, they cannot match fifth or higher-order moments. Moreover, for a given skewness and kurtosis, there is no choice of bounds. For example, matching the first four moments of a data set may yield a distribution with a negative lower bound, even though the quantity in question may be known to be nonnegative. Finally, their equations include intractable integrals and complex statistical functions, so fitting to data typically requires iterative methods.
Early in the 21st century, decision analysts began working to develop continuous probability distributions that would exactly fit any three specified points on the cumulative distribution function (CDF) for an uncertain quantity (e.g., three expert-elicited quantiles). The Pearson and Johnson families were generally inadequate for this purpose. Decision analysts also sought probability distributions that would be easy to parameterize with data, for example by using linear least squares or, equivalently, multiple linear regression. Introduced in 2011, the class of quantile-parameterized distributions (QPDs) accomplished both goals. This made QPDs a significant advance. However, the QPD originally used to illustrate the class, the Simple Q-Normal distribution, had less shape flexibility than the Pearson and Johnson families and could not represent semi-bounded or bounded distributions.

Shortly thereafter, Keelin developed the family of metalog distributions, another instance of the QPD class. The metalog family:
* is more shape-flexible than the Pearson and Johnson families;
* offers a choice of boundedness;
* has closed-form equations that can be fit to data with linear least squares; and
* has closed-form quantile functions, which facilitate Monte Carlo simulation.
Definition and quantile function
The metalog distribution is a generalization of the logistic distribution; the term "metalog" is short for "metalogistic". Keelin started from the logistic quantile function:
x = \mu + s \ln\left(\frac{y}{1-y}\right), \qquad 0 < y < 1,
He then substituted power series expansions in cumulative probability y for the μ and s parameters, which control location and scale, respectively. [Keelin TW (2016). "The Metalog Distributions." Decision Analysis. 13 (4): 243–277.]
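\mu(y) = a_1 + a_4 (y - 0.5) + a_5 (y - 0.5)^2 + a_7 (y - 0.5)^3 + a_9 (y - 0.5)^4 + \cdots
s(y) = a_2 + a_3 (y - 0.5) + a_6 (y - 0.5)^2 + a_8 (y - 0.5)^3 + a_{10} (y - 0.5)^4 + \cdots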
Keelin's rationale for this substitution was fivefold:
1. The resulting quantile function would have significant shape flexibility, governed by the coefficients a_i.
2. It would have a simple closed form that is linear in these coefficients, so the coefficients could easily be determined from CDF data by linear least squares.
3. The resulting quantile function would be smooth, differentiable, and analytic, ensuring that a smooth, closed-form probability density function (PDF) would be available.
4. The resulting closed-form inverse CDF would facilitate simulation.
5. As with a Taylor series, any number of terms could be used, depending on the degree of shape flexibility desired and other application needs.
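The second and fourth points, linear fitting and simulation, can be illustrated with a minimal Python sketch. It assumes the CDF data are given as (x, y) pairs with 0 < y < 1, and it works in three steps:
1. It evaluates each metalog basis term at every data point.
2. It solves the ordinary least-squares normal equations with a small hand-written Gaussian elimination. A QR- or SVD-based solver would be numerically safer when many terms are used.
3. It simulates by inverse-transform sampling, that is, by passing uniform random numbers through the closed-form quantile function.

All function names and the sample data are hypothetical. The sketch also does not check that the fitted quantile function is increasing, which it must be to define a valid distribution.

```python
import math
import random
from typing import List, Sequence, Tuple


def metalog_basis(j: int, y: float) -> float:
    """Basis function multiplying a_j (1-indexed) in the metalog quantile function."""
    logit = math.log(y / (1.0 - y))
    c = y - 0.5
    if j == 1:
        return 1.0
    if j == 2:
        return logit
    if j == 3:
        return c * logit
    if j == 4:
        return c
    if j % 2 == 1:                     # a5, a7, a9, ...: terms of mu(y)
        return c ** ((j - 1) // 2)
    return c ** (j // 2 - 1) * logit   # a6, a8, a10, ...: terms of s(y)


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: use more, or more spread-out, data points")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                   # back substitution
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fit_metalog(points: Sequence[Tuple[float, float]], k: int) -> List[float]:
    """Least-squares a_1..a_k from CDF points (x_i, y_i) via the normal equations."""
    if len(points) < k:
        raise ValueError("need at least k data points to fit k terms")
    Y = [[metalog_basis(j, y) for j in range(1, k + 1)] for _, y in points]
    xs = [x for x, _ in points]
    YtY = [[sum(row[p] * row[q] for row in Y) for q in range(k)] for p in range(k)]
    Ytx = [sum(row[p] * x for row, x in zip(Y, xs)) for p in range(k)]
    return solve(YtY, Ytx)


def metalog_quantile(a: Sequence[float], y: float) -> float:
    """Closed-form quantile (inverse CDF): sum of a_j times the j-th basis function."""
    return sum(a_j * metalog_basis(j, y) for j, a_j in enumerate(a, start=1))


def sample(a: Sequence[float], n: int, rng: random.Random) -> List[float]:
    """Inverse-transform sampling: map uniform draws through the quantile function."""
    draws: List[float] = []
    while len(draws) < n:
        u = rng.random()        # uniform on [0.0, 1.0)
        if u > 0.0:             # the quantile function is defined only for 0 < y < 1
            draws.append(metalog_quantile(a, u))
    return draws


if __name__ == "__main__":
    # Hypothetical CDF data (x, y), illustrative only.
    data = [(4.0, 0.1), (7.0, 0.3), (9.0, 0.5), (11.5, 0.7), (16.0, 0.9)]
    a = fit_metalog(data, k=3)
    draws = sample(a, 1000, random.Random(0))
    print("coefficients:", a)
    print("sample mean:", sum(draws) / len(draws))
```

Every coefficient enters linearly, so the fit is a single k × k linear solve however many terms are used. No iterative search is involved.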
Note that the subscripts of the a-coefficients are assigned as follows: a_1 and a_4 are in the μ expansion, a_2 and a_3 are in the s expansion, and subscripts alternate thereafter. This ordering was chosen so that the first two terms of the resulting metalog quantile function correspond exactly to the logistic distribution. Adding a third term with a_3 ≠ 0 adjusts skewness, and adding a fourth term with a_4 ≠ 0 primarily adjusts kurtosis. Subsequent nonzero terms yield more nuanced shape refinements.
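The effect of this ordering can be checked numerically. The Python sketch below uses hypothetical helper names and lists the basis terms in the subscript order just described. It checks three things:
1. A two-term metalog coincides with the logistic quantile function.
2. A positive a_3 stretches the upper tail relative to the lower one.
3. A positive a_4 lowers a crude tail-weight ratio, defined here as the 1st-to-99th percentile spread divided by the interquartile range.

That ratio is only a rough stand-in for kurtosis, chosen for illustration.

```python
import math
from typing import Sequence, Tuple


def metalog_quantile(a: Sequence[float], y: float) -> float:
    """Metalog quantile with basis terms in subscript order:
    1, L, cL, c, c^2, c^2 L, c^3, c^3 L, ...  where L = ln(y/(1-y)) and c = y - 0.5."""
    L = math.log(y / (1.0 - y))
    c = y - 0.5
    basis = [1.0, L, c * L, c]
    j = 5
    while len(basis) < len(a):
        # odd subscripts extend mu(y), even subscripts extend s(y)
        basis.append(c ** ((j - 1) // 2) if j % 2 == 1 else c ** (j // 2 - 1) * L)
        j += 1
    return sum(a_j * b_j for a_j, b_j in zip(a, basis))


def logistic_quantile(mu: float, s: float, y: float) -> float:
    """Logistic quantile function x = mu + s ln(y/(1-y))."""
    return mu + s * math.log(y / (1.0 - y))


def tail_gaps(a: Sequence[float], p: float = 0.9) -> Tuple[float, float]:
    """(upper, lower): distance from the median to the p- and (1-p)-quantiles."""
    med = metalog_quantile(a, 0.5)
    return metalog_quantile(a, p) - med, med - metalog_quantile(a, 1.0 - p)


def tail_ratio(a: Sequence[float]) -> float:
    """Crude tail-weight index: 1st-99th percentile spread over the interquartile range."""
    wide = metalog_quantile(a, 0.99) - metalog_quantile(a, 0.01)
    iqr = metalog_quantile(a, 0.75) - metalog_quantile(a, 0.25)
    return wide / iqr


if __name__ == "__main__":
    print(metalog_quantile([10.0, 2.0], 0.8), logistic_quantile(10.0, 2.0, 0.8))  # two terms: logistic
    print(tail_gaps([10.0, 2.0]), tail_gaps([10.0, 2.0, 1.0]))  # a3 > 0: upper gap grows
    print(tail_ratio([10.0, 2.0]), tail_ratio([10.0, 2.0, 0.0, 4.0]))  # a4 > 0: ratio drops
```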
Rewriting the logistic quantile function to incorporate the above substitutions for μ and s yields the metalog quantile function, for cumulative probability 0 < y < 1: