In probability theory and statistics, the beta distribution is a family of continuous probability distributions defined on the interval [0, 1] in terms of two positive parameters, denoted by ''alpha'' (''α'') and ''beta'' (''β''), that appear as exponents of the variable and of its complement to 1, respectively, and control the shape of the distribution.
The beta distribution has been applied to model the behavior of random variables limited to intervals of finite length in a wide variety of disciplines. It is a suitable model for the random behavior of percentages and proportions.
In Bayesian inference, the beta distribution is the conjugate prior probability distribution for the Bernoulli, binomial, negative binomial and geometric distributions.
The formulation of the beta distribution discussed here is also known as the beta distribution of the first kind, whereas ''beta distribution of the second kind'' is an alternative name for the beta prime distribution. The generalization to multiple variables is called a Dirichlet distribution.
Definitions
Probability density function
The probability density function (PDF) of the beta distribution, for 0 ≤ ''x'' ≤ 1, and shape parameters ''α'', ''β'' > 0, is a power function of the variable ''x'' and of its reflection (1 − ''x'') as follows:
: f(x;\alpha,\beta) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, x^{\alpha-1}(1-x)^{\beta-1} = \frac{1}{\mathrm{B}(\alpha,\beta)}\, x^{\alpha-1}(1-x)^{\beta-1}
where Γ(''z'') is the gamma function. The beta function, B, is a normalization constant to ensure that the total probability is 1. In the above equations ''x'' is a realization (an observed value that actually occurred) of a random variable ''X''.
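As an illustration of the density just described, here is a minimal Python sketch that evaluates it on the open interval 0 < ''x'' < 1 using the gamma-function form of the normalization constant. The function name `beta_pdf` is hypothetical, and working in log space via `math.lgamma` is a design choice of this sketch (it avoids overflow for large shape parameters), not something prescribed by the article.

```python
import math

def beta_pdf(x: float, alpha: float, beta: float) -> float:
    """Density of Beta(alpha, beta) at x, for 0 < x < 1 and alpha, beta > 0."""
    if not (alpha > 0 and beta > 0):
        raise ValueError("alpha and beta must be positive")
    if not (0.0 < x < 1.0):
        raise ValueError("this sketch only handles the open interval 0 < x < 1")
    # log(1 / B(alpha, beta)) = lgamma(alpha + beta) - lgamma(alpha) - lgamma(beta)
    log_norm = math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)
    # log of the kernel x^(alpha - 1) * (1 - x)^(beta - 1)
    log_kernel = (alpha - 1.0) * math.log(x) + (beta - 1.0) * math.log1p(-x)
    return math.exp(log_norm + log_kernel)
```

For example, `beta_pdf(0.5, 2, 2)` evaluates 6 · 0.5 · 0.5, and `beta_pdf(x, 1, 1)` is the uniform density.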
This definition includes both ends ''x'' = 0 and ''x'' = 1, which is consistent with definitions for other continuous distributions supported on a bounded interval that are special cases of the beta distribution, for example the arcsine distribution, and consistent with several authors, such as N. L. Johnson and S. Kotz. However, the inclusion of ''x'' = 0 and ''x'' = 1 does not work for ''α'', ''β'' < 1; accordingly, several other authors, including W. Feller, choose to exclude the ends ''x'' = 0 and ''x'' = 1 (so that the two ends are not actually part of the domain of the density function) and consider instead 0 < ''x'' < 1.
Several authors, including N. L. Johnson and S. Kotz, use the symbols ''p'' and ''q'' (instead of ''α'' and ''β'') for the shape parameters of the beta distribution, reminiscent of the symbols traditionally used for the parameters of the Bernoulli distribution, because the beta distribution approaches the Bernoulli distribution in the limit when both shape parameters ''α'' and ''β'' approach the value of zero.
In the following, a random variable ''X'' beta-distributed with parameters ''α'' and ''β'' will be denoted by:
: X \sim \operatorname{Beta}(\alpha, \beta)
Other notations for beta-distributed random variables used in the statistical literature are X \sim \mathcal{B}e(\alpha, \beta) and X \sim \beta_{\alpha,\beta}.
Cumulative distribution function
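The cumulative distribution function is
: F(x;\alpha,\beta) = \frac{\mathrm{B}(x;\alpha,\beta)}{\mathrm{B}(\alpha,\beta)} = I_x(\alpha,\beta)
where \mathrm{B}(x;\alpha,\beta) is the incomplete beta function and I_x(\alpha,\beta) is the regularized incomplete beta function.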
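The regularized incomplete beta function has no elementary form in general, but for positive integer shape parameters it equals a finite binomial sum, I_x(a, b) = Σ_{j=a}^{a+b−1} C(a+b−1, j) x^j (1 − x)^{a+b−1−j}. The following Python sketch uses that identity; the name `beta_cdf_int` and the restriction to integer parameters are assumptions of this example, not part of the general definition.

```python
import math

def beta_cdf_int(x: float, alpha: int, beta: int) -> float:
    """CDF of Beta(alpha, beta) at x, for positive INTEGER alpha and beta only."""
    if alpha < 1 or beta < 1:
        raise ValueError("this sketch requires positive integer shape parameters")
    if not (0.0 <= x <= 1.0):
        raise ValueError("x must lie in [0, 1]")
    n = alpha + beta - 1
    # Probability that a Binomial(n, x) variable is at least alpha.
    return sum(math.comb(n, j) * x**j * (1.0 - x) ** (n - j)
               for j in range(alpha, n + 1))
```

For Beta(2, 2) this reproduces the polynomial 3''x''² − 2''x''³ obtained by integrating the density 6''x''(1 − ''x'') directly.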
Alternative parameterizations
Two parameters
Mean and sample size
The beta distribution may also be reparameterized in terms of its mean ''μ'' and the sum of the two shape parameters ''ν'' = ''α'' + ''β'' > 0 (p. 83). Denoting by ''α''_Posterior and ''β''_Posterior the shape parameters of the posterior beta distribution resulting from applying Bayes' theorem to a binomial likelihood function and a prior probability, the interpretation of the sum of both shape parameters as a sample size, ''ν'' = ''α''_Posterior + ''β''_Posterior, is only correct for the Haldane prior probability Beta(0,0). Specifically, for the Bayes (uniform) prior Beta(1,1) the correct interpretation is sample size = ''α''_Posterior + ''β''_Posterior − 2, or ''ν'' = (sample size) + 2. For sample sizes much larger than 2, the difference between these two priors becomes negligible. (See the section on Bayesian inference for further details.) Thus ''ν'' = ''α'' + ''β'' is referred to as the "sample size" of a beta distribution, but strictly speaking it is the sample size of a binomial likelihood function only when a Haldane Beta(0,0) prior is used in Bayes' theorem.
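The arithmetic behind this remark can be made concrete with the standard conjugate update for a binomial likelihood, in which the posterior shape parameters are the prior shape parameters plus the observed counts of successes and failures. The Python sketch below is illustrative only; the function name `posterior_shape` and the example counts are hypothetical.

```python
def posterior_shape(alpha_prior: float, beta_prior: float,
                    successes: int, failures: int) -> tuple:
    """Conjugate beta-binomial update: add the observed counts to the prior shapes."""
    return alpha_prior + successes, beta_prior + failures

# Hypothetical data: 10 trials with 7 successes and 3 failures.
haldane = posterior_shape(0, 0, 7, 3)   # Haldane prior Beta(0, 0) (improper)
uniform = posterior_shape(1, 1, 7, 3)   # Bayes (uniform) prior Beta(1, 1)

nu_haldane = sum(haldane)   # equals the number of trials
nu_uniform = sum(uniform)   # equals the number of trials plus 2
```

Under the Haldane prior the sum of the posterior shape parameters equals the number of trials; under the uniform prior it exceeds it by 2, as stated above.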
This parametrization may be useful in Bayesian parameter estimation. For example, one may administer a test to a number of individuals. If it is assumed that each person's score (0 ≤ ''θ'' ≤ 1) is drawn from a population-level beta distribution, then an important statistic is the mean of this population-level distribution. The mean and sample size parameters are related to the shape parameters ''α'' and ''β'' via
: ''α'' = ''μν'', ''β'' = (1 − ''μ'')''ν''
Under this parametrization, one may place an uninformative prior probability over the mean, and a vague prior probability (such as an exponential or gamma distribution) over the positive reals for the sample size, if they are independent, and prior data and/or beliefs justify it.
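A minimal Python sketch of this conversion, using the relations ''α'' = ''μν'' and ''β'' = (1 − ''μ'')''ν'' and their inverses, is given below. The function names are hypothetical and chosen only for this example.

```python
def shape_from_mean_size(mu: float, nu: float) -> tuple:
    """Map (mean, sample size) to the shape parameters (alpha, beta)."""
    if not (0.0 < mu < 1.0) or nu <= 0.0:
        raise ValueError("need 0 < mu < 1 and nu > 0")
    return mu * nu, (1.0 - mu) * nu

def mean_size_from_shape(alpha: float, beta: float) -> tuple:
    """Inverse map: mu = alpha / (alpha + beta), nu = alpha + beta."""
    if alpha <= 0.0 or beta <= 0.0:
        raise ValueError("need alpha, beta > 0")
    nu = alpha + beta
    return alpha / nu, nu
```

For instance, a mean of 0.25 with ''ν'' = 8 corresponds to Beta(2, 6).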
Mode and concentration
Concave beta distributions, which have ''α'', ''β'' > 1, can be parametrized in terms of mode and "concentration". The mode, ''ω'' = (''α'' − 1)/(''α'' + ''β'' − 2), and concentration, ''κ'' = ''α'' + ''β'', can be used to define the usual shape parameters as follows:
: \alpha = \omega(\kappa - 2) + 1, \quad \beta = (1 - \omega)(\kappa - 2) + 1
For the mode, 0 < ''ω'' < 1, to be well-defined, we need ''α'', ''β'' > 1, or equivalently ''κ'' > 2. If instead we define the concentration as ''c'' = ''α'' + ''β'' − 2, the condition simplifies to ''c'' > 0 and the beta density at ''α'' = 1 + ''cω'' and ''β'' = 1 + ''c''(1 − ''ω'') can be written as:
: f(x;\omega,c) = \frac{x^{c\omega}(1-x)^{c(1-\omega)}}{\mathrm{B}\bigl(1 + c\omega,\, 1 + c(1-\omega)\bigr)}
where ''c'' directly scales the sufficient statistics, log(''x'') and log(1 − ''x''). Note also that in the limit ''c'' → 0, the distribution becomes flat.
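The mode and concentration parametrization can be sketched in Python as follows, assuming the usual relations ''α'' = ''ω''(''κ'' − 2) + 1 and ''β'' = (1 − ''ω'')(''κ'' − 2) + 1 with ''κ'' = ''α'' + ''β'' > 2. The function name `shape_from_mode_concentration` is hypothetical.

```python
def shape_from_mode_concentration(omega: float, kappa: float) -> tuple:
    """Map (mode, concentration kappa = alpha + beta) to (alpha, beta)."""
    if not (0.0 < omega < 1.0):
        raise ValueError("the mode must satisfy 0 < omega < 1")
    if kappa <= 2.0:
        raise ValueError("the concentration must satisfy kappa > 2")
    alpha = omega * (kappa - 2.0) + 1.0
    beta = (1.0 - omega) * (kappa - 2.0) + 1.0
    return alpha, beta

# Round trip: the mode of the resulting distribution is omega again.
alpha, beta = shape_from_mode_concentration(0.25, 6.0)
recovered_mode = (alpha - 1.0) / (alpha + beta - 2.0)
```

Both resulting shape parameters exceed 1 whenever ''κ'' > 2, which is what makes the interior mode well-defined.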
Mean and variance
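Solving the system of (coupled) equations for the mean and the variance of the beta distribution in terms of the original parameters ''α'' and ''β'', one can express the ''α'' and ''β'' parameters in terms of the mean (''μ'') and the variance (var):
: \nu = \alpha + \beta = \frac{\mu(1-\mu)}{\mathrm{var}} - 1, \text{ where } \nu > 0, \text{ therefore } \mathrm{var} < \mu(1-\mu)
: \alpha = \mu\nu = \mu\left(\frac{\mu(1-\mu)}{\mathrm{var}} - 1\right), \text{ if } \mathrm{var} < \mu(1-\mu)
: \beta = (1-\mu)\nu = (1-\mu)\left(\frac{\mu(1-\mu)}{\mathrm{var}} - 1\right), \text{ if } \mathrm{var} < \mu(1-\mu)
This parametrization of the beta distribution may lead to a more intuitive understanding than the one based on the original parameters ''α'' and ''β'', for example when the mode, skewness, excess kurtosis and differential entropy are expressed in terms of the mean and the variance.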
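A short Python sketch of this inversion follows. It assumes the standard moment formulas for the beta distribution, mean = ''α''/(''α'' + ''β'') and variance = ''αβ''/((''α'' + ''β'')²(''α'' + ''β'' + 1)), from which ''ν'' = ''μ''(1 − ''μ'')/var − 1; the function names are hypothetical.

```python
def shape_from_mean_var(mu: float, var: float) -> tuple:
    """Recover (alpha, beta) from the mean and variance of a beta distribution."""
    if not (0.0 < mu < 1.0):
        raise ValueError("need 0 < mu < 1")
    if not (0.0 < var < mu * (1.0 - mu)):
        raise ValueError("need 0 < var < mu * (1 - mu)")
    nu = mu * (1.0 - mu) / var - 1.0   # nu = alpha + beta
    return mu * nu, (1.0 - mu) * nu

def mean_var_from_shape(alpha: float, beta: float) -> tuple:
    """Forward direction, used here only to check the round trip."""
    nu = alpha + beta
    return alpha / nu, alpha * beta / (nu * nu * (nu + 1.0))
```

The constraint var < ''μ''(1 − ''μ'') is checked explicitly because no beta distribution exists outside it.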
Four parameters
A beta distribution with the two shape parameters ''α'' and ''β'' is supported on the range [0,1] or (0,1). It is possible to alter the location and scale of the distribution by introducing two further parameters representing the minimum, ''a'', and maximum, ''c'' (''c'' > ''a''), values of the distribution, by a linear transformation substituting the non-dimensional variable ''x'' in terms of the new variable ''y'' (with support [''a'',''c''] or (''a'',''c'')) and the parameters ''a'' and ''c'':
: y = x(c-a) + a, \text{ therefore } x = \frac{y-a}{c-a}
The probability density function of the four-parameter beta distribution is equal to the two-parameter distribution, scaled by the range (''c'' − ''a'') (so that the total area under the density curve equals a probability of one), and with the ''y'' variable shifted and scaled as follows:
:: f(y;\alpha,\beta,a,c) = \frac{f(x;\alpha,\beta)}{c-a} = \frac{\left(\frac{y-a}{c-a}\right)^{\alpha-1}\left(\frac{c-y}{c-a}\right)^{\beta-1}}{(c-a)\,\mathrm{B}(\alpha,\beta)} = \frac{(y-a)^{\alpha-1}(c-y)^{\beta-1}}{(c-a)^{\alpha+\beta-1}\,\mathrm{B}(\alpha,\beta)}
That a random variable ''Y'' is beta-distributed with four parameters ''α'', ''β'', ''a'', and ''c'' will be denoted by:
: Y \sim \operatorname{Beta}(\alpha, \beta, a, c)
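Some measures of central location are scaled (by (''c'' − ''a'')) and shifted (by ''a''), as follows:
: \mu_Y = \mu_X(c-a) + a = \frac{\alpha}{\alpha+\beta}(c-a) + a = \frac{\alpha c + \beta a}{\alpha+\beta}
: \text{mode}(Y) = \text{mode}(X)(c-a) + a = \frac{\alpha-1}{\alpha+\beta-2}(c-a) + a = \frac{(\alpha-1)c + (\beta-1)a}{\alpha+\beta-2}, \text{ if } \alpha, \beta > 1
: \text{median}(Y) = \text{median}(X)(c-a) + a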
Note: the geometric mean and harmonic mean cannot be transformed by a linear transformation in the way that the mean, median and mode can.
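The shape parameters of ''Y'' can be written in terms of its mean and variance as
: \alpha = \frac{\mu_Y - a}{c-a}\left(\frac{(\mu_Y - a)(c - \mu_Y)}{\sigma_Y^2} - 1\right), \quad \beta = \frac{c - \mu_Y}{c-a}\left(\frac{(\mu_Y - a)(c - \mu_Y)}{\sigma_Y^2} - 1\right)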
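The statistical dispersion measures are scaled (they do not need to be shifted because they are already centered on the mean) by the range (''c'' − ''a''), linearly for the mean deviation and nonlinearly for the variance:
:: \text{(mean deviation around mean)}(Y) = \text{(mean deviation around mean)}(X)\,(c-a) = \frac{2\alpha^{\alpha}\beta^{\beta}}{\mathrm{B}(\alpha,\beta)(\alpha+\beta)^{\alpha+\beta+1}}(c-a)
:: \mathrm{var}(Y) = \mathrm{var}(X)(c-a)^2 = \frac{\alpha\beta\,(c-a)^2}{(\alpha+\beta)^2(\alpha+\beta+1)}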
Since the skewness and excess kurtosis are non-dimensional quantities (as moments centered on the mean and normalized by the standard deviation), they are independent of the parameters ''a'' and ''c'', and therefore equal to the expressions in terms of ''X'' (with support [0,1] or (0,1)):
:: \text{skewness}(Y) = \text{skewness}(X) = \frac{2(\beta-\alpha)\sqrt{\alpha+\beta+1}}{(\alpha+\beta+2)\sqrt{\alpha\beta}}
:: \text{kurtosis excess}(Y) = \text{kurtosis excess}(X) = \frac{6\left[(\alpha-\beta)^2(\alpha+\beta+1) - \alpha\beta(\alpha+\beta+2)\right]}{\alpha\beta(\alpha+\beta+2)(\alpha+\beta+3)}
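The location and scale change described in this section can be sketched in Python as below: the density of ''Y'' on (''a'', ''c'') is the two-parameter density at ''x'' = (''y'' − ''a'')/(''c'' − ''a'') divided by the range, and the mean is shifted and scaled linearly. The function names `beta4_pdf` and `beta4_mean` are hypothetical, and the sketch only covers the open interval.

```python
import math

def beta4_pdf(y: float, alpha: float, beta: float, a: float, c: float) -> float:
    """Density of the four-parameter beta distribution on the open interval (a, c)."""
    if not (c > a):
        raise ValueError("need c > a")
    if not (a < y < c):
        raise ValueError("this sketch only handles a < y < c")
    x = (y - a) / (c - a)                       # back to the unit interval
    log_norm = math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)
    log_fx = log_norm + (alpha - 1.0) * math.log(x) + (beta - 1.0) * math.log1p(-x)
    return math.exp(log_fx) / (c - a)           # rescale so the area stays 1

def beta4_mean(alpha: float, beta: float, a: float, c: float) -> float:
    """Mean of Y: the mean of X scaled by the range and shifted by a."""
    return a + (c - a) * alpha / (alpha + beta)
```

With ''α'' = ''β'' = 1 the function reduces to the uniform density 1/(''c'' − ''a''), which is a quick sanity check on the scaling.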
Properties
Measures of central tendency
Mode
The mode of a beta-distributed random variable ''X'' with ''α'', ''β'' > 1 is the most likely value of the distribution (corresponding to the peak in the PDF), and is given by the following expression:
: \frac{\alpha-1}{\alpha+\beta-2}
When both parameters are less than one (''α'', ''β'' < 1), this is the anti-mode: the lowest point of the probability density curve.
Letting ''α'' = ''β'', the expression for the mode simplifies to 1/2, showing that for ''α'' = ''β'' > 1 the mode (resp. anti-mode when ''α'' = ''β'' < 1) is at the center of the distribution: it is symmetric in those cases. See the Shapes section in this article for a full list of mode cases, for arbitrary values of ''α'' and ''β''. For several of these cases, the maximum value of the density function occurs at one or both ends. In some cases the (maximum) value of the density function occurring at the end is finite. For example, in the case of ''α'' = 2, ''β'' = 1 (or ''α'' = 1, ''β'' = 2), the density function becomes a right-triangle distribution which is finite at both ends. In several other cases there is a singularity at one end, where the value of the density function approaches infinity. For example, in the case ''α'' = ''β'' = 1/2, the beta distribution simplifies to become the arcsine distribution. There is debate among mathematicians about some of these cases and whether the ends (''x'' = 0 and ''x'' = 1) can be called ''modes'' or not. The points at issue are:
* Whether the ends are part of the domain of the density function
* Whether a singularity can ever be called a ''mode''
* Whether cases with two maxima should be called ''bimodal''
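The interior stationary point of the density, (''α'' − 1)/(''α'' + ''β'' − 2), can be sketched in Python as follows. The function name `interior_extremum` is hypothetical, and the sketch deliberately refuses the boundary cases, since whether the ends count as modes is exactly the disputed question.

```python
def interior_extremum(alpha: float, beta: float) -> tuple:
    """Return ("mode", x) if alpha, beta > 1, or ("anti-mode", x) if alpha, beta < 1."""
    if alpha <= 0.0 or beta <= 0.0:
        raise ValueError("alpha and beta must be positive")
    if alpha > 1.0 and beta > 1.0:
        kind = "mode"          # peak of the density
    elif alpha < 1.0 and beta < 1.0:
        kind = "anti-mode"     # lowest point of a U-shaped density
    else:
        raise ValueError("no interior extremum; the extremum is at an end or the density is flat")
    return kind, (alpha - 1.0) / (alpha + beta - 2.0)
```

For symmetric parameters the returned location is 1/2 in both the mode and anti-mode cases.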
Median
The median of the beta distribution is the unique real number ''x'' for which the regularized incomplete beta function I_x(\alpha,\beta) = 1/2. There is no general closed-form expression for the median of the beta distribution for arbitrary values of ''α'' and ''β''. Closed-form expressions for particular values of the parameters ''α'' and ''β'' follow:
* For symmetric cases ''α'' = ''β'', median = 1/2.
* For ''α'' = 1 and ''β'' > 0, median = 1 − 2^(−1/''β'') (this case is the mirror-image of the power function [0,1] distribution)
* For ''α'' > 0 and ''β'' = 1, median = 2^(−1/''α'') (this case is the power function [0,1] distribution)
* For ''α'' = 3 and ''β'' = 2, median = 0.6142724318676105..., the real solution to the quartic equation 1 − 8''x''^3 + 6''x''^4 = 0 which lies in [0,1]
* For ''α'' = 2 and ''β'' = 3, median = 0.38572756813238945... = 1−median(Beta(3, 2))
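The following are the limits with one parameter finite (non-zero) and the other approaching these limits:
: \lim_{\beta\to 0}\text{median} = \lim_{\alpha\to\infty}\text{median} = 1, \quad \lim_{\alpha\to 0}\text{median} = \lim_{\beta\to\infty}\text{median} = 0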
A reasonable approximation of the value of the median of the beta distribution, for both ''α'' and ''β'' greater than or equal to one, is given by the formula
: \text{median} \approx \frac{\alpha - \tfrac{1}{3}}{\alpha + \beta - \tfrac{2}{3}} \text{ for } \alpha, \beta \ge 1
When ''α'', ''β'' ≥ 1, the relative error (the absolute error divided by the median) in this approximation is less than 4%, and for both ''α'' ≥ 2 and ''β'' ≥ 2 it is less than 1%. The absolute error divided by the difference between the mean and the mode is similarly small.
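The quality of the approximation can be checked against cases where the median is known exactly. The Python sketch below assumes the commonly quoted form (''α'' − 1/3)/(''α'' + ''β'' − 2/3) and compares it with the closed form 2^(−1/''α'') for ''β'' = 1 and with the Beta(3, 2) value quoted in the list above; the function names are hypothetical.

```python
def median_approx(alpha: float, beta: float) -> float:
    """Approximate median of Beta(alpha, beta), intended for alpha, beta >= 1."""
    if alpha < 1.0 or beta < 1.0:
        raise ValueError("the approximation is intended for alpha, beta >= 1")
    return (alpha - 1.0 / 3.0) / (alpha + beta - 2.0 / 3.0)

def median_power_function(alpha: float) -> float:
    """Exact median when beta = 1 (power function distribution)."""
    return 2.0 ** (-1.0 / alpha)

def relative_error(approx: float, exact: float) -> float:
    return abs(approx - exact) / exact

# Beta(2, 1): exact median is 2^(-1/2).
err_2_1 = relative_error(median_approx(2.0, 1.0), median_power_function(2.0))
# Beta(3, 2): exact median quoted in the text above.
err_3_2 = relative_error(median_approx(3.0, 2.0), 0.6142724318676105)
```

Both errors fall within the 4% bound stated for ''α'', ''β'' ≥ 1, and the approximation is exact for symmetric cases, where it returns 1/2.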
Mean
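The expected value (mean) (''μ'') of a beta-distributed random variable ''X'' with two parameters ''α'' and ''β'' is a function of only the ratio ''β''/''α'' of these parameters:
: \mu = \operatorname{E}[X] = \int_0^1 x f(x;\alpha,\beta)\,dx = \frac{\alpha}{\alpha+\beta} = \frac{1}{1 + \frac{\beta}{\alpha}}
Letting ''α'' = ''β'' in the above expression one obtains ''μ'' = 1/2, showing that for ''α'' = ''β'' the mean is at the center of the distribution: it is symmetric. Also, the following limits can be obtained from the above expression:
: \lim_{\beta/\alpha \to 0}\mu = 1, \quad \lim_{\beta/\alpha \to \infty}\mu = 0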
Therefore, for ''β''/''α'' → 0, or for ''α''/''β'' → ∞, the mean is located at the right end, ''x'' = 1. For these limit ratios, the beta distribution becomes a one-point degenerate distribution with a Dirac delta function spike at the right end, ''x'' = 1, with probability 1, and zero probability everywhere else. There is 100% probability (absolute certainty) concentrated at the right end, ''x'' = 1.
Similarly, for ''β''/''α'' → ∞, or for ''α''/''β'' → 0, the mean is located at the left end, ''x'' = 0. The beta distribution becomes a one-point degenerate distribution with a Dirac delta function spike at the left end, ''x'' = 0, with probability 1, and zero probability everywhere else. There is 100% probability (absolute certainty) concentrated at the left end, ''x'' = 0. The following are the limits with one parameter finite (non-zero) and the other approaching these limits:
: \lim_{\beta\to 0}\mu = \lim_{\alpha\to\infty}\mu = 1, \quad \lim_{\alpha\to 0}\mu = \lim_{\beta\to\infty}\mu = 0
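The dependence of the mean on the ratio ''β''/''α'' alone, and the two limiting values, can be illustrated numerically with the standard formula ''μ'' = ''α''/(''α'' + ''β'') = 1/(1 + ''β''/''α''). The following Python sketch uses a hypothetical function name and arbitrary example values.

```python
def beta_mean(alpha: float, beta: float) -> float:
    """Mean of Beta(alpha, beta), written as a function of the ratio beta/alpha."""
    if alpha <= 0.0 or beta <= 0.0:
        raise ValueError("alpha and beta must be positive")
    return 1.0 / (1.0 + beta / alpha)

# Scaling both parameters by the same factor leaves the mean unchanged.
same_ratio = (beta_mean(2.0, 6.0), beta_mean(20.0, 60.0))

# As beta/alpha shrinks the mean moves to the right end, x = 1;
# as beta/alpha grows it moves to the left end, x = 0.
near_right_end = beta_mean(1.0, 1e-9)
near_left_end = beta_mean(1e-9, 1.0)
```

Only the ratio matters for the location of the mean; the common scale of ''α'' and ''β'' affects the spread instead.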
While for typical unimodal distributions (with centrally located modes, inflexion points at both sides of the mode, and longer tails; that is, Beta(''α'', ''β'') with ''α'', ''β'' > 2) it is known that the sample mean (as an estimate of location) is not as robust as the sample median, the opposite is the case for uniform or "U-shaped" bimodal distributions (Beta(''α'', ''β'') with ''α'', ''β'' ≤ 1), with the modes located at the ends of the distribution. As Mosteller and Tukey remark (p. 207), "the average of the two extreme observations uses all the sample information. This illustrates how, for short-tailed distributions, the extreme observations should get more weight." By contrast, it follows that the median of "U-shaped" bimodal distributions with modes at the edge of the distribution (Beta(''α'', ''β'') with ''α'', ''β'' ≤ 1) is not robust, as the sample median drops the extreme sample observations from consideration. A practical application of this occurs, for example, for random walks, since the probability for the time of the last visit to the origin in a random walk is distributed as the arcsine distribution Beta(1/2, 1/2): the mean of a number of realizations of a random walk is a much more robust estimator than the median (which is an inappropriate sample measure estimate in this case).
Geometric mean
The logarithm of the geometric mean ''GX'' of a distribution with random variable ''X'' is the arithmetic mean of ln(''X''), or, equivalently, its expected value:
: \ln G_X = \operatorname{E}[\ln X]
The mean absolute deviation around the mean is a more robust estimator of statistical dispersion than the standard deviation for beta distributions with tails and inflection points at each side of the mode, Beta(''α'', ''β'') distributions with ''α'', ''β'' > 2, as it depends on the linear (absolute) deviations rather than the square deviations from the mean. Therefore, the effect of very large deviations from the mean is not as overly weighted.
Using Stirling's approximation to the Gamma function, N. L. Johnson and S. Kotz derived the following approximation for values of the shape parameters greater than unity (the relative error for this approximation is only −3.5% for ''α'' = ''β'' = 1, and it decreases to zero as ''α'' → ∞, ''β'' → ∞):
: \frac{\text{mean abs. dev. from mean}}{\text{standard deviation}} = \frac{\operatorname{E}[|X - \operatorname{E}[X]|]}{\sqrt{\operatorname{var}(X)}} \approx \sqrt{\frac{2}{\pi}}\left(1 + \frac{7}{12(\alpha+\beta)} - \frac{1}{12\alpha} - \frac{1}{12\beta}\right), \text{ if } \alpha, \beta > 1
At the limit ''α'' → ∞, ''β'' → ∞, the ratio of the mean absolute deviation to the standard deviation (for the beta distribution) becomes equal to the ratio of the same measures for the normal distribution, \sqrt{2/\pi}, so that from ''α'' = ''β'' = 1 to ''α'', ''β'' → ∞ the ratio decreases by 8.5%. For ''α'' = ''β'' = 0 the standard deviation is exactly equal to the mean absolute deviation around the mean. Therefore, this ratio decreases by 15% from ''α'' = ''β'' = 0 to ''α'' = ''β'' = 1, and by 25% from ''α'' = ''β'' = 0 to ''α'', ''β'' → ∞. However, for skewed beta distributions such that ''α'' → 0 or ''β'' → 0, the ratio of the standard deviation to the mean absolute deviation approaches infinity (although each of them, individually, approaches zero) because the mean absolute deviation approaches zero faster than the standard deviation.
Using the