Beta distribution

In probability theory and statistics, the beta distribution is a family of continuous probability distributions defined on the interval [0, 1] in terms of two positive parameters, denoted by ''alpha'' (''α'') and ''beta'' (''β''), that appear as exponents of the random variable and control the shape of the distribution. The beta distribution has been applied to model the behavior of random variables limited to intervals of finite length in a wide variety of disciplines. It is a suitable model for the random behavior of percentages and proportions. In Bayesian inference, the beta distribution is the conjugate prior probability distribution for the Bernoulli, binomial, negative binomial and geometric distributions. The formulation of the beta distribution discussed here is also known as the beta distribution of the first kind, whereas ''beta distribution of the second kind'' is an alternative name for the beta prime distribution. The generalization to multiple variables is called a Dirichlet distribution.


Definitions


Probability density function

The probability density function (PDF) of the beta distribution, for 0 ≤ ''x'' ≤ 1 or 0 < ''x'' < 1, and shape parameters ''α'', ''β'' > 0, is a power function of the variable ''x'' and of its reflection (1 − ''x'') as follows:

:\begin{align} f(x;\alpha,\beta) & = \mathrm{constant}\cdot x^{\alpha-1}(1-x)^{\beta-1} \\ & = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{\int_0^1 u^{\alpha-1}(1-u)^{\beta-1}\,du} \\ & = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, x^{\alpha-1}(1-x)^{\beta-1} \\ & = \frac{1}{\Beta(\alpha,\beta)}\, x^{\alpha-1}(1-x)^{\beta-1} \end{align}

where Γ(''z'') is the gamma function. The beta function, \Beta, is a normalization constant to ensure that the total probability is 1. In the above equations ''x'' is a realization—an observed value that actually occurred—of a random process ''X''.

This definition includes both ends ''x'' = 0 and ''x'' = 1, which is consistent with definitions for other continuous distributions supported on a bounded interval which are special cases of the beta distribution, for example the arcsine distribution, and consistent with several authors, like N. L. Johnson and S. Kotz. However, the inclusion of ''x'' = 0 and ''x'' = 1 does not work for ''α'', ''β'' < 1; accordingly, several other authors, including W. Feller, choose to exclude the ends ''x'' = 0 and ''x'' = 1 (so that the two ends are not actually part of the domain of the density function) and consider instead 0 < ''x'' < 1.

Several authors, including N. L. Johnson and S. Kotz, use the symbols ''p'' and ''q'' (instead of ''α'' and ''β'') for the shape parameters of the beta distribution, reminiscent of the symbols traditionally used for the parameters of the Bernoulli distribution, because the beta distribution approaches the Bernoulli distribution in the limit when both shape parameters ''α'' and ''β'' approach the value of zero.

In the following, a random variable ''X'' beta-distributed with parameters ''α'' and ''β'' will be denoted by:

:X \sim \operatorname{Beta}(\alpha, \beta)

Other notations for beta-distributed random variables used in the statistical literature are X \sim \mathcal{B}e(\alpha, \beta) and X \sim \beta_{\alpha,\beta}.
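As a quick numerical check of the density formulas above, the following short Python sketch (assuming NumPy and SciPy are available; the parameter values are arbitrary examples) evaluates the gamma-function form of the PDF and compares it with SciPy's built-in beta density:

<syntaxhighlight lang="python">
import numpy as np
from scipy.special import gamma, beta as beta_fn
from scipy.stats import beta as beta_dist

def beta_pdf(x, a, b):
    """Density of Beta(a, b) via the gamma-function form of the PDF."""
    return gamma(a + b) / (gamma(a) * gamma(b)) * x**(a - 1) * (1 - x)**(b - 1)

a, b = 2.0, 5.0
x = np.linspace(0.01, 0.99, 5)
print(beta_pdf(x, a, b))        # hand-rolled density
print(beta_dist.pdf(x, a, b))   # SciPy reference; the two should agree
# The normalization constant 1/B(a, b) equals Gamma(a+b)/(Gamma(a)Gamma(b)):
print(np.isclose(1 / beta_fn(a, b), gamma(a + b) / (gamma(a) * gamma(b))))
</syntaxhighlight>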


Cumulative distribution function

The cumulative distribution function is

:F(x;\alpha,\beta) = \frac{\Beta(x;\alpha,\beta)}{\Beta(\alpha,\beta)} = I_x(\alpha,\beta)

where \Beta(x;\alpha,\beta) is the incomplete beta function and I_x(\alpha,\beta) is the regularized incomplete beta function.
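A minimal sketch, again assuming SciPy: the regularized incomplete beta function I_x(α, β) is exposed as scipy.special.betainc, so the CDF can be evaluated either directly or through the distribution object:

<syntaxhighlight lang="python">
from scipy.special import betainc
from scipy.stats import beta as beta_dist

a, b, x = 2.0, 5.0, 0.3
print(betainc(a, b, x))        # I_x(a, b), the regularized incomplete beta function
print(beta_dist.cdf(x, a, b))  # same value via the distribution object
</syntaxhighlight>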


Alternative parameterizations


Two parameters


Mean and sample size

The beta distribution may also be reparameterized in terms of its mean ''μ'' (0 < ''μ'' < 1) and the sum of the two shape parameters ''ν'' = ''α'' + ''β'' > 0 (p. 83). Denoting by ''α''Posterior and ''β''Posterior the shape parameters of the posterior beta distribution resulting from applying Bayes theorem to a binomial likelihood function and a prior probability, the interpretation of the addition of both shape parameters to be sample size = ''ν'' = ''α''Posterior + ''β''Posterior is only correct for the Haldane prior probability Beta(0,0). Specifically, for the Bayes (uniform) prior Beta(1,1) the correct interpretation would be sample size = ''α''Posterior + ''β''Posterior − 2, or ''ν'' = (sample size) + 2. For sample size much larger than 2, the difference between these two priors becomes negligible. (See section Bayesian inference for further details.) ν = α + β is referred to as the "sample size" of a beta distribution, but one should remember that it is, strictly speaking, the "sample size" of a binomial likelihood function only when using a Haldane Beta(0,0) prior in Bayes theorem.

This parametrization may be useful in Bayesian parameter estimation. For example, one may administer a test to a number of individuals. If it is assumed that each person's score (0 ≤ ''θ'' ≤ 1) is drawn from a population-level beta distribution, then an important statistic is the mean of this population-level distribution. The mean and sample size parameters are related to the shape parameters α and β via

: ''α'' = ''μν'', ''β'' = (1 − ''μ'')''ν''

Under this parametrization, one may place an uninformative prior probability over the mean, and a vague prior probability (such as an exponential or gamma distribution) over the positive reals for the sample size, if they are independent, and prior data and/or beliefs justify it.
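A minimal sketch of this reparametrization (plain Python; the numbers are arbitrary examples), converting between (''μ'', ''ν'') and the shape parameters:

<syntaxhighlight lang="python">
def mean_samplesize_to_shape(mu, nu):
    """Convert mean mu (0 < mu < 1) and 'sample size' nu = alpha + beta > 0
    to the shape parameters (alpha, beta)."""
    return mu * nu, (1.0 - mu) * nu

def shape_to_mean_samplesize(alpha, beta):
    """Inverse conversion."""
    nu = alpha + beta
    return alpha / nu, nu

print(mean_samplesize_to_shape(0.3, 10.0))   # (3.0, 7.0)
print(shape_to_mean_samplesize(3.0, 7.0))    # (0.3, 10.0)
</syntaxhighlight>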


Mode and concentration

Concave beta distributions, which have \alpha,\beta>1, can be parametrized in terms of mode and "concentration". The mode, \omega=\frac{\alpha-1}{\alpha+\beta-2}, and concentration, \kappa = \alpha + \beta, can be used to define the usual shape parameters as follows:

:\begin{align} \alpha &= \omega (\kappa - 2) + 1\\ \beta &= (1 - \omega)(\kappa - 2) + 1 \end{align}

For the mode, 0<\omega<1, to be well-defined, we need \alpha,\beta>1, or equivalently \kappa>2. If instead we define the concentration as c=\alpha+\beta-2, the condition simplifies to c>0, and the beta density at \alpha=1+c\omega and \beta=1+c(1-\omega) can be written as:

: f(x;\omega,c) = \frac{x^{c\omega}(1-x)^{c(1-\omega)}}{\Beta(1+c\omega,\,1+c(1-\omega))}

where c directly scales the sufficient statistics, \log(x) and \log(1-x). Note also that in the limit, c\to 0, the distribution becomes flat.
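A small sketch (plain Python, assuming ''α'', ''β'' > 1 so the mode is interior) converting between the (mode, concentration) parametrization and the shape parameters:

<syntaxhighlight lang="python">
def mode_concentration_to_shape(omega, kappa):
    """omega in (0, 1) is the mode, kappa = alpha + beta > 2 the concentration."""
    alpha = omega * (kappa - 2.0) + 1.0
    beta = (1.0 - omega) * (kappa - 2.0) + 1.0
    return alpha, beta

def shape_to_mode_concentration(alpha, beta):
    """Inverse conversion; requires alpha, beta > 1 for the mode to be interior."""
    return (alpha - 1.0) / (alpha + beta - 2.0), alpha + beta

print(mode_concentration_to_shape(0.25, 10.0))   # (3.0, 7.0)
print(shape_to_mode_concentration(3.0, 7.0))     # (0.25, 10.0)
</syntaxhighlight>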


Mean and variance

Solving the system of (coupled) equations given in the above sections as the equations for the mean and the variance of the beta distribution in terms of the original parameters ''α'' and ''β'', one can express the ''α'' and ''β'' parameters in terms of the mean (''μ'') and the variance (var):

: \begin{align} \nu &= \alpha + \beta = \frac{\mu(1-\mu)}{\text{var}}-1, \text{ where }\nu =(\alpha + \beta) >0, \text{ therefore: }\text{var}< \mu(1-\mu)\\ \alpha&= \mu \nu =\mu \left(\frac{\mu(1-\mu)}{\text{var}}-1\right), \text{ if }\text{var}< \mu(1-\mu)\\ \beta &= (1 - \mu) \nu = (1 - \mu)\left(\frac{\mu(1-\mu)}{\text{var}}-1\right), \text{ if }\text{var}< \mu(1-\mu). \end{align}

This parametrization of the beta distribution may lead to a more intuitive understanding than the one based on the original parameters ''α'' and ''β'': for example, the mode, skewness, excess kurtosis and differential entropy can all be re-expressed in terms of the mean and the variance.
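A minimal method-of-moments sketch (plain Python; the example values are arbitrary) implementing the conversion from (''μ'', var) to (''α'', ''β''), including the constraint var < ''μ''(1 − ''μ''):

<syntaxhighlight lang="python">
def mean_variance_to_shape(mu, var):
    """Method-of-moments conversion: requires 0 < mu < 1 and 0 < var < mu * (1 - mu)."""
    if not (0.0 < mu < 1.0) or var <= 0.0 or var >= mu * (1.0 - mu):
        raise ValueError("need 0 < mu < 1 and 0 < var < mu * (1 - mu)")
    nu = mu * (1.0 - mu) / var - 1.0
    return mu * nu, (1.0 - mu) * nu

print(mean_variance_to_shape(0.3, 0.01))   # (6.0, 14.0): mean 0.3, variance 0.01
</syntaxhighlight>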


Four parameters

A beta distribution with the two shape parameters α and β is supported on the range [0,1] or (0,1). It is possible to alter the location and scale of the distribution by introducing two further parameters representing the minimum, ''a'', and maximum ''c'' (''c'' > ''a''), values of the distribution, by a linear transformation substituting the non-dimensional variable ''x'' in terms of the new variable ''y'' (with support [''a'', ''c''] or (''a'', ''c'')) and the parameters ''a'' and ''c'':

:y = x(c-a) + a, \text{ therefore } x = \frac{y-a}{c-a}.

The probability density function of the four parameter beta distribution is equal to the two parameter distribution, scaled by the range (''c'' − ''a''), (so that the total area under the density curve equals a probability of one), and with the "y" variable shifted and scaled as follows:

::f(y; \alpha, \beta, a, c) = \frac{f(x;\alpha,\beta)}{c-a} =\frac{ \left(\frac{y-a}{c-a}\right)^{\alpha-1} \left(\frac{c-y}{c-a}\right)^{\beta-1} }{(c-a)\Beta(\alpha,\beta)}=\frac{ (y-a)^{\alpha-1} (c-y)^{\beta-1} }{(c-a)^{\alpha+\beta-1}\Beta(\alpha,\beta)}.

That a random variable ''Y'' is beta-distributed with four parameters α, β, ''a'', and ''c'' will be denoted by:

:Y \sim \operatorname{Beta}(\alpha, \beta, a, c).

Some measures of central location are scaled (by (''c'' − ''a'')) and shifted (by ''a''), as follows:

: \begin{align} \mu_Y &= \mu_X(c-a) + a = \left(\frac{\alpha}{\alpha+\beta}\right)(c-a) + a = \frac{\alpha c + \beta a}{\alpha+\beta} \\ \text{mode}(Y) &=\text{mode}(X)(c-a) + a = \left(\frac{\alpha-1}{\alpha+\beta-2}\right)(c-a) + a = \frac{(\alpha-1) c + (\beta-1) a}{\alpha+\beta-2}\ ,\qquad \text{if } \alpha, \beta>1 \\ \text{median}(Y) &= \text{median}(X)(c-a) + a = \left (I_{\frac{1}{2}}^{-1}(\alpha,\beta) \right )(c-a)+a \\ \end{align}

Note: the geometric mean and harmonic mean cannot be transformed by a linear transformation in the way that the mean, median and mode can. The shape parameters of ''Y'' can be written in terms of its mean and variance as

: \begin{align} \alpha &= \frac{\mu_Y-a}{c-a}\left(\frac{(\mu_Y-a)(c-\mu_Y)}{\operatorname{var}_Y}-1\right) \\ \beta &= \frac{c-\mu_Y}{c-a}\left(\frac{(\mu_Y-a)(c-\mu_Y)}{\operatorname{var}_Y}-1\right) \\ \end{align}

The statistical dispersion measures are scaled (they do not need to be shifted because they are already centered on the mean) by the range (''c'' − ''a''), linearly for the mean deviation and nonlinearly for the variance:

::(\text{mean deviation around mean})(Y)=(\text{mean deviation around mean})(X)(c-a) =\frac{2 \alpha^\alpha \beta^\beta}{\Beta(\alpha,\beta)(\alpha + \beta)^{\alpha + \beta + 1}}(c-a)

:: \text{var}(Y) =\text{var}(X)(c-a)^2 =\frac{\alpha \beta (c-a)^2}{(\alpha+\beta)^2(\alpha+\beta+1)}.

Since the skewness and excess kurtosis are non-dimensional quantities (as moments centered on the mean and normalized by the standard deviation), they are independent of the parameters ''a'' and ''c'', and therefore equal to the expressions given above in terms of ''X'' (with support [0,1] or (0,1)):

:: \text{skewness}(Y) =\text{skewness}(X) = \frac{2(\beta-\alpha)\sqrt{\alpha+\beta+1}}{(\alpha+\beta+2)\sqrt{\alpha\beta}}.

:: \text{excess kurtosis}(Y) =\text{excess kurtosis}(X)=\frac{6[(\alpha - \beta)^2 (\alpha +\beta + 1) - \alpha \beta (\alpha + \beta + 2)]}{\alpha \beta (\alpha + \beta + 2) (\alpha + \beta + 3)}
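As a sketch (assuming SciPy; the support endpoints and shape parameters are arbitrary examples), the four-parameter density can be computed from the two-parameter one exactly as described above, which is also what scipy.stats.beta's loc and scale arguments do:

<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import beta as beta_dist

a_shape, b_shape = 2.0, 5.0   # shape parameters (example values)
lo, hi = 10.0, 30.0           # support [a, c] of the four-parameter distribution
y = np.linspace(10.5, 29.5, 5)

# Four-parameter density: map y to x in [0, 1], then divide by the range (c - a).
x = (y - lo) / (hi - lo)
pdf_manual = beta_dist.pdf(x, a_shape, b_shape) / (hi - lo)

# Same result using SciPy's built-in location/scale machinery.
pdf_scipy = beta_dist.pdf(y, a_shape, b_shape, loc=lo, scale=hi - lo)
print(np.allclose(pdf_manual, pdf_scipy))   # True
</syntaxhighlight>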


Properties


Measures of central tendency


Mode

The mode of a beta distributed random variable ''X'' with ''α'', ''β'' > 1 is the most likely value of the distribution (corresponding to the peak in the PDF), and is given by the following expression:

:\frac{\alpha - 1}{\alpha + \beta - 2} .

When both parameters are less than one (''α'', ''β'' < 1), this is the anti-mode: the lowest point of the probability density curve.

Letting ''α'' = ''β'', the expression for the mode simplifies to 1/2, showing that for ''α'' = ''β'' > 1 the mode (resp. anti-mode when ''α'' = ''β'' < 1) is at the center of the distribution: it is symmetric in those cases. See the Shapes section in this article for a full list of mode cases, for arbitrary values of ''α'' and ''β''. For several of these cases, the maximum value of the density function occurs at one or both ends. In some cases the (maximum) value of the density function occurring at the end is finite. For example, in the case of ''α'' = 2, ''β'' = 1 (or ''α'' = 1, ''β'' = 2), the density function becomes a right-triangle distribution which is finite at both ends. In several other cases there is a singularity at one end, where the value of the density function approaches infinity. For example, in the case ''α'' = ''β'' = 1/2, the beta distribution simplifies to become the arcsine distribution. There is debate among mathematicians about some of these cases and whether the ends (''x'' = 0 and ''x'' = 1) can be called ''modes'' or not:
* Whether the ends are part of the domain of the density function
* Whether a singularity can ever be called a ''mode''
* Whether cases with two maxima should be called ''bimodal''


Median

The median of the beta distribution is the unique real number x = I_{\frac{1}{2}}^{-1}(\alpha,\beta) for which the regularized incomplete beta function I_x(\alpha,\beta) = \tfrac{1}{2} . There is no general closed-form expression for the median of the beta distribution for arbitrary values of ''α'' and ''β''. Closed-form expressions for particular values of the parameters ''α'' and ''β'' follow:
* For symmetric cases ''α'' = ''β'', median = 1/2.
* For ''α'' = 1 and ''β'' > 0, median = 1-2^{-1/\beta} (this case is the mirror-image of the power function [0,1] distribution)
* For ''α'' > 0 and ''β'' = 1, median = 2^{-1/\alpha} (this case is the power function [0,1] distribution)
* For ''α'' = 3 and ''β'' = 2, median = 0.6142724318676105..., the real solution to the quartic equation 1 − 8''x''³ + 6''x''⁴ = 0, which lies in [0, 1].
* For ''α'' = 2 and ''β'' = 3, median = 0.38572756813238945... = 1 − median(Beta(3, 2))

The following are the limits with one parameter finite (non-zero) and the other approaching these limits:

: \begin{align} \lim_{\beta \to 0} \text{median}= \lim_{\alpha \to \infty} \text{median} = 1,\\ \lim_{\alpha \to 0} \text{median}= \lim_{\beta \to \infty} \text{median} = 0. \end{align}

A reasonable approximation of the value of the median of the beta distribution, for both α and β greater or equal to one, is given by the formula

:\text{median} \approx \frac{\alpha - \tfrac{1}{3}}{\alpha + \beta - \tfrac{2}{3}} \text{ for } \alpha, \beta \ge 1.

When α, β ≥ 1, the relative error (the absolute error divided by the median) in this approximation is less than 4% and for both α ≥ 2 and β ≥ 2 it is less than 1%. The absolute error divided by the difference between the mean and the mode is similarly small.
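A short sketch (assuming SciPy) computing the exact median via the inverse regularized incomplete beta function and comparing it with the closed-form approximation quoted above:

<syntaxhighlight lang="python">
from scipy.special import betaincinv

def beta_median(a, b):
    """Exact median: the x with I_x(a, b) = 1/2, via the inverse regularized incomplete beta."""
    return betaincinv(a, b, 0.5)

def beta_median_approx(a, b):
    """Closed-form approximation, reasonable for a, b >= 1."""
    return (a - 1.0 / 3.0) / (a + b - 2.0 / 3.0)

print(beta_median(3, 2))          # ~0.6142724318676105
print(beta_median_approx(3, 2))   # 8/13 ~ 0.6154, within a few percent
</syntaxhighlight>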


Mean

The expected value (mean) (''μ'') of a beta distribution random variable ''X'' with two parameters ''α'' and ''β'' is a function of only the ratio ''β''/''α'' of these parameters:

: \begin{align} \mu = \operatorname{E}[X] &= \int_0^1 x f(x;\alpha,\beta)\,dx \\ &= \int_0^1 x \,\frac{x^{\alpha-1}(1-x)^{\beta-1}}{\Beta(\alpha,\beta)}\,dx \\ &= \frac{\alpha}{\alpha + \beta} \\ &= \frac{1}{1 + \frac{\beta}{\alpha}} \end{align}

Letting ''α'' = ''β'' in the above expression one obtains ''μ'' = 1/2, showing that for ''α'' = ''β'' the mean is at the center of the distribution: it is symmetric. Also, the following limits can be obtained from the above expression:

: \begin{align} \lim_{\frac{\beta}{\alpha} \to 0} \mu = 1\\ \lim_{\frac{\beta}{\alpha} \to \infty} \mu = 0 \end{align}

Therefore, for ''β''/''α'' → 0, or for ''α''/''β'' → ∞, the mean is located at the right end, ''x'' = 1. For these limit ratios, the beta distribution becomes a one-point degenerate distribution with a Dirac delta function spike at the right end, ''x'' = 1, with probability 1, and zero probability everywhere else. There is 100% probability (absolute certainty) concentrated at the right end, ''x'' = 1.

Similarly, for ''β''/''α'' → ∞, or for ''α''/''β'' → 0, the mean is located at the left end, ''x'' = 0. The beta distribution becomes a 1-point degenerate distribution with a Dirac delta function spike at the left end, ''x'' = 0, with probability 1, and zero probability everywhere else. There is 100% probability (absolute certainty) concentrated at the left end, ''x'' = 0. Following are the limits with one parameter finite (non-zero) and the other approaching these limits:

: \begin{align} \lim_{\beta \to 0} \mu = \lim_{\alpha \to \infty} \mu = 1\\ \lim_{\alpha \to 0} \mu = \lim_{\beta \to \infty} \mu = 0 \end{align}

While for typical unimodal distributions (with centrally located modes, inflexion points at both sides of the mode, and longer tails) (with Beta(''α'', ''β'') such that ''α'', ''β'' > 2) it is known that the sample mean (as an estimate of location) is not as robust as the sample median, the opposite is the case for uniform or "U-shaped" bimodal distributions (with Beta(''α'', ''β'') such that ''α'', ''β'' ≤ 1), with the modes located at the ends of the distribution. As Mosteller and Tukey remark (p. 207) "the average of the two extreme observations uses all the sample information. This illustrates how, for short-tailed distributions, the extreme observations should get more weight." By contrast, it follows that the median of "U-shaped" bimodal distributions with modes at the edge of the distribution (with Beta(''α'', ''β'') such that ''α'', ''β'' ≤ 1) is not robust, as the sample median drops the extreme sample observations from consideration. A practical application of this occurs for example for random walks, since the probability for the time of the last visit to the origin in a random walk is distributed as the arcsine distribution Beta(1/2, 1/2): the mean of a number of realizations of a random walk is a much more robust estimator than the median (which is an inappropriate sample measure estimate in this case).


Geometric mean

The logarithm of the geometric mean ''G''''X'' of a distribution with random variable ''X'' is the arithmetic mean of ln(''X''), or, equivalently, its expected value:

:\ln G_X = \operatorname{E}[\ln X]

For a beta distribution, the expected value integral gives:

:\begin{align} \operatorname{E}[\ln X] &= \int_0^1 \ln x\, f(x;\alpha,\beta)\,dx \\ &= \int_0^1 \ln x \,\frac{x^{\alpha-1}(1-x)^{\beta-1}}{\Beta(\alpha,\beta)}\,dx \\ &= \frac{1}{\Beta(\alpha,\beta)} \, \int_0^1 \frac{\partial\, x^{\alpha-1}(1-x)^{\beta-1}}{\partial \alpha}\,dx \\ &= \frac{1}{\Beta(\alpha,\beta)} \frac{\partial}{\partial \alpha} \int_0^1 x^{\alpha-1}(1-x)^{\beta-1}\,dx \\ &= \frac{1}{\Beta(\alpha,\beta)} \frac{\partial \Beta(\alpha,\beta)}{\partial \alpha} \\ &= \frac{\partial \ln \Beta(\alpha,\beta)}{\partial \alpha} \\ &= \frac{\partial \ln \Gamma(\alpha)}{\partial \alpha} - \frac{\partial \ln \Gamma(\alpha + \beta)}{\partial \alpha} \\ &= \psi(\alpha) - \psi(\alpha + \beta) \end{align}

where ''ψ'' is the digamma function.

Therefore, the geometric mean of a beta distribution with shape parameters ''α'' and ''β'' is the exponential of the digamma functions of ''α'' and ''β'' as follows:

:G_X =e^{\operatorname{E}[\ln X]}= e^{\psi(\alpha) - \psi(\alpha + \beta)}

While for a beta distribution with equal shape parameters α = β, it follows that skewness = 0 and mode = mean = median = 1/2, the geometric mean is less than 1/2: 0 < ''G''''X'' < 1/2. The reason for this is that the logarithmic transformation strongly weights the values of ''X'' close to zero, as ln(''X'') strongly tends towards negative infinity as ''X'' approaches zero, while ln(''X'') flattens towards zero as ''X'' → 1. Along a line ''α'' = ''β'', the following limits apply:

: \begin{align} &\lim_{\alpha = \beta \to 0} G_X = 0 \\ &\lim_{\alpha = \beta \to \infty} G_X =\tfrac{1}{2} \end{align}

Following are the limits with one parameter finite (non-zero) and the other approaching these limits:

: \begin{align} \lim_{\beta \to 0} G_X = \lim_{\alpha \to \infty} G_X = 1\\ \lim_{\alpha \to 0} G_X = \lim_{\beta \to \infty} G_X = 0 \end{align}

The accompanying plot shows the difference between the mean and the geometric mean for shape parameters α and β from zero to 2. Besides the fact that the difference between them approaches zero as α and β approach infinity and that the difference becomes large for values of α and β approaching zero, one can observe an evident asymmetry of the geometric mean with respect to the shape parameters α and β. The difference between the geometric mean and the mean is larger for small values of α in relation to β than when exchanging the magnitudes of β and α.

N. L. Johnson and S. Kotz suggest the logarithmic approximation to the digamma function ''ψ''(''α'') ≈ ln(''α'' − 1/2) which results in the following approximation to the geometric mean:

:G_X \approx \frac{\alpha - \tfrac{1}{2}}{\alpha + \beta - \tfrac{1}{2}}\text{ if } \alpha, \beta > 1.

The relative error in this approximation is small for ''α'', ''β'' > 1 and decreases as the shape parameters increase. Similarly, one can calculate the value of the shape parameters required for the geometric mean to equal 1/2. Given the value of the parameter ''β'', what would be the value of the other parameter, ''α'', required for the geometric mean to equal 1/2? The answer is that (for ''β'' > 1) the value of ''α'' required tends towards ''β'' + 1/2 as ''β'' → ∞.

The fundamental property of the geometric mean, which can be proven to be false for any other mean, is

:G\left(\frac{X_i}{Y_i}\right) = \frac{G(X_i)}{G(Y_i)}

This makes the geometric mean the only correct mean when averaging ''normalized'' results, that is results that are presented as ratios to reference values. This is relevant because the beta distribution is a suitable model for the random behavior of percentages and it is particularly suitable to the statistical modelling of proportions. The geometric mean plays a central role in maximum likelihood estimation, see section "Parameter estimation, maximum likelihood." Actually, when performing maximum likelihood estimation, besides the geometric mean ''G''''X'' based on the random variable X, also another geometric mean appears naturally: the geometric mean based on the linear transformation (1 − ''X''), the mirror-image of ''X'', denoted by ''G''(1−''X''):

:G_{(1-X)} = e^{\operatorname{E}[\ln (1-X)]} = e^{\psi(\beta) - \psi(\alpha + \beta)}

Along a line ''α'' = ''β'', the following limits apply:

: \begin{align} &\lim_{\alpha = \beta \to 0} G_{(1-X)} =0 \\ &\lim_{\alpha = \beta \to \infty} G_{(1-X)} =\tfrac{1}{2} \end{align}

Following are the limits with one parameter finite (non-zero) and the other approaching these limits:

: \begin{align} \lim_{\beta \to 0} G_{(1-X)} = \lim_{\alpha \to \infty} G_{(1-X)} = 0\\ \lim_{\alpha \to 0} G_{(1-X)} = \lim_{\beta \to \infty} G_{(1-X)} = 1 \end{align}

It has the following approximate value:

:G_{(1-X)} \approx \frac{\beta - \tfrac{1}{2}}{\alpha + \beta - \tfrac{1}{2}}\text{ if } \alpha, \beta > 1.

Although both ''G''''X'' and ''G''(1−''X'') are asymmetric, in the case that both shape parameters are equal ''α'' = ''β'', the geometric means are equal: ''G''''X'' = ''G''(1−''X''). This equality follows from the following symmetry displayed between both geometric means:

:G_X (\Beta(\alpha, \beta) )=G_{(1-X)}(\Beta(\beta, \alpha) ).
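A small sketch (assuming NumPy and SciPy; sample size and parameters are arbitrary) evaluating both geometric means via the digamma function and checking them against a Monte Carlo estimate:

<syntaxhighlight lang="python">
import numpy as np
from scipy.special import digamma

def beta_geometric_means(a, b):
    """Geometric means of X and of (1 - X) for X ~ Beta(a, b), via the digamma function."""
    g_x = np.exp(digamma(a) - digamma(a + b))
    g_1mx = np.exp(digamma(b) - digamma(a + b))
    return g_x, g_1mx

rng = np.random.default_rng(0)
samples = rng.beta(2.0, 5.0, size=200_000)
print(beta_geometric_means(2.0, 5.0))
print(np.exp(np.mean(np.log(samples))), np.exp(np.mean(np.log(1 - samples))))
</syntaxhighlight>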


Harmonic mean

The inverse of the harmonic mean (''H''''X'') of a distribution with random variable ''X'' is the arithmetic mean of 1/''X'', or, equivalently, its expected value. Therefore, the harmonic mean (''H''''X'') of a beta distribution with shape parameters ''α'' and ''β'' is:

: \begin{align} H_X &= \frac{1}{\operatorname{E}\left[\frac{1}{X}\right]} \\ &=\frac{1}{\int_0^1 \frac{f(x;\alpha,\beta)}{x}\,dx} \\ &=\frac{1}{\int_0^1 \frac{x^{\alpha-2}(1-x)^{\beta-1}}{\Beta(\alpha,\beta)}\,dx} \\ &= \frac{\alpha - 1}{\alpha + \beta - 1}\text{ if } \alpha > 1 \text{ and } \beta > 0 \\ \end{align}

The harmonic mean (''H''''X'') of a beta distribution with ''α'' < 1 is undefined, because its defining expression is not bounded in [0, 1] for shape parameter ''α'' less than unity.

Letting ''α'' = ''β'' in the above expression one obtains

:H_X = \frac{\alpha-1}{2\alpha-1},

showing that for ''α'' = ''β'' the harmonic mean ranges from 0, for ''α'' = ''β'' = 1, to 1/2, for ''α'' = ''β'' → ∞.

Following are the limits with one parameter finite (non-zero) and the other approaching these limits:

: \begin{align} &\lim_{\alpha \to 0} H_X \text{ is undefined} \\ &\lim_{\alpha \to 1} H_X = \lim_{\beta \to \infty} H_X = 0 \\ &\lim_{\beta \to 0} H_X = \lim_{\alpha \to \infty} H_X = 1 \end{align}

The harmonic mean plays a role in maximum likelihood estimation for the four parameter case, in addition to the geometric mean. Actually, when performing maximum likelihood estimation for the four parameter case, besides the harmonic mean ''H''''X'' based on the random variable ''X'', also another harmonic mean appears naturally: the harmonic mean based on the linear transformation (1 − ''X''), the mirror-image of ''X'', denoted by ''H''(1 − ''X''):

:H_{(1-X)} = \frac{1}{\operatorname{E}\left[\frac{1}{1-X}\right]} = \frac{\beta - 1}{\alpha + \beta - 1} \text{ if } \beta > 1, \text{ and } \alpha> 0.

The harmonic mean (''H''(1 − ''X'')) of a beta distribution with ''β'' < 1 is undefined, because its defining expression is not bounded in [0, 1] for shape parameter ''β'' less than unity.

Letting ''α'' = ''β'' in the above expression one obtains

:H_{(1-X)} = \frac{\beta-1}{2\beta-1},

showing that for ''α'' = ''β'' the harmonic mean ranges from 0, for ''α'' = ''β'' = 1, to 1/2, for ''α'' = ''β'' → ∞.

Following are the limits with one parameter finite (non-zero) and the other approaching these limits:

: \begin{align} &\lim_{\beta \to 0} H_{(1-X)} \text{ is undefined} \\ &\lim_{\beta \to 1} H_{(1-X)} = \lim_{\alpha \to \infty} H_{(1-X)} = 0 \\ &\lim_{\alpha \to 0} H_{(1-X)} = \lim_{\beta \to \infty} H_{(1-X)} = 1 \end{align}

Although both ''H''''X'' and ''H''(1−''X'') are asymmetric, in the case that both shape parameters are equal ''α'' = ''β'', the harmonic means are equal: ''H''''X'' = ''H''(1−''X''). This equality follows from the following symmetry displayed between both harmonic means:

:H_X (\Beta(\alpha, \beta) )=H_{(1-X)}(\Beta(\beta, \alpha) ) \text{ if } \alpha, \beta > 1.
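A minimal sketch (assuming NumPy; parameters chosen arbitrarily with ''α'', ''β'' > 1) of the two harmonic means, with a Monte Carlo cross-check:

<syntaxhighlight lang="python">
import numpy as np

def beta_harmonic_means(a, b):
    """Harmonic means of X and of (1 - X) for X ~ Beta(a, b); need a > 1 (resp. b > 1)."""
    h_x = (a - 1.0) / (a + b - 1.0) if a > 1 else float("nan")
    h_1mx = (b - 1.0) / (a + b - 1.0) if b > 1 else float("nan")
    return h_x, h_1mx

rng = np.random.default_rng(1)
samples = rng.beta(3.0, 4.0, size=200_000)
print(beta_harmonic_means(3.0, 4.0))                                # (1/3, 1/2)
print(1 / np.mean(1 / samples), 1 / np.mean(1 / (1 - samples)))     # Monte Carlo check
</syntaxhighlight>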


Measures of statistical dispersion


Variance

The variance (the second moment centered on the mean) of a beta distribution random variable ''X'' with parameters α and β is:

:\operatorname{var}(X) = \operatorname{E}[(X - \mu)^2]= \frac{\alpha \beta}{(\alpha + \beta)^2(\alpha + \beta + 1)}

Letting α = β in the above expression one obtains

:\operatorname{var}(X) = \frac{1}{4(2\beta + 1)},

showing that for ''α'' = ''β'' the variance decreases monotonically as ''α'' = ''β'' increases. Setting ''α'' = ''β'' = 0 in this expression, one finds the maximum variance var(''X'') = 1/4 which only occurs approaching the limit, at ''α'' = ''β'' = 0.

The beta distribution may also be parametrized in terms of its mean ''μ'' (0 < ''μ'' < 1) and sample size ''ν'' = ''α'' + ''β'' > 0 (see subsection Mean and sample size):

: \begin{align} \alpha &= \mu \nu, \text{ where }\nu =(\alpha + \beta) >0\\ \beta &= (1 - \mu) \nu, \text{ where }\nu =(\alpha + \beta) >0. \end{align}

Using this parametrization, one can express the variance in terms of the mean ''μ'' and the sample size ''ν'' as follows:

:\operatorname{var}(X) = \frac{\mu(1-\mu)}{1 + \nu}

Since ''ν'' = ''α'' + ''β'' > 0, it follows that var(''X'') < ''μ''(1 − ''μ''). For a symmetric distribution, the mean is at the middle of the distribution, ''μ'' = 1/2, and therefore:

:\operatorname{var}(X) = \frac{1}{4 (1 + \nu)} \text{ if } \mu = \tfrac{1}{2}

Also, the following limits (with only the noted variable approaching the limit) can be obtained from the above expressions:

: \begin{align} &\lim_{\alpha\to 0} \operatorname{var}(X) =\lim_{\beta\to 0} \operatorname{var}(X) =\lim_{\alpha\to \infty} \operatorname{var}(X) =\lim_{\beta\to \infty} \operatorname{var}(X) = \lim_{\mu\to 0} \operatorname{var}(X) =\lim_{\mu\to 1} \operatorname{var}(X) =\lim_{\nu\to \infty} \operatorname{var}(X) = 0\\ &\lim_{\nu\to 0} \operatorname{var}(X) = \mu (1-\mu) \end{align}
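A one-function sketch (assuming SciPy) of the variance formula, checked against SciPy's own value:

<syntaxhighlight lang="python">
from scipy.stats import beta as beta_dist

def beta_variance(a, b):
    """Variance of Beta(a, b)."""
    return a * b / ((a + b) ** 2 * (a + b + 1.0))

print(beta_variance(2.0, 5.0))
print(beta_dist.var(2.0, 5.0))   # SciPy reference, should match
</syntaxhighlight>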


Geometric variance and covariance

The logarithm of the geometric variance, ln(var''GX''), of a distribution with random variable ''X'' is the second moment of the logarithm of ''X'' centered on the geometric mean of ''X'', ln(''GX''):

:\begin{align} \ln \operatorname{var}_{GX} &= \operatorname{E} \left[(\ln X - \ln G_X)^2 \right] \\ &= \operatorname{E}\left[(\ln X - \operatorname{E}[\ln X])^2 \right] \\ &= \operatorname{E}\left[(\ln X)^2 \right] - (\operatorname{E}[\ln X])^2\\ &= \operatorname{var}[\ln X] \end{align}

and therefore, the geometric variance is:

:\operatorname{var}_{GX} = e^{\operatorname{var}[\ln X]}

In the Fisher information matrix, and the curvature of the log likelihood function, the logarithm of the geometric variance of the reflected variable 1 − ''X'' and the logarithm of the geometric covariance between ''X'' and 1 − ''X'' appear:

:\begin{align} \ln \operatorname{var}_{G(1-X)} &= \operatorname{E}[(\ln (1-X) - \ln G_{(1-X)})^2]\\ &= \operatorname{E}[(\ln (1-X) - \operatorname{E}[\ln (1-X)])^2]\\ &= \operatorname{E}[(\ln (1-X))^2] - (\operatorname{E}[\ln (1-X)])^2\\ &= \operatorname{var}[\ln (1-X)]\\ & \\ \operatorname{var}_{G(1-X)} &= e^{\operatorname{var}[\ln (1-X)]} \\ & \\ \ln \operatorname{cov}_{G{X,(1-X)}} &= \operatorname{E}[(\ln X - \ln G_X)(\ln (1-X) - \ln G_{(1-X)})] \\ &= \operatorname{E}[(\ln X - \operatorname{E}[\ln X])(\ln (1-X) - \operatorname{E}[\ln (1-X)])] \\ &= \operatorname{E}\left[\ln X \ln(1-X)\right] - \operatorname{E}[\ln X]\operatorname{E}[\ln(1-X)]\\ &= \operatorname{cov}[\ln X, \ln(1-X)]\\ & \\ \operatorname{cov}_{G{X,(1-X)}} &= e^{\operatorname{cov}[\ln X, \ln(1-X)]} \end{align}

For a beta distribution, higher order logarithmic moments can be derived by using the representation of a beta distribution as a proportion of two gamma distributions and differentiating through the integral. They can be expressed in terms of higher order poly-gamma functions. See the section on moments of logarithmically transformed random variables. The variance of the logarithmic variables and covariance of ln ''X'' and ln(1−''X'') are:

: \operatorname{var}[\ln X]= \psi_1(\alpha) - \psi_1(\alpha + \beta)

: \operatorname{var}[\ln (1-X)]= \psi_1(\beta) - \psi_1(\alpha + \beta)

: \operatorname{cov}[\ln X, \ln(1-X)]= -\psi_1(\alpha+\beta)

where the trigamma function, denoted ψ1(α), is the second of the polygamma functions, and is defined as the derivative of the digamma function:

:\psi_1(\alpha) = \frac{d^2\ln\Gamma(\alpha)}{d\alpha^2}= \frac{d\,\psi(\alpha)}{d\alpha}.

Therefore,

: \ln \operatorname{var}_{GX}=\operatorname{var}[\ln X]= \psi_1(\alpha) - \psi_1(\alpha + \beta)

: \ln \operatorname{var}_{G(1-X)} =\operatorname{var}[\ln (1-X)]= \psi_1(\beta) - \psi_1(\alpha + \beta)

: \ln \operatorname{cov}_{G{X,(1-X)}} =\operatorname{cov}[\ln X, \ln(1-X)]= -\psi_1(\alpha+\beta)

The accompanying plots show the log geometric variances and log geometric covariance versus the shape parameters ''α'' and ''β''. The plots show that the log geometric variances and log geometric covariance are close to zero for shape parameters α and β greater than 2, and that the log geometric variances rapidly rise in value for shape parameter values ''α'' and ''β'' less than unity. The log geometric variances are positive for all values of the shape parameters. The log geometric covariance is negative for all values of the shape parameters, and it reaches large negative values for ''α'' and ''β'' less than unity.

Following are the limits with one parameter finite (non-zero) and the other approaching these limits:

: \begin{align} &\lim_{\alpha\to 0} \ln \operatorname{var}_{GX} = \lim_{\beta\to 0} \ln \operatorname{var}_{G(1-X)} =\infty \\ &\lim_{\beta\to 0} \ln \operatorname{var}_{GX} = \lim_{\alpha\to \infty} \ln \operatorname{var}_{GX} = \lim_{\alpha\to 0} \ln \operatorname{var}_{G(1-X)} = \lim_{\beta\to \infty} \ln \operatorname{var}_{G(1-X)} = \lim_{\alpha\to \infty} \ln \operatorname{cov}_{G{X,(1-X)}} = \lim_{\beta\to \infty} \ln \operatorname{cov}_{G{X,(1-X)}} = 0\\ &\lim_{\beta\to \infty} \ln \operatorname{var}_{GX} = \psi_1(\alpha)\\ &\lim_{\alpha\to \infty} \ln \operatorname{var}_{G(1-X)} = \psi_1(\beta)\\ &\lim_{\alpha\to 0} \ln \operatorname{cov}_{G{X,(1-X)}} = - \psi_1(\beta)\\ &\lim_{\beta\to 0} \ln \operatorname{cov}_{G{X,(1-X)}} = - \psi_1(\alpha) \end{align}

Limits with two parameters varying:

: \begin{align} &\lim_{\alpha\to \infty}( \lim_{\beta \to \infty} \ln \operatorname{var}_{GX}) = \lim_{\beta \to \infty}( \lim_{\alpha\to \infty} \ln \operatorname{var}_{GX}) = \lim_{\alpha\to \infty} (\lim_{\beta \to \infty} \ln \operatorname{var}_{G(1-X)}) = \lim_{\beta\to \infty}( \lim_{\alpha\to \infty} \ln \operatorname{var}_{G(1-X)}) =0\\ &\lim_{\beta \to 0} (\lim_{\alpha\to 0} \ln \operatorname{var}_{GX}) = \lim_{\alpha\to 0} (\lim_{\beta \to 0} \ln \operatorname{var}_{G(1-X)}) = \infty\\ &\lim_{\alpha\to 0} (\lim_{\beta \to 0} \ln \operatorname{cov}_{G{X,(1-X)}}) = \lim_{\beta\to 0} (\lim_{\alpha\to 0} \ln \operatorname{cov}_{G{X,(1-X)}}) = - \infty \end{align}

Although both ln(var''GX'') and ln(var''G''(1 − ''X'')) are asymmetric, when the shape parameters are equal, α = β, one has: ln(var''GX'') = ln(var''G(1−X)''). This equality follows from the following symmetry displayed between both log geometric variances:

:\ln \operatorname{var}_{GX}(\Beta(\alpha, \beta))=\ln \operatorname{var}_{G(1-X)}(\Beta(\beta, \alpha)).

The log geometric covariance is symmetric:

:\ln \operatorname{cov}_{G{X,(1-X)}}(\Beta(\alpha, \beta) )=\ln \operatorname{cov}_{G{X,(1-X)}}(\Beta(\beta, \alpha))
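A short sketch (assuming NumPy and SciPy; the parameters and sample size are arbitrary) of the trigamma expressions for the logarithmic variances and covariance, with a Monte Carlo cross-check (scipy.special.polygamma(1, ·) is the trigamma function):

<syntaxhighlight lang="python">
import numpy as np
from scipy.special import polygamma

def log_geometric_var_cov(a, b):
    """var[ln X], var[ln(1-X)] and cov[ln X, ln(1-X)] for X ~ Beta(a, b), via the trigamma function."""
    trigamma = lambda z: polygamma(1, z)
    return (trigamma(a) - trigamma(a + b),
            trigamma(b) - trigamma(a + b),
            -trigamma(a + b))

rng = np.random.default_rng(2)
x = rng.beta(2.0, 3.0, size=300_000)
print(log_geometric_var_cov(2.0, 3.0))
print(np.var(np.log(x)), np.var(np.log1p(-x)), np.cov(np.log(x), np.log1p(-x))[0, 1])
</syntaxhighlight>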


Mean absolute deviation around the mean

The mean absolute deviation around the mean for the beta distribution with shape parameters α and β is:

:\operatorname{E}[|X - E[X]|] = \frac{2 \alpha^\alpha \beta^\beta}{\Beta(\alpha,\beta)(\alpha + \beta)^{\alpha + \beta + 1}}

The mean absolute deviation around the mean is a more robust estimator of statistical dispersion than the standard deviation for beta distributions with tails and inflection points at each side of the mode, Beta(''α'', ''β'') distributions with ''α'', ''β'' > 2, as it depends on the linear (absolute) deviations rather than the square deviations from the mean. Therefore, the effect of very large deviations from the mean are not as overly weighted.

Using Stirling's approximation to the gamma function, N. L. Johnson and S. Kotz derived the following approximation for values of the shape parameters greater than unity (the relative error for this approximation is only −3.5% for ''α'' = ''β'' = 1, and it decreases to zero as ''α'' → ∞, ''β'' → ∞):

: \begin{align} \frac{\text{mean abs. dev. from mean}}{\text{standard deviation}} &=\frac{\operatorname{E}[|X - E[X]|]}{\sqrt{\operatorname{var}(X)}}\\ &\approx \sqrt{\frac{2}{\pi}} \left(1+\frac{7}{12 (\alpha+\beta)}-\frac{1}{12 \alpha}-\frac{1}{12 \beta} \right), \text{ if } \alpha, \beta > 1. \end{align}

At the limit α → ∞, β → ∞, the ratio of the mean absolute deviation to the standard deviation (for the beta distribution) becomes equal to the ratio of the same measures for the normal distribution: \sqrt{\frac{2}{\pi}}. For α = β = 1 this ratio equals \frac{\sqrt{3}}{2}, so that from α = β = 1 to α, β → ∞ the ratio decreases by 8.5%. For α = β = 0 the standard deviation is exactly equal to the mean absolute deviation around the mean. Therefore, this ratio decreases by 15% from α = β = 0 to α = β = 1, and by 25% from α = β = 0 to α, β → ∞. However, for skewed beta distributions such that α → 0 or β → 0, the ratio of the standard deviation to the mean absolute deviation approaches infinity (although each of them, individually, approaches zero) because the mean absolute deviation approaches zero faster than the standard deviation.

Using the parametrization in terms of mean μ and sample size ν = α + β > 0:

:α = μν, β = (1−μ)ν

one can express the mean absolute deviation around the mean in terms of the mean μ and the sample size ν as follows:

:\operatorname{E}[|X - E[X]|] = \frac{2 \mu^{\mu\nu}(1-\mu)^{(1-\mu)\nu}}{\nu\,\Beta(\mu\nu,(1-\mu)\nu)}

For a symmetric distribution, the mean is at the middle of the distribution, μ = 1/2, and therefore:

: \begin{align} \operatorname{E}[|X - E[X]|] = \frac{2^{1-\nu}}{\nu\,\Beta(\tfrac{\nu}{2},\tfrac{\nu}{2})} &= \frac{2^{1-\nu}\,\Gamma(\nu)}{\nu\,\Gamma(\tfrac{\nu}{2})^2} \\ \lim_{\nu \to 0} \left (\lim_{\mu \to \frac{1}{2}} \operatorname{E}[|X - E[X]|] \right ) &= \tfrac{1}{2}\\ \lim_{\nu \to \infty} \left (\lim_{\mu \to \frac{1}{2}} \operatorname{E}[| X - E[X]|] \right ) &= 0 \end{align}

Also, the following limits (with only the noted variable approaching the limit) can be obtained from the above expressions:

: \begin{align} \lim_{\beta\to 0} \operatorname{E}[|X - E[X]|] &=\lim_{\alpha \to 0} \operatorname{E}[|X - E[X]|]= 0 \\ \lim_{\beta\to \infty} \operatorname{E}[|X - E[X]|] &=\lim_{\alpha \to \infty} \operatorname{E}[|X - E[X]|] = 0\\ \lim_{\mu \to 0}\operatorname{E}[|X - E[X]|]&=\lim_{\mu \to 1} \operatorname{E}[|X - E[X]|] = 0\\ \lim_{\nu \to 0} \operatorname{E}[|X - E[X]|] &= 2\mu(1-\mu) \\ \lim_{\nu \to \infty} \operatorname{E}[|X - E[X]|] &= 0 \end{align}
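A small sketch (assuming NumPy and SciPy; example parameters) of the mean absolute deviation formula, checked by simulation:

<syntaxhighlight lang="python">
import numpy as np
from scipy.special import beta as beta_fn

def beta_mean_abs_dev(a, b):
    """Mean absolute deviation around the mean, E|X - E[X]|, for X ~ Beta(a, b)."""
    return 2.0 * a**a * b**b / (beta_fn(a, b) * (a + b) ** (a + b + 1))

rng = np.random.default_rng(3)
x = rng.beta(2.0, 5.0, size=300_000)
print(beta_mean_abs_dev(2.0, 5.0))
print(np.mean(np.abs(x - 2.0 / 7.0)))   # Monte Carlo check (mean of Beta(2, 5) is 2/7)
</syntaxhighlight>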


Mean absolute difference

The mean absolute difference for the beta distribution is:

:\mathrm{MD} = \int_0^1 \int_0^1 f(x;\alpha,\beta)\,f(y;\alpha,\beta)\,|x-y|\,dx\,dy = \left(\frac{4}{\alpha+\beta}\right)\frac{\Beta(\alpha+\beta,\alpha+\beta)}{\Beta(\alpha,\alpha)\Beta(\beta,\beta)}

The Gini coefficient for the beta distribution is half of the relative mean absolute difference:

:\mathrm{G} = \left(\frac{2}{\alpha}\right)\frac{\Beta(\alpha+\beta,\alpha+\beta)}{\Beta(\alpha,\alpha)\Beta(\beta,\beta)}


Skewness

The skewness (the third moment centered on the mean, normalized by the 3/2 power of the variance) of the beta distribution is

:\gamma_1 =\frac{\operatorname{E}[(X - \mu)^3]}{(\operatorname{var}(X))^{3/2}} = \frac{2(\beta-\alpha)\sqrt{\alpha+\beta+1}}{(\alpha+\beta+2)\sqrt{\alpha\beta}} .

Letting α = β in the above expression one obtains γ1 = 0, showing once again that for α = β the distribution is symmetric and hence the skewness is zero. Positive skew (right-tailed) for α < β, negative skew (left-tailed) for α > β.

Using the parametrization in terms of mean μ and sample size ν = α + β:

: \begin{align} \alpha & = \mu \nu ,\text{ where }\nu =(\alpha + \beta) >0\\ \beta & = (1 - \mu) \nu , \text{ where }\nu =(\alpha + \beta) >0. \end{align}

one can express the skewness in terms of the mean μ and the sample size ν as follows:

:\gamma_1 =\frac{2(\beta-\alpha)\sqrt{\alpha+\beta+1}}{(\alpha+\beta+2)\sqrt{\alpha\beta}} = \frac{2(1-2\mu)\sqrt{1+\nu}}{(2+\nu)\sqrt{\mu(1-\mu)}}.

The skewness can also be expressed just in terms of the variance ''var'' and the mean μ as follows:

:\gamma_1 =\frac{2(1-2\mu)\sqrt{\text{var}}}{\mu(1-\mu)+\text{var}} \text{ if } \operatorname{var} < \mu(1-\mu)

The accompanying plot of skewness as a function of variance and mean shows that maximum variance (1/4) is coupled with zero skewness and the symmetry condition (μ = 1/2), and that maximum skewness (positive or negative infinity) occurs when the mean is located at one end or the other, so that the "mass" of the probability distribution is concentrated at the ends (minimum variance).

The following expression for the square of the skewness, in terms of the sample size ν = α + β and the variance ''var'', is useful for the method of moments estimation of four parameters:

:(\gamma_1)^2 =\frac{4(1-2\mu)^2(1+\nu)}{(2+\nu)^2\,\mu(1-\mu)} = \frac{4}{(2+\nu)^2}\bigg(\frac{1}{\text{var}}-4(1+\nu)\bigg)

This expression correctly gives a skewness of zero for α = β, since in that case (see section on variance): \operatorname{var} = \frac{1}{4(1+\nu)}.

For the symmetric case (α = β), skewness = 0 over the whole range, and the following limits apply:

:\lim_{\alpha = \beta \to 0} \gamma_1 = \lim_{\alpha = \beta \to \infty} \gamma_1 =\lim_{\nu \to 0} \gamma_1=\lim_{\nu \to \infty} \gamma_1=\lim_{\mu \to \frac{1}{2}} \gamma_1 = 0

For the asymmetric cases (α ≠ β) the following limits (with only the noted variable approaching the limit) can be obtained from the above expressions:

: \begin{align} &\lim_{\alpha \to 0} \gamma_1 =\lim_{\mu \to 0} \gamma_1 = \infty\\ &\lim_{\beta \to 0} \gamma_1 = \lim_{\mu \to 1} \gamma_1= - \infty\\ &\lim_{\alpha \to \infty} \gamma_1 = -\frac{2}{\sqrt{\beta}},\quad \lim_{\beta \to 0}(\lim_{\alpha \to \infty} \gamma_1) = -\infty,\quad \lim_{\beta \to \infty}(\lim_{\alpha \to \infty} \gamma_1) = 0\\ &\lim_{\beta \to \infty} \gamma_1 = \frac{2}{\sqrt{\alpha}},\quad \lim_{\alpha \to 0}(\lim_{\beta \to \infty} \gamma_1) = \infty,\quad \lim_{\alpha \to \infty}(\lim_{\beta \to \infty} \gamma_1) = 0\\ &\lim_{\nu \to 0} \gamma_1 = \frac{1 - 2\mu}{\sqrt{\mu(1-\mu)}},\quad \lim_{\mu \to 0}(\lim_{\nu \to 0} \gamma_1) = \infty,\quad \lim_{\mu \to 1}(\lim_{\nu \to 0} \gamma_1) = - \infty \end{align}
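A minimal sketch (assuming SciPy) of the skewness formula, compared against SciPy's moment routine:

<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import beta as beta_dist

def beta_skewness(a, b):
    """Skewness of Beta(a, b)."""
    return 2.0 * (b - a) * np.sqrt(a + b + 1.0) / ((a + b + 2.0) * np.sqrt(a * b))

print(beta_skewness(2.0, 5.0))
print(beta_dist.stats(2.0, 5.0, moments="s"))   # SciPy reference, should match
</syntaxhighlight>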


Kurtosis

The beta distribution has been applied in acoustic analysis to assess damage to gears, as the kurtosis of the beta distribution has been reported to be a good indicator of the condition of a gear. Kurtosis has also been used to distinguish the seismic signal generated by a person's footsteps from other signals. As persons or other targets moving on the ground generate continuous signals in the form of seismic waves, one can separate different targets based on the seismic waves they generate. Kurtosis is sensitive to impulsive signals, so it's much more sensitive to the signal generated by human footsteps than other signals generated by vehicles, winds, noise, etc. Unfortunately, the notation for kurtosis has not been standardized. Kenney and Keeping use the symbol γ2 for the excess kurtosis, but Abramowitz and Stegun use different terminology. To prevent confusion between kurtosis (the fourth moment centered on the mean, normalized by the square of the variance) and excess kurtosis, when using symbols, they will be spelled out as follows:

:\begin{align} \text{excess kurtosis} &=\text{kurtosis} - 3\\ &=\frac{\operatorname{E}[(X - \mu)^4]}{(\operatorname{var}(X))^2}-3\\ &=\frac{6[\alpha^3-\alpha^2(2\beta-1)+\beta^2(\beta+1)-2\alpha\beta(\beta+2)]}{\alpha \beta (\alpha + \beta + 2) (\alpha + \beta + 3)}\\ &=\frac{6[(\alpha - \beta)^2 (\alpha +\beta + 1) - \alpha \beta (\alpha + \beta + 2)]}{\alpha \beta (\alpha + \beta + 2) (\alpha + \beta + 3)} . \end{align}

Letting α = β in the above expression one obtains

:\text{excess kurtosis} =- \frac{6}{3 + 2\alpha} \text{ if }\alpha=\beta .

Therefore, for symmetric beta distributions, the excess kurtosis is negative, increasing from a minimum value of −2 at the limit as α = β → 0, and approaching a maximum value of zero as α = β → ∞. The value of −2 is the minimum value of excess kurtosis that any distribution (not just beta distributions, but any distribution of any possible kind) can ever achieve. This minimum value is reached when all the probability density is entirely concentrated at each end ''x'' = 0 and ''x'' = 1, with nothing in between: a 2-point Bernoulli distribution with equal probability 1/2 at each end (a coin toss: see section below "Kurtosis bounded by the square of the skewness" for further discussion). The description of kurtosis as a measure of the "potential outliers" (or "potential rare, extreme values") of the probability distribution, is correct for all distributions including the beta distribution. When rare, extreme values can occur in the beta distribution, the higher its kurtosis; otherwise, the kurtosis is lower. For α ≠ β, skewed beta distributions, the excess kurtosis can reach unlimited positive values (particularly for α → 0 for finite β, or for β → 0 for finite α) because the side away from the mode will produce occasional extreme values. Minimum kurtosis takes place when the mass density is concentrated equally at each end (and therefore the mean is at the center), and there is no probability mass density in between the ends.

Using the parametrization in terms of mean μ and sample size ν = α + β:

: \begin{align} \alpha & = \mu \nu ,\text{ where }\nu =(\alpha + \beta) >0\\ \beta & = (1 - \mu) \nu , \text{ where }\nu =(\alpha + \beta) >0. \end{align}

one can express the excess kurtosis in terms of the mean μ and the sample size ν as follows:

:\text{excess kurtosis} =\frac{6}{3 + \nu}\bigg (\frac{(1 - 2\mu)^2 (1 + \nu)}{\mu(1-\mu)(2 + \nu)} - 1 \bigg )

The excess kurtosis can also be expressed in terms of just the following two parameters: the variance ''var'', and the sample size ν as follows:

:\text{excess kurtosis} =\frac{6}{(3 + \nu)(2 + \nu)}\left(\frac{1}{\text{var}} - 6 - 5 \nu \right)\text{ if }\text{var}< \mu(1-\mu)

and, in terms of the variance ''var'' and the mean μ as follows:

:\text{excess kurtosis} =\frac{6\,\text{var}\,(1 - \text{var} - 5\mu(1-\mu))}{(\text{var} + \mu(1-\mu))(2\text{var} + \mu(1-\mu))}\text{ if }\text{var}< \mu(1-\mu)

The plot of excess kurtosis as a function of the variance and the mean shows that the minimum value of the excess kurtosis (−2, which is the minimum possible value for excess kurtosis for any distribution) is intimately coupled with the maximum value of variance (1/4) and the symmetry condition: the mean occurring at the midpoint (μ = 1/2). This occurs for the symmetric case of α = β = 0, with zero skewness. At the limit, this is the 2 point Bernoulli distribution with equal probability 1/2 at each Dirac delta function end ''x'' = 0 and ''x'' = 1 and zero probability everywhere else. (A coin toss: one face of the coin being ''x'' = 0 and the other face being ''x'' = 1.) Variance is maximum because the distribution is bimodal with nothing in between the two modes (spikes) at each end. Excess kurtosis is minimum: the probability density "mass" is zero at the mean and it is concentrated at the two peaks at each end. Excess kurtosis reaches the minimum possible value (for any distribution) when the probability density function has two spikes at each end: it is bi-"peaky" with nothing in between them.

On the other hand, the plot shows that for extreme skewed cases, where the mean is located near one or the other end (μ = 0 or μ = 1), the variance is close to zero, and the excess kurtosis rapidly approaches infinity when the mean of the distribution approaches either end.

Alternatively, the excess kurtosis can also be expressed in terms of just the following two parameters: the square of the skewness, and the sample size ν as follows:

:\text{excess kurtosis} =\frac{6}{3 + \nu}\bigg(\frac{2 + \nu}{4} (\text{skewness})^2 - 1\bigg)\text{ if }(\text{skewness})^2-2< \text{excess kurtosis}< \tfrac{3}{2} (\text{skewness})^2

From this last expression, one can obtain the same limits published practically a century ago by Karl Pearson in his paper, for the beta distribution (see section below titled "Kurtosis bounded by the square of the skewness"). Setting α + β = ν = 0 in the above expression, one obtains Pearson's lower boundary (values for the skewness and excess kurtosis below the boundary (excess kurtosis + 2 − skewness² = 0) cannot occur for any distribution, and hence Karl Pearson appropriately called the region below this boundary the "impossible region"). The limit of α + β = ν → ∞ determines Pearson's upper boundary.

: \begin{align} &\lim_{\nu \to 0}\text{excess kurtosis} = (\text{skewness})^2 - 2\\ &\lim_{\nu \to \infty}\text{excess kurtosis} = \tfrac{3}{2} (\text{skewness})^2 \end{align}

therefore:

:(\text{skewness})^2-2< \text{excess kurtosis}< \tfrac{3}{2} (\text{skewness})^2

Values of ν = α + β such that ν ranges from zero to infinity, 0 < ν < ∞, span the whole region of the beta distribution in the plane of excess kurtosis versus squared skewness.

For the symmetric case (α = β), the following limits apply:

: \begin{align} &\lim_{\alpha = \beta \to 0} \text{excess kurtosis} = - 2 \\ &\lim_{\alpha = \beta \to \infty} \text{excess kurtosis} = 0 \\ &\lim_{\mu \to \frac{1}{2}} \text{excess kurtosis} = - \frac{6}{3 + \nu} \end{align}

For the unsymmetric cases (α ≠ β) the following limits (with only the noted variable approaching the limit) can be obtained from the above expressions:

: \begin{align} &\lim_{\alpha \to 0}\text{excess kurtosis} =\lim_{\beta \to 0} \text{excess kurtosis} = \lim_{\mu \to 0}\text{excess kurtosis} = \lim_{\mu \to 1}\text{excess kurtosis} =\infty\\ &\lim_{\alpha \to \infty}\text{excess kurtosis} = \frac{6}{\beta},\text{ } \lim_{\beta \to 0}(\lim_{\alpha \to \infty} \text{excess kurtosis}) = \infty,\text{ } \lim_{\beta \to \infty}(\lim_{\alpha \to \infty} \text{excess kurtosis}) = 0\\ &\lim_{\beta \to \infty}\text{excess kurtosis} = \frac{6}{\alpha},\text{ } \lim_{\alpha \to 0}(\lim_{\beta \to \infty} \text{excess kurtosis}) = \infty,\text{ } \lim_{\alpha \to \infty}(\lim_{\beta \to \infty} \text{excess kurtosis}) = 0\\ &\lim_{\nu \to 0} \text{excess kurtosis} = - 6 + \frac{1}{\mu (1 - \mu)},\text{ } \lim_{\mu \to 0}(\lim_{\nu \to 0} \text{excess kurtosis}) = \infty,\text{ } \lim_{\mu \to 1}(\lim_{\nu \to 0} \text{excess kurtosis}) = \infty \end{align}
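A minimal sketch (assuming SciPy) of the excess-kurtosis formula, compared against SciPy's moment routine (SciPy reports Fisher, i.e. excess, kurtosis):

<syntaxhighlight lang="python">
from scipy.stats import beta as beta_dist

def beta_excess_kurtosis(a, b):
    """Excess kurtosis of Beta(a, b)."""
    num = 6.0 * ((a - b) ** 2 * (a + b + 1.0) - a * b * (a + b + 2.0))
    return num / (a * b * (a + b + 2.0) * (a + b + 3.0))

print(beta_excess_kurtosis(2.0, 5.0))
print(beta_dist.stats(2.0, 5.0, moments="k"))   # SciPy reference, should match
</syntaxhighlight>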


Characteristic function

The characteristic function is the Fourier transform of the probability density function. The characteristic function of the beta distribution is Kummer's confluent hypergeometric function (of the first kind):

:\begin{align} \varphi_X(\alpha;\beta;t) &= \operatorname{E}\left[e^{itX}\right]\\ &= \int_0^1 e^{itx} f(x;\alpha,\beta)\, dx \\ &= {}_1F_1(\alpha; \alpha+\beta; it)\!\\ &=\sum_{n=0}^\infty \frac{\alpha^{(n)}}{(\alpha+\beta)^{(n)}} \frac{(it)^n}{n!} \\ &= 1 +\sum_{k=1}^{\infty} \left( \prod_{r=0}^{k-1} \frac{\alpha+r}{\alpha+\beta+r} \right) \frac{(it)^k}{k!} \end{align}

where

: x^{(n)}=x(x+1)(x+2)\cdots(x+n-1)

is the rising factorial, also called the "Pochhammer symbol". The value of the characteristic function for ''t'' = 0 is one:

: \varphi_X(\alpha;\beta;0)={}_1F_1(\alpha; \alpha+\beta; 0) = 1 .

Also, the real and imaginary parts of the characteristic function enjoy the following symmetries with respect to the origin of variable ''t'':

: \textrm{Re} \left [ {}_1F_1(\alpha; \alpha+\beta; it) \right ] = \textrm{Re} \left [ {}_1F_1(\alpha; \alpha+\beta; - it) \right ]

: \textrm{Im} \left [ {}_1F_1(\alpha; \alpha+\beta; it) \right ] = - \textrm{Im} \left [ {}_1F_1(\alpha; \alpha+\beta; - it) \right ]

The symmetric case α = β simplifies the characteristic function of the beta distribution to a Bessel function, since in the special case α + β = 2α the confluent hypergeometric function (of the first kind) reduces to a Bessel function (the modified Bessel function of the first kind I_{\alpha-\frac{1}{2}}) using Kummer's second transformation as follows:

:\begin{align} {}_1F_1(\alpha;2\alpha; it) &= e^{\frac{it}{2}}\, {}_0F_1 \left(; \alpha+\tfrac{1}{2}; -\frac{t^2}{16} \right) \\ &= e^{\frac{it}{2}} \left(\frac{it}{4}\right)^{\frac{1}{2}-\alpha} \Gamma\left(\alpha+\tfrac{1}{2}\right) I_{\alpha-\frac{1}{2}}\left(\frac{it}{2}\right).\end{align}

Another example of the symmetric case α = β = n/2 for beamforming applications can be found in Figure 11 of the cited reference.

In the accompanying plots, the real part (Re) of the characteristic function of the beta distribution is displayed for symmetric (α = β) and skewed (α ≠ β) cases.


Other moments


Moment generating function

It also follows that the moment generating function is

:\begin{align} M_X(\alpha; \beta; t) &= \operatorname{E}\left[e^{tX}\right] \\ &= \int_0^1 e^{tx} f(x;\alpha,\beta)\,dx \\ &= {}_1F_1(\alpha; \alpha+\beta; t) \\ &= \sum_{n=0}^\infty \frac{\alpha^{(n)}}{(\alpha+\beta)^{(n)}} \frac{t^n}{n!} \\ &= 1 +\sum_{k=1}^{\infty} \left( \prod_{r=0}^{k-1} \frac{\alpha+r}{\alpha+\beta+r} \right) \frac{t^k}{k!} \end{align}

In particular ''M''''X''(''α''; ''β''; 0) = 1.
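A short sketch (assuming SciPy, whose scipy.special.hyp1f1 evaluates Kummer's function for real arguments; the parameters and ''t'' are arbitrary examples) of the moment generating function, with a Monte Carlo cross-check:

<syntaxhighlight lang="python">
import numpy as np
from scipy.special import hyp1f1

def beta_mgf(a, b, t):
    """Moment generating function E[exp(tX)] of Beta(a, b), via Kummer's 1F1."""
    return hyp1f1(a, a + b, t)

rng = np.random.default_rng(4)
x = rng.beta(2.0, 5.0, size=300_000)
t = 1.5
print(beta_mgf(2.0, 5.0, t))
print(np.mean(np.exp(t * x)))   # Monte Carlo check
</syntaxhighlight>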


Higher moments

Using the moment generating function, the ''k''-th raw moment is given by the factor

:\prod_{r=0}^{k-1} \frac{\alpha+r}{\alpha+\beta+r}

multiplying the (exponential series) term \left(\frac{t^k}{k!}\right) in the series of the moment generating function

:\operatorname{E}[X^k]= \frac{\alpha^{(k)}}{(\alpha + \beta)^{(k)}} = \prod_{r=0}^{k-1} \frac{\alpha+r}{\alpha+\beta+r}

where (''x'')(''k'') is a Pochhammer symbol representing rising factorial. It can also be written in a recursive form as

:\operatorname{E}[X^k] = \frac{\alpha + k - 1}{\alpha + \beta + k - 1}\operatorname{E}[X^{k-1}].

Since the moment generating function M_X(\alpha; \beta; \cdot) has a positive radius of convergence, the beta distribution is determined by its moments.
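A minimal sketch (plain Python) of the recursive computation of the raw moments:

<syntaxhighlight lang="python">
def beta_raw_moments(a, b, k_max):
    """Raw moments E[X^k], k = 1..k_max, of Beta(a, b), via the rising-factorial recursion."""
    moments, m = [], 1.0
    for k in range(1, k_max + 1):
        m *= (a + k - 1.0) / (a + b + k - 1.0)
        moments.append(m)
    return moments

print(beta_raw_moments(2.0, 5.0, 3))   # E[X] = 2/7, E[X^2] = 3/28, E[X^3] = 1/21
</syntaxhighlight>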


Moments of transformed random variables


Moments of linearly transformed, product and inverted random variables

One can also show the following expectations for a transformed random variable, where the random variable ''X'' is Beta-distributed with parameters α and β: ''X'' ~ Beta(α, β). The expected value of the variable 1 − ''X'' is the mirror-symmetry of the expected value based on ''X'':

:\begin{align}
& \operatorname{E}[1-X] = \frac{\beta}{\alpha+\beta} \\
& \operatorname{E}[X(1-X)] = \operatorname{E}[(1-X)X] = \frac{\alpha\beta}{(\alpha+\beta)(\alpha+\beta+1)}
\end{align}

Due to the mirror-symmetry of the probability density function of the beta distribution, the variances based on the variables ''X'' and 1 − ''X'' are identical, and the covariance of ''X'' and (1 − ''X'') is the negative of the variance:

:\operatorname{var}[1-X] = \operatorname{var}[X] = -\operatorname{cov}[X, 1-X] = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}

These are the expected values for inverted variables (they are related to the harmonic means, see the section on the harmonic mean):

:\begin{align}
& \operatorname{E}\left[\frac{1}{X}\right] = \frac{\alpha+\beta-1}{\alpha-1} && \text{if } \alpha > 1 \\
& \operatorname{E}\left[\frac{1}{1-X}\right] = \frac{\alpha+\beta-1}{\beta-1} && \text{if } \beta > 1
\end{align}

The following transformation, dividing the variable ''X'' by its mirror-image, ''X''/(1 − ''X''), results in the expected value of the "inverted beta distribution" or beta prime distribution (also known as beta distribution of the second kind or Pearson's Type VI):

:\begin{align}
& \operatorname{E}\left[\frac{X}{1-X}\right] = \frac{\alpha}{\beta-1} && \text{if } \beta > 1 \\
& \operatorname{E}\left[\frac{1-X}{X}\right] = \frac{\beta}{\alpha-1} && \text{if } \alpha > 1
\end{align}

Variances of these transformed variables can be obtained by integration, as the expected values of the second moments centered on the corresponding variables:

:\operatorname{var}\left[\frac{1}{X}\right] = \operatorname{E}\left[\left(\frac{1}{X} - \operatorname{E}\left[\frac{1}{X}\right]\right)^2\right] = \operatorname{var}\left[\frac{1-X}{X}\right] = \operatorname{E}\left[\left(\frac{1-X}{X} - \operatorname{E}\left[\frac{1-X}{X}\right]\right)^2\right] = \frac{\beta(\alpha+\beta-1)}{(\alpha-2)(\alpha-1)^2} \text{ if } \alpha > 2

The following variance of the variable ''X'' divided by its mirror-image, ''X''/(1 − ''X''), results in the variance of the "inverted beta distribution" or beta prime distribution (also known as beta distribution of the second kind or Pearson's Type VI):

:\operatorname{var}\left[\frac{1}{1-X}\right] = \operatorname{E}\left[\left(\frac{1}{1-X} - \operatorname{E}\left[\frac{1}{1-X}\right]\right)^2\right] = \operatorname{var}\left[\frac{X}{1-X}\right] = \operatorname{E}\left[\left(\frac{X}{1-X} - \operatorname{E}\left[\frac{X}{1-X}\right]\right)^2\right] = \frac{\alpha(\alpha+\beta-1)}{(\beta-2)(\beta-1)^2} \text{ if } \beta > 2

The covariances are:

:\operatorname{cov}\left[\frac{1}{X},\frac{1}{1-X}\right] = \operatorname{cov}\left[\frac{1-X}{X},\frac{X}{1-X}\right] = \operatorname{cov}\left[\frac{1}{1-X},\frac{1}{X}\right] = \operatorname{cov}\left[\frac{X}{1-X},\frac{1-X}{X}\right] = -\frac{\alpha+\beta-1}{(\alpha-1)(\beta-1)} \text{ if } \alpha, \beta > 1

These expectations and variances appear in the four-parameter Fisher information matrix (see the section on Fisher information).
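A minimal Monte Carlo sketch of the expectations of the odds-transformed variables (assuming NumPy and SciPy; α = 3, β = 5 are arbitrary example values chosen so that the conditions above hold):

<syntaxhighlight lang="python">
# Sketch: Monte Carlo check of E[X/(1-X)] = alpha/(beta-1), E[(1-X)/X] = beta/(alpha-1),
# and var[X/(1-X)] = alpha(alpha+beta-1)/((beta-2)(beta-1)^2).
import numpy as np
from scipy import stats

alpha, beta = 3.0, 5.0
x = stats.beta(alpha, beta).rvs(size=1_000_000, random_state=0)

print(np.mean(x / (1 - x)), alpha / (beta - 1))        # both ~ 0.75
print(np.mean((1 - x) / x), beta / (alpha - 1))        # both ~ 2.5
print(np.var(x / (1 - x)),
      alpha * (alpha + beta - 1) / ((beta - 2) * (beta - 1) ** 2))  # both ~ 0.4375
</syntaxhighlight>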


Moments of logarithmically transformed random variables

Expected values for logarithmic transformations (useful for maximum likelihood estimates, see the section on maximum likelihood estimation) are discussed in this section. The following logarithmic linear transformations are related to the geometric means ''G''''X'' and ''G''(1−''X'') (see the section on the geometric mean):

:\begin{align}
\operatorname{E}[\ln(X)] &= \psi(\alpha) - \psi(\alpha+\beta) = -\operatorname{E}\left[\ln\left(\frac{1}{X}\right)\right],\\
\operatorname{E}[\ln(1-X)] &= \psi(\beta) - \psi(\alpha+\beta) = -\operatorname{E}\left[\ln\left(\frac{1}{1-X}\right)\right].
\end{align}

where the digamma function ''ψ''(''α'') is defined as the logarithmic derivative of the gamma function:

:\psi(\alpha) = \frac{d}{d\alpha}\ln\Gamma(\alpha)

Logit transformations are interesting, as they usually transform various shapes (including J-shapes) into (usually skewed) bell-shaped densities over the logit variable, and they may remove the end singularities over the original variable:

:\begin{align}
\operatorname{E}\left[\ln\left(\frac{X}{1-X}\right)\right] &= \psi(\alpha) - \psi(\beta) = \operatorname{E}[\ln(X)] + \operatorname{E}\left[\ln\left(\frac{1}{1-X}\right)\right],\\
\operatorname{E}\left[\ln\left(\frac{1-X}{X}\right)\right] &= \psi(\beta) - \psi(\alpha) = -\operatorname{E}\left[\ln\left(\frac{X}{1-X}\right)\right].
\end{align}

Johnson considered the distribution of the logit-transformed variable ln(''X''/(1−''X'')), including its moment generating function and approximations for large values of the shape parameters. This transformation extends the finite support [0, 1] based on the original variable ''X'' to infinite support in both directions of the real line (−∞, +∞).

Higher order logarithmic moments can be derived by using the representation of a beta distribution as a proportion of two gamma distributions and differentiating through the integral. They can be expressed in terms of higher order poly-gamma functions as follows:

:\begin{align}
\operatorname{E}\left[\ln^2(X)\right] &= (\psi(\alpha) - \psi(\alpha+\beta))^2 + \psi_1(\alpha) - \psi_1(\alpha+\beta), \\
\operatorname{E}\left[\ln^2(1-X)\right] &= (\psi(\beta) - \psi(\alpha+\beta))^2 + \psi_1(\beta) - \psi_1(\alpha+\beta), \\
\operatorname{E}\left[\ln(X)\ln(1-X)\right] &= (\psi(\alpha) - \psi(\alpha+\beta))(\psi(\beta) - \psi(\alpha+\beta)) - \psi_1(\alpha+\beta).
\end{align}

Therefore the variance of the logarithmic variables and the covariance of ln(''X'') and ln(1−''X'') are:

:\begin{align}
\operatorname{cov}[\ln(X), \ln(1-X)] &= \operatorname{E}\left[\ln(X)\ln(1-X)\right] - \operatorname{E}[\ln(X)]\operatorname{E}[\ln(1-X)] = -\psi_1(\alpha+\beta) \\
& \\
\operatorname{var}[\ln X] &= \operatorname{E}[\ln^2(X)] - (\operatorname{E}[\ln(X)])^2 \\
&= \psi_1(\alpha) - \psi_1(\alpha+\beta) \\
&= \psi_1(\alpha) + \operatorname{cov}[\ln(X), \ln(1-X)] \\
& \\
\operatorname{var}[\ln(1-X)] &= \operatorname{E}[\ln^2(1-X)] - (\operatorname{E}[\ln(1-X)])^2 \\
&= \psi_1(\beta) - \psi_1(\alpha+\beta) \\
&= \psi_1(\beta) + \operatorname{cov}[\ln(X), \ln(1-X)]
\end{align}

where the trigamma function, denoted ''ψ''1(''α''), is the second of the polygamma functions, and is defined as the derivative of the digamma function:

:\psi_1(\alpha) = \frac{d^2\ln\Gamma(\alpha)}{d\alpha^2} = \frac{d\psi(\alpha)}{d\alpha}.

The variances and covariance of the logarithmically transformed variables ''X'' and (1−''X'') are different, in general, because the logarithmic transformation destroys the mirror-symmetry of the original variables ''X'' and (1−''X''), as the logarithm approaches negative infinity for the variable approaching zero.

These logarithmic variances and covariance are the elements of the Fisher information matrix for the beta distribution. They are also a measure of the curvature of the log likelihood function (see the section on maximum likelihood estimation).

The variances of the log inverse variables are identical to the variances of the log variables:

:\begin{align}
\operatorname{var}\left[\ln\left(\frac{1}{X}\right)\right] &= \operatorname{var}[\ln(X)] = \psi_1(\alpha) - \psi_1(\alpha+\beta), \\
\operatorname{var}\left[\ln\left(\frac{1}{1-X}\right)\right] &= \operatorname{var}[\ln(1-X)] = \psi_1(\beta) - \psi_1(\alpha+\beta), \\
\operatorname{cov}\left[\ln\left(\frac{1}{X}\right), \ln\left(\frac{1}{1-X}\right)\right] &= \operatorname{cov}[\ln(X), \ln(1-X)] = -\psi_1(\alpha+\beta).
\end{align}

It also follows that the variances of the logit transformed variables are:

:\operatorname{var}\left[\ln\left(\frac{X}{1-X}\right)\right] = \operatorname{var}\left[\ln\left(\frac{1-X}{X}\right)\right] = -\operatorname{cov}\left[\ln\left(\frac{X}{1-X}\right), \ln\left(\frac{1-X}{X}\right)\right] = \psi_1(\alpha) + \psi_1(\beta)
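A minimal sketch of the digamma/trigamma expressions above (assuming NumPy and SciPy; α = 2, β = 3 are arbitrary example values), checked against Monte Carlo estimates:

<syntaxhighlight lang="python">
# Sketch: E[ln X] = psi(alpha) - psi(alpha+beta), var[ln X] = psi_1(alpha) - psi_1(alpha+beta),
# and var[logit X] = psi_1(alpha) + psi_1(beta).
import numpy as np
from scipy import stats
from scipy.special import digamma, polygamma

alpha, beta = 2.0, 3.0
x = stats.beta(alpha, beta).rvs(size=1_000_000, random_state=1)

print(np.mean(np.log(x)), digamma(alpha) - digamma(alpha + beta))
print(np.var(np.log(x)), polygamma(1, alpha) - polygamma(1, alpha + beta))
print(np.var(np.log(x / (1 - x))), polygamma(1, alpha) + polygamma(1, beta))
</syntaxhighlight>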


Quantities of information (entropy)

Given a beta distributed random variable, ''X'' ~ Beta(''α'', ''β''), the differential entropy of ''X'' (measured in nats) is the expected value of the negative of the logarithm of the probability density function:

:\begin{align}
h(X) &= \operatorname{E}[-\ln(f(x;\alpha,\beta))] \\
&= \int_0^1 -f(x;\alpha,\beta)\ln(f(x;\alpha,\beta))\,dx \\
&= \ln(\Beta(\alpha,\beta)) - (\alpha-1)\psi(\alpha) - (\beta-1)\psi(\beta) + (\alpha+\beta-2)\psi(\alpha+\beta)
\end{align}

where ''f''(''x''; ''α'', ''β'') is the probability density function of the beta distribution:

:f(x;\alpha,\beta) = \frac{1}{\Beta(\alpha,\beta)} x^{\alpha-1}(1-x)^{\beta-1}

The digamma function ''ψ'' appears in the formula for the differential entropy as a consequence of Euler's integral formula for the harmonic numbers, which follows from the integral:

:\int_0^1 \frac{1-x^{\alpha-1}}{1-x}\,dx = \psi(\alpha) - \psi(1)

The differential entropy of the beta distribution is negative for all values of ''α'' and ''β'' greater than zero, except at ''α'' = ''β'' = 1 (for which values the beta distribution is the same as the uniform distribution), where the differential entropy reaches its maximum value of zero. It is to be expected that the maximum entropy should take place when the beta distribution becomes equal to the uniform distribution, since uncertainty is maximal when all possible events are equiprobable.

For ''α'' or ''β'' approaching zero, the differential entropy approaches its minimum value of negative infinity. For (either or both) ''α'' or ''β'' approaching zero, there is a maximum amount of order: all the probability density is concentrated at the ends, and there is zero probability density at points located between the ends. Similarly, for (either or both) ''α'' or ''β'' approaching infinity, the differential entropy approaches its minimum value of negative infinity, and a maximum amount of order. If either ''α'' or ''β'' approaches infinity (and the other is finite) all the probability density is concentrated at an end, and the probability density is zero everywhere else. If both shape parameters are equal (the symmetric case), ''α'' = ''β'', and they approach infinity simultaneously, the probability density becomes a spike (Dirac delta function) concentrated at the middle ''x'' = 1/2, and hence there is 100% probability at the middle ''x'' = 1/2 and zero probability everywhere else.

The (continuous case) differential entropy was introduced by Shannon in his original paper (where he named it the "entropy of a continuous distribution"), as the concluding part of the same paper where he defined the discrete entropy. It has been known since then that the differential entropy may differ from the infinitesimal limit of the discrete entropy by an infinite offset, therefore the differential entropy can be negative (as it is for the beta distribution). What really matters is the relative value of entropy.

Given two beta distributed random variables, ''X''1 ~ Beta(''α'', ''β'') and ''X''2 ~ Beta(''α''′, ''β''′), the cross-entropy is (measured in nats)

:\begin{align}
H(X_1,X_2) &= \int_0^1 -f(x;\alpha,\beta)\ln(f(x;\alpha',\beta'))\,dx \\
&= \ln\left(\Beta(\alpha',\beta')\right) - (\alpha'-1)\psi(\alpha) - (\beta'-1)\psi(\beta) + (\alpha'+\beta'-2)\psi(\alpha+\beta).
\end{align}

The cross-entropy has been used as an error metric to measure the distance between two hypotheses. Its absolute value is minimum when the two distributions are identical. It is the information measure most closely related to the log maximum likelihood (see the section on "Parameter estimation. Maximum likelihood estimation").

The relative entropy, or Kullback–Leibler divergence ''D''KL(''X''1 || ''X''2), is a measure of the inefficiency of assuming that the distribution is ''X''2 ~ Beta(''α''′, ''β''′) when the distribution is really ''X''1 ~ Beta(''α'', ''β''). It is defined as follows (measured in nats):

:\begin{align}
D_{\mathrm{KL}}(X_1 \| X_2) &= \int_0^1 f(x;\alpha,\beta)\ln\left(\frac{f(x;\alpha,\beta)}{f(x;\alpha',\beta')}\right)\,dx \\
&= \left(\int_0^1 f(x;\alpha,\beta)\ln(f(x;\alpha,\beta))\,dx\right) - \left(\int_0^1 f(x;\alpha,\beta)\ln(f(x;\alpha',\beta'))\,dx\right) \\
&= -h(X_1) + H(X_1,X_2) \\
&= \ln\left(\frac{\Beta(\alpha',\beta')}{\Beta(\alpha,\beta)}\right) + (\alpha-\alpha')\psi(\alpha) + (\beta-\beta')\psi(\beta) + (\alpha'-\alpha+\beta'-\beta)\psi(\alpha+\beta).
\end{align}

The relative entropy, or Kullback–Leibler divergence, is always non-negative. A few numerical examples follow:

*''X''1 ~ Beta(1, 1) and ''X''2 ~ Beta(3, 3); ''D''KL(''X''1 || ''X''2) = 0.598803; ''D''KL(''X''2 || ''X''1) = 0.267864; ''h''(''X''1) = 0; ''h''(''X''2) = −0.267864
*''X''1 ~ Beta(3, 0.5) and ''X''2 ~ Beta(0.5, 3); ''D''KL(''X''1 || ''X''2) = 7.21574; ''D''KL(''X''2 || ''X''1) = 7.21574; ''h''(''X''1) = −1.10805; ''h''(''X''2) = −1.10805.

The Kullback–Leibler divergence is not symmetric, ''D''KL(''X''1 || ''X''2) ≠ ''D''KL(''X''2 || ''X''1), for the case in which the individual beta distributions Beta(1, 1) and Beta(3, 3) are symmetric but have different entropies ''h''(''X''1) ≠ ''h''(''X''2). The value of the Kullback divergence depends on the direction traveled: whether going from a higher (differential) entropy to a lower (differential) entropy or the other way around. In the numerical example above, the Kullback divergence measures the inefficiency of assuming that the distribution is (bell-shaped) Beta(3, 3), rather than (uniform) Beta(1, 1). The "h" entropy of Beta(1, 1) is higher than the "h" entropy of Beta(3, 3) because the uniform distribution Beta(1, 1) has a maximum amount of disorder. The Kullback divergence is more than two times higher (0.598803 instead of 0.267864) when measured in the direction of decreasing entropy: the direction that assumes that the (uniform) Beta(1, 1) distribution is (bell-shaped) Beta(3, 3) rather than the other way around. In this restricted sense, the Kullback divergence is consistent with the second law of thermodynamics.

The Kullback–Leibler divergence is symmetric, ''D''KL(''X''1 || ''X''2) = ''D''KL(''X''2 || ''X''1), for the skewed cases Beta(3, 0.5) and Beta(0.5, 3) that have equal differential entropy ''h''(''X''1) = ''h''(''X''2).

The symmetry condition:

:D_{\mathrm{KL}}(X_1 \| X_2) = D_{\mathrm{KL}}(X_2 \| X_1), \text{ if } h(X_1) = h(X_2), \text{ for } \alpha \neq \beta

follows from the above definitions and the mirror-symmetry ''f''(''x''; ''α'', ''β'') = ''f''(1−''x''; ''β'', ''α'') enjoyed by the beta distribution.
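A minimal sketch of the closed-form differential entropy and Kullback–Leibler divergence (assuming NumPy and SciPy; the function names are illustrative), reproducing the first numerical example above:

<syntaxhighlight lang="python">
# Sketch: differential entropy and KL divergence of beta distributions in nats.
from scipy import stats
from scipy.special import betaln, digamma

def beta_entropy(a, b):
    return betaln(a, b) - (a - 1) * digamma(a) - (b - 1) * digamma(b) \
           + (a + b - 2) * digamma(a + b)

def beta_kl(a1, b1, a2, b2):
    """D_KL( Beta(a1, b1) || Beta(a2, b2) )."""
    return betaln(a2, b2) - betaln(a1, b1) \
           + (a1 - a2) * digamma(a1) + (b1 - b2) * digamma(b1) \
           + (a2 - a1 + b2 - b1) * digamma(a1 + b1)

print(beta_entropy(3, 3), stats.beta(3, 3).entropy())   # both ~ -0.267864
print(beta_kl(1, 1, 3, 3), beta_kl(3, 3, 1, 1))         # ~0.598803 and ~0.267864
</syntaxhighlight>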


Relationships between statistical measures


Mean, mode and median relationship

If 1 < α < β then mode ≤ median ≤ mean (Kerman J (2011), "A closed-form approximation for the median of the beta distribution"). Expressing the mode (only for α, β > 1) and the mean in terms of α and β:

:\frac{\alpha-1}{\alpha+\beta-2} \le \text{median} \le \frac{\alpha}{\alpha+\beta},

If 1 < β < α then the order of the inequalities is reversed. For α, β > 1 the absolute distance between the mean and the median is less than 5% of the distance between the maximum and minimum values of ''x''. On the other hand, the absolute distance between the mean and the mode can reach 50% of the distance between the maximum and minimum values of ''x'', for the (pathological) case of α = 1 and β = 1, for which values the beta distribution approaches the uniform distribution and the differential entropy approaches its maximum value, and hence maximum "disorder".

For example, for α = 1.0001 and β = 1.00000001:

* mode = 0.9999;   PDF(mode) = 1.00010
* mean = 0.500025; PDF(mean) = 1.00003
* median = 0.500035; PDF(median) = 1.00003
* mean − mode = −0.499875
* mean − median = −9.65538 × 10−6

where PDF stands for the value of the probability density function.
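A minimal sketch of the ordering mode ≤ median ≤ mean for 1 < α < β (assuming NumPy and SciPy; α = 2, β = 5 are arbitrary example values):

<syntaxhighlight lang="python">
# Sketch: verify mode <= median <= mean for a case with 1 < alpha < beta.
from scipy import stats

alpha, beta = 2.0, 5.0
mode = (alpha - 1) / (alpha + beta - 2)
median = stats.beta(alpha, beta).ppf(0.5)    # inverse CDF at 1/2
mean = alpha / (alpha + beta)

print(mode, median, mean)                    # 0.2 <= ~0.264 <= ~0.286
assert mode <= median <= mean
</syntaxhighlight>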


Mean, geometric mean and harmonic mean relationship

It is known from the inequality of arithmetic and geometric means that the geometric mean is lower than the mean. Similarly, the harmonic mean is lower than the geometric mean. The accompanying plot shows that for α = β, both the mean and the median are exactly equal to 1/2, regardless of the value of α = β, and the mode is also equal to 1/2 for α = β > 1; however, the geometric and harmonic means are lower than 1/2, and they only approach this value asymptotically as α = β → ∞.


Kurtosis bounded by the square of the skewness

As remarked by Feller, in the Pearson system the beta probability density appears as type I (any difference between the beta distribution and Pearson's type I distribution is only superficial and it makes no difference for the following discussion regarding the relationship between kurtosis and skewness). Karl Pearson showed, in Plate 1 of his paper published in 1916, a graph with the kurtosis as the vertical axis (ordinate) and the square of the skewness as the horizontal axis (abscissa), in which a number of distributions were displayed. The region occupied by the beta distribution is bounded by the following two lines in the (skewness2, kurtosis) plane, or the (skewness2, excess kurtosis) plane:

:(\text{skewness})^2 + 1 < \text{kurtosis} < \frac{3}{2}(\text{skewness})^2 + 3

or, equivalently,

:(\text{skewness})^2 - 2 < \text{excess kurtosis} < \frac{3}{2}(\text{skewness})^2

At a time when there were no powerful digital computers, Karl Pearson accurately computed further boundaries, for example, separating the "U-shaped" from the "J-shaped" distributions. The lower boundary line (excess kurtosis + 2 − skewness2 = 0) is produced by skewed "U-shaped" beta distributions with both values of shape parameters α and β close to zero. The upper boundary line (excess kurtosis − (3/2) skewness2 = 0) is produced by extremely skewed distributions with very large values of one of the parameters and very small values of the other parameter. Karl Pearson showed that this upper boundary line (excess kurtosis − (3/2) skewness2 = 0) is also the intersection with Pearson's distribution III, which has unlimited support in one direction (towards positive infinity), and can be bell-shaped or J-shaped. His son, Egon Pearson, showed that the region (in the kurtosis/squared-skewness plane) occupied by the beta distribution (equivalently, Pearson's distribution I) as it approaches this boundary (excess kurtosis − (3/2) skewness2 = 0) is shared with the noncentral chi-squared distribution. Karl Pearson (Pearson 1895, pp. 357, 360, 373–376) also showed that the gamma distribution is a Pearson type III distribution. Hence this boundary line for Pearson's type III distribution is known as the gamma line. (This can be shown from the fact that the excess kurtosis of the gamma distribution is 6/''k'' and the square of the skewness is 4/''k'', hence (excess kurtosis − (3/2) skewness2 = 0) is identically satisfied by the gamma distribution regardless of the value of the parameter "k".) Pearson later noted that the chi-squared distribution is a special case of Pearson's type III and also shares this boundary line (as is apparent from the fact that for the chi-squared distribution the excess kurtosis is 12/''k'' and the square of the skewness is 8/''k'', hence (excess kurtosis − (3/2) skewness2 = 0) is identically satisfied regardless of the value of the parameter "k"). This is to be expected, since the chi-squared distribution ''X'' ~ χ2(''k'') is a special case of the gamma distribution, with parametrization X ~ Γ(k/2, 1/2), where k is a positive integer that specifies the "number of degrees of freedom" of the chi-squared distribution.

An example of a beta distribution near the upper boundary (excess kurtosis − (3/2) skewness2 = 0) is given by α = 0.1, β = 1000, for which the ratio (excess kurtosis)/(skewness2) = 1.49835 approaches the upper limit of 1.5 from below. An example of a beta distribution near the lower boundary (excess kurtosis + 2 − skewness2 = 0) is given by α = 0.0001, β = 0.1, for which values the expression (excess kurtosis + 2)/(skewness2) = 1.01621 approaches the lower limit of 1 from above. In the infinitesimal limit for both α and β approaching zero symmetrically, the excess kurtosis reaches its minimum value of −2. This minimum value occurs at the point at which the lower boundary line intersects the vertical axis (ordinate). (However, in Pearson's original chart, the ordinate is kurtosis, instead of excess kurtosis, and it increases downwards rather than upwards.)

Values for the skewness and excess kurtosis below the lower boundary (excess kurtosis + 2 − skewness2 = 0) cannot occur for any distribution, and hence Karl Pearson appropriately called the region below this boundary the "impossible region". The boundary for this "impossible region" is determined by (symmetric or skewed) bimodal "U"-shaped distributions for which the parameters α and β approach zero and hence all the probability density is concentrated at the ends: ''x'' = 0, 1 with practically nothing in between them. Since for α ≈ β ≈ 0 the probability density is concentrated at the two ends ''x'' = 0 and ''x'' = 1, this "impossible boundary" is determined by a Bernoulli distribution, where the two only possible outcomes occur with respective probabilities ''p'' and ''q'' = 1 − ''p''. For cases approaching this limit boundary with symmetry α = β, skewness ≈ 0, excess kurtosis ≈ −2 (this is the lowest excess kurtosis possible for any distribution), and the probabilities are ''p'' ≈ ''q'' ≈ 1/2. For cases approaching this limit boundary with skewness, excess kurtosis ≈ −2 + skewness2, and the probability density is concentrated more at one end than the other end (with practically nothing in between), with probabilities p = \tfrac{\beta}{\alpha+\beta} at the left end ''x'' = 0 and q = 1-p = \tfrac{\alpha}{\alpha+\beta} at the right end ''x'' = 1.
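A minimal sketch of the boundary inequalities above (assuming NumPy and SciPy; the parameter pairs include the two near-boundary examples quoted in this section):

<syntaxhighlight lang="python">
# Sketch: check skewness^2 - 2 < excess kurtosis < (3/2) skewness^2 for beta distributions.
from scipy import stats

for alpha, beta in [(0.1, 1000), (0.0001, 0.1), (2, 5), (0.5, 0.5)]:
    skew, ex_kurt = stats.beta(alpha, beta).stats(moments='sk')
    s2, k = float(skew) ** 2, float(ex_kurt)
    assert s2 - 2 < k < 1.5 * s2, (alpha, beta)
    print(alpha, beta, s2, k)
</syntaxhighlight>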


Symmetry

All statements are conditional on α, β > 0:

* Probability density function reflection symmetry
::f(x;\alpha,\beta) = f(1-x;\beta,\alpha)
* Cumulative distribution function reflection symmetry plus unitary translation
::F(x;\alpha,\beta) = I_x(\alpha,\beta) = 1 - F(1-x;\beta,\alpha) = 1 - I_{1-x}(\beta,\alpha)
* Mode reflection symmetry plus unitary translation
::\operatorname{mode}(\Beta(\alpha,\beta)) = 1 - \operatorname{mode}(\Beta(\beta,\alpha)), \text{ if } \Beta(\beta,\alpha) \ne \Beta(1,1)
* Median reflection symmetry plus unitary translation
::\operatorname{median}(\Beta(\alpha,\beta)) = 1 - \operatorname{median}(\Beta(\beta,\alpha))
* Mean reflection symmetry plus unitary translation
::\mu(\Beta(\alpha,\beta)) = 1 - \mu(\Beta(\beta,\alpha))
* Geometric means: each is individually asymmetric; the following symmetry applies between the geometric mean based on ''X'' and the geometric mean based on its reflection (1−''X'')
::G_X(\Beta(\alpha,\beta)) = G_{(1-X)}(\Beta(\beta,\alpha))
* Harmonic means: each is individually asymmetric; the following symmetry applies between the harmonic mean based on ''X'' and the harmonic mean based on its reflection (1−''X'')
::H_X(\Beta(\alpha,\beta)) = H_{(1-X)}(\Beta(\beta,\alpha)) \text{ if } \alpha, \beta > 1.
* Variance symmetry
::\operatorname{var}(\Beta(\alpha,\beta)) = \operatorname{var}(\Beta(\beta,\alpha))
* Geometric variances: each is individually asymmetric; the following symmetry applies between the log geometric variance based on ''X'' and the log geometric variance based on its reflection (1−''X'')
::\ln(\operatorname{var}_{GX}(\Beta(\alpha,\beta))) = \ln(\operatorname{var}_{G(1-X)}(\Beta(\beta,\alpha)))
* Geometric covariance symmetry
::\ln\operatorname{cov}_{GX,G(1-X)}(\Beta(\alpha,\beta)) = \ln\operatorname{cov}_{GX,G(1-X)}(\Beta(\beta,\alpha))
* Mean absolute deviation around the mean symmetry
::\operatorname{E}[|X - E[X]|](\Beta(\alpha,\beta)) = \operatorname{E}[|X - E[X]|](\Beta(\beta,\alpha))
* Skewness skew-symmetry
::\operatorname{skewness}(\Beta(\alpha,\beta)) = -\operatorname{skewness}(\Beta(\beta,\alpha))
* Excess kurtosis symmetry
::\text{excess kurtosis}(\Beta(\alpha,\beta)) = \text{excess kurtosis}(\Beta(\beta,\alpha))
* Characteristic function symmetry of the real part (with respect to the origin of variable "t")
::\text{Re}\left[{}_1F_1(\alpha; \alpha+\beta; it)\right] = \text{Re}\left[{}_1F_1(\alpha; \alpha+\beta; -it)\right]
* Characteristic function skew-symmetry of the imaginary part (with respect to the origin of variable "t")
::\text{Im}\left[{}_1F_1(\alpha; \alpha+\beta; it)\right] = -\text{Im}\left[{}_1F_1(\alpha; \alpha+\beta; -it)\right]
* Characteristic function symmetry of the absolute value (with respect to the origin of variable "t")
::\text{Abs}\left[{}_1F_1(\alpha; \alpha+\beta; it)\right] = \text{Abs}\left[{}_1F_1(\alpha; \alpha+\beta; -it)\right]
* Differential entropy symmetry
::h(\Beta(\alpha,\beta)) = h(\Beta(\beta,\alpha))
* Relative entropy (also called Kullback–Leibler divergence) symmetry
::D_{\mathrm{KL}}(X_1 \| X_2) = D_{\mathrm{KL}}(X_2 \| X_1), \text{ if } h(X_1) = h(X_2), \text{ for } \alpha \neq \beta
* Fisher information matrix symmetry
::\mathcal{I}_{i,j}(\Beta(\alpha,\beta)) = \mathcal{I}_{j,i}(\Beta(\beta,\alpha))


Geometry of the probability density function


Inflection points

For certain values of the shape parameters α and β, the probability density function has inflection points, at which the curvature changes sign. The position of these inflection points can be useful as a measure of the dispersion or spread of the distribution.

Defining the following quantity:

:\kappa = \frac{\sqrt{\frac{(\alpha-1)(\beta-1)}{\alpha+\beta-3}}}{\alpha+\beta-2}

points of inflection occur, depending on the value of the shape parameters α and β, as follows:

*(α > 2, β > 2) The distribution is bell-shaped (symmetric for α = β and skewed otherwise), with two inflection points, equidistant from the mode:
::x = \text{mode} \pm \kappa = \frac{\alpha-1 \pm \sqrt{\frac{(\alpha-1)(\beta-1)}{\alpha+\beta-3}}}{\alpha+\beta-2}
*(α = 2, β > 2) The distribution is unimodal, positively skewed, right-tailed, with one inflection point, located to the right of the mode:
::x = \text{mode} + \kappa = \frac{\alpha-1 + \sqrt{\frac{(\alpha-1)(\beta-1)}{\alpha+\beta-3}}}{\alpha+\beta-2}
*(α > 2, β = 2) The distribution is unimodal, negatively skewed, left-tailed, with one inflection point, located to the left of the mode:
::x = \text{mode} - \kappa = \frac{\alpha-1 - \sqrt{\frac{(\alpha-1)(\beta-1)}{\alpha+\beta-3}}}{\alpha+\beta-2}
*(1 < α < 2, β > 2, α+β > 2) The distribution is unimodal, positively skewed, right-tailed, with one inflection point, located to the right of the mode:
::x = \text{mode} + \kappa = \frac{\alpha-1 + \sqrt{\frac{(\alpha-1)(\beta-1)}{\alpha+\beta-3}}}{\alpha+\beta-2}
*(0 < α < 1, 1 < β < 2) The distribution has a mode at the left end ''x'' = 0 and it is positively skewed, right-tailed. There is one inflection point, located to the right of the mode:
::x = \frac{\alpha-1 + \sqrt{\frac{(\alpha-1)(\beta-1)}{\alpha+\beta-3}}}{\alpha+\beta-2}
*(α > 2, 1 < β < 2) The distribution is unimodal, negatively skewed, left-tailed, with one inflection point, located to the left of the mode:
::x = \text{mode} - \kappa = \frac{\alpha-1 - \sqrt{\frac{(\alpha-1)(\beta-1)}{\alpha+\beta-3}}}{\alpha+\beta-2}
*(1 < α < 2, 0 < β < 1) The distribution has a mode at the right end ''x'' = 1 and it is negatively skewed, left-tailed. There is one inflection point, located to the left of the mode:
::x = \frac{\alpha-1 - \sqrt{\frac{(\alpha-1)(\beta-1)}{\alpha+\beta-3}}}{\alpha+\beta-2}

There are no inflection points in the remaining (symmetric and skewed) regions: U-shaped (α, β < 1), upside-down-U-shaped (1 < α < 2, 1 < β < 2), reverse-J-shaped (α < 1, β > 2) or J-shaped (α > 2, β < 1).

The accompanying plots show the inflection point locations (shown vertically, ranging from 0 to 1) versus α and β (the horizontal axes ranging from 0 to 5). There are large cuts at surfaces intersecting the lines α = 1, β = 1, α = 2, and β = 2 because at these values the beta distribution changes from 2 modes, to 1 mode, to no mode.
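A minimal numerical sketch of the bell-shaped case (assuming NumPy and SciPy; α = 4, β = 6 are arbitrary example values): the second derivative of the density, estimated by a central difference, changes sign around mode ± κ:

<syntaxhighlight lang="python">
# Sketch: inflection points of Beta(4, 6) at mode +/- kappa.
import numpy as np
from scipy import stats

alpha, beta = 4.0, 6.0
mode = (alpha - 1) / (alpha + beta - 2)
kappa = np.sqrt((alpha - 1) * (beta - 1) / (alpha + beta - 3)) / (alpha + beta - 2)
x_left, x_right = mode - kappa, mode + kappa

pdf = stats.beta(alpha, beta).pdf
d2 = lambda x, h=1e-5: (pdf(x - h) - 2 * pdf(x) + pdf(x + h)) / h**2  # 2nd derivative

print(x_left, x_right)                          # ~0.192 and ~0.558
print(d2(x_left - 0.01), d2(x_left + 0.01))     # opposite signs around x_left
print(d2(x_right - 0.01), d2(x_right + 0.01))   # opposite signs around x_right
</syntaxhighlight>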


Shapes

The beta density function can take a wide variety of different shapes depending on the values of the two parameters ''α'' and ''β''. The ability of the beta distribution to take this great diversity of shapes (using only two parameters) is partly responsible for finding wide application for modeling actual measurements:


Symmetric (''α'' = ''β'')

* the density function is symmetric about 1/2 (blue & teal plots).
* median = mean = 1/2.
* skewness = 0.
* variance = 1/(4(2α + 1))
* α = β < 1
** U-shaped (blue plot).
** bimodal: left mode = 0, right mode = 1, anti-mode = 1/2
** 1/12 < var(''X'') < 1/4
** −2 < excess kurtosis(''X'') < −6/5
** α = β = 1/2 is the arcsine distribution
*** var(''X'') = 1/8
*** excess kurtosis(''X'') = −3/2
*** CF = Rinc(t)
** α = β → 0 is a 2-point Bernoulli distribution with equal probability 1/2 at each Dirac delta function end ''x'' = 0 and ''x'' = 1, and zero probability everywhere else. A coin toss: one face of the coin being ''x'' = 0 and the other face being ''x'' = 1.
*** \lim_{\alpha = \beta \to 0} \operatorname{var}(X) = \tfrac{1}{4}
*** \lim_{\alpha = \beta \to 0} \operatorname{excess kurtosis}(X) = -2 (a lower value than this is impossible for any distribution to reach)
*** The differential entropy approaches a minimum value of −∞
* α = β = 1
** the uniform [0, 1] distribution
** no mode
** var(''X'') = 1/12
** excess kurtosis(''X'') = −6/5
** The (negative anywhere else) differential entropy reaches its maximum value of zero
** CF = Sinc(t)
* ''α'' = ''β'' > 1
** symmetric unimodal
** mode = 1/2.
** 0 < var(''X'') < 1/12
** −6/5 < excess kurtosis(''X'') < 0
** ''α'' = ''β'' = 3/2 is a semi-elliptic [0, 1] distribution, see: Wigner semicircle distribution
*** var(''X'') = 1/16.
*** excess kurtosis(''X'') = −1
*** CF = 2 Jinc(t)
** ''α'' = ''β'' = 2 is the parabolic [0, 1] distribution
*** var(''X'') = 1/20
*** excess kurtosis(''X'') = −6/7
*** CF = 3 Tinc(t)
** ''α'' = ''β'' > 2 is bell-shaped, with inflection points located to either side of the mode
*** 0 < var(''X'') < 1/20
*** −6/7 < excess kurtosis(''X'') < 0
** ''α'' = ''β'' → ∞ is a 1-point degenerate distribution with a Dirac delta function spike at the midpoint ''x'' = 1/2 with probability 1, and zero probability everywhere else. There is 100% probability (absolute certainty) concentrated at the single point ''x'' = 1/2.
*** \lim_{\alpha = \beta \to \infty} \operatorname{var}(X) = 0
*** \lim_{\alpha = \beta \to \infty} \operatorname{excess kurtosis}(X) = 0
*** The differential entropy approaches a minimum value of −∞


Skewed (''α'' ≠ ''β'')

The density function is skewed. An interchange of parameter values yields the mirror image (the reverse) of the initial curve. Some more specific cases:

*''α'' < 1, ''β'' < 1
** U-shaped
** Positive skew for α < β, negative skew for α > β.
** bimodal: left mode = 0, right mode = 1, anti-mode = \tfrac{\alpha-1}{\alpha+\beta-2}
** 0 < median < 1.
** 0 < var(''X'') < 1/4
*α > 1, β > 1
** unimodal (magenta & cyan plots),
** Positive skew for α < β, negative skew for α > β.
** \text{mode} = \tfrac{\alpha-1}{\alpha+\beta-2}
** 0 < median < 1
** 0 < var(''X'') < 1/12
*α < 1, β ≥ 1
** reverse J-shaped with a right tail,
** positively skewed,
** strictly decreasing, convex
** mode = 0
** 0 < median < 1/2.
** 0 < \operatorname{var}(X) < \tfrac{5\sqrt{5}-11}{2} \approx 0.09 (maximum variance occurs for \alpha=\tfrac{\sqrt{5}-1}{2}, \beta=1, or α = Φ the golden ratio conjugate)
*α ≥ 1, β < 1
** J-shaped with a left tail,
** negatively skewed,
** strictly increasing, convex
** mode = 1
** 1/2 < median < 1
** 0 < \operatorname{var}(X) < \tfrac{5\sqrt{5}-11}{2} \approx 0.09 (maximum variance occurs for \alpha=1, \beta=\tfrac{\sqrt{5}-1}{2}, or β = Φ the golden ratio conjugate)
*α = 1, β > 1
** positively skewed,
** strictly decreasing (red plot),
** a reversed (mirror-image) power function [0, 1] distribution
** mean = 1 / (β + 1)
** median = 1 − (1/2)^{1/β}
** mode = 0
** α = 1, 1 < β < 2
*** concave
*** 1-\tfrac{1}{\sqrt{2}} < \text{median} < \tfrac{1}{2}
*** 1/18 < var(''X'') < 1/12.
** α = 1, β = 2
*** a straight line with slope −2, the right-triangular distribution with right angle at the left end, at ''x'' = 0
*** \text{median} = 1-\tfrac{1}{\sqrt{2}}
*** var(''X'') = 1/18
** α = 1, β > 2
*** reverse J-shaped with a right tail,
*** convex
*** 0 < \text{median} < 1-\tfrac{1}{\sqrt{2}}
*** 0 < var(''X'') < 1/18
*α > 1, β = 1
** negatively skewed,
** strictly increasing (green plot),
** the power function [0, 1] distribution
** mean = α / (α + 1)
** median = (1/2)^{1/α}
** mode = 1
** 2 > α > 1, β = 1
*** concave
*** \tfrac{1}{2} < \text{median} < \tfrac{1}{\sqrt{2}}
*** 1/18 < var(''X'') < 1/12
** α = 2, β = 1
*** a straight line with slope +2, the right-triangular distribution with right angle at the right end, at ''x'' = 1
*** \text{median} = \tfrac{1}{\sqrt{2}}
*** var(''X'') = 1/18
** α > 2, β = 1
*** J-shaped with a left tail, convex
*** \tfrac{1}{\sqrt{2}} < \text{median} < 1
*** 0 < var(''X'') < 1/18


Related distributions


Transformations

* If ''X'' ~ Beta(''α'', ''β'') then 1 − ''X'' ~ Beta(''β'', ''α'') (mirror-image symmetry)
* If ''X'' ~ Beta(''α'', ''β'') then \tfrac{X}{1-X} \sim \beta'(\alpha,\beta), the beta prime distribution, also called "beta distribution of the second kind".
* If ''X'' ~ Beta(''α'', ''β'') then \tfrac{1}{X} - 1 \sim \beta'(\beta,\alpha).
* If ''X'' ~ Beta(''n''/2, ''m''/2) then \tfrac{mX}{n(1-X)} \sim F(n,m) (assuming ''n'' > 0 and ''m'' > 0), the Fisher–Snedecor F distribution.
* If X \sim \operatorname{Beta}\left(1+\lambda\tfrac{m-\min}{\max-\min}, 1 + \lambda\tfrac{\max-m}{\max-\min}\right) then min + ''X''(max − min) ~ PERT(min, max, ''m'', ''λ''), where ''PERT'' denotes a PERT distribution used in PERT analysis, and ''m'' = most likely value. (Herrerías-Velasco, José Manuel and Herrerías-Pleguezuelo, Rafael and René van Dorp, Johan (2011). Revisiting the PERT mean and variance. European Journal of Operational Research (210), pp. 448–451.) Traditionally ''λ'' = 4 in PERT analysis.
* If ''X'' ~ Beta(1, ''β'') then ''X'' ~ Kumaraswamy distribution with parameters (1, ''β'')
* If ''X'' ~ Beta(''α'', 1) then ''X'' ~ Kumaraswamy distribution with parameters (''α'', 1)
* If ''X'' ~ Beta(''α'', 1) then −ln(''X'') ~ Exponential(''α'')


Special and limiting cases

* Beta(1, 1) ~ U(0, 1), the uniform distribution.
* Beta(n, 1) ~ Maximum of ''n'' independent rvs. with U(0, 1), sometimes called a ''standard power function distribution'' with density ''n'' ''x''''n''−1 on that interval.
* Beta(1, n) ~ Minimum of ''n'' independent rvs. with U(0, 1)
* If ''X'' ~ Beta(3/2, 3/2) and ''r'' > 0 then 2''rX'' − ''r'' ~ Wigner semicircle distribution.
* Beta(1/2, 1/2) is equivalent to the arcsine distribution. This distribution is also the Jeffreys prior probability for the Bernoulli and binomial distributions. The arcsine probability density is a distribution that appears in several random-walk fundamental theorems. In a fair coin toss random walk, the probability for the time of the last visit to the origin is distributed as a (U-shaped) arcsine distribution. In a two-player fair-coin-toss game, a player is said to be in the lead if the random walk (that started at the origin) is above the origin. The most probable number of times that a given player will be in the lead, in a game of length 2''N'', is not ''N''. On the contrary, ''N'' is the least likely number of times that the player will be in the lead. The most likely number of times in the lead is 0 or 2''N'' (following the arcsine distribution).
* \lim_{n \to \infty} n \operatorname{Beta}(1,n) = \operatorname{Exponential}(1), the exponential distribution.
* \lim_{n \to \infty} n \operatorname{Beta}(k,n) = \operatorname{Gamma}(k,1), the gamma distribution.
* For large n, \operatorname{Beta}(\alpha n,\beta n) \to \mathcal{N}\left(\frac{\alpha}{\alpha+\beta},\frac{\alpha\beta}{(\alpha+\beta)^3}\frac{1}{n}\right), the normal distribution. More precisely, if X_n \sim \operatorname{Beta}(\alpha n,\beta n) then \sqrt{n}\left(X_n -\tfrac{\alpha}{\alpha+\beta}\right) converges in distribution to a normal distribution with mean 0 and variance \tfrac{\alpha\beta}{(\alpha+\beta)^3} as ''n'' increases.


Derived from other distributions

* The ''k''th order statistic of a sample of size ''n'' from the uniform distribution is a beta random variable, ''U''(''k'') ~ Beta(''k'', ''n''+1−''k'').
* If ''X'' ~ Gamma(α, θ) and ''Y'' ~ Gamma(β, θ) are independent, then \tfrac{X}{X+Y} \sim \operatorname{Beta}(\alpha, \beta)\,.
* If X \sim \chi^2(\alpha)\, and Y \sim \chi^2(\beta)\, are independent, then \tfrac{X}{X+Y} \sim \operatorname{Beta}(\tfrac{\alpha}{2}, \tfrac{\beta}{2}).
* If ''X'' ~ U(0, 1) and ''α'' > 0 then ''X''1/''α'' ~ Beta(''α'', 1), the power function distribution.
* If X \sim \operatorname{Binom}(k; n; p), then \tfrac{X}{n} \sim \operatorname{Beta}(\alpha, \beta) for discrete values of ''n'' and ''k'', where \alpha = k+1 and \beta = n-k+1.
* If ''X'' ~ Cauchy(0, 1) then \tfrac{1}{1+X^2} \sim \operatorname{Beta}\left(\tfrac12, \tfrac12\right)\,
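A minimal simulation sketch of the gamma-ratio construction in the second bullet above (assuming NumPy and SciPy; the parameters α = 2.5, β = 4, θ = 1.7 are arbitrary example values):

<syntaxhighlight lang="python">
# Sketch: X/(X+Y) ~ Beta(alpha, beta) for independent X ~ Gamma(alpha, theta), Y ~ Gamma(beta, theta).
import numpy as np
from scipy import stats

alpha, beta, theta = 2.5, 4.0, 1.7
rng = np.random.default_rng(3)
gx = rng.gamma(shape=alpha, scale=theta, size=100_000)
gy = rng.gamma(shape=beta, scale=theta, size=100_000)

print(stats.kstest(gx / (gx + gy), stats.beta(alpha, beta).cdf))  # large p-value
</syntaxhighlight>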


Combination with other distributions

* If ''X'' ~ Beta(''α'', ''β'') and ''Y'' ~ F(2''β'', 2''α'') then \Pr\left(X \leq \tfrac{\alpha}{\alpha+\beta x}\right) = \Pr(Y \geq x)\, for all ''x'' > 0.


Compounding with other distributions

* If ''p'' ~ Beta(α, β) and ''X'' ~ Bin(''k'', ''p'') then ''X'' ~ beta-binomial distribution
* If ''p'' ~ Beta(α, β) and ''X'' ~ NB(''r'', ''p'') then ''X'' ~ beta negative binomial distribution


Generalisations

* The generalization to multiple variables, i.e. a multivariate beta distribution, is called a Dirichlet distribution. Univariate marginals of the Dirichlet distribution have a beta distribution. The beta distribution is conjugate to the binomial and Bernoulli distributions in exactly the same way as the Dirichlet distribution is conjugate to the multinomial distribution and categorical distribution.
* The Pearson type I distribution is identical to the beta distribution (except for arbitrary shifting and re-scaling that can also be accomplished with the four-parameter parametrization of the beta distribution).
* The beta distribution is the special case of the noncentral beta distribution where \lambda = 0: \operatorname{Beta}(\alpha, \beta) = \operatorname{Beta}(\alpha,\beta,0).
* The generalized beta distribution is a five-parameter distribution family which has the beta distribution as a special case.
* The matrix variate beta distribution is a distribution for positive-definite matrices.


Statistical inference


Parameter estimation


Method of moments


Two unknown parameters

Two unknown parameters ((\hat{\alpha}, \hat{\beta}) of a beta distribution supported in the [0, 1] interval) can be estimated, using the method of moments, with the first two moments (sample mean and sample variance) as follows. Let:

:\text{sample mean} = \bar{x} = \frac{1}{N}\sum_{i=1}^N X_i

be the sample mean estimate and

:\text{sample variance} = \bar{v} = \frac{1}{N}\sum_{i=1}^N (X_i - \bar{x})^2

be the sample variance estimate. The method-of-moments estimates of the parameters are

:\hat{\alpha} = \bar{x}\left(\frac{\bar{x}(1-\bar{x})}{\bar{v}} - 1\right), \text{ if } \bar{v} < \bar{x}(1-\bar{x}),

:\hat{\beta} = (1-\bar{x})\left(\frac{\bar{x}(1-\bar{x})}{\bar{v}} - 1\right), \text{ if } \bar{v} < \bar{x}(1-\bar{x}).

When the distribution is required over a known interval other than [0, 1] with random variable ''X'', say [''a'', ''c''] with random variable ''Y'', then replace \bar{x} with \frac{\bar{y}-a}{c-a}, and \bar{v} with \frac{\bar{v}_Y}{(c-a)^2} in the above couple of equations for the shape parameters (see the "Alternative parametrizations, four parameters" section below), where:

:\text{sample mean} = \bar{y} = \frac{1}{N}\sum_{i=1}^N Y_i

:\text{sample variance} = \bar{v}_Y = \frac{1}{N}\sum_{i=1}^N (Y_i - \bar{y})^2
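A minimal sketch of the two-parameter method-of-moments estimator on [0, 1] (assuming NumPy and SciPy; the data are simulated from Beta(2, 5) purely as an example, and np.var is the 1/N sample variance used above):

<syntaxhighlight lang="python">
# Sketch: method-of-moments estimates of (alpha, beta) from sample mean and variance.
import numpy as np
from scipy import stats

def beta_method_of_moments(x):
    m, v = np.mean(x), np.var(x)
    if not v < m * (1 - m):
        raise ValueError("moment condition v < m(1-m) violated")
    common = m * (1 - m) / v - 1
    return m * common, (1 - m) * common     # (alpha_hat, beta_hat)

x = stats.beta(2.0, 5.0).rvs(size=50_000, random_state=4)
print(beta_method_of_moments(x))            # close to (2.0, 5.0)
</syntaxhighlight>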


Four unknown parameters

All four parameters (\hat{\alpha}, \hat{\beta}, \hat{a}, \hat{c}, of a beta distribution supported in the [''a'', ''c''] interval, see the section "Alternative parametrizations, Four parameters") can be estimated, using the method of moments developed by Karl Pearson, by equating sample and population values of the first four central moments (mean, variance, skewness and excess kurtosis). The excess kurtosis was expressed in terms of the square of the skewness and the sample size ν = α + β (see the previous section "Kurtosis") as follows:

:\text{excess kurtosis} = \frac{6}{3+\nu}\left(\frac{2+\nu}{4}(\text{skewness})^2 - 1\right) \text{ if } (\text{skewness})^2 - 2 < \text{excess kurtosis} < \tfrac{3}{2}(\text{skewness})^2

One can use this equation to solve for the sample size ν = α + β in terms of the square of the skewness and the excess kurtosis as follows:

:\hat{\nu} = \hat{\alpha} + \hat{\beta} = 3\,\frac{(\text{sample excess kurtosis}) - (\text{sample skewness})^2 + 2}{\frac{3}{2}(\text{sample skewness})^2 - (\text{sample excess kurtosis})}

:\text{if } (\text{sample skewness})^2 - 2 < \text{sample excess kurtosis} < \tfrac{3}{2}(\text{sample skewness})^2

This is the ratio (multiplied by a factor of 3) between the previously derived limit boundaries for the beta distribution in a space (as originally done by Karl Pearson) defined with coordinates of the square of the skewness in one axis and the excess kurtosis in the other axis (see the section "Kurtosis bounded by the square of the skewness").

The case of zero skewness can be immediately solved because for zero skewness α = β and hence ν = 2α = 2β, therefore α = β = ν/2:

:\hat{\alpha} = \hat{\beta} = \frac{\hat{\nu}}{2} = \frac{\frac{3}{2}(\text{sample excess kurtosis}) + 3}{-(\text{sample excess kurtosis})}

:\text{if sample skewness} = 0 \text{ and } -2 < \text{sample excess kurtosis} < 0

(Excess kurtosis is negative for the beta distribution with zero skewness, ranging from −2 to 0, so that \hat{\nu} — and therefore the sample shape parameters — is positive, ranging from zero when the shape parameters approach zero and the excess kurtosis approaches −2, to infinity when the shape parameters approach infinity and the excess kurtosis approaches zero.)

For non-zero sample skewness one needs to solve a system of two coupled equations. Since the skewness and the excess kurtosis are independent of the parameters \hat{a}, \hat{c}, the parameters \hat{\alpha}, \hat{\beta} can be uniquely determined from the sample skewness and the sample excess kurtosis, by solving the coupled equations with two known variables (sample skewness and sample excess kurtosis) and two unknowns (the shape parameters):

:(\text{skewness})^2 = \frac{4(\hat{\beta}-\hat{\alpha})^2(1+\hat{\nu})}{(2+\hat{\nu})^2\hat{\alpha}\hat{\beta}}

:\text{excess kurtosis} = \frac{6}{3+\hat{\nu}}\left(\frac{2+\hat{\nu}}{4}(\text{skewness})^2 - 1\right)

:\text{if } (\text{sample skewness})^2 - 2 < \text{sample excess kurtosis} < \tfrac{3}{2}(\text{sample skewness})^2

resulting in the following solution:

:\hat{\alpha}, \hat{\beta} = \frac{\hat{\nu}}{2}\left(1 \pm \frac{1}{\sqrt{1 + \frac{16(\hat{\nu}+1)}{(\hat{\nu}+2)^2(\text{sample skewness})^2}}}\right)

:\text{if sample skewness} \neq 0 \text{ and } (\text{sample skewness})^2 - 2 < \text{sample excess kurtosis} < \tfrac{3}{2}(\text{sample skewness})^2

where one should take the solutions as follows: \hat{\alpha} > \hat{\beta} for (negative) sample skewness < 0, and \hat{\alpha} < \hat{\beta} for (positive) sample skewness > 0.

The accompanying plot shows these two solutions as surfaces in a space with horizontal axes of (sample excess kurtosis) and (sample squared skewness) and the shape parameters as the vertical axis. The surfaces are constrained by the condition that the sample excess kurtosis must be bounded by the sample squared skewness as stipulated in the above equation. The two surfaces meet at the right edge defined by zero skewness. Along this right edge, both parameters are equal and the distribution is symmetric: U-shaped for α = β < 1, uniform for α = β = 1, upside-down-U-shaped for 1 < α = β < 2 and bell-shaped for α = β > 2. The surfaces also meet at the front (lower) edge defined by the "impossible boundary" line (excess kurtosis + 2 − skewness2 = 0). Along this front (lower) boundary both shape parameters approach zero, and the probability density is concentrated more at one end than the other end (with practically nothing in between), with probabilities p = \tfrac{\beta}{\alpha+\beta} at the left end ''x'' = 0 and q = 1-p = \tfrac{\alpha}{\alpha+\beta} at the right end ''x'' = 1. The two surfaces become further apart towards the rear edge. At this rear edge the surface parameters are quite different from each other. As remarked, for example, by Bowman and Shenton, sampling in the neighborhood of the line (sample excess kurtosis − (3/2)(sample skewness)2 = 0) (the just-J-shaped portion of the rear edge where blue meets beige) "is dangerously near to chaos", because at that line the denominator of the expression above for the estimate ν = α + β becomes zero, and hence ν approaches infinity as that line is approached. Bowman and Shenton write that "the higher moment parameters (kurtosis and skewness) are extremely fragile (near that line). However, the mean and standard deviation are fairly reliable." Therefore, the problem arises for the case of four-parameter estimation for very skewed distributions such that the excess kurtosis approaches (3/2) times the square of the skewness. This boundary line is produced by extremely skewed distributions with very large values of one of the parameters and very small values of the other parameter. See the section "Kurtosis bounded by the square of the skewness" for a numerical example and further comments about this rear edge boundary line (sample excess kurtosis − (3/2)(sample skewness)2 = 0). As remarked by Karl Pearson himself, this issue may not be of much practical importance, as this trouble arises only for very skewed J-shaped (or mirror-image J-shaped) distributions with very different values of shape parameters that are unlikely to occur much in practice. The usual skewed-bell-shape distributions that occur in practice do not have this parameter estimation problem.

The remaining two parameters \hat{a}, \hat{c} can be determined using the sample mean and the sample variance using a variety of equations. One alternative is to calculate the support interval range (\hat{c}-\hat{a}) based on the sample variance and the sample kurtosis. For this purpose one can solve, in terms of the range (\hat{c}-\hat{a}), the equation expressing the excess kurtosis in terms of the sample variance and the sample size ν (see the sections "Kurtosis" and "Alternative parametrizations, four parameters"):

:\text{excess kurtosis} = \frac{6}{(3+\hat{\nu})(2+\hat{\nu})}\left(\frac{(\hat{c}-\hat{a})^2}{\text{(sample variance)}} - 6 - 5\hat{\nu}\right)

to obtain:

:(\hat{c}-\hat{a}) = \sqrt{\text{(sample variance)}}\sqrt{6 + 5\hat{\nu} + \frac{(2+\hat{\nu})(3+\hat{\nu})}{6}(\text{sample excess kurtosis})}

Another alternative is to calculate the support interval range (\hat{c}-\hat{a}) based on the sample variance and the sample skewness. For this purpose one can solve, in terms of the range (\hat{c}-\hat{a}), the equation expressing the squared skewness in terms of the sample variance and the sample size ν (see the sections titled "Skewness" and "Alternative parametrizations, four parameters"):

:(\text{skewness})^2 = \frac{4}{(2+\hat{\nu})^2}\left(\frac{(\hat{c}-\hat{a})^2}{\text{(sample variance)}} - 4(1+\hat{\nu})\right)

to obtain:

:(\hat{c}-\hat{a}) = \frac{\sqrt{\text{(sample variance)}}}{2}\sqrt{(2+\hat{\nu})^2(\text{sample skewness})^2 + 16(1+\hat{\nu})}

The remaining parameter can be determined from the sample mean and the previously obtained parameters (\hat{c}-\hat{a}), \hat{\alpha}, \hat{\nu} = \hat{\alpha}+\hat{\beta}:

:\hat{a} = (\text{sample mean}) - \left(\frac{\hat{\alpha}}{\hat{\nu}}\right)(\hat{c}-\hat{a})

and finally, \hat{c} = (\hat{c}-\hat{a}) + \hat{a}.

In the above formulas one may take, for example, as estimates of the sample moments:

:\begin{align}
\text{sample mean} &= \overline{y} = \frac{1}{N}\sum_{i=1}^N Y_i \\
\text{sample variance} &= \overline{v}_Y = \frac{1}{N}\sum_{i=1}^N (Y_i - \overline{y})^2 \\
\text{sample skewness} &= G_1 = \frac{\sqrt{N(N-1)}}{N-2}\,\frac{m_3}{m_2^{3/2}} \\
\text{sample excess kurtosis} &= G_2 = \frac{N-1}{(N-2)(N-3)}\left((N+1)\frac{m_4}{m_2^2} - 3(N-1)\right)
\end{align}

where m_k = \frac{1}{N}\sum_{i=1}^N (Y_i - \overline{y})^k denotes the ''k''-th sample central moment.

The estimators ''G''1 for sample skewness and ''G''2 for sample kurtosis are used by DAP/SAS, PSPP/SPSS, and Excel. However, they are not used by BMDP and (according to Joanes and Gill) they were not used by MINITAB in 1998. Actually, Joanes and Gill in their 1998 study concluded that the skewness and kurtosis estimators used in BMDP and in MINITAB (at that time) had smaller variance and mean-squared error in normal samples, but the skewness and kurtosis estimators used in DAP/SAS and PSPP/SPSS, namely ''G''1 and ''G''2, had smaller mean-squared error in samples from a very skewed distribution. It is for this reason that we have spelled out "sample skewness", etc., in the above formulas, to make it explicit that the user should choose the best estimator according to the problem at hand, as the best estimator for skewness and kurtosis depends on the amount of skewness (as shown by Joanes and Gill).


Maximum likelihood


Two unknown parameters

= As_is_also_the_case_for_maximum_likelihood_estimates_for_the_gamma_distribution,_the_maximum_likelihood_estimates_for_the_beta_distribution_do_not_have_a_general_closed_form_solution_for_arbitrary_values_of_the_shape_parameters._If_''X''1,_...,_''XN''_are_independent_random_variables_each_having_a_beta_distribution,_the_joint_log_likelihood_function_for_''N''_independent_and_identically_distributed_random_variables, iid_observations_is: :\begin \ln\,_\mathcal_(\alpha,_\beta\mid_X)_&=_\sum_^N_\ln_\left_(\mathcal_i_(\alpha,_\beta\mid_X_i)_\right_)\\ &=_\sum_^N_\ln_\left_(f(X_i;\alpha,\beta)_\right_)_\\ &=_\sum_^N_\ln_\left_(\frac_\right_)_\\ &=_(\alpha_-_1)\sum_^N_\ln_(X_i)_+_(\beta-_1)\sum_^N__\ln_(1-X_i)_-_N_\ln_\Beta(\alpha,\beta) \end Finding_the_maximum_with_respect_to_a_shape_parameter_involves_taking_the_partial_derivative_with_respect_to_the_shape_parameter_and_setting_the_expression_equal_to_zero_yielding_the_maximum_likelihood_estimator_of_the_shape_parameters: :\frac_=_\sum_^N_\ln_X_i_-N\frac=0 :\frac_=_\sum_^N__\ln_(1-X_i)-_N\frac=0 where: :\frac_=_-\frac+_\frac+_\frac=-\psi(\alpha_+_\beta)_+_\psi(\alpha)_+_0 :\frac=_-_\frac+_\frac_+_\frac=-\psi(\alpha_+_\beta)_+_0_+_\psi(\beta) since_the_digamma_function_ In_mathematics,_the_digamma_function_is_defined_as_the__logarithmic_derivative_of_the_gamma_function: :\psi(x)=\frac\ln\big(\Gamma(x)\big)=\frac\sim\ln-\frac. It_is_the_first_of_the__polygamma_functions._It_is_strictly_increasing_and_strict_...
, denoted ψ(α), is defined as the logarithmic derivative of the gamma function:

:\psi(\alpha) =\frac{d \ln\Gamma(\alpha)}{d\alpha}

To ensure that the values with zero tangent slope are indeed a maximum (instead of a saddle-point or a minimum) one has to also satisfy the condition that the curvature is negative. This amounts to satisfying that the second partial derivative with respect to the shape parameters is negative:

:\frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{\partial \alpha^2}= -N\frac{\partial^2 \ln \Beta(\alpha,\beta)}{\partial \alpha^2}<0
:\frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{\partial \beta^2} = -N\frac{\partial^2 \ln \Beta(\alpha,\beta)}{\partial \beta^2}<0

Using the previous equations, this is equivalent to:

:-\frac{1}{N}\frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{\partial \alpha^2} = \psi_1(\alpha)-\psi_1(\alpha + \beta) > 0
:-\frac{1}{N}\frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{\partial \beta^2} = \psi_1(\beta) -\psi_1(\alpha + \beta) > 0

where the trigamma function, denoted ψ_1(α), is the second of the polygamma functions, and is defined as the derivative of the digamma function:

:\psi_1(\alpha) = \frac{d^2\ln\Gamma(\alpha)}{d\alpha^2}= \frac{d\,\psi(\alpha)}{d\alpha}.

These conditions are equivalent to stating that the variances of the logarithmically transformed variables are positive, since:

:\operatorname{var}[\ln X] = \operatorname{E}[\ln^2 X] - (\operatorname{E}[\ln X])^2 = \psi_1(\alpha) - \psi_1(\alpha + \beta)
:\operatorname{var}[\ln (1-X)] = \operatorname{E}[\ln^2 (1-X)] - (\operatorname{E}[\ln (1-X)])^2 = \psi_1(\beta) - \psi_1(\alpha + \beta)

Therefore, the condition of negative curvature at a maximum is equivalent to the statements:

:\operatorname{var}[\ln X] > 0
:\operatorname{var}[\ln (1-X)] > 0

Alternatively, the condition of negative curvature at a maximum is also equivalent to stating that the following logarithmic derivatives of the geometric means G_X and G_(1−X) are positive, since:

: \psi_1(\alpha) - \psi_1(\alpha + \beta) = \frac{\partial \ln G_X}{\partial \alpha} > 0
: \psi_1(\beta)  - \psi_1(\alpha + \beta) = \frac{\partial \ln G_{(1-X)}}{\partial \beta} > 0

While these slopes are indeed positive, the other slopes are negative:

:\frac{\partial \ln G_X}{\partial \beta}, \frac{\partial \ln G_{(1-X)}}{\partial \alpha} < 0.

The slopes of the mean and the median with respect to α and β display similar sign behavior.

From the condition that at a maximum the partial derivative with respect to the shape parameter equals zero, we obtain the following system of coupled maximum likelihood estimate equations (for the average log-likelihoods) that needs to be inverted to obtain the (unknown) shape parameter estimates \hat{\alpha},\hat{\beta} in terms of the (known) average of logarithms of the samples X_1, ..., X_N:

:\begin{align}
\hat{\operatorname{E}}[\ln X] &= \psi(\hat{\alpha}) - \psi(\hat{\alpha} + \hat{\beta})=\frac{1}{N}\sum_{i=1}^N \ln X_i = \ln \hat{G}_X \\
\hat{\operatorname{E}}[\ln(1-X)] &= \psi(\hat{\beta}) - \psi(\hat{\alpha} + \hat{\beta})=\frac{1}{N}\sum_{i=1}^N \ln (1-X_i)= \ln \hat{G}_{(1-X)}
\end{align}

where we recognize \ln \hat{G}_X as the logarithm of the sample geometric mean and \ln \hat{G}_{(1-X)} as the logarithm of the sample geometric mean based on (1 − X), the mirror-image of X. For \hat{\alpha}=\hat{\beta}, it follows that \hat{G}_X=\hat{G}_{(1-X)}.

:\begin{align}
\hat{G}_X &= \prod_{i=1}^N (X_i)^{1/N} \\
\hat{G}_{(1-X)} &= \prod_{i=1}^N (1-X_i)^{1/N}
\end{align}

These coupled equations containing digamma functions of the shape parameter estimates \hat{\alpha},\hat{\beta} must be solved by numerical methods, as done, for example, by Beckman et al. Gnanadesikan et al. give numerical solutions for a few cases. N. L. Johnson and S. Kotz suggest that for "not too small" shape parameter estimates \hat{\alpha},\hat{\beta}, the logarithmic approximation to the digamma function \psi(\hat{\alpha}) \approx \ln(\hat{\alpha}-\tfrac{1}{2}) may be used to obtain initial values for an iterative solution, since the equations resulting from this approximation can be solved exactly:

:\ln \frac{\hat{\alpha} - \tfrac{1}{2}}{\hat{\alpha}+\hat{\beta}-\tfrac{1}{2}} \approx \ln \hat{G}_X
:\ln \frac{\hat{\beta}-\tfrac{1}{2}}{\hat{\alpha}+\hat{\beta}-\tfrac{1}{2}} \approx \ln \hat{G}_{(1-X)}

which leads to the following solution for the initial values (of the estimated shape parameters in terms of the sample geometric means) for an iterative solution:

:\hat{\alpha}\approx \tfrac{1}{2} + \frac{\hat{G}_X}{2(1-\hat{G}_X-\hat{G}_{(1-X)})} \text{ if } \hat{\alpha} >1
:\hat{\beta}\approx \tfrac{1}{2} + \frac{\hat{G}_{(1-X)}}{2(1-\hat{G}_X-\hat{G}_{(1-X)})} \text{ if } \hat{\beta} > 1

Alternatively, the estimates provided by the method of moments can instead be used as initial values for an iterative solution of the maximum likelihood coupled equations in terms of the digamma functions.

When the distribution is required over a known interval other than [0, 1] with random variable X, say [a, c] with random variable Y, then replace ln(X_i) in the first equation with

:\ln \frac{Y_i-a}{c-a},

and replace ln(1−X_i) in the second equation with

:\ln \frac{c-Y_i}{c-a}

(see the "Alternative parametrizations, four parameters" section below).

If one of the shape parameters is known, the problem is considerably simplified. The following logit transformation can be used to solve for the unknown shape parameter (for skewed cases such that \hat{\alpha}\neq\hat{\beta}; otherwise, if symmetric, both equal parameters are known when one is known):

:\hat{\operatorname{E}} \left[\ln \left(\frac{X}{1-X} \right) \right]=\psi(\hat{\alpha}) - \psi(\hat{\beta})=\frac{1}{N}\sum_{i=1}^N \ln\frac{X_i}{1-X_i} = \ln \hat{G}_X - \ln \hat{G}_{(1-X)}

This logit transformation is the logarithm of the transformation that divides the variable X by its mirror-image, X/(1 − X), resulting in the "inverted beta distribution" or beta prime distribution (also known as beta distribution of the second kind or Pearson's Type VI) with support [0, +∞). As previously discussed in the section "Moments of logarithmically transformed random variables," the logit transformation \ln\frac{X}{1-X}, studied by Johnson, extends the finite support [0, 1] based on the original variable X to infinite support in both directions of the real line (−∞, +∞).

If, for example, \hat{\beta} is known, the unknown parameter \hat{\alpha} can be obtained in terms of the inverse digamma function of the right hand side of this equation:

:\psi(\hat{\alpha})=\frac{1}{N}\sum_{i=1}^N \ln\frac{X_i}{1-X_i} + \psi(\hat{\beta})
:\hat{\alpha}=\psi^{-1}(\ln \hat{G}_X - \ln \hat{G}_{(1-X)} + \psi(\hat{\beta}))

In particular, if one of the shape parameters has a value of unity, for example for \hat{\beta} = 1 (the power function distribution with bounded support [0,1]), using the identity ψ(x + 1) = ψ(x) + 1/x in the equation \psi(\hat{\alpha}) - \psi(\hat{\alpha} + \hat{\beta})= \ln \hat{G}_X, the maximum likelihood estimator for the unknown parameter \hat{\alpha} is, exactly:

:\hat{\alpha}= - \frac{1}{\frac{1}{N}\sum_{i=1}^N \ln X_i}= - \frac{1}{\ln \hat{G}_X}

The beta distribution has support [0, 1], therefore \hat{G}_X < 1, hence (-\ln \hat{G}_X) >0, and therefore \hat{\alpha} >0.

In conclusion, the maximum likelihood estimates of the shape parameters of a beta distribution are (in general) a complicated function of the sample geometric mean and of the sample geometric mean based on (1−X), the mirror-image of X. One may ask: if the variance (in addition to the mean) is necessary to estimate two shape parameters with the method of moments, why is the (logarithmic or geometric) variance not necessary to estimate two shape parameters with the maximum likelihood method, for which only the geometric means suffice? The answer is that the mean does not provide as much information as the geometric mean. For a beta distribution with equal shape parameters α = β, the mean is exactly 1/2, regardless of the value of the shape parameters, and therefore regardless of the value of the statistical dispersion (the variance). On the other hand, the geometric mean of a beta distribution with equal shape parameters α = β depends on the value of the shape parameters, and therefore it contains more information. Also, the geometric mean of a beta distribution does not satisfy the symmetry conditions satisfied by the mean; therefore, by employing both the geometric mean based on X and the geometric mean based on (1 − X), the maximum likelihood method is able to provide best estimates for both parameters α = β without employing the variance.

One can express the joint log likelihood per N iid observations in terms of the sufficient statistics (the sample geometric means) as follows:

:\frac{\ln \mathcal{L}(\alpha,\beta\mid X)}{N} = (\alpha - 1)\ln \hat{G}_X + (\beta- 1)\ln \hat{G}_{(1-X)} - \ln \Beta(\alpha,\beta).
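The coupled digamma equations above lend themselves to a general-purpose numerical root finder. The following Python sketch (function names are illustrative, not from any particular reference) uses the Johnson–Kotz logarithmic approximation discussed above for the initial values, and scipy for the digamma function and the root solve:

  import numpy as np
  from scipy.special import digamma
  from scipy.optimize import fsolve

  def fit_beta_shapes_mle(x):
      # Sufficient statistics: log sample geometric means of X and of (1 - X).
      ln_gx  = np.mean(np.log(x))
      ln_g1x = np.mean(np.log1p(-x))
      gx, g1x = np.exp(ln_gx), np.exp(ln_g1x)

      # Initial values from the approximation psi(z) ~ ln(z - 1/2),
      # valid for "not too small" shape parameters.
      a0 = 0.5 + gx  / (2.0 * (1.0 - gx - g1x))
      b0 = 0.5 + g1x / (2.0 * (1.0 - gx - g1x))

      # Coupled maximum likelihood equations in terms of digamma functions.
      def equations(p):
          a, b = p
          return (digamma(a) - digamma(a + b) - ln_gx,
                  digamma(b) - digamma(a + b) - ln_g1x)

      return fsolve(equations, (a0, b0))

  rng = np.random.default_rng(0)
  print(fit_beta_shapes_mle(rng.beta(2.0, 5.0, size=10_000)))  # close to (2, 5)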
We can plot the joint log likelihood per N observations for fixed values of the sample geometric means to see the behavior of the likelihood function as a function of the shape parameters α and β. In such a plot, the shape parameter estimators \hat{\alpha},\hat{\beta} correspond to the maxima of the likelihood function. See the accompanying graph that shows that all the likelihood functions intersect at α = β = 1, which corresponds to the values of the shape parameters that give the maximum entropy (the maximum entropy occurs for shape parameters equal to unity: the uniform distribution). It is evident from the plot that the likelihood function gives sharp peaks for values of the shape parameter estimators close to zero, but that for values of the shape parameter estimators greater than one, the likelihood function becomes quite flat, with less defined peaks. Consequently, the maximum likelihood parameter estimation method for the beta distribution becomes less acceptable for larger values of the shape parameter estimators, as the uncertainty in the peak definition increases with the value of the shape parameter estimators. One can arrive at the same conclusion by noticing that the expression for the curvature of the likelihood function is in terms of the geometric variances:

:\frac{1}{N}\frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{\partial \alpha^2}= -\operatorname{var}[\ln X]
:\frac{1}{N}\frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{\partial \beta^2} = -\operatorname{var}[\ln (1-X)]

These variances (and therefore the curvatures) are much larger for small values of the shape parameters α and β. However, for shape parameter values α, β > 1, the variances (and therefore the curvatures) flatten out. Equivalently, this result follows from the Cramér–Rao bound, since the Fisher information matrix components for the beta distribution are these logarithmic variances. The Cramér–Rao bound states that the variance of any unbiased estimator \hat{\alpha} of α is bounded by the reciprocal of the Fisher information:

:\operatorname{var}(\hat{\alpha})\geq\frac{1}{\operatorname{var}[\ln X]}\geq\frac{1}{\psi_1(\alpha) - \psi_1(\alpha + \beta)}
:\operatorname{var}(\hat{\beta}) \geq\frac{1}{\operatorname{var}[\ln (1-X)]}\geq\frac{1}{\psi_1(\beta) - \psi_1(\alpha + \beta)}

so the variance of the estimators increases with increasing α and β, as the logarithmic variances decrease.

Also, one can express the joint log likelihood per N iid observations in terms of the digamma function expressions for the logarithms of the sample geometric means as follows:

:\frac{\ln \mathcal{L}(\alpha,\beta\mid X)}{N} = (\alpha - 1)(\psi(\hat{\alpha}) - \psi(\hat{\alpha} + \hat{\beta}))+(\beta- 1)(\psi(\hat{\beta}) - \psi(\hat{\alpha} + \hat{\beta}))- \ln \Beta(\alpha,\beta)

This expression is identical to the negative of the cross-entropy (see section on "Quantities of information (entropy)"). Therefore, finding the maximum of the joint log likelihood of the shape parameters, per N iid observations, is identical to finding the minimum of the cross-entropy for the beta distribution, as a function of the shape parameters:

:\frac{\ln \mathcal{L}(\alpha,\beta\mid X)}{N} = - H = -h - D_{\mathrm{KL}} = -\ln \Beta(\alpha,\beta)+(\alpha-1)\psi(\hat{\alpha})+(\beta-1)\psi(\hat{\beta})-(\alpha+\beta-2)\psi(\hat{\alpha}+\hat{\beta})

with the cross-entropy defined as follows:

:H = \int_0^1 - f(X;\hat{\alpha},\hat{\beta}) \ln (f(X;\alpha,\beta)) \, dX
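For illustration, the joint log likelihood per observation can be evaluated directly from the two sufficient statistics, which is how likelihood surfaces of the kind described above can be drawn. A minimal sketch (the sample, grid limits and grid resolution are arbitrary choices for the example):

  import numpy as np
  from scipy.special import betaln

  def loglik_per_obs(alpha, beta, ln_gx, ln_g1x):
      # (alpha - 1) ln G_X + (beta - 1) ln G_(1-X) - ln B(alpha, beta)
      return (alpha - 1.0) * ln_gx + (beta - 1.0) * ln_g1x - betaln(alpha, beta)

  rng = np.random.default_rng(1)
  x = rng.beta(2.0, 3.0, size=5_000)
  ln_gx, ln_g1x = np.log(x).mean(), np.log1p(-x).mean()

  grid = np.linspace(0.5, 6.0, 111)
  A, B = np.meshgrid(grid, grid)
  ll = loglik_per_obs(A, B, ln_gx, ln_g1x)
  i, j = np.unravel_index(np.argmax(ll), ll.shape)
  print(A[i, j], B[i, j])   # grid point nearest the maximum, near (2, 3)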


Four unknown parameters

The procedure is similar to the one followed in the two unknown parameter case. If Y_1, ..., Y_N are independent random variables each having a beta distribution with four parameters, the joint log likelihood function for N iid observations is:

:\begin{align}
\ln\, \mathcal{L} (\alpha, \beta, a, c\mid Y) &= \sum_{i=1}^N \ln\,\mathcal{L}_i (\alpha, \beta, a, c\mid Y_i)\\
&= \sum_{i=1}^N \ln\,f(Y_i; \alpha, \beta, a, c) \\
&= \sum_{i=1}^N \ln\,\frac{(Y_i-a)^{\alpha-1}(c-Y_i)^{\beta-1}}{(c-a)^{\alpha+\beta-1}\Beta(\alpha,\beta)}\\
&= (\alpha - 1)\sum_{i=1}^N \ln (Y_i - a) + (\beta- 1)\sum_{i=1}^N \ln (c - Y_i)- N \ln \Beta(\alpha,\beta) - N (\alpha+\beta - 1) \ln (c - a)
\end{align}

Finding the maximum with respect to a shape parameter involves taking the partial derivative with respect to the shape parameter and setting the expression equal to zero, yielding the maximum likelihood estimator of the shape parameters:

:\frac{\partial\ln \mathcal{L}(\alpha, \beta, a, c\mid Y)}{\partial \alpha}= \sum_{i=1}^N \ln (Y_i - a) - N(-\psi(\alpha + \beta) + \psi(\alpha))- N \ln (c - a)= 0
:\frac{\partial\ln \mathcal{L}(\alpha, \beta, a, c\mid Y)}{\partial \beta} = \sum_{i=1}^N \ln (c - Y_i) - N(-\psi(\alpha + \beta) + \psi(\beta))- N \ln (c - a)= 0
:\frac{\partial\ln \mathcal{L}(\alpha, \beta, a, c\mid Y)}{\partial a} = -(\alpha - 1) \sum_{i=1}^N \frac{1}{Y_i - a} + N (\alpha+\beta - 1)\frac{1}{c - a}= 0
:\frac{\partial\ln \mathcal{L}(\alpha, \beta, a, c\mid Y)}{\partial c} = (\beta- 1) \sum_{i=1}^N \frac{1}{c - Y_i} - N (\alpha+\beta - 1) \frac{1}{c - a} = 0

These equations can be re-arranged as the following system of four coupled equations (the first two equations involve geometric means and the second two equations involve harmonic means) in terms of the maximum likelihood estimates for the four parameters \hat{\alpha}, \hat{\beta}, \hat{a}, \hat{c}:

:\frac{1}{N}\sum_{i=1}^N \ln \frac{Y_i - \hat{a}}{\hat{c}-\hat{a}} = \psi(\hat{\alpha})-\psi(\hat{\alpha} +\hat{\beta})= \ln \hat{G}_X
:\frac{1}{N}\sum_{i=1}^N \ln \frac{\hat{c} - Y_i}{\hat{c}-\hat{a}} = \psi(\hat{\beta})-\psi(\hat{\alpha} + \hat{\beta})= \ln \hat{G}_{(1-X)}
:\frac{1}{\frac{1}{N}\sum_{i=1}^N \frac{\hat{c}-\hat{a}}{Y_i - \hat{a}}} = \frac{\hat{\alpha}-1}{\hat{\alpha}+\hat{\beta}-1}= \hat{H}_X
:\frac{1}{\frac{1}{N}\sum_{i=1}^N \frac{\hat{c}-\hat{a}}{\hat{c} - Y_i}} = \frac{\hat{\beta}-1}{\hat{\alpha}+\hat{\beta}-1} = \hat{H}_{(1-X)}

with sample geometric means:

:\hat{G}_X = \prod_{i=1}^{N} \left (\frac{Y_i-\hat{a}}{\hat{c}-\hat{a}} \right )^{1/N}
:\hat{G}_{(1-X)} = \prod_{i=1}^{N} \left (\frac{\hat{c}-Y_i}{\hat{c}-\hat{a}} \right )^{1/N}

The parameters \hat{a}, \hat{c} are embedded inside the geometric mean expressions in a nonlinear way (to the power 1/N). This precludes, in general, a closed form solution, even for an initial value approximation for iteration purposes. One alternative is to use as initial values for iteration the values obtained from the method of moments solution for the four parameter case. Furthermore, the expressions for the harmonic means are well-defined only for \hat{\alpha}, \hat{\beta} > 1, which precludes a maximum likelihood solution for shape parameters less than unity in the four-parameter case. Fisher's information matrix for the four parameter case is positive-definite only for α, β > 2 (for further discussion, see the section on the Fisher information matrix, four parameter case), that is, for bell-shaped (symmetric or unsymmetric) beta distributions with inflection points located to either side of the mode. The following Fisher information components (that represent the expectations of the curvature of the log likelihood function) have singularities at the following values:

:\alpha = 2: \quad \operatorname{E} \left [- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial a^2} \right ]= \mathcal{I}_{a a}
:\beta = 2: \quad \operatorname{E}\left [- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial c^2} \right ] = \mathcal{I}_{c c}
:\alpha = 1: \quad \operatorname{E}\left [- \frac{1}{N}\frac{\partial^2\ln \mathcal{L}}{\partial \alpha \, \partial a}\right ] = \mathcal{I}_{\alpha a}
:\beta = 1: \quad \operatorname{E}\left [- \frac{1}{N}\frac{\partial^2\ln \mathcal{L}}{\partial \beta \, \partial c} \right ] = \mathcal{I}_{\beta c}

(for further discussion see the section on the Fisher information matrix). Thus, it is not possible to strictly carry on the maximum likelihood estimation for some well known distributions belonging to the four-parameter beta distribution family, like the continuous uniform distribution (Beta(1, 1, a, c)) and the arcsine distribution (Beta(1/2, 1/2, a, c)). N. L. Johnson and S. Kotz ignore the equations for the harmonic means and instead suggest: "If a and c are unknown, and maximum likelihood estimators of a, c, α and β are required, the above procedure (for the two unknown parameter case, with X transformed as X = (Y − a)/(c − a)) can be repeated using a succession of trial values of a and c, until the pair (a, c) for which maximum likelihood (given a and c) is as great as possible, is attained" (where, for the purpose of clarity, their notation for the parameters has been translated into the present notation).
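In practice, a four-parameter fit of this kind is usually delegated to a numerical optimizer. For example, scipy parametrizes the beta family with loc and scale parameters that play the role of a and c − a, so its generic fit method performs a numerical maximum likelihood search over all four parameters (subject to the caveats above about shape parameters near the boundary values); a sketch:

  import numpy as np
  from scipy import stats

  # scipy's four-parameter beta: Beta(alpha, beta, loc=a, scale=c - a).
  rng = np.random.default_rng(2)
  a_true, c_true = 1.0, 4.0
  y = a_true + (c_true - a_true) * rng.beta(3.0, 5.0, size=20_000)

  alpha_hat, beta_hat, loc_hat, scale_hat = stats.beta.fit(y)
  a_hat, c_hat = loc_hat, loc_hat + scale_hat
  print(alpha_hat, beta_hat, a_hat, c_hat)   # near (3, 5, 1, 4)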


Fisher information matrix

Let a random variable X have a probability density f(x;α). The partial derivative with respect to the (unknown, and to be estimated) parameter α of the log likelihood function is called the score. The second moment of the score is called the Fisher information:

:\mathcal{I}(\alpha)=\operatorname{E} \left [\left (\frac{\partial}{\partial\alpha} \ln \mathcal{L}(\alpha\mid X) \right )^2 \right].

The expectation of the score is zero, therefore the Fisher information is also the second moment centered on the mean of the score: the variance of the score.

If the log likelihood function is twice differentiable with respect to the parameter α, and under certain regularity conditions, then the Fisher information may also be written as follows (which is often a more convenient form for calculation purposes):

:\mathcal{I}(\alpha) = - \operatorname{E} \left [\frac{\partial^2}{\partial\alpha^2} \ln (\mathcal{L}(\alpha\mid X)) \right].

Thus, the Fisher information is the negative of the expectation of the second derivative with respect to the parameter α of the log likelihood function. Therefore, Fisher information is a measure of the curvature of the log likelihood function of α. A low curvature (and therefore high radius of curvature), flatter log likelihood function curve has low Fisher information, while a log likelihood function curve with large curvature (and therefore low radius of curvature) has high Fisher information. When the Fisher information matrix is computed at the estimates of the parameters ("the observed Fisher information matrix"), it is equivalent to the replacement of the true log likelihood surface by a Taylor series approximation, taken as far as the quadratic terms. The word information, in the context of Fisher information, refers to information about the parameters: estimation, sufficiency and properties of variances of estimators. The Cramér–Rao bound states that the inverse of the Fisher information is a lower bound on the variance of any unbiased estimator of a parameter α:

:\operatorname{var}[\hat\alpha] \geq \frac{1}{\mathcal{I}(\alpha)}.

The precision to which one can estimate a parameter α is limited by the Fisher information of the log likelihood function. The Fisher information is a measure of the minimum error involved in estimating a parameter of a distribution, and it can be viewed as a measure of the resolving power of an experiment needed to discriminate between two alternative hypotheses about a parameter.

When there are N parameters

: \begin{bmatrix} \theta_1 \\ \theta_2 \\ \vdots \\ \theta_N \end{bmatrix},

then the Fisher information takes the form of an N×N positive semidefinite symmetric matrix, the Fisher information matrix, with typical element:

:(\mathcal{I}(\theta))_{i,j}=\operatorname{E} \left [\left (\frac{\partial}{\partial\theta_i} \ln \mathcal{L} \right) \left(\frac{\partial}{\partial\theta_j} \ln \mathcal{L} \right) \right ].

Under certain regularity conditions, the Fisher information matrix may also be written in the following form, which is often more convenient for computation:

:(\mathcal{I}(\theta))_{i,j} = - \operatorname{E} \left [\frac{\partial^2}{\partial\theta_i \, \partial\theta_j} \ln (\mathcal{L}) \right ].

With X_1, ..., X_N iid random variables, an N-dimensional "box" can be constructed with sides X_1, ..., X_N. Costa and Cover show that the (Shannon) differential entropy h(X) is related to the volume of the typical set (having the sample entropy close to the true entropy), while the Fisher information is related to the surface of this typical set.
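A quick Monte Carlo sanity check of these definitions for the beta distribution: the score with respect to α is ln(X) − (ψ(α) − ψ(α + β)); its expectation is zero, and its variance (the per-observation Fisher information) is the trigamma expression ψ_1(α) − ψ_1(α + β). A sketch:

  import numpy as np
  from scipy.special import digamma, polygamma

  alpha, beta = 2.0, 3.0
  rng = np.random.default_rng(3)
  x = rng.beta(alpha, beta, size=1_000_000)

  # Score with respect to alpha for the beta log likelihood.
  score = np.log(x) - (digamma(alpha) - digamma(alpha + beta))
  print(score.mean())                                      # ~ 0 (expectation of the score)
  print(score.var())                                       # Monte Carlo Fisher information
  print(polygamma(1, alpha) - polygamma(1, alpha + beta))  # exact trigamma value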


Two parameters

For X_1, ..., X_N independent random variables each having a beta distribution parametrized with shape parameters α and β, the joint log likelihood function for N iid observations is:

:\ln (\mathcal{L} (\alpha, \beta\mid X) )= (\alpha - 1)\sum_{i=1}^N \ln X_i + (\beta- 1)\sum_{i=1}^N \ln (1-X_i)- N \ln \Beta(\alpha,\beta)

therefore the joint log likelihood function per N iid observations is:

:\frac{1}{N} \ln(\mathcal{L} (\alpha, \beta\mid X)) = (\alpha - 1)\frac{1}{N}\sum_{i=1}^N \ln X_i + (\beta- 1)\frac{1}{N}\sum_{i=1}^N \ln (1-X_i)- \ln \Beta(\alpha,\beta)

For the two parameter case, the Fisher information has 4 components: 2 diagonal and 2 off-diagonal. Since the Fisher information matrix is symmetric, only one of these off-diagonal components is independent. Therefore, the Fisher information matrix has 3 independent components (2 diagonal and 1 off-diagonal).

Aryal and Nadarajah calculated Fisher's information matrix for the four-parameter case, from which the two parameter case can be obtained as follows:

:- \frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{N\,\partial \alpha^2}= \operatorname{var}[\ln X]= \psi_1(\alpha) - \psi_1(\alpha + \beta) = \mathcal{I}_{\alpha \alpha}= \operatorname{E}\left [- \frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{N\,\partial \alpha^2} \right ] = \ln \operatorname{var}_{GX}
:- \frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{N\,\partial \beta^2} = \operatorname{var}[\ln (1-X)]= \psi_1(\beta) - \psi_1(\alpha + \beta) = \mathcal{I}_{\beta \beta}= \operatorname{E}\left [- \frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{N\,\partial \beta^2} \right]= \ln \operatorname{var}_{G(1-X)}
:- \frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{N\,\partial \alpha\,\partial \beta} = \operatorname{cov}[\ln X,\ln(1-X)] = -\psi_1(\alpha+\beta) = \mathcal{I}_{\alpha \beta}= \operatorname{E}\left [- \frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{N\,\partial \alpha\,\partial \beta} \right] = \ln \operatorname{cov}_{G\,X,(1-X)}

Since the Fisher information matrix is symmetric,

: \mathcal{I}_{\alpha \beta}= \mathcal{I}_{\beta \alpha}= \ln \operatorname{cov}_{G\,X,(1-X)}

The Fisher information components are equal to the log geometric variances and log geometric covariance. Therefore, they can be expressed as trigamma functions, denoted ψ_1(α), the second of the polygamma functions, defined as the derivative of the digamma function:

:\psi_1(\alpha) = \frac{d^2\ln\Gamma(\alpha)}{d\alpha^2}= \frac{d\,\psi(\alpha)}{d\alpha}.

These derivatives are also derived in the section titled "Maximum likelihood, two unknown parameters", and plots of the log likelihood function are also shown in that section. The section on the geometric variance and covariance contains plots and further discussion of the Fisher information matrix components: the log geometric variances and log geometric covariance as a function of the shape parameters α and β. The section titled "Moments of logarithmically transformed random variables" contains formulas for moments of logarithmically transformed random variables. Images for the Fisher information components \mathcal{I}_{\alpha \alpha}, \mathcal{I}_{\beta \beta} and \mathcal{I}_{\alpha \beta} are shown in the section on the geometric variances.

The determinant of Fisher's information matrix is of interest (for example, for the calculation of Jeffreys prior probability). From the expressions for the individual components of the Fisher information matrix, it follows that the determinant of Fisher's (symmetric) information matrix for the beta distribution is:

:\begin{align}
\det(\mathcal{I}(\alpha, \beta))&= \mathcal{I}_{\alpha \alpha} \mathcal{I}_{\beta \beta}-\mathcal{I}_{\alpha \beta} \mathcal{I}_{\beta \alpha} \\[4pt]
&=(\psi_1(\alpha) - \psi_1(\alpha + \beta))(\psi_1(\beta) - \psi_1(\alpha + \beta))-( -\psi_1(\alpha+\beta))( -\psi_1(\alpha+\beta))\\[4pt]
&= \psi_1(\alpha)\psi_1(\beta)-( \psi_1(\alpha)+\psi_1(\beta))\psi_1(\alpha + \beta)\\[4pt]
\lim_{\alpha\to 0} \det(\mathcal{I}(\alpha, \beta)) &=\lim_{\beta \to 0} \det(\mathcal{I}(\alpha, \beta)) = \infty\\[4pt]
\lim_{\alpha\to \infty} \det(\mathcal{I}(\alpha, \beta)) &=\lim_{\beta \to \infty} \det(\mathcal{I}(\alpha, \beta)) = 0
\end{align}

From Sylvester's criterion (checking whether the diagonal elements are all positive), it follows that the Fisher information matrix for the two parameter case is positive-definite (under the standard condition that the shape parameters are positive, α > 0 and β > 0).


Four parameters

If Y_1, ..., Y_N are independent random variables each having a beta distribution with four parameters: the exponents α and β, and also a (the minimum of the distribution range) and c (the maximum of the distribution range) (see the section titled "Alternative parametrizations, four parameters"), with probability density function:

:f(y; \alpha, \beta, a, c) = \frac{f(x;\alpha,\beta)}{c-a} =\frac{ \left (\frac{y-a}{c-a} \right )^{\alpha-1} \left (\frac{c-y}{c-a} \right)^{\beta-1} }{(c-a)\Beta(\alpha, \beta)}=\frac{ (y-a)^{\alpha-1} (c-y)^{\beta-1} }{(c-a)^{\alpha+\beta-1}\Beta(\alpha, \beta)},

the joint log likelihood function per N iid observations is:

:\frac{1}{N} \ln(\mathcal{L} (\alpha, \beta, a, c\mid Y))= \frac{\alpha -1}{N}\sum_{i=1}^N \ln (Y_i - a) + \frac{\beta -1}{N}\sum_{i=1}^N \ln (c - Y_i)- \ln \Beta(\alpha,\beta) - (\alpha+\beta -1) \ln (c-a)

For the four parameter case, the Fisher information has 4×4 = 16 components. It has 12 off-diagonal components (4×4 total − 4 diagonal). Since the Fisher information matrix is symmetric, half of these components (12/2 = 6) are independent. Therefore, the Fisher information matrix has 6 independent off-diagonal + 4 diagonal = 10 independent components. Aryal and Nadarajah calculated Fisher's information matrix for the four parameter case as follows:

:- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial \alpha^2}= \operatorname{var}[\ln X]= \psi_1(\alpha) - \psi_1(\alpha + \beta) = \mathcal{I}_{\alpha \alpha}= \operatorname{E}\left [- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial \alpha^2} \right ] = \ln (\operatorname{var}_{GX})
:- \frac{1}{N} \frac{\partial^2 \ln \mathcal{L}}{\partial \beta^2} = \operatorname{var}[\ln (1-X)]= \psi_1(\beta) - \psi_1(\alpha + \beta) = \mathcal{I}_{\beta \beta}= \operatorname{E} \left [- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial \beta^2} \right ] = \ln(\operatorname{var}_{G(1-X)})
:- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial \alpha \, \partial \beta} = \operatorname{cov}[\ln X,\ln(1-X)] = -\psi_1(\alpha+\beta) =\mathcal{I}_{\alpha \beta}= \operatorname{E} \left [- \frac{1}{N}\frac{\partial^2\ln \mathcal{L}}{\partial \alpha \, \partial \beta} \right ] = \ln(\operatorname{cov}_{G\,X,(1-X)})

In the above expressions, the use of X instead of Y in the expressions var[ln(X)] = ln(var_GX) is not an error. The expressions in terms of the log geometric variances and log geometric covariance occur as functions of the two-parameter X ~ Beta(α, β) parametrization because, when taking the partial derivatives with respect to the exponents (α, β) in the four parameter case, one obtains identical expressions as for the two parameter case: these terms of the four parameter Fisher information matrix are independent of the minimum a and maximum c of the distribution's range. The only non-zero term upon double differentiation of the log likelihood function with respect to the exponents α and β is the second derivative of the log of the beta function, ln(B(α, β)). This term is independent of the minimum a and maximum c of the distribution's range, and double differentiation of it results in trigamma functions. The sections titled "Maximum likelihood", "Two unknown parameters" and "Four unknown parameters" also show this fact.

The Fisher information for N i.i.d. samples is N times the individual Fisher information (eq. 11.279, page 394 of Cover and Thomas). (Aryal and Nadarajah take a single observation, N = 1, to calculate the following components of the Fisher information, which leads to the same result as considering the derivatives of the log likelihood per N observations. Moreover, below, the expression for one of these components, which is erroneous in Aryal and Nadarajah, has been corrected.)
:\begin{align}
\alpha > 2: \quad \operatorname{E}\left [- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial a^2} \right ] &= \mathcal{I}_{a a}=\frac{\beta(\alpha+\beta-1)}{(\alpha-2)(c-a)^2} \\
\beta > 2: \quad \operatorname{E}\left[-\frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial c^2} \right ] &= \mathcal{I}_{c c} = \frac{\alpha(\alpha+\beta-1)}{(\beta-2)(c-a)^2} \\
\operatorname{E}\left[- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial a \, \partial c} \right ] &= \mathcal{I}_{a c} = \frac{\alpha+\beta-1}{(c-a)^2} \\
\alpha > 1: \quad \operatorname{E}\left[- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial \alpha \, \partial a} \right ] &=\mathcal{I}_{\alpha a} = \frac{\beta}{(\alpha-1)(c-a)} \\
\operatorname{E}\left[- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial \alpha \, \partial c} \right ] &= \mathcal{I}_{\alpha c} = \frac{1}{c-a} \\
\operatorname{E}\left[- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial \beta \, \partial a} \right ] &= \mathcal{I}_{\beta a} = -\frac{1}{c-a} \\
\beta > 1: \quad \operatorname{E}\left[- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial \beta \, \partial c} \right ] &= \mathcal{I}_{\beta c} = -\frac{\alpha}{(\beta-1)(c-a)}
\end{align}

The lower two diagonal entries of the Fisher information matrix, with respect to the parameter a (the minimum of the distribution's range), \mathcal{I}_{a a}, and with respect to the parameter c (the maximum of the distribution's range), \mathcal{I}_{c c}, are only defined for exponents α > 2 and β > 2 respectively. The Fisher information matrix component \mathcal{I}_{a a} for the minimum a approaches infinity for exponent α approaching 2 from above, and the Fisher information matrix component \mathcal{I}_{c c} for the maximum c approaches infinity for exponent β approaching 2 from above.

The Fisher information matrix for the four parameter case does not depend on the individual values of the minimum a and the maximum c, but only on the total range (c−a). Moreover, the components of the Fisher information matrix that depend on the range (c−a) depend only through its inverse (or the square of the inverse), such that the Fisher information decreases for increasing range (c−a).

The accompanying images show the Fisher information components involving the minimum a and maximum c; images for the components \mathcal{I}_{\alpha \alpha} and \mathcal{I}_{\beta \beta} are shown in the section on the geometric variances. All these Fisher information components look like a basin, with the "walls" of the basin located at low values of the parameters.

The following four-parameter-beta-distribution Fisher information components can be expressed in terms of the two-parameter X ~ Beta(α, β) expectations of the transformed ratio (1−X)/X and of its mirror image X/(1−X), scaled by the range (c−a), which may be helpful for interpretation:

:\mathcal{I}_{\alpha a} =\frac{\operatorname{E}\left[\frac{1-X}{X}\right]}{c-a}= \frac{\beta}{(\alpha-1)(c-a)} \text{ if }\alpha > 1
:\mathcal{I}_{\beta c} = -\frac{\operatorname{E}\left[\frac{X}{1-X}\right]}{c-a}=- \frac{\alpha}{(\beta-1)(c-a)}\text{ if }\beta> 1

These are also the expected values of the "inverted beta distribution" or beta prime distribution (also known as beta distribution of the second kind or Pearson's Type VI) and its mirror image, scaled by the range (c − a).

Also, the following Fisher information components can be expressed in terms of the harmonic (1/X) variances or of variances based on the ratio transformed variables ((1−X)/X) as follows:

:\begin{align}
\alpha > 2: \quad \mathcal{I}_{a a} &=\operatorname{var} \left [\frac{1}{X} \right] \left (\frac{\alpha-1}{c-a} \right )^2 =\operatorname{var} \left [\frac{1-X}{X} \right ] \left (\frac{\alpha-1}{c-a} \right)^2 = \frac{\beta(\alpha+\beta-1)}{(\alpha-2)(c-a)^2} \\
\beta > 2: \quad \mathcal{I}_{c c} &= \operatorname{var} \left [\frac{1}{1-X} \right ] \left (\frac{\beta-1}{c-a} \right )^2 = \operatorname{var} \left [\frac{X}{1-X} \right ] \left (\frac{\beta-1}{c-a} \right )^2 =\frac{\alpha(\alpha+\beta-1)}{(\beta-2)(c-a)^2} \\
\mathcal{I}_{a c} &=-\operatorname{cov} \left [\frac{1}{X},\frac{1}{1-X} \right ]\frac{(\alpha-1)(\beta-1)}{(c-a)^2} = -\operatorname{cov} \left [\frac{1-X}{X},\frac{X}{1-X} \right ] \frac{(\alpha-1)(\beta-1)}{(c-a)^2} =\frac{\alpha+\beta-1}{(c-a)^2}
\end{align}

See the section "Moments of linearly transformed, product and inverted random variables" for these expectations.

The determinant of Fisher's information matrix is of interest (for example, for the calculation of Jeffreys prior probability). From the expressions for the individual components, it follows that the determinant of Fisher's (symmetric) information matrix for the beta distribution with four parameters is the determinant of the 4×4 symmetric matrix built from the components listed above,

:\det(\mathcal{I}(\alpha,\beta,a,c)) = \det\begin{bmatrix}
\mathcal{I}_{\alpha \alpha} & \mathcal{I}_{\alpha \beta} & \mathcal{I}_{\alpha a} & \mathcal{I}_{\alpha c}\\
\mathcal{I}_{\alpha \beta} & \mathcal{I}_{\beta \beta} & \mathcal{I}_{\beta a} & \mathcal{I}_{\beta c}\\
\mathcal{I}_{\alpha a} & \mathcal{I}_{\beta a} & \mathcal{I}_{a a} & \mathcal{I}_{a c}\\
\mathcal{I}_{\alpha c} & \mathcal{I}_{\beta c} & \mathcal{I}_{a c} & \mathcal{I}_{c c}
\end{bmatrix}, \quad \text{for } \alpha, \beta> 2,

whose cofactor expansion yields a lengthy polynomial in these ten independent components.

Using Sylvester's criterion (checking whether the diagonal elements are all positive), and since the diagonal components \mathcal{I}_{a a} and \mathcal{I}_{c c} have singularities at α = 2 and β = 2, it follows that the Fisher information matrix for the four parameter case is positive-definite for α > 2 and β > 2. Since for α > 2 and β > 2 the beta distribution is (symmetric or unsymmetric) bell shaped, it follows that the Fisher information matrix is positive-definite only for bell-shaped (symmetric or unsymmetric) beta distributions, with inflection points located to either side of the mode. Thus, important well known distributions belonging to the four-parameter beta distribution family, like the parabolic distribution (Beta(2,2,a,c)) and the continuous uniform distribution (Beta(1,1,a,c)), have Fisher information components (\mathcal{I}_{a a},\mathcal{I}_{c c},\mathcal{I}_{\alpha a},\mathcal{I}_{\beta c}) that blow up (approach infinity) in the four-parameter case (although their Fisher information components are all defined for the two parameter case). The four-parameter Wigner semicircle distribution (Beta(3/2,3/2,a,c)) and arcsine distribution (Beta(1/2,1/2,a,c)) have negative Fisher information determinants for the four-parameter case.
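As a numerical illustration of the positive-definiteness claim, one can assemble the 4×4 per-observation Fisher information matrix from the component expressions listed above and inspect its eigenvalues; a sketch, assuming those component expressions and an arbitrary example range width:

  import numpy as np
  from scipy.special import polygamma

  def beta4_fisher_info(al, be, w):
      # Per-observation Fisher information of Beta(alpha, beta, a, c),
      # where w = c - a is the range width (alpha, beta > 2 assumed).
      tri = lambda z: polygamma(1, z)
      I_al_al = tri(al) - tri(al + be)                 # I_{alpha alpha}
      I_be_be = tri(be) - tri(al + be)                 # I_{beta beta}
      I_al_be = -tri(al + be)                          # I_{alpha beta}
      I_aa = be * (al + be - 1) / ((al - 2) * w**2)    # I_{a a}
      I_cc = al * (al + be - 1) / ((be - 2) * w**2)    # I_{c c}
      I_ac = (al + be - 1) / w**2                      # I_{a c}
      I_al_a = be / ((al - 1) * w)                     # I_{alpha a}
      I_al_c = 1.0 / w                                 # I_{alpha c}
      I_be_a = -1.0 / w                                # I_{beta a}
      I_be_c = -al / ((be - 1) * w)                    # I_{beta c}
      return np.array([
          [I_al_al, I_al_be, I_al_a, I_al_c],
          [I_al_be, I_be_be, I_be_a, I_be_c],
          [I_al_a,  I_be_a,  I_aa,   I_ac  ],
          [I_al_c,  I_be_c,  I_ac,   I_cc  ],
      ])

  M = beta4_fisher_info(3.0, 4.0, 2.0)
  print(np.linalg.eigvalsh(M))   # all eigenvalues positive for alpha, beta > 2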


Bayesian inference

The use of beta distributions in Bayesian inference is due to the fact that they provide a family of conjugate prior probability distributions for binomial (including Bernoulli) and geometric distributions. The domain of the beta distribution can be viewed as a probability, and in fact the beta distribution is often used to describe the distribution of a probability value p:

:P(p;\alpha,\beta) = \frac{p^{\alpha-1}(1-p)^{\beta-1}}{\Beta(\alpha,\beta)}.

Examples of beta distributions used as prior probabilities to represent ignorance of prior parameter values in Bayesian inference are Beta(1,1), Beta(0,0) and Beta(1/2,1/2).
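Conjugacy makes the posterior update a matter of parameter arithmetic: a Beta(α, β) prior combined with s successes and f failures under a binomial likelihood yields a Beta(α + s, β + f) posterior. A minimal sketch of this update:

  from scipy import stats

  def beta_binomial_update(alpha_prior, beta_prior, successes, failures):
      # Conjugate update: Beta prior + binomial likelihood -> Beta posterior.
      return alpha_prior + successes, beta_prior + failures

  a_post, b_post = beta_binomial_update(1.0, 1.0, successes=7, failures=3)
  posterior = stats.beta(a_post, b_post)
  print(a_post, b_post)       # Beta(8, 4)
  print(posterior.mean())     # 8 / 12 = 0.666...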


Rule of succession

A classic application of the beta distribution is the rule of succession, introduced in the 18th century by Pierre-Simon Laplace in the course of treating the sunrise problem. It states that, given s successes in n conditionally independent Bernoulli trials with probability p, the estimate of the expected value in the next trial is \frac{s+1}{n+2}. This estimate is the expected value of the posterior distribution over p, namely Beta(s+1, n−s+1), which is given by Bayes' rule if one assumes a uniform prior probability over p (i.e., Beta(1, 1)) and then observes that p generated s successes in n trials. Laplace's rule of succession has been criticized by prominent scientists. R. T. Cox described Laplace's application of the rule of succession to the sunrise problem (p. 89) as "a travesty of the proper use of the principle." Keynes remarks (Ch. XXX, p. 382) "indeed this is so foolish a theorem that to entertain it is discreditable." Karl Pearson showed that the probability that the next (n + 1) trials will be successes, after n successes in n trials, is only 50%, which has been considered too low by scientists like Jeffreys and unacceptable as a representation of the scientific process of experimentation to test a proposed scientific law. As pointed out by Jeffreys (p. 128), crediting C. D. Broad, Laplace's rule of succession establishes a high probability of success ((n+1)/(n+2)) in the next trial, but only a moderate probability (50%) that a further sample (n+1) comparable in size will be equally successful. As pointed out by Perks, "The rule of succession itself is hard to accept. It assigns a probability to the next trial which implies the assumption that the actual run observed is an average run and that we are always at the end of an average run. It would, one would think, be more reasonable to assume that we were in the middle of an average run. Clearly a higher value for both probabilities is necessary if they are to accord with reasonable belief." These problems with Laplace's rule of succession motivated Haldane, Perks, Jeffreys and others to search for other forms of prior probability (see the following sections). According to Jaynes, the main problem with the rule of succession is that it is not valid when s = 0 or s = n (see rule of succession, for an analysis of its validity).
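Both numbers quoted above are easy to verify with the conjugate update: after n successes in n trials under the uniform Beta(1, 1) prior, the posterior is Beta(n + 1, 1), the next-trial success probability is (n + 1)/(n + 2), and the probability that a further n + 1 trials are all successes is E[p^(n+1)] = (n + 1)/(2n + 2) = 1/2, Pearson's 50% result. A sketch:

  from math import prod

  def laplace_next_trial(n):
      # Rule of succession: posterior mean of Beta(n + 1, 1).
      return (n + 1) / (n + 2)

  def prob_next_m_successes(n, m):
      # E[p^m] under Beta(n + 1, 1): product over (n + 1 + k)/(n + 2 + k).
      return prod((n + 1 + k) / (n + 2 + k) for k in range(m))

  for n in (1, 10, 100):
      print(n, laplace_next_trial(n), prob_next_m_successes(n, n + 1))  # last column = 0.5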


Bayes-Laplace prior probability (Beta(1,1))

The beta distribution achieves maximum differential entropy for Beta(1,1): the uniform probability density, for which all values in the domain of the distribution have equal density. This uniform distribution Beta(1,1) was suggested ("with a great deal of doubt") by Thomas Bayes as the prior probability distribution to express ignorance about the correct prior distribution. This prior distribution was adopted (apparently, from his writings, with little sign of doubt) by Pierre-Simon Laplace, and hence it was also known as the "Bayes-Laplace rule" or the "Laplace rule" of "inverse probability" in publications of the first half of the 20th century. In the later part of the 19th century and early part of the 20th century, scientists realized that the assumption of uniform "equal" probability density depended on the actual functions (for example, whether a linear or a logarithmic scale was most appropriate) and parametrizations used. In particular, the behavior near the ends of distributions with finite support (for example near x = 0, for a distribution with initial support at x = 0) required particular attention. Keynes (Ch. XXX, p. 381) criticized the use of Bayes's uniform prior probability (Beta(1,1)), under which all values between zero and one are equiprobable, as follows: "Thus experience, if it shows anything, shows that there is a very marked clustering of statistical ratios in the neighborhoods of zero and unity, of those for positive theories and for correlations between positive qualities in the neighborhood of zero, and of those for negative theories and for correlations between negative qualities in the neighborhood of unity."


Haldane's prior probability (Beta(0,0))

The Beta(0,0) distribution was proposed by J. B. S. Haldane, who suggested that the prior probability representing complete uncertainty should be proportional to p^{−1}(1−p)^{−1}. The function p^{−1}(1−p)^{−1} can be viewed as the limit of the numerator of the beta distribution as both shape parameters approach zero: α, β → 0. The beta function (in the denominator of the beta distribution) approaches infinity for both parameters approaching zero, α, β → 0. Therefore, p^{−1}(1−p)^{−1} divided by the beta function approaches a 2-point Bernoulli distribution with equal probability 1/2 at each end, at 0 and 1, and nothing in between, as α, β → 0: a coin toss, with one face of the coin at 0 and the other face at 1. The Haldane prior probability distribution Beta(0,0) is an "improper prior" because its integration (from 0 to 1) fails to strictly converge to 1 due to the singularities at each end. However, this is not an issue for computing posterior probabilities unless the sample size is very small. Furthermore, Zellner points out that on the log-odds scale (the logit transformation ln(p/(1−p))), the Haldane prior is the uniformly flat prior. The fact that a uniform prior probability on the logit transformed variable ln(p/(1−p)) (with domain (−∞, ∞)) is equivalent to the Haldane prior on the domain [0, 1] was pointed out by Harold Jeffreys in the first edition (1939) of his book Theory of Probability (p. 123). Jeffreys writes "Certainly if we take the Bayes-Laplace rule right up to the extremes we are led to results that do not correspond to anybody's way of thinking. The (Haldane) rule dx/(x(1−x)) goes too far the other way. It would lead to the conclusion that if a sample is of one type with respect to some property there is a probability 1 that the whole population is of that type." The fact that "uniform" depends on the parametrization led Jeffreys to seek a form of prior that would be invariant under different parametrizations.
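The change-of-variables statement can be checked symbolically: if the prior is flat in the log-odds θ = ln(p/(1 − p)), the implied density in p is proportional to |dθ/dp| = 1/(p(1 − p)), which is the Haldane form. A sketch with sympy:

  import sympy as sp

  p = sp.symbols('p', positive=True)
  theta = sp.log(p / (1 - p))          # logit transformation
  jacobian = sp.simplify(sp.diff(theta, p))
  print(jacobian)                      # equals 1/(p*(1 - p)), the Haldane prior p**-1 * (1-p)**-1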


Jeffreys' prior probability (Beta(1/2,1/2) for a Bernoulli or for a binomial distribution)

Harold Jeffreys proposed to use an uninformative prior probability measure that should be invariant under reparameterization: proportional to the square root of the determinant of Fisher's information matrix. For the Bernoulli distribution, this can be shown as follows: for a coin that is "heads" with probability p ∈ [0, 1] and is "tails" with probability 1 − p, for a given (H, T) ∈ {(0,1), (1,0)} the probability is p^H(1 − p)^T. Since T = 1 − H, the Bernoulli distribution is p^H(1 − p)^{1 − H}. Considering p as the only parameter, it follows that the log likelihood for the Bernoulli distribution is

:\ln \mathcal{L} (p\mid H) = H \ln(p)+ (1-H) \ln(1-p).

The Fisher information matrix has only one component (it is a scalar, because there is only one parameter, p), therefore:

:\begin{align}
\sqrt{\mathcal{I}(p)} &= \sqrt{\operatorname{E}\left[ \left( \frac{d}{dp} \ln \mathcal{L}(p\mid H) \right)^2 \right]} \\[6pt]
&= \sqrt{\operatorname{E}\left[ \left( \frac{H}{p} - \frac{1-H}{1-p} \right)^2 \right]} \\[6pt]
&= \sqrt{p \left(\frac{1}{p}\right)^2 + (1-p)\left(\frac{1}{1-p}\right)^2 } \\[6pt]
&= \frac{1}{\sqrt{p(1-p)}}.
\end{align}

Similarly, for the binomial distribution with n Bernoulli trials, it can be shown that

:\sqrt{\mathcal{I}(p)}= \frac{\sqrt{n}}{\sqrt{p(1-p)}}.

Thus, for the Bernoulli and binomial distributions, Jeffreys prior is proportional to \frac{1}{\sqrt{p(1-p)}}, which happens to be proportional to a beta distribution with domain variable x = p, and shape parameters α = β = 1/2, the arcsine distribution:

:Beta(\tfrac{1}{2}, \tfrac{1}{2}) = \frac{1}{\pi \sqrt{p(1-p)}}.

It will be shown in the next section that the normalizing constant for Jeffreys prior is immaterial to the final result because the normalizing constant cancels out in Bayes theorem for the posterior probability. Hence Beta(1/2,1/2) is used as the Jeffreys prior for both Bernoulli and binomial distributions. As shown in the next section, when using this expression as a prior probability times the likelihood in Bayes theorem, the posterior probability turns out to be a beta distribution. It is important to realize, however, that Jeffreys prior is proportional to \frac{1}{\sqrt{p(1-p)}} for the Bernoulli and binomial distribution, but not for the beta distribution. Jeffreys prior for the beta distribution is given by the determinant of Fisher's information for the beta distribution, which, as shown in the section on the Fisher information matrix, is a function of the trigamma function ψ_1 of shape parameters α and β as follows:

: \begin{align}
\sqrt{\det(\mathcal{I}(\alpha, \beta))} &= \sqrt{\psi_1(\alpha)\psi_1(\beta)-(\psi_1(\alpha)+\psi_1(\beta))\psi_1(\alpha + \beta)} \\
\lim_{\alpha\to 0} \sqrt{\det(\mathcal{I}(\alpha, \beta))} &=\lim_{\beta \to 0} \sqrt{\det(\mathcal{I}(\alpha, \beta))} = \infty\\
\lim_{\alpha\to \infty} \sqrt{\det(\mathcal{I}(\alpha, \beta))} &=\lim_{\beta \to \infty} \sqrt{\det(\mathcal{I}(\alpha, \beta))} = 0
\end{align}

As previously discussed, Jeffreys prior for the Bernoulli and binomial distributions is proportional to the arcsine distribution Beta(1/2,1/2), a one-dimensional curve that looks like a basin as a function of the parameter p of the Bernoulli and binomial distributions. The walls of the basin are formed by p approaching the singularities at the ends p → 0 and p → 1, where Beta(1/2,1/2) approaches infinity. Jeffreys prior for the beta distribution is a 2-dimensional surface (embedded in a three-dimensional space) that looks like a basin with only two of its walls meeting at the corner α = β = 0 (and missing the other two walls) as a function of the shape parameters α and β of the beta distribution. The two adjoining walls of this 2-dimensional surface are formed by the shape parameters α and β approaching the singularities (of the trigamma function) at α, β → 0. It has no walls for α, β → ∞ because in this case the determinant of Fisher's information matrix for the beta distribution approaches zero.

It will be shown in the next section that Jeffreys prior probability results in posterior probabilities (when multiplied by the binomial likelihood function) that are intermediate between the posterior probability results of the Haldane and Bayes prior probabilities.

Jeffreys prior may be difficult to obtain analytically, and for some cases it simply does not exist (even for simple distribution functions like the asymmetric triangular distribution). Berger, Bernardo and Sun, in a 2009 paper, defined a reference prior probability distribution that (unlike Jeffreys prior) exists for the asymmetric triangular distribution. They cannot obtain a closed-form expression for their reference prior, but numerical calculations show it to be nearly perfectly fitted by the (proper) prior

: \operatorname{Beta}(\tfrac{1}{2}, \tfrac{1}{2}) \sim\frac{1}{\sqrt{\theta(1-\theta)}}

where θ is the vertex variable for the asymmetric triangular distribution with support [0, 1] (corresponding to the following parameter values in Wikipedia's article on the triangular distribution: vertex c = θ, left end a = 0, and right end b = 1). Berger et al. also give a heuristic argument that Beta(1/2,1/2) could indeed be the exact Berger–Bernardo–Sun reference prior for the asymmetric triangular distribution. Therefore, Beta(1/2,1/2) not only is Jeffreys prior for the Bernoulli and binomial distributions, but also seems to be the Berger–Bernardo–Sun reference prior for the asymmetric triangular distribution (for which the Jeffreys prior does not exist), a distribution used in project management and PERT analysis to describe the cost and duration of project tasks.

Clarke and Barron prove that, among continuous positive priors, Jeffreys prior (when it exists) asymptotically maximizes Shannon's mutual information between a sample of size n and the parameter, and therefore Jeffreys prior is the most uninformative prior (measuring information as Shannon information). The proof rests on an examination of the Kullback–Leibler divergence between probability density functions for iid random variables.
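Numerically, the square root of the Bernoulli Fisher information and the Beta(1/2, 1/2) density differ only by the normalizing constant 1/π, which is all that "proportional to" means here. A sketch:

  import numpy as np
  from scipy import stats

  p = np.linspace(0.05, 0.95, 7)
  sqrt_fisher = 1.0 / np.sqrt(p * (1.0 - p))     # sqrt of Bernoulli Fisher information
  jeffreys_pdf = stats.beta(0.5, 0.5).pdf(p)     # arcsine density 1/(pi*sqrt(p(1-p)))
  print(sqrt_fisher / jeffreys_pdf)              # constant ratio, equal to pi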


Effect of different prior probability choices on the posterior beta distribution

If samples are drawn from the population of a random variable X that result in s successes and f failures in n Bernoulli trials, n = s + f, then the likelihood function for parameters s and f given x = p (the notation x = p in the expressions below will emphasize that the domain x stands for the value of the parameter p in the binomial distribution), is the following binomial distribution:

:\mathcal{L}(s,f\mid x=p) = {s+f \choose s} x^s(1-x)^f = {n \choose s} x^s(1-x)^{n-s}.

If beliefs about prior probability information are reasonably well approximated by a beta distribution with parameters α Prior and β Prior, then:

:\operatorname{PriorProbability}(x=p;\alpha \operatorname{Prior},\beta \operatorname{Prior}) = \frac{x^{\alpha \operatorname{Prior}-1}(1-x)^{\beta \operatorname{Prior}-1}}{\Beta(\alpha \operatorname{Prior},\beta \operatorname{Prior})}

According to Bayes' theorem for a continuous event space, the posterior probability is given by the product of the prior probability and the likelihood function (given the evidence s and f = n − s), normalized so that the area under the curve equals one, as follows:

:\begin{align}
& \operatorname{posterior probability}(x=p\mid s,n-s) \\[6pt]
={} & \frac{\operatorname{PriorProbability}(x=p;\alpha \operatorname{Prior},\beta \operatorname{Prior})\, \mathcal{L}(s,f\mid x=p)}{\int_0^1 \operatorname{PriorProbability}(x=p;\alpha \operatorname{Prior},\beta \operatorname{Prior})\, \mathcal{L}(s,f\mid x=p) \, dx} \\[6pt]
={} & \frac{{n \choose s} x^{s+\alpha \operatorname{Prior}-1}(1-x)^{n-s+\beta \operatorname{Prior}-1} / \Beta(\alpha \operatorname{Prior},\beta \operatorname{Prior})}{\int_0^1 \left({n \choose s} x^{s+\alpha \operatorname{Prior}-1}(1-x)^{n-s+\beta \operatorname{Prior}-1}/\Beta(\alpha \operatorname{Prior},\beta \operatorname{Prior}) \right) dx} \\[6pt]
={} & \frac{x^{s+\alpha \operatorname{Prior}-1}(1-x)^{n-s+\beta \operatorname{Prior}-1}}{\int_0^1 x^{s+\alpha \operatorname{Prior}-1}(1-x)^{n-s+\beta \operatorname{Prior}-1}\, dx} \\[6pt]
={} & \frac{x^{s+\alpha \operatorname{Prior}-1}(1-x)^{n-s+\beta \operatorname{Prior}-1}}{\Beta(s+\alpha \operatorname{Prior},n-s+\beta \operatorname{Prior})}.
\end{align}

The binomial coefficient

:{s+f \choose s}=\frac{(s+f)!}{s!\,f!}=\frac{n!}{s!\,(n-s)!}
appears both in the numerator and the denominator of the posterior probability, and it does not depend on the integration variable x, hence it cancels out and is irrelevant to the final result. Similarly, the normalizing factor for the prior probability, the beta function B(αPrior, βPrior), cancels out and is immaterial to the final result. The same posterior probability result can be obtained if one uses an un-normalized prior

:x^{\alpha \operatorname{Prior}-1}(1-x)^{\beta \operatorname{Prior}-1}

because the normalizing factors all cancel out. Several authors (including Jeffreys himself) thus use an un-normalized prior formula, since the normalization constant cancels out. The numerator of the posterior probability ends up being just the (un-normalized) product of the prior probability and the likelihood function, and the denominator is its integral from zero to one. The beta function in the denominator, B(s + αPrior, n − s + βPrior), appears as a normalization constant to ensure that the total posterior probability integrates to unity.

The ratio s/n of the number of successes to the total number of trials is a sufficient statistic in the binomial case, which is relevant for the following results.

For the Bayes prior probability (Beta(1,1)), the posterior probability is:

:\operatorname{posterior probability}(p=x\mid s,f) = \frac{x^{s}(1-x)^{n-s}}{\Beta(s+1,n-s+1)}, \text{ with mean }=\frac{s+1}{n+2}\text{ (and mode }=\frac{s}{n}\text{ if } 0 < s < n).

For the Jeffreys prior probability (Beta(1/2,1/2)), the posterior probability is:

:\operatorname{posterior probability}(p=x\mid s,f) = \frac{x^{s-\tfrac{1}{2}}(1-x)^{n-s-\tfrac{1}{2}}}{\Beta(s+\tfrac{1}{2},n-s+\tfrac{1}{2})}, \text{ with mean } = \frac{s+\tfrac{1}{2}}{n+1}\text{ (and mode }= \frac{s-\tfrac{1}{2}}{n-1}\text{ if } \tfrac{1}{2} < s < n-\tfrac{1}{2}).

and for the Haldane prior probability (Beta(0,0)), the posterior probability is:

:\operatorname{posterior probability}(p=x\mid s,f) = \frac{x^{s-1}(1-x)^{n-s-1}}{\Beta(s,n-s)}, \text{ with mean } = \frac{s}{n}\text{ (and mode }= \frac{s-1}{n-2}\text{ if } 1 < s < n -1).

From the above expressions it follows that for s/n = 1/2 all three prior probabilities result in the identical location for the posterior probability mean = mode = 1/2. For s/n < 1/2, the means of the posterior probabilities, using these priors, are such that: mean for Bayes prior > mean for Jeffreys prior > mean for Haldane prior. For s/n > 1/2 the order of these inequalities is reversed, such that the Haldane prior probability results in the largest posterior mean. The Haldane prior probability Beta(0,0) results in a posterior probability density with mean (the expected value for the probability of success in the "next" trial) identical to the ratio s/n of the number of successes to the total number of trials. Therefore, the Haldane prior results in a posterior probability with expected value in the next trial equal to the maximum likelihood estimate. The Bayes prior probability Beta(1,1) results in a posterior probability density with mode identical to the ratio s/n (the maximum likelihood estimate).

In the case that 100% of the trials have been successful, s = n, the Bayes prior probability Beta(1,1) results in a posterior expected value equal to the rule of succession (n + 1)/(n + 2), while the Haldane prior Beta(0,0) results in a posterior expected value of 1 (absolute certainty of success in the next trial). Jeffreys prior probability results in a posterior expected value equal to (n + 1/2)/(n + 1). Perks (p. 303) points out: "This provides a new rule of succession and expresses a 'reasonable' position to take up, namely, that after an unbroken run of n successes we assume a probability for the next trial equivalent to the assumption that we are about half-way through an average run, i.e. that we expect a failure once in (2n + 2) trials. The Bayes–Laplace rule implies that we are about at the end of an average run or that we expect a failure once in (n + 2) trials. The comparison clearly favours the new result (what is now called Jeffreys prior) from the point of view of 'reasonableness'."

Conversely, in the case that 100% of the trials have resulted in failure (s = 0), the Bayes prior probability Beta(1,1) results in a posterior expected value for success in the next trial equal to 1/(n + 2), while the Haldane prior Beta(0,0) results in a posterior expected value of success in the next trial of 0 (absolute certainty of failure in the next trial). Jeffreys prior probability results in a posterior expected value for success in the next trial equal to (1/2)/(n + 1), which Perks (p. 303) points out "is a much more reasonably remote result than the Bayes-Laplace result 1/(n + 2)".

Jaynes questions (for the uniform prior Beta(1,1)) the use of these formulas for the cases s = 0 or s = n because the integrals do not converge (Beta(1,1) is an improper prior for s = 0 or s = n). In practice, the conditions 0 < s < n are usually met. Perks (p. 303) shows that, for what is now known as the Jeffreys prior, the probability that a further n + 1 trials will all be successes, after n successes in n trials, is ((n + 1/2)/(n + 1))((n + 3/2)/(n + 2))...((2n + 1/2)/(2n + 1)), which for n = 1, 2, 3 gives 15/24, 315/480, 9009/13440, rapidly approaching a limiting value of 1/\sqrt{2} = 0.70710678\ldots as n tends to infinity. Perks remarks that what is now known as the Jeffreys prior "is clearly more 'reasonable' than either the Bayes-Laplace result or the result on the (Haldane) alternative rule rejected by Jeffreys which gives certainty as the probability. It clearly provides a very much better correspondence with the process of induction. Whether it is 'absolutely' reasonable for the purpose, i.e. whether it is yet large enough, without the absurdity of reaching unity, is a matter for others to decide. But it must be realized that the result depends on the assumption of complete indifference and absence of knowledge prior to the sampling experiment."
Following are the variances of the posterior distribution obtained with these three prior probability distributions.

For the Bayes prior probability (Beta(1,1)), the posterior variance is:

:\text{variance} = \frac{(s+1)(n-s+1)}{(n+3)(n+2)^2},\text{ which for } s=\frac{n}{2} \text{ results in variance} =\frac{1}{12+4n}

For the Jeffreys prior probability (Beta(1/2,1/2)), the posterior variance is:

:\text{variance} = \frac{(s+\frac{1}{2})(n-s+\frac{1}{2})}{(n+2)(n+1)^2},\text{ which for } s=\frac{n}{2} \text{ results in variance} = \frac{1}{8+4n}

and for the Haldane prior probability (Beta(0,0)), the posterior variance is:

:\text{variance} = \frac{s(n-s)}{n^2(n+1)}, \text{ which for } s=\frac{n}{2} \text{ results in variance} =\frac{1}{4+4n}

So, as remarked by Silvey, for large n the variance is small and hence the posterior distribution is highly concentrated, whereas the assumed prior distribution was very diffuse. This is in accord with what one would hope for, as vague prior knowledge is transformed (through Bayes theorem) into more precise posterior knowledge by an informative experiment. For small n, the Haldane Beta(0,0) prior results in the largest posterior variance, while the Bayes Beta(1,1) prior results in the most concentrated posterior. The Jeffreys prior Beta(1/2,1/2) results in a posterior variance in between the other two. As n increases, the variance rapidly decreases so that the posterior variance for all three priors converges to approximately the same value (approaching zero variance as n → ∞). Recalling the previous result that the Haldane prior probability Beta(0,0) results in a posterior probability density with mean (the expected value for the probability of success in the "next" trial) identical to the ratio s/n of the number of successes to the total number of trials, it follows from the above expression that also the Haldane prior Beta(0,0) results in a posterior with variance identical to the variance expressed in terms of the maximum likelihood estimate s/n and the sample size (see the section titled "Mean and sample size"):

:\text{variance} = \frac{\mu(1-\mu)}{1 + \nu}= \frac{\frac{s}{n}\left(1 - \frac{s}{n}\right)}{1 + n}

with the mean μ = s/n and the sample size ν = n.

In Bayesian inference, using a prior distribution Beta(αPrior, βPrior) prior to a binomial distribution is equivalent to adding (αPrior − 1) pseudo-observations of "success" and (βPrior − 1) pseudo-observations of "failure" to the actual number of successes and failures observed, then estimating the parameter p of the binomial distribution by the proportion of successes over both real and pseudo-observations. A uniform prior Beta(1,1) does not add (or subtract) any pseudo-observations since for Beta(1,1) it follows that (αPrior − 1) = 0 and (βPrior − 1) = 0. The Haldane prior Beta(0,0) subtracts one pseudo-observation from each and the Jeffreys prior Beta(1/2,1/2) subtracts 1/2 pseudo-observation of success and an equal number of failure. This subtraction has the effect of smoothing out the posterior distribution. If the proportion of successes is not 50% (s/n ≠ 1/2), values of αPrior and βPrior less than 1 (and therefore negative (αPrior − 1) and (βPrior − 1)) favor sparsity, i.e. distributions where the parameter p is closer to either 0 or 1. In effect, values of αPrior and βPrior between 0 and 1, when operating together, function as a concentration parameter.
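The posterior means and variances quoted above for the three priors follow from the generic Beta(s + αPrior, n − s + βPrior) posterior; a small sketch that tabulates them (the example values of s and n are arbitrary):

  PRIORS = {"Bayes (1,1)": (1.0, 1.0),
            "Jeffreys (1/2,1/2)": (0.5, 0.5),
            "Haldane (0,0)": (0.0, 0.0)}

  def posterior_mean_var(s, n, a_prior, b_prior):
      # Posterior is Beta(s + a_prior, n - s + b_prior).
      a, b = s + a_prior, n - s + b_prior
      mean = a / (a + b)
      var = a * b / ((a + b) ** 2 * (a + b + 1.0))
      return mean, var

  s, n = 3, 10
  for name, (a0, b0) in PRIORS.items():
      print(name, posterior_mean_var(s, n, a0, b0))
  # Haldane mean equals s/n = 0.3; Bayes and Jeffreys means are pulled toward 1/2.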
The accompanying plots show the posterior probability density functions for sample sizes ''n'' ∈ , successes ''s'' ∈  and Beta(''α''Prior, ''β''Prior) ∈ . Also shown are the cases for ''n'' = , success ''s'' =  and Beta(''α''Prior, ''β''Prior) ∈ . The first plot shows the symmetric cases, for successes ''s'' ∈ , with mean = mode = 1/2, and the second plot shows the skewed cases ''s'' ∈ . The images show that there is little difference between the priors for the posterior with sample size of 50 (characterized by a more pronounced peak near ''p'' = 1/2). Significant differences appear for very small sample sizes (in particular for the flatter distribution for the degenerate case of sample size = 3). Therefore, the skewed cases, with successes ''s'' = , show a larger effect from the choice of prior, at small sample size, than the symmetric cases. For symmetric distributions, the Bayes prior Beta(1,1) results in the most "peaky" and highest posterior distributions and the Haldane prior Beta(0,0) results in the flattest and lowest-peaked distribution. The Jeffreys prior Beta(1/2,1/2) lies in between them. For nearly symmetric, not too skewed distributions the effect of the priors is similar. For very small sample size (in this case for a sample size of 3) and skewed distribution (in this example for ''s'' ∈ ) the Haldane prior can result in a reverse-J-shaped distribution with a singularity at the left end. However, this happens only in degenerate cases (in this example ''n'' = 3 and hence ''s'' = 3/4 < 1, a degenerate value because s should be greater than unity in order for the posterior of the Haldane prior to have a mode located between the ends, and because ''s'' = 3/4 is not an integer number, hence it violates the initial assumption of a binomial distribution for the likelihood) and it is not an issue in generic cases of reasonable sample size (such that the condition 1 < ''s'' < ''n'' − 1, necessary for a mode to exist between both ends, is fulfilled).

In Chapter 12 (p. 385) of his book, Jaynes asserts that the ''Haldane prior'' Beta(0,0) describes a ''prior state of knowledge of complete ignorance'', where we are not even sure whether it is physically possible for an experiment to yield either a success or a failure, while the ''Bayes (uniform) prior Beta(1,1) applies if'' one knows that ''both binary outcomes are possible''. Jaynes states: "''interpret the Bayes-Laplace (Beta(1,1)) prior as describing not a state of complete ignorance'', but the state of knowledge in which we have observed one success and one failure...once we have seen at least one success and one failure, then we know that the experiment is a true binary one, in the sense of physical possibility." Jaynes does not specifically discuss Jeffreys prior Beta(1/2,1/2) (Jaynes' discussion of "Jeffreys prior" on pp. 181, 423 and in chapter 12 of his book refers instead to the improper, un-normalized, prior "1/''p'' ''dp''" introduced by Jeffreys in the 1939 edition of his book, seven years before he introduced what is now known as Jeffreys' invariant prior: the square root of the determinant of Fisher's information matrix. ''"1/p" is Jeffreys' (1946) invariant prior for the exponential distribution, not for the Bernoulli or binomial distributions''). However, it follows from the above discussion that the Jeffreys Beta(1/2,1/2) prior represents a state of knowledge in between the Haldane Beta(0,0) and Bayes Beta(1,1) priors.

Similarly, Karl Pearson in his 1892 book The Grammar of Science (p.
144 of the 1900 edition) maintained that the Bayes (Beta(1,1)) uniform prior was not a complete ignorance prior, and that it should be used only when prior information justified "distributing our ignorance equally". K. Pearson wrote: "Yet the only supposition that we appear to have made is this: that, knowing nothing of nature, routine and anomy (from the Greek ανομία, namely: a- "without", and nomos "law") are to be considered as equally likely to occur. Now we were not really justified in making even this assumption, for it involves a knowledge that we do not possess regarding nature. We use our ''experience'' of the constitution and action of coins in general to assert that heads and tails are equally probable, but we have no right to assert before experience that, as we know nothing of nature, routine and breach are equally probable. In our ignorance we ought to consider before experience that nature may consist of all routines, all anomies (normlessness), or a mixture of the two in any proportion whatever, and that all such are equally probable. Which of these constitutions after experience is the most probable must clearly depend on what that experience has been like."

If there is sufficient sampling data, ''and the posterior probability mode is not located at one of the extremes of the domain'' (x = 0 or x = 1), the three priors of Bayes (Beta(1,1)), Jeffreys (Beta(1/2,1/2)) and Haldane (Beta(0,0)) should yield similar ''posterior'' probability densities. Otherwise, as Gelman et al. (p. 65) point out, "if so few data are available that the choice of noninformative prior distribution makes a difference, one should put relevant information into the prior distribution", or as Berger (p. 125) points out, "when different reasonable priors yield substantially different answers, can it be right to state that there ''is'' a single answer? Would it not be better to admit that there is scientific uncertainty, with the conclusion depending on prior beliefs?"


Occurrence and applications


Order statistics

The beta distribution has an important application in the theory of order statistics. A basic result is that the distribution of the ''k''th smallest of a sample of size ''n'' from a continuous uniform distribution has a beta distribution.David, H. A., Nagaraja, H. N. (2003) ''Order Statistics'' (3rd Edition). Wiley, New Jersey pp 458. This result is summarized as:

:U_{(k)} \sim \operatorname{Beta}(k,n+1-k).

From this, and application of the theory related to the probability integral transform, the distribution of any individual order statistic from any continuous distribution can be derived.
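As a quick numerical check of this result, the sketch below (Python; the values of n and k are arbitrary choices) compares the empirical distribution of the k-th smallest of n uniform variates with Beta(k, n + 1 − k) using a Kolmogorov-Smirnov test.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, k = 10, 3                          # sample size and order (illustrative choices)
u = rng.uniform(size=(100_000, n))
kth = np.sort(u, axis=1)[:, k - 1]    # k-th smallest of each row of n uniforms

# Compare with the claimed Beta(k, n+1-k) law
print(stats.kstest(kth, stats.beta(k, n + 1 - k).cdf))
print("empirical mean:", kth.mean(), " theoretical mean:", k / (n + 1))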


Subjective logic

In standard logic, propositions are considered to be either true or false. In contradistinction, subjective logic assumes that humans cannot determine with absolute certainty whether a proposition about the real world is absolutely true or false. In subjective logic the a posteriori probability estimates of binary events can be represented by beta distributions.A. Jøsang. A Logic for Uncertain Probabilities. ''International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems.'' 9(3), pp. 279-311, June 2001.


Wavelet analysis

A wavelet is a wave-like oscillation with an amplitude that starts out at zero, increases, and then decreases back to zero. It can typically be visualized as a "brief oscillation" that promptly decays. Wavelets can be used to extract information from many different kinds of data, including – but certainly not limited to – audio signals and images. Thus, wavelets are purposefully crafted to have specific properties that make them useful for signal processing. Wavelets are localized in both time and frequency whereas the standard Fourier transform is only localized in frequency. Therefore, standard Fourier transforms are only applicable to stationary processes, while wavelets are applicable to non-stationary processes. Continuous wavelets can be constructed based on the beta distribution. Beta waveletsH.M. de Oliveira and G.A.A. Araújo. Compactly Supported One-cyclic Wavelets Derived from Beta Distributions. ''Journal of Communication and Information Systems.'' vol. 20, n. 3, pp. 27-33, 2005. can be viewed as a soft variety of Haar wavelets whose shape is fine-tuned by two shape parameters α and β.


Population genetics

The Balding–Nichols model is a two-parameter parametrization of the beta distribution used in population genetics. It is a statistical description of the allele frequencies in the components of a sub-divided population:

: \begin{align}
    \alpha &= \mu \nu,\\
    \beta  &= (1 - \mu) \nu,
  \end{align}

where \nu = \alpha+\beta = \frac{1-F}{F} and 0 < F < 1; here ''F'' is (Wright's) genetic distance between two populations.


Project management: task cost and schedule modeling

The beta distribution can be used to model events which are constrained to take place within an interval defined by a minimum and maximum value. For this reason, the beta distribution — along with the triangular distribution — is used extensively in PERT, critical path method (CPM), Joint Cost Schedule Modeling (JCSM) and other project management/control systems to describe the time to completion and the cost of a task. In project management, shorthand computations are widely used to estimate the mean and standard deviation of the beta distribution:

: \begin{align}
  \mu(X) & = \frac{a + 4b + c}{6} \\
  \sigma(X) & = \frac{c - a}{6}
\end{align}

where ''a'' is the minimum, ''c'' is the maximum, and ''b'' is the most likely value (the mode for ''α'' > 1 and ''β'' > 1).

The above estimate for the mean \mu(X)= \frac{a + 4b + c}{6} is known as the PERT three-point estimation and it is exact for either of the following values of ''β'' (for arbitrary α within these ranges):

:''β'' = ''α'' > 1 (symmetric case) with standard deviation \sigma(X) = \frac{c - a}{2\sqrt{2\alpha + 1}}, skewness = 0, and excess kurtosis = -\frac{6}{2\alpha + 3}

or

:''β'' = 6 − ''α'' for 5 > ''α'' > 1 (skewed case) with standard deviation

:\sigma(X) = \frac{(c - a)\sqrt{\alpha(6 - \alpha)}}{6\sqrt{7}},

skewness = \frac{(3 - \alpha)\sqrt{7}}{2\sqrt{\alpha(6 - \alpha)}}, and excess kurtosis = \frac{21}{\alpha(6 - \alpha)} - 3

The above estimate for the standard deviation ''σ''(''X'') = (''c'' − ''a'')/6 is exact for either of the following values of ''α'' and ''β'':

:''α'' = ''β'' = 4 (symmetric) with skewness = 0, and excess kurtosis = −6/11.
:''β'' = 6 − ''α'' and \alpha = 3 - \sqrt2 (right-tailed, positive skew) with skewness = \frac{1}{\sqrt{2}}, and excess kurtosis = 0
:''β'' = 6 − ''α'' and \alpha = 3 + \sqrt2 (left-tailed, negative skew) with skewness = -\frac{1}{\sqrt{2}}, and excess kurtosis = 0

Otherwise, these can be poor approximations for beta distributions with other values of α and β, exhibiting average errors of 40% in the mean and 549% in the variance.
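The sketch below (Python; the endpoints and shape values are arbitrary placeholders) evaluates the PERT shorthand against the exact moments of a four-parameter beta with α = β = 4 rescaled to [a, c], one of the cases listed above for which both shorthand formulas are exact.

from scipy import stats

a, c = 2.0, 14.0                  # assumed minimum and maximum (illustrative values)
alpha = beta = 4.0                # symmetric case in which the shorthand is exact
b = a + (c - a) * (alpha - 1) / (alpha + beta - 2)   # mode of the rescaled beta

pert_mean = (a + 4 * b + c) / 6
pert_std = (c - a) / 6

exact = stats.beta(alpha, beta, loc=a, scale=c - a)  # four-parameter beta on [a, c]
print("PERT mean", pert_mean, "exact mean", exact.mean())
print("PERT std ", pert_std, "exact std ", exact.std())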


Random variate generation

If ''X'' and ''Y'' are independent, with X \sim \Gamma(\alpha, \theta) and Y \sim \Gamma(\beta, \theta) then

:\frac{X}{X+Y} \sim \Beta(\alpha, \beta).

So one algorithm for generating beta variates is to generate \frac{X}{X+Y}, where ''X'' is a gamma variate with parameters (α, 1) and ''Y'' is an independent gamma variate with parameters (β, 1). In fact, here \frac{X}{X+Y} and X+Y are independent, and X+Y \sim \Gamma(\alpha + \beta, \theta). If Z \sim \Gamma(\gamma, \theta) and Z is independent of X and Y, then \frac{X+Y}{X+Y+Z} \sim \Beta(\alpha+\beta,\gamma) and \frac{X+Y}{X+Y+Z} is independent of \frac{X}{X+Y}. This shows that the product of independent \Beta(\alpha,\beta) and \Beta(\alpha+\beta,\gamma) random variables is a \Beta(\alpha,\beta+\gamma) random variable.

Also, the ''k''th order statistic of ''n'' uniformly distributed variates is \Beta(k, n+1-k), so an alternative if α and β are small integers is to generate α + β − 1 uniform variates and choose the α-th smallest.

Another way to generate the beta distribution is by the Pólya urn model. According to this method, one starts with an "urn" with α "black" balls and β "white" balls and draws uniformly with replacement. On every trial an additional ball is added according to the color of the last ball which was drawn. Asymptotically, the proportion of black and white balls will be distributed according to the beta distribution, where each repetition of the experiment will produce a different value.

It is also possible to use inverse transform sampling.
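A minimal sketch of the gamma-ratio construction (Python/numpy; the shape parameters are arbitrary examples), compared against numpy's built-in beta sampler by matching the first two sample moments:

import numpy as np

rng = np.random.default_rng(1)
alpha, beta = 2.5, 0.7            # illustrative shape parameters
size = 200_000

x = rng.gamma(alpha, 1.0, size)   # X ~ Gamma(alpha, 1)
y = rng.gamma(beta, 1.0, size)    # Y ~ Gamma(beta, 1), independent of X
b_ratio = x / (x + y)             # claimed Beta(alpha, beta) variates
b_direct = rng.beta(alpha, beta, size)

print("means:", b_ratio.mean(), b_direct.mean(), alpha / (alpha + beta))
print("vars: ", b_ratio.var(), b_direct.var(),
      alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1)))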


History

Thomas Bayes, in a posthumous paper published in 1763 by Richard Price, obtained a beta distribution as the density of the probability of success in Bernoulli trials (see ), but the paper does not analyze any of the moments of the beta distribution or discuss any of its properties.

The first systematic modern discussion of the beta distribution is probably due to Karl Pearson. In Pearson's papers the beta distribution is couched as a solution of a differential equation: Pearson's Type I distribution, to which it is essentially identical except for arbitrary shifting and re-scaling (the beta and Pearson Type I distributions can always be equalized by proper choice of parameters). In fact, in several English books and journal articles in the few decades prior to World War II, it was common to refer to the beta distribution as Pearson's Type I distribution. William P. Elderton in his 1906 monograph "Frequency curves and correlation" further analyzes the beta distribution as Pearson's Type I distribution, including a full discussion of the method of moments for the four-parameter case, and diagrams of (what Elderton describes as) U-shaped, J-shaped, twisted J-shaped, "cocked-hat" shapes, horizontal and angled straight-line cases. Elderton wrote "I am chiefly indebted to Professor Pearson, but the indebtedness is of a kind for which it is impossible to offer formal thanks." Elderton in his 1906 monograph provides an impressive amount of information on the beta distribution, including equations for the origin of the distribution chosen to be the mode, as well as for other Pearson distributions: types I through VII. Elderton also included a number of appendixes, including one appendix ("II") on the beta and gamma functions. In later editions, Elderton added equations for the origin of the distribution chosen to be the mean, and analysis of Pearson distributions VIII through XII.

As remarked by Bowman and Shenton, "Fisher and Pearson had a difference of opinion in the approach to (parameter) estimation, in particular relating to (Pearson's method of) moments and (Fisher's method of) maximum likelihood in the case of the Beta distribution." Also according to Bowman and Shenton, "the case of a Type I (beta distribution) model being the center of the controversy was pure serendipity. A more difficult model of 4 parameters would have been hard to find." The long-running public conflict of Fisher with Karl Pearson can be followed in a number of articles in prestigious journals. For example, concerning the estimation of the four parameters for the beta distribution, and Fisher's criticism of Pearson's method of moments as being arbitrary, see Pearson's article "Method of moments and method of maximum likelihood" (published three years after his retirement from University College, London, where his position had been divided between Fisher and Pearson's son Egon), in which Pearson writes "I read (Koshai's paper in the Journal of the Royal Statistical Society, 1933) which as far as I am aware is the only case at present published of the application of Professor Fisher's method. To my astonishment that method depends on first working out the constants of the frequency curve by the (Pearson) Method of Moments and then superposing on it, by what Fisher terms "the Method of Maximum Likelihood" a further approximation to obtain, what he holds, he will thus get, "more efficient values" of the curve constants."
David and Edwards's treatise on the history of statistics cites the first modern treatment of the beta distribution, in 1911, using the beta designation that has become standard, due to Corrado Gini, an Italian statistician, demographer, and sociologist, who developed the Gini coefficient. N. L. Johnson and S. Kotz, in their comprehensive and very informative monograph on leading historical personalities in statistical sciences, credit Corrado Gini as "an early Bayesian...who dealt with the problem of eliciting the parameters of an initial Beta distribution, by singling out techniques which anticipated the advent of the so-called empirical Bayes approach."


References


External links


"Beta_Distribution"
by Fiona Maclachlan, the Wolfram Demonstrations Project, 2007.
Beta Distribution – Overview and Example
 xycoon.com

 brighton-webs.co.uk

 exstrom.com
Harvard University Statistics 110 Lecture 23 Beta Distribution, Prof. Joe Blitzstein


Mean absolute deviation around the mean

:\operatorname{E}[|X - \operatorname{E}[X]|] = \frac{2\alpha^\alpha \beta^\beta}{\Beta(\alpha,\beta)(\alpha+\beta)^{\alpha+\beta+1}}

The mean absolute deviation around the mean is a more
robust
estimator
of statistical dispersion than the standard deviation for beta distributions with tails and inflection points at each side of the mode, Beta(''α'', ''β'') distributions with ''α'', ''β'' > 2, as it depends on the linear (absolute) deviations rather than the square deviations from the mean. Therefore, the effect of very large deviations from the mean is not as overly weighted.

Using Stirling's approximation to the Gamma function, N. L. Johnson and S. Kotz derived the following approximation for values of the shape parameters greater than unity (the relative error for this approximation is only −3.5% for ''α'' = ''β'' = 1, and it decreases to zero as ''α'' → ∞, ''β'' → ∞):

: \begin{align}
\frac{\text{mean abs. dev. from mean}}{\text{standard deviation}} &=\frac{\operatorname{E}[|X - \operatorname{E}[X]|]}{\sqrt{\operatorname{var}(X)}}\\
&\approx \sqrt{\frac{2}{\pi}} \left(1+\frac-\frac-\frac \right), \text{ if } \alpha, \beta > 1.
\end{align}

At the limit α → ∞, β → ∞, the ratio of the mean absolute deviation to the standard deviation (for the beta distribution) becomes equal to the ratio of the same measures for the normal distribution: \sqrt{\frac{2}{\pi}}. For α = β = 1 this ratio equals \frac{\sqrt{3}}{2}, so that from α = β = 1 to α, β → ∞ the ratio decreases by 8.5%. For α = β = 0 the standard deviation is exactly equal to the mean absolute deviation around the mean. Therefore, this ratio decreases by 15% from α = β = 0 to α = β = 1, and by 25% from α = β = 0 to α, β → ∞. However, for skewed beta distributions such that α → 0 or β → 0, the ratio of the standard deviation to the mean absolute deviation approaches infinity (although each of them, individually, approaches zero) because the mean absolute deviation approaches zero faster than the standard deviation.

Using the parametrization in terms of mean μ and sample size ν = α + β > 0:

:α = μν, β = (1−μ)ν

one can express the mean absolute deviation around the mean in terms of the mean μ and the sample size ν as follows:

:\operatorname{E}[|X - \operatorname{E}[X]|] = \frac{2\mu^{\mu\nu}(1-\mu)^{(1-\mu)\nu}}{\nu\Beta(\mu\nu,(1-\mu)\nu)}

For a symmetric distribution, the mean is at the middle of the distribution, μ = 1/2, and therefore:

: \begin{align}
\operatorname{E}[|X - \operatorname{E}[X]|] &= \frac{2^{1-\nu}}{\nu\Beta(\tfrac{\nu}{2},\tfrac{\nu}{2})} \\
\lim_{\nu \to 0} \left (\lim_{\mu \to \frac{1}{2}} \operatorname{E}[|X - \operatorname{E}[X]|] \right ) &= \tfrac{1}{2}\\
\lim_{\nu \to \infty} \left (\lim_{\mu \to \frac{1}{2}} \operatorname{E}[|X - \operatorname{E}[X]|] \right ) &= 0
\end{align}

Also, the following limits (with only the noted variable approaching the limit) can be obtained from the above expressions:

: \begin{align}
\lim_{\beta \to 0} \operatorname{E}[|X - \operatorname{E}[X]|] &=\lim_{\alpha \to 0} \operatorname{E}[|X - \operatorname{E}[X]|]= 0 \\
\lim_{\beta \to \infty} \operatorname{E}[|X - \operatorname{E}[X]|] &=\lim_{\alpha \to \infty} \operatorname{E}[|X - \operatorname{E}[X]|] = 0\\
\lim_{\mu \to 0} \operatorname{E}[|X - \operatorname{E}[X]|]&=\lim_{\mu \to 1} \operatorname{E}[|X - \operatorname{E}[X]|] = 0\\
\lim_{\nu \to 0} \operatorname{E}[|X - \operatorname{E}[X]|] &= 2\mu(1-\mu) \\
\lim_{\nu \to \infty} \operatorname{E}[|X - \operatorname{E}[X]|] &= 0
\end{align}
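A quick numerical check of the closed form for the mean absolute deviation around the mean given above (Python; the shape parameters are arbitrary): the exact expression is compared with a Monte Carlo estimate.

import numpy as np
from scipy.special import betaln

rng = np.random.default_rng(2)
a, b = 3.0, 5.0                   # illustrative shape parameters

# Closed form E|X - E[X]| = 2 a^a b^b / (B(a,b) (a+b)^(a+b+1)), evaluated in log space
log_mad = (np.log(2) + a * np.log(a) + b * np.log(b)
           - betaln(a, b) - (a + b + 1) * np.log(a + b))
x = rng.beta(a, b, 1_000_000)
print("closed form:", np.exp(log_mad), " Monte Carlo:", np.abs(x - a / (a + b)).mean())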


Mean absolute difference

The mean absolute difference for the Beta distribution is:

:\mathrm{MD} = \int_0^1 \int_0^1 f(x;\alpha,\beta)\,f(y;\alpha,\beta)\,|x-y| \,dx\,dy = \left(\frac{4}{\alpha+\beta}\right)\frac{\Beta(\alpha+\beta,\alpha+\beta)}{\Beta(\alpha,\alpha)\Beta(\beta,\beta)}

The Gini coefficient for the Beta distribution is half of the relative mean absolute difference:

:\mathrm{G} = \left(\frac{2}{\alpha}\right)\frac{\Beta(\alpha+\beta,\alpha+\beta)}{\Beta(\alpha,\alpha)\Beta(\beta,\beta)}
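The following sketch (Python; arbitrary shape parameters) checks the mean absolute difference and Gini coefficient expressions above against a brute-force Monte Carlo estimate.

import numpy as np
from scipy.special import betaln

rng = np.random.default_rng(3)
a, b = 2.0, 6.0                    # illustrative shape parameters

log_ratio = betaln(a + b, a + b) - betaln(a, a) - betaln(b, b)
md = (4.0 / (a + b)) * np.exp(log_ratio)        # mean absolute difference
gini = (2.0 / a) * np.exp(log_ratio)            # Gini = MD / (2 * mean), mean = a/(a+b)

x, y = rng.beta(a, b, (2, 1_000_000))
md_mc = np.abs(x - y).mean()
print("MD:", md, "MC:", md_mc, " Gini:", gini, "MC:", md_mc / (2 * a / (a + b)))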


Skewness

The
skewness
(the third moment centered on the mean, normalized by the 3/2 power of the variance) of the beta distribution is

:\gamma_1 =\frac{\operatorname{E}[(X - \mu)^3]}{(\operatorname{var}(X))^{3/2}} = \frac{2(\beta-\alpha)\sqrt{\alpha+\beta+1}}{(\alpha+\beta+2)\sqrt{\alpha\beta}} .

Letting α = β in the above expression one obtains γ1 = 0, showing once again that for α = β the distribution is symmetric and hence the skewness is zero. Positive skew (right-tailed) for α < β, negative skew (left-tailed) for α > β.

Using the parametrization in terms of mean μ and sample size ν = α + β:

: \begin{align}
\alpha & = \mu \nu ,\text{ where }\nu =(\alpha + \beta) >0\\
\beta & = (1 - \mu) \nu , \text{ where }\nu =(\alpha + \beta) >0.
\end{align}

one can express the skewness in terms of the mean μ and the sample size ν as follows:

:\gamma_1 =\frac{\operatorname{E}[(X - \mu)^3]}{(\operatorname{var}(X))^{3/2}} = \frac{2(1-2\mu)\sqrt{1+\nu}}{(2+\nu)\sqrt{\mu(1-\mu)}}.

The skewness can also be expressed just in terms of the variance ''var'' and the mean μ as follows:

:\gamma_1 =\frac{\operatorname{E}[(X - \mu)^3]}{(\operatorname{var}(X))^{3/2}} = \frac{2(1-2\mu)\sqrt{\operatorname{var}}}{\mu(1-\mu)+\operatorname{var}}\text{ if } \operatorname{var} < \mu(1-\mu)

The accompanying plot of skewness as a function of variance and mean shows that maximum variance (1/4) is coupled with zero skewness and the symmetry condition (μ = 1/2), and that maximum skewness (positive or negative infinity) occurs when the mean is located at one end or the other, so that the "mass" of the probability distribution is concentrated at the ends (minimum variance).

The following expression for the square of the skewness, in terms of the sample size ν = α + β and the variance ''var'', is useful for the method of moments estimation of four parameters:

:(\gamma_1)^2 =\frac{(\operatorname{E}[(X - \mu)^3])^2}{(\operatorname{var}(X))^{3}} = \frac{4}{(2+\nu)^2}\bigg(\frac{1}{\operatorname{var}}-4(1+\nu)\bigg)

This expression correctly gives a skewness of zero for α = β, since in that case (see ): \operatorname{var} = \frac{1}{4(1+\nu)}.

For the symmetric case (α = β), skewness = 0 over the whole range, and the following limits apply:

:\lim_{\alpha = \beta \to 0} \gamma_1 = \lim_{\alpha = \beta \to \infty} \gamma_1 =\lim_{\nu \to 0} \gamma_1=\lim_{\nu \to \infty} \gamma_1=\lim_{\mu \to \frac{1}{2}} \gamma_1 = 0

For the asymmetric cases (α ≠ β) the following limits (with only the noted variable approaching the limit) can be obtained from the above expressions:

: \begin{align}
&\lim_{\alpha \to 0} \gamma_1 =\lim_{\mu \to 0} \gamma_1 = \infty\\
&\lim_{\beta \to 0} \gamma_1 = \lim_{\mu \to 1} \gamma_1= - \infty\\
&\lim_{\alpha \to \infty} \gamma_1 = -\frac{2}{\sqrt{\beta}},\quad \lim_{\beta \to 0}(\lim_{\alpha \to \infty} \gamma_1) = -\infty,\quad \lim_{\beta \to \infty}(\lim_{\alpha \to \infty} \gamma_1) = 0\\
&\lim_{\beta \to \infty} \gamma_1 = \frac{2}{\sqrt{\alpha}},\quad \lim_{\alpha \to 0}(\lim_{\beta \to \infty} \gamma_1) = \infty,\quad \lim_{\alpha \to \infty}(\lim_{\beta \to \infty} \gamma_1) = 0\\
&\lim_{\nu \to 0} \gamma_1 = \frac{1 - 2\mu}{\sqrt{\mu(1-\mu)}},\quad \lim_{\mu \to 0}(\lim_{\nu \to 0} \gamma_1) = \infty,\quad \lim_{\mu \to 1}(\lim_{\nu \to 0} \gamma_1) = - \infty
\end{align}
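A short check of the skewness expression against scipy's implementation (Python; arbitrary parameters):

import numpy as np
from scipy import stats

a, b = 2.0, 5.0                   # illustrative shape parameters
gamma1 = 2 * (b - a) * np.sqrt(a + b + 1) / ((a + b + 2) * np.sqrt(a * b))
print("formula:", gamma1, " scipy:", stats.beta(a, b).stats(moments='s'))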


Kurtosis

The beta distribution has been applied in acoustic analysis to assess damage to gears, as the kurtosis of the beta distribution has been reported to be a good indicator of the condition of a gear. Kurtosis has also been used to distinguish the seismic signal generated by a person's footsteps from other signals. As persons or other targets moving on the ground generate continuous signals in the form of seismic waves, one can separate different targets based on the seismic waves they generate. Kurtosis is sensitive to impulsive signals, so it's much more sensitive to the signal generated by human footsteps than other signals generated by vehicles, winds, noise, etc. Unfortunately, the notation for kurtosis has not been standardized. Kenney and Keeping use the symbol γ2 for the
excess kurtosis
, but Abramowitz and Stegun use different terminology. To prevent confusion between kurtosis (the fourth moment centered on the mean, normalized by the square of the variance) and excess kurtosis, when using symbols, they will be spelled out as follows: :\begin \text &=\text - 3\\ &=\frac-3\\ &=\frac\\ &=\frac . \end Letting α = β in the above expression one obtains :\text =- \frac \text\alpha=\beta . Therefore, for symmetric beta distributions, the excess kurtosis is negative, increasing from a minimum value of −2 at the limit as → 0, and approaching a maximum value of zero as → ∞. The value of −2 is the minimum value of excess kurtosis that any distribution (not just beta distributions, but any distribution of any possible kind) can ever achieve. This minimum value is reached when all the probability density is entirely concentrated at each end ''x'' = 0 and ''x'' = 1, with nothing in between: a 2-point
Bernoulli distribution
with equal probability 1/2 at each end (a coin toss: see section below "Kurtosis bounded by the square of the skewness" for further discussion). The description of kurtosis as a measure of the "potential outliers" (or "potential rare, extreme values") of the probability distribution, is correct for all distributions including the beta distribution. When rare, extreme values can occur in the beta distribution, the higher its kurtosis; otherwise, the kurtosis is lower. For α ≠ β, skewed beta distributions, the excess kurtosis can reach unlimited positive values (particularly for α → 0 for finite β, or for β → 0 for finite α) because the side away from the mode will produce occasional extreme values. Minimum kurtosis takes place when the mass density is concentrated equally at each end (and therefore the mean is at the center), and there is no probability mass density in between the ends. Using the parametrization in terms of mean μ and sample size ν = α + β: : \begin \alpha & = \mu \nu ,\text\nu =(\alpha + \beta) >0\\ \beta & = (1 - \mu) \nu , \text\nu =(\alpha + \beta) >0. \end one can express the excess kurtosis in terms of the mean μ and the sample size ν as follows: :\text =\frac\bigg (\frac - 1 \bigg ) The excess kurtosis can also be expressed in terms of just the following two parameters: the variance ''var'', and the sample size ν as follows: :\text =\frac\left(\frac - 6 - 5 \nu \right)\text\text< \mu(1-\mu) and, in terms of the variance ''var'' and the mean μ as follows: :\text =\frac\text\text< \mu(1-\mu) The plot of excess kurtosis as a function of the variance and the mean shows that the minimum value of the excess kurtosis (−2, which is the minimum possible value for excess kurtosis for any distribution) is intimately coupled with the maximum value of variance (1/4) and the symmetry condition: the mean occurring at the midpoint (μ = 1/2). This occurs for the symmetric case of α = β = 0, with zero skewness. At the limit, this is the 2 point
Bernoulli distribution
with equal probability 1/2 at each Dirac delta function end ''x'' = 0 and ''x'' = 1 and zero probability everywhere else. (A coin toss: one face of the coin being ''x'' = 0 and the other face being ''x'' = 1.) Variance is maximum because the distribution is bimodal with nothing in between the two modes (spikes) at each end. Excess kurtosis is minimum: the probability density "mass" is zero at the mean and it is concentrated at the two peaks at each end. Excess kurtosis reaches the minimum possible value (for any distribution) when the probability density function has two spikes at each end: it is bi-"peaky" with nothing in between them. On the other hand, the plot shows that for extreme skewed cases, where the mean is located near one or the other end (μ = 0 or μ = 1), the variance is close to zero, and the excess kurtosis rapidly approaches infinity when the mean of the distribution approaches either end. Alternatively, the excess kurtosis can also be expressed in terms of just the following two parameters: the square of the skewness, and the sample size ν as follows: :\text =\frac\bigg(\frac (\text)^2 - 1\bigg)\text^2-2< \text< \frac (\text)^2 From this last expression, one can obtain the same limits published practically a century ago by Karl Pearson in his paper, for the beta distribution (see section below titled "Kurtosis bounded by the square of the skewness"). Setting α + β= ν = 0 in the above expression, one obtains Pearson's lower boundary (values for the skewness and excess kurtosis below the boundary (excess kurtosis + 2 − skewness2 = 0) cannot occur for any distribution, and hence Karl Pearson appropriately called the region below this boundary the "impossible region"). The limit of α + β = ν → ∞ determines Pearson's upper boundary. : \begin &\lim_\text = (\text)^2 - 2\\ &\lim_\text = \tfrac (\text)^2 \end therefore: :(\text)^2-2< \text< \tfrac (\text)^2 Values of ν = α + β such that ν ranges from zero to infinity, 0 < ν < ∞, span the whole region of the beta distribution in the plane of excess kurtosis versus squared skewness. For the symmetric case (α = β), the following limits apply: : \begin &\lim_ \text = - 2 \\ &\lim_ \text = 0 \\ &\lim_ \text = - \frac \end For the unsymmetric cases (α ≠ β) the following limits (with only the noted variable approaching the limit) can be obtained from the above expressions: : \begin &\lim_\text =\lim_ \text = \lim_\text = \lim_\text =\infty\\ &\lim_\text = \frac,\text \lim_(\lim_ \text) = \infty,\text \lim_(\lim_ \text) = 0\\ &\lim_\text = \frac,\text \lim_(\lim_ \text) = \infty,\text \lim_(\lim_ \text) = 0\\ &\lim_ \text = - 6 + \frac,\text \lim_(\lim_ \text) = \infty,\text \lim_(\lim_ \text) = \infty \end
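As with the skewness, the excess kurtosis has a closed form in α and β. The sketch below (Python; arbitrary parameters) evaluates the standard expression 6[(α − β)²(α + β + 1) − αβ(α + β + 2)] / [αβ(α + β + 2)(α + β + 3)] and compares it with scipy's value.

from scipy import stats

a, b = 2.0, 5.0                   # illustrative shape parameters
excess = (6 * ((a - b) ** 2 * (a + b + 1) - a * b * (a + b + 2))
          / (a * b * (a + b + 2) * (a + b + 3)))
print("formula:", excess, " scipy:", stats.beta(a, b).stats(moments='k'))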


Characteristic function

The Characteristic function (probability theory), characteristic function is the Fourier transform of the probability density function. The characteristic function of the beta distribution is confluent hypergeometric function, Kummer's confluent hypergeometric function (of the first kind): :\begin \varphi_X(\alpha;\beta;t) &= \operatorname\left[e^\right]\\ &= \int_0^1 e^ f(x;\alpha,\beta) dx \\ &=_1F_1(\alpha; \alpha+\beta; it)\!\\ &=\sum_^\infty \frac \\ &= 1 +\sum_^ \left( \prod_^ \frac \right) \frac \end where : x^=x(x+1)(x+2)\cdots(x+n-1) is the rising factorial, also called the "Pochhammer symbol". The value of the characteristic function for ''t'' = 0, is one: : \varphi_X(\alpha;\beta;0)=_1F_1(\alpha; \alpha+\beta; 0) = 1 . Also, the real and imaginary parts of the characteristic function enjoy the following symmetries with respect to the origin of variable ''t'': : \textrm \left [ _1F_1(\alpha; \alpha+\beta; it) \right ] = \textrm \left [ _1F_1(\alpha; \alpha+\beta; - it) \right ] : \textrm \left [ _1F_1(\alpha; \alpha+\beta; it) \right ] = - \textrm \left [ _1F_1(\alpha; \alpha+\beta; - it) \right ] The symmetric case α = β simplifies the characteristic function of the beta distribution to a Bessel function, since in the special case α + β = 2α the confluent hypergeometric function (of the first kind) reduces to a Bessel function (the modified Bessel function of the first kind I_ ) using Ernst Kummer, Kummer's second transformation as follows: Another example of the symmetric case α = β = n/2 for beamforming applications can be found in Figure 11 of :\begin _1F_1(\alpha;2\alpha; it) &= e^ _0F_1 \left(; \alpha+\tfrac; \frac \right) \\ &= e^ \left(\frac\right)^ \Gamma\left(\alpha+\tfrac\right) I_\left(\frac\right).\end In the accompanying plots, the Complex number, real part (Re) of the Characteristic function (probability theory), characteristic function of the beta distribution is displayed for symmetric (α = β) and skewed (α ≠ β) cases.


Other moments


Moment generating function

It also follows that the moment generating function is :\begin M_X(\alpha; \beta; t) &= \operatorname\left[e^\right] \\ pt&= \int_0^1 e^ f(x;\alpha,\beta)\,dx \\ pt&= _1F_1(\alpha; \alpha+\beta; t) \\ pt&= \sum_^\infty \frac \frac \\ pt&= 1 +\sum_^ \left( \prod_^ \frac \right) \frac \end In particular ''M''''X''(''α''; ''β''; 0) = 1.


Higher moments

Using the moment generating function, the ''k''-th raw moment is given by the factor

:\prod_{r=0}^{k-1} \frac{\alpha+r}{\alpha+\beta+r}

multiplying the (exponential series) term \left(\frac{t^k}{k!}\right) in the series of the moment generating function

:\operatorname{E}[X^k]= \frac{\alpha^{(k)}}{(\alpha + \beta)^{(k)}} = \prod_{r=0}^{k-1} \frac{\alpha+r}{\alpha+\beta+r}

where (''x'')(''k'') is a Pochhammer symbol representing rising factorial. It can also be written in a recursive form as

:\operatorname{E}[X^k] = \frac{\alpha + k - 1}{\alpha + \beta + k - 1}\operatorname{E}[X^{k-1}].

Since the moment generating function M_X(\alpha; \beta; \cdot) has a positive radius of convergence, the beta distribution is determined by its moments.
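The recursion for the raw moments is easy to verify numerically (Python; the parameters and the highest moment order are arbitrary):

from scipy import stats

a, b, k_max = 2.5, 4.0, 5          # illustrative parameters and highest moment order
m = 1.0                            # E[X^0] = 1
for k in range(1, k_max + 1):
    m *= (a + k - 1) / (a + b + k - 1)        # recursive raw moment
    print(k, m, stats.beta(a, b).moment(k))   # compare with scipy's k-th raw moment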


Moments of transformed random variables


=Moments of linearly transformed, product and inverted random variables

= One can also show the following expectations for a transformed random variable, where the random variable ''X'' is Beta-distributed with parameters α and β: ''X'' ~ Beta(α, β). The expected value of the variable 1 − ''X'' is the mirror-symmetry of the expected value based on ''X'': :\begin & \operatorname[1-X] = \frac \\ & \operatorname[X (1-X)] =\operatorname[(1-X)X ] =\frac \end Due to the mirror-symmetry of the probability density function of the beta distribution, the variances based on variables ''X'' and 1 − ''X'' are identical, and the covariance on ''X''(1 − ''X'' is the negative of the variance: :\operatorname[(1-X)]=\operatorname[X] = -\operatorname[X,(1-X)]= \frac These are the expected values for inverted variables, (these are related to the harmonic means, see ): :\begin & \operatorname \left [\frac \right ] = \frac \text \alpha > 1\\ & \operatorname\left [\frac \right ] =\frac \text \beta > 1 \end The following transformation by dividing the variable ''X'' by its mirror-image ''X''/(1 − ''X'') results in the expected value of the "inverted beta distribution" or
beta prime distribution
(also known as beta distribution of the second kind or Pearson distribution, Pearson's Type VI): : \begin & \operatorname\left[\frac\right] =\frac \text\beta > 1\\ & \operatorname\left[\frac\right] =\frac\text\alpha > 1 \end Variances of these transformed variables can be obtained by integration, as the expected values of the second moments centered on the corresponding variables: :\operatorname \left[\frac \right] =\operatorname\left[\left(\frac - \operatorname\left[\frac \right ] \right )^2\right]= :\operatorname\left [\frac \right ] =\operatorname \left [\left (\frac - \operatorname\left [\frac \right ] \right )^2 \right ]= \frac \text\alpha > 2 The following variance of the variable ''X'' divided by its mirror-image (''X''/(1−''X'') results in the variance of the "inverted beta distribution" or
beta prime distribution
(also known as beta distribution of the second kind or Pearson distribution, Pearson's Type VI): :\operatorname \left [\frac \right ] =\operatorname \left [\left(\frac - \operatorname \left [\frac \right ] \right)^2 \right ]=\operatorname \left [\frac \right ] = :\operatorname \left [\left (\frac - \operatorname \left [\frac \right ] \right )^2 \right ]= \frac \text\beta > 2 The covariances are: :\operatorname\left [\frac,\frac \right ] = \operatorname\left[\frac,\frac \right] =\operatorname\left[\frac,\frac\right ] = \operatorname\left[\frac,\frac \right] =\frac \text \alpha, \beta > 1 These expectations and variances appear in the four-parameter Fisher information matrix (.)


=Moments of logarithmically transformed random variables

= Expected values for Logarithm transformation, logarithmic transformations (useful for maximum likelihood estimates, see ) are discussed in this section. The following logarithmic linear transformations are related to the geometric means ''GX'' and ''G''(1−''X'') (see ): :\begin \operatorname[\ln(X)] &= \psi(\alpha) - \psi(\alpha + \beta)= - \operatorname\left[\ln \left (\frac \right )\right],\\ \operatorname[\ln(1-X)] &=\psi(\beta) - \psi(\alpha + \beta)= - \operatorname \left[\ln \left (\frac \right )\right]. \end Where the
digamma function
ψ(α) is defined as the logarithmic derivative of the
gamma function
: :\psi(\alpha) = \frac Logit transformations are interesting, as they usually transform various shapes (including J-shapes) into (usually skewed) bell-shaped densities over the logit variable, and they may remove the end singularities over the original variable: :\begin \operatorname\left[\ln \left (\frac \right ) \right] &=\psi(\alpha) - \psi(\beta)= \operatorname[\ln(X)] +\operatorname \left[\ln \left (\frac \right) \right],\\ \operatorname\left [\ln \left (\frac \right ) \right ] &=\psi(\beta) - \psi(\alpha)= - \operatorname \left[\ln \left (\frac \right) \right] . \end Johnson considered the distribution of the logit - transformed variable ln(''X''/1−''X''), including its moment generating function and approximations for large values of the shape parameters. This transformation extends the finite support
[0, 1]
based on the original variable ''X'' to infinite support in both directions of the real line (−∞, +∞). Higher order logarithmic moments can be derived by using the representation of a beta distribution as a proportion of two Gamma distributions and differentiating through the integral. They can be expressed in terms of higher order poly-gamma functions as follows: :\begin \operatorname \left [\ln^2(X) \right ] &= (\psi(\alpha) - \psi(\alpha + \beta))^2+\psi_1(\alpha)-\psi_1(\alpha+\beta), \\ \operatorname \left [\ln^2(1-X) \right ] &= (\psi(\beta) - \psi(\alpha + \beta))^2+\psi_1(\beta)-\psi_1(\alpha+\beta), \\ \operatorname \left [\ln (X)\ln(1-X) \right ] &=(\psi(\alpha) - \psi(\alpha + \beta))(\psi(\beta) - \psi(\alpha + \beta)) -\psi_1(\alpha+\beta). \end therefore the
variance
of the logarithmic variables and
covariance
of ln(''X'') and ln(1−''X'') are: :\begin \operatorname[\ln(X), \ln(1-X)] &= \operatorname\left[\ln(X)\ln(1-X)\right] - \operatorname[\ln(X)]\operatorname[\ln(1-X)] = -\psi_1(\alpha+\beta) \\ & \\ \operatorname[\ln X] &= \operatorname[\ln^2(X)] - (\operatorname[\ln(X)])^2 \\ &= \psi_1(\alpha) - \psi_1(\alpha + \beta) \\ &= \psi_1(\alpha) + \operatorname[\ln(X), \ln(1-X)] \\ & \\ \operatorname ln (1-X)&= \operatorname[\ln^2 (1-X)] - (\operatorname[\ln (1-X)])^2 \\ &= \psi_1(\beta) - \psi_1(\alpha + \beta) \\ &= \psi_1(\beta) + \operatorname[\ln (X), \ln(1-X)] \end where the
trigamma function
, denoted ψ1(α), is the second of the
polygamma function
s, and is defined as the derivative of the digamma function: :\psi_1(\alpha) = \frac= \frac. The variances and covariance of the logarithmically transformed variables ''X'' and (1−''X'') are different, in general, because the logarithmic transformation destroys the mirror-symmetry of the original variables ''X'' and (1−''X''), as the logarithm approaches negative infinity for the variable approaching zero. These logarithmic variances and covariance are the elements of the
Fisher information
matrix for the beta distribution. They are also a measure of the curvature of the log likelihood function (see section on Maximum likelihood estimation). The variances of the log inverse variables are identical to the variances of the log variables: :\begin \operatorname\left[\ln \left (\frac \right ) \right] & =\operatorname[\ln(X)] = \psi_1(\alpha) - \psi_1(\alpha + \beta), \\ \operatorname\left[\ln \left (\frac \right ) \right] &=\operatorname ln (1-X)= \psi_1(\beta) - \psi_1(\alpha + \beta), \\ \operatorname\left[\ln \left (\frac \right), \ln \left (\frac\right ) \right] &=\operatorname[\ln(X),\ln(1-X)]= -\psi_1(\alpha + \beta).\end It also follows that the variances of the logit transformed variables are: :\operatorname\left[\ln \left (\frac \right )\right]=\operatorname\left[\ln \left (\frac \right ) \right]=-\operatorname\left [\ln \left (\frac \right ), \ln \left (\frac \right ) \right]= \psi_1(\alpha) + \psi_1(\beta)
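The identities E[ln X] = ψ(α) − ψ(α + β) and var[ln X] = ψ1(α) − ψ1(α + β) quoted above can be checked directly (Python; arbitrary parameters):

import numpy as np
from scipy.special import digamma, polygamma

rng = np.random.default_rng(4)
a, b = 2.0, 3.5                    # illustrative shape parameters
x = rng.beta(a, b, 1_000_000)

print("E[ln X]  :", digamma(a) - digamma(a + b), " MC:", np.log(x).mean())
print("var[ln X]:", polygamma(1, a) - polygamma(1, a + b), " MC:", np.log(x).var())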


Quantities of information (entropy)

Given a beta distributed random variable, ''X'' ~ Beta(''α'', ''β''), the information entropy, differential entropy of ''X'' is (measured in Nat (unit), nats), the expected value of the negative of the logarithm of the
probability density function
: :\begin h(X) &= \operatorname[-\ln(f(x;\alpha,\beta))] \\ pt&=\int_0^1 -f(x;\alpha,\beta)\ln(f(x;\alpha,\beta)) \, dx \\ pt&= \ln(\Beta(\alpha,\beta))-(\alpha-1)\psi(\alpha)-(\beta-1)\psi(\beta)+(\alpha+\beta-2) \psi(\alpha+\beta) \end where ''f''(''x''; ''α'', ''β'') is the
probability density function
of the beta distribution: :f(x;\alpha,\beta) = \frac x^(1-x)^ The
digamma function
''ψ'' appears in the formula for the differential entropy as a consequence of Euler's integral formula for the harmonic numbers which follows from the integral: :\int_0^1 \frac \, dx = \psi(\alpha)-\psi(1) The information entropy, differential entropy of the beta distribution is negative for all values of ''α'' and ''β'' greater than zero, except at ''α'' = ''β'' = 1 (for which values the beta distribution is the same as the Uniform distribution (continuous), uniform distribution), where the information entropy, differential entropy reaches its Maxima and minima, maximum value of zero. It is to be expected that the maximum entropy should take place when the beta distribution becomes equal to the uniform distribution, since uncertainty is maximal when all possible events are equiprobable. For ''α'' or ''β'' approaching zero, the information entropy, differential entropy approaches its Maxima and minima, minimum value of negative infinity. For (either or both) ''α'' or ''β'' approaching zero, there is a maximum amount of order: all the probability density is concentrated at the ends, and there is zero probability density at points located between the ends. Similarly for (either or both) ''α'' or ''β'' approaching infinity, the differential entropy approaches its minimum value of negative infinity, and a maximum amount of order. If either ''α'' or ''β'' approaches infinity (and the other is finite) all the probability density is concentrated at an end, and the probability density is zero everywhere else. If both shape parameters are equal (the symmetric case), ''α'' = ''β'', and they approach infinity simultaneously, the probability density becomes a spike ( Dirac delta function) concentrated at the middle ''x'' = 1/2, and hence there is 100% probability at the middle ''x'' = 1/2 and zero probability everywhere else. The (continuous case) information entropy, differential entropy was introduced by Shannon in his original paper (where he named it the "entropy of a continuous distribution"), as the concluding part of the same paper where he defined the information entropy, discrete entropy. It is known since then that the differential entropy may differ from the infinitesimal limit of the discrete entropy by an infinite offset, therefore the differential entropy can be negative (as it is for the beta distribution). What really matters is the relative value of entropy. Given two beta distributed random variables, ''X''1 ~ Beta(''α'', ''β'') and ''X''2 ~ Beta(''α''′, ''β''′), the cross entropy is (measured in nats) :\begin H(X_1,X_2) &= \int_0^1 - f(x;\alpha,\beta) \ln (f(x;\alpha',\beta')) \,dx \\ pt&= \ln \left(\Beta(\alpha',\beta')\right)-(\alpha'-1)\psi(\alpha)-(\beta'-1)\psi(\beta)+(\alpha'+\beta'-2)\psi(\alpha+\beta). \end The cross entropy has been used as an error metric to measure the distance between two hypotheses. Its absolute value is minimum when the two distributions are identical. It is the information measure most closely related to the log maximum likelihood (see section on "Parameter estimation. Maximum likelihood estimation")). The relative entropy, or Kullback–Leibler divergence ''D''KL(''X''1 , , ''X''2), is a measure of the inefficiency of assuming that the distribution is ''X''2 ~ Beta(''α''′, ''β''′) when the distribution is really ''X''1 ~ Beta(''α'', ''β''). It is defined as follows (measured in nats). 
:\begin D_(X_1, , X_2) &= \int_0^1 f(x;\alpha,\beta) \ln \left (\frac \right ) \, dx \\ pt&= \left (\int_0^1 f(x;\alpha,\beta) \ln (f(x;\alpha,\beta)) \,dx \right )- \left (\int_0^1 f(x;\alpha,\beta) \ln (f(x;\alpha',\beta')) \, dx \right )\\ pt&= -h(X_1) + H(X_1,X_2)\\ pt&= \ln\left(\frac\right)+(\alpha-\alpha')\psi(\alpha)+(\beta-\beta')\psi(\beta)+(\alpha'-\alpha+\beta'-\beta)\psi (\alpha + \beta). \end The relative entropy, or Kullback–Leibler divergence, is always non-negative. A few numerical examples follow: *''X''1 ~ Beta(1, 1) and ''X''2 ~ Beta(3, 3); ''D''KL(''X''1 , , ''X''2) = 0.598803; ''D''KL(''X''2 , , ''X''1) = 0.267864; ''h''(''X''1) = 0; ''h''(''X''2) = −0.267864 *''X''1 ~ Beta(3, 0.5) and ''X''2 ~ Beta(0.5, 3); ''D''KL(''X''1 , , ''X''2) = 7.21574; ''D''KL(''X''2 , , ''X''1) = 7.21574; ''h''(''X''1) = −1.10805; ''h''(''X''2) = −1.10805. The Kullback–Leibler divergence is not symmetric ''D''KL(''X''1 , , ''X''2) ≠ ''D''KL(''X''2 , , ''X''1) for the case in which the individual beta distributions Beta(1, 1) and Beta(3, 3) are symmetric, but have different entropies ''h''(''X''1) ≠ ''h''(''X''2). The value of the Kullback divergence depends on the direction traveled: whether going from a higher (differential) entropy to a lower (differential) entropy or the other way around. In the numerical example above, the Kullback divergence measures the inefficiency of assuming that the distribution is (bell-shaped) Beta(3, 3), rather than (uniform) Beta(1, 1). The "h" entropy of Beta(1, 1) is higher than the "h" entropy of Beta(3, 3) because the uniform distribution Beta(1, 1) has a maximum amount of disorder. The Kullback divergence is more than two times higher (0.598803 instead of 0.267864) when measured in the direction of decreasing entropy: the direction that assumes that the (uniform) Beta(1, 1) distribution is (bell-shaped) Beta(3, 3) rather than the other way around. In this restricted sense, the Kullback divergence is consistent with the second law of thermodynamics. The Kullback–Leibler divergence is symmetric ''D''KL(''X''1 , , ''X''2) = ''D''KL(''X''2 , , ''X''1) for the skewed cases Beta(3, 0.5) and Beta(0.5, 3) that have equal differential entropy ''h''(''X''1) = ''h''(''X''2). The symmetry condition: :D_(X_1, , X_2) = D_(X_2, , X_1),\texth(X_1) = h(X_2),\text\alpha \neq \beta follows from the above definitions and the mirror-symmetry ''f''(''x''; ''α'', ''β'') = ''f''(1−''x''; ''α'', ''β'') enjoyed by the beta distribution.
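The differential entropy and Kullback-Leibler divergence expressions can be evaluated with scipy's special functions; the sketch below (Python) reproduces the Beta(1,1) versus Beta(3,3) numbers quoted above.

from scipy.special import betaln, digamma

def beta_entropy(a, b):
    # differential entropy h(X) for X ~ Beta(a, b), in nats
    return (betaln(a, b) - (a - 1) * digamma(a) - (b - 1) * digamma(b)
            + (a + b - 2) * digamma(a + b))

def beta_kl(a1, b1, a2, b2):
    # D_KL( Beta(a1,b1) || Beta(a2,b2) ), in nats
    return (betaln(a2, b2) - betaln(a1, b1)
            + (a1 - a2) * digamma(a1) + (b1 - b2) * digamma(b1)
            + (a2 - a1 + b2 - b1) * digamma(a1 + b1))

print(beta_kl(1, 1, 3, 3), beta_kl(3, 3, 1, 1))   # approx. 0.598803 and 0.267864
print(beta_entropy(1, 1), beta_entropy(3, 3))     # 0 and approx. -0.267864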


Relationships between statistical measures


Mean, mode and median relationship

If 1 < α < β then mode ≤ median ≤ mean.Kerman J (2011) "A closed-form approximation for the median of the beta distribution". Expressing the mode (only for α, β > 1), and the mean in terms of α and β:

: \frac{\alpha - 1}{\alpha + \beta - 2} \le \text{median} \le \frac{\alpha}{\alpha + \beta} ,

If 1 < β < α then the order of the inequalities is reversed. For α, β > 1 the absolute distance between the mean and the median is less than 5% of the distance between the maximum and minimum values of ''x''. On the other hand, the absolute distance between the mean and the mode can reach 50% of the distance between the maximum and minimum values of ''x'', for the (pathological) case of α = 1 and β = 1, for which values the beta distribution approaches the uniform distribution and the differential entropy approaches its maximum value, and hence maximum "disorder".

For example, for α = 1.0001 and β = 1.00000001:
* mode = 0.9999; PDF(mode) = 1.00010
* mean = 0.500025; PDF(mean) = 1.00003
* median = 0.500035; PDF(median) = 1.00003
* mean − mode = −0.499875
* mean − median = −9.65538 × 10−6

where PDF stands for the value of the
probability density function
.


Mean, geometric mean and harmonic mean relationship

It is known from the inequality of arithmetic and geometric means that the geometric mean is lower than the mean. Similarly, the harmonic mean is lower than the geometric mean. The accompanying plot shows that for α = β, both the mean and the median are exactly equal to 1/2, regardless of the value of α = β, and the mode is also equal to 1/2 for α = β > 1, however the geometric and harmonic means are lower than 1/2 and they only approach this value asymptotically as α = β → ∞.


Kurtosis bounded by the square of the skewness

As remarked by William Feller, Feller, in the Pearson distribution, Pearson system the beta probability density appears as Pearson distribution, type I (any difference between the beta distribution and Pearson's type I distribution is only superficial and it makes no difference for the following discussion regarding the relationship between kurtosis and skewness). Karl Pearson showed, in Plate 1 of his paper published in 1916, a graph with the kurtosis as the vertical axis (ordinate) and the square of the
skewness
as the horizontal axis (abscissa), in which a number of distributions were displayed. The region occupied by the beta distribution is bounded by the following two Line (geometry), lines in the (skewness2,kurtosis) Cartesian coordinate system, plane, or the (skewness2,excess kurtosis) Cartesian coordinate system, plane: :(\text)^2+1< \text< \frac (\text)^2 + 3 or, equivalently, :(\text)^2-2< \text< \frac (\text)^2 At a time when there were no powerful digital computers, Karl Pearson accurately computed further boundaries, for example, separating the "U-shaped" from the "J-shaped" distributions. The lower boundary line (excess kurtosis + 2 − skewness2 = 0) is produced by skewed "U-shaped" beta distributions with both values of shape parameters α and β close to zero. The upper boundary line (excess kurtosis − (3/2) skewness2 = 0) is produced by extremely skewed distributions with very large values of one of the parameters and very small values of the other parameter. Karl Pearson showed that this upper boundary line (excess kurtosis − (3/2) skewness2 = 0) is also the intersection with Pearson's distribution III, which has unlimited support in one direction (towards positive infinity), and can be bell-shaped or J-shaped. His son, Egon Pearson, showed that the region (in the kurtosis/squared-skewness plane) occupied by the beta distribution (equivalently, Pearson's distribution I) as it approaches this boundary (excess kurtosis − (3/2) skewness2 = 0) is shared with the noncentral chi-squared distribution. Karl Pearson (Pearson 1895, pp. 357, 360, 373–376) also showed that the gamma distribution is a Pearson type III distribution. Hence this boundary line for Pearson's type III distribution is known as the gamma line. (This can be shown from the fact that the excess kurtosis of the gamma distribution is 6/''k'' and the square of the skewness is 4/''k'', hence (excess kurtosis − (3/2) skewness2 = 0) is identically satisfied by the gamma distribution regardless of the value of the parameter "k"). Pearson later noted that the chi-squared distribution is a special case of Pearson's type III and also shares this boundary line (as it is apparent from the fact that for the chi-squared distribution the excess kurtosis is 12/''k'' and the square of the skewness is 8/''k'', hence (excess kurtosis − (3/2) skewness2 = 0) is identically satisfied regardless of the value of the parameter "k"). This is to be expected, since the chi-squared distribution ''X'' ~ χ2(''k'') is a special case of the gamma distribution, with parametrization X ~ Γ(k/2, 1/2) where k is a positive integer that specifies the "number of degrees of freedom" of the chi-squared distribution. An example of a beta distribution near the upper boundary (excess kurtosis − (3/2) skewness2 = 0) is given by α = 0.1, β = 1000, for which the ratio (excess kurtosis)/(skewness2) = 1.49835 approaches the upper limit of 1.5 from below. An example of a beta distribution near the lower boundary (excess kurtosis + 2 − skewness2 = 0) is given by α= 0.0001, β = 0.1, for which values the expression (excess kurtosis + 2)/(skewness2) = 1.01621 approaches the lower limit of 1 from above. In the infinitesimal limit for both α and β approaching zero symmetrically, the excess kurtosis reaches its minimum value at −2. This minimum value occurs at the point at which the lower boundary line intersects the vertical axis (ordinate). 
(However, in Pearson's original chart, the ordinate is kurtosis, instead of excess kurtosis, and it increases downwards rather than upwards). Values for the skewness and excess kurtosis below the lower boundary (excess kurtosis + 2 − skewness2 = 0) cannot occur for any distribution, and hence Karl Pearson appropriately called the region below this boundary the "impossible region". The boundary for this "impossible region" is determined by (symmetric or skewed) bimodal "U"-shaped distributions for which the parameters α and β approach zero and hence all the probability density is concentrated at the ends: ''x'' = 0, 1 with practically nothing in between them. Since for α ≈ β ≈ 0 the probability density is concentrated at the two ends ''x'' = 0 and ''x'' = 1, this "impossible boundary" is determined by a
Bernoulli distribution
, where the only two possible outcomes occur with respective probabilities ''p'' and ''q'' = 1 − ''p''. For cases approaching this limit boundary with symmetry α = β, skewness ≈ 0, excess kurtosis ≈ −2 (this is the lowest excess kurtosis possible for any distribution), and the probabilities are ''p'' ≈ ''q'' ≈ 1/2. For cases approaching this limit boundary with skewness, excess kurtosis ≈ −2 + skewness2, and the probability density is concentrated more at one end than the other end (with practically nothing in between), with probabilities p = \tfrac{\beta}{\alpha+\beta} at the left end ''x'' = 0 and q = 1-p = \tfrac{\alpha}{\alpha+\beta} at the right end ''x'' = 1.


Symmetry

All statements are conditional on α, β > 0 * Probability density function Symmetry, reflection symmetry ::f(x;\alpha,\beta) = f(1-x;\beta,\alpha) * Cumulative distribution function Symmetry, reflection symmetry plus unitary Symmetry, translation ::F(x;\alpha,\beta) = I_x(\alpha,\beta) = 1- F(1- x;\beta,\alpha) = 1 - I_(\beta,\alpha) * Mode Symmetry, reflection symmetry plus unitary Symmetry, translation ::\operatorname(\Beta(\alpha, \beta))= 1-\operatorname(\Beta(\beta, \alpha)),\text\Beta(\beta, \alpha)\ne \Beta(1,1) * Median Symmetry, reflection symmetry plus unitary Symmetry, translation ::\operatorname (\Beta(\alpha, \beta) )= 1 - \operatorname (\Beta(\beta, \alpha)) * Mean Symmetry, reflection symmetry plus unitary Symmetry, translation ::\mu (\Beta(\alpha, \beta) )= 1 - \mu (\Beta(\beta, \alpha) ) * Geometric Means each is individually asymmetric, the following symmetry applies between the geometric mean based on ''X'' and the geometric mean based on its
reflection
(1-X) ::G_X (\Beta(\alpha, \beta) )=G_(\Beta(\beta, \alpha) ) * Harmonic means each is individually asymmetric, the following symmetry applies between the harmonic mean based on ''X'' and the harmonic mean based on its
reflection
(1-X) ::H_X (\Beta(\alpha, \beta) )=H_(\Beta(\beta, \alpha) ) \text \alpha, \beta > 1 . * Variance symmetry ::\operatorname (\Beta(\alpha, \beta) )=\operatorname (\Beta(\beta, \alpha) ) * Geometric variances each is individually asymmetric, the following symmetry applies between the log geometric variance based on X and the log geometric variance based on its
reflection
(1-X) ::\ln(\operatorname (\Beta(\alpha, \beta))) = \ln(\operatorname(\Beta(\beta, \alpha))) * Geometric covariance symmetry ::\ln \operatorname(\Beta(\alpha, \beta))=\ln \operatorname(\Beta(\beta, \alpha)) * Mean absolute deviation around the mean symmetry ::\operatorname[, X - E ] (\Beta(\alpha, \beta))=\operatorname[, X - E ] (\Beta(\beta, \alpha)) * Skewness Symmetry (mathematics), skew-symmetry ::\operatorname (\Beta(\alpha, \beta) )= - \operatorname (\Beta(\beta, \alpha) ) * Excess kurtosis symmetry ::\text (\Beta(\alpha, \beta) )= \text (\Beta(\beta, \alpha) ) * Characteristic function symmetry of Real part (with respect to the origin of variable "t") :: \text [_1F_1(\alpha; \alpha+\beta; it) ] = \text [ _1F_1(\alpha; \alpha+\beta; - it)] * Characteristic function Symmetry (mathematics), skew-symmetry of Imaginary part (with respect to the origin of variable "t") :: \text [_1F_1(\alpha; \alpha+\beta; it) ] = - \text [ _1F_1(\alpha; \alpha+\beta; - it) ] * Characteristic function symmetry of Absolute value (with respect to the origin of variable "t") :: \text [ _1F_1(\alpha; \alpha+\beta; it) ] = \text [ _1F_1(\alpha; \alpha+\beta; - it) ] * Differential entropy symmetry ::h(\Beta(\alpha, \beta) )= h(\Beta(\beta, \alpha) ) * Relative Entropy (also called Kullback–Leibler divergence) symmetry ::D_(X_1, , X_2) = D_(X_2, , X_1), \texth(X_1) = h(X_2)\text\alpha \neq \beta * Fisher information matrix symmetry ::_ = _
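The reflection symmetries listed above lend themselves to a quick numerical check. The following sketch assumes Python with NumPy and SciPy; the shape parameters and evaluation points are arbitrary examples, not values taken from the text:

    # Numerical check of the reflection symmetries of the beta distribution.
    import numpy as np
    from scipy.stats import beta

    alpha, b = 2.5, 0.7                 # example shape parameters
    x = np.linspace(0.01, 0.99, 9)

    # PDF reflection symmetry: f(x; a, b) == f(1 - x; b, a)
    assert np.allclose(beta.pdf(x, alpha, b), beta.pdf(1 - x, b, alpha))

    # CDF reflection plus unitary translation: F(x; a, b) == 1 - F(1 - x; b, a)
    assert np.allclose(beta.cdf(x, alpha, b), 1 - beta.cdf(1 - x, b, alpha))

    # Skewness changes sign, excess kurtosis is invariant, under (a, b) -> (b, a)
    m1 = beta.stats(alpha, b, moments='sk')   # (skewness, excess kurtosis)
    m2 = beta.stats(b, alpha, moments='sk')
    assert np.allclose(m1[0], -m2[0]) and np.allclose(m1[1], m2[1])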


Geometry of the probability density function


Inflection points

For certain values of the shape parameters α and β, the
probability density function
has inflection points, at which the curvature changes sign. The position of these inflection points can be useful as a measure of the Statistical dispersion, dispersion or spread of the distribution. Defining the following quantity: :\kappa =\frac Points of inflection occur, depending on the value of the shape parameters α and β, as follows: *(α > 2, β > 2) The distribution is bell-shaped (symmetric for α = β and skewed otherwise), with two inflection points, equidistant from the mode: ::x = \text \pm \kappa = \frac * (α = 2, β > 2) The distribution is unimodal, positively skewed, right-tailed, with one inflection point, located to the right of the mode: ::x =\text + \kappa = \frac * (α > 2, β = 2) The distribution is unimodal, negatively skewed, left-tailed, with one inflection point, located to the left of the mode: ::x = \text - \kappa = 1 - \frac * (1 < α < 2, β > 2, α+β>2) The distribution is unimodal, positively skewed, right-tailed, with one inflection point, located to the right of the mode: ::x =\text + \kappa = \frac *(0 < α < 1, 1 < β < 2) The distribution has a mode at the left end ''x'' = 0 and it is positively skewed, right-tailed. There is one inflection point, located to the right of the mode: ::x = \frac *(α > 2, 1 < β < 2) The distribution is unimodal negatively skewed, left-tailed, with one inflection point, located to the left of the mode: ::x =\text - \kappa = \frac *(1 < α < 2, 0 < β < 1) The distribution has a mode at the right end ''x''=1 and it is negatively skewed, left-tailed. There is one inflection point, located to the left of the mode: ::x = \frac There are no inflection points in the remaining (symmetric and skewed) regions: U-shaped: (α, β < 1) upside-down-U-shaped: (1 < α < 2, 1 < β < 2), reverse-J-shaped (α < 1, β > 2) or J-shaped: (α > 2, β < 1) The accompanying plots show the inflection point locations (shown vertically, ranging from 0 to 1) versus α and β (the horizontal axes ranging from 0 to 5). There are large cuts at surfaces intersecting the lines α = 1, β = 1, α = 2, and β = 2 because at these values the beta distribution change from 2 modes, to 1 mode to no mode.
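As a numerical illustration of the bell-shaped case (α > 2, β > 2), the sketch below locates the sign changes of the density's curvature and checks that the two inflection points lie symmetrically about the mode (α − 1)/(α + β − 2). It assumes Python with NumPy and SciPy; the shape parameters are arbitrary examples:

    # Locate the inflection points of a bell-shaped beta density numerically.
    import numpy as np
    from scipy.stats import beta as beta_dist

    a, b = 4.0, 6.0
    x = np.linspace(1e-4, 1 - 1e-4, 200001)
    pdf = beta_dist.pdf(x, a, b)
    second = np.gradient(np.gradient(pdf, x), x)      # numerical curvature

    idx = np.where(np.diff(np.sign(second)) != 0)[0]  # curvature sign changes
    inflections = x[idx]
    mode = (a - 1) / (a + b - 2)
    print(inflections, mode)
    # the two inflection points are (numerically) equidistant from the mode
    print(np.isclose(mode - inflections[0], inflections[1] - mode, atol=1e-3))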


Shapes

The beta density function can take a wide variety of different shapes depending on the values of the two parameters ''α'' and ''β''. The ability of the beta distribution to take this great diversity of shapes (using only two parameters) is partly responsible for its wide use in modeling actual measurements:


=Symmetric (''α'' = ''β'')

= * the density function is symmetry, symmetric about 1/2 (blue & teal plots). * median = mean = 1/2. *skewness = 0. *variance = 1/(4(2α + 1)) *α = β < 1 **U-shaped (blue plot). **bimodal: left mode = 0, right mode =1, anti-mode = 1/2 **1/12 < var(''X'') < 1/4 **−2 < excess kurtosis(''X'') < −6/5 ** α = β = 1/2 is the arcsine distribution *** var(''X'') = 1/8 ***excess kurtosis(''X'') = −3/2 ***CF = Rinc (t) ** α = β → 0 is a 2-point
Bernoulli distribution
with equal probability 1/2 at each Dirac delta function end ''x'' = 0 and ''x'' = 1 and zero probability everywhere else. A coin toss: one face of the coin being ''x'' = 0 and the other face being ''x'' = 1. *** \lim_ \operatorname(X) = \tfrac *** \lim_ \operatorname(X) = - 2 a lower value than this is impossible for any distribution to reach. *** The information entropy, differential entropy approaches a Maxima and minima, minimum value of −∞ *α = β = 1 **the uniform distribution (continuous), uniform
[0, 1]
distribution **no mode **var(''X'') = 1/12 **excess kurtosis(''X'') = −6/5 **The (negative anywhere else) information entropy, differential entropy reaches its Maxima and minima, maximum value of zero **CF = Sinc (t) *''α'' = ''β'' > 1 **symmetric unimodal ** mode = 1/2. **0 < var(''X'') < 1/12 **−6/5 < excess kurtosis(''X'') < 0 **''α'' = ''β'' = 3/2 is a semi-elliptic
[0, 1]
distribution, see: Wigner semicircle distribution ***var(''X'') = 1/16. ***excess kurtosis(''X'') = −1 ***CF = 2 Jinc (t) **''α'' = ''β'' = 2 is the parabolic
[0, 1]
distribution ***var(''X'') = 1/20 ***excess kurtosis(''X'') = −6/7 ***CF = 3 Tinc (t) **''α'' = ''β'' > 2 is bell-shaped, with inflection points located to either side of the mode ***0 < var(''X'') < 1/20 ***−6/7 < excess kurtosis(''X'') < 0 **''α'' = ''β'' → ∞ is a 1-point
degenerate distribution
with a Dirac delta function spike at the midpoint ''x'' = 1/2 with probability 1, and zero probability everywhere else. There is 100% probability (absolute certainty) concentrated at the single point ''x'' = 1/2. *** \lim_ \operatorname(X) = 0 *** \lim_ \operatorname(X) = 0 ***The information entropy, differential entropy approaches a Maxima and minima, minimum value of −∞


=Skewed (''α'' ≠ ''β'')

= The density function is Skewness, skewed. An interchange of parameter values yields the mirror image (the reverse) of the initial curve, some more specific cases: *''α'' < 1, ''β'' < 1 ** U-shaped ** Positive skew for α < β, negative skew for α > β. ** bimodal: left mode = 0, right mode = 1, anti-mode = \tfrac ** 0 < median < 1. ** 0 < var(''X'') < 1/4 *α > 1, β > 1 ** unimodal (magenta & cyan plots), **Positive skew for α < β, negative skew for α > β. **\text= \tfrac ** 0 < median < 1 ** 0 < var(''X'') < 1/12 *α < 1, β ≥ 1 **reverse J-shaped with a right tail, **positively skewed, **strictly decreasing, convex function, convex ** mode = 0 ** 0 < median < 1/2. ** 0 < \operatorname(X) < \tfrac, (maximum variance occurs for \alpha=\tfrac, \beta=1, or α = Φ the Golden ratio, golden ratio conjugate) *α ≥ 1, β < 1 **J-shaped with a left tail, **negatively skewed, **strictly increasing, convex function, convex ** mode = 1 ** 1/2 < median < 1 ** 0 < \operatorname(X) < \tfrac, (maximum variance occurs for \alpha=1, \beta=\tfrac, or β = Φ the Golden ratio, golden ratio conjugate) *α = 1, β > 1 **positively skewed, **strictly decreasing (red plot), **a reversed (mirror-image) power function ,1distribution ** mean = 1 / (β + 1) ** median = 1 - 1/21/β ** mode = 0 **α = 1, 1 < β < 2 ***concave function, concave *** 1-\tfrac< \text < \tfrac *** 1/18 < var(''X'') < 1/12. **α = 1, β = 2 ***a straight line with slope −2, the right-triangular distribution with right angle at the left end, at ''x'' = 0 *** \text=1-\tfrac *** var(''X'') = 1/18 **α = 1, β > 2 ***reverse J-shaped with a right tail, ***convex function, convex *** 0 < \text < 1-\tfrac *** 0 < var(''X'') < 1/18 *α > 1, β = 1 **negatively skewed, **strictly increasing (green plot), **the power function
[0, 1]
distribution ** mean = α / (α + 1) ** median = 1/21/α ** mode = 1 **2 > α > 1, β = 1 ***concave function, concave *** \tfrac < \text < \tfrac *** 1/18 < var(''X'') < 1/12 ** α = 2, β = 1 ***a straight line with slope +2, the right-triangular distribution with right angle at the right end, at ''x'' = 1 *** \text=\tfrac *** var(''X'') = 1/18 **α > 2, β = 1 ***J-shaped with a left tail, convex function, convex ***\tfrac < \text < 1 *** 0 < var(''X'') < 1/18


Related distributions


Transformations

* If ''X'' ~ Beta(''α'', ''β'') then 1 − ''X'' ~ Beta(''β'', ''α'') Mirror image, mirror-image symmetry * If ''X'' ~ Beta(''α'', ''β'') then \tfrac \sim (\alpha,\beta). The
beta prime distribution
, also called "beta distribution of the second kind". * If ''X'' ~ Beta(''α'', ''β'') then \tfrac -1 \sim (\beta,\alpha). * If ''X'' ~ Beta(''n''/2, ''m''/2) then \tfrac \sim F(n,m) (assuming ''n'' > 0 and ''m'' > 0), the F-distribution, Fisher–Snedecor F distribution. * If X \sim \operatorname\left(1+\lambda\tfrac, 1 + \lambda\tfrac\right) then min + ''X''(max − min) ~ PERT(min, max, ''m'', ''λ'') where ''PERT'' denotes a PERT distribution used in PERT analysis, and ''m''=most likely value.Herrerías-Velasco, José Manuel and Herrerías-Pleguezuelo, Rafael and René van Dorp, Johan. (2011). Revisiting the PERT mean and Variance. European Journal of Operational Research (210), p. 448–451. Traditionally ''λ'' = 4 in PERT analysis. * If ''X'' ~ Beta(1, ''β'') then ''X'' ~ Kumaraswamy distribution with parameters (1, ''β'') * If ''X'' ~ Beta(''α'', 1) then ''X'' ~ Kumaraswamy distribution with parameters (''α'', 1) * If ''X'' ~ Beta(''α'', 1) then −ln(''X'') ~ Exponential(''α'')
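Two of the transformations above can be checked by simulation. The sketch below assumes Python with NumPy and SciPy; the sample size and parameter values are arbitrary examples:

    # Monte Carlo sanity check of two beta-distribution transformations.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    a, b = 2.0, 3.0

    # If X ~ Beta(a, 1) then -ln(X) ~ Exponential(a), i.e. exponential with scale 1/a
    x = rng.beta(a, 1.0, size=100_000)
    print(stats.kstest(-np.log(x), 'expon', args=(0, 1 / a)).pvalue)   # should not be small

    # If X ~ Beta(a, b) then X/(1-X) follows the beta prime distribution
    y = rng.beta(a, b, size=100_000)
    print(stats.kstest(y / (1 - y), 'betaprime', args=(a, b)).pvalue)  # should not be small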


Special and limiting cases

* Beta(1, 1) ~ uniform distribution (continuous), U(0, 1). * Beta(n, 1) ~ Maximum of ''n'' independent rvs. with uniform distribution (continuous), U(0, 1), sometimes called a ''standard power function distribution'' with density ''nx''^(''n''−1) on that interval. * Beta(1, n) ~ Minimum of ''n'' independent rvs. with uniform distribution (continuous), U(0, 1). * If ''X'' ~ Beta(3/2, 3/2) and ''r'' > 0 then 2''rX'' − ''r'' ~ Wigner semicircle distribution. * Beta(1/2, 1/2) is equivalent to the arcsine distribution. This distribution is also Jeffreys prior probability for the
Bernoulli
and binomial distributions. The arcsine probability density is a distribution that appears in several random-walk fundamental theorems. In a fair coin toss
random walk
, the probability for the time of the last visit to the origin is distributed as a (U-shaped) arcsine distribution. In a two-player fair-coin-toss game, a player is said to be in the lead if the random walk (that started at the origin) is above the origin. The most probable number of times that a given player will be in the lead, in a game of length 2''N'', is not ''N''. On the contrary, ''N'' is the least likely number of times that the player will be in the lead. The most likely number of times in the lead is 0 or 2''N'' (following the arcsine distribution). * \lim_{n \to \infty} n \operatorname{Beta}(1,n) = \operatorname{Exponential}(1), the exponential distribution. * \lim_{n \to \infty} n \operatorname{Beta}(k,n) = \operatorname{Gamma}(k,1), the gamma distribution. * For large ''n'', \operatorname{Beta}(\alpha n,\beta n) \to \mathcal{N}\left(\frac{\alpha}{\alpha+\beta},\frac{1}{n}\frac{\alpha\beta}{(\alpha+\beta)^3}\right), the normal distribution. More precisely, if X_n \sim \operatorname{Beta}(\alpha n,\beta n) then \sqrt{n}\left(X_n -\tfrac{\alpha}{\alpha+\beta}\right) converges in distribution to a normal distribution with mean 0 and variance \tfrac{\alpha\beta}{(\alpha+\beta)^3} as ''n'' increases.
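The identification of Beta(n, 1) with the maximum of ''n'' independent uniform variables, and of Beta(1, 1) with the standard uniform distribution, can be illustrated by simulation. The sketch assumes Python with NumPy and SciPy; ''n'' and the sample sizes are arbitrary examples:

    # Simulation check of two special cases listed above.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    n = 5
    u_max = rng.random((200_000, n)).max(axis=1)               # maxima of n uniforms
    print(stats.kstest(u_max, 'beta', args=(n, 1)).pvalue)     # should not be small

    # Beta(1, 1) is the standard uniform distribution
    print(stats.kstest(rng.beta(1, 1, 200_000), 'uniform').pvalue)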


Derived from other distributions

* The ''k''th order statistic of a sample of size ''n'' from the Uniform distribution (continuous), uniform distribution is a beta random variable, ''U''(''k'') ~ Beta(''k'', ''n''+1−''k''). * If ''X'' ~ Gamma(α, θ) and ''Y'' ~ Gamma(β, θ) are independent, then \tfrac \sim \operatorname(\alpha, \beta)\,. * If X \sim \chi^2(\alpha)\, and Y \sim \chi^2(\beta)\, are independent, then \tfrac \sim \operatorname(\tfrac, \tfrac). * If ''X'' ~ U(0, 1) and ''α'' > 0 then ''X''1/''α'' ~ Beta(''α'', 1). The power function distribution. * If X \sim\operatorname(k;n;p), then \sim \operatorname(\alpha, \beta) for discrete values of ''n'' and ''k'' where \alpha=k+1 and \beta=n-k+1. * If ''X'' ~ Cauchy(0, 1) then \tfrac \sim \operatorname\left(\tfrac12, \tfrac12\right)\,
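The gamma-ratio construction above (X/(X + Y) for independent gamma variables sharing a common scale) lends itself to a quick Monte Carlo check. The sketch assumes Python with NumPy and SciPy; the parameter values are arbitrary examples:

    # If X ~ Gamma(a, s) and Y ~ Gamma(b, s) are independent, X/(X+Y) ~ Beta(a, b).
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    a, b, scale = 2.5, 4.0, 1.7          # any common scale works
    x = rng.gamma(a, scale, 300_000)
    y = rng.gamma(b, scale, 300_000)
    print(stats.kstest(x / (x + y), 'beta', args=(a, b)).pvalue)   # should not be small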


Combination with other distributions

* If ''X'' ~ Beta(''α'', ''β'') and ''Y'' ~ F(2''β'',2''α'') then \Pr(X \leq \tfrac{\alpha}{\alpha+\beta x}) = \Pr(Y \geq x)\, for all ''x'' > 0.


Compounding with other distributions

* If ''p'' ~ Beta(α, β) and ''X'' ~ Bin(''k'', ''p'') then ''X'' ~ beta-binomial distribution * If ''p'' ~ Beta(α, β) and ''X'' ~ NB(''r'', ''p'') then ''X'' ~ beta negative binomial distribution
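The beta-binomial compounding above can be reproduced by first drawing ''p'' from a beta distribution and then drawing from a binomial with that ''p''. The sketch assumes Python with NumPy and SciPy (scipy.stats.betabinom); parameter values are arbitrary examples:

    # Compounding check: p ~ Beta(alpha, b), X | p ~ Binomial(k, p)  =>  X ~ BetaBinom(k, alpha, b).
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    alpha, b, k = 2.0, 5.0, 10
    p = rng.beta(alpha, b, 200_000)
    x = rng.binomial(k, p)

    # compare empirical frequencies with the beta-binomial pmf
    emp = np.bincount(x, minlength=k + 1) / x.size
    print(np.max(np.abs(emp - stats.betabinom.pmf(np.arange(k + 1), k, alpha, b))))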


Generalisations

* The generalization to multiple variables, i.e. a Dirichlet distribution, multivariate Beta distribution, is called a
Dirichlet distribution
. Univariate marginals of the Dirichlet distribution have a beta distribution. The beta distribution is Conjugate prior, conjugate to the binomial and Bernoulli distributions in exactly the same way as the
Dirichlet distribution
is conjugate to the multinomial distribution and categorical distribution. * The Pearson distribution#The Pearson type I distribution, Pearson type I distribution is identical to the beta distribution (except for arbitrary shifting and re-scaling that can also be accomplished with the four parameter parametrization of the beta distribution). * The beta distribution is the special case of the noncentral beta distribution where \lambda = 0: \operatorname(\alpha, \beta) = \operatorname(\alpha,\beta,0). * The generalized beta distribution is a five-parameter distribution family which has the beta distribution as a special case. * The matrix variate beta distribution is a distribution for positive-definite matrices.


Statistical inference


Parameter estimation


Method of moments


=Two unknown parameters

= Two unknown parameters (\hat{\alpha}, \hat{\beta}) of a beta distribution supported on the [0, 1] interval can be estimated, using the method of moments, with the first two moments (sample mean and sample variance) as follows. Let: :\text{sample mean} = \bar{x} = \frac{1}{N}\sum_{i=1}^N X_i be the sample mean estimate and :\text{sample variance} = \bar{v} = \frac{1}{N-1}\sum_{i=1}^N (X_i - \bar{x})^2 be the sample variance estimate. The method of moments (statistics), method-of-moments estimates of the parameters are :\hat{\alpha} = \bar{x} \left(\frac{\bar{x}(1 - \bar{x})}{\bar{v}} - 1 \right), if \bar{v} < \bar{x}(1 - \bar{x}), :\hat{\beta} = (1-\bar{x}) \left(\frac{\bar{x}(1 - \bar{x})}{\bar{v}} - 1 \right), if \bar{v} < \bar{x}(1 - \bar{x}). When the distribution is required over a known interval other than [0, 1] with random variable ''X'', say [''a'', ''c''] with random variable ''Y'', then replace \bar{x} with \frac{\bar{y}-a}{c-a}, and \bar{v} with \frac{\bar{v}_Y}{(c-a)^2} in the above couple of equations for the shape parameters (see the "Alternative parametrizations, four parameters" section below), where: :\text{sample mean} = \bar{y} = \frac{1}{N}\sum_{i=1}^N Y_i :\text{sample variance} = \bar{v}_Y = \frac{1}{N-1}\sum_{i=1}^N (Y_i - \bar{y})^2
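A direct implementation of the two-parameter method-of-moments estimators above, checked on simulated data, is sketched below. It assumes Python with NumPy; the true parameter values and sample size are arbitrary examples:

    # Method-of-moments estimates of (alpha, beta) for data supported on [0, 1].
    import numpy as np

    def beta_method_of_moments(x):
        """Return (alpha_hat, beta_hat) from the sample mean and variance."""
        m = x.mean()
        v = x.var(ddof=1)
        if not v < m * (1 - m):
            raise ValueError("moment estimates require var < mean*(1-mean)")
        common = m * (1 - m) / v - 1
        return m * common, (1 - m) * common

    rng = np.random.default_rng(4)
    sample = rng.beta(2.0, 5.0, 50_000)
    print(beta_method_of_moments(sample))   # should be close to (2.0, 5.0)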


=Four unknown parameters

= All four parameters (\hat, \hat, \hat, \hat of a beta distribution supported in the [''a'', ''c''] interval -see section Beta distribution#Four parameters 2, "Alternative parametrizations, Four parameters"-) can be estimated, using the method of moments developed by Karl Pearson, by equating sample and population values of the first four central moments (mean, variance, skewness and excess kurtosis). The excess kurtosis was expressed in terms of the square of the skewness, and the sample size ν = α + β, (see previous section Beta distribution#Kurtosis, "Kurtosis") as follows: :\text =\frac\left(\frac (\text)^2 - 1\right)\text^2-2< \text< \tfrac (\text)^2 One can use this equation to solve for the sample size ν= α + β in terms of the square of the skewness and the excess kurtosis as follows: :\hat = \hat + \hat = 3\frac :\text^2-2< \text< \tfrac (\text)^2 This is the ratio (multiplied by a factor of 3) between the previously derived limit boundaries for the beta distribution in a space (as originally done by Karl Pearson) defined with coordinates of the square of the skewness in one axis and the excess kurtosis in the other axis (see ): The case of zero skewness, can be immediately solved because for zero skewness, α = β and hence ν = 2α = 2β, therefore α = β = ν/2 : \hat = \hat = \frac= \frac : \text= 0 \text -2<\text<0 (Excess kurtosis is negative for the beta distribution with zero skewness, ranging from -2 to 0, so that \hat -and therefore the sample shape parameters- is positive, ranging from zero when the shape parameters approach zero and the excess kurtosis approaches -2, to infinity when the shape parameters approach infinity and the excess kurtosis approaches zero). For non-zero sample skewness one needs to solve a system of two coupled equations. Since the skewness and the excess kurtosis are independent of the parameters \hat, \hat, the parameters \hat, \hat can be uniquely determined from the sample skewness and the sample excess kurtosis, by solving the coupled equations with two known variables (sample skewness and sample excess kurtosis) and two unknowns (the shape parameters): :(\text)^2 = \frac :\text =\frac\left(\frac (\text)^2 - 1\right) :\text^2-2< \text< \tfrac(\text)^2 resulting in the following solution: : \hat, \hat = \frac \left (1 \pm \frac \right ) : \text\neq 0 \text (\text)^2-2< \text< \tfrac (\text)^2 Where one should take the solutions as follows: \hat>\hat for (negative) sample skewness < 0, and \hat<\hat for (positive) sample skewness > 0. The accompanying plot shows these two solutions as surfaces in a space with horizontal axes of (sample excess kurtosis) and (sample squared skewness) and the shape parameters as the vertical axis. The surfaces are constrained by the condition that the sample excess kurtosis must be bounded by the sample squared skewness as stipulated in the above equation. The two surfaces meet at the right edge defined by zero skewness. Along this right edge, both parameters are equal and the distribution is symmetric U-shaped for α = β < 1, uniform for α = β = 1, upside-down-U-shaped for 1 < α = β < 2 and bell-shaped for α = β > 2. The surfaces also meet at the front (lower) edge defined by "the impossible boundary" line (excess kurtosis + 2 - skewness2 = 0). 
Along this front (lower) boundary both shape parameters approach zero, and the probability density is concentrated more at one end than the other end (with practically nothing in between), with probabilities p=\tfrac at the left end ''x'' = 0 and q = 1-p = \tfrac at the right end ''x'' = 1. The two surfaces become further apart towards the rear edge. At this rear edge the surface parameters are quite different from each other. As remarked, for example, by Bowman and Shenton, sampling in the neighborhood of the line (sample excess kurtosis - (3/2)(sample skewness)2 = 0) (the just-J-shaped portion of the rear edge where blue meets beige), "is dangerously near to chaos", because at that line the denominator of the expression above for the estimate ν = α + β becomes zero and hence ν approaches infinity as that line is approached. Bowman and Shenton write that "the higher moment parameters (kurtosis and skewness) are extremely fragile (near that line). However, the mean and standard deviation are fairly reliable." Therefore, the problem is for the case of four parameter estimation for very skewed distributions such that the excess kurtosis approaches (3/2) times the square of the skewness. This boundary line is produced by extremely skewed distributions with very large values of one of the parameters and very small values of the other parameter. See for a numerical example and further comments about this rear edge boundary line (sample excess kurtosis - (3/2)(sample skewness)2 = 0). As remarked by Karl Pearson himself this issue may not be of much practical importance as this trouble arises only for very skewed J-shaped (or mirror-image J-shaped) distributions with very different values of shape parameters that are unlikely to occur much in practice). The usual skewed-bell-shape distributions that occur in practice do not have this parameter estimation problem. The remaining two parameters \hat, \hat can be determined using the sample mean and the sample variance using a variety of equations. One alternative is to calculate the support interval range (\hat-\hat) based on the sample variance and the sample kurtosis. For this purpose one can solve, in terms of the range (\hat- \hat), the equation expressing the excess kurtosis in terms of the sample variance, and the sample size ν (see and ): :\text =\frac\bigg(\frac - 6 - 5 \hat \bigg) to obtain: : (\hat- \hat) = \sqrt\sqrt Another alternative is to calculate the support interval range (\hat-\hat) based on the sample variance and the sample skewness. For this purpose one can solve, in terms of the range (\hat-\hat), the equation expressing the squared skewness in terms of the sample variance, and the sample size ν (see section titled "Skewness" and "Alternative parametrizations, four parameters"): :(\text)^2 = \frac\bigg(\frac-4(1+\hat)\bigg) to obtain: : (\hat- \hat) = \frac\sqrt The remaining parameter can be determined from the sample mean and the previously obtained parameters: (\hat-\hat), \hat, \hat = \hat+\hat: : \hat = (\text) - \left(\frac\right)(\hat-\hat) and finally, \hat= (\hat- \hat) + \hat . In the above formulas one may take, for example, as estimates of the sample moments: :\begin \text &=\overline = \frac\sum_^N Y_i \\ \text &= \overline_Y = \frac\sum_^N (Y_i - \overline)^2 \\ \text &= G_1 = \frac \frac \\ \text &= G_2 = \frac \frac - \frac \end The estimators ''G''1 for skewness, sample skewness and ''G''2 for kurtosis, sample kurtosis are used by DAP (software), DAP/SAS System, SAS, PSPP/SPSS, and Microsoft Excel, Excel. 
However, they are not used by BMDP and (according to ) they were not used by MINITAB in 1998. Actually, Joanes and Gill in their 1998 study concluded that the skewness and kurtosis estimators used in BMDP and in MINITAB (at that time) had smaller variance and mean-squared error in normal samples, but the skewness and kurtosis estimators used in DAP (software), DAP/SAS System, SAS, PSPP/SPSS, namely ''G''1 and ''G''2, had smaller mean-squared error in samples from a very skewed distribution. It is for this reason that we have spelled out "sample skewness", etc., in the above formulas, to make it explicit that the user should choose the best estimator according to the problem at hand, as the best estimator for skewness and kurtosis depends on the amount of skewness (as shown by Joanes and Gill).
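For the sample moments mentioned above, SciPy's bias-corrected skewness and excess-kurtosis routines appear to correspond to the ''G''1 and ''G''2 estimators used by SAS, SPSS and Excel; this is a hedged illustration, not the original authors' code, and the data below are an arbitrary simulated example:

    # Bias-corrected sample skewness (G1) and excess kurtosis (G2) with SciPy.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(5)
    y = rng.beta(0.5, 3.0, 10_000)
    g1 = stats.skew(y, bias=False)        # sample skewness G1
    g2 = stats.kurtosis(y, bias=False)    # sample excess kurtosis G2
    print(g1, g2)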


Maximum likelihood


=Two unknown parameters

= As is also the case for maximum likelihood estimates for the gamma distribution, the maximum likelihood estimates for the beta distribution do not have a general closed form solution for arbitrary values of the shape parameters. If ''X''1, ..., ''XN'' are independent random variables each having a beta distribution, the joint log likelihood function for ''N'' independent and identically distributed random variables, iid observations is: :\begin \ln\, \mathcal (\alpha, \beta\mid X) &= \sum_^N \ln \left (\mathcal_i (\alpha, \beta\mid X_i) \right )\\ &= \sum_^N \ln \left (f(X_i;\alpha,\beta) \right ) \\ &= \sum_^N \ln \left (\frac \right ) \\ &= (\alpha - 1)\sum_^N \ln (X_i) + (\beta- 1)\sum_^N \ln (1-X_i) - N \ln \Beta(\alpha,\beta) \end Finding the maximum with respect to a shape parameter involves taking the partial derivative with respect to the shape parameter and setting the expression equal to zero yielding the maximum likelihood estimator of the shape parameters: :\frac = \sum_^N \ln X_i -N\frac=0 :\frac = \sum_^N \ln (1-X_i)- N\frac=0 where: :\frac = -\frac+ \frac+ \frac=-\psi(\alpha + \beta) + \psi(\alpha) + 0 :\frac= - \frac+ \frac + \frac=-\psi(\alpha + \beta) + 0 + \psi(\beta) since the
digamma function
denoted ψ(α) is defined as the logarithmic derivative of the
gamma function
: :\psi(\alpha) =\frac To ensure that the values with zero tangent slope are indeed a maximum (instead of a saddle-point or a minimum) one has to also satisfy the condition that the curvature is negative. This amounts to satisfying that the second partial derivative with respect to the shape parameters is negative :\frac= -N\frac<0 :\frac = -N\frac<0 using the previous equations, this is equivalent to: :\frac = \psi_1(\alpha)-\psi_1(\alpha + \beta) > 0 :\frac = \psi_1(\beta) -\psi_1(\alpha + \beta) > 0 where the
trigamma function
, denoted ''ψ''1(''α''), is the second of the
polygamma function
s, and is defined as the derivative of the digamma function: :\psi_1(\alpha) = \frac=\, \frac. These conditions are equivalent to stating that the variances of the logarithmically transformed variables are positive, since: :\operatorname[\ln (X)] = \operatorname[\ln^2 (X)] - (\operatorname[\ln (X)])^2 = \psi_1(\alpha) - \psi_1(\alpha + \beta) :\operatorname ln (1-X)= \operatorname[\ln^2 (1-X)] - (\operatorname[\ln (1-X)])^2 = \psi_1(\beta) - \psi_1(\alpha + \beta) Therefore, the condition of negative curvature at a maximum is equivalent to the statements: : \operatorname[\ln (X)] > 0 : \operatorname ln (1-X)> 0 Alternatively, the condition of negative curvature at a maximum is also equivalent to stating that the following logarithmic derivatives of the geometric means ''GX'' and ''G(1−X)'' are positive, since: : \psi_1(\alpha) - \psi_1(\alpha + \beta) = \frac > 0 : \psi_1(\beta) - \psi_1(\alpha + \beta) = \frac > 0 While these slopes are indeed positive, the other slopes are negative: :\frac, \frac < 0. The slopes of the mean and the median with respect to ''α'' and ''β'' display similar sign behavior. From the condition that at a maximum, the partial derivative with respect to the shape parameter equals zero, we obtain the following system of coupled maximum likelihood estimate equations (for the average log-likelihoods) that needs to be inverted to obtain the (unknown) shape parameter estimates \hat,\hat in terms of the (known) average of logarithms of the samples ''X''1, ..., ''XN'': :\begin \hat[\ln (X)] &= \psi(\hat) - \psi(\hat + \hat)=\frac\sum_^N \ln X_i = \ln \hat_X \\ \hat[\ln(1-X)] &= \psi(\hat) - \psi(\hat + \hat)=\frac\sum_^N \ln (1-X_i)= \ln \hat_ \end where we recognize \log \hat_X as the logarithm of the sample geometric mean and \log \hat_ as the logarithm of the sample geometric mean based on (1 − ''X''), the mirror-image of ''X''. For \hat=\hat, it follows that \hat_X=\hat_ . :\begin \hat_X &= \prod_^N (X_i)^ \\ \hat_ &= \prod_^N (1-X_i)^ \end These coupled equations containing
digamma function
s of the shape parameter estimates \hat,\hat must be solved by numerical methods as done, for example, by Beckman et al. Gnanadesikan et al. give numerical solutions for a few cases. Norman Lloyd Johnson, N.L.Johnson and Samuel Kotz, S.Kotz suggest that for "not too small" shape parameter estimates \hat,\hat, the logarithmic approximation to the digamma function \psi(\hat) \approx \ln(\hat-\tfrac) may be used to obtain initial values for an iterative solution, since the equations resulting from this approximation can be solved exactly: :\ln \frac \approx \ln \hat_X :\ln \frac\approx \ln \hat_ which leads to the following solution for the initial values (of the estimate shape parameters in terms of the sample geometric means) for an iterative solution: :\hat\approx \tfrac + \frac \text \hat >1 :\hat\approx \tfrac + \frac \text \hat > 1 Alternatively, the estimates provided by the method of moments can instead be used as initial values for an iterative solution of the maximum likelihood coupled equations in terms of the digamma functions. When the distribution is required over a known interval other than
[0, 1]
with random variable ''X'', say [''a'', ''c''] with random variable ''Y'', then replace ln(''Xi'') in the first equation with :\ln \frac, and replace ln(1−''Xi'') in the second equation with :\ln \frac (see "Alternative parametrizations, four parameters" section below). If one of the shape parameters is known, the problem is considerably simplified. The following logit transformation can be used to solve for the unknown shape parameter (for skewed cases such that \hat\neq\hat, otherwise, if symmetric, both -equal- parameters are known when one is known): :\hat \left[\ln \left(\frac \right) \right]=\psi(\hat) - \psi(\hat)=\frac\sum_^N \ln\frac = \ln \hat_X - \ln \left(\hat_\right) This logit transformation is the logarithm of the transformation that divides the variable ''X'' by its mirror-image (''X''/(1 - ''X'') resulting in the "inverted beta distribution" or
beta prime distribution
(also known as beta distribution of the second kind or Pearson distribution, Pearson's Type VI) with support [0, +∞). As previously discussed in the section "Moments of logarithmically transformed random variables," the logit transformation \ln\frac, studied by Johnson, extends the finite support
[0, 1]
based on the original variable ''X'' to infinite support in both directions of the real line (−∞, +∞). If, for example, \hat is known, the unknown parameter \hat can be obtained in terms of the inverse digamma function of the right hand side of this equation: :\psi(\hat)=\frac\sum_^N \ln\frac + \psi(\hat) :\hat=\psi^(\ln \hat_X - \ln \hat_ + \psi(\hat)) In particular, if one of the shape parameters has a value of unity, for example for \hat = 1 (the power function distribution with bounded support [0,1]), using the identity ψ(''x'' + 1) = ψ(''x'') + 1/''x'' in the equation \psi(\hat) - \psi(\hat + \hat)= \ln \hat_X, the maximum likelihood estimator for the unknown parameter \hat is, exactly: :\hat= - \frac= - \frac The beta has support [0, 1], therefore \hat_X < 1, and hence (-\ln \hat_X) >0, and therefore \hat >0. In conclusion, the maximum likelihood estimates of the shape parameters of a beta distribution are (in general) a complicated function of the sample geometric mean, and of the sample geometric mean based on ''(1−X)'', the mirror-image of ''X''. One may ask, if the variance (in addition to the mean) is necessary to estimate two shape parameters with the method of moments, why is the (logarithmic or geometric) variance not necessary to estimate two shape parameters with the maximum likelihood method, for which only the geometric means suffice? The answer is because the mean does not provide as much information as the geometric mean. For a beta distribution with equal shape parameters ''α'' = ''β'', the mean is exactly 1/2, regardless of the value of the shape parameters, and therefore regardless of the value of the statistical dispersion (the variance). On the other hand, the geometric mean of a beta distribution with equal shape parameters ''α'' = ''β'', depends on the value of the shape parameters, and therefore it contains more information. Also, the geometric mean of a beta distribution does not satisfy the symmetry conditions satisfied by the mean, therefore, by employing both the geometric mean based on ''X'' and geometric mean based on (1 − ''X''), the maximum likelihood method is able to provide best estimates for both parameters ''α'' = ''β'', without need of employing the variance. One can express the joint log likelihood per ''N'' independent and identically distributed random variables, iid observations in terms of the ''sufficient statistics'' (the sample geometric means) as follows: :\frac = (\alpha - 1)\ln \hat_X + (\beta- 1)\ln \hat_- \ln \Beta(\alpha,\beta). We can plot the joint log likelihood per ''N'' observations for fixed values of the sample geometric means to see the behavior of the likelihood function as a function of the shape parameters α and β. In such a plot, the shape parameter estimators \hat,\hat correspond to the maxima of the likelihood function. See the accompanying graph that shows that all the likelihood functions intersect at α = β = 1, which corresponds to the values of the shape parameters that give the maximum entropy (the maximum entropy occurs for shape parameters equal to unity: the uniform distribution). It is evident from the plot that the likelihood function gives sharp peaks for values of the shape parameter estimators close to zero, but that for values of the shape parameters estimators greater than one, the likelihood function becomes quite flat, with less defined peaks. 
Obviously, the maximum likelihood parameter estimation method for the beta distribution becomes less acceptable for larger values of the shape parameter estimators, as the uncertainty in the peak definition increases with the value of the shape parameter estimators. One can arrive at the same conclusion by noticing that the expression for the curvature of the likelihood function is in terms of the geometric variances :\frac= -\operatorname ln X/math> :\frac = -\operatorname[\ln (1-X)] These variances (and therefore the curvatures) are much larger for small values of the shape parameter α and β. However, for shape parameter values α, β > 1, the variances (and therefore the curvatures) flatten out. Equivalently, this result follows from the Cramér–Rao bound, since the
Fisher information
matrix components for the beta distribution are these logarithmic variances. The Cramér–Rao bound states that the
variance
of any ''unbiased'' estimator \hat of α is bounded by the multiplicative inverse, reciprocal of the
Fisher information
: :\mathrm(\hat)\geq\frac\geq\frac :\mathrm(\hat) \geq\frac\geq\frac so the variance of the estimators increases with increasing α and β, as the logarithmic variances decrease. Also one can express the joint log likelihood per ''N'' independent and identically distributed random variables, iid observations in terms of the
digamma function
expressions for the logarithms of the sample geometric means as follows: :\frac = (\alpha - 1)(\psi(\hat) - \psi(\hat + \hat))+(\beta- 1)(\psi(\hat) - \psi(\hat + \hat))- \ln \Beta(\alpha,\beta) this expression is identical to the negative of the cross-entropy (see section on "Quantities of information (entropy)"). Therefore, finding the maximum of the joint log likelihood of the shape parameters, per ''N'' independent and identically distributed random variables, iid observations, is identical to finding the minimum of the cross-entropy for the beta distribution, as a function of the shape parameters. :\frac = - H = -h - D_ = -\ln\Beta(\alpha,\beta)+(\alpha-1)\psi(\hat)+(\beta-1)\psi(\hat)-(\alpha+\beta-2)\psi(\hat+\hat) with the cross-entropy defined as follows: :H = \int_^1 - f(X;\hat,\hat) \ln (f(X;\alpha,\beta)) \, X
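As a practical sketch of two-parameter maximum likelihood fitting, SciPy's beta.fit (with the support fixed to [0, 1] via floc=0, fscale=1) returns the maximum likelihood estimates, and the coupled digamma equations above can then be verified at the fitted values. Python with NumPy and SciPy is assumed; the true parameters and sample size are arbitrary examples:

    # Two-parameter ML fit and check of the coupled digamma conditions.
    import numpy as np
    from scipy import stats
    from scipy.special import psi          # digamma function

    rng = np.random.default_rng(6)
    data = rng.beta(2.0, 0.8, 20_000)

    alpha_hat, beta_hat, loc, scale = stats.beta.fit(data, floc=0, fscale=1)
    print(alpha_hat, beta_hat)             # close to (2.0, 0.8)

    # sufficient statistics: logarithms of the two sample geometric means
    ln_gx  = np.mean(np.log(data))         # ln(G_X)
    ln_g1x = np.mean(np.log1p(-data))      # ln(G_(1-X))

    # at the MLE, psi(a) - psi(a+b) = ln(G_X) and psi(b) - psi(a+b) = ln(G_(1-X))
    print(psi(alpha_hat) - psi(alpha_hat + beta_hat), ln_gx)
    print(psi(beta_hat)  - psi(alpha_hat + beta_hat), ln_g1x)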


=Four unknown parameters

= The procedure is similar to the one followed in the two unknown parameter case. If ''Y''1, ..., ''YN'' are independent random variables each having a beta distribution with four parameters, the joint log likelihood function for ''N'' independent and identically distributed random variables, iid observations is: :\begin \ln\, \mathcal (\alpha, \beta, a, c\mid Y) &= \sum_^N \ln\,\mathcal_i (\alpha, \beta, a, c\mid Y_i)\\ &= \sum_^N \ln\,f(Y_i; \alpha, \beta, a, c) \\ &= \sum_^N \ln\,\frac\\ &= (\alpha - 1)\sum_^N \ln (Y_i - a) + (\beta- 1)\sum_^N \ln (c - Y_i)- N \ln \Beta(\alpha,\beta) - N (\alpha+\beta - 1) \ln (c - a) \end Finding the maximum with respect to a shape parameter involves taking the partial derivative with respect to the shape parameter and setting the expression equal to zero yielding the maximum likelihood estimator of the shape parameters: :\frac= \sum_^N \ln (Y_i - a) - N(-\psi(\alpha + \beta) + \psi(\alpha))- N \ln (c - a)= 0 :\frac = \sum_^N \ln (c - Y_i) - N(-\psi(\alpha + \beta) + \psi(\beta))- N \ln (c - a)= 0 :\frac = -(\alpha - 1) \sum_^N \frac \,+ N (\alpha+\beta - 1)\frac= 0 :\frac = (\beta- 1) \sum_^N \frac \,- N (\alpha+\beta - 1) \frac = 0 these equations can be re-arranged as the following system of four coupled equations (the first two equations are geometric means and the second two equations are the harmonic means) in terms of the maximum likelihood estimates for the four parameters \hat, \hat, \hat, \hat: :\frac\sum_^N \ln \frac = \psi(\hat)-\psi(\hat +\hat )= \ln \hat_X :\frac\sum_^N \ln \frac = \psi(\hat)-\psi(\hat + \hat)= \ln \hat_ :\frac = \frac= \hat_X :\frac = \frac = \hat_ with sample geometric means: :\hat_X = \prod_^ \left (\frac \right )^ :\hat_ = \prod_^ \left (\frac \right )^ The parameters \hat, \hat are embedded inside the geometric mean expressions in a nonlinear way (to the power 1/''N''). This precludes, in general, a closed form solution, even for an initial value approximation for iteration purposes. One alternative is to use as initial values for iteration the values obtained from the method of moments solution for the four parameter case. Furthermore, the expressions for the harmonic means are well-defined only for \hat, \hat > 1, which precludes a maximum likelihood solution for shape parameters less than unity in the four-parameter case. Fisher's information matrix for the four parameter case is Positive-definite matrix, positive-definite only for α, β > 2 (for further discussion, see section on Fisher information matrix, four parameter case), for bell-shaped (symmetric or unsymmetric) beta distributions, with inflection points located to either side of the mode. The following Fisher information components (that represent the expectations of the curvature of the log likelihood function) have mathematical singularity, singularities at the following values: :\alpha = 2: \quad \operatorname \left [- \frac \frac \right ]= _ :\beta = 2: \quad \operatorname\left [- \frac \frac \right ] = _ :\alpha = 2: \quad \operatorname\left [- \frac\frac\right ] = _ :\beta = 1: \quad \operatorname\left [- \frac\frac \right ] = _ (for further discussion see section on Fisher information matrix). Thus, it is not possible to strictly carry on the maximum likelihood estimation for some well known distributions belonging to the four-parameter beta distribution family, like the continuous uniform distribution, uniform distribution (Beta(1, 1, ''a'', ''c'')), and the arcsine distribution (Beta(1/2, 1/2, ''a'', ''c'')). 
Norman Lloyd Johnson, N.L.Johnson and Samuel Kotz, S.Kotz ignore the equations for the harmonic means and instead suggest "If a and c are unknown, and maximum likelihood estimators of ''a'', ''c'', α and β are required, the above procedure (for the two unknown parameter case, with ''X'' transformed as ''X'' = (''Y'' − ''a'')/(''c'' − ''a'')) can be repeated using a succession of trial values of ''a'' and ''c'', until the pair (''a'', ''c'') for which maximum likelihood (given ''a'' and ''c'') is as great as possible, is attained" (where, for the purpose of clarity, their notation for the parameters has been translated into the present notation).
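A rough sketch of the procedure quoted above (profiling the likelihood over trial values of ''a'' and ''c'', and fitting α and β on the rescaled data for each pair) is given below. The grid, data and SciPy calls are illustrative assumptions, not the authors' code:

    # Profile likelihood over (a, c) with ML fits of (alpha, beta) on rescaled data.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(7)
    a_true, c_true = 1.0, 3.0
    data = a_true + (c_true - a_true) * rng.beta(2.5, 3.5, 5_000)

    best = None
    for a in np.linspace(data.min() - 0.2, data.min(), 5, endpoint=False):
        for c in np.linspace(data.max(), data.max() + 0.2, 5)[1:]:
            x = (data - a) / (c - a)                    # rescale to (0, 1)
            alpha, beta_, _, _ = stats.beta.fit(x, floc=0, fscale=1)
            loglik = np.sum(stats.beta.logpdf(data, alpha, beta_, loc=a, scale=c - a))
            if best is None or loglik > best[0]:
                best = (loglik, alpha, beta_, a, c)

    print(best)    # (max log likelihood, alpha_hat, beta_hat, a_hat, c_hat)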


Fisher information matrix

Let a random variable X have a probability density ''f''(''x'';''α''). The partial derivative with respect to the (unknown, and to be estimated) parameter α of the log
likelihood function
is called the score (statistics), score. The second moment of the score is called the
Fisher information
: :\mathcal(\alpha)=\operatorname \left [\left (\frac \ln \mathcal(\alpha\mid X) \right )^2 \right], The expected value, expectation of the score (statistics), score is zero, therefore the Fisher information is also the second moment centered on the mean of the score: the
variance
of the score. If the log
likelihood function
is twice differentiable with respect to the parameter α, and under certain regularity conditions, then the Fisher information may also be written as follows (which is often a more convenient form for calculation purposes): :\mathcal(\alpha) = - \operatorname \left [\frac \ln (\mathcal(\alpha\mid X)) \right]. Thus, the Fisher information is the negative of the expectation of the second derivative with respect to the parameter α of the log
likelihood function
. Therefore, Fisher information is a measure of the curvature of the log likelihood function of α. A low curvature (and therefore high Radius of curvature (mathematics), radius of curvature), flatter log likelihood function curve has low Fisher information; while a log likelihood function curve with large curvature (and therefore low Radius of curvature (mathematics), radius of curvature) has high Fisher information. When the Fisher information matrix is computed at the evaluates of the parameters ("the observed Fisher information matrix") it is equivalent to the replacement of the true log likelihood surface by a Taylor's series approximation, taken as far as the quadratic terms. The word information, in the context of Fisher information, refers to information about the parameters. Information such as: estimation, sufficiency and properties of variances of estimators. The Cramér–Rao bound states that the inverse of the Fisher information is a lower bound on the variance of any
estimator
of a parameter α: :\operatorname[\hat\alpha] \geq \frac. The precision to which one can estimate the estimator of a parameter α is limited by the Fisher Information of the log likelihood function. The Fisher information is a measure of the minimum error involved in estimating a parameter of a distribution and it can be viewed as a measure of the resolving power of an experiment needed to discriminate between two alternative hypothesis of a parameter. When there are ''N'' parameters : \begin \theta_1 \\ \theta_ \\ \dots \\ \theta_ \end, then the Fisher information takes the form of an ''N''×''N'' positive semidefinite matrix, positive semidefinite symmetric matrix, the Fisher Information Matrix, with typical element: :_=\operatorname \left [\left (\frac \ln \mathcal \right) \left(\frac \ln \mathcal \right) \right ]. Under certain regularity conditions, the Fisher Information Matrix may also be written in the following form, which is often more convenient for computation: :_ = - \operatorname \left [\frac \ln (\mathcal) \right ]\,. With ''X''1, ..., ''XN'' iid random variables, an ''N''-dimensional "box" can be constructed with sides ''X''1, ..., ''XN''. Costa and Cover show that the (Shannon) differential entropy ''h''(''X'') is related to the volume of the typical set (having the sample entropy close to the true entropy), while the Fisher information is related to the surface of this typical set.


=Two parameters

= For ''X''1, ..., ''X''''N'' independent random variables each having a beta distribution parametrized with shape parameters ''α'' and ''β'', the joint log likelihood function for ''N'' independent and identically distributed random variables, iid observations is: :\ln (\mathcal (\alpha, \beta\mid X) )= (\alpha - 1)\sum_^N \ln X_i + (\beta- 1)\sum_^N \ln (1-X_i)- N \ln \Beta(\alpha,\beta) therefore the joint log likelihood function per ''N'' independent and identically distributed random variables, iid observations is: :\frac \ln(\mathcal (\alpha, \beta\mid X)) = (\alpha - 1)\frac\sum_^N \ln X_i + (\beta- 1)\frac\sum_^N \ln (1-X_i)-\, \ln \Beta(\alpha,\beta) For the two parameter case, the Fisher information has 4 components: 2 diagonal and 2 off-diagonal. Since the Fisher information matrix is symmetric, one of these off diagonal components is independent. Therefore, the Fisher information matrix has 3 independent components (2 diagonal and 1 off diagonal). Aryal and Nadarajah calculated Fisher's information matrix for the four-parameter case, from which the two parameter case can be obtained as follows: :- \frac= \operatorname[\ln (X)]= \psi_1(\alpha) - \psi_1(\alpha + \beta) =_= \operatorname\left [- \frac \right ] = \ln \operatorname_ :- \frac = \operatorname ln (1-X)= \psi_1(\beta) - \psi_1(\alpha + \beta) =_= \operatorname\left [- \frac \right]= \ln \operatorname_ :- \frac = \operatorname[\ln X,\ln(1-X)] = -\psi_1(\alpha+\beta) =_= \operatorname\left [- \frac \right] = \ln \operatorname_ Since the Fisher information matrix is symmetric : \mathcal_= \mathcal_= \ln \operatorname_ The Fisher information components are equal to the log geometric variances and log geometric covariance. Therefore, they can be expressed as
trigamma function
s, denoted ψ1(α), the second of the
polygamma function
s, defined as the derivative of the digamma function: :\psi_1(\alpha) = \frac=\, \frac. These derivatives are also derived in the and plots of the log likelihood function are also shown in that section. contains plots and further discussion of the Fisher information matrix components: the log geometric variances and log geometric covariance as a function of the shape parameters α and β. contains formulas for moments of logarithmically transformed random variables. Images for the Fisher information components \mathcal_, \mathcal_ and \mathcal_ are shown in . The determinant of Fisher's information matrix is of interest (for example for the calculation of Jeffreys prior probability). From the expressions for the individual components of the Fisher information matrix, it follows that the determinant of Fisher's (symmetric) information matrix for the beta distribution is: :\begin \det(\mathcal(\alpha, \beta))&= \mathcal_ \mathcal_-\mathcal_ \mathcal_ \\ pt&=(\psi_1(\alpha) - \psi_1(\alpha + \beta))(\psi_1(\beta) - \psi_1(\alpha + \beta))-( -\psi_1(\alpha+\beta))( -\psi_1(\alpha+\beta))\\ pt&= \psi_1(\alpha)\psi_1(\beta)-( \psi_1(\alpha)+\psi_1(\beta))\psi_1(\alpha + \beta)\\ pt\lim_ \det(\mathcal(\alpha, \beta)) &=\lim_ \det(\mathcal(\alpha, \beta)) = \infty\\ pt\lim_ \det(\mathcal(\alpha, \beta)) &=\lim_ \det(\mathcal(\alpha, \beta)) = 0 \end From Sylvester's criterion (checking whether the diagonal elements are all positive), it follows that the Fisher information matrix for the two parameter case is Positive-definite matrix, positive-definite (under the standard condition that the shape parameters are positive ''α'' > 0 and ''β'' > 0).
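The two-parameter Fisher information matrix can be written out directly with trigamma functions, matching the component formulas above. The sketch assumes Python with NumPy and SciPy (scipy.special.polygamma); the shape parameters are arbitrary examples:

    # Per-observation Fisher information matrix of Beta(a, b) via trigamma functions.
    import numpy as np
    from scipy.special import polygamma

    def beta_fisher_information(a, b):
        trigamma = lambda z: polygamma(1, z)
        i_aa = trigamma(a) - trigamma(a + b)    # var[ln X]
        i_bb = trigamma(b) - trigamma(a + b)    # var[ln(1 - X)]
        i_ab = -trigamma(a + b)                 # cov[ln X, ln(1 - X)]
        return np.array([[i_aa, i_ab], [i_ab, i_bb]])

    I = beta_fisher_information(2.0, 3.0)
    print(I)
    print(np.linalg.det(I))   # positive, consistent with positive-definiteness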


Four parameters

If ''Y''1, ..., ''Y''''N'' are independent random variables each having a beta distribution with four parameters: the exponents ''α'' and ''β'', and also ''a'' (the minimum of the distribution range) and ''c'' (the maximum of the distribution range) (see the section titled "Alternative parametrizations", "Four parameters"), with probability density function:

:f(y; \alpha, \beta, a, c) = \frac{(y-a)^{\alpha-1}(c-y)^{\beta-1}}{\Beta(\alpha,\beta)(c-a)^{\alpha+\beta-1}},

the joint log likelihood function per ''N'' independent and identically distributed (iid) observations is:

:\frac{1}{N} \ln(\mathcal{L} (\alpha, \beta, a, c\mid Y)) = \frac{\alpha-1}{N}\sum_{i=1}^N \ln (Y_i - a) + \frac{\beta-1}{N}\sum_{i=1}^N \ln (c - Y_i) - \ln \Beta(\alpha,\beta) - (\alpha+\beta - 1) \ln (c-a)

For the four parameter case, the Fisher information matrix has 4 × 4 = 16 components, of which 12 are off-diagonal. Since the matrix is symmetric, only half of the off-diagonal components (12/2 = 6) are independent, so the Fisher information matrix has 6 independent off-diagonal + 4 diagonal = 10 independent components. Aryal and Nadarajah calculated Fisher's information matrix for the four parameter case as follows:

:- \frac{1}{N}\frac{\partial^2 \ln \mathcal{L}}{\partial \alpha^2} = \operatorname{var}[\ln X] = \psi_1(\alpha) - \psi_1(\alpha + \beta) = \mathcal{I}_{\alpha,\alpha} = \operatorname{E}\left[- \frac{1}{N}\frac{\partial^2 \ln \mathcal{L}}{\partial \alpha^2} \right] = \ln (\operatorname{var}_{GX})

:- \frac{1}{N}\frac{\partial^2 \ln \mathcal{L}}{\partial \beta^2} = \operatorname{var}[\ln (1-X)] = \psi_1(\beta) - \psi_1(\alpha + \beta) = \mathcal{I}_{\beta,\beta} = \operatorname{E}\left[- \frac{1}{N}\frac{\partial^2 \ln \mathcal{L}}{\partial \beta^2} \right] = \ln (\operatorname{var}_{G(1-X)})

:- \frac{1}{N}\frac{\partial^2 \ln \mathcal{L}}{\partial \alpha\,\partial \beta} = \operatorname{cov}[\ln X, \ln(1-X)] = -\psi_1(\alpha+\beta) = \mathcal{I}_{\alpha,\beta} = \operatorname{E}\left[- \frac{1}{N}\frac{\partial^2 \ln \mathcal{L}}{\partial \alpha\,\partial \beta} \right] = \ln (\operatorname{cov}_{G X,(1-X)})

In the above expressions, the use of ''X'' instead of ''Y'' in the expressions var[ln(''X'')] = ln(var''GX'') is ''not an error''. The expressions in terms of the log geometric variances and log geometric covariance occur as functions of the two-parameter ''X'' ~ Beta(''α'', ''β'') parametrization because, when taking the partial derivatives with respect to the exponents (''α'', ''β'') in the four parameter case, one obtains expressions identical to those for the two parameter case: these terms of the four parameter Fisher information matrix are independent of the minimum ''a'' and maximum ''c'' of the distribution's range. The only non-zero term upon double differentiation of the log likelihood function with respect to the exponents ''α'' and ''β'' is the second derivative of the log of the beta function: ln(B(''α'', ''β'')). This term is independent of the minimum ''a'' and maximum ''c'' of the distribution's range, and double differentiation of it results in trigamma functions. The sections titled "Maximum likelihood", "Two unknown parameters" and "Four unknown parameters" also show this fact. The Fisher information for ''N'' i.i.d. samples is ''N'' times the individual Fisher information (eq. 11.279, page 394 of Cover and Thomas). (Aryal and Nadarajah take a single observation, ''N'' = 1, to calculate the following components of the Fisher information, which leads to the same result as considering the derivatives of the log likelihood per ''N'' observations. Moreover, an erroneous expression in Aryal and Nadarajah for one of the components below has been corrected.)
:\begin{align} \alpha > 2: \quad \operatorname{E}\left[- \frac{1}{N}\frac{\partial^2 \ln \mathcal{L}}{\partial a^2} \right] &= \mathcal{I}_{a,a} = \frac{\beta(\alpha+\beta-1)}{(\alpha-2)(c-a)^2} \\ \beta > 2: \quad \operatorname{E}\left[- \frac{1}{N}\frac{\partial^2 \ln \mathcal{L}}{\partial c^2} \right] &= \mathcal{I}_{c,c} = \frac{\alpha(\alpha+\beta-1)}{(\beta-2)(c-a)^2} \\ \operatorname{E}\left[- \frac{1}{N}\frac{\partial^2 \ln \mathcal{L}}{\partial a\,\partial c} \right] &= \mathcal{I}_{a,c} = \frac{\alpha+\beta-1}{(c-a)^2} \\ \alpha > 1: \quad \operatorname{E}\left[- \frac{1}{N}\frac{\partial^2 \ln \mathcal{L}}{\partial \alpha\,\partial a} \right] &= \mathcal{I}_{\alpha,a} = \frac{\beta}{(\alpha-1)(c-a)} \\ \operatorname{E}\left[- \frac{1}{N}\frac{\partial^2 \ln \mathcal{L}}{\partial \alpha\,\partial c} \right] &= \mathcal{I}_{\alpha,c} = \frac{1}{c-a} \\ \operatorname{E}\left[- \frac{1}{N}\frac{\partial^2 \ln \mathcal{L}}{\partial \beta\,\partial a} \right] &= \mathcal{I}_{\beta,a} = -\frac{1}{c-a} \\ \beta > 1: \quad \operatorname{E}\left[- \frac{1}{N}\frac{\partial^2 \ln \mathcal{L}}{\partial \beta\,\partial c} \right] &= \mathcal{I}_{\beta,c} = -\frac{\alpha}{(\beta-1)(c-a)} \end{align}

The lower two diagonal entries of the Fisher information matrix, with respect to the parameter ''a'' (the minimum of the distribution's range), \mathcal{I}_{a,a}, and with respect to the parameter ''c'' (the maximum of the distribution's range), \mathcal{I}_{c,c}, are only defined for exponents α > 2 and β > 2 respectively. The component \mathcal{I}_{a,a} for the minimum ''a'' approaches infinity as the exponent α approaches 2 from above, and the component \mathcal{I}_{c,c} for the maximum ''c'' approaches infinity as the exponent β approaches 2 from above.

The Fisher information matrix for the four parameter case does not depend on the individual values of the minimum ''a'' and the maximum ''c'', but only on the total range (''c'' − ''a''). Moreover, the components of the Fisher information matrix that depend on the range (''c'' − ''a'') depend only through its inverse (or the square of the inverse), so the Fisher information decreases with increasing range (''c'' − ''a''). Plots of these Fisher information components as functions of the shape parameters all look like a basin, with the "walls" of the basin located at low values of the parameters.

The following four-parameter Fisher information components can be expressed in terms of the two-parameter ''X'' ~ Beta(α, β) expectations of the transformed ratio ((1 − ''X'')/''X'') and of its mirror image (''X''/(1 − ''X'')), scaled by the range (''c'' − ''a''), which may be helpful for interpretation:

:\mathcal{I}_{\alpha,a} = \frac{\beta}{(\alpha-1)(c-a)} = \frac{\operatorname{E}\left[\frac{1-X}{X}\right]}{c-a} \text{ if } \alpha > 1

:\mathcal{I}_{\beta,c} = -\frac{\alpha}{(\beta-1)(c-a)} = -\frac{\operatorname{E}\left[\frac{X}{1-X}\right]}{c-a} \text{ if } \beta > 1

These are also the expected values of the "inverted beta distribution" or beta prime distribution (also known as the beta distribution of the second kind or Pearson's Type VI) and of its mirror image, scaled by the range (''c'' − ''a'').

Also, the following Fisher information components can be expressed in terms of the harmonic (1/''X'') variances or of variances based on the ratio transformed variables ((1 − ''X'')/''X'') as follows:

:\begin{align} \alpha > 2: \quad \mathcal{I}_{a,a} &= \operatorname{var}\left[\frac{1-X}{X}\right] \left(\frac{\alpha-1}{c-a}\right)^2 = \operatorname{var}\left[\frac{1}{X}\right] \left(\frac{\alpha-1}{c-a}\right)^2 = \frac{\beta(\alpha+\beta-1)}{(\alpha-2)(c-a)^2} \\ \beta > 2: \quad \mathcal{I}_{c,c} &= \operatorname{var}\left[\frac{X}{1-X}\right] \left(\frac{\beta-1}{c-a}\right)^2 = \operatorname{var}\left[\frac{1}{1-X}\right] \left(\frac{\beta-1}{c-a}\right)^2 = \frac{\alpha(\alpha+\beta-1)}{(\beta-2)(c-a)^2} \\ \mathcal{I}_{a,c} &= -\operatorname{cov}\left[\frac{1-X}{X},\frac{X}{1-X}\right]\frac{(\alpha-1)(\beta-1)}{(c-a)^2} = -\operatorname{cov}\left[\frac{1}{X},\frac{1}{1-X}\right]\frac{(\alpha-1)(\beta-1)}{(c-a)^2} = \frac{\alpha+\beta-1}{(c-a)^2} \end{align}

See the section "Moments of linearly transformed, product and inverted random variables" for these expectations.

The determinant of Fisher's information matrix is of interest (for example for the calculation of Jeffreys prior probability). Since the matrix is symmetric, the determinant can be written in terms of the ten independent components as

:\det(\mathcal{I}(\alpha,\beta,a,c)) = \det\begin{pmatrix} \mathcal{I}_{\alpha,\alpha} & \mathcal{I}_{\alpha,\beta} & \mathcal{I}_{\alpha,a} & \mathcal{I}_{\alpha,c} \\ \mathcal{I}_{\alpha,\beta} & \mathcal{I}_{\beta,\beta} & \mathcal{I}_{\beta,a} & \mathcal{I}_{\beta,c} \\ \mathcal{I}_{\alpha,a} & \mathcal{I}_{\beta,a} & \mathcal{I}_{a,a} & \mathcal{I}_{a,c} \\ \mathcal{I}_{\alpha,c} & \mathcal{I}_{\beta,c} & \mathcal{I}_{a,c} & \mathcal{I}_{c,c} \end{pmatrix}, \text{ if } \alpha, \beta > 2,

whose cofactor expansion into the individual components is a lengthy sum of products of the components listed above. Using Sylvester's criterion (positivity of the leading principal minors), and since the diagonal components \mathcal{I}_{a,a} and \mathcal{I}_{c,c} have singularities at α = 2 and β = 2, it follows that the Fisher information matrix for the four parameter case is positive-definite for α > 2 and β > 2. Since for α > 2 and β > 2 the beta distribution is (symmetric or unsymmetric) bell shaped, it follows that the Fisher information matrix is positive-definite only for bell-shaped (symmetric or unsymmetric) beta distributions, with inflection points located to either side of the mode. Thus, important well known distributions belonging to the four-parameter beta distribution family, like the parabolic distribution (Beta(2,2,''a'',''c'')) and the continuous uniform distribution (Beta(1,1,''a'',''c'')), have Fisher information components (\mathcal{I}_{a,a}, \mathcal{I}_{c,c}, \mathcal{I}_{\alpha,a}, \mathcal{I}_{\beta,c}) that blow up (approach infinity) in the four-parameter case (although their Fisher information components are all defined for the two parameter case). The four-parameter Wigner semicircle distribution (Beta(3/2,3/2,''a'',''c'')) and arcsine distribution (Beta(1/2,1/2,''a'',''c'')) have negative Fisher information determinants for the four-parameter case.
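In practice the four-parameter model is usually fitted numerically. The following is a minimal Python sketch using SciPy, in which loc plays the role of the minimum ''a'' and loc + scale the role of the maximum ''c''; the simulated data and seed are purely illustrative.

 import numpy as np
 from scipy import stats
 
 rng = np.random.default_rng(0)
 # Simulate a four-parameter beta sample: Beta(2.5, 4.0) stretched onto the interval [10, 30]
 a, c = 10.0, 30.0
 y = a + (c - a) * rng.beta(2.5, 4.0, size=5000)
 
 # scipy.stats.beta.fit returns maximum likelihood estimates (alpha, beta, loc, scale),
 # with loc corresponding to the minimum a and loc + scale to the maximum c
 alpha_hat, beta_hat, loc_hat, scale_hat = stats.beta.fit(y)
 a_hat, c_hat = loc_hat, loc_hat + scale_hat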


Bayesian inference

The use of beta distributions in Bayesian inference is due to the fact that they provide a family of conjugate prior probability distributions for binomial (including Bernoulli) and geometric distributions. The domain of the beta distribution can be viewed as a probability, and in fact the beta distribution is often used to describe the distribution of a probability value ''p'':

:P(p;\alpha,\beta) = \frac{p^{\alpha-1}(1-p)^{\beta-1}}{\Beta(\alpha,\beta)}.

Examples of beta distributions used as prior probabilities to represent ignorance of prior parameter values in Bayesian inference are Beta(1,1), Beta(0,0) and Beta(1/2,1/2).
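Conjugacy makes the update rule very simple: a Beta(α, β) prior on ''p'' combined with ''s'' successes in ''n'' binomial trials gives a Beta(α + ''s'', β + ''n'' − ''s'') posterior. A minimal Python sketch of this update, assuming SciPy is available (the prior and data values are illustrative):

 from scipy import stats
 
 alpha_prior, beta_prior = 2.0, 2.0   # illustrative Beta prior on p
 s, n = 7, 10                         # observed successes and number of trials
 
 # Conjugate update: Beta(alpha, beta) prior + binomial data -> Beta(alpha + s, beta + n - s)
 posterior = stats.beta(alpha_prior + s, beta_prior + (n - s))
 posterior_mean = posterior.mean()        # (alpha + s) / (alpha + beta + n)
 credible_95 = posterior.interval(0.95)   # central 95% posterior interval for p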


Rule of succession

A classic application of the beta distribution is the rule of succession, introduced in the 18th century by Pierre-Simon Laplace in the course of treating the sunrise problem. It states that, given ''s'' successes in ''n'' conditionally independent Bernoulli trials with probability ''p'', the estimate of the expected value in the next trial is \frac{s+1}{n+2}. This estimate is the expected value of the posterior distribution over ''p'', namely Beta(''s'' + 1, ''n'' − ''s'' + 1), which is given by Bayes' rule if one assumes a uniform prior probability over ''p'' (i.e., Beta(1, 1)) and then observes that ''p'' generated ''s'' successes in ''n'' trials. Laplace's rule of succession has been criticized by prominent scientists. R. T. Cox described Laplace's application of the rule of succession to the sunrise problem (p. 89) as "a travesty of the proper use of the principle." Keynes remarks (Ch. XXX, p. 382) "indeed this is so foolish a theorem that to entertain it is discreditable." Karl Pearson showed that the probability that the next (''n'' + 1) trials will be successes, after ''n'' successes in ''n'' trials, is only 50%, which has been considered too low by scientists like Jeffreys and unacceptable as a representation of the scientific process of experimentation to test a proposed scientific law. As pointed out by Jeffreys (p. 128) (crediting C. D. Broad), Laplace's rule of succession establishes a high probability of success ((''n'' + 1)/(''n'' + 2)) in the next trial, but only a moderate probability (50%) that a further sample (''n'' + 1) comparable in size will be equally successful. As pointed out by Perks, "The rule of succession itself is hard to accept. It assigns a probability to the next trial which implies the assumption that the actual run observed is an average run and that we are always at the end of an average run. It would, one would think, be more reasonable to assume that we were in the middle of an average run. Clearly a higher value for both probabilities is necessary if they are to accord with reasonable belief." These problems with Laplace's rule of succession motivated Haldane, Perks, Jeffreys and others to search for other forms of prior probability (see the following sections). According to Jaynes, the main problem with the rule of succession is that it is not valid when ''s'' = 0 or ''s'' = ''n'' (see rule of succession for an analysis of its validity).
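For concreteness, the rule of succession is simply the posterior mean under a uniform prior; a short sketch in Python (the function name is ours):

 def rule_of_succession(successes, trials):
     """Laplace's estimate of the probability of success on the next trial:
     the mean of the Beta(successes + 1, trials - successes + 1) posterior."""
     return (successes + 1) / (trials + 2)
 
 # After an unbroken run of 10 successes in 10 trials the estimate is 11/12
 p_next = rule_of_succession(10, 10)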


Bayes-Laplace prior probability (Beta(1,1))

The beta distribution achieves maximum differential entropy for Beta(1,1): the uniform probability density, for which all values in the domain of the distribution have equal density. This uniform distribution Beta(1,1) was suggested ("with a great deal of doubt") by Thomas Bayes as the prior probability distribution to express ignorance about the correct prior distribution. This prior distribution was adopted (apparently, from his writings, with little sign of doubt) by Pierre-Simon Laplace, and hence it was also known as the "Bayes-Laplace rule" or the "Laplace rule" of "inverse probability" in publications of the first half of the 20th century. In the later part of the 19th century and early part of the 20th century, scientists realized that the assumption of uniform "equal" probability density depended on the actual functions (for example whether a linear or a logarithmic scale was most appropriate) and parametrizations used. In particular, the behavior near the ends of distributions with finite support (for example near ''x'' = 0, for a distribution with initial support at ''x'' = 0) required particular attention. Keynes (Ch. XXX, p. 381) criticized the use of Bayes's uniform prior probability (Beta(1,1)), under which all values between zero and one are equiprobable, as follows: "Thus experience, if it shows anything, shows that there is a very marked clustering of statistical ratios in the neighborhoods of zero and unity, of those for positive theories and for correlations between positive qualities in the neighborhood of zero, and of those for negative theories and for correlations between negative qualities in the neighborhood of unity."


Haldane's prior probability (Beta(0,0))

The Beta(0,0) distribution was proposed by J.B.S. Haldane, who suggested that the prior probability representing complete uncertainty should be proportional to 1/(''p''(1 − ''p'')). The function 1/(''p''(1 − ''p'')) can be viewed as the limit of the numerator of the beta distribution as both shape parameters approach zero: α, β → 0. The beta function (in the denominator of the beta distribution) approaches infinity as both parameters approach zero, α, β → 0. Therefore, 1/(''p''(1 − ''p'')) divided by the beta function approaches a 2-point Bernoulli distribution with equal probability 1/2 at each end, at 0 and 1, and nothing in between, as α, β → 0: a coin-toss, with one face of the coin at 0 and the other face at 1. The Haldane prior probability distribution Beta(0,0) is an "improper prior" because its integration (from 0 to 1) fails to strictly converge to 1 due to the singularities at each end. However, this is not an issue for computing posterior probabilities unless the sample size is very small. Furthermore, Zellner points out that on the log-odds scale (the logit transformation ln(''p''/(1 − ''p''))), the Haldane prior is the uniformly flat prior. The fact that a uniform prior probability on the logit transformed variable ln(''p''/(1 − ''p'')) (with domain (−∞, ∞)) is equivalent to the Haldane prior on the domain [0, 1] was pointed out by Harold Jeffreys in the first edition (1939) of his book Theory of Probability (p. 123). Jeffreys writes "Certainly if we take the Bayes-Laplace rule right up to the extremes we are led to results that do not correspond to anybody's way of thinking. The (Haldane) rule d''x''/(''x''(1 − ''x'')) goes too far the other way. It would lead to the conclusion that if a sample is of one type with respect to some property there is a probability 1 that the whole population is of that type." The fact that "uniform" depends on the parametrization led Jeffreys to seek a form of prior that would be invariant under different parametrizations.


Jeffreys' prior probability (Beta(1/2,1/2) for a Bernoulli or for a binomial distribution)

Harold Jeffreys proposed to use an uninformative prior probability measure that should be invariant under reparameterization: proportional to the square root of the determinant of Fisher's information matrix. For the Bernoulli distribution, this can be shown as follows: for a coin that is "heads" with probability ''p'' ∈ [0, 1] and is "tails" with probability 1 − ''p'', for a given (H, T) ∈ {(0,1), (1,0)} the probability is ''p''^''H''(1 − ''p'')^''T''. Since ''T'' = 1 − ''H'', the Bernoulli distribution is ''p''^''H''(1 − ''p'')^(1 − ''H''). Considering ''p'' as the only parameter, it follows that the log likelihood for the Bernoulli distribution is

:\ln \mathcal{L} (p\mid H) = H \ln(p)+ (1-H) \ln(1-p).

The Fisher information matrix has only one component (it is a scalar, because there is only one parameter: ''p''), therefore:

:\begin{align} \sqrt{\mathcal{I}(p)} &= \sqrt{\operatorname{E}\!\left[- \frac{d^2}{dp^2} \ln \mathcal{L}(p\mid H)\right]} \\ &= \sqrt{\operatorname{E}\!\left[\frac{H}{p^2} + \frac{1-H}{(1-p)^2}\right]} \\ &= \sqrt{\frac{p}{p^2} + \frac{1-p}{(1-p)^2}} \\ &= \frac{1}{\sqrt{p(1-p)}}. \end{align}

Similarly, for the binomial distribution with ''n'' Bernoulli trials, it can be shown that

:\sqrt{\mathcal{I}(p)} = \frac{\sqrt{n}}{\sqrt{p(1-p)}}.

Thus, for the Bernoulli and binomial distributions, Jeffreys prior is proportional to \scriptstyle \frac{1}{\sqrt{p(1-p)}}, which happens to be proportional to a beta distribution with domain variable ''x'' = ''p'' and shape parameters α = β = 1/2, the arcsine distribution:

:\operatorname{Beta}(\tfrac{1}{2}, \tfrac{1}{2}) = \frac{1}{\pi \sqrt{x(1-x)}}.

It will be shown in the next section that the normalizing constant for Jeffreys prior is immaterial to the final result because the normalizing constant cancels out in Bayes theorem for the posterior probability. Hence Beta(1/2,1/2) is used as the Jeffreys prior for both Bernoulli and binomial distributions. As shown in the next section, when using this expression as a prior probability times the likelihood in Bayes theorem, the posterior probability turns out to be a beta distribution. It is important to realize, however, that Jeffreys prior is proportional to \scriptstyle \frac{1}{\sqrt{p(1-p)}} for the Bernoulli and binomial distributions, but not for the beta distribution. Jeffreys prior for the beta distribution is given by the determinant of Fisher's information for the beta distribution, which, as shown in the section on Fisher information, is a function of the trigamma function ψ1 of the shape parameters α and β as follows:

:\begin{align} \sqrt{\det(\mathcal{I}(\alpha, \beta))} &= \sqrt{\psi_1(\alpha)\psi_1(\beta) - (\psi_1(\alpha)+\psi_1(\beta))\psi_1(\alpha+\beta)} \\ \lim_{\alpha\to 0} \sqrt{\det(\mathcal{I}(\alpha, \beta))} &= \lim_{\beta\to 0} \sqrt{\det(\mathcal{I}(\alpha, \beta))} = \infty \\ \lim_{\alpha\to\infty} \sqrt{\det(\mathcal{I}(\alpha, \beta))} &= \lim_{\beta\to\infty} \sqrt{\det(\mathcal{I}(\alpha, \beta))} = 0 \end{align}

As previously discussed, Jeffreys prior for the Bernoulli and binomial distributions is proportional to the arcsine distribution Beta(1/2,1/2), a one-dimensional ''curve'' that looks like a basin as a function of the parameter ''p'' of the Bernoulli and binomial distributions. The walls of the basin are formed by ''p'' approaching the singularities at the ends ''p'' → 0 and ''p'' → 1, where Beta(1/2,1/2) approaches infinity. Jeffreys prior for the beta distribution is a ''2-dimensional surface'' (embedded in a three-dimensional space) that looks like a basin with only two of its walls meeting at the corner α = β = 0 (and missing the other two walls), as a function of the shape parameters α and β of the beta distribution. The two adjoining walls of this 2-dimensional surface are formed by the shape parameters α and β approaching the singularities (of the trigamma function) at α, β → 0. It has no walls for α, β → ∞ because in this case the determinant of Fisher's information matrix for the beta distribution approaches zero.

It will be shown in the next section that Jeffreys prior probability results in posterior probabilities (when multiplied by the binomial likelihood function) that are intermediate between the posterior probability results of the Haldane and Bayes prior probabilities.

Jeffreys prior may be difficult to obtain analytically, and for some cases it just doesn't exist (even for simple distribution functions like the asymmetric triangular distribution). Berger, Bernardo and Sun, in a 2009 paper, defined a reference prior probability distribution that (unlike Jeffreys prior) exists for the asymmetric triangular distribution. They cannot obtain a closed-form expression for their reference prior, but numerical calculations show it to be nearly perfectly fitted by the (proper) prior

:\operatorname{Beta}(\tfrac{1}{2}, \tfrac{1}{2}) \sim \frac{1}{\pi\sqrt{\theta(1-\theta)}}

where θ is the vertex variable for the asymmetric triangular distribution with support [0, 1] (corresponding to the following parameter values in Wikipedia's article on the triangular distribution: vertex ''c'' = ''θ'', left end ''a'' = 0, and right end ''b'' = 1). Berger et al. also give a heuristic argument that Beta(1/2,1/2) could indeed be the exact Berger–Bernardo–Sun reference prior for the asymmetric triangular distribution. Therefore, Beta(1/2,1/2) not only is Jeffreys prior for the Bernoulli and binomial distributions, but also seems to be the Berger–Bernardo–Sun reference prior for the asymmetric triangular distribution (for which the Jeffreys prior does not exist), a distribution used in project management and PERT analysis to describe the cost and duration of project tasks.

Clarke and Barron prove that, among continuous positive priors, Jeffreys prior (when it exists) asymptotically maximizes Shannon's mutual information between a sample of size ''n'' and the parameter, and therefore ''Jeffreys prior is the most uninformative prior'' (measuring information as Shannon information). The proof rests on an examination of the Kullback–Leibler divergence between probability density functions for iid random variables.
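The two Jeffreys priors discussed above are easy to evaluate numerically: the arcsine-shaped prior for the Bernoulli/binomial parameter ''p'', and the square root of the determinant of the beta distribution's own Fisher information matrix as a prior over (α, β). A Python sketch assuming SciPy is available (the function names are illustrative):

 import numpy as np
 from scipy.special import polygamma
 
 def jeffreys_prior_bernoulli(p):
     """Unnormalized Jeffreys prior for the Bernoulli/binomial parameter p,
     proportional to the Beta(1/2, 1/2) (arcsine) density."""
     return 1.0 / np.sqrt(p * (1.0 - p))
 
 def jeffreys_prior_beta_shapes(alpha, beta):
     """Unnormalized Jeffreys prior over the beta distribution's shape parameters:
     the square root of the determinant of its Fisher information matrix."""
     psi1 = lambda x: polygamma(1, x)
     det = psi1(alpha) * psi1(beta) - (psi1(alpha) + psi1(beta)) * psi1(alpha + beta)
     return np.sqrt(det)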


Effect of different prior probability choices on the posterior beta distribution

If samples are drawn from the population of a random variable ''X'' that result in ''s'' successes and ''f'' failures in ''n'' Bernoulli trials, ''n'' = ''s'' + ''f'', then the likelihood function for parameters ''s'' and ''f'' given ''x'' = ''p'' (the notation ''x'' = ''p'' in the expressions below will emphasize that the domain ''x'' stands for the value of the parameter ''p'' in the binomial distribution) is the following binomial distribution:

:\mathcal{L}(s,f\mid x=p) = {s+f \choose s} x^s(1-x)^f = {n \choose s} x^s(1-x)^{n-s}.

If beliefs about prior probability information are reasonably well approximated by a beta distribution with parameters ''α''Prior and ''β''Prior, then:

:\operatorname{PriorProbability}(x=p;\alpha_\text{Prior},\beta_\text{Prior}) = \frac{x^{\alpha_\text{Prior}-1}(1-x)^{\beta_\text{Prior}-1}}{\Beta(\alpha_\text{Prior},\beta_\text{Prior})}

According to Bayes' theorem for a continuous event space, the posterior probability is given by the product of the prior probability and the likelihood function (given the evidence ''s'' and ''f'' = ''n'' − ''s''), normalized so that the area under the curve equals one, as follows:

:\begin{align} & \operatorname{posterior}(x=p\mid s,n-s) \\ = {} & \frac{\operatorname{prior}(x=p;\alpha_\text{Prior},\beta_\text{Prior}) \, \mathcal{L}(s,f\mid x=p)}{\int_0^1 \operatorname{prior}(x;\alpha_\text{Prior},\beta_\text{Prior}) \, \mathcal{L}(s,f\mid x) \, dx} \\ = {} & \frac{{n \choose s} x^{s+\alpha_\text{Prior}-1}(1-x)^{n-s+\beta_\text{Prior}-1} / \Beta(\alpha_\text{Prior},\beta_\text{Prior})}{\int_0^1 \left({n \choose s} x^{s+\alpha_\text{Prior}-1}(1-x)^{n-s+\beta_\text{Prior}-1} / \Beta(\alpha_\text{Prior},\beta_\text{Prior})\right) dx} \\ = {} & \frac{x^{s+\alpha_\text{Prior}-1}(1-x)^{n-s+\beta_\text{Prior}-1}}{\int_0^1 x^{s+\alpha_\text{Prior}-1}(1-x)^{n-s+\beta_\text{Prior}-1}\, dx} \\ = {} & \frac{x^{s+\alpha_\text{Prior}-1}(1-x)^{n-s+\beta_\text{Prior}-1}}{\Beta(s+\alpha_\text{Prior},n-s+\beta_\text{Prior})}. \end{align}

The binomial coefficient

:{s+f \choose s} = {n \choose s} = \frac{n!}{s!(n-s)!} = \frac{\Gamma(n+1)}{\Gamma(s+1)\Gamma(n-s+1)}

appears both in the numerator and the denominator of the posterior probability, and it does not depend on the integration variable ''x'', hence it cancels out, and it is irrelevant to the final result. Similarly the normalizing factor for the prior probability, the beta function B(''α''Prior, ''β''Prior), cancels out and is immaterial to the final result. The same posterior probability result can be obtained if one uses an un-normalized prior

:x^{\alpha_\text{Prior}-1}(1-x)^{\beta_\text{Prior}-1}

because the normalizing factors all cancel out. Several authors (including Jeffreys himself) thus use an un-normalized prior formula since the normalization constant cancels out. The numerator of the posterior probability ends up being just the (un-normalized) product of the prior probability and the likelihood function, and the denominator is its integral from zero to one. The beta function in the denominator, B(''s'' + ''α''Prior, ''n'' − ''s'' + ''β''Prior), appears as a normalization constant to ensure that the total posterior probability integrates to unity.

The ratio ''s''/''n'' of the number of successes to the total number of trials is a sufficient statistic in the binomial case, which is relevant for the following results.

For the Bayes prior probability (Beta(1,1)), the posterior probability is:

:\operatorname{posterior}(p=x\mid s,f) = \frac{x^s(1-x)^{n-s}}{\Beta(s+1,n-s+1)}, \text{ with mean } = \frac{s+1}{n+2}, \text{ and mode } = \frac{s}{n} \text{ (if } 0 < s < n).

For the Jeffreys prior probability (Beta(1/2,1/2)), the posterior probability is:

:\operatorname{posterior}(p=x\mid s,f) = \frac{x^{s-1/2}(1-x)^{n-s-1/2}}{\Beta(s+\tfrac{1}{2},n-s+\tfrac{1}{2})}, \text{ with mean } = \frac{s+\tfrac{1}{2}}{n+1}, \text{ and mode } = \frac{s-\tfrac{1}{2}}{n-1} \text{ (if } \tfrac{1}{2} < s < n-\tfrac{1}{2}),

and for the Haldane prior probability (Beta(0,0)), the posterior probability is:

:\operatorname{posterior}(p=x\mid s,f) = \frac{x^{s-1}(1-x)^{n-s-1}}{\Beta(s,n-s)}, \text{ with mean } = \frac{s}{n}, \text{ and mode } = \frac{s-1}{n-2} \text{ (if } 1 < s < n-1).

From the above expressions it follows that for ''s''/''n'' = 1/2 all three of the above prior probabilities result in the identical location for the posterior probability mean = mode = 1/2. For ''s''/''n'' < 1/2, the means of the posterior probabilities, using the above priors, are ordered such that: mean for Bayes prior > mean for Jeffreys prior > mean for Haldane prior. For ''s''/''n'' > 1/2 the order of these inequalities is reversed, such that the Haldane prior probability results in the largest posterior mean. The ''Haldane'' prior probability Beta(0,0) results in a posterior probability density with ''mean'' (the expected value for the probability of success in the "next" trial) identical to the ratio ''s''/''n'' of the number of successes to the total number of trials. Therefore, the Haldane prior results in a posterior probability with expected value in the next trial equal to the maximum likelihood estimate. The ''Bayes'' prior probability Beta(1,1) results in a posterior probability density with ''mode'' identical to the ratio ''s''/''n'' (the maximum likelihood estimate).

In the case that 100% of the trials have been successful (''s'' = ''n''), the ''Bayes'' prior probability Beta(1,1) results in a posterior expected value equal to the rule of succession (''n'' + 1)/(''n'' + 2), while the Haldane prior Beta(0,0) results in a posterior expected value of 1 (absolute certainty of success in the next trial). Jeffreys prior probability results in a posterior expected value equal to (''n'' + 1/2)/(''n'' + 1). Perks (p. 303) points out: "This provides a new rule of succession and expresses a 'reasonable' position to take up, namely, that after an unbroken run of n successes we assume a probability for the next trial equivalent to the assumption that we are about half-way through an average run, i.e. that we expect a failure once in (2''n'' + 2) trials.
The Bayes–Laplace rule implies that we are about at the end of an average run or that we expect a failure once in (''n'' + 2) trials. The comparison clearly favours the new result (what is now called Jeffreys prior) from the point of view of 'reasonableness'."

Conversely, in the case that 100% of the trials have resulted in failure (''s'' = 0), the ''Bayes'' prior probability Beta(1,1) results in a posterior expected value for success in the next trial equal to 1/(''n'' + 2), while the Haldane prior Beta(0,0) results in a posterior expected value of success in the next trial of 0 (absolute certainty of failure in the next trial). Jeffreys prior probability results in a posterior expected value for success in the next trial equal to (1/2)/(''n'' + 1), of which Perks (p. 303) remarks that it "is a much more reasonably remote result than the Bayes-Laplace result 1/(''n'' + 2)".

Jaynes questions (for the uniform prior Beta(1,1)) the use of these formulas for the cases ''s'' = 0 or ''s'' = ''n'', because the integrals do not converge (Beta(1,1) is an improper prior for ''s'' = 0 or ''s'' = ''n''). In practice, the conditions 0 < ''s'' < ''n'' are usually met. Concerning the probability, considered by Pearson, that the next (''n'' + 1) trials will all be successes after ''n'' successes in ''n'' trials, Perks (p. 303) shows that, for what is now known as the Jeffreys prior, this probability is ((''n'' + 1/2)/(''n'' + 1))((''n'' + 3/2)/(''n'' + 2))⋯((2''n'' + 1/2)/(2''n'' + 1)), which for ''n'' = 1, 2, 3 gives 15/24, 315/480, 9009/13440, rapidly approaching a limiting value of 1/\sqrt{2} = 0.70710678\ldots as ''n'' tends to infinity. Perks remarks that what is now known as the Jeffreys prior "is clearly more 'reasonable' than either the Bayes-Laplace result or the result on the (Haldane) alternative rule rejected by Jeffreys which gives certainty as the probability. It clearly provides a very much better correspondence with the process of induction. Whether it is 'absolutely' reasonable for the purpose, i.e. whether it is yet large enough, without the absurdity of reaching unity, is a matter for others to decide. But it must be realized that the result depends on the assumption of complete indifference and absence of knowledge prior to the sampling experiment."

Following are the variances of the posterior distribution obtained with these three prior probability distributions. For the Bayes prior probability (Beta(1,1)), the posterior variance is:

:\text{var} = \frac{(s+1)(n-s+1)}{(n+2)^2(n+3)}, \text{ which for } s=\frac{n}{2} \text{ gives var} = \frac{1}{4(n+3)}

for the Jeffreys prior probability (Beta(1/2,1/2)), the posterior variance is:

:\text{var} = \frac{(s+\tfrac{1}{2})(n-s+\tfrac{1}{2})}{(n+1)^2(n+2)}, \text{ which for } s=\frac{n}{2} \text{ gives var} = \frac{1}{4(n+2)}

and for the Haldane prior probability (Beta(0,0)), the posterior variance is:

:\text{var} = \frac{s(n-s)}{n^2(n+1)}, \text{ which for } s=\frac{n}{2} \text{ gives var} = \frac{1}{4(n+1)}

So, as remarked by Silvey, for large ''n'' the variance is small and hence the posterior distribution is highly concentrated, whereas the assumed prior distribution was very diffuse. This is in accord with what one would hope for, as vague prior knowledge is transformed (through Bayes theorem) into more precise posterior knowledge by an informative experiment. For small ''n'' the Haldane Beta(0,0) prior results in the largest posterior variance while the Bayes Beta(1,1) prior results in the more concentrated posterior. The Jeffreys prior Beta(1/2,1/2) results in a posterior variance in between the other two. As ''n'' increases, the variance rapidly decreases so that the posterior variance for all three priors converges to approximately the same value (approaching zero variance as ''n'' → ∞).
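The posterior means and variances quoted above follow directly from the general Beta(α + ''s'', β + ''n'' − ''s'') posterior. A small Python sketch comparing the three priors (the data values are illustrative; note the Haldane prior requires 0 < ''s'' < ''n''):

 def posterior_summary(s, n, alpha_prior, beta_prior):
     """Posterior mean and variance of p under a Beta(alpha_prior, beta_prior) prior
     after s successes in n Bernoulli trials."""
     a, b = alpha_prior + s, beta_prior + (n - s)
     mean = a / (a + b)
     var = a * b / ((a + b) ** 2 * (a + b + 1))
     return mean, var
 
 s, n = 3, 10
 for name, (ap, bp) in [("Bayes (1,1)", (1, 1)),
                        ("Jeffreys (1/2,1/2)", (0.5, 0.5)),
                        ("Haldane (0,0)", (0, 0))]:
     print(name, posterior_summary(s, n, ap, bp))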
Recalling the previous result that the ''Haldane'' prior probability Beta(0,0) results in a posterior probability density with ''mean'' (the expected value for the probability of success in the "next" trial) identical to the ratio ''s''/''n'' of the number of successes to the total number of trials, it follows from the above expression that the ''Haldane'' prior Beta(0,0) also results in a posterior with ''variance'' identical to the variance expressed in terms of the maximum likelihood estimate ''s''/''n'' and the sample size (see the parametrization in terms of mean and sample size):

:\text{var} = \frac{\mu(1-\mu)}{1+\nu} = \frac{\frac{s}{n}\left(1 - \frac{s}{n}\right)}{1+n}

with the mean ''μ'' = ''s''/''n'' and the sample size ''ν'' = ''n''.

In Bayesian inference, using a prior distribution Beta(''α''Prior, ''β''Prior) prior to a binomial distribution is equivalent to adding (''α''Prior − 1) pseudo-observations of "success" and (''β''Prior − 1) pseudo-observations of "failure" to the actual number of successes and failures observed, then estimating the parameter ''p'' of the binomial distribution by the proportion of successes over both real and pseudo-observations. A uniform prior Beta(1,1) does not add (or subtract) any pseudo-observations, since for Beta(1,1) it follows that (''α''Prior − 1) = 0 and (''β''Prior − 1) = 0. The Haldane prior Beta(0,0) subtracts one pseudo-observation from each count, and the Jeffreys prior Beta(1/2,1/2) subtracts 1/2 pseudo-observation of success and an equal number of failures. Adding pseudo-observations pulls the posterior toward the prior mean and smooths it; subtracting them has the opposite effect. If the proportion of successes is not 50% (''s''/''n'' ≠ 1/2), values of ''α''Prior and ''β''Prior less than 1 (and therefore negative (''α''Prior − 1) and (''β''Prior − 1)) favor sparsity, i.e. distributions where the parameter ''p'' is closer to either 0 or 1. In effect, values of ''α''Prior and ''β''Prior between 0 and 1, when operating together, function as a concentration parameter.

The accompanying plots (not reproduced here) show the posterior probability density functions for several sample sizes and numbers of successes under the priors Beta(1,1), Beta(1/2,1/2) and Beta(0,0). The first plot shows the symmetric cases, with mean = mode = 1/2, and the second plot shows the skewed cases. The plots show that there is little difference between the priors for a posterior with a sample size of 50 (characterized by a more pronounced peak near ''p'' = 1/2). Significant differences appear for very small sample sizes (in particular for the flatter distribution in the degenerate case of sample size = 3). Therefore, the skewed cases show a larger effect from the choice of prior, at small sample size, than the symmetric cases. For symmetric distributions, the Bayes prior Beta(1,1) results in the most "peaky" and highest posterior distributions and the Haldane prior Beta(0,0) results in the flattest and lowest-peaked distribution. The Jeffreys prior Beta(1/2,1/2) lies in between them. For nearly symmetric, not too skewed distributions the effect of the priors is similar. For very small sample size (in this case a sample size of 3) and a skewed distribution, the Haldane prior can result in a reverse-J-shaped distribution with a singularity at the left end.
However, this happens only in degenerate cases (in this example ''n'' = 3 and hence ''s'' = 3/4 < 1, a degenerate value because ''s'' should be greater than unity in order for the posterior of the Haldane prior to have a mode located between the ends, and because ''s'' = 3/4 is not an integer, hence it violates the initial assumption of a binomial distribution for the likelihood), and it is not an issue in generic cases of reasonable sample size (such that the condition 1 < ''s'' < ''n'' − 1, necessary for a mode to exist between both ends, is fulfilled).

In Chapter 12 (p. 385) of his book, Jaynes asserts that the ''Haldane prior'' Beta(0,0) describes a ''prior state of knowledge of complete ignorance'', where we are not even sure whether it is physically possible for an experiment to yield either a success or a failure, while the ''Bayes (uniform) prior Beta(1,1) applies if'' one knows that ''both binary outcomes are possible''. Jaynes states: "''interpret the Bayes-Laplace (Beta(1,1)) prior as describing not a state of complete ignorance'', but the state of knowledge in which we have observed one success and one failure...once we have seen at least one success and one failure, then we know that the experiment is a true binary one, in the sense of physical possibility." Jaynes does not specifically discuss the Jeffreys prior Beta(1/2,1/2) (Jaynes's discussion of "Jeffreys prior" on pp. 181, 423 and in chapter 12 of his book refers instead to the improper, un-normalized prior "1/''p'' ''dp''" introduced by Jeffreys in the 1939 edition of his book, seven years before he introduced what is now known as Jeffreys' invariant prior: the square root of the determinant of Fisher's information matrix. ''"1/p" is Jeffreys' (1946) invariant prior for the exponential distribution, not for the Bernoulli or binomial distributions''). However, it follows from the above discussion that the Jeffreys Beta(1/2,1/2) prior represents a state of knowledge in between the Haldane Beta(0,0) and Bayes Beta(1,1) priors. Similarly, Karl Pearson in his 1892 book The Grammar of Science (p. 144 of the 1900 edition) maintained that the Bayes (Beta(1,1)) uniform prior was not a complete ignorance prior, and that it should be used when prior information justified us to "distribute our ignorance equally". K. Pearson wrote: "Yet the only supposition that we appear to have made is this: that, knowing nothing of nature, routine and anomy (from the Greek ανομία, namely: a- "without", and nomos "law") are to be considered as equally likely to occur. Now we were not really justified in making even this assumption, for it involves a knowledge that we do not possess regarding nature. We use our ''experience'' of the constitution and action of coins in general to assert that heads and tails are equally probable, but we have no right to assert before experience that, as we know nothing of nature, routine and breach are equally probable. In our ignorance we ought to consider before experience that nature may consist of all routines, all anomies (normlessness), or a mixture of the two in any proportion whatever, and that all such are equally probable. Which of these constitutions after experience is the most probable must clearly depend on what that experience has been like."
If there is sufficient sampling data, ''and the posterior probability mode is not located at one of the extremes of the domain'' (''x'' = 0 or ''x'' = 1), the three priors of Bayes (Beta(1,1)), Jeffreys (Beta(1/2,1/2)) and Haldane (Beta(0,0)) should yield similar ''posterior'' probability densities. Otherwise, as Gelman et al. (p. 65) point out, "if so few data are available that the choice of noninformative prior distribution makes a difference, one should put relevant information into the prior distribution", or as Berger (p. 125) points out, "when different reasonable priors yield substantially different answers, can it be right to state that there ''is'' a single answer? Would it not be better to admit that there is scientific uncertainty, with the conclusion depending on prior beliefs?"


Occurrence and applications


Order statistics

The beta distribution has an important application in the theory of order statistics. A basic result is that the distribution of the ''k''th smallest of a sample of size ''n'' from a continuous uniform distribution has a beta distribution (David, H. A., Nagaraja, H. N. (2003) ''Order Statistics'', 3rd Edition, Wiley, New Jersey, p. 458). This result is summarized as:

:U_{(k)} \sim \operatorname{Beta}(k,n+1-k).

From this, and application of the theory related to the probability integral transform, the distribution of any individual order statistic from any continuous distribution can be derived.
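The order-statistic result is easy to check by simulation; the sketch below (NumPy/SciPy assumed, constants illustrative) compares the sample median of nine uniforms with the Beta(5, 5) distribution via a Kolmogorov–Smirnov test.

 import numpy as np
 from scipy import stats
 
 rng = np.random.default_rng(1)
 n, k = 9, 5                                    # the median of n = 9 uniforms is the k = 5th smallest
 samples = rng.uniform(size=(100_000, n))
 kth_smallest = np.sort(samples, axis=1)[:, k - 1]
 
 # U_(k) should follow Beta(k, n + 1 - k) = Beta(5, 5)
 result = stats.kstest(kth_smallest, stats.beta(k, n + 1 - k).cdf)
 # result.statistic is small and result.pvalue large when the claim holds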


Subjective logic

In standard logic, propositions are considered to be either true or false. In contradistinction, subjective logic assumes that humans cannot determine with absolute certainty whether a proposition about the real world is absolutely true or false. In subjective logic the a posteriori probability estimates of binary events can be represented by beta distributions (A. Jøsang, "A Logic for Uncertain Probabilities", ''International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems'', 9(3), pp. 279-311, June 2001).


Wavelet analysis

A wavelet is a wave-like oscillation with an amplitude that starts out at zero, increases, and then decreases back to zero. It can typically be visualized as a "brief oscillation" that promptly decays. Wavelets can be used to extract information from many different kinds of data, including – but certainly not limited to – audio signals and images. Thus, wavelets are purposefully crafted to have specific properties that make them useful for signal processing. Wavelets are localized in both time and frequency, whereas the standard Fourier transform is only localized in frequency. Therefore, standard Fourier transforms are only applicable to stationary processes, while wavelets are applicable to non-stationary processes. Continuous wavelets can be constructed based on the beta distribution. Beta wavelets (H.M. de Oliveira and G.A.A. Araújo, "Compactly Supported One-cyclic Wavelets Derived from Beta Distributions", ''Journal of Communication and Information Systems'', vol. 20, n. 3, pp. 27-33, 2005) can be viewed as a soft variety of Haar wavelets whose shape is fine-tuned by two shape parameters α and β.


Population genetics

The Balding–Nichols model is a two-parameter parametrization of the beta distribution used in population genetics. It is a statistical description of the allele frequencies in the components of a sub-divided population:

:\begin{align} \alpha &= \mu \nu,\\ \beta &= (1 - \mu) \nu, \end{align}

where \nu = \alpha+\beta = \frac{1-F}{F} and 0 < F < 1; here ''F'' is (Wright's) genetic distance between two populations.
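A small Python helper (the function name is ours) converting Wright's ''F'' and the ancestral allele frequency μ into the beta shape parameters of the Balding–Nichols model:

 def balding_nichols_shapes(F, mu):
     """Beta shape parameters of the Balding-Nichols model,
     with nu = alpha + beta = (1 - F) / F."""
     nu = (1.0 - F) / F
     return mu * nu, (1.0 - mu) * nu
 
 # Example: F = 0.05 and ancestral allele frequency mu = 0.3 give (alpha, beta) = (5.7, 13.3)
 alpha, beta = balding_nichols_shapes(0.05, 0.3)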


Project management: task cost and schedule modeling

The beta distribution can be used to model events which are constrained to take place within an interval defined by a minimum and maximum value. For this reason, the beta distribution — along with the triangular distribution — is used extensively in PERT, the critical path method (CPM), Joint Cost Schedule Modeling (JCSM) and other project management/control systems to describe the time to completion and the cost of a task. In project management, shorthand computations are widely used to estimate the mean and standard deviation of the beta distribution:

:\begin{align} \mu(X) & = \frac{a + 4b + c}{6} \\ \sigma(X) & = \frac{c-a}{6} \end{align}

where ''a'' is the minimum, ''c'' is the maximum, and ''b'' is the most likely value (the mode for ''α'' > 1 and ''β'' > 1).

The above estimate for the mean \mu(X) = \frac{a + 4b + c}{6} is known as the PERT three-point estimation and it is exact for either of the following values of ''β'' (for arbitrary α within these ranges):

:''β'' = ''α'' > 1 (symmetric case) with standard deviation \sigma(X) = \frac{c-a}{2\sqrt{2\alpha+1}}, skewness = 0, and excess kurtosis = -\frac{6}{2\alpha + 3}

or

:''β'' = 6 − ''α'' for 5 > ''α'' > 1 (skewed case) with standard deviation \sigma(X) = \frac{(c-a)\sqrt{\alpha(6-\alpha)}}{6\sqrt{7}}, skewness = \frac{(3-\alpha)\sqrt{7}}{2\sqrt{\alpha(6-\alpha)}}, and excess kurtosis = \frac{21}{\alpha(6-\alpha)} - 3

The above estimate for the standard deviation ''σ''(''X'') = (''c'' − ''a'')/6 is exact for either of the following values of ''α'' and ''β'':

:''α'' = ''β'' = 4 (symmetric) with skewness = 0, and excess kurtosis = −6/11

:''β'' = 6 − ''α'' and \alpha = 3 - \sqrt{2} (right-tailed, positive skew) with skewness = \frac{1}{\sqrt{2}}, and excess kurtosis = 0

:''β'' = 6 − ''α'' and \alpha = 3 + \sqrt{2} (left-tailed, negative skew) with skewness = -\frac{1}{\sqrt{2}}, and excess kurtosis = 0

Otherwise, these can be poor approximations for beta distributions with other values of α and β, exhibiting average errors of 40% in the mean and 549% in the variance.
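The PERT shorthand itself is a one-line computation; a minimal Python sketch (the function name and the example task are illustrative):

 def pert_estimates(a, b, c):
     """PERT three-point estimates of the mean and standard deviation of a task,
     given the minimum a, the most likely value b (the mode) and the maximum c."""
     mean = (a + 4.0 * b + c) / 6.0
     std = (c - a) / 6.0
     return mean, std
 
 # A task taking at least 4, most likely 6, and at most 14 days: mean = 7.0, std ~ 1.67
 mean, std = pert_estimates(4.0, 6.0, 14.0)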


Random variate generation

If ''X'' and ''Y'' are independent, with X \sim \Gamma(\alpha, \theta) and Y \sim \Gamma(\beta, \theta), then

:\frac{X}{X+Y} \sim \Beta(\alpha, \beta).

So one algorithm for generating beta variates is to generate \frac{X}{X+Y}, where ''X'' is a gamma variate with parameters (α, 1) and ''Y'' is an independent gamma variate with parameters (β, 1). In fact, here \frac{X}{X+Y} and X+Y are independent, and X+Y \sim \Gamma(\alpha + \beta, \theta). If Z \sim \Gamma(\gamma, \theta) and Z is independent of X and Y, then \frac{X+Y}{X+Y+Z} \sim \Beta(\alpha+\beta,\gamma) and \frac{X+Y}{X+Y+Z} is independent of \frac{X}{X+Y}. This shows that the product of independent \Beta(\alpha,\beta) and \Beta(\alpha+\beta,\gamma) random variables is a \Beta(\alpha,\beta+\gamma) random variable.

Also, the ''k''th order statistic of ''n'' uniformly distributed variates is \Beta(k, n+1-k), so an alternative if α and β are small integers is to generate α + β − 1 uniform variates and choose the α-th smallest.

Another way to generate the beta distribution is by the Pólya urn model. According to this method, one starts with an "urn" containing α "black" balls and β "white" balls and draws uniformly with replacement. At every trial an additional ball is added according to the color of the last ball drawn. Asymptotically, the proportion of black and white balls will be distributed according to the beta distribution, where each repetition of the experiment will produce a different value.

It is also possible to use inverse transform sampling.
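A minimal Python sketch of the gamma-ratio method described above, using NumPy (the seed and parameters are illustrative):

 import numpy as np
 
 def beta_from_gammas(alpha, beta, size, rng=None):
     """Generate Beta(alpha, beta) variates as X / (X + Y), with X ~ Gamma(alpha, 1)
     and Y ~ Gamma(beta, 1) independent."""
     rng = np.random.default_rng() if rng is None else rng
     x = rng.gamma(alpha, 1.0, size)
     y = rng.gamma(beta, 1.0, size)
     return x / (x + y)
 
 samples = beta_from_gammas(2.0, 5.0, size=10_000, rng=np.random.default_rng(42))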


History

Thomas Bayes, in a posthumous paper published in 1763 by Richard Price, obtained a beta distribution as the density of the probability of success in Bernoulli trials (see the section on Bayesian inference above), but the paper does not analyze any of the moments of the beta distribution or discuss any of its properties. The first systematic modern discussion of the beta distribution is probably due to Karl Pearson. In Pearson's papers the beta distribution is couched as a solution of a differential equation: Pearson's Type I distribution, to which it is essentially identical except for arbitrary shifting and re-scaling (the beta and Pearson Type I distributions can always be equalized by proper choice of parameters). In fact, in several English books and journal articles in the few decades prior to World War II, it was common to refer to the beta distribution as Pearson's Type I distribution. William P. Elderton in his 1906 monograph "Frequency curves and correlation" further analyzes the beta distribution as Pearson's Type I distribution, including a full discussion of the method of moments for the four parameter case, and diagrams of (what Elderton describes as) U-shaped, J-shaped, twisted J-shaped, "cocked-hat" shapes, horizontal and angled straight-line cases. Elderton wrote "I am chiefly indebted to Professor Pearson, but the indebtedness is of a kind for which it is impossible to offer formal thanks." Elderton's 1906 monograph provides an impressive amount of information on the beta distribution, including equations for the origin of the distribution chosen to be the mode, as well as for other Pearson distributions: types I through VII. Elderton also included a number of appendixes, including one appendix ("II") on the beta and gamma functions. In later editions, Elderton added equations for the origin of the distribution chosen to be the mean, and analysis of Pearson distributions VIII through XII.

As remarked by Bowman and Shenton, "Fisher and Pearson had a difference of opinion in the approach to (parameter) estimation, in particular relating to (Pearson's method of) moments and (Fisher's method of) maximum likelihood in the case of the Beta distribution." Also according to Bowman and Shenton, "the case of a Type I (beta distribution) model being the center of the controversy was pure serendipity. A more difficult model of 4 parameters would have been hard to find." The long running public conflict of Fisher with Karl Pearson can be followed in a number of articles in prestigious journals. For example, concerning the estimation of the four parameters for the beta distribution, and Fisher's criticism of Pearson's method of moments as being arbitrary, see Pearson's article "Method of moments and method of maximum likelihood" (published three years after his retirement from University College, London, where his position had been divided between Fisher and Pearson's son Egon), in which Pearson writes "I read (Koshai's paper in the Journal of the Royal Statistical Society, 1933) which as far as I am aware is the only case at present published of the application of Professor Fisher's method. To my astonishment that method depends on first working out the constants of the frequency curve by the (Pearson) Method of Moments and then superposing on it, by what Fisher terms "the Method of Maximum Likelihood" a further approximation to obtain, what he holds, he will thus get, "more efficient values" of the curve constants."
David and Edwards's treatise on the history of statistics cites the first modern treatment of the beta distribution, in 1911, using the beta designation that has become standard, due to Corrado Gini, an Italian statistician, demographer, and sociologist, who developed the Gini coefficient. N. L. Johnson and S. Kotz, in their comprehensive and very informative monograph on leading historical personalities in statistical sciences, credit Corrado Gini as "an early Bayesian...who dealt with the problem of eliciting the parameters of an initial Beta distribution, by singling out techniques which anticipated the advent of the so-called empirical Bayes approach."


References


External links


"Beta Distribution"
by Fiona Maclachlan, the Wolfram Demonstrations Project, 2007.
Beta Distribution – Overview and Example
xycoon.com

brighton-webs.co.uk

exstrom.com
Harvard University Statistics 110 Lecture 23 Beta Distribution, Prof. Joe Blitzstein
:\operatorname{E}[|X - E[X]|] = \frac{2 \alpha^\alpha \beta^\beta}{\Beta(\alpha,\beta)\,(\alpha+\beta)^{\alpha+\beta+1}}

The mean absolute deviation around the mean is a more robust estimator of statistical dispersion than the standard deviation for beta distributions with tails and inflection points at each side of the mode, Beta(''α'', ''β'') distributions with ''α'', ''β'' > 2, as it depends on the linear (absolute) deviations rather than the square deviations from the mean. Therefore, the effect of very large deviations from the mean is not as overly weighted.

Using Stirling's approximation to the Gamma function, N. L. Johnson and S. Kotz derived the following approximation for values of the shape parameters greater than unity (the relative error for this approximation is only −3.5% for ''α'' = ''β'' = 1, and it decreases to zero as ''α'' → ∞, ''β'' → ∞):

: \begin{align}
\frac{\text{mean abs. dev. from mean}}{\text{standard deviation}} &=\frac{2\alpha^\alpha\beta^\beta\sqrt{\alpha+\beta+1}}{\Beta(\alpha,\beta)\,(\alpha+\beta)^{\alpha+\beta}\sqrt{\alpha\beta}}\\
&\approx \sqrt{\frac{2}{\pi}} \left(1+\frac{7}{12 (\alpha+\beta)}-\frac{1}{12 \alpha}-\frac{1}{12 \beta} \right), \text{ if } \alpha, \beta > 1.
\end{align}

At the limit α → ∞, β → ∞, the ratio of the mean absolute deviation to the standard deviation (for the beta distribution) becomes equal to the ratio of the same measures for the normal distribution: \sqrt{\frac{2}{\pi}}. For α = β = 1 this ratio equals \frac{\sqrt{3}}{2}, so that from α = β = 1 to α, β → ∞ the ratio decreases by 8.5%. For α = β = 0 the standard deviation is exactly equal to the mean absolute deviation around the mean. Therefore, this ratio decreases by 15% from α = β = 0 to α = β = 1, and by 25% from α = β = 0 to α, β → ∞. However, for skewed beta distributions such that α → 0 or β → 0, the ratio of the standard deviation to the mean absolute deviation approaches infinity (although each of them, individually, approaches zero) because the mean absolute deviation approaches zero faster than the standard deviation.

Using the parametrization in terms of mean μ and sample size ν = α + β > 0:

:α = μν, β = (1−μ)ν

one can express the mean absolute deviation around the mean in terms of the mean μ and the sample size ν as follows:

:\operatorname{E}[|X - E[X]|] = \frac{2 \mu^{\mu\nu}(1-\mu)^{(1-\mu)\nu}}{\nu\,\Beta(\mu\nu,(1-\mu)\nu)}

For a symmetric distribution, the mean is at the middle of the distribution, μ = 1/2, and therefore:

: \begin{align}
\operatorname{E}[|X - E[X]|] = \frac{2^{1-\nu}}{\nu\,\Beta(\tfrac{\nu}{2},\tfrac{\nu}{2})} &= \frac{\Gamma(\nu)}{2^{\nu-1}\,\nu\,\Gamma(\tfrac{\nu}{2})^2} \\
\lim_{\nu \to 0} \left(\lim_{\mu \to \frac{1}{2}} \operatorname{E}[|X - E[X]|] \right) &= \tfrac{1}{2}\\
\lim_{\nu \to \infty} \left(\lim_{\mu \to \frac{1}{2}} \operatorname{E}[|X - E[X]|] \right) &= 0
\end{align}

Also, the following limits (with only the noted variable approaching the limit) can be obtained from the above expressions:

: \begin{align}
\lim_{\alpha \to 0} \operatorname{E}[|X - E[X]|] &=\lim_{\beta \to 0} \operatorname{E}[|X - E[X]|]= 0 \\
\lim_{\alpha \to \infty} \operatorname{E}[|X - E[X]|] &=\lim_{\beta \to \infty} \operatorname{E}[|X - E[X]|] = 0\\
\lim_{\mu \to 0} \operatorname{E}[|X - E[X]|]&=\lim_{\mu \to 1} \operatorname{E}[|X - E[X]|] = 0\\
\lim_{\nu \to 0} \operatorname{E}[|X - E[X]|] &= 2\mu(1-\mu) \\
\lim_{\nu \to \infty} \operatorname{E}[|X - E[X]|] &= 0
\end{align}
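As a quick numerical illustration (not part of the original derivation), the following minimal Python sketch, assuming numpy and scipy are available, checks the closed-form mean absolute deviation above against a Monte Carlo estimate and compares the exact ratio to the Johnson–Kotz approximation; the parameter values α = 2, β = 5 are arbitrary.

 # Sketch: closed-form mean absolute deviation of Beta(a, b) and the
 # Johnson-Kotz approximation of its ratio to the standard deviation.
 import numpy as np
 from scipy.special import betaln
 from scipy.stats import beta as beta_dist
 def mad_exact(a, b):
     # E|X - E[X]| = 2 a^a b^b / (B(a,b) (a+b)^(a+b+1)), computed in log space
     log_mad = np.log(2) + a*np.log(a) + b*np.log(b) - betaln(a, b) - (a + b + 1)*np.log(a + b)
     return np.exp(log_mad)
 def ratio_approx(a, b):
     # Johnson-Kotz approximation of E|X - E[X]| / (std dev), valid for a, b > 1
     return np.sqrt(2/np.pi)*(1 + 7/(12*(a + b)) - 1/(12*a) - 1/(12*b))
 a, b = 2.0, 5.0
 sd = beta_dist.std(a, b)
 x = np.random.default_rng(0).beta(a, b, 1_000_000)
 print(mad_exact(a, b), np.mean(np.abs(x - a/(a + b))))  # closed form vs Monte Carlo
 print(mad_exact(a, b)/sd, ratio_approx(a, b))           # exact ratio vs approximation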


Mean absolute difference

The mean absolute difference for the beta distribution is:

:\mathrm{MD} = \int_0^1 \int_0^1 f(x;\alpha,\beta)\,f(y;\alpha,\beta)\,|x-y|\,dx\,dy = \left(\frac{4}{\alpha+\beta}\right)\frac{\Beta(\alpha+\beta,\alpha+\beta)}{\Beta(\alpha,\alpha)\,\Beta(\beta,\beta)}

The Gini coefficient for the beta distribution is half of the relative mean absolute difference:

:\mathrm{G} = \left(\frac{2}{\alpha}\right)\frac{\Beta(\alpha+\beta,\alpha+\beta)}{\Beta(\alpha,\alpha)\,\Beta(\beta,\beta)}


Skewness

The skewness (the third moment centered on the mean, normalized by the 3/2 power of the variance) of the beta distribution is

:\gamma_1 =\frac{\operatorname{E}[(X - \mu)^3]}{(\operatorname{var}(X))^{3/2}} = \frac{2(\beta-\alpha)\sqrt{\alpha+\beta+1}}{(\alpha+\beta+2)\sqrt{\alpha\beta}}.

Letting α = β in the above expression one obtains γ1 = 0, showing once again that for α = β the distribution is symmetric and hence the skewness is zero. Positive skew (right-tailed) for α < β, negative skew (left-tailed) for α > β.

Using the parametrization in terms of mean μ and sample size ν = α + β:

: \begin{align}
\alpha & = \mu \nu, \text{ where }\nu =(\alpha + \beta) >0\\
\beta & = (1 - \mu) \nu, \text{ where }\nu =(\alpha + \beta) >0,
\end{align}

one can express the skewness in terms of the mean μ and the sample size ν as follows:

:\gamma_1 = \frac{2(1-2\mu)\sqrt{1+\nu}}{(2+\nu)\sqrt{\mu(1-\mu)}}.

The skewness can also be expressed just in terms of the variance ''var'' and the mean μ as follows:

:\gamma_1 =\frac{2(1-2\mu)\sqrt{\text{var}}}{\mu(1-\mu)+\text{var}}\text{ if } \operatorname{var} < \mu(1-\mu)

The accompanying plot of skewness as a function of variance and mean shows that maximum variance (1/4) is coupled with zero skewness and the symmetry condition (μ = 1/2), and that maximum skewness (positive or negative infinity) occurs when the mean is located at one end or the other, so that the "mass" of the probability distribution is concentrated at the ends (minimum variance).

The following expression for the square of the skewness, in terms of the sample size ν = α + β and the variance ''var'', is useful for the method of moments estimation of four parameters:

:(\gamma_1)^2 = \frac{4}{(2+\nu)^2}\bigg(\frac{1}{\text{var}}-4(1+\nu)\bigg)

This expression correctly gives a skewness of zero for α = β, since in that case \operatorname{var} = \frac{1}{4(1+\nu)}.

For the symmetric case (α = β), skewness = 0 over the whole range, and the following limits apply:

:\lim_{\alpha = \beta \to 0} \gamma_1 = \lim_{\alpha = \beta \to \infty} \gamma_1 =\lim_{\nu \to 0} \gamma_1=\lim_{\nu \to \infty} \gamma_1=\lim_{\mu \to \frac{1}{2}} \gamma_1 = 0

For the asymmetric cases (α ≠ β) the following limits (with only the noted variable approaching the limit) can be obtained from the above expressions:

: \begin{align}
&\lim_{\alpha \to 0} \gamma_1 =\lim_{\mu \to 0} \gamma_1 = \infty\\
&\lim_{\beta \to 0} \gamma_1 = \lim_{\mu \to 1} \gamma_1= - \infty\\
&\lim_{\alpha \to \infty} \gamma_1 = -\frac{2}{\sqrt{\beta}},\quad \lim_{\beta \to 0}(\lim_{\alpha \to \infty} \gamma_1) = -\infty,\quad \lim_{\beta \to \infty}(\lim_{\alpha \to \infty} \gamma_1) = 0\\
&\lim_{\beta \to \infty} \gamma_1 = \frac{2}{\sqrt{\alpha}},\quad \lim_{\alpha \to 0}(\lim_{\beta \to \infty} \gamma_1) = \infty,\quad \lim_{\alpha \to \infty}(\lim_{\beta \to \infty} \gamma_1) = 0\\
&\lim_{\nu \to 0} \gamma_1 = \frac{1 - 2\mu}{\sqrt{\mu(1-\mu)}},\quad \lim_{\mu \to 0}(\lim_{\nu \to 0} \gamma_1) = \infty,\quad \lim_{\mu \to 1}(\lim_{\nu \to 0} \gamma_1) = - \infty
\end{align}
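As an illustrative check (not from the original text), the following short Python sketch, assuming scipy and numpy are installed, compares the closed-form skewness above with the value reported by scipy.stats.beta for a few arbitrary parameter pairs.

 # Sketch: closed-form skewness of Beta(a, b) vs scipy's reported skewness.
 import numpy as np
 from scipy.stats import beta as beta_dist
 def beta_skewness(a, b):
     # gamma_1 = 2 (b - a) sqrt(a + b + 1) / ((a + b + 2) sqrt(a b))
     return 2*(b - a)*np.sqrt(a + b + 1)/((a + b + 2)*np.sqrt(a*b))
 for a, b in [(2, 2), (2, 5), (5, 2), (0.5, 3)]:
     print((a, b), beta_skewness(a, b), float(beta_dist.stats(a, b, moments='s')))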


Kurtosis

The beta distribution has been applied in acoustic analysis to assess damage to gears, as the kurtosis of the beta distribution has been reported to be a good indicator of the condition of a gear.
Kurtosis has also been used to distinguish the seismic signal generated by a person's footsteps from other signals. As persons or other targets moving on the ground generate continuous signals in the form of seismic waves, one can separate different targets based on the seismic waves they generate. Kurtosis is sensitive to impulsive signals, so it is much more sensitive to the signal generated by human footsteps than to other signals generated by vehicles, winds, noise, etc.
Unfortunately, the notation for kurtosis has not been standardized. Kenney and Keeping use the symbol γ2 for the excess kurtosis, but Abramowitz and Stegun use different terminology. To prevent confusion between kurtosis (the fourth moment centered on the mean, normalized by the square of the variance) and excess kurtosis, when using symbols, they will be spelled out as follows:

:\begin{align}
\text{excess kurtosis} &=\text{kurtosis} - 3\\
&=\frac{\operatorname{E}[(X - \mu)^4]}{(\operatorname{var}(X))^2}-3\\
&=\frac{6[(\alpha - \beta)^2 (\alpha +\beta+1) - \alpha \beta (\alpha + \beta + 2)]}{\alpha \beta (\alpha + \beta + 2) (\alpha + \beta + 3)}.
\end{align}

Letting α = β in the above expression one obtains

:\text{excess kurtosis} =- \frac{6}{2\alpha + 3}\text{ if }\alpha=\beta .

Therefore, for symmetric beta distributions, the excess kurtosis is negative, increasing from a minimum value of −2 at the limit as α = β → 0, and approaching a maximum value of zero as α = β → ∞. The value of −2 is the minimum value of excess kurtosis that any distribution (not just beta distributions, but any distribution of any possible kind) can ever achieve. This minimum value is reached when all the probability density is entirely concentrated at each end ''x'' = 0 and ''x'' = 1, with nothing in between: a 2-point Bernoulli distribution with equal probability 1/2 at each end (a coin toss: see section below "Kurtosis bounded by the square of the skewness" for further discussion). The description of kurtosis as a measure of the "potential outliers" (or "potential rare, extreme values") of the probability distribution is correct for all distributions including the beta distribution. When rare, extreme values can occur in the beta distribution, the higher its kurtosis; otherwise, the kurtosis is lower. For α ≠ β, skewed beta distributions, the excess kurtosis can reach unlimited positive values (particularly for α → 0 for finite β, or for β → 0 for finite α) because the side away from the mode will produce occasional extreme values. Minimum kurtosis takes place when the mass density is concentrated equally at each end (and therefore the mean is at the center), and there is no probability mass density in between the ends.

Using the parametrization in terms of mean μ and sample size ν = α + β:

: \begin{align}
\alpha & = \mu \nu, \text{ where }\nu =(\alpha + \beta) >0\\
\beta & = (1 - \mu) \nu, \text{ where }\nu =(\alpha + \beta) >0,
\end{align}

one can express the excess kurtosis in terms of the mean μ and the sample size ν as follows:

:\text{excess kurtosis} =\frac{6}{3 + \nu}\bigg (\frac{(1 - 2 \mu)^2 (1 + \nu)}{\mu (1 - \mu) (2 + \nu)} - 1 \bigg )

The excess kurtosis can also be expressed in terms of just the following two parameters: the variance ''var'' and the sample size ν, as follows:

:\text{excess kurtosis} =\frac{6}{(2 + \nu)(3 + \nu)}\left(\frac{1}{\text{var}} - 6 - 5 \nu \right)\text{ if }\text{var}< \mu(1-\mu)

and, in terms of the variance ''var'' and the mean μ as follows:

:\text{excess kurtosis} =\frac{6\,\text{var}\,(1 - 5\mu(1-\mu) - \text{var})}{(\mu(1-\mu)+\text{var})(\mu(1-\mu)+ 2\,\text{var})}\text{ if }\text{var}< \mu(1-\mu)

The plot of excess kurtosis as a function of the variance and the mean shows that the minimum value of the excess kurtosis (−2, which is the minimum possible value for excess kurtosis for any distribution) is intimately coupled with the maximum value of variance (1/4) and the symmetry condition: the mean occurring at the midpoint (μ = 1/2). This occurs for the symmetric case of α = β = 0, with zero skewness. At the limit, this is the 2-point Bernoulli distribution with equal probability 1/2 at each Dirac delta function end ''x'' = 0 and ''x'' = 1 and zero probability everywhere else. (A coin toss: one face of the coin being ''x'' = 0 and the other face being ''x'' = 1.) Variance is maximum because the distribution is bimodal with nothing in between the two modes (spikes) at each end. Excess kurtosis is minimum: the probability density "mass" is zero at the mean and it is concentrated at the two peaks at each end. Excess kurtosis reaches the minimum possible value (for any distribution) when the probability density function has two spikes at each end: it is bi-"peaky" with nothing in between them.

On the other hand, the plot shows that for extreme skewed cases, where the mean is located near one or the other end (μ = 0 or μ = 1), the variance is close to zero, and the excess kurtosis rapidly approaches infinity when the mean of the distribution approaches either end.

Alternatively, the excess kurtosis can also be expressed in terms of just the following two parameters: the square of the skewness, and the sample size ν, as follows:

:\text{excess kurtosis} =\frac{6}{3 + \nu}\bigg(\frac{(2 + \nu)}{4} (\text{skewness})^2 - 1\bigg)\text{ if }(\text{skewness})^2-2< \text{excess kurtosis}< \tfrac{3}{2} (\text{skewness})^2

From this last expression, one can obtain the same limits published practically a century ago by Karl Pearson in his paper, for the beta distribution (see section below titled "Kurtosis bounded by the square of the skewness"). Setting α + β = ν = 0 in the above expression, one obtains Pearson's lower boundary (values for the skewness and excess kurtosis below the boundary (excess kurtosis + 2 − skewness2 = 0) cannot occur for any distribution, and hence Karl Pearson appropriately called the region below this boundary the "impossible region"). The limit of α + β = ν → ∞ determines Pearson's upper boundary.

: \begin{align}
&\lim_{\nu \to 0}\text{excess kurtosis} = (\text{skewness})^2 - 2\\
&\lim_{\nu \to \infty}\text{excess kurtosis} = \tfrac{3}{2} (\text{skewness})^2
\end{align}

therefore:

:(\text{skewness})^2-2< \text{excess kurtosis}< \tfrac{3}{2} (\text{skewness})^2

Values of ν = α + β such that ν ranges from zero to infinity, 0 < ν < ∞, span the whole region of the beta distribution in the plane of excess kurtosis versus squared skewness.

For the symmetric case (α = β), the following limits apply:

: \begin{align}
&\lim_{\alpha = \beta \to 0} \text{excess kurtosis} = - 2 \\
&\lim_{\alpha = \beta \to \infty} \text{excess kurtosis} = 0 \\
&\lim_{\mu \to \frac{1}{2}} \text{excess kurtosis} = - \frac{6}{3 + \nu}
\end{align}

For the unsymmetric cases (α ≠ β) the following limits (with only the noted variable approaching the limit) can be obtained from the above expressions:

: \begin{align}
&\lim_{\alpha \to 0}\text{excess kurtosis} =\lim_{\beta \to 0} \text{excess kurtosis} = \lim_{\mu \to 0}\text{excess kurtosis} = \lim_{\mu \to 1}\text{excess kurtosis} =\infty\\
&\lim_{\alpha \to \infty}\text{excess kurtosis} = \frac{6}{\beta},\quad \lim_{\beta \to 0}(\lim_{\alpha \to \infty} \text{excess kurtosis}) = \infty,\quad \lim_{\beta \to \infty}(\lim_{\alpha \to \infty} \text{excess kurtosis}) = 0\\
&\lim_{\beta \to \infty}\text{excess kurtosis} = \frac{6}{\alpha},\quad \lim_{\alpha \to 0}(\lim_{\beta \to \infty} \text{excess kurtosis}) = \infty,\quad \lim_{\alpha \to \infty}(\lim_{\beta \to \infty} \text{excess kurtosis}) = 0\\
&\lim_{\nu \to 0} \text{excess kurtosis} = - 6 + \frac{1}{\mu (1 - \mu)},\quad \lim_{\mu \to 0}(\lim_{\nu \to 0} \text{excess kurtosis}) = \infty,\quad \lim_{\mu \to 1}(\lim_{\nu \to 0} \text{excess kurtosis}) = \infty
\end{align}
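For a concrete check (an illustration added here, not part of the original text), the following Python sketch, assuming scipy and numpy are available, evaluates the closed-form excess kurtosis and compares it with scipy's reported value (scipy's moment code 'k' is excess kurtosis); the parameter pairs are arbitrary.

 # Sketch: closed-form excess kurtosis of Beta(a, b) vs scipy's value.
 import numpy as np
 from scipy.stats import beta as beta_dist
 def beta_excess_kurtosis(a, b):
     num = 6*((a - b)**2*(a + b + 1) - a*b*(a + b + 2))
     den = a*b*(a + b + 2)*(a + b + 3)
     return num/den
 for a, b in [(1, 1), (0.5, 0.5), (2, 2), (0.1, 1000)]:
     print((a, b), beta_excess_kurtosis(a, b), float(beta_dist.stats(a, b, moments='k')))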


Characteristic function

The characteristic function is the Fourier transform of the probability density function. The characteristic function of the beta distribution is Kummer's confluent hypergeometric function (of the first kind):

:\begin{align}
\varphi_X(\alpha;\beta;t) &= \operatorname{E}\left[e^{itX}\right]\\
&= \int_0^1 e^{itx} f(x;\alpha,\beta)\, dx \\
&={}_1F_1(\alpha; \alpha+\beta; it)\\
&=\sum_{n=0}^\infty \frac{\alpha^{(n)}}{(\alpha+\beta)^{(n)}}\,\frac{(it)^n}{n!} \\
&= 1 +\sum_{k=1}^{\infty} \left( \prod_{r=0}^{k-1} \frac{\alpha+r}{\alpha+\beta+r} \right) \frac{(it)^k}{k!}
\end{align}

where

: x^{(n)}=x(x+1)(x+2)\cdots(x+n-1)

is the rising factorial, also called the "Pochhammer symbol". The value of the characteristic function for ''t'' = 0 is one:

: \varphi_X(\alpha;\beta;0)={}_1F_1(\alpha; \alpha+\beta; 0) = 1 .

Also, the real and imaginary parts of the characteristic function enjoy the following symmetries with respect to the origin of variable ''t'':

: \textrm{Re} \left [ {}_1F_1(\alpha; \alpha+\beta; it) \right ] = \textrm{Re} \left [ {}_1F_1(\alpha; \alpha+\beta; - it) \right ]

: \textrm{Im} \left [ {}_1F_1(\alpha; \alpha+\beta; it) \right ] = - \textrm{Im} \left [ {}_1F_1(\alpha; \alpha+\beta; - it) \right ]

The symmetric case α = β simplifies the characteristic function of the beta distribution to a Bessel function, since in the special case α + β = 2α the confluent hypergeometric function (of the first kind) reduces to a Bessel function (the modified Bessel function of the first kind I_{\alpha-\frac{1}{2}}) using Kummer's second transformation as follows:

:\begin{align} {}_1F_1(\alpha;2\alpha; it) &= e^{\frac{it}{2}} {}_0F_1 \left(; \alpha+\tfrac{1}{2}; \frac{(it)^2}{16} \right) \\
&= e^{\frac{it}{2}} \left(\frac{it}{4}\right)^{\frac{1}{2}-\alpha} \Gamma\left(\alpha+\tfrac{1}{2}\right) I_{\alpha-\frac{1}{2}}\left(\frac{it}{2}\right).\end{align}

Another example of the symmetric case α = β = ''n''/2 arises in beamforming applications.

In the accompanying plots, the real part (Re) of the characteristic function of the beta distribution is displayed for symmetric (α = β) and skewed (α ≠ β) cases.
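As a small numerical cross-check (added here as an illustration, not from the original text), the characteristic function can be evaluated both by direct numerical integration of e^{itx} f(x; α, β) and via Kummer's function ₁F₁(α; α+β; it). The sketch below assumes numpy, scipy and mpmath are installed (mpmath is used only because it accepts a complex argument); the values α = 2, β = 3, t = 1.5 are arbitrary.

 # Sketch: characteristic function of Beta(a, b), two ways.
 import numpy as np
 from scipy.integrate import quad
 from scipy.stats import beta as beta_dist
 import mpmath
 def cf_by_quadrature(a, b, t):
     # Integrate the real and imaginary parts of e^{itx} f(x; a, b) over [0, 1]
     re = quad(lambda x: np.cos(t*x)*beta_dist.pdf(x, a, b), 0, 1)[0]
     im = quad(lambda x: np.sin(t*x)*beta_dist.pdf(x, a, b), 0, 1)[0]
     return complex(re, im)
 a, b, t = 2.0, 3.0, 1.5
 print(cf_by_quadrature(a, b, t))
 print(complex(mpmath.hyp1f1(a, a + b, 1j*t)))  # 1F1(a; a+b; it)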


Other moments


Moment generating function

It also follows that the moment generating function is

:\begin{align} M_X(\alpha; \beta; t) &= \operatorname{E}\left[e^{tX}\right] \\
&= \int_0^1 e^{tx} f(x;\alpha,\beta)\,dx \\
&= {}_1F_1(\alpha; \alpha+\beta; t) \\
&= \sum_{n=0}^\infty \frac{\alpha^{(n)}}{(\alpha+\beta)^{(n)}}\,\frac{t^n}{n!} \\
&= 1 +\sum_{k=1}^{\infty} \left( \prod_{r=0}^{k-1} \frac{\alpha+r}{\alpha+\beta+r} \right) \frac{t^k}{k!}
\end{align}

In particular ''M''''X''(''α''; ''β''; 0) = 1.


Higher moments

Using the moment generating function, the ''k''-th raw moment is given by the factor

:\prod_{r=0}^{k-1} \frac{\alpha+r}{\alpha+\beta+r}

multiplying the (exponential series) term \left(\frac{t^k}{k!}\right) in the series of the moment generating function

:\operatorname{E}[X^k]= \frac{\alpha^{(k)}}{(\alpha + \beta)^{(k)}} = \prod_{r=0}^{k-1} \frac{\alpha+r}{\alpha+\beta+r}

where (''x'')^{(''k'')} is a Pochhammer symbol representing the rising factorial. It can also be written in a recursive form as

:\operatorname{E}[X^k] = \frac{\alpha + k - 1}{\alpha + \beta + k - 1}\operatorname{E}[X^{k-1}].

Since the moment generating function M_X(\alpha; \beta; \cdot) has a positive radius of convergence, the beta distribution is determined by its moments.
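A minimal sketch (added as an illustration, not part of the original text) in plain Python of the recursive raw-moment formula above; the parameters α = 2, β = 3 and the order k = 4 are arbitrary.

 # Sketch: k-th raw moments of Beta(a, b) via the rising-factorial recursion
 # E[X^k] = E[X^(k-1)] * (a + k - 1)/(a + b + k - 1).
 def beta_raw_moments(a, b, kmax):
     moments = [1.0]                       # E[X^0] = 1
     for k in range(1, kmax + 1):
         moments.append(moments[-1]*(a + k - 1)/(a + b + k - 1))
     return moments
 print(beta_raw_moments(2.0, 3.0, 4))      # E[X^0], ..., E[X^4] for Beta(2, 3)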


Moments of transformed random variables


Moments of linearly transformed, product and inverted random variables

One can also show the following expectations for a transformed random variable, where the random variable ''X'' is Beta-distributed with parameters α and β: ''X'' ~ Beta(α, β). The expected value of the variable 1 − ''X'' is the mirror image of the expected value based on ''X'':

:\begin{align}
& \operatorname{E}[1-X] = \frac{\beta}{\alpha+\beta} \\
& \operatorname{E}[X (1-X)] =\operatorname{E}[(1-X)X ] =\frac{\alpha\beta}{(\alpha+\beta)(\alpha+\beta+1)}
\end{align}

Due to the mirror-symmetry of the probability density function of the beta distribution, the variances based on the variables ''X'' and 1 − ''X'' are identical, and the covariance of ''X'' and 1 − ''X'' is the negative of the variance:

:\operatorname{var}[(1-X)]=\operatorname{var}[X] = -\operatorname{cov}[X,(1-X)]= \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}

These are the expected values for inverted variables (these are related to the harmonic means, see the section on the harmonic mean):

:\begin{align}
& \operatorname{E} \left [\frac{1}{X} \right ] = \frac{\alpha+\beta-1}{\alpha-1} \text{ if } \alpha > 1\\
& \operatorname{E}\left [\frac{1}{1-X} \right ] =\frac{\alpha+\beta-1}{\beta-1} \text{ if } \beta > 1
\end{align}

The following transformation, dividing the variable ''X'' by its mirror-image, ''X''/(1 − ''X''), results in the expected value of the "inverted beta distribution" or beta prime distribution (also known as beta distribution of the second kind or Pearson's Type VI):

: \begin{align}
& \operatorname{E}\left[\frac{X}{1-X}\right] =\frac{\alpha}{\beta-1} \text{ if }\beta > 1\\
& \operatorname{E}\left[\frac{1-X}{X}\right] =\frac{\beta}{\alpha-1}\text{ if }\alpha > 1
\end{align}

Variances of these transformed variables can be obtained by integration, as the expected values of the second moments centered on the corresponding variables:

:\operatorname{var} \left[\frac{1}{X} \right] =\operatorname{E}\left[\left(\frac{1}{X} - \operatorname{E}\left[\frac{1}{X} \right ] \right )^2\right]=
:\operatorname{var}\left [\frac{1-X}{X} \right ] =\operatorname{E} \left [\left (\frac{1-X}{X} - \operatorname{E}\left [\frac{1-X}{X} \right ] \right )^2 \right ]= \frac{\beta(\alpha+\beta-1)}{(\alpha-2)(\alpha-1)^2} \text{ if }\alpha > 2

The following variance of the variable ''X'' divided by its mirror-image (''X''/(1−''X'')) results in the variance of the "inverted beta distribution" or beta prime distribution (also known as beta distribution of the second kind or Pearson's Type VI):

:\operatorname{var} \left [\frac{1}{1-X} \right ] =\operatorname{E} \left [\left(\frac{1}{1-X} - \operatorname{E} \left [\frac{1}{1-X} \right ] \right)^2 \right ]=\operatorname{var} \left [\frac{X}{1-X} \right ] =
:\operatorname{E} \left [\left (\frac{X}{1-X} - \operatorname{E} \left [\frac{X}{1-X} \right ] \right )^2 \right ]= \frac{\alpha(\alpha+\beta-1)}{(\beta-2)(\beta-1)^2} \text{ if }\beta > 2

The covariances are:

:\operatorname{cov}\left [\frac{X}{1-X},\frac{1-X}{X} \right ] = \operatorname{cov}\left[\frac{X}{1-X},\frac{1}{X} \right] =\operatorname{cov}\left[\frac{1}{X},\frac{1}{1-X}\right ] = \operatorname{cov}\left[\frac{1-X}{X},\frac{1}{1-X} \right] =\frac{1-\alpha-\beta}{(\alpha-1)(\beta-1)} \text{ for } \alpha, \beta > 1

These expectations and variances appear in the four-parameter Fisher information matrix (see the section on Fisher information).


Moments of logarithmically transformed random variables

Expected values for logarithmic transformations (useful for maximum likelihood estimates; see the section on parameter estimation below) are discussed in this section. The following logarithmic linear transformations are related to the geometric means ''G''<sub>''X''</sub> and ''G''<sub>(1−''X'')</sub> (see the section on the geometric mean):

:\begin{align}
\operatorname{E}[\ln(X)] &= \psi(\alpha) - \psi(\alpha + \beta)= - \operatorname{E}\left[\ln \left (\frac{1}{X} \right )\right],\\
\operatorname{E}[\ln(1-X)] &=\psi(\beta) - \psi(\alpha + \beta)= - \operatorname{E} \left[\ln \left (\frac{1}{1-X} \right )\right].
\end{align}

Here the digamma function ψ(α) is defined as the logarithmic derivative of the gamma function:

:\psi(\alpha) = \frac{d\ln\Gamma(\alpha)}{d\alpha}

Logit transformations are interesting, as they usually transform various shapes (including J-shapes) into (usually skewed) bell-shaped densities over the logit variable, and they may remove the end singularities over the original variable:

:\begin{align}
\operatorname{E}\left[\ln \left (\frac{X}{1-X} \right ) \right] &=\psi(\alpha) - \psi(\beta)= \operatorname{E}[\ln(X)] +\operatorname{E} \left[\ln \left (\frac{1}{1-X} \right) \right],\\
\operatorname{E}\left [\ln \left (\frac{1-X}{X} \right ) \right ] &=\psi(\beta) - \psi(\alpha)= - \operatorname{E} \left[\ln \left (\frac{X}{1-X} \right) \right] .
\end{align}

Johnson considered the distribution of the logit-transformed variable ln(''X''/(1−''X'')), including its moment generating function and approximations for large values of the shape parameters. This transformation extends the finite support [0, 1] based on the original variable ''X'' to infinite support in both directions of the real line (−∞, +∞).

Higher order logarithmic moments can be derived by using the representation of a beta distribution as a proportion of two Gamma distributions and differentiating through the integral. They can be expressed in terms of higher order poly-gamma functions as follows:

:\begin{align}
\operatorname{E} \left [\ln^2(X) \right ] &= (\psi(\alpha) - \psi(\alpha + \beta))^2+\psi_1(\alpha)-\psi_1(\alpha+\beta), \\
\operatorname{E} \left [\ln^2(1-X) \right ] &= (\psi(\beta) - \psi(\alpha + \beta))^2+\psi_1(\beta)-\psi_1(\alpha+\beta), \\
\operatorname{E} \left [\ln (X)\ln(1-X) \right ] &=(\psi(\alpha) - \psi(\alpha + \beta))(\psi(\beta) - \psi(\alpha + \beta)) -\psi_1(\alpha+\beta).
\end{align}

Therefore the variance of the logarithmic variables and the covariance of ln(''X'') and ln(1−''X'') are:

:\begin{align}
\operatorname{cov}[\ln(X), \ln(1-X)] &= \operatorname{E}\left[\ln(X)\ln(1-X)\right] - \operatorname{E}[\ln(X)]\operatorname{E}[\ln(1-X)] = -\psi_1(\alpha+\beta) \\
& \\
\operatorname{var}[\ln X] &= \operatorname{E}[\ln^2(X)] - (\operatorname{E}[\ln(X)])^2 \\
&= \psi_1(\alpha) - \psi_1(\alpha + \beta) \\
&= \psi_1(\alpha) + \operatorname{cov}[\ln(X), \ln(1-X)] \\
& \\
\operatorname{var}[\ln (1-X)] &= \operatorname{E}[\ln^2 (1-X)] - (\operatorname{E}[\ln (1-X)])^2 \\
&= \psi_1(\beta) - \psi_1(\alpha + \beta) \\
&= \psi_1(\beta) + \operatorname{cov}[\ln (X), \ln(1-X)]
\end{align}

where the trigamma function, denoted ψ1(α), is the second of the polygamma functions, and is defined as the derivative of the digamma function:

:\psi_1(\alpha) = \frac{d^2\ln\Gamma(\alpha)}{d\alpha^2}= \frac{d\psi(\alpha)}{d\alpha}.

The variances and covariance of the logarithmically transformed variables ''X'' and (1−''X'') are different, in general, because the logarithmic transformation destroys the mirror-symmetry of the original variables ''X'' and (1−''X''), as the logarithm approaches negative infinity for the variable approaching zero.

These logarithmic variances and covariance are the elements of the Fisher information matrix for the beta distribution. They are also a measure of the curvature of the log likelihood function (see the section on Maximum likelihood estimation).

The variances of the log inverse variables are identical to the variances of the log variables:

:\begin{align}
\operatorname{var}\left[\ln \left (\frac{1}{X} \right ) \right] & =\operatorname{var}[\ln(X)] = \psi_1(\alpha) - \psi_1(\alpha + \beta), \\
\operatorname{var}\left[\ln \left (\frac{1}{1-X} \right ) \right] &=\operatorname{var}[\ln (1-X)]= \psi_1(\beta) - \psi_1(\alpha + \beta), \\
\operatorname{cov}\left[\ln \left (\frac{1}{X} \right), \ln \left (\frac{1}{1-X}\right ) \right] &=\operatorname{cov}[\ln(X),\ln(1-X)]= -\psi_1(\alpha + \beta).
\end{align}

It also follows that the variances of the logit transformed variables are:

:\operatorname{var}\left[\ln \left (\frac{X}{1-X} \right )\right]=\operatorname{var}\left[\ln \left (\frac{1-X}{X} \right ) \right]=-\operatorname{cov}\left [\ln \left (\frac{X}{1-X} \right ), \ln \left (\frac{1-X}{X} \right ) \right]= \psi_1(\alpha) + \psi_1(\beta)
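The digamma/trigamma identities above are easy to verify numerically. The following short Python sketch (an added illustration, assuming scipy and numpy are installed; the parameters α = 2, β = 3 are arbitrary) compares E[ln X] and var[ln X] with Monte Carlo estimates.

 # Sketch: logarithmic moments of Beta(a, b) vs Monte Carlo.
 import numpy as np
 from scipy.special import digamma, polygamma
 a, b = 2.0, 3.0
 x = np.random.default_rng(1).beta(a, b, 1_000_000)
 print(digamma(a) - digamma(a + b), np.mean(np.log(x)))          # E[ln X]
 print(polygamma(1, a) - polygamma(1, a + b), np.var(np.log(x))) # var[ln X]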


Quantities of information (entropy)

Given a beta distributed random variable, ''X'' ~ Beta(''α'', ''β''), the differential entropy of ''X'' (measured in nats) is the expected value of the negative of the logarithm of the probability density function:

:\begin{align}
h(X) &= \operatorname{E}[-\ln(f(x;\alpha,\beta))] \\
&=\int_0^1 -f(x;\alpha,\beta)\ln(f(x;\alpha,\beta)) \, dx \\
&= \ln(\Beta(\alpha,\beta))-(\alpha-1)\psi(\alpha)-(\beta-1)\psi(\beta)+(\alpha+\beta-2) \psi(\alpha+\beta)
\end{align}

where ''f''(''x''; ''α'', ''β'') is the probability density function of the beta distribution:

:f(x;\alpha,\beta) = \frac{1}{\Beta(\alpha,\beta)} x^{\alpha-1}(1-x)^{\beta-1}

The digamma function ''ψ'' appears in the formula for the differential entropy as a consequence of Euler's integral formula for the harmonic numbers, which follows from the integral:

:\int_0^1 \frac{1-x^{\alpha-1}}{1-x} \, dx = \psi(\alpha)-\psi(1)

The differential entropy of the beta distribution is negative for all values of ''α'' and ''β'' greater than zero, except at ''α'' = ''β'' = 1 (for which values the beta distribution is the same as the uniform distribution), where the differential entropy reaches its maximum value of zero. It is to be expected that the maximum entropy should take place when the beta distribution becomes equal to the uniform distribution, since uncertainty is maximal when all possible events are equiprobable.

For ''α'' or ''β'' approaching zero, the differential entropy approaches its minimum value of negative infinity. For (either or both) ''α'' or ''β'' approaching zero, there is a maximum amount of order: all the probability density is concentrated at the ends, and there is zero probability density at points located between the ends. Similarly for (either or both) ''α'' or ''β'' approaching infinity, the differential entropy approaches its minimum value of negative infinity, and a maximum amount of order. If either ''α'' or ''β'' approaches infinity (and the other is finite) all the probability density is concentrated at an end, and the probability density is zero everywhere else. If both shape parameters are equal (the symmetric case), ''α'' = ''β'', and they approach infinity simultaneously, the probability density becomes a spike (Dirac delta function) concentrated at the middle ''x'' = 1/2, and hence there is 100% probability at the middle ''x'' = 1/2 and zero probability everywhere else.

The (continuous case) differential entropy was introduced by Shannon in his original paper (where he named it the "entropy of a continuous distribution"), as the concluding part of the same paper where he defined the discrete entropy. It is known since then that the differential entropy may differ from the infinitesimal limit of the discrete entropy by an infinite offset, therefore the differential entropy can be negative (as it is for the beta distribution). What really matters is the relative value of entropy.

Given two beta distributed random variables, ''X''1 ~ Beta(''α'', ''β'') and ''X''2 ~ Beta(''α''′, ''β''′), the cross-entropy is (measured in nats)

:\begin{align}
H(X_1,X_2) &= \int_0^1 - f(x;\alpha,\beta) \ln (f(x;\alpha',\beta')) \,dx \\
&= \ln \left(\Beta(\alpha',\beta')\right)-(\alpha'-1)\psi(\alpha)-(\beta'-1)\psi(\beta)+(\alpha'+\beta'-2)\psi(\alpha+\beta).
\end{align}

The cross-entropy has been used as an error metric to measure the distance between two hypotheses. Its absolute value is minimum when the two distributions are identical. It is the information measure most closely related to the log maximum likelihood (see the section on parameter estimation by maximum likelihood).

The relative entropy, or Kullback–Leibler divergence ''D''KL(''X''1 || ''X''2), is a measure of the inefficiency of assuming that the distribution is ''X''2 ~ Beta(''α''′, ''β''′) when the distribution is really ''X''1 ~ Beta(''α'', ''β''). It is defined as follows (measured in nats):

:\begin{align}
D_{\mathrm{KL}}(X_1||X_2) &= \int_0^1 f(x;\alpha,\beta) \ln \left (\frac{f(x;\alpha,\beta)}{f(x;\alpha',\beta')} \right ) \, dx \\
&= \left (\int_0^1 f(x;\alpha,\beta) \ln (f(x;\alpha,\beta)) \,dx \right )- \left (\int_0^1 f(x;\alpha,\beta) \ln (f(x;\alpha',\beta')) \, dx \right )\\
&= -h(X_1) + H(X_1,X_2)\\
&= \ln\left(\frac{\Beta(\alpha',\beta')}{\Beta(\alpha,\beta)}\right)+(\alpha-\alpha')\psi(\alpha)+(\beta-\beta')\psi(\beta)+(\alpha'-\alpha+\beta'-\beta)\psi (\alpha + \beta).
\end{align}

The relative entropy, or Kullback–Leibler divergence, is always non-negative. A few numerical examples follow:
*''X''1 ~ Beta(1, 1) and ''X''2 ~ Beta(3, 3); ''D''KL(''X''1 || ''X''2) = 0.598803; ''D''KL(''X''2 || ''X''1) = 0.267864; ''h''(''X''1) = 0; ''h''(''X''2) = −0.267864
*''X''1 ~ Beta(3, 0.5) and ''X''2 ~ Beta(0.5, 3); ''D''KL(''X''1 || ''X''2) = 7.21574; ''D''KL(''X''2 || ''X''1) = 7.21574; ''h''(''X''1) = −1.10805; ''h''(''X''2) = −1.10805.

The Kullback–Leibler divergence is not symmetric, ''D''KL(''X''1 || ''X''2) ≠ ''D''KL(''X''2 || ''X''1), for the case in which the individual beta distributions Beta(1, 1) and Beta(3, 3) are symmetric but have different entropies ''h''(''X''1) ≠ ''h''(''X''2). The value of the Kullback divergence depends on the direction traveled: whether going from a higher (differential) entropy to a lower (differential) entropy or the other way around. In the numerical example above, the Kullback divergence measures the inefficiency of assuming that the distribution is (bell-shaped) Beta(3, 3), rather than (uniform) Beta(1, 1). The "h" entropy of Beta(1, 1) is higher than the "h" entropy of Beta(3, 3) because the uniform distribution Beta(1, 1) has a maximum amount of disorder. The Kullback divergence is more than two times higher (0.598803 instead of 0.267864) when measured in the direction of decreasing entropy: the direction that assumes that the (uniform) Beta(1, 1) distribution is (bell-shaped) Beta(3, 3) rather than the other way around. In this restricted sense, the Kullback divergence is consistent with the second law of thermodynamics.

The Kullback–Leibler divergence is symmetric, ''D''KL(''X''1 || ''X''2) = ''D''KL(''X''2 || ''X''1), for the skewed cases Beta(3, 0.5) and Beta(0.5, 3) that have equal differential entropy ''h''(''X''1) = ''h''(''X''2).

The symmetry condition:

:D_{\mathrm{KL}}(X_1||X_2) = D_{\mathrm{KL}}(X_2||X_1),\text{ if }h(X_1) = h(X_2),\text{ for }\alpha \neq \beta

follows from the above definitions and the mirror-symmetry ''f''(''x''; ''α'', ''β'') = ''f''(1−''x''; ''β'', ''α'') enjoyed by the beta distribution.
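The closed forms above translate directly into code. The following Python sketch (an illustration added here, assuming scipy is installed) evaluates the differential entropy and the Kullback–Leibler divergence and reproduces the first numerical example.

 # Sketch: differential entropy and KL divergence of beta distributions.
 from scipy.special import betaln, digamma
 def beta_entropy(a, b):
     return (betaln(a, b) - (a - 1)*digamma(a) - (b - 1)*digamma(b)
             + (a + b - 2)*digamma(a + b))
 def beta_kl(a1, b1, a2, b2):
     # D_KL( Beta(a1,b1) || Beta(a2,b2) )
     return (betaln(a2, b2) - betaln(a1, b1) + (a1 - a2)*digamma(a1)
             + (b1 - b2)*digamma(b1) + (a2 - a1 + b2 - b1)*digamma(a1 + b1))
 print(beta_entropy(3, 3))    # ~ -0.267864
 print(beta_kl(1, 1, 3, 3))   # ~ 0.598803
 print(beta_kl(3, 3, 1, 1))   # ~ 0.267864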


Relationships between statistical measures


Mean, mode and median relationship

If 1 < α < β then mode ≤ median ≤ mean.Kerman J (2011) "A closed-form approximation for the median of the beta distribution". Expressing the mode (only for α, β > 1) and the mean in terms of α and β:

: \frac{\alpha - 1}{\alpha + \beta - 2} \le \operatorname{median} \le \frac{\alpha}{\alpha + \beta} .

If 1 < β < α then the order of the inequalities is reversed. For α, β > 1 the absolute distance between the mean and the median is less than 5% of the distance between the maximum and minimum values of ''x''. On the other hand, the absolute distance between the mean and the mode can reach 50% of the distance between the maximum and minimum values of ''x'', for the (pathological) case of α = 1 and β = 1, for which values the beta distribution approaches the uniform distribution and the differential entropy approaches its maximum value, and hence maximum "disorder".

For example, for α = 1.0001 and β = 1.00000001:
* mode = 0.9999; PDF(mode) = 1.00010
* mean = 0.500025; PDF(mean) = 1.00003
* median = 0.500035; PDF(median) = 1.00003
* mean − mode = −0.499875
* mean − median = −9.65538 × 10−6

where PDF stands for the value of the probability density function.
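The ordering mode ≤ median ≤ mean for 1 < α < β is easy to verify numerically; the following Python sketch (an added illustration, assuming scipy is installed, with arbitrary parameters α = 2, β = 5) prints the three quantities and the result of the comparison.

 # Sketch: check mode <= median <= mean for 1 < alpha < beta.
 from scipy.stats import beta as beta_dist
 a, b = 2.0, 5.0                       # any values with 1 < a < b
 mode = (a - 1)/(a + b - 2)
 median = beta_dist.median(a, b)
 mean = a/(a + b)
 print(mode, median, mean, mode <= median <= mean)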


Mean, geometric mean and harmonic mean relationship

It is known from the inequality of arithmetic and geometric means that the geometric mean is lower than the mean. Similarly, the harmonic mean is lower than the geometric mean. The accompanying plot shows that for α = β, both the mean and the median are exactly equal to 1/2, regardless of the value of α = β, and the mode is also equal to 1/2 for α = β > 1; however, the geometric and harmonic means are lower than 1/2, and they only approach this value asymptotically as α = β → ∞.


Kurtosis bounded by the square of the skewness

As remarked by Feller, in the Pearson system the beta probability density appears as type I (any difference between the beta distribution and Pearson's type I distribution is only superficial and it makes no difference for the following discussion regarding the relationship between kurtosis and skewness). Karl Pearson showed, in Plate 1 of his paper published in 1916, a graph with the kurtosis as the vertical axis (ordinate) and the square of the skewness as the horizontal axis (abscissa), in which a number of distributions were displayed. The region occupied by the beta distribution is bounded by the following two lines in the (skewness2, kurtosis) plane, or the (skewness2, excess kurtosis) plane:

:(\text{skewness})^2+1< \text{kurtosis}< \tfrac{3}{2} (\text{skewness})^2 + 3

or, equivalently,

:(\text{skewness})^2-2< \text{excess kurtosis}< \tfrac{3}{2} (\text{skewness})^2

At a time when there were no powerful digital computers, Karl Pearson accurately computed further boundaries, for example, separating the "U-shaped" from the "J-shaped" distributions. The lower boundary line (excess kurtosis + 2 − skewness2 = 0) is produced by skewed "U-shaped" beta distributions with both values of shape parameters α and β close to zero. The upper boundary line (excess kurtosis − (3/2) skewness2 = 0) is produced by extremely skewed distributions with very large values of one of the parameters and very small values of the other parameter. Karl Pearson showed that this upper boundary line (excess kurtosis − (3/2) skewness2 = 0) is also the intersection with Pearson's distribution III, which has unlimited support in one direction (towards positive infinity), and can be bell-shaped or J-shaped. His son, Egon Pearson, showed that the region (in the kurtosis/squared-skewness plane) occupied by the beta distribution (equivalently, Pearson's distribution I) as it approaches this boundary (excess kurtosis − (3/2) skewness2 = 0) is shared with the noncentral chi-squared distribution. Karl Pearson (Pearson 1895, pp. 357, 360, 373–376) also showed that the gamma distribution is a Pearson type III distribution. Hence this boundary line for Pearson's type III distribution is known as the gamma line. (This can be shown from the fact that the excess kurtosis of the gamma distribution is 6/''k'' and the square of the skewness is 4/''k''; hence (excess kurtosis − (3/2) skewness2 = 0) is identically satisfied by the gamma distribution regardless of the value of the parameter "k".) Pearson later noted that the chi-squared distribution is a special case of Pearson's type III and also shares this boundary line (as is apparent from the fact that for the chi-squared distribution the excess kurtosis is 12/''k'' and the square of the skewness is 8/''k''; hence (excess kurtosis − (3/2) skewness2 = 0) is identically satisfied regardless of the value of the parameter "k"). This is to be expected, since the chi-squared distribution ''X'' ~ χ2(''k'') is a special case of the gamma distribution, with parametrization X ~ Γ(k/2, 1/2) where k is a positive integer that specifies the "number of degrees of freedom" of the chi-squared distribution.

An example of a beta distribution near the upper boundary (excess kurtosis − (3/2) skewness2 = 0) is given by α = 0.1, β = 1000, for which the ratio (excess kurtosis)/(skewness2) = 1.49835 approaches the upper limit of 1.5 from below. An example of a beta distribution near the lower boundary (excess kurtosis + 2 − skewness2 = 0) is given by α = 0.0001, β = 0.1, for which values the expression (excess kurtosis + 2)/(skewness2) = 1.01621 approaches the lower limit of 1 from above. In the infinitesimal limit for both α and β approaching zero symmetrically, the excess kurtosis reaches its minimum value at −2. This minimum value occurs at the point at which the lower boundary line intersects the vertical axis (ordinate). (However, in Pearson's original chart, the ordinate is kurtosis, instead of excess kurtosis, and it increases downwards rather than upwards.)

Values for the skewness and excess kurtosis below the lower boundary (excess kurtosis + 2 − skewness2 = 0) cannot occur for any distribution, and hence Karl Pearson appropriately called the region below this boundary the "impossible region". The boundary for this "impossible region" is determined by (symmetric or skewed) bimodal "U"-shaped distributions for which the parameters α and β approach zero and hence all the probability density is concentrated at the ends: ''x'' = 0, 1 with practically nothing in between them. Since for α ≈ β ≈ 0 the probability density is concentrated at the two ends ''x'' = 0 and ''x'' = 1, this "impossible boundary" is determined by a Bernoulli distribution, where the two only possible outcomes occur with respective probabilities ''p'' and ''q'' = 1−''p''. For cases approaching this limit boundary with symmetry α = β, skewness ≈ 0, excess kurtosis ≈ −2 (this is the lowest excess kurtosis possible for any distribution), and the probabilities are ''p'' ≈ ''q'' ≈ 1/2. For cases approaching this limit boundary with skewness, excess kurtosis ≈ −2 + skewness2, and the probability density is concentrated more at one end than the other end (with practically nothing in between), with probabilities p = \tfrac{\beta}{\alpha+\beta} at the left end ''x'' = 0 and q = 1-p = \tfrac{\alpha}{\alpha+\beta} at the right end ''x'' = 1.
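The two Pearson boundaries can be checked empirically. The following Python sketch (an added illustration, assuming scipy and numpy are installed; the random parameter ranges are arbitrary) draws a few (α, β) pairs and verifies that the excess kurtosis always falls strictly between skewness² − 2 and (3/2) skewness².

 # Sketch: every beta distribution lies between Pearson's two boundary lines
 # in the (skewness^2, excess kurtosis) plane.
 import numpy as np
 from scipy.stats import beta as beta_dist
 rng = np.random.default_rng(2)
 for _ in range(5):
     a, b = rng.uniform(0.05, 50, size=2)
     s, k = beta_dist.stats(a, b, moments='sk')
     print(float(s**2 - 2) < float(k) < 1.5*float(s**2))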


Symmetry

All statements are conditional on α, β > 0:
* Probability density function reflection symmetry
::f(x;\alpha,\beta) = f(1-x;\beta,\alpha)
* Cumulative distribution function reflection symmetry plus unitary translation
::F(x;\alpha,\beta) = I_x(\alpha,\beta) = 1- F(1- x;\beta,\alpha) = 1 - I_{1-x}(\beta,\alpha)
* Mode reflection symmetry plus unitary translation
::\operatorname{mode}(\Beta(\alpha, \beta))= 1-\operatorname{mode}(\Beta(\beta, \alpha)),\text{ if }\Beta(\beta, \alpha)\ne \Beta(1,1)
* Median reflection symmetry plus unitary translation
::\operatorname{median} (\Beta(\alpha, \beta) )= 1 - \operatorname{median} (\Beta(\beta, \alpha))
* Mean reflection symmetry plus unitary translation
::\mu (\Beta(\alpha, \beta) )= 1 - \mu (\Beta(\beta, \alpha) )
* Geometric means: each is individually asymmetric; the following symmetry applies between the geometric mean based on ''X'' and the geometric mean based on its reflection (1−''X''):
::G_X (\Beta(\alpha, \beta) )=G_{(1-X)}(\Beta(\beta, \alpha) )
* Harmonic means: each is individually asymmetric; the following symmetry applies between the harmonic mean based on ''X'' and the harmonic mean based on its reflection (1−''X''):
::H_X (\Beta(\alpha, \beta) )=H_{(1-X)}(\Beta(\beta, \alpha) ) \text{ if } \alpha, \beta > 1 .
* Variance symmetry
::\operatorname{var} (\Beta(\alpha, \beta) )=\operatorname{var} (\Beta(\beta, \alpha) )
* Geometric variances: each is individually asymmetric; the following symmetry applies between the log geometric variance based on ''X'' and the log geometric variance based on its reflection (1−''X''):
::\ln(\operatorname{var}_{GX} (\Beta(\alpha, \beta))) = \ln(\operatorname{var}_{G(1-X)}(\Beta(\beta, \alpha)))
* Geometric covariance symmetry
::\ln \operatorname{cov}_{GX,(1-X)}(\Beta(\alpha, \beta))=\ln \operatorname{cov}_{GX,(1-X)}(\Beta(\beta, \alpha))
* Mean absolute deviation around the mean symmetry
::\operatorname{E}[|X - E[X]| ] (\Beta(\alpha, \beta))=\operatorname{E}[| X - E[X]| ] (\Beta(\beta, \alpha))
* Skewness skew-symmetry
::\operatorname{skewness} (\Beta(\alpha, \beta) )= - \operatorname{skewness} (\Beta(\beta, \alpha) )
* Excess kurtosis symmetry
::\text{excess kurtosis} (\Beta(\alpha, \beta) )= \text{excess kurtosis} (\Beta(\beta, \alpha) )
* Characteristic function symmetry of the real part (with respect to the origin of variable "t")
:: \text{Re} [ {}_1F_1(\alpha; \alpha+\beta; it) ] = \text{Re} [ {}_1F_1(\alpha; \alpha+\beta; - it) ]
* Characteristic function skew-symmetry of the imaginary part (with respect to the origin of variable "t")
:: \text{Im} [ {}_1F_1(\alpha; \alpha+\beta; it) ] = - \text{Im} [ {}_1F_1(\alpha; \alpha+\beta; - it) ]
* Characteristic function symmetry of the absolute value (with respect to the origin of variable "t")
:: \text{Abs} [ {}_1F_1(\alpha; \alpha+\beta; it) ] = \text{Abs} [ {}_1F_1(\alpha; \alpha+\beta; - it) ]
* Differential entropy symmetry
::h(\Beta(\alpha, \beta) )= h(\Beta(\beta, \alpha) )
* Relative entropy (also called Kullback–Leibler divergence) symmetry
::D_{\text{KL}}(X_1||X_2) = D_{\text{KL}}(X_2||X_1), \text{ if }h(X_1) = h(X_2)\text{, for (skewed) }\alpha \neq \beta
* Fisher information matrix symmetry: the matrix for Beta(''α'', ''β'') equals that for Beta(''β'', ''α'') with the roles of the two parameters interchanged,
::\mathcal{I}_{\alpha,\alpha}(\Beta(\alpha, \beta)) = \mathcal{I}_{\beta,\beta}(\Beta(\beta, \alpha)), \quad \mathcal{I}_{\alpha,\beta}(\Beta(\alpha, \beta)) = \mathcal{I}_{\alpha,\beta}(\Beta(\beta, \alpha))


Geometry of the probability density function


Inflection points

For certain values of the shape parameters α and β, the probability density function has inflection points, at which the curvature changes sign. The position of these inflection points can be useful as a measure of the dispersion or spread of the distribution.

Defining the following quantity:

:\kappa =\frac{\sqrt{\frac{(\alpha-1)(\beta-1)}{\alpha+\beta-3}}}{\alpha+\beta-2}

points of inflection occur, depending on the value of the shape parameters α and β, as follows:
*(α > 2, β > 2) The distribution is bell-shaped (symmetric for α = β and skewed otherwise), with two inflection points, equidistant from the mode:
::x = \text{mode} \pm \kappa = \frac{(\alpha-1) \pm \sqrt{\frac{(\alpha-1)(\beta-1)}{\alpha+\beta-3}}}{\alpha+\beta-2}
* (α = 2, β > 2) The distribution is unimodal, positively skewed, right-tailed, with one inflection point, located to the right of the mode:
::x =\text{mode} + \kappa = \frac{2}{\beta}
* (α > 2, β = 2) The distribution is unimodal, negatively skewed, left-tailed, with one inflection point, located to the left of the mode:
::x = \text{mode} - \kappa = 1 - \frac{2}{\alpha}
* (1 < α < 2, β > 2, α+β > 2) The distribution is unimodal, positively skewed, right-tailed, with one inflection point, located to the right of the mode:
::x =\text{mode} + \kappa = \frac{(\alpha-1) + \sqrt{\frac{(\alpha-1)(\beta-1)}{\alpha+\beta-3}}}{\alpha+\beta-2}
*(0 < α < 1, 1 < β < 2) The distribution has a mode at the left end ''x'' = 0 and it is positively skewed, right-tailed. There is one inflection point, located to the right of the mode:
::x = \frac{(\alpha-1) + \sqrt{\frac{(\alpha-1)(\beta-1)}{\alpha+\beta-3}}}{\alpha+\beta-2}
*(α > 2, 1 < β < 2) The distribution is unimodal, negatively skewed, left-tailed, with one inflection point, located to the left of the mode:
::x =\text{mode} - \kappa = \frac{(\alpha-1) - \sqrt{\frac{(\alpha-1)(\beta-1)}{\alpha+\beta-3}}}{\alpha+\beta-2}
*(1 < α < 2, 0 < β < 1) The distribution has a mode at the right end ''x'' = 1 and it is negatively skewed, left-tailed. There is one inflection point, located to the left of the mode:
::x = \frac{(\alpha-1) - \sqrt{\frac{(\alpha-1)(\beta-1)}{\alpha+\beta-3}}}{\alpha+\beta-2}

There are no inflection points in the remaining (symmetric and skewed) regions: U-shaped (α, β < 1), upside-down-U-shaped (1 < α < 2, 1 < β < 2), reverse-J-shaped (α < 1, β > 2) or J-shaped (α > 2, β < 1).

The accompanying plots show the inflection point locations (shown vertically, ranging from 0 to 1) versus α and β (the horizontal axes ranging from 0 to 5). There are large cuts at surfaces intersecting the lines α = 1, β = 1, α = 2, and β = 2 because at these values the beta distribution changes from 2 modes, to 1 mode, to no mode.


Shapes

The beta density function can take a wide variety of different shapes depending on the values of the two parameters ''α'' and ''β''. The ability of the beta distribution to take this great diversity of shapes (using only two parameters) is partly responsible for its wide application in modeling actual measurements:


Symmetric (''α'' = ''β'')

* the density function is symmetric about 1/2 (blue & teal plots).
* median = mean = 1/2.
* skewness = 0.
* variance = 1/(4(2α + 1))
* α = β < 1
** U-shaped (blue plot).
** bimodal: left mode = 0, right mode = 1, anti-mode = 1/2
** 1/12 < var(''X'') < 1/4
** −2 < excess kurtosis(''X'') < −6/5
** α = β = 1/2 is the arcsine distribution
*** var(''X'') = 1/8
*** excess kurtosis(''X'') = −3/2
*** CF = Rinc (t)
** α = β → 0 is a 2-point Bernoulli distribution with equal probability 1/2 at each Dirac delta function end ''x'' = 0 and ''x'' = 1 and zero probability everywhere else. A coin toss: one face of the coin being ''x'' = 0 and the other face being ''x'' = 1.
*** \lim_{\alpha = \beta \to 0} \operatorname{var}(X) = \tfrac{1}{4}
*** \lim_{\alpha = \beta \to 0} \text{excess kurtosis}(X) = - 2 (a lower value than this is impossible for any distribution to reach)
*** The differential entropy approaches a minimum value of −∞
* α = β = 1
** the uniform [0, 1] distribution
** no mode
** var(''X'') = 1/12
** excess kurtosis(''X'') = −6/5
** The (negative anywhere else) differential entropy reaches its maximum value of zero
** CF = Sinc (t)
* ''α'' = ''β'' > 1
** symmetric unimodal
** mode = 1/2.
** 0 < var(''X'') < 1/12
** −6/5 < excess kurtosis(''X'') < 0
** ''α'' = ''β'' = 3/2 is a semi-elliptic [0, 1] distribution, see: Wigner semicircle distribution
*** var(''X'') = 1/16.
*** excess kurtosis(''X'') = −1
*** CF = 2 Jinc (t)
** ''α'' = ''β'' = 2 is the parabolic [0, 1] distribution
*** var(''X'') = 1/20
*** excess kurtosis(''X'') = −6/7
*** CF = 3 Tinc (t)
** ''α'' = ''β'' > 2 is bell-shaped, with inflection points located to either side of the mode
*** 0 < var(''X'') < 1/20
*** −6/7 < excess kurtosis(''X'') < 0
** ''α'' = ''β'' → ∞ is a 1-point degenerate distribution with a Dirac delta function spike at the midpoint ''x'' = 1/2 with probability 1, and zero probability everywhere else. There is 100% probability (absolute certainty) concentrated at the single point ''x'' = 1/2.
*** \lim_{\alpha = \beta \to \infty} \operatorname{var}(X) = 0
*** \lim_{\alpha = \beta \to \infty} \text{excess kurtosis}(X) = 0
*** The differential entropy approaches a minimum value of −∞


Skewed (''α'' ≠ ''β'')

The density function is skewed. An interchange of parameter values yields the mirror image (the reverse) of the initial curve. Some more specific cases:
*''α'' < 1, ''β'' < 1
** U-shaped
** Positive skew for α < β, negative skew for α > β.
** bimodal: left mode = 0, right mode = 1, anti-mode = \tfrac{\alpha-1}{\alpha+\beta-2}
** 0 < median < 1.
** 0 < var(''X'') < 1/4
*α > 1, β > 1
** unimodal (magenta & cyan plots),
** Positive skew for α < β, negative skew for α > β.
** \text{mode}= \tfrac{\alpha-1}{\alpha+\beta-2}
** 0 < median < 1
** 0 < var(''X'') < 1/12
*α < 1, β ≥ 1
** reverse J-shaped with a right tail,
** positively skewed,
** strictly decreasing, convex
** mode = 0
** 0 < median < 1/2.
** 0 < \operatorname{var}(X) < \tfrac{-11+5\sqrt{5}}{2}, (maximum variance occurs for \alpha=\tfrac{\sqrt{5}-1}{2}, \beta=1, or α = Φ the golden ratio conjugate)
*α ≥ 1, β < 1
** J-shaped with a left tail,
** negatively skewed,
** strictly increasing, convex
** mode = 1
** 1/2 < median < 1
** 0 < \operatorname{var}(X) < \tfrac{-11+5\sqrt{5}}{2}, (maximum variance occurs for \alpha=1, \beta=\tfrac{\sqrt{5}-1}{2}, or β = Φ the golden ratio conjugate)
*α = 1, β > 1
** positively skewed,
** strictly decreasing (red plot),
** a reversed (mirror-image) power function [0, 1] distribution
** mean = 1 / (β + 1)
** median = 1 − 1/2^{1/β}
** mode = 0
**α = 1, 1 < β < 2
*** concave
*** 1-\tfrac{1}{\sqrt{2}}< \text{median} < \tfrac{1}{2}
*** 1/18 < var(''X'') < 1/12.
**α = 1, β = 2
*** a straight line with slope −2, the right-triangular distribution with right angle at the left end, at ''x'' = 0
*** \text{median}=1-\tfrac{1}{\sqrt{2}}
*** var(''X'') = 1/18
**α = 1, β > 2
*** reverse J-shaped with a right tail,
*** convex
*** 0 < \text{median} < 1-\tfrac{1}{\sqrt{2}}
*** 0 < var(''X'') < 1/18
*α > 1, β = 1
** negatively skewed,
** strictly increasing (green plot),
** the power function [0, 1] distribution
** mean = α / (α + 1)
** median = 1/2^{1/α}
** mode = 1
**2 > α > 1, β = 1
*** concave
*** \tfrac{1}{2} < \text{median} < \tfrac{1}{\sqrt{2}}
*** 1/18 < var(''X'') < 1/12
** α = 2, β = 1
*** a straight line with slope +2, the right-triangular distribution with right angle at the right end, at ''x'' = 1
*** \text{median}=\tfrac{1}{\sqrt{2}}
*** var(''X'') = 1/18
**α > 2, β = 1
*** J-shaped with a left tail, convex
***\tfrac{1}{\sqrt{2}} < \text{median} < 1
*** 0 < var(''X'') < 1/18


Related distributions


Transformations

* If ''X'' ~ Beta(''α'', ''β'') then 1 − ''X'' ~ Beta(''β'', ''α''), mirror-image symmetry
* If ''X'' ~ Beta(''α'', ''β'') then \tfrac{X}{1-X} \sim \beta'(\alpha,\beta), the beta prime distribution, also called "beta distribution of the second kind".
* If ''X'' ~ Beta(''α'', ''β'') then \tfrac{1}{X} -1 \sim \beta'(\beta,\alpha).
* If ''X'' ~ Beta(''n''/2, ''m''/2) then \tfrac{mX}{n(1-X)} \sim F(n,m) (assuming ''n'' > 0 and ''m'' > 0), the Fisher–Snedecor F distribution.
* If X \sim \operatorname{Beta}\left(1+\lambda\tfrac{m-\min}{\max-\min}, 1 + \lambda\tfrac{\max-m}{\max-\min}\right) then min + ''X''(max − min) ~ PERT(min, max, ''m'', ''λ''), where ''PERT'' denotes a PERT distribution used in PERT analysis, and ''m'' = most likely value.Herrerías-Velasco, José Manuel and Herrerías-Pleguezuelo, Rafael and René van Dorp, Johan. (2011). Revisiting the PERT mean and Variance. European Journal of Operational Research (210), p. 448–451. Traditionally ''λ'' = 4 in PERT analysis.
* If ''X'' ~ Beta(1, ''β'') then ''X'' ~ Kumaraswamy distribution with parameters (1, ''β'')
* If ''X'' ~ Beta(''α'', 1) then ''X'' ~ Kumaraswamy distribution with parameters (''α'', 1)
* If ''X'' ~ Beta(''α'', 1) then −ln(''X'') ~ Exponential(''α'')


Special and limiting cases

* Beta(1, 1) ~ U(0, 1), the standard uniform distribution.
* Beta(n, 1) ~ Maximum of ''n'' independent rvs. with U(0, 1), sometimes called a ''standard power function distribution'' with density ''n'' ''x''^{''n''−1} on that interval.
* Beta(1, n) ~ Minimum of ''n'' independent rvs. with U(0, 1)
* If ''X'' ~ Beta(3/2, 3/2) and ''r'' > 0 then 2''rX'' − ''r'' ~ Wigner semicircle distribution.
* Beta(1/2, 1/2) is equivalent to the arcsine distribution. This distribution is also the Jeffreys prior probability for the Bernoulli and binomial distributions. The arcsine probability density is a distribution that appears in several random-walk fundamental theorems. In a fair coin toss random walk, the probability for the time of the last visit to the origin is distributed as a (U-shaped) arcsine distribution. In a two-player fair-coin-toss game, a player is said to be in the lead if the random walk (that started at the origin) is above the origin. The most probable number of times that a given player will be in the lead, in a game of length 2''N'', is not ''N''. On the contrary, ''N'' is the least likely number of times that the player will be in the lead. The most likely number of times in the lead is 0 or 2''N'' (following the arcsine distribution).
* \lim_{n \to \infty} n \operatorname{Beta}(1,n) = \operatorname{Exponential}(1), the exponential distribution.
* \lim_{n \to \infty} n \operatorname{Beta}(k,n) = \operatorname{Gamma}(k,1), the gamma distribution.
* For large ''n'', \operatorname{Beta}(\alpha n,\beta n) \to \mathcal{N}\left(\frac{\alpha}{\alpha+\beta},\frac{\alpha\beta}{(\alpha+\beta)^3}\frac{1}{n}\right), the normal distribution. More precisely, if X_n \sim \operatorname{Beta}(\alpha n,\beta n) then \sqrt{n}\left(X_n -\tfrac{\alpha}{\alpha+\beta}\right) converges in distribution to a normal distribution with mean 0 and variance \tfrac{\alpha\beta}{(\alpha+\beta)^3} as ''n'' increases.


Derived from other distributions

* The ''k''th order statistic of a sample of size ''n'' from the uniform distribution is a beta random variable, U_{(k)} \sim \operatorname{Beta}(k, n+1-k).
* If ''X'' ~ Gamma(''α'', ''θ'') and ''Y'' ~ Gamma(''β'', ''θ'') are independent, then \tfrac{X}{X+Y} \sim \operatorname{Beta}(\alpha, \beta)\, (see the sketch after this list for a simulation of this construction).
* If X \sim \chi^2(\alpha)\, and Y \sim \chi^2(\beta)\, are independent, then \tfrac{X}{X+Y} \sim \operatorname{Beta}(\tfrac{\alpha}{2}, \tfrac{\beta}{2}).
* If ''X'' ~ U(0, 1) and ''α'' > 0 then X^{1/\alpha} \sim \operatorname{Beta}(\alpha, 1), the power function distribution.
* If X \sim\operatorname{Bin}(k;n;p), then the distribution of ''p'' given ''k'' observed successes in ''n'' trials is \operatorname{Beta}(\alpha, \beta) for discrete values of ''n'' and ''k'', where \alpha=k+1 and \beta=n-k+1.
* If ''X'' ~ Cauchy(0, 1) then \tfrac{1}{1+X^2} \sim \operatorname{Beta}\left(\tfrac12, \tfrac12\right)\,.
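The gamma-ratio construction in the list above is a practical way to generate beta variates; the sketch below is illustrative (the helper name beta_via_gammas is an assumption of this sketch, not a standard API).

 # X ~ Gamma(alpha, theta), Y ~ Gamma(beta, theta) independent  =>  X/(X+Y) ~ Beta(alpha, beta).
 import numpy as np
 from scipy import stats

 def beta_via_gammas(alpha, beta, size, rng):
     x = rng.gamma(alpha, 1.0, size)   # the common scale theta cancels in the ratio
     y = rng.gamma(beta, 1.0, size)
     return x / (x + y)

 samples = beta_via_gammas(2.0, 5.0, 100_000, np.random.default_rng(1))
 print(stats.kstest(samples, stats.beta(2.0, 5.0).cdf))   # KS distance should be small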


Combination with other distributions

* If ''X'' ~ Beta(''α'', ''β'') and ''Y'' ~ F(2''β'', 2''α'') then \Pr\left(X \leq \tfrac{\alpha}{\alpha+\beta x}\right) = \Pr(Y \geq x)\, for all ''x'' > 0.


Compounding with other distributions

* If ''p'' ~ Beta(''α'', ''β'') and ''X'' ~ Bin(''k'', ''p'') then ''X'' follows a beta-binomial distribution (a sampling sketch of this compounding appears after this list).
* If ''p'' ~ Beta(''α'', ''β'') and ''X'' ~ NB(''r'', ''p'') then ''X'' follows a beta negative binomial distribution.
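The first compounding can be reproduced directly by sampling, as in the following sketch (the parameter values are arbitrary; scipy.stats.betabinom supplies the reference pmf).

 # Draw p ~ Beta(alpha, beta), then X ~ Binomial(k, p); X follows the beta-binomial distribution.
 import numpy as np
 from scipy import stats

 rng = np.random.default_rng(2)
 alpha, beta, k = 2.0, 3.0, 10
 p = rng.beta(alpha, beta, size=200_000)
 x = rng.binomial(k, p)

 empirical = np.bincount(x, minlength=k + 1) / x.size
 exact = stats.betabinom(k, alpha, beta).pmf(np.arange(k + 1))
 print(np.abs(empirical - exact).max())   # should be close to zero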


Generalisations

* The generalization to multiple variables, i.e. a multivariate beta distribution, is called a Dirichlet distribution. Univariate marginals of the Dirichlet distribution have a beta distribution. The beta distribution is conjugate to the binomial and Bernoulli distributions in exactly the same way as the Dirichlet distribution is conjugate to the multinomial distribution and categorical distribution.
* The Pearson type I distribution is identical to the beta distribution (except for arbitrary shifting and re-scaling that can also be accomplished with the four-parameter parametrization of the beta distribution).
* The beta distribution is the special case of the noncentral beta distribution where \lambda = 0: \operatorname{Beta}(\alpha, \beta) = \operatorname{Beta}(\alpha,\beta,0).
* The generalized beta distribution is a five-parameter distribution family which has the beta distribution as a special case.
* The matrix variate beta distribution is a distribution for positive-definite matrices.


Statistical inference


Parameter estimation


Method of moments


Two unknown parameters

Two unknown parameters (\hat{\alpha}, \hat{\beta}, of a beta distribution supported in the [0, 1] interval) can be estimated, using the method of moments, with the first two moments (sample mean and sample variance) as follows. Let:
: \text{sample mean}=\bar{x} = \frac{1}{N}\sum_{i=1}^N X_i
be the sample mean estimate and
: \text{sample variance} =\bar{v} = \frac{1}{N-1}\sum_{i=1}^N (X_i - \bar{x})^2
be the sample variance estimate. The method-of-moments estimates of the parameters are
:\hat{\alpha} = \bar{x} \left(\frac{\bar{x}(1 - \bar{x})}{\bar{v}} - 1 \right), if \bar{v} <\bar{x}(1 - \bar{x}),
: \hat{\beta} = (1-\bar{x}) \left(\frac{\bar{x}(1 - \bar{x})}{\bar{v}} - 1 \right), if \bar{v}<\bar{x}(1 - \bar{x}).
When the distribution is required over a known interval other than [0, 1] with random variable ''X'', say [''a'', ''c''] with random variable ''Y'', then replace \bar{x} with \frac{\bar{y}-a}{c-a}, and \bar{v} with \frac{\bar{v}_Y}{(c-a)^2} in the above couple of equations for the shape parameters (see the "Alternative parametrizations, four parameters" section below), where:
: \text{sample mean}=\bar{y} = \frac{1}{N}\sum_{i=1}^N Y_i
: \text{sample variance} = \bar{v}_Y = \frac{1}{N-1}\sum_{i=1}^N (Y_i - \bar{y})^2
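A minimal sketch of the method-of-moments estimates above follows; the function name is illustrative, and the unbiased (1/(''N''−1)) sample variance matches the convention used in this section.

 # Method-of-moments estimates of (alpha, beta) for data supported on [0, 1].
 import numpy as np

 def beta_method_of_moments(x):
     mean = x.mean()
     var = x.var(ddof=1)                      # sample variance with 1/(N-1)
     if var >= mean * (1.0 - mean):
         raise ValueError("sample variance too large for a beta fit")
     common = mean * (1.0 - mean) / var - 1.0
     return mean * common, (1.0 - mean) * common

 rng = np.random.default_rng(3)
 print(beta_method_of_moments(rng.beta(2.0, 5.0, size=10_000)))   # roughly (2, 5)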


Four unknown parameters

All four parameters (\hat{\alpha}, \hat{\beta}, \hat{a}, \hat{c}, of a beta distribution supported in the [''a'', ''c''] interval, see the section "Alternative parametrizations, Four parameters") can be estimated, using the method of moments developed by Karl Pearson, by equating sample and population values of the first four central moments (mean, variance, skewness and excess kurtosis).
The excess kurtosis was expressed in terms of the square of the skewness and the sample size ν = α + β (see the previous section "Kurtosis") as follows:
:\text{excess kurtosis} =\frac{6}{3 + \nu}\left(\frac{2 + \nu}{4} (\text{skewness})^2 - 1\right)\text{ if }(\text{skewness})^2-2< \text{excess kurtosis}< \tfrac{3}{2} (\text{skewness})^2
One can use this equation to solve for the sample size ν = α + β in terms of the square of the skewness and the excess kurtosis as follows:
:\hat{\nu} = \hat{\alpha} + \hat{\beta} = 3\frac{(\text{sample excess kurtosis}) -(\text{sample skewness})^2+2}{\frac{3}{2} (\text{sample skewness})^2 - (\text{sample excess kurtosis})}
:\text{if }(\text{sample skewness})^2-2< \text{sample excess kurtosis}< \tfrac{3}{2} (\text{sample skewness})^2
This is the ratio (multiplied by a factor of 3) between the previously derived limit boundaries for the beta distribution in a space (as originally done by Karl Pearson) defined with coordinates of the square of the skewness in one axis and the excess kurtosis in the other axis (see the section titled "Kurtosis").
The case of zero skewness can be immediately solved because for zero skewness, α = β, and hence ν = 2α = 2β, therefore α = β = ν/2:
: \hat{\alpha} = \hat{\beta} = \frac{\hat{\nu}}{2}= \frac{\frac{3}{2}(\text{sample excess kurtosis}) +3}{-(\text{sample excess kurtosis})}
: \text{if sample skewness}= 0 \text{ and } -2<\text{sample excess kurtosis}<0
(Excess kurtosis is negative for the beta distribution with zero skewness, ranging from −2 to 0, so that \hat{\nu}, and therefore the sample shape parameters, is positive, ranging from zero when the shape parameters approach zero and the excess kurtosis approaches −2, to infinity when the shape parameters approach infinity and the excess kurtosis approaches zero.)
For non-zero sample skewness one needs to solve a system of two coupled equations. Since the skewness and the excess kurtosis are independent of the parameters \hat{a}, \hat{c}, the parameters \hat{\alpha}, \hat{\beta} can be uniquely determined from the sample skewness and the sample excess kurtosis, by solving the coupled equations with two known variables (sample skewness and sample excess kurtosis) and two unknowns (the shape parameters):
:(\text{sample skewness})^2 = \frac{4(\hat{\beta}-\hat{\alpha})^2 (1 + \hat{\alpha} + \hat{\beta})}{\hat{\alpha} \hat{\beta} (2 + \hat{\alpha} + \hat{\beta})^2}
:\text{sample excess kurtosis} =\frac{6}{3 + \hat{\alpha} + \hat{\beta}}\left(\frac{2 + \hat{\alpha} + \hat{\beta}}{4} (\text{sample skewness})^2 - 1\right)
:\text{if }(\text{sample skewness})^2-2< \text{sample excess kurtosis}< \tfrac{3}{2}(\text{sample skewness})^2
resulting in the following solution:
: \hat{\alpha}, \hat{\beta} = \frac{\hat{\nu}}{2} \left (1 \pm \frac{1}{ \sqrt{1+ \frac{16 (\hat{\nu} + 1)}{(\hat{\nu}+ 2)^2(\text{sample skewness})^2}}} \right )
: \text{if sample skewness}\neq 0 \text{ and } (\text{sample skewness})^2-2< \text{sample excess kurtosis}< \tfrac{3}{2} (\text{sample skewness})^2
where one should take the solutions as follows: \hat{\alpha}>\hat{\beta} for (negative) sample skewness < 0, and \hat{\alpha}<\hat{\beta} for (positive) sample skewness > 0.
The accompanying plot shows these two solutions as surfaces in a space with horizontal axes of (sample excess kurtosis) and (sample squared skewness) and the shape parameters as the vertical axis. The surfaces are constrained by the condition that the sample excess kurtosis must be bounded by the sample squared skewness as stipulated in the above equation. The two surfaces meet at the right edge defined by zero skewness. Along this right edge, both parameters are equal and the distribution is symmetric: U-shaped for α = β < 1, uniform for α = β = 1, upside-down-U-shaped for 1 < α = β < 2 and bell-shaped for α = β > 2. The surfaces also meet at the front (lower) edge defined by the "impossible boundary" line (excess kurtosis + 2 − skewness² = 0). Along this front (lower) boundary both shape parameters approach zero, and the probability density is concentrated more at one end than the other end (with practically nothing in between), with probabilities p=\tfrac{\beta}{\alpha+\beta} at the left end ''x'' = 0 and q = 1-p = \tfrac{\alpha}{\alpha+\beta} at the right end ''x'' = 1. The two surfaces become further apart towards the rear edge. At this rear edge the surface parameters are quite different from each other. As remarked, for example, by Bowman and Shenton, sampling in the neighborhood of the line (sample excess kurtosis − (3/2)(sample skewness)² = 0) (the just-J-shaped portion of the rear edge where blue meets beige) "is dangerously near to chaos", because at that line the denominator of the expression above for the estimate ν = α + β becomes zero and hence ν approaches infinity as that line is approached. Bowman and Shenton write that "the higher moment parameters (kurtosis and skewness) are extremely fragile (near that line). However, the mean and standard deviation are fairly reliable." Therefore, the problem arises for the case of four-parameter estimation for very skewed distributions such that the excess kurtosis approaches (3/2) times the square of the skewness. This boundary line is produced by extremely skewed distributions with very large values of one of the parameters and very small values of the other parameter. See the section titled "Kurtosis" for a numerical example and further comments about this rear-edge boundary line (sample excess kurtosis − (3/2)(sample skewness)² = 0). As remarked by Karl Pearson himself, this issue may not be of much practical importance, as this trouble arises only for very skewed J-shaped (or mirror-image J-shaped) distributions with very different values of the shape parameters that are unlikely to occur much in practice. The usual skewed-bell-shape distributions that occur in practice do not have this parameter estimation problem.
The remaining two parameters \hat{a}, \hat{c} can be determined using the sample mean and the sample variance using a variety of equations. One alternative is to calculate the support interval range (\hat{c}-\hat{a}) based on the sample variance and the sample kurtosis. For this purpose one can solve, in terms of the range (\hat{c}- \hat{a}), the equation expressing the excess kurtosis in terms of the sample variance and the sample size ν (see the sections titled "Kurtosis" and "Alternative parametrizations, four parameters"):
:\text{sample excess kurtosis} =\frac{6}{(2 + \hat{\nu})(3 + \hat{\nu})}\bigg(\frac{(\hat{c}- \hat{a})^2}{\text{(sample variance)}} - 6 - 5 \hat{\nu} \bigg)
to obtain:
: (\hat{c}- \hat{a}) = \sqrt{\text{(sample variance)}}\sqrt{6+5\hat{\nu}+\frac{(2+\hat{\nu})(3+\hat{\nu})}{6}\text{(sample excess kurtosis)}}
Another alternative is to calculate the support interval range (\hat{c}-\hat{a}) based on the sample variance and the sample skewness. For this purpose one can solve, in terms of the range (\hat{c}-\hat{a}), the equation expressing the squared skewness in terms of the sample variance and the sample size ν (see the sections titled "Skewness" and "Alternative parametrizations, four parameters"):
:(\text{sample skewness})^2 = \frac{4}{(\hat{\nu}+2)^2}\bigg(\frac{(\hat{c}- \hat{a})^2}{\text{(sample variance)}}-4(1+\hat{\nu})\bigg)
to obtain:
: (\hat{c}- \hat{a}) = \frac{\sqrt{\text{(sample variance)}}}{2}\sqrt{(2+\hat{\nu})^2(\text{sample skewness})^2+16(1+\hat{\nu})}
The remaining parameter can be determined from the sample mean and the previously obtained parameters (\hat{c}-\hat{a}), \hat{\alpha}, \hat{\nu} = \hat{\alpha}+\hat{\beta}:
:  \hat{a} = (\text{sample mean}) -  \left(\frac{\hat{\alpha}}{\hat{\nu}}\right)(\hat{c}-\hat{a})
and finally, \hat{c}= (\hat{c}- \hat{a}) + \hat{a}.
In the above formulas one may take, for example, as estimates of the sample moments:
:\begin{align}
\text{sample mean} &=\overline{y} = \frac{1}{N}\sum_{i=1}^N Y_i \\
\text{sample variance} &= \overline{v}_Y = \frac{1}{N}\sum_{i=1}^N (Y_i - \overline{y})^2 \\
\text{sample skewness} &= G_1 = \frac{\sqrt{N(N-1)}}{N-2}\, \frac{\frac{1}{N} \sum_{i=1}^N (Y_i-\overline{y})^3}{\left(\frac{1}{N} \sum_{i=1}^N (Y_i-\overline{y})^2\right)^{3/2}} \\
\text{sample excess kurtosis} &= G_2 = \frac{(N+1)(N-1)}{(N-2)(N-3)}\, \frac{\frac{1}{N} \sum_{i=1}^N (Y_i-\overline{y})^4}{\left(\frac{1}{N} \sum_{i=1}^N (Y_i-\overline{y})^2\right)^2} - \frac{3(N-1)^2}{(N-2)(N-3)}
\end{align}
The estimators ''G''1 for sample skewness and ''G''2 for sample kurtosis are used by DAP/SAS, PSPP/SPSS, and Excel. However, they are not used by BMDP and (according to Joanes and Gill) they were not used by MINITAB in 1998. Actually, Joanes and Gill in their 1998 study concluded that the skewness and kurtosis estimators used in BMDP and in MINITAB (at that time) had smaller variance and mean-squared error in normal samples, but the skewness and kurtosis estimators used in DAP/SAS and PSPP/SPSS, namely ''G''1 and ''G''2, had smaller mean-squared error in samples from a very skewed distribution. It is for this reason that we have spelled out "sample skewness", etc., in the above formulas, to make it explicit that the user should choose the best estimator according to the problem at hand, as the best estimator for skewness and kurtosis depends on the amount of skewness (as shown by Joanes and Gill).
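The first stage of Pearson's four-parameter fit (recovering ν = α + β and then α, β from the sample skewness and excess kurtosis) can be sketched as follows. The helper name is illustrative, and SciPy's bias-corrected skew and kurtosis stand in for the ''G''1 and ''G''2 estimators discussed above.

 # Recover nu = alpha + beta, then alpha and beta, from sample skewness and excess kurtosis.
 import numpy as np
 from scipy import stats

 def shapes_from_skew_kurt(g1, g2):
     if not (g1**2 - 2 < g2 < 1.5 * g1**2):
         raise ValueError("moments incompatible with a beta distribution")
     nu = 3.0 * (g2 - g1**2 + 2.0) / (1.5 * g1**2 - g2)
     if g1 == 0:
         return nu / 2.0, nu / 2.0
     root = 1.0 / np.sqrt(1.0 + 16.0 * (nu + 1.0) / ((nu + 2.0) ** 2 * g1**2))
     lo, hi = nu / 2.0 * (1.0 - root), nu / 2.0 * (1.0 + root)
     return (hi, lo) if g1 < 0 else (lo, hi)    # alpha > beta for negative skewness

 y = 1.0 + 3.0 * np.random.default_rng(4).beta(2.0, 6.0, size=200_000)   # Beta(2,6) on [1, 4]
 g1 = stats.skew(y, bias=False)
 g2 = stats.kurtosis(y, fisher=True, bias=False)
 print(shapes_from_skew_kurt(g1, g2))   # roughly (2, 6)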


Maximum likelihood


Two unknown parameters

As is also the case for maximum likelihood estimates for the gamma distribution, the maximum likelihood estimates for the beta distribution do not have a general closed form solution for arbitrary values of the shape parameters. If ''X''1, ..., ''XN'' are independent random variables each having a beta distribution, the joint log likelihood function for ''N'' iid observations is:
:\begin{align}
\ln\, \mathcal{L} (\alpha, \beta\mid X) &= \sum_{i=1}^N \ln \left (\mathcal{L}_i (\alpha, \beta\mid X_i) \right )\\
&= \sum_{i=1}^N \ln \left (f(X_i;\alpha,\beta) \right ) \\
&= \sum_{i=1}^N \ln \left (\frac{X_i^{\alpha-1}(1-X_i)^{\beta-1}}{\Beta(\alpha,\beta)} \right ) \\
&= (\alpha - 1)\sum_{i=1}^N \ln (X_i) + (\beta- 1)\sum_{i=1}^N  \ln (1-X_i) - N \ln \Beta(\alpha,\beta)
\end{align}
Finding the maximum with respect to a shape parameter involves taking the partial derivative with respect to the shape parameter and setting the expression equal to zero, yielding the maximum likelihood estimator of the shape parameters:
:\frac{\partial \ln \mathcal{L}(\alpha,\beta\mid X)}{\partial \alpha} = \sum_{i=1}^N \ln X_i -N\frac{\partial \ln \Beta(\alpha,\beta)}{\partial \alpha}=0
:\frac{\partial \ln \mathcal{L}(\alpha,\beta\mid X)}{\partial \beta} = \sum_{i=1}^N  \ln (1-X_i)- N\frac{\partial \ln \Beta(\alpha,\beta)}{\partial \beta}=0
where:
:\frac{\partial \ln \Beta(\alpha,\beta)}{\partial \alpha} = -\frac{\partial \ln \Gamma(\alpha+\beta)}{\partial \alpha}+ \frac{\partial \ln \Gamma(\alpha)}{\partial \alpha}+ \frac{\partial \ln \Gamma(\beta)}{\partial \alpha}=-\psi(\alpha + \beta) + \psi(\alpha) + 0
:\frac{\partial \ln \Beta(\alpha,\beta)}{\partial \beta}= - \frac{\partial \ln \Gamma(\alpha+\beta)}{\partial \beta}+ \frac{\partial \ln \Gamma(\alpha)}{\partial \beta} + \frac{\partial \ln \Gamma(\beta)}{\partial \beta}=-\psi(\alpha + \beta) + 0 + \psi(\beta)
since the digamma function, denoted ψ(α), is defined as the logarithmic derivative of the gamma function:
:\psi(\alpha) =\frac{\partial \ln\Gamma(\alpha)}{\partial\alpha}
To ensure that the values with zero tangent slope are indeed a maximum (instead of a saddle-point or a minimum) one has to also satisfy the condition that the curvature is negative. This amounts to satisfying that the second partial derivative with respect to the shape parameters is negative:
:\frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{\partial \alpha^2}= -N\frac{\partial^2\ln \Beta(\alpha,\beta)}{\partial \alpha^2}<0
:\frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{\partial \beta^2} = -N\frac{\partial^2\ln \Beta(\alpha,\beta)}{\partial \beta^2}<0
Using the previous equations, this is equivalent to:
:\frac{\partial^2\ln \Beta(\alpha,\beta)}{\partial \alpha^2} = \psi_1(\alpha)-\psi_1(\alpha + \beta) > 0
:\frac{\partial^2\ln \Beta(\alpha,\beta)}{\partial \beta^2} = \psi_1(\beta) -\psi_1(\alpha + \beta) > 0
where the trigamma function, denoted ''ψ''1(''α''), is the second of the polygamma functions, and is defined as the derivative of the digamma function:
:\psi_1(\alpha) = \frac{\partial^2\ln\Gamma(\alpha)}{\partial\alpha^2}=\, \frac{\partial\,\psi(\alpha)}{\partial\alpha}.
These conditions are equivalent to stating that the variances of the logarithmically transformed variables are positive, since:
:\operatorname{var}[\ln (X)] = \operatorname{E}[\ln^2 (X)] - (\operatorname{E}[\ln (X)])^2 = \psi_1(\alpha) - \psi_1(\alpha + \beta)
:\operatorname{var}[\ln (1-X)] = \operatorname{E}[\ln^2 (1-X)] - (\operatorname{E}[\ln (1-X)])^2 = \psi_1(\beta) - \psi_1(\alpha + \beta)
Therefore, the condition of negative curvature at a maximum is equivalent to the statements:
:   \operatorname{var}[\ln (X)] > 0
:   \operatorname{var}[\ln (1-X)]> 0
Alternatively, the condition of negative curvature at a maximum is also equivalent to stating that the following logarithmic derivatives of the geometric means ''GX'' and ''G(1−X)'' are positive, since:
: \psi_1(\alpha) - \psi_1(\alpha + \beta) = \frac{\partial \ln G_X}{\partial \alpha} > 0
: \psi_1(\beta)  - \psi_1(\alpha + \beta) = \frac{\partial \ln G_{(1-X)}}{\partial \beta} > 0
While these slopes are indeed positive, the other slopes are negative:
:\frac{\partial\, \ln G_X}{\partial \beta}, \frac{\partial \ln G_{(1-X)}}{\partial \alpha} < 0.
The slopes of the mean and the median with respect to ''α'' and ''β'' display similar sign behavior.
From the condition that at a maximum the partial derivative with respect to the shape parameter equals zero, we obtain the following system of coupled maximum likelihood estimate equations (for the average log-likelihoods) that needs to be inverted to obtain the (unknown) shape parameter estimates \hat{\alpha},\hat{\beta} in terms of the (known) average of logarithms of the samples ''X''1, ..., ''XN'':
:\begin{align}
\hat{\operatorname{E}}[\ln (X)] &= \psi(\hat{\alpha}) - \psi(\hat{\alpha} + \hat{\beta})=\frac{1}{N}\sum_{i=1}^N \ln X_i =  \ln \hat{G}_X \\
\hat{\operatorname{E}}[\ln(1-X)] &= \psi(\hat{\beta}) - \psi(\hat{\alpha} + \hat{\beta})=\frac{1}{N}\sum_{i=1}^N \ln (1-X_i)= \ln \hat{G}_{(1-X)}
\end{align}
where we recognize \ln \hat{G}_X as the logarithm of the sample geometric mean and \ln \hat{G}_{(1-X)} as the logarithm of the sample geometric mean based on (1 − ''X''), the mirror-image of ''X''. For \hat{\alpha}=\hat{\beta}, it follows that \hat{G}_X=\hat{G}_{(1-X)}.
:\begin{align}
\hat{G}_X &= \prod_{i=1}^N (X_i)^{1/N} \\
\hat{G}_{(1-X)} &= \prod_{i=1}^N (1-X_i)^{1/N}
\end{align}
These coupled equations containing digamma functions of the shape parameter estimates \hat{\alpha},\hat{\beta} must be solved by numerical methods, as done, for example, by Beckman et al. Gnanadesikan et al. give numerical solutions for a few cases. N. L. Johnson and S. Kotz suggest that for "not too small" shape parameter estimates \hat{\alpha},\hat{\beta}, the logarithmic approximation to the digamma function \psi(\hat{\alpha}) \approx \ln(\hat{\alpha}-\tfrac{1}{2}) may be used to obtain initial values for an iterative solution, since the equations resulting from this approximation can be solved exactly:
:\ln \frac{\hat{\alpha} - \frac{1}{2}}{\hat{\alpha}+\hat{\beta}-\frac{1}{2}} \approx  \ln \hat{G}_X
:\ln \frac{\hat{\beta}-\frac{1}{2}}{\hat{\alpha}+\hat{\beta}-\frac{1}{2}}\approx \ln \hat{G}_{(1-X)}
which leads to the following solution for the initial values (of the estimated shape parameters in terms of the sample geometric means) for an iterative solution:
:\hat{\alpha}\approx \tfrac{1}{2} + \frac{\hat{G}_X}{2(1-\hat{G}_X-\hat{G}_{(1-X)})} \text{ if } \hat{\alpha} >1
:\hat{\beta}\approx \tfrac{1}{2} + \frac{\hat{G}_{(1-X)}}{2(1-\hat{G}_X-\hat{G}_{(1-X)})} \text{ if } \hat{\beta} > 1
Alternatively, the estimates provided by the method of moments can instead be used as initial values for an iterative solution of the maximum likelihood coupled equations in terms of the digamma functions (a numerical sketch of this appears at the end of this subsection).
When the distribution is required over a known interval other than [0, 1] with random variable ''X'', say [''a'', ''c''] with random variable ''Y'', then replace ln(''Xi'') in the first equation with
:\ln \frac{Y_i-a}{c-a},
and replace ln(1−''Xi'') in the second equation with
:\ln \frac{c-Y_i}{c-a}
(see the "Alternative parametrizations, four parameters" section below).
If one of the shape parameters is known, the problem is considerably simplified. The following logit transformation can be used to solve for the unknown shape parameter (for skewed cases such that \hat{\alpha}\neq\hat{\beta}; otherwise, if symmetric, both equal parameters are known when one is known):
:\hat{\operatorname{E}} \left[\ln \left(\frac{X}{1-X} \right) \right]=\psi(\hat{\alpha}) - \psi(\hat{\beta})=\frac{1}{N}\sum_{i=1}^N \ln\frac{X_i}{1-X_i} =  \ln \hat{G}_X - \ln \hat{G}_{(1-X)}
This logit transformation is the logarithm of the transformation that divides the variable ''X'' by its mirror-image (''X''/(1 − ''X'')), resulting in the "inverted beta distribution" or beta prime distribution (also known as beta distribution of the second kind or Pearson's Type VI) with support [0, +∞). As previously discussed in the section "Moments of logarithmically transformed random variables," the logit transformation \ln\frac{X}{1-X}, studied by Johnson, extends the finite support [0, 1] based on the original variable ''X'' to infinite support in both directions of the real line (−∞, +∞).
If, for example, \hat{\beta} is known, the unknown parameter \hat{\alpha} can be obtained in terms of the inverse digamma function of the right hand side of this equation:
:\psi(\hat{\alpha})=\frac{1}{N}\sum_{i=1}^N \ln\frac{X_i}{1-X_i} + \psi(\hat{\beta})
:\hat{\alpha}=\psi^{-1}(\ln \hat{G}_X - \ln \hat{G}_{(1-X)} + \psi(\hat{\beta}))
In particular, if one of the shape parameters has a value of unity, for example for \hat{\beta} = 1 (the power function distribution with bounded support [0,1]), using the identity ψ(''x'' + 1) = ψ(''x'') + 1/''x'' in the equation \psi(\hat{\alpha}) - \psi(\hat{\alpha} + \hat{\beta})= \ln \hat{G}_X, the maximum likelihood estimator for the unknown parameter \hat{\alpha} is, exactly:
:\hat{\alpha}= - \frac{N}{\sum_{i=1}^N \ln X_i}= - \frac{1}{\ln \hat{G}_X}
The beta has support [0, 1], therefore \hat{G}_X < 1, and hence (-\ln \hat{G}_X) >0, and therefore \hat{\alpha} >0.
In conclusion, the maximum likelihood estimates of the shape parameters of a beta distribution are (in general) a complicated function of the sample geometric mean, and of the sample geometric mean based on ''(1−X)'', the mirror-image of ''X''. One may ask, if the variance (in addition to the mean) is necessary to estimate two shape parameters with the method of moments, why is the (logarithmic or geometric) variance not necessary to estimate two shape parameters with the maximum likelihood method, for which only the geometric means suffice? The answer is that the mean does not provide as much information as the geometric mean. For a beta distribution with equal shape parameters ''α'' = ''β'', the mean is exactly 1/2, regardless of the value of the shape parameters, and therefore regardless of the value of the statistical dispersion (the variance). On the other hand, the geometric mean of a beta distribution with equal shape parameters ''α'' = ''β'' depends on the value of the shape parameters, and therefore it contains more information. Also, the geometric mean of a beta distribution does not satisfy the symmetry conditions satisfied by the mean; therefore, by employing both the geometric mean based on ''X'' and the geometric mean based on (1 − ''X''), the maximum likelihood method is able to provide best estimates for both parameters ''α'' = ''β'', without need of employing the variance.
One can express the joint log likelihood per ''N'' iid observations in terms of the ''sufficient statistics'' (the sample geometric means) as follows:
:\frac{\ln\, \mathcal{L} (\alpha, \beta\mid X)}{N} = (\alpha - 1)\ln \hat{G}_X + (\beta- 1)\ln \hat{G}_{(1-X)}- \ln \Beta(\alpha,\beta).
We can plot the joint log likelihood per ''N'' observations for fixed values of the sample geometric means to see the behavior of the likelihood function as a function of the shape parameters α and β. In such a plot, the shape parameter estimators \hat{\alpha},\hat{\beta} correspond to the maxima of the likelihood function. See the accompanying graph that shows that all the likelihood functions intersect at α = β = 1, which corresponds to the values of the shape parameters that give the maximum entropy (the maximum entropy occurs for shape parameters equal to unity: the uniform distribution). It is evident from the plot that the likelihood function gives sharp peaks for values of the shape parameter estimators close to zero, but that for values of the shape parameter estimators greater than one, the likelihood function becomes quite flat, with less defined peaks. Obviously, the maximum likelihood parameter estimation method for the beta distribution becomes less acceptable for larger values of the shape parameter estimators, as the uncertainty in the peak definition increases with the value of the shape parameter estimators. One can arrive at the same conclusion by noticing that the expression for the curvature of the likelihood function is in terms of the geometric variances:
:\frac{1}{N}\frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{\partial \alpha^2}= -\operatorname{var}[\ln X]
:\frac{1}{N}\frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{\partial \beta^2} = -\operatorname{var}[\ln (1-X)]
These variances (and therefore the curvatures) are much larger for small values of the shape parameters α and β. However, for shape parameter values α, β > 1, the variances (and therefore the curvatures) flatten out. Equivalently, this result follows from the Cramér–Rao bound, since the Fisher information matrix components for the beta distribution are these logarithmic variances. The Cramér–Rao bound states that the variance of any ''unbiased'' estimator \hat{\alpha} of α is bounded by the reciprocal of the Fisher information:
:\operatorname{var}(\hat{\alpha})\geq\frac{1}{\mathcal{I}(\alpha)}\geq\frac{1}{\psi_1(\hat{\alpha}) - \psi_1(\hat{\alpha} + \hat{\beta})}
:\operatorname{var}(\hat{\beta}) \geq\frac{1}{\mathcal{I}(\beta)}\geq\frac{1}{\psi_1(\hat{\beta}) - \psi_1(\hat{\alpha} + \hat{\beta})}
so the variance of the estimators increases with increasing α and β, as the logarithmic variances decrease.
Also one can express the joint log likelihood per ''N'' iid observations in terms of the digamma function expressions for the logarithms of the sample geometric means as follows:
:\frac{\ln\, \mathcal{L} (\alpha, \beta\mid X)}{N} = (\alpha - 1)(\psi(\hat{\alpha}) - \psi(\hat{\alpha} + \hat{\beta}))+(\beta- 1)(\psi(\hat{\beta}) - \psi(\hat{\alpha} + \hat{\beta}))- \ln \Beta(\alpha,\beta)
This expression is identical to the negative of the cross-entropy (see the section on "Quantities of information (entropy)"). Therefore, finding the maximum of the joint log likelihood of the shape parameters, per ''N'' iid observations, is identical to finding the minimum of the cross-entropy for the beta distribution, as a function of the shape parameters:
:\frac{\ln\, \mathcal{L} (\alpha, \beta\mid X)}{N} = - H = -h - D_{KL} = -\ln\Beta(\alpha,\beta)+(\alpha-1)\psi(\hat{\alpha})+(\beta-1)\psi(\hat{\beta})-(\alpha+\beta-2)\psi(\hat{\alpha}+\hat{\beta})
with the cross-entropy defined as follows:
:H = \int_{0}^1 - f(X;\hat{\alpha},\hat{\beta}) \ln (f(X;\alpha,\beta)) \, {\rm d}X
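The coupled digamma equations above can be solved numerically in a few lines. The sketch below is illustrative (the function name is an assumption of this sketch); it uses SciPy's digamma and a generic root finder, seeded with the method-of-moments estimates as suggested in the text.

 # Maximum likelihood for (alpha, beta): solve psi(a) - psi(a+b) = ln G_X and
 # psi(b) - psi(a+b) = ln G_(1-X), seeded with the method-of-moments estimates.
 import numpy as np
 from scipy.special import psi          # digamma function
 from scipy.optimize import fsolve

 def beta_mle(x):
     ln_gx = np.mean(np.log(x))         # log of the sample geometric mean of X
     ln_g1x = np.mean(np.log1p(-x))     # log of the sample geometric mean of 1 - X
     mean, var = x.mean(), x.var(ddof=1)
     common = mean * (1.0 - mean) / var - 1.0
     start = np.array([mean * common, (1.0 - mean) * common])

     def equations(ab):
         a, b = ab
         return [psi(a) - psi(a + b) - ln_gx,
                 psi(b) - psi(a + b) - ln_g1x]

     return tuple(fsolve(equations, start))

 rng = np.random.default_rng(5)
 print(beta_mle(rng.beta(2.0, 5.0, size=10_000)))   # roughly (2, 5)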


Four unknown parameters

The procedure is similar to the one followed in the two unknown parameter case. If ''Y''1, ..., ''YN'' are independent random variables each having a beta distribution with four parameters, the joint log likelihood function for ''N'' iid observations is:
:\begin{align}
\ln\, \mathcal{L} (\alpha, \beta, a, c\mid Y) &= \sum_{i=1}^N \ln\,\mathcal{L}_i (\alpha, \beta, a, c\mid Y_i)\\
&= \sum_{i=1}^N \ln\,f(Y_i; \alpha, \beta, a, c) \\
&= \sum_{i=1}^N \ln\,\frac{(Y_i-a)^{\alpha-1}(c-Y_i)^{\beta-1}}{(c-a)^{\alpha+\beta-1}\Beta(\alpha,\beta)}\\
&= (\alpha - 1)\sum_{i=1}^N  \ln (Y_i - a) + (\beta- 1)\sum_{i=1}^N  \ln (c - Y_i)- N \ln \Beta(\alpha,\beta) - N (\alpha+\beta - 1) \ln (c - a)
\end{align}
Finding the maximum with respect to a shape parameter involves taking the partial derivative with respect to the shape parameter and setting the expression equal to zero, yielding the maximum likelihood estimator of the shape parameters:
:\frac{\partial \ln \mathcal{L}(\alpha, \beta, a, c\mid Y)}{\partial \alpha}= \sum_{i=1}^N  \ln (Y_i - a) - N(-\psi(\alpha + \beta) + \psi(\alpha))- N \ln (c - a)= 0
:\frac{\partial \ln \mathcal{L}(\alpha, \beta, a, c\mid Y)}{\partial \beta} = \sum_{i=1}^N  \ln (c - Y_i) - N(-\psi(\alpha + \beta)  + \psi(\beta))- N \ln (c - a)= 0
:\frac{\partial \ln \mathcal{L}(\alpha, \beta, a, c\mid Y)}{\partial a} = -(\alpha - 1) \sum_{i=1}^N  \frac{1}{Y_i - a} \,+ N (\alpha+\beta - 1)\frac{1}{c - a}= 0
:\frac{\partial \ln \mathcal{L}(\alpha, \beta, a, c\mid Y)}{\partial c} = (\beta- 1) \sum_{i=1}^N  \frac{1}{c - Y_i} \,- N (\alpha+\beta - 1) \frac{1}{c - a} = 0
These equations can be re-arranged as the following system of four coupled equations (the first two equations are geometric means and the second two equations are harmonic means) in terms of the maximum likelihood estimates for the four parameters \hat{\alpha}, \hat{\beta}, \hat{a}, \hat{c}:
:\frac{1}{N}\sum_{i=1}^N  \ln \frac{Y_i - \hat{a}}{\hat{c}-\hat{a}} = \psi(\hat{\alpha})-\psi(\hat{\alpha} +\hat{\beta} )=  \ln \hat{G}_X
:\frac{1}{N}\sum_{i=1}^N  \ln \frac{\hat{c} - Y_i}{\hat{c}-\hat{a}} =  \psi(\hat{\beta})-\psi(\hat{\alpha} + \hat{\beta})=  \ln \hat{G}_{(1-X)}
:\frac{1}{\frac{1}{N}\sum_{i=1}^N \frac{\hat{c}-\hat{a}}{Y_i - \hat{a}}} = \frac{\hat{\alpha}-1}{\hat{\alpha}+\hat{\beta}-1}=  \hat{H}_X
:\frac{1}{\frac{1}{N}\sum_{i=1}^N \frac{\hat{c}-\hat{a}}{\hat{c} - Y_i}} = \frac{\hat{\beta}-1}{\hat{\alpha}+\hat{\beta}-1} =  \hat{H}_{(1-X)}
with sample geometric means:
:\hat{G}_X = \prod_{i=1}^{N} \left (\frac{Y_i-\hat{a}}{\hat{c}-\hat{a}} \right )^{1/N}
:\hat{G}_{(1-X)} = \prod_{i=1}^{N} \left (\frac{\hat{c}-Y_i}{\hat{c}-\hat{a}} \right )^{1/N}
The parameters \hat{a}, \hat{c} are embedded inside the geometric mean expressions in a nonlinear way (to the power 1/''N''). This precludes, in general, a closed form solution, even for an initial value approximation for iteration purposes. One alternative is to use as initial values for iteration the values obtained from the method of moments solution for the four parameter case. Furthermore, the expressions for the harmonic means are well-defined only for \hat{\alpha}, \hat{\beta} > 1, which precludes a maximum likelihood solution for shape parameters less than unity in the four-parameter case. Fisher's information matrix for the four parameter case is positive-definite only for α, β > 2 (for further discussion, see the section on the Fisher information matrix, four parameter case), that is, for bell-shaped (symmetric or unsymmetric) beta distributions with inflection points located to either side of the mode. The following Fisher information components (that represent the expectations of the curvature of the log likelihood function) become infinite at the following parameter values:
:\alpha = 2: \quad \operatorname{E} \left [- \frac{1}{N} \frac{\partial^2 \ln \mathcal{L}}{\partial a^2} \right ]= \mathcal{I}_{a a}
:\beta = 2: \quad \operatorname{E}\left [- \frac{1}{N} \frac{\partial^2 \ln \mathcal{L}}{\partial c^2} \right ] = \mathcal{I}_{c c}
:\alpha = 1: \quad \operatorname{E}\left [- \frac{1}{N}\frac{\partial^2 \ln \mathcal{L}}{\partial \alpha\, \partial a}\right ] = \mathcal{I}_{\alpha a}
:\beta = 1: \quad \operatorname{E}\left [- \frac{1}{N}\frac{\partial^2 \ln \mathcal{L}}{\partial \beta\, \partial c} \right ] = \mathcal{I}_{\beta c}
(for further discussion see the section on the Fisher information matrix). Thus, it is not possible to strictly carry on the maximum likelihood estimation for some well known distributions belonging to the four-parameter beta distribution family, like the continuous uniform distribution (Beta(1, 1, ''a'', ''c'')) and the arcsine distribution (Beta(1/2, 1/2, ''a'', ''c'')). N. L. Johnson and S. Kotz ignore the equations for the harmonic means and instead suggest "If a and c are unknown, and maximum likelihood estimators of ''a'', ''c'', α and β are required, the above procedure (for the two unknown parameter case, with ''X'' transformed as ''X'' = (''Y'' − ''a'')/(''c'' − ''a'')) can be repeated using a succession of trial values of ''a'' and ''c'', until the pair (''a'', ''c'') for which maximum likelihood (given ''a'' and ''c'') is as great as possible, is attained" (where, for the purpose of clarity, their notation for the parameters has been translated into the present notation).
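The Johnson and Kotz suggestion quoted above can be prototyped by profiling the likelihood over trial values of (''a'', ''c'') and running a two-parameter fit on the rescaled data at each grid point. The grid bounds and grid size below are arbitrary assumptions of this sketch; scipy.stats.beta.fit with floc/fscale performs the inner two-parameter maximum likelihood fit.

 # Profile the four-parameter likelihood over trial (a, c) pairs, as suggested by Johnson & Kotz.
 import numpy as np
 from scipy import stats

 def four_parameter_profile_fit(y, n_grid=15):
     span = y.max() - y.min()
     best = None
     for a in np.linspace(y.min() - 0.5 * span, y.min() - 1e-6 * span, n_grid):
         for c in np.linspace(y.max() + 1e-6 * span, y.max() + 0.5 * span, n_grid):
             a_hat, b_hat, _, _ = stats.beta.fit(y, floc=a, fscale=c - a)
             loglik = stats.beta.logpdf(y, a_hat, b_hat, loc=a, scale=c - a).sum()
             if best is None or loglik > best[0]:
                 best = (loglik, a_hat, b_hat, a, c)
     return best   # (log likelihood, alpha, beta, a, c)

 y = 1.0 + 3.0 * np.random.default_rng(6).beta(3.0, 4.0, size=5_000)   # Beta(3,4) on [1, 4]
 print(four_parameter_profile_fit(y))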


Fisher information matrix

Let a random variable ''X'' have a probability density ''f''(''x''; ''α''). The partial derivative with respect to the (unknown, and to be estimated) parameter α of the log likelihood function is called the score. The second moment of the score is called the Fisher information:
:\mathcal{I}(\alpha)=\operatorname{E} \left [\left (\frac{\partial}{\partial\alpha} \ln \mathcal{L}(\alpha\mid X) \right )^2 \right].
The expectation of the score is zero, therefore the Fisher information is also the second moment centered on the mean of the score: the variance of the score.
If the log likelihood function is twice differentiable with respect to the parameter α, and under certain regularity conditions, then the Fisher information may also be written as follows (which is often a more convenient form for calculation purposes):
:\mathcal{I}(\alpha) = - \operatorname{E} \left [\frac{\partial^2}{\partial\alpha^2} \ln (\mathcal{L}(\alpha\mid X)) \right].
Thus, the Fisher information is the negative of the expectation of the second derivative with respect to the parameter α of the log likelihood function. Therefore, Fisher information is a measure of the curvature of the log likelihood function of α. A low curvature (and therefore high radius of curvature), flatter log likelihood function curve has low Fisher information; while a log likelihood function curve with large curvature (and therefore low radius of curvature) has high Fisher information. When the Fisher information matrix is computed at the evaluates of the parameters ("the observed Fisher information matrix") it is equivalent to the replacement of the true log likelihood surface by a Taylor's series approximation, taken as far as the quadratic terms. The word information, in the context of Fisher information, refers to information about the parameters, such as estimation, sufficiency and properties of variances of estimators. The Cramér–Rao bound states that the inverse of the Fisher information is a lower bound on the variance of any unbiased estimator of a parameter α:
:\operatorname{var}[\hat\alpha] \geq \frac{1}{\mathcal{I}(\alpha)}.
The precision to which one can estimate the estimator of a parameter α is limited by the Fisher information of the log likelihood function. The Fisher information is a measure of the minimum error involved in estimating a parameter of a distribution and it can be viewed as a measure of the resolving power of an experiment needed to discriminate between two alternative hypotheses of a parameter.
When there are ''N'' parameters
: \begin{bmatrix} \theta_1 \\ \theta_2 \\ \dots \\ \theta_N \end{bmatrix},
then the Fisher information takes the form of an ''N''×''N'' positive semidefinite symmetric matrix, the Fisher information matrix, with typical element:
:{(\mathcal{I}(\theta))}_{i,j}=\operatorname{E} \left [\left (\frac{\partial}{\partial\theta_i} \ln \mathcal{L} \right) \left(\frac{\partial}{\partial\theta_j} \ln \mathcal{L} \right) \right ].
Under certain regularity conditions, the Fisher information matrix may also be written in the following form, which is often more convenient for computation:
: {(\mathcal{I}(\theta))}_{i,j} = - \operatorname{E} \left [\frac{\partial^2}{\partial\theta_i \, \partial\theta_j} \ln (\mathcal{L}) \right ]\,.
With ''X''1, ..., ''XN'' iid random variables, an ''N''-dimensional "box" can be constructed with sides ''X''1, ..., ''XN''. Costa and Cover show that the (Shannon) differential entropy ''h''(''X'') is related to the volume of the typical set (having the sample entropy close to the true entropy), while the Fisher information is related to the surface of this typical set.


Two parameters

= For_''X''1,_...,_''X''''N''_independent_random_variables_each_having_a_beta_distribution_parametrized_with_shape_parameters_''α''_and_''β'',_the_joint_log_likelihood_function_for_''N''_independent_and_identically_distributed_random_variables, iid_observations_is: :\ln_(\mathcal_(\alpha,_\beta\mid_X)_)=_(\alpha_-_1)\sum_^N_\ln_X_i_+_(\beta-_1)\sum_^N__\ln_(1-X_i)-_N_\ln_\Beta(\alpha,\beta)_ therefore_the_joint_log_likelihood_function_per_''N''_independent_and_identically_distributed_random_variables, iid_observations_is: :\frac_\ln(\mathcal_(\alpha,_\beta\mid_X))_=_(\alpha_-_1)\frac\sum_^N__\ln_X_i_+_(\beta-_1)\frac\sum_^N__\ln_(1-X_i)-\,_\ln_\Beta(\alpha,\beta) For_the_two_parameter_case,_the_Fisher_information_has_4_components:_2_diagonal_and_2_off-diagonal._Since_the_Fisher_information_matrix_is_symmetric,_one_of_these_off_diagonal_components_is_independent._Therefore,_the_Fisher_information_matrix_has_3_independent_components_(2_diagonal_and_1_off_diagonal). _ Aryal_and_Nadarajah
_calculated_Fisher's_information_matrix_for_the_four-parameter_case,_from_which_the_two_parameter_case_can_be_obtained_as_follows: :-_\frac=__\operatorname[\ln_(X)]=_\psi_1(\alpha)_-_\psi_1(\alpha_+_\beta)_=_=_\operatorname\left_[-_\frac_\right_]_=_\ln_\operatorname__ :-_\frac_=_\operatorname_ln_(1-X)=_\psi_1(\beta)_-_\psi_1(\alpha_+_\beta)_=_=__\operatorname\left_[-_\frac_\right]=_\ln_\operatorname__ :-_\frac_=_\operatorname[\ln_X,\ln(1-X)]__=_-\psi_1(\alpha+\beta)_=_=__\operatorname\left_[-_\frac_\right]_=_\ln_\operatorname_ Since_the_Fisher_information_matrix_is_symmetric :_\mathcal_=_\mathcal_=_\ln_\operatorname_ The_Fisher_information_components_are_equal_to_the_log_geometric_variances_and_log_geometric_covariance._Therefore,_they_can_be_expressed_as_trigamma_function_ In_mathematics,_the_trigamma_function,_denoted__or_,_is_the_second_of_the_polygamma_functions,_and_is_defined_by :_\psi_1(z)_=_\frac_\ln\Gamma(z). It_follows_from_this_definition_that :_\psi_1(z)_=_\frac_\psi(z) where__is_the_digamma_functio_...
s,_denoted_ψ1(α),__the_second_of_the_polygamma_function_ In_mathematics,_the_polygamma_function_of_order__is_a_meromorphic_function_on_the__complex_numbers_\mathbb_defined_as_the_th__derivative_of_the_logarithm_of_the_gamma_function: :\psi^(z)_:=_\frac_\psi(z)_=_\frac_\ln\Gamma(z). Thus :\psi^(z)__...
s,_defined_as_the_derivative_of_the_digamma_function: :\psi_1(\alpha)_=_\frac=\,_\frac._ These_derivatives_are_also_derived_in_the__and_plots_of_the_log_likelihood_function_are_also_shown_in_that_section.___contains_plots_and_further_discussion_of_the_Fisher_information_matrix_components:_the_log_geometric_variances_and_log_geometric_covariance_as_a_function_of_the_shape_parameters_α_and_β.___contains_formulas_for_moments_of_logarithmically_transformed_random_variables._Images_for_the_Fisher_information_components_\mathcal_,_\mathcal__and_\mathcal__are_shown_in_. The_determinant_of_Fisher's_information_matrix_is_of_interest_(for_example_for_the_calculation_of_Jeffreys_prior_probability).__From_the_expressions_for_the_individual_components_of_the_Fisher_information_matrix,_it_follows_that_the_determinant_of_Fisher's_(symmetric)_information_matrix_for_the_beta_distribution_is: :\begin \det(\mathcal(\alpha,_\beta))&=_\mathcal__\mathcal_-\mathcal__\mathcal__\\_pt&=(\psi_1(\alpha)_-_\psi_1(\alpha_+_\beta))(\psi_1(\beta)_-_\psi_1(\alpha_+_\beta))-(_-\psi_1(\alpha+\beta))(_-\psi_1(\alpha+\beta))\\_pt&=_\psi_1(\alpha)\psi_1(\beta)-(_\psi_1(\alpha)+\psi_1(\beta))\psi_1(\alpha_+_\beta)\\_pt\lim__\det(\mathcal(\alpha,_\beta))_&=\lim__\det(\mathcal(\alpha,_\beta))_=_\infty\\_pt\lim__\det(\mathcal(\alpha,_\beta))_&=\lim__\det(\mathcal(\alpha,_\beta))_=_0 \end From_Sylvester's_criterion_(checking_whether_the_diagonal_elements_are_all_positive),_it_follows_that_the_Fisher_information_matrix_for_the_two_parameter_case_is_Positive-definite_matrix, positive-definite_(under_the_standard_condition_that_the_shape_parameters_are_positive_''α'' > 0_and ''β'' > 0).
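The trigamma expressions above translate directly into code; the following sketch (an illustrative addition using SciPy's polygamma) builds the 2×2 Fisher information matrix and its determinant, the quantity used later for Jeffreys priors.

 # Two-parameter Fisher information matrix of the beta distribution and its determinant.
 import numpy as np
 from scipy.special import polygamma

 def trigamma(z):
     return polygamma(1, z)

 def beta_fisher_information(alpha, beta):
     i_aa = trigamma(alpha) - trigamma(alpha + beta)   # var[ln X]
     i_bb = trigamma(beta) - trigamma(alpha + beta)    # var[ln (1-X)]
     i_ab = -trigamma(alpha + beta)                    # cov[ln X, ln (1-X)]
     return np.array([[i_aa, i_ab], [i_ab, i_bb]])

 fim = beta_fisher_information(2.0, 3.0)
 print(fim)
 # Determinant: psi1(a)*psi1(b) - (psi1(a) + psi1(b))*psi1(a+b)
 print(np.linalg.det(fim))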


Four parameters

If ''Y''1, ..., ''YN'' are independent random variables each having a beta distribution with four parameters: the exponents ''α'' and ''β'', and also ''a'' (the minimum of the distribution range) and ''c'' (the maximum of the distribution range) (see the section titled "Alternative parametrizations", "Four parameters"), with probability density function:
:f(y; \alpha, \beta, a, c) = \frac{f(x;\alpha,\beta)}{c-a} =\frac{ \left (\frac{y-a}{c-a} \right )^{\alpha-1} \left (\frac{c-y}{c-a} \right)^{\beta-1} }{(c-a)\Beta(\alpha, \beta)}=\frac{ (y-a)^{\alpha-1} (c-y)^{\beta-1} }{(c-a)^{\alpha+\beta-1}\Beta(\alpha, \beta)},
the joint log likelihood function per ''N'' iid observations is:
:\frac{1}{N} \ln(\mathcal{L} (\alpha, \beta, a, c\mid Y))= \frac{\alpha -1}{N}\sum_{i=1}^N  \ln (Y_i - a) + \frac{\beta -1}{N}\sum_{i=1}^N  \ln (c - Y_i)- \ln \Beta(\alpha,\beta) - (\alpha+\beta -1) \ln (c - a)
For the four parameter case, the Fisher information has 4×4 = 16 components. It has 12 off-diagonal components (= 16 total − 4 diagonal). Since the Fisher information matrix is symmetric, half of these components (12/2 = 6) are independent. Therefore, the Fisher information matrix has 6 independent off-diagonal + 4 diagonal = 10 independent components. Aryal and Nadarajah calculated Fisher's information matrix for the four parameter case as follows:
:- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial \alpha^2}=  \operatorname{var}[\ln (X)]= \psi_1(\alpha) - \psi_1(\alpha + \beta) = \mathcal{I}_{\alpha\alpha}= \operatorname{E}\left [- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial \alpha^2} \right ] = \ln (\operatorname{var}_{GX})
:-\frac{1}{N} \frac{\partial^2 \ln \mathcal{L}}{\partial \beta^2} = \operatorname{var}[\ln (1-X)]= \psi_1(\beta) - \psi_1(\alpha + \beta) = \mathcal{I}_{\beta\beta}=  \operatorname{E} \left [- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial \beta^2} \right ] = \ln(\operatorname{var}_{G(1-X)})
:-\frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial \alpha\,\partial\beta} = \operatorname{cov}[\ln X,\ln(1-X)]  = -\psi_1(\alpha+\beta) =\mathcal{I}_{\alpha\beta}=  \operatorname{E} \left [- \frac{1}{N}\frac{\partial^2\ln \mathcal{L}}{\partial \alpha\,\partial\beta} \right ] = \ln(\operatorname{cov}_{G\,X,(1-X)})
In the above expressions, the use of ''X'' instead of ''Y'' in the expressions var[ln(''X'')] = ln(var''GX'') is ''not an error''. The expressions in terms of the log geometric variances and log geometric covariance occur as functions of the two parameter ''X'' ~ Beta(''α'', ''β'') parametrization because, when taking the partial derivatives with respect to the exponents (''α'', ''β'') in the four parameter case, one obtains the identical expressions as for the two parameter case: these terms of the four parameter Fisher information matrix are independent of the minimum ''a'' and maximum ''c'' of the distribution's range. The only non-zero term upon double differentiation of the log likelihood function with respect to the exponents ''α'' and ''β'' is the second derivative of the log of the beta function: ln(B(''α'', ''β'')). This term is independent of the minimum ''a'' and maximum ''c'' of the distribution's range. Double differentiation of this term results in trigamma functions. The sections titled "Maximum likelihood", "Two unknown parameters" and "Four unknown parameters" also show this fact.
The Fisher information for ''N'' i.i.d. samples is ''N'' times the individual Fisher information (eq. 11.279, page 394 of Cover and Thomas). (Aryal and Nadarajah take a single observation, ''N'' = 1, to calculate the following components of the Fisher information, which leads to the same result as considering the derivatives of the log likelihood per ''N'' observations. Moreover, below, the erroneous expression for one of these components in Aryal and Nadarajah has been corrected.)
:\begin{align}
\alpha > 2: \quad \operatorname{E}\left [- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial a^2} \right ] &= \mathcal{I}_{a a}=\frac{\beta(\alpha+\beta-1)}{(\alpha-2)(c-a)^2} \\
\beta > 2: \quad \operatorname{E}\left[-\frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial c^2} \right ] &= \mathcal{I}_{c c} = \frac{\alpha(\alpha+\beta-1)}{(\beta-2)(c-a)^2} \\
\operatorname{E}\left[- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial a\,\partial c} \right ] &= \mathcal{I}_{a c} = \frac{\alpha+\beta-1}{(c-a)^2} \\
\alpha > 1: \quad \operatorname{E}\left[- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial \alpha\,\partial a} \right ] &=\mathcal{I}_{\alpha a} = \frac{\beta}{(\alpha-1)(c-a)} \\
\operatorname{E}\left[- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial \alpha\,\partial c} \right ] &= \mathcal{I}_{\alpha c} = \frac{1}{c-a} \\
\operatorname{E}\left[- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial \beta\,\partial a} \right ] &= \mathcal{I}_{\beta a} = -\frac{1}{c-a} \\
\beta > 1: \quad \operatorname{E}\left[- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial \beta\,\partial c} \right ] &= \mathcal{I}_{\beta c} = -\frac{\alpha}{(\beta-1)(c-a)}
\end{align}
The lower two diagonal entries of the Fisher information matrix, with respect to the parameter ''a'' (the minimum of the distribution's range), \mathcal{I}_{a a}, and with respect to the parameter ''c'' (the maximum of the distribution's range), \mathcal{I}_{c c}, are only defined for exponents α > 2 and β > 2 respectively. The Fisher information matrix component \mathcal{I}_{a a} for the minimum ''a'' approaches infinity for exponent α approaching 2 from above, and the Fisher information matrix component \mathcal{I}_{c c} for the maximum ''c'' approaches infinity for exponent β approaching 2 from above.
The Fisher information matrix for the four parameter case does not depend on the individual values of the minimum ''a'' and the maximum ''c'', but only on the total range (''c''−''a''). Moreover, the components of the Fisher information matrix that depend on the range (''c''−''a'') depend only through its inverse (or the square of the inverse), such that the Fisher information decreases for increasing range (''c''−''a'').
The accompanying images show these Fisher information components as functions of the shape parameters. All of them look like a basin, with the "walls" of the basin being located at low values of the parameters.
The following four-parameter-beta-distribution Fisher information components can be expressed in terms of the two-parameter ''X'' ~ Beta(α, β) expectations of the transformed ratio ((1−''X'')/''X'') and of its mirror image (''X''/(1−''X'')), scaled by the range (''c''−''a''), which may be helpful for interpretation:
:\mathcal{I}_{\alpha a} =\frac{\operatorname{E} \left[\frac{1-X}{X} \right ]}{c-a}= \frac{\beta}{(\alpha-1)(c-a)} \text{ if }\alpha > 1
:\mathcal{I}_{\beta c} = -\frac{\operatorname{E} \left [\frac{X}{1-X} \right ]}{c-a}=- \frac{\alpha}{(\beta-1)(c-a)}\text{ if }\beta> 1
These are also the expected values of the "inverted beta distribution" or beta prime distribution (also known as beta distribution of the second kind or Pearson's Type VI) and its mirror image, scaled by the range (''c'' − ''a'').
Also, the following Fisher information components can be expressed in terms of the harmonic (1/''X'') variances or of variances based on the ratio transformed variables ((1−''X'')/''X'') as follows:
:\begin{align}
\alpha > 2: \quad \mathcal{I}_{a a} &=\operatorname{var} \left [\frac{1}{X} \right] \left (\frac{\alpha-1}{c-a} \right )^2 =\operatorname{var} \left [\frac{1-X}{X} \right ] \left (\frac{\alpha-1}{c-a} \right)^2 = \frac{\beta(\alpha+\beta-1)}{(\alpha-2)(c-a)^2} \\
\beta > 2: \quad \mathcal{I}_{c c} &= \operatorname{var} \left [\frac{1}{1-X} \right ] \left (\frac{\beta-1}{c-a} \right )^2 = \operatorname{var} \left [\frac{X}{1-X} \right ] \left (\frac{\beta-1}{c-a} \right )^2  =\frac{\alpha(\alpha+\beta-1)}{(\beta-2)(c-a)^2}  \\
\mathcal{I}_{a c} &=-\operatorname{cov} \left [\frac{1}{X},\frac{1}{1-X} \right ]\frac{(\alpha-1)(\beta-1)}{(c-a)^2}  = -\operatorname{cov} \left [\frac{1-X}{X},\frac{X}{1-X} \right ] \frac{(\alpha-1)(\beta-1)}{(c-a)^2} =\frac{\alpha+\beta-1}{(c-a)^2}
\end{align}
See the section "Moments of linearly transformed, product and inverted random variables" for these expectations.
The determinant of Fisher's information matrix is of interest (for example for the calculation of Jeffreys prior probability). From the expressions for the individual components, it follows that the determinant of Fisher's (symmetric) information matrix for the beta distribution with four parameters is a lengthy polynomial combination of the ten independent components \mathcal{I}_{\alpha\alpha}, \mathcal{I}_{\beta\beta}, \mathcal{I}_{\alpha\beta}, \mathcal{I}_{a a}, \mathcal{I}_{c c}, \mathcal{I}_{a c}, \mathcal{I}_{\alpha a}, \mathcal{I}_{\alpha c}, \mathcal{I}_{\beta a}, \mathcal{I}_{\beta c}, and it is defined only for α, β > 2.
Using Sylvester's criterion (checking whether the diagonal elements are all positive), and since the diagonal components \mathcal{I}_{a a} and \mathcal{I}_{c c} have singularities at α = 2 and β = 2, it follows that the Fisher information matrix for the four parameter case is positive-definite for α > 2 and β > 2. Since for α > 2 and β > 2 the beta distribution is (symmetric or unsymmetric) bell shaped, it follows that the Fisher information matrix is positive-definite only for bell-shaped (symmetric or unsymmetric) beta distributions, with inflection points located to either side of the mode. Thus, important well known distributions belonging to the four-parameter beta distribution family, like the parabolic distribution (Beta(2, 2, ''a'', ''c'')) and the continuous uniform distribution (Beta(1, 1, ''a'', ''c'')), have Fisher information components (\mathcal{I}_{a a},\mathcal{I}_{c c},\mathcal{I}_{\alpha a},\mathcal{I}_{\beta c}) that blow up (approach infinity) in the four-parameter case (although their Fisher information components are all defined for the two parameter case). The four-parameter Wigner semicircle distribution (Beta(3/2, 3/2, ''a'', ''c'')) and arcsine distribution (Beta(1/2, 1/2, ''a'', ''c'')) have negative Fisher information determinants for the four-parameter case.


Bayesian inference

The use of beta distributions in Bayesian inference is due to the fact that they provide a family of conjugate prior probability distributions for binomial (including Bernoulli) and geometric distributions. The domain of the beta distribution can be viewed as a probability, and in fact the beta distribution is often used to describe the distribution of a probability value ''p'':
:P(p;\alpha,\beta) = \frac{p^{\alpha-1}(1-p)^{\beta-1}}{\Beta(\alpha,\beta)}.
Examples of beta distributions used as prior probabilities to represent ignorance of prior parameter values in Bayesian inference are Beta(1,1), Beta(0,0) and Beta(1/2,1/2).
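A minimal sketch of the conjugate update described above (the prior and data values are arbitrary): a Beta(''α'', ''β'') prior on ''p'', combined with ''s'' successes in ''n'' Bernoulli trials, gives a Beta(''α'' + ''s'', ''β'' + ''n'' − ''s'') posterior.

 # Conjugate Bayesian update of a beta prior with binomial/Bernoulli data.
 from scipy import stats

 def beta_bernoulli_update(alpha, beta, successes, trials):
     return alpha + successes, beta + trials - successes

 a, b = beta_bernoulli_update(1.0, 1.0, successes=7, trials=10)   # uniform Beta(1,1) prior
 posterior = stats.beta(a, b)
 print(posterior.mean())          # (s + 1)/(n + 2) = 8/12 under the uniform prior
 print(posterior.interval(0.95))  # equal-tailed 95% credible interval for p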


Rule of succession

A classic application of the beta distribution is the rule of succession, introduced in the 18th century by Pierre-Simon Laplace in the course of treating the sunrise problem. It states that, given ''s'' successes in ''n'' conditionally independent Bernoulli trials with probability ''p'', the estimate of the expected value in the next trial is \frac{s+1}{n+2}. This estimate is the expected value of the posterior distribution over ''p'', namely Beta(''s''+1, ''n''−''s''+1), which is given by Bayes' rule if one assumes a uniform prior probability over ''p'' (i.e., Beta(1, 1)) and then observes that ''p'' generated ''s'' successes in ''n'' trials. Laplace's rule of succession has been criticized by prominent scientists. R. T. Cox described Laplace's application of the rule of succession to the sunrise problem (p. 89) as "a travesty of the proper use of the principle". Keynes remarks (Ch. XXX, p. 382) "indeed this is so foolish a theorem that to entertain it is discreditable". Karl Pearson showed that the probability that the next (''n'' + 1) trials will be successes, after ''n'' successes in ''n'' trials, is only 50%, which has been considered too low by scientists like Jeffreys and unacceptable as a representation of the scientific process of experimentation to test a proposed scientific law. As pointed out by Jeffreys (p. 128) (crediting C. D. Broad), Laplace's rule of succession establishes a high probability of success ((''n''+1)/(''n''+2)) in the next trial, but only a moderate probability (50%) that a further sample (''n''+1) comparable in size will be equally successful. As pointed out by Perks, "The rule of succession itself is hard to accept. It assigns a probability to the next trial which implies the assumption that the actual run observed is an average run and that we are always at the end of an average run. It would, one would think, be more reasonable to assume that we were in the middle of an average run. Clearly a higher value for both probabilities is necessary if they are to accord with reasonable belief." These problems with Laplace's rule of succession motivated Haldane, Perks, Jeffreys and others to search for other forms of prior probability (see the following sections). According to Jaynes, the main problem with the rule of succession is that it is not valid when ''s'' = 0 or ''s'' = ''n'' (see rule of succession, for an analysis of its validity).


Bayes-Laplace prior probability (Beta(1,1))

The beta distribution achieves maximum differential entropy for Beta(1,1): the uniform probability density, for which all values in the domain of the distribution have equal density. This uniform distribution Beta(1,1) was suggested ("with a great deal of doubt") by Thomas Bayes as the prior probability distribution to express ignorance about the correct prior distribution. This prior distribution was adopted (apparently, from his writings, with little sign of doubt) by Pierre-Simon Laplace, and hence it was also known as the "Bayes-Laplace rule" or the "Laplace rule" of "inverse probability" in publications of the first half of the 20th century. In the later part of the 19th century and early part of the 20th century, scientists realized that the assumption of uniform "equal" probability density depended on the actual functions (for example whether a linear or a logarithmic scale was most appropriate) and parametrizations used. In particular, the behavior near the ends of distributions with finite support (for example near ''x'' = 0, for a distribution with initial support at ''x'' = 0) required particular attention. Keynes (Ch. XXX, p. 381) criticized the use of Bayes's uniform prior probability (Beta(1,1)), under which all values between zero and one are equiprobable, as follows: "Thus experience, if it shows anything, shows that there is a very marked clustering of statistical ratios in the neighborhoods of zero and unity, of those for positive theories and for correlations between positive qualities in the neighborhood of zero, and of those for negative theories and for correlations between negative qualities in the neighborhood of unity."


Haldane's prior probability (Beta(0,0))

The Beta(0,0) distribution was proposed by J.B.S. Haldane, who suggested that the prior probability representing complete uncertainty should be proportional to p^{-1}(1-p)^{-1}. The function p^{-1}(1-p)^{-1} can be viewed as the limit of the numerator of the beta distribution as both shape parameters approach zero: α, β → 0. The Beta function (in the denominator of the beta distribution) approaches infinity for both parameters approaching zero, α, β → 0. Therefore, p^{-1}(1-p)^{-1} divided by the Beta function approaches a 2-point Bernoulli distribution with equal probability 1/2 at each end, at 0 and 1, and nothing in between, as α, β → 0: a coin toss, with one face of the coin being at 0 and the other face being at 1. The Haldane prior probability distribution Beta(0,0) is an "improper prior" because its integration (from 0 to 1) fails to strictly converge to 1 due to the singularities at each end. However, this is not an issue for computing posterior probabilities unless the sample size is very small. Furthermore, Zellner points out that on the log-odds scale (the logit transformation ln(''p''/(1−''p''))), the Haldane prior is the uniformly flat prior. The fact that a uniform prior probability on the logit transformed variable ln(''p''/(1−''p'')) (with domain (−∞, ∞)) is equivalent to the Haldane prior on the domain [0, 1] was pointed out by Harold Jeffreys in the first edition (1939) of his book Theory of Probability (p. 123). Jeffreys writes "Certainly if we take the Bayes-Laplace rule right up to the extremes we are led to results that do not correspond to anybody's way of thinking. The (Haldane) rule d''x''/(''x''(1−''x'')) goes too far the other way. It would lead to the conclusion that if a sample is of one type with respect to some property there is a probability 1 that the whole population is of that type." The fact that "uniform" depends on the parametrization led Jeffreys to seek a form of prior that would be invariant under different parametrizations.


Jeffreys' prior probability (Beta(1/2,1/2) for a Bernoulli or for a binomial distribution)

Harold Jeffreys proposed to use an uninformative prior probability measure that should be invariant under reparameterization: proportional to the square root of the determinant of Fisher's information matrix. For the Bernoulli distribution, this can be shown as follows: for a coin that is "heads" with probability ''p'' ∈ [0, 1] and is "tails" with probability 1 − ''p'', for a given (''H'', ''T'') ∈ {(0,1), (1,0)} the probability is ''p''^''H''(1 − ''p'')^''T''. Since ''T'' = 1 − ''H'', the Bernoulli distribution is ''p''^''H''(1 − ''p'')^(1−''H''). Considering ''p'' as the only parameter, it follows that the log likelihood for the Bernoulli distribution is

:\ln \mathcal{L}(p\mid H) = H \ln(p) + (1-H) \ln(1-p).

The Fisher information matrix has only one component (it is a scalar, because there is only one parameter: ''p''), therefore:

:\begin{align}
\sqrt{\mathcal{I}(p)} &= \sqrt{\operatorname{E}\!\left[\left(\frac{d}{dp} \ln \mathcal{L}(p\mid H)\right)^{2}\right]} \\
&= \sqrt{\operatorname{E}\!\left[\left(\frac{H}{p} - \frac{1-H}{1-p}\right)^{2}\right]} \\
&= \sqrt{p\left(\frac{1}{p} - 0\right)^{2} + (1-p)\left(0 - \frac{1}{1-p}\right)^{2}} \\
&= \frac{1}{\sqrt{p(1-p)}}.
\end{align}

Similarly, for the binomial distribution with ''n'' Bernoulli trials, it can be shown that

:\sqrt{\mathcal{I}(p)} = \frac{\sqrt{n}}{\sqrt{p(1-p)}}.

Thus, for the Bernoulli and binomial distributions, Jeffreys prior is proportional to \scriptstyle \frac{1}{\sqrt{p(1-p)}}, which happens to be proportional to a beta distribution with domain variable ''x'' = ''p'', and shape parameters α = β = 1/2, the arcsine distribution:

:\operatorname{Beta}(\tfrac{1}{2}, \tfrac{1}{2}) = \frac{1}{\pi \sqrt{x(1-x)}}.

It will be shown in the next section that the normalizing constant for Jeffreys prior is immaterial to the final result because the normalizing constant cancels out in Bayes theorem for the posterior probability. Hence Beta(1/2,1/2) is used as the Jeffreys prior for both Bernoulli and binomial distributions. As shown in the next section, when using this expression as a prior probability times the likelihood in Bayes theorem, the posterior probability turns out to be a beta distribution. It is important to realize, however, that Jeffreys prior is proportional to \scriptstyle \frac{1}{\sqrt{p(1-p)}} for the Bernoulli and binomial distribution, but not for the beta distribution. Jeffreys prior for the beta distribution is given by the determinant of Fisher's information for the beta distribution, which is a function of the trigamma function ψ₁ of the shape parameters α and β as follows:

:\begin{align}
\sqrt{\det(\mathcal{I}(\alpha,\beta))} &= \sqrt{\psi_1(\alpha)\,\psi_1(\beta) - \left(\psi_1(\alpha)+\psi_1(\beta)\right)\psi_1(\alpha+\beta)} \\
\lim_{\alpha,\beta \to 0} \sqrt{\det(\mathcal{I}(\alpha,\beta))} &= \infty \\
\lim_{\alpha,\beta \to \infty} \sqrt{\det(\mathcal{I}(\alpha,\beta))} &= 0
\end{align}

As previously discussed, Jeffreys prior for the Bernoulli and binomial distributions is proportional to the arcsine distribution Beta(1/2,1/2), a one-dimensional ''curve'' that looks like a basin as a function of the parameter ''p'' of the Bernoulli and binomial distributions. The walls of the basin are formed by ''p'' approaching the singularities at the ends ''p'' → 0 and ''p'' → 1, where Beta(1/2,1/2) approaches infinity. Jeffreys prior for the beta distribution is a ''2-dimensional surface'' (embedded in a three-dimensional space) that looks like a basin with only two of its walls meeting at the corner α = β = 0 (and missing the other two walls) as a function of the shape parameters α and β of the beta distribution. The two adjoining walls of this 2-dimensional surface are formed by the shape parameters α and β approaching the singularities (of the trigamma function) at α, β → 0. It has no walls for α, β → ∞ because in this case the determinant of Fisher's information matrix for the beta distribution approaches zero.

It will be shown in the next section that Jeffreys prior probability results in posterior probabilities (when multiplied by the binomial likelihood function) that are intermediate between the posterior probability results of the Haldane and Bayes prior probabilities.

Jeffreys prior may be difficult to obtain analytically, and for some cases it just doesn't exist (even for simple distribution functions like the asymmetric triangular distribution). Berger, Bernardo and Sun, in a 2009 paper
defined a reference prior probability distribution that (unlike Jeffreys prior) exists for the asymmetric triangular distribution. They cannot obtain a closed-form expression for their reference prior, but numerical calculations show it to be nearly perfectly fitted by the (proper) prior

:\operatorname{Beta}(\tfrac{1}{2}, \tfrac{1}{2}) = \frac{1}{\pi\sqrt{\theta(1-\theta)}}

where θ is the vertex variable for the asymmetric triangular distribution with support [0, 1] (corresponding to the following parameter values in Wikipedia's article on the triangular distribution: vertex ''c'' = ''θ'', left end ''a'' = 0, and right end ''b'' = 1). Berger et al. also give a heuristic argument that Beta(1/2,1/2) could indeed be the exact Berger–Bernardo–Sun reference prior for the asymmetric triangular distribution. Therefore, Beta(1/2,1/2) not only is Jeffreys prior for the Bernoulli and binomial distributions, but also seems to be the Berger–Bernardo–Sun reference prior for the asymmetric triangular distribution (for which the Jeffreys prior does not exist), a distribution used in project management and PERT analysis to describe the cost and duration of project tasks.

Clarke and Barron prove that, among continuous positive priors, Jeffreys prior (when it exists) asymptotically maximizes Shannon's mutual information between a sample of size ''n'' and the parameter, and therefore ''Jeffreys prior is the most uninformative prior'' (measuring information as Shannon information). The proof rests on an examination of the Kullback–Leibler divergence between probability density functions for iid random variables.


Effect of different prior probability choices on the posterior beta distribution

If samples are drawn from the population of a random variable ''X'' that result in ''s'' successes and ''f'' failures in ''n'' Bernoulli trials ''n'' = ''s'' + ''f'', then the likelihood function for parameters ''s'' and ''f'' given ''x'' = ''p'' (the notation ''x'' = ''p'' in the expressions below will emphasize that the domain ''x'' stands for the value of the parameter ''p'' in the binomial distribution) is the following binomial distribution:

:\mathcal{L}(s,f\mid x=p) = {n \choose s} x^{s}(1-x)^{f} = {n \choose s} x^{s}(1-x)^{n-s}.

If beliefs about prior probability information are reasonably well approximated by a beta distribution with parameters ''α'' Prior and ''β'' Prior, then:

:\operatorname{PriorProb}(x=p;\alpha_\text{Prior},\beta_\text{Prior}) = \frac{x^{\alpha_\text{Prior}-1}(1-x)^{\beta_\text{Prior}-1}}{B(\alpha_\text{Prior},\beta_\text{Prior})}

According to Bayes' theorem for a continuous event space, the posterior probability is given by the product of the prior probability and the likelihood function (given the evidence ''s'' and ''f'' = ''n'' − ''s''), normalized so that the area under the curve equals one, as follows:

:\begin{align}
& \operatorname{posterior}(x=p\mid s,n-s) \\
= {} & \frac{\operatorname{PriorProb}(x=p;\alpha_\text{Prior},\beta_\text{Prior})\,\mathcal{L}(s,f\mid x=p)}{\int_0^1 \operatorname{PriorProb}(x=p;\alpha_\text{Prior},\beta_\text{Prior})\,\mathcal{L}(s,f\mid x=p)\,dx} \\
= {} & \frac{{n \choose s} x^{s+\alpha_\text{Prior}-1}(1-x)^{n-s+\beta_\text{Prior}-1} / B(\alpha_\text{Prior},\beta_\text{Prior})}{\int_0^1 \left({n \choose s} x^{s+\alpha_\text{Prior}-1}(1-x)^{n-s+\beta_\text{Prior}-1} / B(\alpha_\text{Prior},\beta_\text{Prior})\right) dx} \\
= {} & \frac{x^{s+\alpha_\text{Prior}-1}(1-x)^{n-s+\beta_\text{Prior}-1}}{\int_0^1 x^{s+\alpha_\text{Prior}-1}(1-x)^{n-s+\beta_\text{Prior}-1}\,dx} \\
= {} & \frac{x^{s+\alpha_\text{Prior}-1}(1-x)^{n-s+\beta_\text{Prior}-1}}{B(s+\alpha_\text{Prior},\,n-s+\beta_\text{Prior})}.
\end{align}

The binomial coefficient

:{n \choose s} = \frac{n!}{s!(n-s)!}
appears both in the numerator and the denominator of the posterior probability, and it does not depend on the integration variable ''x'', hence it cancels out, and it is irrelevant to the final result. Similarly the normalizing factor for the prior probability, the beta function B(''α'' Prior, ''β'' Prior), cancels out and it is immaterial to the final result. The same posterior probability result can be obtained if one uses an un-normalized prior

:x^{\alpha_\text{Prior}-1}(1-x)^{\beta_\text{Prior}-1}

because the normalizing factors all cancel out. Several authors (including Jeffreys himself) thus use an un-normalized prior formula since the normalization constant cancels out. The numerator of the posterior probability ends up being just the (un-normalized) product of the prior probability and the likelihood function, and the denominator is its integral from zero to one. The beta function in the denominator, B(''s'' + ''α'' Prior, ''n'' − ''s'' + ''β'' Prior), appears as a normalization constant to ensure that the total posterior probability integrates to unity.

The ratio ''s''/''n'' of the number of successes to the total number of trials is a sufficient statistic in the binomial case, which is relevant for the following results.

For the Bayes' prior probability (Beta(1,1)), the posterior probability is:

:\operatorname{posterior}(p=x\mid s,f) = \frac{x^{s}(1-x)^{n-s}}{B(s+1,\,n-s+1)}, \text{ with mean} =\frac{s+1}{n+2},\text{ and mode} =\frac{s}{n}\text{ (if } 0 < s < n).

For the Jeffreys' prior probability (Beta(1/2,1/2)), the posterior probability is:

:\operatorname{posterior}(p=x\mid s,f) = \frac{x^{s-\frac{1}{2}}(1-x)^{n-s-\frac{1}{2}}}{B(s+\frac{1}{2},\,n-s+\frac{1}{2})},\text{ with mean} = \frac{s+\frac{1}{2}}{n+1},\text{ and mode} =\frac{s-\frac{1}{2}}{n-1}\text{ (if } \tfrac{1}{2} < s < n-\tfrac{1}{2}).

and for the Haldane prior probability (Beta(0,0)), the posterior probability is:

:\operatorname{posterior}(p=x\mid s,f) = \frac{x^{s-1}(1-x)^{n-s-1}}{B(s,\,n-s)}, \text{ with mean} = \frac{s}{n},\text{ and mode} =\frac{s-1}{n-2}\text{ (if } 1 < s < n-1).

From the above expressions it follows that for ''s''/''n'' = 1/2 all the above three prior probabilities result in the identical location for the posterior probability mean = mode = 1/2. For ''s''/''n'' < 1/2, the mean of the posterior probabilities, using the following priors, are such that: mean for Bayes prior > mean for Jeffreys prior > mean for Haldane prior. For ''s''/''n'' > 1/2 the order of these inequalities is reversed such that the Haldane prior probability results in the largest posterior mean. The ''Haldane'' prior probability Beta(0,0) results in a posterior probability density with ''mean'' (the expected value for the probability of success in the "next" trial) identical to the ratio ''s''/''n'' of the number of successes to the total number of trials. Therefore, the Haldane prior results in a posterior probability with expected value in the next trial equal to the maximum likelihood. The ''Bayes'' prior probability Beta(1,1) results in a posterior probability density with ''mode'' identical to the ratio ''s''/''n'' (the maximum likelihood).

In the case that 100% of the trials have been successful (''s'' = ''n''), the ''Bayes'' prior probability Beta(1,1) results in a posterior expected value equal to the rule of succession (''n'' + 1)/(''n'' + 2), while the Haldane prior Beta(0,0) results in a posterior expected value of 1 (absolute certainty of success in the next trial). Jeffreys prior probability results in a posterior expected value equal to (''n'' + 1/2)/(''n'' + 1). Perks (p.
303) points out: "This provides a new rule of succession and expresses a 'reasonable' position to take up, namely, that after an unbroken run of n successes we assume a probability for the next trial equivalent to the assumption that we are about half-way through an average run, i.e. that we expect a failure once in (2''n'' + 2) trials. The Bayes–Laplace rule implies that we are about at the end of an average run or that we expect a failure once in (''n'' + 2) trials. The comparison clearly favours the new result (what is now called Jeffreys prior) from the point of view of 'reasonableness'."

Conversely, in the case that 100% of the trials have resulted in failure (''s'' = 0), the ''Bayes'' prior probability Beta(1,1) results in a posterior expected value for success in the next trial equal to 1/(''n'' + 2), while the Haldane prior Beta(0,0) results in a posterior expected value of success in the next trial of 0 (absolute certainty of failure in the next trial). Jeffreys prior probability results in a posterior expected value for success in the next trial equal to (1/2)/(''n'' + 1), which Perks (p. 303) points out: "is a much more reasonably remote result than the Bayes-Laplace result 1/(''n'' + 2)".

Jaynes questions (for the uniform prior Beta(1,1)) the use of these formulas for the cases ''s'' = 0 or ''s'' = ''n'' because the integrals do not converge (Beta(1,1) is an improper prior for ''s'' = 0 or ''s'' = ''n''). In practice, the conditions 0 < ''s'' < ''n'' are usually met. After an unbroken run of ''n'' successes one may also consider the probability that the next (''n'' + 1) trials will all be successes. Perks (p. 303) shows that, for what is now known as the Jeffreys prior, this probability is ((''n'' + 1/2)/(''n'' + 1))((''n'' + 3/2)/(''n'' + 2))⋯((2''n'' + 1/2)/(2''n'' + 1)), which for ''n'' = 1, 2, 3 gives 15/24, 315/480, 9009/13440; rapidly approaching a limiting value of 1/\sqrt{2} = 0.70710678\ldots as ''n'' tends to infinity. Perks remarks that what is now known as the Jeffreys prior: "is clearly more 'reasonable' than either the Bayes-Laplace result or the result on the (Haldane) alternative rule rejected by Jeffreys which gives certainty as the probability. It clearly provides a very much better correspondence with the process of induction. Whether it is 'absolutely' reasonable for the purpose, i.e. whether it is yet large enough, without the absurdity of reaching unity, is a matter for others to decide. But it must be realized that the result depends on the assumption of complete indifference and absence of knowledge prior to the sampling experiment."
Following are the variances of the posterior distribution obtained with these three prior probability distributions:

for the Bayes' prior probability (Beta(1,1)), the posterior variance is:

:\text{variance} = \frac{(s+1)(n-s+1)}{(n+2)^{2}(n+3)}, \text{ which for } s=\frac{n}{2} \text{ gives variance} =\frac{1}{4(n+3)}

for the Jeffreys' prior probability (Beta(1/2,1/2)), the posterior variance is:

:\text{variance} = \frac{(s+\frac{1}{2})(n-s+\frac{1}{2})}{(n+1)^{2}(n+2)}, \text{ which for } s=\frac{n}{2} \text{ gives variance} = \frac{1}{4(n+2)}

and for the Haldane prior probability (Beta(0,0)), the posterior variance is:

:\text{variance} = \frac{s(n-s)}{n^{2}(n+1)}, \text{ which for } s=\frac{n}{2} \text{ gives variance} =\frac{1}{4(n+1)}

So, as remarked by Silvey, for large ''n'', the variance is small and hence the posterior distribution is highly concentrated, whereas the assumed prior distribution was very diffuse. This is in accord with what one would hope for, as vague prior knowledge is transformed (through Bayes theorem) into a more precise posterior knowledge by an informative experiment. For small ''n'' the Haldane Beta(0,0) prior results in the largest posterior variance while the Bayes Beta(1,1) prior results in the more concentrated posterior. Jeffreys prior Beta(1/2,1/2) results in a posterior variance in between the other two. As ''n'' increases, the variance rapidly decreases so that the posterior variance for all three priors converges to approximately the same value (approaching zero variance as ''n'' → ∞). Recalling the previous result that the ''Haldane'' prior probability Beta(0,0) results in a posterior probability density with ''mean'' (the expected value for the probability of success in the "next" trial) identical to the ratio ''s''/''n'' of the number of successes to the total number of trials, it follows from the above expression that also the ''Haldane'' prior Beta(0,0) results in a posterior with ''variance'' identical to the variance expressed in terms of the maximum likelihood estimate ''s''/''n'' and the sample size:

:\text{variance} = \frac{\mu(1-\mu)}{1+\nu} = \frac{(s/n)(1-s/n)}{1+n}

with the mean ''μ'' = ''s''/''n'' and the sample size ''ν'' = ''n''.

In Bayesian inference, using a prior distribution Beta(''α'' Prior, ''β'' Prior) prior to a binomial distribution is equivalent to adding (''α'' Prior − 1) pseudo-observations of "success" and (''β'' Prior − 1) pseudo-observations of "failure" to the actual number of successes and failures observed, then estimating the parameter ''p'' of the binomial distribution by the proportion of successes over both real and pseudo-observations. A uniform prior Beta(1,1) does not add (or subtract) any pseudo-observations since for Beta(1,1) it follows that (''α'' Prior − 1) = 0 and (''β'' Prior − 1) = 0. The Haldane prior Beta(0,0) subtracts one pseudo-observation from each and Jeffreys prior Beta(1/2,1/2) subtracts 1/2 pseudo-observation of success and an equal number of failure. This subtraction has the effect of smoothing out the posterior distribution. If the proportion of successes is not 50% (''s''/''n'' ≠ 1/2) values of ''α'' Prior and ''β'' Prior less than 1 (and therefore negative (''α'' Prior − 1) and (''β'' Prior − 1)) favor sparsity, i.e. distributions where the parameter ''p'' is closer to either 0 or 1. In effect, values of ''α'' Prior and ''β'' Prior between 0 and 1, when operating together, function as a concentration parameter.
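The conjugate update described above can be checked directly. The following sketch (a minimal Python example assuming SciPy is available; the function name and chosen values of ''s'' and ''n'' are only illustrative) computes the posterior mean, mode and variance for the three priors discussed here.

# Sketch: posterior summaries for s successes in n trials under the
# Haldane Beta(0,0), Jeffreys Beta(1/2,1/2) and Bayes Beta(1,1) priors.
from scipy.stats import beta

def posterior_summary(s, n, a_prior, b_prior):
    a, b = s + a_prior, n - s + b_prior               # conjugate update
    mean = a / (a + b)
    mode = (a - 1) / (a + b - 2) if a > 1 and b > 1 else None
    return mean, mode, beta.var(a, b)

s, n = 3, 10
for a0, b0, name in [(0, 0, "Haldane"), (0.5, 0.5, "Jeffreys"), (1, 1, "Bayes")]:
    print(name, posterior_summary(s, n, a0, b0))
# Haldane mean = s/n = 0.3; Bayes mean = (s+1)/(n+2) = 0.333...; Jeffreys lies in between,
# consistent with the ordering stated in the text for s/n < 1/2.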
The accompanying plots show the posterior probability density functions for a range of sample sizes ''n'', numbers of successes ''s'', and prior choices Beta(''α'' Prior, ''β'' Prior) among the Haldane, Jeffreys and Bayes priors. The first plot shows the symmetric cases (with mean = mode = 1/2) and the second plot shows the skewed cases. The images show that there is little difference between the priors for the posterior with sample size of 50 (characterized by a more pronounced peak near ''p'' = 1/2). Significant differences appear for very small sample sizes (in particular for the flatter distribution for the degenerate case of sample size = 3). Therefore, the skewed cases show a larger effect from the choice of prior, at small sample size, than the symmetric cases. For symmetric distributions, the Bayes prior Beta(1,1) results in the most "peaky" and highest posterior distributions and the Haldane prior Beta(0,0) results in the flattest and lowest peak distribution. The Jeffreys prior Beta(1/2,1/2) lies in between them. For nearly symmetric, not too skewed distributions the effect of the priors is similar. For very small sample size (in this case for a sample size of 3) and a skewed distribution the Haldane prior can result in a reverse-J-shaped distribution with a singularity at the left end. However, this happens only in degenerate cases (in this example ''n'' = 3 and hence ''s'' = 3/4 < 1, a degenerate value because ''s'' should be greater than unity in order for the posterior of the Haldane prior to have a mode located between the ends, and because ''s'' = 3/4 is not an integer number, hence it violates the initial assumption of a binomial distribution for the likelihood) and it is not an issue in generic cases of reasonable sample size (such that the condition 1 < ''s'' < ''n'' − 1, necessary for a mode to exist between both ends, is fulfilled).

In Chapter 12 (p. 385) of his book, Jaynes asserts that the ''Haldane prior'' Beta(0,0) describes a ''prior state of knowledge of complete ignorance'', where we are not even sure whether it is physically possible for an experiment to yield either a success or a failure, while the ''Bayes (uniform) prior Beta(1,1) applies if'' one knows that ''both binary outcomes are possible''. Jaynes states: "''interpret the Bayes-Laplace (Beta(1,1)) prior as describing not a state of complete ignorance'', but the state of knowledge in which we have observed one success and one failure...once we have seen at least one success and one failure, then we know that the experiment is a true binary one, in the sense of physical possibility." Jaynes does not specifically discuss Jeffreys prior Beta(1/2,1/2) (Jaynes's discussion of "Jeffreys prior" on pp. 181, 423 and in chapter 12 of his book refers instead to the improper, un-normalized, prior "1/''p'' ''dp''" introduced by Jeffreys in the 1939 edition of his book, seven years before he introduced what is now known as Jeffreys' invariant prior: the square root of the determinant of Fisher's information matrix. ''"1/p" is Jeffreys' (1946) invariant prior for the exponential distribution, not for the Bernoulli or binomial distributions''). However, it follows from the above discussion that Jeffreys Beta(1/2,1/2) prior represents a state of knowledge in between the Haldane Beta(0,0) and Bayes Beta(1,1) prior.

Similarly, Karl Pearson in his 1892 book The Grammar of Science
(p. 144 of the 1900 edition) maintained that the Bayes (Beta(1,1)) uniform prior was not a complete ignorance prior, and that it should be used only when prior information justified "distributing our ignorance equally". K. Pearson wrote: "Yet the only supposition that we appear to have made is this: that, knowing nothing of nature, routine and anomy (from the Greek ανομία, namely: a- "without", and nomos "law") are to be considered as equally likely to occur. Now we were not really justified in making even this assumption, for it involves a knowledge that we do not possess regarding nature. We use our ''experience'' of the constitution and action of coins in general to assert that heads and tails are equally probable, but we have no right to assert before experience that, as we know nothing of nature, routine and breach are equally probable. In our ignorance we ought to consider before experience that nature may consist of all routines, all anomies (normlessness), or a mixture of the two in any proportion whatever, and that all such are equally probable. Which of these constitutions after experience is the most probable must clearly depend on what that experience has been like."

If there is sufficient sampling data, ''and the posterior probability mode is not located at one of the extremes of the domain'' (x = 0 or x = 1), the three priors of Bayes (Beta(1,1)), Jeffreys (Beta(1/2,1/2)) and Haldane (Beta(0,0)) should yield similar ''posterior'' probability densities. Otherwise, as Gelman et al.
(p. 65) point out, "if so few data are available that the choice of noninformative prior distribution makes a difference, one should put relevant information into the prior distribution", or as Berger (p. 125) points out, "when different reasonable priors yield substantially different answers, can it be right to state that there ''is'' a single answer? Would it not be better to admit that there is scientific uncertainty, with the conclusion depending on prior beliefs?"


Occurrence and applications


Order statistics

The beta distribution has an important application in the theory of order statistics. A basic result is that the distribution of the ''k''th smallest of a sample of size ''n'' from a continuous uniform distribution has a beta distribution.David, H. A., Nagaraja, H. N. (2003) ''Order Statistics'' (3rd Edition). Wiley, New Jersey, pp 458. This result is summarized as:

:U_{(k)} \sim \operatorname{Beta}(k,\,n+1-k).

From this, and application of the theory related to the probability integral transform, the distribution of any individual order statistic from any continuous distribution can be derived.
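The order-statistic result above is easy to verify by simulation. The sketch below (a minimal Python example assuming NumPy is available; the chosen ''n'' and ''k'' are only illustrative) compares the empirical mean of the ''k''th smallest uniform variate with the Beta(k, n+1−k) mean k/(n+1).

# Sketch: the k-th smallest of n iid Uniform(0,1) draws follows Beta(k, n+1-k).
import numpy as np

rng = np.random.default_rng(0)
n, k = 10, 3
samples = np.sort(rng.uniform(size=(100_000, n)), axis=1)[:, k - 1]
print(samples.mean(), k / (n + 1))   # both close to 3/11 = 0.2727...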


Subjective logic

In standard logic, propositions are considered to be either true or false. In contradistinction, subjective logic assumes that humans cannot determine with absolute certainty whether a proposition about the real world is absolutely true or false. In subjective logic the posteriori probability estimates of binary events can be represented by beta distributions.A. Jøsang. A Logic for Uncertain Probabilities. ''International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems'', 9(3), pp. 279-311, June 2001.


Wavelet analysis

A wavelet is a wave-like oscillation with an amplitude that starts out at zero, increases, and then decreases back to zero. It can typically be visualized as a "brief oscillation" that promptly decays. Wavelets can be used to extract information from many different kinds of data, including – but certainly not limited to – audio signals and images. Thus, wavelets are purposefully crafted to have specific properties that make them useful for signal processing. Wavelets are localized in both time and frequency whereas the standard Fourier transform is only localized in frequency. Therefore, standard Fourier transforms are only applicable to stationary processes, while wavelets are applicable to non-stationary processes. Continuous wavelets can be constructed based on the beta distribution. Beta waveletsH.M. de Oliveira and G.A.A. Araújo. Compactly Supported One-cyclic Wavelets Derived from Beta Distributions. ''Journal of Communication and Information Systems'', vol. 20, n. 3, pp. 27-33, 2005. can be viewed as a soft variety of Haar wavelets whose shape is fine-tuned by two shape parameters α and β.


Population genetics

The Balding–Nichols model is a two-parameter parametrization of the beta distribution used in population genetics. It is a statistical description of the allele frequencies in the components of a sub-divided population:

:\begin{align}
\alpha &= \mu \nu,\\
\beta  &= (1 - \mu) \nu,
\end{align}

where \nu =\alpha+\beta= \frac{1-F}{F} and 0 < F < 1; here ''F'' is (Wright's) genetic distance between two populations.
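The reparametrization above is a simple change of variables. The sketch below (a minimal Python example; the function name and the chosen values of ''μ'' and ''F'' are only illustrative) converts the Balding–Nichols parameters into beta shape parameters.

# Sketch: convert Balding-Nichols (mean allele frequency mu, Wright's F)
# into the corresponding beta shape parameters alpha and beta.
def balding_nichols_shapes(mu, F):
    nu = (1.0 - F) / F          # nu = alpha + beta
    return mu * nu, (1.0 - mu) * nu

print(balding_nichols_shapes(mu=0.3, F=0.1))   # (2.7, 6.3)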


Project management: task cost and schedule modeling

The beta distribution can be used to model events which are constrained to take place within an interval defined by a minimum and maximum value. For this reason, the beta distribution, along with the triangular distribution, is used extensively in PERT, critical path method (CPM), Joint Cost Schedule Modeling (JCSM) and other project management/control systems to describe the time to completion and the cost of a task. In project management, shorthand computations are widely used to estimate the mean and standard deviation of the beta distribution:

:\begin{align}
\mu(X) & = \frac{a + 4b + c}{6} \\
\sigma(X) & = \frac{c - a}{6}
\end{align}

where ''a'' is the minimum, ''c'' is the maximum, and ''b'' is the most likely value (the mode for ''α'' > 1 and ''β'' > 1).

The above estimate for the mean \mu(X)= \frac{a + 4b + c}{6} is known as the PERT three-point estimation and it is exact for either of the following values of ''β'' (for arbitrary α within these ranges):

:''β'' = ''α'' > 1 (symmetric case) with standard deviation \sigma(X) = \frac{c-a}{2\sqrt{1+2\alpha}}, skewness = 0, and excess kurtosis = \frac{-6}{3+2\alpha}

or

:''β'' = 6 − ''α'' for 5 > ''α'' > 1 (skewed case) with standard deviation

:\sigma(X) = \frac{(c-a)\sqrt{\alpha(6-\alpha)}}{6\sqrt{7}},

skewness = \frac{(3-\alpha)\sqrt{7}}{2\sqrt{\alpha(6-\alpha)}}, and excess kurtosis = \frac{21}{\alpha(6-\alpha)} - 3

The above estimate for the standard deviation ''σ''(''X'') = (''c'' − ''a'')/6 is exact for either of the following values of ''α'' and ''β'':

:''α'' = ''β'' = 4 (symmetric) with skewness = 0, and excess kurtosis = −6/11.
:''β'' = 6 − ''α'' and \alpha = 3 - \sqrt2 (right-tailed, positive skew) with skewness = \frac{1}{\sqrt{2}}, and excess kurtosis = 0
:''β'' = 6 − ''α'' and \alpha = 3 + \sqrt2 (left-tailed, negative skew) with skewness = \frac{-1}{\sqrt{2}}, and excess kurtosis = 0

Otherwise, these can be poor approximations for beta distributions with other values of α and β, exhibiting average errors of 40% in the mean and 549% in the variance.
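As a numerical check of one of the exact cases listed above, the sketch below (a minimal Python example assuming SciPy is available; the interval endpoints are only illustrative) compares the PERT shorthand mean and standard deviation with the exact moments of a Beta(4,4) distribution rescaled to [a, c].

# Sketch: PERT shorthand vs. exact beta moments for the symmetric case alpha = beta = 4.
from scipy.stats import beta

a, c = 2.0, 14.0                       # minimum and maximum
alpha_, beta_ = 4.0, 4.0               # symmetric case
b = a + (c - a) * (alpha_ - 1) / (alpha_ + beta_ - 2)   # mode of the rescaled beta
pert_mean, pert_sd = (a + 4 * b + c) / 6, (c - a) / 6
exact_mean, exact_var = beta.stats(alpha_, beta_, loc=a, scale=c - a, moments="mv")
print(pert_mean, exact_mean)           # 8.0, 8.0
print(pert_sd, exact_var ** 0.5)       # 2.0, 2.0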


Random variate generation

If ''X'' and ''Y'' are independent, with X \sim \Gamma(\alpha, \theta) and Y \sim \Gamma(\beta, \theta) then

:\frac{X}{X+Y} \sim \Beta(\alpha, \beta).

So one algorithm for generating beta variates is to generate \frac{X}{X+Y}, where ''X'' is a gamma variate with parameters (α, 1) and ''Y'' is an independent gamma variate with parameters (β, 1). In fact, here \frac{X}{X+Y} and X+Y are independent, and X+Y \sim \Gamma(\alpha + \beta, \theta). If Z \sim \Gamma(\gamma, \theta) and Z is independent of X and Y, then \frac{X+Y}{X+Y+Z} \sim \Beta(\alpha+\beta,\gamma) and \frac{X+Y}{X+Y+Z} is independent of \frac{X}{X+Y}. This shows that the product of independent \Beta(\alpha,\beta) and \Beta(\alpha+\beta,\gamma) random variables is a \Beta(\alpha,\beta+\gamma) random variable.

Also, the ''k''th order statistic of ''n'' uniformly distributed variates is \Beta(k, n+1-k), so an alternative if α and β are small integers is to generate α + β − 1 uniform variates and choose the α-th smallest.

Another way to generate the Beta distribution is by the Pólya urn model. According to this method, one starts with an "urn" with α "black" balls and β "white" balls and draws uniformly with replacement. Every trial an additional ball is added according to the color of the last ball which was drawn. Asymptotically, the proportion of black and white balls will be distributed according to the Beta distribution, where each repetition of the experiment will produce a different value.

It is also possible to use the inverse transform sampling.
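The gamma-ratio construction described above translates directly into code. The sketch below (a minimal Python example assuming NumPy is available; the shape parameters are only illustrative) generates beta variates as X/(X+Y) and checks the sample mean against α/(α+β).

# Sketch: Beta(alpha, beta) variates from two independent Gamma variates.
import numpy as np

rng = np.random.default_rng(1)
alpha_, beta_ = 2.0, 5.0
x = rng.gamma(shape=alpha_, scale=1.0, size=100_000)
y = rng.gamma(shape=beta_, scale=1.0, size=100_000)
b = x / (x + y)
print(b.mean(), alpha_ / (alpha_ + beta_))   # both close to 2/7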


History

Thomas Bayes, in a posthumous paper published in 1763 by Richard Price, obtained a beta distribution as the density of the probability of success in Bernoulli trials, but the paper does not analyze any of the moments of the beta distribution or discuss any of its properties.

The first systematic modern discussion of the beta distribution is probably due to Karl Pearson. In Pearson's papers the beta distribution is couched as a solution of a differential equation: Pearson's Type I distribution, to which it is essentially identical except for arbitrary shifting and re-scaling (the beta and Pearson Type I distributions can always be equalized by proper choice of parameters). In fact, in several English books and journal articles in the few decades prior to World War II, it was common to refer to the beta distribution as Pearson's Type I distribution. William P. Elderton in his 1906 monograph "Frequency curves and correlation" further analyzes the beta distribution as Pearson's Type I distribution, including a full discussion of the method of moments for the four parameter case, and diagrams of (what Elderton describes as) U-shaped, J-shaped, twisted J-shaped, "cocked-hat" shapes, horizontal and angled straight-line cases. Elderton wrote "I am chiefly indebted to Professor Pearson, but the indebtedness is of a kind for which it is impossible to offer formal thanks." Elderton's 1906 monograph provides an impressive amount of information on the beta distribution, including equations for the origin of the distribution chosen to be the mode, as well as for other Pearson distributions: types I through VII. Elderton also included a number of appendixes, including one appendix ("II") on the beta and gamma functions. In later editions, Elderton added equations for the origin of the distribution chosen to be the mean, and analysis of Pearson distributions VIII through XII.

As remarked by Bowman and Shenton, "Fisher and Pearson had a difference of opinion in the approach to (parameter) estimation, in particular relating to (Pearson's method of) moments and (Fisher's method of) maximum likelihood in the case of the Beta distribution." Also according to Bowman and Shenton, "the case of a Type I (beta distribution) model being the center of the controversy was pure serendipity. A more difficult model of 4 parameters would have been hard to find." The long running public conflict of Fisher with Karl Pearson can be followed in a number of articles in prestigious journals. For example, concerning the estimation of the four parameters for the beta distribution, and Fisher's criticism of Pearson's method of moments as being arbitrary, see Pearson's article "Method of moments and method of maximum likelihood" (published three years after his retirement from University College, London, where his position had been divided between Fisher and Pearson's son Egon), in which Pearson writes "I read (Koshai's paper in the Journal of the Royal Statistical Society, 1933) which as far as I am aware is the only case at present published of the application of Professor Fisher's method. To my astonishment that method depends on first working out the constants of the frequency curve by the (Pearson) Method of Moments and then superposing on it, by what Fisher terms "the Method of Maximum Likelihood" a further approximation to obtain, what he holds, he will thus get, "more efficient values" of the curve constants."

David and Edwards's treatise on the history of statistics cites the first modern treatment of the beta distribution, in 1911, using the beta designation that has become standard, due to Corrado Gini, an Italian statistician, demographer, and sociologist, who developed the Gini coefficient. N.L. Johnson and S. Kotz, in their comprehensive and very informative monograph on leading historical personalities in statistical sciences, credit Corrado Gini as "an early Bayesian...who dealt with the problem of eliciting the parameters of an initial Beta distribution, by singling out techniques which anticipated the advent of the so-called empirical Bayes approach."


References


External links


"Beta Distribution" by Fiona Maclachlan, the Wolfram Demonstrations Project, 2007.
Beta Distribution – Overview and Example, xycoon.com
brighton-webs.co.uk
exstrom.com
Harvard University Statistics 110 Lecture 23 Beta Distribution, Prof. Joe Blitzstein
Mean absolute deviation around the mean

The mean absolute deviation around the mean for the beta distribution with shape parameters α and β is:

:\operatorname{E}[|X - E[X]|] = \frac{2 \alpha^{\alpha} \beta^{\beta}}{B(\alpha,\beta)\,(\alpha+\beta)^{\alpha+\beta+1}}

The mean absolute deviation around the mean is a more robust estimator of statistical dispersion than the standard deviation for beta distributions with tails and inflection points at each side of the mode, Beta(''α'', ''β'') distributions with ''α'', ''β'' > 2, as it depends on the linear (absolute) deviations rather than the square deviations from the mean. Therefore, the effect of very large deviations from the mean is not as overly weighted.

Using Stirling's approximation to the Gamma function, N.L. Johnson and S. Kotz derived the following approximation for values of the shape parameters greater than unity (the relative error for this approximation is only −3.5% for ''α'' = ''β'' = 1, and it decreases to zero as ''α'' → ∞, ''β'' → ∞):

:\begin{align}
\frac{\text{mean abs. dev. from mean}}{\text{standard deviation}} &=\frac{\operatorname{E}[|X - E[X]|]}{\sqrt{\operatorname{var}(X)}}\\
&\approx \sqrt{\frac{2}{\pi}} \left(1+\frac{7}{12 (\alpha+\beta)}-\frac{1}{12 \alpha}-\frac{1}{12 \beta} \right), \text{ if } \alpha, \beta > 1.
\end{align}

At the limit α → ∞, β → ∞, the ratio of the mean absolute deviation to the standard deviation (for the beta distribution) becomes equal to the ratio of the same measures for the normal distribution: \sqrt{\frac{2}{\pi}}. For α = β = 1 this ratio equals \frac{\sqrt{3}}{2}, so that from α = β = 1 to α, β → ∞ the ratio decreases by 8.5%. For α = β = 0 the standard deviation is exactly equal to the mean absolute deviation around the mean. Therefore, this ratio decreases by 15% from α = β = 0 to α = β = 1, and by 25% from α = β = 0 to α, β → ∞. However, for skewed beta distributions such that α → 0 or β → 0, the ratio of the standard deviation to the mean absolute deviation approaches infinity (although each of them, individually, approaches zero) because the mean absolute deviation approaches zero faster than the standard deviation.

Using the parametrization in terms of mean μ and sample size ν = α + β > 0:

:α = μν, β = (1−μ)ν

one can express the mean absolute deviation around the mean in terms of the mean μ and the sample size ν as follows:

:\operatorname{E}[|X - E[X]|] = \frac{2 \mu^{\mu\nu}(1-\mu)^{(1-\mu)\nu}}{\nu\, B(\mu\nu,(1-\mu)\nu)}

For a symmetric distribution, the mean is at the middle of the distribution, μ = 1/2, and therefore:

:\begin{align}
\operatorname{E}[|X - E[X]|] &= \frac{2^{1-\nu}}{\nu\, B(\tfrac{\nu}{2},\tfrac{\nu}{2})} \\
\lim_{\nu \to 0} \left(\lim_{\mu \to \frac{1}{2}} \operatorname{E}[|X - E[X]|] \right) &= \tfrac{1}{2}\\
\lim_{\nu \to \infty} \left(\lim_{\mu \to \frac{1}{2}} \operatorname{E}[|X - E[X]|] \right) &= 0
\end{align}

The mean absolute deviation also vanishes in the limits α → ∞ or β → ∞ (with the other shape parameter finite), and as the mean approaches either end of the support (μ → 0 or μ → 1) for finite sample size ν.
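The closed-form mean absolute deviation above can be checked against simulation. The sketch below (a minimal Python example assuming NumPy and SciPy are available; the shape parameters are only illustrative) compares the formula with a Monte Carlo estimate.

# Sketch: closed-form mean absolute deviation vs. Monte Carlo estimate.
import numpy as np
from scipy.special import beta as B
from scipy.stats import beta

a, b = 2.0, 3.0
mad_formula = 2 * a**a * b**b / (B(a, b) * (a + b) ** (a + b + 1))
x = beta.rvs(a, b, size=200_000, random_state=0)
print(mad_formula, np.abs(x - a / (a + b)).mean())   # both approximately 0.166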


Mean absolute difference

The mean absolute difference for the Beta distribution is:

:\mathrm{MD} = \int_0^1 \int_0^1 f(x;\alpha,\beta)\,f(y;\alpha,\beta)\,|x-y|\,dx\,dy = \left(\frac{4}{\alpha+\beta}\right)\frac{B(\alpha+\beta,\alpha+\beta)}{B(\alpha,\alpha)\,B(\beta,\beta)}

The Gini coefficient for the Beta distribution is half of the relative mean absolute difference:

:\mathrm{G} = \left(\frac{2}{\alpha}\right)\frac{B(\alpha+\beta,\alpha+\beta)}{B(\alpha,\alpha)\,B(\beta,\beta)}
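The sketch below (a minimal Python example assuming NumPy and SciPy are available; the shape parameters are only illustrative) estimates the mean absolute difference by Monte Carlo, compares it with the closed form given above, and uses the general identity Gini = MD/(2·mean).

# Sketch: Monte Carlo check of the mean absolute difference and Gini coefficient.
import numpy as np
from scipy.special import beta as B

a, b = 2.0, 1.0
rng = np.random.default_rng(0)
x, y = rng.beta(a, b, 200_000), rng.beta(a, b, 200_000)
md_mc = np.abs(x - y).mean()
md_formula = (4.0 / (a + b)) * B(a + b, a + b) / (B(a, a) * B(b, b))
print(md_mc, md_formula)                 # both close to 4/15
print(md_formula / (2 * a / (a + b)))    # Gini = MD / (2 * mean) = 0.2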


Skewness

The skewness (the third moment centered on the mean, normalized by the 3/2 power of the variance) of the beta distribution is

:\gamma_1 =\frac{\operatorname{E}[(X - \mu)^3]}{(\operatorname{var}(X))^{3/2}} = \frac{2(\beta-\alpha)\sqrt{\alpha+\beta+1}}{(\alpha+\beta+2)\sqrt{\alpha\beta}} .

Letting α = β in the above expression one obtains γ₁ = 0, showing once again that for α = β the distribution is symmetric and hence the skewness is zero. Positive skew (right-tailed) for α < β, negative skew (left-tailed) for α > β.

Using the parametrization in terms of mean μ and sample size ν = α + β:

:\begin{align}
\alpha & = \mu \nu, \text{ where } \nu =(\alpha + \beta) >0\\
\beta & = (1 - \mu) \nu, \text{ where } \nu =(\alpha + \beta) >0
\end{align}

one can express the skewness in terms of the mean μ and the sample size ν as follows:

:\gamma_1 =\frac{\operatorname{E}[(X - \mu)^3]}{(\operatorname{var}(X))^{3/2}} = \frac{2(1-2\mu)\sqrt{1+\nu}}{(2+\nu)\sqrt{\mu(1-\mu)}}.

The skewness can also be expressed just in terms of the variance ''var'' and the mean μ as follows:

:\gamma_1 =\frac{\operatorname{E}[(X - \mu)^3]}{(\operatorname{var}(X))^{3/2}} = \frac{2(1-2\mu)\sqrt{\operatorname{var}}}{\mu(1-\mu)+\operatorname{var}}\text{ if } \operatorname{var} < \mu(1-\mu)

The accompanying plot of skewness as a function of variance and mean shows that maximum variance (1/4) is coupled with zero skewness and the symmetry condition (μ = 1/2), and that maximum skewness (positive or negative infinity) occurs when the mean is located at one end or the other, so that the "mass" of the probability distribution is concentrated at the ends (minimum variance).

The following expression for the square of the skewness, in terms of the sample size ν = α + β and the variance ''var'', is useful for the method of moments estimation of four parameters:

:(\gamma_1)^2 =\frac{(\operatorname{E}[(X - \mu)^3])^2}{(\operatorname{var}(X))^3} = \frac{4}{(2+\nu)^2}\bigg(\frac{1}{\operatorname{var}}-4(1+\nu)\bigg)

This expression correctly gives a skewness of zero for α = β, since in that case \operatorname{var} = \frac{1}{4(1+\nu)}.

For the symmetric case (α = β), skewness = 0 over the whole range, and the following limits apply:

:\lim_{\alpha=\beta\to 0} \gamma_1 = \lim_{\alpha=\beta\to\infty} \gamma_1 =\lim_{\nu\to 0} \gamma_1=\lim_{\nu\to\infty} \gamma_1=\lim_{\mu\to\frac{1}{2}} \gamma_1 = 0

For the asymmetric cases (α ≠ β) the following limits (with only the noted variable approaching the limit) can be obtained from the above expressions:

:\begin{align}
&\lim_{\alpha\to 0} \gamma_1 =\lim_{\mu\to 0} \gamma_1 = \infty\\
&\lim_{\beta\to 0} \gamma_1 = \lim_{\mu\to 1} \gamma_1= - \infty\\
&\lim_{\alpha\to\infty} \gamma_1 = -\frac{2}{\sqrt{\beta}},\quad \lim_{\beta\to 0}(\lim_{\alpha\to\infty} \gamma_1) = -\infty,\quad \lim_{\beta\to\infty}(\lim_{\alpha\to\infty} \gamma_1) = 0\\
&\lim_{\beta\to\infty} \gamma_1 = \frac{2}{\sqrt{\alpha}},\quad \lim_{\alpha\to 0}(\lim_{\beta\to\infty} \gamma_1) = \infty,\quad \lim_{\alpha\to\infty}(\lim_{\beta\to\infty} \gamma_1) = 0\\
&\lim_{\nu\to 0} \gamma_1 = \frac{1-2\mu}{\sqrt{\mu(1-\mu)}},\quad \lim_{\mu\to 0}(\lim_{\nu\to 0} \gamma_1) = \infty,\quad \lim_{\mu\to 1}(\lim_{\nu\to 0} \gamma_1) = - \infty
\end{align}
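The skewness formula above is easy to verify numerically. The sketch below (a minimal Python example assuming SciPy is available; the shape parameters are only illustrative) evaluates it against SciPy's built-in value.

# Sketch: closed-form skewness vs. scipy's value.
from math import sqrt
from scipy.stats import beta

a, b = 2.0, 5.0
skew_formula = 2 * (b - a) * sqrt(a + b + 1) / ((a + b + 2) * sqrt(a * b))
print(skew_formula, beta.stats(a, b, moments="s"))   # both approximately 0.596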


Kurtosis

The beta distribution has been applied in acoustic analysis to assess damage to gears, as the kurtosis of the beta distribution has been reported to be a good indicator of the condition of a gear. Kurtosis has also been used to distinguish the seismic signal generated by a person's footsteps from other signals. As persons or other targets moving on the ground generate continuous signals in the form of seismic waves, one can separate different targets based on the seismic waves they generate. Kurtosis is sensitive to impulsive signals, so it's much more sensitive to the signal generated by human footsteps than other signals generated by vehicles, winds, noise, etc. Unfortunately, the notation for kurtosis has not been standardized. Kenney and Keeping use the symbol γ₂ for the excess kurtosis, but Abramowitz and Stegun use different terminology. To prevent confusion between kurtosis (the fourth moment centered on the mean, normalized by the square of the variance) and excess kurtosis, when using symbols, they will be spelled out as follows:

:\begin{align}
\text{excess kurtosis} &=\text{kurtosis} - 3\\
&=\frac{\operatorname{E}[(X - \mu)^4]}{(\operatorname{var}(X))^2}-3\\
&=\frac{6[\alpha^3-\alpha^2(2\beta-1)+\beta^2(\beta+1)-2\alpha\beta(\beta+2)]}{\alpha \beta (\alpha+\beta+2)(\alpha+\beta+3)}\\
&=\frac{6[(\alpha-\beta)^2 (\alpha+\beta+1) - \alpha \beta (\alpha+\beta+2)]}{\alpha \beta (\alpha+\beta+2)(\alpha+\beta+3)} .
\end{align}

Letting α = β in the above expression one obtains

:\text{excess kurtosis} =- \frac{6}{3+2\alpha}\text{ if }\alpha=\beta .

Therefore, for symmetric beta distributions, the excess kurtosis is negative, increasing from a minimum value of −2 at the limit as α = β → 0, and approaching a maximum value of zero as α = β → ∞. The value of −2 is the minimum value of excess kurtosis that any distribution (not just beta distributions, but any distribution of any possible kind) can ever achieve. This minimum value is reached when all the probability density is entirely concentrated at each end ''x'' = 0 and ''x'' = 1, with nothing in between: a 2-point Bernoulli distribution with equal probability 1/2 at each end (a coin toss: see section below "Kurtosis bounded by the square of the skewness" for further discussion). The description of kurtosis as a measure of the "potential outliers" (or "potential rare, extreme values") of the probability distribution is correct for all distributions including the beta distribution. When rare, extreme values can occur in the beta distribution, the higher its kurtosis; otherwise, the kurtosis is lower. For α ≠ β, skewed beta distributions, the excess kurtosis can reach unlimited positive values (particularly for α → 0 for finite β, or for β → 0 for finite α) because the side away from the mode will produce occasional extreme values. Minimum kurtosis takes place when the mass density is concentrated equally at each end (and therefore the mean is at the center), and there is no probability mass density in between the ends.

Using the parametrization in terms of mean μ and sample size ν = α + β:

:\begin{align}
\alpha & = \mu \nu, \text{ where } \nu =(\alpha + \beta) >0\\
\beta & = (1 - \mu) \nu, \text{ where } \nu =(\alpha + \beta) >0
\end{align}

one can express the excess kurtosis in terms of the mean μ and the sample size ν as follows:

:\text{excess kurtosis} =\frac{6}{3+\nu}\bigg(\frac{(1-2\mu)^2(1+\nu)}{\mu(1-\mu)(2+\nu)} - 1 \bigg )

The excess kurtosis can also be expressed in terms of just the following two parameters: the variance ''var'', and the sample size ν as follows:

:\text{excess kurtosis} =\frac{6}{(2+\nu)(3+\nu)}\left(\frac{1}{\operatorname{var}} - 6 - 5 \nu \right)\text{ if }\operatorname{var}< \mu(1-\mu)

and, in terms of the variance ''var'' and the mean μ as follows:

:\text{excess kurtosis} =\frac{6 \operatorname{var}\,(1 - 5\mu(1-\mu) - \operatorname{var})}{(\mu(1-\mu)+\operatorname{var})(\mu(1-\mu)+2\operatorname{var})}\text{ if }\operatorname{var}< \mu(1-\mu)

The plot of excess kurtosis as a function of the variance and the mean shows that the minimum value of the excess kurtosis (−2, which is the minimum possible value for excess kurtosis for any distribution) is intimately coupled with the maximum value of variance (1/4) and the symmetry condition: the mean occurring at the midpoint (μ = 1/2). This occurs for the symmetric case of α = β = 0, with zero skewness. At the limit, this is the 2-point Bernoulli distribution with equal probability 1/2 at each Dirac delta function end ''x'' = 0 and ''x'' = 1 and zero probability everywhere else. (A coin toss: one face of the coin being ''x'' = 0 and the other face being ''x'' = 1.) Variance is maximum because the distribution is bimodal with nothing in between the two modes (spikes) at each end. Excess kurtosis is minimum: the probability density "mass" is zero at the mean and it is concentrated at the two peaks at each end. Excess kurtosis reaches the minimum possible value (for any distribution) when the probability density function has two spikes at each end: it is bi-"peaky" with nothing in between them.

On the other hand, the plot shows that for extreme skewed cases, where the mean is located near one or the other end (μ = 0 or μ = 1), the variance is close to zero, and the excess kurtosis rapidly approaches infinity when the mean of the distribution approaches either end.

Alternatively, the excess kurtosis can also be expressed in terms of just the following two parameters: the square of the skewness, and the sample size ν as follows:

:\text{excess kurtosis} =\frac{6}{3+\nu}\bigg(\frac{2+\nu}{4} (\text{skewness})^2 - 1\bigg),\text{ where }(\text{skewness})^2-2< \text{excess kurtosis}< \tfrac{3}{2} (\text{skewness})^2

From this last expression, one can obtain the same limits published practically a century ago by Karl Pearson in his paper, for the beta distribution (see section below titled "Kurtosis bounded by the square of the skewness"). Setting α + β = ν = 0 in the above expression, one obtains Pearson's lower boundary (values for the skewness and excess kurtosis below the boundary (excess kurtosis + 2 − skewness² = 0) cannot occur for any distribution, and hence Karl Pearson appropriately called the region below this boundary the "impossible region"). The limit of α + β = ν → ∞ determines Pearson's upper boundary.

:\begin{align}
&\lim_{\nu \to 0}\text{excess kurtosis} = (\text{skewness})^2 - 2\\
&\lim_{\nu \to \infty}\text{excess kurtosis} = \tfrac{3}{2} (\text{skewness})^2
\end{align}

therefore:

:(\text{skewness})^2-2< \text{excess kurtosis}< \tfrac{3}{2} (\text{skewness})^2

Values of ν = α + β such that ν ranges from zero to infinity, 0 < ν < ∞, span the whole region of the beta distribution in the plane of excess kurtosis versus squared skewness.

For the symmetric case (α = β), the following limits apply:

:\begin{align}
&\lim_{\alpha = \beta \to 0} \text{excess kurtosis} = - 2 \\
&\lim_{\alpha = \beta \to \infty} \text{excess kurtosis} = 0 \\
&\lim_{\mu \to \frac{1}{2}} \text{excess kurtosis} = - \frac{6}{3+\nu}
\end{align}

For the unsymmetric cases (α ≠ β) the following limits (with only the noted variable approaching the limit) can be obtained from the above expressions:

:\begin{align}
&\lim_{\alpha\to 0}\text{excess kurtosis} =\lim_{\beta \to 0} \text{excess kurtosis} = \lim_{\mu\to 0}\text{excess kurtosis} = \lim_{\mu\to 1}\text{excess kurtosis} =\infty\\
&\lim_{\alpha\to\infty}\text{excess kurtosis} = \frac{6}{\beta},\quad \lim_{\beta\to 0}(\lim_{\alpha\to\infty} \text{excess kurtosis}) = \infty,\quad \lim_{\beta\to\infty}(\lim_{\alpha\to\infty} \text{excess kurtosis}) = 0\\
&\lim_{\beta\to\infty}\text{excess kurtosis} = \frac{6}{\alpha},\quad \lim_{\alpha\to 0}(\lim_{\beta\to\infty} \text{excess kurtosis}) = \infty,\quad \lim_{\alpha\to\infty}(\lim_{\beta\to\infty} \text{excess kurtosis}) = 0\\
&\lim_{\nu \to 0} \text{excess kurtosis} = - 6 + \frac{1}{\mu(1-\mu)},\quad \lim_{\mu\to 0}(\lim_{\nu \to 0} \text{excess kurtosis}) = \infty,\quad \lim_{\mu\to 1}(\lim_{\nu \to 0} \text{excess kurtosis}) = \infty
\end{align}
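The closed-form excess kurtosis above can be verified numerically. The sketch below (a minimal Python example assuming SciPy is available; the shape parameters are only illustrative) compares it with SciPy's built-in (Fisher) excess kurtosis.

# Sketch: closed-form excess kurtosis vs. scipy's value.
from scipy.stats import beta

a, b = 2.0, 5.0
num = 6 * ((a - b) ** 2 * (a + b + 1) - a * b * (a + b + 2))
den = a * b * (a + b + 2) * (a + b + 3)
print(num / den, beta.stats(a, b, moments="k"))   # both -0.12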


Characteristic function

The characteristic function is the Fourier transform of the probability density function. The characteristic function of the beta distribution is Kummer's confluent hypergeometric function (of the first kind):

:\begin{align}
\varphi_X(\alpha;\beta;t) &= \operatorname{E}\left[e^{itX}\right]\\
&= \int_0^1 e^{itx} f(x;\alpha,\beta)\,dx \\
&= {}_1F_1(\alpha; \alpha+\beta; it)\\
&=\sum_{n=0}^\infty \frac{\alpha^{(n)}}{(\alpha+\beta)^{(n)}} \frac{(it)^n}{n!} \\
&= 1 +\sum_{k=1}^{\infty} \left( \prod_{r=0}^{k-1} \frac{\alpha+r}{\alpha+\beta+r} \right) \frac{(it)^k}{k!}
\end{align}

where

:x^{(n)}=x(x+1)(x+2)\cdots(x+n-1)

is the rising factorial, also called the "Pochhammer symbol". The value of the characteristic function for ''t'' = 0 is one:

:\varphi_X(\alpha;\beta;0)={}_1F_1(\alpha; \alpha+\beta; 0) = 1 .

Also, the real and imaginary parts of the characteristic function enjoy the following symmetries with respect to the origin of variable ''t'':

:\textrm{Re} \left [ {}_1F_1(\alpha; \alpha+\beta; it) \right ] = \textrm{Re} \left [ {}_1F_1(\alpha; \alpha+\beta; - it) \right ]

:\textrm{Im} \left [ {}_1F_1(\alpha; \alpha+\beta; it) \right ] = - \textrm{Im} \left [ {}_1F_1(\alpha; \alpha+\beta; - it) \right ]

The symmetric case α = β simplifies the characteristic function of the beta distribution to a Bessel function, since in the special case α + β = 2α the confluent hypergeometric function (of the first kind) reduces to a Bessel function (the modified Bessel function of the first kind I_{\alpha-\frac{1}{2}}) using Kummer's second transformation as follows:

:\begin{align} {}_1F_1(\alpha;2\alpha; it) &= e^{\frac{it}{2}}\, {}_0F_1 \left(; \alpha+\tfrac{1}{2}; \frac{(it)^2}{16} \right) \\
&= e^{\frac{it}{2}} \left(\frac{it}{4}\right)^{\frac{1}{2}-\alpha} \Gamma\left(\alpha+\tfrac{1}{2}\right) I_{\alpha-\frac{1}{2}}\left(\frac{it}{2}\right).\end{align}

(An application of the symmetric case α = β = ''n''/2 arises in beamforming.) In the accompanying plots, the real part (Re) of the characteristic function of the beta distribution is displayed for symmetric (α = β) and skewed (α ≠ β) cases.
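The identification of the characteristic function with Kummer's function can be checked numerically. The sketch below (a minimal Python example assuming NumPy, SciPy and mpmath are available; the parameter values are only illustrative) compares a midpoint-rule evaluation of the integral with ₁F₁(α; α+β; it).

# Sketch: characteristic function by numerical integration vs. Kummer's 1F1.
import numpy as np
from scipy.stats import beta
from mpmath import hyp1f1

a, b, t = 2.0, 3.0, 1.5
x = (np.arange(200_000) + 0.5) / 200_000             # midpoint grid on (0, 1)
cf_numeric = np.mean(np.exp(1j * t * x) * beta.pdf(x, a, b))
cf_kummer = complex(hyp1f1(a, a + b, 1j * t))
print(cf_numeric, cf_kummer)    # agree to several decimal places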


Other moments


Moment generating function

It also follows that the moment generating function is

:\begin{align}
M_X(\alpha; \beta; t) &= \operatorname{E}\left[e^{tX}\right] \\
&= \int_0^1 e^{tx} f(x;\alpha,\beta)\,dx \\
&= {}_1F_1(\alpha; \alpha+\beta; t) \\
&= \sum_{n=0}^\infty \frac{\alpha^{(n)}}{(\alpha+\beta)^{(n)}} \frac{t^n}{n!} \\
&= 1 +\sum_{k=1}^{\infty} \left( \prod_{r=0}^{k-1} \frac{\alpha+r}{\alpha+\beta+r} \right) \frac{t^k}{k!}
\end{align}

In particular ''M''_''X''(''α''; ''β''; 0) = 1.


Higher moments

Using the moment generating function, the ''k''-th raw moment is given by the factor

:\prod_{r=0}^{k-1} \frac{\alpha+r}{\alpha+\beta+r}

multiplying the (exponential series) term \left(\frac{t^k}{k!}\right) in the series of the moment generating function

:\operatorname{E}[X^k]= \frac{\alpha^{(k)}}{(\alpha + \beta)^{(k)}} = \prod_{r=0}^{k-1} \frac{\alpha+r}{\alpha+\beta+r}

where (''x'')^(''k'') is a Pochhammer symbol representing rising factorial. It can also be written in a recursive form as

:\operatorname{E}[X^k] = \frac{\alpha + k - 1}{\alpha + \beta + k - 1}\operatorname{E}[X^{k-1}].

Since the moment generating function M_X(\alpha; \beta; \cdot) has a positive radius of convergence, the beta distribution is determined by its moments.
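The rising-factorial expression for the raw moments translates directly into code. The sketch below (a minimal Python example assuming SciPy is available; the function name and parameters are only illustrative) checks the product formula against SciPy's moment computation.

# Sketch: k-th raw moment via the rising-factorial ratio vs. scipy.
from scipy.stats import beta

def raw_moment(a, b, k):
    m = 1.0
    for r in range(k):                 # product of (a + r)/(a + b + r)
        m *= (a + r) / (a + b + r)
    return m

a, b = 2.0, 3.0
print(raw_moment(a, b, 3), beta.moment(3, a, b))   # both 4/35 = 0.1142...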


Moments of transformed random variables


Moments of linearly transformed, product and inverted random variables

One can also show the following expectations for a transformed random variable, where the random variable ''X'' is Beta-distributed with parameters α and β: ''X'' ~ Beta(α, β). The expected value of the variable 1 − ''X'' is the mirror-symmetry of the expected value based on ''X'':

:\begin{align}
& \operatorname{E}[1-X] = \frac{\beta}{\alpha+\beta} \\
& \operatorname{E}[X (1-X)] =\operatorname{E}[(1-X)X ] =\frac{\alpha\beta}{(\alpha+\beta)(\alpha+\beta+1)}
\end{align}

Due to the mirror-symmetry of the probability density function of the beta distribution, the variances based on variables ''X'' and 1 − ''X'' are identical, and the covariance of ''X'' and (1 − ''X'') is the negative of the variance:

:\operatorname{var}[(1-X)]=\operatorname{var}[X] = -\operatorname{cov}[X,(1-X)]= \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}

These are the expected values for inverted variables (these are related to the harmonic means):

:\begin{align}
& \operatorname{E} \left [\frac{1}{X} \right ] = \frac{\alpha+\beta-1}{\alpha-1} \text{ if } \alpha > 1\\
& \operatorname{E}\left [\frac{1}{1-X} \right ] =\frac{\alpha+\beta-1}{\beta-1} \text{ if } \beta > 1
\end{align}

The following transformation by dividing the variable ''X'' by its mirror-image, ''X''/(1 − ''X''), results in the expected value of the "inverted beta distribution" or beta prime distribution (also known as beta distribution of the second kind or Pearson's Type VI):

:\begin{align}
& \operatorname{E}\left[\frac{X}{1-X}\right] =\frac{\alpha}{\beta-1} \text{ if }\beta > 1\\
& \operatorname{E}\left[\frac{1-X}{X}\right] =\frac{\beta}{\alpha-1}\text{ if }\alpha > 1
\end{align}

Variances of these transformed variables can be obtained by integration, as the expected values of the second moments centered on the corresponding variables:

:\operatorname{var} \left[\frac{1-X}{X} \right] =\operatorname{E}\left[\left(\frac{1-X}{X} - \operatorname{E}\left[\frac{1-X}{X} \right ] \right )^2\right]= \operatorname{var}\left [\frac{1}{X} \right ] =\operatorname{E} \left [\left (\frac{1}{X} - \operatorname{E}\left [\frac{1}{X} \right ] \right )^2 \right ]= \frac{\beta(\alpha+\beta-1)}{(\alpha-2)(\alpha-1)^2} \text{ if }\alpha > 2

The following variance of the variable ''X'' divided by its mirror-image, ''X''/(1−''X''), results in the variance of the "inverted beta distribution" or beta prime distribution (also known as beta distribution of the second kind or Pearson's Type VI):

:\operatorname{var} \left [\frac{X}{1-X} \right ] =\operatorname{E} \left [\left(\frac{X}{1-X} - \operatorname{E} \left [\frac{X}{1-X} \right ] \right)^2 \right ]=\operatorname{var} \left [\frac{1}{1-X} \right ] = \operatorname{E} \left [\left (\frac{1}{1-X} - \operatorname{E} \left [\frac{1}{1-X} \right ] \right )^2 \right ]= \frac{\alpha(\alpha+\beta-1)}{(\beta-2)(\beta-1)^2} \text{ if }\beta > 2

The covariances are:

:\operatorname{cov}\left [\frac{1}{X},\frac{1}{1-X} \right ] = \operatorname{cov}\left[\frac{1-X}{X},\frac{X}{1-X} \right] =\operatorname{cov}\left[\frac{1}{X},\frac{X}{1-X}\right ] = \operatorname{cov}\left[\frac{1}{1-X},\frac{1-X}{X} \right] = -\frac{\alpha+\beta-1}{(\alpha-1)(\beta-1)} \text{ if } \alpha, \beta > 1

These expectations and variances appear in the four-parameter Fisher information matrix.


Moments of logarithmically transformed random variables

= Expected values for Logarithm transformation, logarithmic transformations (useful for maximum likelihood estimates, see ) are discussed in this section. The following logarithmic linear transformations are related to the geometric means ''GX'' and ''G''(1−''X'') (see ): :\begin \operatorname[\ln(X)] &= \psi(\alpha) - \psi(\alpha + \beta)= - \operatorname\left[\ln \left (\frac \right )\right],\\ \operatorname[\ln(1-X)] &=\psi(\beta) - \psi(\alpha + \beta)= - \operatorname \left[\ln \left (\frac \right )\right]. \end Where the
digamma function
ψ(α) is defined as the logarithmic derivative of the
gamma function
: :\psi(\alpha) = \frac Logit transformations are interesting, as they usually transform various shapes (including J-shapes) into (usually skewed) bell-shaped densities over the logit variable, and they may remove the end singularities over the original variable: :\begin \operatorname\left[\ln \left (\frac \right ) \right] &=\psi(\alpha) - \psi(\beta)= \operatorname[\ln(X)] +\operatorname \left[\ln \left (\frac \right) \right],\\ \operatorname\left [\ln \left (\frac \right ) \right ] &=\psi(\beta) - \psi(\alpha)= - \operatorname \left[\ln \left (\frac \right) \right] . \end Johnson considered the distribution of the logit - transformed variable ln(''X''/1−''X''), including its moment generating function and approximations for large values of the shape parameters. This transformation extends the finite support
[0, 1]
based on the original variable ''X'' to infinite support in both directions of the real line (−∞, +∞). Higher order logarithmic moments can be derived by using the representation of a beta distribution as a proportion of two Gamma distributions and differentiating through the integral. They can be expressed in terms of higher order poly-gamma functions as follows: :\begin \operatorname \left [\ln^2(X) \right ] &= (\psi(\alpha) - \psi(\alpha + \beta))^2+\psi_1(\alpha)-\psi_1(\alpha+\beta), \\ \operatorname \left [\ln^2(1-X) \right ] &= (\psi(\beta) - \psi(\alpha + \beta))^2+\psi_1(\beta)-\psi_1(\alpha+\beta), \\ \operatorname \left [\ln (X)\ln(1-X) \right ] &=(\psi(\alpha) - \psi(\alpha + \beta))(\psi(\beta) - \psi(\alpha + \beta)) -\psi_1(\alpha+\beta). \end therefore the
variance
of the logarithmic variables and
covariance
of ln(''X'') and ln(1−''X'') are: :\begin \operatorname[\ln(X), \ln(1-X)] &= \operatorname\left[\ln(X)\ln(1-X)\right] - \operatorname[\ln(X)]\operatorname[\ln(1-X)] = -\psi_1(\alpha+\beta) \\ & \\ \operatorname[\ln X] &= \operatorname[\ln^2(X)] - (\operatorname[\ln(X)])^2 \\ &= \psi_1(\alpha) - \psi_1(\alpha + \beta) \\ &= \psi_1(\alpha) + \operatorname[\ln(X), \ln(1-X)] \\ & \\ \operatorname ln (1-X)&= \operatorname[\ln^2 (1-X)] - (\operatorname[\ln (1-X)])^2 \\ &= \psi_1(\beta) - \psi_1(\alpha + \beta) \\ &= \psi_1(\beta) + \operatorname[\ln (X), \ln(1-X)] \end where the
trigamma function
, denoted ψ1(α), is the second of the
polygamma function
s, and is defined as the derivative of the digamma function: :\psi_1(\alpha) = \frac= \frac. The variances and covariance of the logarithmically transformed variables ''X'' and (1−''X'') are different, in general, because the logarithmic transformation destroys the mirror-symmetry of the original variables ''X'' and (1−''X''), as the logarithm approaches negative infinity for the variable approaching zero. These logarithmic variances and covariance are the elements of the
Fisher information
matrix for the beta distribution. They are also a measure of the curvature of the log likelihood function (see section on Maximum likelihood estimation). The variances of the log inverse variables are identical to the variances of the log variables: :\begin \operatorname\left[\ln \left (\frac \right ) \right] & =\operatorname[\ln(X)] = \psi_1(\alpha) - \psi_1(\alpha + \beta), \\ \operatorname\left[\ln \left (\frac \right ) \right] &=\operatorname ln (1-X)= \psi_1(\beta) - \psi_1(\alpha + \beta), \\ \operatorname\left[\ln \left (\frac \right), \ln \left (\frac\right ) \right] &=\operatorname[\ln(X),\ln(1-X)]= -\psi_1(\alpha + \beta).\end It also follows that the variances of the logit transformed variables are: :\operatorname\left[\ln \left (\frac \right )\right]=\operatorname\left[\ln \left (\frac \right ) \right]=-\operatorname\left [\ln \left (\frac \right ), \ln \left (\frac \right ) \right]= \psi_1(\alpha) + \psi_1(\beta)
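As an informal numerical check of these identities, the following Python sketch (assuming NumPy and SciPy are available; the shape values 2 and 5 are arbitrary example choices) compares the digamma and trigamma expressions above with Monte Carlo estimates:

import numpy as np
from scipy.special import digamma, polygamma
from scipy.stats import beta

a, b = 2.0, 5.0                       # arbitrary example shape parameters
x = beta.rvs(a, b, size=200_000, random_state=np.random.default_rng(0))

mean_lnX = digamma(a) - digamma(a + b)            # E[ln X]
var_lnX = polygamma(1, a) - polygamma(1, a + b)   # var[ln X] (trigamma differences)
cov_ln = -polygamma(1, a + b)                     # cov[ln X, ln(1 - X)]
var_logit = polygamma(1, a) + polygamma(1, b)     # var[ln(X/(1 - X))]

print(mean_lnX, np.log(x).mean())                 # each pair should agree closely
print(var_lnX, np.log(x).var())
print(cov_ln, np.cov(np.log(x), np.log1p(-x))[0, 1])
print(var_logit, np.var(np.log(x) - np.log1p(-x)))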


Quantities of information (entropy)

Given a beta distributed random variable, ''X'' ~ Beta(''α'', ''β''), the information entropy, differential entropy of ''X'' is (measured in Nat (unit), nats), the expected value of the negative of the logarithm of the
probability density function
: :\begin h(X) &= \operatorname[-\ln(f(x;\alpha,\beta))] \\ pt&=\int_0^1 -f(x;\alpha,\beta)\ln(f(x;\alpha,\beta)) \, dx \\ pt&= \ln(\Beta(\alpha,\beta))-(\alpha-1)\psi(\alpha)-(\beta-1)\psi(\beta)+(\alpha+\beta-2) \psi(\alpha+\beta) \end where ''f''(''x''; ''α'', ''β'') is the
probability density function
of the beta distribution: :f(x;\alpha,\beta) = \frac x^(1-x)^ The
digamma function
''ψ'' appears in the formula for the differential entropy as a consequence of Euler's integral formula for the harmonic numbers which follows from the integral: :\int_0^1 \frac \, dx = \psi(\alpha)-\psi(1) The information entropy, differential entropy of the beta distribution is negative for all values of ''α'' and ''β'' greater than zero, except at ''α'' = ''β'' = 1 (for which values the beta distribution is the same as the Uniform distribution (continuous), uniform distribution), where the information entropy, differential entropy reaches its Maxima and minima, maximum value of zero. It is to be expected that the maximum entropy should take place when the beta distribution becomes equal to the uniform distribution, since uncertainty is maximal when all possible events are equiprobable. For ''α'' or ''β'' approaching zero, the information entropy, differential entropy approaches its Maxima and minima, minimum value of negative infinity. For (either or both) ''α'' or ''β'' approaching zero, there is a maximum amount of order: all the probability density is concentrated at the ends, and there is zero probability density at points located between the ends. Similarly for (either or both) ''α'' or ''β'' approaching infinity, the differential entropy approaches its minimum value of negative infinity, and a maximum amount of order. If either ''α'' or ''β'' approaches infinity (and the other is finite) all the probability density is concentrated at an end, and the probability density is zero everywhere else. If both shape parameters are equal (the symmetric case), ''α'' = ''β'', and they approach infinity simultaneously, the probability density becomes a spike ( Dirac delta function) concentrated at the middle ''x'' = 1/2, and hence there is 100% probability at the middle ''x'' = 1/2 and zero probability everywhere else. The (continuous case) information entropy, differential entropy was introduced by Shannon in his original paper (where he named it the "entropy of a continuous distribution"), as the concluding part of the same paper where he defined the information entropy, discrete entropy. It is known since then that the differential entropy may differ from the infinitesimal limit of the discrete entropy by an infinite offset, therefore the differential entropy can be negative (as it is for the beta distribution). What really matters is the relative value of entropy. Given two beta distributed random variables, ''X''1 ~ Beta(''α'', ''β'') and ''X''2 ~ Beta(''α''′, ''β''′), the cross entropy is (measured in nats) :\begin H(X_1,X_2) &= \int_0^1 - f(x;\alpha,\beta) \ln (f(x;\alpha',\beta')) \,dx \\ pt&= \ln \left(\Beta(\alpha',\beta')\right)-(\alpha'-1)\psi(\alpha)-(\beta'-1)\psi(\beta)+(\alpha'+\beta'-2)\psi(\alpha+\beta). \end The cross entropy has been used as an error metric to measure the distance between two hypotheses. Its absolute value is minimum when the two distributions are identical. It is the information measure most closely related to the log maximum likelihood (see section on "Parameter estimation. Maximum likelihood estimation")). The relative entropy, or Kullback–Leibler divergence ''D''KL(''X''1 , , ''X''2), is a measure of the inefficiency of assuming that the distribution is ''X''2 ~ Beta(''α''′, ''β''′) when the distribution is really ''X''1 ~ Beta(''α'', ''β''). It is defined as follows (measured in nats). 
:\begin D_(X_1, , X_2) &= \int_0^1 f(x;\alpha,\beta) \ln \left (\frac \right ) \, dx \\ pt&= \left (\int_0^1 f(x;\alpha,\beta) \ln (f(x;\alpha,\beta)) \,dx \right )- \left (\int_0^1 f(x;\alpha,\beta) \ln (f(x;\alpha',\beta')) \, dx \right )\\ pt&= -h(X_1) + H(X_1,X_2)\\ pt&= \ln\left(\frac\right)+(\alpha-\alpha')\psi(\alpha)+(\beta-\beta')\psi(\beta)+(\alpha'-\alpha+\beta'-\beta)\psi (\alpha + \beta). \end The relative entropy, or Kullback–Leibler divergence, is always non-negative. A few numerical examples follow: *''X''1 ~ Beta(1, 1) and ''X''2 ~ Beta(3, 3); ''D''KL(''X''1 , , ''X''2) = 0.598803; ''D''KL(''X''2 , , ''X''1) = 0.267864; ''h''(''X''1) = 0; ''h''(''X''2) = −0.267864 *''X''1 ~ Beta(3, 0.5) and ''X''2 ~ Beta(0.5, 3); ''D''KL(''X''1 , , ''X''2) = 7.21574; ''D''KL(''X''2 , , ''X''1) = 7.21574; ''h''(''X''1) = −1.10805; ''h''(''X''2) = −1.10805. The Kullback–Leibler divergence is not symmetric ''D''KL(''X''1 , , ''X''2) ≠ ''D''KL(''X''2 , , ''X''1) for the case in which the individual beta distributions Beta(1, 1) and Beta(3, 3) are symmetric, but have different entropies ''h''(''X''1) ≠ ''h''(''X''2). The value of the Kullback divergence depends on the direction traveled: whether going from a higher (differential) entropy to a lower (differential) entropy or the other way around. In the numerical example above, the Kullback divergence measures the inefficiency of assuming that the distribution is (bell-shaped) Beta(3, 3), rather than (uniform) Beta(1, 1). The "h" entropy of Beta(1, 1) is higher than the "h" entropy of Beta(3, 3) because the uniform distribution Beta(1, 1) has a maximum amount of disorder. The Kullback divergence is more than two times higher (0.598803 instead of 0.267864) when measured in the direction of decreasing entropy: the direction that assumes that the (uniform) Beta(1, 1) distribution is (bell-shaped) Beta(3, 3) rather than the other way around. In this restricted sense, the Kullback divergence is consistent with the second law of thermodynamics. The Kullback–Leibler divergence is symmetric ''D''KL(''X''1 , , ''X''2) = ''D''KL(''X''2 , , ''X''1) for the skewed cases Beta(3, 0.5) and Beta(0.5, 3) that have equal differential entropy ''h''(''X''1) = ''h''(''X''2). The symmetry condition: :D_(X_1, , X_2) = D_(X_2, , X_1),\texth(X_1) = h(X_2),\text\alpha \neq \beta follows from the above definitions and the mirror-symmetry ''f''(''x''; ''α'', ''β'') = ''f''(1−''x''; ''α'', ''β'') enjoyed by the beta distribution.
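These closed-form expressions are easy to transcribe; the Python sketch below (a minimal illustration assuming SciPy; the helper names beta_entropy and beta_kl are introduced here, not taken from any library) reproduces the numerical examples listed above:

import numpy as np
from scipy.special import betaln, digamma
from scipy.stats import beta

def beta_entropy(a, b):
    # differential entropy h(X) of Beta(a, b), in nats
    return betaln(a, b) - (a - 1)*digamma(a) - (b - 1)*digamma(b) + (a + b - 2)*digamma(a + b)

def beta_kl(a1, b1, a2, b2):
    # Kullback-Leibler divergence D_KL( Beta(a1,b1) || Beta(a2,b2) ), in nats
    return (betaln(a2, b2) - betaln(a1, b1)
            + (a1 - a2)*digamma(a1) + (b1 - b2)*digamma(b1)
            + (a2 - a1 + b2 - b1)*digamma(a1 + b1))

print(beta_entropy(1, 1), beta_entropy(3, 3))              # 0 and about -0.267864
print(beta_kl(1, 1, 3, 3), beta_kl(3, 3, 1, 1))            # about 0.598803 and 0.267864
print(beta_kl(3, 0.5, 0.5, 3), beta_kl(0.5, 3, 3, 0.5))    # both about 7.21574
print(np.isclose(beta_entropy(3, 3), beta.entropy(3, 3)))  # cross-check against SciPy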


Relationships between statistical measures


Mean, mode and median relationship

If 1 < α < β then mode ≤ median ≤ mean (Kerman J (2011) "A closed-form approximation for the median of the beta distribution"). Expressing the mode (only for α, β > 1) and the mean in terms of α and β:

: \frac{\alpha-1}{\alpha+\beta-2} \le \text{median} \le \frac{\alpha}{\alpha+\beta} ,

If 1 < β < α then the order of the inequalities is reversed. For α, β > 1 the absolute distance between the mean and the median is less than 5% of the distance between the maximum and minimum values of ''x''. On the other hand, the absolute distance between the mean and the mode can reach 50% of the distance between the maximum and minimum values of ''x'', for the (Pathological (mathematics), pathological) case of α = 1 and β = 1, for which values the beta distribution approaches the uniform distribution and the information entropy, differential entropy approaches its Maxima and minima, maximum value, and hence maximum "disorder". For example, for α = 1.0001 and β = 1.00000001:
* mode = 0.9999; PDF(mode) = 1.00010
* mean = 0.500025; PDF(mean) = 1.00003
* median = 0.500035; PDF(median) = 1.00003
* mean − mode = −0.499875
* mean − median = −9.65538 × 10−6
where PDF stands for the value of the
probability density function
.
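The ordering can be checked directly; the short Python sketch below (an illustrative check assuming SciPy; the values α = 2, β = 6 are arbitrary, and the closed-form median approximation shown is the one attributed to Kerman above) compares mode, median and mean:

from scipy.stats import beta

a, b = 2.0, 6.0                              # arbitrary example with 1 < a < b
mode = (a - 1) / (a + b - 2)                 # defined for a, b > 1
median = beta.median(a, b)                   # numerical inverse of the CDF
mean = a / (a + b)
approx_median = (a - 1/3) / (a + b - 2/3)    # Kerman's closed-form approximation
print(mode, median, approx_median, mean)     # mode <= median <= mean since 1 < a < b
assert mode <= median <= mean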


Mean, geometric mean and harmonic mean relationship

It is known from the inequality of arithmetic and geometric means that the geometric mean is lower than the mean. Similarly, the harmonic mean is lower than the geometric mean. The accompanying plot shows that for α = β, both the mean and the median are exactly equal to 1/2, regardless of the value of α = β, and the mode is also equal to 1/2 for α = β > 1, however the geometric and harmonic means are lower than 1/2 and they only approach this value asymptotically as α = β → ∞.
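A brief Python sketch (illustrative only, assuming NumPy and SciPy) makes the ordering concrete for the symmetric case α = β, using the closed forms G_X = exp(ψ(α) − ψ(α + β)) and, for α > 1, H_X = (α − 1)/(α + β − 1):

import numpy as np
from scipy.special import digamma

for a in [1.5, 3.0, 10.0, 100.0, 1000.0]:     # symmetric case: beta = alpha
    b = a
    mean = a / (a + b)                        # always 1/2 here
    gmean = np.exp(digamma(a) - digamma(a + b))
    hmean = (a - 1) / (a + b - 1)             # valid for a > 1
    print(a, mean, gmean, hmean)              # hmean < gmean < mean, both tending to 1/2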


Kurtosis bounded by the square of the skewness

As remarked by William Feller, Feller, in the Pearson distribution, Pearson system the beta probability density appears as Pearson distribution, type I (any difference between the beta distribution and Pearson's type I distribution is only superficial and it makes no difference for the following discussion regarding the relationship between kurtosis and skewness). Karl Pearson showed, in Plate 1 of his paper published in 1916, a graph with the kurtosis as the vertical axis (ordinate) and the square of the
skewness
as the horizontal axis (abscissa), in which a number of distributions were displayed. The region occupied by the beta distribution is bounded by the following two Line (geometry), lines in the (skewness2,kurtosis) Cartesian coordinate system, plane, or the (skewness2,excess kurtosis) Cartesian coordinate system, plane: :(\text)^2+1< \text< \frac (\text)^2 + 3 or, equivalently, :(\text)^2-2< \text< \frac (\text)^2 At a time when there were no powerful digital computers, Karl Pearson accurately computed further boundaries, for example, separating the "U-shaped" from the "J-shaped" distributions. The lower boundary line (excess kurtosis + 2 − skewness2 = 0) is produced by skewed "U-shaped" beta distributions with both values of shape parameters α and β close to zero. The upper boundary line (excess kurtosis − (3/2) skewness2 = 0) is produced by extremely skewed distributions with very large values of one of the parameters and very small values of the other parameter. Karl Pearson showed that this upper boundary line (excess kurtosis − (3/2) skewness2 = 0) is also the intersection with Pearson's distribution III, which has unlimited support in one direction (towards positive infinity), and can be bell-shaped or J-shaped. His son, Egon Pearson, showed that the region (in the kurtosis/squared-skewness plane) occupied by the beta distribution (equivalently, Pearson's distribution I) as it approaches this boundary (excess kurtosis − (3/2) skewness2 = 0) is shared with the noncentral chi-squared distribution. Karl Pearson (Pearson 1895, pp. 357, 360, 373–376) also showed that the gamma distribution is a Pearson type III distribution. Hence this boundary line for Pearson's type III distribution is known as the gamma line. (This can be shown from the fact that the excess kurtosis of the gamma distribution is 6/''k'' and the square of the skewness is 4/''k'', hence (excess kurtosis − (3/2) skewness2 = 0) is identically satisfied by the gamma distribution regardless of the value of the parameter "k"). Pearson later noted that the chi-squared distribution is a special case of Pearson's type III and also shares this boundary line (as it is apparent from the fact that for the chi-squared distribution the excess kurtosis is 12/''k'' and the square of the skewness is 8/''k'', hence (excess kurtosis − (3/2) skewness2 = 0) is identically satisfied regardless of the value of the parameter "k"). This is to be expected, since the chi-squared distribution ''X'' ~ χ2(''k'') is a special case of the gamma distribution, with parametrization X ~ Γ(k/2, 1/2) where k is a positive integer that specifies the "number of degrees of freedom" of the chi-squared distribution. An example of a beta distribution near the upper boundary (excess kurtosis − (3/2) skewness2 = 0) is given by α = 0.1, β = 1000, for which the ratio (excess kurtosis)/(skewness2) = 1.49835 approaches the upper limit of 1.5 from below. An example of a beta distribution near the lower boundary (excess kurtosis + 2 − skewness2 = 0) is given by α= 0.0001, β = 0.1, for which values the expression (excess kurtosis + 2)/(skewness2) = 1.01621 approaches the lower limit of 1 from above. In the infinitesimal limit for both α and β approaching zero symmetrically, the excess kurtosis reaches its minimum value at −2. This minimum value occurs at the point at which the lower boundary line intersects the vertical axis (ordinate). 
(However, in Pearson's original chart, the ordinate is kurtosis, instead of excess kurtosis, and it increases downwards rather than upwards). Values for the skewness and excess kurtosis below the lower boundary (excess kurtosis + 2 − skewness2 = 0) cannot occur for any distribution, and hence Karl Pearson appropriately called the region below this boundary the "impossible region". The boundary for this "impossible region" is determined by (symmetric or skewed) bimodal "U"-shaped distributions for which the parameters α and β approach zero and hence all the probability density is concentrated at the ends: ''x'' = 0, 1 with practically nothing in between them. Since for α ≈ β ≈ 0 the probability density is concentrated at the two ends ''x'' = 0 and ''x'' = 1, this "impossible boundary" is determined by a
Bernoulli distribution
, where the only two possible outcomes occur with respective probabilities ''p'' and ''q'' = 1 − ''p''. For cases approaching this limit boundary with symmetry α = β, skewness ≈ 0, excess kurtosis ≈ −2 (this is the lowest excess kurtosis possible for any distribution), and the probabilities are ''p'' ≈ ''q'' ≈ 1/2. For cases approaching this limit boundary with skewness, excess kurtosis ≈ −2 + skewness2, and the probability density is concentrated more at one end than the other end (with practically nothing in between), with probabilities p = \tfrac{\beta}{\alpha+\beta} at the left end ''x'' = 0 and q = 1-p = \tfrac{\alpha}{\alpha+\beta} at the right end ''x'' = 1.
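The two boundary examples quoted above can be reproduced numerically; the following Python sketch (assuming SciPy; scipy.stats.beta.stats returns skewness and excess kurtosis) checks that both cases respect the bounding lines:

from scipy.stats import beta

# (0.1, 1000): near the upper "gamma" line; (0.0001, 0.1): near the lower "impossible" line
for a, b in [(0.1, 1000.0), (0.0001, 0.1)]:
    skew, exkurt = (float(m) for m in beta.stats(a, b, moments='sk'))
    print(a, b, exkurt / skew**2, (exkurt + 2) / skew**2)   # ratios about 1.498 and 1.016, respectively
    assert skew**2 - 2 < exkurt < 1.5 * skew**2             # the two boundary lines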


Symmetry

All statements are conditional on α, β > 0 * Probability density function Symmetry, reflection symmetry ::f(x;\alpha,\beta) = f(1-x;\beta,\alpha) * Cumulative distribution function Symmetry, reflection symmetry plus unitary Symmetry, translation ::F(x;\alpha,\beta) = I_x(\alpha,\beta) = 1- F(1- x;\beta,\alpha) = 1 - I_(\beta,\alpha) * Mode Symmetry, reflection symmetry plus unitary Symmetry, translation ::\operatorname(\Beta(\alpha, \beta))= 1-\operatorname(\Beta(\beta, \alpha)),\text\Beta(\beta, \alpha)\ne \Beta(1,1) * Median Symmetry, reflection symmetry plus unitary Symmetry, translation ::\operatorname (\Beta(\alpha, \beta) )= 1 - \operatorname (\Beta(\beta, \alpha)) * Mean Symmetry, reflection symmetry plus unitary Symmetry, translation ::\mu (\Beta(\alpha, \beta) )= 1 - \mu (\Beta(\beta, \alpha) ) * Geometric Means each is individually asymmetric, the following symmetry applies between the geometric mean based on ''X'' and the geometric mean based on its
reflection
(1-X) ::G_X (\Beta(\alpha, \beta) )=G_(\Beta(\beta, \alpha) ) * Harmonic means each is individually asymmetric, the following symmetry applies between the harmonic mean based on ''X'' and the harmonic mean based on its
reflection
(1-X) ::H_X (\Beta(\alpha, \beta) )=H_(\Beta(\beta, \alpha) ) \text \alpha, \beta > 1 . * Variance symmetry ::\operatorname (\Beta(\alpha, \beta) )=\operatorname (\Beta(\beta, \alpha) ) * Geometric variances each is individually asymmetric, the following symmetry applies between the log geometric variance based on X and the log geometric variance based on its
reflection
(1-X) ::\ln(\operatorname (\Beta(\alpha, \beta))) = \ln(\operatorname(\Beta(\beta, \alpha))) * Geometric covariance symmetry ::\ln \operatorname(\Beta(\alpha, \beta))=\ln \operatorname(\Beta(\beta, \alpha)) * Mean absolute deviation around the mean symmetry ::\operatorname[, X - E ] (\Beta(\alpha, \beta))=\operatorname[, X - E ] (\Beta(\beta, \alpha)) * Skewness Symmetry (mathematics), skew-symmetry ::\operatorname (\Beta(\alpha, \beta) )= - \operatorname (\Beta(\beta, \alpha) ) * Excess kurtosis symmetry ::\text (\Beta(\alpha, \beta) )= \text (\Beta(\beta, \alpha) ) * Characteristic function symmetry of Real part (with respect to the origin of variable "t") :: \text [_1F_1(\alpha; \alpha+\beta; it) ] = \text [ _1F_1(\alpha; \alpha+\beta; - it)] * Characteristic function Symmetry (mathematics), skew-symmetry of Imaginary part (with respect to the origin of variable "t") :: \text [_1F_1(\alpha; \alpha+\beta; it) ] = - \text [ _1F_1(\alpha; \alpha+\beta; - it) ] * Characteristic function symmetry of Absolute value (with respect to the origin of variable "t") :: \text [ _1F_1(\alpha; \alpha+\beta; it) ] = \text [ _1F_1(\alpha; \alpha+\beta; - it) ] * Differential entropy symmetry ::h(\Beta(\alpha, \beta) )= h(\Beta(\beta, \alpha) ) * Relative Entropy (also called Kullback–Leibler divergence) symmetry ::D_(X_1, , X_2) = D_(X_2, , X_1), \texth(X_1) = h(X_2)\text\alpha \neq \beta * Fisher information matrix symmetry ::_ = _
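Several of these symmetry relations can be verified numerically; the Python sketch below (an illustrative check assuming NumPy and SciPy; α = 2, β = 5 are arbitrary) exercises a few of them:

import numpy as np
from scipy.stats import beta

a, b = 2.0, 5.0                       # arbitrary example shape parameters
x = np.linspace(0.01, 0.99, 25)

assert np.allclose(beta.pdf(x, a, b), beta.pdf(1 - x, b, a))      # reflection symmetry of the density
assert np.allclose(beta.cdf(x, a, b), 1 - beta.cdf(1 - x, b, a))  # CDF relation
assert np.isclose(beta.mean(a, b), 1 - beta.mean(b, a))           # mean
assert np.isclose(beta.var(a, b), beta.var(b, a))                 # variance
s_ab, k_ab = beta.stats(a, b, moments='sk')
s_ba, k_ba = beta.stats(b, a, moments='sk')
assert np.isclose(s_ab, -s_ba) and np.isclose(k_ab, k_ba)         # skew-symmetry, kurtosis symmetry
assert np.isclose(beta.entropy(a, b), beta.entropy(b, a))         # differential entropy symmetry
print("all symmetry checks passed")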


Geometry of the probability density function


Inflection points

For certain values of the shape parameters α and β, the
probability density function
has inflection points, at which the curvature changes sign. The position of these inflection points can be useful as a measure of the Statistical dispersion, dispersion or spread of the distribution. Defining the following quantity: :\kappa =\frac Points of inflection occur, depending on the value of the shape parameters α and β, as follows: *(α > 2, β > 2) The distribution is bell-shaped (symmetric for α = β and skewed otherwise), with two inflection points, equidistant from the mode: ::x = \text \pm \kappa = \frac * (α = 2, β > 2) The distribution is unimodal, positively skewed, right-tailed, with one inflection point, located to the right of the mode: ::x =\text + \kappa = \frac * (α > 2, β = 2) The distribution is unimodal, negatively skewed, left-tailed, with one inflection point, located to the left of the mode: ::x = \text - \kappa = 1 - \frac * (1 < α < 2, β > 2, α+β>2) The distribution is unimodal, positively skewed, right-tailed, with one inflection point, located to the right of the mode: ::x =\text + \kappa = \frac *(0 < α < 1, 1 < β < 2) The distribution has a mode at the left end ''x'' = 0 and it is positively skewed, right-tailed. There is one inflection point, located to the right of the mode: ::x = \frac *(α > 2, 1 < β < 2) The distribution is unimodal negatively skewed, left-tailed, with one inflection point, located to the left of the mode: ::x =\text - \kappa = \frac *(1 < α < 2, 0 < β < 1) The distribution has a mode at the right end ''x''=1 and it is negatively skewed, left-tailed. There is one inflection point, located to the left of the mode: ::x = \frac There are no inflection points in the remaining (symmetric and skewed) regions: U-shaped: (α, β < 1) upside-down-U-shaped: (1 < α < 2, 1 < β < 2), reverse-J-shaped (α < 1, β > 2) or J-shaped: (α > 2, β < 1) The accompanying plots show the inflection point locations (shown vertically, ranging from 0 to 1) versus α and β (the horizontal axes ranging from 0 to 5). There are large cuts at surfaces intersecting the lines α = 1, β = 1, α = 2, and β = 2 because at these values the beta distribution change from 2 modes, to 1 mode to no mode.
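For the bell-shaped case α, β > 2, the two inflection points sit at mode ± κ with κ = sqrt((α − 1)(β − 1)/(α + β − 3))/(α + β − 2); the Python sketch below (illustrative, assuming NumPy and SciPy, with Beta(3, 3) as an arbitrary example) cross-checks this closed form against a numerical second derivative of the density:

import numpy as np
from scipy.stats import beta

def inflection_points(a, b):
    # closed-form inflection points of the Beta(a, b) density, valid for a, b > 2
    mode = (a - 1) / (a + b - 2)
    kappa = np.sqrt((a - 1) * (b - 1) / (a + b - 3)) / (a + b - 2)
    return mode - kappa, mode + kappa

a, b = 3.0, 3.0                                   # arbitrary bell-shaped example
lo, hi = inflection_points(a, b)

# numerical cross-check: sign changes of the second derivative of the pdf
x = np.linspace(0.001, 0.999, 20_001)
d2 = np.gradient(np.gradient(beta.pdf(x, a, b), x), x)
crossings = x[np.nonzero(np.diff(np.sign(d2)))[0]]
print((lo, hi), crossings)                        # both near 0.2113 and 0.7887 for Beta(3, 3)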


Shapes

The beta density function can take a wide variety of different shapes depending on the values of the two parameters ''α'' and ''β''. The ability of the beta distribution to take this great diversity of shapes (using only two parameters) is partly responsible for finding wide application for modeling actual measurements:


=Symmetric (''α'' = ''β'')

= * the density function is symmetry, symmetric about 1/2 (blue & teal plots). * median = mean = 1/2. *skewness = 0. *variance = 1/(4(2α + 1)) *α = β < 1 **U-shaped (blue plot). **bimodal: left mode = 0, right mode =1, anti-mode = 1/2 **1/12 < var(''X'') < 1/4 **−2 < excess kurtosis(''X'') < −6/5 ** α = β = 1/2 is the arcsine distribution *** var(''X'') = 1/8 ***excess kurtosis(''X'') = −3/2 ***CF = Rinc (t) ** α = β → 0 is a 2-point
Bernoulli distribution
with equal probability 1/2 at each Dirac delta function end ''x'' = 0 and ''x'' = 1 and zero probability everywhere else. A coin toss: one face of the coin being ''x'' = 0 and the other face being ''x'' = 1. *** \lim_ \operatorname(X) = \tfrac *** \lim_ \operatorname(X) = - 2 a lower value than this is impossible for any distribution to reach. *** The information entropy, differential entropy approaches a Maxima and minima, minimum value of −∞ *α = β = 1 **the uniform distribution (continuous), uniform
[0, 1]
distribution **no mode **var(''X'') = 1/12 **excess kurtosis(''X'') = −6/5 **The (negative anywhere else) information entropy, differential entropy reaches its Maxima and minima, maximum value of zero **CF = Sinc (t) *''α'' = ''β'' > 1 **symmetric unimodal ** mode = 1/2. **0 < var(''X'') < 1/12 **−6/5 < excess kurtosis(''X'') < 0 **''α'' = ''β'' = 3/2 is a semi-elliptic
[0, 1]
distribution, see: Wigner semicircle distribution ***var(''X'') = 1/16. ***excess kurtosis(''X'') = −1 ***CF = 2 Jinc (t) **''α'' = ''β'' = 2 is the parabolic
[0, 1]
distribution ***var(''X'') = 1/20 ***excess kurtosis(''X'') = −6/7 ***CF = 3 Tinc (t) **''α'' = ''β'' > 2 is bell-shaped, with inflection points located to either side of the mode ***0 < var(''X'') < 1/20 ***−6/7 < excess kurtosis(''X'') < 0 **''α'' = ''β'' → ∞ is a 1-point
degenerate distribution
with a Dirac delta function spike at the midpoint ''x'' = 1/2 with probability 1, and zero probability everywhere else. There is 100% probability (absolute certainty) concentrated at the single point ''x'' = 1/2. *** \lim_ \operatorname(X) = 0 *** \lim_ \operatorname(X) = 0 ***The information entropy, differential entropy approaches a Maxima and minima, minimum value of −∞
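The variance values quoted for the symmetric special cases above follow from var(X) = 1/(4(2α + 1)) when α = β; the following Python sketch (a minimal check assuming SciPy) confirms them:

from scipy.stats import beta

# arcsine, uniform, semicircle and parabolic special cases quoted above
for a, expected in [(0.5, 1/8), (1.0, 1/12), (1.5, 1/16), (2.0, 1/20)]:
    assert abs(beta.var(a, a) - expected) < 1e-12
    assert abs(beta.var(a, a) - 1 / (4 * (2*a + 1))) < 1e-12
print("symmetric-case variances check out")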


=Skewed (''α'' ≠ ''β'')

= The density function is Skewness, skewed. An interchange of parameter values yields the mirror image (the reverse) of the initial curve, some more specific cases: *''α'' < 1, ''β'' < 1 ** U-shaped ** Positive skew for α < β, negative skew for α > β. ** bimodal: left mode = 0, right mode = 1, anti-mode = \tfrac ** 0 < median < 1. ** 0 < var(''X'') < 1/4 *α > 1, β > 1 ** unimodal (magenta & cyan plots), **Positive skew for α < β, negative skew for α > β. **\text= \tfrac ** 0 < median < 1 ** 0 < var(''X'') < 1/12 *α < 1, β ≥ 1 **reverse J-shaped with a right tail, **positively skewed, **strictly decreasing, convex function, convex ** mode = 0 ** 0 < median < 1/2. ** 0 < \operatorname(X) < \tfrac, (maximum variance occurs for \alpha=\tfrac, \beta=1, or α = Φ the Golden ratio, golden ratio conjugate) *α ≥ 1, β < 1 **J-shaped with a left tail, **negatively skewed, **strictly increasing, convex function, convex ** mode = 1 ** 1/2 < median < 1 ** 0 < \operatorname(X) < \tfrac, (maximum variance occurs for \alpha=1, \beta=\tfrac, or β = Φ the Golden ratio, golden ratio conjugate) *α = 1, β > 1 **positively skewed, **strictly decreasing (red plot), **a reversed (mirror-image) power function ,1distribution ** mean = 1 / (β + 1) ** median = 1 - 1/21/β ** mode = 0 **α = 1, 1 < β < 2 ***concave function, concave *** 1-\tfrac< \text < \tfrac *** 1/18 < var(''X'') < 1/12. **α = 1, β = 2 ***a straight line with slope −2, the right-triangular distribution with right angle at the left end, at ''x'' = 0 *** \text=1-\tfrac *** var(''X'') = 1/18 **α = 1, β > 2 ***reverse J-shaped with a right tail, ***convex function, convex *** 0 < \text < 1-\tfrac *** 0 < var(''X'') < 1/18 *α > 1, β = 1 **negatively skewed, **strictly increasing (green plot), **the power function
[0, 1]
distribution ** mean = α / (α + 1) ** median = 1/21/α ** mode = 1 **2 > α > 1, β = 1 ***concave function, concave *** \tfrac < \text < \tfrac *** 1/18 < var(''X'') < 1/12 ** α = 2, β = 1 ***a straight line with slope +2, the right-triangular distribution with right angle at the right end, at ''x'' = 1 *** \text=\tfrac *** var(''X'') = 1/18 **α > 2, β = 1 ***J-shaped with a left tail, convex function, convex ***\tfrac < \text < 1 *** 0 < var(''X'') < 1/18


Related distributions


Transformations

* If ''X'' ~ Beta(''α'', ''β'') then 1 − ''X'' ~ Beta(''β'', ''α'') (Mirror image, mirror-image symmetry)
* If ''X'' ~ Beta(''α'', ''β'') then \tfrac{X}{1-X} \sim \beta'(\alpha,\beta), the beta prime distribution, also called "beta distribution of the second kind".
* If ''X'' ~ Beta(''α'', ''β'') then \tfrac{1}{X} -1 \sim \beta'(\beta,\alpha).
* If ''X'' ~ Beta(''n''/2, ''m''/2) then \tfrac{mX}{n(1-X)} \sim F(n,m) (assuming ''n'' > 0 and ''m'' > 0), the F-distribution, Fisher–Snedecor F distribution.
* If X \sim \operatorname{Beta}\left(1+\lambda\tfrac{m-\min}{\max-\min}, 1 + \lambda\tfrac{\max-m}{\max-\min}\right) then min + ''X''(max − min) ~ PERT(min, max, ''m'', ''λ'') where ''PERT'' denotes a PERT distribution used in PERT analysis, and ''m'' = most likely value.Herrerías-Velasco, José Manuel and Herrerías-Pleguezuelo, Rafael and René van Dorp, Johan. (2011). Revisiting the PERT mean and Variance. European Journal of Operational Research (210), p. 448–451. Traditionally ''λ'' = 4 in PERT analysis.
* If ''X'' ~ Beta(1, ''β'') then ''X'' ~ Kumaraswamy distribution with parameters (1, ''β'')
* If ''X'' ~ Beta(''α'', 1) then ''X'' ~ Kumaraswamy distribution with parameters (''α'', 1)
* If ''X'' ~ Beta(''α'', 1) then −ln(''X'') ~ Exponential(''α'')
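These transformations are straightforward to check by simulation; the Python sketch below (illustrative, assuming NumPy and SciPy; the shape values are arbitrary) uses Kolmogorov–Smirnov tests, which should not flag a discrepancy when the stated distribution is correct:

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
a, b = 2.5, 4.0                                   # arbitrary example shape parameters
x = stats.beta.rvs(a, b, size=50_000, random_state=rng)

print(stats.kstest(1 - x, 'beta', args=(b, a)).pvalue)                    # 1 - X ~ Beta(b, a)
print(stats.kstest(x / (1 - x), 'betaprime', args=(a, b)).pvalue)         # X/(1-X) ~ beta prime
print(stats.kstest(b * x / (a * (1 - x)), 'f', args=(2*a, 2*b)).pvalue)   # ~ F(2a, 2b)

y = stats.beta.rvs(3.0, 1.0, size=50_000, random_state=rng)
print(stats.kstest(-np.log(y), 'expon', args=(0, 1/3.0)).pvalue)          # -ln(X) ~ Exponential(3)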


Special and limiting cases

* Beta(1, 1) ~ uniform distribution (continuous), U(0, 1). * Beta(n, 1) ~ Maximum of ''n'' independent rvs. with uniform distribution (continuous), U(0, 1), sometimes called a ''a standard power function distribution'' with density ''n'' ''x''''n''-1 on that interval. * Beta(1, n) ~ Minimum of ''n'' independent rvs. with uniform distribution (continuous), U(0, 1) * If ''X'' ~ Beta(3/2, 3/2) and ''r'' > 0 then 2''rX'' − ''r'' ~ Wigner semicircle distribution. * Beta(1/2, 1/2) is equivalent to the arcsine distribution. This distribution is also Jeffreys prior probability for the
Bernoulli
and binomial distributions. The arcsine probability density is a distribution that appears in several random-walk fundamental theorems. In a fair coin toss
random walk
, the probability for the time of the last visit to the origin is distributed as a (U-shaped) arcsine distribution. In a two-player fair-coin-toss game, a player is said to be in the lead if the random walk (that started at the origin) is above the origin. The most probable number of times that a given player will be in the lead, in a game of length 2''N'', is not ''N''. On the contrary, ''N'' is the least likely number of times that the player will be in the lead. The most likely number of times in the lead is 0 or 2''N'' (following the arcsine distribution).
* \lim_{n \to \infty} n \operatorname{Beta}(1,n) = \operatorname{Exponential}(1), the exponential distribution.
* \lim_{n \to \infty} n \operatorname{Beta}(k,n) = \operatorname{Gamma}(k,1), the gamma distribution.
* For large ''n'', \operatorname{Beta}(\alpha n,\beta n) \to \mathcal{N}\left(\frac{\alpha}{\alpha+\beta},\frac{1}{n}\frac{\alpha\beta}{(\alpha+\beta)^3}\right), the normal distribution. More precisely, if X_n \sim \operatorname{Beta}(\alpha n,\beta n) then \sqrt{n}\left(X_n -\tfrac{\alpha}{\alpha+\beta}\right) converges in distribution to a normal distribution with mean 0 and variance \tfrac{\alpha\beta}{(\alpha+\beta)^3} as ''n'' increases.
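A quick simulation illustrates two of these facts; the Python sketch below (illustrative, assuming NumPy and SciPy; n = 5 and the shape values are arbitrary) checks the uniform extreme-value cases and the normal approximation:

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 5
u = rng.random((100_000, n))
print(stats.kstest(u.max(axis=1), 'beta', args=(n, 1)).pvalue)   # max of n uniforms ~ Beta(n, 1)
print(stats.kstest(u.min(axis=1), 'beta', args=(1, n)).pvalue)   # min of n uniforms ~ Beta(1, n)

# normal limit: Beta(a n, b n) is approximately N(a/(a+b), a b /((a+b)^3 n)) for large n
a, b, big_n = 2.0, 3.0, 400
x = stats.beta.rvs(a * big_n, b * big_n, size=100_000, random_state=rng)
print(x.mean(), a / (a + b))
print(x.std(), np.sqrt(a * b / ((a + b)**3 * big_n)))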


Derived from other distributions

* The ''k''th order statistic of a sample of size ''n'' from the Uniform distribution (continuous), uniform distribution is a beta random variable, ''U''(''k'') ~ Beta(''k'', ''n''+1−''k''). * If ''X'' ~ Gamma(α, θ) and ''Y'' ~ Gamma(β, θ) are independent, then \tfrac \sim \operatorname(\alpha, \beta)\,. * If X \sim \chi^2(\alpha)\, and Y \sim \chi^2(\beta)\, are independent, then \tfrac \sim \operatorname(\tfrac, \tfrac). * If ''X'' ~ U(0, 1) and ''α'' > 0 then ''X''1/''α'' ~ Beta(''α'', 1). The power function distribution. * If X \sim\operatorname(k;n;p), then \sim \operatorname(\alpha, \beta) for discrete values of ''n'' and ''k'' where \alpha=k+1 and \beta=n-k+1. * If ''X'' ~ Cauchy(0, 1) then \tfrac \sim \operatorname\left(\tfrac12, \tfrac12\right)\,
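Two of these constructions are verified by simulation in the Python sketch below (illustrative, assuming NumPy and SciPy; the parameter values are arbitrary):

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# gamma ratio: X/(X+Y) ~ Beta(a, b) for independent X ~ Gamma(a, theta), Y ~ Gamma(b, theta)
a, b, theta = 2.0, 5.0, 1.7
gx = stats.gamma.rvs(a, scale=theta, size=100_000, random_state=rng)
gy = stats.gamma.rvs(b, scale=theta, size=100_000, random_state=rng)
print(stats.kstest(gx / (gx + gy), 'beta', args=(a, b)).pvalue)

# k-th order statistic of n uniforms ~ Beta(k, n + 1 - k)
n, k = 7, 3
u_sorted = np.sort(rng.random((100_000, n)), axis=1)
print(stats.kstest(u_sorted[:, k - 1], 'beta', args=(k, n + 1 - k)).pvalue)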


Combination with other distributions

* If ''X'' ~ Beta(''α'', ''β'') and ''Y'' ~ F(2''β'',2''α'') then \Pr(X \leq \tfrac{\alpha}{\alpha+\beta x}) = \Pr(Y \geq x)\, for all ''x'' > 0.


Compounding with other distributions

* If ''p'' ~ Beta(α, β) and ''X'' ~ Bin(''k'', ''p'') then ''X'' ~ beta-binomial distribution * If ''p'' ~ Beta(α, β) and ''X'' ~ NB(''r'', ''p'') then ''X'' ~ beta negative binomial distribution
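The compounding construction can be mimicked directly by two-stage sampling; the Python sketch below (illustrative, assuming NumPy and SciPy; the values of α, β and k are arbitrary) compares such draws with the beta-binomial pmf:

import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
a, b, k = 2.0, 3.0, 10
p = stats.beta.rvs(a, b, size=200_000, random_state=rng)   # p ~ Beta(a, b)
x = rng.binomial(k, p)                                     # X | p ~ Bin(k, p)

empirical = np.bincount(x, minlength=k + 1) / x.size
pmf = stats.betabinom.pmf(np.arange(k + 1), k, a, b)       # beta-binomial pmf
print(np.abs(empirical - pmf).max())                       # small, up to sampling noise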


Generalisations

* The generalization to multiple variables, i.e. a Dirichlet distribution, multivariate Beta distribution, is called a
Dirichlet distribution
. Univariate marginals of the Dirichlet distribution have a beta distribution. The beta distribution is Conjugate prior, conjugate to the binomial and Bernoulli distributions in exactly the same way as the
Dirichlet distribution
is conjugate to the multinomial distribution and categorical distribution. * The Pearson distribution#The Pearson type I distribution, Pearson type I distribution is identical to the beta distribution (except for arbitrary shifting and re-scaling that can also be accomplished with the four parameter parametrization of the beta distribution). * The beta distribution is the special case of the noncentral beta distribution where \lambda = 0: \operatorname(\alpha, \beta) = \operatorname(\alpha,\beta,0). * The generalized beta distribution is a five-parameter distribution family which has the beta distribution as a special case. * The matrix variate beta distribution is a distribution for positive-definite matrices.


Statistical inference


Parameter estimation


Method of moments


=Two unknown parameters

Two unknown parameters ((\hat{\alpha}, \hat{\beta}) of a beta distribution supported in the [0, 1] interval) can be estimated, using the method of moments, with the first two moments (sample mean and sample variance) as follows. Let:

: \text{sample mean} = \bar{x} = \frac{1}{N}\sum_{i=1}^N X_i

be the sample mean estimate and

: \text{sample variance} = \bar{v} = \frac{1}{N-1}\sum_{i=1}^N (X_i - \bar{x})^2

be the sample variance estimate. The method of moments (statistics), method-of-moments estimates of the parameters are

:\hat{\alpha} = \bar{x} \left(\frac{\bar{x}(1-\bar{x})}{\bar{v}} - 1 \right), if \bar{v} < \bar{x}(1 - \bar{x}),

:\hat{\beta} = (1-\bar{x}) \left(\frac{\bar{x}(1-\bar{x})}{\bar{v}} - 1 \right), if \bar{v} < \bar{x}(1 - \bar{x}).

When the distribution is required over a known interval other than
[0, 1]
with random variable ''X'', say [''a'', ''c''] with random variable ''Y'', then replace \bar{x} with \frac{\bar{y}-a}{c-a} and \bar{v} with \frac{\bar{v}_Y}{(c-a)^2} in the above couple of equations for the shape parameters (see the "Alternative parametrizations, four parameters" section below), where:

: \text{sample mean} = \bar{y} = \frac{1}{N}\sum_{i=1}^N Y_i

: \text{sample variance} = \bar{v}_Y = \frac{1}{N-1}\sum_{i=1}^N (Y_i - \bar{y})^2
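The estimates \hat{\alpha}, \hat{\beta} above translate directly into code; the Python sketch below (a minimal implementation assuming NumPy and SciPy; the function name beta_method_of_moments is introduced here for illustration) recovers the shape parameters from simulated data:

import numpy as np
from scipy import stats

def beta_method_of_moments(x):
    # method-of-moments estimates (alpha_hat, beta_hat) for data supported on (0, 1)
    m, v = x.mean(), x.var(ddof=1)
    if not v < m * (1 - m):
        raise ValueError("sample variance is too large for a beta model")
    common = m * (1 - m) / v - 1
    return m * common, (1 - m) * common

rng = np.random.default_rng(5)
sample = stats.beta.rvs(2.0, 5.0, size=10_000, random_state=rng)
print(beta_method_of_moments(sample))    # close to the true (2, 5)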


=Four unknown parameters

= All four parameters (\hat, \hat, \hat, \hat of a beta distribution supported in the [''a'', ''c''] interval -see section Beta distribution#Four parameters 2, "Alternative parametrizations, Four parameters"-) can be estimated, using the method of moments developed by Karl Pearson, by equating sample and population values of the first four central moments (mean, variance, skewness and excess kurtosis). The excess kurtosis was expressed in terms of the square of the skewness, and the sample size ν = α + β, (see previous section Beta distribution#Kurtosis, "Kurtosis") as follows: :\text =\frac\left(\frac (\text)^2 - 1\right)\text^2-2< \text< \tfrac (\text)^2 One can use this equation to solve for the sample size ν= α + β in terms of the square of the skewness and the excess kurtosis as follows: :\hat = \hat + \hat = 3\frac :\text^2-2< \text< \tfrac (\text)^2 This is the ratio (multiplied by a factor of 3) between the previously derived limit boundaries for the beta distribution in a space (as originally done by Karl Pearson) defined with coordinates of the square of the skewness in one axis and the excess kurtosis in the other axis (see ): The case of zero skewness, can be immediately solved because for zero skewness, α = β and hence ν = 2α = 2β, therefore α = β = ν/2 : \hat = \hat = \frac= \frac : \text= 0 \text -2<\text<0 (Excess kurtosis is negative for the beta distribution with zero skewness, ranging from -2 to 0, so that \hat -and therefore the sample shape parameters- is positive, ranging from zero when the shape parameters approach zero and the excess kurtosis approaches -2, to infinity when the shape parameters approach infinity and the excess kurtosis approaches zero). For non-zero sample skewness one needs to solve a system of two coupled equations. Since the skewness and the excess kurtosis are independent of the parameters \hat, \hat, the parameters \hat, \hat can be uniquely determined from the sample skewness and the sample excess kurtosis, by solving the coupled equations with two known variables (sample skewness and sample excess kurtosis) and two unknowns (the shape parameters): :(\text)^2 = \frac :\text =\frac\left(\frac (\text)^2 - 1\right) :\text^2-2< \text< \tfrac(\text)^2 resulting in the following solution: : \hat, \hat = \frac \left (1 \pm \frac \right ) : \text\neq 0 \text (\text)^2-2< \text< \tfrac (\text)^2 Where one should take the solutions as follows: \hat>\hat for (negative) sample skewness < 0, and \hat<\hat for (positive) sample skewness > 0. The accompanying plot shows these two solutions as surfaces in a space with horizontal axes of (sample excess kurtosis) and (sample squared skewness) and the shape parameters as the vertical axis. The surfaces are constrained by the condition that the sample excess kurtosis must be bounded by the sample squared skewness as stipulated in the above equation. The two surfaces meet at the right edge defined by zero skewness. Along this right edge, both parameters are equal and the distribution is symmetric U-shaped for α = β < 1, uniform for α = β = 1, upside-down-U-shaped for 1 < α = β < 2 and bell-shaped for α = β > 2. The surfaces also meet at the front (lower) edge defined by "the impossible boundary" line (excess kurtosis + 2 - skewness2 = 0). 
Along this front (lower) boundary both shape parameters approach zero, and the probability density is concentrated more at one end than the other end (with practically nothing in between), with probabilities p=\tfrac at the left end ''x'' = 0 and q = 1-p = \tfrac at the right end ''x'' = 1. The two surfaces become further apart towards the rear edge. At this rear edge the surface parameters are quite different from each other. As remarked, for example, by Bowman and Shenton, sampling in the neighborhood of the line (sample excess kurtosis - (3/2)(sample skewness)2 = 0) (the just-J-shaped portion of the rear edge where blue meets beige), "is dangerously near to chaos", because at that line the denominator of the expression above for the estimate ν = α + β becomes zero and hence ν approaches infinity as that line is approached. Bowman and Shenton write that "the higher moment parameters (kurtosis and skewness) are extremely fragile (near that line). However, the mean and standard deviation are fairly reliable." Therefore, the problem is for the case of four parameter estimation for very skewed distributions such that the excess kurtosis approaches (3/2) times the square of the skewness. This boundary line is produced by extremely skewed distributions with very large values of one of the parameters and very small values of the other parameter. See for a numerical example and further comments about this rear edge boundary line (sample excess kurtosis - (3/2)(sample skewness)2 = 0). As remarked by Karl Pearson himself this issue may not be of much practical importance as this trouble arises only for very skewed J-shaped (or mirror-image J-shaped) distributions with very different values of shape parameters that are unlikely to occur much in practice). The usual skewed-bell-shape distributions that occur in practice do not have this parameter estimation problem. The remaining two parameters \hat, \hat can be determined using the sample mean and the sample variance using a variety of equations. One alternative is to calculate the support interval range (\hat-\hat) based on the sample variance and the sample kurtosis. For this purpose one can solve, in terms of the range (\hat- \hat), the equation expressing the excess kurtosis in terms of the sample variance, and the sample size ν (see and ): :\text =\frac\bigg(\frac - 6 - 5 \hat \bigg) to obtain: : (\hat- \hat) = \sqrt\sqrt Another alternative is to calculate the support interval range (\hat-\hat) based on the sample variance and the sample skewness. For this purpose one can solve, in terms of the range (\hat-\hat), the equation expressing the squared skewness in terms of the sample variance, and the sample size ν (see section titled "Skewness" and "Alternative parametrizations, four parameters"): :(\text)^2 = \frac\bigg(\frac-4(1+\hat)\bigg) to obtain: : (\hat- \hat) = \frac\sqrt The remaining parameter can be determined from the sample mean and the previously obtained parameters: (\hat-\hat), \hat, \hat = \hat+\hat: : \hat = (\text) - \left(\frac\right)(\hat-\hat) and finally, \hat= (\hat- \hat) + \hat . In the above formulas one may take, for example, as estimates of the sample moments: :\begin \text &=\overline = \frac\sum_^N Y_i \\ \text &= \overline_Y = \frac\sum_^N (Y_i - \overline)^2 \\ \text &= G_1 = \frac \frac \\ \text &= G_2 = \frac \frac - \frac \end The estimators ''G''1 for skewness, sample skewness and ''G''2 for kurtosis, sample kurtosis are used by DAP (software), DAP/SAS System, SAS, PSPP/SPSS, and Microsoft Excel, Excel. 
However, they are not used by BMDP and (according to ) they were not used by MINITAB in 1998. Actually, Joanes and Gill in their 1998 study concluded that the skewness and kurtosis estimators used in BMDP and in MINITAB (at that time) had smaller variance and mean-squared error in normal samples, but the skewness and kurtosis estimators used in DAP (software), DAP/SAS System, SAS, PSPP/SPSS, namely ''G''1 and ''G''2, had smaller mean-squared error in samples from a very skewed distribution. It is for this reason that we have spelled out "sample skewness", etc., in the above formulas, to make it explicit that the user should choose the best estimator according to the problem at hand, as the best estimator for skewness and kurtosis depends on the amount of skewness (as shown by Joanes and Gill).


Maximum likelihood


=Two unknown parameters

= As is also the case for maximum likelihood estimates for the gamma distribution, the maximum likelihood estimates for the beta distribution do not have a general closed form solution for arbitrary values of the shape parameters. If ''X''1, ..., ''XN'' are independent random variables each having a beta distribution, the joint log likelihood function for ''N'' independent and identically distributed random variables, iid observations is: :\begin \ln\, \mathcal (\alpha, \beta\mid X) &= \sum_^N \ln \left (\mathcal_i (\alpha, \beta\mid X_i) \right )\\ &= \sum_^N \ln \left (f(X_i;\alpha,\beta) \right ) \\ &= \sum_^N \ln \left (\frac \right ) \\ &= (\alpha - 1)\sum_^N \ln (X_i) + (\beta- 1)\sum_^N \ln (1-X_i) - N \ln \Beta(\alpha,\beta) \end Finding the maximum with respect to a shape parameter involves taking the partial derivative with respect to the shape parameter and setting the expression equal to zero yielding the maximum likelihood estimator of the shape parameters: :\frac = \sum_^N \ln X_i -N\frac=0 :\frac = \sum_^N \ln (1-X_i)- N\frac=0 where: :\frac = -\frac+ \frac+ \frac=-\psi(\alpha + \beta) + \psi(\alpha) + 0 :\frac= - \frac+ \frac + \frac=-\psi(\alpha + \beta) + 0 + \psi(\beta) since the
digamma function
denoted ψ(α) is defined as the logarithmic derivative of the
gamma function
: :\psi(\alpha) =\frac To ensure that the values with zero tangent slope are indeed a maximum (instead of a saddle-point or a minimum) one has to also satisfy the condition that the curvature is negative. This amounts to satisfying that the second partial derivative with respect to the shape parameters is negative :\frac= -N\frac<0 :\frac = -N\frac<0 using the previous equations, this is equivalent to: :\frac = \psi_1(\alpha)-\psi_1(\alpha + \beta) > 0 :\frac = \psi_1(\beta) -\psi_1(\alpha + \beta) > 0 where the
trigamma function
, denoted ''ψ''1(''α''), is the second of the
polygamma function
s, and is defined as the derivative of the digamma function: :\psi_1(\alpha) = \frac=\, \frac. These conditions are equivalent to stating that the variances of the logarithmically transformed variables are positive, since: :\operatorname[\ln (X)] = \operatorname[\ln^2 (X)] - (\operatorname[\ln (X)])^2 = \psi_1(\alpha) - \psi_1(\alpha + \beta) :\operatorname ln (1-X)= \operatorname[\ln^2 (1-X)] - (\operatorname[\ln (1-X)])^2 = \psi_1(\beta) - \psi_1(\alpha + \beta) Therefore, the condition of negative curvature at a maximum is equivalent to the statements: : \operatorname[\ln (X)] > 0 : \operatorname ln (1-X)> 0 Alternatively, the condition of negative curvature at a maximum is also equivalent to stating that the following logarithmic derivatives of the geometric means ''GX'' and ''G(1−X)'' are positive, since: : \psi_1(\alpha) - \psi_1(\alpha + \beta) = \frac > 0 : \psi_1(\beta) - \psi_1(\alpha + \beta) = \frac > 0 While these slopes are indeed positive, the other slopes are negative: :\frac, \frac < 0. The slopes of the mean and the median with respect to ''α'' and ''β'' display similar sign behavior. From the condition that at a maximum, the partial derivative with respect to the shape parameter equals zero, we obtain the following system of coupled maximum likelihood estimate equations (for the average log-likelihoods) that needs to be inverted to obtain the (unknown) shape parameter estimates \hat,\hat in terms of the (known) average of logarithms of the samples ''X''1, ..., ''XN'': :\begin \hat[\ln (X)] &= \psi(\hat) - \psi(\hat + \hat)=\frac\sum_^N \ln X_i = \ln \hat_X \\ \hat[\ln(1-X)] &= \psi(\hat) - \psi(\hat + \hat)=\frac\sum_^N \ln (1-X_i)= \ln \hat_ \end where we recognize \log \hat_X as the logarithm of the sample geometric mean and \log \hat_ as the logarithm of the sample geometric mean based on (1 − ''X''), the mirror-image of ''X''. For \hat=\hat, it follows that \hat_X=\hat_ . :\begin \hat_X &= \prod_^N (X_i)^ \\ \hat_ &= \prod_^N (1-X_i)^ \end These coupled equations containing
digamma function
s of the shape parameter estimates \hat,\hat must be solved by numerical methods as done, for example, by Beckman et al. Gnanadesikan et al. give numerical solutions for a few cases. Norman Lloyd Johnson, N.L.Johnson and Samuel Kotz, S.Kotz suggest that for "not too small" shape parameter estimates \hat,\hat, the logarithmic approximation to the digamma function \psi(\hat) \approx \ln(\hat-\tfrac) may be used to obtain initial values for an iterative solution, since the equations resulting from this approximation can be solved exactly: :\ln \frac \approx \ln \hat_X :\ln \frac\approx \ln \hat_ which leads to the following solution for the initial values (of the estimate shape parameters in terms of the sample geometric means) for an iterative solution: :\hat\approx \tfrac + \frac \text \hat >1 :\hat\approx \tfrac + \frac \text \hat > 1 Alternatively, the estimates provided by the method of moments can instead be used as initial values for an iterative solution of the maximum likelihood coupled equations in terms of the digamma functions. When the distribution is required over a known interval other than
[0, 1]
with random variable ''X'', say [''a'', ''c''] with random variable ''Y'', then replace ln(''Xi'') in the first equation with :\ln \frac, and replace ln(1−''Xi'') in the second equation with :\ln \frac (see "Alternative parametrizations, four parameters" section below). If one of the shape parameters is known, the problem is considerably simplified. The following logit transformation can be used to solve for the unknown shape parameter (for skewed cases such that \hat\neq\hat, otherwise, if symmetric, both -equal- parameters are known when one is known): :\hat \left[\ln \left(\frac \right) \right]=\psi(\hat) - \psi(\hat)=\frac\sum_^N \ln\frac = \ln \hat_X - \ln \left(\hat_\right) This logit transformation is the logarithm of the transformation that divides the variable ''X'' by its mirror-image (''X''/(1 - ''X'') resulting in the "inverted beta distribution" or
beta prime distribution
(also known as beta distribution of the second kind or Pearson distribution, Pearson's Type VI) with support [0, +∞). As previously discussed in the section "Moments of logarithmically transformed random variables," the logit transformation \ln\frac, studied by Johnson, extends the finite support
, 1 The comma is a punctuation mark that appears in several variants in different languages. It has the same shape as an apostrophe or single closing quotation mark () in many typefaces, but it differs from them in being placed on the baseline o ...
based on the original variable ''X'' to infinite support in both directions of the real line (−∞, +∞). If, for example, \hat is known, the unknown parameter \hat can be obtained in terms of the inverse digamma function of the right hand side of this equation: :\psi(\hat)=\frac\sum_^N \ln\frac + \psi(\hat) :\hat=\psi^(\ln \hat_X - \ln \hat_ + \psi(\hat)) In particular, if one of the shape parameters has a value of unity, for example for \hat = 1 (the power function distribution with bounded support [0,1]), using the identity ψ(''x'' + 1) = ψ(''x'') + 1/''x'' in the equation \psi(\hat) - \psi(\hat + \hat)= \ln \hat_X, the maximum likelihood estimator for the unknown parameter \hat is, exactly: :\hat= - \frac= - \frac The beta has support [0, 1], therefore \hat_X < 1, and hence (-\ln \hat_X) >0, and therefore \hat >0. In conclusion, the maximum likelihood estimates of the shape parameters of a beta distribution are (in general) a complicated function of the sample geometric mean, and of the sample geometric mean based on ''(1−X)'', the mirror-image of ''X''. One may ask, if the variance (in addition to the mean) is necessary to estimate two shape parameters with the method of moments, why is the (logarithmic or geometric) variance not necessary to estimate two shape parameters with the maximum likelihood method, for which only the geometric means suffice? The answer is because the mean does not provide as much information as the geometric mean. For a beta distribution with equal shape parameters ''α'' = ''β'', the mean is exactly 1/2, regardless of the value of the shape parameters, and therefore regardless of the value of the statistical dispersion (the variance). On the other hand, the geometric mean of a beta distribution with equal shape parameters ''α'' = ''β'', depends on the value of the shape parameters, and therefore it contains more information. Also, the geometric mean of a beta distribution does not satisfy the symmetry conditions satisfied by the mean, therefore, by employing both the geometric mean based on ''X'' and geometric mean based on (1 − ''X''), the maximum likelihood method is able to provide best estimates for both parameters ''α'' = ''β'', without need of employing the variance. One can express the joint log likelihood per ''N'' independent and identically distributed random variables, iid observations in terms of the ''sufficient statistics'' (the sample geometric means) as follows: :\frac = (\alpha - 1)\ln \hat_X + (\beta- 1)\ln \hat_- \ln \Beta(\alpha,\beta). We can plot the joint log likelihood per ''N'' observations for fixed values of the sample geometric means to see the behavior of the likelihood function as a function of the shape parameters α and β. In such a plot, the shape parameter estimators \hat,\hat correspond to the maxima of the likelihood function. See the accompanying graph that shows that all the likelihood functions intersect at α = β = 1, which corresponds to the values of the shape parameters that give the maximum entropy (the maximum entropy occurs for shape parameters equal to unity: the uniform distribution). It is evident from the plot that the likelihood function gives sharp peaks for values of the shape parameter estimators close to zero, but that for values of the shape parameters estimators greater than one, the likelihood function becomes quite flat, with less defined peaks. 
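As an illustrative sketch (not part of the referenced treatments), the coupled digamma equations above can be solved numerically from the sample log-geometric means, using the Johnson–Kotz approximation for the starting values. The sketch below assumes Python with NumPy and SciPy; the function names are illustrative only.
    # Sketch: maximum likelihood for Beta(alpha, beta) from the sample
    # log-geometric means, solving psi(a) - psi(a+b) = ln G_X and
    # psi(b) - psi(a+b) = ln G_(1-X) with a general-purpose root finder.
    import numpy as np
    from scipy.special import psi          # digamma function
    from scipy.optimize import fsolve

    def beta_mle(x):
        ln_gx  = np.mean(np.log(x))        # ln of sample geometric mean of X
        ln_g1x = np.mean(np.log1p(-x))     # ln of sample geometric mean of 1 - X
        gx, g1x = np.exp(ln_gx), np.exp(ln_g1x)
        # Johnson & Kotz logarithmic-approximation initial values
        a0 = 0.5 + gx  / (2.0 * (1.0 - gx - g1x))
        b0 = 0.5 + g1x / (2.0 * (1.0 - gx - g1x))
        eqs = lambda p: [psi(p[0]) - psi(p[0] + p[1]) - ln_gx,
                         psi(p[1]) - psi(p[0] + p[1]) - ln_g1x]
        return fsolve(eqs, [a0, b0])

    rng = np.random.default_rng(0)
    sample = rng.beta(2.0, 5.0, size=10_000)
    print(beta_mle(sample))                # should be close to (2, 5)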
The maximum likelihood parameter estimation method for the beta distribution therefore becomes less reliable for larger values of the shape parameter estimators, as the uncertainty in the peak definition increases with the value of the shape parameter estimators. One can arrive at the same conclusion by noticing that the curvature of the likelihood function is given in terms of the geometric variances:
:\frac{1}{N}\frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{\partial \alpha^2}= -\operatorname{var}[\ln X]
:\frac{1}{N}\frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{\partial \beta^2} = -\operatorname{var}[\ln (1-X)]
These variances (and therefore the curvatures) are much larger for small values of the shape parameters α and β. However, for shape parameter values α, β > 1, the variances (and therefore the curvatures) flatten out. Equivalently, this result follows from the Cramér–Rao bound, since the Fisher information matrix components for the beta distribution are these logarithmic variances. The Cramér–Rao bound states that the
variance
of any ''unbiased'' estimator \hat{\alpha} of α is bounded by the reciprocal of the Fisher information:
:\operatorname{var}(\hat{\alpha}) \geq\frac{1}{\mathcal{I}_{\alpha, \alpha}}=\frac{1}{\psi_1(\alpha) - \psi_1(\alpha + \beta)}
:\operatorname{var}(\hat{\beta}) \geq\frac{1}{\mathcal{I}_{\beta, \beta}}=\frac{1}{\psi_1(\beta) - \psi_1(\alpha + \beta)}
so the variance of the estimators increases with increasing α and β, as the logarithmic variances decrease. Also, one can express the joint log likelihood per ''N'' iid observations in terms of the digamma function expressions for the logarithms of the sample geometric means as follows:
:\frac{\ln\, \mathcal{L} (\alpha, \beta\mid X)}{N} = (\alpha - 1)(\psi(\hat{\alpha}) - \psi(\hat{\alpha} + \hat{\beta}))+(\beta- 1)(\psi(\hat{\beta}) - \psi(\hat{\alpha} + \hat{\beta}))- \ln \Beta(\alpha,\beta)
This expression is identical to the negative of the cross-entropy (see section on "Quantities of information (entropy)"). Therefore, finding the maximum of the joint log likelihood of the shape parameters, per ''N'' iid observations, is identical to finding the minimum of the cross-entropy for the beta distribution, as a function of the shape parameters:
:\frac{\ln\, \mathcal{L} (\alpha, \beta\mid X)}{N} = - H = -h - D_{\mathrm{KL}} = -\ln\Beta(\alpha,\beta)+(\alpha-1)\psi(\hat{\alpha})+(\beta-1)\psi(\hat{\beta})-(\alpha+\beta-2)\psi(\hat{\alpha}+\hat{\beta})
with the cross-entropy defined as follows:
:H = \int_{0}^1 - f(X;\hat{\alpha},\hat{\beta}) \ln (f(X;\alpha,\beta)) \, {\rm d}X
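As a small illustration of the behaviour of this likelihood surface (a sketch, assuming NumPy/SciPy; the sample values of the log-geometric means below are made up for the example), the per-observation log likelihood can be evaluated over a grid of (α, β) and its maximum located:
    # Sketch: ln L / N = (a-1) ln G_X + (b-1) ln G_(1-X) - ln B(a,b) for fixed
    # sample geometric means; the grid maximum sits near the ML estimates.
    import numpy as np
    from scipy.special import betaln

    ln_gx, ln_g1x = -1.0, -0.7            # illustrative values of ln G_X, ln G_(1-X)
    grid = np.linspace(0.1, 5.0, 200)
    A, B = np.meshgrid(grid, grid)
    loglik = (A - 1) * ln_gx + (B - 1) * ln_g1x - betaln(A, B)
    i, j = np.unravel_index(np.argmax(loglik), loglik.shape)
    print(A[i, j], B[i, j])               # grid point with the highest likelihood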


=Four unknown parameters

= The procedure is similar to the one followed in the two unknown parameter case. If ''Y''1, ..., ''YN'' are independent random variables each having a beta distribution with four parameters, the joint log likelihood function for ''N'' independent and identically distributed random variables, iid observations is: :\begin \ln\, \mathcal (\alpha, \beta, a, c\mid Y) &= \sum_^N \ln\,\mathcal_i (\alpha, \beta, a, c\mid Y_i)\\ &= \sum_^N \ln\,f(Y_i; \alpha, \beta, a, c) \\ &= \sum_^N \ln\,\frac\\ &= (\alpha - 1)\sum_^N \ln (Y_i - a) + (\beta- 1)\sum_^N \ln (c - Y_i)- N \ln \Beta(\alpha,\beta) - N (\alpha+\beta - 1) \ln (c - a) \end Finding the maximum with respect to a shape parameter involves taking the partial derivative with respect to the shape parameter and setting the expression equal to zero yielding the maximum likelihood estimator of the shape parameters: :\frac= \sum_^N \ln (Y_i - a) - N(-\psi(\alpha + \beta) + \psi(\alpha))- N \ln (c - a)= 0 :\frac = \sum_^N \ln (c - Y_i) - N(-\psi(\alpha + \beta) + \psi(\beta))- N \ln (c - a)= 0 :\frac = -(\alpha - 1) \sum_^N \frac \,+ N (\alpha+\beta - 1)\frac= 0 :\frac = (\beta- 1) \sum_^N \frac \,- N (\alpha+\beta - 1) \frac = 0 these equations can be re-arranged as the following system of four coupled equations (the first two equations are geometric means and the second two equations are the harmonic means) in terms of the maximum likelihood estimates for the four parameters \hat, \hat, \hat, \hat: :\frac\sum_^N \ln \frac = \psi(\hat)-\psi(\hat +\hat )= \ln \hat_X :\frac\sum_^N \ln \frac = \psi(\hat)-\psi(\hat + \hat)= \ln \hat_ :\frac = \frac= \hat_X :\frac = \frac = \hat_ with sample geometric means: :\hat_X = \prod_^ \left (\frac \right )^ :\hat_ = \prod_^ \left (\frac \right )^ The parameters \hat, \hat are embedded inside the geometric mean expressions in a nonlinear way (to the power 1/''N''). This precludes, in general, a closed form solution, even for an initial value approximation for iteration purposes. One alternative is to use as initial values for iteration the values obtained from the method of moments solution for the four parameter case. Furthermore, the expressions for the harmonic means are well-defined only for \hat, \hat > 1, which precludes a maximum likelihood solution for shape parameters less than unity in the four-parameter case. Fisher's information matrix for the four parameter case is Positive-definite matrix, positive-definite only for α, β > 2 (for further discussion, see section on Fisher information matrix, four parameter case), for bell-shaped (symmetric or unsymmetric) beta distributions, with inflection points located to either side of the mode. The following Fisher information components (that represent the expectations of the curvature of the log likelihood function) have mathematical singularity, singularities at the following values: :\alpha = 2: \quad \operatorname \left [- \frac \frac \right ]= _ :\beta = 2: \quad \operatorname\left [- \frac \frac \right ] = _ :\alpha = 2: \quad \operatorname\left [- \frac\frac\right ] = _ :\beta = 1: \quad \operatorname\left [- \frac\frac \right ] = _ (for further discussion see section on Fisher information matrix). Thus, it is not possible to strictly carry on the maximum likelihood estimation for some well known distributions belonging to the four-parameter beta distribution family, like the continuous uniform distribution, uniform distribution (Beta(1, 1, ''a'', ''c'')), and the arcsine distribution (Beta(1/2, 1/2, ''a'', ''c'')). 
N. L. Johnson and S. Kotz ignore the equations for the harmonic means and instead suggest "If a and c are unknown, and maximum likelihood estimators of ''a'', ''c'', α and β are required, the above procedure (for the two unknown parameter case, with ''X'' transformed as ''X'' = (''Y'' − ''a'')/(''c'' − ''a'')) can be repeated using a succession of trial values of ''a'' and ''c'', until the pair (''a'', ''c'') for which maximum likelihood (given ''a'' and ''c'') is as great as possible, is attained" (where, for the purpose of clarity, their notation for the parameters has been translated into the present notation).
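A minimal sketch of this profile-likelihood suggestion follows, assuming Python with NumPy/SciPy (the grid search and function names are illustrative, not from Johnson and Kotz); it fixes trial endpoints with SciPy's floc/fscale arguments and keeps the pair with the largest log likelihood:
    # Sketch: profile over trial values of (a, c); for each pair, fit (alpha, beta)
    # by maximum likelihood on the rescaled data and record the log likelihood.
    import numpy as np
    from scipy import stats

    def profile_loglik(y, a, c):
        alpha, beta, _, _ = stats.beta.fit(y, floc=a, fscale=c - a)
        return np.sum(stats.beta.logpdf(y, alpha, beta, loc=a, scale=c - a))

    def fit_four_parameters(y, a_grid, c_grid):
        candidates = ((profile_loglik(y, a, c), a, c)
                      for a in a_grid for c in c_grid
                      if a < y.min() and c > y.max())
        return max(candidates)        # (log likelihood, a, c) for the best pair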


Fisher information matrix

Let a random variable X have a probability density ''f''(''x'';''α''). The partial derivative with respect to the (unknown, and to be estimated) parameter α of the log
likelihood function
is called the score. The second moment of the score is called the
Fisher information:
:\mathcal{I}(\alpha)=\operatorname{E} \left [\left (\frac{\partial}{\partial\alpha} \ln \mathcal{L}(\alpha\mid X) \right )^2 \right].
The expectation of the score is zero, therefore the Fisher information is also the second moment centered on the mean of the score: the
variance
of the score. If the log
likelihood function
is twice differentiable with respect to the parameter α, and under certain regularity conditions, then the Fisher information may also be written as follows (which is often a more convenient form for calculation purposes):
:\mathcal{I}(\alpha) = - \operatorname{E} \left [\frac{\partial^2}{\partial\alpha^2} \ln (\mathcal{L}(\alpha\mid X)) \right].
Thus, the Fisher information is the negative of the expectation of the second derivative with respect to the parameter α of the log
likelihood function
. Therefore, Fisher information is a measure of the curvature of the log likelihood function of α. A low-curvature (and therefore high radius of curvature), flatter log likelihood curve has low Fisher information, while a log likelihood curve with large curvature (and therefore low radius of curvature) has high Fisher information. When the Fisher information matrix is computed at the parameter estimates ("the observed Fisher information matrix"), it is equivalent to replacing the true log likelihood surface by a Taylor series approximation, taken as far as the quadratic terms. The word information, in the context of Fisher information, refers to information about the parameters: estimation, sufficiency, and the properties of variances of estimators. The Cramér–Rao bound states that the inverse of the Fisher information is a lower bound on the variance of any
estimator
of a parameter α:
:\operatorname{var}[\hat\alpha] \geq \frac{1}{\mathcal{I}(\alpha)}.
The precision to which one can estimate the parameter α is limited by the Fisher information of the log likelihood function. The Fisher information is a measure of the minimum error involved in estimating a parameter of a distribution, and it can be viewed as a measure of the resolving power of an experiment needed to discriminate between two alternative hypotheses about a parameter. When there are ''N'' parameters
: \begin{bmatrix} \theta_1 \\ \theta_2 \\ \vdots \\ \theta_N \end{bmatrix},
then the Fisher information takes the form of an ''N''×''N'' positive semidefinite symmetric matrix, the Fisher information matrix, with typical element:
:{\mathcal{I}}_{i,j}=\operatorname{E} \left [\left (\frac{\partial}{\partial\theta_i} \ln \mathcal{L} \right) \left(\frac{\partial}{\partial\theta_j} \ln \mathcal{L} \right) \right ].
Under certain regularity conditions, the Fisher information matrix may also be written in the following form, which is often more convenient for computation:
:{\mathcal{I}}_{i, j} = - \operatorname{E} \left [\frac{\partial^2}{\partial\theta_i \, \partial\theta_j} \ln (\mathcal{L}) \right ]\,.
With ''X''1, ..., ''X''''N'' iid random variables, an ''N''-dimensional "box" can be constructed with sides ''X''1, ..., ''X''''N''. Costa and Cover show that the (Shannon) differential entropy ''h''(''X'') is related to the volume of the typical set (having the sample entropy close to the true entropy), while the Fisher information is related to the surface of this typical set.
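The two equivalent forms of the Fisher information can be checked numerically for the shape parameter α of a beta distribution. This is a sketch under the assumption that NumPy/SciPy are available; the variable names are illustrative:
    # Sketch: I(alpha) = var(score) = -E[d^2/d alpha^2 ln L] = psi_1(alpha) - psi_1(alpha+beta),
    # checked by Monte Carlo for a Beta(2, 3) model.
    import numpy as np
    from scipy.special import polygamma

    alpha, beta_ = 2.0, 3.0
    x = np.random.default_rng(1).beta(alpha, beta_, size=200_000)
    # score with respect to alpha for a single observation
    score = np.log(x) - (polygamma(0, alpha) - polygamma(0, alpha + beta_))
    print(np.mean(score))                                      # expectation of the score: ~0
    print(np.var(score))                                       # Monte Carlo Fisher information
    print(polygamma(1, alpha) - polygamma(1, alpha + beta_))   # exact trigamma expression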


=Two parameters

= For ''X''1, ..., ''X''''N'' independent random variables each having a beta distribution parametrized with shape parameters ''α'' and ''β'', the joint log likelihood function for ''N'' independent and identically distributed random variables, iid observations is: :\ln (\mathcal (\alpha, \beta\mid X) )= (\alpha - 1)\sum_^N \ln X_i + (\beta- 1)\sum_^N \ln (1-X_i)- N \ln \Beta(\alpha,\beta) therefore the joint log likelihood function per ''N'' independent and identically distributed random variables, iid observations is: :\frac \ln(\mathcal (\alpha, \beta\mid X)) = (\alpha - 1)\frac\sum_^N \ln X_i + (\beta- 1)\frac\sum_^N \ln (1-X_i)-\, \ln \Beta(\alpha,\beta) For the two parameter case, the Fisher information has 4 components: 2 diagonal and 2 off-diagonal. Since the Fisher information matrix is symmetric, one of these off diagonal components is independent. Therefore, the Fisher information matrix has 3 independent components (2 diagonal and 1 off diagonal). Aryal and Nadarajah calculated Fisher's information matrix for the four-parameter case, from which the two parameter case can be obtained as follows: :- \frac= \operatorname[\ln (X)]= \psi_1(\alpha) - \psi_1(\alpha + \beta) =_= \operatorname\left [- \frac \right ] = \ln \operatorname_ :- \frac = \operatorname ln (1-X)= \psi_1(\beta) - \psi_1(\alpha + \beta) =_= \operatorname\left [- \frac \right]= \ln \operatorname_ :- \frac = \operatorname[\ln X,\ln(1-X)] = -\psi_1(\alpha+\beta) =_= \operatorname\left [- \frac \right] = \ln \operatorname_ Since the Fisher information matrix is symmetric : \mathcal_= \mathcal_= \ln \operatorname_ The Fisher information components are equal to the log geometric variances and log geometric covariance. Therefore, they can be expressed as
trigamma functions, denoted ψ1(α), the second of the polygamma functions, defined as the derivative of the digamma function:
:\psi_1(\alpha) = \frac{d^2\ln\Gamma(\alpha)}{d\alpha^2}=\, \frac{d\,\psi(\alpha)}{d\alpha}.
These derivatives are also derived in the section "Two unknown parameters", and plots of the log likelihood function are shown there as well. The section on the geometric variance and covariance contains plots and further discussion of the Fisher information matrix components, the log geometric variances and log geometric covariance, as a function of the shape parameters α and β, and the section "Moments of logarithmically transformed random variables" contains formulas for moments of logarithmically transformed random variables; images for the Fisher information components \mathcal{I}_{\alpha, \alpha}, \mathcal{I}_{\beta, \beta} and \mathcal{I}_{\alpha, \beta} are shown there as well. The determinant of Fisher's information matrix is of interest (for example, for the calculation of the Jeffreys prior probability). From the expressions for the individual components of the Fisher information matrix, it follows that the determinant of Fisher's (symmetric) information matrix for the beta distribution is:
:\begin{align}
\det(\mathcal{I}(\alpha, \beta))&= \mathcal{I}_{\alpha, \alpha} \mathcal{I}_{\beta, \beta}-\mathcal{I}_{\alpha, \beta} \mathcal{I}_{\alpha, \beta} \\
&=(\psi_1(\alpha) - \psi_1(\alpha + \beta))(\psi_1(\beta) - \psi_1(\alpha + \beta))-( -\psi_1(\alpha+\beta))( -\psi_1(\alpha+\beta))\\
&= \psi_1(\alpha)\psi_1(\beta)-( \psi_1(\alpha)+\psi_1(\beta))\psi_1(\alpha + \beta)\\
\lim_{\alpha\to 0} \det(\mathcal{I}(\alpha, \beta)) &=\lim_{\beta \to 0} \det(\mathcal{I}(\alpha, \beta)) = \infty\\
\lim_{\alpha\to \infty} \det(\mathcal{I}(\alpha, \beta)) &=\lim_{\beta \to \infty} \det(\mathcal{I}(\alpha, \beta)) = 0
\end{align}
From Sylvester's criterion (checking that the leading principal minors are all positive), it follows that the Fisher information matrix for the two parameter case is positive-definite (under the standard condition that the shape parameters are positive, ''α'' > 0 and ''β'' > 0).
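The 2×2 Fisher information matrix and its determinant are easy to evaluate from the trigamma function. The following is a sketch, assuming NumPy/SciPy; the helper name is illustrative:
    # Sketch: Fisher information matrix of Beta(alpha, beta) built from trigamma
    # functions, its determinant, and a Sylvester-style positive-definiteness check.
    import numpy as np
    from scipy.special import polygamma

    def fisher_matrix(alpha, beta):
        t_a, t_b, t_ab = (polygamma(1, alpha), polygamma(1, beta),
                          polygamma(1, alpha + beta))
        return np.array([[t_a - t_ab, -t_ab],
                         [-t_ab,      t_b - t_ab]])

    I = fisher_matrix(2.0, 3.0)
    print(np.linalg.det(I))     # equals psi1(a)*psi1(b) - (psi1(a)+psi1(b))*psi1(a+b)
    print(I[0, 0] > 0 and np.linalg.det(I) > 0)   # leading principal minors positive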


=Four parameters

= If ''Y''1, ..., ''YN'' are independent random variables each having a beta distribution with four parameters: the exponents ''α'' and ''β'', and also ''a'' (the minimum of the distribution range), and ''c'' (the maximum of the distribution range) (section titled "Alternative parametrizations", "Four parameters"), with
probability density function
: :f(y; \alpha, \beta, a, c) = \frac =\frac=\frac. the joint log likelihood function per ''N'' independent and identically distributed random variables, iid observations is: :\frac \ln(\mathcal (\alpha, \beta, a, c\mid Y))= \frac\sum_^N \ln (Y_i - a) + \frac\sum_^N \ln (c - Y_i)- \ln \Beta(\alpha,\beta) - (\alpha+\beta -1) \ln (c-a) For the four parameter case, the Fisher information has 4*4=16 components. It has 12 off-diagonal components = (4×4 total − 4 diagonal). Since the Fisher information matrix is symmetric, half of these components (12/2=6) are independent. Therefore, the Fisher information matrix has 6 independent off-diagonal + 4 diagonal = 10 independent components. Aryal and Nadarajah calculated Fisher's information matrix for the four parameter case as follows: :- \frac \frac= \operatorname[\ln (X)]= \psi_1(\alpha) - \psi_1(\alpha + \beta) = \mathcal_= \operatorname\left [- \frac \frac \right ] = \ln (\operatorname) :-\frac \frac = \operatorname ln (1-X)= \psi_1(\beta) - \psi_1(\alpha + \beta) =_= \operatorname \left [- \frac \frac \right ] = \ln(\operatorname) :-\frac \frac = \operatorname[\ln X,(1-X)] = -\psi_1(\alpha+\beta) =\mathcal_= \operatorname \left [- \frac\frac \right ] = \ln(\operatorname_) In the above expressions, the use of ''X'' instead of ''Y'' in the expressions var[ln(''X'')] = ln(var''GX'') is ''not an error''. The expressions in terms of the log geometric variances and log geometric covariance occur as functions of the two parameter ''X'' ~ Beta(''α'', ''β'') parametrization because when taking the partial derivatives with respect to the exponents (''α'', ''β'') in the four parameter case, one obtains the identical expressions as for the two parameter case: these terms of the four parameter Fisher information matrix are independent of the minimum ''a'' and maximum ''c'' of the distribution's range. The only non-zero term upon double differentiation of the log likelihood function with respect to the exponents ''α'' and ''β'' is the second derivative of the log of the beta function: ln(B(''α'', ''β'')). This term is independent of the minimum ''a'' and maximum ''c'' of the distribution's range. Double differentiation of this term results in trigamma functions. The sections titled "Maximum likelihood", "Two unknown parameters" and "Four unknown parameters" also show this fact. The Fisher information for ''N'' i.i.d. samples is ''N'' times the individual Fisher information (eq. 11.279, page 394 of Cover and Thomas). (Aryal and Nadarajah take a single observation, ''N'' = 1, to calculate the following components of the Fisher information, which leads to the same result as considering the derivatives of the log likelihood per ''N'' observations. Moreover, below the erroneous expression for _ in Aryal and Nadarajah has been corrected.) 
:\begin \alpha > 2: \quad \operatorname\left [- \frac \frac \right ] &= _=\frac \\ \beta > 2: \quad \operatorname\left[-\frac \frac \right ] &= \mathcal_ = \frac \\ \operatorname\left[- \frac \frac \right ] &= _ = \frac \\ \alpha > 1: \quad \operatorname\left[- \frac \frac \right ] &=\mathcal_ = \frac \\ \operatorname\left[- \frac \frac \right ] &= _ = \frac \\ \operatorname\left[- \frac \frac \right ] &= _ = -\frac \\ \beta > 1: \quad \operatorname\left[- \frac \frac \right ] &= \mathcal_ = -\frac \end The lower two diagonal entries of the Fisher information matrix, with respect to the parameter "a" (the minimum of the distribution's range): \mathcal_, and with respect to the parameter "c" (the maximum of the distribution's range): \mathcal_ are only defined for exponents α > 2 and β > 2 respectively. The Fisher information matrix component \mathcal_ for the minimum "a" approaches infinity for exponent α approaching 2 from above, and the Fisher information matrix component \mathcal_ for the maximum "c" approaches infinity for exponent β approaching 2 from above. The Fisher information matrix for the four parameter case does not depend on the individual values of the minimum "a" and the maximum "c", but only on the total range (''c''−''a''). Moreover, the components of the Fisher information matrix that depend on the range (''c''−''a''), depend only through its inverse (or the square of the inverse), such that the Fisher information decreases for increasing range (''c''−''a''). The accompanying images show the Fisher information components \mathcal_ and \mathcal_. Images for the Fisher information components \mathcal_ and \mathcal_ are shown in . All these Fisher information components look like a basin, with the "walls" of the basin being located at low values of the parameters. The following four-parameter-beta-distribution Fisher information components can be expressed in terms of the two-parameter: ''X'' ~ Beta(α, β) expectations of the transformed ratio ((1-''X'')/''X'') and of its mirror image (''X''/(1-''X'')), scaled by the range (''c''−''a''), which may be helpful for interpretation: :\mathcal_ =\frac= \frac \text\alpha > 1 :\mathcal_ = -\frac=- \frac\text\beta> 1 These are also the expected values of the "inverted beta distribution" or
beta prime distribution
(also known as beta distribution of the second kind or Pearson distribution, Pearson's Type VI) and its mirror image, scaled by the range (''c'' − ''a''). Also, the following Fisher information components can be expressed in terms of the harmonic (1/X) variances or of variances based on the ratio transformed variables ((1-X)/X) as follows: :\begin \alpha > 2: \quad \mathcal_ &=\operatorname \left [\frac \right] \left (\frac \right )^2 =\operatorname \left [\frac \right ] \left (\frac \right)^2 = \frac \\ \beta > 2: \quad \mathcal_ &= \operatorname \left [\frac \right ] \left (\frac \right )^2 = \operatorname \left [\frac \right ] \left (\frac \right )^2 =\frac \\ \mathcal_ &=\operatorname \left [\frac,\frac \right ]\frac = \operatorname \left [\frac,\frac \right ] \frac =\frac \end See section "Moments of linearly transformed, product and inverted random variables" for these expectations. The determinant of Fisher's information matrix is of interest (for example for the calculation of Jeffreys prior probability). From the expressions for the individual components, it follows that the determinant of Fisher's (symmetric) information matrix for the beta distribution with four parameters is: :\begin \det(\mathcal(\alpha,\beta,a,c)) = & -\mathcal_^2 \mathcal_ \mathcal_+\mathcal_ \mathcal_ \mathcal_ \mathcal_+\mathcal_^2 \mathcal_^2 -\mathcal_ \mathcal_ \mathcal_^2\\ & -\mathcal_ \mathcal_ \mathcal_ \mathcal_+\mathcal_^2 \mathcal_ \mathcal_+2 \mathcal_ \mathcal_ \mathcal_ \mathcal_\\ & -2\mathcal_ \mathcal_ \mathcal_ \mathcal_+\mathcal_^2 \mathcal_^2-\mathcal_ \mathcal_ \mathcal_^2+\mathcal_ \mathcal_^2 \mathcal_\\ & -\mathcal_ \mathcal_ \mathcal_ \mathcal_-\mathcal_ \mathcal_ \mathcal_ \mathcal_+\mathcal_ \mathcal_ \mathcal_ \mathcal_\\ & -\mathcal_ \mathcal_ \mathcal_ \mathcal_+\mathcal_ \mathcal_ \mathcal_ \mathcal_-\mathcal_ \mathcal_^2 \mathcal_\\ & +2 \mathcal_ \mathcal_ \mathcal_ \mathcal_-\mathcal_ \mathcal_^2 \mathcal_-\mathcal_^2 \mathcal_ \mathcal_+\mathcal_ \mathcal_ \mathcal_ \mathcal_\text\alpha, \beta> 2 \end Using Sylvester's criterion (checking whether the diagonal elements are all positive), and since diagonal components _ and _ have Mathematical singularity, singularities at α=2 and β=2 it follows that the Fisher information matrix for the four parameter case is Positive-definite matrix, positive-definite for α>2 and β>2. Since for α > 2 and β > 2 the beta distribution is (symmetric or unsymmetric) bell shaped, it follows that the Fisher information matrix is positive-definite only for bell-shaped (symmetric or unsymmetric) beta distributions, with inflection points located to either side of the mode. Thus, important well known distributions belonging to the four-parameter beta distribution family, like the parabolic distribution (Beta(2,2,a,c)) and the continuous uniform distribution, uniform distribution (Beta(1,1,a,c)) have Fisher information components (\mathcal_,\mathcal_,\mathcal_,\mathcal_) that blow up (approach infinity) in the four-parameter case (although their Fisher information components are all defined for the two parameter case). The four-parameter Wigner semicircle distribution (Beta(3/2,3/2,''a'',''c'')) and arcsine distribution (Beta(1/2,1/2,''a'',''c'')) have negative Fisher information determinants for the four-parameter case.
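The beta prime expectations quoted above can be checked directly by simulation. This is a small sketch (assuming NumPy), using the standard facts that E[''X''/(1−''X'')] = α/(β−1) for β > 1 and E[(1−''X'')/''X''] = β/(α−1) for α > 1:
    # Sketch: Monte Carlo check of the expectations that enter (scaled by the
    # range c - a) the four-parameter Fisher information components.
    import numpy as np

    alpha, beta_ = 3.0, 4.0
    x = np.random.default_rng(2).beta(alpha, beta_, size=500_000)
    print(np.mean(x / (1 - x)), alpha / (beta_ - 1))    # E[X/(1-X)] vs alpha/(beta-1)
    print(np.mean((1 - x) / x), beta_ / (alpha - 1))    # E[(1-X)/X] vs beta/(alpha-1)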


Bayesian inference

The use of Beta distributions in Bayesian inference is due to the fact that they provide a family of conjugate prior probability distributions for binomial (including
Bernoulli
) and geometric distributions. The domain of the beta distribution can be viewed as a probability, and in fact the beta distribution is often used to describe the distribution of a probability value ''p'':
:P(p;\alpha,\beta) = \frac{p^{\alpha-1}(1-p)^{\beta-1}}{\Beta(\alpha,\beta)}.
Examples of beta distributions used as prior probabilities to represent ignorance of prior parameter values in Bayesian inference are Beta(1,1), Beta(0,0) and Beta(1/2,1/2).
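Conjugacy means that the posterior obtained from a beta prior and a binomial likelihood is again a beta distribution. A minimal sketch, assuming SciPy (the prior values and data below are illustrative):
    # Sketch of the conjugate update: a Beta(a0, b0) prior on p combined with
    # s successes in n Bernoulli trials gives a Beta(a0 + s, b0 + n - s) posterior.
    from scipy.stats import beta

    a0, b0 = 2.0, 2.0          # illustrative prior
    s, n = 7, 10               # observed successes and trials
    posterior = beta(a0 + s, b0 + n - s)
    print(posterior.mean(), posterior.interval(0.95))   # posterior mean, central 95% interval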


Rule of succession

A classic application of the beta distribution is the rule of succession, introduced in the 18th century by Pierre-Simon Laplace in the course of treating the sunrise problem. It states that, given ''s'' successes in ''n'' conditionally independent Bernoulli trials with probability ''p'', the estimate of the expected value in the next trial is \frac{s+1}{n+2}. This estimate is the expected value of the posterior distribution over ''p'', namely Beta(''s''+1, ''n''−''s''+1), which is given by Bayes' rule if one assumes a uniform prior probability over ''p'' (i.e., Beta(1, 1)) and then observes that ''p'' generated ''s'' successes in ''n'' trials. Laplace's rule of succession has been criticized by prominent scientists. R. T. Cox described Laplace's application of the rule of succession to the sunrise problem (p. 89) as "a travesty of the proper use of the principle." Keynes remarks (Ch. XXX, p. 382) "indeed this is so foolish a theorem that to entertain it is discreditable." Karl Pearson showed that the probability that the next (''n'' + 1) trials will all be successes, after ''n'' successes in ''n'' trials, is only 50%, which has been considered too low by scientists like Jeffreys and unacceptable as a representation of the scientific process of experimentation to test a proposed scientific law. As pointed out by Jeffreys (p. 128), crediting C. D. Broad, Laplace's rule of succession establishes a high probability of success ((''n''+1)/(''n''+2)) in the next trial, but only a moderate probability (50%) that a further sample of comparable size (''n''+1) will be equally successful. As pointed out by Perks, "The rule of succession itself is hard to accept. It assigns a probability to the next trial which implies the assumption that the actual run observed is an average run and that we are always at the end of an average run. It would, one would think, be more reasonable to assume that we were in the middle of an average run. Clearly a higher value for both probabilities is necessary if they are to accord with reasonable belief." These problems with Laplace's rule of succession motivated Haldane, Perks, Jeffreys and others to search for other forms of prior probability (see the following sections). According to Jaynes, the main problem with the rule of succession is that it is not valid when ''s'' = 0 or ''s'' = ''n'' (see the rule of succession article for an analysis of its validity).
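As a small worked illustration (a sketch in plain Python, not from the sources above), the rule of succession is just the posterior mean under the uniform Beta(1,1) prior:
    # Sketch: Laplace's rule of succession as the posterior mean under a
    # uniform Beta(1,1) prior after s successes in n trials.
    def rule_of_succession(s, n):
        return (s + 1) / (n + 2)

    print(rule_of_succession(10, 10))   # 11/12, after an unbroken run of successes
    print(rule_of_succession(0, 10))    # 1/12, after an unbroken run of failures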


Bayes-Laplace prior probability (Beta(1,1))

The beta distribution achieves maximum differential entropy for Beta(1,1): the Uniform density, uniform probability density, for which all values in the domain of the distribution have equal density. This uniform distribution Beta(1,1) was suggested ("with a great deal of doubt") by Thomas Bayes as the prior probability distribution to express ignorance about the correct prior distribution. This prior distribution was adopted (apparently, from his writings, with little sign of doubt) by Pierre-Simon Laplace, and hence it was also known as the "Bayes-Laplace rule" or the "Laplace rule" of "inverse probability" in publications of the first half of the 20th century. In the later part of the 19th century and early part of the 20th century, scientists realized that the assumption of uniform "equal" probability density depended on the actual functions (for example whether a linear or a logarithmic scale was most appropriate) and parametrizations used. In particular, the behavior near the ends of distributions with finite support (for example near ''x'' = 0, for a distribution with initial support at ''x'' = 0) required particular attention. Keynes ( Ch.XXX, p. 381) criticized the use of Bayes's uniform prior probability (Beta(1,1)) that all values between zero and one are equiprobable, as follows: "Thus experience, if it shows anything, shows that there is a very marked clustering of statistical ratios in the neighborhoods of zero and unity, of those for positive theories and for correlations between positive qualities in the neighborhood of zero, and of those for negative theories and for correlations between negative qualities in the neighborhood of unity. "


Haldane's prior probability (Beta(0,0))

The Beta(0,0) distribution was proposed by J.B.S. Haldane, who suggested that the prior probability representing complete uncertainty should be proportional to ''p''−1(1−''p'')−1. The function ''p''−1(1−''p'')−1 can be viewed as the limit of the numerator of the beta distribution as both shape parameters approach zero: α, β → 0. The Beta function (in the denominator of the beta distribution) approaches infinity, for both parameters approaching zero, α, β → 0. Therefore, ''p''−1(1−''p'')−1 divided by the Beta function approaches a 2-point
Bernoulli distribution
with equal probability 1/2 at each end, at 0 and 1, and nothing in between, as α, β → 0. A coin-toss: one face of the coin being at 0 and the other face being at 1. The Haldane prior probability distribution Beta(0,0) is an "improper prior" because its integration (from 0 to 1) fails to strictly converge to 1 due to the singularities at each end. However, this is not an issue for computing posterior probabilities unless the sample size is very small. Furthermore, Zellner points out that on the log-odds scale, (the logit transformation ln(''p''/1−''p'')), the Haldane prior is the uniformly flat prior. The fact that a uniform prior probability on the logit transformed variable ln(''p''/1−''p'') (with domain (-∞, ∞)) is equivalent to the Haldane prior on the domain
[0, 1]
was pointed out by Harold Jeffreys in the first edition (1939) of his book Theory of Probability ( p. 123). Jeffreys writes "Certainly if we take the Bayes-Laplace rule right up to the extremes we are led to results that do not correspond to anybody's way of thinking. The (Haldane) rule d''x''/(''x''(1−''x'')) goes too far the other way. It would lead to the conclusion that if a sample is of one type with respect to some property there is a probability 1 that the whole population is of that type." The fact that "uniform" depends on the parametrization, led Jeffreys to seek a form of prior that would be invariant under different parametrizations.
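As a short change-of-variables check of this equivalence (a sketch added here, not taken from Zellner or Jeffreys), writing the log-odds as θ = ln(''p''/(1 − ''p'')) gives
:\left|\frac{d\theta}{dp}\right| = \frac{1}{p(1-p)},
so a prior density that is constant (flat) in θ transforms, after multiplying by this Jacobian, into a density proportional to ''p''^−1(1 − ''p'')^−1 in ''p'', which is exactly the Haldane prior.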


Jeffreys' prior probability (Beta(1/2,1/2) for a Bernoulli or for a binomial distribution)

Harold Jeffreys proposed to use an
uninformative prior
probability measure that should be invariant under reparametrization: proportional to the square root of the determinant of Fisher's information matrix. For the
Bernoulli distribution
, this can be shown as follows: for a coin that is "heads" with probability ''p'' ∈
[0, 1]
and is "tails" with probability 1 − ''p'', for a given (H,T) ∈ the probability is ''pH''(1 − ''p'')''T''. Since ''T'' = 1 − ''H'', the
Bernoulli distribution
is ''p''^''H''(1 − ''p'')^(1 − ''H''). Considering ''p'' as the only parameter, it follows that the log likelihood for the Bernoulli distribution is
:\ln \mathcal{L} (p\mid H) = H \ln(p)+ (1-H) \ln(1-p).
The Fisher information matrix has only one component (it is a scalar, because there is only one parameter: ''p''), therefore:
:\begin{align}
\sqrt{\det(\mathcal{I}(p))} &= \sqrt{\operatorname{E}\left[\left(\frac{\partial}{\partial p} \ln \mathcal{L}(p\mid H)\right)^2\right]} \\
&= \sqrt{\operatorname{E}\left[\left(\frac{H}{p}-\frac{1-H}{1-p}\right)^2\right]} \\
&= \sqrt{p\left(\frac{1}{p}\right)^2 + (1-p)\left(\frac{1}{1-p}\right)^2} \\
&= \frac{1}{\sqrt{p(1-p)}}.
\end{align}
Similarly, for the binomial distribution with ''n'' Bernoulli trials, it can be shown that
:\sqrt{\det(\mathcal{I}(p))}= \frac{\sqrt{n}}{\sqrt{p(1-p)}}.
Thus, for the
Bernoulli
, and Binomial distributions, Jeffreys prior is proportional to \scriptstyle \frac, which happens to be proportional to a beta distribution with domain variable ''x'' = ''p'', and shape parameters α = β = 1/2, the arcsine distribution: :Beta(\tfrac, \tfrac) = \frac. It will be shown in the next section that the normalizing constant for Jeffreys prior is immaterial to the final result because the normalizing constant cancels out in Bayes theorem for the posterior probability. Hence Beta(1/2,1/2) is used as the Jeffreys prior for both Bernoulli and binomial distributions. As shown in the next section, when using this expression as a prior probability times the likelihood in Bayes theorem, the posterior probability turns out to be a beta distribution. It is important to realize, however, that Jeffreys prior is proportional to \scriptstyle \frac for the Bernoulli and binomial distribution, but not for the beta distribution. Jeffreys prior for the beta distribution is given by the determinant of Fisher's information for the beta distribution, which, as shown in the is a function of the
trigamma function
ψ1 of shape parameters α and β as follows: : \begin \sqrt &= \sqrt \\ \lim_ \sqrt &=\lim_ \sqrt = \infty\\ \lim_ \sqrt &=\lim_ \sqrt = 0 \end As previously discussed, Jeffreys prior for the Bernoulli and binomial distributions is proportional to the arcsine distribution Beta(1/2,1/2), a one-dimensional ''curve'' that looks like a basin as a function of the parameter ''p'' of the Bernoulli and binomial distributions. The walls of the basin are formed by ''p'' approaching the singularities at the ends ''p'' → 0 and ''p'' → 1, where Beta(1/2,1/2) approaches infinity. Jeffreys prior for the beta distribution is a ''2-dimensional surface'' (embedded in a three-dimensional space) that looks like a basin with only two of its walls meeting at the corner α = β = 0 (and missing the other two walls) as a function of the shape parameters α and β of the beta distribution. The two adjoining walls of this 2-dimensional surface are formed by the shape parameters α and β approaching the singularities (of the trigamma function) at α, β → 0. It has no walls for α, β → ∞ because in this case the determinant of Fisher's information matrix for the beta distribution approaches zero. It will be shown in the next section that Jeffreys prior probability results in posterior probabilities (when multiplied by the binomial likelihood function) that are intermediate between the posterior probability results of the Haldane and Bayes prior probabilities. Jeffreys prior may be difficult to obtain analytically, and for some cases it just doesn't exist (even for simple distribution functions like the asymmetric triangular distribution). Berger, Bernardo and Sun, in a 2009 paper defined a reference prior probability distribution that (unlike Jeffreys prior) exists for the asymmetric triangular distribution. They cannot obtain a closed-form expression for their reference prior, but numerical calculations show it to be nearly perfectly fitted by the (proper) prior : \operatorname(\tfrac, \tfrac) \sim\frac where θ is the vertex variable for the asymmetric triangular distribution with support
[0, 1]
(corresponding to the following parameter values in Wikipedia's article on the triangular distribution: vertex ''c'' = ''θ'', left end ''a'' = 0,and right end ''b'' = 1). Berger et al. also give a heuristic argument that Beta(1/2,1/2) could indeed be the exact Berger–Bernardo–Sun reference prior for the asymmetric triangular distribution. Therefore, Beta(1/2,1/2) not only is Jeffreys prior for the Bernoulli and binomial distributions, but also seems to be the Berger–Bernardo–Sun reference prior for the asymmetric triangular distribution (for which the Jeffreys prior does not exist), a distribution used in project management and PERT analysis to describe the cost and duration of project tasks. Clarke and Barron prove that, among continuous positive priors, Jeffreys prior (when it exists) asymptotically maximizes Shannon's mutual information between a sample of size n and the parameter, and therefore ''Jeffreys prior is the most uninformative prior'' (measuring information as Shannon information). The proof rests on an examination of the Kullback–Leibler divergence between probability density functions for iid random variables.
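The proportionality between the square root of the Bernoulli Fisher information and the arcsine density can be verified numerically. A minimal sketch, assuming NumPy/SciPy:
    # Sketch: 1/sqrt(p(1-p)) is proportional to the Beta(1/2, 1/2) (arcsine)
    # density 1/(pi*sqrt(p(1-p))); the ratio is a constant equal to pi.
    import numpy as np
    from scipy.stats import beta

    p = np.linspace(0.01, 0.99, 5)
    jeffreys_unnormalized = 1.0 / np.sqrt(p * (1.0 - p))
    print(jeffreys_unnormalized / beta(0.5, 0.5).pdf(p))   # constant ratio ~ 3.14159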


Effect of different prior probability choices on the posterior beta distribution

If samples are drawn from the population of a random variable ''X'' that result in ''s'' successes and ''f'' failures in "n" Bernoulli trials ''n'' = ''s'' + ''f'', then the
likelihood function
for parameters ''s'' and ''f'' given ''x'' = ''p'' (the notation ''x'' = ''p'' in the expressions below will emphasize that the domain ''x'' stands for the value of the parameter ''p'' in the binomial distribution), is the following binomial distribution: :\mathcal(s,f\mid x=p) = x^s(1-x)^f = x^s(1-x)^. If beliefs about prior probability information are reasonably well approximated by a beta distribution with parameters ''α'' Prior and ''β'' Prior, then: :(x=p;\alpha \operatorname,\beta \operatorname) = \frac According to Bayes' theorem for a continuous event space, the posterior probability is given by the product of the prior probability and the likelihood function (given the evidence ''s'' and ''f'' = ''n'' − ''s''), normalized so that the area under the curve equals one, as follows: :\begin & \operatorname(x=p\mid s,n-s) \\ pt= & \frac \\ pt= & \frac \\ pt= & \frac \\ pt= & \frac. \end The binomial coefficient :

{n \choose s}=\frac{n!}{s!(n-s)!}
appears both in the numerator and the denominator of the posterior probability, and it does not depend on the integration variable ''x'', hence it cancels out, and it is irrelevant to the final result. Similarly the normalizing factor for the prior probability, the beta function B(αPrior,βPrior) cancels out and it is immaterial to the final result. The same posterior probability result can be obtained if one uses an un-normalized prior :x^(1-x)^ because the normalizing factors all cancel out. Several authors (including Jeffreys himself) thus use an un-normalized prior formula since the normalization constant cancels out. The numerator of the posterior probability ends up being just the (un-normalized) product of the prior probability and the likelihood function, and the denominator is its integral from zero to one. The beta function in the denominator, B(''s'' + ''α'' Prior, ''n'' − ''s'' + ''β'' Prior), appears as a normalization constant to ensure that the total posterior probability integrates to unity. The ratio ''s''/''n'' of the number of successes to the total number of trials is a sufficient statistic in the binomial case, which is relevant for the following results. For the Bayes' prior probability (Beta(1,1)), the posterior probability is: :\operatorname(p=x\mid s,f) = \frac, \text=\frac,\text=\frac\text 0 < s < n). For the Jeffreys' prior probability (Beta(1/2,1/2)), the posterior probability is: :\operatorname(p=x\mid s,f) = ,\text = \frac,\text\frac\text \tfrac < s < n-\tfrac). and for the Haldane prior probability (Beta(0,0)), the posterior probability is: :\operatorname(p=x\mid s,f) = \frac, \text = \frac,\text\frac\text 1 < s < n -1). From the above expressions it follows that for ''s''/''n'' = 1/2) all the above three prior probabilities result in the identical location for the posterior probability mean = mode = 1/2. For ''s''/''n'' < 1/2, the mean of the posterior probabilities, using the following priors, are such that: mean for Bayes prior > mean for Jeffreys prior > mean for Haldane prior. For ''s''/''n'' > 1/2 the order of these inequalities is reversed such that the Haldane prior probability results in the largest posterior mean. The ''Haldane'' prior probability Beta(0,0) results in a posterior probability density with ''mean'' (the expected value for the probability of success in the "next" trial) identical to the ratio ''s''/''n'' of the number of successes to the total number of trials. Therefore, the Haldane prior results in a posterior probability with expected value in the next trial equal to the maximum likelihood. The ''Bayes'' prior probability Beta(1,1) results in a posterior probability density with ''mode'' identical to the ratio ''s''/''n'' (the maximum likelihood). In the case that 100% of the trials have been successful ''s'' = ''n'', the ''Bayes'' prior probability Beta(1,1) results in a posterior expected value equal to the rule of succession (''n'' + 1)/(''n'' + 2), while the Haldane prior Beta(0,0) results in a posterior expected value of 1 (absolute certainty of success in the next trial). Jeffreys prior probability results in a posterior expected value equal to (''n'' + 1/2)/(''n'' + 1). Perks (p. 303) points out: "This provides a new rule of succession and expresses a 'reasonable' position to take up, namely, that after an unbroken run of n successes we assume a probability for the next trial equivalent to the assumption that we are about half-way through an average run, i.e. that we expect a failure once in (2''n'' + 2) trials. 
The Bayes–Laplace rule implies that we are about at the end of an average run or that we expect a failure once in (''n'' + 2) trials. The comparison clearly favours the new result (what is now called Jeffreys prior) from the point of view of 'reasonableness'." Conversely, in the case that 100% of the trials have resulted in failure (''s'' = 0), the ''Bayes'' prior probability Beta(1,1) results in a posterior expected value for success in the next trial equal to 1/(''n'' + 2), while the Haldane prior Beta(0,0) results in a posterior expected value of success in the next trial of 0 (absolute certainty of failure in the next trial). Jeffreys prior probability results in a posterior expected value for success in the next trial equal to (1/2)/(''n'' + 1), which Perks (p. 303) points out "is a much more reasonably remote result than the Bayes-Laplace result 1/(''n'' + 2)". Jaynes questions (for the uniform prior Beta(1,1)) the use of these formulas for the cases ''s'' = 0 or ''s'' = ''n'' because the integrals do not converge (Beta(1,1) is an improper prior for ''s'' = 0 or ''s'' = ''n''); in practice, the conditions 0 < ''s'' < ''n'' are usually met. Recalling Karl Pearson's result that, under the Bayes–Laplace prior, the probability that the next (''n'' + 1) trials will all be successes, after ''n'' successes in ''n'' trials, is only 50%, Perks (p. 303) shows that, for what is now known as the Jeffreys prior, this probability is ((''n'' + 1/2)/(''n'' + 1))((''n'' + 3/2)/(''n'' + 2))...(2''n'' + 1/2)/(2''n'' + 1), which for ''n'' = 1, 2, 3 gives 15/24, 315/480, 9009/13440, rapidly approaching a limiting value of 1/\sqrt{2} = 0.70710678\ldots as ''n'' tends to infinity. Perks remarks that what is now known as the Jeffreys prior "is clearly more 'reasonable' than either the Bayes-Laplace result or the result on the (Haldane) alternative rule rejected by Jeffreys which gives certainty as the probability. It clearly provides a very much better correspondence with the process of induction. Whether it is 'absolutely' reasonable for the purpose, i.e. whether it is yet large enough, without the absurdity of reaching unity, is a matter for others to decide. But it must be realized that the result depends on the assumption of complete indifference and absence of knowledge prior to the sampling experiment."
Following are the variances of the posterior distribution obtained with these three prior probability distributions. For the Bayes prior probability (Beta(1,1)), the posterior variance is:
:\text{var} = \frac{(s+1)(n-s+1)}{(n+2)^2(n+3)},\text{ which for } s=\frac{n}{2} \text{ gives var} =\frac{1}{4(n+3)}
for the Jeffreys prior probability (Beta(1/2,1/2)), the posterior variance is:
:\text{var} = \frac{(s+\tfrac{1}{2})(n-s+\tfrac{1}{2})}{(n+1)^2(n+2)},\text{ which for } s=\frac{n}{2} \text{ gives var} = \frac{1}{4(n+2)}
and for the Haldane prior probability (Beta(0,0)), the posterior variance is:
:\text{var} = \frac{s(n-s)}{n^2(n+1)},\text{ which for } s=\frac{n}{2} \text{ gives var} =\frac{1}{4(n+1)}
So, as remarked by Silvey, for large ''n'', the variance is small and hence the posterior distribution is highly concentrated, whereas the assumed prior distribution was very diffuse. This is in accord with what one would hope for, as vague prior knowledge is transformed (through Bayes' theorem) into more precise posterior knowledge by an informative experiment. For small ''n'' the Haldane Beta(0,0) prior results in the largest posterior variance while the Bayes Beta(1,1) prior results in the most concentrated posterior. Jeffreys prior Beta(1/2,1/2) results in a posterior variance in between the other two. As ''n'' increases, the variance rapidly decreases so that the posterior variance for all three priors converges to approximately the same value (approaching zero variance as ''n'' → ∞).
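The posterior means and variances under the three priors can be compared directly. A minimal sketch, assuming SciPy and that 0 < ''s'' < ''n'' (required for the Haldane case); the sample numbers are illustrative:
    # Sketch: posterior summaries under the Haldane, Jeffreys and Bayes-Laplace
    # priors for s successes in n trials.
    from scipy.stats import beta

    def posterior_summary(s, n, a0, b0):
        post = beta(s + a0, n - s + b0)
        return post.mean(), post.var()

    s, n = 3, 10
    for name, (a0, b0) in {"Haldane (0,0)": (0, 0),
                           "Jeffreys (1/2,1/2)": (0.5, 0.5),
                           "Bayes (1,1)": (1, 1)}.items():
        print(name, posterior_summary(s, n, a0, b0))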
Recalling the previous result that the ''Haldane'' prior probability Beta(0,0) results in a posterior probability density with ''mean'' (the expected value for the probability of success in the "next" trial) identical to the ratio s/n of the number of successes to the total number of trials, it follows from the above expression that also the ''Haldane'' prior Beta(0,0) results in a posterior with ''variance'' identical to the variance expressed in terms of the max. likelihood estimate s/n and sample size (in ): :\text = \frac= \frac with the mean ''μ'' = ''s''/''n'' and the sample size ''ν'' = ''n''. In Bayesian inference, using a prior distribution Beta(''α''Prior,''β''Prior) prior to a binomial distribution is equivalent to adding (''α''Prior − 1) pseudo-observations of "success" and (''β''Prior − 1) pseudo-observations of "failure" to the actual number of successes and failures observed, then estimating the parameter ''p'' of the binomial distribution by the proportion of successes over both real- and pseudo-observations. A uniform prior Beta(1,1) does not add (or subtract) any pseudo-observations since for Beta(1,1) it follows that (''α''Prior − 1) = 0 and (''β''Prior − 1) = 0. The Haldane prior Beta(0,0) subtracts one pseudo observation from each and Jeffreys prior Beta(1/2,1/2) subtracts 1/2 pseudo-observation of success and an equal number of failure. This subtraction has the effect of smoothing out the posterior distribution. If the proportion of successes is not 50% (''s''/''n'' ≠ 1/2) values of ''α''Prior and ''β''Prior less than 1 (and therefore negative (''α''Prior − 1) and (''β''Prior − 1)) favor sparsity, i.e. distributions where the parameter ''p'' is closer to either 0 or 1. In effect, values of ''α''Prior and ''β''Prior between 0 and 1, when operating together, function as a concentration parameter. The accompanying plots show the posterior probability density functions for sample sizes ''n'' ∈ , successes ''s'' ∈  and Beta(''α''Prior,''β''Prior) ∈ . Also shown are the cases for ''n'' = , success ''s'' =  and Beta(''α''Prior,''β''Prior) ∈ . The first plot shows the symmetric cases, for successes ''s'' ∈ , with mean = mode = 1/2 and the second plot shows the skewed cases ''s'' ∈ . The images show that there is little difference between the priors for the posterior with sample size of 50 (characterized by a more pronounced peak near ''p'' = 1/2). Significant differences appear for very small sample sizes (in particular for the flatter distribution for the degenerate case of sample size = 3). Therefore, the skewed cases, with successes ''s'' = , show a larger effect from the choice of prior, at small sample size, than the symmetric cases. For symmetric distributions, the Bayes prior Beta(1,1) results in the most "peaky" and highest posterior distributions and the Haldane prior Beta(0,0) results in the flattest and lowest peak distribution. The Jeffreys prior Beta(1/2,1/2) lies in between them. For nearly symmetric, not too skewed distributions the effect of the priors is similar. For very small sample size (in this case for a sample size of 3) and skewed distribution (in this example for ''s'' ∈ ) the Haldane prior can result in a reverse-J-shaped distribution with a singularity at the left end. 
However, this happens only in degenerate cases (in this example ''n'' = 3 and hence ''s'' = 3/4 < 1, a degenerate value because s should be greater than unity in order for the posterior of the Haldane prior to have a mode located between the ends, and because ''s'' = 3/4 is not an integer number, hence it violates the initial assumption of a binomial distribution for the likelihood) and it is not an issue in generic cases of reasonable sample size (such that the condition 1 < ''s'' < ''n'' − 1, necessary for a mode to exist between both ends, is fulfilled). In Chapter 12 (p. 385) of his book, Jaynes asserts that the ''Haldane prior'' Beta(0,0) describes a ''prior state of knowledge of complete ignorance'', where we are not even sure whether it is physically possible for an experiment to yield either a success or a failure, while the ''Bayes (uniform) prior Beta(1,1) applies if'' one knows that ''both binary outcomes are possible''. Jaynes states: "''interpret the Bayes-Laplace (Beta(1,1)) prior as describing not a state of complete ignorance'', but the state of knowledge in which we have observed one success and one failure...once we have seen at least one success and one failure, then we know that the experiment is a true binary one, in the sense of physical possibility." Jaynes does not specifically discuss Jeffreys prior Beta(1/2,1/2) (Jaynes discussion of "Jeffreys prior" on pp. 181, 423 and on chapter 12 of Jaynes book refers instead to the improper, un-normalized, prior "1/''p'' ''dp''" introduced by Jeffreys in the 1939 edition of his book, seven years before he introduced what is now known as Jeffreys' invariant prior: the square root of the determinant of Fisher's information matrix. ''"1/p" is Jeffreys' (1946) invariant prior for the exponential distribution, not for the Bernoulli or binomial distributions''). However, it follows from the above discussion that Jeffreys Beta(1/2,1/2) prior represents a state of knowledge in between the Haldane Beta(0,0) and Bayes Beta (1,1) prior. Similarly, Karl Pearson in his 1892 book The Grammar of Science (p. 144 of 1900 edition) maintained that the Bayes (Beta(1,1) uniform prior was not a complete ignorance prior, and that it should be used when prior information justified to "distribute our ignorance equally"". K. Pearson wrote: "Yet the only supposition that we appear to have made is this: that, knowing nothing of nature, routine and anomy (from the Greek ανομία, namely: a- "without", and nomos "law") are to be considered as equally likely to occur. Now we were not really justified in making even this assumption, for it involves a knowledge that we do not possess regarding nature. We use our ''experience'' of the constitution and action of coins in general to assert that heads and tails are equally probable, but we have no right to assert before experience that, as we know nothing of nature, routine and breach are equally probable. In our ignorance we ought to consider before experience that nature may consist of all routines, all anomies (normlessness), or a mixture of the two in any proportion whatever, and that all such are equally probable. Which of these constitutions after experience is the most probable must clearly depend on what that experience has been like." 
If there is sufficient sampling data, ''and the posterior probability mode is not located at one of the extremes of the domain'' (''x'' = 0 or ''x'' = 1), the three priors of Bayes (Beta(1,1)), Jeffreys (Beta(1/2,1/2)) and Haldane (Beta(0,0)) should yield similar ''posterior'' probability densities. Otherwise, as Gelman et al. (p. 65) point out, "if so few data are available that the choice of noninformative prior distribution makes a difference, one should put relevant information into the prior distribution", or as Berger (p. 125) points out, "when different reasonable priors yield substantially different answers, can it be right to state that there ''is'' a single answer? Would it not be better to admit that there is scientific uncertainty, with the conclusion depending on prior beliefs?"
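A minimal Python sketch of the conjugate updating described above, comparing the three priors on the same binomial data (the values ''s'' = 3 successes in ''n'' = 10 trials are an arbitrary illustration, not taken from the text):

 # Posterior Beta(a0 + s, b0 + n - s) for a binomial likelihood with
 # s successes in n trials, under three common "noninformative" priors.
 priors = {"Haldane Beta(0,0)": (0.0, 0.0),
           "Jeffreys Beta(1/2,1/2)": (0.5, 0.5),
           "Bayes (uniform) Beta(1,1)": (1.0, 1.0)}
 
 def beta_mean_var(a, b):
     """Mean and variance of a Beta(a, b) distribution."""
     mean = a / (a + b)
     var = a * b / ((a + b) ** 2 * (a + b + 1.0))
     return mean, var
 
 s, n = 3, 10   # arbitrary example data
 for name, (a0, b0) in priors.items():
     a_post, b_post = a0 + s, b0 + (n - s)
     mean, var = beta_mean_var(a_post, b_post)
     print(f"{name:28s} posterior Beta({a_post:.1f},{b_post:.1f})  "
           f"mean={mean:.4f}  var={var:.5f}")
 
 # For the Haldane prior the posterior mean is exactly s/n = 0.3 and the
 # posterior variance equals (s/n)(1 - s/n)/(1 + n), as stated above.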


Occurrence and applications


Order statistics

The beta distribution has an important application in the theory of order statistics. A basic result is that the distribution of the ''k''th smallest of a sample of size ''n'' from a continuous uniform distribution has a beta distribution.David, H. A., Nagaraja, H. N. (2003) ''Order Statistics'' (3rd Edition). Wiley, New Jersey pp 458. This result is summarized as:
:U_{(k)} \sim \operatorname{Beta}(k,n+1-k).
From this, and application of the theory related to the probability integral transform, the distribution of any individual order statistic from any continuous distribution can be derived.
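This result is easy to check numerically; the following sketch (the values of ''k'', ''n'' and the replication count are arbitrary) compares the simulated mean of the ''k''th order statistic of ''n'' uniforms with the Beta(''k'', ''n''+1−''k'') mean ''k''/(''n''+1):

 import random
 
 def kth_order_statistic_of_uniforms(k, n, rng=random):
     """Draw n standard uniform variates and return the kth smallest."""
     return sorted(rng.random() for _ in range(n))[k - 1]
 
 k, n, reps = 3, 10, 100_000          # arbitrary example values
 samples = [kth_order_statistic_of_uniforms(k, n) for _ in range(reps)]
 sample_mean = sum(samples) / reps
 
 beta_mean = k / (n + 1)              # mean of Beta(k, n + 1 - k)
 print(f"simulated mean of U_({k}) for n={n}: {sample_mean:.4f}")
 print(f"Beta({k},{n + 1 - k}) mean k/(n+1):  {beta_mean:.4f}")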


Subjective logic

In standard logic, propositions are considered to be either true or false. In contradistinction, subjective logic assumes that humans cannot determine with absolute certainty whether a proposition about the real world is absolutely true or false. In subjective logic the a posteriori probability estimates of binary events can be represented by beta distributions.A. Jøsang. A Logic for Uncertain Probabilities. ''International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems.'' 9(3), pp. 279–311, June 2001.


Wavelet analysis

A wavelet is a wave-like oscillation with an amplitude that starts out at zero, increases, and then decreases back to zero. It can typically be visualized as a "brief oscillation" that promptly decays. Wavelets can be used to extract information from many different kinds of data, including – but certainly not limited to – audio signals and images. Thus, wavelets are purposefully crafted to have specific properties that make them useful for signal processing. Wavelets are localized in both time and frequency, whereas the standard Fourier transform is only localized in frequency. Therefore, standard Fourier transforms are only applicable to stationary processes, while wavelets are applicable to non-stationary processes. Continuous wavelets can be constructed based on the beta distribution. Beta waveletsH.M. de Oliveira and G.A.A. Araújo. Compactly Supported One-cyclic Wavelets Derived from Beta Distributions. ''Journal of Communication and Information Systems.'' vol. 20, n. 3, pp. 27–33, 2005. can be viewed as a soft variety of Haar wavelets whose shape is fine-tuned by two shape parameters α and β.


Population genetics

The Balding–Nichols model is a two-parameter parametrization of the beta distribution used in population genetics. It is a statistical description of the allele frequencies in the components of a sub-divided population:
:\begin{align} \alpha &= \mu \nu,\\ \beta &= (1 - \mu) \nu, \end{align}
where \nu = \alpha+\beta = \frac{1-F}{F} and 0 < F < 1; here ''F'' is (Wright's) genetic distance between two populations.
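For illustration, a small sketch of the parametrization above (the example values ''F'' = 0.1 and ''μ'' = 0.3 are arbitrary, chosen only to show the conversion):

 def balding_nichols_shape(F, mu):
     """Convert (F, mu) to Beta shape parameters via
     nu = (1 - F)/F, alpha = mu*nu, beta = (1 - mu)*nu."""
     if not (0.0 < F < 1.0 and 0.0 < mu < 1.0):
         raise ValueError("F and mu must lie strictly between 0 and 1")
     nu = (1.0 - F) / F
     return mu * nu, (1.0 - mu) * nu
 
 # Example: F = 0.1 and mean allele frequency mu = 0.3.
 alpha, beta = balding_nichols_shape(F=0.1, mu=0.3)
 print(alpha, beta)   # 2.7 and 6.3: a Beta(2.7, 6.3) distribution with mean 0.3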


Project management: task cost and schedule modeling

The beta distribution can be used to model events which are constrained to take place within an interval defined by a minimum and maximum value. For this reason, the beta distribution — along with the triangular distribution — is used extensively in PERT, critical path method (CPM), Joint Cost Schedule Modeling (JCSM) and other project management/control systems to describe the time to completion and the cost of a task. In project management, shorthand computations are widely used to estimate the mean and standard deviation of the beta distribution:
:\begin{align} \mu(X) & = \frac{a + 4b + c}{6} \\ \sigma(X) & = \frac{c - a}{6} \end{align}
where ''a'' is the minimum, ''c'' is the maximum, and ''b'' is the most likely value (the mode for ''α'' > 1 and ''β'' > 1). The above estimate for the mean \mu(X) = \frac{a + 4b + c}{6} is known as the PERT three-point estimation and it is exact for either of the following values of ''β'' (for arbitrary α within these ranges):
:''β'' = ''α'' > 1 (symmetric case) with standard deviation \sigma(X) = \frac{c - a}{2\sqrt{1 + 2\alpha}}, skewness = 0, and excess kurtosis = \frac{-6}{3 + 2\alpha}
or
:''β'' = 6 − ''α'' for 5 > ''α'' > 1 (skewed case) with standard deviation
:\sigma(X) = \frac{(c - a)\sqrt{\alpha(6 - \alpha)}}{6\sqrt{7}},
skewness = \frac{(3 - \alpha)\sqrt{7}}{2\sqrt{\alpha(6 - \alpha)}}, and excess kurtosis = \frac{21}{\alpha(6 - \alpha)} - 3
The above estimate for the standard deviation ''σ''(''X'') = (''c'' − ''a'')/6 is exact for either of the following values of ''α'' and ''β'':
:''α'' = ''β'' = 4 (symmetric) with skewness = 0, and excess kurtosis = −6/11.
:''β'' = 6 − ''α'' and \alpha = 3 - \sqrt2 (right-tailed, positive skew) with skewness = \frac{1}{\sqrt{2}}, and excess kurtosis = 0
:''β'' = 6 − ''α'' and \alpha = 3 + \sqrt2 (left-tailed, negative skew) with skewness = \frac{-1}{\sqrt{2}}, and excess kurtosis = 0
Otherwise, these can be poor approximations for beta distributions with other values of α and β, exhibiting average errors of 40% in the mean and 549% in the variance.
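A small worked example of the shorthand formulas above (the optimistic, most likely and pessimistic task durations are made up for illustration):

 def pert_estimates(a, b, c):
     """PERT three-point shorthand: mean (a + 4b + c)/6 and
     standard deviation (c - a)/6 for a task bounded by [a, c]
     with most likely value b."""
     mean = (a + 4.0 * b + c) / 6.0
     std_dev = (c - a) / 6.0
     return mean, std_dev
 
 # Hypothetical task: optimistic 4 days, most likely 6 days, pessimistic 14 days.
 mean, std_dev = pert_estimates(a=4.0, b=6.0, c=14.0)
 print(f"estimated duration: {mean:.2f} days, std dev: {std_dev:.2f} days")
 # -> estimated duration: 7.00 days, std dev: 1.67 days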


Random variate generation

If ''X'' and ''Y'' are independent, with X \sim \Gamma(\alpha, \theta) and Y \sim \Gamma(\beta, \theta) then
:\frac{X}{X+Y} \sim \Beta(\alpha, \beta).
So one algorithm for generating beta variates is to generate \frac{X}{X+Y}, where ''X'' is a gamma variate with parameters (α, 1) and ''Y'' is an independent gamma variate with parameters (β, 1). In fact, here \frac{X}{X+Y} and X+Y are independent, and X+Y \sim \Gamma(\alpha + \beta, \theta). If Z \sim \Gamma(\gamma, \theta) and Z is independent of X and Y, then \frac{X+Y}{X+Y+Z} \sim \Beta(\alpha+\beta,\gamma) and \frac{X+Y}{X+Y+Z} is independent of \frac{X}{X+Y}. This shows that the product of independent \Beta(\alpha,\beta) and \Beta(\alpha+\beta,\gamma) random variables is a \Beta(\alpha,\beta+\gamma) random variable. Also, the ''k''th order statistic of ''n'' uniformly distributed variates is \Beta(k, n+1-k), so an alternative if α and β are small integers is to generate α + β − 1 uniform variates and choose the α-th smallest. Another way to generate the Beta distribution is by the Pólya urn model. According to this method, one starts with an "urn" containing α "black" balls and β "white" balls and draws uniformly with replacement. At every trial an additional ball is added whose color matches that of the ball last drawn. Asymptotically, the proportion of black and white balls will be distributed according to the Beta distribution, where each repetition of the experiment will produce a different value. It is also possible to use inverse transform sampling.
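A minimal sketch of the gamma-ratio method described above, using only the Python standard library (the shape parameters and replication count are arbitrary example values):

 import random
 
 def beta_variate_via_gammas(alpha, beta, rng=random):
     """Generate a Beta(alpha, beta) variate as X/(X+Y) with
     X ~ Gamma(alpha, 1) and Y ~ Gamma(beta, 1), independent."""
     x = rng.gammavariate(alpha, 1.0)
     y = rng.gammavariate(beta, 1.0)
     return x / (x + y)
 
 # Sanity check: the sample mean should approach alpha/(alpha + beta) = 0.4.
 alpha, beta, reps = 2.0, 3.0, 100_000
 draws = [beta_variate_via_gammas(alpha, beta) for _ in range(reps)]
 print(sum(draws) / reps)   # close to 0.4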


History

Thomas Bayes, in a posthumous paper published in 1763 by Richard Price, obtained a beta distribution as the density of the probability of success in Bernoulli trials (see ), but the paper does not analyze any of the moments of the beta distribution or discuss any of its properties. The first systematic modern discussion of the beta distribution is probably due to Karl Pearson. In Pearson's papers the beta distribution is couched as a solution of a differential equation: Pearson's Type I distribution, to which it is essentially identical except for arbitrary shifting and re-scaling (the beta and Pearson Type I distributions can always be equalized by proper choice of parameters). In fact, in several English books and journal articles in the few decades prior to World War II, it was common to refer to the beta distribution as Pearson's Type I distribution. William P. Elderton in his 1906 monograph "Frequency curves and correlation" further analyzes the beta distribution as Pearson's Type I distribution, including a full discussion of the method of moments for the four-parameter case, and diagrams of (what Elderton describes as) U-shaped, J-shaped, twisted J-shaped, "cocked-hat" shapes, horizontal and angled straight-line cases. Elderton wrote "I am chiefly indebted to Professor Pearson, but the indebtedness is of a kind for which it is impossible to offer formal thanks." Elderton's 1906 monograph provides an impressive amount of information on the beta distribution, including equations for the origin of the distribution chosen to be the mode, as well as for other Pearson distributions: types I through VII. Elderton also included a number of appendixes, including one appendix ("II") on the beta and gamma functions. In later editions, Elderton added equations for the origin of the distribution chosen to be the mean, and analysis of Pearson distributions VIII through XII. As remarked by Bowman and Shenton, "Fisher and Pearson had a difference of opinion in the approach to (parameter) estimation, in particular relating to (Pearson's method of) moments and (Fisher's method of) maximum likelihood in the case of the Beta distribution." Also according to Bowman and Shenton, "the case of a Type I (beta distribution) model being the center of the controversy was pure serendipity. A more difficult model of 4 parameters would have been hard to find." The long-running public conflict of Fisher with Karl Pearson can be followed in a number of articles in prestigious journals. For example, concerning the estimation of the four parameters for the beta distribution, and Fisher's criticism of Pearson's method of moments as being arbitrary, see Pearson's article "Method of moments and method of maximum likelihood" (published three years after his retirement from University College, London, where his position had been divided between Fisher and Pearson's son Egon), in which Pearson writes "I read (Koshai's paper in the Journal of the Royal Statistical Society, 1933) which as far as I am aware is the only case at present published of the application of Professor Fisher's method. To my astonishment that method depends on first working out the constants of the frequency curve by the (Pearson) Method of Moments and then superposing on it, by what Fisher terms "the Method of Maximum Likelihood" a further approximation to obtain, what he holds, he will thus get, "more efficient values" of the curve constants."
David and Edwards's treatise on the history of statistics cites the first modern treatment of the beta distribution, in 1911, using the beta designation that has become standard, due to Corrado Gini, an Italian statistician, demographer, and sociologist, who developed the Gini coefficient. N. L. Johnson and S. Kotz, in their comprehensive and very informative monograph on leading historical personalities in statistical sciences, credit Corrado Gini as "an early Bayesian...who dealt with the problem of eliciting the parameters of an initial Beta distribution, by singling out techniques which anticipated the advent of the so-called empirical Bayes approach."


References


External links


"Beta Distribution"
by Fiona Maclachlan, the Wolfram Demonstrations Project, 2007.
Beta Distribution – Overview and Example
xycoon.com

brighton-webs.co.uk

exstrom.com * *
Harvard University Statistics 110 Lecture 23 Beta Distribution, Prof. Joe Blitzstein
{{DEFAULTSORT:Beta Distribution}}
Continuous distributions
Factorial and binomial topics
Conjugate prior distributions
Exponential family distributions
_
estimator In statistics, an estimator is a rule for calculating an estimate of a given quantity based on observed data: thus the rule (the estimator), the quantity of interest (the estimand) and its result (the estimate) are distinguished. For example, the ...
_of_
statistical_dispersion In statistics, dispersion (also called variability, scatter, or spread) is the extent to which a distribution is stretched or squeezed. Common examples of measures of statistical dispersion are the variance, standard deviation, and interquartile ...
_than_the_standard_deviation_for_beta_distributions_with_tails_and_inflection_points_at_each_side_of_the_mode,_Beta(''α'', ''β'')_distributions_with_''α'',''β''_>_2,_as_it_depends_on_the_linear_(absolute)_deviations_rather_than_the_square_deviations_from_the_mean.__Therefore,_the_effect_of_very_large_deviations_from_the_mean_are_not_as_overly_weighted. Using_
Stirling's_approximation In mathematics, Stirling's approximation (or Stirling's formula) is an approximation for factorials. It is a good approximation, leading to accurate results even for small values of n. It is named after James Stirling, though a related but less p ...
_to_the_Gamma_function,_Norman_Lloyd_Johnson, N.L.Johnson_and_Samuel_Kotz, S.Kotz_derived_the_following_approximation_for_values_of_the_shape_parameters_greater_than_unity_(the_relative_error_for_this_approximation_is_only_−3.5%_for_''α''_=_''β''_=_1,_and_it_decreases_to_zero_as_''α''_→_∞,_''β''_→_∞): :_\begin \frac_&=\frac\\ &\approx_\sqrt_\left(1+\frac-\frac-\frac_\right),_\text_\alpha,_\beta_>_1. \end At_the_limit_α_→_∞,_β_→_∞,_the_ratio_of_the_mean_absolute_deviation_to_the_standard_deviation_(for_the_beta_distribution)_becomes_equal_to_the_ratio_of_the_same_measures_for_the_normal_distribution:_\sqrt.__For_α_=_β_=_1_this_ratio_equals_\frac,_so_that_from_α_=_β_=_1_to_α,_β_→_∞_the_ratio_decreases_by_8.5%.__For_α_=_β_=_0_the_standard_deviation_is_exactly_equal_to_the_mean_absolute_deviation_around_the_mean._Therefore,_this_ratio_decreases_by_15%_from_α_=_β_=_0_to_α_=_β_=_1,_and_by_25%_from_α_=_β_=_0_to_α,_β_→_∞_._However,_for_skewed_beta_distributions_such_that_α_→_0_or_β_→_0,_the_ratio_of_the_standard_deviation_to_the_mean_absolute_deviation_approaches_infinity_(although_each_of_them,_individually,_approaches_zero)_because_the_mean_absolute_deviation_approaches_zero_faster_than_the_standard_deviation. Using_the__parametrization_in_terms_of_mean_μ_and_sample_size_ν_=_α_+_β_>_0: :α_=_μν,_β_=_(1−μ)ν one_can_express_the_mean_absolute_deviation_around_the_mean_in_terms_of_the_mean_μ_and_the_sample_size_ν_as_follows: :\operatorname[, _X_-_E ]_=_\frac For_a_symmetric_distribution,_the_mean_is_at_the_middle_of_the_distribution,_μ_=_1/2,_and_therefore: :_\begin \operatorname[, X_-_E ]__=_\frac_&=_\frac_\\ \lim__\left_(\lim__\operatorname[, X_-_E ]_\right_)_&=_\tfrac\\ \lim__\left_(\lim__\operatorname[, _X_-_E ]_\right_)_&=_0 \end Also,_the_following_limits_(with_only_the_noted_variable_approaching_the_limit)_can_be_obtained_from_the_above_expressions: :_\begin \lim__\operatorname[, X_-_E ]_&=\lim__\operatorname[, X_-_E ]=_0_\\ \lim__\operatorname[, X_-_E ]_&=\lim__\operatorname[, X_-_E ]_=_0\\ \lim__\operatorname[, X_-_E ]&=\lim__\operatorname[, X_-_E ]_=_0\\ \lim__\operatorname[, X_-_E ]_&=_\sqrt_\\ \lim__\operatorname[, X_-_E ]_&=_0 \end


_Mean_absolute_difference

The_mean_absolute_difference_for_the_Beta_distribution_is: :\mathrm_=_\int_0^1_\int_0^1_f(x;\alpha,\beta)\,f(y;\alpha,\beta)\,, x-y, \,dx\,dy_=_\left(\frac\right)\frac The_Gini_coefficient_for_the_Beta_distribution_is_half_of_the_relative_mean_absolute_difference: :\mathrm_=_\left(\frac\right)\frac


_Skewness

The_skewness_ In_probability_theory_and_statistics,_skewness_is_a_measure_of_the_asymmetry_of_the_probability_distribution_of_a__real-valued_random_variable_about_its_mean._The_skewness_value_can_be_positive,_zero,_negative,_or_undefined. For_a_unimodal__...
_(the_third_moment_centered_on_the_mean,_normalized_by_the_3/2_power_of_the_variance)_of_the_beta_distribution_is :\gamma_1_=\frac_=_\frac_. Letting_α_=_β_in_the_above_expression_one_obtains_γ1_=_0,_showing_once_again_that_for_α_=_β_the_distribution_is_symmetric_and_hence_the_skewness_is_zero._Positive_skew_(right-tailed)_for_α_<_β,_negative_skew_(left-tailed)_for_α_>_β. Using_the__parametrization_in_terms_of_mean_μ_and_sample_size_ν_=_α_+_β: :_\begin __\alpha_&__=_\mu_\nu_,\text\nu_=(\alpha_+_\beta)__>0\\ __\beta_&__=_(1_-_\mu)_\nu_,_\text\nu_=(\alpha_+_\beta)__>0. \end one_can_express_the_skewness_in_terms_of_the_mean_μ_and_the_sample_size_ν_as_follows: :\gamma_1_=\frac_=_\frac. The_skewness_can_also_be_expressed_just_in_terms_of_the_variance_''var''_and_the_mean_μ_as_follows: :\gamma_1_=\frac_=_\frac\text_\operatorname_<_\mu(1-\mu) The_accompanying_plot_of_skewness_as_a_function_of_variance_and_mean_shows_that_maximum_variance_(1/4)_is_coupled_with_zero_skewness_and_the_symmetry_condition_(μ_=_1/2),_and_that_maximum_skewness_(positive_or_negative_infinity)_occurs_when_the_mean_is_located_at_one_end_or_the_other,_so_that_the_"mass"_of_the_probability_distribution_is_concentrated_at_the_ends_(minimum_variance). The_following_expression_for_the_square_of_the_skewness,_in_terms_of_the_sample_size_ν_=_α_+_β_and_the_variance_''var'',_is_useful_for_the_method_of_moments_estimation_of_four_parameters: :(\gamma_1)^2_=\frac_=_\frac\bigg(\frac-4(1+\nu)\bigg) This_expression_correctly_gives_a_skewness_of_zero_for_α_=_β,_since_in_that_case_(see_):_\operatorname_=_\frac. For_the_symmetric_case_(α_=_β),_skewness_=_0_over_the_whole_range,_and_the_following_limits_apply: :\lim__\gamma_1_=_\lim__\gamma_1_=\lim__\gamma_1=\lim__\gamma_1=\lim__\gamma_1_=_0 For_the_asymmetric_cases_(α_≠_β)_the_following_limits_(with_only_the_noted_variable_approaching_the_limit)_can_be_obtained_from_the_above_expressions: :_\begin &\lim__\gamma_1_=\lim__\gamma_1_=_\infty\\ &\lim__\gamma_1__=_\lim__\gamma_1=_-_\infty\\ &\lim__\gamma_1_=_-\frac,\quad_\lim_(\lim__\gamma_1)_=_-\infty,\quad_\lim_(\lim__\gamma_1)_=_0\\ &\lim__\gamma_1_=_\frac,\quad_\lim_(\lim__\gamma_1)_=_\infty,\quad_\lim_(\lim__\gamma_1)_=_0\\ &\lim__\gamma_1_=_\frac,\quad_\lim_(\lim__\gamma_1)__=_\infty,\quad_\lim_(\lim__\gamma_1)_=_-_\infty \end


_Kurtosis

The_beta_distribution_has_been_applied_in_acoustic_analysis_to_assess_damage_to_gears,_as_the_kurtosis_of_the_beta_distribution_has_been_reported_to_be_a_good_indicator_of_the_condition_of_a_gear.
_Kurtosis_has_also_been_used_to_distinguish_the_seismic_signal_generated_by_a_person's_footsteps_from_other_signals._As_persons_or_other_targets_moving_on_the_ground_generate_continuous_signals_in_the_form_of_seismic_waves,_one_can_separate_different_targets_based_on_the_seismic_waves_they_generate._Kurtosis_is_sensitive_to_impulsive_signals,_so_it's_much_more_sensitive_to_the_signal_generated_by_human_footsteps_than_other_signals_generated_by_vehicles,_winds,_noise,_etc.
__Unfortunately,_the_notation_for_kurtosis_has_not_been_standardized._Kenney_and_Keeping
__use_the_symbol_γ2_for_the_excess_kurtosis_ In_probability_theory_and_statistics,_kurtosis_(from__el,_κυρτός,_''kyrtos''_or_''kurtos'',_meaning_"curved,_arching")_is_a_measure_of_the_"tailedness"_of_the_probability_distribution_of_a_real-valued_random_variable._Like_skewness,_kurtosi_...
,_but_Abramowitz_and_Stegun
__use_different_terminology.__To_prevent_confusion
__between_kurtosis_(the_fourth_moment_centered_on_the_mean,_normalized_by_the_square_of_the_variance)_and_excess_kurtosis,_when_using_symbols,_they_will_be_spelled_out_as_follows:
:\begin \text _____&=\text_-_3\\ _____&=\frac-3\\ _____&=\frac\\ _____&=\frac _. \end Letting_α_=_β_in_the_above_expression_one_obtains :\text_=-_\frac_\text\alpha=\beta_. Therefore,_for_symmetric_beta_distributions,_the_excess_kurtosis_is_negative,_increasing_from_a_minimum_value_of_−2_at_the_limit_as__→_0,_and_approaching_a_maximum_value_of_zero_as__→_∞.__The_value_of_−2_is_the_minimum_value_of_excess_kurtosis_that_any_distribution_(not_just_beta_distributions,_but_any_distribution_of_any_possible_kind)_can_ever_achieve.__This_minimum_value_is_reached_when_all_the_probability_density_is_entirely_concentrated_at_each_end_''x''_=_0_and_''x''_=_1,_with_nothing_in_between:_a_2-point_Bernoulli_distribution_ In_probability_theory_and_statistics,_the_Bernoulli_distribution,_named_after_Swiss_mathematician__Jacob_Bernoulli,James_Victor_Uspensky:_''Introduction_to_Mathematical_Probability'',_McGraw-Hill,_New_York_1937,_page_45_is_the__discrete_probabi_...
_with_equal_probability_1/2_at_each_end_(a_coin_toss:_see_section_below_"Kurtosis_bounded_by_the_square_of_the_skewness"_for_further_discussion).__The_description_of_kurtosis_as_a_measure_of_the_"potential_outliers"_(or_"potential_rare,_extreme_values")_of_the_probability_distribution,_is_correct_for_all_distributions_including_the_beta_distribution._When_rare,_extreme_values_can_occur_in_the_beta_distribution,_the_higher_its_kurtosis;_otherwise,_the_kurtosis_is_lower._For_α_≠_β,_skewed_beta_distributions,_the_excess_kurtosis_can_reach_unlimited_positive_values_(particularly_for_α_→_0_for_finite_β,_or_for_β_→_0_for_finite_α)_because_the_side_away_from_the_mode_will_produce_occasional_extreme_values.__Minimum_kurtosis_takes_place_when_the_mass_density_is_concentrated_equally_at_each_end_(and_therefore_the_mean_is_at_the_center),_and_there_is_no_probability_mass_density_in_between_the_ends. Using_the__parametrization_in_terms_of_mean_μ_and_sample_size_ν_=_α_+_β: :_\begin __\alpha_&__=_\mu_\nu_,\text\nu_=(\alpha_+_\beta)__>0\\ __\beta_&__=_(1_-_\mu)_\nu_,_\text\nu_=(\alpha_+_\beta)__>0. \end one_can_express_the_excess_kurtosis_in_terms_of_the_mean_μ_and_the_sample_size_ν_as_follows: :\text_=\frac\bigg_(\frac_-_1_\bigg_) The_excess_kurtosis_can_also_be_expressed_in_terms_of_just_the_following_two_parameters:_the_variance_''var'',_and_the_sample_size_ν_as_follows: :\text_=\frac\left(\frac_-_6_-_5_\nu_\right)\text\text<_\mu(1-\mu) and,_in_terms_of_the_variance_''var''_and_the_mean_μ_as_follows: :\text_=\frac\text\text<_\mu(1-\mu) The_plot_of_excess_kurtosis_as_a_function_of_the_variance_and_the_mean_shows_that_the_minimum_value_of_the_excess_kurtosis_(−2,_which_is_the_minimum_possible_value_for_excess_kurtosis_for_any_distribution)_is_intimately_coupled_with_the_maximum_value_of_variance_(1/4)_and_the_symmetry_condition:_the_mean_occurring_at_the_midpoint_(μ_=_1/2)._This_occurs_for_the_symmetric_case_of_α_=_β_=_0,_with_zero_skewness.__At_the_limit,_this_is_the_2_point_Bernoulli_distribution_ In_probability_theory_and_statistics,_the_Bernoulli_distribution,_named_after_Swiss_mathematician__Jacob_Bernoulli,James_Victor_Uspensky:_''Introduction_to_Mathematical_Probability'',_McGraw-Hill,_New_York_1937,_page_45_is_the__discrete_probabi_...
_with_equal_probability_1/2_at_each__Dirac_delta_function_end_''x''_=_0_and_''x''_=_1_and_zero_probability_everywhere_else._(A_coin_toss:_one_face_of_the_coin_being_''x''_=_0_and_the_other_face_being_''x''_=_1.)__Variance_is_maximum_because_the_distribution_is_bimodal_with_nothing_in_between_the_two_modes_(spikes)_at_each_end.__Excess_kurtosis_is_minimum:_the_probability_density_"mass"_is_zero_at_the_mean_and_it_is_concentrated_at_the_two_peaks_at_each_end.__Excess_kurtosis_reaches_the_minimum_possible_value_(for_any_distribution)_when_the_probability_density_function_has_two_spikes_at_each_end:_it_is_bi-"peaky"_with_nothing_in_between_them. On_the_other_hand,_the_plot_shows_that_for_extreme_skewed_cases,_where_the_mean_is_located_near_one_or_the_other_end_(μ_=_0_or_μ_=_1),_the_variance_is_close_to_zero,_and_the_excess_kurtosis_rapidly_approaches_infinity_when_the_mean_of_the_distribution_approaches_either_end. Alternatively,_the_excess_kurtosis_can_also_be_expressed_in_terms_of_just_the_following_two_parameters:_the_square_of_the_skewness,_and_the_sample_size_ν_as_follows: :\text_=\frac\bigg(\frac_(\text)^2_-_1\bigg)\text^2-2<_\text<_\frac_(\text)^2 From_this_last_expression,_one_can_obtain_the_same_limits_published_practically_a_century_ago_by_Karl_Pearson_in_his_paper,_for_the_beta_distribution_(see_section_below_titled_"Kurtosis_bounded_by_the_square_of_the_skewness")._Setting_α_+_β=_ν_=__0_in_the_above_expression,_one_obtains_Pearson's_lower_boundary_(values_for_the_skewness_and_excess_kurtosis_below_the_boundary_(excess_kurtosis_+_2_−_skewness2_=_0)_cannot_occur_for_any_distribution,_and_hence_Karl_Pearson_appropriately_called_the_region_below_this_boundary_the_"impossible_region")._The_limit_of_α_+_β_=_ν_→_∞_determines_Pearson's_upper_boundary. :_\begin &\lim_\text__=_(\text)^2_-_2\\ &\lim_\text__=_\tfrac_(\text)^2 \end therefore: :(\text)^2-2<_\text<_\tfrac_(\text)^2 Values_of_ν_=_α_+_β_such_that_ν_ranges_from_zero_to_infinity,_0_<_ν_<_∞,_span_the_whole_region_of_the_beta_distribution_in_the_plane_of_excess_kurtosis_versus_squared_skewness. For_the_symmetric_case_(α_=_β),_the_following_limits_apply: :_\begin &\lim__\text_=__-_2_\\ &\lim__\text_=_0_\\ &\lim__\text_=_-_\frac \end For_the_unsymmetric_cases_(α_≠_β)_the_following_limits_(with_only_the_noted_variable_approaching_the_limit)_can_be_obtained_from_the_above_expressions: :_\begin &\lim_\text__=\lim__\text__=_\lim_\text__=_\lim_\text__=\infty\\ &\lim_\text__=_\frac,\text__\lim_(\lim__\text)__=_\infty,\text__\lim_(\lim__\text)__=_0\\ &\lim_\text__=_\frac,\text__\lim_(\lim__\text)__=_\infty,\text__\lim_(\lim__\text)__=_0\\ &\lim__\text__=_-_6_+_\frac,\text__\lim_(\lim__\text)__=_\infty,\text__\lim_(\lim__\text)__=_\infty \end


_Characteristic_function

The_Characteristic_function_(probability_theory), characteristic_function_is_the_Fourier_transform_of_the_probability_density_function.__The_characteristic_function_of_the_beta_distribution_is_confluent_hypergeometric_function, Kummer's_confluent_hypergeometric_function_(of_the_first_kind):
:\begin \varphi_X(\alpha;\beta;t) &=_\operatorname\left[e^\right]\\ &=_\int_0^1_e^_f(x;\alpha,\beta)_dx_\\ &=_1F_1(\alpha;_\alpha+\beta;_it)\!\\ &=\sum_^\infty_\frac__\\ &=_1__+\sum_^_\left(_\prod_^_\frac_\right)_\frac \end where :_x^=x(x+1)(x+2)\cdots(x+n-1) is_the_rising_factorial,_also_called_the_"Pochhammer_symbol".__The_value_of_the_characteristic_function_for_''t''_=_0,_is_one: :_\varphi_X(\alpha;\beta;0)=_1F_1(\alpha;_\alpha+\beta;_0)_=_1__. Also,_the_real_and_imaginary_parts_of_the_characteristic_function_enjoy_the_following_symmetries_with_respect_to_the_origin_of_variable_''t'': :_\textrm_\left_[__1F_1(\alpha;_\alpha+\beta;_it)_\right_]_=_\textrm_\left_[__1F_1(\alpha;_\alpha+\beta;_-_it)_\right_]__ :_\textrm_\left_[__1F_1(\alpha;_\alpha+\beta;_it)_\right_]_=_-_\textrm_\left__[__1F_1(\alpha;_\alpha+\beta;_-_it)_\right_]__ The_symmetric_case_α_=_β_simplifies_the_characteristic_function_of_the_beta_distribution_to_a_Bessel_function,_since_in_the_special_case_α_+_β_=_2α_the_confluent_hypergeometric_function_(of_the_first_kind)_reduces_to_a_Bessel_function_(the_modified_Bessel_function_of_the_first_kind_I__)_using_Ernst_Kummer, Kummer's_second_transformation_as_follows: Another_example_of_the_symmetric_case_α_=_β_=_n/2_for_beamforming_applications_can_be_found_in_Figure_11_of_ :\begin__1F_1(\alpha;2\alpha;_it)_&=_e^__0F_1_\left(;_\alpha+\tfrac;_\frac_\right)_\\ &=_e^_\left(\frac\right)^_\Gamma\left(\alpha+\tfrac\right)_I_\left(\frac\right).\end In_the_accompanying_plots,_the_Complex_number, real_part_(Re)_of_the_Characteristic_function_(probability_theory), characteristic_function_of_the_beta_distribution_is_displayed_for_symmetric_(α_=_β)_and_skewed_(α_≠_β)_cases.


_Other_moments


_Moment_generating_function

It_also_follows_that_the_moment_generating_function_is :\begin M_X(\alpha;_\beta;_t) &=_\operatorname\left[e^\right]_\\_pt&=_\int_0^1_e^_f(x;\alpha,\beta)\,dx_\\_pt&=__1F_1(\alpha;_\alpha+\beta;_t)_\\_pt&=_\sum_^\infty_\frac__\frac_\\_pt&=_1__+\sum_^_\left(_\prod_^_\frac_\right)_\frac \end In_particular_''M''''X''(''α'';_''β'';_0)_=_1.


_Higher_moments

Using_the_moment_generating_function,_the_''k''-th_raw_moment_is_given_by_the_factor :\prod_^_\frac_ multiplying_the_(exponential_series)_term_\left(\frac\right)_in_the_series_of_the_moment_generating_function :\operatorname[X^k]=_\frac_=_\prod_^_\frac where_(''x'')(''k'')_is_a_Pochhammer_symbol_representing_rising_factorial._It_can_also_be_written_in_a_recursive_form_as :\operatorname[X^k]_=_\frac\operatorname[X^]. Since_the_moment_generating_function_M_X(\alpha;_\beta;_\cdot)_has_a_positive_radius_of_convergence,_the_beta_distribution_is_Moment_problem, determined_by_its_moments.


_Moments_of_transformed_random_variables


_=Moments_of_linearly_transformed,_product_and_inverted_random_variables

= One_can_also_show_the_following_expectations_for_a_transformed_random_variable,_where_the_random_variable_''X''_is_Beta-distributed_with_parameters_α_and_β:_''X''_~_Beta(α,_β).__The_expected_value_of_the_variable_1 − ''X''_is_the_mirror-symmetry_of_the_expected_value_based_on_''X'': :\begin &_\operatorname[1-X]_=_\frac_\\ &_\operatorname[X_(1-X)]_=\operatorname[(1-X)X_]_=\frac \end Due_to_the_mirror-symmetry_of_the_probability_density_function_of_the_beta_distribution,_the_variances_based_on_variables_''X''_and_1 − ''X''_are_identical,_and_the_covariance_on_''X''(1 − ''X''_is_the_negative_of_the_variance: :\operatorname[(1-X)]=\operatorname[X]_=_-\operatorname[X,(1-X)]=_\frac These_are_the_expected_values_for_inverted_variables,_(these_are_related_to_the_harmonic_means,_see_): :\begin &_\operatorname_\left_[\frac_\right_]_=_\frac_\text_\alpha_>_1\\ &_\operatorname\left_[\frac_\right_]_=\frac_\text_\beta_>_1 \end The_following_transformation_by_dividing_the_variable_''X''_by_its_mirror-image_''X''/(1 − ''X'')_results_in_the_expected_value_of_the_"inverted_beta_distribution"_or_beta_prime_distribution_ In_probability_theory_and__statistics,_the_beta_prime_distribution_(also_known_as_inverted_beta_distribution_or_beta_distribution_of_the_second_kindJohnson_et_al_(1995),_p_248)_is_an_absolutely_continuous_probability_distribution. __Definitions_ _...
_(also_known_as_beta_distribution_of_the_second_kind_or_Pearson_distribution, Pearson's_Type_VI): :_\begin &_\operatorname\left[\frac\right]_=\frac_\text\beta_>_1\\ &_\operatorname\left[\frac\right]_=\frac\text\alpha_>_1 \end_ Variances_of_these_transformed_variables_can_be_obtained_by_integration,_as_the_expected_values_of_the_second_moments_centered_on_the_corresponding_variables: :\operatorname_\left[\frac_\right]_=\operatorname\left[\left(\frac_-_\operatorname\left[\frac_\right_]_\right_)^2\right]= :\operatorname\left_[\frac_\right_]_=\operatorname_\left_[\left_(\frac_-_\operatorname\left_[\frac_\right_]_\right_)^2_\right_]=_\frac_\text\alpha_>_2 The_following_variance_of_the_variable_''X''_divided_by_its_mirror-image_(''X''/(1−''X'')_results_in_the_variance_of_the_"inverted_beta_distribution"_or_beta_prime_distribution_ In_probability_theory_and__statistics,_the_beta_prime_distribution_(also_known_as_inverted_beta_distribution_or_beta_distribution_of_the_second_kindJohnson_et_al_(1995),_p_248)_is_an_absolutely_continuous_probability_distribution. __Definitions_ _...
_(also_known_as_beta_distribution_of_the_second_kind_or_Pearson_distribution, Pearson's_Type_VI): :\operatorname_\left_[\frac_\right_]_=\operatorname_\left_[\left(\frac_-_\operatorname_\left_[\frac_\right_]_\right)^2_\right_]=\operatorname_\left_[\frac_\right_]_= :\operatorname_\left_[\left_(\frac_-_\operatorname_\left_[\frac_\right_]_\right_)^2_\right_]=_\frac_\text\beta_>_2 The_covariances_are: :\operatorname\left_[\frac,\frac_\right_]_=_\operatorname\left[\frac,\frac_\right]_=\operatorname\left[\frac,\frac\right_]_=_\operatorname\left[\frac,\frac_\right]_=\frac_\text_\alpha,_\beta_>_1 These_expectations_and_variances_appear_in_the_four-parameter_Fisher_information_matrix_(.)


_=Moments_of_logarithmically_transformed_random_variables

= Expected_values_for_Logarithm_transformation, logarithmic_transformations_(useful_for_maximum_likelihood_estimates,_see_)_are_discussed_in_this_section.__The_following_logarithmic_linear_transformations_are_related_to_the_geometric_means_''GX''_and__''G''(1−''X'')_(see_): :\begin \operatorname[\ln(X)]_&=_\psi(\alpha)_-_\psi(\alpha_+_\beta)=_-_\operatorname\left[\ln_\left_(\frac_\right_)\right],\\ \operatorname[\ln(1-X)]_&=\psi(\beta)_-_\psi(\alpha_+_\beta)=_-_\operatorname_\left[\ln_\left_(\frac_\right_)\right]. \end Where_the_digamma_function_ In_mathematics,_the_digamma_function_is_defined_as_the__logarithmic_derivative_of_the_gamma_function: :\psi(x)=\frac\ln\big(\Gamma(x)\big)=\frac\sim\ln-\frac. It_is_the_first_of_the__polygamma_functions._It_is_strictly_increasing_and_strict_...
_ψ(α)_is_defined_as_the_logarithmic_derivative_of_the_gamma_function_ In__mathematics,_the_gamma_function_(represented_by_,_the_capital_letter__gamma_from_the_Greek_alphabet)_is_one_commonly_used_extension_of_the__factorial_function_to_complex_numbers._The_gamma_function_is_defined_for_all_complex_numbers_except_...
: :\psi(\alpha)_=_\frac Logit_transformations_are_interesting,
_as_they_usually_transform_various_shapes_(including_J-shapes)_into_(usually_skewed)_bell-shaped_densities_over_the_logit_variable,_and_they_may_remove_the_end_singularities_over_the_original_variable: :\begin \operatorname\left[\ln_\left_(\frac_\right_)_\right]_&=\psi(\alpha)_-_\psi(\beta)=_\operatorname[\ln(X)]_+\operatorname_\left[\ln_\left_(\frac_\right)_\right],\\ \operatorname\left_[\ln_\left_(\frac_\right_)_\right_]_&=\psi(\beta)_-_\psi(\alpha)=_-_\operatorname_\left[\ln_\left_(\frac_\right)_\right]_. \end Johnson
__considered_the_distribution_of_the_logit_-_transformed_variable_ln(''X''/1−''X''),_including_its_moment_generating_function_and_approximations_for_large_values_of_the_shape_parameters.__This_transformation_extends_the_finite_support_,_1_ The_comma__is_a_punctuation_mark_that_appears_in_several_variants_in_different_languages._It_has_the_same_shape_as_an_apostrophe_or_single_closing_quotation_mark_()_in_many_typefaces,_but_it_differs_from_them_in_being_placed_on_the__baseline_o_...
based_on_the_original_variable_''X''_to_infinite_support_in_both_directions_of_the_real_line_(−∞,_+∞). Higher_order_logarithmic_moments_can_be_derived_by_using_the_representation_of_a_beta_distribution_as_a_proportion_of_two_Gamma_distributions_and_differentiating_through_the_integral._They_can_be_expressed_in_terms_of_higher_order_poly-gamma_functions_as_follows: :\begin \operatorname_\left_[\ln^2(X)_\right_]_&=_(\psi(\alpha)_-_\psi(\alpha_+_\beta))^2+\psi_1(\alpha)-\psi_1(\alpha+\beta),_\\ \operatorname_\left_[\ln^2(1-X)_\right_]_&=_(\psi(\beta)_-_\psi(\alpha_+_\beta))^2+\psi_1(\beta)-\psi_1(\alpha+\beta),_\\ \operatorname_\left_[\ln_(X)\ln(1-X)_\right_]_&=(\psi(\alpha)_-_\psi(\alpha_+_\beta))(\psi(\beta)_-_\psi(\alpha_+_\beta))_-\psi_1(\alpha+\beta). \end therefore_the_variance__ In_probability_theory_and_statistics,_variance_is_the__expectation_of_the_squared__deviation_of_a__random_variable_from_its__population_mean_or__sample_mean._Variance_is_a_measure_of_dispersion,_meaning_it_is_a_measure_of_how_far_a_set_of_numbe_...
_of_the_logarithmic_variables_and_covariance_ In__probability_theory_and__statistics,_covariance_is_a_measure_of_the_joint_variability_of_two__random_variables._If_the_greater_values_of_one_variable_mainly_correspond_with_the_greater_values_of_the_other_variable,_and_the_same_holds_for_the__...
_of_ln(''X'')_and_ln(1−''X'')_are: :\begin \operatorname[\ln(X),_\ln(1-X)]_&=_\operatorname\left[\ln(X)\ln(1-X)\right]_-_\operatorname[\ln(X)]\operatorname[\ln(1-X)]_=_-\psi_1(\alpha+\beta)_\\ &_\\ \operatorname[\ln_X]_&=_\operatorname[\ln^2(X)]_-_(\operatorname[\ln(X)])^2_\\ &=_\psi_1(\alpha)_-_\psi_1(\alpha_+_\beta)_\\ &=_\psi_1(\alpha)_+_\operatorname[\ln(X),_\ln(1-X)]_\\ &_\\ \operatorname_ln_(1-X)&=_\operatorname[\ln^2_(1-X)]_-_(\operatorname[\ln_(1-X)])^2_\\ &=_\psi_1(\beta)_-_\psi_1(\alpha_+_\beta)_\\ &=_\psi_1(\beta)_+_\operatorname[\ln_(X),_\ln(1-X)] \end where_the_trigamma_function_ In_mathematics,_the_trigamma_function,_denoted__or_,_is_the_second_of_the_polygamma_functions,_and_is_defined_by :_\psi_1(z)_=_\frac_\ln\Gamma(z). It_follows_from_this_definition_that :_\psi_1(z)_=_\frac_\psi(z) where__is_the_digamma_functio_...
,_denoted_ψ1(α),_is_the_second_of_the_polygamma_function_ In_mathematics,_the_polygamma_function_of_order__is_a_meromorphic_function_on_the__complex_numbers_\mathbb_defined_as_the_th__derivative_of_the_logarithm_of_the_gamma_function: :\psi^(z)_:=_\frac_\psi(z)_=_\frac_\ln\Gamma(z). Thus :\psi^(z)__...
s,_and_is_defined_as_the_derivative_of_the_digamma_function: :\psi_1(\alpha)_=_\frac=_\frac. The_variances_and_covariance_of_the_logarithmically_transformed_variables_''X''_and_(1−''X'')_are_different,_in_general,_because_the_logarithmic_transformation_destroys_the_mirror-symmetry_of_the_original_variables_''X''_and_(1−''X''),_as_the_logarithm_approaches_negative_infinity_for_the_variable_approaching_zero. These_logarithmic_variances_and_covariance_are_the_elements_of_the_Fisher_information_ In_mathematical_statistics,_the_Fisher_information_(sometimes_simply_called_information)_is_a_way_of_measuring_the_amount_of_information_that_an_observable_random_variable_''X''_carries_about_an_unknown_parameter_''θ''_of_a_distribution_that_model_...
_matrix_for_the_beta_distribution.__They_are_also_a_measure_of_the_curvature_of_the_log_likelihood_function_(see_section_on_Maximum_likelihood_estimation). The_variances_of_the_log_inverse_variables_are_identical_to_the_variances_of_the_log_variables: :\begin \operatorname\left[\ln_\left_(\frac_\right_)_\right]_&_=\operatorname[\ln(X)]_=_\psi_1(\alpha)_-_\psi_1(\alpha_+_\beta),_\\ \operatorname\left[\ln_\left_(\frac_\right_)_\right]_&=\operatorname_ln_(1-X)=_\psi_1(\beta)_-_\psi_1(\alpha_+_\beta),_\\ \operatorname\left[\ln_\left_(\frac_\right),_\ln_\left_(\frac\right_)_\right]_&=\operatorname[\ln(X),\ln(1-X)]=_-\psi_1(\alpha_+_\beta).\end It_also_follows_that_the_variances_of_the_logit_transformed_variables_are: :\operatorname\left[\ln_\left_(\frac_\right_)\right]=\operatorname\left[\ln_\left_(\frac_\right_)_\right]=-\operatorname\left_[\ln_\left_(\frac_\right_),_\ln_\left_(\frac_\right_)_\right]=_\psi_1(\alpha)_+_\psi_1(\beta)


_Quantities_of_information_(entropy)

Given_a_beta_distributed_random_variable,_''X''_~_Beta(''α'',_''β''),_the_information_entropy, differential_entropy_of_''X''_is_(measured_in_Nat_(unit), nats),_the_expected_value_of_the_negative_of_the_logarithm_of_the_probability_density_function_ In_probability_theory,_a_probability_density_function_(PDF),_or_density_of_a_continuous_random_variable,_is_a__function_whose_value_at_any_given_sample_(or_point)_in_the__sample_space_(the_set_of_possible_values_taken_by_the_random_variable)_ca_...
: :\begin h(X)_&=_\operatorname[-\ln(f(x;\alpha,\beta))]_\\_pt&=\int_0^1_-f(x;\alpha,\beta)\ln(f(x;\alpha,\beta))_\,_dx_\\_pt&=_\ln(\Beta(\alpha,\beta))-(\alpha-1)\psi(\alpha)-(\beta-1)\psi(\beta)+(\alpha+\beta-2)_\psi(\alpha+\beta) \end where_''f''(''x'';_''α'',_''β'')_is_the_probability_density_function_ In_probability_theory,_a_probability_density_function_(PDF),_or_density_of_a_continuous_random_variable,_is_a__function_whose_value_at_any_given_sample_(or_point)_in_the__sample_space_(the_set_of_possible_values_taken_by_the_random_variable)_ca_...
_of_the_beta_distribution: :f(x;\alpha,\beta)_=_\frac_x^(1-x)^ The_digamma_function_ In_mathematics,_the_digamma_function_is_defined_as_the__logarithmic_derivative_of_the_gamma_function: :\psi(x)=\frac\ln\big(\Gamma(x)\big)=\frac\sim\ln-\frac. It_is_the_first_of_the__polygamma_functions._It_is_strictly_increasing_and_strict_...
_''ψ''_appears_in_the_formula_for_the_differential_entropy_as_a_consequence_of_Euler's_integral_formula_for_the_harmonic_numbers_which_follows_from_the_integral: :\int_0^1_\frac__\,_dx_=_\psi(\alpha)-\psi(1) The_information_entropy, differential_entropy_of_the_beta_distribution_is_negative_for_all_values_of_''α''_and_''β''_greater_than_zero,_except_at_''α''_=_''β''_=_1_(for_which_values_the_beta_distribution_is_the_same_as_the_Uniform_distribution_(continuous), uniform_distribution),_where_the_information_entropy, differential_entropy_reaches_its_Maxima_and_minima, maximum_value_of_zero.__It_is_to_be_expected_that_the_maximum_entropy_should_take_place_when_the_beta_distribution_becomes_equal_to_the_uniform_distribution,_since_uncertainty_is_maximal_when_all_possible_events_are_equiprobable. For_''α''_or_''β''_approaching_zero,_the_information_entropy, differential_entropy_approaches_its_Maxima_and_minima, minimum_value_of_negative_infinity._For_(either_or_both)_''α''_or_''β''_approaching_zero,_there_is_a_maximum_amount_of_order:_all_the_probability_density_is_concentrated_at_the_ends,_and_there_is_zero_probability_density_at_points_located_between_the_ends._Similarly_for_(either_or_both)_''α''_or_''β''_approaching_infinity,_the_differential_entropy_approaches_its_minimum_value_of_negative_infinity,_and_a_maximum_amount_of_order.__If_either_''α''_or_''β''_approaches_infinity_(and_the_other_is_finite)_all_the_probability_density_is_concentrated_at_an_end,_and_the_probability_density_is_zero_everywhere_else.__If_both_shape_parameters_are_equal_(the_symmetric_case),_''α''_=_''β'',_and_they_approach_infinity_simultaneously,_the_probability_density_becomes_a_spike_(_Dirac_delta_function)_concentrated_at_the_middle_''x''_=_1/2,_and_hence_there_is_100%_probability_at_the_middle_''x''_=_1/2_and_zero_probability_everywhere_else. The_(continuous_case)_information_entropy, differential_entropy_was_introduced_by_Shannon_in_his_original_paper_(where_he_named_it_the_"entropy_of_a_continuous_distribution"),_as_the_concluding_part_of_the_same_paper_where_he_defined_the_information_entropy, discrete_entropy.__It_is_known_since_then_that_the_differential_entropy_may_differ_from_the_infinitesimal_limit_of_the_discrete_entropy_by_an_infinite_offset,_therefore_the_differential_entropy_can_be_negative_(as_it_is_for_the_beta_distribution)._What_really_matters_is_the_relative_value_of_entropy. Given_two_beta_distributed_random_variables,_''X''1_~_Beta(''α'',_''β'')_and_''X''2_~_Beta(''α''′,_''β''′),_the_cross_entropy_is_(measured_in_nats)
:\begin H(X_1,X_2)_&=_\int_0^1_-_f(x;\alpha,\beta)_\ln_(f(x;\alpha',\beta'))_\,dx_\\_pt&=_\ln_\left(\Beta(\alpha',\beta')\right)-(\alpha'-1)\psi(\alpha)-(\beta'-1)\psi(\beta)+(\alpha'+\beta'-2)\psi(\alpha+\beta). \end The_cross_entropy_has_been_used_as_an_error_metric_to_measure_the_distance_between_two_hypotheses.
__Its_absolute_value_is_minimum_when_the_two_distributions_are_identical._It_is_the_information_measure_most_closely_related_to_the_log_maximum_likelihood_(see_section_on_"Parameter_estimation._Maximum_likelihood_estimation")). The_relative_entropy,_or_Kullback–Leibler_divergence_''D''KL(''X''1_, , _''X''2),_is_a_measure_of_the_inefficiency_of_assuming_that_the_distribution_is_''X''2_~_Beta(''α''′,_''β''′)__when_the_distribution_is_really_''X''1_~_Beta(''α'',_''β'')._It_is_defined_as_follows_(measured_in_nats). :\begin D_(X_1, , X_2)_&=_\int_0^1_f(x;\alpha,\beta)_\ln_\left_(\frac_\right_)_\,_dx_\\_pt&=_\left_(\int_0^1_f(x;\alpha,\beta)_\ln_(f(x;\alpha,\beta))_\,dx_\right_)-_\left_(\int_0^1_f(x;\alpha,\beta)_\ln_(f(x;\alpha',\beta'))_\,_dx_\right_)\\_pt&=_-h(X_1)_+_H(X_1,X_2)\\_pt&=_\ln\left(\frac\right)+(\alpha-\alpha')\psi(\alpha)+(\beta-\beta')\psi(\beta)+(\alpha'-\alpha+\beta'-\beta)\psi_(\alpha_+_\beta). \end_ The_relative_entropy,_or_Kullback–Leibler_divergence,_is_always_non-negative.__A_few_numerical_examples_follow: *''X''1_~_Beta(1,_1)_and_''X''2_~_Beta(3,_3);_''D''KL(''X''1_, , _''X''2)_=_0.598803;_''D''KL(''X''2_, , _''X''1)_=_0.267864;_''h''(''X''1)_=_0;_''h''(''X''2)_=_−0.267864 *''X''1_~_Beta(3,_0.5)_and_''X''2_~_Beta(0.5,_3);_''D''KL(''X''1_, , _''X''2)_=_7.21574;_''D''KL(''X''2_, , _''X''1)_=_7.21574;_''h''(''X''1)_=_−1.10805;_''h''(''X''2)_=_−1.10805. The_Kullback–Leibler_divergence_is_not_symmetric_''D''KL(''X''1_, , _''X''2)_≠_''D''KL(''X''2_, , _''X''1)__for_the_case_in_which_the_individual_beta_distributions_Beta(1,_1)_and_Beta(3,_3)_are_symmetric,_but_have_different_entropies_''h''(''X''1)_≠_''h''(''X''2)._The_value_of_the_Kullback_divergence_depends_on_the_direction_traveled:_whether_going_from_a_higher_(differential)_entropy_to_a_lower_(differential)_entropy_or_the_other_way_around._In_the_numerical_example_above,_the_Kullback_divergence_measures_the_inefficiency_of_assuming_that_the_distribution_is_(bell-shaped)_Beta(3,_3),_rather_than_(uniform)_Beta(1,_1)._The_"h"_entropy_of_Beta(1,_1)_is_higher_than_the_"h"_entropy_of_Beta(3,_3)_because_the_uniform_distribution_Beta(1,_1)_has_a_maximum_amount_of_disorder._The_Kullback_divergence_is_more_than_two_times_higher_(0.598803_instead_of_0.267864)_when_measured_in_the_direction_of_decreasing_entropy:_the_direction_that_assumes_that_the_(uniform)_Beta(1,_1)_distribution_is_(bell-shaped)_Beta(3,_3)_rather_than_the_other_way_around._In_this_restricted_sense,_the_Kullback_divergence_is_consistent_with_the_second_law_of_thermodynamics. The_Kullback–Leibler_divergence_is_symmetric_''D''KL(''X''1_, , _''X''2)_=_''D''KL(''X''2_, , _''X''1)_for_the_skewed_cases_Beta(3,_0.5)_and_Beta(0.5,_3)_that_have_equal_differential_entropy_''h''(''X''1)_=_''h''(''X''2). The_symmetry_condition: :D_(X_1, , X_2)_=_D_(X_2, , X_1),\texth(X_1)_=_h(X_2),\text\alpha_\neq_\beta follows_from_the_above_definitions_and_the_mirror-symmetry_''f''(''x'';_''α'',_''β'')_=_''f''(1−''x'';_''α'',_''β'')_enjoyed_by_the_beta_distribution.


_Relationships_between_statistical_measures


_Mean,_mode_and_median_relationship

If_1_<_α_<_β_then_mode_≤_median_≤_mean.Kerman_J_(2011)_"A_closed-form_approximation_for_the_median_of_the_beta_distribution"._
_Expressing_the_mode_(only_for_α,_β_>_1),_and_the_mean_in_terms_of_α_and_β: :__\frac_\le_\text_\le_\frac_, If_1_<_β_<_α_then_the_order_of_the_inequalities_are_reversed._For_α,_β_>_1_the_absolute_distance_between_the_mean_and_the_median_is_less_than_5%_of_the_distance_between_the_maximum_and_minimum_values_of_''x''._On_the_other_hand,_the_absolute_distance_between_the_mean_and_the_mode_can_reach_50%_of_the_distance_between_the_maximum_and_minimum_values_of_''x'',_for_the_(Pathological_(mathematics), pathological)_case_of_α_=_1_and_β_=_1,_for_which_values_the_beta_distribution_approaches_the_uniform_distribution_and_the_information_entropy, differential_entropy_approaches_its_Maxima_and_minima, maximum_value,_and_hence_maximum_"disorder". For_example,_for_α_=_1.0001_and_β_=_1.00000001: *_mode___=_0.9999;___PDF(mode)_=_1.00010 *_mean___=_0.500025;_PDF(mean)_=_1.00003 *_median_=_0.500035;_PDF(median)_=_1.00003 *_mean_−_mode___=_−0.499875 *_mean_−_median_=_−9.65538_×_10−6 where_PDF_stands_for_the_value_of_the_probability_density_function_ In_probability_theory,_a_probability_density_function_(PDF),_or_density_of_a_continuous_random_variable,_is_a__function_whose_value_at_any_given_sample_(or_point)_in_the__sample_space_(the_set_of_possible_values_taken_by_the_random_variable)_ca_...
.


_Mean,_geometric_mean_and_harmonic_mean_relationship

It_is_known_from_the_inequality_of_arithmetic_and_geometric_means_that_the_geometric_mean_is_lower_than_the_mean.__Similarly,_the_harmonic_mean_is_lower_than_the_geometric_mean.__The_accompanying_plot_shows_that_for_α_=_β,_both_the_mean_and_the_median_are_exactly_equal_to_1/2,_regardless_of_the_value_of_α_=_β,_and_the_mode_is_also_equal_to_1/2_for_α_=_β_>_1,_however_the_geometric_and_harmonic_means_are_lower_than_1/2_and_they_only_approach_this_value_asymptotically_as_α_=_β_→_∞.


_Kurtosis_bounded_by_the_square_of_the_skewness

As_remarked_by_William_Feller, Feller,_in_the_Pearson_distribution, Pearson_system_the_beta_probability_density_appears_as_Pearson_distribution, type_I_(any_difference_between_the_beta_distribution_and_Pearson's_type_I_distribution_is_only_superficial_and_it_makes_no_difference_for_the_following_discussion_regarding_the_relationship_between_kurtosis_and_skewness)._Karl_Pearson_showed,_in_Plate_1_of_his_paper_
__published_in_1916,__a_graph_with_the_kurtosis_as_the_vertical_axis_(ordinate)_and_the_square_of_the_skewness_ In_probability_theory_and_statistics,_skewness_is_a_measure_of_the_asymmetry_of_the_probability_distribution_of_a__real-valued_random_variable_about_its_mean._The_skewness_value_can_be_positive,_zero,_negative,_or_undefined. For_a_unimodal__...
_as_the_horizontal_axis_(abscissa),_in_which_a_number_of_distributions_were_displayed.
__The_region_occupied_by_the_beta_distribution_is_bounded_by_the_following_two_Line_(geometry), lines_in_the_(skewness2,kurtosis)_Cartesian_coordinate_system, plane,_or_the_(skewness2,excess_kurtosis)_Cartesian_coordinate_system, plane: :(\text)^2+1<_\text<_\frac_(\text)^2_+_3 or,_equivalently, :(\text)^2-2<_\text<_\frac_(\text)^2 At_a_time_when_there_were_no_powerful_digital_computers,_Karl_Pearson_accurately_computed_further_boundaries,_for_example,_separating_the_"U-shaped"_from_the_"J-shaped"_distributions._The_lower_boundary_line_(excess_kurtosis_+_2_−_skewness2_=_0)_is_produced_by_skewed_"U-shaped"_beta_distributions_with_both_values_of_shape_parameters_α_and_β_close_to_zero.__The_upper_boundary_line_(excess_kurtosis_−_(3/2)_skewness2_=_0)_is_produced_by_extremely_skewed_distributions_with_very_large_values_of_one_of_the_parameters_and_very_small_values_of_the_other_parameter.__Karl_Pearson_showed_that_this_upper_boundary_line_(excess_kurtosis_−_(3/2)_skewness2_=_0)_is_also_the_intersection_with_Pearson's_distribution_III,_which_has_unlimited_support_in_one_direction_(towards_positive_infinity),_and_can_be_bell-shaped_or_J-shaped._His_son,_Egon_Pearson,_showed_that_the_region_(in_the_kurtosis/squared-skewness_plane)_occupied_by_the_beta_distribution_(equivalently,_Pearson's_distribution_I)_as_it_approaches_this_boundary_(excess_kurtosis_−_(3/2)_skewness2_=_0)_is_shared_with_the_noncentral_chi-squared_distribution.__Karl_Pearson
_(Pearson_1895,_pp. 357,_360,_373–376)_also_showed_that_the_gamma_distribution_is_a_Pearson_type_III_distribution._Hence_this_boundary_line_for_Pearson's_type_III_distribution_is_known_as_the_gamma_line._(This_can_be_shown_from_the_fact_that_the_excess_kurtosis_of_the_gamma_distribution_is_6/''k''_and_the_square_of_the_skewness_is_4/''k'',_hence_(excess_kurtosis_−_(3/2)_skewness2_=_0)_is_identically_satisfied_by_the_gamma_distribution_regardless_of_the_value_of_the_parameter_"k")._Pearson_later_noted_that_the_chi-squared_distribution_is_a_special_case_of_Pearson's_type_III_and_also_shares_this_boundary_line_(as_it_is_apparent_from_the_fact_that_for_the_chi-squared_distribution_the_excess_kurtosis_is_12/''k''_and_the_square_of_the_skewness_is_8/''k'',_hence_(excess_kurtosis_−_(3/2)_skewness2_=_0)_is_identically_satisfied_regardless_of_the_value_of_the_parameter_"k")._This_is_to_be_expected,_since_the_chi-squared_distribution_''X''_~_χ2(''k'')_is_a_special_case_of_the_gamma_distribution,_with_parametrization_X_~_Γ(k/2,_1/2)_where_k_is_a_positive_integer_that_specifies_the_"number_of_degrees_of_freedom"_of_the_chi-squared_distribution. An_example_of_a_beta_distribution_near_the_upper_boundary_(excess_kurtosis_−_(3/2)_skewness2_=_0)_is_given_by_α_=_0.1,_β_=_1000,_for_which_the_ratio_(excess_kurtosis)/(skewness2)_=_1.49835_approaches_the_upper_limit_of_1.5_from_below._An_example_of_a_beta_distribution_near_the_lower_boundary_(excess_kurtosis_+_2_−_skewness2_=_0)_is_given_by_α=_0.0001,_β_=_0.1,_for_which_values_the_expression_(excess_kurtosis_+_2)/(skewness2)_=_1.01621_approaches_the_lower_limit_of_1_from_above._In_the_infinitesimal_limit_for_both_α_and_β_approaching_zero_symmetrically,_the_excess_kurtosis_reaches_its_minimum_value_at_−2.__This_minimum_value_occurs_at_the_point_at_which_the_lower_boundary_line_intersects_the_vertical_axis_(ordinate)._(However,_in_Pearson's_original_chart,_the_ordinate_is_kurtosis,_instead_of_excess_kurtosis,_and_it_increases_downwards_rather_than_upwards). Values_for_the_skewness_and_excess_kurtosis_below_the_lower_boundary_(excess_kurtosis_+_2_−_skewness2_=_0)_cannot_occur_for_any_distribution,_and_hence_Karl_Pearson_appropriately_called_the_region_below_this_boundary_the_"impossible_region"._The_boundary_for_this_"impossible_region"_is_determined_by_(symmetric_or_skewed)_bimodal_"U"-shaped_distributions_for_which_the_parameters_α_and_β_approach_zero_and_hence_all_the_probability_density_is_concentrated_at_the_ends:_''x''_=_0,_1_with_practically_nothing_in_between_them._Since_for_α_≈_β_≈_0_the_probability_density_is_concentrated_at_the_two_ends_''x''_=_0_and_''x''_=_1,_this_"impossible_boundary"_is_determined_by_a_Bernoulli_distribution_ In_probability_theory_and_statistics,_the_Bernoulli_distribution,_named_after_Swiss_mathematician__Jacob_Bernoulli,James_Victor_Uspensky:_''Introduction_to_Mathematical_Probability'',_McGraw-Hill,_New_York_1937,_page_45_is_the__discrete_probabi_...
,_where_the_two_only_possible_outcomes_occur_with_respective_probabilities_''p''_and_''q''_=_1−''p''._For_cases_approaching_this_limit_boundary_with_symmetry_α_=_β,_skewness_≈_0,_excess_kurtosis_≈_−2_(this_is_the_lowest_excess_kurtosis_possible_for_any_distribution),_and_the_probabilities_are_''p''_≈_''q''_≈_1/2.__For_cases_approaching_this_limit_boundary_with_skewness,_excess_kurtosis_≈_−2_+_skewness2,_and_the_probability_density_is_concentrated_more_at_one_end_than_the_other_end_(with_practically_nothing_in_between),_with_probabilities_p_=_\tfrac_at_the_left_end_''x''_=_0_and_q_=_1-p_=_\tfrac_at_the_right_end_''x''_=_1.


_Symmetry

All statements are conditional on α, β > 0.
* Probability density function reflection symmetry
::f(x;\alpha,\beta) = f(1-x;\beta,\alpha)
* Cumulative distribution function reflection symmetry plus unitary translation
::F(x;\alpha,\beta) = I_x(\alpha,\beta) = 1 - F(1- x;\beta,\alpha) = 1 - I_{1-x}(\beta,\alpha)
* Mode reflection symmetry plus unitary translation
::\operatorname{mode}(\Beta(\alpha, \beta)) = 1-\operatorname{mode}(\Beta(\beta, \alpha)),\text{ if }\Beta(\beta, \alpha)\ne \Beta(1,1)
* Median reflection symmetry plus unitary translation
::\operatorname{median}(\Beta(\alpha, \beta)) = 1 - \operatorname{median}(\Beta(\beta, \alpha))
* Mean reflection symmetry plus unitary translation
::\mu(\Beta(\alpha, \beta)) = 1 - \mu(\Beta(\beta, \alpha))
* Geometric means: each is individually asymmetric; the following symmetry applies between the geometric mean based on ''X'' and the geometric mean based on its reflection (1 − ''X'')
::G_X(\Beta(\alpha, \beta)) = G_{(1-X)}(\Beta(\beta, \alpha))
* Harmonic means: each is individually asymmetric; the following symmetry applies between the harmonic mean based on ''X'' and the harmonic mean based on its reflection (1 − ''X'')
::H_X(\Beta(\alpha, \beta)) = H_{(1-X)}(\Beta(\beta, \alpha)) \text{ if } \alpha, \beta > 1.
* Variance symmetry
::\operatorname{var}(\Beta(\alpha, \beta)) = \operatorname{var}(\Beta(\beta, \alpha))
* Geometric variances: each is individually asymmetric; the following symmetry applies between the log geometric variance based on ''X'' and the log geometric variance based on its reflection (1 − ''X'')
::\ln(\operatorname{var}_{GX}(\Beta(\alpha, \beta))) = \ln(\operatorname{var}_{G(1-X)}(\Beta(\beta, \alpha)))
* Geometric covariance symmetry
::\ln \operatorname{cov}_{GX,(1-X)}(\Beta(\alpha, \beta)) = \ln \operatorname{cov}_{GX,(1-X)}(\Beta(\beta, \alpha))
* Mean absolute deviation around the mean symmetry
::\operatorname{E}[|X - E[X]|](\Beta(\alpha, \beta)) = \operatorname{E}[|X - E[X]|](\Beta(\beta, \alpha))
* Skewness skew-symmetry
::\operatorname{skewness}(\Beta(\alpha, \beta)) = - \operatorname{skewness}(\Beta(\beta, \alpha))
* Excess kurtosis symmetry
::\text{excess kurtosis}(\Beta(\alpha, \beta)) = \text{excess kurtosis}(\Beta(\beta, \alpha))
* Characteristic function symmetry of Real part (with respect to the origin of variable "t")
:: \text{Re}[{}_1F_1(\alpha; \alpha+\beta; it)] = \text{Re}[{}_1F_1(\alpha; \alpha+\beta; -it)]
* Characteristic function skew-symmetry of Imaginary part (with respect to the origin of variable "t")
:: \text{Im}[{}_1F_1(\alpha; \alpha+\beta; it)] = - \text{Im}[{}_1F_1(\alpha; \alpha+\beta; -it)]
* Characteristic function symmetry of Absolute value (with respect to the origin of variable "t")
:: \text{Abs}[{}_1F_1(\alpha; \alpha+\beta; it)] = \text{Abs}[{}_1F_1(\alpha; \alpha+\beta; -it)]
* Differential entropy symmetry
::h(\Beta(\alpha, \beta)) = h(\Beta(\beta, \alpha))
* Relative entropy (also called Kullback–Leibler divergence) symmetry
::D_{\mathrm{KL}}(X_1 || X_2) = D_{\mathrm{KL}}(X_2 || X_1), \text{ if } h(X_1) = h(X_2)\text{, for (skewed) }\alpha \neq \beta
* Fisher information matrix symmetry
::{\mathcal{I}}_{i,j} = {\mathcal{I}}_{j,i}
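As an illustration of the reflection relations above, the following minimal sketch (not part of the original article; it assumes Python with NumPy and SciPy, and the chosen parameter values are arbitrary) checks several of them numerically:

 # Numerical spot-check of a few reflection symmetries of Beta(a, b) vs Beta(b, a).
 import numpy as np
 from scipy import stats
 
 a, b = 2.5, 0.7                 # arbitrary shape parameters alpha, beta
 x = 0.37                        # arbitrary point in (0, 1)
 X, Y = stats.beta(a, b), stats.beta(b, a)
 
 print(np.isclose(X.pdf(x), Y.pdf(1 - x)))        # f(x; a, b) = f(1-x; b, a)
 print(np.isclose(X.cdf(x), 1 - Y.cdf(1 - x)))    # F(x; a, b) = 1 - F(1-x; b, a)
 print(np.isclose(X.mean(), 1 - Y.mean()))        # mean reflection plus translation
 print(np.isclose(X.var(), Y.var()))              # variance symmetry
 print(np.isclose(X.stats(moments='s'), -Y.stats(moments='s')))  # skewness skew-symmetry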


Geometry of the probability density function


Inflection points

For certain values of the shape parameters α and β, the probability density function has inflection points, at which the curvature changes sign. The position of these inflection points can be useful as a measure of the dispersion or spread of the distribution.

Defining the following quantity:
:\kappa =\frac{\sqrt{\frac{(\alpha-1)(\beta-1)}{\alpha+\beta-3}}}{\alpha+\beta-2}

points of inflection occur, depending on the value of the shape parameters α and β, as follows:
*(α > 2, β > 2) The distribution is bell-shaped (symmetric for α = β and skewed otherwise), with two inflection points, equidistant from the mode:
::x = \text{mode} \pm \kappa = \frac{\alpha-1\pm\sqrt{\frac{(\alpha-1)(\beta-1)}{\alpha+\beta-3}}}{\alpha+\beta-2}
* (α = 2, β > 2) The distribution is unimodal, positively skewed, right-tailed, with one inflection point, located to the right of the mode:
::x =\text{mode} + \kappa = \frac{2}{\beta}
* (α > 2, β = 2) The distribution is unimodal, negatively skewed, left-tailed, with one inflection point, located to the left of the mode:
::x = \text{mode} - \kappa = 1 - \frac{2}{\alpha}
* (1 < α < 2, β > 2, α+β > 2) The distribution is unimodal, positively skewed, right-tailed, with one inflection point, located to the right of the mode:
::x =\text{mode} + \kappa = \frac{\alpha-1+\sqrt{\frac{(\alpha-1)(\beta-1)}{\alpha+\beta-3}}}{\alpha+\beta-2}
*(0 < α < 1, 1 < β < 2) The distribution has a mode at the left end ''x'' = 0 and it is positively skewed, right-tailed. There is one inflection point, located to the right of the mode:
::x = \frac{\alpha-1+\sqrt{\frac{(\alpha-1)(\beta-1)}{\alpha+\beta-3}}}{\alpha+\beta-2}
*(α > 2, 1 < β < 2) The distribution is unimodal, negatively skewed, left-tailed, with one inflection point, located to the left of the mode:
::x =\text{mode} - \kappa = \frac{\alpha-1-\sqrt{\frac{(\alpha-1)(\beta-1)}{\alpha+\beta-3}}}{\alpha+\beta-2}
*(1 < α < 2, 0 < β < 1) The distribution has a mode at the right end ''x'' = 1 and it is negatively skewed, left-tailed. There is one inflection point, located to the left of the mode:
::x = \frac{\alpha-1-\sqrt{\frac{(\alpha-1)(\beta-1)}{\alpha+\beta-3}}}{\alpha+\beta-2}

There are no inflection points in the remaining (symmetric and skewed) regions: U-shaped (α, β < 1), upside-down-U-shaped (1 < α < 2, 1 < β < 2), reverse-J-shaped (α < 1, β > 2) or J-shaped (α > 2, β < 1).

The accompanying plots show the inflection point locations (shown vertically, ranging from 0 to 1) versus α and β (the horizontal axes ranging from 0 to 5). There are large cuts at surfaces intersecting the lines α = 1, β = 1, α = 2, and β = 2 because at these values the beta distribution changes from two modes, to one mode, to no mode.
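The closed-form inflection points can be cross-checked numerically. The following sketch (an illustration only, assuming NumPy and SciPy; not from any cited source) locates the sign changes of the second derivative of the density for a bell-shaped case and compares them with mode ± κ:

 # Compare mode +/- kappa with a brute-force search for sign changes of f''(x).
 import numpy as np
 from scipy import stats
 
 alpha, beta_ = 4.0, 3.0                      # alpha > 2, beta > 2: two inflection points
 mode = (alpha - 1) / (alpha + beta_ - 2)
 kappa = np.sqrt((alpha - 1) * (beta_ - 1) / (alpha + beta_ - 3)) / (alpha + beta_ - 2)
 print("closed form:", mode - kappa, mode + kappa)
 
 x = np.linspace(1e-4, 1 - 1e-4, 200001)
 f = stats.beta(alpha, beta_).pdf(x)
 f2 = np.gradient(np.gradient(f, x), x)       # numerical second derivative
 print("numerical:  ", x[np.where(np.diff(np.sign(f2)) != 0)])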


Shapes

The beta density function can take a wide variety of different shapes depending on the values of the two parameters ''α'' and ''β''. The ability of the beta distribution to take this great diversity of shapes (using only two parameters) is partly responsible for its wide application in modeling actual measurements:


Symmetric (''α'' = ''β'')

* the density function is symmetric about 1/2 (blue & teal plots).
* median = mean = 1/2.
* skewness = 0.
* variance = 1/(4(2α + 1))
* α = β < 1
** U-shaped (blue plot).
** bimodal: left mode = 0, right mode = 1, anti-mode = 1/2
** 1/12 < var(''X'') < 1/4
** −2 < excess kurtosis(''X'') < −6/5
** α = β = 1/2 is the arcsine distribution
*** var(''X'') = 1/8
*** excess kurtosis(''X'') = −3/2
*** CF = Rinc (t)
** α = β → 0 is a 2-point Bernoulli distribution with equal probability 1/2 at each Dirac delta function end ''x'' = 0 and ''x'' = 1 and zero probability everywhere else. A coin toss: one face of the coin being ''x'' = 0 and the other face being ''x'' = 1.
*** \lim_{\alpha = \beta \to 0} \operatorname{var}(X) = \tfrac{1}{4}
*** \lim_{\alpha = \beta \to 0} \text{excess kurtosis}(X) = - 2 (a lower value than this is impossible for any distribution to reach)
*** The differential entropy approaches a minimum value of −∞
* α = β = 1
** the uniform [0, 1] distribution
** no mode
** var(''X'') = 1/12
** excess kurtosis(''X'') = −6/5
** The (negative anywhere else) differential entropy reaches its maximum value of zero
** CF = Sinc (t)
* ''α'' = ''β'' > 1
** symmetric unimodal
** mode = 1/2.
** 0 < var(''X'') < 1/12
** −6/5 < excess kurtosis(''X'') < 0
** ''α'' = ''β'' = 3/2 is a semi-elliptic [0, 1] distribution, see: Wigner semicircle distribution
*** var(''X'') = 1/16.
*** excess kurtosis(''X'') = −1
*** CF = 2 Jinc (t)
** ''α'' = ''β'' = 2 is the parabolic [0, 1] distribution
*** var(''X'') = 1/20
*** excess kurtosis(''X'') = −6/7
*** CF = 3 Tinc (t)
** ''α'' = ''β'' > 2 is bell-shaped, with inflection points located to either side of the mode
*** 0 < var(''X'') < 1/20
*** −6/7 < excess kurtosis(''X'') < 0
** ''α'' = ''β'' → ∞ is a 1-point degenerate distribution with a Dirac delta function spike at the midpoint ''x'' = 1/2 with probability 1, and zero probability everywhere else. There is 100% probability (absolute certainty) concentrated at the single point ''x'' = 1/2.
*** \lim_{\alpha = \beta \to \infty} \operatorname{var}(X) = 0
*** \lim_{\alpha = \beta \to \infty} \text{excess kurtosis}(X) = 0
*** The differential entropy approaches a minimum value of −∞


Skewed (''α'' ≠ ''β'')

The density function is skewed. An interchange of parameter values yields the mirror image (the reverse) of the initial curve. Some more specific cases:
* ''α'' < 1, ''β'' < 1
** U-shaped
** Positive skew for α < β, negative skew for α > β.
** bimodal: left mode = 0, right mode = 1, anti-mode = \tfrac{\alpha-1}{\alpha+\beta-2}
** 0 < median < 1.
** 0 < var(''X'') < 1/4
* α > 1, β > 1
** unimodal (magenta & cyan plots),
** Positive skew for α < β, negative skew for α > β.
** \text{mode}= \tfrac{\alpha-1}{\alpha+\beta-2}
** 0 < median < 1
** 0 < var(''X'') < 1/12
* α < 1, β ≥ 1
** reverse J-shaped with a right tail,
** positively skewed,
** strictly decreasing, convex
** mode = 0
** 0 < median < 1/2.
** 0 < \operatorname{var}(X) < \tfrac{5\sqrt{5}-11}{2} \approx 0.0902, (maximum variance occurs for \alpha=\tfrac{\sqrt{5}-1}{2}, \beta=1, or α = Φ the golden ratio conjugate)
* α ≥ 1, β < 1
** J-shaped with a left tail,
** negatively skewed,
** strictly increasing, convex
** mode = 1
** 1/2 < median < 1
** 0 < \operatorname{var}(X) < \tfrac{5\sqrt{5}-11}{2} \approx 0.0902, (maximum variance occurs for \alpha=1, \beta=\tfrac{\sqrt{5}-1}{2}, or β = Φ the golden ratio conjugate)
* α = 1, β > 1
** positively skewed,
** strictly decreasing (red plot),
** a reversed (mirror-image) power function [0,1] distribution
** mean = 1 / (β + 1)
** median = 1 − (1/2)^(1/β)
** mode = 0
** α = 1, 1 < β < 2
*** concave
*** 1-\tfrac{1}{\sqrt{2}}< \text{median} < \tfrac{1}{2}
*** 1/18 < var(''X'') < 1/12.
** α = 1, β = 2
*** a straight line with slope −2, the right-triangular distribution with right angle at the left end, at ''x'' = 0
*** \text{median}=1-\tfrac{1}{\sqrt{2}}
*** var(''X'') = 1/18
** α = 1, β > 2
*** reverse J-shaped with a right tail,
*** convex
*** 0 < \text{median} < 1-\tfrac{1}{\sqrt{2}}
*** 0 < var(''X'') < 1/18
* α > 1, β = 1
** negatively skewed,
** strictly increasing (green plot),
** the power function [0, 1] distribution
** mean = α / (α + 1)
** median = (1/2)^(1/α)
** mode = 1
** 2 > α > 1, β = 1
*** concave
*** \tfrac{1}{2} < \text{median} < \tfrac{1}{\sqrt{2}}
*** 1/18 < var(''X'') < 1/12
** α = 2, β = 1
*** a straight line with slope +2, the right-triangular distribution with right angle at the right end, at ''x'' = 1
*** \text{median}=\tfrac{1}{\sqrt{2}}
*** var(''X'') = 1/18
** α > 2, β = 1
*** J-shaped with a left tail, convex
*** \tfrac{1}{\sqrt{2}} < \text{median} < 1
*** 0 < var(''X'') < 1/18


Related distributions


Transformations

* If ''X'' ~ Beta(''α'', ''β'') then 1 − ''X'' ~ Beta(''β'', ''α''), mirror-image symmetry
* If ''X'' ~ Beta(''α'', ''β'') then \tfrac{X}{1-X} \sim \beta'(\alpha,\beta), the beta prime distribution, also called "beta distribution of the second kind".
* If ''X'' ~ Beta(''α'', ''β'') then \tfrac{1}{X} - 1 \sim \beta'(\beta,\alpha).
* If ''X'' ~ Beta(''n''/2, ''m''/2) then \tfrac{mX}{n(1-X)} \sim F(n,m) (assuming ''n'' > 0 and ''m'' > 0), the Fisher–Snedecor F distribution.
* If X \sim \operatorname{Beta}\left(1+\lambda\tfrac{m-\min}{\max-\min}, 1 + \lambda\tfrac{\max-m}{\max-\min}\right) then min + ''X''(max − min) ~ PERT(min, max, ''m'', ''λ'') where ''PERT'' denotes a PERT distribution used in PERT analysis, and ''m'' = most likely value.Herrerías-Velasco, José Manuel and Herrerías-Pleguezuelo, Rafael and René van Dorp, Johan. (2011). Revisiting the PERT mean and Variance. European Journal of Operational Research (210), p. 448–451. Traditionally ''λ'' = 4 in PERT analysis.
* If ''X'' ~ Beta(1, ''β'') then ''X'' ~ Kumaraswamy distribution with parameters (1, ''β'')
* If ''X'' ~ Beta(''α'', 1) then ''X'' ~ Kumaraswamy distribution with parameters (''α'', 1)
* If ''X'' ~ Beta(''α'', 1) then −ln(''X'') ~ Exponential(''α'')
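Several of these transformations are easy to verify by simulation. The sketch below (illustrative only; it assumes NumPy and SciPy and uses Kolmogorov–Smirnov tests, which are not part of the material above) checks the exponential and F-distribution relations:

 # Monte Carlo sanity check of two transformation identities.
 import numpy as np
 from scipy import stats
 
 rng = np.random.default_rng(0)
 alpha = 2.3
 x = rng.beta(alpha, 1.0, size=100_000)
 # If X ~ Beta(alpha, 1) then -ln(X) ~ Exponential(alpha), i.e. scale 1/alpha.
 print(stats.kstest(-np.log(x), stats.expon(scale=1 / alpha).cdf).pvalue)
 
 n, m = 5, 7
 y = rng.beta(n / 2, m / 2, size=100_000)
 # If X ~ Beta(n/2, m/2) then m X / (n (1 - X)) ~ F(n, m).
 print(stats.kstest(m * y / (n * (1 - y)), stats.f(n, m).cdf).pvalue)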


Special and limiting cases

* Beta(1, 1) ~ U(0, 1), the continuous uniform distribution.
* Beta(n, 1) ~ Maximum of ''n'' independent rvs. with U(0, 1), sometimes called a ''standard power function distribution'' with density ''n'' ''x''^(''n''−1) on that interval.
* Beta(1, n) ~ Minimum of ''n'' independent rvs. with U(0, 1)
* If ''X'' ~ Beta(3/2, 3/2) and ''r'' > 0 then 2''rX'' − ''r'' ~ Wigner semicircle distribution.
* Beta(1/2, 1/2) is equivalent to the arcsine distribution. This distribution is also the Jeffreys prior probability for the Bernoulli and binomial distributions. The arcsine probability density is a distribution that appears in several random-walk fundamental theorems. In a fair coin toss random walk, the probability for the time of the last visit to the origin is distributed as an (U-shaped) arcsine distribution. In a two-player fair-coin-toss game, a player is said to be in the lead if the random walk (that started at the origin) is above the origin. The most probable number of times that a given player will be in the lead, in a game of length 2''N'', is not ''N''. On the contrary, ''N'' is the least likely number of times that the player will be in the lead. The most likely number of times in the lead is 0 or 2''N'' (following the arcsine distribution).
* \lim_{n \to \infty} n \operatorname{Beta}(1,n) = \operatorname{Exponential}(1), the exponential distribution.
* \lim_{n \to \infty} n \operatorname{Beta}(k,n) = \operatorname{Gamma}(k,1), the gamma distribution.
* For large n, \operatorname{Beta}(\alpha n,\beta n) \to \mathcal{N}\left(\frac{\alpha}{\alpha+\beta},\frac{\alpha\beta}{(\alpha+\beta)^3}\frac{1}{n}\right), the normal distribution. More precisely, if X_n \sim \operatorname{Beta}(\alpha n,\beta n) then \sqrt{n}\left(X_n -\tfrac{\alpha}{\alpha+\beta}\right) converges in distribution to a normal distribution with mean 0 and variance \tfrac{\alpha\beta}{(\alpha+\beta)^3} as ''n'' increases.
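The limiting statements can also be illustrated by simulation. The following sketch (illustrative only; it assumes NumPy and SciPy, with arbitrarily chosen n, α and β) checks the exponential limit and the normal approximation for large parameters:

 # Illustrate n*Beta(1, n) -> Exponential(1) and Beta(alpha*n, beta*n) ~ Normal for large n.
 import numpy as np
 from scipy import stats
 
 rng = np.random.default_rng(1)
 n = 10_000
 z = n * rng.beta(1, n, size=50_000)
 print(stats.kstest(z, stats.expon().cdf).pvalue)        # consistent with Exp(1)
 
 alpha, beta_ = 2.0, 5.0
 x = rng.beta(alpha * n, beta_ * n, size=50_000)
 print(x.mean(), alpha / (alpha + beta_))                 # mean close to alpha/(alpha+beta)
 print(x.var() * n, alpha * beta_ / (alpha + beta_)**3)   # n*var close to the limiting variance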


Derived from other distributions

* The ''k''th order statistic of a sample of size ''n'' from the uniform distribution is a beta random variable, ''U''(''k'') ~ Beta(''k'', ''n''+1−''k'').
* If ''X'' ~ Gamma(α, θ) and ''Y'' ~ Gamma(β, θ) are independent, then \tfrac{X}{X+Y} \sim \operatorname{Beta}(\alpha, \beta)\,.
* If X \sim \chi^2(\alpha)\, and Y \sim \chi^2(\beta)\, are independent, then \tfrac{X}{X+Y} \sim \operatorname{Beta}(\tfrac{\alpha}{2}, \tfrac{\beta}{2}).
* If ''X'' ~ U(0, 1) and ''α'' > 0 then ''X''^(1/''α'') ~ Beta(''α'', 1), the power function distribution.
* If X \sim\operatorname{Bin}(k;n;p), then ''p'' \sim \operatorname{Beta}(\alpha, \beta) for discrete values of ''n'' and ''k'', where \alpha=k+1 and \beta=n-k+1.
* If ''X'' ~ Cauchy(0, 1) then \tfrac{1}{1+X^2} \sim \operatorname{Beta}\left(\tfrac12, \tfrac12\right)\,
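Two of these constructions are checked by simulation in the sketch below (illustrative only; it assumes NumPy and SciPy):

 # Check the gamma-ratio and uniform order-statistic constructions of the beta distribution.
 import numpy as np
 from scipy import stats
 
 rng = np.random.default_rng(2)
 size = 100_000
 alpha, beta_, theta = 2.0, 3.5, 1.7
 g1 = rng.gamma(alpha, theta, size)
 g2 = rng.gamma(beta_, theta, size)
 print(stats.kstest(g1 / (g1 + g2), stats.beta(alpha, beta_).cdf).pvalue)  # X/(X+Y) ~ Beta(alpha, beta)
 
 n, k = 10, 3
 u = np.sort(rng.uniform(size=(size, n)), axis=1)
 print(stats.kstest(u[:, k - 1], stats.beta(k, n + 1 - k).cdf).pvalue)     # U_(k) ~ Beta(k, n+1-k)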


Combination with other distributions

* If ''X'' ~ Beta(''α'', ''β'') and ''Y'' ~ F(2''β'', 2''α'') then \Pr\left(X \leq \tfrac{\alpha}{\alpha + \beta x}\right) = \Pr(Y \geq x)\, for all ''x'' > 0.


Compounding with other distributions

* If ''p'' ~ Beta(α, β) and ''X'' ~ Bin(''k'', ''p'') then ''X'' ~ beta-binomial distribution
* If ''p'' ~ Beta(α, β) and ''X'' ~ NB(''r'', ''p'') then ''X'' ~ beta negative binomial distribution
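The first compounding relation can be verified directly, as in the sketch below (illustrative only; it assumes a SciPy version that provides scipy.stats.betabinom):

 # Draw p ~ Beta(alpha, beta), then X ~ Bin(k, p), and compare with the beta-binomial pmf.
 import numpy as np
 from scipy import stats
 
 rng = np.random.default_rng(3)
 alpha, beta_, k, size = 2.0, 5.0, 12, 200_000
 p = rng.beta(alpha, beta_, size)
 x = rng.binomial(k, p)
 
 empirical = np.bincount(x, minlength=k + 1) / size
 exact = stats.betabinom(k, alpha, beta_).pmf(np.arange(k + 1))
 print(np.max(np.abs(empirical - exact)))   # small: the compound law is beta-binomial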


Generalisations

* The generalization to multiple variables, i.e. a multivariate beta distribution, is called a Dirichlet distribution. Univariate marginals of the Dirichlet distribution have a beta distribution. The beta distribution is conjugate to the binomial and Bernoulli distributions in exactly the same way as the Dirichlet distribution is conjugate to the multinomial distribution and categorical distribution.
* The Pearson type I distribution is identical to the beta distribution (except for arbitrary shifting and re-scaling that can also be accomplished with the four parameter parametrization of the beta distribution).
* The beta distribution is the special case of the noncentral beta distribution where \lambda = 0: \operatorname{Beta}(\alpha, \beta) = \operatorname{Beta}(\alpha,\beta,0).
* The generalized beta distribution is a five-parameter distribution family which has the beta distribution as a special case.
* The matrix variate beta distribution is a distribution for positive-definite matrices.


Statistical inference


Parameter estimation


Method of moments


Two unknown parameters

Two unknown parameters ((\hat{\alpha}, \hat{\beta}) of a beta distribution supported in the [0, 1] interval) can be estimated, using the method of moments, with the first two moments (sample mean and sample variance) as follows. Let:

: \text{sample mean}=\bar{x} = \frac{1}{N}\sum_{i=1}^N X_i

be the sample mean estimate and

: \text{sample variance} =\bar{v} = \frac{1}{N-1}\sum_{i=1}^N (X_i - \bar{x})^2

be the sample variance estimate. The method-of-moments estimates of the parameters are

:\hat{\alpha} = \bar{x} \left(\frac{\bar{x}(1 - \bar{x})}{\bar{v}} - 1 \right), if \bar{v} <\bar{x}(1 - \bar{x}),
: \hat{\beta} = (1-\bar{x}) \left(\frac{\bar{x}(1 - \bar{x})}{\bar{v}} - 1 \right), if \bar{v}<\bar{x}(1 - \bar{x}).

When the distribution is required over a known interval other than [0, 1] with random variable ''X'', say [''a'', ''c''] with random variable ''Y'', then replace \bar{x} with \frac{\bar{y}-a}{c-a}, and \bar{v} with \frac{\bar{v}_Y}{(c-a)^2} in the above couple of equations for the shape parameters (see the "Alternative parametrizations, four parameters" section below), where:

: \text{sample mean}=\bar{y} = \frac{1}{N}\sum_{i=1}^N Y_i
: \text{sample variance} = \bar{v}_Y = \frac{1}{N-1}\sum_{i=1}^N (Y_i - \bar{y})^2
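A direct implementation of these two estimating equations is straightforward; the sketch below (illustrative only; the helper name beta_method_of_moments is ours, and NumPy is assumed) follows the formulas above:

 # Method-of-moments estimates for a beta distribution supported on [0, 1].
 import numpy as np
 
 def beta_method_of_moments(x):
     x = np.asarray(x, dtype=float)
     mean = x.mean()
     var = x.var(ddof=1)                      # sample variance with N-1 denominator
     if not var < mean * (1.0 - mean):
         raise ValueError("requires sample variance < mean*(1-mean)")
     common = mean * (1.0 - mean) / var - 1.0
     return mean * common, (1.0 - mean) * common
 
 rng = np.random.default_rng(4)
 print(beta_method_of_moments(rng.beta(2.0, 5.0, size=50_000)))   # close to (2, 5)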


Four unknown parameters

All four parameters (\hat{\alpha}, \hat{\beta}, \hat{a}, \hat{c} of a beta distribution supported in the [''a'', ''c''] interval, see the section "Alternative parametrizations, Four parameters") can be estimated, using the method of moments developed by Karl Pearson, by equating sample and population values of the first four central moments (mean, variance, skewness and excess kurtosis). The excess kurtosis was expressed in terms of the square of the skewness and the sample size ν = α + β (see the previous section "Kurtosis") as follows:

:\text{excess kurtosis} =\frac{6}{3 + \nu}\left(\frac{(2 + \nu)}{4} (\text{skewness})^2 - 1\right)\text{ if }(\text{skewness})^2-2< \text{excess kurtosis}< \tfrac{3}{2} (\text{skewness})^2

One can use this equation to solve for the sample size ν = α + β in terms of the square of the skewness and the excess kurtosis as follows:

:\hat{\nu} = \hat{\alpha} + \hat{\beta} = 3\frac{(\text{sample excess kurtosis}) -(\text{sample skewness})^2+2}{\frac{3}{2} (\text{sample skewness})^2 - \text{(sample excess kurtosis)}}
:\text{ if }(\text{sample skewness})^2-2< \text{sample excess kurtosis}< \tfrac{3}{2} (\text{sample skewness})^2

This is the ratio (multiplied by a factor of 3) between the previously derived limit boundaries for the beta distribution in a space (as originally done by Karl Pearson) defined with coordinates of the square of the skewness in one axis and the excess kurtosis in the other axis (see above). The case of zero skewness can be immediately solved because for zero skewness, α = β and hence ν = 2α = 2β, therefore α = β = ν/2:

: \hat{\alpha} = \hat{\beta} = \frac{\hat{\nu}}{2}= \frac{3\,((\text{sample excess kurtosis})+2)}{-2\,(\text{sample excess kurtosis})}
: \text{ if sample skewness}= 0 \text{ and } -2<\text{sample excess kurtosis}<0

(Excess kurtosis is negative for the beta distribution with zero skewness, ranging from −2 to 0, so that \hat{\nu} -and therefore the sample shape parameters- is positive, ranging from zero when the shape parameters approach zero and the excess kurtosis approaches −2, to infinity when the shape parameters approach infinity and the excess kurtosis approaches zero.)

For non-zero sample skewness one needs to solve a system of two coupled equations. Since the skewness and the excess kurtosis are independent of the parameters \hat{a}, \hat{c}, the parameters \hat{\alpha}, \hat{\beta} can be uniquely determined from the sample skewness and the sample excess kurtosis, by solving the coupled equations with two known variables (sample skewness and sample excess kurtosis) and two unknowns (the shape parameters):

:(\text{sample skewness})^2 = \frac{4(\hat{\beta}-\hat{\alpha})^2 (1 + \hat{\alpha} + \hat{\beta})}{\hat{\alpha} \hat{\beta} (2 + \hat{\alpha} + \hat{\beta})^2}
:\text{sample excess kurtosis} =\frac{6}{3 + \hat{\alpha} + \hat{\beta}}\left(\frac{(2 + \hat{\alpha} + \hat{\beta})}{4} (\text{sample skewness})^2 - 1\right)
:\text{ if }(\text{sample skewness})^2-2< \text{sample excess kurtosis}< \tfrac{3}{2}(\text{sample skewness})^2

resulting in the following solution:

: \hat{\alpha}, \hat{\beta} = \frac{\hat{\nu}}{2} \left (1 \pm \frac{1}{ \sqrt{1+ \frac{16 (\hat{\nu} + 1)}{(\hat{\nu} + 2)^2(\text{sample skewness})^2}}} \right )
: \text{ if sample skewness}\neq 0 \text{ and } (\text{sample skewness})^2-2< \text{sample excess kurtosis}< \tfrac{3}{2} (\text{sample skewness})^2

where one should take the solutions as follows: \hat{\alpha}>\hat{\beta} for (negative) sample skewness < 0, and \hat{\alpha}<\hat{\beta} for (positive) sample skewness > 0.

The accompanying plot shows these two solutions as surfaces in a space with horizontal axes of (sample excess kurtosis) and (sample squared skewness) and the shape parameters as the vertical axis. The surfaces are constrained by the condition that the sample excess kurtosis must be bounded by the sample squared skewness as stipulated in the above equation. The two surfaces meet at the right edge defined by zero skewness. Along this right edge, both parameters are equal and the distribution is symmetric U-shaped for α = β < 1, uniform for α = β = 1, upside-down-U-shaped for 1 < α = β < 2 and bell-shaped for α = β > 2. The surfaces also meet at the front (lower) edge defined by the "impossible boundary" line (excess kurtosis + 2 − skewness² = 0). Along this front (lower) boundary both shape parameters approach zero, and the probability density is concentrated more at one end than the other end (with practically nothing in between), with probabilities p=\tfrac{\beta}{\alpha+\beta} at the left end ''x'' = 0 and q = 1-p = \tfrac{\alpha}{\alpha+\beta} at the right end ''x'' = 1. The two surfaces become further apart towards the rear edge. At this rear edge the surface parameters are quite different from each other. As remarked, for example, by Bowman and Shenton, sampling in the neighborhood of the line (sample excess kurtosis − (3/2)(sample skewness)² = 0) (the just-J-shaped portion of the rear edge where blue meets beige) "is dangerously near to chaos", because at that line the denominator of the expression above for the estimate ν̂ = α̂ + β̂ becomes zero, and hence ν̂ approaches infinity as that line is approached. Bowman and Shenton write that "the higher moment parameters (kurtosis and skewness) are extremely fragile (near that line). However, the mean and standard deviation are fairly reliable." Therefore, the problem arises in four-parameter estimation for very skewed distributions such that the excess kurtosis approaches (3/2) times the square of the skewness. This boundary line is produced by extremely skewed distributions with very large values of one of the parameters and very small values of the other parameter. A numerical example and further comments about this rear edge boundary line (sample excess kurtosis − (3/2)(sample skewness)² = 0) are given elsewhere in this article. As remarked by Karl Pearson himself, this issue may not be of much practical importance, as this trouble arises only for very skewed J-shaped (or mirror-image J-shaped) distributions with very different values of shape parameters that are unlikely to occur much in practice. The usual skewed-bell-shape distributions that occur in practice do not have this parameter estimation problem.

The remaining two parameters \hat{a}, \hat{c} can be determined using the sample mean and the sample variance using a variety of equations. One alternative is to calculate the support interval range (\hat{c}-\hat{a}) based on the sample variance and the sample kurtosis. For this purpose one can solve, in terms of the range (\hat{c}- \hat{a}), the equation expressing the excess kurtosis in terms of the sample variance and the sample size ν (see the sections titled "Kurtosis" and "Alternative parametrizations, four parameters"):

:\text{sample excess kurtosis} =\frac{6}{(\hat{\nu}+2)(\hat{\nu}+3)}\bigg(\frac{(\hat{c}-\hat{a})^2}{\text{(sample variance)}} - 6 - 5 \hat{\nu} \bigg)

to obtain:

: (\hat{c}- \hat{a}) = \sqrt{\text{(sample variance)}}\sqrt{6+5\hat{\nu}+\frac{(\hat{\nu}+2)(\hat{\nu}+3)}{6}\,\text{(sample excess kurtosis)}}

Another alternative is to calculate the support interval range (\hat{c}-\hat{a}) based on the sample variance and the sample skewness. For this purpose one can solve, in terms of the range (\hat{c}-\hat{a}), the equation expressing the squared skewness in terms of the sample variance and the sample size ν (see the sections titled "Skewness" and "Alternative parametrizations, four parameters"):

:(\text{sample skewness})^2 = \frac{4}{(\hat{\nu}+2)^2}\bigg(\frac{(\hat{c}-\hat{a})^2}{\text{(sample variance)}}-4(1+\hat{\nu})\bigg)

to obtain:

: (\hat{c}- \hat{a}) = \frac{\sqrt{\text{(sample variance)}}}{2}\sqrt{(2+\hat{\nu})^2(\text{sample skewness})^2+16(1+\hat{\nu})}

The remaining parameter can be determined from the sample mean and the previously obtained parameters (\hat{c}-\hat{a}), \hat{\alpha}, \hat{\nu} = \hat{\alpha}+\hat{\beta}:

: \hat{a} = (\text{sample mean}) - \left(\frac{\hat{\alpha}}{\hat{\alpha}+\hat{\beta}}\right)(\hat{c}-\hat{a})

and finally, \hat{c}= (\hat{c}- \hat{a}) + \hat{a}.

In the above formulas one may take, for example, as estimates of the sample moments:

:\begin{align}
\text{sample mean} &=\overline{y} = \frac{1}{N}\sum_{i=1}^N Y_i \\
\text{sample variance} &= \overline{v}_Y = \frac{1}{N-1}\sum_{i=1}^N (Y_i - \overline{y})^2 \\
\text{sample skewness} &= G_1 = \frac{N^2}{(N-1)(N-2)}\, \frac{\frac{1}{N}\sum_{i=1}^N (Y_i - \overline{y})^3}{\overline{v}_Y^{3/2}} \\
\text{sample excess kurtosis} &= G_2 = \frac{N(N+1)}{(N-1)(N-2)(N-3)}\, \frac{\sum_{i=1}^N (Y_i - \overline{y})^4}{\overline{v}_Y^{2}} - \frac{3(N-1)^2}{(N-2)(N-3)}
\end{align}

The estimators ''G''1 for sample skewness and ''G''2 for sample kurtosis are used by DAP/SAS, PSPP/SPSS, and Excel. However, they are not used by BMDP and (according to Joanes and Gill) they were not used by MINITAB in 1998. Actually, Joanes and Gill in their 1998 study concluded that the skewness and kurtosis estimators used in BMDP and in MINITAB (at that time) had smaller variance and mean-squared error in normal samples, but the skewness and kurtosis estimators used in DAP/SAS and PSPP/SPSS, namely ''G''1 and ''G''2, had smaller mean-squared error in samples from a very skewed distribution. It is for this reason that we have spelled out "sample skewness", etc., in the above formulas, to make it explicit that the user should choose the best estimator according to the problem at hand, as the best estimator for skewness and kurtosis depends on the amount of skewness (as shown by Joanes and Gill).
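The whole four-parameter recipe can be assembled as in the sketch below (illustrative only; the helper name beta4_method_of_moments is ours, NumPy/SciPy are assumed, and G1/G2 are taken from scipy.stats.skew and scipy.stats.kurtosis with bias=False):

 # Pearson-style four-parameter method of moments: (alpha, beta) from skewness/kurtosis,
 # then the range (c - a) from the variance, then a from the mean.
 import numpy as np
 from scipy import stats
 
 def beta4_method_of_moments(y):
     y = np.asarray(y, dtype=float)
     mean, var = y.mean(), y.var(ddof=1)
     g1 = stats.skew(y, bias=False)           # sample skewness G1
     g2 = stats.kurtosis(y, bias=False)       # sample excess kurtosis G2
     nu = 3.0 * (g2 - g1**2 + 2.0) / (1.5 * g1**2 - g2)
     if np.isclose(g1, 0.0):
         a_hat = b_hat = nu / 2.0
     else:
         delta = 1.0 / np.sqrt(1.0 + 16.0 * (nu + 1.0) / ((nu + 2.0)**2 * g1**2))
         a_hat, b_hat = nu / 2.0 * (1.0 - delta), nu / 2.0 * (1.0 + delta)
         if g1 < 0:                            # take alpha_hat > beta_hat for negative skewness
             a_hat, b_hat = b_hat, a_hat
     rng_ = np.sqrt(var) * np.sqrt(6.0 + 5.0 * nu + (nu + 2.0) * (nu + 3.0) / 6.0 * g2)
     lo = mean - a_hat / (a_hat + b_hat) * rng_
     return a_hat, b_hat, lo, lo + rng_        # (alpha, beta, a, c)
 
 rng = np.random.default_rng(5)
 y = 2.0 + 8.0 * rng.beta(3.0, 6.0, size=200_000)   # four-parameter beta on [2, 10]
 print(beta4_method_of_moments(y))                  # close to (3, 6, 2, 10)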


Maximum likelihood


Two unknown parameters

As is also the case for maximum likelihood estimates for the gamma distribution, the maximum likelihood estimates for the beta distribution do not have a general closed form solution for arbitrary values of the shape parameters. If ''X''1, ..., ''XN'' are independent random variables each having a beta distribution, the joint log likelihood function for ''N'' iid observations is:

:\begin{align}
\ln\, \mathcal{L} (\alpha, \beta\mid X) &= \sum_{i=1}^N \ln \left (\mathcal{L}_i (\alpha, \beta\mid X_i) \right )\\
&= \sum_{i=1}^N \ln \left (f(X_i;\alpha,\beta) \right ) \\
&= \sum_{i=1}^N \ln \left (\frac{X_i^{\alpha-1}(1-X_i)^{\beta-1}}{\Beta(\alpha,\beta)} \right ) \\
&= (\alpha - 1)\sum_{i=1}^N \ln (X_i) + (\beta- 1)\sum_{i=1}^N \ln (1-X_i) - N \ln \Beta(\alpha,\beta)
\end{align}

Finding the maximum with respect to a shape parameter involves taking the partial derivative with respect to the shape parameter and setting the expression equal to zero, yielding the maximum likelihood estimator of the shape parameters:

:\frac{\partial \ln \mathcal{L}(\alpha,\beta\mid X)}{\partial \alpha} = \sum_{i=1}^N \ln X_i -N\frac{\partial \ln \Beta(\alpha,\beta)}{\partial \alpha}=0
:\frac{\partial \ln \mathcal{L}(\alpha,\beta\mid X)}{\partial \beta} = \sum_{i=1}^N \ln (1-X_i)- N\frac{\partial \ln \Beta(\alpha,\beta)}{\partial \beta}=0

where:

:\frac{\partial \ln \Beta(\alpha,\beta)}{\partial \alpha} = -\frac{\partial \ln \Gamma(\alpha+\beta)}{\partial \alpha}+ \frac{\partial \ln \Gamma(\alpha)}{\partial \alpha}+ \frac{\partial \ln \Gamma(\beta)}{\partial \alpha}=-\psi(\alpha + \beta) + \psi(\alpha) + 0
:\frac{\partial \ln \Beta(\alpha,\beta)}{\partial \beta}= - \frac{\partial \ln \Gamma(\alpha+\beta)}{\partial \beta}+ \frac{\partial \ln \Gamma(\alpha)}{\partial \beta} + \frac{\partial \ln \Gamma(\beta)}{\partial \beta}=-\psi(\alpha + \beta) + 0 + \psi(\beta)

since the digamma function, denoted ψ(α), is defined as the logarithmic derivative of the gamma function:

:\psi(\alpha) =\frac{\partial\ln \Gamma(\alpha)}{\partial \alpha}

To ensure that the values with zero tangent slope are indeed a maximum (instead of a saddle-point or a minimum) one has to also satisfy the condition that the curvature is negative. This amounts to satisfying that the second partial derivative with respect to the shape parameters is negative:

:\frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{\partial \alpha^2}= -N\frac{\partial^2\ln \Beta(\alpha,\beta)}{\partial \alpha^2}<0
:\frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{\partial \beta^2} = -N\frac{\partial^2\ln \Beta(\alpha,\beta)}{\partial \beta^2}<0

Using the previous equations, this is equivalent to:

:\frac{\partial^2\ln \Beta(\alpha,\beta)}{\partial \alpha^2} = \psi_1(\alpha)-\psi_1(\alpha + \beta) > 0
:\frac{\partial^2\ln \Beta(\alpha,\beta)}{\partial \beta^2} = \psi_1(\beta) -\psi_1(\alpha + \beta) > 0

where the trigamma function, denoted ''ψ''1(''α''), is the second of the polygamma functions, and is defined as the derivative of the digamma function:

:\psi_1(\alpha) = \frac{\partial^2\ln\Gamma(\alpha)}{\partial\alpha^2}=\, \frac{\partial\,\psi(\alpha)}{\partial\alpha}.

These conditions are equivalent to stating that the variances of the logarithmically transformed variables are positive, since:

:\operatorname{var}[\ln (X)] = \operatorname{E}[\ln^2 (X)] - (\operatorname{E}[\ln (X)])^2 = \psi_1(\alpha) - \psi_1(\alpha + \beta)
:\operatorname{var}[\ln (1-X)] = \operatorname{E}[\ln^2 (1-X)] - (\operatorname{E}[\ln (1-X)])^2 = \psi_1(\beta) - \psi_1(\alpha + \beta)

Therefore, the condition of negative curvature at a maximum is equivalent to the statements:

: \operatorname{var}[\ln (X)] > 0
: \operatorname{var}[\ln (1-X)] > 0

Alternatively, the condition of negative curvature at a maximum is also equivalent to stating that the following logarithmic derivatives of the geometric means ''GX'' and ''G(1−X)'' are positive, since:

: \psi_1(\alpha) - \psi_1(\alpha + \beta) = \frac{\partial \ln G_X}{\partial \alpha} > 0
: \psi_1(\beta) - \psi_1(\alpha + \beta) = \frac{\partial \ln G_{(1-X)}}{\partial \beta} > 0

While these slopes are indeed positive, the other slopes are negative:

:\frac{\partial\, \ln G_X}{\partial \beta}, \frac{\partial\, \ln G_{(1-X)}}{\partial \alpha} < 0.

The slopes of the mean and the median with respect to ''α'' and ''β'' display similar sign behavior.

From the condition that at a maximum the partial derivative with respect to the shape parameter equals zero, we obtain the following system of coupled maximum likelihood estimate equations (for the average log-likelihoods) that needs to be inverted to obtain the (unknown) shape parameter estimates \hat{\alpha},\hat{\beta} in terms of the (known) average of logarithms of the samples ''X''1, ..., ''XN'':

:\begin{align}
\hat{\operatorname{E}}[\ln (X)] &= \psi(\hat{\alpha}) - \psi(\hat{\alpha} + \hat{\beta})=\frac{1}{N}\sum_{i=1}^N \ln X_i = \ln \hat{G}_X \\
\hat{\operatorname{E}}[\ln(1-X)] &= \psi(\hat{\beta}) - \psi(\hat{\alpha} + \hat{\beta})=\frac{1}{N}\sum_{i=1}^N \ln (1-X_i)= \ln \hat{G}_{(1-X)}
\end{align}

where we recognize \ln \hat{G}_X as the logarithm of the sample geometric mean and \ln \hat{G}_{(1-X)} as the logarithm of the sample geometric mean based on (1 − ''X''), the mirror-image of ''X''. For \hat{\alpha}=\hat{\beta}, it follows that \hat{G}_X=\hat{G}_{(1-X)}.

:\begin{align}
\hat{G}_X &= \prod_{i=1}^N (X_i)^{1/N} \\
\hat{G}_{(1-X)} &= \prod_{i=1}^N (1-X_i)^{1/N}
\end{align}

These coupled equations containing digamma functions of the shape parameter estimates \hat{\alpha},\hat{\beta} must be solved by numerical methods as done, for example, by Beckman et al. Gnanadesikan et al. give numerical solutions for a few cases. N.L. Johnson and S. Kotz suggest that for "not too small" shape parameter estimates \hat{\alpha},\hat{\beta}, the logarithmic approximation to the digamma function \psi(\hat{\alpha}) \approx \ln(\hat{\alpha}-\tfrac{1}{2}) may be used to obtain initial values for an iterative solution, since the equations resulting from this approximation can be solved exactly:

:\ln \frac{\hat{\alpha} - \tfrac{1}{2}}{\hat{\alpha}+\hat{\beta} - \tfrac{1}{2}} \approx \ln \hat{G}_X
:\ln \frac{\hat{\beta}-\tfrac{1}{2}}{\hat{\alpha}+\hat{\beta} - \tfrac{1}{2}}\approx \ln \hat{G}_{(1-X)}

which leads to the following solution for the initial values (of the estimate shape parameters in terms of the sample geometric means) for an iterative solution:

:\hat{\alpha}\approx \tfrac{1}{2} + \frac{\hat{G}_X}{2(1-\hat{G}_X-\hat{G}_{(1-X)})} \text{ if } \hat{\alpha} >1
:\hat{\beta}\approx \tfrac{1}{2} + \frac{\hat{G}_{(1-X)}}{2(1-\hat{G}_X-\hat{G}_{(1-X)})} \text{ if } \hat{\beta} > 1

Alternatively, the estimates provided by the method of moments can instead be used as initial values for an iterative solution of the maximum likelihood coupled equations in terms of the digamma functions.

When the distribution is required over a known interval other than [0, 1] with random variable ''X'', say [''a'', ''c''] with random variable ''Y'', then replace ln(''Xi'') in the first equation with

:\ln \frac{Y_i-a}{c-a},

and replace ln(1 − ''Xi'') in the second equation with

:\ln \frac{c-Y_i}{c-a}

(see "Alternative parametrizations, four parameters" section below).

If one of the shape parameters is known, the problem is considerably simplified. The following logit transformation can be used to solve for the unknown shape parameter (for skewed cases such that \hat{\alpha}\neq\hat{\beta}; otherwise, if symmetric, both -equal- parameters are known when one is known):

:\hat{\operatorname{E}} \left[\ln \left(\frac{X}{1-X} \right) \right]=\psi(\hat{\alpha}) - \psi(\hat{\beta})=\frac{1}{N}\sum_{i=1}^N \ln\frac{X_i}{1-X_i} = \ln \hat{G}_X - \ln \left(\hat{G}_{(1-X)}\right)

This logit transformation is the logarithm of the transformation that divides the variable ''X'' by its mirror-image (''X''/(1 − ''X'')), resulting in the "inverted beta distribution" or beta prime distribution (also known as beta distribution of the second kind or Pearson's Type VI) with support [0, +∞). As previously discussed in the section "Moments of logarithmically transformed random variables," the logit transformation \ln\frac{X}{1-X}, studied by Johnson, extends the finite support [0, 1] based on the original variable ''X'' to infinite support in both directions of the real line (−∞, +∞). If, for example, \hat{\beta} is known, the unknown parameter \hat{\alpha} can be obtained in terms of the inverse digamma function of the right hand side of this equation:

:\psi(\hat{\alpha})=\frac{1}{N}\sum_{i=1}^N \ln\frac{X_i}{1-X_i} + \psi(\hat{\beta})
:\hat{\alpha}=\psi^{-1}(\ln \hat{G}_X - \ln \hat{G}_{(1-X)} + \psi(\hat{\beta}))

In particular, if one of the shape parameters has a value of unity, for example \hat{\beta} = 1 (the power function distribution with bounded support [0,1]), using the identity ψ(''x'' + 1) = ψ(''x'') + 1/''x'' in the equation \psi(\hat{\alpha}) - \psi(\hat{\alpha} + \hat{\beta})= \ln \hat{G}_X, the maximum likelihood estimator for the unknown parameter \hat{\alpha} is, exactly:

:\hat{\alpha}= - \frac{1}{\frac{1}{N}\sum_{i=1}^N \ln X_i}= - \frac{1}{\ln \hat{G}_X}

The beta has support [0, 1], therefore \hat{G}_X < 1, and hence (-\ln \hat{G}_X) >0, and therefore \hat{\alpha} >0.

In conclusion, the maximum likelihood estimates of the shape parameters of a beta distribution are (in general) a complicated function of the sample geometric mean, and of the sample geometric mean based on ''(1 − X)'', the mirror-image of ''X''. One may ask, if the variance (in addition to the mean) is necessary to estimate two shape parameters with the method of moments, why is the (logarithmic or geometric) variance not necessary to estimate two shape parameters with the maximum likelihood method, for which only the geometric means suffice? The answer is that the mean does not provide as much information as the geometric mean. For a beta distribution with equal shape parameters ''α'' = ''β'', the mean is exactly 1/2, regardless of the value of the shape parameters, and therefore regardless of the value of the statistical dispersion (the variance). On the other hand, the geometric mean of a beta distribution with equal shape parameters ''α'' = ''β'' depends on the value of the shape parameters, and therefore it contains more information. Also, the geometric mean of a beta distribution does not satisfy the symmetry conditions satisfied by the mean; therefore, by employing both the geometric mean based on ''X'' and the geometric mean based on (1 − ''X''), the maximum likelihood method is able to provide best estimates for both parameters ''α'' = ''β'', without need of employing the variance.

One can express the joint log likelihood per ''N'' iid observations in terms of the ''sufficient statistics'' (the sample geometric means) as follows:

:\frac{\ln\, \mathcal{L} (\alpha, \beta\mid X)}{N} = (\alpha - 1)\ln \hat{G}_X + (\beta- 1)\ln \hat{G}_{(1-X)} - \ln \Beta(\alpha,\beta).
We can plot the joint log likelihood per ''N'' observations for fixed values of the sample geometric means to see the behavior of the likelihood function as a function of the shape parameters α and β. In such a plot, the shape parameter estimators \hat{\alpha},\hat{\beta} correspond to the maxima of the likelihood function. See the accompanying graph, which shows that all the likelihood functions intersect at α = β = 1, which corresponds to the values of the shape parameters that give the maximum entropy (the maximum entropy occurs for shape parameters equal to unity: the uniform distribution). It is evident from the plot that the likelihood function gives sharp peaks for values of the shape parameter estimators close to zero, but that for values of the shape parameter estimators greater than one, the likelihood function becomes quite flat, with less defined peaks. Obviously, the maximum likelihood parameter estimation method for the beta distribution becomes less acceptable for larger values of the shape parameter estimators, as the uncertainty in the peak definition increases with the value of the shape parameter estimators. One can arrive at the same conclusion by noticing that the expression for the curvature of the likelihood function is in terms of the geometric variances:

:\frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{N\,\partial \alpha^2}= -\operatorname{var}[\ln X]
:\frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{N\,\partial \beta^2} = -\operatorname{var}[\ln (1-X)]

These variances (and therefore the curvatures) are much larger for small values of the shape parameters α and β. However, for shape parameter values α, β > 1, the variances (and therefore the curvatures) flatten out. Equivalently, this result follows from the Cramér–Rao bound, since the Fisher information matrix components for the beta distribution are these logarithmic variances. The Cramér–Rao bound states that the variance of any ''unbiased'' estimator \hat{\alpha} of α is bounded by the reciprocal of the Fisher information:

:\operatorname{var}(\hat{\alpha})\geq\frac{1}{\mathcal{I}_{\alpha,\alpha}}=\frac{1}{\psi_1(\alpha) - \psi_1(\alpha+\beta)}
:\operatorname{var}(\hat{\beta}) \geq\frac{1}{\mathcal{I}_{\beta,\beta}}=\frac{1}{\psi_1(\beta) - \psi_1(\alpha+\beta)}

so the variance of the estimators increases with increasing α and β, as the logarithmic variances decrease.

Also one can express the joint log likelihood per ''N'' iid observations in terms of the digamma function expressions for the logarithms of the sample geometric means as follows:

:\frac{\ln\, \mathcal{L} (\alpha, \beta\mid X)}{N} = (\alpha - 1)(\psi(\hat{\alpha}) - \psi(\hat{\alpha} + \hat{\beta}))+(\beta- 1)(\psi(\hat{\beta}) - \psi(\hat{\alpha} + \hat{\beta}))- \ln \Beta(\alpha,\beta)

This expression is identical to the negative of the cross-entropy (see section on "Quantities of information (entropy)"). Therefore, finding the maximum of the joint log likelihood of the shape parameters, per ''N'' iid observations, is identical to finding the minimum of the cross-entropy for the beta distribution, as a function of the shape parameters.

:\frac{\ln\, \mathcal{L} (\alpha, \beta\mid X)}{N} = - H = -h - D_{\mathrm{KL}} = -\ln\Beta(\alpha,\beta)+(\alpha-1)\psi(\hat{\alpha})+(\beta-1)\psi(\hat{\beta})-(\alpha+\beta-2)\psi(\hat{\alpha}+\hat{\beta})

with the cross-entropy defined as follows:

:H = \int_{0}^1 - f(X;\hat{\alpha},\hat{\beta}) \ln (f(X;\alpha,\beta)) \, {\rm d}X
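In practice the coupled digamma equations are solved numerically; the sketch below (illustrative only; the helper name beta_mle is ours, and it uses SciPy's scipy.special.psi digamma and a generic root finder rather than any specific routine from the cited literature) uses the Johnson–Kotz approximation for the starting values:

 # Maximum likelihood for Beta(a, b): solve psi(a)-psi(a+b)=ln G_X, psi(b)-psi(a+b)=ln G_(1-X).
 import numpy as np
 from scipy.special import psi
 from scipy.optimize import fsolve
 
 def beta_mle(x):
     x = np.asarray(x, dtype=float)
     ln_gx = np.mean(np.log(x))           # ln of sample geometric mean of X
     ln_g1x = np.mean(np.log1p(-x))       # ln of sample geometric mean of 1 - X
     gx, g1x = np.exp(ln_gx), np.exp(ln_g1x)
     denom = 2.0 * (1.0 - gx - g1x)       # Johnson-Kotz initial values
     start = (0.5 + gx / denom, 0.5 + g1x / denom)
     def equations(ab):
         a, b = ab
         return (psi(a) - psi(a + b) - ln_gx, psi(b) - psi(a + b) - ln_g1x)
     return fsolve(equations, start)
 
 rng = np.random.default_rng(6)
 print(beta_mle(rng.beta(2.0, 7.0, size=100_000)))   # close to (2, 7)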


Four unknown parameters

The procedure is similar to the one followed in the two unknown parameter case. If ''Y''1, ..., ''YN'' are independent random variables each having a beta distribution with four parameters, the joint log likelihood function for ''N'' iid observations is:

:\begin{align}
\ln\, \mathcal{L} (\alpha, \beta, a, c\mid Y) &= \sum_{i=1}^N \ln\,\mathcal{L}_i (\alpha, \beta, a, c\mid Y_i)\\
&= \sum_{i=1}^N \ln\,f(Y_i; \alpha, \beta, a, c) \\
&= \sum_{i=1}^N \ln\,\frac{(Y_i-a)^{\alpha-1}(c-Y_i)^{\beta-1}}{(c-a)^{\alpha+\beta-1}\Beta(\alpha,\beta)}\\
&= (\alpha - 1)\sum_{i=1}^N \ln (Y_i - a) + (\beta- 1)\sum_{i=1}^N \ln (c - Y_i)- N \ln \Beta(\alpha,\beta) - N (\alpha+\beta - 1) \ln (c - a)
\end{align}

Finding the maximum with respect to a shape parameter involves taking the partial derivative with respect to the shape parameter and setting the expression equal to zero, yielding the maximum likelihood estimator of the shape parameters:

:\frac{\partial \ln \mathcal{L}(\alpha, \beta, a, c\mid Y)}{\partial \alpha}= \sum_{i=1}^N \ln (Y_i - a) - N(-\psi(\alpha + \beta) + \psi(\alpha))- N \ln (c - a)= 0
:\frac{\partial \ln \mathcal{L}(\alpha, \beta, a, c\mid Y)}{\partial \beta} = \sum_{i=1}^N \ln (c - Y_i) - N(-\psi(\alpha + \beta) + \psi(\beta))- N \ln (c - a)= 0
:\frac{\partial \ln \mathcal{L}(\alpha, \beta, a, c\mid Y)}{\partial a} = -(\alpha - 1) \sum_{i=1}^N \frac{1}{Y_i - a} \,+ N (\alpha+\beta - 1)\frac{1}{c - a}= 0
:\frac{\partial \ln \mathcal{L}(\alpha, \beta, a, c\mid Y)}{\partial c} = (\beta- 1) \sum_{i=1}^N \frac{1}{c - Y_i} \,- N (\alpha+\beta - 1) \frac{1}{c - a} = 0

These equations can be re-arranged as the following system of four coupled equations (the first two equations are geometric means and the second two equations are harmonic means) in terms of the maximum likelihood estimates for the four parameters \hat{\alpha}, \hat{\beta}, \hat{a}, \hat{c}:

:\frac{1}{N}\sum_{i=1}^N \ln \frac{Y_i - \hat{a}}{\hat{c}-\hat{a}} = \psi(\hat{\alpha})-\psi(\hat{\alpha} +\hat{\beta}) = \ln \hat{G}_X
:\frac{1}{N}\sum_{i=1}^N \ln \frac{\hat{c} - Y_i}{\hat{c}-\hat{a}} = \psi(\hat{\beta})-\psi(\hat{\alpha} + \hat{\beta}) = \ln \hat{G}_{(1-X)}
:\frac{\hat{\alpha}-1}{\hat{\alpha}+\hat{\beta}-1} = \frac{1}{\frac{1}{N}\sum_{i=1}^N \frac{\hat{c}-\hat{a}}{Y_i - \hat{a}}} = \hat{H}_X
:\frac{\hat{\beta}-1}{\hat{\alpha}+\hat{\beta}-1} = \frac{1}{\frac{1}{N}\sum_{i=1}^N \frac{\hat{c}-\hat{a}}{\hat{c} - Y_i}} = \hat{H}_{(1-X)}

with sample geometric means:

:\hat{G}_X = \prod_{i=1}^{N} \left (\frac{Y_i-\hat{a}}{\hat{c}-\hat{a}} \right )^{1/N}
:\hat{G}_{(1-X)} = \prod_{i=1}^{N} \left (\frac{\hat{c}-Y_i}{\hat{c}-\hat{a}} \right )^{1/N}

The parameters \hat{a}, \hat{c} are embedded inside the geometric mean expressions in a nonlinear way (to the power 1/''N''). This precludes, in general, a closed form solution, even for an initial value approximation for iteration purposes. One alternative is to use as initial values for iteration the values obtained from the method of moments solution for the four parameter case. Furthermore, the expressions for the harmonic means are well-defined only for \hat{\alpha}, \hat{\beta} > 1, which precludes a maximum likelihood solution for shape parameters less than unity in the four-parameter case. Fisher's information matrix for the four parameter case is positive-definite only for α, β > 2 (for further discussion, see the section on the Fisher information matrix, four parameter case), that is, for bell-shaped (symmetric or unsymmetric) beta distributions, with inflection points located to either side of the mode. The Fisher information components that represent the expectations of the curvature of the log likelihood function with respect to the endpoint parameters have singularities: the component with respect to ''a'' is only defined for α > 2, the component with respect to ''c'' only for β > 2, and the mixed shape-endpoint components only for α > 1 and β > 1, respectively (for further discussion see the section on the Fisher information matrix). Thus, it is not possible to strictly carry on the maximum likelihood estimation for some well known distributions belonging to the four-parameter beta distribution family, like the continuous uniform distribution (Beta(1, 1, ''a'', ''c'')) and the arcsine distribution (Beta(1/2, 1/2, ''a'', ''c'')). N.L. Johnson and S. Kotz ignore the equations for the harmonic means and instead suggest "If a and c are unknown, and maximum likelihood estimators of ''a'', ''c'', α and β are required, the above procedure (for the two unknown parameter case, with ''X'' transformed as ''X'' = (''Y'' − ''a'')/(''c'' − ''a'')) can be repeated using a succession of trial values of ''a'' and ''c'', until the pair (''a'', ''c'') for which maximum likelihood (given ''a'' and ''c'') is as great as possible, is attained" (where, for the purpose of clarity, their notation for the parameters has been translated into the present notation).


Fisher information matrix

Let a random variable X have a probability density ''f''(''x'';''α''). The partial derivative with respect to the (unknown, and to be estimated) parameter α of the log likelihood function is called the score. The second moment of the score is called the Fisher information:

:\mathcal{I}(\alpha)=\operatorname{E} \left [\left (\frac{\partial}{\partial\alpha} \ln \mathcal{L}(\alpha\mid X) \right )^2 \right].

The expectation of the score is zero, therefore the Fisher information is also the second moment centered on the mean of the score: the variance of the score.

If the log likelihood function is twice differentiable with respect to the parameter α, and under certain regularity conditions, then the Fisher information may also be written as follows (which is often a more convenient form for calculation purposes):

:\mathcal{I}(\alpha) = - \operatorname{E} \left [\frac{\partial^2}{\partial\alpha^2} \ln (\mathcal{L}(\alpha\mid X)) \right].

Thus, the Fisher information is the negative of the expectation of the second derivative with respect to the parameter α of the log likelihood function. Therefore, Fisher information is a measure of the curvature of the log likelihood function of α. A low curvature (and therefore high radius of curvature), flatter log likelihood function curve has low Fisher information; while a log likelihood function curve with large curvature (and therefore low radius of curvature) has high Fisher information. When the Fisher information matrix is computed at the evaluates of the parameters ("the observed Fisher information matrix") it is equivalent to the replacement of the true log likelihood surface by a Taylor's series approximation, taken as far as the quadratic terms. The word information, in the context of Fisher information, refers to information about the parameters, in matters such as estimation, sufficiency and the properties of variances of estimators. The Cramér–Rao bound states that the inverse of the Fisher information is a lower bound on the variance of any unbiased estimator of a parameter α:

:\operatorname{var}[\hat\alpha] \geq \frac{1}{\mathcal{I}(\alpha)}.

The precision to which one can estimate the estimator of a parameter α is limited by the Fisher information of the log likelihood function. The Fisher information is a measure of the minimum error involved in estimating a parameter of a distribution and it can be viewed as a measure of the resolving power of an experiment needed to discriminate between two alternative hypotheses of a parameter.

When there are ''N'' parameters

: \begin{bmatrix} \theta_1 \\ \theta_2 \\ \dots \\ \theta_N \end{bmatrix},

then the Fisher information takes the form of an ''N''×''N'' positive semidefinite symmetric matrix, the Fisher information matrix, with typical element:

:{(\mathcal{I}(\theta))}_{i,j} =\operatorname{E} \left [\left (\frac{\partial}{\partial\theta_i} \ln \mathcal{L} \right) \left(\frac{\partial}{\partial\theta_j} \ln \mathcal{L} \right) \right ].

Under certain regularity conditions, the Fisher information matrix may also be written in the following form, which is often more convenient for computation:

: {(\mathcal{I}(\theta))}_{i,j} = - \operatorname{E} \left [\frac{\partial^2}{\partial\theta_i \, \partial\theta_j} \ln (\mathcal{L}) \right ]\,.

With ''X''1, ..., ''XN'' iid random variables, an ''N''-dimensional "box" can be constructed with sides ''X''1, ..., ''XN''. Costa and Cover show that the (Shannon) differential entropy ''h''(''X'') is related to the volume of the typical set (having the sample entropy close to the true entropy), while the Fisher information is related to the surface of this typical set.


Two parameters

For ''X''1, ..., ''XN'' independent random variables each having a beta distribution parametrized with shape parameters ''α'' and ''β'', the joint log likelihood function for ''N'' iid observations is:

:\ln (\mathcal{L} (\alpha, \beta\mid X) )= (\alpha - 1)\sum_{i=1}^N \ln X_i + (\beta- 1)\sum_{i=1}^N \ln (1-X_i)- N \ln \Beta(\alpha,\beta)

therefore the joint log likelihood function per ''N'' iid observations is:

:\frac{1}{N} \ln(\mathcal{L} (\alpha, \beta\mid X)) = (\alpha - 1)\frac{1}{N}\sum_{i=1}^N \ln X_i + (\beta- 1)\frac{1}{N}\sum_{i=1}^N \ln (1-X_i)-\, \ln \Beta(\alpha,\beta)

For the two parameter case, the Fisher information has 4 components: 2 diagonal and 2 off-diagonal. Since the Fisher information matrix is symmetric, one of these off-diagonal components is independent. Therefore, the Fisher information matrix has 3 independent components (2 diagonal and 1 off-diagonal).

Aryal and Nadarajah calculated Fisher's information matrix for the four-parameter case, from which the two parameter case can be obtained as follows:

:- \frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{N\,\partial \alpha^2}= \operatorname{var}[\ln (X)]= \psi_1(\alpha) - \psi_1(\alpha + \beta) = \mathcal{I}_{\alpha,\alpha}= \operatorname{E}\left [- \frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{N\,\partial \alpha^2} \right ] = \ln \operatorname{var}_{GX}
:- \frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{N\,\partial \beta^2} = \operatorname{var}[\ln (1-X)] = \psi_1(\beta) - \psi_1(\alpha + \beta) = \mathcal{I}_{\beta,\beta}= \operatorname{E}\left [- \frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{N\,\partial \beta^2} \right]= \ln \operatorname{var}_{G(1-X)}
:- \frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{N\,\partial \alpha\,\partial \beta} = \operatorname{cov}[\ln X,\ln(1-X)] = -\psi_1(\alpha+\beta) = \mathcal{I}_{\alpha,\beta}= \operatorname{E}\left [- \frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{N\,\partial \alpha\,\partial \beta} \right] = \ln \operatorname{cov}_{G X,(1-X)}

Since the Fisher information matrix is symmetric

: \mathcal{I}_{\alpha,\beta}= \mathcal{I}_{\beta,\alpha}= \ln \operatorname{cov}_{G X,(1-X)}

The Fisher information components are equal to the log geometric variances and log geometric covariance. Therefore, they can be expressed as trigamma functions, denoted ψ1(α), the second of the polygamma functions, defined as the derivative of the digamma function:

:\psi_1(\alpha) = \frac{d^2\ln\Gamma(\alpha)}{d\alpha^2}=\, \frac{d\,\psi(\alpha)}{d\alpha}.

These derivatives are also derived in the section titled "Maximum likelihood", "Two unknown parameters", and plots of the log likelihood function are also shown in that section. The log geometric variances and log geometric covariance are plotted and discussed as functions of the shape parameters α and β earlier in this article, and the section on moments of logarithmically transformed random variables contains formulas for them; images for the Fisher information components \mathcal{I}_{\alpha,\alpha}, \mathcal{I}_{\beta,\beta} and \mathcal{I}_{\alpha,\beta} are shown there as well.

The determinant of Fisher's information matrix is of interest (for example for the calculation of Jeffreys prior probability). From the expressions for the individual components of the Fisher information matrix, it follows that the determinant of Fisher's (symmetric) information matrix for the beta distribution is:

:\begin{align}
\det(\mathcal{I}(\alpha, \beta))&= \mathcal{I}_{\alpha,\alpha} \mathcal{I}_{\beta,\beta}-\mathcal{I}_{\alpha,\beta} \mathcal{I}_{\beta,\alpha} \\
&=(\psi_1(\alpha) - \psi_1(\alpha + \beta))(\psi_1(\beta) - \psi_1(\alpha + \beta))-( -\psi_1(\alpha+\beta))( -\psi_1(\alpha+\beta))\\
&= \psi_1(\alpha)\psi_1(\beta)-( \psi_1(\alpha)+\psi_1(\beta))\psi_1(\alpha + \beta)\\
\lim_{\alpha\to 0} \det(\mathcal{I}(\alpha, \beta)) &=\lim_{\beta \to 0} \det(\mathcal{I}(\alpha, \beta)) = \infty\\
\lim_{\alpha\to \infty} \det(\mathcal{I}(\alpha, \beta)) &=\lim_{\beta \to \infty} \det(\mathcal{I}(\alpha, \beta)) = 0
\end{align}

From Sylvester's criterion (checking whether the diagonal elements are all positive), it follows that the Fisher information matrix for the two parameter case is positive-definite (under the standard condition that the shape parameters are positive, ''α'' > 0 and ''β'' > 0).
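These trigamma expressions make the two-parameter Fisher information matrix and its determinant easy to evaluate, and they can be cross-checked against Monte Carlo estimates of the log-geometric variances and covariance, as in the sketch below (illustrative only; it assumes NumPy and SciPy, whose polygamma(1, ·) is the trigamma function):

 # Two-parameter Fisher information matrix from trigamma functions, checked by simulation.
 import numpy as np
 from scipy.special import polygamma
 
 def beta_fisher_information(a, b):
     t_a, t_b, t_ab = polygamma(1, a), polygamma(1, b), polygamma(1, a + b)
     return np.array([[t_a - t_ab, -t_ab],
                      [-t_ab, t_b - t_ab]])
 
 a, b = 2.0, 3.0
 I = beta_fisher_information(a, b)
 print(I, np.linalg.det(I))
 
 rng = np.random.default_rng(7)
 x = rng.beta(a, b, size=1_000_000)
 print(np.cov(np.log(x), np.log1p(-x)))   # approximately equals I, entry by entry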


Four parameters

If ''Y''1, ..., ''Y''''N'' are independent random variables each having a beta distribution with four parameters: the exponents ''α'' and ''β'', and also ''a'' (the minimum of the distribution range) and ''c'' (the maximum of the distribution range) (see the section titled "Alternative parametrizations", "Four parameters"), with probability density function:

:f(y; \alpha, \beta, a, c) = \frac{(y-a)^{\alpha-1}(c-y)^{\beta-1}}{(c-a)^{\alpha+\beta-1}\Beta(\alpha,\beta)},

the joint log likelihood function per ''N'' independent and identically distributed (iid) observations is:

:\frac{1}{N}\ln(\mathcal{L}(\alpha, \beta, a, c\mid Y)) = \frac{\alpha-1}{N}\sum_{i=1}^N \ln(Y_i - a) + \frac{\beta-1}{N}\sum_{i=1}^N \ln(c - Y_i) - \ln\Beta(\alpha,\beta) - (\alpha+\beta-1)\ln(c-a)

For the four parameter case, the Fisher information matrix has 4 × 4 = 16 components, of which 12 are off-diagonal. Since the Fisher information matrix is symmetric, half of these off-diagonal components (12/2 = 6) are independent, so the matrix has 6 independent off-diagonal + 4 diagonal = 10 independent components. Aryal and Nadarajah calculated Fisher's information matrix for the four parameter case. The components involving only the exponents are the same as in the two parameter case:

:\operatorname{E}\left[-\frac{1}{N}\frac{\partial^2\ln\mathcal{L}}{\partial\alpha^2}\right] = \operatorname{var}[\ln X] = \psi_1(\alpha) - \psi_1(\alpha+\beta) = \mathcal{I}_{\alpha,\alpha} = \ln(\operatorname{var}_{GX})

:\operatorname{E}\left[-\frac{1}{N}\frac{\partial^2\ln\mathcal{L}}{\partial\beta^2}\right] = \operatorname{var}[\ln(1-X)] = \psi_1(\beta) - \psi_1(\alpha+\beta) = \mathcal{I}_{\beta,\beta} = \ln(\operatorname{var}_{G(1-X)})

:\operatorname{E}\left[-\frac{1}{N}\frac{\partial^2\ln\mathcal{L}}{\partial\alpha\,\partial\beta}\right] = \operatorname{cov}[\ln X,\ln(1-X)] = -\psi_1(\alpha+\beta) = \mathcal{I}_{\alpha,\beta} = \ln(\operatorname{cov}_{G X,(1-X)})

In the above expressions, the use of ''X'' instead of ''Y'' in the expressions var[ln(''X'')] = ln(var''GX'') is ''not an error''. The expressions in terms of the log geometric variances and log geometric covariance occur as functions of the two parameter ''X'' ~ Beta(''α'', ''β'') parametrization because, when taking the partial derivatives with respect to the exponents (''α'', ''β'') in the four parameter case, one obtains expressions identical to those of the two parameter case: these terms of the four parameter Fisher information matrix are independent of the minimum ''a'' and maximum ''c'' of the distribution's range. The only non-zero term upon double differentiation of the log likelihood function with respect to the exponents ''α'' and ''β'' is the second derivative of the log of the beta function, ln(B(''α'', ''β'')). This term is independent of the minimum ''a'' and maximum ''c'' of the distribution's range, and double differentiation of this term results in trigamma functions. The sections titled "Maximum likelihood", "Two unknown parameters" and "Four unknown parameters" also show this fact.

The Fisher information for ''N'' i.i.d. samples is ''N'' times the individual Fisher information (eq. 11.279, page 394 of Cover and Thomas). (Aryal and Nadarajah take a single observation, ''N'' = 1, to calculate the following components of the Fisher information, which leads to the same result as considering the derivatives of the log likelihood per ''N'' observations. Moreover, below, the erroneous expression for \mathcal{I}_{a,a} in Aryal and Nadarajah has been corrected.)

:\begin{align}
\alpha > 2: \quad \operatorname{E}\left[-\frac{1}{N}\frac{\partial^2\ln\mathcal{L}}{\partial a^2}\right] &= \mathcal{I}_{a,a} = \frac{\beta(\alpha+\beta-1)}{(\alpha-2)(c-a)^2} \\
\beta > 2: \quad \operatorname{E}\left[-\frac{1}{N}\frac{\partial^2\ln\mathcal{L}}{\partial c^2}\right] &= \mathcal{I}_{c,c} = \frac{\alpha(\alpha+\beta-1)}{(\beta-2)(c-a)^2} \\
\operatorname{E}\left[-\frac{1}{N}\frac{\partial^2\ln\mathcal{L}}{\partial a\,\partial c}\right] &= \mathcal{I}_{a,c} = \frac{\alpha+\beta-1}{(c-a)^2} \\
\alpha > 1: \quad \operatorname{E}\left[-\frac{1}{N}\frac{\partial^2\ln\mathcal{L}}{\partial\alpha\,\partial a}\right] &= \mathcal{I}_{\alpha,a} = \frac{\beta}{(\alpha-1)(c-a)} \\
\operatorname{E}\left[-\frac{1}{N}\frac{\partial^2\ln\mathcal{L}}{\partial\alpha\,\partial c}\right] &= \mathcal{I}_{\alpha,c} = \frac{1}{c-a} \\
\operatorname{E}\left[-\frac{1}{N}\frac{\partial^2\ln\mathcal{L}}{\partial\beta\,\partial a}\right] &= \mathcal{I}_{\beta,a} = -\frac{1}{c-a} \\
\beta > 1: \quad \operatorname{E}\left[-\frac{1}{N}\frac{\partial^2\ln\mathcal{L}}{\partial\beta\,\partial c}\right] &= \mathcal{I}_{\beta,c} = -\frac{\alpha}{(\beta-1)(c-a)}
\end{align}

The lower two diagonal entries of the Fisher information matrix, with respect to the parameter ''a'' (the minimum of the distribution's range), \mathcal{I}_{a,a}, and with respect to the parameter ''c'' (the maximum of the distribution's range), \mathcal{I}_{c,c}, are only defined for exponents α > 2 and β > 2 respectively. The component \mathcal{I}_{a,a} for the minimum ''a'' approaches infinity as the exponent α approaches 2 from above, and the component \mathcal{I}_{c,c} for the maximum ''c'' approaches infinity as the exponent β approaches 2 from above.

The Fisher information matrix for the four parameter case does not depend on the individual values of the minimum ''a'' and the maximum ''c'', but only on the total range (''c'' − ''a''). Moreover, the components of the Fisher information matrix that depend on the range (''c'' − ''a'') depend only through its inverse (or the square of the inverse), so the Fisher information decreases with increasing range (''c'' − ''a'').

The accompanying images show the Fisher information components involving the range parameters ''a'' and ''c''; images for the components involving only the exponents are shown in the two parameter section. All these Fisher information components look like a basin, with the "walls" of the basin located at low values of the parameters.

The following four-parameter-beta-distribution Fisher information components can be expressed in terms of the two-parameter ''X'' ~ Beta(α, β) expectations of the transformed ratio ((1 − ''X'')/''X'') and of its mirror image (''X''/(1 − ''X'')), scaled by the range (''c'' − ''a''), which may be helpful for interpretation:

:\mathcal{I}_{\alpha,a} = \frac{\operatorname{E}\left[\frac{1-X}{X}\right]}{c-a} = \frac{\beta}{(\alpha-1)(c-a)} \text{ if }\alpha > 1

:\mathcal{I}_{\beta,c} = -\frac{\operatorname{E}\left[\frac{X}{1-X}\right]}{c-a} = -\frac{\alpha}{(\beta-1)(c-a)}\text{ if }\beta > 1

These are also the expected values of the "inverted beta distribution" or beta prime distribution (also known as the beta distribution of the second kind, or Pearson's Type VI) and its mirror image, scaled by the range (''c'' − ''a'').

Also, the Fisher information components \mathcal{I}_{a,a} and \mathcal{I}_{c,c} can be expressed in terms of the variances of the harmonic transformations 1/''X'' and 1/(1 − ''X''), or equivalently of the ratio-transformed variables (1 − ''X'')/''X'' and ''X''/(1 − ''X''), as follows:

:\begin{align}
\alpha > 2: \quad \mathcal{I}_{a,a} &= \operatorname{var}\left[\frac{1}{X}\right]\left(\frac{\alpha-1}{c-a}\right)^2 = \operatorname{var}\left[\frac{1-X}{X}\right]\left(\frac{\alpha-1}{c-a}\right)^2 = \frac{\beta(\alpha+\beta-1)}{(\alpha-2)(c-a)^2} \\
\beta > 2: \quad \mathcal{I}_{c,c} &= \operatorname{var}\left[\frac{1}{1-X}\right]\left(\frac{\beta-1}{c-a}\right)^2 = \operatorname{var}\left[\frac{X}{1-X}\right]\left(\frac{\beta-1}{c-a}\right)^2 = \frac{\alpha(\alpha+\beta-1)}{(\beta-2)(c-a)^2}
\end{align}

and the cross term \mathcal{I}_{a,c} can similarly be written in terms of the covariance of 1/''X'' and 1/(1 − ''X''). See the section "Moments of linearly transformed, product and inverted random variables" for these expectations.

The determinant of Fisher's information matrix is again of interest (for example, for the calculation of the Jeffreys prior probability). Expanded in terms of the individual components it is a lengthy expression, defined for α, β > 2. Using Sylvester's criterion (checking whether the diagonal elements are all positive), and since the diagonal components \mathcal{I}_{a,a} and \mathcal{I}_{c,c} have singularities at α = 2 and β = 2, it follows that the Fisher information matrix for the four parameter case is positive-definite for α > 2 and β > 2. Since for α > 2 and β > 2 the beta distribution is (symmetric or unsymmetric) bell shaped, it follows that the Fisher information matrix is positive-definite only for bell-shaped (symmetric or unsymmetric) beta distributions, with inflection points located on either side of the mode. Thus, important well known distributions belonging to the four-parameter beta distribution family, like the parabolic distribution (Beta(2,2,''a'',''c'')) and the uniform distribution (Beta(1,1,''a'',''c'')), have Fisher information components that blow up (approach infinity) in the four-parameter case (although their Fisher information components are all defined for the two parameter case). The four-parameter Wigner semicircle distribution (Beta(3/2,3/2,''a'',''c'')) and arcsine distribution (Beta(1/2,1/2,''a'',''c'')) have negative Fisher information determinants for the four-parameter case.


Bayesian inference

The use of beta distributions in Bayesian inference is due to the fact that they provide a family of conjugate prior probability distributions for binomial (including Bernoulli) and geometric distributions. The domain of the beta distribution can be viewed as a probability, and in fact the beta distribution is often used to describe the distribution of a probability value ''p'':

:P(p;\alpha,\beta) = \frac{p^{\alpha-1}(1-p)^{\beta-1}}{\Beta(\alpha,\beta)}.

Examples of beta distributions used as prior probabilities to represent ignorance of prior parameter values in Bayesian inference are Beta(1,1), Beta(0,0) and Beta(1/2,1/2).
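As a brief hedged illustration (not part of the original text) of this conjugacy, the sketch below updates a beta prior with ''s'' successes and ''f'' failures from Bernoulli trials; the posterior is again a beta distribution, Beta(α + ''s'', β + ''f''):

 # Minimal sketch: beta prior + binomial likelihood -> beta posterior
 from scipy.stats import beta
 
 def posterior(alpha_prior, beta_prior, s, f):
     """Conjugate update: returns the Beta(alpha_prior + s, beta_prior + f) posterior."""
     return beta(alpha_prior + s, beta_prior + f)
 
 post = posterior(1, 1, s=7, f=3)     # uniform Beta(1,1) prior, 7 successes in 10 trials
 print(post.mean())                   # posterior mean = 8/12
 print(post.interval(0.95))           # central 95% credible interval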


Rule of succession

A classic application of the beta distribution is the rule of succession, introduced in the 18th century by Pierre-Simon Laplace in the course of treating the sunrise problem. It states that, given ''s'' successes in ''n'' conditionally independent Bernoulli trials with probability ''p'', the estimate of the expected value in the next trial is \frac{s+1}{n+2}. This estimate is the expected value of the posterior distribution over ''p'', namely Beta(''s'' + 1, ''n'' − ''s'' + 1), which is given by Bayes' rule if one assumes a uniform prior probability over ''p'' (i.e., Beta(1, 1)) and then observes that ''p'' generated ''s'' successes in ''n'' trials. Laplace's rule of succession has been criticized by prominent scientists. R. T. Cox described Laplace's application of the rule of succession to the sunrise problem (p. 89) as "a travesty of the proper use of the principle". Keynes remarks (Ch. XXX, p. 382) "indeed this is so foolish a theorem that to entertain it is discreditable". Karl Pearson showed that the probability that the next (''n'' + 1) trials will be successes, after ''n'' successes in ''n'' trials, is only 50%, which has been considered too low by scientists like Jeffreys and unacceptable as a representation of the scientific process of experimentation to test a proposed scientific law. As pointed out by Jeffreys (p. 128), crediting C. D. Broad, Laplace's rule of succession establishes a high probability of success ((''n'' + 1)/(''n'' + 2)) in the next trial, but only a moderate probability (50%) that a further sample (''n'' + 1) comparable in size will be equally successful. As pointed out by Perks, "The rule of succession itself is hard to accept. It assigns a probability to the next trial which implies the assumption that the actual run observed is an average run and that we are always at the end of an average run. It would, one would think, be more reasonable to assume that we were in the middle of an average run. Clearly a higher value for both probabilities is necessary if they are to accord with reasonable belief." These problems with Laplace's rule of succession motivated Haldane, Perks, Jeffreys and others to search for other forms of prior probability (see the following sections). According to Jaynes, the main problem with the rule of succession is that it is not valid when ''s'' = 0 or ''s'' = ''n'' (see the article on the rule of succession for an analysis of its validity).


Bayes-Laplace prior probability (Beta(1,1))

The beta distribution achieves maximum differential entropy for Beta(1,1): the uniform probability density, for which all values in the domain of the distribution have equal density. This uniform distribution Beta(1,1) was suggested ("with a great deal of doubt") by Thomas Bayes as the prior probability distribution to express ignorance about the correct prior distribution. This prior distribution was adopted (apparently, from his writings, with little sign of doubt) by Pierre-Simon Laplace, and hence it was also known as the "Bayes-Laplace rule" or the "Laplace rule" of "inverse probability" in publications of the first half of the 20th century. In the later part of the 19th century and early part of the 20th century, scientists realized that the assumption of uniform "equal" probability density depended on the actual functions (for example whether a linear or a logarithmic scale was most appropriate) and parametrizations used. In particular, the behavior near the ends of distributions with finite support (for example near ''x'' = 0, for a distribution with initial support at ''x'' = 0) required particular attention. Keynes (Ch. XXX, p. 381) criticized the use of Bayes's uniform prior probability (Beta(1,1)), under which all values between zero and one are equiprobable, as follows: "Thus experience, if it shows anything, shows that there is a very marked clustering of statistical ratios in the neighborhoods of zero and unity, of those for positive theories and for correlations between positive qualities in the neighborhood of zero, and of those for negative theories and for correlations between negative qualities in the neighborhood of unity."


Haldane's prior probability (Beta(0,0))

The Beta(0,0) distribution was proposed by J. B. S. Haldane, who suggested that the prior probability representing complete uncertainty should be proportional to ''p''−1(1−''p'')−1. The function ''p''−1(1−''p'')−1 can be viewed as the limit of the numerator of the beta distribution as both shape parameters approach zero: α, β → 0. The Beta function (in the denominator of the beta distribution) approaches infinity as both parameters approach zero, α, β → 0. Therefore ''p''−1(1−''p'')−1 divided by the Beta function approaches a 2-point Bernoulli distribution with equal probability 1/2 at each end, at 0 and 1, and nothing in between, as α, β → 0: a coin toss, with one face of the coin at 0 and the other face at 1. The Haldane prior probability distribution Beta(0,0) is an "improper prior" because its integration (from 0 to 1) fails to strictly converge to 1 due to the singularities at each end. However, this is not an issue for computing posterior probabilities unless the sample size is very small. Furthermore, Zellner points out that on the log-odds scale (the logit transformation ln(''p''/(1−''p''))), the Haldane prior is the uniformly flat prior. The fact that a uniform prior probability on the logit transformed variable ln(''p''/(1−''p'')) (with domain (−∞, ∞)) is equivalent to the Haldane prior on the domain [0, 1] was pointed out by Harold Jeffreys in the first edition (1939) of his book Theory of Probability (p. 123). Jeffreys writes "Certainly if we take the Bayes-Laplace rule right up to the extremes we are led to results that do not correspond to anybody's way of thinking. The (Haldane) rule d''x''/(''x''(1−''x'')) goes too far the other way. It would lead to the conclusion that if a sample is of one type with respect to some property there is a probability 1 that the whole population is of that type." The fact that "uniform" depends on the parametrization led Jeffreys to seek a form of prior that would be invariant under different parametrizations.


Jeffreys' prior probability (Beta(1/2,1/2) for a Bernoulli or for a binomial distribution)

Harold Jeffreys proposed to use an uninformative prior probability measure that should be invariant under reparameterization: proportional to the square root of the determinant of Fisher's information matrix. For the Bernoulli distribution, this can be shown as follows: for a coin that is "heads" with probability ''p'' ∈ [0, 1] and is "tails" with probability 1 − ''p'', for a given (H,T) ∈ {(0,1), (1,0)} the probability is ''p''''H''(1 − ''p'')''T''. Since ''T'' = 1 − ''H'', the Bernoulli probability is ''p''''H''(1 − ''p'')1 − ''H''. Considering ''p'' as the only parameter, it follows that the log likelihood for the Bernoulli distribution is

:\ln \mathcal{L}(p\mid H) = H \ln(p) + (1-H) \ln(1-p).

The Fisher information matrix has only one component (it is a scalar, because there is only one parameter: ''p''), therefore:

:\begin{align}
\sqrt{\mathcal{I}(p)} &= \sqrt{\operatorname{E}\!\left[\left(\frac{d}{dp}\ln\mathcal{L}(p\mid H)\right)^2\right]} \\
&= \sqrt{\operatorname{E}\!\left[\left(\frac{H}{p} - \frac{1-H}{1-p}\right)^2\right]} \\
&= \sqrt{p\left(\frac{1}{p}\right)^2 + (1-p)\left(\frac{1}{1-p}\right)^2} \\
&= \frac{1}{\sqrt{p(1-p)}}.
\end{align}

Similarly, for the binomial distribution with ''n'' Bernoulli trials, it can be shown that

:\sqrt{\mathcal{I}(p)} = \frac{\sqrt{n}}{\sqrt{p(1-p)}}.

Thus, for the Bernoulli and binomial distributions, Jeffreys prior is proportional to \scriptstyle \frac{1}{\sqrt{p(1-p)}}, which happens to be proportional to a beta distribution with domain variable ''x'' = ''p'' and shape parameters α = β = 1/2, the arcsine distribution:

:\operatorname{Beta}(\tfrac{1}{2}, \tfrac{1}{2}) = \frac{1}{\pi\sqrt{x(1-x)}}.

It will be shown in the next section that the normalizing constant for Jeffreys prior is immaterial to the final result because the normalizing constant cancels out in Bayes' theorem for the posterior probability. Hence Beta(1/2,1/2) is used as the Jeffreys prior for both Bernoulli and binomial distributions. As shown in the next section, when using this expression as a prior probability times the likelihood in Bayes' theorem, the posterior probability turns out to be a beta distribution. It is important to realize, however, that Jeffreys prior is proportional to \scriptstyle \frac{1}{\sqrt{p(1-p)}} for the Bernoulli and binomial distributions, but not for the beta distribution. Jeffreys prior for the beta distribution is given by the square root of the determinant of Fisher's information for the beta distribution, which, as shown in the previous section, is a function of the trigamma function ψ1 of the shape parameters α and β as follows:

:\begin{align}
\sqrt{\det(\mathcal{I}(\alpha,\beta))} &= \sqrt{\psi_1(\alpha)\psi_1(\beta) - (\psi_1(\alpha) + \psi_1(\beta))\psi_1(\alpha+\beta)} \\
\lim_{\alpha\to 0}\sqrt{\det(\mathcal{I}(\alpha,\beta))} &= \lim_{\beta\to 0}\sqrt{\det(\mathcal{I}(\alpha,\beta))} = \infty \\
\lim_{\alpha\to\infty}\sqrt{\det(\mathcal{I}(\alpha,\beta))} &= \lim_{\beta\to\infty}\sqrt{\det(\mathcal{I}(\alpha,\beta))} = 0
\end{align}

As previously discussed, Jeffreys prior for the Bernoulli and binomial distributions is proportional to the arcsine distribution Beta(1/2,1/2), a one-dimensional ''curve'' that looks like a basin as a function of the parameter ''p'' of the Bernoulli and binomial distributions. The walls of the basin are formed by ''p'' approaching the singularities at the ends ''p'' → 0 and ''p'' → 1, where Beta(1/2,1/2) approaches infinity. Jeffreys prior for the beta distribution is a ''2-dimensional surface'' (embedded in a three-dimensional space) that looks like a basin with only two of its walls meeting at the corner α = β = 0 (and missing the other two walls), as a function of the shape parameters α and β of the beta distribution. The two adjoining walls of this 2-dimensional surface are formed by the shape parameters α and β approaching the singularities (of the trigamma function) at α, β → 0. It has no walls for α, β → ∞ because in this case the determinant of Fisher's information matrix for the beta distribution approaches zero.

It will be shown in the next section that Jeffreys prior probability results in posterior probabilities (when multiplied by the binomial likelihood function) that are intermediate between the posterior probability results of the Haldane and Bayes prior probabilities.

Jeffreys prior may be difficult to obtain analytically, and for some cases it simply does not exist (even for simple distribution functions like the asymmetric triangular distribution). Berger, Bernardo and Sun, in a 2009 paper, defined a reference prior probability distribution that (unlike Jeffreys prior) exists for the asymmetric triangular distribution. They cannot obtain a closed-form expression for their reference prior, but numerical calculations show it to be nearly perfectly fitted by the (proper) prior

:\operatorname{Beta}(\tfrac{1}{2}, \tfrac{1}{2}) \sim \frac{1}{\sqrt{\theta(1-\theta)}}

where θ is the vertex variable for the asymmetric triangular distribution with support [0, 1] (corresponding to the following parameter values in Wikipedia's article on the triangular distribution: vertex ''c'' = ''θ'', left end ''a'' = 0, and right end ''b'' = 1). Berger et al. also give a heuristic argument that Beta(1/2,1/2) could indeed be the exact Berger–Bernardo–Sun reference prior for the asymmetric triangular distribution. Therefore, Beta(1/2,1/2) not only is the Jeffreys prior for the Bernoulli and binomial distributions, but also seems to be the Berger–Bernardo–Sun reference prior for the asymmetric triangular distribution (for which the Jeffreys prior does not exist), a distribution used in project management and PERT analysis to describe the cost and duration of project tasks.

Clarke and Barron prove that, among continuous positive priors, Jeffreys prior (when it exists) asymptotically maximizes Shannon's mutual information between a sample of size ''n'' and the parameter, and therefore ''Jeffreys prior is the most uninformative prior'' (measuring information as Shannon information). The proof rests on an examination of the Kullback–Leibler divergence between probability density functions for iid random variables.
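As a quick illustrative check (added here, not in the original), the square root of the Bernoulli Fisher information, 1/\sqrt{p(1-p)}, is indeed proportional to the Beta(1/2,1/2) density, with proportionality constant π:

 # Minimal sketch: Jeffreys prior for the Bernoulli parameter vs. Beta(1/2, 1/2)
 import numpy as np
 from scipy.stats import beta
 
 p = np.linspace(0.01, 0.99, 5)
 jeffreys_unnormalized = 1.0 / np.sqrt(p * (1.0 - p))   # sqrt of the Fisher information
 arcsine_pdf = beta(0.5, 0.5).pdf(p)                    # Beta(1/2, 1/2) density
 print(jeffreys_unnormalized / arcsine_pdf)             # constant ratio, equal to pi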


Effect of different prior probability choices on the posterior beta distribution

If samples are drawn from the population of a random variable ''X'' that result in ''s'' successes and ''f'' failures in ''n'' Bernoulli trials, ''n'' = ''s'' + ''f'', then the likelihood function for parameters ''s'' and ''f'' given ''x'' = ''p'' (the notation ''x'' = ''p'' in the expressions below emphasizes that the domain ''x'' stands for the value of the parameter ''p'' in the binomial distribution) is the following binomial distribution:

:\mathcal{L}(s,f\mid x=p) = \binom{n}{s} x^s(1-x)^f = \binom{n}{s} x^s(1-x)^{n-s}.

If beliefs about prior probability information are reasonably well approximated by a beta distribution with parameters ''α''Prior and ''β''Prior, then:

:\operatorname{PriorProbability}(x=p;\alpha_\text{Prior},\beta_\text{Prior}) = \frac{x^{\alpha_\text{Prior}-1}(1-x)^{\beta_\text{Prior}-1}}{\Beta(\alpha_\text{Prior},\beta_\text{Prior})}

According to Bayes' theorem for a continuous event space, the posterior probability is given by the product of the prior probability and the likelihood function (given the evidence ''s'' and ''f'' = ''n'' − ''s''), normalized so that the area under the curve equals one, as follows:

:\begin{align}
&\operatorname{posterior probability}(x=p\mid s,n-s) \\
={}& \frac{\operatorname{prior probability}\times\operatorname{likelihood}}{\int_0^1\operatorname{prior probability}\times\operatorname{likelihood}\,dx} \\
={}& \frac{\binom{n}{s} x^{s+\alpha_\text{Prior}-1}(1-x)^{n-s+\beta_\text{Prior}-1}/\Beta(\alpha_\text{Prior},\beta_\text{Prior})}{\int_0^1 \binom{n}{s} x^{s+\alpha_\text{Prior}-1}(1-x)^{n-s+\beta_\text{Prior}-1}/\Beta(\alpha_\text{Prior},\beta_\text{Prior})\,dx} \\
={}& \frac{x^{s+\alpha_\text{Prior}-1}(1-x)^{n-s+\beta_\text{Prior}-1}}{\int_0^1 x^{s+\alpha_\text{Prior}-1}(1-x)^{n-s+\beta_\text{Prior}-1}\,dx} \\
={}& \frac{x^{s+\alpha_\text{Prior}-1}(1-x)^{n-s+\beta_\text{Prior}-1}}{\Beta(s+\alpha_\text{Prior},n-s+\beta_\text{Prior})}.
\end{align}

The binomial coefficient

:\binom{n}{s}=\frac{n!}{s!(n-s)!}=\frac{\Gamma(n+1)}{\Gamma(s+1)\Gamma(n-s+1)}

appears both in the numerator and the denominator of the posterior probability, and it does not depend on the integration variable ''x'', hence it cancels out, and it is irrelevant to the final result. Similarly, the normalizing factor for the prior probability, the beta function B(''α''Prior, ''β''Prior), cancels out and is immaterial to the final result. The same posterior probability result can be obtained if one uses an un-normalized prior

:x^{\alpha_\text{Prior}-1}(1-x)^{\beta_\text{Prior}-1}

because the normalizing factors all cancel out. Several authors (including Jeffreys himself) thus use an un-normalized prior formula, since the normalization constant cancels out. The numerator of the posterior probability ends up being just the (un-normalized) product of the prior probability and the likelihood function, and the denominator is its integral from zero to one. The beta function in the denominator, B(''s'' + ''α''Prior, ''n'' − ''s'' + ''β''Prior), appears as a normalization constant to ensure that the total posterior probability integrates to unity.

The ratio ''s''/''n'' of the number of successes to the total number of trials is a sufficient statistic in the binomial case, which is relevant for the following results.

For the Bayes prior probability (Beta(1,1)), the posterior probability is:

:\operatorname{posterior probability}(p=x\mid s,f) = \frac{x^{s}(1-x)^{n-s}}{\Beta(s+1,n-s+1)}, \text{ with mean} = \frac{s+1}{n+2}\text{ (and mode} = \frac{s}{n}\text{ if } 0 < s < n\text{)}.

For the Jeffreys prior probability (Beta(1/2,1/2)), the posterior probability is:

:\operatorname{posterior probability}(p=x\mid s,f) = \frac{x^{s-1/2}(1-x)^{n-s-1/2}}{\Beta(s+\tfrac{1}{2},n-s+\tfrac{1}{2})},\text{ with mean} = \frac{s+\tfrac{1}{2}}{n+1}\text{ (and mode} = \frac{s-\tfrac{1}{2}}{n-1}\text{ if } \tfrac{1}{2} < s < n-\tfrac{1}{2}\text{)}.

and for the Haldane prior probability (Beta(0,0)), the posterior probability is:

:\operatorname{posterior probability}(p=x\mid s,f) = \frac{x^{s-1}(1-x)^{n-s-1}}{\Beta(s,n-s)}, \text{ with mean} = \frac{s}{n}\text{ (and mode} = \frac{s-1}{n-2}\text{ if } 1 < s < n-1\text{)}.

From the above expressions it follows that for ''s''/''n'' = 1/2 all three of the above prior probabilities result in the identical location for the posterior probability mean = mode = 1/2. For ''s''/''n'' < 1/2, the means of the posterior probabilities, using the above priors, are ordered such that: mean for Bayes prior > mean for Jeffreys prior > mean for Haldane prior. For ''s''/''n'' > 1/2 the order of these inequalities is reversed, such that the Haldane prior probability results in the largest posterior mean. The ''Haldane'' prior probability Beta(0,0) results in a posterior probability density with ''mean'' (the expected value for the probability of success in the "next" trial) identical to the ratio ''s''/''n'' of the number of successes to the total number of trials. Therefore, the Haldane prior results in a posterior probability with expected value in the next trial equal to the maximum likelihood estimate. The ''Bayes'' prior probability Beta(1,1) results in a posterior probability density with ''mode'' identical to the ratio ''s''/''n'' (the maximum likelihood estimate).

In the case that 100% of the trials have been successful (''s'' = ''n''), the ''Bayes'' prior probability Beta(1,1) results in a posterior expected value equal to the rule of succession (''n'' + 1)/(''n'' + 2), while the Haldane prior Beta(0,0) results in a posterior expected value of 1 (absolute certainty of success in the next trial). Jeffreys prior probability results in a posterior expected value equal to (''n'' + 1/2)/(''n'' + 1). Perks (p. 303) points out: "This provides a new rule of succession and expresses a 'reasonable' position to take up, namely, that after an unbroken run of n successes we assume a probability for the next trial equivalent to the assumption that we are about half-way through an average run, i.e. that we expect a failure once in (2''n'' + 2) trials. The Bayes–Laplace rule implies that we are about at the end of an average run or that we expect a failure once in (''n'' + 2) trials. The comparison clearly favours the new result (what is now called Jeffreys prior) from the point of view of 'reasonableness'."

Conversely, in the case that 100% of the trials have resulted in failure (''s'' = 0), the ''Bayes'' prior probability Beta(1,1) results in a posterior expected value for success in the next trial equal to 1/(''n'' + 2), while the Haldane prior Beta(0,0) results in a posterior expected value of success in the next trial of 0 (absolute certainty of failure in the next trial). Jeffreys prior probability results in a posterior expected value for success in the next trial equal to (1/2)/(''n'' + 1), which Perks (p. 303) points out "is a much more reasonably remote result than the Bayes-Laplace result 1/(''n'' + 2)".

Jaynes questions (for the uniform prior Beta(1,1)) the use of these formulas for the cases ''s'' = 0 or ''s'' = ''n'' because the integrals do not converge (Beta(1,1) is an improper prior for ''s'' = 0 or ''s'' = ''n''). In practice, the conditions 0 < ''s'' < ''n'' required for a mode to exist between both ends are usually met. Perks (p. 303) shows that, for what is now known as the Jeffreys prior, the probability that the next (''n'' + 1) trials will all be successes, after ''n'' successes in ''n'' trials, is ((''n'' + 1/2)/(''n'' + 1))((''n'' + 3/2)/(''n'' + 2))⋯((2''n'' + 1/2)/(2''n'' + 1)), which for ''n'' = 1, 2, 3 gives 15/24, 315/480, 9009/13440, rapidly approaching a limiting value of 1/\sqrt{2} = 0.70710678\ldots as ''n'' tends to infinity. Perks remarks that what is now known as the Jeffreys prior "is clearly more 'reasonable' than either the Bayes-Laplace result or the result on the (Haldane) alternative rule rejected by Jeffreys which gives certainty as the probability. It clearly provides a very much better correspondence with the process of induction. Whether it is 'absolutely' reasonable for the purpose, i.e. whether it is yet large enough, without the absurdity of reaching unity, is a matter for others to decide. But it must be realized that the result depends on the assumption of complete indifference and absence of knowledge prior to the sampling experiment."
Following are the variances of the posterior distribution obtained with these three prior probability distributions:

for the Bayes prior probability (Beta(1,1)), the posterior variance is:

:\text{variance} = \frac{(s+1)(n-s+1)}{(n+2)^2(n+3)},\text{ which for } s=\frac{n}{2} \text{ results in variance} =\frac{1}{4(n+3)}

for the Jeffreys prior probability (Beta(1/2,1/2)), the posterior variance is:

:\text{variance} = \frac{(s+\tfrac{1}{2})(n-s+\tfrac{1}{2})}{(n+1)^2(n+2)},\text{ which for } s=\frac{n}{2} \text{ results in variance} = \frac{1}{4(n+2)}

and for the Haldane prior probability (Beta(0,0)), the posterior variance is:

:\text{variance} = \frac{s(n-s)}{n^2(n+1)}, \text{ which for } s=\frac{n}{2}\text{ results in variance} =\frac{1}{4(n+1)}

So, as remarked by Silvey, for large ''n'' the variance is small and hence the posterior distribution is highly concentrated, whereas the assumed prior distribution was very diffuse. This is in accord with what one would hope for, as vague prior knowledge is transformed (through Bayes' theorem) into more precise posterior knowledge by an informative experiment. For small ''n'' the Haldane Beta(0,0) prior results in the largest posterior variance, while the Bayes Beta(1,1) prior results in the most concentrated posterior. Jeffreys prior Beta(1/2,1/2) results in a posterior variance in between the other two. As ''n'' increases, the variance rapidly decreases so that the posterior variance for all three priors converges to approximately the same value (approaching zero variance as ''n'' → ∞). Recalling the previous result that the ''Haldane'' prior probability Beta(0,0) results in a posterior probability density with ''mean'' (the expected value for the probability of success in the "next" trial) identical to the ratio ''s''/''n'' of the number of successes to the total number of trials, it follows from the above expression that the ''Haldane'' prior Beta(0,0) also results in a posterior with ''variance'' identical to the variance expressed in terms of the maximum likelihood estimate ''s''/''n'' and the sample size (see the section "Variance"):

:\text{variance} = \frac{\mu(1-\mu)}{1+\nu} = \frac{\frac{s}{n}\left(1 - \frac{s}{n}\right)}{1+n}

with the mean ''μ'' = ''s''/''n'' and the sample size ''ν'' = ''n''.

In Bayesian inference, using a prior distribution Beta(''α''Prior, ''β''Prior) prior to a binomial distribution is equivalent to adding (''α''Prior − 1) pseudo-observations of "success" and (''β''Prior − 1) pseudo-observations of "failure" to the actual number of successes and failures observed, then estimating the parameter ''p'' of the binomial distribution by the proportion of successes over both real and pseudo-observations. A uniform prior Beta(1,1) does not add (or subtract) any pseudo-observations, since for Beta(1,1) it follows that (''α''Prior − 1) = 0 and (''β''Prior − 1) = 0. The Haldane prior Beta(0,0) subtracts one pseudo-observation from each, and the Jeffreys prior Beta(1/2,1/2) subtracts 1/2 pseudo-observation of success and an equal number of failure. This subtraction has the effect of smoothing out the posterior distribution. If the proportion of successes is not 50% (''s''/''n'' ≠ 1/2), values of ''α''Prior and ''β''Prior less than 1 (and therefore negative (''α''Prior − 1) and (''β''Prior − 1)) favor sparsity, i.e. distributions where the parameter ''p'' is closer to either 0 or 1. In effect, values of ''α''Prior and ''β''Prior between 0 and 1, when operating together, function as a concentration parameter.
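The following sketch (an added illustration, not part of the original text) compares the posterior mean and variance under the Haldane, Jeffreys and Bayes priors for a given number of successes ''s'' in ''n'' trials (assuming 0 < ''s'' < ''n'' so that the Haldane posterior is proper):

 # Minimal sketch: posterior mean and variance under the three priors
 from scipy.stats import beta
 
 def posterior_summary(alpha_prior, beta_prior, s, n):
     post = beta(alpha_prior + s, beta_prior + (n - s))
     return post.mean(), post.var()
 
 s, n = 3, 10
 for name, a0, b0 in [("Haldane  Beta(0,0)",     0.0, 0.0),
                      ("Jeffreys Beta(1/2,1/2)", 0.5, 0.5),
                      ("Bayes    Beta(1,1)",     1.0, 1.0)]:
     print(name, posterior_summary(a0, b0, s, n))
 # The Haldane posterior mean equals s/n = 0.3; the Bayes posterior has the smallest variance.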
The accompanying plots show the posterior probability density functions for a range of sample sizes ''n'' and numbers of successes ''s'', under each of the three priors discussed above (Haldane Beta(0,0), Jeffreys Beta(1/2,1/2) and Bayes Beta(1,1)). The first plot shows the symmetric cases (''s'' = ''n''/2), with mean = mode = 1/2, and the second plot shows skewed cases. The images show that there is little difference between the priors for the posterior with a sample size of 50 (characterized by a more pronounced peak near ''p'' = 1/2). Significant differences appear for very small sample sizes (in particular for the flatter distribution in the degenerate case of sample size = 3). Therefore, the skewed cases show a larger effect from the choice of prior, at small sample size, than the symmetric cases. For symmetric distributions, the Bayes prior Beta(1,1) results in the most "peaky" and highest posterior distributions, and the Haldane prior Beta(0,0) results in the flattest and lowest-peaked distribution. The Jeffreys prior Beta(1/2,1/2) lies in between them. For nearly symmetric, not too skewed distributions the effect of the priors is similar. For very small sample size (in this case a sample size of 3) and a skewed distribution, the Haldane prior can result in a reverse-J-shaped distribution with a singularity at the left end. However, this happens only in degenerate cases (in this example ''n'' = 3 and hence ''s'' = 3/4 < 1, a degenerate value because ''s'' should be greater than unity in order for the posterior of the Haldane prior to have a mode located between the ends, and because ''s'' = 3/4 is not an integer, hence it violates the initial assumption of a binomial distribution for the likelihood), and it is not an issue in generic cases of reasonable sample size (such that the condition 1 < ''s'' < ''n'' − 1, necessary for a mode to exist between both ends, is fulfilled).

In Chapter 12 (p. 385) of his book, Jaynes asserts that the ''Haldane prior'' Beta(0,0) describes a ''prior state of knowledge of complete ignorance'', where we are not even sure whether it is physically possible for an experiment to yield either a success or a failure, while the ''Bayes (uniform) prior Beta(1,1) applies if'' one knows that ''both binary outcomes are possible''. Jaynes states: "''interpret the Bayes-Laplace (Beta(1,1)) prior as describing not a state of complete ignorance'', but the state of knowledge in which we have observed one success and one failure...once we have seen at least one success and one failure, then we know that the experiment is a true binary one, in the sense of physical possibility." Jaynes does not specifically discuss Jeffreys prior Beta(1/2,1/2) (Jaynes's discussion of "Jeffreys prior" on pp. 181, 423 and in chapter 12 of his book refers instead to the improper, un-normalized prior "1/''p'' ''dp''" introduced by Jeffreys in the 1939 edition of his book, seven years before he introduced what is now known as Jeffreys' invariant prior: the square root of the determinant of Fisher's information matrix. ''"1/p" is Jeffreys' (1946) invariant prior for the exponential distribution, not for the Bernoulli or binomial distributions''). However, it follows from the above discussion that the Jeffreys Beta(1/2,1/2) prior represents a state of knowledge in between the Haldane Beta(0,0) and Bayes Beta(1,1) priors.

Similarly, Karl Pearson in his 1892 book The Grammar of Science (p. 144 of the 1900 edition) maintained that the Bayes (Beta(1,1)) uniform prior was not a complete ignorance prior, and that it should be used when prior information justified "distributing our ignorance equally". K. Pearson wrote: "Yet the only supposition that we appear to have made is this: that, knowing nothing of nature, routine and anomy (from the Greek ανομία, namely: a- "without", and nomos "law") are to be considered as equally likely to occur. Now we were not really justified in making even this assumption, for it involves a knowledge that we do not possess regarding nature. We use our ''experience'' of the constitution and action of coins in general to assert that heads and tails are equally probable, but we have no right to assert before experience that, as we know nothing of nature, routine and breach are equally probable. In our ignorance we ought to consider before experience that nature may consist of all routines, all anomies (normlessness), or a mixture of the two in any proportion whatever, and that all such are equally probable. Which of these constitutions after experience is the most probable must clearly depend on what that experience has been like."

If there is sufficient sampling data, ''and the posterior probability mode is not located at one of the extremes of the domain'' (''x'' = 0 or ''x'' = 1), the three priors of Bayes (Beta(1,1)), Jeffreys (Beta(1/2,1/2)) and Haldane (Beta(0,0)) should yield similar ''posterior'' probability densities. Otherwise, as Gelman et al. (p. 65) point out, "if so few data are available that the choice of noninformative prior distribution makes a difference, one should put relevant information into the prior distribution", or as Berger (p. 125) points out, "when different reasonable priors yield substantially different answers, can it be right to state that there ''is'' a single answer? Would it not be better to admit that there is scientific uncertainty, with the conclusion depending on prior beliefs?"


Occurrence and applications


Order statistics

The beta distribution has an important application in the theory of order statistics. A basic result is that the distribution of the ''k''th smallest of a sample of size ''n'' from a continuous uniform distribution has a beta distribution.David, H. A., Nagaraja, H. N. (2003) ''Order Statistics'' (3rd Edition). Wiley, New Jersey pp 458. This result is summarized as:

:U_{(k)} \sim \operatorname{Beta}(k,n+1-k).

From this, and application of the theory related to the probability integral transform, the distribution of any individual order statistic from any continuous distribution can be derived.
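A simulation sketch (added illustration, not from the original text) of this order-statistic result, using a Kolmogorov–Smirnov test to compare the empirical ''k''th order statistic of uniform samples with the Beta(''k'', ''n'' + 1 − ''k'') distribution:

 # Minimal sketch: k-th smallest of n uniforms ~ Beta(k, n + 1 - k)
 import numpy as np
 from scipy.stats import beta, kstest
 
 rng = np.random.default_rng(0)
 n, k = 10, 3
 samples = np.sort(rng.uniform(size=(100_000, n)), axis=1)[:, k - 1]   # k-th order statistic
 print(kstest(samples, beta(k, n + 1 - k).cdf))   # large p-value: consistent with Beta(3, 8)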


Subjective logic

In standard logic, propositions are considered to be either true or false. In contradistinction, subjective logic assumes that humans cannot determine with absolute certainty whether a proposition about the real world is absolutely true or false. In subjective logic the posteriori probability estimates of binary events can be represented by beta distributions.A. Jøsang. A Logic for Uncertain Probabilities. ''International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems.'' 9(3), pp. 279-311, June 2001.


Wavelet analysis

A wavelet is a wave-like oscillation with an amplitude that starts out at zero, increases, and then decreases back to zero. It can typically be visualized as a "brief oscillation" that promptly decays. Wavelets can be used to extract information from many different kinds of data, including – but certainly not limited to – audio signals and images. Thus, wavelets are purposefully crafted to have specific properties that make them useful for signal processing. Wavelets are localized in both time and frequency, whereas the standard Fourier transform is only localized in frequency. Therefore, standard Fourier transforms are only applicable to stationary processes, while wavelets are applicable to non-stationary processes. Continuous wavelets can be constructed based on the beta distribution. Beta waveletsH.M. de Oliveira and G.A.A. Araújo. Compactly Supported One-cyclic Wavelets Derived from Beta Distributions. ''Journal of Communication and Information Systems.'' vol. 20, n. 3, pp. 27-33, 2005. can be viewed as a soft variety of Haar wavelets whose shape is fine-tuned by two shape parameters α and β.


Population genetics

The Balding–Nichols model is a two-parameter parametrization of the beta distribution used in population genetics. It is a statistical description of the allele frequencies in the components of a sub-divided population:

:\begin{align}
\alpha &= \mu \nu,\\
\beta  &= (1 - \mu) \nu,
\end{align}

where \nu = \alpha+\beta = \frac{1-F}{F} and 0 < F < 1; here ''F'' is (Wright's) genetic distance between two populations.
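A short sketch (added here as an illustration; the helper name is hypothetical) of this parametrization, mapping the mean allele frequency μ and Wright's ''F'' to the beta shape parameters:

 # Minimal sketch: Balding-Nichols (mu, F) -> Beta(mu*nu, (1-mu)*nu) with nu = (1 - F)/F
 from scipy.stats import beta
 
 def balding_nichols(mu, F):
     nu = (1.0 - F) / F
     return beta(mu * nu, (1.0 - mu) * nu)
 
 d = balding_nichols(mu=0.3, F=0.1)
 print(d.mean(), d.var())   # mean = mu = 0.3, variance = F*mu*(1-mu) = 0.021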


Project management: task cost and schedule modeling

The beta distribution can be used to model events which are constrained to take place within an interval defined by a minimum and maximum value. For this reason, the beta distribution, along with the triangular distribution, is used extensively in PERT, the critical path method (CPM), Joint Cost Schedule Modeling (JCSM) and other project management/control systems to describe the time to completion and the cost of a task. In project management, shorthand computations are widely used to estimate the mean and standard deviation of the beta distribution:

:\begin{align}
\mu(X) &= \frac{a + 4b + c}{6} \\
\sigma(X) &= \frac{c - a}{6}
\end{align}

where ''a'' is the minimum, ''c'' is the maximum, and ''b'' is the most likely value (the mode for ''α'' > 1 and ''β'' > 1).

The above estimate for the mean \mu(X) = \frac{a + 4b + c}{6} is known as the PERT three-point estimation and it is exact for either of the following values of ''β'' (for arbitrary α within these ranges):

:''β'' = ''α'' > 1 (symmetric case) with standard deviation \sigma(X) = \frac{c-a}{2\sqrt{2\alpha+1}}, skewness = 0, and excess kurtosis = \frac{-6}{2\alpha + 3}

or

:''β'' = 6 − ''α'' for 5 > ''α'' > 1 (skewed case) with standard deviation

:\sigma(X) = \frac{(c-a)\sqrt{\alpha(6-\alpha)}}{6\sqrt{7}},

skewness = \frac{(3-\alpha)\sqrt{7}}{2\sqrt{\alpha(6-\alpha)}}, and excess kurtosis = \frac{7(\alpha-3)^2 - 2\alpha(6-\alpha)}{3\alpha(6-\alpha)}

The above estimate for the standard deviation ''σ''(''X'') = (''c'' − ''a'')/6 is exact for either of the following values of ''α'' and ''β'':

:''α'' = ''β'' = 4 (symmetric) with skewness = 0, and excess kurtosis = −6/11.
:''β'' = 6 − ''α'' and \alpha = 3 - \sqrt{2} (right-tailed, positive skew) with skewness = \frac{1}{\sqrt{2}}, and excess kurtosis = 0
:''β'' = 6 − ''α'' and \alpha = 3 + \sqrt{2} (left-tailed, negative skew) with skewness = -\frac{1}{\sqrt{2}}, and excess kurtosis = 0

Otherwise, these can be poor approximations for beta distributions with other values of α and β, exhibiting average errors of 40% in the mean and 549% in the variance. A numerical comparison for one of the exact cases is sketched below.
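The sketch below (an added illustration, not part of the original text) compares the PERT shorthand estimates with the exact moments for one of the cases listed above in which both shorthands are exact, α = β = 4, on a four-parameter beta distribution with assumed minimum ''a'' and maximum ''c'':

 # Minimal sketch: PERT three-point estimates vs. exact beta moments (alpha = beta = 4)
 from scipy.stats import beta
 
 a, c = 2.0, 14.0                     # assumed minimum and maximum of the task duration
 alpha, beta_shape = 4.0, 4.0
 dist = beta(alpha, beta_shape, loc=a, scale=c - a)
 
 b = a + (c - a) * (alpha - 1) / (alpha + beta_shape - 2)   # mode: the "most likely" value
 print((a + 4 * b + c) / 6, dist.mean())   # PERT mean vs. exact mean: both 8.0
 print((c - a) / 6, dist.std())            # PERT sd vs. exact sd: both 2.0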


Random variate generation

If ''X'' and ''Y'' are independent, with X \sim \Gamma(\alpha, \theta) and Y \sim \Gamma(\beta, \theta), then

:\frac{X}{X+Y} \sim \Beta(\alpha, \beta).

So one algorithm for generating beta variates is to generate \frac{X}{X+Y}, where ''X'' is a gamma variate with parameters (α, 1) and ''Y'' is an independent gamma variate with parameters (β, 1). In fact, here \frac{X}{X+Y} and X+Y are independent, and X+Y \sim \Gamma(\alpha + \beta, \theta). If Z \sim \Gamma(\gamma, \theta) and Z is independent of X and Y, then \frac{X+Y}{X+Y+Z} \sim \Beta(\alpha+\beta,\gamma) and \frac{X+Y}{X+Y+Z} is independent of \frac{X}{X+Y}. This shows that the product of independent \Beta(\alpha,\beta) and \Beta(\alpha+\beta,\gamma) random variables is a \Beta(\alpha,\beta+\gamma) random variable.

Also, the ''k''th order statistic of ''n'' uniformly distributed variates is \Beta(k, n+1-k), so an alternative if α and β are small integers is to generate α + β − 1 uniform variates and choose the α-th smallest.

Another way to generate the beta distribution is by a Pólya urn model. According to this method, one starts with an "urn" containing α "black" balls and β "white" balls and draws uniformly with replacement. At every trial an additional ball is added according to the color of the last ball drawn. Asymptotically, the proportion of black and white balls will be distributed according to the beta distribution, where each repetition of the experiment will produce a different value.

It is also possible to use inverse transform sampling.
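A simulation sketch (added illustration, not from the original text) of the gamma-ratio method described above:

 # Minimal sketch: Beta(alpha, beta) variates as X/(X + Y) with independent gamma variates
 import numpy as np
 from scipy.stats import beta, kstest
 
 rng = np.random.default_rng(1)
 alpha_, beta_ = 2.5, 4.0
 x = rng.gamma(alpha_, size=200_000)       # Gamma(alpha, 1)
 y = rng.gamma(beta_, size=200_000)        # Gamma(beta, 1), independent of x
 samples = x / (x + y)
 print(kstest(samples, beta(alpha_, beta_).cdf))   # consistent with Beta(2.5, 4)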


History

Thomas Bayes, in a posthumous paper published in 1763 by Richard Price, obtained a beta distribution as the density of the probability of success in Bernoulli trials (see the section on Bayesian inference above), but the paper does not analyze any of the moments of the beta distribution or discuss any of its properties.

The first systematic modern discussion of the beta distribution is probably due to Karl Pearson. In Pearson's papers the beta distribution is couched as a solution of a differential equation: Pearson's Type I distribution, which it is essentially identical to except for arbitrary shifting and re-scaling (the beta and Pearson Type I distributions can always be equalized by proper choice of parameters). In fact, in several English books and journal articles in the few decades prior to World War II, it was common to refer to the beta distribution as Pearson's Type I distribution. William P. Elderton in his 1906 monograph "Frequency curves and correlation" further analyzes the beta distribution as Pearson's Type I distribution, including a full discussion of the method of moments for the four parameter case, and diagrams of (what Elderton describes as) U-shaped, J-shaped, twisted J-shaped, "cocked-hat" shapes, horizontal and angled straight-line cases. Elderton wrote "I am chiefly indebted to Professor Pearson, but the indebtedness is of a kind for which it is impossible to offer formal thanks." Elderton's 1906 monograph provides an impressive amount of information on the beta distribution, including equations for the origin of the distribution chosen to be the mode, as well as for other Pearson distributions: types I through VII. Elderton also included a number of appendixes, including one ("II") on the beta and gamma functions. In later editions, Elderton added equations for the origin of the distribution chosen to be the mean, and analysis of Pearson distributions VIII through XII.

As remarked by Bowman and Shenton, "Fisher and Pearson had a difference of opinion in the approach to (parameter) estimation, in particular relating to (Pearson's method of) moments and (Fisher's method of) maximum likelihood in the case of the Beta distribution." Also according to Bowman and Shenton, "the case of a Type I (beta distribution) model being the center of the controversy was pure serendipity. A more difficult model of 4 parameters would have been hard to find." The long running public conflict of Fisher with Karl Pearson can be followed in a number of articles in prestigious journals. For example, concerning the estimation of the four parameters for the beta distribution, and Fisher's criticism of Pearson's method of moments as being arbitrary, see Pearson's article "Method of moments and method of maximum likelihood" (published three years after his retirement from University College, London, where his position had been divided between Fisher and Pearson's son Egon), in which Pearson writes "I read (Koshai's paper in the Journal of the Royal Statistical Society, 1933) which as far as I am aware is the only case at present published of the application of Professor Fisher's method. To my astonishment that method depends on first working out the constants of the frequency curve by the (Pearson) Method of Moments and then superposing on it, by what Fisher terms "the Method of Maximum Likelihood" a further approximation to obtain, what he holds, he will thus get, "more efficient values" of the curve constants."

David and Edwards's treatise on the history of statistics cites the first modern treatment of the beta distribution, in 1911, using the beta designation that has become standard, due to Corrado Gini, an Italian statistician, demographer, and sociologist, who developed the Gini coefficient. N. L. Johnson and S. Kotz, in their comprehensive and very informative monograph on leading historical personalities in statistical sciences, credit Corrado Gini as "an early Bayesian...who dealt with the problem of eliciting the parameters of an initial Beta distribution, by singling out techniques which anticipated the advent of the so-called empirical Bayes approach."


References


External links


"Beta_Distribution"
by_Fiona_Maclachlan,_the_Wolfram_Demonstrations_Project,_2007.
Beta_Distribution –_Overview_and_Example
_xycoon.com

_brighton-webs.co.uk

_exstrom.com * *
Harvard_University_Statistics_110_Lecture_23_Beta_Distribution,_Prof._Joe_Blitzstein
Mean absolute deviation around the mean

The mean absolute deviation around the mean for the beta distribution with shape parameters α and β is:

:\operatorname{E}[|X - E[X]|] = \frac{2 \alpha^\alpha \beta^\beta}{\Beta(\alpha,\beta)(\alpha+\beta)^{\alpha+\beta+1}}

The mean absolute deviation around the mean is a more robust estimator of statistical dispersion than the standard deviation for beta distributions with tails and inflection points on each side of the mode, i.e. Beta(''α'', ''β'') distributions with ''α'', ''β'' > 2, as it depends on the linear (absolute) deviations rather than the square deviations from the mean. Therefore, the effect of very large deviations from the mean is not as heavily weighted.

Using Stirling's approximation to the Gamma function, N. L. Johnson and S. Kotz derived an approximation for the ratio of the mean absolute deviation to the standard deviation, valid for values of the shape parameters greater than unity; its relative error is only −3.5% for ''α'' = ''β'' = 1, and it decreases to zero as ''α'' → ∞, ''β'' → ∞. At the limit α → ∞, β → ∞, the ratio of the mean absolute deviation to the standard deviation (for the beta distribution) becomes equal to the ratio of the same measures for the normal distribution: \sqrt{\frac{2}{\pi}}. For α = β = 1 this ratio equals \frac{\sqrt{3}}{2}, so that from α = β = 1 to α, β → ∞ the ratio decreases by 8.5%. For α = β = 0 the standard deviation is exactly equal to the mean absolute deviation around the mean. Therefore, this ratio decreases by 15% from α = β = 0 to α = β = 1, and by 25% from α = β = 0 to α, β → ∞. However, for skewed beta distributions such that α → 0 or β → 0, the ratio of the standard deviation to the mean absolute deviation approaches infinity (although each of them, individually, approaches zero) because the mean absolute deviation approaches zero faster than the standard deviation.

Using the parametrization in terms of mean μ and sample size ν = α + β > 0:

:α = μν, β = (1−μ)ν

one can express the mean absolute deviation around the mean in terms of the mean μ and the sample size ν as follows:

:\operatorname{E}[|X - E[X]|] = \frac{2 \mu^{\mu\nu}(1-\mu)^{(1-\mu)\nu}}{\nu\,\Beta(\mu\nu,(1-\mu)\nu)}

For a symmetric distribution, the mean is at the middle of the distribution, μ = 1/2, and therefore:

:\operatorname{E}[|X - E[X]|] = \frac{2^{1-\nu}}{\nu\,\Beta(\tfrac{\nu}{2},\tfrac{\nu}{2})}

:\begin{align}
\lim_{\nu \to 0} \left(\lim_{\mu \to \tfrac{1}{2}} \operatorname{E}[|X - E[X]|] \right) &= \tfrac{1}{2}\\
\lim_{\nu \to \infty} \left(\lim_{\mu \to \tfrac{1}{2}} \operatorname{E}[|X - E[X]|] \right) &= 0
\end{align}

Also, the following limits (with only the noted variable approaching the limit) can be obtained from the above expressions:

:\begin{align}
\lim_{\beta\to 0} \operatorname{E}[|X - E[X]|] &=\lim_{\alpha \to 0} \operatorname{E}[|X - E[X]|]= 0 \\
\lim_{\beta\to \infty} \operatorname{E}[|X - E[X]|] &=\lim_{\alpha \to \infty} \operatorname{E}[|X - E[X]|] = 0\\
\lim_{\mu \to 0} \operatorname{E}[|X - E[X]|]&=\lim_{\mu \to 1} \operatorname{E}[|X - E[X]|] = 0\\
\lim_{\nu \to 0} \operatorname{E}[|X - E[X]|] &= 2\mu(1-\mu) \\
\lim_{\nu \to \infty} \operatorname{E}[|X - E[X]|] &= 0
\end{align}
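A Monte Carlo sketch (added illustration, not from the original text) checking the closed-form mean absolute deviation around the mean given above:

 # Minimal sketch: closed-form mean absolute deviation vs. a Monte Carlo estimate
 import numpy as np
 from scipy.stats import beta
 from scipy.special import beta as beta_fn
 
 a, b = 2.0, 5.0
 closed_form = 2 * a**a * b**b / (beta_fn(a, b) * (a + b) ** (a + b + 1))
 samples = beta(a, b).rvs(size=500_000, random_state=0)
 print(closed_form, np.abs(samples - a / (a + b)).mean())   # the two values nearly agree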


Mean absolute difference

The mean absolute difference for the beta distribution is:

:\mathrm{MD} = \int_0^1 \int_0^1 f(x;\alpha,\beta)\,f(y;\alpha,\beta)\,|x-y| \,dx\,dy = \left(\frac{4}{\alpha+\beta}\right)\frac{\Beta(\alpha+\beta,\alpha+\beta)}{\Beta(\alpha,\alpha)\Beta(\beta,\beta)}

The Gini coefficient for the beta distribution is half of the relative mean absolute difference:

:\mathrm{G} = \left(\frac{2}{\alpha}\right)\frac{\Beta(\alpha+\beta,\alpha+\beta)}{\Beta(\alpha,\alpha)\Beta(\beta,\beta)}


Skewness

The skewness (the third moment centered on the mean, normalized by the 3/2 power of the variance) of the beta distribution is

:\gamma_1 =\frac{\operatorname{E}[(X-\mu)^3]}{(\operatorname{var})^{3/2}} = \frac{2(\beta-\alpha)\sqrt{\alpha+\beta+1}}{(\alpha+\beta+2)\sqrt{\alpha\beta}} .

Letting α = β in the above expression one obtains γ1 = 0, showing once again that for α = β the distribution is symmetric and hence the skewness is zero. The skew is positive (right-tailed) for α < β and negative (left-tailed) for α > β.

Using the parametrization in terms of mean μ and sample size ν = α + β:

:\begin{align}
\alpha & = \mu \nu ,\text{ where }\nu =(\alpha + \beta) >0\\
\beta & = (1 - \mu) \nu , \text{ where }\nu =(\alpha + \beta) >0
\end{align}

one can express the skewness in terms of the mean μ and the sample size ν as follows:

:\gamma_1 = \frac{2(1-2\mu)\sqrt{1+\nu}}{(2+\nu)\sqrt{\mu(1-\mu)}} .

The skewness can also be expressed just in terms of the variance ''var'' and the mean μ as follows:

:\gamma_1 = \frac{2(1-2\mu)\sqrt{\operatorname{var}}}{\mu(1-\mu)+\operatorname{var}}\text{ if } \operatorname{var} < \mu(1-\mu)

The accompanying plot of skewness as a function of variance and mean shows that maximum variance (1/4) is coupled with zero skewness and the symmetry condition (μ = 1/2), and that maximum skewness (positive or negative infinity) occurs when the mean is located at one end or the other, so that the "mass" of the probability distribution is concentrated at the ends (minimum variance).

The following expression for the square of the skewness, in terms of the sample size ν = α + β and the variance ''var'', is useful for the method of moments estimation of four parameters:

:(\gamma_1)^2 =\frac{4(\beta-\alpha)^2(1+\alpha+\beta)}{\alpha\beta(2+\alpha+\beta)^2} = \frac{4}{(2+\nu)^2}\bigg(\frac{1}{\operatorname{var}}-4(1+\nu)\bigg)

This expression correctly gives a skewness of zero for α = β, since in that case (see the section "Variance"):

:\operatorname{var} = \frac{1}{4(1+\nu)}.

For the symmetric case (α = β), skewness = 0 over the whole range, and the following limits apply:

:\lim_{\alpha=\beta\to 0} \gamma_1 = \lim_{\alpha=\beta\to \infty} \gamma_1 =\lim_{\nu\to 0} \gamma_1=\lim_{\nu\to \infty} \gamma_1=\lim_{\mu\to \tfrac{1}{2}} \gamma_1 = 0

For the asymmetric cases (α ≠ β) the following limits (with only the noted variable approaching the limit) can be obtained from the above expressions:

:\begin{align}
&\lim_{\alpha\to 0} \gamma_1 =\lim_{\mu\to 0} \gamma_1 = \infty\\
&\lim_{\beta\to 0} \gamma_1 = \lim_{\mu\to 1} \gamma_1= - \infty\\
&\lim_{\alpha\to\infty} \gamma_1 = -\frac{2}{\sqrt{\beta}},\quad \lim_{\beta\to 0}\left(\lim_{\alpha\to\infty} \gamma_1\right) = -\infty,\quad \lim_{\beta\to\infty}\left(\lim_{\alpha\to\infty} \gamma_1\right) = 0\\
&\lim_{\beta\to\infty} \gamma_1 = \frac{2}{\sqrt{\alpha}},\quad \lim_{\alpha\to 0}\left(\lim_{\beta\to\infty} \gamma_1\right) = \infty,\quad \lim_{\alpha\to\infty}\left(\lim_{\beta\to\infty} \gamma_1\right) = 0\\
&\lim_{\nu\to 0} \gamma_1 = \frac{1-2\mu}{\sqrt{\mu(1-\mu)}},\quad \lim_{\mu\to 0}\left(\lim_{\nu\to 0} \gamma_1\right) = \infty,\quad \lim_{\mu\to 1}\left(\lim_{\nu\to 0} \gamma_1\right) = - \infty
\end{align}
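A one-line numerical check (added illustration, not from the original text) of the closed-form skewness against SciPy's moment computation:

 # Minimal sketch: closed-form skewness of Beta(alpha, beta) vs. scipy.stats
 from math import sqrt
 from scipy.stats import beta
 
 a, b = 2.0, 5.0
 closed_form = 2 * (b - a) * sqrt(a + b + 1) / ((a + b + 2) * sqrt(a * b))
 print(closed_form, beta(a, b).stats(moments='s'))   # both ~ 0.596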


Kurtosis

The beta distribution has been applied in acoustic analysis to assess damage to gears, as the kurtosis of the beta distribution has been reported to be a good indicator of the condition of a gear. Kurtosis has also been used to distinguish the seismic signal generated by a person's footsteps from other signals. As persons or other targets moving on the ground generate continuous signals in the form of seismic waves, one can separate different targets based on the seismic waves they generate. Kurtosis is sensitive to impulsive signals, so it's much more sensitive to the signal generated by human footsteps than other signals generated by vehicles, winds, noise, etc. Unfortunately, the notation for kurtosis has not been standardized. Kenney and Keeping use the symbol γ2 for the
excess kurtosis
, but Abramowitz and Stegun use different terminology. To prevent confusion between kurtosis (the fourth moment centered on the mean, normalized by the square of the variance) and excess kurtosis, when using symbols, they will be spelled out as follows: :\begin \text &=\text - 3\\ &=\frac-3\\ &=\frac\\ &=\frac . \end Letting α = β in the above expression one obtains :\text =- \frac \text\alpha=\beta . Therefore, for symmetric beta distributions, the excess kurtosis is negative, increasing from a minimum value of −2 at the limit as → 0, and approaching a maximum value of zero as → ∞. The value of −2 is the minimum value of excess kurtosis that any distribution (not just beta distributions, but any distribution of any possible kind) can ever achieve. This minimum value is reached when all the probability density is entirely concentrated at each end ''x'' = 0 and ''x'' = 1, with nothing in between: a 2-point
Bernoulli distribution
with equal probability 1/2 at each end (a coin toss: see section below "Kurtosis bounded by the square of the skewness" for further discussion). The description of kurtosis as a measure of the "potential outliers" (or "potential rare, extreme values") of the probability distribution, is correct for all distributions including the beta distribution. When rare, extreme values can occur in the beta distribution, the higher its kurtosis; otherwise, the kurtosis is lower. For α ≠ β, skewed beta distributions, the excess kurtosis can reach unlimited positive values (particularly for α → 0 for finite β, or for β → 0 for finite α) because the side away from the mode will produce occasional extreme values. Minimum kurtosis takes place when the mass density is concentrated equally at each end (and therefore the mean is at the center), and there is no probability mass density in between the ends. Using the parametrization in terms of mean μ and sample size ν = α + β: : \begin \alpha & = \mu \nu ,\text\nu =(\alpha + \beta) >0\\ \beta & = (1 - \mu) \nu , \text\nu =(\alpha + \beta) >0. \end one can express the excess kurtosis in terms of the mean μ and the sample size ν as follows: :\text =\frac\bigg (\frac - 1 \bigg ) The excess kurtosis can also be expressed in terms of just the following two parameters: the variance ''var'', and the sample size ν as follows: :\text =\frac\left(\frac - 6 - 5 \nu \right)\text\text< \mu(1-\mu) and, in terms of the variance ''var'' and the mean μ as follows: :\text =\frac\text\text< \mu(1-\mu) The plot of excess kurtosis as a function of the variance and the mean shows that the minimum value of the excess kurtosis (−2, which is the minimum possible value for excess kurtosis for any distribution) is intimately coupled with the maximum value of variance (1/4) and the symmetry condition: the mean occurring at the midpoint (μ = 1/2). This occurs for the symmetric case of α = β = 0, with zero skewness. At the limit, this is the 2 point
Bernoulli distribution
with equal probability 1/2 at each Dirac delta function end ''x'' = 0 and ''x'' = 1 and zero probability everywhere else. (A coin toss: one face of the coin being ''x'' = 0 and the other face being ''x'' = 1.) Variance is maximum because the distribution is bimodal with nothing in between the two modes (spikes) at each end. Excess kurtosis is minimum: the probability density "mass" is zero at the mean and it is concentrated at the two peaks at each end. Excess kurtosis reaches the minimum possible value (for any distribution) when the probability density function has two spikes at each end: it is bi-"peaky" with nothing in between them. On the other hand, the plot shows that for extreme skewed cases, where the mean is located near one or the other end (μ = 0 or μ = 1), the variance is close to zero, and the excess kurtosis rapidly approaches infinity when the mean of the distribution approaches either end. Alternatively, the excess kurtosis can also be expressed in terms of just the following two parameters: the square of the skewness, and the sample size ν as follows: :\text =\frac\bigg(\frac (\text)^2 - 1\bigg)\text^2-2< \text< \frac (\text)^2 From this last expression, one can obtain the same limits published practically a century ago by Karl Pearson in his paper, for the beta distribution (see section below titled "Kurtosis bounded by the square of the skewness"). Setting α + β= ν = 0 in the above expression, one obtains Pearson's lower boundary (values for the skewness and excess kurtosis below the boundary (excess kurtosis + 2 − skewness2 = 0) cannot occur for any distribution, and hence Karl Pearson appropriately called the region below this boundary the "impossible region"). The limit of α + β = ν → ∞ determines Pearson's upper boundary. : \begin &\lim_\text = (\text)^2 - 2\\ &\lim_\text = \tfrac (\text)^2 \end therefore: :(\text)^2-2< \text< \tfrac (\text)^2 Values of ν = α + β such that ν ranges from zero to infinity, 0 < ν < ∞, span the whole region of the beta distribution in the plane of excess kurtosis versus squared skewness. For the symmetric case (α = β), the following limits apply: : \begin &\lim_ \text = - 2 \\ &\lim_ \text = 0 \\ &\lim_ \text = - \frac \end For the unsymmetric cases (α ≠ β) the following limits (with only the noted variable approaching the limit) can be obtained from the above expressions: : \begin &\lim_\text =\lim_ \text = \lim_\text = \lim_\text =\infty\\ &\lim_\text = \frac,\text \lim_(\lim_ \text) = \infty,\text \lim_(\lim_ \text) = 0\\ &\lim_\text = \frac,\text \lim_(\lim_ \text) = \infty,\text \lim_(\lim_ \text) = 0\\ &\lim_ \text = - 6 + \frac,\text \lim_(\lim_ \text) = \infty,\text \lim_(\lim_ \text) = \infty \end
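A similar numerical check of the excess kurtosis (a sketch assuming SciPy; beta_excess_kurtosis is an illustrative helper name) confirms the closed form and the symmetric-case value −6/(2α + 3):

```python
# Compare the closed-form excess kurtosis of Beta(alpha, beta) with SciPy's value.
from scipy import stats

def beta_excess_kurtosis(a, b):
    # 6 [ (a-b)^2 (a+b+1) - a b (a+b+2) ] / [ a b (a+b+2) (a+b+3) ]
    num = 6.0 * ((a - b) ** 2 * (a + b + 1) - a * b * (a + b + 2))
    return num / (a * b * (a + b + 2) * (a + b + 3))

for a, b in [(0.5, 0.5), (2.0, 2.0), (2.0, 5.0), (0.1, 1000.0)]:
    print(a, b, beta_excess_kurtosis(a, b),
          float(stats.beta(a, b).stats(moments='k')))   # SciPy reports excess kurtosis
# For a == b this reduces to -6/(2a + 3), which tends to -2 as a -> 0.
```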


Characteristic function

The Characteristic function (probability theory), characteristic function is the Fourier transform of the probability density function. The characteristic function of the beta distribution is confluent hypergeometric function, Kummer's confluent hypergeometric function (of the first kind): :\begin \varphi_X(\alpha;\beta;t) &= \operatorname\left[e^\right]\\ &= \int_0^1 e^ f(x;\alpha,\beta) dx \\ &=_1F_1(\alpha; \alpha+\beta; it)\!\\ &=\sum_^\infty \frac \\ &= 1 +\sum_^ \left( \prod_^ \frac \right) \frac \end where : x^=x(x+1)(x+2)\cdots(x+n-1) is the rising factorial, also called the "Pochhammer symbol". The value of the characteristic function for ''t'' = 0, is one: : \varphi_X(\alpha;\beta;0)=_1F_1(\alpha; \alpha+\beta; 0) = 1 . Also, the real and imaginary parts of the characteristic function enjoy the following symmetries with respect to the origin of variable ''t'': : \textrm \left [ _1F_1(\alpha; \alpha+\beta; it) \right ] = \textrm \left [ _1F_1(\alpha; \alpha+\beta; - it) \right ] : \textrm \left [ _1F_1(\alpha; \alpha+\beta; it) \right ] = - \textrm \left [ _1F_1(\alpha; \alpha+\beta; - it) \right ] The symmetric case α = β simplifies the characteristic function of the beta distribution to a Bessel function, since in the special case α + β = 2α the confluent hypergeometric function (of the first kind) reduces to a Bessel function (the modified Bessel function of the first kind I_ ) using Ernst Kummer, Kummer's second transformation as follows: Another example of the symmetric case α = β = n/2 for beamforming applications can be found in Figure 11 of :\begin _1F_1(\alpha;2\alpha; it) &= e^ _0F_1 \left(; \alpha+\tfrac; \frac \right) \\ &= e^ \left(\frac\right)^ \Gamma\left(\alpha+\tfrac\right) I_\left(\frac\right).\end In the accompanying plots, the Complex number, real part (Re) of the Characteristic function (probability theory), characteristic function of the beta distribution is displayed for symmetric (α = β) and skewed (α ≠ β) cases.
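The stated properties φ(0) = 1 and the even/odd symmetries of the real and imaginary parts can be verified by direct numerical integration of the defining expectation, without evaluating the confluent hypergeometric function (a sketch assuming NumPy and SciPy; the shape parameters are arbitrary):

```python
# Numerically evaluate phi(t) = E[exp(i t X)] for X ~ Beta(a, b) and check
# phi(0) = 1, Re phi even in t, Im phi odd in t.
import numpy as np
from scipy import stats
from scipy.integrate import quad

a, b = 2.0, 5.0
pdf = stats.beta(a, b).pdf

def phi(t):
    re, _ = quad(lambda x: np.cos(t * x) * pdf(x), 0.0, 1.0)
    im, _ = quad(lambda x: np.sin(t * x) * pdf(x), 0.0, 1.0)
    return complex(re, im)

print(phi(0.0))                            # approximately 1 + 0j
for t in (1.0, 3.0, 10.0):
    print(phi(t).real - phi(-t).real,      # approximately 0 (even real part)
          phi(t).imag + phi(-t).imag)      # approximately 0 (odd imaginary part)
```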


Other moments


Moment generating function

It also follows that the moment generating function is :\begin M_X(\alpha; \beta; t) &= \operatorname\left[e^\right] \\ pt&= \int_0^1 e^ f(x;\alpha,\beta)\,dx \\ pt&= _1F_1(\alpha; \alpha+\beta; t) \\ pt&= \sum_^\infty \frac \frac \\ pt&= 1 +\sum_^ \left( \prod_^ \frac \right) \frac \end In particular ''M''''X''(''α''; ''β''; 0) = 1.


Higher moments

Using the moment generating function, the ''k''-th raw moment is given by the factor :\prod_^ \frac multiplying the (exponential series) term \left(\frac\right) in the series of the moment generating function :\operatorname[X^k]= \frac = \prod_^ \frac where (''x'')(''k'') is a Pochhammer symbol representing rising factorial. It can also be written in a recursive form as :\operatorname[X^k] = \frac\operatorname[X^]. Since the moment generating function M_X(\alpha; \beta; \cdot) has a positive radius of convergence, the beta distribution is Moment problem, determined by its moments.
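The rising-factorial product for the raw moments, equivalent to the recursion E[X^k] = ((α + k − 1)/(α + β + k − 1)) E[X^(k−1)], can be checked against scipy.stats.beta.moment (a sketch; beta_raw_moment is an illustrative helper name):

```python
# Raw moments of Beta(a, b) from the rising-factorial product, versus SciPy.
from scipy import stats

def beta_raw_moment(a, b, k):
    m = 1.0
    for r in range(k):
        m *= (a + r) / (a + b + r)   # recursion: E[X^k] = (a+k-1)/(a+b+k-1) * E[X^(k-1)]
    return m

a, b = 2.0, 5.0
for k in range(1, 6):
    print(k, beta_raw_moment(a, b, k), stats.beta(a, b).moment(k))
```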


Moments of transformed random variables


=Moments of linearly transformed, product and inverted random variables

= One can also show the following expectations for a transformed random variable, where the random variable ''X'' is Beta-distributed with parameters α and β: ''X'' ~ Beta(α, β). The expected value of the variable 1 − ''X'' is the mirror-symmetry of the expected value based on ''X'': :\begin & \operatorname[1-X] = \frac \\ & \operatorname[X (1-X)] =\operatorname[(1-X)X ] =\frac \end Due to the mirror-symmetry of the probability density function of the beta distribution, the variances based on variables ''X'' and 1 − ''X'' are identical, and the covariance on ''X''(1 − ''X'' is the negative of the variance: :\operatorname[(1-X)]=\operatorname[X] = -\operatorname[X,(1-X)]= \frac These are the expected values for inverted variables, (these are related to the harmonic means, see ): :\begin & \operatorname \left [\frac \right ] = \frac \text \alpha > 1\\ & \operatorname\left [\frac \right ] =\frac \text \beta > 1 \end The following transformation by dividing the variable ''X'' by its mirror-image ''X''/(1 − ''X'') results in the expected value of the "inverted beta distribution" or
beta prime distribution
(also known as beta distribution of the second kind or Pearson distribution, Pearson's Type VI): : \begin & \operatorname\left[\frac\right] =\frac \text\beta > 1\\ & \operatorname\left[\frac\right] =\frac\text\alpha > 1 \end Variances of these transformed variables can be obtained by integration, as the expected values of the second moments centered on the corresponding variables: :\operatorname \left[\frac \right] =\operatorname\left[\left(\frac - \operatorname\left[\frac \right ] \right )^2\right]= :\operatorname\left [\frac \right ] =\operatorname \left [\left (\frac - \operatorname\left [\frac \right ] \right )^2 \right ]= \frac \text\alpha > 2 The following variance of the variable ''X'' divided by its mirror-image (''X''/(1−''X'') results in the variance of the "inverted beta distribution" or
beta prime distribution
(also known as beta distribution of the second kind or Pearson distribution, Pearson's Type VI): :\operatorname \left [\frac \right ] =\operatorname \left [\left(\frac - \operatorname \left [\frac \right ] \right)^2 \right ]=\operatorname \left [\frac \right ] = :\operatorname \left [\left (\frac - \operatorname \left [\frac \right ] \right )^2 \right ]= \frac \text\beta > 2 The covariances are: :\operatorname\left [\frac,\frac \right ] = \operatorname\left[\frac,\frac \right] =\operatorname\left[\frac,\frac\right ] = \operatorname\left[\frac,\frac \right] =\frac \text \alpha, \beta > 1 These expectations and variances appear in the four-parameter Fisher information matrix (.)
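A Monte Carlo check of the simplest of these expectations, E[1/X] = (α + β − 1)/(α − 1) for α > 1 and E[X/(1 − X)] = α/(β − 1) for β > 1 (a sketch assuming NumPy and SciPy; the seed and sample size are arbitrary):

```python
# Monte Carlo check of expectations of inverted and ratio-transformed beta variables.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a, b = 3.0, 4.0
x = stats.beta(a, b).rvs(size=2_000_000, random_state=rng)

print(np.mean(1.0 / x),       (a + b - 1) / (a - 1))   # E[1/X], requires a > 1
print(np.mean(x / (1.0 - x)), a / (b - 1))             # E[X/(1-X)], requires b > 1
print(np.mean(1.0 - x),       b / (a + b))             # mirror symmetry of the mean
```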


=Moments of logarithmically transformed random variables

= Expected values for Logarithm transformation, logarithmic transformations (useful for maximum likelihood estimates, see ) are discussed in this section. The following logarithmic linear transformations are related to the geometric means ''GX'' and ''G''(1−''X'') (see ): :\begin \operatorname[\ln(X)] &= \psi(\alpha) - \psi(\alpha + \beta)= - \operatorname\left[\ln \left (\frac \right )\right],\\ \operatorname[\ln(1-X)] &=\psi(\beta) - \psi(\alpha + \beta)= - \operatorname \left[\ln \left (\frac \right )\right]. \end Where the
digamma function
ψ(α) is defined as the logarithmic derivative of the
gamma function
: :\psi(\alpha) = \frac Logit transformations are interesting, as they usually transform various shapes (including J-shapes) into (usually skewed) bell-shaped densities over the logit variable, and they may remove the end singularities over the original variable: :\begin \operatorname\left[\ln \left (\frac \right ) \right] &=\psi(\alpha) - \psi(\beta)= \operatorname[\ln(X)] +\operatorname \left[\ln \left (\frac \right) \right],\\ \operatorname\left [\ln \left (\frac \right ) \right ] &=\psi(\beta) - \psi(\alpha)= - \operatorname \left[\ln \left (\frac \right) \right] . \end Johnson considered the distribution of the logit - transformed variable ln(''X''/1−''X''), including its moment generating function and approximations for large values of the shape parameters. This transformation extends the finite support
[0, 1]
based on the original variable ''X'' to infinite support in both directions of the real line (−∞, +∞). Higher order logarithmic moments can be derived by using the representation of a beta distribution as a proportion of two Gamma distributions and differentiating through the integral. They can be expressed in terms of higher order poly-gamma functions as follows: :\begin \operatorname \left [\ln^2(X) \right ] &= (\psi(\alpha) - \psi(\alpha + \beta))^2+\psi_1(\alpha)-\psi_1(\alpha+\beta), \\ \operatorname \left [\ln^2(1-X) \right ] &= (\psi(\beta) - \psi(\alpha + \beta))^2+\psi_1(\beta)-\psi_1(\alpha+\beta), \\ \operatorname \left [\ln (X)\ln(1-X) \right ] &=(\psi(\alpha) - \psi(\alpha + \beta))(\psi(\beta) - \psi(\alpha + \beta)) -\psi_1(\alpha+\beta). \end therefore the
variance
of the logarithmic variables and
covariance
of ln(''X'') and ln(1−''X'') are: :\begin \operatorname[\ln(X), \ln(1-X)] &= \operatorname\left[\ln(X)\ln(1-X)\right] - \operatorname[\ln(X)]\operatorname[\ln(1-X)] = -\psi_1(\alpha+\beta) \\ & \\ \operatorname[\ln X] &= \operatorname[\ln^2(X)] - (\operatorname[\ln(X)])^2 \\ &= \psi_1(\alpha) - \psi_1(\alpha + \beta) \\ &= \psi_1(\alpha) + \operatorname[\ln(X), \ln(1-X)] \\ & \\ \operatorname ln (1-X)&= \operatorname[\ln^2 (1-X)] - (\operatorname[\ln (1-X)])^2 \\ &= \psi_1(\beta) - \psi_1(\alpha + \beta) \\ &= \psi_1(\beta) + \operatorname[\ln (X), \ln(1-X)] \end where the
trigamma function
, denoted ψ1(α), is the second of the
polygamma function
s, and is defined as the derivative of the digamma function: :\psi_1(\alpha) = \frac= \frac. The variances and covariance of the logarithmically transformed variables ''X'' and (1−''X'') are different, in general, because the logarithmic transformation destroys the mirror-symmetry of the original variables ''X'' and (1−''X''), as the logarithm approaches negative infinity for the variable approaching zero. These logarithmic variances and covariance are the elements of the
Fisher information
matrix for the beta distribution. They are also a measure of the curvature of the log likelihood function (see section on Maximum likelihood estimation). The variances of the log inverse variables are identical to the variances of the log variables: :\begin \operatorname\left[\ln \left (\frac \right ) \right] & =\operatorname[\ln(X)] = \psi_1(\alpha) - \psi_1(\alpha + \beta), \\ \operatorname\left[\ln \left (\frac \right ) \right] &=\operatorname ln (1-X)= \psi_1(\beta) - \psi_1(\alpha + \beta), \\ \operatorname\left[\ln \left (\frac \right), \ln \left (\frac\right ) \right] &=\operatorname[\ln(X),\ln(1-X)]= -\psi_1(\alpha + \beta).\end It also follows that the variances of the logit transformed variables are: :\operatorname\left[\ln \left (\frac \right )\right]=\operatorname\left[\ln \left (\frac \right ) \right]=-\operatorname\left [\ln \left (\frac \right ), \ln \left (\frac \right ) \right]= \psi_1(\alpha) + \psi_1(\beta)
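These digamma/trigamma expressions are easy to confirm by simulation (a sketch assuming NumPy and SciPy; polygamma(1, ·) is the trigamma function, and the seed is arbitrary):

```python
# Logarithmic moments of X ~ Beta(a, b) via digamma/trigamma, versus Monte Carlo.
import numpy as np
from scipy import stats
from scipy.special import digamma, polygamma

a, b = 2.0, 5.0
rng = np.random.default_rng(1)
x = stats.beta(a, b).rvs(size=2_000_000, random_state=rng)

trigamma = lambda z: polygamma(1, z)
print(np.mean(np.log(x)),  digamma(a) - digamma(a + b))          # E[ln X]
print(np.var(np.log(x)),   trigamma(a) - trigamma(a + b))        # var[ln X]
print(np.cov(np.log(x), np.log1p(-x))[0, 1], -trigamma(a + b))   # cov[ln X, ln(1-X)]
```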


Quantities of information (entropy)

Given a beta distributed random variable, ''X'' ~ Beta(''α'', ''β''), the information entropy, differential entropy of ''X'' is (measured in Nat (unit), nats), the expected value of the negative of the logarithm of the
probability density function
: :\begin h(X) &= \operatorname[-\ln(f(x;\alpha,\beta))] \\ pt&=\int_0^1 -f(x;\alpha,\beta)\ln(f(x;\alpha,\beta)) \, dx \\ pt&= \ln(\Beta(\alpha,\beta))-(\alpha-1)\psi(\alpha)-(\beta-1)\psi(\beta)+(\alpha+\beta-2) \psi(\alpha+\beta) \end where ''f''(''x''; ''α'', ''β'') is the
probability density function
of the beta distribution: :f(x;\alpha,\beta) = \frac x^(1-x)^ The
digamma function
''ψ'' appears in the formula for the differential entropy as a consequence of Euler's integral formula for the harmonic numbers which follows from the integral: :\int_0^1 \frac \, dx = \psi(\alpha)-\psi(1) The information entropy, differential entropy of the beta distribution is negative for all values of ''α'' and ''β'' greater than zero, except at ''α'' = ''β'' = 1 (for which values the beta distribution is the same as the Uniform distribution (continuous), uniform distribution), where the information entropy, differential entropy reaches its Maxima and minima, maximum value of zero. It is to be expected that the maximum entropy should take place when the beta distribution becomes equal to the uniform distribution, since uncertainty is maximal when all possible events are equiprobable. For ''α'' or ''β'' approaching zero, the information entropy, differential entropy approaches its Maxima and minima, minimum value of negative infinity. For (either or both) ''α'' or ''β'' approaching zero, there is a maximum amount of order: all the probability density is concentrated at the ends, and there is zero probability density at points located between the ends. Similarly for (either or both) ''α'' or ''β'' approaching infinity, the differential entropy approaches its minimum value of negative infinity, and a maximum amount of order. If either ''α'' or ''β'' approaches infinity (and the other is finite) all the probability density is concentrated at an end, and the probability density is zero everywhere else. If both shape parameters are equal (the symmetric case), ''α'' = ''β'', and they approach infinity simultaneously, the probability density becomes a spike ( Dirac delta function) concentrated at the middle ''x'' = 1/2, and hence there is 100% probability at the middle ''x'' = 1/2 and zero probability everywhere else. The (continuous case) information entropy, differential entropy was introduced by Shannon in his original paper (where he named it the "entropy of a continuous distribution"), as the concluding part of the same paper where he defined the information entropy, discrete entropy. It is known since then that the differential entropy may differ from the infinitesimal limit of the discrete entropy by an infinite offset, therefore the differential entropy can be negative (as it is for the beta distribution). What really matters is the relative value of entropy. Given two beta distributed random variables, ''X''1 ~ Beta(''α'', ''β'') and ''X''2 ~ Beta(''α''′, ''β''′), the cross entropy is (measured in nats) :\begin H(X_1,X_2) &= \int_0^1 - f(x;\alpha,\beta) \ln (f(x;\alpha',\beta')) \,dx \\ pt&= \ln \left(\Beta(\alpha',\beta')\right)-(\alpha'-1)\psi(\alpha)-(\beta'-1)\psi(\beta)+(\alpha'+\beta'-2)\psi(\alpha+\beta). \end The cross entropy has been used as an error metric to measure the distance between two hypotheses. Its absolute value is minimum when the two distributions are identical. It is the information measure most closely related to the log maximum likelihood (see section on "Parameter estimation. Maximum likelihood estimation")). The relative entropy, or Kullback–Leibler divergence ''D''KL(''X''1 , , ''X''2), is a measure of the inefficiency of assuming that the distribution is ''X''2 ~ Beta(''α''′, ''β''′) when the distribution is really ''X''1 ~ Beta(''α'', ''β''). It is defined as follows (measured in nats). 
:\begin D_(X_1, , X_2) &= \int_0^1 f(x;\alpha,\beta) \ln \left (\frac \right ) \, dx \\ pt&= \left (\int_0^1 f(x;\alpha,\beta) \ln (f(x;\alpha,\beta)) \,dx \right )- \left (\int_0^1 f(x;\alpha,\beta) \ln (f(x;\alpha',\beta')) \, dx \right )\\ pt&= -h(X_1) + H(X_1,X_2)\\ pt&= \ln\left(\frac\right)+(\alpha-\alpha')\psi(\alpha)+(\beta-\beta')\psi(\beta)+(\alpha'-\alpha+\beta'-\beta)\psi (\alpha + \beta). \end The relative entropy, or Kullback–Leibler divergence, is always non-negative. A few numerical examples follow: *''X''1 ~ Beta(1, 1) and ''X''2 ~ Beta(3, 3); ''D''KL(''X''1 , , ''X''2) = 0.598803; ''D''KL(''X''2 , , ''X''1) = 0.267864; ''h''(''X''1) = 0; ''h''(''X''2) = −0.267864 *''X''1 ~ Beta(3, 0.5) and ''X''2 ~ Beta(0.5, 3); ''D''KL(''X''1 , , ''X''2) = 7.21574; ''D''KL(''X''2 , , ''X''1) = 7.21574; ''h''(''X''1) = −1.10805; ''h''(''X''2) = −1.10805. The Kullback–Leibler divergence is not symmetric ''D''KL(''X''1 , , ''X''2) ≠ ''D''KL(''X''2 , , ''X''1) for the case in which the individual beta distributions Beta(1, 1) and Beta(3, 3) are symmetric, but have different entropies ''h''(''X''1) ≠ ''h''(''X''2). The value of the Kullback divergence depends on the direction traveled: whether going from a higher (differential) entropy to a lower (differential) entropy or the other way around. In the numerical example above, the Kullback divergence measures the inefficiency of assuming that the distribution is (bell-shaped) Beta(3, 3), rather than (uniform) Beta(1, 1). The "h" entropy of Beta(1, 1) is higher than the "h" entropy of Beta(3, 3) because the uniform distribution Beta(1, 1) has a maximum amount of disorder. The Kullback divergence is more than two times higher (0.598803 instead of 0.267864) when measured in the direction of decreasing entropy: the direction that assumes that the (uniform) Beta(1, 1) distribution is (bell-shaped) Beta(3, 3) rather than the other way around. In this restricted sense, the Kullback divergence is consistent with the second law of thermodynamics. The Kullback–Leibler divergence is symmetric ''D''KL(''X''1 , , ''X''2) = ''D''KL(''X''2 , , ''X''1) for the skewed cases Beta(3, 0.5) and Beta(0.5, 3) that have equal differential entropy ''h''(''X''1) = ''h''(''X''2). The symmetry condition: :D_(X_1, , X_2) = D_(X_2, , X_1),\texth(X_1) = h(X_2),\text\alpha \neq \beta follows from the above definitions and the mirror-symmetry ''f''(''x''; ''α'', ''β'') = ''f''(1−''x''; ''α'', ''β'') enjoyed by the beta distribution.
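The differential entropy and Kullback–Leibler divergence formulas, together with the numerical examples quoted above, can be reproduced directly (a sketch assuming SciPy; beta_entropy and beta_kl are illustrative helper names):

```python
# Differential entropy and KL divergence of beta distributions (in nats).
from scipy import stats
from scipy.special import betaln, digamma

def beta_entropy(a, b):
    return (betaln(a, b) - (a - 1) * digamma(a) - (b - 1) * digamma(b)
            + (a + b - 2) * digamma(a + b))

def beta_kl(a, b, a2, b2):
    # D_KL( Beta(a, b) || Beta(a2, b2) )
    return (betaln(a2, b2) - betaln(a, b) + (a - a2) * digamma(a)
            + (b - b2) * digamma(b) + (a2 - a + b2 - b) * digamma(a + b))

print(beta_entropy(1, 1), beta_entropy(3, 3))            # 0 and -0.267864
print(beta_kl(1, 1, 3, 3), beta_kl(3, 3, 1, 1))          # 0.598803 and 0.267864
print(beta_kl(3, 0.5, 0.5, 3), beta_kl(0.5, 3, 3, 0.5))  # both 7.21574
print(stats.beta(3, 3).entropy())                        # agrees with beta_entropy(3, 3)
```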


Relationships between statistical measures


Mean, mode and median relationship

If 1 < α < β then mode ≤ median ≤ mean (Kerman J (2011), "A closed-form approximation for the median of the beta distribution"). Expressing the mode (only for α, β > 1) and the mean in terms of α and β:

: \frac{\alpha - 1}{\alpha + \beta - 2} \le \text{median} \le \frac{\alpha}{\alpha + \beta} ,

If 1 < β < α then the order of the inequalities is reversed. For α, β > 1 the absolute distance between the mean and the median is less than 5% of the distance between the maximum and minimum values of ''x''. On the other hand, the absolute distance between the mean and the mode can reach 50% of the distance between the maximum and minimum values of ''x'' for the (pathological) case of α = 1 and β = 1, for which values the beta distribution approaches the uniform distribution and the differential entropy approaches its maximum value, and hence maximum "disorder". For example, for α = 1.0001 and β = 1.00000001:
* mode = 0.9999; PDF(mode) = 1.00010
* mean = 0.500025; PDF(mean) = 1.00003
* median = 0.500035; PDF(median) = 1.00003
* mean − mode = −0.499875
* mean − median = −9.65538 × 10−6
where PDF stands for the value of the probability density function.
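A quick numerical check of the ordering for 1 < α < β, using the closed-form mode and mean and SciPy's numerical median (a sketch; the parameter values are arbitrary):

```python
# Verify mode <= median <= mean for 1 < a < b.
from scipy import stats

for a, b in [(1.5, 4.0), (2.0, 7.0), (3.0, 3.5)]:
    mode = (a - 1) / (a + b - 2)
    median = stats.beta(a, b).median()
    mean = a / (a + b)
    print(a, b, mode, median, mean, mode <= median <= mean)
```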


Mean, geometric mean and harmonic mean relationship

It is known from the inequality of arithmetic and geometric means that the geometric mean is lower than the mean. Similarly, the harmonic mean is lower than the geometric mean. The accompanying plot shows that for α = β, both the mean and the median are exactly equal to 1/2, regardless of the value of α = β, and the mode is also equal to 1/2 for α = β > 1, however the geometric and harmonic means are lower than 1/2 and they only approach this value asymptotically as α = β → ∞.
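The ordering harmonic ≤ geometric ≤ arithmetic mean can be illustrated with the standard closed forms H_X = (α − 1)/(α + β − 1) for α > 1 and G_X = exp(ψ(α) − ψ(α + β)) (a sketch assuming NumPy and SciPy; the parameter values are arbitrary):

```python
# Harmonic mean <= geometric mean <= arithmetic mean for Beta(a, b), a > 1.
import numpy as np
from scipy.special import digamma

for a, b in [(2.0, 2.0), (3.0, 3.0), (2.0, 5.0)]:
    hm = (a - 1) / (a + b - 1)                 # harmonic mean (a > 1)
    gm = np.exp(digamma(a) - digamma(a + b))   # geometric mean
    am = a / (a + b)                           # arithmetic mean
    print(a, b, hm, gm, am, hm <= gm <= am)
```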


Kurtosis bounded by the square of the skewness

As remarked by William Feller, Feller, in the Pearson distribution, Pearson system the beta probability density appears as Pearson distribution, type I (any difference between the beta distribution and Pearson's type I distribution is only superficial and it makes no difference for the following discussion regarding the relationship between kurtosis and skewness). Karl Pearson showed, in Plate 1 of his paper published in 1916, a graph with the kurtosis as the vertical axis (ordinate) and the square of the
skewness
as the horizontal axis (abscissa), in which a number of distributions were displayed. The region occupied by the beta distribution is bounded by the following two Line (geometry), lines in the (skewness2,kurtosis) Cartesian coordinate system, plane, or the (skewness2,excess kurtosis) Cartesian coordinate system, plane: :(\text)^2+1< \text< \frac (\text)^2 + 3 or, equivalently, :(\text)^2-2< \text< \frac (\text)^2 At a time when there were no powerful digital computers, Karl Pearson accurately computed further boundaries, for example, separating the "U-shaped" from the "J-shaped" distributions. The lower boundary line (excess kurtosis + 2 − skewness2 = 0) is produced by skewed "U-shaped" beta distributions with both values of shape parameters α and β close to zero. The upper boundary line (excess kurtosis − (3/2) skewness2 = 0) is produced by extremely skewed distributions with very large values of one of the parameters and very small values of the other parameter. Karl Pearson showed that this upper boundary line (excess kurtosis − (3/2) skewness2 = 0) is also the intersection with Pearson's distribution III, which has unlimited support in one direction (towards positive infinity), and can be bell-shaped or J-shaped. His son, Egon Pearson, showed that the region (in the kurtosis/squared-skewness plane) occupied by the beta distribution (equivalently, Pearson's distribution I) as it approaches this boundary (excess kurtosis − (3/2) skewness2 = 0) is shared with the noncentral chi-squared distribution. Karl Pearson (Pearson 1895, pp. 357, 360, 373–376) also showed that the gamma distribution is a Pearson type III distribution. Hence this boundary line for Pearson's type III distribution is known as the gamma line. (This can be shown from the fact that the excess kurtosis of the gamma distribution is 6/''k'' and the square of the skewness is 4/''k'', hence (excess kurtosis − (3/2) skewness2 = 0) is identically satisfied by the gamma distribution regardless of the value of the parameter "k"). Pearson later noted that the chi-squared distribution is a special case of Pearson's type III and also shares this boundary line (as it is apparent from the fact that for the chi-squared distribution the excess kurtosis is 12/''k'' and the square of the skewness is 8/''k'', hence (excess kurtosis − (3/2) skewness2 = 0) is identically satisfied regardless of the value of the parameter "k"). This is to be expected, since the chi-squared distribution ''X'' ~ χ2(''k'') is a special case of the gamma distribution, with parametrization X ~ Γ(k/2, 1/2) where k is a positive integer that specifies the "number of degrees of freedom" of the chi-squared distribution. An example of a beta distribution near the upper boundary (excess kurtosis − (3/2) skewness2 = 0) is given by α = 0.1, β = 1000, for which the ratio (excess kurtosis)/(skewness2) = 1.49835 approaches the upper limit of 1.5 from below. An example of a beta distribution near the lower boundary (excess kurtosis + 2 − skewness2 = 0) is given by α= 0.0001, β = 0.1, for which values the expression (excess kurtosis + 2)/(skewness2) = 1.01621 approaches the lower limit of 1 from above. In the infinitesimal limit for both α and β approaching zero symmetrically, the excess kurtosis reaches its minimum value at −2. This minimum value occurs at the point at which the lower boundary line intersects the vertical axis (ordinate). 
(However, in Pearson's original chart, the ordinate is kurtosis, instead of excess kurtosis, and it increases downwards rather than upwards). Values for the skewness and excess kurtosis below the lower boundary (excess kurtosis + 2 − skewness2 = 0) cannot occur for any distribution, and hence Karl Pearson appropriately called the region below this boundary the "impossible region". The boundary for this "impossible region" is determined by (symmetric or skewed) bimodal "U"-shaped distributions for which the parameters α and β approach zero and hence all the probability density is concentrated at the ends: ''x'' = 0, 1 with practically nothing in between them. Since for α ≈ β ≈ 0 the probability density is concentrated at the two ends ''x'' = 0 and ''x'' = 1, this "impossible boundary" is determined by a
Bernoulli distribution
, where the two only possible outcomes occur with respective probabilities ''p'' and ''q'' = 1−''p''. For cases approaching this limit boundary with symmetry α = β, skewness ≈ 0, excess kurtosis ≈ −2 (this is the lowest excess kurtosis possible for any distribution), and the probabilities are ''p'' ≈ ''q'' ≈ 1/2. For cases approaching this limit boundary with skewness, excess kurtosis ≈ −2 + skewness2, and the probability density is concentrated more at one end than the other end (with practically nothing in between), with probabilities p = \tfrac at the left end ''x'' = 0 and q = 1-p = \tfrac at the right end ''x'' = 1.
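The two boundary lines can be probed numerically; the near-boundary examples quoted above (α = 0.1, β = 1000 and α = 0.0001, β = 0.1) reproduce the ratios 1.49835 and 1.01621 (a sketch assuming SciPy; such extreme shape parameters may lose some floating-point accuracy):

```python
# Check skewness^2 - 2 < excess kurtosis < (3/2) skewness^2 for beta distributions.
from scipy import stats

for a, b in [(2.0, 5.0), (0.1, 1000.0), (0.0001, 0.1)]:
    skew, kurt = (float(v) for v in stats.beta(a, b).stats(moments='sk'))
    print(a, b,
          skew ** 2 - 2 < kurt < 1.5 * skew ** 2,   # always True for a beta distribution
          kurt / skew ** 2,                         # -> 1.49835 for (0.1, 1000)
          (kurt + 2) / skew ** 2)                   # -> 1.01621 for (0.0001, 0.1)
```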


Symmetry

All statements are conditional on α, β > 0 * Probability density function Symmetry, reflection symmetry ::f(x;\alpha,\beta) = f(1-x;\beta,\alpha) * Cumulative distribution function Symmetry, reflection symmetry plus unitary Symmetry, translation ::F(x;\alpha,\beta) = I_x(\alpha,\beta) = 1- F(1- x;\beta,\alpha) = 1 - I_(\beta,\alpha) * Mode Symmetry, reflection symmetry plus unitary Symmetry, translation ::\operatorname(\Beta(\alpha, \beta))= 1-\operatorname(\Beta(\beta, \alpha)),\text\Beta(\beta, \alpha)\ne \Beta(1,1) * Median Symmetry, reflection symmetry plus unitary Symmetry, translation ::\operatorname (\Beta(\alpha, \beta) )= 1 - \operatorname (\Beta(\beta, \alpha)) * Mean Symmetry, reflection symmetry plus unitary Symmetry, translation ::\mu (\Beta(\alpha, \beta) )= 1 - \mu (\Beta(\beta, \alpha) ) * Geometric Means each is individually asymmetric, the following symmetry applies between the geometric mean based on ''X'' and the geometric mean based on its
reflection
(1-X) ::G_X (\Beta(\alpha, \beta) )=G_(\Beta(\beta, \alpha) ) * Harmonic means each is individually asymmetric, the following symmetry applies between the harmonic mean based on ''X'' and the harmonic mean based on its
reflection
(1-X) ::H_X (\Beta(\alpha, \beta) )=H_(\Beta(\beta, \alpha) ) \text \alpha, \beta > 1 . * Variance symmetry ::\operatorname (\Beta(\alpha, \beta) )=\operatorname (\Beta(\beta, \alpha) ) * Geometric variances each is individually asymmetric, the following symmetry applies between the log geometric variance based on X and the log geometric variance based on its
reflection
(1-X) ::\ln(\operatorname (\Beta(\alpha, \beta))) = \ln(\operatorname(\Beta(\beta, \alpha))) * Geometric covariance symmetry ::\ln \operatorname(\Beta(\alpha, \beta))=\ln \operatorname(\Beta(\beta, \alpha)) * Mean absolute deviation around the mean symmetry ::\operatorname[, X - E ] (\Beta(\alpha, \beta))=\operatorname[, X - E ] (\Beta(\beta, \alpha)) * Skewness Symmetry (mathematics), skew-symmetry ::\operatorname (\Beta(\alpha, \beta) )= - \operatorname (\Beta(\beta, \alpha) ) * Excess kurtosis symmetry ::\text (\Beta(\alpha, \beta) )= \text (\Beta(\beta, \alpha) ) * Characteristic function symmetry of Real part (with respect to the origin of variable "t") :: \text [_1F_1(\alpha; \alpha+\beta; it) ] = \text [ _1F_1(\alpha; \alpha+\beta; - it)] * Characteristic function Symmetry (mathematics), skew-symmetry of Imaginary part (with respect to the origin of variable "t") :: \text [_1F_1(\alpha; \alpha+\beta; it) ] = - \text [ _1F_1(\alpha; \alpha+\beta; - it) ] * Characteristic function symmetry of Absolute value (with respect to the origin of variable "t") :: \text [ _1F_1(\alpha; \alpha+\beta; it) ] = \text [ _1F_1(\alpha; \alpha+\beta; - it) ] * Differential entropy symmetry ::h(\Beta(\alpha, \beta) )= h(\Beta(\beta, \alpha) ) * Relative Entropy (also called Kullback–Leibler divergence) symmetry ::D_(X_1, , X_2) = D_(X_2, , X_1), \texth(X_1) = h(X_2)\text\alpha \neq \beta * Fisher information matrix symmetry ::_ = _


Geometry of the probability density function


Inflection points

For certain values of the shape parameters α and β, the
probability density function
has inflection points, at which the curvature changes sign. The position of these inflection points can be useful as a measure of the Statistical dispersion, dispersion or spread of the distribution. Defining the following quantity: :\kappa =\frac Points of inflection occur, depending on the value of the shape parameters α and β, as follows: *(α > 2, β > 2) The distribution is bell-shaped (symmetric for α = β and skewed otherwise), with two inflection points, equidistant from the mode: ::x = \text \pm \kappa = \frac * (α = 2, β > 2) The distribution is unimodal, positively skewed, right-tailed, with one inflection point, located to the right of the mode: ::x =\text + \kappa = \frac * (α > 2, β = 2) The distribution is unimodal, negatively skewed, left-tailed, with one inflection point, located to the left of the mode: ::x = \text - \kappa = 1 - \frac * (1 < α < 2, β > 2, α+β>2) The distribution is unimodal, positively skewed, right-tailed, with one inflection point, located to the right of the mode: ::x =\text + \kappa = \frac *(0 < α < 1, 1 < β < 2) The distribution has a mode at the left end ''x'' = 0 and it is positively skewed, right-tailed. There is one inflection point, located to the right of the mode: ::x = \frac *(α > 2, 1 < β < 2) The distribution is unimodal negatively skewed, left-tailed, with one inflection point, located to the left of the mode: ::x =\text - \kappa = \frac *(1 < α < 2, 0 < β < 1) The distribution has a mode at the right end ''x''=1 and it is negatively skewed, left-tailed. There is one inflection point, located to the left of the mode: ::x = \frac There are no inflection points in the remaining (symmetric and skewed) regions: U-shaped: (α, β < 1) upside-down-U-shaped: (1 < α < 2, 1 < β < 2), reverse-J-shaped (α < 1, β > 2) or J-shaped: (α > 2, β < 1) The accompanying plots show the inflection point locations (shown vertically, ranging from 0 to 1) versus α and β (the horizontal axes ranging from 0 to 5). There are large cuts at surfaces intersecting the lines α = 1, β = 1, α = 2, and β = 2 because at these values the beta distribution change from 2 modes, to 1 mode to no mode.


Shapes

The beta density function can take a wide variety of different shapes depending on the values of the two parameters ''α'' and ''β''. The ability of the beta distribution to take this great diversity of shapes (using only two parameters) is partly responsible for finding wide application for modeling actual measurements:


=Symmetric (''α'' = ''β'')

= * the density function is symmetry, symmetric about 1/2 (blue & teal plots). * median = mean = 1/2. *skewness = 0. *variance = 1/(4(2α + 1)) *α = β < 1 **U-shaped (blue plot). **bimodal: left mode = 0, right mode =1, anti-mode = 1/2 **1/12 < var(''X'') < 1/4 **−2 < excess kurtosis(''X'') < −6/5 ** α = β = 1/2 is the arcsine distribution *** var(''X'') = 1/8 ***excess kurtosis(''X'') = −3/2 ***CF = Rinc (t) ** α = β → 0 is a 2-point
Bernoulli distribution
with equal probability 1/2 at each Dirac delta function end ''x'' = 0 and ''x'' = 1 and zero probability everywhere else. A coin toss: one face of the coin being ''x'' = 0 and the other face being ''x'' = 1. *** \lim_ \operatorname(X) = \tfrac *** \lim_ \operatorname(X) = - 2 a lower value than this is impossible for any distribution to reach. *** The information entropy, differential entropy approaches a Maxima and minima, minimum value of −∞ *α = β = 1 **the uniform distribution (continuous), uniform
[0, 1]
distribution **no mode **var(''X'') = 1/12 **excess kurtosis(''X'') = −6/5 **The (negative anywhere else) information entropy, differential entropy reaches its Maxima and minima, maximum value of zero **CF = Sinc (t) *''α'' = ''β'' > 1 **symmetric unimodal ** mode = 1/2. **0 < var(''X'') < 1/12 **−6/5 < excess kurtosis(''X'') < 0 **''α'' = ''β'' = 3/2 is a semi-elliptic
[0, 1]
distribution, see: Wigner semicircle distribution ***var(''X'') = 1/16. ***excess kurtosis(''X'') = −1 ***CF = 2 Jinc (t) **''α'' = ''β'' = 2 is the parabolic
[0, 1]
distribution ***var(''X'') = 1/20 ***excess kurtosis(''X'') = −6/7 ***CF = 3 Tinc (t) **''α'' = ''β'' > 2 is bell-shaped, with inflection points located to either side of the mode ***0 < var(''X'') < 1/20 ***−6/7 < excess kurtosis(''X'') < 0 **''α'' = ''β'' → ∞ is a 1-point
degenerate distribution
with a Dirac delta function spike at the midpoint ''x'' = 1/2 with probability 1, and zero probability everywhere else. There is 100% probability (absolute certainty) concentrated at the single point ''x'' = 1/2. *** \lim_ \operatorname(X) = 0 *** \lim_ \operatorname(X) = 0 ***The information entropy, differential entropy approaches a Maxima and minima, minimum value of −∞


=Skewed (''α'' ≠ ''β'')

= The density function is Skewness, skewed. An interchange of parameter values yields the mirror image (the reverse) of the initial curve, some more specific cases: *''α'' < 1, ''β'' < 1 ** U-shaped ** Positive skew for α < β, negative skew for α > β. ** bimodal: left mode = 0, right mode = 1, anti-mode = \tfrac ** 0 < median < 1. ** 0 < var(''X'') < 1/4 *α > 1, β > 1 ** unimodal (magenta & cyan plots), **Positive skew for α < β, negative skew for α > β. **\text= \tfrac ** 0 < median < 1 ** 0 < var(''X'') < 1/12 *α < 1, β ≥ 1 **reverse J-shaped with a right tail, **positively skewed, **strictly decreasing, convex function, convex ** mode = 0 ** 0 < median < 1/2. ** 0 < \operatorname(X) < \tfrac, (maximum variance occurs for \alpha=\tfrac, \beta=1, or α = Φ the Golden ratio, golden ratio conjugate) *α ≥ 1, β < 1 **J-shaped with a left tail, **negatively skewed, **strictly increasing, convex function, convex ** mode = 1 ** 1/2 < median < 1 ** 0 < \operatorname(X) < \tfrac, (maximum variance occurs for \alpha=1, \beta=\tfrac, or β = Φ the Golden ratio, golden ratio conjugate) *α = 1, β > 1 **positively skewed, **strictly decreasing (red plot), **a reversed (mirror-image) power function ,1distribution ** mean = 1 / (β + 1) ** median = 1 - 1/21/β ** mode = 0 **α = 1, 1 < β < 2 ***concave function, concave *** 1-\tfrac< \text < \tfrac *** 1/18 < var(''X'') < 1/12. **α = 1, β = 2 ***a straight line with slope −2, the right-triangular distribution with right angle at the left end, at ''x'' = 0 *** \text=1-\tfrac *** var(''X'') = 1/18 **α = 1, β > 2 ***reverse J-shaped with a right tail, ***convex function, convex *** 0 < \text < 1-\tfrac *** 0 < var(''X'') < 1/18 *α > 1, β = 1 **negatively skewed, **strictly increasing (green plot), **the power function
[0, 1]
distribution ** mean = α / (α + 1) ** median = 1/21/α ** mode = 1 **2 > α > 1, β = 1 ***concave function, concave *** \tfrac < \text < \tfrac *** 1/18 < var(''X'') < 1/12 ** α = 2, β = 1 ***a straight line with slope +2, the right-triangular distribution with right angle at the right end, at ''x'' = 1 *** \text=\tfrac *** var(''X'') = 1/18 **α > 2, β = 1 ***J-shaped with a left tail, convex function, convex ***\tfrac < \text < 1 *** 0 < var(''X'') < 1/18


Related distributions


Transformations

* If ''X'' ~ Beta(''α'', ''β'') then 1 − ''X'' ~ Beta(''β'', ''α''), by mirror-image symmetry.
* If ''X'' ~ Beta(''α'', ''β'') then \tfrac{X}{1-X} \sim \beta'(\alpha,\beta), the beta prime distribution, also called "beta distribution of the second kind".
* If ''X'' ~ Beta(''α'', ''β'') then \tfrac{1}{X} - 1 \sim \beta'(\beta,\alpha).
* If ''X'' ~ Beta(''n''/2, ''m''/2) then \tfrac{mX}{n(1-X)} \sim F(n,m) (assuming ''n'' > 0 and ''m'' > 0), the Fisher–Snedecor F distribution.
* If X \sim \operatorname{Beta}\left(1+\lambda\tfrac{m-\min}{\max-\min}, 1 + \lambda\tfrac{\max-m}{\max-\min}\right) then min + ''X''(max − min) ~ PERT(min, max, ''m'', ''λ''), where ''PERT'' denotes a PERT distribution used in PERT analysis and ''m'' = most likely value.Herrerías-Velasco, José Manuel and Herrerías-Pleguezuelo, Rafael and René van Dorp, Johan. (2011). Revisiting the PERT mean and Variance. European Journal of Operational Research (210), p. 448–451. Traditionally ''λ'' = 4 in PERT analysis.
* If ''X'' ~ Beta(1, ''β'') then ''X'' ~ Kumaraswamy distribution with parameters (1, ''β'').
* If ''X'' ~ Beta(''α'', 1) then ''X'' ~ Kumaraswamy distribution with parameters (''α'', 1).
* If ''X'' ~ Beta(''α'', 1) then −ln(''X'') ~ Exponential(''α'').


Special and limiting cases

* Beta(1, 1) ~ uniform distribution (continuous), U(0, 1). * Beta(n, 1) ~ Maximum of ''n'' independent rvs. with uniform distribution (continuous), U(0, 1), sometimes called a ''a standard power function distribution'' with density ''n'' ''x''''n''-1 on that interval. * Beta(1, n) ~ Minimum of ''n'' independent rvs. with uniform distribution (continuous), U(0, 1) * If ''X'' ~ Beta(3/2, 3/2) and ''r'' > 0 then 2''rX'' − ''r'' ~ Wigner semicircle distribution. * Beta(1/2, 1/2) is equivalent to the arcsine distribution. This distribution is also Jeffreys prior probability for the
Bernoulli
and binomial distributions. The arcsine probability density is a distribution that appears in several random-walk fundamental theorems. In a fair coin toss
random walk
, the probability for the time of the last visit to the origin is distributed as an (U-shaped) arcsine distribution. In a two-player fair-coin-toss game, a player is said to be in the lead if the random walk (that started at the origin) is above the origin. The most probable number of times that a given player will be in the lead, in a game of length 2''N'', is not ''N''. On the contrary, ''N'' is the least likely number of times that the player will be in the lead. The most likely number of times in the lead is 0 or 2''N'' (following the arcsine distribution). * \lim_ n \operatorname(1,n) = \operatorname(1) the exponential distribution. * \lim_ n \operatorname(k,n) = \operatorname(k,1) the gamma distribution. * For large n, \operatorname(\alpha n,\beta n) \to \mathcal\left(\frac,\frac\frac\right) the normal distribution. More precisely, if X_n \sim \operatorname(\alpha n,\beta n) then \sqrt\left(X_n -\tfrac\right) converges in distribution to a normal distribution with mean 0 and variance \tfrac as ''n'' increases.


Derived from other distributions

* The ''k''th order statistic of a sample of size ''n'' from the Uniform distribution (continuous), uniform distribution is a beta random variable, ''U''(''k'') ~ Beta(''k'', ''n''+1−''k''). * If ''X'' ~ Gamma(α, θ) and ''Y'' ~ Gamma(β, θ) are independent, then \tfrac \sim \operatorname(\alpha, \beta)\,. * If X \sim \chi^2(\alpha)\, and Y \sim \chi^2(\beta)\, are independent, then \tfrac \sim \operatorname(\tfrac, \tfrac). * If ''X'' ~ U(0, 1) and ''α'' > 0 then ''X''1/''α'' ~ Beta(''α'', 1). The power function distribution. * If X \sim\operatorname(k;n;p), then \sim \operatorname(\alpha, \beta) for discrete values of ''n'' and ''k'' where \alpha=k+1 and \beta=n-k+1. * If ''X'' ~ Cauchy(0, 1) then \tfrac \sim \operatorname\left(\tfrac12, \tfrac12\right)\,
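The gamma-ratio construction is similarly easy to verify by simulation (a sketch assuming NumPy and SciPy; the parameter values and seed are arbitrary):

```python
# If X ~ Gamma(a, theta) and Y ~ Gamma(b, theta) are independent,
# then X / (X + Y) ~ Beta(a, b).
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
a, b, theta = 2.0, 5.0, 1.7
x = rng.gamma(shape=a, scale=theta, size=100_000)
y = rng.gamma(shape=b, scale=theta, size=100_000)
ks = stats.kstest(x / (x + y), 'beta', args=(a, b))
print(ks.statistic, ks.pvalue)   # large p-value: consistent with Beta(a, b)
```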


Combination with other distributions

* If ''X'' ~ Beta(''α'', ''β'') and ''Y'' ~ F(2''β'', 2''α''), then \Pr(X \leq \tfrac{\alpha}{\alpha + \beta x}) = \Pr(Y \geq x)\, for all ''x'' > 0.


Compounding with other distributions

* If ''p'' ~ Beta(α, β) and ''X'' ~ Bin(''k'', ''p'') then ''X'' ~ beta-binomial distribution * If ''p'' ~ Beta(α, β) and ''X'' ~ NB(''r'', ''p'') then ''X'' ~ beta negative binomial distribution
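The beta-binomial compound can be checked against scipy.stats.betabinom, assuming a SciPy version that ships that distribution (a sketch; seed and sample size are arbitrary):

```python
# Compounding: p ~ Beta(a, b), X | p ~ Binomial(k, p)  =>  X ~ BetaBinomial(k, a, b).
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
a, b, k = 2.0, 5.0, 10
p = stats.beta(a, b).rvs(size=200_000, random_state=rng)
x = rng.binomial(k, p)

empirical = np.bincount(x, minlength=k + 1) / len(x)
exact = stats.betabinom(k, a, b).pmf(np.arange(k + 1))
print(np.max(np.abs(empirical - exact)))   # small (Monte Carlo error only)
```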


Generalisations

* The generalization to multiple variables, i.e. a Dirichlet distribution, multivariate Beta distribution, is called a
Dirichlet distribution
. Univariate marginals of the Dirichlet distribution have a beta distribution. The beta distribution is Conjugate prior, conjugate to the binomial and Bernoulli distributions in exactly the same way as the
Dirichlet distribution
is conjugate to the multinomial distribution and categorical distribution. * The Pearson distribution#The Pearson type I distribution, Pearson type I distribution is identical to the beta distribution (except for arbitrary shifting and re-scaling that can also be accomplished with the four parameter parametrization of the beta distribution). * The beta distribution is the special case of the noncentral beta distribution where \lambda = 0: \operatorname(\alpha, \beta) = \operatorname(\alpha,\beta,0). * The generalized beta distribution is a five-parameter distribution family which has the beta distribution as a special case. * The matrix variate beta distribution is a distribution for positive-definite matrices.


Statistical inference


Parameter estimation


Method of moments


=Two unknown parameters

= Two unknown parameters ( (\hat, \hat) of a beta distribution supported in the ,1interval) can be estimated, using the method of moments, with the first two moments (sample mean and sample variance) as follows. Let: : \text=\bar = \frac\sum_^N X_i be the sample mean estimate and : \text =\bar = \frac\sum_^N (X_i - \bar)^2 be the sample variance estimate. The method of moments (statistics), method-of-moments estimates of the parameters are :\hat = \bar \left(\frac - 1 \right), if \bar <\bar(1 - \bar), : \hat = (1-\bar) \left(\frac - 1 \right), if \bar<\bar(1 - \bar). When the distribution is required over a known interval other than
[0, 1]
with random variable ''X'', say [''a'', ''c''] with random variable ''Y'', then replace \bar with \frac, and \bar with \frac in the above couple of equations for the shape parameters (see the "Alternative parametrizations, four parameters" section below)., where: : \text=\bar = \frac\sum_^N Y_i : \text = \bar = \frac\sum_^N (Y_i - \bar)^2
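A direct implementation of these method-of-moments estimates on the [0, 1] interval (a sketch assuming NumPy and SciPy; beta_mom is an illustrative helper name, and the sample variance is taken with the N − 1 denominator):

```python
# Method-of-moments estimates of (alpha, beta) from the sample mean and variance.
import numpy as np
from scipy import stats

def beta_mom(sample):
    m = np.mean(sample)
    v = np.var(sample, ddof=1)                 # sample variance
    if v >= m * (1 - m):
        raise ValueError("requires sample variance < mean * (1 - mean)")
    c = m * (1 - m) / v - 1
    return m * c, (1 - m) * c                  # (alpha_hat, beta_hat)

rng = np.random.default_rng(4)
sample = stats.beta(2.0, 5.0).rvs(size=50_000, random_state=rng)
print(beta_mom(sample))                        # close to (2.0, 5.0)
```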


=Four unknown parameters

= All four parameters (\hat, \hat, \hat, \hat of a beta distribution supported in the [''a'', ''c''] interval -see section Beta distribution#Four parameters 2, "Alternative parametrizations, Four parameters"-) can be estimated, using the method of moments developed by Karl Pearson, by equating sample and population values of the first four central moments (mean, variance, skewness and excess kurtosis). The excess kurtosis was expressed in terms of the square of the skewness, and the sample size ν = α + β, (see previous section Beta distribution#Kurtosis, "Kurtosis") as follows: :\text =\frac\left(\frac (\text)^2 - 1\right)\text^2-2< \text< \tfrac (\text)^2 One can use this equation to solve for the sample size ν= α + β in terms of the square of the skewness and the excess kurtosis as follows: :\hat = \hat + \hat = 3\frac :\text^2-2< \text< \tfrac (\text)^2 This is the ratio (multiplied by a factor of 3) between the previously derived limit boundaries for the beta distribution in a space (as originally done by Karl Pearson) defined with coordinates of the square of the skewness in one axis and the excess kurtosis in the other axis (see ): The case of zero skewness, can be immediately solved because for zero skewness, α = β and hence ν = 2α = 2β, therefore α = β = ν/2 : \hat = \hat = \frac= \frac : \text= 0 \text -2<\text<0 (Excess kurtosis is negative for the beta distribution with zero skewness, ranging from -2 to 0, so that \hat -and therefore the sample shape parameters- is positive, ranging from zero when the shape parameters approach zero and the excess kurtosis approaches -2, to infinity when the shape parameters approach infinity and the excess kurtosis approaches zero). For non-zero sample skewness one needs to solve a system of two coupled equations. Since the skewness and the excess kurtosis are independent of the parameters \hat, \hat, the parameters \hat, \hat can be uniquely determined from the sample skewness and the sample excess kurtosis, by solving the coupled equations with two known variables (sample skewness and sample excess kurtosis) and two unknowns (the shape parameters): :(\text)^2 = \frac :\text =\frac\left(\frac (\text)^2 - 1\right) :\text^2-2< \text< \tfrac(\text)^2 resulting in the following solution: : \hat, \hat = \frac \left (1 \pm \frac \right ) : \text\neq 0 \text (\text)^2-2< \text< \tfrac (\text)^2 Where one should take the solutions as follows: \hat>\hat for (negative) sample skewness < 0, and \hat<\hat for (positive) sample skewness > 0. The accompanying plot shows these two solutions as surfaces in a space with horizontal axes of (sample excess kurtosis) and (sample squared skewness) and the shape parameters as the vertical axis. The surfaces are constrained by the condition that the sample excess kurtosis must be bounded by the sample squared skewness as stipulated in the above equation. The two surfaces meet at the right edge defined by zero skewness. Along this right edge, both parameters are equal and the distribution is symmetric U-shaped for α = β < 1, uniform for α = β = 1, upside-down-U-shaped for 1 < α = β < 2 and bell-shaped for α = β > 2. The surfaces also meet at the front (lower) edge defined by "the impossible boundary" line (excess kurtosis + 2 - skewness2 = 0). 
Along this front (lower) boundary both shape parameters approach zero, and the probability density is concentrated more at one end than the other end (with practically nothing in between), with probabilities p=\tfrac at the left end ''x'' = 0 and q = 1-p = \tfrac at the right end ''x'' = 1. The two surfaces become further apart towards the rear edge. At this rear edge the surface parameters are quite different from each other. As remarked, for example, by Bowman and Shenton, sampling in the neighborhood of the line (sample excess kurtosis - (3/2)(sample skewness)2 = 0) (the just-J-shaped portion of the rear edge where blue meets beige), "is dangerously near to chaos", because at that line the denominator of the expression above for the estimate ν = α + β becomes zero and hence ν approaches infinity as that line is approached. Bowman and Shenton write that "the higher moment parameters (kurtosis and skewness) are extremely fragile (near that line). However, the mean and standard deviation are fairly reliable." Therefore, the problem is for the case of four parameter estimation for very skewed distributions such that the excess kurtosis approaches (3/2) times the square of the skewness. This boundary line is produced by extremely skewed distributions with very large values of one of the parameters and very small values of the other parameter. See for a numerical example and further comments about this rear edge boundary line (sample excess kurtosis - (3/2)(sample skewness)2 = 0). As remarked by Karl Pearson himself this issue may not be of much practical importance as this trouble arises only for very skewed J-shaped (or mirror-image J-shaped) distributions with very different values of shape parameters that are unlikely to occur much in practice). The usual skewed-bell-shape distributions that occur in practice do not have this parameter estimation problem. The remaining two parameters \hat, \hat can be determined using the sample mean and the sample variance using a variety of equations. One alternative is to calculate the support interval range (\hat-\hat) based on the sample variance and the sample kurtosis. For this purpose one can solve, in terms of the range (\hat- \hat), the equation expressing the excess kurtosis in terms of the sample variance, and the sample size ν (see and ): :\text =\frac\bigg(\frac - 6 - 5 \hat \bigg) to obtain: : (\hat- \hat) = \sqrt\sqrt Another alternative is to calculate the support interval range (\hat-\hat) based on the sample variance and the sample skewness. For this purpose one can solve, in terms of the range (\hat-\hat), the equation expressing the squared skewness in terms of the sample variance, and the sample size ν (see section titled "Skewness" and "Alternative parametrizations, four parameters"): :(\text)^2 = \frac\bigg(\frac-4(1+\hat)\bigg) to obtain: : (\hat- \hat) = \frac\sqrt The remaining parameter can be determined from the sample mean and the previously obtained parameters: (\hat-\hat), \hat, \hat = \hat+\hat: : \hat = (\text) - \left(\frac\right)(\hat-\hat) and finally, \hat= (\hat- \hat) + \hat . In the above formulas one may take, for example, as estimates of the sample moments: :\begin \text &=\overline = \frac\sum_^N Y_i \\ \text &= \overline_Y = \frac\sum_^N (Y_i - \overline)^2 \\ \text &= G_1 = \frac \frac \\ \text &= G_2 = \frac \frac - \frac \end The estimators ''G''1 for skewness, sample skewness and ''G''2 for kurtosis, sample kurtosis are used by DAP (software), DAP/SAS System, SAS, PSPP/SPSS, and Microsoft Excel, Excel. 
However, they are not used by BMDP and (according to ) they were not used by MINITAB in 1998. Actually, Joanes and Gill in their 1998 study concluded that the skewness and kurtosis estimators used in BMDP and in MINITAB (at that time) had smaller variance and mean-squared error in normal samples, but the skewness and kurtosis estimators used in DAP (software), DAP/SAS System, SAS, PSPP/SPSS, namely ''G''1 and ''G''2, had smaller mean-squared error in samples from a very skewed distribution. It is for this reason that we have spelled out "sample skewness", etc., in the above formulas, to make it explicit that the user should choose the best estimator according to the problem at hand, as the best estimator for skewness and kurtosis depends on the amount of skewness (as shown by Joanes and Gill).
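The four-parameter method-of-moments recipe described above can be condensed into a short numerical routine. The following Python sketch is an illustration rather than code from the literature: it assumes SciPy's bias-corrected skewness and excess-kurtosis estimators (the ''G''1 and ''G''2 estimators mentioned above), the unbiased sample variance, and the skewness-based expression for the range; the function name `beta4_method_of_moments` is arbitrary.

```python
# Sketch of the four-parameter method-of-moments fit (illustrative, not canonical).
import numpy as np
from scipy.stats import skew, kurtosis

def beta4_method_of_moments(y):
    """Estimate (alpha, beta, a, c) by matching mean, variance, skewness and
    excess kurtosis, following the step-by-step recipe in the text above."""
    y = np.asarray(y, dtype=float)
    mean, var = y.mean(), y.var(ddof=1)          # unbiased sample variance (a choice)
    g1 = skew(y, bias=False)                     # sample skewness (G1)
    g2 = kurtosis(y, fisher=True, bias=False)    # sample excess kurtosis (G2)
    if not (g1**2 - 2 < g2 < 1.5 * g1**2):
        raise ValueError("sample moments outside the admissible beta region")
    nu = 3.0 * (g2 - g1**2 + 2.0) / (1.5 * g1**2 - g2)     # nu = alpha + beta
    if np.isclose(g1, 0.0):
        alpha = beta = nu / 2.0                  # symmetric case
    else:
        delta = 1.0 / np.sqrt(1.0 + 16.0 * (nu + 1.0) / ((nu + 2.0)**2 * g1**2))
        alpha, beta = 0.5 * nu * (1.0 - delta), 0.5 * nu * (1.0 + delta)
        if g1 < 0:                               # alpha > beta for negative skewness
            alpha, beta = beta, alpha
    # support range from the sample variance and skewness
    rng = 0.5 * np.sqrt(var) * np.sqrt((nu + 2.0)**2 * g1**2 + 16.0 * (nu + 1.0))
    a = mean - (alpha / nu) * rng                # estimated minimum of the support
    return alpha, beta, a, a + rng               # (alpha, beta, a, c)
```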


Maximum likelihood


Two unknown parameters

= As is also the case for maximum likelihood estimates for the gamma distribution, the maximum likelihood estimates for the beta distribution do not have a general closed form solution for arbitrary values of the shape parameters. If ''X''1, ..., ''XN'' are independent random variables each having a beta distribution, the joint log likelihood function for ''N'' independent and identically distributed random variables, iid observations is: :\begin \ln\, \mathcal (\alpha, \beta\mid X) &= \sum_^N \ln \left (\mathcal_i (\alpha, \beta\mid X_i) \right )\\ &= \sum_^N \ln \left (f(X_i;\alpha,\beta) \right ) \\ &= \sum_^N \ln \left (\frac \right ) \\ &= (\alpha - 1)\sum_^N \ln (X_i) + (\beta- 1)\sum_^N \ln (1-X_i) - N \ln \Beta(\alpha,\beta) \end Finding the maximum with respect to a shape parameter involves taking the partial derivative with respect to the shape parameter and setting the expression equal to zero yielding the maximum likelihood estimator of the shape parameters: :\frac = \sum_^N \ln X_i -N\frac=0 :\frac = \sum_^N \ln (1-X_i)- N\frac=0 where: :\frac = -\frac+ \frac+ \frac=-\psi(\alpha + \beta) + \psi(\alpha) + 0 :\frac= - \frac+ \frac + \frac=-\psi(\alpha + \beta) + 0 + \psi(\beta) since the
digamma function
denoted ψ(α) is defined as the logarithmic derivative of the
gamma function
: :\psi(\alpha) =\frac To ensure that the values with zero tangent slope are indeed a maximum (instead of a saddle-point or a minimum) one has to also satisfy the condition that the curvature is negative. This amounts to satisfying that the second partial derivative with respect to the shape parameters is negative :\frac= -N\frac<0 :\frac = -N\frac<0 using the previous equations, this is equivalent to: :\frac = \psi_1(\alpha)-\psi_1(\alpha + \beta) > 0 :\frac = \psi_1(\beta) -\psi_1(\alpha + \beta) > 0 where the
trigamma function
, denoted ''ψ''1(''α''), is the second of the
polygamma function
s, and is defined as the derivative of the digamma function: :\psi_1(\alpha) = \frac=\, \frac. These conditions are equivalent to stating that the variances of the logarithmically transformed variables are positive, since: :\operatorname[\ln (X)] = \operatorname[\ln^2 (X)] - (\operatorname[\ln (X)])^2 = \psi_1(\alpha) - \psi_1(\alpha + \beta) :\operatorname ln (1-X)= \operatorname[\ln^2 (1-X)] - (\operatorname[\ln (1-X)])^2 = \psi_1(\beta) - \psi_1(\alpha + \beta) Therefore, the condition of negative curvature at a maximum is equivalent to the statements: : \operatorname[\ln (X)] > 0 : \operatorname ln (1-X)> 0 Alternatively, the condition of negative curvature at a maximum is also equivalent to stating that the following logarithmic derivatives of the geometric means ''GX'' and ''G(1−X)'' are positive, since: : \psi_1(\alpha) - \psi_1(\alpha + \beta) = \frac > 0 : \psi_1(\beta) - \psi_1(\alpha + \beta) = \frac > 0 While these slopes are indeed positive, the other slopes are negative: :\frac, \frac < 0. The slopes of the mean and the median with respect to ''α'' and ''β'' display similar sign behavior. From the condition that at a maximum, the partial derivative with respect to the shape parameter equals zero, we obtain the following system of coupled maximum likelihood estimate equations (for the average log-likelihoods) that needs to be inverted to obtain the (unknown) shape parameter estimates \hat,\hat in terms of the (known) average of logarithms of the samples ''X''1, ..., ''XN'': :\begin \hat[\ln (X)] &= \psi(\hat) - \psi(\hat + \hat)=\frac\sum_^N \ln X_i = \ln \hat_X \\ \hat[\ln(1-X)] &= \psi(\hat) - \psi(\hat + \hat)=\frac\sum_^N \ln (1-X_i)= \ln \hat_ \end where we recognize \log \hat_X as the logarithm of the sample geometric mean and \log \hat_ as the logarithm of the sample geometric mean based on (1 − ''X''), the mirror-image of ''X''. For \hat=\hat, it follows that \hat_X=\hat_ . :\begin \hat_X &= \prod_^N (X_i)^ \\ \hat_ &= \prod_^N (1-X_i)^ \end These coupled equations containing
digamma function
s of the shape parameter estimates \hat,\hat must be solved by numerical methods as done, for example, by Beckman et al. Gnanadesikan et al. give numerical solutions for a few cases. Norman Lloyd Johnson, N.L.Johnson and Samuel Kotz, S.Kotz suggest that for "not too small" shape parameter estimates \hat,\hat, the logarithmic approximation to the digamma function \psi(\hat) \approx \ln(\hat-\tfrac) may be used to obtain initial values for an iterative solution, since the equations resulting from this approximation can be solved exactly: :\ln \frac \approx \ln \hat_X :\ln \frac\approx \ln \hat_ which leads to the following solution for the initial values (of the estimate shape parameters in terms of the sample geometric means) for an iterative solution: :\hat\approx \tfrac + \frac \text \hat >1 :\hat\approx \tfrac + \frac \text \hat > 1 Alternatively, the estimates provided by the method of moments can instead be used as initial values for an iterative solution of the maximum likelihood coupled equations in terms of the digamma functions. When the distribution is required over a known interval other than
[0, 1]
with random variable ''X'', say [''a'', ''c''] with random variable ''Y'', then replace ln(''Xi'') in the first equation with :\ln \frac, and replace ln(1−''Xi'') in the second equation with :\ln \frac (see "Alternative parametrizations, four parameters" section below). If one of the shape parameters is known, the problem is considerably simplified. The following logit transformation can be used to solve for the unknown shape parameter (for skewed cases such that \hat\neq\hat, otherwise, if symmetric, both -equal- parameters are known when one is known): :\hat \left[\ln \left(\frac \right) \right]=\psi(\hat) - \psi(\hat)=\frac\sum_^N \ln\frac = \ln \hat_X - \ln \left(\hat_\right) This logit transformation is the logarithm of the transformation that divides the variable ''X'' by its mirror-image (''X''/(1 - ''X'') resulting in the "inverted beta distribution" or
beta prime distribution
(also known as beta distribution of the second kind or Pearson distribution, Pearson's Type VI) with support [0, +∞). As previously discussed in the section "Moments of logarithmically transformed random variables," the logit transformation \ln\frac, studied by Johnson, extends the finite support
[0, 1]
based on the original variable ''X'' to infinite support in both directions of the real line (−∞, +∞). If, for example, \hat is known, the unknown parameter \hat can be obtained in terms of the inverse digamma function of the right hand side of this equation: :\psi(\hat)=\frac\sum_^N \ln\frac + \psi(\hat) :\hat=\psi^(\ln \hat_X - \ln \hat_ + \psi(\hat)) In particular, if one of the shape parameters has a value of unity, for example for \hat = 1 (the power function distribution with bounded support [0,1]), using the identity ψ(''x'' + 1) = ψ(''x'') + 1/''x'' in the equation \psi(\hat) - \psi(\hat + \hat)= \ln \hat_X, the maximum likelihood estimator for the unknown parameter \hat is, exactly: :\hat= - \frac= - \frac The beta has support [0, 1], therefore \hat_X < 1, and hence (-\ln \hat_X) >0, and therefore \hat >0. In conclusion, the maximum likelihood estimates of the shape parameters of a beta distribution are (in general) a complicated function of the sample geometric mean, and of the sample geometric mean based on ''(1−X)'', the mirror-image of ''X''. One may ask, if the variance (in addition to the mean) is necessary to estimate two shape parameters with the method of moments, why is the (logarithmic or geometric) variance not necessary to estimate two shape parameters with the maximum likelihood method, for which only the geometric means suffice? The answer is because the mean does not provide as much information as the geometric mean. For a beta distribution with equal shape parameters ''α'' = ''β'', the mean is exactly 1/2, regardless of the value of the shape parameters, and therefore regardless of the value of the statistical dispersion (the variance). On the other hand, the geometric mean of a beta distribution with equal shape parameters ''α'' = ''β'', depends on the value of the shape parameters, and therefore it contains more information. Also, the geometric mean of a beta distribution does not satisfy the symmetry conditions satisfied by the mean, therefore, by employing both the geometric mean based on ''X'' and geometric mean based on (1 − ''X''), the maximum likelihood method is able to provide best estimates for both parameters ''α'' = ''β'', without need of employing the variance. One can express the joint log likelihood per ''N'' independent and identically distributed random variables, iid observations in terms of the ''sufficient statistics'' (the sample geometric means) as follows: :\frac = (\alpha - 1)\ln \hat_X + (\beta- 1)\ln \hat_- \ln \Beta(\alpha,\beta). We can plot the joint log likelihood per ''N'' observations for fixed values of the sample geometric means to see the behavior of the likelihood function as a function of the shape parameters α and β. In such a plot, the shape parameter estimators \hat,\hat correspond to the maxima of the likelihood function. See the accompanying graph that shows that all the likelihood functions intersect at α = β = 1, which corresponds to the values of the shape parameters that give the maximum entropy (the maximum entropy occurs for shape parameters equal to unity: the uniform distribution). It is evident from the plot that the likelihood function gives sharp peaks for values of the shape parameter estimators close to zero, but that for values of the shape parameters estimators greater than one, the likelihood function becomes quite flat, with less defined peaks. 
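The likelihood surface just described can be evaluated directly from the sufficient statistics. The sketch below is only an illustration: the values chosen for ln Ĝ''X'' and ln Ĝ(1−''X'') are assumed, and the grid bounds are arbitrary.

```python
# Evaluate the per-observation log likelihood as a function of (alpha, beta)
# for fixed sample geometric means, as in the plot discussed above.
import numpy as np
from scipy.special import betaln

ln_gx, ln_g1x = -0.7, -0.9                 # assumed values of ln G_X and ln G_(1-X)
alphas = np.linspace(0.1, 5.0, 200)
betas = np.linspace(0.1, 5.0, 200)
A, B = np.meshgrid(alphas, betas)
loglik = (A - 1) * ln_gx + (B - 1) * ln_g1x - betaln(A, B)
i, j = np.unravel_index(np.argmax(loglik), loglik.shape)
print(A[i, j], B[i, j])                    # grid location of the likelihood peak
```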
Obviously, the maximum likelihood parameter estimation method for the beta distribution becomes less acceptable for larger values of the shape parameter estimators, as the uncertainty in the peak definition increases with the value of the shape parameter estimators. One can arrive at the same conclusion by noticing that the expression for the curvature of the likelihood function is in terms of the geometric variances :\frac= -\operatorname ln X/math> :\frac = -\operatorname[\ln (1-X)] These variances (and therefore the curvatures) are much larger for small values of the shape parameter α and β. However, for shape parameter values α, β > 1, the variances (and therefore the curvatures) flatten out. Equivalently, this result follows from the Cramér–Rao bound, since the
Fisher information
matrix components for the beta distribution are these logarithmic variances. The Cramér–Rao bound states that the
variance
of any ''unbiased'' estimator \hat of α is bounded by the multiplicative inverse, reciprocal of the
Fisher information
: :\mathrm(\hat)\geq\frac\geq\frac :\mathrm(\hat) \geq\frac\geq\frac so the variance of the estimators increases with increasing α and β, as the logarithmic variances decrease. Also one can express the joint log likelihood per ''N'' independent and identically distributed random variables, iid observations in terms of the
digamma function
expressions for the logarithms of the sample geometric means as follows: :\frac = (\alpha - 1)(\psi(\hat) - \psi(\hat + \hat))+(\beta- 1)(\psi(\hat) - \psi(\hat + \hat))- \ln \Beta(\alpha,\beta) this expression is identical to the negative of the cross-entropy (see section on "Quantities of information (entropy)"). Therefore, finding the maximum of the joint log likelihood of the shape parameters, per ''N'' independent and identically distributed random variables, iid observations, is identical to finding the minimum of the cross-entropy for the beta distribution, as a function of the shape parameters. :\frac = - H = -h - D_ = -\ln\Beta(\alpha,\beta)+(\alpha-1)\psi(\hat)+(\beta-1)\psi(\hat)-(\alpha+\beta-2)\psi(\hat+\hat) with the cross-entropy defined as follows: :H = \int_^1 - f(X;\hat,\hat) \ln (f(X;\alpha,\beta)) \, X
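As a concrete illustration of the two-parameter procedure, the following sketch solves the coupled digamma equations numerically with SciPy, starting from the logarithmic-approximation initial values quoted above. It assumes all observations lie strictly inside (0, 1); the function name `beta_mle` is arbitrary.

```python
# Two-parameter maximum likelihood: solve psi(a) - psi(a+b) = ln G_X and
# psi(b) - psi(a+b) = ln G_(1-X) for the shape parameters (a, b).
import numpy as np
from scipy.special import digamma
from scipy.optimize import fsolve

def beta_mle(x):
    x = np.asarray(x, dtype=float)
    ln_gx = np.mean(np.log(x))               # log of the sample geometric mean of X
    ln_g1x = np.mean(np.log1p(-x))           # log of the sample geometric mean of 1 - X
    gx, g1x = np.exp(ln_gx), np.exp(ln_g1x)
    a0 = 0.5 + gx / (2.0 * (1.0 - gx - g1x))      # Johnson-Kotz style initial values
    b0 = 0.5 + g1x / (2.0 * (1.0 - gx - g1x))
    def equations(p):
        a, b = p
        return (digamma(a) - digamma(a + b) - ln_gx,
                digamma(b) - digamma(a + b) - ln_g1x)
    a_hat, b_hat = fsolve(equations, (max(a0, 0.1), max(b0, 0.1)))
    return a_hat, b_hat
```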


Four unknown parameters

= The procedure is similar to the one followed in the two unknown parameter case. If ''Y''1, ..., ''YN'' are independent random variables each having a beta distribution with four parameters, the joint log likelihood function for ''N'' independent and identically distributed random variables, iid observations is: :\begin \ln\, \mathcal (\alpha, \beta, a, c\mid Y) &= \sum_^N \ln\,\mathcal_i (\alpha, \beta, a, c\mid Y_i)\\ &= \sum_^N \ln\,f(Y_i; \alpha, \beta, a, c) \\ &= \sum_^N \ln\,\frac\\ &= (\alpha - 1)\sum_^N \ln (Y_i - a) + (\beta- 1)\sum_^N \ln (c - Y_i)- N \ln \Beta(\alpha,\beta) - N (\alpha+\beta - 1) \ln (c - a) \end Finding the maximum with respect to a shape parameter involves taking the partial derivative with respect to the shape parameter and setting the expression equal to zero yielding the maximum likelihood estimator of the shape parameters: :\frac= \sum_^N \ln (Y_i - a) - N(-\psi(\alpha + \beta) + \psi(\alpha))- N \ln (c - a)= 0 :\frac = \sum_^N \ln (c - Y_i) - N(-\psi(\alpha + \beta) + \psi(\beta))- N \ln (c - a)= 0 :\frac = -(\alpha - 1) \sum_^N \frac \,+ N (\alpha+\beta - 1)\frac= 0 :\frac = (\beta- 1) \sum_^N \frac \,- N (\alpha+\beta - 1) \frac = 0 these equations can be re-arranged as the following system of four coupled equations (the first two equations are geometric means and the second two equations are the harmonic means) in terms of the maximum likelihood estimates for the four parameters \hat, \hat, \hat, \hat: :\frac\sum_^N \ln \frac = \psi(\hat)-\psi(\hat +\hat )= \ln \hat_X :\frac\sum_^N \ln \frac = \psi(\hat)-\psi(\hat + \hat)= \ln \hat_ :\frac = \frac= \hat_X :\frac = \frac = \hat_ with sample geometric means: :\hat_X = \prod_^ \left (\frac \right )^ :\hat_ = \prod_^ \left (\frac \right )^ The parameters \hat, \hat are embedded inside the geometric mean expressions in a nonlinear way (to the power 1/''N''). This precludes, in general, a closed form solution, even for an initial value approximation for iteration purposes. One alternative is to use as initial values for iteration the values obtained from the method of moments solution for the four parameter case. Furthermore, the expressions for the harmonic means are well-defined only for \hat, \hat > 1, which precludes a maximum likelihood solution for shape parameters less than unity in the four-parameter case. Fisher's information matrix for the four parameter case is Positive-definite matrix, positive-definite only for α, β > 2 (for further discussion, see section on Fisher information matrix, four parameter case), for bell-shaped (symmetric or unsymmetric) beta distributions, with inflection points located to either side of the mode. The following Fisher information components (that represent the expectations of the curvature of the log likelihood function) have mathematical singularity, singularities at the following values: :\alpha = 2: \quad \operatorname \left [- \frac \frac \right ]= _ :\beta = 2: \quad \operatorname\left [- \frac \frac \right ] = _ :\alpha = 2: \quad \operatorname\left [- \frac\frac\right ] = _ :\beta = 1: \quad \operatorname\left [- \frac\frac \right ] = _ (for further discussion see section on Fisher information matrix). Thus, it is not possible to strictly carry on the maximum likelihood estimation for some well known distributions belonging to the four-parameter beta distribution family, like the continuous uniform distribution, uniform distribution (Beta(1, 1, ''a'', ''c'')), and the arcsine distribution (Beta(1/2, 1/2, ''a'', ''c'')). 
Norman Lloyd Johnson, N.L.Johnson and Samuel Kotz, S.Kotz ignore the equations for the harmonic means and instead suggest "If a and c are unknown, and maximum likelihood estimators of ''a'', ''c'', α and β are required, the above procedure (for the two unknown parameter case, with ''X'' transformed as ''X'' = (''Y'' − ''a'')/(''c'' − ''a'')) can be repeated using a succession of trial values of ''a'' and ''c'', until the pair (''a'', ''c'') for which maximum likelihood (given ''a'' and ''c'') is as great as possible, is attained" (where, for the purpose of clarity, their notation for the parameters has been translated into the present notation).
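A brute-force version of this suggestion can be sketched as follows. The grid bounds, margins and helper names are assumptions, `beta_mle` refers to the two-parameter routine sketched in the previous subsection, and a real implementation would add convergence safeguards.

```python
# Profile the four-parameter likelihood over trial (a, c) pairs, fitting the two
# shape parameters by maximum likelihood at each pair, and keep the best pair.
import numpy as np
from scipy.stats import beta as beta_dist

def beta4_profile_mle(y, n_grid=25, margin=0.5):
    y = np.asarray(y, dtype=float)
    span = y.max() - y.min()
    a_grid = np.linspace(y.min() - margin * span, y.min() - 1e-6 * span, n_grid)
    c_grid = np.linspace(y.max() + 1e-6 * span, y.max() + margin * span, n_grid)
    best_params, best_ll = None, -np.inf
    for a in a_grid:
        for c in c_grid:
            x = (y - a) / (c - a)                 # map the data onto (0, 1)
            alpha, b = beta_mle(x)                # two-parameter MLE from the sketch above
            ll = np.sum(beta_dist.logpdf(y, alpha, b, loc=a, scale=c - a))
            if ll > best_ll:
                best_params, best_ll = (alpha, b, a, c), ll
    return best_params
```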


Fisher information matrix

Let a random variable X have a probability density ''f''(''x'';''α''). The partial derivative with respect to the (unknown, and to be estimated) parameter α of the log
likelihood function
is called the score (statistics), score. The second moment of the score is called the
Fisher information
: :\mathcal(\alpha)=\operatorname \left [\left (\frac \ln \mathcal(\alpha\mid X) \right )^2 \right], The expected value, expectation of the score (statistics), score is zero, therefore the Fisher information is also the second moment centered on the mean of the score: the
variance
of the score. If the log
likelihood function
is twice differentiable with respect to the parameter α, and under certain regularity conditions, then the Fisher information may also be written as follows (which is often a more convenient form for calculation purposes): :\mathcal(\alpha) = - \operatorname \left [\frac \ln (\mathcal(\alpha\mid X)) \right]. Thus, the Fisher information is the negative of the expectation of the second derivative with respect to the parameter α of the log
likelihood function
. Therefore, Fisher information is a measure of the curvature of the log likelihood function of α. A low curvature (and therefore high Radius of curvature (mathematics), radius of curvature), flatter log likelihood function curve has low Fisher information; while a log likelihood function curve with large curvature (and therefore low Radius of curvature (mathematics), radius of curvature) has high Fisher information. When the Fisher information matrix is computed at the evaluates of the parameters ("the observed Fisher information matrix") it is equivalent to the replacement of the true log likelihood surface by a Taylor's series approximation, taken as far as the quadratic terms. The word information, in the context of Fisher information, refers to information about the parameters. Information such as: estimation, sufficiency and properties of variances of estimators. The Cramér–Rao bound states that the inverse of the Fisher information is a lower bound on the variance of any
estimator
of a parameter α: :\operatorname[\hat\alpha] \geq \frac. The precision to which one can estimate the estimator of a parameter α is limited by the Fisher Information of the log likelihood function. The Fisher information is a measure of the minimum error involved in estimating a parameter of a distribution and it can be viewed as a measure of the resolving power of an experiment needed to discriminate between two alternative hypothesis of a parameter. When there are ''N'' parameters : \begin \theta_1 \\ \theta_ \\ \dots \\ \theta_ \end, then the Fisher information takes the form of an ''N''×''N'' positive semidefinite matrix, positive semidefinite symmetric matrix, the Fisher Information Matrix, with typical element: :_=\operatorname \left [\left (\frac \ln \mathcal \right) \left(\frac \ln \mathcal \right) \right ]. Under certain regularity conditions, the Fisher Information Matrix may also be written in the following form, which is often more convenient for computation: :_ = - \operatorname \left [\frac \ln (\mathcal) \right ]\,. With ''X''1, ..., ''XN'' iid random variables, an ''N''-dimensional "box" can be constructed with sides ''X''1, ..., ''XN''. Costa and Cover show that the (Shannon) differential entropy ''h''(''X'') is related to the volume of the typical set (having the sample entropy close to the true entropy), while the Fisher information is related to the surface of this typical set.
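These definitions can be checked numerically for the beta distribution. In the sketch below the parameter values and sample size are arbitrary; the Monte-Carlo variance of the score with respect to α should be close to the closed form ψ1(α) − ψ1(α + β) given in the next subsection, and the mean of the score should be close to zero.

```python
# Numerical check: Fisher information as the variance of the score, for Beta(alpha, beta).
import numpy as np
from scipy.special import digamma, polygamma
from scipy.stats import beta as beta_dist

alpha, b = 2.0, 3.0
x = beta_dist.rvs(alpha, b, size=200_000, random_state=12345)

score_alpha = np.log(x) - (digamma(alpha) - digamma(alpha + b))  # d(log L)/d(alpha) per observation
print(score_alpha.mean())                              # ~ 0 (the score has zero expectation)
print(score_alpha.var())                               # Monte-Carlo Fisher information
print(polygamma(1, alpha) - polygamma(1, alpha + b))   # trigamma closed form
```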


Two parameters

= For ''X''1, ..., ''X''''N'' independent random variables each having a beta distribution parametrized with shape parameters ''α'' and ''β'', the joint log likelihood function for ''N'' independent and identically distributed random variables, iid observations is: :\ln (\mathcal (\alpha, \beta\mid X) )= (\alpha - 1)\sum_^N \ln X_i + (\beta- 1)\sum_^N \ln (1-X_i)- N \ln \Beta(\alpha,\beta) therefore the joint log likelihood function per ''N'' independent and identically distributed random variables, iid observations is: :\frac \ln(\mathcal (\alpha, \beta\mid X)) = (\alpha - 1)\frac\sum_^N \ln X_i + (\beta- 1)\frac\sum_^N \ln (1-X_i)-\, \ln \Beta(\alpha,\beta) For the two parameter case, the Fisher information has 4 components: 2 diagonal and 2 off-diagonal. Since the Fisher information matrix is symmetric, one of these off diagonal components is independent. Therefore, the Fisher information matrix has 3 independent components (2 diagonal and 1 off diagonal). Aryal and Nadarajah calculated Fisher's information matrix for the four-parameter case, from which the two parameter case can be obtained as follows: :- \frac= \operatorname[\ln (X)]= \psi_1(\alpha) - \psi_1(\alpha + \beta) =_= \operatorname\left [- \frac \right ] = \ln \operatorname_ :- \frac = \operatorname ln (1-X)= \psi_1(\beta) - \psi_1(\alpha + \beta) =_= \operatorname\left [- \frac \right]= \ln \operatorname_ :- \frac = \operatorname[\ln X,\ln(1-X)] = -\psi_1(\alpha+\beta) =_= \operatorname\left [- \frac \right] = \ln \operatorname_ Since the Fisher information matrix is symmetric : \mathcal_= \mathcal_= \ln \operatorname_ The Fisher information components are equal to the log geometric variances and log geometric covariance. Therefore, they can be expressed as
trigamma function
s, denoted ψ1(α), the second of the
polygamma function
s, defined as the derivative of the digamma function: :\psi_1(\alpha) = \frac=\, \frac. These derivatives are also derived in the and plots of the log likelihood function are also shown in that section. contains plots and further discussion of the Fisher information matrix components: the log geometric variances and log geometric covariance as a function of the shape parameters α and β. contains formulas for moments of logarithmically transformed random variables. Images for the Fisher information components \mathcal_, \mathcal_ and \mathcal_ are shown in . The determinant of Fisher's information matrix is of interest (for example for the calculation of Jeffreys prior probability). From the expressions for the individual components of the Fisher information matrix, it follows that the determinant of Fisher's (symmetric) information matrix for the beta distribution is: :\begin \det(\mathcal(\alpha, \beta))&= \mathcal_ \mathcal_-\mathcal_ \mathcal_ \\ pt&=(\psi_1(\alpha) - \psi_1(\alpha + \beta))(\psi_1(\beta) - \psi_1(\alpha + \beta))-( -\psi_1(\alpha+\beta))( -\psi_1(\alpha+\beta))\\ pt&= \psi_1(\alpha)\psi_1(\beta)-( \psi_1(\alpha)+\psi_1(\beta))\psi_1(\alpha + \beta)\\ pt\lim_ \det(\mathcal(\alpha, \beta)) &=\lim_ \det(\mathcal(\alpha, \beta)) = \infty\\ pt\lim_ \det(\mathcal(\alpha, \beta)) &=\lim_ \det(\mathcal(\alpha, \beta)) = 0 \end From Sylvester's criterion (checking whether the diagonal elements are all positive), it follows that the Fisher information matrix for the two parameter case is Positive-definite matrix, positive-definite (under the standard condition that the shape parameters are positive ''α'' > 0 and ''β'' > 0).
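The 2×2 matrix and its determinant are straightforward to evaluate with the trigamma function. The sketch below uses arbitrary parameter values; it assembles the matrix from the expressions above and confirms positive-definiteness numerically.

```python
# Two-parameter Fisher information matrix of the beta distribution via trigamma functions.
import numpy as np
from scipy.special import polygamma

def beta_fisher_info(alpha, beta):
    t_a = polygamma(1, alpha)               # psi_1(alpha)
    t_b = polygamma(1, beta)                # psi_1(beta)
    t_ab = polygamma(1, alpha + beta)       # psi_1(alpha + beta)
    return np.array([[t_a - t_ab, -t_ab],
                     [-t_ab,      t_b - t_ab]])

I = beta_fisher_info(2.0, 3.0)
print(np.linalg.det(I))                     # equals psi1(a)psi1(b) - (psi1(a)+psi1(b))psi1(a+b)
print(np.all(np.linalg.eigvalsh(I) > 0))    # True: positive-definite for alpha, beta > 0
```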


Four parameters

= If ''Y''1, ..., ''YN'' are independent random variables each having a beta distribution with four parameters: the exponents ''α'' and ''β'', and also ''a'' (the minimum of the distribution range), and ''c'' (the maximum of the distribution range) (section titled "Alternative parametrizations", "Four parameters"), with
probability density function
: :f(y; \alpha, \beta, a, c) = \frac =\frac=\frac. the joint log likelihood function per ''N'' independent and identically distributed random variables, iid observations is: :\frac \ln(\mathcal (\alpha, \beta, a, c\mid Y))= \frac\sum_^N \ln (Y_i - a) + \frac\sum_^N \ln (c - Y_i)- \ln \Beta(\alpha,\beta) - (\alpha+\beta -1) \ln (c-a) For the four parameter case, the Fisher information has 4*4=16 components. It has 12 off-diagonal components = (4×4 total − 4 diagonal). Since the Fisher information matrix is symmetric, half of these components (12/2=6) are independent. Therefore, the Fisher information matrix has 6 independent off-diagonal + 4 diagonal = 10 independent components. Aryal and Nadarajah calculated Fisher's information matrix for the four parameter case as follows: :- \frac \frac= \operatorname[\ln (X)]= \psi_1(\alpha) - \psi_1(\alpha + \beta) = \mathcal_= \operatorname\left [- \frac \frac \right ] = \ln (\operatorname) :-\frac \frac = \operatorname ln (1-X)= \psi_1(\beta) - \psi_1(\alpha + \beta) =_= \operatorname \left [- \frac \frac \right ] = \ln(\operatorname) :-\frac \frac = \operatorname[\ln X,(1-X)] = -\psi_1(\alpha+\beta) =\mathcal_= \operatorname \left [- \frac\frac \right ] = \ln(\operatorname_) In the above expressions, the use of ''X'' instead of ''Y'' in the expressions var[ln(''X'')] = ln(var''GX'') is ''not an error''. The expressions in terms of the log geometric variances and log geometric covariance occur as functions of the two parameter ''X'' ~ Beta(''α'', ''β'') parametrization because when taking the partial derivatives with respect to the exponents (''α'', ''β'') in the four parameter case, one obtains the identical expressions as for the two parameter case: these terms of the four parameter Fisher information matrix are independent of the minimum ''a'' and maximum ''c'' of the distribution's range. The only non-zero term upon double differentiation of the log likelihood function with respect to the exponents ''α'' and ''β'' is the second derivative of the log of the beta function: ln(B(''α'', ''β'')). This term is independent of the minimum ''a'' and maximum ''c'' of the distribution's range. Double differentiation of this term results in trigamma functions. The sections titled "Maximum likelihood", "Two unknown parameters" and "Four unknown parameters" also show this fact. The Fisher information for ''N'' i.i.d. samples is ''N'' times the individual Fisher information (eq. 11.279, page 394 of Cover and Thomas). (Aryal and Nadarajah take a single observation, ''N'' = 1, to calculate the following components of the Fisher information, which leads to the same result as considering the derivatives of the log likelihood per ''N'' observations. Moreover, below the erroneous expression for _ in Aryal and Nadarajah has been corrected.) 
:\begin \alpha > 2: \quad \operatorname\left [- \frac \frac \right ] &= _=\frac \\ \beta > 2: \quad \operatorname\left[-\frac \frac \right ] &= \mathcal_ = \frac \\ \operatorname\left[- \frac \frac \right ] &= _ = \frac \\ \alpha > 1: \quad \operatorname\left[- \frac \frac \right ] &=\mathcal_ = \frac \\ \operatorname\left[- \frac \frac \right ] &= _ = \frac \\ \operatorname\left[- \frac \frac \right ] &= _ = -\frac \\ \beta > 1: \quad \operatorname\left[- \frac \frac \right ] &= \mathcal_ = -\frac \end The lower two diagonal entries of the Fisher information matrix, with respect to the parameter "a" (the minimum of the distribution's range): \mathcal_, and with respect to the parameter "c" (the maximum of the distribution's range): \mathcal_ are only defined for exponents α > 2 and β > 2 respectively. The Fisher information matrix component \mathcal_ for the minimum "a" approaches infinity for exponent α approaching 2 from above, and the Fisher information matrix component \mathcal_ for the maximum "c" approaches infinity for exponent β approaching 2 from above. The Fisher information matrix for the four parameter case does not depend on the individual values of the minimum "a" and the maximum "c", but only on the total range (''c''−''a''). Moreover, the components of the Fisher information matrix that depend on the range (''c''−''a''), depend only through its inverse (or the square of the inverse), such that the Fisher information decreases for increasing range (''c''−''a''). The accompanying images show the Fisher information components \mathcal_ and \mathcal_. Images for the Fisher information components \mathcal_ and \mathcal_ are shown in . All these Fisher information components look like a basin, with the "walls" of the basin being located at low values of the parameters. The following four-parameter-beta-distribution Fisher information components can be expressed in terms of the two-parameter: ''X'' ~ Beta(α, β) expectations of the transformed ratio ((1-''X'')/''X'') and of its mirror image (''X''/(1-''X'')), scaled by the range (''c''−''a''), which may be helpful for interpretation: :\mathcal_ =\frac= \frac \text\alpha > 1 :\mathcal_ = -\frac=- \frac\text\beta> 1 These are also the expected values of the "inverted beta distribution" or
beta prime distribution
(also known as beta distribution of the second kind or Pearson distribution, Pearson's Type VI) and its mirror image, scaled by the range (''c'' − ''a''). Also, the following Fisher information components can be expressed in terms of the harmonic (1/X) variances or of variances based on the ratio transformed variables ((1-X)/X) as follows: :\begin \alpha > 2: \quad \mathcal_ &=\operatorname \left [\frac \right] \left (\frac \right )^2 =\operatorname \left [\frac \right ] \left (\frac \right)^2 = \frac \\ \beta > 2: \quad \mathcal_ &= \operatorname \left [\frac \right ] \left (\frac \right )^2 = \operatorname \left [\frac \right ] \left (\frac \right )^2 =\frac \\ \mathcal_ &=\operatorname \left [\frac,\frac \right ]\frac = \operatorname \left [\frac,\frac \right ] \frac =\frac \end See section "Moments of linearly transformed, product and inverted random variables" for these expectations. The determinant of Fisher's information matrix is of interest (for example for the calculation of Jeffreys prior probability). From the expressions for the individual components, it follows that the determinant of Fisher's (symmetric) information matrix for the beta distribution with four parameters is: :\begin \det(\mathcal(\alpha,\beta,a,c)) = & -\mathcal_^2 \mathcal_ \mathcal_+\mathcal_ \mathcal_ \mathcal_ \mathcal_+\mathcal_^2 \mathcal_^2 -\mathcal_ \mathcal_ \mathcal_^2\\ & -\mathcal_ \mathcal_ \mathcal_ \mathcal_+\mathcal_^2 \mathcal_ \mathcal_+2 \mathcal_ \mathcal_ \mathcal_ \mathcal_\\ & -2\mathcal_ \mathcal_ \mathcal_ \mathcal_+\mathcal_^2 \mathcal_^2-\mathcal_ \mathcal_ \mathcal_^2+\mathcal_ \mathcal_^2 \mathcal_\\ & -\mathcal_ \mathcal_ \mathcal_ \mathcal_-\mathcal_ \mathcal_ \mathcal_ \mathcal_+\mathcal_ \mathcal_ \mathcal_ \mathcal_\\ & -\mathcal_ \mathcal_ \mathcal_ \mathcal_+\mathcal_ \mathcal_ \mathcal_ \mathcal_-\mathcal_ \mathcal_^2 \mathcal_\\ & +2 \mathcal_ \mathcal_ \mathcal_ \mathcal_-\mathcal_ \mathcal_^2 \mathcal_-\mathcal_^2 \mathcal_ \mathcal_+\mathcal_ \mathcal_ \mathcal_ \mathcal_\text\alpha, \beta> 2 \end Using Sylvester's criterion (checking whether the diagonal elements are all positive), and since diagonal components _ and _ have Mathematical singularity, singularities at α=2 and β=2 it follows that the Fisher information matrix for the four parameter case is Positive-definite matrix, positive-definite for α>2 and β>2. Since for α > 2 and β > 2 the beta distribution is (symmetric or unsymmetric) bell shaped, it follows that the Fisher information matrix is positive-definite only for bell-shaped (symmetric or unsymmetric) beta distributions, with inflection points located to either side of the mode. Thus, important well known distributions belonging to the four-parameter beta distribution family, like the parabolic distribution (Beta(2,2,a,c)) and the continuous uniform distribution, uniform distribution (Beta(1,1,a,c)) have Fisher information components (\mathcal_,\mathcal_,\mathcal_,\mathcal_) that blow up (approach infinity) in the four-parameter case (although their Fisher information components are all defined for the two parameter case). The four-parameter Wigner semicircle distribution (Beta(3/2,3/2,''a'',''c'')) and arcsine distribution (Beta(1/2,1/2,''a'',''c'')) have negative Fisher information determinants for the four-parameter case.


Bayesian inference

The use of Beta distributions in Bayesian inference is due to the fact that they provide a family of conjugate prior probability distributions for binomial (including
Bernoulli
) and geometric distributions. The domain of the beta distribution can be viewed as a probability, and in fact the beta distribution is often used to describe the distribution of a probability value ''p'': :P(p;\alpha,\beta) = \frac. Examples of beta distributions used as prior probabilities to represent ignorance of prior parameter values in Bayesian inference are Beta(1,1), Beta(0,0) and Beta(1/2,1/2).
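Conjugacy makes the posterior update elementary: a Beta(αprior, βprior) prior on ''p'' combined with ''s'' successes in ''n'' Bernoulli trials yields a Beta(αprior + ''s'', βprior + ''n'' − ''s'') posterior, as derived explicitly in the section on the effect of different priors below. A minimal sketch with illustrative numbers:

```python
# Conjugate beta-binomial update: prior Beta(a0, b0), data s successes out of n trials.
from scipy.stats import beta as beta_dist

def posterior(a0, b0, s, n):
    return beta_dist(a0 + s, b0 + (n - s))

post = posterior(1.0, 1.0, s=7, n=10)        # uniform Beta(1,1) prior, 7 successes in 10
print(post.mean())                           # (7+1)/(10+2) = 2/3
print(post.interval(0.95))                   # central 95% posterior interval
```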


Rule of succession

A classic application of the beta distribution is the rule of succession, introduced in the 18th century by Pierre-Simon Laplace in the course of treating the sunrise problem. It states that, given ''s'' successes in ''n'' conditional independence, conditionally independent Bernoulli trials with probability ''p,'' that the estimate of the expected value in the next trial is \frac. This estimate is the expected value of the posterior distribution over ''p,'' namely Beta(''s''+1, ''n''−''s''+1), which is given by Bayes' rule if one assumes a uniform prior probability over ''p'' (i.e., Beta(1, 1)) and then observes that ''p'' generated ''s'' successes in ''n'' trials. Laplace's rule of succession has been criticized by prominent scientists. R. T. Cox described Laplace's application of the rule of succession to the sunrise problem ( p. 89) as "a travesty of the proper use of the principle." Keynes remarks ( Ch.XXX, p. 382) "indeed this is so foolish a theorem that to entertain it is discreditable." Karl Pearson showed that the probability that the next (''n'' + 1) trials will be successes, after n successes in n trials, is only 50%, which has been considered too low by scientists like Jeffreys and unacceptable as a representation of the scientific process of experimentation to test a proposed scientific law. As pointed out by Jeffreys ( p. 128) (crediting C. D. Broad ) Laplace's rule of succession establishes a high probability of success ((n+1)/(n+2)) in the next trial, but only a moderate probability (50%) that a further sample (n+1) comparable in size will be equally successful. As pointed out by Perks, "The rule of succession itself is hard to accept. It assigns a probability to the next trial which implies the assumption that the actual run observed is an average run and that we are always at the end of an average run. It would, one would think, be more reasonable to assume that we were in the middle of an average run. Clearly a higher value for both probabilities is necessary if they are to accord with reasonable belief." These problems with Laplace's rule of succession motivated Haldane, Perks, Jeffreys and others to search for other forms of prior probability (see the next ). According to Jaynes, the main problem with the rule of succession is that it is not valid when s=0 or s=n (see rule of succession, for an analysis of its validity).
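Both the rule of succession and Pearson's 50% observation follow from elementary beta integrals. The short sketch below is an illustration (the chosen ''n'' is arbitrary): it evaluates the next-trial probability (''s'' + 1)/(''n'' + 2) and the probability, after ''n'' successes in ''n'' trials under the uniform prior, that a further run of ''n'' + 1 trials is all successes.

```python
# Rule of succession and Pearson's 50% observation, checked with beta integrals.
from scipy.special import beta as B

def next_trial_success(s, n):
    # Posterior mean of Beta(s+1, n-s+1), i.e. Laplace's rule of succession.
    return (s + 1) / (n + 2)

def prob_next_m_all_successes(n, m):
    # E[p^m] under the posterior Beta(n+1, 1) obtained after n successes in n trials.
    return B(n + m + 1, 1) / B(n + 1, 1)

n = 10
print(next_trial_success(n, n))              # 11/12: high probability for the next trial
print(prob_next_m_all_successes(n, n + 1))   # 0.5: only even odds for a further run of n+1
```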


Bayes-Laplace prior probability (Beta(1,1))

The beta distribution achieves maximum differential entropy for Beta(1,1): the Uniform density, uniform probability density, for which all values in the domain of the distribution have equal density. This uniform distribution Beta(1,1) was suggested ("with a great deal of doubt") by Thomas Bayes as the prior probability distribution to express ignorance about the correct prior distribution. This prior distribution was adopted (apparently, from his writings, with little sign of doubt) by Pierre-Simon Laplace, and hence it was also known as the "Bayes-Laplace rule" or the "Laplace rule" of "inverse probability" in publications of the first half of the 20th century. In the later part of the 19th century and early part of the 20th century, scientists realized that the assumption of uniform "equal" probability density depended on the actual functions (for example whether a linear or a logarithmic scale was most appropriate) and parametrizations used. In particular, the behavior near the ends of distributions with finite support (for example near ''x'' = 0, for a distribution with initial support at ''x'' = 0) required particular attention. Keynes ( Ch.XXX, p. 381) criticized the use of Bayes's uniform prior probability (Beta(1,1)) that all values between zero and one are equiprobable, as follows: "Thus experience, if it shows anything, shows that there is a very marked clustering of statistical ratios in the neighborhoods of zero and unity, of those for positive theories and for correlations between positive qualities in the neighborhood of zero, and of those for negative theories and for correlations between negative qualities in the neighborhood of unity. "


Haldane's prior probability (Beta(0,0))

The Beta(0,0) distribution was proposed by J.B.S. Haldane, who suggested that the prior probability representing complete uncertainty should be proportional to ''p''−1(1−''p'')−1. The function ''p''−1(1−''p'')−1 can be viewed as the limit of the numerator of the beta distribution as both shape parameters approach zero: α, β → 0. The Beta function (in the denominator of the beta distribution) approaches infinity, for both parameters approaching zero, α, β → 0. Therefore, ''p''−1(1−''p'')−1 divided by the Beta function approaches a 2-point
Bernoulli distribution
with equal probability 1/2 at each end, at 0 and 1, and nothing in between, as α, β → 0. A coin-toss: one face of the coin being at 0 and the other face being at 1. The Haldane prior probability distribution Beta(0,0) is an "improper prior" because its integration (from 0 to 1) fails to strictly converge to 1 due to the singularities at each end. However, this is not an issue for computing posterior probabilities unless the sample size is very small. Furthermore, Zellner points out that on the log-odds scale, (the logit transformation ln(''p''/1−''p'')), the Haldane prior is the uniformly flat prior. The fact that a uniform prior probability on the logit transformed variable ln(''p''/1−''p'') (with domain (-∞, ∞)) is equivalent to the Haldane prior on the domain
[0, 1]
was pointed out by Harold Jeffreys in the first edition (1939) of his book Theory of Probability ( p. 123). Jeffreys writes "Certainly if we take the Bayes-Laplace rule right up to the extremes we are led to results that do not correspond to anybody's way of thinking. The (Haldane) rule d''x''/(''x''(1−''x'')) goes too far the other way. It would lead to the conclusion that if a sample is of one type with respect to some property there is a probability 1 that the whole population is of that type." The fact that "uniform" depends on the parametrization, led Jeffreys to seek a form of prior that would be invariant under different parametrizations.


Jeffreys' prior probability (Beta(1/2,1/2) for a Bernoulli or for a binomial distribution)

Harold Jeffreys proposed to use an
uninformative prior
probability measure that should be Parametrization invariance, invariant under reparameterization: proportional to the square root of the determinant of Fisher's information matrix. For the
Bernoulli distribution
, this can be shown as follows: for a coin that is "heads" with probability ''p'' ∈
[0, 1]
and is "tails" with probability 1 − ''p'', for a given (H,T) ∈ the probability is ''pH''(1 − ''p'')''T''. Since ''T'' = 1 − ''H'', the
Bernoulli distribution
is ''pH''(1 − ''p'')1 − ''H''. Considering ''p'' as the only parameter, it follows that the log likelihood for the Bernoulli distribution is :\ln \mathcal (p\mid H) = H \ln(p)+ (1-H) \ln(1-p). The Fisher information matrix has only one component (it is a scalar, because there is only one parameter: ''p''), therefore: :\begin \sqrt &= \sqrt \\ pt&= \sqrt \\ pt&= \sqrt \\ &= \frac. \end Similarly, for the Binomial distribution with ''n'' Bernoulli trials, it can be shown that :\sqrt= \frac. Thus, for the
Bernoulli
, and Binomial distributions, Jeffreys prior is proportional to \scriptstyle \frac, which happens to be proportional to a beta distribution with domain variable ''x'' = ''p'', and shape parameters α = β = 1/2, the arcsine distribution: :Beta(\tfrac, \tfrac) = \frac. It will be shown in the next section that the normalizing constant for Jeffreys prior is immaterial to the final result because the normalizing constant cancels out in Bayes theorem for the posterior probability. Hence Beta(1/2,1/2) is used as the Jeffreys prior for both Bernoulli and binomial distributions. As shown in the next section, when using this expression as a prior probability times the likelihood in Bayes theorem, the posterior probability turns out to be a beta distribution. It is important to realize, however, that Jeffreys prior is proportional to \scriptstyle \frac for the Bernoulli and binomial distribution, but not for the beta distribution. Jeffreys prior for the beta distribution is given by the determinant of Fisher's information for the beta distribution, which, as shown in the is a function of the
trigamma function
ψ1 of shape parameters α and β as follows: : \begin \sqrt &= \sqrt \\ \lim_ \sqrt &=\lim_ \sqrt = \infty\\ \lim_ \sqrt &=\lim_ \sqrt = 0 \end As previously discussed, Jeffreys prior for the Bernoulli and binomial distributions is proportional to the arcsine distribution Beta(1/2,1/2), a one-dimensional ''curve'' that looks like a basin as a function of the parameter ''p'' of the Bernoulli and binomial distributions. The walls of the basin are formed by ''p'' approaching the singularities at the ends ''p'' → 0 and ''p'' → 1, where Beta(1/2,1/2) approaches infinity. Jeffreys prior for the beta distribution is a ''2-dimensional surface'' (embedded in a three-dimensional space) that looks like a basin with only two of its walls meeting at the corner α = β = 0 (and missing the other two walls) as a function of the shape parameters α and β of the beta distribution. The two adjoining walls of this 2-dimensional surface are formed by the shape parameters α and β approaching the singularities (of the trigamma function) at α, β → 0. It has no walls for α, β → ∞ because in this case the determinant of Fisher's information matrix for the beta distribution approaches zero. It will be shown in the next section that Jeffreys prior probability results in posterior probabilities (when multiplied by the binomial likelihood function) that are intermediate between the posterior probability results of the Haldane and Bayes prior probabilities. Jeffreys prior may be difficult to obtain analytically, and for some cases it just doesn't exist (even for simple distribution functions like the asymmetric triangular distribution). Berger, Bernardo and Sun, in a 2009 paper defined a reference prior probability distribution that (unlike Jeffreys prior) exists for the asymmetric triangular distribution. They cannot obtain a closed-form expression for their reference prior, but numerical calculations show it to be nearly perfectly fitted by the (proper) prior : \operatorname(\tfrac, \tfrac) \sim\frac where θ is the vertex variable for the asymmetric triangular distribution with support
[0, 1]
(corresponding to the following parameter values in Wikipedia's article on the triangular distribution: vertex ''c'' = ''θ'', left end ''a'' = 0,and right end ''b'' = 1). Berger et al. also give a heuristic argument that Beta(1/2,1/2) could indeed be the exact Berger–Bernardo–Sun reference prior for the asymmetric triangular distribution. Therefore, Beta(1/2,1/2) not only is Jeffreys prior for the Bernoulli and binomial distributions, but also seems to be the Berger–Bernardo–Sun reference prior for the asymmetric triangular distribution (for which the Jeffreys prior does not exist), a distribution used in project management and PERT analysis to describe the cost and duration of project tasks. Clarke and Barron prove that, among continuous positive priors, Jeffreys prior (when it exists) asymptotically maximizes Shannon's mutual information between a sample of size n and the parameter, and therefore ''Jeffreys prior is the most uninformative prior'' (measuring information as Shannon information). The proof rests on an examination of the Kullback–Leibler divergence between probability density functions for iid random variables.
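The proportionality between the square root of the Bernoulli Fisher information and the Beta(1/2, 1/2) density can be verified directly; in the following sketch the grid of ''p'' values is arbitrary, and the ratio printed is the constant π, the normalizing constant Β(1/2, 1/2) of the arcsine distribution.

```python
# Check that sqrt(Fisher information) for a Bernoulli trial, 1/sqrt(p(1-p)),
# is proportional to the Beta(1/2, 1/2) (arcsine) density.
import numpy as np
from scipy.stats import beta as beta_dist

p = np.linspace(0.01, 0.99, 5)                 # arbitrary grid of probabilities
sqrt_fisher = 1.0 / np.sqrt(p * (1.0 - p))     # sqrt I(p) for one Bernoulli observation
jeffreys = beta_dist.pdf(p, 0.5, 0.5)          # Beta(1/2, 1/2) density
print(sqrt_fisher / jeffreys)                  # constant ratio pi = B(1/2, 1/2) at every p
```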


Effect of different prior probability choices on the posterior beta distribution

If samples are drawn from the population of a random variable ''X'' that result in ''s'' successes and ''f'' failures in "n" Bernoulli trials ''n'' = ''s'' + ''f'', then the
likelihood function
for parameters ''s'' and ''f'' given ''x'' = ''p'' (the notation ''x'' = ''p'' in the expressions below will emphasize that the domain ''x'' stands for the value of the parameter ''p'' in the binomial distribution), is the following binomial distribution: :\mathcal(s,f\mid x=p) = x^s(1-x)^f = x^s(1-x)^. If beliefs about prior probability information are reasonably well approximated by a beta distribution with parameters ''α'' Prior and ''β'' Prior, then: :(x=p;\alpha \operatorname,\beta \operatorname) = \frac According to Bayes' theorem for a continuous event space, the posterior probability is given by the product of the prior probability and the likelihood function (given the evidence ''s'' and ''f'' = ''n'' − ''s''), normalized so that the area under the curve equals one, as follows: :\begin & \operatorname(x=p\mid s,n-s) \\ pt= & \frac \\ pt= & \frac \\ pt= & \frac \\ pt= & \frac. \end The binomial coefficient :

:{n \choose s}=\frac{n!}{s!(n-s)!}=\frac{\Gamma(n+1)}{\Gamma(s+1)\,\Gamma(n-s+1)}
appears both in the numerator and the denominator of the posterior probability, and it does not depend on the integration variable ''x'', hence it cancels out, and it is irrelevant to the final result. Similarly, the normalizing factor for the prior probability, the beta function B(''α'' Prior, ''β'' Prior), cancels out and it is immaterial to the final result. The same posterior probability result can be obtained if one uses an un-normalized prior

:x^{\alpha \operatorname{Prior}-1}(1-x)^{\beta \operatorname{Prior}-1}

because the normalizing factors all cancel out. Several authors (including Jeffreys himself) thus use an un-normalized prior formula since the normalization constant cancels out. The numerator of the posterior probability ends up being just the (un-normalized) product of the prior probability and the likelihood function, and the denominator is its integral from zero to one. The beta function in the denominator, B(''s'' + ''α'' Prior, ''n'' − ''s'' + ''β'' Prior), appears as a normalization constant to ensure that the total posterior probability integrates to unity.

The ratio ''s''/''n'' of the number of successes to the total number of trials is a sufficient statistic in the binomial case, which is relevant for the following results.

For the Bayes' prior probability (Beta(1,1)), the posterior probability is:

:\operatorname{posterior probability}(p=x\mid s,f) = \frac{x^s(1-x)^{n-s}}{\Beta(s+1,\,n-s+1)}, \text{ with mean} =\frac{s+1}{n+2},\text{ and mode} =\frac{s}{n}\text{ (if } 0 < s < n).

For the Jeffreys' prior probability (Beta(1/2,1/2)), the posterior probability is:

:\operatorname{posterior probability}(p=x\mid s,f) = \frac{x^{s-\frac{1}{2}}(1-x)^{n-s-\frac{1}{2}}}{\Beta(s+\frac{1}{2},\,n-s+\frac{1}{2})},\text{ with mean} = \frac{s+\frac{1}{2}}{n+1},\text{ and mode} =\frac{s-\frac{1}{2}}{n-1}\text{ (if } \tfrac{1}{2} < s < n-\tfrac{1}{2}).

and for the Haldane prior probability (Beta(0,0)), the posterior probability is:

:\operatorname{posterior probability}(p=x\mid s,f) = \frac{x^{s-1}(1-x)^{n-s-1}}{\Beta(s,\,n-s)}, \text{ with mean} = \frac{s}{n},\text{ and mode} =\frac{s-1}{n-2}\text{ (if } 1 < s < n -1).

From the above expressions it follows that for ''s''/''n'' = 1/2 all three of the above prior probabilities result in the identical location for the posterior probability mean = mode = 1/2. For ''s''/''n'' < 1/2, the means of the posterior probabilities, using the following priors, are such that: mean for Bayes prior > mean for Jeffreys prior > mean for Haldane prior. For ''s''/''n'' > 1/2 the order of these inequalities is reversed, such that the Haldane prior probability results in the largest posterior mean. The ''Haldane'' prior probability Beta(0,0) results in a posterior probability density with ''mean'' (the expected value for the probability of success in the "next" trial) identical to the ratio ''s''/''n'' of the number of successes to the total number of trials. Therefore, the Haldane prior results in a posterior probability with expected value in the next trial equal to the maximum likelihood estimate ''s''/''n''. The ''Bayes'' prior probability Beta(1,1) results in a posterior probability density with ''mode'' identical to the ratio ''s''/''n'' (the maximum likelihood estimate).

In the case that 100% of the trials have been successful (''s'' = ''n''), the ''Bayes'' prior probability Beta(1,1) results in a posterior expected value equal to the rule of succession (''n'' + 1)/(''n'' + 2), while the Haldane prior Beta(0,0) results in a posterior expected value of 1 (absolute certainty of success in the next trial). Jeffreys prior probability results in a posterior expected value equal to (''n'' + 1/2)/(''n'' + 1). Perks (p. 303) points out: "This provides a new rule of succession and expresses a 'reasonable' position to take up, namely, that after an unbroken run of n successes we assume a probability for the next trial equivalent to the assumption that we are about half-way through an average run, i.e. that we expect a failure once in (2''n'' + 2) trials. 
The Bayes–Laplace rule implies that we are about at the end of an average run or that we expect a failure once in (''n'' + 2) trials. The comparison clearly favours the new result (what is now called Jeffreys prior) from the point of view of 'reasonableness'."

Conversely, in the case that 100% of the trials have resulted in failure (''s'' = 0), the ''Bayes'' prior probability Beta(1,1) results in a posterior expected value for success in the next trial equal to 1/(''n'' + 2), while the Haldane prior Beta(0,0) results in a posterior expected value of success in the next trial of 0 (absolute certainty of failure in the next trial). Jeffreys prior probability results in a posterior expected value for success in the next trial equal to (1/2)/(''n'' + 1), which Perks (p. 303) points out: "is a much more reasonably remote result than the Bayes-Laplace result 1/(''n'' + 2)".

Jaynes questions (for the uniform prior Beta(1,1)) the use of these formulas for the cases ''s'' = 0 or ''s'' = ''n'' because the integrals do not converge (Beta(1,1) is an improper prior for ''s'' = 0 or ''s'' = ''n''). In practice, the conditions 0 < ''s'' < ''n'' are usually met. Considering the probability that an unbroken run of ''n'' successes will continue, Perks (p. 303) shows that, for what is now known as the Jeffreys prior, this probability is ((''n'' + 1/2)/(''n'' + 1))((''n'' + 3/2)/(''n'' + 2))...((2''n'' + 1/2)/(2''n'' + 1)), which for ''n'' = 1, 2, 3 gives 15/24, 315/480, 9009/13440; rapidly approaching a limiting value of 1/\sqrt{2} = 0.70710678\ldots as ''n'' tends to infinity. Perks remarks that what is now known as the Jeffreys prior: "is clearly more 'reasonable' than either the Bayes-Laplace result or the result on the (Haldane) alternative rule rejected by Jeffreys which gives certainty as the probability. It clearly provides a very much better correspondence with the process of induction. Whether it is 'absolutely' reasonable for the purpose, i.e. whether it is yet large enough, without the absurdity of reaching unity, is a matter for others to decide. But it must be realized that the result depends on the assumption of complete indifference and absence of knowledge prior to the sampling experiment."

Following are the variances of the posterior distribution obtained with these three prior probability distributions:

for the Bayes' prior probability (Beta(1,1)), the posterior variance is:

:\text{variance} = \frac{(s+1)(n-s+1)}{(n+2)^2(n+3)},\text{ which for } s=\frac{n}{2} \text{ reaches its maximum value } =\frac{1}{4(n+3)}

for the Jeffreys' prior probability (Beta(1/2,1/2)), the posterior variance is:

: \text{variance} = \frac{(s+\frac{1}{2})(n-s+\frac{1}{2})}{(n+1)^2(n+2)},\text{ which for } s=\frac{n}{2} \text{ reaches its maximum value } = \frac{1}{4(n+2)}

and for the Haldane prior probability (Beta(0,0)), the posterior variance is:

:\text{variance} = \frac{s(n-s)}{n^2(n+1)}, \text{ which for } s=\frac{n}{2} \text{ reaches its maximum value } =\frac{1}{4(n+1)}

So, as remarked by Silvey, for large ''n'', the variance is small and hence the posterior distribution is highly concentrated, whereas the assumed prior distribution was very diffuse. This is in accord with what one would hope for, as vague prior knowledge is transformed (through Bayes' theorem) into more precise posterior knowledge by an informative experiment. For small ''n'' the Haldane Beta(0,0) prior results in the largest posterior variance while the Bayes Beta(1,1) prior results in the most concentrated posterior. Jeffreys prior Beta(1/2,1/2) results in a posterior variance in between the other two. As ''n'' increases, the variance rapidly decreases so that the posterior variance for all three priors converges to approximately the same value (approaching zero variance as ''n'' → ∞). 
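To make the rule-of-succession comparison above concrete, the following sketch (the value of ''n'' is an arbitrary assumption for illustration) applies the conjugate update, posterior = Beta(''s'' + ''α'' Prior, ''n'' − ''s'' + ''β'' Prior), after an unbroken run of ''s'' = ''n'' successes and prints the posterior expected value of success on the next trial for each of the three priors:

<syntaxhighlight lang="python">
# Sketch: predictive probability of success on the next trial after s = n
# successes, i.e. the posterior mean (s + a0) / (n + a0 + b0), for the
# Haldane, Jeffreys and Bayes priors discussed above.
n = 10
priors = {"Haldane Beta(0,0)": (0.0, 0.0),
          "Jeffreys Beta(1/2,1/2)": (0.5, 0.5),
          "Bayes Beta(1,1)": (1.0, 1.0)}

for name, (a0, b0) in priors.items():
    s = n                                  # an unbroken run of successes
    a_post, b_post = s + a0, n - s + b0    # conjugate update
    p_next = a_post / (a_post + b_post)    # posterior mean of p
    print(f"{name}: posterior Beta({a_post}, {b_post}), "
          f"P(success on next trial) = {p_next:.4f}")

# Expected: 1.0 (Haldane), (n + 1/2)/(n + 1) = 0.9545 (Jeffreys),
# and (n + 1)/(n + 2) = 0.9167 (Bayes), matching the text above.
</syntaxhighlight>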
Recalling the previous result that the ''Haldane'' prior probability Beta(0,0) results in a posterior probability density with ''mean'' (the expected value for the probability of success in the "next" trial) identical to the ratio ''s''/''n'' of the number of successes to the total number of trials, it follows from the above expression that also the ''Haldane'' prior Beta(0,0) results in a posterior with ''variance'' identical to the variance expressed in terms of the maximum likelihood estimate ''s''/''n'' and sample size:

:\text{variance} = \frac{\mu(1-\mu)}{1+\nu}= \frac{\frac{s}{n}\left(1-\frac{s}{n}\right)}{1+n}

with the mean ''μ'' = ''s''/''n'' and the sample size ''ν'' = ''n''.

In Bayesian inference, using a prior distribution Beta(''α''Prior,''β''Prior) prior to a binomial distribution is equivalent to adding (''α''Prior − 1) pseudo-observations of "success" and (''β''Prior − 1) pseudo-observations of "failure" to the actual number of successes and failures observed, then estimating the parameter ''p'' of the binomial distribution by the proportion of successes over both real and pseudo-observations. A uniform prior Beta(1,1) does not add (or subtract) any pseudo-observations since for Beta(1,1) it follows that (''α''Prior − 1) = 0 and (''β''Prior − 1) = 0. The Haldane prior Beta(0,0) subtracts one pseudo-observation from each, and the Jeffreys prior Beta(1/2,1/2) subtracts 1/2 pseudo-observation of success and an equal number of failure. This subtraction has the effect of smoothing out the posterior distribution. If the proportion of successes is not 50% (''s''/''n'' ≠ 1/2), values of ''α''Prior and ''β''Prior less than 1 (and therefore negative (''α''Prior − 1) and (''β''Prior − 1)) favor sparsity, i.e. distributions where the parameter ''p'' is closer to either 0 or 1. In effect, values of ''α''Prior and ''β''Prior between 0 and 1, when operating together, function as a concentration parameter.

The accompanying plots show the posterior probability density functions for sample sizes ''n'' ∈ , successes ''s'' ∈  and Beta(''α''Prior,''β''Prior) ∈ . Also shown are the cases for ''n'' = , success ''s'' =  and Beta(''α''Prior,''β''Prior) ∈ . The first plot shows the symmetric cases, for successes ''s'' ∈ , with mean = mode = 1/2, and the second plot shows the skewed cases ''s'' ∈ . The images show that there is little difference between the priors for the posterior with sample size of 50 (characterized by a more pronounced peak near ''p'' = 1/2). Significant differences appear for very small sample sizes (in particular for the flatter distribution for the degenerate case of sample size = 3). Therefore, the skewed cases, with successes ''s'' = , show a larger effect from the choice of prior, at small sample size, than the symmetric cases. For symmetric distributions, the Bayes prior Beta(1,1) results in the most "peaky" and highest posterior distributions and the Haldane prior Beta(0,0) results in the flattest and lowest-peaked distribution. The Jeffreys prior Beta(1/2,1/2) lies in between them. For nearly symmetric, not too skewed distributions the effect of the priors is similar. For very small sample size (in this case for a sample size of 3) and skewed distribution (in this example for ''s'' ∈ ) the Haldane prior can result in a reverse-J-shaped distribution with a singularity at the left end. 
However, this happens only in degenerate cases (in this example ''n'' = 3 and hence ''s'' = 3/4 < 1, a degenerate value because s should be greater than unity in order for the posterior of the Haldane prior to have a mode located between the ends, and because ''s'' = 3/4 is not an integer number, hence it violates the initial assumption of a binomial distribution for the likelihood) and it is not an issue in generic cases of reasonable sample size (such that the condition 1 < ''s'' < ''n'' − 1, necessary for a mode to exist between both ends, is fulfilled). In Chapter 12 (p. 385) of his book, Jaynes asserts that the ''Haldane prior'' Beta(0,0) describes a ''prior state of knowledge of complete ignorance'', where we are not even sure whether it is physically possible for an experiment to yield either a success or a failure, while the ''Bayes (uniform) prior Beta(1,1) applies if'' one knows that ''both binary outcomes are possible''. Jaynes states: "''interpret the Bayes-Laplace (Beta(1,1)) prior as describing not a state of complete ignorance'', but the state of knowledge in which we have observed one success and one failure...once we have seen at least one success and one failure, then we know that the experiment is a true binary one, in the sense of physical possibility." Jaynes does not specifically discuss Jeffreys prior Beta(1/2,1/2) (Jaynes discussion of "Jeffreys prior" on pp. 181, 423 and on chapter 12 of Jaynes book refers instead to the improper, un-normalized, prior "1/''p'' ''dp''" introduced by Jeffreys in the 1939 edition of his book, seven years before he introduced what is now known as Jeffreys' invariant prior: the square root of the determinant of Fisher's information matrix. ''"1/p" is Jeffreys' (1946) invariant prior for the exponential distribution, not for the Bernoulli or binomial distributions''). However, it follows from the above discussion that Jeffreys Beta(1/2,1/2) prior represents a state of knowledge in between the Haldane Beta(0,0) and Bayes Beta (1,1) prior. Similarly, Karl Pearson in his 1892 book The Grammar of Science (p. 144 of 1900 edition) maintained that the Bayes (Beta(1,1) uniform prior was not a complete ignorance prior, and that it should be used when prior information justified to "distribute our ignorance equally"". K. Pearson wrote: "Yet the only supposition that we appear to have made is this: that, knowing nothing of nature, routine and anomy (from the Greek ανομία, namely: a- "without", and nomos "law") are to be considered as equally likely to occur. Now we were not really justified in making even this assumption, for it involves a knowledge that we do not possess regarding nature. We use our ''experience'' of the constitution and action of coins in general to assert that heads and tails are equally probable, but we have no right to assert before experience that, as we know nothing of nature, routine and breach are equally probable. In our ignorance we ought to consider before experience that nature may consist of all routines, all anomies (normlessness), or a mixture of the two in any proportion whatever, and that all such are equally probable. Which of these constitutions after experience is the most probable must clearly depend on what that experience has been like." 
If there is sufficient sampling data, ''and the posterior probability mode is not located at one of the extremes of the domain'' (''x'' = 0 or ''x'' = 1), the three priors of Bayes (Beta(1,1)), Jeffreys (Beta(1/2,1/2)) and Haldane (Beta(0,0)) should yield similar ''posterior'' probability densities. Otherwise, as Gelman et al. (p. 65) point out, "if so few data are available that the choice of noninformative prior distribution makes a difference, one should put relevant information into the prior distribution", or as Berger (p. 125) points out, "when different reasonable priors yield substantially different answers, can it be right to state that there ''is'' a single answer? Would it not be better to admit that there is scientific uncertainty, with the conclusion depending on prior beliefs?"
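A short numerical check of this point (a sketch with assumed counts: 25% observed successes and two arbitrary sample sizes) shows how strongly the three noninformative priors disagree for a handful of observations and how quickly the disagreement washes out as ''n'' grows:

<syntaxhighlight lang="python">
# Sketch: posterior mean and standard deviation under the Haldane, Jeffreys
# and Bayes priors for a small and a larger sample with 25% successes.
import math

priors = [("Haldane", 0.0, 0.0), ("Jeffreys", 0.5, 0.5), ("Bayes", 1.0, 1.0)]

for n in (4, 400):
    s = n // 4                                       # 25% observed successes
    for name, a0, b0 in priors:
        a, b = s + a0, n - s + b0                    # conjugate update
        mean = a / (a + b)
        sd = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
        print(f"n = {n:3d}, {name:8s}: posterior mean = {mean:.4f}, sd = {sd:.4f}")

# For n = 4 the posterior means differ noticeably (0.250, 0.300, 0.333);
# for n = 400 they are nearly identical (0.2500, 0.2506, 0.2512).
</syntaxhighlight>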


Occurrence and applications


Order statistics

The beta distribution has an important application in the theory of order statistics. A basic result is that the distribution of the ''k''th smallest of a sample of size ''n'' from a continuous uniform distribution has a beta distribution.David, H. A., Nagaraja, H. N. (2003) ''Order Statistics'' (3rd Edition). Wiley, New Jersey pp 458. This result is summarized as:

:U_{(k)} \sim \operatorname{Beta}(k,\,n+1-k).

From this, and application of the theory related to the probability integral transform, the distribution of any individual order statistic from any continuous distribution can be derived.
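This result is easy to verify by simulation; the following sketch (sample size, order ''k'' and seed are arbitrary example values) sorts uniform samples and compares the empirical distribution of the ''k''th smallest value with Beta(''k'', ''n'' + 1 − ''k''):

<syntaxhighlight lang="python">
# Sketch: the k-th smallest of n independent Uniform(0,1) draws should
# follow a Beta(k, n + 1 - k) distribution.
import numpy as np
from scipy.stats import beta, kstest

rng = np.random.default_rng(0)
n, k = 10, 3
# 100,000 samples of the k-th order statistic of n uniforms.
order_stats = np.sort(rng.uniform(size=(100_000, n)), axis=1)[:, k - 1]

print("sample mean     :", order_stats.mean())
print("theoretical mean:", k / (n + 1))               # mean of Beta(k, n+1-k)
print(kstest(order_stats, beta(k, n + 1 - k).cdf))    # goodness-of-fit check
</syntaxhighlight>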


Subjective logic

In standard logic, propositions are considered to be either true or false. In contradistinction, subjective logic assumes that humans cannot determine with absolute certainty whether a proposition about the real world is absolutely true or false. In subjective logic the a posteriori probability estimates of binary events can be represented by beta distributions. (A. Jøsang. A Logic for Uncertain Probabilities. ''International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems.'' 9(3), pp. 279–311, June 2001.)


Wavelet analysis

A wavelet is a wave-like oscillation with an amplitude that starts out at zero, increases, and then decreases back to zero. It can typically be visualized as a "brief oscillation" that promptly decays. Wavelets can be used to extract information from many different kinds of data, including – but certainly not limited to – audio signals and images. Thus, wavelets are purposefully crafted to have specific properties that make them useful for signal processing. Wavelets are localized in both time and frequency, whereas the standard Fourier transform is only localized in frequency. Therefore, standard Fourier transforms are only applicable to stationary processes, while wavelets are applicable to non-stationary processes. Continuous wavelets can be constructed based on the beta distribution. Beta wavelets (H.M. de Oliveira and G.A.A. Araújo. Compactly Supported One-cyclic Wavelets Derived from Beta Distributions. ''Journal of Communication and Information Systems.'' vol. 20, n. 3, pp. 27–33, 2005.) can be viewed as a soft variety of Haar wavelets whose shape is fine-tuned by two shape parameters α and β.


Population genetics

The Balding–Nichols model is a two-parameter parametrization of the beta distribution used in population genetics. It is a statistical description of the allele frequencies in the components of a sub-divided population:

: \begin{align}
\alpha &= \mu \nu,\\
\beta &= (1 - \mu) \nu,
\end{align}

where \nu =\alpha+\beta= \frac{1-F}{F} and 0 < F < 1; here ''F'' is (Wright's) genetic distance between two populations.
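As a small worked example (a sketch; the helper name and the numerical values of ''F'' and ''μ'' are assumptions, not from the source), the Balding–Nichols parameters can be converted to the usual beta shape parameters via ''ν'' = (1 − ''F'')/''F'':

<syntaxhighlight lang="python">
# Sketch: converting the Balding-Nichols parameters (F, mu) into the beta
# shape parameters alpha = mu * nu and beta = (1 - mu) * nu,
# where nu = alpha + beta = (1 - F) / F.
def balding_nichols_shapes(F, mu):
    """Return (alpha, beta) for Wright's F (0 < F < 1) and mean frequency mu."""
    nu = (1.0 - F) / F
    return mu * nu, (1.0 - mu) * nu


alpha, beta = balding_nichols_shapes(F=0.1, mu=0.3)
print(alpha, beta)   # Beta(2.7, 6.3): mean 0.3 and alpha + beta = 9
</syntaxhighlight>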


Project management: task cost and schedule modeling

The beta distribution can be used to model events which are constrained to take place within an interval defined by a minimum and maximum value. For this reason, the beta distribution — along with the triangular distribution — is used extensively in PERT, critical path method (CPM), Joint Cost Schedule Modeling (JCSM) and other project management/control systems to describe the time to completion and the cost of a task. In project management, shorthand computations are widely used to estimate the mean and standard deviation of the beta distribution:

: \begin{align}
\mu(X) & = \frac{a + 4b + c}{6} \\
\sigma(X) & = \frac{c-a}{6}
\end{align}

where ''a'' is the minimum, ''c'' is the maximum, and ''b'' is the most likely value (the mode for ''α'' > 1 and ''β'' > 1).

The above estimate for the mean \mu(X)= \frac{a + 4b + c}{6} is known as the PERT three-point estimation and it is exact for either of the following values of ''β'' (for arbitrary α within these ranges):

:''β'' = ''α'' > 1 (symmetric case) with standard deviation \sigma(X) = \frac{c-a}{2\sqrt{1+2\alpha}}, skewness = 0, and excess kurtosis = \frac{-6}{3+2\alpha}

or

:''β'' = 6 − ''α'' for 5 > ''α'' > 1 (skewed case) with standard deviation

:\sigma(X) = \frac{(c-a)\sqrt{\alpha(6-\alpha)}}{6\sqrt{7}},

skewness = \frac{(3-\alpha)\sqrt{7}}{2\sqrt{\alpha(6-\alpha)}}, and excess kurtosis = \frac{21}{\alpha(6-\alpha)} - 3

The above estimate for the standard deviation ''σ''(''X'') = (''c'' − ''a'')/6 is exact for either of the following values of ''α'' and ''β'':

:''α'' = ''β'' = 4 (symmetric) with skewness = 0, and excess kurtosis = −6/11.

:''β'' = 6 − ''α'' and \alpha = 3 - \sqrt2 (right-tailed, positive skew) with skewness =\frac{1}{\sqrt{2}}, and excess kurtosis = 0

:''β'' = 6 − ''α'' and \alpha = 3 + \sqrt2 (left-tailed, negative skew) with skewness = -\frac{1}{\sqrt{2}}, and excess kurtosis = 0

Otherwise, these can be poor approximations for beta distributions with other values of α and β, exhibiting average errors of 40% in the mean and 549% in the variance.
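As an illustration of these shorthand rules (a sketch; the interval endpoints and the choice ''α'' = 3 − √2 with ''β'' = 6 − ''α'' are example values for which both shorthands happen to be exact), one can compare the PERT estimates with the exact moments of a beta distribution rescaled to [''a'', ''c'']:

<syntaxhighlight lang="python">
# Sketch: PERT shorthand mean (a + 4b + c)/6 and standard deviation (c - a)/6
# versus the exact moments of a beta distribution rescaled to [a, c].
import math


def pert_estimates(a, b, c):
    return (a + 4 * b + c) / 6.0, (c - a) / 6.0


def exact_moments(a, c, alpha, beta):
    mean01 = alpha / (alpha + beta)
    var01 = alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))
    return a + (c - a) * mean01, (c - a) * math.sqrt(var01)


a, c = 2.0, 14.0                     # example minimum and maximum
alpha = 3 - math.sqrt(2)             # with beta = 6 - alpha, both rules are exact
beta = 6 - alpha
b = a + (c - a) * (alpha - 1) / (alpha + beta - 2)   # mode, rescaled to [a, c]

print("PERT :", pert_estimates(a, b, c))             # (5.1716..., 2.0)
print("exact:", exact_moments(a, c, alpha, beta))    # matches the PERT values
</syntaxhighlight>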


Random variate generation

If ''X'' and ''Y'' are independent, with X \sim \Gamma(\alpha, \theta) and Y \sim \Gamma(\beta, \theta), then

:\frac{X}{X+Y} \sim \Beta(\alpha, \beta).

So one algorithm for generating beta variates is to generate \frac{X}{X+Y}, where ''X'' is a gamma variate with parameters (α, 1) and ''Y'' is an independent gamma variate with parameters (β, 1). In fact, here \frac{X}{X+Y} and X+Y are independent, and X+Y \sim \Gamma(\alpha + \beta, \theta). If Z \sim \Gamma(\gamma, \theta) and Z is independent of X and Y, then \frac{X+Y}{X+Y+Z} \sim \Beta(\alpha+\beta,\gamma) and \frac{X+Y}{X+Y+Z} is independent of \frac{X}{X+Y}. This shows that the product of independent \Beta(\alpha,\beta) and \Beta(\alpha+\beta,\gamma) random variables is a \Beta(\alpha,\beta+\gamma) random variable.

Also, the ''k''th order statistic of ''n'' uniformly distributed variates is \Beta(k, n+1-k), so an alternative if α and β are small integers is to generate α + β − 1 uniform variates and choose the α-th smallest.

Another way to generate the Beta distribution is by the Pólya urn model. According to this method, one starts with an "urn" containing α "black" balls and β "white" balls and draws uniformly with replacement. On every trial an additional ball is added according to the color of the last ball which was drawn. Asymptotically, the proportion of black and white balls will be distributed according to the Beta distribution, where each repetition of the experiment will produce a different value.

It is also possible to use inverse transform sampling.
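A minimal sketch of the gamma-ratio method just described (shape parameters, sample size and seed are arbitrary example values) follows; the first two sample moments are compared with the exact beta moments:

<syntaxhighlight lang="python">
# Sketch: generate Beta(alpha, beta) variates as X / (X + Y) with independent
# gamma variates X ~ Gamma(alpha, 1) and Y ~ Gamma(beta, 1).
import numpy as np

rng = np.random.default_rng(42)
alpha, beta, size = 2.0, 5.0, 100_000

x = rng.gamma(shape=alpha, scale=1.0, size=size)
y = rng.gamma(shape=beta, scale=1.0, size=size)
samples = x / (x + y)                       # Beta(alpha, beta) variates

print("sample mean:", samples.mean(),
      " exact:", alpha / (alpha + beta))
print("sample var :", samples.var(),
      " exact:", alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1)))
</syntaxhighlight>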


History

Thomas Bayes, in a posthumous paper published in 1763 by Richard Price, obtained a beta distribution as the density of the probability of success in Bernoulli trials (see ), but the paper does not analyze any of the moments of the beta distribution or discuss any of its properties. The first systematic modern discussion of the beta distribution is probably due to Karl Pearson. In Pearson's papers the beta distribution is couched as a solution of a differential equation: Pearson distribution, Pearson's Type I distribution which it is essentially identical to except for arbitrary shifting and re-scaling (the beta and Pearson Type I distributions can always be equalized by proper choice of parameters). In fact, in several English books and journal articles in the few decades prior to World War II, it was common to refer to the beta distribution as Pearson's Type I distribution. William Palin Elderton, William P. Elderton in his 1906 monograph "Frequency curves and correlation" further analyzes the beta distribution as Pearson's Type I distribution, including a full discussion of the method of moments for the four parameter case, and diagrams of (what Elderton describes as) U-shaped, J-shaped, twisted J-shaped, "cocked-hat" shapes, horizontal and angled straight-line cases. Elderton wrote "I am chiefly indebted to Professor Pearson, but the indebtedness is of a kind for which it is impossible to offer formal thanks." William Palin Elderton, Elderton in his 1906 monograph provides an impressive amount of information on the beta distribution, including equations for the origin of the distribution chosen to be the mode, as well as for other Pearson distributions: types I through VII. Elderton also included a number of appendixes, including one appendix ("II") on the beta and gamma functions. In later editions, Elderton added equations for the origin of the distribution chosen to be the mean, and analysis of Pearson distributions VIII through XII. As remarked by Bowman and Shenton "Fisher and Pearson had a difference of opinion in the approach to (parameter) estimation, in particular relating to (Pearson's method of) moments and (Fisher's method of) maximum likelihood in the case of the Beta distribution." Also according to Bowman and Shenton, "the case of a Type I (beta distribution) model being the center of the controversy was pure serendipity. A more difficult model of 4 parameters would have been hard to find." The long running public conflict of Fisher with Karl Pearson can be followed in a number of articles in prestigious journals. For example, concerning the estimation of the four parameters for the beta distribution, and Fisher's criticism of Pearson's method of moments as being arbitrary, see Pearson's article "Method of moments and method of maximum likelihood" (published three years after his retirement from University College, London, where his position had been divided between Fisher and Pearson's son Egon) in which Pearson writes "I read (Koshai's paper in the Journal of the Royal Statistical Society, 1933) which as far as I am aware is the only case at present published of the application of Professor Fisher's method. To my astonishment that method depends on first working out the constants of the frequency curve by the (Pearson) Method of Moments and then superposing on it, by what Fisher terms "the Method of Maximum Likelihood" a further approximation to obtain, what he holds, he will thus get, "more efficient values" of the curve constants." 
David and Edwards's treatise on the history of statistics cites the first modern treatment of the beta distribution, in 1911, using the beta designation that has become standard, due to Corrado Gini, an Italian statistician, demographer, and sociologist, who developed the Gini coefficient. N. L. Johnson and S. Kotz, in their comprehensive and very informative monograph on leading historical personalities in statistical sciences, credit Corrado Gini as "an early Bayesian...who dealt with the problem of eliciting the parameters of an initial Beta distribution, by singling out techniques which anticipated the advent of the so-called empirical Bayes approach."


References


External links


"Beta Distribution"
by Fiona Maclachlan, the Wolfram Demonstrations Project, 2007.
Beta Distribution – Overview and Example
xycoon.com

brighton-webs.co.uk

exstrom.com
Harvard University Statistics 110 Lecture 23 Beta Distribution, Prof. Joe Blitzstein
{{DEFAULTSORT:Beta Distribution Continuous distributions Factorial and binomial topics Conjugate prior distributions Exponential family distributions]">X - E[X] \right ) &= \tfrac\\ \lim_ \left (\lim_ \operatorname[, X - E ] \right ) &= 0 \end Also, the following limits (with only the noted variable approaching the limit) can be obtained from the above expressions: : \begin \lim_ \operatorname ]_=_\frac_ The_mean_absolute_deviation_around_the_mean_is_a_more_robust_ Robustness_is_the_property_of_being_strong_and_healthy_in_constitution._When_it_is_transposed_into_a_system,_it_refers_to_the_ability_of_tolerating_perturbations_that_might_affect_the_system’s_functional_body._In_the_same_line_''robustness''_ca_...
_
estimator In statistics, an estimator is a rule for calculating an estimate of a given quantity based on observed data: thus the rule (the estimator), the quantity of interest (the estimand) and its result (the estimate) are distinguished. For example, the ...
_of_
statistical_dispersion In statistics, dispersion (also called variability, scatter, or spread) is the extent to which a distribution is stretched or squeezed. Common examples of measures of statistical dispersion are the variance, standard deviation, and interquartile ...
_than_the_standard_deviation_for_beta_distributions_with_tails_and_inflection_points_at_each_side_of_the_mode,_Beta(''α'', ''β'')_distributions_with_''α'',''β''_>_2,_as_it_depends_on_the_linear_(absolute)_deviations_rather_than_the_square_deviations_from_the_mean.__Therefore,_the_effect_of_very_large_deviations_from_the_mean_are_not_as_overly_weighted. Using_
Stirling's_approximation In mathematics, Stirling's approximation (or Stirling's formula) is an approximation for factorials. It is a good approximation, leading to accurate results even for small values of n. It is named after James Stirling, though a related but less p ...
_to_the_Gamma_function,_Norman_Lloyd_Johnson, N.L.Johnson_and_Samuel_Kotz, S.Kotz_derived_the_following_approximation_for_values_of_the_shape_parameters_greater_than_unity_(the_relative_error_for_this_approximation_is_only_−3.5%_for_''α''_=_''β''_=_1,_and_it_decreases_to_zero_as_''α''_→_∞,_''β''_→_∞): :_\begin \frac_&=\frac\\ &\approx_\sqrt_\left(1+\frac-\frac-\frac_\right),_\text_\alpha,_\beta_>_1. \end At_the_limit_α_→_∞,_β_→_∞,_the_ratio_of_the_mean_absolute_deviation_to_the_standard_deviation_(for_the_beta_distribution)_becomes_equal_to_the_ratio_of_the_same_measures_for_the_normal_distribution:_\sqrt.__For_α_=_β_=_1_this_ratio_equals_\frac,_so_that_from_α_=_β_=_1_to_α,_β_→_∞_the_ratio_decreases_by_8.5%.__For_α_=_β_=_0_the_standard_deviation_is_exactly_equal_to_the_mean_absolute_deviation_around_the_mean._Therefore,_this_ratio_decreases_by_15%_from_α_=_β_=_0_to_α_=_β_=_1,_and_by_25%_from_α_=_β_=_0_to_α,_β_→_∞_._However,_for_skewed_beta_distributions_such_that_α_→_0_or_β_→_0,_the_ratio_of_the_standard_deviation_to_the_mean_absolute_deviation_approaches_infinity_(although_each_of_them,_individually,_approaches_zero)_because_the_mean_absolute_deviation_approaches_zero_faster_than_the_standard_deviation. Using_the__parametrization_in_terms_of_mean_μ_and_sample_size_ν_=_α_+_β_>_0: :α_=_μν,_β_=_(1−μ)ν one_can_express_the_mean_absolute_deviation_around_the_mean_in_terms_of_the_mean_μ_and_the_sample_size_ν_as_follows: :\operatorname[, _X_-_E ]_=_\frac For_a_symmetric_distribution,_the_mean_is_at_the_middle_of_the_distribution,_μ_=_1/2,_and_therefore: :_\begin \operatorname[, X_-_E ]__=_\frac_&=_\frac_\\ \lim__\left_(\lim__\operatorname[, X_-_E ]_\right_)_&=_\tfrac\\ \lim__\left_(\lim__\operatorname[, _X_-_E ]_\right_)_&=_0 \end Also,_the_following_limits_(with_only_the_noted_variable_approaching_the_limit)_can_be_obtained_from_the_above_expressions: :_\begin \lim__\operatorname[, X_-_E ]_&=\lim__\operatorname[, X_-_E ]=_0_\\ \lim__\operatorname[, X_-_E ]_&=\lim__\operatorname[, X_-_E ]_=_0\\ \lim__\operatorname[, X_-_E ]&=\lim__\operatorname[, X_-_E ]_=_0\\ \lim__\operatorname[, X_-_E ]_&=_\sqrt_\\ \lim__\operatorname[, X_-_E ]_&=_0 \end


_Mean_absolute_difference

The_mean_absolute_difference_for_the_Beta_distribution_is: :\mathrm_=_\int_0^1_\int_0^1_f(x;\alpha,\beta)\,f(y;\alpha,\beta)\,, x-y, \,dx\,dy_=_\left(\frac\right)\frac The_Gini_coefficient_for_the_Beta_distribution_is_half_of_the_relative_mean_absolute_difference: :\mathrm_=_\left(\frac\right)\frac


_Skewness

The_skewness_ In_probability_theory_and_statistics,_skewness_is_a_measure_of_the_asymmetry_of_the_probability_distribution_of_a__real-valued_random_variable_about_its_mean._The_skewness_value_can_be_positive,_zero,_negative,_or_undefined. For_a_unimodal__...
_(the_third_moment_centered_on_the_mean,_normalized_by_the_3/2_power_of_the_variance)_of_the_beta_distribution_is :\gamma_1_=\frac_=_\frac_. Letting_α_=_β_in_the_above_expression_one_obtains_γ1_=_0,_showing_once_again_that_for_α_=_β_the_distribution_is_symmetric_and_hence_the_skewness_is_zero._Positive_skew_(right-tailed)_for_α_<_β,_negative_skew_(left-tailed)_for_α_>_β. Using_the__parametrization_in_terms_of_mean_μ_and_sample_size_ν_=_α_+_β: :_\begin __\alpha_&__=_\mu_\nu_,\text\nu_=(\alpha_+_\beta)__>0\\ __\beta_&__=_(1_-_\mu)_\nu_,_\text\nu_=(\alpha_+_\beta)__>0. \end one_can_express_the_skewness_in_terms_of_the_mean_μ_and_the_sample_size_ν_as_follows: :\gamma_1_=\frac_=_\frac. The_skewness_can_also_be_expressed_just_in_terms_of_the_variance_''var''_and_the_mean_μ_as_follows: :\gamma_1_=\frac_=_\frac\text_\operatorname_<_\mu(1-\mu) The_accompanying_plot_of_skewness_as_a_function_of_variance_and_mean_shows_that_maximum_variance_(1/4)_is_coupled_with_zero_skewness_and_the_symmetry_condition_(μ_=_1/2),_and_that_maximum_skewness_(positive_or_negative_infinity)_occurs_when_the_mean_is_located_at_one_end_or_the_other,_so_that_the_"mass"_of_the_probability_distribution_is_concentrated_at_the_ends_(minimum_variance). The_following_expression_for_the_square_of_the_skewness,_in_terms_of_the_sample_size_ν_=_α_+_β_and_the_variance_''var'',_is_useful_for_the_method_of_moments_estimation_of_four_parameters: :(\gamma_1)^2_=\frac_=_\frac\bigg(\frac-4(1+\nu)\bigg) This_expression_correctly_gives_a_skewness_of_zero_for_α_=_β,_since_in_that_case_(see_):_\operatorname_=_\frac. For_the_symmetric_case_(α_=_β),_skewness_=_0_over_the_whole_range,_and_the_following_limits_apply: :\lim__\gamma_1_=_\lim__\gamma_1_=\lim__\gamma_1=\lim__\gamma_1=\lim__\gamma_1_=_0 For_the_asymmetric_cases_(α_≠_β)_the_following_limits_(with_only_the_noted_variable_approaching_the_limit)_can_be_obtained_from_the_above_expressions: :_\begin &\lim__\gamma_1_=\lim__\gamma_1_=_\infty\\ &\lim__\gamma_1__=_\lim__\gamma_1=_-_\infty\\ &\lim__\gamma_1_=_-\frac,\quad_\lim_(\lim__\gamma_1)_=_-\infty,\quad_\lim_(\lim__\gamma_1)_=_0\\ &\lim__\gamma_1_=_\frac,\quad_\lim_(\lim__\gamma_1)_=_\infty,\quad_\lim_(\lim__\gamma_1)_=_0\\ &\lim__\gamma_1_=_\frac,\quad_\lim_(\lim__\gamma_1)__=_\infty,\quad_\lim_(\lim__\gamma_1)_=_-_\infty \end


_Kurtosis

The_beta_distribution_has_been_applied_in_acoustic_analysis_to_assess_damage_to_gears,_as_the_kurtosis_of_the_beta_distribution_has_been_reported_to_be_a_good_indicator_of_the_condition_of_a_gear.
_Kurtosis_has_also_been_used_to_distinguish_the_seismic_signal_generated_by_a_person's_footsteps_from_other_signals._As_persons_or_other_targets_moving_on_the_ground_generate_continuous_signals_in_the_form_of_seismic_waves,_one_can_separate_different_targets_based_on_the_seismic_waves_they_generate._Kurtosis_is_sensitive_to_impulsive_signals,_so_it's_much_more_sensitive_to_the_signal_generated_by_human_footsteps_than_other_signals_generated_by_vehicles,_winds,_noise,_etc.
__Unfortunately,_the_notation_for_kurtosis_has_not_been_standardized._Kenney_and_Keeping
__use_the_symbol_γ2_for_the_excess_kurtosis_ In_probability_theory_and_statistics,_kurtosis_(from__el,_κυρτός,_''kyrtos''_or_''kurtos'',_meaning_"curved,_arching")_is_a_measure_of_the_"tailedness"_of_the_probability_distribution_of_a_real-valued_random_variable._Like_skewness,_kurtosi_...
,_but_Abramowitz_and_Stegun
__use_different_terminology.__To_prevent_confusion
__between_kurtosis_(the_fourth_moment_centered_on_the_mean,_normalized_by_the_square_of_the_variance)_and_excess_kurtosis,_when_using_symbols,_they_will_be_spelled_out_as_follows:
:\begin \text _____&=\text_-_3\\ _____&=\frac-3\\ _____&=\frac\\ _____&=\frac _. \end Letting_α_=_β_in_the_above_expression_one_obtains :\text_=-_\frac_\text\alpha=\beta_. Therefore,_for_symmetric_beta_distributions,_the_excess_kurtosis_is_negative,_increasing_from_a_minimum_value_of_−2_at_the_limit_as__→_0,_and_approaching_a_maximum_value_of_zero_as__→_∞.__The_value_of_−2_is_the_minimum_value_of_excess_kurtosis_that_any_distribution_(not_just_beta_distributions,_but_any_distribution_of_any_possible_kind)_can_ever_achieve.__This_minimum_value_is_reached_when_all_the_probability_density_is_entirely_concentrated_at_each_end_''x''_=_0_and_''x''_=_1,_with_nothing_in_between:_a_2-point_Bernoulli_distribution_ In_probability_theory_and_statistics,_the_Bernoulli_distribution,_named_after_Swiss_mathematician__Jacob_Bernoulli,James_Victor_Uspensky:_''Introduction_to_Mathematical_Probability'',_McGraw-Hill,_New_York_1937,_page_45_is_the__discrete_probabi_...
_with_equal_probability_1/2_at_each_end_(a_coin_toss:_see_section_below_"Kurtosis_bounded_by_the_square_of_the_skewness"_for_further_discussion).__The_description_of_kurtosis_as_a_measure_of_the_"potential_outliers"_(or_"potential_rare,_extreme_values")_of_the_probability_distribution,_is_correct_for_all_distributions_including_the_beta_distribution._When_rare,_extreme_values_can_occur_in_the_beta_distribution,_the_higher_its_kurtosis;_otherwise,_the_kurtosis_is_lower._For_α_≠_β,_skewed_beta_distributions,_the_excess_kurtosis_can_reach_unlimited_positive_values_(particularly_for_α_→_0_for_finite_β,_or_for_β_→_0_for_finite_α)_because_the_side_away_from_the_mode_will_produce_occasional_extreme_values.__Minimum_kurtosis_takes_place_when_the_mass_density_is_concentrated_equally_at_each_end_(and_therefore_the_mean_is_at_the_center),_and_there_is_no_probability_mass_density_in_between_the_ends. Using_the__parametrization_in_terms_of_mean_μ_and_sample_size_ν_=_α_+_β: :_\begin __\alpha_&__=_\mu_\nu_,\text\nu_=(\alpha_+_\beta)__>0\\ __\beta_&__=_(1_-_\mu)_\nu_,_\text\nu_=(\alpha_+_\beta)__>0. \end one_can_express_the_excess_kurtosis_in_terms_of_the_mean_μ_and_the_sample_size_ν_as_follows: :\text_=\frac\bigg_(\frac_-_1_\bigg_) The_excess_kurtosis_can_also_be_expressed_in_terms_of_just_the_following_two_parameters:_the_variance_''var'',_and_the_sample_size_ν_as_follows: :\text_=\frac\left(\frac_-_6_-_5_\nu_\right)\text\text<_\mu(1-\mu) and,_in_terms_of_the_variance_''var''_and_the_mean_μ_as_follows: :\text_=\frac\text\text<_\mu(1-\mu) The_plot_of_excess_kurtosis_as_a_function_of_the_variance_and_the_mean_shows_that_the_minimum_value_of_the_excess_kurtosis_(−2,_which_is_the_minimum_possible_value_for_excess_kurtosis_for_any_distribution)_is_intimately_coupled_with_the_maximum_value_of_variance_(1/4)_and_the_symmetry_condition:_the_mean_occurring_at_the_midpoint_(μ_=_1/2)._This_occurs_for_the_symmetric_case_of_α_=_β_=_0,_with_zero_skewness.__At_the_limit,_this_is_the_2_point_Bernoulli_distribution_ In_probability_theory_and_statistics,_the_Bernoulli_distribution,_named_after_Swiss_mathematician__Jacob_Bernoulli,James_Victor_Uspensky:_''Introduction_to_Mathematical_Probability'',_McGraw-Hill,_New_York_1937,_page_45_is_the__discrete_probabi_...
_with_equal_probability_1/2_at_each__Dirac_delta_function_end_''x''_=_0_and_''x''_=_1_and_zero_probability_everywhere_else._(A_coin_toss:_one_face_of_the_coin_being_''x''_=_0_and_the_other_face_being_''x''_=_1.)__Variance_is_maximum_because_the_distribution_is_bimodal_with_nothing_in_between_the_two_modes_(spikes)_at_each_end.__Excess_kurtosis_is_minimum:_the_probability_density_"mass"_is_zero_at_the_mean_and_it_is_concentrated_at_the_two_peaks_at_each_end.__Excess_kurtosis_reaches_the_minimum_possible_value_(for_any_distribution)_when_the_probability_density_function_has_two_spikes_at_each_end:_it_is_bi-"peaky"_with_nothing_in_between_them. On_the_other_hand,_the_plot_shows_that_for_extreme_skewed_cases,_where_the_mean_is_located_near_one_or_the_other_end_(μ_=_0_or_μ_=_1),_the_variance_is_close_to_zero,_and_the_excess_kurtosis_rapidly_approaches_infinity_when_the_mean_of_the_distribution_approaches_either_end. Alternatively,_the_excess_kurtosis_can_also_be_expressed_in_terms_of_just_the_following_two_parameters:_the_square_of_the_skewness,_and_the_sample_size_ν_as_follows: :\text_=\frac\bigg(\frac_(\text)^2_-_1\bigg)\text^2-2<_\text<_\frac_(\text)^2 From_this_last_expression,_one_can_obtain_the_same_limits_published_practically_a_century_ago_by_Karl_Pearson_in_his_paper,_for_the_beta_distribution_(see_section_below_titled_"Kurtosis_bounded_by_the_square_of_the_skewness")._Setting_α_+_β=_ν_=__0_in_the_above_expression,_one_obtains_Pearson's_lower_boundary_(values_for_the_skewness_and_excess_kurtosis_below_the_boundary_(excess_kurtosis_+_2_−_skewness2_=_0)_cannot_occur_for_any_distribution,_and_hence_Karl_Pearson_appropriately_called_the_region_below_this_boundary_the_"impossible_region")._The_limit_of_α_+_β_=_ν_→_∞_determines_Pearson's_upper_boundary. :_\begin &\lim_\text__=_(\text)^2_-_2\\ &\lim_\text__=_\tfrac_(\text)^2 \end therefore: :(\text)^2-2<_\text<_\tfrac_(\text)^2 Values_of_ν_=_α_+_β_such_that_ν_ranges_from_zero_to_infinity,_0_<_ν_<_∞,_span_the_whole_region_of_the_beta_distribution_in_the_plane_of_excess_kurtosis_versus_squared_skewness. For_the_symmetric_case_(α_=_β),_the_following_limits_apply: :_\begin &\lim__\text_=__-_2_\\ &\lim__\text_=_0_\\ &\lim__\text_=_-_\frac \end For_the_unsymmetric_cases_(α_≠_β)_the_following_limits_(with_only_the_noted_variable_approaching_the_limit)_can_be_obtained_from_the_above_expressions: :_\begin &\lim_\text__=\lim__\text__=_\lim_\text__=_\lim_\text__=\infty\\ &\lim_\text__=_\frac,\text__\lim_(\lim__\text)__=_\infty,\text__\lim_(\lim__\text)__=_0\\ &\lim_\text__=_\frac,\text__\lim_(\lim__\text)__=_\infty,\text__\lim_(\lim__\text)__=_0\\ &\lim__\text__=_-_6_+_\frac,\text__\lim_(\lim__\text)__=_\infty,\text__\lim_(\lim__\text)__=_\infty \end


_Characteristic_function

The_Characteristic_function_(probability_theory), characteristic_function_is_the_Fourier_transform_of_the_probability_density_function.__The_characteristic_function_of_the_beta_distribution_is_confluent_hypergeometric_function, Kummer's_confluent_hypergeometric_function_(of_the_first_kind):
:\begin \varphi_X(\alpha;\beta;t) &=_\operatorname\left[e^\right]\\ &=_\int_0^1_e^_f(x;\alpha,\beta)_dx_\\ &=_1F_1(\alpha;_\alpha+\beta;_it)\!\\ &=\sum_^\infty_\frac__\\ &=_1__+\sum_^_\left(_\prod_^_\frac_\right)_\frac \end where :_x^=x(x+1)(x+2)\cdots(x+n-1) is_the_rising_factorial,_also_called_the_"Pochhammer_symbol".__The_value_of_the_characteristic_function_for_''t''_=_0,_is_one: :_\varphi_X(\alpha;\beta;0)=_1F_1(\alpha;_\alpha+\beta;_0)_=_1__. Also,_the_real_and_imaginary_parts_of_the_characteristic_function_enjoy_the_following_symmetries_with_respect_to_the_origin_of_variable_''t'': :_\textrm_\left_[__1F_1(\alpha;_\alpha+\beta;_it)_\right_]_=_\textrm_\left_[__1F_1(\alpha;_\alpha+\beta;_-_it)_\right_]__ :_\textrm_\left_[__1F_1(\alpha;_\alpha+\beta;_it)_\right_]_=_-_\textrm_\left__[__1F_1(\alpha;_\alpha+\beta;_-_it)_\right_]__ The_symmetric_case_α_=_β_simplifies_the_characteristic_function_of_the_beta_distribution_to_a_Bessel_function,_since_in_the_special_case_α_+_β_=_2α_the_confluent_hypergeometric_function_(of_the_first_kind)_reduces_to_a_Bessel_function_(the_modified_Bessel_function_of_the_first_kind_I__)_using_Ernst_Kummer, Kummer's_second_transformation_as_follows: Another_example_of_the_symmetric_case_α_=_β_=_n/2_for_beamforming_applications_can_be_found_in_Figure_11_of_ :\begin__1F_1(\alpha;2\alpha;_it)_&=_e^__0F_1_\left(;_\alpha+\tfrac;_\frac_\right)_\\ &=_e^_\left(\frac\right)^_\Gamma\left(\alpha+\tfrac\right)_I_\left(\frac\right).\end In_the_accompanying_plots,_the_Complex_number, real_part_(Re)_of_the_Characteristic_function_(probability_theory), characteristic_function_of_the_beta_distribution_is_displayed_for_symmetric_(α_=_β)_and_skewed_(α_≠_β)_cases.


_Other_moments


_Moment_generating_function

It_also_follows_that_the_moment_generating_function_is :\begin M_X(\alpha;_\beta;_t) &=_\operatorname\left[e^\right]_\\_pt&=_\int_0^1_e^_f(x;\alpha,\beta)\,dx_\\_pt&=__1F_1(\alpha;_\alpha+\beta;_t)_\\_pt&=_\sum_^\infty_\frac__\frac_\\_pt&=_1__+\sum_^_\left(_\prod_^_\frac_\right)_\frac \end In_particular_''M''''X''(''α'';_''β'';_0)_=_1.


_Higher_moments

Using_the_moment_generating_function,_the_''k''-th_raw_moment_is_given_by_the_factor :\prod_^_\frac_ multiplying_the_(exponential_series)_term_\left(\frac\right)_in_the_series_of_the_moment_generating_function :\operatorname[X^k]=_\frac_=_\prod_^_\frac where_(''x'')(''k'')_is_a_Pochhammer_symbol_representing_rising_factorial._It_can_also_be_written_in_a_recursive_form_as :\operatorname[X^k]_=_\frac\operatorname[X^]. Since_the_moment_generating_function_M_X(\alpha;_\beta;_\cdot)_has_a_positive_radius_of_convergence,_the_beta_distribution_is_Moment_problem, determined_by_its_moments.


_Moments_of_transformed_random_variables


_=Moments_of_linearly_transformed,_product_and_inverted_random_variables

= One_can_also_show_the_following_expectations_for_a_transformed_random_variable,_where_the_random_variable_''X''_is_Beta-distributed_with_parameters_α_and_β:_''X''_~_Beta(α,_β).__The_expected_value_of_the_variable_1 − ''X''_is_the_mirror-symmetry_of_the_expected_value_based_on_''X'': :\begin &_\operatorname[1-X]_=_\frac_\\ &_\operatorname[X_(1-X)]_=\operatorname[(1-X)X_]_=\frac \end Due_to_the_mirror-symmetry_of_the_probability_density_function_of_the_beta_distribution,_the_variances_based_on_variables_''X''_and_1 − ''X''_are_identical,_and_the_covariance_on_''X''(1 − ''X''_is_the_negative_of_the_variance: :\operatorname[(1-X)]=\operatorname[X]_=_-\operatorname[X,(1-X)]=_\frac These_are_the_expected_values_for_inverted_variables,_(these_are_related_to_the_harmonic_means,_see_): :\begin &_\operatorname_\left_[\frac_\right_]_=_\frac_\text_\alpha_>_1\\ &_\operatorname\left_[\frac_\right_]_=\frac_\text_\beta_>_1 \end The_following_transformation_by_dividing_the_variable_''X''_by_its_mirror-image_''X''/(1 − ''X'')_results_in_the_expected_value_of_the_"inverted_beta_distribution"_or_beta_prime_distribution_ In_probability_theory_and__statistics,_the_beta_prime_distribution_(also_known_as_inverted_beta_distribution_or_beta_distribution_of_the_second_kindJohnson_et_al_(1995),_p_248)_is_an_absolutely_continuous_probability_distribution. __Definitions_ _...
_(also_known_as_beta_distribution_of_the_second_kind_or_Pearson_distribution, Pearson's_Type_VI): :_\begin &_\operatorname\left[\frac\right]_=\frac_\text\beta_>_1\\ &_\operatorname\left[\frac\right]_=\frac\text\alpha_>_1 \end_ Variances_of_these_transformed_variables_can_be_obtained_by_integration,_as_the_expected_values_of_the_second_moments_centered_on_the_corresponding_variables: :\operatorname_\left[\frac_\right]_=\operatorname\left[\left(\frac_-_\operatorname\left[\frac_\right_]_\right_)^2\right]= :\operatorname\left_[\frac_\right_]_=\operatorname_\left_[\left_(\frac_-_\operatorname\left_[\frac_\right_]_\right_)^2_\right_]=_\frac_\text\alpha_>_2 The_following_variance_of_the_variable_''X''_divided_by_its_mirror-image_(''X''/(1−''X'')_results_in_the_variance_of_the_"inverted_beta_distribution"_or_beta_prime_distribution_ In_probability_theory_and__statistics,_the_beta_prime_distribution_(also_known_as_inverted_beta_distribution_or_beta_distribution_of_the_second_kindJohnson_et_al_(1995),_p_248)_is_an_absolutely_continuous_probability_distribution. __Definitions_ _...
_(also_known_as_beta_distribution_of_the_second_kind_or_Pearson_distribution, Pearson's_Type_VI): :\operatorname_\left_[\frac_\right_]_=\operatorname_\left_[\left(\frac_-_\operatorname_\left_[\frac_\right_]_\right)^2_\right_]=\operatorname_\left_[\frac_\right_]_= :\operatorname_\left_[\left_(\frac_-_\operatorname_\left_[\frac_\right_]_\right_)^2_\right_]=_\frac_\text\beta_>_2 The_covariances_are: :\operatorname\left_[\frac,\frac_\right_]_=_\operatorname\left[\frac,\frac_\right]_=\operatorname\left[\frac,\frac\right_]_=_\operatorname\left[\frac,\frac_\right]_=\frac_\text_\alpha,_\beta_>_1 These_expectations_and_variances_appear_in_the_four-parameter_Fisher_information_matrix_(.)


_=Moments_of_logarithmically_transformed_random_variables

= Expected_values_for_Logarithm_transformation, logarithmic_transformations_(useful_for_maximum_likelihood_estimates,_see_)_are_discussed_in_this_section.__The_following_logarithmic_linear_transformations_are_related_to_the_geometric_means_''GX''_and__''G''(1−''X'')_(see_): :\begin \operatorname[\ln(X)]_&=_\psi(\alpha)_-_\psi(\alpha_+_\beta)=_-_\operatorname\left[\ln_\left_(\frac_\right_)\right],\\ \operatorname[\ln(1-X)]_&=\psi(\beta)_-_\psi(\alpha_+_\beta)=_-_\operatorname_\left[\ln_\left_(\frac_\right_)\right]. \end Where_the_digamma_function_ In_mathematics,_the_digamma_function_is_defined_as_the__logarithmic_derivative_of_the_gamma_function: :\psi(x)=\frac\ln\big(\Gamma(x)\big)=\frac\sim\ln-\frac. It_is_the_first_of_the__polygamma_functions._It_is_strictly_increasing_and_strict_...
_ψ(α)_is_defined_as_the_logarithmic_derivative_of_the_gamma_function_ In__mathematics,_the_gamma_function_(represented_by_,_the_capital_letter__gamma_from_the_Greek_alphabet)_is_one_commonly_used_extension_of_the__factorial_function_to_complex_numbers._The_gamma_function_is_defined_for_all_complex_numbers_except_...
: :\psi(\alpha)_=_\frac Logit_transformations_are_interesting,
_as_they_usually_transform_various_shapes_(including_J-shapes)_into_(usually_skewed)_bell-shaped_densities_over_the_logit_variable,_and_they_may_remove_the_end_singularities_over_the_original_variable: :\begin \operatorname\left[\ln_\left_(\frac_\right_)_\right]_&=\psi(\alpha)_-_\psi(\beta)=_\operatorname[\ln(X)]_+\operatorname_\left[\ln_\left_(\frac_\right)_\right],\\ \operatorname\left_[\ln_\left_(\frac_\right_)_\right_]_&=\psi(\beta)_-_\psi(\alpha)=_-_\operatorname_\left[\ln_\left_(\frac_\right)_\right]_. \end Johnson
__considered_the_distribution_of_the_logit_-_transformed_variable_ln(''X''/1−''X''),_including_its_moment_generating_function_and_approximations_for_large_values_of_the_shape_parameters.__This_transformation_extends_the_finite_support_,_1_ The_comma__is_a_punctuation_mark_that_appears_in_several_variants_in_different_languages._It_has_the_same_shape_as_an_apostrophe_or_single_closing_quotation_mark_()_in_many_typefaces,_but_it_differs_from_them_in_being_placed_on_the__baseline_o_...
based_on_the_original_variable_''X''_to_infinite_support_in_both_directions_of_the_real_line_(−∞,_+∞). Higher_order_logarithmic_moments_can_be_derived_by_using_the_representation_of_a_beta_distribution_as_a_proportion_of_two_Gamma_distributions_and_differentiating_through_the_integral._They_can_be_expressed_in_terms_of_higher_order_poly-gamma_functions_as_follows: :\begin \operatorname_\left_[\ln^2(X)_\right_]_&=_(\psi(\alpha)_-_\psi(\alpha_+_\beta))^2+\psi_1(\alpha)-\psi_1(\alpha+\beta),_\\ \operatorname_\left_[\ln^2(1-X)_\right_]_&=_(\psi(\beta)_-_\psi(\alpha_+_\beta))^2+\psi_1(\beta)-\psi_1(\alpha+\beta),_\\ \operatorname_\left_[\ln_(X)\ln(1-X)_\right_]_&=(\psi(\alpha)_-_\psi(\alpha_+_\beta))(\psi(\beta)_-_\psi(\alpha_+_\beta))_-\psi_1(\alpha+\beta). \end therefore_the_variance__ In_probability_theory_and_statistics,_variance_is_the__expectation_of_the_squared__deviation_of_a__random_variable_from_its__population_mean_or__sample_mean._Variance_is_a_measure_of_dispersion,_meaning_it_is_a_measure_of_how_far_a_set_of_numbe_...
_of_the_logarithmic_variables_and_covariance_ In__probability_theory_and__statistics,_covariance_is_a_measure_of_the_joint_variability_of_two__random_variables._If_the_greater_values_of_one_variable_mainly_correspond_with_the_greater_values_of_the_other_variable,_and_the_same_holds_for_the__...
_of_ln(''X'')_and_ln(1−''X'')_are: :\begin \operatorname[\ln(X),_\ln(1-X)]_&=_\operatorname\left[\ln(X)\ln(1-X)\right]_-_\operatorname[\ln(X)]\operatorname[\ln(1-X)]_=_-\psi_1(\alpha+\beta)_\\ &_\\ \operatorname[\ln_X]_&=_\operatorname[\ln^2(X)]_-_(\operatorname[\ln(X)])^2_\\ &=_\psi_1(\alpha)_-_\psi_1(\alpha_+_\beta)_\\ &=_\psi_1(\alpha)_+_\operatorname[\ln(X),_\ln(1-X)]_\\ &_\\ \operatorname_ln_(1-X)&=_\operatorname[\ln^2_(1-X)]_-_(\operatorname[\ln_(1-X)])^2_\\ &=_\psi_1(\beta)_-_\psi_1(\alpha_+_\beta)_\\ &=_\psi_1(\beta)_+_\operatorname[\ln_(X),_\ln(1-X)] \end where_the_trigamma_function_ In_mathematics,_the_trigamma_function,_denoted__or_,_is_the_second_of_the_polygamma_functions,_and_is_defined_by :_\psi_1(z)_=_\frac_\ln\Gamma(z). It_follows_from_this_definition_that :_\psi_1(z)_=_\frac_\psi(z) where__is_the_digamma_functio_...
,_denoted_ψ1(α),_is_the_second_of_the_polygamma_function_ In_mathematics,_the_polygamma_function_of_order__is_a_meromorphic_function_on_the__complex_numbers_\mathbb_defined_as_the_th__derivative_of_the_logarithm_of_the_gamma_function: :\psi^(z)_:=_\frac_\psi(z)_=_\frac_\ln\Gamma(z). Thus :\psi^(z)__...
s,_and_is_defined_as_the_derivative_of_the_digamma_function: :\psi_1(\alpha)_=_\frac=_\frac. The_variances_and_covariance_of_the_logarithmically_transformed_variables_''X''_and_(1−''X'')_are_different,_in_general,_because_the_logarithmic_transformation_destroys_the_mirror-symmetry_of_the_original_variables_''X''_and_(1−''X''),_as_the_logarithm_approaches_negative_infinity_for_the_variable_approaching_zero. These_logarithmic_variances_and_covariance_are_the_elements_of_the_Fisher_information_ In_mathematical_statistics,_the_Fisher_information_(sometimes_simply_called_information)_is_a_way_of_measuring_the_amount_of_information_that_an_observable_random_variable_''X''_carries_about_an_unknown_parameter_''θ''_of_a_distribution_that_model_...
_matrix_for_the_beta_distribution.__They_are_also_a_measure_of_the_curvature_of_the_log_likelihood_function_(see_section_on_Maximum_likelihood_estimation). The_variances_of_the_log_inverse_variables_are_identical_to_the_variances_of_the_log_variables: :\begin \operatorname\left[\ln_\left_(\frac_\right_)_\right]_&_=\operatorname[\ln(X)]_=_\psi_1(\alpha)_-_\psi_1(\alpha_+_\beta),_\\ \operatorname\left[\ln_\left_(\frac_\right_)_\right]_&=\operatorname_ln_(1-X)=_\psi_1(\beta)_-_\psi_1(\alpha_+_\beta),_\\ \operatorname\left[\ln_\left_(\frac_\right),_\ln_\left_(\frac\right_)_\right]_&=\operatorname[\ln(X),\ln(1-X)]=_-\psi_1(\alpha_+_\beta).\end It_also_follows_that_the_variances_of_the_logit_transformed_variables_are: :\operatorname\left[\ln_\left_(\frac_\right_)\right]=\operatorname\left[\ln_\left_(\frac_\right_)_\right]=-\operatorname\left_[\ln_\left_(\frac_\right_),_\ln_\left_(\frac_\right_)_\right]=_\psi_1(\alpha)_+_\psi_1(\beta)


_Quantities_of_information_(entropy)

Given a beta distributed random variable, ''X'' ~ Beta(''α'', ''β''), the differential entropy of ''X'' is (measured in nats) the expected value of the negative of the logarithm of the probability density function:

:\begin{align}
h(X) &= \operatorname{E}[-\ln(f(x;\alpha,\beta))] \\
&=\int_0^1 -f(x;\alpha,\beta)\ln(f(x;\alpha,\beta)) \, dx \\
&= \ln(\Beta(\alpha,\beta))-(\alpha-1)\psi(\alpha)-(\beta-1)\psi(\beta)+(\alpha+\beta-2) \psi(\alpha+\beta)
\end{align}

where ''f''(''x''; ''α'', ''β'') is the probability density function of the beta distribution:

:f(x;\alpha,\beta) = \frac{1}{\Beta(\alpha,\beta)} x^{\alpha-1}(1-x)^{\beta-1}

The digamma function ''ψ'' appears in the formula for the differential entropy as a consequence of Euler's integral formula for the harmonic numbers which follows from the integral:

:\int_0^1 \frac{1-x^{\alpha-1}}{1-x}  \, dx = \psi(\alpha)-\psi(1)

The differential entropy of the beta distribution is negative for all values of ''α'' and ''β'' greater than zero, except at ''α'' = ''β'' = 1 (for which values the beta distribution is the same as the uniform distribution), where the differential entropy reaches its maximum value of zero.  It is to be expected that the maximum entropy should take place when the beta distribution becomes equal to the uniform distribution, since uncertainty is maximal when all possible events are equiprobable.

For ''α'' or ''β'' approaching zero, the differential entropy approaches its minimum value of negative infinity. For (either or both) ''α'' or ''β'' approaching zero, there is a maximum amount of order: all the probability density is concentrated at the ends, and there is zero probability density at points located between the ends. Similarly for (either or both) ''α'' or ''β'' approaching infinity, the differential entropy approaches its minimum value of negative infinity, and a maximum amount of order.  If either ''α'' or ''β'' approaches infinity (and the other is finite) all the probability density is concentrated at an end, and the probability density is zero everywhere else.  If both shape parameters are equal (the symmetric case), ''α'' = ''β'', and they approach infinity simultaneously, the probability density becomes a spike (Dirac delta function) concentrated at the middle ''x'' = 1/2, and hence there is 100% probability at the middle ''x'' = 1/2 and zero probability everywhere else.

The (continuous case) differential entropy was introduced by Shannon in his original paper (where he named it the "entropy of a continuous distribution"), as the concluding part of the same paper where he defined the discrete entropy.  It is known since then that the differential entropy may differ from the infinitesimal limit of the discrete entropy by an infinite offset, therefore the differential entropy can be negative (as it is for the beta distribution). What really matters is the relative value of entropy.

Given two beta distributed random variables, ''X''1 ~ Beta(''α'', ''β'') and ''X''2 ~ Beta(''α''′, ''β''′), the cross entropy is (measured in nats)

:\begin{align}
H(X_1,X_2) &= \int_0^1 - f(x;\alpha,\beta) \ln (f(x;\alpha',\beta')) \,dx \\
&= \ln \left(\Beta(\alpha',\beta')\right)-(\alpha'-1)\psi(\alpha)-(\beta'-1)\psi(\beta)+(\alpha'+\beta'-2)\psi(\alpha+\beta).
\end{align}

The cross entropy has been used as an error metric to measure the distance between two hypotheses.  Its absolute value is minimum when the two distributions are identical. It is the information measure most closely related to the log maximum likelihood (see section on "Parameter estimation. Maximum likelihood estimation").

The relative entropy, or Kullback–Leibler divergence ''D''KL(''X''1 || ''X''2), is a measure of the inefficiency of assuming that the distribution is ''X''2 ~ Beta(''α''′, ''β''′) when the distribution is really ''X''1 ~ Beta(''α'', ''β''). It is defined as follows (measured in nats).

:\begin{align}
D_{\mathrm{KL}}(X_1||X_2) &= \int_0^1 f(x;\alpha,\beta) \ln \left (\frac{f(x;\alpha,\beta)}{f(x;\alpha',\beta')} \right ) \, dx \\
&= \left (\int_0^1 f(x;\alpha,\beta) \ln (f(x;\alpha,\beta)) \,dx \right )- \left (\int_0^1 f(x;\alpha,\beta) \ln (f(x;\alpha',\beta')) \, dx \right )\\
&= -h(X_1) + H(X_1,X_2)\\
&= \ln\left(\frac{\Beta(\alpha',\beta')}{\Beta(\alpha,\beta)}\right)+(\alpha-\alpha')\psi(\alpha)+(\beta-\beta')\psi(\beta)+(\alpha'-\alpha+\beta'-\beta)\psi (\alpha + \beta).
\end{align}

The relative entropy, or Kullback–Leibler divergence, is always non-negative.  A few numerical examples follow:

*''X''1 ~ Beta(1, 1) and ''X''2 ~ Beta(3, 3); ''D''KL(''X''1 || ''X''2) = 0.598803; ''D''KL(''X''2 || ''X''1) = 0.267864; ''h''(''X''1) = 0; ''h''(''X''2) = −0.267864
*''X''1 ~ Beta(3, 0.5) and ''X''2 ~ Beta(0.5, 3); ''D''KL(''X''1 || ''X''2) = 7.21574; ''D''KL(''X''2 || ''X''1) = 7.21574; ''h''(''X''1) = −1.10805; ''h''(''X''2) = −1.10805.

The Kullback–Leibler divergence is not symmetric ''D''KL(''X''1 || ''X''2) ≠ ''D''KL(''X''2 || ''X''1) for the case in which the individual beta distributions Beta(1, 1) and Beta(3, 3) are symmetric, but have different entropies ''h''(''X''1) ≠ ''h''(''X''2). The value of the Kullback divergence depends on the direction traveled: whether going from a higher (differential) entropy to a lower (differential) entropy or the other way around. In the numerical example above, the Kullback divergence measures the inefficiency of assuming that the distribution is (bell-shaped) Beta(3, 3), rather than (uniform) Beta(1, 1). The "h" entropy of Beta(1, 1) is higher than the "h" entropy of Beta(3, 3) because the uniform distribution Beta(1, 1) has a maximum amount of disorder. The Kullback divergence is more than two times higher (0.598803 instead of 0.267864) when measured in the direction of decreasing entropy: the direction that assumes that the (uniform) Beta(1, 1) distribution is (bell-shaped) Beta(3, 3) rather than the other way around. In this restricted sense, the Kullback divergence is consistent with the second law of thermodynamics.

The Kullback–Leibler divergence is symmetric ''D''KL(''X''1 || ''X''2) = ''D''KL(''X''2 || ''X''1) for the skewed cases Beta(3, 0.5) and Beta(0.5, 3) that have equal differential entropy ''h''(''X''1) = ''h''(''X''2).

The symmetry condition:

:D_{\mathrm{KL}}(X_1||X_2) = D_{\mathrm{KL}}(X_2||X_1),\text{ if }h(X_1) = h(X_2),\text{ for }\alpha \neq \beta

follows from the above definitions and the mirror-symmetry ''f''(''x''; ''α'', ''β'') = ''f''(1−''x''; ''β'', ''α'') enjoyed by the beta distribution.
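The entropy, cross entropy and Kullback–Leibler formulas above are straightforward to evaluate numerically; the following Python sketch (assuming SciPy) reproduces the numerical examples quoted in this section:

<syntaxhighlight lang="python">
# Differential entropy, cross entropy and Kullback-Leibler divergence of
# beta distributions, using the closed-form expressions above.
from scipy.special import betaln, digamma

def h(a, b):
    """Differential entropy of Beta(a, b) in nats."""
    return betaln(a, b) - (a - 1)*digamma(a) - (b - 1)*digamma(b) \
        + (a + b - 2)*digamma(a + b)

def cross_entropy(a, b, ap, bp):
    """H(X1, X2) with X1 ~ Beta(a, b) and X2 ~ Beta(ap, bp)."""
    return betaln(ap, bp) - (ap - 1)*digamma(a) - (bp - 1)*digamma(b) \
        + (ap + bp - 2)*digamma(a + b)

def kl(a, b, ap, bp):
    """D_KL(X1 || X2) = -h(X1) + H(X1, X2)."""
    return cross_entropy(a, b, ap, bp) - h(a, b)

print(h(1, 1), h(3, 3))                        # 0 and about -0.267864
print(kl(1, 1, 3, 3), kl(3, 3, 1, 1))          # about 0.598803 and 0.267864 (asymmetric)
print(kl(3, 0.5, 0.5, 3), kl(0.5, 3, 3, 0.5))  # both about 7.21574 (symmetric case)
</syntaxhighlight>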


Relationships between statistical measures


Mean, mode and median relationship

If 1 < α < β then mode ≤ median ≤ mean.Kerman J (2011) "A closed-form approximation for the median of the beta distribution". Expressing the mode (only for α, β > 1), and the mean in terms of α and β:

: \frac{\alpha - 1}{\alpha + \beta - 2} \le \text{median} \le \frac{\alpha}{\alpha + \beta} ,

If 1 < β < α then the order of the inequalities are reversed. For α, β > 1 the absolute distance between the mean and the median is less than 5% of the distance between the maximum and minimum values of ''x''. On the other hand, the absolute distance between the mean and the mode can reach 50% of the distance between the maximum and minimum values of ''x'', for the (pathological) case of α = 1 and β = 1, for which values the beta distribution approaches the uniform distribution and the differential entropy approaches its maximum value, and hence maximum "disorder".

For example, for α = 1.0001 and β = 1.00000001:
* mode   = 0.9999;   PDF(mode) = 1.00010
* mean   = 0.500025; PDF(mean) = 1.00003
* median = 0.500035; PDF(median) = 1.00003
* mean − mode   = −0.499875
* mean − median = −9.65538 × 10−6

where PDF stands for the value of the probability density function.
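A quick numerical check of this ordering (here in Python, assuming SciPy, with arbitrary parameters satisfying 1 < α < β):

<syntaxhighlight lang="python">
# For 1 < alpha < beta the mode, median and mean appear in increasing order.
from scipy.stats import beta

a, b = 2.0, 5.0
mode = (a - 1) / (a + b - 2)
median = beta.median(a, b)
mean = a / (a + b)
print(mode, median, mean)        # approximately 0.20, 0.26, 0.29
assert mode <= median <= mean
</syntaxhighlight>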


Mean, geometric mean and harmonic mean relationship

It is known from the inequality of arithmetic and geometric means that the geometric mean is lower than the mean.  Similarly, the harmonic mean is lower than the geometric mean.  The accompanying plot shows that for α = β, both the mean and the median are exactly equal to 1/2, regardless of the value of α = β, and the mode is also equal to 1/2 for α = β > 1, however the geometric and harmonic means are lower than 1/2 and they only approach this value asymptotically as α = β → ∞.
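For instance, using the closed forms for the geometric mean (in terms of the digamma function) and the harmonic mean given earlier in the article, the following Python sketch (assuming SciPy) shows the ordering harmonic mean ≤ geometric mean ≤ mean = 1/2 for the symmetric case, with all three approaching 1/2 as α = β grows:

<syntaxhighlight lang="python">
# Arithmetic, geometric and harmonic means of Beta(a, a) for increasing a.
import numpy as np
from scipy.special import digamma

for a in (1.5, 3.0, 10.0, 100.0):
    mean = 0.5                                      # mean = 1/2 when alpha = beta
    geometric = np.exp(digamma(a) - digamma(2*a))   # G_X = exp(psi(a) - psi(a + b))
    harmonic = (a - 1) / (2*a - 1)                  # H_X = (a - 1)/(a + b - 1), for a > 1
    print(a, harmonic, geometric, mean)             # increasing order, tending to 1/2
</syntaxhighlight>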


Kurtosis bounded by the square of the skewness

As remarked by Feller, in the Pearson system the beta probability density appears as type I (any difference between the beta distribution and Pearson's type I distribution is only superficial and it makes no difference for the following discussion regarding the relationship between kurtosis and skewness). Karl Pearson showed, in Plate 1 of his paper published in 1916, a graph with the kurtosis as the vertical axis (ordinate) and the square of the skewness as the horizontal axis (abscissa), in which a number of distributions were displayed.  The region occupied by the beta distribution is bounded by the following two lines in the (skewness², kurtosis) plane, or the (skewness², excess kurtosis) plane:

:(\text{skewness})^2+1< \text{kurtosis}< \frac{3}{2} (\text{skewness})^2 + 3

or, equivalently,

:(\text{skewness})^2-2< \text{excess kurtosis}< \frac{3}{2} (\text{skewness})^2

At a time when there were no powerful digital computers, Karl Pearson accurately computed further boundaries, for example, separating the "U-shaped" from the "J-shaped" distributions. The lower boundary line (excess kurtosis + 2 − skewness² = 0) is produced by skewed "U-shaped" beta distributions with both values of shape parameters α and β close to zero.  The upper boundary line (excess kurtosis − (3/2) skewness² = 0) is produced by extremely skewed distributions with very large values of one of the parameters and very small values of the other parameter.  Karl Pearson showed that this upper boundary line (excess kurtosis − (3/2) skewness² = 0) is also the intersection with Pearson's distribution III, which has unlimited support in one direction (towards positive infinity), and can be bell-shaped or J-shaped. His son, Egon Pearson, showed that the region (in the kurtosis/squared-skewness plane) occupied by the beta distribution (equivalently, Pearson's distribution I) as it approaches this boundary (excess kurtosis − (3/2) skewness² = 0) is shared with the noncentral chi-squared distribution.  Karl Pearson (Pearson 1895, pp. 357, 360, 373–376) also showed that the gamma distribution is a Pearson type III distribution. Hence this boundary line for Pearson's type III distribution is known as the gamma line. (This can be shown from the fact that the excess kurtosis of the gamma distribution is 6/''k'' and the square of the skewness is 4/''k'', hence (excess kurtosis − (3/2) skewness² = 0) is identically satisfied by the gamma distribution regardless of the value of the parameter "k"). Pearson later noted that the chi-squared distribution is a special case of Pearson's type III and also shares this boundary line (as it is apparent from the fact that for the chi-squared distribution the excess kurtosis is 12/''k'' and the square of the skewness is 8/''k'', hence (excess kurtosis − (3/2) skewness² = 0) is identically satisfied regardless of the value of the parameter "k"). This is to be expected, since the chi-squared distribution ''X'' ~ χ²(''k'') is a special case of the gamma distribution, with parametrization X ~ Γ(k/2, 1/2) where k is a positive integer that specifies the "number of degrees of freedom" of the chi-squared distribution.

An example of a beta distribution near the upper boundary (excess kurtosis − (3/2) skewness² = 0) is given by α = 0.1, β = 1000, for which the ratio (excess kurtosis)/(skewness²) = 1.49835 approaches the upper limit of 1.5 from below. An example of a beta distribution near the lower boundary (excess kurtosis + 2 − skewness² = 0) is given by α = 0.0001, β = 0.1, for which values the expression (excess kurtosis + 2)/(skewness²) = 1.01621 approaches the lower limit of 1 from above. In the infinitesimal limit for both α and β approaching zero symmetrically, the excess kurtosis reaches its minimum value at −2.  This minimum value occurs at the point at which the lower boundary line intersects the vertical axis (ordinate). (However, in Pearson's original chart, the ordinate is kurtosis, instead of excess kurtosis, and it increases downwards rather than upwards).

Values for the skewness and excess kurtosis below the lower boundary (excess kurtosis + 2 − skewness² = 0) cannot occur for any distribution, and hence Karl Pearson appropriately called the region below this boundary the "impossible region". The boundary for this "impossible region" is determined by (symmetric or skewed) bimodal "U"-shaped distributions for which the parameters α and β approach zero and hence all the probability density is concentrated at the ends: ''x'' = 0, 1 with practically nothing in between them. Since for α ≈ β ≈ 0 the probability density is concentrated at the two ends ''x'' = 0 and ''x'' = 1, this "impossible boundary" is determined by a Bernoulli distribution, where the two only possible outcomes occur with respective probabilities ''p'' and ''q'' = 1−''p''. For cases approaching this limit boundary with symmetry α = β, skewness ≈ 0, excess kurtosis ≈ −2 (this is the lowest excess kurtosis possible for any distribution), and the probabilities are ''p'' ≈ ''q'' ≈ 1/2.  For cases approaching this limit boundary with skewness, excess kurtosis ≈ −2 + skewness², and the probability density is concentrated more at one end than the other end (with practically nothing in between), with probabilities p = \tfrac{1}{2}\left(1+\frac{\text{skewness}}{\sqrt{(\text{skewness})^2+4}}\right) at the left end ''x'' = 0 and q = 1-p = \tfrac{1}{2}\left(1-\frac{\text{skewness}}{\sqrt{(\text{skewness})^2+4}}\right) at the right end ''x'' = 1.
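These two boundary examples can be checked directly; the following Python sketch (assuming SciPy) computes the quoted ratios from the skewness and excess kurtosis of the two beta distributions:

<syntaxhighlight lang="python">
# Ratios used above to illustrate the upper ("gamma line") and lower
# ("impossible region") boundaries in the (skewness^2, excess kurtosis) plane.
from scipy.stats import beta

for a, b in [(0.1, 1000.0), (0.0001, 0.1)]:
    mean, var, skew, ex_kurt = beta.stats(a, b, moments='mvsk')
    print(a, b, ex_kurt / skew**2, (ex_kurt + 2) / skew**2)
# Beta(0.1, 1000):   excess kurtosis / skewness^2       is about 1.49835 (close to 3/2)
# Beta(0.0001, 0.1): (excess kurtosis + 2) / skewness^2 is about 1.01621 (close to 1)
</syntaxhighlight>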


Symmetry

All statements are conditional on α, β > 0:

* Probability density function reflection symmetry
::f(x;\alpha,\beta) = f(1-x;\beta,\alpha)
* Cumulative distribution function reflection symmetry plus unitary translation
::F(x;\alpha,\beta) = I_x(\alpha,\beta) = 1- F(1- x;\beta,\alpha) = 1 - I_{1-x}(\beta,\alpha)
* Mode reflection symmetry plus unitary translation
::\operatorname{mode}(\Beta(\alpha, \beta))= 1-\operatorname{mode}(\Beta(\beta, \alpha)),\text{ if }\Beta(\beta, \alpha)\ne \Beta(1,1)
* Median reflection symmetry plus unitary translation
::\operatorname{median} (\Beta(\alpha, \beta) )= 1 - \operatorname{median} (\Beta(\beta, \alpha))
* Mean reflection symmetry plus unitary translation
::\mu (\Beta(\alpha, \beta) )= 1 - \mu (\Beta(\beta, \alpha) )
* Geometric means: each is individually asymmetric; the following symmetry applies between the geometric mean based on ''X'' and the geometric mean based on its reflection (1−''X'')
::G_X (\Beta(\alpha, \beta) )=G_{(1-X)}(\Beta(\beta, \alpha) )
* Harmonic means: each is individually asymmetric; the following symmetry applies between the harmonic mean based on ''X'' and the harmonic mean based on its reflection (1−''X'')
::H_X (\Beta(\alpha, \beta) )=H_{(1-X)}(\Beta(\beta, \alpha) ) \text{ if } \alpha, \beta > 1 .
* Variance symmetry
::\operatorname{var} (\Beta(\alpha, \beta) )=\operatorname{var} (\Beta(\beta, \alpha) )
* Geometric variances: each is individually asymmetric; the following symmetry applies between the log geometric variance based on ''X'' and the log geometric variance based on its reflection (1−''X'')
::\ln(\operatorname{var}_{GX} (\Beta(\alpha, \beta))) = \ln(\operatorname{var}_{G(1-X)}(\Beta(\beta, \alpha)))
* Geometric covariance symmetry
::\ln \operatorname{cov}_{GX,(1-X)}(\Beta(\alpha, \beta))=\ln \operatorname{cov}_{GX,(1-X)}(\Beta(\beta, \alpha))
* Mean absolute deviation around the mean symmetry
::\operatorname{E}[|X - E[X]|] (\Beta(\alpha, \beta))=\operatorname{E}[| X - E[X]|] (\Beta(\beta, \alpha))
* Skewness skew-symmetry
::\operatorname{skewness} (\Beta(\alpha, \beta) )= - \operatorname{skewness} (\Beta(\beta, \alpha) )
* Excess kurtosis symmetry
::\text{excess kurtosis} (\Beta(\alpha, \beta) )= \text{excess kurtosis} (\Beta(\beta, \alpha) )
* Characteristic function symmetry of Real part (with respect to the origin of variable "t")
:: \text{Re} [{}_1F_1(\alpha; \alpha+\beta; it) ] = \text{Re} [ {}_1F_1(\alpha; \alpha+\beta; - it)]
* Characteristic function skew-symmetry of Imaginary part (with respect to the origin of variable "t")
:: \text{Im} [{}_1F_1(\alpha; \alpha+\beta; it) ] = - \text{Im} [ {}_1F_1(\alpha; \alpha+\beta; - it) ]
* Characteristic function symmetry of Absolute value (with respect to the origin of variable "t")
:: \text{Abs} [ {}_1F_1(\alpha; \alpha+\beta; it) ] = \text{Abs} [ {}_1F_1(\alpha; \alpha+\beta; - it) ]
* Differential entropy symmetry
::h(\Beta(\alpha, \beta) )= h(\Beta(\beta, \alpha) )
* Relative entropy (also called Kullback–Leibler divergence) symmetry
::D_{\mathrm{KL}}(X_1||X_2) = D_{\mathrm{KL}}(X_2||X_1), \text{ if }h(X_1) = h(X_2)\text{, for }\alpha \neq \beta
* Fisher information matrix symmetry
::{\mathcal{I}}_{i,j} = {\mathcal{I}}_{j,i}


Geometry of the probability density function


Inflection points

For certain values of the shape parameters α and β, the probability density function has inflection points, at which the curvature changes sign.  The position of these inflection points can be useful as a measure of the dispersion or spread of the distribution.

Defining the following quantity:

:\kappa =\frac{\sqrt{\frac{(\alpha-1)(\beta-1)}{\alpha+\beta-3}}}{\alpha+\beta-2}

Points of inflection occur, depending on the value of the shape parameters α and β, as follows:

*(α > 2, β > 2) The distribution is bell-shaped (symmetric for α = β and skewed otherwise), with two inflection points, equidistant from the mode:
::x = \text{mode} \pm \kappa = \frac{\alpha-1\pm\sqrt{\frac{(\alpha-1)(\beta-1)}{\alpha+\beta-3}}}{\alpha+\beta-2}
* (α = 2, β > 2) The distribution is unimodal, positively skewed, right-tailed, with one inflection point, located to the right of the mode:
::x =\text{mode} + \kappa = \frac{2}{\beta}
* (α > 2, β = 2) The distribution is unimodal, negatively skewed, left-tailed, with one inflection point, located to the left of the mode:
::x = \text{mode} - \kappa = 1 - \frac{2}{\alpha}
* (1 < α < 2, β > 2, α+β>2) The distribution is unimodal, positively skewed, right-tailed, with one inflection point, located to the right of the mode:
::x =\text{mode} + \kappa = \frac{\alpha-1+\sqrt{\frac{(\alpha-1)(\beta-1)}{\alpha+\beta-3}}}{\alpha+\beta-2}
*(0 < α < 1, 1 < β < 2) The distribution has a mode at the left end ''x'' = 0 and it is positively skewed, right-tailed. There is one inflection point, located to the right of the mode:
::x = \frac{\alpha-1+\sqrt{\frac{(\alpha-1)(\beta-1)}{\alpha+\beta-3}}}{\alpha+\beta-2}
*(α > 2, 1 < β < 2) The distribution is unimodal negatively skewed, left-tailed, with one inflection point, located to the left of the mode:
::x =\text{mode} - \kappa = \frac{\alpha-1-\sqrt{\frac{(\alpha-1)(\beta-1)}{\alpha+\beta-3}}}{\alpha+\beta-2}
*(1 < α < 2, 0 < β < 1) The distribution has a mode at the right end ''x'' = 1 and it is negatively skewed, left-tailed. There is one inflection point, located to the left of the mode:
::x = \frac{\alpha-1-\sqrt{\frac{(\alpha-1)(\beta-1)}{\alpha+\beta-3}}}{\alpha+\beta-2}

There are no inflection points in the remaining (symmetric and skewed) regions: U-shaped: (α, β < 1), upside-down-U-shaped: (1 < α < 2, 1 < β < 2), reverse-J-shaped (α < 1, β > 2) or J-shaped: (α > 2, β < 1).

The accompanying plots show the inflection point locations (shown vertically, ranging from 0 to 1) versus α and β (the horizontal axes ranging from 0 to 5). There are large cuts at surfaces intersecting the lines α = 1, β = 1, α = 2, and β = 2 because at these values the beta distribution changes from 2 modes, to 1 mode, to no mode.
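As a sanity check of these expressions, the following Python sketch (assuming SciPy; the parameter values are arbitrary) locates the sign changes of a numerically differentiated density and compares them with mode ± κ for a bell-shaped case (α, β > 2):

<syntaxhighlight lang="python">
# Compare the closed-form inflection points mode +/- kappa with the points
# where a numerical second derivative of the density changes sign.
import numpy as np
from scipy.stats import beta

a, b = 4.0, 6.0                       # bell-shaped case: alpha, beta > 2
mode = (a - 1) / (a + b - 2)
kappa = np.sqrt((a - 1)*(b - 1)/(a + b - 3)) / (a + b - 2)

x = np.linspace(1e-4, 1 - 1e-4, 20001)
second = np.gradient(np.gradient(beta.pdf(x, a, b), x), x)
crossings = x[np.where(np.diff(np.sign(second)) != 0)[0]]
print(mode - kappa, mode + kappa)     # closed form: about 0.192 and 0.558
print(crossings)                      # numerical sign changes near the same points
</syntaxhighlight>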


Shapes

The beta density function can take a wide variety of different shapes depending on the values of the two parameters ''α'' and ''β''.  The ability of the beta distribution to take this great diversity of shapes (using only two parameters) is partly responsible for finding wide application for modeling actual measurements:


Symmetric (''α'' = ''β'')

* the density function is symmetric about 1/2 (blue & teal plots).
* median = mean = 1/2.
* skewness = 0.
* variance = 1/(4(2α + 1))
* α = β < 1
** U-shaped (blue plot).
** bimodal: left mode = 0, right mode = 1, anti-mode = 1/2
** 1/12 < var(''X'') < 1/4
** −2 < excess kurtosis(''X'') < −6/5
** α = β = 1/2 is the arcsine distribution
*** var(''X'') = 1/8
*** excess kurtosis(''X'') = −3/2
*** CF = Rinc (t)
** α = β → 0 is a 2-point Bernoulli distribution with equal probability 1/2 at each Dirac delta function end ''x'' = 0 and ''x'' = 1 and zero probability everywhere else. A coin toss: one face of the coin being ''x'' = 0 and the other face being ''x'' = 1.
*** \lim_{\alpha = \beta \to 0} \operatorname{var}(X) = \tfrac{1}{4}
*** \lim_{\alpha = \beta \to 0} \operatorname{excess kurtosis}(X) = - 2 (a lower value than this is impossible for any distribution to reach)
*** The differential entropy approaches a minimum value of −∞
* α = β = 1
** the uniform [0, 1] distribution
** no mode
** var(''X'') = 1/12
** excess kurtosis(''X'') = −6/5
** The (negative anywhere else) differential entropy reaches its maximum value of zero
** CF = Sinc (t)
* ''α'' = ''β'' > 1
** symmetric unimodal
** mode = 1/2.
** 0 < var(''X'') < 1/12
** −6/5 < excess kurtosis(''X'') < 0
** ''α'' = ''β'' = 3/2 is a semi-elliptic [0, 1] distribution, see: Wigner semicircle distribution
*** var(''X'') = 1/16.
*** excess kurtosis(''X'') = −1
*** CF = 2 Jinc (t)
** ''α'' = ''β'' = 2 is the parabolic [0, 1] distribution
*** var(''X'') = 1/20
*** excess kurtosis(''X'') = −6/7
*** CF = 3 Tinc (t)
** ''α'' = ''β'' > 2 is bell-shaped, with inflection points located to either side of the mode
*** 0 < var(''X'') < 1/20
*** −6/7 < excess kurtosis(''X'') < 0
** ''α'' = ''β'' → ∞ is a 1-point degenerate distribution with a Dirac delta function spike at the midpoint ''x'' = 1/2 with probability 1, and zero probability everywhere else. There is 100% probability (absolute certainty) concentrated at the single point ''x'' = 1/2.
*** \lim_{\alpha = \beta \to \infty} \operatorname{var}(X) = 0
*** \lim_{\alpha = \beta \to \infty} \operatorname{excess kurtosis}(X) = 0
*** The differential entropy approaches a minimum value of −∞


Skewed (''α'' ≠ ''β'')

The density function is skewed.  An interchange of parameter values yields the mirror image (the reverse) of the initial curve. Some more specific cases:

*''α'' < 1, ''β'' < 1
** U-shaped
** Positive skew for α < β, negative skew for α > β.
** bimodal: left mode = 0, right mode = 1, anti-mode = \tfrac{\alpha-1}{\alpha+\beta-2}
** 0 < median < 1.
** 0 < var(''X'') < 1/4
*α > 1, β > 1
** unimodal (magenta & cyan plots),
** Positive skew for α < β, negative skew for α > β.
**\text{mode}= \tfrac{\alpha-1}{\alpha+\beta-2}
** 0 < median < 1
** 0 < var(''X'') < 1/12
*α < 1, β ≥ 1
** reverse J-shaped with a right tail,
** positively skewed,
** strictly decreasing, convex
** mode = 0
** 0 < median < 1/2.
** 0 < \operatorname{var}(X) < \tfrac{-11+5\sqrt{5}}{2},  (maximum variance occurs for \alpha=\tfrac{\sqrt{5}-1}{2}, \beta=1, or α = Φ the golden ratio conjugate)
*α ≥ 1, β < 1
** J-shaped with a left tail,
** negatively skewed,
** strictly increasing, convex
** mode = 1
** 1/2 < median < 1
** 0 < \operatorname{var}(X) < \tfrac{-11+5\sqrt{5}}{2}, (maximum variance occurs for \alpha=1, \beta=\tfrac{\sqrt{5}-1}{2}, or β = Φ the golden ratio conjugate)
*α = 1, β > 1
** positively skewed,
** strictly decreasing (red plot),
** a reversed (mirror-image) power function [0, 1] distribution
** mean = 1 / (β + 1)
** median = 1 − 1/2<sup>1/β</sup>
** mode = 0
** α = 1, 1 < β < 2
*** concave
*** 1-\tfrac{1}{\sqrt{2}}< \text{median} < \tfrac{1}{2}
*** 1/18 < var(''X'') < 1/12.
** α = 1, β = 2
*** a straight line with slope −2, the right-triangular distribution with right angle at the left end, at ''x'' = 0
*** \text{median}=1-\tfrac{1}{\sqrt{2}}
*** var(''X'') = 1/18
** α = 1, β > 2
*** reverse J-shaped with a right tail,
*** convex
*** 0 < \text{median} < 1-\tfrac{1}{\sqrt{2}}
*** 0 < var(''X'') < 1/18
*α > 1, β = 1
** negatively skewed,
** strictly increasing (green plot),
** the power function [0, 1] distribution
** mean = α / (α + 1)
** median = 1/2<sup>1/α</sup>
** mode = 1
** 2 > α > 1, β = 1
*** concave
*** \tfrac{1}{2} < \text{median} < \tfrac{1}{\sqrt{2}}
*** 1/18 < var(''X'') < 1/12
** α = 2, β = 1
*** a straight line with slope +2, the right-triangular distribution with right angle at the right end, at ''x'' = 1
*** \text{median}=\tfrac{1}{\sqrt{2}}
*** var(''X'') = 1/18
** α > 2, β = 1
*** J-shaped with a left tail, convex
***\tfrac{1}{\sqrt{2}} < \text{median} < 1
*** 0 < var(''X'') < 1/18


Related distributions


Transformations

* If ''X'' ~ Beta(''α'', ''β'') then 1 − ''X'' ~ Beta(''β'', ''α''), mirror-image symmetry
* If ''X'' ~ Beta(''α'', ''β'') then \tfrac{X}{1-X} \sim \beta'(\alpha,\beta), the beta prime distribution, also called "beta distribution of the second kind".
* If ''X'' ~ Beta(''α'', ''β'') then \tfrac{1}{X} -1 \sim \beta'(\beta,\alpha).
* If ''X'' ~ Beta(''n''/2, ''m''/2) then \tfrac{mX}{n(1-X)} \sim F(n,m) (assuming ''n'' > 0 and ''m'' > 0), the Fisher–Snedecor F distribution.
* If X \sim \operatorname{Beta}\left(1+\lambda\tfrac{m-\min}{\max-\min}, 1 + \lambda\tfrac{\max-m}{\max-\min}\right) then min + ''X''(max − min) ~ PERT(min, max, ''m'', ''λ'') where ''PERT'' denotes a PERT distribution used in PERT analysis, and ''m'' = most likely value.Herrerías-Velasco, José Manuel and Herrerías-Pleguezuelo, Rafael and René van Dorp, Johan. (2011). Revisiting the PERT mean and Variance. European Journal of Operational Research (210), p. 448–451. Traditionally ''λ'' = 4 in PERT analysis.
* If ''X'' ~ Beta(1, ''β'') then ''X'' ~ Kumaraswamy distribution with parameters (1, ''β'')
* If ''X'' ~ Beta(''α'', 1) then ''X'' ~ Kumaraswamy distribution with parameters (''α'', 1)
* If ''X'' ~ Beta(''α'', 1) then −ln(''X'') ~ Exponential(''α'')
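Two of these transformations are easy to verify by simulation; the following Python sketch (assuming SciPy; the parameter choices are arbitrary) applies a Kolmogorov–Smirnov test to the mirror-image relation and to the exponential transformation:

<syntaxhighlight lang="python">
# Monte Carlo check of two transformations: 1 - X ~ Beta(beta, alpha), and
# -ln(X) ~ Exponential(alpha) when X ~ Beta(alpha, 1).
import numpy as np
from scipy.stats import beta, expon, kstest

rng = np.random.default_rng(1)

x = beta.rvs(2, 5, size=100_000, random_state=rng)
print(kstest(1 - x, beta(5, 2).cdf))             # large p-value: consistent with Beta(5, 2)

a = 3.0
y = beta.rvs(a, 1, size=100_000, random_state=rng)
print(kstest(-np.log(y), expon(scale=1/a).cdf))  # consistent with Exponential(rate a)
</syntaxhighlight>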


Special and limiting cases

* Beta(1, 1) ~ U(0, 1), the standard uniform distribution.
* Beta(n, 1) ~ Maximum of ''n'' independent rvs. with U(0, 1), sometimes called a ''standard power function distribution'' with density ''n''&thinsp;''x''<sup>''n''−1</sup> on that interval.
* Beta(1, n) ~ Minimum of ''n'' independent rvs. with U(0, 1)
* If ''X'' ~ Beta(3/2, 3/2) and ''r'' > 0 then 2''rX'' − ''r'' ~ Wigner semicircle distribution.
* Beta(1/2, 1/2) is equivalent to the arcsine distribution. This distribution is also Jeffreys prior probability for the Bernoulli and binomial distributions. The arcsine probability density is a distribution that appears in several random-walk fundamental theorems. In a fair coin toss random walk, the probability for the time of the last visit to the origin is distributed as an (U-shaped) arcsine distribution.  In a two-player fair-coin-toss game, a player is said to be in the lead if the random walk (that started at the origin) is above the origin.  The most probable number of times that a given player will be in the lead, in a game of length 2''N'', is not ''N''.  On the contrary, ''N'' is the least likely number of times that the player will be in the lead. The most likely number of times in the lead is 0 or 2''N'' (following the arcsine distribution).
* \lim_{n \to \infty} n \operatorname{Beta}(1,n) = \operatorname{Exponential}(1), the exponential distribution.
* \lim_{n \to \infty} n \operatorname{Beta}(k,n) = \operatorname{Gamma}(k,1), the gamma distribution.
* For large n, \operatorname{Beta}(\alpha n,\beta n) \to \mathcal{N}\left(\frac{\alpha}{\alpha+\beta},\frac{1}{n}\frac{\alpha\beta}{(\alpha+\beta)^3}\right), the normal distribution. More precisely, if X_n \sim \operatorname{Beta}(\alpha n,\beta n) then \sqrt{n}\left(X_n -\tfrac{\alpha}{\alpha+\beta}\right) converges in distribution to a normal distribution with mean 0 and variance \tfrac{\alpha\beta}{(\alpha+\beta)^3} as ''n'' increases.


Derived from other distributions

* The ''k''th order statistic of a sample of size ''n'' from the uniform distribution is a beta random variable, ''U''(''k'') ~ Beta(''k'', ''n''+1−''k'').
* If ''X'' ~ Gamma(α, θ) and ''Y'' ~ Gamma(β, θ) are independent, then \tfrac{X}{X+Y} \sim \operatorname{Beta}(\alpha, \beta)\,.
* If X \sim \chi^2(\alpha)\, and Y \sim \chi^2(\beta)\, are independent, then \tfrac{X}{X+Y} \sim \operatorname{Beta}(\tfrac{\alpha}{2}, \tfrac{\beta}{2}).
* If ''X'' ~ U(0, 1) and ''α'' > 0 then ''X''<sup>1/''α''</sup> ~ Beta(''α'', 1), the power function distribution.
* If X \sim\operatorname{Bin}(k;n;p), then the normalized likelihood of ''p'' given ''k'' observed successes in ''n'' trials is \operatorname{Beta}(\alpha, \beta) for discrete values of ''n'' and ''k'', where \alpha=k+1 and \beta=n-k+1.
* If ''X'' ~ Cauchy(0, 1) then \tfrac{1}{1+X^2} \sim \operatorname{Beta}\left(\tfrac12, \tfrac12\right)\,
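For example, the order-statistic and gamma-ratio relations above can be verified by simulation (Python, assuming SciPy; the sample sizes and parameters are arbitrary):

<syntaxhighlight lang="python">
# Monte Carlo check: the k-th order statistic of n uniforms is Beta(k, n+1-k),
# and X/(X+Y) ~ Beta(a, b) for independent Gamma(a, theta), Gamma(b, theta).
import numpy as np
from scipy.stats import beta, gamma, kstest

rng = np.random.default_rng(2)

n, k = 7, 3
u = rng.uniform(size=(100_000, n))
kth = np.sort(u, axis=1)[:, k - 1]
print(kstest(kth, beta(k, n + 1 - k).cdf))

a, b, theta = 2.5, 4.0, 1.7
x = gamma.rvs(a, scale=theta, size=100_000, random_state=rng)
y = gamma.rvs(b, scale=theta, size=100_000, random_state=rng)
print(kstest(x / (x + y), beta(a, b).cdf))
</syntaxhighlight>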


Combination with other distributions

* If ''X'' ~ Beta(''α'', ''β'') and ''Y'' ~ F(2''β'', 2''α'') then \Pr(X \leq \tfrac{\alpha}{\alpha+\beta x}) = \Pr(Y \geq x)\, for all ''x'' > 0.


Compounding with other distributions

* If ''p'' ~ Beta(α, β) and ''X'' ~ Bin(''k'', ''p'') then ''X'' ~ beta-binomial distribution
* If ''p'' ~ Beta(α, β) and ''X'' ~ NB(''r'', ''p'') then ''X'' ~ beta negative binomial distribution


Generalisations

* The generalization to multiple variables, i.e. a multivariate beta distribution, is called a Dirichlet distribution. Univariate marginals of the Dirichlet distribution have a beta distribution.  The beta distribution is conjugate to the binomial and Bernoulli distributions in exactly the same way as the Dirichlet distribution is conjugate to the multinomial distribution and categorical distribution.
* The Pearson type I distribution is identical to the beta distribution (except for arbitrary shifting and re-scaling that can also be accomplished with the four parameter parametrization of the beta distribution).
* The beta distribution is the special case of the noncentral beta distribution where \lambda = 0: \operatorname{Beta}(\alpha, \beta) = \operatorname{Beta}(\alpha,\beta,0).
* The generalized beta distribution is a five-parameter distribution family which has the beta distribution as a special case.
* The matrix variate beta distribution is a distribution for positive-definite matrices.


Statistical inference


Parameter estimation


Method of moments


Two unknown parameters

Two unknown parameters ((\hat{\alpha}, \hat{\beta}) of a beta distribution supported in the [0, 1] interval) can be estimated, using the method of moments, with the first two moments (sample mean and sample variance) as follows.  Let:

: \text{sample mean}=\bar{x} = \frac{1}{N}\sum_{i=1}^N X_i

be the sample mean estimate and

: \text{sample variance} =\bar{v} = \frac{1}{N-1}\sum_{i=1}^N (X_i - \bar{x})^2

be the sample variance estimate.  The method-of-moments estimates of the parameters are

:\hat{\alpha} = \bar{x} \left(\frac{\bar{x}(1 - \bar{x})}{\bar{v}} - 1 \right), if \bar{v} <\bar{x}(1 - \bar{x}),

: \hat{\beta} = (1-\bar{x}) \left(\frac{\bar{x}(1 - \bar{x})}{\bar{v}} - 1 \right), if \bar{v}<\bar{x}(1 - \bar{x}).

When the distribution is required over a known interval other than [0, 1] with random variable ''X'', say [''a'', ''c''] with random variable ''Y'', then replace \bar{x} with \frac{\bar{y}-a}{c-a}, and \bar{v} with \frac{\bar{v}_Y}{(c-a)^2} in the above couple of equations for the shape parameters (see the "Alternative parametrizations, four parameters" section below), where:

: \text{sample mean}=\bar{y} = \frac{1}{N}\sum_{i=1}^N Y_i

: \text{sample variance} = \bar{v}_Y = \frac{1}{N-1}\sum_{i=1}^N (Y_i - \bar{y})^2
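A minimal implementation of these two-parameter method-of-moments estimates in Python (NumPy only; the simulated sample is purely illustrative):

<syntaxhighlight lang="python">
# Method-of-moments estimates of (alpha, beta) for data assumed to lie in [0, 1].
import numpy as np

def beta_method_of_moments(x):
    xbar = np.mean(x)
    vbar = np.var(x, ddof=1)                 # sample variance
    if not vbar < xbar * (1 - xbar):
        raise ValueError("sample variance too large for a beta distribution")
    common = xbar * (1 - xbar) / vbar - 1
    return xbar * common, (1 - xbar) * common

rng = np.random.default_rng(3)
data = rng.beta(2.0, 5.0, size=10_000)
print(beta_method_of_moments(data))          # close to (2, 5)
</syntaxhighlight>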


Four unknown parameters

All four parameters (\hat{\alpha}, \hat{\beta}, \hat{a}, \hat{c}, of a beta distribution supported in the [''a'', ''c''] interval - see section "Alternative parametrizations, Four parameters") can be estimated, using the method of moments developed by Karl Pearson, by equating sample and population values of the first four central moments (mean, variance, skewness and excess kurtosis). The excess kurtosis was expressed in terms of the square of the skewness, and the sample size ν = α + β (see previous section "Kurtosis") as follows:

:\text{excess kurtosis} =\frac{6}{3 + \nu}\left(\frac{(2 + \nu)}{4} (\text{skewness})^2 - 1\right)\text{ if (skewness)}^2-2< \text{excess kurtosis}< \tfrac{3}{2} (\text{skewness})^2

One can use this equation to solve for the sample size ν = α + β in terms of the square of the skewness and the excess kurtosis as follows:

:\hat{\nu} = \hat{\alpha} + \hat{\beta} = 3\frac{(\text{sample excess kurtosis}) - (\text{sample skewness})^2 + 2}{\frac{3}{2} (\text{sample skewness})^2 - \text{(sample excess kurtosis)}}

:\text{if (sample skewness)}^2-2< \text{sample excess kurtosis}< \tfrac{3}{2} (\text{sample skewness})^2

This is the ratio (multiplied by a factor of 3) between the previously derived limit boundaries for the beta distribution in a space (as originally done by Karl Pearson) defined with coordinates of the square of the skewness in one axis and the excess kurtosis in the other axis (see previous section "Kurtosis bounded by the square of the skewness"):

The case of zero skewness can be immediately solved because for zero skewness, α = β and hence ν = 2α = 2β, therefore α = β = ν/2

: \hat{\alpha} = \hat{\beta} = \frac{\hat{\nu}}{2}= \frac{\frac{3}{2}(\text{sample excess kurtosis}) + 3}{- \text{(sample excess kurtosis)}}

:  \text{if sample skewness}= 0 \text{ and } -2<\text{sample excess kurtosis}<0

(Excess kurtosis is negative for the beta distribution with zero skewness, ranging from −2 to 0, so that \hat{\nu} - and therefore the sample shape parameters - is positive, ranging from zero when the shape parameters approach zero and the excess kurtosis approaches −2, to infinity when the shape parameters approach infinity and the excess kurtosis approaches zero.)

For non-zero sample skewness one needs to solve a system of two coupled equations. Since the skewness and the excess kurtosis are independent of the parameters \hat{a}, \hat{c}, the parameters \hat{\alpha}, \hat{\beta} can be uniquely determined from the sample skewness and the sample excess kurtosis, by solving the coupled equations with two known variables (sample skewness and sample excess kurtosis) and two unknowns (the shape parameters):

:(\text{sample skewness})^2 = \frac{4(\hat{\beta}-\hat{\alpha})^2 (1 + \hat{\alpha} + \hat{\beta})}{\hat{\alpha}\hat{\beta}(2 + \hat{\alpha} + \hat{\beta})^2}

:\text{sample excess kurtosis} =\frac{6}{3 + \hat{\alpha} + \hat{\beta}}\left(\frac{(2 + \hat{\alpha} + \hat{\beta})}{4} (\text{sample skewness})^2 - 1\right)

:\text{if (sample skewness)}^2-2< \text{sample excess kurtosis}< \tfrac{3}{2}(\text{sample skewness})^2

resulting in the following solution:

: \hat{\alpha}, \hat{\beta} = \frac{\hat{\nu}}{2} \left (1 \pm \frac{1}{\sqrt{1+ \frac{16 (\hat{\nu} + 1)}{(\hat{\nu}+ 2)^2(\text{sample skewness})^2}}} \right )

: \text{if sample skewness}\neq 0 \text{ and } (\text{sample skewness})^2-2< \text{sample excess kurtosis}< \tfrac{3}{2} (\text{sample skewness})^2

Where one should take the solutions as follows: \hat{\alpha}>\hat{\beta} for (negative) sample skewness < 0, and \hat{\alpha}<\hat{\beta} for (positive) sample skewness > 0.

The accompanying plot shows these two solutions as surfaces in a space with horizontal axes of (sample excess kurtosis) and (sample squared skewness) and the shape parameters as the vertical axis. The surfaces are constrained by the condition that the sample excess kurtosis must be bounded by the sample squared skewness as stipulated in the above equation.  The two surfaces meet at the right edge defined by zero skewness. Along this right edge, both parameters are equal and the distribution is symmetric U-shaped for α = β < 1, uniform for α = β = 1, upside-down-U-shaped for 1 < α = β < 2 and bell-shaped for α = β > 2.  The surfaces also meet at the front (lower) edge defined by "the impossible boundary" line (excess kurtosis + 2 − skewness² = 0). Along this front (lower) boundary both shape parameters approach zero, and the probability density is concentrated more at one end than the other end (with practically nothing in between), with probabilities p=\tfrac{1}{2}\left(1+\frac{\text{skewness}}{\sqrt{(\text{skewness})^2+4}}\right) at the left end ''x'' = 0 and q = 1-p = \tfrac{1}{2}\left(1-\frac{\text{skewness}}{\sqrt{(\text{skewness})^2+4}}\right) at the right end ''x'' = 1.  The two surfaces become further apart towards the rear edge.  At this rear edge the surface parameters are quite different from each other.  As remarked, for example, by Bowman and Shenton, sampling in the neighborhood of the line (sample excess kurtosis − (3/2)(sample skewness)² = 0) (the just-J-shaped portion of the rear edge where blue meets beige), "is dangerously near to chaos", because at that line the denominator of the expression above for the estimate ν = α + β becomes zero and hence ν approaches infinity as that line is approached.  Bowman and Shenton write that "the higher moment parameters (kurtosis and skewness) are extremely fragile (near that line). However, the mean and standard deviation are fairly reliable."  Therefore, the problem is for the case of four parameter estimation for very skewed distributions such that the excess kurtosis approaches (3/2) times the square of the skewness.  This boundary line is produced by extremely skewed distributions with very large values of one of the parameters and very small values of the other parameter.  See the section "Kurtosis bounded by the square of the skewness" for a numerical example and further comments about this rear edge boundary line (sample excess kurtosis − (3/2)(sample skewness)² = 0).  As remarked by Karl Pearson himself, this issue may not be of much practical importance as this trouble arises only for very skewed J-shaped (or mirror-image J-shaped) distributions with very different values of shape parameters that are unlikely to occur much in practice.  The usual skewed-bell-shape distributions that occur in practice do not have this parameter estimation problem.

The remaining two parameters \hat{a}, \hat{c} can be determined using the sample mean and the sample variance using a variety of equations.  One alternative is to calculate the support interval range (\hat{c}-\hat{a}) based on the sample variance and the sample kurtosis.  For this purpose one can solve, in terms of the range (\hat{c}- \hat{a}), the equation expressing the excess kurtosis in terms of the sample variance, and the sample size ν (see the sections "Kurtosis" and "Alternative parametrizations, four parameters"):

:\text{sample excess kurtosis} =\frac{6}{(\hat{\nu}+2)(\hat{\nu}+3)}\bigg(\frac{(\hat{c}-\hat{a})^2}{\text{(sample variance)}} - 6 - 5 \hat{\nu} \bigg)

to obtain:

: (\hat{c}- \hat{a}) = \sqrt{\text{(sample variance)}}\,\sqrt{6+5\hat{\nu}+\frac{(\hat{\nu}+2)(\hat{\nu}+3)}{6}\text{(sample excess kurtosis)}}

Another alternative is to calculate the support interval range (\hat{c}-\hat{a}) based on the sample variance and the sample skewness.  For this purpose one can solve, in terms of the range (\hat{c}-\hat{a}), the equation expressing the squared skewness in terms of the sample variance, and the sample size ν (see section titled "Skewness" and "Alternative parametrizations, four parameters"):

:(\text{sample skewness})^2 = \frac{4}{(\hat{\nu}+2)^2}\bigg(\frac{(\hat{c}-\hat{a})^2}{\text{(sample variance)}}-4(1+\hat{\nu})\bigg)

to obtain:

: (\hat{c}- \hat{a}) = \frac{\sqrt{\text{(sample variance)}}}{2}\sqrt{(\hat{\nu}+2)^2(\text{sample skewness})^2+16(1+\hat{\nu})}

The remaining parameter can be determined from the sample mean and the previously obtained parameters: (\hat{c}-\hat{a}), \hat{\alpha}, \hat{\nu} = \hat{\alpha}+\hat{\beta}:

:  \hat{a} = (\text{sample mean}) -  \left(\frac{\hat{\alpha}}{\hat{\nu}}\right)(\hat{c}-\hat{a})

and finally, \hat{c}= (\hat{c}- \hat{a}) + \hat{a}.

In the above formulas one may take, for example, as estimates of the sample moments:

:\begin{align}
\text{sample mean} &=\overline{y} = \frac{1}{N}\sum_{i=1}^N Y_i \\
\text{sample variance} &= \overline{v}_Y = \frac{1}{N-1}\sum_{i=1}^N (Y_i - \overline{y})^2 \\
\text{sample skewness} &= G_1 = \frac{\sqrt{N(N-1)}}{N-2}\, \frac{\frac{1}{N} \sum_{i=1}^N (Y_i - \overline{y})^3}{\left(\frac{1}{N} \sum_{i=1}^N (Y_i - \overline{y})^2\right)^{\frac{3}{2}}} \\
\text{sample excess kurtosis} &= G_2 = \frac{(N-1)(N+1)}{(N-2)(N-3)}\, \frac{\frac{1}{N} \sum_{i=1}^N (Y_i - \overline{y})^4}{\left(\frac{1}{N} \sum_{i=1}^N (Y_i - \overline{y})^2\right)^2} - \frac{3(N-1)^2}{(N-2)(N-3)}
\end{align}

The estimators ''G''1 for sample skewness and ''G''2 for sample kurtosis are used by DAP/SAS, PSPP/SPSS, and Excel.  However, they are not used by BMDP and (according to Joanes and Gill) they were not used by MINITAB in 1998. Actually, Joanes and Gill in their 1998 study concluded that the skewness and kurtosis estimators used in BMDP and in MINITAB (at that time) had smaller variance and mean-squared error in normal samples, but the skewness and kurtosis estimators used in DAP/SAS and PSPP/SPSS, namely ''G''1 and ''G''2, had smaller mean-squared error in samples from a very skewed distribution.  It is for this reason that we have spelled out "sample skewness", etc., in the above formulas, to make it explicit that the user should choose the best estimator according to the problem at hand, as the best estimator for skewness and kurtosis depends on the amount of skewness (as shown by Joanes and Gill).
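The chain of formulas above (ν̂ from the skewness and kurtosis, then α̂ and β̂, then the range from the variance, and finally the location from the mean) can be assembled into a short Python sketch (NumPy only; the function and variable names are illustrative, and no attempt is made to handle the degenerate boundary cases discussed above):

<syntaxhighlight lang="python">
# Pearson's four-parameter method of moments for the beta distribution,
# following the formulas in this section.
import numpy as np

def beta4_method_of_moments(y):
    N = len(y)
    ybar = np.mean(y)
    m2 = np.mean((y - ybar)**2)
    m3 = np.mean((y - ybar)**3)
    m4 = np.mean((y - ybar)**4)
    var = N * m2 / (N - 1)                                    # sample variance
    G1 = np.sqrt(N*(N - 1)) / (N - 2) * m3 / m2**1.5          # sample skewness
    G2 = (N - 1) / ((N - 2)*(N - 3)) * ((N + 1)*m4/m2**2 - 3*(N - 1))  # sample excess kurtosis

    nu = 3 * (G2 - G1**2 + 2) / (1.5 * G1**2 - G2)            # nu = alpha + beta
    if abs(G1) < 1e-12:                                       # symmetric (zero-skewness) case
        a_hat = b_hat = nu / 2
    else:
        delta = 1 / np.sqrt(1 + 16*(nu + 1) / ((nu + 2)**2 * G1**2))
        # alpha < beta for positive skewness, alpha > beta for negative skewness
        a_hat = nu / 2 * (1 - np.sign(G1) * delta)
        b_hat = nu / 2 * (1 + np.sign(G1) * delta)

    rng_hat = np.sqrt(var) * np.sqrt(6 + 5*nu + (nu + 2)*(nu + 3)/6 * G2)
    a_loc = ybar - (a_hat / nu) * rng_hat                     # left end of the support
    return a_hat, b_hat, a_loc, a_loc + rng_hat

rng = np.random.default_rng(4)
sample = 2.0 + 3.0 * rng.beta(2.0, 5.0, size=200_000)         # Beta(2, 5) rescaled to [2, 5]
print(beta4_method_of_moments(sample))                        # roughly (2, 5, 2, 5)
</syntaxhighlight>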


Maximum likelihood


Two unknown parameters

As is also the case for maximum likelihood estimates for the gamma distribution, the maximum likelihood estimates for the beta distribution do not have a general closed form solution for arbitrary values of the shape parameters. If ''X''1, ..., ''XN'' are independent random variables each having a beta distribution, the joint log likelihood function for ''N'' iid observations is:

:\begin{align}
\ln\, \mathcal{L} (\alpha, \beta\mid X) &= \sum_{i=1}^N \ln \left (\mathcal{L}_i (\alpha, \beta\mid X_i) \right )\\
&= \sum_{i=1}^N \ln \left (f(X_i;\alpha,\beta) \right ) \\
&= \sum_{i=1}^N \ln \left (\frac{X_i^{\alpha-1}(1-X_i)^{\beta-1}}{\Beta(\alpha,\beta)} \right ) \\
&= (\alpha - 1)\sum_{i=1}^N \ln (X_i) + (\beta- 1)\sum_{i=1}^N  \ln (1-X_i) - N \ln \Beta(\alpha,\beta)
\end{align}

Finding the maximum with respect to a shape parameter involves taking the partial derivative with respect to the shape parameter and setting the expression equal to zero yielding the maximum likelihood estimator of the shape parameters:

:\frac{\partial \ln \mathcal{L}(\alpha,\beta\mid X)}{\partial \alpha} = \sum_{i=1}^N \ln X_i -N\frac{\partial \ln \Beta(\alpha,\beta)}{\partial \alpha}=0

:\frac{\partial \ln \mathcal{L}(\alpha,\beta\mid X)}{\partial \beta} = \sum_{i=1}^N  \ln (1-X_i)- N\frac{\partial \ln \Beta(\alpha,\beta)}{\partial \beta}=0

where:

:\frac{\partial \ln \Beta(\alpha,\beta)}{\partial \alpha} = -\frac{\partial \ln \Gamma(\alpha+\beta)}{\partial \alpha}+ \frac{\partial \ln \Gamma(\alpha)}{\partial \alpha}+ \frac{\partial \ln \Gamma(\beta)}{\partial \alpha}=-\psi(\alpha + \beta) + \psi(\alpha) + 0

:\frac{\partial \ln \Beta(\alpha,\beta)}{\partial \beta}= - \frac{\partial \ln \Gamma(\alpha+\beta)}{\partial \beta}+ \frac{\partial \ln \Gamma(\alpha)}{\partial \beta} + \frac{\partial \ln \Gamma(\beta)}{\partial \beta}=-\psi(\alpha + \beta) + 0 + \psi(\beta)

since the digamma function, denoted ψ(α), is defined as the logarithmic derivative of the gamma function:

:\psi(\alpha) =\frac{\partial \ln \Gamma(\alpha)}{\partial \alpha}

To ensure that the values with zero tangent slope are indeed a maximum (instead of a saddle-point or a minimum) one has to also satisfy the condition that the curvature is negative.  This amounts to satisfying that the second partial derivative with respect to the shape parameters is negative

:\frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{\partial \alpha^2}= -N\frac{\partial^2\ln \Beta(\alpha,\beta)}{\partial \alpha^2}<0

:\frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{\partial \beta^2} = -N\frac{\partial^2\ln \Beta(\alpha,\beta)}{\partial \beta^2}<0

using the previous equations, this is equivalent to:

:\frac{\partial^2\ln \Beta(\alpha,\beta)}{\partial \alpha^2} = \psi_1(\alpha)-\psi_1(\alpha + \beta) > 0

:\frac{\partial^2\ln \Beta(\alpha,\beta)}{\partial \beta^2} = \psi_1(\beta) -\psi_1(\alpha + \beta) > 0

where the trigamma function, denoted ''ψ''1(''α''), is the second of the polygamma functions, and is defined as the derivative of the digamma function:

:\psi_1(\alpha) = \frac{d^2\ln\Gamma(\alpha)}{d\alpha^2}=\, \frac{d\,\psi(\alpha)}{d\alpha}.

These conditions are equivalent to stating that the variances of the logarithmically transformed variables are positive, since:

:\operatorname{var}[\ln (X)] = \operatorname{E}[\ln^2 (X)] - (\operatorname{E}[\ln (X)])^2 = \psi_1(\alpha) - \psi_1(\alpha + \beta)

:\operatorname{var}[\ln (1-X)] = \operatorname{E}[\ln^2 (1-X)] - (\operatorname{E}[\ln (1-X)])^2 = \psi_1(\beta) - \psi_1(\alpha + \beta)

Therefore, the condition of negative curvature at a maximum is equivalent to the statements:

:   \operatorname{var}[\ln (X)] > 0

:   \operatorname{var}[\ln (1-X)]> 0

Alternatively, the condition of negative curvature at a maximum is also equivalent to stating that the following logarithmic derivatives of the geometric means ''GX'' and ''G(1−X)'' are positive, since:

: \psi_1(\alpha) - \psi_1(\alpha + \beta) = \frac{\partial \ln G_X}{\partial \alpha} > 0

: \psi_1(\beta)  - \psi_1(\alpha + \beta) = \frac{\partial \ln G_{(1-X)}}{\partial \beta} > 0

While these slopes are indeed positive, the other slopes are negative:

:\frac{\partial \ln G_X}{\partial \beta}, \frac{\partial \ln G_{(1-X)}}{\partial \alpha} < 0.

The slopes of the mean and the median with respect to ''α'' and ''β'' display similar sign behavior.

From the condition that at a maximum, the partial derivative with respect to the shape parameter equals zero, we obtain the following system of coupled maximum likelihood estimate equations (for the average log-likelihoods) that needs to be inverted to obtain the (unknown) shape parameter estimates \hat{\alpha},\hat{\beta} in terms of the (known) average of logarithms of the samples ''X''1, ..., ''XN'':

:\begin{align}
\hat{\operatorname{E}}[\ln (X)] &= \psi(\hat{\alpha}) - \psi(\hat{\alpha} + \hat{\beta})=\frac{1}{N}\sum_{i=1}^N \ln X_i =  \ln \hat{G}_X \\
\hat{\operatorname{E}}[\ln(1-X)] &= \psi(\hat{\beta}) - \psi(\hat{\alpha} + \hat{\beta})=\frac{1}{N}\sum_{i=1}^N \ln (1-X_i)= \ln \hat{G}_{(1-X)}
\end{align}

where we recognize \ln \hat{G}_X as the logarithm of the sample geometric mean and \ln \hat{G}_{(1-X)} as the logarithm of the sample geometric mean based on (1 − ''X''), the mirror-image of ''X''. For \hat{\alpha}=\hat{\beta}, it follows that \hat{G}_X=\hat{G}_{(1-X)}.

:\begin{align}
\hat{G}_X &= \prod_{i=1}^N (X_i)^{1/N} \\
\hat{G}_{(1-X)} &= \prod_{i=1}^N (1-X_i)^{1/N}
\end{align}

These coupled equations containing digamma functions of the shape parameter estimates \hat{\alpha},\hat{\beta} must be solved by numerical methods as done, for example, by Beckman et al. Gnanadesikan et al. give numerical solutions for a few cases. N.L. Johnson and S. Kotz suggest that for "not too small" shape parameter estimates \hat{\alpha},\hat{\beta}, the logarithmic approximation to the digamma function \psi(\hat{\alpha}) \approx \ln(\hat{\alpha}-\tfrac{1}{2}) may be used to obtain initial values for an iterative solution, since the equations resulting from this approximation can be solved exactly:

:\ln \frac{\hat{\alpha} - \tfrac{1}{2}}{\hat{\alpha} + \hat{\beta} - \tfrac{1}{2}}  \approx  \ln \hat{G}_X

:\ln \frac{\hat{\beta}-\tfrac{1}{2}}{\hat{\alpha} + \hat{\beta} - \tfrac{1}{2}}\approx \ln \hat{G}_{(1-X)}

which leads to the following solution for the initial values (of the estimate shape parameters in terms of the sample geometric means) for an iterative solution:

:\hat{\alpha}\approx \tfrac{1}{2} + \frac{\hat{G}_X}{2(1-\hat{G}_X-\hat{G}_{(1-X)})} \text{ if } \hat{\alpha} >1

:\hat{\beta}\approx \tfrac{1}{2} + \frac{\hat{G}_{(1-X)}}{2(1-\hat{G}_X-\hat{G}_{(1-X)})} \text{ if } \hat{\beta} > 1

Alternatively, the estimates provided by the method of moments can instead be used as initial values for an iterative solution of the maximum likelihood coupled equations in terms of the digamma functions.

When the distribution is required over a known interval other than [0, 1] with random variable ''X'', say [''a'', ''c''] with random variable ''Y'', then replace ln(''Xi'') in the first equation with

:\ln \frac{Y_i-a}{c-a},

and replace ln(1−''Xi'') in the second equation with

:\ln \frac{c-Y_i}{c-a}

(see "Alternative parametrizations, four parameters" section below).

If one of the shape parameters is known, the problem is considerably simplified.  The following logit transformation can be used to solve for the unknown shape parameter (for skewed cases such that \hat{\alpha}\neq\hat{\beta}, otherwise, if symmetric, both -equal- parameters are known when one is known):

:\hat{\operatorname{E}} \left[\ln \left(\frac{X}{1-X} \right) \right]=\psi(\hat{\alpha}) - \psi(\hat{\beta})=\frac{1}{N}\sum_{i=1}^N \ln\frac{X_i}{1-X_i} =  \ln \hat{G}_X - \ln \left(\hat{G}_{(1-X)} \right)

This logit transformation is the logarithm of the transformation that divides the variable ''X'' by its mirror-image (''X''/(1 − ''X'')), resulting in the "inverted beta distribution" or beta prime distribution (also known as beta distribution of the second kind or Pearson's Type VI) with support [0, +∞). As previously discussed in the section "Moments of logarithmically transformed random variables," the logit transformation \ln\frac{X}{1-X}, studied by Johnson, extends the finite support [0, 1] based on the original variable ''X'' to infinite support in both directions of the real line (−∞, +∞).

If, for example, \hat{\beta} is known, the unknown parameter \hat{\alpha} can be obtained in terms of the inverse digamma function of the right hand side of this equation:

:\psi(\hat{\alpha})=\frac{1}{N}\sum_{i=1}^N \ln\frac{X_i}{1-X_i} + \psi(\hat{\beta})

:\hat{\alpha}=\psi^{-1}(\ln \hat{G}_X - \ln \hat{G}_{(1-X)} + \psi(\hat{\beta}))

In particular, if one of the shape parameters has a value of unity, for example for \hat{\beta} = 1 (the power function distribution with bounded support [0,1]), using the identity ψ(''x'' + 1) = ψ(''x'') + 1/''x'' in the equation \psi(\hat{\alpha}) - \psi(\hat{\alpha} + \hat{\beta})= \ln \hat{G}_X, the maximum likelihood estimator for the unknown parameter \hat{\alpha} is, exactly:

:\hat{\alpha}= - \frac{N}{\sum_{i=1}^N \ln X_i}= - \frac{1}{\ln \hat{G}_X}

The beta has support [0, 1], therefore \hat{G}_X < 1, and hence (-\ln \hat{G}_X) >0, and therefore \hat{\alpha} >0.

In conclusion, the maximum likelihood estimates of the shape parameters of a beta distribution are (in general) a complicated function of the sample geometric mean, and of the sample geometric mean based on ''(1−X)'', the mirror-image of ''X''.  One may ask, if the variance (in addition to the mean) is necessary to estimate two shape parameters with the method of moments, why is the (logarithmic or geometric) variance not necessary to estimate two shape parameters with the maximum likelihood method, for which only the geometric means suffice?  The answer is because the mean does not provide as much information as the geometric mean.  For a beta distribution with equal shape parameters ''α'' = ''β'', the mean is exactly 1/2, regardless of the value of the shape parameters, and therefore regardless of the value of the statistical dispersion (the variance).  On the other hand, the geometric mean of a beta distribution with equal shape parameters ''α'' = ''β'' depends on the value of the shape parameters, and therefore it contains more information.  Also, the geometric mean of a beta distribution does not satisfy the symmetry conditions satisfied by the mean, therefore, by employing both the geometric mean based on ''X'' and the geometric mean based on (1 − ''X''), the maximum likelihood method is able to provide best estimates for both parameters ''α'' = ''β'', without need of employing the variance.

One can express the joint log likelihood per ''N'' iid observations in terms of the ''sufficient statistics'' (the sample geometric means) as follows:

:\frac{\ln\, \mathcal{L} (\alpha, \beta\mid X)}{N} = (\alpha - 1)\ln \hat{G}_X + (\beta- 1)\ln \hat{G}_{(1-X)}- \ln \Beta(\alpha,\beta).
We can plot the joint log likelihood per ''N'' observations for fixed values of the sample geometric means to see the behavior of the likelihood function as a function of the shape parameters α and β. In such a plot, the shape parameter estimators \hat{\alpha},\hat{\beta} correspond to the maxima of the likelihood function. See the accompanying graph that shows that all the likelihood functions intersect at α = β = 1, which corresponds to the values of the shape parameters that give the maximum entropy (the maximum entropy occurs for shape parameters equal to unity: the uniform distribution).  It is evident from the plot that the likelihood function gives sharp peaks for values of the shape parameter estimators close to zero, but that for values of the shape parameter estimators greater than one, the likelihood function becomes quite flat, with less defined peaks.  Obviously, the maximum likelihood parameter estimation method for the beta distribution becomes less acceptable for larger values of the shape parameter estimators, as the uncertainty in the peak definition increases with the value of the shape parameter estimators.  One can arrive at the same conclusion by noticing that the expression for the curvature of the likelihood function is in terms of the geometric variances

:\frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{N \, \partial \alpha^2}= -\operatorname{var}[\ln X]

:\frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{N \, \partial \beta^2} = -\operatorname{var}[\ln (1-X)]

These variances (and therefore the curvatures) are much larger for small values of the shape parameters α and β. However, for shape parameter values α, β > 1, the variances (and therefore the curvatures) flatten out.  Equivalently, this result follows from the Cramér–Rao bound, since the Fisher information matrix components for the beta distribution are these logarithmic variances. The Cramér–Rao bound states that the variance of any ''unbiased'' estimator \hat{\alpha} of α is bounded by the reciprocal of the Fisher information:

:\operatorname{var}(\hat{\alpha})\geq\frac{1}{\mathcal{I}_{\alpha\alpha}}\geq\frac{1}{N[\psi_1(\alpha) - \psi_1(\alpha + \beta)]}

:\operatorname{var}(\hat{\beta}) \geq\frac{1}{\mathcal{I}_{\beta\beta}}\geq\frac{1}{N[\psi_1(\beta) - \psi_1(\alpha + \beta)]}

so the variance of the estimators increases with increasing α and β, as the logarithmic variances decrease.

Also one can express the joint log likelihood per ''N'' iid observations in terms of the digamma function expressions for the logarithms of the sample geometric means as follows:

:\frac{\ln\, \mathcal{L} (\alpha, \beta\mid X)}{N} = (\alpha - 1)(\psi(\hat{\alpha}) - \psi(\hat{\alpha} + \hat{\beta}))+(\beta- 1)(\psi(\hat{\beta}) - \psi(\hat{\alpha} + \hat{\beta}))- \ln \Beta(\alpha,\beta)

this expression is identical to the negative of the cross-entropy (see section on "Quantities of information (entropy)").  Therefore, finding the maximum of the joint log likelihood of the shape parameters, per ''N'' iid observations, is identical to finding the minimum of the cross-entropy for the beta distribution, as a function of the shape parameters.

:\frac{\ln\, \mathcal{L} (\alpha, \beta\mid X)}{N} = - H = -h - D_{\mathrm{KL}} = -\ln\Beta(\alpha,\beta)+(\alpha-1)\psi(\hat{\alpha})+(\beta-1)\psi(\hat{\beta})-(\alpha+\beta-2)\psi(\hat{\alpha}+\hat{\beta})

with the cross-entropy defined as follows:

:H = \int_{0}^1 - f(X;\hat{\alpha},\hat{\beta}) \ln (f(X;\alpha,\beta)) \, dX
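The coupled digamma equations can be solved with a general-purpose root finder, using the method-of-moments estimates as the starting point suggested above; a Python sketch (assuming SciPy; the simulated data are only illustrative):

<syntaxhighlight lang="python">
# Two-parameter maximum likelihood: solve psi(a) - psi(a+b) = ln G_X and
# psi(b) - psi(a+b) = ln G_(1-X), starting from the method-of-moments estimates.
import numpy as np
from scipy.special import digamma
from scipy.optimize import fsolve
from scipy.stats import beta

rng = np.random.default_rng(5)
x = rng.beta(2.0, 5.0, size=10_000)

ln_gx = np.mean(np.log(x))          # log of the sample geometric mean of X
ln_g1mx = np.mean(np.log1p(-x))     # log of the sample geometric mean of 1 - X

def equations(p):
    a, b = p
    return (digamma(a) - digamma(a + b) - ln_gx,
            digamma(b) - digamma(a + b) - ln_g1mx)

xbar, vbar = np.mean(x), np.var(x, ddof=1)    # method-of-moments starting values
common = xbar * (1 - xbar) / vbar - 1
a_hat, b_hat = fsolve(equations, (xbar * common, (1 - xbar) * common))
print(a_hat, b_hat)                           # close to (2, 5)

print(beta.fit(x, floc=0, fscale=1)[:2])      # SciPy's built-in fit agrees
</syntaxhighlight>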


Four unknown parameters

The procedure is similar to the one followed in the two unknown parameter case. If ''Y''1, ..., ''YN'' are independent random variables each having a beta distribution with four parameters, the joint log likelihood function for ''N'' iid observations is:

:\begin{align}
\ln\, \mathcal{L}(\alpha, \beta, a, c\mid Y) &= \sum_{i=1}^N \ln\,\mathcal{L}_i(\alpha, \beta, a, c\mid Y_i)\\
&= \sum_{i=1}^N \ln\,f(Y_i; \alpha, \beta, a, c) \\
&= \sum_{i=1}^N \ln\,\frac{(Y_i-a)^{\alpha-1}(c-Y_i)^{\beta-1}}{(c-a)^{\alpha+\beta-1}\Beta(\alpha,\beta)}\\
&= (\alpha - 1)\sum_{i=1}^N \ln (Y_i - a) + (\beta- 1)\sum_{i=1}^N \ln (c - Y_i)- N \ln \Beta(\alpha,\beta) - N (\alpha+\beta - 1) \ln (c - a)
\end{align}

Finding the maximum with respect to a shape parameter involves taking the partial derivative with respect to that parameter and setting the expression equal to zero, yielding the maximum likelihood estimator of the shape parameters:

:\frac{\partial \ln \mathcal{L}(\alpha,\beta,a,c\mid Y)}{\partial \alpha}= \sum_{i=1}^N \ln (Y_i - a) - N(-\psi(\alpha + \beta) + \psi(\alpha))- N \ln (c - a)= 0
:\frac{\partial \ln \mathcal{L}(\alpha,\beta,a,c\mid Y)}{\partial \beta} = \sum_{i=1}^N \ln (c - Y_i) - N(-\psi(\alpha + \beta) + \psi(\beta))- N \ln (c - a)= 0
:\frac{\partial \ln \mathcal{L}(\alpha,\beta,a,c\mid Y)}{\partial a} = -(\alpha - 1) \sum_{i=1}^N \frac{1}{Y_i - a} + N (\alpha+\beta - 1)\frac{1}{c - a}= 0
:\frac{\partial \ln \mathcal{L}(\alpha,\beta,a,c\mid Y)}{\partial c} = (\beta- 1) \sum_{i=1}^N \frac{1}{c - Y_i} - N (\alpha+\beta - 1) \frac{1}{c - a} = 0

These equations can be re-arranged as the following system of four coupled equations (the first two equations are geometric means and the second two equations are harmonic means) in terms of the maximum likelihood estimates for the four parameters \hat{\alpha}, \hat{\beta}, \hat{a}, \hat{c}:

:\frac{1}{N}\sum_{i=1}^N \ln \frac{Y_i - \hat{a}}{\hat{c}-\hat{a}} = \psi(\hat{\alpha})-\psi(\hat{\alpha} +\hat{\beta}) = \ln \hat{G}_X
:\frac{1}{N}\sum_{i=1}^N \ln \frac{\hat{c} - Y_i}{\hat{c}-\hat{a}} = \psi(\hat{\beta})-\psi(\hat{\alpha} + \hat{\beta}) = \ln \hat{G}_{1-X}
:\frac{1}{\frac{1}{N}\sum_{i=1}^N \frac{\hat{c}-\hat{a}}{Y_i - \hat{a}}} = \frac{\hat{\alpha}-1}{\hat{\alpha}+\hat{\beta}-1} = \hat{H}_X
:\frac{1}{\frac{1}{N}\sum_{i=1}^N \frac{\hat{c}-\hat{a}}{\hat{c} - Y_i}} = \frac{\hat{\beta}-1}{\hat{\alpha}+\hat{\beta}-1} = \hat{H}_{1-X}

with sample geometric means:

:\hat{G}_X = \prod_{i=1}^{N} \left (\frac{Y_i-\hat{a}}{\hat{c}-\hat{a}} \right )^{1/N}
:\hat{G}_{1-X} = \prod_{i=1}^{N} \left (\frac{\hat{c}-Y_i}{\hat{c}-\hat{a}} \right )^{1/N}

The parameters \hat{a}, \hat{c} are embedded inside the geometric mean expressions in a nonlinear way (to the power 1/''N''). This precludes, in general, a closed form solution, even for an initial value approximation for iteration purposes. One alternative is to use as initial values for iteration the values obtained from the method of moments solution for the four parameter case; a practical numerical approach is sketched below. Furthermore, the expressions for the harmonic means are well-defined only for \hat{\alpha}, \hat{\beta} > 1, which precludes a maximum likelihood solution for shape parameters less than unity in the four-parameter case. Fisher's information matrix for the four parameter case is positive-definite only for α, β > 2 (for further discussion, see section on Fisher information matrix, four parameter case), that is, for bell-shaped (symmetric or unsymmetric) beta distributions, with inflection points located to either side of the mode. The following Fisher information components (that represent the expectations of the curvature of the log likelihood function) have singularities at the following values:

:\alpha = 2: \quad \operatorname{E}\left [- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial a^2} \right ]= \mathcal{I}_{a,a}
:\beta = 2: \quad \operatorname{E}\left [- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial c^2} \right ] = \mathcal{I}_{c,c}
:\alpha = 1: \quad \operatorname{E}\left [- \frac{1}{N}\frac{\partial^2\ln \mathcal{L}}{\partial \alpha \, \partial a}\right ] = \mathcal{I}_{\alpha,a}
:\beta = 1: \quad \operatorname{E}\left [- \frac{1}{N}\frac{\partial^2\ln \mathcal{L}}{\partial \beta \, \partial c} \right ] = \mathcal{I}_{\beta,c}

(for further discussion see section on Fisher information matrix). Thus, it is not possible to strictly carry on the maximum likelihood estimation for some well known distributions belonging to the four-parameter beta distribution family, like the uniform distribution (Beta(1, 1, ''a'', ''c'')) and the arcsine distribution (Beta(1/2, 1/2, ''a'', ''c'')). N. L. Johnson and S. Kotz ignore the equations for the harmonic means and instead suggest "If a and c are unknown, and maximum likelihood estimators of ''a'', ''c'', α and β are required, the above procedure (for the two unknown parameter case, with ''X'' transformed as ''X'' = (''Y'' − ''a'')/(''c'' − ''a'')) can be repeated using a succession of trial values of ''a'' and ''c'', until the pair (''a'', ''c'') for which maximum likelihood (given ''a'' and ''c'') is as great as possible, is attained" (where, for the purpose of clarity, their notation for the parameters has been translated into the present notation).
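In practice the four-parameter fit is carried out by direct numerical maximization of the log likelihood (or by the trial-and-error scheme of Johnson and Kotz quoted above). As one hedged illustration, not the article's own procedure, SciPy's generic scipy.stats.beta.fit performs such a numerical maximization, reporting the support as loc = a and scale = c − a:

```python
# Illustrative numerical fit of the four-parameter beta distribution
# (shape parameters alpha, beta plus support [a, c]).  scipy.stats.beta.fit
# maximizes the likelihood numerically over all four parameters; results can
# be sensitive to data near the boundaries, echoing the caveats in the text.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Simulated data on [a, c] = [2, 10] with alpha = 3, beta = 5 (arbitrary example values).
y = 2 + 8 * rng.beta(3.0, 5.0, size=5_000)

alpha_hat, beta_hat, loc_hat, scale_hat = stats.beta.fit(y)
a_hat, c_hat = loc_hat, loc_hat + scale_hat
print(alpha_hat, beta_hat, a_hat, c_hat)   # roughly 3, 5, 2, 10
```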


Fisher information matrix

Let a random variable ''X'' have a probability density ''f''(''x''; ''α''). The partial derivative with respect to the (unknown, and to be estimated) parameter α of the log likelihood function is called the score. The second moment of the score is called the Fisher information:

:\mathcal{I}(\alpha)=\operatorname{E} \left [\left (\frac{\partial}{\partial\alpha} \ln \mathcal{L}(\alpha\mid X) \right )^2 \right].

The expectation of the score is zero, therefore the Fisher information is also the second moment centered on the mean of the score: the variance of the score.

If the log likelihood function is twice differentiable with respect to the parameter α, and under certain regularity conditions, then the Fisher information may also be written as follows (which is often a more convenient form for calculation purposes):

:\mathcal{I}(\alpha) = - \operatorname{E} \left [\frac{\partial^2}{\partial\alpha^2} \ln (\mathcal{L}(\alpha\mid X)) \right].

Thus, the Fisher information is the negative of the expectation of the second derivative with respect to the parameter α of the log likelihood function. Therefore, Fisher information is a measure of the curvature of the log likelihood function of α. A low curvature (and therefore high radius of curvature), flatter log likelihood function curve has low Fisher information; while a log likelihood function curve with large curvature (and therefore low radius of curvature) has high Fisher information. When the Fisher information matrix is evaluated at the parameter estimates ("the observed Fisher information matrix") it is equivalent to the replacement of the true log likelihood surface by a Taylor series approximation, taken as far as the quadratic terms. The word information, in the context of Fisher information, refers to information about the parameters, concerning matters such as estimation, sufficiency, and the properties of variances of estimators. The Cramér–Rao bound states that the inverse of the Fisher information is a lower bound on the variance of any unbiased estimator of a parameter α:

:\operatorname{var}[\hat\alpha] \geq \frac{1}{\mathcal{I}(\alpha)}.

The precision to which one can estimate the parameter α is limited by the Fisher information of the log likelihood function. The Fisher information is a measure of the minimum error involved in estimating a parameter of a distribution, and it can be viewed as a measure of the resolving power of an experiment needed to discriminate between two alternative hypotheses about a parameter.

When there are ''N'' parameters

: \begin{bmatrix} \theta_1 \\ \theta_2 \\ \vdots \\ \theta_N \end{bmatrix},

then the Fisher information takes the form of an ''N''×''N'' positive semidefinite symmetric matrix, the Fisher information matrix, with typical element:

:(\mathcal{I}(\theta))_{i,j}=\operatorname{E} \left [\left (\frac{\partial}{\partial\theta_i} \ln \mathcal{L} \right) \left(\frac{\partial}{\partial\theta_j} \ln \mathcal{L} \right) \right ].

Under certain regularity conditions, the Fisher information matrix may also be written in the following form, which is often more convenient for computation:

:(\mathcal{I}(\theta))_{i,j} = - \operatorname{E} \left [\frac{\partial^2}{\partial\theta_i \, \partial\theta_j} \ln (\mathcal{L}) \right ].

With ''X''1, ..., ''XN'' iid random variables, an ''N''-dimensional "box" can be constructed with sides ''X''1, ..., ''XN''. Costa and Cover show that the (Shannon) differential entropy ''h''(''X'') is related to the volume of the typical set (having the sample entropy close to the true entropy), while the Fisher information is related to the surface of this typical set.


Two parameters

For ''X''1, ..., ''XN'' independent random variables each having a beta distribution parametrized with shape parameters ''α'' and ''β'', the joint log likelihood function for ''N'' iid observations is:

:\ln (\mathcal{L}(\alpha, \beta\mid X)) = (\alpha - 1)\sum_{i=1}^N \ln X_i + (\beta- 1)\sum_{i=1}^N \ln (1-X_i)- N \ln \Beta(\alpha,\beta)

therefore the joint log likelihood function per ''N'' iid observations is:

:\frac{1}{N} \ln(\mathcal{L}(\alpha, \beta\mid X)) = (\alpha - 1)\frac{1}{N}\sum_{i=1}^N \ln X_i + (\beta- 1) \frac{1}{N}\sum_{i=1}^N \ln (1-X_i)- \ln \Beta(\alpha,\beta)

For the two parameter case, the Fisher information has 4 components: 2 diagonal and 2 off-diagonal. Since the Fisher information matrix is symmetric, only one of these off-diagonal components is independent. Therefore, the Fisher information matrix has 3 independent components (2 diagonal and 1 off-diagonal).

Aryal and Nadarajah calculated Fisher's information matrix for the four-parameter case, from which the two parameter case can be obtained as follows:

:- \frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{N \, \partial \alpha^2}= \operatorname{var}[\ln X]= \psi_1(\alpha) - \psi_1(\alpha + \beta) = \mathcal{I}_{\alpha,\alpha}= \operatorname{E}\left [- \frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{N \, \partial \alpha^2} \right ] = \ln \operatorname{var}_{GX}
:- \frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{N \, \partial \beta^2} = \operatorname{var}[\ln (1-X)] = \psi_1(\beta) - \psi_1(\alpha + \beta) = \mathcal{I}_{\beta,\beta}= \operatorname{E}\left [- \frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{N \, \partial \beta^2} \right ]= \ln \operatorname{var}_{G(1-X)}
:- \frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{N \, \partial \alpha \, \partial \beta} = \operatorname{cov}[\ln X,\ln(1-X)] = -\psi_1(\alpha+\beta) = \mathcal{I}_{\alpha,\beta}= \operatorname{E}\left [- \frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{N \, \partial \alpha \, \partial \beta} \right ] = \ln \operatorname{cov}_{G\,X,(1-X)}

Since the Fisher information matrix is symmetric

: \mathcal{I}_{\alpha,\beta} = \mathcal{I}_{\beta,\alpha} = \ln \operatorname{cov}_{G\,X,(1-X)}

The Fisher information components are equal to the log geometric variances and log geometric covariance. Therefore, they can be expressed as trigamma functions, denoted ψ1(α), the second of the polygamma functions, defined as the derivative of the digamma function:

:\psi_1(\alpha) = \frac{d^2\ln\Gamma(\alpha)}{d\alpha^2}= \frac{d\,\psi(\alpha)}{d\alpha}.

These derivatives are also derived in the section titled "Two unknown parameters", and plots of the log likelihood function are also shown in that section. The section on the geometric variances and geometric covariance contains plots and further discussion of the Fisher information matrix components: the log geometric variances and log geometric covariance as a function of the shape parameters α and β. The section on moments of logarithmically transformed random variables contains formulas for moments of logarithmically transformed random variables. Images for the Fisher information components \mathcal{I}_{\alpha,\alpha}, \mathcal{I}_{\beta,\beta} and \mathcal{I}_{\alpha,\beta} are shown there as well.

The determinant of Fisher's information matrix is of interest (for example for the calculation of Jeffreys prior probability). From the expressions for the individual components of the Fisher information matrix, it follows that the determinant of Fisher's (symmetric) information matrix for the beta distribution is:

:\begin{align}
\det(\mathcal{I}(\alpha, \beta))&= \mathcal{I}_{\alpha,\alpha} \mathcal{I}_{\beta,\beta}-\mathcal{I}_{\alpha,\beta} \mathcal{I}_{\beta,\alpha} \\
&=(\psi_1(\alpha) - \psi_1(\alpha + \beta))(\psi_1(\beta) - \psi_1(\alpha + \beta))-(-\psi_1(\alpha+\beta))(-\psi_1(\alpha+\beta))\\
&= \psi_1(\alpha)\psi_1(\beta)-(\psi_1(\alpha)+\psi_1(\beta))\psi_1(\alpha + \beta)\\
\lim_{\alpha\to 0} \det(\mathcal{I}(\alpha, \beta)) &=\lim_{\beta \to 0} \det(\mathcal{I}(\alpha, \beta)) = \infty\\
\lim_{\alpha\to \infty} \det(\mathcal{I}(\alpha, \beta)) &=\lim_{\beta \to \infty} \det(\mathcal{I}(\alpha, \beta)) = 0
\end{align}

From Sylvester's criterion (checking whether the diagonal elements are all positive), it follows that the Fisher information matrix for the two parameter case is positive-definite (under the standard condition that the shape parameters are positive, ''α'' > 0 and ''β'' > 0).
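The trigamma-based components above are easy to evaluate numerically. A minimal sketch (illustrative only) using scipy.special.polygamma(1, ·) for the trigamma function:

```python
# Two-parameter Fisher information matrix of Beta(alpha, beta), per observation,
# built from trigamma functions as in the formulas above.
import numpy as np
from scipy.special import polygamma

def fisher_information(alpha, beta):
    trigamma = lambda z: polygamma(1, z)
    i_aa = trigamma(alpha) - trigamma(alpha + beta)   # var[ln X]
    i_bb = trigamma(beta) - trigamma(alpha + beta)    # var[ln (1 - X)]
    i_ab = -trigamma(alpha + beta)                    # cov[ln X, ln (1 - X)]
    return np.array([[i_aa, i_ab], [i_ab, i_bb]])

I = fisher_information(2.0, 3.0)
print(I)
print(np.linalg.det(I))   # equals psi1(a)psi1(b) - (psi1(a)+psi1(b))psi1(a+b), positive
```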


Four parameters

If ''Y''1, ..., ''YN'' are independent random variables each having a beta distribution with four parameters: the exponents ''α'' and ''β'', and also ''a'' (the minimum of the distribution range) and ''c'' (the maximum of the distribution range) (see section titled "Alternative parametrizations", "Four parameters"), with probability density function:

:f(y; \alpha, \beta, a, c) = \frac{\left (\frac{y-a}{c-a} \right )^{\alpha-1} \left (\frac{c-y}{c-a} \right)^{\beta-1}}{(c-a)\Beta(\alpha, \beta)}=\frac{(y-a)^{\alpha-1} (c-y)^{\beta-1}}{(c-a)^{\alpha+\beta-1}\Beta(\alpha, \beta)},

the joint log likelihood function per ''N'' iid observations is:

:\frac{1}{N} \ln(\mathcal{L}(\alpha, \beta, a, c\mid Y))= \frac{\alpha -1}{N}\sum_{i=1}^N \ln (Y_i - a) + \frac{\beta -1}{N}\sum_{i=1}^N \ln (c - Y_i)- \ln \Beta(\alpha,\beta) - (\alpha+\beta -1) \ln (c-a)

For the four parameter case, the Fisher information has 4×4 = 16 components. It has 12 off-diagonal components (= 16 total − 4 diagonal). Since the Fisher information matrix is symmetric, half of these components (12/2 = 6) are independent. Therefore, the Fisher information matrix has 6 independent off-diagonal + 4 diagonal = 10 independent components. Aryal and Nadarajah calculated Fisher's information matrix for the four parameter case as follows:

:- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}(\alpha,\beta,a,c\mid Y)}{\partial \alpha^2}= \operatorname{var}[\ln X]= \psi_1(\alpha) - \psi_1(\alpha + \beta) = \mathcal{I}_{\alpha,\alpha}= \operatorname{E}\left [- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}(\alpha,\beta,a,c\mid Y)}{\partial \alpha^2} \right ] = \ln (\operatorname{var}_{GX})
:-\frac{1}{N} \frac{\partial^2\ln \mathcal{L}(\alpha,\beta,a,c\mid Y)}{\partial \beta^2} = \operatorname{var}[\ln (1-X)] = \psi_1(\beta) - \psi_1(\alpha + \beta) = \mathcal{I}_{\beta,\beta}= \operatorname{E}\left [- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}(\alpha,\beta,a,c\mid Y)}{\partial \beta^2} \right ] = \ln(\operatorname{var}_{G(1-X)})
:-\frac{1}{N} \frac{\partial^2\ln \mathcal{L}(\alpha,\beta,a,c\mid Y)}{\partial \alpha \, \partial \beta} = \operatorname{cov}[\ln X,\ln(1-X)] = -\psi_1(\alpha+\beta) =\mathcal{I}_{\alpha,\beta}= \operatorname{E}\left [- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}(\alpha,\beta,a,c\mid Y)}{\partial \alpha \, \partial \beta} \right ] = \ln(\operatorname{cov}_{G\,X,(1-X)})

In the above expressions, the use of ''X'' instead of ''Y'' in the expressions var[ln(''X'')] = ln(var''GX'') is ''not an error''. The expressions in terms of the log geometric variances and log geometric covariance occur as functions of the two parameter ''X'' ~ Beta(''α'', ''β'') parametrization because, when taking the partial derivatives with respect to the exponents (''α'', ''β'') in the four parameter case, one obtains expressions identical to those of the two parameter case: these terms of the four parameter Fisher information matrix are independent of the minimum ''a'' and maximum ''c'' of the distribution's range. The only non-zero term upon double differentiation of the log likelihood function with respect to the exponents ''α'' and ''β'' is the second derivative of the log of the beta function: ln(B(''α'', ''β'')). This term is independent of the minimum ''a'' and maximum ''c'' of the distribution's range. Double differentiation of this term results in trigamma functions. The sections titled "Maximum likelihood", "Two unknown parameters" and "Four unknown parameters" also show this fact.

The Fisher information for ''N'' i.i.d. samples is ''N'' times the individual Fisher information (eq. 11.279, page 394 of Cover and Thomas). (Aryal and Nadarajah take a single observation, ''N'' = 1, to calculate the following components of the Fisher information, which leads to the same result as considering the derivatives of the log likelihood per ''N'' observations. Moreover, an erroneous expression in Aryal and Nadarajah has been corrected below.)

:\begin{align}
\alpha > 2: \quad \operatorname{E}\left [- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}(\alpha,\beta,a,c\mid Y)}{\partial a^2} \right ] &= \mathcal{I}_{a,a}=\frac{\beta(\alpha+\beta-1)}{(\alpha-2)(c-a)^2} \\
\beta > 2: \quad \operatorname{E}\left[-\frac{1}{N} \frac{\partial^2\ln \mathcal{L}(\alpha,\beta,a,c\mid Y)}{\partial c^2} \right ] &= \mathcal{I}_{c,c} = \frac{\alpha(\alpha+\beta-1)}{(\beta-2)(c-a)^2} \\
\operatorname{E}\left[- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}(\alpha,\beta,a,c\mid Y)}{\partial a \, \partial c} \right ] &= \mathcal{I}_{a,c} = \frac{\alpha+\beta-1}{(c-a)^2} \\
\alpha > 1: \quad \operatorname{E}\left[- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}(\alpha,\beta,a,c\mid Y)}{\partial \alpha \, \partial a} \right ] &=\mathcal{I}_{\alpha,a} = \frac{\beta}{(\alpha-1)(c-a)} \\
\operatorname{E}\left[- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}(\alpha,\beta,a,c\mid Y)}{\partial \alpha \, \partial c} \right ] &= \mathcal{I}_{\alpha,c} = \frac{1}{c-a} \\
\operatorname{E}\left[- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}(\alpha,\beta,a,c\mid Y)}{\partial \beta \, \partial a} \right ] &= \mathcal{I}_{\beta,a} = -\frac{1}{c-a} \\
\beta > 1: \quad \operatorname{E}\left[- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}(\alpha,\beta,a,c\mid Y)}{\partial \beta \, \partial c} \right ] &= \mathcal{I}_{\beta,c} = -\frac{\alpha}{(\beta-1)(c-a)}
\end{align}

The lower two diagonal entries of the Fisher information matrix, with respect to the parameter ''a'' (the minimum of the distribution's range), \mathcal{I}_{a,a}, and with respect to the parameter ''c'' (the maximum of the distribution's range), \mathcal{I}_{c,c}, are only defined for exponents α > 2 and β > 2 respectively. The Fisher information matrix component \mathcal{I}_{a,a} for the minimum ''a'' approaches infinity for exponent α approaching 2 from above, and the Fisher information matrix component \mathcal{I}_{c,c} for the maximum ''c'' approaches infinity for exponent β approaching 2 from above.

The Fisher information matrix for the four parameter case does not depend on the individual values of the minimum ''a'' and the maximum ''c'', but only on the total range (''c'' − ''a''). Moreover, the components of the Fisher information matrix that depend on the range (''c'' − ''a'') depend only through its inverse (or the square of the inverse), such that the Fisher information decreases for increasing range (''c'' − ''a'').

The accompanying images show several of these Fisher information components; images for the components equal to the log geometric variances are shown in the section on the geometric variances. All these Fisher information components look like a basin, with the "walls" of the basin located at low values of the parameters.

The following four-parameter-beta-distribution Fisher information components can be expressed in terms of the two-parameter ''X'' ~ Beta(α, β) expectations of the transformed ratio ((1 − ''X'')/''X'') and of its mirror image (''X''/(1 − ''X'')), scaled by the range (''c'' − ''a''), which may be helpful for interpretation:

:\mathcal{I}_{\alpha,a} =\frac{\operatorname{E} \left[\frac{1-X}{X} \right ]}{c-a}= \frac{\beta}{(\alpha-1)(c-a)} \text{ if }\alpha > 1
:\mathcal{I}_{\beta,c} = -\frac{\operatorname{E} \left [\frac{X}{1-X} \right ]}{c-a}=- \frac{\alpha}{(\beta-1)(c-a)}\text{ if }\beta> 1

These are also the expected values of the "inverted beta distribution" or beta prime distribution (also known as beta distribution of the second kind or Pearson's Type VI) and its mirror image, scaled by the range (''c'' − ''a'').

Also, the following Fisher information components can be expressed in terms of the harmonic (1/''X'') variances or of variances based on the ratio transformed variables ((1 − ''X'')/''X'') as follows:

:\begin{align}
\alpha > 2: \quad \mathcal{I}_{a,a} &=\operatorname{var} \left [\frac{1}{X} \right] \left (\frac{\alpha-1}{c-a} \right )^2 =\operatorname{var} \left [\frac{1-X}{X} \right ] \left (\frac{\alpha-1}{c-a} \right)^2 = \frac{\beta(\alpha+\beta-1)}{(\alpha-2)(c-a)^2} \\
\beta > 2: \quad \mathcal{I}_{c,c} &= \operatorname{var} \left [\frac{1}{1-X} \right ] \left (\frac{\beta-1}{c-a} \right )^2 = \operatorname{var} \left [\frac{X}{1-X} \right ] \left (\frac{\beta-1}{c-a} \right )^2 =\frac{\alpha(\alpha+\beta-1)}{(\beta-2)(c-a)^2} \\
\mathcal{I}_{a,c} &=\operatorname{cov} \left [\frac{1}{X},\frac{1}{1-X} \right ]\frac{(\alpha-1)(\beta-1)}{(c-a)^2} = \operatorname{cov} \left [\frac{1-X}{X},\frac{X}{1-X} \right ] \frac{(\alpha-1)(\beta-1)}{(c-a)^2} =\frac{\alpha+\beta-1}{(c-a)^2}
\end{align}

See section "Moments of linearly transformed, product and inverted random variables" for these expectations.

The determinant of Fisher's information matrix is of interest (for example for the calculation of Jeffreys prior probability). Arranging the components listed above into the full symmetric 4×4 matrix,

:\mathcal{I}(\alpha,\beta,a,c) = \begin{pmatrix} \mathcal{I}_{\alpha,\alpha} & \mathcal{I}_{\alpha,\beta} & \mathcal{I}_{\alpha,a} & \mathcal{I}_{\alpha,c}\\ \mathcal{I}_{\alpha,\beta} & \mathcal{I}_{\beta,\beta} & \mathcal{I}_{\beta,a} & \mathcal{I}_{\beta,c}\\ \mathcal{I}_{\alpha,a} & \mathcal{I}_{\beta,a} & \mathcal{I}_{a,a} & \mathcal{I}_{a,c}\\ \mathcal{I}_{\alpha,c} & \mathcal{I}_{\beta,c} & \mathcal{I}_{a,c} & \mathcal{I}_{c,c}\end{pmatrix},

its determinant expands into a lengthy sum of signed products of these components; all of the components, and hence the determinant, are defined only for α, β > 2.

Using Sylvester's criterion (checking whether the diagonal elements are all positive), and since the diagonal components \mathcal{I}_{a,a} and \mathcal{I}_{c,c} have singularities at α = 2 and β = 2, it follows that the Fisher information matrix for the four parameter case is positive-definite for α > 2 and β > 2. Since for α > 2 and β > 2 the beta distribution is (symmetric or unsymmetric) bell shaped, it follows that the Fisher information matrix is positive-definite only for bell-shaped (symmetric or unsymmetric) beta distributions, with inflection points located to either side of the mode. Thus, important well known distributions belonging to the four-parameter beta distribution family, like the parabolic distribution (Beta(2, 2, ''a'', ''c'')) and the uniform distribution (Beta(1, 1, ''a'', ''c'')), have Fisher information components (\mathcal{I}_{a,a},\mathcal{I}_{c,c},\mathcal{I}_{\alpha,a},\mathcal{I}_{\beta,c}) that blow up (approach infinity) in the four-parameter case (although their Fisher information components are all defined for the two parameter case). The four-parameter Wigner semicircle distribution (Beta(3/2, 3/2, ''a'', ''c'')) and arcsine distribution (Beta(1/2, 1/2, ''a'', ''c'')) have negative Fisher information determinants for the four-parameter case.


Bayesian inference

The use of beta distributions in Bayesian inference is due to the fact that they provide a family of conjugate prior probability distributions for binomial (including Bernoulli) and geometric distributions. The domain of the beta distribution can be viewed as a probability, and in fact the beta distribution is often used to describe the distribution of a probability value ''p'':

:P(p;\alpha,\beta) = \frac{p^{\alpha-1}(1-p)^{\beta-1}}{\Beta(\alpha,\beta)}.

Examples of beta distributions used as prior probabilities to represent ignorance of prior parameter values in Bayesian inference are Beta(1,1), Beta(0,0) and Beta(1/2,1/2).
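Conjugacy means that a Beta(α, β) prior combined with a binomial likelihood of ''s'' successes in ''n'' trials yields a Beta(α + ''s'', β + ''n'' − ''s'') posterior, as derived in detail in the section on the effect of different prior choices below. A minimal sketch of this update using scipy.stats.beta (the particular prior and data values are purely illustrative):

```python
# Beta-binomial conjugacy: a Beta(alpha, beta) prior updated with s successes
# in n Bernoulli trials gives a Beta(alpha + s, beta + n - s) posterior.
from scipy import stats

alpha_prior, beta_prior = 1.0, 1.0    # Bayes-Laplace uniform prior Beta(1,1)
s, n = 7, 10                           # observed successes and trials (example values)

posterior = stats.beta(alpha_prior + s, beta_prior + n - s)
print(posterior.mean())                # (s + 1)/(n + 2) = 8/12 for the uniform prior
print(posterior.interval(0.95))        # central 95% credible interval
```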


Rule of succession

A classic application of the beta distribution is the rule of succession, introduced in the 18th century by Pierre-Simon Laplace in the course of treating the sunrise problem. It states that, given ''s'' successes in ''n'' conditionally independent Bernoulli trials with probability ''p'', the estimate of the expected value in the next trial is \frac{s+1}{n+2}. This estimate is the expected value of the posterior distribution over ''p'', namely Beta(''s'' + 1, ''n'' − ''s'' + 1), which is given by Bayes' rule if one assumes a uniform prior probability over ''p'' (i.e., Beta(1, 1)) and then observes that ''p'' generated ''s'' successes in ''n'' trials. Laplace's rule of succession has been criticized by prominent scientists. R. T. Cox described Laplace's application of the rule of succession to the sunrise problem (p. 89) as "a travesty of the proper use of the principle." Keynes remarks (Ch. XXX, p. 382) "indeed this is so foolish a theorem that to entertain it is discreditable." Karl Pearson showed that the probability that the next (''n'' + 1) trials will be successes, after ''n'' successes in ''n'' trials, is only 50%, which has been considered too low by scientists like Jeffreys and unacceptable as a representation of the scientific process of experimentation to test a proposed scientific law. As pointed out by Jeffreys (p. 128) (crediting C. D. Broad), Laplace's rule of succession establishes a high probability of success ((''n'' + 1)/(''n'' + 2)) in the next trial, but only a moderate probability (50%) that a further sample (''n'' + 1) comparable in size will be equally successful. As pointed out by Perks, "The rule of succession itself is hard to accept. It assigns a probability to the next trial which implies the assumption that the actual run observed is an average run and that we are always at the end of an average run. It would, one would think, be more reasonable to assume that we were in the middle of an average run. Clearly a higher value for both probabilities is necessary if they are to accord with reasonable belief." These problems with Laplace's rule of succession motivated Haldane, Perks, Jeffreys and others to search for other forms of prior probability (see the following sections). According to Jaynes, the main problem with the rule of succession is that it is not valid when ''s'' = 0 or ''s'' = ''n'' (see rule of succession, for an analysis of its validity).


Bayes-Laplace prior probability (Beta(1,1))

The beta distribution achieves maximum differential entropy for Beta(1,1): the uniform probability density, for which all values in the domain of the distribution have equal density. This uniform distribution Beta(1,1) was suggested ("with a great deal of doubt") by Thomas Bayes as the prior probability distribution to express ignorance about the correct prior distribution. This prior distribution was adopted (apparently, from his writings, with little sign of doubt) by Pierre-Simon Laplace, and hence it was also known as the "Bayes-Laplace rule" or the "Laplace rule" of "inverse probability" in publications of the first half of the 20th century. In the later part of the 19th century and early part of the 20th century, scientists realized that the assumption of uniform "equal" probability density depended on the actual functions (for example whether a linear or a logarithmic scale was most appropriate) and parametrizations used. In particular, the behavior near the ends of distributions with finite support (for example near ''x'' = 0, for a distribution with initial support at ''x'' = 0) required particular attention. Keynes (Ch. XXX, p. 381) criticized the use of Bayes's uniform prior probability (Beta(1,1)), under which all values between zero and one are equiprobable, as follows: "Thus experience, if it shows anything, shows that there is a very marked clustering of statistical ratios in the neighborhoods of zero and unity, of those for positive theories and for correlations between positive qualities in the neighborhood of zero, and of those for negative theories and for correlations between negative qualities in the neighborhood of unity."


Haldane's prior probability (Beta(0,0))

The Beta(0,0) distribution was proposed by J. B. S. Haldane, who suggested that the prior probability representing complete uncertainty should be proportional to 1/(''p''(1 − ''p'')). The function ''p''^(−1)(1 − ''p'')^(−1) can be viewed as the limit of the numerator of the beta distribution as both shape parameters approach zero: α, β → 0. The beta function (in the denominator of the beta distribution) approaches infinity as both parameters approach zero, α, β → 0. Therefore, ''p''^(−1)(1 − ''p'')^(−1) divided by the beta function approaches a 2-point Bernoulli distribution with equal probability 1/2 at each end, at 0 and 1, and nothing in between, as α, β → 0: a coin-toss, with one face of the coin at 0 and the other face at 1. The Haldane prior probability distribution Beta(0,0) is an "improper prior" because its integration (from 0 to 1) fails to strictly converge to 1 due to the singularities at each end. However, this is not an issue for computing posterior probabilities unless the sample size is very small. Furthermore, Zellner points out that on the log-odds scale (the logit transformation ln(''p''/(1 − ''p''))), the Haldane prior is the uniformly flat prior. The fact that a uniform prior probability on the logit transformed variable ln(''p''/(1 − ''p'')) (with domain (−∞, ∞)) is equivalent to the Haldane prior on the domain [0, 1] was pointed out by Harold Jeffreys in the first edition (1939) of his book Theory of Probability (p. 123). Jeffreys writes "Certainly if we take the Bayes-Laplace rule right up to the extremes we are led to results that do not correspond to anybody's way of thinking. The (Haldane) rule d''x''/(''x''(1 − ''x'')) goes too far the other way. It would lead to the conclusion that if a sample is of one type with respect to some property there is a probability 1 that the whole population is of that type." The fact that "uniform" depends on the parametrization led Jeffreys to seek a form of prior that would be invariant under different parametrizations.


Jeffreys' prior probability (Beta(1/2,1/2) for a Bernoulli or for a binomial distribution)

Harold Jeffreys proposed to use an uninformative prior probability measure that should be invariant under reparameterization: proportional to the square root of the determinant of Fisher's information matrix. For the Bernoulli distribution, this can be shown as follows: for a coin that is "heads" with probability ''p'' ∈ [0, 1] and is "tails" with probability 1 − ''p'', for a given (''H'', ''T'') ∈ {(0,1), (1,0)} the probability is ''p''^''H''(1 − ''p'')^''T''. Since ''T'' = 1 − ''H'', the Bernoulli distribution is ''p''^''H''(1 − ''p'')^(1 − ''H''). Considering ''p'' as the only parameter, it follows that the log likelihood for the Bernoulli distribution is

:\ln \mathcal{L}(p\mid H) = H \ln(p)+ (1-H) \ln(1-p).

The Fisher information matrix has only one component (it is a scalar, because there is only one parameter: ''p''), therefore:

:\begin{align}
\sqrt{\mathcal{I}(p)} &= \sqrt{\operatorname{E}\! \left[ \left( \frac{d}{dp} \ln \mathcal{L}(p\mid H) \right)^2 \right]} \\
&= \sqrt{\operatorname{E}\! \left[ \left( \frac{H}{p} - \frac{1-H}{1-p} \right)^2 \right]} \\
&= \sqrt{p \left( \frac{1}{p} - \frac{0}{1-p} \right)^2 + (1-p) \left( \frac{0}{p} - \frac{1}{1-p}\right)^2} \\
&= \frac{1}{\sqrt{p(1-p)}}.
\end{align}

Similarly, for the binomial distribution with ''n'' Bernoulli trials, it can be shown that

:\sqrt{\mathcal{I}(p)}= \frac{\sqrt{n}}{\sqrt{p(1-p)}}.

Thus, for the Bernoulli and binomial distributions, Jeffreys prior is proportional to \scriptstyle \frac{1}{\sqrt{p(1-p)}}, which happens to be proportional to a beta distribution with domain variable ''x'' = ''p'' and shape parameters α = β = 1/2, the arcsine distribution:

:\operatorname{Beta}(\tfrac{1}{2}, \tfrac{1}{2}) = \frac{1}{\pi \sqrt{x(1-x)}}.

It will be shown in the next section that the normalizing constant for Jeffreys prior is immaterial to the final result because the normalizing constant cancels out in Bayes theorem for the posterior probability. Hence Beta(1/2,1/2) is used as the Jeffreys prior for both Bernoulli and binomial distributions. As shown in the next section, when using this expression as a prior probability times the likelihood in Bayes theorem, the posterior probability turns out to be a beta distribution. It is important to realize, however, that Jeffreys prior is proportional to \scriptstyle \frac{1}{\sqrt{p(1-p)}} for the Bernoulli and binomial distribution, but not for the beta distribution. Jeffreys prior for the beta distribution is given by the determinant of Fisher's information for the beta distribution, which, as shown in the section on the Fisher information matrix, is a function of the trigamma function ψ1 of the shape parameters α and β as follows:

: \begin{align}
\sqrt{\det(\mathcal{I}(\alpha, \beta))} &= \sqrt{\psi_1(\alpha)\psi_1(\beta)-(\psi_1(\alpha)+\psi_1(\beta))\psi_1(\alpha + \beta)} \\
\lim_{\alpha\to 0} \sqrt{\det(\mathcal{I}(\alpha, \beta))} &=\lim_{\beta \to 0} \sqrt{\det(\mathcal{I}(\alpha, \beta))} = \infty\\
\lim_{\alpha\to \infty} \sqrt{\det(\mathcal{I}(\alpha, \beta))} &=\lim_{\beta \to \infty} \sqrt{\det(\mathcal{I}(\alpha, \beta))} = 0
\end{align}

As previously discussed, Jeffreys prior for the Bernoulli and binomial distributions is proportional to the arcsine distribution Beta(1/2,1/2), a one-dimensional ''curve'' that looks like a basin as a function of the parameter ''p'' of the Bernoulli and binomial distributions. The walls of the basin are formed by ''p'' approaching the singularities at the ends ''p'' → 0 and ''p'' → 1, where Beta(1/2,1/2) approaches infinity. Jeffreys prior for the beta distribution is a ''2-dimensional surface'' (embedded in a three-dimensional space) that looks like a basin with only two of its walls meeting at the corner α = β = 0 (and missing the other two walls), as a function of the shape parameters α and β of the beta distribution. The two adjoining walls of this 2-dimensional surface are formed by the shape parameters α and β approaching the singularities (of the trigamma function) at α, β → 0. It has no walls for α, β → ∞ because in this case the determinant of Fisher's information matrix for the beta distribution approaches zero.

It will be shown in the next section that Jeffreys prior probability results in posterior probabilities (when multiplied by the binomial likelihood function) that are intermediate between the posterior probability results of the Haldane and Bayes prior probabilities.

Jeffreys prior may be difficult to obtain analytically, and for some cases it just doesn't exist (even for simple distribution functions like the asymmetric triangular distribution). Berger, Bernardo and Sun, in a 2009 paper, defined a reference prior probability distribution that (unlike Jeffreys prior) exists for the asymmetric triangular distribution. They cannot obtain a closed-form expression for their reference prior, but numerical calculations show it to be nearly perfectly fitted by the (proper) prior

: \operatorname{Beta}(\tfrac{1}{2}, \tfrac{1}{2}) \sim \frac{1}{\sqrt{\theta(1-\theta)}}

where θ is the vertex variable for the asymmetric triangular distribution with support [0, 1] (corresponding to the following parameter values in Wikipedia's article on the triangular distribution: vertex ''c'' = ''θ'', left end ''a'' = 0, and right end ''b'' = 1). Berger et al. also give a heuristic argument that Beta(1/2,1/2) could indeed be the exact Berger–Bernardo–Sun reference prior for the asymmetric triangular distribution. Therefore, Beta(1/2,1/2) not only is Jeffreys prior for the Bernoulli and binomial distributions, but also seems to be the Berger–Bernardo–Sun reference prior for the asymmetric triangular distribution (for which the Jeffreys prior does not exist), a distribution used in project management and PERT analysis to describe the cost and duration of project tasks.

Clarke and Barron prove that, among continuous positive priors, Jeffreys prior (when it exists) asymptotically maximizes Shannon's mutual information between a sample of size ''n'' and the parameter, and therefore ''Jeffreys prior is the most uninformative prior'' (measuring information as Shannon information). The proof rests on an examination of the Kullback–Leibler divergence between probability density functions for iid random variables.
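A brief numerical illustration (not from the article) of the two statements above: for a Bernoulli parameter ''p'' the Jeffreys prior is proportional to 1/√(''p''(1 − ''p'')), i.e. the Beta(1/2,1/2) density up to its normalizing constant 1/π, while for the beta distribution's own shape parameters it is the square root of the Fisher information determinant, a function of trigamma values:

```python
import numpy as np
from scipy.special import polygamma
from scipy import stats

# Bernoulli/binomial case: 1/sqrt(p(1-p)) divided by pi equals the Beta(1/2,1/2) pdf.
p = np.linspace(0.01, 0.99, 5)
print(1 / (np.pi * np.sqrt(p * (1 - p))))
print(stats.beta(0.5, 0.5).pdf(p))          # same values

# Beta-distribution case: sqrt of the determinant of the 2x2 Fisher information matrix.
def jeffreys_beta(alpha, beta):
    t = lambda z: polygamma(1, z)           # trigamma function
    det = t(alpha) * t(beta) - (t(alpha) + t(beta)) * t(alpha + beta)
    return np.sqrt(det)

print(jeffreys_beta(2.0, 3.0))
```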


Effect of different prior probability choices on the posterior beta distribution

If samples are drawn from the population of a random variable ''X'' that result in ''s'' successes and ''f'' failures in ''n'' Bernoulli trials (''n'' = ''s'' + ''f''), then the likelihood function for parameters ''s'' and ''f'' given ''x'' = ''p'' (the notation ''x'' = ''p'' in the expressions below will emphasize that the domain ''x'' stands for the value of the parameter ''p'' in the binomial distribution) is the following binomial distribution:

:\mathcal{L}(s,f\mid x=p) = {n \choose s} x^s(1-x)^f = {n \choose s} x^s(1-x)^{n-s}.

If beliefs about prior probability information are reasonably well approximated by a beta distribution with parameters ''α'' Prior and ''β'' Prior, then:

:\operatorname{PriorProbability}(x=p;\alpha \operatorname{Prior},\beta \operatorname{Prior}) = \frac{x^{\alpha \operatorname{Prior}-1}(1-x)^{\beta \operatorname{Prior}-1}}{\Beta(\alpha \operatorname{Prior},\beta \operatorname{Prior})}

According to Bayes' theorem for a continuous event space, the posterior probability is given by the product of the prior probability and the likelihood function (given the evidence ''s'' and ''f'' = ''n'' − ''s''), normalized so that the area under the curve equals one, as follows:

:\begin{align}
& \operatorname{posterior probability}(x=p\mid s,n-s) \\
= {} & \frac{\operatorname{PriorProbability}(x=p;\alpha \operatorname{Prior},\beta \operatorname{Prior}) \, \mathcal{L}(s,f\mid x=p)}{\int_0^1 \operatorname{PriorProbability}(x=p;\alpha \operatorname{Prior},\beta \operatorname{Prior}) \, \mathcal{L}(s,f\mid x=p) \, dx} \\
= {} & \frac{{n \choose s} x^{s+\alpha \operatorname{Prior}-1}(1-x)^{n-s+\beta \operatorname{Prior}-1}/\Beta(\alpha \operatorname{Prior},\beta \operatorname{Prior})}{\int_0^1 \left ({n \choose s} x^{s+\alpha \operatorname{Prior}-1}(1-x)^{n-s+\beta \operatorname{Prior}-1}/\Beta(\alpha \operatorname{Prior},\beta \operatorname{Prior}) \right ) dx} \\
= {} & \frac{x^{s+\alpha \operatorname{Prior}-1}(1-x)^{n-s+\beta \operatorname{Prior}-1}}{\int_0^1 \left (x^{s+\alpha \operatorname{Prior}-1}(1-x)^{n-s+\beta \operatorname{Prior}-1} \right ) dx} \\
= {} & \frac{x^{s+\alpha \operatorname{Prior}-1}(1-x)^{n-s+\beta \operatorname{Prior}-1}}{\Beta(s+\alpha \operatorname{Prior},n-s+\beta \operatorname{Prior})}.
\end{align}

The binomial coefficient

:{n \choose s}=\frac{n!}{s!(n-s)!}=\frac{\Gamma(n+1)}{\Gamma(s+1)\Gamma(n-s+1)}

appears both in the numerator and the denominator of the posterior probability, and it does not depend on the integration variable ''x'', hence it cancels out, and it is irrelevant to the final result. Similarly the normalizing factor for the prior probability, the beta function B(αPrior, βPrior), cancels out and it is immaterial to the final result. The same posterior probability result can be obtained if one uses an un-normalized prior

:x^{\alpha \operatorname{Prior}-1}(1-x)^{\beta \operatorname{Prior}-1}

because the normalizing factors all cancel out. Several authors (including Jeffreys himself) thus use an un-normalized prior formula since the normalization constant cancels out. The numerator of the posterior probability ends up being just the (un-normalized) product of the prior probability and the likelihood function, and the denominator is its integral from zero to one. The beta function in the denominator, B(''s'' + ''α'' Prior, ''n'' − ''s'' + ''β'' Prior), appears as a normalization constant to ensure that the total posterior probability integrates to unity.

The ratio ''s''/''n'' of the number of successes to the total number of trials is a sufficient statistic in the binomial case, which is relevant for the following results.

For the Bayes prior probability (Beta(1,1)), the posterior probability is:

:\operatorname{posterior probability}(p=x\mid s,f) = \frac{x^{s}(1-x)^{n-s}}{\Beta(s+1,n-s+1)}, \text{ with mean} =\frac{s+1}{n+2} \text{ (and mode}=\frac{s}{n}\text{ if } 0 < s < n).

For the Jeffreys prior probability (Beta(1/2,1/2)), the posterior probability is:

:\operatorname{posterior probability}(p=x\mid s,f) = \frac{x^{s-\frac{1}{2}}(1-x)^{n-s-\frac{1}{2}}}{\Beta(s+\frac{1}{2},n-s+\frac{1}{2})}, \text{ with mean} = \frac{s+\frac{1}{2}}{n+1} \text{ (and mode}= \frac{s-\frac{1}{2}}{n-1}\text{ if } \tfrac{1}{2} < s < n-\tfrac{1}{2}).

and for the Haldane prior probability (Beta(0,0)), the posterior probability is:

:\operatorname{posterior probability}(p=x\mid s,f) = \frac{x^{s-1}(1-x)^{n-s-1}}{\Beta(s,n-s)}, \text{ with mean} = \frac{s}{n} \text{ (and mode}= \frac{s-1}{n-2}\text{ if } 1 < s < n -1).

From the above expressions it follows that for ''s''/''n'' = 1/2 all the above three prior probabilities result in the identical location for the posterior probability mean = mode = 1/2. For ''s''/''n'' < 1/2, the means of the posterior probabilities, using the above priors, are such that: mean for Bayes prior > mean for Jeffreys prior > mean for Haldane prior. For ''s''/''n'' > 1/2 the order of these inequalities is reversed such that the Haldane prior probability results in the largest posterior mean. The ''Haldane'' prior probability Beta(0,0) results in a posterior probability density with ''mean'' (the expected value for the probability of success in the "next" trial) identical to the ratio ''s''/''n'' of the number of successes to the total number of trials. Therefore, the Haldane prior results in a posterior probability with expected value in the next trial equal to the maximum likelihood. The ''Bayes'' prior probability Beta(1,1) results in a posterior probability density with ''mode'' identical to the ratio ''s''/''n'' (the maximum likelihood).

In the case that 100% of the trials have been successful (''s'' = ''n''), the ''Bayes'' prior probability Beta(1,1) results in a posterior expected value equal to the rule of succession (''n'' + 1)/(''n'' + 2), while the Haldane prior Beta(0,0) results in a posterior expected value of 1 (absolute certainty of success in the next trial). Jeffreys prior probability results in a posterior expected value equal to (''n'' + 1/2)/(''n'' + 1). Perks (p. 303) points out: "This provides a new rule of succession and expresses a 'reasonable' position to take up, namely, that after an unbroken run of n successes we assume a probability for the next trial equivalent to the assumption that we are about half-way through an average run, i.e. that we expect a failure once in (2''n'' + 2) trials. The Bayes-Laplace rule implies that we are about at the end of an average run or that we expect a failure once in (''n'' + 2) trials. The comparison clearly favours the new result (what is now called Jeffreys prior) from the point of view of 'reasonableness'."

Conversely, in the case that 100% of the trials have resulted in failure (''s'' = 0), the ''Bayes'' prior probability Beta(1,1) results in a posterior expected value for success in the next trial equal to 1/(''n'' + 2), while the Haldane prior Beta(0,0) results in a posterior expected value of success in the next trial of 0 (absolute certainty of failure in the next trial). Jeffreys prior probability results in a posterior expected value for success in the next trial equal to (1/2)/(''n'' + 1), which Perks (p. 303) points out: "is a much more reasonably remote result than the Bayes-Laplace result 1/(''n'' + 2)".

Jaynes questions (for the uniform prior Beta(1,1)) the use of these formulas for the cases ''s'' = 0 or ''s'' = ''n'' because the integrals do not converge (Beta(1,1) is an improper prior for ''s'' = 0 or ''s'' = ''n''). In practice, the conditions 0 < ''s'' < ''n'' required for a mode to exist between both ends of the domain are usually met. As noted in the section on the rule of succession, Karl Pearson showed that, with the Bayes prior, the probability that the next (''n'' + 1) trials will all be successes, after ''n'' successes in ''n'' trials, is only 50%. Perks (p. 303) shows that, for what is now known as the Jeffreys prior, this probability is ((''n'' + 1/2)/(''n'' + 1))((''n'' + 3/2)/(''n'' + 2))...(2''n'' + 1/2)/(2''n'' + 1), which for ''n'' = 1, 2, 3 gives 15/24, 315/480, 9009/13440; rapidly approaching a limiting value of 1/\sqrt{2} = 0.70710678\ldots as ''n'' tends to infinity. Perks remarks that what is now known as the Jeffreys prior "is clearly more 'reasonable' than either the Bayes-Laplace result or the result on the (Haldane) alternative rule rejected by Jeffreys which gives certainty as the probability. It clearly provides a very much better correspondence with the process of induction. Whether it is 'absolutely' reasonable for the purpose, i.e. whether it is yet large enough, without the absurdity of reaching unity, is a matter for others to decide. But it must be realized that the result depends on the assumption of complete indifference and absence of knowledge prior to the sampling experiment."

Following are the variances of the posterior distribution obtained with these three prior probability distributions.

For the Bayes prior probability (Beta(1,1)), the posterior variance is:

:\text{variance} = \frac{(s+1)(n-s+1)}{(n+2)^2(n+3)},\text{ which for } s=\frac{n}{2} \text{ results in variance} =\frac{1}{4(n+3)}

For the Jeffreys prior probability (Beta(1/2,1/2)), the posterior variance is:

:\text{variance} = \frac{(s+\frac{1}{2})(n-s+\frac{1}{2})}{(n+1)^2(n+2)},\text{ which for } s=\frac{n}{2} \text{ results in variance} = \frac{1}{4(n+2)}

and for the Haldane prior probability (Beta(0,0)), the posterior variance is:

:\text{variance} = \frac{s(n-s)}{n^2(n+1)}, \text{ which for } s=\frac{n}{2} \text{ results in variance} =\frac{1}{4(n+1)}

So, as remarked by Silvey, for large ''n'', the variance is small and hence the posterior distribution is highly concentrated, whereas the assumed prior distribution was very diffuse. This is in accord with what one would hope for, as vague prior knowledge is transformed (through Bayes theorem) into more precise posterior knowledge by an informative experiment. For small ''n'' the Haldane Beta(0,0) prior results in the largest posterior variance while the Bayes Beta(1,1) prior results in the most concentrated posterior. Jeffreys prior Beta(1/2,1/2) results in a posterior variance in between the other two. As ''n'' increases, the variance rapidly decreases so that the posterior variance for all three priors converges to approximately the same value (approaching zero variance as ''n'' → ∞). Recalling the previous result that the ''Haldane'' prior probability Beta(0,0) results in a posterior probability density with ''mean'' (the expected value for the probability of success in the "next" trial) identical to the ratio ''s''/''n'' of the number of successes to the total number of trials, it follows from the above expression that also the ''Haldane'' prior Beta(0,0) results in a posterior with ''variance'' identical to the variance expressed in terms of the maximum likelihood estimate ''s''/''n'' and sample size (see the section on the mean and sample-size parametrization):

:\text{variance} = \frac{\mu(1-\mu)}{1 + \nu}= \frac{\frac{s}{n} \left (1 - \frac{s}{n} \right )}{1 + n}

with the mean ''μ'' = ''s''/''n'' and the sample size ''ν'' = ''n''.

In Bayesian inference, using a prior distribution Beta(''α''Prior, ''β''Prior) prior to a binomial distribution is equivalent to adding (''α''Prior − 1) pseudo-observations of "success" and (''β''Prior − 1) pseudo-observations of "failure" to the actual number of successes and failures observed, then estimating the parameter ''p'' of the binomial distribution by the proportion of successes over both real and pseudo-observations. A uniform prior Beta(1,1) does not add (or subtract) any pseudo-observations since for Beta(1,1) it follows that (''α''Prior − 1) = 0 and (''β''Prior − 1) = 0. The Haldane prior Beta(0,0) subtracts one pseudo-observation from each and Jeffreys prior Beta(1/2,1/2) subtracts 1/2 pseudo-observation of success and an equal number of failure. This subtraction has the effect of smoothing out the posterior distribution. If the proportion of successes is not 50% (''s''/''n'' ≠ 1/2), values of ''α''Prior and ''β''Prior less than 1 (and therefore negative (''α''Prior − 1) and (''β''Prior − 1)) favor sparsity, i.e. distributions where the parameter ''p'' is closer to either 0 or 1. In effect, values of ''α''Prior and ''β''Prior between 0 and 1, when operating together, function as a concentration parameter.
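A brief numerical illustration of the formulas above (the data values ''s'' = 2, ''n'' = 10 are arbitrary): the posterior mean and variance under the Haldane, Jeffreys and Bayes priors, computed from the conjugate Beta(''s'' + ''α''Prior, ''n'' − ''s'' + ''β''Prior) posterior and checked against the closed-form Haldane expressions:

```python
# Compare posterior means and variances under the Haldane Beta(0,0), Jeffreys
# Beta(1/2,1/2) and Bayes Beta(1,1) priors for s successes in n trials.
from scipy import stats

s, n = 2, 10    # arbitrary example data
priors = {"Haldane Beta(0,0)": (0.0, 0.0),
          "Jeffreys Beta(1/2,1/2)": (0.5, 0.5),
          "Bayes Beta(1,1)": (1.0, 1.0)}

for name, (a0, b0) in priors.items():
    post = stats.beta(a0 + s, b0 + n - s)      # conjugate posterior
    print(f"{name:25s} mean = {post.mean():.4f}  variance = {post.var():.6f}")

# Closed-form check for the Haldane prior: mean = s/n, variance = (s/n)(1 - s/n)/(1 + n)
print(s / n, (s / n) * (1 - s / n) / (1 + n))
```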
The accompanying plots show the posterior probability density functions obtained by combining these three priors (Beta(0,0), Beta(1/2,1/2) and Beta(1,1)) with binomial likelihoods for various sample sizes and numbers of successes. The first plot shows the symmetric cases (''s''/''n'' = 1/2, with mean = mode = 1/2) and the second plot shows skewed cases. The images show that there is little difference between the priors for the posterior with a sample size of 50 (characterized by a more pronounced peak near ''p'' = 1/2). Significant differences appear for very small sample sizes (in particular for the flatter distribution for the degenerate case of sample size = 3). Therefore, the skewed cases show a larger effect from the choice of prior, at small sample size, than the symmetric cases. For symmetric distributions, the Bayes prior Beta(1,1) results in the most "peaky" and highest posterior distributions and the Haldane prior Beta(0,0) results in the flattest and lowest-peaked distribution. The Jeffreys prior Beta(1/2,1/2) lies in between them. For nearly symmetric, not too skewed distributions the effect of the priors is similar. For very small sample size (in this case for a sample size of 3) and a skewed distribution, the Haldane prior can result in a reverse-J-shaped distribution with a singularity at the left end. However, this happens only in degenerate cases (in this example ''n'' = 3 and hence ''s'' = 3/4 < 1, a degenerate value because s should be greater than unity in order for the posterior of the Haldane prior to have a mode located between the ends, and because ''s'' = 3/4 is not an integer number, hence it violates the initial assumption of a binomial distribution for the likelihood) and it is not an issue in generic cases of reasonable sample size (such that the condition 1 < ''s'' < ''n'' − 1, necessary for a mode to exist between both ends, is fulfilled).

In Chapter 12 (p. 385) of his book, Jaynes asserts that the ''Haldane prior'' Beta(0,0) describes a ''prior state of knowledge of complete ignorance'', where we are not even sure whether it is physically possible for an experiment to yield either a success or a failure, while the ''Bayes (uniform) prior Beta(1,1) applies if'' one knows that ''both binary outcomes are possible''. Jaynes states: "''interpret the Bayes-Laplace (Beta(1,1)) prior as describing not a state of complete ignorance'', but the state of knowledge in which we have observed one success and one failure... once we have seen at least one success and one failure, then we know that the experiment is a true binary one, in the sense of physical possibility." Jaynes does not specifically discuss Jeffreys prior Beta(1/2,1/2) (Jaynes' discussion of "Jeffreys prior" on pp. 181, 423 and in chapter 12 of his book refers instead to the improper, un-normalized prior "1/''p'' ''dp''" introduced by Jeffreys in the 1939 edition of his book, seven years before he introduced what is now known as Jeffreys' invariant prior: the square root of the determinant of Fisher's information matrix. ''"1/p" is Jeffreys' (1946) invariant prior for the exponential distribution, not for the Bernoulli or binomial distributions''). However, it follows from the above discussion that the Jeffreys Beta(1/2,1/2) prior represents a state of knowledge in between the Haldane Beta(0,0) and Bayes Beta(1,1) priors.

Similarly, Karl Pearson in his 1892 book The Grammar of Science (p. 144 of the 1900 edition) maintained that the Bayes (Beta(1,1)) uniform prior was not a complete ignorance prior, and that it should be used only when prior information justified "distributing our ignorance equally". K. Pearson wrote: "Yet the only supposition that we appear to have made is this: that, knowing nothing of nature, routine and anomy (from the Greek ανομία, namely: a- "without", and nomos "law") are to be considered as equally likely to occur. Now we were not really justified in making even this assumption, for it involves a knowledge that we do not possess regarding nature. We use our ''experience'' of the constitution and action of coins in general to assert that heads and tails are equally probable, but we have no right to assert before experience that, as we know nothing of nature, routine and breach are equally probable. In our ignorance we ought to consider before experience that nature may consist of all routines, all anomies (normlessness), or a mixture of the two in any proportion whatever, and that all such are equally probable. Which of these constitutions after experience is the most probable must clearly depend on what that experience has been like."

If there is sufficient sampling data, ''and the posterior probability mode is not located at one of the extremes of the domain'' (''x'' = 0 or ''x'' = 1), the three priors of Bayes (Beta(1,1)), Jeffreys (Beta(1/2,1/2)) and Haldane (Beta(0,0)) should yield similar ''posterior'' probability densities. Otherwise, as Gelman et al. (p. 65) point out, "if so few data are available that the choice of noninformative prior distribution makes a difference, one should put relevant information into the prior distribution", or as Berger (p. 125) points out "when different reasonable priors yield substantially different answers, can it be right to state that there ''is'' a single answer? Would it not be better to admit that there is scientific uncertainty, with the conclusion depending on prior beliefs?."


Occurrence and applications


Order statistics

The beta distribution has an important application in the theory of order statistics. A basic result is that the distribution of the ''k''th smallest of a sample of size ''n'' from a continuous uniform distribution has a beta distribution (David, H. A., Nagaraja, H. N. (2003) ''Order Statistics'', 3rd Edition. Wiley, New Jersey, p. 458). This result is summarized as:

:U_{(k)} \sim \operatorname{Beta}(k,n+1-k).

From this, and application of the theory related to the probability integral transform, the distribution of any individual order statistic from any continuous distribution can be derived.
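An empirical check of this result (purely illustrative; the values of ''n'' and ''k'' are arbitrary):

```python
# The k-th smallest of n iid Uniform(0,1) samples follows Beta(k, n + 1 - k).
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, k = 10, 3
u_k = np.sort(rng.uniform(size=(100_000, n)), axis=1)[:, k - 1]   # k-th order statistic

print(u_k.mean(), stats.beta(k, n + 1 - k).mean())   # both close to k/(n+1)
print(stats.kstest(u_k, stats.beta(k, n + 1 - k).cdf).statistic)  # small KS distance
```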


Subjective logic

In standard logic, propositions are considered to be either true or false. In contradistinction, subjective logic assumes that humans cannot determine with absolute certainty whether a proposition about the real world is absolutely true or false. In subjective logic the a posteriori probability estimates of binary events can be represented by beta distributions (A. Jøsang. A Logic for Uncertain Probabilities. ''International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems.'' 9(3), pp. 279-311, June 2001).


Wavelet analysis

A wavelet is a wave-like oscillation with an amplitude that starts out at zero, increases, and then decreases back to zero. It can typically be visualized as a "brief oscillation" that promptly decays. Wavelets can be used to extract information from many different kinds of data, including – but certainly not limited to – audio signals and images. Thus, wavelets are purposefully crafted to have specific properties that make them useful for signal processing. Wavelets are localized in both time and frequency whereas the standard Fourier transform is only localized in frequency. Therefore, standard Fourier transforms are only applicable to stationary processes, while wavelets are applicable to non-stationary processes. Continuous wavelets can be constructed based on the beta distribution. Beta wavelets (H. M. de Oliveira and G. A. A. Araújo. Compactly Supported One-cyclic Wavelets Derived from Beta Distributions. ''Journal of Communication and Information Systems.'' vol. 20, n. 3, pp. 27-33, 2005) can be viewed as a soft variety of Haar wavelets whose shape is fine-tuned by two shape parameters α and β.


Population genetics

The Balding–Nichols model is a two-parameter parametrization of the beta distribution used in population genetics. It is a statistical description of the allele frequencies in the components of a sub-divided population:

: \begin{align}
\alpha &= \mu \nu,\\
\beta &= (1 - \mu) \nu,
\end{align}

where \nu =\alpha+\beta= \frac{1-F}{F} and 0 < F < 1; here ''F'' is (Wright's) genetic distance between two populations.
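A small illustrative helper (not from the article) that converts the Balding–Nichols parameters (''F'', ''μ'') into the standard beta shape parameters via ν = (1 − ''F'')/''F'', α = ''μ''ν, β = (1 − ''μ'')ν; the example values of ''F'' and ''μ'' are arbitrary:

```python
# Convert Balding-Nichols parameters (F, mu) to beta shape parameters (alpha, beta).
from scipy import stats

def balding_nichols(F, mu):
    nu = (1.0 - F) / F          # nu = alpha + beta = (1 - F)/F
    return mu * nu, (1.0 - mu) * nu

alpha, beta = balding_nichols(F=0.1, mu=0.3)   # example values, purely illustrative
dist = stats.beta(alpha, beta)
print(alpha, beta, dist.mean())                # mean of the allele frequency equals mu = 0.3
```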


Project management: task cost and schedule modeling

The beta distribution can be used to model events which are constrained to take place within an interval defined by a minimum and maximum value. For this reason, the beta distribution – along with the triangular distribution – is used extensively in PERT, critical path method (CPM), Joint Cost Schedule Modeling (JCSM) and other project management/control systems to describe the time to completion and the cost of a task. In project management, shorthand computations are widely used to estimate the mean and standard deviation of the beta distribution (a numerical check of these shorthand formulas is sketched at the end of this section):

: \begin{align}
  \mu(X) & = \frac{a + 4b + c}{6} \\
  \sigma(X) & = \frac{c - a}{6}
\end{align}

where ''a'' is the minimum, ''c'' is the maximum, and ''b'' is the most likely value (the mode for ''α'' > 1 and ''β'' > 1).

The above estimate for the mean \mu(X) = \frac{a + 4b + c}{6} is known as the PERT three-point estimation and it is exact for either of the following values of ''β'' (for arbitrary α within these ranges):

:''β'' = ''α'' > 1 (symmetric case) with standard deviation \sigma(X) = \frac{c - a}{2\sqrt{2\alpha + 1}}, skewness = 0, and excess kurtosis = \frac{-6}{3 + 2\alpha}

or

:''β'' = 6 − ''α'' for 5 > ''α'' > 1 (skewed case) with standard deviation

:\sigma(X) = \frac{(c - a)\sqrt{\alpha(6 - \alpha)}}{6\sqrt{7}},

skewness = \frac{(3 - \alpha)\sqrt{7}}{2\sqrt{\alpha(6 - \alpha)}}, and excess kurtosis = \frac{21}{\alpha(6 - \alpha)} - 3

The above estimate for the standard deviation ''σ''(''X'') = (''c'' − ''a'')/6 is exact for either of the following values of ''α'' and ''β'':

:''α'' = ''β'' = 4 (symmetric) with skewness = 0, and excess kurtosis = −6/11.
:''β'' = 6 − ''α'' and \alpha = 3 - \sqrt2 (right-tailed, positive skew) with skewness = \frac{\sqrt{2}}{2}, and excess kurtosis = 0
:''β'' = 6 − ''α'' and \alpha = 3 + \sqrt2 (left-tailed, negative skew) with skewness = -\frac{\sqrt{2}}{2}, and excess kurtosis = 0

Otherwise, these can be poor approximations for beta distributions with other values of α and β, exhibiting average errors of 40% in the mean and 549% in the variance.
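For illustration, a small sketch (assuming only the Python standard library; the three-point values are arbitrary) comparing the PERT shorthand formulas with the exact beta moments for the case α = β = 4 rescaled to [''a'', ''c'']:

    import math

    def pert_estimates(a, b, c):
        """PERT shorthand: mean (a + 4b + c)/6 and standard deviation (c - a)/6."""
        return (a + 4*b + c) / 6.0, (c - a) / 6.0

    def beta_moments_on_interval(alpha, beta, a, c):
        """Exact mean and standard deviation of a beta distribution rescaled to [a, c]."""
        mean01 = alpha / (alpha + beta)
        var01 = alpha * beta / ((alpha + beta)**2 * (alpha + beta + 1))
        return a + (c - a) * mean01, (c - a) * math.sqrt(var01)

    a, c = 2.0, 14.0                 # hypothetical minimum and maximum task durations
    alpha = beta = 4.0               # symmetric case in which the shorthand is exact
    b = a + (c - a) * 0.5            # mode of the symmetric case
    print(pert_estimates(a, b, c))                        # approximately (8.0, 2.0)
    print(beta_moments_on_interval(alpha, beta, a, c))    # approximately (8.0, 2.0)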


Random variate generation

If ''X'' and ''Y'' are independent, with X \sim \Gamma(\alpha, \theta) and Y \sim \Gamma(\beta, \theta), then

:\frac{X}{X+Y} \sim \Beta(\alpha, \beta).

So one algorithm for generating beta variates is to generate \frac{X}{X+Y}, where ''X'' is a gamma variate with parameters (α, 1) and ''Y'' is an independent gamma variate with parameters (β, 1). In fact, here \frac{X}{X+Y} and X+Y are independent, and X+Y \sim \Gamma(\alpha + \beta, \theta). If Z \sim \Gamma(\gamma, \theta) and Z is independent of X and Y, then \frac{X+Y}{X+Y+Z} \sim \Beta(\alpha+\beta,\gamma) and \frac{X+Y}{X+Y+Z} is independent of \frac{X}{X+Y}. This shows that the product of independent \Beta(\alpha,\beta) and \Beta(\alpha+\beta,\gamma) random variables is a \Beta(\alpha,\beta+\gamma) random variable.

Also, the ''k''th order statistic of ''n'' uniformly distributed variates is \Beta(k, n+1-k), so an alternative if α and β are small integers is to generate α + β − 1 uniform variates and choose the α-th smallest.

Another way to generate the beta distribution is by the Pólya urn model. According to this method, one starts with an "urn" containing α "black" balls and β "white" balls and draws uniformly with replacement. At every trial an additional ball is added according to the colour of the last ball drawn. Asymptotically, the proportion of black and white balls will be distributed according to the beta distribution, where each repetition of the experiment will produce a different value.

It is also possible to use inverse transform sampling.
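The gamma-ratio and order-statistic recipes above can be sketched as follows (assuming NumPy; the parameters are arbitrary, and the second method applies only to integer shapes):

    import numpy as np

    rng = np.random.default_rng(2)
    alpha, beta_ = 2.5, 4.0

    # Method 1: ratio of two independent gamma variates with a common scale
    x = rng.gamma(alpha, 1.0, size=100_000)
    y = rng.gamma(beta_, 1.0, size=100_000)
    beta_via_gamma = x / (x + y)

    # Method 2 (integer shapes only): alpha-th smallest of alpha + beta - 1 uniforms
    a_int, b_int = 2, 4
    u = rng.uniform(size=(100_000, a_int + b_int - 1))
    beta_via_order = np.sort(u, axis=1)[:, a_int - 1]

    print(beta_via_gamma.mean(), alpha / (alpha + beta_))   # both near 0.3846
    print(beta_via_order.mean(), a_int / (a_int + b_int))   # both near 0.3333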


History

Thomas Bayes, in a posthumous paper published in 1763 by Richard Price, obtained a beta distribution as the density of the probability of success in Bernoulli trials (see ), but the paper does not analyze any of the moments of the beta distribution or discuss any of its properties.

The first systematic modern discussion of the beta distribution is probably due to Karl Pearson. In Pearson's papers the beta distribution is couched as a solution of a differential equation: Pearson's Type I distribution, to which it is essentially identical except for arbitrary shifting and re-scaling (the beta and Pearson Type I distributions can always be equalized by proper choice of parameters). In fact, in several English books and journal articles in the few decades prior to World War II, it was common to refer to the beta distribution as Pearson's Type I distribution. William P. Elderton in his 1906 monograph "Frequency curves and correlation" further analyzes the beta distribution as Pearson's Type I distribution, including a full discussion of the method of moments for the four-parameter case, and diagrams of (what Elderton describes as) U-shaped, J-shaped, twisted J-shaped, "cocked-hat" shapes, horizontal and angled straight-line cases. Elderton wrote "I am chiefly indebted to Professor Pearson, but the indebtedness is of a kind for which it is impossible to offer formal thanks." Elderton's 1906 monograph provides an impressive amount of information on the beta distribution, including equations for the origin of the distribution chosen to be the mode, as well as for other Pearson distributions: types I through VII. Elderton also included a number of appendixes, including one appendix ("II") on the beta and gamma functions. In later editions, Elderton added equations for the origin of the distribution chosen to be the mean, and analysis of Pearson distributions VIII through XII.

As remarked by Bowman and Shenton, "Fisher and Pearson had a difference of opinion in the approach to (parameter) estimation, in particular relating to (Pearson's method of) moments and (Fisher's method of) maximum likelihood in the case of the Beta distribution." Also according to Bowman and Shenton, "the case of a Type I (beta distribution) model being the center of the controversy was pure serendipity. A more difficult model of 4 parameters would have been hard to find." The long-running public conflict of Fisher with Karl Pearson can be followed in a number of articles in prestigious journals. For example, concerning the estimation of the four parameters for the beta distribution, and Fisher's criticism of Pearson's method of moments as being arbitrary, see Pearson's article "Method of moments and method of maximum likelihood" (published three years after his retirement from University College, London, where his position had been divided between Fisher and Pearson's son Egon), in which Pearson writes "I read (Koshai's paper in the Journal of the Royal Statistical Society, 1933) which as far as I am aware is the only case at present published of the application of Professor Fisher's method. To my astonishment that method depends on first working out the constants of the frequency curve by the (Pearson) Method of Moments and then superposing on it, by what Fisher terms "the Method of Maximum Likelihood" a further approximation to obtain, what he holds, he will thus get, "more efficient values" of the curve constants."

David and Edwards's treatise on the history of statistics cites the first modern treatment of the beta distribution, in 1911, using the beta designation that has become standard, due to Corrado Gini, an Italian statistician, demographer, and sociologist, who developed the Gini coefficient. N. L. Johnson and S. Kotz, in their comprehensive and very informative monograph on leading historical personalities in statistical sciences, credit Corrado Gini as "an early Bayesian... who dealt with the problem of eliciting the parameters of an initial Beta distribution, by singling out techniques which anticipated the advent of the so-called empirical Bayes approach."


References


External links


"Beta_Distribution"
by Fiona Maclachlan, the Wolfram Demonstrations Project, 2007.
* Beta Distribution – Overview and Example, xycoon.com
* brighton-webs.co.uk
* exstrom.com
* Harvard University Statistics 110 Lecture 23 Beta Distribution, Prof. Joe Blitzstein
Mean absolute deviation around the mean

The mean absolute deviation around the mean for the beta distribution with shape parameters α and β is:

:\operatorname{E}[|X - E[X]|] = \frac{2 \alpha^\alpha \beta^\beta}{\Beta(\alpha,\beta)(\alpha + \beta)^{\alpha + \beta + 1}}

The mean absolute deviation around the mean is a more robust estimator of statistical dispersion than the standard deviation for beta distributions with tails and inflection points at each side of the mode, Beta(''α'', ''β'') distributions with ''α'', ''β'' > 2, as it depends on the linear (absolute) deviations rather than the square deviations from the mean. Therefore, the effect of very large deviations from the mean is not as overly weighted.

Using Stirling's approximation to the Gamma function, N. L. Johnson and S. Kotz derived the following approximation for values of the shape parameters greater than unity (the relative error for this approximation is only −3.5% for ''α'' = ''β'' = 1, and it decreases to zero as ''α'' → ∞, ''β'' → ∞):

: \begin{align}
\frac{\text{mean abs. dev. from mean}}{\text{standard deviation}} &=\frac{\operatorname{E}[|X - E[X]|]}{\sqrt{\operatorname{var}(X)}}\\
&\approx \sqrt{\frac{2}{\pi}} \left(1+\frac{7}{12 (\alpha+\beta)}-\frac{1}{12 \alpha}-\frac{1}{12 \beta} \right), \text{ for } \alpha, \beta > 1.
\end{align}

At the limit α → ∞, β → ∞, the ratio of the mean absolute deviation to the standard deviation (for the beta distribution) becomes equal to the ratio of the same measures for the normal distribution: \sqrt{\frac{2}{\pi}}. For α = β = 1 this ratio equals \frac{\sqrt{3}}{2}, so that from α = β = 1 to α, β → ∞ the ratio decreases by 8.5%. For α = β = 0 the standard deviation is exactly equal to the mean absolute deviation around the mean. Therefore, this ratio decreases by 15% from α = β = 0 to α = β = 1, and by 25% from α = β = 0 to α, β → ∞. However, for skewed beta distributions such that α → 0 or β → 0, the ratio of the standard deviation to the mean absolute deviation approaches infinity (although each of them, individually, approaches zero) because the mean absolute deviation approaches zero faster than the standard deviation.

Using the parametrization in terms of mean μ and sample size ν = α + β > 0:

:α = μν, β = (1−μ)ν

one can express the mean absolute deviation around the mean in terms of the mean μ and the sample size ν as follows:

:\operatorname{E}[|X - E[X]|] = \frac{2 \mu^{\mu\nu}(1-\mu)^{(1-\mu)\nu}}{\nu \Beta(\mu\nu, (1-\mu)\nu)}

For a symmetric distribution, the mean is at the middle of the distribution, μ = 1/2, and therefore:

: \begin{align}
\operatorname{E}[|X - E[X]|] = \frac{2^{1-\nu}}{\nu \Beta(\tfrac{\nu}{2}, \tfrac{\nu}{2})} \\
\lim_{\nu \to 0} \left(\lim_{\mu \to \tfrac{1}{2}} \operatorname{E}[|X - E[X]|] \right) = \tfrac{1}{2}\\
\lim_{\nu \to \infty} \left(\lim_{\mu \to \tfrac{1}{2}} \operatorname{E}[|X - E[X]|] \right) = 0
\end{align}

Also, the following limits (with only the noted variable approaching the limit) can be obtained from the above expressions:

: \begin{align}
\lim_{\beta \to 0} \operatorname{E}[|X - E[X]|] &=\lim_{\alpha \to 0} \operatorname{E}[|X - E[X]|] = 0 \\
\lim_{\beta \to \infty} \operatorname{E}[|X - E[X]|] &=\lim_{\alpha \to \infty} \operatorname{E}[|X - E[X]|] = 0\\
\lim_{\mu \to 0} \operatorname{E}[|X - E[X]|] &=\lim_{\mu \to 1} \operatorname{E}[|X - E[X]|] = 0\\
\lim_{\nu \to 0} \operatorname{E}[|X - E[X]|] &= 2\mu(1-\mu) \\
\lim_{\nu \to \infty} \operatorname{E}[|X - E[X]|] &= 0
\end{align}
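A quick Monte Carlo check of the closed-form mean absolute deviation (a sketch assuming NumPy and SciPy; the shape parameters are arbitrary):

    import numpy as np
    from scipy.special import betaln

    def beta_mad(a, b):
        """Mean absolute deviation around the mean: 2 a^a b^b / (B(a,b) (a+b)^(a+b+1))."""
        log_mad = np.log(2) + a*np.log(a) + b*np.log(b) - betaln(a, b) - (a + b + 1)*np.log(a + b)
        return np.exp(log_mad)

    rng = np.random.default_rng(3)
    a, b = 2.0, 5.0
    x = rng.beta(a, b, size=1_000_000)
    print(beta_mad(a, b), np.mean(np.abs(x - a / (a + b))))   # closed form vs. simulation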


Mean absolute difference

The mean absolute difference for the beta distribution is:

:\mathrm{MD} = \int_0^1 \int_0^1 f(x;\alpha,\beta)\,f(y;\alpha,\beta)\,|x-y| \,dx\,dy = \left(\frac{4}{\alpha+\beta}\right)\frac{\Beta(\alpha+\beta,\alpha+\beta)}{\Beta(\alpha,\alpha)\Beta(\beta,\beta)}

The Gini coefficient for the beta distribution is half of the relative mean absolute difference:

:\mathrm{G} = \left(\frac{2}{\alpha}\right)\frac{\Beta(\alpha+\beta,\alpha+\beta)}{\Beta(\alpha,\alpha)\Beta(\beta,\beta)}
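A short numerical check of the mean absolute difference formula by simulation (a sketch assuming NumPy and SciPy; the shape parameters are chosen arbitrarily):

    import numpy as np
    from scipy.special import betaln

    def beta_mean_abs_diff(a, b):
        """Closed form: (4/(a+b)) * B(a+b, a+b) / (B(a,a) * B(b,b))."""
        log_md = np.log(4.0 / (a + b)) + betaln(a + b, a + b) - betaln(a, a) - betaln(b, b)
        return np.exp(log_md)

    rng = np.random.default_rng(4)
    a, b = 2.0, 1.0
    x, y = rng.beta(a, b, 500_000), rng.beta(a, b, 500_000)
    print(beta_mean_abs_diff(a, b), np.mean(np.abs(x - y)))   # both near 4/15 = 0.2667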


Skewness

The skewness (the third moment centered on the mean, normalized by the 3/2 power of the variance) of the beta distribution is

:\gamma_1 =\frac{\operatorname{E}[(X - \mu)^3]}{(\operatorname{var}(X))^{3/2}} = \frac{2(\beta-\alpha)\sqrt{\alpha+\beta+1}}{(\alpha+\beta+2)\sqrt{\alpha\beta}} .

Letting α = β in the above expression one obtains γ1 = 0, showing once again that for α = β the distribution is symmetric and hence the skewness is zero. Positive skew (right-tailed) for α < β, negative skew (left-tailed) for α > β.

Using the parametrization in terms of mean μ and sample size ν = α + β:

: \begin{align}
\alpha & = \mu \nu, \text{ where } \nu =(\alpha + \beta) >0\\
\beta & = (1 - \mu) \nu, \text{ where } \nu =(\alpha + \beta) >0,
\end{align}

one can express the skewness in terms of the mean μ and the sample size ν as follows:

:\gamma_1 = \frac{2(1-2\mu)\sqrt{1+\nu}}{(2+\nu)\sqrt{\mu(1-\mu)}}.

The skewness can also be expressed just in terms of the variance ''var'' and the mean μ as follows:

:\gamma_1 = \frac{2(1-2\mu)\sqrt{\operatorname{var}}}{\mu(1-\mu)+\operatorname{var}} \text{ if } \operatorname{var} < \mu(1-\mu)

The accompanying plot of skewness as a function of variance and mean shows that maximum variance (1/4) is coupled with zero skewness and the symmetry condition (μ = 1/2), and that maximum skewness (positive or negative infinity) occurs when the mean is located at one end or the other, so that the "mass" of the probability distribution is concentrated at the ends (minimum variance).

The following expression for the square of the skewness, in terms of the sample size ν = α + β and the variance ''var'', is useful for the method of moments estimation of four parameters:

:(\gamma_1)^2 =\frac{4(\beta-\alpha)^2 (1+\alpha+\beta)}{\alpha\beta(2+\alpha+\beta)^2} = \frac{4}{(2+\nu)^2}\bigg(\frac{1}{\operatorname{var}}-4(1+\nu)\bigg)

This expression correctly gives a skewness of zero for α = β, since in that case \operatorname{var} = \frac{1}{4(1+\nu)}.

For the symmetric case (α = β), skewness = 0 over the whole range, and the following limits apply:

:\lim_{\alpha = \beta \to 0} \gamma_1 = \lim_{\alpha = \beta \to \infty} \gamma_1 =\lim_{\nu \to 0} \gamma_1=\lim_{\nu \to \infty} \gamma_1=\lim_{\mu \to \tfrac{1}{2}} \gamma_1 = 0

For the asymmetric cases (α ≠ β) the following limits (with only the noted variable approaching the limit) can be obtained from the above expressions:

: \begin{align}
&\lim_{\alpha \to 0} \gamma_1 =\lim_{\mu \to 0} \gamma_1 = \infty\\
&\lim_{\beta \to 0} \gamma_1 = \lim_{\mu \to 1} \gamma_1= - \infty\\
&\lim_{\alpha \to \infty} \gamma_1 = -\frac{2}{\sqrt{\beta}},\quad \lim_{\beta \to 0}(\lim_{\alpha \to \infty} \gamma_1) = -\infty,\quad \lim_{\beta \to \infty}(\lim_{\alpha \to \infty} \gamma_1) = 0\\
&\lim_{\beta \to \infty} \gamma_1 = \frac{2}{\sqrt{\alpha}},\quad \lim_{\alpha \to 0}(\lim_{\beta \to \infty} \gamma_1) = \infty,\quad \lim_{\alpha \to \infty}(\lim_{\beta \to \infty} \gamma_1) = 0\\
&\lim_{\nu \to 0} \gamma_1 = \frac{1 - 2\mu}{\sqrt{\mu(1-\mu)}},\quad \lim_{\mu \to 0}(\lim_{\nu \to 0} \gamma_1) = \infty,\quad \lim_{\mu \to 1}(\lim_{\nu \to 0} \gamma_1) = - \infty
\end{align}
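The closed-form skewness can be checked against SciPy's built-in moments (a sketch assuming SciPy; the parameters are arbitrary):

    import math
    from scipy.stats import beta

    def beta_skewness(a, b):
        """gamma_1 = 2 (b - a) sqrt(a + b + 1) / ((a + b + 2) sqrt(a b))."""
        return 2.0 * (b - a) * math.sqrt(a + b + 1.0) / ((a + b + 2.0) * math.sqrt(a * b))

    a, b = 2.0, 5.0
    print(beta_skewness(a, b))                  # closed form
    print(beta(a, b).stats(moments='s'))        # SciPy's value, identical up to rounding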


Kurtosis

The beta distribution has been applied in acoustic analysis to assess damage to gears, as the kurtosis of the beta distribution has been reported to be a good indicator of the condition of a gear. Kurtosis has also been used to distinguish the seismic signal generated by a person's footsteps from other signals. As persons or other targets moving on the ground generate continuous signals in the form of seismic waves, one can separate different targets based on the seismic waves they generate. Kurtosis is sensitive to impulsive signals, so it's much more sensitive to the signal generated by human footsteps than other signals generated by vehicles, winds, noise, etc. Unfortunately, the notation for kurtosis has not been standardized. Kenney and Keeping use the symbol γ2 for the
excess kurtosis
, but Abramowitz and Stegun use different terminology. To prevent confusion between kurtosis (the fourth moment centered on the mean, normalized by the square of the variance) and excess kurtosis, when using symbols, they will be spelled out as follows: :\begin \text &=\text - 3\\ &=\frac-3\\ &=\frac\\ &=\frac . \end Letting α = β in the above expression one obtains :\text =- \frac \text\alpha=\beta . Therefore, for symmetric beta distributions, the excess kurtosis is negative, increasing from a minimum value of −2 at the limit as → 0, and approaching a maximum value of zero as → ∞. The value of −2 is the minimum value of excess kurtosis that any distribution (not just beta distributions, but any distribution of any possible kind) can ever achieve. This minimum value is reached when all the probability density is entirely concentrated at each end ''x'' = 0 and ''x'' = 1, with nothing in between: a 2-point
Bernoulli distribution
with equal probability 1/2 at each end (a coin toss: see section below "Kurtosis bounded by the square of the skewness" for further discussion). The description of kurtosis as a measure of the "potential outliers" (or "potential rare, extreme values") of the probability distribution, is correct for all distributions including the beta distribution. When rare, extreme values can occur in the beta distribution, the higher its kurtosis; otherwise, the kurtosis is lower. For α ≠ β, skewed beta distributions, the excess kurtosis can reach unlimited positive values (particularly for α → 0 for finite β, or for β → 0 for finite α) because the side away from the mode will produce occasional extreme values. Minimum kurtosis takes place when the mass density is concentrated equally at each end (and therefore the mean is at the center), and there is no probability mass density in between the ends. Using the parametrization in terms of mean μ and sample size ν = α + β: : \begin \alpha & = \mu \nu ,\text\nu =(\alpha + \beta) >0\\ \beta & = (1 - \mu) \nu , \text\nu =(\alpha + \beta) >0. \end one can express the excess kurtosis in terms of the mean μ and the sample size ν as follows: :\text =\frac\bigg (\frac - 1 \bigg ) The excess kurtosis can also be expressed in terms of just the following two parameters: the variance ''var'', and the sample size ν as follows: :\text =\frac\left(\frac - 6 - 5 \nu \right)\text\text< \mu(1-\mu) and, in terms of the variance ''var'' and the mean μ as follows: :\text =\frac\text\text< \mu(1-\mu) The plot of excess kurtosis as a function of the variance and the mean shows that the minimum value of the excess kurtosis (−2, which is the minimum possible value for excess kurtosis for any distribution) is intimately coupled with the maximum value of variance (1/4) and the symmetry condition: the mean occurring at the midpoint (μ = 1/2). This occurs for the symmetric case of α = β = 0, with zero skewness. At the limit, this is the 2 point
Bernoulli distribution
with equal probability 1/2 at each Dirac delta function end ''x'' = 0 and ''x'' = 1 and zero probability everywhere else. (A coin toss: one face of the coin being ''x'' = 0 and the other face being ''x'' = 1.) Variance is maximum because the distribution is bimodal with nothing in between the two modes (spikes) at each end. Excess kurtosis is minimum: the probability density "mass" is zero at the mean and it is concentrated at the two peaks at each end. Excess kurtosis reaches the minimum possible value (for any distribution) when the probability density function has two spikes at each end: it is bi-"peaky" with nothing in between them. On the other hand, the plot shows that for extreme skewed cases, where the mean is located near one or the other end (μ = 0 or μ = 1), the variance is close to zero, and the excess kurtosis rapidly approaches infinity when the mean of the distribution approaches either end. Alternatively, the excess kurtosis can also be expressed in terms of just the following two parameters: the square of the skewness, and the sample size ν as follows: :\text =\frac\bigg(\frac (\text)^2 - 1\bigg)\text^2-2< \text< \frac (\text)^2 From this last expression, one can obtain the same limits published practically a century ago by Karl Pearson in his paper, for the beta distribution (see section below titled "Kurtosis bounded by the square of the skewness"). Setting α + β= ν = 0 in the above expression, one obtains Pearson's lower boundary (values for the skewness and excess kurtosis below the boundary (excess kurtosis + 2 − skewness2 = 0) cannot occur for any distribution, and hence Karl Pearson appropriately called the region below this boundary the "impossible region"). The limit of α + β = ν → ∞ determines Pearson's upper boundary. : \begin &\lim_\text = (\text)^2 - 2\\ &\lim_\text = \tfrac (\text)^2 \end therefore: :(\text)^2-2< \text< \tfrac (\text)^2 Values of ν = α + β such that ν ranges from zero to infinity, 0 < ν < ∞, span the whole region of the beta distribution in the plane of excess kurtosis versus squared skewness. For the symmetric case (α = β), the following limits apply: : \begin &\lim_ \text = - 2 \\ &\lim_ \text = 0 \\ &\lim_ \text = - \frac \end For the unsymmetric cases (α ≠ β) the following limits (with only the noted variable approaching the limit) can be obtained from the above expressions: : \begin &\lim_\text =\lim_ \text = \lim_\text = \lim_\text =\infty\\ &\lim_\text = \frac,\text \lim_(\lim_ \text) = \infty,\text \lim_(\lim_ \text) = 0\\ &\lim_\text = \frac,\text \lim_(\lim_ \text) = \infty,\text \lim_(\lim_ \text) = 0\\ &\lim_ \text = - 6 + \frac,\text \lim_(\lim_ \text) = \infty,\text \lim_(\lim_ \text) = \infty \end
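As a numerical cross-check of the closed-form excess kurtosis (a sketch assuming SciPy; the parameters are arbitrary):

    from scipy.stats import beta

    def beta_excess_kurtosis(a, b):
        """6[(a-b)^2 (a+b+1) - a b (a+b+2)] / [a b (a+b+2)(a+b+3)]."""
        num = 6.0 * ((a - b)**2 * (a + b + 1.0) - a * b * (a + b + 2.0))
        den = a * b * (a + b + 2.0) * (a + b + 3.0)
        return num / den

    a, b = 0.5, 3.0
    print(beta_excess_kurtosis(a, b))           # closed form
    print(beta(a, b).stats(moments='k'))        # SciPy reports excess kurtosis; values agree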


Characteristic function

The Characteristic function (probability theory), characteristic function is the Fourier transform of the probability density function. The characteristic function of the beta distribution is confluent hypergeometric function, Kummer's confluent hypergeometric function (of the first kind): :\begin \varphi_X(\alpha;\beta;t) &= \operatorname\left[e^\right]\\ &= \int_0^1 e^ f(x;\alpha,\beta) dx \\ &=_1F_1(\alpha; \alpha+\beta; it)\!\\ &=\sum_^\infty \frac \\ &= 1 +\sum_^ \left( \prod_^ \frac \right) \frac \end where : x^=x(x+1)(x+2)\cdots(x+n-1) is the rising factorial, also called the "Pochhammer symbol". The value of the characteristic function for ''t'' = 0, is one: : \varphi_X(\alpha;\beta;0)=_1F_1(\alpha; \alpha+\beta; 0) = 1 . Also, the real and imaginary parts of the characteristic function enjoy the following symmetries with respect to the origin of variable ''t'': : \textrm \left [ _1F_1(\alpha; \alpha+\beta; it) \right ] = \textrm \left [ _1F_1(\alpha; \alpha+\beta; - it) \right ] : \textrm \left [ _1F_1(\alpha; \alpha+\beta; it) \right ] = - \textrm \left [ _1F_1(\alpha; \alpha+\beta; - it) \right ] The symmetric case α = β simplifies the characteristic function of the beta distribution to a Bessel function, since in the special case α + β = 2α the confluent hypergeometric function (of the first kind) reduces to a Bessel function (the modified Bessel function of the first kind I_ ) using Ernst Kummer, Kummer's second transformation as follows: Another example of the symmetric case α = β = n/2 for beamforming applications can be found in Figure 11 of :\begin _1F_1(\alpha;2\alpha; it) &= e^ _0F_1 \left(; \alpha+\tfrac; \frac \right) \\ &= e^ \left(\frac\right)^ \Gamma\left(\alpha+\tfrac\right) I_\left(\frac\right).\end In the accompanying plots, the Complex number, real part (Re) of the Characteristic function (probability theory), characteristic function of the beta distribution is displayed for symmetric (α = β) and skewed (α ≠ β) cases.
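The identification of the characteristic function with Kummer's confluent hypergeometric function 1F1(α; α+β; it) can be checked numerically (a sketch assuming NumPy, SciPy and mpmath are available; mpmath's hyp1f1 accepts complex arguments, and the parameter values are arbitrary):

    import numpy as np
    from scipy.stats import beta
    import mpmath

    a, b, t = 2.0, 5.0, 1.7                      # arbitrary shape parameters and frequency

    # Direct numerical Fourier transform of the density on (0, 1), midpoint rule
    n = 200_000
    x = (np.arange(n) + 0.5) / n
    cf_numeric = np.mean(np.exp(1j * t * x) * beta(a, b).pdf(x))

    # Kummer's confluent hypergeometric function 1F1(a; a + b; i t)
    cf_kummer = complex(mpmath.hyp1f1(a, a + b, 1j * t))

    print(cf_numeric)
    print(cf_kummer)                             # the two values agree to several decimals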


Other moments


Moment generating function

It also follows that the moment generating function is :\begin M_X(\alpha; \beta; t) &= \operatorname\left[e^\right] \\ pt&= \int_0^1 e^ f(x;\alpha,\beta)\,dx \\ pt&= _1F_1(\alpha; \alpha+\beta; t) \\ pt&= \sum_^\infty \frac \frac \\ pt&= 1 +\sum_^ \left( \prod_^ \frac \right) \frac \end In particular ''M''''X''(''α''; ''β''; 0) = 1.


Higher moments

Using the moment generating function, the ''k''-th raw moment is given by the factor :\prod_^ \frac multiplying the (exponential series) term \left(\frac\right) in the series of the moment generating function :\operatorname[X^k]= \frac = \prod_^ \frac where (''x'')(''k'') is a Pochhammer symbol representing rising factorial. It can also be written in a recursive form as :\operatorname[X^k] = \frac\operatorname[X^]. Since the moment generating function M_X(\alpha; \beta; \cdot) has a positive radius of convergence, the beta distribution is Moment problem, determined by its moments.
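A small sketch (assuming NumPy for the simulation check; the parameters are arbitrary) of the recursive raw-moment formula E[X^k] = E[X^(k-1)] (α + k − 1)/(α + β + k − 1):

    import numpy as np

    def beta_raw_moments(a, b, k_max):
        """Raw moments E[X^k], k = 1..k_max, via E[X^k] = E[X^(k-1)] (a+k-1)/(a+b+k-1)."""
        moments, m = [], 1.0
        for k in range(1, k_max + 1):
            m *= (a + k - 1.0) / (a + b + k - 1.0)
            moments.append(m)
        return moments

    a, b = 2.0, 3.0
    print(beta_raw_moments(a, b, 4))                    # [0.4, 0.2, 0.11428..., 0.07142...]
    x = np.random.default_rng(5).beta(a, b, 1_000_000)
    print([np.mean(x**k) for k in range(1, 5)])         # close to the exact values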


Moments of transformed random variables


=Moments of linearly transformed, product and inverted random variables

= One can also show the following expectations for a transformed random variable, where the random variable ''X'' is Beta-distributed with parameters α and β: ''X'' ~ Beta(α, β). The expected value of the variable 1 − ''X'' is the mirror-symmetry of the expected value based on ''X'': :\begin & \operatorname[1-X] = \frac \\ & \operatorname[X (1-X)] =\operatorname[(1-X)X ] =\frac \end Due to the mirror-symmetry of the probability density function of the beta distribution, the variances based on variables ''X'' and 1 − ''X'' are identical, and the covariance on ''X''(1 − ''X'' is the negative of the variance: :\operatorname[(1-X)]=\operatorname[X] = -\operatorname[X,(1-X)]= \frac These are the expected values for inverted variables, (these are related to the harmonic means, see ): :\begin & \operatorname \left [\frac \right ] = \frac \text \alpha > 1\\ & \operatorname\left [\frac \right ] =\frac \text \beta > 1 \end The following transformation by dividing the variable ''X'' by its mirror-image ''X''/(1 − ''X'') results in the expected value of the "inverted beta distribution" or
beta prime distribution
(also known as beta distribution of the second kind or Pearson distribution, Pearson's Type VI): : \begin & \operatorname\left[\frac\right] =\frac \text\beta > 1\\ & \operatorname\left[\frac\right] =\frac\text\alpha > 1 \end Variances of these transformed variables can be obtained by integration, as the expected values of the second moments centered on the corresponding variables: :\operatorname \left[\frac \right] =\operatorname\left[\left(\frac - \operatorname\left[\frac \right ] \right )^2\right]= :\operatorname\left [\frac \right ] =\operatorname \left [\left (\frac - \operatorname\left [\frac \right ] \right )^2 \right ]= \frac \text\alpha > 2 The following variance of the variable ''X'' divided by its mirror-image (''X''/(1−''X'') results in the variance of the "inverted beta distribution" or
beta prime distribution
(also known as beta distribution of the second kind or Pearson distribution, Pearson's Type VI): :\operatorname \left [\frac \right ] =\operatorname \left [\left(\frac - \operatorname \left [\frac \right ] \right)^2 \right ]=\operatorname \left [\frac \right ] = :\operatorname \left [\left (\frac - \operatorname \left [\frac \right ] \right )^2 \right ]= \frac \text\beta > 2 The covariances are: :\operatorname\left [\frac,\frac \right ] = \operatorname\left[\frac,\frac \right] =\operatorname\left[\frac,\frac\right ] = \operatorname\left[\frac,\frac \right] =\frac \text \alpha, \beta > 1 These expectations and variances appear in the four-parameter Fisher information matrix (.)


=Moments of logarithmically transformed random variables

= Expected values for Logarithm transformation, logarithmic transformations (useful for maximum likelihood estimates, see ) are discussed in this section. The following logarithmic linear transformations are related to the geometric means ''GX'' and ''G''(1−''X'') (see ): :\begin \operatorname[\ln(X)] &= \psi(\alpha) - \psi(\alpha + \beta)= - \operatorname\left[\ln \left (\frac \right )\right],\\ \operatorname[\ln(1-X)] &=\psi(\beta) - \psi(\alpha + \beta)= - \operatorname \left[\ln \left (\frac \right )\right]. \end Where the
digamma function
ψ(α) is defined as the logarithmic derivative of the
gamma function
: :\psi(\alpha) = \frac Logit transformations are interesting, as they usually transform various shapes (including J-shapes) into (usually skewed) bell-shaped densities over the logit variable, and they may remove the end singularities over the original variable: :\begin \operatorname\left[\ln \left (\frac \right ) \right] &=\psi(\alpha) - \psi(\beta)= \operatorname[\ln(X)] +\operatorname \left[\ln \left (\frac \right) \right],\\ \operatorname\left [\ln \left (\frac \right ) \right ] &=\psi(\beta) - \psi(\alpha)= - \operatorname \left[\ln \left (\frac \right) \right] . \end Johnson considered the distribution of the logit - transformed variable ln(''X''/1−''X''), including its moment generating function and approximations for large values of the shape parameters. This transformation extends the finite support
[0, 1]
based on the original variable ''X'' to infinite support in both directions of the real line (−∞, +∞). Higher order logarithmic moments can be derived by using the representation of a beta distribution as a proportion of two Gamma distributions and differentiating through the integral. They can be expressed in terms of higher order poly-gamma functions as follows: :\begin \operatorname \left [\ln^2(X) \right ] &= (\psi(\alpha) - \psi(\alpha + \beta))^2+\psi_1(\alpha)-\psi_1(\alpha+\beta), \\ \operatorname \left [\ln^2(1-X) \right ] &= (\psi(\beta) - \psi(\alpha + \beta))^2+\psi_1(\beta)-\psi_1(\alpha+\beta), \\ \operatorname \left [\ln (X)\ln(1-X) \right ] &=(\psi(\alpha) - \psi(\alpha + \beta))(\psi(\beta) - \psi(\alpha + \beta)) -\psi_1(\alpha+\beta). \end therefore the
variance
of the logarithmic variables and
covariance
of ln(''X'') and ln(1−''X'') are: :\begin \operatorname[\ln(X), \ln(1-X)] &= \operatorname\left[\ln(X)\ln(1-X)\right] - \operatorname[\ln(X)]\operatorname[\ln(1-X)] = -\psi_1(\alpha+\beta) \\ & \\ \operatorname[\ln X] &= \operatorname[\ln^2(X)] - (\operatorname[\ln(X)])^2 \\ &= \psi_1(\alpha) - \psi_1(\alpha + \beta) \\ &= \psi_1(\alpha) + \operatorname[\ln(X), \ln(1-X)] \\ & \\ \operatorname ln (1-X)&= \operatorname[\ln^2 (1-X)] - (\operatorname[\ln (1-X)])^2 \\ &= \psi_1(\beta) - \psi_1(\alpha + \beta) \\ &= \psi_1(\beta) + \operatorname[\ln (X), \ln(1-X)] \end where the
trigamma function
, denoted ψ1(α), is the second of the
polygamma function
s, and is defined as the derivative of the digamma function: :\psi_1(\alpha) = \frac= \frac. The variances and covariance of the logarithmically transformed variables ''X'' and (1−''X'') are different, in general, because the logarithmic transformation destroys the mirror-symmetry of the original variables ''X'' and (1−''X''), as the logarithm approaches negative infinity for the variable approaching zero. These logarithmic variances and covariance are the elements of the
Fisher information
matrix for the beta distribution. They are also a measure of the curvature of the log likelihood function (see section on Maximum likelihood estimation). The variances of the log inverse variables are identical to the variances of the log variables: :\begin \operatorname\left[\ln \left (\frac \right ) \right] & =\operatorname[\ln(X)] = \psi_1(\alpha) - \psi_1(\alpha + \beta), \\ \operatorname\left[\ln \left (\frac \right ) \right] &=\operatorname ln (1-X)= \psi_1(\beta) - \psi_1(\alpha + \beta), \\ \operatorname\left[\ln \left (\frac \right), \ln \left (\frac\right ) \right] &=\operatorname[\ln(X),\ln(1-X)]= -\psi_1(\alpha + \beta).\end It also follows that the variances of the logit transformed variables are: :\operatorname\left[\ln \left (\frac \right )\right]=\operatorname\left[\ln \left (\frac \right ) \right]=-\operatorname\left [\ln \left (\frac \right ), \ln \left (\frac \right ) \right]= \psi_1(\alpha) + \psi_1(\beta)
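These digamma/trigamma expressions can be verified by simulation (a sketch assuming NumPy and SciPy; the parameters are arbitrary):

    import numpy as np
    from scipy.special import digamma, polygamma

    a, b = 2.0, 5.0
    x = np.random.default_rng(6).beta(a, b, 1_000_000)

    # E[ln X] = psi(a) - psi(a + b);  var[ln X] = psi_1(a) - psi_1(a + b)
    print(digamma(a) - digamma(a + b),           np.mean(np.log(x)))
    print(polygamma(1, a) - polygamma(1, a + b), np.var(np.log(x)))

    # cov[ln X, ln(1 - X)] = -psi_1(a + b)
    print(-polygamma(1, a + b), np.cov(np.log(x), np.log1p(-x))[0, 1])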


Quantities of information (entropy)

Given a beta distributed random variable, ''X'' ~ Beta(''α'', ''β''), the information entropy, differential entropy of ''X'' is (measured in Nat (unit), nats), the expected value of the negative of the logarithm of the
probability density function
: :\begin h(X) &= \operatorname[-\ln(f(x;\alpha,\beta))] \\ pt&=\int_0^1 -f(x;\alpha,\beta)\ln(f(x;\alpha,\beta)) \, dx \\ pt&= \ln(\Beta(\alpha,\beta))-(\alpha-1)\psi(\alpha)-(\beta-1)\psi(\beta)+(\alpha+\beta-2) \psi(\alpha+\beta) \end where ''f''(''x''; ''α'', ''β'') is the
probability density function
of the beta distribution: :f(x;\alpha,\beta) = \frac x^(1-x)^ The
digamma function
''ψ'' appears in the formula for the differential entropy as a consequence of Euler's integral formula for the harmonic numbers which follows from the integral: :\int_0^1 \frac \, dx = \psi(\alpha)-\psi(1) The information entropy, differential entropy of the beta distribution is negative for all values of ''α'' and ''β'' greater than zero, except at ''α'' = ''β'' = 1 (for which values the beta distribution is the same as the Uniform distribution (continuous), uniform distribution), where the information entropy, differential entropy reaches its Maxima and minima, maximum value of zero. It is to be expected that the maximum entropy should take place when the beta distribution becomes equal to the uniform distribution, since uncertainty is maximal when all possible events are equiprobable. For ''α'' or ''β'' approaching zero, the information entropy, differential entropy approaches its Maxima and minima, minimum value of negative infinity. For (either or both) ''α'' or ''β'' approaching zero, there is a maximum amount of order: all the probability density is concentrated at the ends, and there is zero probability density at points located between the ends. Similarly for (either or both) ''α'' or ''β'' approaching infinity, the differential entropy approaches its minimum value of negative infinity, and a maximum amount of order. If either ''α'' or ''β'' approaches infinity (and the other is finite) all the probability density is concentrated at an end, and the probability density is zero everywhere else. If both shape parameters are equal (the symmetric case), ''α'' = ''β'', and they approach infinity simultaneously, the probability density becomes a spike ( Dirac delta function) concentrated at the middle ''x'' = 1/2, and hence there is 100% probability at the middle ''x'' = 1/2 and zero probability everywhere else. The (continuous case) information entropy, differential entropy was introduced by Shannon in his original paper (where he named it the "entropy of a continuous distribution"), as the concluding part of the same paper where he defined the information entropy, discrete entropy. It is known since then that the differential entropy may differ from the infinitesimal limit of the discrete entropy by an infinite offset, therefore the differential entropy can be negative (as it is for the beta distribution). What really matters is the relative value of entropy. Given two beta distributed random variables, ''X''1 ~ Beta(''α'', ''β'') and ''X''2 ~ Beta(''α''′, ''β''′), the cross entropy is (measured in nats) :\begin H(X_1,X_2) &= \int_0^1 - f(x;\alpha,\beta) \ln (f(x;\alpha',\beta')) \,dx \\ pt&= \ln \left(\Beta(\alpha',\beta')\right)-(\alpha'-1)\psi(\alpha)-(\beta'-1)\psi(\beta)+(\alpha'+\beta'-2)\psi(\alpha+\beta). \end The cross entropy has been used as an error metric to measure the distance between two hypotheses. Its absolute value is minimum when the two distributions are identical. It is the information measure most closely related to the log maximum likelihood (see section on "Parameter estimation. Maximum likelihood estimation")). The relative entropy, or Kullback–Leibler divergence ''D''KL(''X''1 , , ''X''2), is a measure of the inefficiency of assuming that the distribution is ''X''2 ~ Beta(''α''′, ''β''′) when the distribution is really ''X''1 ~ Beta(''α'', ''β''). It is defined as follows (measured in nats). 
:\begin D_(X_1, , X_2) &= \int_0^1 f(x;\alpha,\beta) \ln \left (\frac \right ) \, dx \\ pt&= \left (\int_0^1 f(x;\alpha,\beta) \ln (f(x;\alpha,\beta)) \,dx \right )- \left (\int_0^1 f(x;\alpha,\beta) \ln (f(x;\alpha',\beta')) \, dx \right )\\ pt&= -h(X_1) + H(X_1,X_2)\\ pt&= \ln\left(\frac\right)+(\alpha-\alpha')\psi(\alpha)+(\beta-\beta')\psi(\beta)+(\alpha'-\alpha+\beta'-\beta)\psi (\alpha + \beta). \end The relative entropy, or Kullback–Leibler divergence, is always non-negative. A few numerical examples follow: *''X''1 ~ Beta(1, 1) and ''X''2 ~ Beta(3, 3); ''D''KL(''X''1 , , ''X''2) = 0.598803; ''D''KL(''X''2 , , ''X''1) = 0.267864; ''h''(''X''1) = 0; ''h''(''X''2) = −0.267864 *''X''1 ~ Beta(3, 0.5) and ''X''2 ~ Beta(0.5, 3); ''D''KL(''X''1 , , ''X''2) = 7.21574; ''D''KL(''X''2 , , ''X''1) = 7.21574; ''h''(''X''1) = −1.10805; ''h''(''X''2) = −1.10805. The Kullback–Leibler divergence is not symmetric ''D''KL(''X''1 , , ''X''2) ≠ ''D''KL(''X''2 , , ''X''1) for the case in which the individual beta distributions Beta(1, 1) and Beta(3, 3) are symmetric, but have different entropies ''h''(''X''1) ≠ ''h''(''X''2). The value of the Kullback divergence depends on the direction traveled: whether going from a higher (differential) entropy to a lower (differential) entropy or the other way around. In the numerical example above, the Kullback divergence measures the inefficiency of assuming that the distribution is (bell-shaped) Beta(3, 3), rather than (uniform) Beta(1, 1). The "h" entropy of Beta(1, 1) is higher than the "h" entropy of Beta(3, 3) because the uniform distribution Beta(1, 1) has a maximum amount of disorder. The Kullback divergence is more than two times higher (0.598803 instead of 0.267864) when measured in the direction of decreasing entropy: the direction that assumes that the (uniform) Beta(1, 1) distribution is (bell-shaped) Beta(3, 3) rather than the other way around. In this restricted sense, the Kullback divergence is consistent with the second law of thermodynamics. The Kullback–Leibler divergence is symmetric ''D''KL(''X''1 , , ''X''2) = ''D''KL(''X''2 , , ''X''1) for the skewed cases Beta(3, 0.5) and Beta(0.5, 3) that have equal differential entropy ''h''(''X''1) = ''h''(''X''2). The symmetry condition: :D_(X_1, , X_2) = D_(X_2, , X_1),\texth(X_1) = h(X_2),\text\alpha \neq \beta follows from the above definitions and the mirror-symmetry ''f''(''x''; ''α'', ''β'') = ''f''(1−''x''; ''α'', ''β'') enjoyed by the beta distribution.
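The differential entropy and Kullback–Leibler divergence formulas can be written out directly (a sketch assuming SciPy; it reproduces the numerical examples quoted above):

    from scipy.special import betaln, digamma

    def beta_entropy(a, b):
        """Differential entropy: ln B(a,b) - (a-1)psi(a) - (b-1)psi(b) + (a+b-2)psi(a+b)."""
        return betaln(a, b) - (a - 1)*digamma(a) - (b - 1)*digamma(b) + (a + b - 2)*digamma(a + b)

    def beta_kl(a1, b1, a2, b2):
        """D_KL( Beta(a1,b1) || Beta(a2,b2) )."""
        return (betaln(a2, b2) - betaln(a1, b1)
                + (a1 - a2)*digamma(a1) + (b1 - b2)*digamma(b1)
                + (a2 - a1 + b2 - b1)*digamma(a1 + b1))

    print(beta_entropy(3, 3))      # -0.267864...
    print(beta_kl(1, 1, 3, 3))     # 0.598803...
    print(beta_kl(3, 3, 1, 1))     # 0.267864...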


Relationships between statistical measures


Mean, mode and median relationship

If 1 < α < β then mode ≤ median ≤ mean.Kerman J (2011) "A closed-form approximation for the median of the beta distribution". Expressing the mode (only for α, β > 1), and the mean in terms of α and β: : \frac \le \text \le \frac , If 1 < β < α then the order of the inequalities are reversed. For α, β > 1 the absolute distance between the mean and the median is less than 5% of the distance between the maximum and minimum values of ''x''. On the other hand, the absolute distance between the mean and the mode can reach 50% of the distance between the maximum and minimum values of ''x'', for the (Pathological (mathematics), pathological) case of α = 1 and β = 1, for which values the beta distribution approaches the uniform distribution and the information entropy, differential entropy approaches its Maxima and minima, maximum value, and hence maximum "disorder". For example, for α = 1.0001 and β = 1.00000001: * mode = 0.9999; PDF(mode) = 1.00010 * mean = 0.500025; PDF(mean) = 1.00003 * median = 0.500035; PDF(median) = 1.00003 * mean − mode = −0.499875 * mean − median = −9.65538 × 10−6 where PDF stands for the value of the
probability density function
.
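The ordering mode ≤ median ≤ mean for 1 < α < β can be illustrated numerically (a sketch assuming SciPy; the parameter values are arbitrary, chosen with 1 < α < β):

    from scipy.stats import beta

    a, b = 2.0, 5.0                      # 1 < alpha < beta
    mode = (a - 1) / (a + b - 2)         # closed-form mode for alpha, beta > 1
    median = beta(a, b).median()         # numerical median
    mean = a / (a + b)

    print(mode, median, mean)            # 0.2 <= ~0.264 <= ~0.286, i.e. mode <= median <= mean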


Mean, geometric mean and harmonic mean relationship

It is known from the inequality of arithmetic and geometric means that the geometric mean is lower than the mean. Similarly, the harmonic mean is lower than the geometric mean. The accompanying plot shows that for α = β, both the mean and the median are exactly equal to 1/2, regardless of the value of α = β, and the mode is also equal to 1/2 for α = β > 1, however the geometric and harmonic means are lower than 1/2 and they only approach this value asymptotically as α = β → ∞.


Kurtosis bounded by the square of the skewness

As remarked by William Feller, Feller, in the Pearson distribution, Pearson system the beta probability density appears as Pearson distribution, type I (any difference between the beta distribution and Pearson's type I distribution is only superficial and it makes no difference for the following discussion regarding the relationship between kurtosis and skewness). Karl Pearson showed, in Plate 1 of his paper published in 1916, a graph with the kurtosis as the vertical axis (ordinate) and the square of the
skewness
as the horizontal axis (abscissa), in which a number of distributions were displayed. The region occupied by the beta distribution is bounded by the following two Line (geometry), lines in the (skewness2,kurtosis) Cartesian coordinate system, plane, or the (skewness2,excess kurtosis) Cartesian coordinate system, plane: :(\text)^2+1< \text< \frac (\text)^2 + 3 or, equivalently, :(\text)^2-2< \text< \frac (\text)^2 At a time when there were no powerful digital computers, Karl Pearson accurately computed further boundaries, for example, separating the "U-shaped" from the "J-shaped" distributions. The lower boundary line (excess kurtosis + 2 − skewness2 = 0) is produced by skewed "U-shaped" beta distributions with both values of shape parameters α and β close to zero. The upper boundary line (excess kurtosis − (3/2) skewness2 = 0) is produced by extremely skewed distributions with very large values of one of the parameters and very small values of the other parameter. Karl Pearson showed that this upper boundary line (excess kurtosis − (3/2) skewness2 = 0) is also the intersection with Pearson's distribution III, which has unlimited support in one direction (towards positive infinity), and can be bell-shaped or J-shaped. His son, Egon Pearson, showed that the region (in the kurtosis/squared-skewness plane) occupied by the beta distribution (equivalently, Pearson's distribution I) as it approaches this boundary (excess kurtosis − (3/2) skewness2 = 0) is shared with the noncentral chi-squared distribution. Karl Pearson (Pearson 1895, pp. 357, 360, 373–376) also showed that the gamma distribution is a Pearson type III distribution. Hence this boundary line for Pearson's type III distribution is known as the gamma line. (This can be shown from the fact that the excess kurtosis of the gamma distribution is 6/''k'' and the square of the skewness is 4/''k'', hence (excess kurtosis − (3/2) skewness2 = 0) is identically satisfied by the gamma distribution regardless of the value of the parameter "k"). Pearson later noted that the chi-squared distribution is a special case of Pearson's type III and also shares this boundary line (as it is apparent from the fact that for the chi-squared distribution the excess kurtosis is 12/''k'' and the square of the skewness is 8/''k'', hence (excess kurtosis − (3/2) skewness2 = 0) is identically satisfied regardless of the value of the parameter "k"). This is to be expected, since the chi-squared distribution ''X'' ~ χ2(''k'') is a special case of the gamma distribution, with parametrization X ~ Γ(k/2, 1/2) where k is a positive integer that specifies the "number of degrees of freedom" of the chi-squared distribution. An example of a beta distribution near the upper boundary (excess kurtosis − (3/2) skewness2 = 0) is given by α = 0.1, β = 1000, for which the ratio (excess kurtosis)/(skewness2) = 1.49835 approaches the upper limit of 1.5 from below. An example of a beta distribution near the lower boundary (excess kurtosis + 2 − skewness2 = 0) is given by α= 0.0001, β = 0.1, for which values the expression (excess kurtosis + 2)/(skewness2) = 1.01621 approaches the lower limit of 1 from above. In the infinitesimal limit for both α and β approaching zero symmetrically, the excess kurtosis reaches its minimum value at −2. This minimum value occurs at the point at which the lower boundary line intersects the vertical axis (ordinate). 
(However, in Pearson's original chart, the ordinate is kurtosis, instead of excess kurtosis, and it increases downwards rather than upwards). Values for the skewness and excess kurtosis below the lower boundary (excess kurtosis + 2 − skewness2 = 0) cannot occur for any distribution, and hence Karl Pearson appropriately called the region below this boundary the "impossible region". The boundary for this "impossible region" is determined by (symmetric or skewed) bimodal "U"-shaped distributions for which the parameters α and β approach zero and hence all the probability density is concentrated at the ends: ''x'' = 0, 1 with practically nothing in between them. Since for α ≈ β ≈ 0 the probability density is concentrated at the two ends ''x'' = 0 and ''x'' = 1, this "impossible boundary" is determined by a
Bernoulli distribution
, where the two only possible outcomes occur with respective probabilities ''p'' and ''q'' = 1−''p''. For cases approaching this limit boundary with symmetry α = β, skewness ≈ 0, excess kurtosis ≈ −2 (this is the lowest excess kurtosis possible for any distribution), and the probabilities are ''p'' ≈ ''q'' ≈ 1/2. For cases approaching this limit boundary with skewness, excess kurtosis ≈ −2 + skewness2, and the probability density is concentrated more at one end than the other end (with practically nothing in between), with probabilities p = \tfrac at the left end ''x'' = 0 and q = 1-p = \tfrac at the right end ''x'' = 1.
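The two boundary inequalities can be spot-checked numerically for admissible (α, β) pairs, including values near both boundaries (a sketch assuming SciPy; the sampled parameter values are arbitrary):

    from scipy.stats import beta

    def check_bounds(a, b):
        """Verify (skewness)^2 + 1 < kurtosis < (3/2)(skewness)^2 + 3 for Beta(a, b)."""
        skew, ex_kurt = beta(a, b).stats(moments='sk')
        kurt = ex_kurt + 3.0                       # full (non-excess) kurtosis
        return skew**2 + 1.0 < kurt < 1.5 * skew**2 + 3.0

    for a, b in [(0.1, 1000.0), (0.0001, 0.1), (2.0, 5.0), (0.5, 0.5)]:
        print((a, b), check_bounds(a, b))          # True in every case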


Symmetry

All statements are conditional on α, β > 0 * Probability density function Symmetry, reflection symmetry ::f(x;\alpha,\beta) = f(1-x;\beta,\alpha) * Cumulative distribution function Symmetry, reflection symmetry plus unitary Symmetry, translation ::F(x;\alpha,\beta) = I_x(\alpha,\beta) = 1- F(1- x;\beta,\alpha) = 1 - I_(\beta,\alpha) * Mode Symmetry, reflection symmetry plus unitary Symmetry, translation ::\operatorname(\Beta(\alpha, \beta))= 1-\operatorname(\Beta(\beta, \alpha)),\text\Beta(\beta, \alpha)\ne \Beta(1,1) * Median Symmetry, reflection symmetry plus unitary Symmetry, translation ::\operatorname (\Beta(\alpha, \beta) )= 1 - \operatorname (\Beta(\beta, \alpha)) * Mean Symmetry, reflection symmetry plus unitary Symmetry, translation ::\mu (\Beta(\alpha, \beta) )= 1 - \mu (\Beta(\beta, \alpha) ) * Geometric Means each is individually asymmetric, the following symmetry applies between the geometric mean based on ''X'' and the geometric mean based on its
reflection
(1-X) ::G_X (\Beta(\alpha, \beta) )=G_(\Beta(\beta, \alpha) ) * Harmonic means each is individually asymmetric, the following symmetry applies between the harmonic mean based on ''X'' and the harmonic mean based on its
reflection
(1-X) ::H_X (\Beta(\alpha, \beta) )=H_(\Beta(\beta, \alpha) ) \text \alpha, \beta > 1 . * Variance symmetry ::\operatorname (\Beta(\alpha, \beta) )=\operatorname (\Beta(\beta, \alpha) ) * Geometric variances each is individually asymmetric, the following symmetry applies between the log geometric variance based on X and the log geometric variance based on its
reflection
(1-X) ::\ln(\operatorname (\Beta(\alpha, \beta))) = \ln(\operatorname(\Beta(\beta, \alpha))) * Geometric covariance symmetry ::\ln \operatorname(\Beta(\alpha, \beta))=\ln \operatorname(\Beta(\beta, \alpha)) * Mean absolute deviation around the mean symmetry ::\operatorname[, X - E ] (\Beta(\alpha, \beta))=\operatorname[, X - E ] (\Beta(\beta, \alpha)) * Skewness Symmetry (mathematics), skew-symmetry ::\operatorname (\Beta(\alpha, \beta) )= - \operatorname (\Beta(\beta, \alpha) ) * Excess kurtosis symmetry ::\text (\Beta(\alpha, \beta) )= \text (\Beta(\beta, \alpha) ) * Characteristic function symmetry of Real part (with respect to the origin of variable "t") :: \text [_1F_1(\alpha; \alpha+\beta; it) ] = \text [ _1F_1(\alpha; \alpha+\beta; - it)] * Characteristic function Symmetry (mathematics), skew-symmetry of Imaginary part (with respect to the origin of variable "t") :: \text [_1F_1(\alpha; \alpha+\beta; it) ] = - \text [ _1F_1(\alpha; \alpha+\beta; - it) ] * Characteristic function symmetry of Absolute value (with respect to the origin of variable "t") :: \text [ _1F_1(\alpha; \alpha+\beta; it) ] = \text [ _1F_1(\alpha; \alpha+\beta; - it) ] * Differential entropy symmetry ::h(\Beta(\alpha, \beta) )= h(\Beta(\beta, \alpha) ) * Relative Entropy (also called Kullback–Leibler divergence) symmetry ::D_(X_1, , X_2) = D_(X_2, , X_1), \texth(X_1) = h(X_2)\text\alpha \neq \beta * Fisher information matrix symmetry ::_ = _


Geometry of the probability density function


Inflection points

For certain values of the shape parameters α and β, the
probability density function
has inflection points, at which the curvature changes sign. The position of these inflection points can be useful as a measure of the Statistical dispersion, dispersion or spread of the distribution. Defining the following quantity: :\kappa =\frac Points of inflection occur, depending on the value of the shape parameters α and β, as follows: *(α > 2, β > 2) The distribution is bell-shaped (symmetric for α = β and skewed otherwise), with two inflection points, equidistant from the mode: ::x = \text \pm \kappa = \frac * (α = 2, β > 2) The distribution is unimodal, positively skewed, right-tailed, with one inflection point, located to the right of the mode: ::x =\text + \kappa = \frac * (α > 2, β = 2) The distribution is unimodal, negatively skewed, left-tailed, with one inflection point, located to the left of the mode: ::x = \text - \kappa = 1 - \frac * (1 < α < 2, β > 2, α+β>2) The distribution is unimodal, positively skewed, right-tailed, with one inflection point, located to the right of the mode: ::x =\text + \kappa = \frac *(0 < α < 1, 1 < β < 2) The distribution has a mode at the left end ''x'' = 0 and it is positively skewed, right-tailed. There is one inflection point, located to the right of the mode: ::x = \frac *(α > 2, 1 < β < 2) The distribution is unimodal negatively skewed, left-tailed, with one inflection point, located to the left of the mode: ::x =\text - \kappa = \frac *(1 < α < 2, 0 < β < 1) The distribution has a mode at the right end ''x''=1 and it is negatively skewed, left-tailed. There is one inflection point, located to the left of the mode: ::x = \frac There are no inflection points in the remaining (symmetric and skewed) regions: U-shaped: (α, β < 1) upside-down-U-shaped: (1 < α < 2, 1 < β < 2), reverse-J-shaped (α < 1, β > 2) or J-shaped: (α > 2, β < 1) The accompanying plots show the inflection point locations (shown vertically, ranging from 0 to 1) versus α and β (the horizontal axes ranging from 0 to 5). There are large cuts at surfaces intersecting the lines α = 1, β = 1, α = 2, and β = 2 because at these values the beta distribution change from 2 modes, to 1 mode to no mode.
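For the bell-shaped case α, β > 2, the two inflection points sit at mode ± κ, where the quantity κ defined above works out to \sqrt{(α−1)(β−1)/(α+β−3)}/(α+β−2) and the mode is (α−1)/(α+β−2). This can be confirmed by locating the sign changes of the second derivative of the density numerically (a sketch assuming NumPy and SciPy; the parameters are arbitrary with α, β > 2):

    import numpy as np
    from scipy.stats import beta

    a, b = 3.0, 4.0                                    # arbitrary bell-shaped case (a, b > 2)
    mode = (a - 1) / (a + b - 2)
    kappa = np.sqrt((a - 1) * (b - 1) / (a + b - 3)) / (a + b - 2)

    # Locate sign changes of the second derivative of the density numerically
    x = np.linspace(0.001, 0.999, 20_001)
    d2 = np.gradient(np.gradient(beta(a, b).pdf(x), x), x)
    crossings = x[np.where(np.diff(np.sign(d2)))[0]]

    print(mode - kappa, mode + kappa)                  # approximately 0.155 and 0.645
    print(crossings)                                   # numerical estimates of the same two points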


Shapes

The beta density function can take a wide variety of different shapes depending on the values of the two parameters ''α'' and ''β''. The ability of the beta distribution to take this great diversity of shapes (using only two parameters) is partly responsible for its wide application in modeling actual measurements:


=Symmetric (''α'' = ''β'')

= * the density function is symmetry, symmetric about 1/2 (blue & teal plots). * median = mean = 1/2. *skewness = 0. *variance = 1/(4(2α + 1)) *α = β < 1 **U-shaped (blue plot). **bimodal: left mode = 0, right mode =1, anti-mode = 1/2 **1/12 < var(''X'') < 1/4 **−2 < excess kurtosis(''X'') < −6/5 ** α = β = 1/2 is the arcsine distribution *** var(''X'') = 1/8 ***excess kurtosis(''X'') = −3/2 ***CF = Rinc (t) ** α = β → 0 is a 2-point
Bernoulli distribution
with equal probability 1/2 at each Dirac delta function end ''x'' = 0 and ''x'' = 1 and zero probability everywhere else. A coin toss: one face of the coin being ''x'' = 0 and the other face being ''x'' = 1. *** \lim_{\alpha = \beta \to 0} \operatorname{var}(X) = \tfrac{1}{4} *** \lim_{\alpha = \beta \to 0} \text{excess kurtosis}(X) = - 2 a lower value than this is impossible for any distribution to reach. *** The information entropy, differential entropy approaches a Maxima and minima, minimum value of −∞ *α = β = 1 **the uniform distribution (continuous), uniform
[0, 1]
distribution **no mode **var(''X'') = 1/12 **excess kurtosis(''X'') = −6/5 **The (negative anywhere else) information entropy, differential entropy reaches its Maxima and minima, maximum value of zero **CF = Sinc (t) *''α'' = ''β'' > 1 **symmetric unimodal ** mode = 1/2. **0 < var(''X'') < 1/12 **−6/5 < excess kurtosis(''X'') < 0 **''α'' = ''β'' = 3/2 is a semi-elliptic
[0, 1]
distribution, see: Wigner semicircle distribution ***var(''X'') = 1/16. ***excess kurtosis(''X'') = −1 ***CF = 2 Jinc (t) **''α'' = ''β'' = 2 is the parabolic
[0, 1]
distribution ***var(''X'') = 1/20 ***excess kurtosis(''X'') = −6/7 ***CF = 3 Tinc (t) **''α'' = ''β'' > 2 is bell-shaped, with inflection points located to either side of the mode ***0 < var(''X'') < 1/20 ***−6/7 < excess kurtosis(''X'') < 0 **''α'' = ''β'' → ∞ is a 1-point
degenerate distribution
with a Dirac delta function spike at the midpoint ''x'' = 1/2 with probability 1, and zero probability everywhere else. There is 100% probability (absolute certainty) concentrated at the single point ''x'' = 1/2. *** \lim_{\alpha = \beta \to \infty} \operatorname{var}(X) = 0 *** \lim_{\alpha = \beta \to \infty} \text{excess kurtosis}(X) = 0 ***The information entropy, differential entropy approaches a Maxima and minima, minimum value of −∞


=Skewed (''α'' ≠ ''β'')

= The density function is Skewness, skewed. An interchange of parameter values yields the mirror image (the reverse) of the initial curve, some more specific cases: *''α'' < 1, ''β'' < 1 ** U-shaped ** Positive skew for α < β, negative skew for α > β. ** bimodal: left mode = 0, right mode = 1, anti-mode = \tfrac{\alpha-1}{\alpha+\beta-2} ** 0 < median < 1. ** 0 < var(''X'') < 1/4 *α > 1, β > 1 ** unimodal (magenta & cyan plots), **Positive skew for α < β, negative skew for α > β. **\text{mode} = \tfrac{\alpha-1}{\alpha+\beta-2} ** 0 < median < 1 ** 0 < var(''X'') < 1/12 *α < 1, β ≥ 1 **reverse J-shaped with a right tail, **positively skewed, **strictly decreasing, convex function, convex ** mode = 0 ** 0 < median < 1/2. ** 0 < \operatorname{var}(X) < \tfrac{-11+5\sqrt{5}}{2}, (maximum variance occurs for \alpha=\tfrac{\sqrt{5}-1}{2}, \beta=1, or α = Φ the Golden ratio, golden ratio conjugate) *α ≥ 1, β < 1 **J-shaped with a left tail, **negatively skewed, **strictly increasing, convex function, convex ** mode = 1 ** 1/2 < median < 1 ** 0 < \operatorname{var}(X) < \tfrac{-11+5\sqrt{5}}{2}, (maximum variance occurs for \alpha=1, \beta=\tfrac{\sqrt{5}-1}{2}, or β = Φ the Golden ratio, golden ratio conjugate) *α = 1, β > 1 **positively skewed, **strictly decreasing (red plot), **a reversed (mirror-image) power function [0,1] distribution ** mean = 1 / (β + 1) ** median = 1 − (1/2)^{1/β} ** mode = 0 **α = 1, 1 < β < 2 ***concave function, concave *** 1-\tfrac{1}{\sqrt{2}} < \text{median} < \tfrac{1}{2} *** 1/18 < var(''X'') < 1/12. **α = 1, β = 2 ***a straight line with slope −2, the right-triangular distribution with right angle at the left end, at ''x'' = 0 *** \text{median} = 1-\tfrac{1}{\sqrt{2}} *** var(''X'') = 1/18 **α = 1, β > 2 ***reverse J-shaped with a right tail, ***convex function, convex *** 0 < \text{median} < 1-\tfrac{1}{\sqrt{2}} *** 0 < var(''X'') < 1/18 *α > 1, β = 1 **negatively skewed, **strictly increasing (green plot), **the power function
[0, 1]
distribution ** mean = α / (α + 1) ** median = (1/2)^{1/α} ** mode = 1 **2 > α > 1, β = 1 ***concave function, concave *** \tfrac{1}{2} < \text{median} < \tfrac{1}{\sqrt{2}} *** 1/18 < var(''X'') < 1/12 ** α = 2, β = 1 ***a straight line with slope +2, the right-triangular distribution with right angle at the right end, at ''x'' = 1 *** \text{median} = \tfrac{1}{\sqrt{2}} *** var(''X'') = 1/18 **α > 2, β = 1 ***J-shaped with a left tail, convex function, convex ***\tfrac{1}{\sqrt{2}} < \text{median} < 1 *** 0 < var(''X'') < 1/18
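The shape regimes catalogued above can be explored directly. The following is a minimal sketch, assuming SciPy; the chosen parameter pairs are arbitrary representatives of the U-shaped, uniform, bell-shaped, skewed and J-shaped cases.

```python
# Sketch: summary statistics for beta densities spanning the shape regimes above.
from scipy.stats import beta

cases = {
    "U-shaped (0.5, 0.5)": (0.5, 0.5),
    "uniform (1, 1)": (1.0, 1.0),
    "bell-shaped (2, 2)": (2.0, 2.0),
    "right-tailed (2, 5)": (2.0, 5.0),
    "J-shaped (3, 1)": (3.0, 1.0),
}

for label, (a, b) in cases.items():
    d = beta(a, b)
    skewness = float(d.stats(moments="s"))
    print(f"{label:>22}: mean={d.mean():.3f}  var={d.var():.4f}  "
          f"skew={skewness:.3f}  median={d.median():.3f}")
```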


Related distributions


Transformations

* If ''X'' ~ Beta(''α'', ''β'') then 1 − ''X'' ~ Beta(''β'', ''α'') Mirror image, mirror-image symmetry * If ''X'' ~ Beta(''α'', ''β'') then \tfrac{X}{1-X} \sim \operatorname{Beta'}(\alpha,\beta). The
beta prime distribution
, also called "beta distribution of the second kind". * If ''X'' ~ Beta(''α'', ''β'') then \tfrac -1 \sim (\beta,\alpha). * If ''X'' ~ Beta(''n''/2, ''m''/2) then \tfrac \sim F(n,m) (assuming ''n'' > 0 and ''m'' > 0), the F-distribution, Fisher–Snedecor F distribution. * If X \sim \operatorname\left(1+\lambda\tfrac, 1 + \lambda\tfrac\right) then min + ''X''(max − min) ~ PERT(min, max, ''m'', ''λ'') where ''PERT'' denotes a PERT distribution used in PERT analysis, and ''m''=most likely value.Herrerías-Velasco, José Manuel and Herrerías-Pleguezuelo, Rafael and René van Dorp, Johan. (2011). Revisiting the PERT mean and Variance. European Journal of Operational Research (210), p. 448–451. Traditionally ''λ'' = 4 in PERT analysis. * If ''X'' ~ Beta(1, ''β'') then ''X'' ~ Kumaraswamy distribution with parameters (1, ''β'') * If ''X'' ~ Beta(''α'', 1) then ''X'' ~ Kumaraswamy distribution with parameters (''α'', 1) * If ''X'' ~ Beta(''α'', 1) then −ln(''X'') ~ Exponential(''α'')


Special and limiting cases

* Beta(1, 1) ~ uniform distribution (continuous), U(0, 1). * Beta(n, 1) ~ Maximum of ''n'' independent rvs. with uniform distribution (continuous), U(0, 1), sometimes called a ''standard power function distribution'' with density ''n'' ''x''^{''n''−1} on that interval. * Beta(1, n) ~ Minimum of ''n'' independent rvs. with uniform distribution (continuous), U(0, 1) * If ''X'' ~ Beta(3/2, 3/2) and ''r'' > 0 then 2''rX'' − ''r'' ~ Wigner semicircle distribution. * Beta(1/2, 1/2) is equivalent to the arcsine distribution. This distribution is also Jeffreys prior probability for the
Bernoulli
and binomial distributions. The arcsine probability density is a distribution that appears in several random-walk fundamental theorems. In a fair coin toss
random walk
, the probability for the time of the last visit to the origin is distributed as a (U-shaped) arcsine distribution. In a two-player fair-coin-toss game, a player is said to be in the lead if the random walk (that started at the origin) is above the origin. The most probable number of times that a given player will be in the lead, in a game of length 2''N'', is not ''N''. On the contrary, ''N'' is the least likely number of times that the player will be in the lead. The most likely number of times in the lead is 0 or 2''N'' (following the arcsine distribution). * \lim_{n \to \infty} n \operatorname{Beta}(1,n) = \operatorname{Exponential}(1) the exponential distribution. * \lim_{n \to \infty} n \operatorname{Beta}(k,n) = \operatorname{Gamma}(k,1) the gamma distribution. * For large ''n'', \operatorname{Beta}(\alpha n,\beta n) \to \mathcal{N}\left(\frac{\alpha}{\alpha+\beta},\frac{\alpha\beta}{(\alpha+\beta)^3}\frac{1}{n}\right) the normal distribution. More precisely, if X_n \sim \operatorname{Beta}(\alpha n,\beta n) then \sqrt{n}\left(X_n -\tfrac{\alpha}{\alpha+\beta}\right) converges in distribution to a normal distribution with mean 0 and variance \tfrac{\alpha\beta}{(\alpha+\beta)^3} as ''n'' increases.
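Two of these special and limiting cases can likewise be checked numerically. The following is a minimal sketch, assuming NumPy and SciPy, with arbitrary example parameters: it compares Beta(n, 1) with the maximum of n uniform draws and tests the normal approximation of Beta(αn, βn) for a moderately large n.

```python
# Sketch: check two of the special/limiting cases listed above.
import numpy as np
from scipy.stats import beta, norm

rng = np.random.default_rng(1)

# Beta(n, 1) is the distribution of the maximum of n independent U(0, 1) draws.
n = 5
u_max = rng.random((100_000, n)).max(axis=1)
print(np.quantile(u_max, 0.95), beta.ppf(0.95, n, 1))

# For large n, Beta(a*n, b*n) is approximately normal with
# mean a/(a+b) and variance a*b / ((a+b)**3 * n).
a, b, big_n = 2.0, 3.0, 200
approx = norm(loc=a / (a + b), scale=np.sqrt(a * b / ((a + b) ** 3 * big_n)))
print(beta.ppf(0.95, a * big_n, b * big_n), approx.ppf(0.95))
```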


Derived from other distributions

* The ''k''th order statistic of a sample of size ''n'' from the Uniform distribution (continuous), uniform distribution is a beta random variable, ''U''(''k'') ~ Beta(''k'', ''n''+1−''k''). * If ''X'' ~ Gamma(α, θ) and ''Y'' ~ Gamma(β, θ) are independent, then \tfrac{X}{X+Y} \sim \operatorname{Beta}(\alpha, \beta)\,. * If X \sim \chi^2(\alpha)\, and Y \sim \chi^2(\beta)\, are independent, then \tfrac{X}{X+Y} \sim \operatorname{Beta}(\tfrac{\alpha}{2}, \tfrac{\beta}{2}). * If ''X'' ~ U(0, 1) and ''α'' > 0 then ''X''^{1/''α''} ~ Beta(''α'', 1). The power function distribution. * If X \sim\operatorname(k;n;p), then \sim \operatorname(\alpha, \beta) for discrete values of ''n'' and ''k'' where \alpha=k+1 and \beta=n-k+1. * If ''X'' ~ Cauchy(0, 1) then \tfrac{1}{1+X^2} \sim \operatorname{Beta}\left(\tfrac12, \tfrac12\right)\,
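The gamma-ratio relation above also gives a convenient way to generate beta variates. A minimal sketch, assuming NumPy and SciPy, with arbitrary shape parameters:

```python
# Sketch: Beta(a, b) as a ratio of independent Gamma variates with a common scale.
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(2)
a, b = 3.0, 1.5
gx = rng.gamma(shape=a, scale=1.0, size=200_000)
gy = rng.gamma(shape=b, scale=1.0, size=200_000)
ratio = gx / (gx + gy)

print(np.quantile(ratio, [0.25, 0.5, 0.75]))   # Monte Carlo quartiles
print(beta.ppf([0.25, 0.5, 0.75], a, b))       # exact quartiles of Beta(a, b)
```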


Combination with other distributions

* If ''X'' ~ Beta(''α'', ''β'') and ''Y'' ~ F(2''β'',2''α'') then \Pr\left(X \leq \tfrac{\alpha}{\alpha+\beta x}\right) = \Pr(Y \geq x)\, for all ''x'' > 0.


Compounding with other distributions

* If ''p'' ~ Beta(α, β) and ''X'' ~ Bin(''k'', ''p'') then ''X'' ~ beta-binomial distribution * If ''p'' ~ Beta(α, β) and ''X'' ~ NB(''r'', ''p'') then ''X'' ~ beta negative binomial distribution
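Both compounding statements describe two-stage sampling. The following sketch, assuming NumPy and SciPy with arbitrary parameters, simulates the beta-binomial case and compares the result with scipy.stats.betabinom:

```python
# Sketch: compounding Beta(a, b) with Binomial(k, p) gives the beta-binomial.
import numpy as np
from scipy.stats import betabinom

rng = np.random.default_rng(3)
a, b, k = 2.0, 5.0, 10

p = rng.beta(a, b, size=200_000)   # p ~ Beta(a, b)
x = rng.binomial(k, p)             # X | p ~ Binomial(k, p)

empirical = np.bincount(x, minlength=k + 1) / x.size
print(np.round(empirical, 4))
print(np.round(betabinom.pmf(np.arange(k + 1), k, a, b), 4))
```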


Generalisations

* The generalization to multiple variables, i.e. a Dirichlet distribution, multivariate Beta distribution, is called a
Dirichlet distribution
. Univariate marginals of the Dirichlet distribution have a beta distribution. The beta distribution is Conjugate prior, conjugate to the binomial and Bernoulli distributions in exactly the same way as the
Dirichlet distribution
is conjugate to the multinomial distribution and categorical distribution. * The Pearson distribution#The Pearson type I distribution, Pearson type I distribution is identical to the beta distribution (except for arbitrary shifting and re-scaling that can also be accomplished with the four parameter parametrization of the beta distribution). * The beta distribution is the special case of the noncentral beta distribution where \lambda = 0: \operatorname{Beta}(\alpha, \beta) = \operatorname{Beta}(\alpha,\beta,0). * The generalized beta distribution is a five-parameter distribution family which has the beta distribution as a special case. * The matrix variate beta distribution is a distribution for positive-definite matrices.


Statistical inference


Parameter estimation


Method of moments


=Two unknown parameters

= Two unknown parameters (\hat{\alpha}, \hat{\beta}) of a beta distribution supported in the [0, 1] interval can be estimated, using the method of moments, with the first two moments (sample mean and sample variance) as follows. Let:
: \text{sample mean} = \bar{x} = \frac{1}{N}\sum_{i=1}^N X_i
be the sample mean estimate and
: \text{sample variance} = \bar{v} = \frac{1}{N-1}\sum_{i=1}^N (X_i - \bar{x})^2
be the sample variance estimate. The method of moments (statistics), method-of-moments estimates of the parameters are
:\hat{\alpha} = \bar{x} \left(\frac{\bar{x}(1 - \bar{x})}{\bar{v}} - 1 \right), if \bar{v} < \bar{x}(1 - \bar{x}),
: \hat{\beta} = (1-\bar{x}) \left(\frac{\bar{x}(1 - \bar{x})}{\bar{v}} - 1 \right), if \bar{v} < \bar{x}(1 - \bar{x}).
When the distribution is required over a known interval other than
[0, 1]
with random variable ''X'', say [''a'', ''c''] with random variable ''Y'', then replace \bar{x} with \frac{\bar{y}-a}{c-a}, and \bar{v} with \frac{\bar{v}_Y}{(c-a)^2} in the above couple of equations for the shape parameters (see the "Alternative parametrizations, four parameters" section below), where:
: \text{sample mean} = \bar{y} = \frac{1}{N}\sum_{i=1}^N Y_i
: \text{sample variance} = \bar{v}_Y = \frac{1}{N-1}\sum_{i=1}^N (Y_i - \bar{y})^2
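The two-parameter method-of-moments estimates translate directly into code. The sketch below is an illustration, assuming NumPy and using the unbiased (N−1) form of the sample variance as above; it recovers the shape parameters from simulated data.

```python
# Sketch: method-of-moments estimates for Beta(a, b) on the [0, 1] support.
import numpy as np

def beta_method_of_moments(x):
    """Return (alpha_hat, beta_hat) from the sample mean and sample variance."""
    m = x.mean()
    v = x.var(ddof=1)                      # unbiased sample variance
    if not v < m * (1 - m):
        raise ValueError("moment condition v < m(1 - m) violated")
    common = m * (1 - m) / v - 1
    return m * common, (1 - m) * common

rng = np.random.default_rng(4)
sample = rng.beta(2.0, 5.0, size=50_000)
print(beta_method_of_moments(sample))      # close to (2.0, 5.0)
```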


=Four unknown parameters

= All four parameters (\hat, \hat, \hat, \hat of a beta distribution supported in the [''a'', ''c''] interval -see section Beta distribution#Four parameters 2, "Alternative parametrizations, Four parameters"-) can be estimated, using the method of moments developed by Karl Pearson, by equating sample and population values of the first four central moments (mean, variance, skewness and excess kurtosis). The excess kurtosis was expressed in terms of the square of the skewness, and the sample size ν = α + β, (see previous section Beta distribution#Kurtosis, "Kurtosis") as follows: :\text =\frac\left(\frac (\text)^2 - 1\right)\text^2-2< \text< \tfrac (\text)^2 One can use this equation to solve for the sample size ν= α + β in terms of the square of the skewness and the excess kurtosis as follows: :\hat = \hat + \hat = 3\frac :\text^2-2< \text< \tfrac (\text)^2 This is the ratio (multiplied by a factor of 3) between the previously derived limit boundaries for the beta distribution in a space (as originally done by Karl Pearson) defined with coordinates of the square of the skewness in one axis and the excess kurtosis in the other axis (see ): The case of zero skewness, can be immediately solved because for zero skewness, α = β and hence ν = 2α = 2β, therefore α = β = ν/2 : \hat = \hat = \frac= \frac : \text= 0 \text -2<\text<0 (Excess kurtosis is negative for the beta distribution with zero skewness, ranging from -2 to 0, so that \hat -and therefore the sample shape parameters- is positive, ranging from zero when the shape parameters approach zero and the excess kurtosis approaches -2, to infinity when the shape parameters approach infinity and the excess kurtosis approaches zero). For non-zero sample skewness one needs to solve a system of two coupled equations. Since the skewness and the excess kurtosis are independent of the parameters \hat, \hat, the parameters \hat, \hat can be uniquely determined from the sample skewness and the sample excess kurtosis, by solving the coupled equations with two known variables (sample skewness and sample excess kurtosis) and two unknowns (the shape parameters): :(\text)^2 = \frac :\text =\frac\left(\frac (\text)^2 - 1\right) :\text^2-2< \text< \tfrac(\text)^2 resulting in the following solution: : \hat, \hat = \frac \left (1 \pm \frac \right ) : \text\neq 0 \text (\text)^2-2< \text< \tfrac (\text)^2 Where one should take the solutions as follows: \hat>\hat for (negative) sample skewness < 0, and \hat<\hat for (positive) sample skewness > 0. The accompanying plot shows these two solutions as surfaces in a space with horizontal axes of (sample excess kurtosis) and (sample squared skewness) and the shape parameters as the vertical axis. The surfaces are constrained by the condition that the sample excess kurtosis must be bounded by the sample squared skewness as stipulated in the above equation. The two surfaces meet at the right edge defined by zero skewness. Along this right edge, both parameters are equal and the distribution is symmetric U-shaped for α = β < 1, uniform for α = β = 1, upside-down-U-shaped for 1 < α = β < 2 and bell-shaped for α = β > 2. The surfaces also meet at the front (lower) edge defined by "the impossible boundary" line (excess kurtosis + 2 - skewness2 = 0). 
Along this front (lower) boundary both shape parameters approach zero, and the probability density is concentrated more at one end than the other end (with practically nothing in between), with probabilities p=\tfrac at the left end ''x'' = 0 and q = 1-p = \tfrac at the right end ''x'' = 1. The two surfaces become further apart towards the rear edge. At this rear edge the surface parameters are quite different from each other. As remarked, for example, by Bowman and Shenton, sampling in the neighborhood of the line (sample excess kurtosis - (3/2)(sample skewness)2 = 0) (the just-J-shaped portion of the rear edge where blue meets beige), "is dangerously near to chaos", because at that line the denominator of the expression above for the estimate ν = α + β becomes zero and hence ν approaches infinity as that line is approached. Bowman and Shenton write that "the higher moment parameters (kurtosis and skewness) are extremely fragile (near that line). However, the mean and standard deviation are fairly reliable." Therefore, the problem is for the case of four parameter estimation for very skewed distributions such that the excess kurtosis approaches (3/2) times the square of the skewness. This boundary line is produced by extremely skewed distributions with very large values of one of the parameters and very small values of the other parameter. See for a numerical example and further comments about this rear edge boundary line (sample excess kurtosis - (3/2)(sample skewness)2 = 0). As remarked by Karl Pearson himself this issue may not be of much practical importance as this trouble arises only for very skewed J-shaped (or mirror-image J-shaped) distributions with very different values of shape parameters that are unlikely to occur much in practice). The usual skewed-bell-shape distributions that occur in practice do not have this parameter estimation problem. The remaining two parameters \hat, \hat can be determined using the sample mean and the sample variance using a variety of equations. One alternative is to calculate the support interval range (\hat-\hat) based on the sample variance and the sample kurtosis. For this purpose one can solve, in terms of the range (\hat- \hat), the equation expressing the excess kurtosis in terms of the sample variance, and the sample size ν (see and ): :\text =\frac\bigg(\frac - 6 - 5 \hat \bigg) to obtain: : (\hat- \hat) = \sqrt\sqrt Another alternative is to calculate the support interval range (\hat-\hat) based on the sample variance and the sample skewness. For this purpose one can solve, in terms of the range (\hat-\hat), the equation expressing the squared skewness in terms of the sample variance, and the sample size ν (see section titled "Skewness" and "Alternative parametrizations, four parameters"): :(\text)^2 = \frac\bigg(\frac-4(1+\hat)\bigg) to obtain: : (\hat- \hat) = \frac\sqrt The remaining parameter can be determined from the sample mean and the previously obtained parameters: (\hat-\hat), \hat, \hat = \hat+\hat: : \hat = (\text) - \left(\frac\right)(\hat-\hat) and finally, \hat= (\hat- \hat) + \hat . In the above formulas one may take, for example, as estimates of the sample moments: :\begin \text &=\overline = \frac\sum_^N Y_i \\ \text &= \overline_Y = \frac\sum_^N (Y_i - \overline)^2 \\ \text &= G_1 = \frac \frac \\ \text &= G_2 = \frac \frac - \frac \end The estimators ''G''1 for skewness, sample skewness and ''G''2 for kurtosis, sample kurtosis are used by DAP (software), DAP/SAS System, SAS, PSPP/SPSS, and Microsoft Excel, Excel. 
However, they are not used by BMDP and (according to ) they were not used by MINITAB in 1998. Actually, Joanes and Gill in their 1998 study concluded that the skewness and kurtosis estimators used in BMDP and in MINITAB (at that time) had smaller variance and mean-squared error in normal samples, but the skewness and kurtosis estimators used in DAP (software), DAP/SAS System, SAS, PSPP/SPSS, namely ''G''1 and ''G''2, had smaller mean-squared error in samples from a very skewed distribution. It is for this reason that we have spelled out "sample skewness", etc., in the above formulas, to make it explicit that the user should choose the best estimator according to the problem at hand, as the best estimator for skewness and kurtosis depends on the amount of skewness (as shown by Joanes and Gill).
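Instead of the closed-form expressions above, the four moment-matching conditions can also be solved with a generic root finder. The following sketch is only a numerical cross-check under stated assumptions (SciPy, with its loc/scale parameters standing in for a and c − a), not the algebraic procedure described in the text, and a robust implementation would constrain the parameters to valid ranges.

```python
# Sketch: fit (alpha, beta, a, c) by matching mean, variance, skewness and
# excess kurtosis numerically, rather than via the closed-form estimates.
import numpy as np
from scipy.optimize import fsolve
from scipy.stats import beta, skew, kurtosis

rng = np.random.default_rng(5)
data = beta.rvs(2.5, 4.0, loc=1.0, scale=3.0, size=100_000, random_state=rng)
targets = np.array([data.mean(), data.var(ddof=1),
                    skew(data), kurtosis(data)])    # kurtosis() is the excess form

def moment_gap(params):
    al, be, a, width = params                        # width = c - a (support range)
    m, v, s, k = beta.stats(al, be, loc=a, scale=width, moments="mvsk")
    return np.array([m, v, s, k]) - targets

al, be, a, width = fsolve(moment_gap, x0=[2.0, 2.0, data.min(), np.ptp(data)])
print(al, be, a, a + width)                          # compare with (2.5, 4.0, 1.0, 4.0)
```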


Maximum likelihood


=Two unknown parameters

= As is also the case for maximum likelihood estimates for the gamma distribution, the maximum likelihood estimates for the beta distribution do not have a general closed form solution for arbitrary values of the shape parameters. If ''X''1, ..., ''XN'' are independent random variables each having a beta distribution, the joint log likelihood function for ''N'' independent and identically distributed random variables, iid observations is: :\begin \ln\, \mathcal (\alpha, \beta\mid X) &= \sum_^N \ln \left (\mathcal_i (\alpha, \beta\mid X_i) \right )\\ &= \sum_^N \ln \left (f(X_i;\alpha,\beta) \right ) \\ &= \sum_^N \ln \left (\frac \right ) \\ &= (\alpha - 1)\sum_^N \ln (X_i) + (\beta- 1)\sum_^N \ln (1-X_i) - N \ln \Beta(\alpha,\beta) \end Finding the maximum with respect to a shape parameter involves taking the partial derivative with respect to the shape parameter and setting the expression equal to zero yielding the maximum likelihood estimator of the shape parameters: :\frac = \sum_^N \ln X_i -N\frac=0 :\frac = \sum_^N \ln (1-X_i)- N\frac=0 where: :\frac = -\frac+ \frac+ \frac=-\psi(\alpha + \beta) + \psi(\alpha) + 0 :\frac= - \frac+ \frac + \frac=-\psi(\alpha + \beta) + 0 + \psi(\beta) since the
digamma function
denoted ψ(α) is defined as the logarithmic derivative of the
gamma function
: :\psi(\alpha) =\frac To ensure that the values with zero tangent slope are indeed a maximum (instead of a saddle-point or a minimum) one has to also satisfy the condition that the curvature is negative. This amounts to satisfying that the second partial derivative with respect to the shape parameters is negative :\frac= -N\frac<0 :\frac = -N\frac<0 using the previous equations, this is equivalent to: :\frac = \psi_1(\alpha)-\psi_1(\alpha + \beta) > 0 :\frac = \psi_1(\beta) -\psi_1(\alpha + \beta) > 0 where the
trigamma function
, denoted ''ψ''1(''α''), is the second of the
polygamma function
s, and is defined as the derivative of the digamma function: :\psi_1(\alpha) = \frac=\, \frac. These conditions are equivalent to stating that the variances of the logarithmically transformed variables are positive, since: :\operatorname[\ln (X)] = \operatorname[\ln^2 (X)] - (\operatorname[\ln (X)])^2 = \psi_1(\alpha) - \psi_1(\alpha + \beta) :\operatorname ln (1-X)= \operatorname[\ln^2 (1-X)] - (\operatorname[\ln (1-X)])^2 = \psi_1(\beta) - \psi_1(\alpha + \beta) Therefore, the condition of negative curvature at a maximum is equivalent to the statements: : \operatorname[\ln (X)] > 0 : \operatorname ln (1-X)> 0 Alternatively, the condition of negative curvature at a maximum is also equivalent to stating that the following logarithmic derivatives of the geometric means ''GX'' and ''G(1−X)'' are positive, since: : \psi_1(\alpha) - \psi_1(\alpha + \beta) = \frac > 0 : \psi_1(\beta) - \psi_1(\alpha + \beta) = \frac > 0 While these slopes are indeed positive, the other slopes are negative: :\frac, \frac < 0. The slopes of the mean and the median with respect to ''α'' and ''β'' display similar sign behavior. From the condition that at a maximum, the partial derivative with respect to the shape parameter equals zero, we obtain the following system of coupled maximum likelihood estimate equations (for the average log-likelihoods) that needs to be inverted to obtain the (unknown) shape parameter estimates \hat,\hat in terms of the (known) average of logarithms of the samples ''X''1, ..., ''XN'': :\begin \hat[\ln (X)] &= \psi(\hat) - \psi(\hat + \hat)=\frac\sum_^N \ln X_i = \ln \hat_X \\ \hat[\ln(1-X)] &= \psi(\hat) - \psi(\hat + \hat)=\frac\sum_^N \ln (1-X_i)= \ln \hat_ \end where we recognize \log \hat_X as the logarithm of the sample geometric mean and \log \hat_ as the logarithm of the sample geometric mean based on (1 − ''X''), the mirror-image of ''X''. For \hat=\hat, it follows that \hat_X=\hat_ . :\begin \hat_X &= \prod_^N (X_i)^ \\ \hat_ &= \prod_^N (1-X_i)^ \end These coupled equations containing
digamma function
s of the shape parameter estimates \hat,\hat must be solved by numerical methods as done, for example, by Beckman et al. Gnanadesikan et al. give numerical solutions for a few cases. Norman Lloyd Johnson, N.L.Johnson and Samuel Kotz, S.Kotz suggest that for "not too small" shape parameter estimates \hat,\hat, the logarithmic approximation to the digamma function \psi(\hat) \approx \ln(\hat-\tfrac) may be used to obtain initial values for an iterative solution, since the equations resulting from this approximation can be solved exactly: :\ln \frac \approx \ln \hat_X :\ln \frac\approx \ln \hat_ which leads to the following solution for the initial values (of the estimate shape parameters in terms of the sample geometric means) for an iterative solution: :\hat\approx \tfrac + \frac \text \hat >1 :\hat\approx \tfrac + \frac \text \hat > 1 Alternatively, the estimates provided by the method of moments can instead be used as initial values for an iterative solution of the maximum likelihood coupled equations in terms of the digamma functions. When the distribution is required over a known interval other than
[0, 1]
with random variable ''X'', say [''a'', ''c''] with random variable ''Y'', then replace ln(''Xi'') in the first equation with :\ln \frac, and replace ln(1−''Xi'') in the second equation with :\ln \frac (see "Alternative parametrizations, four parameters" section below). If one of the shape parameters is known, the problem is considerably simplified. The following logit transformation can be used to solve for the unknown shape parameter (for skewed cases such that \hat\neq\hat, otherwise, if symmetric, both -equal- parameters are known when one is known): :\hat \left[\ln \left(\frac \right) \right]=\psi(\hat) - \psi(\hat)=\frac\sum_^N \ln\frac = \ln \hat_X - \ln \left(\hat_\right) This logit transformation is the logarithm of the transformation that divides the variable ''X'' by its mirror-image (''X''/(1 - ''X'') resulting in the "inverted beta distribution" or
beta prime distribution
(also known as beta distribution of the second kind or Pearson distribution, Pearson's Type VI) with support [0, +∞). As previously discussed in the section "Moments of logarithmically transformed random variables," the logit transformation \ln\frac, studied by Johnson, extends the finite support
[0, 1]
based on the original variable ''X'' to infinite support in both directions of the real line (−∞, +∞). If, for example, \hat is known, the unknown parameter \hat can be obtained in terms of the inverse digamma function of the right hand side of this equation: :\psi(\hat)=\frac\sum_^N \ln\frac + \psi(\hat) :\hat=\psi^(\ln \hat_X - \ln \hat_ + \psi(\hat)) In particular, if one of the shape parameters has a value of unity, for example for \hat = 1 (the power function distribution with bounded support [0,1]), using the identity ψ(''x'' + 1) = ψ(''x'') + 1/''x'' in the equation \psi(\hat) - \psi(\hat + \hat)= \ln \hat_X, the maximum likelihood estimator for the unknown parameter \hat is, exactly: :\hat= - \frac= - \frac The beta has support [0, 1], therefore \hat_X < 1, and hence (-\ln \hat_X) >0, and therefore \hat >0. In conclusion, the maximum likelihood estimates of the shape parameters of a beta distribution are (in general) a complicated function of the sample geometric mean, and of the sample geometric mean based on ''(1−X)'', the mirror-image of ''X''. One may ask, if the variance (in addition to the mean) is necessary to estimate two shape parameters with the method of moments, why is the (logarithmic or geometric) variance not necessary to estimate two shape parameters with the maximum likelihood method, for which only the geometric means suffice? The answer is because the mean does not provide as much information as the geometric mean. For a beta distribution with equal shape parameters ''α'' = ''β'', the mean is exactly 1/2, regardless of the value of the shape parameters, and therefore regardless of the value of the statistical dispersion (the variance). On the other hand, the geometric mean of a beta distribution with equal shape parameters ''α'' = ''β'', depends on the value of the shape parameters, and therefore it contains more information. Also, the geometric mean of a beta distribution does not satisfy the symmetry conditions satisfied by the mean, therefore, by employing both the geometric mean based on ''X'' and geometric mean based on (1 − ''X''), the maximum likelihood method is able to provide best estimates for both parameters ''α'' = ''β'', without need of employing the variance. One can express the joint log likelihood per ''N'' independent and identically distributed random variables, iid observations in terms of the ''sufficient statistics'' (the sample geometric means) as follows: :\frac = (\alpha - 1)\ln \hat_X + (\beta- 1)\ln \hat_- \ln \Beta(\alpha,\beta). We can plot the joint log likelihood per ''N'' observations for fixed values of the sample geometric means to see the behavior of the likelihood function as a function of the shape parameters α and β. In such a plot, the shape parameter estimators \hat,\hat correspond to the maxima of the likelihood function. See the accompanying graph that shows that all the likelihood functions intersect at α = β = 1, which corresponds to the values of the shape parameters that give the maximum entropy (the maximum entropy occurs for shape parameters equal to unity: the uniform distribution). It is evident from the plot that the likelihood function gives sharp peaks for values of the shape parameter estimators close to zero, but that for values of the shape parameters estimators greater than one, the likelihood function becomes quite flat, with less defined peaks. 
Obviously, the maximum likelihood parameter estimation method for the beta distribution becomes less acceptable for larger values of the shape parameter estimators, as the uncertainty in the peak definition increases with the value of the shape parameter estimators. One can arrive at the same conclusion by noticing that the expression for the curvature of the likelihood function is in terms of the geometric variances :\frac= -\operatorname ln X/math> :\frac = -\operatorname[\ln (1-X)] These variances (and therefore the curvatures) are much larger for small values of the shape parameter α and β. However, for shape parameter values α, β > 1, the variances (and therefore the curvatures) flatten out. Equivalently, this result follows from the Cramér–Rao bound, since the
Fisher information
matrix components for the beta distribution are these logarithmic variances. The Cramér–Rao bound states that the
variance
of any ''unbiased'' estimator \hat of α is bounded by the multiplicative inverse, reciprocal of the
Fisher information
: :\mathrm(\hat)\geq\frac\geq\frac :\mathrm(\hat) \geq\frac\geq\frac so the variance of the estimators increases with increasing α and β, as the logarithmic variances decrease. Also one can express the joint log likelihood per ''N'' independent and identically distributed random variables, iid observations in terms of the
digamma function
expressions for the logarithms of the sample geometric means as follows: :\frac = (\alpha - 1)(\psi(\hat) - \psi(\hat + \hat))+(\beta- 1)(\psi(\hat) - \psi(\hat + \hat))- \ln \Beta(\alpha,\beta) this expression is identical to the negative of the cross-entropy (see section on "Quantities of information (entropy)"). Therefore, finding the maximum of the joint log likelihood of the shape parameters, per ''N'' independent and identically distributed random variables, iid observations, is identical to finding the minimum of the cross-entropy for the beta distribution, as a function of the shape parameters. :\frac = - H = -h - D_ = -\ln\Beta(\alpha,\beta)+(\alpha-1)\psi(\hat)+(\beta-1)\psi(\hat)-(\alpha+\beta-2)\psi(\hat+\hat) with the cross-entropy defined as follows: :H = \int_^1 - f(X;\hat,\hat) \ln (f(X;\alpha,\beta)) \, X
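The coupled digamma equations have no closed-form solution, but they are straightforward to solve numerically. The sketch below is an illustration assuming NumPy and SciPy; it writes the score equations in terms of the two sample log-geometric-means and, as suggested above, uses the method-of-moments values as starting points for the iteration.

```python
# Sketch: maximum likelihood for Beta(a, b) from the sufficient statistics
# ln G_X and ln G_(1-X), solving psi(a) - psi(a+b) = ln G_X and the mirror equation.
import numpy as np
from scipy.optimize import fsolve
from scipy.special import digamma

rng = np.random.default_rng(6)
x = rng.beta(2.0, 6.0, size=50_000)
lnGX = np.log(x).mean()                    # log of the sample geometric mean of X
lnG1X = np.log1p(-x).mean()                # log of the sample geometric mean of 1 - X

def score(params):
    a, b = params
    return [digamma(a) - digamma(a + b) - lnGX,
            digamma(b) - digamma(a + b) - lnG1X]

# Method-of-moments estimates as starting values for the iteration.
m, v = x.mean(), x.var(ddof=1)
common = m * (1 - m) / v - 1
a_hat, b_hat = fsolve(score, x0=[m * common, (1 - m) * common])
print(a_hat, b_hat)                        # close to (2.0, 6.0)
```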


=Four unknown parameters

= The procedure is similar to the one followed in the two unknown parameter case. If ''Y''1, ..., ''YN'' are independent random variables each having a beta distribution with four parameters, the joint log likelihood function for ''N'' independent and identically distributed random variables, iid observations is: :\begin \ln\, \mathcal (\alpha, \beta, a, c\mid Y) &= \sum_^N \ln\,\mathcal_i (\alpha, \beta, a, c\mid Y_i)\\ &= \sum_^N \ln\,f(Y_i; \alpha, \beta, a, c) \\ &= \sum_^N \ln\,\frac\\ &= (\alpha - 1)\sum_^N \ln (Y_i - a) + (\beta- 1)\sum_^N \ln (c - Y_i)- N \ln \Beta(\alpha,\beta) - N (\alpha+\beta - 1) \ln (c - a) \end Finding the maximum with respect to a shape parameter involves taking the partial derivative with respect to the shape parameter and setting the expression equal to zero yielding the maximum likelihood estimator of the shape parameters: :\frac= \sum_^N \ln (Y_i - a) - N(-\psi(\alpha + \beta) + \psi(\alpha))- N \ln (c - a)= 0 :\frac = \sum_^N \ln (c - Y_i) - N(-\psi(\alpha + \beta) + \psi(\beta))- N \ln (c - a)= 0 :\frac = -(\alpha - 1) \sum_^N \frac \,+ N (\alpha+\beta - 1)\frac= 0 :\frac = (\beta- 1) \sum_^N \frac \,- N (\alpha+\beta - 1) \frac = 0 these equations can be re-arranged as the following system of four coupled equations (the first two equations are geometric means and the second two equations are the harmonic means) in terms of the maximum likelihood estimates for the four parameters \hat, \hat, \hat, \hat: :\frac\sum_^N \ln \frac = \psi(\hat)-\psi(\hat +\hat )= \ln \hat_X :\frac\sum_^N \ln \frac = \psi(\hat)-\psi(\hat + \hat)= \ln \hat_ :\frac = \frac= \hat_X :\frac = \frac = \hat_ with sample geometric means: :\hat_X = \prod_^ \left (\frac \right )^ :\hat_ = \prod_^ \left (\frac \right )^ The parameters \hat, \hat are embedded inside the geometric mean expressions in a nonlinear way (to the power 1/''N''). This precludes, in general, a closed form solution, even for an initial value approximation for iteration purposes. One alternative is to use as initial values for iteration the values obtained from the method of moments solution for the four parameter case. Furthermore, the expressions for the harmonic means are well-defined only for \hat, \hat > 1, which precludes a maximum likelihood solution for shape parameters less than unity in the four-parameter case. Fisher's information matrix for the four parameter case is Positive-definite matrix, positive-definite only for α, β > 2 (for further discussion, see section on Fisher information matrix, four parameter case), for bell-shaped (symmetric or unsymmetric) beta distributions, with inflection points located to either side of the mode. The following Fisher information components (that represent the expectations of the curvature of the log likelihood function) have mathematical singularity, singularities at the following values: :\alpha = 2: \quad \operatorname \left [- \frac \frac \right ]= _ :\beta = 2: \quad \operatorname\left [- \frac \frac \right ] = _ :\alpha = 2: \quad \operatorname\left [- \frac\frac\right ] = _ :\beta = 1: \quad \operatorname\left [- \frac\frac \right ] = _ (for further discussion see section on Fisher information matrix). Thus, it is not possible to strictly carry on the maximum likelihood estimation for some well known distributions belonging to the four-parameter beta distribution family, like the continuous uniform distribution, uniform distribution (Beta(1, 1, ''a'', ''c'')), and the arcsine distribution (Beta(1/2, 1/2, ''a'', ''c'')). 
Norman Lloyd Johnson, N.L.Johnson and Samuel Kotz, S.Kotz ignore the equations for the harmonic means and instead suggest "If a and c are unknown, and maximum likelihood estimators of ''a'', ''c'', α and β are required, the above procedure (for the two unknown parameter case, with ''X'' transformed as ''X'' = (''Y'' − ''a'')/(''c'' − ''a'')) can be repeated using a succession of trial values of ''a'' and ''c'', until the pair (''a'', ''c'') for which maximum likelihood (given ''a'' and ''c'') is as great as possible, is attained" (where, for the purpose of clarity, their notation for the parameters has been translated into the present notation).


Fisher information matrix

Let a random variable X have a probability density ''f''(''x'';''α''). The partial derivative with respect to the (unknown, and to be estimated) parameter α of the log
likelihood function
is called the score (statistics), score. The second moment of the score is called the
Fisher information
:
:\mathcal{I}(\alpha)=\operatorname{E} \left [\left (\frac{\partial}{\partial \alpha} \ln \mathcal{L}(\alpha\mid X) \right )^2 \right],
The expected value, expectation of the score (statistics), score is zero, therefore the Fisher information is also the second moment centered on the mean of the score: the
variance
of the score. If the log
likelihood function
is twice differentiable with respect to the parameter α, and under certain regularity conditions, then the Fisher information may also be written as follows (which is often a more convenient form for calculation purposes):
:\mathcal{I}(\alpha) = - \operatorname{E} \left [\frac{\partial^2}{\partial \alpha^2} \ln (\mathcal{L}(\alpha\mid X)) \right].
Thus, the Fisher information is the negative of the expectation of the second derivative with respect to the parameter α of the log
likelihood function
. Therefore, Fisher information is a measure of the curvature of the log likelihood function of α. A low curvature (and therefore high Radius of curvature (mathematics), radius of curvature), flatter log likelihood function curve has low Fisher information; while a log likelihood function curve with large curvature (and therefore low Radius of curvature (mathematics), radius of curvature) has high Fisher information. When the Fisher information matrix is evaluated at the estimates of the parameters ("the observed Fisher information matrix") it is equivalent to the replacement of the true log likelihood surface by a Taylor's series approximation, taken as far as the quadratic terms. The word information, in the context of Fisher information, refers to information about the parameters. Information such as: estimation, sufficiency and properties of variances of estimators. The Cramér–Rao bound states that the inverse of the Fisher information is a lower bound on the variance of any
estimator
of a parameter α:
:\operatorname{var}[\hat\alpha] \geq \frac{1}{\mathcal{I}(\alpha)}.
The precision to which one can estimate a parameter α is limited by the Fisher Information of the log likelihood function. The Fisher information is a measure of the minimum error involved in estimating a parameter of a distribution and it can be viewed as a measure of the resolving power of an experiment needed to discriminate between two alternative hypotheses about a parameter. When there are ''N'' parameters
: \begin{bmatrix} \theta_1 \\ \theta_2 \\ \dots \\ \theta_N \end{bmatrix},
then the Fisher information takes the form of an ''N''×''N'' positive semidefinite matrix, positive semidefinite symmetric matrix, the Fisher Information Matrix, with typical element:
:\mathcal{I}_{i,j}=\operatorname{E} \left [\left (\frac{\partial}{\partial \theta_i} \ln \mathcal{L} \right) \left(\frac{\partial}{\partial \theta_j} \ln \mathcal{L} \right) \right ].
Under certain regularity conditions, the Fisher Information Matrix may also be written in the following form, which is often more convenient for computation:
:\mathcal{I}_{i,j} = - \operatorname{E} \left [\frac{\partial^2}{\partial \theta_i \, \partial \theta_j} \ln (\mathcal{L}) \right ]\,.
With ''X''1, ..., ''XN'' iid random variables, an ''N''-dimensional "box" can be constructed with sides ''X''1, ..., ''XN''. Costa and Cover show that the (Shannon) differential entropy ''h''(''X'') is related to the volume of the typical set (having the sample entropy close to the true entropy), while the Fisher information is related to the surface of this typical set.


=Two parameters

= For ''X''1, ..., ''X''''N'' independent random variables each having a beta distribution parametrized with shape parameters ''α'' and ''β'', the joint log likelihood function for ''N'' independent and identically distributed random variables, iid observations is: :\ln (\mathcal (\alpha, \beta\mid X) )= (\alpha - 1)\sum_^N \ln X_i + (\beta- 1)\sum_^N \ln (1-X_i)- N \ln \Beta(\alpha,\beta) therefore the joint log likelihood function per ''N'' independent and identically distributed random variables, iid observations is: :\frac \ln(\mathcal (\alpha, \beta\mid X)) = (\alpha - 1)\frac\sum_^N \ln X_i + (\beta- 1)\frac\sum_^N \ln (1-X_i)-\, \ln \Beta(\alpha,\beta) For the two parameter case, the Fisher information has 4 components: 2 diagonal and 2 off-diagonal. Since the Fisher information matrix is symmetric, one of these off diagonal components is independent. Therefore, the Fisher information matrix has 3 independent components (2 diagonal and 1 off diagonal). Aryal and Nadarajah calculated Fisher's information matrix for the four-parameter case, from which the two parameter case can be obtained as follows: :- \frac= \operatorname[\ln (X)]= \psi_1(\alpha) - \psi_1(\alpha + \beta) =_= \operatorname\left [- \frac \right ] = \ln \operatorname_ :- \frac = \operatorname ln (1-X)= \psi_1(\beta) - \psi_1(\alpha + \beta) =_= \operatorname\left [- \frac \right]= \ln \operatorname_ :- \frac = \operatorname[\ln X,\ln(1-X)] = -\psi_1(\alpha+\beta) =_= \operatorname\left [- \frac \right] = \ln \operatorname_ Since the Fisher information matrix is symmetric : \mathcal_= \mathcal_= \ln \operatorname_ The Fisher information components are equal to the log geometric variances and log geometric covariance. Therefore, they can be expressed as
trigamma function
s, denoted ψ1(α), the second of the
polygamma function
s, defined as the derivative of the digamma function: :\psi_1(\alpha) = \frac=\, \frac. These derivatives are also derived in the and plots of the log likelihood function are also shown in that section. contains plots and further discussion of the Fisher information matrix components: the log geometric variances and log geometric covariance as a function of the shape parameters α and β. contains formulas for moments of logarithmically transformed random variables. Images for the Fisher information components \mathcal_, \mathcal_ and \mathcal_ are shown in . The determinant of Fisher's information matrix is of interest (for example for the calculation of Jeffreys prior probability). From the expressions for the individual components of the Fisher information matrix, it follows that the determinant of Fisher's (symmetric) information matrix for the beta distribution is: :\begin \det(\mathcal(\alpha, \beta))&= \mathcal_ \mathcal_-\mathcal_ \mathcal_ \\ pt&=(\psi_1(\alpha) - \psi_1(\alpha + \beta))(\psi_1(\beta) - \psi_1(\alpha + \beta))-( -\psi_1(\alpha+\beta))( -\psi_1(\alpha+\beta))\\ pt&= \psi_1(\alpha)\psi_1(\beta)-( \psi_1(\alpha)+\psi_1(\beta))\psi_1(\alpha + \beta)\\ pt\lim_ \det(\mathcal(\alpha, \beta)) &=\lim_ \det(\mathcal(\alpha, \beta)) = \infty\\ pt\lim_ \det(\mathcal(\alpha, \beta)) &=\lim_ \det(\mathcal(\alpha, \beta)) = 0 \end From Sylvester's criterion (checking whether the diagonal elements are all positive), it follows that the Fisher information matrix for the two parameter case is Positive-definite matrix, positive-definite (under the standard condition that the shape parameters are positive ''α'' > 0 and ''β'' > 0).
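The trigamma expressions above determine the two-parameter Fisher information matrix completely. The following is a minimal sketch, assuming SciPy, that assembles the matrix and its determinant for arbitrary example values of the shape parameters.

```python
# Sketch: Fisher information matrix of Beta(a, b) from trigamma functions,
# I = [[psi1(a) - psi1(a+b), -psi1(a+b)], [-psi1(a+b), psi1(b) - psi1(a+b)]].
import numpy as np
from scipy.special import polygamma

def beta_fisher_information(a, b):
    psi1 = lambda z: polygamma(1, z)       # trigamma function
    off = -psi1(a + b)                     # off-diagonal element -psi1(a + b)
    return np.array([[psi1(a) + off, off],
                     [off, psi1(b) + off]])

info = beta_fisher_information(2.0, 3.0)
print(info)
# Determinant equals psi1(a)psi1(b) - (psi1(a) + psi1(b)) psi1(a+b), as above.
print(np.linalg.det(info))
```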


=Four parameters

= If ''Y''1, ..., ''YN'' are independent random variables each having a beta distribution with four parameters: the exponents ''α'' and ''β'', and also ''a'' (the minimum of the distribution range), and ''c'' (the maximum of the distribution range) (section titled "Alternative parametrizations", "Four parameters"), with
probability density function
: :f(y; \alpha, \beta, a, c) = \frac =\frac=\frac. the joint log likelihood function per ''N'' independent and identically distributed random variables, iid observations is: :\frac \ln(\mathcal (\alpha, \beta, a, c\mid Y))= \frac\sum_^N \ln (Y_i - a) + \frac\sum_^N \ln (c - Y_i)- \ln \Beta(\alpha,\beta) - (\alpha+\beta -1) \ln (c-a) For the four parameter case, the Fisher information has 4*4=16 components. It has 12 off-diagonal components = (4×4 total − 4 diagonal). Since the Fisher information matrix is symmetric, half of these components (12/2=6) are independent. Therefore, the Fisher information matrix has 6 independent off-diagonal + 4 diagonal = 10 independent components. Aryal and Nadarajah calculated Fisher's information matrix for the four parameter case as follows: :- \frac \frac= \operatorname[\ln (X)]= \psi_1(\alpha) - \psi_1(\alpha + \beta) = \mathcal_= \operatorname\left [- \frac \frac \right ] = \ln (\operatorname) :-\frac \frac = \operatorname ln (1-X)= \psi_1(\beta) - \psi_1(\alpha + \beta) =_= \operatorname \left [- \frac \frac \right ] = \ln(\operatorname) :-\frac \frac = \operatorname[\ln X,(1-X)] = -\psi_1(\alpha+\beta) =\mathcal_= \operatorname \left [- \frac\frac \right ] = \ln(\operatorname_) In the above expressions, the use of ''X'' instead of ''Y'' in the expressions var[ln(''X'')] = ln(var''GX'') is ''not an error''. The expressions in terms of the log geometric variances and log geometric covariance occur as functions of the two parameter ''X'' ~ Beta(''α'', ''β'') parametrization because when taking the partial derivatives with respect to the exponents (''α'', ''β'') in the four parameter case, one obtains the identical expressions as for the two parameter case: these terms of the four parameter Fisher information matrix are independent of the minimum ''a'' and maximum ''c'' of the distribution's range. The only non-zero term upon double differentiation of the log likelihood function with respect to the exponents ''α'' and ''β'' is the second derivative of the log of the beta function: ln(B(''α'', ''β'')). This term is independent of the minimum ''a'' and maximum ''c'' of the distribution's range. Double differentiation of this term results in trigamma functions. The sections titled "Maximum likelihood", "Two unknown parameters" and "Four unknown parameters" also show this fact. The Fisher information for ''N'' i.i.d. samples is ''N'' times the individual Fisher information (eq. 11.279, page 394 of Cover and Thomas). (Aryal and Nadarajah take a single observation, ''N'' = 1, to calculate the following components of the Fisher information, which leads to the same result as considering the derivatives of the log likelihood per ''N'' observations. Moreover, below the erroneous expression for _ in Aryal and Nadarajah has been corrected.) 
:\begin \alpha > 2: \quad \operatorname\left [- \frac \frac \right ] &= _=\frac \\ \beta > 2: \quad \operatorname\left[-\frac \frac \right ] &= \mathcal_ = \frac \\ \operatorname\left[- \frac \frac \right ] &= _ = \frac \\ \alpha > 1: \quad \operatorname\left[- \frac \frac \right ] &=\mathcal_ = \frac \\ \operatorname\left[- \frac \frac \right ] &= _ = \frac \\ \operatorname\left[- \frac \frac \right ] &= _ = -\frac \\ \beta > 1: \quad \operatorname\left[- \frac \frac \right ] &= \mathcal_ = -\frac \end The lower two diagonal entries of the Fisher information matrix, with respect to the parameter "a" (the minimum of the distribution's range): \mathcal_, and with respect to the parameter "c" (the maximum of the distribution's range): \mathcal_ are only defined for exponents α > 2 and β > 2 respectively. The Fisher information matrix component \mathcal_ for the minimum "a" approaches infinity for exponent α approaching 2 from above, and the Fisher information matrix component \mathcal_ for the maximum "c" approaches infinity for exponent β approaching 2 from above. The Fisher information matrix for the four parameter case does not depend on the individual values of the minimum "a" and the maximum "c", but only on the total range (''c''−''a''). Moreover, the components of the Fisher information matrix that depend on the range (''c''−''a''), depend only through its inverse (or the square of the inverse), such that the Fisher information decreases for increasing range (''c''−''a''). The accompanying images show the Fisher information components \mathcal_ and \mathcal_. Images for the Fisher information components \mathcal_ and \mathcal_ are shown in . All these Fisher information components look like a basin, with the "walls" of the basin being located at low values of the parameters. The following four-parameter-beta-distribution Fisher information components can be expressed in terms of the two-parameter: ''X'' ~ Beta(α, β) expectations of the transformed ratio ((1-''X'')/''X'') and of its mirror image (''X''/(1-''X'')), scaled by the range (''c''−''a''), which may be helpful for interpretation: :\mathcal_ =\frac= \frac \text\alpha > 1 :\mathcal_ = -\frac=- \frac\text\beta> 1 These are also the expected values of the "inverted beta distribution" or
beta prime distribution
(also known as beta distribution of the second kind or Pearson distribution, Pearson's Type VI) and its mirror image, scaled by the range (''c'' − ''a''). Also, the following Fisher information components can be expressed in terms of the harmonic (1/X) variances or of variances based on the ratio transformed variables ((1-X)/X) as follows: :\begin \alpha > 2: \quad \mathcal_ &=\operatorname \left [\frac \right] \left (\frac \right )^2 =\operatorname \left [\frac \right ] \left (\frac \right)^2 = \frac \\ \beta > 2: \quad \mathcal_ &= \operatorname \left [\frac \right ] \left (\frac \right )^2 = \operatorname \left [\frac \right ] \left (\frac \right )^2 =\frac \\ \mathcal_ &=\operatorname \left [\frac,\frac \right ]\frac = \operatorname \left [\frac,\frac \right ] \frac =\frac \end See section "Moments of linearly transformed, product and inverted random variables" for these expectations. The determinant of Fisher's information matrix is of interest (for example for the calculation of Jeffreys prior probability). From the expressions for the individual components, it follows that the determinant of Fisher's (symmetric) information matrix for the beta distribution with four parameters is: :\begin \det(\mathcal(\alpha,\beta,a,c)) = & -\mathcal_^2 \mathcal_ \mathcal_+\mathcal_ \mathcal_ \mathcal_ \mathcal_+\mathcal_^2 \mathcal_^2 -\mathcal_ \mathcal_ \mathcal_^2\\ & -\mathcal_ \mathcal_ \mathcal_ \mathcal_+\mathcal_^2 \mathcal_ \mathcal_+2 \mathcal_ \mathcal_ \mathcal_ \mathcal_\\ & -2\mathcal_ \mathcal_ \mathcal_ \mathcal_+\mathcal_^2 \mathcal_^2-\mathcal_ \mathcal_ \mathcal_^2+\mathcal_ \mathcal_^2 \mathcal_\\ & -\mathcal_ \mathcal_ \mathcal_ \mathcal_-\mathcal_ \mathcal_ \mathcal_ \mathcal_+\mathcal_ \mathcal_ \mathcal_ \mathcal_\\ & -\mathcal_ \mathcal_ \mathcal_ \mathcal_+\mathcal_ \mathcal_ \mathcal_ \mathcal_-\mathcal_ \mathcal_^2 \mathcal_\\ & +2 \mathcal_ \mathcal_ \mathcal_ \mathcal_-\mathcal_ \mathcal_^2 \mathcal_-\mathcal_^2 \mathcal_ \mathcal_+\mathcal_ \mathcal_ \mathcal_ \mathcal_\text\alpha, \beta> 2 \end Using Sylvester's criterion (checking whether the diagonal elements are all positive), and since diagonal components _ and _ have Mathematical singularity, singularities at α=2 and β=2 it follows that the Fisher information matrix for the four parameter case is Positive-definite matrix, positive-definite for α>2 and β>2. Since for α > 2 and β > 2 the beta distribution is (symmetric or unsymmetric) bell shaped, it follows that the Fisher information matrix is positive-definite only for bell-shaped (symmetric or unsymmetric) beta distributions, with inflection points located to either side of the mode. Thus, important well known distributions belonging to the four-parameter beta distribution family, like the parabolic distribution (Beta(2,2,a,c)) and the continuous uniform distribution, uniform distribution (Beta(1,1,a,c)) have Fisher information components (\mathcal_,\mathcal_,\mathcal_,\mathcal_) that blow up (approach infinity) in the four-parameter case (although their Fisher information components are all defined for the two parameter case). The four-parameter Wigner semicircle distribution (Beta(3/2,3/2,''a'',''c'')) and arcsine distribution (Beta(1/2,1/2,''a'',''c'')) have negative Fisher information determinants for the four-parameter case.


Bayesian inference

The use of Beta distributions in Bayesian inference is due to the fact that they provide a family of conjugate prior probability distributions for binomial (including Bernoulli) and geometric distributions. The domain of the beta distribution can be viewed as a probability, and in fact the beta distribution is often used to describe the distribution of a probability value ''p'':

:P(p;\alpha,\beta) = \frac{p^{\alpha-1}(1-p)^{\beta-1}}{\Beta(\alpha,\beta)}.

Examples of beta distributions used as prior probabilities to represent ignorance of prior parameter values in Bayesian inference are Beta(1,1), Beta(0,0) and Beta(1/2,1/2).
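As a minimal sketch of this conjugacy (assuming SciPy is available; the prior parameters and the data ''s'', ''n'' below are hypothetical), a Beta(α, β) prior combined with ''s'' successes in ''n'' Bernoulli trials yields a Beta(α + ''s'', β + ''n'' − ''s'') posterior:

from scipy import stats

alpha_prior, beta_prior = 1.0, 1.0   # hypothetical uniform Bayes-Laplace prior
s, n = 7, 10                         # hypothetical data: 7 successes in 10 trials

# Conjugate update: the posterior is again a beta distribution
alpha_post = alpha_prior + s
beta_post = beta_prior + (n - s)
posterior = stats.beta(alpha_post, beta_post)

print(posterior.mean())              # posterior mean (s + 1)/(n + 2) = 8/12 for this prior
print(posterior.interval(0.95))      # central 95% credible interval for p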


Rule of succession

A classic application of the beta distribution is the rule of succession, introduced in the 18th century by Pierre-Simon Laplace in the course of treating the sunrise problem. It states that, given ''s'' successes in ''n'' conditionally independent Bernoulli trials with probability ''p'', the estimate of the expected value in the next trial is \frac{s+1}{n+2}. This estimate is the expected value of the posterior distribution over ''p'', namely Beta(''s''+1, ''n''−''s''+1), which is given by Bayes' rule if one assumes a uniform prior probability over ''p'' (i.e., Beta(1, 1)) and then observes that ''p'' generated ''s'' successes in ''n'' trials. Laplace's rule of succession has been criticized by prominent scientists. R. T. Cox described Laplace's application of the rule of succession to the sunrise problem (p. 89) as "a travesty of the proper use of the principle." Keynes remarks (Ch. XXX, p. 382) "indeed this is so foolish a theorem that to entertain it is discreditable." Karl Pearson showed that the probability that the next (''n'' + 1) trials will be successes, after ''n'' successes in ''n'' trials, is only 50%, which has been considered too low by scientists like Jeffreys and unacceptable as a representation of the scientific process of experimentation to test a proposed scientific law. As pointed out by Jeffreys (p. 128) (crediting C. D. Broad), Laplace's rule of succession establishes a high probability of success ((''n''+1)/(''n''+2)) in the next trial, but only a moderate probability (50%) that a further sample (''n''+1) comparable in size will be equally successful. As pointed out by Perks, "The rule of succession itself is hard to accept. It assigns a probability to the next trial which implies the assumption that the actual run observed is an average run and that we are always at the end of an average run. It would, one would think, be more reasonable to assume that we were in the middle of an average run. Clearly a higher value for both probabilities is necessary if they are to accord with reasonable belief." These problems with Laplace's rule of succession motivated Haldane, Perks, Jeffreys and others to search for other forms of prior probability (see the next section). According to Jaynes, the main problem with the rule of succession is that it is not valid when ''s'' = 0 or ''s'' = ''n'' (see rule of succession, for an analysis of its validity).
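Karl Pearson's 50% figure quoted above can be checked directly: under the uniform prior, the probability that a further run of ''n'' + 1 trials is entirely successful is the telescoping product \prod_{k=0}^{n} \frac{n+1+k}{n+2+k} = \frac{n+1}{2n+2} = \tfrac{1}{2}. A minimal Python sketch (illustrative only) confirms this numerically:

def prob_next_run_all_successes(n):
    """P(next n+1 trials all succeed | n successes in n trials, uniform Beta(1,1) prior)."""
    p = 1.0
    for k in range(n + 1):
        # After n + k observed successes in n + k trials, the Laplace rule gives (n + k + 1)/(n + k + 2)
        p *= (n + 1 + k) / (n + 2 + k)
    return p

print([round(prob_next_run_all_successes(n), 6) for n in (1, 5, 50)])  # each value is 0.5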


Bayes-Laplace prior probability (Beta(1,1))

The beta distribution achieves maximum differential entropy for Beta(1,1): the uniform probability density, for which all values in the domain of the distribution have equal density. This uniform distribution Beta(1,1) was suggested ("with a great deal of doubt") by Thomas Bayes as the prior probability distribution to express ignorance about the correct prior distribution. This prior distribution was adopted (apparently, from his writings, with little sign of doubt) by Pierre-Simon Laplace, and hence it was also known as the "Bayes-Laplace rule" or the "Laplace rule" of "inverse probability" in publications of the first half of the 20th century. In the later part of the 19th century and early part of the 20th century, scientists realized that the assumption of uniform "equal" probability density depended on the actual functions (for example whether a linear or a logarithmic scale was most appropriate) and parametrizations used. In particular, the behavior near the ends of distributions with finite support (for example near ''x'' = 0, for a distribution with initial support at ''x'' = 0) required particular attention. Keynes (Ch. XXX, p. 381) criticized the use of Bayes's uniform prior probability (Beta(1,1)), under which all values between zero and one are equiprobable, as follows: "Thus experience, if it shows anything, shows that there is a very marked clustering of statistical ratios in the neighborhoods of zero and unity, of those for positive theories and for correlations between positive qualities in the neighborhood of zero, and of those for negative theories and for correlations between negative qualities in the neighborhood of unity."


Haldane's prior probability (Beta(0,0))

The Beta(0,0) distribution was proposed by J. B. S. Haldane, who suggested that the prior probability representing complete uncertainty should be proportional to 1/(''p''(1 − ''p'')). The function 1/(''p''(1 − ''p'')) can be viewed as the limit of the numerator of the beta distribution as both shape parameters approach zero: α, β → 0. The Beta function (in the denominator of the beta distribution) approaches infinity as both parameters approach zero, α, β → 0. Therefore, 1/(''p''(1 − ''p'')) divided by the Beta function approaches a 2-point Bernoulli distribution with equal probability 1/2 at each end, at 0 and 1, and nothing in between, as α, β → 0: a coin toss, with one face of the coin at 0 and the other face at 1. The Haldane prior probability distribution Beta(0,0) is an "improper prior" because its integral over (0, 1) diverges, due to the singularities at each end. However, this is not an issue for computing posterior probabilities unless the sample size is very small. Furthermore, Zellner points out that on the log-odds scale (the logit transformation ln(''p''/(1 − ''p''))), the Haldane prior is the uniformly flat prior. The fact that a uniform prior probability on the logit-transformed variable ln(''p''/(1 − ''p'')) (with domain (−∞, ∞)) is equivalent to the Haldane prior on the domain [0, 1] was pointed out by Harold Jeffreys in the first edition (1939) of his book Theory of Probability (p. 123). Jeffreys writes "Certainly if we take the Bayes-Laplace rule right up to the extremes we are led to results that do not correspond to anybody's way of thinking. The (Haldane) rule d''x''/(''x''(1−''x'')) goes too far the other way. It would lead to the conclusion that if a sample is of one type with respect to some property there is a probability 1 that the whole population is of that type." The fact that "uniform" depends on the parametrization led Jeffreys to seek a form of prior that would be invariant under different parametrizations.
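A one-line change-of-variables check of the logit equivalence mentioned above: if the log-odds ''y'' = ln(''p''/(1 − ''p'')) carries a flat prior, the induced density on ''p'' is the Haldane form,

:\begin{align}
y &= \ln\frac{p}{1-p}, \qquad \frac{dy}{dp} = \frac{1}{p(1-p)}, \\
\pi(p) &\propto \pi(y)\left|\frac{dy}{dp}\right| \propto 1 \cdot \frac{1}{p(1-p)} = p^{-1}(1-p)^{-1}.
\end{align}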


Jeffreys' prior probability (Beta(1/2,1/2) for a Bernoulli or for a binomial distribution)

Harold Jeffreys proposed to use an uninformative prior probability measure that should be invariant under reparameterization: proportional to the square root of the determinant of Fisher's information matrix. For the Bernoulli distribution, this can be shown as follows: for a coin that is "heads" with probability ''p'' ∈ [0, 1] and is "tails" with probability 1 − ''p'', for a given (''H'', ''T'') ∈ {(0, 1), (1, 0)} the probability is ''p''^''H''(1 − ''p'')^''T''. Since ''T'' = 1 − ''H'', the Bernoulli distribution is ''p''^''H''(1 − ''p'')^(1 − ''H''). Considering ''p'' as the only parameter, it follows that the log likelihood for the Bernoulli distribution is

:\ln \mathcal{L}(p\mid H) = H \ln(p)+ (1-H) \ln(1-p).

The Fisher information matrix has only one component (it is a scalar, because there is only one parameter: ''p''), therefore:

:\begin{align}
\sqrt{\mathcal{I}(p)} &= \sqrt{\operatorname{E}\!\left[\left(\frac{d}{dp} \ln \mathcal{L}(p\mid H)\right)^2\right]} \\
&= \sqrt{\operatorname{E}\!\left[\left(\frac{H}{p} - \frac{1-H}{1-p}\right)^2\right]} \\
&= \sqrt{p \left(\frac{1}{p}\right)^2 + (1-p)\left(\frac{1}{1-p}\right)^2} \\
&= \frac{1}{\sqrt{p(1-p)}}.
\end{align}

Similarly, for the binomial distribution with ''n'' Bernoulli trials, it can be shown that

:\sqrt{\mathcal{I}(p)} = \frac{\sqrt{n}}{\sqrt{p(1-p)}}.

Thus, for the Bernoulli and binomial distributions, Jeffreys prior is proportional to \scriptstyle \frac{1}{\sqrt{p(1-p)}}, which happens to be proportional to a beta distribution with domain variable ''x'' = ''p'', and shape parameters α = β = 1/2, the arcsine distribution:

:\operatorname{Beta}(\tfrac{1}{2}, \tfrac{1}{2}) = \frac{1}{\pi \sqrt{x(1-x)}}.

It will be shown in the next section that the normalizing constant for Jeffreys prior is immaterial to the final result because the normalizing constant cancels out in Bayes theorem for the posterior probability. Hence Beta(1/2,1/2) is used as the Jeffreys prior for both Bernoulli and binomial distributions. As shown in the next section, when using this expression as a prior probability times the likelihood in Bayes theorem, the posterior probability turns out to be a beta distribution. It is important to realize, however, that Jeffreys prior is proportional to \scriptstyle \frac{1}{\sqrt{p(1-p)}} for the Bernoulli and binomial distribution, but not for the beta distribution. Jeffreys prior for the beta distribution is given by the determinant of Fisher's information for the beta distribution, which, as shown in the section on the Fisher information matrix, is a function of the trigamma function ψ1 of shape parameters α and β as follows:

: \begin{align}
\sqrt{\det(\mathcal{I}(\alpha,\beta))} &= \sqrt{\psi_1(\alpha)\,\psi_1(\beta) - \bigl(\psi_1(\alpha)+\psi_1(\beta)\bigr)\,\psi_1(\alpha+\beta)} \\
\lim_{\alpha,\beta \to 0} \sqrt{\det(\mathcal{I}(\alpha,\beta))} &= \infty \\
\lim_{\alpha,\beta \to \infty} \sqrt{\det(\mathcal{I}(\alpha,\beta))} &= 0
\end{align}

As previously discussed, Jeffreys prior for the Bernoulli and binomial distributions is proportional to the arcsine distribution Beta(1/2,1/2), a one-dimensional ''curve'' that looks like a basin as a function of the parameter ''p'' of the Bernoulli and binomial distributions. The walls of the basin are formed by ''p'' approaching the singularities at the ends ''p'' → 0 and ''p'' → 1, where Beta(1/2,1/2) approaches infinity. Jeffreys prior for the beta distribution is a ''2-dimensional surface'' (embedded in a three-dimensional space) that looks like a basin with only two of its walls meeting at the corner α = β = 0 (and missing the other two walls) as a function of the shape parameters α and β of the beta distribution. The two adjoining walls of this 2-dimensional surface are formed by the shape parameters α and β approaching the singularities (of the trigamma function) at α, β → 0. It has no walls for α, β → ∞ because in this case the determinant of Fisher's information matrix for the beta distribution approaches zero. It will be shown in the next section that Jeffreys prior probability results in posterior probabilities (when multiplied by the binomial likelihood function) that are intermediate between the posterior probability results of the Haldane and Bayes prior probabilities. Jeffreys prior may be difficult to obtain analytically, and for some cases it just doesn't exist (even for simple distribution functions like the asymmetric triangular distribution). Berger, Bernardo and Sun, in a 2009 paper, defined a reference prior probability distribution that (unlike Jeffreys prior) exists for the asymmetric triangular distribution. They cannot obtain a closed-form expression for their reference prior, but numerical calculations show it to be nearly perfectly fitted by the (proper) prior

: \operatorname{Beta}(\tfrac{1}{2}, \tfrac{1}{2}) \sim \frac{1}{\pi \sqrt{\theta(1-\theta)}}

where θ is the vertex variable for the asymmetric triangular distribution with support [0, 1] (corresponding to the following parameter values in Wikipedia's article on the triangular distribution: vertex ''c'' = ''θ'', left end ''a'' = 0, and right end ''b'' = 1). Berger et al. also give a heuristic argument that Beta(1/2,1/2) could indeed be the exact Berger–Bernardo–Sun reference prior for the asymmetric triangular distribution. Therefore, Beta(1/2,1/2) not only is Jeffreys prior for the Bernoulli and binomial distributions, but also seems to be the Berger–Bernardo–Sun reference prior for the asymmetric triangular distribution (for which the Jeffreys prior does not exist), a distribution used in project management and PERT analysis to describe the cost and duration of project tasks. Clarke and Barron prove that, among continuous positive priors, Jeffreys prior (when it exists) asymptotically maximizes Shannon's mutual information between a sample of size ''n'' and the parameter, and therefore ''Jeffreys prior is the most uninformative prior'' (measuring information as Shannon information). The proof rests on an examination of the Kullback–Leibler divergence between probability density functions for iid random variables.
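A small numerical sketch of the two objects discussed above (assuming SciPy is available; the parameter values are illustrative): the Jeffreys prior for the Bernoulli parameter, proportional to 1/\sqrt{p(1-p)}, and the square root of the determinant of the beta distribution's Fisher information, expressed through the trigamma function ψ1:

import numpy as np
from scipy.special import polygamma

def jeffreys_bernoulli(p):
    """Unnormalized Jeffreys prior for the Bernoulli/binomial parameter p."""
    return 1.0 / np.sqrt(p * (1.0 - p))

def sqrt_det_fisher_beta(a, b):
    """sqrt of det(Fisher information) of Beta(a, b), via the trigamma function psi_1."""
    psi1 = lambda x: polygamma(1, x)
    return np.sqrt(psi1(a) * psi1(b) - (psi1(a) + psi1(b)) * psi1(a + b))

print(jeffreys_bernoulli(np.array([0.1, 0.5, 0.9])))                      # basin shape: large near the ends
print(sqrt_det_fisher_beta(0.5, 0.5), sqrt_det_fisher_beta(10.0, 10.0))   # decreases as alpha, beta grow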


Effect of different prior probability choices on the posterior beta distribution

If samples are drawn from the population of a random variable ''X'' that result in ''s'' successes and ''f'' failures in ''n'' Bernoulli trials ''n'' = ''s'' + ''f'', then the likelihood function for parameters ''s'' and ''f'' given ''x'' = ''p'' (the notation ''x'' = ''p'' in the expressions below will emphasize that the domain ''x'' stands for the value of the parameter ''p'' in the binomial distribution) is the following binomial distribution:

:\mathcal{L}(s,f\mid x=p) = {s+f \choose s} x^s(1-x)^f = {n \choose s} x^s(1-x)^{n-s}.

If beliefs about prior probability information are reasonably well approximated by a beta distribution with parameters ''α''Prior and ''β''Prior, then:

:\operatorname{PriorProbability}(x=p;\alpha\operatorname{Prior},\beta\operatorname{Prior}) = \frac{x^{\alpha\operatorname{Prior}-1}(1-x)^{\beta\operatorname{Prior}-1}}{\Beta(\alpha\operatorname{Prior},\beta\operatorname{Prior})}

According to Bayes' theorem for a continuous event space, the posterior probability is given by the product of the prior probability and the likelihood function (given the evidence ''s'' and ''f'' = ''n'' − ''s''), normalized so that the area under the curve equals one, as follows:

:\begin{align}
& \operatorname{posterior probability}(x=p\mid s,n-s) \\
= {} & \frac{\operatorname{PriorProbability}(x=p;\alpha\operatorname{Prior},\beta\operatorname{Prior})\, \mathcal{L}(s,f\mid x=p)}{\int_0^1 \operatorname{PriorProbability}(x=p;\alpha\operatorname{Prior},\beta\operatorname{Prior})\, \mathcal{L}(s,f \mid x=p) \, dx} \\
= {} & \frac{{n \choose s}\, x^{s+\alpha\operatorname{Prior}-1}(1-x)^{n-s+\beta\operatorname{Prior}-1} / \Beta(\alpha\operatorname{Prior},\beta\operatorname{Prior})}{\int_0^1 \left({n \choose s}\, x^{s+\alpha\operatorname{Prior}-1}(1-x)^{n-s+\beta\operatorname{Prior}-1} / \Beta(\alpha\operatorname{Prior},\beta\operatorname{Prior})\right) dx} \\
= {} & \frac{x^{s+\alpha\operatorname{Prior}-1}(1-x)^{n-s+\beta\operatorname{Prior}-1}}{\int_0^1 x^{s+\alpha\operatorname{Prior}-1}(1-x)^{n-s+\beta\operatorname{Prior}-1}\, dx} \\
= {} & \frac{x^{s+\alpha\operatorname{Prior}-1}(1-x)^{n-s+\beta\operatorname{Prior}-1}}{\Beta(s+\alpha\operatorname{Prior},\,n-s+\beta\operatorname{Prior})}.
\end{align}

The binomial coefficient

:{n \choose s}=\frac{n!}{s!\,(n-s)!}=\frac{\Gamma(n+1)}{\Gamma(s+1)\,\Gamma(n-s+1)}
appears both in the numerator and the denominator of the posterior probability, and it does not depend on the integration variable ''x'', hence it cancels out, and it is irrelevant to the final result. Similarly the normalizing factor for the prior probability, the beta function B(''α''Prior, ''β''Prior), cancels out and it is immaterial to the final result. The same posterior probability result can be obtained if one uses an un-normalized prior

:x^{\alpha\operatorname{Prior}-1}(1-x)^{\beta\operatorname{Prior}-1}

because the normalizing factors all cancel out. Several authors (including Jeffreys himself) thus use an un-normalized prior formula since the normalization constant cancels out. The numerator of the posterior probability ends up being just the (un-normalized) product of the prior probability and the likelihood function, and the denominator is its integral from zero to one. The beta function in the denominator, B(''s'' + ''α''Prior, ''n'' − ''s'' + ''β''Prior), appears as a normalization constant to ensure that the total posterior probability integrates to unity. The ratio ''s''/''n'' of the number of successes to the total number of trials is a sufficient statistic in the binomial case, which is relevant for the following results. For the Bayes' prior probability (Beta(1,1)), the posterior probability is:

:\operatorname{posterior probability}(p=x\mid s,f) = \frac{x^{s}(1-x)^{n-s}}{\Beta(s+1,\,n-s+1)}, \text{ with mean } = \frac{s+1}{n+2}, \text{ (and mode } = \frac{s}{n} \text{ if } 0 < s < n).

For the Jeffreys' prior probability (Beta(1/2,1/2)), the posterior probability is:

:\operatorname{posterior probability}(p=x\mid s,f) = \frac{x^{s-\frac{1}{2}}(1-x)^{n-s-\frac{1}{2}}}{\Beta(s+\frac{1}{2},\,n-s+\frac{1}{2})}, \text{ with mean } = \frac{s+\frac{1}{2}}{n+1}, \text{ (and mode } = \frac{s-\frac{1}{2}}{n-1} \text{ if } \tfrac{1}{2} < s < n-\tfrac{1}{2}).

and for the Haldane prior probability (Beta(0,0)), the posterior probability is:

:\operatorname{posterior probability}(p=x\mid s,f) = \frac{x^{s-1}(1-x)^{n-s-1}}{\Beta(s,\,n-s)}, \text{ with mean } = \frac{s}{n}, \text{ (and mode } = \frac{s-1}{n-2} \text{ if } 1 < s < n -1).

From the above expressions it follows that for ''s''/''n'' = 1/2 all three prior probabilities result in the identical location for the posterior probability mean = mode = 1/2. For ''s''/''n'' < 1/2, the means of the posterior probabilities, using the above priors, are ordered as: mean for Bayes prior > mean for Jeffreys prior > mean for Haldane prior. For ''s''/''n'' > 1/2 the order of these inequalities is reversed, such that the Haldane prior probability results in the largest posterior mean. The ''Haldane'' prior probability Beta(0,0) results in a posterior probability density with ''mean'' (the expected value for the probability of success in the "next" trial) identical to the ratio ''s''/''n'' of the number of successes to the total number of trials. Therefore, the Haldane prior results in a posterior probability with expected value in the next trial equal to the maximum likelihood estimate. The ''Bayes'' prior probability Beta(1,1) results in a posterior probability density with ''mode'' identical to the ratio ''s''/''n'' (the maximum likelihood estimate). In the case that 100% of the trials have been successful (''s'' = ''n''), the ''Bayes'' prior probability Beta(1,1) results in a posterior expected value equal to the rule of succession (''n'' + 1)/(''n'' + 2), while the Haldane prior Beta(0,0) results in a posterior expected value of 1 (absolute certainty of success in the next trial). Jeffreys prior probability results in a posterior expected value equal to (''n'' + 1/2)/(''n'' + 1). Perks (p. 303) points out: "This provides a new rule of succession and expresses a 'reasonable' position to take up, namely, that after an unbroken run of n successes we assume a probability for the next trial equivalent to the assumption that we are about half-way through an average run, i.e. that we expect a failure once in (2''n'' + 2) trials.
The Bayes–Laplace rule implies that we are about at the end of an average run or that we expect a failure once in (''n'' + 2) trials. The comparison clearly favours the new result (what is now called Jeffreys prior) from the point of view of 'reasonableness'." Conversely, in the case that 100% of the trials have resulted in failure (''s'' = 0), the ''Bayes'' prior probability Beta(1,1) results in a posterior expected value for success in the next trial equal to 1/(''n'' + 2), while the Haldane prior Beta(0,0) results in a posterior expected value of success in the next trial of 0 (absolute certainty of failure in the next trial). Jeffreys prior probability results in a posterior expected value for success in the next trial equal to (1/2)/(''n'' + 1), which Perks (p. 303) points out: "is a much more reasonably remote result than the Bayes-Laplace result 1/(''n'' + 2)". Jaynes questions (for the uniform prior Beta(1,1)) the use of these formulas for the cases ''s'' = 0 or ''s'' = ''n'' because the integrals do not converge (Beta(1,1) is an improper prior for ''s'' = 0 or ''s'' = ''n''). In practice, the conditions 0 < ''s'' < ''n'' are usually satisfied. Perks (p. 303) shows that, for what is now known as the Jeffreys prior, the probability that a further run of (''n'' + 1) trials will all be successes, after ''n'' successes in ''n'' trials, is ((''n'' + 1/2)/(''n'' + 1))((''n'' + 3/2)/(''n'' + 2))...((2''n'' + 1/2)/(2''n'' + 1)), which for ''n'' = 1, 2, 3 gives 15/24, 315/480, 9009/13440, rapidly approaching a limiting value of 1/\sqrt{2} = 0.70710678\ldots as ''n'' tends to infinity. Perks remarks that what is now known as the Jeffreys prior: "is clearly more 'reasonable' than either the Bayes-Laplace result or the result on the (Haldane) alternative rule rejected by Jeffreys which gives certainty as the probability. It clearly provides a very much better correspondence with the process of induction. Whether it is 'absolutely' reasonable for the purpose, i.e. whether it is yet large enough, without the absurdity of reaching unity, is a matter for others to decide. But it must be realized that the result depends on the assumption of complete indifference and absence of knowledge prior to the sampling experiment." Following are the variances of the posterior distribution obtained with these three prior probability distributions. For the Bayes' prior probability (Beta(1,1)), the posterior variance is:

:\text{variance} = \frac{(s+1)(n-s+1)}{(n+2)^2(n+3)}, \text{ which for } s=\frac{n}{2} \text{ results in variance} =\frac{1}{4(n+3)}

for the Jeffreys' prior probability (Beta(1/2,1/2)), the posterior variance is:

:\text{variance} = \frac{(s+\frac{1}{2})(n-s+\frac{1}{2})}{(n+1)^2(n+2)}, \text{ which for } s=\frac{n}{2} \text{ results in variance} = \frac{1}{4(n+2)}

and for the Haldane prior probability (Beta(0,0)), the posterior variance is:

:\text{variance} = \frac{s(n-s)}{n^2(n+1)}, \text{ which for } s=\frac{n}{2} \text{ results in variance} =\frac{1}{4(n+1)}

So, as remarked by Silvey, for large ''n'', the variance is small and hence the posterior distribution is highly concentrated, whereas the assumed prior distribution was very diffuse. This is in accord with what one would hope for, as vague prior knowledge is transformed (through Bayes theorem) into more precise posterior knowledge by an informative experiment. For small ''n'' the Haldane Beta(0,0) prior results in the largest posterior variance while the Bayes Beta(1,1) prior results in the more concentrated posterior. Jeffreys prior Beta(1/2,1/2) results in a posterior variance in between the other two. As ''n'' increases, the variance rapidly decreases so that the posterior variance for all three priors converges to approximately the same value (approaching zero variance as ''n'' → ∞). 
Recalling the previous result that the ''Haldane'' prior probability Beta(0,0) results in a posterior probability density with ''mean'' (the expected value for the probability of success in the "next" trial) identical to the ratio ''s''/''n'' of the number of successes to the total number of trials, it follows from the above expression that also the ''Haldane'' prior Beta(0,0) results in a posterior with ''variance'' identical to the variance expressed in terms of the maximum-likelihood estimate ''s''/''n'' and sample size (in the parametrization in terms of mean and sample size):

:\text{variance} = \frac{\mu(1-\mu)}{1+\nu} = \frac{\frac{s}{n}\left(1-\frac{s}{n}\right)}{1+n}

with the mean ''μ'' = ''s''/''n'' and the sample size ''ν'' = ''n''. In Bayesian inference, using a prior distribution Beta(''α''Prior,''β''Prior) prior to a binomial distribution is equivalent to adding (''α''Prior − 1) pseudo-observations of "success" and (''β''Prior − 1) pseudo-observations of "failure" to the actual number of successes and failures observed, then estimating the parameter ''p'' of the binomial distribution by the proportion of successes over both real and pseudo-observations. A uniform prior Beta(1,1) does not add (or subtract) any pseudo-observations since for Beta(1,1) it follows that (''α''Prior − 1) = 0 and (''β''Prior − 1) = 0. The Haldane prior Beta(0,0) subtracts one pseudo-observation from each and Jeffreys prior Beta(1/2,1/2) subtracts 1/2 pseudo-observation of success and an equal number of failure. This subtraction has the effect of smoothing out the posterior distribution. If the proportion of successes is not 50% (''s''/''n'' ≠ 1/2), values of ''α''Prior and ''β''Prior less than 1 (and therefore negative (''α''Prior − 1) and (''β''Prior − 1)) favor sparsity, i.e. distributions where the parameter ''p'' is closer to either 0 or 1. In effect, values of ''α''Prior and ''β''Prior between 0 and 1, when operating together, function as a concentration parameter. The accompanying plots show the posterior probability density functions for several sample sizes ''n'', numbers of successes ''s'', and choices of Beta(''α''Prior,''β''Prior), including cases with very small sample size. The first plot shows the symmetric cases, with successes ''s'' = ''n''/2 and mean = mode = 1/2, and the second plot shows the skewed cases. The images show that there is little difference between the priors for the posterior with sample size of 50 (characterized by a more pronounced peak near ''p'' = 1/2). Significant differences appear for very small sample sizes (in particular for the flatter distribution for the degenerate case of sample size = 3). Therefore, the skewed cases show a larger effect from the choice of prior, at small sample size, than the symmetric cases. For symmetric distributions, the Bayes prior Beta(1,1) results in the most "peaky" and highest posterior distributions and the Haldane prior Beta(0,0) results in the flattest and lowest-peak distribution. The Jeffreys prior Beta(1/2,1/2) lies in between them. For nearly symmetric, not too skewed distributions the effect of the priors is similar. For very small sample size (in this case for a sample size of 3) and a skewed distribution, the Haldane prior can result in a reverse-J-shaped distribution with a singularity at the left end. 
However, this happens only in degenerate cases (in this example ''n'' = 3 and hence ''s'' = 3/4 < 1, a degenerate value because ''s'' should be greater than unity in order for the posterior of the Haldane prior to have a mode located between the ends, and because ''s'' = 3/4 is not an integer number, hence it violates the initial assumption of a binomial distribution for the likelihood) and it is not an issue in generic cases of reasonable sample size (such that the condition 1 < ''s'' < ''n'' − 1, necessary for a mode to exist between both ends, is fulfilled). In Chapter 12 (p. 385) of his book, Jaynes asserts that the ''Haldane prior'' Beta(0,0) describes a ''prior state of knowledge of complete ignorance'', where we are not even sure whether it is physically possible for an experiment to yield either a success or a failure, while the ''Bayes (uniform) prior Beta(1,1) applies if'' one knows that ''both binary outcomes are possible''. Jaynes states: "''interpret the Bayes-Laplace (Beta(1,1)) prior as describing not a state of complete ignorance'', but the state of knowledge in which we have observed one success and one failure...once we have seen at least one success and one failure, then we know that the experiment is a true binary one, in the sense of physical possibility." Jaynes does not specifically discuss Jeffreys prior Beta(1/2,1/2) (Jaynes's discussion of "Jeffreys prior" on pp. 181, 423 and in chapter 12 of his book refers instead to the improper, un-normalized, prior "1/''p'' ''dp''" introduced by Jeffreys in the 1939 edition of his book, seven years before he introduced what is now known as Jeffreys' invariant prior: the square root of the determinant of Fisher's information matrix. ''"1/p" is Jeffreys' (1946) invariant prior for the exponential distribution, not for the Bernoulli or binomial distributions''). However, it follows from the above discussion that Jeffreys Beta(1/2,1/2) prior represents a state of knowledge in between the Haldane Beta(0,0) and Bayes Beta(1,1) priors. Similarly, Karl Pearson in his 1892 book The Grammar of Science (p. 144 of the 1900 edition) maintained that the Bayes (Beta(1,1)) uniform prior was not a complete-ignorance prior, and that it should be used only when prior information justified the decision to "distribute our ignorance equally". K. Pearson wrote: "Yet the only supposition that we appear to have made is this: that, knowing nothing of nature, routine and anomy (from the Greek ανομία, namely: a- "without", and nomos "law") are to be considered as equally likely to occur. Now we were not really justified in making even this assumption, for it involves a knowledge that we do not possess regarding nature. We use our ''experience'' of the constitution and action of coins in general to assert that heads and tails are equally probable, but we have no right to assert before experience that, as we know nothing of nature, routine and breach are equally probable. In our ignorance we ought to consider before experience that nature may consist of all routines, all anomies (normlessness), or a mixture of the two in any proportion whatever, and that all such are equally probable. Which of these constitutions after experience is the most probable must clearly depend on what that experience has been like." 
If there is sufficient sampling data, ''and the posterior probability mode is not located at one of the extremes of the domain'' (''x'' = 0 or ''x'' = 1), the three priors of Bayes (Beta(1,1)), Jeffreys (Beta(1/2,1/2)) and Haldane (Beta(0,0)) should yield similar ''posterior'' probability densities. Otherwise, as Gelman et al. (p. 65) point out, "if so few data are available that the choice of noninformative prior distribution makes a difference, one should put relevant information into the prior distribution", or as Berger (p. 125) points out, "when different reasonable priors yield substantially different answers, can it be right to state that there ''is'' a single answer? Would it not be better to admit that there is scientific uncertainty, with the conclusion depending on prior beliefs?"
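To make the comparison above concrete, a short sketch (assuming SciPy is available; the data ''s'', ''n'' are hypothetical) computes the posterior mean, mode and variance under the Bayes, Jeffreys and Haldane priors:

from scipy import stats

s, n = 2, 10   # hypothetical data: 2 successes in 10 trials

for name, a0, b0 in [("Bayes Beta(1,1)", 1.0, 1.0),
                     ("Jeffreys Beta(1/2,1/2)", 0.5, 0.5),
                     ("Haldane Beta(0,0)", 0.0, 0.0)]:
    a, b = a0 + s, b0 + (n - s)          # conjugate update
    post = stats.beta(a, b)
    mode = (a - 1) / (a + b - 2)          # valid here since a, b > 1 for these data
    print(name, post.mean(), mode, post.var())

For these data (''s''/''n'' < 1/2) the printed means reproduce the ordering stated above: Bayes > Jeffreys > Haldane, with the Haldane posterior mean equal to ''s''/''n''.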


Occurrence and applications


Order statistics

The beta distribution has an important application in the theory of order statistics. A basic result is that the distribution of the ''k''th smallest of a sample of size ''n'' from a continuous uniform distribution has a beta distribution (David, H. A., Nagaraja, H. N. (2003) ''Order Statistics'' (3rd Edition). Wiley, New Jersey, p. 458). This result is summarized as:

:U_{(k)} \sim \operatorname{Beta}(k,\,n+1-k).

From this, and application of the theory related to the probability integral transform, the distribution of any individual order statistic from any continuous distribution can be derived.
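A quick simulation check of this order-statistic result (assuming NumPy and SciPy are available; the values of ''n'' and ''k'' are illustrative): the ''k''-th smallest of ''n'' uniform draws should match Beta(''k'', ''n'' + 1 − ''k''):

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, k = 10, 3
samples = np.sort(rng.uniform(size=(100_000, n)), axis=1)[:, k - 1]  # k-th smallest of each row

# Compare empirical mean/variance with Beta(k, n + 1 - k)
ref = stats.beta(k, n + 1 - k)
print(samples.mean(), ref.mean())   # both close to k/(n + 1) = 0.2727...
print(samples.var(), ref.var())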


Subjective logic

In standard logic, propositions are considered to be either true or false. In contradistinction, subjective logic assumes that humans cannot determine with absolute certainty whether a proposition about the real world is absolutely true or false. In subjective logic the ''a posteriori'' probability estimates of binary events can be represented by beta distributions (A. Jøsang, "A Logic for Uncertain Probabilities", ''International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems'', 9(3), pp. 279-311, June 2001).


Wavelet analysis

A wavelet is a wave-like oscillation with an amplitude that starts out at zero, increases, and then decreases back to zero. It can typically be visualized as a "brief oscillation" that promptly decays. Wavelets can be used to extract information from many different kinds of data, including – but certainly not limited to – audio signals and images. Thus, wavelets are purposefully crafted to have specific properties that make them useful for signal processing. Wavelets are localized in both time and frequency, whereas the standard Fourier transform is only localized in frequency. Therefore, standard Fourier transforms are only applicable to stationary processes, while wavelets are applicable to non-stationary processes. Continuous wavelets can be constructed based on the beta distribution. Beta wavelets (H. M. de Oliveira and G. A. A. Araújo, "Compactly Supported One-cyclic Wavelets Derived from Beta Distributions", ''Journal of Communication and Information Systems'', vol. 20, n. 3, pp. 27-33, 2005) can be viewed as a soft variety of Haar wavelets whose shape is fine-tuned by two shape parameters α and β.


Population genetics

The Balding–Nichols model is a two-parameter parametrization of the beta distribution used in population genetics. It is a statistical description of the allele frequencies in the components of a sub-divided population:

: \begin{align}
\alpha &= \mu \nu,\\
\beta &= (1 - \mu) \nu,
\end{align}

where \nu = \alpha+\beta = \frac{1-F}{F} and 0 < ''F'' < 1; here ''F'' is (Wright's) genetic distance between two populations.
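As a small illustration of this parametrization (the numerical values are hypothetical), the mapping from Wright's ''F'' and the mean allele frequency μ to the beta shape parameters is direct:

def balding_nichols_params(F, mu):
    """Convert (F, mu) to the Beta(alpha, beta) shape parameters, for 0 < F < 1."""
    nu = (1.0 - F) / F          # nu = alpha + beta
    return mu * nu, (1.0 - mu) * nu

print(balding_nichols_params(F=0.1, mu=0.3))   # (2.7, 6.3) for these illustrative values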


Project management: task cost and schedule modeling

The beta distribution can be used to model events which are constrained to take place within an interval defined by a minimum and maximum value. For this reason, the beta distribution — along with the triangular distribution — is used extensively in PERT, critical path method (CPM), Joint Cost Schedule Modeling (JCSM) and other project management/control systems to describe the time to completion and the cost of a task. In project management, shorthand computations are widely used to estimate the mean and standard deviation of the beta distribution:

: \begin{align}
\mu(X) & = \frac{a + 4b + c}{6} \\
\sigma(X) & = \frac{c-a}{6}
\end{align}

where ''a'' is the minimum, ''c'' is the maximum, and ''b'' is the most likely value (the mode for ''α'' > 1 and ''β'' > 1). The above estimate for the mean \mu(X)= \frac{a + 4b + c}{6} is known as the PERT three-point estimation and it is exact for either of the following values of ''β'' (for arbitrary α within these ranges):

:''β'' = ''α'' > 1 (symmetric case) with standard deviation \sigma(X) = \frac{c-a}{2\sqrt{2\alpha+1}}, skewness = 0, and excess kurtosis = \frac{-6}{2\alpha + 3}

or

:''β'' = 6 − ''α'' for 5 > ''α'' > 1 (skewed case) with standard deviation

:\sigma(X) = \frac{(c-a)\sqrt{\alpha(6-\alpha)}}{6\sqrt{7}},

:skewness = \frac{3-\alpha}{2}\sqrt{\frac{7}{\alpha(6-\alpha)}}, and excess kurtosis = \frac{21}{\alpha(6-\alpha)} - 3

The above estimate for the standard deviation ''σ''(''X'') = (''c'' − ''a'')/6 is exact for either of the following values of ''α'' and ''β'':

:''α'' = ''β'' = 4 (symmetric) with skewness = 0, and excess kurtosis = −6/11.
:''β'' = 6 − ''α'' and \alpha = 3 - \sqrt2 (right-tailed, positive skew) with skewness = \frac{1}{\sqrt{2}}, and excess kurtosis = 0
:''β'' = 6 − ''α'' and \alpha = 3 + \sqrt2 (left-tailed, negative skew) with skewness = \frac{-1}{\sqrt{2}}, and excess kurtosis = 0

Otherwise, these can be poor approximations for beta distributions with other values of α and β, exhibiting average errors of 40% in the mean and 549% in the variance.
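A small sketch of these shorthand PERT computations next to the exact moments of a beta distribution scaled to [''a'', ''c''] (the three-point values and shape parameters below are illustrative):

def pert_estimates(a, b, c):
    """Classic PERT three-point shorthand for the mean and standard deviation."""
    return (a + 4 * b + c) / 6, (c - a) / 6

def beta_moments(alpha, beta, a, c):
    """Exact mean and standard deviation of a beta distribution rescaled to [a, c]."""
    mean = a + (c - a) * alpha / (alpha + beta)
    var = (c - a) ** 2 * alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))
    return mean, var ** 0.5

# For alpha = beta = 4 on [a, c] the shorthand is exact, as stated above
a, c = 2.0, 14.0
b = (a + c) / 2                      # mode of the symmetric Beta(4,4,a,c)
print(pert_estimates(a, b, c))       # (8.0, 2.0)
print(beta_moments(4, 4, a, c))      # (8.0, 2.0)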


Random variate generation

If ''X'' and ''Y'' are independent, with X \sim \Gamma(\alpha, \theta) and Y \sim \Gamma(\beta, \theta) then

:\frac{X}{X+Y} \sim \Beta(\alpha, \beta).

So one algorithm for generating beta variates is to generate \frac{X}{X+Y}, where ''X'' is a gamma variate with parameters (α, 1) and ''Y'' is an independent gamma variate with parameters (β, 1). In fact, here \frac{X}{X+Y} and X+Y are independent, and X+Y \sim \Gamma(\alpha + \beta, \theta). If Z \sim \Gamma(\gamma, \theta) and Z is independent of X and Y, then \frac{X+Y}{X+Y+Z} \sim \Beta(\alpha+\beta,\gamma) and \frac{X+Y}{X+Y+Z} is independent of \frac{X}{X+Y}. This shows that the product of independent \Beta(\alpha,\beta) and \Beta(\alpha+\beta,\gamma) random variables is a \Beta(\alpha,\beta+\gamma) random variable. Also, the ''k''th order statistic of ''n'' uniformly distributed variates is \Beta(k, n+1-k), so an alternative if α and β are small integers is to generate α + β − 1 uniform variates and choose the α-th smallest. Another way to generate the beta distribution is by the Pólya urn model. According to this method, one starts with an "urn" with α "black" balls and β "white" balls and draws uniformly with replacement. At every trial an additional ball is added according to the color of the last ball which was drawn. Asymptotically, the proportion of black and white balls will be distributed according to the beta distribution, where each repetition of the experiment will produce a different value. It is also possible to use inverse transform sampling.
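A minimal sketch of the gamma-ratio algorithm described above (assuming NumPy is available; the shape parameters are illustrative):

import numpy as np

def beta_variates(alpha, beta, size, rng=None):
    """Generate Beta(alpha, beta) variates as X/(X+Y) from independent gamma variates."""
    rng = rng or np.random.default_rng()
    x = rng.gamma(shape=alpha, scale=1.0, size=size)
    y = rng.gamma(shape=beta, scale=1.0, size=size)
    return x / (x + y)

draws = beta_variates(2.0, 5.0, size=100_000, rng=np.random.default_rng(1))
print(draws.mean())   # close to alpha/(alpha + beta) = 2/7 = 0.2857...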


History

Thomas Bayes, in a posthumous paper published in 1763 by Richard Price, obtained a beta distribution as the density of the probability of success in Bernoulli trials (see the section on Bayesian inference above), but the paper does not analyze any of the moments of the beta distribution or discuss any of its properties. The first systematic modern discussion of the beta distribution is probably due to Karl Pearson. In Pearson's papers the beta distribution is couched as a solution of a differential equation: Pearson's Type I distribution, to which it is essentially identical except for arbitrary shifting and re-scaling (the beta and Pearson Type I distributions can always be equalized by proper choice of parameters). In fact, in several English books and journal articles in the few decades prior to World War II, it was common to refer to the beta distribution as Pearson's Type I distribution. William P. Elderton in his 1906 monograph "Frequency curves and correlation" further analyzes the beta distribution as Pearson's Type I distribution, including a full discussion of the method of moments for the four-parameter case, and diagrams of (what Elderton describes as) U-shaped, J-shaped, twisted J-shaped, "cocked-hat" shapes, horizontal and angled straight-line cases. Elderton wrote "I am chiefly indebted to Professor Pearson, but the indebtedness is of a kind for which it is impossible to offer formal thanks." Elderton's 1906 monograph provides an impressive amount of information on the beta distribution, including equations for the origin of the distribution chosen to be the mode, as well as for other Pearson distributions: types I through VII. Elderton also included a number of appendixes, including one appendix ("II") on the beta and gamma functions. In later editions, Elderton added equations for the origin of the distribution chosen to be the mean, and analysis of Pearson distributions VIII through XII. As remarked by Bowman and Shenton, "Fisher and Pearson had a difference of opinion in the approach to (parameter) estimation, in particular relating to (Pearson's method of) moments and (Fisher's method of) maximum likelihood in the case of the Beta distribution." Also according to Bowman and Shenton, "the case of a Type I (beta distribution) model being the center of the controversy was pure serendipity. A more difficult model of 4 parameters would have been hard to find." The long-running public conflict of Fisher with Karl Pearson can be followed in a number of articles in prestigious journals. For example, concerning the estimation of the four parameters for the beta distribution, and Fisher's criticism of Pearson's method of moments as being arbitrary, see Pearson's article "Method of moments and method of maximum likelihood" (published three years after his retirement from University College, London, where his position had been divided between Fisher and Pearson's son Egon) in which Pearson writes "I read (Koshai's paper in the Journal of the Royal Statistical Society, 1933) which as far as I am aware is the only case at present published of the application of Professor Fisher's method. To my astonishment that method depends on first working out the constants of the frequency curve by the (Pearson) Method of Moments and then superposing on it, by what Fisher terms "the Method of Maximum Likelihood" a further approximation to obtain, what he holds, he will thus get, "more efficient values" of the curve constants." 
David and Edwards's treatise on the history of statistics cites the first modern treatment of the beta distribution, in 1911, using the beta designation that has become standard, due to Corrado Gini, an Italian statistician, demographer, and sociologist, who developed the Gini coefficient. N. L. Johnson and S. Kotz, in their comprehensive and very informative monograph on leading historical personalities in statistical sciences, credit Corrado Gini as "an early Bayesian...who dealt with the problem of eliciting the parameters of an initial Beta distribution, by singling out techniques which anticipated the advent of the so-called empirical Bayes approach."


References


External links


"Beta Distribution"
by Fiona Maclachlan, the Wolfram Demonstrations Project, 2007.
Beta Distribution – Overview and Example
xycoon.com

brighton-webs.co.uk

exstrom.com
Harvard University Statistics 110 Lecture 23 Beta Distribution, Prof. Joe Blitzstein
{{DEFAULTSORT:Beta Distribution Continuous distributions Factorial and binomial topics Conjugate prior distributions Exponential family distributions]">X - E[X] &=\lim_ \operatorname ]_=_\frac_ The_mean_absolute_deviation_around_the_mean_is_a_more_robust_ Robustness_is_the_property_of_being_strong_and_healthy_in_constitution._When_it_is_transposed_into_a_system,_it_refers_to_the_ability_of_tolerating_perturbations_that_might_affect_the_system’s_functional_body._In_the_same_line_''robustness''_ca_...
_
estimator In statistics, an estimator is a rule for calculating an estimate of a given quantity based on observed data: thus the rule (the estimator), the quantity of interest (the estimand) and its result (the estimate) are distinguished. For example, the ...
_of_
statistical_dispersion In statistics, dispersion (also called variability, scatter, or spread) is the extent to which a distribution is stretched or squeezed. Common examples of measures of statistical dispersion are the variance, standard deviation, and interquartile ...
_than_the_standard_deviation_for_beta_distributions_with_tails_and_inflection_points_at_each_side_of_the_mode,_Beta(''α'', ''β'')_distributions_with_''α'',''β''_>_2,_as_it_depends_on_the_linear_(absolute)_deviations_rather_than_the_square_deviations_from_the_mean.__Therefore,_the_effect_of_very_large_deviations_from_the_mean_are_not_as_overly_weighted. Using_
Stirling's_approximation In mathematics, Stirling's approximation (or Stirling's formula) is an approximation for factorials. It is a good approximation, leading to accurate results even for small values of n. It is named after James Stirling, though a related but less p ...
_to_the_Gamma_function,_Norman_Lloyd_Johnson, N.L.Johnson_and_Samuel_Kotz, S.Kotz_derived_the_following_approximation_for_values_of_the_shape_parameters_greater_than_unity_(the_relative_error_for_this_approximation_is_only_−3.5%_for_''α''_=_''β''_=_1,_and_it_decreases_to_zero_as_''α''_→_∞,_''β''_→_∞): :_\begin \frac_&=\frac\\ &\approx_\sqrt_\left(1+\frac-\frac-\frac_\right),_\text_\alpha,_\beta_>_1. \end At_the_limit_α_→_∞,_β_→_∞,_the_ratio_of_the_mean_absolute_deviation_to_the_standard_deviation_(for_the_beta_distribution)_becomes_equal_to_the_ratio_of_the_same_measures_for_the_normal_distribution:_\sqrt.__For_α_=_β_=_1_this_ratio_equals_\frac,_so_that_from_α_=_β_=_1_to_α,_β_→_∞_the_ratio_decreases_by_8.5%.__For_α_=_β_=_0_the_standard_deviation_is_exactly_equal_to_the_mean_absolute_deviation_around_the_mean._Therefore,_this_ratio_decreases_by_15%_from_α_=_β_=_0_to_α_=_β_=_1,_and_by_25%_from_α_=_β_=_0_to_α,_β_→_∞_._However,_for_skewed_beta_distributions_such_that_α_→_0_or_β_→_0,_the_ratio_of_the_standard_deviation_to_the_mean_absolute_deviation_approaches_infinity_(although_each_of_them,_individually,_approaches_zero)_because_the_mean_absolute_deviation_approaches_zero_faster_than_the_standard_deviation. Using_the__parametrization_in_terms_of_mean_μ_and_sample_size_ν_=_α_+_β_>_0: :α_=_μν,_β_=_(1−μ)ν one_can_express_the_mean_absolute_deviation_around_the_mean_in_terms_of_the_mean_μ_and_the_sample_size_ν_as_follows: :\operatorname[, _X_-_E ]_=_\frac For_a_symmetric_distribution,_the_mean_is_at_the_middle_of_the_distribution,_μ_=_1/2,_and_therefore: :_\begin \operatorname[, X_-_E ]__=_\frac_&=_\frac_\\ \lim__\left_(\lim__\operatorname[, X_-_E ]_\right_)_&=_\tfrac\\ \lim__\left_(\lim__\operatorname[, _X_-_E ]_\right_)_&=_0 \end Also,_the_following_limits_(with_only_the_noted_variable_approaching_the_limit)_can_be_obtained_from_the_above_expressions: :_\begin \lim__\operatorname[, X_-_E ]_&=\lim__\operatorname[, X_-_E ]=_0_\\ \lim__\operatorname[, X_-_E ]_&=\lim__\operatorname[, X_-_E ]_=_0\\ \lim__\operatorname[, X_-_E ]&=\lim__\operatorname[, X_-_E ]_=_0\\ \lim__\operatorname[, X_-_E ]_&=_\sqrt_\\ \lim__\operatorname[, X_-_E ]_&=_0 \end


_Mean_absolute_difference

The_mean_absolute_difference_for_the_Beta_distribution_is: :\mathrm_=_\int_0^1_\int_0^1_f(x;\alpha,\beta)\,f(y;\alpha,\beta)\,, x-y, \,dx\,dy_=_\left(\frac\right)\frac The_Gini_coefficient_for_the_Beta_distribution_is_half_of_the_relative_mean_absolute_difference: :\mathrm_=_\left(\frac\right)\frac


_Skewness

The_skewness_ In_probability_theory_and_statistics,_skewness_is_a_measure_of_the_asymmetry_of_the_probability_distribution_of_a__real-valued_random_variable_about_its_mean._The_skewness_value_can_be_positive,_zero,_negative,_or_undefined. For_a_unimodal__...
_(the_third_moment_centered_on_the_mean,_normalized_by_the_3/2_power_of_the_variance)_of_the_beta_distribution_is :\gamma_1_=\frac_=_\frac_. Letting_α_=_β_in_the_above_expression_one_obtains_γ1_=_0,_showing_once_again_that_for_α_=_β_the_distribution_is_symmetric_and_hence_the_skewness_is_zero._Positive_skew_(right-tailed)_for_α_<_β,_negative_skew_(left-tailed)_for_α_>_β. Using_the__parametrization_in_terms_of_mean_μ_and_sample_size_ν_=_α_+_β: :_\begin __\alpha_&__=_\mu_\nu_,\text\nu_=(\alpha_+_\beta)__>0\\ __\beta_&__=_(1_-_\mu)_\nu_,_\text\nu_=(\alpha_+_\beta)__>0. \end one_can_express_the_skewness_in_terms_of_the_mean_μ_and_the_sample_size_ν_as_follows: :\gamma_1_=\frac_=_\frac. The_skewness_can_also_be_expressed_just_in_terms_of_the_variance_''var''_and_the_mean_μ_as_follows: :\gamma_1_=\frac_=_\frac\text_\operatorname_<_\mu(1-\mu) The_accompanying_plot_of_skewness_as_a_function_of_variance_and_mean_shows_that_maximum_variance_(1/4)_is_coupled_with_zero_skewness_and_the_symmetry_condition_(μ_=_1/2),_and_that_maximum_skewness_(positive_or_negative_infinity)_occurs_when_the_mean_is_located_at_one_end_or_the_other,_so_that_the_"mass"_of_the_probability_distribution_is_concentrated_at_the_ends_(minimum_variance). The_following_expression_for_the_square_of_the_skewness,_in_terms_of_the_sample_size_ν_=_α_+_β_and_the_variance_''var'',_is_useful_for_the_method_of_moments_estimation_of_four_parameters: :(\gamma_1)^2_=\frac_=_\frac\bigg(\frac-4(1+\nu)\bigg) This_expression_correctly_gives_a_skewness_of_zero_for_α_=_β,_since_in_that_case_(see_):_\operatorname_=_\frac. For_the_symmetric_case_(α_=_β),_skewness_=_0_over_the_whole_range,_and_the_following_limits_apply: :\lim__\gamma_1_=_\lim__\gamma_1_=\lim__\gamma_1=\lim__\gamma_1=\lim__\gamma_1_=_0 For_the_asymmetric_cases_(α_≠_β)_the_following_limits_(with_only_the_noted_variable_approaching_the_limit)_can_be_obtained_from_the_above_expressions: :_\begin &\lim__\gamma_1_=\lim__\gamma_1_=_\infty\\ &\lim__\gamma_1__=_\lim__\gamma_1=_-_\infty\\ &\lim__\gamma_1_=_-\frac,\quad_\lim_(\lim__\gamma_1)_=_-\infty,\quad_\lim_(\lim__\gamma_1)_=_0\\ &\lim__\gamma_1_=_\frac,\quad_\lim_(\lim__\gamma_1)_=_\infty,\quad_\lim_(\lim__\gamma_1)_=_0\\ &\lim__\gamma_1_=_\frac,\quad_\lim_(\lim__\gamma_1)__=_\infty,\quad_\lim_(\lim__\gamma_1)_=_-_\infty \end


_Kurtosis

The_beta_distribution_has_been_applied_in_acoustic_analysis_to_assess_damage_to_gears,_as_the_kurtosis_of_the_beta_distribution_has_been_reported_to_be_a_good_indicator_of_the_condition_of_a_gear.
_Kurtosis_has_also_been_used_to_distinguish_the_seismic_signal_generated_by_a_person's_footsteps_from_other_signals._As_persons_or_other_targets_moving_on_the_ground_generate_continuous_signals_in_the_form_of_seismic_waves,_one_can_separate_different_targets_based_on_the_seismic_waves_they_generate._Kurtosis_is_sensitive_to_impulsive_signals,_so_it's_much_more_sensitive_to_the_signal_generated_by_human_footsteps_than_other_signals_generated_by_vehicles,_winds,_noise,_etc.
__Unfortunately,_the_notation_for_kurtosis_has_not_been_standardized._Kenney_and_Keeping
__use_the_symbol_γ2_for_the_excess_kurtosis_ In_probability_theory_and_statistics,_kurtosis_(from__el,_κυρτός,_''kyrtos''_or_''kurtos'',_meaning_"curved,_arching")_is_a_measure_of_the_"tailedness"_of_the_probability_distribution_of_a_real-valued_random_variable._Like_skewness,_kurtosi_...
,_but_Abramowitz_and_Stegun
__use_different_terminology.__To_prevent_confusion
__between_kurtosis_(the_fourth_moment_centered_on_the_mean,_normalized_by_the_square_of_the_variance)_and_excess_kurtosis,_when_using_symbols,_they_will_be_spelled_out_as_follows:
:\begin \text _____&=\text_-_3\\ _____&=\frac-3\\ _____&=\frac\\ _____&=\frac _. \end Letting_α_=_β_in_the_above_expression_one_obtains :\text_=-_\frac_\text\alpha=\beta_. Therefore,_for_symmetric_beta_distributions,_the_excess_kurtosis_is_negative,_increasing_from_a_minimum_value_of_−2_at_the_limit_as__→_0,_and_approaching_a_maximum_value_of_zero_as__→_∞.__The_value_of_−2_is_the_minimum_value_of_excess_kurtosis_that_any_distribution_(not_just_beta_distributions,_but_any_distribution_of_any_possible_kind)_can_ever_achieve.__This_minimum_value_is_reached_when_all_the_probability_density_is_entirely_concentrated_at_each_end_''x''_=_0_and_''x''_=_1,_with_nothing_in_between:_a_2-point_Bernoulli_distribution_ In_probability_theory_and_statistics,_the_Bernoulli_distribution,_named_after_Swiss_mathematician__Jacob_Bernoulli,James_Victor_Uspensky:_''Introduction_to_Mathematical_Probability'',_McGraw-Hill,_New_York_1937,_page_45_is_the__discrete_probabi_...
_with_equal_probability_1/2_at_each_end_(a_coin_toss:_see_section_below_"Kurtosis_bounded_by_the_square_of_the_skewness"_for_further_discussion).__The_description_of_kurtosis_as_a_measure_of_the_"potential_outliers"_(or_"potential_rare,_extreme_values")_of_the_probability_distribution,_is_correct_for_all_distributions_including_the_beta_distribution._When_rare,_extreme_values_can_occur_in_the_beta_distribution,_the_higher_its_kurtosis;_otherwise,_the_kurtosis_is_lower._For_α_≠_β,_skewed_beta_distributions,_the_excess_kurtosis_can_reach_unlimited_positive_values_(particularly_for_α_→_0_for_finite_β,_or_for_β_→_0_for_finite_α)_because_the_side_away_from_the_mode_will_produce_occasional_extreme_values.__Minimum_kurtosis_takes_place_when_the_mass_density_is_concentrated_equally_at_each_end_(and_therefore_the_mean_is_at_the_center),_and_there_is_no_probability_mass_density_in_between_the_ends. Using_the__parametrization_in_terms_of_mean_μ_and_sample_size_ν_=_α_+_β: :_\begin __\alpha_&__=_\mu_\nu_,\text\nu_=(\alpha_+_\beta)__>0\\ __\beta_&__=_(1_-_\mu)_\nu_,_\text\nu_=(\alpha_+_\beta)__>0. \end one_can_express_the_excess_kurtosis_in_terms_of_the_mean_μ_and_the_sample_size_ν_as_follows: :\text_=\frac\bigg_(\frac_-_1_\bigg_) The_excess_kurtosis_can_also_be_expressed_in_terms_of_just_the_following_two_parameters:_the_variance_''var'',_and_the_sample_size_ν_as_follows: :\text_=\frac\left(\frac_-_6_-_5_\nu_\right)\text\text<_\mu(1-\mu) and,_in_terms_of_the_variance_''var''_and_the_mean_μ_as_follows: :\text_=\frac\text\text<_\mu(1-\mu) The_plot_of_excess_kurtosis_as_a_function_of_the_variance_and_the_mean_shows_that_the_minimum_value_of_the_excess_kurtosis_(−2,_which_is_the_minimum_possible_value_for_excess_kurtosis_for_any_distribution)_is_intimately_coupled_with_the_maximum_value_of_variance_(1/4)_and_the_symmetry_condition:_the_mean_occurring_at_the_midpoint_(μ_=_1/2)._This_occurs_for_the_symmetric_case_of_α_=_β_=_0,_with_zero_skewness.__At_the_limit,_this_is_the_2_point_Bernoulli_distribution_ In_probability_theory_and_statistics,_the_Bernoulli_distribution,_named_after_Swiss_mathematician__Jacob_Bernoulli,James_Victor_Uspensky:_''Introduction_to_Mathematical_Probability'',_McGraw-Hill,_New_York_1937,_page_45_is_the__discrete_probabi_...
_with_equal_probability_1/2_at_each__Dirac_delta_function_end_''x''_=_0_and_''x''_=_1_and_zero_probability_everywhere_else._(A_coin_toss:_one_face_of_the_coin_being_''x''_=_0_and_the_other_face_being_''x''_=_1.)__Variance_is_maximum_because_the_distribution_is_bimodal_with_nothing_in_between_the_two_modes_(spikes)_at_each_end.__Excess_kurtosis_is_minimum:_the_probability_density_"mass"_is_zero_at_the_mean_and_it_is_concentrated_at_the_two_peaks_at_each_end.__Excess_kurtosis_reaches_the_minimum_possible_value_(for_any_distribution)_when_the_probability_density_function_has_two_spikes_at_each_end:_it_is_bi-"peaky"_with_nothing_in_between_them. On_the_other_hand,_the_plot_shows_that_for_extreme_skewed_cases,_where_the_mean_is_located_near_one_or_the_other_end_(μ_=_0_or_μ_=_1),_the_variance_is_close_to_zero,_and_the_excess_kurtosis_rapidly_approaches_infinity_when_the_mean_of_the_distribution_approaches_either_end. Alternatively,_the_excess_kurtosis_can_also_be_expressed_in_terms_of_just_the_following_two_parameters:_the_square_of_the_skewness,_and_the_sample_size_ν_as_follows: :\text_=\frac\bigg(\frac_(\text)^2_-_1\bigg)\text^2-2<_\text<_\frac_(\text)^2 From_this_last_expression,_one_can_obtain_the_same_limits_published_practically_a_century_ago_by_Karl_Pearson_in_his_paper,_for_the_beta_distribution_(see_section_below_titled_"Kurtosis_bounded_by_the_square_of_the_skewness")._Setting_α_+_β=_ν_=__0_in_the_above_expression,_one_obtains_Pearson's_lower_boundary_(values_for_the_skewness_and_excess_kurtosis_below_the_boundary_(excess_kurtosis_+_2_−_skewness2_=_0)_cannot_occur_for_any_distribution,_and_hence_Karl_Pearson_appropriately_called_the_region_below_this_boundary_the_"impossible_region")._The_limit_of_α_+_β_=_ν_→_∞_determines_Pearson's_upper_boundary. :_\begin &\lim_\text__=_(\text)^2_-_2\\ &\lim_\text__=_\tfrac_(\text)^2 \end therefore: :(\text)^2-2<_\text<_\tfrac_(\text)^2 Values_of_ν_=_α_+_β_such_that_ν_ranges_from_zero_to_infinity,_0_<_ν_<_∞,_span_the_whole_region_of_the_beta_distribution_in_the_plane_of_excess_kurtosis_versus_squared_skewness. For_the_symmetric_case_(α_=_β),_the_following_limits_apply: :_\begin &\lim__\text_=__-_2_\\ &\lim__\text_=_0_\\ &\lim__\text_=_-_\frac \end For_the_unsymmetric_cases_(α_≠_β)_the_following_limits_(with_only_the_noted_variable_approaching_the_limit)_can_be_obtained_from_the_above_expressions: :_\begin &\lim_\text__=\lim__\text__=_\lim_\text__=_\lim_\text__=\infty\\ &\lim_\text__=_\frac,\text__\lim_(\lim__\text)__=_\infty,\text__\lim_(\lim__\text)__=_0\\ &\lim_\text__=_\frac,\text__\lim_(\lim__\text)__=_\infty,\text__\lim_(\lim__\text)__=_0\\ &\lim__\text__=_-_6_+_\frac,\text__\lim_(\lim__\text)__=_\infty,\text__\lim_(\lim__\text)__=_\infty \end


Characteristic function

The characteristic function is the Fourier transform of the probability density function. The characteristic function of the beta distribution is Kummer's confluent hypergeometric function (of the first kind):

:\begin{align} \varphi_X(\alpha;\beta;t) &= \operatorname{E}\left[e^{itX}\right]\\
&= \int_0^1 e^{itx} f(x;\alpha,\beta)\, dx \\
&= {}_1F_1(\alpha; \alpha+\beta; it)\\
&=\sum_{n=0}^\infty \frac {\alpha^{(n)} (it)^n}{(\alpha+\beta)^{(n)} n!} \\
&= 1 +\sum_{k=1}^{\infty} \left( \prod_{r=0}^{k-1} \frac{\alpha+r}{\alpha+\beta+r} \right) \frac{(it)^k}{k!}
\end{align}

where

: x^{(n)}=x(x+1)(x+2)\cdots(x+n-1)

is the rising factorial, also called the "Pochhammer symbol". The value of the characteristic function for ''t'' = 0 is one:

: \varphi_X(\alpha;\beta;0)={}_1F_1(\alpha; \alpha+\beta; 0) = 1 .

Also, the real and imaginary parts of the characteristic function enjoy the following symmetries with respect to the origin of the variable ''t'':

: \textrm{Re} \left [ {}_1F_1(\alpha; \alpha+\beta; it) \right ] = \textrm{Re} \left [ {}_1F_1(\alpha; \alpha+\beta; - it) \right ]

: \textrm{Im} \left [ {}_1F_1(\alpha; \alpha+\beta; it) \right ] = - \textrm{Im} \left [ {}_1F_1(\alpha; \alpha+\beta; - it) \right ]

The symmetric case α = β simplifies the characteristic function of the beta distribution to a Bessel function, since in the special case α + β = 2α the confluent hypergeometric function (of the first kind) reduces to a Bessel function (the modified Bessel function of the first kind I_{\alpha-\frac{1}{2}}) using Kummer's second transformation as follows:

:\begin{align} {}_1F_1(\alpha;2\alpha; it) &= e^{\frac{it}{2}} {}_0F_1 \left(; \alpha+\tfrac{1}{2}; \frac{(it)^2}{16} \right) \\
&= e^{\frac{it}{2}} \left(\frac{it}{4}\right)^{\frac{1}{2}-\alpha} \Gamma\left(\alpha+\tfrac{1}{2}\right) I_{\alpha-\frac{1}{2}}\left(\frac{it}{2}\right).\end{align}

Another example of the symmetric case, α = β = n/2, for beamforming applications can be found in Figure 11 of the cited reference.

In the accompanying plots, the real part (Re) of the characteristic function of the beta distribution is displayed for symmetric (α = β) and skewed (α ≠ β) cases.
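As a quick numerical illustration (not part of the original article; it assumes NumPy, SciPy and mpmath are available, and the parameter values are arbitrary), the characteristic function obtained by quadrature of e^{itx} against the beta density can be compared with Kummer's function ₁F₁(α; α+β; it):

```python
# Sketch: compare E[exp(itX)] for X ~ Beta(alpha, beta) with 1F1(alpha; alpha+beta; it).
# Assumes scipy and mpmath are installed; alpha, beta, t are illustrative values.
import numpy as np
from scipy import stats, integrate
import mpmath

alpha, beta, t = 2.5, 1.5, 3.0

# E[exp(itX)] by quadrature of the real and imaginary parts over the beta density
re_part = integrate.quad(lambda x: np.cos(t * x) * stats.beta.pdf(x, alpha, beta), 0, 1)[0]
im_part = integrate.quad(lambda x: np.sin(t * x) * stats.beta.pdf(x, alpha, beta), 0, 1)[0]
cf_quadrature = complex(re_part, im_part)

# Kummer's confluent hypergeometric function at a purely imaginary argument
cf_kummer = complex(mpmath.hyp1f1(alpha, alpha + beta, 1j * t))

print(cf_quadrature)
print(cf_kummer)   # the two complex numbers should agree to high accuracy
```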


Other moments


Moment generating function

It also follows that the moment generating function is

:\begin{align} M_X(\alpha; \beta; t)
&= \operatorname{E}\left[e^{tX}\right] \\
&= \int_0^1 e^{tx} f(x;\alpha,\beta)\,dx \\
&= {}_1F_1(\alpha; \alpha+\beta; t) \\
&= \sum_{n=0}^\infty \frac {\alpha^{(n)}} {(\alpha+\beta)^{(n)}} \frac {t^n}{n!} \\
&= 1 +\sum_{k=1}^{\infty} \left( \prod_{r=0}^{k-1} \frac{\alpha+r}{\alpha+\beta+r} \right) \frac{t^k}{k!}
\end{align}

In particular ''M''''X''(''α''; ''β''; 0) = 1.


Higher moments

Using the moment generating function, the ''k''-th raw moment is given by the factor

:\prod_{r=0}^{k-1} \frac{\alpha+r}{\alpha+\beta+r}

multiplying the (exponential series) term \left(\frac{t^k}{k!}\right) in the series of the moment generating function:

:\operatorname{E}[X^k]= \frac{\alpha^{(k)}}{(\alpha + \beta)^{(k)}} = \prod_{r=0}^{k-1} \frac{\alpha+r}{\alpha+\beta+r}

where (''x'')^{(''k'')} is a Pochhammer symbol representing the rising factorial. It can also be written in a recursive form as

:\operatorname{E}[X^k] = \frac{\alpha + k - 1}{\alpha + \beta + k - 1}\operatorname{E}[X^{k-1}].

Since the moment generating function M_X(\alpha; \beta; \cdot) has a positive radius of convergence, the beta distribution is determined by its moments.
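For example, the raw moments follow directly from the rising-factorial product and the recursion, and can be checked against SciPy's numerical moments (a sketch, not from the source; beta_raw_moment is an illustrative helper name):

```python
# Sketch: k-th raw moment of Beta(alpha, beta) as prod_{r=0}^{k-1} (alpha+r)/(alpha+beta+r),
# together with the equivalent recursion, checked against scipy.stats.beta.moment.
from scipy import stats

def beta_raw_moment(alpha, beta, k):
    m = 1.0
    for r in range(k):                       # rising-factorial ratio
        m *= (alpha + r) / (alpha + beta + r)
    return m

alpha, beta = 2.0, 3.0
m_prev = 1.0                                 # E[X^0] = 1
for k in range(1, 5):
    m_rec = (alpha + k - 1) / (alpha + beta + k - 1) * m_prev   # recursive form
    print(k, beta_raw_moment(alpha, beta, k), m_rec, stats.beta.moment(k, alpha, beta))
    m_prev = m_rec
```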


Moments of transformed random variables


Moments of linearly transformed, product and inverted random variables

One can also show the following expectations for a transformed random variable, where the random variable ''X'' is beta-distributed with parameters α and β: ''X'' ~ Beta(α, β). The expected value of the variable 1 − ''X'' is the mirror-symmetry of the expected value based on ''X'':

:\begin{align}
& \operatorname{E}[1-X] = \frac{\beta}{\alpha+\beta} \\
& \operatorname{E}[X (1-X)] =\operatorname{E}[(1-X)X ] =\frac{\alpha\beta}{(\alpha+\beta)(\alpha+\beta+1)}
\end{align}

Due to the mirror-symmetry of the probability density function of the beta distribution, the variances based on the variables ''X'' and 1 − ''X'' are identical, and the covariance of ''X'' and 1 − ''X'' is the negative of the variance:

:\operatorname{var}[(1-X)]=\operatorname{var}[X] = -\operatorname{cov}[X,(1-X)]= \frac{\alpha \beta}{(\alpha+\beta)^2(\alpha+\beta+1)}

These are the expected values for inverted variables (they are related to the harmonic means, see the section on the harmonic mean):

:\begin{align}
& \operatorname{E} \left [\frac{1}{X} \right ] = \frac{\alpha+\beta-1}{\alpha-1} \text{ if } \alpha > 1\\
& \operatorname{E}\left [\frac{1}{1-X} \right ] =\frac{\alpha+\beta-1}{\beta-1} \text{ if } \beta > 1
\end{align}

The following transformation, dividing the variable ''X'' by its mirror-image, ''X''/(1 − ''X''), results in the expected value of the "inverted beta distribution" or beta prime distribution (also known as the beta distribution of the second kind or Pearson's Type VI):

: \begin{align}
& \operatorname{E}\left[\frac{X}{1-X}\right] =\frac{\alpha}{\beta-1} \text{ if }\beta > 1\\
& \operatorname{E}\left[\frac{1-X}{X}\right] =\frac{\beta}{\alpha-1}\text{ if }\alpha > 1
\end{align}

Variances of these transformed variables can be obtained by integration, as the expected values of the second moments centered on the corresponding variables:

:\operatorname{var} \left[\frac{1-X}{X} \right] =\operatorname{E}\left[\left(\frac{1-X}{X} - \operatorname{E}\left[\frac{1-X}{X} \right ] \right )^2\right]= \operatorname{var}\left [\frac{1}{X} \right ] =\operatorname{E} \left [\left (\frac{1}{X} - \operatorname{E}\left [\frac{1}{X} \right ] \right )^2 \right ]= \frac{\beta(\alpha+\beta-1)}{(\alpha-2)(\alpha-1)^2} \text{ if }\alpha > 2

The variance of the variable ''X'' divided by its mirror-image, ''X''/(1 − ''X''), is the variance of the "inverted beta distribution" or beta prime distribution (also known as the beta distribution of the second kind or Pearson's Type VI):

:\operatorname{var} \left [\frac{X}{1-X} \right ] =\operatorname{E} \left [\left(\frac{X}{1-X} - \operatorname{E} \left [\frac{X}{1-X} \right ] \right)^2 \right ]=\operatorname{var} \left [\frac{1}{1-X} \right ] = \operatorname{E} \left [\left (\frac{1}{1-X} - \operatorname{E} \left [\frac{1}{1-X} \right ] \right )^2 \right ]= \frac{\alpha(\alpha+\beta-1)}{(\beta-2)(\beta-1)^2} \text{ if }\beta > 2

The covariances are:

:\operatorname{cov}\left [\frac{1-X}{X},\frac{X}{1-X} \right ] = \operatorname{cov}\left[\frac{1-X}{X},\frac{1}{1-X} \right] =\operatorname{cov}\left[\frac{1}{X},\frac{X}{1-X}\right ] = \operatorname{cov}\left[\frac{1}{X},\frac{1}{1-X} \right] = -\frac{\alpha+\beta-1}{(\alpha-1)(\beta-1)} \text{ if } \alpha, \beta > 1

These expectations and variances appear in the four-parameter Fisher information matrix (see the section on Fisher information, four parameters).
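A Monte Carlo sanity check of two of the closed forms above (illustrative only; the sample size and parameter values are arbitrary, and the conditions β > 1 and β > 2 must hold for the mean and variance respectively):

```python
# Sketch: check E[X/(1-X)] = alpha/(beta-1) and
# var[X/(1-X)] = alpha(alpha+beta-1)/((beta-2)(beta-1)^2) by simulation.
import numpy as np
from scipy import stats

alpha, beta = 2.0, 5.0
rng = np.random.default_rng(0)
x = stats.beta.rvs(alpha, beta, size=2_000_000, random_state=rng)
y = x / (1 - x)                              # beta prime ("inverted beta") variable

print(y.mean(), alpha / (beta - 1))                                           # both ~0.5
print(y.var(),  alpha * (alpha + beta - 1) / ((beta - 2) * (beta - 1) ** 2))  # both ~0.25
```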


Moments of logarithmically transformed random variables

Expected values for logarithmic transformations (useful for maximum likelihood estimates, see the section on parameter estimation below) are discussed in this section. The following logarithmic linear transformations are related to the geometric means ''G''''X'' and ''G''(1−''X'') (see the section on the geometric mean):

:\begin{align}
\operatorname{E}[\ln(X)] &= \psi(\alpha) - \psi(\alpha + \beta)= - \operatorname{E}\left[\ln \left (\frac{1}{X} \right )\right],\\
\operatorname{E}[\ln(1-X)] &=\psi(\beta) - \psi(\alpha + \beta)= - \operatorname{E} \left[\ln \left (\frac{1}{1-X} \right )\right].
\end{align}

where the digamma function ψ(α) is defined as the logarithmic derivative of the gamma function:

:\psi(\alpha) = \frac{d\ln\Gamma(\alpha)}{d\alpha}

Logit transformations are interesting, as they usually transform various shapes (including J-shapes) into (usually skewed) bell-shaped densities over the logit variable, and they may remove the end singularities over the original variable:

:\begin{align}
\operatorname{E}\left[\ln \left (\frac{X}{1-X} \right ) \right] &=\psi(\alpha) - \psi(\beta)= \operatorname{E}[\ln(X)] +\operatorname{E} \left[\ln \left (\frac{1}{1-X} \right) \right],\\
\operatorname{E}\left [\ln \left (\frac{1-X}{X} \right ) \right ] &=\psi(\beta) - \psi(\alpha)= - \operatorname{E} \left[\ln \left (\frac{X}{1-X} \right) \right] .
\end{align}

Johnson considered the distribution of the logit-transformed variable ln(''X''/(1−''X'')), including its moment generating function and approximations for large values of the shape parameters. This transformation extends the finite support [0, 1] based on the original variable ''X'' to infinite support in both directions of the real line, (−∞, +∞).

Higher order logarithmic moments can be derived by using the representation of a beta distribution as a proportion of two gamma distributions and differentiating through the integral. They can be expressed in terms of higher order poly-gamma functions as follows:

:\begin{align}
\operatorname{E} \left [\ln^2(X) \right ] &= (\psi(\alpha) - \psi(\alpha + \beta))^2+\psi_1(\alpha)-\psi_1(\alpha+\beta), \\
\operatorname{E} \left [\ln^2(1-X) \right ] &= (\psi(\beta) - \psi(\alpha + \beta))^2+\psi_1(\beta)-\psi_1(\alpha+\beta), \\
\operatorname{E} \left [\ln (X)\ln(1-X) \right ] &=(\psi(\alpha) - \psi(\alpha + \beta))(\psi(\beta) - \psi(\alpha + \beta)) -\psi_1(\alpha+\beta).
\end{align}

Therefore the variance of the logarithmic variables and the covariance of ln(''X'') and ln(1−''X'') are:

:\begin{align}
\operatorname{cov}[\ln(X), \ln(1-X)] &= \operatorname{E}\left[\ln(X)\ln(1-X)\right] - \operatorname{E}[\ln(X)]\operatorname{E}[\ln(1-X)] = -\psi_1(\alpha+\beta) \\
& \\
\operatorname{var}[\ln X] &= \operatorname{E}[\ln^2(X)] - (\operatorname{E}[\ln(X)])^2 \\
&= \psi_1(\alpha) - \psi_1(\alpha + \beta) \\
&= \psi_1(\alpha) + \operatorname{cov}[\ln(X), \ln(1-X)] \\
& \\
\operatorname{var}[\ln (1-X)] &= \operatorname{E}[\ln^2 (1-X)] - (\operatorname{E}[\ln (1-X)])^2 \\
&= \psi_1(\beta) - \psi_1(\alpha + \beta) \\
&= \psi_1(\beta) + \operatorname{cov}[\ln (X), \ln(1-X)]
\end{align}

where the trigamma function, denoted ψ1(α), is the second of the polygamma functions, and is defined as the derivative of the digamma function:

:\psi_1(\alpha) = \frac{d^2\ln\Gamma(\alpha)}{d\alpha^2}= \frac{d\psi(\alpha)}{d\alpha}.

The variances and covariance of the logarithmically transformed variables ''X'' and (1−''X'') are different, in general, because the logarithmic transformation destroys the mirror-symmetry of the original variables ''X'' and (1−''X''), as the logarithm approaches negative infinity for the variable approaching zero.

These logarithmic variances and covariance are the elements of the Fisher information matrix for the beta distribution. They are also a measure of the curvature of the log likelihood function (see the section on maximum likelihood estimation).

The variances of the log inverse variables are identical to the variances of the log variables:

:\begin{align}
\operatorname{var}\left[\ln \left (\frac{1}{X} \right ) \right] & =\operatorname{var}[\ln(X)] = \psi_1(\alpha) - \psi_1(\alpha + \beta), \\
\operatorname{var}\left[\ln \left (\frac{1}{1-X} \right ) \right] &=\operatorname{var}[\ln (1-X)]= \psi_1(\beta) - \psi_1(\alpha + \beta), \\
\operatorname{cov}\left[\ln \left (\frac{1}{X} \right), \ln \left (\frac{1}{1-X}\right ) \right] &=\operatorname{cov}[\ln(X),\ln(1-X)]= -\psi_1(\alpha + \beta).\end{align}

It also follows that the variances of the logit transformed variables are:

:\operatorname{var}\left[\ln \left (\frac{X}{1-X} \right )\right]=\operatorname{var}\left[\ln \left (\frac{1-X}{X} \right ) \right]=-\operatorname{cov}\left [\ln \left (\frac{X}{1-X} \right ), \ln \left (\frac{1-X}{X} \right ) \right]= \psi_1(\alpha) + \psi_1(\beta)
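These identities are easy to check numerically (a sketch under the assumption that NumPy and SciPy are available; the parameter values are arbitrary):

```python
# Sketch: E[ln X] = psi(alpha) - psi(alpha+beta), var[ln X] = psi_1(alpha) - psi_1(alpha+beta),
# and cov[ln X, ln(1-X)] = -psi_1(alpha+beta), verified by simulation.
import numpy as np
from scipy import stats
from scipy.special import digamma, polygamma

alpha, beta = 3.0, 1.5
rng = np.random.default_rng(1)
x = stats.beta.rvs(alpha, beta, size=1_000_000, random_state=rng)
lx, l1x = np.log(x), np.log1p(-x)

print(lx.mean(),             digamma(alpha) - digamma(alpha + beta))
print(lx.var(),              polygamma(1, alpha) - polygamma(1, alpha + beta))
print(np.cov(lx, l1x)[0, 1], -polygamma(1, alpha + beta))
```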


Quantities of information (entropy)

Given a beta distributed random variable, ''X'' ~ Beta(''α'', ''β''), the differential entropy of ''X'' (measured in nats) is the expected value of the negative of the logarithm of the probability density function:

:\begin{align}
h(X) &= \operatorname{E}[-\ln(f(x;\alpha,\beta))] \\
&=\int_0^1 -f(x;\alpha,\beta)\ln(f(x;\alpha,\beta)) \, dx \\
&= \ln(\Beta(\alpha,\beta))-(\alpha-1)\psi(\alpha)-(\beta-1)\psi(\beta)+(\alpha+\beta-2) \psi(\alpha+\beta)
\end{align}

where ''f''(''x''; ''α'', ''β'') is the probability density function of the beta distribution:

:f(x;\alpha,\beta) = \frac{1}{\Beta(\alpha,\beta)} x^{\alpha-1}(1-x)^{\beta-1}

The digamma function ''ψ'' appears in the formula for the differential entropy as a consequence of Euler's integral formula for the harmonic numbers, which follows from the integral:

:\int_0^1 \frac{1-x^{\alpha-1}}{1-x} \, dx = \psi(\alpha)-\psi(1)

The differential entropy of the beta distribution is negative for all values of ''α'' and ''β'' greater than zero, except at ''α'' = ''β'' = 1 (for which values the beta distribution is the same as the uniform distribution), where the differential entropy reaches its maximum value of zero. It is to be expected that the maximum entropy should take place when the beta distribution becomes equal to the uniform distribution, since uncertainty is maximal when all possible events are equiprobable.

For ''α'' or ''β'' approaching zero, the differential entropy approaches its minimum value of negative infinity. For (either or both) ''α'' or ''β'' approaching zero, there is a maximum amount of order: all the probability density is concentrated at the ends, and there is zero probability density at points located between the ends. Similarly, for (either or both) ''α'' or ''β'' approaching infinity, the differential entropy approaches its minimum value of negative infinity, and a maximum amount of order. If either ''α'' or ''β'' approaches infinity (and the other is finite), all the probability density is concentrated at an end, and the probability density is zero everywhere else. If both shape parameters are equal (the symmetric case), ''α'' = ''β'', and they approach infinity simultaneously, the probability density becomes a spike (Dirac delta function) concentrated at the middle ''x'' = 1/2, and hence there is 100% probability at the middle ''x'' = 1/2 and zero probability everywhere else.

The (continuous case) differential entropy was introduced by Shannon in his original paper (where he named it the "entropy of a continuous distribution"), as the concluding part of the same paper where he defined the discrete entropy. It has been known since then that the differential entropy may differ from the infinitesimal limit of the discrete entropy by an infinite offset, therefore the differential entropy can be negative (as it is for the beta distribution). What really matters is the relative value of entropy.

Given two beta distributed random variables, ''X''1 ~ Beta(''α'', ''β'') and ''X''2 ~ Beta(''α''′, ''β''′), the cross-entropy is (measured in nats)

:\begin{align}
H(X_1,X_2) &= \int_0^1 - f(x;\alpha,\beta) \ln (f(x;\alpha',\beta')) \,dx \\
&= \ln \left(\Beta(\alpha',\beta')\right)-(\alpha'-1)\psi(\alpha)-(\beta'-1)\psi(\beta)+(\alpha'+\beta'-2)\psi(\alpha+\beta).
\end{align}

The cross-entropy has been used as an error metric to measure the distance between two hypotheses. Its absolute value is minimum when the two distributions are identical. It is the information measure most closely related to the log maximum likelihood (see the section on "Parameter estimation. Maximum likelihood estimation").

The relative entropy, or Kullback–Leibler divergence ''D''KL(''X''1 || ''X''2), is a measure of the inefficiency of assuming that the distribution is ''X''2 ~ Beta(''α''′, ''β''′) when the distribution is really ''X''1 ~ Beta(''α'', ''β''). It is defined as follows (measured in nats):

:\begin{align}
D_{\mathrm{KL}}(X_1||X_2) &= \int_0^1 f(x;\alpha,\beta) \ln \left (\frac{f(x;\alpha,\beta)}{f(x;\alpha',\beta')} \right ) \, dx \\
&= \left (\int_0^1 f(x;\alpha,\beta) \ln (f(x;\alpha,\beta)) \,dx \right )- \left (\int_0^1 f(x;\alpha,\beta) \ln (f(x;\alpha',\beta')) \, dx \right )\\
&= -h(X_1) + H(X_1,X_2)\\
&= \ln\left(\frac{\Beta(\alpha',\beta')}{\Beta(\alpha,\beta)}\right)+(\alpha-\alpha')\psi(\alpha)+(\beta-\beta')\psi(\beta)+(\alpha'-\alpha+\beta'-\beta)\psi (\alpha + \beta).
\end{align}

The relative entropy, or Kullback–Leibler divergence, is always non-negative. A few numerical examples follow:
*''X''1 ~ Beta(1, 1) and ''X''2 ~ Beta(3, 3); ''D''KL(''X''1 || ''X''2) = 0.598803; ''D''KL(''X''2 || ''X''1) = 0.267864; ''h''(''X''1) = 0; ''h''(''X''2) = −0.267864
*''X''1 ~ Beta(3, 0.5) and ''X''2 ~ Beta(0.5, 3); ''D''KL(''X''1 || ''X''2) = 7.21574; ''D''KL(''X''2 || ''X''1) = 7.21574; ''h''(''X''1) = −1.10805; ''h''(''X''2) = −1.10805.

The Kullback–Leibler divergence is not symmetric, ''D''KL(''X''1 || ''X''2) ≠ ''D''KL(''X''2 || ''X''1), for the case in which the individual beta distributions Beta(1, 1) and Beta(3, 3) are symmetric but have different entropies, ''h''(''X''1) ≠ ''h''(''X''2). The value of the Kullback divergence depends on the direction traveled: whether going from a higher (differential) entropy to a lower (differential) entropy or the other way around. In the numerical example above, the Kullback divergence measures the inefficiency of assuming that the distribution is (bell-shaped) Beta(3, 3), rather than (uniform) Beta(1, 1). The "h" entropy of Beta(1, 1) is higher than the "h" entropy of Beta(3, 3) because the uniform distribution Beta(1, 1) has a maximum amount of disorder. The Kullback divergence is more than two times higher (0.598803 instead of 0.267864) when measured in the direction of decreasing entropy: the direction that assumes that the (uniform) Beta(1, 1) distribution is (bell-shaped) Beta(3, 3) rather than the other way around. In this restricted sense, the Kullback divergence is consistent with the second law of thermodynamics.

The Kullback–Leibler divergence is symmetric, ''D''KL(''X''1 || ''X''2) = ''D''KL(''X''2 || ''X''1), for the skewed cases Beta(3, 0.5) and Beta(0.5, 3) that have equal differential entropy, ''h''(''X''1) = ''h''(''X''2).

The symmetry condition:

:D_{\mathrm{KL}}(X_1||X_2) = D_{\mathrm{KL}}(X_2||X_1),\text{ if }h(X_1) = h(X_2),\text{ for }\alpha \neq \beta

follows from the above definitions and the mirror-symmetry ''f''(''x''; ''α'', ''β'') = ''f''(1−''x''; ''β'', ''α'') enjoyed by the beta distribution.
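The closed forms above are straightforward to evaluate; the following sketch (not from the source) reproduces the first numerical example using SciPy's betaln and digamma:

```python
# Sketch: differential entropy h(X) and Kullback-Leibler divergence for beta distributions.
from scipy.special import betaln, digamma

def beta_entropy(a, b):
    # h = ln B(a,b) - (a-1)psi(a) - (b-1)psi(b) + (a+b-2)psi(a+b), in nats
    return (betaln(a, b) - (a - 1) * digamma(a) - (b - 1) * digamma(b)
            + (a + b - 2) * digamma(a + b))

def beta_kl(a1, b1, a2, b2):
    # D_KL(Beta(a1,b1) || Beta(a2,b2)) from the closed form above
    return (betaln(a2, b2) - betaln(a1, b1)
            + (a1 - a2) * digamma(a1) + (b1 - b2) * digamma(b1)
            + (a2 - a1 + b2 - b1) * digamma(a1 + b1))

print(beta_entropy(3, 3))    # approx -0.267864
print(beta_kl(1, 1, 3, 3))   # approx 0.598803
print(beta_kl(3, 3, 1, 1))   # approx 0.267864
```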


Relationships between statistical measures


Mean, mode and median relationship

If 1 < α < β then mode ≤ median ≤ mean (Kerman J (2011), "A closed-form approximation for the median of the beta distribution"). Expressing the mode (only for α, β > 1) and the mean in terms of α and β:

: \frac{\alpha - 1}{\alpha + \beta - 2} \le \text{median} \le \frac{\alpha}{\alpha + \beta} ,

If 1 < β < α then the order of the inequalities is reversed. For α, β > 1 the absolute distance between the mean and the median is less than 5% of the distance between the maximum and minimum values of ''x''. On the other hand, the absolute distance between the mean and the mode can reach 50% of the distance between the maximum and minimum values of ''x'', for the (pathological) case of α = 1 and β = 1, for which values the beta distribution approaches the uniform distribution and the differential entropy approaches its maximum value, and hence maximum "disorder".

For example, for α = 1.0001 and β = 1.00000001:
* mode = 0.9999; PDF(mode) = 1.00010
* mean = 0.500025; PDF(mean) = 1.00003
* median = 0.500035; PDF(median) = 1.00003
* mean − mode = −0.499875
* mean − median = −9.65538 × 10−6

where PDF stands for the value of the probability density function.
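This example can be reproduced directly (an illustrative sketch; scipy.stats.beta supplies the mean and median, while the mode follows from (α − 1)/(α + β − 2)):

```python
# Sketch: mean, median and mode for the near-uniform case alpha = 1.0001, beta = 1.00000001.
from scipy import stats

a, b = 1.0001, 1.00000001
dist = stats.beta(a, b)
mode = (a - 1) / (a + b - 2)          # valid here because a, b > 1

print(mode)                            # ~0.9999
print(dist.mean(), dist.median())      # ~0.500025, ~0.500035
print(dist.mean() - mode)              # ~-0.499875
print(dist.mean() - dist.median())     # ~-9.7e-06
```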


Mean, geometric mean and harmonic mean relationship

It is known from the inequality of arithmetic and geometric means that the geometric mean is lower than the mean. Similarly, the harmonic mean is lower than the geometric mean. The accompanying plot shows that for α = β, both the mean and the median are exactly equal to 1/2, regardless of the value of α = β, and the mode is also equal to 1/2 for α = β > 1; however, the geometric and harmonic means are lower than 1/2, and they only approach this value asymptotically as α = β → ∞.


Kurtosis bounded by the square of the skewness

As remarked by Feller, in the Pearson system the beta probability density appears as type I (any difference between the beta distribution and Pearson's type I distribution is only superficial and it makes no difference for the following discussion regarding the relationship between kurtosis and skewness). Karl Pearson showed, in Plate 1 of his paper published in 1916, a graph with the kurtosis as the vertical axis (ordinate) and the square of the skewness as the horizontal axis (abscissa), in which a number of distributions were displayed. The region occupied by the beta distribution is bounded by the following two lines in the (skewness², kurtosis) plane, or the (skewness², excess kurtosis) plane:

:(\text{skewness})^2+1< \text{kurtosis}< \frac{3}{2} (\text{skewness})^2 + 3

or, equivalently,

:(\text{skewness})^2-2< \text{excess kurtosis}< \frac{3}{2} (\text{skewness})^2

At a time when there were no powerful digital computers, Karl Pearson accurately computed further boundaries, for example, separating the "U-shaped" from the "J-shaped" distributions. The lower boundary line (excess kurtosis + 2 − skewness² = 0) is produced by skewed "U-shaped" beta distributions with both values of the shape parameters α and β close to zero. The upper boundary line (excess kurtosis − (3/2) skewness² = 0) is produced by extremely skewed distributions with very large values of one of the parameters and very small values of the other parameter. Karl Pearson showed that this upper boundary line (excess kurtosis − (3/2) skewness² = 0) is also the intersection with Pearson's distribution III, which has unlimited support in one direction (towards positive infinity), and can be bell-shaped or J-shaped. His son, Egon Pearson, showed that the region (in the kurtosis/squared-skewness plane) occupied by the beta distribution (equivalently, Pearson's distribution I) as it approaches this boundary (excess kurtosis − (3/2) skewness² = 0) is shared with the noncentral chi-squared distribution. Karl Pearson (Pearson 1895, pp. 357, 360, 373–376) also showed that the gamma distribution is a Pearson type III distribution. Hence this boundary line for Pearson's type III distribution is known as the gamma line. (This can be shown from the fact that the excess kurtosis of the gamma distribution is 6/''k'' and the square of the skewness is 4/''k'', hence (excess kurtosis − (3/2) skewness² = 0) is identically satisfied by the gamma distribution regardless of the value of the parameter "k".) Pearson later noted that the chi-squared distribution is a special case of Pearson's type III and also shares this boundary line (as is apparent from the fact that for the chi-squared distribution the excess kurtosis is 12/''k'' and the square of the skewness is 8/''k'', hence (excess kurtosis − (3/2) skewness² = 0) is identically satisfied regardless of the value of the parameter "k"). This is to be expected, since the chi-squared distribution ''X'' ~ χ²(''k'') is a special case of the gamma distribution, with parametrization X ~ Γ(k/2, 1/2), where k is a positive integer that specifies the "number of degrees of freedom" of the chi-squared distribution.

An example of a beta distribution near the upper boundary (excess kurtosis − (3/2) skewness² = 0) is given by α = 0.1, β = 1000, for which the ratio (excess kurtosis)/(skewness²) = 1.49835 approaches the upper limit of 1.5 from below. An example of a beta distribution near the lower boundary (excess kurtosis + 2 − skewness² = 0) is given by α = 0.0001, β = 0.1, for which values the expression (excess kurtosis + 2)/(skewness²) = 1.01621 approaches the lower limit of 1 from above. In the infinitesimal limit for both α and β approaching zero symmetrically, the excess kurtosis reaches its minimum value at −2. This minimum value occurs at the point at which the lower boundary line intersects the vertical axis (ordinate). (However, in Pearson's original chart, the ordinate is kurtosis, instead of excess kurtosis, and it increases downwards rather than upwards.)

Values for the skewness and excess kurtosis below the lower boundary (excess kurtosis + 2 − skewness² = 0) cannot occur for any distribution, and hence Karl Pearson appropriately called the region below this boundary the "impossible region". The boundary for this "impossible region" is determined by (symmetric or skewed) bimodal "U"-shaped distributions for which the parameters α and β approach zero and hence all the probability density is concentrated at the ends: ''x'' = 0, 1, with practically nothing in between them. Since for α ≈ β ≈ 0 the probability density is concentrated at the two ends ''x'' = 0 and ''x'' = 1, this "impossible boundary" is determined by a Bernoulli distribution, where the only two possible outcomes occur with respective probabilities ''p'' and ''q'' = 1 − ''p''. For cases approaching this limit boundary with symmetry α = β, skewness ≈ 0, excess kurtosis ≈ −2 (this is the lowest excess kurtosis possible for any distribution), and the probabilities are ''p'' ≈ ''q'' ≈ 1/2. For cases approaching this limit boundary with skewness, excess kurtosis ≈ −2 + skewness², and the probability density is concentrated more at one end than the other end (with practically nothing in between), with probabilities p = \tfrac{\beta}{\alpha+\beta} at the left end ''x'' = 0 and q = 1-p = \tfrac{\alpha}{\alpha+\beta} at the right end ''x'' = 1.


Symmetry

All statements are conditional on α, β > 0.
* Probability density function reflection symmetry
::f(x;\alpha,\beta) = f(1-x;\beta,\alpha)
* Cumulative distribution function reflection symmetry plus unitary translation
::F(x;\alpha,\beta) = I_x(\alpha,\beta) = 1- F(1- x;\beta,\alpha) = 1 - I_{1-x}(\beta,\alpha)
* Mode reflection symmetry plus unitary translation
::\operatorname{mode}(\Beta(\alpha, \beta))= 1-\operatorname{mode}(\Beta(\beta, \alpha)),\text{ if }\Beta(\beta, \alpha)\ne \Beta(1,1)
* Median reflection symmetry plus unitary translation
::\operatorname{median} (\Beta(\alpha, \beta) )= 1 - \operatorname{median} (\Beta(\beta, \alpha))
* Mean reflection symmetry plus unitary translation
::\mu (\Beta(\alpha, \beta) )= 1 - \mu (\Beta(\beta, \alpha) )
* Geometric means: each is individually asymmetric; the following symmetry applies between the geometric mean based on ''X'' and the geometric mean based on its reflection (1−''X'')
::G_X (\Beta(\alpha, \beta) )=G_{(1-X)}(\Beta(\beta, \alpha) )
* Harmonic means: each is individually asymmetric; the following symmetry applies between the harmonic mean based on ''X'' and the harmonic mean based on its reflection (1−''X'')
::H_X (\Beta(\alpha, \beta) )=H_{(1-X)}(\Beta(\beta, \alpha) ) \text{ if } \alpha, \beta > 1 .
* Variance symmetry
::\operatorname{var} (\Beta(\alpha, \beta) )=\operatorname{var} (\Beta(\beta, \alpha) )
* Geometric variances: each is individually asymmetric; the following symmetry applies between the log geometric variance based on ''X'' and the log geometric variance based on its reflection (1−''X'')
::\ln(\operatorname{var}_{GX} (\Beta(\alpha, \beta))) = \ln(\operatorname{var}_{G(1-X)}(\Beta(\beta, \alpha)))
* Geometric covariance symmetry
::\ln \operatorname{cov}_{GX,(1-X)}(\Beta(\alpha, \beta))=\ln \operatorname{cov}_{GX,(1-X)}(\Beta(\beta, \alpha))
* Mean absolute deviation around the mean symmetry
::\operatorname{E}[|X - E[X]|] (\Beta(\alpha, \beta))=\operatorname{E}[|X - E[X]|] (\Beta(\beta, \alpha))
* Skewness skew-symmetry
::\operatorname{skewness} (\Beta(\alpha, \beta) )= - \operatorname{skewness} (\Beta(\beta, \alpha) )
* Excess kurtosis symmetry
::\text{excess kurtosis} (\Beta(\alpha, \beta) )= \text{excess kurtosis} (\Beta(\beta, \alpha) )
* Characteristic function symmetry of the real part (with respect to the origin of variable "t")
:: \text{Re} [ {}_1F_1(\alpha; \alpha+\beta; it) ] = \text{Re} [ {}_1F_1(\alpha; \alpha+\beta; - it)]
* Characteristic function skew-symmetry of the imaginary part (with respect to the origin of variable "t")
:: \text{Im} [ {}_1F_1(\alpha; \alpha+\beta; it) ] = - \text{Im} [ {}_1F_1(\alpha; \alpha+\beta; - it) ]
* Characteristic function symmetry of the absolute value (with respect to the origin of variable "t")
:: \text{Abs} [ {}_1F_1(\alpha; \alpha+\beta; it) ] = \text{Abs} [ {}_1F_1(\alpha; \alpha+\beta; - it) ]
* Differential entropy symmetry
::h(\Beta(\alpha, \beta) )= h(\Beta(\beta, \alpha) )
* Relative entropy (also called Kullback–Leibler divergence) symmetry
::D_{\mathrm{KL}}(X_1||X_2) = D_{\mathrm{KL}}(X_2||X_1), \text{ if }h(X_1) = h(X_2)\text{, for (skewed) }\alpha \neq \beta
* Fisher information matrix symmetry
::{\mathcal{I}}_{i,j}(\Beta(\alpha, \beta)) = {\mathcal{I}}_{j,i}(\Beta(\beta, \alpha))


Geometry of the probability density function


Inflection points

For certain values of the shape parameters α and β, the probability density function has inflection points, at which the curvature changes sign. The position of these inflection points can be useful as a measure of the dispersion or spread of the distribution.

Defining the following quantity:

:\kappa =\frac{\sqrt{\frac{(\alpha-1)(\beta-1)}{\alpha+\beta-3}}}{\alpha+\beta-2}

points of inflection occur, depending on the value of the shape parameters α and β, as follows:

*(α > 2, β > 2) The distribution is bell-shaped (symmetric for α = β and skewed otherwise), with two inflection points, equidistant from the mode:
::x = \text{mode} \pm \kappa = \frac{\alpha - 1 \pm \sqrt{\frac{(\alpha-1)(\beta-1)}{\alpha+\beta-3}}}{\alpha+\beta-2}
* (α = 2, β > 2) The distribution is unimodal, positively skewed, right-tailed, with one inflection point, located to the right of the mode:
::x =\text{mode} + \kappa = \frac{2}{\beta}
* (α > 2, β = 2) The distribution is unimodal, negatively skewed, left-tailed, with one inflection point, located to the left of the mode:
::x = \text{mode} - \kappa = 1 - \frac{2}{\alpha}
* (1 < α < 2, β > 2, α+β > 2) The distribution is unimodal, positively skewed, right-tailed, with one inflection point, located to the right of the mode:
::x =\text{mode} + \kappa = \frac{\alpha - 1 + \sqrt{\frac{(\alpha-1)(\beta-1)}{\alpha+\beta-3}}}{\alpha+\beta-2}
*(0 < α < 1, 1 < β < 2) The distribution has a mode at the left end ''x'' = 0 and it is positively skewed, right-tailed. There is one inflection point, located to the right of the mode:
::x = \frac{\alpha - 1 + \sqrt{\frac{(\alpha-1)(\beta-1)}{\alpha+\beta-3}}}{\alpha+\beta-2}
*(α > 2, 1 < β < 2) The distribution is unimodal, negatively skewed, left-tailed, with one inflection point, located to the left of the mode:
::x =\text{mode} - \kappa = \frac{\alpha - 1 - \sqrt{\frac{(\alpha-1)(\beta-1)}{\alpha+\beta-3}}}{\alpha+\beta-2}
*(1 < α < 2, 0 < β < 1) The distribution has a mode at the right end ''x'' = 1 and it is negatively skewed, left-tailed. There is one inflection point, located to the left of the mode:
::x = \frac{\alpha - 1 - \sqrt{\frac{(\alpha-1)(\beta-1)}{\alpha+\beta-3}}}{\alpha+\beta-2}

There are no inflection points in the remaining (symmetric and skewed) regions: U-shaped: (α, β < 1), upside-down-U-shaped: (1 < α < 2, 1 < β < 2), reverse-J-shaped (α < 1, β > 2), or J-shaped: (α > 2, β < 1).

The accompanying plots show the inflection point locations (shown vertically, ranging from 0 to 1) versus α and β (the horizontal axes ranging from 0 to 5). There are large cuts at surfaces intersecting the lines α = 1, β = 1, α = 2, and β = 2 because at these values the beta distribution changes from 2 modes, to 1 mode, to no mode.


Shapes

The beta density function can take a wide variety of different shapes depending on the values of the two parameters ''α'' and ''β''. The ability of the beta distribution to take this great diversity of shapes (using only two parameters) is partly responsible for finding wide application for modeling actual measurements:


Symmetric (''α'' = ''β'')

* the density function is symmetric about 1/2 (blue & teal plots).
* median = mean = 1/2.
* skewness = 0.
* variance = 1/(4(2α + 1))
* α = β < 1
** U-shaped (blue plot).
** bimodal: left mode = 0, right mode = 1, anti-mode = 1/2
** 1/12 < var(''X'') < 1/4
** −2 < excess kurtosis(''X'') < −6/5
** α = β = 1/2 is the arcsine distribution
*** var(''X'') = 1/8
*** excess kurtosis(''X'') = −3/2
*** CF = Rinc (t)
** α = β → 0 is a 2-point Bernoulli distribution with equal probability 1/2 at each Dirac-delta-function end, ''x'' = 0 and ''x'' = 1, and zero probability everywhere else. A coin toss: one face of the coin being ''x'' = 0 and the other face being ''x'' = 1.
*** \lim_{\alpha = \beta \to 0} \operatorname{var}(X) = \tfrac{1}{4}
*** \lim_{\alpha = \beta \to 0} \operatorname{excess kurtosis}(X) = - 2 (a lower value than this is impossible for any distribution to reach)
*** The differential entropy approaches a minimum value of −∞
* α = β = 1
** the uniform [0, 1] distribution
** no mode
** var(''X'') = 1/12
** excess kurtosis(''X'') = −6/5
** The (negative anywhere else) differential entropy reaches its maximum value of zero
** CF = Sinc (t)
* ''α'' = ''β'' > 1
** symmetric unimodal
** mode = 1/2.
** 0 < var(''X'') < 1/12
** −6/5 < excess kurtosis(''X'') < 0
** ''α'' = ''β'' = 3/2 is a semi-elliptic [0, 1] distribution, see: Wigner semicircle distribution
*** var(''X'') = 1/16.
*** excess kurtosis(''X'') = −1
*** CF = 2 Jinc (t)
** ''α'' = ''β'' = 2 is the parabolic [0, 1] distribution
*** var(''X'') = 1/20
*** excess kurtosis(''X'') = −6/7
*** CF = 3 Tinc (t)
** ''α'' = ''β'' > 2 is bell-shaped, with inflection points located to either side of the mode
*** 0 < var(''X'') < 1/20
*** −6/7 < excess kurtosis(''X'') < 0
** ''α'' = ''β'' → ∞ is a 1-point degenerate distribution with a Dirac delta function spike at the midpoint ''x'' = 1/2 with probability 1, and zero probability everywhere else. There is 100% probability (absolute certainty) concentrated at the single point ''x'' = 1/2.
*** \lim_{\alpha = \beta \to \infty} \operatorname{var}(X) = 0
*** \lim_{\alpha = \beta \to \infty} \operatorname{excess kurtosis}(X) = 0
*** The differential entropy approaches a minimum value of −∞


Skewed (''α'' ≠ ''β'')

The density function is skewed. An interchange of parameter values yields the mirror image (the reverse) of the initial curve. Some more specific cases:
*''α'' < 1, ''β'' < 1
** U-shaped
** Positive skew for α < β, negative skew for α > β.
** bimodal: left mode = 0, right mode = 1, anti-mode = \tfrac{\alpha-1}{\alpha+\beta-2}
** 0 < median < 1.
** 0 < var(''X'') < 1/4
*α > 1, β > 1
** unimodal (magenta & cyan plots),
** Positive skew for α < β, negative skew for α > β.
** \text{mode}= \tfrac{\alpha-1}{\alpha+\beta-2}
** 0 < median < 1
** 0 < var(''X'') < 1/12
*α < 1, β ≥ 1
** reverse J-shaped with a right tail,
** positively skewed,
** strictly decreasing, convex
** mode = 0
** 0 < median < 1/2.
** 0 < \operatorname{var}(X) < \tfrac{5\sqrt{5}-11}{2}, (maximum variance occurs for \alpha=\tfrac{\sqrt{5}-1}{2}, \beta=1, or α = Φ the golden ratio conjugate)
*α ≥ 1, β < 1
** J-shaped with a left tail,
** negatively skewed,
** strictly increasing, convex
** mode = 1
** 1/2 < median < 1
** 0 < \operatorname{var}(X) < \tfrac{5\sqrt{5}-11}{2}, (maximum variance occurs for \alpha=1, \beta=\tfrac{\sqrt{5}-1}{2}, or β = Φ the golden ratio conjugate)
*α = 1, β > 1
** positively skewed,
** strictly decreasing (red plot),
** a reversed (mirror-image) power function [0,1] distribution
** mean = 1 / (β + 1)
** median = 1 − 1/2^{1/β}
** mode = 0
** α = 1, 1 < β < 2
*** concave
*** 1-\tfrac{1}{\sqrt{2}}< \text{median} < \tfrac{1}{2}
*** 1/18 < var(''X'') < 1/12.
** α = 1, β = 2
*** a straight line with slope −2, the right-triangular distribution with right angle at the left end, at ''x'' = 0
*** \text{median}=1-\tfrac{1}{\sqrt{2}}
*** var(''X'') = 1/18
** α = 1, β > 2
*** reverse J-shaped with a right tail,
*** convex
*** 0 < \text{median} < 1-\tfrac{1}{\sqrt{2}}
*** 0 < var(''X'') < 1/18
*α > 1, β = 1
** negatively skewed,
** strictly increasing (green plot),
** the power function [0, 1] distribution
** mean = α / (α + 1)
** median = 1/2^{1/α}
** mode = 1
**2 > α > 1, β = 1
*** concave
*** \tfrac{1}{2} < \text{median} < \tfrac{1}{\sqrt{2}}
*** 1/18 < var(''X'') < 1/12
** α = 2, β = 1
*** a straight line with slope +2, the right-triangular distribution with right angle at the right end, at ''x'' = 1
*** \text{median}=\tfrac{1}{\sqrt{2}}
*** var(''X'') = 1/18
**α > 2, β = 1
*** J-shaped with a left tail, convex
***\tfrac{1}{\sqrt{2}} < \text{median} < 1
*** 0 < var(''X'') < 1/18


Related distributions


Transformations

* If ''X'' ~ Beta(''α'', ''β'') then 1 − ''X'' ~ Beta(''β'', ''α'') (mirror-image symmetry)
* If ''X'' ~ Beta(''α'', ''β'') then \tfrac{X}{1-X} \sim {\beta'}(\alpha,\beta), the beta prime distribution, also called "beta distribution of the second kind".
* If ''X'' ~ Beta(''α'', ''β'') then \tfrac{1}{X} -1 \sim {\beta'}(\beta,\alpha).
* If ''X'' ~ Beta(''n''/2, ''m''/2) then \tfrac{mX}{n(1-X)} \sim F(n,m) (assuming ''n'' > 0 and ''m'' > 0), the Fisher–Snedecor F distribution.
* If X \sim \operatorname{Beta}\left(1+\lambda\tfrac{m-\min}{\max-\min}, 1 + \lambda\tfrac{\max-m}{\max-\min}\right) then min + ''X''(max − min) ~ PERT(min, max, ''m'', ''λ''), where ''PERT'' denotes a PERT distribution used in PERT analysis, and ''m'' = most likely value (Herrerías-Velasco, José Manuel; Herrerías-Pleguezuelo, Rafael; van Dorp, Johan René (2011). "Revisiting the PERT mean and variance". European Journal of Operational Research (210), pp. 448–451). Traditionally ''λ'' = 4 in PERT analysis.
* If ''X'' ~ Beta(1, ''β'') then ''X'' ~ Kumaraswamy distribution with parameters (1, ''β'')
* If ''X'' ~ Beta(''α'', 1) then ''X'' ~ Kumaraswamy distribution with parameters (''α'', 1)
* If ''X'' ~ Beta(''α'', 1) then −ln(''X'') ~ Exponential(''α'') (checked numerically in the sketch after this list)
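A minimal check of the last transformation (assuming SciPy is available; α and the sample size are arbitrary):

```python
# Sketch: if X ~ Beta(alpha, 1) then -ln(X) ~ Exponential(alpha) (rate alpha, scale 1/alpha).
import numpy as np
from scipy import stats

alpha = 2.7
rng = np.random.default_rng(2)
x = stats.beta.rvs(alpha, 1, size=200_000, random_state=rng)

ks = stats.kstest(-np.log(x), 'expon', args=(0, 1 / alpha))   # loc = 0, scale = 1/alpha
print(ks.pvalue)   # a large p-value: no evidence against the exponential claim
```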


Special and limiting cases

* Beta(1, 1) ~ U(0, 1), the standard uniform distribution.
* Beta(n, 1) ~ Maximum of ''n'' independent rvs. with U(0, 1), sometimes called a ''standard power function distribution'' with density ''n''·''x''^{''n''−1} on that interval.
* Beta(1, n) ~ Minimum of ''n'' independent rvs. with U(0, 1).
* If ''X'' ~ Beta(3/2, 3/2) and ''r'' > 0 then 2''rX'' − ''r'' ~ Wigner semicircle distribution.
* Beta(1/2, 1/2) is equivalent to the arcsine distribution. This distribution is also the Jeffreys prior probability for the Bernoulli and binomial distributions. The arcsine probability density is a distribution that appears in several random-walk fundamental theorems. In a fair coin toss random walk, the probability for the time of the last visit to the origin is distributed as a (U-shaped) arcsine distribution. In a two-player fair-coin-toss game, a player is said to be in the lead if the random walk (that started at the origin) is above the origin. The most probable number of times that a given player will be in the lead, in a game of length 2''N'', is not ''N''. On the contrary, ''N'' is the least likely number of times that the player will be in the lead. The most likely number of times in the lead is 0 or 2''N'' (following the arcsine distribution).
* \lim_{n \to \infty} n \operatorname{Beta}(1,n) = \operatorname{Exponential}(1), the exponential distribution.
* \lim_{n \to \infty} n \operatorname{Beta}(k,n) = \operatorname{Gamma}(k,1), the gamma distribution.
* For large n, \operatorname{Beta}(\alpha n,\beta n) \to \mathcal{N}\left(\frac{\alpha}{\alpha+\beta},\frac{\alpha\beta}{(\alpha+\beta)^3}\frac{1}{n}\right), the normal distribution. More precisely, if X_n \sim \operatorname{Beta}(\alpha n,\beta n) then \sqrt{n}\left(X_n -\tfrac{\alpha}{\alpha+\beta}\right) converges in distribution to a normal distribution with mean 0 and variance \tfrac{\alpha\beta}{(\alpha+\beta)^3} as ''n'' increases.


Derived from other distributions

* The ''k''th order statistic of a sample of size ''n'' from the uniform distribution is a beta random variable, ''U''(''k'') ~ Beta(''k'', ''n''+1−''k'').
* If ''X'' ~ Gamma(α, θ) and ''Y'' ~ Gamma(β, θ) are independent, then \tfrac{X}{X+Y} \sim \operatorname{Beta}(\alpha, \beta)\, (see the sketch after this list).
* If X \sim \chi^2(\alpha)\, and Y \sim \chi^2(\beta)\, are independent, then \tfrac{X}{X+Y} \sim \operatorname{Beta}(\tfrac{\alpha}{2}, \tfrac{\beta}{2}).
* If ''X'' ~ U(0, 1) and ''α'' > 0 then ''X''^{1/''α''} ~ Beta(''α'', 1), the power function distribution.
* If X \sim\operatorname{Binom}(k;n;p), then \tfrac{X}{n} \sim \operatorname{Beta}(\alpha, \beta) for discrete values of ''n'' and ''k'', where \alpha=k+1 and \beta=n-k+1.
* If ''X'' ~ Cauchy(0, 1) then \tfrac{1}{1+X^2} \sim \operatorname{Beta}\left(\tfrac12, \tfrac12\right)\,
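The gamma-ratio construction in the second item is easy to exercise (a sketch; the parameter values are illustrative):

```python
# Sketch: with independent X ~ Gamma(alpha, theta) and Y ~ Gamma(beta, theta),
# the ratio X/(X+Y) is Beta(alpha, beta) distributed.
import numpy as np
from scipy import stats

alpha, beta, theta = 2.0, 5.0, 3.0
rng = np.random.default_rng(3)
gx = stats.gamma.rvs(alpha, scale=theta, size=200_000, random_state=rng)
gy = stats.gamma.rvs(beta,  scale=theta, size=200_000, random_state=rng)
ratio = gx / (gx + gy)

print(stats.kstest(ratio, 'beta', args=(alpha, beta)).pvalue)   # large p-value expected
```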


Combination with other distributions

* ''X'' ~ Beta(''α'', ''β'') and ''Y'' ~ F(2''β'', 2''α'') then \Pr\left(X \leq \tfrac{\alpha}{\alpha+\beta x}\right) = \Pr(Y \geq x)\, for all ''x'' > 0.


Compounding with other distributions

* If ''p'' ~ Beta(α, β) and ''X'' ~ Bin(''k'', ''p'') then ''X'' ~ beta-binomial distribution
* If ''p'' ~ Beta(α, β) and ''X'' ~ NB(''r'', ''p'') then ''X'' ~ beta negative binomial distribution


Generalisations

* The generalization to multiple variables, i.e. a multivariate beta distribution, is called a Dirichlet distribution. Univariate marginals of the Dirichlet distribution have a beta distribution. The beta distribution is conjugate to the binomial and Bernoulli distributions in exactly the same way as the Dirichlet distribution is conjugate to the multinomial distribution and categorical distribution.
* The Pearson type I distribution is identical to the beta distribution (except for arbitrary shifting and re-scaling that can also be accomplished with the four-parameter parametrization of the beta distribution).
* The beta distribution is the special case of the noncentral beta distribution where \lambda = 0: \operatorname{Beta}(\alpha, \beta) = \operatorname{Beta}(\alpha,\beta,0).
* The generalized beta distribution is a five-parameter distribution family which has the beta distribution as a special case.
* The matrix variate beta distribution is a distribution for positive-definite matrices.


Statistical inference


Parameter estimation


Method of moments


Two unknown parameters

Two unknown parameters ((\hat{\alpha}, \hat{\beta}) of a beta distribution supported in the [0,1] interval) can be estimated, using the method of moments, with the first two moments (sample mean and sample variance) as follows. Let:

: \text{sample mean}=\bar{x} = \frac{1}{N}\sum_{i=1}^N X_i

be the sample mean estimate and

: \text{sample variance} =\bar{v} = \frac{1}{N-1}\sum_{i=1}^N (X_i - \bar{x})^2

be the sample variance estimate. The method-of-moments estimates of the parameters are

:\hat{\alpha} = \bar{x} \left(\frac{\bar{x}(1-\bar{x})}{\bar{v}} - 1 \right), if \bar{v} <\bar{x}(1 - \bar{x}),
: \hat{\beta} = (1-\bar{x}) \left(\frac{\bar{x}(1-\bar{x})}{\bar{v}} - 1 \right), if \bar{v}<\bar{x}(1 - \bar{x}).

When the distribution is required over a known interval other than [0, 1] with random variable ''X'', say [''a'', ''c''] with random variable ''Y'', then replace \bar{x} with \frac{\bar{y}-a}{c-a} and \bar{v} with \frac{\bar{v}_Y}{(c-a)^2} in the above couple of equations for the shape parameters (see the "Alternative parametrizations, four parameters" section below), where:

: \text{sample mean}=\bar{y} = \frac{1}{N}\sum_{i=1}^N Y_i
: \text{sample variance} = \bar{v}_Y = \frac{1}{N-1}\sum_{i=1}^N (Y_i - \bar{y})^2
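The two formulas translate directly into code (a minimal sketch for the [0, 1] interval; beta_mom is an illustrative name and the simulated data are only for demonstration):

```python
# Sketch: method-of-moments estimates for Beta(alpha, beta) supported on [0, 1].
import numpy as np
from scipy import stats

def beta_mom(samples):
    m = samples.mean()
    v = samples.var(ddof=1)                   # sample variance
    if not v < m * (1 - m):
        raise ValueError("method of moments requires sample variance < mean*(1 - mean)")
    common = m * (1 - m) / v - 1
    return m * common, (1 - m) * common       # (alpha_hat, beta_hat)

rng = np.random.default_rng(4)
data = stats.beta.rvs(2.0, 6.0, size=100_000, random_state=rng)
print(beta_mom(data))                          # close to (2.0, 6.0)
```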


Four unknown parameters

All four parameters (\hat{\alpha}, \hat{\beta}, \hat{a}, \hat{c} of a beta distribution supported in the [''a'', ''c''] interval, see the section "Alternative parametrizations, four parameters") can be estimated, using the method of moments developed by Karl Pearson, by equating sample and population values of the first four central moments (mean, variance, skewness and excess kurtosis). The excess kurtosis was expressed in terms of the square of the skewness and the sample size ν = α + β (see the previous section "Kurtosis") as follows:

:\text{excess kurtosis} =\frac{6}{3 + \nu}\left(\frac{(2 + \nu)}{4} (\text{skewness})^2 - 1\right)\text{ if }(\text{skewness})^2-2< \text{excess kurtosis}< \tfrac{3}{2} (\text{skewness})^2

One can use this equation to solve for the sample size ν = α + β in terms of the square of the skewness and the excess kurtosis as follows:

:\hat{\nu} = \hat{\alpha} + \hat{\beta} = 3\frac{(\text{sample excess kurtosis}) - (\text{sample skewness})^2+2}{\frac{3}{2} (\text{sample skewness})^2 - \text{(sample excess kurtosis)}}

:\text{if (sample skewness)}^2-2< \text{sample excess kurtosis}< \tfrac{3}{2} (\text{sample skewness})^2

This is the ratio (multiplied by a factor of 3) between the previously derived limit boundaries for the beta distribution in a space (as originally done by Karl Pearson) defined with coordinates of the square of the skewness in one axis and the excess kurtosis in the other axis (see the previous section "Kurtosis bounded by the square of the skewness").

The case of zero skewness can be immediately solved because for zero skewness, α = β and hence ν = 2α = 2β, therefore α = β = ν/2:

: \hat{\alpha} = \hat{\beta} = \frac{\hat{\nu}}{2}= \frac{\frac{3}{2}(\text{sample excess kurtosis}) +3}{- \text{(sample excess kurtosis)}}

: \text{if sample skewness}= 0 \text{ and } -2<\text{sample excess kurtosis}<0

(Excess kurtosis is negative for the beta distribution with zero skewness, ranging from −2 to 0, so that \hat{\nu} — and therefore the sample shape parameters — is positive, ranging from zero when the shape parameters approach zero and the excess kurtosis approaches −2, to infinity when the shape parameters approach infinity and the excess kurtosis approaches zero.)

For non-zero sample skewness one needs to solve a system of two coupled equations. Since the skewness and the excess kurtosis are independent of the parameters \hat{a}, \hat{c}, the parameters \hat{\alpha}, \hat{\beta} can be uniquely determined from the sample skewness and the sample excess kurtosis, by solving the coupled equations with two known variables (sample skewness and sample excess kurtosis) and two unknowns (the shape parameters):

:(\text{skewness})^2 = \frac{4(\beta-\alpha)^2 (1 + \alpha + \beta)}{\alpha \beta (2 + \alpha + \beta)^2}

:\text{excess kurtosis} =\frac{6}{3 + \alpha + \beta}\left(\frac{(2 + \alpha + \beta)}{4} (\text{skewness})^2 - 1\right)

:\text{if (sample skewness)}^2-2< \text{sample excess kurtosis}< \tfrac{3}{2}(\text{sample skewness})^2

resulting in the following solution:

: \hat{\alpha}, \hat{\beta} = \frac{\hat{\nu}}{2} \left (1 \pm \frac{1}{\sqrt{1+ \frac{16 (\hat{\nu} + 1)}{(\hat{\nu} + 2)^2(\text{sample skewness})^2}}} \right )

: \text{if sample skewness}\neq 0 \text{ and } (\text{sample skewness})^2-2< \text{sample excess kurtosis}< \tfrac{3}{2} (\text{sample skewness})^2

Here one should take the solutions as follows: \hat{\alpha}>\hat{\beta} for (negative) sample skewness < 0, and \hat{\alpha}<\hat{\beta} for (positive) sample skewness > 0.

The accompanying plot shows these two solutions as surfaces in a space with horizontal axes of (sample excess kurtosis) and (sample squared skewness) and the shape parameters as the vertical axis. The surfaces are constrained by the condition that the sample excess kurtosis must be bounded by the sample squared skewness as stipulated in the above equation. The two surfaces meet at the right edge defined by zero skewness. Along this right edge, both parameters are equal and the distribution is symmetric U-shaped for α = β < 1, uniform for α = β = 1, upside-down-U-shaped for 1 < α = β < 2 and bell-shaped for α = β > 2. The surfaces also meet at the front (lower) edge defined by the "impossible boundary" line (excess kurtosis + 2 − skewness² = 0). Along this front (lower) boundary both shape parameters approach zero, and the probability density is concentrated more at one end than the other end (with practically nothing in between), with probabilities p=\tfrac{\beta}{\alpha+\beta} at the left end ''x'' = 0 and q = 1-p = \tfrac{\alpha}{\alpha+\beta} at the right end ''x'' = 1. The two surfaces become further apart towards the rear edge. At this rear edge the surface parameters are quite different from each other. As remarked, for example, by Bowman and Shenton, sampling in the neighborhood of the line (sample excess kurtosis − (3/2)(sample skewness)² = 0) (the just-J-shaped portion of the rear edge where blue meets beige) "is dangerously near to chaos", because at that line the denominator of the expression above for the estimate ν = α + β becomes zero and hence ν approaches infinity as that line is approached. Bowman and Shenton write that "the higher moment parameters (kurtosis and skewness) are extremely fragile (near that line). However, the mean and standard deviation are fairly reliable." Therefore, the problem arises for the case of four-parameter estimation for very skewed distributions such that the excess kurtosis approaches (3/2) times the square of the skewness. This boundary line is produced by extremely skewed distributions with very large values of one of the parameters and very small values of the other parameter. See below for a numerical example and further comments about this rear-edge boundary line (sample excess kurtosis − (3/2)(sample skewness)² = 0). As remarked by Karl Pearson himself, this issue may not be of much practical importance, as this trouble arises only for very skewed J-shaped (or mirror-image J-shaped) distributions with very different values of shape parameters that are unlikely to occur much in practice. The usual skewed bell-shaped distributions that occur in practice do not have this parameter estimation problem.

The remaining two parameters \hat{a}, \hat{c} can be determined using the sample mean and the sample variance using a variety of equations. One alternative is to calculate the support interval range (\hat{c}-\hat{a}) based on the sample variance and the sample kurtosis. For this purpose one can solve, in terms of the range (\hat{c}- \hat{a}), the equation expressing the excess kurtosis in terms of the sample variance and the sample size ν (see the sections "Kurtosis" and "Alternative parametrizations, four parameters"):

:\text{sample excess kurtosis} =\frac{6}{(3 + \hat{\nu})(2 + \hat{\nu})}\bigg(\frac{(\hat{c}- \hat{a})^2}{\text{(sample variance)}} - 6 - 5 \hat{\nu} \bigg)

to obtain:

: (\hat{c}- \hat{a}) = \sqrt{\text{(sample variance)}}\sqrt{6+5\hat{\nu}+\frac{(2+\hat{\nu})(3+\hat{\nu})}{6} \text{(sample excess kurtosis)}}

Another alternative is to calculate the support interval range (\hat{c}-\hat{a}) based on the sample variance and the sample skewness. For this purpose one can solve, in terms of the range (\hat{c}-\hat{a}), the equation expressing the squared skewness in terms of the sample variance and the sample size ν (see the sections titled "Skewness" and "Alternative parametrizations, four parameters"):

:(\text{sample skewness})^2 = \frac{4}{(2+\hat{\nu})^2}\bigg(\frac{(\hat{c}- \hat{a})^2}{\text{(sample variance)}}-4(1+\hat{\nu})\bigg)

to obtain:

: (\hat{c}- \hat{a}) = \frac{(2+\hat{\nu})}{2}\sqrt{\text{(sample variance)}}\sqrt{(\text{sample skewness})^2+\frac{16(1+\hat{\nu})}{(2+\hat{\nu})^2}}

The remaining parameter can be determined from the sample mean and the previously obtained parameters (\hat{c}-\hat{a}), \hat{\alpha}, \hat{\nu} = \hat{\alpha}+\hat{\beta}:

: \hat{a} = (\text{sample mean}) - \left(\frac{\hat{\alpha}}{\hat{\nu}}\right)(\hat{c}-\hat{a})

and finally, \hat{c}= (\hat{c}- \hat{a}) + \hat{a}.

In the above formulas one may take, for example, as estimates of the sample moments:

:\begin{align}
\text{sample mean} &=\overline{y} = \frac{1}{N}\sum_{i=1}^N Y_i \\
\text{sample variance} &= \overline{v}_Y = \frac{1}{N-1}\sum_{i=1}^N (Y_i - \overline{y})^2 \\
\text{sample skewness} &= G_1 = \frac{N}{(N-1)(N-2)} \frac{\sum_{i=1}^N (Y_i - \overline{y})^3}{\overline{v}_Y^{\frac{3}{2}}} \\
\text{sample excess kurtosis} &= G_2 = \frac{N(N+1)}{(N-1)(N-2)(N-3)} \frac{\sum_{i=1}^N (Y_i - \overline{y})^4}{\overline{v}_Y^2} - \frac{3(N-1)^2}{(N-2)(N-3)}
\end{align}

The estimators ''G''1 for sample skewness and ''G''2 for sample kurtosis are used by DAP/SAS, PSPP/SPSS, and Excel. However, they are not used by BMDP and (according to Joanes and Gill) they were not used by MINITAB in 1998. Actually, Joanes and Gill in their 1998 study concluded that the skewness and kurtosis estimators used in BMDP and in MINITAB (at that time) had smaller variance and mean-squared error in normal samples, but the skewness and kurtosis estimators used in DAP/SAS and PSPP/SPSS, namely ''G''1 and ''G''2, had smaller mean-squared error in samples from a very skewed distribution. It is for this reason that we have spelled out "sample skewness", etc., in the above formulas, to make it explicit that the user should choose the best estimator according to the problem at hand, as the best estimator for skewness and kurtosis depends on the amount of skewness (as shown by Joanes and Gill).


Maximum likelihood


Two unknown parameters

As is also the case for maximum likelihood estimates for the gamma distribution, the maximum likelihood estimates for the beta distribution do not have a general closed form solution for arbitrary values of the shape parameters. If ''X''1, ..., ''X''''N'' are independent random variables each having a beta distribution, the joint log likelihood function for ''N'' iid observations is:

:\begin{align}
\ln\, \mathcal{L} (\alpha, \beta\mid X) &= \sum_{i=1}^N \ln \left (\mathcal{L}_i (\alpha, \beta\mid X_i) \right )\\
&= \sum_{i=1}^N \ln \left (f(X_i;\alpha,\beta) \right ) \\
&= \sum_{i=1}^N \ln \left (\frac{X_i^{\alpha-1}(1-X_i)^{\beta-1}}{\Beta(\alpha,\beta)} \right ) \\
&= (\alpha - 1)\sum_{i=1}^N \ln (X_i) + (\beta- 1)\sum_{i=1}^N \ln (1-X_i) - N \ln \Beta(\alpha,\beta)
\end{align}

Finding the maximum with respect to a shape parameter involves taking the partial derivative with respect to the shape parameter and setting the expression equal to zero, yielding the maximum likelihood estimator of the shape parameters:

:\frac{\partial \ln \mathcal{L}(\alpha,\beta\mid X)}{\partial \alpha} = \sum_{i=1}^N \ln X_i -N\frac{\partial \ln \Beta(\alpha,\beta)}{\partial \alpha}=0
:\frac{\partial \ln \mathcal{L}(\alpha,\beta\mid X)}{\partial \beta} = \sum_{i=1}^N  \ln (1-X_i)- N\frac{\partial \ln \Beta(\alpha,\beta)}{\partial \beta}=0

where:

:\frac{\partial \ln \Beta(\alpha,\beta)}{\partial \alpha} = -\frac{\partial \ln \Gamma(\alpha+\beta)}{\partial \alpha}+ \frac{\partial \ln \Gamma(\alpha)}{\partial \alpha}+ \frac{\partial \ln \Gamma(\beta)}{\partial \alpha}=-\psi(\alpha + \beta) + \psi(\alpha) + 0
:\frac{\partial \ln \Beta(\alpha,\beta)}{\partial \beta}= - \frac{\partial \ln \Gamma(\alpha+\beta)}{\partial \beta}+ \frac{\partial \ln \Gamma(\alpha)}{\partial \beta} + \frac{\partial \ln \Gamma(\beta)}{\partial \beta}=-\psi(\alpha + \beta) + 0 + \psi(\beta)

since the digamma function, denoted ψ(α), is defined as the logarithmic derivative of the gamma function:

:\psi(\alpha) =\frac{d\ln\Gamma(\alpha)}{d\alpha}

To ensure that the values with zero tangent slope are indeed a maximum (instead of a saddle-point or a minimum), one has to also satisfy the condition that the curvature is negative. This amounts to satisfying that the second partial derivative with respect to the shape parameters is negative:

:\frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{\partial \alpha^2}= -N\frac{\partial^2\ln \Beta(\alpha,\beta)}{\partial \alpha^2}<0
:\frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{\partial \beta^2} = -N\frac{\partial^2\ln \Beta(\alpha,\beta)}{\partial \beta^2}<0

Using the previous equations, this is equivalent to:

:\frac{\partial^2\ln \Beta(\alpha,\beta)}{\partial \alpha^2} = \psi_1(\alpha)-\psi_1(\alpha + \beta) > 0
:\frac{\partial^2\ln \Beta(\alpha,\beta)}{\partial \beta^2} = \psi_1(\beta) -\psi_1(\alpha + \beta) > 0

where the trigamma function, denoted ''ψ''1(''α''), is the second of the polygamma functions, and is defined as the derivative of the digamma function:

:\psi_1(\alpha) = \frac{d^2\ln\Gamma(\alpha)}{d\alpha^2}=\, \frac{d\psi(\alpha)}{d\alpha}.

These conditions are equivalent to stating that the variances of the logarithmically transformed variables are positive, since:

:\operatorname{var}[\ln (X)] = \operatorname{E}[\ln^2 (X)] - (\operatorname{E}[\ln (X)])^2 = \psi_1(\alpha) - \psi_1(\alpha + \beta)
:\operatorname{var}[\ln (1-X)] = \operatorname{E}[\ln^2 (1-X)] - (\operatorname{E}[\ln (1-X)])^2 = \psi_1(\beta) - \psi_1(\alpha + \beta)

Therefore, the condition of negative curvature at a maximum is equivalent to the statements:

: \operatorname{var}[\ln (X)] > 0
: \operatorname{var}[\ln (1-X)] > 0

Alternatively, the condition of negative curvature at a maximum is also equivalent to stating that the following logarithmic derivatives of the geometric means ''G''''X'' and ''G''(1−''X'') are positive, since:

: \psi_1(\alpha) - \psi_1(\alpha + \beta) = \frac{\partial \ln G_X}{\partial \alpha} > 0
: \psi_1(\beta)  - \psi_1(\alpha + \beta) = \frac{\partial \ln G_{(1-X)}}{\partial \beta} > 0

While these slopes are indeed positive, the other slopes are negative:

:\frac{\partial \ln G_X}{\partial \beta}, \frac{\partial \ln G_{(1-X)}}{\partial \alpha} < 0.

The slopes of the mean and the median with respect to ''α'' and ''β'' display similar sign behavior.

From the condition that at a maximum the partial derivative with respect to the shape parameter equals zero, we obtain the following system of coupled maximum likelihood estimate equations (for the average log-likelihoods) that needs to be inverted to obtain the (unknown) shape parameter estimates \hat{\alpha},\hat{\beta} in terms of the (known) average of logarithms of the samples ''X''1, ..., ''X''''N'':

:\begin{align}
\hat{\operatorname{E}}[\ln (X)] &= \psi(\hat{\alpha}) - \psi(\hat{\alpha} + \hat{\beta})=\frac{1}{N}\sum_{i=1}^N \ln X_i =  \ln \hat{G}_X \\
\hat{\operatorname{E}}[\ln(1-X)] &= \psi(\hat{\beta}) - \psi(\hat{\alpha} + \hat{\beta})=\frac{1}{N}\sum_{i=1}^N \ln (1-X_i)= \ln \hat{G}_{(1-X)}
\end{align}

where we recognize \ln \hat{G}_X as the logarithm of the sample geometric mean and \ln \hat{G}_{(1-X)} as the logarithm of the sample geometric mean based on (1 − ''X''), the mirror-image of ''X''. For \hat{\alpha}=\hat{\beta}, it follows that \hat{G}_X=\hat{G}_{(1-X)}.

:\begin{align}
\hat{G}_X &= \prod_{i=1}^N (X_i)^{\frac{1}{N}} \\
\hat{G}_{(1-X)} &= \prod_{i=1}^N (1-X_i)^{\frac{1}{N}}
\end{align}

These coupled equations containing digamma function
s_of_the_shape_parameter_estimates_\hat,\hat_must_be_solved_by_numerical_methods_as_done,_for_example,_by_Beckman_et_al._Gnanadesikan_et_al._give_numerical_solutions_for_a_few_cases._Norman_Lloyd_Johnson, N.L.Johnson_and_Samuel_Kotz, S.Kotz_suggest_that_for_"not_too_small"_shape_parameter_estimates_\hat,\hat,_the_logarithmic_approximation_to_the_digamma_function_\psi(\hat)_\approx_\ln(\hat-\tfrac)_may_be_used_to_obtain_initial_values_for_an_iterative_solution,_since_the_equations_resulting_from_this_approximation_can_be_solved_exactly: :\ln_\frac__\approx__\ln_\hat_X_ :\ln_\frac\approx_\ln_\hat__ which_leads_to_the_following_solution_for_the_initial_values_(of_the_estimate_shape_parameters_in_terms_of_the_sample_geometric_means)_for_an_iterative_solution: :\hat\approx_\tfrac_+_\frac_\text_\hat_>1 :\hat\approx_\tfrac_+_\frac_\text_\hat_>_1 Alternatively,_the_estimates_provided_by_the_method_of_moments_can_instead_be_used_as_initial_values_for_an_iterative_solution_of_the_maximum_likelihood_coupled_equations_in_terms_of_the_digamma_functions. When_the_distribution_is_required_over_a_known_interval_other_than_,_1_ The_comma__is_a_punctuation_mark_that_appears_in_several_variants_in_different_languages._It_has_the_same_shape_as_an_apostrophe_or_single_closing_quotation_mark_()_in_many_typefaces,_but_it_differs_from_them_in_being_placed_on_the__baseline_o_...
_with_random_variable_''X'',_say_[''a'',_''c'']_with_random_variable_''Y'',_then_replace_ln(''Xi'')_in_the_first_equation_with :\ln_\frac, and_replace_ln(1−''Xi'')_in_the_second_equation_with :\ln_\frac (see_"Alternative_parametrizations,_four_parameters"_section_below). If_one_of_the_shape_parameters_is_known,_the_problem_is_considerably_simplified.__The_following_logit_transformation_can_be_used_to_solve_for_the_unknown_shape_parameter_(for_skewed_cases_such_that_\hat\neq\hat,_otherwise,_if_symmetric,_both_-equal-_parameters_are_known_when_one_is_known): :\hat_\left[\ln_\left(\frac_\right)_\right]=\psi(\hat)_-_\psi(\hat)=\frac\sum_^N_\ln\frac_=__\ln_\hat_X_-_\ln_\left(\hat_\right)_ This_logit_transformation_is_the_logarithm_of_the_transformation_that_divides_the_variable_''X''_by_its_mirror-image_(''X''/(1_-_''X'')_resulting_in_the_"inverted_beta_distribution"__or_beta_prime_distribution_ In_probability_theory_and__statistics,_the_beta_prime_distribution_(also_known_as_inverted_beta_distribution_or_beta_distribution_of_the_second_kindJohnson_et_al_(1995),_p_248)_is_an_absolutely_continuous_probability_distribution. __Definitions_ _...
_(also_known_as_beta_distribution_of_the_second_kind_or_Pearson_distribution, Pearson's_Type_VI)_with_support_[0,_+∞)._As_previously_discussed_in_the_section_"Moments_of_logarithmically_transformed_random_variables,"_the_logit_transformation_\ln\frac,_studied_by_Johnson,_extends_the_finite_support_,_1_ The_comma__is_a_punctuation_mark_that_appears_in_several_variants_in_different_languages._It_has_the_same_shape_as_an_apostrophe_or_single_closing_quotation_mark_()_in_many_typefaces,_but_it_differs_from_them_in_being_placed_on_the__baseline_o_...
based_on_the_original_variable_''X''_to_infinite_support_in_both_directions_of_the_real_line_(−∞,_+∞). If,_for_example,_\hat_is_known,_the_unknown_parameter_\hat_can_be_obtained_in_terms_of_the_inverse
_digamma_function_of_the_right_hand_side_of_this_equation: :\psi(\hat)=\frac\sum_^N_\ln\frac_+_\psi(\hat)_ :\hat=\psi^(\ln_\hat_X_-_\ln_\hat__+_\psi(\hat))_ In_particular,_if_one_of_the_shape_parameters_has_a_value_of_unity,_for_example_for_\hat_=_1_(the_power_function_distribution_with_bounded_support_[0,1]),_using_the_identity_ψ(''x''_+_1)_=_ψ(''x'')_+_1/''x''_in_the_equation_\psi(\hat)_-_\psi(\hat_+_\hat)=_\ln_\hat_X,_the_maximum_likelihood_estimator_for_the_unknown_parameter_\hat_is,_exactly: :\hat=_-_\frac=_-_\frac_ The_beta_has_support_[0,_1],_therefore_\hat_X_<_1,_and_hence_(-\ln_\hat_X)_>0,_and_therefore_\hat_>0. In_conclusion,_the_maximum_likelihood_estimates_of_the_shape_parameters_of_a_beta_distribution_are_(in_general)_a_complicated_function_of_the_sample__geometric_mean,_and_of_the_sample__geometric_mean_based_on_''(1−X)'',_the_mirror-image_of_''X''.__One_may_ask,_if_the_variance_(in_addition_to_the_mean)_is_necessary_to_estimate_two_shape_parameters_with_the_method_of_moments,_why_is_the_(logarithmic_or_geometric)_variance_not_necessary_to_estimate_two_shape_parameters_with_the_maximum_likelihood_method,_for_which_only_the_geometric_means_suffice?__The_answer_is_because_the_mean_does_not_provide_as_much_information_as_the_geometric_mean.__For_a_beta_distribution_with_equal_shape_parameters_''α'' = ''β'',_the_mean_is_exactly_1/2,_regardless_of_the_value_of_the_shape_parameters,_and_therefore_regardless_of_the_value_of_the_statistical_dispersion_(the_variance).__On_the_other_hand,_the_geometric_mean_of_a_beta_distribution_with_equal_shape_parameters_''α'' = ''β'',_depends_on_the_value_of_the_shape_parameters,_and_therefore_it_contains_more_information.__Also,_the_geometric_mean_of_a_beta_distribution_does_not_satisfy_the_symmetry_conditions_satisfied_by_the_mean,_therefore,_by_employing_both_the_geometric_mean_based_on_''X''_and_geometric_mean_based_on_(1 − ''X''),_the_maximum_likelihood_method_is_able_to_provide_best_estimates_for_both_parameters_''α'' = ''β'',_without_need_of_employing_the_variance. One_can_express_the_joint_log_likelihood_per_''N''_independent_and_identically_distributed_random_variables, iid_observations_in_terms_of_the_''sufficient_statistics''_(the_sample_geometric_means)_as_follows: :\frac_=_(\alpha_-_1)\ln_\hat_X_+_(\beta-_1)\ln_\hat_-_\ln_\Beta(\alpha,\beta). 
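The coupled digamma equations above can be inverted numerically along the lines just described. The following sketch (the helper name fit_beta_mle is arbitrary; it uses SciPy's digamma and a generic root finder, with a method-of-moments starting point rather than the Johnson–Kotz logarithmic approximation) is one possible implementation, not a reference one:

 import numpy as np
 from scipy.special import digamma
 from scipy.optimize import fsolve

 def fit_beta_mle(x):
     """Solve psi(a) - psi(a+b) = ln G_X and psi(b) - psi(a+b) = ln G_(1-X)
     for the shape parameters (a, b), starting from method-of-moments values."""
     x = np.asarray(x, dtype=float)
     ln_gx = np.mean(np.log(x))        # log of the sample geometric mean of X
     ln_g1mx = np.mean(np.log1p(-x))   # log of the sample geometric mean of 1 - X

     # Method-of-moments starting point (assumes a beta-like sample on (0, 1)).
     m, v = x.mean(), x.var()
     nu0 = m * (1.0 - m) / v - 1.0
     start = (m * nu0, (1.0 - m) * nu0)

     def equations(params):
         a, b = params
         return (digamma(a) - digamma(a + b) - ln_gx,
                 digamma(b) - digamma(a + b) - ln_g1mx)

     return fsolve(equations, start)

 rng = np.random.default_rng(1)
 print(fit_beta_mle(rng.beta(2.0, 3.0, size=5_000)))   # approximately (2, 3)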
We_can_plot_the_joint_log_likelihood_per_''N''_observations_for_fixed_values_of_the_sample_geometric_means_to_see_the_behavior_of_the_likelihood_function_as_a_function_of_the_shape_parameters_α_and_β._In_such_a_plot,_the_shape_parameter_estimators_\hat,\hat_correspond_to_the_maxima_of_the_likelihood_function._See_the_accompanying_graph_that_shows_that_all_the_likelihood_functions_intersect_at_α_=_β_=_1,_which_corresponds_to_the_values_of_the_shape_parameters_that_give_the_maximum_entropy_(the_maximum_entropy_occurs_for_shape_parameters_equal_to_unity:_the_uniform_distribution).__It_is_evident_from_the_plot_that_the_likelihood_function_gives_sharp_peaks_for_values_of_the_shape_parameter_estimators_close_to_zero,_but_that_for_values_of_the_shape_parameters_estimators_greater_than_one,_the_likelihood_function_becomes_quite_flat,_with_less_defined_peaks.__Obviously,_the_maximum_likelihood_parameter_estimation_method_for_the_beta_distribution_becomes_less_acceptable_for_larger_values_of_the_shape_parameter_estimators,_as_the_uncertainty_in_the_peak_definition_increases_with_the_value_of_the_shape_parameter_estimators.__One_can_arrive_at_the_same_conclusion_by_noticing_that_the_expression_for_the_curvature_of_the_likelihood_function_is_in_terms_of_the_geometric_variances :\frac=_-\operatorname_ln_X/math> :\frac_=_-\operatorname[\ln_(1-X)] These_variances_(and_therefore_the_curvatures)_are_much_larger_for_small_values_of_the_shape_parameter_α_and_β._However,_for_shape_parameter_values_α,_β_>_1,_the_variances_(and_therefore_the_curvatures)_flatten_out.__Equivalently,_this_result_follows_from_the_Cramér–Rao_bound,_since_the_Fisher_information_ In_mathematical_statistics,_the_Fisher_information_(sometimes_simply_called_information)_is_a_way_of_measuring_the_amount_of_information_that_an_observable_random_variable_''X''_carries_about_an_unknown_parameter_''θ''_of_a_distribution_that_model_...
_matrix_components_for_the_beta_distribution_are_these_logarithmic_variances._The_Cramér–Rao_bound_states_that_the_variance__ In_probability_theory_and_statistics,_variance_is_the__expectation_of_the_squared__deviation_of_a__random_variable_from_its__population_mean_or__sample_mean._Variance_is_a_measure_of_dispersion,_meaning_it_is_a_measure_of_how_far_a_set_of_numbe_...
_of_any_''unbiased''_estimator_\hat_of_α_is_bounded_by_the_multiplicative_inverse, reciprocal_of_the_Fisher_information_ In_mathematical_statistics,_the_Fisher_information_(sometimes_simply_called_information)_is_a_way_of_measuring_the_amount_of_information_that_an_observable_random_variable_''X''_carries_about_an_unknown_parameter_''θ''_of_a_distribution_that_model_...
: :\mathrm(\hat)\geq\frac\geq\frac :\mathrm(\hat)_\geq\frac\geq\frac so_the_variance_of_the_estimators_increases_with_increasing_α_and_β,_as_the_logarithmic_variances_decrease. Also_one_can_express_the_joint_log_likelihood_per_''N''_independent_and_identically_distributed_random_variables, iid_observations_in_terms_of_the_digamma_function_ In_mathematics,_the_digamma_function_is_defined_as_the__logarithmic_derivative_of_the_gamma_function: :\psi(x)=\frac\ln\big(\Gamma(x)\big)=\frac\sim\ln-\frac. It_is_the_first_of_the__polygamma_functions._It_is_strictly_increasing_and_strict_...
_expressions_for_the_logarithms_of_the_sample_geometric_means_as_follows: :\frac_=_(\alpha_-_1)(\psi(\hat)_-_\psi(\hat_+_\hat))+(\beta-_1)(\psi(\hat)_-_\psi(\hat_+_\hat))-_\ln_\Beta(\alpha,\beta) this_expression_is_identical_to_the_negative_of_the_cross-entropy_(see_section_on_"Quantities_of_information_(entropy)").__Therefore,_finding_the_maximum_of_the_joint_log_likelihood_of_the_shape_parameters,_per_''N''_independent_and_identically_distributed_random_variables, iid_observations,_is_identical_to_finding_the_minimum_of_the_cross-entropy_for_the_beta_distribution,_as_a_function_of_the_shape_parameters. :\frac_=_-_H_=_-h_-_D__=_-\ln\Beta(\alpha,\beta)+(\alpha-1)\psi(\hat)+(\beta-1)\psi(\hat)-(\alpha+\beta-2)\psi(\hat+\hat) with_the_cross-entropy_defined_as_follows: :H_=_\int_^1_-_f(X;\hat,\hat)_\ln_(f(X;\alpha,\beta))_\,_X_


Four unknown parameters

= The_procedure_is_similar_to_the_one_followed_in_the_two_unknown_parameter_case._If_''Y''1,_...,_''YN''_are_independent_random_variables_each_having_a_beta_distribution_with_four_parameters,_the_joint_log_likelihood_function_for_''N''_independent_and_identically_distributed_random_variables, iid_observations_is: :\begin \ln\,_\mathcal_(\alpha,_\beta,_a,_c\mid_Y)_&=_\sum_^N_\ln\,\mathcal_i_(\alpha,_\beta,_a,_c\mid_Y_i)\\ &=_\sum_^N_\ln\,f(Y_i;_\alpha,_\beta,_a,_c)_\\ &=_\sum_^N_\ln\,\frac\\ &=_(\alpha_-_1)\sum_^N__\ln_(Y_i_-_a)_+_(\beta-_1)\sum_^N__\ln_(c_-_Y_i)-_N_\ln_\Beta(\alpha,\beta)_-_N_(\alpha+\beta_-_1)_\ln_(c_-_a) \end Finding_the_maximum_with_respect_to_a_shape_parameter_involves_taking_the_partial_derivative_with_respect_to_the_shape_parameter_and_setting_the_expression_equal_to_zero_yielding_the_maximum_likelihood_estimator_of_the_shape_parameters: :\frac=_\sum_^N__\ln_(Y_i_-_a)_-_N(-\psi(\alpha_+_\beta)_+_\psi(\alpha))-_N_\ln_(c_-_a)=_0 :\frac_=_\sum_^N__\ln_(c_-_Y_i)_-_N(-\psi(\alpha_+_\beta)__+_\psi(\beta))-_N_\ln_(c_-_a)=_0 :\frac_=_-(\alpha_-_1)_\sum_^N__\frac_\,+_N_(\alpha+\beta_-_1)\frac=_0 :\frac_=_(\beta-_1)_\sum_^N__\frac_\,-_N_(\alpha+\beta_-_1)_\frac_=_0 these_equations_can_be_re-arranged_as_the_following_system_of_four_coupled_equations_(the_first_two_equations_are_geometric_means_and_the_second_two_equations_are_the_harmonic_means)_in_terms_of_the_maximum_likelihood_estimates_for_the_four_parameters_\hat,_\hat,_\hat,_\hat: :\frac\sum_^N__\ln_\frac_=_\psi(\hat)-\psi(\hat_+\hat_)=__\ln_\hat_X :\frac\sum_^N__\ln_\frac_=__\psi(\hat)-\psi(\hat_+_\hat)=__\ln_\hat_ :\frac_=_\frac=__\hat_X :\frac_=_\frac_=__\hat_ with_sample_geometric_means: :\hat_X_=_\prod_^_\left_(\frac_\right_)^ :\hat__=_\prod_^_\left_(\frac_\right_)^ The_parameters_\hat,_\hat_are_embedded_inside_the_geometric_mean_expressions_in_a_nonlinear_way_(to_the_power_1/''N'').__This_precludes,_in_general,_a_closed_form_solution,_even_for_an_initial_value_approximation_for_iteration_purposes.__One_alternative_is_to_use_as_initial_values_for_iteration_the_values_obtained_from_the_method_of_moments_solution_for_the_four_parameter_case.__Furthermore,_the_expressions_for_the_harmonic_means_are_well-defined_only_for_\hat,_\hat_>_1,_which_precludes_a_maximum_likelihood_solution_for_shape_parameters_less_than_unity_in_the_four-parameter_case._Fisher's_information_matrix_for_the_four_parameter_case_is_Positive-definite_matrix, positive-definite_only_for_α,_β_>_2_(for_further_discussion,_see_section_on_Fisher_information_matrix,_four_parameter_case),_for_bell-shaped_(symmetric_or_unsymmetric)_beta_distributions,_with_inflection_points_located_to_either_side_of_the_mode._The_following_Fisher_information_components_(that_represent_the_expectations_of_the_curvature_of_the_log_likelihood_function)_have_mathematical_singularity, singularities_at_the_following_values: :\alpha_=_2:_\quad_\operatorname_\left_[-_\frac_\frac_\right_]=__ :\beta_=_2:_\quad_\operatorname\left_[-_\frac_\frac_\right_]_=__ :\alpha_=_2:_\quad_\operatorname\left_[-_\frac\frac\right_]_=___ :\beta_=_1:_\quad_\operatorname\left_[-_\frac\frac_\right_]_=____ (for_further_discussion_see_section_on_Fisher_information_matrix)._Thus,_it_is_not_possible_to_strictly_carry_on_the_maximum_likelihood_estimation_for_some_well_known_distributions_belonging_to_the_four-parameter_beta_distribution_family,_like_the_continuous_uniform_distribution, 
uniform_distribution_(Beta(1,_1,_''a'',_''c'')),_and_the__arcsine_distribution_(Beta(1/2,_1/2,_''a'',_''c'')).__Norman_Lloyd_Johnson, N.L.Johnson_and_Samuel_Kotz, S.Kotz_ignore_the_equations_for_the_harmonic_means_and_instead_suggest_"If_a_and_c_are_unknown,_and_maximum_likelihood_estimators_of_''a'',_''c'',_α_and_β_are_required,_the_above_procedure_(for_the_two_unknown_parameter_case,_with_''X''_transformed_as_''X''_=_(''Y'' − ''a'')/(''c'' − ''a''))_can_be_repeated_using_a_succession_of_trial_values_of_''a''_and_''c'',_until_the_pair_(''a'',_''c'')_for_which_maximum_likelihood_(given_''a''_and_''c'')_is_as_great_as_possible,_is_attained"_(where,_for_the_purpose_of_clarity,_their_notation_for_the_parameters_has_been_translated_into_the_present_notation).
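A minimal sketch of the Johnson and Kotz suggestion quoted above: for trial endpoints (''a'', ''c''), rescale the data, fit the two shape parameters by ordinary two-parameter maximum likelihood, and score the trial pair with the four-parameter log likelihood. The helper name profile_loglik and the use of SciPy's beta.fit with the location and scale held fixed are illustrative choices, not part of the original procedure:

 import numpy as np
 from scipy import stats
 from scipy.special import betaln

 def profile_loglik(y, a, c):
     """Profile log likelihood (per observation) at trial endpoints (a, c):
     rescale to X = (Y - a)/(c - a), estimate (alpha, beta) by two-parameter
     maximum likelihood, then evaluate the four-parameter log likelihood."""
     y = np.asarray(y, dtype=float)
     x = (y - a) / (c - a)
     alpha, beta, _, _ = stats.beta.fit(x, floc=0, fscale=1)  # shape parameters only
     return ((alpha - 1.0) * np.mean(np.log(y - a))
             + (beta - 1.0) * np.mean(np.log(c - y))
             - betaln(alpha, beta)
             - (alpha + beta - 1.0) * np.log(c - a))

 # Coarse search over trial endpoints (a must lie below min(y), c above max(y)):
 # best_a, best_c = max(((a, c) for a in a_grid for c in c_grid),
 #                      key=lambda ac: profile_loglik(y, *ac))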


Fisher information matrix

Let a random variable ''X'' have a probability density ''f''(''x''; ''α''). The partial derivative with respect to the (unknown, and to be estimated) parameter ''α'' of the log likelihood function is called the score. The second moment of the score is called the Fisher information:

:\mathcal{I}(\alpha)=\operatorname{E} \left[\left(\frac{\partial}{\partial\alpha} \ln \mathcal{L}(\alpha\mid X) \right)^2 \right].

The expectation of the score is zero, therefore the Fisher information is also the second moment centered on the mean of the score: the variance of the score.

If the log likelihood function is twice differentiable with respect to the parameter ''α'', and under certain regularity conditions, then the Fisher information may also be written as follows (which is often a more convenient form for calculation purposes):

:\mathcal{I}(\alpha) = - \operatorname{E} \left[\frac{\partial^2}{\partial\alpha^2} \ln (\mathcal{L}(\alpha\mid X)) \right].

Thus, the Fisher information is the negative of the expectation of the second derivative with respect to the parameter ''α'' of the log likelihood function. Therefore, Fisher information is a measure of the curvature of the log likelihood function of ''α''. A flat log likelihood curve, with low curvature (and therefore high radius of curvature), has low Fisher information; a log likelihood curve with large curvature (and therefore low radius of curvature) has high Fisher information. When the Fisher information matrix is computed at the estimates of the parameters ("the observed Fisher information matrix"), it is equivalent to replacing the true log likelihood surface by a Taylor-series approximation taken as far as the quadratic terms. The word information, in the context of Fisher information, refers to information about the parameters, in matters such as estimation, sufficiency and the variances of estimators. The Cramér–Rao bound states that the inverse of the Fisher information is a lower bound on the variance of any unbiased estimator of a parameter ''α'':

:\operatorname{var}[\hat\alpha] \geq \frac{1}{\mathcal{I}(\alpha)}.

The precision with which one can estimate the parameter ''α'' is limited by the Fisher information of the log likelihood function. The Fisher information is a measure of the minimum error involved in estimating a parameter of a distribution, and it can be viewed as a measure of the resolving power of an experiment needed to discriminate between two alternative hypotheses about a parameter.

When there are ''N'' parameters

: \begin{bmatrix} \theta_1 \\ \theta_2 \\ \vdots \\ \theta_N \end{bmatrix},

then the Fisher information takes the form of an ''N''×''N'' positive semidefinite symmetric matrix, the Fisher information matrix, with typical element:

:{(\mathcal{I}(\theta))}_{i,j}=\operatorname{E} \left[\left(\frac{\partial}{\partial\theta_i} \ln \mathcal{L} \right) \left(\frac{\partial}{\partial\theta_j} \ln \mathcal{L} \right) \right].

Under certain regularity conditions, the Fisher information matrix may also be written in the following form, which is often more convenient for computation:

:{(\mathcal{I}(\theta))}_{i,j} = - \operatorname{E} \left[\frac{\partial^2}{\partial\theta_i \, \partial\theta_j} \ln (\mathcal{L}) \right]\,.

With ''X''1, ..., ''XN'' iid random variables, an ''N''-dimensional "box" can be constructed with sides ''X''1, ..., ''XN''. Costa and Cover show that the (Shannon) differential entropy ''h''(''X'') is related to the volume of the typical set (having the sample entropy close to the true entropy), while the Fisher information is related to the surface of this typical set.


Two parameters

= For_''X''1,_...,_''X''''N''_independent_random_variables_each_having_a_beta_distribution_parametrized_with_shape_parameters_''α''_and_''β'',_the_joint_log_likelihood_function_for_''N''_independent_and_identically_distributed_random_variables, iid_observations_is: :\ln_(\mathcal_(\alpha,_\beta\mid_X)_)=_(\alpha_-_1)\sum_^N_\ln_X_i_+_(\beta-_1)\sum_^N__\ln_(1-X_i)-_N_\ln_\Beta(\alpha,\beta)_ therefore_the_joint_log_likelihood_function_per_''N''_independent_and_identically_distributed_random_variables, iid_observations_is: :\frac_\ln(\mathcal_(\alpha,_\beta\mid_X))_=_(\alpha_-_1)\frac\sum_^N__\ln_X_i_+_(\beta-_1)\frac\sum_^N__\ln_(1-X_i)-\,_\ln_\Beta(\alpha,\beta) For_the_two_parameter_case,_the_Fisher_information_has_4_components:_2_diagonal_and_2_off-diagonal._Since_the_Fisher_information_matrix_is_symmetric,_one_of_these_off_diagonal_components_is_independent._Therefore,_the_Fisher_information_matrix_has_3_independent_components_(2_diagonal_and_1_off_diagonal). _ Aryal_and_Nadarajah
_calculated_Fisher's_information_matrix_for_the_four-parameter_case,_from_which_the_two_parameter_case_can_be_obtained_as_follows: :-_\frac=__\operatorname[\ln_(X)]=_\psi_1(\alpha)_-_\psi_1(\alpha_+_\beta)_=_=_\operatorname\left_[-_\frac_\right_]_=_\ln_\operatorname__ :-_\frac_=_\operatorname_ln_(1-X)=_\psi_1(\beta)_-_\psi_1(\alpha_+_\beta)_=_=__\operatorname\left_[-_\frac_\right]=_\ln_\operatorname__ :-_\frac_=_\operatorname[\ln_X,\ln(1-X)]__=_-\psi_1(\alpha+\beta)_=_=__\operatorname\left_[-_\frac_\right]_=_\ln_\operatorname_ Since_the_Fisher_information_matrix_is_symmetric :_\mathcal_=_\mathcal_=_\ln_\operatorname_ The_Fisher_information_components_are_equal_to_the_log_geometric_variances_and_log_geometric_covariance._Therefore,_they_can_be_expressed_as_trigamma_function_ In_mathematics,_the_trigamma_function,_denoted__or_,_is_the_second_of_the_polygamma_functions,_and_is_defined_by :_\psi_1(z)_=_\frac_\ln\Gamma(z). It_follows_from_this_definition_that :_\psi_1(z)_=_\frac_\psi(z) where__is_the_digamma_functio_...
s,_denoted_ψ1(α),__the_second_of_the_polygamma_function_ In_mathematics,_the_polygamma_function_of_order__is_a_meromorphic_function_on_the__complex_numbers_\mathbb_defined_as_the_th__derivative_of_the_logarithm_of_the_gamma_function: :\psi^(z)_:=_\frac_\psi(z)_=_\frac_\ln\Gamma(z). Thus :\psi^(z)__...
s,_defined_as_the_derivative_of_the_digamma_function: :\psi_1(\alpha)_=_\frac=\,_\frac._ These_derivatives_are_also_derived_in_the__and_plots_of_the_log_likelihood_function_are_also_shown_in_that_section.___contains_plots_and_further_discussion_of_the_Fisher_information_matrix_components:_the_log_geometric_variances_and_log_geometric_covariance_as_a_function_of_the_shape_parameters_α_and_β.___contains_formulas_for_moments_of_logarithmically_transformed_random_variables._Images_for_the_Fisher_information_components_\mathcal_,_\mathcal__and_\mathcal__are_shown_in_. The_determinant_of_Fisher's_information_matrix_is_of_interest_(for_example_for_the_calculation_of_Jeffreys_prior_probability).__From_the_expressions_for_the_individual_components_of_the_Fisher_information_matrix,_it_follows_that_the_determinant_of_Fisher's_(symmetric)_information_matrix_for_the_beta_distribution_is: :\begin \det(\mathcal(\alpha,_\beta))&=_\mathcal__\mathcal_-\mathcal__\mathcal__\\_pt&=(\psi_1(\alpha)_-_\psi_1(\alpha_+_\beta))(\psi_1(\beta)_-_\psi_1(\alpha_+_\beta))-(_-\psi_1(\alpha+\beta))(_-\psi_1(\alpha+\beta))\\_pt&=_\psi_1(\alpha)\psi_1(\beta)-(_\psi_1(\alpha)+\psi_1(\beta))\psi_1(\alpha_+_\beta)\\_pt\lim__\det(\mathcal(\alpha,_\beta))_&=\lim__\det(\mathcal(\alpha,_\beta))_=_\infty\\_pt\lim__\det(\mathcal(\alpha,_\beta))_&=\lim__\det(\mathcal(\alpha,_\beta))_=_0 \end From_Sylvester's_criterion_(checking_whether_the_diagonal_elements_are_all_positive),_it_follows_that_the_Fisher_information_matrix_for_the_two_parameter_case_is_Positive-definite_matrix, positive-definite_(under_the_standard_condition_that_the_shape_parameters_are_positive_''α'' > 0_and ''β'' > 0).
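Since the two-parameter Fisher information components reduce to trigamma expressions, they are straightforward to evaluate numerically. The sketch below (the helper name beta_fisher_info is arbitrary) builds the per-observation matrix and its determinant with SciPy's polygamma:

 import numpy as np
 from scipy.special import polygamma

 def beta_fisher_info(alpha, beta):
     """Per-observation Fisher information matrix of Beta(alpha, beta),
     built from the trigamma expressions above, psi_1(.) = polygamma(1, .)."""
     t_a = polygamma(1, alpha)
     t_b = polygamma(1, beta)
     t_ab = polygamma(1, alpha + beta)
     return np.array([[t_a - t_ab, -t_ab],
                      [-t_ab,      t_b - t_ab]])

 I = beta_fisher_info(2.0, 3.0)
 print(I)
 # determinant = psi1(a)*psi1(b) - (psi1(a) + psi1(b))*psi1(a + b)
 print(np.linalg.det(I))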


Four parameters

= If_''Y''1,_...,_''YN''_are_independent_random_variables_each_having_a_beta_distribution_with_four_parameters:_the_exponents_''α''_and_''β'',_and_also_''a''_(the_minimum_of_the_distribution_range),_and_''c''_(the_maximum_of_the_distribution_range)_(section_titled_"Alternative_parametrizations",_"Four_parameters"),_with_probability_density_function_ In_probability_theory,_a_probability_density_function_(PDF),_or_density_of_a_continuous_random_variable,_is_a__function_whose_value_at_any_given_sample_(or_point)_in_the__sample_space_(the_set_of_possible_values_taken_by_the_random_variable)_ca_...
: :f(y;_\alpha,_\beta,_a,_c)_=_\frac_=\frac=\frac. the_joint_log_likelihood_function_per_''N''_independent_and_identically_distributed_random_variables, iid_observations_is: :\frac_\ln(\mathcal_(\alpha,_\beta,_a,_c\mid_Y))=_\frac\sum_^N__\ln_(Y_i_-_a)_+_\frac\sum_^N__\ln_(c_-_Y_i)-_\ln_\Beta(\alpha,\beta)_-_(\alpha+\beta_-1)_\ln_(c-a)_ For_the_four_parameter_case,_the_Fisher_information_has_4*4=16_components.__It_has_12_off-diagonal_components_=_(4×4_total_−_4_diagonal)._Since_the_Fisher_information_matrix_is_symmetric,_half_of_these_components_(12/2=6)_are_independent._Therefore,_the_Fisher_information_matrix_has_6_independent_off-diagonal_+_4_diagonal_=_10_independent_components.__Aryal_and_Nadarajah_calculated_Fisher's_information_matrix_for_the_four_parameter_case_as_follows: :-_\frac_\frac=__\operatorname[\ln_(X)]=_\psi_1(\alpha)_-_\psi_1(\alpha_+_\beta)_=_\mathcal_=_\operatorname\left_[-_\frac_\frac_\right_]_=_\ln_(\operatorname)_ :-\frac_\frac_=_\operatorname_ln_(1-X)=_\psi_1(\beta)_-_\psi_1(\alpha_+_\beta)_=_=__\operatorname_\left_[-_\frac_\frac_\right_]_=_\ln(\operatorname)_ :-\frac_\frac_=_\operatorname[\ln_X,(1-X)]__=_-\psi_1(\alpha+\beta)_=\mathcal_=__\operatorname_\left_[-_\frac\frac_\right_]_=_\ln(\operatorname_) In_the_above_expressions,_the_use_of_''X''_instead_of_''Y''_in_the_expressions_var[ln(''X'')]_=_ln(var''GX'')_is_''not_an_error''._The_expressions_in_terms_of_the_log_geometric_variances_and_log_geometric_covariance_occur_as_functions_of_the_two_parameter_''X''_~_Beta(''α'',_''β'')_parametrization_because_when_taking_the_partial_derivatives_with_respect_to_the_exponents_(''α'',_''β'')_in_the_four_parameter_case,_one_obtains_the_identical_expressions_as_for_the_two_parameter_case:_these_terms_of_the_four_parameter_Fisher_information_matrix_are_independent_of_the_minimum_''a''_and_maximum_''c''_of_the_distribution's_range._The_only_non-zero_term_upon_double_differentiation_of_the_log_likelihood_function_with_respect_to_the_exponents_''α''_and_''β''_is_the_second_derivative_of_the_log_of_the_beta_function:_ln(B(''α'',_''β''))._This_term_is_independent_of_the_minimum_''a''_and_maximum_''c''_of_the_distribution's_range._Double_differentiation_of_this_term_results_in_trigamma_functions.__The_sections_titled_"Maximum_likelihood",_"Two_unknown_parameters"_and_"Four_unknown_parameters"_also_show_this_fact. The_Fisher_information_for_''N''_i.i.d._samples_is_''N''_times_the_individual_Fisher_information_(eq._11.279,_page_394_of_Cover_and_Thomas).__(Aryal_and_Nadarajah_take_a_single_observation,_''N''_=_1,_to_calculate_the_following_components_of_the_Fisher_information,_which_leads_to_the_same_result_as_considering_the_derivatives_of_the_log_likelihood_per_''N''_observations._Moreover,_below_the_erroneous_expression_for___in_Aryal_and_Nadarajah_has_been_corrected.) 
:\begin \alpha_>_2:_\quad_\operatorname\left_[-_\frac_\frac_\right_]_&=__=\frac_\\ \beta_>_2:_\quad_\operatorname\left[-\frac_\frac_\right_]_&=_\mathcal__=_\frac_\\ \operatorname\left[-_\frac_\frac_\right_]_&=____=_\frac_\\ \alpha_>_1:_\quad_\operatorname\left[-_\frac_\frac_\right_]_&=\mathcal___=_\frac_\\ \operatorname\left[-_\frac_\frac_\right_]_&=___=_\frac_\\ \operatorname\left[-_\frac_\frac_\right_]_&=___=_-\frac_\\ \beta_>_1:_\quad_\operatorname\left[-_\frac_\frac_\right_]_&=_\mathcal___=_-\frac \end The_lower_two_diagonal_entries_of_the_Fisher_information_matrix,_with_respect_to_the_parameter_"a"_(the_minimum_of_the_distribution's_range):_\mathcal_,_and_with_respect_to_the_parameter_"c"_(the_maximum_of_the_distribution's_range):_\mathcal__are_only_defined_for_exponents_α_>_2_and_β_>_2_respectively._The_Fisher_information_matrix_component_\mathcal__for_the_minimum_"a"_approaches_infinity_for_exponent_α_approaching_2_from_above,_and_the_Fisher_information_matrix_component_\mathcal__for_the_maximum_"c"_approaches_infinity_for_exponent_β_approaching_2_from_above. The_Fisher_information_matrix_for_the_four_parameter_case_does_not_depend_on_the_individual_values_of_the_minimum_"a"_and_the_maximum_"c",_but_only_on_the_total_range_(''c''−''a'').__Moreover,_the_components_of_the_Fisher_information_matrix_that_depend_on_the_range_(''c''−''a''),_depend_only_through_its_inverse_(or_the_square_of_the_inverse),_such_that_the_Fisher_information_decreases_for_increasing_range_(''c''−''a''). The_accompanying_images_show_the_Fisher_information_components_\mathcal__and_\mathcal_._Images_for_the_Fisher_information_components_\mathcal__and_\mathcal__are_shown_in__.__All_these_Fisher_information_components_look_like_a_basin,_with_the_"walls"_of_the_basin_being_located_at_low_values_of_the_parameters. The_following_four-parameter-beta-distribution_Fisher_information_components_can_be_expressed_in_terms_of_the_two-parameter:_''X''_~_Beta(α,_β)_expectations_of_the_transformed_ratio_((1-''X'')/''X'')_and_of_its_mirror_image_(''X''/(1-''X'')),_scaled_by_the_range_(''c''−''a''),_which_may_be_helpful_for_interpretation: :\mathcal__=\frac=_\frac_\text\alpha_>_1 :\mathcal__=_-\frac=-_\frac\text\beta>_1 These_are_also_the_expected_values_of_the_"inverted_beta_distribution"_or_beta_prime_distribution_ In_probability_theory_and__statistics,_the_beta_prime_distribution_(also_known_as_inverted_beta_distribution_or_beta_distribution_of_the_second_kindJohnson_et_al_(1995),_p_248)_is_an_absolutely_continuous_probability_distribution. __Definitions_ _...
_(also_known_as_beta_distribution_of_the_second_kind_or_Pearson_distribution, Pearson's_Type_VI)__and_its_mirror_image,_scaled_by_the_range_(''c'' − ''a''). Also,_the_following_Fisher_information_components_can_be_expressed_in_terms_of_the_harmonic_(1/X)_variances_or_of_variances_based_on_the_ratio_transformed_variables_((1-X)/X)_as_follows: :\begin \alpha_>_2:_\quad_\mathcal__&=\operatorname_\left_[\frac_\right]_\left_(\frac_\right_)^2_=\operatorname_\left_[\frac_\right_]_\left_(\frac_\right)^2_=_\frac_\\ \beta_>_2:_\quad_\mathcal__&=_\operatorname_\left_[\frac_\right_]_\left_(\frac_\right_)^2_=_\operatorname_\left_[\frac_\right_]_\left_(\frac_\right_)^2__=\frac__\\ \mathcal__&=\operatorname_\left_[\frac,\frac_\right_]\frac__=_\operatorname_\left_[\frac,\frac_\right_]_\frac_=\frac \end See_section_"Moments_of_linearly_transformed,_product_and_inverted_random_variables"_for_these_expectations. The_determinant_of_Fisher's_information_matrix_is_of_interest_(for_example_for_the_calculation_of_Jeffreys_prior_probability).__From_the_expressions_for_the_individual_components,_it_follows_that_the_determinant_of_Fisher's_(symmetric)_information_matrix_for_the_beta_distribution_with_four_parameters_is: :\begin \det(\mathcal(\alpha,\beta,a,c))_=__&_-\mathcal_^2_\mathcal__\mathcal_+\mathcal__\mathcal__\mathcal__\mathcal_+\mathcal_^2_\mathcal_^2_-\mathcal__\mathcal__\mathcal_^2\\ &__-\mathcal__\mathcal__\mathcal__\mathcal_+\mathcal_^2_\mathcal__\mathcal_+2_\mathcal__\mathcal__\mathcal__\mathcal_\\ &_-2\mathcal__\mathcal__\mathcal__\mathcal_+\mathcal_^2_\mathcal_^2-\mathcal__\mathcal__\mathcal_^2+\mathcal__\mathcal_^2_\mathcal_\\ &_-\mathcal__\mathcal__\mathcal__\mathcal_-\mathcal__\mathcal__\mathcal__\mathcal_+\mathcal__\mathcal__\mathcal__\mathcal_\\ &_-\mathcal__\mathcal__\mathcal__\mathcal_+\mathcal__\mathcal__\mathcal__\mathcal_-\mathcal__\mathcal_^2_\mathcal_\\ &_+2_\mathcal__\mathcal__\mathcal__\mathcal_-\mathcal__\mathcal_^2_\mathcal_-\mathcal_^2_\mathcal__\mathcal_+\mathcal__\mathcal__\mathcal__\mathcal_\text\alpha,_\beta>_2 \end Using_Sylvester's_criterion_(checking_whether_the_diagonal_elements_are_all_positive),_and_since_diagonal_components___and___have_Mathematical_singularity, singularities_at_α=2_and_β=2_it_follows_that_the_Fisher_information_matrix_for_the_four_parameter_case_is_Positive-definite_matrix, positive-definite_for_α>2_and_β>2.__Since_for_α_>_2_and_β_>_2_the_beta_distribution_is_(symmetric_or_unsymmetric)_bell_shaped,_it_follows_that_the_Fisher_information_matrix_is_positive-definite_only_for_bell-shaped_(symmetric_or_unsymmetric)_beta_distributions,_with_inflection_points_located_to_either_side_of_the_mode._Thus,_important_well_known_distributions_belonging_to_the_four-parameter_beta_distribution_family,_like_the_parabolic_distribution_(Beta(2,2,a,c))_and_the_continuous_uniform_distribution, uniform_distribution_(Beta(1,1,a,c))_have_Fisher_information_components_(\mathcal_,\mathcal_,\mathcal_,\mathcal_)_that_blow_up_(approach_infinity)_in_the_four-parameter_case_(although_their_Fisher_information_components_are_all_defined_for_the_two_parameter_case).__The_four-parameter_Wigner_semicircle_distribution_(Beta(3/2,3/2,''a'',''c''))_and__arcsine_distribution_(Beta(1/2,1/2,''a'',''c''))_have_negative_Fisher_information_determinants_for_the_four-parameter_case.


Bayesian inference

The use of Beta distributions in Bayesian inference is due to the fact that they provide a family of conjugate prior probability distributions for binomial (including Bernoulli) and geometric distributions. The domain of the beta distribution can be viewed as a probability, and in fact the beta distribution is often used to describe the distribution of a probability value ''p'':

:P(p;\alpha,\beta) = \frac{p^{\alpha-1}(1-p)^{\beta-1}}{\Beta(\alpha,\beta)}.

Examples of beta distributions used as prior probabilities to represent ignorance of prior parameter values in Bayesian inference are Beta(1,1), Beta(0,0) and Beta(1/2,1/2).
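Because of this conjugacy, updating a Beta(''α''Prior, ''β''Prior) prior with ''s'' successes and ''f'' failures simply adds the counts to the shape parameters, giving a Beta(''α''Prior + ''s'', ''β''Prior + ''f'') posterior. A minimal sketch (the helper name beta_posterior is arbitrary):

 from scipy import stats

 def beta_posterior(s, f, alpha_prior=1.0, beta_prior=1.0):
     """Posterior over p after s successes and f failures, using conjugacy:
     Beta(alpha_prior + s, beta_prior + f).  The defaults correspond to the
     Bayes-Laplace uniform prior Beta(1, 1)."""
     return stats.beta(alpha_prior + s, beta_prior + f)

 post = beta_posterior(s=7, f=3)          # uniform prior, 7 successes in 10 trials
 print(post.mean(), post.interval(0.95))  # posterior mean and 95% credible interval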


Rule of succession

A classic application of the beta distribution is the rule of succession, introduced in the 18th century by Pierre-Simon Laplace in the course of treating the sunrise problem. It states that, given ''s'' successes in ''n'' conditionally independent Bernoulli trials with probability ''p'', the estimate of the expected value in the next trial is (''s'' + 1)/(''n'' + 2). This estimate is the expected value of the posterior distribution over ''p'', namely Beta(''s'' + 1, ''n'' − ''s'' + 1), which is given by Bayes' rule if one assumes a uniform prior probability over ''p'' (i.e., Beta(1, 1)) and then observes that ''p'' generated ''s'' successes in ''n'' trials. Laplace's rule of succession has been criticized by prominent scientists. R. T. Cox described Laplace's application of the rule of succession to the sunrise problem (p. 89) as "a travesty of the proper use of the principle". Keynes remarks (Ch. XXX, p. 382) "indeed this is so foolish a theorem that to entertain it is discreditable". Karl Pearson showed that the probability that the next (''n'' + 1) trials will be successes, after ''n'' successes in ''n'' trials, is only 50%, which has been considered too low by scientists like Jeffreys and unacceptable as a representation of the scientific process of experimentation to test a proposed scientific law. As pointed out by Jeffreys (p. 128), crediting C. D. Broad, Laplace's rule of succession establishes a high probability of success ((''n'' + 1)/(''n'' + 2)) in the next trial, but only a moderate probability (50%) that a further sample (''n'' + 1) comparable in size will be equally successful. As pointed out by Perks, "The rule of succession itself is hard to accept. It assigns a probability to the next trial which implies the assumption that the actual run observed is an average run and that we are always at the end of an average run. It would, one would think, be more reasonable to assume that we were in the middle of an average run. Clearly a higher value for both probabilities is necessary if they are to accord with reasonable belief." These problems with Laplace's rule of succession motivated Haldane, Perks, Jeffreys and others to search for other forms of prior probability (see the following sections). According to Jaynes, the main problem with the rule of succession is that it is not valid when ''s'' = 0 or ''s'' = ''n'' (see rule of succession for an analysis of its validity).


Bayes-Laplace prior probability (Beta(1,1))

The_beta_distribution_achieves_maximum_differential_entropy_for_Beta(1,1):_the_Uniform_density, uniform_probability_density,_for_which_all_values_in_the_domain_of_the_distribution_have_equal_density.__This_uniform_distribution_Beta(1,1)_was_suggested_("with_a_great_deal_of_doubt")_by_Thomas_Bayes_as_the_prior_probability_distribution_to_express_ignorance_about_the_correct_prior_distribution._This_prior_distribution_was_adopted_(apparently,_from_his_writings,_with_little_sign_of_doubt)_by_Pierre-Simon_Laplace,_and_hence_it_was_also_known_as_the_"Bayes-Laplace_rule"_or_the_"Laplace_rule"_of_"inverse_probability"_in_publications_of_the_first_half_of_the_20th_century._In_the_later_part_of_the_19th_century_and_early_part_of_the_20th_century,_scientists_realized_that_the_assumption_of_uniform_"equal"_probability_density_depended_on_the_actual_functions_(for_example_whether_a_linear_or_a_logarithmic_scale_was_most_appropriate)_and_parametrizations_used.__In_particular,_the_behavior_near_the_ends_of_distributions_with_finite_support_(for_example_near_''x''_=_0,_for_a_distribution_with_initial_support_at_''x''_=_0)_required_particular_attention._Keynes_(_Ch.XXX,_p. 381)_criticized_the_use_of_Bayes's_uniform_prior_probability_(Beta(1,1))_that_all_values_between_zero_and_one_are_equiprobable,_as_follows:_"Thus_experience,_if_it_shows_anything,_shows_that_there_is_a_very_marked_clustering_of_statistical_ratios_in_the_neighborhoods_of_zero_and_unity,_of_those_for_positive_theories_and_for_correlations_between_positive_qualities_in_the_neighborhood_of_zero,_and_of_those_for_negative_theories_and_for_correlations_between_negative_qualities_in_the_neighborhood_of_unity._"


Haldane's prior probability (Beta(0,0))

The Beta(0,0) distribution was proposed by J.B.S. Haldane, who suggested that the prior probability representing complete uncertainty should be proportional to ''p''^−1(1 − ''p'')^−1. The function ''p''^−1(1 − ''p'')^−1 can be viewed as the limit of the numerator of the beta distribution as both shape parameters approach zero: α, β → 0. The Beta function (in the denominator of the beta distribution) approaches infinity as both parameters approach zero, α, β → 0. Therefore, ''p''^−1(1 − ''p'')^−1 divided by the Beta function approaches a 2-point Bernoulli distribution with equal probability 1/2 at each end, at 0 and 1, and nothing in between, as α, β → 0: a coin toss, with one face of the coin at 0 and the other face at 1. The Haldane prior probability distribution Beta(0,0) is an "improper prior" because its integral (from 0 to 1) fails to converge, due to the singularities at each end. However, this is not an issue for computing posterior probabilities unless the sample size is very small. Furthermore, Zellner points out that on the log-odds scale (the logit transformation ln(''p''/(1 − ''p''))), the Haldane prior is the uniformly flat prior. The fact that a uniform prior probability on the logit-transformed variable ln(''p''/(1 − ''p'')) (with domain (−∞, ∞)) is equivalent to the Haldane prior on the domain [0, 1] was pointed out by Harold Jeffreys in the first edition (1939) of his book Theory of Probability (p. 123). Jeffreys writes "Certainly if we take the Bayes-Laplace rule right up to the extremes we are led to results that do not correspond to anybody's way of thinking. The (Haldane) rule d''x''/(''x''(1 − ''x'')) goes too far the other way. It would lead to the conclusion that if a sample is of one type with respect to some property there is a probability 1 that the whole population is of that type." The fact that "uniform" depends on the parametrization led Jeffreys to seek a form of prior that would be invariant under different parametrizations.
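A short change-of-variables sketch of Zellner's remark (standard calculus, not quoted from the sources above): if the log-odds variable θ = ln(''p''/(1 − ''p'')) is given a flat (improper) prior, then ''p'' = 1/(1 + e^−θ) and dθ/d''p'' = 1/(''p''(1 − ''p'')), so the induced prior density on ''p'' is proportional to the Jacobian:

:\pi(p) \propto \left|\frac{d\theta}{dp}\right| = \frac{1}{p(1-p)} = p^{-1}(1-p)^{-1},

which is the Haldane prior, up to its divergent normalization.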


Jeffreys' prior probability (Beta(1/2,1/2) for a Bernoulli or for a binomial distribution)

Harold_Jeffreys
_proposed_to_use_an_uninformative_prior_ In_Bayesian_statistical_inference,_a_prior_probability_distribution,_often_simply_called_the_prior,_of_an_uncertain_quantity_is_the_probability_distribution_that_would_express_one's_beliefs_about_this_quantity_before_some_evidence_is_taken_into__...
_probability_measure_that_should_be_Parametrization_invariance, invariant_under_reparameterization:_proportional_to_the_square_root_of_the_determinant_of_Fisher's_information_matrix.__For_the_Bernoulli_distribution_ In_probability_theory_and_statistics,_the_Bernoulli_distribution,_named_after_Swiss_mathematician__Jacob_Bernoulli,James_Victor_Uspensky:_''Introduction_to_Mathematical_Probability'',_McGraw-Hill,_New_York_1937,_page_45_is_the__discrete_probabi_...
,_this_can_be_shown_as_follows:_for_a_coin_that_is_"heads"_with_probability_''p''_∈_,_1_ The_comma__is_a_punctuation_mark_that_appears_in_several_variants_in_different_languages._It_has_the_same_shape_as_an_apostrophe_or_single_closing_quotation_mark_()_in_many_typefaces,_but_it_differs_from_them_in_being_placed_on_the__baseline_o_...
and_is_"tails"_with_probability_1_−_''p'',_for_a_given_(H,T)_∈__the_probability_is_''pH''(1_−_''p'')''T''.__Since_''T''_=_1_−_''H'',_the_Bernoulli_distribution_ In_probability_theory_and_statistics,_the_Bernoulli_distribution,_named_after_Swiss_mathematician__Jacob_Bernoulli,James_Victor_Uspensky:_''Introduction_to_Mathematical_Probability'',_McGraw-Hill,_New_York_1937,_page_45_is_the__discrete_probabi_...
_is_''pH''(1_−_''p'')1_−_''H''._Considering_''p''_as_the_only_parameter,_it_follows_that_the_log_likelihood_for_the_Bernoulli_distribution_is :\ln__\mathcal_(p\mid_H)_=_H_\ln(p)+_(1-H)_\ln(1-p). The_Fisher_information_matrix_has_only_one_component_(it_is_a_scalar,_because_there_is_only_one_parameter:_''p''),_therefore: :\begin \sqrt_&=_\sqrt_\\_pt&=_\sqrt_\\_pt&=_\sqrt_\\ &=_\frac. \end Similarly,_for_the_Binomial_distribution_with_''n''_Bernoulli_trials,_it_can_be_shown_that :\sqrt=_\frac. Thus,_for_the_Bernoulli_Bernoulli_can_refer_to: _People *Bernoulli_family_of_17th_and_18th_century_Swiss_mathematicians: **_Daniel_Bernoulli_(1700–1782),_developer_of_Bernoulli's_principle **Jacob_Bernoulli_(1654–1705),_also_known_as_Jacques,_after_whom_Bernoulli_numbe_...
,_and_Binomial_distributions,_Jeffreys_prior_is_proportional_to_\scriptstyle_\frac,_which_happens_to_be_proportional_to_a_beta_distribution_with_domain_variable_''x''_=_''p'',_and_shape_parameters_α_=_β_=_1/2,_the__arcsine_distribution: :Beta(\tfrac,_\tfrac)_=_\frac. It_will_be_shown_in_the_next_section_that_the_normalizing_constant_for_Jeffreys_prior_is_immaterial_to_the_final_result_because_the_normalizing_constant_cancels_out_in_Bayes_theorem_for_the_posterior_probability.__Hence_Beta(1/2,1/2)_is_used_as_the_Jeffreys_prior_for_both_Bernoulli_and_binomial_distributions._As_shown_in_the_next_section,_when_using_this_expression_as_a_prior_probability_times_the_likelihood_in_Bayes_theorem,_the_posterior_probability_turns_out_to_be_a_beta_distribution._It_is_important_to_realize,_however,_that_Jeffreys_prior_is_proportional_to_\scriptstyle_\frac_for_the_Bernoulli_and_binomial_distribution,_but_not_for_the_beta_distribution.__Jeffreys_prior_for_the_beta_distribution_is_given_by_the_determinant_of_Fisher's_information_for_the_beta_distribution,_which,_as_shown_in_the___is_a_function_of_the_trigamma_function_ In_mathematics,_the_trigamma_function,_denoted__or_,_is_the_second_of_the_polygamma_functions,_and_is_defined_by :_\psi_1(z)_=_\frac_\ln\Gamma(z). It_follows_from_this_definition_that :_\psi_1(z)_=_\frac_\psi(z) where__is_the_digamma_functio_...
1_of_shape_parameters_α_and_β_as_follows: :_\begin \sqrt_&=_\sqrt_\\ \lim__\sqrt_&=\lim__\sqrt_=_\infty\\ \lim__\sqrt_&=\lim__\sqrt_=_0 \end As_previously_discussed,_Jeffreys_prior_for_the_Bernoulli_and_binomial_distributions_is_proportional_to_the__arcsine_distribution_Beta(1/2,1/2),_a_one-dimensional_''curve''_that_looks_like_a_basin_as_a_function_of_the_parameter_''p''_of_the_Bernoulli_and_binomial_distributions._The_walls_of_the_basin_are_formed_by_''p''_approaching_the_singularities_at_the_ends_''p''_→_0_and_''p''_→_1,_where_Beta(1/2,1/2)_approaches_infinity._Jeffreys_prior_for_the_beta_distribution_is_a_''2-dimensional_surface''_(embedded_in_a_three-dimensional_space)_that_looks_like_a_basin_with_only_two_of_its_walls_meeting_at_the_corner_α_=_β_=_0_(and_missing_the_other_two_walls)_as_a_function_of_the_shape_parameters_α_and_β_of_the_beta_distribution._The_two_adjoining_walls_of_this_2-dimensional_surface_are_formed_by_the_shape_parameters_α_and_β_approaching_the_singularities_(of_the_trigamma_function)_at_α,_β_→_0._It_has_no_walls_for_α,_β_→_∞_because_in_this_case_the_determinant_of_Fisher's_information_matrix_for_the_beta_distribution_approaches_zero. It_will_be_shown_in_the_next_section_that_Jeffreys_prior_probability_results_in_posterior_probabilities_(when_multiplied_by_the_binomial_likelihood_function)_that_are_intermediate_between_the_posterior_probability_results_of_the_Haldane_and_Bayes_prior_probabilities. Jeffreys_prior_may_be_difficult_to_obtain_analytically,_and_for_some_cases_it_just_doesn't_exist_(even_for_simple_distribution_functions_like_the_asymmetric_triangular_distribution)._Berger,_Bernardo_and_Sun,_in_a_2009_paper
__defined_a_reference_prior_probability_distribution_that_(unlike_Jeffreys_prior)_exists_for_the_asymmetric_triangular_distribution._They_cannot_obtain_a_closed-form_expression_for_their_reference_prior,_but_numerical_calculations_show_it_to_be_nearly_perfectly_fitted_by_the_(proper)_prior :_\operatorname(\tfrac,_\tfrac)_\sim\frac where_θ_is_the_vertex_variable_for_the_asymmetric_triangular_distribution_with_support_,_1_ The_comma__is_a_punctuation_mark_that_appears_in_several_variants_in_different_languages._It_has_the_same_shape_as_an_apostrophe_or_single_closing_quotation_mark_()_in_many_typefaces,_but_it_differs_from_them_in_being_placed_on_the__baseline_o_...
(corresponding_to_the_following_parameter_values_in_Wikipedia's_article_on_the_triangular_distribution:_vertex_''c''_=_''θ'',_left_end_''a''_=_0,and_right_end_''b''_=_1)._Berger_et_al._also_give_a_heuristic_argument_that_Beta(1/2,1/2)_could_indeed_be_the_exact_Berger–Bernardo–Sun_reference_prior_for_the_asymmetric_triangular_distribution._Therefore,_Beta(1/2,1/2)_not_only_is_Jeffreys_prior_for_the_Bernoulli_and_binomial_distributions,_but_also_seems_to_be_the_Berger–Bernardo–Sun_reference_prior_for_the_asymmetric_triangular_distribution_(for_which_the_Jeffreys_prior_does_not_exist),_a_distribution_used_in_project_management_and_PERT_analysis_to_describe_the_cost_and_duration_of_project_tasks. Clarke_and_Barron_prove_that,_among_continuous_positive_priors,_Jeffreys_prior_(when_it_exists)_asymptotically_maximizes_Shannon's_mutual_information_between_a_sample_of_size_n_and_the_parameter,_and_therefore_''Jeffreys_prior_is_the_most_uninformative_prior''_(measuring_information_as_Shannon_information)._The_proof_rests_on_an_examination_of_the_Kullback–Leibler_divergence_between_probability_density_functions_for_iid_random_variables.
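To make the distinction drawn in the preceding paragraphs concrete, the sketch below evaluates both objects: the Jeffreys prior for the parameter ''p'' of a Bernoulli/binomial likelihood (the arcsine density Beta(1/2, 1/2)) and the unnormalized Jeffreys prior for the shape parameters of a beta distribution (the square root of the trigamma determinant). The helper name jeffreys_beta_shape is arbitrary:

 import numpy as np
 from scipy.special import polygamma
 from scipy import stats

 # Jeffreys prior for the parameter p of a Bernoulli/binomial likelihood:
 # proportional to 1/sqrt(p(1-p)), i.e. the (proper) arcsine law Beta(1/2, 1/2).
 jeffreys_p = stats.beta(0.5, 0.5)

 def jeffreys_beta_shape(alpha, beta):
     """Unnormalized Jeffreys prior density for the shape parameters of a beta
     distribution: sqrt of the determinant of the Fisher information matrix,
     det = psi1(a)*psi1(b) - (psi1(a) + psi1(b))*psi1(a + b)."""
     t_a, t_b, t_ab = (polygamma(1, alpha), polygamma(1, beta),
                       polygamma(1, alpha + beta))
     return np.sqrt(t_a * t_b - (t_a + t_b) * t_ab)

 print(jeffreys_p.pdf(0.5))            # 2/pi at p = 1/2
 print(jeffreys_beta_shape(1.0, 1.0))  # finite; grows without bound as alpha, beta -> 0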


Effect of different prior probability choices on the posterior beta distribution

If_samples_are_drawn_from_the_population_of_a_random_variable_''X''_that_result_in_''s''_successes_and_''f''_failures_in_"n"_Bernoulli_trials_''n'' = ''s'' + ''f'',_then_the_likelihood_function_ The_likelihood_function_(often_simply_called_the_likelihood)_represents_the_probability_of__random_variable_realizations_conditional_on_particular_values_of_the__statistical_parameters._Thus,_when_evaluated_on_a__given_sample,_the_likelihood_funct_...
_for_parameters_''s''_and_''f''_given_''x'' = ''p''_(the_notation_''x'' = ''p''_in_the_expressions_below_will_emphasize_that_the_domain_''x''_stands_for_the_value_of_the_parameter_''p''_in_the_binomial_distribution),_is_the_following_binomial_distribution: :\mathcal(s,f\mid_x=p)_=__x^s(1-x)^f_=__x^s(1-x)^._ If_beliefs_about_prior_probability_information_are_reasonably_well_approximated_by_a_beta_distribution_with_parameters_''α'' Prior_and_''β'' Prior,_then: :(x=p;\alpha_\operatorname,\beta_\operatorname)_=_\frac According_to_Bayes'_theorem_for_a_continuous_event_space,_the_posterior_probability_is_given_by_the_product_of_the_prior_probability_and_the_likelihood_function_(given_the_evidence_''s''_and_''f'' = ''n'' − ''s''),_normalized_so_that_the_area_under_the_curve_equals_one,_as_follows: :\begin &_\operatorname(x=p\mid_s,n-s)_\\_pt=__&_\frac__\\_pt=__&_\frac_\\_pt=__&_\frac_\\_pt=__&_\frac. \end The_binomial_coefficient :

{n \choose s} = \frac{n!}{s!(n-s)!}=\frac{\Gamma(n+1)}{\Gamma(s+1)\Gamma(n-s+1)}
appears_both_in_the_numerator_and_the_denominator_of_the_posterior_probability,_and_it_does_not_depend_on_the_integration_variable_''x'',_hence_it_cancels_out,_and_it_is_irrelevant_to_the_final_result.__Similarly_the_normalizing_factor_for_the_prior_probability,_the_beta_function_B(αPrior,βPrior)_cancels_out_and_it_is_immaterial_to_the_final_result._The_same_posterior_probability_result_can_be_obtained_if_one_uses_an_un-normalized_prior :x^(1-x)^ because_the_normalizing_factors_all_cancel_out._Several_authors_(including_Jeffreys_himself)_thus_use_an_un-normalized_prior_formula_since_the_normalization_constant_cancels_out.__The_numerator_of_the_posterior_probability_ends_up_being_just_the_(un-normalized)_product_of_the_prior_probability_and_the_likelihood_function,_and_the_denominator_is_its_integral_from_zero_to_one._The_beta_function_in_the_denominator,_B(''s'' + ''α'' Prior, ''n'' − ''s'' + ''β'' Prior),_appears_as_a_normalization_constant_to_ensure_that_the_total_posterior_probability_integrates_to_unity. The_ratio_''s''/''n''_of_the_number_of_successes_to_the_total_number_of_trials_is_a_sufficient_statistic_in_the_binomial_case,_which_is_relevant_for_the_following_results. For_the_Bayes'_prior_probability_(Beta(1,1)),_the_posterior_probability_is: :\operatorname(p=x\mid_s,f)_=_\frac,_\text=\frac,\text=\frac\text_0_<_s_<_n). For_the_Jeffreys'_prior_probability_(Beta(1/2,1/2)),_the_posterior_probability_is: :\operatorname(p=x\mid_s,f)_=__,\text_=_\frac,\text\frac\text_\tfrac_<_s_<_n-\tfrac). and_for_the_Haldane_prior_probability_(Beta(0,0)),_the_posterior_probability_is: :\operatorname(p=x\mid_s,f)_=_\frac,_\text_=_\frac,\text\frac\text_1_<_s_<_n_-1). From_the_above_expressions_it_follows_that_for_''s''/''n'' = 1/2)_all_the_above_three_prior_probabilities_result_in_the_identical_location_for_the_posterior_probability_mean = mode = 1/2.__For_''s''/''n'' < 1/2,_the_mean_of_the_posterior_probabilities,_using_the_following_priors,_are_such_that:_mean_for_Bayes_prior_> mean_for_Jeffreys_prior_> mean_for_Haldane_prior._For_''s''/''n'' > 1/2_the_order_of_these_inequalities_is_reversed_such_that_the_Haldane_prior_probability_results_in_the_largest_posterior_mean._The_''Haldane''_prior_probability_Beta(0,0)_results_in_a_posterior_probability_density_with_''mean''_(the_expected_value_for_the_probability_of_success_in_the_"next"_trial)_identical_to_the_ratio_''s''/''n''_of_the_number_of_successes_to_the_total_number_of_trials._Therefore,_the_Haldane_prior_results_in_a_posterior_probability_with_expected_value_in_the_next_trial_equal_to_the_maximum_likelihood._The_''Bayes''_prior_probability_Beta(1,1)_results_in_a_posterior_probability_density_with_''mode''_identical_to_the_ratio_''s''/''n''_(the_maximum_likelihood). In_the_case_that_100%_of_the_trials_have_been_successful_''s'' = ''n'',_the_''Bayes''_prior_probability_Beta(1,1)_results_in_a_posterior_expected_value_equal_to_the_rule_of_succession_(''n'' + 1)/(''n'' + 2),_while_the_Haldane_prior_Beta(0,0)_results_in_a_posterior_expected_value_of_1_(absolute_certainty_of_success_in_the_next_trial).__Jeffreys_prior_probability_results_in_a_posterior_expected_value_equal_to_(''n'' + 1/2)/(''n'' + 1)._Perks_(p. 
303)_points_out:_"This_provides_a_new_rule_of_succession_and_expresses_a_'reasonable'_position_to_take_up,_namely,_that_after_an_unbroken_run_of_n_successes_we_assume_a_probability_for_the_next_trial_equivalent_to_the_assumption_that_we_are_about_half-way_through_an_average_run,_i.e._that_we_expect_a_failure_once_in_(2''n'' + 2)_trials._The_Bayes–Laplace_rule_implies_that_we_are_about_at_the_end_of_an_average_run_or_that_we_expect_a_failure_once_in_(''n'' + 2)_trials._The_comparison_clearly_favours_the_new_result_(what_is_now_called_Jeffreys_prior)_from_the_point_of_view_of_'reasonableness'." Conversely,_in_the_case_that_100%_of_the_trials_have_resulted_in_failure_(''s'' = 0),_the_''Bayes''_prior_probability_Beta(1,1)_results_in_a_posterior_expected_value_for_success_in_the_next_trial_equal_to_1/(''n'' + 2),_while_the_Haldane_prior_Beta(0,0)_results_in_a_posterior_expected_value_of_success_in_the_next_trial_of_0_(absolute_certainty_of_failure_in_the_next_trial)._Jeffreys_prior_probability_results_in_a_posterior_expected_value_for_success_in_the_next_trial_equal_to_(1/2)/(''n'' + 1),_which_Perks_(p. 303)_points_out:_"is_a_much_more_reasonably_remote_result_than_the_Bayes-Laplace_result 1/(''n'' + 2)". Jaynes_questions_(for_the_uniform_prior_Beta(1,1))_the_use_of_these_formulas_for_the_cases_''s'' = 0_or_''s'' = ''n''_because_the_integrals_do_not_converge_(Beta(1,1)_is_an_improper_prior_for_''s'' = 0_or_''s'' = ''n'')._In_practice,_the_conditions_0_(p. 303)_shows_that,_for_what_is_now_known_as_the_Jeffreys_prior,_this_probability_is_((''n'' + 1/2)/(''n'' + 1))((''n'' + 3/2)/(''n'' + 2))...(2''n'' + 1/2)/(2''n'' + 1),_which_for_''n'' = 1, 2, 3_gives_15/24,_315/480,_9009/13440;_rapidly_approaching_a_limiting_value_of_1/\sqrt_=_0.70710678\ldots_as_n_tends_to_infinity.__Perks_remarks_that_what_is_now_known_as_the_Jeffreys_prior:_"is_clearly_more_'reasonable'_than_either_the_Bayes-Laplace_result_or_the_result_on_the_(Haldane)_alternative_rule_rejected_by_Jeffreys_which_gives_certainty_as_the_probability._It_clearly_provides_a_very_much_better_correspondence_with_the_process_of_induction._Whether_it_is_'absolutely'_reasonable_for_the_purpose,_i.e._whether_it_is_yet_large_enough,_without_the_absurdity_of_reaching_unity,_is_a_matter_for_others_to_decide._But_it_must_be_realized_that_the_result_depends_on_the_assumption_of_complete_indifference_and_absence_of_knowledge_prior_to_the_sampling_experiment." 
Following are the variances of the posterior distribution obtained with these three prior probability distributions:

For the Bayes prior probability (Beta(1,1)), the posterior variance is:

:\text{var} = \frac{(s+1)(n-s+1)}{(n+2)^2(n+3)}, \text{ which for } s=\frac{n}{2} \text{ gives var} =\frac{1}{4(n+3)}

for the Jeffreys prior probability (Beta(1/2,1/2)), the posterior variance is:

:\text{var} = \frac{(s+\tfrac{1}{2})(n-s+\tfrac{1}{2})}{(n+1)^2(n+2)}, \text{ which for } s=\frac{n}{2} \text{ gives var} = \frac{1}{4(n+2)}

and for the Haldane prior probability (Beta(0,0)), the posterior variance is:

:\text{var} = \frac{s(n-s)}{n^2(n+1)}, \text{ which for } s=\frac{n}{2} \text{ gives var} =\frac{1}{4(n+1)}

So, as remarked by Silvey, for large ''n'' the variance is small and hence the posterior distribution is highly concentrated, whereas the assumed prior distribution was very diffuse. This is in accord with what one would hope for, as vague prior knowledge is transformed (through Bayes' theorem) into more precise posterior knowledge by an informative experiment. For small ''n'' the Haldane Beta(0,0) prior results in the largest posterior variance, while the Bayes Beta(1,1) prior results in the most concentrated posterior. The Jeffreys prior Beta(1/2,1/2) results in a posterior variance in between the other two. As ''n'' increases, the variance rapidly decreases so that the posterior variance for all three priors converges to approximately the same value (approaching zero variance as ''n'' → ∞). Recalling the previous result that the ''Haldane'' prior Beta(0,0) results in a posterior probability density with ''mean'' (the expected value for the probability of success in the "next" trial) identical to the ratio ''s''/''n'', it follows from the above expression that the ''Haldane'' prior Beta(0,0) also results in a posterior with ''variance'' identical to the variance expressed in terms of the maximum likelihood estimate ''s''/''n'' and the sample size (see ):

:\text{var} = \frac{\mu(1-\mu)}{1+\nu} = \frac{\frac{s}{n}\left(1 - \frac{s}{n}\right)}{1+n}

with the mean ''μ'' = ''s''/''n'' and the sample size ''ν'' = ''n''.

In Bayesian inference, using a prior distribution Beta(''α''Prior, ''β''Prior) prior to a binomial distribution is equivalent to adding (''α''Prior − 1) pseudo-observations of "success" and (''β''Prior − 1) pseudo-observations of "failure" to the actual number of successes and failures observed, then estimating the parameter ''p'' of the binomial distribution by the proportion of successes over both real and pseudo-observations. A uniform prior Beta(1,1) does not add (or subtract) any pseudo-observations, since for Beta(1,1) it follows that (''α''Prior − 1) = 0 and (''β''Prior − 1) = 0. The Haldane prior Beta(0,0) subtracts one pseudo-observation from each, and the Jeffreys prior Beta(1/2,1/2) subtracts 1/2 pseudo-observation of success and an equal number of failure. This subtraction has the effect of smoothing out the posterior distribution. If the proportion of successes is not 50% (''s''/''n'' ≠ 1/2), values of ''α''Prior and ''β''Prior less than 1 (and therefore negative (''α''Prior − 1) and (''β''Prior − 1)) favor sparsity, i.e. distributions where the parameter ''p'' is closer to either 0 or 1. In effect, values of ''α''Prior and ''β''Prior between 0 and 1, when operating together, function as a concentration parameter. 
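A minimal Python sketch of the conjugate update just described, computing the posterior mean, mode and variance under the three priors; the observed counts ''s'' and ''n'' are arbitrary example values, and SciPy is assumed to be available.

<syntaxhighlight lang="python">
# Sketch of the conjugate Beta-binomial update discussed above.
# The priors (Bayes, Jeffreys, Haldane) and the counts s, n are illustrative.
from scipy import stats

def posterior(s, n, a_prior, b_prior):
    """Return the parameters of the posterior Beta(s + a_prior, n - s + b_prior)."""
    return a_prior + s, b_prior + (n - s)

priors = {"Bayes (1,1)": (1.0, 1.0),
          "Jeffreys (1/2,1/2)": (0.5, 0.5),
          "Haldane (0,0)": (0.0, 0.0)}   # improper; posterior is proper if 0 < s < n

s, n = 3, 10   # observed successes and trials (example values)
for name, (a0, b0) in priors.items():
    a, b = posterior(s, n, a0, b0)
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    mode = (a - 1) / (a + b - 2) if a > 1 and b > 1 else float("nan")
    print(f"{name:20s} posterior Beta({a:.1f},{b:.1f})  "
          f"mean={mean:.4f}  mode={mode:.4f}  var={var:.5f}")
    # The frozen scipy distribution gives the full posterior, e.g. a credible interval:
    print("   95% equal-tailed interval:", stats.beta(a, b).interval(0.95))
</syntaxhighlight>

The Haldane posterior mean reproduces the maximum likelihood estimate ''s''/''n'', as stated above.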
The accompanying plots show the posterior probability density functions obtained with the three priors for a range of sample sizes ''n'' and numbers of successes ''s''. The first plot shows the symmetric cases (''s''/''n'' = 1/2), with mean = mode = 1/2, and the second plot shows the skewed cases. The images show that there is little difference between the priors for the posterior with a sample size of 50 (characterized by a more pronounced peak near ''p'' = 1/2). Significant differences appear for very small sample sizes (in particular for the flatter distribution in the degenerate case of sample size = 3). Therefore, the skewed cases show a larger effect from the choice of prior, at small sample size, than the symmetric cases. For symmetric distributions, the Bayes prior Beta(1,1) results in the most "peaky" and highest posterior distributions and the Haldane prior Beta(0,0) results in the flattest and lowest-peaked distribution. The Jeffreys prior Beta(1/2,1/2) lies in between them. For nearly symmetric, not too skewed distributions the effect of the priors is similar. For very small sample size (in this case a sample size of 3) and a skewed distribution, the Haldane prior can result in a reverse-J-shaped distribution with a singularity at the left end. However, this happens only in degenerate cases (in this example ''n'' = 3 and hence ''s'' = 3/4 < 1, a degenerate value because ''s'' should be greater than unity in order for the posterior of the Haldane prior to have a mode located between the ends, and because ''s'' = 3/4 is not an integer number, hence it violates the initial assumption of a binomial distribution for the likelihood), and it is not an issue in generic cases of reasonable sample size (such that the condition 1 < ''s'' < ''n'' − 1, necessary for a mode to exist between both ends, is fulfilled).

In Chapter 12 (p. 385) of his book, Jaynes asserts that the ''Haldane prior'' Beta(0,0) describes a ''prior state of knowledge of complete ignorance'', where we are not even sure whether it is physically possible for an experiment to yield either a success or a failure, while the ''Bayes (uniform) prior'' Beta(1,1) applies if one knows that ''both binary outcomes are possible''. Jaynes states: "''interpret the Bayes-Laplace (Beta(1,1)) prior as describing not a state of complete ignorance'', but the state of knowledge in which we have observed one success and one failure...once we have seen at least one success and one failure, then we know that the experiment is a true binary one, in the sense of physical possibility." Jaynes does not specifically discuss the Jeffreys prior Beta(1/2,1/2); his discussion of "Jeffreys prior" on pp. 181, 423 and in chapter 12 of his book refers instead to the improper, un-normalized prior "1/''p'' ''dp''" introduced by Jeffreys in the 1939 edition of his book, seven years before he introduced what is now known as Jeffreys' invariant prior: the square root of the determinant of Fisher's information matrix. (''"1/p" is Jeffreys' (1946) invariant prior for the exponential distribution, not for the Bernoulli or binomial distributions.'') However, it follows from the above discussion that the Jeffreys Beta(1/2,1/2) prior represents a state of knowledge in between the Haldane Beta(0,0) and Bayes Beta(1,1) priors. Similarly, Karl Pearson in his 1892 book The Grammar of Science
(p. 144 of the 1900 edition) maintained that the Bayes (Beta(1,1)) uniform prior was not a complete-ignorance prior, and that it should only be used when prior information justifies "distributing our ignorance equally". K. Pearson wrote: "Yet the only supposition that we appear to have made is this: that, knowing nothing of nature, routine and anomy (from the Greek ανομία, namely: a- "without", and nomos "law") are to be considered as equally likely to occur. Now we were not really justified in making even this assumption, for it involves a knowledge that we do not possess regarding nature. We use our ''experience'' of the constitution and action of coins in general to assert that heads and tails are equally probable, but we have no right to assert before experience that, as we know nothing of nature, routine and breach are equally probable. In our ignorance we ought to consider before experience that nature may consist of all routines, all anomies (normlessness), or a mixture of the two in any proportion whatever, and that all such are equally probable. Which of these constitutions after experience is the most probable must clearly depend on what that experience has been like."

If there is sufficient sampling data, ''and the posterior probability mode is not located at one of the extremes of the domain'' (''x'' = 0 or ''x'' = 1), the three priors of Bayes (Beta(1,1)), Jeffreys (Beta(1/2,1/2)) and Haldane (Beta(0,0)) should yield similar ''posterior'' probability densities. Otherwise, as Gelman et al. (p. 65) point out, "if so few data are available that the choice of noninformative prior distribution makes a difference, one should put relevant information into the prior distribution", or, as Berger (p. 125) points out, "when different reasonable priors yield substantially different answers, can it be right to state that there ''is'' a single answer? Would it not be better to admit that there is scientific uncertainty, with the conclusion depending on prior beliefs?"


Occurrence and applications


Order statistics

The beta distribution has an important application in the theory of order statistics. A basic result is that the distribution of the ''k''th smallest of a sample of size ''n'' from a continuous uniform distribution has a beta distribution.David, H. A., Nagaraja, H. N. (2003) ''Order Statistics'' (3rd Edition). Wiley, New Jersey, pp. 458. This result is summarized as:

:U_{(k)} \sim \operatorname{Beta}(k,\,n+1-k).

From this, and application of the theory related to the probability integral transform, the distribution of any individual order statistic from any continuous distribution can be derived.
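A short Python check of this result by simulation; the values of ''n'', ''k'' and the number of replications are illustrative, and NumPy/SciPy are assumed to be available.

<syntaxhighlight lang="python">
# Check that the k-th smallest of n standard uniforms follows Beta(k, n+1-k).
# n, k and the number of replications are illustrative choices.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, k, reps = 10, 3, 100_000

u = rng.random((reps, n))
kth = np.sort(u, axis=1)[:, k - 1]          # k-th order statistic of each sample

# Compare empirical and theoretical quantiles (they should agree closely).
qs = [0.1, 0.25, 0.5, 0.75, 0.9]
print("empirical    :", np.quantile(kth, qs).round(4))
print("Beta(k,n+1-k):", stats.beta(k, n + 1 - k).ppf(qs).round(4))
</syntaxhighlight>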


Subjective logic

In standard logic, propositions are considered to be either true or false. In contradistinction, subjective logic assumes that humans cannot determine with absolute certainty whether a proposition about the real world is absolutely true or false. In subjective logic the posterior probability estimates of binary events can be represented by beta distributions.A. Jøsang. A Logic for Uncertain Probabilities. ''International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems'', 9(3), pp. 279–311, June 2001.


Wavelet analysis

A wavelet is a wave-like oscillation with an amplitude that starts out at zero, increases, and then decreases back to zero. It can typically be visualized as a "brief oscillation" that promptly decays. Wavelets can be used to extract information from many different kinds of data, including – but certainly not limited to – audio signals and images. Thus, wavelets are purposefully crafted to have specific properties that make them useful for signal processing. Wavelets are localized in both time and frequency, whereas the standard Fourier transform is only localized in frequency. Therefore, standard Fourier transforms are only applicable to stationary processes, while wavelets are applicable to non-stationary processes. Continuous wavelets can be constructed based on the beta distribution. Beta waveletsH.M. de Oliveira and G.A.A. Araújo. Compactly Supported One-cyclic Wavelets Derived from Beta Distributions. ''Journal of Communication and Information Systems'', vol. 20, n. 3, pp. 27–33, 2005. can be viewed as a soft variety of Haar wavelets whose shape is fine-tuned by two shape parameters α and β.


Population genetics

The Balding–Nichols model is a two-parameter parametrization of the beta distribution used in population genetics. It is a statistical description of the allele frequencies in the components of a sub-divided population:

: \begin{align}
    \alpha &= \mu \nu,\\
    \beta  &= (1 - \mu) \nu,
  \end{align}

where \nu = \alpha+\beta = \frac{1-F}{F} and 0 < F < 1; here ''F'' is (Wright's) genetic distance between two populations.
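A small helper converting this (''F'', ''μ'') parametrization into the usual shape parameters, assuming the relation ν = (1 − F)/F given above; the example values of ''F'' and ''μ'' are arbitrary.

<syntaxhighlight lang="python">
# Convert the Balding-Nichols parameters (F, mu) into Beta shape parameters,
# using nu = alpha + beta = (1 - F)/F as in the parametrization above.
def balding_nichols(F, mu):
    if not (0.0 < F < 1.0 and 0.0 < mu < 1.0):
        raise ValueError("need 0 < F < 1 and 0 < mu < 1")
    nu = (1.0 - F) / F
    return mu * nu, (1.0 - mu) * nu   # (alpha, beta)

alpha, beta = balding_nichols(F=0.1, mu=0.3)
print(alpha, beta)   # 2.7, 6.3 for these example values
</syntaxhighlight>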


Project management: task cost and schedule modeling

The beta distribution can be used to model events which are constrained to take place within an interval defined by a minimum and maximum value. For this reason, the beta distribution – along with the triangular distribution – is used extensively in PERT, the critical path method (CPM), Joint Cost Schedule Modeling (JCSM) and other project management/control systems to describe the time to completion and the cost of a task. In project management, shorthand computations are widely used to estimate the mean and standard deviation of the beta distribution:

: \begin{align}
  \mu(X) & = \frac{a + 4b + c}{6} \\
  \sigma(X) & = \frac{c-a}{6}
\end{align}

where ''a'' is the minimum, ''c'' is the maximum, and ''b'' is the most likely value (the mode for ''α'' > 1 and ''β'' > 1).

The above estimate for the mean \mu(X)= \frac{a + 4b + c}{6} is known as the PERT three-point estimation and it is exact for either of the following values of ''β'' (for arbitrary ''α'' within these ranges):

:''β'' = ''α'' > 1 (symmetric case) with standard deviation \sigma(X) = \frac{c-a}{2\sqrt{1+2\alpha}}, skewness = 0, and excess kurtosis = \frac{-6}{3 + 2\alpha}

or

:''β'' = 6 − ''α'' for 5 > ''α'' > 1 (skewed case) with standard deviation

:\sigma(X) = \frac{c-a}{6}\sqrt{\frac{\alpha(6-\alpha)}{7}},

skewness = \frac{3-\alpha}{2}\sqrt{\frac{7}{\alpha(6 - \alpha)}}, and excess kurtosis = \frac{21}{\alpha(6-\alpha)} - 3

The above estimate for the standard deviation ''σ''(''X'') = (''c'' − ''a'')/6 is exact for either of the following values of ''α'' and ''β'':

:''α'' = ''β'' = 4 (symmetric) with skewness = 0, and excess kurtosis = −6/11.
:''β'' = 6 − ''α'' and \alpha = 3 - \sqrt2 (right-tailed, positive skew) with skewness = \frac{1}{\sqrt{2}}, and excess kurtosis = 0
:''β'' = 6 − ''α'' and \alpha = 3 + \sqrt2 (left-tailed, negative skew) with skewness = \frac{-1}{\sqrt{2}}, and excess kurtosis = 0

Otherwise, these can be poor approximations for beta distributions with other values of α and β, exhibiting average errors of 40% in the mean and 549% in the variance.
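A small Python sketch comparing the PERT shorthand with the exact mean and standard deviation of a beta distribution rescaled to [''a'', ''c'']; the task bounds and shape parameters are arbitrary example values.

<syntaxhighlight lang="python">
# Compare the PERT shorthand mu = (a + 4b + c)/6 and sigma = (c - a)/6 with the
# exact mean/std of a Beta(alpha, beta) rescaled to [a, c].  Values are examples.
import math

def pert_estimates(a, b, c):
    return (a + 4 * b + c) / 6.0, (c - a) / 6.0

def exact_beta_moments(a, c, alpha, beta):
    mean01 = alpha / (alpha + beta)
    var01 = alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))
    return a + (c - a) * mean01, (c - a) * math.sqrt(var01)

a, c = 2.0, 14.0                 # minimum and maximum task duration (example)
alpha, beta = 4.0, 4.0           # symmetric case where sigma = (c - a)/6 is exact
mode = a + (c - a) * (alpha - 1) / (alpha + beta - 2)   # most likely value b
print("PERT :", pert_estimates(a, mode, c))
print("exact:", exact_beta_moments(a, c, alpha, beta))
</syntaxhighlight>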


Random variate generation

If ''X'' and ''Y'' are independent, with X \sim \Gamma(\alpha, \theta) and Y \sim \Gamma(\beta, \theta), then

:\frac{X}{X+Y} \sim \Beta(\alpha, \beta).

So one algorithm for generating beta variates is to generate \frac{X}{X+Y}, where ''X'' is a gamma variate with parameters (α, 1) and ''Y'' is an independent gamma variate with parameters (β, 1). In fact, here \frac{X}{X+Y} and X+Y are independent, and X+Y \sim \Gamma(\alpha + \beta, \theta). If Z \sim \Gamma(\gamma, \theta) and Z is independent of X and Y, then \frac{X+Y}{X+Y+Z} \sim \Beta(\alpha+\beta,\gamma) and \frac{X+Y}{X+Y+Z} is independent of \frac{X}{X+Y}. This shows that the product of independent \Beta(\alpha,\beta) and \Beta(\alpha+\beta,\gamma) random variables is a \Beta(\alpha,\beta+\gamma) random variable.

Also, the ''k''th order statistic of ''n'' uniformly distributed variates is \Beta(k, n+1-k), so an alternative if α and β are small integers is to generate α + β − 1 uniform variates and choose the α-th smallest.

Another way to generate the beta distribution is by the Pólya urn model. According to this method, one starts with an "urn" containing α "black" balls and β "white" balls and draws uniformly with replacement. On every trial an additional ball is added according to the color of the last ball drawn. Asymptotically, the proportion of black and white balls will be distributed according to the beta distribution, where each repetition of the experiment will produce a different value.

It is also possible to use inverse transform sampling.
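A Python sketch of three of the constructions just described; the shape parameters, sample sizes and urn lengths are illustrative, and NumPy is assumed to be available.

<syntaxhighlight lang="python">
# Three ways of generating Beta(alpha, beta) variates mentioned above: the
# gamma-ratio construction, order statistics of uniforms (integer shape
# parameters only), and a Polya urn (asymptotic).  Sizes are illustrative.
import numpy as np

rng = np.random.default_rng(1)
alpha, beta = 2, 3

# 1) Gamma ratio: X/(X+Y) with X ~ Gamma(alpha, 1), Y ~ Gamma(beta, 1).
x = rng.gamma(alpha, size=100_000)
y = rng.gamma(beta, size=100_000)
gamma_ratio = x / (x + y)

# 2) Order statistics: the alpha-th smallest of alpha + beta - 1 uniforms.
u = rng.random((100_000, alpha + beta - 1))
order_stat = np.sort(u, axis=1)[:, alpha - 1]

# 3) Polya urn: the proportion of "black" balls converges to a Beta variate.
def polya_urn(alpha, beta, steps, rng):
    black, white = alpha, beta
    for _ in range(steps):
        if rng.random() < black / (black + white):
            black += 1
        else:
            white += 1
    return black / (black + white)

urn = np.array([polya_urn(alpha, beta, 1_000, rng) for _ in range(2_000)])

print("theoretical mean :", alpha / (alpha + beta))
print("gamma ratio mean :", gamma_ratio.mean())
print("order stat mean  :", order_stat.mean())
print("Polya urn mean   :", urn.mean())
</syntaxhighlight>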


History

Thomas Bayes, in a posthumous paper published in 1763 by Richard Price, obtained a beta distribution as the density of the probability of success in Bernoulli trials (see ), but the paper does not analyze any of the moments of the beta distribution or discuss any of its properties.

The first systematic modern discussion of the beta distribution is probably due to Karl Pearson. In Pearson's papers the beta distribution is couched as a solution of a differential equation: Pearson's Type I distribution, which is essentially identical to the beta distribution except for arbitrary shifting and re-scaling (the beta and Pearson Type I distributions can always be equalized by proper choice of parameters). In fact, in several English books and journal articles in the few decades prior to World War II, it was common to refer to the beta distribution as Pearson's Type I distribution. William P. Elderton, in his 1906 monograph "Frequency curves and correlation", further analyzes the beta distribution as Pearson's Type I distribution, including a full discussion of the method of moments for the four-parameter case, and diagrams of (what Elderton describes as) U-shaped, J-shaped, twisted J-shaped, "cocked-hat" shapes, horizontal and angled straight-line cases. Elderton wrote "I am chiefly indebted to Professor Pearson, but the indebtedness is of a kind for which it is impossible to offer formal thanks." Elderton's 1906 monograph provides an impressive amount of information on the beta distribution, including equations for the origin of the distribution chosen to be the mode, as well as for other Pearson distributions: types I through VII. Elderton also included a number of appendixes, including one appendix ("II") on the beta and gamma functions. In later editions, Elderton added equations for the origin of the distribution chosen to be the mean, and analysis of Pearson distributions VIII through XII.

As remarked by Bowman and Shenton, "Fisher and Pearson had a difference of opinion in the approach to (parameter) estimation, in particular relating to (Pearson's method of) moments and (Fisher's method of) maximum likelihood in the case of the Beta distribution." Also according to Bowman and Shenton, "the case of a Type I (beta distribution) model being the center of the controversy was pure serendipity. A more difficult model of 4 parameters would have been hard to find." The long-running public conflict of Fisher with Karl Pearson can be followed in a number of articles in prestigious journals. For example, concerning the estimation of the four parameters for the beta distribution, and Fisher's criticism of Pearson's method of moments as being arbitrary, see Pearson's article "Method of moments and method of maximum likelihood" (published three years after his retirement from University College, London, where his position had been divided between Fisher and Pearson's son Egon), in which Pearson writes "I read (Koshai's paper in the Journal of the Royal Statistical Society, 1933) which as far as I am aware is the only case at present published of the application of Professor Fisher's method. To my astonishment that method depends on first working out the constants of the frequency curve by the (Pearson) Method of Moments and then superposing on it, by what Fisher terms "the Method of Maximum Likelihood" a further approximation to obtain, what he holds, he will thus get, "more efficient values" of the curve constants."

David and Edwards's treatise on the history of statistics cites the first modern treatment of the beta distribution, in 1911, using the beta designation that has become standard, due to Corrado Gini, an Italian statistician, demographer, and sociologist who developed the Gini coefficient. N. L. Johnson and S. Kotz, in their comprehensive and very informative monograph on leading historical personalities in statistical sciences, credit Corrado Gini as "an early Bayesian...who dealt with the problem of eliciting the parameters of an initial Beta distribution, by singling out techniques which anticipated the advent of the so-called empirical Bayes approach."


References


External links

*"Beta Distribution" by Fiona Maclachlan, the Wolfram Demonstrations Project, 2007.
*Beta Distribution – Overview and Example, xycoon.com
*Beta Distribution, brighton-webs.co.uk
*Beta Distribution, exstrom.com
*Harvard University Statistics 110 Lecture 23 Beta Distribution, Prof. Joe Blitzstein
Mean absolute deviation around the mean

The mean absolute deviation around the mean for the beta distribution is:

:\operatorname{E}[|X - E[X]|] = \frac{2 \alpha^{\alpha} \beta^{\beta}}{\Beta(\alpha,\beta)\,(\alpha+\beta)^{\alpha+\beta+1}}

The mean absolute deviation around the mean is a more
robust
estimator
of statistical dispersion than the standard deviation for beta distributions with tails and inflection points at each side of the mode, Beta(''α'', ''β'') distributions with ''α'',''β'' > 2, as it depends on the linear (absolute) deviations rather than the square deviations from the mean. Therefore, the effect of very large deviations from the mean are not as overly weighted. Using Stirling's approximation to the Gamma function, Norman Lloyd Johnson, N.L.Johnson and Samuel Kotz, S.Kotz derived the following approximation for values of the shape parameters greater than unity (the relative error for this approximation is only −3.5% for ''α'' = ''β'' = 1, and it decreases to zero as ''α'' → ∞, ''β'' → ∞): : \begin \frac &=\frac\\ &\approx \sqrt \left(1+\frac-\frac-\frac \right), \text \alpha, \beta > 1. \end At the limit α → ∞, β → ∞, the ratio of the mean absolute deviation to the standard deviation (for the beta distribution) becomes equal to the ratio of the same measures for the normal distribution: \sqrt. For α = β = 1 this ratio equals \frac, so that from α = β = 1 to α, β → ∞ the ratio decreases by 8.5%. For α = β = 0 the standard deviation is exactly equal to the mean absolute deviation around the mean. Therefore, this ratio decreases by 15% from α = β = 0 to α = β = 1, and by 25% from α = β = 0 to α, β → ∞ . However, for skewed beta distributions such that α → 0 or β → 0, the ratio of the standard deviation to the mean absolute deviation approaches infinity (although each of them, individually, approaches zero) because the mean absolute deviation approaches zero faster than the standard deviation. Using the parametrization in terms of mean μ and sample size ν = α + β > 0: :α = μν, β = (1−μ)ν one can express the mean absolute deviation around the mean in terms of the mean μ and the sample size ν as follows: :\operatorname[, X - E ] = \frac For a symmetric distribution, the mean is at the middle of the distribution, μ = 1/2, and therefore: : \begin \operatorname[, X - E ] = \frac &= \frac \\ \lim_ \left (\lim_ \operatorname[, X - E ] \right ) &= \tfrac\\ \lim_ \left (\lim_ \operatorname[, X - E ] \right ) &= 0 \end Also, the following limits (with only the noted variable approaching the limit) can be obtained from the above expressions: : \begin \lim_ \operatorname[, X - E ] &=\lim_ \operatorname[, X - E ]= 0 \\ \lim_ \operatorname[, X - E ] &=\lim_ \operatorname[, X - E ] = 0\\ \lim_ \operatorname[, X - E ]&=\lim_ \operatorname[, X - E ] = 0\\ \lim_ \operatorname[, X - E ] &= \sqrt \\ \lim_ \operatorname[, X - E ] &= 0 \end
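A short Python check of the ratio of the mean absolute deviation to the standard deviation, using the closed form for E[|X − E[X]|] quoted above and the standard beta variance; SciPy is assumed, and the parameter values are examples.

<syntaxhighlight lang="python">
# Ratio of the mean absolute deviation (around the mean) to the standard
# deviation.  For large, equal shape parameters the ratio should approach
# sqrt(2/pi), the value for the normal distribution.
import math
from scipy.special import betaln

def mad_over_sigma(a, b):
    log_mad = (math.log(2) + a * math.log(a) + b * math.log(b)
               - betaln(a, b) - (a + b + 1) * math.log(a + b))
    sigma = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    return math.exp(log_mad) / sigma

for a in (1, 2, 10, 100, 1000):
    print(f"alpha = beta = {a:5d}:  MAD/sigma = {mad_over_sigma(a, a):.5f}")
print("normal limit sqrt(2/pi) =", math.sqrt(2 / math.pi))
</syntaxhighlight>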


Mean absolute difference

The mean absolute difference for the Beta distribution is: :\mathrm = \int_0^1 \int_0^1 f(x;\alpha,\beta)\,f(y;\alpha,\beta)\,, x-y, \,dx\,dy = \left(\frac\right)\frac The Gini coefficient for the Beta distribution is half of the relative mean absolute difference: :\mathrm = \left(\frac\right)\frac


Skewness

The
skewness
(the third moment centered on the mean, normalized by the 3/2 power of the variance) of the beta distribution is :\gamma_1 =\frac = \frac . Letting α = β in the above expression one obtains γ1 = 0, showing once again that for α = β the distribution is symmetric and hence the skewness is zero. Positive skew (right-tailed) for α < β, negative skew (left-tailed) for α > β. Using the parametrization in terms of mean μ and sample size ν = α + β: : \begin \alpha & = \mu \nu ,\text\nu =(\alpha + \beta) >0\\ \beta & = (1 - \mu) \nu , \text\nu =(\alpha + \beta) >0. \end one can express the skewness in terms of the mean μ and the sample size ν as follows: :\gamma_1 =\frac = \frac. The skewness can also be expressed just in terms of the variance ''var'' and the mean μ as follows: :\gamma_1 =\frac = \frac\text \operatorname < \mu(1-\mu) The accompanying plot of skewness as a function of variance and mean shows that maximum variance (1/4) is coupled with zero skewness and the symmetry condition (μ = 1/2), and that maximum skewness (positive or negative infinity) occurs when the mean is located at one end or the other, so that the "mass" of the probability distribution is concentrated at the ends (minimum variance). The following expression for the square of the skewness, in terms of the sample size ν = α + β and the variance ''var'', is useful for the method of moments estimation of four parameters: :(\gamma_1)^2 =\frac = \frac\bigg(\frac-4(1+\nu)\bigg) This expression correctly gives a skewness of zero for α = β, since in that case (see ): \operatorname = \frac. For the symmetric case (α = β), skewness = 0 over the whole range, and the following limits apply: :\lim_ \gamma_1 = \lim_ \gamma_1 =\lim_ \gamma_1=\lim_ \gamma_1=\lim_ \gamma_1 = 0 For the asymmetric cases (α ≠ β) the following limits (with only the noted variable approaching the limit) can be obtained from the above expressions: : \begin &\lim_ \gamma_1 =\lim_ \gamma_1 = \infty\\ &\lim_ \gamma_1 = \lim_ \gamma_1= - \infty\\ &\lim_ \gamma_1 = -\frac,\quad \lim_(\lim_ \gamma_1) = -\infty,\quad \lim_(\lim_ \gamma_1) = 0\\ &\lim_ \gamma_1 = \frac,\quad \lim_(\lim_ \gamma_1) = \infty,\quad \lim_(\lim_ \gamma_1) = 0\\ &\lim_ \gamma_1 = \frac,\quad \lim_(\lim_ \gamma_1) = \infty,\quad \lim_(\lim_ \gamma_1) = - \infty \end
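A minimal Python check of the skewness, using the standard closed form γ₁ = 2(β − α)√(α + β + 1)/[(α + β + 2)√(αβ)] against SciPy; the parameter values are arbitrary examples.

<syntaxhighlight lang="python">
# Skewness of Beta(alpha, beta) from the closed form, checked against scipy.
import numpy as np
from scipy import stats

def beta_skewness(a, b):
    return 2 * (b - a) * np.sqrt(a + b + 1) / ((a + b + 2) * np.sqrt(a * b))

a, b = 2.0, 5.0                     # example values: alpha < beta gives positive skew
print("closed form:", beta_skewness(a, b))
print("scipy      :", stats.beta(a, b).stats(moments="s"))
</syntaxhighlight>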


Kurtosis

The beta distribution has been applied in acoustic analysis to assess damage to gears, as the kurtosis of the beta distribution has been reported to be a good indicator of the condition of a gear. Kurtosis has also been used to distinguish the seismic signal generated by a person's footsteps from other signals. As persons or other targets moving on the ground generate continuous signals in the form of seismic waves, one can separate different targets based on the seismic waves they generate. Kurtosis is sensitive to impulsive signals, so it's much more sensitive to the signal generated by human footsteps than other signals generated by vehicles, winds, noise, etc. Unfortunately, the notation for kurtosis has not been standardized. Kenney and Keeping use the symbol γ2 for the
excess kurtosis
, but Abramowitz and Stegun use different terminology. To prevent confusion between kurtosis (the fourth moment centered on the mean, normalized by the square of the variance) and excess kurtosis, when using symbols, they will be spelled out as follows: :\begin \text &=\text - 3\\ &=\frac-3\\ &=\frac\\ &=\frac . \end Letting α = β in the above expression one obtains :\text =- \frac \text\alpha=\beta . Therefore, for symmetric beta distributions, the excess kurtosis is negative, increasing from a minimum value of −2 at the limit as → 0, and approaching a maximum value of zero as → ∞. The value of −2 is the minimum value of excess kurtosis that any distribution (not just beta distributions, but any distribution of any possible kind) can ever achieve. This minimum value is reached when all the probability density is entirely concentrated at each end ''x'' = 0 and ''x'' = 1, with nothing in between: a 2-point
Bernoulli distribution
with equal probability 1/2 at each end (a coin toss: see section below "Kurtosis bounded by the square of the skewness" for further discussion). The description of kurtosis as a measure of the "potential outliers" (or "potential rare, extreme values") of the probability distribution, is correct for all distributions including the beta distribution. When rare, extreme values can occur in the beta distribution, the higher its kurtosis; otherwise, the kurtosis is lower. For α ≠ β, skewed beta distributions, the excess kurtosis can reach unlimited positive values (particularly for α → 0 for finite β, or for β → 0 for finite α) because the side away from the mode will produce occasional extreme values. Minimum kurtosis takes place when the mass density is concentrated equally at each end (and therefore the mean is at the center), and there is no probability mass density in between the ends. Using the parametrization in terms of mean μ and sample size ν = α + β: : \begin \alpha & = \mu \nu ,\text\nu =(\alpha + \beta) >0\\ \beta & = (1 - \mu) \nu , \text\nu =(\alpha + \beta) >0. \end one can express the excess kurtosis in terms of the mean μ and the sample size ν as follows: :\text =\frac\bigg (\frac - 1 \bigg ) The excess kurtosis can also be expressed in terms of just the following two parameters: the variance ''var'', and the sample size ν as follows: :\text =\frac\left(\frac - 6 - 5 \nu \right)\text\text< \mu(1-\mu) and, in terms of the variance ''var'' and the mean μ as follows: :\text =\frac\text\text< \mu(1-\mu) The plot of excess kurtosis as a function of the variance and the mean shows that the minimum value of the excess kurtosis (−2, which is the minimum possible value for excess kurtosis for any distribution) is intimately coupled with the maximum value of variance (1/4) and the symmetry condition: the mean occurring at the midpoint (μ = 1/2). This occurs for the symmetric case of α = β = 0, with zero skewness. At the limit, this is the 2 point
Bernoulli distribution
with equal probability 1/2 at each Dirac delta function end ''x'' = 0 and ''x'' = 1 and zero probability everywhere else. (A coin toss: one face of the coin being ''x'' = 0 and the other face being ''x'' = 1.) Variance is maximum because the distribution is bimodal with nothing in between the two modes (spikes) at each end. Excess kurtosis is minimum: the probability density "mass" is zero at the mean and it is concentrated at the two peaks at each end. Excess kurtosis reaches the minimum possible value (for any distribution) when the probability density function has two spikes at each end: it is bi-"peaky" with nothing in between them. On the other hand, the plot shows that for extreme skewed cases, where the mean is located near one or the other end (μ = 0 or μ = 1), the variance is close to zero, and the excess kurtosis rapidly approaches infinity when the mean of the distribution approaches either end. Alternatively, the excess kurtosis can also be expressed in terms of just the following two parameters: the square of the skewness, and the sample size ν as follows: :\text =\frac\bigg(\frac (\text)^2 - 1\bigg)\text^2-2< \text< \frac (\text)^2 From this last expression, one can obtain the same limits published practically a century ago by Karl Pearson in his paper, for the beta distribution (see section below titled "Kurtosis bounded by the square of the skewness"). Setting α + β= ν = 0 in the above expression, one obtains Pearson's lower boundary (values for the skewness and excess kurtosis below the boundary (excess kurtosis + 2 − skewness2 = 0) cannot occur for any distribution, and hence Karl Pearson appropriately called the region below this boundary the "impossible region"). The limit of α + β = ν → ∞ determines Pearson's upper boundary. : \begin &\lim_\text = (\text)^2 - 2\\ &\lim_\text = \tfrac (\text)^2 \end therefore: :(\text)^2-2< \text< \tfrac (\text)^2 Values of ν = α + β such that ν ranges from zero to infinity, 0 < ν < ∞, span the whole region of the beta distribution in the plane of excess kurtosis versus squared skewness. For the symmetric case (α = β), the following limits apply: : \begin &\lim_ \text = - 2 \\ &\lim_ \text = 0 \\ &\lim_ \text = - \frac \end For the unsymmetric cases (α ≠ β) the following limits (with only the noted variable approaching the limit) can be obtained from the above expressions: : \begin &\lim_\text =\lim_ \text = \lim_\text = \lim_\text =\infty\\ &\lim_\text = \frac,\text \lim_(\lim_ \text) = \infty,\text \lim_(\lim_ \text) = 0\\ &\lim_\text = \frac,\text \lim_(\lim_ \text) = \infty,\text \lim_(\lim_ \text) = 0\\ &\lim_ \text = - 6 + \frac,\text \lim_(\lim_ \text) = \infty,\text \lim_(\lim_ \text) = \infty \end
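A minimal Python check of the excess kurtosis, using the standard closed form 6[(α − β)²(α + β + 1) − αβ(α + β + 2)]/[αβ(α + β + 2)(α + β + 3)] against SciPy; the parameter values are arbitrary examples, including a strongly skewed case.

<syntaxhighlight lang="python">
# Excess kurtosis of Beta(alpha, beta) from the closed form, checked against scipy.
from scipy import stats

def beta_excess_kurtosis(a, b):
    num = 6 * ((a - b) ** 2 * (a + b + 1) - a * b * (a + b + 2))
    den = a * b * (a + b + 2) * (a + b + 3)
    return num / den

for a, b in [(2.0, 2.0), (0.5, 0.5), (2.0, 5.0), (0.1, 1000.0)]:
    print((a, b), beta_excess_kurtosis(a, b), stats.beta(a, b).stats(moments="k"))
</syntaxhighlight>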


Characteristic function

The Characteristic function (probability theory), characteristic function is the Fourier transform of the probability density function. The characteristic function of the beta distribution is confluent hypergeometric function, Kummer's confluent hypergeometric function (of the first kind): :\begin \varphi_X(\alpha;\beta;t) &= \operatorname\left[e^\right]\\ &= \int_0^1 e^ f(x;\alpha,\beta) dx \\ &=_1F_1(\alpha; \alpha+\beta; it)\!\\ &=\sum_^\infty \frac \\ &= 1 +\sum_^ \left( \prod_^ \frac \right) \frac \end where : x^=x(x+1)(x+2)\cdots(x+n-1) is the rising factorial, also called the "Pochhammer symbol". The value of the characteristic function for ''t'' = 0, is one: : \varphi_X(\alpha;\beta;0)=_1F_1(\alpha; \alpha+\beta; 0) = 1 . Also, the real and imaginary parts of the characteristic function enjoy the following symmetries with respect to the origin of variable ''t'': : \textrm \left [ _1F_1(\alpha; \alpha+\beta; it) \right ] = \textrm \left [ _1F_1(\alpha; \alpha+\beta; - it) \right ] : \textrm \left [ _1F_1(\alpha; \alpha+\beta; it) \right ] = - \textrm \left [ _1F_1(\alpha; \alpha+\beta; - it) \right ] The symmetric case α = β simplifies the characteristic function of the beta distribution to a Bessel function, since in the special case α + β = 2α the confluent hypergeometric function (of the first kind) reduces to a Bessel function (the modified Bessel function of the first kind I_ ) using Ernst Kummer, Kummer's second transformation as follows: Another example of the symmetric case α = β = n/2 for beamforming applications can be found in Figure 11 of :\begin _1F_1(\alpha;2\alpha; it) &= e^ _0F_1 \left(; \alpha+\tfrac; \frac \right) \\ &= e^ \left(\frac\right)^ \Gamma\left(\alpha+\tfrac\right) I_\left(\frac\right).\end In the accompanying plots, the Complex number, real part (Re) of the Characteristic function (probability theory), characteristic function of the beta distribution is displayed for symmetric (α = β) and skewed (α ≠ β) cases.


Other moments


Moment generating function

It also follows that the moment generating function is :\begin M_X(\alpha; \beta; t) &= \operatorname\left[e^\right] \\ pt&= \int_0^1 e^ f(x;\alpha,\beta)\,dx \\ pt&= _1F_1(\alpha; \alpha+\beta; t) \\ pt&= \sum_^\infty \frac \frac \\ pt&= 1 +\sum_^ \left( \prod_^ \frac \right) \frac \end In particular ''M''''X''(''α''; ''β''; 0) = 1.


Higher moments

Using the moment generating function, the ''k''-th raw moment is given by the factor :\prod_^ \frac multiplying the (exponential series) term \left(\frac\right) in the series of the moment generating function :\operatorname[X^k]= \frac = \prod_^ \frac where (''x'')(''k'') is a Pochhammer symbol representing rising factorial. It can also be written in a recursive form as :\operatorname[X^k] = \frac\operatorname[X^]. Since the moment generating function M_X(\alpha; \beta; \cdot) has a positive radius of convergence, the beta distribution is Moment problem, determined by its moments.
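A minimal Python sketch of the raw-moment recursion just described, checked against SciPy's non-central moments; the parameter values are arbitrary examples.

<syntaxhighlight lang="python">
# Raw moments E[X^k] of Beta(alpha, beta) via the recursion
# E[X^k] = E[X^(k-1)] * (alpha + k - 1)/(alpha + beta + k - 1),
# checked against scipy's moment().
from scipy import stats

def raw_moments(a, b, k_max):
    m, out = 1.0, []
    for k in range(1, k_max + 1):
        m *= (a + k - 1) / (a + b + k - 1)
        out.append(m)
    return out

a, b = 2.0, 3.0
print(raw_moments(a, b, 4))
print([stats.beta(a, b).moment(k) for k in range(1, 5)])
</syntaxhighlight>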


Moments of transformed random variables


=Moments of linearly transformed, product and inverted random variables

= One can also show the following expectations for a transformed random variable, where the random variable ''X'' is Beta-distributed with parameters α and β: ''X'' ~ Beta(α, β). The expected value of the variable 1 − ''X'' is the mirror-symmetry of the expected value based on ''X'': :\begin & \operatorname[1-X] = \frac \\ & \operatorname[X (1-X)] =\operatorname[(1-X)X ] =\frac \end Due to the mirror-symmetry of the probability density function of the beta distribution, the variances based on variables ''X'' and 1 − ''X'' are identical, and the covariance on ''X''(1 − ''X'' is the negative of the variance: :\operatorname[(1-X)]=\operatorname[X] = -\operatorname[X,(1-X)]= \frac These are the expected values for inverted variables, (these are related to the harmonic means, see ): :\begin & \operatorname \left [\frac \right ] = \frac \text \alpha > 1\\ & \operatorname\left [\frac \right ] =\frac \text \beta > 1 \end The following transformation by dividing the variable ''X'' by its mirror-image ''X''/(1 − ''X'') results in the expected value of the "inverted beta distribution" or
beta prime distribution
(also known as beta distribution of the second kind or Pearson distribution, Pearson's Type VI): : \begin & \operatorname\left[\frac\right] =\frac \text\beta > 1\\ & \operatorname\left[\frac\right] =\frac\text\alpha > 1 \end Variances of these transformed variables can be obtained by integration, as the expected values of the second moments centered on the corresponding variables: :\operatorname \left[\frac \right] =\operatorname\left[\left(\frac - \operatorname\left[\frac \right ] \right )^2\right]= :\operatorname\left [\frac \right ] =\operatorname \left [\left (\frac - \operatorname\left [\frac \right ] \right )^2 \right ]= \frac \text\alpha > 2 The following variance of the variable ''X'' divided by its mirror-image (''X''/(1−''X'') results in the variance of the "inverted beta distribution" or
beta prime distribution
(also known as beta distribution of the second kind or Pearson distribution, Pearson's Type VI): :\operatorname \left [\frac \right ] =\operatorname \left [\left(\frac - \operatorname \left [\frac \right ] \right)^2 \right ]=\operatorname \left [\frac \right ] = :\operatorname \left [\left (\frac - \operatorname \left [\frac \right ] \right )^2 \right ]= \frac \text\beta > 2 The covariances are: :\operatorname\left [\frac,\frac \right ] = \operatorname\left[\frac,\frac \right] =\operatorname\left[\frac,\frac\right ] = \operatorname\left[\frac,\frac \right] =\frac \text \alpha, \beta > 1 These expectations and variances appear in the four-parameter Fisher information matrix (.)


=Moments of logarithmically transformed random variables

= Expected values for Logarithm transformation, logarithmic transformations (useful for maximum likelihood estimates, see ) are discussed in this section. The following logarithmic linear transformations are related to the geometric means ''GX'' and ''G''(1−''X'') (see ): :\begin \operatorname[\ln(X)] &= \psi(\alpha) - \psi(\alpha + \beta)= - \operatorname\left[\ln \left (\frac \right )\right],\\ \operatorname[\ln(1-X)] &=\psi(\beta) - \psi(\alpha + \beta)= - \operatorname \left[\ln \left (\frac \right )\right]. \end Where the
digamma function
ψ(α) is defined as the logarithmic derivative of the
gamma function
: :\psi(\alpha) = \frac Logit transformations are interesting, as they usually transform various shapes (including J-shapes) into (usually skewed) bell-shaped densities over the logit variable, and they may remove the end singularities over the original variable: :\begin \operatorname\left[\ln \left (\frac \right ) \right] &=\psi(\alpha) - \psi(\beta)= \operatorname[\ln(X)] +\operatorname \left[\ln \left (\frac \right) \right],\\ \operatorname\left [\ln \left (\frac \right ) \right ] &=\psi(\beta) - \psi(\alpha)= - \operatorname \left[\ln \left (\frac \right) \right] . \end Johnson considered the distribution of the logit - transformed variable ln(''X''/1−''X''), including its moment generating function and approximations for large values of the shape parameters. This transformation extends the finite support
[0, 1]
based on the original variable ''X'' to infinite support in both directions of the real line (−∞, +∞). Higher order logarithmic moments can be derived by using the representation of a beta distribution as a proportion of two Gamma distributions and differentiating through the integral. They can be expressed in terms of higher order poly-gamma functions as follows: :\begin \operatorname \left [\ln^2(X) \right ] &= (\psi(\alpha) - \psi(\alpha + \beta))^2+\psi_1(\alpha)-\psi_1(\alpha+\beta), \\ \operatorname \left [\ln^2(1-X) \right ] &= (\psi(\beta) - \psi(\alpha + \beta))^2+\psi_1(\beta)-\psi_1(\alpha+\beta), \\ \operatorname \left [\ln (X)\ln(1-X) \right ] &=(\psi(\alpha) - \psi(\alpha + \beta))(\psi(\beta) - \psi(\alpha + \beta)) -\psi_1(\alpha+\beta). \end therefore the
variance
of the logarithmic variables and
covariance
of ln(''X'') and ln(1−''X'') are: :\begin \operatorname[\ln(X), \ln(1-X)] &= \operatorname\left[\ln(X)\ln(1-X)\right] - \operatorname[\ln(X)]\operatorname[\ln(1-X)] = -\psi_1(\alpha+\beta) \\ & \\ \operatorname[\ln X] &= \operatorname[\ln^2(X)] - (\operatorname[\ln(X)])^2 \\ &= \psi_1(\alpha) - \psi_1(\alpha + \beta) \\ &= \psi_1(\alpha) + \operatorname[\ln(X), \ln(1-X)] \\ & \\ \operatorname ln (1-X)&= \operatorname[\ln^2 (1-X)] - (\operatorname[\ln (1-X)])^2 \\ &= \psi_1(\beta) - \psi_1(\alpha + \beta) \\ &= \psi_1(\beta) + \operatorname[\ln (X), \ln(1-X)] \end where the
trigamma function
, denoted ψ1(α), is the second of the
polygamma functions, and is defined as the derivative of the digamma function:

:\psi_1(\alpha) = \frac{d^2\ln\Gamma(\alpha)}{d\alpha^2} = \frac{d\,\psi(\alpha)}{d\alpha}.

The variances and covariance of the logarithmically transformed variables ''X'' and (1−''X'') are different, in general, because the logarithmic transformation destroys the mirror-symmetry of the original variables ''X'' and (1−''X''), as the logarithm approaches negative infinity for the variable approaching zero. These logarithmic variances and covariance are the elements of the
Fisher information
matrix for the beta distribution. They are also a measure of the curvature of the log likelihood function (see section on Maximum likelihood estimation). The variances of the log inverse variables are identical to the variances of the log variables: :\begin \operatorname\left[\ln \left (\frac \right ) \right] & =\operatorname[\ln(X)] = \psi_1(\alpha) - \psi_1(\alpha + \beta), \\ \operatorname\left[\ln \left (\frac \right ) \right] &=\operatorname ln (1-X)= \psi_1(\beta) - \psi_1(\alpha + \beta), \\ \operatorname\left[\ln \left (\frac \right), \ln \left (\frac\right ) \right] &=\operatorname[\ln(X),\ln(1-X)]= -\psi_1(\alpha + \beta).\end It also follows that the variances of the logit transformed variables are: :\operatorname\left[\ln \left (\frac \right )\right]=\operatorname\left[\ln \left (\frac \right ) \right]=-\operatorname\left [\ln \left (\frac \right ), \ln \left (\frac \right ) \right]= \psi_1(\alpha) + \psi_1(\beta)
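A minimal Python check of these logarithmic moments in terms of the digamma and trigamma functions, compared with a Monte Carlo estimate; the parameter values are arbitrary examples, and NumPy/SciPy are assumed to be available.

<syntaxhighlight lang="python">
# Logarithmic moments of Beta(alpha, beta), checked by Monte Carlo.
import numpy as np
from scipy.special import digamma, polygamma

a, b = 2.0, 3.0
rng = np.random.default_rng(2)
x = rng.beta(a, b, size=500_000)

print("E[ln X]          :", digamma(a) - digamma(a + b), np.log(x).mean())
print("var[ln X]        :", polygamma(1, a) - polygamma(1, a + b), np.log(x).var())
print("cov[ln X, ln 1-X]:", -polygamma(1, a + b),
      np.cov(np.log(x), np.log1p(-x))[0, 1])
</syntaxhighlight>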


Quantities of information (entropy)

Given a beta distributed random variable, ''X'' ~ Beta(''α'', ''β''), the information entropy, differential entropy of ''X'' is (measured in Nat (unit), nats), the expected value of the negative of the logarithm of the
probability density function
: :\begin h(X) &= \operatorname[-\ln(f(x;\alpha,\beta))] \\ pt&=\int_0^1 -f(x;\alpha,\beta)\ln(f(x;\alpha,\beta)) \, dx \\ pt&= \ln(\Beta(\alpha,\beta))-(\alpha-1)\psi(\alpha)-(\beta-1)\psi(\beta)+(\alpha+\beta-2) \psi(\alpha+\beta) \end where ''f''(''x''; ''α'', ''β'') is the
probability density function
of the beta distribution: :f(x;\alpha,\beta) = \frac x^(1-x)^ The
digamma function
''ψ'' appears in the formula for the differential entropy as a consequence of Euler's integral formula for the harmonic numbers which follows from the integral: :\int_0^1 \frac \, dx = \psi(\alpha)-\psi(1) The information entropy, differential entropy of the beta distribution is negative for all values of ''α'' and ''β'' greater than zero, except at ''α'' = ''β'' = 1 (for which values the beta distribution is the same as the Uniform distribution (continuous), uniform distribution), where the information entropy, differential entropy reaches its Maxima and minima, maximum value of zero. It is to be expected that the maximum entropy should take place when the beta distribution becomes equal to the uniform distribution, since uncertainty is maximal when all possible events are equiprobable. For ''α'' or ''β'' approaching zero, the information entropy, differential entropy approaches its Maxima and minima, minimum value of negative infinity. For (either or both) ''α'' or ''β'' approaching zero, there is a maximum amount of order: all the probability density is concentrated at the ends, and there is zero probability density at points located between the ends. Similarly for (either or both) ''α'' or ''β'' approaching infinity, the differential entropy approaches its minimum value of negative infinity, and a maximum amount of order. If either ''α'' or ''β'' approaches infinity (and the other is finite) all the probability density is concentrated at an end, and the probability density is zero everywhere else. If both shape parameters are equal (the symmetric case), ''α'' = ''β'', and they approach infinity simultaneously, the probability density becomes a spike ( Dirac delta function) concentrated at the middle ''x'' = 1/2, and hence there is 100% probability at the middle ''x'' = 1/2 and zero probability everywhere else. The (continuous case) information entropy, differential entropy was introduced by Shannon in his original paper (where he named it the "entropy of a continuous distribution"), as the concluding part of the same paper where he defined the information entropy, discrete entropy. It is known since then that the differential entropy may differ from the infinitesimal limit of the discrete entropy by an infinite offset, therefore the differential entropy can be negative (as it is for the beta distribution). What really matters is the relative value of entropy. Given two beta distributed random variables, ''X''1 ~ Beta(''α'', ''β'') and ''X''2 ~ Beta(''α''′, ''β''′), the cross entropy is (measured in nats) :\begin H(X_1,X_2) &= \int_0^1 - f(x;\alpha,\beta) \ln (f(x;\alpha',\beta')) \,dx \\ pt&= \ln \left(\Beta(\alpha',\beta')\right)-(\alpha'-1)\psi(\alpha)-(\beta'-1)\psi(\beta)+(\alpha'+\beta'-2)\psi(\alpha+\beta). \end The cross entropy has been used as an error metric to measure the distance between two hypotheses. Its absolute value is minimum when the two distributions are identical. It is the information measure most closely related to the log maximum likelihood (see section on "Parameter estimation. Maximum likelihood estimation")). The relative entropy, or Kullback–Leibler divergence ''D''KL(''X''1 , , ''X''2), is a measure of the inefficiency of assuming that the distribution is ''X''2 ~ Beta(''α''′, ''β''′) when the distribution is really ''X''1 ~ Beta(''α'', ''β''). It is defined as follows (measured in nats). 
:\begin D_(X_1, , X_2) &= \int_0^1 f(x;\alpha,\beta) \ln \left (\frac \right ) \, dx \\ pt&= \left (\int_0^1 f(x;\alpha,\beta) \ln (f(x;\alpha,\beta)) \,dx \right )- \left (\int_0^1 f(x;\alpha,\beta) \ln (f(x;\alpha',\beta')) \, dx \right )\\ pt&= -h(X_1) + H(X_1,X_2)\\ pt&= \ln\left(\frac\right)+(\alpha-\alpha')\psi(\alpha)+(\beta-\beta')\psi(\beta)+(\alpha'-\alpha+\beta'-\beta)\psi (\alpha + \beta). \end The relative entropy, or Kullback–Leibler divergence, is always non-negative. A few numerical examples follow: *''X''1 ~ Beta(1, 1) and ''X''2 ~ Beta(3, 3); ''D''KL(''X''1 , , ''X''2) = 0.598803; ''D''KL(''X''2 , , ''X''1) = 0.267864; ''h''(''X''1) = 0; ''h''(''X''2) = −0.267864 *''X''1 ~ Beta(3, 0.5) and ''X''2 ~ Beta(0.5, 3); ''D''KL(''X''1 , , ''X''2) = 7.21574; ''D''KL(''X''2 , , ''X''1) = 7.21574; ''h''(''X''1) = −1.10805; ''h''(''X''2) = −1.10805. The Kullback–Leibler divergence is not symmetric ''D''KL(''X''1 , , ''X''2) ≠ ''D''KL(''X''2 , , ''X''1) for the case in which the individual beta distributions Beta(1, 1) and Beta(3, 3) are symmetric, but have different entropies ''h''(''X''1) ≠ ''h''(''X''2). The value of the Kullback divergence depends on the direction traveled: whether going from a higher (differential) entropy to a lower (differential) entropy or the other way around. In the numerical example above, the Kullback divergence measures the inefficiency of assuming that the distribution is (bell-shaped) Beta(3, 3), rather than (uniform) Beta(1, 1). The "h" entropy of Beta(1, 1) is higher than the "h" entropy of Beta(3, 3) because the uniform distribution Beta(1, 1) has a maximum amount of disorder. The Kullback divergence is more than two times higher (0.598803 instead of 0.267864) when measured in the direction of decreasing entropy: the direction that assumes that the (uniform) Beta(1, 1) distribution is (bell-shaped) Beta(3, 3) rather than the other way around. In this restricted sense, the Kullback divergence is consistent with the second law of thermodynamics. The Kullback–Leibler divergence is symmetric ''D''KL(''X''1 , , ''X''2) = ''D''KL(''X''2 , , ''X''1) for the skewed cases Beta(3, 0.5) and Beta(0.5, 3) that have equal differential entropy ''h''(''X''1) = ''h''(''X''2). The symmetry condition: :D_(X_1, , X_2) = D_(X_2, , X_1),\texth(X_1) = h(X_2),\text\alpha \neq \beta follows from the above definitions and the mirror-symmetry ''f''(''x''; ''α'', ''β'') = ''f''(1−''x''; ''α'', ''β'') enjoyed by the beta distribution.
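A short Python sketch of the differential entropy and the Kullback–Leibler divergence for beta distributions, which reproduces the numerical examples above (all values in nats); SciPy is assumed to be available.

<syntaxhighlight lang="python">
# Differential entropy and Kullback-Leibler divergence of beta distributions.
from scipy.special import betaln, digamma

def beta_entropy(a, b):
    return (betaln(a, b) - (a - 1) * digamma(a) - (b - 1) * digamma(b)
            + (a + b - 2) * digamma(a + b))

def beta_kl(a, b, a2, b2):
    """D_KL( Beta(a, b) || Beta(a2, b2) )."""
    return (betaln(a2, b2) - betaln(a, b)
            + (a - a2) * digamma(a) + (b - b2) * digamma(b)
            + (a2 - a + b2 - b) * digamma(a + b))

print(beta_entropy(1, 1), beta_entropy(3, 3))            # 0 and about -0.267864
print(beta_kl(1, 1, 3, 3), beta_kl(3, 3, 1, 1))          # about 0.598803 and 0.267864
print(beta_kl(3, 0.5, 0.5, 3), beta_kl(0.5, 3, 3, 0.5))  # both about 7.21574
</syntaxhighlight>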


Relationships between statistical measures


Mean, mode and median relationship

If 1 < α < β then mode ≤ median ≤ mean.Kerman J (2011) "A closed-form approximation for the median of the beta distribution". Expressing the mode (only for α, β > 1), and the mean in terms of α and β: : \frac \le \text \le \frac , If 1 < β < α then the order of the inequalities are reversed. For α, β > 1 the absolute distance between the mean and the median is less than 5% of the distance between the maximum and minimum values of ''x''. On the other hand, the absolute distance between the mean and the mode can reach 50% of the distance between the maximum and minimum values of ''x'', for the (Pathological (mathematics), pathological) case of α = 1 and β = 1, for which values the beta distribution approaches the uniform distribution and the information entropy, differential entropy approaches its Maxima and minima, maximum value, and hence maximum "disorder". For example, for α = 1.0001 and β = 1.00000001: * mode = 0.9999; PDF(mode) = 1.00010 * mean = 0.500025; PDF(mean) = 1.00003 * median = 0.500035; PDF(median) = 1.00003 * mean − mode = −0.499875 * mean − median = −9.65538 × 10−6 where PDF stands for the value of the
probability density function
.
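A quick numerical illustration of this ordering, using SciPy (the shape parameters α = 2 and β = 5 are an arbitrary example with 1 < α < β):

 from scipy.stats import beta as beta_dist

 a, b = 2.0, 5.0                    # any values with 1 < alpha < beta
 mode = (a - 1) / (a + b - 2)       # mode, valid for alpha, beta > 1
 median = beta_dist.ppf(0.5, a, b)  # median via the inverse CDF
 mean = a / (a + b)
 assert mode <= median <= mean
 print(mode, median, mean)          # 0.2, ~0.264, ~0.286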


Mean, geometric mean and harmonic mean relationship

It is known from the inequality of arithmetic and geometric means that the geometric mean is lower than the arithmetic mean. Similarly, the harmonic mean is lower than the geometric mean. The accompanying plot shows that for α = β, both the mean and the median are exactly equal to 1/2, regardless of the value of α = β, and the mode is also equal to 1/2 for α = β > 1. However, the geometric and harmonic means are lower than 1/2, and they approach this value only asymptotically as α = β → ∞.
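For a concrete check, the three means can be evaluated with their closed forms (a sketch; it assumes the standard expressions G_X = exp(ψ(α) − ψ(α + β)) for the geometric mean and (α − 1)/(α + β − 1) for the harmonic mean, the latter valid for α, β > 1):

 import numpy as np
 from scipy.special import psi

 a = b = 3.0
 arithmetic = a / (a + b)                 # exactly 1/2 whenever alpha = beta
 geometric = np.exp(psi(a) - psi(a + b))  # ~0.457
 harmonic = (a - 1) / (a + b - 1)         # 0.4
 print(arithmetic, geometric, harmonic)   # arithmetic > geometric > harmonic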


Kurtosis bounded by the square of the skewness

As remarked by Feller, in the Pearson system the beta probability density appears as type I (any difference between the beta distribution and Pearson's type I distribution is only superficial and it makes no difference for the following discussion regarding the relationship between kurtosis and skewness). Karl Pearson showed, in Plate 1 of his paper published in 1916, a graph with the kurtosis as the vertical axis (ordinate) and the square of the
skewness
as the horizontal axis (abscissa), in which a number of distributions were displayed. The region occupied by the beta distribution is bounded by the following two lines in the (skewness2, kurtosis) plane, or the (skewness2, excess kurtosis) plane:
:(\text{skewness})^2+1< \text{kurtosis}< \frac{3}{2} (\text{skewness})^2 + 3
or, equivalently,
:(\text{skewness})^2-2< \text{excess kurtosis}< \frac{3}{2} (\text{skewness})^2
At a time when there were no powerful digital computers, Karl Pearson accurately computed further boundaries, for example, separating the "U-shaped" from the "J-shaped" distributions. The lower boundary line (excess kurtosis + 2 − skewness2 = 0) is produced by skewed "U-shaped" beta distributions with both values of the shape parameters α and β close to zero. The upper boundary line (excess kurtosis − (3/2) skewness2 = 0) is produced by extremely skewed distributions with very large values of one of the parameters and very small values of the other parameter. Karl Pearson showed that this upper boundary line (excess kurtosis − (3/2) skewness2 = 0) is also the intersection with Pearson's distribution III, which has unlimited support in one direction (towards positive infinity), and can be bell-shaped or J-shaped. His son, Egon Pearson, showed that the region (in the kurtosis/squared-skewness plane) occupied by the beta distribution (equivalently, Pearson's distribution I) as it approaches this boundary (excess kurtosis − (3/2) skewness2 = 0) is shared with the noncentral chi-squared distribution. Karl Pearson (Pearson 1895, pp. 357, 360, 373–376) also showed that the gamma distribution is a Pearson type III distribution. Hence this boundary line for Pearson's type III distribution is known as the gamma line. (This can be shown from the fact that the excess kurtosis of the gamma distribution is 6/''k'' and the square of the skewness is 4/''k'', hence (excess kurtosis − (3/2) skewness2 = 0) is identically satisfied by the gamma distribution regardless of the value of the parameter ''k''.) Pearson later noted that the chi-squared distribution is a special case of Pearson's type III and also shares this boundary line (as is apparent from the fact that for the chi-squared distribution the excess kurtosis is 12/''k'' and the square of the skewness is 8/''k'', hence (excess kurtosis − (3/2) skewness2 = 0) is identically satisfied regardless of the value of the parameter ''k''). This is to be expected, since the chi-squared distribution ''X'' ~ χ2(''k'') is a special case of the gamma distribution, with parametrization X ~ Γ(k/2, 1/2) where k is a positive integer that specifies the "number of degrees of freedom" of the chi-squared distribution. An example of a beta distribution near the upper boundary (excess kurtosis − (3/2) skewness2 = 0) is given by α = 0.1, β = 1000, for which the ratio (excess kurtosis)/(skewness2) = 1.49835 approaches the upper limit of 1.5 from below. An example of a beta distribution near the lower boundary (excess kurtosis + 2 − skewness2 = 0) is given by α = 0.0001, β = 0.1, for which values the expression (excess kurtosis + 2)/(skewness2) = 1.01621 approaches the lower limit of 1 from above. In the infinitesimal limit for both α and β approaching zero symmetrically, the excess kurtosis reaches its minimum value at −2. This minimum value occurs at the point at which the lower boundary line intersects the vertical axis (ordinate). 
(However, in Pearson's original chart, the ordinate is kurtosis, instead of excess kurtosis, and it increases downwards rather than upwards). Values for the skewness and excess kurtosis below the lower boundary (excess kurtosis + 2 − skewness2 = 0) cannot occur for any distribution, and hence Karl Pearson appropriately called the region below this boundary the "impossible region". The boundary for this "impossible region" is determined by (symmetric or skewed) bimodal "U"-shaped distributions for which the parameters α and β approach zero and hence all the probability density is concentrated at the ends: ''x'' = 0, 1 with practically nothing in between them. Since for α ≈ β ≈ 0 the probability density is concentrated at the two ends ''x'' = 0 and ''x'' = 1, this "impossible boundary" is determined by a
Bernoulli distribution
, where the only two possible outcomes occur with respective probabilities ''p'' and ''q'' = 1 − ''p''. For cases approaching this limit boundary with symmetry α = β, skewness ≈ 0, excess kurtosis ≈ −2 (this is the lowest excess kurtosis possible for any distribution), and the probabilities are ''p'' ≈ ''q'' ≈ 1/2. For cases approaching this limit boundary with skewness, excess kurtosis ≈ −2 + skewness2, and the probability density is concentrated more at one end than the other end (with practically nothing in between), with probabilities p = \tfrac{1}{2}\left(1 + \tfrac{\text{skewness}}{\sqrt{4+\text{skewness}^2}}\right) at the left end ''x'' = 0 and q = 1-p = \tfrac{1}{2}\left(1 - \tfrac{\text{skewness}}{\sqrt{4+\text{skewness}^2}}\right) at the right end ''x'' = 1.
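The bounds above can be checked numerically for particular shape parameters; the sketch below (SciPy's stats method returns the excess kurtosis) includes the two near-boundary examples quoted earlier:

 from scipy.stats import beta as beta_dist

 for a, b in [(0.1, 1000), (0.0001, 0.1), (2, 5)]:
     mean, var, skew, ex_kurt = beta_dist.stats(a, b, moments='mvsk')
     # (skewness)^2 - 2 < excess kurtosis < (3/2)(skewness)^2
     assert skew**2 - 2 < ex_kurt < 1.5 * skew**2
     print(a, b, float(ex_kurt / skew**2), float((ex_kurt + 2) / skew**2))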


Symmetry

All statements are conditional on α, β > 0 * Probability density function Symmetry, reflection symmetry ::f(x;\alpha,\beta) = f(1-x;\beta,\alpha) * Cumulative distribution function Symmetry, reflection symmetry plus unitary Symmetry, translation ::F(x;\alpha,\beta) = I_x(\alpha,\beta) = 1- F(1- x;\beta,\alpha) = 1 - I_(\beta,\alpha) * Mode Symmetry, reflection symmetry plus unitary Symmetry, translation ::\operatorname(\Beta(\alpha, \beta))= 1-\operatorname(\Beta(\beta, \alpha)),\text\Beta(\beta, \alpha)\ne \Beta(1,1) * Median Symmetry, reflection symmetry plus unitary Symmetry, translation ::\operatorname (\Beta(\alpha, \beta) )= 1 - \operatorname (\Beta(\beta, \alpha)) * Mean Symmetry, reflection symmetry plus unitary Symmetry, translation ::\mu (\Beta(\alpha, \beta) )= 1 - \mu (\Beta(\beta, \alpha) ) * Geometric Means each is individually asymmetric, the following symmetry applies between the geometric mean based on ''X'' and the geometric mean based on its
reflection
(1-X) ::G_X (\Beta(\alpha, \beta) )=G_(\Beta(\beta, \alpha) ) * Harmonic means each is individually asymmetric, the following symmetry applies between the harmonic mean based on ''X'' and the harmonic mean based on its
reflection
(1-X) ::H_X (\Beta(\alpha, \beta) )=H_(\Beta(\beta, \alpha) ) \text \alpha, \beta > 1 . * Variance symmetry ::\operatorname (\Beta(\alpha, \beta) )=\operatorname (\Beta(\beta, \alpha) ) * Geometric variances each is individually asymmetric, the following symmetry applies between the log geometric variance based on X and the log geometric variance based on its
reflection
(1-X) ::\ln(\operatorname (\Beta(\alpha, \beta))) = \ln(\operatorname(\Beta(\beta, \alpha))) * Geometric covariance symmetry ::\ln \operatorname(\Beta(\alpha, \beta))=\ln \operatorname(\Beta(\beta, \alpha)) * Mean absolute deviation around the mean symmetry ::\operatorname[, X - E ] (\Beta(\alpha, \beta))=\operatorname[, X - E ] (\Beta(\beta, \alpha)) * Skewness Symmetry (mathematics), skew-symmetry ::\operatorname (\Beta(\alpha, \beta) )= - \operatorname (\Beta(\beta, \alpha) ) * Excess kurtosis symmetry ::\text (\Beta(\alpha, \beta) )= \text (\Beta(\beta, \alpha) ) * Characteristic function symmetry of Real part (with respect to the origin of variable "t") :: \text [_1F_1(\alpha; \alpha+\beta; it) ] = \text [ _1F_1(\alpha; \alpha+\beta; - it)] * Characteristic function Symmetry (mathematics), skew-symmetry of Imaginary part (with respect to the origin of variable "t") :: \text [_1F_1(\alpha; \alpha+\beta; it) ] = - \text [ _1F_1(\alpha; \alpha+\beta; - it) ] * Characteristic function symmetry of Absolute value (with respect to the origin of variable "t") :: \text [ _1F_1(\alpha; \alpha+\beta; it) ] = \text [ _1F_1(\alpha; \alpha+\beta; - it) ] * Differential entropy symmetry ::h(\Beta(\alpha, \beta) )= h(\Beta(\beta, \alpha) ) * Relative Entropy (also called Kullback–Leibler divergence) symmetry ::D_(X_1, , X_2) = D_(X_2, , X_1), \texth(X_1) = h(X_2)\text\alpha \neq \beta * Fisher information matrix symmetry ::_ = _


Geometry of the probability density function


Inflection points

For certain values of the shape parameters α and β, the
probability density function
has inflection points, at which the curvature changes sign. The position of these inflection points can be useful as a measure of the Statistical dispersion, dispersion or spread of the distribution. Defining the following quantity: :\kappa =\frac Points of inflection occur, depending on the value of the shape parameters α and β, as follows: *(α > 2, β > 2) The distribution is bell-shaped (symmetric for α = β and skewed otherwise), with two inflection points, equidistant from the mode: ::x = \text \pm \kappa = \frac * (α = 2, β > 2) The distribution is unimodal, positively skewed, right-tailed, with one inflection point, located to the right of the mode: ::x =\text + \kappa = \frac * (α > 2, β = 2) The distribution is unimodal, negatively skewed, left-tailed, with one inflection point, located to the left of the mode: ::x = \text - \kappa = 1 - \frac * (1 < α < 2, β > 2, α+β>2) The distribution is unimodal, positively skewed, right-tailed, with one inflection point, located to the right of the mode: ::x =\text + \kappa = \frac *(0 < α < 1, 1 < β < 2) The distribution has a mode at the left end ''x'' = 0 and it is positively skewed, right-tailed. There is one inflection point, located to the right of the mode: ::x = \frac *(α > 2, 1 < β < 2) The distribution is unimodal negatively skewed, left-tailed, with one inflection point, located to the left of the mode: ::x =\text - \kappa = \frac *(1 < α < 2, 0 < β < 1) The distribution has a mode at the right end ''x''=1 and it is negatively skewed, left-tailed. There is one inflection point, located to the left of the mode: ::x = \frac There are no inflection points in the remaining (symmetric and skewed) regions: U-shaped: (α, β < 1) upside-down-U-shaped: (1 < α < 2, 1 < β < 2), reverse-J-shaped (α < 1, β > 2) or J-shaped: (α > 2, β < 1) The accompanying plots show the inflection point locations (shown vertically, ranging from 0 to 1) versus α and β (the horizontal axes ranging from 0 to 5). There are large cuts at surfaces intersecting the lines α = 1, β = 1, α = 2, and β = 2 because at these values the beta distribution change from 2 modes, to 1 mode to no mode.
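For the bell-shaped case α, β > 2 the inflection points can also be located numerically from sign changes of the second derivative of the density; the sketch below assumes the standard closed form κ = sqrt((α − 1)(β − 1)/(α + β − 3))/(α + β − 2), so that the inflection points sit at mode ± κ:

 import numpy as np
 from scipy.stats import beta as beta_dist

 a, b = 3.0, 3.0                          # bell-shaped case: alpha, beta > 2
 mode = (a - 1) / (a + b - 2)
 kappa = np.sqrt((a - 1) * (b - 1) / (a + b - 3)) / (a + b - 2)
 analytic = (mode - kappa, mode + kappa)  # ~(0.211, 0.789)

 # numerical check: sign changes of the second derivative of the pdf
 x = np.linspace(0.001, 0.999, 20_001)
 d2 = np.gradient(np.gradient(beta_dist.pdf(x, a, b), x), x)
 numeric = x[np.where(np.diff(np.sign(d2)) != 0)[0]]
 print(analytic, numeric)                 # the sign changes bracket the analytic points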


Shapes

The beta density function can take a wide variety of different shapes depending on the values of the two parameters ''α'' and ''β''. The ability of the beta distribution to take this great diversity of shapes (using only two parameters) is partly responsible for finding wide application for modeling actual measurements:


=Symmetric (''α'' = ''β'')

= * the density function is symmetry, symmetric about 1/2 (blue & teal plots). * median = mean = 1/2. *skewness = 0. *variance = 1/(4(2α + 1)) *α = β < 1 **U-shaped (blue plot). **bimodal: left mode = 0, right mode =1, anti-mode = 1/2 **1/12 < var(''X'') < 1/4 **−2 < excess kurtosis(''X'') < −6/5 ** α = β = 1/2 is the arcsine distribution *** var(''X'') = 1/8 ***excess kurtosis(''X'') = −3/2 ***CF = Rinc (t) ** α = β → 0 is a 2-point
Bernoulli distribution
with equal probability 1/2 at each Dirac delta function end ''x'' = 0 and ''x'' = 1 and zero probability everywhere else. A coin toss: one face of the coin being ''x'' = 0 and the other face being ''x'' = 1. *** \lim_ \operatorname(X) = \tfrac *** \lim_ \operatorname(X) = - 2 a lower value than this is impossible for any distribution to reach. *** The information entropy, differential entropy approaches a Maxima and minima, minimum value of −∞ *α = β = 1 **the uniform distribution (continuous), uniform
[0, 1]
distribution **no mode **var(''X'') = 1/12 **excess kurtosis(''X'') = −6/5 **The (negative anywhere else) information entropy, differential entropy reaches its Maxima and minima, maximum value of zero **CF = Sinc (t) *''α'' = ''β'' > 1 **symmetric unimodal ** mode = 1/2. **0 < var(''X'') < 1/12 **−6/5 < excess kurtosis(''X'') < 0 **''α'' = ''β'' = 3/2 is a semi-elliptic
[0, 1]
distribution, see: Wigner semicircle distribution ***var(''X'') = 1/16. ***excess kurtosis(''X'') = −1 ***CF = 2 Jinc (t) **''α'' = ''β'' = 2 is the parabolic
[0, 1]
distribution ***var(''X'') = 1/20 ***excess kurtosis(''X'') = −6/7 ***CF = 3 Tinc (t) **''α'' = ''β'' > 2 is bell-shaped, with inflection points located to either side of the mode ***0 < var(''X'') < 1/20 ***−6/7 < excess kurtosis(''X'') < 0 **''α'' = ''β'' → ∞ is a 1-point
degenerate distribution
with a Dirac delta function spike at the midpoint ''x'' = 1/2 with probability 1, and zero probability everywhere else. There is 100% probability (absolute certainty) concentrated at the single point ''x'' = 1/2. *** \lim_ \operatorname(X) = 0 *** \lim_ \operatorname(X) = 0 ***The information entropy, differential entropy approaches a Maxima and minima, minimum value of −∞


=Skewed (''α'' ≠ ''β'')

= The density function is Skewness, skewed. An interchange of parameter values yields the mirror image (the reverse) of the initial curve, some more specific cases: *''α'' < 1, ''β'' < 1 ** U-shaped ** Positive skew for α < β, negative skew for α > β. ** bimodal: left mode = 0, right mode = 1, anti-mode = \tfrac ** 0 < median < 1. ** 0 < var(''X'') < 1/4 *α > 1, β > 1 ** unimodal (magenta & cyan plots), **Positive skew for α < β, negative skew for α > β. **\text= \tfrac ** 0 < median < 1 ** 0 < var(''X'') < 1/12 *α < 1, β ≥ 1 **reverse J-shaped with a right tail, **positively skewed, **strictly decreasing, convex function, convex ** mode = 0 ** 0 < median < 1/2. ** 0 < \operatorname(X) < \tfrac, (maximum variance occurs for \alpha=\tfrac, \beta=1, or α = Φ the Golden ratio, golden ratio conjugate) *α ≥ 1, β < 1 **J-shaped with a left tail, **negatively skewed, **strictly increasing, convex function, convex ** mode = 1 ** 1/2 < median < 1 ** 0 < \operatorname(X) < \tfrac, (maximum variance occurs for \alpha=1, \beta=\tfrac, or β = Φ the Golden ratio, golden ratio conjugate) *α = 1, β > 1 **positively skewed, **strictly decreasing (red plot), **a reversed (mirror-image) power function ,1distribution ** mean = 1 / (β + 1) ** median = 1 - 1/21/β ** mode = 0 **α = 1, 1 < β < 2 ***concave function, concave *** 1-\tfrac< \text < \tfrac *** 1/18 < var(''X'') < 1/12. **α = 1, β = 2 ***a straight line with slope −2, the right-triangular distribution with right angle at the left end, at ''x'' = 0 *** \text=1-\tfrac *** var(''X'') = 1/18 **α = 1, β > 2 ***reverse J-shaped with a right tail, ***convex function, convex *** 0 < \text < 1-\tfrac *** 0 < var(''X'') < 1/18 *α > 1, β = 1 **negatively skewed, **strictly increasing (green plot), **the power function
[0, 1]
distribution ** mean = α / (α + 1) ** median = 1/21/α ** mode = 1 **2 > α > 1, β = 1 ***concave function, concave *** \tfrac < \text < \tfrac *** 1/18 < var(''X'') < 1/12 ** α = 2, β = 1 ***a straight line with slope +2, the right-triangular distribution with right angle at the right end, at ''x'' = 1 *** \text=\tfrac *** var(''X'') = 1/18 **α > 2, β = 1 ***J-shaped with a left tail, convex function, convex ***\tfrac < \text < 1 *** 0 < var(''X'') < 1/18
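Two of the closed-form medians in the list above, median = (1/2)1/α for Beta(''α'', 1) and median = 1 − (1/2)1/β for Beta(1, ''β''), are easy to confirm numerically (a sketch; the shape parameters are arbitrary):

 from scipy.stats import beta as beta_dist

 a, b = 3.7, 2.2  # arbitrary positive shape parameters
 print(beta_dist.ppf(0.5, a, 1), 0.5 ** (1 / a))      # Beta(a, 1): median = (1/2)**(1/a)
 print(beta_dist.ppf(0.5, 1, b), 1 - 0.5 ** (1 / b))  # Beta(1, b): median = 1 - (1/2)**(1/b)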


Related distributions


Transformations

* If ''X'' ~ Beta(''α'', ''β'') then 1 − ''X'' ~ Beta(''β'', ''α''), by mirror-image symmetry. * If ''X'' ~ Beta(''α'', ''β'') then \tfrac{X}{1-X} \sim {\beta'}(\alpha,\beta). The
beta prime distribution
, also called "beta distribution of the second kind". * If ''X'' ~ Beta(''α'', ''β'') then \tfrac -1 \sim (\beta,\alpha). * If ''X'' ~ Beta(''n''/2, ''m''/2) then \tfrac \sim F(n,m) (assuming ''n'' > 0 and ''m'' > 0), the F-distribution, Fisher–Snedecor F distribution. * If X \sim \operatorname\left(1+\lambda\tfrac, 1 + \lambda\tfrac\right) then min + ''X''(max − min) ~ PERT(min, max, ''m'', ''λ'') where ''PERT'' denotes a PERT distribution used in PERT analysis, and ''m''=most likely value.Herrerías-Velasco, José Manuel and Herrerías-Pleguezuelo, Rafael and René van Dorp, Johan. (2011). Revisiting the PERT mean and Variance. European Journal of Operational Research (210), p. 448–451. Traditionally ''λ'' = 4 in PERT analysis. * If ''X'' ~ Beta(1, ''β'') then ''X'' ~ Kumaraswamy distribution with parameters (1, ''β'') * If ''X'' ~ Beta(''α'', 1) then ''X'' ~ Kumaraswamy distribution with parameters (''α'', 1) * If ''X'' ~ Beta(''α'', 1) then −ln(''X'') ~ Exponential(''α'')


Special and limiting cases

* Beta(1, 1) ~ the uniform distribution U(0, 1). * Beta(''n'', 1) ~ maximum of ''n'' independent rvs. with uniform distribution U(0, 1), sometimes called ''a standard power function distribution'', with density ''n'' ''x''''n''−1 on that interval. * Beta(1, ''n'') ~ minimum of ''n'' independent rvs. with uniform distribution U(0, 1). * If ''X'' ~ Beta(3/2, 3/2) and ''r'' > 0 then 2''rX'' − ''r'' ~ Wigner semicircle distribution. * Beta(1/2, 1/2) is equivalent to the arcsine distribution. This distribution is also the Jeffreys prior probability for the
Bernoulli
and binomial distributions. The arcsine probability density is a distribution that appears in several random-walk fundamental theorems. In a fair coin toss
random walk
, the probability for the time of the last visit to the origin is distributed as a (U-shaped) arcsine distribution. In a two-player fair-coin-toss game, a player is said to be in the lead if the random walk (that started at the origin) is above the origin. The most probable number of times that a given player will be in the lead, in a game of length 2''N'', is not ''N''. On the contrary, ''N'' is the least likely number of times that the player will be in the lead. The most likely number of times in the lead is 0 or 2''N'' (following the arcsine distribution). * \lim_{n \to \infty} n \operatorname{Beta}(1,n) = \operatorname{Exponential}(1), the exponential distribution. * \lim_{n \to \infty} n \operatorname{Beta}(k,n) = \operatorname{Gamma}(k,1), the gamma distribution. * For large n, \operatorname{Beta}(\alpha n,\beta n) \to \mathcal{N}\left(\frac{\alpha}{\alpha+\beta},\frac{\alpha\beta}{(\alpha+\beta)^3}\frac{1}{n}\right), the normal distribution. More precisely, if X_n \sim \operatorname{Beta}(\alpha n,\beta n) then \sqrt{n}\left(X_n -\tfrac{\alpha}{\alpha+\beta}\right) converges in distribution to a normal distribution with mean 0 and variance \tfrac{\alpha\beta}{(\alpha+\beta)^3} as ''n'' increases.


Derived from other distributions

* The ''k''th order statistic of a sample of size ''n'' from the uniform distribution is a beta random variable, ''U''(''k'') ~ Beta(''k'', ''n''+1−''k''). * If ''X'' ~ Gamma(α, θ) and ''Y'' ~ Gamma(β, θ) are independent, then \tfrac{X}{X+Y} \sim \operatorname{Beta}(\alpha, \beta)\,. * If X \sim \chi^2(\alpha)\, and Y \sim \chi^2(\beta)\, are independent, then \tfrac{X}{X+Y} \sim \operatorname{Beta}(\tfrac{\alpha}{2}, \tfrac{\beta}{2}). * If ''X'' ~ U(0, 1) and ''α'' > 0 then ''X''1/''α'' ~ Beta(''α'', 1), the power function distribution. * If X \sim\operatorname{Bin}(k;n;p), then \sim \operatorname{Beta}(\alpha, \beta) for discrete values of ''n'' and ''k'' where \alpha=k+1 and \beta=n-k+1. * If ''X'' ~ Cauchy(0, 1) then \tfrac{1}{1+X^2} \sim \operatorname{Beta}\left(\tfrac12, \tfrac12\right)\,
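The gamma-ratio construction above (X/(X+Y) for independent gammas with a common scale) is a standard way of generating beta variates and can be checked by simulation (a sketch; parameters, sample size and seed are arbitrary):

 import numpy as np
 from scipy import stats

 rng = np.random.default_rng(1)
 a, b, theta = 2.0, 5.0, 3.0
 x = stats.gamma.rvs(a, scale=theta, size=100_000, random_state=rng)
 y = stats.gamma.rvs(b, scale=theta, size=100_000, random_state=rng)
 # X/(X+Y) ~ Beta(a, b), independently of the common scale theta
 print(stats.kstest(x / (x + y), 'beta', args=(a, b)))  # large p-value expected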


Combination with other distributions

* If ''X'' ~ Beta(''α'', ''β'') and ''Y'' ~ F(2''β'', 2''α'') then \Pr(X \leq \tfrac{\alpha}{\alpha+\beta x} ) = \Pr(Y \geq x)\, for all ''x'' > 0.


Compounding with other distributions

* If ''p'' ~ Beta(α, β) and ''X'' ~ Bin(''k'', ''p'') then ''X'' ~ beta-binomial distribution * If ''p'' ~ Beta(α, β) and ''X'' ~ NB(''r'', ''p'') then ''X'' ~ beta negative binomial distribution
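The beta-binomial compound can be checked against SciPy's scipy.stats.betabinom by simulation (a sketch; parameters, sample size and seed are arbitrary):

 import numpy as np
 from scipy import stats

 rng = np.random.default_rng(2)
 alpha, beta_, k = 2.0, 3.0, 10
 p = stats.beta.rvs(alpha, beta_, size=200_000, random_state=rng)
 x = rng.binomial(k, p)                    # X | p ~ Bin(k, p) with p ~ Beta(alpha, beta)
 pmf_mc = np.bincount(x, minlength=k + 1) / x.size
 pmf_bb = stats.betabinom.pmf(np.arange(k + 1), k, alpha, beta_)
 print(np.max(np.abs(pmf_mc - pmf_bb)))    # small Monte Carlo discrepancy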


Generalisations

* The generalization to multiple variables, i.e. a Dirichlet distribution, multivariate Beta distribution, is called a
Dirichlet distribution
. Univariate marginals of the Dirichlet distribution have a beta distribution. The beta distribution is Conjugate prior, conjugate to the binomial and Bernoulli distributions in exactly the same way as the
Dirichlet distribution
is conjugate to the multinomial distribution and categorical distribution. * The Pearson distribution#The Pearson type I distribution, Pearson type I distribution is identical to the beta distribution (except for arbitrary shifting and re-scaling that can also be accomplished with the four parameter parametrization of the beta distribution). * The beta distribution is the special case of the noncentral beta distribution where \lambda = 0: \operatorname(\alpha, \beta) = \operatorname(\alpha,\beta,0). * The generalized beta distribution is a five-parameter distribution family which has the beta distribution as a special case. * The matrix variate beta distribution is a distribution for positive-definite matrices.


Statistical inference


Parameter estimation


Method of moments


=Two unknown parameters

= Two unknown parameters (\hat{\alpha}, \hat{\beta}) of a beta distribution supported in the [0, 1] interval can be estimated, using the method of moments, with the first two moments (sample mean and sample variance) as follows. Let:
: \text{sample mean} =\bar{x} = \frac{1}{N}\sum_{i=1}^N X_i
be the sample mean estimate and
: \text{sample variance} =\bar{v} = \frac{1}{N-1}\sum_{i=1}^N (X_i - \bar{x})^2
be the sample variance estimate. The method-of-moments estimates of the parameters are
:\hat{\alpha} = \bar{x} \left(\frac{\bar{x}(1 - \bar{x})}{\bar{v}} - 1 \right), if \bar{v} <\bar{x}(1 - \bar{x}),
: \hat{\beta} = (1-\bar{x}) \left(\frac{\bar{x}(1 - \bar{x})}{\bar{v}} - 1 \right), if \bar{v}<\bar{x}(1 - \bar{x}).
When the distribution is required over a known interval other than
[0, 1]
with random variable ''X'', say [''a'', ''c''] with random variable ''Y'', then replace \bar{x} with \frac{\bar{y}-a}{c-a}, and \bar{v} with \frac{\bar{v}_Y}{(c-a)^2} in the above couple of equations for the shape parameters (see the "Alternative parametrizations, four parameters" section below), where:
: \text{sample mean} =\bar{y} = \frac{1}{N}\sum_{i=1}^N Y_i
: \text{sample variance} = \bar{v}_Y = \frac{1}{N-1}\sum_{i=1}^N (Y_i - \bar{y})^2
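A minimal sketch of these method-of-moments estimates for data already supported on [0, 1] (the true parameters, sample size and seed are arbitrary):

 import numpy as np
 from scipy import stats

 rng = np.random.default_rng(3)
 data = stats.beta.rvs(2.0, 5.0, size=50_000, random_state=rng)

 x_bar = data.mean()
 v_bar = data.var(ddof=1)
 assert v_bar < x_bar * (1 - x_bar)     # condition stated above
 common = x_bar * (1 - x_bar) / v_bar - 1
 alpha_hat = x_bar * common
 beta_hat = (1 - x_bar) * common
 print(alpha_hat, beta_hat)             # close to the true (2, 5)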


=Four unknown parameters

= All four parameters (\hat, \hat, \hat, \hat of a beta distribution supported in the [''a'', ''c''] interval -see section Beta distribution#Four parameters 2, "Alternative parametrizations, Four parameters"-) can be estimated, using the method of moments developed by Karl Pearson, by equating sample and population values of the first four central moments (mean, variance, skewness and excess kurtosis). The excess kurtosis was expressed in terms of the square of the skewness, and the sample size ν = α + β, (see previous section Beta distribution#Kurtosis, "Kurtosis") as follows: :\text =\frac\left(\frac (\text)^2 - 1\right)\text^2-2< \text< \tfrac (\text)^2 One can use this equation to solve for the sample size ν= α + β in terms of the square of the skewness and the excess kurtosis as follows: :\hat = \hat + \hat = 3\frac :\text^2-2< \text< \tfrac (\text)^2 This is the ratio (multiplied by a factor of 3) between the previously derived limit boundaries for the beta distribution in a space (as originally done by Karl Pearson) defined with coordinates of the square of the skewness in one axis and the excess kurtosis in the other axis (see ): The case of zero skewness, can be immediately solved because for zero skewness, α = β and hence ν = 2α = 2β, therefore α = β = ν/2 : \hat = \hat = \frac= \frac : \text= 0 \text -2<\text<0 (Excess kurtosis is negative for the beta distribution with zero skewness, ranging from -2 to 0, so that \hat -and therefore the sample shape parameters- is positive, ranging from zero when the shape parameters approach zero and the excess kurtosis approaches -2, to infinity when the shape parameters approach infinity and the excess kurtosis approaches zero). For non-zero sample skewness one needs to solve a system of two coupled equations. Since the skewness and the excess kurtosis are independent of the parameters \hat, \hat, the parameters \hat, \hat can be uniquely determined from the sample skewness and the sample excess kurtosis, by solving the coupled equations with two known variables (sample skewness and sample excess kurtosis) and two unknowns (the shape parameters): :(\text)^2 = \frac :\text =\frac\left(\frac (\text)^2 - 1\right) :\text^2-2< \text< \tfrac(\text)^2 resulting in the following solution: : \hat, \hat = \frac \left (1 \pm \frac \right ) : \text\neq 0 \text (\text)^2-2< \text< \tfrac (\text)^2 Where one should take the solutions as follows: \hat>\hat for (negative) sample skewness < 0, and \hat<\hat for (positive) sample skewness > 0. The accompanying plot shows these two solutions as surfaces in a space with horizontal axes of (sample excess kurtosis) and (sample squared skewness) and the shape parameters as the vertical axis. The surfaces are constrained by the condition that the sample excess kurtosis must be bounded by the sample squared skewness as stipulated in the above equation. The two surfaces meet at the right edge defined by zero skewness. Along this right edge, both parameters are equal and the distribution is symmetric U-shaped for α = β < 1, uniform for α = β = 1, upside-down-U-shaped for 1 < α = β < 2 and bell-shaped for α = β > 2. The surfaces also meet at the front (lower) edge defined by "the impossible boundary" line (excess kurtosis + 2 - skewness2 = 0). 
Along this front (lower) boundary both shape parameters approach zero, and the probability density is concentrated more at one end than the other end (with practically nothing in between), with probabilities p=\tfrac at the left end ''x'' = 0 and q = 1-p = \tfrac at the right end ''x'' = 1. The two surfaces become further apart towards the rear edge. At this rear edge the surface parameters are quite different from each other. As remarked, for example, by Bowman and Shenton, sampling in the neighborhood of the line (sample excess kurtosis - (3/2)(sample skewness)2 = 0) (the just-J-shaped portion of the rear edge where blue meets beige), "is dangerously near to chaos", because at that line the denominator of the expression above for the estimate ν = α + β becomes zero and hence ν approaches infinity as that line is approached. Bowman and Shenton write that "the higher moment parameters (kurtosis and skewness) are extremely fragile (near that line). However, the mean and standard deviation are fairly reliable." Therefore, the problem is for the case of four parameter estimation for very skewed distributions such that the excess kurtosis approaches (3/2) times the square of the skewness. This boundary line is produced by extremely skewed distributions with very large values of one of the parameters and very small values of the other parameter. See for a numerical example and further comments about this rear edge boundary line (sample excess kurtosis - (3/2)(sample skewness)2 = 0). As remarked by Karl Pearson himself this issue may not be of much practical importance as this trouble arises only for very skewed J-shaped (or mirror-image J-shaped) distributions with very different values of shape parameters that are unlikely to occur much in practice). The usual skewed-bell-shape distributions that occur in practice do not have this parameter estimation problem. The remaining two parameters \hat, \hat can be determined using the sample mean and the sample variance using a variety of equations. One alternative is to calculate the support interval range (\hat-\hat) based on the sample variance and the sample kurtosis. For this purpose one can solve, in terms of the range (\hat- \hat), the equation expressing the excess kurtosis in terms of the sample variance, and the sample size ν (see and ): :\text =\frac\bigg(\frac - 6 - 5 \hat \bigg) to obtain: : (\hat- \hat) = \sqrt\sqrt Another alternative is to calculate the support interval range (\hat-\hat) based on the sample variance and the sample skewness. For this purpose one can solve, in terms of the range (\hat-\hat), the equation expressing the squared skewness in terms of the sample variance, and the sample size ν (see section titled "Skewness" and "Alternative parametrizations, four parameters"): :(\text)^2 = \frac\bigg(\frac-4(1+\hat)\bigg) to obtain: : (\hat- \hat) = \frac\sqrt The remaining parameter can be determined from the sample mean and the previously obtained parameters: (\hat-\hat), \hat, \hat = \hat+\hat: : \hat = (\text) - \left(\frac\right)(\hat-\hat) and finally, \hat= (\hat- \hat) + \hat . In the above formulas one may take, for example, as estimates of the sample moments: :\begin \text &=\overline = \frac\sum_^N Y_i \\ \text &= \overline_Y = \frac\sum_^N (Y_i - \overline)^2 \\ \text &= G_1 = \frac \frac \\ \text &= G_2 = \frac \frac - \frac \end The estimators ''G''1 for skewness, sample skewness and ''G''2 for kurtosis, sample kurtosis are used by DAP (software), DAP/SAS System, SAS, PSPP/SPSS, and Microsoft Excel, Excel. 
However, they are not used by BMDP and (according to ) they were not used by MINITAB in 1998. Actually, Joanes and Gill in their 1998 study concluded that the skewness and kurtosis estimators used in BMDP and in MINITAB (at that time) had smaller variance and mean-squared error in normal samples, but the skewness and kurtosis estimators used in DAP (software), DAP/SAS System, SAS, PSPP/SPSS, namely ''G''1 and ''G''2, had smaller mean-squared error in samples from a very skewed distribution. It is for this reason that we have spelled out "sample skewness", etc., in the above formulas, to make it explicit that the user should choose the best estimator according to the problem at hand, as the best estimator for skewness and kurtosis depends on the amount of skewness (as shown by Joanes and Gill).
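A sketch of the first step of this four-parameter procedure, recovering ν = α + β and then the two shape parameters from the sample skewness and sample excess kurtosis (the closed forms below restate the relations given above; the bias-corrected SciPy estimators are used, and the true parameters are arbitrary):

 import numpy as np
 from scipy import stats

 rng = np.random.default_rng(4)
 y = stats.beta.rvs(2.0, 6.0, loc=-1.0, scale=3.0, size=200_000, random_state=rng)

 s = stats.skew(y, bias=False)                    # sample skewness
 k = stats.kurtosis(y, fisher=True, bias=False)   # sample excess kurtosis

 nu = 3 * (k - s**2 + 2) / (1.5 * s**2 - k)       # estimate of alpha + beta
 d = 1 / np.sqrt(1 + 16 * (nu + 1) / ((nu + 2)**2 * s**2))
 small, large = nu / 2 * (1 - d), nu / 2 * (1 + d)
 # positive sample skewness -> alpha < beta; negative -> alpha > beta
 alpha_hat, beta_hat = (small, large) if s > 0 else (large, small)
 print(nu, alpha_hat, beta_hat)                   # close to 8, 2 and 6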


Maximum likelihood


=Two unknown parameters

= As is also the case for maximum likelihood estimates for the gamma distribution, the maximum likelihood estimates for the beta distribution do not have a general closed form solution for arbitrary values of the shape parameters. If ''X''1, ..., ''XN'' are independent random variables each having a beta distribution, the joint log likelihood function for ''N'' independent and identically distributed random variables, iid observations is: :\begin \ln\, \mathcal (\alpha, \beta\mid X) &= \sum_^N \ln \left (\mathcal_i (\alpha, \beta\mid X_i) \right )\\ &= \sum_^N \ln \left (f(X_i;\alpha,\beta) \right ) \\ &= \sum_^N \ln \left (\frac \right ) \\ &= (\alpha - 1)\sum_^N \ln (X_i) + (\beta- 1)\sum_^N \ln (1-X_i) - N \ln \Beta(\alpha,\beta) \end Finding the maximum with respect to a shape parameter involves taking the partial derivative with respect to the shape parameter and setting the expression equal to zero yielding the maximum likelihood estimator of the shape parameters: :\frac = \sum_^N \ln X_i -N\frac=0 :\frac = \sum_^N \ln (1-X_i)- N\frac=0 where: :\frac = -\frac+ \frac+ \frac=-\psi(\alpha + \beta) + \psi(\alpha) + 0 :\frac= - \frac+ \frac + \frac=-\psi(\alpha + \beta) + 0 + \psi(\beta) since the
digamma function
denoted ψ(α) is defined as the logarithmic derivative of the
gamma function
: :\psi(\alpha) =\frac To ensure that the values with zero tangent slope are indeed a maximum (instead of a saddle-point or a minimum) one has to also satisfy the condition that the curvature is negative. This amounts to satisfying that the second partial derivative with respect to the shape parameters is negative :\frac= -N\frac<0 :\frac = -N\frac<0 using the previous equations, this is equivalent to: :\frac = \psi_1(\alpha)-\psi_1(\alpha + \beta) > 0 :\frac = \psi_1(\beta) -\psi_1(\alpha + \beta) > 0 where the
trigamma function
, denoted ''ψ''1(''α''), is the second of the
polygamma function
s, and is defined as the derivative of the digamma function: :\psi_1(\alpha) = \frac=\, \frac. These conditions are equivalent to stating that the variances of the logarithmically transformed variables are positive, since: :\operatorname[\ln (X)] = \operatorname[\ln^2 (X)] - (\operatorname[\ln (X)])^2 = \psi_1(\alpha) - \psi_1(\alpha + \beta) :\operatorname ln (1-X)= \operatorname[\ln^2 (1-X)] - (\operatorname[\ln (1-X)])^2 = \psi_1(\beta) - \psi_1(\alpha + \beta) Therefore, the condition of negative curvature at a maximum is equivalent to the statements: : \operatorname[\ln (X)] > 0 : \operatorname ln (1-X)> 0 Alternatively, the condition of negative curvature at a maximum is also equivalent to stating that the following logarithmic derivatives of the geometric means ''GX'' and ''G(1−X)'' are positive, since: : \psi_1(\alpha) - \psi_1(\alpha + \beta) = \frac > 0 : \psi_1(\beta) - \psi_1(\alpha + \beta) = \frac > 0 While these slopes are indeed positive, the other slopes are negative: :\frac, \frac < 0. The slopes of the mean and the median with respect to ''α'' and ''β'' display similar sign behavior. From the condition that at a maximum, the partial derivative with respect to the shape parameter equals zero, we obtain the following system of coupled maximum likelihood estimate equations (for the average log-likelihoods) that needs to be inverted to obtain the (unknown) shape parameter estimates \hat,\hat in terms of the (known) average of logarithms of the samples ''X''1, ..., ''XN'': :\begin \hat[\ln (X)] &= \psi(\hat) - \psi(\hat + \hat)=\frac\sum_^N \ln X_i = \ln \hat_X \\ \hat[\ln(1-X)] &= \psi(\hat) - \psi(\hat + \hat)=\frac\sum_^N \ln (1-X_i)= \ln \hat_ \end where we recognize \log \hat_X as the logarithm of the sample geometric mean and \log \hat_ as the logarithm of the sample geometric mean based on (1 − ''X''), the mirror-image of ''X''. For \hat=\hat, it follows that \hat_X=\hat_ . :\begin \hat_X &= \prod_^N (X_i)^ \\ \hat_ &= \prod_^N (1-X_i)^ \end These coupled equations containing
digamma function
s of the shape parameter estimates \hat,\hat must be solved by numerical methods as done, for example, by Beckman et al. Gnanadesikan et al. give numerical solutions for a few cases. Norman Lloyd Johnson, N.L.Johnson and Samuel Kotz, S.Kotz suggest that for "not too small" shape parameter estimates \hat,\hat, the logarithmic approximation to the digamma function \psi(\hat) \approx \ln(\hat-\tfrac) may be used to obtain initial values for an iterative solution, since the equations resulting from this approximation can be solved exactly: :\ln \frac \approx \ln \hat_X :\ln \frac\approx \ln \hat_ which leads to the following solution for the initial values (of the estimate shape parameters in terms of the sample geometric means) for an iterative solution: :\hat\approx \tfrac + \frac \text \hat >1 :\hat\approx \tfrac + \frac \text \hat > 1 Alternatively, the estimates provided by the method of moments can instead be used as initial values for an iterative solution of the maximum likelihood coupled equations in terms of the digamma functions. When the distribution is required over a known interval other than
[0, 1]
with random variable ''X'', say [''a'', ''c''] with random variable ''Y'', then replace ln(''Xi'') in the first equation with :\ln \frac, and replace ln(1−''Xi'') in the second equation with :\ln \frac (see "Alternative parametrizations, four parameters" section below). If one of the shape parameters is known, the problem is considerably simplified. The following logit transformation can be used to solve for the unknown shape parameter (for skewed cases such that \hat\neq\hat, otherwise, if symmetric, both -equal- parameters are known when one is known): :\hat \left[\ln \left(\frac \right) \right]=\psi(\hat) - \psi(\hat)=\frac\sum_^N \ln\frac = \ln \hat_X - \ln \left(\hat_\right) This logit transformation is the logarithm of the transformation that divides the variable ''X'' by its mirror-image (''X''/(1 - ''X'') resulting in the "inverted beta distribution" or
beta prime distribution
(also known as beta distribution of the second kind or Pearson distribution, Pearson's Type VI) with support [0, +∞). As previously discussed in the section "Moments of logarithmically transformed random variables," the logit transformation \ln\frac, studied by Johnson, extends the finite support
[0, 1]
based on the original variable ''X'' to infinite support in both directions of the real line (−∞, +∞). If, for example, \hat is known, the unknown parameter \hat can be obtained in terms of the inverse digamma function of the right hand side of this equation: :\psi(\hat)=\frac\sum_^N \ln\frac + \psi(\hat) :\hat=\psi^(\ln \hat_X - \ln \hat_ + \psi(\hat)) In particular, if one of the shape parameters has a value of unity, for example for \hat = 1 (the power function distribution with bounded support [0,1]), using the identity ψ(''x'' + 1) = ψ(''x'') + 1/''x'' in the equation \psi(\hat) - \psi(\hat + \hat)= \ln \hat_X, the maximum likelihood estimator for the unknown parameter \hat is, exactly: :\hat= - \frac= - \frac The beta has support [0, 1], therefore \hat_X < 1, and hence (-\ln \hat_X) >0, and therefore \hat >0. In conclusion, the maximum likelihood estimates of the shape parameters of a beta distribution are (in general) a complicated function of the sample geometric mean, and of the sample geometric mean based on ''(1−X)'', the mirror-image of ''X''. One may ask, if the variance (in addition to the mean) is necessary to estimate two shape parameters with the method of moments, why is the (logarithmic or geometric) variance not necessary to estimate two shape parameters with the maximum likelihood method, for which only the geometric means suffice? The answer is because the mean does not provide as much information as the geometric mean. For a beta distribution with equal shape parameters ''α'' = ''β'', the mean is exactly 1/2, regardless of the value of the shape parameters, and therefore regardless of the value of the statistical dispersion (the variance). On the other hand, the geometric mean of a beta distribution with equal shape parameters ''α'' = ''β'', depends on the value of the shape parameters, and therefore it contains more information. Also, the geometric mean of a beta distribution does not satisfy the symmetry conditions satisfied by the mean, therefore, by employing both the geometric mean based on ''X'' and geometric mean based on (1 − ''X''), the maximum likelihood method is able to provide best estimates for both parameters ''α'' = ''β'', without need of employing the variance. One can express the joint log likelihood per ''N'' independent and identically distributed random variables, iid observations in terms of the ''sufficient statistics'' (the sample geometric means) as follows: :\frac = (\alpha - 1)\ln \hat_X + (\beta- 1)\ln \hat_- \ln \Beta(\alpha,\beta). We can plot the joint log likelihood per ''N'' observations for fixed values of the sample geometric means to see the behavior of the likelihood function as a function of the shape parameters α and β. In such a plot, the shape parameter estimators \hat,\hat correspond to the maxima of the likelihood function. See the accompanying graph that shows that all the likelihood functions intersect at α = β = 1, which corresponds to the values of the shape parameters that give the maximum entropy (the maximum entropy occurs for shape parameters equal to unity: the uniform distribution). It is evident from the plot that the likelihood function gives sharp peaks for values of the shape parameter estimators close to zero, but that for values of the shape parameters estimators greater than one, the likelihood function becomes quite flat, with less defined peaks. 
Obviously, the maximum likelihood parameter estimation method for the beta distribution becomes less acceptable for larger values of the shape parameter estimators, as the uncertainty in the peak definition increases with the value of the shape parameter estimators. One can arrive at the same conclusion by noticing that the expression for the curvature of the likelihood function is in terms of the geometric variances :\frac= -\operatorname ln X/math> :\frac = -\operatorname[\ln (1-X)] These variances (and therefore the curvatures) are much larger for small values of the shape parameter α and β. However, for shape parameter values α, β > 1, the variances (and therefore the curvatures) flatten out. Equivalently, this result follows from the Cramér–Rao bound, since the
Fisher information
matrix components for the beta distribution are these logarithmic variances. The Cramér–Rao bound states that the
variance
of any ''unbiased'' estimator \hat of α is bounded by the multiplicative inverse, reciprocal of the
Fisher information
: :\mathrm(\hat)\geq\frac\geq\frac :\mathrm(\hat) \geq\frac\geq\frac so the variance of the estimators increases with increasing α and β, as the logarithmic variances decrease. Also one can express the joint log likelihood per ''N'' independent and identically distributed random variables, iid observations in terms of the
digamma function
expressions for the logarithms of the sample geometric means as follows: :\frac = (\alpha - 1)(\psi(\hat) - \psi(\hat + \hat))+(\beta- 1)(\psi(\hat) - \psi(\hat + \hat))- \ln \Beta(\alpha,\beta) this expression is identical to the negative of the cross-entropy (see section on "Quantities of information (entropy)"). Therefore, finding the maximum of the joint log likelihood of the shape parameters, per ''N'' independent and identically distributed random variables, iid observations, is identical to finding the minimum of the cross-entropy for the beta distribution, as a function of the shape parameters. :\frac = - H = -h - D_ = -\ln\Beta(\alpha,\beta)+(\alpha-1)\psi(\hat)+(\beta-1)\psi(\hat)-(\alpha+\beta-2)\psi(\hat+\hat) with the cross-entropy defined as follows: :H = \int_^1 - f(X;\hat,\hat) \ln (f(X;\alpha,\beta)) \, X
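A sketch of the two-parameter maximum-likelihood fit described above: the log geometric means are the sufficient statistics, the coupled digamma equations are solved numerically, and the Johnson–Kotz logarithmic approximation supplies the starting values (scipy.optimize.fsolve is one reasonable choice of root finder; the data are simulated with arbitrary true parameters):

 import numpy as np
 from scipy import stats, optimize
 from scipy.special import psi

 rng = np.random.default_rng(5)
 x = stats.beta.rvs(2.0, 5.0, size=50_000, random_state=rng)

 ln_gx = np.mean(np.log(x))      # log of the sample geometric mean of X
 ln_g1x = np.mean(np.log1p(-x))  # log of the sample geometric mean of 1 - X

 def equations(params):
     a, b = params
     return (psi(a) - psi(a + b) - ln_gx,
             psi(b) - psi(a + b) - ln_g1x)

 # initial values from the approximation psi(z) ~ ln(z - 1/2)
 gx, g1x = np.exp(ln_gx), np.exp(ln_g1x)
 a0 = 0.5 + 0.5 * gx / (1 - gx - g1x)
 b0 = 0.5 + 0.5 * g1x / (1 - gx - g1x)

 alpha_hat, beta_hat = optimize.fsolve(equations, (a0, b0))
 print(alpha_hat, beta_hat)      # close to the true (2, 5)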


=Four unknown parameters

= The procedure is similar to the one followed in the two unknown parameter case. If ''Y''1, ..., ''YN'' are independent random variables each having a beta distribution with four parameters, the joint log likelihood function for ''N'' independent and identically distributed random variables, iid observations is: :\begin \ln\, \mathcal (\alpha, \beta, a, c\mid Y) &= \sum_^N \ln\,\mathcal_i (\alpha, \beta, a, c\mid Y_i)\\ &= \sum_^N \ln\,f(Y_i; \alpha, \beta, a, c) \\ &= \sum_^N \ln\,\frac\\ &= (\alpha - 1)\sum_^N \ln (Y_i - a) + (\beta- 1)\sum_^N \ln (c - Y_i)- N \ln \Beta(\alpha,\beta) - N (\alpha+\beta - 1) \ln (c - a) \end Finding the maximum with respect to a shape parameter involves taking the partial derivative with respect to the shape parameter and setting the expression equal to zero yielding the maximum likelihood estimator of the shape parameters: :\frac= \sum_^N \ln (Y_i - a) - N(-\psi(\alpha + \beta) + \psi(\alpha))- N \ln (c - a)= 0 :\frac = \sum_^N \ln (c - Y_i) - N(-\psi(\alpha + \beta) + \psi(\beta))- N \ln (c - a)= 0 :\frac = -(\alpha - 1) \sum_^N \frac \,+ N (\alpha+\beta - 1)\frac= 0 :\frac = (\beta- 1) \sum_^N \frac \,- N (\alpha+\beta - 1) \frac = 0 these equations can be re-arranged as the following system of four coupled equations (the first two equations are geometric means and the second two equations are the harmonic means) in terms of the maximum likelihood estimates for the four parameters \hat, \hat, \hat, \hat: :\frac\sum_^N \ln \frac = \psi(\hat)-\psi(\hat +\hat )= \ln \hat_X :\frac\sum_^N \ln \frac = \psi(\hat)-\psi(\hat + \hat)= \ln \hat_ :\frac = \frac= \hat_X :\frac = \frac = \hat_ with sample geometric means: :\hat_X = \prod_^ \left (\frac \right )^ :\hat_ = \prod_^ \left (\frac \right )^ The parameters \hat, \hat are embedded inside the geometric mean expressions in a nonlinear way (to the power 1/''N''). This precludes, in general, a closed form solution, even for an initial value approximation for iteration purposes. One alternative is to use as initial values for iteration the values obtained from the method of moments solution for the four parameter case. Furthermore, the expressions for the harmonic means are well-defined only for \hat, \hat > 1, which precludes a maximum likelihood solution for shape parameters less than unity in the four-parameter case. Fisher's information matrix for the four parameter case is Positive-definite matrix, positive-definite only for α, β > 2 (for further discussion, see section on Fisher information matrix, four parameter case), for bell-shaped (symmetric or unsymmetric) beta distributions, with inflection points located to either side of the mode. The following Fisher information components (that represent the expectations of the curvature of the log likelihood function) have mathematical singularity, singularities at the following values: :\alpha = 2: \quad \operatorname \left [- \frac \frac \right ]= _ :\beta = 2: \quad \operatorname\left [- \frac \frac \right ] = _ :\alpha = 2: \quad \operatorname\left [- \frac\frac\right ] = _ :\beta = 1: \quad \operatorname\left [- \frac\frac \right ] = _ (for further discussion see section on Fisher information matrix). Thus, it is not possible to strictly carry on the maximum likelihood estimation for some well known distributions belonging to the four-parameter beta distribution family, like the continuous uniform distribution, uniform distribution (Beta(1, 1, ''a'', ''c'')), and the arcsine distribution (Beta(1/2, 1/2, ''a'', ''c'')). 
Norman Lloyd Johnson, N.L.Johnson and Samuel Kotz, S.Kotz ignore the equations for the harmonic means and instead suggest "If a and c are unknown, and maximum likelihood estimators of ''a'', ''c'', α and β are required, the above procedure (for the two unknown parameter case, with ''X'' transformed as ''X'' = (''Y'' − ''a'')/(''c'' − ''a'')) can be repeated using a succession of trial values of ''a'' and ''c'', until the pair (''a'', ''c'') for which maximum likelihood (given ''a'' and ''c'') is as great as possible, is attained" (where, for the purpose of clarity, their notation for the parameters has been translated into the present notation).


Fisher information matrix

Let a random variable X have a probability density ''f''(''x'';''α''). The partial derivative with respect to the (unknown, and to be estimated) parameter α of the log
likelihood function
is called the score (statistics), score. The second moment of the score is called the
Fisher information
: :\mathcal(\alpha)=\operatorname \left [\left (\frac \ln \mathcal(\alpha\mid X) \right )^2 \right], The expected value, expectation of the score (statistics), score is zero, therefore the Fisher information is also the second moment centered on the mean of the score: the
variance
of the score. If the log
likelihood function
is twice differentiable with respect to the parameter α, and under certain regularity conditions, then the Fisher information may also be written as follows (which is often a more convenient form for calculation purposes): :\mathcal(\alpha) = - \operatorname \left [\frac \ln (\mathcal(\alpha\mid X)) \right]. Thus, the Fisher information is the negative of the expectation of the second derivative with respect to the parameter α of the log
likelihood function
. Therefore, Fisher information is a measure of the curvature of the log likelihood function of α. A low curvature (and therefore high Radius of curvature (mathematics), radius of curvature), flatter log likelihood function curve has low Fisher information; while a log likelihood function curve with large curvature (and therefore low Radius of curvature (mathematics), radius of curvature) has high Fisher information. When the Fisher information matrix is computed at the evaluates of the parameters ("the observed Fisher information matrix") it is equivalent to the replacement of the true log likelihood surface by a Taylor's series approximation, taken as far as the quadratic terms. The word information, in the context of Fisher information, refers to information about the parameters. Information such as: estimation, sufficiency and properties of variances of estimators. The Cramér–Rao bound states that the inverse of the Fisher information is a lower bound on the variance of any
estimator In statistics, an estimator is a rule for calculating an estimate of a given quantity based on observed data: thus the rule (the estimator), the quantity of interest (the estimand) and its result (the estimate) are distinguished. For example, the ...
of a parameter α: :\operatorname[\hat\alpha] \geq \frac. The precision to which one can estimate the estimator of a parameter α is limited by the Fisher Information of the log likelihood function. The Fisher information is a measure of the minimum error involved in estimating a parameter of a distribution and it can be viewed as a measure of the resolving power of an experiment needed to discriminate between two alternative hypothesis of a parameter. When there are ''N'' parameters : \begin \theta_1 \\ \theta_ \\ \dots \\ \theta_ \end, then the Fisher information takes the form of an ''N''×''N'' positive semidefinite matrix, positive semidefinite symmetric matrix, the Fisher Information Matrix, with typical element: :_=\operatorname \left [\left (\frac \ln \mathcal \right) \left(\frac \ln \mathcal \right) \right ]. Under certain regularity conditions, the Fisher Information Matrix may also be written in the following form, which is often more convenient for computation: :_ = - \operatorname \left [\frac \ln (\mathcal) \right ]\,. With ''X''1, ..., ''XN'' iid random variables, an ''N''-dimensional "box" can be constructed with sides ''X''1, ..., ''XN''. Costa and Cover show that the (Shannon) differential entropy ''h''(''X'') is related to the volume of the typical set (having the sample entropy close to the true entropy), while the Fisher information is related to the surface of this typical set.
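As a numerical illustration of the two equivalent expressions for the Fisher information (a minimal sketch, not from the original article; it assumes NumPy is available and uses the simple Bernoulli(''p'') model as an example), the variance of the score and the expected negative second derivative of the log likelihood agree and equal 1/(''p''(1 − ''p'')):

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
p = 0.3
x = rng.binomial(1, p, size=200_000)

# score = d/dp log L(p | x) for a single Bernoulli observation
score = x / p - (1 - x) / (1 - p)
# minus the second derivative of the log likelihood
neg_hessian = x / p**2 + (1 - x) / (1 - p)**2

# all three numbers should be close to 1/(p(1-p)) = 4.76...
print(np.var(score), np.mean(neg_hessian), 1 / (p * (1 - p)))
</syntaxhighlight>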


Two parameters

For ''X''1, ..., ''X''''N'' independent random variables each having a beta distribution parametrized with shape parameters ''α'' and ''β'', the joint log likelihood function for ''N'' independent and identically distributed (iid) observations is:

:\ln (\mathcal{L} (\alpha, \beta\mid X) )= (\alpha - 1)\sum_{i=1}^N \ln X_i + (\beta- 1)\sum_{i=1}^N \ln (1-X_i)- N \ln \Beta(\alpha,\beta)

therefore the joint log likelihood function per ''N'' iid observations is:

:\frac{1}{N} \ln(\mathcal{L} (\alpha, \beta\mid X)) = (\alpha - 1)\frac{1}{N}\sum_{i=1}^N \ln X_i + (\beta- 1)\frac{1}{N}\sum_{i=1}^N \ln (1-X_i)- \ln \Beta(\alpha,\beta)

For the two parameter case, the Fisher information has 4 components: 2 diagonal and 2 off-diagonal. Since the Fisher information matrix is symmetric, only one of these off-diagonal components is independent. Therefore, the Fisher information matrix has 3 independent components (2 diagonal and 1 off-diagonal).

Aryal and Nadarajah calculated Fisher's information matrix for the four-parameter case, from which the two parameter case can be obtained as follows:

:- \frac{1}{N}\frac{\partial^2\ln \mathcal{L}}{\partial \alpha^2}= \operatorname{var}[\ln X]= \psi_1(\alpha) - \psi_1(\alpha + \beta) =\mathcal{I}_{\alpha,\alpha}= \operatorname{E}\left[- \frac{1}{N}\frac{\partial^2\ln \mathcal{L}}{\partial \alpha^2} \right] = \ln \operatorname{var}_{GX}

:- \frac{1}{N}\frac{\partial^2\ln \mathcal{L}}{\partial \beta^2} = \operatorname{var}[\ln (1-X)]= \psi_1(\beta) - \psi_1(\alpha + \beta) =\mathcal{I}_{\beta,\beta}= \operatorname{E}\left[- \frac{1}{N}\frac{\partial^2\ln \mathcal{L}}{\partial \beta^2} \right]= \ln \operatorname{var}_{G(1-X)}

:- \frac{1}{N}\frac{\partial^2\ln \mathcal{L}}{\partial \alpha \, \partial \beta} = \operatorname{cov}[\ln X,\ln(1-X)] = -\psi_1(\alpha+\beta) =\mathcal{I}_{\alpha,\beta}= \operatorname{E}\left[- \frac{1}{N}\frac{\partial^2\ln \mathcal{L}}{\partial \alpha \, \partial \beta} \right] = \ln \operatorname{cov}_{G X,(1-X)}

Since the Fisher information matrix is symmetric:

: \mathcal{I}_{\alpha,\beta}= \mathcal{I}_{\beta,\alpha}= \ln \operatorname{cov}_{G X,(1-X)}

The Fisher information components are equal to the log geometric variances and log geometric covariance. Therefore, they can be expressed as trigamma functions, denoted ψ1(α), the second of the polygamma functions, defined as the derivative of the digamma function:

:\psi_1(\alpha) = \frac{d^2\ln\Gamma(\alpha)}{d\alpha^2}= \frac{d\,\psi(\alpha)}{d\alpha}.

These derivatives are also derived in the section on maximum likelihood estimation with two unknown parameters, and plots of the log likelihood function are also shown in that section. The section on the geometric variance and covariance contains plots and further discussion of the Fisher information matrix components (the log geometric variances and log geometric covariance) as a function of the shape parameters α and β, and the section on moments of logarithmically transformed random variables contains formulas for those moments; images for the Fisher information components \mathcal{I}_{\alpha,\alpha}, \mathcal{I}_{\beta,\beta} and \mathcal{I}_{\alpha,\beta} are shown there as well.

The determinant of Fisher's information matrix is of interest (for example, for the calculation of Jeffreys prior probability). From the expressions for the individual components of the Fisher information matrix, it follows that the determinant of Fisher's (symmetric) information matrix for the beta distribution is:

:\begin{align}
\det(\mathcal{I}(\alpha, \beta))&= \mathcal{I}_{\alpha,\alpha} \mathcal{I}_{\beta,\beta}-\mathcal{I}_{\alpha,\beta} \mathcal{I}_{\beta,\alpha} \\
&=(\psi_1(\alpha) - \psi_1(\alpha + \beta))(\psi_1(\beta) - \psi_1(\alpha + \beta))-( -\psi_1(\alpha+\beta))( -\psi_1(\alpha+\beta))\\
&= \psi_1(\alpha)\psi_1(\beta)-( \psi_1(\alpha)+\psi_1(\beta))\psi_1(\alpha + \beta)\\
\lim_{\alpha\to 0} \det(\mathcal{I}(\alpha, \beta)) &=\lim_{\beta \to 0} \det(\mathcal{I}(\alpha, \beta)) = \infty\\
\lim_{\alpha\to \infty} \det(\mathcal{I}(\alpha, \beta)) &=\lim_{\beta \to \infty} \det(\mathcal{I}(\alpha, \beta)) = 0
\end{align}

From Sylvester's criterion (checking whether the diagonal elements are all positive), it follows that the Fisher information matrix for the two parameter case is positive-definite (under the standard condition that the shape parameters are positive: ''α'' > 0 and ''β'' > 0).
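The two-parameter Fisher information matrix is straightforward to evaluate numerically from the trigamma expressions above. The following minimal sketch (not part of the original text; it assumes NumPy and SciPy are available and uses illustrative shape parameters) builds the matrix and its determinant:

<syntaxhighlight lang="python">
import numpy as np
from scipy.special import polygamma

def beta_fisher_information(alpha, beta):
    """Per-observation Fisher information matrix of Beta(alpha, beta)."""
    psi1 = lambda z: polygamma(1, z)          # trigamma function psi_1
    i_aa = psi1(alpha) - psi1(alpha + beta)   # = var[ln X]
    i_bb = psi1(beta) - psi1(alpha + beta)    # = var[ln(1 - X)]
    i_ab = -psi1(alpha + beta)                # = cov[ln X, ln(1 - X)]
    return np.array([[i_aa, i_ab], [i_ab, i_bb]])

fim = beta_fisher_information(2.0, 3.0)       # illustrative shape parameters
print(fim)
print("determinant:", np.linalg.det(fim))     # positive for alpha, beta > 0
</syntaxhighlight>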


Four parameters

If ''Y''1, ..., ''Y''''N'' are independent random variables each having a beta distribution with four parameters: the exponents ''α'' and ''β'', and also ''a'' (the minimum of the distribution range) and ''c'' (the maximum of the distribution range) (see the section titled "Alternative parametrizations", "Four parameters"), with probability density function:

:f(y; \alpha, \beta, a, c) = \frac{f(x;\alpha,\beta)}{c-a} =\frac{ \left(\frac{y-a}{c-a}\right)^{\alpha-1} \left(\frac{c-y}{c-a}\right)^{\beta-1} }{(c-a)\Beta(\alpha,\beta)}=\frac{ (y-a)^{\alpha-1} (c-y)^{\beta-1} }{(c-a)^{\alpha+\beta-1}\Beta(\alpha,\beta)},

the joint log likelihood function per ''N'' iid observations is:

:\frac{1}{N} \ln(\mathcal{L} (\alpha, \beta, a, c\mid Y))= \frac{\alpha -1}{N}\sum_{i=1}^N \ln (Y_i - a) + \frac{\beta-1}{N}\sum_{i=1}^N \ln (c - Y_i)- \ln \Beta(\alpha,\beta) - (\alpha+\beta -1) \ln (c-a)

For the four parameter case, the Fisher information has 4×4 = 16 components. It has 12 off-diagonal components (16 total − 4 diagonal). Since the Fisher information matrix is symmetric, half of these components (12/2 = 6) are independent. Therefore, the Fisher information matrix has 6 independent off-diagonal + 4 diagonal = 10 independent components. Aryal and Nadarajah calculated Fisher's information matrix for the four parameter case as follows:

:- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial \alpha^2}= \operatorname{var}[\ln X]= \psi_1(\alpha) - \psi_1(\alpha + \beta) = \mathcal{I}_{\alpha,\alpha}= \operatorname{E}\left[- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial \alpha^2} \right] = \ln (\operatorname{var}_{GX})

:-\frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial \beta^2} = \operatorname{var}[\ln (1-X)]= \psi_1(\beta) - \psi_1(\alpha + \beta) =\mathcal{I}_{\beta,\beta}= \operatorname{E}\left[- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial \beta^2} \right] = \ln(\operatorname{var}_{G(1-X)})

:-\frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial \alpha\,\partial \beta} = \operatorname{cov}[\ln X,\ln(1-X)] = -\psi_1(\alpha+\beta) =\mathcal{I}_{\alpha,\beta}= \operatorname{E}\left[- \frac{1}{N}\frac{\partial^2\ln \mathcal{L}}{\partial \alpha\,\partial \beta} \right] = \ln(\operatorname{cov}_{G X,(1-X)})

In the above expressions, the use of ''X'' instead of ''Y'' in the expressions var[ln(''X'')] = ln(var''GX'') is ''not an error''. The expressions in terms of the log geometric variances and log geometric covariance occur as functions of the two parameter ''X'' ~ Beta(''α'', ''β'') parametrization because when taking the partial derivatives with respect to the exponents (''α'', ''β'') in the four parameter case, one obtains the identical expressions as for the two parameter case: these terms of the four parameter Fisher information matrix are independent of the minimum ''a'' and maximum ''c'' of the distribution's range. The only non-zero term upon double differentiation of the log likelihood function with respect to the exponents ''α'' and ''β'' is the second derivative of the log of the beta function: ln(B(''α'', ''β'')). This term is independent of the minimum ''a'' and maximum ''c'' of the distribution's range. Double differentiation of this term results in trigamma functions. The sections titled "Maximum likelihood", "Two unknown parameters" and "Four unknown parameters" also show this fact. The Fisher information for ''N'' i.i.d. samples is ''N'' times the individual Fisher information (eq. 11.279, page 394 of Cover and Thomas). (Aryal and Nadarajah take a single observation, ''N'' = 1, to calculate the following components of the Fisher information, which leads to the same result as considering the derivatives of the log likelihood per ''N'' observations. Moreover, below, an expression that is erroneous in Aryal and Nadarajah has been corrected.)
:\begin{align}
\alpha > 2: \quad \operatorname{E}\left[- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial a^2} \right] &= \mathcal{I}_{a,a}=\frac{\beta(\alpha+\beta-1)}{(\alpha-2)(c-a)^2} \\
\beta > 2: \quad \operatorname{E}\left[-\frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial c^2} \right] &= \mathcal{I}_{c,c} = \frac{\alpha(\alpha+\beta-1)}{(\beta-2)(c-a)^2} \\
\operatorname{E}\left[- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial a\,\partial c} \right] &= \mathcal{I}_{a,c} = \frac{\alpha+\beta-1}{(c-a)^2} \\
\alpha > 1: \quad \operatorname{E}\left[- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial \alpha\,\partial a} \right] &=\mathcal{I}_{\alpha,a} = \frac{\beta}{(\alpha-1)(c-a)} \\
\operatorname{E}\left[- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial \alpha\,\partial c} \right] &= \mathcal{I}_{\alpha,c} = \frac{1}{c-a} \\
\operatorname{E}\left[- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial \beta\,\partial a} \right] &= \mathcal{I}_{\beta,a} = -\frac{1}{c-a} \\
\beta > 1: \quad \operatorname{E}\left[- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial \beta\,\partial c} \right] &= \mathcal{I}_{\beta,c} = -\frac{\alpha}{(\beta-1)(c-a)}
\end{align}

The lower two diagonal entries of the Fisher information matrix, with respect to the parameter ''a'' (the minimum of the distribution's range), \mathcal{I}_{a,a}, and with respect to the parameter ''c'' (the maximum of the distribution's range), \mathcal{I}_{c,c}, are only defined for exponents α > 2 and β > 2 respectively. The Fisher information matrix component \mathcal{I}_{a,a} for the minimum ''a'' approaches infinity for exponent α approaching 2 from above, and the Fisher information matrix component \mathcal{I}_{c,c} for the maximum ''c'' approaches infinity for exponent β approaching 2 from above. The Fisher information matrix for the four parameter case does not depend on the individual values of the minimum ''a'' and the maximum ''c'', but only on the total range (''c''−''a''). Moreover, the components of the Fisher information matrix that depend on the range (''c''−''a'') depend on it only through its inverse (or the square of the inverse), so that the Fisher information decreases with increasing range (''c''−''a''). The accompanying images show these Fisher information components as functions of the shape parameters; all of them look like a basin, with the "walls" of the basin located at low values of the parameters. The following four-parameter-beta-distribution Fisher information components can be expressed in terms of the two-parameter ''X'' ~ Beta(α, β) expectations of the transformed ratio ((1−''X'')/''X'') and of its mirror image (''X''/(1−''X'')), scaled by the range (''c''−''a''), which may be helpful for interpretation:

:\mathcal{I}_{\alpha, a} =\frac{\operatorname{E}\left[\frac{1-X}{X}\right]}{c-a}= \frac{\beta}{(\alpha-1)(c-a)} \text{ if } \alpha > 1

:\mathcal{I}_{\beta, c} = -\frac{\operatorname{E}\left[\frac{X}{1-X}\right]}{c-a}=- \frac{\alpha}{(\beta-1)(c-a)} \text{ if } \beta> 1

These are also the expected values of the "inverted beta distribution" or
beta prime distribution
(also known as beta distribution of the second kind or Pearson's Type VI) and its mirror image, scaled by the range (''c'' − ''a''). Also, the following Fisher information components can be expressed in terms of the harmonic (1/''X'') variances or of variances based on the ratio transformed variables ((1−''X'')/''X'') as follows:

:\begin{align}
\alpha > 2: \quad \mathcal{I}_{a,a} &=\operatorname{var}\left[\frac{1}{X}\right] \left(\frac{\alpha-1}{c-a}\right)^2 =\operatorname{var}\left[\frac{1-X}{X}\right] \left(\frac{\alpha-1}{c-a}\right)^2 = \frac{\beta(\alpha+\beta-1)}{(\alpha-2)(c-a)^2} \\
\beta > 2: \quad \mathcal{I}_{c,c} &= \operatorname{var}\left[\frac{1}{1-X}\right] \left(\frac{\beta-1}{c-a}\right)^2 = \operatorname{var}\left[\frac{X}{1-X}\right] \left(\frac{\beta-1}{c-a}\right)^2 =\frac{\alpha(\alpha+\beta-1)}{(\beta-2)(c-a)^2} \\
\mathcal{I}_{a,c} &=-\operatorname{cov}\left[\frac{1}{X},\frac{1}{1-X}\right]\frac{(\alpha-1)(\beta-1)}{(c-a)^2} = -\operatorname{cov}\left[\frac{1-X}{X},\frac{X}{1-X}\right] \frac{(\alpha-1)(\beta-1)}{(c-a)^2} =\frac{\alpha+\beta-1}{(c-a)^2}
\end{align}

See the section "Moments of linearly transformed, product and inverted random variables" for these expectations. The determinant of Fisher's information matrix is of interest (for example, for the calculation of Jeffreys prior probability). From the expressions for the individual components, it follows that the determinant of Fisher's (symmetric) information matrix for the beta distribution with four parameters, det(\mathcal{I}(\alpha,\beta,a,c)), has a lengthy closed-form expansion as a sum of products of the ten independent components listed above, and it is finite only for ''α'', ''β'' > 2. Using Sylvester's criterion (checking whether the diagonal elements are all positive), and since the diagonal components \mathcal{I}_{a,a} and \mathcal{I}_{c,c} have singularities at α = 2 and β = 2, it follows that the Fisher information matrix for the four parameter case is positive-definite for α > 2 and β > 2. Since for α > 2 and β > 2 the beta distribution is (symmetric or unsymmetric) bell shaped, it follows that the Fisher information matrix is positive-definite only for bell-shaped (symmetric or unsymmetric) beta distributions, with inflection points located to either side of the mode. Thus, important well known distributions belonging to the four-parameter beta distribution family, like the parabolic distribution (Beta(2,2,''a'',''c'')) and the continuous uniform distribution (Beta(1,1,''a'',''c'')), have Fisher information components (such as \mathcal{I}_{a,a} and \mathcal{I}_{c,c}) that blow up (approach infinity) in the four-parameter case (although their Fisher information components are all defined for the two parameter case). The four-parameter Wigner semicircle distribution (Beta(3/2,3/2,''a'',''c'')) and arcsine distribution (Beta(1/2,1/2,''a'',''c'')) have negative Fisher information determinants for the four-parameter case.
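The component \mathcal{I}_{a,a} above can be checked numerically against the variance re-expression. The following sketch (not part of the original text; the parameter values are illustrative and it assumes NumPy and SciPy are available) compares the closed form with a Monte Carlo estimate of var[1/''X'']·((α−1)/(''c''−''a''))²:

<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import beta

alpha, beta_, a, c = 5.0, 4.0, 2.0, 12.0      # illustrative values, with alpha > 2

# closed form from the table above
i_aa_closed = beta_ * (alpha + beta_ - 1) / ((alpha - 2) * (c - a) ** 2)

# Monte Carlo version: var[1/X] * ((alpha - 1)/(c - a))^2 with X ~ Beta(alpha, beta)
x = beta(alpha, beta_).rvs(size=2_000_000, random_state=1)
i_aa_mc = np.var(1.0 / x) * ((alpha - 1) / (c - a)) ** 2

print(i_aa_closed, i_aa_mc)                   # the two should agree closely
</syntaxhighlight>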


Bayesian inference

The use of beta distributions in Bayesian inference stems from the fact that they provide a family of conjugate prior probability distributions for binomial (including
Bernoulli
) and geometric distributions. The domain of the beta distribution can be viewed as a probability, and in fact the beta distribution is often used to describe the distribution of a probability value ''p'':

:P(p;\alpha,\beta) = \frac{p^{\alpha-1}(1-p)^{\beta-1}}{\Beta(\alpha,\beta)}.

Examples of beta distributions used as prior probabilities to represent ignorance of prior parameter values in Bayesian inference are Beta(1,1), Beta(0,0) and Beta(1/2,1/2).
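The conjugacy means that a Beta(α, β) prior on ''p'', combined with ''s'' successes and ''f'' failures in Bernoulli or binomial sampling, yields a Beta(α + ''s'', β + ''f'') posterior. A minimal sketch (not from the original text; the prior and data values are illustrative and SciPy is assumed available):

<syntaxhighlight lang="python">
from scipy.stats import beta

alpha_prior, beta_prior = 2.0, 2.0   # illustrative prior, not taken from the text
s, f = 7, 3                          # observed successes and failures

posterior = beta(alpha_prior + s, beta_prior + f)   # conjugate update
print("posterior mean:", posterior.mean())
print("95% credible interval:", posterior.interval(0.95))
</syntaxhighlight>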


Rule of succession

A classic application of the beta distribution is the rule of succession, introduced in the 18th century by Pierre-Simon Laplace in the course of treating the sunrise problem. It states that, given ''s'' successes in ''n'' conditionally independent Bernoulli trials with probability ''p'', the estimate of the expected value in the next trial is \frac{s+1}{n+2}. This estimate is the expected value of the posterior distribution over ''p'', namely Beta(''s''+1, ''n''−''s''+1), which is given by Bayes' rule if one assumes a uniform prior probability over ''p'' (i.e., Beta(1, 1)) and then observes that ''p'' generated ''s'' successes in ''n'' trials. Laplace's rule of succession has been criticized by prominent scientists. R. T. Cox described Laplace's application of the rule of succession to the sunrise problem (p. 89) as "a travesty of the proper use of the principle." Keynes remarks (Ch. XXX, p. 382) "indeed this is so foolish a theorem that to entertain it is discreditable." Karl Pearson showed that the probability that the next (''n'' + 1) trials will be successes, after ''n'' successes in ''n'' trials, is only 50%, which has been considered too low by scientists like Jeffreys and unacceptable as a representation of the scientific process of experimentation to test a proposed scientific law. As pointed out by Jeffreys (p. 128) (crediting C. D. Broad), Laplace's rule of succession establishes a high probability of success ((''n''+1)/(''n''+2)) in the next trial, but only a moderate probability (50%) that a further sample (''n''+1) comparable in size will be equally successful. As pointed out by Perks, "The rule of succession itself is hard to accept. It assigns a probability to the next trial which implies the assumption that the actual run observed is an average run and that we are always at the end of an average run. It would, one would think, be more reasonable to assume that we were in the middle of an average run. Clearly a higher value for both probabilities is necessary if they are to accord with reasonable belief." These problems with Laplace's rule of succession motivated Haldane, Perks, Jeffreys and others to search for other forms of prior probability (see the following sections). According to Jaynes, the main problem with the rule of succession is that it is not valid when ''s'' = 0 or ''s'' = ''n'' (see the article on the rule of succession for an analysis of its validity).
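A worked check of the formula (''s''+1)/(''n''+2), as a minimal sketch not present in the original text, evaluating sunrise-problem style runs where every one of the ''n'' trials succeeded:

<syntaxhighlight lang="python">
def rule_of_succession(s, n):
    """Posterior mean of Beta(s + 1, n - s + 1), i.e. Laplace's estimate."""
    return (s + 1) / (n + 2)

# every one of the n trials was a success (s = n)
for n in (1, 10, 100):
    print(n, rule_of_succession(n, n))   # 0.667, 0.917, 0.990
</syntaxhighlight>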


Bayes-Laplace prior probability (Beta(1,1))

The beta distribution achieves maximum differential entropy for Beta(1,1): the uniform probability density, for which all values in the domain of the distribution have equal density. This uniform distribution Beta(1,1) was suggested ("with a great deal of doubt") by Thomas Bayes as the prior probability distribution to express ignorance about the correct prior distribution. This prior distribution was adopted (apparently, from his writings, with little sign of doubt) by Pierre-Simon Laplace, and hence it was also known as the "Bayes-Laplace rule" or the "Laplace rule" of "inverse probability" in publications of the first half of the 20th century. In the later part of the 19th century and early part of the 20th century, scientists realized that the assumption of uniform "equal" probability density depended on the actual functions (for example whether a linear or a logarithmic scale was most appropriate) and parametrizations used. In particular, the behavior near the ends of distributions with finite support (for example near ''x'' = 0, for a distribution with initial support at ''x'' = 0) required particular attention. Keynes (Ch. XXX, p. 381) criticized the use of Bayes's uniform prior probability (Beta(1,1)) that all values between zero and one are equiprobable, as follows: "Thus experience, if it shows anything, shows that there is a very marked clustering of statistical ratios in the neighborhoods of zero and unity, of those for positive theories and for correlations between positive qualities in the neighborhood of zero, and of those for negative theories and for correlations between negative qualities in the neighborhood of unity."


Haldane's prior probability (Beta(0,0))

The Beta(0,0) distribution was proposed by J.B.S. Haldane, who suggested that the prior probability representing complete uncertainty should be proportional to ''p''−1(1−''p'')−1. The function ''p''−1(1−''p'')−1 can be viewed as the limit of the numerator of the beta distribution as both shape parameters approach zero: α, β → 0. The Beta function (in the denominator of the beta distribution) approaches infinity, for both parameters approaching zero, α, β → 0. Therefore, ''p''−1(1−''p'')−1 divided by the Beta function approaches a 2-point
Bernoulli distribution
with equal probability 1/2 at each end, at 0 and 1, and nothing in between, as α, β → 0: a coin-toss, with one face of the coin at 0 and the other face at 1. The Haldane prior probability distribution Beta(0,0) is an "improper prior" because its integration (from 0 to 1) fails to strictly converge to 1 due to the singularities at each end. However, this is not an issue for computing posterior probabilities unless the sample size is very small. Furthermore, Zellner points out that on the log-odds scale (the logit transformation ln(''p''/(1−''p''))), the Haldane prior is the uniformly flat prior. The fact that a uniform prior probability on the logit transformed variable ln(''p''/(1−''p'')) (with domain (-∞, ∞)) is equivalent to the Haldane prior on the domain
[0, 1]
was pointed out by Harold Jeffreys in the first edition (1939) of his book Theory of Probability (p. 123). Jeffreys writes "Certainly if we take the Bayes-Laplace rule right up to the extremes we are led to results that do not correspond to anybody's way of thinking. The (Haldane) rule d''x''/(''x''(1−''x'')) goes too far the other way. It would lead to the conclusion that if a sample is of one type with respect to some property there is a probability 1 that the whole population is of that type." The fact that "uniform" depends on the parametrization led Jeffreys to seek a form of prior that would be invariant under different parametrizations.
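Zellner's observation can be illustrated numerically. The following minimal sketch (not from the original text; it assumes NumPy and uses a wide uniform range as a proxy for an improper flat prior on the log-odds) pushes a flat log-odds prior back to the probability scale and shows that the mass piles up near ''p'' = 0 and ''p'' = 1, mimicking the Haldane prior:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)
theta = rng.uniform(-20, 20, size=100_000)   # proxy for an (improper) flat prior on log-odds
p = 1.0 / (1.0 + np.exp(-theta))             # back-transform to the probability scale

counts, _ = np.histogram(p, bins=[0.0, 0.05, 0.95, 1.0])
print(counts / counts.sum())                 # almost all mass near p = 0 and p = 1
</syntaxhighlight>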


Jeffreys' prior probability (Beta(1/2,1/2) for a Bernoulli or for a binomial distribution)

Harold Jeffreys proposed to use an
uninformative prior
probability measure that should be invariant under reparameterization: proportional to the square root of the determinant of Fisher's information matrix. For the
Bernoulli distribution
, this can be shown as follows: for a coin that is "heads" with probability ''p'' ∈
[0, 1]
and is "tails" with probability 1 − ''p'', for a given (H,T) ∈ the probability is ''pH''(1 − ''p'')''T''. Since ''T'' = 1 − ''H'', the
Bernoulli distribution
is ''p''^''H''(1 − ''p'')^(1 − ''H''). Considering ''p'' as the only parameter, it follows that the log likelihood for the Bernoulli distribution is

:\ln \mathcal{L} (p\mid H) = H \ln(p)+ (1-H) \ln(1-p).

The Fisher information matrix has only one component (it is a scalar, because there is only one parameter: ''p''), therefore:

:\begin{align}
\sqrt{\mathcal{I}(p)} &= \sqrt{\operatorname{E}\left[\left(\frac{d}{dp} \ln \mathcal{L}(p\mid H)\right)^2\right]} \\
&= \sqrt{\operatorname{E}\left[\left(\frac{H}{p} - \frac{1-H}{1-p}\right)^2\right]} \\
&= \sqrt{\frac{1}{p}+\frac{1}{1-p}} \\
&= \frac{1}{\sqrt{p(1-p)}}.
\end{align}

Similarly, for the binomial distribution with ''n'' Bernoulli trials, it can be shown that

:\sqrt{\mathcal{I}(p)}= \frac{\sqrt{n}}{\sqrt{p(1-p)}}.

Thus, for the
Bernoulli
 and binomial distributions, Jeffreys prior is proportional to \scriptstyle \frac{1}{\sqrt{p(1-p)}}, which happens to be proportional to a beta distribution with domain variable ''x'' = ''p'' and shape parameters α = β = 1/2, the arcsine distribution:

:\operatorname{Beta}(\tfrac{1}{2}, \tfrac{1}{2}) = \frac{1}{\pi \sqrt{p(1-p)}}.

It will be shown in the next section that the normalizing constant for Jeffreys prior is immaterial to the final result because the normalizing constant cancels out in Bayes theorem for the posterior probability. Hence Beta(1/2,1/2) is used as the Jeffreys prior for both Bernoulli and binomial distributions. As shown in the next section, when using this expression as a prior probability times the likelihood in Bayes theorem, the posterior probability turns out to be a beta distribution. It is important to realize, however, that Jeffreys prior is proportional to \scriptstyle \frac{1}{\sqrt{p(1-p)}} for the Bernoulli and binomial distribution, but not for the beta distribution. Jeffreys prior for the beta distribution is given by the determinant of Fisher's information for the beta distribution, which, as shown in the preceding section, is a function of the
trigamma function
ψ1 of shape parameters α and β as follows:

:\begin{align}
\sqrt{\det(\mathcal{I}(\alpha, \beta))} &= \sqrt{\psi_1(\alpha)\psi_1(\beta)-(\psi_1(\alpha)+\psi_1(\beta))\psi_1(\alpha + \beta)} \\
\lim_{\alpha\to 0} \sqrt{\det(\mathcal{I}(\alpha, \beta))} &=\lim_{\beta \to 0} \sqrt{\det(\mathcal{I}(\alpha, \beta))} = \infty\\
\lim_{\alpha\to \infty} \sqrt{\det(\mathcal{I}(\alpha, \beta))} &=\lim_{\beta \to \infty} \sqrt{\det(\mathcal{I}(\alpha, \beta))} = 0
\end{align}

As previously discussed, Jeffreys prior for the Bernoulli and binomial distributions is proportional to the arcsine distribution Beta(1/2,1/2), a one-dimensional ''curve'' that looks like a basin as a function of the parameter ''p'' of the Bernoulli and binomial distributions. The walls of the basin are formed by ''p'' approaching the singularities at the ends ''p'' → 0 and ''p'' → 1, where Beta(1/2,1/2) approaches infinity. Jeffreys prior for the beta distribution is a ''2-dimensional surface'' (embedded in a three-dimensional space) that looks like a basin with only two of its walls meeting at the corner α = β = 0 (and missing the other two walls) as a function of the shape parameters α and β of the beta distribution. The two adjoining walls of this 2-dimensional surface are formed by the shape parameters α and β approaching the singularities (of the trigamma function) at α, β → 0. It has no walls for α, β → ∞ because in this case the determinant of Fisher's information matrix for the beta distribution approaches zero. It will be shown in the next section that Jeffreys prior probability results in posterior probabilities (when multiplied by the binomial likelihood function) that are intermediate between the posterior probability results of the Haldane and Bayes prior probabilities. Jeffreys prior may be difficult to obtain analytically, and for some cases it just does not exist (even for simple distribution functions like the asymmetric triangular distribution). Berger, Bernardo and Sun, in a 2009 paper, defined a reference prior probability distribution that (unlike Jeffreys prior) exists for the asymmetric triangular distribution. They cannot obtain a closed-form expression for their reference prior, but numerical calculations show it to be nearly perfectly fitted by the (proper) prior

: \operatorname{Beta}(\tfrac{1}{2}, \tfrac{1}{2}) \sim\frac{1}{\pi\sqrt{\theta(1-\theta)}}

where θ is the vertex variable for the asymmetric triangular distribution with support
[0, 1]
(corresponding to the following parameter values in Wikipedia's article on the triangular distribution: vertex ''c'' = ''θ'', left end ''a'' = 0, and right end ''b'' = 1). Berger et al. also give a heuristic argument that Beta(1/2,1/2) could indeed be the exact Berger–Bernardo–Sun reference prior for the asymmetric triangular distribution. Therefore, Beta(1/2,1/2) not only is Jeffreys prior for the Bernoulli and binomial distributions, but also seems to be the Berger–Bernardo–Sun reference prior for the asymmetric triangular distribution (for which the Jeffreys prior does not exist), a distribution used in project management and PERT analysis to describe the cost and duration of project tasks. Clarke and Barron prove that, among continuous positive priors, Jeffreys prior (when it exists) asymptotically maximizes Shannon's mutual information between a sample of size n and the parameter, and therefore ''Jeffreys prior is the most uninformative prior'' (measuring information as Shannon information). The proof rests on an examination of the Kullback–Leibler divergence between probability density functions for iid random variables.
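The proportionality between 1/\sqrt{p(1-p)} and the Beta(1/2,1/2) density is easy to verify numerically. A minimal sketch (not part of the original text; it assumes NumPy and SciPy are available):

<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import beta

p = np.linspace(0.01, 0.99, 5)
sqrt_fisher = 1.0 / np.sqrt(p * (1.0 - p))   # sqrt of the Bernoulli Fisher information
arcsine_pdf = beta(0.5, 0.5).pdf(p)          # Beta(1/2, 1/2) density

print(arcsine_pdf / sqrt_fisher)             # constant ratio 1/pi = 0.3183... everywhere
</syntaxhighlight>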


Effect of different prior probability choices on the posterior beta distribution

If samples are drawn from the population of a random variable ''X'' that result in ''s'' successes and ''f'' failures in ''n'' Bernoulli trials (''n'' = ''s'' + ''f''), then the
likelihood function
for parameters ''s'' and ''f'' given ''x'' = ''p'' (the notation ''x'' = ''p'' in the expressions below emphasizes that the domain ''x'' stands for the value of the parameter ''p'' in the binomial distribution), is the following binomial distribution:

:\mathcal{L}(s,f\mid x=p) = x^s(1-x)^f = x^s(1-x)^{n-s}.

If beliefs about prior probability information are reasonably well approximated by a beta distribution with parameters ''α''Prior and ''β''Prior, then:

:\operatorname{prior}(x=p;\alpha \operatorname{Prior},\beta \operatorname{Prior}) = \frac{x^{\alpha \operatorname{Prior}-1}(1-x)^{\beta \operatorname{Prior}-1}}{\Beta(\alpha \operatorname{Prior},\beta \operatorname{Prior})}

According to Bayes' theorem for a continuous event space, the posterior probability is given by the product of the prior probability and the likelihood function (given the evidence ''s'' and ''f'' = ''n'' − ''s''), normalized so that the area under the curve equals one, as follows:

:\begin{align}
& \operatorname{posterior}(x=p\mid s,n-s) \\
= {} & \frac{\operatorname{prior}(x=p;\alpha \operatorname{Prior},\beta \operatorname{Prior}) \, \mathcal{L}(s,f \mid x=p)}{\int_0^1 \operatorname{prior}(x=p;\alpha \operatorname{Prior},\beta \operatorname{Prior}) \, \mathcal{L}(s,f \mid x=p) \, dx} \\
= {} & \frac{x^{s+\alpha \operatorname{Prior}-1}(1-x)^{n-s+\beta \operatorname{Prior}-1} / \Beta(\alpha \operatorname{Prior},\beta \operatorname{Prior})}{\int_0^1 \left(x^{s+\alpha \operatorname{Prior}-1}(1-x)^{n-s+\beta \operatorname{Prior}-1} / \Beta(\alpha \operatorname{Prior},\beta \operatorname{Prior})\right) dx} \\
= {} & \frac{x^{s+\alpha \operatorname{Prior}-1}(1-x)^{n-s+\beta \operatorname{Prior}-1}}{\int_0^1 x^{s+\alpha \operatorname{Prior}-1}(1-x)^{n-s+\beta \operatorname{Prior}-1}\, dx} \\
= {} & \frac{x^{s+\alpha \operatorname{Prior}-1}(1-x)^{n-s+\beta \operatorname{Prior}-1}}{\Beta(s+\alpha \operatorname{Prior},n-s+\beta \operatorname{Prior})}.
\end{align}

The binomial coefficient

:{n \choose s}=\frac{n!}{s!(n-s)!}=\frac{\Gamma(n+1)}{\Gamma(s+1)\Gamma(n-s+1)}
appears both in the numerator and the denominator of the posterior probability, and it does not depend on the integration variable ''x'', hence it cancels out, and it is irrelevant to the final result. Similarly the normalizing factor for the prior probability, the beta function B(''α''Prior, ''β''Prior), cancels out and is immaterial to the final result. The same posterior probability result can be obtained if one uses an un-normalized prior

:x^{\alpha \operatorname{Prior}-1}(1-x)^{\beta \operatorname{Prior}-1}

because the normalizing factors all cancel out. Several authors (including Jeffreys himself) thus use an un-normalized prior formula since the normalization constant cancels out. The numerator of the posterior probability ends up being just the (un-normalized) product of the prior probability and the likelihood function, and the denominator is its integral from zero to one. The beta function in the denominator, B(''s'' + ''α''Prior, ''n'' − ''s'' + ''β''Prior), appears as a normalization constant to ensure that the total posterior probability integrates to unity. The ratio ''s''/''n'' of the number of successes to the total number of trials is a sufficient statistic in the binomial case, which is relevant for the following results. For the Bayes prior probability (Beta(1,1)), the posterior probability is:

:\operatorname{posterior}(p=x\mid s,f) = \frac{x^s(1-x)^{n-s}}{\Beta(s+1,n-s+1)}, \text{ with mean} =\frac{s+1}{n+2},\text{ and mode }=\frac{s}{n}\text{ (if } 0 < s < n).

For the Jeffreys prior probability (Beta(1/2,1/2)), the posterior probability is:

:\operatorname{posterior}(p=x\mid s,f) = \frac{x^{s-1/2}(1-x)^{n-s-1/2}}{\Beta(s+\frac{1}{2},n-s+\frac{1}{2})},\text{ with mean} = \frac{s+\frac{1}{2}}{n+1},\text{ and mode }=\frac{s-\frac{1}{2}}{n-1}\text{ (if } \tfrac{1}{2} < s < n-\tfrac{1}{2}).

and for the Haldane prior probability (Beta(0,0)), the posterior probability is:

:\operatorname{posterior}(p=x\mid s,f) = \frac{x^{s-1}(1-x)^{n-s-1}}{\Beta(s,n-s)}, \text{ with mean} = \frac{s}{n},\text{ and mode }=\frac{s-1}{n-2}\text{ (if } 1 < s < n -1).

From the above expressions it follows that for ''s''/''n'' = 1/2 all the above three prior probabilities result in the identical location for the posterior probability mean = mode = 1/2. For ''s''/''n'' < 1/2, the means of the posterior probabilities, using these priors, are such that: mean for Bayes prior > mean for Jeffreys prior > mean for Haldane prior. For ''s''/''n'' > 1/2 the order of these inequalities is reversed, such that the Haldane prior probability results in the largest posterior mean. The ''Haldane'' prior probability Beta(0,0) results in a posterior probability density with ''mean'' (the expected value for the probability of success in the "next" trial) identical to the ratio ''s''/''n'' of the number of successes to the total number of trials. Therefore, the Haldane prior results in a posterior probability with expected value in the next trial equal to the maximum likelihood. The ''Bayes'' prior probability Beta(1,1) results in a posterior probability density with ''mode'' identical to the ratio ''s''/''n'' (the maximum likelihood). In the case that 100% of the trials have been successful (''s'' = ''n''), the ''Bayes'' prior probability Beta(1,1) results in a posterior expected value equal to the rule of succession (''n'' + 1)/(''n'' + 2), while the Haldane prior Beta(0,0) results in a posterior expected value of 1 (absolute certainty of success in the next trial). Jeffreys prior probability results in a posterior expected value equal to (''n'' + 1/2)/(''n'' + 1). Perks (p. 303) points out: "This provides a new rule of succession and expresses a 'reasonable' position to take up, namely, that after an unbroken run of n successes we assume a probability for the next trial equivalent to the assumption that we are about half-way through an average run, i.e. that we expect a failure once in (2''n'' + 2) trials.
The Bayes–Laplace rule implies that we are about at the end of an average run or that we expect a failure once in (''n'' + 2) trials. The comparison clearly favours the new result (what is now called Jeffreys prior) from the point of view of 'reasonableness'." Conversely, in the case that 100% of the trials have resulted in failure (''s'' = 0), the ''Bayes'' prior probability Beta(1,1) results in a posterior expected value for success in the next trial equal to 1/(''n'' + 2), while the Haldane prior Beta(0,0) results in a posterior expected value of success in the next trial of 0 (absolute certainty of failure in the next trial). Jeffreys prior probability results in a posterior expected value for success in the next trial equal to (1/2)/(''n'' + 1), which Perks (p. 303) points out "is a much more reasonably remote result than the Bayes-Laplace result 1/(''n'' + 2)". Jaynes questions (for the uniform prior Beta(1,1)) the use of these formulas for the cases ''s'' = 0 or ''s'' = ''n'' because the integrals do not converge (Beta(1,1) is an improper prior for ''s'' = 0 or ''s'' = ''n''). In practice, the conditions 0 < ''s'' < ''n'' required for a mode to exist between both ends are usually met. Regarding the probability that a further run of trials, comparable in size, will also consist entirely of successes (after ''n'' successes in ''n'' trials), Perks (p. 303) shows that, for what is now known as the Jeffreys prior, this probability is ((''n'' + 1/2)/(''n'' + 1))((''n'' + 3/2)/(''n'' + 2))...(2''n'' + 1/2)/(2''n'' + 1), which for ''n'' = 1, 2, 3 gives 15/24, 315/480, 9009/13440; rapidly approaching a limiting value of 1/\sqrt{2} = 0.70710678\ldots as ''n'' tends to infinity. Perks remarks that what is now known as the Jeffreys prior: "is clearly more 'reasonable' than either the Bayes-Laplace result or the result on the (Haldane) alternative rule rejected by Jeffreys which gives certainty as the probability. It clearly provides a very much better correspondence with the process of induction. Whether it is 'absolutely' reasonable for the purpose, i.e. whether it is yet large enough, without the absurdity of reaching unity, is a matter for others to decide. But it must be realized that the result depends on the assumption of complete indifference and absence of knowledge prior to the sampling experiment." Following are the variances of the posterior distribution obtained with these three prior probability distributions. For the Bayes prior probability (Beta(1,1)), the posterior variance is:

:\text{variance} = \frac{(s+1)(n-s+1)}{(n+2)^2(n+3)},\text{ which for } s=\frac{n}{2} \text{ gives variance} =\frac{1}{4(n+3)}

for the Jeffreys prior probability (Beta(1/2,1/2)), the posterior variance is:

: \text{variance} = \frac{(s+\frac{1}{2})(n-s+\frac{1}{2})}{(n+1)^2(n+2)} ,\text{ which for } s=\frac n 2 \text{ gives variance} = \frac{1}{4(n+2)}

and for the Haldane prior probability (Beta(0,0)), the posterior variance is:

:\text{variance} = \frac{s(n-s)}{n^2(n+1)}, \text{ which for } s=\frac{n}{2}\text{ gives variance} =\frac{1}{4(n+1)}

So, as remarked by Silvey, for large ''n'', the variance is small and hence the posterior distribution is highly concentrated, whereas the assumed prior distribution was very diffuse. This is in accord with what one would hope for, as vague prior knowledge is transformed (through Bayes theorem) into more precise posterior knowledge by an informative experiment. For small ''n'' the Haldane Beta(0,0) prior results in the largest posterior variance while the Bayes Beta(1,1) prior results in the more concentrated posterior. The Jeffreys prior Beta(1/2,1/2) results in a posterior variance in between the other two. As ''n'' increases, the variance rapidly decreases so that the posterior variance for all three priors converges to approximately the same value (approaching zero variance as ''n'' → ∞).
Recalling the previous result that the ''Haldane'' prior probability Beta(0,0) results in a posterior probability density with ''mean'' (the expected value for the probability of success in the "next" trial) identical to the ratio ''s''/''n'' of the number of successes to the total number of trials, it follows from the above expression that the ''Haldane'' prior Beta(0,0) also results in a posterior with ''variance'' identical to the variance expressed in terms of the maximum likelihood estimate ''s''/''n'' and the sample size (in the parametrization in terms of mean and sample size):

:\text{variance} = \frac{\mu(1-\mu)}{1+\nu}= \frac{\frac{s}{n}\left(1 - \frac{s}{n}\right)}{1+n}

with the mean ''μ'' = ''s''/''n'' and the sample size ''ν'' = ''n''. In Bayesian inference, using a prior distribution Beta(''α''Prior, ''β''Prior) prior to a binomial distribution is equivalent to adding (''α''Prior − 1) pseudo-observations of "success" and (''β''Prior − 1) pseudo-observations of "failure" to the actual number of successes and failures observed, then estimating the parameter ''p'' of the binomial distribution by the proportion of successes over both real and pseudo-observations. A uniform prior Beta(1,1) does not add (or subtract) any pseudo-observations since for Beta(1,1) it follows that (''α''Prior − 1) = 0 and (''β''Prior − 1) = 0. The Haldane prior Beta(0,0) subtracts one pseudo-observation from each and the Jeffreys prior Beta(1/2,1/2) subtracts 1/2 pseudo-observation of success and an equal number of failure. This subtraction has the effect of smoothing out the posterior distribution. If the proportion of successes is not 50% (''s''/''n'' ≠ 1/2), values of ''α''Prior and ''β''Prior less than 1 (and therefore negative (''α''Prior − 1) and (''β''Prior − 1)) favor sparsity, i.e. distributions where the parameter ''p'' is closer to either 0 or 1. In effect, values of ''α''Prior and ''β''Prior between 0 and 1, when operating together, function as a concentration parameter. The accompanying plots show the posterior probability density functions for a range of sample sizes ''n'', numbers of successes ''s'', and choices of the prior Beta(''α''Prior, ''β''Prior), including the three priors discussed above. The first plot shows the symmetric cases, with mean = mode = 1/2, and the second plot shows the skewed cases. The images show that there is little difference between the priors for the posterior with a sample size of 50 (characterized by a more pronounced peak near ''p'' = 1/2). Significant differences appear for very small sample sizes (in particular for the flatter distribution for the degenerate case of sample size = 3). Therefore, the skewed cases show a larger effect from the choice of prior, at small sample size, than the symmetric cases. For symmetric distributions, the Bayes prior Beta(1,1) results in the most "peaky" and highest posterior distributions and the Haldane prior Beta(0,0) results in the flattest and lowest peak distribution. The Jeffreys prior Beta(1/2,1/2) lies in between them. For nearly symmetric, not too skewed distributions the effect of the priors is similar. For very small sample size (in this case for a sample size of 3) and a skewed distribution, the Haldane prior can result in a reverse-J-shaped distribution with a singularity at the left end.
However, this happens only in degenerate cases (in this example ''n'' = 3 and hence ''s'' = 3/4 < 1, a degenerate value because ''s'' should be greater than unity in order for the posterior of the Haldane prior to have a mode located between the ends, and because ''s'' = 3/4 is not an integer number, hence it violates the initial assumption of a binomial distribution for the likelihood) and it is not an issue in generic cases of reasonable sample size (such that the condition 1 < ''s'' < ''n'' − 1, necessary for a mode to exist between both ends, is fulfilled). In Chapter 12 (p. 385) of his book, Jaynes asserts that the ''Haldane prior'' Beta(0,0) describes a ''prior state of knowledge of complete ignorance'', where we are not even sure whether it is physically possible for an experiment to yield either a success or a failure, while the ''Bayes (uniform) prior Beta(1,1) applies if'' one knows that ''both binary outcomes are possible''. Jaynes states: "''interpret the Bayes-Laplace (Beta(1,1)) prior as describing not a state of complete ignorance'', but the state of knowledge in which we have observed one success and one failure...once we have seen at least one success and one failure, then we know that the experiment is a true binary one, in the sense of physical possibility." Jaynes does not specifically discuss Jeffreys prior Beta(1/2,1/2) (Jaynes's discussion of "Jeffreys prior" on pp. 181, 423 and in chapter 12 of his book refers instead to the improper, un-normalized, prior "1/''p'' ''dp''" introduced by Jeffreys in the 1939 edition of his book, seven years before he introduced what is now known as Jeffreys' invariant prior: the square root of the determinant of Fisher's information matrix. ''"1/p" is Jeffreys' (1946) invariant prior for the exponential distribution, not for the Bernoulli or binomial distributions''). However, it follows from the above discussion that the Jeffreys Beta(1/2,1/2) prior represents a state of knowledge in between the Haldane Beta(0,0) and Bayes Beta(1,1) priors. Similarly, Karl Pearson in his 1892 book The Grammar of Science (p. 144 of the 1900 edition) maintained that the Bayes (Beta(1,1)) uniform prior was not a complete ignorance prior, and that it should be used when prior information justified the assumption to "distribute our ignorance equally". K. Pearson wrote: "Yet the only supposition that we appear to have made is this: that, knowing nothing of nature, routine and anomy (from the Greek ανομία, namely: a- "without", and nomos "law") are to be considered as equally likely to occur. Now we were not really justified in making even this assumption, for it involves a knowledge that we do not possess regarding nature. We use our ''experience'' of the constitution and action of coins in general to assert that heads and tails are equally probable, but we have no right to assert before experience that, as we know nothing of nature, routine and breach are equally probable. In our ignorance we ought to consider before experience that nature may consist of all routines, all anomies (normlessness), or a mixture of the two in any proportion whatever, and that all such are equally probable. Which of these constitutions after experience is the most probable must clearly depend on what that experience has been like."
If there is sufficient sampling data, ''and the posterior probability mode is not located at one of the extremes of the domain'' (''x'' = 0 or ''x'' = 1), the three priors of Bayes (Beta(1,1)), Jeffreys (Beta(1/2,1/2)) and Haldane (Beta(0,0)) should yield similar ''posterior'' probability densities. Otherwise, as Gelman et al. (p. 65) point out, "if so few data are available that the choice of noninformative prior distribution makes a difference, one should put relevant information into the prior distribution", or as Berger (p. 125) points out, "when different reasonable priors yield substantially different answers, can it be right to state that there ''is'' a single answer? Would it not be better to admit that there is scientific uncertainty, with the conclusion depending on prior beliefs?"
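The ordering of the posterior means and variances described above can be checked directly. The following minimal sketch (not part of the original text; the data values are illustrative, SciPy is assumed available, and the improper Haldane prior is approximated by a tiny positive shape value) compares the three priors for ''s''/''n'' < 1/2:

<syntaxhighlight lang="python">
from scipy.stats import beta

def summarize(label, s, n, a_prior, b_prior):
    post = beta(s + a_prior, n - s + b_prior)
    print(f"{label:24s} mean={post.mean():.4f}  var={post.var():.5f}")

s, n = 3, 10   # s/n < 1/2, so posterior means should be ordered Bayes > Jeffreys > Haldane
summarize("Bayes Beta(1,1)", s, n, 1.0, 1.0)
summarize("Jeffreys Beta(1/2,1/2)", s, n, 0.5, 0.5)
summarize("Haldane Beta(0,0)", s, n, 1e-9, 1e-9)   # improper prior, approximated numerically
</syntaxhighlight>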


Occurrence and applications


Order statistics

The beta distribution has an important application in the theory of order statistics. A basic result is that the distribution of the ''k''th smallest of a sample of size ''n'' from a continuous uniform distribution has a beta distribution (David, H. A., Nagaraja, H. N. (2003) ''Order Statistics'' (3rd Edition). Wiley, New Jersey, p. 458). This result is summarized as:

:U_{(k)} \sim \operatorname{Beta}(k,n+1-k).

From this, and application of the theory related to the probability integral transform, the distribution of any individual order statistic from any continuous distribution can be derived.
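A quick Monte Carlo check of this result (a minimal sketch, not from the original text; it assumes NumPy and SciPy are available and uses illustrative values of ''n'' and ''k''):

<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import beta, kstest

rng = np.random.default_rng(0)
n, k = 10, 3
# k-th smallest of n iid Uniform(0,1) variates, repeated 100000 times
u_k = np.sort(rng.uniform(size=(100_000, n)), axis=1)[:, k - 1]

print("empirical mean :", u_k.mean())
print("Beta(k, n+1-k) :", beta(k, n + 1 - k).mean())   # k/(n+1)
print(kstest(u_k, beta(k, n + 1 - k).cdf))             # large p-value expected
</syntaxhighlight>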


Subjective logic

In standard logic, propositions are considered to be either true or false. In contradistinction, subjective logic assumes that humans cannot determine with absolute certainty whether a proposition about the real world is absolutely true or false. In subjective logic the posteriori probability estimates of binary events can be represented by beta distributions (A. Jøsang. A Logic for Uncertain Probabilities. ''International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems.'' 9(3), pp. 279–311, June 2001).


Wavelet analysis

A wavelet is a wave-like oscillation with an amplitude that starts out at zero, increases, and then decreases back to zero. It can typically be visualized as a "brief oscillation" that promptly decays. Wavelets can be used to extract information from many different kinds of data, including – but certainly not limited to – audio signals and images. Thus, wavelets are purposefully crafted to have specific properties that make them useful for signal processing. Wavelets are localized in both time and frequency, whereas the standard Fourier transform is only localized in frequency. Therefore, standard Fourier transforms are only applicable to stationary processes, while wavelets are applicable to non-stationary processes. Continuous wavelets can be constructed based on the beta distribution. Beta wavelets (H.M. de Oliveira and G.A.A. Araújo. Compactly Supported One-cyclic Wavelets Derived from Beta Distributions. ''Journal of Communication and Information Systems.'' vol. 20, n. 3, pp. 27–33, 2005) can be viewed as a soft variety of Haar wavelets whose shape is fine-tuned by two shape parameters α and β.


Population genetics

The Balding–Nichols model is a two-parameter parametrization of the beta distribution used in population genetics. It is a statistical description of the allele frequencies in the components of a sub-divided population:

: \begin{align}
\alpha &= \mu \nu,\\
\beta &= (1 - \mu) \nu,
\end{align}

where \nu =\alpha+\beta= \frac{1-F}{F} and 0 < ''F'' < 1; here ''F'' is (Wright's) genetic distance between two populations.
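A small sketch of this parametrization (not part of the original text; the values of μ and ''F'' are illustrative):

<syntaxhighlight lang="python">
def balding_nichols_shape(mu, F):
    """Return (alpha, beta) for the Balding-Nichols beta parametrization."""
    nu = (1.0 - F) / F          # alpha + beta
    return mu * nu, (1.0 - mu) * nu

alpha, beta_ = balding_nichols_shape(mu=0.3, F=0.1)   # illustrative values
print(alpha, beta_)             # 2.7, 6.3; the mean alpha/(alpha+beta) stays at mu = 0.3
</syntaxhighlight>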


Project management: task cost and schedule modeling

The beta distribution can be used to model events which are constrained to take place within an interval defined by a minimum and maximum value. For this reason, the beta distribution — along with the triangular distribution — is used extensively in PERT, critical path method (CPM), Joint Cost Schedule Modeling (JCSM) and other project management/control systems to describe the time to completion and the cost of a task. In project management, shorthand computations are widely used to estimate the mean and standard deviation of the beta distribution:

: \begin{align}
\mu(X) & = \frac{a + 4b + c}{6} \\
\sigma(X) & = \frac{c-a}{6}
\end{align}

where ''a'' is the minimum, ''c'' is the maximum, and ''b'' is the most likely value (the mode for ''α'' > 1 and ''β'' > 1). The above estimate for the mean \mu(X)= \frac{a + 4b + c}{6} is known as the PERT three-point estimation and it is exact for either of the following values of ''β'' (for arbitrary α within these ranges):

:''β'' = ''α'' > 1 (symmetric case) with standard deviation \sigma(X) = \frac{c-a}{2\sqrt{2\alpha+1}}, skewness = 0, and excess kurtosis = \frac{-6}{2\alpha + 3}

or

:''β'' = 6 − ''α'' for 5 > ''α'' > 1 (skewed case) with standard deviation \sigma(X) = \frac{(c-a)\sqrt{\alpha(6-\alpha)}}{6\sqrt{7}}, skewness = \frac{(3-\alpha)\sqrt{7}}{2\sqrt{\alpha(6-\alpha)}}, and excess kurtosis = \frac{21}{\alpha(6-\alpha)} - 3

The above estimate for the standard deviation ''σ''(''X'') = (''c'' − ''a'')/6 is exact for either of the following values of ''α'' and ''β'':

:''α'' = ''β'' = 4 (symmetric) with skewness = 0, and excess kurtosis = −6/11.
:''β'' = 6 − ''α'' and \alpha = 3 - \sqrt2 (right-tailed, positive skew) with skewness =\frac{1}{\sqrt{2}}, and excess kurtosis = 0
:''β'' = 6 − ''α'' and \alpha = 3 + \sqrt2 (left-tailed, negative skew) with skewness = \frac{-1}{\sqrt{2}}, and excess kurtosis = 0

Otherwise, these can be poor approximations for beta distributions with other values of α and β, exhibiting average errors of 40% in the mean and 549% in the variance.
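For one of the exact cases above (''α'' = ''β'' = 4), the PERT shorthand and the exact four-parameter beta moments coincide, as the following minimal sketch shows (not part of the original text; the minimum and maximum task durations are assumed, illustrative values):

<syntaxhighlight lang="python">
import math

a, c = 10.0, 70.0                       # assumed minimum and maximum task durations
alpha, beta_ = 4.0, 4.0                 # a case where both shorthands are exact
b = a + (alpha - 1) / (alpha + beta_ - 2) * (c - a)   # mode of Beta(4,4,a,c), the midpoint

pert_mean = (a + 4 * b + c) / 6
pert_sd = (c - a) / 6

exact_mean = a + alpha / (alpha + beta_) * (c - a)
exact_sd = (c - a) * math.sqrt(alpha * beta_ / ((alpha + beta_) ** 2 * (alpha + beta_ + 1)))

print(pert_mean, exact_mean)            # 40.0 40.0
print(pert_sd, exact_sd)                # 10.0 10.0
</syntaxhighlight>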


Random variate generation

If ''X'' and ''Y'' are independent, with X \sim \Gamma(\alpha, \theta) and Y \sim \Gamma(\beta, \theta), then

:\frac{X}{X+Y} \sim \Beta(\alpha, \beta).

So one algorithm for generating beta variates is to generate \frac{X}{X+Y}, where ''X'' is a gamma variate with parameters (α, 1) and ''Y'' is an independent gamma variate with parameters (β, 1). In fact, here \frac{X}{X+Y} and X+Y are independent, and X+Y \sim \Gamma(\alpha + \beta, \theta). If Z \sim \Gamma(\gamma, \theta) and Z is independent of X and Y, then \frac{X+Y}{X+Y+Z} \sim \Beta(\alpha+\beta,\gamma) and \frac{X+Y}{X+Y+Z} is independent of \frac{X}{X+Y}. This shows that the product of independent \Beta(\alpha,\beta) and \Beta(\alpha+\beta,\gamma) random variables is a \Beta(\alpha,\beta+\gamma) random variable. Also, the ''k''th order statistic of ''n'' uniformly distributed variates is \Beta(k, n+1-k), so an alternative if α and β are small integers is to generate α + β − 1 uniform variates and choose the α-th smallest. Another way to generate the beta distribution is by a Pólya urn model. According to this method, one starts with an "urn" containing α "black" balls and β "white" balls and draws uniformly with replacement. At every trial an additional ball is added according to the color of the last ball which was drawn. Asymptotically, the proportion of black and white balls will be distributed according to the beta distribution, where each repetition of the experiment will produce a different value. It is also possible to use inverse transform sampling.
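A minimal sketch of the gamma-ratio algorithm (not part of the original text; it assumes NumPy is available and uses illustrative shape parameters):

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(42)
alpha, beta_ = 2.0, 5.0
x = rng.gamma(shape=alpha, scale=1.0, size=100_000)
y = rng.gamma(shape=beta_, scale=1.0, size=100_000)
z = x / (x + y)                          # Beta(alpha, beta) variates

print("sample mean:", z.mean(), " theory:", alpha / (alpha + beta_))
print("sample var :", z.var(),
      " theory:", alpha * beta_ / ((alpha + beta_) ** 2 * (alpha + beta_ + 1)))
</syntaxhighlight>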


History

Thomas Bayes, in a posthumous paper published in 1763 by Richard Price, obtained a beta distribution as the density of the probability of success in Bernoulli trials (see the section on Bayesian inference above), but the paper does not analyze any of the moments of the beta distribution or discuss any of its properties. The first systematic modern discussion of the beta distribution is probably due to Karl Pearson. In Pearson's papers the beta distribution is couched as a solution of a differential equation: Pearson's Type I distribution, to which it is essentially identical except for arbitrary shifting and re-scaling (the beta and Pearson Type I distributions can always be equalized by proper choice of parameters). In fact, in several English books and journal articles in the few decades prior to World War II, it was common to refer to the beta distribution as Pearson's Type I distribution. William P. Elderton, in his 1906 monograph "Frequency curves and correlation", further analyzes the beta distribution as Pearson's Type I distribution, including a full discussion of the method of moments for the four parameter case, and diagrams of (what Elderton describes as) U-shaped, J-shaped, twisted J-shaped, "cocked-hat" shapes, horizontal and angled straight-line cases. Elderton wrote "I am chiefly indebted to Professor Pearson, but the indebtedness is of a kind for which it is impossible to offer formal thanks." Elderton's 1906 monograph provides an impressive amount of information on the beta distribution, including equations for the origin of the distribution chosen to be the mode, as well as for other Pearson distributions: types I through VII. Elderton also included a number of appendixes, including one appendix ("II") on the beta and gamma functions. In later editions, Elderton added equations for the origin of the distribution chosen to be the mean, and analysis of Pearson distributions VIII through XII. As remarked by Bowman and Shenton, "Fisher and Pearson had a difference of opinion in the approach to (parameter) estimation, in particular relating to (Pearson's method of) moments and (Fisher's method of) maximum likelihood in the case of the Beta distribution." Also according to Bowman and Shenton, "the case of a Type I (beta distribution) model being the center of the controversy was pure serendipity. A more difficult model of 4 parameters would have been hard to find." The long-running public conflict of Fisher with Karl Pearson can be followed in a number of articles in prestigious journals. For example, concerning the estimation of the four parameters for the beta distribution, and Fisher's criticism of Pearson's method of moments as being arbitrary, see Pearson's article "Method of moments and method of maximum likelihood" (published three years after his retirement from University College, London, where his position had been divided between Fisher and Pearson's son Egon), in which Pearson writes "I read (Koshai's paper in the Journal of the Royal Statistical Society, 1933) which as far as I am aware is the only case at present published of the application of Professor Fisher's method. To my astonishment that method depends on first working out the constants of the frequency curve by the (Pearson) Method of Moments and then superposing on it, by what Fisher terms "the Method of Maximum Likelihood" a further approximation to obtain, what he holds, he will thus get, "more efficient values" of the curve constants."
David and Edwards's treatise on the history of statistics cites the first modern treatment of the beta distribution, in 1911, using the beta designation that has become standard, due to Corrado Gini, an Italian statistician, demographer, and sociologist who developed the Gini coefficient. N. L. Johnson and S. Kotz, in their comprehensive and very informative monograph on leading historical personalities in statistical sciences, credit Corrado Gini as "an early Bayesian... who dealt with the problem of eliciting the parameters of an initial Beta distribution, by singling out techniques which anticipated the advent of the so-called empirical Bayes approach."


References


External links


"Beta Distribution"
by Fiona Maclachlan, the Wolfram Demonstrations Project, 2007.
Beta Distribution – Overview and Example
xycoon.com

brighton-webs.co.uk

exstrom.com * *
Harvard University Statistics 110 Lecture 23 Beta Distribution, Prof. Joe Blitzstein
{{DEFAULTSORT:Beta Distribution Continuous distributions Factorial and binomial topics Conjugate prior distributions Exponential family distributions]">X - E[X]= 0 \\ \lim_ \operatorname ]_=_\frac_ The_mean_absolute_deviation_around_the_mean_is_a_more_robust_ Robustness_is_the_property_of_being_strong_and_healthy_in_constitution._When_it_is_transposed_into_a_system,_it_refers_to_the_ability_of_tolerating_perturbations_that_might_affect_the_system’s_functional_body._In_the_same_line_''robustness''_ca_...
_
estimator In statistics, an estimator is a rule for calculating an estimate of a given quantity based on observed data: thus the rule (the estimator), the quantity of interest (the estimand) and its result (the estimate) are distinguished. For example, the ...
_of_
statistical_dispersion In statistics, dispersion (also called variability, scatter, or spread) is the extent to which a distribution is stretched or squeezed. Common examples of measures of statistical dispersion are the variance, standard deviation, and interquartile ...
_than_the_standard_deviation_for_beta_distributions_with_tails_and_inflection_points_at_each_side_of_the_mode,_Beta(''α'', ''β'')_distributions_with_''α'',''β''_>_2,_as_it_depends_on_the_linear_(absolute)_deviations_rather_than_the_square_deviations_from_the_mean.__Therefore,_the_effect_of_very_large_deviations_from_the_mean_are_not_as_overly_weighted. Using_
Stirling's_approximation In mathematics, Stirling's approximation (or Stirling's formula) is an approximation for factorials. It is a good approximation, leading to accurate results even for small values of n. It is named after James Stirling, though a related but less p ...
_to_the_Gamma_function,_Norman_Lloyd_Johnson, N.L.Johnson_and_Samuel_Kotz, S.Kotz_derived_the_following_approximation_for_values_of_the_shape_parameters_greater_than_unity_(the_relative_error_for_this_approximation_is_only_−3.5%_for_''α''_=_''β''_=_1,_and_it_decreases_to_zero_as_''α''_→_∞,_''β''_→_∞): :_\begin \frac_&=\frac\\ &\approx_\sqrt_\left(1+\frac-\frac-\frac_\right),_\text_\alpha,_\beta_>_1. \end At_the_limit_α_→_∞,_β_→_∞,_the_ratio_of_the_mean_absolute_deviation_to_the_standard_deviation_(for_the_beta_distribution)_becomes_equal_to_the_ratio_of_the_same_measures_for_the_normal_distribution:_\sqrt.__For_α_=_β_=_1_this_ratio_equals_\frac,_so_that_from_α_=_β_=_1_to_α,_β_→_∞_the_ratio_decreases_by_8.5%.__For_α_=_β_=_0_the_standard_deviation_is_exactly_equal_to_the_mean_absolute_deviation_around_the_mean._Therefore,_this_ratio_decreases_by_15%_from_α_=_β_=_0_to_α_=_β_=_1,_and_by_25%_from_α_=_β_=_0_to_α,_β_→_∞_._However,_for_skewed_beta_distributions_such_that_α_→_0_or_β_→_0,_the_ratio_of_the_standard_deviation_to_the_mean_absolute_deviation_approaches_infinity_(although_each_of_them,_individually,_approaches_zero)_because_the_mean_absolute_deviation_approaches_zero_faster_than_the_standard_deviation. Using_the__parametrization_in_terms_of_mean_μ_and_sample_size_ν_=_α_+_β_>_0: :α_=_μν,_β_=_(1−μ)ν one_can_express_the_mean_absolute_deviation_around_the_mean_in_terms_of_the_mean_μ_and_the_sample_size_ν_as_follows: :\operatorname[, _X_-_E ]_=_\frac For_a_symmetric_distribution,_the_mean_is_at_the_middle_of_the_distribution,_μ_=_1/2,_and_therefore: :_\begin \operatorname[, X_-_E ]__=_\frac_&=_\frac_\\ \lim__\left_(\lim__\operatorname[, X_-_E ]_\right_)_&=_\tfrac\\ \lim__\left_(\lim__\operatorname[, _X_-_E ]_\right_)_&=_0 \end Also,_the_following_limits_(with_only_the_noted_variable_approaching_the_limit)_can_be_obtained_from_the_above_expressions: :_\begin \lim__\operatorname[, X_-_E ]_&=\lim__\operatorname[, X_-_E ]=_0_\\ \lim__\operatorname[, X_-_E ]_&=\lim__\operatorname[, X_-_E ]_=_0\\ \lim__\operatorname[, X_-_E ]&=\lim__\operatorname[, X_-_E ]_=_0\\ \lim__\operatorname[, X_-_E ]_&=_\sqrt_\\ \lim__\operatorname[, X_-_E ]_&=_0 \end


_Mean_absolute_difference

The_mean_absolute_difference_for_the_Beta_distribution_is: :\mathrm_=_\int_0^1_\int_0^1_f(x;\alpha,\beta)\,f(y;\alpha,\beta)\,, x-y, \,dx\,dy_=_\left(\frac\right)\frac The_Gini_coefficient_for_the_Beta_distribution_is_half_of_the_relative_mean_absolute_difference: :\mathrm_=_\left(\frac\right)\frac


_Skewness

The_skewness_ In_probability_theory_and_statistics,_skewness_is_a_measure_of_the_asymmetry_of_the_probability_distribution_of_a__real-valued_random_variable_about_its_mean._The_skewness_value_can_be_positive,_zero,_negative,_or_undefined. For_a_unimodal__...
_(the_third_moment_centered_on_the_mean,_normalized_by_the_3/2_power_of_the_variance)_of_the_beta_distribution_is :\gamma_1_=\frac_=_\frac_. Letting_α_=_β_in_the_above_expression_one_obtains_γ1_=_0,_showing_once_again_that_for_α_=_β_the_distribution_is_symmetric_and_hence_the_skewness_is_zero._Positive_skew_(right-tailed)_for_α_<_β,_negative_skew_(left-tailed)_for_α_>_β. Using_the__parametrization_in_terms_of_mean_μ_and_sample_size_ν_=_α_+_β: :_\begin __\alpha_&__=_\mu_\nu_,\text\nu_=(\alpha_+_\beta)__>0\\ __\beta_&__=_(1_-_\mu)_\nu_,_\text\nu_=(\alpha_+_\beta)__>0. \end one_can_express_the_skewness_in_terms_of_the_mean_μ_and_the_sample_size_ν_as_follows: :\gamma_1_=\frac_=_\frac. The_skewness_can_also_be_expressed_just_in_terms_of_the_variance_''var''_and_the_mean_μ_as_follows: :\gamma_1_=\frac_=_\frac\text_\operatorname_<_\mu(1-\mu) The_accompanying_plot_of_skewness_as_a_function_of_variance_and_mean_shows_that_maximum_variance_(1/4)_is_coupled_with_zero_skewness_and_the_symmetry_condition_(μ_=_1/2),_and_that_maximum_skewness_(positive_or_negative_infinity)_occurs_when_the_mean_is_located_at_one_end_or_the_other,_so_that_the_"mass"_of_the_probability_distribution_is_concentrated_at_the_ends_(minimum_variance). The_following_expression_for_the_square_of_the_skewness,_in_terms_of_the_sample_size_ν_=_α_+_β_and_the_variance_''var'',_is_useful_for_the_method_of_moments_estimation_of_four_parameters: :(\gamma_1)^2_=\frac_=_\frac\bigg(\frac-4(1+\nu)\bigg) This_expression_correctly_gives_a_skewness_of_zero_for_α_=_β,_since_in_that_case_(see_):_\operatorname_=_\frac. For_the_symmetric_case_(α_=_β),_skewness_=_0_over_the_whole_range,_and_the_following_limits_apply: :\lim__\gamma_1_=_\lim__\gamma_1_=\lim__\gamma_1=\lim__\gamma_1=\lim__\gamma_1_=_0 For_the_asymmetric_cases_(α_≠_β)_the_following_limits_(with_only_the_noted_variable_approaching_the_limit)_can_be_obtained_from_the_above_expressions: :_\begin &\lim__\gamma_1_=\lim__\gamma_1_=_\infty\\ &\lim__\gamma_1__=_\lim__\gamma_1=_-_\infty\\ &\lim__\gamma_1_=_-\frac,\quad_\lim_(\lim__\gamma_1)_=_-\infty,\quad_\lim_(\lim__\gamma_1)_=_0\\ &\lim__\gamma_1_=_\frac,\quad_\lim_(\lim__\gamma_1)_=_\infty,\quad_\lim_(\lim__\gamma_1)_=_0\\ &\lim__\gamma_1_=_\frac,\quad_\lim_(\lim__\gamma_1)__=_\infty,\quad_\lim_(\lim__\gamma_1)_=_-_\infty \end


_Kurtosis

The_beta_distribution_has_been_applied_in_acoustic_analysis_to_assess_damage_to_gears,_as_the_kurtosis_of_the_beta_distribution_has_been_reported_to_be_a_good_indicator_of_the_condition_of_a_gear.
_Kurtosis_has_also_been_used_to_distinguish_the_seismic_signal_generated_by_a_person's_footsteps_from_other_signals._As_persons_or_other_targets_moving_on_the_ground_generate_continuous_signals_in_the_form_of_seismic_waves,_one_can_separate_different_targets_based_on_the_seismic_waves_they_generate._Kurtosis_is_sensitive_to_impulsive_signals,_so_it's_much_more_sensitive_to_the_signal_generated_by_human_footsteps_than_other_signals_generated_by_vehicles,_winds,_noise,_etc.
__Unfortunately,_the_notation_for_kurtosis_has_not_been_standardized._Kenney_and_Keeping
__use_the_symbol_γ2_for_the_excess_kurtosis_ In_probability_theory_and_statistics,_kurtosis_(from__el,_κυρτός,_''kyrtos''_or_''kurtos'',_meaning_"curved,_arching")_is_a_measure_of_the_"tailedness"_of_the_probability_distribution_of_a_real-valued_random_variable._Like_skewness,_kurtosi_...
,_but_Abramowitz_and_Stegun
__use_different_terminology.__To_prevent_confusion
__between_kurtosis_(the_fourth_moment_centered_on_the_mean,_normalized_by_the_square_of_the_variance)_and_excess_kurtosis,_when_using_symbols,_they_will_be_spelled_out_as_follows:
:\begin \text _____&=\text_-_3\\ _____&=\frac-3\\ _____&=\frac\\ _____&=\frac _. \end Letting_α_=_β_in_the_above_expression_one_obtains :\text_=-_\frac_\text\alpha=\beta_. Therefore,_for_symmetric_beta_distributions,_the_excess_kurtosis_is_negative,_increasing_from_a_minimum_value_of_−2_at_the_limit_as__→_0,_and_approaching_a_maximum_value_of_zero_as__→_∞.__The_value_of_−2_is_the_minimum_value_of_excess_kurtosis_that_any_distribution_(not_just_beta_distributions,_but_any_distribution_of_any_possible_kind)_can_ever_achieve.__This_minimum_value_is_reached_when_all_the_probability_density_is_entirely_concentrated_at_each_end_''x''_=_0_and_''x''_=_1,_with_nothing_in_between:_a_2-point_Bernoulli_distribution_ In_probability_theory_and_statistics,_the_Bernoulli_distribution,_named_after_Swiss_mathematician__Jacob_Bernoulli,James_Victor_Uspensky:_''Introduction_to_Mathematical_Probability'',_McGraw-Hill,_New_York_1937,_page_45_is_the__discrete_probabi_...
_with_equal_probability_1/2_at_each_end_(a_coin_toss:_see_section_below_"Kurtosis_bounded_by_the_square_of_the_skewness"_for_further_discussion).__The_description_of_kurtosis_as_a_measure_of_the_"potential_outliers"_(or_"potential_rare,_extreme_values")_of_the_probability_distribution,_is_correct_for_all_distributions_including_the_beta_distribution._When_rare,_extreme_values_can_occur_in_the_beta_distribution,_the_higher_its_kurtosis;_otherwise,_the_kurtosis_is_lower._For_α_≠_β,_skewed_beta_distributions,_the_excess_kurtosis_can_reach_unlimited_positive_values_(particularly_for_α_→_0_for_finite_β,_or_for_β_→_0_for_finite_α)_because_the_side_away_from_the_mode_will_produce_occasional_extreme_values.__Minimum_kurtosis_takes_place_when_the_mass_density_is_concentrated_equally_at_each_end_(and_therefore_the_mean_is_at_the_center),_and_there_is_no_probability_mass_density_in_between_the_ends. Using_the__parametrization_in_terms_of_mean_μ_and_sample_size_ν_=_α_+_β: :_\begin __\alpha_&__=_\mu_\nu_,\text\nu_=(\alpha_+_\beta)__>0\\ __\beta_&__=_(1_-_\mu)_\nu_,_\text\nu_=(\alpha_+_\beta)__>0. \end one_can_express_the_excess_kurtosis_in_terms_of_the_mean_μ_and_the_sample_size_ν_as_follows: :\text_=\frac\bigg_(\frac_-_1_\bigg_) The_excess_kurtosis_can_also_be_expressed_in_terms_of_just_the_following_two_parameters:_the_variance_''var'',_and_the_sample_size_ν_as_follows: :\text_=\frac\left(\frac_-_6_-_5_\nu_\right)\text\text<_\mu(1-\mu) and,_in_terms_of_the_variance_''var''_and_the_mean_μ_as_follows: :\text_=\frac\text\text<_\mu(1-\mu) The_plot_of_excess_kurtosis_as_a_function_of_the_variance_and_the_mean_shows_that_the_minimum_value_of_the_excess_kurtosis_(−2,_which_is_the_minimum_possible_value_for_excess_kurtosis_for_any_distribution)_is_intimately_coupled_with_the_maximum_value_of_variance_(1/4)_and_the_symmetry_condition:_the_mean_occurring_at_the_midpoint_(μ_=_1/2)._This_occurs_for_the_symmetric_case_of_α_=_β_=_0,_with_zero_skewness.__At_the_limit,_this_is_the_2_point_Bernoulli_distribution_ In_probability_theory_and_statistics,_the_Bernoulli_distribution,_named_after_Swiss_mathematician__Jacob_Bernoulli,James_Victor_Uspensky:_''Introduction_to_Mathematical_Probability'',_McGraw-Hill,_New_York_1937,_page_45_is_the__discrete_probabi_...
_with_equal_probability_1/2_at_each__Dirac_delta_function_end_''x''_=_0_and_''x''_=_1_and_zero_probability_everywhere_else._(A_coin_toss:_one_face_of_the_coin_being_''x''_=_0_and_the_other_face_being_''x''_=_1.)__Variance_is_maximum_because_the_distribution_is_bimodal_with_nothing_in_between_the_two_modes_(spikes)_at_each_end.__Excess_kurtosis_is_minimum:_the_probability_density_"mass"_is_zero_at_the_mean_and_it_is_concentrated_at_the_two_peaks_at_each_end.__Excess_kurtosis_reaches_the_minimum_possible_value_(for_any_distribution)_when_the_probability_density_function_has_two_spikes_at_each_end:_it_is_bi-"peaky"_with_nothing_in_between_them. On_the_other_hand,_the_plot_shows_that_for_extreme_skewed_cases,_where_the_mean_is_located_near_one_or_the_other_end_(μ_=_0_or_μ_=_1),_the_variance_is_close_to_zero,_and_the_excess_kurtosis_rapidly_approaches_infinity_when_the_mean_of_the_distribution_approaches_either_end. Alternatively,_the_excess_kurtosis_can_also_be_expressed_in_terms_of_just_the_following_two_parameters:_the_square_of_the_skewness,_and_the_sample_size_ν_as_follows: :\text_=\frac\bigg(\frac_(\text)^2_-_1\bigg)\text^2-2<_\text<_\frac_(\text)^2 From_this_last_expression,_one_can_obtain_the_same_limits_published_practically_a_century_ago_by_Karl_Pearson_in_his_paper,_for_the_beta_distribution_(see_section_below_titled_"Kurtosis_bounded_by_the_square_of_the_skewness")._Setting_α_+_β=_ν_=__0_in_the_above_expression,_one_obtains_Pearson's_lower_boundary_(values_for_the_skewness_and_excess_kurtosis_below_the_boundary_(excess_kurtosis_+_2_−_skewness2_=_0)_cannot_occur_for_any_distribution,_and_hence_Karl_Pearson_appropriately_called_the_region_below_this_boundary_the_"impossible_region")._The_limit_of_α_+_β_=_ν_→_∞_determines_Pearson's_upper_boundary. :_\begin &\lim_\text__=_(\text)^2_-_2\\ &\lim_\text__=_\tfrac_(\text)^2 \end therefore: :(\text)^2-2<_\text<_\tfrac_(\text)^2 Values_of_ν_=_α_+_β_such_that_ν_ranges_from_zero_to_infinity,_0_<_ν_<_∞,_span_the_whole_region_of_the_beta_distribution_in_the_plane_of_excess_kurtosis_versus_squared_skewness. For_the_symmetric_case_(α_=_β),_the_following_limits_apply: :_\begin &\lim__\text_=__-_2_\\ &\lim__\text_=_0_\\ &\lim__\text_=_-_\frac \end For_the_unsymmetric_cases_(α_≠_β)_the_following_limits_(with_only_the_noted_variable_approaching_the_limit)_can_be_obtained_from_the_above_expressions: :_\begin &\lim_\text__=\lim__\text__=_\lim_\text__=_\lim_\text__=\infty\\ &\lim_\text__=_\frac,\text__\lim_(\lim__\text)__=_\infty,\text__\lim_(\lim__\text)__=_0\\ &\lim_\text__=_\frac,\text__\lim_(\lim__\text)__=_\infty,\text__\lim_(\lim__\text)__=_0\\ &\lim__\text__=_-_6_+_\frac,\text__\lim_(\lim__\text)__=_\infty,\text__\lim_(\lim__\text)__=_\infty \end


_Characteristic_function

The_Characteristic_function_(probability_theory), characteristic_function_is_the_Fourier_transform_of_the_probability_density_function.__The_characteristic_function_of_the_beta_distribution_is_confluent_hypergeometric_function, Kummer's_confluent_hypergeometric_function_(of_the_first_kind):
:\begin \varphi_X(\alpha;\beta;t) &=_\operatorname\left[e^\right]\\ &=_\int_0^1_e^_f(x;\alpha,\beta)_dx_\\ &=_1F_1(\alpha;_\alpha+\beta;_it)\!\\ &=\sum_^\infty_\frac__\\ &=_1__+\sum_^_\left(_\prod_^_\frac_\right)_\frac \end where :_x^=x(x+1)(x+2)\cdots(x+n-1) is_the_rising_factorial,_also_called_the_"Pochhammer_symbol".__The_value_of_the_characteristic_function_for_''t''_=_0,_is_one: :_\varphi_X(\alpha;\beta;0)=_1F_1(\alpha;_\alpha+\beta;_0)_=_1__. Also,_the_real_and_imaginary_parts_of_the_characteristic_function_enjoy_the_following_symmetries_with_respect_to_the_origin_of_variable_''t'': :_\textrm_\left_[__1F_1(\alpha;_\alpha+\beta;_it)_\right_]_=_\textrm_\left_[__1F_1(\alpha;_\alpha+\beta;_-_it)_\right_]__ :_\textrm_\left_[__1F_1(\alpha;_\alpha+\beta;_it)_\right_]_=_-_\textrm_\left__[__1F_1(\alpha;_\alpha+\beta;_-_it)_\right_]__ The_symmetric_case_α_=_β_simplifies_the_characteristic_function_of_the_beta_distribution_to_a_Bessel_function,_since_in_the_special_case_α_+_β_=_2α_the_confluent_hypergeometric_function_(of_the_first_kind)_reduces_to_a_Bessel_function_(the_modified_Bessel_function_of_the_first_kind_I__)_using_Ernst_Kummer, Kummer's_second_transformation_as_follows: Another_example_of_the_symmetric_case_α_=_β_=_n/2_for_beamforming_applications_can_be_found_in_Figure_11_of_ :\begin__1F_1(\alpha;2\alpha;_it)_&=_e^__0F_1_\left(;_\alpha+\tfrac;_\frac_\right)_\\ &=_e^_\left(\frac\right)^_\Gamma\left(\alpha+\tfrac\right)_I_\left(\frac\right).\end In_the_accompanying_plots,_the_Complex_number, real_part_(Re)_of_the_Characteristic_function_(probability_theory), characteristic_function_of_the_beta_distribution_is_displayed_for_symmetric_(α_=_β)_and_skewed_(α_≠_β)_cases.


_Other_moments


_Moment_generating_function

It_also_follows_that_the_moment_generating_function_is :\begin M_X(\alpha;_\beta;_t) &=_\operatorname\left[e^\right]_\\_pt&=_\int_0^1_e^_f(x;\alpha,\beta)\,dx_\\_pt&=__1F_1(\alpha;_\alpha+\beta;_t)_\\_pt&=_\sum_^\infty_\frac__\frac_\\_pt&=_1__+\sum_^_\left(_\prod_^_\frac_\right)_\frac \end In_particular_''M''''X''(''α'';_''β'';_0)_=_1.


_Higher_moments

Using_the_moment_generating_function,_the_''k''-th_raw_moment_is_given_by_the_factor :\prod_^_\frac_ multiplying_the_(exponential_series)_term_\left(\frac\right)_in_the_series_of_the_moment_generating_function :\operatorname[X^k]=_\frac_=_\prod_^_\frac where_(''x'')(''k'')_is_a_Pochhammer_symbol_representing_rising_factorial._It_can_also_be_written_in_a_recursive_form_as :\operatorname[X^k]_=_\frac\operatorname[X^]. Since_the_moment_generating_function_M_X(\alpha;_\beta;_\cdot)_has_a_positive_radius_of_convergence,_the_beta_distribution_is_Moment_problem, determined_by_its_moments.


_Moments_of_transformed_random_variables


_=Moments_of_linearly_transformed,_product_and_inverted_random_variables

= One_can_also_show_the_following_expectations_for_a_transformed_random_variable,_where_the_random_variable_''X''_is_Beta-distributed_with_parameters_α_and_β:_''X''_~_Beta(α,_β).__The_expected_value_of_the_variable_1 − ''X''_is_the_mirror-symmetry_of_the_expected_value_based_on_''X'': :\begin &_\operatorname[1-X]_=_\frac_\\ &_\operatorname[X_(1-X)]_=\operatorname[(1-X)X_]_=\frac \end Due_to_the_mirror-symmetry_of_the_probability_density_function_of_the_beta_distribution,_the_variances_based_on_variables_''X''_and_1 − ''X''_are_identical,_and_the_covariance_on_''X''(1 − ''X''_is_the_negative_of_the_variance: :\operatorname[(1-X)]=\operatorname[X]_=_-\operatorname[X,(1-X)]=_\frac These_are_the_expected_values_for_inverted_variables,_(these_are_related_to_the_harmonic_means,_see_): :\begin &_\operatorname_\left_[\frac_\right_]_=_\frac_\text_\alpha_>_1\\ &_\operatorname\left_[\frac_\right_]_=\frac_\text_\beta_>_1 \end The_following_transformation_by_dividing_the_variable_''X''_by_its_mirror-image_''X''/(1 − ''X'')_results_in_the_expected_value_of_the_"inverted_beta_distribution"_or_beta_prime_distribution_ In_probability_theory_and__statistics,_the_beta_prime_distribution_(also_known_as_inverted_beta_distribution_or_beta_distribution_of_the_second_kindJohnson_et_al_(1995),_p_248)_is_an_absolutely_continuous_probability_distribution. __Definitions_ _...
_(also_known_as_beta_distribution_of_the_second_kind_or_Pearson_distribution, Pearson's_Type_VI): :_\begin &_\operatorname\left[\frac\right]_=\frac_\text\beta_>_1\\ &_\operatorname\left[\frac\right]_=\frac\text\alpha_>_1 \end_ Variances_of_these_transformed_variables_can_be_obtained_by_integration,_as_the_expected_values_of_the_second_moments_centered_on_the_corresponding_variables: :\operatorname_\left[\frac_\right]_=\operatorname\left[\left(\frac_-_\operatorname\left[\frac_\right_]_\right_)^2\right]= :\operatorname\left_[\frac_\right_]_=\operatorname_\left_[\left_(\frac_-_\operatorname\left_[\frac_\right_]_\right_)^2_\right_]=_\frac_\text\alpha_>_2 The_following_variance_of_the_variable_''X''_divided_by_its_mirror-image_(''X''/(1−''X'')_results_in_the_variance_of_the_"inverted_beta_distribution"_or_beta_prime_distribution_ In_probability_theory_and__statistics,_the_beta_prime_distribution_(also_known_as_inverted_beta_distribution_or_beta_distribution_of_the_second_kindJohnson_et_al_(1995),_p_248)_is_an_absolutely_continuous_probability_distribution. __Definitions_ _...
_(also_known_as_beta_distribution_of_the_second_kind_or_Pearson_distribution, Pearson's_Type_VI): :\operatorname_\left_[\frac_\right_]_=\operatorname_\left_[\left(\frac_-_\operatorname_\left_[\frac_\right_]_\right)^2_\right_]=\operatorname_\left_[\frac_\right_]_= :\operatorname_\left_[\left_(\frac_-_\operatorname_\left_[\frac_\right_]_\right_)^2_\right_]=_\frac_\text\beta_>_2 The_covariances_are: :\operatorname\left_[\frac,\frac_\right_]_=_\operatorname\left[\frac,\frac_\right]_=\operatorname\left[\frac,\frac\right_]_=_\operatorname\left[\frac,\frac_\right]_=\frac_\text_\alpha,_\beta_>_1 These_expectations_and_variances_appear_in_the_four-parameter_Fisher_information_matrix_(.)


_=Moments_of_logarithmically_transformed_random_variables

= Expected_values_for_Logarithm_transformation, logarithmic_transformations_(useful_for_maximum_likelihood_estimates,_see_)_are_discussed_in_this_section.__The_following_logarithmic_linear_transformations_are_related_to_the_geometric_means_''GX''_and__''G''(1−''X'')_(see_): :\begin \operatorname[\ln(X)]_&=_\psi(\alpha)_-_\psi(\alpha_+_\beta)=_-_\operatorname\left[\ln_\left_(\frac_\right_)\right],\\ \operatorname[\ln(1-X)]_&=\psi(\beta)_-_\psi(\alpha_+_\beta)=_-_\operatorname_\left[\ln_\left_(\frac_\right_)\right]. \end Where_the_digamma_function_ In_mathematics,_the_digamma_function_is_defined_as_the__logarithmic_derivative_of_the_gamma_function: :\psi(x)=\frac\ln\big(\Gamma(x)\big)=\frac\sim\ln-\frac. It_is_the_first_of_the__polygamma_functions._It_is_strictly_increasing_and_strict_...
_ψ(α)_is_defined_as_the_logarithmic_derivative_of_the_gamma_function_ In__mathematics,_the_gamma_function_(represented_by_,_the_capital_letter__gamma_from_the_Greek_alphabet)_is_one_commonly_used_extension_of_the__factorial_function_to_complex_numbers._The_gamma_function_is_defined_for_all_complex_numbers_except_...
: :\psi(\alpha)_=_\frac Logit_transformations_are_interesting,
_as_they_usually_transform_various_shapes_(including_J-shapes)_into_(usually_skewed)_bell-shaped_densities_over_the_logit_variable,_and_they_may_remove_the_end_singularities_over_the_original_variable: :\begin \operatorname\left[\ln_\left_(\frac_\right_)_\right]_&=\psi(\alpha)_-_\psi(\beta)=_\operatorname[\ln(X)]_+\operatorname_\left[\ln_\left_(\frac_\right)_\right],\\ \operatorname\left_[\ln_\left_(\frac_\right_)_\right_]_&=\psi(\beta)_-_\psi(\alpha)=_-_\operatorname_\left[\ln_\left_(\frac_\right)_\right]_. \end Johnson
__considered_the_distribution_of_the_logit_-_transformed_variable_ln(''X''/1−''X''),_including_its_moment_generating_function_and_approximations_for_large_values_of_the_shape_parameters.__This_transformation_extends_the_finite_support_,_1_ The_comma__is_a_punctuation_mark_that_appears_in_several_variants_in_different_languages._It_has_the_same_shape_as_an_apostrophe_or_single_closing_quotation_mark_()_in_many_typefaces,_but_it_differs_from_them_in_being_placed_on_the__baseline_o_...
based_on_the_original_variable_''X''_to_infinite_support_in_both_directions_of_the_real_line_(−∞,_+∞). Higher_order_logarithmic_moments_can_be_derived_by_using_the_representation_of_a_beta_distribution_as_a_proportion_of_two_Gamma_distributions_and_differentiating_through_the_integral._They_can_be_expressed_in_terms_of_higher_order_poly-gamma_functions_as_follows: :\begin \operatorname_\left_[\ln^2(X)_\right_]_&=_(\psi(\alpha)_-_\psi(\alpha_+_\beta))^2+\psi_1(\alpha)-\psi_1(\alpha+\beta),_\\ \operatorname_\left_[\ln^2(1-X)_\right_]_&=_(\psi(\beta)_-_\psi(\alpha_+_\beta))^2+\psi_1(\beta)-\psi_1(\alpha+\beta),_\\ \operatorname_\left_[\ln_(X)\ln(1-X)_\right_]_&=(\psi(\alpha)_-_\psi(\alpha_+_\beta))(\psi(\beta)_-_\psi(\alpha_+_\beta))_-\psi_1(\alpha+\beta). \end therefore_the_variance__ In_probability_theory_and_statistics,_variance_is_the__expectation_of_the_squared__deviation_of_a__random_variable_from_its__population_mean_or__sample_mean._Variance_is_a_measure_of_dispersion,_meaning_it_is_a_measure_of_how_far_a_set_of_numbe_...
_of_the_logarithmic_variables_and_covariance_ In__probability_theory_and__statistics,_covariance_is_a_measure_of_the_joint_variability_of_two__random_variables._If_the_greater_values_of_one_variable_mainly_correspond_with_the_greater_values_of_the_other_variable,_and_the_same_holds_for_the__...
_of_ln(''X'')_and_ln(1−''X'')_are: :\begin \operatorname[\ln(X),_\ln(1-X)]_&=_\operatorname\left[\ln(X)\ln(1-X)\right]_-_\operatorname[\ln(X)]\operatorname[\ln(1-X)]_=_-\psi_1(\alpha+\beta)_\\ &_\\ \operatorname[\ln_X]_&=_\operatorname[\ln^2(X)]_-_(\operatorname[\ln(X)])^2_\\ &=_\psi_1(\alpha)_-_\psi_1(\alpha_+_\beta)_\\ &=_\psi_1(\alpha)_+_\operatorname[\ln(X),_\ln(1-X)]_\\ &_\\ \operatorname_ln_(1-X)&=_\operatorname[\ln^2_(1-X)]_-_(\operatorname[\ln_(1-X)])^2_\\ &=_\psi_1(\beta)_-_\psi_1(\alpha_+_\beta)_\\ &=_\psi_1(\beta)_+_\operatorname[\ln_(X),_\ln(1-X)] \end where_the_trigamma_function_ In_mathematics,_the_trigamma_function,_denoted__or_,_is_the_second_of_the_polygamma_functions,_and_is_defined_by :_\psi_1(z)_=_\frac_\ln\Gamma(z). It_follows_from_this_definition_that :_\psi_1(z)_=_\frac_\psi(z) where__is_the_digamma_functio_...
,_denoted_ψ1(α),_is_the_second_of_the_polygamma_function_ In_mathematics,_the_polygamma_function_of_order__is_a_meromorphic_function_on_the__complex_numbers_\mathbb_defined_as_the_th__derivative_of_the_logarithm_of_the_gamma_function: :\psi^(z)_:=_\frac_\psi(z)_=_\frac_\ln\Gamma(z). Thus :\psi^(z)__...
s,_and_is_defined_as_the_derivative_of_the_digamma_function: :\psi_1(\alpha)_=_\frac=_\frac. The_variances_and_covariance_of_the_logarithmically_transformed_variables_''X''_and_(1−''X'')_are_different,_in_general,_because_the_logarithmic_transformation_destroys_the_mirror-symmetry_of_the_original_variables_''X''_and_(1−''X''),_as_the_logarithm_approaches_negative_infinity_for_the_variable_approaching_zero. These_logarithmic_variances_and_covariance_are_the_elements_of_the_Fisher_information_ In_mathematical_statistics,_the_Fisher_information_(sometimes_simply_called_information)_is_a_way_of_measuring_the_amount_of_information_that_an_observable_random_variable_''X''_carries_about_an_unknown_parameter_''θ''_of_a_distribution_that_model_...
_matrix_for_the_beta_distribution.__They_are_also_a_measure_of_the_curvature_of_the_log_likelihood_function_(see_section_on_Maximum_likelihood_estimation). The_variances_of_the_log_inverse_variables_are_identical_to_the_variances_of_the_log_variables: :\begin \operatorname\left[\ln_\left_(\frac_\right_)_\right]_&_=\operatorname[\ln(X)]_=_\psi_1(\alpha)_-_\psi_1(\alpha_+_\beta),_\\ \operatorname\left[\ln_\left_(\frac_\right_)_\right]_&=\operatorname_ln_(1-X)=_\psi_1(\beta)_-_\psi_1(\alpha_+_\beta),_\\ \operatorname\left[\ln_\left_(\frac_\right),_\ln_\left_(\frac\right_)_\right]_&=\operatorname[\ln(X),\ln(1-X)]=_-\psi_1(\alpha_+_\beta).\end It_also_follows_that_the_variances_of_the_logit_transformed_variables_are: :\operatorname\left[\ln_\left_(\frac_\right_)\right]=\operatorname\left[\ln_\left_(\frac_\right_)_\right]=-\operatorname\left_[\ln_\left_(\frac_\right_),_\ln_\left_(\frac_\right_)_\right]=_\psi_1(\alpha)_+_\psi_1(\beta)


_Quantities_of_information_(entropy)

Given_a_beta_distributed_random_variable,_''X''_~_Beta(''α'',_''β''),_the_information_entropy, differential_entropy_of_''X''_is_(measured_in_Nat_(unit), nats),_the_expected_value_of_the_negative_of_the_logarithm_of_the_probability_density_function_ In_probability_theory,_a_probability_density_function_(PDF),_or_density_of_a_continuous_random_variable,_is_a__function_whose_value_at_any_given_sample_(or_point)_in_the__sample_space_(the_set_of_possible_values_taken_by_the_random_variable)_ca_...
: :\begin h(X)_&=_\operatorname[-\ln(f(x;\alpha,\beta))]_\\_pt&=\int_0^1_-f(x;\alpha,\beta)\ln(f(x;\alpha,\beta))_\,_dx_\\_pt&=_\ln(\Beta(\alpha,\beta))-(\alpha-1)\psi(\alpha)-(\beta-1)\psi(\beta)+(\alpha+\beta-2)_\psi(\alpha+\beta) \end where_''f''(''x'';_''α'',_''β'')_is_the_probability_density_function_ In_probability_theory,_a_probability_density_function_(PDF),_or_density_of_a_continuous_random_variable,_is_a__function_whose_value_at_any_given_sample_(or_point)_in_the__sample_space_(the_set_of_possible_values_taken_by_the_random_variable)_ca_...
_of_the_beta_distribution: :f(x;\alpha,\beta)_=_\frac_x^(1-x)^ The_digamma_function_ In_mathematics,_the_digamma_function_is_defined_as_the__logarithmic_derivative_of_the_gamma_function: :\psi(x)=\frac\ln\big(\Gamma(x)\big)=\frac\sim\ln-\frac. It_is_the_first_of_the__polygamma_functions._It_is_strictly_increasing_and_strict_...
_''ψ''_appears_in_the_formula_for_the_differential_entropy_as_a_consequence_of_Euler's_integral_formula_for_the_harmonic_numbers_which_follows_from_the_integral: :\int_0^1_\frac__\,_dx_=_\psi(\alpha)-\psi(1) The_information_entropy, differential_entropy_of_the_beta_distribution_is_negative_for_all_values_of_''α''_and_''β''_greater_than_zero,_except_at_''α''_=_''β''_=_1_(for_which_values_the_beta_distribution_is_the_same_as_the_Uniform_distribution_(continuous), uniform_distribution),_where_the_information_entropy, differential_entropy_reaches_its_Maxima_and_minima, maximum_value_of_zero.__It_is_to_be_expected_that_the_maximum_entropy_should_take_place_when_the_beta_distribution_becomes_equal_to_the_uniform_distribution,_since_uncertainty_is_maximal_when_all_possible_events_are_equiprobable. For_''α''_or_''β''_approaching_zero,_the_information_entropy, differential_entropy_approaches_its_Maxima_and_minima, minimum_value_of_negative_infinity._For_(either_or_both)_''α''_or_''β''_approaching_zero,_there_is_a_maximum_amount_of_order:_all_the_probability_density_is_concentrated_at_the_ends,_and_there_is_zero_probability_density_at_points_located_between_the_ends._Similarly_for_(either_or_both)_''α''_or_''β''_approaching_infinity,_the_differential_entropy_approaches_its_minimum_value_of_negative_infinity,_and_a_maximum_amount_of_order.__If_either_''α''_or_''β''_approaches_infinity_(and_the_other_is_finite)_all_the_probability_density_is_concentrated_at_an_end,_and_the_probability_density_is_zero_everywhere_else.__If_both_shape_parameters_are_equal_(the_symmetric_case),_''α''_=_''β'',_and_they_approach_infinity_simultaneously,_the_probability_density_becomes_a_spike_(_Dirac_delta_function)_concentrated_at_the_middle_''x''_=_1/2,_and_hence_there_is_100%_probability_at_the_middle_''x''_=_1/2_and_zero_probability_everywhere_else. The_(continuous_case)_information_entropy, differential_entropy_was_introduced_by_Shannon_in_his_original_paper_(where_he_named_it_the_"entropy_of_a_continuous_distribution"),_as_the_concluding_part_of_the_same_paper_where_he_defined_the_information_entropy, discrete_entropy.__It_is_known_since_then_that_the_differential_entropy_may_differ_from_the_infinitesimal_limit_of_the_discrete_entropy_by_an_infinite_offset,_therefore_the_differential_entropy_can_be_negative_(as_it_is_for_the_beta_distribution)._What_really_matters_is_the_relative_value_of_entropy. Given_two_beta_distributed_random_variables,_''X''1_~_Beta(''α'',_''β'')_and_''X''2_~_Beta(''α''′,_''β''′),_the_cross_entropy_is_(measured_in_nats)
:\begin H(X_1,X_2)_&=_\int_0^1_-_f(x;\alpha,\beta)_\ln_(f(x;\alpha',\beta'))_\,dx_\\_pt&=_\ln_\left(\Beta(\alpha',\beta')\right)-(\alpha'-1)\psi(\alpha)-(\beta'-1)\psi(\beta)+(\alpha'+\beta'-2)\psi(\alpha+\beta). \end The_cross_entropy_has_been_used_as_an_error_metric_to_measure_the_distance_between_two_hypotheses.
__Its_absolute_value_is_minimum_when_the_two_distributions_are_identical._It_is_the_information_measure_most_closely_related_to_the_log_maximum_likelihood_(see_section_on_"Parameter_estimation._Maximum_likelihood_estimation")). The_relative_entropy,_or_Kullback–Leibler_divergence_''D''KL(''X''1_, , _''X''2),_is_a_measure_of_the_inefficiency_of_assuming_that_the_distribution_is_''X''2_~_Beta(''α''′,_''β''′)__when_the_distribution_is_really_''X''1_~_Beta(''α'',_''β'')._It_is_defined_as_follows_(measured_in_nats). :\begin D_(X_1, , X_2)_&=_\int_0^1_f(x;\alpha,\beta)_\ln_\left_(\frac_\right_)_\,_dx_\\_pt&=_\left_(\int_0^1_f(x;\alpha,\beta)_\ln_(f(x;\alpha,\beta))_\,dx_\right_)-_\left_(\int_0^1_f(x;\alpha,\beta)_\ln_(f(x;\alpha',\beta'))_\,_dx_\right_)\\_pt&=_-h(X_1)_+_H(X_1,X_2)\\_pt&=_\ln\left(\frac\right)+(\alpha-\alpha')\psi(\alpha)+(\beta-\beta')\psi(\beta)+(\alpha'-\alpha+\beta'-\beta)\psi_(\alpha_+_\beta). \end_ The_relative_entropy,_or_Kullback–Leibler_divergence,_is_always_non-negative.__A_few_numerical_examples_follow: *''X''1_~_Beta(1,_1)_and_''X''2_~_Beta(3,_3);_''D''KL(''X''1_, , _''X''2)_=_0.598803;_''D''KL(''X''2_, , _''X''1)_=_0.267864;_''h''(''X''1)_=_0;_''h''(''X''2)_=_−0.267864 *''X''1_~_Beta(3,_0.5)_and_''X''2_~_Beta(0.5,_3);_''D''KL(''X''1_, , _''X''2)_=_7.21574;_''D''KL(''X''2_, , _''X''1)_=_7.21574;_''h''(''X''1)_=_−1.10805;_''h''(''X''2)_=_−1.10805. The_Kullback–Leibler_divergence_is_not_symmetric_''D''KL(''X''1_, , _''X''2)_≠_''D''KL(''X''2_, , _''X''1)__for_the_case_in_which_the_individual_beta_distributions_Beta(1,_1)_and_Beta(3,_3)_are_symmetric,_but_have_different_entropies_''h''(''X''1)_≠_''h''(''X''2)._The_value_of_the_Kullback_divergence_depends_on_the_direction_traveled:_whether_going_from_a_higher_(differential)_entropy_to_a_lower_(differential)_entropy_or_the_other_way_around._In_the_numerical_example_above,_the_Kullback_divergence_measures_the_inefficiency_of_assuming_that_the_distribution_is_(bell-shaped)_Beta(3,_3),_rather_than_(uniform)_Beta(1,_1)._The_"h"_entropy_of_Beta(1,_1)_is_higher_than_the_"h"_entropy_of_Beta(3,_3)_because_the_uniform_distribution_Beta(1,_1)_has_a_maximum_amount_of_disorder._The_Kullback_divergence_is_more_than_two_times_higher_(0.598803_instead_of_0.267864)_when_measured_in_the_direction_of_decreasing_entropy:_the_direction_that_assumes_that_the_(uniform)_Beta(1,_1)_distribution_is_(bell-shaped)_Beta(3,_3)_rather_than_the_other_way_around._In_this_restricted_sense,_the_Kullback_divergence_is_consistent_with_the_second_law_of_thermodynamics. The_Kullback–Leibler_divergence_is_symmetric_''D''KL(''X''1_, , _''X''2)_=_''D''KL(''X''2_, , _''X''1)_for_the_skewed_cases_Beta(3,_0.5)_and_Beta(0.5,_3)_that_have_equal_differential_entropy_''h''(''X''1)_=_''h''(''X''2). The_symmetry_condition: :D_(X_1, , X_2)_=_D_(X_2, , X_1),\texth(X_1)_=_h(X_2),\text\alpha_\neq_\beta follows_from_the_above_definitions_and_the_mirror-symmetry_''f''(''x'';_''α'',_''β'')_=_''f''(1−''x'';_''α'',_''β'')_enjoyed_by_the_beta_distribution.


_Relationships_between_statistical_measures


_Mean,_mode_and_median_relationship

If_1_<_α_<_β_then_mode_≤_median_≤_mean.Kerman_J_(2011)_"A_closed-form_approximation_for_the_median_of_the_beta_distribution"._
_Expressing_the_mode_(only_for_α,_β_>_1),_and_the_mean_in_terms_of_α_and_β: :__\frac_\le_\text_\le_\frac_, If_1_<_β_<_α_then_the_order_of_the_inequalities_are_reversed._For_α,_β_>_1_the_absolute_distance_between_the_mean_and_the_median_is_less_than_5%_of_the_distance_between_the_maximum_and_minimum_values_of_''x''._On_the_other_hand,_the_absolute_distance_between_the_mean_and_the_mode_can_reach_50%_of_the_distance_between_the_maximum_and_minimum_values_of_''x'',_for_the_(Pathological_(mathematics), pathological)_case_of_α_=_1_and_β_=_1,_for_which_values_the_beta_distribution_approaches_the_uniform_distribution_and_the_information_entropy, differential_entropy_approaches_its_Maxima_and_minima, maximum_value,_and_hence_maximum_"disorder". For_example,_for_α_=_1.0001_and_β_=_1.00000001: *_mode___=_0.9999;___PDF(mode)_=_1.00010 *_mean___=_0.500025;_PDF(mean)_=_1.00003 *_median_=_0.500035;_PDF(median)_=_1.00003 *_mean_−_mode___=_−0.499875 *_mean_−_median_=_−9.65538_×_10−6 where_PDF_stands_for_the_value_of_the_probability_density_function_ In_probability_theory,_a_probability_density_function_(PDF),_or_density_of_a_continuous_random_variable,_is_a__function_whose_value_at_any_given_sample_(or_point)_in_the__sample_space_(the_set_of_possible_values_taken_by_the_random_variable)_ca_...
.


_Mean,_geometric_mean_and_harmonic_mean_relationship

It_is_known_from_the_inequality_of_arithmetic_and_geometric_means_that_the_geometric_mean_is_lower_than_the_mean.__Similarly,_the_harmonic_mean_is_lower_than_the_geometric_mean.__The_accompanying_plot_shows_that_for_α_=_β,_both_the_mean_and_the_median_are_exactly_equal_to_1/2,_regardless_of_the_value_of_α_=_β,_and_the_mode_is_also_equal_to_1/2_for_α_=_β_>_1,_however_the_geometric_and_harmonic_means_are_lower_than_1/2_and_they_only_approach_this_value_asymptotically_as_α_=_β_→_∞.


_Kurtosis_bounded_by_the_square_of_the_skewness

As_remarked_by_William_Feller, Feller,_in_the_Pearson_distribution, Pearson_system_the_beta_probability_density_appears_as_Pearson_distribution, type_I_(any_difference_between_the_beta_distribution_and_Pearson's_type_I_distribution_is_only_superficial_and_it_makes_no_difference_for_the_following_discussion_regarding_the_relationship_between_kurtosis_and_skewness)._Karl_Pearson_showed,_in_Plate_1_of_his_paper_
__published_in_1916,__a_graph_with_the_kurtosis_as_the_vertical_axis_(ordinate)_and_the_square_of_the_skewness_ In_probability_theory_and_statistics,_skewness_is_a_measure_of_the_asymmetry_of_the_probability_distribution_of_a__real-valued_random_variable_about_its_mean._The_skewness_value_can_be_positive,_zero,_negative,_or_undefined. For_a_unimodal__...
_as_the_horizontal_axis_(abscissa),_in_which_a_number_of_distributions_were_displayed.
__The_region_occupied_by_the_beta_distribution_is_bounded_by_the_following_two_Line_(geometry), lines_in_the_(skewness2,kurtosis)_Cartesian_coordinate_system, plane,_or_the_(skewness2,excess_kurtosis)_Cartesian_coordinate_system, plane: :(\text)^2+1<_\text<_\frac_(\text)^2_+_3 or,_equivalently, :(\text)^2-2<_\text<_\frac_(\text)^2 At_a_time_when_there_were_no_powerful_digital_computers,_Karl_Pearson_accurately_computed_further_boundaries,_for_example,_separating_the_"U-shaped"_from_the_"J-shaped"_distributions._The_lower_boundary_line_(excess_kurtosis_+_2_−_skewness2_=_0)_is_produced_by_skewed_"U-shaped"_beta_distributions_with_both_values_of_shape_parameters_α_and_β_close_to_zero.__The_upper_boundary_line_(excess_kurtosis_−_(3/2)_skewness2_=_0)_is_produced_by_extremely_skewed_distributions_with_very_large_values_of_one_of_the_parameters_and_very_small_values_of_the_other_parameter.__Karl_Pearson_showed_that_this_upper_boundary_line_(excess_kurtosis_−_(3/2)_skewness2_=_0)_is_also_the_intersection_with_Pearson's_distribution_III,_which_has_unlimited_support_in_one_direction_(towards_positive_infinity),_and_can_be_bell-shaped_or_J-shaped._His_son,_Egon_Pearson,_showed_that_the_region_(in_the_kurtosis/squared-skewness_plane)_occupied_by_the_beta_distribution_(equivalently,_Pearson's_distribution_I)_as_it_approaches_this_boundary_(excess_kurtosis_−_(3/2)_skewness2_=_0)_is_shared_with_the_noncentral_chi-squared_distribution.__Karl_Pearson
_(Pearson_1895,_pp. 357,_360,_373–376)_also_showed_that_the_gamma_distribution_is_a_Pearson_type_III_distribution._Hence_this_boundary_line_for_Pearson's_type_III_distribution_is_known_as_the_gamma_line._(This_can_be_shown_from_the_fact_that_the_excess_kurtosis_of_the_gamma_distribution_is_6/''k''_and_the_square_of_the_skewness_is_4/''k'',_hence_(excess_kurtosis_−_(3/2)_skewness2_=_0)_is_identically_satisfied_by_the_gamma_distribution_regardless_of_the_value_of_the_parameter_"k")._Pearson_later_noted_that_the_chi-squared_distribution_is_a_special_case_of_Pearson's_type_III_and_also_shares_this_boundary_line_(as_it_is_apparent_from_the_fact_that_for_the_chi-squared_distribution_the_excess_kurtosis_is_12/''k''_and_the_square_of_the_skewness_is_8/''k'',_hence_(excess_kurtosis_−_(3/2)_skewness2_=_0)_is_identically_satisfied_regardless_of_the_value_of_the_parameter_"k")._This_is_to_be_expected,_since_the_chi-squared_distribution_''X''_~_χ2(''k'')_is_a_special_case_of_the_gamma_distribution,_with_parametrization_X_~_Γ(k/2,_1/2)_where_k_is_a_positive_integer_that_specifies_the_"number_of_degrees_of_freedom"_of_the_chi-squared_distribution. An_example_of_a_beta_distribution_near_the_upper_boundary_(excess_kurtosis_−_(3/2)_skewness2_=_0)_is_given_by_α_=_0.1,_β_=_1000,_for_which_the_ratio_(excess_kurtosis)/(skewness2)_=_1.49835_approaches_the_upper_limit_of_1.5_from_below._An_example_of_a_beta_distribution_near_the_lower_boundary_(excess_kurtosis_+_2_−_skewness2_=_0)_is_given_by_α=_0.0001,_β_=_0.1,_for_which_values_the_expression_(excess_kurtosis_+_2)/(skewness2)_=_1.01621_approaches_the_lower_limit_of_1_from_above._In_the_infinitesimal_limit_for_both_α_and_β_approaching_zero_symmetrically,_the_excess_kurtosis_reaches_its_minimum_value_at_−2.__This_minimum_value_occurs_at_the_point_at_which_the_lower_boundary_line_intersects_the_vertical_axis_(ordinate)._(However,_in_Pearson's_original_chart,_the_ordinate_is_kurtosis,_instead_of_excess_kurtosis,_and_it_increases_downwards_rather_than_upwards). Values_for_the_skewness_and_excess_kurtosis_below_the_lower_boundary_(excess_kurtosis_+_2_−_skewness2_=_0)_cannot_occur_for_any_distribution,_and_hence_Karl_Pearson_appropriately_called_the_region_below_this_boundary_the_"impossible_region"._The_boundary_for_this_"impossible_region"_is_determined_by_(symmetric_or_skewed)_bimodal_"U"-shaped_distributions_for_which_the_parameters_α_and_β_approach_zero_and_hence_all_the_probability_density_is_concentrated_at_the_ends:_''x''_=_0,_1_with_practically_nothing_in_between_them._Since_for_α_≈_β_≈_0_the_probability_density_is_concentrated_at_the_two_ends_''x''_=_0_and_''x''_=_1,_this_"impossible_boundary"_is_determined_by_a_Bernoulli_distribution_ In_probability_theory_and_statistics,_the_Bernoulli_distribution,_named_after_Swiss_mathematician__Jacob_Bernoulli,James_Victor_Uspensky:_''Introduction_to_Mathematical_Probability'',_McGraw-Hill,_New_York_1937,_page_45_is_the__discrete_probabi_...
,_where_the_two_only_possible_outcomes_occur_with_respective_probabilities_''p''_and_''q''_=_1−''p''._For_cases_approaching_this_limit_boundary_with_symmetry_α_=_β,_skewness_≈_0,_excess_kurtosis_≈_−2_(this_is_the_lowest_excess_kurtosis_possible_for_any_distribution),_and_the_probabilities_are_''p''_≈_''q''_≈_1/2.__For_cases_approaching_this_limit_boundary_with_skewness,_excess_kurtosis_≈_−2_+_skewness2,_and_the_probability_density_is_concentrated_more_at_one_end_than_the_other_end_(with_practically_nothing_in_between),_with_probabilities_p_=_\tfrac_at_the_left_end_''x''_=_0_and_q_=_1-p_=_\tfrac_at_the_right_end_''x''_=_1.


_Symmetry

All_statements_are_conditional_on_α,_β_>_0 *_Probability_density_function_Symmetry, reflection_symmetry ::f(x;\alpha,\beta)_=_f(1-x;\beta,\alpha) *_Cumulative_distribution_function_Symmetry, reflection_symmetry_plus_unitary_Symmetry, translation ::F(x;\alpha,\beta)_=_I_x(\alpha,\beta)_=_1-_F(1-_x;\beta,\alpha)_=_1_-_I_(\beta,\alpha) *_Mode_Symmetry, reflection_symmetry_plus_unitary_Symmetry, translation ::\operatorname(\Beta(\alpha,_\beta))=_1-\operatorname(\Beta(\beta,_\alpha)),\text\Beta(\beta,_\alpha)\ne_\Beta(1,1) *_Median_Symmetry, reflection_symmetry_plus_unitary_Symmetry, translation ::\operatorname_(\Beta(\alpha,_\beta)_)=_1_-_\operatorname_(\Beta(\beta,_\alpha)) *_Mean_Symmetry, reflection_symmetry_plus_unitary_Symmetry, translation ::\mu_(\Beta(\alpha,_\beta)_)=_1_-_\mu_(\Beta(\beta,_\alpha)_) *_Geometric_Means_each_is_individually_asymmetric,_the_following_symmetry_applies_between_the_geometric_mean_based_on_''X''_and_the_geometric_mean_based_on_its_reflection_Reflection_or_reflexion_may_refer_to: _Science_and_technology *_Reflection_(physics),_a_common_wave_phenomenon **_Specular_reflection,_reflection_from_a_smooth_surface ***_Mirror_image,_a_reflection_in_a_mirror_or_in_water **__Signal_reflection,_in__...
_(1-X) ::G_X_(\Beta(\alpha,_\beta)_)=G_(\Beta(\beta,_\alpha)_)_ *_Harmonic_means_each_is_individually_asymmetric,_the_following_symmetry_applies_between_the_harmonic_mean_based_on_''X''_and_the_harmonic_mean_based_on_its_reflection_Reflection_or_reflexion_may_refer_to: _Science_and_technology *_Reflection_(physics),_a_common_wave_phenomenon **_Specular_reflection,_reflection_from_a_smooth_surface ***_Mirror_image,_a_reflection_in_a_mirror_or_in_water **__Signal_reflection,_in__...
_(1-X) ::H_X_(\Beta(\alpha,_\beta)_)=H_(\Beta(\beta,_\alpha)_)_\text_\alpha,_\beta_>_1__. *_Variance_symmetry ::\operatorname_(\Beta(\alpha,_\beta)_)=\operatorname_(\Beta(\beta,_\alpha)_) *_Geometric_variances_each_is_individually_asymmetric,_the_following_symmetry_applies_between_the_log_geometric_variance_based_on_X_and_the_log_geometric_variance_based_on_its_reflection_Reflection_or_reflexion_may_refer_to: _Science_and_technology *_Reflection_(physics),_a_common_wave_phenomenon **_Specular_reflection,_reflection_from_a_smooth_surface ***_Mirror_image,_a_reflection_in_a_mirror_or_in_water **__Signal_reflection,_in__...
_(1-X) ::\ln(\operatorname_(\Beta(\alpha,_\beta)))_=_\ln(\operatorname(\Beta(\beta,_\alpha)))_ *_Geometric_covariance_symmetry ::\ln_\operatorname(\Beta(\alpha,_\beta))=\ln_\operatorname(\Beta(\beta,_\alpha)) *_Mean_absolute_deviation_around_the_mean_symmetry ::\operatorname[, X_-_E _]_(\Beta(\alpha,_\beta))=\operatorname[, _X_-_E ]_(\Beta(\beta,_\alpha)) *_Skewness_Symmetry_(mathematics), skew-symmetry ::\operatorname_(\Beta(\alpha,_\beta)_)=_-_\operatorname_(\Beta(\beta,_\alpha)_) *_Excess_kurtosis_symmetry ::\text_(\Beta(\alpha,_\beta)_)=_\text_(\Beta(\beta,_\alpha)_) *_Characteristic_function_symmetry_of_Real_part_(with_respect_to_the_origin_of_variable_"t") ::_\text_[_1F_1(\alpha;_\alpha+\beta;_it)_]_=_\text_[__1F_1(\alpha;_\alpha+\beta;_-_it)]__ *_Characteristic_function_Symmetry_(mathematics), skew-symmetry_of_Imaginary_part_(with_respect_to_the_origin_of_variable_"t") ::_\text_[_1F_1(\alpha;_\alpha+\beta;_it)_]_=_-_\text_[__1F_1(\alpha;_\alpha+\beta;_-_it)_]__ *_Characteristic_function_symmetry_of_Absolute_value_(with_respect_to_the_origin_of_variable_"t") ::_\text_[__1F_1(\alpha;_\alpha+\beta;_it)_]_=_\text_[__1F_1(\alpha;_\alpha+\beta;_-_it)_]__ *_Differential_entropy_symmetry ::h(\Beta(\alpha,_\beta)_)=_h(\Beta(\beta,_\alpha)_) *_Relative_Entropy_(also_called_Kullback–Leibler_divergence)_symmetry ::D_(X_1, , X_2)_=_D_(X_2, , X_1),_\texth(X_1)_=_h(X_2)\text\alpha_\neq_\beta *_Fisher_information_matrix_symmetry ::__=__


_Geometry_of_the_probability_density_function


_Inflection_points

For_certain_values_of_the_shape_parameters_α_and_β,_the_probability_density_function_ In_probability_theory,_a_probability_density_function_(PDF),_or_density_of_a_continuous_random_variable,_is_a__function_whose_value_at_any_given_sample_(or_point)_in_the__sample_space_(the_set_of_possible_values_taken_by_the_random_variable)_ca_...
_has_inflection_points,_at_which_the_curvature_changes_sign.__The_position_of_these_inflection_points_can_be_useful_as_a_measure_of_the_Statistical_dispersion, dispersion_or_spread_of_the_distribution. Defining_the_following_quantity: :\kappa_=\frac Points_of_inflection_occur,_depending_on_the_value_of_the_shape_parameters_α_and_β,_as_follows: *(α_>_2,_β_>_2)_The_distribution_is_bell-shaped_(symmetric_for_α_=_β_and_skewed_otherwise),_with_two_inflection_points,_equidistant_from_the_mode: ::x_=_\text_\pm_\kappa_=_\frac *_(α_=_2,_β_>_2)_The_distribution_is_unimodal,_positively_skewed,_right-tailed,_with_one_inflection_point,_located_to_the_right_of_the_mode: ::x_=\text_+_\kappa_=_\frac *_(α_>_2,_β_=_2)_The_distribution_is_unimodal,_negatively_skewed,_left-tailed,_with_one_inflection_point,_located_to_the_left_of_the_mode: ::x_=_\text_-_\kappa_=_1_-_\frac *_(1_<_α_<_2,_β_>_2,_α+β>2)_The_distribution_is_unimodal,_positively_skewed,_right-tailed,_with_one_inflection_point,_located_to_the_right_of_the_mode: ::x_=\text_+_\kappa_=_\frac *(0_<_α_<_1,_1_<_β_<_2)_The_distribution_has_a_mode_at_the_left_end_''x''_=_0_and_it_is_positively_skewed,_right-tailed._There_is_one_inflection_point,_located_to_the_right_of_the_mode: ::x_=_\frac *(α_>_2,_1_<_β_<_2)_The_distribution_is_unimodal_negatively_skewed,_left-tailed,_with_one_inflection_point,_located_to_the_left_of_the_mode: ::x_=\text_-_\kappa_=_\frac *(1_<_α_<_2,__0_<_β_<_1)_The_distribution_has_a_mode_at_the_right_end_''x''=1_and_it_is_negatively_skewed,_left-tailed._There_is_one_inflection_point,_located_to_the_left_of_the_mode: ::x_=_\frac There_are_no_inflection_points_in_the_remaining_(symmetric_and_skewed)_regions:_U-shaped:_(α,_β_<_1)_upside-down-U-shaped:_(1_<_α_<_2,_1_<_β_<_2),_reverse-J-shaped_(α_<_1,_β_>_2)_or_J-shaped:_(α_>_2,_β_<_1) The_accompanying_plots_show_the_inflection_point_locations_(shown_vertically,_ranging_from_0_to_1)_versus_α_and_β_(the_horizontal_axes_ranging_from_0_to_5)._There_are_large_cuts_at_surfaces_intersecting_the_lines_α_=_1,_β_=_1,_α_=_2,_and_β_=_2_because_at_these_values_the_beta_distribution_change_from_2_modes,_to_1_mode_to_no_mode.


_Shapes

The_beta_density_function_can_take_a_wide_variety_of_different_shapes_depending_on_the_values_of_the_two_parameters_''α''_and_''β''.__The_ability_of_the_beta_distribution_to_take_this_great_diversity_of_shapes_(using_only_two_parameters)_is_partly_responsible_for_finding_wide_application_for_modeling_actual_measurements:


_=Symmetric_(''α''_=_''β'')

= *_the_density_function_is_symmetry, symmetric_about_1/2_(blue_&_teal_plots). *_median_=_mean_=_1/2. *skewness__=_0. *variance_=_1/(4(2α_+_1)) *α_=_β_<_1 **U-shaped_(blue_plot). **bimodal:_left_mode_=_0,__right_mode_=1,_anti-mode_=_1/2 **1/12_<_var(''X'')_<_1/4 **−2_<_excess_kurtosis(''X'')_<_−6/5 **_α_=_β_=_1/2_is_the__arcsine_distribution ***_var(''X'')_=_1/8 ***excess_kurtosis(''X'')_=_−3/2 ***CF_=_Rinc_(t)_ **_α_=_β_→_0_is_a_2-point_Bernoulli_distribution_ In_probability_theory_and_statistics,_the_Bernoulli_distribution,_named_after_Swiss_mathematician__Jacob_Bernoulli,James_Victor_Uspensky:_''Introduction_to_Mathematical_Probability'',_McGraw-Hill,_New_York_1937,_page_45_is_the__discrete_probabi_...
_with_equal_probability_1/2_at_each__Dirac_delta_function_end_''x''_=_0_and_''x''_=_1_and_zero_probability_everywhere_else._A_coin_toss:_one_face_of_the_coin_being_''x''_=_0_and_the_other_face_being_''x''_=_1. ***__\lim__\operatorname(X)_=_\tfrac_ ***__\lim__\operatorname(X)_=_-_2__a_lower_value_than_this_is_impossible_for_any_distribution_to_reach. ***_The_information_entropy, differential_entropy_approaches_a_Maxima_and_minima, minimum_value_of_−∞ *α_=_β_=_1 **the_uniform_distribution_(continuous), uniform_,_1_ The_comma__is_a_punctuation_mark_that_appears_in_several_variants_in_different_languages._It_has_the_same_shape_as_an_apostrophe_or_single_closing_quotation_mark_()_in_many_typefaces,_but_it_differs_from_them_in_being_placed_on_the__baseline_o_...
distribution **no_mode **var(''X'')_=_1/12 **excess_kurtosis(''X'')_=_−6/5 **The_(negative_anywhere_else)_information_entropy, differential_entropy_reaches_its_Maxima_and_minima, maximum_value_of_zero **CF_=_Sinc_(t) *''α''_=_''β''_>_1 **symmetric_unimodal **_mode_=_1/2. **0_<_var(''X'')_<_1/12 **−6/5_<_excess_kurtosis(''X'')_<_0 **''α''_=_''β''_=_3/2_is_a_semi-elliptic_,_1_ The_comma__is_a_punctuation_mark_that_appears_in_several_variants_in_different_languages._It_has_the_same_shape_as_an_apostrophe_or_single_closing_quotation_mark_()_in_many_typefaces,_but_it_differs_from_them_in_being_placed_on_the__baseline_o_...
distribution,_see:_Wigner_semicircle_distribution ***var(''X'')_=_1/16. ***excess_kurtosis(''X'')_=_−1 ***CF_=_2_Jinc_(t) **''α''_=_''β''_=_2_is_the_parabolic_,_1_ The_comma__is_a_punctuation_mark_that_appears_in_several_variants_in_different_languages._It_has_the_same_shape_as_an_apostrophe_or_single_closing_quotation_mark_()_in_many_typefaces,_but_it_differs_from_them_in_being_placed_on_the__baseline_o_...
distribution ***var(''X'')_=_1/20 ***excess_kurtosis(''X'')_=_−6/7 ***CF_=_3_Tinc_(t)_ **''α''_=_''β''_>_2_is_bell-shaped,_with_inflection_points_located_to_either_side_of_the_mode ***0_<_var(''X'')_<_1/20 ***−6/7_<_excess_kurtosis(''X'')_<_0 **''α''_=_''β''_→_∞_is_a_1-point_Degenerate_distribution_ In_mathematics,_a_degenerate_distribution_is,_according_to_some,_a_probability_distribution_in_a_space_with_support_only_on_a_manifold_of_lower_dimension,_and_according_to_others_a_distribution_with_support_only_at_a_single_point._By_the_latter_d_...
_with_a__Dirac_delta_function_spike_at_the_midpoint_''x''_=_1/2_with_probability_1,_and_zero_probability_everywhere_else._There_is_100%_probability_(absolute_certainty)_concentrated_at_the_single_point_''x''_=_1/2. ***_\lim__\operatorname(X)_=_0_ ***_\lim__\operatorname(X)_=_0 ***The_information_entropy, differential_entropy_approaches_a_Maxima_and_minima, minimum_value_of_−∞


_=Skewed_(''α''_≠_''β'')

= The_density_function_is_Skewness, skewed.__An_interchange_of_parameter_values_yields_the_mirror_image_(the_reverse)_of_the_initial_curve,_some_more_specific_cases: *''α''_<_1,_''β''_<_1 **_U-shaped **_Positive_skew_for_α_<_β,_negative_skew_for_α_>_β. **_bimodal:_left_mode_=_0,_right_mode_=_1,__anti-mode_=_\tfrac_ **_0_<_median_<_1. **_0_<_var(''X'')_<_1/4 *α_>_1,_β_>_1 **_unimodal_(magenta_&_cyan_plots), **Positive_skew_for_α_<_β,_negative_skew_for_α_>_β. **\text=_\tfrac_ **_0_<_median_<_1 **_0_<_var(''X'')_<_1/12 *α_<_1,_β_≥_1 **reverse_J-shaped_with_a_right_tail, **positively_skewed, **strictly_decreasing,_convex_function, convex **_mode_=_0 **_0_<_median_<_1/2. **_0_<_\operatorname(X)_<_\tfrac,__(maximum_variance_occurs_for_\alpha=\tfrac,_\beta=1,_or_α_=_Φ_the_Golden_ratio, golden_ratio_conjugate) *α_≥_1,_β_<_1 **J-shaped_with_a_left_tail, **negatively_skewed, **strictly_increasing,_convex_function, convex **_mode_=_1 **_1/2_<_median_<_1 **_0_<_\operatorname(X)_<_\tfrac,_(maximum_variance_occurs_for_\alpha=1,_\beta=\tfrac,_or_β_=_Φ_the_Golden_ratio, golden_ratio_conjugate) *α_=_1,_β_>_1 **positively_skewed, **strictly_decreasing_(red_plot), **a_reversed_(mirror-image)_power_function__,1distribution **_mean_=_1_/_(β_+_1) **_median_=_1_-_1/21/β **_mode_=_0 **α_=_1,_1_<_β_<_2 ***concave_function, concave ***_1-\tfrac<_\text_<_\tfrac ***_1/18_<_var(''X'')_<_1/12. **α_=_1,_β_=_2 ***a_straight_line_with_slope_−2,_the_right-triangular_distribution_with_right_angle_at_the_left_end,_at_''x''_=_0 ***_\text=1-\tfrac_ ***_var(''X'')_=_1/18 **α_=_1,_β_>_2 ***reverse_J-shaped_with_a_right_tail, ***convex_function, convex ***_0_<_\text_<_1-\tfrac ***_0_<_var(''X'')_<_1/18 *α_>_1,_β_=_1 **negatively_skewed, **strictly_increasing_(green_plot), **the_power_function_,_1_ The_comma__is_a_punctuation_mark_that_appears_in_several_variants_in_different_languages._It_has_the_same_shape_as_an_apostrophe_or_single_closing_quotation_mark_()_in_many_typefaces,_but_it_differs_from_them_in_being_placed_on_the__baseline_o_...
distribution **_mean_=_α_/_(α_+_1) **_median_=_1/21/α_ **_mode_=_1 **2_>_α_>_1,_β_=_1 ***concave_function, concave ***_\tfrac_<_\text_<_\tfrac ***_1/18_<_var(''X'')_<_1/12 **_α_=_2,_β_=_1 ***a_straight_line_with_slope_+2,_the_right-triangular_distribution_with_right_angle_at_the_right_end,_at_''x''_=_1 ***_\text=\tfrac_ ***_var(''X'')_=_1/18 **α_>_2,_β_=_1 ***J-shaped_with_a_left_tail,_convex_function, convex ***\tfrac_<_\text_<_1 ***_0_<_var(''X'')_<_1/18


_Related_distributions


_Transformations

* If ''X'' ~ Beta(''α'', ''β'') then 1 − ''X'' ~ Beta(''β'', ''α''), mirror-image symmetry
* If ''X'' ~ Beta(''α'', ''β'') then \tfrac{X}{1-X} \sim \operatorname{Beta'}(\alpha,\beta), the beta prime distribution, also called "beta distribution of the second kind".
* If ''X'' ~ Beta(''α'', ''β'') then \tfrac{1}{X} - 1 \sim \operatorname{Beta'}(\beta,\alpha).
* If ''X'' ~ Beta(''n''/2, ''m''/2) then \tfrac{mX}{n(1-X)} \sim F(n,m) (assuming ''n'' > 0 and ''m'' > 0), the Fisher–Snedecor F distribution.
* If X \sim \operatorname{Beta}\left(1+\lambda\tfrac{m-\min}{\max-\min},\ 1 + \lambda\tfrac{\max-m}{\max-\min}\right) then min + ''X''(max − min) ~ PERT(min, max, ''m'', ''λ''), where ''PERT'' denotes a PERT distribution used in PERT analysis, and ''m'' = most likely value (Herrerías-Velasco, José Manuel; Herrerías-Pleguezuelo, Rafael; van Dorp, Johan René (2011). Revisiting the PERT mean and variance. European Journal of Operational Research 210, pp. 448–451). Traditionally ''λ'' = 4 in PERT analysis.
* If ''X'' ~ Beta(1, ''β'') then ''X'' ~ Kumaraswamy distribution with parameters (1, ''β'')
* If ''X'' ~ Beta(''α'', 1) then ''X'' ~ Kumaraswamy distribution with parameters (''α'', 1)
* If ''X'' ~ Beta(''α'', 1) then −ln(''X'') ~ Exponential(''α'')
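A minimal Monte Carlo check of two of the transformations above (the mirror-image identity and the exponential transform of a Beta(α, 1) variable); the parameter values and sample sizes are arbitrary, and NumPy/SciPy are assumed available.

# Monte Carlo sanity check of two transformations listed above (arbitrary parameters).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a, b, n = 2.5, 4.0, 100_000
x = rng.beta(a, b, size=n)

# 1 - X should follow Beta(b, a)  (mirror-image symmetry)
print(stats.kstest(1 - x, "beta", args=(b, a)))

# For X ~ Beta(a, 1), -ln(X) should follow Exponential(a), i.e. scale 1/a
y = rng.beta(a, 1.0, size=n)
print(stats.kstest(-np.log(y), "expon", args=(0, 1 / a)))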


Special and limiting cases

* Beta(1, 1) ~ U(0, 1), the standard uniform distribution.
* Beta(n, 1) ~ Maximum of ''n'' independent rvs. with U(0, 1), sometimes called a ''standard power function distribution'' with density ''n'' ''x''^{''n''−1} on that interval.
* Beta(1, n) ~ Minimum of ''n'' independent rvs. with U(0, 1)
* If ''X'' ~ Beta(3/2, 3/2) and ''r'' > 0 then 2''rX'' − ''r'' ~ Wigner semicircle distribution.
* Beta(1/2, 1/2) is equivalent to the arcsine distribution. This distribution is also Jeffreys prior probability for the Bernoulli and binomial distributions. The arcsine probability density appears in several fundamental random-walk theorems. In a fair coin-toss random walk, the probability for the time of the last visit to the origin is distributed as a (U-shaped) arcsine distribution. In a two-player fair-coin-toss game, a player is said to be in the lead if the random walk (that started at the origin) is above the origin. The most probable number of times that a given player will be in the lead, in a game of length 2''N'', is not ''N''. On the contrary, ''N'' is the least likely number of times that the player will be in the lead. The most likely number of times in the lead is 0 or 2''N'' (following the arcsine distribution).
* \lim_{n \to \infty} n \operatorname{Beta}(1,n) = \operatorname{Exponential}(1), the exponential distribution.
* \lim_{n \to \infty} n \operatorname{Beta}(k,n) = \operatorname{Gamma}(k,1), the gamma distribution.
* For large n, \operatorname{Beta}(\alpha n,\beta n) \to \mathcal{N}\left(\frac{\alpha}{\alpha+\beta},\frac{1}{n}\frac{\alpha\beta}{(\alpha+\beta)^3}\right), the normal distribution. More precisely, if X_n \sim \operatorname{Beta}(\alpha n,\beta n) then \sqrt{n}\left(X_n -\tfrac{\alpha}{\alpha+\beta}\right) converges in distribution to a normal distribution with mean 0 and variance \tfrac{\alpha\beta}{(\alpha+\beta)^3} as ''n'' increases.
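As an illustration of the maximum-of-uniforms case above, a small simulation (arbitrary ''n'' and sample size, assuming NumPy/SciPy) compares the maximum of ''n'' uniform variates with Beta(''n'', 1):

# Check that the maximum of n iid U(0,1) variables behaves like Beta(n, 1) (arbitrary n).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, trials = 5, 100_000
max_of_uniforms = rng.uniform(size=(trials, n)).max(axis=1)

# Compare with the Beta(n, 1) distribution, whose density is n * x**(n-1) on [0, 1]
print(stats.kstest(max_of_uniforms, "beta", args=(n, 1)))
print("sample mean:", max_of_uniforms.mean(), " Beta(n,1) mean:", n / (n + 1))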


Derived from other distributions

* The ''k''th order statistic of a sample of size ''n'' from the uniform distribution is a beta random variable, ''U''(''k'') ~ Beta(''k'', ''n''+1−''k'').
* If ''X'' ~ Gamma(α, θ) and ''Y'' ~ Gamma(β, θ) are independent, then \tfrac{X}{X+Y} \sim \operatorname{Beta}(\alpha, \beta)\,.
* If X \sim \chi^2(\alpha)\, and Y \sim \chi^2(\beta)\, are independent, then \tfrac{X}{X+Y} \sim \operatorname{Beta}(\tfrac{\alpha}{2}, \tfrac{\beta}{2}).
* If ''X'' ~ U(0, 1) and ''α'' > 0 then ''X''^{1/''α''} ~ Beta(''α'', 1), the power function distribution.
* If X \sim \operatorname{Bin}(k; n; p), then the binomial likelihood, viewed as a function of ''p'' and normalized over ''p'' ∈ [0, 1], is the density of \operatorname{Beta}(\alpha, \beta) for discrete values of ''n'' and ''k'', where \alpha=k+1 and \beta=n-k+1.
* If ''X'' ~ Cauchy(0, 1) then \tfrac{1}{1+X^2} \sim \operatorname{Beta}\left(\tfrac12, \tfrac12\right)\,
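The gamma construction in the second item above is a common way to generate beta variates in practice; a minimal sketch with arbitrary parameters, assuming NumPy/SciPy:

# Generate Beta(a, b) variates via the Gamma construction X/(X+Y) and compare (arbitrary a, b).
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
a, b, n = 3.0, 1.5, 100_000
gx = rng.gamma(shape=a, scale=1.0, size=n)   # Gamma(a, θ)
gy = rng.gamma(shape=b, scale=1.0, size=n)   # Gamma(b, θ), same scale θ
z = gx / (gx + gy)                           # should follow Beta(a, b)

print(stats.kstest(z, "beta", args=(a, b)))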


Combination with other distributions

* If ''X'' ~ Beta(''α'', ''β'') and ''Y'' ~ F(2''β'', 2''α'') then \Pr\left(X \leq \tfrac{\alpha}{\alpha + x\beta}\right) = \Pr(Y \geq x)\, for all ''x'' > 0.


Compounding with other distributions

* If ''p'' ~ Beta(α, β) and ''X'' ~ Bin(''k'', ''p'') then ''X'' ~ beta-binomial distribution
* If ''p'' ~ Beta(α, β) and ''X'' ~ NB(''r'', ''p'') then ''X'' ~ beta negative binomial distribution
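A sampling check of the first compound above, comparing the empirical distribution of the two-stage draw with the beta-binomial pmf from scipy.stats.betabinom; all numerical values are arbitrary.

# Compound a Binomial with a Beta prior on p and compare with the beta-binomial pmf (arbitrary values).
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
alpha, beta_, k, draws = 2.0, 5.0, 10, 200_000
p = rng.beta(alpha, beta_, size=draws)        # p ~ Beta(alpha, beta)
x = rng.binomial(k, p)                        # X | p ~ Bin(k, p)

empirical = np.bincount(x, minlength=k + 1) / draws
theoretical = stats.betabinom.pmf(np.arange(k + 1), k, alpha, beta_)
print(np.round(empirical, 4))
print(np.round(theoretical, 4))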


Generalisations

* The generalization to multiple variables, i.e. a multivariate beta distribution, is called a Dirichlet distribution. Univariate marginals of the Dirichlet distribution have a beta distribution. The beta distribution is conjugate to the binomial and Bernoulli distributions in exactly the same way as the Dirichlet distribution is conjugate to the multinomial distribution and categorical distribution.
* The Pearson type I distribution is identical to the beta distribution (except for arbitrary shifting and re-scaling that can also be accomplished with the four-parameter parametrization of the beta distribution).
* The beta distribution is the special case of the noncentral beta distribution where \lambda = 0: \operatorname{Beta}(\alpha, \beta) = \operatorname{Beta}(\alpha,\beta,0).
* The generalized beta distribution is a five-parameter distribution family which has the beta distribution as a special case.
* The matrix variate beta distribution is a distribution for positive-definite matrices.
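The statement above that univariate marginals of a Dirichlet distribution are beta distributed can be checked numerically; a sketch with arbitrary concentration parameters, assuming NumPy/SciPy:

# Check that a univariate marginal of a Dirichlet distribution is a beta distribution (arbitrary alphas).
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
alphas = np.array([2.0, 3.0, 4.0])
samples = rng.dirichlet(alphas, size=100_000)

# The first coordinate should follow Beta(alpha_1, alpha_2 + alpha_3).
print(stats.kstest(samples[:, 0], "beta", args=(alphas[0], alphas[1:].sum())))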


Statistical inference


Parameter estimation


Method of moments


Two unknown parameters

Two unknown parameters (\hat{\alpha}, \hat{\beta}) of a beta distribution supported in the [0, 1] interval can be estimated, using the method of moments, with the first two moments (sample mean and sample variance) as follows. Let:
: \text{sample mean}=\bar{x} = \frac{1}{N}\sum_{i=1}^N X_i
be the sample mean estimate and
: \text{sample variance} =\bar{v} = \frac{1}{N-1}\sum_{i=1}^N (X_i - \bar{x})^2
be the sample variance estimate. The method-of-moments estimates of the parameters are
:\hat{\alpha} = \bar{x} \left(\frac{\bar{x}(1 - \bar{x})}{\bar{v}} - 1 \right), if \bar{v} <\bar{x}(1 - \bar{x}),
: \hat{\beta} = (1-\bar{x}) \left(\frac{\bar{x}(1 - \bar{x})}{\bar{v}} - 1 \right), if \bar{v}<\bar{x}(1 - \bar{x}).
When the distribution is required over a known interval other than [0, 1] with random variable ''X'', say [''a'', ''c''] with random variable ''Y'', then replace \bar{x} with \frac{\bar{y}-a}{c-a}, and \bar{v} with \frac{\bar{v}_Y}{(c-a)^2} in the above couple of equations for the shape parameters (see the "Alternative parametrizations, four parameters" section below), where:
: \text{sample mean}=\bar{y} = \frac{1}{N}\sum_{i=1}^N Y_i
: \text{sample variance} = \bar{v}_Y = \frac{1}{N-1}\sum_{i=1}^N (Y_i - \bar{y})^2
A direct implementation of these estimators is sketched below.
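A minimal implementation of the two moment estimators above, assuming NumPy; the final line checks the estimator on synthetic data with arbitrary true parameters.

# Method-of-moments estimates of (alpha, beta) for data on [0, 1], as in the formulas above.
import numpy as np

def beta_method_of_moments(x):
    """Return (alpha_hat, beta_hat) from the sample mean and variance."""
    x = np.asarray(x, dtype=float)
    m = x.mean()
    v = x.var(ddof=1)                     # sample variance
    if not v < m * (1 - m):
        raise ValueError("requires sample variance < mean*(1-mean)")
    common = m * (1 - m) / v - 1
    return m * common, (1 - m) * common

# Quick check against known parameters (arbitrary choice):
rng = np.random.default_rng(4)
print(beta_method_of_moments(rng.beta(2.0, 6.0, size=50_000)))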


Four unknown parameters

All four parameters (\hat{\alpha}, \hat{\beta}, \hat{a}, \hat{c} of a beta distribution supported in the [''a'', ''c''] interval, see the section "Alternative parametrizations, Four parameters") can be estimated, using the method of moments developed by Karl Pearson, by equating sample and population values of the first four central moments (mean, variance, skewness and excess kurtosis).
The excess kurtosis was expressed in terms of the square of the skewness, and the sample size ν = α + β (see the previous section "Kurtosis") as follows:
:\text{excess kurtosis} =\frac{6}{3 + \nu}\left(\frac{(2 + \nu)}{4} (\text{skewness})^2 - 1\right)\text{ if }(\text{skewness})^2-2< \text{excess kurtosis}< \tfrac{3}{2} (\text{skewness})^2
One can use this equation to solve for the sample size ν = α + β in terms of the square of the skewness and the excess kurtosis as follows:
:\hat{\nu} = \hat{\alpha} + \hat{\beta} = 3\frac{(\text{sample excess kurtosis}) - (\text{sample skewness})^2 + 2}{\frac{3}{2} (\text{sample skewness})^2 - (\text{sample excess kurtosis})}
:\text{if }(\text{sample skewness})^2-2< \text{sample excess kurtosis}< \tfrac{3}{2} (\text{sample skewness})^2
This is the ratio (multiplied by a factor of 3) between the previously derived limit boundaries for the beta distribution in a space (as originally done by Karl Pearson) defined with coordinates of the square of the skewness in one axis and the excess kurtosis in the other axis (see the section "Kurtosis").
The case of zero skewness can be immediately solved, because for zero skewness α = β and hence ν = 2α = 2β, therefore α = β = ν/2:
: \hat{\alpha} = \hat{\beta} = \frac{\hat{\nu}}{2}= \frac{\frac{3}{2}(\text{sample excess kurtosis}) + 3}{- (\text{sample excess kurtosis})}
: \text{if sample skewness}= 0 \text{ and } -2<\text{sample excess kurtosis}<0
(Excess kurtosis is negative for the beta distribution with zero skewness, ranging from −2 to 0, so that \hat{\nu}, and therefore the sample shape parameters, is positive, ranging from zero when the shape parameters approach zero and the excess kurtosis approaches −2, to infinity when the shape parameters approach infinity and the excess kurtosis approaches zero.)
For non-zero sample skewness one needs to solve a system of two coupled equations. Since the skewness and the excess kurtosis are independent of the parameters \hat{a}, \hat{c}, the parameters \hat{\alpha}, \hat{\beta} can be uniquely determined from the sample skewness and the sample excess kurtosis, by solving the coupled equations with two known variables (sample skewness and sample excess kurtosis) and two unknowns (the shape parameters):
:(\text{skewness})^2 = \frac{4(\beta-\alpha)^2 (1+\nu)}{\alpha\beta(2+\nu)^2}
:\text{excess kurtosis} =\frac{6}{3 + \nu}\left(\frac{(2 + \nu)}{4} (\text{skewness})^2 - 1\right)
:\text{if }(\text{skewness})^2-2< \text{excess kurtosis}< \tfrac{3}{2}(\text{skewness})^2
resulting in the following solution:
: \hat{\alpha}, \hat{\beta} = \frac{\hat{\nu}}{2} \left(1 \pm \frac{1}{\sqrt{1+ \frac{16 (\hat{\nu} + 1)}{(\hat{\nu}+ 2)^2(\text{sample skewness})^2}}} \right)
: \text{if sample skewness}\neq 0 \text{ and } (\text{sample skewness})^2-2< \text{sample excess kurtosis}< \tfrac{3}{2} (\text{sample skewness})^2
Where one should take the solutions as follows: \hat{\alpha}>\hat{\beta} for (negative) sample skewness < 0, and \hat{\alpha}<\hat{\beta} for (positive) sample skewness > 0.
The accompanying plot shows these two solutions as surfaces in a space with horizontal axes of (sample excess kurtosis) and (sample squared skewness) and the shape parameters as the vertical axis. The surfaces are constrained by the condition that the sample excess kurtosis must be bounded by the sample squared skewness as stipulated in the above equation. The two surfaces meet at the right edge defined by zero skewness. Along this right edge, both parameters are equal and the distribution is symmetric: U-shaped for α = β < 1, uniform for α = β = 1, upside-down-U-shaped for 1 < α = β < 2 and bell-shaped for α = β > 2. The surfaces also meet at the front (lower) edge defined by the "impossible boundary" line (excess kurtosis + 2 − skewness² = 0). Along this front (lower) boundary both shape parameters approach zero, and the probability density is concentrated more at one end than the other end (with practically nothing in between), with probabilities p=\tfrac{\beta}{\alpha+\beta} at the left end ''x'' = 0 and q = 1-p = \tfrac{\alpha}{\alpha+\beta} at the right end ''x'' = 1. The two surfaces become further apart towards the rear edge. At this rear edge the surface parameters are quite different from each other. As remarked, for example, by Bowman and Shenton, sampling in the neighborhood of the line (sample excess kurtosis − (3/2)(sample skewness)² = 0) (the just-J-shaped portion of the rear edge where blue meets beige) "is dangerously near to chaos", because at that line the denominator of the expression above for the estimate ν̂ = α̂ + β̂ becomes zero, and hence ν approaches infinity as that line is approached. Bowman and Shenton write that "the higher moment parameters (kurtosis and skewness) are extremely fragile (near that line). However, the mean and standard deviation are fairly reliable." Therefore, the problem arises in four-parameter estimation for very skewed distributions such that the excess kurtosis approaches (3/2) times the square of the skewness. This boundary line is produced by extremely skewed distributions with very large values of one of the parameters and very small values of the other parameter. See the section "Kurtosis" for a numerical example and further comments about this rear edge boundary line (sample excess kurtosis − (3/2)(sample skewness)² = 0). As remarked by Karl Pearson himself, this issue may not be of much practical importance, as this trouble arises only for very skewed J-shaped (or mirror-image J-shaped) distributions with very different values of shape parameters that are unlikely to occur much in practice. The usual skewed-bell-shape distributions that occur in practice do not have this parameter estimation problem.
The remaining two parameters \hat{a}, \hat{c} can be determined using the sample mean and the sample variance using a variety of equations. One alternative is to calculate the support interval range (\hat{c}-\hat{a}) based on the sample variance and the sample kurtosis. For this purpose one can solve, in terms of the range (\hat{c}- \hat{a}), the equation expressing the excess kurtosis in terms of the sample variance and the sample size ν (see the sections titled "Kurtosis" and "Alternative parametrizations, four parameters"):
:\text{excess kurtosis} =\frac{6}{(3 + \hat{\nu})(2 + \hat{\nu})}\bigg(\frac{(\hat{c}- \hat{a})^2}{\text{(sample variance)}} - 6 - 5 \hat{\nu} \bigg)
to obtain:
: (\hat{c}- \hat{a}) = \sqrt{\text{(sample variance)}}\sqrt{6+5\hat{\nu}+\frac{(2+\hat{\nu})(3+\hat{\nu})}{6}(\text{sample excess kurtosis})}
Another alternative is to calculate the support interval range (\hat{c}-\hat{a}) based on the sample variance and the sample skewness. For this purpose one can solve, in terms of the range (\hat{c}-\hat{a}), the equation expressing the squared skewness in terms of the sample variance and the sample size ν (see the sections titled "Skewness" and "Alternative parametrizations, four parameters"):
:(\text{skewness})^2 = \frac{4}{(\hat{\nu}+2)^2}\bigg(\frac{(\hat{c}- \hat{a})^2}{\text{(sample variance)}}-4(1+\hat{\nu})\bigg)
to obtain:
: (\hat{c}- \hat{a}) = \frac{\sqrt{\text{(sample variance)}}}{2}\sqrt{(2+\hat{\nu})^2(\text{sample skewness})^2+16(1+\hat{\nu})}
The remaining parameter can be determined from the sample mean and the previously obtained parameters (\hat{c}-\hat{a}), \hat{\alpha}, \hat{\nu} = \hat{\alpha}+\hat{\beta}:
: \hat{a} = (\text{sample mean}) - \left(\frac{\hat{\alpha}}{\hat{\nu}}\right)(\hat{c}-\hat{a})
and finally, \hat{c}= (\hat{c}- \hat{a}) + \hat{a}.
In the above formulas one may take, for example, as estimates of the sample moments:
:\begin{align}
\text{sample mean} &=\overline{y} = \frac{1}{N}\sum_{i=1}^N Y_i \\
\text{sample variance} &= \overline{v}_Y = \frac{1}{N-1}\sum_{i=1}^N (Y_i - \overline{y})^2 \\
\text{sample skewness} &= G_1 = \frac{N}{(N-1)(N-2)} \frac{\sum_{i=1}^N (Y_i-\overline{y})^3}{\overline{v}_Y^{3/2}} \\
\text{sample excess kurtosis} &= G_2 = \frac{N(N+1)}{(N-1)(N-2)(N-3)} \frac{\sum_{i=1}^N (Y_i - \overline{y})^4}{\overline{v}_Y^2} - \frac{3(N-1)^2}{(N-2)(N-3)}
\end{align}
The estimators ''G''1 for sample skewness and ''G''2 for sample kurtosis are used by DAP/SAS, PSPP/SPSS, and Excel. However, they are not used by BMDP and (according to Joanes and Gill) they were not used by MINITAB in 1998. Actually, Joanes and Gill in their 1998 study concluded that the skewness and kurtosis estimators used in BMDP and in MINITAB (at that time) had smaller variance and mean-squared error in normal samples, but the skewness and kurtosis estimators used in DAP/SAS and PSPP/SPSS, namely ''G''1 and ''G''2, had smaller mean-squared error in samples from a very skewed distribution. It is for this reason that we have spelled out "sample skewness", etc., in the above formulas, to make it explicit that the user should choose the best estimator according to the problem at hand, as the best estimator for skewness and kurtosis depends on the amount of skewness (as shown by Joanes and Gill). A numerical sketch of this moment-matching recipe follows.
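A sketch of the moment-matching recipe above, assuming NumPy/SciPy: estimate ν̂ from the sample skewness and excess kurtosis, split it into α̂ and β̂, recover the range from the variance, and the location from the mean. The synthetic data and the use of SciPy's default (biased) skewness/kurtosis estimators are arbitrary choices, so this is illustrative rather than a reference implementation.

# Sketch of Pearson's moment matching for the four-parameter beta, following the steps above.
# (Assumes the sample excess kurtosis lies strictly between skew^2 - 2 and (3/2)*skew^2.)
import numpy as np
from scipy import stats

def beta4_method_of_moments(y):
    y = np.asarray(y, dtype=float)
    mean, var = y.mean(), y.var(ddof=1)
    skew = stats.skew(y)                      # sample skewness (biased estimator by default)
    kurt = stats.kurtosis(y)                  # sample excess kurtosis (biased estimator by default)
    nu = 3 * (kurt - skew**2 + 2) / (1.5 * skew**2 - kurt)
    if skew == 0:
        a_hat = b_hat = nu / 2
    else:
        delta = 1 / np.sqrt(1 + 16 * (nu + 1) / ((nu + 2)**2 * skew**2))
        a_hat, b_hat = nu / 2 * (1 - delta), nu / 2 * (1 + delta)
        if skew < 0:                          # larger shape parameter goes with negative skew
            a_hat, b_hat = b_hat, a_hat
    support_range = np.sqrt(var) * np.sqrt(16 * (nu + 1) + (nu + 2)**2 * skew**2) / 2
    a_min = mean - (a_hat / nu) * support_range
    return a_hat, b_hat, a_min, a_min + support_range

rng = np.random.default_rng(5)
sample = 2 + 3 * rng.beta(2.0, 5.0, size=200_000)      # true (alpha, beta, a, c) = (2, 5, 2, 5)
print(np.round(beta4_method_of_moments(sample), 3))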


Maximum likelihood


Two unknown parameters

= As_is_also_the_case_for_maximum_likelihood_estimates_for_the_gamma_distribution,_the_maximum_likelihood_estimates_for_the_beta_distribution_do_not_have_a_general_closed_form_solution_for_arbitrary_values_of_the_shape_parameters._If_''X''1,_...,_''XN''_are_independent_random_variables_each_having_a_beta_distribution,_the_joint_log_likelihood_function_for_''N''_independent_and_identically_distributed_random_variables, iid_observations_is: :\begin \ln\,_\mathcal_(\alpha,_\beta\mid_X)_&=_\sum_^N_\ln_\left_(\mathcal_i_(\alpha,_\beta\mid_X_i)_\right_)\\ &=_\sum_^N_\ln_\left_(f(X_i;\alpha,\beta)_\right_)_\\ &=_\sum_^N_\ln_\left_(\frac_\right_)_\\ &=_(\alpha_-_1)\sum_^N_\ln_(X_i)_+_(\beta-_1)\sum_^N__\ln_(1-X_i)_-_N_\ln_\Beta(\alpha,\beta) \end Finding_the_maximum_with_respect_to_a_shape_parameter_involves_taking_the_partial_derivative_with_respect_to_the_shape_parameter_and_setting_the_expression_equal_to_zero_yielding_the_maximum_likelihood_estimator_of_the_shape_parameters: :\frac_=_\sum_^N_\ln_X_i_-N\frac=0 :\frac_=_\sum_^N__\ln_(1-X_i)-_N\frac=0 where: :\frac_=_-\frac+_\frac+_\frac=-\psi(\alpha_+_\beta)_+_\psi(\alpha)_+_0 :\frac=_-_\frac+_\frac_+_\frac=-\psi(\alpha_+_\beta)_+_0_+_\psi(\beta) since_the_digamma_function_ In_mathematics,_the_digamma_function_is_defined_as_the__logarithmic_derivative_of_the_gamma_function: :\psi(x)=\frac\ln\big(\Gamma(x)\big)=\frac\sim\ln-\frac. It_is_the_first_of_the__polygamma_functions._It_is_strictly_increasing_and_strict_...
_denoted_ψ(α)_is_defined_as_the_logarithmic_derivative_of_the_gamma_function_ In__mathematics,_the_gamma_function_(represented_by_,_the_capital_letter__gamma_from_the_Greek_alphabet)_is_one_commonly_used_extension_of_the__factorial_function_to_complex_numbers._The_gamma_function_is_defined_for_all_complex_numbers_except_...
: :\psi(\alpha)_=\frac_ To_ensure_that_the_values_with_zero_tangent_slope_are_indeed_a_maximum_(instead_of_a_saddle-point_or_a_minimum)_one_has_to_also_satisfy_the_condition_that_the_curvature_is_negative.__This_amounts_to_satisfying_that_the_second_partial_derivative_with_respect_to_the_shape_parameters_is_negative :\frac=_-N\frac<0 :\frac_=_-N\frac<0 using_the_previous_equations,_this_is_equivalent_to: :\frac_=_\psi_1(\alpha)-\psi_1(\alpha_+_\beta)_>_0 :\frac_=_\psi_1(\beta)_-\psi_1(\alpha_+_\beta)_>_0 where_the_trigamma_function_ In_mathematics,_the_trigamma_function,_denoted__or_,_is_the_second_of_the_polygamma_functions,_and_is_defined_by :_\psi_1(z)_=_\frac_\ln\Gamma(z). It_follows_from_this_definition_that :_\psi_1(z)_=_\frac_\psi(z) where__is_the_digamma_functio_...
,_denoted_''ψ''1(''α''),_is_the_second_of_the_polygamma_function_ In_mathematics,_the_polygamma_function_of_order__is_a_meromorphic_function_on_the__complex_numbers_\mathbb_defined_as_the_th__derivative_of_the_logarithm_of_the_gamma_function: :\psi^(z)_:=_\frac_\psi(z)_=_\frac_\ln\Gamma(z). Thus :\psi^(z)__...
s,_and_is_defined_as_the_derivative_of_the_digamma_function: :\psi_1(\alpha)_=_\frac=\,_\frac. These_conditions_are_equivalent_to_stating_that_the_variances_of_the_logarithmically_transformed_variables_are_positive,_since: :\operatorname[\ln_(X)]_=_\operatorname[\ln^2_(X)]_-_(\operatorname[\ln_(X)])^2_=_\psi_1(\alpha)_-_\psi_1(\alpha_+_\beta)_ :\operatorname_ln_(1-X)=_\operatorname[\ln^2_(1-X)]_-_(\operatorname[\ln_(1-X)])^2_=_\psi_1(\beta)_-_\psi_1(\alpha_+_\beta)_ Therefore,_the_condition_of_negative_curvature_at_a_maximum_is_equivalent_to_the_statements: :___\operatorname[\ln_(X)]_>_0 :___\operatorname_ln_(1-X)>_0 Alternatively,_the_condition_of_negative_curvature_at_a_maximum_is_also_equivalent_to_stating_that_the_following_logarithmic_derivatives_of_the__geometric_means_''GX''_and_''G(1−X)''_are_positive,_since: :_\psi_1(\alpha)_-_\psi_1(\alpha_+_\beta)_=_\frac_>_0 :_\psi_1(\beta)__-_\psi_1(\alpha_+_\beta)_=_\frac_>_0 While_these_slopes_are_indeed_positive,_the_other_slopes_are_negative: :\frac,_\frac_<_0. The_slopes_of_the_mean_and_the_median_with_respect_to_''α''_and_''β''_display_similar_sign_behavior. From_the_condition_that_at_a_maximum,_the_partial_derivative_with_respect_to_the_shape_parameter_equals_zero,_we_obtain_the_following_system_of_coupled_maximum_likelihood_estimate_equations_(for_the_average_log-likelihoods)_that_needs_to_be_inverted_to_obtain_the__(unknown)_shape_parameter_estimates_\hat,\hat_in_terms_of_the_(known)_average_of_logarithms_of_the_samples_''X''1,_...,_''XN'': :\begin \hat[\ln_(X)]_&=_\psi(\hat)_-_\psi(\hat_+_\hat)=\frac\sum_^N_\ln_X_i_=__\ln_\hat_X_\\ \hat[\ln(1-X)]_&=_\psi(\hat)_-_\psi(\hat_+_\hat)=\frac\sum_^N_\ln_(1-X_i)=_\ln_\hat_ \end where_we_recognize_\log_\hat_X_as_the_logarithm_of_the_sample__geometric_mean_and_\log_\hat__as_the_logarithm_of_the_sample__geometric_mean_based_on_(1 − ''X''),_the_mirror-image_of ''X''._For_\hat=\hat,_it_follows_that__\hat_X=\hat__. :\begin \hat_X_&=_\prod_^N_(X_i)^_\\ \hat__&=_\prod_^N_(1-X_i)^ \end These_coupled_equations_containing_digamma_function_ In_mathematics,_the_digamma_function_is_defined_as_the__logarithmic_derivative_of_the_gamma_function: :\psi(x)=\frac\ln\big(\Gamma(x)\big)=\frac\sim\ln-\frac. It_is_the_first_of_the__polygamma_functions._It_is_strictly_increasing_and_strict_...
s_of_the_shape_parameter_estimates_\hat,\hat_must_be_solved_by_numerical_methods_as_done,_for_example,_by_Beckman_et_al._Gnanadesikan_et_al._give_numerical_solutions_for_a_few_cases._Norman_Lloyd_Johnson, N.L.Johnson_and_Samuel_Kotz, S.Kotz_suggest_that_for_"not_too_small"_shape_parameter_estimates_\hat,\hat,_the_logarithmic_approximation_to_the_digamma_function_\psi(\hat)_\approx_\ln(\hat-\tfrac)_may_be_used_to_obtain_initial_values_for_an_iterative_solution,_since_the_equations_resulting_from_this_approximation_can_be_solved_exactly: :\ln_\frac__\approx__\ln_\hat_X_ :\ln_\frac\approx_\ln_\hat__ which_leads_to_the_following_solution_for_the_initial_values_(of_the_estimate_shape_parameters_in_terms_of_the_sample_geometric_means)_for_an_iterative_solution: :\hat\approx_\tfrac_+_\frac_\text_\hat_>1 :\hat\approx_\tfrac_+_\frac_\text_\hat_>_1 Alternatively,_the_estimates_provided_by_the_method_of_moments_can_instead_be_used_as_initial_values_for_an_iterative_solution_of_the_maximum_likelihood_coupled_equations_in_terms_of_the_digamma_functions. When_the_distribution_is_required_over_a_known_interval_other_than_,_1_ The_comma__is_a_punctuation_mark_that_appears_in_several_variants_in_different_languages._It_has_the_same_shape_as_an_apostrophe_or_single_closing_quotation_mark_()_in_many_typefaces,_but_it_differs_from_them_in_being_placed_on_the__baseline_o_...
_with_random_variable_''X'',_say_[''a'',_''c'']_with_random_variable_''Y'',_then_replace_ln(''Xi'')_in_the_first_equation_with :\ln_\frac, and_replace_ln(1−''Xi'')_in_the_second_equation_with :\ln_\frac (see_"Alternative_parametrizations,_four_parameters"_section_below). If_one_of_the_shape_parameters_is_known,_the_problem_is_considerably_simplified.__The_following_logit_transformation_can_be_used_to_solve_for_the_unknown_shape_parameter_(for_skewed_cases_such_that_\hat\neq\hat,_otherwise,_if_symmetric,_both_-equal-_parameters_are_known_when_one_is_known): :\hat_\left[\ln_\left(\frac_\right)_\right]=\psi(\hat)_-_\psi(\hat)=\frac\sum_^N_\ln\frac_=__\ln_\hat_X_-_\ln_\left(\hat_\right)_ This_logit_transformation_is_the_logarithm_of_the_transformation_that_divides_the_variable_''X''_by_its_mirror-image_(''X''/(1_-_''X'')_resulting_in_the_"inverted_beta_distribution"__or_beta_prime_distribution_ In_probability_theory_and__statistics,_the_beta_prime_distribution_(also_known_as_inverted_beta_distribution_or_beta_distribution_of_the_second_kindJohnson_et_al_(1995),_p_248)_is_an_absolutely_continuous_probability_distribution. __Definitions_ _...
_(also_known_as_beta_distribution_of_the_second_kind_or_Pearson_distribution, Pearson's_Type_VI)_with_support_[0,_+∞)._As_previously_discussed_in_the_section_"Moments_of_logarithmically_transformed_random_variables,"_the_logit_transformation_\ln\frac,_studied_by_Johnson,_extends_the_finite_support_,_1_ The_comma__is_a_punctuation_mark_that_appears_in_several_variants_in_different_languages._It_has_the_same_shape_as_an_apostrophe_or_single_closing_quotation_mark_()_in_many_typefaces,_but_it_differs_from_them_in_being_placed_on_the__baseline_o_...
based_on_the_original_variable_''X''_to_infinite_support_in_both_directions_of_the_real_line_(−∞,_+∞). If,_for_example,_\hat_is_known,_the_unknown_parameter_\hat_can_be_obtained_in_terms_of_the_inverse
_digamma_function_of_the_right_hand_side_of_this_equation: :\psi(\hat)=\frac\sum_^N_\ln\frac_+_\psi(\hat)_ :\hat=\psi^(\ln_\hat_X_-_\ln_\hat__+_\psi(\hat))_ In_particular,_if_one_of_the_shape_parameters_has_a_value_of_unity,_for_example_for_\hat_=_1_(the_power_function_distribution_with_bounded_support_[0,1]),_using_the_identity_ψ(''x''_+_1)_=_ψ(''x'')_+_1/''x''_in_the_equation_\psi(\hat)_-_\psi(\hat_+_\hat)=_\ln_\hat_X,_the_maximum_likelihood_estimator_for_the_unknown_parameter_\hat_is,_exactly: :\hat=_-_\frac=_-_\frac_ The_beta_has_support_[0,_1],_therefore_\hat_X_<_1,_and_hence_(-\ln_\hat_X)_>0,_and_therefore_\hat_>0. In_conclusion,_the_maximum_likelihood_estimates_of_the_shape_parameters_of_a_beta_distribution_are_(in_general)_a_complicated_function_of_the_sample__geometric_mean,_and_of_the_sample__geometric_mean_based_on_''(1−X)'',_the_mirror-image_of_''X''.__One_may_ask,_if_the_variance_(in_addition_to_the_mean)_is_necessary_to_estimate_two_shape_parameters_with_the_method_of_moments,_why_is_the_(logarithmic_or_geometric)_variance_not_necessary_to_estimate_two_shape_parameters_with_the_maximum_likelihood_method,_for_which_only_the_geometric_means_suffice?__The_answer_is_because_the_mean_does_not_provide_as_much_information_as_the_geometric_mean.__For_a_beta_distribution_with_equal_shape_parameters_''α'' = ''β'',_the_mean_is_exactly_1/2,_regardless_of_the_value_of_the_shape_parameters,_and_therefore_regardless_of_the_value_of_the_statistical_dispersion_(the_variance).__On_the_other_hand,_the_geometric_mean_of_a_beta_distribution_with_equal_shape_parameters_''α'' = ''β'',_depends_on_the_value_of_the_shape_parameters,_and_therefore_it_contains_more_information.__Also,_the_geometric_mean_of_a_beta_distribution_does_not_satisfy_the_symmetry_conditions_satisfied_by_the_mean,_therefore,_by_employing_both_the_geometric_mean_based_on_''X''_and_geometric_mean_based_on_(1 − ''X''),_the_maximum_likelihood_method_is_able_to_provide_best_estimates_for_both_parameters_''α'' = ''β'',_without_need_of_employing_the_variance. One_can_express_the_joint_log_likelihood_per_''N''_independent_and_identically_distributed_random_variables, iid_observations_in_terms_of_the_''sufficient_statistics''_(the_sample_geometric_means)_as_follows: :\frac_=_(\alpha_-_1)\ln_\hat_X_+_(\beta-_1)\ln_\hat_-_\ln_\Beta(\alpha,\beta). 
We_can_plot_the_joint_log_likelihood_per_''N''_observations_for_fixed_values_of_the_sample_geometric_means_to_see_the_behavior_of_the_likelihood_function_as_a_function_of_the_shape_parameters_α_and_β._In_such_a_plot,_the_shape_parameter_estimators_\hat,\hat_correspond_to_the_maxima_of_the_likelihood_function._See_the_accompanying_graph_that_shows_that_all_the_likelihood_functions_intersect_at_α_=_β_=_1,_which_corresponds_to_the_values_of_the_shape_parameters_that_give_the_maximum_entropy_(the_maximum_entropy_occurs_for_shape_parameters_equal_to_unity:_the_uniform_distribution).__It_is_evident_from_the_plot_that_the_likelihood_function_gives_sharp_peaks_for_values_of_the_shape_parameter_estimators_close_to_zero,_but_that_for_values_of_the_shape_parameters_estimators_greater_than_one,_the_likelihood_function_becomes_quite_flat,_with_less_defined_peaks.__Obviously,_the_maximum_likelihood_parameter_estimation_method_for_the_beta_distribution_becomes_less_acceptable_for_larger_values_of_the_shape_parameter_estimators,_as_the_uncertainty_in_the_peak_definition_increases_with_the_value_of_the_shape_parameter_estimators.__One_can_arrive_at_the_same_conclusion_by_noticing_that_the_expression_for_the_curvature_of_the_likelihood_function_is_in_terms_of_the_geometric_variances :\frac=_-\operatorname_ln_X/math> :\frac_=_-\operatorname[\ln_(1-X)] These_variances_(and_therefore_the_curvatures)_are_much_larger_for_small_values_of_the_shape_parameter_α_and_β._However,_for_shape_parameter_values_α,_β_>_1,_the_variances_(and_therefore_the_curvatures)_flatten_out.__Equivalently,_this_result_follows_from_the_Cramér–Rao_bound,_since_the_Fisher_information_ In_mathematical_statistics,_the_Fisher_information_(sometimes_simply_called_information)_is_a_way_of_measuring_the_amount_of_information_that_an_observable_random_variable_''X''_carries_about_an_unknown_parameter_''θ''_of_a_distribution_that_model_...
_matrix_components_for_the_beta_distribution_are_these_logarithmic_variances._The_Cramér–Rao_bound_states_that_the_variance__ In_probability_theory_and_statistics,_variance_is_the__expectation_of_the_squared__deviation_of_a__random_variable_from_its__population_mean_or__sample_mean._Variance_is_a_measure_of_dispersion,_meaning_it_is_a_measure_of_how_far_a_set_of_numbe_...
_of_any_''unbiased''_estimator_\hat_of_α_is_bounded_by_the_multiplicative_inverse, reciprocal_of_the_Fisher_information_ In_mathematical_statistics,_the_Fisher_information_(sometimes_simply_called_information)_is_a_way_of_measuring_the_amount_of_information_that_an_observable_random_variable_''X''_carries_about_an_unknown_parameter_''θ''_of_a_distribution_that_model_...
: :\mathrm(\hat)\geq\frac\geq\frac :\mathrm(\hat)_\geq\frac\geq\frac so_the_variance_of_the_estimators_increases_with_increasing_α_and_β,_as_the_logarithmic_variances_decrease. Also_one_can_express_the_joint_log_likelihood_per_''N''_independent_and_identically_distributed_random_variables, iid_observations_in_terms_of_the_digamma_function_ In_mathematics,_the_digamma_function_is_defined_as_the__logarithmic_derivative_of_the_gamma_function: :\psi(x)=\frac\ln\big(\Gamma(x)\big)=\frac\sim\ln-\frac. It_is_the_first_of_the__polygamma_functions._It_is_strictly_increasing_and_strict_...
_expressions_for_the_logarithms_of_the_sample_geometric_means_as_follows: :\frac_=_(\alpha_-_1)(\psi(\hat)_-_\psi(\hat_+_\hat))+(\beta-_1)(\psi(\hat)_-_\psi(\hat_+_\hat))-_\ln_\Beta(\alpha,\beta) this_expression_is_identical_to_the_negative_of_the_cross-entropy_(see_section_on_"Quantities_of_information_(entropy)").__Therefore,_finding_the_maximum_of_the_joint_log_likelihood_of_the_shape_parameters,_per_''N''_independent_and_identically_distributed_random_variables, iid_observations,_is_identical_to_finding_the_minimum_of_the_cross-entropy_for_the_beta_distribution,_as_a_function_of_the_shape_parameters. :\frac_=_-_H_=_-h_-_D__=_-\ln\Beta(\alpha,\beta)+(\alpha-1)\psi(\hat)+(\beta-1)\psi(\hat)-(\alpha+\beta-2)\psi(\hat+\hat) with_the_cross-entropy_defined_as_follows: :H_=_\int_^1_-_f(X;\hat,\hat)_\ln_(f(X;\alpha,\beta))_\,_X_
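In practice the coupled digamma equations for \hat{\alpha}, \hat{\beta} are solved numerically. The following sketch, assuming SciPy, solves the score equations with a root finder started at the method-of-moments estimates, and compares the result with scipy.stats.beta.fit holding location and scale fixed; the synthetic data parameters are arbitrary.

# Maximum-likelihood fit of (alpha, beta) on [0, 1]: solve the coupled digamma equations numerically.
import numpy as np
from scipy import optimize, special, stats

rng = np.random.default_rng(6)
x = rng.beta(2.0, 3.0, size=20_000)                        # synthetic data, arbitrary true parameters
log_gx, log_g1mx = np.log(x).mean(), np.log1p(-x).mean()   # logs of the two sample geometric means

def score(params):
    a, b = params
    return [special.digamma(a) - special.digamma(a + b) - log_gx,
            special.digamma(b) - special.digamma(a + b) - log_g1mx]

# Method-of-moments estimates serve as the starting point for the iteration.
m, v = x.mean(), x.var(ddof=1)
start = [m * (m * (1 - m) / v - 1), (1 - m) * (m * (1 - m) / v - 1)]
print("MLE via root finding:", optimize.fsolve(score, start))

# scipy's built-in fit gives the same answer when location and scale are held fixed.
print("scipy.stats.beta.fit:", stats.beta.fit(x, floc=0, fscale=1)[:2])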


Four unknown parameters

= The_procedure_is_similar_to_the_one_followed_in_the_two_unknown_parameter_case._If_''Y''1,_...,_''YN''_are_independent_random_variables_each_having_a_beta_distribution_with_four_parameters,_the_joint_log_likelihood_function_for_''N''_independent_and_identically_distributed_random_variables, iid_observations_is: :\begin \ln\,_\mathcal_(\alpha,_\beta,_a,_c\mid_Y)_&=_\sum_^N_\ln\,\mathcal_i_(\alpha,_\beta,_a,_c\mid_Y_i)\\ &=_\sum_^N_\ln\,f(Y_i;_\alpha,_\beta,_a,_c)_\\ &=_\sum_^N_\ln\,\frac\\ &=_(\alpha_-_1)\sum_^N__\ln_(Y_i_-_a)_+_(\beta-_1)\sum_^N__\ln_(c_-_Y_i)-_N_\ln_\Beta(\alpha,\beta)_-_N_(\alpha+\beta_-_1)_\ln_(c_-_a) \end Finding_the_maximum_with_respect_to_a_shape_parameter_involves_taking_the_partial_derivative_with_respect_to_the_shape_parameter_and_setting_the_expression_equal_to_zero_yielding_the_maximum_likelihood_estimator_of_the_shape_parameters: :\frac=_\sum_^N__\ln_(Y_i_-_a)_-_N(-\psi(\alpha_+_\beta)_+_\psi(\alpha))-_N_\ln_(c_-_a)=_0 :\frac_=_\sum_^N__\ln_(c_-_Y_i)_-_N(-\psi(\alpha_+_\beta)__+_\psi(\beta))-_N_\ln_(c_-_a)=_0 :\frac_=_-(\alpha_-_1)_\sum_^N__\frac_\,+_N_(\alpha+\beta_-_1)\frac=_0 :\frac_=_(\beta-_1)_\sum_^N__\frac_\,-_N_(\alpha+\beta_-_1)_\frac_=_0 these_equations_can_be_re-arranged_as_the_following_system_of_four_coupled_equations_(the_first_two_equations_are_geometric_means_and_the_second_two_equations_are_the_harmonic_means)_in_terms_of_the_maximum_likelihood_estimates_for_the_four_parameters_\hat,_\hat,_\hat,_\hat: :\frac\sum_^N__\ln_\frac_=_\psi(\hat)-\psi(\hat_+\hat_)=__\ln_\hat_X :\frac\sum_^N__\ln_\frac_=__\psi(\hat)-\psi(\hat_+_\hat)=__\ln_\hat_ :\frac_=_\frac=__\hat_X :\frac_=_\frac_=__\hat_ with_sample_geometric_means: :\hat_X_=_\prod_^_\left_(\frac_\right_)^ :\hat__=_\prod_^_\left_(\frac_\right_)^ The_parameters_\hat,_\hat_are_embedded_inside_the_geometric_mean_expressions_in_a_nonlinear_way_(to_the_power_1/''N'').__This_precludes,_in_general,_a_closed_form_solution,_even_for_an_initial_value_approximation_for_iteration_purposes.__One_alternative_is_to_use_as_initial_values_for_iteration_the_values_obtained_from_the_method_of_moments_solution_for_the_four_parameter_case.__Furthermore,_the_expressions_for_the_harmonic_means_are_well-defined_only_for_\hat,_\hat_>_1,_which_precludes_a_maximum_likelihood_solution_for_shape_parameters_less_than_unity_in_the_four-parameter_case._Fisher's_information_matrix_for_the_four_parameter_case_is_Positive-definite_matrix, positive-definite_only_for_α,_β_>_2_(for_further_discussion,_see_section_on_Fisher_information_matrix,_four_parameter_case),_for_bell-shaped_(symmetric_or_unsymmetric)_beta_distributions,_with_inflection_points_located_to_either_side_of_the_mode._The_following_Fisher_information_components_(that_represent_the_expectations_of_the_curvature_of_the_log_likelihood_function)_have_mathematical_singularity, singularities_at_the_following_values: :\alpha_=_2:_\quad_\operatorname_\left_[-_\frac_\frac_\right_]=__ :\beta_=_2:_\quad_\operatorname\left_[-_\frac_\frac_\right_]_=__ :\alpha_=_2:_\quad_\operatorname\left_[-_\frac\frac\right_]_=___ :\beta_=_1:_\quad_\operatorname\left_[-_\frac\frac_\right_]_=____ (for_further_discussion_see_section_on_Fisher_information_matrix)._Thus,_it_is_not_possible_to_strictly_carry_on_the_maximum_likelihood_estimation_for_some_well_known_distributions_belonging_to_the_four-parameter_beta_distribution_family,_like_the_continuous_uniform_distribution, 
uniform_distribution_(Beta(1,_1,_''a'',_''c'')),_and_the__arcsine_distribution_(Beta(1/2,_1/2,_''a'',_''c'')).__Norman_Lloyd_Johnson, N.L.Johnson_and_Samuel_Kotz, S.Kotz_ignore_the_equations_for_the_harmonic_means_and_instead_suggest_"If_a_and_c_are_unknown,_and_maximum_likelihood_estimators_of_''a'',_''c'',_α_and_β_are_required,_the_above_procedure_(for_the_two_unknown_parameter_case,_with_''X''_transformed_as_''X''_=_(''Y'' − ''a'')/(''c'' − ''a''))_can_be_repeated_using_a_succession_of_trial_values_of_''a''_and_''c'',_until_the_pair_(''a'',_''c'')_for_which_maximum_likelihood_(given_''a''_and_''c'')_is_as_great_as_possible,_is_attained"_(where,_for_the_purpose_of_clarity,_their_notation_for_the_parameters_has_been_translated_into_the_present_notation).


Fisher information matrix

Let_a_random_variable_X_have_a_probability_density_''f''(''x'';''α'')._The_partial_derivative_with_respect_to_the_(unknown,_and_to_be_estimated)_parameter_α_of_the_log_likelihood_function_ The_likelihood_function_(often_simply_called_the_likelihood)_represents_the_probability_of__random_variable_realizations_conditional_on_particular_values_of_the__statistical_parameters._Thus,_when_evaluated_on_a__given_sample,_the_likelihood_funct_...
_is_called_the_score_(statistics), score.__The_second_moment_of_the_score_is_called_the_Fisher_information_ In_mathematical_statistics,_the_Fisher_information_(sometimes_simply_called_information)_is_a_way_of_measuring_the_amount_of_information_that_an_observable_random_variable_''X''_carries_about_an_unknown_parameter_''θ''_of_a_distribution_that_model_...
: :\mathcal(\alpha)=\operatorname_\left_[\left_(\frac_\ln_\mathcal(\alpha\mid_X)_\right_)^2_\right], The_expected_value, expectation_of_the_score_(statistics), score_is_zero,_therefore_the_Fisher_information_is_also_the_second_moment_centered_on_the_mean_of_the_score:_the_variance__ In_probability_theory_and_statistics,_variance_is_the__expectation_of_the_squared__deviation_of_a__random_variable_from_its__population_mean_or__sample_mean._Variance_is_a_measure_of_dispersion,_meaning_it_is_a_measure_of_how_far_a_set_of_numbe_...
_of_the_score. If_the_log_likelihood_function_ The_likelihood_function_(often_simply_called_the_likelihood)_represents_the_probability_of__random_variable_realizations_conditional_on_particular_values_of_the__statistical_parameters._Thus,_when_evaluated_on_a__given_sample,_the_likelihood_funct_...
_is_twice_differentiable_with_respect_to_the_parameter_α,_and_under_certain_regularity_conditions,
_then_the_Fisher_information_may_also_be_written_as_follows_(which_is_often_a_more_convenient_form_for_calculation_purposes): :\mathcal(\alpha)_=_-_\operatorname_\left_[\frac_\ln_(\mathcal(\alpha\mid_X))_\right]. Thus,_the_Fisher_information_is_the_negative_of_the_expectation_of_the_second_derivative__with_respect_to_the_parameter_α_of_the_log_likelihood_function_ The_likelihood_function_(often_simply_called_the_likelihood)_represents_the_probability_of__random_variable_realizations_conditional_on_particular_values_of_the__statistical_parameters._Thus,_when_evaluated_on_a__given_sample,_the_likelihood_funct_...
._Therefore,_Fisher_information_is_a_measure_of_the_curvature_of_the_log_likelihood_function_of_α._A_low_curvature_(and_therefore_high_Radius_of_curvature_(mathematics), radius_of_curvature),_flatter_log_likelihood_function_curve_has_low_Fisher_information;_while_a_log_likelihood_function_curve_with_large_curvature_(and_therefore_low_Radius_of_curvature_(mathematics), radius_of_curvature)_has_high_Fisher_information._When_the_Fisher_information_matrix_is_computed_at_the_evaluates_of_the_parameters_("the_observed_Fisher_information_matrix")_it_is_equivalent_to_the_replacement_of_the_true_log_likelihood_surface_by_a_Taylor's_series_approximation,_taken_as_far_as_the_quadratic_terms.
__The_word_information,_in_the_context_of_Fisher_information,_refers_to_information_about_the_parameters._Information_such_as:_estimation,_sufficiency_and_properties_of_variances_of_estimators.__The_Cramér–Rao_bound_states_that_the_inverse_of_the_Fisher_information_is_a_lower_bound_on_the_variance_of_any_
estimator In statistics, an estimator is a rule for calculating an estimate of a given quantity based on observed data: thus the rule (the estimator), the quantity of interest (the estimand) and its result (the estimate) are distinguished. For example, the ...
_of_a_parameter_α: :\operatorname[\hat\alpha]_\geq_\frac. The_precision_to_which_one_can_estimate_the_estimator_of_a_parameter_α_is_limited_by_the_Fisher_Information_of_the_log_likelihood_function._The_Fisher_information_is_a_measure_of_the_minimum_error_involved_in_estimating_a_parameter_of_a_distribution_and_it_can_be_viewed_as_a_measure_of_the_resolving_power_of_an_experiment_needed_to_discriminate_between_two_alternative_hypothesis_of_a_parameter.
When_there_are_''N''_parameters :_\begin_\theta_1_\\_\theta__\\_\dots_\\_\theta__\end, then_the_Fisher_information_takes_the_form_of_an_''N''×''N''_positive_semidefinite_matrix, positive_semidefinite_symmetric_matrix,_the_Fisher_Information_Matrix,_with_typical_element: :_=\operatorname_\left_[\left_(\frac_\ln_\mathcal_\right)_\left(\frac_\ln_\mathcal_\right)_\right_]. Under_certain_regularity_conditions,_the_Fisher_Information_Matrix_may_also_be_written_in_the_following_form,_which_is_often_more_convenient_for_computation: :__=_-_\operatorname_\left_[\frac_\ln_(\mathcal)_\right_]\,. With_''X''1,_...,_''XN''_iid_random_variables,_an_''N''-dimensional_"box"_can_be_constructed_with_sides_''X''1,_...,_''XN''._Costa_and_Cover
__show_that_the_(Shannon)_differential_entropy_''h''(''X'')_is_related_to_the_volume_of_the_typical_set_(having_the_sample_entropy_close_to_the_true_entropy),_while_the_Fisher_information_is_related_to_the_surface_of_this_typical_set.


Two parameters

For ''X''1, ..., ''X''''N'' independent random variables each having a beta distribution parametrized with shape parameters ''α'' and ''β'', the joint log likelihood function for ''N'' iid observations is:
:\ln (\mathcal{L} (\alpha, \beta\mid X) )= (\alpha - 1)\sum_{i=1}^N \ln X_i + (\beta- 1)\sum_{i=1}^N \ln (1-X_i)- N \ln \Beta(\alpha,\beta)
therefore the joint log likelihood function per ''N'' iid observations is:
:\frac{1}{N} \ln(\mathcal{L} (\alpha, \beta\mid X)) = (\alpha - 1)\frac{1}{N}\sum_{i=1}^N \ln X_i + (\beta- 1)\frac{1}{N}\sum_{i=1}^N \ln (1-X_i)- \ln \Beta(\alpha,\beta)
For the two parameter case, the Fisher information has 4 components: 2 diagonal and 2 off-diagonal. Since the Fisher information matrix is symmetric, one of these off-diagonal components is independent. Therefore, the Fisher information matrix has 3 independent components (2 diagonal and 1 off-diagonal).
Aryal and Nadarajah calculated Fisher's information matrix for the four-parameter case, from which the two parameter case can be obtained as follows:
:- \frac{\partial^2\ln\mathcal{L}(\alpha,\beta\mid X)}{N\,\partial \alpha^2}= \operatorname{var}[\ln X]= \psi_1(\alpha) - \psi_1(\alpha + \beta) = \mathcal{I}_{\alpha,\alpha}= \operatorname{E}\left[- \frac{\partial^2\ln\mathcal{L}(\alpha,\beta\mid X)}{N\,\partial \alpha^2} \right] = \ln \operatorname{var}_{GX}
:- \frac{\partial^2\ln\mathcal{L}(\alpha,\beta\mid X)}{N\,\partial \beta^2} = \operatorname{var}[\ln (1-X)] = \psi_1(\beta) - \psi_1(\alpha + \beta) = \mathcal{I}_{\beta,\beta}= \operatorname{E}\left[- \frac{\partial^2\ln\mathcal{L}(\alpha,\beta\mid X)}{N\,\partial \beta^2} \right]= \ln \operatorname{var}_{G(1-X)}
:- \frac{\partial^2\ln\mathcal{L}(\alpha,\beta\mid X)}{N\,\partial \alpha\,\partial \beta} = \operatorname{cov}[\ln X,\ln(1-X)] = -\psi_1(\alpha+\beta) = \mathcal{I}_{\alpha,\beta}= \operatorname{E}\left[- \frac{\partial^2\ln\mathcal{L}(\alpha,\beta\mid X)}{N\,\partial \alpha\,\partial \beta} \right] = \ln \operatorname{cov}_{G\,X,(1-X)}
Since the Fisher information matrix is symmetric:
: \mathcal{I}_{\alpha,\beta}= \mathcal{I}_{\beta,\alpha}= \ln \operatorname{cov}_{G\,X,(1-X)}
The Fisher information components are equal to the log geometric variances and log geometric covariance. Therefore, they can be expressed as trigamma functions, denoted ψ1(α), the second of the polygamma functions, defined as the derivative of the digamma function:
:\psi_1(\alpha) = \frac{d^2\ln\Gamma(\alpha)}{d\alpha^2}= \frac{d\,\psi(\alpha)}{d\alpha}.
These derivatives are also derived in the section "Maximum likelihood, Two unknown parameters", and plots of the log likelihood function are also shown in that section. The section on the geometric variance and covariance contains plots and further discussion of the Fisher information matrix components, the log geometric variances and log geometric covariance, as a function of the shape parameters α and β, and the section on moments of logarithmically transformed random variables contains formulas for those moments. Images for the Fisher information components \mathcal{I}_{\alpha,\alpha}, \mathcal{I}_{\beta,\beta} and \mathcal{I}_{\alpha,\beta} are shown there as well.
The determinant of Fisher's information matrix is of interest (for example for the calculation of Jeffreys prior probability). From the expressions for the individual components of the Fisher information matrix, it follows that the determinant of Fisher's (symmetric) information matrix for the beta distribution is:
:\begin{align}
\det(\mathcal{I}(\alpha, \beta))&= \mathcal{I}_{\alpha,\alpha} \mathcal{I}_{\beta,\beta}-\mathcal{I}_{\alpha,\beta} \mathcal{I}_{\beta,\alpha} \\
&=(\psi_1(\alpha) - \psi_1(\alpha + \beta))(\psi_1(\beta) - \psi_1(\alpha + \beta))-( -\psi_1(\alpha+\beta))( -\psi_1(\alpha+\beta))\\
&= \psi_1(\alpha)\psi_1(\beta)-( \psi_1(\alpha)+\psi_1(\beta))\psi_1(\alpha + \beta)\\
\lim_{\alpha\to 0} \det(\mathcal{I}(\alpha, \beta)) &=\lim_{\beta \to 0} \det(\mathcal{I}(\alpha, \beta)) = \infty\\
\lim_{\alpha\to \infty} \det(\mathcal{I}(\alpha, \beta)) &=\lim_{\beta \to \infty} \det(\mathcal{I}(\alpha, \beta)) = 0
\end{align}
From Sylvester's criterion (checking whether the diagonal elements are all positive), it follows that the Fisher information matrix for the two parameter case is positive-definite (under the standard condition that the shape parameters are positive, ''α'' > 0 and ''β'' > 0). A short numerical evaluation of these components follows.
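The components above are simple trigamma expressions and can be evaluated directly; a minimal sketch assuming SciPy (the (α, β) values are arbitrary):

# Evaluate the 2x2 Fisher information matrix of Beta(alpha, beta) via trigamma functions (arbitrary values).
import numpy as np
from scipy.special import polygamma

def beta_fisher_information(a, b):
    trigamma = lambda z: polygamma(1, z)           # psi_1, the trigamma function
    i_aa = trigamma(a) - trigamma(a + b)           # var[ln X]
    i_bb = trigamma(b) - trigamma(a + b)           # var[ln (1-X)]
    i_ab = -trigamma(a + b)                        # cov[ln X, ln(1-X)]
    return np.array([[i_aa, i_ab], [i_ab, i_bb]])

I = beta_fisher_information(2.0, 3.0)
print(I)
print("determinant:", np.linalg.det(I))            # equals psi1(a)psi1(b) - (psi1(a)+psi1(b))psi1(a+b)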


Four parameters

= If_''Y''1,_...,_''YN''_are_independent_random_variables_each_having_a_beta_distribution_with_four_parameters:_the_exponents_''α''_and_''β'',_and_also_''a''_(the_minimum_of_the_distribution_range),_and_''c''_(the_maximum_of_the_distribution_range)_(section_titled_"Alternative_parametrizations",_"Four_parameters"),_with_probability_density_function_ In_probability_theory,_a_probability_density_function_(PDF),_or_density_of_a_continuous_random_variable,_is_a__function_whose_value_at_any_given_sample_(or_point)_in_the__sample_space_(the_set_of_possible_values_taken_by_the_random_variable)_ca_...
: :f(y;_\alpha,_\beta,_a,_c)_=_\frac_=\frac=\frac. the_joint_log_likelihood_function_per_''N''_independent_and_identically_distributed_random_variables, iid_observations_is: :\frac_\ln(\mathcal_(\alpha,_\beta,_a,_c\mid_Y))=_\frac\sum_^N__\ln_(Y_i_-_a)_+_\frac\sum_^N__\ln_(c_-_Y_i)-_\ln_\Beta(\alpha,\beta)_-_(\alpha+\beta_-1)_\ln_(c-a)_ For_the_four_parameter_case,_the_Fisher_information_has_4*4=16_components.__It_has_12_off-diagonal_components_=_(4×4_total_−_4_diagonal)._Since_the_Fisher_information_matrix_is_symmetric,_half_of_these_components_(12/2=6)_are_independent._Therefore,_the_Fisher_information_matrix_has_6_independent_off-diagonal_+_4_diagonal_=_10_independent_components.__Aryal_and_Nadarajah_calculated_Fisher's_information_matrix_for_the_four_parameter_case_as_follows: :-_\frac_\frac=__\operatorname[\ln_(X)]=_\psi_1(\alpha)_-_\psi_1(\alpha_+_\beta)_=_\mathcal_=_\operatorname\left_[-_\frac_\frac_\right_]_=_\ln_(\operatorname)_ :-\frac_\frac_=_\operatorname_ln_(1-X)=_\psi_1(\beta)_-_\psi_1(\alpha_+_\beta)_=_=__\operatorname_\left_[-_\frac_\frac_\right_]_=_\ln(\operatorname)_ :-\frac_\frac_=_\operatorname[\ln_X,(1-X)]__=_-\psi_1(\alpha+\beta)_=\mathcal_=__\operatorname_\left_[-_\frac\frac_\right_]_=_\ln(\operatorname_) In_the_above_expressions,_the_use_of_''X''_instead_of_''Y''_in_the_expressions_var[ln(''X'')]_=_ln(var''GX'')_is_''not_an_error''._The_expressions_in_terms_of_the_log_geometric_variances_and_log_geometric_covariance_occur_as_functions_of_the_two_parameter_''X''_~_Beta(''α'',_''β'')_parametrization_because_when_taking_the_partial_derivatives_with_respect_to_the_exponents_(''α'',_''β'')_in_the_four_parameter_case,_one_obtains_the_identical_expressions_as_for_the_two_parameter_case:_these_terms_of_the_four_parameter_Fisher_information_matrix_are_independent_of_the_minimum_''a''_and_maximum_''c''_of_the_distribution's_range._The_only_non-zero_term_upon_double_differentiation_of_the_log_likelihood_function_with_respect_to_the_exponents_''α''_and_''β''_is_the_second_derivative_of_the_log_of_the_beta_function:_ln(B(''α'',_''β''))._This_term_is_independent_of_the_minimum_''a''_and_maximum_''c''_of_the_distribution's_range._Double_differentiation_of_this_term_results_in_trigamma_functions.__The_sections_titled_"Maximum_likelihood",_"Two_unknown_parameters"_and_"Four_unknown_parameters"_also_show_this_fact. The_Fisher_information_for_''N''_i.i.d._samples_is_''N''_times_the_individual_Fisher_information_(eq._11.279,_page_394_of_Cover_and_Thomas).__(Aryal_and_Nadarajah_take_a_single_observation,_''N''_=_1,_to_calculate_the_following_components_of_the_Fisher_information,_which_leads_to_the_same_result_as_considering_the_derivatives_of_the_log_likelihood_per_''N''_observations._Moreover,_below_the_erroneous_expression_for___in_Aryal_and_Nadarajah_has_been_corrected.) 
:\begin \alpha_>_2:_\quad_\operatorname\left_[-_\frac_\frac_\right_]_&=__=\frac_\\ \beta_>_2:_\quad_\operatorname\left[-\frac_\frac_\right_]_&=_\mathcal__=_\frac_\\ \operatorname\left[-_\frac_\frac_\right_]_&=____=_\frac_\\ \alpha_>_1:_\quad_\operatorname\left[-_\frac_\frac_\right_]_&=\mathcal___=_\frac_\\ \operatorname\left[-_\frac_\frac_\right_]_&=___=_\frac_\\ \operatorname\left[-_\frac_\frac_\right_]_&=___=_-\frac_\\ \beta_>_1:_\quad_\operatorname\left[-_\frac_\frac_\right_]_&=_\mathcal___=_-\frac \end The_lower_two_diagonal_entries_of_the_Fisher_information_matrix,_with_respect_to_the_parameter_"a"_(the_minimum_of_the_distribution's_range):_\mathcal_,_and_with_respect_to_the_parameter_"c"_(the_maximum_of_the_distribution's_range):_\mathcal__are_only_defined_for_exponents_α_>_2_and_β_>_2_respectively._The_Fisher_information_matrix_component_\mathcal__for_the_minimum_"a"_approaches_infinity_for_exponent_α_approaching_2_from_above,_and_the_Fisher_information_matrix_component_\mathcal__for_the_maximum_"c"_approaches_infinity_for_exponent_β_approaching_2_from_above. The_Fisher_information_matrix_for_the_four_parameter_case_does_not_depend_on_the_individual_values_of_the_minimum_"a"_and_the_maximum_"c",_but_only_on_the_total_range_(''c''−''a'').__Moreover,_the_components_of_the_Fisher_information_matrix_that_depend_on_the_range_(''c''−''a''),_depend_only_through_its_inverse_(or_the_square_of_the_inverse),_such_that_the_Fisher_information_decreases_for_increasing_range_(''c''−''a''). The_accompanying_images_show_the_Fisher_information_components_\mathcal__and_\mathcal_._Images_for_the_Fisher_information_components_\mathcal__and_\mathcal__are_shown_in__.__All_these_Fisher_information_components_look_like_a_basin,_with_the_"walls"_of_the_basin_being_located_at_low_values_of_the_parameters. The_following_four-parameter-beta-distribution_Fisher_information_components_can_be_expressed_in_terms_of_the_two-parameter:_''X''_~_Beta(α,_β)_expectations_of_the_transformed_ratio_((1-''X'')/''X'')_and_of_its_mirror_image_(''X''/(1-''X'')),_scaled_by_the_range_(''c''−''a''),_which_may_be_helpful_for_interpretation: :\mathcal__=\frac=_\frac_\text\alpha_>_1 :\mathcal__=_-\frac=-_\frac\text\beta>_1 These_are_also_the_expected_values_of_the_"inverted_beta_distribution"_or_beta_prime_distribution_ In_probability_theory_and__statistics,_the_beta_prime_distribution_(also_known_as_inverted_beta_distribution_or_beta_distribution_of_the_second_kindJohnson_et_al_(1995),_p_248)_is_an_absolutely_continuous_probability_distribution. __Definitions_ _...
_(also_known_as_beta_distribution_of_the_second_kind_or_Pearson_distribution, Pearson's_Type_VI)__and_its_mirror_image,_scaled_by_the_range_(''c'' − ''a''). Also,_the_following_Fisher_information_components_can_be_expressed_in_terms_of_the_harmonic_(1/X)_variances_or_of_variances_based_on_the_ratio_transformed_variables_((1-X)/X)_as_follows: :\begin \alpha_>_2:_\quad_\mathcal__&=\operatorname_\left_[\frac_\right]_\left_(\frac_\right_)^2_=\operatorname_\left_[\frac_\right_]_\left_(\frac_\right)^2_=_\frac_\\ \beta_>_2:_\quad_\mathcal__&=_\operatorname_\left_[\frac_\right_]_\left_(\frac_\right_)^2_=_\operatorname_\left_[\frac_\right_]_\left_(\frac_\right_)^2__=\frac__\\ \mathcal__&=\operatorname_\left_[\frac,\frac_\right_]\frac__=_\operatorname_\left_[\frac,\frac_\right_]_\frac_=\frac \end See_section_"Moments_of_linearly_transformed,_product_and_inverted_random_variables"_for_these_expectations. The_determinant_of_Fisher's_information_matrix_is_of_interest_(for_example_for_the_calculation_of_Jeffreys_prior_probability).__From_the_expressions_for_the_individual_components,_it_follows_that_the_determinant_of_Fisher's_(symmetric)_information_matrix_for_the_beta_distribution_with_four_parameters_is: :\begin \det(\mathcal(\alpha,\beta,a,c))_=__&_-\mathcal_^2_\mathcal__\mathcal_+\mathcal__\mathcal__\mathcal__\mathcal_+\mathcal_^2_\mathcal_^2_-\mathcal__\mathcal__\mathcal_^2\\ &__-\mathcal__\mathcal__\mathcal__\mathcal_+\mathcal_^2_\mathcal__\mathcal_+2_\mathcal__\mathcal__\mathcal__\mathcal_\\ &_-2\mathcal__\mathcal__\mathcal__\mathcal_+\mathcal_^2_\mathcal_^2-\mathcal__\mathcal__\mathcal_^2+\mathcal__\mathcal_^2_\mathcal_\\ &_-\mathcal__\mathcal__\mathcal__\mathcal_-\mathcal__\mathcal__\mathcal__\mathcal_+\mathcal__\mathcal__\mathcal__\mathcal_\\ &_-\mathcal__\mathcal__\mathcal__\mathcal_+\mathcal__\mathcal__\mathcal__\mathcal_-\mathcal__\mathcal_^2_\mathcal_\\ &_+2_\mathcal__\mathcal__\mathcal__\mathcal_-\mathcal__\mathcal_^2_\mathcal_-\mathcal_^2_\mathcal__\mathcal_+\mathcal__\mathcal__\mathcal__\mathcal_\text\alpha,_\beta>_2 \end Using_Sylvester's_criterion_(checking_whether_the_diagonal_elements_are_all_positive),_and_since_diagonal_components___and___have_Mathematical_singularity, singularities_at_α=2_and_β=2_it_follows_that_the_Fisher_information_matrix_for_the_four_parameter_case_is_Positive-definite_matrix, positive-definite_for_α>2_and_β>2.__Since_for_α_>_2_and_β_>_2_the_beta_distribution_is_(symmetric_or_unsymmetric)_bell_shaped,_it_follows_that_the_Fisher_information_matrix_is_positive-definite_only_for_bell-shaped_(symmetric_or_unsymmetric)_beta_distributions,_with_inflection_points_located_to_either_side_of_the_mode._Thus,_important_well_known_distributions_belonging_to_the_four-parameter_beta_distribution_family,_like_the_parabolic_distribution_(Beta(2,2,a,c))_and_the_continuous_uniform_distribution, uniform_distribution_(Beta(1,1,a,c))_have_Fisher_information_components_(\mathcal_,\mathcal_,\mathcal_,\mathcal_)_that_blow_up_(approach_infinity)_in_the_four-parameter_case_(although_their_Fisher_information_components_are_all_defined_for_the_two_parameter_case).__The_four-parameter_Wigner_semicircle_distribution_(Beta(3/2,3/2,''a'',''c''))_and__arcsine_distribution_(Beta(1/2,1/2,''a'',''c''))_have_negative_Fisher_information_determinants_for_the_four-parameter_case.


Bayesian inference

The use of beta distributions in Bayesian inference is due to the fact that they provide a family of conjugate prior probability distributions for binomial (including Bernoulli) and geometric distributions. The domain of the beta distribution can be viewed as a probability, and in fact the beta distribution is often used to describe the distribution of a probability value ''p'':
:P(p;\alpha,\beta) = \frac{p^{\alpha-1}(1-p)^{\beta-1}}{\Beta(\alpha,\beta)}.
Examples of beta distributions used as prior probabilities to represent ignorance of prior parameter values in Bayesian inference are Beta(1,1), Beta(0,0) and Beta(1/2,1/2). A minimal conjugate-update sketch follows.
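A minimal sketch of the conjugate update that motivates this use, assuming SciPy: a Beta(α, β) prior on ''p'' combined with ''s'' successes in ''n'' Bernoulli trials gives a Beta(α+''s'', β+''n''−''s'') posterior. The prior and the data below are arbitrary.

# Minimal conjugate update: Beta prior + Bernoulli/binomial data -> Beta posterior (arbitrary numbers).
from scipy import stats

prior_a, prior_b = 1.0, 1.0        # Bayes-Laplace uniform prior Beta(1, 1)
successes, trials = 7, 10          # observed data

post_a, post_b = prior_a + successes, prior_b + (trials - successes)
posterior = stats.beta(post_a, post_b)
print("posterior:", (post_a, post_b))
print("posterior mean:", posterior.mean())
print("95% credible interval:", posterior.interval(0.95))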


Rule of succession

A classic application of the beta distribution is the rule of succession, introduced in the 18th century by Pierre-Simon Laplace in the course of treating the sunrise problem. It states that, given ''s'' successes in ''n'' conditionally independent Bernoulli trials with probability ''p'', the estimate of the expected value in the next trial is \frac{s+1}{n+2}. This estimate is the expected value of the posterior distribution over ''p'', namely Beta(''s''+1, ''n''−''s''+1), which is given by Bayes' rule if one assumes a uniform prior probability over ''p'' (i.e., Beta(1, 1)) and then observes that ''p'' generated ''s'' successes in ''n'' trials. Laplace's rule of succession has been criticized by prominent scientists. R. T. Cox described Laplace's application of the rule of succession to the sunrise problem (p. 89) as "a travesty of the proper use of the principle." Keynes remarks (Ch. XXX, p. 382) "indeed this is so foolish a theorem that to entertain it is discreditable." Karl Pearson showed that the probability that the next (''n'' + 1) trials will be successes, after ''n'' successes in ''n'' trials, is only 50%, which has been considered too low by scientists like Jeffreys and unacceptable as a representation of the scientific process of experimentation to test a proposed scientific law. As pointed out by Jeffreys (p. 128) (crediting C. D. Broad), Laplace's rule of succession establishes a high probability of success ((''n''+1)/(''n''+2)) in the next trial, but only a moderate probability (50%) that a further sample of comparable size (''n''+1) will be equally successful. As pointed out by Perks, "The rule of succession itself is hard to accept. It assigns a probability to the next trial which implies the assumption that the actual run observed is an average run and that we are always at the end of an average run. It would, one would think, be more reasonable to assume that we were in the middle of an average run. Clearly a higher value for both probabilities is necessary if they are to accord with reasonable belief." These problems with Laplace's rule of succession motivated Haldane, Perks, Jeffreys and others to search for other forms of prior probability (see the following sections). According to Jaynes, the main problem with the rule of succession is that it is not valid when ''s'' = 0 or ''s'' = ''n'' (see rule of succession, for an analysis of its validity). A short numerical illustration follows.


Bayes-Laplace prior probability (Beta(1,1))

The_beta_distribution_achieves_maximum_differential_entropy_for_Beta(1,1):_the_Uniform_density, uniform_probability_density,_for_which_all_values_in_the_domain_of_the_distribution_have_equal_density.__This_uniform_distribution_Beta(1,1)_was_suggested_("with_a_great_deal_of_doubt")_by_Thomas_Bayes_as_the_prior_probability_distribution_to_express_ignorance_about_the_correct_prior_distribution._This_prior_distribution_was_adopted_(apparently,_from_his_writings,_with_little_sign_of_doubt)_by_Pierre-Simon_Laplace,_and_hence_it_was_also_known_as_the_"Bayes-Laplace_rule"_or_the_"Laplace_rule"_of_"inverse_probability"_in_publications_of_the_first_half_of_the_20th_century._In_the_later_part_of_the_19th_century_and_early_part_of_the_20th_century,_scientists_realized_that_the_assumption_of_uniform_"equal"_probability_density_depended_on_the_actual_functions_(for_example_whether_a_linear_or_a_logarithmic_scale_was_most_appropriate)_and_parametrizations_used.__In_particular,_the_behavior_near_the_ends_of_distributions_with_finite_support_(for_example_near_''x''_=_0,_for_a_distribution_with_initial_support_at_''x''_=_0)_required_particular_attention._Keynes_(_Ch.XXX,_p. 381)_criticized_the_use_of_Bayes's_uniform_prior_probability_(Beta(1,1))_that_all_values_between_zero_and_one_are_equiprobable,_as_follows:_"Thus_experience,_if_it_shows_anything,_shows_that_there_is_a_very_marked_clustering_of_statistical_ratios_in_the_neighborhoods_of_zero_and_unity,_of_those_for_positive_theories_and_for_correlations_between_positive_qualities_in_the_neighborhood_of_zero,_and_of_those_for_negative_theories_and_for_correlations_between_negative_qualities_in_the_neighborhood_of_unity._"


Haldane's prior probability (Beta(0,0))

The Beta(0,0) distribution was proposed by J.B.S. Haldane, who suggested that the prior probability representing complete uncertainty should be proportional to 1/(''p''(1 − ''p'')). The function 1/(''p''(1 − ''p'')) can be viewed as the limit of the numerator of the beta distribution as both shape parameters approach zero: α, β → 0. The Beta function (in the denominator of the beta distribution) approaches infinity as both parameters approach zero, α, β → 0. Therefore, 1/(''p''(1 − ''p'')) divided by the Beta function approaches a 2-point Bernoulli distribution with equal probability 1/2 at each end, at 0 and 1, and nothing in between, as α, β → 0: a coin toss, with one face of the coin at 0 and the other face at 1. The Haldane prior probability distribution Beta(0,0) is an "improper prior" because its integration (from 0 to 1) fails to strictly converge to 1 due to the singularities at each end. However, this is not an issue for computing posterior probabilities unless the sample size is very small. Furthermore, Zellner points out that on the log-odds scale (the logit transformation ln(''p''/(1 − ''p''))), the Haldane prior is the uniformly flat prior. The fact that a uniform prior probability on the logit-transformed variable ln(''p''/(1 − ''p'')) (with domain (−∞, ∞)) is equivalent to the Haldane prior on the domain [0, 1] was pointed out by Harold Jeffreys in the first edition (1939) of his book Theory of Probability (p. 123). Jeffreys writes "Certainly if we take the Bayes-Laplace rule right up to the extremes we are led to results that do not correspond to anybody's way of thinking. The (Haldane) rule d''x''/(''x''(1 − ''x'')) goes too far the other way. It would lead to the conclusion that if a sample is of one type with respect to some property there is a probability 1 that the whole population is of that type." The fact that "uniform" depends on the parametrization led Jeffreys to seek a form of prior that would be invariant under different parametrizations.
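A small symbolic check of the change of variables mentioned above, assuming SymPy is available: a flat density on the log-odds scale pulls back to the Haldane kernel 1/(''p''(1 − ''p'')) on (0, 1).

<syntaxhighlight lang="python">
# Change of variables: q = ln(p/(1-p)) flat in q  =>  induced density |dq/dp| = 1/(p(1-p)).
import sympy as sp

p = sp.symbols('p', positive=True)
q = sp.log(p / (1 - p))                  # logit transformation
jacobian = sp.diff(q, p)                 # induced density on p for a flat prior in q
print(sp.simplify(jacobian - 1 / (p * (1 - p))))   # prints 0
</syntaxhighlight>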


Jeffreys' prior probability (Beta(1/2,1/2) for a Bernoulli or for a binomial distribution)

Harold Jeffreys proposed to use an uninformative prior probability measure that should be invariant under reparameterization: proportional to the square root of the determinant of Fisher's information matrix. For the Bernoulli distribution, this can be shown as follows: for a coin that is "heads" with probability ''p'' ∈ [0, 1] and is "tails" with probability 1 − ''p'', for a given (''H'', ''T'') ∈ {(0,1), (1,0)} the probability is ''p''^''H''(1 − ''p'')^''T''. Since ''T'' = 1 − ''H'', the Bernoulli distribution is ''p''^''H''(1 − ''p'')^(1 − ''H''). Considering ''p'' as the only parameter, it follows that the log likelihood for the Bernoulli distribution is

:\ln \mathcal{L}(p\mid H) = H \ln(p) + (1 - H) \ln(1 - p).

The Fisher information matrix has only one component (it is a scalar, because there is only one parameter: ''p''), therefore:

:\begin{align}
\sqrt{\det(\mathcal{I}(p))} &= \sqrt{\operatorname{E}\!\left[\left(\frac{d}{dp} \ln \mathcal{L}(p\mid H)\right)^2\right]} \\
&= \sqrt{\operatorname{E}\!\left[\left(\frac{H}{p} - \frac{1-H}{1-p}\right)^2\right]} \\
&= \sqrt{p\left(\frac{1}{p} - \frac{0}{1-p}\right)^2 + (1-p)\left(\frac{0}{p} - \frac{1}{1-p}\right)^2} \\
&= \frac{1}{\sqrt{p(1-p)}}.
\end{align}

Similarly, for the binomial distribution with ''n'' Bernoulli trials, it can be shown that

:\sqrt{\det(\mathcal{I}(p))} = \frac{\sqrt{n}}{\sqrt{p(1-p)}}.

Thus, for the Bernoulli and binomial distributions, Jeffreys prior is proportional to \scriptstyle \frac{1}{\sqrt{p(1-p)}}, which happens to be proportional to a beta distribution with domain variable ''x'' = ''p'' and shape parameters α = β = 1/2, the arcsine distribution:

:\operatorname{Beta}(\tfrac{1}{2}, \tfrac{1}{2}) = \frac{1}{\pi\sqrt{x(1-x)}}.

It will be shown in the next section that the normalizing constant for Jeffreys prior is immaterial to the final result because the normalizing constant cancels out in Bayes theorem for the posterior probability. Hence Beta(1/2,1/2) is used as the Jeffreys prior for both Bernoulli and binomial distributions. As shown in the next section, when using this expression as a prior probability times the likelihood in Bayes theorem, the posterior probability turns out to be a beta distribution. It is important to realize, however, that Jeffreys prior is proportional to \scriptstyle \frac{1}{\sqrt{p(1-p)}} for the Bernoulli and binomial distribution, but not for the beta distribution. Jeffreys prior for the beta distribution is given by the determinant of Fisher's information for the beta distribution, which, as shown in the section on the Fisher information matrix, is a function of the trigamma function ψ<sub>1</sub> of shape parameters α and β as follows:

:\begin{align}
\sqrt{\det(\mathcal{I}(\alpha,\beta))} &= \sqrt{\psi_1(\alpha)\,\psi_1(\beta) - (\psi_1(\alpha) + \psi_1(\beta))\,\psi_1(\alpha+\beta)} \\
\lim_{\alpha\to 0}\sqrt{\det(\mathcal{I}(\alpha,\beta))} &= \lim_{\beta\to 0}\sqrt{\det(\mathcal{I}(\alpha,\beta))} = \infty \\
\lim_{\alpha\to \infty}\sqrt{\det(\mathcal{I}(\alpha,\beta))} &= \lim_{\beta\to \infty}\sqrt{\det(\mathcal{I}(\alpha,\beta))} = 0
\end{align}

As previously discussed, Jeffreys prior for the Bernoulli and binomial distributions is proportional to the arcsine distribution Beta(1/2,1/2), a one-dimensional ''curve'' that looks like a basin as a function of the parameter ''p'' of the Bernoulli and binomial distributions. The walls of the basin are formed by ''p'' approaching the singularities at the ends ''p'' → 0 and ''p'' → 1, where Beta(1/2,1/2) approaches infinity. Jeffreys prior for the beta distribution is a ''2-dimensional surface'' (embedded in a three-dimensional space) that looks like a basin with only two of its walls meeting at the corner α = β = 0 (and missing the other two walls) as a function of the shape parameters α and β of the beta distribution. The two adjoining walls of this 2-dimensional surface are formed by the shape parameters α and β approaching the singularities (of the trigamma function) at α, β → 0. It has no walls for α, β → ∞ because in this case the determinant of Fisher's information matrix for the beta distribution approaches zero.

It will be shown in the next section that Jeffreys prior probability results in posterior probabilities (when multiplied by the binomial likelihood function) that are intermediate between the posterior probability results of the Haldane and Bayes prior probabilities.

Jeffreys prior may be difficult to obtain analytically, and for some cases it just doesn't exist (even for simple distribution functions like the asymmetric triangular distribution). Berger, Bernardo and Sun, in a 2009 paper
defined a reference prior probability distribution that (unlike Jeffreys prior) exists for the asymmetric triangular distribution. They cannot obtain a closed-form expression for their reference prior, but numerical calculations show it to be nearly perfectly fitted by the (proper) prior

:\operatorname{Beta}(\tfrac{1}{2}, \tfrac{1}{2}) \sim \frac{1}{\pi\sqrt{\theta(1-\theta)}}

where θ is the vertex variable for the asymmetric triangular distribution with support [0, 1] (corresponding to the following parameter values in Wikipedia's article on the triangular distribution: vertex ''c'' = ''θ'', left end ''a'' = 0, and right end ''b'' = 1). Berger et al. also give a heuristic argument that Beta(1/2,1/2) could indeed be the exact Berger–Bernardo–Sun reference prior for the asymmetric triangular distribution. Therefore, Beta(1/2,1/2) not only is Jeffreys prior for the Bernoulli and binomial distributions, but also seems to be the Berger–Bernardo–Sun reference prior for the asymmetric triangular distribution (for which the Jeffreys prior does not exist), a distribution used in project management and PERT analysis to describe the cost and duration of project tasks.

Clarke and Barron prove that, among continuous positive priors, Jeffreys prior (when it exists) asymptotically maximizes Shannon's mutual information between a sample of size ''n'' and the parameter, and therefore ''Jeffreys prior is the most uninformative prior'' (measuring information as Shannon information). The proof rests on an examination of the Kullback–Leibler divergence between probability density functions for iid random variables.
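A small numerical sketch, assuming NumPy/SciPy are available, checking that the Jeffreys kernel 1/\sqrt{p(1-p)} for the Bernoulli parameter is the Beta(1/2, 1/2) density up to its normalizing constant 1/π.

<syntaxhighlight lang="python">
# Jeffreys prior for the Bernoulli parameter vs. the arcsine density Beta(1/2, 1/2).
import numpy as np
from scipy import stats

p = np.linspace(0.01, 0.99, 9)
jeffreys_kernel = 1.0 / np.sqrt(p * (1.0 - p))      # sqrt of the Fisher information
arcsine_pdf = stats.beta(0.5, 0.5).pdf(p)           # Beta(1/2, 1/2) density
print(np.allclose(arcsine_pdf, jeffreys_kernel / np.pi))   # True
</syntaxhighlight>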


Effect of different prior probability choices on the posterior beta distribution

If samples are drawn from the population of a random variable ''X'' that result in ''s'' successes and ''f'' failures in ''n'' Bernoulli trials ''n'' = ''s'' + ''f'', then the likelihood function for parameters ''s'' and ''f'' given ''x'' = ''p'' (the notation ''x'' = ''p'' in the expressions below will emphasize that the domain ''x'' stands for the value of the parameter ''p'' in the binomial distribution) is the following binomial distribution:

:\mathcal{L}(s,f\mid x=p) = {n \choose s} x^s(1-x)^f = {n \choose s} x^s(1-x)^{n-s}.

If beliefs about prior probability information are reasonably well approximated by a beta distribution with parameters ''α'' Prior and ''β'' Prior, then:

:\operatorname{PriorProbability}(x=p;\alpha \operatorname{Prior},\beta \operatorname{Prior}) = \frac{x^{\alpha \operatorname{Prior}-1}(1-x)^{\beta \operatorname{Prior}-1}}{\Beta(\alpha \operatorname{Prior},\beta \operatorname{Prior})}

According to Bayes' theorem for a continuous event space, the posterior probability is given by the product of the prior probability and the likelihood function (given the evidence ''s'' and ''f'' = ''n'' − ''s''), normalized so that the area under the curve equals one, as follows:

:\begin{align}
& \operatorname{posterior probability}(x=p\mid s,n-s) \\
={} & \frac{\operatorname{PriorProbability}(x=p;\alpha \operatorname{Prior},\beta \operatorname{Prior})\,\mathcal{L}(s,f\mid x=p)}{\int_0^1 \operatorname{PriorProbability}(x=p;\alpha \operatorname{Prior},\beta \operatorname{Prior})\,\mathcal{L}(s,f\mid x=p)\,dx} \\
={} & \frac{{n \choose s} x^{s+\alpha \operatorname{Prior}-1}(1-x)^{n-s+\beta \operatorname{Prior}-1}/\Beta(\alpha \operatorname{Prior},\beta \operatorname{Prior})}{\int_0^1 {n \choose s} x^{s+\alpha \operatorname{Prior}-1}(1-x)^{n-s+\beta \operatorname{Prior}-1}/\Beta(\alpha \operatorname{Prior},\beta \operatorname{Prior})\,dx} \\
={} & \frac{x^{s+\alpha \operatorname{Prior}-1}(1-x)^{n-s+\beta \operatorname{Prior}-1}}{\int_0^1 x^{s+\alpha \operatorname{Prior}-1}(1-x)^{n-s+\beta \operatorname{Prior}-1}\,dx} \\
={} & \frac{x^{s+\alpha \operatorname{Prior}-1}(1-x)^{n-s+\beta \operatorname{Prior}-1}}{\Beta(s+\alpha \operatorname{Prior},n-s+\beta \operatorname{Prior})}.
\end{align}

The binomial coefficient

:{n \choose s}=\frac{n!}{s!(n-s)!}=\frac{\Gamma(n+1)}{\Gamma(s+1)\Gamma(n-s+1)}

appears both in the numerator and the denominator of the posterior probability, and it does not depend on the integration variable ''x'', hence it cancels out, and it is irrelevant to the final result. Similarly the normalizing factor for the prior probability, the beta function B(''α'' Prior, ''β'' Prior), cancels out and it is immaterial to the final result. The same posterior probability result can be obtained if one uses an un-normalized prior

:x^{\alpha \operatorname{Prior}-1}(1-x)^{\beta \operatorname{Prior}-1}

because the normalizing factors all cancel out. Several authors (including Jeffreys himself) thus use an un-normalized prior formula since the normalization constant cancels out. The numerator of the posterior probability ends up being just the (un-normalized) product of the prior probability and the likelihood function, and the denominator is its integral from zero to one. The beta function in the denominator, B(''s'' + ''α'' Prior, ''n'' − ''s'' + ''β'' Prior), appears as a normalization constant to ensure that the total posterior probability integrates to unity.

The ratio ''s''/''n'' of the number of successes to the total number of trials is a sufficient statistic in the binomial case, which is relevant for the following results.

For the Bayes' prior probability (Beta(1,1)), the posterior probability is:

:\operatorname{posterior probability}(p=x\mid s,f) = \frac{x^s(1-x)^{n-s}}{\Beta(s+1,n-s+1)},\text{ with mean }=\frac{s+1}{n+2},\text{ (and mode }=\frac{s}{n}\text{ if } 0 < s < n).

For the Jeffreys' prior probability (Beta(1/2,1/2)), the posterior probability is:

:\operatorname{posterior probability}(p=x\mid s,f) = \frac{x^{s-\tfrac{1}{2}}(1-x)^{n-s-\tfrac{1}{2}}}{\Beta(s+\tfrac{1}{2},n-s+\tfrac{1}{2})},\text{ with mean } = \frac{s+\tfrac{1}{2}}{n+1},\text{ (and mode }=\frac{s-\tfrac{1}{2}}{n-1}\text{ if } \tfrac{1}{2} < s < n-\tfrac{1}{2}).

and for the Haldane prior probability (Beta(0,0)), the posterior probability is:

:\operatorname{posterior probability}(p=x\mid s,f) = \frac{x^{s-1}(1-x)^{n-s-1}}{\Beta(s,n-s)},\text{ with mean } = \frac{s}{n},\text{ (and mode }=\frac{s-1}{n-2}\text{ if } 1 < s < n-1).

From the above expressions it follows that for ''s''/''n'' = 1/2 all three prior probabilities result in the identical location for the posterior probability mean = mode = 1/2. For ''s''/''n'' < 1/2, the means of the posterior probabilities, using the above priors, are such that: mean for Bayes prior > mean for Jeffreys prior > mean for Haldane prior. For ''s''/''n'' > 1/2 the order of these inequalities is reversed, such that the Haldane prior probability results in the largest posterior mean. The ''Haldane'' prior probability Beta(0,0) results in a posterior probability density with ''mean'' (the expected value for the probability of success in the "next" trial) identical to the ratio ''s''/''n'' of the number of successes to the total number of trials. Therefore, the Haldane prior results in a posterior probability with expected value in the next trial equal to the maximum likelihood. The ''Bayes'' prior probability Beta(1,1) results in a posterior probability density with ''mode'' identical to the ratio ''s''/''n'' (the maximum likelihood).

In the case that 100% of the trials have been successful (''s'' = ''n''), the ''Bayes'' prior probability Beta(1,1) results in a posterior expected value equal to the rule of succession (''n'' + 1)/(''n'' + 2), while the Haldane prior Beta(0,0) results in a posterior expected value of 1 (absolute certainty of success in the next trial). Jeffreys prior probability results in a posterior expected value equal to (''n'' + 1/2)/(''n'' + 1). Perks (p.
303)_points_out:_"This_provides_a_new_rule_of_succession_and_expresses_a_'reasonable'_position_to_take_up,_namely,_that_after_an_unbroken_run_of_n_successes_we_assume_a_probability_for_the_next_trial_equivalent_to_the_assumption_that_we_are_about_half-way_through_an_average_run,_i.e._that_we_expect_a_failure_once_in_(2''n'' + 2)_trials._The_Bayes–Laplace_rule_implies_that_we_are_about_at_the_end_of_an_average_run_or_that_we_expect_a_failure_once_in_(''n'' + 2)_trials._The_comparison_clearly_favours_the_new_result_(what_is_now_called_Jeffreys_prior)_from_the_point_of_view_of_'reasonableness'." Conversely,_in_the_case_that_100%_of_the_trials_have_resulted_in_failure_(''s'' = 0),_the_''Bayes''_prior_probability_Beta(1,1)_results_in_a_posterior_expected_value_for_success_in_the_next_trial_equal_to_1/(''n'' + 2),_while_the_Haldane_prior_Beta(0,0)_results_in_a_posterior_expected_value_of_success_in_the_next_trial_of_0_(absolute_certainty_of_failure_in_the_next_trial)._Jeffreys_prior_probability_results_in_a_posterior_expected_value_for_success_in_the_next_trial_equal_to_(1/2)/(''n'' + 1),_which_Perks_(p. 303)_points_out:_"is_a_much_more_reasonably_remote_result_than_the_Bayes-Laplace_result 1/(''n'' + 2)". Jaynes_questions_(for_the_uniform_prior_Beta(1,1))_the_use_of_these_formulas_for_the_cases_''s'' = 0_or_''s'' = ''n''_because_the_integrals_do_not_converge_(Beta(1,1)_is_an_improper_prior_for_''s'' = 0_or_''s'' = ''n'')._In_practice,_the_conditions_0_(p. 303)_shows_that,_for_what_is_now_known_as_the_Jeffreys_prior,_this_probability_is_((''n'' + 1/2)/(''n'' + 1))((''n'' + 3/2)/(''n'' + 2))...(2''n'' + 1/2)/(2''n'' + 1),_which_for_''n'' = 1, 2, 3_gives_15/24,_315/480,_9009/13440;_rapidly_approaching_a_limiting_value_of_1/\sqrt_=_0.70710678\ldots_as_n_tends_to_infinity.__Perks_remarks_that_what_is_now_known_as_the_Jeffreys_prior:_"is_clearly_more_'reasonable'_than_either_the_Bayes-Laplace_result_or_the_result_on_the_(Haldane)_alternative_rule_rejected_by_Jeffreys_which_gives_certainty_as_the_probability._It_clearly_provides_a_very_much_better_correspondence_with_the_process_of_induction._Whether_it_is_'absolutely'_reasonable_for_the_purpose,_i.e._whether_it_is_yet_large_enough,_without_the_absurdity_of_reaching_unity,_is_a_matter_for_others_to_decide._But_it_must_be_realized_that_the_result_depends_on_the_assumption_of_complete_indifference_and_absence_of_knowledge_prior_to_the_sampling_experiment." 
Following are the variances of the posterior distribution obtained with these three prior probability distributions:

for the Bayes' prior probability (Beta(1,1)), the posterior variance is:

:\text{variance} = \frac{(s+1)(n-s+1)}{(n+2)^2(n+3)},\text{ which for } s=\frac{n}{2} \text{ results in variance} =\frac{1}{4(n+3)}

for the Jeffreys' prior probability (Beta(1/2,1/2)), the posterior variance is:

:\text{variance} = \frac{(s+\tfrac{1}{2})(n-s+\tfrac{1}{2})}{(n+1)^2(n+2)},\text{ which for } s=\frac{n}{2} \text{ results in variance} = \frac{1}{4(n+2)}

and for the Haldane prior probability (Beta(0,0)), the posterior variance is:

:\text{variance} = \frac{s(n-s)}{n^2(n+1)},\text{ which for } s=\frac{n}{2} \text{ results in variance} =\frac{1}{4(n+1)}

So, as remarked by Silvey, for large ''n'', the variance is small and hence the posterior distribution is highly concentrated, whereas the assumed prior distribution was very diffuse. This is in accord with what one would hope for, as vague prior knowledge is transformed (through Bayes theorem) into more precise posterior knowledge by an informative experiment. For small ''n'' the Haldane Beta(0,0) prior results in the largest posterior variance while the Bayes Beta(1,1) prior results in the more concentrated posterior. Jeffreys prior Beta(1/2,1/2) results in a posterior variance in between the other two. As ''n'' increases, the variance rapidly decreases so that the posterior variance for all three priors converges to approximately the same value (approaching zero variance as ''n'' → ∞). Recalling the previous result that the ''Haldane'' prior probability Beta(0,0) results in a posterior probability density with ''mean'' (the expected value for the probability of success in the "next" trial) identical to the ratio ''s''/''n'' of the number of successes to the total number of trials, it follows from the above expression that also the ''Haldane'' prior Beta(0,0) results in a posterior with ''variance'' identical to the variance expressed in terms of the maximum likelihood estimate ''s''/''n'' and sample size (see the parametrization in terms of mean μ and sample size ν):

:\text{variance} = \frac{\mu(1-\mu)}{1+\nu} = \frac{\frac{s}{n}\left(1 - \frac{s}{n}\right)}{1+n}

with the mean ''μ'' = ''s''/''n'' and the sample size ''ν'' = ''n''.

In Bayesian inference, using a prior distribution Beta(''α''Prior, ''β''Prior) prior to a binomial distribution is equivalent to adding (''α''Prior − 1) pseudo-observations of "success" and (''β''Prior − 1) pseudo-observations of "failure" to the actual number of successes and failures observed, then estimating the parameter ''p'' of the binomial distribution by the proportion of successes over both real and pseudo-observations. A uniform prior Beta(1,1) does not add (or subtract) any pseudo-observations since for Beta(1,1) it follows that (''α''Prior − 1) = 0 and (''β''Prior − 1) = 0. The Haldane prior Beta(0,0) subtracts one pseudo-observation from each and Jeffreys prior Beta(1/2,1/2) subtracts 1/2 pseudo-observation of success and an equal number of failure. This subtraction has the effect of smoothing out the posterior distribution. If the proportion of successes is not 50% (''s''/''n'' ≠ 1/2), values of ''α''Prior and ''β''Prior less than 1 (and therefore negative (''α''Prior − 1) and (''β''Prior − 1)) favor sparsity, i.e. distributions where the parameter ''p'' is closer to either 0 or 1. In effect, values of ''α''Prior and ''β''Prior between 0 and 1, when operating together, function as a concentration parameter.
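A short sketch, assuming plain Python, comparing posterior summaries under the Haldane, Jeffreys and Bayes priors; by conjugacy the posterior is Beta(''s'' + ''α''Prior, ''n'' − ''s'' + ''β''Prior), and the helper name below is illustrative.

<syntaxhighlight lang="python">
# Posterior mean, mode and variance under the three reference priors discussed above.
def posterior_summary(s, n, a_prior, b_prior):
    a, b = s + a_prior, n - s + b_prior            # posterior shape parameters
    mean = a / (a + b)
    mode = (a - 1) / (a + b - 2) if a > 1 and b > 1 else None
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, mode, var

s, n = 3, 10
for name, (ap, bp) in [("Haldane Beta(0,0)", (0.0, 0.0)),
                       ("Jeffreys Beta(1/2,1/2)", (0.5, 0.5)),
                       ("Bayes Beta(1,1)", (1.0, 1.0))]:
    print(name, posterior_summary(s, n, ap, bp))
# For s/n < 1/2 the posterior means are ordered Bayes > Jeffreys > Haldane, as stated above.
</syntaxhighlight>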
The_accompanying_plots_show_the_posterior_probability_density_functions_for_sample_sizes_''n'' ∈ ,_successes_''s'' ∈ _and_Beta(''α''Prior,''β''Prior) ∈ ._Also_shown_are_the_cases_for_''n'' = ,_success_''s'' = _and_Beta(''α''Prior,''β''Prior) ∈ ._The_first_plot_shows_the_symmetric_cases,_for_successes_''s'' ∈ ,_with_mean = mode = 1/2_and_the_second_plot_shows_the_skewed_cases_''s'' ∈ .__The_images_show_that_there_is_little_difference_between_the_priors_for_the_posterior_with_sample_size_of_50_(characterized_by_a_more_pronounced_peak_near_''p'' = 1/2)._Significant_differences_appear_for_very_small_sample_sizes_(in_particular_for_the_flatter_distribution_for_the_degenerate_case_of_sample_size = 3)._Therefore,_the_skewed_cases,_with_successes_''s'' = ,_show_a_larger_effect_from_the_choice_of_prior,_at_small_sample_size,_than_the_symmetric_cases.__For_symmetric_distributions,_the_Bayes_prior_Beta(1,1)_results_in_the_most_"peaky"_and_highest_posterior_distributions_and_the_Haldane_prior_Beta(0,0)_results_in_the_flattest_and_lowest_peak_distribution.__The_Jeffreys_prior_Beta(1/2,1/2)_lies_in_between_them.__For_nearly_symmetric,_not_too_skewed_distributions_the_effect_of_the_priors_is_similar.__For_very_small_sample_size_(in_this_case_for_a_sample_size_of_3)_and_skewed_distribution_(in_this_example_for_''s'' ∈ )_the_Haldane_prior_can_result_in_a_reverse-J-shaped_distribution_with_a_singularity_at_the_left_end.__However,_this_happens_only_in_degenerate_cases_(in_this_example_''n'' = 3_and_hence_''s'' = 3/4 < 1,_a_degenerate_value_because_s_should_be_greater_than_unity_in_order_for_the_posterior_of_the_Haldane_prior_to_have_a_mode_located_between_the_ends,_and_because_''s'' = 3/4_is_not_an_integer_number,_hence_it_violates_the_initial_assumption_of_a_binomial_distribution_for_the_likelihood)_and_it_is_not_an_issue_in_generic_cases_of_reasonable_sample_size_(such_that_the_condition_1 < ''s'' < ''n'' − 1,_necessary_for_a_mode_to_exist_between_both_ends,_is_fulfilled). In_Chapter_12_(p. 385)_of_his_book,_Jaynes_asserts_that_the_''Haldane_prior''_Beta(0,0)_describes_a_''prior_state_of_knowledge_of_complete_ignorance'',_where_we_are_not_even_sure_whether_it_is_physically_possible_for_an_experiment_to_yield_either_a_success_or_a_failure,_while_the_''Bayes_(uniform)_prior_Beta(1,1)_applies_if''_one_knows_that_''both_binary_outcomes_are_possible''._Jaynes_states:_"''interpret_the_Bayes-Laplace_(Beta(1,1))_prior_as_describing_not_a_state_of_complete_ignorance'',_but_the_state_of_knowledge_in_which_we_have_observed_one_success_and_one_failure...once_we_have_seen_at_least_one_success_and_one_failure,_then_we_know_that_the_experiment_is_a_true_binary_one,_in_the_sense_of_physical_possibility."_Jaynes__does_not_specifically_discuss_Jeffreys_prior_Beta(1/2,1/2)_(Jaynes_discussion_of_"Jeffreys_prior"_on_pp. 181,_423_and_on_chapter_12_of_Jaynes_book_refers_instead_to_the_improper,_un-normalized,_prior_"1/''p'' ''dp''"_introduced_by_Jeffreys_in_the_1939_edition_of_his_book,_seven_years_before_he_introduced_what_is_now_known_as_Jeffreys'_invariant_prior:_the_square_root_of_the_determinant_of_Fisher's_information_matrix._''"1/p"_is_Jeffreys'_(1946)_invariant_prior_for_the_exponential_distribution,_not_for_the_Bernoulli_or_binomial_distributions'')._However,_it_follows_from_the_above_discussion_that_Jeffreys_Beta(1/2,1/2)_prior_represents_a_state_of_knowledge_in_between_the_Haldane_Beta(0,0)_and_Bayes_Beta_(1,1)_prior. Similarly,_Karl_Pearson_in_his_1892_book_The_Grammar_of_Science
_(p. 144_of_1900_edition)__maintained_that_the_Bayes_(Beta(1,1)_uniform_prior_was_not_a_complete_ignorance_prior,_and_that_it_should_be_used_when_prior_information_justified_to_"distribute_our_ignorance_equally"".__K._Pearson_wrote:_"Yet_the_only_supposition_that_we_appear_to_have_made_is_this:_that,_knowing_nothing_of_nature,_routine_and_anomy_(from_the_Greek_ανομία,_namely:_a-_"without",_and_nomos_"law")_are_to_be_considered_as_equally_likely_to_occur.__Now_we_were_not_really_justified_in_making_even_this_assumption,_for_it_involves_a_knowledge_that_we_do_not_possess_regarding_nature.__We_use_our_''experience''_of_the_constitution_and_action_of_coins_in_general_to_assert_that_heads_and_tails_are_equally_probable,_but_we_have_no_right_to_assert_before_experience_that,_as_we_know_nothing_of_nature,_routine_and_breach_are_equally_probable._In_our_ignorance_we_ought_to_consider_before_experience_that_nature_may_consist_of_all_routines,_all_anomies_(normlessness),_or_a_mixture_of_the_two_in_any_proportion_whatever,_and_that_all_such_are_equally_probable._Which_of_these_constitutions_after_experience_is_the_most_probable_must_clearly_depend_on_what_that_experience_has_been_like." If_there_is_sufficient_Sample_(statistics), sampling_data,_''and_the_posterior_probability_mode_is_not_located_at_one_of_the_extremes_of_the_domain''_(x=0_or_x=1),_the_three_priors_of_Bayes_(Beta(1,1)),_Jeffreys_(Beta(1/2,1/2))_and_Haldane_(Beta(0,0))_should_yield_similar_posterior_probability, ''posterior''_probability_densities.__Otherwise,_as_Gelman_et_al.
_(p. 65)_point_out,_"if_so_few_data_are_available_that_the_choice_of_noninformative_prior_distribution_makes_a_difference,_one_should_put_relevant_information_into_the_prior_distribution",_or_as_Berger_(p. 125)_points_out_"when_different_reasonable_priors_yield_substantially_different_answers,_can_it_be_right_to_state_that_there_''is''_a_single_answer?_Would_it_not_be_better_to_admit_that_there_is_scientific_uncertainty,_with_the_conclusion_depending_on_prior_beliefs?."


Occurrence and applications


Order statistics

The beta distribution has an important application in the theory of order statistics. A basic result is that the distribution of the ''k''th smallest of a sample of size ''n'' from a continuous uniform distribution has a beta distribution.David, H. A., Nagaraja, H. N. (2003) ''Order Statistics'' (3rd Edition). Wiley, New Jersey pp 458. This result is summarized as:

:U_{(k)} \sim \operatorname{Beta}(k,n+1-k).

From this, and application of the theory related to the probability integral transform, the distribution of any individual order statistic from any continuous distribution can be derived.
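A Monte Carlo sketch of this result, assuming NumPy is available: the ''k''th smallest of ''n'' iid Uniform(0,1) draws behaves like Beta(''k'', ''n''+1−''k''), whose mean is ''k''/(''n''+1).

<syntaxhighlight lang="python">
# k-th order statistic of n uniforms vs. the Beta(k, n+1-k) mean.
import numpy as np

rng = np.random.default_rng(0)
n, k = 10, 3
samples = np.sort(rng.uniform(size=(100_000, n)), axis=1)[:, k - 1]  # k-th smallest per row
print(samples.mean(), k / (n + 1))   # both close to 3/11 = 0.2727...
</syntaxhighlight>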


Subjective logic

In standard logic, propositions are considered to be either true or false. In contradistinction, subjective logic assumes that humans cannot determine with absolute certainty whether a proposition about the real world is absolutely true or false. In subjective logic the a posteriori probability estimates of binary events can be represented by beta distributions.A. Jøsang. A Logic for Uncertain Probabilities. ''International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems.'' 9(3), pp. 279-311, June 2001.


Wavelet analysis

A wavelet is a wave-like oscillation with an amplitude that starts out at zero, increases, and then decreases back to zero. It can typically be visualized as a "brief oscillation" that promptly decays. Wavelets can be used to extract information from many different kinds of data, including – but certainly not limited to – audio signals and images. Thus, wavelets are purposefully crafted to have specific properties that make them useful for signal processing. Wavelets are localized in both time and frequency whereas the standard Fourier transform is only localized in frequency. Therefore, standard Fourier transforms are only applicable to stationary processes, while wavelets are applicable to non-stationary processes. Continuous wavelets can be constructed based on the beta distribution. Beta waveletsH.M. de Oliveira and G.A.A. Araújo. Compactly Supported One-cyclic Wavelets Derived from Beta Distributions. ''Journal of Communication and Information Systems.'' vol. 20, n. 3, pp. 27-33, 2005. can be viewed as a soft variety of Haar wavelets whose shape is fine-tuned by two shape parameters α and β.


Population genetics

The Balding–Nichols model is a two-parameter parametrization of the beta distribution used in population genetics. It is a statistical description of the allele frequencies in the components of a sub-divided population:

:\begin{align}
\alpha &= \mu \nu,\\
\beta &= (1 - \mu) \nu,
\end{align}

where \nu = \alpha+\beta = \frac{1-F}{F} and 0 < F < 1; here ''F'' is (Wright's) genetic distance between two populations.
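A minimal sketch of this reparametrization, assuming plain Python; the helper name is illustrative, not from the source.

<syntaxhighlight lang="python">
# Balding-Nichols: recover the beta shape parameters from the ancestral frequency mu
# and Wright's F, using nu = alpha + beta = (1 - F)/F.
def balding_nichols_shapes(mu, F):
    nu = (1.0 - F) / F
    return mu * nu, (1.0 - mu) * nu

alpha, beta = balding_nichols_shapes(mu=0.3, F=0.1)
print(alpha, beta)   # 2.7, 6.3
</syntaxhighlight>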


Project management: task cost and schedule modeling

The beta distribution can be used to model events which are constrained to take place within an interval defined by a minimum and maximum value. For this reason, the beta distribution – along with the triangular distribution – is used extensively in PERT, critical path method (CPM), Joint Cost Schedule Modeling (JCSM) and other project management/control systems to describe the time to completion and the cost of a task. In project management, shorthand computations are widely used to estimate the mean and standard deviation of the beta distribution:

:\begin{align}
\mu(X) &= \frac{a + 4b + c}{6} \\
\sigma(X) &= \frac{c-a}{6}
\end{align}

where ''a'' is the minimum, ''c'' is the maximum, and ''b'' is the most likely value (the mode for ''α'' > 1 and ''β'' > 1).

The above estimate for the mean \mu(X)= \frac{a + 4b + c}{6} is known as the PERT three-point estimation and it is exact for either of the following values of ''β'' (for arbitrary α within these ranges):

:''β'' = ''α'' > 1 (symmetric case) with standard deviation \sigma(X) = \frac{c-a}{2\sqrt{2\alpha+1}}, skewness = 0, and excess kurtosis = \frac{-6}{2\alpha + 3}

or

:''β'' = 6 − ''α'' for 5 > ''α'' > 1 (skewed case) with standard deviation

:\sigma(X) = \frac{(c-a)\sqrt{\alpha(6-\alpha)}}{6\sqrt{7}},

skewness = \frac{(3-\alpha)\sqrt{7}}{2\sqrt{\alpha(6-\alpha)}}, and excess kurtosis = \frac{7(\alpha-3)^2 - 2\alpha(6-\alpha)}{3\alpha(6-\alpha)}

The above estimate for the standard deviation ''σ''(''X'') = (''c'' − ''a'')/6 is exact for either of the following values of ''α'' and ''β'':

:''α'' = ''β'' = 4 (symmetric) with skewness = 0, and excess kurtosis = −6/11.
:''β'' = 6 − ''α'' and \alpha = 3 - \sqrt2 (right-tailed, positive skew) with skewness = \frac{1}{\sqrt{2}}, and excess kurtosis = 0
:''β'' = 6 − ''α'' and \alpha = 3 + \sqrt2 (left-tailed, negative skew) with skewness = \frac{-1}{\sqrt{2}}, and excess kurtosis = 0

Otherwise, these can be poor approximations for beta distributions with other values of α and β, exhibiting average errors of 40% in the mean and 549% in the variance.
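A minimal sketch of the PERT shorthand computations above, assuming plain Python; these are approximations, exact only for the special parameter combinations listed.

<syntaxhighlight lang="python">
# PERT three-point estimates for a task with minimum a, most likely b, maximum c.
def pert_estimates(a, b, c):
    mean = (a + 4.0 * b + c) / 6.0    # three-point estimate of the mean
    std = (c - a) / 6.0               # shorthand estimate of the standard deviation
    return mean, std

print(pert_estimates(a=2.0, b=5.0, c=14.0))   # (6.0, 2.0)
</syntaxhighlight>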


Random variate generation

If ''X'' and ''Y'' are independent, with X \sim \Gamma(\alpha, \theta) and Y \sim \Gamma(\beta, \theta) then

:\frac{X}{X+Y} \sim \Beta(\alpha, \beta).

So one algorithm for generating beta variates is to generate \frac{X}{X+Y}, where ''X'' is a gamma variate with parameters (α, 1) and ''Y'' is an independent gamma variate with parameters (β, 1). In fact, here \frac{X}{X+Y} and X+Y are independent, and X+Y \sim \Gamma(\alpha + \beta, \theta). If Z \sim \Gamma(\gamma, \theta) and Z is independent of X and Y, then \frac{X+Y}{X+Y+Z} \sim \Beta(\alpha+\beta,\gamma) and \frac{X+Y}{X+Y+Z} is independent of \frac{X}{X+Y}. This shows that the product of independent \Beta(\alpha,\beta) and \Beta(\alpha+\beta,\gamma) random variables is a \Beta(\alpha,\beta+\gamma) random variable.

Also, the ''k''th order statistic of ''n'' uniformly distributed variates is \Beta(k, n+1-k), so an alternative if α and β are small integers is to generate α + β − 1 uniform variates and choose the α-th smallest.

Another way to generate the beta distribution is by the Pólya urn model. According to this method, one starts with an "urn" with α "black" balls and β "white" balls and draws uniformly with replacement. On every trial an additional ball is added according to the color of the last ball which was drawn. Asymptotically, the proportion of black and white balls will be distributed according to the beta distribution, where each repetition of the experiment will produce a different value.

It is also possible to use inverse transform sampling.
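A sketch of two of the generation routes described above, assuming NumPy is available: the gamma-ratio construction and the order-statistic construction for small integer shape parameters.

<syntaxhighlight lang="python">
# Two routes for generating Beta(alpha, beta) variates.
import numpy as np

rng = np.random.default_rng(1)
alpha, beta = 2.0, 5.0

# Route 1: X ~ Gamma(alpha, 1), Y ~ Gamma(beta, 1)  =>  X/(X+Y) ~ Beta(alpha, beta)
x = rng.gamma(alpha, size=100_000)
y = rng.gamma(beta, size=100_000)
gamma_ratio = x / (x + y)

# Route 2 (integer shapes only): the alpha-th smallest of alpha+beta-1 uniform variates
u = np.sort(rng.uniform(size=(100_000, int(alpha + beta - 1))), axis=1)
order_stat = u[:, int(alpha) - 1]

print(gamma_ratio.mean(), order_stat.mean(), alpha / (alpha + beta))  # all close to 2/7
</syntaxhighlight>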


History

Thomas Bayes, in a posthumous paper published in 1763 by Richard Price, obtained a beta distribution as the density of the probability of success in Bernoulli trials (see the section on Bayesian inference), but the paper does not analyze any of the moments of the beta distribution or discuss any of its properties.

The first systematic modern discussion of the beta distribution is probably due to Karl Pearson.
_In_Pearson's_papers_the_beta_distribution_is_couched_as_a_solution_of_a_differential_equation:_Pearson_distribution, Pearson's_Type_I_distribution_which_it_is_essentially_identical_to_except_for_arbitrary_shifting_and_re-scaling_(the_beta_and_Pearson_Type_I_distributions_can_always_be_equalized_by_proper_choice_of_parameters)._In_fact,_in_several_English_books_and_journal_articles_in_the_few_decades_prior_to_World_War_II,_it_was_common_to_refer_to_the_beta_distribution_as_Pearson's_Type_I_distribution.__William_Palin_Elderton, William_P._Elderton_in_his_1906_monograph_"Frequency_curves_and_correlation"
_further_analyzes_the_beta_distribution_as_Pearson's_Type_I_distribution,_including_a_full_discussion_of_the_method_of_moments_for_the_four_parameter_case,_and_diagrams_of_(what_Elderton_describes_as)_U-shaped,_J-shaped,_twisted_J-shaped,_"cocked-hat"_shapes,_horizontal_and_angled_straight-line_cases.__Elderton_wrote_"I_am_chiefly_indebted_to_Professor_Pearson,_but_the_indebtedness_is_of_a_kind_for_which_it_is_impossible_to_offer_formal_thanks."__William_Palin_Elderton, Elderton_in_his_1906_monograph__provides_an_impressive_amount_of_information_on_the_beta_distribution,_including_equations_for_the_origin_of_the_distribution_chosen_to_be_the_mode,_as_well_as_for_other_Pearson_distributions:_types_I_through_VII._Elderton_also_included_a_number_of_appendixes,_including_one_appendix_("II")_on_the_beta_and_gamma_functions._In_later_editions,_Elderton_added_equations_for_the_origin_of_the_distribution_chosen_to_be_the_mean,_and_analysis_of_Pearson_distributions_VIII_through_XII. As_remarked_by_Bowman_and_Shenton_"Fisher_and_Pearson_had_a_difference_of_opinion_in_the_approach_to_(parameter)_estimation,_in_particular_relating_to_(Pearson's_method_of)_moments_and_(Fisher's_method_of)_maximum_likelihood_in_the_case_of_the_Beta_distribution."_Also_according_to_Bowman_and_Shenton,_"the_case_of_a_Type_I_(beta_distribution)_model_being_the_center_of_the_controversy_was_pure_serendipity._A_more_difficult_model_of_4_parameters_would_have_been_hard_to_find."_The_long_running_public_conflict_of_Fisher_with_Karl_Pearson_can_be_followed_in_a_number_of_articles_in_prestigious_journals.__For_example,_concerning_the_estimation_of_the_four_parameters_for_the_beta_distribution,_and_Fisher's_criticism_of_Pearson's_method_of_moments_as_being_arbitrary,_see_Pearson's_article_"Method_of_moments_and_method_of_maximum_likelihood"_
_(published_three_years_after_his_retirement_from_University_College,_London,_where_his_position_had_been_divided_between_Fisher_and_Pearson's_son_Egon)_in_which_Pearson_writes_"I_read_(Koshai's_paper_in_the_Journal_of_the_Royal_Statistical_Society,_1933)_which_as_far_as_I_am_aware_is_the_only_case_at_present_published_of_the_application_of_Professor_Fisher's_method._To_my_astonishment_that_method_depends_on_first_working_out_the_constants_of_the_frequency_curve_by_the_(Pearson)_Method_of_Moments_and_then_superposing_on_it,_by_what_Fisher_terms_"the_Method_of_Maximum_Likelihood"_a_further_approximation_to_obtain,_what_he_holds,_he_will_thus_get,_"more_efficient_values"_of_the_curve_constants." David_and_Edwards's_treatise_on_the_history_of_statistics
_cites_the_first_modern_treatment_of_the_beta_distribution,_in_1911,__using_the_beta_designation_that_has_become_standard,_due_to_Corrado_Gini,_an_Italian_statistician,_demography, demographer,_and_sociology, sociologist,_who_developed_the_Gini_coefficient._Norman_Lloyd_Johnson, N.L.Johnson_and_Samuel_Kotz, S.Kotz,_in_their_comprehensive_and_very_informative_monograph__on_leading_historical_personalities_in_statistical_sciences_credit_Corrado_Gini__as_"an_early_Bayesian...who_dealt_with_the_problem_of_eliciting_the_parameters_of_an_initial_Beta_distribution,_by_singling_out_techniques_which_anticipated_the_advent_of_the_so-called_empirical_Bayes_approach."


References


External links

*"Beta Distribution" by Fiona Maclachlan, the Wolfram Demonstrations Project, 2007.
*Beta Distribution – Overview and Example, xycoon.com
*brighton-webs.co.uk
*exstrom.com
*Harvard University Statistics 110 Lecture 23 Beta Distribution, Prof. Joe Blitzstein
Mean absolute deviation around the mean

:\operatorname{E}[|X - \operatorname{E}[X]|] = \frac{2\alpha^\alpha\beta^\beta}{\Beta(\alpha,\beta)(\alpha+\beta)^{\alpha+\beta+1}}

The mean absolute deviation around the mean is a more robust estimator
of statistical dispersion than the standard deviation for beta distributions with tails and inflection points at each side of the mode, Beta(''α'', ''β'') distributions with ''α'',''β'' > 2, as it depends on the linear (absolute) deviations rather than the square deviations from the mean. Therefore, the effect of very large deviations from the mean are not as overly weighted. Using Stirling's approximation to the Gamma function, Norman Lloyd Johnson, N.L.Johnson and Samuel Kotz, S.Kotz derived the following approximation for values of the shape parameters greater than unity (the relative error for this approximation is only −3.5% for ''α'' = ''β'' = 1, and it decreases to zero as ''α'' → ∞, ''β'' → ∞): : \begin \frac &=\frac\\ &\approx \sqrt \left(1+\frac-\frac-\frac \right), \text \alpha, \beta > 1. \end At the limit α → ∞, β → ∞, the ratio of the mean absolute deviation to the standard deviation (for the beta distribution) becomes equal to the ratio of the same measures for the normal distribution: \sqrt. For α = β = 1 this ratio equals \frac, so that from α = β = 1 to α, β → ∞ the ratio decreases by 8.5%. For α = β = 0 the standard deviation is exactly equal to the mean absolute deviation around the mean. Therefore, this ratio decreases by 15% from α = β = 0 to α = β = 1, and by 25% from α = β = 0 to α, β → ∞ . However, for skewed beta distributions such that α → 0 or β → 0, the ratio of the standard deviation to the mean absolute deviation approaches infinity (although each of them, individually, approaches zero) because the mean absolute deviation approaches zero faster than the standard deviation. Using the parametrization in terms of mean μ and sample size ν = α + β > 0: :α = μν, β = (1−μ)ν one can express the mean absolute deviation around the mean in terms of the mean μ and the sample size ν as follows: :\operatorname[, X - E ] = \frac For a symmetric distribution, the mean is at the middle of the distribution, μ = 1/2, and therefore: : \begin \operatorname[, X - E ] = \frac &= \frac \\ \lim_ \left (\lim_ \operatorname[, X - E ] \right ) &= \tfrac\\ \lim_ \left (\lim_ \operatorname[, X - E ] \right ) &= 0 \end Also, the following limits (with only the noted variable approaching the limit) can be obtained from the above expressions: : \begin \lim_ \operatorname[, X - E ] &=\lim_ \operatorname[, X - E ]= 0 \\ \lim_ \operatorname[, X - E ] &=\lim_ \operatorname[, X - E ] = 0\\ \lim_ \operatorname[, X - E ]&=\lim_ \operatorname[, X - E ] = 0\\ \lim_ \operatorname[, X - E ] &= \sqrt \\ \lim_ \operatorname[, X - E ] &= 0 \end
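A numerical sketch, assuming NumPy/SciPy are available, of the ratio of the mean absolute deviation around the mean to the standard deviation, which approaches \sqrt{2/\pi} ≈ 0.7979 for large, equal shape parameters; the helper name is illustrative.

<syntaxhighlight lang="python">
# Ratio of mean absolute deviation (around the mean) to standard deviation for Beta(a, a).
import numpy as np
from scipy import stats, integrate

def mad_over_std(a, b):
    dist = stats.beta(a, b)
    mu, sigma = dist.mean(), dist.std()
    # integrate |x - mu| f(x) over [0, 1]; the break point at mu helps the quadrature
    mad, _ = integrate.quad(lambda x: abs(x - mu) * dist.pdf(x), 0.0, 1.0, points=[mu])
    return mad / sigma

for a in (1.0, 2.0, 10.0, 100.0):
    print(a, mad_over_std(a, a))
print(np.sqrt(2.0 / np.pi))   # limiting value as the shape parameters grow
</syntaxhighlight>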


Mean absolute difference

The mean absolute difference for the Beta distribution is: :\mathrm = \int_0^1 \int_0^1 f(x;\alpha,\beta)\,f(y;\alpha,\beta)\,, x-y, \,dx\,dy = \left(\frac\right)\frac The Gini coefficient for the Beta distribution is half of the relative mean absolute difference: :\mathrm = \left(\frac\right)\frac


Skewness

The
skewness
(the third moment centered on the mean, normalized by the 3/2 power of the variance) of the beta distribution is :\gamma_1 =\frac = \frac . Letting α = β in the above expression one obtains γ1 = 0, showing once again that for α = β the distribution is symmetric and hence the skewness is zero. Positive skew (right-tailed) for α < β, negative skew (left-tailed) for α > β. Using the parametrization in terms of mean μ and sample size ν = α + β: : \begin \alpha & = \mu \nu ,\text\nu =(\alpha + \beta) >0\\ \beta & = (1 - \mu) \nu , \text\nu =(\alpha + \beta) >0. \end one can express the skewness in terms of the mean μ and the sample size ν as follows: :\gamma_1 =\frac = \frac. The skewness can also be expressed just in terms of the variance ''var'' and the mean μ as follows: :\gamma_1 =\frac = \frac\text \operatorname < \mu(1-\mu) The accompanying plot of skewness as a function of variance and mean shows that maximum variance (1/4) is coupled with zero skewness and the symmetry condition (μ = 1/2), and that maximum skewness (positive or negative infinity) occurs when the mean is located at one end or the other, so that the "mass" of the probability distribution is concentrated at the ends (minimum variance). The following expression for the square of the skewness, in terms of the sample size ν = α + β and the variance ''var'', is useful for the method of moments estimation of four parameters: :(\gamma_1)^2 =\frac = \frac\bigg(\frac-4(1+\nu)\bigg) This expression correctly gives a skewness of zero for α = β, since in that case (see ): \operatorname = \frac. For the symmetric case (α = β), skewness = 0 over the whole range, and the following limits apply: :\lim_ \gamma_1 = \lim_ \gamma_1 =\lim_ \gamma_1=\lim_ \gamma_1=\lim_ \gamma_1 = 0 For the asymmetric cases (α ≠ β) the following limits (with only the noted variable approaching the limit) can be obtained from the above expressions: : \begin &\lim_ \gamma_1 =\lim_ \gamma_1 = \infty\\ &\lim_ \gamma_1 = \lim_ \gamma_1= - \infty\\ &\lim_ \gamma_1 = -\frac,\quad \lim_(\lim_ \gamma_1) = -\infty,\quad \lim_(\lim_ \gamma_1) = 0\\ &\lim_ \gamma_1 = \frac,\quad \lim_(\lim_ \gamma_1) = \infty,\quad \lim_(\lim_ \gamma_1) = 0\\ &\lim_ \gamma_1 = \frac,\quad \lim_(\lim_ \gamma_1) = \infty,\quad \lim_(\lim_ \gamma_1) = - \infty \end
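A short sketch, assuming SciPy is available, comparing the closed-form beta skewness 2(β−α)\sqrt{α+β+1} / ((α+β+2)\sqrt{αβ}) with SciPy's implementation; the helper name is illustrative.

<syntaxhighlight lang="python">
# Closed-form skewness of Beta(a, b) vs. scipy.stats.
import math
from scipy import stats

def beta_skewness(a, b):
    return 2.0 * (b - a) * math.sqrt(a + b + 1.0) / ((a + b + 2.0) * math.sqrt(a * b))

for a, b in [(2.0, 2.0), (2.0, 5.0), (0.5, 0.5), (1.0, 3.0)]:
    print((a, b), beta_skewness(a, b), stats.beta(a, b).stats(moments='s'))
</syntaxhighlight>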


Kurtosis

The beta distribution has been applied in acoustic analysis to assess damage to gears, as the kurtosis of the beta distribution has been reported to be a good indicator of the condition of a gear. Kurtosis has also been used to distinguish the seismic signal generated by a person's footsteps from other signals. As persons or other targets moving on the ground generate continuous signals in the form of seismic waves, one can separate different targets based on the seismic waves they generate. Kurtosis is sensitive to impulsive signals, so it's much more sensitive to the signal generated by human footsteps than other signals generated by vehicles, winds, noise, etc. Unfortunately, the notation for kurtosis has not been standardized. Kenney and Keeping use the symbol γ2 for the
excess kurtosis
, but Abramowitz and Stegun use different terminology. To prevent confusion between kurtosis (the fourth moment centered on the mean, normalized by the square of the variance) and excess kurtosis, when using symbols, they will be spelled out as follows: :\begin \text &=\text - 3\\ &=\frac-3\\ &=\frac\\ &=\frac . \end Letting α = β in the above expression one obtains :\text =- \frac \text\alpha=\beta . Therefore, for symmetric beta distributions, the excess kurtosis is negative, increasing from a minimum value of −2 at the limit as → 0, and approaching a maximum value of zero as → ∞. The value of −2 is the minimum value of excess kurtosis that any distribution (not just beta distributions, but any distribution of any possible kind) can ever achieve. This minimum value is reached when all the probability density is entirely concentrated at each end ''x'' = 0 and ''x'' = 1, with nothing in between: a 2-point
Bernoulli distribution
with equal probability 1/2 at each end (a coin toss: see section below "Kurtosis bounded by the square of the skewness" for further discussion). The description of kurtosis as a measure of the "potential outliers" (or "potential rare, extreme values") of the probability distribution, is correct for all distributions including the beta distribution. When rare, extreme values can occur in the beta distribution, the higher its kurtosis; otherwise, the kurtosis is lower. For α ≠ β, skewed beta distributions, the excess kurtosis can reach unlimited positive values (particularly for α → 0 for finite β, or for β → 0 for finite α) because the side away from the mode will produce occasional extreme values. Minimum kurtosis takes place when the mass density is concentrated equally at each end (and therefore the mean is at the center), and there is no probability mass density in between the ends. Using the parametrization in terms of mean μ and sample size ν = α + β: : \begin \alpha & = \mu \nu ,\text\nu =(\alpha + \beta) >0\\ \beta & = (1 - \mu) \nu , \text\nu =(\alpha + \beta) >0. \end one can express the excess kurtosis in terms of the mean μ and the sample size ν as follows: :\text =\frac\bigg (\frac - 1 \bigg ) The excess kurtosis can also be expressed in terms of just the following two parameters: the variance ''var'', and the sample size ν as follows: :\text =\frac\left(\frac - 6 - 5 \nu \right)\text\text< \mu(1-\mu) and, in terms of the variance ''var'' and the mean μ as follows: :\text =\frac\text\text< \mu(1-\mu) The plot of excess kurtosis as a function of the variance and the mean shows that the minimum value of the excess kurtosis (−2, which is the minimum possible value for excess kurtosis for any distribution) is intimately coupled with the maximum value of variance (1/4) and the symmetry condition: the mean occurring at the midpoint (μ = 1/2). This occurs for the symmetric case of α = β = 0, with zero skewness. At the limit, this is the 2 point
Bernoulli distribution
with equal probability 1/2 at each Dirac delta function end ''x'' = 0 and ''x'' = 1 and zero probability everywhere else. (A coin toss: one face of the coin being ''x'' = 0 and the other face being ''x'' = 1.) Variance is maximum because the distribution is bimodal with nothing in between the two modes (spikes) at each end. Excess kurtosis is minimum: the probability density "mass" is zero at the mean and it is concentrated at the two peaks at each end. Excess kurtosis reaches the minimum possible value (for any distribution) when the probability density function has two spikes at each end: it is bi-"peaky" with nothing in between them. On the other hand, the plot shows that for extreme skewed cases, where the mean is located near one or the other end (μ = 0 or μ = 1), the variance is close to zero, and the excess kurtosis rapidly approaches infinity when the mean of the distribution approaches either end. Alternatively, the excess kurtosis can also be expressed in terms of just the following two parameters: the square of the skewness, and the sample size ν as follows: :\text =\frac\bigg(\frac (\text)^2 - 1\bigg)\text^2-2< \text< \frac (\text)^2 From this last expression, one can obtain the same limits published practically a century ago by Karl Pearson in his paper, for the beta distribution (see section below titled "Kurtosis bounded by the square of the skewness"). Setting α + β= ν = 0 in the above expression, one obtains Pearson's lower boundary (values for the skewness and excess kurtosis below the boundary (excess kurtosis + 2 − skewness2 = 0) cannot occur for any distribution, and hence Karl Pearson appropriately called the region below this boundary the "impossible region"). The limit of α + β = ν → ∞ determines Pearson's upper boundary. : \begin &\lim_\text = (\text)^2 - 2\\ &\lim_\text = \tfrac (\text)^2 \end therefore: :(\text)^2-2< \text< \tfrac (\text)^2 Values of ν = α + β such that ν ranges from zero to infinity, 0 < ν < ∞, span the whole region of the beta distribution in the plane of excess kurtosis versus squared skewness. For the symmetric case (α = β), the following limits apply: : \begin &\lim_ \text = - 2 \\ &\lim_ \text = 0 \\ &\lim_ \text = - \frac \end For the unsymmetric cases (α ≠ β) the following limits (with only the noted variable approaching the limit) can be obtained from the above expressions: : \begin &\lim_\text =\lim_ \text = \lim_\text = \lim_\text =\infty\\ &\lim_\text = \frac,\text \lim_(\lim_ \text) = \infty,\text \lim_(\lim_ \text) = 0\\ &\lim_\text = \frac,\text \lim_(\lim_ \text) = \infty,\text \lim_(\lim_ \text) = 0\\ &\lim_ \text = - 6 + \frac,\text \lim_(\lim_ \text) = \infty,\text \lim_(\lim_ \text) = \infty \end
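A short sketch, assuming SciPy is available, comparing the closed-form excess kurtosis of the beta distribution with SciPy's value; for α = β it reduces to −6/(2α + 3), giving −6/11 at α = β = 4. The helper name is illustrative.

<syntaxhighlight lang="python">
# Closed-form excess kurtosis of Beta(a, b) vs. scipy.stats.
from scipy import stats

def beta_excess_kurtosis(a, b):
    num = 6.0 * ((a - b) ** 2 * (a + b + 1.0) - a * b * (a + b + 2.0))
    den = a * b * (a + b + 2.0) * (a + b + 3.0)
    return num / den

for a, b in [(4.0, 4.0), (2.0, 5.0), (0.5, 0.5)]:
    print((a, b), beta_excess_kurtosis(a, b), stats.beta(a, b).stats(moments='k'))
</syntaxhighlight>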


Characteristic function

The Characteristic function (probability theory), characteristic function is the Fourier transform of the probability density function. The characteristic function of the beta distribution is confluent hypergeometric function, Kummer's confluent hypergeometric function (of the first kind): :\begin \varphi_X(\alpha;\beta;t) &= \operatorname\left[e^\right]\\ &= \int_0^1 e^ f(x;\alpha,\beta) dx \\ &=_1F_1(\alpha; \alpha+\beta; it)\!\\ &=\sum_^\infty \frac \\ &= 1 +\sum_^ \left( \prod_^ \frac \right) \frac \end where : x^=x(x+1)(x+2)\cdots(x+n-1) is the rising factorial, also called the "Pochhammer symbol". The value of the characteristic function for ''t'' = 0, is one: : \varphi_X(\alpha;\beta;0)=_1F_1(\alpha; \alpha+\beta; 0) = 1 . Also, the real and imaginary parts of the characteristic function enjoy the following symmetries with respect to the origin of variable ''t'': : \textrm \left [ _1F_1(\alpha; \alpha+\beta; it) \right ] = \textrm \left [ _1F_1(\alpha; \alpha+\beta; - it) \right ] : \textrm \left [ _1F_1(\alpha; \alpha+\beta; it) \right ] = - \textrm \left [ _1F_1(\alpha; \alpha+\beta; - it) \right ] The symmetric case α = β simplifies the characteristic function of the beta distribution to a Bessel function, since in the special case α + β = 2α the confluent hypergeometric function (of the first kind) reduces to a Bessel function (the modified Bessel function of the first kind I_ ) using Ernst Kummer, Kummer's second transformation as follows: Another example of the symmetric case α = β = n/2 for beamforming applications can be found in Figure 11 of :\begin _1F_1(\alpha;2\alpha; it) &= e^ _0F_1 \left(; \alpha+\tfrac; \frac \right) \\ &= e^ \left(\frac\right)^ \Gamma\left(\alpha+\tfrac\right) I_\left(\frac\right).\end In the accompanying plots, the Complex number, real part (Re) of the Characteristic function (probability theory), characteristic function of the beta distribution is displayed for symmetric (α = β) and skewed (α ≠ β) cases.
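A numerical sketch, assuming SciPy and mpmath are available, checking that E[exp(itX)] for Beta(α, β) matches Kummer's confluent hypergeometric function ₁F₁(α; α+β; it).

<syntaxhighlight lang="python">
# Characteristic function of Beta(a, b): numerical integral vs. Kummer's 1F1.
import mpmath
import numpy as np
from scipy import stats, integrate

a, b, t = 2.0, 3.0, 1.5
pdf = stats.beta(a, b).pdf
re, _ = integrate.quad(lambda x: np.cos(t * x) * pdf(x), 0.0, 1.0)
im, _ = integrate.quad(lambda x: np.sin(t * x) * pdf(x), 0.0, 1.0)
print(complex(re, im))
print(complex(mpmath.hyp1f1(a, a + b, 1j * t)))   # should agree to numerical precision
</syntaxhighlight>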


Other moments


Moment generating function

It also follows that the moment generating function is :\begin M_X(\alpha; \beta; t) &= \operatorname\left[e^\right] \\ pt&= \int_0^1 e^ f(x;\alpha,\beta)\,dx \\ pt&= _1F_1(\alpha; \alpha+\beta; t) \\ pt&= \sum_^\infty \frac \frac \\ pt&= 1 +\sum_^ \left( \prod_^ \frac \right) \frac \end In particular ''M''''X''(''α''; ''β''; 0) = 1.


Higher moments

Using the moment generating function, the ''k''-th raw moment is given by the factor :\prod_^ \frac multiplying the (exponential series) term \left(\frac\right) in the series of the moment generating function :\operatorname[X^k]= \frac = \prod_^ \frac where (''x'')(''k'') is a Pochhammer symbol representing rising factorial. It can also be written in a recursive form as :\operatorname[X^k] = \frac\operatorname[X^]. Since the moment generating function M_X(\alpha; \beta; \cdot) has a positive radius of convergence, the beta distribution is Moment problem, determined by its moments.
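A minimal sketch, assuming SciPy is available, of the recursion E[X^k] = \frac{\alpha+k-1}{\alpha+\beta+k-1}\,E[X^{k-1}] for the raw moments, compared with SciPy's moment method; the helper name is illustrative.

<syntaxhighlight lang="python">
# Raw moments of Beta(a, b) via the recursion, checked against scipy.stats.
from scipy import stats

def raw_moments(a, b, k_max):
    moments, m = [], 1.0                          # E[X^0] = 1
    for k in range(1, k_max + 1):
        m *= (a + k - 1.0) / (a + b + k - 1.0)    # recursive step
        moments.append(m)
    return moments

a, b = 2.0, 5.0
print(raw_moments(a, b, 3))
print([stats.beta(a, b).moment(k) for k in (1, 2, 3)])
</syntaxhighlight>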


Moments of transformed random variables


=Moments of linearly transformed, product and inverted random variables

= One can also show the following expectations for a transformed random variable, where the random variable ''X'' is Beta-distributed with parameters α and β: ''X'' ~ Beta(α, β). The expected value of the variable 1 − ''X'' is the mirror-symmetry of the expected value based on ''X'': :\begin & \operatorname[1-X] = \frac \\ & \operatorname[X (1-X)] =\operatorname[(1-X)X ] =\frac \end Due to the mirror-symmetry of the probability density function of the beta distribution, the variances based on variables ''X'' and 1 − ''X'' are identical, and the covariance on ''X''(1 − ''X'' is the negative of the variance: :\operatorname[(1-X)]=\operatorname[X] = -\operatorname[X,(1-X)]= \frac These are the expected values for inverted variables, (these are related to the harmonic means, see ): :\begin & \operatorname \left [\frac \right ] = \frac \text \alpha > 1\\ & \operatorname\left [\frac \right ] =\frac \text \beta > 1 \end The following transformation by dividing the variable ''X'' by its mirror-image ''X''/(1 − ''X'') results in the expected value of the "inverted beta distribution" or
beta prime distribution
(also known as beta distribution of the second kind or Pearson distribution, Pearson's Type VI): : \begin & \operatorname\left[\frac\right] =\frac \text\beta > 1\\ & \operatorname\left[\frac\right] =\frac\text\alpha > 1 \end Variances of these transformed variables can be obtained by integration, as the expected values of the second moments centered on the corresponding variables: :\operatorname \left[\frac \right] =\operatorname\left[\left(\frac - \operatorname\left[\frac \right ] \right )^2\right]= :\operatorname\left [\frac \right ] =\operatorname \left [\left (\frac - \operatorname\left [\frac \right ] \right )^2 \right ]= \frac \text\alpha > 2 The following variance of the variable ''X'' divided by its mirror-image (''X''/(1−''X'') results in the variance of the "inverted beta distribution" or
beta prime distribution In probability theory and statistics, the beta prime distribution (also known as inverted beta distribution or beta distribution of the second kindJohnson et al (1995), p 248) is an absolutely continuous probability distribution. Definitions ...
(also known as beta distribution of the second kind or Pearson distribution, Pearson's Type VI): :\operatorname \left [\frac \right ] =\operatorname \left [\left(\frac - \operatorname \left [\frac \right ] \right)^2 \right ]=\operatorname \left [\frac \right ] = :\operatorname \left [\left (\frac - \operatorname \left [\frac \right ] \right )^2 \right ]= \frac \text\beta > 2 The covariances are: :\operatorname\left [\frac,\frac \right ] = \operatorname\left[\frac,\frac \right] =\operatorname\left[\frac,\frac\right ] = \operatorname\left[\frac,\frac \right] =\frac \text \alpha, \beta > 1 These expectations and variances appear in the four-parameter Fisher information matrix (.)
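A Monte Carlo sanity check of the beta prime expectations above (an added sketch, not part of the original article; the sample size and seed are arbitrary, and the parameters respect the stated conditions β > 2 and α > 1):
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
a, b = 3.0, 4.0
x = rng.beta(a, b, size=1_000_000)

print(np.mean(x / (1 - x)), a / (b - 1))                                # E[X/(1-X)], needs b > 1
print(np.mean((1 - x) / x), b / (a - 1))                                # E[(1-X)/X], needs a > 1
print(np.var(x / (1 - x)), a * (a + b - 1) / ((b - 2) * (b - 1) ** 2))  # var[X/(1-X)], needs b > 2
</syntaxhighlight>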


=Moments of logarithmically transformed random variables

= Expected values for Logarithm transformation, logarithmic transformations (useful for maximum likelihood estimates, see ) are discussed in this section. The following logarithmic linear transformations are related to the geometric means ''GX'' and ''G''(1−''X'') (see ): :\begin \operatorname[\ln(X)] &= \psi(\alpha) - \psi(\alpha + \beta)= - \operatorname\left[\ln \left (\frac \right )\right],\\ \operatorname[\ln(1-X)] &=\psi(\beta) - \psi(\alpha + \beta)= - \operatorname \left[\ln \left (\frac \right )\right]. \end Where the
digamma function
ψ(α) is defined as the logarithmic derivative of the
gamma function
: :\psi(\alpha) = \frac Logit transformations are interesting, as they usually transform various shapes (including J-shapes) into (usually skewed) bell-shaped densities over the logit variable, and they may remove the end singularities over the original variable: :\begin \operatorname\left[\ln \left (\frac \right ) \right] &=\psi(\alpha) - \psi(\beta)= \operatorname[\ln(X)] +\operatorname \left[\ln \left (\frac \right) \right],\\ \operatorname\left [\ln \left (\frac \right ) \right ] &=\psi(\beta) - \psi(\alpha)= - \operatorname \left[\ln \left (\frac \right) \right] . \end Johnson considered the distribution of the logit - transformed variable ln(''X''/1−''X''), including its moment generating function and approximations for large values of the shape parameters. This transformation extends the finite support
[0, 1]
based on the original variable ''X'' to infinite support in both directions of the real line (−∞, +∞). Higher order logarithmic moments can be derived by using the representation of a beta distribution as a proportion of two Gamma distributions and differentiating through the integral. They can be expressed in terms of higher order poly-gamma functions as follows: :\begin \operatorname \left [\ln^2(X) \right ] &= (\psi(\alpha) - \psi(\alpha + \beta))^2+\psi_1(\alpha)-\psi_1(\alpha+\beta), \\ \operatorname \left [\ln^2(1-X) \right ] &= (\psi(\beta) - \psi(\alpha + \beta))^2+\psi_1(\beta)-\psi_1(\alpha+\beta), \\ \operatorname \left [\ln (X)\ln(1-X) \right ] &=(\psi(\alpha) - \psi(\alpha + \beta))(\psi(\beta) - \psi(\alpha + \beta)) -\psi_1(\alpha+\beta). \end therefore the
variance
of the logarithmic variables and
covariance
of ln(''X'') and ln(1−''X'') are: :\begin \operatorname[\ln(X), \ln(1-X)] &= \operatorname\left[\ln(X)\ln(1-X)\right] - \operatorname[\ln(X)]\operatorname[\ln(1-X)] = -\psi_1(\alpha+\beta) \\ & \\ \operatorname[\ln X] &= \operatorname[\ln^2(X)] - (\operatorname[\ln(X)])^2 \\ &= \psi_1(\alpha) - \psi_1(\alpha + \beta) \\ &= \psi_1(\alpha) + \operatorname[\ln(X), \ln(1-X)] \\ & \\ \operatorname ln (1-X)&= \operatorname[\ln^2 (1-X)] - (\operatorname[\ln (1-X)])^2 \\ &= \psi_1(\beta) - \psi_1(\alpha + \beta) \\ &= \psi_1(\beta) + \operatorname[\ln (X), \ln(1-X)] \end where the
trigamma function
, denoted ψ1(α), is the second of the
polygamma function
s, and is defined as the derivative of the digamma function: :\psi_1(\alpha) = \frac= \frac. The variances and covariance of the logarithmically transformed variables ''X'' and (1−''X'') are different, in general, because the logarithmic transformation destroys the mirror-symmetry of the original variables ''X'' and (1−''X''), as the logarithm approaches negative infinity for the variable approaching zero. These logarithmic variances and covariance are the elements of the
Fisher information
matrix for the beta distribution. They are also a measure of the curvature of the log likelihood function (see section on Maximum likelihood estimation). The variances of the log inverse variables are identical to the variances of the log variables: :\begin \operatorname\left[\ln \left (\frac \right ) \right] & =\operatorname[\ln(X)] = \psi_1(\alpha) - \psi_1(\alpha + \beta), \\ \operatorname\left[\ln \left (\frac \right ) \right] &=\operatorname ln (1-X)= \psi_1(\beta) - \psi_1(\alpha + \beta), \\ \operatorname\left[\ln \left (\frac \right), \ln \left (\frac\right ) \right] &=\operatorname[\ln(X),\ln(1-X)]= -\psi_1(\alpha + \beta).\end It also follows that the variances of the logit transformed variables are: :\operatorname\left[\ln \left (\frac \right )\right]=\operatorname\left[\ln \left (\frac \right ) \right]=-\operatorname\left [\ln \left (\frac \right ), \ln \left (\frac \right ) \right]= \psi_1(\alpha) + \psi_1(\beta)
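These digamma/trigamma identities can be confirmed numerically; the sketch below (added here, not in the original article; sample size and seed arbitrary) compares Monte Carlo estimates of E[ln X], var[ln X] and cov[ln X, ln(1−X)] with ψ and ψ₁ from scipy.special.
<syntaxhighlight lang="python">
import numpy as np
from scipy.special import digamma, polygamma

rng = np.random.default_rng(1)
a, b = 2.0, 5.0
x = rng.beta(a, b, size=1_000_000)

print(np.mean(np.log(x)), digamma(a) - digamma(a + b))              # E[ln X]
print(np.var(np.log(x)), polygamma(1, a) - polygamma(1, a + b))     # var[ln X]
print(np.cov(np.log(x), np.log1p(-x))[0, 1], -polygamma(1, a + b))  # cov[ln X, ln(1-X)]
</syntaxhighlight>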


Quantities of information (entropy)

Given a beta distributed random variable, ''X'' ~ Beta(''α'', ''β''), the information entropy, differential entropy of ''X'' is (measured in Nat (unit), nats), the expected value of the negative of the logarithm of the
probability density function
: :\begin h(X) &= \operatorname[-\ln(f(x;\alpha,\beta))] \\ pt&=\int_0^1 -f(x;\alpha,\beta)\ln(f(x;\alpha,\beta)) \, dx \\ pt&= \ln(\Beta(\alpha,\beta))-(\alpha-1)\psi(\alpha)-(\beta-1)\psi(\beta)+(\alpha+\beta-2) \psi(\alpha+\beta) \end where ''f''(''x''; ''α'', ''β'') is the
probability density function
of the beta distribution: :f(x;\alpha,\beta) = \frac x^(1-x)^ The
digamma function
''ψ'' appears in the formula for the differential entropy as a consequence of Euler's integral formula for the harmonic numbers which follows from the integral: :\int_0^1 \frac \, dx = \psi(\alpha)-\psi(1) The information entropy, differential entropy of the beta distribution is negative for all values of ''α'' and ''β'' greater than zero, except at ''α'' = ''β'' = 1 (for which values the beta distribution is the same as the Uniform distribution (continuous), uniform distribution), where the information entropy, differential entropy reaches its Maxima and minima, maximum value of zero. It is to be expected that the maximum entropy should take place when the beta distribution becomes equal to the uniform distribution, since uncertainty is maximal when all possible events are equiprobable. For ''α'' or ''β'' approaching zero, the information entropy, differential entropy approaches its Maxima and minima, minimum value of negative infinity. For (either or both) ''α'' or ''β'' approaching zero, there is a maximum amount of order: all the probability density is concentrated at the ends, and there is zero probability density at points located between the ends. Similarly for (either or both) ''α'' or ''β'' approaching infinity, the differential entropy approaches its minimum value of negative infinity, and a maximum amount of order. If either ''α'' or ''β'' approaches infinity (and the other is finite) all the probability density is concentrated at an end, and the probability density is zero everywhere else. If both shape parameters are equal (the symmetric case), ''α'' = ''β'', and they approach infinity simultaneously, the probability density becomes a spike ( Dirac delta function) concentrated at the middle ''x'' = 1/2, and hence there is 100% probability at the middle ''x'' = 1/2 and zero probability everywhere else. The (continuous case) information entropy, differential entropy was introduced by Shannon in his original paper (where he named it the "entropy of a continuous distribution"), as the concluding part of the same paper where he defined the information entropy, discrete entropy. It is known since then that the differential entropy may differ from the infinitesimal limit of the discrete entropy by an infinite offset, therefore the differential entropy can be negative (as it is for the beta distribution). What really matters is the relative value of entropy. Given two beta distributed random variables, ''X''1 ~ Beta(''α'', ''β'') and ''X''2 ~ Beta(''α''′, ''β''′), the cross entropy is (measured in nats) :\begin H(X_1,X_2) &= \int_0^1 - f(x;\alpha,\beta) \ln (f(x;\alpha',\beta')) \,dx \\ pt&= \ln \left(\Beta(\alpha',\beta')\right)-(\alpha'-1)\psi(\alpha)-(\beta'-1)\psi(\beta)+(\alpha'+\beta'-2)\psi(\alpha+\beta). \end The cross entropy has been used as an error metric to measure the distance between two hypotheses. Its absolute value is minimum when the two distributions are identical. It is the information measure most closely related to the log maximum likelihood (see section on "Parameter estimation. Maximum likelihood estimation")). The relative entropy, or Kullback–Leibler divergence ''D''KL(''X''1 , , ''X''2), is a measure of the inefficiency of assuming that the distribution is ''X''2 ~ Beta(''α''′, ''β''′) when the distribution is really ''X''1 ~ Beta(''α'', ''β''). It is defined as follows (measured in nats). 
:\begin D_(X_1, , X_2) &= \int_0^1 f(x;\alpha,\beta) \ln \left (\frac \right ) \, dx \\ pt&= \left (\int_0^1 f(x;\alpha,\beta) \ln (f(x;\alpha,\beta)) \,dx \right )- \left (\int_0^1 f(x;\alpha,\beta) \ln (f(x;\alpha',\beta')) \, dx \right )\\ pt&= -h(X_1) + H(X_1,X_2)\\ pt&= \ln\left(\frac\right)+(\alpha-\alpha')\psi(\alpha)+(\beta-\beta')\psi(\beta)+(\alpha'-\alpha+\beta'-\beta)\psi (\alpha + \beta). \end The relative entropy, or Kullback–Leibler divergence, is always non-negative. A few numerical examples follow: *''X''1 ~ Beta(1, 1) and ''X''2 ~ Beta(3, 3); ''D''KL(''X''1 , , ''X''2) = 0.598803; ''D''KL(''X''2 , , ''X''1) = 0.267864; ''h''(''X''1) = 0; ''h''(''X''2) = −0.267864 *''X''1 ~ Beta(3, 0.5) and ''X''2 ~ Beta(0.5, 3); ''D''KL(''X''1 , , ''X''2) = 7.21574; ''D''KL(''X''2 , , ''X''1) = 7.21574; ''h''(''X''1) = −1.10805; ''h''(''X''2) = −1.10805. The Kullback–Leibler divergence is not symmetric ''D''KL(''X''1 , , ''X''2) ≠ ''D''KL(''X''2 , , ''X''1) for the case in which the individual beta distributions Beta(1, 1) and Beta(3, 3) are symmetric, but have different entropies ''h''(''X''1) ≠ ''h''(''X''2). The value of the Kullback divergence depends on the direction traveled: whether going from a higher (differential) entropy to a lower (differential) entropy or the other way around. In the numerical example above, the Kullback divergence measures the inefficiency of assuming that the distribution is (bell-shaped) Beta(3, 3), rather than (uniform) Beta(1, 1). The "h" entropy of Beta(1, 1) is higher than the "h" entropy of Beta(3, 3) because the uniform distribution Beta(1, 1) has a maximum amount of disorder. The Kullback divergence is more than two times higher (0.598803 instead of 0.267864) when measured in the direction of decreasing entropy: the direction that assumes that the (uniform) Beta(1, 1) distribution is (bell-shaped) Beta(3, 3) rather than the other way around. In this restricted sense, the Kullback divergence is consistent with the second law of thermodynamics. The Kullback–Leibler divergence is symmetric ''D''KL(''X''1 , , ''X''2) = ''D''KL(''X''2 , , ''X''1) for the skewed cases Beta(3, 0.5) and Beta(0.5, 3) that have equal differential entropy ''h''(''X''1) = ''h''(''X''2). The symmetry condition: :D_(X_1, , X_2) = D_(X_2, , X_1),\texth(X_1) = h(X_2),\text\alpha \neq \beta follows from the above definitions and the mirror-symmetry ''f''(''x''; ''α'', ''β'') = ''f''(1−''x''; ''α'', ''β'') enjoyed by the beta distribution.
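The closed forms for the differential entropy and the Kullback–Leibler divergence translate directly into code. The sketch below (an added illustration, not part of the original article) reproduces the numerical examples quoted above using scipy.special.betaln and digamma.
<syntaxhighlight lang="python">
from scipy.special import betaln, digamma
from scipy.stats import beta

def beta_entropy(a, b):
    """Differential entropy of Beta(a, b) in nats."""
    return (betaln(a, b) - (a - 1) * digamma(a) - (b - 1) * digamma(b)
            + (a + b - 2) * digamma(a + b))

def beta_kl(a1, b1, a2, b2):
    """D_KL( Beta(a1, b1) || Beta(a2, b2) ) in nats."""
    return (betaln(a2, b2) - betaln(a1, b1)
            + (a1 - a2) * digamma(a1) + (b1 - b2) * digamma(b1)
            + (a2 - a1 + b2 - b1) * digamma(a1 + b1))

print(beta_entropy(3, 3), beta(3, 3).entropy())   # both approximately -0.267864
print(beta_kl(1, 1, 3, 3))                        # approximately 0.598803
print(beta_kl(3, 3, 1, 1))                        # approximately 0.267864
</syntaxhighlight>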


Relationships between statistical measures


Mean, mode and median relationship

If 1 < α < β then mode ≤ median ≤ mean.Kerman J (2011) "A closed-form approximation for the median of the beta distribution". Expressing the mode (only for α, β > 1) and the mean in terms of α and β:
: \frac{\alpha - 1}{\alpha + \beta - 2} \le \text{median} \le \frac{\alpha}{\alpha + \beta} ,
If 1 < β < α then the order of the inequalities is reversed. For α, β > 1 the absolute distance between the mean and the median is less than 5% of the distance between the maximum and minimum values of ''x''. On the other hand, the absolute distance between the mean and the mode can reach 50% of the distance between the maximum and minimum values of ''x'' for the (Pathological (mathematics), pathological) case of α = 1 and β = 1, for which values the beta distribution approaches the uniform distribution and the information entropy, differential entropy approaches its Maxima and minima, maximum value, and hence maximum "disorder". For example, for α = 1.0001 and β = 1.00000001:
* mode = 0.9999; PDF(mode) = 1.00010
* mean = 0.500025; PDF(mean) = 1.00003
* median = 0.500035; PDF(median) = 1.00003
* mean − mode = −0.499875
* mean − median = −9.65538 × 10−6
where PDF stands for the value of the
probability density function
.
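A quick check of the ordering mode ≤ median ≤ mean for 1 < α < β (an added sketch, not from the original article; parameter values arbitrary):
<syntaxhighlight lang="python">
from scipy.stats import beta

a, b = 2.0, 5.0                      # 1 < alpha < beta
mode = (a - 1) / (a + b - 2)         # 0.2
mean = a / (a + b)                   # about 0.2857
median = beta(a, b).median()
assert mode <= median <= mean
print(mode, median, mean)
</syntaxhighlight>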


Mean, geometric mean and harmonic mean relationship

It is known from the inequality of arithmetic and geometric means that the geometric mean is lower than the mean. Similarly, the harmonic mean is lower than the geometric mean. The accompanying plot shows that for α = β, both the mean and the median are exactly equal to 1/2, regardless of the value of α = β, and the mode is also equal to 1/2 for α = β > 1, however the geometric and harmonic means are lower than 1/2 and they only approach this value asymptotically as α = β → ∞.
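For the symmetric case the three means are easy to compare numerically (an added sketch, not part of the original article; the harmonic-mean formula below requires α > 1):
<syntaxhighlight lang="python">
import numpy as np
from scipy.special import digamma

a = b = 3.0
arithmetic_mean = a / (a + b)                          # exactly 1/2 for any a = b
geometric_mean = np.exp(digamma(a) - digamma(a + b))   # below 1/2, tends to 1/2 as a = b grows
harmonic_mean = (a - 1) / (a + b - 1)                  # below the geometric mean
print(arithmetic_mean, geometric_mean, harmonic_mean)
</syntaxhighlight>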


Kurtosis bounded by the square of the skewness

As remarked by William Feller, Feller, in the Pearson distribution, Pearson system the beta probability density appears as Pearson distribution, type I (any difference between the beta distribution and Pearson's type I distribution is only superficial and it makes no difference for the following discussion regarding the relationship between kurtosis and skewness). Karl Pearson showed, in Plate 1 of his paper published in 1916, a graph with the kurtosis as the vertical axis (ordinate) and the square of the
skewness
as the horizontal axis (abscissa), in which a number of distributions were displayed. The region occupied by the beta distribution is bounded by the following two Line (geometry), lines in the (skewness2,kurtosis) Cartesian coordinate system, plane, or the (skewness2,excess kurtosis) Cartesian coordinate system, plane: :(\text)^2+1< \text< \frac (\text)^2 + 3 or, equivalently, :(\text)^2-2< \text< \frac (\text)^2 At a time when there were no powerful digital computers, Karl Pearson accurately computed further boundaries, for example, separating the "U-shaped" from the "J-shaped" distributions. The lower boundary line (excess kurtosis + 2 − skewness2 = 0) is produced by skewed "U-shaped" beta distributions with both values of shape parameters α and β close to zero. The upper boundary line (excess kurtosis − (3/2) skewness2 = 0) is produced by extremely skewed distributions with very large values of one of the parameters and very small values of the other parameter. Karl Pearson showed that this upper boundary line (excess kurtosis − (3/2) skewness2 = 0) is also the intersection with Pearson's distribution III, which has unlimited support in one direction (towards positive infinity), and can be bell-shaped or J-shaped. His son, Egon Pearson, showed that the region (in the kurtosis/squared-skewness plane) occupied by the beta distribution (equivalently, Pearson's distribution I) as it approaches this boundary (excess kurtosis − (3/2) skewness2 = 0) is shared with the noncentral chi-squared distribution. Karl Pearson (Pearson 1895, pp. 357, 360, 373–376) also showed that the gamma distribution is a Pearson type III distribution. Hence this boundary line for Pearson's type III distribution is known as the gamma line. (This can be shown from the fact that the excess kurtosis of the gamma distribution is 6/''k'' and the square of the skewness is 4/''k'', hence (excess kurtosis − (3/2) skewness2 = 0) is identically satisfied by the gamma distribution regardless of the value of the parameter "k"). Pearson later noted that the chi-squared distribution is a special case of Pearson's type III and also shares this boundary line (as it is apparent from the fact that for the chi-squared distribution the excess kurtosis is 12/''k'' and the square of the skewness is 8/''k'', hence (excess kurtosis − (3/2) skewness2 = 0) is identically satisfied regardless of the value of the parameter "k"). This is to be expected, since the chi-squared distribution ''X'' ~ χ2(''k'') is a special case of the gamma distribution, with parametrization X ~ Γ(k/2, 1/2) where k is a positive integer that specifies the "number of degrees of freedom" of the chi-squared distribution. An example of a beta distribution near the upper boundary (excess kurtosis − (3/2) skewness2 = 0) is given by α = 0.1, β = 1000, for which the ratio (excess kurtosis)/(skewness2) = 1.49835 approaches the upper limit of 1.5 from below. An example of a beta distribution near the lower boundary (excess kurtosis + 2 − skewness2 = 0) is given by α= 0.0001, β = 0.1, for which values the expression (excess kurtosis + 2)/(skewness2) = 1.01621 approaches the lower limit of 1 from above. In the infinitesimal limit for both α and β approaching zero symmetrically, the excess kurtosis reaches its minimum value at −2. This minimum value occurs at the point at which the lower boundary line intersects the vertical axis (ordinate). 
(However, in Pearson's original chart, the ordinate is kurtosis, instead of excess kurtosis, and it increases downwards rather than upwards). Values for the skewness and excess kurtosis below the lower boundary (excess kurtosis + 2 − skewness2 = 0) cannot occur for any distribution, and hence Karl Pearson appropriately called the region below this boundary the "impossible region". The boundary for this "impossible region" is determined by (symmetric or skewed) bimodal "U"-shaped distributions for which the parameters α and β approach zero and hence all the probability density is concentrated at the ends: ''x'' = 0, 1 with practically nothing in between them. Since for α ≈ β ≈ 0 the probability density is concentrated at the two ends ''x'' = 0 and ''x'' = 1, this "impossible boundary" is determined by a
Bernoulli distribution
, where the two only possible outcomes occur with respective probabilities ''p'' and ''q'' = 1−''p''. For cases approaching this limit boundary with symmetry α = β, skewness ≈ 0, excess kurtosis ≈ −2 (this is the lowest excess kurtosis possible for any distribution), and the probabilities are ''p'' ≈ ''q'' ≈ 1/2. For cases approaching this limit boundary with skewness, excess kurtosis ≈ −2 + skewness2, and the probability density is concentrated more at one end than the other end (with practically nothing in between), with probabilities p = \tfrac at the left end ''x'' = 0 and q = 1-p = \tfrac at the right end ''x'' = 1.
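The region described above, skewness² − 2 < excess kurtosis < (3/2) skewness², can be checked for particular parameter pairs with SciPy (an added sketch, not part of the original article; the parameter pairs include the two near-boundary examples mentioned in the text):
<syntaxhighlight lang="python">
from scipy.stats import beta

for a, b in [(0.1, 1000.0), (0.0001, 0.1), (2.0, 5.0)]:
    skew, excess_kurt = (float(m) for m in beta(a, b).stats(moments="sk"))
    inside = skew ** 2 - 2 < excess_kurt < 1.5 * skew ** 2
    print(a, b, inside)     # True for every beta distribution
</syntaxhighlight>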


Symmetry

All statements are conditional on α, β > 0 * Probability density function Symmetry, reflection symmetry ::f(x;\alpha,\beta) = f(1-x;\beta,\alpha) * Cumulative distribution function Symmetry, reflection symmetry plus unitary Symmetry, translation ::F(x;\alpha,\beta) = I_x(\alpha,\beta) = 1- F(1- x;\beta,\alpha) = 1 - I_(\beta,\alpha) * Mode Symmetry, reflection symmetry plus unitary Symmetry, translation ::\operatorname(\Beta(\alpha, \beta))= 1-\operatorname(\Beta(\beta, \alpha)),\text\Beta(\beta, \alpha)\ne \Beta(1,1) * Median Symmetry, reflection symmetry plus unitary Symmetry, translation ::\operatorname (\Beta(\alpha, \beta) )= 1 - \operatorname (\Beta(\beta, \alpha)) * Mean Symmetry, reflection symmetry plus unitary Symmetry, translation ::\mu (\Beta(\alpha, \beta) )= 1 - \mu (\Beta(\beta, \alpha) ) * Geometric Means each is individually asymmetric, the following symmetry applies between the geometric mean based on ''X'' and the geometric mean based on its
reflection
(1-X) ::G_X (\Beta(\alpha, \beta) )=G_(\Beta(\beta, \alpha) ) * Harmonic means each is individually asymmetric, the following symmetry applies between the harmonic mean based on ''X'' and the harmonic mean based on its
reflection
(1-X) ::H_X (\Beta(\alpha, \beta) )=H_(\Beta(\beta, \alpha) ) \text \alpha, \beta > 1 . * Variance symmetry ::\operatorname (\Beta(\alpha, \beta) )=\operatorname (\Beta(\beta, \alpha) ) * Geometric variances each is individually asymmetric, the following symmetry applies between the log geometric variance based on X and the log geometric variance based on its
reflection
(1-X) ::\ln(\operatorname (\Beta(\alpha, \beta))) = \ln(\operatorname(\Beta(\beta, \alpha))) * Geometric covariance symmetry ::\ln \operatorname(\Beta(\alpha, \beta))=\ln \operatorname(\Beta(\beta, \alpha)) * Mean absolute deviation around the mean symmetry ::\operatorname[, X - E ] (\Beta(\alpha, \beta))=\operatorname[, X - E ] (\Beta(\beta, \alpha)) * Skewness Symmetry (mathematics), skew-symmetry ::\operatorname (\Beta(\alpha, \beta) )= - \operatorname (\Beta(\beta, \alpha) ) * Excess kurtosis symmetry ::\text (\Beta(\alpha, \beta) )= \text (\Beta(\beta, \alpha) ) * Characteristic function symmetry of Real part (with respect to the origin of variable "t") :: \text [_1F_1(\alpha; \alpha+\beta; it) ] = \text [ _1F_1(\alpha; \alpha+\beta; - it)] * Characteristic function Symmetry (mathematics), skew-symmetry of Imaginary part (with respect to the origin of variable "t") :: \text [_1F_1(\alpha; \alpha+\beta; it) ] = - \text [ _1F_1(\alpha; \alpha+\beta; - it) ] * Characteristic function symmetry of Absolute value (with respect to the origin of variable "t") :: \text [ _1F_1(\alpha; \alpha+\beta; it) ] = \text [ _1F_1(\alpha; \alpha+\beta; - it) ] * Differential entropy symmetry ::h(\Beta(\alpha, \beta) )= h(\Beta(\beta, \alpha) ) * Relative Entropy (also called Kullback–Leibler divergence) symmetry ::D_(X_1, , X_2) = D_(X_2, , X_1), \texth(X_1) = h(X_2)\text\alpha \neq \beta * Fisher information matrix symmetry ::_ = _


Geometry of the probability density function


Inflection points

For certain values of the shape parameters α and β, the
probability density function
has inflection points, at which the curvature changes sign. The position of these inflection points can be useful as a measure of the Statistical dispersion, dispersion or spread of the distribution. Defining the following quantity: :\kappa =\frac Points of inflection occur, depending on the value of the shape parameters α and β, as follows: *(α > 2, β > 2) The distribution is bell-shaped (symmetric for α = β and skewed otherwise), with two inflection points, equidistant from the mode: ::x = \text \pm \kappa = \frac * (α = 2, β > 2) The distribution is unimodal, positively skewed, right-tailed, with one inflection point, located to the right of the mode: ::x =\text + \kappa = \frac * (α > 2, β = 2) The distribution is unimodal, negatively skewed, left-tailed, with one inflection point, located to the left of the mode: ::x = \text - \kappa = 1 - \frac * (1 < α < 2, β > 2, α+β>2) The distribution is unimodal, positively skewed, right-tailed, with one inflection point, located to the right of the mode: ::x =\text + \kappa = \frac *(0 < α < 1, 1 < β < 2) The distribution has a mode at the left end ''x'' = 0 and it is positively skewed, right-tailed. There is one inflection point, located to the right of the mode: ::x = \frac *(α > 2, 1 < β < 2) The distribution is unimodal negatively skewed, left-tailed, with one inflection point, located to the left of the mode: ::x =\text - \kappa = \frac *(1 < α < 2, 0 < β < 1) The distribution has a mode at the right end ''x''=1 and it is negatively skewed, left-tailed. There is one inflection point, located to the left of the mode: ::x = \frac There are no inflection points in the remaining (symmetric and skewed) regions: U-shaped: (α, β < 1) upside-down-U-shaped: (1 < α < 2, 1 < β < 2), reverse-J-shaped (α < 1, β > 2) or J-shaped: (α > 2, β < 1) The accompanying plots show the inflection point locations (shown vertically, ranging from 0 to 1) versus α and β (the horizontal axes ranging from 0 to 5). There are large cuts at surfaces intersecting the lines α = 1, β = 1, α = 2, and β = 2 because at these values the beta distribution change from 2 modes, to 1 mode to no mode.
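The quantity κ used above is, in the standard parametrization, κ = sqrt((α−1)(β−1)/(α+β−3))/(α+β−2). The sketch below (an added illustration, not part of the original article; parameter values arbitrary) compares mode ± κ with a numerical search for sign changes of the second derivative of the density, for one bell-shaped case with α, β > 2.
<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import beta

a, b = 4.0, 6.0                                    # alpha, beta > 2: two inflection points
mode = (a - 1) / (a + b - 2)
kappa = np.sqrt((a - 1) * (b - 1) / (a + b - 3)) / (a + b - 2)

x = np.linspace(1e-3, 1 - 1e-3, 20001)
second_derivative = np.gradient(np.gradient(beta(a, b).pdf(x), x), x)
numerical = x[np.where(np.diff(np.sign(second_derivative)))[0]]

print(mode - kappa, mode + kappa)                  # analytic inflection points
print(numerical)                                   # should land at the same two locations
</syntaxhighlight>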


Shapes

The beta density function can take a wide variety of different shapes depending on the values of the two parameters ''α'' and ''β''. The ability of the beta distribution to take this great diversity of shapes (using only two parameters) is partly responsible for finding wide application for modeling actual measurements:


=Symmetric (''α'' = ''β'')

= * the density function is symmetry, symmetric about 1/2 (blue & teal plots). * median = mean = 1/2. *skewness = 0. *variance = 1/(4(2α + 1)) *α = β < 1 **U-shaped (blue plot). **bimodal: left mode = 0, right mode =1, anti-mode = 1/2 **1/12 < var(''X'') < 1/4 **−2 < excess kurtosis(''X'') < −6/5 ** α = β = 1/2 is the arcsine distribution *** var(''X'') = 1/8 ***excess kurtosis(''X'') = −3/2 ***CF = Rinc (t) ** α = β → 0 is a 2-point
Bernoulli distribution
with equal probability 1/2 at each Dirac delta function end ''x'' = 0 and ''x'' = 1 and zero probability everywhere else. A coin toss: one face of the coin being ''x'' = 0 and the other face being ''x'' = 1. *** \lim_ \operatorname(X) = \tfrac *** \lim_ \operatorname(X) = - 2 a lower value than this is impossible for any distribution to reach. *** The information entropy, differential entropy approaches a Maxima and minima, minimum value of −∞ *α = β = 1 **the uniform distribution (continuous), uniform
[0, 1]
distribution **no mode **var(''X'') = 1/12 **excess kurtosis(''X'') = −6/5 **The (negative anywhere else) information entropy, differential entropy reaches its Maxima and minima, maximum value of zero **CF = Sinc (t) *''α'' = ''β'' > 1 **symmetric unimodal ** mode = 1/2. **0 < var(''X'') < 1/12 **−6/5 < excess kurtosis(''X'') < 0 **''α'' = ''β'' = 3/2 is a semi-elliptic
[0, 1]
distribution, see: Wigner semicircle distribution ***var(''X'') = 1/16. ***excess kurtosis(''X'') = −1 ***CF = 2 Jinc (t) **''α'' = ''β'' = 2 is the parabolic
[0, 1]
distribution ***var(''X'') = 1/20 ***excess kurtosis(''X'') = −6/7 ***CF = 3 Tinc (t) **''α'' = ''β'' > 2 is bell-shaped, with inflection points located to either side of the mode ***0 < var(''X'') < 1/20 ***−6/7 < excess kurtosis(''X'') < 0 **''α'' = ''β'' → ∞ is a 1-point
degenerate distribution
with a Dirac delta function spike at the midpoint ''x'' = 1/2 with probability 1, and zero probability everywhere else. There is 100% probability (absolute certainty) concentrated at the single point ''x'' = 1/2. *** \lim_ \operatorname(X) = 0 *** \lim_ \operatorname(X) = 0 ***The information entropy, differential entropy approaches a Maxima and minima, minimum value of −∞


=Skewed (''α'' ≠ ''β'')

= The density function is Skewness, skewed. An interchange of parameter values yields the mirror image (the reverse) of the initial curve, some more specific cases: *''α'' < 1, ''β'' < 1 ** U-shaped ** Positive skew for α < β, negative skew for α > β. ** bimodal: left mode = 0, right mode = 1, anti-mode = \tfrac ** 0 < median < 1. ** 0 < var(''X'') < 1/4 *α > 1, β > 1 ** unimodal (magenta & cyan plots), **Positive skew for α < β, negative skew for α > β. **\text= \tfrac ** 0 < median < 1 ** 0 < var(''X'') < 1/12 *α < 1, β ≥ 1 **reverse J-shaped with a right tail, **positively skewed, **strictly decreasing, convex function, convex ** mode = 0 ** 0 < median < 1/2. ** 0 < \operatorname(X) < \tfrac, (maximum variance occurs for \alpha=\tfrac, \beta=1, or α = Φ the Golden ratio, golden ratio conjugate) *α ≥ 1, β < 1 **J-shaped with a left tail, **negatively skewed, **strictly increasing, convex function, convex ** mode = 1 ** 1/2 < median < 1 ** 0 < \operatorname(X) < \tfrac, (maximum variance occurs for \alpha=1, \beta=\tfrac, or β = Φ the Golden ratio, golden ratio conjugate) *α = 1, β > 1 **positively skewed, **strictly decreasing (red plot), **a reversed (mirror-image) power function ,1distribution ** mean = 1 / (β + 1) ** median = 1 - 1/21/β ** mode = 0 **α = 1, 1 < β < 2 ***concave function, concave *** 1-\tfrac< \text < \tfrac *** 1/18 < var(''X'') < 1/12. **α = 1, β = 2 ***a straight line with slope −2, the right-triangular distribution with right angle at the left end, at ''x'' = 0 *** \text=1-\tfrac *** var(''X'') = 1/18 **α = 1, β > 2 ***reverse J-shaped with a right tail, ***convex function, convex *** 0 < \text < 1-\tfrac *** 0 < var(''X'') < 1/18 *α > 1, β = 1 **negatively skewed, **strictly increasing (green plot), **the power function
[0, 1]
distribution ** mean = α / (α + 1) ** median = 1/21/α ** mode = 1 **2 > α > 1, β = 1 ***concave function, concave *** \tfrac < \text < \tfrac *** 1/18 < var(''X'') < 1/12 ** α = 2, β = 1 ***a straight line with slope +2, the right-triangular distribution with right angle at the right end, at ''x'' = 1 *** \text=\tfrac *** var(''X'') = 1/18 **α > 2, β = 1 ***J-shaped with a left tail, convex function, convex ***\tfrac < \text < 1 *** 0 < var(''X'') < 1/18


Related distributions


Transformations

* If ''X'' ~ Beta(''α'', ''β'') then 1 − ''X'' ~ Beta(''β'', ''α'') Mirror image, mirror-image symmetry
* If ''X'' ~ Beta(''α'', ''β'') then \tfrac{X}{1-X} \sim \operatorname{Beta'}(\alpha,\beta). The beta prime distribution, also called "beta distribution of the second kind".
* If ''X'' ~ Beta(''α'', ''β'') then \tfrac{1}{X} -1 \sim \operatorname{Beta'}(\beta,\alpha).
* If ''X'' ~ Beta(''n''/2, ''m''/2) then \tfrac{mX}{n(1-X)} \sim F(n,m) (assuming ''n'' > 0 and ''m'' > 0), the F-distribution, Fisher–Snedecor F distribution.
* If X \sim \operatorname{Beta}\left(1+\lambda\tfrac{m-\min}{\max-\min}, 1 + \lambda\tfrac{\max-m}{\max-\min}\right) then min + ''X''(max − min) ~ PERT(min, max, ''m'', ''λ'') where ''PERT'' denotes a PERT distribution used in PERT analysis, and ''m''=most likely value.Herrerías-Velasco, José Manuel and Herrerías-Pleguezuelo, Rafael and René van Dorp, Johan. (2011). Revisiting the PERT mean and Variance. European Journal of Operational Research (210), p. 448–451. Traditionally ''λ'' = 4 in PERT analysis.
* If ''X'' ~ Beta(1, ''β'') then ''X'' ~ Kumaraswamy distribution with parameters (1, ''β'')
* If ''X'' ~ Beta(''α'', 1) then ''X'' ~ Kumaraswamy distribution with parameters (''α'', 1)
* If ''X'' ~ Beta(''α'', 1) then −ln(''X'') ~ Exponential(''α'')
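Two of the transformations listed above are easy to test by simulation (an added sketch, not part of the original article; sample sizes and seed arbitrary): X/(1−X) should pass a Kolmogorov–Smirnov test against the beta prime distribution, and −ln X for Beta(α, 1) against the exponential distribution.
<syntaxhighlight lang="python">
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
a, b = 2.0, 3.0

x = rng.beta(a, b, size=50_000)
print(stats.kstest(x / (1 - x), stats.betaprime(a, b).cdf))    # large p-value expected

y = rng.beta(a, 1.0, size=50_000)
print(stats.kstest(-np.log(y), stats.expon(scale=1 / a).cdf))  # large p-value expected
</syntaxhighlight>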


Special and limiting cases

* Beta(1, 1) ~ uniform distribution (continuous), U(0, 1).
* Beta(n, 1) ~ Maximum of ''n'' independent rvs. with uniform distribution (continuous), U(0, 1), sometimes called a ''standard power function distribution'' with density ''n'' ''x''''n''-1 on that interval.
* Beta(1, n) ~ Minimum of ''n'' independent rvs. with uniform distribution (continuous), U(0, 1)
* If ''X'' ~ Beta(3/2, 3/2) and ''r'' > 0 then 2''rX'' − ''r'' ~ Wigner semicircle distribution.
* Beta(1/2, 1/2) is equivalent to the arcsine distribution. This distribution is also Jeffreys prior probability for the
Bernoulli
and binomial distributions. The arcsine probability density is a distribution that appears in several random-walk fundamental theorems. In a fair coin toss
random walk
, the probability for the time of the last visit to the origin is distributed as an (U-shaped) arcsine distribution. In a two-player fair-coin-toss game, a player is said to be in the lead if the random walk (that started at the origin) is above the origin. The most probable number of times that a given player will be in the lead, in a game of length 2''N'', is not ''N''. On the contrary, ''N'' is the least likely number of times that the player will be in the lead. The most likely number of times in the lead is 0 or 2''N'' (following the arcsine distribution). * \lim_ n \operatorname(1,n) = \operatorname(1) the exponential distribution. * \lim_ n \operatorname(k,n) = \operatorname(k,1) the gamma distribution. * For large n, \operatorname(\alpha n,\beta n) \to \mathcal\left(\frac,\frac\frac\right) the normal distribution. More precisely, if X_n \sim \operatorname(\alpha n,\beta n) then \sqrt\left(X_n -\tfrac\right) converges in distribution to a normal distribution with mean 0 and variance \tfrac as ''n'' increases.
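The limiting statements can be probed by simulation (an added sketch, not part of the original article; the sample sizes, seed and the choice n = 500 are arbitrary, and the Kolmogorov–Smirnov statistics are only expected to be small, not exactly zero, since the limits are asymptotic):
<syntaxhighlight lang="python">
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# n * Beta(1, n) approaches Exponential(1) as n grows
n = 10_000
z = n * rng.beta(1.0, n, size=20_000)
print(stats.kstest(z, stats.expon().cdf))

# Beta(a*n, b*n) is approximately normal for large n
a, b, n = 2.0, 3.0, 500
w = rng.beta(a * n, b * n, size=20_000)
mu, sigma = a / (a + b), np.sqrt(a * b / ((a + b) ** 3 * n))
print(stats.kstest(w, stats.norm(mu, sigma).cdf))
</syntaxhighlight>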


Derived from other distributions

* The ''k''th order statistic of a sample of size ''n'' from the Uniform distribution (continuous), uniform distribution is a beta random variable, ''U''(''k'') ~ Beta(''k'', ''n''+1−''k''). * If ''X'' ~ Gamma(α, θ) and ''Y'' ~ Gamma(β, θ) are independent, then \tfrac \sim \operatorname(\alpha, \beta)\,. * If X \sim \chi^2(\alpha)\, and Y \sim \chi^2(\beta)\, are independent, then \tfrac \sim \operatorname(\tfrac, \tfrac). * If ''X'' ~ U(0, 1) and ''α'' > 0 then ''X''1/''α'' ~ Beta(''α'', 1). The power function distribution. * If X \sim\operatorname(k;n;p), then \sim \operatorname(\alpha, \beta) for discrete values of ''n'' and ''k'' where \alpha=k+1 and \beta=n-k+1. * If ''X'' ~ Cauchy(0, 1) then \tfrac \sim \operatorname\left(\tfrac12, \tfrac12\right)\,


Combination with other distributions

* ''X'' ~ Beta(''α'', ''β'') and ''Y'' ~ F(2''β'',2''α'') then \Pr\left(X \leq \tfrac{\alpha}{\alpha+\beta x}\right) = \Pr(Y \geq x)\, for all ''x'' > 0.


Compounding with other distributions

* If ''p'' ~ Beta(α, β) and ''X'' ~ Bin(''k'', ''p'') then ''X'' ~ beta-binomial distribution * If ''p'' ~ Beta(α, β) and ''X'' ~ NB(''r'', ''p'') then ''X'' ~ beta negative binomial distribution


Generalisations

* The generalization to multiple variables, i.e. a Dirichlet distribution, multivariate Beta distribution, is called a
Dirichlet distribution
. Univariate marginals of the Dirichlet distribution have a beta distribution. The beta distribution is Conjugate prior, conjugate to the binomial and Bernoulli distributions in exactly the same way as the
Dirichlet distribution
is conjugate to the multinomial distribution and categorical distribution. * The Pearson distribution#The Pearson type I distribution, Pearson type I distribution is identical to the beta distribution (except for arbitrary shifting and re-scaling that can also be accomplished with the four parameter parametrization of the beta distribution). * The beta distribution is the special case of the noncentral beta distribution where \lambda = 0: \operatorname(\alpha, \beta) = \operatorname(\alpha,\beta,0). * The generalized beta distribution is a five-parameter distribution family which has the beta distribution as a special case. * The matrix variate beta distribution is a distribution for positive-definite matrices.


Statistical inference


Parameter estimation


Method of moments


=Two unknown parameters

= Two unknown parameters (\hat{\alpha}, \hat{\beta}) of a beta distribution supported on the [0, 1] interval can be estimated, using the method of moments, with the first two moments (sample mean and sample variance) as follows. Let:
: \text{sample mean} = \bar{x} = \frac{1}{N}\sum_{i=1}^N X_i
be the sample mean estimate and
: \text{sample variance} = \bar{v} = \frac{1}{N-1}\sum_{i=1}^N (X_i - \bar{x})^2
be the sample variance estimate. The method of moments (statistics), method-of-moments estimates of the parameters are
:\hat{\alpha} = \bar{x} \left(\frac{\bar{x}(1-\bar{x})}{\bar{v}} - 1 \right), if \bar{v} <\bar{x}(1 - \bar{x}),
: \hat{\beta} = (1-\bar{x}) \left(\frac{\bar{x}(1-\bar{x})}{\bar{v}} - 1 \right), if \bar{v}<\bar{x}(1 - \bar{x}).
When the distribution is required over a known interval other than [0, 1] with random variable ''X'', say [''a'', ''c''] with random variable ''Y'', then replace \bar{x} with \frac{\bar{y}-a}{c-a} and \bar{v} with \frac{\bar{v}_Y}{(c-a)^2} in the above pair of equations for the shape parameters (see the "Alternative parametrizations, four parameters" section below), where:
: \text{sample mean} = \bar{y} = \frac{1}{N}\sum_{i=1}^N Y_i
: \text{sample variance} = \bar{v}_Y = \frac{1}{N-1}\sum_{i=1}^N (Y_i - \bar{y})^2
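A direct implementation of these two estimators (an added sketch, not part of the original article; the synthetic data and seed are arbitrary):
<syntaxhighlight lang="python">
import numpy as np

def beta_method_of_moments(sample):
    """Moment estimates of (alpha, beta) for data on (0, 1); requires var < mean*(1-mean)."""
    m = np.mean(sample)
    v = np.var(sample, ddof=1)
    common = m * (1 - m) / v - 1
    return m * common, (1 - m) * common

rng = np.random.default_rng(4)
data = rng.beta(2.0, 5.0, size=50_000)
print(beta_method_of_moments(data))      # should be close to (2, 5)
</syntaxhighlight>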


=Four unknown parameters

= All four parameters (\hat, \hat, \hat, \hat of a beta distribution supported in the [''a'', ''c''] interval -see section Beta distribution#Four parameters 2, "Alternative parametrizations, Four parameters"-) can be estimated, using the method of moments developed by Karl Pearson, by equating sample and population values of the first four central moments (mean, variance, skewness and excess kurtosis). The excess kurtosis was expressed in terms of the square of the skewness, and the sample size ν = α + β, (see previous section Beta distribution#Kurtosis, "Kurtosis") as follows: :\text =\frac\left(\frac (\text)^2 - 1\right)\text^2-2< \text< \tfrac (\text)^2 One can use this equation to solve for the sample size ν= α + β in terms of the square of the skewness and the excess kurtosis as follows: :\hat = \hat + \hat = 3\frac :\text^2-2< \text< \tfrac (\text)^2 This is the ratio (multiplied by a factor of 3) between the previously derived limit boundaries for the beta distribution in a space (as originally done by Karl Pearson) defined with coordinates of the square of the skewness in one axis and the excess kurtosis in the other axis (see ): The case of zero skewness, can be immediately solved because for zero skewness, α = β and hence ν = 2α = 2β, therefore α = β = ν/2 : \hat = \hat = \frac= \frac : \text= 0 \text -2<\text<0 (Excess kurtosis is negative for the beta distribution with zero skewness, ranging from -2 to 0, so that \hat -and therefore the sample shape parameters- is positive, ranging from zero when the shape parameters approach zero and the excess kurtosis approaches -2, to infinity when the shape parameters approach infinity and the excess kurtosis approaches zero). For non-zero sample skewness one needs to solve a system of two coupled equations. Since the skewness and the excess kurtosis are independent of the parameters \hat, \hat, the parameters \hat, \hat can be uniquely determined from the sample skewness and the sample excess kurtosis, by solving the coupled equations with two known variables (sample skewness and sample excess kurtosis) and two unknowns (the shape parameters): :(\text)^2 = \frac :\text =\frac\left(\frac (\text)^2 - 1\right) :\text^2-2< \text< \tfrac(\text)^2 resulting in the following solution: : \hat, \hat = \frac \left (1 \pm \frac \right ) : \text\neq 0 \text (\text)^2-2< \text< \tfrac (\text)^2 Where one should take the solutions as follows: \hat>\hat for (negative) sample skewness < 0, and \hat<\hat for (positive) sample skewness > 0. The accompanying plot shows these two solutions as surfaces in a space with horizontal axes of (sample excess kurtosis) and (sample squared skewness) and the shape parameters as the vertical axis. The surfaces are constrained by the condition that the sample excess kurtosis must be bounded by the sample squared skewness as stipulated in the above equation. The two surfaces meet at the right edge defined by zero skewness. Along this right edge, both parameters are equal and the distribution is symmetric U-shaped for α = β < 1, uniform for α = β = 1, upside-down-U-shaped for 1 < α = β < 2 and bell-shaped for α = β > 2. The surfaces also meet at the front (lower) edge defined by "the impossible boundary" line (excess kurtosis + 2 - skewness2 = 0). 
Along this front (lower) boundary both shape parameters approach zero, and the probability density is concentrated more at one end than the other end (with practically nothing in between), with probabilities p=\tfrac at the left end ''x'' = 0 and q = 1-p = \tfrac at the right end ''x'' = 1. The two surfaces become further apart towards the rear edge. At this rear edge the surface parameters are quite different from each other. As remarked, for example, by Bowman and Shenton, sampling in the neighborhood of the line (sample excess kurtosis - (3/2)(sample skewness)2 = 0) (the just-J-shaped portion of the rear edge where blue meets beige), "is dangerously near to chaos", because at that line the denominator of the expression above for the estimate ν = α + β becomes zero and hence ν approaches infinity as that line is approached. Bowman and Shenton write that "the higher moment parameters (kurtosis and skewness) are extremely fragile (near that line). However, the mean and standard deviation are fairly reliable." Therefore, the problem is for the case of four parameter estimation for very skewed distributions such that the excess kurtosis approaches (3/2) times the square of the skewness. This boundary line is produced by extremely skewed distributions with very large values of one of the parameters and very small values of the other parameter. See for a numerical example and further comments about this rear edge boundary line (sample excess kurtosis - (3/2)(sample skewness)2 = 0). As remarked by Karl Pearson himself this issue may not be of much practical importance as this trouble arises only for very skewed J-shaped (or mirror-image J-shaped) distributions with very different values of shape parameters that are unlikely to occur much in practice). The usual skewed-bell-shape distributions that occur in practice do not have this parameter estimation problem. The remaining two parameters \hat, \hat can be determined using the sample mean and the sample variance using a variety of equations. One alternative is to calculate the support interval range (\hat-\hat) based on the sample variance and the sample kurtosis. For this purpose one can solve, in terms of the range (\hat- \hat), the equation expressing the excess kurtosis in terms of the sample variance, and the sample size ν (see and ): :\text =\frac\bigg(\frac - 6 - 5 \hat \bigg) to obtain: : (\hat- \hat) = \sqrt\sqrt Another alternative is to calculate the support interval range (\hat-\hat) based on the sample variance and the sample skewness. For this purpose one can solve, in terms of the range (\hat-\hat), the equation expressing the squared skewness in terms of the sample variance, and the sample size ν (see section titled "Skewness" and "Alternative parametrizations, four parameters"): :(\text)^2 = \frac\bigg(\frac-4(1+\hat)\bigg) to obtain: : (\hat- \hat) = \frac\sqrt The remaining parameter can be determined from the sample mean and the previously obtained parameters: (\hat-\hat), \hat, \hat = \hat+\hat: : \hat = (\text) - \left(\frac\right)(\hat-\hat) and finally, \hat= (\hat- \hat) + \hat . In the above formulas one may take, for example, as estimates of the sample moments: :\begin \text &=\overline = \frac\sum_^N Y_i \\ \text &= \overline_Y = \frac\sum_^N (Y_i - \overline)^2 \\ \text &= G_1 = \frac \frac \\ \text &= G_2 = \frac \frac - \frac \end The estimators ''G''1 for skewness, sample skewness and ''G''2 for kurtosis, sample kurtosis are used by DAP (software), DAP/SAS System, SAS, PSPP/SPSS, and Microsoft Excel, Excel. 
However, they are not used by BMDP and (according to ) they were not used by MINITAB in 1998. Actually, Joanes and Gill in their 1998 study concluded that the skewness and kurtosis estimators used in BMDP and in MINITAB (at that time) had smaller variance and mean-squared error in normal samples, but the skewness and kurtosis estimators used in DAP (software), DAP/SAS System, SAS, PSPP/SPSS, namely ''G''1 and ''G''2, had smaller mean-squared error in samples from a very skewed distribution. It is for this reason that we have spelled out "sample skewness", etc., in the above formulas, to make it explicit that the user should choose the best estimator according to the problem at hand, as the best estimator for skewness and kurtosis depends on the amount of skewness (as shown by Joanes and Gill).


Maximum likelihood


=Two unknown parameters

= As is also the case for maximum likelihood estimates for the gamma distribution, the maximum likelihood estimates for the beta distribution do not have a general closed form solution for arbitrary values of the shape parameters. If ''X''1, ..., ''XN'' are independent random variables each having a beta distribution, the joint log likelihood function for ''N'' independent and identically distributed random variables, iid observations is: :\begin \ln\, \mathcal (\alpha, \beta\mid X) &= \sum_^N \ln \left (\mathcal_i (\alpha, \beta\mid X_i) \right )\\ &= \sum_^N \ln \left (f(X_i;\alpha,\beta) \right ) \\ &= \sum_^N \ln \left (\frac \right ) \\ &= (\alpha - 1)\sum_^N \ln (X_i) + (\beta- 1)\sum_^N \ln (1-X_i) - N \ln \Beta(\alpha,\beta) \end Finding the maximum with respect to a shape parameter involves taking the partial derivative with respect to the shape parameter and setting the expression equal to zero yielding the maximum likelihood estimator of the shape parameters: :\frac = \sum_^N \ln X_i -N\frac=0 :\frac = \sum_^N \ln (1-X_i)- N\frac=0 where: :\frac = -\frac+ \frac+ \frac=-\psi(\alpha + \beta) + \psi(\alpha) + 0 :\frac= - \frac+ \frac + \frac=-\psi(\alpha + \beta) + 0 + \psi(\beta) since the
digamma function
denoted ψ(α) is defined as the logarithmic derivative of the
gamma function
: :\psi(\alpha) =\frac To ensure that the values with zero tangent slope are indeed a maximum (instead of a saddle-point or a minimum) one has to also satisfy the condition that the curvature is negative. This amounts to satisfying that the second partial derivative with respect to the shape parameters is negative :\frac= -N\frac<0 :\frac = -N\frac<0 using the previous equations, this is equivalent to: :\frac = \psi_1(\alpha)-\psi_1(\alpha + \beta) > 0 :\frac = \psi_1(\beta) -\psi_1(\alpha + \beta) > 0 where the
trigamma function
, denoted ''ψ''1(''α''), is the second of the
polygamma function
s, and is defined as the derivative of the digamma function: :\psi_1(\alpha) = \frac=\, \frac. These conditions are equivalent to stating that the variances of the logarithmically transformed variables are positive, since: :\operatorname[\ln (X)] = \operatorname[\ln^2 (X)] - (\operatorname[\ln (X)])^2 = \psi_1(\alpha) - \psi_1(\alpha + \beta) :\operatorname ln (1-X)= \operatorname[\ln^2 (1-X)] - (\operatorname[\ln (1-X)])^2 = \psi_1(\beta) - \psi_1(\alpha + \beta) Therefore, the condition of negative curvature at a maximum is equivalent to the statements: : \operatorname[\ln (X)] > 0 : \operatorname ln (1-X)> 0 Alternatively, the condition of negative curvature at a maximum is also equivalent to stating that the following logarithmic derivatives of the geometric means ''GX'' and ''G(1−X)'' are positive, since: : \psi_1(\alpha) - \psi_1(\alpha + \beta) = \frac > 0 : \psi_1(\beta) - \psi_1(\alpha + \beta) = \frac > 0 While these slopes are indeed positive, the other slopes are negative: :\frac, \frac < 0. The slopes of the mean and the median with respect to ''α'' and ''β'' display similar sign behavior. From the condition that at a maximum, the partial derivative with respect to the shape parameter equals zero, we obtain the following system of coupled maximum likelihood estimate equations (for the average log-likelihoods) that needs to be inverted to obtain the (unknown) shape parameter estimates \hat,\hat in terms of the (known) average of logarithms of the samples ''X''1, ..., ''XN'': :\begin \hat[\ln (X)] &= \psi(\hat) - \psi(\hat + \hat)=\frac\sum_^N \ln X_i = \ln \hat_X \\ \hat[\ln(1-X)] &= \psi(\hat) - \psi(\hat + \hat)=\frac\sum_^N \ln (1-X_i)= \ln \hat_ \end where we recognize \log \hat_X as the logarithm of the sample geometric mean and \log \hat_ as the logarithm of the sample geometric mean based on (1 − ''X''), the mirror-image of ''X''. For \hat=\hat, it follows that \hat_X=\hat_ . :\begin \hat_X &= \prod_^N (X_i)^ \\ \hat_ &= \prod_^N (1-X_i)^ \end These coupled equations containing
digamma function
s of the shape parameter estimates \hat,\hat must be solved by numerical methods as done, for example, by Beckman et al. Gnanadesikan et al. give numerical solutions for a few cases. Norman Lloyd Johnson, N.L.Johnson and Samuel Kotz, S.Kotz suggest that for "not too small" shape parameter estimates \hat,\hat, the logarithmic approximation to the digamma function \psi(\hat) \approx \ln(\hat-\tfrac) may be used to obtain initial values for an iterative solution, since the equations resulting from this approximation can be solved exactly: :\ln \frac \approx \ln \hat_X :\ln \frac\approx \ln \hat_ which leads to the following solution for the initial values (of the estimate shape parameters in terms of the sample geometric means) for an iterative solution: :\hat\approx \tfrac + \frac \text \hat >1 :\hat\approx \tfrac + \frac \text \hat > 1 Alternatively, the estimates provided by the method of moments can instead be used as initial values for an iterative solution of the maximum likelihood coupled equations in terms of the digamma functions. When the distribution is required over a known interval other than
[0, 1]
with random variable ''X'', say [''a'', ''c''] with random variable ''Y'', then replace ln(''Xi'') in the first equation with :\ln \frac, and replace ln(1−''Xi'') in the second equation with :\ln \frac (see "Alternative parametrizations, four parameters" section below). If one of the shape parameters is known, the problem is considerably simplified. The following logit transformation can be used to solve for the unknown shape parameter (for skewed cases such that \hat\neq\hat, otherwise, if symmetric, both -equal- parameters are known when one is known): :\hat \left[\ln \left(\frac \right) \right]=\psi(\hat) - \psi(\hat)=\frac\sum_^N \ln\frac = \ln \hat_X - \ln \left(\hat_\right) This logit transformation is the logarithm of the transformation that divides the variable ''X'' by its mirror-image (''X''/(1 - ''X'') resulting in the "inverted beta distribution" or
beta prime distribution
(also known as beta distribution of the second kind or Pearson distribution, Pearson's Type VI) with support [0, +∞). As previously discussed in the section "Moments of logarithmically transformed random variables," the logit transformation \ln\frac, studied by Johnson, extends the finite support
[0, 1]
based on the original variable ''X'' to infinite support in both directions of the real line (−∞, +∞). If, for example, \hat is known, the unknown parameter \hat can be obtained in terms of the inverse digamma function of the right hand side of this equation: :\psi(\hat)=\frac\sum_^N \ln\frac + \psi(\hat) :\hat=\psi^(\ln \hat_X - \ln \hat_ + \psi(\hat)) In particular, if one of the shape parameters has a value of unity, for example for \hat = 1 (the power function distribution with bounded support [0,1]), using the identity ψ(''x'' + 1) = ψ(''x'') + 1/''x'' in the equation \psi(\hat) - \psi(\hat + \hat)= \ln \hat_X, the maximum likelihood estimator for the unknown parameter \hat is, exactly: :\hat= - \frac= - \frac The beta has support [0, 1], therefore \hat_X < 1, and hence (-\ln \hat_X) >0, and therefore \hat >0. In conclusion, the maximum likelihood estimates of the shape parameters of a beta distribution are (in general) a complicated function of the sample geometric mean, and of the sample geometric mean based on ''(1−X)'', the mirror-image of ''X''. One may ask, if the variance (in addition to the mean) is necessary to estimate two shape parameters with the method of moments, why is the (logarithmic or geometric) variance not necessary to estimate two shape parameters with the maximum likelihood method, for which only the geometric means suffice? The answer is because the mean does not provide as much information as the geometric mean. For a beta distribution with equal shape parameters ''α'' = ''β'', the mean is exactly 1/2, regardless of the value of the shape parameters, and therefore regardless of the value of the statistical dispersion (the variance). On the other hand, the geometric mean of a beta distribution with equal shape parameters ''α'' = ''β'', depends on the value of the shape parameters, and therefore it contains more information. Also, the geometric mean of a beta distribution does not satisfy the symmetry conditions satisfied by the mean, therefore, by employing both the geometric mean based on ''X'' and geometric mean based on (1 − ''X''), the maximum likelihood method is able to provide best estimates for both parameters ''α'' = ''β'', without need of employing the variance. One can express the joint log likelihood per ''N'' independent and identically distributed random variables, iid observations in terms of the ''sufficient statistics'' (the sample geometric means) as follows: :\frac = (\alpha - 1)\ln \hat_X + (\beta- 1)\ln \hat_- \ln \Beta(\alpha,\beta). We can plot the joint log likelihood per ''N'' observations for fixed values of the sample geometric means to see the behavior of the likelihood function as a function of the shape parameters α and β. In such a plot, the shape parameter estimators \hat,\hat correspond to the maxima of the likelihood function. See the accompanying graph that shows that all the likelihood functions intersect at α = β = 1, which corresponds to the values of the shape parameters that give the maximum entropy (the maximum entropy occurs for shape parameters equal to unity: the uniform distribution). It is evident from the plot that the likelihood function gives sharp peaks for values of the shape parameter estimators close to zero, but that for values of the shape parameters estimators greater than one, the likelihood function becomes quite flat, with less defined peaks. 
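The coupled maximum likelihood equations above can be solved numerically. The following is a minimal sketch (not from the source), assuming NumPy and SciPy are available; the function name fit_beta_mle and the simulated sample are illustrative, and the starting point is the logarithmic approximation to the digamma function quoted above.

 import numpy as np
 from scipy.special import digamma
 from scipy.optimize import fsolve

 def fit_beta_mle(x):
     """Solve psi(a) - psi(a+b) = mean(ln x) and psi(b) - psi(a+b) = mean(ln(1-x))."""
     log_gx  = np.mean(np.log(x))       # ln of sample geometric mean of X
     log_g1x = np.mean(np.log1p(-x))    # ln of sample geometric mean of 1 - X
     gx, g1x = np.exp(log_gx), np.exp(log_g1x)
     # initial values from the approximation psi(z) ~ ln(z - 1/2), for "not too small" shapes
     a0 = 0.5 + gx  / (2.0 * (1.0 - gx - g1x))
     b0 = 0.5 + g1x / (2.0 * (1.0 - gx - g1x))
     def equations(p):
         a, b = p
         return (digamma(a) - digamma(a + b) - log_gx,
                 digamma(b) - digamma(a + b) - log_g1x)
     return fsolve(equations, (a0, b0))

 rng = np.random.default_rng(0)
 sample = rng.beta(2.0, 5.0, size=10_000)
 print(fit_beta_mle(sample))            # should be close to the true shapes (2, 5)
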
The maximum likelihood parameter estimation method for the beta distribution therefore becomes less reliable for larger values of the shape parameter estimators, since the uncertainty in the peak definition increases with the value of the shape parameter estimators. One can arrive at the same conclusion by noticing that the expression for the curvature of the likelihood function is in terms of the geometric variances :\frac{\partial^2}{\partial \alpha^2}\left(\frac{\ln \mathcal{L}(\alpha,\beta\mid X)}{N}\right) = -\operatorname{var}[\ln X] :\frac{\partial^2}{\partial \beta^2}\left(\frac{\ln \mathcal{L}(\alpha,\beta\mid X)}{N}\right) = -\operatorname{var}[\ln (1-X)] These variances (and therefore the curvatures) are much larger for small values of the shape parameters α and β. However, for shape parameter values α, β > 1, the variances (and therefore the curvatures) flatten out. Equivalently, this result follows from the Cramér–Rao bound, since the
Fisher information
matrix components for the beta distribution are these logarithmic variances. The Cramér–Rao bound states that the
variance
of any ''unbiased'' estimator \hat of α is bounded by the multiplicative inverse, reciprocal of the
Fisher information
: :\mathrm(\hat)\geq\frac\geq\frac :\mathrm(\hat) \geq\frac\geq\frac so the variance of the estimators increases with increasing α and β, as the logarithmic variances decrease. Also one can express the joint log likelihood per ''N'' independent and identically distributed random variables, iid observations in terms of the
digamma function
expressions for the logarithms of the sample geometric means as follows: :\frac = (\alpha - 1)(\psi(\hat) - \psi(\hat + \hat))+(\beta- 1)(\psi(\hat) - \psi(\hat + \hat))- \ln \Beta(\alpha,\beta) this expression is identical to the negative of the cross-entropy (see section on "Quantities of information (entropy)"). Therefore, finding the maximum of the joint log likelihood of the shape parameters, per ''N'' independent and identically distributed random variables, iid observations, is identical to finding the minimum of the cross-entropy for the beta distribution, as a function of the shape parameters. :\frac = - H = -h - D_ = -\ln\Beta(\alpha,\beta)+(\alpha-1)\psi(\hat)+(\beta-1)\psi(\hat)-(\alpha+\beta-2)\psi(\hat+\hat) with the cross-entropy defined as follows: :H = \int_^1 - f(X;\hat,\hat) \ln (f(X;\alpha,\beta)) \, X


Four unknown parameters

= The procedure is similar to the one followed in the two unknown parameter case. If ''Y''1, ..., ''YN'' are independent random variables each having a beta distribution with four parameters, the joint log likelihood function for ''N'' independent and identically distributed random variables, iid observations is: :\begin \ln\, \mathcal (\alpha, \beta, a, c\mid Y) &= \sum_^N \ln\,\mathcal_i (\alpha, \beta, a, c\mid Y_i)\\ &= \sum_^N \ln\,f(Y_i; \alpha, \beta, a, c) \\ &= \sum_^N \ln\,\frac\\ &= (\alpha - 1)\sum_^N \ln (Y_i - a) + (\beta- 1)\sum_^N \ln (c - Y_i)- N \ln \Beta(\alpha,\beta) - N (\alpha+\beta - 1) \ln (c - a) \end Finding the maximum with respect to a shape parameter involves taking the partial derivative with respect to the shape parameter and setting the expression equal to zero yielding the maximum likelihood estimator of the shape parameters: :\frac= \sum_^N \ln (Y_i - a) - N(-\psi(\alpha + \beta) + \psi(\alpha))- N \ln (c - a)= 0 :\frac = \sum_^N \ln (c - Y_i) - N(-\psi(\alpha + \beta) + \psi(\beta))- N \ln (c - a)= 0 :\frac = -(\alpha - 1) \sum_^N \frac \,+ N (\alpha+\beta - 1)\frac= 0 :\frac = (\beta- 1) \sum_^N \frac \,- N (\alpha+\beta - 1) \frac = 0 these equations can be re-arranged as the following system of four coupled equations (the first two equations are geometric means and the second two equations are the harmonic means) in terms of the maximum likelihood estimates for the four parameters \hat, \hat, \hat, \hat: :\frac\sum_^N \ln \frac = \psi(\hat)-\psi(\hat +\hat )= \ln \hat_X :\frac\sum_^N \ln \frac = \psi(\hat)-\psi(\hat + \hat)= \ln \hat_ :\frac = \frac= \hat_X :\frac = \frac = \hat_ with sample geometric means: :\hat_X = \prod_^ \left (\frac \right )^ :\hat_ = \prod_^ \left (\frac \right )^ The parameters \hat, \hat are embedded inside the geometric mean expressions in a nonlinear way (to the power 1/''N''). This precludes, in general, a closed form solution, even for an initial value approximation for iteration purposes. One alternative is to use as initial values for iteration the values obtained from the method of moments solution for the four parameter case. Furthermore, the expressions for the harmonic means are well-defined only for \hat, \hat > 1, which precludes a maximum likelihood solution for shape parameters less than unity in the four-parameter case. Fisher's information matrix for the four parameter case is Positive-definite matrix, positive-definite only for α, β > 2 (for further discussion, see section on Fisher information matrix, four parameter case), for bell-shaped (symmetric or unsymmetric) beta distributions, with inflection points located to either side of the mode. The following Fisher information components (that represent the expectations of the curvature of the log likelihood function) have mathematical singularity, singularities at the following values: :\alpha = 2: \quad \operatorname \left [- \frac \frac \right ]= _ :\beta = 2: \quad \operatorname\left [- \frac \frac \right ] = _ :\alpha = 2: \quad \operatorname\left [- \frac\frac\right ] = _ :\beta = 1: \quad \operatorname\left [- \frac\frac \right ] = _ (for further discussion see section on Fisher information matrix). Thus, it is not possible to strictly carry on the maximum likelihood estimation for some well known distributions belonging to the four-parameter beta distribution family, like the continuous uniform distribution, uniform distribution (Beta(1, 1, ''a'', ''c'')), and the arcsine distribution (Beta(1/2, 1/2, ''a'', ''c'')). 
N.L. Johnson and S. Kotz ignore the equations for the harmonic means and instead suggest "If a and c are unknown, and maximum likelihood estimators of ''a'', ''c'', α and β are required, the above procedure (for the two unknown parameter case, with ''X'' transformed as ''X'' = (''Y'' − ''a'')/(''c'' − ''a'')) can be repeated using a succession of trial values of ''a'' and ''c'', until the pair (''a'', ''c'') for which maximum likelihood (given ''a'' and ''c'') is as great as possible, is attained" (where, for the purpose of clarity, their notation for the parameters has been translated into the present notation).
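A rough numerical sketch of this suggestion follows (not from the source); it assumes SciPy's generic beta.fit with fixed end points is an acceptable substitute for the inner two-parameter maximum likelihood step, and the grid of trial end points, the function name profile_fit and the simulated data are illustrative, so the recovered end points are only as accurate as the grid.

 import numpy as np
 from scipy.stats import beta

 def profile_fit(y, n_grid=15):
     """Profile the four-parameter likelihood over trial end points (a, c)."""
     best_ll, best_params = -np.inf, None
     margin = 0.5 * (y.max() - y.min())
     # trial end points must bracket the data: a < min(y) and c > max(y)
     for a in np.linspace(y.min() - margin, y.min() - 1e-6, n_grid):
         for c in np.linspace(y.max() + 1e-6, y.max() + margin, n_grid):
             a_hat, b_hat, _, _ = beta.fit(y, floc=a, fscale=c - a)
             ll = beta.logpdf(y, a_hat, b_hat, loc=a, scale=c - a).sum()
             if ll > best_ll:
                 best_ll, best_params = ll, (a_hat, b_hat, a, c)
     return best_params

 rng = np.random.default_rng(1)
 y = 2.0 + 3.0 * rng.beta(3.0, 4.0, size=2_000)   # simulated with (alpha, beta, a, c) = (3, 4, 2, 5)
 print(profile_fit(y))
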


Fisher information matrix

Let a random variable X have a probability density ''f''(''x'';''α''). The partial derivative with respect to the (unknown, and to be estimated) parameter α of the log
likelihood function
is called the score (statistics), score. The second moment of the score is called the
Fisher information
: :\mathcal(\alpha)=\operatorname \left [\left (\frac \ln \mathcal(\alpha\mid X) \right )^2 \right], The expected value, expectation of the score (statistics), score is zero, therefore the Fisher information is also the second moment centered on the mean of the score: the
variance
of the score. If the log
likelihood function
is twice differentiable with respect to the parameter α, and under certain regularity conditions, then the Fisher information may also be written as follows (which is often a more convenient form for calculation purposes): :\mathcal(\alpha) = - \operatorname \left [\frac \ln (\mathcal(\alpha\mid X)) \right]. Thus, the Fisher information is the negative of the expectation of the second derivative with respect to the parameter α of the log
likelihood function
. Therefore, Fisher information is a measure of the curvature of the log likelihood function of α. A low curvature (and therefore high Radius of curvature (mathematics), radius of curvature), flatter log likelihood function curve has low Fisher information; while a log likelihood function curve with large curvature (and therefore low Radius of curvature (mathematics), radius of curvature) has high Fisher information. When the Fisher information matrix is computed at the evaluates of the parameters ("the observed Fisher information matrix") it is equivalent to the replacement of the true log likelihood surface by a Taylor's series approximation, taken as far as the quadratic terms. The word information, in the context of Fisher information, refers to information about the parameters. Information such as: estimation, sufficiency and properties of variances of estimators. The Cramér–Rao bound states that the inverse of the Fisher information is a lower bound on the variance of any
estimator
of a parameter α: :\operatorname[\hat\alpha] \geq \frac. The precision to which one can estimate the estimator of a parameter α is limited by the Fisher Information of the log likelihood function. The Fisher information is a measure of the minimum error involved in estimating a parameter of a distribution and it can be viewed as a measure of the resolving power of an experiment needed to discriminate between two alternative hypothesis of a parameter. When there are ''N'' parameters : \begin \theta_1 \\ \theta_ \\ \dots \\ \theta_ \end, then the Fisher information takes the form of an ''N''×''N'' positive semidefinite matrix, positive semidefinite symmetric matrix, the Fisher Information Matrix, with typical element: :_=\operatorname \left [\left (\frac \ln \mathcal \right) \left(\frac \ln \mathcal \right) \right ]. Under certain regularity conditions, the Fisher Information Matrix may also be written in the following form, which is often more convenient for computation: :_ = - \operatorname \left [\frac \ln (\mathcal) \right ]\,. With ''X''1, ..., ''XN'' iid random variables, an ''N''-dimensional "box" can be constructed with sides ''X''1, ..., ''XN''. Costa and Cover show that the (Shannon) differential entropy ''h''(''X'') is related to the volume of the typical set (having the sample entropy close to the true entropy), while the Fisher information is related to the surface of this typical set.
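As an illustrative Monte Carlo check (not from the source), the following sketch verifies, for the α component of a Beta(α, β) sample with β treated as known, that the score has mean approximately zero and that its variance agrees with the negative expected second derivative, the trigamma difference derived in the next subsection.

 import numpy as np
 from scipy.special import digamma, polygamma

 alpha, beta_, n = 2.0, 3.0, 200_000
 rng = np.random.default_rng(2)
 x = rng.beta(alpha, beta_, size=n)

 # score with respect to alpha: d/d(alpha) ln f = ln x - (psi(alpha) - psi(alpha + beta))
 score = np.log(x) - (digamma(alpha) - digamma(alpha + beta_))
 print(score.mean())                                        # approximately 0
 print(score.var())                                         # approximately the Fisher information
 print(polygamma(1, alpha) - polygamma(1, alpha + beta_))   # psi_1(alpha) - psi_1(alpha + beta)
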


Two parameters

= For ''X''1, ..., ''X''''N'' independent random variables each having a beta distribution parametrized with shape parameters ''α'' and ''β'', the joint log likelihood function for ''N'' independent and identically distributed random variables, iid observations is: :\ln (\mathcal (\alpha, \beta\mid X) )= (\alpha - 1)\sum_^N \ln X_i + (\beta- 1)\sum_^N \ln (1-X_i)- N \ln \Beta(\alpha,\beta) therefore the joint log likelihood function per ''N'' independent and identically distributed random variables, iid observations is: :\frac \ln(\mathcal (\alpha, \beta\mid X)) = (\alpha - 1)\frac\sum_^N \ln X_i + (\beta- 1)\frac\sum_^N \ln (1-X_i)-\, \ln \Beta(\alpha,\beta) For the two parameter case, the Fisher information has 4 components: 2 diagonal and 2 off-diagonal. Since the Fisher information matrix is symmetric, one of these off diagonal components is independent. Therefore, the Fisher information matrix has 3 independent components (2 diagonal and 1 off diagonal). Aryal and Nadarajah calculated Fisher's information matrix for the four-parameter case, from which the two parameter case can be obtained as follows: :- \frac= \operatorname[\ln (X)]= \psi_1(\alpha) - \psi_1(\alpha + \beta) =_= \operatorname\left [- \frac \right ] = \ln \operatorname_ :- \frac = \operatorname ln (1-X)= \psi_1(\beta) - \psi_1(\alpha + \beta) =_= \operatorname\left [- \frac \right]= \ln \operatorname_ :- \frac = \operatorname[\ln X,\ln(1-X)] = -\psi_1(\alpha+\beta) =_= \operatorname\left [- \frac \right] = \ln \operatorname_ Since the Fisher information matrix is symmetric : \mathcal_= \mathcal_= \ln \operatorname_ The Fisher information components are equal to the log geometric variances and log geometric covariance. Therefore, they can be expressed as
trigamma function
s, denoted ψ1(α), the second of the
polygamma function
s, defined as the derivative of the digamma function: :\psi_1(\alpha) = \frac=\, \frac. These derivatives are also derived in the and plots of the log likelihood function are also shown in that section. contains plots and further discussion of the Fisher information matrix components: the log geometric variances and log geometric covariance as a function of the shape parameters α and β. contains formulas for moments of logarithmically transformed random variables. Images for the Fisher information components \mathcal_, \mathcal_ and \mathcal_ are shown in . The determinant of Fisher's information matrix is of interest (for example for the calculation of Jeffreys prior probability). From the expressions for the individual components of the Fisher information matrix, it follows that the determinant of Fisher's (symmetric) information matrix for the beta distribution is: :\begin \det(\mathcal(\alpha, \beta))&= \mathcal_ \mathcal_-\mathcal_ \mathcal_ \\ pt&=(\psi_1(\alpha) - \psi_1(\alpha + \beta))(\psi_1(\beta) - \psi_1(\alpha + \beta))-( -\psi_1(\alpha+\beta))( -\psi_1(\alpha+\beta))\\ pt&= \psi_1(\alpha)\psi_1(\beta)-( \psi_1(\alpha)+\psi_1(\beta))\psi_1(\alpha + \beta)\\ pt\lim_ \det(\mathcal(\alpha, \beta)) &=\lim_ \det(\mathcal(\alpha, \beta)) = \infty\\ pt\lim_ \det(\mathcal(\alpha, \beta)) &=\lim_ \det(\mathcal(\alpha, \beta)) = 0 \end From Sylvester's criterion (checking whether the diagonal elements are all positive), it follows that the Fisher information matrix for the two parameter case is Positive-definite matrix, positive-definite (under the standard condition that the shape parameters are positive ''α'' > 0 and ''β'' > 0).
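The following short sketch (illustrative names, using SciPy's polygamma for the trigamma function) assembles the two-parameter Fisher information matrix from the expressions above and checks its determinant and positive-definiteness numerically.

 import numpy as np
 from scipy.special import polygamma

 def beta_fisher_info(a, b):
     """2x2 Fisher information of Beta(a, b) in terms of trigamma functions."""
     t_a, t_b, t_ab = polygamma(1, a), polygamma(1, b), polygamma(1, a + b)
     return np.array([[t_a - t_ab, -t_ab],
                      [-t_ab,      t_b - t_ab]])

 I = beta_fisher_info(2.0, 3.0)
 print(np.linalg.det(I))                     # psi1(a)psi1(b) - (psi1(a)+psi1(b))psi1(a+b)
 print(np.all(np.linalg.eigvalsh(I) > 0))    # True: positive-definite for a, b > 0
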


Four parameters

= If ''Y''1, ..., ''YN'' are independent random variables each having a beta distribution with four parameters: the exponents ''α'' and ''β'', and also ''a'' (the minimum of the distribution range), and ''c'' (the maximum of the distribution range) (section titled "Alternative parametrizations", "Four parameters"), with
probability density function
: :f(y; \alpha, \beta, a, c) = \frac =\frac=\frac. the joint log likelihood function per ''N'' independent and identically distributed random variables, iid observations is: :\frac \ln(\mathcal (\alpha, \beta, a, c\mid Y))= \frac\sum_^N \ln (Y_i - a) + \frac\sum_^N \ln (c - Y_i)- \ln \Beta(\alpha,\beta) - (\alpha+\beta -1) \ln (c-a) For the four parameter case, the Fisher information has 4*4=16 components. It has 12 off-diagonal components = (4×4 total − 4 diagonal). Since the Fisher information matrix is symmetric, half of these components (12/2=6) are independent. Therefore, the Fisher information matrix has 6 independent off-diagonal + 4 diagonal = 10 independent components. Aryal and Nadarajah calculated Fisher's information matrix for the four parameter case as follows: :- \frac \frac= \operatorname[\ln (X)]= \psi_1(\alpha) - \psi_1(\alpha + \beta) = \mathcal_= \operatorname\left [- \frac \frac \right ] = \ln (\operatorname) :-\frac \frac = \operatorname ln (1-X)= \psi_1(\beta) - \psi_1(\alpha + \beta) =_= \operatorname \left [- \frac \frac \right ] = \ln(\operatorname) :-\frac \frac = \operatorname[\ln X,(1-X)] = -\psi_1(\alpha+\beta) =\mathcal_= \operatorname \left [- \frac\frac \right ] = \ln(\operatorname_) In the above expressions, the use of ''X'' instead of ''Y'' in the expressions var[ln(''X'')] = ln(var''GX'') is ''not an error''. The expressions in terms of the log geometric variances and log geometric covariance occur as functions of the two parameter ''X'' ~ Beta(''α'', ''β'') parametrization because when taking the partial derivatives with respect to the exponents (''α'', ''β'') in the four parameter case, one obtains the identical expressions as for the two parameter case: these terms of the four parameter Fisher information matrix are independent of the minimum ''a'' and maximum ''c'' of the distribution's range. The only non-zero term upon double differentiation of the log likelihood function with respect to the exponents ''α'' and ''β'' is the second derivative of the log of the beta function: ln(B(''α'', ''β'')). This term is independent of the minimum ''a'' and maximum ''c'' of the distribution's range. Double differentiation of this term results in trigamma functions. The sections titled "Maximum likelihood", "Two unknown parameters" and "Four unknown parameters" also show this fact. The Fisher information for ''N'' i.i.d. samples is ''N'' times the individual Fisher information (eq. 11.279, page 394 of Cover and Thomas). (Aryal and Nadarajah take a single observation, ''N'' = 1, to calculate the following components of the Fisher information, which leads to the same result as considering the derivatives of the log likelihood per ''N'' observations. Moreover, below the erroneous expression for _ in Aryal and Nadarajah has been corrected.) 
:\begin \alpha > 2: \quad \operatorname\left [- \frac \frac \right ] &= _=\frac \\ \beta > 2: \quad \operatorname\left[-\frac \frac \right ] &= \mathcal_ = \frac \\ \operatorname\left[- \frac \frac \right ] &= _ = \frac \\ \alpha > 1: \quad \operatorname\left[- \frac \frac \right ] &=\mathcal_ = \frac \\ \operatorname\left[- \frac \frac \right ] &= _ = \frac \\ \operatorname\left[- \frac \frac \right ] &= _ = -\frac \\ \beta > 1: \quad \operatorname\left[- \frac \frac \right ] &= \mathcal_ = -\frac \end The lower two diagonal entries of the Fisher information matrix, with respect to the parameter "a" (the minimum of the distribution's range): \mathcal_, and with respect to the parameter "c" (the maximum of the distribution's range): \mathcal_ are only defined for exponents α > 2 and β > 2 respectively. The Fisher information matrix component \mathcal_ for the minimum "a" approaches infinity for exponent α approaching 2 from above, and the Fisher information matrix component \mathcal_ for the maximum "c" approaches infinity for exponent β approaching 2 from above. The Fisher information matrix for the four parameter case does not depend on the individual values of the minimum "a" and the maximum "c", but only on the total range (''c''−''a''). Moreover, the components of the Fisher information matrix that depend on the range (''c''−''a''), depend only through its inverse (or the square of the inverse), such that the Fisher information decreases for increasing range (''c''−''a''). The accompanying images show the Fisher information components \mathcal_ and \mathcal_. Images for the Fisher information components \mathcal_ and \mathcal_ are shown in . All these Fisher information components look like a basin, with the "walls" of the basin being located at low values of the parameters. The following four-parameter-beta-distribution Fisher information components can be expressed in terms of the two-parameter: ''X'' ~ Beta(α, β) expectations of the transformed ratio ((1-''X'')/''X'') and of its mirror image (''X''/(1-''X'')), scaled by the range (''c''−''a''), which may be helpful for interpretation: :\mathcal_ =\frac= \frac \text\alpha > 1 :\mathcal_ = -\frac=- \frac\text\beta> 1 These are also the expected values of the "inverted beta distribution" or
beta prime distribution
(also known as beta distribution of the second kind or Pearson distribution, Pearson's Type VI) and its mirror image, scaled by the range (''c'' − ''a''). Also, the following Fisher information components can be expressed in terms of the harmonic (1/X) variances or of variances based on the ratio transformed variables ((1-X)/X) as follows: :\begin \alpha > 2: \quad \mathcal_ &=\operatorname \left [\frac \right] \left (\frac \right )^2 =\operatorname \left [\frac \right ] \left (\frac \right)^2 = \frac \\ \beta > 2: \quad \mathcal_ &= \operatorname \left [\frac \right ] \left (\frac \right )^2 = \operatorname \left [\frac \right ] \left (\frac \right )^2 =\frac \\ \mathcal_ &=\operatorname \left [\frac,\frac \right ]\frac = \operatorname \left [\frac,\frac \right ] \frac =\frac \end See section "Moments of linearly transformed, product and inverted random variables" for these expectations. The determinant of Fisher's information matrix is of interest (for example for the calculation of Jeffreys prior probability). From the expressions for the individual components, it follows that the determinant of Fisher's (symmetric) information matrix for the beta distribution with four parameters is: :\begin \det(\mathcal(\alpha,\beta,a,c)) = & -\mathcal_^2 \mathcal_ \mathcal_+\mathcal_ \mathcal_ \mathcal_ \mathcal_+\mathcal_^2 \mathcal_^2 -\mathcal_ \mathcal_ \mathcal_^2\\ & -\mathcal_ \mathcal_ \mathcal_ \mathcal_+\mathcal_^2 \mathcal_ \mathcal_+2 \mathcal_ \mathcal_ \mathcal_ \mathcal_\\ & -2\mathcal_ \mathcal_ \mathcal_ \mathcal_+\mathcal_^2 \mathcal_^2-\mathcal_ \mathcal_ \mathcal_^2+\mathcal_ \mathcal_^2 \mathcal_\\ & -\mathcal_ \mathcal_ \mathcal_ \mathcal_-\mathcal_ \mathcal_ \mathcal_ \mathcal_+\mathcal_ \mathcal_ \mathcal_ \mathcal_\\ & -\mathcal_ \mathcal_ \mathcal_ \mathcal_+\mathcal_ \mathcal_ \mathcal_ \mathcal_-\mathcal_ \mathcal_^2 \mathcal_\\ & +2 \mathcal_ \mathcal_ \mathcal_ \mathcal_-\mathcal_ \mathcal_^2 \mathcal_-\mathcal_^2 \mathcal_ \mathcal_+\mathcal_ \mathcal_ \mathcal_ \mathcal_\text\alpha, \beta> 2 \end Using Sylvester's criterion (checking whether the diagonal elements are all positive), and since diagonal components _ and _ have Mathematical singularity, singularities at α=2 and β=2 it follows that the Fisher information matrix for the four parameter case is Positive-definite matrix, positive-definite for α>2 and β>2. Since for α > 2 and β > 2 the beta distribution is (symmetric or unsymmetric) bell shaped, it follows that the Fisher information matrix is positive-definite only for bell-shaped (symmetric or unsymmetric) beta distributions, with inflection points located to either side of the mode. Thus, important well known distributions belonging to the four-parameter beta distribution family, like the parabolic distribution (Beta(2,2,a,c)) and the continuous uniform distribution, uniform distribution (Beta(1,1,a,c)) have Fisher information components (\mathcal_,\mathcal_,\mathcal_,\mathcal_) that blow up (approach infinity) in the four-parameter case (although their Fisher information components are all defined for the two parameter case). The four-parameter Wigner semicircle distribution (Beta(3/2,3/2,''a'',''c'')) and arcsine distribution (Beta(1/2,1/2,''a'',''c'')) have negative Fisher information determinants for the four-parameter case.
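As a hedged numerical illustration (the integrand below is derived directly from the four-parameter log density above rather than quoted from the source), the \mathcal{I}_{a,a} component can be estimated by Monte Carlo; it is finite only for α > 2 and depends on the end points only through the range ''c'' − ''a''.

 import numpy as np

 def mc_info_aa(alpha, beta_, a, c, n=1_000_000, seed=4):
     """Monte Carlo estimate of I_aa = E[(alpha-1)/(Y-a)^2] - (alpha+beta-1)/(c-a)^2."""
     rng = np.random.default_rng(seed)
     y = a + (c - a) * rng.beta(alpha, beta_, size=n)
     return np.mean((alpha - 1) / (y - a) ** 2) - (alpha + beta_ - 1) / (c - a) ** 2

 # approximately 15 = beta(alpha+beta-1)/((alpha-2)(c-a)^2) for (alpha, beta) = (3, 3)
 print(mc_info_aa(3.0, 3.0, a=0.0, c=1.0, seed=4))
 print(mc_info_aa(3.0, 3.0, a=2.0, c=3.0, seed=5))   # about the same: only c - a matters
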


Bayesian inference

The use of Beta distributions in Bayesian inference is due to the fact that they provide a family of conjugate prior probability distributions for binomial (including
Bernoulli
) and geometric distributions. The domain of the beta distribution can be viewed as a probability, and in fact the beta distribution is often used to describe the distribution of a probability value ''p'': :P(p;\alpha,\beta) = \frac{p^{\alpha-1}(1-p)^{\beta-1}}{\Beta(\alpha,\beta)}. Examples of beta distributions used as prior probabilities to represent ignorance of prior parameter values in Bayesian inference are Beta(1,1), Beta(0,0) and Beta(1/2,1/2).
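A minimal sketch of this conjugacy (the prior pseudo-counts and data are illustrative): a Beta prior combined with binomial data yields a Beta posterior whose parameters are simply incremented by the observed counts.

 from scipy.stats import beta

 a0, b0 = 2.0, 2.0                  # Beta(2, 2) prior on p
 s, f = 12, 4                       # observed successes and failures
 posterior = beta(a0 + s, b0 + f)   # conjugacy: posterior is Beta(a0 + s, b0 + f)
 print(posterior.mean())            # (a0 + s)/(a0 + b0 + s + f) = 14/20
 print(posterior.interval(0.95))    # central 95% credible interval for p
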


Rule of succession

A classic application of the beta distribution is the rule of succession, introduced in the 18th century by Pierre-Simon Laplace in the course of treating the sunrise problem. It states that, given ''s'' successes in ''n'' conditionally independent Bernoulli trials with probability ''p'', the estimate of the expected value in the next trial is \frac{s+1}{n+2}. This estimate is the expected value of the posterior distribution over ''p'', namely Beta(''s''+1, ''n''−''s''+1), which is given by Bayes' rule if one assumes a uniform prior probability over ''p'' (i.e., Beta(1, 1)) and then observes that ''p'' generated ''s'' successes in ''n'' trials. Laplace's rule of succession has been criticized by prominent scientists. R. T. Cox described Laplace's application of the rule of succession to the sunrise problem (p. 89) as "a travesty of the proper use of the principle." Keynes remarks (Ch.XXX, p. 382) "indeed this is so foolish a theorem that to entertain it is discreditable." Karl Pearson showed that the probability that the next (''n'' + 1) trials will be successes, after ''n'' successes in ''n'' trials, is only 50%, which has been considered too low by scientists like Jeffreys and unacceptable as a representation of the scientific process of experimentation to test a proposed scientific law. As pointed out by Jeffreys (p. 128) (crediting C. D. Broad) Laplace's rule of succession establishes a high probability of success ((''n''+1)/(''n''+2)) in the next trial, but only a moderate probability (50%) that a further sample (''n''+1) comparable in size will be equally successful. As pointed out by Perks, "The rule of succession itself is hard to accept. It assigns a probability to the next trial which implies the assumption that the actual run observed is an average run and that we are always at the end of an average run. It would, one would think, be more reasonable to assume that we were in the middle of an average run. Clearly a higher value for both probabilities is necessary if they are to accord with reasonable belief." These problems with Laplace's rule of succession motivated Haldane, Perks, Jeffreys and others to search for other forms of prior probability (see the following sections). According to Jaynes, the main problem with the rule of succession is that it is not valid when ''s'' = 0 or ''s'' = ''n'' (see rule of succession for an analysis of its validity).
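Both quantities above reduce to simple ratios of the posterior Beta(''s''+1, ''n''−''s''+1); the following sketch (illustrative, not from the source) computes Laplace's estimate for the next trial and the probability that a further run of ''m'' trials will all succeed after ''n'' successes in ''n'' trials.

 def rule_of_succession(s, n):
     """Posterior mean of Beta(s + 1, n - s + 1) under a uniform prior."""
     return (s + 1) / (n + 2)

 def prob_next_m_all_succeed(n, m):
     """E[p^m] under the posterior Beta(n + 1, 1) after n successes in n trials."""
     return (n + 1) / (n + 1 + m)

 print(rule_of_succession(s=10, n=10))        # 11/12 for the single next trial
 print(prob_next_m_all_succeed(n=10, m=11))   # exactly 1/2: Pearson's 50% for the next n + 1 trials
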


Bayes-Laplace prior probability (Beta(1,1))

The beta distribution achieves maximum differential entropy for Beta(1,1): the Uniform density, uniform probability density, for which all values in the domain of the distribution have equal density. This uniform distribution Beta(1,1) was suggested ("with a great deal of doubt") by Thomas Bayes as the prior probability distribution to express ignorance about the correct prior distribution. This prior distribution was adopted (apparently, from his writings, with little sign of doubt) by Pierre-Simon Laplace, and hence it was also known as the "Bayes-Laplace rule" or the "Laplace rule" of "inverse probability" in publications of the first half of the 20th century. In the later part of the 19th century and early part of the 20th century, scientists realized that the assumption of uniform "equal" probability density depended on the actual functions (for example whether a linear or a logarithmic scale was most appropriate) and parametrizations used. In particular, the behavior near the ends of distributions with finite support (for example near ''x'' = 0, for a distribution with initial support at ''x'' = 0) required particular attention. Keynes ( Ch.XXX, p. 381) criticized the use of Bayes's uniform prior probability (Beta(1,1)) that all values between zero and one are equiprobable, as follows: "Thus experience, if it shows anything, shows that there is a very marked clustering of statistical ratios in the neighborhoods of zero and unity, of those for positive theories and for correlations between positive qualities in the neighborhood of zero, and of those for negative theories and for correlations between negative qualities in the neighborhood of unity. "


Haldane's prior probability (Beta(0,0))

The Beta(0,0) distribution was proposed by J.B.S. Haldane, who suggested that the prior probability representing complete uncertainty should be proportional to ''p''^−1(1−''p'')^−1. The function ''p''^−1(1−''p'')^−1 can be viewed as the limit of the numerator of the beta distribution as both shape parameters approach zero: α, β → 0. The Beta function (in the denominator of the beta distribution) approaches infinity, for both parameters approaching zero, α, β → 0. Therefore, ''p''^−1(1−''p'')^−1 divided by the Beta function approaches a 2-point
Bernoulli distribution
with equal probability 1/2 at each end, at 0 and 1, and nothing in between, as α, β → 0. A coin-toss: one face of the coin being at 0 and the other face being at 1. The Haldane prior probability distribution Beta(0,0) is an "improper prior" because its integration (from 0 to 1) fails to strictly converge to 1 due to the singularities at each end. However, this is not an issue for computing posterior probabilities unless the sample size is very small. Furthermore, Zellner points out that on the log-odds scale, (the logit transformation ln(''p''/1−''p'')), the Haldane prior is the uniformly flat prior. The fact that a uniform prior probability on the logit transformed variable ln(''p''/1−''p'') (with domain (-∞, ∞)) is equivalent to the Haldane prior on the domain
[0, 1]
was pointed out by Harold Jeffreys in the first edition (1939) of his book Theory of Probability ( p. 123). Jeffreys writes "Certainly if we take the Bayes-Laplace rule right up to the extremes we are led to results that do not correspond to anybody's way of thinking. The (Haldane) rule d''x''/(''x''(1−''x'')) goes too far the other way. It would lead to the conclusion that if a sample is of one type with respect to some property there is a probability 1 that the whole population is of that type." The fact that "uniform" depends on the parametrization, led Jeffreys to seek a form of prior that would be invariant under different parametrizations.
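As a quick check of Zellner's log-odds observation (a one-line change of variables, not taken from the source): writing θ = ln(''p''/(1 − ''p'')), the Jacobian of the transformation is :\frac{d\theta}{dp} = \frac{1}{p(1-p)}, so a prior that is constant (flat) in θ corresponds, back on the ''p'' scale, to a density proportional to ''p''^−1(1 − ''p'')^−1, which is exactly the Haldane prior.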


Jeffreys' prior probability (Beta(1/2,1/2) for a Bernoulli or for a binomial distribution)

Harold Jeffreys proposed to use an
uninformative prior
probability measure that should be invariant under reparameterization: proportional to the square root of the determinant of Fisher's information matrix. For the
Bernoulli distribution
, this can be shown as follows: for a coin that is "heads" with probability ''p'' ∈
[0, 1]
and is "tails" with probability 1 − ''p'', for a given (H,T) ∈ the probability is ''pH''(1 − ''p'')''T''. Since ''T'' = 1 − ''H'', the
Bernoulli distribution
is ''p''^''H''(1 − ''p'')^(1 − ''H''). Considering ''p'' as the only parameter, it follows that the log likelihood for the Bernoulli distribution is :\ln \mathcal{L} (p\mid H) = H \ln(p)+ (1-H) \ln(1-p). The Fisher information matrix has only one component (it is a scalar, because there is only one parameter: ''p''), therefore: :\begin{align} \sqrt{\det(\mathcal{I}(p))} &= \sqrt{\operatorname{E}\!\left[- \frac{\partial^2}{\partial p^2} \ln \mathcal{L}(p\mid H)\right]} \\ &= \sqrt{\operatorname{E}\!\left[\frac{H}{p^2} + \frac{1-H}{(1-p)^2}\right]} \\ &= \sqrt{\frac{p}{p^2} + \frac{1-p}{(1-p)^2}} \\ &= \frac{1}{\sqrt{p(1-p)}}. \end{align} Similarly, for the Binomial distribution with ''n'' Bernoulli trials, it can be shown that :\sqrt{\det(\mathcal{I}(p))} = \frac{\sqrt{n}}{\sqrt{p(1-p)}}. Thus, for the
Bernoulli
, and Binomial distributions, Jeffreys prior is proportional to \scriptstyle \frac{1}{\sqrt{p(1-p)}}, which happens to be proportional to a beta distribution with domain variable ''x'' = ''p'', and shape parameters α = β = 1/2, the arcsine distribution: :\operatorname{Beta}(\tfrac{1}{2}, \tfrac{1}{2}) = \frac{1}{\pi \sqrt{p(1-p)}}. It will be shown in the next section that the normalizing constant for Jeffreys prior is immaterial to the final result because the normalizing constant cancels out in Bayes theorem for the posterior probability. Hence Beta(1/2,1/2) is used as the Jeffreys prior for both Bernoulli and binomial distributions. As shown in the next section, when using this expression as a prior probability times the likelihood in Bayes theorem, the posterior probability turns out to be a beta distribution. It is important to realize, however, that Jeffreys prior is proportional to \scriptstyle \frac{1}{\sqrt{p(1-p)}} for the Bernoulli and binomial distribution, but not for the beta distribution. Jeffreys prior for the beta distribution is given by the determinant of Fisher's information for the beta distribution, which, as shown in the section on the Fisher information matrix, is a function of the
trigamma function
ψ1 of shape parameters α and β as follows: : \begin \sqrt &= \sqrt \\ \lim_ \sqrt &=\lim_ \sqrt = \infty\\ \lim_ \sqrt &=\lim_ \sqrt = 0 \end As previously discussed, Jeffreys prior for the Bernoulli and binomial distributions is proportional to the arcsine distribution Beta(1/2,1/2), a one-dimensional ''curve'' that looks like a basin as a function of the parameter ''p'' of the Bernoulli and binomial distributions. The walls of the basin are formed by ''p'' approaching the singularities at the ends ''p'' → 0 and ''p'' → 1, where Beta(1/2,1/2) approaches infinity. Jeffreys prior for the beta distribution is a ''2-dimensional surface'' (embedded in a three-dimensional space) that looks like a basin with only two of its walls meeting at the corner α = β = 0 (and missing the other two walls) as a function of the shape parameters α and β of the beta distribution. The two adjoining walls of this 2-dimensional surface are formed by the shape parameters α and β approaching the singularities (of the trigamma function) at α, β → 0. It has no walls for α, β → ∞ because in this case the determinant of Fisher's information matrix for the beta distribution approaches zero. It will be shown in the next section that Jeffreys prior probability results in posterior probabilities (when multiplied by the binomial likelihood function) that are intermediate between the posterior probability results of the Haldane and Bayes prior probabilities. Jeffreys prior may be difficult to obtain analytically, and for some cases it just doesn't exist (even for simple distribution functions like the asymmetric triangular distribution). Berger, Bernardo and Sun, in a 2009 paper defined a reference prior probability distribution that (unlike Jeffreys prior) exists for the asymmetric triangular distribution. They cannot obtain a closed-form expression for their reference prior, but numerical calculations show it to be nearly perfectly fitted by the (proper) prior : \operatorname(\tfrac, \tfrac) \sim\frac where θ is the vertex variable for the asymmetric triangular distribution with support
[0, 1]
(corresponding to the following parameter values in Wikipedia's article on the triangular distribution: vertex ''c'' = ''θ'', left end ''a'' = 0,and right end ''b'' = 1). Berger et al. also give a heuristic argument that Beta(1/2,1/2) could indeed be the exact Berger–Bernardo–Sun reference prior for the asymmetric triangular distribution. Therefore, Beta(1/2,1/2) not only is Jeffreys prior for the Bernoulli and binomial distributions, but also seems to be the Berger–Bernardo–Sun reference prior for the asymmetric triangular distribution (for which the Jeffreys prior does not exist), a distribution used in project management and PERT analysis to describe the cost and duration of project tasks. Clarke and Barron prove that, among continuous positive priors, Jeffreys prior (when it exists) asymptotically maximizes Shannon's mutual information between a sample of size n and the parameter, and therefore ''Jeffreys prior is the most uninformative prior'' (measuring information as Shannon information). The proof rests on an examination of the Kullback–Leibler divergence between probability density functions for iid random variables.
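The contrast between the two Jeffreys priors discussed above can be made concrete with a short sketch (function names illustrative, not from the source): for the binomial parameter ''p'' the prior is the arcsine density Beta(1/2, 1/2), whereas for the beta distribution's own shape parameters it is the square root of the trigamma determinant given in the section on the Fisher information matrix.

 import numpy as np
 from scipy.special import polygamma
 from scipy.stats import beta

 def jeffreys_binomial(p):
     """Jeffreys prior for the binomial parameter p: the arcsine density Beta(1/2, 1/2)."""
     return beta.pdf(p, 0.5, 0.5)

 def jeffreys_beta_shapes(a, b):
     """Unnormalized Jeffreys prior for the beta shape parameters: sqrt(det I(a, b))."""
     t_a, t_b, t_ab = polygamma(1, a), polygamma(1, b), polygamma(1, a + b)
     return np.sqrt(t_a * t_b - (t_a + t_b) * t_ab)

 print(jeffreys_binomial(0.5), jeffreys_binomial(0.01))                    # basin shape: large near the ends
 print(jeffreys_beta_shapes(0.1, 0.1), jeffreys_beta_shapes(10.0, 10.0))   # large near 0, small for large shapes
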


Effect of different prior probability choices on the posterior beta distribution

If samples are drawn from the population of a random variable ''X'' that result in ''s'' successes and ''f'' failures in "n" Bernoulli trials ''n'' = ''s'' + ''f'', then the
likelihood function
for parameters ''s'' and ''f'' given ''x'' = ''p'' (the notation ''x'' = ''p'' in the expressions below will emphasize that the domain ''x'' stands for the value of the parameter ''p'' in the binomial distribution), is the following binomial distribution: :\mathcal(s,f\mid x=p) = x^s(1-x)^f = x^s(1-x)^. If beliefs about prior probability information are reasonably well approximated by a beta distribution with parameters ''α'' Prior and ''β'' Prior, then: :(x=p;\alpha \operatorname,\beta \operatorname) = \frac According to Bayes' theorem for a continuous event space, the posterior probability is given by the product of the prior probability and the likelihood function (given the evidence ''s'' and ''f'' = ''n'' − ''s''), normalized so that the area under the curve equals one, as follows: :\begin & \operatorname(x=p\mid s,n-s) \\ pt= & \frac \\ pt= & \frac \\ pt= & \frac \\ pt= & \frac. \end The binomial coefficient :

{n \choose s}=\frac{n!}{s!(n-s)!}
appears both in the numerator and the denominator of the posterior probability, and it does not depend on the integration variable ''x'', hence it cancels out, and it is irrelevant to the final result. Similarly the normalizing factor for the prior probability, the beta function B(αPrior,βPrior) cancels out and it is immaterial to the final result. The same posterior probability result can be obtained if one uses an un-normalized prior :x^(1-x)^ because the normalizing factors all cancel out. Several authors (including Jeffreys himself) thus use an un-normalized prior formula since the normalization constant cancels out. The numerator of the posterior probability ends up being just the (un-normalized) product of the prior probability and the likelihood function, and the denominator is its integral from zero to one. The beta function in the denominator, B(''s'' + ''α'' Prior, ''n'' − ''s'' + ''β'' Prior), appears as a normalization constant to ensure that the total posterior probability integrates to unity. The ratio ''s''/''n'' of the number of successes to the total number of trials is a sufficient statistic in the binomial case, which is relevant for the following results. For the Bayes' prior probability (Beta(1,1)), the posterior probability is: :\operatorname(p=x\mid s,f) = \frac, \text=\frac,\text=\frac\text 0 < s < n). For the Jeffreys' prior probability (Beta(1/2,1/2)), the posterior probability is: :\operatorname(p=x\mid s,f) = ,\text = \frac,\text\frac\text \tfrac < s < n-\tfrac). and for the Haldane prior probability (Beta(0,0)), the posterior probability is: :\operatorname(p=x\mid s,f) = \frac, \text = \frac,\text\frac\text 1 < s < n -1). From the above expressions it follows that for ''s''/''n'' = 1/2) all the above three prior probabilities result in the identical location for the posterior probability mean = mode = 1/2. For ''s''/''n'' < 1/2, the mean of the posterior probabilities, using the following priors, are such that: mean for Bayes prior > mean for Jeffreys prior > mean for Haldane prior. For ''s''/''n'' > 1/2 the order of these inequalities is reversed such that the Haldane prior probability results in the largest posterior mean. The ''Haldane'' prior probability Beta(0,0) results in a posterior probability density with ''mean'' (the expected value for the probability of success in the "next" trial) identical to the ratio ''s''/''n'' of the number of successes to the total number of trials. Therefore, the Haldane prior results in a posterior probability with expected value in the next trial equal to the maximum likelihood. The ''Bayes'' prior probability Beta(1,1) results in a posterior probability density with ''mode'' identical to the ratio ''s''/''n'' (the maximum likelihood). In the case that 100% of the trials have been successful ''s'' = ''n'', the ''Bayes'' prior probability Beta(1,1) results in a posterior expected value equal to the rule of succession (''n'' + 1)/(''n'' + 2), while the Haldane prior Beta(0,0) results in a posterior expected value of 1 (absolute certainty of success in the next trial). Jeffreys prior probability results in a posterior expected value equal to (''n'' + 1/2)/(''n'' + 1). Perks (p. 303) points out: "This provides a new rule of succession and expresses a 'reasonable' position to take up, namely, that after an unbroken run of n successes we assume a probability for the next trial equivalent to the assumption that we are about half-way through an average run, i.e. that we expect a failure once in (2''n'' + 2) trials. 
The Bayes–Laplace rule implies that we are about at the end of an average run or that we expect a failure once in (''n'' + 2) trials. The comparison clearly favours the new result (what is now called Jeffreys prior) from the point of view of 'reasonableness'." Conversely, in the case that 100% of the trials have resulted in failure (''s'' = 0), the ''Bayes'' prior probability Beta(1,1) results in a posterior expected value for success in the next trial equal to 1/(''n'' + 2), while the Haldane prior Beta(0,0) results in a posterior expected value of success in the next trial of 0 (absolute certainty of failure in the next trial). Jeffreys prior probability results in a posterior expected value for success in the next trial equal to (1/2)/(''n'' + 1), which Perks (p. 303) points out: "is a much more reasonably remote result than the Bayes-Laplace result 1/(''n'' + 2)". Jaynes questions (for the uniform prior Beta(1,1)) the use of these formulas for the cases ''s'' = 0 or ''s'' = ''n'' because the integrals do not converge (Beta(1,1) is an improper prior for ''s'' = 0 or ''s'' = ''n''). In practice, the conditions 0 (p. 303) shows that, for what is now known as the Jeffreys prior, this probability is ((''n'' + 1/2)/(''n'' + 1))((''n'' + 3/2)/(''n'' + 2))...(2''n'' + 1/2)/(2''n'' + 1), which for ''n'' = 1, 2, 3 gives 15/24, 315/480, 9009/13440; rapidly approaching a limiting value of 1/\sqrt = 0.70710678\ldots as n tends to infinity. Perks remarks that what is now known as the Jeffreys prior: "is clearly more 'reasonable' than either the Bayes-Laplace result or the result on the (Haldane) alternative rule rejected by Jeffreys which gives certainty as the probability. It clearly provides a very much better correspondence with the process of induction. Whether it is 'absolutely' reasonable for the purpose, i.e. whether it is yet large enough, without the absurdity of reaching unity, is a matter for others to decide. But it must be realized that the result depends on the assumption of complete indifference and absence of knowledge prior to the sampling experiment." Following are the variances of the posterior distribution obtained with these three prior probability distributions: for the Bayes' prior probability (Beta(1,1)), the posterior variance is: :\text = \frac,\text s=\frac \text =\frac for the Jeffreys' prior probability (Beta(1/2,1/2)), the posterior variance is: : \text = \frac ,\text s=\frac n 2 \text = \frac 1 and for the Haldane prior probability (Beta(0,0)), the posterior variance is: :\text = \frac, \texts=\frac\text =\frac So, as remarked by Silvey, for large ''n'', the variance is small and hence the posterior distribution is highly concentrated, whereas the assumed prior distribution was very diffuse. This is in accord with what one would hope for, as vague prior knowledge is transformed (through Bayes theorem) into a more precise posterior knowledge by an informative experiment. For small ''n'' the Haldane Beta(0,0) prior results in the largest posterior variance while the Bayes Beta(1,1) prior results in the more concentrated posterior. Jeffreys prior Beta(1/2,1/2) results in a posterior variance in between the other two. As ''n'' increases, the variance rapidly decreases so that the posterior variance for all three priors converges to approximately the same value (approaching zero variance as ''n'' → ∞). 
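A small sketch comparing the three posteriors discussed above for a skewed sample with ''s''/''n'' < 1/2 (values illustrative): the posterior means order as Bayes > Jeffreys > Haldane, the Haldane posterior mean equals ''s''/''n'', and the Haldane posterior has the largest variance for small ''n''.

 from scipy.stats import beta

 def posterior_summary(s, n, a0, b0):
     post = beta(s + a0, n - s + b0)   # posterior Beta(s + a0, n - s + b0)
     return post.mean(), post.var()

 s, n = 3, 10
 for name, (a0, b0) in [("Bayes (1,1)", (1.0, 1.0)),
                        ("Jeffreys (1/2,1/2)", (0.5, 0.5)),
                        ("Haldane (0,0)", (0.0, 0.0))]:
     print(name, posterior_summary(s, n, a0, b0))
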
Recalling the previous result that the ''Haldane'' prior probability Beta(0,0) results in a posterior probability density with ''mean'' (the expected value for the probability of success in the "next" trial) identical to the ratio s/n of the number of successes to the total number of trials, it follows from the above expression that also the ''Haldane'' prior Beta(0,0) results in a posterior with ''variance'' identical to the variance expressed in terms of the max. likelihood estimate s/n and sample size (in ): :\text = \frac= \frac with the mean ''μ'' = ''s''/''n'' and the sample size ''ν'' = ''n''. In Bayesian inference, using a prior distribution Beta(''α''Prior,''β''Prior) prior to a binomial distribution is equivalent to adding (''α''Prior − 1) pseudo-observations of "success" and (''β''Prior − 1) pseudo-observations of "failure" to the actual number of successes and failures observed, then estimating the parameter ''p'' of the binomial distribution by the proportion of successes over both real- and pseudo-observations. A uniform prior Beta(1,1) does not add (or subtract) any pseudo-observations since for Beta(1,1) it follows that (''α''Prior − 1) = 0 and (''β''Prior − 1) = 0. The Haldane prior Beta(0,0) subtracts one pseudo observation from each and Jeffreys prior Beta(1/2,1/2) subtracts 1/2 pseudo-observation of success and an equal number of failure. This subtraction has the effect of smoothing out the posterior distribution. If the proportion of successes is not 50% (''s''/''n'' ≠ 1/2) values of ''α''Prior and ''β''Prior less than 1 (and therefore negative (''α''Prior − 1) and (''β''Prior − 1)) favor sparsity, i.e. distributions where the parameter ''p'' is closer to either 0 or 1. In effect, values of ''α''Prior and ''β''Prior between 0 and 1, when operating together, function as a concentration parameter. The accompanying plots show the posterior probability density functions for sample sizes ''n'' ∈ , successes ''s'' ∈  and Beta(''α''Prior,''β''Prior) ∈ . Also shown are the cases for ''n'' = , success ''s'' =  and Beta(''α''Prior,''β''Prior) ∈ . The first plot shows the symmetric cases, for successes ''s'' ∈ , with mean = mode = 1/2 and the second plot shows the skewed cases ''s'' ∈ . The images show that there is little difference between the priors for the posterior with sample size of 50 (characterized by a more pronounced peak near ''p'' = 1/2). Significant differences appear for very small sample sizes (in particular for the flatter distribution for the degenerate case of sample size = 3). Therefore, the skewed cases, with successes ''s'' = , show a larger effect from the choice of prior, at small sample size, than the symmetric cases. For symmetric distributions, the Bayes prior Beta(1,1) results in the most "peaky" and highest posterior distributions and the Haldane prior Beta(0,0) results in the flattest and lowest peak distribution. The Jeffreys prior Beta(1/2,1/2) lies in between them. For nearly symmetric, not too skewed distributions the effect of the priors is similar. For very small sample size (in this case for a sample size of 3) and skewed distribution (in this example for ''s'' ∈ ) the Haldane prior can result in a reverse-J-shaped distribution with a singularity at the left end. 
However, this happens only in degenerate cases (in this example ''n'' = 3 and hence ''s'' = 3/4 < 1, a degenerate value because ''s'' should be greater than unity in order for the posterior of the Haldane prior to have a mode located between the ends, and because ''s'' = 3/4 is not an integer, hence it violates the initial assumption of a binomial distribution for the likelihood) and it is not an issue in generic cases of reasonable sample size (such that the condition 1 < ''s'' < ''n'' − 1, necessary for a mode to exist between both ends, is fulfilled).

In Chapter 12 (p. 385) of his book, Jaynes asserts that the ''Haldane prior'' Beta(0,0) describes a ''prior state of knowledge of complete ignorance'', where we are not even sure whether it is physically possible for an experiment to yield either a success or a failure, while the ''Bayes (uniform) prior Beta(1,1) applies if'' one knows that ''both binary outcomes are possible''. Jaynes states: "''interpret the Bayes-Laplace (Beta(1,1)) prior as describing not a state of complete ignorance'', but the state of knowledge in which we have observed one success and one failure...once we have seen at least one success and one failure, then we know that the experiment is a true binary one, in the sense of physical possibility." Jaynes does not specifically discuss the Jeffreys prior Beta(1/2,1/2) (Jaynes's discussion of "Jeffreys prior" on pp. 181, 423 and in chapter 12 of his book refers instead to the improper, un-normalized, prior "1/''p'' ''dp''" introduced by Jeffreys in the 1939 edition of his book, seven years before he introduced what is now known as Jeffreys' invariant prior: the square root of the determinant of Fisher's information matrix. ''"1/p" is Jeffreys' (1946) invariant prior for the exponential distribution, not for the Bernoulli or binomial distributions''). However, it follows from the above discussion that the Jeffreys Beta(1/2,1/2) prior represents a state of knowledge in between the Haldane Beta(0,0) and Bayes Beta(1,1) priors.

Similarly, Karl Pearson in his 1892 book The Grammar of Science (p. 144 of the 1900 edition) maintained that the Bayes (Beta(1,1)) uniform prior was not a complete ignorance prior, and that it should be used only when prior information justifies "distributing our ignorance equally". K. Pearson wrote: "Yet the only supposition that we appear to have made is this: that, knowing nothing of nature, routine and anomy (from the Greek ανομία, namely: a- "without", and nomos "law") are to be considered as equally likely to occur. Now we were not really justified in making even this assumption, for it involves a knowledge that we do not possess regarding nature. We use our ''experience'' of the constitution and action of coins in general to assert that heads and tails are equally probable, but we have no right to assert before experience that, as we know nothing of nature, routine and breach are equally probable. In our ignorance we ought to consider before experience that nature may consist of all routines, all anomies (normlessness), or a mixture of the two in any proportion whatever, and that all such are equally probable. Which of these constitutions after experience is the most probable must clearly depend on what that experience has been like."
If there is sufficient sampling data, ''and the posterior probability mode is not located at one of the extremes of the domain'' (''x'' = 0 or ''x'' = 1), the three priors of Bayes (Beta(1,1)), Jeffreys (Beta(1/2,1/2)) and Haldane (Beta(0,0)) should yield similar ''posterior'' probability densities. Otherwise, as Gelman et al. (p. 65) point out, "if so few data are available that the choice of noninformative prior distribution makes a difference, one should put relevant information into the prior distribution", or as Berger (p. 125) points out, "when different reasonable priors yield substantially different answers, can it be right to state that there ''is'' a single answer? Would it not be better to admit that there is scientific uncertainty, with the conclusion depending on prior beliefs?"


Occurrence and applications


Order statistics

The beta distribution has an important application in the theory of order statistics. A basic result is that the distribution of the ''k''th smallest of a sample of size ''n'' from a continuous uniform distribution has a beta distribution. (David, H. A., Nagaraja, H. N. (2003) ''Order Statistics'' (3rd Edition). Wiley, New Jersey, p. 458.) This result is summarized as:

:U_{(k)} \sim \operatorname{Beta}(k,n+1-k).

From this, and application of the theory related to the probability integral transform, the distribution of any individual order statistic from any continuous distribution can be derived.
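As an illustrative check (not part of the original text; a NumPy-based sketch with arbitrary choices ''n'' = 10, ''k'' = 3), the simulated mean and variance of the ''k''-th smallest of ''n'' uniform variates can be compared with the Beta(''k'', ''n'' + 1 − ''k'') formulas:

    import numpy as np

    rng = np.random.default_rng(0)
    n, k = 10, 3
    # k-th order statistic of n i.i.d. Uniform(0,1) draws, repeated 100,000 times
    samples = np.sort(rng.random((100_000, n)), axis=1)[:, k - 1]

    a, b = k, n + 1 - k                      # Beta(k, n+1-k) shape parameters
    exact_mean = a / (a + b)                 # = k/(n+1)
    exact_var = a * b / ((a + b) ** 2 * (a + b + 1))
    print(samples.mean(), exact_mean)        # ~0.273 vs 3/11
    print(samples.var(), exact_var)          # ~0.0165 vs 24/1452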


Subjective logic

In standard logic, propositions are considered to be either true or false. In contradistinction, subjective logic assumes that humans cannot determine with absolute certainty whether a proposition about the real world is absolutely true or false. In subjective logic the a posteriori probability estimates of binary events can be represented by beta distributions. (A. Jøsang. A Logic for Uncertain Probabilities. ''International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems.'' 9(3), pp. 279-311, June 2001.)


Wavelet analysis

A wavelet is a wave-like oscillation with an amplitude that starts out at zero, increases, and then decreases back to zero. It can typically be visualized as a "brief oscillation" that promptly decays. Wavelets can be used to extract information from many different kinds of data, including – but certainly not limited to – audio signals and images. Thus, wavelets are purposefully crafted to have specific properties that make them useful for signal processing. Wavelets are localized in both time and frequency whereas the standard Fourier transform is only localized in frequency. Therefore, standard Fourier transforms are only applicable to stationary processes, while wavelets are applicable to non-stationary processes. Continuous wavelets can be constructed based on the beta distribution. Beta wavelets (H.M. de Oliveira and G.A.A. Araújo, "Compactly Supported One-cyclic Wavelets Derived from Beta Distributions", ''Journal of Communication and Information Systems'', vol. 20, n. 3, pp. 27-33, 2005) can be viewed as a soft variety of Haar wavelets whose shape is fine-tuned by two shape parameters α and β.


Population genetics

The Balding–Nichols model is a two-parameter parametrization of the beta distribution used in population genetics. It is a statistical description of the allele frequencies in the components of a sub-divided population:

:\begin{align} \alpha &= \mu \nu,\\ \beta &= (1 - \mu) \nu, \end{align}

where \nu =\alpha+\beta= \frac{1-F}{F} and 0 < ''F'' < 1; here ''F'' is (Wright's) genetic distance between two populations.
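As a small sketch (not from the original text; the values of ''F'' and ''μ'' are purely illustrative), the Balding–Nichols parameters map to the usual shape parameters as follows:

    def balding_nichols_shapes(F, mu):
        # nu = alpha + beta = (1 - F)/F; alpha = mu*nu, beta = (1 - mu)*nu
        nu = (1.0 - F) / F
        return mu * nu, (1.0 - mu) * nu

    F, mu = 0.1, 0.3   # illustrative genetic distance and mean allele frequency
    alpha, beta = balding_nichols_shapes(F, mu)
    print(alpha, beta)  # 2.7, 6.3 -> allele frequency modeled as Beta(2.7, 6.3)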


Project management: task cost and schedule modeling

The beta distribution can be used to model events which are constrained to take place within an interval defined by a minimum and maximum value. For this reason, the beta distribution — along with the triangular distribution — is used extensively in PERT, critical path method (CPM), Joint Cost Schedule Modeling (JCSM) and other project management/control systems to describe the time to completion and the cost of a task. In project management, shorthand computations are widely used to estimate the mean and standard deviation of the beta distribution:

:\begin{align} \mu(X) & = \frac{a + 4b + c}{6} \\ \sigma(X) & = \frac{c - a}{6} \end{align}

where ''a'' is the minimum, ''c'' is the maximum, and ''b'' is the most likely value (the mode for ''α'' > 1 and ''β'' > 1).

The above estimate for the mean \mu(X)= \frac{a + 4b + c}{6} is known as the PERT three-point estimation and it is exact for either of the following values of ''β'' (for arbitrary α within these ranges):

:''β'' = ''α'' > 1 (symmetric case) with standard deviation \sigma(X) = \frac{c-a}{2\sqrt{2\alpha+1}}, skewness = 0, and excess kurtosis = \frac{-6}{3+2\alpha}

or

:''β'' = 6 − ''α'' for 5 > ''α'' > 1 (skewed case) with standard deviation \sigma(X) = \frac{(c-a)\sqrt{\alpha(6-\alpha)}}{6\sqrt{7}}, skewness = \frac{(3-\alpha)\sqrt{7}}{2\sqrt{\alpha(6-\alpha)}}, and excess kurtosis = \frac{21}{\alpha(6-\alpha)} - 3

The above estimate for the standard deviation ''σ''(''X'') = (''c'' − ''a'')/6 is exact for either of the following values of ''α'' and ''β'':

:''α'' = ''β'' = 4 (symmetric) with skewness = 0, and excess kurtosis = −6/11.
:''β'' = 6 − ''α'' and \alpha = 3 - \sqrt2 (right-tailed, positive skew) with skewness = \frac{1}{\sqrt{2}}, and excess kurtosis = 0
:''β'' = 6 − ''α'' and \alpha = 3 + \sqrt2 (left-tailed, negative skew) with skewness = \frac{-1}{\sqrt{2}}, and excess kurtosis = 0

Otherwise, these can be poor approximations for beta distributions with other values of α and β, exhibiting average errors of 40% in the mean and 549% in the variance.
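The following minimal Python sketch (not part of the original article; the interval endpoints are arbitrary) compares the PERT shorthand with the exact mean and standard deviation of a beta distribution rescaled to [''a'', ''c''], using α = β = 4, one of the cases listed above where both shorthands are exact:

    import math

    def beta_moments_on_interval(alpha, beta, a, c):
        # Exact mean, standard deviation and mode of Beta(alpha, beta) rescaled to [a, c].
        mean = a + (c - a) * alpha / (alpha + beta)
        var = (c - a) ** 2 * alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))
        mode = a + (c - a) * (alpha - 1) / (alpha + beta - 2)  # requires alpha, beta > 1
        return mean, math.sqrt(var), mode

    def pert_estimates(a, b, c):
        # PERT three-point shorthand: mean ~ (a + 4b + c)/6, sigma ~ (c - a)/6.
        return (a + 4 * b + c) / 6, (c - a) / 6

    a, c = 2.0, 14.0              # illustrative min and max task durations
    alpha, beta = 4.0, 4.0
    mean, sigma, mode = beta_moments_on_interval(alpha, beta, a, c)
    pert_mean, pert_sigma = pert_estimates(a, mode, c)
    print(mean, pert_mean)        # both 8.0
    print(sigma, pert_sigma)      # both 2.0

For other shape parameters the same comparison exhibits the large errors quoted above, which is why the shorthand should be used with care outside the listed cases.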


Random variate generation

If ''X'' and ''Y'' are independent, with X \sim \Gamma(\alpha, \theta) and Y \sim \Gamma(\beta, \theta) then

:\frac{X}{X+Y} \sim \Beta(\alpha, \beta).

So one algorithm for generating beta variates is to generate \frac{X}{X+Y}, where ''X'' is a gamma variate with parameters (α, 1) and ''Y'' is an independent gamma variate with parameters (β, 1). In fact, here \frac{X}{X+Y} and X+Y are independent, and X+Y \sim \Gamma(\alpha + \beta, \theta). If Z \sim \Gamma(\gamma, \theta) and Z is independent of X and Y, then \frac{X+Y}{X+Y+Z} \sim \Beta(\alpha+\beta,\gamma) and \frac{X+Y}{X+Y+Z} is independent of \frac{X}{X+Y}. This shows that the product of independent \Beta(\alpha,\beta) and \Beta(\alpha+\beta,\gamma) random variables is a \Beta(\alpha,\beta+\gamma) random variable.

Also, the ''k''th order statistic of ''n'' uniformly distributed variates is \Beta(k, n+1-k), so an alternative if α and β are small integers is to generate α + β − 1 uniform variates and choose the α-th smallest.

Another way to generate the beta distribution is by the Pólya urn model. According to this method, one starts with an "urn" containing α "black" balls and β "white" balls and draws uniformly with replacement. At each trial an additional ball is added according to the color of the last ball which was drawn. Asymptotically, the proportion of black and white balls will be distributed according to the beta distribution, where each repetition of the experiment will produce a different value.

It is also possible to use inverse transform sampling.
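Two of the generation schemes described above can be sketched in a few lines of Python (an illustrative, NumPy-based sketch, not part of the original article): the gamma-ratio construction X/(X + Y) and, for small integer α and β, the order-statistic construction.

    import numpy as np

    rng = np.random.default_rng(1)

    def beta_via_gammas(alpha, beta, size, rng):
        # X ~ Gamma(alpha, 1), Y ~ Gamma(beta, 1) independent => X/(X+Y) ~ Beta(alpha, beta)
        x = rng.gamma(alpha, 1.0, size)
        y = rng.gamma(beta, 1.0, size)
        return x / (x + y)

    def beta_via_order_statistic(alpha, beta, size, rng):
        # For integer alpha, beta: the alpha-th smallest of (alpha + beta - 1) uniforms
        u = rng.random((size, alpha + beta - 1))
        return np.sort(u, axis=1)[:, alpha - 1]

    a, b = 2, 5  # illustrative integer shape parameters
    print(beta_via_gammas(a, b, 100_000, rng).mean())           # ~ a/(a+b) = 2/7
    print(beta_via_order_statistic(a, b, 100_000, rng).mean())  # ~ 2/7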


History

Thomas Bayes, in a posthumous paper published in 1763 by Richard Price, obtained a beta distribution as the density of the probability of success in Bernoulli trials, but the paper does not analyze any of the moments of the beta distribution or discuss any of its properties. The first systematic modern discussion of the beta distribution is probably due to Karl Pearson. In Pearson's papers the beta distribution is couched as a solution of a differential equation: Pearson's Type I distribution, to which it is essentially identical except for arbitrary shifting and re-scaling (the beta and Pearson Type I distributions can always be equalized by proper choice of parameters). In fact, in several English books and journal articles in the few decades prior to World War II, it was common to refer to the beta distribution as Pearson's Type I distribution. William P. Elderton in his 1906 monograph "Frequency curves and correlation" further analyzes the beta distribution as Pearson's Type I distribution, including a full discussion of the method of moments for the four-parameter case, and diagrams of (what Elderton describes as) U-shaped, J-shaped, twisted J-shaped, "cocked-hat" shapes, horizontal and angled straight-line cases. Elderton wrote "I am chiefly indebted to Professor Pearson, but the indebtedness is of a kind for which it is impossible to offer formal thanks." Elderton's 1906 monograph provides an impressive amount of information on the beta distribution, including equations for the origin of the distribution chosen to be the mode, as well as for other Pearson distributions: types I through VII. Elderton also included a number of appendixes, including one appendix ("II") on the beta and gamma functions. In later editions, Elderton added equations for the origin of the distribution chosen to be the mean, and analysis of Pearson distributions VIII through XII.

As remarked by Bowman and Shenton, "Fisher and Pearson had a difference of opinion in the approach to (parameter) estimation, in particular relating to (Pearson's method of) moments and (Fisher's method of) maximum likelihood in the case of the Beta distribution." Also according to Bowman and Shenton, "the case of a Type I (beta distribution) model being the center of the controversy was pure serendipity. A more difficult model of 4 parameters would have been hard to find." The long-running public conflict of Fisher with Karl Pearson can be followed in a number of articles in prestigious journals. For example, concerning the estimation of the four parameters for the beta distribution, and Fisher's criticism of Pearson's method of moments as being arbitrary, see Pearson's article "Method of moments and method of maximum likelihood" (published three years after his retirement from University College, London, where his position had been divided between Fisher and Pearson's son Egon), in which Pearson writes "I read (Koshai's paper in the Journal of the Royal Statistical Society, 1933) which as far as I am aware is the only case at present published of the application of Professor Fisher's method. To my astonishment that method depends on first working out the constants of the frequency curve by the (Pearson) Method of Moments and then superposing on it, by what Fisher terms "the Method of Maximum Likelihood" a further approximation to obtain, what he holds, he will thus get, "more efficient values" of the curve constants."
David and Edwards's treatise on the history of statistics cites the first modern treatment of the beta distribution, in 1911, using the beta designation that has become standard, due to Corrado Gini, an Italian statistician, demographer, and sociologist, who developed the Gini coefficient. N. L. Johnson and S. Kotz, in their comprehensive and very informative monograph on leading historical personalities in statistical sciences, credit Corrado Gini as "an early Bayesian...who dealt with the problem of eliciting the parameters of an initial Beta distribution, by singling out techniques which anticipated the advent of the so-called empirical Bayes approach."


References


External links


"Beta Distribution"
by Fiona Maclachlan, the Wolfram Demonstrations Project, 2007.
Beta Distribution – Overview and Example
xycoon.com

brighton-webs.co.uk

exstrom.com * *
Harvard University Statistics 110 Lecture 23 Beta Distribution, Prof. Joe Blitzstein
{{DEFAULTSORT:Beta Distribution}}
Continuous distributions
Factorial and binomial topics
Conjugate prior distributions
Exponential family distributions
_
estimator In statistics, an estimator is a rule for calculating an estimate of a given quantity based on observed data: thus the rule (the estimator), the quantity of interest (the estimand) and its result (the estimate) are distinguished. For example, the ...
_of_
statistical_dispersion In statistics, dispersion (also called variability, scatter, or spread) is the extent to which a distribution is stretched or squeezed. Common examples of measures of statistical dispersion are the variance, standard deviation, and interquartile ...
_than_the_standard_deviation_for_beta_distributions_with_tails_and_inflection_points_at_each_side_of_the_mode,_Beta(''α'', ''β'')_distributions_with_''α'',''β''_>_2,_as_it_depends_on_the_linear_(absolute)_deviations_rather_than_the_square_deviations_from_the_mean.__Therefore,_the_effect_of_very_large_deviations_from_the_mean_are_not_as_overly_weighted. Using_
Stirling's_approximation In mathematics, Stirling's approximation (or Stirling's formula) is an approximation for factorials. It is a good approximation, leading to accurate results even for small values of n. It is named after James Stirling, though a related but less p ...
_to_the_Gamma_function,_Norman_Lloyd_Johnson, N.L.Johnson_and_Samuel_Kotz, S.Kotz_derived_the_following_approximation_for_values_of_the_shape_parameters_greater_than_unity_(the_relative_error_for_this_approximation_is_only_−3.5%_for_''α''_=_''β''_=_1,_and_it_decreases_to_zero_as_''α''_→_∞,_''β''_→_∞): :_\begin \frac_&=\frac\\ &\approx_\sqrt_\left(1+\frac-\frac-\frac_\right),_\text_\alpha,_\beta_>_1. \end At_the_limit_α_→_∞,_β_→_∞,_the_ratio_of_the_mean_absolute_deviation_to_the_standard_deviation_(for_the_beta_distribution)_becomes_equal_to_the_ratio_of_the_same_measures_for_the_normal_distribution:_\sqrt.__For_α_=_β_=_1_this_ratio_equals_\frac,_so_that_from_α_=_β_=_1_to_α,_β_→_∞_the_ratio_decreases_by_8.5%.__For_α_=_β_=_0_the_standard_deviation_is_exactly_equal_to_the_mean_absolute_deviation_around_the_mean._Therefore,_this_ratio_decreases_by_15%_from_α_=_β_=_0_to_α_=_β_=_1,_and_by_25%_from_α_=_β_=_0_to_α,_β_→_∞_._However,_for_skewed_beta_distributions_such_that_α_→_0_or_β_→_0,_the_ratio_of_the_standard_deviation_to_the_mean_absolute_deviation_approaches_infinity_(although_each_of_them,_individually,_approaches_zero)_because_the_mean_absolute_deviation_approaches_zero_faster_than_the_standard_deviation. Using_the__parametrization_in_terms_of_mean_μ_and_sample_size_ν_=_α_+_β_>_0: :α_=_μν,_β_=_(1−μ)ν one_can_express_the_mean_absolute_deviation_around_the_mean_in_terms_of_the_mean_μ_and_the_sample_size_ν_as_follows: :\operatorname[, _X_-_E ]_=_\frac For_a_symmetric_distribution,_the_mean_is_at_the_middle_of_the_distribution,_μ_=_1/2,_and_therefore: :_\begin \operatorname[, X_-_E ]__=_\frac_&=_\frac_\\ \lim__\left_(\lim__\operatorname[, X_-_E ]_\right_)_&=_\tfrac\\ \lim__\left_(\lim__\operatorname[, _X_-_E ]_\right_)_&=_0 \end Also,_the_following_limits_(with_only_the_noted_variable_approaching_the_limit)_can_be_obtained_from_the_above_expressions: :_\begin \lim__\operatorname[, X_-_E ]_&=\lim__\operatorname[, X_-_E ]=_0_\\ \lim__\operatorname[, X_-_E ]_&=\lim__\operatorname[, X_-_E ]_=_0\\ \lim__\operatorname[, X_-_E ]&=\lim__\operatorname[, X_-_E ]_=_0\\ \lim__\operatorname[, X_-_E ]_&=_\sqrt_\\ \lim__\operatorname[, X_-_E ]_&=_0 \end


_Mean_absolute_difference

The_mean_absolute_difference_for_the_Beta_distribution_is: :\mathrm_=_\int_0^1_\int_0^1_f(x;\alpha,\beta)\,f(y;\alpha,\beta)\,, x-y, \,dx\,dy_=_\left(\frac\right)\frac The_Gini_coefficient_for_the_Beta_distribution_is_half_of_the_relative_mean_absolute_difference: :\mathrm_=_\left(\frac\right)\frac


_Skewness

The_skewness_ In_probability_theory_and_statistics,_skewness_is_a_measure_of_the_asymmetry_of_the_probability_distribution_of_a__real-valued_random_variable_about_its_mean._The_skewness_value_can_be_positive,_zero,_negative,_or_undefined. For_a_unimodal__...
_(the_third_moment_centered_on_the_mean,_normalized_by_the_3/2_power_of_the_variance)_of_the_beta_distribution_is :\gamma_1_=\frac_=_\frac_. Letting_α_=_β_in_the_above_expression_one_obtains_γ1_=_0,_showing_once_again_that_for_α_=_β_the_distribution_is_symmetric_and_hence_the_skewness_is_zero._Positive_skew_(right-tailed)_for_α_<_β,_negative_skew_(left-tailed)_for_α_>_β. Using_the__parametrization_in_terms_of_mean_μ_and_sample_size_ν_=_α_+_β: :_\begin __\alpha_&__=_\mu_\nu_,\text\nu_=(\alpha_+_\beta)__>0\\ __\beta_&__=_(1_-_\mu)_\nu_,_\text\nu_=(\alpha_+_\beta)__>0. \end one_can_express_the_skewness_in_terms_of_the_mean_μ_and_the_sample_size_ν_as_follows: :\gamma_1_=\frac_=_\frac. The_skewness_can_also_be_expressed_just_in_terms_of_the_variance_''var''_and_the_mean_μ_as_follows: :\gamma_1_=\frac_=_\frac\text_\operatorname_<_\mu(1-\mu) The_accompanying_plot_of_skewness_as_a_function_of_variance_and_mean_shows_that_maximum_variance_(1/4)_is_coupled_with_zero_skewness_and_the_symmetry_condition_(μ_=_1/2),_and_that_maximum_skewness_(positive_or_negative_infinity)_occurs_when_the_mean_is_located_at_one_end_or_the_other,_so_that_the_"mass"_of_the_probability_distribution_is_concentrated_at_the_ends_(minimum_variance). The_following_expression_for_the_square_of_the_skewness,_in_terms_of_the_sample_size_ν_=_α_+_β_and_the_variance_''var'',_is_useful_for_the_method_of_moments_estimation_of_four_parameters: :(\gamma_1)^2_=\frac_=_\frac\bigg(\frac-4(1+\nu)\bigg) This_expression_correctly_gives_a_skewness_of_zero_for_α_=_β,_since_in_that_case_(see_):_\operatorname_=_\frac. For_the_symmetric_case_(α_=_β),_skewness_=_0_over_the_whole_range,_and_the_following_limits_apply: :\lim__\gamma_1_=_\lim__\gamma_1_=\lim__\gamma_1=\lim__\gamma_1=\lim__\gamma_1_=_0 For_the_asymmetric_cases_(α_≠_β)_the_following_limits_(with_only_the_noted_variable_approaching_the_limit)_can_be_obtained_from_the_above_expressions: :_\begin &\lim__\gamma_1_=\lim__\gamma_1_=_\infty\\ &\lim__\gamma_1__=_\lim__\gamma_1=_-_\infty\\ &\lim__\gamma_1_=_-\frac,\quad_\lim_(\lim__\gamma_1)_=_-\infty,\quad_\lim_(\lim__\gamma_1)_=_0\\ &\lim__\gamma_1_=_\frac,\quad_\lim_(\lim__\gamma_1)_=_\infty,\quad_\lim_(\lim__\gamma_1)_=_0\\ &\lim__\gamma_1_=_\frac,\quad_\lim_(\lim__\gamma_1)__=_\infty,\quad_\lim_(\lim__\gamma_1)_=_-_\infty \end


_Kurtosis

The_beta_distribution_has_been_applied_in_acoustic_analysis_to_assess_damage_to_gears,_as_the_kurtosis_of_the_beta_distribution_has_been_reported_to_be_a_good_indicator_of_the_condition_of_a_gear.
_Kurtosis_has_also_been_used_to_distinguish_the_seismic_signal_generated_by_a_person's_footsteps_from_other_signals._As_persons_or_other_targets_moving_on_the_ground_generate_continuous_signals_in_the_form_of_seismic_waves,_one_can_separate_different_targets_based_on_the_seismic_waves_they_generate._Kurtosis_is_sensitive_to_impulsive_signals,_so_it's_much_more_sensitive_to_the_signal_generated_by_human_footsteps_than_other_signals_generated_by_vehicles,_winds,_noise,_etc.
__Unfortunately,_the_notation_for_kurtosis_has_not_been_standardized._Kenney_and_Keeping
__use_the_symbol_γ2_for_the_excess_kurtosis_ In_probability_theory_and_statistics,_kurtosis_(from__el,_κυρτός,_''kyrtos''_or_''kurtos'',_meaning_"curved,_arching")_is_a_measure_of_the_"tailedness"_of_the_probability_distribution_of_a_real-valued_random_variable._Like_skewness,_kurtosi_...
,_but_Abramowitz_and_Stegun
__use_different_terminology.__To_prevent_confusion
__between_kurtosis_(the_fourth_moment_centered_on_the_mean,_normalized_by_the_square_of_the_variance)_and_excess_kurtosis,_when_using_symbols,_they_will_be_spelled_out_as_follows:
:\begin \text _____&=\text_-_3\\ _____&=\frac-3\\ _____&=\frac\\ _____&=\frac _. \end Letting_α_=_β_in_the_above_expression_one_obtains :\text_=-_\frac_\text\alpha=\beta_. Therefore,_for_symmetric_beta_distributions,_the_excess_kurtosis_is_negative,_increasing_from_a_minimum_value_of_−2_at_the_limit_as__→_0,_and_approaching_a_maximum_value_of_zero_as__→_∞.__The_value_of_−2_is_the_minimum_value_of_excess_kurtosis_that_any_distribution_(not_just_beta_distributions,_but_any_distribution_of_any_possible_kind)_can_ever_achieve.__This_minimum_value_is_reached_when_all_the_probability_density_is_entirely_concentrated_at_each_end_''x''_=_0_and_''x''_=_1,_with_nothing_in_between:_a_2-point_Bernoulli_distribution_ In_probability_theory_and_statistics,_the_Bernoulli_distribution,_named_after_Swiss_mathematician__Jacob_Bernoulli,James_Victor_Uspensky:_''Introduction_to_Mathematical_Probability'',_McGraw-Hill,_New_York_1937,_page_45_is_the__discrete_probabi_...
_with_equal_probability_1/2_at_each_end_(a_coin_toss:_see_section_below_"Kurtosis_bounded_by_the_square_of_the_skewness"_for_further_discussion).__The_description_of_kurtosis_as_a_measure_of_the_"potential_outliers"_(or_"potential_rare,_extreme_values")_of_the_probability_distribution,_is_correct_for_all_distributions_including_the_beta_distribution._When_rare,_extreme_values_can_occur_in_the_beta_distribution,_the_higher_its_kurtosis;_otherwise,_the_kurtosis_is_lower._For_α_≠_β,_skewed_beta_distributions,_the_excess_kurtosis_can_reach_unlimited_positive_values_(particularly_for_α_→_0_for_finite_β,_or_for_β_→_0_for_finite_α)_because_the_side_away_from_the_mode_will_produce_occasional_extreme_values.__Minimum_kurtosis_takes_place_when_the_mass_density_is_concentrated_equally_at_each_end_(and_therefore_the_mean_is_at_the_center),_and_there_is_no_probability_mass_density_in_between_the_ends. Using_the__parametrization_in_terms_of_mean_μ_and_sample_size_ν_=_α_+_β: :_\begin __\alpha_&__=_\mu_\nu_,\text\nu_=(\alpha_+_\beta)__>0\\ __\beta_&__=_(1_-_\mu)_\nu_,_\text\nu_=(\alpha_+_\beta)__>0. \end one_can_express_the_excess_kurtosis_in_terms_of_the_mean_μ_and_the_sample_size_ν_as_follows: :\text_=\frac\bigg_(\frac_-_1_\bigg_) The_excess_kurtosis_can_also_be_expressed_in_terms_of_just_the_following_two_parameters:_the_variance_''var'',_and_the_sample_size_ν_as_follows: :\text_=\frac\left(\frac_-_6_-_5_\nu_\right)\text\text<_\mu(1-\mu) and,_in_terms_of_the_variance_''var''_and_the_mean_μ_as_follows: :\text_=\frac\text\text<_\mu(1-\mu) The_plot_of_excess_kurtosis_as_a_function_of_the_variance_and_the_mean_shows_that_the_minimum_value_of_the_excess_kurtosis_(−2,_which_is_the_minimum_possible_value_for_excess_kurtosis_for_any_distribution)_is_intimately_coupled_with_the_maximum_value_of_variance_(1/4)_and_the_symmetry_condition:_the_mean_occurring_at_the_midpoint_(μ_=_1/2)._This_occurs_for_the_symmetric_case_of_α_=_β_=_0,_with_zero_skewness.__At_the_limit,_this_is_the_2_point_Bernoulli_distribution_ In_probability_theory_and_statistics,_the_Bernoulli_distribution,_named_after_Swiss_mathematician__Jacob_Bernoulli,James_Victor_Uspensky:_''Introduction_to_Mathematical_Probability'',_McGraw-Hill,_New_York_1937,_page_45_is_the__discrete_probabi_...
_with_equal_probability_1/2_at_each__Dirac_delta_function_end_''x''_=_0_and_''x''_=_1_and_zero_probability_everywhere_else._(A_coin_toss:_one_face_of_the_coin_being_''x''_=_0_and_the_other_face_being_''x''_=_1.)__Variance_is_maximum_because_the_distribution_is_bimodal_with_nothing_in_between_the_two_modes_(spikes)_at_each_end.__Excess_kurtosis_is_minimum:_the_probability_density_"mass"_is_zero_at_the_mean_and_it_is_concentrated_at_the_two_peaks_at_each_end.__Excess_kurtosis_reaches_the_minimum_possible_value_(for_any_distribution)_when_the_probability_density_function_has_two_spikes_at_each_end:_it_is_bi-"peaky"_with_nothing_in_between_them. On_the_other_hand,_the_plot_shows_that_for_extreme_skewed_cases,_where_the_mean_is_located_near_one_or_the_other_end_(μ_=_0_or_μ_=_1),_the_variance_is_close_to_zero,_and_the_excess_kurtosis_rapidly_approaches_infinity_when_the_mean_of_the_distribution_approaches_either_end. Alternatively,_the_excess_kurtosis_can_also_be_expressed_in_terms_of_just_the_following_two_parameters:_the_square_of_the_skewness,_and_the_sample_size_ν_as_follows: :\text_=\frac\bigg(\frac_(\text)^2_-_1\bigg)\text^2-2<_\text<_\frac_(\text)^2 From_this_last_expression,_one_can_obtain_the_same_limits_published_practically_a_century_ago_by_Karl_Pearson_in_his_paper,_for_the_beta_distribution_(see_section_below_titled_"Kurtosis_bounded_by_the_square_of_the_skewness")._Setting_α_+_β=_ν_=__0_in_the_above_expression,_one_obtains_Pearson's_lower_boundary_(values_for_the_skewness_and_excess_kurtosis_below_the_boundary_(excess_kurtosis_+_2_−_skewness2_=_0)_cannot_occur_for_any_distribution,_and_hence_Karl_Pearson_appropriately_called_the_region_below_this_boundary_the_"impossible_region")._The_limit_of_α_+_β_=_ν_→_∞_determines_Pearson's_upper_boundary. :_\begin &\lim_\text__=_(\text)^2_-_2\\ &\lim_\text__=_\tfrac_(\text)^2 \end therefore: :(\text)^2-2<_\text<_\tfrac_(\text)^2 Values_of_ν_=_α_+_β_such_that_ν_ranges_from_zero_to_infinity,_0_<_ν_<_∞,_span_the_whole_region_of_the_beta_distribution_in_the_plane_of_excess_kurtosis_versus_squared_skewness. For_the_symmetric_case_(α_=_β),_the_following_limits_apply: :_\begin &\lim__\text_=__-_2_\\ &\lim__\text_=_0_\\ &\lim__\text_=_-_\frac \end For_the_unsymmetric_cases_(α_≠_β)_the_following_limits_(with_only_the_noted_variable_approaching_the_limit)_can_be_obtained_from_the_above_expressions: :_\begin &\lim_\text__=\lim__\text__=_\lim_\text__=_\lim_\text__=\infty\\ &\lim_\text__=_\frac,\text__\lim_(\lim__\text)__=_\infty,\text__\lim_(\lim__\text)__=_0\\ &\lim_\text__=_\frac,\text__\lim_(\lim__\text)__=_\infty,\text__\lim_(\lim__\text)__=_0\\ &\lim__\text__=_-_6_+_\frac,\text__\lim_(\lim__\text)__=_\infty,\text__\lim_(\lim__\text)__=_\infty \end


_Characteristic_function

The_Characteristic_function_(probability_theory), characteristic_function_is_the_Fourier_transform_of_the_probability_density_function.__The_characteristic_function_of_the_beta_distribution_is_confluent_hypergeometric_function, Kummer's_confluent_hypergeometric_function_(of_the_first_kind):
:\begin \varphi_X(\alpha;\beta;t) &=_\operatorname\left[e^\right]\\ &=_\int_0^1_e^_f(x;\alpha,\beta)_dx_\\ &=_1F_1(\alpha;_\alpha+\beta;_it)\!\\ &=\sum_^\infty_\frac__\\ &=_1__+\sum_^_\left(_\prod_^_\frac_\right)_\frac \end where :_x^=x(x+1)(x+2)\cdots(x+n-1) is_the_rising_factorial,_also_called_the_"Pochhammer_symbol".__The_value_of_the_characteristic_function_for_''t''_=_0,_is_one: :_\varphi_X(\alpha;\beta;0)=_1F_1(\alpha;_\alpha+\beta;_0)_=_1__. Also,_the_real_and_imaginary_parts_of_the_characteristic_function_enjoy_the_following_symmetries_with_respect_to_the_origin_of_variable_''t'': :_\textrm_\left_[__1F_1(\alpha;_\alpha+\beta;_it)_\right_]_=_\textrm_\left_[__1F_1(\alpha;_\alpha+\beta;_-_it)_\right_]__ :_\textrm_\left_[__1F_1(\alpha;_\alpha+\beta;_it)_\right_]_=_-_\textrm_\left__[__1F_1(\alpha;_\alpha+\beta;_-_it)_\right_]__ The_symmetric_case_α_=_β_simplifies_the_characteristic_function_of_the_beta_distribution_to_a_Bessel_function,_since_in_the_special_case_α_+_β_=_2α_the_confluent_hypergeometric_function_(of_the_first_kind)_reduces_to_a_Bessel_function_(the_modified_Bessel_function_of_the_first_kind_I__)_using_Ernst_Kummer, Kummer's_second_transformation_as_follows: Another_example_of_the_symmetric_case_α_=_β_=_n/2_for_beamforming_applications_can_be_found_in_Figure_11_of_ :\begin__1F_1(\alpha;2\alpha;_it)_&=_e^__0F_1_\left(;_\alpha+\tfrac;_\frac_\right)_\\ &=_e^_\left(\frac\right)^_\Gamma\left(\alpha+\tfrac\right)_I_\left(\frac\right).\end In_the_accompanying_plots,_the_Complex_number, real_part_(Re)_of_the_Characteristic_function_(probability_theory), characteristic_function_of_the_beta_distribution_is_displayed_for_symmetric_(α_=_β)_and_skewed_(α_≠_β)_cases.


_Other_moments


_Moment_generating_function

It_also_follows_that_the_moment_generating_function_is :\begin M_X(\alpha;_\beta;_t) &=_\operatorname\left[e^\right]_\\_pt&=_\int_0^1_e^_f(x;\alpha,\beta)\,dx_\\_pt&=__1F_1(\alpha;_\alpha+\beta;_t)_\\_pt&=_\sum_^\infty_\frac__\frac_\\_pt&=_1__+\sum_^_\left(_\prod_^_\frac_\right)_\frac \end In_particular_''M''''X''(''α'';_''β'';_0)_=_1.


_Higher_moments

Using_the_moment_generating_function,_the_''k''-th_raw_moment_is_given_by_the_factor :\prod_^_\frac_ multiplying_the_(exponential_series)_term_\left(\frac\right)_in_the_series_of_the_moment_generating_function :\operatorname[X^k]=_\frac_=_\prod_^_\frac where_(''x'')(''k'')_is_a_Pochhammer_symbol_representing_rising_factorial._It_can_also_be_written_in_a_recursive_form_as :\operatorname[X^k]_=_\frac\operatorname[X^]. Since_the_moment_generating_function_M_X(\alpha;_\beta;_\cdot)_has_a_positive_radius_of_convergence,_the_beta_distribution_is_Moment_problem, determined_by_its_moments.


_Moments_of_transformed_random_variables


_=Moments_of_linearly_transformed,_product_and_inverted_random_variables

= One_can_also_show_the_following_expectations_for_a_transformed_random_variable,_where_the_random_variable_''X''_is_Beta-distributed_with_parameters_α_and_β:_''X''_~_Beta(α,_β).__The_expected_value_of_the_variable_1 − ''X''_is_the_mirror-symmetry_of_the_expected_value_based_on_''X'': :\begin &_\operatorname[1-X]_=_\frac_\\ &_\operatorname[X_(1-X)]_=\operatorname[(1-X)X_]_=\frac \end Due_to_the_mirror-symmetry_of_the_probability_density_function_of_the_beta_distribution,_the_variances_based_on_variables_''X''_and_1 − ''X''_are_identical,_and_the_covariance_on_''X''(1 − ''X''_is_the_negative_of_the_variance: :\operatorname[(1-X)]=\operatorname[X]_=_-\operatorname[X,(1-X)]=_\frac These_are_the_expected_values_for_inverted_variables,_(these_are_related_to_the_harmonic_means,_see_): :\begin &_\operatorname_\left_[\frac_\right_]_=_\frac_\text_\alpha_>_1\\ &_\operatorname\left_[\frac_\right_]_=\frac_\text_\beta_>_1 \end The_following_transformation_by_dividing_the_variable_''X''_by_its_mirror-image_''X''/(1 − ''X'')_results_in_the_expected_value_of_the_"inverted_beta_distribution"_or_beta_prime_distribution_ In_probability_theory_and__statistics,_the_beta_prime_distribution_(also_known_as_inverted_beta_distribution_or_beta_distribution_of_the_second_kindJohnson_et_al_(1995),_p_248)_is_an_absolutely_continuous_probability_distribution. __Definitions_ _...
_(also_known_as_beta_distribution_of_the_second_kind_or_Pearson_distribution, Pearson's_Type_VI): :_\begin &_\operatorname\left[\frac\right]_=\frac_\text\beta_>_1\\ &_\operatorname\left[\frac\right]_=\frac\text\alpha_>_1 \end_ Variances_of_these_transformed_variables_can_be_obtained_by_integration,_as_the_expected_values_of_the_second_moments_centered_on_the_corresponding_variables: :\operatorname_\left[\frac_\right]_=\operatorname\left[\left(\frac_-_\operatorname\left[\frac_\right_]_\right_)^2\right]= :\operatorname\left_[\frac_\right_]_=\operatorname_\left_[\left_(\frac_-_\operatorname\left_[\frac_\right_]_\right_)^2_\right_]=_\frac_\text\alpha_>_2 The_following_variance_of_the_variable_''X''_divided_by_its_mirror-image_(''X''/(1−''X'')_results_in_the_variance_of_the_"inverted_beta_distribution"_or_beta_prime_distribution_ In_probability_theory_and__statistics,_the_beta_prime_distribution_(also_known_as_inverted_beta_distribution_or_beta_distribution_of_the_second_kindJohnson_et_al_(1995),_p_248)_is_an_absolutely_continuous_probability_distribution. __Definitions_ _...
_(also_known_as_beta_distribution_of_the_second_kind_or_Pearson_distribution, Pearson's_Type_VI): :\operatorname_\left_[\frac_\right_]_=\operatorname_\left_[\left(\frac_-_\operatorname_\left_[\frac_\right_]_\right)^2_\right_]=\operatorname_\left_[\frac_\right_]_= :\operatorname_\left_[\left_(\frac_-_\operatorname_\left_[\frac_\right_]_\right_)^2_\right_]=_\frac_\text\beta_>_2 The_covariances_are: :\operatorname\left_[\frac,\frac_\right_]_=_\operatorname\left[\frac,\frac_\right]_=\operatorname\left[\frac,\frac\right_]_=_\operatorname\left[\frac,\frac_\right]_=\frac_\text_\alpha,_\beta_>_1 These_expectations_and_variances_appear_in_the_four-parameter_Fisher_information_matrix_(.)


_=Moments_of_logarithmically_transformed_random_variables

= Expected_values_for_Logarithm_transformation, logarithmic_transformations_(useful_for_maximum_likelihood_estimates,_see_)_are_discussed_in_this_section.__The_following_logarithmic_linear_transformations_are_related_to_the_geometric_means_''GX''_and__''G''(1−''X'')_(see_): :\begin \operatorname[\ln(X)]_&=_\psi(\alpha)_-_\psi(\alpha_+_\beta)=_-_\operatorname\left[\ln_\left_(\frac_\right_)\right],\\ \operatorname[\ln(1-X)]_&=\psi(\beta)_-_\psi(\alpha_+_\beta)=_-_\operatorname_\left[\ln_\left_(\frac_\right_)\right]. \end Where_the_digamma_function_ In_mathematics,_the_digamma_function_is_defined_as_the__logarithmic_derivative_of_the_gamma_function: :\psi(x)=\frac\ln\big(\Gamma(x)\big)=\frac\sim\ln-\frac. It_is_the_first_of_the__polygamma_functions._It_is_strictly_increasing_and_strict_...
_ψ(α)_is_defined_as_the_logarithmic_derivative_of_the_gamma_function_ In__mathematics,_the_gamma_function_(represented_by_,_the_capital_letter__gamma_from_the_Greek_alphabet)_is_one_commonly_used_extension_of_the__factorial_function_to_complex_numbers._The_gamma_function_is_defined_for_all_complex_numbers_except_...
: :\psi(\alpha)_=_\frac Logit_transformations_are_interesting,
_as_they_usually_transform_various_shapes_(including_J-shapes)_into_(usually_skewed)_bell-shaped_densities_over_the_logit_variable,_and_they_may_remove_the_end_singularities_over_the_original_variable: :\begin \operatorname\left[\ln_\left_(\frac_\right_)_\right]_&=\psi(\alpha)_-_\psi(\beta)=_\operatorname[\ln(X)]_+\operatorname_\left[\ln_\left_(\frac_\right)_\right],\\ \operatorname\left_[\ln_\left_(\frac_\right_)_\right_]_&=\psi(\beta)_-_\psi(\alpha)=_-_\operatorname_\left[\ln_\left_(\frac_\right)_\right]_. \end Johnson
__considered_the_distribution_of_the_logit_-_transformed_variable_ln(''X''/1−''X''),_including_its_moment_generating_function_and_approximations_for_large_values_of_the_shape_parameters.__This_transformation_extends_the_finite_support_,_1_ The_comma__is_a_punctuation_mark_that_appears_in_several_variants_in_different_languages._It_has_the_same_shape_as_an_apostrophe_or_single_closing_quotation_mark_()_in_many_typefaces,_but_it_differs_from_them_in_being_placed_on_the__baseline_o_...
based_on_the_original_variable_''X''_to_infinite_support_in_both_directions_of_the_real_line_(−∞,_+∞). Higher_order_logarithmic_moments_can_be_derived_by_using_the_representation_of_a_beta_distribution_as_a_proportion_of_two_Gamma_distributions_and_differentiating_through_the_integral._They_can_be_expressed_in_terms_of_higher_order_poly-gamma_functions_as_follows: :\begin \operatorname_\left_[\ln^2(X)_\right_]_&=_(\psi(\alpha)_-_\psi(\alpha_+_\beta))^2+\psi_1(\alpha)-\psi_1(\alpha+\beta),_\\ \operatorname_\left_[\ln^2(1-X)_\right_]_&=_(\psi(\beta)_-_\psi(\alpha_+_\beta))^2+\psi_1(\beta)-\psi_1(\alpha+\beta),_\\ \operatorname_\left_[\ln_(X)\ln(1-X)_\right_]_&=(\psi(\alpha)_-_\psi(\alpha_+_\beta))(\psi(\beta)_-_\psi(\alpha_+_\beta))_-\psi_1(\alpha+\beta). \end therefore_the_variance__ In_probability_theory_and_statistics,_variance_is_the__expectation_of_the_squared__deviation_of_a__random_variable_from_its__population_mean_or__sample_mean._Variance_is_a_measure_of_dispersion,_meaning_it_is_a_measure_of_how_far_a_set_of_numbe_...
_of_the_logarithmic_variables_and_covariance_ In__probability_theory_and__statistics,_covariance_is_a_measure_of_the_joint_variability_of_two__random_variables._If_the_greater_values_of_one_variable_mainly_correspond_with_the_greater_values_of_the_other_variable,_and_the_same_holds_for_the__...
_of_ln(''X'')_and_ln(1−''X'')_are: :\begin \operatorname[\ln(X),_\ln(1-X)]_&=_\operatorname\left[\ln(X)\ln(1-X)\right]_-_\operatorname[\ln(X)]\operatorname[\ln(1-X)]_=_-\psi_1(\alpha+\beta)_\\ &_\\ \operatorname[\ln_X]_&=_\operatorname[\ln^2(X)]_-_(\operatorname[\ln(X)])^2_\\ &=_\psi_1(\alpha)_-_\psi_1(\alpha_+_\beta)_\\ &=_\psi_1(\alpha)_+_\operatorname[\ln(X),_\ln(1-X)]_\\ &_\\ \operatorname_ln_(1-X)&=_\operatorname[\ln^2_(1-X)]_-_(\operatorname[\ln_(1-X)])^2_\\ &=_\psi_1(\beta)_-_\psi_1(\alpha_+_\beta)_\\ &=_\psi_1(\beta)_+_\operatorname[\ln_(X),_\ln(1-X)] \end where_the_trigamma_function_ In_mathematics,_the_trigamma_function,_denoted__or_,_is_the_second_of_the_polygamma_functions,_and_is_defined_by :_\psi_1(z)_=_\frac_\ln\Gamma(z). It_follows_from_this_definition_that :_\psi_1(z)_=_\frac_\psi(z) where__is_the_digamma_functio_...
,_denoted_ψ1(α),_is_the_second_of_the_polygamma_function_ In_mathematics,_the_polygamma_function_of_order__is_a_meromorphic_function_on_the__complex_numbers_\mathbb_defined_as_the_th__derivative_of_the_logarithm_of_the_gamma_function: :\psi^(z)_:=_\frac_\psi(z)_=_\frac_\ln\Gamma(z). Thus :\psi^(z)__...
s,_and_is_defined_as_the_derivative_of_the_digamma_function: :\psi_1(\alpha)_=_\frac=_\frac. The_variances_and_covariance_of_the_logarithmically_transformed_variables_''X''_and_(1−''X'')_are_different,_in_general,_because_the_logarithmic_transformation_destroys_the_mirror-symmetry_of_the_original_variables_''X''_and_(1−''X''),_as_the_logarithm_approaches_negative_infinity_for_the_variable_approaching_zero. These_logarithmic_variances_and_covariance_are_the_elements_of_the_Fisher_information_ In_mathematical_statistics,_the_Fisher_information_(sometimes_simply_called_information)_is_a_way_of_measuring_the_amount_of_information_that_an_observable_random_variable_''X''_carries_about_an_unknown_parameter_''θ''_of_a_distribution_that_model_...
_matrix_for_the_beta_distribution.__They_are_also_a_measure_of_the_curvature_of_the_log_likelihood_function_(see_section_on_Maximum_likelihood_estimation). The_variances_of_the_log_inverse_variables_are_identical_to_the_variances_of_the_log_variables: :\begin \operatorname\left[\ln_\left_(\frac_\right_)_\right]_&_=\operatorname[\ln(X)]_=_\psi_1(\alpha)_-_\psi_1(\alpha_+_\beta),_\\ \operatorname\left[\ln_\left_(\frac_\right_)_\right]_&=\operatorname_ln_(1-X)=_\psi_1(\beta)_-_\psi_1(\alpha_+_\beta),_\\ \operatorname\left[\ln_\left_(\frac_\right),_\ln_\left_(\frac\right_)_\right]_&=\operatorname[\ln(X),\ln(1-X)]=_-\psi_1(\alpha_+_\beta).\end It_also_follows_that_the_variances_of_the_logit_transformed_variables_are: :\operatorname\left[\ln_\left_(\frac_\right_)\right]=\operatorname\left[\ln_\left_(\frac_\right_)_\right]=-\operatorname\left_[\ln_\left_(\frac_\right_),_\ln_\left_(\frac_\right_)_\right]=_\psi_1(\alpha)_+_\psi_1(\beta)


_Quantities_of_information_(entropy)

Given_a_beta_distributed_random_variable,_''X''_~_Beta(''α'',_''β''),_the_information_entropy, differential_entropy_of_''X''_is_(measured_in_Nat_(unit), nats),_the_expected_value_of_the_negative_of_the_logarithm_of_the_probability_density_function_ In_probability_theory,_a_probability_density_function_(PDF),_or_density_of_a_continuous_random_variable,_is_a__function_whose_value_at_any_given_sample_(or_point)_in_the__sample_space_(the_set_of_possible_values_taken_by_the_random_variable)_ca_...
: :\begin h(X)_&=_\operatorname[-\ln(f(x;\alpha,\beta))]_\\_pt&=\int_0^1_-f(x;\alpha,\beta)\ln(f(x;\alpha,\beta))_\,_dx_\\_pt&=_\ln(\Beta(\alpha,\beta))-(\alpha-1)\psi(\alpha)-(\beta-1)\psi(\beta)+(\alpha+\beta-2)_\psi(\alpha+\beta) \end where_''f''(''x'';_''α'',_''β'')_is_the_probability_density_function_ In_probability_theory,_a_probability_density_function_(PDF),_or_density_of_a_continuous_random_variable,_is_a__function_whose_value_at_any_given_sample_(or_point)_in_the__sample_space_(the_set_of_possible_values_taken_by_the_random_variable)_ca_...
_of_the_beta_distribution: :f(x;\alpha,\beta)_=_\frac_x^(1-x)^ The_digamma_function_ In_mathematics,_the_digamma_function_is_defined_as_the__logarithmic_derivative_of_the_gamma_function: :\psi(x)=\frac\ln\big(\Gamma(x)\big)=\frac\sim\ln-\frac. It_is_the_first_of_the__polygamma_functions._It_is_strictly_increasing_and_strict_...
_''ψ''_appears_in_the_formula_for_the_differential_entropy_as_a_consequence_of_Euler's_integral_formula_for_the_harmonic_numbers_which_follows_from_the_integral: :\int_0^1_\frac__\,_dx_=_\psi(\alpha)-\psi(1) The_information_entropy, differential_entropy_of_the_beta_distribution_is_negative_for_all_values_of_''α''_and_''β''_greater_than_zero,_except_at_''α''_=_''β''_=_1_(for_which_values_the_beta_distribution_is_the_same_as_the_Uniform_distribution_(continuous), uniform_distribution),_where_the_information_entropy, differential_entropy_reaches_its_Maxima_and_minima, maximum_value_of_zero.__It_is_to_be_expected_that_the_maximum_entropy_should_take_place_when_the_beta_distribution_becomes_equal_to_the_uniform_distribution,_since_uncertainty_is_maximal_when_all_possible_events_are_equiprobable. For_''α''_or_''β''_approaching_zero,_the_information_entropy, differential_entropy_approaches_its_Maxima_and_minima, minimum_value_of_negative_infinity._For_(either_or_both)_''α''_or_''β''_approaching_zero,_there_is_a_maximum_amount_of_order:_all_the_probability_density_is_concentrated_at_the_ends,_and_there_is_zero_probability_density_at_points_located_between_the_ends._Similarly_for_(either_or_both)_''α''_or_''β''_approaching_infinity,_the_differential_entropy_approaches_its_minimum_value_of_negative_infinity,_and_a_maximum_amount_of_order.__If_either_''α''_or_''β''_approaches_infinity_(and_the_other_is_finite)_all_the_probability_density_is_concentrated_at_an_end,_and_the_probability_density_is_zero_everywhere_else.__If_both_shape_parameters_are_equal_(the_symmetric_case),_''α''_=_''β'',_and_they_approach_infinity_simultaneously,_the_probability_density_becomes_a_spike_(_Dirac_delta_function)_concentrated_at_the_middle_''x''_=_1/2,_and_hence_there_is_100%_probability_at_the_middle_''x''_=_1/2_and_zero_probability_everywhere_else. The_(continuous_case)_information_entropy, differential_entropy_was_introduced_by_Shannon_in_his_original_paper_(where_he_named_it_the_"entropy_of_a_continuous_distribution"),_as_the_concluding_part_of_the_same_paper_where_he_defined_the_information_entropy, discrete_entropy.__It_is_known_since_then_that_the_differential_entropy_may_differ_from_the_infinitesimal_limit_of_the_discrete_entropy_by_an_infinite_offset,_therefore_the_differential_entropy_can_be_negative_(as_it_is_for_the_beta_distribution)._What_really_matters_is_the_relative_value_of_entropy. Given_two_beta_distributed_random_variables,_''X''1_~_Beta(''α'',_''β'')_and_''X''2_~_Beta(''α''′,_''β''′),_the_cross_entropy_is_(measured_in_nats)
:\begin H(X_1,X_2)_&=_\int_0^1_-_f(x;\alpha,\beta)_\ln_(f(x;\alpha',\beta'))_\,dx_\\_pt&=_\ln_\left(\Beta(\alpha',\beta')\right)-(\alpha'-1)\psi(\alpha)-(\beta'-1)\psi(\beta)+(\alpha'+\beta'-2)\psi(\alpha+\beta). \end The_cross_entropy_has_been_used_as_an_error_metric_to_measure_the_distance_between_two_hypotheses.
__Its_absolute_value_is_minimum_when_the_two_distributions_are_identical._It_is_the_information_measure_most_closely_related_to_the_log_maximum_likelihood_(see_section_on_"Parameter_estimation._Maximum_likelihood_estimation")). The_relative_entropy,_or_Kullback–Leibler_divergence_''D''KL(''X''1_, , _''X''2),_is_a_measure_of_the_inefficiency_of_assuming_that_the_distribution_is_''X''2_~_Beta(''α''′,_''β''′)__when_the_distribution_is_really_''X''1_~_Beta(''α'',_''β'')._It_is_defined_as_follows_(measured_in_nats). :\begin D_(X_1, , X_2)_&=_\int_0^1_f(x;\alpha,\beta)_\ln_\left_(\frac_\right_)_\,_dx_\\_pt&=_\left_(\int_0^1_f(x;\alpha,\beta)_\ln_(f(x;\alpha,\beta))_\,dx_\right_)-_\left_(\int_0^1_f(x;\alpha,\beta)_\ln_(f(x;\alpha',\beta'))_\,_dx_\right_)\\_pt&=_-h(X_1)_+_H(X_1,X_2)\\_pt&=_\ln\left(\frac\right)+(\alpha-\alpha')\psi(\alpha)+(\beta-\beta')\psi(\beta)+(\alpha'-\alpha+\beta'-\beta)\psi_(\alpha_+_\beta). \end_ The_relative_entropy,_or_Kullback–Leibler_divergence,_is_always_non-negative.__A_few_numerical_examples_follow: *''X''1_~_Beta(1,_1)_and_''X''2_~_Beta(3,_3);_''D''KL(''X''1_, , _''X''2)_=_0.598803;_''D''KL(''X''2_, , _''X''1)_=_0.267864;_''h''(''X''1)_=_0;_''h''(''X''2)_=_−0.267864 *''X''1_~_Beta(3,_0.5)_and_''X''2_~_Beta(0.5,_3);_''D''KL(''X''1_, , _''X''2)_=_7.21574;_''D''KL(''X''2_, , _''X''1)_=_7.21574;_''h''(''X''1)_=_−1.10805;_''h''(''X''2)_=_−1.10805. The_Kullback–Leibler_divergence_is_not_symmetric_''D''KL(''X''1_, , _''X''2)_≠_''D''KL(''X''2_, , _''X''1)__for_the_case_in_which_the_individual_beta_distributions_Beta(1,_1)_and_Beta(3,_3)_are_symmetric,_but_have_different_entropies_''h''(''X''1)_≠_''h''(''X''2)._The_value_of_the_Kullback_divergence_depends_on_the_direction_traveled:_whether_going_from_a_higher_(differential)_entropy_to_a_lower_(differential)_entropy_or_the_other_way_around._In_the_numerical_example_above,_the_Kullback_divergence_measures_the_inefficiency_of_assuming_that_the_distribution_is_(bell-shaped)_Beta(3,_3),_rather_than_(uniform)_Beta(1,_1)._The_"h"_entropy_of_Beta(1,_1)_is_higher_than_the_"h"_entropy_of_Beta(3,_3)_because_the_uniform_distribution_Beta(1,_1)_has_a_maximum_amount_of_disorder._The_Kullback_divergence_is_more_than_two_times_higher_(0.598803_instead_of_0.267864)_when_measured_in_the_direction_of_decreasing_entropy:_the_direction_that_assumes_that_the_(uniform)_Beta(1,_1)_distribution_is_(bell-shaped)_Beta(3,_3)_rather_than_the_other_way_around._In_this_restricted_sense,_the_Kullback_divergence_is_consistent_with_the_second_law_of_thermodynamics. The_Kullback–Leibler_divergence_is_symmetric_''D''KL(''X''1_, , _''X''2)_=_''D''KL(''X''2_, , _''X''1)_for_the_skewed_cases_Beta(3,_0.5)_and_Beta(0.5,_3)_that_have_equal_differential_entropy_''h''(''X''1)_=_''h''(''X''2). The_symmetry_condition: :D_(X_1, , X_2)_=_D_(X_2, , X_1),\texth(X_1)_=_h(X_2),\text\alpha_\neq_\beta follows_from_the_above_definitions_and_the_mirror-symmetry_''f''(''x'';_''α'',_''β'')_=_''f''(1−''x'';_''α'',_''β'')_enjoyed_by_the_beta_distribution.


_Relationships_between_statistical_measures


_Mean,_mode_and_median_relationship

If_1_<_α_<_β_then_mode_≤_median_≤_mean.Kerman_J_(2011)_"A_closed-form_approximation_for_the_median_of_the_beta_distribution"._
_Expressing_the_mode_(only_for_α,_β_>_1),_and_the_mean_in_terms_of_α_and_β: :__\frac_\le_\text_\le_\frac_, If_1_<_β_<_α_then_the_order_of_the_inequalities_are_reversed._For_α,_β_>_1_the_absolute_distance_between_the_mean_and_the_median_is_less_than_5%_of_the_distance_between_the_maximum_and_minimum_values_of_''x''._On_the_other_hand,_the_absolute_distance_between_the_mean_and_the_mode_can_reach_50%_of_the_distance_between_the_maximum_and_minimum_values_of_''x'',_for_the_(Pathological_(mathematics), pathological)_case_of_α_=_1_and_β_=_1,_for_which_values_the_beta_distribution_approaches_the_uniform_distribution_and_the_information_entropy, differential_entropy_approaches_its_Maxima_and_minima, maximum_value,_and_hence_maximum_"disorder". For_example,_for_α_=_1.0001_and_β_=_1.00000001: *_mode___=_0.9999;___PDF(mode)_=_1.00010 *_mean___=_0.500025;_PDF(mean)_=_1.00003 *_median_=_0.500035;_PDF(median)_=_1.00003 *_mean_−_mode___=_−0.499875 *_mean_−_median_=_−9.65538_×_10−6 where_PDF_stands_for_the_value_of_the_probability_density_function_ In_probability_theory,_a_probability_density_function_(PDF),_or_density_of_a_continuous_random_variable,_is_a__function_whose_value_at_any_given_sample_(or_point)_in_the__sample_space_(the_set_of_possible_values_taken_by_the_random_variable)_ca_...
.


_Mean,_geometric_mean_and_harmonic_mean_relationship

It_is_known_from_the_inequality_of_arithmetic_and_geometric_means_that_the_geometric_mean_is_lower_than_the_mean.__Similarly,_the_harmonic_mean_is_lower_than_the_geometric_mean.__The_accompanying_plot_shows_that_for_α_=_β,_both_the_mean_and_the_median_are_exactly_equal_to_1/2,_regardless_of_the_value_of_α_=_β,_and_the_mode_is_also_equal_to_1/2_for_α_=_β_>_1,_however_the_geometric_and_harmonic_means_are_lower_than_1/2_and_they_only_approach_this_value_asymptotically_as_α_=_β_→_∞.


_Kurtosis_bounded_by_the_square_of_the_skewness

As_remarked_by_William_Feller, Feller,_in_the_Pearson_distribution, Pearson_system_the_beta_probability_density_appears_as_Pearson_distribution, type_I_(any_difference_between_the_beta_distribution_and_Pearson's_type_I_distribution_is_only_superficial_and_it_makes_no_difference_for_the_following_discussion_regarding_the_relationship_between_kurtosis_and_skewness)._Karl_Pearson_showed,_in_Plate_1_of_his_paper_
__published_in_1916,__a_graph_with_the_kurtosis_as_the_vertical_axis_(ordinate)_and_the_square_of_the_skewness_ In_probability_theory_and_statistics,_skewness_is_a_measure_of_the_asymmetry_of_the_probability_distribution_of_a__real-valued_random_variable_about_its_mean._The_skewness_value_can_be_positive,_zero,_negative,_or_undefined. For_a_unimodal__...
_as_the_horizontal_axis_(abscissa),_in_which_a_number_of_distributions_were_displayed.
The region occupied by the beta distribution is bounded by the following two lines in the (skewness², kurtosis) plane, or the (skewness², excess kurtosis) plane:

:(\text{skewness})^2+1< \text{kurtosis}< \tfrac{3}{2} (\text{skewness})^2 + 3

or, equivalently,

:(\text{skewness})^2-2< \text{excess kurtosis}< \tfrac{3}{2} (\text{skewness})^2

At a time when there were no powerful digital computers, Karl Pearson accurately computed further boundaries, for example, separating the "U-shaped" from the "J-shaped" distributions. The lower boundary line (excess kurtosis + 2 − skewness² = 0) is produced by skewed "U-shaped" beta distributions with both values of the shape parameters α and β close to zero. The upper boundary line (excess kurtosis − (3/2) skewness² = 0) is produced by extremely skewed distributions with very large values of one of the parameters and very small values of the other parameter. Karl Pearson showed that this upper boundary line (excess kurtosis − (3/2) skewness² = 0) is also the intersection with Pearson's distribution III, which has unlimited support in one direction (towards positive infinity), and can be bell-shaped or J-shaped. His son, Egon Pearson, showed that the region (in the kurtosis/squared-skewness plane) occupied by the beta distribution (equivalently, Pearson's distribution I) as it approaches this boundary (excess kurtosis − (3/2) skewness² = 0) is shared with the noncentral chi-squared distribution. Karl Pearson (Pearson 1895, pp. 357, 360, 373–376) also showed that the gamma distribution is a Pearson type III distribution. Hence this boundary line for Pearson's type III distribution is known as the gamma line. (This can be shown from the fact that the excess kurtosis of the gamma distribution is 6/''k'' and the square of the skewness is 4/''k'', hence (excess kurtosis − (3/2) skewness² = 0) is identically satisfied by the gamma distribution regardless of the value of the parameter ''k''.) Pearson later noted that the chi-squared distribution is a special case of Pearson's type III and also shares this boundary line (as is apparent from the fact that for the chi-squared distribution the excess kurtosis is 12/''k'' and the square of the skewness is 8/''k'', hence (excess kurtosis − (3/2) skewness² = 0) is identically satisfied regardless of the value of the parameter ''k''). This is to be expected, since the chi-squared distribution ''X'' ~ χ²(''k'') is a special case of the gamma distribution, with parametrization X ~ Γ(k/2, 1/2) where k is a positive integer that specifies the "number of degrees of freedom" of the chi-squared distribution.

An example of a beta distribution near the upper boundary (excess kurtosis − (3/2) skewness² = 0) is given by α = 0.1, β = 1000, for which the ratio (excess kurtosis)/(skewness²) = 1.49835 approaches the upper limit of 1.5 from below. An example of a beta distribution near the lower boundary (excess kurtosis + 2 − skewness² = 0) is given by α = 0.0001, β = 0.1, for which values the expression (excess kurtosis + 2)/(skewness²) = 1.01621 approaches the lower limit of 1 from above. In the infinitesimal limit for both α and β approaching zero symmetrically, the excess kurtosis reaches its minimum value of −2. This minimum value occurs at the point at which the lower boundary line intersects the vertical axis (ordinate). (However, in Pearson's original chart, the ordinate is kurtosis, instead of excess kurtosis, and it increases downwards rather than upwards.)

Values for the skewness and excess kurtosis below the lower boundary (excess kurtosis + 2 − skewness² = 0) cannot occur for any distribution, and hence Karl Pearson appropriately called the region below this boundary the "impossible region". The boundary of this "impossible region" is determined by (symmetric or skewed) bimodal "U"-shaped distributions for which the parameters α and β approach zero and hence all the probability density is concentrated at the ends: ''x'' = 0, 1, with practically nothing in between them. Since for α ≈ β ≈ 0 the probability density is concentrated at the two ends ''x'' = 0 and ''x'' = 1, this "impossible boundary" is determined by a Bernoulli distribution, where the only two possible outcomes occur with respective probabilities ''p'' and ''q'' = 1 − ''p''. For cases approaching this limit boundary with symmetry α = β, skewness ≈ 0, excess kurtosis ≈ −2 (this is the lowest excess kurtosis possible for any distribution), and the probabilities are ''p'' ≈ ''q'' ≈ 1/2. For cases approaching this limit boundary with skewness, excess kurtosis ≈ −2 + skewness², and the probability density is concentrated more at one end than the other end (with practically nothing in between), with probabilities p = \tfrac{\beta}{\alpha+\beta} at the left end ''x'' = 0 and q = 1-p = \tfrac{\alpha}{\alpha+\beta} at the right end ''x'' = 1. A numerical check of these bounds is sketched below.
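The following short Python sketch (not part of the original article) checks numerically that the (skewness², excess kurtosis) pair of a beta distribution lies strictly between the two boundary lines quoted above. It assumes SciPy is available and that scipy.stats.beta.stats(..., moments='mvsk') returns mean, variance, skewness and excess kurtosis; the parameter pairs include the two near-boundary examples mentioned in the text plus two arbitrary ones.

<syntaxhighlight lang="python">
# Minimal sketch: verify skewness^2 - 2 < excess kurtosis < (3/2) skewness^2 for beta distributions.
from scipy.stats import beta

for a, b in [(0.1, 1000.0), (0.0001, 0.1), (2.0, 5.0), (0.5, 0.5)]:
    mean, var, skew, ex_kurt = (float(m) for m in beta.stats(a, b, moments='mvsk'))
    s2 = skew ** 2
    lower = s2 - 2        # "impossible region" boundary: excess kurtosis = skewness^2 - 2
    upper = 1.5 * s2      # gamma line: excess kurtosis = (3/2) skewness^2
    assert lower < ex_kurt < upper
    print(f"alpha={a}, beta={b}: skewness^2={s2:.5f}, "
          f"excess kurtosis={ex_kurt:.5f}, bounds=({lower:.5f}, {upper:.5f})")
</syntaxhighlight>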


_Symmetry

All statements are conditional on α, β > 0:

* Probability density function reflection symmetry
::f(x;\alpha,\beta) = f(1-x;\beta,\alpha)
* Cumulative distribution function reflection symmetry plus unitary translation
::F(x;\alpha,\beta) = I_x(\alpha,\beta) = 1- F(1- x;\beta,\alpha) = 1 - I_{1-x}(\beta,\alpha)
* Mode reflection symmetry plus unitary translation
::\operatorname{mode}(\Beta(\alpha, \beta))= 1-\operatorname{mode}(\Beta(\beta, \alpha)),\text{ if }\Beta(\beta, \alpha)\ne \Beta(1,1)
* Median reflection symmetry plus unitary translation
::\operatorname{median} (\Beta(\alpha, \beta) )= 1 - \operatorname{median} (\Beta(\beta, \alpha))
* Mean reflection symmetry plus unitary translation
::\mu (\Beta(\alpha, \beta) )= 1 - \mu (\Beta(\beta, \alpha) )
* Geometric means: each is individually asymmetric; the following symmetry applies between the geometric mean based on ''X'' and the geometric mean based on its reflection (1 − ''X'')
::G_X (\Beta(\alpha, \beta) )=G_{(1-X)}(\Beta(\beta, \alpha) )
* Harmonic means: each is individually asymmetric; the following symmetry applies between the harmonic mean based on ''X'' and the harmonic mean based on its reflection (1 − ''X'')
::H_X (\Beta(\alpha, \beta) )=H_{(1-X)}(\Beta(\beta, \alpha) ) \text{ if } \alpha, \beta > 1 .
* Variance symmetry
::\operatorname{var} (\Beta(\alpha, \beta) )=\operatorname{var} (\Beta(\beta, \alpha) )
* Geometric variances: each is individually asymmetric; the following symmetry applies between the log geometric variance based on ''X'' and the log geometric variance based on its reflection (1 − ''X'')
::\ln(\operatorname{var}_{GX} (\Beta(\alpha, \beta))) = \ln(\operatorname{var}_{G(1-X)}(\Beta(\beta, \alpha)))
* Geometric covariance symmetry
::\ln \operatorname{cov}_{GX,(1-X)}(\Beta(\alpha, \beta))=\ln \operatorname{cov}_{GX,(1-X)}(\Beta(\beta, \alpha))
* Mean absolute deviation around the mean symmetry
::\operatorname{E}[|X - E[X]|] (\Beta(\alpha, \beta))=\operatorname{E}[|X - E[X]|] (\Beta(\beta, \alpha))
* Skewness skew-symmetry
::\operatorname{skewness} (\Beta(\alpha, \beta) )= - \operatorname{skewness} (\Beta(\beta, \alpha) )
* Excess kurtosis symmetry
::\text{excess kurtosis} (\Beta(\alpha, \beta) )= \text{excess kurtosis} (\Beta(\beta, \alpha) )
* Characteristic function symmetry of the real part (with respect to the origin of the variable "t")
:: \text{Re} [{}_1F_1(\alpha; \alpha+\beta; it) ] = \text{Re} [ {}_1F_1(\alpha; \alpha+\beta; - it)]
* Characteristic function skew-symmetry of the imaginary part (with respect to the origin of the variable "t")
:: \text{Im} [{}_1F_1(\alpha; \alpha+\beta; it) ] = - \text{Im} [ {}_1F_1(\alpha; \alpha+\beta; - it) ]
* Characteristic function symmetry of the absolute value (with respect to the origin of the variable "t")
:: \text{Abs} [ {}_1F_1(\alpha; \alpha+\beta; it) ] = \text{Abs} [ {}_1F_1(\alpha; \alpha+\beta; - it) ]
* Differential entropy symmetry
::h(\Beta(\alpha, \beta) )= h(\Beta(\beta, \alpha) )
* Relative entropy (also called Kullback–Leibler divergence) symmetry
::D_{\mathrm{KL}}(X_1 || X_2) = D_{\mathrm{KL}}(X_2 || X_1), \text{ if } h(X_1) = h(X_2)\text{, for (skewed) }\alpha \neq \beta
* Fisher information matrix symmetry
::\mathcal{I}_{i,j} = \mathcal{I}_{j,i}

Two of these symmetry relations are verified numerically in the sketch below.
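This is a minimal numerical check (not from the original article) of two of the relations listed above: the reflection symmetry of the density, f(x; α, β) = f(1 − x; β, α), and the skew-symmetry of the skewness. It assumes NumPy and SciPy; the shape parameters are arbitrary.

<syntaxhighlight lang="python">
# Minimal sketch: numerically check two symmetry relations of the beta distribution.
import numpy as np
from scipy.stats import beta

a, b = 2.0, 5.0                      # arbitrary shape parameters
x = np.linspace(0.01, 0.99, 99)

# reflection symmetry of the density: f(x; a, b) = f(1-x; b, a)
assert np.allclose(beta.pdf(x, a, b), beta.pdf(1 - x, b, a))

# skew-symmetry of the skewness: skew(Beta(a, b)) = -skew(Beta(b, a))
assert np.isclose(beta.stats(a, b, moments='s'), -beta.stats(b, a, moments='s'))

print("reflection symmetry and skewness skew-symmetry verified for alpha=2, beta=5")
</syntaxhighlight>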


_Geometry_of_the_probability_density_function


_Inflection_points

For certain values of the shape parameters α and β, the probability density function has inflection points, at which the curvature changes sign. The position of these inflection points can be useful as a measure of the dispersion or spread of the distribution.

Defining the following quantity:

:\kappa =\frac{\sqrt{\frac{(\alpha-1)(\beta-1)}{\alpha+\beta-3}}}{\alpha+\beta-2}

points of inflection occur, depending on the value of the shape parameters α and β, as follows:

*(α > 2, β > 2) The distribution is bell-shaped (symmetric for α = β and skewed otherwise), with two inflection points, equidistant from the mode:
::x = \text{mode} \pm \kappa = \frac{\alpha - 1 \pm \sqrt{\frac{(\alpha-1)(\beta-1)}{\alpha+\beta-3}}}{\alpha+\beta-2}
* (α = 2, β > 2) The distribution is unimodal, positively skewed, right-tailed, with one inflection point, located to the right of the mode:
::x =\text{mode} + \kappa = \frac{2}{\beta}
* (α > 2, β = 2) The distribution is unimodal, negatively skewed, left-tailed, with one inflection point, located to the left of the mode:
::x = \text{mode} - \kappa = 1 - \frac{2}{\alpha}
* (1 < α < 2, β > 2, α+β > 2) The distribution is unimodal, positively skewed, right-tailed, with one inflection point, located to the right of the mode:
::x =\text{mode} + \kappa = \frac{\alpha - 1 + \sqrt{\frac{(\alpha-1)(\beta-1)}{\alpha+\beta-3}}}{\alpha+\beta-2}
*(0 < α < 1, 1 < β < 2) The distribution has a mode at the left end ''x'' = 0 and it is positively skewed, right-tailed. There is one inflection point, located to the right of the mode:
::x = \frac{\alpha - 1 + \sqrt{\frac{(\alpha-1)(\beta-1)}{\alpha+\beta-3}}}{\alpha+\beta-2}
*(α > 2, 1 < β < 2) The distribution is unimodal, negatively skewed, left-tailed, with one inflection point, located to the left of the mode:
::x =\text{mode} - \kappa = \frac{\alpha - 1 - \sqrt{\frac{(\alpha-1)(\beta-1)}{\alpha+\beta-3}}}{\alpha+\beta-2}
*(1 < α < 2, 0 < β < 1) The distribution has a mode at the right end ''x'' = 1 and it is negatively skewed, left-tailed. There is one inflection point, located to the left of the mode:
::x = \frac{\alpha - 1 - \sqrt{\frac{(\alpha-1)(\beta-1)}{\alpha+\beta-3}}}{\alpha+\beta-2}

There are no inflection points in the remaining (symmetric and skewed) regions: U-shaped (α < 1, β < 1), upside-down-U-shaped (1 < α < 2, 1 < β < 2), reverse-J-shaped (α < 1, β > 2) or J-shaped (α > 2, β < 1).

The accompanying plots show the inflection point locations (shown vertically, ranging from 0 to 1) versus α and β (the horizontal axes ranging from 0 to 5). There are large cuts at surfaces intersecting the lines α = 1, β = 1, α = 2, and β = 2 because at these values the beta distribution changes from two modes, to one mode, to no mode. A numerical check of the closed-form inflection points is sketched below.
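The sketch below (not from the original article) locates the inflection points of the beta density numerically, by finding sign changes of a finite-difference second derivative on a fine grid, and compares them with the closed form mode ± κ given above. It assumes NumPy and SciPy; the shape parameters are an arbitrary bell-shaped case.

<syntaxhighlight lang="python">
# Minimal sketch: compare numerical inflection points of the beta density with mode +/- kappa.
import numpy as np
from scipy.stats import beta

a, b = 4.0, 3.0                                   # arbitrary bell-shaped case (alpha > 2, beta > 2)
mode = (a - 1) / (a + b - 2)
kappa = np.sqrt((a - 1) * (b - 1) / (a + b - 3)) / (a + b - 2)

x = np.linspace(0.01, 0.99, 98001)
d2 = np.gradient(np.gradient(beta.pdf(x, a, b), x), x)   # crude second derivative
numeric = x[np.where(np.diff(np.sign(d2)) != 0)[0]]      # grid points just before a sign change

print("closed form:", mode - kappa, mode + kappa)
print("numeric    :", numeric)
</syntaxhighlight>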


_Shapes

The beta density function can take a wide variety of different shapes depending on the values of the two parameters ''α'' and ''β''. The ability of the beta distribution to take this great diversity of shapes (using only two parameters) is partly responsible for its wide application in modeling actual measurements:


=Symmetric (''α'' = ''β'')=

* the density function is symmetric about 1/2 (blue & teal plots).
* median = mean = 1/2.
* skewness = 0.
* variance = 1/(4(2α + 1))
* α = β < 1
** U-shaped (blue plot).
** bimodal: left mode = 0, right mode = 1, anti-mode = 1/2
** 1/12 < var(''X'') < 1/4
** −2 < excess kurtosis(''X'') < −6/5
** α = β = 1/2 is the arcsine distribution
*** var(''X'') = 1/8
*** excess kurtosis(''X'') = −3/2
*** CF = Rinc (t)
** α = β → 0 is a 2-point Bernoulli distribution with equal probability 1/2 at each Dirac delta function end ''x'' = 0 and ''x'' = 1 and zero probability everywhere else. A coin toss: one face of the coin being ''x'' = 0 and the other face being ''x'' = 1.
*** \lim_{\alpha = \beta \to 0} \operatorname{var}(X) = \tfrac{1}{4}
*** \lim_{\alpha = \beta \to 0} \operatorname{excess kurtosis}(X) = - 2 (a lower value than this is impossible for any distribution to reach)
*** The differential entropy approaches a minimum value of −∞
* α = β = 1
** the uniform [0, 1] distribution
** no mode
** var(''X'') = 1/12
** excess kurtosis(''X'') = −6/5
** The (negative anywhere else) differential entropy reaches its maximum value of zero
** CF = Sinc (t)
* ''α'' = ''β'' > 1
** symmetric unimodal
** mode = 1/2.
** 0 < var(''X'') < 1/12
** −6/5 < excess kurtosis(''X'') < 0
** ''α'' = ''β'' = 3/2 is a semi-elliptic [0, 1] distribution, see: Wigner semicircle distribution
*** var(''X'') = 1/16.
*** excess kurtosis(''X'') = −1
*** CF = 2 Jinc (t)
** ''α'' = ''β'' = 2 is the parabolic [0, 1] distribution
*** var(''X'') = 1/20
*** excess kurtosis(''X'') = −6/7
*** CF = 3 Tinc (t)
** ''α'' = ''β'' > 2 is bell-shaped, with inflection points located to either side of the mode
*** 0 < var(''X'') < 1/20
*** −6/7 < excess kurtosis(''X'') < 0
** ''α'' = ''β'' → ∞ is a 1-point degenerate distribution with a Dirac delta function spike at the midpoint ''x'' = 1/2 with probability 1, and zero probability everywhere else. There is 100% probability (absolute certainty) concentrated at the single point ''x'' = 1/2.
*** \lim_{\alpha = \beta \to \infty} \operatorname{var}(X) = 0
*** \lim_{\alpha = \beta \to \infty} \operatorname{excess kurtosis}(X) = 0
*** The differential entropy approaches a minimum value of −∞


=Skewed (''α'' ≠ ''β'')=

The density function is skewed. An interchange of parameter values yields the mirror image (the reverse) of the initial curve. Some more specific cases:

*''α'' < 1, ''β'' < 1
** U-shaped
** Positive skew for α < β, negative skew for α > β.
** bimodal: left mode = 0, right mode = 1, anti-mode = \tfrac{\alpha-1}{\alpha+\beta-2}
** 0 < median < 1.
** 0 < var(''X'') < 1/4
*α > 1, β > 1
** unimodal (magenta & cyan plots),
** Positive skew for α < β, negative skew for α > β.
**\text{mode}= \tfrac{\alpha-1}{\alpha+\beta-2}
** 0 < median < 1
** 0 < var(''X'') < 1/12
*α < 1, β ≥ 1
**reverse J-shaped with a right tail,
**positively skewed,
**strictly decreasing, convex
** mode = 0
** 0 < median < 1/2.
** 0 < \operatorname{var}(X) < \tfrac{-11+5\sqrt{5}}{2}, (maximum variance occurs for \alpha=\tfrac{\sqrt{5}-1}{2}, \beta=1, or α = Φ the golden ratio conjugate)
*α ≥ 1, β < 1
**J-shaped with a left tail,
**negatively skewed,
**strictly increasing, convex
** mode = 1
** 1/2 < median < 1
** 0 < \operatorname{var}(X) < \tfrac{-11+5\sqrt{5}}{2}, (maximum variance occurs for \alpha=1, \beta=\tfrac{\sqrt{5}-1}{2}, or β = Φ the golden ratio conjugate)
*α = 1, β > 1
**positively skewed,
**strictly decreasing (red plot),
**a reversed (mirror-image) power function [0,1] distribution
** mean = 1 / (β + 1)
** median = 1 − 1/2^{1/β}
** mode = 0
**α = 1, 1 < β < 2
***concave
*** 1-\tfrac{1}{\sqrt{2}}< \text{median} < \tfrac{1}{2}
*** 1/18 < var(''X'') < 1/12.
**α = 1, β = 2
***a straight line with slope −2, the right-triangular distribution with right angle at the left end, at ''x'' = 0
*** \text{median}=1-\tfrac{1}{\sqrt{2}}
*** var(''X'') = 1/18
**α = 1, β > 2
***reverse J-shaped with a right tail,
***convex
*** 0 < \text{median} < 1-\tfrac{1}{\sqrt{2}}
*** 0 < var(''X'') < 1/18
*α > 1, β = 1
**negatively skewed,
**strictly increasing (green plot),
**the power function [0, 1] distribution
** mean = α / (α + 1)
** median = 1/2^{1/α}
** mode = 1
**2 > α > 1, β = 1
***concave
*** \tfrac{1}{2} < \text{median} < \tfrac{1}{\sqrt{2}}
*** 1/18 < var(''X'') < 1/12
** α = 2, β = 1
***a straight line with slope +2, the right-triangular distribution with right angle at the right end, at ''x'' = 1
*** \text{median}=\tfrac{1}{\sqrt{2}}
*** var(''X'') = 1/18
**α > 2, β = 1
***J-shaped with a left tail, convex
***\tfrac{1}{\sqrt{2}} < \text{median} < 1
*** 0 < var(''X'') < 1/18


_Related_distributions


_Transformations

* If ''X'' ~ Beta(''α'', ''β'') then 1 − ''X'' ~ Beta(''β'', ''α''), mirror-image symmetry
* If ''X'' ~ Beta(''α'', ''β'') then \tfrac{X}{1-X} \sim \beta'(\alpha,\beta), the beta prime distribution, also called the "beta distribution of the second kind".
* If ''X'' ~ Beta(''α'', ''β'') then \tfrac{1}{X} -1 \sim \beta'(\beta,\alpha).
* If ''X'' ~ Beta(''n''/2, ''m''/2) then \tfrac{mX}{n(1-X)} \sim F(n,m) (assuming ''n'' > 0 and ''m'' > 0), the Fisher–Snedecor F distribution.
* If X \sim \operatorname{Beta}\left(1+\lambda\tfrac{m-\min}{\max-\min}, 1 + \lambda\tfrac{\max-m}{\max-\min}\right) then min + ''X''(max − min) ~ PERT(min, max, ''m'', ''λ'') where ''PERT'' denotes a PERT distribution used in PERT analysis, and ''m'' = most likely value (Herrerías-Velasco, José Manuel; Herrerías-Pleguezuelo, Rafael; van Dorp, Johan René (2011). "Revisiting the PERT mean and variance". European Journal of Operational Research (210), pp. 448–451). Traditionally ''λ'' = 4 in PERT analysis.
* If ''X'' ~ Beta(1, ''β'') then ''X'' ~ Kumaraswamy distribution with parameters (1, ''β'')
* If ''X'' ~ Beta(''α'', 1) then ''X'' ~ Kumaraswamy distribution with parameters (''α'', 1)
* If ''X'' ~ Beta(''α'', 1) then −ln(''X'') ~ Exponential(''α'')

Several of these transformations are checked numerically in the sketch below.
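The following Monte Carlo sketch (not from the original article) checks three of the transformations above: 1 − X ~ Beta(β, α), X/(1 − X) ~ beta prime(α, β), and −ln(X) ~ Exponential(α) when X ~ Beta(α, 1). It assumes NumPy and SciPy; the parameter values and sample size are arbitrary.

<syntaxhighlight lang="python">
# Minimal sketch: Kolmogorov-Smirnov checks of three beta-distribution transformations.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a, b, n = 2.5, 4.0, 200_000

x = rng.beta(a, b, size=n)
print(stats.kstest(1 - x, stats.beta(b, a).cdf).pvalue)             # mirror image
print(stats.kstest(x / (1 - x), stats.betaprime(a, b).cdf).pvalue)  # beta prime

y = rng.beta(a, 1.0, size=n)
print(stats.kstest(-np.log(y), stats.expon(scale=1 / a).cdf).pvalue)  # Exponential(a)
</syntaxhighlight>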


_Special_and_limiting_cases

* Beta(1, 1) ~ U(0, 1), the continuous uniform distribution.
* Beta(n, 1) ~ Maximum of ''n'' independent rvs. with U(0, 1), sometimes called a ''standard power function distribution'' with density ''n''&thinsp;''x''^{''n''−1} on that interval.
* Beta(1, n) ~ Minimum of ''n'' independent rvs. with U(0, 1).
* If ''X'' ~ Beta(3/2, 3/2) and ''r'' > 0 then 2''rX'' − ''r'' ~ Wigner semicircle distribution.
* Beta(1/2, 1/2) is equivalent to the arcsine distribution. This distribution is also the Jeffreys prior probability for the Bernoulli and binomial distributions. The arcsine probability density is a distribution that appears in several random-walk fundamental theorems. In a fair coin toss random walk, the probability for the time of the last visit to the origin is distributed as a (U-shaped) arcsine distribution. In a two-player fair-coin-toss game, a player is said to be in the lead if the random walk (that started at the origin) is above the origin. The most probable number of times that a given player will be in the lead, in a game of length 2''N'', is not ''N''. On the contrary, ''N'' is the least likely number of times that the player will be in the lead. The most likely number of times in the lead is 0 or 2''N'' (following the arcsine distribution).
* \lim_{n \to \infty} n \operatorname{Beta}(1,n) = \operatorname{Exponential}(1), the exponential distribution.
* \lim_{n \to \infty} n \operatorname{Beta}(k,n) = \operatorname{Gamma}(k,1), the gamma distribution.
* For large n, \operatorname{Beta}(\alpha n,\beta n) \to \mathcal{N}\left(\frac{\alpha}{\alpha+\beta},\frac{\alpha\beta}{(\alpha+\beta)^3}\frac{1}{n}\right), the normal distribution. More precisely, if X_n \sim \operatorname{Beta}(\alpha n,\beta n) then \sqrt{n}\left(X_n -\tfrac{\alpha}{\alpha+\beta}\right) converges in distribution to a normal distribution with mean 0 and variance \tfrac{\alpha\beta}{(\alpha+\beta)^3} as ''n'' increases.

Two of these limits are illustrated numerically below.
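This is a minimal sketch (not from the original article) illustrating two of the limits above: n·Beta(1, n) → Exponential(1) and the normal approximation of Beta(αn, βn) for large n. It assumes NumPy and SciPy; the value of n, the sample size, and the shape parameters are arbitrary choices, picked so that the approximation error is below the resolution of the Kolmogorov–Smirnov test at this sample size (the printed p-values should therefore typically not be small).

<syntaxhighlight lang="python">
# Minimal sketch: two limiting cases of the beta distribution.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_scale = 2000                      # "large n" in the limit statements
samples = 20_000

# n * Beta(1, n) is approximately Exponential(1)
x = n_scale * rng.beta(1.0, n_scale, size=samples)
print("KS p-value vs Exp(1):", stats.kstest(x, stats.expon().cdf).pvalue)

# Beta(alpha*n, beta*n) is approximately Normal(alpha/(alpha+beta), alpha*beta/((alpha+beta)^3 n))
alpha, beta_ = 2.0, 3.0
y = rng.beta(alpha * n_scale, beta_ * n_scale, size=samples)
mu = alpha / (alpha + beta_)
sigma = np.sqrt(alpha * beta_ / ((alpha + beta_) ** 3 * n_scale))
print("KS p-value vs Normal:", stats.kstest(y, stats.norm(mu, sigma).cdf).pvalue)
</syntaxhighlight>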


_Derived_from_other_distributions

* The ''k''th order statistic of a sample of size ''n'' from the uniform distribution is a beta random variable, ''U''_{(''k'')} ~ Beta(''k'', ''n''+1−''k'').
* If ''X'' ~ Gamma(α, θ) and ''Y'' ~ Gamma(β, θ) are independent, then \tfrac{X}{X+Y} \sim \operatorname{Beta}(\alpha, \beta)\, (the gamma-ratio construction; see the sketch below).
* If X \sim \chi^2(\alpha)\, and Y \sim \chi^2(\beta)\, are independent, then \tfrac{X}{X+Y} \sim \operatorname{Beta}(\tfrac{\alpha}{2}, \tfrac{\beta}{2}).
* If ''X'' ~ U(0, 1) and ''α'' > 0 then ''X''^{1/''α''} ~ Beta(''α'', 1), the power function distribution.
* If X \sim\operatorname{Bin}(k;n;p), then the distribution of the success probability ''p'', with density proportional to the binomial likelihood, is \operatorname{Beta}(\alpha, \beta) for discrete values of ''n'' and ''k'', where \alpha=k+1 and \beta=n-k+1.
* If ''X'' ~ Cauchy(0, 1) then \tfrac{1}{1+X^2} \sim \operatorname{Beta}\left(\tfrac12, \tfrac12\right)\,
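The following Monte Carlo sketch (not from the original article) checks the gamma-ratio construction above: if X ~ Gamma(α, θ) and Y ~ Gamma(β, θ) are independent, then X/(X + Y) ~ Beta(α, β). It assumes NumPy and SciPy; the parameter values are arbitrary.

<syntaxhighlight lang="python">
# Minimal sketch: gamma-ratio construction of a beta random variable.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
alpha, beta_, theta, n = 2.0, 5.0, 3.0, 100_000

x = rng.gamma(shape=alpha, scale=theta, size=n)
y = rng.gamma(shape=beta_, scale=theta, size=n)
ratio = x / (x + y)

print("KS p-value vs Beta(alpha, beta):",
      stats.kstest(ratio, stats.beta(alpha, beta_).cdf).pvalue)
</syntaxhighlight>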


_Combination_with_other_distributions

* If ''X'' ~ Beta(''α'', ''β'') and ''Y'' ~ F(2''β'', 2''α''), then \Pr\left(X \leq \tfrac{\alpha}{\alpha + \beta x}\right) = \Pr(Y \geq x)\, for all ''x'' > 0.


_Compounding_with_other_distributions

* If ''p'' ~ Beta(α, β) and ''X'' ~ Bin(''k'', ''p'') then ''X'' ~ beta-binomial distribution (a sampling sketch of this compound follows below)
* If ''p'' ~ Beta(α, β) and ''X'' ~ NB(''r'', ''p'') then ''X'' ~ beta negative binomial distribution
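This is a minimal sketch (not from the original article) of the first compound above: drawing p ~ Beta(α, β) and then X ~ Bin(k, p) yields a beta-binomial variate, whose empirical frequencies are compared with the exact pmf from scipy.stats.betabinom. It assumes NumPy and SciPy; the parameter values are arbitrary.

<syntaxhighlight lang="python">
# Minimal sketch: beta-binomial compounding by two-stage sampling.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
alpha, beta_, k, n = 2.0, 3.0, 10, 200_000

p = rng.beta(alpha, beta_, size=n)       # first stage: random success probability
x = rng.binomial(k, p)                   # second stage: binomial draw given p

empirical = np.bincount(x, minlength=k + 1) / n
exact = stats.betabinom(k, alpha, beta_).pmf(np.arange(k + 1))
print(np.round(empirical, 4))
print(np.round(exact, 4))
</syntaxhighlight>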


_Generalisations

* The generalization to multiple variables, i.e. a multivariate beta distribution, is called a Dirichlet distribution. Univariate marginals of the Dirichlet distribution have a beta distribution. The beta distribution is conjugate to the binomial and Bernoulli distributions in exactly the same way as the Dirichlet distribution is conjugate to the multinomial distribution and categorical distribution.
* The Pearson type I distribution is identical to the beta distribution (except for arbitrary shifting and re-scaling that can also be accomplished with the four-parameter parametrization of the beta distribution).
* The beta distribution is the special case of the noncentral beta distribution where \lambda = 0: \operatorname{Beta}(\alpha, \beta) = \operatorname{Beta}(\alpha,\beta,0).
* The generalized beta distribution is a five-parameter distribution family which has the beta distribution as a special case.
* The matrix variate beta distribution is a distribution for positive-definite matrices.


__Statistical_inference_


_Parameter_estimation


_Method_of_moments


=Two unknown parameters=

Two unknown parameters (\hat{\alpha}, \hat{\beta}) of a beta distribution supported on the [0, 1] interval can be estimated, using the method of moments, with the first two moments (sample mean and sample variance) as follows. Let:

: \text{sample mean}=\bar{x} = \frac{1}{N}\sum_{i=1}^N X_i

be the sample mean estimate and

: \text{sample variance} =\bar{v} = \frac{1}{N-1}\sum_{i=1}^N (X_i - \bar{x})^2

be the sample variance estimate. The method-of-moments estimates of the parameters are

:\hat{\alpha} = \bar{x} \left(\frac{\bar{x}(1 - \bar{x})}{\bar{v}} - 1 \right), if \bar{v} <\bar{x}(1 - \bar{x}),

: \hat{\beta} = (1-\bar{x}) \left(\frac{\bar{x}(1 - \bar{x})}{\bar{v}} - 1 \right), if \bar{v}<\bar{x}(1 - \bar{x}).

When the distribution is required over a known interval other than [0, 1] with random variable ''X'', say [''a'', ''c''] with random variable ''Y'', then replace \bar{x} with \frac{\bar{y}-a}{c-a} and \bar{v} with \frac{\bar{v}_Y}{(c-a)^2} in the above couple of equations for the shape parameters (see the "Alternative parametrizations, four parameters" section below), where:

: \text{sample mean}=\bar{y} = \frac{1}{N}\sum_{i=1}^N Y_i

: \text{sample variance} = \bar{v}_Y = \frac{1}{N-1}\sum_{i=1}^N (Y_i - \bar{y})^2

A short implementation of these estimators is sketched below.
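The following Python sketch (not from the original article) implements the two-parameter method-of-moments estimators above and checks them against a simulated Beta(2, 5) sample. It assumes NumPy; the function name and the ddof=1 (i.e. 1/(N−1)) sample-variance convention are choices of this sketch.

<syntaxhighlight lang="python">
# Minimal sketch: method-of-moments estimation of the two beta shape parameters.
import numpy as np


def beta_method_of_moments(x):
    """Return (alpha_hat, beta_hat) for data x assumed to lie in [0, 1]."""
    m = x.mean()
    v = x.var(ddof=1)                      # sample variance with the 1/(N-1) convention
    if v >= m * (1 - m):
        raise ValueError("sample variance too large for a beta fit")
    common = m * (1 - m) / v - 1
    return m * common, (1 - m) * common


rng = np.random.default_rng(4)
sample = rng.beta(2.0, 5.0, size=50_000)
print(beta_method_of_moments(sample))      # should be close to (2.0, 5.0)
</syntaxhighlight>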


_=Four_unknown_parameters

= All_four_parameters_(\hat,_\hat,_\hat,_\hat_of_a_beta_distribution_supported_in_the_[''a'',_''c'']_interval_-see_section_Beta_distribution#Four_parameters_2, "Alternative_parametrizations,_Four_parameters"-)_can_be_estimated,_using_the_method_of_moments_developed_by_Karl_Pearson,_by_equating_sample_and_population_values_of_the_first_four_central_moments_(mean,_variance,_skewness_and_excess_kurtosis).
_The_excess_kurtosis_was_expressed_in_terms_of_the_square_of_the_skewness,_and_the_sample_size_ν_=_α_+_β,_(see_previous_section_Beta_distribution#Kurtosis, "Kurtosis")_as_follows: :\text_=\frac\left(\frac_(\text)^2_-_1\right)\text^2-2<_\text<_\tfrac_(\text)^2 One_can_use_this_equation_to_solve_for_the_sample_size_ν=_α_+_β_in_terms_of_the_square_of_the_skewness_and_the_excess_kurtosis_as_follows: :\hat_=_\hat_+_\hat_=_3\frac :\text^2-2<_\text<_\tfrac_(\text)^2 This_is_the_ratio_(multiplied_by_a_factor_of_3)_between_the_previously_derived_limit_boundaries_for_the_beta_distribution_in_a_space_(as_originally_done_by_Karl_Pearson)_defined_with_coordinates_of_the_square_of_the_skewness_in_one_axis_and_the_excess_kurtosis_in_the_other_axis_(see_): The_case_of_zero_skewness,_can_be_immediately_solved_because_for_zero_skewness,_α_=_β_and_hence_ν_=_2α_=_2β,_therefore_α_=_β_=_ν/2 :_\hat_=_\hat_=_\frac=_\frac :__\text=_0_\text_-2<\text<0 (Excess_kurtosis_is_negative_for_the_beta_distribution_with_zero_skewness,_ranging_from_-2_to_0,_so_that_\hat_-and_therefore_the_sample_shape_parameters-_is_positive,_ranging_from_zero_when_the_shape_parameters_approach_zero_and_the_excess_kurtosis_approaches_-2,_to_infinity_when_the_shape_parameters_approach_infinity_and_the_excess_kurtosis_approaches_zero). For_non-zero_sample_skewness_one_needs_to_solve_a_system_of_two_coupled_equations._Since_the_skewness_and_the_excess_kurtosis_are_independent_of_the_parameters_\hat,_\hat,_the_parameters_\hat,_\hat_can_be_uniquely_determined_from_the_sample_skewness_and_the_sample_excess_kurtosis,_by_solving_the_coupled_equations_with_two_known_variables_(sample_skewness_and_sample_excess_kurtosis)_and_two_unknowns_(the_shape_parameters): :(\text)^2_=_\frac :\text_=\frac\left(\frac_(\text)^2_-_1\right) :\text^2-2<_\text<_\tfrac(\text)^2 resulting_in_the_following_solution: :_\hat,_\hat_=_\frac_\left_(1_\pm_\frac_\right_) :_\text\neq_0_\text_(\text)^2-2<_\text<_\tfrac_(\text)^2 Where_one_should_take_the_solutions_as_follows:_\hat>\hat_for_(negative)_sample_skewness_<_0,_and_\hat<\hat_for_(positive)_sample_skewness_>_0. The_accompanying_plot_shows_these_two_solutions_as_surfaces_in_a_space_with_horizontal_axes_of_(sample_excess_kurtosis)_and_(sample_squared_skewness)_and_the_shape_parameters_as_the_vertical_axis._The_surfaces_are_constrained_by_the_condition_that_the_sample_excess_kurtosis_must_be_bounded_by_the_sample_squared_skewness_as_stipulated_in_the_above_equation.__The_two_surfaces_meet_at_the_right_edge_defined_by_zero_skewness._Along_this_right_edge,_both_parameters_are_equal_and_the_distribution_is_symmetric_U-shaped_for_α_=_β_<_1,_uniform_for_α_=_β_=_1,_upside-down-U-shaped_for_1_<_α_=_β_<_2_and_bell-shaped_for_α_=_β_>_2.__The_surfaces_also_meet_at_the_front_(lower)_edge_defined_by_"the_impossible_boundary"_line_(excess_kurtosis_+_2_-_skewness2_=_0)._Along_this_front_(lower)_boundary_both_shape_parameters_approach_zero,_and_the_probability_density_is_concentrated_more_at_one_end_than_the_other_end_(with_practically_nothing_in_between),_with_probabilities_p=\tfrac_at_the_left_end_''x''_=_0_and_q_=_1-p_=_\tfrac___at_the_right_end_''x''_=_1.__The_two_surfaces_become_further_apart_towards_the_rear_edge.__At_this_rear_edge_the_surface_parameters_are_quite_different_from_each_other.__As_remarked,_for_example,_by_Bowman_and_Shenton,
_sampling_in_the_neighborhood_of_the_line_(sample_excess_kurtosis_-_(3/2)(sample_skewness)2_=_0)_(the_just-J-shaped_portion_of_the_rear_edge_where_blue_meets_beige),_"is_dangerously_near_to_chaos",_because_at_that_line_the_denominator_of_the_expression_above_for_the_estimate_ν_=_α_+_β_becomes_zero_and_hence_ν_approaches_infinity_as_that_line_is_approached.__Bowman_and_Shenton__write_that_"the_higher_moment_parameters_(kurtosis_and_skewness)_are_extremely_fragile_(near_that_line)._However,_the_mean_and_standard_deviation_are_fairly_reliable."_Therefore,_the_problem_is_for_the_case_of_four_parameter_estimation_for_very_skewed_distributions_such_that_the_excess_kurtosis_approaches_(3/2)_times_the_square_of_the_skewness.__This_boundary_line_is_produced_by_extremely_skewed_distributions_with_very_large_values_of_one_of_the_parameters_and_very_small_values_of_the_other_parameter.__See__for_a_numerical_example_and_further_comments_about_this_rear_edge_boundary_line_(sample_excess_kurtosis_-_(3/2)(sample_skewness)2_=_0).__As_remarked_by_Karl_Pearson_himself__this_issue_may_not_be_of_much_practical_importance_as_this_trouble_arises_only_for_very_skewed_J-shaped_(or_mirror-image_J-shaped)_distributions_with_very_different_values_of_shape_parameters_that_are_unlikely_to_occur_much_in_practice).__The_usual_skewed-bell-shape_distributions_that_occur_in_practice_do_not_have_this_parameter_estimation_problem. The_remaining_two_parameters_\hat,_\hat_can_be_determined_using_the_sample_mean_and_the_sample_variance_using_a_variety_of_equations.__One_alternative_is_to_calculate_the_support_interval_range_(\hat-\hat)_based_on_the_sample_variance_and_the_sample_kurtosis.__For_this_purpose_one_can_solve,_in_terms_of_the_range_(\hat-_\hat),_the_equation_expressing_the_excess_kurtosis_in_terms_of_the_sample_variance,_and_the_sample_size_ν_(see__and_): :\text_=\frac\bigg(\frac_-_6_-_5_\hat_\bigg) to_obtain: :_(\hat-_\hat)_=_\sqrt\sqrt Another_alternative_is_to_calculate_the_support_interval_range_(\hat-\hat)_based_on_the_sample_variance_and_the_sample_skewness.__For_this_purpose_one_can_solve,_in_terms_of_the_range_(\hat-\hat),_the_equation_expressing_the_squared_skewness_in_terms_of_the_sample_variance,_and_the_sample_size_ν_(see_section_titled_"Skewness"_and_"Alternative_parametrizations,_four_parameters"): :(\text)^2_=_\frac\bigg(\frac-4(1+\hat)\bigg) to_obtain: :_(\hat-_\hat)_=_\frac\sqrt The_remaining_parameter_can_be_determined_from_the_sample_mean_and_the_previously_obtained_parameters:_(\hat-\hat),_\hat,_\hat_=_\hat+\hat: :__\hat_=_(\text)_-__\left(\frac\right)(\hat-\hat)_ and_finally,_\hat=_(\hat-_\hat)_+_\hat__. In_the_above_formulas_one_may_take,_for_example,_as_estimates_of_the_sample_moments: :\begin \text_&=\overline_=_\frac\sum_^N_Y_i_\\ \text_&=_\overline_Y_=_\frac\sum_^N_(Y_i_-_\overline)^2_\\ \text_&=_G_1_=_\frac_\frac_\\ \text_&=_G_2_=_\frac_\frac_-_\frac \end The_estimators_''G''1_for_skewness, sample_skewness_and_''G''2_for_kurtosis, sample_kurtosis_are_used_by_DAP_(software), DAP/SAS_System, SAS,_PSPP/SPSS,_and_Microsoft_Excel, Excel.__However,_they_are_not_used_by_BMDP_and_(according_to_)_they_were_not_used_by_MINITAB_in_1998._Actually,_Joanes_and_Gill_in_their_1998_study
__concluded_that_the_skewness_and_kurtosis_estimators_used_in_BMDP_and_in_MINITAB_(at_that_time)_had_smaller_variance_and_mean-squared_error_in_normal_samples,_but_the_skewness_and_kurtosis_estimators_used_in__DAP_(software), DAP/SAS_System, SAS,_PSPP/SPSS,_namely_''G''1_and_''G''2,_had_smaller_mean-squared_error_in_samples_from_a_very_skewed_distribution.__It_is_for_this_reason_that_we_have_spelled_out_"sample_skewness",_etc.,_in_the_above_formulas,_to_make_it_explicit_that_the_user_should_choose_the_best_estimator_according_to_the_problem_at_hand,_as_the_best_estimator_for_skewness_and_kurtosis_depends_on_the_amount_of_skewness_(as_shown_by_Joanes_and_Gill).
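The Python sketch below (not from the original article) implements the skewness/kurtosis step of the four-parameter method of moments described above: recovering ν̂ = α̂ + β̂ from the sample excess kurtosis and squared skewness, and then splitting ν̂ into α̂ and β̂, assigning the larger value to β̂ when the sample skewness is positive. The closed forms used in the code follow from the beta distribution's skewness and excess-kurtosis expressions; the function name and the use of scipy.stats.skew and scipy.stats.kurtosis for the sample moments are assumptions of this sketch, and it deliberately stops short of estimating the support endpoints ''a'' and ''c''.

<syntaxhighlight lang="python">
# Minimal sketch: recover (alpha, beta) from sample skewness and excess kurtosis.
import numpy as np
from scipy import stats


def shape_from_skew_kurt(skew, ex_kurt):
    """Return (alpha_hat, beta_hat) from sample skewness and sample excess kurtosis."""
    s2 = skew ** 2
    if not (s2 - 2 < ex_kurt < 1.5 * s2):
        raise ValueError("moments outside the region attainable by a beta distribution")
    nu = 3 * (ex_kurt - s2 + 2) / (1.5 * s2 - ex_kurt)        # nu = alpha + beta
    if skew == 0:
        return nu / 2, nu / 2
    delta = 1 / np.sqrt(1 + 16 * (nu + 1) / ((nu + 2) ** 2 * s2))
    lo, hi = nu / 2 * (1 - delta), nu / 2 * (1 + delta)
    # positive skew -> alpha < beta, negative skew -> alpha > beta
    return (lo, hi) if skew > 0 else (hi, lo)


rng = np.random.default_rng(5)
y = rng.beta(2.0, 6.0, size=500_000)                           # true alpha=2, beta=6
print(shape_from_skew_kurt(stats.skew(y), stats.kurtosis(y)))  # roughly (2, 6)
</syntaxhighlight>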


_Maximum_likelihood


_=Two_unknown_parameters

= As_is_also_the_case_for_maximum_likelihood_estimates_for_the_gamma_distribution,_the_maximum_likelihood_estimates_for_the_beta_distribution_do_not_have_a_general_closed_form_solution_for_arbitrary_values_of_the_shape_parameters._If_''X''1,_...,_''XN''_are_independent_random_variables_each_having_a_beta_distribution,_the_joint_log_likelihood_function_for_''N''_independent_and_identically_distributed_random_variables, iid_observations_is: :\begin \ln\,_\mathcal_(\alpha,_\beta\mid_X)_&=_\sum_^N_\ln_\left_(\mathcal_i_(\alpha,_\beta\mid_X_i)_\right_)\\ &=_\sum_^N_\ln_\left_(f(X_i;\alpha,\beta)_\right_)_\\ &=_\sum_^N_\ln_\left_(\frac_\right_)_\\ &=_(\alpha_-_1)\sum_^N_\ln_(X_i)_+_(\beta-_1)\sum_^N__\ln_(1-X_i)_-_N_\ln_\Beta(\alpha,\beta) \end Finding_the_maximum_with_respect_to_a_shape_parameter_involves_taking_the_partial_derivative_with_respect_to_the_shape_parameter_and_setting_the_expression_equal_to_zero_yielding_the_maximum_likelihood_estimator_of_the_shape_parameters: :\frac_=_\sum_^N_\ln_X_i_-N\frac=0 :\frac_=_\sum_^N__\ln_(1-X_i)-_N\frac=0 where: :\frac_=_-\frac+_\frac+_\frac=-\psi(\alpha_+_\beta)_+_\psi(\alpha)_+_0 :\frac=_-_\frac+_\frac_+_\frac=-\psi(\alpha_+_\beta)_+_0_+_\psi(\beta) since_the_digamma_function_ In_mathematics,_the_digamma_function_is_defined_as_the__logarithmic_derivative_of_the_gamma_function: :\psi(x)=\frac\ln\big(\Gamma(x)\big)=\frac\sim\ln-\frac. It_is_the_first_of_the__polygamma_functions._It_is_strictly_increasing_and_strict_...
_denoted_ψ(α)_is_defined_as_the_logarithmic_derivative_of_the_gamma_function_ In__mathematics,_the_gamma_function_(represented_by_,_the_capital_letter__gamma_from_the_Greek_alphabet)_is_one_commonly_used_extension_of_the__factorial_function_to_complex_numbers._The_gamma_function_is_defined_for_all_complex_numbers_except_...
: :\psi(\alpha)_=\frac_ To_ensure_that_the_values_with_zero_tangent_slope_are_indeed_a_maximum_(instead_of_a_saddle-point_or_a_minimum)_one_has_to_also_satisfy_the_condition_that_the_curvature_is_negative.__This_amounts_to_satisfying_that_the_second_partial_derivative_with_respect_to_the_shape_parameters_is_negative :\frac=_-N\frac<0 :\frac_=_-N\frac<0 using_the_previous_equations,_this_is_equivalent_to: :\frac_=_\psi_1(\alpha)-\psi_1(\alpha_+_\beta)_>_0 :\frac_=_\psi_1(\beta)_-\psi_1(\alpha_+_\beta)_>_0 where_the_trigamma_function_ In_mathematics,_the_trigamma_function,_denoted__or_,_is_the_second_of_the_polygamma_functions,_and_is_defined_by :_\psi_1(z)_=_\frac_\ln\Gamma(z). It_follows_from_this_definition_that :_\psi_1(z)_=_\frac_\psi(z) where__is_the_digamma_functio_...
,_denoted_''ψ''1(''α''),_is_the_second_of_the_polygamma_function_ In_mathematics,_the_polygamma_function_of_order__is_a_meromorphic_function_on_the__complex_numbers_\mathbb_defined_as_the_th__derivative_of_the_logarithm_of_the_gamma_function: :\psi^(z)_:=_\frac_\psi(z)_=_\frac_\ln\Gamma(z). Thus :\psi^(z)__...
s,_and_is_defined_as_the_derivative_of_the_digamma_function: :\psi_1(\alpha)_=_\frac=\,_\frac. These_conditions_are_equivalent_to_stating_that_the_variances_of_the_logarithmically_transformed_variables_are_positive,_since: :\operatorname[\ln_(X)]_=_\operatorname[\ln^2_(X)]_-_(\operatorname[\ln_(X)])^2_=_\psi_1(\alpha)_-_\psi_1(\alpha_+_\beta)_ :\operatorname_ln_(1-X)=_\operatorname[\ln^2_(1-X)]_-_(\operatorname[\ln_(1-X)])^2_=_\psi_1(\beta)_-_\psi_1(\alpha_+_\beta)_ Therefore,_the_condition_of_negative_curvature_at_a_maximum_is_equivalent_to_the_statements: :___\operatorname[\ln_(X)]_>_0 :___\operatorname_ln_(1-X)>_0 Alternatively,_the_condition_of_negative_curvature_at_a_maximum_is_also_equivalent_to_stating_that_the_following_logarithmic_derivatives_of_the__geometric_means_''GX''_and_''G(1−X)''_are_positive,_since: :_\psi_1(\alpha)_-_\psi_1(\alpha_+_\beta)_=_\frac_>_0 :_\psi_1(\beta)__-_\psi_1(\alpha_+_\beta)_=_\frac_>_0 While_these_slopes_are_indeed_positive,_the_other_slopes_are_negative: :\frac,_\frac_<_0. The_slopes_of_the_mean_and_the_median_with_respect_to_''α''_and_''β''_display_similar_sign_behavior. From_the_condition_that_at_a_maximum,_the_partial_derivative_with_respect_to_the_shape_parameter_equals_zero,_we_obtain_the_following_system_of_coupled_maximum_likelihood_estimate_equations_(for_the_average_log-likelihoods)_that_needs_to_be_inverted_to_obtain_the__(unknown)_shape_parameter_estimates_\hat,\hat_in_terms_of_the_(known)_average_of_logarithms_of_the_samples_''X''1,_...,_''XN'': :\begin \hat[\ln_(X)]_&=_\psi(\hat)_-_\psi(\hat_+_\hat)=\frac\sum_^N_\ln_X_i_=__\ln_\hat_X_\\ \hat[\ln(1-X)]_&=_\psi(\hat)_-_\psi(\hat_+_\hat)=\frac\sum_^N_\ln_(1-X_i)=_\ln_\hat_ \end where_we_recognize_\log_\hat_X_as_the_logarithm_of_the_sample__geometric_mean_and_\log_\hat__as_the_logarithm_of_the_sample__geometric_mean_based_on_(1 − ''X''),_the_mirror-image_of ''X''._For_\hat=\hat,_it_follows_that__\hat_X=\hat__. :\begin \hat_X_&=_\prod_^N_(X_i)^_\\ \hat__&=_\prod_^N_(1-X_i)^ \end These_coupled_equations_containing_digamma_function_ In_mathematics,_the_digamma_function_is_defined_as_the__logarithmic_derivative_of_the_gamma_function: :\psi(x)=\frac\ln\big(\Gamma(x)\big)=\frac\sim\ln-\frac. It_is_the_first_of_the__polygamma_functions._It_is_strictly_increasing_and_strict_...
s_of_the_shape_parameter_estimates_\hat,\hat_must_be_solved_by_numerical_methods_as_done,_for_example,_by_Beckman_et_al._Gnanadesikan_et_al._give_numerical_solutions_for_a_few_cases._Norman_Lloyd_Johnson, N.L.Johnson_and_Samuel_Kotz, S.Kotz_suggest_that_for_"not_too_small"_shape_parameter_estimates_\hat,\hat,_the_logarithmic_approximation_to_the_digamma_function_\psi(\hat)_\approx_\ln(\hat-\tfrac)_may_be_used_to_obtain_initial_values_for_an_iterative_solution,_since_the_equations_resulting_from_this_approximation_can_be_solved_exactly: :\ln_\frac__\approx__\ln_\hat_X_ :\ln_\frac\approx_\ln_\hat__ which_leads_to_the_following_solution_for_the_initial_values_(of_the_estimate_shape_parameters_in_terms_of_the_sample_geometric_means)_for_an_iterative_solution: :\hat\approx_\tfrac_+_\frac_\text_\hat_>1 :\hat\approx_\tfrac_+_\frac_\text_\hat_>_1 Alternatively,_the_estimates_provided_by_the_method_of_moments_can_instead_be_used_as_initial_values_for_an_iterative_solution_of_the_maximum_likelihood_coupled_equations_in_terms_of_the_digamma_functions. When_the_distribution_is_required_over_a_known_interval_other_than_,_1_ The_comma__is_a_punctuation_mark_that_appears_in_several_variants_in_different_languages._It_has_the_same_shape_as_an_apostrophe_or_single_closing_quotation_mark_()_in_many_typefaces,_but_it_differs_from_them_in_being_placed_on_the__baseline_o_...
_with_random_variable_''X'',_say_[''a'',_''c'']_with_random_variable_''Y'',_then_replace_ln(''Xi'')_in_the_first_equation_with :\ln_\frac, and_replace_ln(1−''Xi'')_in_the_second_equation_with :\ln_\frac (see_"Alternative_parametrizations,_four_parameters"_section_below). If_one_of_the_shape_parameters_is_known,_the_problem_is_considerably_simplified.__The_following_logit_transformation_can_be_used_to_solve_for_the_unknown_shape_parameter_(for_skewed_cases_such_that_\hat\neq\hat,_otherwise,_if_symmetric,_both_-equal-_parameters_are_known_when_one_is_known): :\hat_\left[\ln_\left(\frac_\right)_\right]=\psi(\hat)_-_\psi(\hat)=\frac\sum_^N_\ln\frac_=__\ln_\hat_X_-_\ln_\left(\hat_\right)_ This_logit_transformation_is_the_logarithm_of_the_transformation_that_divides_the_variable_''X''_by_its_mirror-image_(''X''/(1_-_''X'')_resulting_in_the_"inverted_beta_distribution"__or_beta_prime_distribution_ In_probability_theory_and__statistics,_the_beta_prime_distribution_(also_known_as_inverted_beta_distribution_or_beta_distribution_of_the_second_kindJohnson_et_al_(1995),_p_248)_is_an_absolutely_continuous_probability_distribution. __Definitions_ _...
_(also_known_as_beta_distribution_of_the_second_kind_or_Pearson_distribution, Pearson's_Type_VI)_with_support_[0,_+∞)._As_previously_discussed_in_the_section_"Moments_of_logarithmically_transformed_random_variables,"_the_logit_transformation_\ln\frac,_studied_by_Johnson,_extends_the_finite_support_,_1_ The_comma__is_a_punctuation_mark_that_appears_in_several_variants_in_different_languages._It_has_the_same_shape_as_an_apostrophe_or_single_closing_quotation_mark_()_in_many_typefaces,_but_it_differs_from_them_in_being_placed_on_the__baseline_o_...
based_on_the_original_variable_''X''_to_infinite_support_in_both_directions_of_the_real_line_(−∞,_+∞). If,_for_example,_\hat_is_known,_the_unknown_parameter_\hat_can_be_obtained_in_terms_of_the_inverse
_digamma_function_of_the_right_hand_side_of_this_equation: :\psi(\hat)=\frac\sum_^N_\ln\frac_+_\psi(\hat)_ :\hat=\psi^(\ln_\hat_X_-_\ln_\hat__+_\psi(\hat))_ In_particular,_if_one_of_the_shape_parameters_has_a_value_of_unity,_for_example_for_\hat_=_1_(the_power_function_distribution_with_bounded_support_[0,1]),_using_the_identity_ψ(''x''_+_1)_=_ψ(''x'')_+_1/''x''_in_the_equation_\psi(\hat)_-_\psi(\hat_+_\hat)=_\ln_\hat_X,_the_maximum_likelihood_estimator_for_the_unknown_parameter_\hat_is,_exactly: :\hat=_-_\frac=_-_\frac_ The_beta_has_support_[0,_1],_therefore_\hat_X_<_1,_and_hence_(-\ln_\hat_X)_>0,_and_therefore_\hat_>0. In_conclusion,_the_maximum_likelihood_estimates_of_the_shape_parameters_of_a_beta_distribution_are_(in_general)_a_complicated_function_of_the_sample__geometric_mean,_and_of_the_sample__geometric_mean_based_on_''(1−X)'',_the_mirror-image_of_''X''.__One_may_ask,_if_the_variance_(in_addition_to_the_mean)_is_necessary_to_estimate_two_shape_parameters_with_the_method_of_moments,_why_is_the_(logarithmic_or_geometric)_variance_not_necessary_to_estimate_two_shape_parameters_with_the_maximum_likelihood_method,_for_which_only_the_geometric_means_suffice?__The_answer_is_because_the_mean_does_not_provide_as_much_information_as_the_geometric_mean.__For_a_beta_distribution_with_equal_shape_parameters_''α'' = ''β'',_the_mean_is_exactly_1/2,_regardless_of_the_value_of_the_shape_parameters,_and_therefore_regardless_of_the_value_of_the_statistical_dispersion_(the_variance).__On_the_other_hand,_the_geometric_mean_of_a_beta_distribution_with_equal_shape_parameters_''α'' = ''β'',_depends_on_the_value_of_the_shape_parameters,_and_therefore_it_contains_more_information.__Also,_the_geometric_mean_of_a_beta_distribution_does_not_satisfy_the_symmetry_conditions_satisfied_by_the_mean,_therefore,_by_employing_both_the_geometric_mean_based_on_''X''_and_geometric_mean_based_on_(1 − ''X''),_the_maximum_likelihood_method_is_able_to_provide_best_estimates_for_both_parameters_''α'' = ''β'',_without_need_of_employing_the_variance. One_can_express_the_joint_log_likelihood_per_''N''_independent_and_identically_distributed_random_variables, iid_observations_in_terms_of_the_''sufficient_statistics''_(the_sample_geometric_means)_as_follows: :\frac_=_(\alpha_-_1)\ln_\hat_X_+_(\beta-_1)\ln_\hat_-_\ln_\Beta(\alpha,\beta). 
We_can_plot_the_joint_log_likelihood_per_''N''_observations_for_fixed_values_of_the_sample_geometric_means_to_see_the_behavior_of_the_likelihood_function_as_a_function_of_the_shape_parameters_α_and_β._In_such_a_plot,_the_shape_parameter_estimators_\hat,\hat_correspond_to_the_maxima_of_the_likelihood_function._See_the_accompanying_graph_that_shows_that_all_the_likelihood_functions_intersect_at_α_=_β_=_1,_which_corresponds_to_the_values_of_the_shape_parameters_that_give_the_maximum_entropy_(the_maximum_entropy_occurs_for_shape_parameters_equal_to_unity:_the_uniform_distribution).__It_is_evident_from_the_plot_that_the_likelihood_function_gives_sharp_peaks_for_values_of_the_shape_parameter_estimators_close_to_zero,_but_that_for_values_of_the_shape_parameters_estimators_greater_than_one,_the_likelihood_function_becomes_quite_flat,_with_less_defined_peaks.__Obviously,_the_maximum_likelihood_parameter_estimation_method_for_the_beta_distribution_becomes_less_acceptable_for_larger_values_of_the_shape_parameter_estimators,_as_the_uncertainty_in_the_peak_definition_increases_with_the_value_of_the_shape_parameter_estimators.__One_can_arrive_at_the_same_conclusion_by_noticing_that_the_expression_for_the_curvature_of_the_likelihood_function_is_in_terms_of_the_geometric_variances :\frac=_-\operatorname_ln_X/math> :\frac_=_-\operatorname[\ln_(1-X)] These_variances_(and_therefore_the_curvatures)_are_much_larger_for_small_values_of_the_shape_parameter_α_and_β._However,_for_shape_parameter_values_α,_β_>_1,_the_variances_(and_therefore_the_curvatures)_flatten_out.__Equivalently,_this_result_follows_from_the_Cramér–Rao_bound,_since_the_Fisher_information_ In_mathematical_statistics,_the_Fisher_information_(sometimes_simply_called_information)_is_a_way_of_measuring_the_amount_of_information_that_an_observable_random_variable_''X''_carries_about_an_unknown_parameter_''θ''_of_a_distribution_that_model_...
_matrix_components_for_the_beta_distribution_are_these_logarithmic_variances._The_Cramér–Rao_bound_states_that_the_variance__ In_probability_theory_and_statistics,_variance_is_the__expectation_of_the_squared__deviation_of_a__random_variable_from_its__population_mean_or__sample_mean._Variance_is_a_measure_of_dispersion,_meaning_it_is_a_measure_of_how_far_a_set_of_numbe_...
_of_any_''unbiased''_estimator_\hat_of_α_is_bounded_by_the_multiplicative_inverse, reciprocal_of_the_Fisher_information_ In_mathematical_statistics,_the_Fisher_information_(sometimes_simply_called_information)_is_a_way_of_measuring_the_amount_of_information_that_an_observable_random_variable_''X''_carries_about_an_unknown_parameter_''θ''_of_a_distribution_that_model_...
: :\mathrm(\hat)\geq\frac\geq\frac :\mathrm(\hat)_\geq\frac\geq\frac so_the_variance_of_the_estimators_increases_with_increasing_α_and_β,_as_the_logarithmic_variances_decrease. Also_one_can_express_the_joint_log_likelihood_per_''N''_independent_and_identically_distributed_random_variables, iid_observations_in_terms_of_the_digamma_function_ In_mathematics,_the_digamma_function_is_defined_as_the__logarithmic_derivative_of_the_gamma_function: :\psi(x)=\frac\ln\big(\Gamma(x)\big)=\frac\sim\ln-\frac. It_is_the_first_of_the__polygamma_functions._It_is_strictly_increasing_and_strict_...
_expressions_for_the_logarithms_of_the_sample_geometric_means_as_follows: :\frac_=_(\alpha_-_1)(\psi(\hat)_-_\psi(\hat_+_\hat))+(\beta-_1)(\psi(\hat)_-_\psi(\hat_+_\hat))-_\ln_\Beta(\alpha,\beta) this_expression_is_identical_to_the_negative_of_the_cross-entropy_(see_section_on_"Quantities_of_information_(entropy)").__Therefore,_finding_the_maximum_of_the_joint_log_likelihood_of_the_shape_parameters,_per_''N''_independent_and_identically_distributed_random_variables, iid_observations,_is_identical_to_finding_the_minimum_of_the_cross-entropy_for_the_beta_distribution,_as_a_function_of_the_shape_parameters. :\frac_=_-_H_=_-h_-_D__=_-\ln\Beta(\alpha,\beta)+(\alpha-1)\psi(\hat)+(\beta-1)\psi(\hat)-(\alpha+\beta-2)\psi(\hat+\hat) with_the_cross-entropy_defined_as_follows: :H_=_\int_^1_-_f(X;\hat,\hat)_\ln_(f(X;\alpha,\beta))_\,_X_
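Below is a minimal sketch (not taken from the references cited above) that solves the coupled maximum-likelihood equations for the two shape parameters, ψ(α̂) − ψ(α̂ + β̂) = (1/N)Σ ln X_i and ψ(β̂) − ψ(α̂ + β̂) = (1/N)Σ ln(1 − X_i), numerically with SciPy, using the method-of-moments estimates as starting values as suggested in the text. The function name and the choice of scipy.optimize.fsolve as the root finder are assumptions of this sketch.

<syntaxhighlight lang="python">
# Minimal sketch: maximum-likelihood estimation of the two beta shape parameters
# by solving the coupled digamma equations numerically.
import numpy as np
from scipy.optimize import fsolve
from scipy.special import psi            # the digamma function


def beta_mle(x):
    ln_gx = np.mean(np.log(x))           # log of the sample geometric mean of X
    ln_g1x = np.mean(np.log1p(-x))       # log of the sample geometric mean of 1 - X

    def equations(params):
        a, b = params
        return (psi(a) - psi(a + b) - ln_gx,
                psi(b) - psi(a + b) - ln_g1x)

    # method-of-moments starting values
    m, v = x.mean(), x.var(ddof=1)
    common = m * (1 - m) / v - 1
    return fsolve(equations, x0=(m * common, (1 - m) * common))


rng = np.random.default_rng(6)
sample = rng.beta(0.7, 3.0, size=100_000)
print(beta_mle(sample))                  # close to the true values (0.7, 3.0)
</syntaxhighlight>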


_=Four_unknown_parameters

= The_procedure_is_similar_to_the_one_followed_in_the_two_unknown_parameter_case._If_''Y''1,_...,_''YN''_are_independent_random_variables_each_having_a_beta_distribution_with_four_parameters,_the_joint_log_likelihood_function_for_''N''_independent_and_identically_distributed_random_variables, iid_observations_is: :\begin \ln\,_\mathcal_(\alpha,_\beta,_a,_c\mid_Y)_&=_\sum_^N_\ln\,\mathcal_i_(\alpha,_\beta,_a,_c\mid_Y_i)\\ &=_\sum_^N_\ln\,f(Y_i;_\alpha,_\beta,_a,_c)_\\ &=_\sum_^N_\ln\,\frac\\ &=_(\alpha_-_1)\sum_^N__\ln_(Y_i_-_a)_+_(\beta-_1)\sum_^N__\ln_(c_-_Y_i)-_N_\ln_\Beta(\alpha,\beta)_-_N_(\alpha+\beta_-_1)_\ln_(c_-_a) \end Finding_the_maximum_with_respect_to_a_shape_parameter_involves_taking_the_partial_derivative_with_respect_to_the_shape_parameter_and_setting_the_expression_equal_to_zero_yielding_the_maximum_likelihood_estimator_of_the_shape_parameters: :\frac=_\sum_^N__\ln_(Y_i_-_a)_-_N(-\psi(\alpha_+_\beta)_+_\psi(\alpha))-_N_\ln_(c_-_a)=_0 :\frac_=_\sum_^N__\ln_(c_-_Y_i)_-_N(-\psi(\alpha_+_\beta)__+_\psi(\beta))-_N_\ln_(c_-_a)=_0 :\frac_=_-(\alpha_-_1)_\sum_^N__\frac_\,+_N_(\alpha+\beta_-_1)\frac=_0 :\frac_=_(\beta-_1)_\sum_^N__\frac_\,-_N_(\alpha+\beta_-_1)_\frac_=_0 these_equations_can_be_re-arranged_as_the_following_system_of_four_coupled_equations_(the_first_two_equations_are_geometric_means_and_the_second_two_equations_are_the_harmonic_means)_in_terms_of_the_maximum_likelihood_estimates_for_the_four_parameters_\hat,_\hat,_\hat,_\hat: :\frac\sum_^N__\ln_\frac_=_\psi(\hat)-\psi(\hat_+\hat_)=__\ln_\hat_X :\frac\sum_^N__\ln_\frac_=__\psi(\hat)-\psi(\hat_+_\hat)=__\ln_\hat_ :\frac_=_\frac=__\hat_X :\frac_=_\frac_=__\hat_ with_sample_geometric_means: :\hat_X_=_\prod_^_\left_(\frac_\right_)^ :\hat__=_\prod_^_\left_(\frac_\right_)^ The_parameters_\hat,_\hat_are_embedded_inside_the_geometric_mean_expressions_in_a_nonlinear_way_(to_the_power_1/''N'').__This_precludes,_in_general,_a_closed_form_solution,_even_for_an_initial_value_approximation_for_iteration_purposes.__One_alternative_is_to_use_as_initial_values_for_iteration_the_values_obtained_from_the_method_of_moments_solution_for_the_four_parameter_case.__Furthermore,_the_expressions_for_the_harmonic_means_are_well-defined_only_for_\hat,_\hat_>_1,_which_precludes_a_maximum_likelihood_solution_for_shape_parameters_less_than_unity_in_the_four-parameter_case._Fisher's_information_matrix_for_the_four_parameter_case_is_Positive-definite_matrix, positive-definite_only_for_α,_β_>_2_(for_further_discussion,_see_section_on_Fisher_information_matrix,_four_parameter_case),_for_bell-shaped_(symmetric_or_unsymmetric)_beta_distributions,_with_inflection_points_located_to_either_side_of_the_mode._The_following_Fisher_information_components_(that_represent_the_expectations_of_the_curvature_of_the_log_likelihood_function)_have_mathematical_singularity, singularities_at_the_following_values: :\alpha_=_2:_\quad_\operatorname_\left_[-_\frac_\frac_\right_]=__ :\beta_=_2:_\quad_\operatorname\left_[-_\frac_\frac_\right_]_=__ :\alpha_=_2:_\quad_\operatorname\left_[-_\frac\frac\right_]_=___ :\beta_=_1:_\quad_\operatorname\left_[-_\frac\frac_\right_]_=____ (for_further_discussion_see_section_on_Fisher_information_matrix)._Thus,_it_is_not_possible_to_strictly_carry_on_the_maximum_likelihood_estimation_for_some_well_known_distributions_belonging_to_the_four-parameter_beta_distribution_family,_like_the_continuous_uniform_distribution, 
uniform_distribution_(Beta(1,_1,_''a'',_''c'')),_and_the__arcsine_distribution_(Beta(1/2,_1/2,_''a'',_''c'')).__Norman_Lloyd_Johnson, N.L.Johnson_and_Samuel_Kotz, S.Kotz_ignore_the_equations_for_the_harmonic_means_and_instead_suggest_"If_a_and_c_are_unknown,_and_maximum_likelihood_estimators_of_''a'',_''c'',_α_and_β_are_required,_the_above_procedure_(for_the_two_unknown_parameter_case,_with_''X''_transformed_as_''X''_=_(''Y'' − ''a'')/(''c'' − ''a''))_can_be_repeated_using_a_succession_of_trial_values_of_''a''_and_''c'',_until_the_pair_(''a'',_''c'')_for_which_maximum_likelihood_(given_''a''_and_''c'')_is_as_great_as_possible,_is_attained"_(where,_for_the_purpose_of_clarity,_their_notation_for_the_parameters_has_been_translated_into_the_present_notation).


_Fisher_information_matrix

Let a random variable X have a probability density ''f''(''x'';''α''). The partial derivative with respect to the (unknown, and to be estimated) parameter α of the log likelihood function is called the score. The second moment of the score is called the Fisher information:

:\mathcal{I}(\alpha)=\operatorname{E} \left [\left (\frac{\partial}{\partial\alpha} \ln \mathcal{L}(\alpha\mid X) \right )^2 \right].

The expectation of the score is zero, therefore the Fisher information is also the second moment centered on the mean of the score: the variance of the score.

If the log likelihood function is twice differentiable with respect to the parameter α, and under certain regularity conditions, then the Fisher information may also be written as follows (which is often a more convenient form for calculation purposes):

:\mathcal{I}(\alpha) = - \operatorname{E} \left [\frac{\partial^2}{\partial\alpha^2} \ln (\mathcal{L}(\alpha\mid X)) \right].

Thus, the Fisher information is the negative of the expectation of the second derivative with respect to the parameter α of the log likelihood function. Therefore, Fisher information is a measure of the curvature of the log likelihood function of α. A low-curvature (and therefore high radius of curvature), flatter log likelihood function curve has low Fisher information, while a log likelihood function curve with large curvature (and therefore low radius of curvature) has high Fisher information. When the Fisher information matrix is computed at the estimates of the parameters ("the observed Fisher information matrix") it is equivalent to the replacement of the true log likelihood surface by a Taylor series approximation, taken as far as the quadratic terms. The word information, in the context of Fisher information, refers to information about the parameters: estimation, sufficiency and properties of variances of estimators. The Cramér–Rao bound states that the inverse of the Fisher information is a lower bound on the variance of any unbiased estimator of a parameter α:

:\operatorname{var}[\hat\alpha] \geq \frac{1}{\mathcal{I}(\alpha)}.

The precision to which one can estimate a parameter α is thus limited by the Fisher information of the log likelihood function. The Fisher information is a measure of the minimum error involved in estimating a parameter of a distribution, and it can be viewed as a measure of the resolving power of an experiment needed to discriminate between two alternative hypotheses about a parameter.

When there are ''N'' parameters

: \begin{bmatrix} \theta_1 \\ \theta_2 \\ \vdots \\ \theta_N \end{bmatrix},

then the Fisher information takes the form of an ''N''×''N'' positive semidefinite symmetric matrix, the Fisher information matrix, with typical element:

:{(\mathcal{I}(\theta))}_{i,j}=\operatorname{E} \left [\left (\frac{\partial}{\partial\theta_i} \ln \mathcal{L} \right) \left(\frac{\partial}{\partial\theta_j} \ln \mathcal{L} \right) \right ].

Under certain regularity conditions, the Fisher information matrix may also be written in the following form, which is often more convenient for computation:

: {(\mathcal{I}(\theta))}_{i,j} = - \operatorname{E} \left [\frac{\partial^2}{\partial\theta_i \, \partial\theta_j} \ln (\mathcal{L}) \right ]\,.

With ''X''1, ..., ''XN'' iid random variables, an ''N''-dimensional "box" can be constructed with sides ''X''1, ..., ''XN''. Costa and Cover show that the (Shannon) differential entropy ''h''(''X'') is related to the volume of the typical set (having the sample entropy close to the true entropy), while the Fisher information is related to the surface of this typical set.


=Two parameters=

For ''X''1, ..., ''X''''N'' independent random variables each having a beta distribution parametrized with shape parameters ''α'' and ''β'', the joint log likelihood function for ''N'' iid observations is:

:\ln (\mathcal{L} (\alpha, \beta\mid X) )= (\alpha - 1)\sum_{i=1}^N \ln X_i + (\beta- 1)\sum_{i=1}^N \ln (1-X_i)- N \ln \Beta(\alpha,\beta)

therefore the joint log likelihood function per ''N'' iid observations is:

:\frac{1}{N} \ln(\mathcal{L} (\alpha, \beta\mid X)) = (\alpha - 1)\frac{1}{N}\sum_{i=1}^N \ln X_i + (\beta- 1)\frac{1}{N}\sum_{i=1}^N \ln (1-X_i)-\, \ln \Beta(\alpha,\beta)

For the two parameter case, the Fisher information has 4 components: 2 diagonal and 2 off-diagonal. Since the Fisher information matrix is symmetric, one of these off-diagonal components is independent. Therefore, the Fisher information matrix has 3 independent components (2 diagonal and 1 off-diagonal).

Aryal and Nadarajah calculated Fisher's information matrix for the four-parameter case, from which the two parameter case can be obtained as follows:

:- \frac{1}{N}\frac{\partial^2\ln \mathcal{L}}{\partial \alpha^2}= \operatorname{var}[\ln X]= \psi_1(\alpha) - \psi_1(\alpha + \beta) = \mathcal{I}_{\alpha, \alpha}= \operatorname{E}\left [- \frac{1}{N}\frac{\partial^2\ln \mathcal{L}}{\partial \alpha^2} \right ] = \ln \operatorname{var}_{GX}

:- \frac{1}{N}\frac{\partial^2\ln \mathcal{L}}{\partial \beta^2} = \operatorname{var}[\ln (1-X)]= \psi_1(\beta) - \psi_1(\alpha + \beta) = \mathcal{I}_{\beta, \beta}= \operatorname{E}\left [- \frac{1}{N}\frac{\partial^2\ln \mathcal{L}}{\partial \beta^2} \right ] = \ln \operatorname{var}_{G(1-X)}

:- \frac{1}{N}\frac{\partial^2\ln \mathcal{L}}{\partial \alpha \, \partial \beta} = \operatorname{cov}[\ln X,\ln(1-X)] = -\psi_1(\alpha+\beta) = \mathcal{I}_{\alpha, \beta}= \operatorname{E}\left [- \frac{1}{N}\frac{\partial^2\ln \mathcal{L}}{\partial \alpha \, \partial \beta} \right ] = \ln \operatorname{cov}_{GX,(1-X)}

Since the Fisher information matrix is symmetric

: \mathcal{I}_{\alpha, \beta}= \mathcal{I}_{\beta, \alpha}= \ln \operatorname{cov}_{GX,(1-X)}

The Fisher information components are equal to the log geometric variances and log geometric covariance. Therefore, they can be expressed as trigamma functions, denoted ψ1(α), the second of the polygamma functions, defined as the derivative of the digamma function:

:\psi_1(\alpha) = \frac{d^2\ln\Gamma(\alpha)}{d\alpha^2}=\, \frac{d\,\psi(\alpha)}{d\alpha}.

These derivatives are also derived in the maximum-likelihood discussion above (two unknown parameters), where plots of the log likelihood function are shown. Plots and further discussion of the Fisher information matrix components (the log geometric variances and log geometric covariance as a function of the shape parameters α and β), as well as formulas for moments of logarithmically transformed random variables, appear in earlier sections, where images for the components \mathcal{I}_{\alpha, \alpha}, \mathcal{I}_{\beta, \beta} and \mathcal{I}_{\alpha, \beta} are also shown.

The determinant of Fisher's information matrix is of interest (for example for the calculation of the Jeffreys prior probability). From the expressions for the individual components of the Fisher information matrix, it follows that the determinant of Fisher's (symmetric) information matrix for the beta distribution is:

:\begin{align}
\det(\mathcal{I}(\alpha, \beta))&= \mathcal{I}_{\alpha, \alpha} \mathcal{I}_{\beta, \beta}-\mathcal{I}_{\alpha, \beta} \mathcal{I}_{\beta, \alpha} \\
&=(\psi_1(\alpha) - \psi_1(\alpha + \beta))(\psi_1(\beta) - \psi_1(\alpha + \beta))-( -\psi_1(\alpha+\beta))( -\psi_1(\alpha+\beta))\\
&= \psi_1(\alpha)\psi_1(\beta)-( \psi_1(\alpha)+\psi_1(\beta))\psi_1(\alpha + \beta)\\
\lim_{\alpha\to 0} \det(\mathcal{I}(\alpha, \beta)) &=\lim_{\beta \to 0} \det(\mathcal{I}(\alpha, \beta)) = \infty\\
\lim_{\alpha\to \infty} \det(\mathcal{I}(\alpha, \beta)) &=\lim_{\beta \to \infty} \det(\mathcal{I}(\alpha, \beta)) = 0
\end{align}

From Sylvester's criterion (checking whether the diagonal elements are all positive), it follows that the Fisher information matrix for the two parameter case is positive-definite (under the standard condition that the shape parameters are positive, ''α'' > 0 and ''β'' > 0). A numerical evaluation of these components is sketched below.
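The following sketch (not from the original article) evaluates the two-parameter Fisher information components quoted above using the trigamma function, available in SciPy as scipy.special.polygamma(1, ·), and checks the (α, α) component against a Monte Carlo estimate of var[ln X]. The shape parameter values are arbitrary.

<syntaxhighlight lang="python">
# Minimal sketch: two-parameter Fisher information matrix of the beta distribution.
import numpy as np
from scipy.special import polygamma
from scipy.stats import beta


def trigamma(z):
    return polygamma(1, z)


a, b = 2.0, 3.0
I_aa = trigamma(a) - trigamma(a + b)       # var[ln X]
I_bb = trigamma(b) - trigamma(a + b)       # var[ln(1-X)]
I_ab = -trigamma(a + b)                    # cov[ln X, ln(1-X)]

fisher = np.array([[I_aa, I_ab], [I_ab, I_bb]])
print("Fisher information matrix:\n", fisher)
print("determinant:", np.linalg.det(fisher))

x = beta.rvs(a, b, size=1_000_000, random_state=0)
print("Monte Carlo var[ln X]:", np.log(x).var(), "vs", I_aa)
</syntaxhighlight>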


Four parameters

If ''Y''1, ..., ''Y''N are independent random variables each having a beta distribution with four parameters: the exponents ''α'' and ''β'', and also ''a'' (the minimum of the distribution range) and ''c'' (the maximum of the distribution range) (see the section titled "Alternative parametrizations", "Four parameters"), with probability density function:

:f(y; \alpha, \beta, a, c) = \frac{f\!\left(\frac{y-a}{c-a};\alpha,\beta\right)}{c-a} = \frac{ (y-a)^{\alpha-1} (c-y)^{\beta-1} }{(c-a)^{\alpha+\beta-1}\Beta(\alpha, \beta)} = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\,\frac{ (y-a)^{\alpha-1} (c-y)^{\beta-1} }{(c-a)^{\alpha+\beta-1}}.

the joint log likelihood function per ''N'' iid observations is:

:\frac{1}{N} \ln(\mathcal{L}(\alpha, \beta, a, c\mid Y)) = \frac{\alpha -1}{N}\sum_{i=1}^N \ln (Y_i - a) + \frac{\beta-1}{N}\sum_{i=1}^N \ln (c - Y_i) - \ln \Beta(\alpha,\beta) - (\alpha+\beta -1) \ln (c-a)

For the four parameter case, the Fisher information has 4×4 = 16 components. It has 12 off-diagonal components (16 total − 4 diagonal). Since the Fisher information matrix is symmetric, half of these off-diagonal components (12/2 = 6) are independent. Therefore, the Fisher information matrix has 6 independent off-diagonal + 4 diagonal = 10 independent components. Aryal and Nadarajah calculated Fisher's information matrix for the four parameter case as follows:

:- \frac{1}{N} \frac{\partial^2 \ln \mathcal{L}}{\partial \alpha^2} = \operatorname{var}[\ln X] = \psi_1(\alpha) - \psi_1(\alpha + \beta) = \mathcal{I}_{\alpha,\alpha} = \operatorname{E}\left[- \frac{1}{N} \frac{\partial^2 \ln \mathcal{L}}{\partial \alpha^2} \right] = \ln (\operatorname{var}_{GX})

:- \frac{1}{N} \frac{\partial^2 \ln \mathcal{L}}{\partial \beta^2} = \operatorname{var}[\ln (1-X)] = \psi_1(\beta) - \psi_1(\alpha + \beta) = \mathcal{I}_{\beta,\beta} = \operatorname{E}\left[- \frac{1}{N} \frac{\partial^2 \ln \mathcal{L}}{\partial \beta^2} \right] = \ln(\operatorname{var}_{G(1-X)})

:- \frac{1}{N} \frac{\partial^2 \ln \mathcal{L}}{\partial \alpha \, \partial \beta} = \operatorname{cov}[\ln X,\ln(1-X)] = -\psi_1(\alpha+\beta) = \mathcal{I}_{\alpha,\beta} = \operatorname{E}\left[- \frac{1}{N}\frac{\partial^2 \ln \mathcal{L}}{\partial \alpha \, \partial \beta} \right] = \ln(\operatorname{cov}_{G X,(1-X)})

In the above expressions, the use of ''X'' instead of ''Y'' in the expressions var[ln(''X'')] = ln(var''GX'') is ''not an error''. The expressions in terms of the log geometric variances and log geometric covariance occur as functions of the two parameter ''X'' ~ Beta(''α'', ''β'') parametrization because, when taking the partial derivatives with respect to the exponents (''α'', ''β'') in the four parameter case, one obtains expressions identical to those for the two parameter case: these terms of the four parameter Fisher information matrix are independent of the minimum ''a'' and maximum ''c'' of the distribution's range. The only non-zero term upon double differentiation of the log likelihood function with respect to the exponents ''α'' and ''β'' is the second derivative of the log of the beta function: ln(B(''α'', ''β'')). This term is independent of the minimum ''a'' and maximum ''c'' of the distribution's range, and double differentiation of this term results in trigamma functions. The sections titled "Maximum likelihood", "Two unknown parameters" and "Four unknown parameters" also show this fact.

The Fisher information for ''N'' i.i.d. samples is ''N'' times the individual Fisher information (eq. 11.279, page 394 of Cover and Thomas). (Aryal and Nadarajah take a single observation, ''N'' = 1, to calculate the following components of the Fisher information, which leads to the same result as considering the derivatives of the log likelihood per ''N'' observations. Moreover, below the erroneous expression for \mathcal{I}_{a,a} in Aryal and Nadarajah has been corrected.)
:\begin{align}
\alpha > 2: \quad \operatorname{E}\left[- \frac{1}{N} \frac{\partial^2 \ln \mathcal{L}}{\partial a^2} \right] &= \mathcal{I}_{a,a} = \frac{\beta(\alpha+\beta-1)}{(\alpha-2)(c-a)^2} \\
\beta > 2: \quad \operatorname{E}\left[-\frac{1}{N} \frac{\partial^2 \ln \mathcal{L}}{\partial c^2} \right] &= \mathcal{I}_{c,c} = \frac{\alpha(\alpha+\beta-1)}{(\beta-2)(c-a)^2} \\
\operatorname{E}\left[- \frac{1}{N} \frac{\partial^2 \ln \mathcal{L}}{\partial a \, \partial c} \right] &= \mathcal{I}_{a,c} = \frac{\alpha+\beta-1}{(c-a)^2} \\
\alpha > 1: \quad \operatorname{E}\left[- \frac{1}{N} \frac{\partial^2 \ln \mathcal{L}}{\partial \alpha \, \partial a} \right] &= \mathcal{I}_{\alpha,a} = \frac{\beta}{(\alpha-1)(c-a)} \\
\operatorname{E}\left[- \frac{1}{N} \frac{\partial^2 \ln \mathcal{L}}{\partial \alpha \, \partial c} \right] &= \mathcal{I}_{\alpha,c} = \frac{1}{c-a} \\
\operatorname{E}\left[- \frac{1}{N} \frac{\partial^2 \ln \mathcal{L}}{\partial \beta \, \partial a} \right] &= \mathcal{I}_{\beta,a} = -\frac{1}{c-a} \\
\beta > 1: \quad \operatorname{E}\left[- \frac{1}{N} \frac{\partial^2 \ln \mathcal{L}}{\partial \beta \, \partial c} \right] &= \mathcal{I}_{\beta,c} = -\frac{\alpha}{(\beta-1)(c-a)}
\end{align}

The lower two diagonal entries of the Fisher information matrix, with respect to the parameter "a" (the minimum of the distribution's range), \mathcal{I}_{a,a}, and with respect to the parameter "c" (the maximum of the distribution's range), \mathcal{I}_{c,c}, are only defined for exponents α > 2 and β > 2 respectively. The component \mathcal{I}_{a,a} for the minimum "a" approaches infinity as the exponent α approaches 2 from above, and the component \mathcal{I}_{c,c} for the maximum "c" approaches infinity as the exponent β approaches 2 from above.

The Fisher information matrix for the four parameter case does not depend on the individual values of the minimum "a" and the maximum "c", but only on the total range (''c''−''a''). Moreover, the components of the Fisher information matrix that depend on the range (''c''−''a'') depend only through its inverse (or the square of the inverse), so that the Fisher information decreases for increasing range (''c''−''a'').

The accompanying images show the Fisher information components as functions of the shape parameters. All these Fisher information components look like a basin, with the "walls" of the basin located at low values of the parameters.

The following four-parameter-beta-distribution Fisher information components can be expressed in terms of the two-parameter ''X'' ~ Beta(α, β) expectations of the transformed ratio ((1−''X'')/''X'') and of its mirror image (''X''/(1−''X'')), scaled by the range (''c''−''a''), which may be helpful for interpretation:

:\mathcal{I}_{\alpha,a} = \frac{\operatorname{E}\left[\frac{1-X}{X}\right]}{c-a} = \frac{\beta}{(\alpha-1)(c-a)} \text{ if } \alpha > 1

:\mathcal{I}_{\beta,c} = -\frac{\operatorname{E}\left[\frac{X}{1-X}\right]}{c-a} = -\frac{\alpha}{(\beta-1)(c-a)} \text{ if } \beta > 1

These are also the expected values of the "inverted beta distribution" or beta prime distribution (also known as beta distribution of the second kind or Pearson's Type VI) and its mirror image, scaled by the range (''c'' − ''a'').

Also, the following Fisher information components can be expressed in terms of the harmonic (1/X) variances or of variances based on the ratio transformed variables ((1−X)/X) as follows:

:\begin{align}
\alpha > 2: \quad \mathcal{I}_{a,a} &= \operatorname{var}\left[\frac{1}{X}\right] \left(\frac{\alpha-1}{c-a}\right)^2 = \operatorname{var}\left[\frac{1-X}{X}\right] \left(\frac{\alpha-1}{c-a}\right)^2 = \frac{\beta(\alpha+\beta-1)}{(\alpha-2)(c-a)^2} \\
\beta > 2: \quad \mathcal{I}_{c,c} &= \operatorname{var}\left[\frac{1}{1-X}\right] \left(\frac{\beta-1}{c-a}\right)^2 = \operatorname{var}\left[\frac{X}{1-X}\right] \left(\frac{\beta-1}{c-a}\right)^2 = \frac{\alpha(\alpha+\beta-1)}{(\beta-2)(c-a)^2} \\
\mathcal{I}_{a,c} &= -\operatorname{cov}\left[\frac{1}{X},\frac{1}{1-X}\right]\frac{(\alpha-1)(\beta-1)}{(c-a)^2} = -\operatorname{cov}\left[\frac{1-X}{X},\frac{X}{1-X}\right] \frac{(\alpha-1)(\beta-1)}{(c-a)^2} = \frac{\alpha+\beta-1}{(c-a)^2}
\end{align}

See the section "Moments of linearly transformed, product and inverted random variables" for these expectations.

The determinant of Fisher's information matrix is of interest (for example for the calculation of Jeffreys prior probability). From the expressions for the individual components, it follows that the determinant of Fisher's (symmetric) information matrix for the beta distribution with four parameters is a lengthy expansion (the standard cofactor expansion of the symmetric 4×4 matrix) in the ten independent components \mathcal{I}_{\alpha,\alpha}, \mathcal{I}_{\beta,\beta}, \mathcal{I}_{a,a}, \mathcal{I}_{c,c}, \mathcal{I}_{\alpha,\beta}, \mathcal{I}_{\alpha,a}, \mathcal{I}_{\alpha,c}, \mathcal{I}_{\beta,a}, \mathcal{I}_{\beta,c} and \mathcal{I}_{a,c}, and it is finite only for α, β > 2.

Using Sylvester's criterion (checking whether the diagonal elements are all positive), and since the diagonal components \mathcal{I}_{a,a} and \mathcal{I}_{c,c} have singularities at α = 2 and β = 2, it follows that the Fisher information matrix for the four parameter case is positive-definite for α > 2 and β > 2. Since for α > 2 and β > 2 the beta distribution is (symmetric or unsymmetric) bell shaped, it follows that the Fisher information matrix is positive-definite only for bell-shaped (symmetric or unsymmetric) beta distributions, with inflection points located to either side of the mode. Thus, important well known distributions belonging to the four-parameter beta distribution family, like the parabolic distribution (Beta(2,2,a,c)) and the uniform distribution (Beta(1,1,a,c)), have Fisher information components (\mathcal{I}_{a,a}, \mathcal{I}_{c,c}, \mathcal{I}_{\alpha,a}, \mathcal{I}_{\beta,c}) that blow up (approach infinity) in the four-parameter case (although their Fisher information components are all defined for the two parameter case). The four-parameter Wigner semicircle distribution (Beta(3/2,3/2,''a'',''c'')) and arcsine distribution (Beta(1/2,1/2,''a'',''c'')) have negative Fisher information determinants for the four-parameter case.
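For the bell-shaped case (α, β > 2), the ten independent components above can be assembled into the full symmetric 4×4 matrix and inspected numerically. The following minimal Python sketch assumes SciPy's polygamma for the trigamma function; the function name beta4_fisher_information and the example parameter values are illustrative only.

<syntaxhighlight lang="python">
import numpy as np
from scipy.special import polygamma

def beta4_fisher_information(alpha, beta, a, c):
    """Per-observation Fisher information of Beta(alpha, beta, a, c),
    parameter order (alpha, beta, a, c); the a,a and c,c entries are
    finite only for alpha > 2 and beta > 2."""
    t = lambda x: polygamma(1, x)          # trigamma psi_1
    r = c - a                              # total range
    I = np.zeros((4, 4))
    I[0, 0] = t(alpha) - t(alpha + beta)                         # I_{alpha,alpha}
    I[1, 1] = t(beta) - t(alpha + beta)                          # I_{beta,beta}
    I[0, 1] = I[1, 0] = -t(alpha + beta)                         # I_{alpha,beta}
    I[2, 2] = beta * (alpha + beta - 1) / ((alpha - 2) * r**2)   # I_{a,a}
    I[3, 3] = alpha * (alpha + beta - 1) / ((beta - 2) * r**2)   # I_{c,c}
    I[2, 3] = I[3, 2] = (alpha + beta - 1) / r**2                # I_{a,c}
    I[0, 2] = I[2, 0] = beta / ((alpha - 1) * r)                 # I_{alpha,a}
    I[0, 3] = I[3, 0] = 1.0 / r                                  # I_{alpha,c}
    I[1, 2] = I[2, 1] = -1.0 / r                                 # I_{beta,a}
    I[1, 3] = I[3, 1] = -alpha / ((beta - 1) * r)                # I_{beta,c}
    return I

I = beta4_fisher_information(3.0, 4.0, a=0.0, c=10.0)
# determinant and eigenvalues; eigenvalues are all positive when alpha, beta > 2
print(np.linalg.det(I), np.linalg.eigvalsh(I))
</syntaxhighlight>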


Bayesian inference

The use of beta distributions in Bayesian inference is due to the fact that they provide a family of conjugate prior probability distributions for binomial (including Bernoulli) and geometric distributions. The domain of the beta distribution can be viewed as a probability, and in fact the beta distribution is often used to describe the distribution of a probability value ''p'':

:P(p;\alpha,\beta) = \frac{p^{\alpha-1}(1-p)^{\beta-1}}{\Beta(\alpha,\beta)}.

Examples of beta distributions used as prior probabilities to represent ignorance of prior parameter values in Bayesian inference are Beta(1,1), Beta(0,0) and Beta(1/2,1/2).


Rule of succession

A classic application of the beta distribution is the rule of succession, introduced in the 18th century by Pierre-Simon Laplace in the course of treating the sunrise problem. It states that, given ''s'' successes in ''n'' conditionally independent Bernoulli trials with probability ''p'', the estimate of the expected value in the next trial is \frac{s+1}{n+2}. This estimate is the expected value of the posterior distribution over ''p'', namely Beta(''s''+1, ''n''−''s''+1), which is given by Bayes' rule if one assumes a uniform prior probability over ''p'' (i.e., Beta(1, 1)) and then observes that ''p'' generated ''s'' successes in ''n'' trials. Laplace's rule of succession has been criticized by prominent scientists. R. T. Cox described Laplace's application of the rule of succession to the sunrise problem (p. 89) as "a travesty of the proper use of the principle." Keynes remarks (Ch. XXX, p. 382) "indeed this is so foolish a theorem that to entertain it is discreditable." Karl Pearson showed that the probability that the next (''n'' + 1) trials will be successes, after ''n'' successes in ''n'' trials, is only 50%, which has been considered too low by scientists like Jeffreys and unacceptable as a representation of the scientific process of experimentation to test a proposed scientific law. As pointed out by Jeffreys (p. 128) (crediting C. D. Broad), Laplace's rule of succession establishes a high probability of success ((''n''+1)/(''n''+2)) in the next trial, but only a moderate probability (50%) that a further sample (''n''+1) comparable in size will be equally successful. As pointed out by Perks, "The rule of succession itself is hard to accept. It assigns a probability to the next trial which implies the assumption that the actual run observed is an average run and that we are always at the end of an average run. It would, one would think, be more reasonable to assume that we were in the middle of an average run. Clearly a higher value for both probabilities is necessary if they are to accord with reasonable belief." These problems with Laplace's rule of succession motivated Haldane, Perks, Jeffreys and others to search for other forms of prior probability (see the following sections). According to Jaynes, the main problem with the rule of succession is that it is not valid when ''s'' = 0 or ''s'' = ''n'' (see rule of succession, for an analysis of its validity).


Bayes-Laplace prior probability (Beta(1,1))

The beta distribution achieves maximum differential entropy for Beta(1,1): the uniform probability density, for which all values in the domain of the distribution have equal density. This uniform distribution Beta(1,1) was suggested ("with a great deal of doubt") by Thomas Bayes as the prior probability distribution to express ignorance about the correct prior distribution. This prior distribution was adopted (apparently, from his writings, with little sign of doubt) by Pierre-Simon Laplace, and hence it was also known as the "Bayes-Laplace rule" or the "Laplace rule" of "inverse probability" in publications of the first half of the 20th century. In the later part of the 19th century and early part of the 20th century, scientists realized that the assumption of uniform "equal" probability density depended on the actual functions (for example whether a linear or a logarithmic scale was most appropriate) and parametrizations used. In particular, the behavior near the ends of distributions with finite support (for example near ''x'' = 0, for a distribution with initial support at ''x'' = 0) required particular attention. Keynes (Ch. XXX, p. 381) criticized the use of Bayes's uniform prior probability (Beta(1,1)), under which all values between zero and one are equiprobable, as follows: "Thus experience, if it shows anything, shows that there is a very marked clustering of statistical ratios in the neighborhoods of zero and unity, of those for positive theories and for correlations between positive qualities in the neighborhood of zero, and of those for negative theories and for correlations between negative qualities in the neighborhood of unity."


Haldane's prior probability (Beta(0,0))

The Beta(0,0) distribution was proposed by J.B.S. Haldane, who suggested that the prior probability representing complete uncertainty should be proportional to ''p''^−1(1−''p'')^−1. The function ''p''^−1(1−''p'')^−1 can be viewed as the limit of the numerator of the beta distribution as both shape parameters approach zero: α, β → 0. The Beta function (in the denominator of the beta distribution) approaches infinity for both parameters approaching zero, α, β → 0. Therefore, ''p''^−1(1−''p'')^−1 divided by the Beta function approaches a 2-point Bernoulli distribution with equal probability 1/2 at each end, at 0 and 1, and nothing in between, as α, β → 0: a coin-toss, with one face of the coin at 0 and the other face at 1. The Haldane prior probability distribution Beta(0,0) is an "improper prior" because its integration (from 0 to 1) fails to strictly converge to 1 due to the singularities at each end. However, this is not an issue for computing posterior probabilities unless the sample size is very small. Furthermore, Zellner points out that on the log-odds scale (the logit transformation ln(''p''/(1−''p''))), the Haldane prior is the uniformly flat prior. The fact that a uniform prior probability on the logit transformed variable ln(''p''/(1−''p'')) (with domain (−∞, ∞)) is equivalent to the Haldane prior on the domain [0, 1] was pointed out by Harold Jeffreys in the first edition (1939) of his book Theory of Probability (p. 123). Jeffreys writes "Certainly if we take the Bayes-Laplace rule right up to the extremes we are led to results that do not correspond to anybody's way of thinking. The (Haldane) rule d''x''/(''x''(1−''x'')) goes too far the other way. It would lead to the conclusion that if a sample is of one type with respect to some property there is a probability 1 that the whole population is of that type." The fact that "uniform" depends on the parametrization led Jeffreys to seek a form of prior that would be invariant under different parametrizations.
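The equivalence between a flat prior on the log-odds and the Haldane prior, noted above, can be checked with a one-line change of variables (a brief verification added here for completeness): if the log-odds θ = ln(''p''/(1−''p'')) carries a constant density, then, since dθ/d''p'' = 1/(''p''(1−''p'')), transforming back to ''p'' gives

:f(p) \propto \left|\frac{d\theta}{dp}\right| = \frac{1}{p(1-p)} = p^{-1}(1-p)^{-1},

which is precisely the (un-normalized) Haldane prior.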


Jeffreys' prior probability (Beta(1/2,1/2) for a Bernoulli or for a binomial distribution)

Harold Jeffreys proposed to use an uninformative prior probability measure that should be invariant under reparameterization: proportional to the square root of the determinant of Fisher's information matrix. For the Bernoulli distribution, this can be shown as follows: for a coin that is "heads" with probability ''p'' ∈ [0, 1] and is "tails" with probability 1 − ''p'', for a given (H, T) ∈ {(0,1), (1,0)} the probability is p^H (1-p)^T. Since ''T'' = 1 − ''H'', the Bernoulli distribution is p^H (1-p)^{1-H}. Considering ''p'' as the only parameter, it follows that the log likelihood for the Bernoulli distribution is

:\ln \mathcal{L}(p\mid H) = H \ln(p) + (1-H) \ln(1-p).

The Fisher information matrix has only one component (it is a scalar, because there is only one parameter: ''p''), therefore:

:\begin{align}
\sqrt{\det(\mathcal{I}(p))} &= \sqrt{\operatorname{E}\!\left[\left(\frac{d}{dp} \ln \mathcal{L}(p\mid H)\right)^2\right]} \\
&= \sqrt{\operatorname{E}\!\left[\left(\frac{H}{p} - \frac{1-H}{1-p}\right)^2\right]} \\
&= \sqrt{p\left(\frac{1}{p}\right)^2 + (1-p)\left(\frac{1}{1-p}\right)^2} \\
&= \frac{1}{\sqrt{p(1-p)}}.
\end{align}

Similarly, for the binomial distribution with ''n'' Bernoulli trials, it can be shown that

:\sqrt{\det(\mathcal{I}(p))} = \frac{\sqrt{n}}{\sqrt{p(1-p)}}.

Thus, for the Bernoulli and binomial distributions, Jeffreys prior is proportional to \scriptstyle \frac{1}{\sqrt{p(1-p)}}, which happens to be proportional to a beta distribution with domain variable ''x'' = ''p'' and shape parameters α = β = 1/2, the arcsine distribution:

:Beta(\tfrac{1}{2}, \tfrac{1}{2}) = \frac{1}{\pi \sqrt{x(1-x)}}.

It will be shown in the next section that the normalizing constant for Jeffreys prior is immaterial to the final result because the normalizing constant cancels out in Bayes theorem for the posterior probability. Hence Beta(1/2,1/2) is used as the Jeffreys prior for both Bernoulli and binomial distributions. As shown in the next section, when using this expression as a prior probability times the likelihood in Bayes theorem, the posterior probability turns out to be a beta distribution. It is important to realize, however, that Jeffreys prior is proportional to \scriptstyle \frac{1}{\sqrt{p(1-p)}} for the Bernoulli and binomial distribution, but not for the beta distribution. Jeffreys prior for the beta distribution is given by the determinant of Fisher's information for the beta distribution, which, as shown in the section on Fisher information, is a function of the trigamma function ψ1 of the shape parameters α and β as follows:

:\begin{align}
\sqrt{\det(\mathcal{I}(\alpha, \beta))} &= \sqrt{\psi_1(\alpha)\psi_1(\beta) - (\psi_1(\alpha)+\psi_1(\beta))\psi_1(\alpha+\beta)} \\
\lim_{\alpha\to 0} \sqrt{\det(\mathcal{I}(\alpha, \beta))} &= \lim_{\beta\to 0} \sqrt{\det(\mathcal{I}(\alpha, \beta))} = \infty\\
\lim_{\alpha\to \infty} \sqrt{\det(\mathcal{I}(\alpha, \beta))} &= \lim_{\beta\to \infty} \sqrt{\det(\mathcal{I}(\alpha, \beta))} = 0
\end{align}

As previously discussed, Jeffreys prior for the Bernoulli and binomial distributions is proportional to the arcsine distribution Beta(1/2,1/2), a one-dimensional ''curve'' that looks like a basin as a function of the parameter ''p'' of the Bernoulli and binomial distributions. The walls of the basin are formed by ''p'' approaching the singularities at the ends ''p'' → 0 and ''p'' → 1, where Beta(1/2,1/2) approaches infinity. Jeffreys prior for the beta distribution is a ''2-dimensional surface'' (embedded in a three-dimensional space) that looks like a basin with only two of its walls meeting at the corner α = β = 0 (and missing the other two walls), as a function of the shape parameters α and β of the beta distribution. The two adjoining walls of this 2-dimensional surface are formed by the shape parameters α and β approaching the singularities (of the trigamma function) at α, β → 0. It has no walls for α, β → ∞ because in this case the determinant of Fisher's information matrix for the beta distribution approaches zero.

It will be shown in the next section that Jeffreys prior probability results in posterior probabilities (when multiplied by the binomial likelihood function) that are intermediate between the posterior probability results of the Haldane and Bayes prior probabilities.

Jeffreys prior may be difficult to obtain analytically, and for some cases it just doesn't exist (even for simple distribution functions like the asymmetric triangular distribution). Berger, Bernardo and Sun, in a 2009 paper, defined a reference prior probability distribution that (unlike Jeffreys prior) exists for the asymmetric triangular distribution. They cannot obtain a closed-form expression for their reference prior, but numerical calculations show it to be nearly perfectly fitted by the (proper) prior

:\operatorname{Beta}(\tfrac{1}{2}, \tfrac{1}{2}) \sim \frac{1}{\pi\sqrt{\theta(1-\theta)}}

where θ is the vertex variable for the asymmetric triangular distribution with support [0, 1] (corresponding to the following parameter values in Wikipedia's article on the triangular distribution: vertex ''c'' = ''θ'', left end ''a'' = 0, and right end ''b'' = 1). Berger et al. also give a heuristic argument that Beta(1/2,1/2) could indeed be the exact Berger–Bernardo–Sun reference prior for the asymmetric triangular distribution. Therefore, Beta(1/2,1/2) not only is Jeffreys prior for the Bernoulli and binomial distributions, but also seems to be the Berger–Bernardo–Sun reference prior for the asymmetric triangular distribution (for which the Jeffreys prior does not exist), a distribution used in project management and PERT analysis to describe the cost and duration of project tasks.

Clarke and Barron prove that, among continuous positive priors, Jeffreys prior (when it exists) asymptotically maximizes Shannon's mutual information between a sample of size ''n'' and the parameter, and therefore ''Jeffreys prior is the most uninformative prior'' (measuring information as Shannon information). The proof rests on an examination of the Kullback–Leibler divergence between probability density functions for iid random variables.
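The proportionality between the square root of the Bernoulli Fisher information and the Beta(1/2,1/2) density, as well as the behavior of the Jeffreys prior for the beta distribution itself, can be verified numerically. The following is a minimal Python sketch assuming SciPy; the helper name jeffreys_beta is illustrative.

<syntaxhighlight lang="python">
import numpy as np
from scipy.special import polygamma
from scipy.stats import beta

# Bernoulli: sqrt of the Fisher information vs. the Beta(1/2, 1/2) (arcsine) density
p = np.linspace(0.01, 0.99, 5)
sqrt_fisher = 1.0 / np.sqrt(p * (1.0 - p))            # 1/sqrt(p(1-p))
arcsine_pdf = beta.pdf(p, 0.5, 0.5)                    # 1/(pi*sqrt(p(1-p)))
print(np.allclose(sqrt_fisher, np.pi * arcsine_pdf))   # True: proportional, factor pi

# Beta distribution itself: Jeffreys prior is the sqrt of the trigamma-based determinant
def jeffreys_beta(a, b):
    t = lambda x: polygamma(1, x)
    return np.sqrt(t(a) * t(b) - (t(a) + t(b)) * t(a + b))

print(jeffreys_beta(0.1, 0.1), jeffreys_beta(10.0, 10.0))  # large near 0, small for large a, b
</syntaxhighlight>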


Effect of different prior probability choices on the posterior beta distribution

If samples are drawn from the population of a random variable ''X'' that result in ''s'' successes and ''f'' failures in ''n'' Bernoulli trials ''n'' = ''s'' + ''f'', then the likelihood function for parameters ''s'' and ''f'' given ''x'' = ''p'' (the notation ''x'' = ''p'' in the expressions below will emphasize that the domain ''x'' stands for the value of the parameter ''p'' in the binomial distribution) is the following binomial distribution:

:\mathcal{L}(s,f\mid x=p) = {n \choose s} x^s(1-x)^f = {n \choose s} x^s(1-x)^{n-s}.

If beliefs about prior probability information are reasonably well approximated by a beta distribution with parameters ''α'' Prior and ''β'' Prior, then:

:\operatorname{PriorProbability}(x=p;\alpha_{\operatorname{Prior}},\beta_{\operatorname{Prior}}) = \frac{x^{\alpha_{\operatorname{Prior}}-1}(1-x)^{\beta_{\operatorname{Prior}}-1}}{\Beta(\alpha_{\operatorname{Prior}},\beta_{\operatorname{Prior}})}

According to Bayes' theorem for a continuous event space, the posterior probability is given by the product of the prior probability and the likelihood function (given the evidence ''s'' and ''f'' = ''n'' − ''s''), normalized so that the area under the curve equals one, as follows:

:\begin{align}
& \operatorname{posterior probability}(x=p\mid s,n-s) \\
= {} & \frac{\operatorname{PriorProbability}(x=p;\alpha_{\operatorname{Prior}},\beta_{\operatorname{Prior}})\,\mathcal{L}(s,f\mid x=p)}{\int_0^1 \operatorname{PriorProbability}(x=p;\alpha_{\operatorname{Prior}},\beta_{\operatorname{Prior}})\,\mathcal{L}(s,f\mid x=p)\,dx} \\
= {} & \frac{{n \choose s} x^{s+\alpha_{\operatorname{Prior}}-1}(1-x)^{n-s+\beta_{\operatorname{Prior}}-1}/\Beta(\alpha_{\operatorname{Prior}},\beta_{\operatorname{Prior}})}{\int_0^1 \left({n \choose s} x^{s+\alpha_{\operatorname{Prior}}-1}(1-x)^{n-s+\beta_{\operatorname{Prior}}-1}/\Beta(\alpha_{\operatorname{Prior}},\beta_{\operatorname{Prior}})\right) dx} \\
= {} & \frac{x^{s+\alpha_{\operatorname{Prior}}-1}(1-x)^{n-s+\beta_{\operatorname{Prior}}-1}}{\int_0^1 x^{s+\alpha_{\operatorname{Prior}}-1}(1-x)^{n-s+\beta_{\operatorname{Prior}}-1}\,dx} \\
= {} & \frac{x^{s+\alpha_{\operatorname{Prior}}-1}(1-x)^{n-s+\beta_{\operatorname{Prior}}-1}}{\Beta(s+\alpha_{\operatorname{Prior}},n-s+\beta_{\operatorname{Prior}})}.
\end{align}

The binomial coefficient

:{n \choose s} = \frac{n!}{s!(n-s)!} = \frac{\Gamma(n+1)}{\Gamma(s+1)\Gamma(n-s+1)}
appears both in the numerator and the denominator of the posterior probability, and it does not depend on the integration variable ''x'', hence it cancels out and is irrelevant to the final result. Similarly the normalizing factor for the prior probability, the beta function B(''α'' Prior, ''β'' Prior), cancels out and is immaterial to the final result. The same posterior probability result can be obtained if one uses an un-normalized prior

:x^{\alpha_{\operatorname{Prior}}-1}(1-x)^{\beta_{\operatorname{Prior}}-1}

because the normalizing factors all cancel out. Several authors (including Jeffreys himself) thus use an un-normalized prior formula since the normalization constant cancels out. The numerator of the posterior probability ends up being just the (un-normalized) product of the prior probability and the likelihood function, and the denominator is its integral from zero to one. The beta function in the denominator, B(''s'' + ''α'' Prior, ''n'' − ''s'' + ''β'' Prior), appears as a normalization constant to ensure that the total posterior probability integrates to unity.

The ratio ''s''/''n'' of the number of successes to the total number of trials is a sufficient statistic in the binomial case, which is relevant for the following results.

For the Bayes' prior probability (Beta(1,1)), the posterior probability is:

:\operatorname{posterior probability}(p=x\mid s,f) = \frac{x^{s}(1-x)^{n-s}}{\Beta(s+1,n-s+1)}, \text{ with mean} = \frac{s+1}{n+2}\text{ (and mode} = \frac{s}{n}\text{ if } 0 < s < n).

For the Jeffreys' prior probability (Beta(1/2,1/2)), the posterior probability is:

:\operatorname{posterior probability}(p=x\mid s,f) = \frac{x^{s-\tfrac{1}{2}}(1-x)^{n-s-\tfrac{1}{2}}}{\Beta(s+\tfrac{1}{2},n-s+\tfrac{1}{2})}, \text{ with mean} = \frac{s+\tfrac{1}{2}}{n+1}\text{ (and mode} = \frac{s-\tfrac{1}{2}}{n-1}\text{ if } \tfrac{1}{2} < s < n-\tfrac{1}{2}).

and for the Haldane prior probability (Beta(0,0)), the posterior probability is:

:\operatorname{posterior probability}(p=x\mid s,f) = \frac{x^{s-1}(1-x)^{n-s-1}}{\Beta(s,n-s)}, \text{ with mean} = \frac{s}{n}\text{ (and mode} = \frac{s-1}{n-2}\text{ if } 1 < s < n-1).

From the above expressions it follows that for ''s''/''n'' = 1/2 all three prior probabilities result in the identical location for the posterior probability mean = mode = 1/2. For ''s''/''n'' < 1/2, the means of the posterior probabilities, using these priors, are such that: mean for Bayes prior > mean for Jeffreys prior > mean for Haldane prior. For ''s''/''n'' > 1/2 the order of these inequalities is reversed, so that the Haldane prior probability results in the largest posterior mean. The ''Haldane'' prior probability Beta(0,0) results in a posterior probability density with ''mean'' (the expected value for the probability of success in the "next" trial) identical to the ratio ''s''/''n'' of the number of successes to the total number of trials. Therefore, the Haldane prior results in a posterior probability with expected value in the next trial equal to the maximum likelihood. The ''Bayes'' prior probability Beta(1,1) results in a posterior probability density with ''mode'' identical to the ratio ''s''/''n'' (the maximum likelihood).

In the case that 100% of the trials have been successful (''s'' = ''n''), the ''Bayes'' prior probability Beta(1,1) results in a posterior expected value equal to the rule of succession (''n'' + 1)/(''n'' + 2), while the Haldane prior Beta(0,0) results in a posterior expected value of 1 (absolute certainty of success in the next trial). Jeffreys prior probability results in a posterior expected value equal to (''n'' + 1/2)/(''n'' + 1). Perks (p. 303) points out: "This provides a new rule of succession and expresses a 'reasonable' position to take up, namely, that after an unbroken run of n successes we assume a probability for the next trial equivalent to the assumption that we are about half-way through an average run, i.e. that we expect a failure once in (2''n'' + 2) trials. The Bayes–Laplace rule implies that we are about at the end of an average run or that we expect a failure once in (''n'' + 2) trials. The comparison clearly favours the new result (what is now called Jeffreys prior) from the point of view of 'reasonableness'."

Conversely, in the case that 100% of the trials have resulted in failure (''s'' = 0), the ''Bayes'' prior probability Beta(1,1) results in a posterior expected value for success in the next trial equal to 1/(''n'' + 2), while the Haldane prior Beta(0,0) results in a posterior expected value of success in the next trial of 0 (absolute certainty of failure in the next trial). Jeffreys prior probability results in a posterior expected value for success in the next trial equal to (1/2)/(''n'' + 1), which Perks (p. 303) points out "is a much more reasonably remote result than the Bayes-Laplace result 1/(''n'' + 2)".

Jaynes questions (for the uniform prior Beta(1,1)) the use of these formulas for the cases ''s'' = 0 or ''s'' = ''n'' because the integrals do not converge (Beta(1,1) is an improper prior for ''s'' = 0 or ''s'' = ''n''). In practice, the conditions 0 < ''s'' < ''n'' are usually met. Recall (from the section on the rule of succession) that, with the Bayes–Laplace prior, the probability that the next (''n'' + 1) trials will all be successes, after ''n'' successes in ''n'' trials, is only 50%. Perks (p. 303) shows that, for what is now known as the Jeffreys prior, this probability is ((''n'' + 1/2)/(''n'' + 1))((''n'' + 3/2)/(''n'' + 2))...(2''n'' + 1/2)/(2''n'' + 1), which for ''n'' = 1, 2, 3 gives 15/24, 315/480, 9009/13440, rapidly approaching a limiting value of 1/\sqrt{2} = 0.70710678\ldots as ''n'' tends to infinity. Perks remarks that what is now known as the Jeffreys prior "is clearly more 'reasonable' than either the Bayes-Laplace result or the result on the (Haldane) alternative rule rejected by Jeffreys which gives certainty as the probability. It clearly provides a very much better correspondence with the process of induction. Whether it is 'absolutely' reasonable for the purpose, i.e. whether it is yet large enough, without the absurdity of reaching unity, is a matter for others to decide. But it must be realized that the result depends on the assumption of complete indifference and absence of knowledge prior to the sampling experiment."
Following are the variances of the posterior distribution obtained with these three prior probability distributions:

for the Bayes' prior probability (Beta(1,1)), the posterior variance is:

:\text{variance} = \frac{(n-s+1)(s+1)}{(3+n)(2+n)^2}, \text{ which for } s=\frac{n}{2} \text{ reaches a maximum value} = \frac{1}{4(3+n)}

for the Jeffreys' prior probability (Beta(1/2,1/2)), the posterior variance is:

:\text{variance} = \frac{(s+\tfrac{1}{2})(n-s+\tfrac{1}{2})}{(2+n)(1+n)^2}, \text{ which for } s=\frac{n}{2} \text{ reaches a maximum value} = \frac{1}{4(2+n)}

and for the Haldane prior probability (Beta(0,0)), the posterior variance is:

:\text{variance} = \frac{s(n-s)}{(1+n)n^2}, \text{ which for } s=\frac{n}{2} \text{ reaches a maximum value} = \frac{1}{4(1+n)}

So, as remarked by Silvey, for large ''n'' the variance is small and hence the posterior distribution is highly concentrated, whereas the assumed prior distribution was very diffuse. This is in accord with what one would hope for, as vague prior knowledge is transformed (through Bayes theorem) into more precise posterior knowledge by an informative experiment. For small ''n'' the Haldane Beta(0,0) prior results in the largest posterior variance while the Bayes Beta(1,1) prior results in the most concentrated posterior. Jeffreys prior Beta(1/2,1/2) results in a posterior variance in between the other two. As ''n'' increases, the variance rapidly decreases so that the posterior variance for all three priors converges to approximately the same value (approaching zero variance as ''n'' → ∞). Recalling the previous result that the ''Haldane'' prior probability Beta(0,0) results in a posterior probability density with ''mean'' (the expected value for the probability of success in the "next" trial) identical to the ratio ''s''/''n'', it follows from the above expression that the ''Haldane'' prior Beta(0,0) also results in a posterior with ''variance'' identical to the variance expressed in terms of the maximum likelihood estimate ''s''/''n'' and the sample size ν = ''n'':

:\text{variance} = \frac{\mu(1-\mu)}{1+\nu} = \frac{\frac{s}{n}\left(1-\frac{s}{n}\right)}{1+n}

with the mean ''μ'' = ''s''/''n'' and the sample size ''ν'' = ''n''.

In Bayesian inference, using a prior distribution Beta(''α''Prior, ''β''Prior) prior to a binomial distribution is equivalent to adding (''α''Prior − 1) pseudo-observations of "success" and (''β''Prior − 1) pseudo-observations of "failure" to the actual number of successes and failures observed, then estimating the parameter ''p'' of the binomial distribution by the proportion of successes over both real and pseudo-observations. A uniform prior Beta(1,1) does not add (or subtract) any pseudo-observations since for Beta(1,1) it follows that (''α''Prior − 1) = 0 and (''β''Prior − 1) = 0. The Haldane prior Beta(0,0) subtracts one pseudo-observation from each and Jeffreys prior Beta(1/2,1/2) subtracts 1/2 pseudo-observation of success and an equal number of failure. This subtraction has the effect of smoothing out the posterior distribution. If the proportion of successes is not 50% (''s''/''n'' ≠ 1/2), values of ''α''Prior and ''β''Prior less than 1 (and therefore negative (''α''Prior − 1) and (''β''Prior − 1)) favor sparsity, i.e. distributions where the parameter ''p'' is closer to either 0 or 1. In effect, values of ''α''Prior and ''β''Prior between 0 and 1, when operating together, function as a concentration parameter.
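A compact numerical illustration of these conjugate update rules follows; it is a sketch assuming SciPy, and the improper Haldane prior Beta(0,0) is approximated by Beta(ε, ε) with a tiny ε, since Beta(0,0) cannot be instantiated directly.

<syntaxhighlight lang="python">
from scipy.stats import beta

def posterior(s, n, a_prior, b_prior):
    """Posterior Beta(s + a_prior, n - s + b_prior) for s successes in n trials."""
    return beta(s + a_prior, n - s + b_prior)

s, n = 3, 10
priors = {"Bayes-Laplace": (1.0, 1.0),
          "Jeffreys": (0.5, 0.5),
          "Haldane (approx.)": (1e-9, 1e-9)}
for name, (a0, b0) in priors.items():
    post = posterior(s, n, a0, b0)
    print(name, post.mean(), post.var())
# posterior means: (s+1)/(n+2) = 0.3333..., (s+1/2)/(n+1) = 0.3181..., s/n = 0.3
</syntaxhighlight>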
The accompanying plots show the posterior probability density functions for several sample sizes ''n'', numbers of successes ''s'' and choices of Beta(''α''Prior, ''β''Prior). The first plot shows the symmetric cases (mean = mode = 1/2) and the second plot shows the skewed cases. The images show that there is little difference between the priors for the posterior with a sample size of 50 (characterized by a more pronounced peak near ''p'' = 1/2). Significant differences appear for very small sample sizes (in particular for the flatter distribution for the degenerate case of sample size = 3). Therefore, the skewed cases show a larger effect from the choice of prior, at small sample size, than the symmetric cases. For symmetric distributions, the Bayes prior Beta(1,1) results in the most "peaky" and highest posterior distributions and the Haldane prior Beta(0,0) results in the flattest and lowest-peak distribution. The Jeffreys prior Beta(1/2,1/2) lies in between them. For nearly symmetric, not too skewed distributions the effect of the priors is similar. For very small sample size (in this case a sample size of 3) and a skewed distribution the Haldane prior can result in a reverse-J-shaped distribution with a singularity at the left end. However, this happens only in degenerate cases (in this example ''n'' = 3 and hence ''s'' = 3/4 < 1, a degenerate value because ''s'' should be greater than unity in order for the posterior of the Haldane prior to have a mode located between the ends, and because ''s'' = 3/4 is not an integer number, hence it violates the initial assumption of a binomial distribution for the likelihood) and it is not an issue in generic cases of reasonable sample size (such that the condition 1 < ''s'' < ''n'' − 1, necessary for a mode to exist between both ends, is fulfilled).

In Chapter 12 (p. 385) of his book, Jaynes asserts that the ''Haldane prior'' Beta(0,0) describes a ''prior state of knowledge of complete ignorance'', where we are not even sure whether it is physically possible for an experiment to yield either a success or a failure, while the ''Bayes (uniform) prior Beta(1,1) applies if'' one knows that ''both binary outcomes are possible''. Jaynes states: "''interpret the Bayes-Laplace (Beta(1,1)) prior as describing not a state of complete ignorance'', but the state of knowledge in which we have observed one success and one failure...once we have seen at least one success and one failure, then we know that the experiment is a true binary one, in the sense of physical possibility." Jaynes does not specifically discuss Jeffreys prior Beta(1/2,1/2) (Jaynes's discussion of "Jeffreys prior" on pp. 181, 423 and in chapter 12 of his book refers instead to the improper, un-normalized prior "1/''p'' ''dp''" introduced by Jeffreys in the 1939 edition of his book, seven years before he introduced what is now known as Jeffreys' invariant prior: the square root of the determinant of Fisher's information matrix. ''"1/p" is Jeffreys' (1946) invariant prior for the exponential distribution, not for the Bernoulli or binomial distributions''). However, it follows from the above discussion that the Jeffreys Beta(1/2,1/2) prior represents a state of knowledge in between the Haldane Beta(0,0) and Bayes Beta(1,1) priors.

Similarly, Karl Pearson in his 1892 book The Grammar of Science (p. 144 of the 1900 edition) maintained that the Bayes (Beta(1,1)) uniform prior was not a complete ignorance prior, and that it should be used when prior information justified "distributing our ignorance equally". K. Pearson wrote: "Yet the only supposition that we appear to have made is this: that, knowing nothing of nature, routine and anomy (from the Greek ανομία, namely: a- "without", and nomos "law") are to be considered as equally likely to occur. Now we were not really justified in making even this assumption, for it involves a knowledge that we do not possess regarding nature. We use our ''experience'' of the constitution and action of coins in general to assert that heads and tails are equally probable, but we have no right to assert before experience that, as we know nothing of nature, routine and breach are equally probable. In our ignorance we ought to consider before experience that nature may consist of all routines, all anomies (normlessness), or a mixture of the two in any proportion whatever, and that all such are equally probable. Which of these constitutions after experience is the most probable must clearly depend on what that experience has been like."

If there is sufficient sampling data, ''and the posterior probability mode is not located at one of the extremes of the domain'' (''x'' = 0 or ''x'' = 1), the three priors of Bayes (Beta(1,1)), Jeffreys (Beta(1/2,1/2)) and Haldane (Beta(0,0)) should yield similar ''posterior'' probability densities. Otherwise, as Gelman et al. (p. 65) point out, "if so few data are available that the choice of noninformative prior distribution makes a difference, one should put relevant information into the prior distribution", or as Berger (p. 125) points out, "when different reasonable priors yield substantially different answers, can it be right to state that there ''is'' a single answer? Would it not be better to admit that there is scientific uncertainty, with the conclusion depending on prior beliefs?"


Occurrence and applications


Order statistics

The beta distribution has an important application in the theory of order statistics. A basic result is that the distribution of the ''k''th smallest of a sample of size ''n'' from a continuous uniform distribution has a beta distribution. (David, H. A., Nagaraja, H. N. (2003) ''Order Statistics'' (3rd Edition). Wiley, New Jersey, pp. 458.) This result is summarized as:

:U_{(k)} \sim \operatorname{Beta}(k, n+1-k).

From this, and application of the theory related to the probability integral transform, the distribution of any individual order statistic from any continuous distribution can be derived.
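This result is easy to check by simulation. The following minimal Python sketch (assuming NumPy and SciPy) compares the empirical distribution of the ''k''th smallest of ''n'' uniform samples with Beta(''k'', ''n''+1−''k'') using a Kolmogorov–Smirnov test.

<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import beta, kstest

rng = np.random.default_rng(0)
n, k = 10, 3                       # k-th smallest of n uniform samples
samples = np.sort(rng.uniform(size=(100_000, n)), axis=1)[:, k - 1]
# Large p-value: the simulated order statistic is consistent with Beta(k, n+1-k)
print(kstest(samples, beta(k, n + 1 - k).cdf))
</syntaxhighlight>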


Subjective logic

In standard logic, propositions are considered to be either true or false. In contradistinction, subjective logic assumes that humans cannot determine with absolute certainty whether a proposition about the real world is absolutely true or false. In subjective logic the posteriori probability estimates of binary events can be represented by beta distributions. (A. Jøsang. A Logic for Uncertain Probabilities. ''International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems.'' 9(3), pp. 279–311, June 2001.)


Wavelet analysis

A wavelet is a wave-like oscillation with an amplitude that starts out at zero, increases, and then decreases back to zero. It can typically be visualized as a "brief oscillation" that promptly decays. Wavelets can be used to extract information from many different kinds of data, including, but certainly not limited to, audio signals and images. Thus, wavelets are purposefully crafted to have specific properties that make them useful for signal processing. Wavelets are localized in both time and frequency, whereas the standard Fourier transform is only localized in frequency. Therefore, standard Fourier transforms are only applicable to stationary processes, while wavelets are applicable to non-stationary processes. Continuous wavelets can be constructed based on the beta distribution. Beta wavelets (H.M. de Oliveira and G.A.A. Araújo. Compactly Supported One-cyclic Wavelets Derived from Beta Distributions. ''Journal of Communication and Information Systems.'' vol. 20, n. 3, pp. 27–33, 2005.) can be viewed as a soft variety of Haar wavelets whose shape is fine-tuned by two shape parameters α and β.


Population genetics

The Balding–Nichols model is a two-parameter parametrization of the beta distribution used in population genetics. It is a statistical description of the allele frequencies in the components of a sub-divided population:

: \begin{align}
\alpha &= \mu \nu,\\
\beta &= (1 - \mu) \nu,
\end{align}

where \nu = \alpha+\beta = \frac{1-F}{F} and 0 < F < 1; here ''F'' is (Wright's) genetic distance between two populations.
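Converting between the (''F'', ''μ'') parametrization and the standard shape parameters is a one-line computation; the following short Python sketch (the helper name is illustrative) applies the mapping above.

<syntaxhighlight lang="python">
def balding_nichols_shape(F, mu):
    """Map Wright's genetic distance F and mean allele frequency mu
    to the Beta shape parameters (alpha, beta)."""
    nu = (1.0 - F) / F          # nu = alpha + beta
    return mu * nu, (1.0 - mu) * nu

print(balding_nichols_shape(F=0.1, mu=0.3))  # (2.7, 6.3)
</syntaxhighlight>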


Project management: task cost and schedule modeling

The beta distribution can be used to model events which are constrained to take place within an interval defined by a minimum and maximum value. For this reason, the beta distribution, along with the triangular distribution, is used extensively in PERT, critical path method (CPM), Joint Cost Schedule Modeling (JCSM) and other project management/control systems to describe the time to completion and the cost of a task. In project management, shorthand computations are widely used to estimate the mean and standard deviation of the beta distribution:

: \begin{align}
  \mu(X) & = \frac{a + 4b + c}{6} \\
  \sigma(X) & = \frac{c-a}{6}
\end{align}

where ''a'' is the minimum, ''c'' is the maximum, and ''b'' is the most likely value (the mode for ''α'' > 1 and ''β'' > 1).

The above estimate for the mean \mu(X) = \frac{a + 4b + c}{6} is known as the PERT three-point estimation and it is exact for either of the following values of ''β'' (for arbitrary α within these ranges):

:''β'' = ''α'' > 1 (symmetric case) with standard deviation \sigma(X) = \frac{c-a}{2\sqrt{1+2\alpha}}, skewness = 0, and excess kurtosis = \frac{-6}{3+2\alpha}

or

:''β'' = 6 − ''α'' for 5 > ''α'' > 1 (skewed case) with standard deviation

:\sigma(X) = \frac{c-a}{6}\sqrt{\frac{\alpha(6-\alpha)}{7}},

skewness = \frac{(3-\alpha)\sqrt{7}}{2\sqrt{\alpha(6-\alpha)}}, and excess kurtosis = \frac{21}{\alpha(6-\alpha)} - 3

The above estimate for the standard deviation ''σ''(''X'') = (''c'' − ''a'')/6 is exact for either of the following values of ''α'' and ''β'':

:''α'' = ''β'' = 4 (symmetric) with skewness = 0, and excess kurtosis = −6/11.
:''β'' = 6 − ''α'' and \alpha = 3 - \sqrt2 (right-tailed, positive skew) with skewness = \frac{1}{\sqrt{2}}, and excess kurtosis = 0
:''β'' = 6 − ''α'' and \alpha = 3 + \sqrt2 (left-tailed, negative skew) with skewness = \frac{-1}{\sqrt{2}}, and excess kurtosis = 0

Otherwise, these can be poor approximations for beta distributions with other values of α and β, exhibiting average errors of 40% in the mean and 549% in the variance.
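A brief numerical comparison of the shorthand estimates with the exact four-parameter moments follows (a sketch using only the standard library; the function names are illustrative). The example uses α = β = 4, one of the cases listed above in which both shorthands are exact.

<syntaxhighlight lang="python">
import math

def pert_estimates(a, b, c):
    """Classic PERT shorthand: mean (a + 4b + c)/6 and standard deviation (c - a)/6."""
    return (a + 4 * b + c) / 6.0, (c - a) / 6.0

def beta_mean_sd(alpha, beta_, a, c):
    """Exact mean and standard deviation of the four-parameter beta on [a, c]."""
    mean = a + (c - a) * alpha / (alpha + beta_)
    var = (c - a) ** 2 * alpha * beta_ / ((alpha + beta_) ** 2 * (alpha + beta_ + 1))
    return mean, math.sqrt(var)

# alpha = beta = 4: the mode b sits at the midpoint of [a, c]
print(pert_estimates(a=2.0, b=5.0, c=8.0))     # (5.0, 1.0)
print(beta_mean_sd(4.0, 4.0, a=2.0, c=8.0))    # (5.0, 1.0): the shorthand is exact here
</syntaxhighlight>

For other shape parameters the two functions diverge, illustrating the warning above about the approximation quality.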


Random variate generation

If ''X'' and ''Y'' are independent, with X \sim \Gamma(\alpha, \theta) and Y \sim \Gamma(\beta, \theta), then

:\frac{X}{X+Y} \sim \Beta(\alpha, \beta).

So one algorithm for generating beta variates is to generate \frac{X}{X+Y}, where ''X'' is a gamma variate with parameters (α, 1) and ''Y'' is an independent gamma variate with parameters (β, 1). In fact, here \frac{X}{X+Y} and X+Y are independent, and X+Y \sim \Gamma(\alpha + \beta, \theta). If Z \sim \Gamma(\gamma, \theta) and Z is independent of X and Y, then \frac{X+Y}{X+Y+Z} \sim \Beta(\alpha+\beta,\gamma) and \frac{X+Y}{X+Y+Z} is independent of \frac{X}{X+Y}. This shows that the product of independent \Beta(\alpha,\beta) and \Beta(\alpha+\beta,\gamma) random variables is a \Beta(\alpha,\beta+\gamma) random variable.

Also, the ''k''th order statistic of ''n'' uniformly distributed variates is \Beta(k, n+1-k), so an alternative if α and β are small integers is to generate α + β − 1 uniform variates and choose the α-th smallest.

Another way to generate the Beta distribution is by the Pólya urn model. According to this method, one starts with an "urn" containing α "black" balls and β "white" balls and draws uniformly with replacement. On every trial an additional ball is added according to the color of the last ball which was drawn. Asymptotically, the proportion of black and white balls will be distributed according to the Beta distribution, where each repetition of the experiment will produce a different value.

It is also possible to use inverse transform sampling.
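The first two methods described above are straightforward to implement; the following minimal Python sketch (assuming NumPy; function names are illustrative) generates beta variates from a pair of gamma variates and, for integer shape parameters, from order statistics of uniforms.

<syntaxhighlight lang="python">
import numpy as np

def beta_from_gammas(alpha, beta_, size, rng):
    """Beta(alpha, beta) variates as X/(X+Y) with independent Gamma(alpha,1), Gamma(beta,1)."""
    x = rng.gamma(shape=alpha, scale=1.0, size=size)
    y = rng.gamma(shape=beta_, scale=1.0, size=size)
    return x / (x + y)

def beta_from_order_statistic(alpha, beta_, size, rng):
    """For integer shapes: the alpha-th smallest of alpha+beta-1 uniforms is Beta(alpha, beta)."""
    u = rng.uniform(size=(size, alpha + beta_ - 1))
    return np.sort(u, axis=1)[:, alpha - 1]

rng = np.random.default_rng(42)
print(beta_from_gammas(2.0, 5.0, 5, rng))
print(beta_from_order_statistic(2, 5, 5, rng))
</syntaxhighlight>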


History

Thomas Bayes, in a posthumous paper published in 1763 by Richard Price, obtained a beta distribution as the density of the probability of success in Bernoulli trials (see the section on Bayesian inference above), but the paper does not analyze any of the moments of the beta distribution or discuss any of its properties.

The first systematic modern discussion of the beta distribution is probably due to Karl Pearson. In Pearson's papers the beta distribution is couched as a solution of a differential equation: Pearson's Type I distribution, which it is essentially identical to except for arbitrary shifting and re-scaling (the beta and Pearson Type I distributions can always be equalized by proper choice of parameters). In fact, in several English books and journal articles in the few decades prior to World War II, it was common to refer to the beta distribution as Pearson's Type I distribution. William P. Elderton in his 1906 monograph "Frequency curves and correlation" further analyzes the beta distribution as Pearson's Type I distribution, including a full discussion of the method of moments for the four parameter case, and diagrams of (what Elderton describes as) U-shaped, J-shaped, twisted J-shaped, "cocked-hat" shapes, horizontal and angled straight-line cases. Elderton wrote "I am chiefly indebted to Professor Pearson, but the indebtedness is of a kind for which it is impossible to offer formal thanks." Elderton's 1906 monograph provides an impressive amount of information on the beta distribution, including equations for the origin of the distribution chosen to be the mode, as well as for other Pearson distributions: types I through VII. Elderton also included a number of appendixes, including one appendix ("II") on the beta and gamma functions. In later editions, Elderton added equations for the origin of the distribution chosen to be the mean, and analysis of Pearson distributions VIII through XII.

As remarked by Bowman and Shenton, "Fisher and Pearson had a difference of opinion in the approach to (parameter) estimation, in particular relating to (Pearson's method of) moments and (Fisher's method of) maximum likelihood in the case of the Beta distribution." Also according to Bowman and Shenton, "the case of a Type I (beta distribution) model being the center of the controversy was pure serendipity. A more difficult model of 4 parameters would have been hard to find." The long running public conflict of Fisher with Karl Pearson can be followed in a number of articles in prestigious journals. For example, concerning the estimation of the four parameters for the beta distribution, and Fisher's criticism of Pearson's method of moments as being arbitrary, see Pearson's article "Method of moments and method of maximum likelihood" (published three years after his retirement from University College, London, where his position had been divided between Fisher and Pearson's son Egon), in which Pearson writes "I read (Koshai's paper in the Journal of the Royal Statistical Society, 1933) which as far as I am aware is the only case at present published of the application of Professor Fisher's method. To my astonishment that method depends on first working out the constants of the frequency curve by the (Pearson) Method of Moments and then superposing on it, by what Fisher terms "the Method of Maximum Likelihood" a further approximation to obtain, what he holds, he will thus get, "more efficient values" of the curve constants."

David and Edwards's treatise on the history of statistics cites the first modern treatment of the beta distribution, in 1911, using the beta designation that has become standard, due to Corrado Gini, an Italian statistician, demographer, and sociologist, who developed the Gini coefficient. N. L. Johnson and S. Kotz, in their comprehensive and very informative monograph on leading historical personalities in statistical sciences, credit Corrado Gini as "an early Bayesian...who dealt with the problem of eliciting the parameters of an initial Beta distribution, by singling out techniques which anticipated the advent of the so-called empirical Bayes approach."


References


External links


"Beta_Distribution"
by_Fiona_Maclachlan,_the_Wolfram_Demonstrations_Project,_2007.
Beta_Distribution –_Overview_and_Example
_xycoon.com

_brighton-webs.co.uk

_exstrom.com * *
Harvard_University_Statistics_110_Lecture_23_Beta_Distribution,_Prof._Joe_Blitzstein
Mean absolute deviation around the mean

The mean absolute deviation around the mean for the beta distribution with shape parameters α and β is:

:\operatorname{E}[|X - E[X]|] = \frac{2 \alpha^\alpha \beta^\beta}{\Beta(\alpha,\beta)(\alpha+\beta)^{\alpha+\beta+1}}

The mean absolute deviation around the mean is a more robust estimator of statistical dispersion than the standard deviation for beta distributions with tails and inflection points at each side of the mode, Beta(''α'', ''β'') distributions with ''α'', ''β'' > 2, as it depends on the linear (absolute) deviations rather than the square deviations from the mean. Therefore, the effect of very large deviations from the mean is not as overly weighted.

Using Stirling's approximation to the Gamma function, N. L. Johnson and S. Kotz derived the following approximation for the ratio of the mean absolute deviation to the standard deviation, for values of the shape parameters greater than unity (the relative error for this approximation is only −3.5% for ''α'' = ''β'' = 1, and it decreases to zero as ''α'' → ∞, ''β'' → ∞):

: \frac{\operatorname{E}[|X - E[X]|]}{\sqrt{\operatorname{var}(X)}} \approx \sqrt{\frac{2}{\pi}} \left(1+\frac{7}{12(\alpha+\beta)}-\frac{1}{12\alpha}-\frac{1}{12\beta} \right), \text{ if } \alpha, \beta > 1.

At the limit α → ∞, β → ∞, the ratio of the mean absolute deviation to the standard deviation (for the beta distribution) becomes equal to the ratio of the same measures for the normal distribution: \sqrt{\tfrac{2}{\pi}}. For α = β = 1 this ratio equals \tfrac{\sqrt{3}}{2}, so that from α = β = 1 to α, β → ∞ the ratio decreases by 8.5%. For α = β = 0 the standard deviation is exactly equal to the mean absolute deviation around the mean. Therefore, this ratio decreases by 15% from α = β = 0 to α = β = 1, and by 25% from α = β = 0 to α, β → ∞. However, for skewed beta distributions such that α → 0 or β → 0, the ratio of the standard deviation to the mean absolute deviation approaches infinity (although each of them, individually, approaches zero) because the mean absolute deviation approaches zero faster than the standard deviation.

Using the parametrization in terms of mean μ and sample size ν = α + β > 0:

:α = μν, β = (1−μ)ν

one can express the mean absolute deviation around the mean in terms of the mean μ and the sample size ν as follows:

:\operatorname{E}[|X - E[X]|] = \frac{2 \mu^{\mu\nu}(1-\mu)^{(1-\mu)\nu}}{\nu \Beta(\mu\nu,(1-\mu)\nu)}

For a symmetric distribution, the mean is at the middle of the distribution, μ = 1/2, and therefore:

: \begin{align}
\operatorname{E}[|X - E[X]|] = \frac{2^{1-\nu}}{\nu \Beta(\tfrac{\nu}{2},\tfrac{\nu}{2})} &= \frac{2^{1-\nu}\Gamma(\nu)}{\nu\,\Gamma(\tfrac{\nu}{2})^2} \\
\lim_{\nu \to 0} \left(\lim_{\mu \to \frac{1}{2}} \operatorname{E}[|X - E[X]|] \right) &= \tfrac{1}{2}\\
\lim_{\nu \to \infty} \left(\lim_{\mu \to \frac{1}{2}} \operatorname{E}[|X - E[X]|] \right) &= 0
\end{align}

Also, the following limits (with only the noted variable approaching the limit) can be obtained from the above expressions:

: \begin{align}
\lim_{\beta\to 0} \operatorname{E}[|X - E[X]|] &= \lim_{\alpha\to 0} \operatorname{E}[|X - E[X]|] = 0 \\
\lim_{\beta\to \infty} \operatorname{E}[|X - E[X]|] &= \lim_{\alpha\to \infty} \operatorname{E}[|X - E[X]|] = 0\\
\lim_{\mu \to 0} \operatorname{E}[|X - E[X]|] &= \lim_{\mu \to 1} \operatorname{E}[|X - E[X]|] = 0\\
\lim_{\nu \to 0} \operatorname{E}[|X - E[X]|] &= 2\mu(1-\mu) \\
\lim_{\nu \to \infty} \operatorname{E}[|X - E[X]|] &= 0
\end{align}


Mean absolute difference

The mean absolute difference for the Beta distribution is: :\mathrm{MD} = \int_0^1 \int_0^1 f(x;\alpha,\beta)\,f(y;\alpha,\beta)\,|x-y| \,dx\,dy = \left(\frac{4}{\alpha+\beta}\right)\frac{\Beta(\alpha+\beta,\alpha+\beta)}{\Beta(\alpha,\alpha)\,\Beta(\beta,\beta)} The Gini coefficient for the Beta distribution is half of the relative mean absolute difference: :\mathrm{G} = \left(\frac{2}{\alpha}\right)\frac{\Beta(\alpha+\beta,\alpha+\beta)}{\Beta(\alpha,\alpha)\,\Beta(\beta,\beta)}
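As a sanity check of these closed forms, the sketch below (parameter values and function names are arbitrary illustrations; SciPy is assumed available) evaluates the double integral by brute force and compares it with the Beta-function expression, together with the implied Gini coefficient:
<syntaxhighlight lang="python">
import numpy as np
from scipy.integrate import dblquad
from scipy.special import betaln
from scipy.stats import beta as beta_dist

def mean_abs_difference(a, b):
    # MD = (4/(a+b)) B(a+b, a+b) / (B(a,a) B(b,b)), evaluated in log space
    log_md = np.log(4) - np.log(a + b) + betaln(a + b, a + b) - betaln(a, a) - betaln(b, b)
    return np.exp(log_md)

def gini(a, b):
    # half the relative mean absolute difference, i.e. MD / (2 * mean)
    return mean_abs_difference(a, b) * (a + b) / (2 * a)

a, b = 2.0, 5.0
md_numeric, _ = dblquad(lambda y, x: abs(x - y) * beta_dist.pdf(x, a, b) * beta_dist.pdf(y, a, b),
                        0, 1, lambda x: 0, lambda x: 1)
print(mean_abs_difference(a, b), md_numeric, gini(a, b))
</syntaxhighlight>
For α = β = 1 the functions return MD = 1/3 and G = 1/3, the familiar values for the uniform distribution.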


Skewness

The
skewness
(the third moment centered on the mean, normalized by the 3/2 power of the variance) of the beta distribution is :\gamma_1 =\frac{\operatorname{E}[(X - \mu)^3]}{(\operatorname{var}(X))^{3/2}} = \frac{2(\beta-\alpha)\sqrt{\alpha+\beta+1}}{(\alpha+\beta+2)\sqrt{\alpha\beta}} . Letting α = β in the above expression one obtains γ1 = 0, showing once again that for α = β the distribution is symmetric and hence the skewness is zero. The skew is positive (right-tailed) for α < β and negative (left-tailed) for α > β. Using the parametrization in terms of mean μ and sample size ν = α + β: : \begin{align} \alpha & = \mu \nu ,\text{ where }\nu =(\alpha + \beta) >0\\ \beta & = (1 - \mu) \nu , \text{ where }\nu =(\alpha + \beta) >0. \end{align} one can express the skewness in terms of the mean μ and the sample size ν as follows: :\gamma_1 =\frac{\operatorname{E}[(X - \mu)^3]}{(\operatorname{var}(X))^{3/2}} = \frac{2(1-2\mu)\sqrt{1+\nu}}{(2+\nu)\sqrt{\mu(1-\mu)}}. The skewness can also be expressed just in terms of the variance ''var'' and the mean μ as follows: :\gamma_1 =\frac{2(1-2\mu)\sqrt{\operatorname{var}}}{\mu(1-\mu)+\operatorname{var}}\text{ if } \operatorname{var} < \mu(1-\mu) The accompanying plot of skewness as a function of variance and mean shows that maximum variance (1/4) is coupled with zero skewness and the symmetry condition (μ = 1/2), and that maximum skewness (positive or negative infinity) occurs when the mean is located at one end or the other, so that the "mass" of the probability distribution is concentrated at the ends (minimum variance). The following expression for the square of the skewness, in terms of the sample size ν = α + β and the variance ''var'', is useful for the method of moments estimation of four parameters: :(\gamma_1)^2 =\frac{4(\beta-\alpha)^2 (1+\alpha+\beta)}{\alpha\beta(2+\alpha+\beta)^2} = \frac{4}{(2+\nu)^2}\bigg(\frac{1}{\operatorname{var}}-4(1+\nu)\bigg) This expression correctly gives a skewness of zero for α = β, since in that case (see ): \operatorname{var} = \frac{1}{4(1+\nu)}. For the symmetric case (α = β), skewness = 0 over the whole range, and the following limits apply: :\lim_{\alpha=\beta\to 0} \gamma_1 = \lim_{\alpha=\beta\to \infty} \gamma_1 =\lim_{\nu\to 0} \gamma_1=\lim_{\nu\to \infty} \gamma_1=\lim_{\mu\to \frac{1}{2}} \gamma_1 = 0 For the asymmetric cases (α ≠ β) the following limits (with only the noted variable approaching the limit) can be obtained from the above expressions: : \begin{align} &\lim_{\alpha\to 0} \gamma_1 =\lim_{\mu\to 0} \gamma_1 = \infty\\ &\lim_{\beta \to 0} \gamma_1 = \lim_{\mu\to 1} \gamma_1= - \infty\\ &\lim_{\alpha\to \infty} \gamma_1 = -\frac{2}{\sqrt{\beta}},\quad \lim_{\beta\to 0}(\lim_{\alpha\to \infty} \gamma_1) = -\infty,\quad \lim_{\beta\to \infty}(\lim_{\alpha\to \infty} \gamma_1) = 0\\ &\lim_{\beta\to \infty} \gamma_1 = \frac{2}{\sqrt{\alpha}},\quad \lim_{\alpha\to 0}(\lim_{\beta\to \infty} \gamma_1) = \infty,\quad \lim_{\alpha\to \infty}(\lim_{\beta \to \infty} \gamma_1) = 0\\ &\lim_{\nu \to 0} \gamma_1 = \frac{1-2\mu}{\sqrt{\mu(1-\mu)}},\quad \lim_{\mu\to 0}(\lim_{\nu \to 0} \gamma_1) = \infty,\quad \lim_{\mu\to 1}(\lim_{\nu \to 0} \gamma_1) = - \infty \end{align}
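The closed form for γ1 is easy to verify against SciPy's moment machinery. A minimal sketch (parameter values are arbitrary examples, not from the article):
<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import beta as beta_dist

def beta_skewness(a, b):
    # gamma_1 = 2 (b - a) sqrt(a + b + 1) / ((a + b + 2) sqrt(a b))
    return 2 * (b - a) * np.sqrt(a + b + 1) / ((a + b + 2) * np.sqrt(a * b))

for a, b in [(2.0, 5.0), (5.0, 2.0), (3.0, 3.0)]:
    closed = beta_skewness(a, b)
    scipy_val = float(beta_dist.stats(a, b, moments='s'))
    print(a, b, closed, scipy_val)
</syntaxhighlight>
The symmetric case (3, 3) prints zero, and swapping the parameters flips the sign, matching the skew-symmetry noted above.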


Kurtosis

The beta distribution has been applied in acoustic analysis to assess damage to gears, as the kurtosis of the beta distribution has been reported to be a good indicator of the condition of a gear. Kurtosis has also been used to distinguish the seismic signal generated by a person's footsteps from other signals. As persons or other targets moving on the ground generate continuous signals in the form of seismic waves, one can separate different targets based on the seismic waves they generate. Kurtosis is sensitive to impulsive signals, so it's much more sensitive to the signal generated by human footsteps than other signals generated by vehicles, winds, noise, etc. Unfortunately, the notation for kurtosis has not been standardized. Kenney and Keeping use the symbol γ2 for the
excess kurtosis
, but Abramowitz and Stegun use different terminology. To prevent confusion between kurtosis (the fourth moment centered on the mean, normalized by the square of the variance) and excess kurtosis, when using symbols, they will be spelled out as follows: :\begin \text &=\text - 3\\ &=\frac-3\\ &=\frac\\ &=\frac . \end Letting α = β in the above expression one obtains :\text =- \frac \text\alpha=\beta . Therefore, for symmetric beta distributions, the excess kurtosis is negative, increasing from a minimum value of −2 at the limit as → 0, and approaching a maximum value of zero as → ∞. The value of −2 is the minimum value of excess kurtosis that any distribution (not just beta distributions, but any distribution of any possible kind) can ever achieve. This minimum value is reached when all the probability density is entirely concentrated at each end ''x'' = 0 and ''x'' = 1, with nothing in between: a 2-point
Bernoulli distribution
with equal probability 1/2 at each end (a coin toss: see section below "Kurtosis bounded by the square of the skewness" for further discussion). The description of kurtosis as a measure of the "potential outliers" (or "potential rare, extreme values") of the probability distribution, is correct for all distributions including the beta distribution. When rare, extreme values can occur in the beta distribution, the higher its kurtosis; otherwise, the kurtosis is lower. For α ≠ β, skewed beta distributions, the excess kurtosis can reach unlimited positive values (particularly for α → 0 for finite β, or for β → 0 for finite α) because the side away from the mode will produce occasional extreme values. Minimum kurtosis takes place when the mass density is concentrated equally at each end (and therefore the mean is at the center), and there is no probability mass density in between the ends. Using the parametrization in terms of mean μ and sample size ν = α + β: : \begin \alpha & = \mu \nu ,\text\nu =(\alpha + \beta) >0\\ \beta & = (1 - \mu) \nu , \text\nu =(\alpha + \beta) >0. \end one can express the excess kurtosis in terms of the mean μ and the sample size ν as follows: :\text =\frac\bigg (\frac - 1 \bigg ) The excess kurtosis can also be expressed in terms of just the following two parameters: the variance ''var'', and the sample size ν as follows: :\text =\frac\left(\frac - 6 - 5 \nu \right)\text\text< \mu(1-\mu) and, in terms of the variance ''var'' and the mean μ as follows: :\text =\frac\text\text< \mu(1-\mu) The plot of excess kurtosis as a function of the variance and the mean shows that the minimum value of the excess kurtosis (−2, which is the minimum possible value for excess kurtosis for any distribution) is intimately coupled with the maximum value of variance (1/4) and the symmetry condition: the mean occurring at the midpoint (μ = 1/2). This occurs for the symmetric case of α = β = 0, with zero skewness. At the limit, this is the 2 point
Bernoulli distribution
with equal probability 1/2 at each Dirac delta function end ''x'' = 0 and ''x'' = 1 and zero probability everywhere else. (A coin toss: one face of the coin being ''x'' = 0 and the other face being ''x'' = 1.) Variance is maximum because the distribution is bimodal with nothing in between the two modes (spikes) at each end. Excess kurtosis is minimum: the probability density "mass" is zero at the mean and it is concentrated at the two peaks at each end. Excess kurtosis reaches the minimum possible value (for any distribution) when the probability density function has two spikes at each end: it is bi-"peaky" with nothing in between them. On the other hand, the plot shows that for extreme skewed cases, where the mean is located near one or the other end (μ = 0 or μ = 1), the variance is close to zero, and the excess kurtosis rapidly approaches infinity when the mean of the distribution approaches either end. Alternatively, the excess kurtosis can also be expressed in terms of just the following two parameters: the square of the skewness, and the sample size ν as follows: :\text =\frac\bigg(\frac (\text)^2 - 1\bigg)\text^2-2< \text< \frac (\text)^2 From this last expression, one can obtain the same limits published practically a century ago by Karl Pearson in his paper, for the beta distribution (see section below titled "Kurtosis bounded by the square of the skewness"). Setting α + β= ν = 0 in the above expression, one obtains Pearson's lower boundary (values for the skewness and excess kurtosis below the boundary (excess kurtosis + 2 − skewness2 = 0) cannot occur for any distribution, and hence Karl Pearson appropriately called the region below this boundary the "impossible region"). The limit of α + β = ν → ∞ determines Pearson's upper boundary. : \begin &\lim_\text = (\text)^2 - 2\\ &\lim_\text = \tfrac (\text)^2 \end therefore: :(\text)^2-2< \text< \tfrac (\text)^2 Values of ν = α + β such that ν ranges from zero to infinity, 0 < ν < ∞, span the whole region of the beta distribution in the plane of excess kurtosis versus squared skewness. For the symmetric case (α = β), the following limits apply: : \begin &\lim_ \text = - 2 \\ &\lim_ \text = 0 \\ &\lim_ \text = - \frac \end For the unsymmetric cases (α ≠ β) the following limits (with only the noted variable approaching the limit) can be obtained from the above expressions: : \begin &\lim_\text =\lim_ \text = \lim_\text = \lim_\text =\infty\\ &\lim_\text = \frac,\text \lim_(\lim_ \text) = \infty,\text \lim_(\lim_ \text) = 0\\ &\lim_\text = \frac,\text \lim_(\lim_ \text) = \infty,\text \lim_(\lim_ \text) = 0\\ &\lim_ \text = - 6 + \frac,\text \lim_(\lim_ \text) = \infty,\text \lim_(\lim_ \text) = \infty \end
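The excess kurtosis of Beta(''α'', ''β'') has the closed form 6[(α − β)²(α + β + 1) − αβ(α + β + 2)] / [αβ(α + β + 2)(α + β + 3)], which reduces to −6/(2α + 3) for α = β. A small numerical check (illustrative parameter values; SciPy assumed available) against SciPy's Fisher (excess) kurtosis:
<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import beta as beta_dist

def beta_excess_kurtosis(a, b):
    num = 6 * ((a - b) ** 2 * (a + b + 1) - a * b * (a + b + 2))
    den = a * b * (a + b + 2) * (a + b + 3)
    return num / den

for a, b in [(2.0, 2.0), (0.5, 0.5), (2.0, 8.0)]:
    print(a, b, beta_excess_kurtosis(a, b), float(beta_dist.stats(a, b, moments='k')))
</syntaxhighlight>
The arcsine case (0.5, 0.5) returns −3/2, and symmetric shapes approach the universal minimum of −2 as both parameters shrink toward zero, as described above.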


Characteristic function

The Characteristic function (probability theory), characteristic function is the Fourier transform of the probability density function. The characteristic function of the beta distribution is confluent hypergeometric function, Kummer's confluent hypergeometric function (of the first kind): :\begin \varphi_X(\alpha;\beta;t) &= \operatorname\left[e^\right]\\ &= \int_0^1 e^ f(x;\alpha,\beta) dx \\ &=_1F_1(\alpha; \alpha+\beta; it)\!\\ &=\sum_^\infty \frac \\ &= 1 +\sum_^ \left( \prod_^ \frac \right) \frac \end where : x^=x(x+1)(x+2)\cdots(x+n-1) is the rising factorial, also called the "Pochhammer symbol". The value of the characteristic function for ''t'' = 0, is one: : \varphi_X(\alpha;\beta;0)=_1F_1(\alpha; \alpha+\beta; 0) = 1 . Also, the real and imaginary parts of the characteristic function enjoy the following symmetries with respect to the origin of variable ''t'': : \textrm \left [ _1F_1(\alpha; \alpha+\beta; it) \right ] = \textrm \left [ _1F_1(\alpha; \alpha+\beta; - it) \right ] : \textrm \left [ _1F_1(\alpha; \alpha+\beta; it) \right ] = - \textrm \left [ _1F_1(\alpha; \alpha+\beta; - it) \right ] The symmetric case α = β simplifies the characteristic function of the beta distribution to a Bessel function, since in the special case α + β = 2α the confluent hypergeometric function (of the first kind) reduces to a Bessel function (the modified Bessel function of the first kind I_ ) using Ernst Kummer, Kummer's second transformation as follows: Another example of the symmetric case α = β = n/2 for beamforming applications can be found in Figure 11 of :\begin _1F_1(\alpha;2\alpha; it) &= e^ _0F_1 \left(; \alpha+\tfrac; \frac \right) \\ &= e^ \left(\frac\right)^ \Gamma\left(\alpha+\tfrac\right) I_\left(\frac\right).\end In the accompanying plots, the Complex number, real part (Re) of the Characteristic function (probability theory), characteristic function of the beta distribution is displayed for symmetric (α = β) and skewed (α ≠ β) cases.
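Since the confluent hypergeometric series is entire, the characteristic function can be evaluated directly from its rising-factorial series and compared with numerical integration of e^{itx} against the density. A sketch (truncation length, parameter values and helper names are arbitrary choices for illustration):
<syntaxhighlight lang="python">
import numpy as np
from scipy.integrate import quad
from scipy.stats import beta as beta_dist

def cf_series(a, b, t, N=200):
    # 1F1(a; a+b; it) = sum_k  a^(k) / (a+b)^(k) * (it)^k / k!
    total = 1.0 + 0.0j
    term = 1.0 + 0.0j
    for k in range(N):
        term *= (a + k) / (a + b + k) * (1j * t) / (k + 1)  # ratio of consecutive terms
        total += term
    return total

def cf_numeric(a, b, t):
    re = quad(lambda x: np.cos(t * x) * beta_dist.pdf(x, a, b), 0, 1)[0]
    im = quad(lambda x: np.sin(t * x) * beta_dist.pdf(x, a, b), 0, 1)[0]
    return re + 1j * im

a, b, t = 2.0, 5.0, 3.0
print(cf_series(a, b, t), cf_numeric(a, b, t))
</syntaxhighlight>
Evaluating at t and −t also exhibits the real/imaginary symmetries listed above.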


Other moments


Moment generating function

It also follows that the moment generating function is :\begin{align} M_X(\alpha; \beta; t) &= \operatorname{E}\left[e^{tX}\right] \\[4pt] &= \int_0^1 e^{tx} f(x;\alpha,\beta)\,dx \\[4pt] &= {}_1F_1(\alpha; \alpha+\beta; t) \\[4pt] &= \sum_{n=0}^\infty \frac{\alpha^{(n)}}{(\alpha+\beta)^{(n)}} \frac{t^n}{n!} \\[4pt] &= 1 +\sum_{k=1}^{\infty} \left( \prod_{r=0}^{k-1} \frac{\alpha+r}{\alpha+\beta+r} \right) \frac{t^k}{k!} \end{align} In particular ''M''''X''(''α''; ''β''; 0) = 1.


Higher moments

Using the moment generating function, the ''k''-th raw moment is given by the factor :\prod_{r=0}^{k-1} \frac{\alpha+r}{\alpha+\beta+r} multiplying the (exponential series) term \left(\frac{t^k}{k!}\right) in the series of the moment generating function :\operatorname{E}[X^k]= \frac{\alpha^{(k)}}{(\alpha+\beta)^{(k)}} = \prod_{r=0}^{k-1} \frac{\alpha+r}{\alpha+\beta+r} where (''x'')(''k'') is a Pochhammer symbol representing rising factorial. It can also be written in a recursive form as :\operatorname{E}[X^k] = \frac{\alpha + k - 1}{\alpha + \beta + k - 1}\operatorname{E}[X^{k-1}]. Since the moment generating function M_X(\alpha; \beta; \cdot) has a positive radius of convergence, the beta distribution is Moment problem, determined by its moments.
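The recursion gives all raw moments from E[X⁰] = 1 in a few lines. A minimal sketch (arbitrary parameters; SciPy is only used as an independent reference):
<syntaxhighlight lang="python">
from scipy.stats import beta as beta_dist

def beta_raw_moments(a, b, k_max):
    # E[X^k] = ((a + k - 1)/(a + b + k - 1)) * E[X^(k-1)], starting from E[X^0] = 1
    moments, m = [], 1.0
    for k in range(1, k_max + 1):
        m *= (a + k - 1) / (a + b + k - 1)
        moments.append(m)
    return moments

a, b = 2.0, 5.0
print(beta_raw_moments(a, b, 4))
print([float(beta_dist.moment(k, a, b)) for k in range(1, 5)])
</syntaxhighlight>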


Moments of transformed random variables


=Moments of linearly transformed, product and inverted random variables

= One can also show the following expectations for a transformed random variable, where the random variable ''X'' is Beta-distributed with parameters α and β: ''X'' ~ Beta(α, β). The expected value of the variable 1 − ''X'' is the mirror-symmetry of the expected value based on ''X'': :\begin & \operatorname[1-X] = \frac \\ & \operatorname[X (1-X)] =\operatorname[(1-X)X ] =\frac \end Due to the mirror-symmetry of the probability density function of the beta distribution, the variances based on variables ''X'' and 1 − ''X'' are identical, and the covariance on ''X''(1 − ''X'' is the negative of the variance: :\operatorname[(1-X)]=\operatorname[X] = -\operatorname[X,(1-X)]= \frac These are the expected values for inverted variables, (these are related to the harmonic means, see ): :\begin & \operatorname \left [\frac \right ] = \frac \text \alpha > 1\\ & \operatorname\left [\frac \right ] =\frac \text \beta > 1 \end The following transformation by dividing the variable ''X'' by its mirror-image ''X''/(1 − ''X'') results in the expected value of the "inverted beta distribution" or
beta prime distribution
(also known as beta distribution of the second kind or Pearson distribution, Pearson's Type VI): : \begin & \operatorname\left[\frac\right] =\frac \text\beta > 1\\ & \operatorname\left[\frac\right] =\frac\text\alpha > 1 \end Variances of these transformed variables can be obtained by integration, as the expected values of the second moments centered on the corresponding variables: :\operatorname \left[\frac \right] =\operatorname\left[\left(\frac - \operatorname\left[\frac \right ] \right )^2\right]= :\operatorname\left [\frac \right ] =\operatorname \left [\left (\frac - \operatorname\left [\frac \right ] \right )^2 \right ]= \frac \text\alpha > 2 The following variance of the variable ''X'' divided by its mirror-image (''X''/(1−''X'') results in the variance of the "inverted beta distribution" or
beta prime distribution
(also known as beta distribution of the second kind or Pearson distribution, Pearson's Type VI): :\operatorname \left [\frac \right ] =\operatorname \left [\left(\frac - \operatorname \left [\frac \right ] \right)^2 \right ]=\operatorname \left [\frac \right ] = :\operatorname \left [\left (\frac - \operatorname \left [\frac \right ] \right )^2 \right ]= \frac \text\beta > 2 The covariances are: :\operatorname\left [\frac,\frac \right ] = \operatorname\left[\frac,\frac \right] =\operatorname\left[\frac,\frac\right ] = \operatorname\left[\frac,\frac \right] =\frac \text \alpha, \beta > 1 These expectations and variances appear in the four-parameter Fisher information matrix (.)
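Two of the simpler expectations above, E[1/X] = (α + β − 1)/(α − 1) for α > 1 and E[X/(1 − X)] = α/(β − 1) for β > 1, can be spot-checked by simulation. A sketch (the seed and sample size are arbitrary; NumPy assumed available):
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
a, b, n = 3.0, 4.0, 1_000_000
x = rng.beta(a, b, n)

print(np.mean(1 / x), (a + b - 1) / (a - 1))   # E[1/X]
print(np.mean(x / (1 - x)), a / (b - 1))       # E[X/(1-X)], the beta prime mean
</syntaxhighlight>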


=Moments of logarithmically transformed random variables

= Expected values for Logarithm transformation, logarithmic transformations (useful for maximum likelihood estimates, see ) are discussed in this section. The following logarithmic linear transformations are related to the geometric means ''GX'' and ''G''(1−''X'') (see ): :\begin \operatorname[\ln(X)] &= \psi(\alpha) - \psi(\alpha + \beta)= - \operatorname\left[\ln \left (\frac \right )\right],\\ \operatorname[\ln(1-X)] &=\psi(\beta) - \psi(\alpha + \beta)= - \operatorname \left[\ln \left (\frac \right )\right]. \end Where the
digamma function
ψ(α) is defined as the logarithmic derivative of the
gamma function
: :\psi(\alpha) = \frac Logit transformations are interesting, as they usually transform various shapes (including J-shapes) into (usually skewed) bell-shaped densities over the logit variable, and they may remove the end singularities over the original variable: :\begin \operatorname\left[\ln \left (\frac \right ) \right] &=\psi(\alpha) - \psi(\beta)= \operatorname[\ln(X)] +\operatorname \left[\ln \left (\frac \right) \right],\\ \operatorname\left [\ln \left (\frac \right ) \right ] &=\psi(\beta) - \psi(\alpha)= - \operatorname \left[\ln \left (\frac \right) \right] . \end Johnson considered the distribution of the logit - transformed variable ln(''X''/1−''X''), including its moment generating function and approximations for large values of the shape parameters. This transformation extends the finite support
[0, 1]
based on the original variable ''X'' to infinite support in both directions of the real line (−∞, +∞). Higher order logarithmic moments can be derived by using the representation of a beta distribution as a proportion of two Gamma distributions and differentiating through the integral. They can be expressed in terms of higher order poly-gamma functions as follows: :\begin \operatorname \left [\ln^2(X) \right ] &= (\psi(\alpha) - \psi(\alpha + \beta))^2+\psi_1(\alpha)-\psi_1(\alpha+\beta), \\ \operatorname \left [\ln^2(1-X) \right ] &= (\psi(\beta) - \psi(\alpha + \beta))^2+\psi_1(\beta)-\psi_1(\alpha+\beta), \\ \operatorname \left [\ln (X)\ln(1-X) \right ] &=(\psi(\alpha) - \psi(\alpha + \beta))(\psi(\beta) - \psi(\alpha + \beta)) -\psi_1(\alpha+\beta). \end therefore the
variance
of the logarithmic variables and
covariance
of ln(''X'') and ln(1−''X'') are: :\begin \operatorname[\ln(X), \ln(1-X)] &= \operatorname\left[\ln(X)\ln(1-X)\right] - \operatorname[\ln(X)]\operatorname[\ln(1-X)] = -\psi_1(\alpha+\beta) \\ & \\ \operatorname[\ln X] &= \operatorname[\ln^2(X)] - (\operatorname[\ln(X)])^2 \\ &= \psi_1(\alpha) - \psi_1(\alpha + \beta) \\ &= \psi_1(\alpha) + \operatorname[\ln(X), \ln(1-X)] \\ & \\ \operatorname ln (1-X)&= \operatorname[\ln^2 (1-X)] - (\operatorname[\ln (1-X)])^2 \\ &= \psi_1(\beta) - \psi_1(\alpha + \beta) \\ &= \psi_1(\beta) + \operatorname[\ln (X), \ln(1-X)] \end where the
trigamma function
, denoted ψ1(α), is the second of the
polygamma function
s, and is defined as the derivative of the digamma function: :\psi_1(\alpha) = \frac= \frac. The variances and covariance of the logarithmically transformed variables ''X'' and (1−''X'') are different, in general, because the logarithmic transformation destroys the mirror-symmetry of the original variables ''X'' and (1−''X''), as the logarithm approaches negative infinity for the variable approaching zero. These logarithmic variances and covariance are the elements of the
Fisher information
matrix for the beta distribution. They are also a measure of the curvature of the log likelihood function (see section on Maximum likelihood estimation). The variances of the log inverse variables are identical to the variances of the log variables: :\begin \operatorname\left[\ln \left (\frac \right ) \right] & =\operatorname[\ln(X)] = \psi_1(\alpha) - \psi_1(\alpha + \beta), \\ \operatorname\left[\ln \left (\frac \right ) \right] &=\operatorname ln (1-X)= \psi_1(\beta) - \psi_1(\alpha + \beta), \\ \operatorname\left[\ln \left (\frac \right), \ln \left (\frac\right ) \right] &=\operatorname[\ln(X),\ln(1-X)]= -\psi_1(\alpha + \beta).\end It also follows that the variances of the logit transformed variables are: :\operatorname\left[\ln \left (\frac \right )\right]=\operatorname\left[\ln \left (\frac \right ) \right]=-\operatorname\left [\ln \left (\frac \right ), \ln \left (\frac \right ) \right]= \psi_1(\alpha) + \psi_1(\beta)
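The digamma/trigamma expressions for the logarithmic moments can be verified by Monte Carlo. A short sketch (seed, sample size and parameter values are arbitrary illustrations; SciPy's special functions are assumed available):
<syntaxhighlight lang="python">
import numpy as np
from scipy.special import digamma, polygamma

rng = np.random.default_rng(1)
a, b, n = 2.5, 4.0, 1_000_000
x = rng.beta(a, b, n)

# E[ln X] = psi(a) - psi(a+b)
print(np.mean(np.log(x)), digamma(a) - digamma(a + b))
# var[ln X] = psi_1(a) - psi_1(a+b), with psi_1 the trigamma function
print(np.var(np.log(x)), polygamma(1, a) - polygamma(1, a + b))
# cov[ln X, ln(1-X)] = -psi_1(a+b)
print(np.cov(np.log(x), np.log1p(-x))[0, 1], -polygamma(1, a + b))
</syntaxhighlight>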


Quantities of information (entropy)

Given a beta distributed random variable, ''X'' ~ Beta(''α'', ''β''), the information entropy, differential entropy of ''X'' is (measured in Nat (unit), nats), the expected value of the negative of the logarithm of the
probability density function
: :\begin h(X) &= \operatorname[-\ln(f(x;\alpha,\beta))] \\ pt&=\int_0^1 -f(x;\alpha,\beta)\ln(f(x;\alpha,\beta)) \, dx \\ pt&= \ln(\Beta(\alpha,\beta))-(\alpha-1)\psi(\alpha)-(\beta-1)\psi(\beta)+(\alpha+\beta-2) \psi(\alpha+\beta) \end where ''f''(''x''; ''α'', ''β'') is the
probability density function
of the beta distribution: :f(x;\alpha,\beta) = \frac x^(1-x)^ The
digamma function
''ψ'' appears in the formula for the differential entropy as a consequence of Euler's integral formula for the harmonic numbers which follows from the integral: :\int_0^1 \frac \, dx = \psi(\alpha)-\psi(1) The information entropy, differential entropy of the beta distribution is negative for all values of ''α'' and ''β'' greater than zero, except at ''α'' = ''β'' = 1 (for which values the beta distribution is the same as the Uniform distribution (continuous), uniform distribution), where the information entropy, differential entropy reaches its Maxima and minima, maximum value of zero. It is to be expected that the maximum entropy should take place when the beta distribution becomes equal to the uniform distribution, since uncertainty is maximal when all possible events are equiprobable. For ''α'' or ''β'' approaching zero, the information entropy, differential entropy approaches its Maxima and minima, minimum value of negative infinity. For (either or both) ''α'' or ''β'' approaching zero, there is a maximum amount of order: all the probability density is concentrated at the ends, and there is zero probability density at points located between the ends. Similarly for (either or both) ''α'' or ''β'' approaching infinity, the differential entropy approaches its minimum value of negative infinity, and a maximum amount of order. If either ''α'' or ''β'' approaches infinity (and the other is finite) all the probability density is concentrated at an end, and the probability density is zero everywhere else. If both shape parameters are equal (the symmetric case), ''α'' = ''β'', and they approach infinity simultaneously, the probability density becomes a spike ( Dirac delta function) concentrated at the middle ''x'' = 1/2, and hence there is 100% probability at the middle ''x'' = 1/2 and zero probability everywhere else. The (continuous case) information entropy, differential entropy was introduced by Shannon in his original paper (where he named it the "entropy of a continuous distribution"), as the concluding part of the same paper where he defined the information entropy, discrete entropy. It is known since then that the differential entropy may differ from the infinitesimal limit of the discrete entropy by an infinite offset, therefore the differential entropy can be negative (as it is for the beta distribution). What really matters is the relative value of entropy. Given two beta distributed random variables, ''X''1 ~ Beta(''α'', ''β'') and ''X''2 ~ Beta(''α''′, ''β''′), the cross entropy is (measured in nats) :\begin H(X_1,X_2) &= \int_0^1 - f(x;\alpha,\beta) \ln (f(x;\alpha',\beta')) \,dx \\ pt&= \ln \left(\Beta(\alpha',\beta')\right)-(\alpha'-1)\psi(\alpha)-(\beta'-1)\psi(\beta)+(\alpha'+\beta'-2)\psi(\alpha+\beta). \end The cross entropy has been used as an error metric to measure the distance between two hypotheses. Its absolute value is minimum when the two distributions are identical. It is the information measure most closely related to the log maximum likelihood (see section on "Parameter estimation. Maximum likelihood estimation")). The relative entropy, or Kullback–Leibler divergence ''D''KL(''X''1 , , ''X''2), is a measure of the inefficiency of assuming that the distribution is ''X''2 ~ Beta(''α''′, ''β''′) when the distribution is really ''X''1 ~ Beta(''α'', ''β''). It is defined as follows (measured in nats). 
:\begin D_(X_1, , X_2) &= \int_0^1 f(x;\alpha,\beta) \ln \left (\frac \right ) \, dx \\ pt&= \left (\int_0^1 f(x;\alpha,\beta) \ln (f(x;\alpha,\beta)) \,dx \right )- \left (\int_0^1 f(x;\alpha,\beta) \ln (f(x;\alpha',\beta')) \, dx \right )\\ pt&= -h(X_1) + H(X_1,X_2)\\ pt&= \ln\left(\frac\right)+(\alpha-\alpha')\psi(\alpha)+(\beta-\beta')\psi(\beta)+(\alpha'-\alpha+\beta'-\beta)\psi (\alpha + \beta). \end The relative entropy, or Kullback–Leibler divergence, is always non-negative. A few numerical examples follow: *''X''1 ~ Beta(1, 1) and ''X''2 ~ Beta(3, 3); ''D''KL(''X''1 , , ''X''2) = 0.598803; ''D''KL(''X''2 , , ''X''1) = 0.267864; ''h''(''X''1) = 0; ''h''(''X''2) = −0.267864 *''X''1 ~ Beta(3, 0.5) and ''X''2 ~ Beta(0.5, 3); ''D''KL(''X''1 , , ''X''2) = 7.21574; ''D''KL(''X''2 , , ''X''1) = 7.21574; ''h''(''X''1) = −1.10805; ''h''(''X''2) = −1.10805. The Kullback–Leibler divergence is not symmetric ''D''KL(''X''1 , , ''X''2) ≠ ''D''KL(''X''2 , , ''X''1) for the case in which the individual beta distributions Beta(1, 1) and Beta(3, 3) are symmetric, but have different entropies ''h''(''X''1) ≠ ''h''(''X''2). The value of the Kullback divergence depends on the direction traveled: whether going from a higher (differential) entropy to a lower (differential) entropy or the other way around. In the numerical example above, the Kullback divergence measures the inefficiency of assuming that the distribution is (bell-shaped) Beta(3, 3), rather than (uniform) Beta(1, 1). The "h" entropy of Beta(1, 1) is higher than the "h" entropy of Beta(3, 3) because the uniform distribution Beta(1, 1) has a maximum amount of disorder. The Kullback divergence is more than two times higher (0.598803 instead of 0.267864) when measured in the direction of decreasing entropy: the direction that assumes that the (uniform) Beta(1, 1) distribution is (bell-shaped) Beta(3, 3) rather than the other way around. In this restricted sense, the Kullback divergence is consistent with the second law of thermodynamics. The Kullback–Leibler divergence is symmetric ''D''KL(''X''1 , , ''X''2) = ''D''KL(''X''2 , , ''X''1) for the skewed cases Beta(3, 0.5) and Beta(0.5, 3) that have equal differential entropy ''h''(''X''1) = ''h''(''X''2). The symmetry condition: :D_(X_1, , X_2) = D_(X_2, , X_1),\texth(X_1) = h(X_2),\text\alpha \neq \beta follows from the above definitions and the mirror-symmetry ''f''(''x''; ''α'', ''β'') = ''f''(1−''x''; ''α'', ''β'') enjoyed by the beta distribution.
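The differential entropy and the Kullback–Leibler divergence both reduce to combinations of the log Beta function and the digamma function, so the numerical examples quoted above are easy to reproduce. A sketch (helper names are arbitrary; SciPy assumed available):
<syntaxhighlight lang="python">
import numpy as np
from scipy.special import betaln, digamma

def beta_entropy(a, b):
    # h = ln B(a,b) - (a-1) psi(a) - (b-1) psi(b) + (a+b-2) psi(a+b)
    return (betaln(a, b) - (a - 1) * digamma(a) - (b - 1) * digamma(b)
            + (a + b - 2) * digamma(a + b))

def beta_kl(a1, b1, a2, b2):
    # D_KL( Beta(a1,b1) || Beta(a2,b2) ), in nats
    return (betaln(a2, b2) - betaln(a1, b1)
            + (a1 - a2) * digamma(a1) + (b1 - b2) * digamma(b1)
            + (a2 - a1 + b2 - b1) * digamma(a1 + b1))

print(beta_entropy(1, 1), beta_entropy(3, 3))            # 0.0 and -0.267864...
print(beta_kl(1, 1, 3, 3), beta_kl(3, 3, 1, 1))          # 0.598803... and 0.267864...
print(beta_kl(3, 0.5, 0.5, 3), beta_kl(0.5, 3, 3, 0.5))  # both 7.21574...
</syntaxhighlight>
The printed values match the asymmetric pair for Beta(1, 1) versus Beta(3, 3) and the symmetric pair for Beta(3, 0.5) versus Beta(0.5, 3) discussed above.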


Relationships between statistical measures


Mean, mode and median relationship

If 1 < α < β then mode ≤ median ≤ mean.Kerman J (2011) "A closed-form approximation for the median of the beta distribution". Expressing the mode (only for α, β > 1), and the mean in terms of α and β: : \frac{\alpha - 1}{\alpha + \beta - 2} \le \text{median} \le \frac{\alpha}{\alpha + \beta} , If 1 < β < α then the order of the inequalities is reversed. For α, β > 1 the absolute distance between the mean and the median is less than 5% of the distance between the maximum and minimum values of ''x''. On the other hand, the absolute distance between the mean and the mode can reach 50% of the distance between the maximum and minimum values of ''x'', for the (pathological) case of α = 1 and β = 1, for which values the beta distribution approaches the uniform distribution and the differential entropy approaches its maximum value, and hence maximum "disorder". For example, for α = 1.0001 and β = 1.00000001: * mode = 0.9999; PDF(mode) = 1.00010 * mean = 0.500025; PDF(mean) = 1.00003 * median = 0.500035; PDF(median) = 1.00003 * mean − mode = −0.499875 * mean − median = −9.65538 × 10−6 where PDF stands for the value of the
probability density function.
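The Kerman reference cited above gives a closed-form approximation of the median; a commonly quoted form is median ≈ (α − 1/3)/(α + β − 2/3) for α, β > 1 (stated here as an assumption, since the exact expression is not reproduced in this article). A quick comparison with SciPy's exact quantile (parameter values are arbitrary examples):
<syntaxhighlight lang="python">
from scipy.stats import beta as beta_dist

def approx_median(a, b):
    # closed-form approximation, intended for a, b > 1
    return (a - 1/3) / (a + b - 2/3)

for a, b in [(2.0, 3.0), (5.0, 1.5), (10.0, 10.0)]:
    print(a, b, approx_median(a, b), beta_dist.median(a, b))
</syntaxhighlight>
For each pair the approximation sits between the mode and the mean, consistent with the ordering stated above.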


Mean, geometric mean and harmonic mean relationship

It is known from the inequality of arithmetic and geometric means that the geometric mean is lower than the mean. Similarly, the harmonic mean is lower than the geometric mean. The accompanying plot shows that for α = β, both the mean and the median are exactly equal to 1/2, regardless of the value of α = β, and the mode is also equal to 1/2 for α = β > 1, however the geometric and harmonic means are lower than 1/2 and they only approach this value asymptotically as α = β → ∞.


Kurtosis bounded by the square of the skewness

As remarked by William Feller, Feller, in the Pearson distribution, Pearson system the beta probability density appears as Pearson distribution, type I (any difference between the beta distribution and Pearson's type I distribution is only superficial and it makes no difference for the following discussion regarding the relationship between kurtosis and skewness). Karl Pearson showed, in Plate 1 of his paper published in 1916, a graph with the kurtosis as the vertical axis (ordinate) and the square of the
skewness
as the horizontal axis (abscissa), in which a number of distributions were displayed. The region occupied by the beta distribution is bounded by the following two Line (geometry), lines in the (skewness2,kurtosis) Cartesian coordinate system, plane, or the (skewness2,excess kurtosis) Cartesian coordinate system, plane: :(\text)^2+1< \text< \frac (\text)^2 + 3 or, equivalently, :(\text)^2-2< \text< \frac (\text)^2 At a time when there were no powerful digital computers, Karl Pearson accurately computed further boundaries, for example, separating the "U-shaped" from the "J-shaped" distributions. The lower boundary line (excess kurtosis + 2 − skewness2 = 0) is produced by skewed "U-shaped" beta distributions with both values of shape parameters α and β close to zero. The upper boundary line (excess kurtosis − (3/2) skewness2 = 0) is produced by extremely skewed distributions with very large values of one of the parameters and very small values of the other parameter. Karl Pearson showed that this upper boundary line (excess kurtosis − (3/2) skewness2 = 0) is also the intersection with Pearson's distribution III, which has unlimited support in one direction (towards positive infinity), and can be bell-shaped or J-shaped. His son, Egon Pearson, showed that the region (in the kurtosis/squared-skewness plane) occupied by the beta distribution (equivalently, Pearson's distribution I) as it approaches this boundary (excess kurtosis − (3/2) skewness2 = 0) is shared with the noncentral chi-squared distribution. Karl Pearson (Pearson 1895, pp. 357, 360, 373–376) also showed that the gamma distribution is a Pearson type III distribution. Hence this boundary line for Pearson's type III distribution is known as the gamma line. (This can be shown from the fact that the excess kurtosis of the gamma distribution is 6/''k'' and the square of the skewness is 4/''k'', hence (excess kurtosis − (3/2) skewness2 = 0) is identically satisfied by the gamma distribution regardless of the value of the parameter "k"). Pearson later noted that the chi-squared distribution is a special case of Pearson's type III and also shares this boundary line (as it is apparent from the fact that for the chi-squared distribution the excess kurtosis is 12/''k'' and the square of the skewness is 8/''k'', hence (excess kurtosis − (3/2) skewness2 = 0) is identically satisfied regardless of the value of the parameter "k"). This is to be expected, since the chi-squared distribution ''X'' ~ χ2(''k'') is a special case of the gamma distribution, with parametrization X ~ Γ(k/2, 1/2) where k is a positive integer that specifies the "number of degrees of freedom" of the chi-squared distribution. An example of a beta distribution near the upper boundary (excess kurtosis − (3/2) skewness2 = 0) is given by α = 0.1, β = 1000, for which the ratio (excess kurtosis)/(skewness2) = 1.49835 approaches the upper limit of 1.5 from below. An example of a beta distribution near the lower boundary (excess kurtosis + 2 − skewness2 = 0) is given by α= 0.0001, β = 0.1, for which values the expression (excess kurtosis + 2)/(skewness2) = 1.01621 approaches the lower limit of 1 from above. In the infinitesimal limit for both α and β approaching zero symmetrically, the excess kurtosis reaches its minimum value at −2. This minimum value occurs at the point at which the lower boundary line intersects the vertical axis (ordinate). 
(However, in Pearson's original chart, the ordinate is kurtosis, instead of excess kurtosis, and it increases downwards rather than upwards). Values for the skewness and excess kurtosis below the lower boundary (excess kurtosis + 2 − skewness2 = 0) cannot occur for any distribution, and hence Karl Pearson appropriately called the region below this boundary the "impossible region". The boundary for this "impossible region" is determined by (symmetric or skewed) bimodal "U"-shaped distributions for which the parameters α and β approach zero and hence all the probability density is concentrated at the ends: ''x'' = 0, 1 with practically nothing in between them. Since for α ≈ β ≈ 0 the probability density is concentrated at the two ends ''x'' = 0 and ''x'' = 1, this "impossible boundary" is determined by a
Bernoulli distribution
, where the two only possible outcomes occur with respective probabilities ''p'' and ''q'' = 1−''p''. For cases approaching this limit boundary with symmetry α = β, skewness ≈ 0, excess kurtosis ≈ −2 (this is the lowest excess kurtosis possible for any distribution), and the probabilities are ''p'' ≈ ''q'' ≈ 1/2. For cases approaching this limit boundary with skewness, excess kurtosis ≈ −2 + skewness2, and the probability density is concentrated more at one end than the other end (with practically nothing in between), with probabilities p = \tfrac at the left end ''x'' = 0 and q = 1-p = \tfrac at the right end ''x'' = 1.
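The two boundary lines can be illustrated numerically with the closed forms for skewness and excess kurtosis given earlier; the parameter pairs below are the near-boundary examples quoted in the text (the helper name is an arbitrary choice):
<syntaxhighlight lang="python">
import numpy as np

def skew_and_excess_kurtosis(a, b):
    skew = 2 * (b - a) * np.sqrt(a + b + 1) / ((a + b + 2) * np.sqrt(a * b))
    kurt = (6 * ((a - b) ** 2 * (a + b + 1) - a * b * (a + b + 2))
            / (a * b * (a + b + 2) * (a + b + 3)))
    return skew, kurt

for a, b in [(0.1, 1000.0), (0.0001, 0.1)]:
    s, k = skew_and_excess_kurtosis(a, b)
    # upper boundary: k / s^2 -> 3/2 ; lower boundary: (k + 2) / s^2 -> 1
    print(a, b, k / s**2, (k + 2) / s**2)
</syntaxhighlight>
The first pair approaches the upper (gamma) line from below and the second approaches the lower ("impossible region") line from above, reproducing the ratios 1.49835 and 1.01621 quoted above.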


Symmetry

All statements are conditional on α, β > 0 * Probability density function Symmetry, reflection symmetry ::f(x;\alpha,\beta) = f(1-x;\beta,\alpha) * Cumulative distribution function Symmetry, reflection symmetry plus unitary Symmetry, translation ::F(x;\alpha,\beta) = I_x(\alpha,\beta) = 1- F(1- x;\beta,\alpha) = 1 - I_(\beta,\alpha) * Mode Symmetry, reflection symmetry plus unitary Symmetry, translation ::\operatorname(\Beta(\alpha, \beta))= 1-\operatorname(\Beta(\beta, \alpha)),\text\Beta(\beta, \alpha)\ne \Beta(1,1) * Median Symmetry, reflection symmetry plus unitary Symmetry, translation ::\operatorname (\Beta(\alpha, \beta) )= 1 - \operatorname (\Beta(\beta, \alpha)) * Mean Symmetry, reflection symmetry plus unitary Symmetry, translation ::\mu (\Beta(\alpha, \beta) )= 1 - \mu (\Beta(\beta, \alpha) ) * Geometric Means each is individually asymmetric, the following symmetry applies between the geometric mean based on ''X'' and the geometric mean based on its
reflection
(1-X) ::G_X (\Beta(\alpha, \beta) )=G_(\Beta(\beta, \alpha) ) * Harmonic means each is individually asymmetric, the following symmetry applies between the harmonic mean based on ''X'' and the harmonic mean based on its
reflection
(1-X) ::H_X (\Beta(\alpha, \beta) )=H_(\Beta(\beta, \alpha) ) \text \alpha, \beta > 1 . * Variance symmetry ::\operatorname (\Beta(\alpha, \beta) )=\operatorname (\Beta(\beta, \alpha) ) * Geometric variances each is individually asymmetric, the following symmetry applies between the log geometric variance based on X and the log geometric variance based on its
reflection
(1-X) ::\ln(\operatorname (\Beta(\alpha, \beta))) = \ln(\operatorname(\Beta(\beta, \alpha))) * Geometric covariance symmetry ::\ln \operatorname(\Beta(\alpha, \beta))=\ln \operatorname(\Beta(\beta, \alpha)) * Mean absolute deviation around the mean symmetry ::\operatorname[, X - E ] (\Beta(\alpha, \beta))=\operatorname[, X - E ] (\Beta(\beta, \alpha)) * Skewness Symmetry (mathematics), skew-symmetry ::\operatorname (\Beta(\alpha, \beta) )= - \operatorname (\Beta(\beta, \alpha) ) * Excess kurtosis symmetry ::\text (\Beta(\alpha, \beta) )= \text (\Beta(\beta, \alpha) ) * Characteristic function symmetry of Real part (with respect to the origin of variable "t") :: \text [_1F_1(\alpha; \alpha+\beta; it) ] = \text [ _1F_1(\alpha; \alpha+\beta; - it)] * Characteristic function Symmetry (mathematics), skew-symmetry of Imaginary part (with respect to the origin of variable "t") :: \text [_1F_1(\alpha; \alpha+\beta; it) ] = - \text [ _1F_1(\alpha; \alpha+\beta; - it) ] * Characteristic function symmetry of Absolute value (with respect to the origin of variable "t") :: \text [ _1F_1(\alpha; \alpha+\beta; it) ] = \text [ _1F_1(\alpha; \alpha+\beta; - it) ] * Differential entropy symmetry ::h(\Beta(\alpha, \beta) )= h(\Beta(\beta, \alpha) ) * Relative Entropy (also called Kullback–Leibler divergence) symmetry ::D_(X_1, , X_2) = D_(X_2, , X_1), \texth(X_1) = h(X_2)\text\alpha \neq \beta * Fisher information matrix symmetry ::_ = _


Geometry of the probability density function


Inflection points

For certain values of the shape parameters α and β, the
probability density function
has inflection points, at which the curvature changes sign. The position of these inflection points can be useful as a measure of the Statistical dispersion, dispersion or spread of the distribution. Defining the following quantity: :\kappa =\frac Points of inflection occur, depending on the value of the shape parameters α and β, as follows: *(α > 2, β > 2) The distribution is bell-shaped (symmetric for α = β and skewed otherwise), with two inflection points, equidistant from the mode: ::x = \text \pm \kappa = \frac * (α = 2, β > 2) The distribution is unimodal, positively skewed, right-tailed, with one inflection point, located to the right of the mode: ::x =\text + \kappa = \frac * (α > 2, β = 2) The distribution is unimodal, negatively skewed, left-tailed, with one inflection point, located to the left of the mode: ::x = \text - \kappa = 1 - \frac * (1 < α < 2, β > 2, α+β>2) The distribution is unimodal, positively skewed, right-tailed, with one inflection point, located to the right of the mode: ::x =\text + \kappa = \frac *(0 < α < 1, 1 < β < 2) The distribution has a mode at the left end ''x'' = 0 and it is positively skewed, right-tailed. There is one inflection point, located to the right of the mode: ::x = \frac *(α > 2, 1 < β < 2) The distribution is unimodal negatively skewed, left-tailed, with one inflection point, located to the left of the mode: ::x =\text - \kappa = \frac *(1 < α < 2, 0 < β < 1) The distribution has a mode at the right end ''x''=1 and it is negatively skewed, left-tailed. There is one inflection point, located to the left of the mode: ::x = \frac There are no inflection points in the remaining (symmetric and skewed) regions: U-shaped: (α, β < 1) upside-down-U-shaped: (1 < α < 2, 1 < β < 2), reverse-J-shaped (α < 1, β > 2) or J-shaped: (α > 2, β < 1) The accompanying plots show the inflection point locations (shown vertically, ranging from 0 to 1) versus α and β (the horizontal axes ranging from 0 to 5). There are large cuts at surfaces intersecting the lines α = 1, β = 1, α = 2, and β = 2 because at these values the beta distribution change from 2 modes, to 1 mode to no mode.


Shapes

The beta density function can take a wide variety of different shapes depending on the values of the two parameters ''α'' and ''β''. The ability of the beta distribution to take this great diversity of shapes (using only two parameters) is partly responsible for finding wide application for modeling actual measurements:


=Symmetric (''α'' = ''β'')

= * the density function is symmetry, symmetric about 1/2 (blue & teal plots). * median = mean = 1/2. *skewness = 0. *variance = 1/(4(2α + 1)) *α = β < 1 **U-shaped (blue plot). **bimodal: left mode = 0, right mode =1, anti-mode = 1/2 **1/12 < var(''X'') < 1/4 **−2 < excess kurtosis(''X'') < −6/5 ** α = β = 1/2 is the arcsine distribution *** var(''X'') = 1/8 ***excess kurtosis(''X'') = −3/2 ***CF = Rinc (t) ** α = β → 0 is a 2-point
Bernoulli distribution
with equal probability 1/2 at each Dirac delta function end ''x'' = 0 and ''x'' = 1 and zero probability everywhere else. A coin toss: one face of the coin being ''x'' = 0 and the other face being ''x'' = 1. *** \lim_ \operatorname(X) = \tfrac *** \lim_ \operatorname(X) = - 2 a lower value than this is impossible for any distribution to reach. *** The information entropy, differential entropy approaches a Maxima and minima, minimum value of −∞ *α = β = 1 **the uniform distribution (continuous), uniform
[0, 1]
distribution **no mode **var(''X'') = 1/12 **excess kurtosis(''X'') = −6/5 **The (negative anywhere else) information entropy, differential entropy reaches its Maxima and minima, maximum value of zero **CF = Sinc (t) *''α'' = ''β'' > 1 **symmetric unimodal ** mode = 1/2. **0 < var(''X'') < 1/12 **−6/5 < excess kurtosis(''X'') < 0 **''α'' = ''β'' = 3/2 is a semi-elliptic
[0, 1]
distribution, see: Wigner semicircle distribution ***var(''X'') = 1/16. ***excess kurtosis(''X'') = −1 ***CF = 2 Jinc (t) **''α'' = ''β'' = 2 is the parabolic
[0, 1]
distribution ***var(''X'') = 1/20 ***excess kurtosis(''X'') = −6/7 ***CF = 3 Tinc (t) **''α'' = ''β'' > 2 is bell-shaped, with inflection points located to either side of the mode ***0 < var(''X'') < 1/20 ***−6/7 < excess kurtosis(''X'') < 0 **''α'' = ''β'' → ∞ is a 1-point
degenerate distribution
with a Dirac delta function spike at the midpoint ''x'' = 1/2 with probability 1, and zero probability everywhere else. There is 100% probability (absolute certainty) concentrated at the single point ''x'' = 1/2. *** \lim_ \operatorname(X) = 0 *** \lim_ \operatorname(X) = 0 ***The information entropy, differential entropy approaches a Maxima and minima, minimum value of −∞


=Skewed (''α'' ≠ ''β'')

= The density function is Skewness, skewed. An interchange of parameter values yields the mirror image (the reverse) of the initial curve, some more specific cases: *''α'' < 1, ''β'' < 1 ** U-shaped ** Positive skew for α < β, negative skew for α > β. ** bimodal: left mode = 0, right mode = 1, anti-mode = \tfrac ** 0 < median < 1. ** 0 < var(''X'') < 1/4 *α > 1, β > 1 ** unimodal (magenta & cyan plots), **Positive skew for α < β, negative skew for α > β. **\text= \tfrac ** 0 < median < 1 ** 0 < var(''X'') < 1/12 *α < 1, β ≥ 1 **reverse J-shaped with a right tail, **positively skewed, **strictly decreasing, convex function, convex ** mode = 0 ** 0 < median < 1/2. ** 0 < \operatorname(X) < \tfrac, (maximum variance occurs for \alpha=\tfrac, \beta=1, or α = Φ the Golden ratio, golden ratio conjugate) *α ≥ 1, β < 1 **J-shaped with a left tail, **negatively skewed, **strictly increasing, convex function, convex ** mode = 1 ** 1/2 < median < 1 ** 0 < \operatorname(X) < \tfrac, (maximum variance occurs for \alpha=1, \beta=\tfrac, or β = Φ the Golden ratio, golden ratio conjugate) *α = 1, β > 1 **positively skewed, **strictly decreasing (red plot), **a reversed (mirror-image) power function ,1distribution ** mean = 1 / (β + 1) ** median = 1 - 1/21/β ** mode = 0 **α = 1, 1 < β < 2 ***concave function, concave *** 1-\tfrac< \text < \tfrac *** 1/18 < var(''X'') < 1/12. **α = 1, β = 2 ***a straight line with slope −2, the right-triangular distribution with right angle at the left end, at ''x'' = 0 *** \text=1-\tfrac *** var(''X'') = 1/18 **α = 1, β > 2 ***reverse J-shaped with a right tail, ***convex function, convex *** 0 < \text < 1-\tfrac *** 0 < var(''X'') < 1/18 *α > 1, β = 1 **negatively skewed, **strictly increasing (green plot), **the power function
[0, 1]
distribution ** mean = α / (α + 1) ** median = 1/21/α ** mode = 1 **2 > α > 1, β = 1 ***concave function, concave *** \tfrac < \text < \tfrac *** 1/18 < var(''X'') < 1/12 ** α = 2, β = 1 ***a straight line with slope +2, the right-triangular distribution with right angle at the right end, at ''x'' = 1 *** \text=\tfrac *** var(''X'') = 1/18 **α > 2, β = 1 ***J-shaped with a left tail, convex function, convex ***\tfrac < \text < 1 *** 0 < var(''X'') < 1/18


Related distributions


Transformations

* If ''X'' ~ Beta(''α'', ''β'') then 1 − ''X'' ~ Beta(''β'', ''α'') Mirror image, mirror-image symmetry * If ''X'' ~ Beta(''α'', ''β'') then \tfrac \sim (\alpha,\beta). The
beta prime distribution
, also called "beta distribution of the second kind". * If ''X'' ~ Beta(''α'', ''β'') then \tfrac -1 \sim (\beta,\alpha). * If ''X'' ~ Beta(''n''/2, ''m''/2) then \tfrac \sim F(n,m) (assuming ''n'' > 0 and ''m'' > 0), the F-distribution, Fisher–Snedecor F distribution. * If X \sim \operatorname\left(1+\lambda\tfrac, 1 + \lambda\tfrac\right) then min + ''X''(max − min) ~ PERT(min, max, ''m'', ''λ'') where ''PERT'' denotes a PERT distribution used in PERT analysis, and ''m''=most likely value.Herrerías-Velasco, José Manuel and Herrerías-Pleguezuelo, Rafael and René van Dorp, Johan. (2011). Revisiting the PERT mean and Variance. European Journal of Operational Research (210), p. 448–451. Traditionally ''λ'' = 4 in PERT analysis. * If ''X'' ~ Beta(1, ''β'') then ''X'' ~ Kumaraswamy distribution with parameters (1, ''β'') * If ''X'' ~ Beta(''α'', 1) then ''X'' ~ Kumaraswamy distribution with parameters (''α'', 1) * If ''X'' ~ Beta(''α'', 1) then −ln(''X'') ~ Exponential(''α'')
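The first two transformations in the list above can be checked by simulation with a Kolmogorov–Smirnov test (seed, sample size and shape parameters are arbitrary choices; SciPy assumed available):
<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import beta, betaprime, kstest

rng = np.random.default_rng(2)
a, b, n = 2.0, 5.0, 50_000
x = rng.beta(a, b, n)

# 1 - X ~ Beta(b, a)   and   X/(1 - X) ~ BetaPrime(a, b)
print(kstest(1 - x, beta(b, a).cdf).pvalue)
print(kstest(x / (1 - x), betaprime(a, b).cdf).pvalue)
</syntaxhighlight>
Large p-values indicate that neither transformed sample is distinguishable from the stated target distribution.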


Special and limiting cases

* Beta(1, 1) ~ uniform distribution (continuous), U(0, 1). * Beta(n, 1) ~ Maximum of ''n'' independent rvs. with uniform distribution (continuous), U(0, 1), sometimes called a ''a standard power function distribution'' with density ''n'' ''x''''n''-1 on that interval. * Beta(1, n) ~ Minimum of ''n'' independent rvs. with uniform distribution (continuous), U(0, 1) * If ''X'' ~ Beta(3/2, 3/2) and ''r'' > 0 then 2''rX'' − ''r'' ~ Wigner semicircle distribution. * Beta(1/2, 1/2) is equivalent to the arcsine distribution. This distribution is also Jeffreys prior probability for the
Bernoulli
and binomial distributions. The arcsine probability density is a distribution that appears in several random-walk fundamental theorems. In a fair coin toss
random walk
, the probability for the time of the last visit to the origin is distributed as an (U-shaped) arcsine distribution. In a two-player fair-coin-toss game, a player is said to be in the lead if the random walk (that started at the origin) is above the origin. The most probable number of times that a given player will be in the lead, in a game of length 2''N'', is not ''N''. On the contrary, ''N'' is the least likely number of times that the player will be in the lead. The most likely number of times in the lead is 0 or 2''N'' (following the arcsine distribution). * \lim_ n \operatorname(1,n) = \operatorname(1) the exponential distribution. * \lim_ n \operatorname(k,n) = \operatorname(k,1) the gamma distribution. * For large n, \operatorname(\alpha n,\beta n) \to \mathcal\left(\frac,\frac\frac\right) the normal distribution. More precisely, if X_n \sim \operatorname(\alpha n,\beta n) then \sqrt\left(X_n -\tfrac\right) converges in distribution to a normal distribution with mean 0 and variance \tfrac as ''n'' increases.


Derived from other distributions

* The ''k''th order statistic of a sample of size ''n'' from the Uniform distribution (continuous), uniform distribution is a beta random variable, ''U''(''k'') ~ Beta(''k'', ''n''+1−''k''). * If ''X'' ~ Gamma(α, θ) and ''Y'' ~ Gamma(β, θ) are independent, then \tfrac \sim \operatorname(\alpha, \beta)\,. * If X \sim \chi^2(\alpha)\, and Y \sim \chi^2(\beta)\, are independent, then \tfrac \sim \operatorname(\tfrac, \tfrac). * If ''X'' ~ U(0, 1) and ''α'' > 0 then ''X''1/''α'' ~ Beta(''α'', 1). The power function distribution. * If X \sim\operatorname(k;n;p), then \sim \operatorname(\alpha, \beta) for discrete values of ''n'' and ''k'' where \alpha=k+1 and \beta=n-k+1. * If ''X'' ~ Cauchy(0, 1) then \tfrac \sim \operatorname\left(\tfrac12, \tfrac12\right)\,
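The gamma-ratio construction in the list above is the standard way to sample beta variates; a quick Monte Carlo check (θ, seed and sample size are arbitrary choices; SciPy assumed available):
<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import beta, kstest

rng = np.random.default_rng(3)
a, b, theta, n = 2.0, 5.0, 3.0, 50_000
x = rng.gamma(a, theta, n)   # Gamma(alpha, theta)
y = rng.gamma(b, theta, n)   # Gamma(beta, theta), independent

# X/(X+Y) should follow Beta(alpha, beta); expect a large KS p-value
print(kstest(x / (x + y), beta(a, b).cdf).pvalue)
</syntaxhighlight>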


Combination with other distributions

* ''X'' ~ Beta(''α'', ''β'') and ''Y'' ~ F(2''β'',2''α'') then \Pr(X \leq \tfrac \alpha ) = \Pr(Y \geq x)\, for all ''x'' > 0.


Compounding with other distributions

* If ''p'' ~ Beta(α, β) and ''X'' ~ Bin(''k'', ''p'') then ''X'' ~ beta-binomial distribution * If ''p'' ~ Beta(α, β) and ''X'' ~ NB(''r'', ''p'') then ''X'' ~ beta negative binomial distribution


Generalisations

* The generalization to multiple variables, i.e. a Dirichlet distribution, multivariate Beta distribution, is called a
Dirichlet distribution
. Univariate marginals of the Dirichlet distribution have a beta distribution. The beta distribution is Conjugate prior, conjugate to the binomial and Bernoulli distributions in exactly the same way as the
Dirichlet distribution
is conjugate to the multinomial distribution and categorical distribution. * The Pearson distribution#The Pearson type I distribution, Pearson type I distribution is identical to the beta distribution (except for arbitrary shifting and re-scaling that can also be accomplished with the four parameter parametrization of the beta distribution). * The beta distribution is the special case of the noncentral beta distribution where \lambda = 0: \operatorname(\alpha, \beta) = \operatorname(\alpha,\beta,0). * The generalized beta distribution is a five-parameter distribution family which has the beta distribution as a special case. * The matrix variate beta distribution is a distribution for positive-definite matrices.


Statistical inference


Parameter estimation


Method of moments


=Two unknown parameters

= Two unknown parameters ((\hat{\alpha}, \hat{\beta}) of a beta distribution supported on the [0, 1] interval) can be estimated, using the method of moments, with the first two moments (sample mean and sample variance) as follows. Let: : \text{sample mean} = \bar{x} = \frac{1}{N}\sum_{i=1}^N X_i be the sample mean estimate and : \text{sample variance} = \bar{v} = \frac{1}{N-1}\sum_{i=1}^N (X_i - \bar{x})^2 be the sample variance estimate. The method-of-moments estimates of the parameters are :\hat{\alpha} = \bar{x} \left(\frac{\bar{x}(1 - \bar{x})}{\bar{v}} - 1 \right), if \bar{v} <\bar{x}(1 - \bar{x}), : \hat{\beta} = (1-\bar{x}) \left(\frac{\bar{x}(1 - \bar{x})}{\bar{v}} - 1 \right), if \bar{v}<\bar{x}(1 - \bar{x}). When the distribution is required over a known interval other than
[0, 1]
with random variable ''X'', say [''a'', ''c''] with random variable ''Y'', then replace \bar{x} with \frac{\bar{y}-a}{c-a}, and \bar{v} with \frac{\bar{v}_Y}{(c-a)^2} in the above couple of equations for the shape parameters (see the "Alternative parametrizations, four parameters" section below), where: : \text{sample mean} = \bar{y} = \frac{1}{N}\sum_{i=1}^N Y_i : \text{sample variance} = \bar{v}_Y = \frac{1}{N-1}\sum_{i=1}^N (Y_i - \bar{y})^2
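A short Python sketch of these method-of-moments estimates; the simulated sample and the use of the unbiased sample variance are assumptions for illustration.

    import numpy as np

    rng = np.random.default_rng(2)
    x = rng.beta(2.0, 5.0, 10_000)      # sample supported on [0, 1]

    mean = x.mean()
    var = x.var(ddof=1)                 # sample variance

    if var < mean * (1 - mean):         # condition required above
        common = mean * (1 - mean) / var - 1
        alpha_hat = mean * common
        beta_hat = (1 - mean) * common
        print(alpha_hat, beta_hat)      # roughly (2, 5)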


=Four unknown parameters

= All four parameters (\hat{\alpha}, \hat{\beta}, \hat{a}, \hat{c} of a beta distribution supported in the [''a'', ''c''] interval - see section "Alternative parametrizations, Four parameters") can be estimated, using the method of moments developed by Karl Pearson, by equating sample and population values of the first four central moments (mean, variance, skewness and excess kurtosis). The excess kurtosis was expressed in terms of the square of the skewness, and the sample size ν = α + β, (see previous section "Kurtosis") as follows: :\text{excess kurtosis} =\frac{6}{3 + \nu}\left(\frac{(2 + \nu)}{4} (\text{skewness})^2 - 1\right)\text{ if }(\text{skewness})^2-2< \text{excess kurtosis}< \tfrac{3}{2} (\text{skewness})^2 One can use this equation to solve for the sample size ν = α + β in terms of the square of the skewness and the excess kurtosis as follows: :\hat{\nu} = \hat{\alpha} + \hat{\beta} = 3\frac{(\text{sample excess kurtosis}) - (\text{sample skewness})^2 + 2}{\frac{3}{2} (\text{sample skewness})^2 - (\text{sample excess kurtosis})} :\text{if }(\text{sample skewness})^2-2< \text{sample excess kurtosis}< \tfrac{3}{2} (\text{sample skewness})^2 This is the ratio (multiplied by a factor of 3) between the previously derived limit boundaries for the beta distribution in a space (as originally done by Karl Pearson) defined with coordinates of the square of the skewness in one axis and the excess kurtosis in the other axis: The case of zero skewness can be immediately solved because for zero skewness, α = β and hence ν = 2α = 2β, therefore α = β = ν/2 : \hat{\alpha} = \hat{\beta} = \frac{\hat{\nu}}{2}= \frac{\frac{3}{2}(\text{sample excess kurtosis}) + 3}{- (\text{sample excess kurtosis})} : \text{if sample skewness}= 0 \text{ and } -2<\text{sample excess kurtosis}<0 (Excess kurtosis is negative for the beta distribution with zero skewness, ranging from -2 to 0, so that \hat{\nu} -and therefore the sample shape parameters- is positive, ranging from zero when the shape parameters approach zero and the excess kurtosis approaches -2, to infinity when the shape parameters approach infinity and the excess kurtosis approaches zero). For non-zero sample skewness one needs to solve a system of two coupled equations. Since the skewness and the excess kurtosis are independent of the parameters \hat{a}, \hat{c}, the parameters \hat{\alpha}, \hat{\beta} can be uniquely determined from the sample skewness and the sample excess kurtosis, by solving the coupled equations with two known variables (sample skewness and sample excess kurtosis) and two unknowns (the shape parameters): :(\text{skewness})^2 = \frac{4(\hat{\beta}-\hat{\alpha})^2 (1+\hat{\nu})}{\hat{\alpha}\hat{\beta}(2+\hat{\nu})^2} :\text{excess kurtosis} =\frac{6}{3 + \hat{\nu}}\left(\frac{(2 + \hat{\nu})}{4} (\text{skewness})^2 - 1\right) :\text{if }(\text{sample skewness})^2-2< \text{sample excess kurtosis}< \tfrac{3}{2}(\text{sample skewness})^2 resulting in the following solution: : \hat{\alpha}, \hat{\beta} = \frac{\hat{\nu}}{2} \left (1 \pm \frac{1}{\sqrt{1+ \frac{16 (\hat{\nu} + 1)}{(\hat{\nu} + 2)^2 (\text{sample skewness})^2}}} \right ) : \text{if sample skewness}\neq 0 \text{ and } (\text{sample skewness})^2-2< \text{sample excess kurtosis}< \tfrac{3}{2} (\text{sample skewness})^2 Where one should take the solutions as follows: \hat{\alpha}>\hat{\beta} for (negative) sample skewness < 0, and \hat{\alpha}<\hat{\beta} for (positive) sample skewness > 0. The accompanying plot shows these two solutions as surfaces in a space with horizontal axes of (sample excess kurtosis) and (sample squared skewness) and the shape parameters as the vertical axis. The surfaces are constrained by the condition that the sample excess kurtosis must be bounded by the sample squared skewness as stipulated in the above equation. The two surfaces meet at the right edge defined by zero skewness. Along this right edge, both parameters are equal and the distribution is symmetric U-shaped for α = β < 1, uniform for α = β = 1, upside-down-U-shaped for 1 < α = β < 2 and bell-shaped for α = β > 2. The surfaces also meet at the front (lower) edge defined by "the impossible boundary" line (excess kurtosis + 2 - skewness^2 = 0).
Along this front (lower) boundary both shape parameters approach zero, and the probability density is concentrated more at one end than the other end (with practically nothing in between), with probabilities p=\tfrac at the left end ''x'' = 0 and q = 1-p = \tfrac at the right end ''x'' = 1. The two surfaces become further apart towards the rear edge. At this rear edge the surface parameters are quite different from each other. As remarked, for example, by Bowman and Shenton, sampling in the neighborhood of the line (sample excess kurtosis - (3/2)(sample skewness)2 = 0) (the just-J-shaped portion of the rear edge where blue meets beige), "is dangerously near to chaos", because at that line the denominator of the expression above for the estimate ν = α + β becomes zero and hence ν approaches infinity as that line is approached. Bowman and Shenton write that "the higher moment parameters (kurtosis and skewness) are extremely fragile (near that line). However, the mean and standard deviation are fairly reliable." Therefore, the problem is for the case of four parameter estimation for very skewed distributions such that the excess kurtosis approaches (3/2) times the square of the skewness. This boundary line is produced by extremely skewed distributions with very large values of one of the parameters and very small values of the other parameter. See for a numerical example and further comments about this rear edge boundary line (sample excess kurtosis - (3/2)(sample skewness)2 = 0). As remarked by Karl Pearson himself this issue may not be of much practical importance as this trouble arises only for very skewed J-shaped (or mirror-image J-shaped) distributions with very different values of shape parameters that are unlikely to occur much in practice). The usual skewed-bell-shape distributions that occur in practice do not have this parameter estimation problem. The remaining two parameters \hat, \hat can be determined using the sample mean and the sample variance using a variety of equations. One alternative is to calculate the support interval range (\hat-\hat) based on the sample variance and the sample kurtosis. For this purpose one can solve, in terms of the range (\hat- \hat), the equation expressing the excess kurtosis in terms of the sample variance, and the sample size ν (see and ): :\text =\frac\bigg(\frac - 6 - 5 \hat \bigg) to obtain: : (\hat- \hat) = \sqrt\sqrt Another alternative is to calculate the support interval range (\hat-\hat) based on the sample variance and the sample skewness. For this purpose one can solve, in terms of the range (\hat-\hat), the equation expressing the squared skewness in terms of the sample variance, and the sample size ν (see section titled "Skewness" and "Alternative parametrizations, four parameters"): :(\text)^2 = \frac\bigg(\frac-4(1+\hat)\bigg) to obtain: : (\hat- \hat) = \frac\sqrt The remaining parameter can be determined from the sample mean and the previously obtained parameters: (\hat-\hat), \hat, \hat = \hat+\hat: : \hat = (\text) - \left(\frac\right)(\hat-\hat) and finally, \hat= (\hat- \hat) + \hat . In the above formulas one may take, for example, as estimates of the sample moments: :\begin \text &=\overline = \frac\sum_^N Y_i \\ \text &= \overline_Y = \frac\sum_^N (Y_i - \overline)^2 \\ \text &= G_1 = \frac \frac \\ \text &= G_2 = \frac \frac - \frac \end The estimators ''G''1 for skewness, sample skewness and ''G''2 for kurtosis, sample kurtosis are used by DAP (software), DAP/SAS System, SAS, PSPP/SPSS, and Microsoft Excel, Excel. 
However, they are not used by BMDP and (according to ) they were not used by MINITAB in 1998. Actually, Joanes and Gill in their 1998 study concluded that the skewness and kurtosis estimators used in BMDP and in MINITAB (at that time) had smaller variance and mean-squared error in normal samples, but the skewness and kurtosis estimators used in DAP (software), DAP/SAS System, SAS, PSPP/SPSS, namely ''G''1 and ''G''2, had smaller mean-squared error in samples from a very skewed distribution. It is for this reason that we have spelled out "sample skewness", etc., in the above formulas, to make it explicit that the user should choose the best estimator according to the problem at hand, as the best estimator for skewness and kurtosis depends on the amount of skewness (as shown by Joanes and Gill).
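The following sketch strings the steps of this four-parameter method-of-moments recipe together in Python. It is a hedged illustration, not an implementation from the source: the simulated data, the choice of the ''G''1/''G''2 estimators via scipy.stats (bias=False), and the particular (algebraically equivalent) form used here for the range and the minimum are all assumptions.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    a, c = 1.0, 5.0
    y = a + (c - a) * rng.beta(3.0, 6.0, 200_000)        # four-parameter beta sample

    mean, var = y.mean(), y.var(ddof=1)
    skew = stats.skew(y, bias=False)                     # sample skewness G1
    kurt = stats.kurtosis(y, fisher=True, bias=False)    # sample excess kurtosis G2

    nu = 3.0 * (kurt - skew**2 + 2.0) / (1.5 * skew**2 - kurt)
    root = 1.0 / np.sqrt(1.0 + 16.0 * (nu + 1.0) / ((nu + 2.0)**2 * skew**2))
    alpha_hat = 0.5 * nu * (1.0 - np.sign(skew) * root)  # smaller parameter for positive skew
    beta_hat = nu - alpha_hat
    range_hat = 0.5 * np.sqrt(var) * np.sqrt((nu + 2.0)**2 * skew**2 + 16.0 * (nu + 1.0))
    a_hat = mean - (alpha_hat / nu) * range_hat
    c_hat = a_hat + range_hat
    print(alpha_hat, beta_hat, a_hat, c_hat)             # roughly (3, 6, 1, 5)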


Maximum likelihood


=Two unknown parameters

= As is also the case for maximum likelihood estimates for the gamma distribution, the maximum likelihood estimates for the beta distribution do not have a general closed form solution for arbitrary values of the shape parameters. If ''X''1, ..., ''XN'' are independent random variables each having a beta distribution, the joint log likelihood function for ''N'' independent and identically distributed random variables, iid observations is: :\begin \ln\, \mathcal (\alpha, \beta\mid X) &= \sum_^N \ln \left (\mathcal_i (\alpha, \beta\mid X_i) \right )\\ &= \sum_^N \ln \left (f(X_i;\alpha,\beta) \right ) \\ &= \sum_^N \ln \left (\frac \right ) \\ &= (\alpha - 1)\sum_^N \ln (X_i) + (\beta- 1)\sum_^N \ln (1-X_i) - N \ln \Beta(\alpha,\beta) \end Finding the maximum with respect to a shape parameter involves taking the partial derivative with respect to the shape parameter and setting the expression equal to zero yielding the maximum likelihood estimator of the shape parameters: :\frac = \sum_^N \ln X_i -N\frac=0 :\frac = \sum_^N \ln (1-X_i)- N\frac=0 where: :\frac = -\frac+ \frac+ \frac=-\psi(\alpha + \beta) + \psi(\alpha) + 0 :\frac= - \frac+ \frac + \frac=-\psi(\alpha + \beta) + 0 + \psi(\beta) since the
digamma function
denoted ψ(α) is defined as the logarithmic derivative of the
gamma function
: :\psi(\alpha) =\frac To ensure that the values with zero tangent slope are indeed a maximum (instead of a saddle-point or a minimum) one has to also satisfy the condition that the curvature is negative. This amounts to satisfying that the second partial derivative with respect to the shape parameters is negative :\frac= -N\frac<0 :\frac = -N\frac<0 using the previous equations, this is equivalent to: :\frac = \psi_1(\alpha)-\psi_1(\alpha + \beta) > 0 :\frac = \psi_1(\beta) -\psi_1(\alpha + \beta) > 0 where the
trigamma function
, denoted ''ψ''1(''α''), is the second of the
polygamma function
s, and is defined as the derivative of the digamma function: :\psi_1(\alpha) = \frac=\, \frac. These conditions are equivalent to stating that the variances of the logarithmically transformed variables are positive, since: :\operatorname[\ln (X)] = \operatorname[\ln^2 (X)] - (\operatorname[\ln (X)])^2 = \psi_1(\alpha) - \psi_1(\alpha + \beta) :\operatorname ln (1-X)= \operatorname[\ln^2 (1-X)] - (\operatorname[\ln (1-X)])^2 = \psi_1(\beta) - \psi_1(\alpha + \beta) Therefore, the condition of negative curvature at a maximum is equivalent to the statements: : \operatorname[\ln (X)] > 0 : \operatorname ln (1-X)> 0 Alternatively, the condition of negative curvature at a maximum is also equivalent to stating that the following logarithmic derivatives of the geometric means ''GX'' and ''G(1−X)'' are positive, since: : \psi_1(\alpha) - \psi_1(\alpha + \beta) = \frac > 0 : \psi_1(\beta) - \psi_1(\alpha + \beta) = \frac > 0 While these slopes are indeed positive, the other slopes are negative: :\frac, \frac < 0. The slopes of the mean and the median with respect to ''α'' and ''β'' display similar sign behavior. From the condition that at a maximum, the partial derivative with respect to the shape parameter equals zero, we obtain the following system of coupled maximum likelihood estimate equations (for the average log-likelihoods) that needs to be inverted to obtain the (unknown) shape parameter estimates \hat,\hat in terms of the (known) average of logarithms of the samples ''X''1, ..., ''XN'': :\begin \hat[\ln (X)] &= \psi(\hat) - \psi(\hat + \hat)=\frac\sum_^N \ln X_i = \ln \hat_X \\ \hat[\ln(1-X)] &= \psi(\hat) - \psi(\hat + \hat)=\frac\sum_^N \ln (1-X_i)= \ln \hat_ \end where we recognize \log \hat_X as the logarithm of the sample geometric mean and \log \hat_ as the logarithm of the sample geometric mean based on (1 − ''X''), the mirror-image of ''X''. For \hat=\hat, it follows that \hat_X=\hat_ . :\begin \hat_X &= \prod_^N (X_i)^ \\ \hat_ &= \prod_^N (1-X_i)^ \end These coupled equations containing
digamma function
s of the shape parameter estimates \hat,\hat must be solved by numerical methods as done, for example, by Beckman et al. Gnanadesikan et al. give numerical solutions for a few cases. Norman Lloyd Johnson, N.L.Johnson and Samuel Kotz, S.Kotz suggest that for "not too small" shape parameter estimates \hat,\hat, the logarithmic approximation to the digamma function \psi(\hat) \approx \ln(\hat-\tfrac) may be used to obtain initial values for an iterative solution, since the equations resulting from this approximation can be solved exactly: :\ln \frac \approx \ln \hat_X :\ln \frac\approx \ln \hat_ which leads to the following solution for the initial values (of the estimate shape parameters in terms of the sample geometric means) for an iterative solution: :\hat\approx \tfrac + \frac \text \hat >1 :\hat\approx \tfrac + \frac \text \hat > 1 Alternatively, the estimates provided by the method of moments can instead be used as initial values for an iterative solution of the maximum likelihood coupled equations in terms of the digamma functions. When the distribution is required over a known interval other than
[0, 1]
with random variable ''X'', say [''a'', ''c''] with random variable ''Y'', then replace ln(''Xi'') in the first equation with :\ln \frac, and replace ln(1−''Xi'') in the second equation with :\ln \frac (see "Alternative parametrizations, four parameters" section below). If one of the shape parameters is known, the problem is considerably simplified. The following logit transformation can be used to solve for the unknown shape parameter (for skewed cases such that \hat\neq\hat, otherwise, if symmetric, both -equal- parameters are known when one is known): :\hat \left[\ln \left(\frac \right) \right]=\psi(\hat) - \psi(\hat)=\frac\sum_^N \ln\frac = \ln \hat_X - \ln \left(\hat_\right) This logit transformation is the logarithm of the transformation that divides the variable ''X'' by its mirror-image (''X''/(1 - ''X'') resulting in the "inverted beta distribution" or
beta prime distribution
(also known as beta distribution of the second kind or Pearson distribution, Pearson's Type VI) with support [0, +∞). As previously discussed in the section "Moments of logarithmically transformed random variables," the logit transformation \ln\frac, studied by Johnson, extends the finite support
[0, 1]
based on the original variable ''X'' to infinite support in both directions of the real line (−∞, +∞). If, for example, \hat is known, the unknown parameter \hat can be obtained in terms of the inverse digamma function of the right hand side of this equation: :\psi(\hat)=\frac\sum_^N \ln\frac + \psi(\hat) :\hat=\psi^(\ln \hat_X - \ln \hat_ + \psi(\hat)) In particular, if one of the shape parameters has a value of unity, for example for \hat = 1 (the power function distribution with bounded support [0,1]), using the identity ψ(''x'' + 1) = ψ(''x'') + 1/''x'' in the equation \psi(\hat) - \psi(\hat + \hat)= \ln \hat_X, the maximum likelihood estimator for the unknown parameter \hat is, exactly: :\hat= - \frac= - \frac The beta has support [0, 1], therefore \hat_X < 1, and hence (-\ln \hat_X) >0, and therefore \hat >0. In conclusion, the maximum likelihood estimates of the shape parameters of a beta distribution are (in general) a complicated function of the sample geometric mean, and of the sample geometric mean based on ''(1−X)'', the mirror-image of ''X''. One may ask, if the variance (in addition to the mean) is necessary to estimate two shape parameters with the method of moments, why is the (logarithmic or geometric) variance not necessary to estimate two shape parameters with the maximum likelihood method, for which only the geometric means suffice? The answer is because the mean does not provide as much information as the geometric mean. For a beta distribution with equal shape parameters ''α'' = ''β'', the mean is exactly 1/2, regardless of the value of the shape parameters, and therefore regardless of the value of the statistical dispersion (the variance). On the other hand, the geometric mean of a beta distribution with equal shape parameters ''α'' = ''β'', depends on the value of the shape parameters, and therefore it contains more information. Also, the geometric mean of a beta distribution does not satisfy the symmetry conditions satisfied by the mean, therefore, by employing both the geometric mean based on ''X'' and geometric mean based on (1 − ''X''), the maximum likelihood method is able to provide best estimates for both parameters ''α'' = ''β'', without need of employing the variance. One can express the joint log likelihood per ''N'' independent and identically distributed random variables, iid observations in terms of the ''sufficient statistics'' (the sample geometric means) as follows: :\frac = (\alpha - 1)\ln \hat_X + (\beta- 1)\ln \hat_- \ln \Beta(\alpha,\beta). We can plot the joint log likelihood per ''N'' observations for fixed values of the sample geometric means to see the behavior of the likelihood function as a function of the shape parameters α and β. In such a plot, the shape parameter estimators \hat,\hat correspond to the maxima of the likelihood function. See the accompanying graph that shows that all the likelihood functions intersect at α = β = 1, which corresponds to the values of the shape parameters that give the maximum entropy (the maximum entropy occurs for shape parameters equal to unity: the uniform distribution). It is evident from the plot that the likelihood function gives sharp peaks for values of the shape parameter estimators close to zero, but that for values of the shape parameters estimators greater than one, the likelihood function becomes quite flat, with less defined peaks. 
Obviously, the maximum likelihood parameter estimation method for the beta distribution becomes less acceptable for larger values of the shape parameter estimators, as the uncertainty in the peak definition increases with the value of the shape parameter estimators. One can arrive at the same conclusion by noticing that the expression for the curvature of the likelihood function is in terms of the geometric variances :\frac{1}{N}\frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{\partial \alpha^2}= -\operatorname{var}[\ln X] :\frac{1}{N}\frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{\partial \beta^2} = -\operatorname{var}[\ln (1-X)] These variances (and therefore the curvatures) are much larger for small values of the shape parameter α and β. However, for shape parameter values α, β > 1, the variances (and therefore the curvatures) flatten out. Equivalently, this result follows from the Cramér–Rao bound, since the
Fisher information
matrix components for the beta distribution are these logarithmic variances. The Cramér–Rao bound states that the
variance
of any ''unbiased'' estimator \hat{\alpha} of α is bounded by the reciprocal of the
Fisher information
: :\mathrm(\hat)\geq\frac\geq\frac :\mathrm(\hat) \geq\frac\geq\frac so the variance of the estimators increases with increasing α and β, as the logarithmic variances decrease. Also one can express the joint log likelihood per ''N'' independent and identically distributed random variables, iid observations in terms of the
digamma function
expressions for the logarithms of the sample geometric means as follows: :\frac = (\alpha - 1)(\psi(\hat) - \psi(\hat + \hat))+(\beta- 1)(\psi(\hat) - \psi(\hat + \hat))- \ln \Beta(\alpha,\beta) this expression is identical to the negative of the cross-entropy (see section on "Quantities of information (entropy)"). Therefore, finding the maximum of the joint log likelihood of the shape parameters, per ''N'' independent and identically distributed random variables, iid observations, is identical to finding the minimum of the cross-entropy for the beta distribution, as a function of the shape parameters. :\frac = - H = -h - D_ = -\ln\Beta(\alpha,\beta)+(\alpha-1)\psi(\hat)+(\beta-1)\psi(\hat)-(\alpha+\beta-2)\psi(\hat+\hat) with the cross-entropy defined as follows: :H = \int_^1 - f(X;\hat,\hat) \ln (f(X;\alpha,\beta)) \, X
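As a hedged numerical illustration of the two coupled maximum-likelihood equations above (ψ(α̂) − ψ(α̂ + β̂) equated to the log of the sample geometric mean of ''X'', and likewise for 1 − ''X''), the following sketch solves them with a generic root finder, started from the method-of-moments estimates as suggested earlier; the simulated data set and the use of SciPy are assumptions.

    import numpy as np
    from scipy.special import digamma
    from scipy.optimize import fsolve

    rng = np.random.default_rng(4)
    x = rng.beta(2.0, 5.0, 10_000)

    ln_gx = np.log(x).mean()        # logarithm of the sample geometric mean of X
    ln_g1x = np.log1p(-x).mean()    # logarithm of the sample geometric mean of 1 - X

    def equations(p):
        a, b = p
        return (digamma(a) - digamma(a + b) - ln_gx,
                digamma(b) - digamma(a + b) - ln_g1x)

    m, v = x.mean(), x.var(ddof=1)  # method-of-moments starting point
    common = m * (1 - m) / v - 1
    alpha_hat, beta_hat = fsolve(equations, (m * common, (1 - m) * common))
    print(alpha_hat, beta_hat)      # close to (2, 5)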


=Four unknown parameters

= The procedure is similar to the one followed in the two unknown parameter case. If ''Y''1, ..., ''YN'' are independent random variables each having a beta distribution with four parameters, the joint log likelihood function for ''N'' independent and identically distributed random variables, iid observations is: :\begin \ln\, \mathcal (\alpha, \beta, a, c\mid Y) &= \sum_^N \ln\,\mathcal_i (\alpha, \beta, a, c\mid Y_i)\\ &= \sum_^N \ln\,f(Y_i; \alpha, \beta, a, c) \\ &= \sum_^N \ln\,\frac\\ &= (\alpha - 1)\sum_^N \ln (Y_i - a) + (\beta- 1)\sum_^N \ln (c - Y_i)- N \ln \Beta(\alpha,\beta) - N (\alpha+\beta - 1) \ln (c - a) \end Finding the maximum with respect to a shape parameter involves taking the partial derivative with respect to the shape parameter and setting the expression equal to zero yielding the maximum likelihood estimator of the shape parameters: :\frac= \sum_^N \ln (Y_i - a) - N(-\psi(\alpha + \beta) + \psi(\alpha))- N \ln (c - a)= 0 :\frac = \sum_^N \ln (c - Y_i) - N(-\psi(\alpha + \beta) + \psi(\beta))- N \ln (c - a)= 0 :\frac = -(\alpha - 1) \sum_^N \frac \,+ N (\alpha+\beta - 1)\frac= 0 :\frac = (\beta- 1) \sum_^N \frac \,- N (\alpha+\beta - 1) \frac = 0 these equations can be re-arranged as the following system of four coupled equations (the first two equations are geometric means and the second two equations are the harmonic means) in terms of the maximum likelihood estimates for the four parameters \hat, \hat, \hat, \hat: :\frac\sum_^N \ln \frac = \psi(\hat)-\psi(\hat +\hat )= \ln \hat_X :\frac\sum_^N \ln \frac = \psi(\hat)-\psi(\hat + \hat)= \ln \hat_ :\frac = \frac= \hat_X :\frac = \frac = \hat_ with sample geometric means: :\hat_X = \prod_^ \left (\frac \right )^ :\hat_ = \prod_^ \left (\frac \right )^ The parameters \hat, \hat are embedded inside the geometric mean expressions in a nonlinear way (to the power 1/''N''). This precludes, in general, a closed form solution, even for an initial value approximation for iteration purposes. One alternative is to use as initial values for iteration the values obtained from the method of moments solution for the four parameter case. Furthermore, the expressions for the harmonic means are well-defined only for \hat, \hat > 1, which precludes a maximum likelihood solution for shape parameters less than unity in the four-parameter case. Fisher's information matrix for the four parameter case is Positive-definite matrix, positive-definite only for α, β > 2 (for further discussion, see section on Fisher information matrix, four parameter case), for bell-shaped (symmetric or unsymmetric) beta distributions, with inflection points located to either side of the mode. The following Fisher information components (that represent the expectations of the curvature of the log likelihood function) have mathematical singularity, singularities at the following values: :\alpha = 2: \quad \operatorname \left [- \frac \frac \right ]= _ :\beta = 2: \quad \operatorname\left [- \frac \frac \right ] = _ :\alpha = 2: \quad \operatorname\left [- \frac\frac\right ] = _ :\beta = 1: \quad \operatorname\left [- \frac\frac \right ] = _ (for further discussion see section on Fisher information matrix). Thus, it is not possible to strictly carry on the maximum likelihood estimation for some well known distributions belonging to the four-parameter beta distribution family, like the continuous uniform distribution, uniform distribution (Beta(1, 1, ''a'', ''c'')), and the arcsine distribution (Beta(1/2, 1/2, ''a'', ''c'')). 
Norman Lloyd Johnson, N.L.Johnson and Samuel Kotz, S.Kotz ignore the equations for the harmonic means and instead suggest "If a and c are unknown, and maximum likelihood estimators of ''a'', ''c'', α and β are required, the above procedure (for the two unknown parameter case, with ''X'' transformed as ''X'' = (''Y'' − ''a'')/(''c'' − ''a'')) can be repeated using a succession of trial values of ''a'' and ''c'', until the pair (''a'', ''c'') for which maximum likelihood (given ''a'' and ''c'') is as great as possible, is attained" (where, for the purpose of clarity, their notation for the parameters has been translated into the present notation).
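A rough sketch of the profiling strategy quoted above from N.L. Johnson and S. Kotz: for trial values of (''a'', ''c''), transform the data to the unit interval, run the two-parameter fit, and keep the pair with the largest likelihood. The coarse grid, the simulated sample, and the use of scipy.stats.beta.fit for the inner two-parameter step are assumptions of this illustration, not the authors' procedure.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(5)
    y = 1.0 + 4.0 * rng.beta(3.0, 3.0, 5_000)     # sample supported on [1, 5]

    best = None
    for a in np.linspace(y.min() - 0.5, y.min() - 1e-3, 15):       # trial minima
        for c in np.linspace(y.max() + 1e-3, y.max() + 0.5, 15):   # trial maxima
            x = (y - a) / (c - a)                                  # map to (0, 1)
            alpha_hat, beta_hat, _, _ = stats.beta.fit(x, floc=0, fscale=1)
            loglik = stats.beta.logpdf(x, alpha_hat, beta_hat).sum() - len(y) * np.log(c - a)
            if best is None or loglik > best[0]:
                best = (loglik, alpha_hat, beta_hat, a, c)
    print(best[1:])                               # roughly (3, 3, 1, 5)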


Fisher information matrix

Let a random variable X have a probability density ''f''(''x'';''α''). The partial derivative with respect to the (unknown, and to be estimated) parameter α of the log
likelihood function
is called the score (statistics), score. The second moment of the score is called the
Fisher information
: :\mathcal(\alpha)=\operatorname \left [\left (\frac \ln \mathcal(\alpha\mid X) \right )^2 \right], The expected value, expectation of the score (statistics), score is zero, therefore the Fisher information is also the second moment centered on the mean of the score: the
variance
of the score. If the log
likelihood function
is twice differentiable with respect to the parameter α, and under certain regularity conditions, then the Fisher information may also be written as follows (which is often a more convenient form for calculation purposes): :\mathcal(\alpha) = - \operatorname \left [\frac \ln (\mathcal(\alpha\mid X)) \right]. Thus, the Fisher information is the negative of the expectation of the second derivative with respect to the parameter α of the log
likelihood function
. Therefore, Fisher information is a measure of the curvature of the log likelihood function of α. A low curvature (and therefore high Radius of curvature (mathematics), radius of curvature), flatter log likelihood function curve has low Fisher information; while a log likelihood function curve with large curvature (and therefore low Radius of curvature (mathematics), radius of curvature) has high Fisher information. When the Fisher information matrix is computed at the evaluates of the parameters ("the observed Fisher information matrix") it is equivalent to the replacement of the true log likelihood surface by a Taylor's series approximation, taken as far as the quadratic terms. The word information, in the context of Fisher information, refers to information about the parameters. Information such as: estimation, sufficiency and properties of variances of estimators. The Cramér–Rao bound states that the inverse of the Fisher information is a lower bound on the variance of any
estimator
of a parameter α: :\operatorname[\hat\alpha] \geq \frac. The precision to which one can estimate the estimator of a parameter α is limited by the Fisher Information of the log likelihood function. The Fisher information is a measure of the minimum error involved in estimating a parameter of a distribution and it can be viewed as a measure of the resolving power of an experiment needed to discriminate between two alternative hypothesis of a parameter. When there are ''N'' parameters : \begin \theta_1 \\ \theta_ \\ \dots \\ \theta_ \end, then the Fisher information takes the form of an ''N''×''N'' positive semidefinite matrix, positive semidefinite symmetric matrix, the Fisher Information Matrix, with typical element: :_=\operatorname \left [\left (\frac \ln \mathcal \right) \left(\frac \ln \mathcal \right) \right ]. Under certain regularity conditions, the Fisher Information Matrix may also be written in the following form, which is often more convenient for computation: :_ = - \operatorname \left [\frac \ln (\mathcal) \right ]\,. With ''X''1, ..., ''XN'' iid random variables, an ''N''-dimensional "box" can be constructed with sides ''X''1, ..., ''XN''. Costa and Cover show that the (Shannon) differential entropy ''h''(''X'') is related to the volume of the typical set (having the sample entropy close to the true entropy), while the Fisher information is related to the surface of this typical set.


=Two parameters

= For ''X''1, ..., ''X''''N'' independent random variables each having a beta distribution parametrized with shape parameters ''α'' and ''β'', the joint log likelihood function for ''N'' independent and identically distributed random variables, iid observations is: :\ln (\mathcal (\alpha, \beta\mid X) )= (\alpha - 1)\sum_^N \ln X_i + (\beta- 1)\sum_^N \ln (1-X_i)- N \ln \Beta(\alpha,\beta) therefore the joint log likelihood function per ''N'' independent and identically distributed random variables, iid observations is: :\frac \ln(\mathcal (\alpha, \beta\mid X)) = (\alpha - 1)\frac\sum_^N \ln X_i + (\beta- 1)\frac\sum_^N \ln (1-X_i)-\, \ln \Beta(\alpha,\beta) For the two parameter case, the Fisher information has 4 components: 2 diagonal and 2 off-diagonal. Since the Fisher information matrix is symmetric, one of these off diagonal components is independent. Therefore, the Fisher information matrix has 3 independent components (2 diagonal and 1 off diagonal). Aryal and Nadarajah calculated Fisher's information matrix for the four-parameter case, from which the two parameter case can be obtained as follows: :- \frac= \operatorname[\ln (X)]= \psi_1(\alpha) - \psi_1(\alpha + \beta) =_= \operatorname\left [- \frac \right ] = \ln \operatorname_ :- \frac = \operatorname ln (1-X)= \psi_1(\beta) - \psi_1(\alpha + \beta) =_= \operatorname\left [- \frac \right]= \ln \operatorname_ :- \frac = \operatorname[\ln X,\ln(1-X)] = -\psi_1(\alpha+\beta) =_= \operatorname\left [- \frac \right] = \ln \operatorname_ Since the Fisher information matrix is symmetric : \mathcal_= \mathcal_= \ln \operatorname_ The Fisher information components are equal to the log geometric variances and log geometric covariance. Therefore, they can be expressed as
trigamma function
s, denoted ψ1(α), the second of the
polygamma function
s, defined as the derivative of the digamma function: :\psi_1(\alpha) = \frac=\, \frac. These derivatives are also derived in the and plots of the log likelihood function are also shown in that section. contains plots and further discussion of the Fisher information matrix components: the log geometric variances and log geometric covariance as a function of the shape parameters α and β. contains formulas for moments of logarithmically transformed random variables. Images for the Fisher information components \mathcal_, \mathcal_ and \mathcal_ are shown in . The determinant of Fisher's information matrix is of interest (for example for the calculation of Jeffreys prior probability). From the expressions for the individual components of the Fisher information matrix, it follows that the determinant of Fisher's (symmetric) information matrix for the beta distribution is: :\begin \det(\mathcal(\alpha, \beta))&= \mathcal_ \mathcal_-\mathcal_ \mathcal_ \\ pt&=(\psi_1(\alpha) - \psi_1(\alpha + \beta))(\psi_1(\beta) - \psi_1(\alpha + \beta))-( -\psi_1(\alpha+\beta))( -\psi_1(\alpha+\beta))\\ pt&= \psi_1(\alpha)\psi_1(\beta)-( \psi_1(\alpha)+\psi_1(\beta))\psi_1(\alpha + \beta)\\ pt\lim_ \det(\mathcal(\alpha, \beta)) &=\lim_ \det(\mathcal(\alpha, \beta)) = \infty\\ pt\lim_ \det(\mathcal(\alpha, \beta)) &=\lim_ \det(\mathcal(\alpha, \beta)) = 0 \end From Sylvester's criterion (checking whether the diagonal elements are all positive), it follows that the Fisher information matrix for the two parameter case is Positive-definite matrix, positive-definite (under the standard condition that the shape parameters are positive ''α'' > 0 and ''β'' > 0).
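A small sketch of the two-parameter Fisher information matrix written above in terms of trigamma functions, together with its determinant; the use of SciPy's polygamma and the example shape parameters are assumptions for illustration.

    import numpy as np
    from scipy.special import polygamma

    def beta_fisher_information(alpha, beta):
        trigamma = lambda z: polygamma(1, z)
        i_aa = trigamma(alpha) - trigamma(alpha + beta)   # = var[ln X]
        i_bb = trigamma(beta) - trigamma(alpha + beta)    # = var[ln(1 - X)]
        i_ab = -trigamma(alpha + beta)                    # = cov[ln X, ln(1 - X)]
        return np.array([[i_aa, i_ab], [i_ab, i_bb]])

    info = beta_fisher_information(2.0, 3.0)
    print(info)
    print(np.linalg.det(info))   # positive for any alpha, beta > 0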


=Four parameters

= If ''Y''1, ..., ''YN'' are independent random variables each having a beta distribution with four parameters: the exponents ''α'' and ''β'', and also ''a'' (the minimum of the distribution range), and ''c'' (the maximum of the distribution range) (section titled "Alternative parametrizations", "Four parameters"), with
probability density function
: :f(y; \alpha, \beta, a, c) = \frac =\frac=\frac. the joint log likelihood function per ''N'' independent and identically distributed random variables, iid observations is: :\frac \ln(\mathcal (\alpha, \beta, a, c\mid Y))= \frac\sum_^N \ln (Y_i - a) + \frac\sum_^N \ln (c - Y_i)- \ln \Beta(\alpha,\beta) - (\alpha+\beta -1) \ln (c-a) For the four parameter case, the Fisher information has 4*4=16 components. It has 12 off-diagonal components = (4×4 total − 4 diagonal). Since the Fisher information matrix is symmetric, half of these components (12/2=6) are independent. Therefore, the Fisher information matrix has 6 independent off-diagonal + 4 diagonal = 10 independent components. Aryal and Nadarajah calculated Fisher's information matrix for the four parameter case as follows: :- \frac \frac= \operatorname[\ln (X)]= \psi_1(\alpha) - \psi_1(\alpha + \beta) = \mathcal_= \operatorname\left [- \frac \frac \right ] = \ln (\operatorname) :-\frac \frac = \operatorname ln (1-X)= \psi_1(\beta) - \psi_1(\alpha + \beta) =_= \operatorname \left [- \frac \frac \right ] = \ln(\operatorname) :-\frac \frac = \operatorname[\ln X,(1-X)] = -\psi_1(\alpha+\beta) =\mathcal_= \operatorname \left [- \frac\frac \right ] = \ln(\operatorname_) In the above expressions, the use of ''X'' instead of ''Y'' in the expressions var[ln(''X'')] = ln(var''GX'') is ''not an error''. The expressions in terms of the log geometric variances and log geometric covariance occur as functions of the two parameter ''X'' ~ Beta(''α'', ''β'') parametrization because when taking the partial derivatives with respect to the exponents (''α'', ''β'') in the four parameter case, one obtains the identical expressions as for the two parameter case: these terms of the four parameter Fisher information matrix are independent of the minimum ''a'' and maximum ''c'' of the distribution's range. The only non-zero term upon double differentiation of the log likelihood function with respect to the exponents ''α'' and ''β'' is the second derivative of the log of the beta function: ln(B(''α'', ''β'')). This term is independent of the minimum ''a'' and maximum ''c'' of the distribution's range. Double differentiation of this term results in trigamma functions. The sections titled "Maximum likelihood", "Two unknown parameters" and "Four unknown parameters" also show this fact. The Fisher information for ''N'' i.i.d. samples is ''N'' times the individual Fisher information (eq. 11.279, page 394 of Cover and Thomas). (Aryal and Nadarajah take a single observation, ''N'' = 1, to calculate the following components of the Fisher information, which leads to the same result as considering the derivatives of the log likelihood per ''N'' observations. Moreover, below the erroneous expression for _ in Aryal and Nadarajah has been corrected.) 
:\begin \alpha > 2: \quad \operatorname\left [- \frac \frac \right ] &= _=\frac \\ \beta > 2: \quad \operatorname\left[-\frac \frac \right ] &= \mathcal_ = \frac \\ \operatorname\left[- \frac \frac \right ] &= _ = \frac \\ \alpha > 1: \quad \operatorname\left[- \frac \frac \right ] &=\mathcal_ = \frac \\ \operatorname\left[- \frac \frac \right ] &= _ = \frac \\ \operatorname\left[- \frac \frac \right ] &= _ = -\frac \\ \beta > 1: \quad \operatorname\left[- \frac \frac \right ] &= \mathcal_ = -\frac \end The lower two diagonal entries of the Fisher information matrix, with respect to the parameter "a" (the minimum of the distribution's range): \mathcal_, and with respect to the parameter "c" (the maximum of the distribution's range): \mathcal_ are only defined for exponents α > 2 and β > 2 respectively. The Fisher information matrix component \mathcal_ for the minimum "a" approaches infinity for exponent α approaching 2 from above, and the Fisher information matrix component \mathcal_ for the maximum "c" approaches infinity for exponent β approaching 2 from above. The Fisher information matrix for the four parameter case does not depend on the individual values of the minimum "a" and the maximum "c", but only on the total range (''c''−''a''). Moreover, the components of the Fisher information matrix that depend on the range (''c''−''a''), depend only through its inverse (or the square of the inverse), such that the Fisher information decreases for increasing range (''c''−''a''). The accompanying images show the Fisher information components \mathcal_ and \mathcal_. Images for the Fisher information components \mathcal_ and \mathcal_ are shown in . All these Fisher information components look like a basin, with the "walls" of the basin being located at low values of the parameters. The following four-parameter-beta-distribution Fisher information components can be expressed in terms of the two-parameter: ''X'' ~ Beta(α, β) expectations of the transformed ratio ((1-''X'')/''X'') and of its mirror image (''X''/(1-''X'')), scaled by the range (''c''−''a''), which may be helpful for interpretation: :\mathcal_ =\frac= \frac \text\alpha > 1 :\mathcal_ = -\frac=- \frac\text\beta> 1 These are also the expected values of the "inverted beta distribution" or
beta prime distribution
(also known as beta distribution of the second kind or Pearson distribution, Pearson's Type VI) and its mirror image, scaled by the range (''c'' − ''a''). Also, the following Fisher information components can be expressed in terms of the harmonic (1/X) variances or of variances based on the ratio transformed variables ((1-X)/X) as follows: :\begin \alpha > 2: \quad \mathcal_ &=\operatorname \left [\frac \right] \left (\frac \right )^2 =\operatorname \left [\frac \right ] \left (\frac \right)^2 = \frac \\ \beta > 2: \quad \mathcal_ &= \operatorname \left [\frac \right ] \left (\frac \right )^2 = \operatorname \left [\frac \right ] \left (\frac \right )^2 =\frac \\ \mathcal_ &=\operatorname \left [\frac,\frac \right ]\frac = \operatorname \left [\frac,\frac \right ] \frac =\frac \end See section "Moments of linearly transformed, product and inverted random variables" for these expectations. The determinant of Fisher's information matrix is of interest (for example for the calculation of Jeffreys prior probability). From the expressions for the individual components, it follows that the determinant of Fisher's (symmetric) information matrix for the beta distribution with four parameters is: :\begin \det(\mathcal(\alpha,\beta,a,c)) = & -\mathcal_^2 \mathcal_ \mathcal_+\mathcal_ \mathcal_ \mathcal_ \mathcal_+\mathcal_^2 \mathcal_^2 -\mathcal_ \mathcal_ \mathcal_^2\\ & -\mathcal_ \mathcal_ \mathcal_ \mathcal_+\mathcal_^2 \mathcal_ \mathcal_+2 \mathcal_ \mathcal_ \mathcal_ \mathcal_\\ & -2\mathcal_ \mathcal_ \mathcal_ \mathcal_+\mathcal_^2 \mathcal_^2-\mathcal_ \mathcal_ \mathcal_^2+\mathcal_ \mathcal_^2 \mathcal_\\ & -\mathcal_ \mathcal_ \mathcal_ \mathcal_-\mathcal_ \mathcal_ \mathcal_ \mathcal_+\mathcal_ \mathcal_ \mathcal_ \mathcal_\\ & -\mathcal_ \mathcal_ \mathcal_ \mathcal_+\mathcal_ \mathcal_ \mathcal_ \mathcal_-\mathcal_ \mathcal_^2 \mathcal_\\ & +2 \mathcal_ \mathcal_ \mathcal_ \mathcal_-\mathcal_ \mathcal_^2 \mathcal_-\mathcal_^2 \mathcal_ \mathcal_+\mathcal_ \mathcal_ \mathcal_ \mathcal_\text\alpha, \beta> 2 \end Using Sylvester's criterion (checking whether the diagonal elements are all positive), and since diagonal components _ and _ have Mathematical singularity, singularities at α=2 and β=2 it follows that the Fisher information matrix for the four parameter case is Positive-definite matrix, positive-definite for α>2 and β>2. Since for α > 2 and β > 2 the beta distribution is (symmetric or unsymmetric) bell shaped, it follows that the Fisher information matrix is positive-definite only for bell-shaped (symmetric or unsymmetric) beta distributions, with inflection points located to either side of the mode. Thus, important well known distributions belonging to the four-parameter beta distribution family, like the parabolic distribution (Beta(2,2,a,c)) and the continuous uniform distribution, uniform distribution (Beta(1,1,a,c)) have Fisher information components (\mathcal_,\mathcal_,\mathcal_,\mathcal_) that blow up (approach infinity) in the four-parameter case (although their Fisher information components are all defined for the two parameter case). The four-parameter Wigner semicircle distribution (Beta(3/2,3/2,''a'',''c'')) and arcsine distribution (Beta(1/2,1/2,''a'',''c'')) have negative Fisher information determinants for the four-parameter case.


Bayesian inference

The use of Beta distributions in Bayesian inference is due to the fact that they provide a family of conjugate prior probability distributions for binomial (including
Bernoulli
) and geometric distributions. The domain of the beta distribution can be viewed as a probability, and in fact the beta distribution is often used to describe the distribution of a probability value ''p'': :P(p;\alpha,\beta) = \frac{p^{\alpha-1}(1-p)^{\beta-1}}{\Beta(\alpha,\beta)}. Examples of beta distributions used as prior probabilities to represent ignorance of prior parameter values in Bayesian inference are Beta(1,1), Beta(0,0) and Beta(1/2,1/2).


Rule of succession

A classic application of the beta distribution is the rule of succession, introduced in the 18th century by Pierre-Simon Laplace in the course of treating the sunrise problem. It states that, given ''s'' successes in ''n'' conditionally independent Bernoulli trials with probability ''p'', the estimate of the expected value in the next trial is \frac{s+1}{n+2}. This estimate is the expected value of the posterior distribution over ''p'', namely Beta(''s''+1, ''n''−''s''+1), which is given by Bayes' rule if one assumes a uniform prior probability over ''p'' (i.e., Beta(1, 1)) and then observes that ''p'' generated ''s'' successes in ''n'' trials. Laplace's rule of succession has been criticized by prominent scientists. R. T. Cox described Laplace's application of the rule of succession to the sunrise problem (p. 89) as "a travesty of the proper use of the principle." Keynes remarks (Ch. XXX, p. 382) "indeed this is so foolish a theorem that to entertain it is discreditable." Karl Pearson showed that the probability that the next (''n'' + 1) trials will be successes, after ''n'' successes in ''n'' trials, is only 50%, which has been considered too low by scientists like Jeffreys and unacceptable as a representation of the scientific process of experimentation to test a proposed scientific law. As pointed out by Jeffreys (p. 128) (crediting C. D. Broad) Laplace's rule of succession establishes a high probability of success ((n+1)/(n+2)) in the next trial, but only a moderate probability (50%) that a further sample (n+1) comparable in size will be equally successful. As pointed out by Perks, "The rule of succession itself is hard to accept. It assigns a probability to the next trial which implies the assumption that the actual run observed is an average run and that we are always at the end of an average run. It would, one would think, be more reasonable to assume that we were in the middle of an average run. Clearly a higher value for both probabilities is necessary if they are to accord with reasonable belief." These problems with Laplace's rule of succession motivated Haldane, Perks, Jeffreys and others to search for other forms of prior probability (see the next section). According to Jaynes, the main problem with the rule of succession is that it is not valid when ''s'' = 0 or ''s'' = ''n'' (see rule of succession for an analysis of its validity).
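A tiny illustration of the rule of succession in the beta-distribution language used here (the numbers are arbitrary): with a uniform Beta(1, 1) prior and ''s'' successes in ''n'' trials, the posterior is Beta(''s''+1, ''n''−''s''+1) and its mean reproduces (''s''+1)/(''n''+2).

    from scipy import stats

    s, n = 7, 10
    posterior = stats.beta(s + 1, n - s + 1)
    print(posterior.mean(), (s + 1) / (n + 2))   # both 0.666...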


Bayes-Laplace prior probability (Beta(1,1))

The beta distribution achieves maximum differential entropy for Beta(1,1): the Uniform density, uniform probability density, for which all values in the domain of the distribution have equal density. This uniform distribution Beta(1,1) was suggested ("with a great deal of doubt") by Thomas Bayes as the prior probability distribution to express ignorance about the correct prior distribution. This prior distribution was adopted (apparently, from his writings, with little sign of doubt) by Pierre-Simon Laplace, and hence it was also known as the "Bayes-Laplace rule" or the "Laplace rule" of "inverse probability" in publications of the first half of the 20th century. In the later part of the 19th century and early part of the 20th century, scientists realized that the assumption of uniform "equal" probability density depended on the actual functions (for example whether a linear or a logarithmic scale was most appropriate) and parametrizations used. In particular, the behavior near the ends of distributions with finite support (for example near ''x'' = 0, for a distribution with initial support at ''x'' = 0) required particular attention. Keynes ( Ch.XXX, p. 381) criticized the use of Bayes's uniform prior probability (Beta(1,1)) that all values between zero and one are equiprobable, as follows: "Thus experience, if it shows anything, shows that there is a very marked clustering of statistical ratios in the neighborhoods of zero and unity, of those for positive theories and for correlations between positive qualities in the neighborhood of zero, and of those for negative theories and for correlations between negative qualities in the neighborhood of unity. "


Haldane's prior probability (Beta(0,0))

The Beta(0,0) distribution was proposed by J.B.S. Haldane, who suggested that the prior probability representing complete uncertainty should be proportional to ''p''−1(1−''p'')−1. The function ''p''−1(1−''p'')−1 can be viewed as the limit of the numerator of the beta distribution as both shape parameters approach zero: α, β → 0. The Beta function (in the denominator of the beta distribution) approaches infinity, for both parameters approaching zero, α, β → 0. Therefore, ''p''−1(1−''p'')−1 divided by the Beta function approaches a 2-point
Bernoulli distribution
with equal probability 1/2 at each end, at 0 and 1, and nothing in between, as α, β → 0. A coin-toss: one face of the coin being at 0 and the other face being at 1. The Haldane prior probability distribution Beta(0,0) is an "improper prior" because its integration (from 0 to 1) fails to strictly converge to 1 due to the singularities at each end. However, this is not an issue for computing posterior probabilities unless the sample size is very small. Furthermore, Zellner points out that on the log-odds scale, (the logit transformation ln(''p''/1−''p'')), the Haldane prior is the uniformly flat prior. The fact that a uniform prior probability on the logit transformed variable ln(''p''/1−''p'') (with domain (-∞, ∞)) is equivalent to the Haldane prior on the domain
[0, 1]
was pointed out by Harold Jeffreys in the first edition (1939) of his book Theory of Probability ( p. 123). Jeffreys writes "Certainly if we take the Bayes-Laplace rule right up to the extremes we are led to results that do not correspond to anybody's way of thinking. The (Haldane) rule d''x''/(''x''(1−''x'')) goes too far the other way. It would lead to the conclusion that if a sample is of one type with respect to some property there is a probability 1 that the whole population is of that type." The fact that "uniform" depends on the parametrization, led Jeffreys to seek a form of prior that would be invariant under different parametrizations.


Jeffreys' prior probability (Beta(1/2,1/2) for a Bernoulli or for a binomial distribution)

Harold Jeffreys proposed to use an
uninformative prior
probability measure that should be Parametrization invariance, invariant under reparameterization: proportional to the square root of the determinant of Fisher's information matrix. For the
Bernoulli distribution
, this can be shown as follows: for a coin that is "heads" with probability ''p'' ∈
[0, 1]
and is "tails" with probability 1 − ''p'', for a given (H,T) ∈ the probability is ''pH''(1 − ''p'')''T''. Since ''T'' = 1 − ''H'', the
Bernoulli distribution
is ''pH''(1 − ''p'')1 − ''H''. Considering ''p'' as the only parameter, it follows that the log likelihood for the Bernoulli distribution is :\ln \mathcal (p\mid H) = H \ln(p)+ (1-H) \ln(1-p). The Fisher information matrix has only one component (it is a scalar, because there is only one parameter: ''p''), therefore: :\begin \sqrt &= \sqrt \\ pt&= \sqrt \\ pt&= \sqrt \\ &= \frac. \end Similarly, for the Binomial distribution with ''n'' Bernoulli trials, it can be shown that :\sqrt= \frac. Thus, for the
Bernoulli
, and Binomial distributions, Jeffreys prior is proportional to \scriptstyle \frac, which happens to be proportional to a beta distribution with domain variable ''x'' = ''p'', and shape parameters α = β = 1/2, the arcsine distribution: :Beta(\tfrac, \tfrac) = \frac. It will be shown in the next section that the normalizing constant for Jeffreys prior is immaterial to the final result because the normalizing constant cancels out in Bayes theorem for the posterior probability. Hence Beta(1/2,1/2) is used as the Jeffreys prior for both Bernoulli and binomial distributions. As shown in the next section, when using this expression as a prior probability times the likelihood in Bayes theorem, the posterior probability turns out to be a beta distribution. It is important to realize, however, that Jeffreys prior is proportional to \scriptstyle \frac for the Bernoulli and binomial distribution, but not for the beta distribution. Jeffreys prior for the beta distribution is given by the determinant of Fisher's information for the beta distribution, which, as shown in the is a function of the
trigamma function In mathematics, the trigamma function, denoted or , is the second of the polygamma functions, and is defined by : \psi_1(z) = \frac \ln\Gamma(z). It follows from this definition that : \psi_1(z) = \frac \psi(z) where is the digamma functio ...
ψ1 of shape parameters α and β as follows:

:\begin{aligned}
\sqrt{\det(\mathcal{I}(\alpha,\beta))} &= \sqrt{\psi_1(\alpha)\psi_1(\beta) - (\psi_1(\alpha) + \psi_1(\beta))\,\psi_1(\alpha+\beta)} \\
\lim_{\alpha,\beta \to 0} \sqrt{\det(\mathcal{I}(\alpha,\beta))} &= \infty \\
\lim_{\alpha,\beta \to \infty} \sqrt{\det(\mathcal{I}(\alpha,\beta))} &= 0
\end{aligned}

As previously discussed, Jeffreys prior for the Bernoulli and binomial distributions is proportional to the arcsine distribution Beta(1/2,1/2), a one-dimensional ''curve'' that looks like a basin as a function of the parameter ''p'' of the Bernoulli and binomial distributions. The walls of the basin are formed by ''p'' approaching the singularities at the ends ''p'' → 0 and ''p'' → 1, where Beta(1/2,1/2) approaches infinity. Jeffreys prior for the beta distribution is a ''2-dimensional surface'' (embedded in a three-dimensional space) that looks like a basin with only two of its walls meeting at the corner α = β = 0 (and missing the other two walls) as a function of the shape parameters α and β of the beta distribution. The two adjoining walls of this 2-dimensional surface are formed by the shape parameters α and β approaching the singularities (of the trigamma function) at α, β → 0. It has no walls for α, β → ∞ because in this case the determinant of Fisher's information matrix for the beta distribution approaches zero. It will be shown in the next section that Jeffreys prior probability results in posterior probabilities (when multiplied by the binomial likelihood function) that are intermediate between the posterior probability results of the Haldane and Bayes prior probabilities. Jeffreys prior may be difficult to obtain analytically, and for some cases it just doesn't exist (even for simple distribution functions like the asymmetric triangular distribution). Berger, Bernardo and Sun, in a 2009 paper, defined a reference prior probability distribution that (unlike Jeffreys prior) exists for the asymmetric triangular distribution. They cannot obtain a closed-form expression for their reference prior, but numerical calculations show it to be nearly perfectly fitted by the (proper) prior

:\operatorname{Beta}(\tfrac{1}{2}, \tfrac{1}{2}) \sim \frac{1}{\pi\sqrt{\theta(1-\theta)}}

where θ is the vertex variable for the asymmetric triangular distribution with support [0, 1] (corresponding to the following parameter values in Wikipedia's article on the triangular distribution: vertex ''c'' = ''θ'', left end ''a'' = 0, and right end ''b'' = 1). Berger et al. also give a heuristic argument that Beta(1/2,1/2) could indeed be the exact Berger–Bernardo–Sun reference prior for the asymmetric triangular distribution. Therefore, Beta(1/2,1/2) not only is Jeffreys prior for the Bernoulli and binomial distributions, but also seems to be the Berger–Bernardo–Sun reference prior for the asymmetric triangular distribution (for which the Jeffreys prior does not exist), a distribution used in project management and PERT analysis to describe the cost and duration of project tasks. Clarke and Barron prove that, among continuous positive priors, Jeffreys prior (when it exists) asymptotically maximizes Shannon's mutual information between a sample of size n and the parameter, and therefore ''Jeffreys prior is the most uninformative prior'' (measuring information as Shannon information). The proof rests on an examination of the Kullback–Leibler divergence between probability density functions for iid random variables.
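The proportionalities above are easy to verify numerically. The following sketch (Python, assuming NumPy and SciPy are available; the helper name sqrt_det_fisher is illustrative, not from the text) checks that the normalized 1/√(p(1−p)) prior coincides with the Beta(1/2,1/2) density, and evaluates the square root of the determinant of the beta distribution's Fisher information through the trigamma function ψ1.

```python
import numpy as np
from scipy.stats import beta
from scipy.special import polygamma

# Jeffreys prior for the Bernoulli/binomial parameter p is proportional to
# 1/sqrt(p(1-p)); normalizing it gives the arcsine distribution Beta(1/2, 1/2).
p = np.linspace(0.01, 0.99, 5)
unnormalized = 1.0 / np.sqrt(p * (1.0 - p))
normalized = unnormalized / np.pi                       # divide by B(1/2, 1/2) = pi
print(np.allclose(normalized, beta.pdf(p, 0.5, 0.5)))   # True

# Jeffreys prior for the *beta* distribution itself: the square root of the
# determinant of its Fisher information matrix, in terms of trigamma psi_1.
def sqrt_det_fisher(a, b):
    psi1 = lambda x: polygamma(1, x)
    return np.sqrt(psi1(a) * psi1(b) - (psi1(a) + psi1(b)) * psi1(a + b))

print(sqrt_det_fisher(0.01, 0.01))    # large: the surface blows up as alpha, beta -> 0
print(sqrt_det_fisher(100.0, 100.0))  # small: it flattens out as alpha, beta -> infinity
```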


Effect of different prior probability choices on the posterior beta distribution

If samples are drawn from the population of a random variable ''X'' that result in ''s'' successes and ''f'' failures in ''n'' Bernoulli trials ''n'' = ''s'' + ''f'', then the likelihood function for parameters ''s'' and ''f'' given ''x'' = ''p'' (the notation ''x'' = ''p'' in the expressions below will emphasize that the domain ''x'' stands for the value of the parameter ''p'' in the binomial distribution), is the following binomial distribution:

:\mathcal{L}(s,f\mid x=p) = {s+f \choose s} x^s(1-x)^f = {n \choose s} x^s(1-x)^{n-s}.

If beliefs about prior probability information are reasonably well approximated by a beta distribution with parameters ''α'' Prior and ''β'' Prior, then:

:\operatorname{prior}(x=p;\alpha \operatorname{Prior},\beta \operatorname{Prior}) = \frac{x^{\alpha \operatorname{Prior}-1}(1-x)^{\beta \operatorname{Prior}-1}}{\Beta(\alpha \operatorname{Prior},\beta \operatorname{Prior})}

According to Bayes' theorem for a continuous event space, the posterior probability is given by the product of the prior probability and the likelihood function (given the evidence ''s'' and ''f'' = ''n'' − ''s''), normalized so that the area under the curve equals one, as follows:

:\begin{aligned}
& \operatorname{posterior}(x=p\mid s,n-s) \\
= {} & \frac{\operatorname{prior}(x=p;\alpha \operatorname{Prior},\beta \operatorname{Prior}) \cdot \mathcal{L}(s,f\mid x=p)}{\int_0^1 \operatorname{prior}(x=p;\alpha \operatorname{Prior},\beta \operatorname{Prior}) \cdot \mathcal{L}(s,f\mid x=p)\, dx} \\
= {} & \frac{{n \choose s} x^{s+\alpha \operatorname{Prior}-1}(1-x)^{n-s+\beta \operatorname{Prior}-1} / \Beta(\alpha \operatorname{Prior},\beta \operatorname{Prior})}{\int_0^1 \left({n \choose s} x^{s+\alpha \operatorname{Prior}-1}(1-x)^{n-s+\beta \operatorname{Prior}-1} / \Beta(\alpha \operatorname{Prior},\beta \operatorname{Prior})\right) dx} \\
= {} & \frac{x^{s+\alpha \operatorname{Prior}-1}(1-x)^{n-s+\beta \operatorname{Prior}-1}}{\int_0^1 x^{s+\alpha \operatorname{Prior}-1}(1-x)^{n-s+\beta \operatorname{Prior}-1}\, dx} \\
= {} & \frac{x^{s+\alpha \operatorname{Prior}-1}(1-x)^{n-s+\beta \operatorname{Prior}-1}}{\Beta(s+\alpha \operatorname{Prior},n-s+\beta \operatorname{Prior})}.
\end{aligned}

The binomial coefficient

:{s+f \choose s} = {n \choose s} = \frac{n!}{s!(n-s)!} = \frac{\Gamma(n+1)}{\Gamma(s+1)\Gamma(n-s+1)}
appears both in the numerator and the denominator of the posterior probability, and it does not depend on the integration variable ''x'', hence it cancels out, and it is irrelevant to the final result. Similarly the normalizing factor for the prior probability, the beta function B(αPrior,βPrior), cancels out and it is immaterial to the final result. The same posterior probability result can be obtained if one uses an un-normalized prior

:x^{\alpha \operatorname{Prior}-1}(1-x)^{\beta \operatorname{Prior}-1}

because the normalizing factors all cancel out. Several authors (including Jeffreys himself) thus use an un-normalized prior formula since the normalization constant cancels out. The numerator of the posterior probability ends up being just the (un-normalized) product of the prior probability and the likelihood function, and the denominator is its integral from zero to one. The beta function in the denominator, B(''s'' + ''α'' Prior, ''n'' − ''s'' + ''β'' Prior), appears as a normalization constant to ensure that the total posterior probability integrates to unity. The ratio ''s''/''n'' of the number of successes to the total number of trials is a sufficient statistic in the binomial case, which is relevant for the following results. For the Bayes' prior probability (Beta(1,1)), the posterior probability is:

:\operatorname{posterior}(p=x\mid s,f) = \frac{x^s(1-x)^{n-s}}{\Beta(s+1,n-s+1)}, \text{ with mean} = \frac{s+1}{n+2}, \text{ and mode} = \frac{s}{n} \text{ (if } 0 < s < n).

For the Jeffreys' prior probability (Beta(1/2,1/2)), the posterior probability is:

:\operatorname{posterior}(p=x\mid s,f) = \frac{x^{s-\frac{1}{2}}(1-x)^{n-s-\frac{1}{2}}}{\Beta(s+\frac{1}{2},n-s+\frac{1}{2})}, \text{ with mean} = \frac{s+\frac{1}{2}}{n+1}, \text{ and mode} = \frac{s-\frac{1}{2}}{n-1} \text{ (if } \tfrac{1}{2} < s < n-\tfrac{1}{2}).

and for the Haldane prior probability (Beta(0,0)), the posterior probability is:

:\operatorname{posterior}(p=x\mid s,f) = \frac{x^{s-1}(1-x)^{n-s-1}}{\Beta(s,n-s)}, \text{ with mean} = \frac{s}{n}, \text{ and mode} = \frac{s-1}{n-2} \text{ (if } 1 < s < n-1).

From the above expressions it follows that for ''s''/''n'' = 1/2 all the above three prior probabilities result in the identical location for the posterior probability mean = mode = 1/2. For ''s''/''n'' < 1/2, the means of the posterior probabilities, using the following priors, are such that: mean for Bayes prior > mean for Jeffreys prior > mean for Haldane prior. For ''s''/''n'' > 1/2 the order of these inequalities is reversed such that the Haldane prior probability results in the largest posterior mean. The ''Haldane'' prior probability Beta(0,0) results in a posterior probability density with ''mean'' (the expected value for the probability of success in the "next" trial) identical to the ratio ''s''/''n'' of the number of successes to the total number of trials. Therefore, the Haldane prior results in a posterior probability with expected value in the next trial equal to the maximum likelihood estimate. The ''Bayes'' prior probability Beta(1,1) results in a posterior probability density with ''mode'' identical to the ratio ''s''/''n'' (the maximum likelihood estimate). In the case that 100% of the trials have been successful (''s'' = ''n''), the ''Bayes'' prior probability Beta(1,1) results in a posterior expected value equal to the rule of succession (''n'' + 1)/(''n'' + 2), while the Haldane prior Beta(0,0) results in a posterior expected value of 1 (absolute certainty of success in the next trial). Jeffreys prior probability results in a posterior expected value equal to (''n'' + 1/2)/(''n'' + 1). Perks (p. 303) points out: "This provides a new rule of succession and expresses a 'reasonable' position to take up, namely, that after an unbroken run of n successes we assume a probability for the next trial equivalent to the assumption that we are about half-way through an average run, i.e. that we expect a failure once in (2''n'' + 2) trials.
The Bayes–Laplace rule implies that we are about at the end of an average run or that we expect a failure once in (''n'' + 2) trials. The comparison clearly favours the new result (what is now called Jeffreys prior) from the point of view of 'reasonableness'." Conversely, in the case that 100% of the trials have resulted in failure (''s'' = 0), the ''Bayes'' prior probability Beta(1,1) results in a posterior expected value for success in the next trial equal to 1/(''n'' + 2), while the Haldane prior Beta(0,0) results in a posterior expected value of success in the next trial of 0 (absolute certainty of failure in the next trial). Jeffreys prior probability results in a posterior expected value for success in the next trial equal to (1/2)/(''n'' + 1), which Perks (p. 303) points out: "is a much more reasonably remote result than the Bayes-Laplace result 1/(''n'' + 2)". Jaynes questions (for the uniform prior Beta(1,1)) the use of these formulas for the cases ''s'' = 0 or ''s'' = ''n'' because the integrals do not converge (Beta(1,1) is an improper prior for ''s'' = 0 or ''s'' = ''n''). In practice, the conditions 0 < ''s'' < ''n'' are usually satisfied. Concerning the probability that, after an unbroken run of ''n'' successes, the next (''n'' + 1) trials will also all be successes, Perks (p. 303) shows that, for what is now known as the Jeffreys prior, this probability is ((''n'' + 1/2)/(''n'' + 1))((''n'' + 3/2)/(''n'' + 2)) ⋯ ((2''n'' + 1/2)/(2''n'' + 1)), which for ''n'' = 1, 2, 3 gives 15/24, 315/480, 9009/13440; rapidly approaching a limiting value of 1/\sqrt{2} = 0.70710678\ldots as ''n'' tends to infinity. Perks remarks that what is now known as the Jeffreys prior: "is clearly more 'reasonable' than either the Bayes-Laplace result or the result on the (Haldane) alternative rule rejected by Jeffreys which gives certainty as the probability. It clearly provides a very much better correspondence with the process of induction. Whether it is 'absolutely' reasonable for the purpose, i.e. whether it is yet large enough, without the absurdity of reaching unity, is a matter for others to decide. But it must be realized that the result depends on the assumption of complete indifference and absence of knowledge prior to the sampling experiment." Following are the variances of the posterior distribution obtained with these three prior probability distributions: for the Bayes' prior probability (Beta(1,1)), the posterior variance is:

:\text{variance} = \frac{(s+1)(n-s+1)}{(n+2)^2(n+3)}, \text{ which for } s=\frac{n}{2} \text{ results in variance} = \frac{1}{4(n+3)}

for the Jeffreys' prior probability (Beta(1/2,1/2)), the posterior variance is:

:\text{variance} = \frac{(s+\frac{1}{2})(n-s+\frac{1}{2})}{(n+1)^2(n+2)}, \text{ which for } s=\frac{n}{2} \text{ results in variance} = \frac{1}{4(n+2)}

and for the Haldane prior probability (Beta(0,0)), the posterior variance is:

:\text{variance} = \frac{s(n-s)}{n^2(n+1)}, \text{ which for } s=\frac{n}{2} \text{ results in variance} = \frac{1}{4(n+1)}

So, as remarked by Silvey, for large ''n'', the variance is small and hence the posterior distribution is highly concentrated, whereas the assumed prior distribution was very diffuse. This is in accord with what one would hope for, as vague prior knowledge is transformed (through Bayes theorem) into a more precise posterior knowledge by an informative experiment. For small ''n'' the Haldane Beta(0,0) prior results in the largest posterior variance while the Bayes Beta(1,1) prior results in the most concentrated posterior. Jeffreys prior Beta(1/2,1/2) results in a posterior variance in between the other two. As ''n'' increases, the variance rapidly decreases so that the posterior variance for all three priors converges to approximately the same value (approaching zero variance as ''n'' → ∞).
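As a concrete illustration of the conjugate update described above (the posterior is Beta(''s'' + ''α''Prior, ''n'' − ''s'' + ''β''Prior)), the following sketch (Python with SciPy assumed available; the function name posterior_summary is illustrative) compares the posterior mean, interior mode and variance under the Haldane, Jeffreys and Bayes priors for a given number of successes ''s'' in ''n'' trials.

```python
from scipy.stats import beta

def posterior_summary(s, n, a_prior, b_prior):
    """Posterior Beta(s + a_prior, n - s + b_prior) for a binomial likelihood."""
    a, b = s + a_prior, n - s + b_prior
    mean = a / (a + b)
    mode = (a - 1) / (a + b - 2) if a > 1 and b > 1 else None  # interior mode only
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, mode, var

s, n = 3, 10   # note: the Haldane prior needs 0 < s < n for a proper posterior
for name, (a0, b0) in {"Haldane Beta(0,0)": (0.0, 0.0),
                       "Jeffreys Beta(1/2,1/2)": (0.5, 0.5),
                       "Bayes Beta(1,1)": (1.0, 1.0)}.items():
    mean, mode, var = posterior_summary(s, n, a0, b0)
    print(f"{name}: mean={mean:.4f}, mode={mode}, var={var:.5f}")

# Haldane mean = s/n = 0.3; Bayes mean = (s+1)/(n+2) = 0.333...; Jeffreys lies in between,
# matching the ordering stated above for s/n < 1/2.
# An illustrative 95% central credible interval under the Jeffreys prior:
print(beta.interval(0.95, s + 0.5, n - s + 0.5))
```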
Recalling the previous result that the ''Haldane'' prior probability Beta(0,0) results in a posterior probability density with ''mean'' (the expected value for the probability of success in the "next" trial) identical to the ratio ''s''/''n'' of the number of successes to the total number of trials, it follows from the above expression that also the ''Haldane'' prior Beta(0,0) results in a posterior with ''variance'' identical to the variance expressed in terms of the maximum likelihood estimate ''s''/''n'' and the sample size:

:\text{variance} = \frac{\mu(1-\mu)}{1+\nu} = \frac{\frac{s}{n}\left(1 - \frac{s}{n}\right)}{1+n}

with the mean ''μ'' = ''s''/''n'' and the sample size ''ν'' = ''n''. In Bayesian inference, using a prior distribution Beta(''α''Prior,''β''Prior) prior to a binomial distribution is equivalent to adding (''α''Prior − 1) pseudo-observations of "success" and (''β''Prior − 1) pseudo-observations of "failure" to the actual number of successes and failures observed, then estimating the parameter ''p'' of the binomial distribution by the proportion of successes over both real- and pseudo-observations. A uniform prior Beta(1,1) does not add (or subtract) any pseudo-observations since for Beta(1,1) it follows that (''α''Prior − 1) = 0 and (''β''Prior − 1) = 0. The Haldane prior Beta(0,0) subtracts one pseudo-observation from each and Jeffreys prior Beta(1/2,1/2) subtracts 1/2 pseudo-observation of success and an equal number of failures. This subtraction has the effect of smoothing out the posterior distribution. If the proportion of successes is not 50% (''s''/''n'' ≠ 1/2), values of ''α''Prior and ''β''Prior less than 1 (and therefore negative (''α''Prior − 1) and (''β''Prior − 1)) favor sparsity, i.e. distributions where the parameter ''p'' is closer to either 0 or 1. In effect, values of ''α''Prior and ''β''Prior between 0 and 1, when operating together, function as a concentration parameter. The accompanying plots show the posterior probability density functions for sample sizes ''n'' ∈ , successes ''s'' ∈  and Beta(''α''Prior,''β''Prior) ∈ . Also shown are the cases for ''n'' = , success ''s'' =  and Beta(''α''Prior,''β''Prior) ∈ . The first plot shows the symmetric cases, for successes ''s'' ∈ , with mean = mode = 1/2 and the second plot shows the skewed cases ''s'' ∈ . The images show that there is little difference between the priors for the posterior with sample size of 50 (characterized by a more pronounced peak near ''p'' = 1/2). Significant differences appear for very small sample sizes (in particular for the flatter distribution for the degenerate case of sample size = 3). Therefore, the skewed cases, with successes ''s'' = , show a larger effect from the choice of prior, at small sample size, than the symmetric cases. For symmetric distributions, the Bayes prior Beta(1,1) results in the most "peaky" and highest posterior distributions and the Haldane prior Beta(0,0) results in the flattest and lowest peak distribution. The Jeffreys prior Beta(1/2,1/2) lies in between them. For nearly symmetric, not too skewed distributions the effect of the priors is similar. For very small sample size (in this case for a sample size of 3) and skewed distribution (in this example for ''s'' ∈ ) the Haldane prior can result in a reverse-J-shaped distribution with a singularity at the left end.
However, this happens only in degenerate cases (in this example ''n'' = 3 and hence ''s'' = 3/4 < 1, a degenerate value because ''s'' should be greater than unity in order for the posterior of the Haldane prior to have a mode located between the ends, and because ''s'' = 3/4 is not an integer number, hence it violates the initial assumption of a binomial distribution for the likelihood) and it is not an issue in generic cases of reasonable sample size (such that the condition 1 < ''s'' < ''n'' − 1, necessary for a mode to exist between both ends, is fulfilled). In Chapter 12 (p. 385) of his book, Jaynes asserts that the ''Haldane prior'' Beta(0,0) describes a ''prior state of knowledge of complete ignorance'', where we are not even sure whether it is physically possible for an experiment to yield either a success or a failure, while the ''Bayes (uniform) prior Beta(1,1) applies if'' one knows that ''both binary outcomes are possible''. Jaynes states: "''interpret the Bayes-Laplace (Beta(1,1)) prior as describing not a state of complete ignorance'', but the state of knowledge in which we have observed one success and one failure...once we have seen at least one success and one failure, then we know that the experiment is a true binary one, in the sense of physical possibility." Jaynes does not specifically discuss Jeffreys prior Beta(1/2,1/2) (Jaynes's discussion of "Jeffreys prior" on pp. 181, 423 and in chapter 12 of his book refers instead to the improper, un-normalized, prior "1/''p'' ''dp''" introduced by Jeffreys in the 1939 edition of his book, seven years before he introduced what is now known as Jeffreys' invariant prior: the square root of the determinant of Fisher's information matrix. ''"1/p" is Jeffreys' (1946) invariant prior for the exponential distribution, not for the Bernoulli or binomial distributions''). However, it follows from the above discussion that the Jeffreys Beta(1/2,1/2) prior represents a state of knowledge in between the Haldane Beta(0,0) and Bayes Beta(1,1) priors. Similarly, Karl Pearson in his 1892 book The Grammar of Science (p. 144 of 1900 edition) maintained that the Bayes (Beta(1,1)) uniform prior was not a complete ignorance prior, and that it should be used when prior information justified "distributing our ignorance equally". K. Pearson wrote: "Yet the only supposition that we appear to have made is this: that, knowing nothing of nature, routine and anomy (from the Greek ανομία, namely: a- "without", and nomos "law") are to be considered as equally likely to occur. Now we were not really justified in making even this assumption, for it involves a knowledge that we do not possess regarding nature. We use our ''experience'' of the constitution and action of coins in general to assert that heads and tails are equally probable, but we have no right to assert before experience that, as we know nothing of nature, routine and breach are equally probable. In our ignorance we ought to consider before experience that nature may consist of all routines, all anomies (normlessness), or a mixture of the two in any proportion whatever, and that all such are equally probable. Which of these constitutions after experience is the most probable must clearly depend on what that experience has been like."
If there is sufficient sampling data, ''and the posterior probability mode is not located at one of the extremes of the domain'' (''x'' = 0 or ''x'' = 1), the three priors of Bayes (Beta(1,1)), Jeffreys (Beta(1/2,1/2)) and Haldane (Beta(0,0)) should yield similar ''posterior'' probability densities. Otherwise, as Gelman et al. (p. 65) point out, "if so few data are available that the choice of noninformative prior distribution makes a difference, one should put relevant information into the prior distribution", or as Berger (p. 125) points out, "when different reasonable priors yield substantially different answers, can it be right to state that there ''is'' a single answer? Would it not be better to admit that there is scientific uncertainty, with the conclusion depending on prior beliefs?"


Occurrence and applications


Order statistics

The beta distribution has an important application in the theory of order statistics. A basic result is that the distribution of the ''k''th smallest of a sample of size ''n'' from a continuous uniform distribution has a beta distribution (David, H. A., Nagaraja, H. N. (2003) ''Order Statistics'' (3rd edition). Wiley, New Jersey, pp. 458). This result is summarized as:

:U_{(k)} \sim \operatorname{Beta}(k, n+1-k).

From this, and application of the theory related to the probability integral transform, the distribution of any individual order statistic from any continuous distribution can be derived.
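This result is easy to check numerically. A sketch in Python (NumPy and SciPy assumed available; the sample sizes and seed are arbitrary) simulates uniform samples, takes the ''k''th smallest value of each, and compares the empirical distribution with Beta(''k'', ''n'' + 1 − ''k'').

```python
import numpy as np
from scipy.stats import beta, kstest

rng = np.random.default_rng(0)
n, k = 10, 3                       # k-th smallest of a sample of size n
samples = rng.uniform(size=(100_000, n))
kth_smallest = np.sort(samples, axis=1)[:, k - 1]

# The k-th order statistic of n uniforms should follow Beta(k, n + 1 - k).
print(kth_smallest.mean(), k / (n + 1))               # both close to 3/11
print(kstest(kth_smallest, beta(k, n + 1 - k).cdf))   # large p-value expected
```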


Subjective logic

In standard logic, propositions are considered to be either true or false. In contradistinction, subjective logic assumes that humans cannot determine with absolute certainty whether a proposition about the real world is absolutely true or false. In subjective logic the a posteriori probability estimates of binary events can be represented by beta distributions (A. Jøsang, "A Logic for Uncertain Probabilities", ''International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems'', 9(3), pp. 279–311, June 2001).
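As a hedged illustration (the specific mapping below, in which ''r'' positive and ''s'' negative observations of a binary event with a uniform base rate are represented by a Beta(''r'' + 1, ''s'' + 1) density, is a common convention in the subjective-logic literature rather than a formula quoted in this article):

```python
from scipy.stats import beta

# Assumed convention: r positive and s negative observations of a binary
# proposition, with a uniform base rate, give the posterior Beta(r + 1, s + 1).
r, s = 8, 2
posterior = beta(r + 1, s + 1)
print(posterior.mean())           # expected probability of the proposition, 9/12 = 0.75
print(posterior.interval(0.90))   # 90% credible interval expressing the uncertainty
```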


Wavelet analysis

A wavelet is a wave-like oscillation with an amplitude that starts out at zero, increases, and then decreases back to zero. It can typically be visualized as a "brief oscillation" that promptly decays. Wavelets can be used to extract information from many different kinds of data, including – but certainly not limited to – audio signals and images. Thus, wavelets are purposefully crafted to have specific properties that make them useful for signal processing. Wavelets are localized in both time and frequency whereas the standard Fourier transform is only localized in frequency. Therefore, standard Fourier transforms are only applicable to stationary processes, while wavelets are applicable to non-stationary processes. Continuous wavelets can be constructed based on the beta distribution. Beta wavelets (H. M. de Oliveira and G. A. A. Araújo, "Compactly Supported One-cyclic Wavelets Derived from Beta Distributions", ''Journal of Communication and Information Systems'', vol. 20, n. 3, pp. 27–33, 2005) can be viewed as a soft variety of Haar wavelets whose shape is fine-tuned by two shape parameters α and β.


Population genetics

The Balding–Nichols model is a two-parameter parametrization of the beta distribution used in population genetics. It is a statistical description of the allele frequencies in the components of a sub-divided population:

:\begin{aligned} \alpha &= \mu \nu, \\ \beta &= (1 - \mu) \nu, \end{aligned}

where \nu = \alpha + \beta = \frac{1-F}{F} and 0 < F < 1; here ''F'' is (Wright's) genetic distance between two populations.
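A small sketch (Python with SciPy assumed; the helper name balding_nichols is illustrative, and ν = (1 − F)/F follows the reconstruction of the formula above) converts the Balding–Nichols parameters into beta shape parameters:

```python
from scipy.stats import beta

def balding_nichols(mu, F):
    """Return the Beta(alpha, beta) shape parameters of the Balding-Nichols model.

    mu : ancestral (mean) allele frequency, 0 < mu < 1
    F  : Wright's genetic distance between populations, 0 < F < 1
    """
    nu = (1.0 - F) / F           # nu = alpha + beta
    return mu * nu, (1.0 - mu) * nu

a, b = balding_nichols(mu=0.3, F=0.1)
dist = beta(a, b)
print(a, b)           # 2.7, 6.3
print(dist.mean())    # equals mu = 0.3
print(dist.var())     # equals F * mu * (1 - mu) = 0.021
```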


Project management: task cost and schedule modeling

The beta distribution can be used to model events which are constrained to take place within an interval defined by a minimum and maximum value. For this reason, the beta distribution — along with the triangular distribution — is used extensively in PERT, critical path method (CPM), Joint Cost Schedule Modeling (JCSM) and other project management/control systems to describe the time to completion and the cost of a task. In project management, shorthand computations are widely used to estimate the mean and standard deviation of the beta distribution:

:\begin{aligned} \mu(X) & = \frac{a + 4b + c}{6} \\ \sigma(X) & = \frac{c - a}{6} \end{aligned}

where ''a'' is the minimum, ''c'' is the maximum, and ''b'' is the most likely value (the mode for ''α'' > 1 and ''β'' > 1). The above estimate for the mean \mu(X) = \frac{a + 4b + c}{6} is known as the PERT three-point estimation and it is exact for either of the following values of ''β'' (for arbitrary α within these ranges):

:''β'' = ''α'' > 1 (symmetric case) with standard deviation \sigma(X) = \frac{c-a}{2\sqrt{1+2\alpha}}, skewness = 0, and excess kurtosis = \frac{-6}{3 + 2\alpha}

or

:''β'' = 6 − ''α'' for 5 > ''α'' > 1 (skewed case) with standard deviation

:\sigma(X) = \frac{(c-a)\sqrt{\alpha(6-\alpha)}}{6\sqrt{7}},

skewness = \frac{(3-\alpha)\sqrt{7}}{2\sqrt{\alpha(6-\alpha)}}, and excess kurtosis = \frac{21}{\alpha(6-\alpha)} - 3

The above estimate for the standard deviation ''σ''(''X'') = (''c'' − ''a'')/6 is exact for either of the following values of ''α'' and ''β'':

:''α'' = ''β'' = 4 (symmetric) with skewness = 0, and excess kurtosis = −6/11.
:''β'' = 6 − ''α'' and \alpha = 3 - \sqrt2 (right-tailed, positive skew) with skewness = \frac{1}{\sqrt2}, and excess kurtosis = 0
:''β'' = 6 − ''α'' and \alpha = 3 + \sqrt2 (left-tailed, negative skew) with skewness = \frac{-1}{\sqrt2}, and excess kurtosis = 0

Otherwise, these can be poor approximations for beta distributions with other values of α and β, exhibiting average errors of 40% in the mean and 549% in the variance.
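The shorthand PERT computations above are easy to check numerically. The sketch below (Python; the helper names pert_estimates and beta_mean_sd_on_interval are illustrative) compares them with the exact beta mean and standard deviation for one of the exact cases, β = 6 − α with α = 3 − √2, after rescaling a standard beta variable to the interval [''a'', ''c'']:

```python
import math

def pert_estimates(a, b, c):
    """Classic PERT three-point shorthand for the mean and standard deviation."""
    return (a + 4 * b + c) / 6.0, (c - a) / 6.0

def beta_mean_sd_on_interval(alpha, beta_, a, c):
    """Exact mean and standard deviation of a beta distribution rescaled to [a, c]."""
    mean01 = alpha / (alpha + beta_)
    var01 = alpha * beta_ / ((alpha + beta_) ** 2 * (alpha + beta_ + 1))
    return a + (c - a) * mean01, (c - a) * math.sqrt(var01)

a, c = 2.0, 14.0              # minimum and maximum task duration (arbitrary example)
alpha = 3 - math.sqrt(2)      # one of the cases where sigma = (c - a)/6 is exact
beta_ = 6 - alpha
# the mode of the rescaled beta gives the "most likely" value b used by PERT
b = a + (c - a) * (alpha - 1) / (alpha + beta_ - 2)

print(pert_estimates(a, b, c))                        # shorthand estimates
print(beta_mean_sd_on_interval(alpha, beta_, a, c))   # exact values, which match here
```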


Random variate generation

If ''X'' and ''Y'' are independent, with X \sim \Gamma(\alpha, \theta) and Y \sim \Gamma(\beta, \theta) then

:\frac{X}{X+Y} \sim \Beta(\alpha, \beta).

So one algorithm for generating beta variates is to generate \frac{X}{X+Y}, where ''X'' is a gamma variate with parameters (α, 1) and ''Y'' is an independent gamma variate with parameters (β, 1). In fact, here \frac{X}{X+Y} and X+Y are independent, and X+Y \sim \Gamma(\alpha + \beta, \theta). If Z \sim \Gamma(\gamma, \theta) and Z is independent of X and Y, then \frac{X+Y}{X+Y+Z} \sim \Beta(\alpha+\beta,\gamma) and \frac{X+Y}{X+Y+Z} is independent of \frac{X}{X+Y}. This shows that the product of independent \Beta(\alpha,\beta) and \Beta(\alpha+\beta,\gamma) random variables is a \Beta(\alpha,\beta+\gamma) random variable. Also, the ''k''th order statistic of ''n'' uniformly distributed variates is \Beta(k, n+1-k), so an alternative if α and β are small integers is to generate α + β − 1 uniform variates and choose the α-th smallest. Another way to generate the beta distribution is by the Pólya urn model. According to this method, one starts with an "urn" with α "black" balls and β "white" balls and draws uniformly with replacement. At every trial an additional ball is added according to the color of the last ball which was drawn. Asymptotically, the proportion of black and white balls will be distributed according to the beta distribution, where each repetition of the experiment will produce a different value. It is also possible to use inverse transform sampling.
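The gamma-ratio and order-statistic recipes above translate directly into code. A minimal sketch in Python/NumPy (the seed and parameter values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)
alpha, beta_, size = 2.5, 4.0, 100_000

# Method 1: ratio of independent gamma variates, X/(X+Y) ~ Beta(alpha, beta).
x = rng.gamma(shape=alpha, scale=1.0, size=size)
y = rng.gamma(shape=beta_, scale=1.0, size=size)
samples_gamma = x / (x + y)

# Method 2 (small integer shape parameters only): the alpha-th smallest of
# alpha + beta - 1 uniform variates is Beta(alpha, beta) distributed.
a_int, b_int = 3, 4
u = rng.uniform(size=(size, a_int + b_int - 1))
samples_order = np.sort(u, axis=1)[:, a_int - 1]

print(samples_gamma.mean(), alpha / (alpha + beta_))   # both ~ 0.3846
print(samples_order.mean(), a_int / (a_int + b_int))   # both ~ 0.4286
```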


History

Thomas Bayes, in a posthumous paper published in 1763 by Richard Price, obtained a beta distribution as the density of the probability of success in Bernoulli trials, but the paper does not analyze any of the moments of the beta distribution or discuss any of its properties. The first systematic modern discussion of the beta distribution is probably due to Karl Pearson. In Pearson's papers the beta distribution is couched as a solution of a differential equation: Pearson's Type I distribution, to which it is essentially identical except for arbitrary shifting and re-scaling (the beta and Pearson Type I distributions can always be equalized by proper choice of parameters). In fact, in several English books and journal articles in the few decades prior to World War II, it was common to refer to the beta distribution as Pearson's Type I distribution. William P. Elderton in his 1906 monograph "Frequency curves and correlation" further analyzes the beta distribution as Pearson's Type I distribution, including a full discussion of the method of moments for the four-parameter case, and diagrams of (what Elderton describes as) U-shaped, J-shaped, twisted J-shaped, "cocked-hat" shapes, horizontal and angled straight-line cases. Elderton wrote "I am chiefly indebted to Professor Pearson, but the indebtedness is of a kind for which it is impossible to offer formal thanks." Elderton's 1906 monograph provides an impressive amount of information on the beta distribution, including equations for the origin of the distribution chosen to be the mode, as well as for other Pearson distributions: types I through VII. Elderton also included a number of appendixes, including one appendix ("II") on the beta and gamma functions. In later editions, Elderton added equations for the origin of the distribution chosen to be the mean, and analysis of Pearson distributions VIII through XII. As remarked by Bowman and Shenton, "Fisher and Pearson had a difference of opinion in the approach to (parameter) estimation, in particular relating to (Pearson's method of) moments and (Fisher's method of) maximum likelihood in the case of the Beta distribution." Also according to Bowman and Shenton, "the case of a Type I (beta distribution) model being the center of the controversy was pure serendipity. A more difficult model of 4 parameters would have been hard to find." The long-running public conflict of Fisher with Karl Pearson can be followed in a number of articles in prestigious journals. For example, concerning the estimation of the four parameters for the beta distribution, and Fisher's criticism of Pearson's method of moments as being arbitrary, see Pearson's article "Method of moments and method of maximum likelihood" (published three years after his retirement from University College, London, where his position had been divided between Fisher and Pearson's son Egon), in which Pearson writes "I read (Koshai's paper in the Journal of the Royal Statistical Society, 1933) which as far as I am aware is the only case at present published of the application of Professor Fisher's method. To my astonishment that method depends on first working out the constants of the frequency curve by the (Pearson) Method of Moments and then superposing on it, by what Fisher terms "the Method of Maximum Likelihood" a further approximation to obtain, what he holds, he will thus get, "more efficient values" of the curve constants."
David and Edwards's treatise on the history of statistics cites the first modern treatment of the beta distribution, in 1911, using the beta designation that has become standard, due to Corrado Gini, an Italian statistician, demographer, and sociologist, who developed the Gini coefficient. N. L. Johnson and S. Kotz, in their comprehensive and very informative monograph on leading historical personalities in statistical sciences, credit Corrado Gini as "an early Bayesian...who dealt with the problem of eliciting the parameters of an initial Beta distribution, by singling out techniques which anticipated the advent of the so-called empirical Bayes approach."


References


External links


"Beta Distribution"
by Fiona Maclachlan, the Wolfram Demonstrations Project, 2007.
Beta Distribution – Overview and Example
xycoon.com

brighton-webs.co.uk

exstrom.com * *
Harvard University Statistics 110 Lecture 23 Beta Distribution, Prof. Joe Blitzstein
{{DEFAULTSORT:Beta Distribution Continuous distributions Factorial and binomial topics Conjugate prior distributions Exponential family distributions]">X - E[X] = 0\\ \lim_ \operatorname ]_=_\frac_ The_mean_absolute_deviation_around_the_mean_is_a_more_robust_ Robustness_is_the_property_of_being_strong_and_healthy_in_constitution._When_it_is_transposed_into_a_system,_it_refers_to_the_ability_of_tolerating_perturbations_that_might_affect_the_system’s_functional_body._In_the_same_line_''robustness''_ca_...
_
estimator In statistics, an estimator is a rule for calculating an estimate of a given quantity based on observed data: thus the rule (the estimator), the quantity of interest (the estimand) and its result (the estimate) are distinguished. For example, the ...
_of_
statistical_dispersion In statistics, dispersion (also called variability, scatter, or spread) is the extent to which a distribution is stretched or squeezed. Common examples of measures of statistical dispersion are the variance, standard deviation, and interquartile ...
_than_the_standard_deviation_for_beta_distributions_with_tails_and_inflection_points_at_each_side_of_the_mode,_Beta(''α'', ''β'')_distributions_with_''α'',''β''_>_2,_as_it_depends_on_the_linear_(absolute)_deviations_rather_than_the_square_deviations_from_the_mean.__Therefore,_the_effect_of_very_large_deviations_from_the_mean_are_not_as_overly_weighted. Using_
Stirling's_approximation In mathematics, Stirling's approximation (or Stirling's formula) is an approximation for factorials. It is a good approximation, leading to accurate results even for small values of n. It is named after James Stirling, though a related but less p ...
_to_the_Gamma_function,_Norman_Lloyd_Johnson, N.L.Johnson_and_Samuel_Kotz, S.Kotz_derived_the_following_approximation_for_values_of_the_shape_parameters_greater_than_unity_(the_relative_error_for_this_approximation_is_only_−3.5%_for_''α''_=_''β''_=_1,_and_it_decreases_to_zero_as_''α''_→_∞,_''β''_→_∞): :_\begin \frac_&=\frac\\ &\approx_\sqrt_\left(1+\frac-\frac-\frac_\right),_\text_\alpha,_\beta_>_1. \end At_the_limit_α_→_∞,_β_→_∞,_the_ratio_of_the_mean_absolute_deviation_to_the_standard_deviation_(for_the_beta_distribution)_becomes_equal_to_the_ratio_of_the_same_measures_for_the_normal_distribution:_\sqrt.__For_α_=_β_=_1_this_ratio_equals_\frac,_so_that_from_α_=_β_=_1_to_α,_β_→_∞_the_ratio_decreases_by_8.5%.__For_α_=_β_=_0_the_standard_deviation_is_exactly_equal_to_the_mean_absolute_deviation_around_the_mean._Therefore,_this_ratio_decreases_by_15%_from_α_=_β_=_0_to_α_=_β_=_1,_and_by_25%_from_α_=_β_=_0_to_α,_β_→_∞_._However,_for_skewed_beta_distributions_such_that_α_→_0_or_β_→_0,_the_ratio_of_the_standard_deviation_to_the_mean_absolute_deviation_approaches_infinity_(although_each_of_them,_individually,_approaches_zero)_because_the_mean_absolute_deviation_approaches_zero_faster_than_the_standard_deviation. Using_the__parametrization_in_terms_of_mean_μ_and_sample_size_ν_=_α_+_β_>_0: :α_=_μν,_β_=_(1−μ)ν one_can_express_the_mean_absolute_deviation_around_the_mean_in_terms_of_the_mean_μ_and_the_sample_size_ν_as_follows: :\operatorname[, _X_-_E ]_=_\frac For_a_symmetric_distribution,_the_mean_is_at_the_middle_of_the_distribution,_μ_=_1/2,_and_therefore: :_\begin \operatorname[, X_-_E ]__=_\frac_&=_\frac_\\ \lim__\left_(\lim__\operatorname[, X_-_E ]_\right_)_&=_\tfrac\\ \lim__\left_(\lim__\operatorname[, _X_-_E ]_\right_)_&=_0 \end Also,_the_following_limits_(with_only_the_noted_variable_approaching_the_limit)_can_be_obtained_from_the_above_expressions: :_\begin \lim__\operatorname[, X_-_E ]_&=\lim__\operatorname[, X_-_E ]=_0_\\ \lim__\operatorname[, X_-_E ]_&=\lim__\operatorname[, X_-_E ]_=_0\\ \lim__\operatorname[, X_-_E ]&=\lim__\operatorname[, X_-_E ]_=_0\\ \lim__\operatorname[, X_-_E ]_&=_\sqrt_\\ \lim__\operatorname[, X_-_E ]_&=_0 \end


_Mean_absolute_difference

The_mean_absolute_difference_for_the_Beta_distribution_is: :\mathrm_=_\int_0^1_\int_0^1_f(x;\alpha,\beta)\,f(y;\alpha,\beta)\,, x-y, \,dx\,dy_=_\left(\frac\right)\frac The_Gini_coefficient_for_the_Beta_distribution_is_half_of_the_relative_mean_absolute_difference: :\mathrm_=_\left(\frac\right)\frac


_Skewness

The_skewness_ In_probability_theory_and_statistics,_skewness_is_a_measure_of_the_asymmetry_of_the_probability_distribution_of_a__real-valued_random_variable_about_its_mean._The_skewness_value_can_be_positive,_zero,_negative,_or_undefined. For_a_unimodal__...
_(the_third_moment_centered_on_the_mean,_normalized_by_the_3/2_power_of_the_variance)_of_the_beta_distribution_is :\gamma_1_=\frac_=_\frac_. Letting_α_=_β_in_the_above_expression_one_obtains_γ1_=_0,_showing_once_again_that_for_α_=_β_the_distribution_is_symmetric_and_hence_the_skewness_is_zero._Positive_skew_(right-tailed)_for_α_<_β,_negative_skew_(left-tailed)_for_α_>_β. Using_the__parametrization_in_terms_of_mean_μ_and_sample_size_ν_=_α_+_β: :_\begin __\alpha_&__=_\mu_\nu_,\text\nu_=(\alpha_+_\beta)__>0\\ __\beta_&__=_(1_-_\mu)_\nu_,_\text\nu_=(\alpha_+_\beta)__>0. \end one_can_express_the_skewness_in_terms_of_the_mean_μ_and_the_sample_size_ν_as_follows: :\gamma_1_=\frac_=_\frac. The_skewness_can_also_be_expressed_just_in_terms_of_the_variance_''var''_and_the_mean_μ_as_follows: :\gamma_1_=\frac_=_\frac\text_\operatorname_<_\mu(1-\mu) The_accompanying_plot_of_skewness_as_a_function_of_variance_and_mean_shows_that_maximum_variance_(1/4)_is_coupled_with_zero_skewness_and_the_symmetry_condition_(μ_=_1/2),_and_that_maximum_skewness_(positive_or_negative_infinity)_occurs_when_the_mean_is_located_at_one_end_or_the_other,_so_that_the_"mass"_of_the_probability_distribution_is_concentrated_at_the_ends_(minimum_variance). The_following_expression_for_the_square_of_the_skewness,_in_terms_of_the_sample_size_ν_=_α_+_β_and_the_variance_''var'',_is_useful_for_the_method_of_moments_estimation_of_four_parameters: :(\gamma_1)^2_=\frac_=_\frac\bigg(\frac-4(1+\nu)\bigg) This_expression_correctly_gives_a_skewness_of_zero_for_α_=_β,_since_in_that_case_(see_):_\operatorname_=_\frac. For_the_symmetric_case_(α_=_β),_skewness_=_0_over_the_whole_range,_and_the_following_limits_apply: :\lim__\gamma_1_=_\lim__\gamma_1_=\lim__\gamma_1=\lim__\gamma_1=\lim__\gamma_1_=_0 For_the_asymmetric_cases_(α_≠_β)_the_following_limits_(with_only_the_noted_variable_approaching_the_limit)_can_be_obtained_from_the_above_expressions: :_\begin &\lim__\gamma_1_=\lim__\gamma_1_=_\infty\\ &\lim__\gamma_1__=_\lim__\gamma_1=_-_\infty\\ &\lim__\gamma_1_=_-\frac,\quad_\lim_(\lim__\gamma_1)_=_-\infty,\quad_\lim_(\lim__\gamma_1)_=_0\\ &\lim__\gamma_1_=_\frac,\quad_\lim_(\lim__\gamma_1)_=_\infty,\quad_\lim_(\lim__\gamma_1)_=_0\\ &\lim__\gamma_1_=_\frac,\quad_\lim_(\lim__\gamma_1)__=_\infty,\quad_\lim_(\lim__\gamma_1)_=_-_\infty \end


_Kurtosis

The_beta_distribution_has_been_applied_in_acoustic_analysis_to_assess_damage_to_gears,_as_the_kurtosis_of_the_beta_distribution_has_been_reported_to_be_a_good_indicator_of_the_condition_of_a_gear.
_Kurtosis_has_also_been_used_to_distinguish_the_seismic_signal_generated_by_a_person's_footsteps_from_other_signals._As_persons_or_other_targets_moving_on_the_ground_generate_continuous_signals_in_the_form_of_seismic_waves,_one_can_separate_different_targets_based_on_the_seismic_waves_they_generate._Kurtosis_is_sensitive_to_impulsive_signals,_so_it's_much_more_sensitive_to_the_signal_generated_by_human_footsteps_than_other_signals_generated_by_vehicles,_winds,_noise,_etc.
__Unfortunately,_the_notation_for_kurtosis_has_not_been_standardized._Kenney_and_Keeping
__use_the_symbol_γ2_for_the_excess_kurtosis_ In_probability_theory_and_statistics,_kurtosis_(from__el,_κυρτός,_''kyrtos''_or_''kurtos'',_meaning_"curved,_arching")_is_a_measure_of_the_"tailedness"_of_the_probability_distribution_of_a_real-valued_random_variable._Like_skewness,_kurtosi_...
,_but_Abramowitz_and_Stegun
__use_different_terminology.__To_prevent_confusion
__between_kurtosis_(the_fourth_moment_centered_on_the_mean,_normalized_by_the_square_of_the_variance)_and_excess_kurtosis,_when_using_symbols,_they_will_be_spelled_out_as_follows:
:\begin \text _____&=\text_-_3\\ _____&=\frac-3\\ _____&=\frac\\ _____&=\frac _. \end Letting_α_=_β_in_the_above_expression_one_obtains :\text_=-_\frac_\text\alpha=\beta_. Therefore,_for_symmetric_beta_distributions,_the_excess_kurtosis_is_negative,_increasing_from_a_minimum_value_of_−2_at_the_limit_as__→_0,_and_approaching_a_maximum_value_of_zero_as__→_∞.__The_value_of_−2_is_the_minimum_value_of_excess_kurtosis_that_any_distribution_(not_just_beta_distributions,_but_any_distribution_of_any_possible_kind)_can_ever_achieve.__This_minimum_value_is_reached_when_all_the_probability_density_is_entirely_concentrated_at_each_end_''x''_=_0_and_''x''_=_1,_with_nothing_in_between:_a_2-point_Bernoulli_distribution_ In_probability_theory_and_statistics,_the_Bernoulli_distribution,_named_after_Swiss_mathematician__Jacob_Bernoulli,James_Victor_Uspensky:_''Introduction_to_Mathematical_Probability'',_McGraw-Hill,_New_York_1937,_page_45_is_the__discrete_probabi_...
_with_equal_probability_1/2_at_each_end_(a_coin_toss:_see_section_below_"Kurtosis_bounded_by_the_square_of_the_skewness"_for_further_discussion).__The_description_of_kurtosis_as_a_measure_of_the_"potential_outliers"_(or_"potential_rare,_extreme_values")_of_the_probability_distribution,_is_correct_for_all_distributions_including_the_beta_distribution._When_rare,_extreme_values_can_occur_in_the_beta_distribution,_the_higher_its_kurtosis;_otherwise,_the_kurtosis_is_lower._For_α_≠_β,_skewed_beta_distributions,_the_excess_kurtosis_can_reach_unlimited_positive_values_(particularly_for_α_→_0_for_finite_β,_or_for_β_→_0_for_finite_α)_because_the_side_away_from_the_mode_will_produce_occasional_extreme_values.__Minimum_kurtosis_takes_place_when_the_mass_density_is_concentrated_equally_at_each_end_(and_therefore_the_mean_is_at_the_center),_and_there_is_no_probability_mass_density_in_between_the_ends. Using_the__parametrization_in_terms_of_mean_μ_and_sample_size_ν_=_α_+_β: :_\begin __\alpha_&__=_\mu_\nu_,\text\nu_=(\alpha_+_\beta)__>0\\ __\beta_&__=_(1_-_\mu)_\nu_,_\text\nu_=(\alpha_+_\beta)__>0. \end one_can_express_the_excess_kurtosis_in_terms_of_the_mean_μ_and_the_sample_size_ν_as_follows: :\text_=\frac\bigg_(\frac_-_1_\bigg_) The_excess_kurtosis_can_also_be_expressed_in_terms_of_just_the_following_two_parameters:_the_variance_''var'',_and_the_sample_size_ν_as_follows: :\text_=\frac\left(\frac_-_6_-_5_\nu_\right)\text\text<_\mu(1-\mu) and,_in_terms_of_the_variance_''var''_and_the_mean_μ_as_follows: :\text_=\frac\text\text<_\mu(1-\mu) The_plot_of_excess_kurtosis_as_a_function_of_the_variance_and_the_mean_shows_that_the_minimum_value_of_the_excess_kurtosis_(−2,_which_is_the_minimum_possible_value_for_excess_kurtosis_for_any_distribution)_is_intimately_coupled_with_the_maximum_value_of_variance_(1/4)_and_the_symmetry_condition:_the_mean_occurring_at_the_midpoint_(μ_=_1/2)._This_occurs_for_the_symmetric_case_of_α_=_β_=_0,_with_zero_skewness.__At_the_limit,_this_is_the_2_point_Bernoulli_distribution_ In_probability_theory_and_statistics,_the_Bernoulli_distribution,_named_after_Swiss_mathematician__Jacob_Bernoulli,James_Victor_Uspensky:_''Introduction_to_Mathematical_Probability'',_McGraw-Hill,_New_York_1937,_page_45_is_the__discrete_probabi_...
_with_equal_probability_1/2_at_each__Dirac_delta_function_end_''x''_=_0_and_''x''_=_1_and_zero_probability_everywhere_else._(A_coin_toss:_one_face_of_the_coin_being_''x''_=_0_and_the_other_face_being_''x''_=_1.)__Variance_is_maximum_because_the_distribution_is_bimodal_with_nothing_in_between_the_two_modes_(spikes)_at_each_end.__Excess_kurtosis_is_minimum:_the_probability_density_"mass"_is_zero_at_the_mean_and_it_is_concentrated_at_the_two_peaks_at_each_end.__Excess_kurtosis_reaches_the_minimum_possible_value_(for_any_distribution)_when_the_probability_density_function_has_two_spikes_at_each_end:_it_is_bi-"peaky"_with_nothing_in_between_them. On_the_other_hand,_the_plot_shows_that_for_extreme_skewed_cases,_where_the_mean_is_located_near_one_or_the_other_end_(μ_=_0_or_μ_=_1),_the_variance_is_close_to_zero,_and_the_excess_kurtosis_rapidly_approaches_infinity_when_the_mean_of_the_distribution_approaches_either_end. Alternatively,_the_excess_kurtosis_can_also_be_expressed_in_terms_of_just_the_following_two_parameters:_the_square_of_the_skewness,_and_the_sample_size_ν_as_follows: :\text_=\frac\bigg(\frac_(\text)^2_-_1\bigg)\text^2-2<_\text<_\frac_(\text)^2 From_this_last_expression,_one_can_obtain_the_same_limits_published_practically_a_century_ago_by_Karl_Pearson_in_his_paper,_for_the_beta_distribution_(see_section_below_titled_"Kurtosis_bounded_by_the_square_of_the_skewness")._Setting_α_+_β=_ν_=__0_in_the_above_expression,_one_obtains_Pearson's_lower_boundary_(values_for_the_skewness_and_excess_kurtosis_below_the_boundary_(excess_kurtosis_+_2_−_skewness2_=_0)_cannot_occur_for_any_distribution,_and_hence_Karl_Pearson_appropriately_called_the_region_below_this_boundary_the_"impossible_region")._The_limit_of_α_+_β_=_ν_→_∞_determines_Pearson's_upper_boundary. :_\begin &\lim_\text__=_(\text)^2_-_2\\ &\lim_\text__=_\tfrac_(\text)^2 \end therefore: :(\text)^2-2<_\text<_\tfrac_(\text)^2 Values_of_ν_=_α_+_β_such_that_ν_ranges_from_zero_to_infinity,_0_<_ν_<_∞,_span_the_whole_region_of_the_beta_distribution_in_the_plane_of_excess_kurtosis_versus_squared_skewness. For_the_symmetric_case_(α_=_β),_the_following_limits_apply: :_\begin &\lim__\text_=__-_2_\\ &\lim__\text_=_0_\\ &\lim__\text_=_-_\frac \end For_the_unsymmetric_cases_(α_≠_β)_the_following_limits_(with_only_the_noted_variable_approaching_the_limit)_can_be_obtained_from_the_above_expressions: :_\begin &\lim_\text__=\lim__\text__=_\lim_\text__=_\lim_\text__=\infty\\ &\lim_\text__=_\frac,\text__\lim_(\lim__\text)__=_\infty,\text__\lim_(\lim__\text)__=_0\\ &\lim_\text__=_\frac,\text__\lim_(\lim__\text)__=_\infty,\text__\lim_(\lim__\text)__=_0\\ &\lim__\text__=_-_6_+_\frac,\text__\lim_(\lim__\text)__=_\infty,\text__\lim_(\lim__\text)__=_\infty \end


_Characteristic_function

The_Characteristic_function_(probability_theory), characteristic_function_is_the_Fourier_transform_of_the_probability_density_function.__The_characteristic_function_of_the_beta_distribution_is_confluent_hypergeometric_function, Kummer's_confluent_hypergeometric_function_(of_the_first_kind):
:\begin \varphi_X(\alpha;\beta;t) &=_\operatorname\left[e^\right]\\ &=_\int_0^1_e^_f(x;\alpha,\beta)_dx_\\ &=_1F_1(\alpha;_\alpha+\beta;_it)\!\\ &=\sum_^\infty_\frac__\\ &=_1__+\sum_^_\left(_\prod_^_\frac_\right)_\frac \end where :_x^=x(x+1)(x+2)\cdots(x+n-1) is_the_rising_factorial,_also_called_the_"Pochhammer_symbol".__The_value_of_the_characteristic_function_for_''t''_=_0,_is_one: :_\varphi_X(\alpha;\beta;0)=_1F_1(\alpha;_\alpha+\beta;_0)_=_1__. Also,_the_real_and_imaginary_parts_of_the_characteristic_function_enjoy_the_following_symmetries_with_respect_to_the_origin_of_variable_''t'': :_\textrm_\left_[__1F_1(\alpha;_\alpha+\beta;_it)_\right_]_=_\textrm_\left_[__1F_1(\alpha;_\alpha+\beta;_-_it)_\right_]__ :_\textrm_\left_[__1F_1(\alpha;_\alpha+\beta;_it)_\right_]_=_-_\textrm_\left__[__1F_1(\alpha;_\alpha+\beta;_-_it)_\right_]__ The_symmetric_case_α_=_β_simplifies_the_characteristic_function_of_the_beta_distribution_to_a_Bessel_function,_since_in_the_special_case_α_+_β_=_2α_the_confluent_hypergeometric_function_(of_the_first_kind)_reduces_to_a_Bessel_function_(the_modified_Bessel_function_of_the_first_kind_I__)_using_Ernst_Kummer, Kummer's_second_transformation_as_follows: Another_example_of_the_symmetric_case_α_=_β_=_n/2_for_beamforming_applications_can_be_found_in_Figure_11_of_ :\begin__1F_1(\alpha;2\alpha;_it)_&=_e^__0F_1_\left(;_\alpha+\tfrac;_\frac_\right)_\\ &=_e^_\left(\frac\right)^_\Gamma\left(\alpha+\tfrac\right)_I_\left(\frac\right).\end In_the_accompanying_plots,_the_Complex_number, real_part_(Re)_of_the_Characteristic_function_(probability_theory), characteristic_function_of_the_beta_distribution_is_displayed_for_symmetric_(α_=_β)_and_skewed_(α_≠_β)_cases.


_Other_moments


_Moment_generating_function

It_also_follows_that_the_moment_generating_function_is :\begin M_X(\alpha;_\beta;_t) &=_\operatorname\left[e^\right]_\\_pt&=_\int_0^1_e^_f(x;\alpha,\beta)\,dx_\\_pt&=__1F_1(\alpha;_\alpha+\beta;_t)_\\_pt&=_\sum_^\infty_\frac__\frac_\\_pt&=_1__+\sum_^_\left(_\prod_^_\frac_\right)_\frac \end In_particular_''M''''X''(''α'';_''β'';_0)_=_1.


_Higher_moments

Using_the_moment_generating_function,_the_''k''-th_raw_moment_is_given_by_the_factor :\prod_^_\frac_ multiplying_the_(exponential_series)_term_\left(\frac\right)_in_the_series_of_the_moment_generating_function :\operatorname[X^k]=_\frac_=_\prod_^_\frac where_(''x'')(''k'')_is_a_Pochhammer_symbol_representing_rising_factorial._It_can_also_be_written_in_a_recursive_form_as :\operatorname[X^k]_=_\frac\operatorname[X^]. Since_the_moment_generating_function_M_X(\alpha;_\beta;_\cdot)_has_a_positive_radius_of_convergence,_the_beta_distribution_is_Moment_problem, determined_by_its_moments.


_Moments_of_transformed_random_variables


_=Moments_of_linearly_transformed,_product_and_inverted_random_variables

= One_can_also_show_the_following_expectations_for_a_transformed_random_variable,_where_the_random_variable_''X''_is_Beta-distributed_with_parameters_α_and_β:_''X''_~_Beta(α,_β).__The_expected_value_of_the_variable_1 − ''X''_is_the_mirror-symmetry_of_the_expected_value_based_on_''X'': :\begin &_\operatorname[1-X]_=_\frac_\\ &_\operatorname[X_(1-X)]_=\operatorname[(1-X)X_]_=\frac \end Due_to_the_mirror-symmetry_of_the_probability_density_function_of_the_beta_distribution,_the_variances_based_on_variables_''X''_and_1 − ''X''_are_identical,_and_the_covariance_on_''X''(1 − ''X''_is_the_negative_of_the_variance: :\operatorname[(1-X)]=\operatorname[X]_=_-\operatorname[X,(1-X)]=_\frac These_are_the_expected_values_for_inverted_variables,_(these_are_related_to_the_harmonic_means,_see_): :\begin &_\operatorname_\left_[\frac_\right_]_=_\frac_\text_\alpha_>_1\\ &_\operatorname\left_[\frac_\right_]_=\frac_\text_\beta_>_1 \end The_following_transformation_by_dividing_the_variable_''X''_by_its_mirror-image_''X''/(1 − ''X'')_results_in_the_expected_value_of_the_"inverted_beta_distribution"_or_beta_prime_distribution_ In_probability_theory_and__statistics,_the_beta_prime_distribution_(also_known_as_inverted_beta_distribution_or_beta_distribution_of_the_second_kindJohnson_et_al_(1995),_p_248)_is_an_absolutely_continuous_probability_distribution. __Definitions_ _...
_(also_known_as_beta_distribution_of_the_second_kind_or_Pearson_distribution, Pearson's_Type_VI): :_\begin &_\operatorname\left[\frac\right]_=\frac_\text\beta_>_1\\ &_\operatorname\left[\frac\right]_=\frac\text\alpha_>_1 \end_ Variances_of_these_transformed_variables_can_be_obtained_by_integration,_as_the_expected_values_of_the_second_moments_centered_on_the_corresponding_variables: :\operatorname_\left[\frac_\right]_=\operatorname\left[\left(\frac_-_\operatorname\left[\frac_\right_]_\right_)^2\right]= :\operatorname\left_[\frac_\right_]_=\operatorname_\left_[\left_(\frac_-_\operatorname\left_[\frac_\right_]_\right_)^2_\right_]=_\frac_\text\alpha_>_2 The_following_variance_of_the_variable_''X''_divided_by_its_mirror-image_(''X''/(1−''X'')_results_in_the_variance_of_the_"inverted_beta_distribution"_or_beta_prime_distribution_ In_probability_theory_and__statistics,_the_beta_prime_distribution_(also_known_as_inverted_beta_distribution_or_beta_distribution_of_the_second_kindJohnson_et_al_(1995),_p_248)_is_an_absolutely_continuous_probability_distribution. __Definitions_ _...
_(also_known_as_beta_distribution_of_the_second_kind_or_Pearson_distribution, Pearson's_Type_VI): :\operatorname_\left_[\frac_\right_]_=\operatorname_\left_[\left(\frac_-_\operatorname_\left_[\frac_\right_]_\right)^2_\right_]=\operatorname_\left_[\frac_\right_]_= :\operatorname_\left_[\left_(\frac_-_\operatorname_\left_[\frac_\right_]_\right_)^2_\right_]=_\frac_\text\beta_>_2 The_covariances_are: :\operatorname\left_[\frac,\frac_\right_]_=_\operatorname\left[\frac,\frac_\right]_=\operatorname\left[\frac,\frac\right_]_=_\operatorname\left[\frac,\frac_\right]_=\frac_\text_\alpha,_\beta_>_1 These_expectations_and_variances_appear_in_the_four-parameter_Fisher_information_matrix_(.)


Moments of logarithmically transformed random variables

Expected values for logarithmic transformations (useful for maximum likelihood estimates; see the section on Maximum likelihood estimation below) are discussed in this section. The following logarithmic linear transformations are related to the geometric means ''G_X'' and ''G_{(1-X)}'':

:\begin{align}
\operatorname{E}[\ln(X)] &= \psi(\alpha) - \psi(\alpha + \beta) = - \operatorname{E}\left[\ln \left (\frac{1}{X} \right )\right],\\
\operatorname{E}[\ln(1-X)] &= \psi(\beta) - \psi(\alpha + \beta) = - \operatorname{E} \left[\ln \left (\frac{1}{1-X} \right )\right].
\end{align}

where the digamma function ψ(α) is defined as the logarithmic derivative of the gamma function:

:\psi(\alpha) = \frac{d \ln \Gamma(\alpha)}{d\alpha}

Logit transformations are interesting, as they usually transform various shapes (including J-shapes) into (usually skewed) bell-shaped densities over the logit variable, and they may remove the end singularities over the original variable:

:\begin{align}
\operatorname{E}\left[\ln \left (\frac{X}{1-X} \right ) \right] &= \psi(\alpha) - \psi(\beta) = \operatorname{E}[\ln(X)] + \operatorname{E} \left[\ln \left (\frac{1}{1-X} \right) \right],\\
\operatorname{E}\left [\ln \left (\frac{1-X}{X} \right ) \right ] &= \psi(\beta) - \psi(\alpha) = - \operatorname{E} \left[\ln \left (\frac{X}{1-X} \right) \right].
\end{align}

Johnson considered the distribution of the logit-transformed variable ln(''X''/(1−''X'')), including its moment generating function and approximations for large values of the shape parameters. This transformation extends the finite support [0, 1] based on the original variable ''X'' to infinite support in both directions of the real line (−∞, +∞).

Higher order logarithmic moments can be derived by using the representation of a beta distribution as a proportion of two Gamma distributions and differentiating through the integral. They can be expressed in terms of higher order poly-gamma functions as follows:

:\begin{align}
\operatorname{E} \left [\ln^2(X) \right ] &= (\psi(\alpha) - \psi(\alpha + \beta))^2+\psi_1(\alpha)-\psi_1(\alpha+\beta), \\
\operatorname{E} \left [\ln^2(1-X) \right ] &= (\psi(\beta) - \psi(\alpha + \beta))^2+\psi_1(\beta)-\psi_1(\alpha+\beta), \\
\operatorname{E} \left [\ln (X)\ln(1-X) \right ] &= (\psi(\alpha) - \psi(\alpha + \beta))(\psi(\beta) - \psi(\alpha + \beta)) -\psi_1(\alpha+\beta).
\end{align}

Therefore the variance of the logarithmic variables and the covariance of ln(''X'') and ln(1−''X'') are:

:\begin{align}
\operatorname{cov}[\ln(X), \ln(1-X)] &= \operatorname{E}\left[\ln(X)\ln(1-X)\right] - \operatorname{E}[\ln(X)]\operatorname{E}[\ln(1-X)] = -\psi_1(\alpha+\beta) \\
& \\
\operatorname{var}[\ln X] &= \operatorname{E}[\ln^2(X)] - (\operatorname{E}[\ln(X)])^2 \\
&= \psi_1(\alpha) - \psi_1(\alpha + \beta) \\
&= \psi_1(\alpha) + \operatorname{cov}[\ln(X), \ln(1-X)] \\
& \\
\operatorname{var}[\ln (1-X)] &= \operatorname{E}[\ln^2 (1-X)] - (\operatorname{E}[\ln (1-X)])^2 \\
&= \psi_1(\beta) - \psi_1(\alpha + \beta) \\
&= \psi_1(\beta) + \operatorname{cov}[\ln (X), \ln(1-X)]
\end{align}

where the trigamma function, denoted ψ_1(α), is the second of the polygamma functions, and is defined as the derivative of the digamma function:

:\psi_1(\alpha) = \frac{d^2\ln\Gamma(\alpha)}{d\alpha^2}= \frac{d\psi(\alpha)}{d\alpha}.

The variances and covariance of the logarithmically transformed variables ''X'' and (1−''X'') are different, in general, because the logarithmic transformation destroys the mirror-symmetry of the original variables ''X'' and (1−''X''), as the logarithm approaches negative infinity for the variable approaching zero.

These logarithmic variances and covariance are the elements of the Fisher information matrix for the beta distribution. They are also a measure of the curvature of the log likelihood function (see the section on Maximum likelihood estimation).

The variances of the log inverse variables are identical to the variances of the log variables:

:\begin{align}
\operatorname{var}\left[\ln \left (\frac{1}{X} \right ) \right] & =\operatorname{var}[\ln(X)] = \psi_1(\alpha) - \psi_1(\alpha + \beta), \\
\operatorname{var}\left[\ln \left (\frac{1}{1-X} \right ) \right] &=\operatorname{var}[\ln (1-X)] = \psi_1(\beta) - \psi_1(\alpha + \beta), \\
\operatorname{cov}\left[\ln \left (\frac{1}{X} \right), \ln \left (\frac{1}{1-X}\right ) \right] &=\operatorname{cov}[\ln(X),\ln(1-X)] = -\psi_1(\alpha + \beta).
\end{align}

It also follows that the variances of the logit transformed variables are:

:\operatorname{var}\left[\ln \left (\frac{X}{1-X} \right )\right]=\operatorname{var}\left[\ln \left (\frac{1-X}{X} \right ) \right]=-\operatorname{cov}\left [\ln \left (\frac{X}{1-X} \right ), \ln \left (\frac{1-X}{X} \right ) \right]= \psi_1(\alpha) + \psi_1(\beta)
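As a minimal numerical check, assuming NumPy and SciPy are available, the digamma and trigamma expressions above can be compared against Monte Carlo estimates (the choice α = 2, β = 3 is arbitrary):

<syntaxhighlight lang="python">
# Sketch: compare the closed-form logarithmic moments of Beta(alpha, beta)
# with Monte Carlo estimates.  Assumes NumPy and SciPy.
import numpy as np
from scipy.special import digamma, polygamma

alpha, beta = 2.0, 3.0
trigamma = lambda z: polygamma(1, z)                # psi_1, the trigamma function

mean_ln_x       = digamma(alpha) - digamma(alpha + beta)    # E[ln X]
var_ln_x        = trigamma(alpha) - trigamma(alpha + beta)  # var[ln X]
cov_ln_x_ln_1mx = -trigamma(alpha + beta)                   # cov[ln X, ln(1-X)]

rng = np.random.default_rng(0)
x = rng.beta(alpha, beta, size=1_000_000)
print(mean_ln_x, np.mean(np.log(x)))                        # ~ -1.0833 (= -13/12)
print(var_ln_x, np.var(np.log(x)))
print(cov_ln_x_ln_1mx, np.cov(np.log(x), np.log(1 - x))[0, 1])
</syntaxhighlight>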


Quantities of information (entropy)

Given a beta distributed random variable, ''X'' ~ Beta(''α'', ''β''), the differential entropy of ''X'' (measured in nats) is the expected value of the negative of the logarithm of the probability density function:

:\begin{align}
h(X) &= \operatorname{E}[-\ln(f(x;\alpha,\beta))] \\
&=\int_0^1 -f(x;\alpha,\beta)\ln(f(x;\alpha,\beta)) \, dx \\
&= \ln(\Beta(\alpha,\beta))-(\alpha-1)\psi(\alpha)-(\beta-1)\psi(\beta)+(\alpha+\beta-2) \psi(\alpha+\beta)
\end{align}

where ''f''(''x''; ''α'', ''β'') is the probability density function of the beta distribution:

:f(x;\alpha,\beta) = \frac{1}{\Beta(\alpha,\beta)} x^{\alpha-1}(1-x)^{\beta-1}

The digamma function ''ψ'' appears in the formula for the differential entropy as a consequence of Euler's integral formula for the harmonic numbers, which follows from the integral:

:\int_0^1 \frac{1-x^{\alpha-1}}{1-x} \, dx = \psi(\alpha)-\psi(1)

The differential entropy of the beta distribution is negative for all values of ''α'' and ''β'' greater than zero, except at ''α'' = ''β'' = 1 (for which values the beta distribution is the same as the uniform distribution), where the differential entropy reaches its maximum value of zero. It is to be expected that the maximum entropy should take place when the beta distribution becomes equal to the uniform distribution, since uncertainty is maximal when all possible events are equiprobable.

For ''α'' or ''β'' approaching zero, the differential entropy approaches its minimum value of negative infinity. For (either or both) ''α'' or ''β'' approaching zero, there is a maximum amount of order: all the probability density is concentrated at the ends, and there is zero probability density at points located between the ends. Similarly, for (either or both) ''α'' or ''β'' approaching infinity, the differential entropy approaches its minimum value of negative infinity, and a maximum amount of order. If either ''α'' or ''β'' approaches infinity (and the other is finite) all the probability density is concentrated at an end, and the probability density is zero everywhere else. If both shape parameters are equal (the symmetric case), ''α'' = ''β'', and they approach infinity simultaneously, the probability density becomes a spike (Dirac delta function) concentrated at the middle ''x'' = 1/2, and hence there is 100% probability at the middle ''x'' = 1/2 and zero probability everywhere else.

The (continuous case) differential entropy was introduced by Shannon in his original paper (where he named it the "entropy of a continuous distribution"), as the concluding part of the same paper where he defined the discrete entropy. It has been known since then that the differential entropy may differ from the infinitesimal limit of the discrete entropy by an infinite offset, therefore the differential entropy can be negative (as it is for the beta distribution). What really matters is the relative value of entropy.

Given two beta distributed random variables, ''X''_1 ~ Beta(''α'', ''β'') and ''X''_2 ~ Beta(''α''′, ''β''′), the cross entropy is (measured in nats)

:\begin{align}
H(X_1,X_2) &= \int_0^1 - f(x;\alpha,\beta) \ln (f(x;\alpha',\beta')) \,dx \\
&= \ln \left(\Beta(\alpha',\beta')\right)-(\alpha'-1)\psi(\alpha)-(\beta'-1)\psi(\beta)+(\alpha'+\beta'-2)\psi(\alpha+\beta).
\end{align}

The cross entropy has been used as an error metric to measure the distance between two hypotheses. Its absolute value is minimum when the two distributions are identical. It is the information measure most closely related to the log maximum likelihood (see the section on "Parameter estimation. Maximum likelihood estimation").

The relative entropy, or Kullback–Leibler divergence ''D''_KL(''X''_1 || ''X''_2), is a measure of the inefficiency of assuming that the distribution is ''X''_2 ~ Beta(''α''′, ''β''′) when the distribution is really ''X''_1 ~ Beta(''α'', ''β''). It is defined as follows (measured in nats):

:\begin{align}
D_{\mathrm{KL}}(X_1\|X_2) &= \int_0^1 f(x;\alpha,\beta) \ln \left (\frac{f(x;\alpha,\beta)}{f(x;\alpha',\beta')} \right ) \, dx \\
&= \left (\int_0^1 f(x;\alpha,\beta) \ln (f(x;\alpha,\beta)) \,dx \right )- \left (\int_0^1 f(x;\alpha,\beta) \ln (f(x;\alpha',\beta')) \, dx \right )\\
&= -h(X_1) + H(X_1,X_2)\\
&= \ln\left(\frac{\Beta(\alpha',\beta')}{\Beta(\alpha,\beta)}\right)+(\alpha-\alpha')\psi(\alpha)+(\beta-\beta')\psi(\beta)+(\alpha'-\alpha+\beta'-\beta)\psi (\alpha + \beta).
\end{align}

The relative entropy, or Kullback–Leibler divergence, is always non-negative. A few numerical examples follow:
*''X''_1 ~ Beta(1, 1) and ''X''_2 ~ Beta(3, 3); ''D''_KL(''X''_1 || ''X''_2) = 0.598803; ''D''_KL(''X''_2 || ''X''_1) = 0.267864; ''h''(''X''_1) = 0; ''h''(''X''_2) = −0.267864
*''X''_1 ~ Beta(3, 0.5) and ''X''_2 ~ Beta(0.5, 3); ''D''_KL(''X''_1 || ''X''_2) = 7.21574; ''D''_KL(''X''_2 || ''X''_1) = 7.21574; ''h''(''X''_1) = −1.10805; ''h''(''X''_2) = −1.10805.

The Kullback–Leibler divergence is not symmetric, ''D''_KL(''X''_1 || ''X''_2) ≠ ''D''_KL(''X''_2 || ''X''_1), for the case in which the individual beta distributions Beta(1, 1) and Beta(3, 3) are symmetric, but have different entropies ''h''(''X''_1) ≠ ''h''(''X''_2). The value of the Kullback divergence depends on the direction traveled: whether going from a higher (differential) entropy to a lower (differential) entropy or the other way around. In the numerical example above, the Kullback divergence measures the inefficiency of assuming that the distribution is (bell-shaped) Beta(3, 3), rather than (uniform) Beta(1, 1). The "h" entropy of Beta(1, 1) is higher than the "h" entropy of Beta(3, 3) because the uniform distribution Beta(1, 1) has a maximum amount of disorder. The Kullback divergence is more than two times higher (0.598803 instead of 0.267864) when measured in the direction of decreasing entropy: the direction that assumes that the (uniform) Beta(1, 1) distribution is (bell-shaped) Beta(3, 3) rather than the other way around. In this restricted sense, the Kullback divergence is consistent with the second law of thermodynamics.

The Kullback–Leibler divergence is symmetric, ''D''_KL(''X''_1 || ''X''_2) = ''D''_KL(''X''_2 || ''X''_1), for the skewed cases Beta(3, 0.5) and Beta(0.5, 3) that have equal differential entropy ''h''(''X''_1) = ''h''(''X''_2).

The symmetry condition:

:D_{\mathrm{KL}}(X_1\|X_2) = D_{\mathrm{KL}}(X_2\|X_1),\text{ if }h(X_1) = h(X_2),\text{ for }\alpha \neq \beta

follows from the above definitions and the mirror-symmetry ''f''(''x''; ''α'', ''β'') = ''f''(1−''x''; ''β'', ''α'') enjoyed by the beta distribution.
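A small sketch, assuming SciPy, shows how the entropy and Kullback–Leibler expressions above reproduce the first numerical example (the helper names beta_entropy and beta_kl are ad hoc):

<syntaxhighlight lang="python">
# Sketch: differential entropy and KL divergence of beta distributions, in nats.
from scipy.special import betaln, digamma as psi

def beta_entropy(a, b):
    # h(X) = ln B(a,b) - (a-1) psi(a) - (b-1) psi(b) + (a+b-2) psi(a+b)
    return betaln(a, b) - (a - 1)*psi(a) - (b - 1)*psi(b) + (a + b - 2)*psi(a + b)

def beta_kl(a, b, a2, b2):
    # D_KL( Beta(a,b) || Beta(a2,b2) )
    return (betaln(a2, b2) - betaln(a, b)
            + (a - a2)*psi(a) + (b - b2)*psi(b)
            + (a2 - a + b2 - b)*psi(a + b))

print(beta_entropy(1, 1))   # 0.0: the uniform distribution, maximum entropy
print(beta_entropy(3, 3))   # ~ -0.267864
print(beta_kl(1, 1, 3, 3))  # ~ 0.598803
print(beta_kl(3, 3, 1, 1))  # ~ 0.267864
</syntaxhighlight>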


Relationships between statistical measures


Mean, mode and median relationship

If 1 < α < β then mode ≤ median ≤ mean. (Kerman J (2011), "A closed-form approximation for the median of the beta distribution".) Expressing the mode (only for α, β > 1) and the mean in terms of α and β:

: \frac{\alpha - 1}{\alpha + \beta - 2} \le \text{median} \le \frac{\alpha}{\alpha + \beta},

If 1 < β < α then the order of the inequalities is reversed. For α, β > 1 the absolute distance between the mean and the median is less than 5% of the distance between the maximum and minimum values of ''x''. On the other hand, the absolute distance between the mean and the mode can reach 50% of the distance between the maximum and minimum values of ''x'', for the (pathological) case of α = 1 and β = 1, for which values the beta distribution approaches the uniform distribution and the differential entropy approaches its maximum value, and hence maximum "disorder".

For example, for α = 1.0001 and β = 1.00000001:
* mode = 0.9999;   PDF(mode) = 1.00010
* mean = 0.500025; PDF(mean) = 1.00003
* median = 0.500035; PDF(median) = 1.00003
* mean − mode = −0.499875
* mean − median = −9.65538 × 10^−6

where PDF stands for the value of the probability density function.
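As a quick illustration, assuming SciPy, the ordering mode ≤ median ≤ mean can be checked numerically for a case with 1 < α < β (the values α = 2, β = 5 are arbitrary):

<syntaxhighlight lang="python">
# Sketch: mode <= median <= mean for 1 < alpha < beta.  Assumes SciPy.
from scipy.stats import beta as beta_dist

a, b = 2.0, 5.0
mode = (a - 1) / (a + b - 2)          # 0.2
median = beta_dist.median(a, b)       # ~ 0.26
mean = beta_dist.mean(a, b)           # ~ 0.286
assert mode <= median <= mean
print(mode, median, mean)
</syntaxhighlight>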


Mean, geometric mean and harmonic mean relationship

It is known from the inequality of arithmetic and geometric means that the geometric mean is lower than the mean. Similarly, the harmonic mean is lower than the geometric mean. The accompanying plot shows that for α = β, both the mean and the median are exactly equal to 1/2, regardless of the value of α = β, and the mode is also equal to 1/2 for α = β > 1; however, the geometric and harmonic means are lower than 1/2, and they only approach this value asymptotically as α = β → ∞.


Kurtosis bounded by the square of the skewness

As remarked by Feller, in the Pearson system the beta probability density appears as type I (any difference between the beta distribution and Pearson's type I distribution is only superficial and it makes no difference for the following discussion regarding the relationship between kurtosis and skewness). Karl Pearson showed, in Plate 1 of his paper published in 1916, a graph with the kurtosis as the vertical axis (ordinate) and the square of the skewness as the horizontal axis (abscissa), in which a number of distributions were displayed. The region occupied by the beta distribution is bounded by the following two lines in the (skewness², kurtosis) plane, or the (skewness², excess kurtosis) plane:

:(\text{skewness})^2+1< \text{kurtosis}< \frac{3}{2} (\text{skewness})^2 + 3

or, equivalently,

:(\text{skewness})^2-2< \text{excess kurtosis}< \frac{3}{2} (\text{skewness})^2

At a time when there were no powerful digital computers, Karl Pearson accurately computed further boundaries, for example, separating the "U-shaped" from the "J-shaped" distributions. The lower boundary line (excess kurtosis + 2 − skewness² = 0) is produced by skewed "U-shaped" beta distributions with both values of shape parameters α and β close to zero. The upper boundary line (excess kurtosis − (3/2) skewness² = 0) is produced by extremely skewed distributions with very large values of one of the parameters and very small values of the other parameter. Karl Pearson showed that this upper boundary line (excess kurtosis − (3/2) skewness² = 0) is also the intersection with Pearson's distribution III, which has unlimited support in one direction (towards positive infinity), and can be bell-shaped or J-shaped. His son, Egon Pearson, showed that the region (in the kurtosis/squared-skewness plane) occupied by the beta distribution (equivalently, Pearson's distribution I) as it approaches this boundary (excess kurtosis − (3/2) skewness² = 0) is shared with the noncentral chi-squared distribution. Karl Pearson (Pearson 1895, pp. 357, 360, 373–376) also showed that the gamma distribution is a Pearson type III distribution. Hence this boundary line for Pearson's type III distribution is known as the gamma line. (This can be shown from the fact that the excess kurtosis of the gamma distribution is 6/''k'' and the square of the skewness is 4/''k'', hence (excess kurtosis − (3/2) skewness² = 0) is identically satisfied by the gamma distribution regardless of the value of the parameter "k".) Pearson later noted that the chi-squared distribution is a special case of Pearson's type III and also shares this boundary line (as is apparent from the fact that for the chi-squared distribution the excess kurtosis is 12/''k'' and the square of the skewness is 8/''k'', hence (excess kurtosis − (3/2) skewness² = 0) is identically satisfied regardless of the value of the parameter "k"). This is to be expected, since the chi-squared distribution ''X'' ~ χ²(''k'') is a special case of the gamma distribution, with parametrization X ~ Γ(k/2, 1/2) where k is a positive integer that specifies the "number of degrees of freedom" of the chi-squared distribution.

An example of a beta distribution near the upper boundary (excess kurtosis − (3/2) skewness² = 0) is given by α = 0.1, β = 1000, for which the ratio (excess kurtosis)/(skewness²) = 1.49835 approaches the upper limit of 1.5 from below. An example of a beta distribution near the lower boundary (excess kurtosis + 2 − skewness² = 0) is given by α = 0.0001, β = 0.1, for which values the expression (excess kurtosis + 2)/(skewness²) = 1.01621 approaches the lower limit of 1 from above. In the infinitesimal limit for both α and β approaching zero symmetrically, the excess kurtosis reaches its minimum value at −2. This minimum value occurs at the point at which the lower boundary line intersects the vertical axis (ordinate). (However, in Pearson's original chart, the ordinate is kurtosis, instead of excess kurtosis, and it increases downwards rather than upwards.)

Values for the skewness and excess kurtosis below the lower boundary (excess kurtosis + 2 − skewness² = 0) cannot occur for any distribution, and hence Karl Pearson appropriately called the region below this boundary the "impossible region". The boundary for this "impossible region" is determined by (symmetric or skewed) bimodal "U"-shaped distributions for which the parameters α and β approach zero and hence all the probability density is concentrated at the ends: ''x'' = 0, 1 with practically nothing in between them. Since for α ≈ β ≈ 0 the probability density is concentrated at the two ends ''x'' = 0 and ''x'' = 1, this "impossible boundary" is determined by a Bernoulli distribution, where the two only possible outcomes occur with respective probabilities ''p'' and ''q'' = 1−''p''. For cases approaching this limit boundary with symmetry α = β, skewness ≈ 0, excess kurtosis ≈ −2 (this is the lowest excess kurtosis possible for any distribution), and the probabilities are ''p'' ≈ ''q'' ≈ 1/2. For cases approaching this limit boundary with skewness, excess kurtosis ≈ −2 + skewness², and the probability density is concentrated more at one end than the other end (with practically nothing in between), with probabilities p = \tfrac{\beta}{\alpha+\beta} at the left end ''x'' = 0 and q = 1-p = \tfrac{\alpha}{\alpha+\beta} at the right end ''x'' = 1.
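A minimal sketch, assuming SciPy (whose beta.stats returns the skewness and the ''excess'' kurtosis), checks the two bounds above and reproduces the near-boundary ratio for α = 0.1, β = 1000:

<syntaxhighlight lang="python">
# Sketch: the excess kurtosis of any beta distribution lies strictly between
# (skewness)^2 - 2 and (3/2)(skewness)^2.  Assumes SciPy.
from scipy.stats import beta as beta_dist

for a, b in [(0.1, 1000.0), (0.0001, 0.1), (2.0, 5.0)]:
    skew, ex_kurt = beta_dist.stats(a, b, moments='sk')
    assert skew**2 - 2 < ex_kurt < 1.5 * skew**2
    print(a, b, float(ex_kurt / skew**2))   # ~ 1.498 for (0.1, 1000), near 3/2
</syntaxhighlight>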


Symmetry

All statements are conditional on α, β > 0.
* Probability density function reflection symmetry
::f(x;\alpha,\beta) = f(1-x;\beta,\alpha)
* Cumulative distribution function reflection symmetry plus unitary translation
::F(x;\alpha,\beta) = I_x(\alpha,\beta) = 1- F(1- x;\beta,\alpha) = 1 - I_{1-x}(\beta,\alpha)
* Mode reflection symmetry plus unitary translation
::\operatorname{mode}(\Beta(\alpha, \beta))= 1-\operatorname{mode}(\Beta(\beta, \alpha)),\text{ if }\Beta(\beta, \alpha)\ne \Beta(1,1)
* Median reflection symmetry plus unitary translation
::\operatorname{median} (\Beta(\alpha, \beta) )= 1 - \operatorname{median} (\Beta(\beta, \alpha))
* Mean reflection symmetry plus unitary translation
::\mu (\Beta(\alpha, \beta) )= 1 - \mu (\Beta(\beta, \alpha) )
* Geometric means: each is individually asymmetric; the following symmetry applies between the geometric mean based on ''X'' and the geometric mean based on its reflection (1−''X'')
::G_X (\Beta(\alpha, \beta) )=G_{(1-X)}(\Beta(\beta, \alpha) )
* Harmonic means: each is individually asymmetric; the following symmetry applies between the harmonic mean based on ''X'' and the harmonic mean based on its reflection (1−''X'')
::H_X (\Beta(\alpha, \beta) )=H_{(1-X)}(\Beta(\beta, \alpha) ) \text{ if } \alpha, \beta > 1 .
* Variance symmetry
::\operatorname{var} (\Beta(\alpha, \beta) )=\operatorname{var} (\Beta(\beta, \alpha) )
* Geometric variances: each is individually asymmetric; the following symmetry applies between the log geometric variance based on ''X'' and the log geometric variance based on its reflection (1−''X'')
::\ln(\operatorname{var}_{GX} (\Beta(\alpha, \beta))) = \ln(\operatorname{var}_{G(1-X)}(\Beta(\beta, \alpha)))
* Geometric covariance symmetry
::\ln \operatorname{cov}_{GX,(1-X)}(\Beta(\alpha, \beta))=\ln \operatorname{cov}_{GX,(1-X)}(\Beta(\beta, \alpha))
* Mean absolute deviation around the mean symmetry
::\operatorname{E}[|X - E[X]|] (\Beta(\alpha, \beta))=\operatorname{E}[|X - E[X]|] (\Beta(\beta, \alpha))
* Skewness skew-symmetry
::\operatorname{skewness} (\Beta(\alpha, \beta) )= - \operatorname{skewness} (\Beta(\beta, \alpha) )
* Excess kurtosis symmetry
::\text{excess kurtosis} (\Beta(\alpha, \beta) )= \text{excess kurtosis} (\Beta(\beta, \alpha) )
* Characteristic function symmetry of the real part (with respect to the origin of variable "t")
:: \text{Re} [ {}_1F_1(\alpha; \alpha+\beta; it) ] = \text{Re} [ {}_1F_1(\alpha; \alpha+\beta; - it) ]
* Characteristic function skew-symmetry of the imaginary part (with respect to the origin of variable "t")
:: \text{Im} [ {}_1F_1(\alpha; \alpha+\beta; it) ] = - \text{Im} [ {}_1F_1(\alpha; \alpha+\beta; - it) ]
* Characteristic function symmetry of the absolute value (with respect to the origin of variable "t")
:: \text{Abs} [ {}_1F_1(\alpha; \alpha+\beta; it) ] = \text{Abs} [ {}_1F_1(\alpha; \alpha+\beta; - it) ]
* Differential entropy symmetry
::h(\Beta(\alpha, \beta) )= h(\Beta(\beta, \alpha) )
* Relative entropy (also called Kullback–Leibler divergence) symmetry
::D_{\mathrm{KL}}(X_1\|X_2) = D_{\mathrm{KL}}(X_2\|X_1), \text{ if }h(X_1) = h(X_2)\text{, for }\alpha \neq \beta
* Fisher information matrix symmetry
::{\mathcal{I}}_{i, j} = {\mathcal{I}}_{j, i}


Geometry of the probability density function


Inflection points

For certain values of the shape parameters α and β, the probability density function has inflection points, at which the curvature changes sign. The position of these inflection points can be useful as a measure of the dispersion or spread of the distribution.

Defining the following quantity:

:\kappa =\frac{\sqrt{\frac{(\alpha-1)(\beta-1)}{\alpha+\beta-3}}}{\alpha+\beta-2}

points of inflection occur, depending on the value of the shape parameters α and β, as follows:

*(α > 2, β > 2) The distribution is bell-shaped (symmetric for α = β and skewed otherwise), with two inflection points, equidistant from the mode:
::x = \text{mode} \pm \kappa = \frac{\alpha-1 \pm \sqrt{\frac{(\alpha-1)(\beta-1)}{\alpha+\beta-3}}}{\alpha+\beta-2}
* (α = 2, β > 2) The distribution is unimodal, positively skewed, right-tailed, with one inflection point, located to the right of the mode:
::x =\text{mode} + \kappa
* (α > 2, β = 2) The distribution is unimodal, negatively skewed, left-tailed, with one inflection point, located to the left of the mode:
::x = \text{mode} - \kappa
* (1 < α < 2, β > 2, α+β > 2) The distribution is unimodal, positively skewed, right-tailed, with one inflection point, located to the right of the mode:
::x =\text{mode} + \kappa
*(0 < α < 1, 1 < β < 2) The distribution has a mode at the left end ''x'' = 0 and it is positively skewed, right-tailed. There is one inflection point, located to the right of the mode:
::x = \frac{\alpha-1 + \sqrt{\frac{(\alpha-1)(\beta-1)}{\alpha+\beta-3}}}{\alpha+\beta-2}
*(α > 2, 1 < β < 2) The distribution is unimodal, negatively skewed, left-tailed, with one inflection point, located to the left of the mode:
::x =\text{mode} - \kappa
*(1 < α < 2, 0 < β < 1) The distribution has a mode at the right end ''x'' = 1 and it is negatively skewed, left-tailed. There is one inflection point, located to the left of the mode:
::x = \frac{\alpha-1 - \sqrt{\frac{(\alpha-1)(\beta-1)}{\alpha+\beta-3}}}{\alpha+\beta-2}

There are no inflection points in the remaining (symmetric and skewed) regions: U-shaped (α, β < 1), upside-down-U-shaped (1 < α < 2, 1 < β < 2), reverse-J-shaped (α < 1, β > 2) or J-shaped (α > 2, β < 1).

The accompanying plots show the inflection point locations (shown vertically, ranging from 0 to 1) versus α and β (the horizontal axes ranging from 0 to 5). There are large cuts at surfaces intersecting the lines α = 1, β = 1, α = 2, and β = 2 because at these values the beta distribution changes from 2 modes, to 1 mode, to no mode.
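The closed-form locations can be cross-checked numerically; the sketch below, assuming NumPy and SciPy, approximates the second derivative of the density by finite differences and compares its sign changes with mode ± κ (the bell-shaped case α = 4, β = 6 is arbitrary):

<syntaxhighlight lang="python">
# Sketch: locate the inflection points of Beta(4, 6) numerically and compare
# them with mode +/- kappa.  Assumes NumPy and SciPy.
import numpy as np
from scipy.stats import beta as beta_dist

a, b = 4.0, 6.0                                     # alpha > 2, beta > 2
mode = (a - 1) / (a + b - 2)
kappa = np.sqrt((a - 1) * (b - 1) / (a + b - 3)) / (a + b - 2)

x = np.linspace(1e-4, 1 - 1e-4, 200_001)
d2 = np.gradient(np.gradient(beta_dist.pdf(x, a, b), x), x)   # approximate f''(x)
inflections = x[np.where(np.diff(np.sign(d2)) != 0)[0]]
print(inflections)                    # ~ [0.192, 0.558]
print(mode - kappa, mode + kappa)     # closed-form values for comparison
</syntaxhighlight>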


Shapes

The beta density function can take a wide variety of different shapes depending on the values of the two parameters ''α'' and ''β''. The ability of the beta distribution to take this great diversity of shapes (using only two parameters) is partly responsible for finding wide application for modeling actual measurements:


Symmetric (''α'' = ''β'')

* the density function is symmetric about 1/2 (blue & teal plots).
* median = mean = 1/2.
* skewness = 0.
* variance = 1/(4(2α + 1))
* α = β < 1
** U-shaped (blue plot).
** bimodal: left mode = 0, right mode = 1, anti-mode = 1/2
** 1/12 < var(''X'') < 1/4
** −2 < excess kurtosis(''X'') < −6/5
** α = β = 1/2 is the arcsine distribution
*** var(''X'') = 1/8
*** excess kurtosis(''X'') = −3/2
*** CF = Rinc (t)
** α = β → 0 is a 2-point Bernoulli distribution with equal probability 1/2 at each Dirac delta function end ''x'' = 0 and ''x'' = 1 and zero probability everywhere else. A coin toss: one face of the coin being ''x'' = 0 and the other face being ''x'' = 1.
*** \lim_{\alpha = \beta \to 0} \operatorname{var}(X) = \tfrac{1}{4}
*** \lim_{\alpha = \beta \to 0} \text{excess kurtosis}(X) = - 2 (a lower value than this is impossible for any distribution to reach)
*** The differential entropy approaches a minimum value of −∞
* α = β = 1
** the uniform [0, 1] distribution
** no mode
** var(''X'') = 1/12
** excess kurtosis(''X'') = −6/5
** The (negative anywhere else) differential entropy reaches its maximum value of zero
** CF = Sinc (t)
* ''α'' = ''β'' > 1
** symmetric unimodal
** mode = 1/2.
** 0 < var(''X'') < 1/12
** −6/5 < excess kurtosis(''X'') < 0
** ''α'' = ''β'' = 3/2 is a semi-elliptic [0, 1] distribution, see: Wigner semicircle distribution
*** var(''X'') = 1/16.
*** excess kurtosis(''X'') = −1
*** CF = 2 Jinc (t)
** ''α'' = ''β'' = 2 is the parabolic [0, 1] distribution
*** var(''X'') = 1/20
*** excess kurtosis(''X'') = −6/7
*** CF = 3 Tinc (t)
** ''α'' = ''β'' > 2 is bell-shaped, with inflection points located to either side of the mode
*** 0 < var(''X'') < 1/20
*** −6/7 < excess kurtosis(''X'') < 0
** ''α'' = ''β'' → ∞ is a 1-point degenerate distribution with a Dirac delta function spike at the midpoint ''x'' = 1/2 with probability 1, and zero probability everywhere else. There is 100% probability (absolute certainty) concentrated at the single point ''x'' = 1/2.
*** \lim_{\alpha = \beta \to \infty} \operatorname{var}(X) = 0
*** \lim_{\alpha = \beta \to \infty} \text{excess kurtosis}(X) = 0
*** The differential entropy approaches a minimum value of −∞


Skewed (''α'' ≠ ''β'')

The density function is skewed. An interchange of parameter values yields the mirror image (the reverse) of the initial curve. Some more specific cases:
*''α'' < 1, ''β'' < 1
** U-shaped
** Positive skew for α < β, negative skew for α > β.
** bimodal: left mode = 0, right mode = 1, anti-mode = \tfrac{\alpha-1}{\alpha+\beta-2}
** 0 < median < 1.
** 0 < var(''X'') < 1/4
*α > 1, β > 1
** unimodal (magenta & cyan plots),
** Positive skew for α < β, negative skew for α > β.
** \text{mode}= \tfrac{\alpha-1}{\alpha+\beta-2}
** 0 < median < 1
** 0 < var(''X'') < 1/12
*α < 1, β ≥ 1
** reverse J-shaped with a right tail,
** positively skewed,
** strictly decreasing, convex
** mode = 0
** 0 < median < 1/2.
** 0 < \operatorname{var}(X) \le \tfrac{5\sqrt{5}-11}{2} \approx 0.0902 (maximum variance occurs for \alpha=\tfrac{\sqrt{5}-1}{2}, \beta=1, or α = Φ, the golden ratio conjugate)
*α ≥ 1, β < 1
** J-shaped with a left tail,
** negatively skewed,
** strictly increasing, convex
** mode = 1
** 1/2 < median < 1
** 0 < \operatorname{var}(X) \le \tfrac{5\sqrt{5}-11}{2} \approx 0.0902 (maximum variance occurs for \alpha=1, \beta=\tfrac{\sqrt{5}-1}{2}, or β = Φ, the golden ratio conjugate)
*α = 1, β > 1
** positively skewed,
** strictly decreasing (red plot),
** a reversed (mirror-image) power function [0,1] distribution
** mean = 1 / (β + 1)
** median = 1 − 1/2^{1/β}
** mode = 0
** α = 1, 1 < β < 2
*** concave
*** 1-\tfrac{1}{\sqrt{2}} < \text{median} < \tfrac{1}{2}
*** 1/18 < var(''X'') < 1/12.
** α = 1, β = 2
*** a straight line with slope −2, the right-triangular distribution with right angle at the left end, at ''x'' = 0
*** \text{median}=1-\tfrac{1}{\sqrt{2}}
*** var(''X'') = 1/18
** α = 1, β > 2
*** reverse J-shaped with a right tail,
*** convex
*** 0 < \text{median} < 1-\tfrac{1}{\sqrt{2}}
*** 0 < var(''X'') < 1/18
*α > 1, β = 1
** negatively skewed,
** strictly increasing (green plot),
** the power function [0, 1] distribution
** mean = α / (α + 1)
** median = 1/2^{1/α}
** mode = 1
** 2 > α > 1, β = 1
*** concave
*** \tfrac{1}{2} < \text{median} < \tfrac{1}{\sqrt{2}}
*** 1/18 < var(''X'') < 1/12
** α = 2, β = 1
*** a straight line with slope +2, the right-triangular distribution with right angle at the right end, at ''x'' = 1
*** \text{median}=\tfrac{1}{\sqrt{2}}
*** var(''X'') = 1/18
** α > 2, β = 1
*** J-shaped with a left tail, convex
*** \tfrac{1}{\sqrt{2}} < \text{median} < 1
*** 0 < var(''X'') < 1/18
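A few of the special cases listed above are easy to verify numerically; the sketch below, assuming SciPy, spot-checks the tabulated variances and one median value:

<syntaxhighlight lang="python">
# Sketch: spot-check variances and a median from the shape catalogue above.
# Assumes SciPy.
import math
from scipy.stats import beta as beta_dist

cases = [(0.5, 0.5, 1/8),    # arcsine distribution
         (1.0, 1.0, 1/12),   # uniform distribution
         (1.5, 1.5, 1/16),   # semi-elliptic (Wigner-type) density
         (2.0, 2.0, 1/20),   # parabolic density
         (1.0, 2.0, 1/18)]   # right-triangular density
for a, b, expected_var in cases:
    assert math.isclose(beta_dist.var(a, b), expected_var)

print(beta_dist.median(1.0, 2.0), 1 - 1/math.sqrt(2))   # both ~ 0.2929
</syntaxhighlight>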


Related distributions


Transformations

* If ''X'' ~ Beta(''α'', ''β'') then 1 − ''X'' ~ Beta(''β'', ''α'') (mirror-image symmetry).
* If ''X'' ~ Beta(''α'', ''β'') then \tfrac{X}{1-X} \sim \beta'(\alpha,\beta), the beta prime distribution, also called "beta distribution of the second kind".
* If ''X'' ~ Beta(''α'', ''β'') then \tfrac{1}{X} -1 = \tfrac{1-X}{X} \sim \beta'(\beta,\alpha).
* If ''X'' ~ Beta(''n''/2, ''m''/2) then \tfrac{mX}{n(1-X)} \sim F(n,m) (assuming ''n'' > 0 and ''m'' > 0), the Fisher–Snedecor F distribution.
* If X \sim \operatorname{Beta}\left(1+\lambda\tfrac{m-\text{min}}{\text{max}-\text{min}}, 1 + \lambda\tfrac{\text{max}-m}{\text{max}-\text{min}}\right) then min + ''X''(max − min) ~ PERT(min, max, ''m'', ''λ''), where ''PERT'' denotes a PERT distribution used in PERT analysis, and ''m'' = most likely value. (Herrerías-Velasco, José Manuel; Herrerías-Pleguezuelo, Rafael; van Dorp, Johan René (2011). "Revisiting the PERT mean and Variance". European Journal of Operational Research (210), pp. 448–451.) Traditionally ''λ'' = 4 in PERT analysis.
* If ''X'' ~ Beta(1, ''β'') then ''X'' ~ Kumaraswamy distribution with parameters (1, ''β'').
* If ''X'' ~ Beta(''α'', 1) then ''X'' ~ Kumaraswamy distribution with parameters (''α'', 1).
* If ''X'' ~ Beta(''α'', 1) then −ln(''X'') ~ Exponential(''α'').


Special and limiting cases

* Beta(1, 1) ~ U(0, 1), the continuous uniform distribution.
* Beta(n, 1) ~ Maximum of ''n'' independent rvs. with U(0, 1), sometimes called a ''standard power function distribution'' with density ''n'' ''x''^{''n''−1} on that interval.
* Beta(1, n) ~ Minimum of ''n'' independent rvs. with U(0, 1).
* If ''X'' ~ Beta(3/2, 3/2) and ''r'' > 0 then 2''rX'' − ''r'' ~ Wigner semicircle distribution.
* Beta(1/2, 1/2) is equivalent to the arcsine distribution. This distribution is also the Jeffreys prior probability for the Bernoulli and binomial distributions. The arcsine probability density is a distribution that appears in several random-walk fundamental theorems. In a fair coin toss random walk, the probability for the time of the last visit to the origin is distributed as a (U-shaped) arcsine distribution. In a two-player fair-coin-toss game, a player is said to be in the lead if the random walk (that started at the origin) is above the origin. The most probable number of times that a given player will be in the lead, in a game of length 2''N'', is not ''N''. On the contrary, ''N'' is the least likely number of times that the player will be in the lead. The most likely number of times in the lead is 0 or 2''N'' (following the arcsine distribution).
* \lim_{n \to \infty} n \operatorname{Beta}(1,n) = \operatorname{Exponential}(1), the exponential distribution.
* \lim_{n \to \infty} n \operatorname{Beta}(k,n) = \operatorname{Gamma}(k,1), the gamma distribution.
* For large n, \operatorname{Beta}(\alpha n,\beta n) \to \mathcal{N}\left(\frac{\alpha}{\alpha+\beta},\frac{1}{n}\frac{\alpha\beta}{(\alpha+\beta)^3}\right), the normal distribution. More precisely, if X_n \sim \operatorname{Beta}(\alpha n,\beta n) then \sqrt{n}\left(X_n -\tfrac{\alpha}{\alpha+\beta}\right) converges in distribution to a normal distribution with mean 0 and variance \tfrac{\alpha\beta}{(\alpha+\beta)^3} as ''n'' increases.


Derived from other distributions

* The ''k''th order statistic of a sample of size ''n'' from the uniform distribution is a beta random variable, ''U''_(''k'') ~ Beta(''k'', ''n''+1−''k'').
* If ''X'' ~ Gamma(α, θ) and ''Y'' ~ Gamma(β, θ) are independent, then \tfrac{X}{X+Y} \sim \operatorname{Beta}(\alpha, \beta)\,.
* If X \sim \chi^2(\alpha)\, and Y \sim \chi^2(\beta)\, are independent, then \tfrac{X}{X+Y} \sim \operatorname{Beta}(\tfrac{\alpha}{2}, \tfrac{\beta}{2}).
* If ''X'' ~ U(0, 1) and ''α'' > 0 then ''X''^{1/''α''} ~ Beta(''α'', 1), the power function distribution.
* If X \sim\operatorname{Bin}(k;n;p) then, for discrete values of ''n'' and ''k'', the success probability ''p'' given ''k'' successes in ''n'' trials (under a uniform prior) follows \operatorname{Beta}(\alpha, \beta) where \alpha=k+1 and \beta=n-k+1.
* If ''X'' ~ Cauchy(0, 1) then \tfrac{1}{1+X^2} \sim \operatorname{Beta}\left(\tfrac12, \tfrac12\right)\,.
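Two of these relations can be illustrated by simulation; the sketch below, assuming NumPy and SciPy, applies a Kolmogorov–Smirnov test to the gamma-ratio construction and to a uniform order statistic (sample sizes and seeds are arbitrary):

<syntaxhighlight lang="python">
# Sketch: Monte Carlo check of X/(X+Y) ~ Beta(alpha, beta) for independent
# gammas, and of U_(k) ~ Beta(k, n+1-k) for uniform order statistics.
import numpy as np
from scipy.stats import beta as beta_dist, kstest

rng = np.random.default_rng(42)

alpha, b = 2.5, 4.0
x = rng.gamma(alpha, 1.0, size=100_000)
y = rng.gamma(b, 1.0, size=100_000)
print(kstest(x / (x + y), beta_dist(alpha, b).cdf).pvalue)   # typically large

n, k = 10, 3
u_k = np.sort(rng.uniform(size=(100_000, n)), axis=1)[:, k - 1]
print(kstest(u_k, beta_dist(k, n + 1 - k).cdf).pvalue)       # typically large
</syntaxhighlight>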


Combination with other distributions

* If ''X'' ~ Beta(''α'', ''β'') and ''Y'' ~ F(2''β'', 2''α'') then \Pr(X \leq \tfrac{\alpha}{\alpha+\beta x}) = \Pr(Y \geq x)\, for all ''x'' > 0.


Compounding with other distributions

* If ''p'' ~ Beta(α, β) and ''X'' ~ Bin(''k'', ''p'') then ''X'' ~ beta-binomial distribution
* If ''p'' ~ Beta(α, β) and ''X'' ~ NB(''r'', ''p'') then ''X'' ~ beta negative binomial distribution


Generalisations

* The generalization to multiple variables, i.e. a multivariate Beta distribution, is called a Dirichlet distribution. Univariate marginals of the Dirichlet distribution have a beta distribution. The beta distribution is conjugate to the binomial and Bernoulli distributions in exactly the same way as the Dirichlet distribution is conjugate to the multinomial distribution and categorical distribution.
* The Pearson type I distribution is identical to the beta distribution (except for arbitrary shifting and re-scaling that can also be accomplished with the four parameter parametrization of the beta distribution).
* The beta distribution is the special case of the noncentral beta distribution where \lambda = 0: \operatorname{Beta}(\alpha, \beta) = \operatorname{Beta}(\alpha,\beta,0).
* The generalized beta distribution is a five-parameter distribution family which has the beta distribution as a special case.
* The matrix variate beta distribution is a distribution for positive-definite matrices.


Statistical inference


Parameter estimation


Method of moments


Two unknown parameters

Two unknown parameters ((\hat{\alpha}, \hat{\beta}) of a beta distribution supported in the [0, 1] interval) can be estimated, using the method of moments, with the first two moments (sample mean and sample variance) as follows. Let:

: \text{sample mean}(X)=\bar{x} = \frac{1}{N}\sum_{i=1}^N X_i

be the sample mean estimate and

: \text{sample variance}(X) =\bar{v} = \frac{1}{N-1}\sum_{i=1}^N (X_i - \bar{x})^2

be the sample variance estimate. The method-of-moments estimates of the parameters are

:\hat{\alpha} = \bar{x} \left(\frac{\bar{x}(1-\bar{x})}{\bar{v}} - 1 \right), \text{ if }\bar{v} <\bar{x}(1 - \bar{x}),
: \hat{\beta} = (1-\bar{x}) \left(\frac{\bar{x}(1-\bar{x})}{\bar{v}} - 1 \right), \text{ if }\bar{v}<\bar{x}(1 - \bar{x}).

When the distribution is required over a known interval other than [0, 1] with random variable ''X'', say [''a'', ''c''] with random variable ''Y'', then replace \bar{x} with \frac{\bar{y}-a}{c-a}, and \bar{v} with \frac{\bar{v}_Y}{(c-a)^2} in the above couple of equations for the shape parameters (see the "Alternative parametrizations, four parameters" section below), where:

: \text{sample mean}(Y)=\bar{y} = \frac{1}{N}\sum_{i=1}^N Y_i
: \text{sample variance}(Y) = \bar{v}_Y = \frac{1}{N-1}\sum_{i=1}^N (Y_i - \bar{y})^2
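A direct implementation of these estimates might look as follows; this is a sketch assuming NumPy, and the use of the unbiased (ddof=1) sample variance is just one common convention:

<syntaxhighlight lang="python">
# Sketch: method-of-moments estimates of the beta shape parameters for data
# supported in [0, 1].  Assumes NumPy.
import numpy as np

def beta_method_of_moments(x):
    x = np.asarray(x, dtype=float)
    m = x.mean()
    v = x.var(ddof=1)
    if v >= m * (1 - m):
        raise ValueError("sample variance too large for a beta fit")
    common = m * (1 - m) / v - 1
    return m * common, (1 - m) * common        # (alpha_hat, beta_hat)

rng = np.random.default_rng(1)
sample = rng.beta(2.0, 5.0, size=10_000)
print(beta_method_of_moments(sample))           # roughly (2, 5)
</syntaxhighlight>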


Four unknown parameters

All four parameters (\hat{\alpha}, \hat{\beta}, \hat{a}, \hat{c} of a beta distribution supported in the [''a'', ''c''] interval; see the section "Alternative parametrizations, Four parameters") can be estimated, using the method of moments developed by Karl Pearson, by equating sample and population values of the first four central moments (mean, variance, skewness and excess kurtosis). The excess kurtosis was expressed in terms of the square of the skewness and the sample size ν = α + β (see the previous section "Kurtosis") as follows:

:\text{excess kurtosis} =\frac{6}{3 + \nu}\left(\frac{2 + \nu}{4} (\text{skewness})^2 - 1\right)\text{ if }(\text{skewness})^2-2< \text{excess kurtosis}< \tfrac{3}{2} (\text{skewness})^2

One can use this equation to solve for the sample size ν = α + β in terms of the square of the skewness and the excess kurtosis as follows:

:\hat{\nu} = \hat{\alpha} + \hat{\beta} = 3\frac{(\text{sample excess kurtosis}) -(\text{sample skewness})^2+2}{\frac{3}{2} (\text{sample skewness})^2 - (\text{sample excess kurtosis})}
:\text{ if }(\text{sample skewness})^2-2< \text{sample excess kurtosis}< \tfrac{3}{2} (\text{sample skewness})^2

This is the ratio (multiplied by a factor of 3) between the previously derived limit boundaries for the beta distribution in a space (as originally done by Karl Pearson) defined with coordinates of the square of the skewness in one axis and the excess kurtosis in the other axis (see the section "Kurtosis bounded by the square of the skewness").

The case of zero skewness can be immediately solved, because for zero skewness α = β and hence ν = 2α = 2β, therefore α = β = ν/2:

: \hat{\alpha} = \hat{\beta} = \frac{\hat{\nu}}{2}= \frac{\frac{3}{2}(\text{sample excess kurtosis}) +3}{- (\text{sample excess kurtosis})}
: \text{ if sample skewness}= 0 \text{ and } -2<\text{sample excess kurtosis}<0

(Excess kurtosis is negative for the beta distribution with zero skewness, ranging from −2 to 0, so that \hat{\nu} (and therefore the sample shape parameters) is positive, ranging from zero when the shape parameters approach zero and the excess kurtosis approaches −2, to infinity when the shape parameters approach infinity and the excess kurtosis approaches zero.)

For non-zero sample skewness one needs to solve a system of two coupled equations. Since the skewness and the excess kurtosis are independent of the parameters \hat{a}, \hat{c}, the parameters \hat{\alpha}, \hat{\beta} can be uniquely determined from the sample skewness and the sample excess kurtosis, by solving the coupled equations with two known variables (sample skewness and sample excess kurtosis) and two unknowns (the shape parameters):

:(\text{sample skewness})^2 = \frac{4(\hat{\beta}-\hat{\alpha})^2 (1 + \hat{\alpha} + \hat{\beta})}{\hat{\alpha} \hat{\beta} (2 + \hat{\alpha} + \hat{\beta})^2}
:\text{sample excess kurtosis} =\frac{6}{3 + \hat{\nu}}\left(\frac{2 + \hat{\nu}}{4} (\text{sample skewness})^2 - 1\right)
:\text{ if }(\text{sample skewness})^2-2< \text{sample excess kurtosis}< \tfrac{3}{2}(\text{sample skewness})^2

resulting in the following solution:

: \hat{\alpha}, \hat{\beta} = \frac{\hat{\nu}}{2} \left (1 \pm \frac{1}{\sqrt{1 + \frac{16 (\hat{\nu} + 1)}{(\hat{\nu} + 2)^2 (\text{sample skewness})^2}}} \right )
: \text{ if sample skewness}\neq 0 \text{ and } (\text{sample skewness})^2-2< \text{sample excess kurtosis}< \tfrac{3}{2} (\text{sample skewness})^2

where one should take the solutions as follows: \hat{\alpha}>\hat{\beta} for (negative) sample skewness < 0, and \hat{\alpha}<\hat{\beta} for (positive) sample skewness > 0.

The accompanying plot shows these two solutions as surfaces in a space with horizontal axes of (sample excess kurtosis) and (sample squared skewness) and the shape parameters as the vertical axis. The surfaces are constrained by the condition that the sample excess kurtosis must be bounded by the sample squared skewness as stipulated in the above equation. The two surfaces meet at the right edge defined by zero skewness. Along this right edge, both parameters are equal and the distribution is symmetric U-shaped for α = β < 1, uniform for α = β = 1, upside-down-U-shaped for 1 < α = β < 2 and bell-shaped for α = β > 2. The surfaces also meet at the front (lower) edge defined by the "impossible boundary" line (excess kurtosis + 2 − skewness² = 0). Along this front (lower) boundary both shape parameters approach zero, and the probability density is concentrated more at one end than the other end (with practically nothing in between), with probabilities p=\tfrac{\beta}{\alpha+\beta} at the left end ''x'' = 0 and q = 1-p = \tfrac{\alpha}{\alpha+\beta} at the right end ''x'' = 1. The two surfaces become further apart towards the rear edge. At this rear edge the surface parameters are quite different from each other. As remarked, for example, by Bowman and Shenton, sampling in the neighborhood of the line (sample excess kurtosis − (3/2)(sample skewness)² = 0) (the just-J-shaped portion of the rear edge where blue meets beige) "is dangerously near to chaos", because at that line the denominator of the expression above for the estimate ν = α + β becomes zero, and hence ν approaches infinity as that line is approached. Bowman and Shenton write that "the higher moment parameters (kurtosis and skewness) are extremely fragile (near that line). However, the mean and standard deviation are fairly reliable." Therefore, the problem arises for the case of four-parameter estimation for very skewed distributions such that the excess kurtosis approaches (3/2) times the square of the skewness. This boundary line is produced by extremely skewed distributions with very large values of one of the parameters and very small values of the other parameter. See the section "Kurtosis bounded by the square of the skewness" for a numerical example and further comments about this rear edge boundary line (sample excess kurtosis − (3/2)(sample skewness)² = 0). As remarked by Karl Pearson himself, this issue may not be of much practical importance, as this trouble arises only for very skewed J-shaped (or mirror-image J-shaped) distributions with very different values of shape parameters that are unlikely to occur much in practice. The usual skewed-bell-shape distributions that occur in practice do not have this parameter estimation problem.

The remaining two parameters \hat{a}, \hat{c} can be determined using the sample mean and the sample variance using a variety of equations. One alternative is to calculate the support interval range (\hat{c}-\hat{a}) based on the sample variance and the sample kurtosis. For this purpose one can solve, in terms of the range (\hat{c}- \hat{a}), the equation expressing the excess kurtosis in terms of the sample variance and the sample size ν (see the sections "Kurtosis" and "Alternative parametrizations, four parameters"):

:\text{sample excess kurtosis} =\frac{6}{(2+\hat{\nu})(3 + \hat{\nu})}\bigg(\frac{(\hat{c}- \hat{a})^2}{\text{(sample variance)}} - 6 - 5 \hat{\nu} \bigg)

to obtain:

: (\hat{c}- \hat{a}) = \sqrt{\text{(sample variance)}}\sqrt{6+5\hat{\nu}+\frac{(2+\hat{\nu})(3+\hat{\nu})}{6}(\text{sample excess kurtosis})}

Another alternative is to calculate the support interval range (\hat{c}-\hat{a}) based on the sample variance and the sample skewness. For this purpose one can solve, in terms of the range (\hat{c}-\hat{a}), the equation expressing the squared skewness in terms of the sample variance and the sample size ν (see the sections titled "Skewness" and "Alternative parametrizations, four parameters"):

:(\text{sample skewness})^2 = \frac{4}{(2+\hat{\nu})^2}\bigg(\frac{(\hat{c}- \hat{a})^2}{\text{(sample variance)}}-4(1+\hat{\nu})\bigg)

to obtain:

: (\hat{c}- \hat{a}) = \frac{\sqrt{\text{(sample variance)}}}{2}\sqrt{(2+\hat{\nu})^2(\text{sample skewness})^2+16(1+\hat{\nu})}

The remaining parameter can be determined from the sample mean and the previously obtained parameters (\hat{c}-\hat{a}), \hat{\alpha}, \hat{\nu} = \hat{\alpha}+\hat{\beta}:

: \hat{a} = (\text{sample mean}) - \left(\frac{\hat{\alpha}}{\hat{\nu}}\right)(\hat{c}-\hat{a})

and finally, \hat{c}= (\hat{c}- \hat{a}) + \hat{a}.

In the above formulas one may take, for example, as estimates of the sample moments:

:\begin{align}
\text{sample mean} &=\overline{y} = \frac{1}{N}\sum_{i=1}^N Y_i \\
\text{sample variance} &= \overline{v}_Y = \frac{1}{N-1}\sum_{i=1}^N (Y_i - \overline{y})^2 \\
\text{sample skewness} &= G_1 = \frac{\sqrt{N(N-1)}}{N-2}\, \frac{\frac{1}{N}\sum_{i=1}^N (Y_i-\overline{y})^3}{\left(\frac{1}{N}\sum_{i=1}^N (Y_i-\overline{y})^2\right)^{\frac{3}{2}}} \\
\text{sample excess kurtosis} &= G_2 = \frac{(N+1)(N-1)}{(N-2)(N-3)}\, \frac{\frac{1}{N}\sum_{i=1}^N (Y_i-\overline{y})^4}{\left(\frac{1}{N}\sum_{i=1}^N (Y_i-\overline{y})^2\right)^2} - \frac{3(N-1)^2}{(N-2)(N-3)}
\end{align}

The estimators ''G''_1 for sample skewness and ''G''_2 for sample kurtosis are used by DAP/SAS, PSPP/SPSS, and Excel. However, they are not used by BMDP and (according to Joanes and Gill) they were not used by MINITAB in 1998. Actually, Joanes and Gill in their 1998 study concluded that the skewness and kurtosis estimators used in BMDP and in MINITAB (at that time) had smaller variance and mean-squared error in normal samples, but the skewness and kurtosis estimators used in DAP/SAS and PSPP/SPSS, namely ''G''_1 and ''G''_2, had smaller mean-squared error in samples from a very skewed distribution. It is for this reason that we have spelled out "sample skewness", etc., in the above formulas, to make it explicit that the user should choose the best estimator according to the problem at hand, as the best estimator for skewness and kurtosis depends on the amount of skewness (as shown by Joanes and Gill).
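The first step of this procedure (recovering ν̂ and then the two shape parameters from the sample skewness and excess kurtosis) can be sketched as follows, assuming SciPy; the helper name shape_from_skew_kurt is ad hoc and the rescaled Beta(2, 6) test sample is arbitrary:

<syntaxhighlight lang="python">
# Sketch: recover nu = alpha + beta and the shape parameters from G1 and G2,
# following the equations above.  Assumes NumPy and SciPy.
import numpy as np
from scipy.stats import skew, kurtosis

def shape_from_skew_kurt(y):
    g1 = skew(y, bias=False)                     # sample skewness G1
    g2 = kurtosis(y, fisher=True, bias=False)    # sample excess kurtosis G2
    nu = 3 * (g2 - g1**2 + 2) / (1.5 * g1**2 - g2)
    delta = 1 / np.sqrt(1 + 16 * (nu + 1) / ((nu + 2)**2 * g1**2))
    lo, hi = nu / 2 * (1 - delta), nu / 2 * (1 + delta)
    # negative skew -> alpha_hat > beta_hat, positive skew -> alpha_hat < beta_hat
    return (hi, lo) if g1 < 0 else (lo, hi)

rng = np.random.default_rng(3)
y = 1.0 + 3.0 * rng.beta(2.0, 6.0, size=200_000)   # Beta(2, 6) rescaled to [1, 4]
print(shape_from_skew_kurt(y))                     # roughly (2, 6)
</syntaxhighlight>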


Maximum likelihood


Two unknown parameters

As is also the case for maximum likelihood estimates for the gamma distribution, the maximum likelihood estimates for the beta distribution do not have a general closed form solution for arbitrary values of the shape parameters. If ''X''_1, ..., ''X_N'' are independent random variables each having a beta distribution, the joint log likelihood function for ''N'' iid observations is:

:\begin{align}
\ln\, \mathcal{L} (\alpha, \beta\mid X) &= \sum_{i=1}^N \ln \left (\mathcal{L}_i (\alpha, \beta\mid X_i) \right )\\
&= \sum_{i=1}^N \ln \left (f(X_i;\alpha,\beta) \right ) \\
&= \sum_{i=1}^N \ln \left (\frac{X_i^{\alpha-1}(1-X_i)^{\beta-1}}{\Beta(\alpha,\beta)} \right ) \\
&= (\alpha - 1)\sum_{i=1}^N \ln (X_i) + (\beta- 1)\sum_{i=1}^N \ln (1-X_i) - N \ln \Beta(\alpha,\beta)
\end{align}

Finding the maximum with respect to a shape parameter involves taking the partial derivative with respect to the shape parameter and setting the expression equal to zero, yielding the maximum likelihood estimator of the shape parameters:

:\frac{\partial \ln \mathcal{L}(\alpha,\beta\mid X)}{\partial \alpha} = \sum_{i=1}^N \ln X_i -N\frac{\partial \ln \Beta(\alpha,\beta)}{\partial \alpha}=0
:\frac{\partial \ln \mathcal{L}(\alpha,\beta\mid X)}{\partial \beta} = \sum_{i=1}^N \ln (1-X_i)- N\frac{\partial \ln \Beta(\alpha,\beta)}{\partial \beta}=0

where:

:\frac{\partial \ln \Beta(\alpha,\beta)}{\partial \alpha} = -\frac{\partial \ln \Gamma(\alpha+\beta)}{\partial \alpha}+ \frac{\partial \ln \Gamma(\alpha)}{\partial \alpha}+ \frac{\partial \ln \Gamma(\beta)}{\partial \alpha}=-\psi(\alpha + \beta) + \psi(\alpha) + 0
:\frac{\partial \ln \Beta(\alpha,\beta)}{\partial \beta}= - \frac{\partial \ln \Gamma(\alpha+\beta)}{\partial \beta}+ \frac{\partial \ln \Gamma(\alpha)}{\partial \beta} + \frac{\partial \ln \Gamma(\beta)}{\partial \beta}=-\psi(\alpha + \beta) + 0 + \psi(\beta)

since the digamma function, denoted ψ(α), is defined as the logarithmic derivative of the gamma function:

:\psi(\alpha) =\frac{d \ln \Gamma(\alpha)}{d\alpha}

To ensure that the values with zero tangent slope are indeed a maximum (instead of a saddle-point or a minimum) one has to also satisfy the condition that the curvature is negative. This amounts to satisfying that the second partial derivative with respect to the shape parameters is negative:

:\frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{\partial \alpha^2}= -N\frac{\partial^2 \ln \Beta(\alpha,\beta)}{\partial \alpha^2}<0
:\frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{\partial \beta^2} = -N\frac{\partial^2 \ln \Beta(\alpha,\beta)}{\partial \beta^2}<0

Using the previous equations, this is equivalent to:

:\frac{\partial^2 \ln \Beta(\alpha,\beta)}{\partial \alpha^2} = \psi_1(\alpha)-\psi_1(\alpha + \beta) > 0
:\frac{\partial^2 \ln \Beta(\alpha,\beta)}{\partial \beta^2} = \psi_1(\beta) -\psi_1(\alpha + \beta) > 0

where the trigamma function, denoted ''ψ''_1(''α''), is the second of the polygamma functions, and is defined as the derivative of the digamma function:

:\psi_1(\alpha) = \frac{d^2\ln\Gamma(\alpha)}{d\alpha^2}=\, \frac{d\psi(\alpha)}{d\alpha}.

These conditions are equivalent to stating that the variances of the logarithmically transformed variables are positive, since:

:\operatorname{var}[\ln (X)] = \operatorname{E}[\ln^2 (X)] - (\operatorname{E}[\ln (X)])^2 = \psi_1(\alpha) - \psi_1(\alpha + \beta)
:\operatorname{var}[\ln (1-X)] = \operatorname{E}[\ln^2 (1-X)] - (\operatorname{E}[\ln (1-X)])^2 = \psi_1(\beta) - \psi_1(\alpha + \beta)

Therefore, the condition of negative curvature at a maximum is equivalent to the statements:

: \operatorname{var}[\ln (X)] > 0
: \operatorname{var}[\ln (1-X)] > 0

Alternatively, the condition of negative curvature at a maximum is also equivalent to stating that the following logarithmic derivatives of the geometric means ''G_X'' and ''G_{(1-X)}'' are positive, since:

: \psi_1(\alpha) - \psi_1(\alpha + \beta) = \frac{\partial \ln G_X}{\partial \alpha} > 0
: \psi_1(\beta) - \psi_1(\alpha + \beta) = \frac{\partial \ln G_{(1-X)}}{\partial \beta} > 0

While these slopes are indeed positive, the other slopes are negative:

:\frac{\partial \ln G_X}{\partial \beta}, \frac{\partial \ln G_{(1-X)}}{\partial \alpha} < 0.

The slopes of the mean and the median with respect to ''α'' and ''β'' display similar sign behavior.

From the condition that at a maximum the partial derivative with respect to the shape parameter equals zero, we obtain the following system of coupled maximum likelihood estimate equations (for the average log-likelihoods) that needs to be inverted to obtain the (unknown) shape parameter estimates \hat{\alpha},\hat{\beta} in terms of the (known) average of logarithms of the samples ''X''_1, ..., ''X_N'':

:\begin{align}
\hat{\operatorname{E}}[\ln (X)] &= \psi(\hat{\alpha}) - \psi(\hat{\alpha} + \hat{\beta})=\frac{1}{N}\sum_{i=1}^N \ln X_i = \ln \hat{G}_X \\
\hat{\operatorname{E}}[\ln(1-X)] &= \psi(\hat{\beta}) - \psi(\hat{\alpha} + \hat{\beta})=\frac{1}{N}\sum_{i=1}^N \ln (1-X_i)= \ln \hat{G}_{(1-X)}
\end{align}

where we recognize \ln \hat{G}_X as the logarithm of the sample geometric mean and \ln \hat{G}_{(1-X)} as the logarithm of the sample geometric mean based on (1 − ''X''), the mirror-image of ''X''. For \hat{\alpha}=\hat{\beta}, it follows that \hat{G}_X=\hat{G}_{(1-X)}.

:\begin{align}
\hat{G}_X &= \prod_{i=1}^N (X_i)^{\frac{1}{N}} \\
\hat{G}_{(1-X)} &= \prod_{i=1}^N (1-X_i)^{\frac{1}{N}}
\end{align}

These coupled equations containing digamma functions of the shape parameter estimates \hat{\alpha},\hat{\beta} must be solved by numerical methods as done, for example, by Beckman et al. Gnanadesikan et al. give numerical solutions for a few cases. N.L. Johnson and S. Kotz suggest that for "not too small" shape parameter estimates \hat{\alpha},\hat{\beta}, the logarithmic approximation to the digamma function \psi(\hat{\alpha}) \approx \ln(\hat{\alpha}-\tfrac{1}{2}) may be used to obtain initial values for an iterative solution, since the equations resulting from this approximation can be solved exactly:

:\ln \frac{\hat{\alpha} - \tfrac{1}{2}}{\hat{\alpha}+\hat{\beta} - \tfrac{1}{2}} \approx \ln \hat{G}_X
:\ln \frac{\hat{\beta} - \tfrac{1}{2}}{\hat{\alpha}+\hat{\beta} - \tfrac{1}{2}}\approx \ln \hat{G}_{(1-X)}

which leads to the following solution for the initial values (of the estimate shape parameters in terms of the sample geometric means) for an iterative solution:

:\hat{\alpha}\approx \tfrac{1}{2} + \frac{\hat{G}_X}{2(1-\hat{G}_X-\hat{G}_{(1-X)})} \text{ if } \hat{\alpha} >1
:\hat{\beta}\approx \tfrac{1}{2} + \frac{\hat{G}_{(1-X)}}{2(1-\hat{G}_X-\hat{G}_{(1-X)})} \text{ if } \hat{\beta} > 1

Alternatively, the estimates provided by the method of moments can instead be used as initial values for an iterative solution of the maximum likelihood coupled equations in terms of the digamma functions.

When the distribution is required over a known interval other than [0, 1] with random variable ''X'', say [''a'', ''c''] with random variable ''Y'', then replace ln(''X_i'') in the first equation with

:\ln \frac{Y_i-a}{c-a},

and replace ln(1−''X_i'') in the second equation with

:\ln \frac{c-Y_i}{c-a}

(see the "Alternative parametrizations, four parameters" section below).

If one of the shape parameters is known, the problem is considerably simplified. The following logit transformation can be used to solve for the unknown shape parameter (for skewed cases such that \hat{\alpha}\neq\hat{\beta}; otherwise, if symmetric, both -equal- parameters are known when one is known):

:\hat{\operatorname{E}} \left[\ln \left(\frac{X}{1-X} \right) \right]=\psi(\hat{\alpha}) - \psi(\hat{\beta})=\frac{1}{N}\sum_{i=1}^N \ln\frac{X_i}{1-X_i} = \ln \hat{G}_X - \ln \left(\hat{G}_{(1-X)}\right)

This logit transformation is the logarithm of the transformation that divides the variable ''X'' by its mirror-image (''X''/(1 − ''X'')), resulting in the "inverted beta distribution" or beta prime distribution (also known as beta distribution of the second kind or Pearson's Type VI) with support [0, +∞). As previously discussed in the section "Moments of logarithmically transformed random variables," the logit transformation \ln\frac{X}{1-X}, studied by Johnson, extends the finite support [0, 1] based on the original variable ''X'' to infinite support in both directions of the real line (−∞, +∞).

If, for example, \hat{\beta} is known, the unknown parameter \hat{\alpha} can be obtained in terms of the inverse digamma function of the right hand side of this equation:

:\psi(\hat{\alpha})=\frac{1}{N}\sum_{i=1}^N \ln\frac{X_i}{1-X_i} + \psi(\hat{\beta})
:\hat{\alpha}=\psi^{-1}(\ln \hat{G}_X - \ln \hat{G}_{(1-X)} + \psi(\hat{\beta}))

In particular, if one of the shape parameters has a value of unity, for example for \hat{\beta} = 1 (the power function distribution with bounded support [0,1]), using the identity ψ(''x'' + 1) = ψ(''x'') + 1/''x'' in the equation \psi(\hat{\alpha}) - \psi(\hat{\alpha} + \hat{\beta})= \ln \hat{G}_X, the maximum likelihood estimator for the unknown parameter \hat{\alpha} is, exactly:

:\hat{\alpha}= - \frac{1}{\ln \hat{G}_X}= - \frac{N}{\sum_{i=1}^N \ln X_i}

The beta has support [0, 1], therefore \hat{G}_X < 1, and hence (-\ln \hat{G}_X) >0, and therefore \hat{\alpha} >0.

In conclusion, the maximum likelihood estimates of the shape parameters of a beta distribution are (in general) a complicated function of the sample geometric mean, and of the sample geometric mean based on (1−''X''), the mirror-image of ''X''. One may ask, if the variance (in addition to the mean) is necessary to estimate two shape parameters with the method of moments, why is the (logarithmic or geometric) variance not necessary to estimate two shape parameters with the maximum likelihood method, for which only the geometric means suffice? The answer is that the mean does not provide as much information as the geometric mean. For a beta distribution with equal shape parameters ''α'' = ''β'', the mean is exactly 1/2, regardless of the value of the shape parameters, and therefore regardless of the value of the statistical dispersion (the variance). On the other hand, the geometric mean of a beta distribution with equal shape parameters ''α'' = ''β'' depends on the value of the shape parameters, and therefore it contains more information. Also, the geometric mean of a beta distribution does not satisfy the symmetry conditions satisfied by the mean; therefore, by employing both the geometric mean based on ''X'' and the geometric mean based on (1 − ''X''), the maximum likelihood method is able to provide best estimates for both parameters ''α'' = ''β'', without need of employing the variance.

One can express the joint log likelihood per ''N'' iid observations in terms of the ''sufficient statistics'' (the sample geometric means) as follows:

:\frac{\ln\, \mathcal{L} (\alpha, \beta\mid X)}{N} = (\alpha - 1)\ln \hat{G}_X + (\beta- 1)\ln \hat{G}_{(1-X)}- \ln \Beta(\alpha,\beta).
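In practice the coupled digamma equations are inverted numerically; the sketch below, assuming SciPy, uses the Johnson–Kotz approximation above for the starting values and a generic root finder (the helper name beta_mle is ad hoc):

<syntaxhighlight lang="python">
# Sketch: solve psi(a) - psi(a+b) = ln G_X and psi(b) - psi(a+b) = ln G_(1-X)
# numerically for a beta sample on (0, 1).  Assumes NumPy and SciPy.
import numpy as np
from scipy.special import digamma
from scipy.optimize import fsolve

def beta_mle(x):
    ln_gx = np.mean(np.log(x))            # log of sample geometric mean of X
    ln_g1mx = np.mean(np.log(1 - x))      # log of sample geometric mean of 1-X
    gx, g1mx = np.exp(ln_gx), np.exp(ln_g1mx)
    a0 = 0.5 + gx / (2 * (1 - gx - g1mx))       # Johnson-Kotz initial values
    b0 = 0.5 + g1mx / (2 * (1 - gx - g1mx))
    def equations(p):
        a, b = p
        return (digamma(a) - digamma(a + b) - ln_gx,
                digamma(b) - digamma(a + b) - ln_g1mx)
    return fsolve(equations, (a0, b0))

rng = np.random.default_rng(7)
sample = rng.beta(2.0, 5.0, size=50_000)
print(beta_mle(sample))                   # close to (2, 5)
</syntaxhighlight>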
We can plot the joint log likelihood per ''N'' observations for fixed values of the sample geometric means to see the behavior of the likelihood function as a function of the shape parameters α and β. In such a plot, the shape parameter estimators \hat{\alpha},\hat{\beta} correspond to the maxima of the likelihood function. See the accompanying graph, which shows that all the likelihood functions intersect at α = β = 1, the values of the shape parameters that give the maximum entropy (the maximum entropy occurs for shape parameters equal to unity: the uniform distribution). It is evident from the plot that the likelihood function gives sharp peaks for values of the shape parameter estimators close to zero, but that for values of the shape parameter estimators greater than one the likelihood function becomes quite flat, with less defined peaks. Consequently, maximum likelihood parameter estimation for the beta distribution becomes less reliable for larger values of the shape parameter estimators, as the uncertainty in the peak location increases with the value of the shape parameter estimators. One can arrive at the same conclusion by noticing that the expression for the curvature of the likelihood function is in terms of the geometric variances:

:\frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{\partial \alpha^2}= -\operatorname{var}[\ln X]

:\frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{\partial \beta^2} = -\operatorname{var}[\ln (1-X)]

These variances (and therefore the curvatures) are much larger for small values of the shape parameters α and β. However, for shape parameter values α, β > 1, the variances (and therefore the curvatures) flatten out. Equivalently, this result follows from the Cramér–Rao bound, since the Fisher information matrix components for the beta distribution are these logarithmic variances. The Cramér–Rao bound states that the variance of any ''unbiased'' estimator \hat{\alpha} of α is bounded by the reciprocal of the Fisher information:

:\operatorname{var}(\hat{\alpha})\geq\frac{1}{\mathcal{I}_{\alpha, \alpha}}\geq\frac{1}{\psi_1(\hat{\alpha}) - \psi_1(\hat{\alpha} + \hat{\beta})}

:\operatorname{var}(\hat{\beta}) \geq\frac{1}{\mathcal{I}_{\beta, \beta}}\geq\frac{1}{\psi_1(\hat{\beta}) - \psi_1(\hat{\alpha} + \hat{\beta})}

so the variance of the estimators increases with increasing α and β, as the logarithmic variances decrease.

Also one can express the joint log likelihood per ''N'' iid observations in terms of the digamma function expressions for the logarithms of the sample geometric means as follows:

:\frac{\ln \mathcal{L}(\alpha,\beta\mid X)}{N} = (\alpha - 1)(\psi(\hat{\alpha}) - \psi(\hat{\alpha} + \hat{\beta}))+(\beta- 1)(\psi(\hat{\beta}) - \psi(\hat{\alpha} + \hat{\beta}))- \ln \Beta(\alpha,\beta)

This expression is identical to the negative of the cross-entropy (see section on "Quantities of information (entropy)"). Therefore, finding the maximum of the joint log likelihood of the shape parameters, per ''N'' iid observations, is identical to finding the minimum of the cross-entropy for the beta distribution, as a function of the shape parameters:

:\frac{\ln \mathcal{L}(\alpha,\beta\mid X)}{N} = - H = -h - D_{KL} = -\ln\Beta(\alpha,\beta)+(\alpha-1)\psi(\hat{\alpha})+(\beta-1)\psi(\hat{\beta})-(\alpha+\beta-2)\psi(\hat{\alpha}+\hat{\beta})

with the cross-entropy defined as follows:

:H = \int_{0}^1 - f(X;\hat{\alpha},\hat{\beta}) \ln (f(X;\alpha,\beta)) \, dX
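As a concrete illustration of the sufficient-statistic form of the log likelihood above, the following sketch (illustrative names; NumPy and SciPy assumed available) maximizes (α − 1) ln Ĝ_X + (β − 1) ln Ĝ_(1−X) − ln B(α, β) numerically over (α, β).

import numpy as np
from scipy.special import betaln
from scipy.optimize import minimize

def fit_beta_ml(x):
    """Two-parameter ML fit using only the sufficient statistics ln G_X and ln G_(1-X)."""
    x = np.asarray(x, dtype=float)
    ln_gx = np.mean(np.log(x))       # ln of sample geometric mean of X
    ln_g1x = np.mean(np.log1p(-x))   # ln of sample geometric mean of (1 - X)

    def neg_loglik_per_obs(params):
        a, b = params
        if a <= 0 or b <= 0:
            return np.inf
        return -((a - 1.0) * ln_gx + (b - 1.0) * ln_g1x - betaln(a, b))

    res = minimize(neg_loglik_per_obs, x0=[1.0, 1.0], method="Nelder-Mead")
    return res.x  # (alpha_hat, beta_hat)

rng = np.random.default_rng(1)
print(fit_beta_ml(rng.beta(2.0, 5.0, size=5_000)))  # approximately [2, 5]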


Four unknown parameters

The procedure is similar to the one followed in the two unknown parameter case. If ''Y''1, ..., ''Y''''N'' are independent random variables each having a beta distribution with four parameters, the joint log likelihood function for ''N'' iid observations is:

:\begin{align}
\ln\, \mathcal{L} (\alpha, \beta, a, c\mid Y) &= \sum_{i=1}^N \ln\,\mathcal{L}_i (\alpha, \beta, a, c\mid Y_i)\\
&= \sum_{i=1}^N \ln\,f(Y_i; \alpha, \beta, a, c) \\
&= \sum_{i=1}^N \ln\,\frac{(Y_i-a)^{\alpha-1}(c-Y_i)^{\beta-1}}{(c-a)^{\alpha+\beta-1}\Beta(\alpha,\beta)}\\
&= (\alpha - 1)\sum_{i=1}^N \ln (Y_i - a) + (\beta- 1)\sum_{i=1}^N \ln (c - Y_i)- N \ln \Beta(\alpha,\beta) - N (\alpha+\beta - 1) \ln (c - a)
\end{align}

Finding the maximum with respect to a shape parameter involves taking the partial derivative with respect to that parameter and setting the expression equal to zero, yielding the maximum likelihood estimator of the shape parameters:

:\frac{\partial \ln \mathcal{L}(\alpha, \beta, a, c\mid Y)}{\partial \alpha}= \sum_{i=1}^N \ln (Y_i - a) - N(-\psi(\alpha + \beta) + \psi(\alpha))- N \ln (c - a)= 0

:\frac{\partial \ln \mathcal{L}(\alpha, \beta, a, c\mid Y)}{\partial \beta} = \sum_{i=1}^N \ln (c - Y_i) - N(-\psi(\alpha + \beta) + \psi(\beta))- N \ln (c - a)= 0

:\frac{\partial \ln \mathcal{L}(\alpha, \beta, a, c\mid Y)}{\partial a} = -(\alpha - 1) \sum_{i=1}^N \frac{1}{Y_i - a} + N (\alpha+\beta - 1)\frac{1}{c - a}= 0

:\frac{\partial \ln \mathcal{L}(\alpha, \beta, a, c\mid Y)}{\partial c} = (\beta- 1) \sum_{i=1}^N \frac{1}{c - Y_i} - N (\alpha+\beta - 1) \frac{1}{c - a} = 0

These equations can be re-arranged as the following system of four coupled equations (the first two are geometric means and the second two are harmonic means) in terms of the maximum likelihood estimates for the four parameters \hat{\alpha}, \hat{\beta}, \hat{a}, \hat{c}:

:\frac{1}{N}\sum_{i=1}^N \ln \frac{Y_i - \hat{a}}{\hat{c}-\hat{a}} = \psi(\hat{\alpha})-\psi(\hat{\alpha} +\hat{\beta}) = \ln \hat{G}_X

:\frac{1}{N}\sum_{i=1}^N \ln \frac{\hat{c} - Y_i}{\hat{c}-\hat{a}} = \psi(\hat{\beta})-\psi(\hat{\alpha} + \hat{\beta}) = \ln \hat{G}_{(1-X)}

:\frac{1}{\frac{1}{N}\sum_{i=1}^N \frac{\hat{c}-\hat{a}}{Y_i - \hat{a}}} = \frac{\hat{\alpha}-1}{\hat{\alpha}+\hat{\beta}-1} = \hat{H}_X

:\frac{1}{\frac{1}{N}\sum_{i=1}^N \frac{\hat{c}-\hat{a}}{\hat{c} - Y_i}} = \frac{\hat{\beta}-1}{\hat{\alpha}+\hat{\beta}-1} = \hat{H}_{(1-X)}

with sample geometric means:

:\hat{G}_X = \prod_{i=1}^{N} \left (\frac{Y_i-\hat{a}}{\hat{c}-\hat{a}} \right )^{1/N}

:\hat{G}_{(1-X)} = \prod_{i=1}^{N} \left (\frac{\hat{c}-Y_i}{\hat{c}-\hat{a}} \right )^{1/N}

The parameters \hat{a}, \hat{c} are embedded inside the geometric mean expressions in a nonlinear way (to the power 1/''N''). This precludes, in general, a closed form solution, even as an initial value approximation for iteration purposes. One alternative is to use as initial values for iteration the values obtained from the method of moments solution for the four parameter case. Furthermore, the expressions for the harmonic means are well-defined only for \hat{\alpha}, \hat{\beta} > 1, which precludes a maximum likelihood solution for shape parameters less than unity in the four-parameter case. Fisher's information matrix for the four parameter case is positive-definite only for α, β > 2 (for further discussion, see section on Fisher information matrix, four parameter case), that is, for bell-shaped (symmetric or unsymmetric) beta distributions with inflection points located to either side of the mode. The following Fisher information components (which represent the expectations of the curvature of the log likelihood function) have singularities at the following values:

:\alpha = 2: \quad \operatorname{E} \left [- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial a^2} \right ]= \mathcal{I}_{a, a}

:\beta = 2: \quad \operatorname{E}\left [- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial c^2} \right ] = \mathcal{I}_{c, c}

:\alpha = 1: \quad \operatorname{E}\left [- \frac{1}{N}\frac{\partial^2\ln \mathcal{L}}{\partial \alpha \, \partial a}\right ] = \mathcal{I}_{\alpha, a}

:\beta = 1: \quad \operatorname{E}\left [- \frac{1}{N}\frac{\partial^2\ln \mathcal{L}}{\partial \beta \, \partial c} \right ] = \mathcal{I}_{\beta, c}

(for further discussion see section on Fisher information matrix). Thus, it is not possible to strictly carry out maximum likelihood estimation for some well known distributions belonging to the four-parameter beta distribution family, like the uniform distribution (Beta(1, 1, ''a'', ''c'')) and the arcsine distribution (Beta(1/2, 1/2, ''a'', ''c'')). N. L. Johnson and S. Kotz ignore the equations for the harmonic means and instead suggest "If a and c are unknown, and maximum likelihood estimators of ''a'', ''c'', α and β are required, the above procedure (for the two unknown parameter case, with ''X'' transformed as ''X'' = (''Y'' − ''a'')/(''c'' − ''a'')) can be repeated using a succession of trial values of ''a'' and ''c'', until the pair (''a'', ''c'') for which maximum likelihood (given ''a'' and ''c'') is as great as possible, is attained" (where, for the purpose of clarity, their notation for the parameters has been translated into the present notation).
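In practice the four coupled equations are solved numerically. A common shortcut, shown in the hedged sketch below (a generic numerical fit, not the iterative scheme described above), is SciPy's maximum-likelihood fitter, which parametrizes the four-parameter beta by loc = ''a'' and scale = ''c'' − ''a''.

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
# Simulate a four-parameter beta: shapes (2.5, 3.5), support [a, c] = [10, 30]
a, c = 10.0, 30.0
y = a + (c - a) * rng.beta(2.5, 3.5, size=20_000)

# Generic numerical ML fit; loc corresponds to a, scale to (c - a)
alpha_hat, beta_hat, loc_hat, scale_hat = stats.beta.fit(y)
print(alpha_hat, beta_hat, loc_hat, loc_hat + scale_hat)  # roughly 2.5, 3.5, 10, 30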


Fisher information matrix

Let a random variable ''X'' have a probability density ''f''(''x''; ''α''). The partial derivative with respect to the (unknown, and to be estimated) parameter α of the log likelihood function is called the score. The second moment of the score is called the Fisher information:

:\mathcal{I}(\alpha)=\operatorname{E} \left [\left (\frac{\partial}{\partial \alpha} \ln \mathcal{L}(\alpha\mid X) \right )^2 \right].

The expectation of the score is zero, therefore the Fisher information is also the second moment centered on the mean of the score: the variance of the score.

If the log likelihood function is twice differentiable with respect to the parameter α, and under certain regularity conditions, then the Fisher information may also be written as follows (which is often a more convenient form for calculation purposes):

:\mathcal{I}(\alpha) = - \operatorname{E} \left [\frac{\partial^2}{\partial \alpha^2} \ln (\mathcal{L}(\alpha\mid X)) \right].

Thus, the Fisher information is the negative of the expectation of the second derivative with respect to the parameter α of the log likelihood function. Therefore, Fisher information is a measure of the curvature of the log likelihood function of α. A low-curvature (and therefore high radius of curvature), flatter log likelihood function curve has low Fisher information, while a log likelihood function curve with large curvature (and therefore low radius of curvature) has high Fisher information. When the Fisher information matrix is computed at the parameter estimates ("the observed Fisher information matrix"), it is equivalent to the replacement of the true log likelihood surface by a Taylor series approximation, taken as far as the quadratic terms. The word information, in the context of Fisher information, refers to information about the parameters, in matters such as estimation, sufficiency and properties of variances of estimators. The Cramér–Rao bound states that the inverse of the Fisher information is a lower bound on the variance of any unbiased estimator of a parameter α:

:\operatorname{var}[\hat\alpha] \geq \frac{1}{\mathcal{I}(\alpha)}.

The precision with which one can estimate a parameter α is thus limited by the Fisher information of the log likelihood function. The Fisher information is a measure of the minimum error involved in estimating a parameter of a distribution, and it can be viewed as a measure of the resolving power of an experiment needed to discriminate between two alternative hypotheses about a parameter.

When there are ''N'' parameters

: \begin{bmatrix} \theta_1 \\ \theta_2 \\ \vdots \\ \theta_N \end{bmatrix},

then the Fisher information takes the form of an ''N''×''N'' positive semidefinite symmetric matrix, the Fisher information matrix, with typical element:

:(\mathcal{I}(\theta))_{i, j}=\operatorname{E} \left [\left (\frac{\partial}{\partial\theta_i} \ln \mathcal{L} \right) \left(\frac{\partial}{\partial\theta_j} \ln \mathcal{L} \right) \right ].

Under certain regularity conditions, the Fisher information matrix may also be written in the following form, which is often more convenient for computation:

:(\mathcal{I}(\theta))_{i, j} = - \operatorname{E} \left [\frac{\partial^2}{\partial\theta_i \, \partial\theta_j} \ln (\mathcal{L}) \right ].

With ''X''1, ..., ''X''''N'' iid random variables, an ''N''-dimensional "box" can be constructed with sides ''X''1, ..., ''X''''N''. Costa and Cover show that the (Shannon) differential entropy ''h''(''X'') is related to the volume of the typical set (having the sample entropy close to the true entropy), while the Fisher information is related to the surface of this typical set.


Two parameters

For ''X''1, ..., ''X''''N'' independent random variables each having a beta distribution parametrized with shape parameters ''α'' and ''β'', the joint log likelihood function for ''N'' iid observations is:

:\ln (\mathcal{L} (\alpha, \beta\mid X) )= (\alpha - 1)\sum_{i=1}^N \ln X_i + (\beta- 1)\sum_{i=1}^N \ln (1-X_i)- N \ln \Beta(\alpha,\beta)

therefore the joint log likelihood function per ''N'' iid observations is:

:\frac{1}{N} \ln(\mathcal{L} (\alpha, \beta\mid X)) = (\alpha - 1)\frac{1}{N}\sum_{i=1}^N \ln X_i + (\beta- 1)\frac{1}{N}\sum_{i=1}^N \ln (1-X_i)- \ln \Beta(\alpha,\beta)

For the two parameter case, the Fisher information has 4 components: 2 diagonal and 2 off-diagonal. Since the Fisher information matrix is symmetric, only one of these off-diagonal components is independent. Therefore, the Fisher information matrix has 3 independent components (2 diagonal and 1 off-diagonal).

Aryal and Nadarajah calculated Fisher's information matrix for the four-parameter case, from which the two parameter case can be obtained as follows:

:- \frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{N \, \partial \alpha^2}= \operatorname{var}[\ln X]= \psi_1(\alpha) - \psi_1(\alpha + \beta) = \mathcal{I}_{\alpha, \alpha}= \operatorname{E}\left [- \frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{N \, \partial \alpha^2} \right ] = \ln \operatorname{var}_{GX}

:- \frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{N \, \partial \beta^2} = \operatorname{var}[\ln (1-X)]= \psi_1(\beta) - \psi_1(\alpha + \beta) = \mathcal{I}_{\beta, \beta}= \operatorname{E}\left [- \frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{N \, \partial \beta^2} \right]= \ln \operatorname{var}_{G(1-X)}

:- \frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{N \, \partial \alpha \, \partial \beta} = \operatorname{cov}[\ln X,\ln(1-X)] = -\psi_1(\alpha+\beta) = \mathcal{I}_{\alpha, \beta}= \operatorname{E}\left [- \frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{N \, \partial \alpha \, \partial \beta} \right] = \ln \operatorname{cov}_{G\,X,(1-X)}

Since the Fisher information matrix is symmetric:

: \mathcal{I}_{\alpha, \beta}= \mathcal{I}_{\beta, \alpha}= \ln \operatorname{cov}_{G\,X,(1-X)}

The Fisher information components are equal to the log geometric variances and log geometric covariance. Therefore, they can be expressed as trigamma functions, denoted ψ1(α), the second of the polygamma functions, defined as the derivative of the digamma function:

:\psi_1(\alpha) = \frac{d^2\ln\Gamma(\alpha)}{d\alpha^2}= \frac{d\,\psi(\alpha)}{d\alpha}.

These derivatives are also derived in the section titled "Two unknown parameters", and plots of the log likelihood function are also shown in that section. The section titled "Geometric variance and covariance" contains plots and further discussion of the Fisher information matrix components, the log geometric variances and log geometric covariance, as a function of the shape parameters α and β, and the section titled "Moments of logarithmically transformed random variables" contains formulas for moments of logarithmically transformed random variables. Images for the Fisher information components \mathcal{I}_{\alpha, \alpha}, \mathcal{I}_{\beta, \beta} and \mathcal{I}_{\alpha, \beta} are shown in the section on geometric variance.

The determinant of Fisher's information matrix is of interest (for example for the calculation of Jeffreys prior probability). From the expressions for the individual components of the Fisher information matrix, it follows that the determinant of Fisher's (symmetric) information matrix for the beta distribution is:

:\begin{align}
\det(\mathcal{I}(\alpha, \beta))&= \mathcal{I}_{\alpha, \alpha} \mathcal{I}_{\beta, \beta}-\mathcal{I}_{\alpha, \beta} \mathcal{I}_{\beta, \alpha} \\
&=(\psi_1(\alpha) - \psi_1(\alpha + \beta))(\psi_1(\beta) - \psi_1(\alpha + \beta))-( -\psi_1(\alpha+\beta))( -\psi_1(\alpha+\beta))\\
&= \psi_1(\alpha)\psi_1(\beta)-( \psi_1(\alpha)+\psi_1(\beta))\psi_1(\alpha + \beta)\\
\lim_{\alpha\to 0} \det(\mathcal{I}(\alpha, \beta)) &=\lim_{\beta \to 0} \det(\mathcal{I}(\alpha, \beta)) = \infty\\
\lim_{\alpha\to \infty} \det(\mathcal{I}(\alpha, \beta)) &=\lim_{\beta \to \infty} \det(\mathcal{I}(\alpha, \beta)) = 0
\end{align}

From Sylvester's criterion (checking whether the leading principal minors are all positive), it follows that the Fisher information matrix for the two parameter case is positive-definite (under the standard condition that the shape parameters are positive: ''α'' > 0 and ''β'' > 0).
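The two-parameter Fisher information matrix and its determinant are easy to evaluate with the trigamma function; a minimal sketch (illustrative names only) using SciPy's polygamma follows.

import numpy as np
from scipy.special import polygamma

def trigamma(z):
    return polygamma(1, z)

def beta_fisher_info(a, b):
    """Fisher information matrix of Beta(a, b) per observation:
    diagonal = log geometric variances, off-diagonal = log geometric covariance."""
    i_aa = trigamma(a) - trigamma(a + b)
    i_bb = trigamma(b) - trigamma(a + b)
    i_ab = -trigamma(a + b)
    return np.array([[i_aa, i_ab], [i_ab, i_bb]])

I = beta_fisher_info(2.0, 3.0)
det_numeric = np.linalg.det(I)
# Same determinant from the closed form psi1(a)psi1(b) - (psi1(a)+psi1(b))psi1(a+b)
det_closed = trigamma(2.0) * trigamma(3.0) - (trigamma(2.0) + trigamma(3.0)) * trigamma(5.0)
print(I)
print(det_numeric, det_closed)  # the two determinants agree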


Four parameters

If ''Y''1, ..., ''Y''''N'' are independent random variables each having a beta distribution with four parameters: the exponents ''α'' and ''β'', and also ''a'' (the minimum of the distribution range) and ''c'' (the maximum of the distribution range) (see section titled "Alternative parametrizations, Four parameters"), with probability density function:

:f(y; \alpha, \beta, a, c) = \frac{f\left(\frac{y-a}{c-a};\alpha,\beta\right)}{c-a} =\frac{ \left (\frac{y-a}{c-a} \right )^{\alpha-1} \left (\frac{c-y}{c-a} \right)^{\beta-1} }{(c-a)\Beta(\alpha, \beta)}=\frac{ (y-a)^{\alpha-1} (c-y)^{\beta-1} }{(c-a)^{\alpha+\beta-1}\Beta(\alpha, \beta)},

the joint log likelihood function per ''N'' iid observations is:

:\frac{1}{N} \ln(\mathcal{L} (\alpha, \beta, a, c\mid Y))= \frac{\alpha -1}{N}\sum_{i=1}^N \ln (Y_i - a) + \frac{\beta -1}{N}\sum_{i=1}^N \ln (c - Y_i)- \ln \Beta(\alpha,\beta) - (\alpha+\beta -1) \ln (c-a)

For the four parameter case, the Fisher information has 4×4 = 16 components. It has 12 off-diagonal components (= 16 total − 4 diagonal). Since the Fisher information matrix is symmetric, half of these components (12/2 = 6) are independent. Therefore, the Fisher information matrix has 6 independent off-diagonal + 4 diagonal = 10 independent components. Aryal and Nadarajah calculated Fisher's information matrix for the four parameter case as follows:

:- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial \alpha^2}= \operatorname{var}[\ln X]= \psi_1(\alpha) - \psi_1(\alpha + \beta) = \mathcal{I}_{\alpha, \alpha}= \operatorname{E}\left [- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial \alpha^2} \right ] = \ln (\operatorname{var}_{GX})

:-\frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial \beta^2} = \operatorname{var}[\ln (1-X)]= \psi_1(\beta) - \psi_1(\alpha + \beta) = \mathcal{I}_{\beta, \beta}= \operatorname{E} \left [- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial \beta^2} \right ] = \ln(\operatorname{var}_{G(1-X)})

:-\frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial \alpha \, \partial \beta} = \operatorname{cov}[\ln X,\ln (1-X)] = -\psi_1(\alpha+\beta) =\mathcal{I}_{\alpha, \beta}= \operatorname{E} \left [- \frac{1}{N}\frac{\partial^2\ln \mathcal{L}}{\partial \alpha \, \partial \beta} \right ] = \ln(\operatorname{cov}_{G\,X,(1-X)})

In the above expressions, the use of ''X'' instead of ''Y'' in the expressions var[ln(''X'')] = ln(var''GX'') is ''not an error''. The expressions in terms of the log geometric variances and log geometric covariance occur as functions of the two parameter ''X'' ~ Beta(''α'', ''β'') parametrization because, when taking the partial derivatives with respect to the exponents (''α'', ''β'') in the four parameter case, one obtains expressions identical to those for the two parameter case: these terms of the four parameter Fisher information matrix are independent of the minimum ''a'' and maximum ''c'' of the distribution's range. The only non-zero term upon double differentiation of the log likelihood function with respect to the exponents ''α'' and ''β'' is the second derivative of the log of the beta function ln(B(''α'', ''β'')), which is independent of ''a'' and ''c''. Double differentiation of this term results in trigamma functions. The sections titled "Maximum likelihood", "Two unknown parameters" and "Four unknown parameters" also show this fact.

The Fisher information for ''N'' i.i.d. samples is ''N'' times the individual Fisher information (eq. 11.279, page 394 of Cover and Thomas). (Aryal and Nadarajah take a single observation, ''N'' = 1, to calculate the following components of the Fisher information, which leads to the same result as considering the derivatives of the log likelihood per ''N'' observations. Moreover, below the erroneous expression for \mathcal{I}_{a, a} in Aryal and Nadarajah has been corrected.)

:\begin{align}
\alpha > 2: \quad \operatorname{E}\left [- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial a^2} \right ] &= \mathcal{I}_{a, a}=\frac{\beta(\alpha+\beta-1)}{(\alpha-2)(c-a)^2} \\
\beta > 2: \quad \operatorname{E}\left[-\frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial c^2} \right ] &= \mathcal{I}_{c, c} = \frac{\alpha(\alpha+\beta-1)}{(\beta-2)(c-a)^2} \\
\operatorname{E}\left[- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial a \, \partial c} \right ] &= \mathcal{I}_{a, c} = \frac{\alpha+\beta-1}{(c-a)^2} \\
\alpha > 1: \quad \operatorname{E}\left[- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial \alpha \, \partial a} \right ] &=\mathcal{I}_{\alpha, a} = \frac{\beta}{(\alpha-1)(c-a)} \\
\operatorname{E}\left[- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial \alpha \, \partial c} \right ] &= \mathcal{I}_{\alpha, c} = \frac{1}{c-a} \\
\operatorname{E}\left[- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial \beta \, \partial a} \right ] &= \mathcal{I}_{\beta, a} = -\frac{1}{c-a} \\
\beta > 1: \quad \operatorname{E}\left[- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial \beta \, \partial c} \right ] &= \mathcal{I}_{\beta, c} = -\frac{\alpha}{(\beta-1)(c-a)}
\end{align}

The lower two diagonal entries of the Fisher information matrix, with respect to the parameter ''a'' (the minimum of the distribution's range), \mathcal{I}_{a, a}, and with respect to the parameter ''c'' (the maximum of the distribution's range), \mathcal{I}_{c, c}, are only defined for exponents α > 2 and β > 2 respectively. The component \mathcal{I}_{a, a} for the minimum ''a'' approaches infinity as the exponent α approaches 2 from above, and the component \mathcal{I}_{c, c} for the maximum ''c'' approaches infinity as the exponent β approaches 2 from above.

The Fisher information matrix for the four parameter case does not depend on the individual values of the minimum ''a'' and the maximum ''c'', but only on the total range (''c'' − ''a''). Moreover, the components of the Fisher information matrix that depend on the range (''c'' − ''a'') depend only through its inverse (or the square of the inverse), such that the Fisher information decreases for increasing range (''c'' − ''a'').

The accompanying images show the Fisher information components \mathcal{I}_{a, a} and \mathcal{I}_{\alpha, a}. Images for the Fisher information components \mathcal{I}_{\alpha, \alpha} and \mathcal{I}_{\beta, \beta} are shown in the section on geometric variance. All these Fisher information components look like a basin, with the "walls" of the basin located at low values of the parameters.

The following four-parameter-beta-distribution Fisher information components can be expressed in terms of the two-parameter ''X'' ~ Beta(α, β) expectations of the transformed ratio ((1 − ''X'')/''X'') and of its mirror image (''X''/(1 − ''X'')), scaled by the range (''c'' − ''a''), which may be helpful for interpretation:

:\mathcal{I}_{\alpha, a} =\frac{\operatorname{E} \left[\frac{1-X}{X} \right ]}{c-a}= \frac{\beta}{(\alpha-1)(c-a)} \text{ if }\alpha > 1

:\mathcal{I}_{\beta, c} = -\frac{\operatorname{E} \left[\frac{X}{1-X} \right ]}{c-a}=- \frac{\alpha}{(\beta-1)(c-a)}\text{ if }\beta> 1

These are also the expected values of the "inverted beta distribution" or beta prime distribution (also known as beta distribution of the second kind or Pearson's Type VI) and its mirror image, scaled by the range (''c'' − ''a'').

Also, the following Fisher information components can be expressed in terms of the harmonic (1/''X'') variances or of variances based on the ratio transformed variables ((1 − ''X'')/''X'') as follows:

:\begin{align}
\alpha > 2: \quad \mathcal{I}_{a,a} &=\operatorname{var} \left [\frac{1}{X} \right] \left (\frac{\alpha-1}{c-a} \right )^2 =\operatorname{var} \left [\frac{1-X}{X} \right ] \left (\frac{\alpha-1}{c-a} \right)^2 = \frac{\beta(\alpha+\beta-1)}{(\alpha-2)(c-a)^2} \\
\beta > 2: \quad \mathcal{I}_{c, c} &= \operatorname{var} \left [\frac{1}{1-X} \right ] \left (\frac{\beta-1}{c-a} \right )^2 = \operatorname{var} \left [\frac{X}{1-X} \right ] \left (\frac{\beta-1}{c-a} \right )^2 =\frac{\alpha(\alpha+\beta-1)}{(\beta-2)(c-a)^2} \\
\mathcal{I}_{a, c} &=-\operatorname{cov} \left [\frac{1}{X},\frac{1}{1-X} \right ]\frac{(\alpha-1)(\beta-1)}{(c-a)^2} = -\operatorname{cov} \left [\frac{1-X}{X},\frac{X}{1-X} \right ] \frac{(\alpha-1)(\beta-1)}{(c-a)^2} =\frac{\alpha+\beta-1}{(c-a)^2}
\end{align}

See the section "Moments of linearly transformed, product and inverted random variables" for these expectations.

The determinant of Fisher's information matrix is of interest (for example for the calculation of Jeffreys prior probability). From the expressions for the individual components, the determinant of Fisher's (symmetric) information matrix for the beta distribution with four parameters is the determinant of the 4×4 symmetric matrix built from the ten independent components above,

:\det(\mathcal{I}(\alpha,\beta,a,c)) = \det \begin{bmatrix}
\mathcal{I}_{\alpha,\alpha} & \mathcal{I}_{\alpha,\beta} & \mathcal{I}_{\alpha,a} & \mathcal{I}_{\alpha,c} \\
\mathcal{I}_{\alpha,\beta} & \mathcal{I}_{\beta,\beta} & \mathcal{I}_{\beta,a} & \mathcal{I}_{\beta,c} \\
\mathcal{I}_{\alpha,a} & \mathcal{I}_{\beta,a} & \mathcal{I}_{a,a} & \mathcal{I}_{a,c} \\
\mathcal{I}_{\alpha,c} & \mathcal{I}_{\beta,c} & \mathcal{I}_{a,c} & \mathcal{I}_{c,c}
\end{bmatrix} \text{ if }\alpha, \beta> 2,

whose explicit expansion is a lengthy sum of products of these components.

Using Sylvester's criterion (checking whether the leading principal minors are all positive), and since the diagonal components \mathcal{I}_{a, a} and \mathcal{I}_{c, c} have singularities at α = 2 and β = 2, it follows that the Fisher information matrix for the four parameter case is positive-definite for α > 2 and β > 2. Since for α > 2 and β > 2 the beta distribution is (symmetric or unsymmetric) bell shaped, it follows that the Fisher information matrix is positive-definite only for bell-shaped (symmetric or unsymmetric) beta distributions, with inflection points located to either side of the mode. Thus, important well known distributions belonging to the four-parameter beta distribution family, like the parabolic distribution (Beta(2, 2, ''a'', ''c'')) and the uniform distribution (Beta(1, 1, ''a'', ''c'')), have Fisher information components (\mathcal{I}_{a, a},\mathcal{I}_{c, c},\mathcal{I}_{\alpha, a},\mathcal{I}_{\beta, c}) that blow up (approach infinity) in the four-parameter case (although their Fisher information components are all defined for the two parameter case). The four-parameter Wigner semicircle distribution (Beta(3/2, 3/2, ''a'', ''c'')) and arcsine distribution (Beta(1/2, 1/2, ''a'', ''c'')) have negative Fisher information determinants for the four-parameter case.


Bayesian inference

The use of beta distributions in Bayesian inference is due to the fact that they provide a family of conjugate prior probability distributions for binomial (including Bernoulli) and geometric distributions. The domain of the beta distribution can be viewed as a probability, and in fact the beta distribution is often used to describe the distribution of a probability value ''p'':

:P(p;\alpha,\beta) = \frac{p^{\alpha-1}(1-p)^{\beta-1}}{\Beta(\alpha,\beta)}.

Examples of beta distributions used as prior probabilities to represent ignorance of prior parameter values in Bayesian inference are Beta(1,1), Beta(0,0) and Beta(1/2,1/2).


Rule of succession

A classic application of the beta distribution is the rule of succession, introduced in the 18th century by Pierre-Simon Laplace in the course of treating the sunrise problem. It states that, given ''s'' successes in ''n'' conditionally independent Bernoulli trials with probability ''p'', the estimate of the expected value in the next trial is \frac{s+1}{n+2}. This estimate is the expected value of the posterior distribution over ''p'', namely Beta(''s'' + 1, ''n'' − ''s'' + 1), which is given by Bayes' rule if one assumes a uniform prior probability over ''p'' (i.e., Beta(1, 1)) and then observes that ''p'' generated ''s'' successes in ''n'' trials. Laplace's rule of succession has been criticized by prominent scientists. R. T. Cox described Laplace's application of the rule of succession to the sunrise problem (p. 89) as "a travesty of the proper use of the principle." Keynes remarks (Ch. XXX, p. 382) "indeed this is so foolish a theorem that to entertain it is discreditable." Karl Pearson showed that the probability that the next (''n'' + 1) trials will be successes, after ''n'' successes in ''n'' trials, is only 50%, which has been considered too low by scientists like Jeffreys and unacceptable as a representation of the scientific process of experimentation to test a proposed scientific law. As pointed out by Jeffreys (p. 128) (crediting C. D. Broad), Laplace's rule of succession establishes a high probability of success ((''n'' + 1)/(''n'' + 2)) in the next trial, but only a moderate probability (50%) that a further sample (''n'' + 1) comparable in size will be equally successful. As pointed out by Perks, "The rule of succession itself is hard to accept. It assigns a probability to the next trial which implies the assumption that the actual run observed is an average run and that we are always at the end of an average run. It would, one would think, be more reasonable to assume that we were in the middle of an average run. Clearly a higher value for both probabilities is necessary if they are to accord with reasonable belief." These problems with Laplace's rule of succession motivated Haldane, Perks, Jeffreys and others to search for other forms of prior probability (see the next section). According to Jaynes, the main problem with the rule of succession is that it is not valid when ''s'' = 0 or ''s'' = ''n'' (see rule of succession, for an analysis of its validity).


Bayes-Laplace prior probability (Beta(1,1))

The beta distribution achieves maximum differential entropy for Beta(1,1): the uniform probability density, for which all values in the domain of the distribution have equal density. This uniform distribution Beta(1,1) was suggested ("with a great deal of doubt") by Thomas Bayes as the prior probability distribution to express ignorance about the correct prior distribution. This prior distribution was adopted (apparently, from his writings, with little sign of doubt) by Pierre-Simon Laplace, and hence it was also known as the "Bayes-Laplace rule" or the "Laplace rule" of "inverse probability" in publications of the first half of the 20th century. In the later part of the 19th century and early part of the 20th century, scientists realized that the assumption of uniform "equal" probability density depended on the actual functions (for example whether a linear or a logarithmic scale was most appropriate) and parametrizations used. In particular, the behavior near the ends of distributions with finite support (for example near ''x'' = 0, for a distribution with initial support at ''x'' = 0) required particular attention. Keynes (Ch. XXX, p. 381) criticized the use of Bayes's uniform prior probability (Beta(1,1)), under which all values between zero and one are equiprobable, as follows: "Thus experience, if it shows anything, shows that there is a very marked clustering of statistical ratios in the neighborhoods of zero and unity, of those for positive theories and for correlations between positive qualities in the neighborhood of zero, and of those for negative theories and for correlations between negative qualities in the neighborhood of unity."


Haldane's prior probability (Beta(0,0))

The Beta(0,0) distribution was proposed by J.B.S. Haldane, who suggested that the prior probability representing complete uncertainty should be proportional to ''p''^(−1)(1 − ''p'')^(−1). The function ''p''^(−1)(1 − ''p'')^(−1) can be viewed as the limit of the numerator of the beta distribution as both shape parameters approach zero: α, β → 0. The Beta function (in the denominator of the beta distribution) approaches infinity as both parameters approach zero, α, β → 0. Therefore, ''p''^(−1)(1 − ''p'')^(−1) divided by the Beta function approaches a 2-point Bernoulli distribution with equal probability 1/2 at each end, at 0 and 1, and nothing in between, as α, β → 0: a coin-toss, with one face of the coin at 0 and the other face at 1. The Haldane prior probability distribution Beta(0,0) is an "improper prior" because its integral (from 0 to 1) fails to converge, due to the singularities at each end. However, this is not an issue for computing posterior probabilities unless the sample size is very small. Furthermore, Zellner points out that on the log-odds scale (the logit transformation ln(''p''/(1 − ''p''))), the Haldane prior is the uniformly flat prior. The fact that a uniform prior probability on the logit transformed variable ln(''p''/(1 − ''p'')) (with domain (−∞, ∞)) is equivalent to the Haldane prior on the domain [0, 1] was pointed out by Harold Jeffreys in the first edition (1939) of his book Theory of Probability (p. 123). Jeffreys writes "Certainly if we take the Bayes-Laplace rule right up to the extremes we are led to results that do not correspond to anybody's way of thinking. The (Haldane) rule d''x''/(''x''(1−''x'')) goes too far the other way. It would lead to the conclusion that if a sample is of one type with respect to some property there is a probability 1 that the whole population is of that type." The fact that "uniform" depends on the parametrization led Jeffreys to seek a form of prior that would be invariant under different parametrizations.


Jeffreys' prior probability (Beta(1/2,1/2) for a Bernoulli or for a binomial distribution)

Harold Jeffreys proposed to use an uninformative prior probability measure that should be invariant under reparameterization: proportional to the square root of the determinant of Fisher's information matrix. For the Bernoulli distribution, this can be shown as follows: for a coin that is "heads" with probability ''p'' ∈ [0, 1] and is "tails" with probability 1 − ''p'', for a given (''H'', ''T'') ∈ {(0, 1), (1, 0)} the probability is ''p''^''H''(1 − ''p'')^''T''. Since ''T'' = 1 − ''H'', the Bernoulli distribution is ''p''^''H''(1 − ''p'')^(1 − ''H''). Considering ''p'' as the only parameter, it follows that the log likelihood for the Bernoulli distribution is

:\ln \mathcal{L} (p\mid H) = H \ln(p)+ (1-H) \ln(1-p).

The Fisher information matrix has only one component (it is a scalar, because there is only one parameter: ''p''), therefore:

:\begin{align}
\sqrt{\mathcal{I}(p)} &= \sqrt{\operatorname{E}\left [\left (\frac{d}{dp} \ln \mathcal{L}(p\mid H) \right )^2 \right]} \\
&= \sqrt{\operatorname{E}\left [\left (\frac{H}{p} - \frac{1-H}{1-p} \right )^2 \right]} \\
&= \sqrt{p \left (\frac{1}{p} - \frac{0}{1-p} \right )^2 + (1-p)\left (\frac{0}{p} - \frac{1}{1-p} \right )^2} \\
&= \frac{1}{\sqrt{p(1-p)}}.
\end{align}

Similarly, for the binomial distribution with ''n'' Bernoulli trials, it can be shown that

:\sqrt{\mathcal{I}(p)}= \frac{\sqrt{n}}{\sqrt{p(1-p)}}.

Thus, for the Bernoulli and binomial distributions, Jeffreys prior is proportional to \scriptstyle \frac{1}{\sqrt{p(1-p)}}, which happens to be proportional to a beta distribution with domain variable ''x'' = ''p'' and shape parameters α = β = 1/2, the arcsine distribution:

:Beta(\tfrac{1}{2}, \tfrac{1}{2}) = \frac{1}{\pi \sqrt{x(1-x)}}.

It will be shown in the next section that the normalizing constant for Jeffreys prior is immaterial to the final result because the normalizing constant cancels out in Bayes theorem for the posterior probability. Hence Beta(1/2,1/2) is used as the Jeffreys prior for both Bernoulli and binomial distributions. As shown in the next section, when using this expression as a prior probability times the likelihood in Bayes theorem, the posterior probability turns out to be a beta distribution. It is important to realize, however, that Jeffreys prior is proportional to \scriptstyle \frac{1}{\sqrt{p(1-p)}} for the Bernoulli and binomial distributions, but not for the beta distribution. Jeffreys prior for the beta distribution is given by the determinant of Fisher's information for the beta distribution, which, as shown in the section on the Fisher information matrix, is a function of the trigamma function ψ1 of the shape parameters α and β as follows:

: \begin{align}
\sqrt{\det(\mathcal{I}(\alpha, \beta))} &= \sqrt{\psi_1(\alpha)\psi_1(\beta)-(\psi_1(\alpha)+\psi_1(\beta))\psi_1(\alpha + \beta)} \\
\lim_{\alpha\to 0} \sqrt{\det(\mathcal{I}(\alpha, \beta))} &=\lim_{\beta \to 0} \sqrt{\det(\mathcal{I}(\alpha, \beta))} = \infty\\
\lim_{\alpha\to \infty} \sqrt{\det(\mathcal{I}(\alpha, \beta))} &=\lim_{\beta \to \infty} \sqrt{\det(\mathcal{I}(\alpha, \beta))} = 0
\end{align}

As previously discussed, Jeffreys prior for the Bernoulli and binomial distributions is proportional to the arcsine distribution Beta(1/2,1/2), a one-dimensional ''curve'' that looks like a basin as a function of the parameter ''p'' of the Bernoulli and binomial distributions. The walls of the basin are formed by ''p'' approaching the singularities at the ends ''p'' → 0 and ''p'' → 1, where Beta(1/2,1/2) approaches infinity. Jeffreys prior for the beta distribution is a ''2-dimensional surface'' (embedded in a three-dimensional space) that looks like a basin with only two of its walls meeting at the corner α = β = 0 (and missing the other two walls), as a function of the shape parameters α and β of the beta distribution. The two adjoining walls of this 2-dimensional surface are formed by the shape parameters α and β approaching the singularities (of the trigamma function) at α, β → 0. It has no walls for α, β → ∞ because in this case the determinant of Fisher's information matrix for the beta distribution approaches zero.

It will be shown in the next section that Jeffreys prior probability results in posterior probabilities (when multiplied by the binomial likelihood function) that are intermediate between the posterior probability results of the Haldane and Bayes prior probabilities.

Jeffreys prior may be difficult to obtain analytically, and for some cases it simply does not exist (even for simple distribution functions like the asymmetric triangular distribution). Berger, Bernardo and Sun, in a 2009 paper, defined a reference prior probability distribution that (unlike Jeffreys prior) exists for the asymmetric triangular distribution. They cannot obtain a closed-form expression for their reference prior, but numerical calculations show it to be nearly perfectly fitted by the (proper) prior

: \operatorname{Beta}(\tfrac{1}{2}, \tfrac{1}{2}) \sim \frac{1}{\sqrt{\theta(1-\theta)}}

where θ is the vertex variable for the asymmetric triangular distribution with support [0, 1] (corresponding to the following parameter values in Wikipedia's article on the triangular distribution: vertex ''c'' = ''θ'', left end ''a'' = 0, and right end ''b'' = 1). Berger et al. also give a heuristic argument that Beta(1/2,1/2) could indeed be the exact Berger–Bernardo–Sun reference prior for the asymmetric triangular distribution. Therefore, Beta(1/2,1/2) not only is Jeffreys prior for the Bernoulli and binomial distributions, but also seems to be the Berger–Bernardo–Sun reference prior for the asymmetric triangular distribution (for which the Jeffreys prior does not exist), a distribution used in project management and PERT analysis to describe the cost and duration of project tasks.

Clarke and Barron prove that, among continuous positive priors, Jeffreys prior (when it exists) asymptotically maximizes Shannon's mutual information between a sample of size ''n'' and the parameter, and therefore ''Jeffreys prior is the most uninformative prior'' (measuring information as Shannon information). The proof rests on an examination of the Kullback–Leibler divergence between probability density functions for iid random variables.
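The claim that the unnormalized Jeffreys prior 1/√(''p''(1 − ''p'')) for a Bernoulli/binomial parameter integrates to π, so that its normalized form is the arcsine density Beta(1/2, 1/2), can be checked numerically; a minimal sketch with SciPy (illustrative only):

import numpy as np
from scipy.integrate import quad
from scipy.stats import beta

# Normalizing constant of the unnormalized Jeffreys prior 1/sqrt(p(1-p))
Z, _ = quad(lambda p: 1.0 / np.sqrt(p * (1.0 - p)), 0.0, 1.0)
print(Z, np.pi)  # both approximately 3.14159

# The normalized prior coincides with the Beta(1/2, 1/2) (arcsine) density
p = 0.3
print(1.0 / (np.pi * np.sqrt(p * (1 - p))), beta.pdf(p, 0.5, 0.5))  # equal values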


Effect of different prior probability choices on the posterior beta distribution

If samples are drawn from the population of a random variable ''X'' that result in ''s'' successes and ''f'' failures in ''n'' Bernoulli trials ''n'' = ''s'' + ''f'', then the likelihood function for parameters ''s'' and ''f'' given ''x'' = ''p'' (the notation ''x'' = ''p'' in the expressions below emphasizes that the domain ''x'' stands for the value of the parameter ''p'' in the binomial distribution) is the following binomial distribution:

:\mathcal{L}(s,f\mid x=p) = \binom{s+f}{s} x^s(1-x)^f = \binom{n}{s} x^s(1-x)^{n-s}.

If beliefs about prior probability information are reasonably well approximated by a beta distribution with parameters ''α'' Prior and ''β'' Prior, then:

:\operatorname{PriorProbability}(x=p;\alpha \operatorname{Prior},\beta \operatorname{Prior}) = \frac{x^{\alpha \operatorname{Prior}-1}(1-x)^{\beta \operatorname{Prior}-1}}{\Beta(\alpha \operatorname{Prior},\beta \operatorname{Prior})}

According to Bayes' theorem for a continuous event space, the posterior probability is given by the product of the prior probability and the likelihood function (given the evidence ''s'' and ''f'' = ''n'' − ''s''), normalized so that the area under the curve equals one, as follows:

:\begin{align}
& \operatorname{posterior probability}(x=p\mid s,n-s) \\
= {} & \frac{\operatorname{prior probability} \times \operatorname{likelihood}}{\int_0^1 \operatorname{prior probability} \times \operatorname{likelihood}\, dx} \\
= {} & \frac{\binom{n}{s} x^{s+\alpha \operatorname{Prior}-1}(1-x)^{n-s+\beta \operatorname{Prior}-1} / \Beta(\alpha \operatorname{Prior},\beta \operatorname{Prior})}{\int_0^1 \binom{n}{s} x^{s+\alpha \operatorname{Prior}-1}(1-x)^{n-s+\beta \operatorname{Prior}-1} / \Beta(\alpha \operatorname{Prior},\beta \operatorname{Prior}) \, dx} \\
= {} & \frac{x^{s+\alpha \operatorname{Prior}-1}(1-x)^{n-s+\beta \operatorname{Prior}-1}}{\int_0^1 x^{s+\alpha \operatorname{Prior}-1}(1-x)^{n-s+\beta \operatorname{Prior}-1}\, dx} \\
= {} & \frac{x^{s+\alpha \operatorname{Prior}-1}(1-x)^{n-s+\beta \operatorname{Prior}-1}}{\Beta(s+\alpha \operatorname{Prior},n-s+\beta \operatorname{Prior})}.
\end{align}

The binomial coefficient

:\binom{n}{s}=\frac{n!}{s!(n-s)!}=\frac{(s+f)!}{s!f!}

appears both in the numerator and the denominator of the posterior probability, and it does not depend on the integration variable ''x'', hence it cancels out, and it is irrelevant to the final result. Similarly the normalizing factor for the prior probability, the beta function B(''α'' Prior, ''β'' Prior), cancels out and it is immaterial to the final result. The same posterior probability result can be obtained if one uses an un-normalized prior

:x^{\alpha \operatorname{Prior}-1}(1-x)^{\beta \operatorname{Prior}-1}

because the normalizing factors all cancel out. Several authors (including Jeffreys himself) thus use an un-normalized prior formula since the normalization constant cancels out. The numerator of the posterior probability ends up being just the (un-normalized) product of the prior probability and the likelihood function, and the denominator is its integral from zero to one. The beta function in the denominator, B(''s'' + ''α'' Prior, ''n'' − ''s'' + ''β'' Prior), appears as a normalization constant to ensure that the total posterior probability integrates to unity.

The ratio ''s''/''n'' of the number of successes to the total number of trials is a sufficient statistic in the binomial case, which is relevant for the following results.

For the Bayes' prior probability (Beta(1,1)), the posterior probability is:

:\operatorname{posterior probability}(p=x\mid s,f) = \frac{x^{s}(1-x)^{n-s}}{\Beta(s+1,n-s+1)}, \text{ with mean} =\frac{s+1}{n+2},\text{ (and mode}=\frac{s}{n}\text{ if } 0 < s < n).

For the Jeffreys' prior probability (Beta(1/2,1/2)), the posterior probability is:

:\operatorname{posterior probability}(p=x\mid s,f) = \frac{x^{s-\tfrac{1}{2}}(1-x)^{n-s-\tfrac{1}{2}}}{\Beta(s+\tfrac{1}{2},n-s+\tfrac{1}{2})},\text{ with mean} = \frac{s+\tfrac{1}{2}}{n+1},\text{ (and mode}= \frac{s-\tfrac{1}{2}}{n-1}\text{ if } \tfrac{1}{2} < s < n-\tfrac{1}{2}).

and for the Haldane prior probability (Beta(0,0)), the posterior probability is:

:\operatorname{posterior probability}(p=x\mid s,f) = \frac{x^{s-1}(1-x)^{n-s-1}}{\Beta(s,n-s)}, \text{ with mean} = \frac{s}{n},\text{ (and mode}= \frac{s-1}{n-2}\text{ if } 1 < s < n -1).

From the above expressions it follows that for ''s''/''n'' = 1/2 all three prior probabilities result in the identical location for the posterior probability mean = mode = 1/2. For ''s''/''n'' < 1/2, the means of the posterior probabilities, using these priors, are ordered as: mean for Bayes prior > mean for Jeffreys prior > mean for Haldane prior. For ''s''/''n'' > 1/2 the order of these inequalities is reversed, such that the Haldane prior probability results in the largest posterior mean. The ''Haldane'' prior probability Beta(0,0) results in a posterior probability density with ''mean'' (the expected value for the probability of success in the "next" trial) identical to the ratio ''s''/''n'' of the number of successes to the total number of trials; therefore, the Haldane prior results in a posterior probability with expected value in the next trial equal to the maximum likelihood estimate. The ''Bayes'' prior probability Beta(1,1) results in a posterior probability density with ''mode'' identical to the ratio ''s''/''n'' (the maximum likelihood estimate).

In the case that 100% of the trials have been successful (''s'' = ''n''), the ''Bayes'' prior probability Beta(1,1) results in a posterior expected value equal to the rule of succession (''n'' + 1)/(''n'' + 2), while the Haldane prior Beta(0,0) results in a posterior expected value of 1 (absolute certainty of success in the next trial). Jeffreys prior probability results in a posterior expected value equal to (''n'' + 1/2)/(''n'' + 1). Perks (p. 303) points out: "This provides a new rule of succession and expresses a 'reasonable' position to take up, namely, that after an unbroken run of n successes we assume a probability for the next trial equivalent to the assumption that we are about half-way through an average run, i.e. that we expect a failure once in (2''n'' + 2) trials. The Bayes–Laplace rule implies that we are about at the end of an average run or that we expect a failure once in (''n'' + 2) trials. The comparison clearly favours the new result (what is now called Jeffreys prior) from the point of view of 'reasonableness'."

Conversely, in the case that 100% of the trials have resulted in failure (''s'' = 0), the ''Bayes'' prior probability Beta(1,1) results in a posterior expected value for success in the next trial equal to 1/(''n'' + 2), while the Haldane prior Beta(0,0) results in a posterior expected value of success in the next trial of 0 (absolute certainty of failure in the next trial). Jeffreys prior probability results in a posterior expected value for success in the next trial equal to (1/2)/(''n'' + 1), which Perks (p. 303) points out "is a much more reasonably remote result than the Bayes-Laplace result 1/(''n'' + 2)".

Jaynes questions (for the uniform prior Beta(1,1)) the use of these formulas for the cases ''s'' = 0 or ''s'' = ''n'' because the integrals do not converge (Beta(1,1) is an improper prior for ''s'' = 0 or ''s'' = ''n''). In practice, the conditions 0 < ''s'' < ''n'' required for a mode to exist between both ends are usually met. Concerning the probability that the next (''n'' + 1) trials will all be successes after ''n'' successes in ''n'' trials (the calculation for which Pearson obtained 50% under the Bayes-Laplace prior), Perks (p. 303) shows that, for what is now known as the Jeffreys prior, this probability is ((''n'' + 1/2)/(''n'' + 1))((''n'' + 3/2)/(''n'' + 2))...((2''n'' + 1/2)/(2''n'' + 1)), which for ''n'' = 1, 2, 3 gives 15/24, 315/480, 9009/13440, rapidly approaching a limiting value of 1/\sqrt{2} = 0.70710678\ldots as ''n'' tends to infinity. Perks remarks that what is now known as the Jeffreys prior "is clearly more 'reasonable' than either the Bayes-Laplace result or the result on the (Haldane) alternative rule rejected by Jeffreys which gives certainty as the probability. It clearly provides a very much better correspondence with the process of induction. Whether it is 'absolutely' reasonable for the purpose, i.e. whether it is yet large enough, without the absurdity of reaching unity, is a matter for others to decide. But it must be realized that the result depends on the assumption of complete indifference and absence of knowledge prior to the sampling experiment."
Following are the variances of the posterior distribution obtained with these three prior probability distributions:

for the Bayes' prior probability (Beta(1,1)), the posterior variance is:

:\text{variance} = \frac{(s+1)(n-s+1)}{(n+2)^2(n+3)},\text{ which for } s=\frac{n}{2} \text{ results in variance} =\frac{1}{4n+12}

for the Jeffreys' prior probability (Beta(1/2,1/2)), the posterior variance is:

:\text{variance} = \frac{(s+\tfrac{1}{2})(n-s+\tfrac{1}{2})}{(n+1)^2(n+2)},\text{ which for } s=\frac{n}{2} \text{ results in variance} = \frac{1}{4n+8}

and for the Haldane prior probability (Beta(0,0)), the posterior variance is:

:\text{variance} = \frac{s(n-s)}{n^2(n+1)}, \text{ which for } s=\frac{n}{2} \text{ results in variance} =\frac{1}{4n+4}

So, as remarked by Silvey, for large ''n'', the variance is small and hence the posterior distribution is highly concentrated, whereas the assumed prior distribution was very diffuse. This is in accord with what one would hope for, as vague prior knowledge is transformed (through Bayes theorem) into more precise posterior knowledge by an informative experiment. For small ''n'' the Haldane Beta(0,0) prior results in the largest posterior variance while the Bayes Beta(1,1) prior results in the most concentrated posterior. Jeffreys prior Beta(1/2,1/2) results in a posterior variance in between the other two. As ''n'' increases, the variance rapidly decreases so that the posterior variance for all three priors converges to approximately the same value (approaching zero variance as ''n'' → ∞). Recalling the previous result that the ''Haldane'' prior probability Beta(0,0) results in a posterior probability density with ''mean'' (the expected value for the probability of success in the "next" trial) identical to the ratio ''s''/''n'', it follows from the above expression that the ''Haldane'' prior Beta(0,0) also results in a posterior with ''variance'' identical to the variance expressed in terms of the maximum likelihood estimate ''s''/''n'' and sample size (see the section titled "Variance"):

:\text{variance} = \frac{\mu(1-\mu)}{1 + \nu}= \frac{\frac{s}{n}\left(1 - \frac{s}{n}\right)}{1 + n}

with the mean ''μ'' = ''s''/''n'' and the sample size ''ν'' = ''n''.

In Bayesian inference, using a prior distribution Beta(''α''Prior, ''β''Prior) prior to a binomial distribution is equivalent to adding (''α''Prior − 1) pseudo-observations of "success" and (''β''Prior − 1) pseudo-observations of "failure" to the actual number of successes and failures observed, then estimating the parameter ''p'' of the binomial distribution by the proportion of successes over both real- and pseudo-observations. A uniform prior Beta(1,1) does not add (or subtract) any pseudo-observations since for Beta(1,1) it follows that (''α''Prior − 1) = 0 and (''β''Prior − 1) = 0. The Haldane prior Beta(0,0) subtracts one pseudo-observation from each, and Jeffreys prior Beta(1/2,1/2) subtracts 1/2 pseudo-observation of success and an equal number of failure. This subtraction has the effect of smoothing out the posterior distribution. If the proportion of successes is not 50% (''s''/''n'' ≠ 1/2), values of ''α''Prior and ''β''Prior less than 1 (and therefore negative (''α''Prior − 1) and (''β''Prior − 1)) favor sparsity, i.e. distributions where the parameter ''p'' is closer to either 0 or 1. In effect, values of ''α''Prior and ''β''Prior between 0 and 1, when operating together, function as a concentration parameter.
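The posterior means, modes and variances quoted above follow directly from the Beta(''s'' + ''α''Prior, ''n'' − ''s'' + ''β''Prior) posterior; the sketch below (illustrative names only) tabulates them for the Haldane, Jeffreys and Bayes priors.

def posterior_summary(s, n, a_prior, b_prior):
    """Mean, mode and variance of the Beta(s + a_prior, n - s + b_prior) posterior."""
    a, b = s + a_prior, n - s + b_prior
    mean = a / (a + b)
    mode = (a - 1) / (a + b - 2) if a > 1 and b > 1 else None  # interior mode only
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, mode, var

s, n = 7, 10
for name, (a0, b0) in {"Haldane Beta(0,0)": (0.0, 0.0),
                       "Jeffreys Beta(1/2,1/2)": (0.5, 0.5),
                       "Bayes Beta(1,1)": (1.0, 1.0)}.items():
    print(name, posterior_summary(s, n, a0, b0))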
The accompanying plots show the posterior probability density functions obtained with these three priors (Beta(0,0), Beta(1/2,1/2) and Beta(1,1)) for sample sizes ranging from ''n'' = 3 to ''n'' = 50. The first plot shows the symmetric cases, with successes ''s'' = ''n''/2 and mean = mode = 1/2, and the second plot shows skewed cases with ''s'' = ''n''/4. The images show that there is little difference between the priors for the posterior with a sample size of 50 (characterized by a more pronounced peak near ''p'' = 1/2). Significant differences appear for very small sample sizes (in particular for the flatter distribution in the degenerate case of sample size = 3). Therefore, the skewed cases, with successes ''s'' = ''n''/4, show a larger effect from the choice of prior, at small sample size, than the symmetric cases. For symmetric distributions, the Bayes prior Beta(1,1) results in the most "peaky" and highest posterior distributions and the Haldane prior Beta(0,0) results in the flattest and lowest-peaked distribution; the Jeffreys prior Beta(1/2,1/2) lies in between them. For nearly symmetric, not too skewed distributions the effect of the priors is similar. For very small sample size (in this case a sample size of 3) and a skewed distribution (in this example ''s'' = ''n''/4), the Haldane prior can result in a reverse-J-shaped distribution with a singularity at the left end. However, this happens only in degenerate cases (in this example ''n'' = 3 and hence ''s'' = 3/4 < 1, a degenerate value because ''s'' should be greater than unity in order for the posterior of the Haldane prior to have a mode located between the ends, and because ''s'' = 3/4 is not an integer, hence it violates the initial assumption of a binomial distribution for the likelihood), and it is not an issue in generic cases of reasonable sample size (such that the condition 1 < ''s'' < ''n'' − 1, necessary for a mode to exist between both ends, is fulfilled).

In Chapter 12 (p. 385) of his book, Jaynes asserts that the ''Haldane prior'' Beta(0,0) describes a ''prior state of knowledge of complete ignorance'', where we are not even sure whether it is physically possible for an experiment to yield either a success or a failure, while the ''Bayes (uniform) prior Beta(1,1) applies if'' one knows that ''both binary outcomes are possible''. Jaynes states: "''interpret the Bayes-Laplace (Beta(1,1)) prior as describing not a state of complete ignorance'', but the state of knowledge in which we have observed one success and one failure...once we have seen at least one success and one failure, then we know that the experiment is a true binary one, in the sense of physical possibility." Jaynes does not specifically discuss Jeffreys prior Beta(1/2,1/2) (Jaynes's discussion of "Jeffreys prior" on pp. 181, 423 and in chapter 12 of his book refers instead to the improper, un-normalized prior "1/''p'' ''dp''" introduced by Jeffreys in the 1939 edition of his book, seven years before he introduced what is now known as Jeffreys' invariant prior: the square root of the determinant of Fisher's information matrix. ''"1/p" is Jeffreys' (1946) invariant prior for the exponential distribution, not for the Bernoulli or binomial distributions''). However, it follows from the above discussion that the Jeffreys Beta(1/2,1/2) prior represents a state of knowledge in between the Haldane Beta(0,0) and Bayes Beta(1,1) priors.

Similarly, Karl Pearson in his 1892 book The Grammar of Science (p. 144 of the 1900 edition) maintained that the Bayes (Beta(1,1)) uniform prior was not a complete ignorance prior, and that it should be used when prior information justified us to "distribute our ignorance equally". K. Pearson wrote: "Yet the only supposition that we appear to have made is this: that, knowing nothing of nature, routine and anomy (from the Greek ανομία, namely: a- "without", and nomos "law") are to be considered as equally likely to occur. Now we were not really justified in making even this assumption, for it involves a knowledge that we do not possess regarding nature. We use our ''experience'' of the constitution and action of coins in general to assert that heads and tails are equally probable, but we have no right to assert before experience that, as we know nothing of nature, routine and breach are equally probable. In our ignorance we ought to consider before experience that nature may consist of all routines, all anomies (normlessness), or a mixture of the two in any proportion whatever, and that all such are equally probable. Which of these constitutions after experience is the most probable must clearly depend on what that experience has been like."

If there is sufficient sampling data, ''and the posterior probability mode is not located at one of the extremes of the domain'' (''x'' = 0 or ''x'' = 1), the three priors of Bayes (Beta(1,1)), Jeffreys (Beta(1/2,1/2)) and Haldane (Beta(0,0)) should yield similar ''posterior'' probability densities. Otherwise, as Gelman et al. (p. 65) point out, "if so few data are available that the choice of noninformative prior distribution makes a difference, one should put relevant information into the prior distribution", or as Berger (p. 125) points out, "when different reasonable priors yield substantially different answers, can it be right to state that there ''is'' a single answer? Would it not be better to admit that there is scientific uncertainty, with the conclusion depending on prior beliefs?"


Occurrence and applications


Order statistics

The beta distribution has an important application in the theory of order statistics. A basic result is that the distribution of the ''k''th smallest of a sample of size ''n'' from a continuous uniform distribution has a beta distribution.David, H. A., Nagaraja, H. N. (2003) ''Order Statistics'' (3rd Edition). Wiley, New Jersey pp 458.
This result is summarized as:
:U_{(k)} \sim \operatorname{Beta}(k,n+1-k).
From this, and application of the theory related to the probability integral transform, the distribution of any individual order statistic from any continuous distribution can be derived.
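As an illustrative check of this result (an addition to the text, not part of the original article), the following sketch compares the empirical distribution of the ''k''th smallest of ''n'' uniform variates with Beta(''k'', ''n''+1−''k''); numpy/scipy are assumed to be available, and ''n'', ''k'', the sample size and the seed are arbitrary choices.
<syntaxhighlight lang="python">
# Sketch: k-th order statistic of n uniforms vs Beta(k, n+1-k).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, k, reps = 10, 3, 50_000                         # arbitrary illustrative values

u = rng.uniform(size=(reps, n))
kth_smallest = np.sort(u, axis=1)[:, k - 1]        # empirical k-th order statistic

dist = stats.beta(k, n + 1 - k)                    # theoretical Beta(k, n+1-k)
print("empirical mean/var    :", kth_smallest.mean(), kth_smallest.var())
print("Beta(k,n+1-k) mean/var:", dist.mean(), dist.var())

# One-sample Kolmogorov-Smirnov test against the theoretical distribution
print(stats.kstest(kth_smallest, dist.cdf))
</syntaxhighlight>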


Subjective logic

In standard logic, propositions are considered to be either true or false. In contradistinction, subjective logic assumes that humans cannot determine with absolute certainty whether a proposition about the real world is absolutely true or false. In subjective logic the a posteriori probability estimates of binary events can be represented by beta distributions.A. Jøsang. A Logic for Uncertain Probabilities. ''International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems.'' 9(3), pp. 279-311, June 2001.


Wavelet analysis

A wavelet is a wave-like oscillation with an amplitude that starts out at zero, increases, and then decreases back to zero. It can typically be visualized as a "brief oscillation" that promptly decays. Wavelets can be used to extract information from many different kinds of data, including – but certainly not limited to – audio signals and images. Thus, wavelets are purposefully crafted to have specific properties that make them useful for signal processing. Wavelets are localized in both time and frequency whereas the standard Fourier transform is only localized in frequency. Therefore, standard Fourier transforms are only applicable to stationary processes, while wavelets are applicable to non-stationary processes. Continuous wavelets can be constructed based on the beta distribution. Beta waveletsH.M. de Oliveira and G.A.A. Araújo. Compactly Supported One-cyclic Wavelets Derived from Beta Distributions. ''Journal of Communication and Information Systems.'' vol. 20, n. 3, pp. 27-33, 2005. can be viewed as a soft variety of Haar wavelets whose shape is fine-tuned by two shape parameters α and β.


Population genetics

The Balding–Nichols model is a two-parameter parametrization of the beta distribution used in population genetics.
It is a statistical description of the allele frequencies in the components of a sub-divided population:
: \begin{align}
    \alpha &= \mu \nu,\\
    \beta  &= (1 - \mu) \nu,
  \end{align}
where \nu =\alpha+\beta= \frac{1-F}{F} and 0 < F < 1; here ''F'' is (Wright's) genetic distance between two populations.
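As a small illustration (an addition to the text; numpy is assumed to be available, and the values of ''F'' and μ are arbitrary), the mapping from (''F'', μ) to the shape parameters can be coded directly:
<syntaxhighlight lang="python">
# Sketch: Balding-Nichols parametrization, alpha = mu*nu, beta = (1-mu)*nu with nu = (1-F)/F.
import numpy as np

def balding_nichols_params(F, mu):
    """Return (alpha, beta) for Wright's genetic distance F and mean allele frequency mu."""
    nu = (1.0 - F) / F
    return mu * nu, (1.0 - mu) * nu

alpha, beta = balding_nichols_params(F=0.1, mu=0.3)   # arbitrary example values
print(alpha, beta)

# Sampling allele frequencies in sub-populations
rng = np.random.default_rng(1)
freqs = rng.beta(alpha, beta, size=5)
print(freqs, freqs.mean())   # the long-run average should be near mu
</syntaxhighlight>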


Project management: task cost and schedule modeling

The beta distribution can be used to model events which are constrained to take place within an interval defined by a minimum and maximum value. For this reason, the beta distribution — along with the triangular distribution — is used extensively in PERT, critical path method (CPM), Joint Cost Schedule Modeling (JCSM) and other project management/control systems to describe the time to completion and the cost of a task. In project management, shorthand computations are widely used to estimate the mean and standard deviation of the beta distribution:
: \begin{align}
  \mu(X) & = \frac{a + 4b + c}{6} \\
  \sigma(X) & = \frac{c - a}{6}
\end{align}
where ''a'' is the minimum, ''c'' is the maximum, and ''b'' is the most likely value (the mode for ''α'' > 1 and ''β'' > 1).
The above estimate for the mean \mu(X)= \frac{a + 4b + c}{6} is known as the PERT three-point estimation and it is exact for either of the following values of ''β'' (for arbitrary α within these ranges):
:''β'' = ''α'' > 1 (symmetric case) with standard deviation \sigma(X) = \frac{c-a}{2\sqrt{1+2\alpha}}, skewness = 0, and excess kurtosis = \frac{-6}{3 + 2\alpha}
or
:''β'' = 6 − ''α'' for 5 > ''α'' > 1 (skewed case) with standard deviation
:\sigma(X) = \frac{(c-a)\sqrt{\alpha(6-\alpha)}}{6\sqrt{7}},
skewness = \frac{(3-\alpha)\sqrt{7}}{2\sqrt{\alpha(6-\alpha)}}, and excess kurtosis = \frac{21}{\alpha(6-\alpha)} - 3
The above estimate for the standard deviation ''σ''(''X'') = (''c'' − ''a'')/6 is exact for either of the following values of ''α'' and ''β'':
:''α'' = ''β'' = 4 (symmetric) with skewness = 0, and excess kurtosis = −6/11.
:''β'' = 6 − ''α'' and \alpha = 3 - \sqrt2 (right-tailed, positive skew) with skewness = \frac{1}{\sqrt{2}}, and excess kurtosis = 0
:''β'' = 6 − ''α'' and \alpha = 3 + \sqrt2 (left-tailed, negative skew) with skewness = \frac{-1}{\sqrt{2}}, and excess kurtosis = 0
Otherwise, these can be poor approximations for beta distributions with other values of α and β, exhibiting average errors of 40% in the mean and 549% in the variance.
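The following sketch (an addition to the text, not part of the original article) compares the PERT shorthand estimates with the exact mean and standard deviation of a beta distribution rescaled to [''a'', ''c'']; the particular values of α, β, ''a'' and ''c'' below are arbitrary, chosen from one of the "exact" cases listed above.
<syntaxhighlight lang="python">
# Sketch: PERT shorthand mean/sd vs exact beta moments on [a, c].
import math

alpha, beta = 3 - math.sqrt(2), 3 + math.sqrt(2)   # one of the exact cases above (beta = 6 - alpha)
a, c = 10.0, 40.0                                  # minimum and maximum (arbitrary)

mode = a + (c - a) * (alpha - 1) / (alpha + beta - 2)   # b, the most likely value
pert_mean = (a + 4 * mode + c) / 6
pert_sd = (c - a) / 6

exact_mean = a + (c - a) * alpha / (alpha + beta)
exact_var = (c - a) ** 2 * alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))

print("PERT mean:", pert_mean, " exact:", exact_mean)
print("PERT sd  :", pert_sd,   " exact:", math.sqrt(exact_var))
</syntaxhighlight>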


Random variate generation

If ''X'' and ''Y'' are independent, with X \sim \Gamma(\alpha, \theta) and Y \sim \Gamma(\beta, \theta) then
:\frac{X}{X+Y} \sim \Beta(\alpha, \beta).
So one algorithm for generating beta variates is to generate \frac{X}{X+Y}, where ''X'' is a gamma variate with parameters (α, 1) and ''Y'' is an independent gamma variate with parameters (β, 1). In fact, here \frac{X}{X+Y} and X+Y are independent, and X+Y \sim \Gamma(\alpha + \beta, \theta). If Z \sim \Gamma(\gamma, \theta) and Z is independent of X and Y, then \frac{X+Y}{X+Y+Z} \sim \Beta(\alpha+\beta,\gamma) and \frac{X+Y}{X+Y+Z} is independent of \frac{X}{X+Y}. This shows that the product of independent \Beta(\alpha,\beta) and \Beta(\alpha+\beta,\gamma) random variables is a \Beta(\alpha,\beta+\gamma) random variable.
Also, the ''k''th order statistic of ''n'' uniformly distributed variates is \Beta(k, n+1-k), so an alternative if α and β are small integers is to generate α + β − 1 uniform variates and choose the α-th smallest.
Another way to generate the beta distribution is by the Pólya urn model. According to this method, one starts with an "urn" with α "black" balls and β "white" balls and draws uniformly with replacement. Every trial an additional ball is added according to the color of the last ball which was drawn. Asymptotically, the proportion of black and white balls will be distributed according to the beta distribution, where each repetition of the experiment will produce a different value.
It is also possible to use inverse transform sampling.
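A minimal sketch of two of these generation schemes (an addition to the text; numpy is assumed to be available, and the parameter values and seed are arbitrary):
<syntaxhighlight lang="python">
# Sketch: generating Beta(alpha, beta) variates via (1) the gamma-ratio method and
# (2) the order-statistic method for integer alpha, beta.
import numpy as np

rng = np.random.default_rng(42)
alpha, beta, reps = 2, 5, 100_000                 # arbitrary illustrative values

# (1) Gamma ratio: X ~ Gamma(alpha, 1), Y ~ Gamma(beta, 1)  =>  X/(X+Y) ~ Beta(alpha, beta)
x = rng.gamma(shape=alpha, scale=1.0, size=reps)
y = rng.gamma(shape=beta, scale=1.0, size=reps)
gamma_ratio = x / (x + y)

# (2) Order statistic: alpha-th smallest of alpha+beta-1 uniforms ~ Beta(alpha, beta)
u = rng.uniform(size=(reps, alpha + beta - 1))
order_stat = np.sort(u, axis=1)[:, alpha - 1]

for name, sample in [("gamma ratio", gamma_ratio), ("order statistic", order_stat)]:
    print(name, sample.mean(), sample.var())
# Both should be close to the exact mean alpha/(alpha+beta) = 2/7 and
# variance alpha*beta/((alpha+beta)**2 * (alpha+beta+1)) = 10/392
</syntaxhighlight>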


History

Thomas Bayes, in a posthumous paper
published in 1763 by Richard Price, obtained a beta distribution as the density of the probability of success in Bernoulli trials (see ), but the paper does not analyze any of the moments of the beta distribution or discuss any of its properties.
The first systematic modern discussion of the beta distribution is probably due to Karl Pearson.
_In_Pearson's_papers_the_beta_distribution_is_couched_as_a_solution_of_a_differential_equation:_Pearson_distribution, Pearson's_Type_I_distribution_which_it_is_essentially_identical_to_except_for_arbitrary_shifting_and_re-scaling_(the_beta_and_Pearson_Type_I_distributions_can_always_be_equalized_by_proper_choice_of_parameters)._In_fact,_in_several_English_books_and_journal_articles_in_the_few_decades_prior_to_World_War_II,_it_was_common_to_refer_to_the_beta_distribution_as_Pearson's_Type_I_distribution.__William_Palin_Elderton, William_P._Elderton_in_his_1906_monograph_"Frequency_curves_and_correlation"
_further_analyzes_the_beta_distribution_as_Pearson's_Type_I_distribution,_including_a_full_discussion_of_the_method_of_moments_for_the_four_parameter_case,_and_diagrams_of_(what_Elderton_describes_as)_U-shaped,_J-shaped,_twisted_J-shaped,_"cocked-hat"_shapes,_horizontal_and_angled_straight-line_cases.__Elderton_wrote_"I_am_chiefly_indebted_to_Professor_Pearson,_but_the_indebtedness_is_of_a_kind_for_which_it_is_impossible_to_offer_formal_thanks."__William_Palin_Elderton, Elderton_in_his_1906_monograph__provides_an_impressive_amount_of_information_on_the_beta_distribution,_including_equations_for_the_origin_of_the_distribution_chosen_to_be_the_mode,_as_well_as_for_other_Pearson_distributions:_types_I_through_VII._Elderton_also_included_a_number_of_appendixes,_including_one_appendix_("II")_on_the_beta_and_gamma_functions._In_later_editions,_Elderton_added_equations_for_the_origin_of_the_distribution_chosen_to_be_the_mean,_and_analysis_of_Pearson_distributions_VIII_through_XII. As_remarked_by_Bowman_and_Shenton_"Fisher_and_Pearson_had_a_difference_of_opinion_in_the_approach_to_(parameter)_estimation,_in_particular_relating_to_(Pearson's_method_of)_moments_and_(Fisher's_method_of)_maximum_likelihood_in_the_case_of_the_Beta_distribution."_Also_according_to_Bowman_and_Shenton,_"the_case_of_a_Type_I_(beta_distribution)_model_being_the_center_of_the_controversy_was_pure_serendipity._A_more_difficult_model_of_4_parameters_would_have_been_hard_to_find."_The_long_running_public_conflict_of_Fisher_with_Karl_Pearson_can_be_followed_in_a_number_of_articles_in_prestigious_journals.__For_example,_concerning_the_estimation_of_the_four_parameters_for_the_beta_distribution,_and_Fisher's_criticism_of_Pearson's_method_of_moments_as_being_arbitrary,_see_Pearson's_article_"Method_of_moments_and_method_of_maximum_likelihood"_
_(published_three_years_after_his_retirement_from_University_College,_London,_where_his_position_had_been_divided_between_Fisher_and_Pearson's_son_Egon)_in_which_Pearson_writes_"I_read_(Koshai's_paper_in_the_Journal_of_the_Royal_Statistical_Society,_1933)_which_as_far_as_I_am_aware_is_the_only_case_at_present_published_of_the_application_of_Professor_Fisher's_method._To_my_astonishment_that_method_depends_on_first_working_out_the_constants_of_the_frequency_curve_by_the_(Pearson)_Method_of_Moments_and_then_superposing_on_it,_by_what_Fisher_terms_"the_Method_of_Maximum_Likelihood"_a_further_approximation_to_obtain,_what_he_holds,_he_will_thus_get,_"more_efficient_values"_of_the_curve_constants." David_and_Edwards's_treatise_on_the_history_of_statistics
_cites_the_first_modern_treatment_of_the_beta_distribution,_in_1911,__using_the_beta_designation_that_has_become_standard,_due_to_Corrado_Gini,_an_Italian_statistician,_demography, demographer,_and_sociology, sociologist,_who_developed_the_Gini_coefficient._Norman_Lloyd_Johnson, N.L.Johnson_and_Samuel_Kotz, S.Kotz,_in_their_comprehensive_and_very_informative_monograph__on_leading_historical_personalities_in_statistical_sciences_credit_Corrado_Gini__as_"an_early_Bayesian...who_dealt_with_the_problem_of_eliciting_the_parameters_of_an_initial_Beta_distribution,_by_singling_out_techniques_which_anticipated_the_advent_of_the_so-called_empirical_Bayes_approach."


References


External links


*"Beta Distribution" by Fiona Maclachlan, the Wolfram Demonstrations Project, 2007.
*Beta Distribution – Overview and Example, xycoon.com
*brighton-webs.co.uk
*exstrom.com
*Harvard University Statistics 110 Lecture 23 Beta Distribution, Prof. Joe Blitzstein
{{DEFAULTSORT:Beta Distribution}}
Continuous distributions
Factorial and binomial topics
Conjugate prior distributions
Exponential family distributions


Mean absolute deviation around the mean

The mean absolute deviation around the mean for the beta distribution with shape parameters ''α'' and ''β'' is:
:\operatorname{E}[|X - E[X]|] = \frac{2 \alpha^\alpha \beta^\beta}{\Beta(\alpha,\beta)(\alpha+\beta)^{\alpha+\beta+1}}
The mean absolute deviation around the mean is a more
robust estimator
of statistical dispersion than the standard deviation for beta distributions with tails and inflection points at each side of the mode, Beta(''α'', ''β'') distributions with ''α'',''β'' > 2, as it depends on the linear (absolute) deviations rather than the square deviations from the mean. Therefore, the effect of very large deviations from the mean are not as overly weighted. Using Stirling's approximation to the Gamma function, Norman Lloyd Johnson, N.L.Johnson and Samuel Kotz, S.Kotz derived the following approximation for values of the shape parameters greater than unity (the relative error for this approximation is only −3.5% for ''α'' = ''β'' = 1, and it decreases to zero as ''α'' → ∞, ''β'' → ∞): : \begin \frac &=\frac\\ &\approx \sqrt \left(1+\frac-\frac-\frac \right), \text \alpha, \beta > 1. \end At the limit α → ∞, β → ∞, the ratio of the mean absolute deviation to the standard deviation (for the beta distribution) becomes equal to the ratio of the same measures for the normal distribution: \sqrt. For α = β = 1 this ratio equals \frac, so that from α = β = 1 to α, β → ∞ the ratio decreases by 8.5%. For α = β = 0 the standard deviation is exactly equal to the mean absolute deviation around the mean. Therefore, this ratio decreases by 15% from α = β = 0 to α = β = 1, and by 25% from α = β = 0 to α, β → ∞ . However, for skewed beta distributions such that α → 0 or β → 0, the ratio of the standard deviation to the mean absolute deviation approaches infinity (although each of them, individually, approaches zero) because the mean absolute deviation approaches zero faster than the standard deviation. Using the parametrization in terms of mean μ and sample size ν = α + β > 0: :α = μν, β = (1−μ)ν one can express the mean absolute deviation around the mean in terms of the mean μ and the sample size ν as follows: :\operatorname[, X - E ] = \frac For a symmetric distribution, the mean is at the middle of the distribution, μ = 1/2, and therefore: : \begin \operatorname[, X - E ] = \frac &= \frac \\ \lim_ \left (\lim_ \operatorname[, X - E ] \right ) &= \tfrac\\ \lim_ \left (\lim_ \operatorname[, X - E ] \right ) &= 0 \end Also, the following limits (with only the noted variable approaching the limit) can be obtained from the above expressions: : \begin \lim_ \operatorname[, X - E ] &=\lim_ \operatorname[, X - E ]= 0 \\ \lim_ \operatorname[, X - E ] &=\lim_ \operatorname[, X - E ] = 0\\ \lim_ \operatorname[, X - E ]&=\lim_ \operatorname[, X - E ] = 0\\ \lim_ \operatorname[, X - E ] &= \sqrt \\ \lim_ \operatorname[, X - E ] &= 0 \end
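A numerical cross-check of the closed form for E|X − E[X]| stated at the start of this section (this snippet is an addition to the text; scipy is assumed to be available, and the shape parameters are arbitrary):
<syntaxhighlight lang="python">
# Sketch: verify E|X - E[X]| = 2 alpha^alpha beta^beta / (B(alpha,beta) (alpha+beta)^(alpha+beta+1))
# against numerical integration.
import numpy as np
from scipy import integrate, special, stats

alpha, beta = 2.5, 4.0                   # arbitrary example values
mu = alpha / (alpha + beta)

closed_form = (2 * alpha**alpha * beta**beta
               / (special.beta(alpha, beta) * (alpha + beta)**(alpha + beta + 1)))

numeric, _ = integrate.quad(lambda x: abs(x - mu) * stats.beta.pdf(x, alpha, beta),
                            0, 1, points=[mu])

print(closed_form, numeric)              # the two values should agree to quadrature accuracy
</syntaxhighlight>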


Mean absolute difference

The mean absolute difference for the Beta distribution is: :\mathrm = \int_0^1 \int_0^1 f(x;\alpha,\beta)\,f(y;\alpha,\beta)\,, x-y, \,dx\,dy = \left(\frac\right)\frac The Gini coefficient for the Beta distribution is half of the relative mean absolute difference: :\mathrm = \left(\frac\right)\frac
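Since the closed-form expressions above did not survive rendering intact, the following sketch (an addition to the text; scipy assumed, parameters arbitrary) evaluates the mean absolute difference and the Gini coefficient by direct numerical integration, using only the definitions quoted in this section.
<syntaxhighlight lang="python">
# Sketch: mean absolute difference E|X - Y| and Gini coefficient for Beta(alpha, beta),
# evaluated by double numerical integration.
from scipy import integrate, stats

alpha, beta = 2.0, 3.0                                   # arbitrary example values
pdf = lambda x: stats.beta.pdf(x, alpha, beta)

mad_diff, _ = integrate.dblquad(lambda y, x: pdf(x) * pdf(y) * abs(x - y), 0, 1, 0, 1)
gini = mad_diff / (2 * alpha / (alpha + beta))           # Gini = (mean abs. difference)/(2*mean)

print("mean absolute difference:", mad_diff)
print("Gini coefficient        :", gini)
</syntaxhighlight>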


Skewness

The
skewness
(the third moment centered on the mean, normalized by the 3/2 power of the variance) of the beta distribution is :\gamma_1 =\frac = \frac . Letting α = β in the above expression one obtains γ1 = 0, showing once again that for α = β the distribution is symmetric and hence the skewness is zero. Positive skew (right-tailed) for α < β, negative skew (left-tailed) for α > β. Using the parametrization in terms of mean μ and sample size ν = α + β: : \begin \alpha & = \mu \nu ,\text\nu =(\alpha + \beta) >0\\ \beta & = (1 - \mu) \nu , \text\nu =(\alpha + \beta) >0. \end one can express the skewness in terms of the mean μ and the sample size ν as follows: :\gamma_1 =\frac = \frac. The skewness can also be expressed just in terms of the variance ''var'' and the mean μ as follows: :\gamma_1 =\frac = \frac\text \operatorname < \mu(1-\mu) The accompanying plot of skewness as a function of variance and mean shows that maximum variance (1/4) is coupled with zero skewness and the symmetry condition (μ = 1/2), and that maximum skewness (positive or negative infinity) occurs when the mean is located at one end or the other, so that the "mass" of the probability distribution is concentrated at the ends (minimum variance). The following expression for the square of the skewness, in terms of the sample size ν = α + β and the variance ''var'', is useful for the method of moments estimation of four parameters: :(\gamma_1)^2 =\frac = \frac\bigg(\frac-4(1+\nu)\bigg) This expression correctly gives a skewness of zero for α = β, since in that case (see ): \operatorname = \frac. For the symmetric case (α = β), skewness = 0 over the whole range, and the following limits apply: :\lim_ \gamma_1 = \lim_ \gamma_1 =\lim_ \gamma_1=\lim_ \gamma_1=\lim_ \gamma_1 = 0 For the asymmetric cases (α ≠ β) the following limits (with only the noted variable approaching the limit) can be obtained from the above expressions: : \begin &\lim_ \gamma_1 =\lim_ \gamma_1 = \infty\\ &\lim_ \gamma_1 = \lim_ \gamma_1= - \infty\\ &\lim_ \gamma_1 = -\frac,\quad \lim_(\lim_ \gamma_1) = -\infty,\quad \lim_(\lim_ \gamma_1) = 0\\ &\lim_ \gamma_1 = \frac,\quad \lim_(\lim_ \gamma_1) = \infty,\quad \lim_(\lim_ \gamma_1) = 0\\ &\lim_ \gamma_1 = \frac,\quad \lim_(\lim_ \gamma_1) = \infty,\quad \lim_(\lim_ \gamma_1) = - \infty \end
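Since the displayed skewness expressions above lost their arguments in rendering, the following sketch (an addition to the text) states the standard closed form for the skewness of Beta(α, β) and checks it against scipy's built-in moments; the parameter values are arbitrary.
<syntaxhighlight lang="python">
# Sketch: skewness of Beta(alpha, beta),
#   gamma_1 = 2*(beta-alpha)*sqrt(alpha+beta+1) / ((alpha+beta+2)*sqrt(alpha*beta)),
# checked against scipy.
import math
from scipy import stats

def beta_skewness(alpha, beta):
    return (2 * (beta - alpha) * math.sqrt(alpha + beta + 1)
            / ((alpha + beta + 2) * math.sqrt(alpha * beta)))

for alpha, beta in [(2, 2), (2, 5), (0.5, 0.5), (5, 1.5)]:
    closed = beta_skewness(alpha, beta)
    scipy_skew = stats.beta.stats(alpha, beta, moments='s')
    print(alpha, beta, closed, float(scipy_skew))
</syntaxhighlight>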


Kurtosis

The beta distribution has been applied in acoustic analysis to assess damage to gears, as the kurtosis of the beta distribution has been reported to be a good indicator of the condition of a gear. Kurtosis has also been used to distinguish the seismic signal generated by a person's footsteps from other signals. As persons or other targets moving on the ground generate continuous signals in the form of seismic waves, one can separate different targets based on the seismic waves they generate. Kurtosis is sensitive to impulsive signals, so it's much more sensitive to the signal generated by human footsteps than other signals generated by vehicles, winds, noise, etc. Unfortunately, the notation for kurtosis has not been standardized. Kenney and Keeping use the symbol γ2 for the
excess kurtosis
, but Abramowitz and Stegun use different terminology. To prevent confusion between kurtosis (the fourth moment centered on the mean, normalized by the square of the variance) and excess kurtosis, when using symbols, they will be spelled out as follows: :\begin \text &=\text - 3\\ &=\frac-3\\ &=\frac\\ &=\frac . \end Letting α = β in the above expression one obtains :\text =- \frac \text\alpha=\beta . Therefore, for symmetric beta distributions, the excess kurtosis is negative, increasing from a minimum value of −2 at the limit as → 0, and approaching a maximum value of zero as → ∞. The value of −2 is the minimum value of excess kurtosis that any distribution (not just beta distributions, but any distribution of any possible kind) can ever achieve. This minimum value is reached when all the probability density is entirely concentrated at each end ''x'' = 0 and ''x'' = 1, with nothing in between: a 2-point
Bernoulli distribution
with equal probability 1/2 at each end (a coin toss: see section below "Kurtosis bounded by the square of the skewness" for further discussion). The description of kurtosis as a measure of the "potential outliers" (or "potential rare, extreme values") of the probability distribution, is correct for all distributions including the beta distribution. When rare, extreme values can occur in the beta distribution, the higher its kurtosis; otherwise, the kurtosis is lower. For α ≠ β, skewed beta distributions, the excess kurtosis can reach unlimited positive values (particularly for α → 0 for finite β, or for β → 0 for finite α) because the side away from the mode will produce occasional extreme values. Minimum kurtosis takes place when the mass density is concentrated equally at each end (and therefore the mean is at the center), and there is no probability mass density in between the ends. Using the parametrization in terms of mean μ and sample size ν = α + β: : \begin \alpha & = \mu \nu ,\text\nu =(\alpha + \beta) >0\\ \beta & = (1 - \mu) \nu , \text\nu =(\alpha + \beta) >0. \end one can express the excess kurtosis in terms of the mean μ and the sample size ν as follows: :\text =\frac\bigg (\frac - 1 \bigg ) The excess kurtosis can also be expressed in terms of just the following two parameters: the variance ''var'', and the sample size ν as follows: :\text =\frac\left(\frac - 6 - 5 \nu \right)\text\text< \mu(1-\mu) and, in terms of the variance ''var'' and the mean μ as follows: :\text =\frac\text\text< \mu(1-\mu) The plot of excess kurtosis as a function of the variance and the mean shows that the minimum value of the excess kurtosis (−2, which is the minimum possible value for excess kurtosis for any distribution) is intimately coupled with the maximum value of variance (1/4) and the symmetry condition: the mean occurring at the midpoint (μ = 1/2). This occurs for the symmetric case of α = β = 0, with zero skewness. At the limit, this is the 2 point
Bernoulli distribution
with equal probability 1/2 at each Dirac delta function end ''x'' = 0 and ''x'' = 1 and zero probability everywhere else. (A coin toss: one face of the coin being ''x'' = 0 and the other face being ''x'' = 1.) Variance is maximum because the distribution is bimodal with nothing in between the two modes (spikes) at each end. Excess kurtosis is minimum: the probability density "mass" is zero at the mean and it is concentrated at the two peaks at each end. Excess kurtosis reaches the minimum possible value (for any distribution) when the probability density function has two spikes at each end: it is bi-"peaky" with nothing in between them. On the other hand, the plot shows that for extreme skewed cases, where the mean is located near one or the other end (μ = 0 or μ = 1), the variance is close to zero, and the excess kurtosis rapidly approaches infinity when the mean of the distribution approaches either end. Alternatively, the excess kurtosis can also be expressed in terms of just the following two parameters: the square of the skewness, and the sample size ν as follows: :\text =\frac\bigg(\frac (\text)^2 - 1\bigg)\text^2-2< \text< \frac (\text)^2 From this last expression, one can obtain the same limits published practically a century ago by Karl Pearson in his paper, for the beta distribution (see section below titled "Kurtosis bounded by the square of the skewness"). Setting α + β= ν = 0 in the above expression, one obtains Pearson's lower boundary (values for the skewness and excess kurtosis below the boundary (excess kurtosis + 2 − skewness2 = 0) cannot occur for any distribution, and hence Karl Pearson appropriately called the region below this boundary the "impossible region"). The limit of α + β = ν → ∞ determines Pearson's upper boundary. : \begin &\lim_\text = (\text)^2 - 2\\ &\lim_\text = \tfrac (\text)^2 \end therefore: :(\text)^2-2< \text< \tfrac (\text)^2 Values of ν = α + β such that ν ranges from zero to infinity, 0 < ν < ∞, span the whole region of the beta distribution in the plane of excess kurtosis versus squared skewness. For the symmetric case (α = β), the following limits apply: : \begin &\lim_ \text = - 2 \\ &\lim_ \text = 0 \\ &\lim_ \text = - \frac \end For the unsymmetric cases (α ≠ β) the following limits (with only the noted variable approaching the limit) can be obtained from the above expressions: : \begin &\lim_\text =\lim_ \text = \lim_\text = \lim_\text =\infty\\ &\lim_\text = \frac,\text \lim_(\lim_ \text) = \infty,\text \lim_(\lim_ \text) = 0\\ &\lim_\text = \frac,\text \lim_(\lim_ \text) = \infty,\text \lim_(\lim_ \text) = 0\\ &\lim_ \text = - 6 + \frac,\text \lim_(\lim_ \text) = \infty,\text \lim_(\lim_ \text) = \infty \end
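As with the skewness, the excess-kurtosis expressions above lost their arguments in rendering; the sketch below (an addition to the text) uses the standard closed form for the excess kurtosis of Beta(α, β) with scipy as a cross-check, for arbitrary parameter values.
<syntaxhighlight lang="python">
# Sketch: excess kurtosis of Beta(alpha, beta),
#   6*((alpha-beta)^2*(alpha+beta+1) - alpha*beta*(alpha+beta+2))
#     / (alpha*beta*(alpha+beta+2)*(alpha+beta+3)),
# checked against scipy.
from scipy import stats

def beta_excess_kurtosis(alpha, beta):
    num = 6 * ((alpha - beta)**2 * (alpha + beta + 1) - alpha * beta * (alpha + beta + 2))
    den = alpha * beta * (alpha + beta + 2) * (alpha + beta + 3)
    return num / den

for alpha, beta in [(1, 1), (4, 4), (0.5, 3), (10, 2)]:
    print(alpha, beta, beta_excess_kurtosis(alpha, beta),
          float(stats.beta.stats(alpha, beta, moments='k')))
</syntaxhighlight>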


Characteristic function

The Characteristic function (probability theory), characteristic function is the Fourier transform of the probability density function. The characteristic function of the beta distribution is confluent hypergeometric function, Kummer's confluent hypergeometric function (of the first kind): :\begin \varphi_X(\alpha;\beta;t) &= \operatorname\left[e^\right]\\ &= \int_0^1 e^ f(x;\alpha,\beta) dx \\ &=_1F_1(\alpha; \alpha+\beta; it)\!\\ &=\sum_^\infty \frac \\ &= 1 +\sum_^ \left( \prod_^ \frac \right) \frac \end where : x^=x(x+1)(x+2)\cdots(x+n-1) is the rising factorial, also called the "Pochhammer symbol". The value of the characteristic function for ''t'' = 0, is one: : \varphi_X(\alpha;\beta;0)=_1F_1(\alpha; \alpha+\beta; 0) = 1 . Also, the real and imaginary parts of the characteristic function enjoy the following symmetries with respect to the origin of variable ''t'': : \textrm \left [ _1F_1(\alpha; \alpha+\beta; it) \right ] = \textrm \left [ _1F_1(\alpha; \alpha+\beta; - it) \right ] : \textrm \left [ _1F_1(\alpha; \alpha+\beta; it) \right ] = - \textrm \left [ _1F_1(\alpha; \alpha+\beta; - it) \right ] The symmetric case α = β simplifies the characteristic function of the beta distribution to a Bessel function, since in the special case α + β = 2α the confluent hypergeometric function (of the first kind) reduces to a Bessel function (the modified Bessel function of the first kind I_ ) using Ernst Kummer, Kummer's second transformation as follows: Another example of the symmetric case α = β = n/2 for beamforming applications can be found in Figure 11 of :\begin _1F_1(\alpha;2\alpha; it) &= e^ _0F_1 \left(; \alpha+\tfrac; \frac \right) \\ &= e^ \left(\frac\right)^ \Gamma\left(\alpha+\tfrac\right) I_\left(\frac\right).\end In the accompanying plots, the Complex number, real part (Re) of the Characteristic function (probability theory), characteristic function of the beta distribution is displayed for symmetric (α = β) and skewed (α ≠ β) cases.
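As an illustrative numerical check (an addition to the text), the characteristic function can be evaluated as Kummer's confluent hypergeometric function ₁F₁(α; α+β; it) and compared with direct numerical integration of E[e^{itX}]; mpmath is assumed to be available, and the parameter values are arbitrary.
<syntaxhighlight lang="python">
# Sketch: characteristic function of Beta(alpha, beta) as Kummer's 1F1(alpha; alpha+beta; i*t),
# cross-checked by direct numerical integration.
import mpmath as mp

alpha, beta, t = mp.mpf(2), mp.mpf(5), mp.mpf(3)      # arbitrary example values

phi_kummer = mp.hyp1f1(alpha, alpha + beta, 1j * t)

integrand = lambda x: (mp.exp(1j * t * x) * x**(alpha - 1) * (1 - x)**(beta - 1)
                       / mp.beta(alpha, beta))
phi_direct = mp.quad(integrand, [0, 1])

print(phi_kummer)
print(phi_direct)      # should agree with phi_kummer to high precision
</syntaxhighlight>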


Other moments


Moment generating function

It also follows that the moment generating function is :\begin M_X(\alpha; \beta; t) &= \operatorname\left[e^\right] \\ pt&= \int_0^1 e^ f(x;\alpha,\beta)\,dx \\ pt&= _1F_1(\alpha; \alpha+\beta; t) \\ pt&= \sum_^\infty \frac \frac \\ pt&= 1 +\sum_^ \left( \prod_^ \frac \right) \frac \end In particular ''M''''X''(''α''; ''β''; 0) = 1.
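A quick check of this expression (an addition to the text; scipy assumed, parameters arbitrary): the moment generating function equals ₁F₁(α; α+β; t), which can be compared with direct numerical integration of E[e^{tX}].
<syntaxhighlight lang="python">
# Sketch: MGF of Beta(alpha, beta) equals Kummer's 1F1(alpha; alpha+beta; t),
# compared against numerical integration.
import math
from scipy import integrate, special, stats

alpha, beta, t = 2.0, 5.0, 1.3                        # arbitrary example values

mgf_kummer = special.hyp1f1(alpha, alpha + beta, t)   # 1F1(alpha; alpha+beta; t)
mgf_direct, _ = integrate.quad(lambda x: math.exp(t * x) * stats.beta.pdf(x, alpha, beta), 0, 1)

print(mgf_kummer, mgf_direct)                         # at t = 0 both equal exactly 1
</syntaxhighlight>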


Higher moments

Using the moment generating function, the ''k''-th raw moment is given by the factor
:\prod_{r=0}^{k-1} \frac{\alpha+r}{\alpha+\beta+r}
multiplying the (exponential series) term \left(\frac{t^k}{k!}\right) in the series of the moment generating function
:\operatorname{E}[X^k]= \frac{\alpha^{(k)}}{(\alpha + \beta)^{(k)}} = \prod_{r=0}^{k-1} \frac{\alpha+r}{\alpha+\beta+r}
where (''x'')(''k'') is a Pochhammer symbol representing rising factorial. It can also be written in a recursive form as
:\operatorname{E}[X^k] = \frac{\alpha + k - 1}{\alpha + \beta + k - 1}\operatorname{E}[X^{k-1}].
Since the moment generating function M_X(\alpha; \beta; \cdot) has a positive radius of convergence, the beta distribution is determined by its moments (see Moment problem).
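The recursion above can be checked directly against scipy's non-central moments (this snippet is an addition to the text; the parameter values are arbitrary):
<syntaxhighlight lang="python">
# Sketch: k-th raw moment of Beta(alpha, beta) via the recursion
#   E[X^k] = (alpha + k - 1)/(alpha + beta + k - 1) * E[X^(k-1)],  E[X^0] = 1,
# compared with scipy's non-central moments.
from scipy import stats

alpha, beta = 2.5, 7.0          # arbitrary example values

moment = 1.0
for k in range(1, 6):
    moment *= (alpha + k - 1) / (alpha + beta + k - 1)
    print(k, moment, stats.beta.moment(k, alpha, beta))
</syntaxhighlight>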


Moments of transformed random variables


=Moments of linearly transformed, product and inverted random variables

= One can also show the following expectations for a transformed random variable, where the random variable ''X'' is Beta-distributed with parameters α and β: ''X'' ~ Beta(α, β). The expected value of the variable 1 − ''X'' is the mirror-symmetry of the expected value based on ''X'': :\begin & \operatorname[1-X] = \frac \\ & \operatorname[X (1-X)] =\operatorname[(1-X)X ] =\frac \end Due to the mirror-symmetry of the probability density function of the beta distribution, the variances based on variables ''X'' and 1 − ''X'' are identical, and the covariance on ''X''(1 − ''X'' is the negative of the variance: :\operatorname[(1-X)]=\operatorname[X] = -\operatorname[X,(1-X)]= \frac These are the expected values for inverted variables, (these are related to the harmonic means, see ): :\begin & \operatorname \left [\frac \right ] = \frac \text \alpha > 1\\ & \operatorname\left [\frac \right ] =\frac \text \beta > 1 \end The following transformation by dividing the variable ''X'' by its mirror-image ''X''/(1 − ''X'') results in the expected value of the "inverted beta distribution" or
beta prime distribution
(also known as beta distribution of the second kind or Pearson distribution, Pearson's Type VI): : \begin & \operatorname\left[\frac\right] =\frac \text\beta > 1\\ & \operatorname\left[\frac\right] =\frac\text\alpha > 1 \end Variances of these transformed variables can be obtained by integration, as the expected values of the second moments centered on the corresponding variables: :\operatorname \left[\frac \right] =\operatorname\left[\left(\frac - \operatorname\left[\frac \right ] \right )^2\right]= :\operatorname\left [\frac \right ] =\operatorname \left [\left (\frac - \operatorname\left [\frac \right ] \right )^2 \right ]= \frac \text\alpha > 2 The following variance of the variable ''X'' divided by its mirror-image (''X''/(1−''X'') results in the variance of the "inverted beta distribution" or
beta prime distribution
(also known as beta distribution of the second kind or Pearson distribution, Pearson's Type VI): :\operatorname \left [\frac \right ] =\operatorname \left [\left(\frac - \operatorname \left [\frac \right ] \right)^2 \right ]=\operatorname \left [\frac \right ] = :\operatorname \left [\left (\frac - \operatorname \left [\frac \right ] \right )^2 \right ]= \frac \text\beta > 2 The covariances are: :\operatorname\left [\frac,\frac \right ] = \operatorname\left[\frac,\frac \right] =\operatorname\left[\frac,\frac\right ] = \operatorname\left[\frac,\frac \right] =\frac \text \alpha, \beta > 1 These expectations and variances appear in the four-parameter Fisher information matrix (.)


=Moments of logarithmically transformed random variables

= Expected values for Logarithm transformation, logarithmic transformations (useful for maximum likelihood estimates, see ) are discussed in this section. The following logarithmic linear transformations are related to the geometric means ''GX'' and ''G''(1−''X'') (see ): :\begin \operatorname[\ln(X)] &= \psi(\alpha) - \psi(\alpha + \beta)= - \operatorname\left[\ln \left (\frac \right )\right],\\ \operatorname[\ln(1-X)] &=\psi(\beta) - \psi(\alpha + \beta)= - \operatorname \left[\ln \left (\frac \right )\right]. \end Where the
digamma function
ψ(α) is defined as the logarithmic derivative of the
gamma function
: :\psi(\alpha) = \frac Logit transformations are interesting, as they usually transform various shapes (including J-shapes) into (usually skewed) bell-shaped densities over the logit variable, and they may remove the end singularities over the original variable: :\begin \operatorname\left[\ln \left (\frac \right ) \right] &=\psi(\alpha) - \psi(\beta)= \operatorname[\ln(X)] +\operatorname \left[\ln \left (\frac \right) \right],\\ \operatorname\left [\ln \left (\frac \right ) \right ] &=\psi(\beta) - \psi(\alpha)= - \operatorname \left[\ln \left (\frac \right) \right] . \end Johnson considered the distribution of the logit - transformed variable ln(''X''/1−''X''), including its moment generating function and approximations for large values of the shape parameters. This transformation extends the finite support
[0, 1]
based on the original variable ''X'' to infinite support in both directions of the real line (−∞, +∞). Higher order logarithmic moments can be derived by using the representation of a beta distribution as a proportion of two Gamma distributions and differentiating through the integral. They can be expressed in terms of higher order poly-gamma functions as follows: :\begin \operatorname \left [\ln^2(X) \right ] &= (\psi(\alpha) - \psi(\alpha + \beta))^2+\psi_1(\alpha)-\psi_1(\alpha+\beta), \\ \operatorname \left [\ln^2(1-X) \right ] &= (\psi(\beta) - \psi(\alpha + \beta))^2+\psi_1(\beta)-\psi_1(\alpha+\beta), \\ \operatorname \left [\ln (X)\ln(1-X) \right ] &=(\psi(\alpha) - \psi(\alpha + \beta))(\psi(\beta) - \psi(\alpha + \beta)) -\psi_1(\alpha+\beta). \end therefore the
variance
of the logarithmic variables and
covariance
of ln(''X'') and ln(1−''X'') are: :\begin \operatorname[\ln(X), \ln(1-X)] &= \operatorname\left[\ln(X)\ln(1-X)\right] - \operatorname[\ln(X)]\operatorname[\ln(1-X)] = -\psi_1(\alpha+\beta) \\ & \\ \operatorname[\ln X] &= \operatorname[\ln^2(X)] - (\operatorname[\ln(X)])^2 \\ &= \psi_1(\alpha) - \psi_1(\alpha + \beta) \\ &= \psi_1(\alpha) + \operatorname[\ln(X), \ln(1-X)] \\ & \\ \operatorname ln (1-X)&= \operatorname[\ln^2 (1-X)] - (\operatorname[\ln (1-X)])^2 \\ &= \psi_1(\beta) - \psi_1(\alpha + \beta) \\ &= \psi_1(\beta) + \operatorname[\ln (X), \ln(1-X)] \end where the
trigamma function
, denoted ψ1(α), is the second of the
polygamma function
s, and is defined as the derivative of the digamma function: :\psi_1(\alpha) = \frac= \frac. The variances and covariance of the logarithmically transformed variables ''X'' and (1−''X'') are different, in general, because the logarithmic transformation destroys the mirror-symmetry of the original variables ''X'' and (1−''X''), as the logarithm approaches negative infinity for the variable approaching zero. These logarithmic variances and covariance are the elements of the
Fisher information
matrix for the beta distribution. They are also a measure of the curvature of the log likelihood function (see section on Maximum likelihood estimation). The variances of the log inverse variables are identical to the variances of the log variables: :\begin \operatorname\left[\ln \left (\frac \right ) \right] & =\operatorname[\ln(X)] = \psi_1(\alpha) - \psi_1(\alpha + \beta), \\ \operatorname\left[\ln \left (\frac \right ) \right] &=\operatorname ln (1-X)= \psi_1(\beta) - \psi_1(\alpha + \beta), \\ \operatorname\left[\ln \left (\frac \right), \ln \left (\frac\right ) \right] &=\operatorname[\ln(X),\ln(1-X)]= -\psi_1(\alpha + \beta).\end It also follows that the variances of the logit transformed variables are: :\operatorname\left[\ln \left (\frac \right )\right]=\operatorname\left[\ln \left (\frac \right ) \right]=-\operatorname\left [\ln \left (\frac \right ), \ln \left (\frac \right ) \right]= \psi_1(\alpha) + \psi_1(\beta)
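A Monte Carlo sketch (an addition to the text) checking the trigamma-based expressions for var[ln ''X''], var[ln(1−''X'')] and their covariance; numpy/scipy are assumed, and the sample size, seed and parameters are arbitrary.
<syntaxhighlight lang="python">
# Sketch: variances/covariance of ln(X) and ln(1-X) for X ~ Beta(alpha, beta):
#   var[ln X]          = psi_1(alpha) - psi_1(alpha+beta)
#   var[ln (1-X)]      = psi_1(beta)  - psi_1(alpha+beta)
#   cov[ln X, ln(1-X)] = -psi_1(alpha+beta)
# checked by Monte Carlo.
import numpy as np
from scipy.special import polygamma

alpha, beta = 3.0, 1.5                    # arbitrary example values
rng = np.random.default_rng(7)
x = rng.beta(alpha, beta, size=500_000)

lx, l1mx = np.log(x), np.log1p(-x)
cov = np.cov(lx, l1mx)

trigamma = lambda z: polygamma(1, z)
print("var[ln X]    :", cov[0, 0], trigamma(alpha) - trigamma(alpha + beta))
print("var[ln (1-X)]:", cov[1, 1], trigamma(beta) - trigamma(alpha + beta))
print("cov          :", cov[0, 1], -trigamma(alpha + beta))
</syntaxhighlight>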


Quantities of information (entropy)

Given a beta distributed random variable, ''X'' ~ Beta(''α'', ''β''), the information entropy, differential entropy of ''X'' is (measured in Nat (unit), nats), the expected value of the negative of the logarithm of the
probability density function
: :\begin h(X) &= \operatorname[-\ln(f(x;\alpha,\beta))] \\ pt&=\int_0^1 -f(x;\alpha,\beta)\ln(f(x;\alpha,\beta)) \, dx \\ pt&= \ln(\Beta(\alpha,\beta))-(\alpha-1)\psi(\alpha)-(\beta-1)\psi(\beta)+(\alpha+\beta-2) \psi(\alpha+\beta) \end where ''f''(''x''; ''α'', ''β'') is the
probability density function
of the beta distribution: :f(x;\alpha,\beta) = \frac x^(1-x)^ The
digamma function
''ψ'' appears in the formula for the differential entropy as a consequence of Euler's integral formula for the harmonic numbers which follows from the integral: :\int_0^1 \frac \, dx = \psi(\alpha)-\psi(1) The information entropy, differential entropy of the beta distribution is negative for all values of ''α'' and ''β'' greater than zero, except at ''α'' = ''β'' = 1 (for which values the beta distribution is the same as the Uniform distribution (continuous), uniform distribution), where the information entropy, differential entropy reaches its Maxima and minima, maximum value of zero. It is to be expected that the maximum entropy should take place when the beta distribution becomes equal to the uniform distribution, since uncertainty is maximal when all possible events are equiprobable. For ''α'' or ''β'' approaching zero, the information entropy, differential entropy approaches its Maxima and minima, minimum value of negative infinity. For (either or both) ''α'' or ''β'' approaching zero, there is a maximum amount of order: all the probability density is concentrated at the ends, and there is zero probability density at points located between the ends. Similarly for (either or both) ''α'' or ''β'' approaching infinity, the differential entropy approaches its minimum value of negative infinity, and a maximum amount of order. If either ''α'' or ''β'' approaches infinity (and the other is finite) all the probability density is concentrated at an end, and the probability density is zero everywhere else. If both shape parameters are equal (the symmetric case), ''α'' = ''β'', and they approach infinity simultaneously, the probability density becomes a spike ( Dirac delta function) concentrated at the middle ''x'' = 1/2, and hence there is 100% probability at the middle ''x'' = 1/2 and zero probability everywhere else. The (continuous case) information entropy, differential entropy was introduced by Shannon in his original paper (where he named it the "entropy of a continuous distribution"), as the concluding part of the same paper where he defined the information entropy, discrete entropy. It is known since then that the differential entropy may differ from the infinitesimal limit of the discrete entropy by an infinite offset, therefore the differential entropy can be negative (as it is for the beta distribution). What really matters is the relative value of entropy. Given two beta distributed random variables, ''X''1 ~ Beta(''α'', ''β'') and ''X''2 ~ Beta(''α''′, ''β''′), the cross entropy is (measured in nats) :\begin H(X_1,X_2) &= \int_0^1 - f(x;\alpha,\beta) \ln (f(x;\alpha',\beta')) \,dx \\ pt&= \ln \left(\Beta(\alpha',\beta')\right)-(\alpha'-1)\psi(\alpha)-(\beta'-1)\psi(\beta)+(\alpha'+\beta'-2)\psi(\alpha+\beta). \end The cross entropy has been used as an error metric to measure the distance between two hypotheses. Its absolute value is minimum when the two distributions are identical. It is the information measure most closely related to the log maximum likelihood (see section on "Parameter estimation. Maximum likelihood estimation")). The relative entropy, or Kullback–Leibler divergence ''D''KL(''X''1 , , ''X''2), is a measure of the inefficiency of assuming that the distribution is ''X''2 ~ Beta(''α''′, ''β''′) when the distribution is really ''X''1 ~ Beta(''α'', ''β''). It is defined as follows (measured in nats). 
:\begin D_(X_1, , X_2) &= \int_0^1 f(x;\alpha,\beta) \ln \left (\frac \right ) \, dx \\ pt&= \left (\int_0^1 f(x;\alpha,\beta) \ln (f(x;\alpha,\beta)) \,dx \right )- \left (\int_0^1 f(x;\alpha,\beta) \ln (f(x;\alpha',\beta')) \, dx \right )\\ pt&= -h(X_1) + H(X_1,X_2)\\ pt&= \ln\left(\frac\right)+(\alpha-\alpha')\psi(\alpha)+(\beta-\beta')\psi(\beta)+(\alpha'-\alpha+\beta'-\beta)\psi (\alpha + \beta). \end The relative entropy, or Kullback–Leibler divergence, is always non-negative. A few numerical examples follow: *''X''1 ~ Beta(1, 1) and ''X''2 ~ Beta(3, 3); ''D''KL(''X''1 , , ''X''2) = 0.598803; ''D''KL(''X''2 , , ''X''1) = 0.267864; ''h''(''X''1) = 0; ''h''(''X''2) = −0.267864 *''X''1 ~ Beta(3, 0.5) and ''X''2 ~ Beta(0.5, 3); ''D''KL(''X''1 , , ''X''2) = 7.21574; ''D''KL(''X''2 , , ''X''1) = 7.21574; ''h''(''X''1) = −1.10805; ''h''(''X''2) = −1.10805. The Kullback–Leibler divergence is not symmetric ''D''KL(''X''1 , , ''X''2) ≠ ''D''KL(''X''2 , , ''X''1) for the case in which the individual beta distributions Beta(1, 1) and Beta(3, 3) are symmetric, but have different entropies ''h''(''X''1) ≠ ''h''(''X''2). The value of the Kullback divergence depends on the direction traveled: whether going from a higher (differential) entropy to a lower (differential) entropy or the other way around. In the numerical example above, the Kullback divergence measures the inefficiency of assuming that the distribution is (bell-shaped) Beta(3, 3), rather than (uniform) Beta(1, 1). The "h" entropy of Beta(1, 1) is higher than the "h" entropy of Beta(3, 3) because the uniform distribution Beta(1, 1) has a maximum amount of disorder. The Kullback divergence is more than two times higher (0.598803 instead of 0.267864) when measured in the direction of decreasing entropy: the direction that assumes that the (uniform) Beta(1, 1) distribution is (bell-shaped) Beta(3, 3) rather than the other way around. In this restricted sense, the Kullback divergence is consistent with the second law of thermodynamics. The Kullback–Leibler divergence is symmetric ''D''KL(''X''1 , , ''X''2) = ''D''KL(''X''2 , , ''X''1) for the skewed cases Beta(3, 0.5) and Beta(0.5, 3) that have equal differential entropy ''h''(''X''1) = ''h''(''X''2). The symmetry condition: :D_(X_1, , X_2) = D_(X_2, , X_1),\texth(X_1) = h(X_2),\text\alpha \neq \beta follows from the above definitions and the mirror-symmetry ''f''(''x''; ''α'', ''β'') = ''f''(1−''x''; ''α'', ''β'') enjoyed by the beta distribution.
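The differential entropy and Kullback–Leibler divergence formulas above can be evaluated directly with scipy's special functions; the sketch below (an addition to the text) also reproduces the first numerical example quoted above, Beta(1, 1) versus Beta(3, 3).
<syntaxhighlight lang="python">
# Sketch: differential entropy h(X) and KL divergence D_KL(X1 || X2) for beta distributions,
# using the closed forms quoted in the text (digamma = psi, betaln = ln B).
from scipy.special import betaln, digamma
from scipy import stats

def beta_entropy(a, b):
    return (betaln(a, b) - (a - 1) * digamma(a) - (b - 1) * digamma(b)
            + (a + b - 2) * digamma(a + b))

def beta_kl(a1, b1, a2, b2):
    """D_KL(Beta(a1,b1) || Beta(a2,b2))."""
    return (betaln(a2, b2) - betaln(a1, b1)
            + (a1 - a2) * digamma(a1) + (b1 - b2) * digamma(b1)
            + (a2 - a1 + b2 - b1) * digamma(a1 + b1))

print(beta_entropy(3, 3), stats.beta(3, 3).entropy())   # closed form vs scipy
print(beta_kl(1, 1, 3, 3))                              # ~0.5988, as quoted in the text
print(beta_kl(3, 3, 1, 1))                              # ~0.2679
</syntaxhighlight>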


Relationships between statistical measures


Mean, mode and median relationship

If 1 < α < β then mode ≤ median ≤ mean.Kerman J (2011) "A closed-form approximation for the median of the beta distribution". Expressing the mode (only for α, β > 1), and the mean in terms of α and β:
: \frac{\alpha - 1}{\alpha + \beta - 2} \le \text{median} \le \frac{\alpha}{\alpha + \beta},
If 1 < β < α then the order of the inequalities is reversed. For α, β > 1 the absolute distance between the mean and the median is less than 5% of the distance between the maximum and minimum values of ''x''. On the other hand, the absolute distance between the mean and the mode can reach 50% of the distance between the maximum and minimum values of ''x'', for the (pathological) case of α = 1 and β = 1, for which values the beta distribution approaches the uniform distribution and the differential entropy approaches its maximum value, and hence maximum "disorder".
For example, for α = 1.0001 and β = 1.00000001:
* mode = 0.9999; PDF(mode) = 1.00010
* mean = 0.500025; PDF(mean) = 1.00003
* median = 0.500035; PDF(median) = 1.00003
* mean − mode = −0.499875
* mean − median = −9.65538 × 10−6
where PDF stands for the value of the probability density function.
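A quick numerical illustration of the ordering mode ≤ median ≤ mean for 1 < α < β, and its reversal for 1 < β < α (an addition to the text; scipy assumed, parameter choices arbitrary):
<syntaxhighlight lang="python">
# Sketch: verify mode <= median <= mean for 1 < alpha < beta (and the reverse ordering otherwise).
from scipy import stats

for alpha, beta in [(1.5, 4.0), (2.0, 2.5), (4.0, 1.5)]:
    mode = (alpha - 1) / (alpha + beta - 2)
    median = stats.beta.median(alpha, beta)
    mean = alpha / (alpha + beta)
    print(alpha, beta, mode, median, mean)
</syntaxhighlight>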


Mean, geometric mean and harmonic mean relationship

It is known from the inequality of arithmetic and geometric means that the geometric mean is lower than the mean. Similarly, the harmonic mean is lower than the geometric mean. The accompanying plot shows that for α = β, both the mean and the median are exactly equal to 1/2, regardless of the value of α = β, and the mode is also equal to 1/2 for α = β > 1, however the geometric and harmonic means are lower than 1/2 and they only approach this value asymptotically as α = β → ∞.


Kurtosis bounded by the square of the skewness

As remarked by William Feller, Feller, in the Pearson distribution, Pearson system the beta probability density appears as Pearson distribution, type I (any difference between the beta distribution and Pearson's type I distribution is only superficial and it makes no difference for the following discussion regarding the relationship between kurtosis and skewness). Karl Pearson showed, in Plate 1 of his paper published in 1916, a graph with the kurtosis as the vertical axis (ordinate) and the square of the
skewness
as the horizontal axis (abscissa), in which a number of distributions were displayed. The region occupied by the beta distribution is bounded by the following two Line (geometry), lines in the (skewness2,kurtosis) Cartesian coordinate system, plane, or the (skewness2,excess kurtosis) Cartesian coordinate system, plane: :(\text)^2+1< \text< \frac (\text)^2 + 3 or, equivalently, :(\text)^2-2< \text< \frac (\text)^2 At a time when there were no powerful digital computers, Karl Pearson accurately computed further boundaries, for example, separating the "U-shaped" from the "J-shaped" distributions. The lower boundary line (excess kurtosis + 2 − skewness2 = 0) is produced by skewed "U-shaped" beta distributions with both values of shape parameters α and β close to zero. The upper boundary line (excess kurtosis − (3/2) skewness2 = 0) is produced by extremely skewed distributions with very large values of one of the parameters and very small values of the other parameter. Karl Pearson showed that this upper boundary line (excess kurtosis − (3/2) skewness2 = 0) is also the intersection with Pearson's distribution III, which has unlimited support in one direction (towards positive infinity), and can be bell-shaped or J-shaped. His son, Egon Pearson, showed that the region (in the kurtosis/squared-skewness plane) occupied by the beta distribution (equivalently, Pearson's distribution I) as it approaches this boundary (excess kurtosis − (3/2) skewness2 = 0) is shared with the noncentral chi-squared distribution. Karl Pearson (Pearson 1895, pp. 357, 360, 373–376) also showed that the gamma distribution is a Pearson type III distribution. Hence this boundary line for Pearson's type III distribution is known as the gamma line. (This can be shown from the fact that the excess kurtosis of the gamma distribution is 6/''k'' and the square of the skewness is 4/''k'', hence (excess kurtosis − (3/2) skewness2 = 0) is identically satisfied by the gamma distribution regardless of the value of the parameter "k"). Pearson later noted that the chi-squared distribution is a special case of Pearson's type III and also shares this boundary line (as it is apparent from the fact that for the chi-squared distribution the excess kurtosis is 12/''k'' and the square of the skewness is 8/''k'', hence (excess kurtosis − (3/2) skewness2 = 0) is identically satisfied regardless of the value of the parameter "k"). This is to be expected, since the chi-squared distribution ''X'' ~ χ2(''k'') is a special case of the gamma distribution, with parametrization X ~ Γ(k/2, 1/2) where k is a positive integer that specifies the "number of degrees of freedom" of the chi-squared distribution. An example of a beta distribution near the upper boundary (excess kurtosis − (3/2) skewness2 = 0) is given by α = 0.1, β = 1000, for which the ratio (excess kurtosis)/(skewness2) = 1.49835 approaches the upper limit of 1.5 from below. An example of a beta distribution near the lower boundary (excess kurtosis + 2 − skewness2 = 0) is given by α= 0.0001, β = 0.1, for which values the expression (excess kurtosis + 2)/(skewness2) = 1.01621 approaches the lower limit of 1 from above. In the infinitesimal limit for both α and β approaching zero symmetrically, the excess kurtosis reaches its minimum value at −2. This minimum value occurs at the point at which the lower boundary line intersects the vertical axis (ordinate). 
(However, in Pearson's original chart, the ordinate is kurtosis, instead of excess kurtosis, and it increases downwards rather than upwards). Values for the skewness and excess kurtosis below the lower boundary (excess kurtosis + 2 − skewness2 = 0) cannot occur for any distribution, and hence Karl Pearson appropriately called the region below this boundary the "impossible region". The boundary for this "impossible region" is determined by (symmetric or skewed) bimodal "U"-shaped distributions for which the parameters α and β approach zero and hence all the probability density is concentrated at the ends: ''x'' = 0, 1 with practically nothing in between them. Since for α ≈ β ≈ 0 the probability density is concentrated at the two ends ''x'' = 0 and ''x'' = 1, this "impossible boundary" is determined by a
Bernoulli distribution
, where the two only possible outcomes occur with respective probabilities ''p'' and ''q'' = 1−''p''. For cases approaching this limit boundary with symmetry α = β, skewness ≈ 0, excess kurtosis ≈ −2 (this is the lowest excess kurtosis possible for any distribution), and the probabilities are ''p'' ≈ ''q'' ≈ 1/2. For cases approaching this limit boundary with skewness, excess kurtosis ≈ −2 + skewness2, and the probability density is concentrated more at one end than the other end (with practically nothing in between), with probabilities p = \tfrac at the left end ''x'' = 0 and q = 1-p = \tfrac at the right end ''x'' = 1.
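A numerical sweep (an addition to the text) illustrating that the beta family stays strictly between Pearson's two boundary lines, skewness² − 2 < excess kurtosis < (3/2) skewness²; scipy is assumed, and the grid of parameter values is arbitrary.
<syntaxhighlight lang="python">
# Sketch: check skewness^2 - 2 < excess kurtosis < 1.5 * skewness^2 over a grid of (alpha, beta).
import numpy as np
from scipy import stats

for alpha in np.linspace(0.05, 20, 25):
    for beta in np.linspace(0.05, 20, 25):
        s, k = stats.beta.stats(alpha, beta, moments='sk')
        s2 = float(s) ** 2
        assert s2 - 2 < float(k) < 1.5 * s2 + 1e-12, (alpha, beta)
print("all (alpha, beta) pairs on the grid satisfy Pearson's beta-region bounds")
</syntaxhighlight>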


Symmetry

All statements are conditional on α, β > 0 * Probability density function Symmetry, reflection symmetry ::f(x;\alpha,\beta) = f(1-x;\beta,\alpha) * Cumulative distribution function Symmetry, reflection symmetry plus unitary Symmetry, translation ::F(x;\alpha,\beta) = I_x(\alpha,\beta) = 1- F(1- x;\beta,\alpha) = 1 - I_(\beta,\alpha) * Mode Symmetry, reflection symmetry plus unitary Symmetry, translation ::\operatorname(\Beta(\alpha, \beta))= 1-\operatorname(\Beta(\beta, \alpha)),\text\Beta(\beta, \alpha)\ne \Beta(1,1) * Median Symmetry, reflection symmetry plus unitary Symmetry, translation ::\operatorname (\Beta(\alpha, \beta) )= 1 - \operatorname (\Beta(\beta, \alpha)) * Mean Symmetry, reflection symmetry plus unitary Symmetry, translation ::\mu (\Beta(\alpha, \beta) )= 1 - \mu (\Beta(\beta, \alpha) ) * Geometric Means each is individually asymmetric, the following symmetry applies between the geometric mean based on ''X'' and the geometric mean based on its
reflection
(1-X) ::G_X (\Beta(\alpha, \beta) )=G_(\Beta(\beta, \alpha) ) * Harmonic means each is individually asymmetric, the following symmetry applies between the harmonic mean based on ''X'' and the harmonic mean based on its
reflection
(1-X) ::H_X (\Beta(\alpha, \beta) )=H_(\Beta(\beta, \alpha) ) \text \alpha, \beta > 1 . * Variance symmetry ::\operatorname (\Beta(\alpha, \beta) )=\operatorname (\Beta(\beta, \alpha) ) * Geometric variances each is individually asymmetric, the following symmetry applies between the log geometric variance based on X and the log geometric variance based on its
reflection
(1-X) ::\ln(\operatorname (\Beta(\alpha, \beta))) = \ln(\operatorname(\Beta(\beta, \alpha))) * Geometric covariance symmetry ::\ln \operatorname(\Beta(\alpha, \beta))=\ln \operatorname(\Beta(\beta, \alpha)) * Mean absolute deviation around the mean symmetry ::\operatorname[, X - E ] (\Beta(\alpha, \beta))=\operatorname[, X - E ] (\Beta(\beta, \alpha)) * Skewness Symmetry (mathematics), skew-symmetry ::\operatorname (\Beta(\alpha, \beta) )= - \operatorname (\Beta(\beta, \alpha) ) * Excess kurtosis symmetry ::\text (\Beta(\alpha, \beta) )= \text (\Beta(\beta, \alpha) ) * Characteristic function symmetry of Real part (with respect to the origin of variable "t") :: \text [_1F_1(\alpha; \alpha+\beta; it) ] = \text [ _1F_1(\alpha; \alpha+\beta; - it)] * Characteristic function Symmetry (mathematics), skew-symmetry of Imaginary part (with respect to the origin of variable "t") :: \text [_1F_1(\alpha; \alpha+\beta; it) ] = - \text [ _1F_1(\alpha; \alpha+\beta; - it) ] * Characteristic function symmetry of Absolute value (with respect to the origin of variable "t") :: \text [ _1F_1(\alpha; \alpha+\beta; it) ] = \text [ _1F_1(\alpha; \alpha+\beta; - it) ] * Differential entropy symmetry ::h(\Beta(\alpha, \beta) )= h(\Beta(\beta, \alpha) ) * Relative Entropy (also called Kullback–Leibler divergence) symmetry ::D_(X_1, , X_2) = D_(X_2, , X_1), \texth(X_1) = h(X_2)\text\alpha \neq \beta * Fisher information matrix symmetry ::_ = _
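Several of these symmetry relations can be verified numerically in a few lines (a sketch using SciPy; the values α = 2, β = 5 are an arbitrary example):
 import numpy as np
 from scipy.stats import beta

 a, b = 2.0, 5.0
 x = np.linspace(0.01, 0.99, 9)

 # probability density: f(x; a, b) = f(1 - x; b, a)
 assert np.allclose(beta.pdf(x, a, b), beta.pdf(1 - x, b, a))
 # cumulative distribution: F(x; a, b) = 1 - F(1 - x; b, a)
 assert np.allclose(beta.cdf(x, a, b), 1 - beta.cdf(1 - x, b, a))
 # mean and median: reflection plus unit translation
 assert np.isclose(beta.mean(a, b), 1 - beta.mean(b, a))
 assert np.isclose(beta.median(a, b), 1 - beta.median(b, a))
 # variance and excess kurtosis are symmetric, skewness is skew-symmetric
 _, va, sa, ka = beta.stats(a, b, moments='mvsk')
 _, vb, sb, kb = beta.stats(b, a, moments='mvsk')
 assert np.isclose(va, vb) and np.isclose(sa, -sb) and np.isclose(ka, kb)
 print("symmetry relations verified")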


Geometry of the probability density function


Inflection points

For certain values of the shape parameters α and β, the
probability density function
has inflection points, at which the curvature changes sign. The position of these inflection points can be useful as a measure of the Statistical dispersion, dispersion or spread of the distribution. Defining the following quantity: :\kappa =\frac Points of inflection occur, depending on the value of the shape parameters α and β, as follows: *(α > 2, β > 2) The distribution is bell-shaped (symmetric for α = β and skewed otherwise), with two inflection points, equidistant from the mode: ::x = \text \pm \kappa = \frac * (α = 2, β > 2) The distribution is unimodal, positively skewed, right-tailed, with one inflection point, located to the right of the mode: ::x =\text + \kappa = \frac * (α > 2, β = 2) The distribution is unimodal, negatively skewed, left-tailed, with one inflection point, located to the left of the mode: ::x = \text - \kappa = 1 - \frac * (1 < α < 2, β > 2, α+β>2) The distribution is unimodal, positively skewed, right-tailed, with one inflection point, located to the right of the mode: ::x =\text + \kappa = \frac *(0 < α < 1, 1 < β < 2) The distribution has a mode at the left end ''x'' = 0 and it is positively skewed, right-tailed. There is one inflection point, located to the right of the mode: ::x = \frac *(α > 2, 1 < β < 2) The distribution is unimodal negatively skewed, left-tailed, with one inflection point, located to the left of the mode: ::x =\text - \kappa = \frac *(1 < α < 2, 0 < β < 1) The distribution has a mode at the right end ''x''=1 and it is negatively skewed, left-tailed. There is one inflection point, located to the left of the mode: ::x = \frac There are no inflection points in the remaining (symmetric and skewed) regions: U-shaped: (α, β < 1) upside-down-U-shaped: (1 < α < 2, 1 < β < 2), reverse-J-shaped (α < 1, β > 2) or J-shaped: (α > 2, β < 1) The accompanying plots show the inflection point locations (shown vertically, ranging from 0 to 1) versus α and β (the horizontal axes ranging from 0 to 5). There are large cuts at surfaces intersecting the lines α = 1, β = 1, α = 2, and β = 2 because at these values the beta distribution change from 2 modes, to 1 mode to no mode.
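Since the closed-form expressions depend on the (α, β) regime, a simple cross-check is to locate the inflection points numerically as sign changes of the second derivative of the density (a sketch; the parameter values and grid size are arbitrary choices):
 import numpy as np
 from scipy.stats import beta

 def inflection_points(a, b, n=20001):
     # numerically locate sign changes of the second derivative of the Beta(a, b) density
     x = np.linspace(1e-6, 1 - 1e-6, n)
     f = beta.pdf(x, a, b)
     d2 = np.gradient(np.gradient(f, x), x)
     idx = np.where(np.diff(np.sign(d2)) != 0)[0]
     return x[idx]

 print(inflection_points(4, 4))   # bell-shaped: two points, equidistant from the mode 1/2
 print(inflection_points(2, 5))   # alpha = 2, beta > 2: a single point, right of the mode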


Shapes

The beta density function can take a wide variety of different shapes depending on the values of the two parameters ''α'' and ''β''. The ability of the beta distribution to take this great diversity of shapes (using only two parameters) is partly responsible for its wide application in modeling actual measurements:


=Symmetric (''α'' = ''β'')

= * the density function is symmetry, symmetric about 1/2 (blue & teal plots). * median = mean = 1/2. *skewness = 0. *variance = 1/(4(2α + 1)) *α = β < 1 **U-shaped (blue plot). **bimodal: left mode = 0, right mode =1, anti-mode = 1/2 **1/12 < var(''X'') < 1/4 **−2 < excess kurtosis(''X'') < −6/5 ** α = β = 1/2 is the arcsine distribution *** var(''X'') = 1/8 ***excess kurtosis(''X'') = −3/2 ***CF = Rinc (t) ** α = β → 0 is a 2-point
Bernoulli distribution
with equal probability 1/2 at each Dirac delta function end ''x'' = 0 and ''x'' = 1 and zero probability everywhere else. A coin toss: one face of the coin being ''x'' = 0 and the other face being ''x'' = 1. *** \lim_ \operatorname(X) = \tfrac *** \lim_ \operatorname(X) = - 2 a lower value than this is impossible for any distribution to reach. *** The information entropy, differential entropy approaches a Maxima and minima, minimum value of −∞ *α = β = 1 **the uniform distribution (continuous), uniform
[0, 1]
distribution **no mode **var(''X'') = 1/12 **excess kurtosis(''X'') = −6/5 **The (negative anywhere else) information entropy, differential entropy reaches its Maxima and minima, maximum value of zero **CF = Sinc (t) *''α'' = ''β'' > 1 **symmetric unimodal ** mode = 1/2. **0 < var(''X'') < 1/12 **−6/5 < excess kurtosis(''X'') < 0 **''α'' = ''β'' = 3/2 is a semi-elliptic
[0, 1]
distribution, see: Wigner semicircle distribution ***var(''X'') = 1/16. ***excess kurtosis(''X'') = −1 ***CF = 2 Jinc (t) **''α'' = ''β'' = 2 is the parabolic
[0, 1]
distribution ***var(''X'') = 1/20 ***excess kurtosis(''X'') = −6/7 ***CF = 3 Tinc (t) **''α'' = ''β'' > 2 is bell-shaped, with inflection points located to either side of the mode ***0 < var(''X'') < 1/20 ***−6/7 < excess kurtosis(''X'') < 0 **''α'' = ''β'' → ∞ is a 1-point
degenerate distribution
with a Dirac delta function spike at the midpoint ''x'' = 1/2 with probability 1, and zero probability everywhere else. There is 100% probability (absolute certainty) concentrated at the single point ''x'' = 1/2. *** \lim_ \operatorname(X) = 0 *** \lim_ \operatorname(X) = 0 ***The information entropy, differential entropy approaches a Maxima and minima, minimum value of −∞


=Skewed (''α'' ≠ ''β'')

= The density function is Skewness, skewed. An interchange of parameter values yields the mirror image (the reverse) of the initial curve, some more specific cases: *''α'' < 1, ''β'' < 1 ** U-shaped ** Positive skew for α < β, negative skew for α > β. ** bimodal: left mode = 0, right mode = 1, anti-mode = \tfrac ** 0 < median < 1. ** 0 < var(''X'') < 1/4 *α > 1, β > 1 ** unimodal (magenta & cyan plots), **Positive skew for α < β, negative skew for α > β. **\text= \tfrac ** 0 < median < 1 ** 0 < var(''X'') < 1/12 *α < 1, β ≥ 1 **reverse J-shaped with a right tail, **positively skewed, **strictly decreasing, convex function, convex ** mode = 0 ** 0 < median < 1/2. ** 0 < \operatorname(X) < \tfrac, (maximum variance occurs for \alpha=\tfrac, \beta=1, or α = Φ the Golden ratio, golden ratio conjugate) *α ≥ 1, β < 1 **J-shaped with a left tail, **negatively skewed, **strictly increasing, convex function, convex ** mode = 1 ** 1/2 < median < 1 ** 0 < \operatorname(X) < \tfrac, (maximum variance occurs for \alpha=1, \beta=\tfrac, or β = Φ the Golden ratio, golden ratio conjugate) *α = 1, β > 1 **positively skewed, **strictly decreasing (red plot), **a reversed (mirror-image) power function ,1distribution ** mean = 1 / (β + 1) ** median = 1 - 1/21/β ** mode = 0 **α = 1, 1 < β < 2 ***concave function, concave *** 1-\tfrac< \text < \tfrac *** 1/18 < var(''X'') < 1/12. **α = 1, β = 2 ***a straight line with slope −2, the right-triangular distribution with right angle at the left end, at ''x'' = 0 *** \text=1-\tfrac *** var(''X'') = 1/18 **α = 1, β > 2 ***reverse J-shaped with a right tail, ***convex function, convex *** 0 < \text < 1-\tfrac *** 0 < var(''X'') < 1/18 *α > 1, β = 1 **negatively skewed, **strictly increasing (green plot), **the power function
[0, 1]
distribution ** mean = α / (α + 1) ** median = 1/21/α ** mode = 1 **2 > α > 1, β = 1 ***concave function, concave *** \tfrac < \text < \tfrac *** 1/18 < var(''X'') < 1/12 ** α = 2, β = 1 ***a straight line with slope +2, the right-triangular distribution with right angle at the right end, at ''x'' = 1 *** \text=\tfrac *** var(''X'') = 1/18 **α > 2, β = 1 ***J-shaped with a left tail, convex function, convex ***\tfrac < \text < 1 *** 0 < var(''X'') < 1/18


Related distributions


Transformations

* If ''X'' ~ Beta(''α'', ''β'') then 1 − ''X'' ~ Beta(''β'', ''α'') Mirror image, mirror-image symmetry * If ''X'' ~ Beta(''α'', ''β'') then \tfrac{X}{1-X} \sim \beta'(\alpha,\beta). The
beta prime distribution
, also called "beta distribution of the second kind". * If ''X'' ~ Beta(''α'', ''β'') then \tfrac -1 \sim (\beta,\alpha). * If ''X'' ~ Beta(''n''/2, ''m''/2) then \tfrac \sim F(n,m) (assuming ''n'' > 0 and ''m'' > 0), the F-distribution, Fisher–Snedecor F distribution. * If X \sim \operatorname\left(1+\lambda\tfrac, 1 + \lambda\tfrac\right) then min + ''X''(max − min) ~ PERT(min, max, ''m'', ''λ'') where ''PERT'' denotes a PERT distribution used in PERT analysis, and ''m''=most likely value.Herrerías-Velasco, José Manuel and Herrerías-Pleguezuelo, Rafael and René van Dorp, Johan. (2011). Revisiting the PERT mean and Variance. European Journal of Operational Research (210), p. 448–451. Traditionally ''λ'' = 4 in PERT analysis. * If ''X'' ~ Beta(1, ''β'') then ''X'' ~ Kumaraswamy distribution with parameters (1, ''β'') * If ''X'' ~ Beta(''α'', 1) then ''X'' ~ Kumaraswamy distribution with parameters (''α'', 1) * If ''X'' ~ Beta(''α'', 1) then −ln(''X'') ~ Exponential(''α'')


Special and limiting cases

* Beta(1, 1) ~ uniform distribution (continuous), U(0, 1). * Beta(n, 1) ~ Maximum of ''n'' independent rvs. with uniform distribution (continuous), U(0, 1), sometimes called ''a standard power function distribution'' with density ''nx''^(''n''−1) on that interval. * Beta(1, n) ~ Minimum of ''n'' independent rvs. with uniform distribution (continuous), U(0, 1). * If ''X'' ~ Beta(3/2, 3/2) and ''r'' > 0 then 2''rX'' − ''r'' ~ Wigner semicircle distribution. * Beta(1/2, 1/2) is equivalent to the arcsine distribution. This distribution is also Jeffreys prior probability for the
Bernoulli
and binomial distributions. The arcsine probability density is a distribution that appears in several random-walk fundamental theorems. In a fair coin toss
random walk
, the probability for the time of the last visit to the origin is distributed as a (U-shaped) arcsine distribution. In a two-player fair-coin-toss game, a player is said to be in the lead if the random walk (that started at the origin) is above the origin. The most probable number of times that a given player will be in the lead, in a game of length 2''N'', is not ''N''. On the contrary, ''N'' is the least likely number of times that the player will be in the lead. The most likely number of times in the lead is 0 or 2''N'' (following the arcsine distribution). * \lim_{n \to \infty} n \operatorname{Beta}(1,n) = \operatorname{Exponential}(1), the exponential distribution. * \lim_{n \to \infty} n \operatorname{Beta}(k,n) = \operatorname{Gamma}(k,1), the gamma distribution. * For large n, \operatorname{Beta}(\alpha n,\beta n) \to \mathcal{N}\left(\frac{\alpha}{\alpha+\beta},\frac{1}{n}\frac{\alpha\beta}{(\alpha+\beta)^3}\right), the normal distribution. More precisely, if X_n \sim \operatorname{Beta}(\alpha n,\beta n) then \sqrt{n}\left(X_n -\tfrac{\alpha}{\alpha+\beta}\right) converges in distribution to a normal distribution with mean 0 and variance \tfrac{\alpha\beta}{(\alpha+\beta)^3} as ''n'' increases.
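Two of these special and limiting cases, checked by simulation (a sketch; large Kolmogorov–Smirnov p-values indicate agreement, and the sample sizes and parameter values are arbitrary):
 import numpy as np
 from scipy.stats import beta, kstest, norm

 rng = np.random.default_rng(1)

 # Beta(n, 1) is the distribution of the maximum of n iid U(0, 1) variables
 n = 7
 u_max = rng.uniform(size=(20_000, n)).max(axis=1)
 print(kstest(u_max, beta(n, 1).cdf).pvalue)

 # for large n, Beta(alpha*n, beta*n) is approximately normal
 al, be, n = 2.0, 3.0, 400
 x = beta.rvs(al * n, be * n, size=20_000, random_state=rng)
 mu = al / (al + be)
 sigma = np.sqrt(al * be / (al + be) ** 3 / n)
 print(kstest(x, norm(mu, sigma).cdf).pvalue)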


Derived from other distributions

* The ''k''th order statistic of a sample of size ''n'' from the Uniform distribution (continuous), uniform distribution is a beta random variable, ''U''(''k'') ~ Beta(''k'', ''n''+1−''k''). * If ''X'' ~ Gamma(α, θ) and ''Y'' ~ Gamma(β, θ) are independent, then \tfrac \sim \operatorname(\alpha, \beta)\,. * If X \sim \chi^2(\alpha)\, and Y \sim \chi^2(\beta)\, are independent, then \tfrac \sim \operatorname(\tfrac, \tfrac). * If ''X'' ~ U(0, 1) and ''α'' > 0 then ''X''1/''α'' ~ Beta(''α'', 1). The power function distribution. * If X \sim\operatorname(k;n;p), then \sim \operatorname(\alpha, \beta) for discrete values of ''n'' and ''k'' where \alpha=k+1 and \beta=n-k+1. * If ''X'' ~ Cauchy(0, 1) then \tfrac \sim \operatorname\left(\tfrac12, \tfrac12\right)\,
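The gamma-ratio and order-statistic constructions, for example, can be verified by simulation (a sketch; parameter choices are arbitrary):
 import numpy as np
 from scipy.stats import beta, gamma, kstest

 rng = np.random.default_rng(2)
 a, b, theta = 2.0, 5.0, 3.0

 # X ~ Gamma(a, theta), Y ~ Gamma(b, theta) independent  =>  X / (X + Y) ~ Beta(a, b)
 x = gamma.rvs(a, scale=theta, size=50_000, random_state=rng)
 y = gamma.rvs(b, scale=theta, size=50_000, random_state=rng)
 print(kstest(x / (x + y), beta(a, b).cdf).pvalue)

 # k-th order statistic of n iid U(0, 1) samples ~ Beta(k, n + 1 - k)
 n, k = 9, 3
 u_sorted = np.sort(rng.uniform(size=(50_000, n)), axis=1)
 print(kstest(u_sorted[:, k - 1], beta(k, n + 1 - k).cdf).pvalue)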


Combination with other distributions

* If ''X'' ~ Beta(''α'', ''β'') and ''Y'' ~ F(2''β'', 2''α''), then \Pr\left(X \leq \tfrac{\alpha}{\alpha+\beta x}\right) = \Pr(Y \geq x)\, for all ''x'' > 0.


Compounding with other distributions

* If ''p'' ~ Beta(α, β) and ''X'' ~ Bin(''k'', ''p'') then ''X'' ~ beta-binomial distribution * If ''p'' ~ Beta(α, β) and ''X'' ~ NB(''r'', ''p'') then ''X'' ~ beta negative binomial distribution
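The compounding construction can be written out directly and compared with the closed-form compound distribution (a sketch; scipy.stats.betabinom is SciPy's beta-binomial distribution, available since SciPy 1.4):
 import numpy as np
 from scipy.stats import beta, binom, betabinom

 rng = np.random.default_rng(3)
 a, b, k = 2.0, 3.0, 10

 # draw p ~ Beta(a, b), then X | p ~ Binomial(k, p)
 p = beta.rvs(a, b, size=200_000, random_state=rng)
 x = binom.rvs(k, p, random_state=rng)

 # compare the empirical pmf of X with the beta-binomial pmf
 empirical = np.bincount(x, minlength=k + 1) / x.size
 exact = betabinom.pmf(np.arange(k + 1), k, a, b)
 print(np.max(np.abs(empirical - exact)))   # small (sampling error only)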


Generalisations

* The generalization to multiple variables, i.e. a Dirichlet distribution, multivariate Beta distribution, is called a
Dirichlet distribution
. Univariate marginals of the Dirichlet distribution have a beta distribution. The beta distribution is Conjugate prior, conjugate to the binomial and Bernoulli distributions in exactly the same way as the
Dirichlet distribution
is conjugate to the multinomial distribution and categorical distribution. * The Pearson distribution#The Pearson type I distribution, Pearson type I distribution is identical to the beta distribution (except for arbitrary shifting and re-scaling that can also be accomplished with the four parameter parametrization of the beta distribution). * The beta distribution is the special case of the noncentral beta distribution where \lambda = 0: \operatorname(\alpha, \beta) = \operatorname(\alpha,\beta,0). * The generalized beta distribution is a five-parameter distribution family which has the beta distribution as a special case. * The matrix variate beta distribution is a distribution for positive-definite matrices.


Statistical inference


Parameter estimation


Method of moments


=Two unknown parameters

= Two unknown parameters (\hat{\alpha}, \hat{\beta}) of a beta distribution supported on the [0, 1] interval can be estimated, using the method of moments, with the first two moments (sample mean and sample variance) as follows. Let: :\text{sample mean} = \bar{x} = \frac{1}{N}\sum_{i=1}^N X_i be the sample mean estimate and : \text{sample variance} = \bar{v} = \frac{1}{N-1}\sum_{i=1}^N (X_i - \bar{x})^2 be the sample variance estimate. The method of moments (statistics), method-of-moments estimates of the parameters are :\hat{\alpha} = \bar{x} \left(\frac{\bar{x}(1-\bar{x})}{\bar{v}} - 1 \right), if \bar{v} < \bar{x}(1 - \bar{x}), : \hat{\beta} = (1-\bar{x}) \left(\frac{\bar{x}(1-\bar{x})}{\bar{v}} - 1 \right), if \bar{v} < \bar{x}(1 - \bar{x}). When the distribution is required over a known interval other than
[0, 1]
with random variable ''X'', say [''a'', ''c''] with random variable ''Y'', then replace \bar{x} with \frac{\bar{y}-a}{c-a}, and \bar{v} with \frac{\bar{v}_Y}{(c-a)^2} in the above couple of equations for the shape parameters (see the "Alternative parametrizations, four parameters" section below), where: : \text{sample mean} = \bar{y} = \frac{1}{N}\sum_{i=1}^N Y_i : \text{sample variance} = \bar{v}_Y = \frac{1}{N-1}\sum_{i=1}^N (Y_i - \bar{y})^2
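These moment estimators translate directly into code (a sketch; the function name and the parameters of the test data are arbitrary):
 import numpy as np
 from scipy.stats import beta

 def beta_method_of_moments(x):
     # method-of-moments estimates of (alpha, beta) for data on [0, 1]
     xbar = np.mean(x)
     vbar = np.var(x, ddof=1)
     if vbar >= xbar * (1 - xbar):
         raise ValueError("sample variance too large for a beta distribution")
     common = xbar * (1 - xbar) / vbar - 1
     return xbar * common, (1 - xbar) * common

 rng = np.random.default_rng(4)
 data = beta.rvs(2.5, 6.0, size=10_000, random_state=rng)
 print(beta_method_of_moments(data))   # close to the true values (2.5, 6.0)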


=Four unknown parameters

= All four parameters (\hat, \hat, \hat, \hat of a beta distribution supported in the [''a'', ''c''] interval -see section Beta distribution#Four parameters 2, "Alternative parametrizations, Four parameters"-) can be estimated, using the method of moments developed by Karl Pearson, by equating sample and population values of the first four central moments (mean, variance, skewness and excess kurtosis). The excess kurtosis was expressed in terms of the square of the skewness, and the sample size ν = α + β, (see previous section Beta distribution#Kurtosis, "Kurtosis") as follows: :\text =\frac\left(\frac (\text)^2 - 1\right)\text^2-2< \text< \tfrac (\text)^2 One can use this equation to solve for the sample size ν= α + β in terms of the square of the skewness and the excess kurtosis as follows: :\hat = \hat + \hat = 3\frac :\text^2-2< \text< \tfrac (\text)^2 This is the ratio (multiplied by a factor of 3) between the previously derived limit boundaries for the beta distribution in a space (as originally done by Karl Pearson) defined with coordinates of the square of the skewness in one axis and the excess kurtosis in the other axis (see ): The case of zero skewness, can be immediately solved because for zero skewness, α = β and hence ν = 2α = 2β, therefore α = β = ν/2 : \hat = \hat = \frac= \frac : \text= 0 \text -2<\text<0 (Excess kurtosis is negative for the beta distribution with zero skewness, ranging from -2 to 0, so that \hat -and therefore the sample shape parameters- is positive, ranging from zero when the shape parameters approach zero and the excess kurtosis approaches -2, to infinity when the shape parameters approach infinity and the excess kurtosis approaches zero). For non-zero sample skewness one needs to solve a system of two coupled equations. Since the skewness and the excess kurtosis are independent of the parameters \hat, \hat, the parameters \hat, \hat can be uniquely determined from the sample skewness and the sample excess kurtosis, by solving the coupled equations with two known variables (sample skewness and sample excess kurtosis) and two unknowns (the shape parameters): :(\text)^2 = \frac :\text =\frac\left(\frac (\text)^2 - 1\right) :\text^2-2< \text< \tfrac(\text)^2 resulting in the following solution: : \hat, \hat = \frac \left (1 \pm \frac \right ) : \text\neq 0 \text (\text)^2-2< \text< \tfrac (\text)^2 Where one should take the solutions as follows: \hat>\hat for (negative) sample skewness < 0, and \hat<\hat for (positive) sample skewness > 0. The accompanying plot shows these two solutions as surfaces in a space with horizontal axes of (sample excess kurtosis) and (sample squared skewness) and the shape parameters as the vertical axis. The surfaces are constrained by the condition that the sample excess kurtosis must be bounded by the sample squared skewness as stipulated in the above equation. The two surfaces meet at the right edge defined by zero skewness. Along this right edge, both parameters are equal and the distribution is symmetric U-shaped for α = β < 1, uniform for α = β = 1, upside-down-U-shaped for 1 < α = β < 2 and bell-shaped for α = β > 2. The surfaces also meet at the front (lower) edge defined by "the impossible boundary" line (excess kurtosis + 2 - skewness2 = 0). 
Along this front (lower) boundary both shape parameters approach zero, and the probability density is concentrated more at one end than the other end (with practically nothing in between), with probabilities p=\tfrac at the left end ''x'' = 0 and q = 1-p = \tfrac at the right end ''x'' = 1. The two surfaces become further apart towards the rear edge. At this rear edge the surface parameters are quite different from each other. As remarked, for example, by Bowman and Shenton, sampling in the neighborhood of the line (sample excess kurtosis - (3/2)(sample skewness)2 = 0) (the just-J-shaped portion of the rear edge where blue meets beige), "is dangerously near to chaos", because at that line the denominator of the expression above for the estimate ν = α + β becomes zero and hence ν approaches infinity as that line is approached. Bowman and Shenton write that "the higher moment parameters (kurtosis and skewness) are extremely fragile (near that line). However, the mean and standard deviation are fairly reliable." Therefore, the problem is for the case of four parameter estimation for very skewed distributions such that the excess kurtosis approaches (3/2) times the square of the skewness. This boundary line is produced by extremely skewed distributions with very large values of one of the parameters and very small values of the other parameter. See for a numerical example and further comments about this rear edge boundary line (sample excess kurtosis - (3/2)(sample skewness)2 = 0). As remarked by Karl Pearson himself this issue may not be of much practical importance as this trouble arises only for very skewed J-shaped (or mirror-image J-shaped) distributions with very different values of shape parameters that are unlikely to occur much in practice). The usual skewed-bell-shape distributions that occur in practice do not have this parameter estimation problem. The remaining two parameters \hat, \hat can be determined using the sample mean and the sample variance using a variety of equations. One alternative is to calculate the support interval range (\hat-\hat) based on the sample variance and the sample kurtosis. For this purpose one can solve, in terms of the range (\hat- \hat), the equation expressing the excess kurtosis in terms of the sample variance, and the sample size ν (see and ): :\text =\frac\bigg(\frac - 6 - 5 \hat \bigg) to obtain: : (\hat- \hat) = \sqrt\sqrt Another alternative is to calculate the support interval range (\hat-\hat) based on the sample variance and the sample skewness. For this purpose one can solve, in terms of the range (\hat-\hat), the equation expressing the squared skewness in terms of the sample variance, and the sample size ν (see section titled "Skewness" and "Alternative parametrizations, four parameters"): :(\text)^2 = \frac\bigg(\frac-4(1+\hat)\bigg) to obtain: : (\hat- \hat) = \frac\sqrt The remaining parameter can be determined from the sample mean and the previously obtained parameters: (\hat-\hat), \hat, \hat = \hat+\hat: : \hat = (\text) - \left(\frac\right)(\hat-\hat) and finally, \hat= (\hat- \hat) + \hat . In the above formulas one may take, for example, as estimates of the sample moments: :\begin \text &=\overline = \frac\sum_^N Y_i \\ \text &= \overline_Y = \frac\sum_^N (Y_i - \overline)^2 \\ \text &= G_1 = \frac \frac \\ \text &= G_2 = \frac \frac - \frac \end The estimators ''G''1 for skewness, sample skewness and ''G''2 for kurtosis, sample kurtosis are used by DAP (software), DAP/SAS System, SAS, PSPP/SPSS, and Microsoft Excel, Excel. 
However, they are not used by BMDP and (according to ) they were not used by MINITAB in 1998. Actually, Joanes and Gill in their 1998 study concluded that the skewness and kurtosis estimators used in BMDP and in MINITAB (at that time) had smaller variance and mean-squared error in normal samples, but the skewness and kurtosis estimators used in DAP (software), DAP/SAS System, SAS, PSPP/SPSS, namely ''G''1 and ''G''2, had smaller mean-squared error in samples from a very skewed distribution. It is for this reason that we have spelled out "sample skewness", etc., in the above formulas, to make it explicit that the user should choose the best estimator according to the problem at hand, as the best estimator for skewness and kurtosis depends on the amount of skewness (as shown by Joanes and Gill).
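Instead of Pearson's closed-form route through the skewness and kurtosis, the same four moment-matching conditions can be handed to a numerical root finder, using SciPy's loc/scale parametrization of the four-parameter beta; this is only a sketch (the starting guess is crude, and root finding of this kind can be sensitive to it), not the classical procedure described above:
 import numpy as np
 from scipy.stats import beta, skew, kurtosis
 from scipy.optimize import fsolve

 def four_param_mom(y):
     # match mean, variance, skewness and excess kurtosis of Beta(a, b, loc, scale)
     target = [np.mean(y), np.var(y, ddof=1), skew(y), kurtosis(y)]

     def equations(params):
         a, b, loc, scale = params
         if a <= 0 or b <= 0 or scale <= 0:
             return [1e6] * 4
         m, v, s, k = beta.stats(a, b, loc=loc, scale=scale, moments='mvsk')
         return [m - target[0], v - target[1], s - target[2], k - target[3]]

     start = [2.0, 2.0, np.min(y) - 0.1, np.ptp(y) + 0.2]
     return fsolve(equations, start)

 rng = np.random.default_rng(5)
 y = beta.rvs(3.0, 5.0, loc=10.0, scale=4.0, size=20_000, random_state=rng)
 print(four_param_mom(y))   # roughly (3, 5, 10, 4), i.e. a = 10 and c - a = 4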


Maximum likelihood


=Two unknown parameters

= As is also the case for maximum likelihood estimates for the gamma distribution, the maximum likelihood estimates for the beta distribution do not have a general closed form solution for arbitrary values of the shape parameters. If ''X''1, ..., ''XN'' are independent random variables each having a beta distribution, the joint log likelihood function for ''N'' independent and identically distributed random variables, iid observations is: :\begin \ln\, \mathcal (\alpha, \beta\mid X) &= \sum_^N \ln \left (\mathcal_i (\alpha, \beta\mid X_i) \right )\\ &= \sum_^N \ln \left (f(X_i;\alpha,\beta) \right ) \\ &= \sum_^N \ln \left (\frac \right ) \\ &= (\alpha - 1)\sum_^N \ln (X_i) + (\beta- 1)\sum_^N \ln (1-X_i) - N \ln \Beta(\alpha,\beta) \end Finding the maximum with respect to a shape parameter involves taking the partial derivative with respect to the shape parameter and setting the expression equal to zero yielding the maximum likelihood estimator of the shape parameters: :\frac = \sum_^N \ln X_i -N\frac=0 :\frac = \sum_^N \ln (1-X_i)- N\frac=0 where: :\frac = -\frac+ \frac+ \frac=-\psi(\alpha + \beta) + \psi(\alpha) + 0 :\frac= - \frac+ \frac + \frac=-\psi(\alpha + \beta) + 0 + \psi(\beta) since the
digamma function
denoted ψ(α) is defined as the logarithmic derivative of the
gamma function
: :\psi(\alpha) =\frac To ensure that the values with zero tangent slope are indeed a maximum (instead of a saddle-point or a minimum) one has to also satisfy the condition that the curvature is negative. This amounts to satisfying that the second partial derivative with respect to the shape parameters is negative :\frac= -N\frac<0 :\frac = -N\frac<0 using the previous equations, this is equivalent to: :\frac = \psi_1(\alpha)-\psi_1(\alpha + \beta) > 0 :\frac = \psi_1(\beta) -\psi_1(\alpha + \beta) > 0 where the
trigamma function
, denoted ''ψ''1(''α''), is the second of the
polygamma function
s, and is defined as the derivative of the digamma function: :\psi_1(\alpha) = \frac=\, \frac. These conditions are equivalent to stating that the variances of the logarithmically transformed variables are positive, since: :\operatorname[\ln (X)] = \operatorname[\ln^2 (X)] - (\operatorname[\ln (X)])^2 = \psi_1(\alpha) - \psi_1(\alpha + \beta) :\operatorname ln (1-X)= \operatorname[\ln^2 (1-X)] - (\operatorname[\ln (1-X)])^2 = \psi_1(\beta) - \psi_1(\alpha + \beta) Therefore, the condition of negative curvature at a maximum is equivalent to the statements: : \operatorname[\ln (X)] > 0 : \operatorname ln (1-X)> 0 Alternatively, the condition of negative curvature at a maximum is also equivalent to stating that the following logarithmic derivatives of the geometric means ''GX'' and ''G(1−X)'' are positive, since: : \psi_1(\alpha) - \psi_1(\alpha + \beta) = \frac > 0 : \psi_1(\beta) - \psi_1(\alpha + \beta) = \frac > 0 While these slopes are indeed positive, the other slopes are negative: :\frac, \frac < 0. The slopes of the mean and the median with respect to ''α'' and ''β'' display similar sign behavior. From the condition that at a maximum, the partial derivative with respect to the shape parameter equals zero, we obtain the following system of coupled maximum likelihood estimate equations (for the average log-likelihoods) that needs to be inverted to obtain the (unknown) shape parameter estimates \hat,\hat in terms of the (known) average of logarithms of the samples ''X''1, ..., ''XN'': :\begin \hat[\ln (X)] &= \psi(\hat) - \psi(\hat + \hat)=\frac\sum_^N \ln X_i = \ln \hat_X \\ \hat[\ln(1-X)] &= \psi(\hat) - \psi(\hat + \hat)=\frac\sum_^N \ln (1-X_i)= \ln \hat_ \end where we recognize \log \hat_X as the logarithm of the sample geometric mean and \log \hat_ as the logarithm of the sample geometric mean based on (1 − ''X''), the mirror-image of ''X''. For \hat=\hat, it follows that \hat_X=\hat_ . :\begin \hat_X &= \prod_^N (X_i)^ \\ \hat_ &= \prod_^N (1-X_i)^ \end These coupled equations containing
digamma function
s of the shape parameter estimates \hat,\hat must be solved by numerical methods as done, for example, by Beckman et al. Gnanadesikan et al. give numerical solutions for a few cases. Norman Lloyd Johnson, N.L.Johnson and Samuel Kotz, S.Kotz suggest that for "not too small" shape parameter estimates \hat,\hat, the logarithmic approximation to the digamma function \psi(\hat) \approx \ln(\hat-\tfrac) may be used to obtain initial values for an iterative solution, since the equations resulting from this approximation can be solved exactly: :\ln \frac \approx \ln \hat_X :\ln \frac\approx \ln \hat_ which leads to the following solution for the initial values (of the estimate shape parameters in terms of the sample geometric means) for an iterative solution: :\hat\approx \tfrac + \frac \text \hat >1 :\hat\approx \tfrac + \frac \text \hat > 1 Alternatively, the estimates provided by the method of moments can instead be used as initial values for an iterative solution of the maximum likelihood coupled equations in terms of the digamma functions. When the distribution is required over a known interval other than
[0, 1]
with random variable ''X'', say [''a'', ''c''] with random variable ''Y'', then replace ln(''Xi'') in the first equation with :\ln \frac, and replace ln(1−''Xi'') in the second equation with :\ln \frac (see "Alternative parametrizations, four parameters" section below). If one of the shape parameters is known, the problem is considerably simplified. The following logit transformation can be used to solve for the unknown shape parameter (for skewed cases such that \hat\neq\hat, otherwise, if symmetric, both -equal- parameters are known when one is known): :\hat \left[\ln \left(\frac \right) \right]=\psi(\hat) - \psi(\hat)=\frac\sum_^N \ln\frac = \ln \hat_X - \ln \left(\hat_\right) This logit transformation is the logarithm of the transformation that divides the variable ''X'' by its mirror-image (''X''/(1 - ''X'') resulting in the "inverted beta distribution" or
beta prime distribution
(also known as beta distribution of the second kind or Pearson distribution, Pearson's Type VI) with support [0, +∞). As previously discussed in the section "Moments of logarithmically transformed random variables," the logit transformation \ln\frac, studied by Johnson, extends the finite support
[0, 1]
based on the original variable ''X'' to infinite support in both directions of the real line (−∞, +∞). If, for example, \hat is known, the unknown parameter \hat can be obtained in terms of the inverse digamma function of the right hand side of this equation: :\psi(\hat)=\frac\sum_^N \ln\frac + \psi(\hat) :\hat=\psi^(\ln \hat_X - \ln \hat_ + \psi(\hat)) In particular, if one of the shape parameters has a value of unity, for example for \hat = 1 (the power function distribution with bounded support [0,1]), using the identity ψ(''x'' + 1) = ψ(''x'') + 1/''x'' in the equation \psi(\hat) - \psi(\hat + \hat)= \ln \hat_X, the maximum likelihood estimator for the unknown parameter \hat is, exactly: :\hat= - \frac= - \frac The beta has support [0, 1], therefore \hat_X < 1, and hence (-\ln \hat_X) >0, and therefore \hat >0. In conclusion, the maximum likelihood estimates of the shape parameters of a beta distribution are (in general) a complicated function of the sample geometric mean, and of the sample geometric mean based on ''(1−X)'', the mirror-image of ''X''. One may ask, if the variance (in addition to the mean) is necessary to estimate two shape parameters with the method of moments, why is the (logarithmic or geometric) variance not necessary to estimate two shape parameters with the maximum likelihood method, for which only the geometric means suffice? The answer is because the mean does not provide as much information as the geometric mean. For a beta distribution with equal shape parameters ''α'' = ''β'', the mean is exactly 1/2, regardless of the value of the shape parameters, and therefore regardless of the value of the statistical dispersion (the variance). On the other hand, the geometric mean of a beta distribution with equal shape parameters ''α'' = ''β'', depends on the value of the shape parameters, and therefore it contains more information. Also, the geometric mean of a beta distribution does not satisfy the symmetry conditions satisfied by the mean, therefore, by employing both the geometric mean based on ''X'' and geometric mean based on (1 − ''X''), the maximum likelihood method is able to provide best estimates for both parameters ''α'' = ''β'', without need of employing the variance. One can express the joint log likelihood per ''N'' independent and identically distributed random variables, iid observations in terms of the ''sufficient statistics'' (the sample geometric means) as follows: :\frac = (\alpha - 1)\ln \hat_X + (\beta- 1)\ln \hat_- \ln \Beta(\alpha,\beta). We can plot the joint log likelihood per ''N'' observations for fixed values of the sample geometric means to see the behavior of the likelihood function as a function of the shape parameters α and β. In such a plot, the shape parameter estimators \hat,\hat correspond to the maxima of the likelihood function. See the accompanying graph that shows that all the likelihood functions intersect at α = β = 1, which corresponds to the values of the shape parameters that give the maximum entropy (the maximum entropy occurs for shape parameters equal to unity: the uniform distribution). It is evident from the plot that the likelihood function gives sharp peaks for values of the shape parameter estimators close to zero, but that for values of the shape parameters estimators greater than one, the likelihood function becomes quite flat, with less defined peaks. 
Obviously, the maximum likelihood parameter estimation method for the beta distribution becomes less acceptable for larger values of the shape parameter estimators, as the uncertainty in the peak definition increases with the value of the shape parameter estimators. One can arrive at the same conclusion by noticing that the expression for the curvature of the likelihood function is in terms of the geometric variances :\frac= -\operatorname ln X/math> :\frac = -\operatorname[\ln (1-X)] These variances (and therefore the curvatures) are much larger for small values of the shape parameter α and β. However, for shape parameter values α, β > 1, the variances (and therefore the curvatures) flatten out. Equivalently, this result follows from the Cramér–Rao bound, since the
Fisher information
matrix components for the beta distribution are these logarithmic variances. The Cramér–Rao bound states that the
variance
of any ''unbiased'' estimator \hat of α is bounded by the multiplicative inverse, reciprocal of the
Fisher information
: :\mathrm(\hat)\geq\frac\geq\frac :\mathrm(\hat) \geq\frac\geq\frac so the variance of the estimators increases with increasing α and β, as the logarithmic variances decrease. Also one can express the joint log likelihood per ''N'' independent and identically distributed random variables, iid observations in terms of the
digamma function
expressions for the logarithms of the sample geometric means as follows: :\frac = (\alpha - 1)(\psi(\hat) - \psi(\hat + \hat))+(\beta- 1)(\psi(\hat) - \psi(\hat + \hat))- \ln \Beta(\alpha,\beta) this expression is identical to the negative of the cross-entropy (see section on "Quantities of information (entropy)"). Therefore, finding the maximum of the joint log likelihood of the shape parameters, per ''N'' independent and identically distributed random variables, iid observations, is identical to finding the minimum of the cross-entropy for the beta distribution, as a function of the shape parameters. :\frac = - H = -h - D_ = -\ln\Beta(\alpha,\beta)+(\alpha-1)\psi(\hat)+(\beta-1)\psi(\hat)-(\alpha+\beta-2)\psi(\hat+\hat) with the cross-entropy defined as follows: :H = \int_^1 - f(X;\hat,\hat) \ln (f(X;\alpha,\beta)) \, X
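In practice the coupled digamma equations are solved numerically; a minimal sketch follows (using the Johnson and Kotz logarithmic approximation quoted above for the starting values, and scipy.stats.beta.fit with the location and scale frozen as an independent cross-check):
 import numpy as np
 from scipy.special import psi          # the digamma function
 from scipy.optimize import fsolve
 from scipy.stats import beta

 def beta_mle(x):
     # maximum likelihood estimates of (alpha, beta) for data strictly inside (0, 1)
     ln_gx = np.mean(np.log(x))         # log of the sample geometric mean of X
     ln_g1x = np.mean(np.log1p(-x))     # log of the sample geometric mean of 1 - X

     def equations(params):
         a, b = params
         return [psi(a) - psi(a + b) - ln_gx,
                 psi(b) - psi(a + b) - ln_g1x]

     # starting values from the approximation psi(z) ~ ln(z - 1/2)
     denom = 2 * (1 - np.exp(ln_gx) - np.exp(ln_g1x))
     start = [0.5 + np.exp(ln_gx) / denom, 0.5 + np.exp(ln_g1x) / denom]
     return fsolve(equations, start)

 rng = np.random.default_rng(6)
 data = beta.rvs(1.8, 4.2, size=10_000, random_state=rng)
 print(beta_mle(data))                          # digamma-equation solution
 print(beta.fit(data, floc=0, fscale=1)[:2])    # SciPy's MLE, for comparison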


=Four unknown parameters

= The procedure is similar to the one followed in the two unknown parameter case. If ''Y''1, ..., ''YN'' are independent random variables each having a beta distribution with four parameters, the joint log likelihood function for ''N'' independent and identically distributed random variables, iid observations is: :\begin \ln\, \mathcal (\alpha, \beta, a, c\mid Y) &= \sum_^N \ln\,\mathcal_i (\alpha, \beta, a, c\mid Y_i)\\ &= \sum_^N \ln\,f(Y_i; \alpha, \beta, a, c) \\ &= \sum_^N \ln\,\frac\\ &= (\alpha - 1)\sum_^N \ln (Y_i - a) + (\beta- 1)\sum_^N \ln (c - Y_i)- N \ln \Beta(\alpha,\beta) - N (\alpha+\beta - 1) \ln (c - a) \end Finding the maximum with respect to a shape parameter involves taking the partial derivative with respect to the shape parameter and setting the expression equal to zero yielding the maximum likelihood estimator of the shape parameters: :\frac= \sum_^N \ln (Y_i - a) - N(-\psi(\alpha + \beta) + \psi(\alpha))- N \ln (c - a)= 0 :\frac = \sum_^N \ln (c - Y_i) - N(-\psi(\alpha + \beta) + \psi(\beta))- N \ln (c - a)= 0 :\frac = -(\alpha - 1) \sum_^N \frac \,+ N (\alpha+\beta - 1)\frac= 0 :\frac = (\beta- 1) \sum_^N \frac \,- N (\alpha+\beta - 1) \frac = 0 these equations can be re-arranged as the following system of four coupled equations (the first two equations are geometric means and the second two equations are the harmonic means) in terms of the maximum likelihood estimates for the four parameters \hat, \hat, \hat, \hat: :\frac\sum_^N \ln \frac = \psi(\hat)-\psi(\hat +\hat )= \ln \hat_X :\frac\sum_^N \ln \frac = \psi(\hat)-\psi(\hat + \hat)= \ln \hat_ :\frac = \frac= \hat_X :\frac = \frac = \hat_ with sample geometric means: :\hat_X = \prod_^ \left (\frac \right )^ :\hat_ = \prod_^ \left (\frac \right )^ The parameters \hat, \hat are embedded inside the geometric mean expressions in a nonlinear way (to the power 1/''N''). This precludes, in general, a closed form solution, even for an initial value approximation for iteration purposes. One alternative is to use as initial values for iteration the values obtained from the method of moments solution for the four parameter case. Furthermore, the expressions for the harmonic means are well-defined only for \hat, \hat > 1, which precludes a maximum likelihood solution for shape parameters less than unity in the four-parameter case. Fisher's information matrix for the four parameter case is Positive-definite matrix, positive-definite only for α, β > 2 (for further discussion, see section on Fisher information matrix, four parameter case), for bell-shaped (symmetric or unsymmetric) beta distributions, with inflection points located to either side of the mode. The following Fisher information components (that represent the expectations of the curvature of the log likelihood function) have mathematical singularity, singularities at the following values: :\alpha = 2: \quad \operatorname \left [- \frac \frac \right ]= _ :\beta = 2: \quad \operatorname\left [- \frac \frac \right ] = _ :\alpha = 2: \quad \operatorname\left [- \frac\frac\right ] = _ :\beta = 1: \quad \operatorname\left [- \frac\frac \right ] = _ (for further discussion see section on Fisher information matrix). Thus, it is not possible to strictly carry on the maximum likelihood estimation for some well known distributions belonging to the four-parameter beta distribution family, like the continuous uniform distribution, uniform distribution (Beta(1, 1, ''a'', ''c'')), and the arcsine distribution (Beta(1/2, 1/2, ''a'', ''c'')). 
Norman Lloyd Johnson, N.L.Johnson and Samuel Kotz, S.Kotz ignore the equations for the harmonic means and instead suggest "If a and c are unknown, and maximum likelihood estimators of ''a'', ''c'', α and β are required, the above procedure (for the two unknown parameter case, with ''X'' transformed as ''X'' = (''Y'' − ''a'')/(''c'' − ''a'')) can be repeated using a succession of trial values of ''a'' and ''c'', until the pair (''a'', ''c'') for which maximum likelihood (given ''a'' and ''c'') is as great as possible, is attained" (where, for the purpose of clarity, their notation for the parameters has been translated into the present notation).
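A crude sketch of this suggestion (profiling over trial values of ''a'' and ''c'' on a coarse grid and reusing the two-parameter maximum likelihood fit on the rescaled data; the grid width based on the sample standard deviation is an arbitrary choice, and a real implementation would refine the grid or use a proper optimizer):
 import numpy as np
 from scipy.stats import beta

 def four_param_profile_mle(y, n_grid=12):
     # profile likelihood over (a, c); the two-parameter MLE is computed on (y - a)/(c - a)
     best = None
     margin = 0.5 * np.std(y)
     for a in np.linspace(np.min(y) - margin, np.min(y) - 1e-3, n_grid):
         for c in np.linspace(np.max(y) + 1e-3, np.max(y) + margin, n_grid):
             x = (y - a) / (c - a)
             alpha_hat, beta_hat, _, _ = beta.fit(x, floc=0, fscale=1)
             ll = np.sum(beta.logpdf(y, alpha_hat, beta_hat, loc=a, scale=c - a))
             if best is None or ll > best[0]:
                 best = (ll, alpha_hat, beta_hat, a, c)
     return best[1:]

 rng = np.random.default_rng(7)
 y = beta.rvs(3.0, 4.0, loc=2.0, scale=5.0, size=3_000, random_state=rng)
 print(four_param_profile_mle(y))   # roughly (3, 4, 2, 7); here a = 2 and c = 2 + 5 = 7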


Fisher information matrix

Let a random variable X have a probability density ''f''(''x'';''α''). The partial derivative with respect to the (unknown, and to be estimated) parameter α of the log
likelihood function
is called the score (statistics), score. The second moment of the score is called the
Fisher information
: :\mathcal(\alpha)=\operatorname \left [\left (\frac \ln \mathcal(\alpha\mid X) \right )^2 \right], The expected value, expectation of the score (statistics), score is zero, therefore the Fisher information is also the second moment centered on the mean of the score: the
variance
of the score. If the log
likelihood function
is twice differentiable with respect to the parameter α, and under certain regularity conditions, then the Fisher information may also be written as follows (which is often a more convenient form for calculation purposes): :\mathcal(\alpha) = - \operatorname \left [\frac \ln (\mathcal(\alpha\mid X)) \right]. Thus, the Fisher information is the negative of the expectation of the second derivative with respect to the parameter α of the log
likelihood function
. Therefore, Fisher information is a measure of the curvature of the log likelihood function of α. A low curvature (and therefore high Radius of curvature (mathematics), radius of curvature), flatter log likelihood function curve has low Fisher information; while a log likelihood function curve with large curvature (and therefore low Radius of curvature (mathematics), radius of curvature) has high Fisher information. When the Fisher information matrix is computed at the evaluates of the parameters ("the observed Fisher information matrix") it is equivalent to the replacement of the true log likelihood surface by a Taylor's series approximation, taken as far as the quadratic terms. The word information, in the context of Fisher information, refers to information about the parameters. Information such as: estimation, sufficiency and properties of variances of estimators. The Cramér–Rao bound states that the inverse of the Fisher information is a lower bound on the variance of any
estimator
of a parameter α: :\operatorname[\hat\alpha] \geq \frac. The precision to which one can estimate the estimator of a parameter α is limited by the Fisher Information of the log likelihood function. The Fisher information is a measure of the minimum error involved in estimating a parameter of a distribution and it can be viewed as a measure of the resolving power of an experiment needed to discriminate between two alternative hypothesis of a parameter. When there are ''N'' parameters : \begin \theta_1 \\ \theta_ \\ \dots \\ \theta_ \end, then the Fisher information takes the form of an ''N''×''N'' positive semidefinite matrix, positive semidefinite symmetric matrix, the Fisher Information Matrix, with typical element: :_=\operatorname \left [\left (\frac \ln \mathcal \right) \left(\frac \ln \mathcal \right) \right ]. Under certain regularity conditions, the Fisher Information Matrix may also be written in the following form, which is often more convenient for computation: :_ = - \operatorname \left [\frac \ln (\mathcal) \right ]\,. With ''X''1, ..., ''XN'' iid random variables, an ''N''-dimensional "box" can be constructed with sides ''X''1, ..., ''XN''. Costa and Cover show that the (Shannon) differential entropy ''h''(''X'') is related to the volume of the typical set (having the sample entropy close to the true entropy), while the Fisher information is related to the surface of this typical set.


=Two parameters

= For ''X''1, ..., ''X''''N'' independent random variables each having a beta distribution parametrized with shape parameters ''α'' and ''β'', the joint log likelihood function for ''N'' independent and identically distributed random variables, iid observations is: :\ln (\mathcal (\alpha, \beta\mid X) )= (\alpha - 1)\sum_^N \ln X_i + (\beta- 1)\sum_^N \ln (1-X_i)- N \ln \Beta(\alpha,\beta) therefore the joint log likelihood function per ''N'' independent and identically distributed random variables, iid observations is: :\frac \ln(\mathcal (\alpha, \beta\mid X)) = (\alpha - 1)\frac\sum_^N \ln X_i + (\beta- 1)\frac\sum_^N \ln (1-X_i)-\, \ln \Beta(\alpha,\beta) For the two parameter case, the Fisher information has 4 components: 2 diagonal and 2 off-diagonal. Since the Fisher information matrix is symmetric, one of these off diagonal components is independent. Therefore, the Fisher information matrix has 3 independent components (2 diagonal and 1 off diagonal). Aryal and Nadarajah calculated Fisher's information matrix for the four-parameter case, from which the two parameter case can be obtained as follows: :- \frac= \operatorname[\ln (X)]= \psi_1(\alpha) - \psi_1(\alpha + \beta) =_= \operatorname\left [- \frac \right ] = \ln \operatorname_ :- \frac = \operatorname ln (1-X)= \psi_1(\beta) - \psi_1(\alpha + \beta) =_= \operatorname\left [- \frac \right]= \ln \operatorname_ :- \frac = \operatorname[\ln X,\ln(1-X)] = -\psi_1(\alpha+\beta) =_= \operatorname\left [- \frac \right] = \ln \operatorname_ Since the Fisher information matrix is symmetric : \mathcal_= \mathcal_= \ln \operatorname_ The Fisher information components are equal to the log geometric variances and log geometric covariance. Therefore, they can be expressed as
trigamma function
s, denoted ψ1(α), the second of the
polygamma function
s, defined as the derivative of the digamma function: :\psi_1(\alpha) = \frac=\, \frac. These derivatives are also derived in the and plots of the log likelihood function are also shown in that section. contains plots and further discussion of the Fisher information matrix components: the log geometric variances and log geometric covariance as a function of the shape parameters α and β. contains formulas for moments of logarithmically transformed random variables. Images for the Fisher information components \mathcal_, \mathcal_ and \mathcal_ are shown in . The determinant of Fisher's information matrix is of interest (for example for the calculation of Jeffreys prior probability). From the expressions for the individual components of the Fisher information matrix, it follows that the determinant of Fisher's (symmetric) information matrix for the beta distribution is: :\begin \det(\mathcal(\alpha, \beta))&= \mathcal_ \mathcal_-\mathcal_ \mathcal_ \\ pt&=(\psi_1(\alpha) - \psi_1(\alpha + \beta))(\psi_1(\beta) - \psi_1(\alpha + \beta))-( -\psi_1(\alpha+\beta))( -\psi_1(\alpha+\beta))\\ pt&= \psi_1(\alpha)\psi_1(\beta)-( \psi_1(\alpha)+\psi_1(\beta))\psi_1(\alpha + \beta)\\ pt\lim_ \det(\mathcal(\alpha, \beta)) &=\lim_ \det(\mathcal(\alpha, \beta)) = \infty\\ pt\lim_ \det(\mathcal(\alpha, \beta)) &=\lim_ \det(\mathcal(\alpha, \beta)) = 0 \end From Sylvester's criterion (checking whether the diagonal elements are all positive), it follows that the Fisher information matrix for the two parameter case is Positive-definite matrix, positive-definite (under the standard condition that the shape parameters are positive ''α'' > 0 and ''β'' > 0).
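These three components are immediate to evaluate with the trigamma function (a sketch; polygamma(1, ·) in scipy.special is ψ1, and the parameter values are arbitrary):
 import numpy as np
 from scipy.special import polygamma

 def beta_fisher_information(a, b):
     # expected Fisher information matrix of Beta(a, b), per observation
     psi1 = lambda z: polygamma(1, z)
     i_aa = psi1(a) - psi1(a + b)    # = var[ln X]      (log geometric variance)
     i_bb = psi1(b) - psi1(a + b)    # = var[ln(1 - X)] (log geometric variance)
     i_ab = -psi1(a + b)             # = cov[ln X, ln(1 - X)]
     return np.array([[i_aa, i_ab], [i_ab, i_bb]])

 I = beta_fisher_information(2.0, 3.0)
 print(I)
 print("determinant:", np.linalg.det(I))
 print("positive definite:", bool(np.all(np.linalg.eigvalsh(I) > 0)))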


=Four parameters

= If ''Y''1, ..., ''YN'' are independent random variables each having a beta distribution with four parameters: the exponents ''α'' and ''β'', and also ''a'' (the minimum of the distribution range), and ''c'' (the maximum of the distribution range) (section titled "Alternative parametrizations", "Four parameters"), with
probability density function
: :f(y; \alpha, \beta, a, c) = \frac =\frac=\frac. the joint log likelihood function per ''N'' independent and identically distributed random variables, iid observations is: :\frac \ln(\mathcal (\alpha, \beta, a, c\mid Y))= \frac\sum_^N \ln (Y_i - a) + \frac\sum_^N \ln (c - Y_i)- \ln \Beta(\alpha,\beta) - (\alpha+\beta -1) \ln (c-a) For the four parameter case, the Fisher information has 4*4=16 components. It has 12 off-diagonal components = (4×4 total − 4 diagonal). Since the Fisher information matrix is symmetric, half of these components (12/2=6) are independent. Therefore, the Fisher information matrix has 6 independent off-diagonal + 4 diagonal = 10 independent components. Aryal and Nadarajah calculated Fisher's information matrix for the four parameter case as follows: :- \frac \frac= \operatorname[\ln (X)]= \psi_1(\alpha) - \psi_1(\alpha + \beta) = \mathcal_= \operatorname\left [- \frac \frac \right ] = \ln (\operatorname) :-\frac \frac = \operatorname ln (1-X)= \psi_1(\beta) - \psi_1(\alpha + \beta) =_= \operatorname \left [- \frac \frac \right ] = \ln(\operatorname) :-\frac \frac = \operatorname[\ln X,(1-X)] = -\psi_1(\alpha+\beta) =\mathcal_= \operatorname \left [- \frac\frac \right ] = \ln(\operatorname_) In the above expressions, the use of ''X'' instead of ''Y'' in the expressions var[ln(''X'')] = ln(var''GX'') is ''not an error''. The expressions in terms of the log geometric variances and log geometric covariance occur as functions of the two parameter ''X'' ~ Beta(''α'', ''β'') parametrization because when taking the partial derivatives with respect to the exponents (''α'', ''β'') in the four parameter case, one obtains the identical expressions as for the two parameter case: these terms of the four parameter Fisher information matrix are independent of the minimum ''a'' and maximum ''c'' of the distribution's range. The only non-zero term upon double differentiation of the log likelihood function with respect to the exponents ''α'' and ''β'' is the second derivative of the log of the beta function: ln(B(''α'', ''β'')). This term is independent of the minimum ''a'' and maximum ''c'' of the distribution's range. Double differentiation of this term results in trigamma functions. The sections titled "Maximum likelihood", "Two unknown parameters" and "Four unknown parameters" also show this fact. The Fisher information for ''N'' i.i.d. samples is ''N'' times the individual Fisher information (eq. 11.279, page 394 of Cover and Thomas). (Aryal and Nadarajah take a single observation, ''N'' = 1, to calculate the following components of the Fisher information, which leads to the same result as considering the derivatives of the log likelihood per ''N'' observations. Moreover, below the erroneous expression for _ in Aryal and Nadarajah has been corrected.) 
:\begin{align}
\alpha > 2: \quad \operatorname{E}\left[- \frac{1}{N}\frac{\partial^2\ln\mathcal{L}}{\partial a^2} \right] &= \mathcal{I}_{a,a}=\frac{\beta(\alpha+\beta-1)}{(\alpha-2)(c-a)^2} \\
\beta > 2: \quad \operatorname{E}\left[-\frac{1}{N}\frac{\partial^2\ln\mathcal{L}}{\partial c^2} \right] &= \mathcal{I}_{c,c} = \frac{\alpha(\alpha+\beta-1)}{(\beta-2)(c-a)^2} \\
\operatorname{E}\left[- \frac{1}{N}\frac{\partial^2\ln\mathcal{L}}{\partial a\,\partial c} \right] &= \mathcal{I}_{a,c} = \frac{\alpha+\beta-1}{(c-a)^2} \\
\alpha > 1: \quad \operatorname{E}\left[- \frac{1}{N}\frac{\partial^2\ln\mathcal{L}}{\partial \alpha\,\partial a} \right] &=\mathcal{I}_{\alpha,a} = \frac{\beta}{(\alpha-1)(c-a)} \\
\operatorname{E}\left[- \frac{1}{N}\frac{\partial^2\ln\mathcal{L}}{\partial \alpha\,\partial c} \right] &= \mathcal{I}_{\alpha,c} = \frac{1}{c-a} \\
\operatorname{E}\left[- \frac{1}{N}\frac{\partial^2\ln\mathcal{L}}{\partial \beta\,\partial a} \right] &= \mathcal{I}_{\beta,a} = -\frac{1}{c-a} \\
\beta > 1: \quad \operatorname{E}\left[- \frac{1}{N}\frac{\partial^2\ln\mathcal{L}}{\partial \beta\,\partial c} \right] &= \mathcal{I}_{\beta,c} = -\frac{\alpha}{(\beta-1)(c-a)}
\end{align}

The lower two diagonal entries of the Fisher information matrix, with respect to the parameter "a" (the minimum of the distribution's range), \mathcal{I}_{a,a}, and with respect to the parameter "c" (the maximum of the distribution's range), \mathcal{I}_{c,c}, are only defined for exponents α > 2 and β > 2 respectively. The Fisher information matrix component \mathcal{I}_{a,a} for the minimum "a" approaches infinity for exponent α approaching 2 from above, and the Fisher information matrix component \mathcal{I}_{c,c} for the maximum "c" approaches infinity for exponent β approaching 2 from above. The Fisher information matrix for the four-parameter case does not depend on the individual values of the minimum "a" and the maximum "c", but only on the total range (''c''−''a''). Moreover, the components of the Fisher information matrix that depend on the range (''c''−''a'') depend only through its inverse (or the square of the inverse), such that the Fisher information decreases for increasing range (''c''−''a''). The accompanying images show the Fisher information components involving the range parameters; images for the components \mathcal{I}_{\alpha,\alpha} and \mathcal{I}_{\beta,\beta} are shown in the section titled "Geometric variance". All these Fisher information components look like a basin, with the "walls" of the basin being located at low values of the parameters. The following four-parameter-beta-distribution Fisher information components can be expressed in terms of the two-parameter ''X'' ~ Beta(α, β) expectations of the transformed ratio ((1−''X'')/''X'') and of its mirror image (''X''/(1−''X'')), scaled by the range (''c''−''a''), which may be helpful for interpretation:

:\mathcal{I}_{\alpha,a} =\frac{\operatorname{E}\left[\frac{1-X}{X}\right]}{c-a}= \frac{\beta}{(\alpha-1)(c-a)} \text{ if }\alpha > 1

:\mathcal{I}_{\beta,c} = -\frac{\operatorname{E}\left[\frac{X}{1-X}\right]}{c-a}= -\frac{\alpha}{(\beta-1)(c-a)}\text{ if }\beta> 1

These are also the expected values of the "inverted beta distribution" or
beta prime distribution In probability theory and statistics, the beta prime distribution (also known as inverted beta distribution or beta distribution of the second kindJohnson et al (1995), p 248) is an absolutely continuous probability distribution. Definitions ...
(also known as beta distribution of the second kind or Pearson distribution, Pearson's Type VI) and its mirror image, scaled by the range (''c'' − ''a''). Also, the following Fisher information components can be expressed in terms of the harmonic (1/X) variances or of variances based on the ratio transformed variables ((1-X)/X) as follows:

:\begin{align}
\alpha > 2: \quad \mathcal{I}_{a,a} &=\operatorname{var}\left[\frac{1}{X}\right] \left(\frac{\alpha-1}{c-a}\right)^2 =\operatorname{var}\left[\frac{1-X}{X}\right] \left(\frac{\alpha-1}{c-a}\right)^2 = \frac{\beta(\alpha+\beta-1)}{(\alpha-2)(c-a)^2} \\
\beta > 2: \quad \mathcal{I}_{c,c} &= \operatorname{var}\left[\frac{1}{1-X}\right] \left(\frac{\beta-1}{c-a}\right)^2 = \operatorname{var}\left[\frac{X}{1-X}\right] \left(\frac{\beta-1}{c-a}\right)^2 =\frac{\alpha(\alpha+\beta-1)}{(\beta-2)(c-a)^2} \\
\mathcal{I}_{a,c} &=-\operatorname{cov}\left[\frac{1}{X},\frac{1}{1-X}\right]\frac{(\alpha-1)(\beta-1)}{(c-a)^2} = -\operatorname{cov}\left[\frac{1-X}{X},\frac{X}{1-X}\right] \frac{(\alpha-1)(\beta-1)}{(c-a)^2} =\frac{\alpha+\beta-1}{(c-a)^2}
\end{align}

See section "Moments of linearly transformed, product and inverted random variables" for these expectations. The determinant of Fisher's information matrix is of interest (for example for the calculation of Jeffreys prior probability). From the expressions for the individual components, it follows that the determinant of Fisher's (symmetric) information matrix for the beta distribution with four parameters is

:\det(\mathcal{I}(\alpha,\beta,a,c)) = \det\begin{pmatrix} \mathcal{I}_{\alpha,\alpha} & \mathcal{I}_{\alpha,\beta} & \mathcal{I}_{\alpha,a} & \mathcal{I}_{\alpha,c} \\ \mathcal{I}_{\alpha,\beta} & \mathcal{I}_{\beta,\beta} & \mathcal{I}_{\beta,a} & \mathcal{I}_{\beta,c} \\ \mathcal{I}_{\alpha,a} & \mathcal{I}_{\beta,a} & \mathcal{I}_{a,a} & \mathcal{I}_{a,c} \\ \mathcal{I}_{\alpha,c} & \mathcal{I}_{\beta,c} & \mathcal{I}_{a,c} & \mathcal{I}_{c,c} \end{pmatrix}, \text{ for } \alpha, \beta> 2,

a lengthy expression when expanded as a sum of products of the ten independent components listed above. Using Sylvester's criterion (checking whether the leading principal minors are all positive), and since the diagonal components \mathcal{I}_{a,a} and \mathcal{I}_{c,c} have Mathematical singularity, singularities at α=2 and β=2, it follows that the Fisher information matrix for the four-parameter case is Positive-definite matrix, positive-definite for α>2 and β>2. Since for α > 2 and β > 2 the beta distribution is (symmetric or unsymmetric) bell shaped, it follows that the Fisher information matrix is positive-definite only for bell-shaped (symmetric or unsymmetric) beta distributions, with inflection points located to either side of the mode. Thus, important well known distributions belonging to the four-parameter beta distribution family, like the parabolic distribution (Beta(2,2,a,c)) and the continuous uniform distribution, uniform distribution (Beta(1,1,a,c)) have Fisher information components (\mathcal{I}_{a,a},\mathcal{I}_{c,c},\mathcal{I}_{\alpha,a},\mathcal{I}_{\beta,c}) that blow up (approach infinity) in the four-parameter case (although their Fisher information components are all defined for the two-parameter case). The four-parameter Wigner semicircle distribution (Beta(3/2,3/2,''a'',''c'')) and arcsine distribution (Beta(1/2,1/2,''a'',''c'')) have negative Fisher information determinants for the four-parameter case.
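To make the structure of the four-parameter matrix concrete, here is a small numerical sketch (not part of the original article) that assembles the matrix from the component formulas above, in the parameter order (α, β, a, c), and checks positive-definiteness; the parameter values (α = 3, β = 4, a = 0, c = 10) are arbitrary and satisfy α, β > 2.

```python
# Minimal sketch (assumed parameter values): four-parameter Fisher information
# matrix of Beta(alpha, beta, a, c) per observation, built from the component
# formulas given above, ordered as (alpha, beta, a, c).
import numpy as np
from scipy.special import polygamma

def fisher_4param(alpha, beta, a, c):
    p1 = lambda z: polygamma(1, z)            # trigamma
    r = c - a                                 # total range
    I_alpha_alpha = p1(alpha) - p1(alpha + beta)
    I_beta_beta = p1(beta) - p1(alpha + beta)
    I_alpha_beta = -p1(alpha + beta)
    I_a_a = beta * (alpha + beta - 1) / ((alpha - 2) * r**2)    # needs alpha > 2
    I_c_c = alpha * (alpha + beta - 1) / ((beta - 2) * r**2)    # needs beta > 2
    I_a_c = (alpha + beta - 1) / r**2
    I_alpha_a = beta / ((alpha - 1) * r)                        # needs alpha > 1
    I_alpha_c = 1.0 / r
    I_beta_a = -1.0 / r
    I_beta_c = -alpha / ((beta - 1) * r)                        # needs beta > 1
    return np.array([
        [I_alpha_alpha, I_alpha_beta, I_alpha_a, I_alpha_c],
        [I_alpha_beta,  I_beta_beta,  I_beta_a,  I_beta_c ],
        [I_alpha_a,     I_beta_a,     I_a_a,     I_a_c    ],
        [I_alpha_c,     I_beta_c,     I_a_c,     I_c_c    ],
    ])

M = fisher_4param(3.0, 4.0, a=0.0, c=10.0)
print(np.linalg.det(M) > 0)                   # determinant positive here
print(np.all(np.linalg.eigvalsh(M) > 0))      # positive-definite for alpha, beta > 2
```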


Bayesian inference

The use of Beta distributions in Bayesian inference is due to the fact that they provide a family of conjugate prior probability distributions for binomial (including
Bernoulli Bernoulli can refer to: People *Bernoulli family of 17th and 18th century Swiss mathematicians: ** Daniel Bernoulli (1700–1782), developer of Bernoulli's principle **Jacob Bernoulli (1654–1705), also known as Jacques, after whom Bernoulli numbe ...
) and geometric distributions. The domain of the beta distribution can be viewed as a probability, and in fact the beta distribution is often used to describe the distribution of a probability value ''p'':

:P(p;\alpha,\beta) = \frac{p^{\alpha-1}(1-p)^{\beta-1}}{\Beta(\alpha,\beta)}.

Examples of beta distributions used as prior probabilities to represent ignorance of prior parameter values in Bayesian inference are Beta(1,1), Beta(0,0) and Beta(1/2,1/2).
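As a small illustration of this conjugacy (a sketch, not part of the original article), the posterior for ''p'' after observing ''s'' successes and ''f'' failures under a Beta(α0, β0) prior is simply Beta(α0 + ''s'', β0 + ''f''); the prior and data values below are hypothetical.

```python
# Minimal sketch (hypothetical numbers): conjugate beta-binomial update.
from scipy import stats

a0, b0 = 2.0, 2.0            # hypothetical prior Beta(2, 2)
s, f = 7, 3                  # hypothetical data: 7 successes, 3 failures

posterior = stats.beta(a0 + s, b0 + f)
print(posterior.mean())          # posterior mean of p, here 9/14
print(posterior.interval(0.95))  # central 95% credible interval
```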


Rule of succession

A classic application of the beta distribution is the rule of succession, introduced in the 18th century by Pierre-Simon Laplace in the course of treating the sunrise problem. It states that, given ''s'' successes in ''n'' conditional independence, conditionally independent Bernoulli trials with probability ''p'', the estimate of the expected value in the next trial is \frac{s+1}{n+2}. This estimate is the expected value of the posterior distribution over ''p,'' namely Beta(''s''+1, ''n''−''s''+1), which is given by Bayes' rule if one assumes a uniform prior probability over ''p'' (i.e., Beta(1, 1)) and then observes that ''p'' generated ''s'' successes in ''n'' trials. Laplace's rule of succession has been criticized by prominent scientists. R. T. Cox described Laplace's application of the rule of succession to the sunrise problem ( p. 89) as "a travesty of the proper use of the principle." Keynes remarks ( Ch.XXX, p. 382) "indeed this is so foolish a theorem that to entertain it is discreditable." Karl Pearson showed that the probability that the next (''n'' + 1) trials will be successes, after n successes in n trials, is only 50%, which has been considered too low by scientists like Jeffreys and unacceptable as a representation of the scientific process of experimentation to test a proposed scientific law. As pointed out by Jeffreys ( p. 128) (crediting C. D. Broad ) Laplace's rule of succession establishes a high probability of success ((n+1)/(n+2)) in the next trial, but only a moderate probability (50%) that a further sample (n+1) comparable in size will be equally successful. As pointed out by Perks, "The rule of succession itself is hard to accept. It assigns a probability to the next trial which implies the assumption that the actual run observed is an average run and that we are always at the end of an average run. It would, one would think, be more reasonable to assume that we were in the middle of an average run. Clearly a higher value for both probabilities is necessary if they are to accord with reasonable belief." These problems with Laplace's rule of succession motivated Haldane, Perks, Jeffreys and others to search for other forms of prior probability (see the next section). According to Jaynes, the main problem with the rule of succession is that it is not valid when s=0 or s=n (see rule of succession for an analysis of its validity).
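In code, the rule amounts to the posterior mean under a uniform Beta(1,1) prior; the following sketch (not part of the original article) uses hypothetical counts.

```python
# Minimal sketch (hypothetical counts): Laplace's rule of succession,
# the posterior mean of p under a Beta(1, 1) prior after s successes in n trials.
def rule_of_succession(s, n):
    return (s + 1) / (n + 2)

print(rule_of_succession(s=5, n=5))  # 6/7, after an unbroken run of 5 successes
```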


Bayes-Laplace prior probability (Beta(1,1))

The beta distribution achieves maximum differential entropy for Beta(1,1): the Uniform density, uniform probability density, for which all values in the domain of the distribution have equal density. This uniform distribution Beta(1,1) was suggested ("with a great deal of doubt") by Thomas Bayes as the prior probability distribution to express ignorance about the correct prior distribution. This prior distribution was adopted (apparently, from his writings, with little sign of doubt) by Pierre-Simon Laplace, and hence it was also known as the "Bayes-Laplace rule" or the "Laplace rule" of "inverse probability" in publications of the first half of the 20th century. In the later part of the 19th century and early part of the 20th century, scientists realized that the assumption of uniform "equal" probability density depended on the actual functions (for example whether a linear or a logarithmic scale was most appropriate) and parametrizations used. In particular, the behavior near the ends of distributions with finite support (for example near ''x'' = 0, for a distribution with initial support at ''x'' = 0) required particular attention. Keynes ( Ch.XXX, p. 381) criticized the use of Bayes's uniform prior probability (Beta(1,1)) that all values between zero and one are equiprobable, as follows: "Thus experience, if it shows anything, shows that there is a very marked clustering of statistical ratios in the neighborhoods of zero and unity, of those for positive theories and for correlations between positive qualities in the neighborhood of zero, and of those for negative theories and for correlations between negative qualities in the neighborhood of unity. "


Haldane's prior probability (Beta(0,0))

The Beta(0,0) distribution was proposed by J.B.S. Haldane, who suggested that the prior probability representing complete uncertainty should be proportional to ''p''^−1(1−''p'')^−1. The function ''p''^−1(1−''p'')^−1 can be viewed as the limit of the numerator of the beta distribution as both shape parameters approach zero: α, β → 0. The Beta function (in the denominator of the beta distribution) approaches infinity, for both parameters approaching zero, α, β → 0. Therefore, ''p''^−1(1−''p'')^−1 divided by the Beta function approaches a 2-point
Bernoulli distribution In probability theory and statistics, the Bernoulli distribution, named after Swiss mathematician Jacob Bernoulli,James Victor Uspensky: ''Introduction to Mathematical Probability'', McGraw-Hill, New York 1937, page 45 is the discrete probabi ...
with equal probability 1/2 at each end, at 0 and 1, and nothing in between, as α, β → 0. This is like a coin-toss, with one face of the coin at 0 and the other face at 1. The Haldane prior probability distribution Beta(0,0) is an "improper prior" because its integration (from 0 to 1) fails to strictly converge to 1 due to the singularities at each end. However, this is not an issue for computing posterior probabilities unless the sample size is very small. Furthermore, Zellner points out that on the log-odds scale, (the logit transformation ln(''p''/(1−''p''))), the Haldane prior is the uniformly flat prior. The fact that a uniform prior probability on the logit transformed variable ln(''p''/(1−''p'')) (with domain (-∞, ∞)) is equivalent to the Haldane prior on the domain
, 1 The comma is a punctuation mark that appears in several variants in different languages. It has the same shape as an apostrophe or single closing quotation mark () in many typefaces, but it differs from them in being placed on the baseline o ...
was pointed out by Harold Jeffreys in the first edition (1939) of his book Theory of Probability ( p. 123). Jeffreys writes "Certainly if we take the Bayes-Laplace rule right up to the extremes we are led to results that do not correspond to anybody's way of thinking. The (Haldane) rule d''x''/(''x''(1−''x'')) goes too far the other way. It would lead to the conclusion that if a sample is of one type with respect to some property there is a probability 1 that the whole population is of that type." The fact that "uniform" depends on the parametrization, led Jeffreys to seek a form of prior that would be invariant under different parametrizations.


Jeffreys' prior probability (Beta(1/2,1/2) for a Bernoulli or for a binomial distribution)

Harold Jeffreys proposed to use an
uninformative prior In Bayesian statistical inference, a prior probability distribution, often simply called the prior, of an uncertain quantity is the probability distribution that would express one's beliefs about this quantity before some evidence is taken into ...
probability measure that should be Parametrization invariance, invariant under reparameterization: proportional to the square root of the determinant of Fisher's information matrix. For the
Bernoulli distribution In probability theory and statistics, the Bernoulli distribution, named after Swiss mathematician Jacob Bernoulli,James Victor Uspensky: ''Introduction to Mathematical Probability'', McGraw-Hill, New York 1937, page 45 is the discrete probabi ...
, this can be shown as follows: for a coin that is "heads" with probability ''p'' ∈
, 1 The comma is a punctuation mark that appears in several variants in different languages. It has the same shape as an apostrophe or single closing quotation mark () in many typefaces, but it differs from them in being placed on the baseline o ...
and is "tails" with probability 1 − ''p'', for a given (H,T) ∈ the probability is ''pH''(1 − ''p'')''T''. Since ''T'' = 1 − ''H'', the
Bernoulli distribution In probability theory and statistics, the Bernoulli distribution, named after Swiss mathematician Jacob Bernoulli,James Victor Uspensky: ''Introduction to Mathematical Probability'', McGraw-Hill, New York 1937, page 45 is the discrete probabi ...
is ''p''^''H''(1 − ''p'')^(1 − ''H''). Considering ''p'' as the only parameter, it follows that the log likelihood for the Bernoulli distribution is

:\ln \mathcal{L} (p\mid H) = H \ln(p)+ (1-H) \ln(1-p).

The Fisher information matrix has only one component (it is a scalar, because there is only one parameter: ''p''), therefore:

:\begin{align}
\sqrt{\mathcal{I}(p)} &= \sqrt{\operatorname{E}\!\left[\left(\frac{d}{dp} \ln \mathcal{L}(p\mid H)\right)^2\right]} \\
&= \sqrt{\operatorname{E}\!\left[\left(\frac{H}{p}-\frac{1-H}{1-p}\right)^2\right]} \\
&= \sqrt{p\left(\frac{1}{p}\right)^2 + (1-p)\left(\frac{1}{1-p}\right)^2} \\
&= \frac{1}{\sqrt{p(1-p)}}.
\end{align}

Similarly, for the Binomial distribution with ''n'' Bernoulli trials, it can be shown that

:\sqrt{\mathcal{I}(p)}= \frac{\sqrt{n}}{\sqrt{p(1-p)}}.

Thus, for the
Bernoulli Bernoulli can refer to: People *Bernoulli family of 17th and 18th century Swiss mathematicians: ** Daniel Bernoulli (1700–1782), developer of Bernoulli's principle **Jacob Bernoulli (1654–1705), also known as Jacques, after whom Bernoulli numbe ...
, and Binomial distributions, Jeffreys prior is proportional to \scriptstyle \frac{1}{\sqrt{p(1-p)}}, which happens to be proportional to a beta distribution with domain variable ''x'' = ''p'', and shape parameters α = β = 1/2, the arcsine distribution:

:\Beta(\tfrac{1}{2}, \tfrac{1}{2}) = \frac{1}{\pi \sqrt{p(1-p)}}.

It will be shown in the next section that the normalizing constant for Jeffreys prior is immaterial to the final result because the normalizing constant cancels out in Bayes theorem for the posterior probability. Hence Beta(1/2,1/2) is used as the Jeffreys prior for both Bernoulli and binomial distributions. As shown in the next section, when using this expression as a prior probability times the likelihood in Bayes theorem, the posterior probability turns out to be a beta distribution. It is important to realize, however, that Jeffreys prior is proportional to \scriptstyle \frac{1}{\sqrt{p(1-p)}} for the Bernoulli and binomial distribution, but not for the beta distribution. Jeffreys prior for the beta distribution is given by the determinant of Fisher's information for the beta distribution, which, as shown in the section titled "Fisher information", is a function of the
trigamma function In mathematics, the trigamma function, denoted or , is the second of the polygamma functions, and is defined by : \psi_1(z) = \frac \ln\Gamma(z). It follows from this definition that : \psi_1(z) = \frac \psi(z) where is the digamma functio ...
ψ1 of shape parameters α and β as follows:

:\begin{align}
\sqrt{\det(\mathcal{I}(\alpha, \beta))} &= \sqrt{\psi_1(\alpha)\psi_1(\beta)-(\psi_1(\alpha)+\psi_1(\beta))\psi_1(\alpha+\beta)} \\
\lim_{\alpha\to 0} \sqrt{\det(\mathcal{I}(\alpha, \beta))} &=\lim_{\beta\to 0} \sqrt{\det(\mathcal{I}(\alpha, \beta))} = \infty \\
\lim_{\alpha\to \infty} \sqrt{\det(\mathcal{I}(\alpha, \beta))} &=\lim_{\beta\to \infty} \sqrt{\det(\mathcal{I}(\alpha, \beta))} = 0
\end{align}

As previously discussed, Jeffreys prior for the Bernoulli and binomial distributions is proportional to the arcsine distribution Beta(1/2,1/2), a one-dimensional ''curve'' that looks like a basin as a function of the parameter ''p'' of the Bernoulli and binomial distributions. The walls of the basin are formed by ''p'' approaching the singularities at the ends ''p'' → 0 and ''p'' → 1, where Beta(1/2,1/2) approaches infinity. Jeffreys prior for the beta distribution is a ''2-dimensional surface'' (embedded in a three-dimensional space) that looks like a basin with only two of its walls meeting at the corner α = β = 0 (and missing the other two walls) as a function of the shape parameters α and β of the beta distribution. The two adjoining walls of this 2-dimensional surface are formed by the shape parameters α and β approaching the singularities (of the trigamma function) at α, β → 0. It has no walls for α, β → ∞ because in this case the determinant of Fisher's information matrix for the beta distribution approaches zero. It will be shown in the next section that Jeffreys prior probability results in posterior probabilities (when multiplied by the binomial likelihood function) that are intermediate between the posterior probability results of the Haldane and Bayes prior probabilities. Jeffreys prior may be difficult to obtain analytically, and for some cases it just doesn't exist (even for simple distribution functions like the asymmetric triangular distribution). Berger, Bernardo and Sun, in a 2009 paper defined a reference prior probability distribution that (unlike Jeffreys prior) exists for the asymmetric triangular distribution. They cannot obtain a closed-form expression for their reference prior, but numerical calculations show it to be nearly perfectly fitted by the (proper) prior

:\operatorname{Beta}(\tfrac{1}{2}, \tfrac{1}{2}) \sim \frac{1}{\pi\sqrt{\theta(1-\theta)}}

where θ is the vertex variable for the asymmetric triangular distribution with support
, 1 The comma is a punctuation mark that appears in several variants in different languages. It has the same shape as an apostrophe or single closing quotation mark () in many typefaces, but it differs from them in being placed on the baseline o ...
(corresponding to the following parameter values in Wikipedia's article on the triangular distribution: vertex ''c'' = ''θ'', left end ''a'' = 0, and right end ''b'' = 1). Berger et al. also give a heuristic argument that Beta(1/2,1/2) could indeed be the exact Berger–Bernardo–Sun reference prior for the asymmetric triangular distribution. Therefore, Beta(1/2,1/2) is not only Jeffreys prior for the Bernoulli and binomial distributions, but also seems to be the Berger–Bernardo–Sun reference prior for the asymmetric triangular distribution (for which the Jeffreys prior does not exist), a distribution used in project management and PERT analysis to describe the cost and duration of project tasks. Clarke and Barron prove that, among continuous positive priors, Jeffreys prior (when it exists) asymptotically maximizes Shannon's mutual information between a sample of size n and the parameter, and therefore ''Jeffreys prior is the most uninformative prior'' (measuring information as Shannon information). The proof rests on an examination of the Kullback–Leibler divergence between probability density functions for iid random variables.
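The two objects discussed in this section can be written down directly; the following sketch (not part of the original article) evaluates the Jeffreys prior for the Bernoulli/binomial parameter and the (unnormalized) Jeffreys prior for the beta shape parameters, at arbitrary evaluation points.

```python
# Minimal sketch (assumed evaluation points): Jeffreys priors from this section.
import numpy as np
from scipy.special import polygamma
from scipy.stats import beta as beta_dist

def jeffreys_bernoulli(p):
    """Unnormalized Jeffreys prior for a Bernoulli/binomial parameter: 1/sqrt(p(1-p))."""
    return 1.0 / np.sqrt(p * (1.0 - p))

def jeffreys_beta_shapes(a, b):
    """Unnormalized Jeffreys prior for Beta(a, b): sqrt of det of its Fisher information."""
    p1 = lambda z: polygamma(1, z)
    return np.sqrt(p1(a) * p1(b) - (p1(a) + p1(b)) * p1(a + b))

p = 0.3
# 1/sqrt(p(1-p)) equals pi times the Beta(1/2, 1/2) (arcsine) density:
print(jeffreys_bernoulli(p), np.pi * beta_dist(0.5, 0.5).pdf(p))
print(jeffreys_beta_shapes(2.0, 3.0))   # decreases toward 0 as the shapes grow
```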


Effect of different prior probability choices on the posterior beta distribution

If samples are drawn from the population of a random variable ''X'' that result in ''s'' successes and ''f'' failures in "n" Bernoulli trials ''n'' = ''s'' + ''f'', then the
likelihood function The likelihood function (often simply called the likelihood) represents the probability of random variable realizations conditional on particular values of the statistical parameters. Thus, when evaluated on a given sample, the likelihood funct ...
for parameters ''s'' and ''f'' given ''x'' = ''p'' (the notation ''x'' = ''p'' in the expressions below will emphasize that the domain ''x'' stands for the value of the parameter ''p'' in the binomial distribution), is the following binomial distribution:

:\mathcal{L}(s,f\mid x=p) = {s+f \choose s} x^s(1-x)^f = {n \choose s} x^s(1-x)^{n-s}.

If beliefs about prior probability information are reasonably well approximated by a beta distribution with parameters ''α'' Prior and ''β'' Prior, then:

:\operatorname{PriorProbability}(x=p;\alpha \operatorname{Prior},\beta \operatorname{Prior}) = \frac{x^{\alpha \operatorname{Prior}-1}(1-x)^{\beta \operatorname{Prior}-1}}{\Beta(\alpha \operatorname{Prior},\beta \operatorname{Prior})}

According to Bayes' theorem for a continuous event space, the posterior probability is given by the product of the prior probability and the likelihood function (given the evidence ''s'' and ''f'' = ''n'' − ''s''), normalized so that the area under the curve equals one, as follows:

:\begin{align}
& \operatorname{posterior probability}(x=p\mid s,n-s) \\
= {} & \frac{\operatorname{PriorProbability}(x=p;\alpha \operatorname{Prior},\beta \operatorname{Prior})\,\mathcal{L}(s,f\mid x=p)}{\int_0^1 \operatorname{PriorProbability}(x;\alpha \operatorname{Prior},\beta \operatorname{Prior})\,\mathcal{L}(s,f\mid x)\,dx} \\
= {} & \frac{{n \choose s} x^{s+\alpha \operatorname{Prior}-1}(1-x)^{n-s+\beta \operatorname{Prior}-1}/\Beta(\alpha \operatorname{Prior},\beta \operatorname{Prior})}{\int_0^1 \left({n \choose s} x^{s+\alpha \operatorname{Prior}-1}(1-x)^{n-s+\beta \operatorname{Prior}-1}/\Beta(\alpha \operatorname{Prior},\beta \operatorname{Prior})\right) dx} \\
= {} & \frac{x^{s+\alpha \operatorname{Prior}-1}(1-x)^{n-s+\beta \operatorname{Prior}-1}}{\int_0^1 x^{s+\alpha \operatorname{Prior}-1}(1-x)^{n-s+\beta \operatorname{Prior}-1}\,dx} \\
= {} & \frac{x^{s+\alpha \operatorname{Prior}-1}(1-x)^{n-s+\beta \operatorname{Prior}-1}}{\Beta(s+\alpha \operatorname{Prior},n-s+\beta \operatorname{Prior})}.
\end{align}

The binomial coefficient

:{n \choose s}=\frac{n!}{s!(n-s)!}=\frac{\Gamma(n+1)}{\Gamma(s+1)\Gamma(n-s+1)}
appears both in the numerator and the denominator of the posterior probability, and it does not depend on the integration variable ''x'', hence it cancels out, and it is irrelevant to the final result. Similarly the normalizing factor for the prior probability, the beta function B(αPrior,βPrior), cancels out and it is immaterial to the final result. The same posterior probability result can be obtained if one uses an un-normalized prior

:x^{\alpha \operatorname{Prior}-1}(1-x)^{\beta \operatorname{Prior}-1}

because the normalizing factors all cancel out. Several authors (including Jeffreys himself) thus use an un-normalized prior formula since the normalization constant cancels out. The numerator of the posterior probability ends up being just the (un-normalized) product of the prior probability and the likelihood function, and the denominator is its integral from zero to one. The beta function in the denominator, B(''s'' + ''α'' Prior, ''n'' − ''s'' + ''β'' Prior), appears as a normalization constant to ensure that the total posterior probability integrates to unity. The ratio ''s''/''n'' of the number of successes to the total number of trials is a sufficient statistic in the binomial case, which is relevant for the following results.

For the Bayes' prior probability (Beta(1,1)), the posterior probability is:

:\operatorname{posterior probability}(p=x\mid s,f) = \frac{x^{s}(1-x)^{n-s}}{\Beta(s+1,n-s+1)}, \text{ with mean }=\frac{s+1}{n+2}\text{ (and mode }=\frac{s}{n}\text{ if } 0 < s < n).

For the Jeffreys' prior probability (Beta(1/2,1/2)), the posterior probability is:

:\operatorname{posterior probability}(p=x\mid s,f) = \frac{x^{s-\tfrac{1}{2}}(1-x)^{n-s-\tfrac{1}{2}}}{\Beta(s+\tfrac{1}{2},n-s+\tfrac{1}{2})},\text{ with mean } = \frac{s+\tfrac{1}{2}}{n+1}\text{ (and mode }=\frac{s-\tfrac{1}{2}}{n-1}\text{ if } \tfrac{1}{2} < s < n-\tfrac{1}{2}).

and for the Haldane prior probability (Beta(0,0)), the posterior probability is:

:\operatorname{posterior probability}(p=x\mid s,f) = \frac{x^{s-1}(1-x)^{n-s-1}}{\Beta(s,n-s)}, \text{ with mean } = \frac{s}{n}\text{ (and mode }=\frac{s-1}{n-2}\text{ if } 1 < s < n -1).

From the above expressions it follows that for ''s''/''n'' = 1/2 all the above three prior probabilities result in the identical location for the posterior probability mean = mode = 1/2. For ''s''/''n'' < 1/2, the means of the posterior probabilities, using the following priors, are such that: mean for Bayes prior > mean for Jeffreys prior > mean for Haldane prior. For ''s''/''n'' > 1/2 the order of these inequalities is reversed such that the Haldane prior probability results in the largest posterior mean. The ''Haldane'' prior probability Beta(0,0) results in a posterior probability density with ''mean'' (the expected value for the probability of success in the "next" trial) identical to the ratio ''s''/''n'' of the number of successes to the total number of trials. Therefore, the Haldane prior results in a posterior probability with expected value in the next trial equal to the maximum likelihood. The ''Bayes'' prior probability Beta(1,1) results in a posterior probability density with ''mode'' identical to the ratio ''s''/''n'' (the maximum likelihood). In the case that 100% of the trials have been successful (''s'' = ''n''), the ''Bayes'' prior probability Beta(1,1) results in a posterior expected value equal to the rule of succession (''n'' + 1)/(''n'' + 2), while the Haldane prior Beta(0,0) results in a posterior expected value of 1 (absolute certainty of success in the next trial). Jeffreys prior probability results in a posterior expected value equal to (''n'' + 1/2)/(''n'' + 1). Perks (p. 303) points out: "This provides a new rule of succession and expresses a 'reasonable' position to take up, namely, that after an unbroken run of n successes we assume a probability for the next trial equivalent to the assumption that we are about half-way through an average run, i.e. that we expect a failure once in (2''n'' + 2) trials. 
The Bayes–Laplace rule implies that we are about at the end of an average run or that we expect a failure once in (''n'' + 2) trials. The comparison clearly favours the new result (what is now called Jeffreys prior) from the point of view of 'reasonableness'." Conversely, in the case that 100% of the trials have resulted in failure (''s'' = 0), the ''Bayes'' prior probability Beta(1,1) results in a posterior expected value for success in the next trial equal to 1/(''n'' + 2), while the Haldane prior Beta(0,0) results in a posterior expected value of success in the next trial of 0 (absolute certainty of failure in the next trial). Jeffreys prior probability results in a posterior expected value for success in the next trial equal to (1/2)/(''n'' + 1), which Perks (p. 303) points out "is a much more reasonably remote result than the Bayes-Laplace result 1/(''n'' + 2)". Jaynes questions (for the uniform prior Beta(1,1)) the use of these formulas for the cases ''s'' = 0 or ''s'' = ''n'' because the integrals do not converge (Beta(1,1) is an improper prior for ''s'' = 0 or ''s'' = ''n''). In practice, the conditions 0 < ''s'' < ''n'' are usually met. Perks (p. 303) shows that, for what is now known as the Jeffreys prior, the probability that the next (''n'' + 1) trials will all be successes, after ''n'' successes in ''n'' trials, is ((''n'' + 1/2)/(''n'' + 1))((''n'' + 3/2)/(''n'' + 2))...((2''n'' + 1/2)/(2''n'' + 1)), which for ''n'' = 1, 2, 3 gives 15/24, 315/480, 9009/13440; rapidly approaching a limiting value of 1/\sqrt{2} = 0.70710678\ldots as ''n'' tends to infinity. Perks remarks that what is now known as the Jeffreys prior: "is clearly more 'reasonable' than either the Bayes-Laplace result or the result on the (Haldane) alternative rule rejected by Jeffreys which gives certainty as the probability. It clearly provides a very much better correspondence with the process of induction. Whether it is 'absolutely' reasonable for the purpose, i.e. whether it is yet large enough, without the absurdity of reaching unity, is a matter for others to decide. But it must be realized that the result depends on the assumption of complete indifference and absence of knowledge prior to the sampling experiment."

Following are the variances of the posterior distribution obtained with these three prior probability distributions: for the Bayes' prior probability (Beta(1,1)), the posterior variance is:

:\text{variance} = \frac{(s+1)(n-s+1)}{(n+2)^2(n+3)},\text{ which at } s=\frac{n}{2} \text{ results in variance } =\frac{1}{4n+12}

for the Jeffreys' prior probability (Beta(1/2,1/2)), the posterior variance is:

:\text{variance} = \frac{(s+\tfrac{1}{2})(n-s+\tfrac{1}{2})}{(n+1)^2(n+2)},\text{ which at } s=\frac{n}{2} \text{ results in variance } = \frac{1}{4n+8}

and for the Haldane prior probability (Beta(0,0)), the posterior variance is:

:\text{variance} = \frac{s(n-s)}{n^2(n+1)}, \text{ which at } s=\frac{n}{2}\text{ results in variance } =\frac{1}{4n+4}

So, as remarked by Silvey, for large ''n'', the variance is small and hence the posterior distribution is highly concentrated, whereas the assumed prior distribution was very diffuse. This is in accord with what one would hope for, as vague prior knowledge is transformed (through Bayes theorem) into a more precise posterior knowledge by an informative experiment. For small ''n'' the Haldane Beta(0,0) prior results in the largest posterior variance while the Bayes Beta(1,1) prior results in the more concentrated posterior. Jeffreys prior Beta(1/2,1/2) results in a posterior variance in between the other two. As ''n'' increases, the variance rapidly decreases so that the posterior variance for all three priors converges to approximately the same value (approaching zero variance as ''n'' → ∞). 
Recalling the previous result that the ''Haldane'' prior probability Beta(0,0) results in a posterior probability density with ''mean'' (the expected value for the probability of success in the "next" trial) identical to the ratio s/n of the number of successes to the total number of trials, it follows from the above expression that also the ''Haldane'' prior Beta(0,0) results in a posterior with ''variance'' identical to the variance expressed in terms of the max. likelihood estimate s/n and sample size (in the section titled "Alternative parametrizations", "Mean and sample size"):

:\text{variance} = \frac{\mu(1-\mu)}{1+\nu}= \frac{\frac{s}{n}\left(1-\frac{s}{n}\right)}{1+n}

with the mean ''μ'' = ''s''/''n'' and the sample size ''ν'' = ''n''. In Bayesian inference, using a prior distribution Beta(''α''Prior,''β''Prior) prior to a binomial distribution is equivalent to adding (''α''Prior − 1) pseudo-observations of "success" and (''β''Prior − 1) pseudo-observations of "failure" to the actual number of successes and failures observed, then estimating the parameter ''p'' of the binomial distribution by the proportion of successes over both real- and pseudo-observations. A uniform prior Beta(1,1) does not add (or subtract) any pseudo-observations since for Beta(1,1) it follows that (''α''Prior − 1) = 0 and (''β''Prior − 1) = 0. The Haldane prior Beta(0,0) subtracts one pseudo-observation from each and Jeffreys prior Beta(1/2,1/2) subtracts 1/2 pseudo-observation of success and 1/2 pseudo-observation of failure. This subtraction has the effect of smoothing out the posterior distribution. If the proportion of successes is not 50% (''s''/''n'' ≠ 1/2) values of ''α''Prior and ''β''Prior less than 1 (and therefore negative (''α''Prior − 1) and (''β''Prior − 1)) favor sparsity, i.e. distributions where the parameter ''p'' is closer to either 0 or 1. In effect, values of ''α''Prior and ''β''Prior between 0 and 1, when operating together, function as a concentration parameter. The accompanying plots show the posterior probability density functions for sample sizes ''n'' ∈ , successes ''s'' ∈  and Beta(''α''Prior,''β''Prior) ∈ . Also shown are the cases for ''n'' = , success ''s'' =  and Beta(''α''Prior,''β''Prior) ∈ . The first plot shows the symmetric cases, for successes ''s'' ∈ , with mean = mode = 1/2 and the second plot shows the skewed cases ''s'' ∈ . The images show that there is little difference between the priors for the posterior with sample size of 50 (characterized by a more pronounced peak near ''p'' = 1/2). Significant differences appear for very small sample sizes (in particular for the flatter distribution for the degenerate case of sample size = 3). Therefore, the skewed cases, with successes ''s'' = , show a larger effect from the choice of prior, at small sample size, than the symmetric cases. For symmetric distributions, the Bayes prior Beta(1,1) results in the most "peaky" and highest posterior distributions and the Haldane prior Beta(0,0) results in the flattest and lowest peak distribution. The Jeffreys prior Beta(1/2,1/2) lies in between them. For nearly symmetric, not too skewed distributions the effect of the priors is similar. For very small sample size (in this case for a sample size of 3) and skewed distribution (in this example for ''s'' ∈ ) the Haldane prior can result in a reverse-J-shaped distribution with a singularity at the left end. 
However, this happens only in degenerate cases (in this example ''n'' = 3 and hence ''s'' = 3/4 < 1, a degenerate value because s should be greater than unity in order for the posterior of the Haldane prior to have a mode located between the ends, and because ''s'' = 3/4 is not an integer number, hence it violates the initial assumption of a binomial distribution for the likelihood) and it is not an issue in generic cases of reasonable sample size (such that the condition 1 < ''s'' < ''n'' − 1, necessary for a mode to exist between both ends, is fulfilled). In Chapter 12 (p. 385) of his book, Jaynes asserts that the ''Haldane prior'' Beta(0,0) describes a ''prior state of knowledge of complete ignorance'', where we are not even sure whether it is physically possible for an experiment to yield either a success or a failure, while the ''Bayes (uniform) prior Beta(1,1) applies if'' one knows that ''both binary outcomes are possible''. Jaynes states: "''interpret the Bayes-Laplace (Beta(1,1)) prior as describing not a state of complete ignorance'', but the state of knowledge in which we have observed one success and one failure...once we have seen at least one success and one failure, then we know that the experiment is a true binary one, in the sense of physical possibility." Jaynes does not specifically discuss Jeffreys prior Beta(1/2,1/2) (Jaynes discussion of "Jeffreys prior" on pp. 181, 423 and on chapter 12 of Jaynes book refers instead to the improper, un-normalized, prior "1/''p'' ''dp''" introduced by Jeffreys in the 1939 edition of his book, seven years before he introduced what is now known as Jeffreys' invariant prior: the square root of the determinant of Fisher's information matrix. ''"1/p" is Jeffreys' (1946) invariant prior for the exponential distribution, not for the Bernoulli or binomial distributions''). However, it follows from the above discussion that Jeffreys Beta(1/2,1/2) prior represents a state of knowledge in between the Haldane Beta(0,0) and Bayes Beta (1,1) prior. Similarly, Karl Pearson in his 1892 book The Grammar of Science (p. 144 of 1900 edition) maintained that the Bayes (Beta(1,1) uniform prior was not a complete ignorance prior, and that it should be used when prior information justified to "distribute our ignorance equally"". K. Pearson wrote: "Yet the only supposition that we appear to have made is this: that, knowing nothing of nature, routine and anomy (from the Greek ανομία, namely: a- "without", and nomos "law") are to be considered as equally likely to occur. Now we were not really justified in making even this assumption, for it involves a knowledge that we do not possess regarding nature. We use our ''experience'' of the constitution and action of coins in general to assert that heads and tails are equally probable, but we have no right to assert before experience that, as we know nothing of nature, routine and breach are equally probable. In our ignorance we ought to consider before experience that nature may consist of all routines, all anomies (normlessness), or a mixture of the two in any proportion whatever, and that all such are equally probable. Which of these constitutions after experience is the most probable must clearly depend on what that experience has been like." 
If there is sufficient Sample (statistics), sampling data, ''and the posterior probability mode is not located at one of the extremes of the domain'' (x=0 or x=1), the three priors of Bayes (Beta(1,1)), Jeffreys (Beta(1/2,1/2)) and Haldane (Beta(0,0)) should yield similar posterior probability, ''posterior'' probability densities. Otherwise, as Gelman et al. (p. 65) point out, "if so few data are available that the choice of noninformative prior distribution makes a difference, one should put relevant information into the prior distribution", or as Berger (p. 125) points out "when different reasonable priors yield substantially different answers, can it be right to state that there ''is'' a single answer? Would it not be better to admit that there is scientific uncertainty, with the conclusion depending on prior beliefs?."
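The following sketch (not part of the original article) reproduces the comparison described in this section numerically, for hypothetical data with ''s''/''n'' < 1/2; the posterior means come out ordered Bayes > Jeffreys > Haldane, as stated above.

```python
# Minimal sketch (hypothetical data): posterior mean and variance under the
# Bayes Beta(1,1), Jeffreys Beta(1/2,1/2) and Haldane Beta(0,0) priors.
from scipy import stats

s, n = 3, 10   # hypothetical data with 0 < s < n, so every posterior is proper
priors = {"Bayes (1,1)": (1.0, 1.0),
          "Jeffreys (1/2,1/2)": (0.5, 0.5),
          "Haldane (0,0)": (0.0, 0.0)}

for name, (a0, b0) in priors.items():
    post = stats.beta(a0 + s, b0 + n - s)
    print(f"{name:>20}: mean = {post.mean():.4f}, variance = {post.var():.5f}")
# means: 4/12 = 0.3333, 3.5/11 = 0.3182, 3/10 = 0.3000
```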


Occurrence and applications


Order statistics

The beta distribution has an important application in the theory of order statistics. A basic result is that the distribution of the ''k''th smallest of a sample of size ''n'' from a continuous Uniform distribution (continuous), uniform distribution has a beta distribution.David, H. A., Nagaraja, H. N. (2003) ''Order Statistics'' (3rd Edition). Wiley, New Jersey pp 458. This result is summarized as:

:U_{(k)} \sim \operatorname{Beta}(k,n+1-k).

From this, and application of the theory related to the probability integral transform, the distribution of any individual order statistic from any continuous distribution can be derived.
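A quick simulation (a sketch, not part of the original article) illustrates the result: the ''k''th smallest of ''n'' uniform variates is compared against Beta(''k'', ''n''+1−''k'') with a Kolmogorov–Smirnov test; the values of ''n'' and ''k'' are arbitrary.

```python
# Minimal sketch (assumed n, k): the k-th order statistic of n uniform(0,1)
# samples follows Beta(k, n+1-k).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, k = 10, 3
samples = np.sort(rng.uniform(size=(20000, n)), axis=1)[:, k - 1]   # k-th smallest
print(stats.kstest(samples, stats.beta(k, n + 1 - k).cdf))          # large p-value expected
```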


Subjective logic

In standard logic, propositions are considered to be either true or false. In contradistinction, subjective logic assumes that humans cannot determine with absolute certainty whether a proposition about the real world is absolutely true or false. In subjective logic the A posteriori, posteriori probability estimates of binary events can be represented by beta distributions.A. Jøsang. A Logic for Uncertain Probabilities. ''International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems.'' 9(3), pp.279-311, June 2001
PDF


Wavelet analysis

A wavelet is a wave-like oscillation with an amplitude that starts out at zero, increases, and then decreases back to zero. It can typically be visualized as a "brief oscillation" that promptly decays. Wavelets can be used to extract information from many different kinds of data, including – but certainly not limited to – audio signals and images. Thus, wavelets are purposefully crafted to have specific properties that make them useful for signal processing. Wavelets are localized in both time and frequency whereas the standard Fourier transform is only localized in frequency. Therefore, standard Fourier Transforms are only applicable to stationary processes, while wavelets are applicable to non-stationary processes. Continuous wavelets can be constructed based on the beta distribution. Beta waveletsH.M. de Oliveira and G.A.A. Araújo,. Compactly Supported One-cyclic Wavelets Derived from Beta Distributions. ''Journal of Communication and Information Systems.'' vol.20, n.3, pp.27-33, 2005. can be viewed as a soft variety of Haar wavelets whose shape is fine-tuned by two shape parameters α and β.


Population genetics

The Balding–Nichols model is a two-parameter parametrization of the beta distribution used in population genetics. It is a statistical description of the allele frequencies in the components of a sub-divided population:

:\begin{align}
\alpha &= \mu \nu,\\
\beta &= (1 - \mu) \nu,
\end{align}

where \nu =\alpha+\beta= \frac{1-F}{F} and 0 < F < 1; here ''F'' is (Wright's) genetic distance between two populations.
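The reparametrization is a one-line mapping; the sketch below (not part of the original article) converts hypothetical values of ''F'' and the mean allele frequency μ into the beta shape parameters.

```python
# Minimal sketch (hypothetical F and mu): Balding-Nichols shape parameters.
def balding_nichols_shapes(F, mu):
    """Return (alpha, beta) = (mu*nu, (1-mu)*nu) with nu = (1-F)/F."""
    nu = (1.0 - F) / F
    return mu * nu, (1.0 - mu) * nu

print(balding_nichols_shapes(F=0.1, mu=0.3))   # (2.7, 6.3)
```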


Project management: task cost and schedule modeling

The beta distribution can be used to model events which are constrained to take place within an interval defined by a minimum and maximum value. For this reason, the beta distribution — along with the triangular distribution — is used extensively in PERT, critical path method (CPM), Joint Cost Schedule Modeling (JCSM) and other project management/control systems to describe the time to completion and the cost of a task. In project management, shorthand computations are widely used to estimate the mean and standard deviation of the beta distribution:

:\begin{align}
\mu(X) & = \frac{a + 4b + c}{6} \\
\sigma(X) & = \frac{c-a}{6}
\end{align}

where ''a'' is the minimum, ''c'' is the maximum, and ''b'' is the most likely value (the
mode Mode ( la, modus meaning "manner, tune, measure, due measure, rhythm, melody") may refer to: Arts and entertainment * '' MO''D''E (magazine)'', a defunct U.S. women's fashion magazine * ''Mode'' magazine, a fictional fashion magazine which is ...
for ''α'' > 1 and ''β'' > 1). The above estimate for the mean \mu(X)= \frac{a + 4b + c}{6} is known as the PERT three-point estimation and it is exact for either of the following values of ''β'' (for arbitrary α within these ranges): :''β'' = ''α'' > 1 (symmetric case) with standard deviation \sigma(X) = \frac{c-a}{2\sqrt{2\alpha+1}},
skewness In probability theory and statistics, skewness is a measure of the asymmetry of the probability distribution of a real-valued random variable about its mean. The skewness value can be positive, zero, negative, or undefined. For a unimodal ...
= 0, and
excess kurtosis In probability theory and statistics, kurtosis (from el, κυρτός, ''kyrtos'' or ''kurtos'', meaning "curved, arching") is a measure of the "tailedness" of the probability distribution of a real-valued random variable. Like skewness, kurtosi ...
= \frac{-6}{2\alpha + 3} or :''β'' = 6 − ''α'' for 5 > ''α'' > 1 (skewed case) with standard deviation :\sigma(X) = \frac{(c-a)\sqrt{\alpha(6-\alpha)}}{6\sqrt{7}},
skewness In probability theory and statistics, skewness is a measure of the asymmetry of the probability distribution of a real-valued random variable about its mean. The skewness value can be positive, zero, negative, or undefined. For a unimodal ...
= \frac{(3-\alpha)\sqrt{7}}{2\sqrt{\alpha(6-\alpha)}}, and
excess kurtosis In probability theory and statistics, kurtosis (from el, κυρτός, ''kyrtos'' or ''kurtos'', meaning "curved, arching") is a measure of the "tailedness" of the probability distribution of a real-valued random variable. Like skewness, kurtosi ...
= \frac{21}{\alpha(6-\alpha)} - 3 The above estimate for the standard deviation ''σ''(''X'') = (''c'' − ''a'')/6 is exact for either of the following values of ''α'' and ''β'': :''α'' = ''β'' = 4 (symmetric) with
skewness In probability theory and statistics, skewness is a measure of the asymmetry of the probability distribution of a real-valued random variable about its mean. The skewness value can be positive, zero, negative, or undefined. For a unimodal ...
= 0, and
excess kurtosis In probability theory and statistics, kurtosis (from el, κυρτός, ''kyrtos'' or ''kurtos'', meaning "curved, arching") is a measure of the "tailedness" of the probability distribution of a real-valued random variable. Like skewness, kurtosi ...
= −6/11. :''β'' = 6 − ''α'' and \alpha = 3 - \sqrt2 (right-tailed, positive skew) with
skewness In probability theory and statistics, skewness is a measure of the asymmetry of the probability distribution of a real-valued random variable about its mean. The skewness value can be positive, zero, negative, or undefined. For a unimodal ...
=\frac{1}{\sqrt{2}}, and
excess kurtosis In probability theory and statistics, kurtosis (from el, κυρτός, ''kyrtos'' or ''kurtos'', meaning "curved, arching") is a measure of the "tailedness" of the probability distribution of a real-valued random variable. Like skewness, kurtosi ...
= 0 :''β'' = 6 − ''α'' and \alpha = 3 + \sqrt2 (left-tailed, negative skew) with
skewness In probability theory and statistics, skewness is a measure of the asymmetry of the probability distribution of a real-valued random variable about its mean. The skewness value can be positive, zero, negative, or undefined. For a unimodal ...
= -\frac{1}{\sqrt{2}}, and
excess kurtosis In probability theory and statistics, kurtosis (from el, κυρτός, ''kyrtos'' or ''kurtos'', meaning "curved, arching") is a measure of the "tailedness" of the probability distribution of a real-valued random variable. Like skewness, kurtosi ...
= 0 Otherwise, these can be poor approximations for beta distributions with other values of α and β, exhibiting average errors of 40% in the mean and 549% in the variance.
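The shorthand PERT estimates and the exact beta moments are easy to compare directly; the sketch below (not part of the original article) uses the symmetric case α = β = 4, one of the cases listed above in which both shorthands are exact, on a hypothetical interval [''a'', ''c''].

```python
# Minimal sketch (hypothetical interval): PERT shorthand vs exact beta moments
# for alpha = beta = 4 on [a, c], where both shorthands are exact.
import math

def pert_estimates(a, b, c):
    """PERT three-point estimates: mean (a + 4b + c)/6 and sd (c - a)/6."""
    return (a + 4 * b + c) / 6.0, (c - a) / 6.0

a, c = 2.0, 14.0
alpha = beta = 4.0
b = (a + c) / 2.0           # the mode of the symmetric Beta(4, 4) scaled to [a, c]

exact_mean = a + (c - a) * alpha / (alpha + beta)
exact_sd = (c - a) * math.sqrt(alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1)))
print(pert_estimates(a, b, c))   # (8.0, 2.0)
print(exact_mean, exact_sd)      # 8.0 2.0
```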


Random variate generation

If ''X'' and ''Y'' are independent, with X \sim \Gamma(\alpha, \theta) and Y \sim \Gamma(\beta, \theta) then

:\frac{X}{X+Y} \sim \Beta(\alpha, \beta).

So one algorithm for generating beta variates is to generate \frac{X}{X+Y}, where ''X'' is a Gamma distribution#Generating gamma-distributed random variables, gamma variate with parameters (α, 1) and ''Y'' is an independent gamma variate with parameters (β, 1). In fact, here \frac{X}{X+Y} and X+Y are independent, and X+Y \sim \Gamma(\alpha + \beta, \theta). If Z \sim \Gamma(\gamma, \theta) and Z is independent of X and Y, then \frac{X+Y}{X+Y+Z} \sim \Beta(\alpha+\beta,\gamma) and \frac{X+Y}{X+Y+Z} is independent of \frac{X}{X+Y}. This shows that the product of independent \Beta(\alpha,\beta) and \Beta(\alpha+\beta,\gamma) random variables is a \Beta(\alpha,\beta+\gamma) random variable. Also, the ''k''th order statistic of ''n'' Uniform distribution (continuous), uniformly distributed variates is \Beta(k, n+1-k), so an alternative if α and β are small integers is to generate α + β − 1 uniform variates and choose the α-th smallest. Another way to generate the Beta distribution is by the Pólya urn model. According to this method, one starts with an "urn" with α "black" balls and β "white" balls and draws uniformly with replacement. At every trial an additional ball is added according to the color of the last ball which was drawn. Asymptotically, the proportion of black and white balls will be distributed according to the Beta distribution, where each repetition of the experiment will produce a different value. It is also possible to use the inverse transform sampling.
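The gamma-ratio construction described above translates directly into code; the sketch below (not part of the original article) compares it with NumPy's built-in beta sampler for arbitrary shape parameters.

```python
# Minimal sketch (assumed shapes): Beta(alpha, beta) variates as X/(X+Y) for
# independent gamma variates X ~ Gamma(alpha, 1), Y ~ Gamma(beta, 1).
import numpy as np

rng = np.random.default_rng(1)
alpha, beta, size = 2.5, 4.0, 100_000

x = rng.gamma(shape=alpha, scale=1.0, size=size)
y = rng.gamma(shape=beta, scale=1.0, size=size)
via_gamma = x / (x + y)                 # ~ Beta(alpha, beta)
direct = rng.beta(alpha, beta, size=size)

print(via_gamma.mean(), direct.mean(), alpha / (alpha + beta))  # all close to 0.3846
```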


History

Thomas Bayes, in a posthumous paper published in 1763 by Richard Price, obtained a beta distribution as the density of the probability of success in Bernoulli trials (see ), but the paper does not analyze any of the moments of the beta distribution or discuss any of its properties. The first systematic modern discussion of the beta distribution is probably due to Karl Pearson. In Pearson's papers the beta distribution is couched as a solution of a differential equation: Pearson distribution, Pearson's Type I distribution which it is essentially identical to except for arbitrary shifting and re-scaling (the beta and Pearson Type I distributions can always be equalized by proper choice of parameters). In fact, in several English books and journal articles in the few decades prior to World War II, it was common to refer to the beta distribution as Pearson's Type I distribution. William Palin Elderton, William P. Elderton in his 1906 monograph "Frequency curves and correlation" further analyzes the beta distribution as Pearson's Type I distribution, including a full discussion of the method of moments for the four parameter case, and diagrams of (what Elderton describes as) U-shaped, J-shaped, twisted J-shaped, "cocked-hat" shapes, horizontal and angled straight-line cases. Elderton wrote "I am chiefly indebted to Professor Pearson, but the indebtedness is of a kind for which it is impossible to offer formal thanks." William Palin Elderton, Elderton in his 1906 monograph provides an impressive amount of information on the beta distribution, including equations for the origin of the distribution chosen to be the mode, as well as for other Pearson distributions: types I through VII. Elderton also included a number of appendixes, including one appendix ("II") on the beta and gamma functions. In later editions, Elderton added equations for the origin of the distribution chosen to be the mean, and analysis of Pearson distributions VIII through XII. As remarked by Bowman and Shenton "Fisher and Pearson had a difference of opinion in the approach to (parameter) estimation, in particular relating to (Pearson's method of) moments and (Fisher's method of) maximum likelihood in the case of the Beta distribution." Also according to Bowman and Shenton, "the case of a Type I (beta distribution) model being the center of the controversy was pure serendipity. A more difficult model of 4 parameters would have been hard to find." The long running public conflict of Fisher with Karl Pearson can be followed in a number of articles in prestigious journals. For example, concerning the estimation of the four parameters for the beta distribution, and Fisher's criticism of Pearson's method of moments as being arbitrary, see Pearson's article "Method of moments and method of maximum likelihood" (published three years after his retirement from University College, London, where his position had been divided between Fisher and Pearson's son Egon) in which Pearson writes "I read (Koshai's paper in the Journal of the Royal Statistical Society, 1933) which as far as I am aware is the only case at present published of the application of Professor Fisher's method. To my astonishment that method depends on first working out the constants of the frequency curve by the (Pearson) Method of Moments and then superposing on it, by what Fisher terms "the Method of Maximum Likelihood" a further approximation to obtain, what he holds, he will thus get, "more efficient values" of the curve constants." 
David and Edwards's treatise on the history of statistics cites the first modern treatment of the beta distribution, in 1911, using the beta designation that has become standard, due to Corrado Gini, an Italian statistician, demography, demographer, and sociology, sociologist, who developed the Gini coefficient. Norman Lloyd Johnson, N.L.Johnson and Samuel Kotz, S.Kotz, in their comprehensive and very informative monograph on leading historical personalities in statistical sciences credit Corrado Gini as "an early Bayesian...who dealt with the problem of eliciting the parameters of an initial Beta distribution, by singling out techniques which anticipated the advent of the so-called empirical Bayes approach."


References


External links


"Beta Distribution"
by Fiona Maclachlan, the Wolfram Demonstrations Project, 2007.
Beta Distribution – Overview and Example
xycoon.com

brighton-webs.co.uk

exstrom.com
Harvard University Statistics 110 Lecture 23 Beta Distribution, Prof. Joe Blitzstein
{{DEFAULTSORT:Beta Distribution Continuous distributions Factorial and binomial topics Conjugate prior distributions Exponential family distributions]">X - E[X]&=\lim_ \operatorname ]_=_\frac_ The_mean_absolute_deviation_around_the_mean_is_a_more_robust_ Robustness_is_the_property_of_being_strong_and_healthy_in_constitution._When_it_is_transposed_into_a_system,_it_refers_to_the_ability_of_tolerating_perturbations_that_might_affect_the_system’s_functional_body._In_the_same_line_''robustness''_ca_...
_
estimator In statistics, an estimator is a rule for calculating an estimate of a given quantity based on observed data: thus the rule (the estimator), the quantity of interest (the estimand) and its result (the estimate) are distinguished. For example, the ...
_of_
statistical_dispersion In statistics, dispersion (also called variability, scatter, or spread) is the extent to which a distribution is stretched or squeezed. Common examples of measures of statistical dispersion are the variance, standard deviation, and interquartile ...
_than_the_standard_deviation_for_beta_distributions_with_tails_and_inflection_points_at_each_side_of_the_mode,_Beta(''α'', ''β'')_distributions_with_''α'',''β''_>_2,_as_it_depends_on_the_linear_(absolute)_deviations_rather_than_the_square_deviations_from_the_mean.__Therefore,_the_effect_of_very_large_deviations_from_the_mean_are_not_as_overly_weighted. Using_
Stirling's_approximation In mathematics, Stirling's approximation (or Stirling's formula) is an approximation for factorials. It is a good approximation, leading to accurate results even for small values of n. It is named after James Stirling, though a related but less p ...
_to_the_Gamma_function,_Norman_Lloyd_Johnson, N.L.Johnson_and_Samuel_Kotz, S.Kotz_derived_the_following_approximation_for_values_of_the_shape_parameters_greater_than_unity_(the_relative_error_for_this_approximation_is_only_−3.5%_for_''α''_=_''β''_=_1,_and_it_decreases_to_zero_as_''α''_→_∞,_''β''_→_∞): :_\begin \frac_&=\frac\\ &\approx_\sqrt_\left(1+\frac-\frac-\frac_\right),_\text_\alpha,_\beta_>_1. \end At_the_limit_α_→_∞,_β_→_∞,_the_ratio_of_the_mean_absolute_deviation_to_the_standard_deviation_(for_the_beta_distribution)_becomes_equal_to_the_ratio_of_the_same_measures_for_the_normal_distribution:_\sqrt.__For_α_=_β_=_1_this_ratio_equals_\frac,_so_that_from_α_=_β_=_1_to_α,_β_→_∞_the_ratio_decreases_by_8.5%.__For_α_=_β_=_0_the_standard_deviation_is_exactly_equal_to_the_mean_absolute_deviation_around_the_mean._Therefore,_this_ratio_decreases_by_15%_from_α_=_β_=_0_to_α_=_β_=_1,_and_by_25%_from_α_=_β_=_0_to_α,_β_→_∞_._However,_for_skewed_beta_distributions_such_that_α_→_0_or_β_→_0,_the_ratio_of_the_standard_deviation_to_the_mean_absolute_deviation_approaches_infinity_(although_each_of_them,_individually,_approaches_zero)_because_the_mean_absolute_deviation_approaches_zero_faster_than_the_standard_deviation. Using_the__parametrization_in_terms_of_mean_μ_and_sample_size_ν_=_α_+_β_>_0: :α_=_μν,_β_=_(1−μ)ν one_can_express_the_mean_absolute_deviation_around_the_mean_in_terms_of_the_mean_μ_and_the_sample_size_ν_as_follows: :\operatorname[, _X_-_E ]_=_\frac For_a_symmetric_distribution,_the_mean_is_at_the_middle_of_the_distribution,_μ_=_1/2,_and_therefore: :_\begin \operatorname[, X_-_E ]__=_\frac_&=_\frac_\\ \lim__\left_(\lim__\operatorname[, X_-_E ]_\right_)_&=_\tfrac\\ \lim__\left_(\lim__\operatorname[, _X_-_E ]_\right_)_&=_0 \end Also,_the_following_limits_(with_only_the_noted_variable_approaching_the_limit)_can_be_obtained_from_the_above_expressions: :_\begin \lim__\operatorname[, X_-_E ]_&=\lim__\operatorname[, X_-_E ]=_0_\\ \lim__\operatorname[, X_-_E ]_&=\lim__\operatorname[, X_-_E ]_=_0\\ \lim__\operatorname[, X_-_E ]&=\lim__\operatorname[, X_-_E ]_=_0\\ \lim__\operatorname[, X_-_E ]_&=_\sqrt_\\ \lim__\operatorname[, X_-_E ]_&=_0 \end


_Mean_absolute_difference

The_mean_absolute_difference_for_the_Beta_distribution_is: :\mathrm_=_\int_0^1_\int_0^1_f(x;\alpha,\beta)\,f(y;\alpha,\beta)\,, x-y, \,dx\,dy_=_\left(\frac\right)\frac The_Gini_coefficient_for_the_Beta_distribution_is_half_of_the_relative_mean_absolute_difference: :\mathrm_=_\left(\frac\right)\frac


_Skewness

The_skewness_ In_probability_theory_and_statistics,_skewness_is_a_measure_of_the_asymmetry_of_the_probability_distribution_of_a__real-valued_random_variable_about_its_mean._The_skewness_value_can_be_positive,_zero,_negative,_or_undefined. For_a_unimodal__...
_(the_third_moment_centered_on_the_mean,_normalized_by_the_3/2_power_of_the_variance)_of_the_beta_distribution_is :\gamma_1_=\frac_=_\frac_. Letting_α_=_β_in_the_above_expression_one_obtains_γ1_=_0,_showing_once_again_that_for_α_=_β_the_distribution_is_symmetric_and_hence_the_skewness_is_zero._Positive_skew_(right-tailed)_for_α_<_β,_negative_skew_(left-tailed)_for_α_>_β. Using_the__parametrization_in_terms_of_mean_μ_and_sample_size_ν_=_α_+_β: :_\begin __\alpha_&__=_\mu_\nu_,\text\nu_=(\alpha_+_\beta)__>0\\ __\beta_&__=_(1_-_\mu)_\nu_,_\text\nu_=(\alpha_+_\beta)__>0. \end one_can_express_the_skewness_in_terms_of_the_mean_μ_and_the_sample_size_ν_as_follows: :\gamma_1_=\frac_=_\frac. The_skewness_can_also_be_expressed_just_in_terms_of_the_variance_''var''_and_the_mean_μ_as_follows: :\gamma_1_=\frac_=_\frac\text_\operatorname_<_\mu(1-\mu) The_accompanying_plot_of_skewness_as_a_function_of_variance_and_mean_shows_that_maximum_variance_(1/4)_is_coupled_with_zero_skewness_and_the_symmetry_condition_(μ_=_1/2),_and_that_maximum_skewness_(positive_or_negative_infinity)_occurs_when_the_mean_is_located_at_one_end_or_the_other,_so_that_the_"mass"_of_the_probability_distribution_is_concentrated_at_the_ends_(minimum_variance). The_following_expression_for_the_square_of_the_skewness,_in_terms_of_the_sample_size_ν_=_α_+_β_and_the_variance_''var'',_is_useful_for_the_method_of_moments_estimation_of_four_parameters: :(\gamma_1)^2_=\frac_=_\frac\bigg(\frac-4(1+\nu)\bigg) This_expression_correctly_gives_a_skewness_of_zero_for_α_=_β,_since_in_that_case_(see_):_\operatorname_=_\frac. For_the_symmetric_case_(α_=_β),_skewness_=_0_over_the_whole_range,_and_the_following_limits_apply: :\lim__\gamma_1_=_\lim__\gamma_1_=\lim__\gamma_1=\lim__\gamma_1=\lim__\gamma_1_=_0 For_the_asymmetric_cases_(α_≠_β)_the_following_limits_(with_only_the_noted_variable_approaching_the_limit)_can_be_obtained_from_the_above_expressions: :_\begin &\lim__\gamma_1_=\lim__\gamma_1_=_\infty\\ &\lim__\gamma_1__=_\lim__\gamma_1=_-_\infty\\ &\lim__\gamma_1_=_-\frac,\quad_\lim_(\lim__\gamma_1)_=_-\infty,\quad_\lim_(\lim__\gamma_1)_=_0\\ &\lim__\gamma_1_=_\frac,\quad_\lim_(\lim__\gamma_1)_=_\infty,\quad_\lim_(\lim__\gamma_1)_=_0\\ &\lim__\gamma_1_=_\frac,\quad_\lim_(\lim__\gamma_1)__=_\infty,\quad_\lim_(\lim__\gamma_1)_=_-_\infty \end


_Kurtosis

The_beta_distribution_has_been_applied_in_acoustic_analysis_to_assess_damage_to_gears,_as_the_kurtosis_of_the_beta_distribution_has_been_reported_to_be_a_good_indicator_of_the_condition_of_a_gear.
_Kurtosis_has_also_been_used_to_distinguish_the_seismic_signal_generated_by_a_person's_footsteps_from_other_signals._As_persons_or_other_targets_moving_on_the_ground_generate_continuous_signals_in_the_form_of_seismic_waves,_one_can_separate_different_targets_based_on_the_seismic_waves_they_generate._Kurtosis_is_sensitive_to_impulsive_signals,_so_it's_much_more_sensitive_to_the_signal_generated_by_human_footsteps_than_other_signals_generated_by_vehicles,_winds,_noise,_etc.
Unfortunately, the notation for kurtosis has not been standardized. Kenney and Keeping use the symbol γ2 for the excess kurtosis, but Abramowitz and Stegun use different terminology. To prevent confusion between kurtosis (the fourth moment centered on the mean, normalized by the square of the variance) and excess kurtosis, when using symbols, they will be spelled out as follows:

:\begin{align}
\text{excess kurtosis} &=\text{kurtosis} - 3\\
&=\frac{\operatorname{E}[(X-\mu)^4]}{(\operatorname{var}(X))^2}-3\\
&=\frac{6[(\alpha-\beta)^2(\alpha+\beta+1)-\alpha\beta(\alpha+\beta+2)]}{\alpha\beta(\alpha+\beta+2)(\alpha+\beta+3)} .
\end{align}

Letting α = β in the above expression one obtains

:\text{excess kurtosis} =- \frac{6}{3+2\alpha} \text{ if }\alpha=\beta .

Therefore, for symmetric beta distributions, the excess kurtosis is negative, increasing from a minimum value of −2 at the limit as α = β → 0, and approaching a maximum value of zero as α = β → ∞. The value of −2 is the minimum value of excess kurtosis that any distribution (not just beta distributions, but any distribution of any possible kind) can ever achieve. This minimum value is reached when all the probability density is entirely concentrated at each end ''x'' = 0 and ''x'' = 1, with nothing in between: a 2-point Bernoulli distribution with equal probability 1/2 at each end (a coin toss: see the section "Kurtosis bounded by the square of the skewness" below for further discussion). The description of kurtosis as a measure of the "potential outliers" (or "potential rare, extreme values") of the probability distribution is correct for all distributions including the beta distribution: when rare, extreme values can occur in the beta distribution, the higher its kurtosis; otherwise, the kurtosis is lower. For α ≠ β, skewed beta distributions, the excess kurtosis can reach unlimited positive values (particularly for α → 0 for finite β, or for β → 0 for finite α) because the side away from the mode will produce occasional extreme values. Minimum kurtosis takes place when the mass density is concentrated equally at each end (and therefore the mean is at the center), and there is no probability mass density in between the ends.

Using the parametrization in terms of mean μ and sample size ν = α + β:

: \begin{align}
  \alpha & = \mu \nu ,\text{ where }\nu =(\alpha + \beta)  >0\\
  \beta & = (1 - \mu) \nu , \text{ where }\nu =(\alpha + \beta)  >0.
\end{align}

one can express the excess kurtosis in terms of the mean μ and the sample size ν as follows:

:\text{excess kurtosis} =\frac{6}{3+\nu}\bigg(\frac{(1-2\mu)^2(1+\nu)}{\mu(1-\mu)(2+\nu)} - 1 \bigg)

The excess kurtosis can also be expressed in terms of just the following two parameters: the variance ''var'', and the sample size ν as follows:

:\text{excess kurtosis} =\frac{6}{(3+\nu)(2+\nu)}\left(\frac{1}{\operatorname{var}} - 6 - 5\nu \right)\text{ if }\operatorname{var}< \mu(1-\mu)

and, in terms of the variance ''var'' and the mean μ as follows:

:\text{excess kurtosis} =\frac{6\operatorname{var}\,(1-\operatorname{var}-5\mu(1-\mu))}{(\operatorname{var}+\mu(1-\mu))(2\operatorname{var}+\mu(1-\mu))}\text{ if }\operatorname{var}< \mu(1-\mu)

The plot of excess kurtosis as a function of the variance and the mean shows that the minimum value of the excess kurtosis (−2, the minimum possible value for any distribution) is intimately coupled with the maximum value of variance (1/4) and the symmetry condition: the mean occurring at the midpoint (μ = 1/2). This occurs for the symmetric case of α = β = 0, with zero skewness. At the limit, this is the 2-point Bernoulli distribution with equal probability 1/2 at each Dirac delta function end ''x'' = 0 and ''x'' = 1 and zero probability everywhere else. (A coin toss: one face of the coin being ''x'' = 0 and the other face being ''x'' = 1.) Variance is maximum because the distribution is bimodal with nothing in between the two modes (spikes) at each end. Excess kurtosis is minimum: the probability density "mass" is zero at the mean and it is concentrated at the two peaks at each end. Excess kurtosis reaches the minimum possible value (for any distribution) when the probability density function has two spikes at each end: it is bi-"peaky" with nothing in between them.

On the other hand, the plot shows that for extreme skewed cases, where the mean is located near one or the other end (μ = 0 or μ = 1), the variance is close to zero, and the excess kurtosis rapidly approaches infinity when the mean of the distribution approaches either end.

Alternatively, the excess kurtosis can also be expressed in terms of just the following two parameters: the square of the skewness, and the sample size ν as follows:

:\text{excess kurtosis} =\frac{6}{3+\nu}\bigg(\frac{2+\nu}{4} (\text{skewness})^2 - 1\bigg)\text{ if }(\text{skewness})^2-2< \text{excess kurtosis}< \tfrac{3}{2} (\text{skewness})^2

From this last expression, one can obtain the same limits published practically a century ago by Karl Pearson in his paper, for the beta distribution (see the section below titled "Kurtosis bounded by the square of the skewness"). Setting α + β = ν = 0 in the above expression, one obtains Pearson's lower boundary (values for the skewness and excess kurtosis below the boundary (excess kurtosis + 2 − skewness2 = 0) cannot occur for any distribution, and hence Karl Pearson appropriately called the region below this boundary the "impossible region"). The limit of α + β = ν → ∞ determines Pearson's upper boundary.

: \begin{align}
&\lim_{\nu \to 0}\text{excess kurtosis} = (\text{skewness})^2 - 2\\
&\lim_{\nu \to \infty}\text{excess kurtosis} = \tfrac{3}{2} (\text{skewness})^2
\end{align}

therefore:

:(\text{skewness})^2-2< \text{excess kurtosis}< \tfrac{3}{2} (\text{skewness})^2

Values of ν = α + β such that ν ranges from zero to infinity, 0 < ν < ∞, span the whole region of the beta distribution in the plane of excess kurtosis versus squared skewness.

For the symmetric case (α = β), the following limits apply:

: \begin{align}
&\lim_{\alpha = \beta \to 0} \text{excess kurtosis} = - 2 \\
&\lim_{\alpha = \beta \to \infty} \text{excess kurtosis} = 0 \\
&\lim_{\mu \to \frac{1}{2}} \text{excess kurtosis} = - \frac{6}{3+\nu}
\end{align}

For the unsymmetric cases (α ≠ β) the following limits (with only the noted variable approaching the limit) can be obtained from the above expressions:

: \begin{align}
&\lim_{\alpha \to 0}\text{excess kurtosis} =\lim_{\beta \to 0} \text{excess kurtosis} = \lim_{\mu \to 0}\text{excess kurtosis} = \lim_{\mu \to 1}\text{excess kurtosis} =\infty\\
&\lim_{\alpha \to \infty}\text{excess kurtosis} = \frac{6}{\beta},\quad \lim_{\beta \to 0}(\lim_{\alpha \to \infty} \text{excess kurtosis}) = \infty,\quad \lim_{\beta \to \infty}(\lim_{\alpha \to \infty} \text{excess kurtosis}) = 0\\
&\lim_{\beta \to \infty}\text{excess kurtosis} = \frac{6}{\alpha},\quad \lim_{\alpha \to 0}(\lim_{\beta \to \infty} \text{excess kurtosis}) = \infty,\quad \lim_{\alpha \to \infty}(\lim_{\beta \to \infty} \text{excess kurtosis}) = 0\\
&\lim_{\nu \to 0} \text{excess kurtosis} = - 6 + \frac{1}{\mu(1-\mu)},\quad \lim_{\mu \to 0}(\lim_{\nu \to 0} \text{excess kurtosis}) = \infty,\quad \lim_{\mu \to 1}(\lim_{\nu \to 0} \text{excess kurtosis}) = \infty
\end{align}
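Similarly, the excess-kurtosis expression in terms of α and β can be checked against a library implementation; a minimal sketch assuming scipy is available, with beta_excess_kurtosis as an illustrative helper name.

```python
# Cross-check of the excess-kurtosis formula above against scipy's implementation.
from scipy.stats import beta

def beta_excess_kurtosis(a, b):
    # 6*[(a-b)^2*(a+b+1) - a*b*(a+b+2)] / [a*b*(a+b+2)*(a+b+3)]
    num = 6.0 * ((a - b)**2 * (a + b + 1.0) - a * b * (a + b + 2.0))
    den = a * b * (a + b + 2.0) * (a + b + 3.0)
    return num / den

for a, b in [(2.0, 2.0), (2.0, 5.0), (0.5, 3.0)]:
    print(a, b, beta_excess_kurtosis(a, b), beta.stats(a, b, moments='k'))
```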


Characteristic function

The characteristic function is the Fourier transform of the probability density function. The characteristic function of the beta distribution is Kummer's confluent hypergeometric function (of the first kind):

:\begin{align}
\varphi_X(\alpha;\beta;t) &= \operatorname{E}\left[e^{itX}\right]\\
&= \int_0^1 e^{itx} f(x;\alpha,\beta)\, dx \\
&= {}_1F_1(\alpha; \alpha+\beta; it)\\
&=\sum_{n=0}^\infty \frac{\alpha^{(n)}}{(\alpha+\beta)^{(n)}} \frac{(it)^n}{n!} \\
&= 1 +\sum_{k=1}^{\infty} \left( \prod_{r=0}^{k-1} \frac{\alpha+r}{\alpha+\beta+r} \right) \frac{(it)^k}{k!}
\end{align}

where

: x^{(n)}=x(x+1)(x+2)\cdots(x+n-1)

is the rising factorial, also called the "Pochhammer symbol". The value of the characteristic function for ''t'' = 0 is one:

: \varphi_X(\alpha;\beta;0)={}_1F_1(\alpha; \alpha+\beta; 0) = 1 .

Also, the real and imaginary parts of the characteristic function enjoy the following symmetries with respect to the origin of variable ''t'':

: \textrm{Re} \left [ {}_1F_1(\alpha; \alpha+\beta; it) \right ] = \textrm{Re} \left [ {}_1F_1(\alpha; \alpha+\beta; - it) \right ]

: \textrm{Im} \left [ {}_1F_1(\alpha; \alpha+\beta; it) \right ] = - \textrm{Im} \left [ {}_1F_1(\alpha; \alpha+\beta; - it) \right ]

The symmetric case α = β simplifies the characteristic function of the beta distribution to a Bessel function, since in the special case α + β = 2α the confluent hypergeometric function (of the first kind) reduces to a Bessel function (the modified Bessel function of the first kind I_{\alpha-\frac{1}{2}}) using Kummer's second transformation as follows:

:\begin{align} {}_1F_1(\alpha;2\alpha; it) &= e^{\frac{it}{2}} {}_0F_1 \left(; \alpha+\tfrac{1}{2}; \frac{(it)^2}{16} \right) \\
&= e^{\frac{it}{2}} \left(\frac{it}{4}\right)^{\frac{1}{2}-\alpha} \Gamma\left(\alpha+\tfrac{1}{2}\right) I_{\alpha-\frac{1}{2}}\left(\frac{it}{2}\right).\end{align}

(Another example of the symmetric case α = β = ''n''/2, for beamforming applications, can be found in Figure 11 of the cited reference.)

In the accompanying plots, the real part (Re) of the characteristic function of the beta distribution is displayed for symmetric (α = β) and skewed (α ≠ β) cases.


Other moments


Moment generating function

It also follows that the moment generating function is

:\begin{align} M_X(\alpha; \beta; t) &= \operatorname{E}\left[e^{tX}\right] \\
&= \int_0^1 e^{tx} f(x;\alpha,\beta)\,dx \\
&= {}_1F_1(\alpha; \alpha+\beta; t) \\
&= \sum_{n=0}^\infty \frac{\alpha^{(n)}}{(\alpha+\beta)^{(n)}} \frac{t^n}{n!} \\
&= 1 +\sum_{k=1}^{\infty} \left( \prod_{r=0}^{k-1} \frac{\alpha+r}{\alpha+\beta+r} \right) \frac{t^k}{k!}
\end{align}

In particular ''M_X''(''α''; ''β''; 0) = 1.


Higher moments

Using the moment generating function, the ''k''-th raw moment is given by the factor

:\prod_{r=0}^{k-1} \frac{\alpha+r}{\alpha+\beta+r}

multiplying the (exponential series) term \left(\frac{t^k}{k!}\right) in the series of the moment generating function

:\operatorname{E}[X^k]= \frac{\alpha^{(k)}}{(\alpha + \beta)^{(k)}} = \prod_{r=0}^{k-1} \frac{\alpha+r}{\alpha+\beta+r}

where (''x'')(''k'') is a Pochhammer symbol representing rising factorial. It can also be written in a recursive form as

:\operatorname{E}[X^k] = \frac{\alpha + k - 1}{\alpha + \beta + k - 1}\operatorname{E}[X^{k-1}].

Since the moment generating function M_X(\alpha; \beta; \cdot) has a positive radius of convergence, the beta distribution is determined by its moments.
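The recursive form of the raw moments lends itself to a few lines of code; the following minimal sketch (assuming scipy is available) compares the recursion with scipy's moment() method. The helper name beta_raw_moments is illustrative.

```python
# Raw moments of Beta(alpha, beta) via E[X^k] = (alpha+k-1)/(alpha+beta+k-1) * E[X^(k-1)],
# checked against scipy's moment() method.
from scipy.stats import beta

def beta_raw_moments(a, b, kmax):
    moments = [1.0]  # E[X^0] = 1
    for k in range(1, kmax + 1):
        moments.append(moments[-1] * (a + k - 1) / (a + b + k - 1))
    return moments[1:]

a, b = 2.0, 5.0
print(beta_raw_moments(a, b, 4))
print([beta(a, b).moment(k) for k in range(1, 5)])
```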


Moments of transformed random variables


Moments of linearly transformed, product and inverted random variables

One can also show the following expectations for a transformed random variable, where the random variable ''X'' is beta-distributed with parameters α and β: ''X'' ~ Beta(α, β). The expected value of the variable 1 − ''X'' is the mirror-symmetry of the expected value based on ''X'':

:\begin{align}
& \operatorname{E}[1-X] = \frac{\beta}{\alpha+\beta} \\
& \operatorname{E}[X (1-X)] =\operatorname{E}[(1-X)X ] =\frac{\alpha\beta}{(\alpha+\beta)(\alpha+\beta+1)}
\end{align}

Due to the mirror-symmetry of the probability density function of the beta distribution, the variances based on variables ''X'' and 1 − ''X'' are identical, and the covariance of ''X'' and (1 − ''X'') is the negative of the variance:

:\operatorname{var}[(1-X)]=\operatorname{var}[X] = -\operatorname{cov}[X,(1-X)]= \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}

These are the expected values for inverted variables (these are related to the harmonic means, see ):

:\begin{align}
& \operatorname{E} \left [\frac{1}{X} \right ] = \frac{\alpha+\beta-1}{\alpha-1} \text{ if } \alpha > 1\\
& \operatorname{E}\left [\frac{1}{1-X} \right ] =\frac{\alpha+\beta-1}{\beta-1} \text{ if } \beta > 1
\end{align}

The following transformation by dividing the variable ''X'' by its mirror-image, ''X''/(1 − ''X''), results in the expected value of the "inverted beta distribution" or beta prime distribution (also known as beta distribution of the second kind or Pearson's Type VI):

: \begin{align}
& \operatorname{E}\left[\frac{X}{1-X}\right] =\frac{\alpha}{\beta-1} \text{ if }\beta > 1\\
& \operatorname{E}\left[\frac{1-X}{X}\right] =\frac{\beta}{\alpha-1}\text{ if }\alpha > 1
\end{align}

Variances of these transformed variables can be obtained by integration, as the expected values of the second moments centered on the corresponding variables:

:\operatorname{var} \left[\frac{1}{X} \right] =\operatorname{E}\left[\left(\frac{1}{X} - \operatorname{E}\left[\frac{1}{X} \right ] \right )^2\right]=\operatorname{var}\left [\frac{1-X}{X} \right ] =\operatorname{E} \left [\left (\frac{1-X}{X} - \operatorname{E}\left [\frac{1-X}{X} \right ] \right )^2 \right ]= \frac{\beta(\alpha+\beta-1)}{(\alpha-2)(\alpha-1)^2} \text{ if }\alpha > 2

The following variance of the variable ''X'' divided by its mirror-image, ''X''/(1−''X''), results in the variance of the "inverted beta distribution" or beta prime distribution (also known as beta distribution of the second kind or Pearson's Type VI):

:\operatorname{var} \left [\frac{1}{1-X} \right ] =\operatorname{E} \left [\left(\frac{1}{1-X} - \operatorname{E} \left [\frac{1}{1-X} \right ] \right)^2 \right ]=\operatorname{var} \left [\frac{X}{1-X} \right ] =\operatorname{E} \left [\left (\frac{X}{1-X} - \operatorname{E} \left [\frac{X}{1-X} \right ] \right )^2 \right ]= \frac{\alpha(\alpha+\beta-1)}{(\beta-2)(\beta-1)^2} \text{ if }\beta > 2

The covariances are:

:\operatorname{cov}\left [\frac{1}{X},\frac{1}{1-X} \right ] = \operatorname{cov}\left[\frac{1-X}{X},\frac{X}{1-X} \right] =\operatorname{cov}\left[\frac{1}{X},\frac{X}{1-X}\right ] = \operatorname{cov}\left[\frac{1-X}{X},\frac{1}{1-X} \right] = -\frac{\alpha+\beta-1}{(\alpha-1)(\beta-1)} \text{ if } \alpha, \beta > 1

These expectations and variances appear in the four-parameter Fisher information matrix (see below).


Moments of logarithmically transformed random variables

Expected values for logarithmic transformations (useful for maximum likelihood estimates, see ) are discussed in this section. The following logarithmic linear transformations are related to the geometric means ''G''''X'' and ''G''(1−''X'') (see ):

:\begin{align}
\operatorname{E}[\ln(X)] &= \psi(\alpha) - \psi(\alpha + \beta)= - \operatorname{E}\left[\ln \left (\frac{1}{X} \right )\right],\\
\operatorname{E}[\ln(1-X)] &=\psi(\beta) - \psi(\alpha + \beta)= - \operatorname{E} \left[\ln \left (\frac{1}{1-X} \right )\right].
\end{align}

where the digamma function ψ(α) is defined as the logarithmic derivative of the gamma function:

:\psi(\alpha) = \frac{d \ln\Gamma(\alpha)}{d\alpha}

Logit transformations are interesting, as they usually transform various shapes (including J-shapes) into (usually skewed) bell-shaped densities over the logit variable, and they may remove the end singularities over the original variable:

:\begin{align}
\operatorname{E}\left[\ln \left (\frac{X}{1-X} \right ) \right] &=\psi(\alpha) - \psi(\beta)= \operatorname{E}[\ln(X)] +\operatorname{E} \left[\ln \left (\frac{1}{1-X} \right) \right],\\
\operatorname{E}\left [\ln \left (\frac{1-X}{X} \right ) \right ] &=\psi(\beta) - \psi(\alpha)= - \operatorname{E} \left[\ln \left (\frac{X}{1-X} \right) \right] .
\end{align}

Johnson considered the distribution of the logit-transformed variable ln(''X''/(1−''X'')), including its moment generating function and approximations for large values of the shape parameters. This transformation extends the finite support [0, 1] based on the original variable ''X'' to infinite support in both directions of the real line (−∞, +∞).

Higher order logarithmic moments can be derived by using the representation of a beta distribution as a proportion of two gamma distributions and differentiating through the integral. They can be expressed in terms of higher order poly-gamma functions as follows:

:\begin{align}
\operatorname{E} \left [\ln^2(X) \right ] &= (\psi(\alpha) - \psi(\alpha + \beta))^2+\psi_1(\alpha)-\psi_1(\alpha+\beta), \\
\operatorname{E} \left [\ln^2(1-X) \right ] &= (\psi(\beta) - \psi(\alpha + \beta))^2+\psi_1(\beta)-\psi_1(\alpha+\beta), \\
\operatorname{E} \left [\ln (X)\ln(1-X) \right ] &=(\psi(\alpha) - \psi(\alpha + \beta))(\psi(\beta) - \psi(\alpha + \beta)) -\psi_1(\alpha+\beta).
\end{align}

therefore the variance of the logarithmic variables and covariance of ln(''X'') and ln(1−''X'') are:

:\begin{align}
\operatorname{cov}[\ln(X), \ln(1-X)] &= \operatorname{E}\left[\ln(X)\ln(1-X)\right] - \operatorname{E}[\ln(X)]\operatorname{E}[\ln(1-X)] = -\psi_1(\alpha+\beta) \\
& \\
\operatorname{var}[\ln X] &= \operatorname{E}[\ln^2(X)] - (\operatorname{E}[\ln(X)])^2 \\
&= \psi_1(\alpha) - \psi_1(\alpha + \beta) \\
&= \psi_1(\alpha) + \operatorname{cov}[\ln(X), \ln(1-X)] \\
& \\
\operatorname{var}[\ln (1-X)] &= \operatorname{E}[\ln^2 (1-X)] - (\operatorname{E}[\ln (1-X)])^2 \\
&= \psi_1(\beta) - \psi_1(\alpha + \beta) \\
&= \psi_1(\beta) + \operatorname{cov}[\ln (X), \ln(1-X)]
\end{align}

where the trigamma function, denoted ψ1(α), is the second of the polygamma functions, and is defined as the derivative of the digamma function:

:\psi_1(\alpha) = \frac{d^2\ln\Gamma(\alpha)}{d\alpha^2}= \frac{d\psi(\alpha)}{d\alpha}.

The variances and covariance of the logarithmically transformed variables ''X'' and (1−''X'') are different, in general, because the logarithmic transformation destroys the mirror-symmetry of the original variables ''X'' and (1−''X''), as the logarithm approaches negative infinity for the variable approaching zero.

These logarithmic variances and covariance are the elements of the Fisher information matrix for the beta distribution. They are also a measure of the curvature of the log likelihood function (see the section on maximum likelihood estimation).

The variances of the log inverse variables are identical to the variances of the log variables:

:\begin{align}
\operatorname{var}\left[\ln \left (\frac{1}{X} \right ) \right] & =\operatorname{var}[\ln(X)] = \psi_1(\alpha) - \psi_1(\alpha + \beta), \\
\operatorname{var}\left[\ln \left (\frac{1}{1-X} \right ) \right] &=\operatorname{var}[\ln (1-X)]= \psi_1(\beta) - \psi_1(\alpha + \beta), \\
\operatorname{cov}\left[\ln \left (\frac{1}{X} \right), \ln \left (\frac{1}{1-X}\right ) \right] &=\operatorname{cov}[\ln(X),\ln(1-X)]= -\psi_1(\alpha + \beta).\end{align}

It also follows that the variances of the logit transformed variables are:

:\operatorname{var}\left[\ln \left (\frac{X}{1-X} \right )\right]=\operatorname{var}\left[\ln \left (\frac{1-X}{X} \right ) \right]=-\operatorname{cov}\left [\ln \left (\frac{X}{1-X} \right ), \ln \left (\frac{1-X}{X} \right ) \right]= \psi_1(\alpha) + \psi_1(\beta)


Quantities of information (entropy)

Given a beta distributed random variable, ''X'' ~ Beta(''α'', ''β''), the differential entropy of ''X'' (measured in nats) is the expected value of the negative of the logarithm of the probability density function:

:\begin{align}
h(X) &= \operatorname{E}[-\ln(f(x;\alpha,\beta))] \\
&=\int_0^1 -f(x;\alpha,\beta)\ln(f(x;\alpha,\beta)) \, dx \\
&= \ln(\Beta(\alpha,\beta))-(\alpha-1)\psi(\alpha)-(\beta-1)\psi(\beta)+(\alpha+\beta-2) \psi(\alpha+\beta)
\end{align}

where ''f''(''x''; ''α'', ''β'') is the probability density function of the beta distribution:

:f(x;\alpha,\beta) = \frac{1}{\Beta(\alpha,\beta)} x^{\alpha-1}(1-x)^{\beta-1}

The digamma function ''ψ'' appears in the formula for the differential entropy as a consequence of Euler's integral formula for the harmonic numbers, which follows from the integral:

:\int_0^1 \frac{1-x^{\alpha-1}}{1-x} \, dx = \psi(\alpha)-\psi(1)

The differential entropy of the beta distribution is negative for all values of ''α'' and ''β'' greater than zero, except at ''α'' = ''β'' = 1 (for which values the beta distribution is the same as the uniform distribution), where the differential entropy reaches its maximum value of zero. It is to be expected that the maximum entropy should take place when the beta distribution becomes equal to the uniform distribution, since uncertainty is maximal when all possible events are equiprobable.

For ''α'' or ''β'' approaching zero, the differential entropy approaches its minimum value of negative infinity. For (either or both) ''α'' or ''β'' approaching zero, there is a maximum amount of order: all the probability density is concentrated at the ends, and there is zero probability density at points located between the ends. Similarly for (either or both) ''α'' or ''β'' approaching infinity, the differential entropy approaches its minimum value of negative infinity, and a maximum amount of order. If either ''α'' or ''β'' approaches infinity (and the other is finite) all the probability density is concentrated at an end, and the probability density is zero everywhere else. If both shape parameters are equal (the symmetric case), ''α'' = ''β'', and they approach infinity simultaneously, the probability density becomes a spike (Dirac delta function) concentrated at the middle ''x'' = 1/2, and hence there is 100% probability at the middle ''x'' = 1/2 and zero probability everywhere else.

The (continuous case) differential entropy was introduced by Shannon in his original paper (where he named it the "entropy of a continuous distribution"), as the concluding part of the same paper where he defined the discrete entropy. It is known since then that the differential entropy may differ from the infinitesimal limit of the discrete entropy by an infinite offset, therefore the differential entropy can be negative (as it is for the beta distribution). What really matters is the relative value of entropy.

Given two beta distributed random variables, ''X''1 ~ Beta(''α'', ''β'') and ''X''2 ~ Beta(''α''′, ''β''′), the cross entropy is (measured in nats)

:\begin{align}
H(X_1,X_2) &= \int_0^1 - f(x;\alpha,\beta) \ln (f(x;\alpha',\beta')) \,dx \\
&= \ln \left(\Beta(\alpha',\beta')\right)-(\alpha'-1)\psi(\alpha)-(\beta'-1)\psi(\beta)+(\alpha'+\beta'-2)\psi(\alpha+\beta).
\end{align}

The cross entropy has been used as an error metric to measure the distance between two hypotheses. Its absolute value is minimum when the two distributions are identical. It is the information measure most closely related to the log maximum likelihood (see the section "Parameter estimation. Maximum likelihood estimation").

The relative entropy, or Kullback–Leibler divergence ''D''KL(''X''1 || ''X''2), is a measure of the inefficiency of assuming that the distribution is ''X''2 ~ Beta(''α''′, ''β''′) when the distribution is really ''X''1 ~ Beta(''α'', ''β''). It is defined as follows (measured in nats):

:\begin{align}
D_{\mathrm{KL}}(X_1\|X_2) &= \int_0^1 f(x;\alpha,\beta) \ln \left (\frac{f(x;\alpha,\beta)}{f(x;\alpha',\beta')} \right ) \, dx \\
&= \left (\int_0^1 f(x;\alpha,\beta) \ln (f(x;\alpha,\beta)) \,dx \right )- \left (\int_0^1 f(x;\alpha,\beta) \ln (f(x;\alpha',\beta')) \, dx \right )\\
&= -h(X_1) + H(X_1,X_2)\\
&= \ln\left(\frac{\Beta(\alpha',\beta')}{\Beta(\alpha,\beta)}\right)+(\alpha-\alpha')\psi(\alpha)+(\beta-\beta')\psi(\beta)+(\alpha'-\alpha+\beta'-\beta)\psi (\alpha + \beta).
\end{align}

The relative entropy, or Kullback–Leibler divergence, is always non-negative. A few numerical examples follow:

*''X''1 ~ Beta(1, 1) and ''X''2 ~ Beta(3, 3); ''D''KL(''X''1 || ''X''2) = 0.598803; ''D''KL(''X''2 || ''X''1) = 0.267864; ''h''(''X''1) = 0; ''h''(''X''2) = −0.267864
*''X''1 ~ Beta(3, 0.5) and ''X''2 ~ Beta(0.5, 3); ''D''KL(''X''1 || ''X''2) = 7.21574; ''D''KL(''X''2 || ''X''1) = 7.21574; ''h''(''X''1) = −1.10805; ''h''(''X''2) = −1.10805.

The Kullback–Leibler divergence is not symmetric, ''D''KL(''X''1 || ''X''2) ≠ ''D''KL(''X''2 || ''X''1), for the case in which the individual beta distributions Beta(1, 1) and Beta(3, 3) are symmetric but have different entropies ''h''(''X''1) ≠ ''h''(''X''2). The value of the Kullback divergence depends on the direction traveled: whether going from a higher (differential) entropy to a lower (differential) entropy or the other way around. In the numerical example above, the Kullback divergence measures the inefficiency of assuming that the distribution is (bell-shaped) Beta(3, 3), rather than (uniform) Beta(1, 1). The "h" entropy of Beta(1, 1) is higher than the "h" entropy of Beta(3, 3) because the uniform distribution Beta(1, 1) has a maximum amount of disorder. The Kullback divergence is more than two times higher (0.598803 instead of 0.267864) when measured in the direction of decreasing entropy: the direction that assumes that the (uniform) Beta(1, 1) distribution is (bell-shaped) Beta(3, 3) rather than the other way around. In this restricted sense, the Kullback divergence is consistent with the second law of thermodynamics.

The Kullback–Leibler divergence is symmetric, ''D''KL(''X''1 || ''X''2) = ''D''KL(''X''2 || ''X''1), for the skewed cases Beta(3, 0.5) and Beta(0.5, 3) that have equal differential entropy ''h''(''X''1) = ''h''(''X''2).

The symmetry condition:

:D_{\mathrm{KL}}(X_1\|X_2) = D_{\mathrm{KL}}(X_2\|X_1),\text{ if }h(X_1) = h(X_2),\text{ for (skewed) }\alpha \neq \beta

follows from the above definitions and the mirror-symmetry ''f''(''x''; ''α'', ''β'') = ''f''(1−''x''; ''β'', ''α'') enjoyed by the beta distribution.
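The closed forms for the differential entropy and the Kullback–Leibler divergence reproduce the numerical examples quoted above; a minimal sketch assuming scipy is available, with illustrative helper names.

```python
# Differential entropy and Kullback-Leibler divergence of beta distributions from the
# closed forms above, reproducing the numerical examples quoted in the text.
from scipy.special import betaln, psi

def beta_entropy(a, b):
    return betaln(a, b) - (a - 1) * psi(a) - (b - 1) * psi(b) + (a + b - 2) * psi(a + b)

def beta_kl(a, b, a2, b2):
    return (betaln(a2, b2) - betaln(a, b)
            + (a - a2) * psi(a) + (b - b2) * psi(b)
            + (a2 - a + b2 - b) * psi(a + b))

print(beta_entropy(1, 1), beta_entropy(3, 3))      # 0 and about -0.267864
print(beta_kl(1, 1, 3, 3), beta_kl(3, 3, 1, 1))    # about 0.598803 and 0.267864
```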


Relationships between statistical measures


Mean, mode and median relationship

If 1 < α < β then mode ≤ median ≤ mean. (Kerman J (2011) "A closed-form approximation for the median of the beta distribution".) Expressing the mode (only for α, β > 1) and the mean in terms of α and β:

: \frac{\alpha - 1}{\alpha + \beta - 2} \le \text{median} \le \frac{\alpha}{\alpha + \beta} ,

If 1 < β < α then the order of the inequalities is reversed. For α, β > 1 the absolute distance between the mean and the median is less than 5% of the distance between the maximum and minimum values of ''x''. On the other hand, the absolute distance between the mean and the mode can reach 50% of the distance between the maximum and minimum values of ''x'', for the (pathological) case of α = 1 and β = 1, for which values the beta distribution approaches the uniform distribution and the differential entropy approaches its maximum value, and hence maximum "disorder".

For example, for α = 1.0001 and β = 1.00000001:
* mode   = 0.9999;   PDF(mode) = 1.00010
* mean   = 0.500025; PDF(mean) = 1.00003
* median = 0.500035; PDF(median) = 1.00003
* mean − mode   = −0.499875
* mean − median = −9.65538 × 10−6

where PDF stands for the value of the probability density function.
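A quick numerical check of the ordering mode ≤ median ≤ mean for 1 < α < β, using scipy's median; a minimal sketch with arbitrarily chosen parameters.

```python
# Verify mode <= median <= mean for a case with 1 < alpha < beta.
from scipy.stats import beta

a, b = 2.0, 6.0            # 1 < alpha < beta
mode = (a - 1) / (a + b - 2)
mean = a / (a + b)
median = beta(a, b).median()
print(mode, median, mean, mode <= median <= mean)
```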


Mean, geometric mean and harmonic mean relationship

It is known from the inequality of arithmetic and geometric means that the geometric mean is lower than the mean. Similarly, the harmonic mean is lower than the geometric mean. The accompanying plot shows that for α = β, both the mean and the median are exactly equal to 1/2, regardless of the value of α = β, and the mode is also equal to 1/2 for α = β > 1; however, the geometric and harmonic means are lower than 1/2 and they only approach this value asymptotically as α = β → ∞.


Kurtosis bounded by the square of the skewness

As remarked by Feller, in the Pearson system the beta probability density appears as type I (any difference between the beta distribution and Pearson's type I distribution is only superficial and it makes no difference for the following discussion regarding the relationship between kurtosis and skewness). Karl Pearson showed, in Plate 1 of his paper published in 1916, a graph with the kurtosis as the vertical axis (ordinate) and the square of the skewness as the horizontal axis (abscissa), in which a number of distributions were displayed. The region occupied by the beta distribution is bounded by the following two lines in the (skewness2, kurtosis) plane, or the (skewness2, excess kurtosis) plane:

:(\text{skewness})^2+1< \text{kurtosis}< \frac{3}{2} (\text{skewness})^2 + 3

or, equivalently,

:(\text{skewness})^2-2< \text{excess kurtosis}< \frac{3}{2} (\text{skewness})^2

At a time when there were no powerful digital computers, Karl Pearson accurately computed further boundaries, for example, separating the "U-shaped" from the "J-shaped" distributions. The lower boundary line (excess kurtosis + 2 − skewness2 = 0) is produced by skewed "U-shaped" beta distributions with both values of shape parameters α and β close to zero. The upper boundary line (excess kurtosis − (3/2) skewness2 = 0) is produced by extremely skewed distributions with very large values of one of the parameters and very small values of the other parameter. Karl Pearson showed that this upper boundary line (excess kurtosis − (3/2) skewness2 = 0) is also the intersection with Pearson's distribution III, which has unlimited support in one direction (towards positive infinity), and can be bell-shaped or J-shaped. His son, Egon Pearson, showed that the region (in the kurtosis/squared-skewness plane) occupied by the beta distribution (equivalently, Pearson's distribution I) as it approaches this boundary (excess kurtosis − (3/2) skewness2 = 0) is shared with the noncentral chi-squared distribution. Karl Pearson (Pearson 1895, pp. 357, 360, 373–376) also showed that the gamma distribution is a Pearson type III distribution. Hence this boundary line for Pearson's type III distribution is known as the gamma line. (This can be shown from the fact that the excess kurtosis of the gamma distribution is 6/''k'' and the square of the skewness is 4/''k'', hence (excess kurtosis − (3/2) skewness2 = 0) is identically satisfied by the gamma distribution regardless of the value of the parameter "k".) Pearson later noted that the chi-squared distribution is a special case of Pearson's type III and also shares this boundary line (as is apparent from the fact that for the chi-squared distribution the excess kurtosis is 12/''k'' and the square of the skewness is 8/''k'', hence (excess kurtosis − (3/2) skewness2 = 0) is identically satisfied regardless of the value of the parameter "k"). This is to be expected, since the chi-squared distribution ''X'' ~ χ2(''k'') is a special case of the gamma distribution, with parametrization X ~ Γ(k/2, 1/2) where k is a positive integer that specifies the "number of degrees of freedom" of the chi-squared distribution.

An example of a beta distribution near the upper boundary (excess kurtosis − (3/2) skewness2 = 0) is given by α = 0.1, β = 1000, for which the ratio (excess kurtosis)/(skewness2) = 1.49835 approaches the upper limit of 1.5 from below. An example of a beta distribution near the lower boundary (excess kurtosis + 2 − skewness2 = 0) is given by α = 0.0001, β = 0.1, for which values the expression (excess kurtosis + 2)/(skewness2) = 1.01621 approaches the lower limit of 1 from above. In the infinitesimal limit for both α and β approaching zero symmetrically, the excess kurtosis reaches its minimum value at −2. This minimum value occurs at the point at which the lower boundary line intersects the vertical axis (ordinate). (However, in Pearson's original chart, the ordinate is kurtosis, instead of excess kurtosis, and it increases downwards rather than upwards.)

Values for the skewness and excess kurtosis below the lower boundary (excess kurtosis + 2 − skewness2 = 0) cannot occur for any distribution, and hence Karl Pearson appropriately called the region below this boundary the "impossible region". The boundary for this "impossible region" is determined by (symmetric or skewed) bimodal "U"-shaped distributions for which the parameters α and β approach zero and hence all the probability density is concentrated at the ends: ''x'' = 0, 1 with practically nothing in between them. Since for α ≈ β ≈ 0 the probability density is concentrated at the two ends ''x'' = 0 and ''x'' = 1, this "impossible boundary" is determined by a Bernoulli distribution, where the two only possible outcomes occur with respective probabilities ''p'' and ''q'' = 1−''p''. For cases approaching this limit boundary with symmetry α = β, skewness ≈ 0, excess kurtosis ≈ −2 (this is the lowest excess kurtosis possible for any distribution), and the probabilities are ''p'' ≈ ''q'' ≈ 1/2. For cases approaching this limit boundary with skewness, excess kurtosis ≈ −2 + skewness2, and the probability density is concentrated more at one end than the other end (with practically nothing in between), with probabilities p = \tfrac{1}{2}\left(1+\frac{\text{skewness}}{\sqrt{4+(\text{skewness})^2}}\right) at the left end ''x'' = 0 and q = 1-p = \tfrac{1}{2}\left(1-\frac{\text{skewness}}{\sqrt{4+(\text{skewness})^2}}\right) at the right end ''x'' = 1.
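The two Pearson boundaries can be probed numerically by drawing random shape parameters over several orders of magnitude and checking that the (squared skewness, excess kurtosis) pair of every beta distribution falls strictly between them; a minimal sketch assuming scipy is available.

```python
# Check (skewness^2 - 2) < excess kurtosis < (3/2) * skewness^2 for random beta distributions.
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(5)
for _ in range(5):
    a, b = 10 ** rng.uniform(-2, 2, size=2)          # shape parameters over several decades
    s, k = beta.stats(a, b, moments='sk')
    print(f"a={a:.3g} b={b:.3g}  {s**2 - 2:.3f} < {k:.3f} < {1.5 * s**2:.3f}",
          s**2 - 2 < k < 1.5 * s**2)
```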


Symmetry

All statements are conditional on α, β > 0.

* Probability density function reflection symmetry
::f(x;\alpha,\beta) = f(1-x;\beta,\alpha)
* Cumulative distribution function reflection symmetry plus unitary translation
::F(x;\alpha,\beta) = I_x(\alpha,\beta) = 1- F(1- x;\beta,\alpha) = 1 - I_{1-x}(\beta,\alpha)
* Mode reflection symmetry plus unitary translation
::\operatorname{mode}(\Beta(\alpha, \beta))= 1-\operatorname{mode}(\Beta(\beta, \alpha)),\text{ if }\Beta(\beta, \alpha)\ne \Beta(1,1)
* Median reflection symmetry plus unitary translation
::\operatorname{median} (\Beta(\alpha, \beta) )= 1 - \operatorname{median} (\Beta(\beta, \alpha))
* Mean reflection symmetry plus unitary translation
::\mu (\Beta(\alpha, \beta) )= 1 - \mu (\Beta(\beta, \alpha) )
* Geometric means: each is individually asymmetric; the following symmetry applies between the geometric mean based on ''X'' and the geometric mean based on its reflection (1−''X'')
::G_X (\Beta(\alpha, \beta) )=G_{(1-X)}(\Beta(\beta, \alpha) )
* Harmonic means: each is individually asymmetric; the following symmetry applies between the harmonic mean based on ''X'' and the harmonic mean based on its reflection (1−''X'')
::H_X (\Beta(\alpha, \beta) )=H_{(1-X)}(\Beta(\beta, \alpha) ) \text{ if } \alpha, \beta > 1 .
* Variance symmetry
::\operatorname{var} (\Beta(\alpha, \beta) )=\operatorname{var} (\Beta(\beta, \alpha) )
* Geometric variances: each is individually asymmetric; the following symmetry applies between the log geometric variance based on ''X'' and the log geometric variance based on its reflection (1−''X'')
::\ln(\operatorname{var}_{GX} (\Beta(\alpha, \beta))) = \ln(\operatorname{var}_{G(1-X)}(\Beta(\beta, \alpha)))
* Geometric covariance symmetry
::\ln \operatorname{cov}_{GX,(1-X)}(\Beta(\alpha, \beta))=\ln \operatorname{cov}_{GX,(1-X)}(\Beta(\beta, \alpha))
* Mean absolute deviation around the mean symmetry
::\operatorname{E}[|X - E[X]|] (\Beta(\alpha, \beta))=\operatorname{E}[| X - E[X]|] (\Beta(\beta, \alpha))
* Skewness skew-symmetry
::\operatorname{skewness} (\Beta(\alpha, \beta) )= - \operatorname{skewness} (\Beta(\beta, \alpha) )
* Excess kurtosis symmetry
::\text{excess kurtosis} (\Beta(\alpha, \beta) )= \text{excess kurtosis} (\Beta(\beta, \alpha) )
* Characteristic function symmetry of the real part (with respect to the origin of variable "t")
:: \text{Re} [ {}_1F_1(\alpha; \alpha+\beta; it) ] = \text{Re} [ {}_1F_1(\alpha; \alpha+\beta; - it)]
* Characteristic function skew-symmetry of the imaginary part (with respect to the origin of variable "t")
:: \text{Im} [ {}_1F_1(\alpha; \alpha+\beta; it) ] = - \text{Im} [ {}_1F_1(\alpha; \alpha+\beta; - it) ]
* Characteristic function symmetry of the absolute value (with respect to the origin of variable "t")
:: \text{Abs} [ {}_1F_1(\alpha; \alpha+\beta; it) ] = \text{Abs} [ {}_1F_1(\alpha; \alpha+\beta; - it) ]
* Differential entropy symmetry
::h(\Beta(\alpha, \beta) )= h(\Beta(\beta, \alpha) )
* Relative entropy (also called Kullback–Leibler divergence) symmetry
::D_{\mathrm{KL}}(X_1\|X_2) = D_{\mathrm{KL}}(X_2\|X_1), \text{ if }h(X_1) = h(X_2)\text{, for (skewed) }\alpha \neq \beta
* Fisher information matrix symmetry
::\mathcal{I}_{i,j} = \mathcal{I}_{j,i}


Geometry of the probability density function


Inflection points

For certain values of the shape parameters α and β, the probability density function has inflection points, at which the curvature changes sign. The position of these inflection points can be useful as a measure of the dispersion or spread of the distribution.

Defining the following quantity:

:\kappa =\frac{\sqrt{\frac{(\alpha-1)(\beta-1)}{\alpha+\beta-3}}}{\alpha+\beta-2}

points of inflection occur, depending on the value of the shape parameters α and β, as follows:

*(α > 2, β > 2) The distribution is bell-shaped (symmetric for α = β and skewed otherwise), with two inflection points, equidistant from the mode:
::x = \text{mode} \pm \kappa = \frac{(\alpha-1) \pm \sqrt{\frac{(\alpha-1)(\beta-1)}{\alpha+\beta-3}}}{\alpha+\beta-2}
* (α = 2, β > 2) The distribution is unimodal, positively skewed, right-tailed, with one inflection point, located to the right of the mode:
::x =\text{mode} + \kappa
* (α > 2, β = 2) The distribution is unimodal, negatively skewed, left-tailed, with one inflection point, located to the left of the mode:
::x = \text{mode} - \kappa
* (1 < α < 2, β > 2, α + β > 2) The distribution is unimodal, positively skewed, right-tailed, with one inflection point, located to the right of the mode:
::x =\text{mode} + \kappa
*(0 < α < 1, 1 < β < 2) The distribution has a mode at the left end ''x'' = 0 and it is positively skewed, right-tailed. There is one inflection point, located to the right of the mode: the root of x = \frac{(\alpha-1) \pm \sqrt{\frac{(\alpha-1)(\beta-1)}{\alpha+\beta-3}}}{\alpha+\beta-2} lying in (0, 1)
*(α > 2, 1 < β < 2) The distribution is unimodal, negatively skewed, left-tailed, with one inflection point, located to the left of the mode:
::x =\text{mode} - \kappa
*(1 < α < 2, 0 < β < 1) The distribution has a mode at the right end ''x'' = 1 and it is negatively skewed, left-tailed. There is one inflection point, located to the left of the mode: the root of x = \frac{(\alpha-1) \pm \sqrt{\frac{(\alpha-1)(\beta-1)}{\alpha+\beta-3}}}{\alpha+\beta-2} lying in (0, 1)

There are no inflection points in the remaining (symmetric and skewed) regions: U-shaped: (α, β < 1), upside-down-U-shaped: (1 < α < 2, 1 < β < 2), reverse-J-shaped (α < 1, β > 2) or J-shaped: (α > 2, β < 1).

The accompanying plots show the inflection point locations (shown vertically, ranging from 0 to 1) versus α and β (the horizontal axes ranging from 0 to 5). There are large cuts at surfaces intersecting the lines α = 1, β = 1, α = 2, and β = 2 because at these values the beta distribution changes from 2 modes, to 1 mode, to no mode.
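The inflection-point formula for the bell-shaped case α, β > 2 can be checked by locating sign changes of a finite-difference second derivative of the density; a minimal sketch assuming numpy and scipy are available.

```python
# Check x = mode +/- kappa for alpha, beta > 2 by locating sign changes of a
# finite-difference second derivative of the PDF.
import numpy as np
from scipy.stats import beta

a, b = 4.0, 3.0
mode = (a - 1) / (a + b - 2)
kappa = np.sqrt((a - 1) * (b - 1) / (a + b - 3)) / (a + b - 2)
predicted = [mode - kappa, mode + kappa]

x = np.linspace(0.001, 0.999, 20001)
pdf = beta(a, b).pdf(x)
second = np.gradient(np.gradient(pdf, x), x)               # finite-difference f''(x)
crossings = x[np.where(np.diff(np.sign(second)) != 0)[0]]  # approximate sign changes

print(predicted, crossings)
```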


Shapes

The beta density function can take a wide variety of different shapes depending on the values of the two parameters ''α'' and ''β''. The ability of the beta distribution to take this great diversity of shapes (using only two parameters) is partly responsible for finding wide application for modeling actual measurements:


Symmetric (''α'' = ''β'')

* the density function is symmetric about 1/2 (blue & teal plots).
* median = mean = 1/2.
* skewness = 0.
* variance = 1/(4(2α + 1))
* α = β < 1
** U-shaped (blue plot).
** bimodal: left mode = 0, right mode = 1, anti-mode = 1/2
** 1/12 < var(''X'') < 1/4
** −2 < excess kurtosis(''X'') < −6/5
** α = β = 1/2 is the arcsine distribution
*** var(''X'') = 1/8
*** excess kurtosis(''X'') = −3/2
*** CF = Rinc (t)
** α = β → 0 is a 2-point Bernoulli distribution with equal probability 1/2 at each Dirac delta function end ''x'' = 0 and ''x'' = 1 and zero probability everywhere else. A coin toss: one face of the coin being ''x'' = 0 and the other face being ''x'' = 1.
*** \lim_{\alpha = \beta \to 0} \operatorname{var}(X) = \tfrac{1}{4}
*** \lim_{\alpha = \beta \to 0} \text{excess kurtosis}(X) = - 2 (a lower value than this is impossible for any distribution to reach)
*** The differential entropy approaches a minimum value of −∞
*α = β = 1
**the uniform [0, 1] distribution
**no mode
**var(''X'') = 1/12
**excess kurtosis(''X'') = −6/5
**The (negative anywhere else) differential entropy reaches its maximum value of zero
**CF = Sinc (t)
*''α'' = ''β'' > 1
**symmetric unimodal
** mode = 1/2.
**0 < var(''X'') < 1/12
**−6/5 < excess kurtosis(''X'') < 0
**''α'' = ''β'' = 3/2 is a semi-elliptic [0, 1] distribution, see: Wigner semicircle distribution
***var(''X'') = 1/16.
***excess kurtosis(''X'') = −1
***CF = 2 Jinc (t)
**''α'' = ''β'' = 2 is the parabolic [0, 1] distribution
***var(''X'') = 1/20
***excess kurtosis(''X'') = −6/7
***CF = 3 Tinc (t)
**''α'' = ''β'' > 2 is bell-shaped, with inflection points located to either side of the mode
***0 < var(''X'') < 1/20
***−6/7 < excess kurtosis(''X'') < 0
**''α'' = ''β'' → ∞ is a 1-point degenerate distribution with a Dirac delta function spike at the midpoint ''x'' = 1/2 with probability 1, and zero probability everywhere else. There is 100% probability (absolute certainty) concentrated at the single point ''x'' = 1/2.
*** \lim_{\alpha = \beta \to \infty} \operatorname{var}(X) = 0
*** \lim_{\alpha = \beta \to \infty} \text{excess kurtosis}(X) = 0
***The differential entropy approaches a minimum value of −∞


Skewed (''α'' ≠ ''β'')

The density function is skewed. An interchange of parameter values yields the mirror image (the reverse) of the initial curve. Some more specific cases:
*''α'' < 1, ''β'' < 1
** U-shaped
** Positive skew for α < β, negative skew for α > β.
** bimodal: left mode = 0, right mode = 1, anti-mode = \tfrac{\alpha-1}{\alpha+\beta-2}
** 0 < median < 1.
** 0 < var(''X'') < 1/4
*α > 1, β > 1
** unimodal (magenta & cyan plots),
**Positive skew for α < β, negative skew for α > β.
**\text{mode}= \tfrac{\alpha-1}{\alpha+\beta-2}
** 0 < median < 1
** 0 < var(''X'') < 1/12
*α < 1, β ≥ 1
**reverse J-shaped with a right tail,
**positively skewed,
**strictly decreasing, convex
** mode = 0
** 0 < median < 1/2.
** 0 < \operatorname{var}(X) < \tfrac{5\sqrt{5}-11}{2} \approx 0.0902 (maximum variance occurs for \alpha=\tfrac{\sqrt{5}-1}{2}, \beta=1, i.e. α = Φ, the golden ratio conjugate)
*α ≥ 1, β < 1
**J-shaped with a left tail,
**negatively skewed,
**strictly increasing, convex
** mode = 1
** 1/2 < median < 1
** 0 < \operatorname{var}(X) < \tfrac{5\sqrt{5}-11}{2} \approx 0.0902 (maximum variance occurs for \alpha=1, \beta=\tfrac{\sqrt{5}-1}{2}, i.e. β = Φ, the golden ratio conjugate)
*α = 1, β > 1
**positively skewed,
**strictly decreasing (red plot),
**a reversed (mirror-image) power function [0, 1] distribution
** mean = 1 / (β + 1)
** median = 1 − 1/21/β
** mode = 0
**α = 1, 1 < β < 2
***concave
*** 1-\tfrac{1}{\sqrt{2}}< \text{median} < \tfrac{1}{2}
*** 1/18 < var(''X'') < 1/12.
**α = 1, β = 2
***a straight line with slope −2, the right-triangular distribution with right angle at the left end, at ''x'' = 0
*** \text{median}=1-\tfrac{1}{\sqrt{2}}
*** var(''X'') = 1/18
**α = 1, β > 2
***reverse J-shaped with a right tail,
***convex
*** 0 < \text{median} < 1-\tfrac{1}{\sqrt{2}}
*** 0 < var(''X'') < 1/18
*α > 1, β = 1
**negatively skewed,
**strictly increasing (green plot),
**the power function [0, 1] distribution
** mean = α / (α + 1)
** median = 1/21/α
** mode = 1
**2 > α > 1, β = 1
***concave
*** \tfrac{1}{2} < \text{median} < \tfrac{1}{\sqrt{2}}
*** 1/18 < var(''X'') < 1/12
** α = 2, β = 1
***a straight line with slope +2, the right-triangular distribution with right angle at the right end, at ''x'' = 1
*** \text{median}=\tfrac{1}{\sqrt{2}}
*** var(''X'') = 1/18
**α > 2, β = 1
***J-shaped with a left tail, convex
***\tfrac{1}{\sqrt{2}} < \text{median} < 1
*** 0 < var(''X'') < 1/18


Related distributions


Transformations

* If ''X'' ~ Beta(''α'', ''β'') then 1 − ''X'' ~ Beta(''β'', ''α'') (mirror-image symmetry)
* If ''X'' ~ Beta(''α'', ''β'') then \tfrac{X}{1-X} \sim {\beta'}(\alpha,\beta), the beta prime distribution, also called "beta distribution of the second kind".
* If ''X'' ~ Beta(''α'', ''β'') then \tfrac{1}{X} -1 \sim {\beta'}(\beta,\alpha).
* If ''X'' ~ Beta(''n''/2, ''m''/2) then \tfrac{mX}{n(1-X)} \sim F(n,m) (assuming ''n'' > 0 and ''m'' > 0), the Fisher–Snedecor F distribution.
* If X \sim \operatorname{Beta}\left(1+\lambda\tfrac{m-\min}{\max-\min}, 1 + \lambda\tfrac{\max-m}{\max-\min}\right) then min + ''X''(max − min) ~ PERT(min, max, ''m'', ''λ''), where ''PERT'' denotes a PERT distribution used in PERT analysis, and ''m'' = most likely value. (Herrerías-Velasco, José Manuel and Herrerías-Pleguezuelo, Rafael and René van Dorp, Johan. (2011). Revisiting the PERT mean and Variance. European Journal of Operational Research (210), p. 448–451.) Traditionally ''λ'' = 4 in PERT analysis.
* If ''X'' ~ Beta(1, ''β'') then ''X'' ~ Kumaraswamy distribution with parameters (1, ''β'')
* If ''X'' ~ Beta(''α'', 1) then ''X'' ~ Kumaraswamy distribution with parameters (''α'', 1)
* If ''X'' ~ Beta(''α'', 1) then −ln(''X'') ~ Exponential(''α'')


Special and limiting cases

* Beta(1, 1) ~ U(0, 1), the continuous uniform distribution.
* Beta(n, 1) ~ Maximum of ''n'' independent rvs. with U(0, 1), sometimes called a ''standard power function distribution'' with density ''n'' ''x''''n''−1 on that interval.
* Beta(1, n) ~ Minimum of ''n'' independent rvs. with U(0, 1)
* If ''X'' ~ Beta(3/2, 3/2) and ''r'' > 0 then 2''rX'' − ''r'' ~ Wigner semicircle distribution.
* Beta(1/2, 1/2) is equivalent to the arcsine distribution. This distribution is also the Jeffreys prior probability for the Bernoulli and binomial distributions. The arcsine probability density is a distribution that appears in several random-walk fundamental theorems. In a fair coin toss random walk, the probability for the time of the last visit to the origin is distributed as an (U-shaped) arcsine distribution. In a two-player fair-coin-toss game, a player is said to be in the lead if the random walk (that started at the origin) is above the origin. The most probable number of times that a given player will be in the lead, in a game of length 2''N'', is not ''N''. On the contrary, ''N'' is the least likely number of times that the player will be in the lead. The most likely number of times in the lead is 0 or 2''N'' (following the arcsine distribution).
* \lim_{n \to \infty} n \operatorname{Beta}(1,n) = \operatorname{Exponential}(1), the exponential distribution.
* \lim_{n \to \infty} n \operatorname{Beta}(k,n) = \operatorname{Gamma}(k,1), the gamma distribution.
* For large n, \operatorname{Beta}(\alpha n,\beta n) \to \mathcal{N}\left(\frac{\alpha}{\alpha+\beta},\frac{\alpha\beta}{(\alpha+\beta)^3}\frac{1}{n}\right), the normal distribution. More precisely, if X_n \sim \operatorname{Beta}(\alpha n,\beta n) then \sqrt{n}\left(X_n -\tfrac{\alpha}{\alpha+\beta}\right) converges in distribution to a normal distribution with mean 0 and variance \tfrac{\alpha\beta}{(\alpha+\beta)^3} as ''n'' increases.


Derived from other distributions

* The ''k''th order statistic of a sample of size ''n'' from the uniform distribution is a beta random variable, ''U''(''k'') ~ Beta(''k'', ''n''+1−''k'').
* If ''X'' ~ Gamma(α, θ) and ''Y'' ~ Gamma(β, θ) are independent, then \tfrac{X}{X+Y} \sim \operatorname{Beta}(\alpha, \beta)\,.
* If X \sim \chi^2(\alpha)\, and Y \sim \chi^2(\beta)\, are independent, then \tfrac{X}{X+Y} \sim \operatorname{Beta}(\tfrac{\alpha}{2}, \tfrac{\beta}{2}).
* If ''X'' ~ U(0, 1) and ''α'' > 0 then ''X''1/''α'' ~ Beta(''α'', 1), the power function distribution.
* If X \sim\operatorname{Bin}(k;n;p), then the likelihood of ''p'' given ''k'' successes in ''n'' trials, normalized over ''p'', is \operatorname{Beta}(\alpha, \beta) for discrete values of ''n'' and ''k'', where \alpha=k+1 and \beta=n-k+1.
* If ''X'' ~ Cauchy(0, 1) then \tfrac{1}{1+X^2} \sim \operatorname{Beta}\left(\tfrac12, \tfrac12\right)\,


Combination with other distributions

* If ''X'' ~ Beta(''α'', ''β'') and ''Y'' ~ F(2''β'', 2''α'') then \Pr(X \leq \tfrac{\alpha}{\alpha+\beta x}) = \Pr(Y \geq x)\, for all ''x'' > 0.


Compounding with other distributions

* If ''p'' ~ Beta(α, β) and ''X'' ~ Bin(''k'', ''p'') then ''X'' ~ beta-binomial distribution
* If ''p'' ~ Beta(α, β) and ''X'' ~ NB(''r'', ''p'') then ''X'' ~ beta negative binomial distribution
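The first compounding relation can be illustrated by simulation: drawing ''p'' from a beta distribution and then ''X'' from a binomial reproduces scipy's beta-binomial probability mass function. A minimal sketch; the seed and parameters are arbitrary.

```python
# Compounding: p ~ Beta(alpha, beta), then X ~ Binomial(k, p); compare the resulting
# marginal with scipy's beta-binomial distribution.
import numpy as np
from scipy.stats import betabinom

rng = np.random.default_rng(1)
a, b, k = 2.0, 5.0, 10

p = rng.beta(a, b, size=100_000)
x = rng.binomial(k, p)                       # X | p ~ Bin(k, p), with p ~ Beta(a, b)

empirical = np.bincount(x, minlength=k + 1) / x.size
exact = betabinom(k, a, b).pmf(np.arange(k + 1))
print(np.round(empirical, 3))
print(np.round(exact, 3))
```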


Generalisations

* The generalization to multiple variables, i.e. a multivariate beta distribution, is called a Dirichlet distribution. Univariate marginals of the Dirichlet distribution have a beta distribution. The beta distribution is conjugate to the binomial and Bernoulli distributions in exactly the same way as the Dirichlet distribution is conjugate to the multinomial distribution and categorical distribution.
* The Pearson type I distribution is identical to the beta distribution (except for arbitrary shifting and re-scaling that can also be accomplished with the four parameter parametrization of the beta distribution).
* The beta distribution is the special case of the noncentral beta distribution where \lambda = 0: \operatorname{Beta}(\alpha, \beta) = \operatorname{Beta}(\alpha,\beta,0).
* The generalized beta distribution is a five-parameter distribution family which has the beta distribution as a special case.
* The matrix variate beta distribution is a distribution for positive-definite matrices.


Statistical inference


Parameter estimation


Method of moments


Two unknown parameters

Two unknown parameters (\hat{\alpha}, \hat{\beta}) of a beta distribution supported in the [0, 1] interval can be estimated, using the method of moments, with the first two moments (sample mean and sample variance) as follows. Let:

: \text{sample mean}=\bar{x} = \frac{1}{N}\sum_{i=1}^N X_i

be the sample mean estimate and

: \text{sample variance} =\bar{v} = \frac{1}{N-1}\sum_{i=1}^N (X_i - \bar{x})^2

be the sample variance estimate. The method-of-moments estimates of the parameters are

:\hat{\alpha} = \bar{x} \left(\frac{\bar{x}(1-\bar{x})}{\bar{v}} - 1 \right), if \bar{v} <\bar{x}(1 - \bar{x}),
: \hat{\beta} = (1-\bar{x}) \left(\frac{\bar{x}(1-\bar{x})}{\bar{v}} - 1 \right), if \bar{v}<\bar{x}(1 - \bar{x}).

When the distribution is required over a known interval other than [0, 1] with random variable ''X'', say [''a'', ''c''] with random variable ''Y'', then replace \bar{x} with \frac{\bar{y}-a}{c-a}, and \bar{v} with \frac{\bar{v}_Y}{(c-a)^2} in the above couple of equations for the shape parameters (see the "Alternative parametrizations, four parameters" section below), where:

: \text{sample mean}=\bar{y} = \frac{1}{N}\sum_{i=1}^N Y_i
: \text{sample variance} = \bar{v}_Y = \frac{1}{N-1}\sum_{i=1}^N (Y_i - \bar{y})^2
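The two-parameter method-of-moments estimates translate directly into code; a minimal sketch assuming numpy is available, with beta_method_of_moments an illustrative helper name.

```python
# Method-of-moments fit of Beta(alpha, beta) on [0, 1] from sample mean and variance,
# following the formulas above.
import numpy as np

def beta_method_of_moments(samples):
    x = np.asarray(samples, dtype=float)
    m = x.mean()
    v = x.var(ddof=1)
    if not v < m * (1.0 - m):
        raise ValueError("method of moments requires sample variance < mean*(1-mean)")
    common = m * (1.0 - m) / v - 1.0
    return m * common, (1.0 - m) * common   # (alpha_hat, beta_hat)

rng = np.random.default_rng(2)
data = rng.beta(2.0, 5.0, size=10_000)
print(beta_method_of_moments(data))   # should be close to (2, 5)
```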


Four unknown parameters

All four parameters (\hat{\alpha}, \hat{\beta}, \hat{a}, \hat{c}) of a beta distribution supported in the [''a'', ''c''] interval (see the section "Alternative parametrizations, Four parameters") can be estimated using the method of moments developed by Karl Pearson, by equating sample and population values of the first four central moments (mean, variance, skewness and excess kurtosis). The excess kurtosis was expressed in terms of the square of the skewness and the sample size ν = α + β (see the previous section "Kurtosis") as follows:

:\text{excess kurtosis} =\frac{6}{3 + \nu}\left(\frac{(2 + \nu)}{4} (\text{skewness})^2 - 1\right)\text{ if }(\text{skewness})^2-2< \text{excess kurtosis}< \tfrac{3}{2} (\text{skewness})^2

One can use this equation to solve for the sample size ν = α + β in terms of the square of the skewness and the excess kurtosis as follows:

:\hat{\nu} = \hat{\alpha} + \hat{\beta} = 3\frac{(\text{sample excess kurtosis}) -(\text{sample skewness})^2+2}{\frac{3}{2} (\text{sample skewness})^2 - (\text{sample excess kurtosis})}
:\text{if } (\text{sample skewness})^2-2< \text{sample excess kurtosis}< \tfrac{3}{2} (\text{sample skewness})^2

This is the ratio (multiplied by a factor of 3) between the previously derived limit boundaries for the beta distribution in a space (as originally done by Karl Pearson) defined with coordinates of the square of the skewness in one axis and the excess kurtosis in the other axis (see the section "Kurtosis").

The case of zero skewness can be solved immediately, because for zero skewness α = β and hence ν = 2α = 2β, therefore α = β = ν/2:

: \hat{\alpha} = \hat{\beta} = \frac{\hat{\nu}}{2}= \frac{\frac{3}{2}(\text{sample excess kurtosis}) +3}{- (\text{sample excess kurtosis})}
: \text{if sample skewness}= 0 \text{ and } -2<\text{sample excess kurtosis}<0

(Excess kurtosis is negative for the beta distribution with zero skewness, ranging from −2 to 0, so that \hat{\nu}, and therefore the sample shape parameters, is positive, ranging from zero when the shape parameters approach zero and the excess kurtosis approaches −2, to infinity when the shape parameters approach infinity and the excess kurtosis approaches zero.)

For non-zero sample skewness one needs to solve a system of two coupled equations. Since the skewness and the excess kurtosis are independent of the parameters \hat{a}, \hat{c}, the parameters \hat{\alpha}, \hat{\beta} can be uniquely determined from the sample skewness and the sample excess kurtosis, by solving the coupled equations with two known variables (sample skewness and sample excess kurtosis) and two unknowns (the shape parameters):

:(\text{skewness})^2 = \frac{4(\beta-\alpha)^2 (1 + \alpha + \beta)}{\alpha \beta (2 + \alpha + \beta)^2}
:\text{excess kurtosis} =\frac{6}{3 + \alpha + \beta}\left(\frac{(2 + \alpha + \beta)}{4} (\text{skewness})^2 - 1\right)
:\text{if }(\text{skewness})^2-2< \text{excess kurtosis}< \tfrac{3}{2}(\text{skewness})^2

resulting in the following solution:

: \hat{\alpha}, \hat{\beta} = \frac{\hat{\nu}}{2} \left (1 \pm \frac{1}{\sqrt{1+ \frac{16 (\hat{\nu} + 1)}{(\hat{\nu}+ 2)^2(\text{sample skewness})^2}}} \right )
: \text{if sample skewness}\neq 0 \text{ and } (\text{sample skewness})^2-2< \text{sample excess kurtosis}< \tfrac{3}{2} (\text{sample skewness})^2

where one should take the solutions as follows: \hat{\alpha}>\hat{\beta} for (negative) sample skewness < 0, and \hat{\alpha}<\hat{\beta} for (positive) sample skewness > 0.

The accompanying plot shows these two solutions as surfaces in a space with horizontal axes of (sample excess kurtosis) and (sample squared skewness) and the shape parameters as the vertical axis. The surfaces are constrained by the condition that the sample excess kurtosis must be bounded by the sample squared skewness as stipulated in the above equation. The two surfaces meet at the right edge defined by zero skewness. Along this right edge both parameters are equal and the distribution is symmetric: U-shaped for α = β < 1, uniform for α = β = 1, upside-down-U-shaped for 1 < α = β < 2 and bell-shaped for α = β > 2. The surfaces also meet at the front (lower) edge defined by the "impossible boundary" line (excess kurtosis + 2 − skewness² = 0). Along this front (lower) boundary both shape parameters approach zero, and the probability density is concentrated more at one end than at the other end (with practically nothing in between), with probability ''p'' at the left end ''x'' = 0 and ''q'' = 1 − ''p'' at the right end ''x'' = 1. The two surfaces become further apart towards the rear edge, where the surface parameters are quite different from each other. As remarked, for example, by Bowman and Shenton, sampling in the neighborhood of the line (sample excess kurtosis − (3/2)(sample skewness)² = 0) (the just-J-shaped portion of the rear edge where blue meets beige) "is dangerously near to chaos", because at that line the denominator of the expression above for the estimate ν = α + β becomes zero, and hence ν approaches infinity as that line is approached. Bowman and Shenton write that "the higher moment parameters (kurtosis and skewness) are extremely fragile (near that line). However, the mean and standard deviation are fairly reliable." The problem therefore arises for four-parameter estimation of very skewed distributions for which the excess kurtosis approaches (3/2) times the square of the skewness. This boundary line is produced by extremely skewed distributions with very large values of one of the parameters and very small values of the other parameter. See the section "Kurtosis" for a numerical example and further comments about this rear-edge boundary line (sample excess kurtosis − (3/2)(sample skewness)² = 0). As remarked by Karl Pearson himself, this issue may not be of much practical importance, as this trouble arises only for very skewed J-shaped (or mirror-image J-shaped) distributions with very different values of the shape parameters, which are unlikely to occur much in practice. The usual skewed bell-shaped distributions that occur in practice do not have this parameter estimation problem.

The remaining two parameters \hat{a}, \hat{c} can be determined using the sample mean and the sample variance, using a variety of equations. One alternative is to calculate the support interval range (\hat{c}-\hat{a}) based on the sample variance and the sample kurtosis. For this purpose one can solve, in terms of the range (\hat{c}- \hat{a}), the equation expressing the excess kurtosis in terms of the sample variance and the sample size ν (see the sections "Kurtosis" and "Alternative parametrizations, four parameters"):

:\text{sample excess kurtosis} =\frac{6}{(2 + \hat{\nu})(3 + \hat{\nu})}\bigg(\frac{(\hat{c}- \hat{a})^2}{\text{(sample variance)}} - 6 - 5 \hat{\nu} \bigg)

to obtain:

: (\hat{c}- \hat{a}) = \sqrt{\text{(sample variance)}}\,\sqrt{6 + 5\hat{\nu} + \frac{(2+\hat{\nu})(3+\hat{\nu})}{6}(\text{sample excess kurtosis})}

Another alternative is to calculate the support interval range (\hat{c}-\hat{a}) based on the sample variance and the sample skewness. For this purpose one can solve, in terms of the range (\hat{c}-\hat{a}), the equation expressing the squared skewness in terms of the sample variance and the sample size ν (see the sections titled "Skewness" and "Alternative parametrizations, four parameters"):

:(\text{sample skewness})^2 = \frac{4}{(\hat{\nu}+2)^2}\bigg(\frac{(\hat{c}- \hat{a})^2}{\text{(sample variance)}}-4(1+\hat{\nu})\bigg)

to obtain:

: (\hat{c}- \hat{a}) = \frac{\sqrt{\text{(sample variance)}}}{2}\,\sqrt{(\hat{\nu}+2)^2(\text{sample skewness})^2+16(1+\hat{\nu})}

The remaining parameter can be determined from the sample mean and the previously obtained parameters (\hat{c}-\hat{a}), \hat{\alpha}, \hat{\nu} = \hat{\alpha}+\hat{\beta}:

:  \hat{a} = (\text{sample mean}) - \left(\frac{\hat{\alpha}}{\hat{\nu}}\right)(\hat{c}-\hat{a})

and finally, \hat{c}= (\hat{c}- \hat{a}) + \hat{a}.

In the above formulas one may take, for example, as estimates of the sample moments:

:\begin{align}
\text{sample mean} &=\overline{y} = \frac{1}{N}\sum_{i=1}^N Y_i \\
\text{sample variance} &= \overline{v}_Y = \frac{1}{N}\sum_{i=1}^N (Y_i - \overline{y})^2 \\
\text{sample skewness} &= G_1 = \frac{\sqrt{N(N-1)}}{N-2}\, \frac{\frac{1}{N} \sum_{i=1}^N (Y_i-\overline{y})^3}{\overline{v}_Y^{3/2}} \\
\text{sample excess kurtosis} &= G_2 = \frac{(N+1)(N-1)}{(N-2)(N-3)}\, \frac{\frac{1}{N} \sum_{i=1}^N (Y_i-\overline{y})^4}{\overline{v}_Y^{2}} - \frac{3(N-1)^2}{(N-2)(N-3)}
\end{align}

The estimators ''G''1 for sample skewness and ''G''2 for sample kurtosis are used by DAP/SAS, PSPP/SPSS, and Excel. However, they are not used by BMDP and (according to Joanes and Gill) they were not used by MINITAB in 1998. In fact, Joanes and Gill in their 1998 study concluded that the skewness and kurtosis estimators used in BMDP and in MINITAB (at that time) had smaller variance and mean-squared error in normal samples, but that the skewness and kurtosis estimators used in DAP/SAS and PSPP/SPSS, namely ''G''1 and ''G''2, had smaller mean-squared error in samples from a very skewed distribution. It is for this reason that "sample skewness", etc., is spelled out in the above formulas, to make it explicit that the user should choose the best estimator according to the problem at hand, as the best estimator for skewness and kurtosis depends on the amount of skewness (as shown by Joanes and Gill).
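The recipe above is mechanical enough to sketch in a few lines of code. The following Python fragment is a minimal illustration, not a robust estimator: it assumes the sample satisfies the admissibility condition on skewness and kurtosis, uses the plain (biased) moment estimates rather than ''G''1/''G''2, and takes the kurtosis-based formula for the range; the function and variable names are ours, not from any library.

```python
import numpy as np

def beta4_method_of_moments(y):
    """Rough four-parameter beta fit (alpha, beta, a, c) by matching
    mean, variance, skewness and excess kurtosis, as described above."""
    y = np.asarray(y, dtype=float)
    mean = y.mean()
    var = y.var()                                        # 1/N moment estimate
    skew = ((y - mean) ** 3).mean() / var ** 1.5
    kurt = ((y - mean) ** 4).mean() / var ** 2 - 3.0     # excess kurtosis

    if not (skew ** 2 - 2 < kurt < 1.5 * skew ** 2):
        raise ValueError("sample moments outside the beta-admissible region")

    if np.isclose(skew, 0.0):
        nu = (1.5 * kurt + 3.0) / (-kurt)                # zero-skewness case
        alpha = beta = nu / 2.0
    else:
        nu = 3.0 * (kurt - skew ** 2 + 2.0) / (1.5 * skew ** 2 - kurt)
        delta = 1.0 / np.sqrt(1.0 + 16.0 * (nu + 1.0)
                              / ((nu + 2.0) ** 2 * skew ** 2))
        alpha, beta = nu / 2.0 * (1.0 + delta), nu / 2.0 * (1.0 - delta)
        if skew > 0:                                     # positive skew -> alpha < beta
            alpha, beta = beta, alpha

    # Range from the sample variance and excess kurtosis, then the endpoints
    rng = np.sqrt(var) * np.sqrt(6.0 + 5.0 * nu
                                 + (2.0 + nu) * (3.0 + nu) / 6.0 * kurt)
    a = mean - (alpha / nu) * rng
    c = a + rng
    return alpha, beta, a, c
```

On a large sample from a bell-shaped beta this recovers the parameters reasonably well; near the (3/2)·(skewness)² boundary it inherits the fragility discussed above.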


_Maximum_likelihood


_=Two_unknown_parameters

As is also the case for maximum likelihood estimates of the gamma distribution, the maximum likelihood estimates for the beta distribution do not have a general closed form solution for arbitrary values of the shape parameters. If ''X''1, ..., ''XN'' are independent random variables each having a beta distribution, the joint log likelihood function for ''N'' iid observations is:

:\begin{align}
\ln\, \mathcal{L} (\alpha, \beta\mid X) &= \sum_{i=1}^N \ln \left (\mathcal{L}_i (\alpha, \beta\mid X_i) \right )\\
&= \sum_{i=1}^N \ln \left (f(X_i;\alpha,\beta) \right ) \\
&= \sum_{i=1}^N \ln \left (\frac{X_i^{\alpha-1}(1-X_i)^{\beta-1}}{\Beta(\alpha,\beta)} \right ) \\
&= (\alpha - 1)\sum_{i=1}^N \ln (X_i) + (\beta- 1)\sum_{i=1}^N  \ln (1-X_i) - N \ln \Beta(\alpha,\beta)
\end{align}

Finding the maximum with respect to a shape parameter involves taking the partial derivative with respect to that shape parameter and setting the expression equal to zero, yielding the maximum likelihood estimator of the shape parameters:

:\frac{\partial \ln \mathcal{L}(\alpha,\beta\mid X)}{\partial \alpha} = \sum_{i=1}^N \ln X_i -N\frac{\partial \ln \Beta(\alpha,\beta)}{\partial \alpha}=0
:\frac{\partial \ln \mathcal{L}(\alpha,\beta\mid X)}{\partial \beta} = \sum_{i=1}^N  \ln (1-X_i)- N\frac{\partial \ln \Beta(\alpha,\beta)}{\partial \beta}=0

where:

:\frac{\partial \ln \Beta(\alpha,\beta)}{\partial \alpha} = -\psi(\alpha + \beta) + \psi(\alpha)
:\frac{\partial \ln \Beta(\alpha,\beta)}{\partial \beta}= -\psi(\alpha + \beta) + \psi(\beta)

since the digamma function, denoted ψ(α), is defined as the logarithmic derivative of the gamma function:

:\psi(\alpha) =\frac{d\ln\Gamma(\alpha)}{d\alpha}

To ensure that the values with zero tangent slope are indeed a maximum (instead of a saddle-point or a minimum), one also has to satisfy the condition that the curvature is negative. This amounts to satisfying that the second partial derivative with respect to the shape parameters is negative:

:\frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{\partial \alpha^2}= -N\frac{\partial^2\ln \Beta(\alpha,\beta)}{\partial \alpha^2}<0
:\frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{\partial \beta^2} = -N\frac{\partial^2\ln \Beta(\alpha,\beta)}{\partial \beta^2}<0

Using the previous equations, this is equivalent to:

:\frac{\partial^2\ln \Beta(\alpha,\beta)}{\partial \alpha^2} = \psi_1(\alpha)-\psi_1(\alpha + \beta) > 0
:\frac{\partial^2\ln \Beta(\alpha,\beta)}{\partial \beta^2} = \psi_1(\beta) -\psi_1(\alpha + \beta) > 0

where the trigamma function, denoted ''ψ''1(''α''), is the second of the polygamma functions, defined as the derivative of the digamma function:

:\psi_1(\alpha) = \frac{d^2\ln\Gamma(\alpha)}{d\alpha^2}=\, \frac{d\psi(\alpha)}{d\alpha}.

These conditions are equivalent to stating that the variances of the logarithmically transformed variables are positive, since:

:\operatorname{var}[\ln (X)] = \operatorname{E}[\ln^2 (X)] - (\operatorname{E}[\ln (X)])^2 = \psi_1(\alpha) - \psi_1(\alpha + \beta)
:\operatorname{var}[\ln (1-X)] = \operatorname{E}[\ln^2 (1-X)] - (\operatorname{E}[\ln (1-X)])^2 = \psi_1(\beta) - \psi_1(\alpha + \beta)

Therefore, the condition of negative curvature at a maximum is equivalent to the statements:

:   \operatorname{var}[\ln (X)] > 0
:   \operatorname{var}[\ln (1-X)] > 0

Alternatively, the condition of negative curvature at a maximum is also equivalent to stating that the following logarithmic derivatives of the geometric means ''GX'' and ''G(1−X)'' are positive, since:

: \psi_1(\alpha) - \psi_1(\alpha + \beta) = \frac{\partial \ln G_X}{\partial \alpha} > 0
: \psi_1(\beta)  - \psi_1(\alpha + \beta) = \frac{\partial \ln G_{(1-X)}}{\partial \beta} > 0

While these slopes are indeed positive, the other slopes are negative:

:\frac{\partial \ln G_X}{\partial \beta},\ \frac{\partial \ln G_{(1-X)}}{\partial \alpha} < 0.

The slopes of the mean and the median with respect to ''α'' and ''β'' display similar sign behavior.

From the condition that at a maximum the partial derivative with respect to the shape parameter equals zero, we obtain the following system of coupled maximum likelihood estimate equations (for the average log-likelihoods) that needs to be inverted to obtain the (unknown) shape parameter estimates \hat{\alpha},\hat{\beta} in terms of the (known) average of logarithms of the samples ''X''1, ..., ''XN'':

:\begin{align}
\hat{\operatorname{E}}[\ln (X)] &= \psi(\hat{\alpha}) - \psi(\hat{\alpha} + \hat{\beta})=\frac{1}{N}\sum_{i=1}^N \ln X_i =  \ln \hat{G}_X \\
\hat{\operatorname{E}}[\ln(1-X)] &= \psi(\hat{\beta}) - \psi(\hat{\alpha} + \hat{\beta})=\frac{1}{N}\sum_{i=1}^N \ln (1-X_i)= \ln \hat{G}_{(1-X)}
\end{align}

where we recognize \ln \hat{G}_X as the logarithm of the sample geometric mean and \ln \hat{G}_{(1-X)} as the logarithm of the sample geometric mean based on (1 − ''X''), the mirror-image of ''X''. For \hat{\alpha}=\hat{\beta}, it follows that \hat{G}_X=\hat{G}_{(1-X)}.

:\begin{align}
\hat{G}_X &= \prod_{i=1}^N (X_i)^{1/N} \\
\hat{G}_{(1-X)} &= \prod_{i=1}^N (1-X_i)^{1/N}
\end{align}

These coupled equations containing digamma functions of the shape parameter estimates \hat{\alpha},\hat{\beta} must be solved by numerical methods, as done, for example, by Beckman et al. Gnanadesikan et al. give numerical solutions for a few cases. N. L. Johnson and S. Kotz suggest that for "not too small" shape parameter estimates \hat{\alpha},\hat{\beta}, the logarithmic approximation to the digamma function \psi(\hat{\alpha}) \approx \ln(\hat{\alpha}-\tfrac{1}{2}) may be used to obtain initial values for an iterative solution, since the equations resulting from this approximation can be solved exactly:

:\ln \frac{\hat{\alpha} - \frac{1}{2}}{\hat{\alpha}+\hat{\beta}-\frac{1}{2}} \approx  \ln \hat{G}_X
:\ln \frac{\hat{\beta}-\frac{1}{2}}{\hat{\alpha}+\hat{\beta}-\frac{1}{2}}\approx \ln \hat{G}_{(1-X)}

which leads to the following solution for the initial values (of the estimated shape parameters in terms of the sample geometric means) for an iterative solution:

:\hat{\alpha}\approx \tfrac{1}{2} + \frac{\hat{G}_X}{2\left(1-\hat{G}_X-\hat{G}_{(1-X)}\right)} \text{ if } \hat{\alpha} >1
:\hat{\beta}\approx \tfrac{1}{2} + \frac{\hat{G}_{(1-X)}}{2\left(1-\hat{G}_X-\hat{G}_{(1-X)}\right)} \text{ if } \hat{\beta} > 1

Alternatively, the estimates provided by the method of moments can instead be used as initial values for an iterative solution of the maximum likelihood coupled equations in terms of the digamma functions.

When the distribution is required over a known interval other than [0, 1] with random variable ''X'', say [''a'', ''c''] with random variable ''Y'', then replace ln(''Xi'') in the first equation with

:\ln \frac{Y_i-a}{c-a},

and replace ln(1−''Xi'') in the second equation with

:\ln \frac{c-Y_i}{c-a}

(see the "Alternative parametrizations, four parameters" section below).

If one of the shape parameters is known, the problem is considerably simplified. The following logit transformation can be used to solve for the unknown shape parameter (for skewed cases such that \hat{\alpha}\neq\hat{\beta}; otherwise, if symmetric, both equal parameters are known when one is known):

:\hat{\operatorname{E}} \left[\ln \left(\frac{X}{1-X} \right) \right]=\psi(\hat{\alpha}) - \psi(\hat{\beta})=\frac{1}{N}\sum_{i=1}^N \ln\frac{X_i}{1-X_i} =  \ln \hat{G}_X - \ln \hat{G}_{(1-X)}

This logit transformation is the logarithm of the transformation that divides the variable ''X'' by its mirror-image (''X''/(1 − ''X'')), resulting in the "inverted beta distribution" or beta prime distribution (also known as beta distribution of the second kind or Pearson's Type VI) with support [0, +∞). As previously discussed in the section "Moments of logarithmically transformed random variables", the logit transformation \ln\frac{X}{1-X}, studied by Johnson, extends the finite support [0, 1] based on the original variable ''X'' to infinite support in both directions of the real line (−∞, +∞).

If, for example, \hat{\beta} is known, the unknown parameter \hat{\alpha} can be obtained in terms of the inverse digamma function of the right hand side of this equation:

:\psi(\hat{\alpha})=\frac{1}{N}\sum_{i=1}^N \ln\frac{X_i}{1-X_i} + \psi(\hat{\beta})
:\hat{\alpha}=\psi^{-1}(\ln \hat{G}_X - \ln \hat{G}_{(1-X)} + \psi(\hat{\beta}))

In particular, if one of the shape parameters has a value of unity, for example \hat{\beta} = 1 (the power function distribution with bounded support [0,1]), using the identity ψ(''x'' + 1) = ψ(''x'') + 1/''x'' in the equation \psi(\hat{\alpha}) - \psi(\hat{\alpha} + \hat{\beta})= \ln \hat{G}_X, the maximum likelihood estimator for the unknown parameter \hat{\alpha} is, exactly:

:\hat{\alpha}= - \frac{1}{\frac{1}{N}\sum_{i=1}^N \ln X_i}= - \frac{1}{\ln \hat{G}_X}

The beta distribution has support [0, 1], therefore \hat{G}_X < 1, and hence (-\ln \hat{G}_X) >0, and therefore \hat{\alpha} >0.

In conclusion, the maximum likelihood estimates of the shape parameters of a beta distribution are (in general) a complicated function of the sample geometric mean and of the sample geometric mean based on (1−''X''), the mirror-image of ''X''. One may ask: if the variance (in addition to the mean) is necessary to estimate two shape parameters with the method of moments, why is the (logarithmic or geometric) variance not necessary to estimate two shape parameters with the maximum likelihood method, for which only the geometric means suffice? The answer is that the mean does not provide as much information as the geometric mean. For a beta distribution with equal shape parameters ''α'' = ''β'', the mean is exactly 1/2, regardless of the value of the shape parameters, and therefore regardless of the value of the statistical dispersion (the variance). On the other hand, the geometric mean of a beta distribution with equal shape parameters ''α'' = ''β'' depends on the value of the shape parameters, and therefore it contains more information. Also, the geometric mean of a beta distribution does not satisfy the symmetry conditions satisfied by the mean; therefore, by employing both the geometric mean based on ''X'' and the geometric mean based on (1 − ''X''), the maximum likelihood method is able to provide best estimates for both parameters ''α'' = ''β'' without needing to employ the variance.

One can express the joint log likelihood per ''N'' iid observations in terms of the ''sufficient statistics'' (the sample geometric means) as follows:

:\frac{\ln\, \mathcal{L} (\alpha, \beta\mid X)}{N} = (\alpha - 1)\ln \hat{G}_X + (\beta- 1)\ln \hat{G}_{(1-X)}- \ln \Beta(\alpha,\beta).
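A minimal numerical sketch of this two-parameter fit, assuming SciPy is available: it solves the two coupled digamma equations with scipy.optimize.fsolve, starting from the Johnson–Kotz initial values above (falling back to (1, 1) when those are not applicable); the function and variable names are ours.

```python
import numpy as np
from scipy.optimize import fsolve
from scipy.special import digamma

def beta_mle(x):
    """Maximum likelihood fit of Beta(alpha, beta) for data in (0, 1),
    via the coupled digamma equations in the sample geometric means."""
    x = np.asarray(x, dtype=float)
    ln_gx = np.mean(np.log(x))          # ln of geometric mean of X
    ln_g1mx = np.mean(np.log1p(-x))     # ln of geometric mean of 1 - X

    # Johnson & Kotz style starting point (useful when it yields values > 1)
    gx, g1mx = np.exp(ln_gx), np.exp(ln_g1mx)
    denom = 2.0 * (1.0 - gx - g1mx)
    start = (0.5 + gx / denom, 0.5 + g1mx / denom) if denom > 0 else (1.0, 1.0)

    def equations(p):
        a, b = p
        return (digamma(a) - digamma(a + b) - ln_gx,
                digamma(b) - digamma(a + b) - ln_g1mx)

    alpha, beta = fsolve(equations, start)
    return alpha, beta
```

Because ln ĜX and ln Ĝ(1−X) are sufficient statistics, the data enter the solver only through these two averages.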
We can plot the joint log likelihood per ''N'' observations for fixed values of the sample geometric means to see the behavior of the likelihood function as a function of the shape parameters α and β. In such a plot, the shape parameter estimators \hat{\alpha},\hat{\beta} correspond to the maxima of the likelihood function. See the accompanying graph, which shows that all the likelihood functions intersect at α = β = 1, the values of the shape parameters that give the maximum entropy (the maximum entropy occurs for shape parameters equal to unity: the uniform distribution). It is evident from the plot that the likelihood function gives sharp peaks for values of the shape parameter estimators close to zero, but that for values of the shape parameter estimators greater than one, the likelihood function becomes quite flat, with less well defined peaks. Consequently, maximum likelihood parameter estimation for the beta distribution becomes less reliable for larger values of the shape parameter estimators, as the uncertainty in the peak location increases with the value of the shape parameter estimators. One can arrive at the same conclusion by noticing that the expression for the curvature of the likelihood function is in terms of the geometric variances:

:\frac{1}{N}\frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{\partial \alpha^2}= -\operatorname{var}[\ln X]
:\frac{1}{N}\frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{\partial \beta^2} = -\operatorname{var}[\ln (1-X)]

These variances (and therefore the curvatures) are much larger for small values of the shape parameters α and β. However, for shape parameter values α, β > 1, the variances (and therefore the curvatures) flatten out. Equivalently, this result follows from the Cramér–Rao bound, since the Fisher information matrix components for the beta distribution are these logarithmic variances. The Cramér–Rao bound states that the variance of any ''unbiased'' estimator \hat{\alpha} of α is bounded by the reciprocal of the Fisher information:

:\mathrm{var}(\hat{\alpha})\geq\frac{1}{\operatorname{var}[\ln X]}=\frac{1}{\psi_1(\alpha) - \psi_1(\alpha + \beta)}
:\mathrm{var}(\hat{\beta}) \geq\frac{1}{\operatorname{var}[\ln (1-X)]}=\frac{1}{\psi_1(\beta) - \psi_1(\alpha + \beta)}

so the variance of the estimators increases with increasing α and β, as the logarithmic variances decrease.

Also, one can express the joint log likelihood per ''N'' iid observations in terms of the digamma function expressions for the logarithms of the sample geometric means as follows:

:\frac{\ln\, \mathcal{L} (\alpha, \beta\mid X)}{N} = (\alpha - 1)(\psi(\hat{\alpha}) - \psi(\hat{\alpha} + \hat{\beta}))+(\beta- 1)(\psi(\hat{\beta}) - \psi(\hat{\alpha} + \hat{\beta}))- \ln \Beta(\alpha,\beta)

This expression is identical to the negative of the cross-entropy (see the section on "Quantities of information (entropy)"). Therefore, finding the maximum of the joint log likelihood of the shape parameters, per ''N'' iid observations, is identical to finding the minimum of the cross-entropy for the beta distribution, as a function of the shape parameters.

:\frac{\ln\, \mathcal{L} (\alpha, \beta\mid X)}{N} = - H = -h - D_{\mathrm{KL}} = -\ln\Beta(\alpha,\beta)+(\alpha-1)\psi(\hat{\alpha})+(\beta-1)\psi(\hat{\beta})-(\alpha+\beta-2)\psi(\hat{\alpha}+\hat{\beta})

with the cross-entropy defined as follows:

:H = \int_{0}^1 - f(X;\hat{\alpha},\hat{\beta}) \ln (f(X;\alpha,\beta)) \, dX
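The flatness described above is easy to see numerically. The short sketch below is our own illustration, not taken from the literature: it evaluates the per-observation log likelihood on a grid of (α, β), using only two hypothetical sample geometric means.

```python
import numpy as np
from scipy.special import betaln

def per_obs_loglik(alpha, beta, ln_gx, ln_g1mx):
    """Joint log likelihood per observation, written in terms of the
    sufficient statistics ln G_X and ln G_(1-X)."""
    return (alpha - 1.0) * ln_gx + (beta - 1.0) * ln_g1mx - betaln(alpha, beta)

# Hypothetical sample geometric means (any values with G_X + G_(1-X) < 1 work)
ln_gx, ln_g1mx = np.log(0.55), np.log(0.35)
grid = np.linspace(0.1, 10.0, 200)
surface = per_obs_loglik(grid[:, None], grid[None, :], ln_gx, ln_g1mx)
i, j = np.unravel_index(np.argmax(surface), surface.shape)
print("peak near alpha ≈", grid[i], ", beta ≈", grid[j])
```

Plotting `surface` shows the sharply peaked region for small shape parameters and the flat ridge for larger ones, consistent with the curvature argument above.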


_=Four_unknown_parameters

The procedure is similar to the one followed in the two unknown parameter case. If ''Y''1, ..., ''YN'' are independent random variables each having a beta distribution with four parameters, the joint log likelihood function for ''N'' iid observations is:

:\begin{align}
\ln\, \mathcal{L} (\alpha, \beta, a, c\mid Y) &= \sum_{i=1}^N \ln\,\mathcal{L}_i (\alpha, \beta, a, c\mid Y_i)\\
&= \sum_{i=1}^N \ln\,f(Y_i; \alpha, \beta, a, c) \\
&= \sum_{i=1}^N \ln\,\frac{(Y_i-a)^{\alpha-1}(c-Y_i)^{\beta-1}}{(c-a)^{\alpha+\beta-1}\Beta(\alpha,\beta)}\\
&= (\alpha - 1)\sum_{i=1}^N  \ln (Y_i - a) + (\beta- 1)\sum_{i=1}^N  \ln (c - Y_i)- N \ln \Beta(\alpha,\beta) - N (\alpha+\beta - 1) \ln (c - a)
\end{align}

Finding the maximum with respect to a parameter involves taking the partial derivative with respect to that parameter and setting the expression equal to zero, yielding the maximum likelihood estimator:

:\frac{\partial \ln \mathcal{L}(\alpha, \beta, a, c\mid Y)}{\partial \alpha}= \sum_{i=1}^N  \ln (Y_i - a) - N(-\psi(\alpha + \beta) + \psi(\alpha))- N \ln (c - a)= 0
:\frac{\partial \ln \mathcal{L}(\alpha, \beta, a, c\mid Y)}{\partial \beta} = \sum_{i=1}^N  \ln (c - Y_i) - N(-\psi(\alpha + \beta)  + \psi(\beta))- N \ln (c - a)= 0
:\frac{\partial \ln \mathcal{L}(\alpha, \beta, a, c\mid Y)}{\partial a} = -(\alpha - 1) \sum_{i=1}^N  \frac{1}{Y_i - a} + N (\alpha+\beta - 1)\frac{1}{c - a}= 0
:\frac{\partial \ln \mathcal{L}(\alpha, \beta, a, c\mid Y)}{\partial c} = (\beta- 1) \sum_{i=1}^N  \frac{1}{c - Y_i} - N (\alpha+\beta - 1) \frac{1}{c - a} = 0

These equations can be re-arranged as the following system of four coupled equations (the first two equations involve geometric means and the second two equations involve harmonic means) in terms of the maximum likelihood estimates for the four parameters \hat{\alpha}, \hat{\beta}, \hat{a}, \hat{c}:

:\frac{1}{N}\sum_{i=1}^N  \ln \frac{Y_i - \hat{a}}{\hat{c}-\hat{a}} = \psi(\hat{\alpha})-\psi(\hat{\alpha} +\hat{\beta} )=  \ln \hat{G}_X
:\frac{1}{N}\sum_{i=1}^N  \ln \frac{\hat{c} - Y_i}{\hat{c}-\hat{a}} =  \psi(\hat{\beta})-\psi(\hat{\alpha} + \hat{\beta})=  \ln \hat{G}_{(1-X)}
:\frac{1}{\frac{1}{N}\sum_{i=1}^N \frac{\hat{c}-\hat{a}}{Y_i - \hat{a}}} = \frac{\hat{\alpha}-1}{\hat{\alpha}+\hat{\beta}-1}=  \hat{H}_X
:\frac{1}{\frac{1}{N}\sum_{i=1}^N \frac{\hat{c}-\hat{a}}{\hat{c} - Y_i}} = \frac{\hat{\beta}- 1}{\hat{\alpha}+\hat{\beta}-1} =  \hat{H}_{(1-X)}

with sample geometric means:

:\hat{G}_X = \prod_{i=1}^{N} \left (\frac{Y_i-\hat{a}}{\hat{c}-\hat{a}} \right )^{1/N}
:\hat{G}_{(1-X)} = \prod_{i=1}^{N} \left (\frac{\hat{c}-Y_i}{\hat{c}-\hat{a}} \right )^{1/N}

The parameters \hat{a}, \hat{c} are embedded inside the geometric mean expressions in a nonlinear way (to the power 1/''N''). This precludes, in general, a closed form solution, even for an initial value approximation for iteration purposes. One alternative is to use as initial values for iteration the values obtained from the method of moments solution for the four parameter case. Furthermore, the expressions for the harmonic means are well-defined only for \hat{\alpha}, \hat{\beta} > 1, which precludes a maximum likelihood solution for shape parameters less than unity in the four-parameter case. Fisher's information matrix for the four parameter case is positive-definite only for α, β > 2 (for further discussion, see the section on the Fisher information matrix, four parameter case), that is, for bell-shaped (symmetric or unsymmetric) beta distributions, with inflection points located to either side of the mode. The following Fisher information components (which represent the expectations of the curvature of the log likelihood function) have singularities at the following values:

:\alpha = 2: \quad \operatorname{E} \left [- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial a^2} \right ]= \mathcal{I}_{a a}
:\beta = 2: \quad \operatorname{E}\left [- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial c^2} \right ] = \mathcal{I}_{c c}
:\alpha = 1: \quad \operatorname{E}\left [- \frac{1}{N}\frac{\partial^2\ln \mathcal{L}}{\partial \alpha\, \partial a}\right ] = \mathcal{I}_{\alpha a}
:\beta = 1: \quad \operatorname{E}\left [- \frac{1}{N}\frac{\partial^2\ln \mathcal{L}}{\partial \beta\, \partial c} \right ] = \mathcal{I}_{\beta c}

(for further discussion see the section on the Fisher information matrix). Thus, it is not possible to strictly carry out maximum likelihood estimation for some well known distributions belonging to the four-parameter beta distribution family, like the uniform distribution (Beta(1, 1, ''a'', ''c'')) and the arcsine distribution (Beta(1/2, 1/2, ''a'', ''c'')). N. L. Johnson and S. Kotz ignore the equations for the harmonic means and instead suggest: "If a and c are unknown, and maximum likelihood estimators of ''a'', ''c'', α and β are required, the above procedure (for the two unknown parameter case, with ''X'' transformed as ''X'' = (''Y'' − ''a'')/(''c'' − ''a'')) can be repeated using a succession of trial values of ''a'' and ''c'', until the pair (''a'', ''c'') for which maximum likelihood (given ''a'' and ''c'') is as great as possible, is attained" (where, for the purpose of clarity, their notation for the parameters has been translated into the present notation).
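A minimal sketch of this Johnson–Kotz style profiling idea, assuming the two-parameter routine beta_mle from the earlier sketch and SciPy: for each trial pair (a, c) enclosing the data, rescale to (0, 1), fit (α, β), and keep the pair with the largest four-parameter log likelihood. This is an illustration only, not a robust routine, and the trial grid is an arbitrary choice of ours.

```python
import numpy as np
from scipy.special import betaln

def beta4_profile_mle(y, n_grid=25):
    """Crude four-parameter fit by profiling the likelihood over (a, c)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    span = y.max() - y.min()
    a_trials = y.min() - span * np.linspace(0.001, 0.5, n_grid)
    c_trials = y.max() + span * np.linspace(0.001, 0.5, n_grid)

    best = (-np.inf, None)
    for a in a_trials:
        for c in c_trials:
            x = (y - a) / (c - a)                 # map data into (0, 1)
            alpha, beta = beta_mle(x)             # two-parameter inner fit
            loglik = ((alpha - 1) * np.sum(np.log(y - a))
                      + (beta - 1) * np.sum(np.log(c - y))
                      - n * betaln(alpha, beta)
                      - n * (alpha + beta - 1) * np.log(c - a))
            if loglik > best[0]:
                best = (loglik, (alpha, beta, a, c))
    return best[1]
```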


_Fisher_information_matrix

Let a random variable ''X'' have a probability density ''f''(''x''; ''α''). The partial derivative with respect to the (unknown, and to be estimated) parameter α of the log likelihood function is called the score. The second moment of the score is called the Fisher information:

:\mathcal{I}(\alpha)=\operatorname{E} \left [\left (\frac{\partial}{\partial\alpha} \ln \mathcal{L}(\alpha\mid X) \right )^2 \right],

The expectation of the score is zero, therefore the Fisher information is also the second moment centered on the mean of the score: the variance of the score.

If the log likelihood function is twice differentiable with respect to the parameter α, and under certain regularity conditions, then the Fisher information may also be written as follows (which is often a more convenient form for calculation purposes):

:\mathcal{I}(\alpha) = - \operatorname{E} \left [\frac{\partial^2}{\partial\alpha^2} \ln (\mathcal{L}(\alpha\mid X)) \right].

Thus, the Fisher information is the negative of the expectation of the second derivative with respect to the parameter α of the log likelihood function. Therefore, Fisher information is a measure of the curvature of the log likelihood function of α. A low-curvature (and therefore high radius of curvature), flatter log likelihood function curve has low Fisher information, while a log likelihood function curve with large curvature (and therefore low radius of curvature) has high Fisher information. When the Fisher information matrix is computed at the parameter estimates ("the observed Fisher information matrix") it is equivalent to the replacement of the true log likelihood surface by a Taylor series approximation, taken as far as the quadratic terms. The word "information", in the context of Fisher information, refers to information about the parameters, concerning matters such as estimation, sufficiency and the properties of variances of estimators. The Cramér–Rao bound states that the inverse of the Fisher information is a lower bound on the variance of any unbiased estimator of a parameter α:

:\operatorname{var}[\hat\alpha] \geq \frac{1}{\mathcal{I}(\alpha)}.

The precision to which one can estimate the parameter α is limited by the Fisher information of the log likelihood function. The Fisher information is a measure of the minimum error involved in estimating a parameter of a distribution and it can be viewed as a measure of the resolving power of an experiment needed to discriminate between two alternative hypotheses about a parameter.

When there are ''N'' parameters

: \begin{bmatrix} \theta_1 \\ \theta_2 \\ \vdots \\ \theta_N \end{bmatrix},

then the Fisher information takes the form of an ''N''×''N'' positive semidefinite symmetric matrix, the Fisher information matrix, with typical element:

:{(\mathcal{I}(\theta))}_{i,j}=\operatorname{E} \left [\left (\frac{\partial}{\partial\theta_i} \ln \mathcal{L} \right) \left(\frac{\partial}{\partial\theta_j} \ln \mathcal{L} \right) \right ].

Under certain regularity conditions, the Fisher information matrix may also be written in the following form, which is often more convenient for computation:

: {(\mathcal{I}(\theta))}_{i,j} = - \operatorname{E} \left [\frac{\partial^2}{\partial\theta_i \, \partial\theta_j} \ln (\mathcal{L}) \right ]\,.

With ''X''1, ..., ''XN'' iid random variables, an ''N''-dimensional "box" can be constructed with sides ''X''1, ..., ''XN''. Costa and Cover show that the (Shannon) differential entropy ''h''(''X'') is related to the volume of the typical set (having the sample entropy close to the true entropy), while the Fisher information is related to the surface of this typical set.
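As a quick sanity check of these definitions, one can verify numerically, for the beta distribution whose logarithmic moments appear in the next subsection, that the score has mean zero and that its variance matches the trigamma expression for the Fisher information. A rough Monte Carlo sketch, assuming SciPy/NumPy; the parameter values are arbitrary:

```python
import numpy as np
from scipy.special import digamma, polygamma
from scipy.stats import beta as beta_dist

rng = np.random.default_rng(0)
alpha, b = 2.5, 4.0
x = beta_dist.rvs(alpha, b, size=200_000, random_state=rng)

# Score with respect to alpha for a single beta observation:
# d/d(alpha) ln f(x; alpha, b) = ln x - (psi(alpha) - psi(alpha + b))
score = np.log(x) - (digamma(alpha) - digamma(alpha + b))

print("mean of score ≈", score.mean())                       # ≈ 0
print("var of score  ≈", score.var())                        # ≈ Fisher information
print("trigamma form  ", polygamma(1, alpha) - polygamma(1, alpha + b))
```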


_=Two_parameters

For ''X''1, ..., ''X''''N'' independent random variables each having a beta distribution parametrized with shape parameters ''α'' and ''β'', the joint log likelihood function for ''N'' iid observations is:

:\ln (\mathcal{L} (\alpha, \beta\mid X) )= (\alpha - 1)\sum_{i=1}^N \ln X_i + (\beta- 1)\sum_{i=1}^N  \ln (1-X_i)- N \ln \Beta(\alpha,\beta)

therefore the joint log likelihood function per ''N'' iid observations is:

:\frac{1}{N} \ln(\mathcal{L} (\alpha, \beta\mid X)) = (\alpha - 1)\frac{1}{N}\sum_{i=1}^N  \ln X_i + (\beta- 1)\frac{1}{N}\sum_{i=1}^N  \ln (1-X_i)-\, \ln \Beta(\alpha,\beta)

For the two parameter case, the Fisher information matrix has 4 components: 2 diagonal and 2 off-diagonal. Since the Fisher information matrix is symmetric, only one of these off-diagonal components is independent. Therefore, the Fisher information matrix has 3 independent components (2 diagonal and 1 off-diagonal).

Aryal and Nadarajah calculated Fisher's information matrix for the four-parameter case, from which the two parameter case can be obtained as follows:

:- \frac{1}{N}\frac{\partial^2\ln \mathcal{L}}{\partial \alpha^2}=  \operatorname{var}[\ln (X)]= \psi_1(\alpha) - \psi_1(\alpha + \beta) = \mathcal{I}_{\alpha, \alpha}= \operatorname{E}\left [- \frac{1}{N}\frac{\partial^2\ln \mathcal{L}}{\partial \alpha^2} \right ] = \ln \operatorname{var}_{GX}
:- \frac{1}{N}\frac{\partial^2\ln \mathcal{L}}{\partial \beta^2} = \operatorname{var}[\ln (1-X)]= \psi_1(\beta) - \psi_1(\alpha + \beta) = \mathcal{I}_{\beta, \beta}=  \operatorname{E}\left [- \frac{1}{N}\frac{\partial^2\ln \mathcal{L}}{\partial \beta^2} \right]= \ln \operatorname{var}_{G(1-X)}
:- \frac{1}{N}\frac{\partial^2\ln \mathcal{L}}{\partial \alpha\,\partial \beta} = \operatorname{cov}[\ln X,\ln(1-X)]  = -\psi_1(\alpha+\beta) = \mathcal{I}_{\alpha, \beta}=  \operatorname{E}\left [- \frac{1}{N}\frac{\partial^2\ln \mathcal{L}}{\partial \alpha\,\partial \beta} \right] = \ln \operatorname{cov}_{G X,(1-X)}

Since the Fisher information matrix is symmetric,

: \mathcal{I}_{\alpha, \beta}= \mathcal{I}_{\beta, \alpha}= \ln \operatorname{cov}_{G X,(1-X)}

The Fisher information components are equal to the log geometric variances and log geometric covariance. Therefore, they can be expressed as trigamma functions, denoted ψ1(α), the second of the polygamma functions, defined as the derivative of the digamma function:

:\psi_1(\alpha) = \frac{d^2\ln\Gamma(\alpha)}{d\alpha^2}=\, \frac{d\psi(\alpha)}{d\alpha}.

These derivatives are also derived in the section "Maximum likelihood, Two unknown parameters", and plots of the log likelihood function are shown in that section. The section on the geometric variance and covariance contains plots and further discussion of the Fisher information matrix components (the log geometric variances and log geometric covariance) as a function of the shape parameters α and β, and the section "Moments of logarithmically transformed random variables" contains formulas for moments of logarithmically transformed random variables; images of the components \mathcal{I}_{\alpha, \alpha}, \mathcal{I}_{\beta, \beta} and \mathcal{I}_{\alpha, \beta} appear there.

The determinant of Fisher's information matrix is of interest (for example, for the calculation of the Jeffreys prior probability). From the expressions for the individual components of the Fisher information matrix, it follows that the determinant of Fisher's (symmetric) information matrix for the beta distribution is:

:\begin{align}
\det(\mathcal{I}(\alpha, \beta))&= \mathcal{I}_{\alpha, \alpha} \mathcal{I}_{\beta, \beta}-\mathcal{I}_{\alpha, \beta} \mathcal{I}_{\beta, \alpha} \\
&=(\psi_1(\alpha) - \psi_1(\alpha + \beta))(\psi_1(\beta) - \psi_1(\alpha + \beta))-( -\psi_1(\alpha+\beta))( -\psi_1(\alpha+\beta))\\
&= \psi_1(\alpha)\psi_1(\beta)-( \psi_1(\alpha)+\psi_1(\beta))\psi_1(\alpha + \beta)\\
\lim_{\alpha\to 0} \det(\mathcal{I}(\alpha, \beta)) &=\lim_{\beta \to 0} \det(\mathcal{I}(\alpha, \beta)) = \infty\\
\lim_{\alpha\to \infty} \det(\mathcal{I}(\alpha, \beta)) &=\lim_{\beta \to \infty} \det(\mathcal{I}(\alpha, \beta)) = 0
\end{align}

From Sylvester's criterion (checking whether the diagonal elements are all positive), it follows that the Fisher information matrix for the two parameter case is positive-definite (under the standard condition that the shape parameters are positive, ''α'' > 0 and ''β'' > 0).
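These trigamma expressions translate directly into code. A small sketch of ours that builds the per-observation Fisher information matrix and its determinant, which reappears below in the Jeffreys prior:

```python
import numpy as np
from scipy.special import polygamma

def beta_fisher_info(alpha, beta):
    """Per-observation Fisher information matrix of Beta(alpha, beta)."""
    trig = lambda z: polygamma(1, z)          # trigamma function psi_1
    i_aa = trig(alpha) - trig(alpha + beta)   # var[ln X]
    i_bb = trig(beta) - trig(alpha + beta)    # var[ln (1 - X)]
    i_ab = -trig(alpha + beta)                # cov[ln X, ln (1 - X)]
    return np.array([[i_aa, i_ab], [i_ab, i_bb]])

info = beta_fisher_info(2.0, 3.0)
print(info)
print("det =", np.linalg.det(info))           # > 0: positive-definite
```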


_=Four_parameters

If ''Y''1, ..., ''YN'' are independent random variables each having a beta distribution with four parameters: the exponents ''α'' and ''β'', and also ''a'' (the minimum of the distribution range) and ''c'' (the maximum of the distribution range) (see the section "Alternative parametrizations, Four parameters"), with probability density function:

:f(y; \alpha, \beta, a, c) = \frac{f(x;\alpha,\beta)}{c-a} =\frac{ \left (\frac{y-a}{c-a} \right )^{\alpha-1} \left (\frac{c-y}{c-a} \right)^{\beta-1} }{(c-a)\Beta(\alpha, \beta)}=\frac{ (y-a)^{\alpha-1} (c-y)^{\beta-1} }{(c-a)^{\alpha+\beta-1}\Beta(\alpha, \beta)},

the joint log likelihood function per ''N'' iid observations is:

:\frac{1}{N} \ln(\mathcal{L} (\alpha, \beta, a, c\mid Y))= \frac{\alpha -1}{N}\sum_{i=1}^N  \ln (Y_i - a) + \frac{\beta -1}{N}\sum_{i=1}^N  \ln (c - Y_i)- \ln \Beta(\alpha,\beta) - (\alpha+\beta -1) \ln (c-a)

For the four parameter case, the Fisher information matrix has 4×4 = 16 components. It has 12 off-diagonal components (16 total − 4 diagonal). Since the Fisher information matrix is symmetric, half of these components (12/2 = 6) are independent. Therefore, the Fisher information matrix has 6 independent off-diagonal + 4 diagonal = 10 independent components. Aryal and Nadarajah calculated Fisher's information matrix for the four parameter case as follows:

:- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial \alpha^2}=  \operatorname{var}[\ln (X)]= \psi_1(\alpha) - \psi_1(\alpha + \beta) = \mathcal{I}_{\alpha, \alpha}= \operatorname{E}\left [- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial \alpha^2} \right ] = \ln (\operatorname{var}_{GX})
:-\frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial \beta^2} = \operatorname{var}[\ln (1-X)]= \psi_1(\beta) - \psi_1(\alpha + \beta) = \mathcal{I}_{\beta, \beta}=  \operatorname{E} \left [- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial \beta^2} \right ] = \ln(\operatorname{var}_{G(1-X)})
:-\frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial \alpha\,\partial \beta} = \operatorname{cov}[\ln X,\ln (1-X)]  = -\psi_1(\alpha+\beta) =\mathcal{I}_{\alpha, \beta}=  \operatorname{E} \left [- \frac{1}{N}\frac{\partial^2\ln \mathcal{L}}{\partial \alpha\,\partial \beta} \right ] = \ln(\operatorname{cov}_{G X,(1-X)})

In the above expressions, the use of ''X'' instead of ''Y'' in the expressions var[ln(''X'')] = ln(var''GX'') is ''not an error''. The expressions in terms of the log geometric variances and log geometric covariance occur as functions of the two-parameter ''X'' ~ Beta(''α'', ''β'') parametrization because, when taking the partial derivatives with respect to the exponents (''α'', ''β'') in the four parameter case, one obtains identical expressions as for the two parameter case: these terms of the four parameter Fisher information matrix are independent of the minimum ''a'' and maximum ''c'' of the distribution's range. The only non-zero term upon double differentiation of the log likelihood function with respect to the exponents ''α'' and ''β'' is the second derivative of the log of the beta function, ln(B(''α'', ''β'')), which is independent of the minimum ''a'' and maximum ''c'' of the distribution's range. Double differentiation of this term results in trigamma functions. The sections titled "Maximum likelihood", "Two unknown parameters" and "Four unknown parameters" also show this fact.

The Fisher information for ''N'' i.i.d. samples is ''N'' times the individual Fisher information (eq. 11.279, page 394 of Cover and Thomas). (Aryal and Nadarajah take a single observation, ''N'' = 1, to calculate the following components of the Fisher information, which leads to the same result as considering the derivatives of the log likelihood per ''N'' observations. Moreover, below, an erroneous expression for one of these components in Aryal and Nadarajah has been corrected.)

:\begin{align}
\alpha > 2: \quad \operatorname{E}\left [- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial a^2} \right ] &= \mathcal{I}_{a a}=\frac{\beta(\alpha+\beta-1)}{(\alpha-2)(c-a)^2} \\
\beta > 2: \quad \operatorname{E}\left[-\frac{1}{N} \frac{\partial^2 \ln \mathcal{L}}{\partial c^2} \right ] &= \mathcal{I}_{c c} = \frac{\alpha(\alpha+\beta-1)}{(\beta-2)(c-a)^2} \\
\operatorname{E}\left[- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial a\,\partial c} \right ] &= \mathcal{I}_{a c} = \frac{\alpha+\beta-1}{(c-a)^2} \\
\alpha > 1: \quad \operatorname{E}\left[- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial \alpha\,\partial a} \right ] &=\mathcal{I}_{\alpha, a} = \frac{\beta}{(\alpha-1)(c-a)} \\
\operatorname{E}\left[- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial \alpha\,\partial c} \right ] &= \mathcal{I}_{\alpha, c} = \frac{1}{c-a} \\
\operatorname{E}\left[- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial \beta\,\partial a} \right ] &= \mathcal{I}_{\beta, a} = -\frac{1}{c-a} \\
\beta > 1: \quad \operatorname{E}\left[- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial \beta\,\partial c} \right ] &= \mathcal{I}_{\beta, c} = -\frac{\alpha}{(\beta-1)(c-a)}
\end{align}

The lower two diagonal entries of the Fisher information matrix, with respect to the parameter ''a'' (the minimum of the distribution's range), \mathcal{I}_{a a}, and with respect to the parameter ''c'' (the maximum of the distribution's range), \mathcal{I}_{c c}, are only defined for exponents α > 2 and β > 2 respectively. The Fisher information matrix component \mathcal{I}_{a a} for the minimum ''a'' approaches infinity for exponent α approaching 2 from above, and the Fisher information matrix component \mathcal{I}_{c c} for the maximum ''c'' approaches infinity for exponent β approaching 2 from above.

The Fisher information matrix for the four parameter case does not depend on the individual values of the minimum ''a'' and the maximum ''c'', but only on the total range (''c''−''a''). Moreover, the components of the Fisher information matrix that depend on the range (''c''−''a'') depend on it only through its inverse (or the square of the inverse), so that the Fisher information decreases with increasing range (''c''−''a'').

The accompanying images show two of these Fisher information components; images for the components corresponding to the log geometric variances are shown in the section on the geometric variance. All these Fisher information components look like a basin, with the "walls" of the basin located at low values of the parameters.

The following four-parameter-beta-distribution Fisher information components can be expressed in terms of the two-parameter ''X'' ~ Beta(α, β) expectations of the transformed ratio ((1−''X'')/''X'') and of its mirror image (''X''/(1−''X'')), scaled by the range (''c''−''a''), which may be helpful for interpretation:

:\mathcal{I}_{\alpha, a} =\frac{\operatorname{E} \left [\frac{1-X}{X} \right ]}{c-a}= \frac{\beta}{(\alpha-1)(c-a)} \text{ if }\alpha > 1
:\mathcal{I}_{\beta, c} = -\frac{\operatorname{E} \left [\frac{X}{1-X} \right ]}{c-a}=- \frac{\alpha}{(\beta-1)(c-a)}\text{ if }\beta> 1

These are also the expected values of the "inverted beta distribution" or beta prime distribution (also known as beta distribution of the second kind or Pearson's Type VI) and its mirror image, scaled by the range (''c'' − ''a'').

Also, the following Fisher information components can be expressed in terms of the harmonic (1/''X'') variances or of variances based on the ratio-transformed variables ((1−''X'')/''X'') as follows:

:\begin{align}
\alpha > 2: \quad \mathcal{I}_{a a} &=\operatorname{var} \left [\frac{1}{X} \right] \left (\frac{\alpha-1}{c-a} \right )^2 =\operatorname{var} \left [\frac{1-X}{X} \right ] \left (\frac{\alpha-1}{c-a} \right)^2 = \frac{\beta(\alpha+\beta-1)}{(\alpha-2)(c-a)^2} \\
\beta > 2: \quad \mathcal{I}_{c c} &= \operatorname{var} \left [\frac{1}{1-X} \right ] \left (\frac{\beta-1}{c-a} \right )^2 = \operatorname{var} \left [\frac{X}{1-X} \right ] \left (\frac{\beta-1}{c-a} \right )^2  =\frac{\alpha(\alpha+\beta-1)}{(\beta-2)(c-a)^2}  \\
\mathcal{I}_{a c} &=-\operatorname{cov} \left [\frac{1}{X},\frac{1}{1-X} \right ]\frac{(\alpha-1)(\beta-1)}{(c-a)^2}  = -\operatorname{cov} \left [\frac{1-X}{X},\frac{X}{1-X} \right ] \frac{(\alpha-1)(\beta-1)}{(c-a)^2} =\frac{\alpha+\beta-1}{(c-a)^2}
\end{align}

See the section "Moments of linearly transformed, product and inverted random variables" for these expectations.

The determinant of Fisher's information matrix is of interest (for example, for the calculation of the Jeffreys prior probability). From the expressions for the individual components, it follows that the determinant of Fisher's (symmetric) information matrix for the beta distribution with four parameters is a lengthy sum of products of the ten independent components \mathcal{I}_{i,j} listed above (see Aryal and Nadarajah for the explicit expression), and it is well-defined only for α, β > 2.

Using Sylvester's criterion (checking whether the diagonal elements are all positive), and since the diagonal components \mathcal{I}_{a a} and \mathcal{I}_{c c} have singularities at α = 2 and β = 2, it follows that the Fisher information matrix for the four parameter case is positive-definite for α > 2 and β > 2. Since for α > 2 and β > 2 the beta distribution is (symmetric or unsymmetric) bell-shaped, it follows that the Fisher information matrix is positive-definite only for bell-shaped (symmetric or unsymmetric) beta distributions, with inflection points located to either side of the mode. Thus, important well known distributions belonging to the four-parameter beta distribution family, like the parabolic distribution (Beta(2,2,''a'',''c'')) and the uniform distribution (Beta(1,1,''a'',''c'')), have Fisher information components (\mathcal{I}_{a a},\mathcal{I}_{c c},\mathcal{I}_{\alpha, a},\mathcal{I}_{\beta, c}) that blow up (approach infinity) in the four-parameter case (although their Fisher information components are all defined for the two parameter case). The four-parameter Wigner semicircle distribution (Beta(3/2,3/2,''a'',''c'')) and arcsine distribution (Beta(1/2,1/2,''a'',''c'')) have negative Fisher information determinants for the four-parameter case.
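Assuming the component expressions as reconstructed above, a rough Monte Carlo cross-check of one of them is straightforward (the check is ours, not from the literature): the Fisher information component for the lower endpoint equals the second moment of the corresponding score. Parameter values are arbitrary but chosen with α well above 2 so the simulation is stable.

```python
import numpy as np
from scipy.stats import beta as beta_dist

rng = np.random.default_rng(1)
alpha, b, a, c = 5.0, 4.0, -1.0, 2.0            # alpha > 2 required for I_aa
y = a + (c - a) * beta_dist.rvs(alpha, b, size=500_000, random_state=rng)

# Score with respect to the lower endpoint "a" for one observation
score_a = -(alpha - 1) / (y - a) + (alpha + b - 1) / (c - a)

mc_info = np.mean(score_a ** 2)                  # Monte Carlo E[score_a^2]
closed_form = b * (alpha + b - 1) / ((alpha - 2) * (c - a) ** 2)
print(mc_info, closed_form)                      # the two should be close
```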


_Bayesian_inference

The use of beta distributions in Bayesian inference is due to the fact that they provide a family of conjugate prior probability distributions for binomial (including Bernoulli) and geometric distributions. The domain of the beta distribution can be viewed as a probability, and in fact the beta distribution is often used to describe the distribution of a probability value ''p'':

:P(p;\alpha,\beta) = \frac{p^{\alpha-1}(1-p)^{\beta-1}}{\Beta(\alpha,\beta)}.

Examples of beta distributions used as prior probabilities to represent ignorance of prior parameter values in Bayesian inference are Beta(1,1), Beta(0,0) and Beta(1/2,1/2).


_Rule_of_succession

A classic application of the beta distribution is the rule of succession, introduced in the 18th century by Pierre-Simon Laplace in the course of treating the sunrise problem. It states that, given ''s'' successes in ''n'' conditionally independent Bernoulli trials with probability ''p'', the estimate of the expected value in the next trial is \frac{s+1}{n+2}. This estimate is the expected value of the posterior distribution over ''p'', namely Beta(''s''+1, ''n''−''s''+1), which is given by Bayes' rule if one assumes a uniform prior probability over ''p'' (i.e., Beta(1, 1)) and then observes that ''p'' generated ''s'' successes in ''n'' trials. Laplace's rule of succession has been criticized by prominent scientists. R. T. Cox described Laplace's application of the rule of succession to the sunrise problem (p. 89) as "a travesty of the proper use of the principle." Keynes remarks (Ch. XXX, p. 382) "indeed this is so foolish a theorem that to entertain it is discreditable." Karl Pearson showed that the probability that the next (''n'' + 1) trials will all be successes, after ''n'' successes in ''n'' trials, is only 50%, which has been considered too low by scientists like Jeffreys and unacceptable as a representation of the scientific process of experimentation to test a proposed scientific law. As pointed out by Jeffreys (p. 128) (crediting C. D. Broad), Laplace's rule of succession establishes a high probability of success ((''n''+1)/(''n''+2)) in the next trial, but only a moderate probability (50%) that a further sample (''n''+1) comparable in size will be equally successful. As pointed out by Perks, "The rule of succession itself is hard to accept. It assigns a probability to the next trial which implies the assumption that the actual run observed is an average run and that we are always at the end of an average run. It would, one would think, be more reasonable to assume that we were in the middle of an average run. Clearly a higher value for both probabilities is necessary if they are to accord with reasonable belief." These problems with Laplace's rule of succession motivated Haldane, Perks, Jeffreys and others to search for other forms of prior probability (see the following sections). According to Jaynes, the main problem with the rule of succession is that it is not valid when ''s'' = 0 or ''s'' = ''n'' (see the article on the rule of succession for an analysis of its validity).


_Bayes-Laplace_prior_probability_(Beta(1,1))

The beta distribution achieves maximum differential entropy for Beta(1,1): the uniform probability density, for which all values in the domain of the distribution have equal density. This uniform distribution Beta(1,1) was suggested ("with a great deal of doubt") by Thomas Bayes as the prior probability distribution to express ignorance about the correct prior distribution. This prior distribution was adopted (apparently, from his writings, with little sign of doubt) by Pierre-Simon Laplace, and hence it was also known as the "Bayes-Laplace rule" or the "Laplace rule" of "inverse probability" in publications of the first half of the 20th century. In the later part of the 19th century and early part of the 20th century, scientists realized that the assumption of uniform "equal" probability density depended on the actual functions (for example, whether a linear or a logarithmic scale was most appropriate) and parametrizations used. In particular, the behavior near the ends of distributions with finite support (for example near ''x'' = 0, for a distribution with initial support at ''x'' = 0) required particular attention. Keynes (Ch. XXX, p. 381) criticized the use of Bayes's uniform prior probability (Beta(1,1)), under which all values between zero and one are equiprobable, as follows: "Thus experience, if it shows anything, shows that there is a very marked clustering of statistical ratios in the neighborhoods of zero and unity, of those for positive theories and for correlations between positive qualities in the neighborhood of zero, and of those for negative theories and for correlations between negative qualities in the neighborhood of unity."


_Haldane's_prior_probability_(Beta(0,0))

The Beta(0,0) distribution was proposed by J. B. S. Haldane, who suggested that the prior probability representing complete uncertainty should be proportional to ''p''−1(1−''p'')−1. The function ''p''−1(1−''p'')−1 can be viewed as the limit of the numerator of the beta distribution as both shape parameters approach zero: α, β → 0. The Beta function (in the denominator of the beta distribution) approaches infinity as both parameters approach zero, α, β → 0. Therefore, ''p''−1(1−''p'')−1 divided by the Beta function approaches a 2-point Bernoulli distribution with equal probability 1/2 at each end, at 0 and 1, and nothing in between, as α, β → 0: a coin toss, with one face of the coin at 0 and the other face at 1. The Haldane prior probability distribution Beta(0,0) is an "improper prior" because its integral (from 0 to 1) fails to strictly converge to 1 due to the singularities at each end. However, this is not an issue for computing posterior probabilities unless the sample size is very small. Furthermore, Zellner points out that on the log-odds scale (the logit transformation ln(''p''/(1−''p''))), the Haldane prior is the uniformly flat prior. The fact that a uniform prior probability on the logit-transformed variable ln(''p''/(1−''p'')) (with domain (−∞, ∞)) is equivalent to the Haldane prior on the domain [0, 1] was pointed out by Harold Jeffreys in the first edition (1939) of his book Theory of Probability (p. 123). Jeffreys writes: "Certainly if we take the Bayes-Laplace rule right up to the extremes we are led to results that do not correspond to anybody's way of thinking. The (Haldane) rule d''x''/(''x''(1−''x'')) goes too far the other way. It would lead to the conclusion that if a sample is of one type with respect to some property there is a probability 1 that the whole population is of that type." The fact that "uniform" depends on the parametrization led Jeffreys to seek a form of prior that would be invariant under different parametrizations.
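The equivalence between a flat prior on the log-odds and the Haldane prior follows from a one-line change of variables; writing it out explicitly (a standard calculation, not specific to any source):

: \text{Let } \lambda = \ln\frac{p}{1-p}, \text{ so that } p = \frac{e^\lambda}{1+e^\lambda} \text{ and } \frac{d\lambda}{dp} = \frac{1}{p(1-p)}.
: \text{If } \pi(\lambda) \propto 1 \text{ on } (-\infty,\infty), \text{ then } \pi(p) = \pi(\lambda)\left|\frac{d\lambda}{dp}\right| \propto \frac{1}{p(1-p)} = p^{-1}(1-p)^{-1},

which is exactly the Haldane prior on [0, 1].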


_Jeffreys'_prior_probability_(Beta(1/2,1/2)_for_a_Bernoulli_or_for_a_binomial_distribution)

Harold Jeffreys proposed to use an uninformative prior probability measure that should be invariant under reparameterization: proportional to the square root of the determinant of Fisher's information matrix. For the Bernoulli distribution, this can be shown as follows: for a coin that is "heads" with probability ''p'' ∈ [0, 1] and is "tails" with probability 1 − ''p'', for a given (H, T) the probability is ''p''''H''(1 − ''p'')''T''. Since ''T'' = 1 − ''H'', the Bernoulli distribution is ''p''''H''(1 − ''p'')1 − ''H''. Considering ''p'' as the only parameter, it follows that the log likelihood for the Bernoulli distribution is

:\ln  \mathcal{L} (p\mid H) = H \ln(p)+ (1-H) \ln(1-p).

The Fisher information matrix has only one component (it is a scalar, because there is only one parameter: ''p''), therefore:

:\begin{align}
\sqrt{\mathcal{I}(p)} &= \sqrt{\operatorname{E} \left[ \left( \frac{d}{dp} \ln \mathcal{L}(p\mid H) \right)^2 \right]} \\
&= \sqrt{\operatorname{E} \left[ \left( \frac{H}{p} - \frac{1-H}{1-p} \right)^2 \right]} \\
&= \sqrt{p \left( \frac{1}{p} \right)^2 + (1-p)\left( \frac{1}{1-p}\right)^2 } \\
&= \frac{1}{\sqrt{p(1-p)}}.
\end{align}

Similarly, for the binomial distribution with ''n'' Bernoulli trials, it can be shown that

:\sqrt{\mathcal{I}(p)}= \frac{\sqrt{n}}{\sqrt{p(1-p)}}.

Thus, for the Bernoulli and binomial distributions, Jeffreys prior is proportional to \scriptstyle \frac{1}{\sqrt{p(1-p)}}, which happens to be proportional to a beta distribution with domain variable ''x'' = ''p'' and shape parameters α = β = 1/2, the arcsine distribution:

:Beta(\tfrac{1}{2}, \tfrac{1}{2}) = \frac{1}{\pi \sqrt{p(1-p)}}.

It will be shown in the next section that the normalizing constant for Jeffreys prior is immaterial to the final result because the normalizing constant cancels out in Bayes' theorem for the posterior probability. Hence Beta(1/2,1/2) is used as the Jeffreys prior for both Bernoulli and binomial distributions. As shown in the next section, when using this expression as a prior probability times the likelihood in Bayes' theorem, the posterior probability turns out to be a beta distribution. It is important to realize, however, that Jeffreys prior is proportional to \scriptstyle \frac{1}{\sqrt{p(1-p)}} for the Bernoulli and binomial distributions, but not for the beta distribution. Jeffreys prior for the beta distribution is given by the determinant of Fisher's information for the beta distribution, which, as shown in the section "Fisher information matrix", is a function of the trigamma function ψ1 of the shape parameters α and β as follows:

: \begin{align}
\sqrt{\det(\mathcal{I}(\alpha, \beta))} &= \sqrt{\psi_1(\alpha)\psi_1(\beta)-(\psi_1(\alpha)+\psi_1(\beta))\psi_1(\alpha + \beta)} \\
\lim_{\alpha\to 0} \sqrt{\det(\mathcal{I}(\alpha, \beta))} &=\lim_{\beta \to 0} \sqrt{\det(\mathcal{I}(\alpha, \beta))} = \infty\\
\lim_{\alpha\to \infty} \sqrt{\det(\mathcal{I}(\alpha, \beta))} &=\lim_{\beta \to \infty} \sqrt{\det(\mathcal{I}(\alpha, \beta))} = 0
\end{align}

As previously discussed, Jeffreys prior for the Bernoulli and binomial distributions is proportional to the arcsine distribution Beta(1/2,1/2), a one-dimensional ''curve'' that looks like a basin as a function of the parameter ''p'' of the Bernoulli and binomial distributions. The walls of the basin are formed by ''p'' approaching the singularities at the ends ''p'' → 0 and ''p'' → 1, where Beta(1/2,1/2) approaches infinity. Jeffreys prior for the beta distribution is a ''2-dimensional surface'' (embedded in a three-dimensional space) that looks like a basin with only two of its walls meeting at the corner α = β = 0 (and missing the other two walls), as a function of the shape parameters α and β of the beta distribution. The two adjoining walls of this 2-dimensional surface are formed by the shape parameters α and β approaching the singularities (of the trigamma function) at α, β → 0. It has no walls for α, β → ∞ because in this case the determinant of Fisher's information matrix for the beta distribution approaches zero.

It will be shown in the next section that Jeffreys prior probability results in posterior probabilities (when multiplied by the binomial likelihood function) that are intermediate between the posterior probability results of the Haldane and Bayes prior probabilities.

Jeffreys prior may be difficult to obtain analytically, and for some cases it simply does not exist (even for simple distribution functions like the asymmetric triangular distribution). Berger, Bernardo and Sun, in a 2009 paper, defined a reference prior probability distribution that (unlike Jeffreys prior) exists for the asymmetric triangular distribution. They cannot obtain a closed-form expression for their reference prior, but numerical calculations show it to be nearly perfectly fitted by the (proper) prior

: \operatorname{Beta}(\tfrac{1}{2}, \tfrac{1}{2}) \propto\frac{1}{\sqrt{\theta(1-\theta)}}

where θ is the vertex variable for the asymmetric triangular distribution with support [0, 1] (corresponding to the following parameter values in Wikipedia's article on the triangular distribution: vertex ''c'' = ''θ'', left end ''a'' = 0, and right end ''b'' = 1). Berger et al. also give a heuristic argument that Beta(1/2,1/2) could indeed be the exact Berger–Bernardo–Sun reference prior for the asymmetric triangular distribution. Therefore, Beta(1/2,1/2) not only is Jeffreys prior for the Bernoulli and binomial distributions, but also seems to be the Berger–Bernardo–Sun reference prior for the asymmetric triangular distribution (for which the Jeffreys prior does not exist), a distribution used in project management and PERT analysis to describe the cost and duration of project tasks.

Clarke and Barron prove that, among continuous positive priors, Jeffreys prior (when it exists) asymptotically maximizes Shannon's mutual information between a sample of size ''n'' and the parameter, and therefore ''Jeffreys prior is the most uninformative prior'' (measuring information as Shannon information). The proof rests on an examination of the Kullback–Leibler divergence between probability density functions for iid random variables.
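The two un-normalized Jeffreys priors discussed above, one for the binomial parameter ''p'' and one for the beta shape parameters themselves, can be written directly from these formulas. A brief sketch of ours, reusing the trigamma-based determinant from the Fisher information section:

```python
import numpy as np
from scipy.special import polygamma

def jeffreys_binomial(p):
    """Un-normalized Jeffreys prior for the Bernoulli/binomial parameter p."""
    return 1.0 / np.sqrt(p * (1.0 - p))

def jeffreys_beta_shape(alpha, beta):
    """Un-normalized Jeffreys prior for the beta shape parameters (alpha, beta):
    square root of the determinant of the 2x2 Fisher information matrix."""
    t = lambda z: polygamma(1, z)                       # trigamma psi_1
    det = t(alpha) * t(beta) - (t(alpha) + t(beta)) * t(alpha + beta)
    return np.sqrt(det)

print(jeffreys_binomial(0.5))            # bottom of the basin-shaped curve
print(jeffreys_beta_shape(1.0, 1.0))     # finite; grows as alpha, beta -> 0
```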


_Effect_of_different_prior_probability_choices_on_the_posterior_beta_distribution

If_samples_are_drawn_from_the_population_of_a_random_variable_''X''_that_result_in_''s''_successes_and_''f''_failures_in_"n"_Bernoulli_trials_''n'' = ''s'' + ''f'',_then_the_likelihood_function_ The_likelihood_function_(often_simply_called_the_likelihood)_represents_the_probability_of__random_variable_realizations_conditional_on_particular_values_of_the__statistical_parameters._Thus,_when_evaluated_on_a__given_sample,_the_likelihood_funct_...
_for_parameters_''s''_and_''f''_given_''x'' = ''p''_(the_notation_''x'' = ''p''_in_the_expressions_below_will_emphasize_that_the_domain_''x''_stands_for_the_value_of_the_parameter_''p''_in_the_binomial_distribution),_is_the_following_binomial_distribution: :\mathcal(s,f\mid_x=p)_=__x^s(1-x)^f_=__x^s(1-x)^._ If_beliefs_about_prior_probability_information_are_reasonably_well_approximated_by_a_beta_distribution_with_parameters_''α'' Prior_and_''β'' Prior,_then: :(x=p;\alpha_\operatorname,\beta_\operatorname)_=_\frac According_to_Bayes'_theorem_for_a_continuous_event_space,_the_posterior_probability_is_given_by_the_product_of_the_prior_probability_and_the_likelihood_function_(given_the_evidence_''s''_and_''f'' = ''n'' − ''s''),_normalized_so_that_the_area_under_the_curve_equals_one,_as_follows: :\begin &_\operatorname(x=p\mid_s,n-s)_\\_pt=__&_\frac__\\_pt=__&_\frac_\\_pt=__&_\frac_\\_pt=__&_\frac. \end The_binomial_coefficient :

{n \choose s} = \frac{n!}{s!(n-s)!}
appears_both_in_the_numerator_and_the_denominator_of_the_posterior_probability,_and_it_does_not_depend_on_the_integration_variable_''x'',_hence_it_cancels_out,_and_it_is_irrelevant_to_the_final_result.__Similarly_the_normalizing_factor_for_the_prior_probability,_the_beta_function_B(αPrior,βPrior)_cancels_out_and_it_is_immaterial_to_the_final_result._The_same_posterior_probability_result_can_be_obtained_if_one_uses_an_un-normalized_prior :x^(1-x)^ because_the_normalizing_factors_all_cancel_out._Several_authors_(including_Jeffreys_himself)_thus_use_an_un-normalized_prior_formula_since_the_normalization_constant_cancels_out.__The_numerator_of_the_posterior_probability_ends_up_being_just_the_(un-normalized)_product_of_the_prior_probability_and_the_likelihood_function,_and_the_denominator_is_its_integral_from_zero_to_one._The_beta_function_in_the_denominator,_B(''s'' + ''α'' Prior, ''n'' − ''s'' + ''β'' Prior),_appears_as_a_normalization_constant_to_ensure_that_the_total_posterior_probability_integrates_to_unity. The_ratio_''s''/''n''_of_the_number_of_successes_to_the_total_number_of_trials_is_a_sufficient_statistic_in_the_binomial_case,_which_is_relevant_for_the_following_results. For_the_Bayes'_prior_probability_(Beta(1,1)),_the_posterior_probability_is: :\operatorname(p=x\mid_s,f)_=_\frac,_\text=\frac,\text=\frac\text_0_<_s_<_n). For_the_Jeffreys'_prior_probability_(Beta(1/2,1/2)),_the_posterior_probability_is: :\operatorname(p=x\mid_s,f)_=__,\text_=_\frac,\text\frac\text_\tfrac_<_s_<_n-\tfrac). and_for_the_Haldane_prior_probability_(Beta(0,0)),_the_posterior_probability_is: :\operatorname(p=x\mid_s,f)_=_\frac,_\text_=_\frac,\text\frac\text_1_<_s_<_n_-1). From_the_above_expressions_it_follows_that_for_''s''/''n'' = 1/2)_all_the_above_three_prior_probabilities_result_in_the_identical_location_for_the_posterior_probability_mean = mode = 1/2.__For_''s''/''n'' < 1/2,_the_mean_of_the_posterior_probabilities,_using_the_following_priors,_are_such_that:_mean_for_Bayes_prior_> mean_for_Jeffreys_prior_> mean_for_Haldane_prior._For_''s''/''n'' > 1/2_the_order_of_these_inequalities_is_reversed_such_that_the_Haldane_prior_probability_results_in_the_largest_posterior_mean._The_''Haldane''_prior_probability_Beta(0,0)_results_in_a_posterior_probability_density_with_''mean''_(the_expected_value_for_the_probability_of_success_in_the_"next"_trial)_identical_to_the_ratio_''s''/''n''_of_the_number_of_successes_to_the_total_number_of_trials._Therefore,_the_Haldane_prior_results_in_a_posterior_probability_with_expected_value_in_the_next_trial_equal_to_the_maximum_likelihood._The_''Bayes''_prior_probability_Beta(1,1)_results_in_a_posterior_probability_density_with_''mode''_identical_to_the_ratio_''s''/''n''_(the_maximum_likelihood). In_the_case_that_100%_of_the_trials_have_been_successful_''s'' = ''n'',_the_''Bayes''_prior_probability_Beta(1,1)_results_in_a_posterior_expected_value_equal_to_the_rule_of_succession_(''n'' + 1)/(''n'' + 2),_while_the_Haldane_prior_Beta(0,0)_results_in_a_posterior_expected_value_of_1_(absolute_certainty_of_success_in_the_next_trial).__Jeffreys_prior_probability_results_in_a_posterior_expected_value_equal_to_(''n'' + 1/2)/(''n'' + 1)._Perks_(p. 
303)_points_out:_"This_provides_a_new_rule_of_succession_and_expresses_a_'reasonable'_position_to_take_up,_namely,_that_after_an_unbroken_run_of_n_successes_we_assume_a_probability_for_the_next_trial_equivalent_to_the_assumption_that_we_are_about_half-way_through_an_average_run,_i.e._that_we_expect_a_failure_once_in_(2''n'' + 2)_trials._The_Bayes–Laplace_rule_implies_that_we_are_about_at_the_end_of_an_average_run_or_that_we_expect_a_failure_once_in_(''n'' + 2)_trials._The_comparison_clearly_favours_the_new_result_(what_is_now_called_Jeffreys_prior)_from_the_point_of_view_of_'reasonableness'." Conversely,_in_the_case_that_100%_of_the_trials_have_resulted_in_failure_(''s'' = 0),_the_''Bayes''_prior_probability_Beta(1,1)_results_in_a_posterior_expected_value_for_success_in_the_next_trial_equal_to_1/(''n'' + 2),_while_the_Haldane_prior_Beta(0,0)_results_in_a_posterior_expected_value_of_success_in_the_next_trial_of_0_(absolute_certainty_of_failure_in_the_next_trial)._Jeffreys_prior_probability_results_in_a_posterior_expected_value_for_success_in_the_next_trial_equal_to_(1/2)/(''n'' + 1),_which_Perks_(p. 303)_points_out:_"is_a_much_more_reasonably_remote_result_than_the_Bayes-Laplace_result 1/(''n'' + 2)". Jaynes_questions_(for_the_uniform_prior_Beta(1,1))_the_use_of_these_formulas_for_the_cases_''s'' = 0_or_''s'' = ''n''_because_the_integrals_do_not_converge_(Beta(1,1)_is_an_improper_prior_for_''s'' = 0_or_''s'' = ''n'')._In_practice,_the_conditions_0_(p. 303)_shows_that,_for_what_is_now_known_as_the_Jeffreys_prior,_this_probability_is_((''n'' + 1/2)/(''n'' + 1))((''n'' + 3/2)/(''n'' + 2))...(2''n'' + 1/2)/(2''n'' + 1),_which_for_''n'' = 1, 2, 3_gives_15/24,_315/480,_9009/13440;_rapidly_approaching_a_limiting_value_of_1/\sqrt_=_0.70710678\ldots_as_n_tends_to_infinity.__Perks_remarks_that_what_is_now_known_as_the_Jeffreys_prior:_"is_clearly_more_'reasonable'_than_either_the_Bayes-Laplace_result_or_the_result_on_the_(Haldane)_alternative_rule_rejected_by_Jeffreys_which_gives_certainty_as_the_probability._It_clearly_provides_a_very_much_better_correspondence_with_the_process_of_induction._Whether_it_is_'absolutely'_reasonable_for_the_purpose,_i.e._whether_it_is_yet_large_enough,_without_the_absurdity_of_reaching_unity,_is_a_matter_for_others_to_decide._But_it_must_be_realized_that_the_result_depends_on_the_assumption_of_complete_indifference_and_absence_of_knowledge_prior_to_the_sampling_experiment." 
Following_are_the_variances_of_the_posterior_distribution_obtained_with_these_three_prior_probability_distributions: for_the_Bayes'_prior_probability_(Beta(1,1)),_the_posterior_variance_is: :\text_=_\frac,\text_s=\frac_\text_=\frac for_the_Jeffreys'_prior_probability_(Beta(1/2,1/2)),_the_posterior_variance_is: :_\text_=_\frac_,\text_s=\frac_n_2_\text_=_\frac_1_ and_for_the_Haldane_prior_probability_(Beta(0,0)),_the_posterior_variance_is: :\text_=_\frac,_\texts=\frac\text_=\frac So,_as_remarked_by_Silvey,_for_large_''n'',_the_variance_is_small_and_hence_the_posterior_distribution_is_highly_concentrated,_whereas_the_assumed_prior_distribution_was_very_diffuse.__This_is_in_accord_with_what_one_would_hope_for,_as_vague_prior_knowledge_is_transformed_(through_Bayes_theorem)_into_a_more_precise_posterior_knowledge_by_an_informative_experiment.__For_small_''n''_the_Haldane_Beta(0,0)_prior_results_in_the_largest_posterior_variance_while_the_Bayes_Beta(1,1)_prior_results_in_the_more_concentrated_posterior.__Jeffreys_prior_Beta(1/2,1/2)_results_in_a_posterior_variance_in_between_the_other_two.__As_''n''_increases,_the_variance_rapidly_decreases_so_that_the_posterior_variance_for_all_three_priors_converges_to_approximately_the_same_value_(approaching_zero_variance_as_''n''_→_∞)._Recalling_the_previous_result_that_the_''Haldane''_prior_probability_Beta(0,0)_results_in_a_posterior_probability_density_with_''mean''_(the_expected_value_for_the_probability_of_success_in_the_"next"_trial)_identical_to_the_ratio_s/n_of_the_number_of_successes_to_the_total_number_of_trials,_it_follows_from_the_above_expression_that_also_the_''Haldane''_prior_Beta(0,0)_results_in_a_posterior_with_''variance''_identical_to_the_variance_expressed_in_terms_of_the_max._likelihood_estimate_s/n_and_sample_size_(in_): :\text_=_\frac=_\frac_ with_the_mean_''μ'' = ''s''/''n''_and_the_sample_size ''ν'' = ''n''. In_Bayesian_inference,_using_a_prior_distribution_Beta(''α''Prior,''β''Prior)_prior_to_a_binomial_distribution_is_equivalent_to_adding_(''α''Prior − 1)_pseudo-observations_of_"success"_and_(''β''Prior − 1)_pseudo-observations_of_"failure"_to_the_actual_number_of_successes_and_failures_observed,_then_estimating_the_parameter_''p''_of_the_binomial_distribution_by_the_proportion_of_successes_over_both_real-_and_pseudo-observations.__A_uniform_prior_Beta(1,1)_does_not_add_(or_subtract)_any_pseudo-observations_since_for_Beta(1,1)_it_follows_that_(''α''Prior − 1) = 0_and_(''β''Prior − 1) = 0._The_Haldane_prior_Beta(0,0)_subtracts_one_pseudo_observation_from_each_and_Jeffreys_prior_Beta(1/2,1/2)_subtracts_1/2_pseudo-observation_of_success_and_an_equal_number_of_failure._This_subtraction_has_the_effect_of_smoothing_out_the_posterior_distribution.__If_the_proportion_of_successes_is_not_50%_(''s''/''n'' ≠ 1/2)_values_of_''α''Prior_and_''β''Prior_less_than 1_(and_therefore_negative_(''α''Prior − 1)_and_(''β''Prior − 1))_favor_sparsity,_i.e._distributions_where_the_parameter_''p''_is_closer_to_either_0_or 1.__In_effect,_values_of_''α''Prior_and_''β''Prior_between_0_and_1,_when_operating_together,_function_as_a_concentration_parameter. 
The_accompanying_plots_show_the_posterior_probability_density_functions_for_sample_sizes_''n'' ∈ ,_successes_''s'' ∈ _and_Beta(''α''Prior,''β''Prior) ∈ ._Also_shown_are_the_cases_for_''n'' = ,_success_''s'' = _and_Beta(''α''Prior,''β''Prior) ∈ ._The_first_plot_shows_the_symmetric_cases,_for_successes_''s'' ∈ ,_with_mean = mode = 1/2_and_the_second_plot_shows_the_skewed_cases_''s'' ∈ .__The_images_show_that_there_is_little_difference_between_the_priors_for_the_posterior_with_sample_size_of_50_(characterized_by_a_more_pronounced_peak_near_''p'' = 1/2)._Significant_differences_appear_for_very_small_sample_sizes_(in_particular_for_the_flatter_distribution_for_the_degenerate_case_of_sample_size = 3)._Therefore,_the_skewed_cases,_with_successes_''s'' = ,_show_a_larger_effect_from_the_choice_of_prior,_at_small_sample_size,_than_the_symmetric_cases.__For_symmetric_distributions,_the_Bayes_prior_Beta(1,1)_results_in_the_most_"peaky"_and_highest_posterior_distributions_and_the_Haldane_prior_Beta(0,0)_results_in_the_flattest_and_lowest_peak_distribution.__The_Jeffreys_prior_Beta(1/2,1/2)_lies_in_between_them.__For_nearly_symmetric,_not_too_skewed_distributions_the_effect_of_the_priors_is_similar.__For_very_small_sample_size_(in_this_case_for_a_sample_size_of_3)_and_skewed_distribution_(in_this_example_for_''s'' ∈ )_the_Haldane_prior_can_result_in_a_reverse-J-shaped_distribution_with_a_singularity_at_the_left_end.__However,_this_happens_only_in_degenerate_cases_(in_this_example_''n'' = 3_and_hence_''s'' = 3/4 < 1,_a_degenerate_value_because_s_should_be_greater_than_unity_in_order_for_the_posterior_of_the_Haldane_prior_to_have_a_mode_located_between_the_ends,_and_because_''s'' = 3/4_is_not_an_integer_number,_hence_it_violates_the_initial_assumption_of_a_binomial_distribution_for_the_likelihood)_and_it_is_not_an_issue_in_generic_cases_of_reasonable_sample_size_(such_that_the_condition_1 < ''s'' < ''n'' − 1,_necessary_for_a_mode_to_exist_between_both_ends,_is_fulfilled). In_Chapter_12_(p. 385)_of_his_book,_Jaynes_asserts_that_the_''Haldane_prior''_Beta(0,0)_describes_a_''prior_state_of_knowledge_of_complete_ignorance'',_where_we_are_not_even_sure_whether_it_is_physically_possible_for_an_experiment_to_yield_either_a_success_or_a_failure,_while_the_''Bayes_(uniform)_prior_Beta(1,1)_applies_if''_one_knows_that_''both_binary_outcomes_are_possible''._Jaynes_states:_"''interpret_the_Bayes-Laplace_(Beta(1,1))_prior_as_describing_not_a_state_of_complete_ignorance'',_but_the_state_of_knowledge_in_which_we_have_observed_one_success_and_one_failure...once_we_have_seen_at_least_one_success_and_one_failure,_then_we_know_that_the_experiment_is_a_true_binary_one,_in_the_sense_of_physical_possibility."_Jaynes__does_not_specifically_discuss_Jeffreys_prior_Beta(1/2,1/2)_(Jaynes_discussion_of_"Jeffreys_prior"_on_pp. 181,_423_and_on_chapter_12_of_Jaynes_book_refers_instead_to_the_improper,_un-normalized,_prior_"1/''p'' ''dp''"_introduced_by_Jeffreys_in_the_1939_edition_of_his_book,_seven_years_before_he_introduced_what_is_now_known_as_Jeffreys'_invariant_prior:_the_square_root_of_the_determinant_of_Fisher's_information_matrix._''"1/p"_is_Jeffreys'_(1946)_invariant_prior_for_the_exponential_distribution,_not_for_the_Bernoulli_or_binomial_distributions'')._However,_it_follows_from_the_above_discussion_that_Jeffreys_Beta(1/2,1/2)_prior_represents_a_state_of_knowledge_in_between_the_Haldane_Beta(0,0)_and_Bayes_Beta_(1,1)_prior. Similarly,_Karl_Pearson_in_his_1892_book_The_Grammar_of_Science
_(p. 144_of_1900_edition)__maintained_that_the_Bayes_(Beta(1,1)_uniform_prior_was_not_a_complete_ignorance_prior,_and_that_it_should_be_used_when_prior_information_justified_to_"distribute_our_ignorance_equally"".__K._Pearson_wrote:_"Yet_the_only_supposition_that_we_appear_to_have_made_is_this:_that,_knowing_nothing_of_nature,_routine_and_anomy_(from_the_Greek_ανομία,_namely:_a-_"without",_and_nomos_"law")_are_to_be_considered_as_equally_likely_to_occur.__Now_we_were_not_really_justified_in_making_even_this_assumption,_for_it_involves_a_knowledge_that_we_do_not_possess_regarding_nature.__We_use_our_''experience''_of_the_constitution_and_action_of_coins_in_general_to_assert_that_heads_and_tails_are_equally_probable,_but_we_have_no_right_to_assert_before_experience_that,_as_we_know_nothing_of_nature,_routine_and_breach_are_equally_probable._In_our_ignorance_we_ought_to_consider_before_experience_that_nature_may_consist_of_all_routines,_all_anomies_(normlessness),_or_a_mixture_of_the_two_in_any_proportion_whatever,_and_that_all_such_are_equally_probable._Which_of_these_constitutions_after_experience_is_the_most_probable_must_clearly_depend_on_what_that_experience_has_been_like." If_there_is_sufficient_Sample_(statistics), sampling_data,_''and_the_posterior_probability_mode_is_not_located_at_one_of_the_extremes_of_the_domain''_(x=0_or_x=1),_the_three_priors_of_Bayes_(Beta(1,1)),_Jeffreys_(Beta(1/2,1/2))_and_Haldane_(Beta(0,0))_should_yield_similar_posterior_probability, ''posterior''_probability_densities.__Otherwise,_as_Gelman_et_al.
_(p. 65)_point_out,_"if_so_few_data_are_available_that_the_choice_of_noninformative_prior_distribution_makes_a_difference,_one_should_put_relevant_information_into_the_prior_distribution",_or_as_Berger_(p. 125)_points_out_"when_different_reasonable_priors_yield_substantially_different_answers,_can_it_be_right_to_state_that_there_''is''_a_single_answer?_Would_it_not_be_better_to_admit_that_there_is_scientific_uncertainty,_with_the_conclusion_depending_on_prior_beliefs?."
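To make the conjugate update described in this section concrete, the following sketch (in Python; the counts ''n'' = 10, ''s'' = 7 and the use of scipy are illustrative choices, not taken from the text) computes the posterior Beta(''α''Prior + ''s'', ''β''Prior + ''n'' − ''s'') for the Haldane, Jeffreys and Bayes priors and prints the posterior means, which for ''s''/''n'' > 1/2 should order as Haldane > Jeffreys > Bayes, as stated above.

<syntaxhighlight lang="python">
from scipy.stats import beta

# Observed data: s successes in n Bernoulli trials (illustrative values).
n, s = 10, 7
f = n - s

# Conjugate update: a Beta(a0, b0) prior combined with a binomial likelihood
# gives a Beta(a0 + s, b0 + f) posterior.
priors = {"Haldane Beta(0,0)": (0.0, 0.0),
          "Jeffreys Beta(1/2,1/2)": (0.5, 0.5),
          "Bayes Beta(1,1)": (1.0, 1.0)}

for name, (a0, b0) in priors.items():
    post = beta(a0 + s, b0 + f)   # posterior distribution (proper here since s, f > 0)
    print(f"{name:24s} mean={post.mean():.4f}  var={post.var():.5f}")

# For s/n = 0.7 the posterior means come out as 0.7000 (Haldane),
# ~0.6818 (Jeffreys) and ~0.6667 (Bayes), consistent with the text.
</syntaxhighlight>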


Occurrence and applications


Order statistics

The beta distribution has an important application in the theory of order statistics. A basic result is that the distribution of the ''k''th smallest of a sample of size ''n'' from a continuous uniform distribution has a beta distribution.David, H. A., Nagaraja, H. N. (2003) ''Order Statistics'' (3rd Edition). Wiley, New Jersey, p. 458. This result is summarized as:

:U_{(k)} \sim \operatorname{Beta}(k,\,n+1-k).

From this, and application of the theory related to the probability integral transform, the distribution of any individual order statistic from any continuous distribution can be derived.
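As a quick empirical illustration of this result (the values of ''n'', ''k'', the seed and the sample count below are arbitrary choices, not from the source), one can draw uniform samples, take the ''k''th smallest of each, and compare against Beta(''k'', ''n'' + 1 − ''k'') with a Kolmogorov–Smirnov test:

<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import beta, kstest

# Empirical check that the k-th smallest of n i.i.d. Uniform(0,1) draws
# follows Beta(k, n + 1 - k).
rng = np.random.default_rng(0)
n, k, reps = 10, 3, 5000

# Sort each row of uniforms and keep the k-th order statistic.
samples = np.sort(rng.uniform(size=(reps, n)), axis=1)[:, k - 1]
print(kstest(samples, beta(k, n + 1 - k).cdf))   # large p-value -> consistent
</syntaxhighlight>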


Subjective logic

In standard logic, propositions are considered to be either true or false. In contradistinction, subjective logic assumes that humans cannot determine with absolute certainty whether a proposition about the real world is absolutely true or false. In subjective logic the posterior probability estimates of binary events can be represented by beta distributions.A. Jøsang. A Logic for Uncertain Probabilities. ''International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems'', 9(3), pp. 279–311, June 2001.


Wavelet analysis

A wavelet is a wave-like oscillation with an amplitude that starts out at zero, increases, and then decreases back to zero. It can typically be visualized as a "brief oscillation" that promptly decays. Wavelets can be used to extract information from many different kinds of data, including – but certainly not limited to – audio signals and images. Thus, wavelets are purposefully crafted to have specific properties that make them useful for signal processing. Wavelets are localized in both time and frequency, whereas the standard Fourier transform is only localized in frequency. Therefore, standard Fourier transforms are only applicable to stationary processes, while wavelets are applicable to non-stationary processes. Continuous wavelets can be constructed based on the beta distribution. Beta waveletsH.M. de Oliveira and G.A.A. Araújo. Compactly Supported One-cyclic Wavelets Derived from Beta Distributions. ''Journal of Communication and Information Systems'', vol. 20, n. 3, pp. 27–33, 2005. can be viewed as a soft variety of Haar wavelets whose shape is fine-tuned by two shape parameters α and β.


Population genetics

The Balding–Nichols model is a two-parameter parametrization of the beta distribution used in population genetics. It is a statistical description of the allele frequencies in the components of a sub-divided population:

: \begin{align}
\alpha &= \mu \nu,\\
\beta  &= (1 - \mu) \nu,
\end{align}

where \nu =\alpha+\beta= \frac{1-F}{F} and 0 < F < 1; here ''F'' is (Wright's) genetic distance between two populations.
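A minimal helper implementing the parametrization above (the function name and the example values μ = 0.3, ''F'' = 0.1 are illustrative, not from the source):

<syntaxhighlight lang="python">
# Balding-Nichols parametrization: given the mean allele frequency mu and
# Wright's F (0 < F < 1), recover the beta shape parameters.
def balding_nichols_params(mu, F):
    nu = (1.0 - F) / F          # nu = alpha + beta = (1 - F)/F
    return mu * nu, (1.0 - mu) * nu

alpha, beta_ = balding_nichols_params(mu=0.3, F=0.1)
print(alpha, beta_)             # 2.7, 6.3
</syntaxhighlight>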


Project management: task cost and schedule modeling

The beta distribution can be used to model events which are constrained to take place within an interval defined by a minimum and maximum value. For this reason, the beta distribution — along with the triangular distribution — is used extensively in PERT, critical path method (CPM), Joint Cost Schedule Modeling (JCSM) and other project management/control systems to describe the time to completion and the cost of a task. In project management, shorthand computations are widely used to estimate the mean and standard deviation of the beta distribution:

: \begin{align}
\mu(X) & = \frac{a + 4b + c}{6} \\
\sigma(X) & = \frac{c-a}{6}
\end{align}

where ''a'' is the minimum, ''c'' is the maximum, and ''b'' is the most likely value (the mode, for ''α'' > 1 and ''β'' > 1).

The above estimate for the mean \mu(X)= \frac{a + 4b + c}{6} is known as the PERT three-point estimation and it is exact for either of the following values of ''β'' (for arbitrary α within these ranges):

:''β'' = ''α'' > 1 (symmetric case) with standard deviation \sigma(X) = \frac{c-a}{2\sqrt{1+2\alpha}}, skewness = 0, and excess kurtosis = \frac{-6}{3+2\alpha}

or

:''β'' = 6 − ''α'' for 5 > ''α'' > 1 (skewed case) with standard deviation

:\sigma(X) = \frac{(c-a)\sqrt{\alpha(6-\alpha)}}{6\sqrt{7}},

skewness = \frac{3-\alpha}{2} \sqrt{\frac{7}{\alpha(6-\alpha)}}, and excess kurtosis = \frac{21}{\alpha(6-\alpha)} - 3

The above estimate for the standard deviation ''σ''(''X'') = (''c'' − ''a'')/6 is exact for either of the following values of ''α'' and ''β'':

:''α'' = ''β'' = 4 (symmetric) with skewness = 0, and excess kurtosis = −6/11.
:''β'' = 6 − ''α'' and \alpha = 3 - \sqrt2 (right-tailed, positive skew) with skewness = \frac{1}{\sqrt{2}}, and excess kurtosis = 0
:''β'' = 6 − ''α'' and \alpha = 3 + \sqrt2 (left-tailed, negative skew) with skewness = \frac{-1}{\sqrt{2}}, and excess kurtosis = 0

Otherwise, these can be poor approximations for beta distributions with other values of α and β, exhibiting average errors of 40% in the mean and 549% in the variance.
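The following sketch compares the PERT shorthand with the exact moments of a beta distribution rescaled to [''a'', ''c'']. The task bounds are illustrative, and the shape parameters α = 3 − √2, β = 6 − α are chosen as one of the cases listed above in which both shorthand formulas are exact, so the printed pairs should agree.

<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import beta

# PERT-style three-point estimates versus the exact beta moments for a task
# bounded on [a, c] with most likely value b.
a, c = 2.0, 14.0                                  # minimum and maximum duration
alpha, beta_ = 3 - np.sqrt(2), 3 + np.sqrt(2)     # right-tailed case with alpha + beta = 6
b = a + (c - a) * (alpha - 1) / (alpha + beta_ - 2)   # mode of the rescaled beta

mu_pert = (a + 4 * b + c) / 6
sigma_pert = (c - a) / 6

dist = beta(alpha, beta_, loc=a, scale=c - a)     # beta distribution rescaled to [a, c]
print(mu_pert, dist.mean())                       # identical for alpha + beta = 6
print(sigma_pert, dist.std())                     # identical for alpha = 3 - sqrt(2)
</syntaxhighlight>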


Random variate generation

If ''X'' and ''Y'' are independent, with X \sim \Gamma(\alpha, \theta) and Y \sim \Gamma(\beta, \theta), then

:\frac{X}{X+Y} \sim \Beta(\alpha, \beta).

So one algorithm for generating beta variates is to generate \frac{X}{X+Y}, where ''X'' is a gamma variate with parameters (α, 1) and ''Y'' is an independent gamma variate with parameters (β, 1). In fact, here \frac{X}{X+Y} and X+Y are independent, and X+Y \sim \Gamma(\alpha + \beta, \theta). If Z \sim \Gamma(\gamma, \theta) and Z is independent of X and Y, then \frac{X+Y}{X+Y+Z} \sim \Beta(\alpha+\beta,\gamma) and \frac{X+Y}{X+Y+Z} is independent of \frac{X}{X+Y}. This shows that the product of independent \Beta(\alpha,\beta) and \Beta(\alpha+\beta,\gamma) random variables is a \Beta(\alpha,\beta+\gamma) random variable.

Also, the ''k''th order statistic of ''n'' uniformly distributed variates is \Beta(k, n+1-k), so an alternative if α and β are small integers is to generate α + β − 1 uniform variates and choose the α-th smallest.

Another way to generate the beta distribution is by the Pólya urn model. According to this method, one starts with an "urn" containing α "black" balls and β "white" balls and draws uniformly with replacement. On every trial an additional ball is added according to the color of the last ball drawn. Asymptotically, the proportion of black and white balls will be distributed according to the beta distribution, and each repetition of the experiment will produce a different value.

It is also possible to use inverse transform sampling.
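A minimal implementation of the gamma-ratio method described above (parameter values, seed and sample size are illustrative):

<syntaxhighlight lang="python">
import numpy as np

# Sample Beta(alpha, beta) variates as X/(X+Y) with X ~ Gamma(alpha, 1)
# and Y ~ Gamma(beta, 1), as described above.
rng = np.random.default_rng(42)

def beta_rvs(alpha, beta, size, rng=rng):
    x = rng.gamma(shape=alpha, scale=1.0, size=size)
    y = rng.gamma(shape=beta, scale=1.0, size=size)
    return x / (x + y)

draws = beta_rvs(2.0, 5.0, size=100_000)
print(draws.mean(), 2.0 / (2.0 + 5.0))   # empirical mean vs alpha/(alpha + beta)
</syntaxhighlight>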


History

Thomas Bayes, in a posthumous paper published in 1763 by Richard Price, obtained a beta distribution as the density of the probability of success in Bernoulli trials (see ), but the paper does not analyze any of the moments of the beta distribution or discuss any of its properties.

The first systematic modern discussion of the beta distribution is probably due to Karl Pearson.
_In_Pearson's_papers_the_beta_distribution_is_couched_as_a_solution_of_a_differential_equation:_Pearson_distribution, Pearson's_Type_I_distribution_which_it_is_essentially_identical_to_except_for_arbitrary_shifting_and_re-scaling_(the_beta_and_Pearson_Type_I_distributions_can_always_be_equalized_by_proper_choice_of_parameters)._In_fact,_in_several_English_books_and_journal_articles_in_the_few_decades_prior_to_World_War_II,_it_was_common_to_refer_to_the_beta_distribution_as_Pearson's_Type_I_distribution.__William_Palin_Elderton, William_P._Elderton_in_his_1906_monograph_"Frequency_curves_and_correlation"
_further_analyzes_the_beta_distribution_as_Pearson's_Type_I_distribution,_including_a_full_discussion_of_the_method_of_moments_for_the_four_parameter_case,_and_diagrams_of_(what_Elderton_describes_as)_U-shaped,_J-shaped,_twisted_J-shaped,_"cocked-hat"_shapes,_horizontal_and_angled_straight-line_cases.__Elderton_wrote_"I_am_chiefly_indebted_to_Professor_Pearson,_but_the_indebtedness_is_of_a_kind_for_which_it_is_impossible_to_offer_formal_thanks."__William_Palin_Elderton, Elderton_in_his_1906_monograph__provides_an_impressive_amount_of_information_on_the_beta_distribution,_including_equations_for_the_origin_of_the_distribution_chosen_to_be_the_mode,_as_well_as_for_other_Pearson_distributions:_types_I_through_VII._Elderton_also_included_a_number_of_appendixes,_including_one_appendix_("II")_on_the_beta_and_gamma_functions._In_later_editions,_Elderton_added_equations_for_the_origin_of_the_distribution_chosen_to_be_the_mean,_and_analysis_of_Pearson_distributions_VIII_through_XII. As_remarked_by_Bowman_and_Shenton_"Fisher_and_Pearson_had_a_difference_of_opinion_in_the_approach_to_(parameter)_estimation,_in_particular_relating_to_(Pearson's_method_of)_moments_and_(Fisher's_method_of)_maximum_likelihood_in_the_case_of_the_Beta_distribution."_Also_according_to_Bowman_and_Shenton,_"the_case_of_a_Type_I_(beta_distribution)_model_being_the_center_of_the_controversy_was_pure_serendipity._A_more_difficult_model_of_4_parameters_would_have_been_hard_to_find."_The_long_running_public_conflict_of_Fisher_with_Karl_Pearson_can_be_followed_in_a_number_of_articles_in_prestigious_journals.__For_example,_concerning_the_estimation_of_the_four_parameters_for_the_beta_distribution,_and_Fisher's_criticism_of_Pearson's_method_of_moments_as_being_arbitrary,_see_Pearson's_article_"Method_of_moments_and_method_of_maximum_likelihood"_
_(published_three_years_after_his_retirement_from_University_College,_London,_where_his_position_had_been_divided_between_Fisher_and_Pearson's_son_Egon)_in_which_Pearson_writes_"I_read_(Koshai's_paper_in_the_Journal_of_the_Royal_Statistical_Society,_1933)_which_as_far_as_I_am_aware_is_the_only_case_at_present_published_of_the_application_of_Professor_Fisher's_method._To_my_astonishment_that_method_depends_on_first_working_out_the_constants_of_the_frequency_curve_by_the_(Pearson)_Method_of_Moments_and_then_superposing_on_it,_by_what_Fisher_terms_"the_Method_of_Maximum_Likelihood"_a_further_approximation_to_obtain,_what_he_holds,_he_will_thus_get,_"more_efficient_values"_of_the_curve_constants." David_and_Edwards's_treatise_on_the_history_of_statistics
_cites_the_first_modern_treatment_of_the_beta_distribution,_in_1911,__using_the_beta_designation_that_has_become_standard,_due_to_Corrado_Gini,_an_Italian_statistician,_demography, demographer,_and_sociology, sociologist,_who_developed_the_Gini_coefficient._Norman_Lloyd_Johnson, N.L.Johnson_and_Samuel_Kotz, S.Kotz,_in_their_comprehensive_and_very_informative_monograph__on_leading_historical_personalities_in_statistical_sciences_credit_Corrado_Gini__as_"an_early_Bayesian...who_dealt_with_the_problem_of_eliciting_the_parameters_of_an_initial_Beta_distribution,_by_singling_out_techniques_which_anticipated_the_advent_of_the_so-called_empirical_Bayes_approach."


References


External links


"Beta_Distribution"
by_Fiona_Maclachlan,_the_Wolfram_Demonstrations_Project,_2007.
Beta_Distribution –_Overview_and_Example
_xycoon.com

_brighton-webs.co.uk

_exstrom.com * *
Harvard_University_Statistics_110_Lecture_23_Beta_Distribution,_Prof._Joe_Blitzstein
Mean absolute deviation around the mean

:\operatorname{E}[|X - E[X]|] = \frac{2\alpha^\alpha\beta^\beta}{\Beta(\alpha,\beta)(\alpha+\beta)^{\alpha+\beta+1}}

The mean absolute deviation around the mean is a more robust estimator
of statistical dispersion than the standard deviation for beta distributions with tails and inflection points at each side of the mode, Beta(''α'', ''β'') distributions with ''α'',''β'' > 2, as it depends on the linear (absolute) deviations rather than the square deviations from the mean. Therefore, the effect of very large deviations from the mean are not as overly weighted. Using Stirling's approximation to the Gamma function, Norman Lloyd Johnson, N.L.Johnson and Samuel Kotz, S.Kotz derived the following approximation for values of the shape parameters greater than unity (the relative error for this approximation is only −3.5% for ''α'' = ''β'' = 1, and it decreases to zero as ''α'' → ∞, ''β'' → ∞): : \begin \frac &=\frac\\ &\approx \sqrt \left(1+\frac-\frac-\frac \right), \text \alpha, \beta > 1. \end At the limit α → ∞, β → ∞, the ratio of the mean absolute deviation to the standard deviation (for the beta distribution) becomes equal to the ratio of the same measures for the normal distribution: \sqrt. For α = β = 1 this ratio equals \frac, so that from α = β = 1 to α, β → ∞ the ratio decreases by 8.5%. For α = β = 0 the standard deviation is exactly equal to the mean absolute deviation around the mean. Therefore, this ratio decreases by 15% from α = β = 0 to α = β = 1, and by 25% from α = β = 0 to α, β → ∞ . However, for skewed beta distributions such that α → 0 or β → 0, the ratio of the standard deviation to the mean absolute deviation approaches infinity (although each of them, individually, approaches zero) because the mean absolute deviation approaches zero faster than the standard deviation. Using the parametrization in terms of mean μ and sample size ν = α + β > 0: :α = μν, β = (1−μ)ν one can express the mean absolute deviation around the mean in terms of the mean μ and the sample size ν as follows: :\operatorname[, X - E ] = \frac For a symmetric distribution, the mean is at the middle of the distribution, μ = 1/2, and therefore: : \begin \operatorname[, X - E ] = \frac &= \frac \\ \lim_ \left (\lim_ \operatorname[, X - E ] \right ) &= \tfrac\\ \lim_ \left (\lim_ \operatorname[, X - E ] \right ) &= 0 \end Also, the following limits (with only the noted variable approaching the limit) can be obtained from the above expressions: : \begin \lim_ \operatorname[, X - E ] &=\lim_ \operatorname[, X - E ]= 0 \\ \lim_ \operatorname[, X - E ] &=\lim_ \operatorname[, X - E ] = 0\\ \lim_ \operatorname[, X - E ]&=\lim_ \operatorname[, X - E ] = 0\\ \lim_ \operatorname[, X - E ] &= \sqrt \\ \lim_ \operatorname[, X - E ] &= 0 \end


Mean absolute difference

The mean absolute difference for the beta distribution is:

:\mathrm{MD} = \int_0^1 \int_0^1 f(x;\alpha,\beta)\,f(y;\alpha,\beta)\,|x-y| \,dx\,dy = \left(\frac{4}{\alpha+\beta}\right)\frac{\Beta(\alpha+\beta,\alpha+\beta)}{\Beta(\alpha,\alpha)\Beta(\beta,\beta)}

The Gini coefficient for the beta distribution is half of the relative mean absolute difference:

:\mathrm{G} = \left(\frac{2}{\alpha}\right)\frac{\Beta(\alpha+\beta,\alpha+\beta)}{\Beta(\alpha,\alpha)\Beta(\beta,\beta)}
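Since the closed forms above are easy to mistype, the sketch below (with illustrative parameters α = 2, β = 3, seed and sample size) cross-checks the mean absolute difference against a Monte Carlo estimate of E|X − Y| for independent beta variates, and derives the Gini coefficient from it as MD·(α + β)/(2α):

<syntaxhighlight lang="python">
import numpy as np
from scipy.special import betaln
from scipy.stats import beta

# Closed-form mean absolute difference versus a Monte Carlo estimate of E|X - Y|.
alpha, b = 2.0, 3.0
ratio = np.exp(betaln(alpha + b, alpha + b) - betaln(alpha, alpha) - betaln(b, b))
md_closed = (4.0 / (alpha + b)) * ratio
gini_closed = (2.0 / alpha) * ratio          # half the relative mean absolute difference

rng = np.random.default_rng(1)
x, y = beta(alpha, b).rvs((2, 200_000), random_state=rng)
print(md_closed, np.abs(x - y).mean())       # the two values should be close
print(gini_closed)
</syntaxhighlight>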


Skewness

The
skewness In probability theory and statistics, skewness is a measure of the asymmetry of the probability distribution of a real-valued random variable about its mean. The skewness value can be positive, zero, negative, or undefined. For a unimodal ...
(the third moment centered on the mean, normalized by the 3/2 power of the variance) of the beta distribution is :\gamma_1 =\frac = \frac . Letting α = β in the above expression one obtains γ1 = 0, showing once again that for α = β the distribution is symmetric and hence the skewness is zero. Positive skew (right-tailed) for α < β, negative skew (left-tailed) for α > β. Using the parametrization in terms of mean μ and sample size ν = α + β: : \begin \alpha & = \mu \nu ,\text\nu =(\alpha + \beta) >0\\ \beta & = (1 - \mu) \nu , \text\nu =(\alpha + \beta) >0. \end one can express the skewness in terms of the mean μ and the sample size ν as follows: :\gamma_1 =\frac = \frac. The skewness can also be expressed just in terms of the variance ''var'' and the mean μ as follows: :\gamma_1 =\frac = \frac\text \operatorname < \mu(1-\mu) The accompanying plot of skewness as a function of variance and mean shows that maximum variance (1/4) is coupled with zero skewness and the symmetry condition (μ = 1/2), and that maximum skewness (positive or negative infinity) occurs when the mean is located at one end or the other, so that the "mass" of the probability distribution is concentrated at the ends (minimum variance). The following expression for the square of the skewness, in terms of the sample size ν = α + β and the variance ''var'', is useful for the method of moments estimation of four parameters: :(\gamma_1)^2 =\frac = \frac\bigg(\frac-4(1+\nu)\bigg) This expression correctly gives a skewness of zero for α = β, since in that case (see ): \operatorname = \frac. For the symmetric case (α = β), skewness = 0 over the whole range, and the following limits apply: :\lim_ \gamma_1 = \lim_ \gamma_1 =\lim_ \gamma_1=\lim_ \gamma_1=\lim_ \gamma_1 = 0 For the asymmetric cases (α ≠ β) the following limits (with only the noted variable approaching the limit) can be obtained from the above expressions: : \begin &\lim_ \gamma_1 =\lim_ \gamma_1 = \infty\\ &\lim_ \gamma_1 = \lim_ \gamma_1= - \infty\\ &\lim_ \gamma_1 = -\frac,\quad \lim_(\lim_ \gamma_1) = -\infty,\quad \lim_(\lim_ \gamma_1) = 0\\ &\lim_ \gamma_1 = \frac,\quad \lim_(\lim_ \gamma_1) = \infty,\quad \lim_(\lim_ \gamma_1) = 0\\ &\lim_ \gamma_1 = \frac,\quad \lim_(\lim_ \gamma_1) = \infty,\quad \lim_(\lim_ \gamma_1) = - \infty \end
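As a numerical cross-check of the skewness formula discussed above, \gamma_1 = \frac{2(\beta-\alpha)\sqrt{\alpha+\beta+1}}{(\alpha+\beta+2)\sqrt{\alpha\beta}}, the sketch below (parameter pairs are illustrative) compares a direct implementation against scipy's built-in value; the symmetric pair gives zero and α < β gives positive (right-tailed) skew, as stated in the text.

<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import beta

# Skewness of Beta(alpha, beta) from the closed form, compared with scipy.
def beta_skewness(a, b):
    return 2.0 * (b - a) * np.sqrt(a + b + 1.0) / ((a + b + 2.0) * np.sqrt(a * b))

for a, b in [(2.0, 2.0), (2.0, 5.0), (5.0, 2.0)]:
    print(a, b, beta_skewness(a, b), beta(a, b).stats(moments='s'))
</syntaxhighlight>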


Kurtosis

The beta distribution has been applied in acoustic analysis to assess damage to gears, as the kurtosis of the beta distribution has been reported to be a good indicator of the condition of a gear. Kurtosis has also been used to distinguish the seismic signal generated by a person's footsteps from other signals. As persons or other targets moving on the ground generate continuous signals in the form of seismic waves, one can separate different targets based on the seismic waves they generate. Kurtosis is sensitive to impulsive signals, so it's much more sensitive to the signal generated by human footsteps than other signals generated by vehicles, winds, noise, etc. Unfortunately, the notation for kurtosis has not been standardized. Kenney and Keeping use the symbol γ2 for the
excess kurtosis In probability theory and statistics, kurtosis (from el, κυρτός, ''kyrtos'' or ''kurtos'', meaning "curved, arching") is a measure of the "tailedness" of the probability distribution of a real-valued random variable. Like skewness, kurtosi ...
, but Abramowitz and Stegun use different terminology. To prevent confusion between kurtosis (the fourth moment centered on the mean, normalized by the square of the variance) and excess kurtosis, when using symbols, they will be spelled out as follows: :\begin \text &=\text - 3\\ &=\frac-3\\ &=\frac\\ &=\frac . \end Letting α = β in the above expression one obtains :\text =- \frac \text\alpha=\beta . Therefore, for symmetric beta distributions, the excess kurtosis is negative, increasing from a minimum value of −2 at the limit as → 0, and approaching a maximum value of zero as → ∞. The value of −2 is the minimum value of excess kurtosis that any distribution (not just beta distributions, but any distribution of any possible kind) can ever achieve. This minimum value is reached when all the probability density is entirely concentrated at each end ''x'' = 0 and ''x'' = 1, with nothing in between: a 2-point
Bernoulli distribution In probability theory and statistics, the Bernoulli distribution, named after Swiss mathematician Jacob Bernoulli,James Victor Uspensky: ''Introduction to Mathematical Probability'', McGraw-Hill, New York 1937, page 45 is the discrete probabi ...
with equal probability 1/2 at each end (a coin toss: see section below "Kurtosis bounded by the square of the skewness" for further discussion). The description of kurtosis as a measure of the "potential outliers" (or "potential rare, extreme values") of the probability distribution, is correct for all distributions including the beta distribution. When rare, extreme values can occur in the beta distribution, the higher its kurtosis; otherwise, the kurtosis is lower. For α ≠ β, skewed beta distributions, the excess kurtosis can reach unlimited positive values (particularly for α → 0 for finite β, or for β → 0 for finite α) because the side away from the mode will produce occasional extreme values. Minimum kurtosis takes place when the mass density is concentrated equally at each end (and therefore the mean is at the center), and there is no probability mass density in between the ends. Using the parametrization in terms of mean μ and sample size ν = α + β: : \begin \alpha & = \mu \nu ,\text\nu =(\alpha + \beta) >0\\ \beta & = (1 - \mu) \nu , \text\nu =(\alpha + \beta) >0. \end one can express the excess kurtosis in terms of the mean μ and the sample size ν as follows: :\text =\frac\bigg (\frac - 1 \bigg ) The excess kurtosis can also be expressed in terms of just the following two parameters: the variance ''var'', and the sample size ν as follows: :\text =\frac\left(\frac - 6 - 5 \nu \right)\text\text< \mu(1-\mu) and, in terms of the variance ''var'' and the mean μ as follows: :\text =\frac\text\text< \mu(1-\mu) The plot of excess kurtosis as a function of the variance and the mean shows that the minimum value of the excess kurtosis (−2, which is the minimum possible value for excess kurtosis for any distribution) is intimately coupled with the maximum value of variance (1/4) and the symmetry condition: the mean occurring at the midpoint (μ = 1/2). This occurs for the symmetric case of α = β = 0, with zero skewness. At the limit, this is the 2 point
Bernoulli distribution In probability theory and statistics, the Bernoulli distribution, named after Swiss mathematician Jacob Bernoulli,James Victor Uspensky: ''Introduction to Mathematical Probability'', McGraw-Hill, New York 1937, page 45 is the discrete probabi ...
with equal probability 1/2 at each Dirac delta function end ''x'' = 0 and ''x'' = 1 and zero probability everywhere else. (A coin toss: one face of the coin being ''x'' = 0 and the other face being ''x'' = 1.) Variance is maximum because the distribution is bimodal with nothing in between the two modes (spikes) at each end. Excess kurtosis is minimum: the probability density "mass" is zero at the mean and it is concentrated at the two peaks at each end. Excess kurtosis reaches the minimum possible value (for any distribution) when the probability density function has two spikes at each end: it is bi-"peaky" with nothing in between them. On the other hand, the plot shows that for extreme skewed cases, where the mean is located near one or the other end (μ = 0 or μ = 1), the variance is close to zero, and the excess kurtosis rapidly approaches infinity when the mean of the distribution approaches either end. Alternatively, the excess kurtosis can also be expressed in terms of just the following two parameters: the square of the skewness, and the sample size ν as follows: :\text =\frac\bigg(\frac (\text)^2 - 1\bigg)\text^2-2< \text< \frac (\text)^2 From this last expression, one can obtain the same limits published practically a century ago by Karl Pearson in his paper, for the beta distribution (see section below titled "Kurtosis bounded by the square of the skewness"). Setting α + β= ν = 0 in the above expression, one obtains Pearson's lower boundary (values for the skewness and excess kurtosis below the boundary (excess kurtosis + 2 − skewness2 = 0) cannot occur for any distribution, and hence Karl Pearson appropriately called the region below this boundary the "impossible region"). The limit of α + β = ν → ∞ determines Pearson's upper boundary. : \begin &\lim_\text = (\text)^2 - 2\\ &\lim_\text = \tfrac (\text)^2 \end therefore: :(\text)^2-2< \text< \tfrac (\text)^2 Values of ν = α + β such that ν ranges from zero to infinity, 0 < ν < ∞, span the whole region of the beta distribution in the plane of excess kurtosis versus squared skewness. For the symmetric case (α = β), the following limits apply: : \begin &\lim_ \text = - 2 \\ &\lim_ \text = 0 \\ &\lim_ \text = - \frac \end For the unsymmetric cases (α ≠ β) the following limits (with only the noted variable approaching the limit) can be obtained from the above expressions: : \begin &\lim_\text =\lim_ \text = \lim_\text = \lim_\text =\infty\\ &\lim_\text = \frac,\text \lim_(\lim_ \text) = \infty,\text \lim_(\lim_ \text) = 0\\ &\lim_\text = \frac,\text \lim_(\lim_ \text) = \infty,\text \lim_(\lim_ \text) = 0\\ &\lim_ \text = - 6 + \frac,\text \lim_(\lim_ \text) = \infty,\text \lim_(\lim_ \text) = \infty \end
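The excess-kurtosis expression discussed above can also be checked numerically; the sketch below (illustrative parameter values) reproduces, for example, −3/2 for the arcsine case α = β = 1/2 and −6/5 for the uniform case α = β = 1.

<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import beta

# Excess kurtosis of Beta(alpha, beta) from the closed form
# 6[(alpha - beta)^2 (alpha + beta + 1) - alpha*beta*(alpha + beta + 2)]
#   / (alpha*beta*(alpha + beta + 2)(alpha + beta + 3)), compared with scipy.
def beta_excess_kurtosis(a, b):
    num = 6.0 * ((a - b) ** 2 * (a + b + 1.0) - a * b * (a + b + 2.0))
    den = a * b * (a + b + 2.0) * (a + b + 3.0)
    return num / den

for a, b in [(0.5, 0.5), (1.0, 1.0), (2.0, 2.0), (2.0, 5.0)]:
    print(a, b, beta_excess_kurtosis(a, b), beta(a, b).stats(moments='k'))
</syntaxhighlight>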


Characteristic function

The Characteristic function (probability theory), characteristic function is the Fourier transform of the probability density function. The characteristic function of the beta distribution is confluent hypergeometric function, Kummer's confluent hypergeometric function (of the first kind): :\begin \varphi_X(\alpha;\beta;t) &= \operatorname\left[e^\right]\\ &= \int_0^1 e^ f(x;\alpha,\beta) dx \\ &=_1F_1(\alpha; \alpha+\beta; it)\!\\ &=\sum_^\infty \frac \\ &= 1 +\sum_^ \left( \prod_^ \frac \right) \frac \end where : x^=x(x+1)(x+2)\cdots(x+n-1) is the rising factorial, also called the "Pochhammer symbol". The value of the characteristic function for ''t'' = 0, is one: : \varphi_X(\alpha;\beta;0)=_1F_1(\alpha; \alpha+\beta; 0) = 1 . Also, the real and imaginary parts of the characteristic function enjoy the following symmetries with respect to the origin of variable ''t'': : \textrm \left [ _1F_1(\alpha; \alpha+\beta; it) \right ] = \textrm \left [ _1F_1(\alpha; \alpha+\beta; - it) \right ] : \textrm \left [ _1F_1(\alpha; \alpha+\beta; it) \right ] = - \textrm \left [ _1F_1(\alpha; \alpha+\beta; - it) \right ] The symmetric case α = β simplifies the characteristic function of the beta distribution to a Bessel function, since in the special case α + β = 2α the confluent hypergeometric function (of the first kind) reduces to a Bessel function (the modified Bessel function of the first kind I_ ) using Ernst Kummer, Kummer's second transformation as follows: Another example of the symmetric case α = β = n/2 for beamforming applications can be found in Figure 11 of :\begin _1F_1(\alpha;2\alpha; it) &= e^ _0F_1 \left(; \alpha+\tfrac; \frac \right) \\ &= e^ \left(\frac\right)^ \Gamma\left(\alpha+\tfrac\right) I_\left(\frac\right).\end In the accompanying plots, the Complex number, real part (Re) of the Characteristic function (probability theory), characteristic function of the beta distribution is displayed for symmetric (α = β) and skewed (α ≠ β) cases.
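The identity between the characteristic function and Kummer's confluent hypergeometric function can be verified numerically. The sketch below (illustrative α, β and ''t''; the use of mpmath for ₁F₁ with a complex argument is an implementation choice, not anything prescribed by the source) integrates e^{itx} against the density and compares with ₁F₁(α; α + β; it):

<syntaxhighlight lang="python">
import numpy as np
from scipy import integrate
from scipy.stats import beta
import mpmath

# Numerical check that the characteristic function of Beta(alpha, beta)
# equals Kummer's 1F1(alpha; alpha + beta; i t).
a, b, t = 2.0, 3.0, 1.5
pdf = beta(a, b).pdf

re, _ = integrate.quad(lambda x: np.cos(t * x) * pdf(x), 0, 1)  # real part of E[exp(itX)]
im, _ = integrate.quad(lambda x: np.sin(t * x) * pdf(x), 0, 1)  # imaginary part
print(complex(re, im))
print(complex(mpmath.hyp1f1(a, a + b, 1j * t)))   # should agree to numerical precision
</syntaxhighlight>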


Other moments


Moment generating function

It also follows that the moment generating function is :\begin M_X(\alpha; \beta; t) &= \operatorname\left[e^\right] \\ pt&= \int_0^1 e^ f(x;\alpha,\beta)\,dx \\ pt&= _1F_1(\alpha; \alpha+\beta; t) \\ pt&= \sum_^\infty \frac \frac \\ pt&= 1 +\sum_^ \left( \prod_^ \frac \right) \frac \end In particular ''M''''X''(''α''; ''β''; 0) = 1.


Higher moments

Using the moment generating function, the ''k''-th raw moment is given by the factor

:\prod_{r=0}^{k-1} \frac{\alpha+r}{\alpha+\beta+r}

multiplying the (exponential series) term \left(\frac{t^k}{k!}\right) in the series of the moment generating function

:\operatorname{E}[X^k]= \frac{\alpha^{(k)}}{(\alpha+\beta)^{(k)}} = \prod_{r=0}^{k-1} \frac{\alpha+r}{\alpha+\beta+r}

where (''x'')^{(''k'')} is a Pochhammer symbol representing rising factorial. It can also be written in a recursive form as

:\operatorname{E}[X^k] = \frac{\alpha+k-1}{\alpha+\beta+k-1}\operatorname{E}[X^{k-1}].

Since the moment generating function M_X(\alpha; \beta; \cdot) has a positive radius of convergence, the beta distribution is determined by its moments.
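A direct implementation of the recursion above (parameter values and the use of scipy for the reference values are illustrative):

<syntaxhighlight lang="python">
from scipy.stats import beta

# Raw moments E[X^k] of Beta(alpha, beta) via the rising-factorial recursion
# E[X^k] = (alpha + k - 1)/(alpha + beta + k - 1) * E[X^(k-1)],
# checked against scipy's moment() method.
def beta_raw_moments(a, b, kmax):
    moments, m = [], 1.0          # E[X^0] = 1
    for k in range(1, kmax + 1):
        m *= (a + k - 1.0) / (a + b + k - 1.0)
        moments.append(m)
    return moments

a, b = 2.0, 5.0
print(beta_raw_moments(a, b, 4))
print([beta(a, b).moment(k) for k in range(1, 5)])
</syntaxhighlight>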


Moments of transformed random variables


=Moments of linearly transformed, product and inverted random variables

= One can also show the following expectations for a transformed random variable, where the random variable ''X'' is Beta-distributed with parameters α and β: ''X'' ~ Beta(α, β). The expected value of the variable 1 − ''X'' is the mirror-symmetry of the expected value based on ''X'': :\begin & \operatorname[1-X] = \frac \\ & \operatorname[X (1-X)] =\operatorname[(1-X)X ] =\frac \end Due to the mirror-symmetry of the probability density function of the beta distribution, the variances based on variables ''X'' and 1 − ''X'' are identical, and the covariance on ''X''(1 − ''X'' is the negative of the variance: :\operatorname[(1-X)]=\operatorname[X] = -\operatorname[X,(1-X)]= \frac These are the expected values for inverted variables, (these are related to the harmonic means, see ): :\begin & \operatorname \left [\frac \right ] = \frac \text \alpha > 1\\ & \operatorname\left [\frac \right ] =\frac \text \beta > 1 \end The following transformation by dividing the variable ''X'' by its mirror-image ''X''/(1 − ''X'') results in the expected value of the "inverted beta distribution" or
beta prime distribution In probability theory and statistics, the beta prime distribution (also known as inverted beta distribution or beta distribution of the second kindJohnson et al (1995), p 248) is an absolutely continuous probability distribution. Definitions ...
(also known as beta distribution of the second kind or Pearson distribution, Pearson's Type VI): : \begin & \operatorname\left[\frac\right] =\frac \text\beta > 1\\ & \operatorname\left[\frac\right] =\frac\text\alpha > 1 \end Variances of these transformed variables can be obtained by integration, as the expected values of the second moments centered on the corresponding variables: :\operatorname \left[\frac \right] =\operatorname\left[\left(\frac - \operatorname\left[\frac \right ] \right )^2\right]= :\operatorname\left [\frac \right ] =\operatorname \left [\left (\frac - \operatorname\left [\frac \right ] \right )^2 \right ]= \frac \text\alpha > 2 The following variance of the variable ''X'' divided by its mirror-image (''X''/(1−''X'') results in the variance of the "inverted beta distribution" or
beta prime distribution In probability theory and statistics, the beta prime distribution (also known as inverted beta distribution or beta distribution of the second kindJohnson et al (1995), p 248) is an absolutely continuous probability distribution. Definitions ...
(also known as beta distribution of the second kind or Pearson distribution, Pearson's Type VI): :\operatorname \left [\frac \right ] =\operatorname \left [\left(\frac - \operatorname \left [\frac \right ] \right)^2 \right ]=\operatorname \left [\frac \right ] = :\operatorname \left [\left (\frac - \operatorname \left [\frac \right ] \right )^2 \right ]= \frac \text\beta > 2 The covariances are: :\operatorname\left [\frac,\frac \right ] = \operatorname\left[\frac,\frac \right] =\operatorname\left[\frac,\frac\right ] = \operatorname\left[\frac,\frac \right] =\frac \text \alpha, \beta > 1 These expectations and variances appear in the four-parameter Fisher information matrix (.)


=Moments of logarithmically transformed random variables

= Expected values for Logarithm transformation, logarithmic transformations (useful for maximum likelihood estimates, see ) are discussed in this section. The following logarithmic linear transformations are related to the geometric means ''GX'' and ''G''(1−''X'') (see ): :\begin \operatorname[\ln(X)] &= \psi(\alpha) - \psi(\alpha + \beta)= - \operatorname\left[\ln \left (\frac \right )\right],\\ \operatorname[\ln(1-X)] &=\psi(\beta) - \psi(\alpha + \beta)= - \operatorname \left[\ln \left (\frac \right )\right]. \end Where the
digamma function In mathematics, the digamma function is defined as the logarithmic derivative of the gamma function: :\psi(x)=\frac\ln\big(\Gamma(x)\big)=\frac\sim\ln-\frac. It is the first of the polygamma functions. It is strictly increasing and strict ...
ψ(α) is defined as the logarithmic derivative of the
gamma function In mathematics, the gamma function (represented by , the capital letter gamma from the Greek alphabet) is one commonly used extension of the factorial function to complex numbers. The gamma function is defined for all complex numbers except ...
: :\psi(\alpha) = \frac Logit transformations are interesting, as they usually transform various shapes (including J-shapes) into (usually skewed) bell-shaped densities over the logit variable, and they may remove the end singularities over the original variable: :\begin \operatorname\left[\ln \left (\frac \right ) \right] &=\psi(\alpha) - \psi(\beta)= \operatorname[\ln(X)] +\operatorname \left[\ln \left (\frac \right) \right],\\ \operatorname\left [\ln \left (\frac \right ) \right ] &=\psi(\beta) - \psi(\alpha)= - \operatorname \left[\ln \left (\frac \right) \right] . \end Johnson considered the distribution of the logit - transformed variable ln(''X''/1−''X''), including its moment generating function and approximations for large values of the shape parameters. This transformation extends the finite support
, 1 The comma is a punctuation mark that appears in several variants in different languages. It has the same shape as an apostrophe or single closing quotation mark () in many typefaces, but it differs from them in being placed on the baseline o ...
based on the original variable ''X'' to infinite support in both directions of the real line (−∞, +∞). Higher order logarithmic moments can be derived by using the representation of a beta distribution as a proportion of two Gamma distributions and differentiating through the integral. They can be expressed in terms of higher order poly-gamma functions as follows: :\begin \operatorname \left [\ln^2(X) \right ] &= (\psi(\alpha) - \psi(\alpha + \beta))^2+\psi_1(\alpha)-\psi_1(\alpha+\beta), \\ \operatorname \left [\ln^2(1-X) \right ] &= (\psi(\beta) - \psi(\alpha + \beta))^2+\psi_1(\beta)-\psi_1(\alpha+\beta), \\ \operatorname \left [\ln (X)\ln(1-X) \right ] &=(\psi(\alpha) - \psi(\alpha + \beta))(\psi(\beta) - \psi(\alpha + \beta)) -\psi_1(\alpha+\beta). \end therefore the
variance In probability theory and statistics, variance is the expectation of the squared deviation of a random variable from its population mean or sample mean. Variance is a measure of dispersion, meaning it is a measure of how far a set of numbe ...
of the logarithmic variables and
covariance In probability theory and statistics, covariance is a measure of the joint variability of two random variables. If the greater values of one variable mainly correspond with the greater values of the other variable, and the same holds for the ...
of ln(''X'') and ln(1−''X'') are: :\begin \operatorname[\ln(X), \ln(1-X)] &= \operatorname\left[\ln(X)\ln(1-X)\right] - \operatorname[\ln(X)]\operatorname[\ln(1-X)] = -\psi_1(\alpha+\beta) \\ & \\ \operatorname[\ln X] &= \operatorname[\ln^2(X)] - (\operatorname[\ln(X)])^2 \\ &= \psi_1(\alpha) - \psi_1(\alpha + \beta) \\ &= \psi_1(\alpha) + \operatorname[\ln(X), \ln(1-X)] \\ & \\ \operatorname ln (1-X)&= \operatorname[\ln^2 (1-X)] - (\operatorname[\ln (1-X)])^2 \\ &= \psi_1(\beta) - \psi_1(\alpha + \beta) \\ &= \psi_1(\beta) + \operatorname[\ln (X), \ln(1-X)] \end where the
trigamma function In mathematics, the trigamma function, denoted or , is the second of the polygamma functions, and is defined by : \psi_1(z) = \frac \ln\Gamma(z). It follows from this definition that : \psi_1(z) = \frac \psi(z) where is the digamma functio ...
, denoted ψ1(α), is the second of the
polygamma function In mathematics, the polygamma function of order is a meromorphic function on the complex numbers \mathbb defined as the th derivative of the logarithm of the gamma function: :\psi^(z) := \frac \psi(z) = \frac \ln\Gamma(z). Thus :\psi^(z) ...
s, and is defined as the derivative of the digamma function: :\psi_1(\alpha) = \frac= \frac. The variances and covariance of the logarithmically transformed variables ''X'' and (1−''X'') are different, in general, because the logarithmic transformation destroys the mirror-symmetry of the original variables ''X'' and (1−''X''), as the logarithm approaches negative infinity for the variable approaching zero. These logarithmic variances and covariance are the elements of the
Fisher information In mathematical statistics, the Fisher information (sometimes simply called information) is a way of measuring the amount of information that an observable random variable ''X'' carries about an unknown parameter ''θ'' of a distribution that model ...
matrix for the beta distribution. They are also a measure of the curvature of the log likelihood function (see section on Maximum likelihood estimation). The variances of the log inverse variables are identical to the variances of the log variables: :\begin \operatorname\left[\ln \left (\frac \right ) \right] & =\operatorname[\ln(X)] = \psi_1(\alpha) - \psi_1(\alpha + \beta), \\ \operatorname\left[\ln \left (\frac \right ) \right] &=\operatorname ln (1-X)= \psi_1(\beta) - \psi_1(\alpha + \beta), \\ \operatorname\left[\ln \left (\frac \right), \ln \left (\frac\right ) \right] &=\operatorname[\ln(X),\ln(1-X)]= -\psi_1(\alpha + \beta).\end It also follows that the variances of the logit transformed variables are: :\operatorname\left[\ln \left (\frac \right )\right]=\operatorname\left[\ln \left (\frac \right ) \right]=-\operatorname\left [\ln \left (\frac \right ), \ln \left (\frac \right ) \right]= \psi_1(\alpha) + \psi_1(\beta)
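The digamma/trigamma expressions above translate directly into code. The sketch below (illustrative parameters, seed and sample size) checks E[ln ''X''] = ψ(α) − ψ(α + β), var[ln ''X''] = ψ1(α) − ψ1(α + β) and cov[ln ''X'', ln(1 − ''X'')] = −ψ1(α + β) by Monte Carlo:

<syntaxhighlight lang="python">
import numpy as np
from scipy.special import digamma, polygamma

# Logarithmic moments of Beta(alpha, beta) in terms of digamma/trigamma,
# checked against Monte Carlo estimates.
a, b = 2.0, 3.0
mean_lnX = digamma(a) - digamma(a + b)
var_lnX = polygamma(1, a) - polygamma(1, a + b)        # polygamma(1, .) is trigamma
cov_lnX_ln1mX = -polygamma(1, a + b)

rng = np.random.default_rng(2)
x = rng.beta(a, b, size=500_000)
print(mean_lnX, np.log(x).mean())
print(var_lnX, np.log(x).var())
print(cov_lnX_ln1mX, np.cov(np.log(x), np.log1p(-x))[0, 1])
</syntaxhighlight>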


Quantities of information (entropy)

Given a beta distributed random variable, ''X'' ~ Beta(''α'', ''β''), the information entropy, differential entropy of ''X'' is (measured in Nat (unit), nats), the expected value of the negative of the logarithm of the
probability density function In probability theory, a probability density function (PDF), or density of a continuous random variable, is a function whose value at any given sample (or point) in the sample space (the set of possible values taken by the random variable) ca ...
: :\begin h(X) &= \operatorname[-\ln(f(x;\alpha,\beta))] \\ pt&=\int_0^1 -f(x;\alpha,\beta)\ln(f(x;\alpha,\beta)) \, dx \\ pt&= \ln(\Beta(\alpha,\beta))-(\alpha-1)\psi(\alpha)-(\beta-1)\psi(\beta)+(\alpha+\beta-2) \psi(\alpha+\beta) \end where ''f''(''x''; ''α'', ''β'') is the
probability density function In probability theory, a probability density function (PDF), or density of a continuous random variable, is a function whose value at any given sample (or point) in the sample space (the set of possible values taken by the random variable) ca ...
of the beta distribution: :f(x;\alpha,\beta) = \frac x^(1-x)^ The
digamma function In mathematics, the digamma function is defined as the logarithmic derivative of the gamma function: :\psi(x)=\frac\ln\big(\Gamma(x)\big)=\frac\sim\ln-\frac. It is the first of the polygamma functions. It is strictly increasing and strict ...
''ψ'' appears in the formula for the differential entropy as a consequence of Euler's integral formula for the harmonic numbers which follows from the integral: :\int_0^1 \frac \, dx = \psi(\alpha)-\psi(1) The information entropy, differential entropy of the beta distribution is negative for all values of ''α'' and ''β'' greater than zero, except at ''α'' = ''β'' = 1 (for which values the beta distribution is the same as the Uniform distribution (continuous), uniform distribution), where the information entropy, differential entropy reaches its Maxima and minima, maximum value of zero. It is to be expected that the maximum entropy should take place when the beta distribution becomes equal to the uniform distribution, since uncertainty is maximal when all possible events are equiprobable. For ''α'' or ''β'' approaching zero, the information entropy, differential entropy approaches its Maxima and minima, minimum value of negative infinity. For (either or both) ''α'' or ''β'' approaching zero, there is a maximum amount of order: all the probability density is concentrated at the ends, and there is zero probability density at points located between the ends. Similarly for (either or both) ''α'' or ''β'' approaching infinity, the differential entropy approaches its minimum value of negative infinity, and a maximum amount of order. If either ''α'' or ''β'' approaches infinity (and the other is finite) all the probability density is concentrated at an end, and the probability density is zero everywhere else. If both shape parameters are equal (the symmetric case), ''α'' = ''β'', and they approach infinity simultaneously, the probability density becomes a spike ( Dirac delta function) concentrated at the middle ''x'' = 1/2, and hence there is 100% probability at the middle ''x'' = 1/2 and zero probability everywhere else. The (continuous case) information entropy, differential entropy was introduced by Shannon in his original paper (where he named it the "entropy of a continuous distribution"), as the concluding part of the same paper where he defined the information entropy, discrete entropy. It is known since then that the differential entropy may differ from the infinitesimal limit of the discrete entropy by an infinite offset, therefore the differential entropy can be negative (as it is for the beta distribution). What really matters is the relative value of entropy. Given two beta distributed random variables, ''X''1 ~ Beta(''α'', ''β'') and ''X''2 ~ Beta(''α''′, ''β''′), the cross entropy is (measured in nats) :\begin H(X_1,X_2) &= \int_0^1 - f(x;\alpha,\beta) \ln (f(x;\alpha',\beta')) \,dx \\ pt&= \ln \left(\Beta(\alpha',\beta')\right)-(\alpha'-1)\psi(\alpha)-(\beta'-1)\psi(\beta)+(\alpha'+\beta'-2)\psi(\alpha+\beta). \end The cross entropy has been used as an error metric to measure the distance between two hypotheses. Its absolute value is minimum when the two distributions are identical. It is the information measure most closely related to the log maximum likelihood (see section on "Parameter estimation. Maximum likelihood estimation")). The relative entropy, or Kullback–Leibler divergence ''D''KL(''X''1 , , ''X''2), is a measure of the inefficiency of assuming that the distribution is ''X''2 ~ Beta(''α''′, ''β''′) when the distribution is really ''X''1 ~ Beta(''α'', ''β''). It is defined as follows (measured in nats). 
:\begin D_(X_1, , X_2) &= \int_0^1 f(x;\alpha,\beta) \ln \left (\frac \right ) \, dx \\ pt&= \left (\int_0^1 f(x;\alpha,\beta) \ln (f(x;\alpha,\beta)) \,dx \right )- \left (\int_0^1 f(x;\alpha,\beta) \ln (f(x;\alpha',\beta')) \, dx \right )\\ pt&= -h(X_1) + H(X_1,X_2)\\ pt&= \ln\left(\frac\right)+(\alpha-\alpha')\psi(\alpha)+(\beta-\beta')\psi(\beta)+(\alpha'-\alpha+\beta'-\beta)\psi (\alpha + \beta). \end The relative entropy, or Kullback–Leibler divergence, is always non-negative. A few numerical examples follow: *''X''1 ~ Beta(1, 1) and ''X''2 ~ Beta(3, 3); ''D''KL(''X''1 , , ''X''2) = 0.598803; ''D''KL(''X''2 , , ''X''1) = 0.267864; ''h''(''X''1) = 0; ''h''(''X''2) = −0.267864 *''X''1 ~ Beta(3, 0.5) and ''X''2 ~ Beta(0.5, 3); ''D''KL(''X''1 , , ''X''2) = 7.21574; ''D''KL(''X''2 , , ''X''1) = 7.21574; ''h''(''X''1) = −1.10805; ''h''(''X''2) = −1.10805. The Kullback–Leibler divergence is not symmetric ''D''KL(''X''1 , , ''X''2) ≠ ''D''KL(''X''2 , , ''X''1) for the case in which the individual beta distributions Beta(1, 1) and Beta(3, 3) are symmetric, but have different entropies ''h''(''X''1) ≠ ''h''(''X''2). The value of the Kullback divergence depends on the direction traveled: whether going from a higher (differential) entropy to a lower (differential) entropy or the other way around. In the numerical example above, the Kullback divergence measures the inefficiency of assuming that the distribution is (bell-shaped) Beta(3, 3), rather than (uniform) Beta(1, 1). The "h" entropy of Beta(1, 1) is higher than the "h" entropy of Beta(3, 3) because the uniform distribution Beta(1, 1) has a maximum amount of disorder. The Kullback divergence is more than two times higher (0.598803 instead of 0.267864) when measured in the direction of decreasing entropy: the direction that assumes that the (uniform) Beta(1, 1) distribution is (bell-shaped) Beta(3, 3) rather than the other way around. In this restricted sense, the Kullback divergence is consistent with the second law of thermodynamics. The Kullback–Leibler divergence is symmetric ''D''KL(''X''1 , , ''X''2) = ''D''KL(''X''2 , , ''X''1) for the skewed cases Beta(3, 0.5) and Beta(0.5, 3) that have equal differential entropy ''h''(''X''1) = ''h''(''X''2). The symmetry condition: :D_(X_1, , X_2) = D_(X_2, , X_1),\texth(X_1) = h(X_2),\text\alpha \neq \beta follows from the above definitions and the mirror-symmetry ''f''(''x''; ''α'', ''β'') = ''f''(1−''x''; ''α'', ''β'') enjoyed by the beta distribution.
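These closed-form expressions are straightforward to check numerically. The following minimal sketch (in Python, assuming NumPy and SciPy are available; the function names are introduced here only for illustration) reproduces the differential entropy, cross-entropy and Kullback–Leibler values quoted in the examples above.

```python
import numpy as np
from scipy.special import betaln, digamma

def beta_entropy(a, b):
    # differential entropy h of Beta(a, b), in nats
    return (betaln(a, b) - (a - 1) * digamma(a) - (b - 1) * digamma(b)
            + (a + b - 2) * digamma(a + b))

def beta_cross_entropy(a, b, ap, bp):
    # cross-entropy H(X1, X2) with X1 ~ Beta(a, b) and X2 ~ Beta(ap, bp)
    return (betaln(ap, bp) - (ap - 1) * digamma(a) - (bp - 1) * digamma(b)
            + (ap + bp - 2) * digamma(a + b))

def beta_kl(a, b, ap, bp):
    # D_KL(Beta(a, b) || Beta(ap, bp)) = -h(X1) + H(X1, X2)
    return beta_cross_entropy(a, b, ap, bp) - beta_entropy(a, b)

print(beta_kl(1, 1, 3, 3))   # ~0.598803
print(beta_kl(3, 3, 1, 1))   # ~0.267864
print(beta_entropy(3, 3))    # ~-0.267864
```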


Relationships between statistical measures


Mean, mode and median relationship

If 1 < α < β then mode ≤ median ≤ mean. Kerman J (2011) "A closed-form approximation for the median of the beta distribution". Expressing the mode (only for α, β > 1), and the mean in terms of α and β: : \frac{\alpha - 1}{\alpha + \beta - 2} \le \text{median} \le \frac{\alpha}{\alpha + \beta} , If 1 < β < α then the order of the inequalities is reversed. For α, β > 1 the absolute distance between the mean and the median is less than 5% of the distance between the maximum and minimum values of ''x''. On the other hand, the absolute distance between the mean and the mode can reach 50% of the distance between the maximum and minimum values of ''x'', for the (Pathological (mathematics), pathological) case of α = 1 and β = 1, for which values the beta distribution approaches the uniform distribution and the information entropy, differential entropy approaches its Maxima and minima, maximum value, and hence maximum "disorder". For example, for α = 1.0001 and β = 1.00000001: * mode = 0.9999; PDF(mode) = 1.00010 * mean = 0.500025; PDF(mean) = 1.00003 * median = 0.500035; PDF(median) = 1.00003 * mean − mode = −0.499875 * mean − median = −9.65538 × 10^−6 where PDF stands for the value of the
probability density function.
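The ordering mode ≤ median ≤ mean for 1 < α < β is easy to illustrate numerically; the sketch below (assuming SciPy, with Beta(2, 5) as an arbitrary example) computes the three measures and checks the inequality.

```python
from scipy.stats import beta

a, b = 2.0, 5.0                    # any example with 1 < a < b
mode = (a - 1) / (a + b - 2)       # closed-form mode for a, b > 1
median = beta(a, b).median()       # numerical inverse CDF at 1/2
mean = a / (a + b)
print(mode, median, mean)          # 0.2, ~0.264, ~0.286
assert mode <= median <= mean
```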


Mean, geometric mean and harmonic mean relationship

It is known from the inequality of arithmetic and geometric means that the geometric mean is lower than the mean. Similarly, the harmonic mean is lower than the geometric mean. The accompanying plot shows that for α = β, both the mean and the median are exactly equal to 1/2, regardless of the value of α = β, and the mode is also equal to 1/2 for α = β > 1, however the geometric and harmonic means are lower than 1/2 and they only approach this value asymptotically as α = β → ∞.
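A short numerical sketch (assuming SciPy; α = β = 5 is an arbitrary choice) of the ordering harmonic mean < geometric mean < mean = 1/2 in the symmetric case, using E[ln X] = ψ(α) − ψ(α + β) for the geometric mean and (α − 1)/(α + β − 1) for the harmonic mean (valid for α > 1):

```python
import numpy as np
from scipy.special import digamma

a = b = 5.0
mean = a / (a + b)                                 # exactly 1/2 for any a = b
geometric_mean = np.exp(digamma(a) - digamma(a + b))
harmonic_mean = (a - 1) / (a + b - 1)              # valid for a > 1
print(harmonic_mean, geometric_mean, mean)         # ~0.444 < ~0.474 < 0.5
```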


Kurtosis bounded by the square of the skewness

As remarked by William Feller, Feller, in the Pearson distribution, Pearson system the beta probability density appears as Pearson distribution, type I (any difference between the beta distribution and Pearson's type I distribution is only superficial and it makes no difference for the following discussion regarding the relationship between kurtosis and skewness). Karl Pearson showed, in Plate 1 of his paper published in 1916, a graph with the kurtosis as the vertical axis (ordinate) and the square of the
skewness
as the horizontal axis (abscissa), in which a number of distributions were displayed. The region occupied by the beta distribution is bounded by the following two Line (geometry), lines in the (skewness2,kurtosis) Cartesian coordinate system, plane, or the (skewness2,excess kurtosis) Cartesian coordinate system, plane: :(\text)^2+1< \text< \frac (\text)^2 + 3 or, equivalently, :(\text)^2-2< \text< \frac (\text)^2 At a time when there were no powerful digital computers, Karl Pearson accurately computed further boundaries, for example, separating the "U-shaped" from the "J-shaped" distributions. The lower boundary line (excess kurtosis + 2 − skewness2 = 0) is produced by skewed "U-shaped" beta distributions with both values of shape parameters α and β close to zero. The upper boundary line (excess kurtosis − (3/2) skewness2 = 0) is produced by extremely skewed distributions with very large values of one of the parameters and very small values of the other parameter. Karl Pearson showed that this upper boundary line (excess kurtosis − (3/2) skewness2 = 0) is also the intersection with Pearson's distribution III, which has unlimited support in one direction (towards positive infinity), and can be bell-shaped or J-shaped. His son, Egon Pearson, showed that the region (in the kurtosis/squared-skewness plane) occupied by the beta distribution (equivalently, Pearson's distribution I) as it approaches this boundary (excess kurtosis − (3/2) skewness2 = 0) is shared with the noncentral chi-squared distribution. Karl Pearson (Pearson 1895, pp. 357, 360, 373–376) also showed that the gamma distribution is a Pearson type III distribution. Hence this boundary line for Pearson's type III distribution is known as the gamma line. (This can be shown from the fact that the excess kurtosis of the gamma distribution is 6/''k'' and the square of the skewness is 4/''k'', hence (excess kurtosis − (3/2) skewness2 = 0) is identically satisfied by the gamma distribution regardless of the value of the parameter "k"). Pearson later noted that the chi-squared distribution is a special case of Pearson's type III and also shares this boundary line (as it is apparent from the fact that for the chi-squared distribution the excess kurtosis is 12/''k'' and the square of the skewness is 8/''k'', hence (excess kurtosis − (3/2) skewness2 = 0) is identically satisfied regardless of the value of the parameter "k"). This is to be expected, since the chi-squared distribution ''X'' ~ χ2(''k'') is a special case of the gamma distribution, with parametrization X ~ Γ(k/2, 1/2) where k is a positive integer that specifies the "number of degrees of freedom" of the chi-squared distribution. An example of a beta distribution near the upper boundary (excess kurtosis − (3/2) skewness2 = 0) is given by α = 0.1, β = 1000, for which the ratio (excess kurtosis)/(skewness2) = 1.49835 approaches the upper limit of 1.5 from below. An example of a beta distribution near the lower boundary (excess kurtosis + 2 − skewness2 = 0) is given by α= 0.0001, β = 0.1, for which values the expression (excess kurtosis + 2)/(skewness2) = 1.01621 approaches the lower limit of 1 from above. In the infinitesimal limit for both α and β approaching zero symmetrically, the excess kurtosis reaches its minimum value at −2. This minimum value occurs at the point at which the lower boundary line intersects the vertical axis (ordinate). 
(However, in Pearson's original chart, the ordinate is kurtosis, instead of excess kurtosis, and it increases downwards rather than upwards). Values for the skewness and excess kurtosis below the lower boundary (excess kurtosis + 2 − skewness2 = 0) cannot occur for any distribution, and hence Karl Pearson appropriately called the region below this boundary the "impossible region". The boundary for this "impossible region" is determined by (symmetric or skewed) bimodal "U"-shaped distributions for which the parameters α and β approach zero and hence all the probability density is concentrated at the ends: ''x'' = 0, 1 with practically nothing in between them. Since for α ≈ β ≈ 0 the probability density is concentrated at the two ends ''x'' = 0 and ''x'' = 1, this "impossible boundary" is determined by a
Bernoulli distribution
, where the two only possible outcomes occur with respective probabilities ''p'' and ''q'' = 1−''p''. For cases approaching this limit boundary with symmetry α = β, skewness ≈ 0, excess kurtosis ≈ −2 (this is the lowest excess kurtosis possible for any distribution), and the probabilities are ''p'' ≈ ''q'' ≈ 1/2. For cases approaching this limit boundary with skewness, excess kurtosis ≈ −2 + skewness2, and the probability density is concentrated more at one end than the other end (with practically nothing in between), with probabilities p = \tfrac at the left end ''x'' = 0 and q = 1-p = \tfrac at the right end ''x'' = 1.
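The two boundary lines can be checked directly. The sketch below (assuming SciPy) evaluates skewness and excess kurtosis for the two near-boundary examples discussed above plus an ordinary bell-shaped case, verifies skewness² − 2 < excess kurtosis < (3/2) skewness², and prints how close each case comes to the upper and lower limits.

```python
from scipy.stats import beta

for a, b in [(0.1, 1000.0), (0.0001, 0.1), (2.0, 5.0)]:
    skewness, ex_kurtosis = beta(a, b).stats(moments='sk')
    s2 = float(skewness) ** 2
    # lower "impossible" boundary and upper gamma (type III) boundary
    assert s2 - 2 < ex_kurtosis < 1.5 * s2
    print(a, b, float(ex_kurtosis) / s2, (float(ex_kurtosis) + 2) / s2)
```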


Symmetry

All statements are conditional on α, β > 0 * Probability density function Symmetry, reflection symmetry ::f(x;\alpha,\beta) = f(1-x;\beta,\alpha) * Cumulative distribution function Symmetry, reflection symmetry plus unitary Symmetry, translation ::F(x;\alpha,\beta) = I_x(\alpha,\beta) = 1- F(1- x;\beta,\alpha) = 1 - I_(\beta,\alpha) * Mode Symmetry, reflection symmetry plus unitary Symmetry, translation ::\operatorname(\Beta(\alpha, \beta))= 1-\operatorname(\Beta(\beta, \alpha)),\text\Beta(\beta, \alpha)\ne \Beta(1,1) * Median Symmetry, reflection symmetry plus unitary Symmetry, translation ::\operatorname (\Beta(\alpha, \beta) )= 1 - \operatorname (\Beta(\beta, \alpha)) * Mean Symmetry, reflection symmetry plus unitary Symmetry, translation ::\mu (\Beta(\alpha, \beta) )= 1 - \mu (\Beta(\beta, \alpha) ) * Geometric Means each is individually asymmetric, the following symmetry applies between the geometric mean based on ''X'' and the geometric mean based on its
reflection
(1-X) ::G_X (\Beta(\alpha, \beta) )=G_(\Beta(\beta, \alpha) ) * Harmonic means each is individually asymmetric, the following symmetry applies between the harmonic mean based on ''X'' and the harmonic mean based on its
reflection
(1-X) ::H_X (\Beta(\alpha, \beta) )=H_(\Beta(\beta, \alpha) ) \text \alpha, \beta > 1 . * Variance symmetry ::\operatorname (\Beta(\alpha, \beta) )=\operatorname (\Beta(\beta, \alpha) ) * Geometric variances each is individually asymmetric, the following symmetry applies between the log geometric variance based on X and the log geometric variance based on its
reflection
(1-X) ::\ln(\operatorname (\Beta(\alpha, \beta))) = \ln(\operatorname(\Beta(\beta, \alpha))) * Geometric covariance symmetry ::\ln \operatorname(\Beta(\alpha, \beta))=\ln \operatorname(\Beta(\beta, \alpha)) * Mean absolute deviation around the mean symmetry ::\operatorname[, X - E ] (\Beta(\alpha, \beta))=\operatorname[, X - E ] (\Beta(\beta, \alpha)) * Skewness Symmetry (mathematics), skew-symmetry ::\operatorname (\Beta(\alpha, \beta) )= - \operatorname (\Beta(\beta, \alpha) ) * Excess kurtosis symmetry ::\text (\Beta(\alpha, \beta) )= \text (\Beta(\beta, \alpha) ) * Characteristic function symmetry of Real part (with respect to the origin of variable "t") :: \text [_1F_1(\alpha; \alpha+\beta; it) ] = \text [ _1F_1(\alpha; \alpha+\beta; - it)] * Characteristic function Symmetry (mathematics), skew-symmetry of Imaginary part (with respect to the origin of variable "t") :: \text [_1F_1(\alpha; \alpha+\beta; it) ] = - \text [ _1F_1(\alpha; \alpha+\beta; - it) ] * Characteristic function symmetry of Absolute value (with respect to the origin of variable "t") :: \text [ _1F_1(\alpha; \alpha+\beta; it) ] = \text [ _1F_1(\alpha; \alpha+\beta; - it) ] * Differential entropy symmetry ::h(\Beta(\alpha, \beta) )= h(\Beta(\beta, \alpha) ) * Relative Entropy (also called Kullback–Leibler divergence) symmetry ::D_(X_1, , X_2) = D_(X_2, , X_1), \texth(X_1) = h(X_2)\text\alpha \neq \beta * Fisher information matrix symmetry ::_ = _
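A spot-check of a few of these symmetry relations (assuming SciPy; the shape parameters and evaluation point are arbitrary):

```python
import numpy as np
from scipy.stats import beta

a, b, x = 2.5, 0.7, 0.3
assert np.isclose(beta(a, b).pdf(x), beta(b, a).pdf(1 - x))      # f(x;a,b) = f(1-x;b,a)
assert np.isclose(beta(a, b).cdf(x), 1 - beta(b, a).cdf(1 - x))  # F(x;a,b) = 1 - F(1-x;b,a)
skew_ab = beta(a, b).stats(moments='s')
skew_ba = beta(b, a).stats(moments='s')
assert np.isclose(skew_ab, -skew_ba)                              # skew-symmetry of the skewness
```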


Geometry of the probability density function


Inflection points

For certain values of the shape parameters α and β, the
probability density function
has inflection points, at which the curvature changes sign. The position of these inflection points can be useful as a measure of the Statistical dispersion, dispersion or spread of the distribution. Defining the following quantity: :\kappa =\frac Points of inflection occur, depending on the value of the shape parameters α and β, as follows: *(α > 2, β > 2) The distribution is bell-shaped (symmetric for α = β and skewed otherwise), with two inflection points, equidistant from the mode: ::x = \text \pm \kappa = \frac * (α = 2, β > 2) The distribution is unimodal, positively skewed, right-tailed, with one inflection point, located to the right of the mode: ::x =\text + \kappa = \frac * (α > 2, β = 2) The distribution is unimodal, negatively skewed, left-tailed, with one inflection point, located to the left of the mode: ::x = \text - \kappa = 1 - \frac * (1 < α < 2, β > 2, α+β>2) The distribution is unimodal, positively skewed, right-tailed, with one inflection point, located to the right of the mode: ::x =\text + \kappa = \frac *(0 < α < 1, 1 < β < 2) The distribution has a mode at the left end ''x'' = 0 and it is positively skewed, right-tailed. There is one inflection point, located to the right of the mode: ::x = \frac *(α > 2, 1 < β < 2) The distribution is unimodal negatively skewed, left-tailed, with one inflection point, located to the left of the mode: ::x =\text - \kappa = \frac *(1 < α < 2, 0 < β < 1) The distribution has a mode at the right end ''x''=1 and it is negatively skewed, left-tailed. There is one inflection point, located to the left of the mode: ::x = \frac There are no inflection points in the remaining (symmetric and skewed) regions: U-shaped: (α, β < 1) upside-down-U-shaped: (1 < α < 2, 1 < β < 2), reverse-J-shaped (α < 1, β > 2) or J-shaped: (α > 2, β < 1) The accompanying plots show the inflection point locations (shown vertically, ranging from 0 to 1) versus α and β (the horizontal axes ranging from 0 to 5). There are large cuts at surfaces intersecting the lines α = 1, β = 1, α = 2, and β = 2 because at these values the beta distribution change from 2 modes, to 1 mode to no mode.
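The defining expression for κ did not survive extraction above; the sketch below (assuming NumPy/SciPy) takes κ = √((α − 1)(β − 1)/(α + β − 3)) / (α + β − 2), the form consistent with the inflection-point list, and confirms by finite differences that the curvature of the density changes sign at mode ± κ for a bell-shaped example.

```python
import numpy as np
from scipy.stats import beta

a, b = 3.0, 3.0                       # bell-shaped case, a, b > 2
mode = (a - 1) / (a + b - 2)
kappa = np.sqrt((a - 1) * (b - 1) / (a + b - 3)) / (a + b - 2)

def pdf_curvature(x, h=1e-5):
    f = beta(a, b).pdf
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / h**2   # central second difference

for x0 in (mode - kappa, mode + kappa):
    # the curvature has opposite signs on the two sides of each inflection point
    print(x0, pdf_curvature(x0 - 0.01) * pdf_curvature(x0 + 0.01) < 0)
```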


Shapes

The beta density function can take a wide variety of different shapes depending on the values of the two parameters ''α'' and ''β''. The ability of the beta distribution to take this great diversity of shapes (using only two parameters) is partly responsible for finding wide application for modeling actual measurements:


=Symmetric (''α'' = ''β'')

= * the density function is symmetry, symmetric about 1/2 (blue & teal plots). * median = mean = 1/2. *skewness = 0. *variance = 1/(4(2α + 1)) *α = β < 1 **U-shaped (blue plot). **bimodal: left mode = 0, right mode =1, anti-mode = 1/2 **1/12 < var(''X'') < 1/4 **−2 < excess kurtosis(''X'') < −6/5 ** α = β = 1/2 is the arcsine distribution *** var(''X'') = 1/8 ***excess kurtosis(''X'') = −3/2 ***CF = Rinc (t) ** α = β → 0 is a 2-point
Bernoulli distribution
with equal probability 1/2 at each Dirac delta function end ''x'' = 0 and ''x'' = 1 and zero probability everywhere else. A coin toss: one face of the coin being ''x'' = 0 and the other face being ''x'' = 1. *** \lim_ \operatorname(X) = \tfrac *** \lim_ \operatorname(X) = - 2 a lower value than this is impossible for any distribution to reach. *** The information entropy, differential entropy approaches a Maxima and minima, minimum value of −∞ *α = β = 1 **the uniform distribution (continuous), uniform
[0, 1]
distribution **no mode **var(''X'') = 1/12 **excess kurtosis(''X'') = −6/5 **The (negative anywhere else) information entropy, differential entropy reaches its Maxima and minima, maximum value of zero **CF = Sinc (t) *''α'' = ''β'' > 1 **symmetric unimodal ** mode = 1/2. **0 < var(''X'') < 1/12 **−6/5 < excess kurtosis(''X'') < 0 **''α'' = ''β'' = 3/2 is a semi-elliptic
[0, 1]
distribution, see: Wigner semicircle distribution ***var(''X'') = 1/16. ***excess kurtosis(''X'') = −1 ***CF = 2 Jinc (t) **''α'' = ''β'' = 2 is the parabolic
[0, 1]
distribution ***var(''X'') = 1/20 ***excess kurtosis(''X'') = −6/7 ***CF = 3 Tinc (t) **''α'' = ''β'' > 2 is bell-shaped, with inflection points located to either side of the mode ***0 < var(''X'') < 1/20 ***−6/7 < excess kurtosis(''X'') < 0 **''α'' = ''β'' → ∞ is a 1-point
degenerate distribution
with a Dirac delta function spike at the midpoint ''x'' = 1/2 with probability 1, and zero probability everywhere else. There is 100% probability (absolute certainty) concentrated at the single point ''x'' = 1/2. *** \lim_ \operatorname(X) = 0 *** \lim_ \operatorname(X) = 0 ***The information entropy, differential entropy approaches a Maxima and minima, minimum value of −∞
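The variance and excess-kurtosis values quoted for these symmetric special cases can be verified in one loop (assuming SciPy); the expected results are var = 1/(4(2α + 1)) and excess kurtosis = −6/(2α + 3).

```python
from scipy.stats import beta

for a in (0.5, 1.0, 1.5, 2.0):
    var, ex_kurtosis = beta(a, a).stats(moments='vk')
    print(a, float(var), float(ex_kurtosis))
# expected: a=0.5 -> 1/8, -3/2;  a=1 -> 1/12, -6/5;
#           a=1.5 -> 1/16, -1;   a=2 -> 1/20, -6/7
```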


=Skewed (''α'' ≠ ''β'')

= The density function is Skewness, skewed. An interchange of parameter values yields the mirror image (the reverse) of the initial curve, some more specific cases: *''α'' < 1, ''β'' < 1 ** U-shaped ** Positive skew for α < β, negative skew for α > β. ** bimodal: left mode = 0, right mode = 1, anti-mode = \tfrac ** 0 < median < 1. ** 0 < var(''X'') < 1/4 *α > 1, β > 1 ** unimodal (magenta & cyan plots), **Positive skew for α < β, negative skew for α > β. **\text= \tfrac ** 0 < median < 1 ** 0 < var(''X'') < 1/12 *α < 1, β ≥ 1 **reverse J-shaped with a right tail, **positively skewed, **strictly decreasing, convex function, convex ** mode = 0 ** 0 < median < 1/2. ** 0 < \operatorname(X) < \tfrac, (maximum variance occurs for \alpha=\tfrac, \beta=1, or α = Φ the Golden ratio, golden ratio conjugate) *α ≥ 1, β < 1 **J-shaped with a left tail, **negatively skewed, **strictly increasing, convex function, convex ** mode = 1 ** 1/2 < median < 1 ** 0 < \operatorname(X) < \tfrac, (maximum variance occurs for \alpha=1, \beta=\tfrac, or β = Φ the Golden ratio, golden ratio conjugate) *α = 1, β > 1 **positively skewed, **strictly decreasing (red plot), **a reversed (mirror-image) power function ,1distribution ** mean = 1 / (β + 1) ** median = 1 - 1/21/β ** mode = 0 **α = 1, 1 < β < 2 ***concave function, concave *** 1-\tfrac< \text < \tfrac *** 1/18 < var(''X'') < 1/12. **α = 1, β = 2 ***a straight line with slope −2, the right-triangular distribution with right angle at the left end, at ''x'' = 0 *** \text=1-\tfrac *** var(''X'') = 1/18 **α = 1, β > 2 ***reverse J-shaped with a right tail, ***convex function, convex *** 0 < \text < 1-\tfrac *** 0 < var(''X'') < 1/18 *α > 1, β = 1 **negatively skewed, **strictly increasing (green plot), **the power function
[0, 1]
distribution ** mean = α / (α + 1) ** median = 1/21/α ** mode = 1 **2 > α > 1, β = 1 ***concave function, concave *** \tfrac < \text < \tfrac *** 1/18 < var(''X'') < 1/12 ** α = 2, β = 1 ***a straight line with slope +2, the right-triangular distribution with right angle at the right end, at ''x'' = 1 *** \text=\tfrac *** var(''X'') = 1/18 **α > 2, β = 1 ***J-shaped with a left tail, convex function, convex ***\tfrac < \text < 1 *** 0 < var(''X'') < 1/18


Related distributions


Transformations

* If ''X'' ~ Beta(''α'', ''β'') then 1 − ''X'' ~ Beta(''β'', ''α'') Mirror image, mirror-image symmetry * If ''X'' ~ Beta(''α'', ''β'') then \tfrac{X}{1-X} \sim \beta'(\alpha,\beta). The
beta prime distribution
, also called "beta distribution of the second kind". * If ''X'' ~ Beta(''α'', ''β'') then \tfrac -1 \sim (\beta,\alpha). * If ''X'' ~ Beta(''n''/2, ''m''/2) then \tfrac \sim F(n,m) (assuming ''n'' > 0 and ''m'' > 0), the F-distribution, Fisher–Snedecor F distribution. * If X \sim \operatorname\left(1+\lambda\tfrac, 1 + \lambda\tfrac\right) then min + ''X''(max − min) ~ PERT(min, max, ''m'', ''λ'') where ''PERT'' denotes a PERT distribution used in PERT analysis, and ''m''=most likely value.Herrerías-Velasco, José Manuel and Herrerías-Pleguezuelo, Rafael and René van Dorp, Johan. (2011). Revisiting the PERT mean and Variance. European Journal of Operational Research (210), p. 448–451. Traditionally ''λ'' = 4 in PERT analysis. * If ''X'' ~ Beta(1, ''β'') then ''X'' ~ Kumaraswamy distribution with parameters (1, ''β'') * If ''X'' ~ Beta(''α'', 1) then ''X'' ~ Kumaraswamy distribution with parameters (''α'', 1) * If ''X'' ~ Beta(''α'', 1) then −ln(''X'') ~ Exponential(''α'')


Special and limiting cases

* Beta(1, 1) ~ uniform distribution (continuous), U(0, 1). * Beta(n, 1) ~ Maximum of ''n'' independent rvs. with uniform distribution (continuous), U(0, 1), sometimes called a ''a standard power function distribution'' with density ''n'' ''x''''n''-1 on that interval. * Beta(1, n) ~ Minimum of ''n'' independent rvs. with uniform distribution (continuous), U(0, 1) * If ''X'' ~ Beta(3/2, 3/2) and ''r'' > 0 then 2''rX'' − ''r'' ~ Wigner semicircle distribution. * Beta(1/2, 1/2) is equivalent to the arcsine distribution. This distribution is also Jeffreys prior probability for the
Bernoulli
and binomial distributions. The arcsine probability density is a distribution that appears in several random-walk fundamental theorems. In a fair coin toss
random walk
, the probability for the time of the last visit to the origin is distributed as an (U-shaped) arcsine distribution. In a two-player fair-coin-toss game, a player is said to be in the lead if the random walk (that started at the origin) is above the origin. The most probable number of times that a given player will be in the lead, in a game of length 2''N'', is not ''N''. On the contrary, ''N'' is the least likely number of times that the player will be in the lead. The most likely number of times in the lead is 0 or 2''N'' (following the arcsine distribution). * \lim_ n \operatorname(1,n) = \operatorname(1) the exponential distribution. * \lim_ n \operatorname(k,n) = \operatorname(k,1) the gamma distribution. * For large n, \operatorname(\alpha n,\beta n) \to \mathcal\left(\frac,\frac\frac\right) the normal distribution. More precisely, if X_n \sim \operatorname(\alpha n,\beta n) then \sqrt\left(X_n -\tfrac\right) converges in distribution to a normal distribution with mean 0 and variance \tfrac as ''n'' increases.
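A simulation sketch (assuming NumPy/SciPy; the sample sizes, seed and parameter values are illustrative) of two of these statements: the maximum of n independent U(0, 1) variables follows Beta(n, 1), and Beta(αn, βn) is close to a normal distribution with mean α/(α + β) and variance αβ/((α + β)³n) when n is large.

```python
import numpy as np
from scipy.stats import beta, norm, kstest

rng = np.random.default_rng(1)

n = 5
u_max = rng.uniform(size=(20_000, n)).max(axis=1)
print(kstest(u_max, beta(n, 1).cdf).pvalue)          # maximum of n uniforms ~ Beta(n, 1)

a, b, big_n = 2.0, 3.0, 400
x = beta(a * big_n, b * big_n).rvs(size=20_000, random_state=rng)
approx = norm(loc=a / (a + b), scale=np.sqrt(a * b / ((a + b) ** 3 * big_n)))
print(kstest(x, approx.cdf).pvalue)                  # p-value should typically be large
```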


Derived from other distributions

* The ''k''th order statistic of a sample of size ''n'' from the Uniform distribution (continuous), uniform distribution is a beta random variable, ''U''(''k'') ~ Beta(''k'', ''n''+1−''k''). * If ''X'' ~ Gamma(α, θ) and ''Y'' ~ Gamma(β, θ) are independent, then \tfrac \sim \operatorname(\alpha, \beta)\,. * If X \sim \chi^2(\alpha)\, and Y \sim \chi^2(\beta)\, are independent, then \tfrac \sim \operatorname(\tfrac, \tfrac). * If ''X'' ~ U(0, 1) and ''α'' > 0 then ''X''1/''α'' ~ Beta(''α'', 1). The power function distribution. * If X \sim\operatorname(k;n;p), then \sim \operatorname(\alpha, \beta) for discrete values of ''n'' and ''k'' where \alpha=k+1 and \beta=n-k+1. * If ''X'' ~ Cauchy(0, 1) then \tfrac \sim \operatorname\left(\tfrac12, \tfrac12\right)\,
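The gamma-ratio construction in particular is simple to check by simulation (assuming NumPy/SciPy; parameters and seed are arbitrary):

```python
import numpy as np
from scipy.stats import beta, kstest

rng = np.random.default_rng(2)
a, b = 2.5, 4.0
x = rng.gamma(a, size=30_000)          # Gamma(a) with unit scale
y = rng.gamma(b, size=30_000)          # independent Gamma(b), same scale
print(kstest(x / (x + y), beta(a, b).cdf).pvalue)   # large p-value expected
```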


Combination with other distributions

* If ''X'' ~ Beta(''α'', ''β'') and ''Y'' ~ F(2''β'',2''α'') then \Pr\left(X \leq \tfrac{\alpha}{\alpha+\beta x}\right) = \Pr(Y \geq x)\, for all ''x'' > 0.
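A direct numerical check of this identity (assuming SciPy; the shape parameters and evaluation points are arbitrary):

```python
import numpy as np
from scipy.stats import beta, f

a, b = 2.0, 3.0
for x in (0.5, 1.0, 2.7):
    lhs = beta(a, b).cdf(a / (a + b * x))   # Pr(X <= a / (a + b x))
    rhs = f(2 * b, 2 * a).sf(x)             # Pr(Y >= x) for Y ~ F(2b, 2a)
    assert np.isclose(lhs, rhs)
```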


Compounding with other distributions

* If ''p'' ~ Beta(α, β) and ''X'' ~ Bin(''k'', ''p'') then ''X'' ~ beta-binomial distribution * If ''p'' ~ Beta(α, β) and ''X'' ~ NB(''r'', ''p'') then ''X'' ~ beta negative binomial distribution
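A simulation sketch of the first compounding statement (assuming NumPy and a SciPy version that provides scipy.stats.betabinom; the parameters are illustrative): drawing p from Beta(α, β) and then X from Binomial(k, p) reproduces the beta-binomial probability mass function.

```python
import numpy as np
from scipy.stats import betabinom

rng = np.random.default_rng(3)
a, b, k = 2.0, 3.0, 10
p = rng.beta(a, b, size=100_000)
x = rng.binomial(k, p)                                   # compound draw
empirical = np.bincount(x, minlength=k + 1) / x.size
exact = betabinom(k, a, b).pmf(np.arange(k + 1))
print(np.max(np.abs(empirical - exact)))                 # small (Monte Carlo error only)
```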


Generalisations

* The generalization to multiple variables, i.e. a Dirichlet distribution, multivariate Beta distribution, is called a
Dirichlet distribution
. Univariate marginals of the Dirichlet distribution have a beta distribution. The beta distribution is Conjugate prior, conjugate to the binomial and Bernoulli distributions in exactly the same way as the
Dirichlet distribution
is conjugate to the multinomial distribution and categorical distribution. * The Pearson distribution#The Pearson type I distribution, Pearson type I distribution is identical to the beta distribution (except for arbitrary shifting and re-scaling that can also be accomplished with the four parameter parametrization of the beta distribution). * The beta distribution is the special case of the noncentral beta distribution where \lambda = 0: \operatorname(\alpha, \beta) = \operatorname(\alpha,\beta,0). * The generalized beta distribution is a five-parameter distribution family which has the beta distribution as a special case. * The matrix variate beta distribution is a distribution for positive-definite matrices.


Statistical inference


Parameter estimation


Method of moments


=Two unknown parameters

= Two unknown parameters ((\hat{\alpha}, \hat{\beta}) of a beta distribution supported in the [0, 1] interval) can be estimated, using the method of moments, with the first two moments (sample mean and sample variance) as follows. Let: : \text{sample mean}(X)=\bar{x} = \frac{1}{N}\sum_{i=1}^N X_i be the sample mean estimate and : \text{sample variance}(X) =\bar{v} = \frac{1}{N-1}\sum_{i=1}^N (X_i - \bar{x})^2 be the sample variance estimate. The method of moments (statistics), method-of-moments estimates of the parameters are :\hat{\alpha} = \bar{x} \left(\frac{\bar{x}(1 - \bar{x})}{\bar{v}} - 1 \right), if \bar{v} <\bar{x}(1 - \bar{x}), : \hat{\beta} = (1-\bar{x}) \left(\frac{\bar{x}(1 - \bar{x})}{\bar{v}} - 1 \right), if \bar{v}<\bar{x}(1 - \bar{x}). When the distribution is required over a known interval other than
[0, 1]
with random variable ''X'', say [''a'', ''c''] with random variable ''Y'', then replace \bar{x} with \frac{\bar{y}-a}{c-a}, and \bar{v} with \frac{\bar{v}_Y}{(c-a)^2} in the above couple of equations for the shape parameters (see the "Alternative parametrizations, four parameters" section below), where: : \text{sample mean}(Y)=\bar{y} = \frac{1}{N}\sum_{i=1}^N Y_i : \text{sample variance}(Y) = \bar{v}_Y = \frac{1}{N-1}\sum_{i=1}^N (Y_i - \bar{y})^2
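A minimal implementation sketch of these method-of-moments estimates (assuming NumPy; the function name beta_mom is introduced here only for illustration):

```python
import numpy as np

def beta_mom(x):
    # method-of-moments estimates (alpha_hat, beta_hat) for data on [0, 1]
    x = np.asarray(x, dtype=float)
    m = x.mean()
    v = x.var(ddof=1)                     # sample variance
    if not v < m * (1 - m):
        raise ValueError("sample variance too large for a beta model")
    common = m * (1 - m) / v - 1
    return m * common, (1 - m) * common

rng = np.random.default_rng(4)
print(beta_mom(rng.beta(2.0, 5.0, size=10_000)))   # should be close to (2, 5)
```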


=Four unknown parameters

= All four parameters (\hat, \hat, \hat, \hat of a beta distribution supported in the [''a'', ''c''] interval -see section Beta distribution#Four parameters 2, "Alternative parametrizations, Four parameters"-) can be estimated, using the method of moments developed by Karl Pearson, by equating sample and population values of the first four central moments (mean, variance, skewness and excess kurtosis). The excess kurtosis was expressed in terms of the square of the skewness, and the sample size ν = α + β, (see previous section Beta distribution#Kurtosis, "Kurtosis") as follows: :\text =\frac\left(\frac (\text)^2 - 1\right)\text^2-2< \text< \tfrac (\text)^2 One can use this equation to solve for the sample size ν= α + β in terms of the square of the skewness and the excess kurtosis as follows: :\hat = \hat + \hat = 3\frac :\text^2-2< \text< \tfrac (\text)^2 This is the ratio (multiplied by a factor of 3) between the previously derived limit boundaries for the beta distribution in a space (as originally done by Karl Pearson) defined with coordinates of the square of the skewness in one axis and the excess kurtosis in the other axis (see ): The case of zero skewness, can be immediately solved because for zero skewness, α = β and hence ν = 2α = 2β, therefore α = β = ν/2 : \hat = \hat = \frac= \frac : \text= 0 \text -2<\text<0 (Excess kurtosis is negative for the beta distribution with zero skewness, ranging from -2 to 0, so that \hat -and therefore the sample shape parameters- is positive, ranging from zero when the shape parameters approach zero and the excess kurtosis approaches -2, to infinity when the shape parameters approach infinity and the excess kurtosis approaches zero). For non-zero sample skewness one needs to solve a system of two coupled equations. Since the skewness and the excess kurtosis are independent of the parameters \hat, \hat, the parameters \hat, \hat can be uniquely determined from the sample skewness and the sample excess kurtosis, by solving the coupled equations with two known variables (sample skewness and sample excess kurtosis) and two unknowns (the shape parameters): :(\text)^2 = \frac :\text =\frac\left(\frac (\text)^2 - 1\right) :\text^2-2< \text< \tfrac(\text)^2 resulting in the following solution: : \hat, \hat = \frac \left (1 \pm \frac \right ) : \text\neq 0 \text (\text)^2-2< \text< \tfrac (\text)^2 Where one should take the solutions as follows: \hat>\hat for (negative) sample skewness < 0, and \hat<\hat for (positive) sample skewness > 0. The accompanying plot shows these two solutions as surfaces in a space with horizontal axes of (sample excess kurtosis) and (sample squared skewness) and the shape parameters as the vertical axis. The surfaces are constrained by the condition that the sample excess kurtosis must be bounded by the sample squared skewness as stipulated in the above equation. The two surfaces meet at the right edge defined by zero skewness. Along this right edge, both parameters are equal and the distribution is symmetric U-shaped for α = β < 1, uniform for α = β = 1, upside-down-U-shaped for 1 < α = β < 2 and bell-shaped for α = β > 2. The surfaces also meet at the front (lower) edge defined by "the impossible boundary" line (excess kurtosis + 2 - skewness2 = 0). 
Along this front (lower) boundary both shape parameters approach zero, and the probability density is concentrated more at one end than the other end (with practically nothing in between), with probabilities p=\tfrac at the left end ''x'' = 0 and q = 1-p = \tfrac at the right end ''x'' = 1. The two surfaces become further apart towards the rear edge. At this rear edge the surface parameters are quite different from each other. As remarked, for example, by Bowman and Shenton, sampling in the neighborhood of the line (sample excess kurtosis - (3/2)(sample skewness)2 = 0) (the just-J-shaped portion of the rear edge where blue meets beige), "is dangerously near to chaos", because at that line the denominator of the expression above for the estimate ν = α + β becomes zero and hence ν approaches infinity as that line is approached. Bowman and Shenton write that "the higher moment parameters (kurtosis and skewness) are extremely fragile (near that line). However, the mean and standard deviation are fairly reliable." Therefore, the problem is for the case of four parameter estimation for very skewed distributions such that the excess kurtosis approaches (3/2) times the square of the skewness. This boundary line is produced by extremely skewed distributions with very large values of one of the parameters and very small values of the other parameter. See for a numerical example and further comments about this rear edge boundary line (sample excess kurtosis - (3/2)(sample skewness)2 = 0). As remarked by Karl Pearson himself this issue may not be of much practical importance as this trouble arises only for very skewed J-shaped (or mirror-image J-shaped) distributions with very different values of shape parameters that are unlikely to occur much in practice). The usual skewed-bell-shape distributions that occur in practice do not have this parameter estimation problem. The remaining two parameters \hat, \hat can be determined using the sample mean and the sample variance using a variety of equations. One alternative is to calculate the support interval range (\hat-\hat) based on the sample variance and the sample kurtosis. For this purpose one can solve, in terms of the range (\hat- \hat), the equation expressing the excess kurtosis in terms of the sample variance, and the sample size ν (see and ): :\text =\frac\bigg(\frac - 6 - 5 \hat \bigg) to obtain: : (\hat- \hat) = \sqrt\sqrt Another alternative is to calculate the support interval range (\hat-\hat) based on the sample variance and the sample skewness. For this purpose one can solve, in terms of the range (\hat-\hat), the equation expressing the squared skewness in terms of the sample variance, and the sample size ν (see section titled "Skewness" and "Alternative parametrizations, four parameters"): :(\text)^2 = \frac\bigg(\frac-4(1+\hat)\bigg) to obtain: : (\hat- \hat) = \frac\sqrt The remaining parameter can be determined from the sample mean and the previously obtained parameters: (\hat-\hat), \hat, \hat = \hat+\hat: : \hat = (\text) - \left(\frac\right)(\hat-\hat) and finally, \hat= (\hat- \hat) + \hat . In the above formulas one may take, for example, as estimates of the sample moments: :\begin \text &=\overline = \frac\sum_^N Y_i \\ \text &= \overline_Y = \frac\sum_^N (Y_i - \overline)^2 \\ \text &= G_1 = \frac \frac \\ \text &= G_2 = \frac \frac - \frac \end The estimators ''G''1 for skewness, sample skewness and ''G''2 for kurtosis, sample kurtosis are used by DAP (software), DAP/SAS System, SAS, PSPP/SPSS, and Microsoft Excel, Excel. 
However, they are not used by BMDP and (according to ) they were not used by MINITAB in 1998. Actually, Joanes and Gill in their 1998 study concluded that the skewness and kurtosis estimators used in BMDP and in MINITAB (at that time) had smaller variance and mean-squared error in normal samples, but the skewness and kurtosis estimators used in DAP (software), DAP/SAS System, SAS, PSPP/SPSS, namely ''G''1 and ''G''2, had smaller mean-squared error in samples from a very skewed distribution. It is for this reason that we have spelled out "sample skewness", etc., in the above formulas, to make it explicit that the user should choose the best estimator according to the problem at hand, as the best estimator for skewness and kurtosis depends on the amount of skewness (as shown by Joanes and Gill).
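Pearson's moment solution for the four-parameter case can be assembled as follows. This is a sketch only (assuming NumPy/SciPy; beta4_mom is an illustrative name): ν̂, α̂ and β̂ come from the sample skewness and excess kurtosis as above, and the support [â, ĉ] is then recovered from the sample variance and mean via Var[Y] = (ĉ − â)² α̂β̂ / (ν̂²(ν̂ + 1)) and E[Y] = â + (α̂/ν̂)(ĉ − â).

```python
import numpy as np
from scipy.stats import skew, kurtosis

def beta4_mom(y):
    y = np.asarray(y, dtype=float)
    m, v = y.mean(), y.var()
    g1 = skew(y)                       # sample skewness
    g2 = kurtosis(y)                   # sample excess kurtosis (Fisher definition)
    s2 = g1 ** 2
    if not (s2 - 2 < g2 < 1.5 * s2):
        raise ValueError("sample moments outside the beta region")
    nu = 3 * (g2 - s2 + 2) / (1.5 * s2 - g2)
    if g1 == 0:
        a_hat = b_hat = nu / 2
    else:
        delta = 1 / np.sqrt(1 + 16 * (nu + 1) / ((nu + 2) ** 2 * s2))
        a_hat, b_hat = nu / 2 * (1 - delta), nu / 2 * (1 + delta)
        if g1 < 0:                     # negative skewness: alpha_hat > beta_hat
            a_hat, b_hat = b_hat, a_hat
    span = np.sqrt(v * nu ** 2 * (nu + 1) / (a_hat * b_hat))   # estimate of c - a
    a_support = m - (a_hat / nu) * span
    return a_hat, b_hat, a_support, a_support + span

rng = np.random.default_rng(5)
y = 2.0 + 3.0 * rng.beta(2.0, 5.0, size=200_000)   # true (alpha, beta, a, c) = (2, 5, 2, 5)
print(beta4_mom(y))
```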


Maximum likelihood


=Two unknown parameters

= As is also the case for maximum likelihood estimates for the gamma distribution, the maximum likelihood estimates for the beta distribution do not have a general closed form solution for arbitrary values of the shape parameters. If ''X''1, ..., ''XN'' are independent random variables each having a beta distribution, the joint log likelihood function for ''N'' independent and identically distributed random variables, iid observations is: :\begin \ln\, \mathcal (\alpha, \beta\mid X) &= \sum_^N \ln \left (\mathcal_i (\alpha, \beta\mid X_i) \right )\\ &= \sum_^N \ln \left (f(X_i;\alpha,\beta) \right ) \\ &= \sum_^N \ln \left (\frac \right ) \\ &= (\alpha - 1)\sum_^N \ln (X_i) + (\beta- 1)\sum_^N \ln (1-X_i) - N \ln \Beta(\alpha,\beta) \end Finding the maximum with respect to a shape parameter involves taking the partial derivative with respect to the shape parameter and setting the expression equal to zero yielding the maximum likelihood estimator of the shape parameters: :\frac = \sum_^N \ln X_i -N\frac=0 :\frac = \sum_^N \ln (1-X_i)- N\frac=0 where: :\frac = -\frac+ \frac+ \frac=-\psi(\alpha + \beta) + \psi(\alpha) + 0 :\frac= - \frac+ \frac + \frac=-\psi(\alpha + \beta) + 0 + \psi(\beta) since the
digamma function
denoted ψ(α) is defined as the logarithmic derivative of the
gamma function
: :\psi(\alpha) =\frac To ensure that the values with zero tangent slope are indeed a maximum (instead of a saddle-point or a minimum) one has to also satisfy the condition that the curvature is negative. This amounts to satisfying that the second partial derivative with respect to the shape parameters is negative :\frac= -N\frac<0 :\frac = -N\frac<0 using the previous equations, this is equivalent to: :\frac = \psi_1(\alpha)-\psi_1(\alpha + \beta) > 0 :\frac = \psi_1(\beta) -\psi_1(\alpha + \beta) > 0 where the
trigamma function
, denoted ''ψ''1(''α''), is the second of the
polygamma function
s, and is defined as the derivative of the digamma function: :\psi_1(\alpha) = \frac=\, \frac. These conditions are equivalent to stating that the variances of the logarithmically transformed variables are positive, since: :\operatorname[\ln (X)] = \operatorname[\ln^2 (X)] - (\operatorname[\ln (X)])^2 = \psi_1(\alpha) - \psi_1(\alpha + \beta) :\operatorname ln (1-X)= \operatorname[\ln^2 (1-X)] - (\operatorname[\ln (1-X)])^2 = \psi_1(\beta) - \psi_1(\alpha + \beta) Therefore, the condition of negative curvature at a maximum is equivalent to the statements: : \operatorname[\ln (X)] > 0 : \operatorname ln (1-X)> 0 Alternatively, the condition of negative curvature at a maximum is also equivalent to stating that the following logarithmic derivatives of the geometric means ''GX'' and ''G(1−X)'' are positive, since: : \psi_1(\alpha) - \psi_1(\alpha + \beta) = \frac > 0 : \psi_1(\beta) - \psi_1(\alpha + \beta) = \frac > 0 While these slopes are indeed positive, the other slopes are negative: :\frac, \frac < 0. The slopes of the mean and the median with respect to ''α'' and ''β'' display similar sign behavior. From the condition that at a maximum, the partial derivative with respect to the shape parameter equals zero, we obtain the following system of coupled maximum likelihood estimate equations (for the average log-likelihoods) that needs to be inverted to obtain the (unknown) shape parameter estimates \hat,\hat in terms of the (known) average of logarithms of the samples ''X''1, ..., ''XN'': :\begin \hat[\ln (X)] &= \psi(\hat) - \psi(\hat + \hat)=\frac\sum_^N \ln X_i = \ln \hat_X \\ \hat[\ln(1-X)] &= \psi(\hat) - \psi(\hat + \hat)=\frac\sum_^N \ln (1-X_i)= \ln \hat_ \end where we recognize \log \hat_X as the logarithm of the sample geometric mean and \log \hat_ as the logarithm of the sample geometric mean based on (1 − ''X''), the mirror-image of ''X''. For \hat=\hat, it follows that \hat_X=\hat_ . :\begin \hat_X &= \prod_^N (X_i)^ \\ \hat_ &= \prod_^N (1-X_i)^ \end These coupled equations containing
digamma function
s of the shape parameter estimates \hat,\hat must be solved by numerical methods as done, for example, by Beckman et al. Gnanadesikan et al. give numerical solutions for a few cases. Norman Lloyd Johnson, N.L.Johnson and Samuel Kotz, S.Kotz suggest that for "not too small" shape parameter estimates \hat,\hat, the logarithmic approximation to the digamma function \psi(\hat) \approx \ln(\hat-\tfrac) may be used to obtain initial values for an iterative solution, since the equations resulting from this approximation can be solved exactly: :\ln \frac \approx \ln \hat_X :\ln \frac\approx \ln \hat_ which leads to the following solution for the initial values (of the estimate shape parameters in terms of the sample geometric means) for an iterative solution: :\hat\approx \tfrac + \frac \text \hat >1 :\hat\approx \tfrac + \frac \text \hat > 1 Alternatively, the estimates provided by the method of moments can instead be used as initial values for an iterative solution of the maximum likelihood coupled equations in terms of the digamma functions. When the distribution is required over a known interval other than
[0, 1]
with random variable ''X'', say [''a'', ''c''] with random variable ''Y'', then replace ln(''Xi'') in the first equation with :\ln \frac, and replace ln(1−''Xi'') in the second equation with :\ln \frac (see "Alternative parametrizations, four parameters" section below). If one of the shape parameters is known, the problem is considerably simplified. The following logit transformation can be used to solve for the unknown shape parameter (for skewed cases such that \hat\neq\hat, otherwise, if symmetric, both -equal- parameters are known when one is known): :\hat \left[\ln \left(\frac \right) \right]=\psi(\hat) - \psi(\hat)=\frac\sum_^N \ln\frac = \ln \hat_X - \ln \left(\hat_\right) This logit transformation is the logarithm of the transformation that divides the variable ''X'' by its mirror-image (''X''/(1 - ''X'') resulting in the "inverted beta distribution" or
beta prime distribution
(also known as beta distribution of the second kind or Pearson distribution, Pearson's Type VI) with support [0, +∞). As previously discussed in the section "Moments of logarithmically transformed random variables," the logit transformation \ln\frac, studied by Johnson, extends the finite support
[0, 1]
based on the original variable ''X'' to infinite support in both directions of the real line (−∞, +∞). If, for example, \hat is known, the unknown parameter \hat can be obtained in terms of the inverse digamma function of the right hand side of this equation: :\psi(\hat)=\frac\sum_^N \ln\frac + \psi(\hat) :\hat=\psi^(\ln \hat_X - \ln \hat_ + \psi(\hat)) In particular, if one of the shape parameters has a value of unity, for example for \hat = 1 (the power function distribution with bounded support [0,1]), using the identity ψ(''x'' + 1) = ψ(''x'') + 1/''x'' in the equation \psi(\hat) - \psi(\hat + \hat)= \ln \hat_X, the maximum likelihood estimator for the unknown parameter \hat is, exactly: :\hat= - \frac= - \frac The beta has support [0, 1], therefore \hat_X < 1, and hence (-\ln \hat_X) >0, and therefore \hat >0. In conclusion, the maximum likelihood estimates of the shape parameters of a beta distribution are (in general) a complicated function of the sample geometric mean, and of the sample geometric mean based on ''(1−X)'', the mirror-image of ''X''. One may ask, if the variance (in addition to the mean) is necessary to estimate two shape parameters with the method of moments, why is the (logarithmic or geometric) variance not necessary to estimate two shape parameters with the maximum likelihood method, for which only the geometric means suffice? The answer is because the mean does not provide as much information as the geometric mean. For a beta distribution with equal shape parameters ''α'' = ''β'', the mean is exactly 1/2, regardless of the value of the shape parameters, and therefore regardless of the value of the statistical dispersion (the variance). On the other hand, the geometric mean of a beta distribution with equal shape parameters ''α'' = ''β'', depends on the value of the shape parameters, and therefore it contains more information. Also, the geometric mean of a beta distribution does not satisfy the symmetry conditions satisfied by the mean, therefore, by employing both the geometric mean based on ''X'' and geometric mean based on (1 − ''X''), the maximum likelihood method is able to provide best estimates for both parameters ''α'' = ''β'', without need of employing the variance. One can express the joint log likelihood per ''N'' independent and identically distributed random variables, iid observations in terms of the ''sufficient statistics'' (the sample geometric means) as follows: :\frac = (\alpha - 1)\ln \hat_X + (\beta- 1)\ln \hat_- \ln \Beta(\alpha,\beta). We can plot the joint log likelihood per ''N'' observations for fixed values of the sample geometric means to see the behavior of the likelihood function as a function of the shape parameters α and β. In such a plot, the shape parameter estimators \hat,\hat correspond to the maxima of the likelihood function. See the accompanying graph that shows that all the likelihood functions intersect at α = β = 1, which corresponds to the values of the shape parameters that give the maximum entropy (the maximum entropy occurs for shape parameters equal to unity: the uniform distribution). It is evident from the plot that the likelihood function gives sharp peaks for values of the shape parameter estimators close to zero, but that for values of the shape parameters estimators greater than one, the likelihood function becomes quite flat, with less defined peaks. 
Obviously, the maximum likelihood parameter estimation method for the beta distribution becomes less acceptable for larger values of the shape parameter estimators, as the uncertainty in the peak definition increases with the value of the shape parameter estimators. One can arrive at the same conclusion by noticing that the expression for the curvature of the likelihood function is in terms of the geometric variances :\frac= -\operatorname ln X/math> :\frac = -\operatorname[\ln (1-X)] These variances (and therefore the curvatures) are much larger for small values of the shape parameter α and β. However, for shape parameter values α, β > 1, the variances (and therefore the curvatures) flatten out. Equivalently, this result follows from the Cramér–Rao bound, since the
Fisher information
matrix components for the beta distribution are these logarithmic variances. The Cramér–Rao bound states that the
variance
of any ''unbiased'' estimator \hat of α is bounded by the multiplicative inverse, reciprocal of the
Fisher information
: :\mathrm(\hat)\geq\frac\geq\frac :\mathrm(\hat) \geq\frac\geq\frac so the variance of the estimators increases with increasing α and β, as the logarithmic variances decrease. Also one can express the joint log likelihood per ''N'' independent and identically distributed random variables, iid observations in terms of the
digamma function
expressions for the logarithms of the sample geometric means as follows: :\frac = (\alpha - 1)(\psi(\hat) - \psi(\hat + \hat))+(\beta- 1)(\psi(\hat) - \psi(\hat + \hat))- \ln \Beta(\alpha,\beta) this expression is identical to the negative of the cross-entropy (see section on "Quantities of information (entropy)"). Therefore, finding the maximum of the joint log likelihood of the shape parameters, per ''N'' independent and identically distributed random variables, iid observations, is identical to finding the minimum of the cross-entropy for the beta distribution, as a function of the shape parameters. :\frac = - H = -h - D_ = -\ln\Beta(\alpha,\beta)+(\alpha-1)\psi(\hat)+(\beta-1)\psi(\hat)-(\alpha+\beta-2)\psi(\hat+\hat) with the cross-entropy defined as follows: :H = \int_^1 - f(X;\hat,\hat) \ln (f(X;\alpha,\beta)) \, X
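A sketch of the numerical solution of these coupled digamma equations (assuming NumPy/SciPy; beta_mle and the seed are illustrative names and values), using the method-of-moments estimates as starting values as suggested above:

```python
import numpy as np
from scipy.special import digamma
from scipy.optimize import fsolve

def beta_mle(x):
    x = np.asarray(x, dtype=float)
    ln_gx = np.log(x).mean()           # ln of the sample geometric mean of X
    ln_g1mx = np.log1p(-x).mean()      # ln of the sample geometric mean of 1 - X

    def equations(params):
        a, b = params
        return [digamma(a) - digamma(a + b) - ln_gx,
                digamma(b) - digamma(a + b) - ln_g1mx]

    # method-of-moments estimates as starting values for the iteration
    m, v = x.mean(), x.var(ddof=1)
    common = m * (1 - m) / v - 1
    return fsolve(equations, x0=[m * common, (1 - m) * common])

rng = np.random.default_rng(6)
print(beta_mle(rng.beta(2.0, 5.0, size=10_000)))   # close to (2, 5)
```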


=Four unknown parameters

= The procedure is similar to the one followed in the two unknown parameter case. If ''Y''1, ..., ''YN'' are independent random variables each having a beta distribution with four parameters, the joint log likelihood function for ''N'' independent and identically distributed random variables, iid observations is: :\begin \ln\, \mathcal (\alpha, \beta, a, c\mid Y) &= \sum_^N \ln\,\mathcal_i (\alpha, \beta, a, c\mid Y_i)\\ &= \sum_^N \ln\,f(Y_i; \alpha, \beta, a, c) \\ &= \sum_^N \ln\,\frac\\ &= (\alpha - 1)\sum_^N \ln (Y_i - a) + (\beta- 1)\sum_^N \ln (c - Y_i)- N \ln \Beta(\alpha,\beta) - N (\alpha+\beta - 1) \ln (c - a) \end Finding the maximum with respect to a shape parameter involves taking the partial derivative with respect to the shape parameter and setting the expression equal to zero yielding the maximum likelihood estimator of the shape parameters: :\frac= \sum_^N \ln (Y_i - a) - N(-\psi(\alpha + \beta) + \psi(\alpha))- N \ln (c - a)= 0 :\frac = \sum_^N \ln (c - Y_i) - N(-\psi(\alpha + \beta) + \psi(\beta))- N \ln (c - a)= 0 :\frac = -(\alpha - 1) \sum_^N \frac \,+ N (\alpha+\beta - 1)\frac= 0 :\frac = (\beta- 1) \sum_^N \frac \,- N (\alpha+\beta - 1) \frac = 0 these equations can be re-arranged as the following system of four coupled equations (the first two equations are geometric means and the second two equations are the harmonic means) in terms of the maximum likelihood estimates for the four parameters \hat, \hat, \hat, \hat: :\frac\sum_^N \ln \frac = \psi(\hat)-\psi(\hat +\hat )= \ln \hat_X :\frac\sum_^N \ln \frac = \psi(\hat)-\psi(\hat + \hat)= \ln \hat_ :\frac = \frac= \hat_X :\frac = \frac = \hat_ with sample geometric means: :\hat_X = \prod_^ \left (\frac \right )^ :\hat_ = \prod_^ \left (\frac \right )^ The parameters \hat, \hat are embedded inside the geometric mean expressions in a nonlinear way (to the power 1/''N''). This precludes, in general, a closed form solution, even for an initial value approximation for iteration purposes. One alternative is to use as initial values for iteration the values obtained from the method of moments solution for the four parameter case. Furthermore, the expressions for the harmonic means are well-defined only for \hat, \hat > 1, which precludes a maximum likelihood solution for shape parameters less than unity in the four-parameter case. Fisher's information matrix for the four parameter case is Positive-definite matrix, positive-definite only for α, β > 2 (for further discussion, see section on Fisher information matrix, four parameter case), for bell-shaped (symmetric or unsymmetric) beta distributions, with inflection points located to either side of the mode. The following Fisher information components (that represent the expectations of the curvature of the log likelihood function) have mathematical singularity, singularities at the following values: :\alpha = 2: \quad \operatorname \left [- \frac \frac \right ]= _ :\beta = 2: \quad \operatorname\left [- \frac \frac \right ] = _ :\alpha = 2: \quad \operatorname\left [- \frac\frac\right ] = _ :\beta = 1: \quad \operatorname\left [- \frac\frac \right ] = _ (for further discussion see section on Fisher information matrix). Thus, it is not possible to strictly carry on the maximum likelihood estimation for some well known distributions belonging to the four-parameter beta distribution family, like the continuous uniform distribution, uniform distribution (Beta(1, 1, ''a'', ''c'')), and the arcsine distribution (Beta(1/2, 1/2, ''a'', ''c'')). 
N. L. Johnson and S. Kotz ignore the equations for the harmonic means and instead suggest "If a and c are unknown, and maximum likelihood estimators of ''a'', ''c'', α and β are required, the above procedure (for the two unknown parameter case, with ''X'' transformed as ''X'' = (''Y'' − ''a'')/(''c'' − ''a'')) can be repeated using a succession of trial values of ''a'' and ''c'', until the pair (''a'', ''c'') for which maximum likelihood (given ''a'' and ''c'') is as great as possible, is attained" (where, for the purpose of clarity, their notation for the parameters has been translated into the present notation).
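Read as an algorithm, this suggestion amounts to profiling the two-parameter maximum likelihood fit over trial values of the endpoints. The sketch below (assuming SciPy; the grid of trial values and the simulated data are illustrative, and the SciPy routine scipy.stats.beta.fit is used for the inner two-parameter step) is one possible reading of that procedure, not their implementation.

```python
# Sketch of the Johnson-Kotz style procedure: profile the two-parameter MLE
# over trial values of the range endpoints (a, c) and keep the best pair.
import numpy as np
from scipy.stats import beta as beta_dist

rng = np.random.default_rng(1)
a_true, c_true = 2.0, 12.0
y = a_true + (c_true - a_true) * rng.beta(3.0, 5.0, size=5_000)   # synthetic data

best = None
for a in np.linspace(y.min() - 1.0, y.min() - 1e-3, 20):          # trial minima
    for c in np.linspace(y.max() + 1e-3, y.max() + 1.0, 20):      # trial maxima
        x = (y - a) / (c - a)                                     # map data into (0, 1)
        alpha, bta, _, _ = beta_dist.fit(x, floc=0, fscale=1)     # inner two-parameter MLE
        loglik = np.sum(beta_dist.logpdf(y, alpha, bta, loc=a, scale=c - a))
        if best is None or loglik > best[0]:
            best = (loglik, alpha, bta, a, c)

print(best)   # (max log likelihood, alpha_hat, beta_hat, a_hat, c_hat)
```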


Fisher information matrix

Let a random variable ''X'' have a probability density ''f''(''x''; ''α''). The partial derivative with respect to the (unknown, and to be estimated) parameter α of the log likelihood function is called the score. The second moment of the score is called the Fisher information:

:\mathcal{I}(\alpha)=\operatorname{E} \left[\left(\frac{\partial}{\partial \alpha} \ln \mathcal{L}(\alpha\mid X) \right)^2 \right].

The expectation of the score is zero, therefore the Fisher information is also the second moment centered on the mean of the score: the variance of the score. If the log likelihood function is twice differentiable with respect to the parameter α, and under certain regularity conditions, the Fisher information may also be written as follows (which is often a more convenient form for calculation purposes):

:\mathcal{I}(\alpha) = - \operatorname{E} \left[\frac{\partial^2}{\partial \alpha^2} \ln (\mathcal{L}(\alpha\mid X)) \right].

Thus, the Fisher information is the negative of the expectation of the second derivative with respect to the parameter α of the log likelihood function. Therefore, Fisher information is a measure of the curvature of the log likelihood function of α. A flat log likelihood curve (low curvature, and therefore high radius of curvature) has low Fisher information, while a sharply curved log likelihood (high curvature, and therefore low radius of curvature) has high Fisher information. When the Fisher information matrix is evaluated at the parameter estimates ("the observed Fisher information matrix"), it is equivalent to replacing the true log likelihood surface by a Taylor series approximation taken as far as the quadratic terms. The word information, in the context of Fisher information, refers to information about the parameters, in matters such as estimation, sufficiency and the properties of variances of estimators. The Cramér–Rao bound states that the inverse of the Fisher information is a lower bound on the variance of any unbiased estimator of a parameter α:

:\operatorname{var}[\hat\alpha] \geq \frac{1}{\mathcal{I}(\alpha)}.

The precision to which one can estimate a parameter α is therefore limited by the Fisher information of the log likelihood function. The Fisher information is a measure of the minimum error involved in estimating a parameter of a distribution, and it can be viewed as a measure of the resolving power of an experiment needed to discriminate between two alternative hypotheses about a parameter. When there are ''N'' parameters

: \begin{bmatrix} \theta_1 \\ \theta_2 \\ \vdots \\ \theta_N \end{bmatrix},

the Fisher information takes the form of an ''N''×''N'' positive semidefinite symmetric matrix, the Fisher information matrix, with typical element:

:{(\mathcal{I}(\theta))}_{i,j}=\operatorname{E} \left[\left(\frac{\partial}{\partial \theta_i} \ln \mathcal{L} \right) \left(\frac{\partial}{\partial \theta_j} \ln \mathcal{L} \right) \right].

Under certain regularity conditions, the Fisher information matrix may also be written in the following form, which is often more convenient for computation:

:{(\mathcal{I}(\theta))}_{i,j} = - \operatorname{E} \left[\frac{\partial^2}{\partial \theta_i \, \partial \theta_j} \ln (\mathcal{L}) \right].

With ''X''1, ..., ''X''''N'' iid random variables, an ''N''-dimensional "box" can be constructed with sides ''X''1, ..., ''X''''N''. Costa and Cover show that the (Shannon) differential entropy ''h''(''X'') is related to the volume of the typical set (having the sample entropy close to the true entropy), while the Fisher information is related to the surface of this typical set.
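As a numerical check of the two equivalent definitions (not part of the cited treatments), the following Python sketch, assuming NumPy and SciPy, estimates the Fisher information of the beta shape parameter ''α'' in two ways: as the Monte Carlo variance of the score, and via the closed trigamma expression derived in the next subsection. The parameter values are illustrative.

```python
# Sketch: Fisher information of the beta shape parameter alpha, computed two ways:
# (1) the variance of the score d/dalpha ln f(X; alpha, beta), by Monte Carlo;
# (2) the closed form psi_1(alpha) - psi_1(alpha + beta) used in the next subsection.
import numpy as np
from scipy.special import digamma, polygamma

alpha, bta = 2.0, 3.0
x = np.random.default_rng(2).beta(alpha, bta, size=1_000_000)

score = np.log(x) - (digamma(alpha) - digamma(alpha + bta))   # d/dalpha of ln f(x; alpha, beta)
print(score.var())                                            # Monte Carlo estimate of I(alpha)
print(polygamma(1, alpha) - polygamma(1, alpha + bta))        # trigamma expression
```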


Two parameters

= For ''X''1, ..., ''X''''N'' independent random variables each having a beta distribution parametrized with shape parameters ''α'' and ''β'', the joint log likelihood function for ''N'' independent and identically distributed random variables, iid observations is: :\ln (\mathcal (\alpha, \beta\mid X) )= (\alpha - 1)\sum_^N \ln X_i + (\beta- 1)\sum_^N \ln (1-X_i)- N \ln \Beta(\alpha,\beta) therefore the joint log likelihood function per ''N'' independent and identically distributed random variables, iid observations is: :\frac \ln(\mathcal (\alpha, \beta\mid X)) = (\alpha - 1)\frac\sum_^N \ln X_i + (\beta- 1)\frac\sum_^N \ln (1-X_i)-\, \ln \Beta(\alpha,\beta) For the two parameter case, the Fisher information has 4 components: 2 diagonal and 2 off-diagonal. Since the Fisher information matrix is symmetric, one of these off diagonal components is independent. Therefore, the Fisher information matrix has 3 independent components (2 diagonal and 1 off diagonal). Aryal and Nadarajah calculated Fisher's information matrix for the four-parameter case, from which the two parameter case can be obtained as follows: :- \frac= \operatorname[\ln (X)]= \psi_1(\alpha) - \psi_1(\alpha + \beta) =_= \operatorname\left [- \frac \right ] = \ln \operatorname_ :- \frac = \operatorname ln (1-X)= \psi_1(\beta) - \psi_1(\alpha + \beta) =_= \operatorname\left [- \frac \right]= \ln \operatorname_ :- \frac = \operatorname[\ln X,\ln(1-X)] = -\psi_1(\alpha+\beta) =_= \operatorname\left [- \frac \right] = \ln \operatorname_ Since the Fisher information matrix is symmetric : \mathcal_= \mathcal_= \ln \operatorname_ The Fisher information components are equal to the log geometric variances and log geometric covariance. Therefore, they can be expressed as
trigamma functions, denoted ψ1(α), the second of the polygamma function
s, defined as the derivative of the digamma function: :\psi_1(\alpha) = \frac=\, \frac. These derivatives are also derived in the and plots of the log likelihood function are also shown in that section. contains plots and further discussion of the Fisher information matrix components: the log geometric variances and log geometric covariance as a function of the shape parameters α and β. contains formulas for moments of logarithmically transformed random variables. Images for the Fisher information components \mathcal_, \mathcal_ and \mathcal_ are shown in . The determinant of Fisher's information matrix is of interest (for example for the calculation of Jeffreys prior probability). From the expressions for the individual components of the Fisher information matrix, it follows that the determinant of Fisher's (symmetric) information matrix for the beta distribution is: :\begin \det(\mathcal(\alpha, \beta))&= \mathcal_ \mathcal_-\mathcal_ \mathcal_ \\ pt&=(\psi_1(\alpha) - \psi_1(\alpha + \beta))(\psi_1(\beta) - \psi_1(\alpha + \beta))-( -\psi_1(\alpha+\beta))( -\psi_1(\alpha+\beta))\\ pt&= \psi_1(\alpha)\psi_1(\beta)-( \psi_1(\alpha)+\psi_1(\beta))\psi_1(\alpha + \beta)\\ pt\lim_ \det(\mathcal(\alpha, \beta)) &=\lim_ \det(\mathcal(\alpha, \beta)) = \infty\\ pt\lim_ \det(\mathcal(\alpha, \beta)) &=\lim_ \det(\mathcal(\alpha, \beta)) = 0 \end From Sylvester's criterion (checking whether the diagonal elements are all positive), it follows that the Fisher information matrix for the two parameter case is Positive-definite matrix, positive-definite (under the standard condition that the shape parameters are positive ''α'' > 0 and ''β'' > 0).
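The trigamma expressions above translate directly into a few lines of code. The following sketch (assuming SciPy; the parameter values are illustrative) assembles the 2×2 Fisher information matrix, evaluates its determinant, and confirms positive-definiteness numerically.

```python
# Sketch: the 2x2 Fisher information matrix of Beta(alpha, beta) built from the
# trigamma expressions in this section, plus its determinant and a
# positive-definiteness check.
import numpy as np
from scipy.special import polygamma

def fisher_info(alpha, beta_):
    t_a, t_b, t_ab = polygamma(1, [alpha, beta_, alpha + beta_])
    return np.array([[t_a - t_ab, -t_ab],
                     [-t_ab, t_b - t_ab]])

I = fisher_info(2.0, 5.0)
print(I)
print(np.linalg.det(I))                    # psi1(a)psi1(b) - (psi1(a)+psi1(b))psi1(a+b)
print(np.all(np.linalg.eigvalsh(I) > 0))   # positive-definite for alpha, beta > 0
```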


Four parameters

= If ''Y''1, ..., ''YN'' are independent random variables each having a beta distribution with four parameters: the exponents ''α'' and ''β'', and also ''a'' (the minimum of the distribution range), and ''c'' (the maximum of the distribution range) (section titled "Alternative parametrizations", "Four parameters"), with
probability density function
: :f(y; \alpha, \beta, a, c) = \frac =\frac=\frac. the joint log likelihood function per ''N'' independent and identically distributed random variables, iid observations is: :\frac \ln(\mathcal (\alpha, \beta, a, c\mid Y))= \frac\sum_^N \ln (Y_i - a) + \frac\sum_^N \ln (c - Y_i)- \ln \Beta(\alpha,\beta) - (\alpha+\beta -1) \ln (c-a) For the four parameter case, the Fisher information has 4*4=16 components. It has 12 off-diagonal components = (4×4 total − 4 diagonal). Since the Fisher information matrix is symmetric, half of these components (12/2=6) are independent. Therefore, the Fisher information matrix has 6 independent off-diagonal + 4 diagonal = 10 independent components. Aryal and Nadarajah calculated Fisher's information matrix for the four parameter case as follows: :- \frac \frac= \operatorname[\ln (X)]= \psi_1(\alpha) - \psi_1(\alpha + \beta) = \mathcal_= \operatorname\left [- \frac \frac \right ] = \ln (\operatorname) :-\frac \frac = \operatorname ln (1-X)= \psi_1(\beta) - \psi_1(\alpha + \beta) =_= \operatorname \left [- \frac \frac \right ] = \ln(\operatorname) :-\frac \frac = \operatorname[\ln X,(1-X)] = -\psi_1(\alpha+\beta) =\mathcal_= \operatorname \left [- \frac\frac \right ] = \ln(\operatorname_) In the above expressions, the use of ''X'' instead of ''Y'' in the expressions var[ln(''X'')] = ln(var''GX'') is ''not an error''. The expressions in terms of the log geometric variances and log geometric covariance occur as functions of the two parameter ''X'' ~ Beta(''α'', ''β'') parametrization because when taking the partial derivatives with respect to the exponents (''α'', ''β'') in the four parameter case, one obtains the identical expressions as for the two parameter case: these terms of the four parameter Fisher information matrix are independent of the minimum ''a'' and maximum ''c'' of the distribution's range. The only non-zero term upon double differentiation of the log likelihood function with respect to the exponents ''α'' and ''β'' is the second derivative of the log of the beta function: ln(B(''α'', ''β'')). This term is independent of the minimum ''a'' and maximum ''c'' of the distribution's range. Double differentiation of this term results in trigamma functions. The sections titled "Maximum likelihood", "Two unknown parameters" and "Four unknown parameters" also show this fact. The Fisher information for ''N'' i.i.d. samples is ''N'' times the individual Fisher information (eq. 11.279, page 394 of Cover and Thomas). (Aryal and Nadarajah take a single observation, ''N'' = 1, to calculate the following components of the Fisher information, which leads to the same result as considering the derivatives of the log likelihood per ''N'' observations. Moreover, below the erroneous expression for _ in Aryal and Nadarajah has been corrected.) 
:\begin \alpha > 2: \quad \operatorname\left [- \frac \frac \right ] &= _=\frac \\ \beta > 2: \quad \operatorname\left[-\frac \frac \right ] &= \mathcal_ = \frac \\ \operatorname\left[- \frac \frac \right ] &= _ = \frac \\ \alpha > 1: \quad \operatorname\left[- \frac \frac \right ] &=\mathcal_ = \frac \\ \operatorname\left[- \frac \frac \right ] &= _ = \frac \\ \operatorname\left[- \frac \frac \right ] &= _ = -\frac \\ \beta > 1: \quad \operatorname\left[- \frac \frac \right ] &= \mathcal_ = -\frac \end The lower two diagonal entries of the Fisher information matrix, with respect to the parameter "a" (the minimum of the distribution's range): \mathcal_, and with respect to the parameter "c" (the maximum of the distribution's range): \mathcal_ are only defined for exponents α > 2 and β > 2 respectively. The Fisher information matrix component \mathcal_ for the minimum "a" approaches infinity for exponent α approaching 2 from above, and the Fisher information matrix component \mathcal_ for the maximum "c" approaches infinity for exponent β approaching 2 from above. The Fisher information matrix for the four parameter case does not depend on the individual values of the minimum "a" and the maximum "c", but only on the total range (''c''−''a''). Moreover, the components of the Fisher information matrix that depend on the range (''c''−''a''), depend only through its inverse (or the square of the inverse), such that the Fisher information decreases for increasing range (''c''−''a''). The accompanying images show the Fisher information components \mathcal_ and \mathcal_. Images for the Fisher information components \mathcal_ and \mathcal_ are shown in . All these Fisher information components look like a basin, with the "walls" of the basin being located at low values of the parameters. The following four-parameter-beta-distribution Fisher information components can be expressed in terms of the two-parameter: ''X'' ~ Beta(α, β) expectations of the transformed ratio ((1-''X'')/''X'') and of its mirror image (''X''/(1-''X'')), scaled by the range (''c''−''a''), which may be helpful for interpretation: :\mathcal_ =\frac= \frac \text\alpha > 1 :\mathcal_ = -\frac=- \frac\text\beta> 1 These are also the expected values of the "inverted beta distribution" or
beta prime distribution
(also known as beta distribution of the second kind or Pearson distribution, Pearson's Type VI) and its mirror image, scaled by the range (''c'' − ''a''). Also, the following Fisher information components can be expressed in terms of the harmonic (1/X) variances or of variances based on the ratio transformed variables ((1-X)/X) as follows: :\begin \alpha > 2: \quad \mathcal_ &=\operatorname \left [\frac \right] \left (\frac \right )^2 =\operatorname \left [\frac \right ] \left (\frac \right)^2 = \frac \\ \beta > 2: \quad \mathcal_ &= \operatorname \left [\frac \right ] \left (\frac \right )^2 = \operatorname \left [\frac \right ] \left (\frac \right )^2 =\frac \\ \mathcal_ &=\operatorname \left [\frac,\frac \right ]\frac = \operatorname \left [\frac,\frac \right ] \frac =\frac \end See section "Moments of linearly transformed, product and inverted random variables" for these expectations. The determinant of Fisher's information matrix is of interest (for example for the calculation of Jeffreys prior probability). From the expressions for the individual components, it follows that the determinant of Fisher's (symmetric) information matrix for the beta distribution with four parameters is: :\begin \det(\mathcal(\alpha,\beta,a,c)) = & -\mathcal_^2 \mathcal_ \mathcal_+\mathcal_ \mathcal_ \mathcal_ \mathcal_+\mathcal_^2 \mathcal_^2 -\mathcal_ \mathcal_ \mathcal_^2\\ & -\mathcal_ \mathcal_ \mathcal_ \mathcal_+\mathcal_^2 \mathcal_ \mathcal_+2 \mathcal_ \mathcal_ \mathcal_ \mathcal_\\ & -2\mathcal_ \mathcal_ \mathcal_ \mathcal_+\mathcal_^2 \mathcal_^2-\mathcal_ \mathcal_ \mathcal_^2+\mathcal_ \mathcal_^2 \mathcal_\\ & -\mathcal_ \mathcal_ \mathcal_ \mathcal_-\mathcal_ \mathcal_ \mathcal_ \mathcal_+\mathcal_ \mathcal_ \mathcal_ \mathcal_\\ & -\mathcal_ \mathcal_ \mathcal_ \mathcal_+\mathcal_ \mathcal_ \mathcal_ \mathcal_-\mathcal_ \mathcal_^2 \mathcal_\\ & +2 \mathcal_ \mathcal_ \mathcal_ \mathcal_-\mathcal_ \mathcal_^2 \mathcal_-\mathcal_^2 \mathcal_ \mathcal_+\mathcal_ \mathcal_ \mathcal_ \mathcal_\text\alpha, \beta> 2 \end Using Sylvester's criterion (checking whether the diagonal elements are all positive), and since diagonal components _ and _ have Mathematical singularity, singularities at α=2 and β=2 it follows that the Fisher information matrix for the four parameter case is Positive-definite matrix, positive-definite for α>2 and β>2. Since for α > 2 and β > 2 the beta distribution is (symmetric or unsymmetric) bell shaped, it follows that the Fisher information matrix is positive-definite only for bell-shaped (symmetric or unsymmetric) beta distributions, with inflection points located to either side of the mode. Thus, important well known distributions belonging to the four-parameter beta distribution family, like the parabolic distribution (Beta(2,2,a,c)) and the continuous uniform distribution, uniform distribution (Beta(1,1,a,c)) have Fisher information components (\mathcal_,\mathcal_,\mathcal_,\mathcal_) that blow up (approach infinity) in the four-parameter case (although their Fisher information components are all defined for the two parameter case). The four-parameter Wigner semicircle distribution (Beta(3/2,3/2,''a'',''c'')) and arcsine distribution (Beta(1/2,1/2,''a'',''c'')) have negative Fisher information determinants for the four-parameter case.


Bayesian inference

The use of Beta distributions in Bayesian inference is due to the fact that they provide a family of conjugate prior probability distributions for binomial (including
Bernoulli
) and geometric distributions. The domain of the beta distribution can be viewed as a probability, and in fact the beta distribution is often used to describe the distribution of a probability value ''p'': :P(p;\alpha,\beta) = \frac{p^{\alpha-1}(1-p)^{\beta-1}}{\Beta(\alpha,\beta)}. Examples of beta distributions used as prior probabilities to represent ignorance of prior parameter values in Bayesian inference are Beta(1,1), Beta(0,0) and Beta(1/2,1/2).
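The conjugacy property is the computational heart of this section: updating a Beta(''α'', ''β'') prior with ''s'' successes and ''f'' failures simply yields a Beta(''α'' + ''s'', ''β'' + ''f'') posterior. A minimal sketch, assuming SciPy and with illustrative counts:

```python
# Sketch: beta-binomial conjugacy. A Beta(alpha, beta) prior on p, updated with
# s successes and f failures, gives a Beta(alpha + s, beta + f) posterior.
from scipy.stats import beta as beta_dist

alpha_prior, beta_prior = 1.0, 1.0      # Bayes-Laplace uniform prior (illustrative)
s, f = 7, 3                             # observed successes and failures

posterior = beta_dist(alpha_prior + s, beta_prior + f)
print(posterior.mean())                 # (alpha + s) / (alpha + beta + s + f) = 8/12
print(posterior.interval(0.95))         # central 95% credible interval for p
```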


Rule of succession

A classic application of the beta distribution is the rule of succession, introduced in the 18th century by Pierre-Simon Laplace in the course of treating the sunrise problem. It states that, given ''s'' successes in ''n'' conditional independence, conditionally independent Bernoulli trials with probability ''p,'' that the estimate of the expected value in the next trial is \frac. This estimate is the expected value of the posterior distribution over ''p,'' namely Beta(''s''+1, ''n''−''s''+1), which is given by Bayes' rule if one assumes a uniform prior probability over ''p'' (i.e., Beta(1, 1)) and then observes that ''p'' generated ''s'' successes in ''n'' trials. Laplace's rule of succession has been criticized by prominent scientists. R. T. Cox described Laplace's application of the rule of succession to the sunrise problem ( p. 89) as "a travesty of the proper use of the principle." Keynes remarks ( Ch.XXX, p. 382) "indeed this is so foolish a theorem that to entertain it is discreditable." Karl Pearson showed that the probability that the next (''n'' + 1) trials will be successes, after n successes in n trials, is only 50%, which has been considered too low by scientists like Jeffreys and unacceptable as a representation of the scientific process of experimentation to test a proposed scientific law. As pointed out by Jeffreys ( p. 128) (crediting C. D. Broad ) Laplace's rule of succession establishes a high probability of success ((n+1)/(n+2)) in the next trial, but only a moderate probability (50%) that a further sample (n+1) comparable in size will be equally successful. As pointed out by Perks, "The rule of succession itself is hard to accept. It assigns a probability to the next trial which implies the assumption that the actual run observed is an average run and that we are always at the end of an average run. It would, one would think, be more reasonable to assume that we were in the middle of an average run. Clearly a higher value for both probabilities is necessary if they are to accord with reasonable belief." These problems with Laplace's rule of succession motivated Haldane, Perks, Jeffreys and others to search for other forms of prior probability (see the next ). According to Jaynes, the main problem with the rule of succession is that it is not valid when s=0 or s=n (see rule of succession, for an analysis of its validity).
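A small worked example (with illustrative counts) of the rule just described:

```python
# Worked example of Laplace's rule of succession: with a uniform Beta(1,1) prior
# and s successes in n trials, the posterior is Beta(s+1, n-s+1), and the
# probability of success on the next trial is its mean, (s+1)/(n+2).
s, n = 9, 10                              # illustrative counts
print((s + 1) / (n + 2))                  # 10/12, approximately 0.833
```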


Bayes-Laplace prior probability (Beta(1,1))

The beta distribution achieves maximum differential entropy for Beta(1,1): the Uniform density, uniform probability density, for which all values in the domain of the distribution have equal density. This uniform distribution Beta(1,1) was suggested ("with a great deal of doubt") by Thomas Bayes as the prior probability distribution to express ignorance about the correct prior distribution. This prior distribution was adopted (apparently, from his writings, with little sign of doubt) by Pierre-Simon Laplace, and hence it was also known as the "Bayes-Laplace rule" or the "Laplace rule" of "inverse probability" in publications of the first half of the 20th century. In the later part of the 19th century and early part of the 20th century, scientists realized that the assumption of uniform "equal" probability density depended on the actual functions (for example whether a linear or a logarithmic scale was most appropriate) and parametrizations used. In particular, the behavior near the ends of distributions with finite support (for example near ''x'' = 0, for a distribution with initial support at ''x'' = 0) required particular attention. Keynes ( Ch.XXX, p. 381) criticized the use of Bayes's uniform prior probability (Beta(1,1)) that all values between zero and one are equiprobable, as follows: "Thus experience, if it shows anything, shows that there is a very marked clustering of statistical ratios in the neighborhoods of zero and unity, of those for positive theories and for correlations between positive qualities in the neighborhood of zero, and of those for negative theories and for correlations between negative qualities in the neighborhood of unity. "


Haldane's prior probability (Beta(0,0))

The Beta(0,0) distribution was proposed by J.B.S. Haldane, who suggested that the prior probability representing complete uncertainty should be proportional to ''p''−1(1−''p'')−1. The function ''p''−1(1−''p'')−1 can be viewed as the limit of the numerator of the beta distribution as both shape parameters approach zero: α, β → 0. The Beta function (in the denominator of the beta distribution) approaches infinity, for both parameters approaching zero, α, β → 0. Therefore, ''p''−1(1−''p'')−1 divided by the Beta function approaches a 2-point
Bernoulli distribution
with equal probability 1/2 at each end, at 0 and 1, and nothing in between, as α, β → 0. A coin-toss: one face of the coin being at 0 and the other face being at 1. The Haldane prior probability distribution Beta(0,0) is an "improper prior" because its integration (from 0 to 1) fails to strictly converge to 1 due to the singularities at each end. However, this is not an issue for computing posterior probabilities unless the sample size is very small. Furthermore, Zellner points out that on the log-odds scale, (the logit transformation ln(''p''/1−''p'')), the Haldane prior is the uniformly flat prior. The fact that a uniform prior probability on the logit transformed variable ln(''p''/1−''p'') (with domain (-∞, ∞)) is equivalent to the Haldane prior on the domain
[0, 1]
was pointed out by Harold Jeffreys in the first edition (1939) of his book Theory of Probability ( p. 123). Jeffreys writes "Certainly if we take the Bayes-Laplace rule right up to the extremes we are led to results that do not correspond to anybody's way of thinking. The (Haldane) rule d''x''/(''x''(1−''x'')) goes too far the other way. It would lead to the conclusion that if a sample is of one type with respect to some property there is a probability 1 that the whole population is of that type." The fact that "uniform" depends on the parametrization, led Jeffreys to seek a form of prior that would be invariant under different parametrizations.
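Zellner's observation can be checked with a one-line change of variables (a supplementary derivation, not taken from the cited sources). If the log-odds ''u'' = ln(''p''/(1 − ''p'')) is assigned a flat improper prior, then

:p = \frac{e^u}{1+e^u}, \qquad \left|\frac{du}{dp}\right| = \frac{1}{p(1-p)},

so the induced prior density on ''p'' is proportional to ''p''−1(1−''p'')−1, which is exactly the Haldane prior.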


Jeffreys' prior probability (Beta(1/2,1/2) for a Bernoulli or for a binomial distribution)

Harold Jeffreys proposed to use an
uninformative prior
probability measure that should be Parametrization invariance, invariant under reparameterization: proportional to the square root of the determinant of Fisher's information matrix. For the
Bernoulli distribution
, this can be shown as follows: for a coin that is "heads" with probability ''p'' ∈
[0, 1]
and is "tails" with probability 1 − ''p'', for a given (H,T) ∈ the probability is ''pH''(1 − ''p'')''T''. Since ''T'' = 1 − ''H'', the
Bernoulli distribution
is ''pH''(1 − ''p'')1 − ''H''. Considering ''p'' as the only parameter, it follows that the log likelihood for the Bernoulli distribution is :\ln \mathcal (p\mid H) = H \ln(p)+ (1-H) \ln(1-p). The Fisher information matrix has only one component (it is a scalar, because there is only one parameter: ''p''), therefore: :\begin \sqrt &= \sqrt \\ pt&= \sqrt \\ pt&= \sqrt \\ &= \frac. \end Similarly, for the Binomial distribution with ''n'' Bernoulli trials, it can be shown that :\sqrt= \frac. Thus, for the
Bernoulli
, and Binomial distributions, Jeffreys prior is proportional to \scriptstyle \frac, which happens to be proportional to a beta distribution with domain variable ''x'' = ''p'', and shape parameters α = β = 1/2, the arcsine distribution: :Beta(\tfrac, \tfrac) = \frac. It will be shown in the next section that the normalizing constant for Jeffreys prior is immaterial to the final result because the normalizing constant cancels out in Bayes theorem for the posterior probability. Hence Beta(1/2,1/2) is used as the Jeffreys prior for both Bernoulli and binomial distributions. As shown in the next section, when using this expression as a prior probability times the likelihood in Bayes theorem, the posterior probability turns out to be a beta distribution. It is important to realize, however, that Jeffreys prior is proportional to \scriptstyle \frac for the Bernoulli and binomial distribution, but not for the beta distribution. Jeffreys prior for the beta distribution is given by the determinant of Fisher's information for the beta distribution, which, as shown in the is a function of the
trigamma function
ψ1 of shape parameters α and β as follows: : \begin \sqrt &= \sqrt \\ \lim_ \sqrt &=\lim_ \sqrt = \infty\\ \lim_ \sqrt &=\lim_ \sqrt = 0 \end As previously discussed, Jeffreys prior for the Bernoulli and binomial distributions is proportional to the arcsine distribution Beta(1/2,1/2), a one-dimensional ''curve'' that looks like a basin as a function of the parameter ''p'' of the Bernoulli and binomial distributions. The walls of the basin are formed by ''p'' approaching the singularities at the ends ''p'' → 0 and ''p'' → 1, where Beta(1/2,1/2) approaches infinity. Jeffreys prior for the beta distribution is a ''2-dimensional surface'' (embedded in a three-dimensional space) that looks like a basin with only two of its walls meeting at the corner α = β = 0 (and missing the other two walls) as a function of the shape parameters α and β of the beta distribution. The two adjoining walls of this 2-dimensional surface are formed by the shape parameters α and β approaching the singularities (of the trigamma function) at α, β → 0. It has no walls for α, β → ∞ because in this case the determinant of Fisher's information matrix for the beta distribution approaches zero. It will be shown in the next section that Jeffreys prior probability results in posterior probabilities (when multiplied by the binomial likelihood function) that are intermediate between the posterior probability results of the Haldane and Bayes prior probabilities. Jeffreys prior may be difficult to obtain analytically, and for some cases it just doesn't exist (even for simple distribution functions like the asymmetric triangular distribution). Berger, Bernardo and Sun, in a 2009 paper defined a reference prior probability distribution that (unlike Jeffreys prior) exists for the asymmetric triangular distribution. They cannot obtain a closed-form expression for their reference prior, but numerical calculations show it to be nearly perfectly fitted by the (proper) prior : \operatorname(\tfrac, \tfrac) \sim\frac where θ is the vertex variable for the asymmetric triangular distribution with support
[0, 1]
(corresponding to the following parameter values in Wikipedia's article on the triangular distribution: vertex ''c'' = ''θ'', left end ''a'' = 0,and right end ''b'' = 1). Berger et al. also give a heuristic argument that Beta(1/2,1/2) could indeed be the exact Berger–Bernardo–Sun reference prior for the asymmetric triangular distribution. Therefore, Beta(1/2,1/2) not only is Jeffreys prior for the Bernoulli and binomial distributions, but also seems to be the Berger–Bernardo–Sun reference prior for the asymmetric triangular distribution (for which the Jeffreys prior does not exist), a distribution used in project management and PERT analysis to describe the cost and duration of project tasks. Clarke and Barron prove that, among continuous positive priors, Jeffreys prior (when it exists) asymptotically maximizes Shannon's mutual information between a sample of size n and the parameter, and therefore ''Jeffreys prior is the most uninformative prior'' (measuring information as Shannon information). The proof rests on an examination of the Kullback–Leibler divergence between probability density functions for iid random variables.
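As a numerical illustration of the Bernoulli case (not from the cited papers), the following sketch, assuming NumPy and SciPy, normalizes the un-normalized Jeffreys prior 1/√(''p''(1 − ''p'')) and confirms that the result is the Beta(1/2, 1/2) density; the evaluation point is arbitrary.

```python
# Sketch: Jeffreys prior for the Bernoulli parameter p is proportional to
# sqrt(I(p)) = 1/sqrt(p(1-p)); normalizing over (0, 1) gives Beta(1/2, 1/2).
import numpy as np
from scipy.integrate import quad
from scipy.stats import beta as beta_dist

unnormalized = lambda p: 1.0 / np.sqrt(p * (1.0 - p))
norm, _ = quad(unnormalized, 0.0, 1.0)       # integrable endpoint singularities; equals pi
print(norm, np.pi)                           # B(1/2, 1/2) = pi

p = 0.3
print(unnormalized(p) / norm, beta_dist.pdf(p, 0.5, 0.5))   # identical densities
```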


Effect of different prior probability choices on the posterior beta distribution

If samples are drawn from the population of a random variable ''X'' that result in ''s'' successes and ''f'' failures in "n" Bernoulli trials ''n'' = ''s'' + ''f'', then the
likelihood function
for parameters ''s'' and ''f'' given ''x'' = ''p'' (the notation ''x'' = ''p'' in the expressions below will emphasize that the domain ''x'' stands for the value of the parameter ''p'' in the binomial distribution), is the following binomial distribution: :\mathcal(s,f\mid x=p) = x^s(1-x)^f = x^s(1-x)^. If beliefs about prior probability information are reasonably well approximated by a beta distribution with parameters ''α'' Prior and ''β'' Prior, then: :(x=p;\alpha \operatorname,\beta \operatorname) = \frac According to Bayes' theorem for a continuous event space, the posterior probability is given by the product of the prior probability and the likelihood function (given the evidence ''s'' and ''f'' = ''n'' − ''s''), normalized so that the area under the curve equals one, as follows: :\begin & \operatorname(x=p\mid s,n-s) \\ pt= & \frac \\ pt= & \frac \\ pt= & \frac \\ pt= & \frac. \end The binomial coefficient :

\frac{n!}{s!(n-s)!}=\frac{\Gamma(n+1)}{\Gamma(s+1)\,\Gamma(n-s+1)}
appears both in the numerator and the denominator of the posterior probability, and it does not depend on the integration variable ''x'', hence it cancels out, and it is irrelevant to the final result. Similarly the normalizing factor for the prior probability, the beta function B(αPrior,βPrior) cancels out and it is immaterial to the final result. The same posterior probability result can be obtained if one uses an un-normalized prior :x^(1-x)^ because the normalizing factors all cancel out. Several authors (including Jeffreys himself) thus use an un-normalized prior formula since the normalization constant cancels out. The numerator of the posterior probability ends up being just the (un-normalized) product of the prior probability and the likelihood function, and the denominator is its integral from zero to one. The beta function in the denominator, B(''s'' + ''α'' Prior, ''n'' − ''s'' + ''β'' Prior), appears as a normalization constant to ensure that the total posterior probability integrates to unity. The ratio ''s''/''n'' of the number of successes to the total number of trials is a sufficient statistic in the binomial case, which is relevant for the following results. For the Bayes' prior probability (Beta(1,1)), the posterior probability is: :\operatorname(p=x\mid s,f) = \frac, \text=\frac,\text=\frac\text 0 < s < n). For the Jeffreys' prior probability (Beta(1/2,1/2)), the posterior probability is: :\operatorname(p=x\mid s,f) = ,\text = \frac,\text\frac\text \tfrac < s < n-\tfrac). and for the Haldane prior probability (Beta(0,0)), the posterior probability is: :\operatorname(p=x\mid s,f) = \frac, \text = \frac,\text\frac\text 1 < s < n -1). From the above expressions it follows that for ''s''/''n'' = 1/2) all the above three prior probabilities result in the identical location for the posterior probability mean = mode = 1/2. For ''s''/''n'' < 1/2, the mean of the posterior probabilities, using the following priors, are such that: mean for Bayes prior > mean for Jeffreys prior > mean for Haldane prior. For ''s''/''n'' > 1/2 the order of these inequalities is reversed such that the Haldane prior probability results in the largest posterior mean. The ''Haldane'' prior probability Beta(0,0) results in a posterior probability density with ''mean'' (the expected value for the probability of success in the "next" trial) identical to the ratio ''s''/''n'' of the number of successes to the total number of trials. Therefore, the Haldane prior results in a posterior probability with expected value in the next trial equal to the maximum likelihood. The ''Bayes'' prior probability Beta(1,1) results in a posterior probability density with ''mode'' identical to the ratio ''s''/''n'' (the maximum likelihood). In the case that 100% of the trials have been successful ''s'' = ''n'', the ''Bayes'' prior probability Beta(1,1) results in a posterior expected value equal to the rule of succession (''n'' + 1)/(''n'' + 2), while the Haldane prior Beta(0,0) results in a posterior expected value of 1 (absolute certainty of success in the next trial). Jeffreys prior probability results in a posterior expected value equal to (''n'' + 1/2)/(''n'' + 1). Perks (p. 303) points out: "This provides a new rule of succession and expresses a 'reasonable' position to take up, namely, that after an unbroken run of n successes we assume a probability for the next trial equivalent to the assumption that we are about half-way through an average run, i.e. that we expect a failure once in (2''n'' + 2) trials. 
The Bayes–Laplace rule implies that we are about at the end of an average run or that we expect a failure once in (''n'' + 2) trials. The comparison clearly favours the new result (what is now called Jeffreys prior) from the point of view of 'reasonableness'." Conversely, in the case that 100% of the trials have resulted in failure (''s'' = 0), the ''Bayes'' prior probability Beta(1,1) results in a posterior expected value for success in the next trial equal to 1/(''n'' + 2), while the Haldane prior Beta(0,0) results in a posterior expected value of success in the next trial of 0 (absolute certainty of failure in the next trial). Jeffreys prior probability results in a posterior expected value for success in the next trial equal to (1/2)/(''n'' + 1), which Perks (p. 303) points out: "is a much more reasonably remote result than the Bayes-Laplace result 1/(''n'' + 2)". Jaynes questions (for the uniform prior Beta(1,1)) the use of these formulas for the cases ''s'' = 0 or ''s'' = ''n'' because the integrals do not converge (Beta(1,1) is an improper prior for ''s'' = 0 or ''s'' = ''n''). In practice, the conditions 0 (p. 303) shows that, for what is now known as the Jeffreys prior, this probability is ((''n'' + 1/2)/(''n'' + 1))((''n'' + 3/2)/(''n'' + 2))...(2''n'' + 1/2)/(2''n'' + 1), which for ''n'' = 1, 2, 3 gives 15/24, 315/480, 9009/13440; rapidly approaching a limiting value of 1/\sqrt = 0.70710678\ldots as n tends to infinity. Perks remarks that what is now known as the Jeffreys prior: "is clearly more 'reasonable' than either the Bayes-Laplace result or the result on the (Haldane) alternative rule rejected by Jeffreys which gives certainty as the probability. It clearly provides a very much better correspondence with the process of induction. Whether it is 'absolutely' reasonable for the purpose, i.e. whether it is yet large enough, without the absurdity of reaching unity, is a matter for others to decide. But it must be realized that the result depends on the assumption of complete indifference and absence of knowledge prior to the sampling experiment." Following are the variances of the posterior distribution obtained with these three prior probability distributions: for the Bayes' prior probability (Beta(1,1)), the posterior variance is: :\text = \frac,\text s=\frac \text =\frac for the Jeffreys' prior probability (Beta(1/2,1/2)), the posterior variance is: : \text = \frac ,\text s=\frac n 2 \text = \frac 1 and for the Haldane prior probability (Beta(0,0)), the posterior variance is: :\text = \frac, \texts=\frac\text =\frac So, as remarked by Silvey, for large ''n'', the variance is small and hence the posterior distribution is highly concentrated, whereas the assumed prior distribution was very diffuse. This is in accord with what one would hope for, as vague prior knowledge is transformed (through Bayes theorem) into a more precise posterior knowledge by an informative experiment. For small ''n'' the Haldane Beta(0,0) prior results in the largest posterior variance while the Bayes Beta(1,1) prior results in the more concentrated posterior. Jeffreys prior Beta(1/2,1/2) results in a posterior variance in between the other two. As ''n'' increases, the variance rapidly decreases so that the posterior variance for all three priors converges to approximately the same value (approaching zero variance as ''n'' → ∞). 
Recalling the previous result that the ''Haldane'' prior probability Beta(0,0) results in a posterior probability density with ''mean'' (the expected value for the probability of success in the "next" trial) identical to the ratio s/n of the number of successes to the total number of trials, it follows from the above expression that also the ''Haldane'' prior Beta(0,0) results in a posterior with ''variance'' identical to the variance expressed in terms of the max. likelihood estimate s/n and sample size (in ): :\text = \frac= \frac with the mean ''μ'' = ''s''/''n'' and the sample size ''ν'' = ''n''. In Bayesian inference, using a prior distribution Beta(''α''Prior,''β''Prior) prior to a binomial distribution is equivalent to adding (''α''Prior − 1) pseudo-observations of "success" and (''β''Prior − 1) pseudo-observations of "failure" to the actual number of successes and failures observed, then estimating the parameter ''p'' of the binomial distribution by the proportion of successes over both real- and pseudo-observations. A uniform prior Beta(1,1) does not add (or subtract) any pseudo-observations since for Beta(1,1) it follows that (''α''Prior − 1) = 0 and (''β''Prior − 1) = 0. The Haldane prior Beta(0,0) subtracts one pseudo observation from each and Jeffreys prior Beta(1/2,1/2) subtracts 1/2 pseudo-observation of success and an equal number of failure. This subtraction has the effect of smoothing out the posterior distribution. If the proportion of successes is not 50% (''s''/''n'' ≠ 1/2) values of ''α''Prior and ''β''Prior less than 1 (and therefore negative (''α''Prior − 1) and (''β''Prior − 1)) favor sparsity, i.e. distributions where the parameter ''p'' is closer to either 0 or 1. In effect, values of ''α''Prior and ''β''Prior between 0 and 1, when operating together, function as a concentration parameter. The accompanying plots show the posterior probability density functions for sample sizes ''n'' ∈ , successes ''s'' ∈  and Beta(''α''Prior,''β''Prior) ∈ . Also shown are the cases for ''n'' = , success ''s'' =  and Beta(''α''Prior,''β''Prior) ∈ . The first plot shows the symmetric cases, for successes ''s'' ∈ , with mean = mode = 1/2 and the second plot shows the skewed cases ''s'' ∈ . The images show that there is little difference between the priors for the posterior with sample size of 50 (characterized by a more pronounced peak near ''p'' = 1/2). Significant differences appear for very small sample sizes (in particular for the flatter distribution for the degenerate case of sample size = 3). Therefore, the skewed cases, with successes ''s'' = , show a larger effect from the choice of prior, at small sample size, than the symmetric cases. For symmetric distributions, the Bayes prior Beta(1,1) results in the most "peaky" and highest posterior distributions and the Haldane prior Beta(0,0) results in the flattest and lowest peak distribution. The Jeffreys prior Beta(1/2,1/2) lies in between them. For nearly symmetric, not too skewed distributions the effect of the priors is similar. For very small sample size (in this case for a sample size of 3) and skewed distribution (in this example for ''s'' ∈ ) the Haldane prior can result in a reverse-J-shaped distribution with a singularity at the left end. 
However, this happens only in degenerate cases (in this example ''n'' = 3 and hence ''s'' = 3/4 < 1, a degenerate value because s should be greater than unity in order for the posterior of the Haldane prior to have a mode located between the ends, and because ''s'' = 3/4 is not an integer number, hence it violates the initial assumption of a binomial distribution for the likelihood) and it is not an issue in generic cases of reasonable sample size (such that the condition 1 < ''s'' < ''n'' − 1, necessary for a mode to exist between both ends, is fulfilled). In Chapter 12 (p. 385) of his book, Jaynes asserts that the ''Haldane prior'' Beta(0,0) describes a ''prior state of knowledge of complete ignorance'', where we are not even sure whether it is physically possible for an experiment to yield either a success or a failure, while the ''Bayes (uniform) prior Beta(1,1) applies if'' one knows that ''both binary outcomes are possible''. Jaynes states: "''interpret the Bayes-Laplace (Beta(1,1)) prior as describing not a state of complete ignorance'', but the state of knowledge in which we have observed one success and one failure...once we have seen at least one success and one failure, then we know that the experiment is a true binary one, in the sense of physical possibility." Jaynes does not specifically discuss Jeffreys prior Beta(1/2,1/2) (Jaynes discussion of "Jeffreys prior" on pp. 181, 423 and on chapter 12 of Jaynes book refers instead to the improper, un-normalized, prior "1/''p'' ''dp''" introduced by Jeffreys in the 1939 edition of his book, seven years before he introduced what is now known as Jeffreys' invariant prior: the square root of the determinant of Fisher's information matrix. ''"1/p" is Jeffreys' (1946) invariant prior for the exponential distribution, not for the Bernoulli or binomial distributions''). However, it follows from the above discussion that Jeffreys Beta(1/2,1/2) prior represents a state of knowledge in between the Haldane Beta(0,0) and Bayes Beta (1,1) prior. Similarly, Karl Pearson in his 1892 book The Grammar of Science (p. 144 of 1900 edition) maintained that the Bayes (Beta(1,1) uniform prior was not a complete ignorance prior, and that it should be used when prior information justified to "distribute our ignorance equally"". K. Pearson wrote: "Yet the only supposition that we appear to have made is this: that, knowing nothing of nature, routine and anomy (from the Greek ανομία, namely: a- "without", and nomos "law") are to be considered as equally likely to occur. Now we were not really justified in making even this assumption, for it involves a knowledge that we do not possess regarding nature. We use our ''experience'' of the constitution and action of coins in general to assert that heads and tails are equally probable, but we have no right to assert before experience that, as we know nothing of nature, routine and breach are equally probable. In our ignorance we ought to consider before experience that nature may consist of all routines, all anomies (normlessness), or a mixture of the two in any proportion whatever, and that all such are equally probable. Which of these constitutions after experience is the most probable must clearly depend on what that experience has been like." 
If there is sufficient sampling data, ''and the posterior probability mode is not located at one of the extremes of the domain'' (''x'' = 0 or ''x'' = 1), the three priors of Bayes (Beta(1,1)), Jeffreys (Beta(1/2,1/2)) and Haldane (Beta(0,0)) should yield similar ''posterior'' probability densities. Otherwise, as Gelman et al. (p. 65) point out, "if so few data are available that the choice of noninformative prior distribution makes a difference, one should put relevant information into the prior distribution", or as Berger (p. 125) points out "when different reasonable priors yield substantially different answers, can it be right to state that there ''is'' a single answer? Would it not be better to admit that there is scientific uncertainty, with the conclusion depending on prior beliefs?"
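The practical effect of the three priors is easy to tabulate. The sketch below (assuming SciPy; the counts are illustrative of a small sample) reproduces the posterior means discussed above: ''s''/''n'' for Haldane, (''s'' + 1)/(''n'' + 2) for Bayes, and (''s'' + 1/2)/(''n'' + 1) for Jeffreys.

```python
# Sketch: effect of the Bayes (1,1), Jeffreys (1/2,1/2) and Haldane (0,0) priors
# on the posterior mean and variance for s successes in n binomial trials.
from scipy.stats import beta as beta_dist

s, n = 2, 10                                   # illustrative small-sample counts
priors = {"Bayes (1,1)": (1.0, 1.0),
          "Jeffreys (1/2,1/2)": (0.5, 0.5),
          "Haldane (0,0)": (0.0, 0.0)}         # improper; posterior proper for 0 < s < n

for name, (a0, b0) in priors.items():
    post = beta_dist(a0 + s, b0 + n - s)
    print(f"{name:>20}: mean={post.mean():.4f}  var={post.var():.5f}")
```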


Occurrence and applications


Order statistics

The beta distribution has an important application in the theory of order statistics. A basic result is that the distribution of the ''k''th smallest of a sample of size ''n'' from a continuous uniform distribution has a beta distribution.David, H. A., Nagaraja, H. N. (2003) ''Order Statistics'' (3rd Edition). Wiley, New Jersey pp 458. This result is summarized as: :U_{(k)} \sim \operatorname{Beta}(k, n+1-k). From this, and application of the theory related to the probability integral transform, the distribution of any individual order statistic from any continuous distribution can be derived.
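A quick empirical check of this result (assuming NumPy and SciPy; the sample size, ''n'' and ''k'' are illustrative):

```python
# Sketch: the k-th smallest of n iid Uniform(0,1) draws follows Beta(k, n+1-k).
# Empirical check of the first two moments against the closed form.
import numpy as np
from scipy.stats import beta as beta_dist

rng = np.random.default_rng(3)
n, k = 10, 3
samples = np.sort(rng.uniform(size=(100_000, n)), axis=1)[:, k - 1]  # k-th order statistic

print(samples.mean(), beta_dist.mean(k, n + 1 - k))   # both close to k/(n+1) = 3/11
print(samples.var(),  beta_dist.var(k, n + 1 - k))
```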


Subjective logic

In standard logic, propositions are considered to be either true or false. In contradistinction, subjective logic assumes that humans cannot determine with absolute certainty whether a proposition about the real world is absolutely true or false. In subjective logic the a posteriori probability estimates of binary events can be represented by beta distributions.A. Jøsang. A Logic for Uncertain Probabilities. ''International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems.'' 9(3), pp.279-311, June 2001


Wavelet analysis

A wavelet is a wave-like oscillation with an amplitude that starts out at zero, increases, and then decreases back to zero. It can typically be visualized as a "brief oscillation" that promptly decays. Wavelets can be used to extract information from many different kinds of data, including – but certainly not limited to – audio signals and images. Thus, wavelets are purposefully crafted to have specific properties that make them useful for signal processing. Wavelets are localized in both time and frequency whereas the standard Fourier transform is only localized in frequency. Therefore, standard Fourier Transforms are only applicable to stationary processes, while wavelets are applicable to non-stationary processes. Continuous wavelets can be constructed based on the beta distribution. Beta waveletsH.M. de Oliveira and G.A.A. Araújo,. Compactly Supported One-cyclic Wavelets Derived from Beta Distributions. ''Journal of Communication and Information Systems.'' vol.20, n.3, pp.27-33, 2005. can be viewed as a soft variety of Haar wavelets whose shape is fine-tuned by two shape parameters α and β.


Population genetics

The Balding–Nichols model is a two-parameter parametrization of the beta distribution used in population genetics. It is a statistical description of the allele frequencies in the components of a sub-divided population: : \begin{align} \alpha &= \mu \nu,\\ \beta &= (1 - \mu) \nu, \end{align} where \nu =\alpha+\beta= \frac{1-F}{F} and 0 < F < 1; here ''F'' is (Wright's) genetic distance between two populations.
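A minimal helper (illustrative, assuming the parametrization as written above) for converting the Balding–Nichols parameters (''μ'', ''F'') into the usual shape parameters:

```python
# Sketch: converting the Balding-Nichols (mu, F) parametrization to the usual
# beta shape parameters, using nu = alpha + beta = (1 - F)/F as in the text.
def balding_nichols_to_beta(mu, F):
    nu = (1.0 - F) / F
    return mu * nu, (1.0 - mu) * nu      # (alpha, beta)

alpha, beta_ = balding_nichols_to_beta(mu=0.3, F=0.1)   # illustrative values
print(alpha, beta_)                      # Beta(2.7, 6.3); its mean is mu = 0.3
```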


Project management: task cost and schedule modeling

The beta distribution can be used to model events which are constrained to take place within an interval defined by a minimum and maximum value. For this reason, the beta distribution — along with the triangular distribution — is used extensively in PERT, critical path method (CPM), Joint Cost Schedule Modeling (JCSM) and other project management/control systems to describe the time to completion and the cost of a task. In project management, shorthand computations are widely used to estimate the mean and standard deviation of the beta distribution: : \begin{align} \mu(X) & = \frac{a + 4b + c}{6} \\ \sigma(X) & = \frac{c - a}{6} \end{align} where ''a'' is the minimum, ''c'' is the maximum, and ''b'' is the most likely value (the
mode for ''α'' > 1 and ''β'' > 1). The above estimate for the mean \mu(X)= \frac{a + 4b + c}{6} is known as the PERT three-point estimation and it is exact for either of the following values of ''β'' (for arbitrary α within these ranges):

:''β'' = ''α'' > 1 (symmetric case) with standard deviation \sigma(X) = \frac{c-a}{2\sqrt{1+2\alpha}}, skewness = 0, and excess kurtosis = \frac{-6}{3 + 2\alpha}

or

:''β'' = 6 − ''α'' for 5 > ''α'' > 1 (skewed case) with standard deviation \sigma(X) = \frac{c-a}{6}\sqrt{\frac{\alpha(6-\alpha)}{7}}, skewness = \frac{(6 - 2\alpha)}{4}\sqrt{\frac{7}{\alpha(6-\alpha)}}, and excess kurtosis = \frac{21}{\alpha(6-\alpha)} - 3

The above estimate for the standard deviation ''σ''(''X'') = (''c'' − ''a'')/6 is exact for either of the following values of ''α'' and ''β'':

:''α'' = ''β'' = 4 (symmetric) with skewness = 0, and excess kurtosis = −6/11.
:''β'' = 6 − ''α'' and \alpha = 3 - \sqrt2 (right-tailed, positive skew) with skewness = \frac{1}{\sqrt{2}}, and excess kurtosis = 0
:''β'' = 6 − ''α'' and \alpha = 3 + \sqrt2 (left-tailed, negative skew) with skewness = \frac{-1}{\sqrt{2}}, and excess kurtosis = 0

Otherwise, these can be poor approximations for beta distributions with other values of α and β, exhibiting average errors of 40% in the mean and 549% in the variance.
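The following sketch (assuming SciPy; the endpoints are illustrative) evaluates the two shorthand formulas against the exact moments for one of the cases listed above, ''α'' = ''β'' = 4, for which both shorthands are exact:

```python
# Sketch: comparing the PERT shorthand estimates (a + 4b + c)/6 and (c - a)/6
# with the exact mean and standard deviation of a four-parameter beta distribution.
from scipy.stats import beta as beta_dist

a, c = 2.0, 14.0                  # minimum and maximum (illustrative)
alpha, bta = 4.0, 4.0             # one of the cases for which both shorthands are exact
b = a + (alpha - 1) / (alpha + bta - 2) * (c - a)     # mode, the "most likely" value

dist = beta_dist(alpha, bta, loc=a, scale=c - a)
print((a + 4 * b + c) / 6, dist.mean())              # both equal 8.0 here
print((c - a) / 6, dist.std())                       # both equal 2.0 here
```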


Random variate generation

If ''X'' and ''Y'' are independent, with X \sim \Gamma(\alpha, \theta) and Y \sim \Gamma(\beta, \theta) then :\frac \sim \Beta(\alpha, \beta). So one algorithm for generating beta variates is to generate \frac, where ''X'' is a Gamma distribution#Generating gamma-distributed random variables, gamma variate with parameters (α, 1) and ''Y'' is an independent gamma variate with parameters (β, 1). In fact, here \frac and X+Y are independent, and X+Y \sim \Gamma(\alpha + \beta, \theta). If Z \sim \Gamma(\gamma, \theta) and Z is independent of X and Y, then \frac \sim \Beta(\alpha+\beta,\gamma) and \frac is independent of \frac. This shows that the product of independent \Beta(\alpha,\beta) and \Beta(\alpha+\beta,\gamma) random variables is a \Beta(\alpha,\beta+\gamma) random variable. Also, the ''k''th order statistic of ''n'' Uniform distribution (continuous), uniformly distributed variates is \Beta(k, n+1-k), so an alternative if α and β are small integers is to generate α + β − 1 uniform variates and choose the α-th smallest. Another way to generate the Beta distribution is by Pólya urn model. According to this method, one start with an "urn" with α "black" balls and β "white" balls and draw uniformly with replacement. Every trial an additional ball is added according to the color of the last ball which was drawn. Asymptotically, the proportion of black and white balls will be distributed according to the Beta distribution, where each repetition of the experiment will produce a different value. It is also possible to use the inverse transform sampling.
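A minimal sketch of the gamma-ratio method described above (assuming NumPy and SciPy; the parameters and the Kolmogorov–Smirnov check are illustrative):

```python
# Sketch: generating Beta(alpha, beta) variates as X/(X+Y) from two independent
# gamma variates, X ~ Gamma(alpha, 1) and Y ~ Gamma(beta, 1).
import numpy as np
from scipy.stats import kstest

rng = np.random.default_rng(4)
alpha, bta, size = 2.0, 5.0, 100_000
x = rng.gamma(shape=alpha, scale=1.0, size=size)
y = rng.gamma(shape=bta, scale=1.0, size=size)
z = x / (x + y)                                   # Beta(alpha, beta) variates

print(kstest(z, "beta", args=(alpha, bta)))       # sanity check: should not reject
```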


History

Thomas Bayes, in a posthumous paper published in 1763 by Richard Price, obtained a beta distribution as the density of the probability of success in Bernoulli trials (see ), but the paper does not analyze any of the moments of the beta distribution or discuss any of its properties. The first systematic modern discussion of the beta distribution is probably due to Karl Pearson. In Pearson's papers the beta distribution is couched as a solution of a differential equation: Pearson distribution, Pearson's Type I distribution which it is essentially identical to except for arbitrary shifting and re-scaling (the beta and Pearson Type I distributions can always be equalized by proper choice of parameters). In fact, in several English books and journal articles in the few decades prior to World War II, it was common to refer to the beta distribution as Pearson's Type I distribution. William Palin Elderton, William P. Elderton in his 1906 monograph "Frequency curves and correlation" further analyzes the beta distribution as Pearson's Type I distribution, including a full discussion of the method of moments for the four parameter case, and diagrams of (what Elderton describes as) U-shaped, J-shaped, twisted J-shaped, "cocked-hat" shapes, horizontal and angled straight-line cases. Elderton wrote "I am chiefly indebted to Professor Pearson, but the indebtedness is of a kind for which it is impossible to offer formal thanks." William Palin Elderton, Elderton in his 1906 monograph provides an impressive amount of information on the beta distribution, including equations for the origin of the distribution chosen to be the mode, as well as for other Pearson distributions: types I through VII. Elderton also included a number of appendixes, including one appendix ("II") on the beta and gamma functions. In later editions, Elderton added equations for the origin of the distribution chosen to be the mean, and analysis of Pearson distributions VIII through XII. As remarked by Bowman and Shenton "Fisher and Pearson had a difference of opinion in the approach to (parameter) estimation, in particular relating to (Pearson's method of) moments and (Fisher's method of) maximum likelihood in the case of the Beta distribution." Also according to Bowman and Shenton, "the case of a Type I (beta distribution) model being the center of the controversy was pure serendipity. A more difficult model of 4 parameters would have been hard to find." The long running public conflict of Fisher with Karl Pearson can be followed in a number of articles in prestigious journals. For example, concerning the estimation of the four parameters for the beta distribution, and Fisher's criticism of Pearson's method of moments as being arbitrary, see Pearson's article "Method of moments and method of maximum likelihood" (published three years after his retirement from University College, London, where his position had been divided between Fisher and Pearson's son Egon) in which Pearson writes "I read (Koshai's paper in the Journal of the Royal Statistical Society, 1933) which as far as I am aware is the only case at present published of the application of Professor Fisher's method. To my astonishment that method depends on first working out the constants of the frequency curve by the (Pearson) Method of Moments and then superposing on it, by what Fisher terms "the Method of Maximum Likelihood" a further approximation to obtain, what he holds, he will thus get, "more efficient values" of the curve constants." 
David and Edwards's treatise on the history of statistics cites the first modern treatment of the beta distribution, in 1911, using the beta designation that has become standard, due to Corrado Gini, an Italian statistician, demography, demographer, and sociology, sociologist, who developed the Gini coefficient. Norman Lloyd Johnson, N.L.Johnson and Samuel Kotz, S.Kotz, in their comprehensive and very informative monograph on leading historical personalities in statistical sciences credit Corrado Gini as "an early Bayesian...who dealt with the problem of eliciting the parameters of an initial Beta distribution, by singling out techniques which anticipated the advent of the so-called empirical Bayes approach."


References


External links

* "Beta Distribution" by Fiona Maclachlan, the Wolfram Demonstrations Project, 2007.
* Beta Distribution – Overview and Example, xycoon.com
* brighton-webs.co.uk
* exstrom.com
* Harvard University Statistics 110 Lecture 23 Beta Distribution, Prof. Joe Blitzstein
Mean absolute deviation around the mean

The mean absolute deviation around the mean for the beta distribution with shape parameters ''α'' and ''β'' is:

:\operatorname{E}[|X - E[X]|] = \frac{2 \alpha^\alpha \beta^\beta}{\Beta(\alpha,\beta)(\alpha+\beta)^{\alpha+\beta+1}}

The mean absolute deviation around the mean is a more robust estimator of statistical dispersion than the standard deviation for beta distributions with tails and inflection points at each side of the mode, Beta(''α'', ''β'') distributions with ''α'',''β'' > 2, as it depends on the linear (absolute) deviations rather than the square deviations from the mean. Therefore, the effect of very large deviations from the mean is not as overly weighted.

Using Stirling's approximation to the Gamma function, Norman Lloyd Johnson, N.L.Johnson and Samuel Kotz, S.Kotz derived the following approximation for values of the shape parameters greater than unity (the relative error for this approximation is only −3.5% for ''α'' = ''β'' = 1, and it decreases to zero as ''α'' → ∞, ''β'' → ∞):

: \begin{align} \frac{\text{mean abs. dev. from mean}}{\text{standard deviation}} &=\frac{\operatorname{E}[|X - E[X]|]}{\sqrt{\operatorname{var}(X)}}\\ &\approx \sqrt{\frac{2}{\pi}} \left(1+\frac{7}{12 (\alpha+\beta)}-\frac{1}{12 \alpha}-\frac{1}{12 \beta} \right), \text{ if } \alpha, \beta > 1. \end{align}

At the limit α → ∞, β → ∞, the ratio of the mean absolute deviation to the standard deviation (for the beta distribution) becomes equal to the ratio of the same measures for the normal distribution: \sqrt{\frac{2}{\pi}}. For α = β = 1 this ratio equals \frac{\sqrt{3}}{2}, so that from α = β = 1 to α, β → ∞ the ratio decreases by 8.5%. For α = β = 0 the standard deviation is exactly equal to the mean absolute deviation around the mean. Therefore, this ratio decreases by 15% from α = β = 0 to α = β = 1, and by 25% from α = β = 0 to α, β → ∞. However, for skewed beta distributions such that α → 0 or β → 0, the ratio of the standard deviation to the mean absolute deviation approaches infinity (although each of them, individually, approaches zero) because the mean absolute deviation approaches zero faster than the standard deviation.

Using the parametrization in terms of mean μ and sample size ν = α + β > 0:
:α = μν, β = (1−μ)ν
one can express the mean absolute deviation around the mean in terms of the mean μ and the sample size ν as follows:

:\operatorname{E}[|X - E[X]|] = \frac{2 \mu^{\mu\nu}(1-\mu)^{(1-\mu)\nu}}{\nu\Beta(\mu\nu,(1-\mu)\nu)}

For a symmetric distribution, the mean is at the middle of the distribution, μ = 1/2, and therefore:

: \begin{align} \operatorname{E}[|X - E[X]|] &= \frac{2^{1-\nu}}{\nu\Beta(\tfrac{\nu}{2},\tfrac{\nu}{2})} \\ \lim_{\nu \to 0} \left (\lim_{\mu \to \frac{1}{2}} \operatorname{E}[|X - E[X]|] \right ) &= \tfrac{1}{2}\\ \lim_{\nu \to \infty} \left (\lim_{\mu \to \frac{1}{2}} \operatorname{E}[|X - E[X]|] \right ) &= 0 \end{align}

Also, the following limits (with only the noted variable approaching the limit) can be obtained from the above expressions:

: \begin{align} \lim_{\alpha \to 0} \operatorname{E}[|X - E[X]|] &=\lim_{\alpha \to \infty} \operatorname{E}[|X - E[X]|] = 0 \\ \lim_{\beta\to 0} \operatorname{E}[|X - E[X]|] &=\lim_{\beta \to \infty} \operatorname{E}[|X - E[X]|] = 0\\ \lim_{\mu \to 0}\operatorname{E}[|X - E[X]|] &=\lim_{\mu \to 1}\operatorname{E}[|X - E[X]|] = 0\\ \lim_{\nu \to \infty} \operatorname{E}[|X - E[X]|] &= 0 \end{align}
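The closed-form mean absolute deviation around the mean can be checked by simulation. A minimal Python sketch, assuming NumPy and SciPy are available and using arbitrary illustrative shape parameters:

<syntaxhighlight lang="python">
# Sketch: closed form 2 a^a b^b / (B(a,b) (a+b)^(a+b+1)) versus a Monte Carlo estimate.
import numpy as np
from scipy.special import betaln

rng = np.random.default_rng(1)
a, b, n = 2.0, 5.0, 200_000

# evaluate the closed form in log space for numerical stability
log_mad = np.log(2) + a * np.log(a) + b * np.log(b) - betaln(a, b) - (a + b + 1) * np.log(a + b)
mad_exact = np.exp(log_mad)

x = rng.beta(a, b, n)
mad_mc = np.abs(x - a / (a + b)).mean()

print(mad_exact, mad_mc)   # the two values should agree to roughly three decimals
</syntaxhighlight>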


Mean absolute difference

The mean absolute difference for the beta distribution is:

:\mathrm{MD} = \int_0^1 \int_0^1 f(x;\alpha,\beta)\,f(y;\alpha,\beta)\,|x-y|\,dx\,dy = \left(\frac{4}{\alpha+\beta}\right)\frac{\Beta(\alpha+\beta,\alpha+\beta)}{\Beta(\alpha,\alpha)\Beta(\beta,\beta)}

The Gini coefficient for the beta distribution is half of the relative mean absolute difference:

:\mathrm{G} = \left(\frac{2}{\alpha}\right)\frac{\Beta(\alpha+\beta,\alpha+\beta)}{\Beta(\alpha,\alpha)\Beta(\beta,\beta)}
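As a numerical check of the closed form above, the following Python sketch (NumPy and SciPy assumed, parameters illustrative) compares it with a Monte Carlo estimate of E|X − Y| and reports the corresponding Gini coefficient:

<syntaxhighlight lang="python">
# Sketch: mean absolute difference MD = (4/(a+b)) B(a+b, a+b) / (B(a,a) B(b,b)) and the
# Gini coefficient G = MD / (2 * mean), checked by simulation.
import numpy as np
from scipy.special import betaln

rng = np.random.default_rng(2)
a, b, n = 2.0, 5.0, 200_000

log_ratio = betaln(a + b, a + b) - betaln(a, a) - betaln(b, b)   # log of the beta-function ratio
md_exact = 4.0 / (a + b) * np.exp(log_ratio)
gini_exact = md_exact * (a + b) / (2.0 * a)                      # mean of Beta(a,b) is a/(a+b)

x, y = rng.beta(a, b, n), rng.beta(a, b, n)
print(md_exact, np.abs(x - y).mean())   # closed form vs. Monte Carlo
print(gini_exact)
</syntaxhighlight>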


Skewness

The skewness (the third moment centered on the mean, normalized by the 3/2 power of the variance) of the beta distribution is

:\gamma_1 =\frac{\operatorname{E}[(X - \mu)^3]}{(\operatorname{var}(X))^{3/2}} = \frac{2(\beta-\alpha)\sqrt{\alpha+\beta+1}}{(\alpha+\beta+2)\sqrt{\alpha\beta}} .

Letting α = β in the above expression one obtains γ1 = 0, showing once again that for α = β the distribution is symmetric and hence the skewness is zero. Positive skew (right-tailed) for α < β, negative skew (left-tailed) for α > β.

Using the parametrization in terms of mean μ and sample size ν = α + β:

: \begin{align}
  \alpha & = \mu \nu ,\text{ where }\nu =(\alpha + \beta)  >0\\
  \beta & = (1 - \mu) \nu , \text{ where }\nu =(\alpha + \beta)  >0.
\end{align}

one can express the skewness in terms of the mean μ and the sample size ν as follows:

:\gamma_1 =\frac{2(\beta-\alpha)\sqrt{\alpha+\beta+1}}{(\alpha+\beta+2)\sqrt{\alpha\beta}} = \frac{2(1-2\mu)\sqrt{1+\nu}}{(2+\nu)\sqrt{\mu(1-\mu)}}.

The skewness can also be expressed just in terms of the variance ''var'' and the mean μ as follows:

:\gamma_1 =\frac{2(1-2\mu)\sqrt{1+\nu}}{(2+\nu)\sqrt{\mu(1-\mu)}} = \frac{2(1-2\mu)\sqrt{\operatorname{var}}}{\mu(1-\mu)+\operatorname{var}}\text{ if } \operatorname{var} < \mu(1-\mu)

The accompanying plot of skewness as a function of variance and mean shows that maximum variance (1/4) is coupled with zero skewness and the symmetry condition (μ = 1/2), and that maximum skewness (positive or negative infinity) occurs when the mean is located at one end or the other, so that the "mass" of the probability distribution is concentrated at the ends (minimum variance).

The following expression for the square of the skewness, in terms of the sample size ν = α + β and the variance ''var'', is useful for the method of moments estimation of four parameters:

:(\gamma_1)^2 =\frac{4(\beta-\alpha)^2 (1+\nu)}{(2+\nu)^2\alpha\beta} = \frac{4}{(2+\nu)^2}\bigg(\frac{1}{\operatorname{var}}-4(1+\nu)\bigg)

This expression correctly gives a skewness of zero for α = β, since in that case (see ): \operatorname{var} = \frac{1}{4(1+\nu)}.

For the symmetric case (α = β), skewness = 0 over the whole range, and the following limits apply:

:\lim_{\alpha = \beta \to 0} \gamma_1 = \lim_{\alpha = \beta \to \infty} \gamma_1 =\lim_{\nu \to 0} \gamma_1=\lim_{\nu \to \infty} \gamma_1=\lim_{\mu \to \frac{1}{2}} \gamma_1 = 0

For the asymmetric cases (α ≠ β) the following limits (with only the noted variable approaching the limit) can be obtained from the above expressions:

: \begin{align}
&\lim_{\alpha \to 0} \gamma_1 =\lim_{\mu \to 0} \gamma_1 = \infty\\
&\lim_{\beta \to 0} \gamma_1  = \lim_{\mu \to 1} \gamma_1= - \infty\\
&\lim_{\alpha \to \infty} \gamma_1 = -\frac{2}{\sqrt{\beta}},\quad \lim_{\beta \to 0}(\lim_{\alpha \to \infty} \gamma_1) = -\infty,\quad \lim_{\beta \to \infty}(\lim_{\alpha \to \infty} \gamma_1) = 0\\
&\lim_{\beta \to \infty} \gamma_1 = \frac{2}{\sqrt{\alpha}},\quad \lim_{\alpha \to 0}(\lim_{\beta \to \infty} \gamma_1) = \infty,\quad \lim_{\alpha \to \infty}(\lim_{\beta \to \infty} \gamma_1) = 0\\
&\lim_{\nu \to 0} \gamma_1 = \frac{1 - 2\mu}{\sqrt{\mu(1-\mu)}},\quad \lim_{\mu \to 0}(\lim_{\nu \to 0} \gamma_1)  = \infty,\quad \lim_{\mu \to 1}(\lim_{\nu \to 0} \gamma_1) = - \infty
\end{align}
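The closed-form skewness can be compared against a library value and a sample estimate. A minimal Python sketch, with illustrative parameters (SciPy assumed):

<syntaxhighlight lang="python">
# Sketch: gamma_1 = 2 (b - a) sqrt(a + b + 1) / ((a + b + 2) sqrt(a b)) for Beta(a, b).
import numpy as np
from scipy import stats

a, b = 2.0, 5.0
gamma1 = 2 * (b - a) * np.sqrt(a + b + 1) / ((a + b + 2) * np.sqrt(a * b))

print(gamma1)                                  # ≈ 0.596 (positive skew, since a < b)
print(stats.beta(a, b).stats(moments='s'))     # same value from SciPy
print(stats.skew(stats.beta(a, b).rvs(size=200_000, random_state=0)))   # sample estimate
</syntaxhighlight>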


Kurtosis

The beta distribution has been applied in acoustic analysis to assess damage to gears, as the kurtosis of the beta distribution has been reported to be a good indicator of the condition of a gear. Kurtosis has also been used to distinguish the seismic signal generated by a person's footsteps from other signals. As persons or other targets moving on the ground generate continuous signals in the form of seismic waves, one can separate different targets based on the seismic waves they generate. Kurtosis is sensitive to impulsive signals, so it is much more sensitive to the signal generated by human footsteps than to other signals generated by vehicles, winds, noise, etc.

Unfortunately, the notation for kurtosis has not been standardized. Kenney and Keeping use the symbol γ2 for the excess kurtosis, but Abramowitz and Stegun use different terminology. To prevent confusion between kurtosis (the fourth moment centered on the mean, normalized by the square of the variance) and excess kurtosis, when using symbols, they will be spelled out as follows:
:\begin \text _____&=\text_-_3\\ _____&=\frac-3\\ _____&=\frac\\ _____&=\frac _. \end Letting_α_=_β_in_the_above_expression_one_obtains :\text_=-_\frac_\text\alpha=\beta_. Therefore,_for_symmetric_beta_distributions,_the_excess_kurtosis_is_negative,_increasing_from_a_minimum_value_of_−2_at_the_limit_as__→_0,_and_approaching_a_maximum_value_of_zero_as__→_∞.__The_value_of_−2_is_the_minimum_value_of_excess_kurtosis_that_any_distribution_(not_just_beta_distributions,_but_any_distribution_of_any_possible_kind)_can_ever_achieve.__This_minimum_value_is_reached_when_all_the_probability_density_is_entirely_concentrated_at_each_end_''x''_=_0_and_''x''_=_1,_with_nothing_in_between:_a_2-point_Bernoulli_distribution_ In_probability_theory_and_statistics,_the_Bernoulli_distribution,_named_after_Swiss_mathematician__Jacob_Bernoulli,James_Victor_Uspensky:_''Introduction_to_Mathematical_Probability'',_McGraw-Hill,_New_York_1937,_page_45_is_the__discrete_probabi_...
_with_equal_probability_1/2_at_each_end_(a_coin_toss:_see_section_below_"Kurtosis_bounded_by_the_square_of_the_skewness"_for_further_discussion).__The_description_of_kurtosis_as_a_measure_of_the_"potential_outliers"_(or_"potential_rare,_extreme_values")_of_the_probability_distribution,_is_correct_for_all_distributions_including_the_beta_distribution._When_rare,_extreme_values_can_occur_in_the_beta_distribution,_the_higher_its_kurtosis;_otherwise,_the_kurtosis_is_lower._For_α_≠_β,_skewed_beta_distributions,_the_excess_kurtosis_can_reach_unlimited_positive_values_(particularly_for_α_→_0_for_finite_β,_or_for_β_→_0_for_finite_α)_because_the_side_away_from_the_mode_will_produce_occasional_extreme_values.__Minimum_kurtosis_takes_place_when_the_mass_density_is_concentrated_equally_at_each_end_(and_therefore_the_mean_is_at_the_center),_and_there_is_no_probability_mass_density_in_between_the_ends. Using_the__parametrization_in_terms_of_mean_μ_and_sample_size_ν_=_α_+_β: :_\begin __\alpha_&__=_\mu_\nu_,\text\nu_=(\alpha_+_\beta)__>0\\ __\beta_&__=_(1_-_\mu)_\nu_,_\text\nu_=(\alpha_+_\beta)__>0. \end one_can_express_the_excess_kurtosis_in_terms_of_the_mean_μ_and_the_sample_size_ν_as_follows: :\text_=\frac\bigg_(\frac_-_1_\bigg_) The_excess_kurtosis_can_also_be_expressed_in_terms_of_just_the_following_two_parameters:_the_variance_''var'',_and_the_sample_size_ν_as_follows: :\text_=\frac\left(\frac_-_6_-_5_\nu_\right)\text\text<_\mu(1-\mu) and,_in_terms_of_the_variance_''var''_and_the_mean_μ_as_follows: :\text_=\frac\text\text<_\mu(1-\mu) The_plot_of_excess_kurtosis_as_a_function_of_the_variance_and_the_mean_shows_that_the_minimum_value_of_the_excess_kurtosis_(−2,_which_is_the_minimum_possible_value_for_excess_kurtosis_for_any_distribution)_is_intimately_coupled_with_the_maximum_value_of_variance_(1/4)_and_the_symmetry_condition:_the_mean_occurring_at_the_midpoint_(μ_=_1/2)._This_occurs_for_the_symmetric_case_of_α_=_β_=_0,_with_zero_skewness.__At_the_limit,_this_is_the_2_point_Bernoulli_distribution_ In_probability_theory_and_statistics,_the_Bernoulli_distribution,_named_after_Swiss_mathematician__Jacob_Bernoulli,James_Victor_Uspensky:_''Introduction_to_Mathematical_Probability'',_McGraw-Hill,_New_York_1937,_page_45_is_the__discrete_probabi_...
_with_equal_probability_1/2_at_each__Dirac_delta_function_end_''x''_=_0_and_''x''_=_1_and_zero_probability_everywhere_else._(A_coin_toss:_one_face_of_the_coin_being_''x''_=_0_and_the_other_face_being_''x''_=_1.)__Variance_is_maximum_because_the_distribution_is_bimodal_with_nothing_in_between_the_two_modes_(spikes)_at_each_end.__Excess_kurtosis_is_minimum:_the_probability_density_"mass"_is_zero_at_the_mean_and_it_is_concentrated_at_the_two_peaks_at_each_end.__Excess_kurtosis_reaches_the_minimum_possible_value_(for_any_distribution)_when_the_probability_density_function_has_two_spikes_at_each_end:_it_is_bi-"peaky"_with_nothing_in_between_them. On_the_other_hand,_the_plot_shows_that_for_extreme_skewed_cases,_where_the_mean_is_located_near_one_or_the_other_end_(μ_=_0_or_μ_=_1),_the_variance_is_close_to_zero,_and_the_excess_kurtosis_rapidly_approaches_infinity_when_the_mean_of_the_distribution_approaches_either_end. Alternatively,_the_excess_kurtosis_can_also_be_expressed_in_terms_of_just_the_following_two_parameters:_the_square_of_the_skewness,_and_the_sample_size_ν_as_follows: :\text_=\frac\bigg(\frac_(\text)^2_-_1\bigg)\text^2-2<_\text<_\frac_(\text)^2 From_this_last_expression,_one_can_obtain_the_same_limits_published_practically_a_century_ago_by_Karl_Pearson_in_his_paper,_for_the_beta_distribution_(see_section_below_titled_"Kurtosis_bounded_by_the_square_of_the_skewness")._Setting_α_+_β=_ν_=__0_in_the_above_expression,_one_obtains_Pearson's_lower_boundary_(values_for_the_skewness_and_excess_kurtosis_below_the_boundary_(excess_kurtosis_+_2_−_skewness2_=_0)_cannot_occur_for_any_distribution,_and_hence_Karl_Pearson_appropriately_called_the_region_below_this_boundary_the_"impossible_region")._The_limit_of_α_+_β_=_ν_→_∞_determines_Pearson's_upper_boundary. :_\begin &\lim_\text__=_(\text)^2_-_2\\ &\lim_\text__=_\tfrac_(\text)^2 \end therefore: :(\text)^2-2<_\text<_\tfrac_(\text)^2 Values_of_ν_=_α_+_β_such_that_ν_ranges_from_zero_to_infinity,_0_<_ν_<_∞,_span_the_whole_region_of_the_beta_distribution_in_the_plane_of_excess_kurtosis_versus_squared_skewness. For_the_symmetric_case_(α_=_β),_the_following_limits_apply: :_\begin &\lim__\text_=__-_2_\\ &\lim__\text_=_0_\\ &\lim__\text_=_-_\frac \end For_the_unsymmetric_cases_(α_≠_β)_the_following_limits_(with_only_the_noted_variable_approaching_the_limit)_can_be_obtained_from_the_above_expressions: :_\begin &\lim_\text__=\lim__\text__=_\lim_\text__=_\lim_\text__=\infty\\ &\lim_\text__=_\frac,\text__\lim_(\lim__\text)__=_\infty,\text__\lim_(\lim__\text)__=_0\\ &\lim_\text__=_\frac,\text__\lim_(\lim__\text)__=_\infty,\text__\lim_(\lim__\text)__=_0\\ &\lim__\text__=_-_6_+_\frac,\text__\lim_(\lim__\text)__=_\infty,\text__\lim_(\lim__\text)__=_\infty \end
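The excess kurtosis of the beta distribution, together with the bounds involving the square of the skewness discussed in a later section, can be checked numerically. A Python sketch with illustrative parameters; the helper uses the standard closed form for the beta distribution's excess kurtosis:

<syntaxhighlight lang="python">
# Sketch: excess kurtosis of Beta(a, b) versus SciPy, plus a check of the region
# (skewness^2) - 2 < excess kurtosis < (3/2)(skewness^2) described later in the article.
import numpy as np
from scipy import stats

def beta_excess_kurtosis(a, b):
    # 6 [ (a-b)^2 (a+b+1) - a b (a+b+2) ] / [ a b (a+b+2) (a+b+3) ]
    return 6 * ((a - b) ** 2 * (a + b + 1) - a * b * (a + b + 2)) / (a * b * (a + b + 2) * (a + b + 3))

for a, b in [(4.0, 4.0), (2.0, 5.0), (0.5, 0.5)]:
    skew, kurt = stats.beta(a, b).stats(moments='sk')    # SciPy reports excess kurtosis
    print(a, b, beta_excess_kurtosis(a, b), float(kurt))
    assert skew ** 2 - 2 < kurt < 1.5 * skew ** 2 + 1e-9
</syntaxhighlight>

For α = β = 4 the helper returns −6/11, matching the symmetric-case value quoted earlier in the article.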


Characteristic function

The_Characteristic_function_(probability_theory), characteristic_function_is_the_Fourier_transform_of_the_probability_density_function.__The_characteristic_function_of_the_beta_distribution_is_confluent_hypergeometric_function, Kummer's_confluent_hypergeometric_function_(of_the_first_kind):
:\begin \varphi_X(\alpha;\beta;t) &=_\operatorname\left[e^\right]\\ &=_\int_0^1_e^_f(x;\alpha,\beta)_dx_\\ &=_1F_1(\alpha;_\alpha+\beta;_it)\!\\ &=\sum_^\infty_\frac__\\ &=_1__+\sum_^_\left(_\prod_^_\frac_\right)_\frac \end where :_x^=x(x+1)(x+2)\cdots(x+n-1) is_the_rising_factorial,_also_called_the_"Pochhammer_symbol".__The_value_of_the_characteristic_function_for_''t''_=_0,_is_one: :_\varphi_X(\alpha;\beta;0)=_1F_1(\alpha;_\alpha+\beta;_0)_=_1__. Also,_the_real_and_imaginary_parts_of_the_characteristic_function_enjoy_the_following_symmetries_with_respect_to_the_origin_of_variable_''t'': :_\textrm_\left_[__1F_1(\alpha;_\alpha+\beta;_it)_\right_]_=_\textrm_\left_[__1F_1(\alpha;_\alpha+\beta;_-_it)_\right_]__ :_\textrm_\left_[__1F_1(\alpha;_\alpha+\beta;_it)_\right_]_=_-_\textrm_\left__[__1F_1(\alpha;_\alpha+\beta;_-_it)_\right_]__ The_symmetric_case_α_=_β_simplifies_the_characteristic_function_of_the_beta_distribution_to_a_Bessel_function,_since_in_the_special_case_α_+_β_=_2α_the_confluent_hypergeometric_function_(of_the_first_kind)_reduces_to_a_Bessel_function_(the_modified_Bessel_function_of_the_first_kind_I__)_using_Ernst_Kummer, Kummer's_second_transformation_as_follows: Another_example_of_the_symmetric_case_α_=_β_=_n/2_for_beamforming_applications_can_be_found_in_Figure_11_of_ :\begin__1F_1(\alpha;2\alpha;_it)_&=_e^__0F_1_\left(;_\alpha+\tfrac;_\frac_\right)_\\ &=_e^_\left(\frac\right)^_\Gamma\left(\alpha+\tfrac\right)_I_\left(\frac\right).\end In_the_accompanying_plots,_the_Complex_number, real_part_(Re)_of_the_Characteristic_function_(probability_theory), characteristic_function_of_the_beta_distribution_is_displayed_for_symmetric_(α_=_β)_and_skewed_(α_≠_β)_cases.
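The identity φX(α; β; t) = 1F1(α; α + β; it) can be verified numerically. A sketch using the mpmath library (an assumed tool choice, not mentioned in the text) and illustrative parameters, comparing the hypergeometric value with direct numerical integration of e^{itx} f(x; α, β):

<syntaxhighlight lang="python">
# Sketch: characteristic function of Beta(alpha, beta) as Kummer's 1F1 at an imaginary argument.
import mpmath as mp

alpha, beta_, t = 2.0, 3.0, 1.5

phi_hyp = mp.hyp1f1(alpha, alpha + beta_, 1j * t)        # 1F1(alpha; alpha+beta; i t)

# direct definition: integral of exp(i t x) x^(alpha-1) (1-x)^(beta-1) / B(alpha, beta) on [0, 1]
f = lambda x: mp.exp(1j * t * x) * x ** (alpha - 1) * (1 - x) ** (beta_ - 1) / mp.beta(alpha, beta_)
phi_int = mp.quad(f, [0, 1])

print(phi_hyp)
print(phi_int)   # the two complex values should agree to high precision
</syntaxhighlight>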


Other moments


Moment generating function

It also follows that the moment generating function is

:\begin{align} M_X(\alpha; \beta; t) &= \operatorname{E}\left[e^{tX}\right] \\[4pt]
&= \int_0^1 e^{tx} f(x;\alpha,\beta)\,dx \\[4pt]
&= {}_1F_1(\alpha; \alpha+\beta; t) \\[4pt]
&= \sum_{n=0}^\infty \frac{\alpha^{(n)}}{(\alpha+\beta)^{(n)}} \frac{t^n}{n!} \\[4pt]
&= 1 +\sum_{k=1}^{\infty} \left( \prod_{r=0}^{k-1} \frac{\alpha+r}{\alpha+\beta+r} \right) \frac{t^k}{k!}
\end{align}

In particular ''M''''X''(''α''; ''β''; 0) = 1.
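A quick numerical check of MX(α; β; t) = 1F1(α; α + β; t), assuming SciPy and NumPy, with illustrative parameter values:

<syntaxhighlight lang="python">
# Sketch: beta moment generating function via Kummer's function versus a Monte Carlo E[exp(tX)].
import numpy as np
from scipy.special import hyp1f1

rng = np.random.default_rng(3)
a, b, t = 2.0, 3.0, 0.7

mgf_hyp = hyp1f1(a, a + b, t)                            # 1F1(a; a+b; t)
mgf_mc = np.exp(t * rng.beta(a, b, 500_000)).mean()

print(mgf_hyp, mgf_mc)   # should agree to roughly three decimals
</syntaxhighlight>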


Higher moments

Using the moment generating function, the ''k''-th raw moment is given by the factor

:\prod_{r=0}^{k-1} \frac{\alpha+r}{\alpha+\beta+r}

multiplying the (exponential series) term \left(\frac{t^k}{k!}\right) in the series of the moment generating function

:\operatorname{E}[X^k]= \frac{\alpha^{(k)}}{(\alpha + \beta)^{(k)}} = \prod_{r=0}^{k-1} \frac{\alpha+r}{\alpha+\beta+r}

where (''x'')(''k'') is a Pochhammer symbol representing rising factorial. It can also be written in a recursive form as

:\operatorname{E}[X^k] = \frac{\alpha + k - 1}{\alpha + \beta + k - 1}\operatorname{E}[X^{k-1}].

Since the moment generating function M_X(\alpha; \beta; \cdot) has a positive radius of convergence, the beta distribution is Moment problem, determined by its moments.
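The recursion for the raw moments is convenient in code. A minimal Python sketch (SciPy assumed, parameters illustrative):

<syntaxhighlight lang="python">
# Sketch: E[X^k] = (a + k - 1)/(a + b + k - 1) * E[X^(k-1)], starting from E[X^0] = 1.
from scipy import stats

a, b = 2.0, 3.0
moments = [1.0]
for k in range(1, 5):
    moments.append(moments[-1] * (a + k - 1) / (a + b + k - 1))

print(moments[1:])                                           # E[X], E[X^2], E[X^3], E[X^4]
print([stats.beta(a, b).moment(k) for k in range(1, 5)])     # same values from SciPy
</syntaxhighlight>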


Moments of transformed random variables


Moments of linearly transformed, product and inverted random variables

= One_can_also_show_the_following_expectations_for_a_transformed_random_variable,_where_the_random_variable_''X''_is_Beta-distributed_with_parameters_α_and_β:_''X''_~_Beta(α,_β).__The_expected_value_of_the_variable_1 − ''X''_is_the_mirror-symmetry_of_the_expected_value_based_on_''X'': :\begin &_\operatorname[1-X]_=_\frac_\\ &_\operatorname[X_(1-X)]_=\operatorname[(1-X)X_]_=\frac \end Due_to_the_mirror-symmetry_of_the_probability_density_function_of_the_beta_distribution,_the_variances_based_on_variables_''X''_and_1 − ''X''_are_identical,_and_the_covariance_on_''X''(1 − ''X''_is_the_negative_of_the_variance: :\operatorname[(1-X)]=\operatorname[X]_=_-\operatorname[X,(1-X)]=_\frac These_are_the_expected_values_for_inverted_variables,_(these_are_related_to_the_harmonic_means,_see_): :\begin &_\operatorname_\left_[\frac_\right_]_=_\frac_\text_\alpha_>_1\\ &_\operatorname\left_[\frac_\right_]_=\frac_\text_\beta_>_1 \end The_following_transformation_by_dividing_the_variable_''X''_by_its_mirror-image_''X''/(1 − ''X'')_results_in_the_expected_value_of_the_"inverted_beta_distribution"_or_beta_prime_distribution_ In_probability_theory_and__statistics,_the_beta_prime_distribution_(also_known_as_inverted_beta_distribution_or_beta_distribution_of_the_second_kindJohnson_et_al_(1995),_p_248)_is_an_absolutely_continuous_probability_distribution. __Definitions_ _...
_(also_known_as_beta_distribution_of_the_second_kind_or_Pearson_distribution, Pearson's_Type_VI): :_\begin &_\operatorname\left[\frac\right]_=\frac_\text\beta_>_1\\ &_\operatorname\left[\frac\right]_=\frac\text\alpha_>_1 \end_ Variances_of_these_transformed_variables_can_be_obtained_by_integration,_as_the_expected_values_of_the_second_moments_centered_on_the_corresponding_variables: :\operatorname_\left[\frac_\right]_=\operatorname\left[\left(\frac_-_\operatorname\left[\frac_\right_]_\right_)^2\right]= :\operatorname\left_[\frac_\right_]_=\operatorname_\left_[\left_(\frac_-_\operatorname\left_[\frac_\right_]_\right_)^2_\right_]=_\frac_\text\alpha_>_2 The_following_variance_of_the_variable_''X''_divided_by_its_mirror-image_(''X''/(1−''X'')_results_in_the_variance_of_the_"inverted_beta_distribution"_or_beta_prime_distribution_ In_probability_theory_and__statistics,_the_beta_prime_distribution_(also_known_as_inverted_beta_distribution_or_beta_distribution_of_the_second_kindJohnson_et_al_(1995),_p_248)_is_an_absolutely_continuous_probability_distribution. __Definitions_ _...
_(also_known_as_beta_distribution_of_the_second_kind_or_Pearson_distribution, Pearson's_Type_VI): :\operatorname_\left_[\frac_\right_]_=\operatorname_\left_[\left(\frac_-_\operatorname_\left_[\frac_\right_]_\right)^2_\right_]=\operatorname_\left_[\frac_\right_]_= :\operatorname_\left_[\left_(\frac_-_\operatorname_\left_[\frac_\right_]_\right_)^2_\right_]=_\frac_\text\beta_>_2 The_covariances_are: :\operatorname\left_[\frac,\frac_\right_]_=_\operatorname\left[\frac,\frac_\right]_=\operatorname\left[\frac,\frac\right_]_=_\operatorname\left[\frac,\frac_\right]_=\frac_\text_\alpha,_\beta_>_1 These_expectations_and_variances_appear_in_the_four-parameter_Fisher_information_matrix_(.)


Moments of logarithmically transformed random variables

Expected values for logarithmic transformations (useful for maximum likelihood estimates, see ) are discussed in this section. The following logarithmic linear transformations are related to the geometric means ''G''''X'' and ''G''(1−''X'') (see ):

:\begin{align}
\operatorname{E}[\ln(X)] &= \psi(\alpha) - \psi(\alpha + \beta)= - \operatorname{E}\left[\ln \left (\frac{1}{X} \right )\right],\\
\operatorname{E}[\ln(1-X)] &=\psi(\beta) - \psi(\alpha + \beta)= - \operatorname{E} \left[\ln \left (\frac{1}{1-X} \right )\right].
\end{align}

where the digamma function ψ(α) is defined as the logarithmic derivative of the gamma function:

:\psi(\alpha) = \frac{d}{d\alpha}\ln\Gamma(\alpha)

Logit transformations are interesting, as they usually transform various shapes (including J-shapes) into (usually skewed) bell-shaped densities over the logit variable, and they may remove the end singularities over the original variable:

:\begin{align}
\operatorname{E}\left[\ln \left (\frac{X}{1-X} \right ) \right] &=\psi(\alpha) - \psi(\beta)= \operatorname{E}[\ln(X)] +\operatorname{E} \left[\ln \left (\frac{1}{1-X} \right) \right],\\
\operatorname{E}\left [\ln \left (\frac{1-X}{X} \right ) \right ] &=\psi(\beta) - \psi(\alpha)= - \operatorname{E} \left[\ln \left (\frac{X}{1-X} \right) \right] .
\end{align}

Johnson considered the distribution of the logit-transformed variable ln(''X''/(1−''X'')), including its moment generating function and approximations for large values of the shape parameters. This transformation extends the finite support [0, 1] based on the original variable ''X'' to infinite support in both directions of the real line (−∞, +∞).

Higher order logarithmic moments can be derived by using the representation of a beta distribution as a proportion of two Gamma distributions and differentiating through the integral. They can be expressed in terms of higher order poly-gamma functions as follows:

:\begin{align}
\operatorname{E} \left [\ln^2(X) \right ] &= (\psi(\alpha) - \psi(\alpha + \beta))^2+\psi_1(\alpha)-\psi_1(\alpha+\beta), \\
\operatorname{E} \left [\ln^2(1-X) \right ] &= (\psi(\beta) - \psi(\alpha + \beta))^2+\psi_1(\beta)-\psi_1(\alpha+\beta), \\
\operatorname{E} \left [\ln (X)\ln(1-X) \right ] &=(\psi(\alpha) - \psi(\alpha + \beta))(\psi(\beta) - \psi(\alpha + \beta)) -\psi_1(\alpha+\beta).
\end{align}

Therefore the variance of the logarithmic variables and covariance of ln(''X'') and ln(1−''X'') are:

:\begin{align}
\operatorname{cov}[\ln(X), \ln(1-X)] &= \operatorname{E}\left[\ln(X)\ln(1-X)\right] - \operatorname{E}[\ln(X)]\operatorname{E}[\ln(1-X)] = -\psi_1(\alpha+\beta) \\
& \\
\operatorname{var}[\ln X] &= \operatorname{E}[\ln^2(X)] - (\operatorname{E}[\ln(X)])^2 \\
&= \psi_1(\alpha) - \psi_1(\alpha + \beta) \\
&= \psi_1(\alpha) + \operatorname{cov}[\ln(X), \ln(1-X)] \\
& \\
\operatorname{var}[\ln (1-X)] &= \operatorname{E}[\ln^2 (1-X)] - (\operatorname{E}[\ln (1-X)])^2 \\
&= \psi_1(\beta) - \psi_1(\alpha + \beta) \\
&= \psi_1(\beta) + \operatorname{cov}[\ln (X), \ln(1-X)]
\end{align}

where the trigamma function, denoted ψ1(α), is the second of the polygamma functions, and is defined as the derivative of the digamma function:

:\psi_1(\alpha) = \frac{d^2\ln\Gamma(\alpha)}{d\alpha^2}= \frac{d\,\psi(\alpha)}{d\alpha}.

The variances and covariance of the logarithmically transformed variables ''X'' and (1−''X'') are different, in general, because the logarithmic transformation destroys the mirror-symmetry of the original variables ''X'' and (1−''X''), as the logarithm approaches negative infinity for the variable approaching zero.

These logarithmic variances and covariance are the elements of the Fisher information matrix for the beta distribution. They are also a measure of the curvature of the log likelihood function (see section on Maximum likelihood estimation).

The variances of the log inverse variables are identical to the variances of the log variables:

:\begin{align}
\operatorname{var}\left[\ln \left (\frac{1}{X} \right ) \right] & =\operatorname{var}[\ln(X)] = \psi_1(\alpha) - \psi_1(\alpha + \beta), \\
\operatorname{var}\left[\ln \left (\frac{1}{1-X} \right ) \right] &=\operatorname{var}[\ln (1-X)] = \psi_1(\beta) - \psi_1(\alpha + \beta), \\
\operatorname{cov}\left[\ln \left (\frac{1}{X} \right), \ln \left (\frac{1}{1-X}\right ) \right] &=\operatorname{cov}[\ln(X),\ln(1-X)]= -\psi_1(\alpha + \beta).
\end{align}

It also follows that the variances of the logit transformed variables are:

:\operatorname{var}\left[\ln \left (\frac{X}{1-X} \right )\right]=\operatorname{var}\left[\ln \left (\frac{1-X}{X} \right ) \right]=-\operatorname{cov}\left [\ln \left (\frac{X}{1-X} \right ), \ln \left (\frac{1-X}{X} \right ) \right]= \psi_1(\alpha) + \psi_1(\beta)
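These digamma and trigamma expressions are easy to check by simulation. A Python sketch, assuming SciPy's special functions and using illustrative parameters:

<syntaxhighlight lang="python">
# Sketch: E[ln X] = psi(a) - psi(a+b), var[ln X] = psi_1(a) - psi_1(a+b),
# var[ln(X/(1-X))] = psi_1(a) + psi_1(b), versus Monte Carlo estimates.
import numpy as np
from scipy.special import digamma, polygamma

rng = np.random.default_rng(4)
a, b = 2.0, 3.0

mean_lnx = digamma(a) - digamma(a + b)
var_lnx = polygamma(1, a) - polygamma(1, a + b)     # polygamma(1, .) is the trigamma function
var_logit = polygamma(1, a) + polygamma(1, b)

x = rng.beta(a, b, 500_000)
print(mean_lnx, np.log(x).mean())
print(var_lnx, np.log(x).var())
print(var_logit, np.log(x / (1 - x)).var())
</syntaxhighlight>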


Quantities of information (entropy)

Given_a_beta_distributed_random_variable,_''X''_~_Beta(''α'',_''β''),_the_information_entropy, differential_entropy_of_''X''_is_(measured_in_Nat_(unit), nats),_the_expected_value_of_the_negative_of_the_logarithm_of_the_probability_density_function_ In_probability_theory,_a_probability_density_function_(PDF),_or_density_of_a_continuous_random_variable,_is_a__function_whose_value_at_any_given_sample_(or_point)_in_the__sample_space_(the_set_of_possible_values_taken_by_the_random_variable)_ca_...
: :\begin h(X)_&=_\operatorname[-\ln(f(x;\alpha,\beta))]_\\_pt&=\int_0^1_-f(x;\alpha,\beta)\ln(f(x;\alpha,\beta))_\,_dx_\\_pt&=_\ln(\Beta(\alpha,\beta))-(\alpha-1)\psi(\alpha)-(\beta-1)\psi(\beta)+(\alpha+\beta-2)_\psi(\alpha+\beta) \end where_''f''(''x'';_''α'',_''β'')_is_the_probability_density_function_ In_probability_theory,_a_probability_density_function_(PDF),_or_density_of_a_continuous_random_variable,_is_a__function_whose_value_at_any_given_sample_(or_point)_in_the__sample_space_(the_set_of_possible_values_taken_by_the_random_variable)_ca_...
_of_the_beta_distribution: :f(x;\alpha,\beta)_=_\frac_x^(1-x)^ The_digamma_function_ In_mathematics,_the_digamma_function_is_defined_as_the__logarithmic_derivative_of_the_gamma_function: :\psi(x)=\frac\ln\big(\Gamma(x)\big)=\frac\sim\ln-\frac. It_is_the_first_of_the__polygamma_functions._It_is_strictly_increasing_and_strict_...
_''ψ''_appears_in_the_formula_for_the_differential_entropy_as_a_consequence_of_Euler's_integral_formula_for_the_harmonic_numbers_which_follows_from_the_integral: :\int_0^1_\frac__\,_dx_=_\psi(\alpha)-\psi(1) The_information_entropy, differential_entropy_of_the_beta_distribution_is_negative_for_all_values_of_''α''_and_''β''_greater_than_zero,_except_at_''α''_=_''β''_=_1_(for_which_values_the_beta_distribution_is_the_same_as_the_Uniform_distribution_(continuous), uniform_distribution),_where_the_information_entropy, differential_entropy_reaches_its_Maxima_and_minima, maximum_value_of_zero.__It_is_to_be_expected_that_the_maximum_entropy_should_take_place_when_the_beta_distribution_becomes_equal_to_the_uniform_distribution,_since_uncertainty_is_maximal_when_all_possible_events_are_equiprobable. For_''α''_or_''β''_approaching_zero,_the_information_entropy, differential_entropy_approaches_its_Maxima_and_minima, minimum_value_of_negative_infinity._For_(either_or_both)_''α''_or_''β''_approaching_zero,_there_is_a_maximum_amount_of_order:_all_the_probability_density_is_concentrated_at_the_ends,_and_there_is_zero_probability_density_at_points_located_between_the_ends._Similarly_for_(either_or_both)_''α''_or_''β''_approaching_infinity,_the_differential_entropy_approaches_its_minimum_value_of_negative_infinity,_and_a_maximum_amount_of_order.__If_either_''α''_or_''β''_approaches_infinity_(and_the_other_is_finite)_all_the_probability_density_is_concentrated_at_an_end,_and_the_probability_density_is_zero_everywhere_else.__If_both_shape_parameters_are_equal_(the_symmetric_case),_''α''_=_''β'',_and_they_approach_infinity_simultaneously,_the_probability_density_becomes_a_spike_(_Dirac_delta_function)_concentrated_at_the_middle_''x''_=_1/2,_and_hence_there_is_100%_probability_at_the_middle_''x''_=_1/2_and_zero_probability_everywhere_else. The_(continuous_case)_information_entropy, differential_entropy_was_introduced_by_Shannon_in_his_original_paper_(where_he_named_it_the_"entropy_of_a_continuous_distribution"),_as_the_concluding_part_of_the_same_paper_where_he_defined_the_information_entropy, discrete_entropy.__It_is_known_since_then_that_the_differential_entropy_may_differ_from_the_infinitesimal_limit_of_the_discrete_entropy_by_an_infinite_offset,_therefore_the_differential_entropy_can_be_negative_(as_it_is_for_the_beta_distribution)._What_really_matters_is_the_relative_value_of_entropy. Given_two_beta_distributed_random_variables,_''X''1_~_Beta(''α'',_''β'')_and_''X''2_~_Beta(''α''′,_''β''′),_the_cross_entropy_is_(measured_in_nats)
:\begin H(X_1,X_2)_&=_\int_0^1_-_f(x;\alpha,\beta)_\ln_(f(x;\alpha',\beta'))_\,dx_\\_pt&=_\ln_\left(\Beta(\alpha',\beta')\right)-(\alpha'-1)\psi(\alpha)-(\beta'-1)\psi(\beta)+(\alpha'+\beta'-2)\psi(\alpha+\beta). \end The_cross_entropy_has_been_used_as_an_error_metric_to_measure_the_distance_between_two_hypotheses.
__Its_absolute_value_is_minimum_when_the_two_distributions_are_identical._It_is_the_information_measure_most_closely_related_to_the_log_maximum_likelihood_(see_section_on_"Parameter_estimation._Maximum_likelihood_estimation")). The_relative_entropy,_or_Kullback–Leibler_divergence_''D''KL(''X''1_, , _''X''2),_is_a_measure_of_the_inefficiency_of_assuming_that_the_distribution_is_''X''2_~_Beta(''α''′,_''β''′)__when_the_distribution_is_really_''X''1_~_Beta(''α'',_''β'')._It_is_defined_as_follows_(measured_in_nats). :\begin D_(X_1, , X_2)_&=_\int_0^1_f(x;\alpha,\beta)_\ln_\left_(\frac_\right_)_\,_dx_\\_pt&=_\left_(\int_0^1_f(x;\alpha,\beta)_\ln_(f(x;\alpha,\beta))_\,dx_\right_)-_\left_(\int_0^1_f(x;\alpha,\beta)_\ln_(f(x;\alpha',\beta'))_\,_dx_\right_)\\_pt&=_-h(X_1)_+_H(X_1,X_2)\\_pt&=_\ln\left(\frac\right)+(\alpha-\alpha')\psi(\alpha)+(\beta-\beta')\psi(\beta)+(\alpha'-\alpha+\beta'-\beta)\psi_(\alpha_+_\beta). \end_ The_relative_entropy,_or_Kullback–Leibler_divergence,_is_always_non-negative.__A_few_numerical_examples_follow: *''X''1_~_Beta(1,_1)_and_''X''2_~_Beta(3,_3);_''D''KL(''X''1_, , _''X''2)_=_0.598803;_''D''KL(''X''2_, , _''X''1)_=_0.267864;_''h''(''X''1)_=_0;_''h''(''X''2)_=_−0.267864 *''X''1_~_Beta(3,_0.5)_and_''X''2_~_Beta(0.5,_3);_''D''KL(''X''1_, , _''X''2)_=_7.21574;_''D''KL(''X''2_, , _''X''1)_=_7.21574;_''h''(''X''1)_=_−1.10805;_''h''(''X''2)_=_−1.10805. The_Kullback–Leibler_divergence_is_not_symmetric_''D''KL(''X''1_, , _''X''2)_≠_''D''KL(''X''2_, , _''X''1)__for_the_case_in_which_the_individual_beta_distributions_Beta(1,_1)_and_Beta(3,_3)_are_symmetric,_but_have_different_entropies_''h''(''X''1)_≠_''h''(''X''2)._The_value_of_the_Kullback_divergence_depends_on_the_direction_traveled:_whether_going_from_a_higher_(differential)_entropy_to_a_lower_(differential)_entropy_or_the_other_way_around._In_the_numerical_example_above,_the_Kullback_divergence_measures_the_inefficiency_of_assuming_that_the_distribution_is_(bell-shaped)_Beta(3,_3),_rather_than_(uniform)_Beta(1,_1)._The_"h"_entropy_of_Beta(1,_1)_is_higher_than_the_"h"_entropy_of_Beta(3,_3)_because_the_uniform_distribution_Beta(1,_1)_has_a_maximum_amount_of_disorder._The_Kullback_divergence_is_more_than_two_times_higher_(0.598803_instead_of_0.267864)_when_measured_in_the_direction_of_decreasing_entropy:_the_direction_that_assumes_that_the_(uniform)_Beta(1,_1)_distribution_is_(bell-shaped)_Beta(3,_3)_rather_than_the_other_way_around._In_this_restricted_sense,_the_Kullback_divergence_is_consistent_with_the_second_law_of_thermodynamics. The_Kullback–Leibler_divergence_is_symmetric_''D''KL(''X''1_, , _''X''2)_=_''D''KL(''X''2_, , _''X''1)_for_the_skewed_cases_Beta(3,_0.5)_and_Beta(0.5,_3)_that_have_equal_differential_entropy_''h''(''X''1)_=_''h''(''X''2). The_symmetry_condition: :D_(X_1, , X_2)_=_D_(X_2, , X_1),\texth(X_1)_=_h(X_2),\text\alpha_\neq_\beta follows_from_the_above_definitions_and_the_mirror-symmetry_''f''(''x'';_''α'',_''β'')_=_''f''(1−''x'';_''α'',_''β'')_enjoyed_by_the_beta_distribution.
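The differential entropy and the Kullback–Leibler divergence of this section can be evaluated directly from log-beta and digamma functions. A Python sketch (SciPy assumed) that reproduces the numerical examples quoted above, h(Beta(3, 3)) ≈ −0.267864 and DKL values of ≈ 0.598803 and 0.267864:

<syntaxhighlight lang="python">
# Sketch: differential entropy and KL divergence of beta distributions from the
# digamma/log-beta expressions in this section.
from scipy.special import betaln, digamma
from scipy import stats

def beta_entropy(a, b):
    # h(X) = ln B(a,b) - (a-1) psi(a) - (b-1) psi(b) + (a+b-2) psi(a+b)
    return betaln(a, b) - (a - 1) * digamma(a) - (b - 1) * digamma(b) + (a + b - 2) * digamma(a + b)

def beta_kl(a1, b1, a2, b2):
    # D_KL( Beta(a1, b1) || Beta(a2, b2) )
    return (betaln(a2, b2) - betaln(a1, b1)
            + (a1 - a2) * digamma(a1) + (b1 - b2) * digamma(b1)
            + (a2 - a1 + b2 - b1) * digamma(a1 + b1))

print(beta_entropy(3, 3), stats.beta(3, 3).entropy())   # ≈ -0.267864 from both
print(beta_kl(1, 1, 3, 3), beta_kl(3, 3, 1, 1))         # ≈ 0.598803 and 0.267864 (asymmetric)
</syntaxhighlight>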


Relationships between statistical measures


Mean, mode and median relationship

If 1 < α < β then mode ≤ median ≤ mean.Kerman J (2011) "A closed-form approximation for the median of the beta distribution". Expressing the mode (only for α, β > 1), and the mean in terms of α and β:

: \frac{\alpha - 1}{\alpha + \beta - 2} \le \text{median} \le \frac{\alpha}{\alpha + \beta} ,

If 1 < β < α then the order of the inequalities is reversed. For α, β > 1 the absolute distance between the mean and the median is less than 5% of the distance between the maximum and minimum values of ''x''. On the other hand, the absolute distance between the mean and the mode can reach 50% of the distance between the maximum and minimum values of ''x'', for the (Pathological (mathematics), pathological) case of α = 1 and β = 1, for which values the beta distribution approaches the uniform distribution and the information entropy, differential entropy approaches its Maxima and minima, maximum value, and hence maximum "disorder".

For example, for α = 1.0001 and β = 1.00000001:
* mode   = 0.9999;   PDF(mode) = 1.00010
* mean   = 0.500025; PDF(mean) = 1.00003
* median = 0.500035; PDF(median) = 1.00003
* mean − mode   = −0.499875
* mean − median = −9.65538 × 10−6
where PDF stands for the value of the probability density function.
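A numerical check of the ordering mode ≤ median ≤ mean for 1 < α < β, assuming SciPy (which supplies the median) and an illustrative parameter choice:

<syntaxhighlight lang="python">
# Sketch: mode = (a-1)/(a+b-2), mean = a/(a+b), median from SciPy, for 1 < a < b.
from scipy import stats

a, b = 2.0, 6.0
mode = (a - 1) / (a + b - 2)
mean = a / (a + b)
median = stats.beta(a, b).median()

print(mode, median, mean)     # ≈ 0.167, 0.209, 0.25
assert mode <= median <= mean
</syntaxhighlight>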


Mean, geometric mean and harmonic mean relationship

It is known from the inequality of arithmetic and geometric means that the geometric mean is lower than the mean. Similarly, the harmonic mean is lower than the geometric mean. The accompanying plot shows that for α = β, both the mean and the median are exactly equal to 1/2, regardless of the value of α = β, and the mode is also equal to 1/2 for α = β > 1; however, the geometric and harmonic means are lower than 1/2, and they only approach this value asymptotically as α = β → ∞.
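The ordering mean > geometric mean > harmonic mean can be illustrated directly from the closed forms GX = exp(ψ(α) − ψ(α + β)) and HX = (α − 1)/(α + β − 1) (the latter valid for α > 1; see the section on moments of inverted variables). A minimal Python sketch for a symmetric case:

<syntaxhighlight lang="python">
# Sketch: arithmetic, geometric and harmonic means of Beta(a, b) for a = b = 3.
import numpy as np
from scipy.special import digamma

a = b = 3.0
mean = a / (a + b)                                 # = 1/2 for a = b
geometric = np.exp(digamma(a) - digamma(a + b))    # exp(E[ln X])
harmonic = (a - 1) / (a + b - 1)                   # 1 / E[1/X], requires a > 1

print(mean, geometric, harmonic)                   # 0.5, ≈ 0.457, 0.4
assert mean > geometric > harmonic
</syntaxhighlight>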


Kurtosis bounded by the square of the skewness

As_remarked_by_William_Feller, Feller,_in_the_Pearson_distribution, Pearson_system_the_beta_probability_density_appears_as_Pearson_distribution, type_I_(any_difference_between_the_beta_distribution_and_Pearson's_type_I_distribution_is_only_superficial_and_it_makes_no_difference_for_the_following_discussion_regarding_the_relationship_between_kurtosis_and_skewness)._Karl_Pearson_showed,_in_Plate_1_of_his_paper_
__published_in_1916,__a_graph_with_the_kurtosis_as_the_vertical_axis_(ordinate)_and_the_square_of_the_skewness_ In_probability_theory_and_statistics,_skewness_is_a_measure_of_the_asymmetry_of_the_probability_distribution_of_a__real-valued_random_variable_about_its_mean._The_skewness_value_can_be_positive,_zero,_negative,_or_undefined. For_a_unimodal__...
_as_the_horizontal_axis_(abscissa),_in_which_a_number_of_distributions_were_displayed.
__The_region_occupied_by_the_beta_distribution_is_bounded_by_the_following_two_Line_(geometry), lines_in_the_(skewness2,kurtosis)_Cartesian_coordinate_system, plane,_or_the_(skewness2,excess_kurtosis)_Cartesian_coordinate_system, plane: :(\text)^2+1<_\text<_\frac_(\text)^2_+_3 or,_equivalently, :(\text)^2-2<_\text<_\frac_(\text)^2 At_a_time_when_there_were_no_powerful_digital_computers,_Karl_Pearson_accurately_computed_further_boundaries,_for_example,_separating_the_"U-shaped"_from_the_"J-shaped"_distributions._The_lower_boundary_line_(excess_kurtosis_+_2_−_skewness2_=_0)_is_produced_by_skewed_"U-shaped"_beta_distributions_with_both_values_of_shape_parameters_α_and_β_close_to_zero.__The_upper_boundary_line_(excess_kurtosis_−_(3/2)_skewness2_=_0)_is_produced_by_extremely_skewed_distributions_with_very_large_values_of_one_of_the_parameters_and_very_small_values_of_the_other_parameter.__Karl_Pearson_showed_that_this_upper_boundary_line_(excess_kurtosis_−_(3/2)_skewness2_=_0)_is_also_the_intersection_with_Pearson's_distribution_III,_which_has_unlimited_support_in_one_direction_(towards_positive_infinity),_and_can_be_bell-shaped_or_J-shaped._His_son,_Egon_Pearson,_showed_that_the_region_(in_the_kurtosis/squared-skewness_plane)_occupied_by_the_beta_distribution_(equivalently,_Pearson's_distribution_I)_as_it_approaches_this_boundary_(excess_kurtosis_−_(3/2)_skewness2_=_0)_is_shared_with_the_noncentral_chi-squared_distribution.__Karl_Pearson
_(Pearson_1895,_pp. 357,_360,_373–376)_also_showed_that_the_gamma_distribution_is_a_Pearson_type_III_distribution._Hence_this_boundary_line_for_Pearson's_type_III_distribution_is_known_as_the_gamma_line._(This_can_be_shown_from_the_fact_that_the_excess_kurtosis_of_the_gamma_distribution_is_6/''k''_and_the_square_of_the_skewness_is_4/''k'',_hence_(excess_kurtosis_−_(3/2)_skewness2_=_0)_is_identically_satisfied_by_the_gamma_distribution_regardless_of_the_value_of_the_parameter_"k")._Pearson_later_noted_that_the_chi-squared_distribution_is_a_special_case_of_Pearson's_type_III_and_also_shares_this_boundary_line_(as_it_is_apparent_from_the_fact_that_for_the_chi-squared_distribution_the_excess_kurtosis_is_12/''k''_and_the_square_of_the_skewness_is_8/''k'',_hence_(excess_kurtosis_−_(3/2)_skewness2_=_0)_is_identically_satisfied_regardless_of_the_value_of_the_parameter_"k")._This_is_to_be_expected,_since_the_chi-squared_distribution_''X''_~_χ2(''k'')_is_a_special_case_of_the_gamma_distribution,_with_parametrization_X_~_Γ(k/2,_1/2)_where_k_is_a_positive_integer_that_specifies_the_"number_of_degrees_of_freedom"_of_the_chi-squared_distribution. An_example_of_a_beta_distribution_near_the_upper_boundary_(excess_kurtosis_−_(3/2)_skewness2_=_0)_is_given_by_α_=_0.1,_β_=_1000,_for_which_the_ratio_(excess_kurtosis)/(skewness2)_=_1.49835_approaches_the_upper_limit_of_1.5_from_below._An_example_of_a_beta_distribution_near_the_lower_boundary_(excess_kurtosis_+_2_−_skewness2_=_0)_is_given_by_α=_0.0001,_β_=_0.1,_for_which_values_the_expression_(excess_kurtosis_+_2)/(skewness2)_=_1.01621_approaches_the_lower_limit_of_1_from_above._In_the_infinitesimal_limit_for_both_α_and_β_approaching_zero_symmetrically,_the_excess_kurtosis_reaches_its_minimum_value_at_−2.__This_minimum_value_occurs_at_the_point_at_which_the_lower_boundary_line_intersects_the_vertical_axis_(ordinate)._(However,_in_Pearson's_original_chart,_the_ordinate_is_kurtosis,_instead_of_excess_kurtosis,_and_it_increases_downwards_rather_than_upwards). Values_for_the_skewness_and_excess_kurtosis_below_the_lower_boundary_(excess_kurtosis_+_2_−_skewness2_=_0)_cannot_occur_for_any_distribution,_and_hence_Karl_Pearson_appropriately_called_the_region_below_this_boundary_the_"impossible_region"._The_boundary_for_this_"impossible_region"_is_determined_by_(symmetric_or_skewed)_bimodal_"U"-shaped_distributions_for_which_the_parameters_α_and_β_approach_zero_and_hence_all_the_probability_density_is_concentrated_at_the_ends:_''x''_=_0,_1_with_practically_nothing_in_between_them._Since_for_α_≈_β_≈_0_the_probability_density_is_concentrated_at_the_two_ends_''x''_=_0_and_''x''_=_1,_this_"impossible_boundary"_is_determined_by_a_Bernoulli_distribution_ In_probability_theory_and_statistics,_the_Bernoulli_distribution,_named_after_Swiss_mathematician__Jacob_Bernoulli,James_Victor_Uspensky:_''Introduction_to_Mathematical_Probability'',_McGraw-Hill,_New_York_1937,_page_45_is_the__discrete_probabi_...
,_where_the_two_only_possible_outcomes_occur_with_respective_probabilities_''p''_and_''q''_=_1−''p''._For_cases_approaching_this_limit_boundary_with_symmetry_α_=_β,_skewness_≈_0,_excess_kurtosis_≈_−2_(this_is_the_lowest_excess_kurtosis_possible_for_any_distribution),_and_the_probabilities_are_''p''_≈_''q''_≈_1/2.__For_cases_approaching_this_limit_boundary_with_skewness,_excess_kurtosis_≈_−2_+_skewness2,_and_the_probability_density_is_concentrated_more_at_one_end_than_the_other_end_(with_practically_nothing_in_between),_with_probabilities_p_=_\tfrac_at_the_left_end_''x''_=_0_and_q_=_1-p_=_\tfrac_at_the_right_end_''x''_=_1.


Symmetry

All_statements_are_conditional_on_α,_β_>_0 *_Probability_density_function_Symmetry, reflection_symmetry ::f(x;\alpha,\beta)_=_f(1-x;\beta,\alpha) *_Cumulative_distribution_function_Symmetry, reflection_symmetry_plus_unitary_Symmetry, translation ::F(x;\alpha,\beta)_=_I_x(\alpha,\beta)_=_1-_F(1-_x;\beta,\alpha)_=_1_-_I_(\beta,\alpha) *_Mode_Symmetry, reflection_symmetry_plus_unitary_Symmetry, translation ::\operatorname(\Beta(\alpha,_\beta))=_1-\operatorname(\Beta(\beta,_\alpha)),\text\Beta(\beta,_\alpha)\ne_\Beta(1,1) *_Median_Symmetry, reflection_symmetry_plus_unitary_Symmetry, translation ::\operatorname_(\Beta(\alpha,_\beta)_)=_1_-_\operatorname_(\Beta(\beta,_\alpha)) *_Mean_Symmetry, reflection_symmetry_plus_unitary_Symmetry, translation ::\mu_(\Beta(\alpha,_\beta)_)=_1_-_\mu_(\Beta(\beta,_\alpha)_) *_Geometric_Means_each_is_individually_asymmetric,_the_following_symmetry_applies_between_the_geometric_mean_based_on_''X''_and_the_geometric_mean_based_on_its_reflection_Reflection_or_reflexion_may_refer_to: _Science_and_technology *_Reflection_(physics),_a_common_wave_phenomenon **_Specular_reflection,_reflection_from_a_smooth_surface ***_Mirror_image,_a_reflection_in_a_mirror_or_in_water **__Signal_reflection,_in__...
_(1-X) ::G_X_(\Beta(\alpha,_\beta)_)=G_(\Beta(\beta,_\alpha)_)_ *_Harmonic_means_each_is_individually_asymmetric,_the_following_symmetry_applies_between_the_harmonic_mean_based_on_''X''_and_the_harmonic_mean_based_on_its_reflection_Reflection_or_reflexion_may_refer_to: _Science_and_technology *_Reflection_(physics),_a_common_wave_phenomenon **_Specular_reflection,_reflection_from_a_smooth_surface ***_Mirror_image,_a_reflection_in_a_mirror_or_in_water **__Signal_reflection,_in__...
_(1-X) ::H_X_(\Beta(\alpha,_\beta)_)=H_(\Beta(\beta,_\alpha)_)_\text_\alpha,_\beta_>_1__. *_Variance_symmetry ::\operatorname_(\Beta(\alpha,_\beta)_)=\operatorname_(\Beta(\beta,_\alpha)_) *_Geometric_variances_each_is_individually_asymmetric,_the_following_symmetry_applies_between_the_log_geometric_variance_based_on_X_and_the_log_geometric_variance_based_on_its_reflection_Reflection_or_reflexion_may_refer_to: _Science_and_technology *_Reflection_(physics),_a_common_wave_phenomenon **_Specular_reflection,_reflection_from_a_smooth_surface ***_Mirror_image,_a_reflection_in_a_mirror_or_in_water **__Signal_reflection,_in__...
_(1-X) ::\ln(\operatorname_(\Beta(\alpha,_\beta)))_=_\ln(\operatorname(\Beta(\beta,_\alpha)))_ *_Geometric_covariance_symmetry ::\ln_\operatorname(\Beta(\alpha,_\beta))=\ln_\operatorname(\Beta(\beta,_\alpha)) *_Mean_absolute_deviation_around_the_mean_symmetry ::\operatorname[, X_-_E _]_(\Beta(\alpha,_\beta))=\operatorname[, _X_-_E ]_(\Beta(\beta,_\alpha)) *_Skewness_Symmetry_(mathematics), skew-symmetry ::\operatorname_(\Beta(\alpha,_\beta)_)=_-_\operatorname_(\Beta(\beta,_\alpha)_) *_Excess_kurtosis_symmetry ::\text_(\Beta(\alpha,_\beta)_)=_\text_(\Beta(\beta,_\alpha)_) *_Characteristic_function_symmetry_of_Real_part_(with_respect_to_the_origin_of_variable_"t") ::_\text_[_1F_1(\alpha;_\alpha+\beta;_it)_]_=_\text_[__1F_1(\alpha;_\alpha+\beta;_-_it)]__ *_Characteristic_function_Symmetry_(mathematics), skew-symmetry_of_Imaginary_part_(with_respect_to_the_origin_of_variable_"t") ::_\text_[_1F_1(\alpha;_\alpha+\beta;_it)_]_=_-_\text_[__1F_1(\alpha;_\alpha+\beta;_-_it)_]__ *_Characteristic_function_symmetry_of_Absolute_value_(with_respect_to_the_origin_of_variable_"t") ::_\text_[__1F_1(\alpha;_\alpha+\beta;_it)_]_=_\text_[__1F_1(\alpha;_\alpha+\beta;_-_it)_]__ *_Differential_entropy_symmetry ::h(\Beta(\alpha,_\beta)_)=_h(\Beta(\beta,_\alpha)_) *_Relative_Entropy_(also_called_Kullback–Leibler_divergence)_symmetry ::D_(X_1, , X_2)_=_D_(X_2, , X_1),_\texth(X_1)_=_h(X_2)\text\alpha_\neq_\beta *_Fisher_information_matrix_symmetry ::__=__


Geometry of the probability density function


Inflection points

For_certain_values_of_the_shape_parameters_α_and_β,_the_probability_density_function_ In_probability_theory,_a_probability_density_function_(PDF),_or_density_of_a_continuous_random_variable,_is_a__function_whose_value_at_any_given_sample_(or_point)_in_the__sample_space_(the_set_of_possible_values_taken_by_the_random_variable)_ca_...
_has_inflection_points,_at_which_the_curvature_changes_sign.__The_position_of_these_inflection_points_can_be_useful_as_a_measure_of_the_Statistical_dispersion, dispersion_or_spread_of_the_distribution. Defining_the_following_quantity: :\kappa_=\frac Points_of_inflection_occur,_depending_on_the_value_of_the_shape_parameters_α_and_β,_as_follows: *(α_>_2,_β_>_2)_The_distribution_is_bell-shaped_(symmetric_for_α_=_β_and_skewed_otherwise),_with_two_inflection_points,_equidistant_from_the_mode: ::x_=_\text_\pm_\kappa_=_\frac *_(α_=_2,_β_>_2)_The_distribution_is_unimodal,_positively_skewed,_right-tailed,_with_one_inflection_point,_located_to_the_right_of_the_mode: ::x_=\text_+_\kappa_=_\frac *_(α_>_2,_β_=_2)_The_distribution_is_unimodal,_negatively_skewed,_left-tailed,_with_one_inflection_point,_located_to_the_left_of_the_mode: ::x_=_\text_-_\kappa_=_1_-_\frac *_(1_<_α_<_2,_β_>_2,_α+β>2)_The_distribution_is_unimodal,_positively_skewed,_right-tailed,_with_one_inflection_point,_located_to_the_right_of_the_mode: ::x_=\text_+_\kappa_=_\frac *(0_<_α_<_1,_1_<_β_<_2)_The_distribution_has_a_mode_at_the_left_end_''x''_=_0_and_it_is_positively_skewed,_right-tailed._There_is_one_inflection_point,_located_to_the_right_of_the_mode: ::x_=_\frac *(α_>_2,_1_<_β_<_2)_The_distribution_is_unimodal_negatively_skewed,_left-tailed,_with_one_inflection_point,_located_to_the_left_of_the_mode: ::x_=\text_-_\kappa_=_\frac *(1_<_α_<_2,__0_<_β_<_1)_The_distribution_has_a_mode_at_the_right_end_''x''=1_and_it_is_negatively_skewed,_left-tailed._There_is_one_inflection_point,_located_to_the_left_of_the_mode: ::x_=_\frac There_are_no_inflection_points_in_the_remaining_(symmetric_and_skewed)_regions:_U-shaped:_(α,_β_<_1)_upside-down-U-shaped:_(1_<_α_<_2,_1_<_β_<_2),_reverse-J-shaped_(α_<_1,_β_>_2)_or_J-shaped:_(α_>_2,_β_<_1) The_accompanying_plots_show_the_inflection_point_locations_(shown_vertically,_ranging_from_0_to_1)_versus_α_and_β_(the_horizontal_axes_ranging_from_0_to_5)._There_are_large_cuts_at_surfaces_intersecting_the_lines_α_=_1,_β_=_1,_α_=_2,_and_β_=_2_because_at_these_values_the_beta_distribution_change_from_2_modes,_to_1_mode_to_no_mode.


Shapes

The beta density function can take a wide variety of different shapes depending on the values of the two parameters ''α'' and ''β''. The ability of the beta distribution to take this great diversity of shapes (using only two parameters) is partly responsible for finding wide application for modeling actual measurements:


Symmetric (''α'' = ''β'')

= *_the_density_function_is_symmetry, symmetric_about_1/2_(blue_&_teal_plots). *_median_=_mean_=_1/2. *skewness__=_0. *variance_=_1/(4(2α_+_1)) *α_=_β_<_1 **U-shaped_(blue_plot). **bimodal:_left_mode_=_0,__right_mode_=1,_anti-mode_=_1/2 **1/12_<_var(''X'')_<_1/4 **−2_<_excess_kurtosis(''X'')_<_−6/5 **_α_=_β_=_1/2_is_the__arcsine_distribution ***_var(''X'')_=_1/8 ***excess_kurtosis(''X'')_=_−3/2 ***CF_=_Rinc_(t)_ **_α_=_β_→_0_is_a_2-point_Bernoulli_distribution_ In_probability_theory_and_statistics,_the_Bernoulli_distribution,_named_after_Swiss_mathematician__Jacob_Bernoulli,James_Victor_Uspensky:_''Introduction_to_Mathematical_Probability'',_McGraw-Hill,_New_York_1937,_page_45_is_the__discrete_probabi_...
_with_equal_probability_1/2_at_each__Dirac_delta_function_end_''x''_=_0_and_''x''_=_1_and_zero_probability_everywhere_else._A_coin_toss:_one_face_of_the_coin_being_''x''_=_0_and_the_other_face_being_''x''_=_1. ***__\lim__\operatorname(X)_=_\tfrac_ ***__\lim__\operatorname(X)_=_-_2__a_lower_value_than_this_is_impossible_for_any_distribution_to_reach. ***_The_information_entropy, differential_entropy_approaches_a_Maxima_and_minima, minimum_value_of_−∞ *α_=_β_=_1 **the_uniform_distribution_(continuous), uniform_,_1_ The_comma__is_a_punctuation_mark_that_appears_in_several_variants_in_different_languages._It_has_the_same_shape_as_an_apostrophe_or_single_closing_quotation_mark_()_in_many_typefaces,_but_it_differs_from_them_in_being_placed_on_the__baseline_o_...
distribution **no_mode **var(''X'')_=_1/12 **excess_kurtosis(''X'')_=_−6/5 **The_(negative_anywhere_else)_information_entropy, differential_entropy_reaches_its_Maxima_and_minima, maximum_value_of_zero **CF_=_Sinc_(t) *''α''_=_''β''_>_1 **symmetric_unimodal **_mode_=_1/2. **0_<_var(''X'')_<_1/12 **−6/5_<_excess_kurtosis(''X'')_<_0 **''α''_=_''β''_=_3/2_is_a_semi-elliptic_,_1_ The_comma__is_a_punctuation_mark_that_appears_in_several_variants_in_different_languages._It_has_the_same_shape_as_an_apostrophe_or_single_closing_quotation_mark_()_in_many_typefaces,_but_it_differs_from_them_in_being_placed_on_the__baseline_o_...
distribution,_see:_Wigner_semicircle_distribution ***var(''X'')_=_1/16. ***excess_kurtosis(''X'')_=_−1 ***CF_=_2_Jinc_(t) **''α''_=_''β''_=_2_is_the_parabolic_,_1_ The_comma__is_a_punctuation_mark_that_appears_in_several_variants_in_different_languages._It_has_the_same_shape_as_an_apostrophe_or_single_closing_quotation_mark_()_in_many_typefaces,_but_it_differs_from_them_in_being_placed_on_the__baseline_o_...
distribution ***var(''X'')_=_1/20 ***excess_kurtosis(''X'')_=_−6/7 ***CF_=_3_Tinc_(t)_ **''α''_=_''β''_>_2_is_bell-shaped,_with_inflection_points_located_to_either_side_of_the_mode ***0_<_var(''X'')_<_1/20 ***−6/7_<_excess_kurtosis(''X'')_<_0 **''α''_=_''β''_→_∞_is_a_1-point_Degenerate_distribution_ In_mathematics,_a_degenerate_distribution_is,_according_to_some,_a_probability_distribution_in_a_space_with_support_only_on_a_manifold_of_lower_dimension,_and_according_to_others_a_distribution_with_support_only_at_a_single_point._By_the_latter_d_...
_with_a__Dirac_delta_function_spike_at_the_midpoint_''x''_=_1/2_with_probability_1,_and_zero_probability_everywhere_else._There_is_100%_probability_(absolute_certainty)_concentrated_at_the_single_point_''x''_=_1/2. ***_\lim__\operatorname(X)_=_0_ ***_\lim__\operatorname(X)_=_0 ***The_information_entropy, differential_entropy_approaches_a_Maxima_and_minima, minimum_value_of_−∞


Skewed (''α'' ≠ ''β'')

The density function is skewed. An interchange of parameter values yields the mirror image (the reverse) of the initial curve. Some more specific cases:
*''α'' < 1, ''β'' < 1
** U-shaped
** Positive skew for α < β, negative skew for α > β.
** bimodal: left mode = 0, right mode = 1, anti-mode = (1 − α)/(2 − α − β)
** 0 < median < 1.
** 0 < var(''X'') < 1/4
*α > 1, β > 1
** unimodal (magenta & cyan plots),
**Positive skew for α < β, negative skew for α > β.
**mode = (α − 1)/(α + β − 2)
** 0 < median < 1
** 0 < var(''X'') < 1/12
*α < 1, β ≥ 1
**reverse J-shaped with a right tail,
**positively skewed,
**strictly decreasing, convex
** mode = 0
** 0 < median < 1/2.
** 0 < var(''X'') < (−11 + 5√5)/2 (maximum variance occurs for α = (√5 − 1)/2, β = 1, that is, α = Φ, the golden ratio conjugate)
*α ≥ 1, β < 1
**J-shaped with a left tail,
**negatively skewed,
**strictly increasing, convex
** mode = 1
** 1/2 < median < 1
** 0 < var(''X'') < (−11 + 5√5)/2 (maximum variance occurs for α = 1, β = (√5 − 1)/2, that is, β = Φ, the golden ratio conjugate)
*α = 1, β > 1
**positively skewed,
**strictly decreasing (red plot),
**a reversed (mirror-image) power function [0, 1] distribution
** mean = 1 / (β + 1)
** median = 1 − (1/2)^(1/β)
** mode = 0
**α = 1, 1 < β < 2
***concave
*** 1 − 1/√2 < median < 1/2
*** 1/18 < var(''X'') < 1/12.
**α = 1, β = 2
***a straight line with slope −2, the right-triangular distribution with right angle at the left end, at ''x'' = 0
*** median = 1 − 1/√2
*** var(''X'') = 1/18
**α = 1, β > 2
***reverse J-shaped with a right tail,
***convex
*** 0 < median < 1 − 1/√2
*** 0 < var(''X'') < 1/18
*α > 1, β = 1
**negatively skewed,
**strictly increasing (green plot),
**the power function [0, 1] distribution
** mean = α / (α + 1)
** median = (1/2)^(1/α)
** mode = 1
**2 > α > 1, β = 1
***concave
*** 1/2 < median < 1/√2
*** 1/18 < var(''X'') < 1/12
** α = 2, β = 1
***a straight line with slope +2, the right-triangular distribution with right angle at the right end, at ''x'' = 1
*** median = 1/√2
*** var(''X'') = 1/18
**α > 2, β = 1
***J-shaped with a left tail, convex
***1/√2 < median < 1
*** 0 < var(''X'') < 1/18


Related distributions


Transformations

* If ''X'' ~ Beta(''α'', ''β'') then 1 − ''X'' ~ Beta(''β'', ''α''), mirror-image symmetry
* If ''X'' ~ Beta(''α'', ''β'') then \tfrac{X}{1-X} \sim \beta'(\alpha,\beta), the beta prime distribution, also called "beta distribution of the second kind".
* If ''X'' ~ Beta(''α'', ''β'') then \tfrac{1}{X} - 1 = \tfrac{1-X}{X} \sim \beta'(\beta,\alpha).
* If ''X'' ~ Beta(''n''/2, ''m''/2) then \tfrac{mX}{n(1-X)} \sim F(n,m) (assuming ''n'' > 0 and ''m'' > 0), the Fisher–Snedecor F distribution.
* If X \sim \operatorname{Beta}\left(1+\lambda\tfrac{m-\min}{\max-\min},\, 1 + \lambda\tfrac{\max-m}{\max-\min}\right) then min + ''X''(max − min) ~ PERT(min, max, ''m'', ''λ''), where ''PERT'' denotes a PERT distribution used in PERT analysis, and ''m'' = most likely value.Herrerías-Velasco, José Manuel and Herrerías-Pleguezuelo, Rafael and René van Dorp, Johan. (2011). Revisiting the PERT mean and Variance. European Journal of Operational Research (210), p. 448–451. Traditionally ''λ'' = 4 in PERT analysis.
* If ''X'' ~ Beta(1, ''β'') then ''X'' ~ Kumaraswamy distribution with parameters (1, ''β'')
* If ''X'' ~ Beta(''α'', 1) then ''X'' ~ Kumaraswamy distribution with parameters (''α'', 1)
* If ''X'' ~ Beta(''α'', 1) then −ln(''X'') ~ Exponential(''α'')
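Two of these transformations can be checked numerically. The following sketch (assuming NumPy and SciPy are available; the parameter values are arbitrary) compares simulated 1 − ''X'' and −ln(''X'') against the stated target distributions with a Kolmogorov–Smirnov test:

 # Numerical check of the mirror-image and exponential transformations above.
 import numpy as np
 from scipy import stats
 
 rng = np.random.default_rng(0)
 alpha, beta_ = 2.5, 4.0
 x = rng.beta(alpha, beta_, size=100_000)
 
 # 1 - X ~ Beta(beta, alpha): compare against the theoretical CDF.
 print(stats.kstest(1 - x, stats.beta(beta_, alpha).cdf))
 
 # For X ~ Beta(alpha, 1), -ln(X) ~ Exponential(alpha) (i.e. scale 1/alpha).
 y = rng.beta(alpha, 1.0, size=100_000)
 print(stats.kstest(-np.log(y), stats.expon(scale=1 / alpha).cdf))

Large Kolmogorov–Smirnov p-values in both tests are consistent with the stated identities.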


Special and limiting cases

* Beta(1, 1) ~ U(0, 1), the continuous uniform distribution.
* Beta(''n'', 1) ~ Maximum of ''n'' independent rvs. with U(0, 1), sometimes called a ''standard power function distribution'' with density ''n'' ''x''^(''n''−1) on that interval.
* Beta(1, ''n'') ~ Minimum of ''n'' independent rvs. with U(0, 1)
* If ''X'' ~ Beta(3/2, 3/2) and ''r'' > 0 then 2''rX'' − ''r'' ~ Wigner semicircle distribution.
* Beta(1/2, 1/2) is equivalent to the arcsine distribution. This distribution is also Jeffreys prior probability for the Bernoulli and binomial distributions. The arcsine probability density appears in several fundamental random-walk theorems. In a fair coin toss random walk, the probability for the time of the last visit to the origin is distributed as a (U-shaped) arcsine distribution. In a two-player fair-coin-toss game, a player is said to be in the lead if the random walk (that started at the origin) is above the origin. The most probable number of times that a given player will be in the lead, in a game of length 2''N'', is not ''N''. On the contrary, ''N'' is the least likely number of times that the player will be in the lead. The most likely number of times in the lead is 0 or 2''N'' (following the arcsine distribution).
* \lim_{n \to \infty} n \operatorname{Beta}(1,n) = \operatorname{Exponential}(1), the exponential distribution.
* \lim_{n \to \infty} n \operatorname{Beta}(k,n) = \operatorname{Gamma}(k,1), the gamma distribution.
* For large n, \operatorname{Beta}(\alpha n,\beta n) \to \mathcal{N}\left(\frac{\alpha}{\alpha+\beta},\frac{1}{n}\frac{\alpha\beta}{(\alpha+\beta)^3}\right), the normal distribution. More precisely, if X_n \sim \operatorname{Beta}(\alpha n,\beta n) then \sqrt{n}\left(X_n -\tfrac{\alpha}{\alpha+\beta}\right) converges in distribution to a normal distribution with mean 0 and variance \tfrac{\alpha\beta}{(\alpha+\beta)^3} as ''n'' increases.
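As a quick illustration of the maximum-of-uniforms case above, the following sketch (not from the source; ''n'' and the sample size are arbitrary) simulates the maximum of ''n'' independent U(0, 1) variables and tests it against Beta(''n'', 1):

 # Simulation check: max of n independent U(0,1) variables follows Beta(n, 1).
 import numpy as np
 from scipy import stats
 
 rng = np.random.default_rng(1)
 n, samples = 5, 200_000
 max_of_uniforms = rng.uniform(size=(samples, n)).max(axis=1)
 print(stats.kstest(max_of_uniforms, stats.beta(n, 1).cdf))  # large p-value expected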


Derived from other distributions

* The ''k''th order statistic of a sample of size ''n'' from the uniform distribution is a beta random variable, ''U''(''k'') ~ Beta(''k'', ''n'' + 1 − ''k'').
* If ''X'' ~ Gamma(α, θ) and ''Y'' ~ Gamma(β, θ) are independent, then \tfrac{X}{X+Y} \sim \operatorname{Beta}(\alpha, \beta)\,.
* If X \sim \chi^2(\alpha)\, and Y \sim \chi^2(\beta)\, are independent, then \tfrac{X}{X+Y} \sim \operatorname{Beta}(\tfrac{\alpha}{2}, \tfrac{\beta}{2}).
* If ''X'' ~ U(0, 1) and ''α'' > 0 then ''X''^(1/''α'') ~ Beta(''α'', 1), the power function distribution.
* If X \sim\operatorname{Bin}(k;n;p), then \tfrac{X}{n} \sim \operatorname{Beta}(\alpha, \beta) for discrete values of ''n'' and ''k'', where \alpha=k+1 and \beta=n-k+1.
* If ''X'' ~ Cauchy(0, 1) then \tfrac{1}{1+X^2} \sim \operatorname{Beta}\left(\tfrac12, \tfrac12\right)\,
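The gamma-ratio construction in the second bullet lends itself to a simple simulation check; the sketch below (arbitrary values of α, β and θ, NumPy/SciPy assumed) compares X/(X + Y) against the corresponding beta distribution:

 # Simulation check of the gamma-ratio construction: X/(X+Y) ~ Beta(alpha, beta).
 import numpy as np
 from scipy import stats
 
 rng = np.random.default_rng(2)
 alpha, beta_, theta = 2.0, 5.0, 3.0
 x = rng.gamma(shape=alpha, scale=theta, size=100_000)
 y = rng.gamma(shape=beta_, scale=theta, size=100_000)
 print(stats.kstest(x / (x + y), stats.beta(alpha, beta_).cdf))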


Combination with other distributions

* If ''X'' ~ Beta(''α'', ''β'') and ''Y'' ~ F(2''β'', 2''α'') then \Pr\left(X \leq \tfrac{\alpha}{\alpha+\beta x}\right) = \Pr(Y \geq x)\, for all ''x'' > 0.


Compounding with other distributions

* If ''p'' ~ Beta(α, β) and ''X'' ~ Bin(''k'', ''p'') then ''X'' ~ beta-binomial distribution
* If ''p'' ~ Beta(α, β) and ''X'' ~ NB(''r'', ''p'') then ''X'' ~ beta negative binomial distribution
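The compounding described above is easy to illustrate by simulation; the sketch below (arbitrary parameter values) draws ''p'' from a beta distribution, then ''X'' from a binomial with that ''p'', and compares the resulting mean with the known beta-binomial mean ''kα''/(''α'' + ''β''):

 # Compounding sketch: p ~ Beta(alpha, beta), then X ~ Bin(k, p) is beta-binomial.
 import numpy as np
 
 rng = np.random.default_rng(3)
 alpha, beta_, k = 2.0, 3.0, 10
 p = rng.beta(alpha, beta_, size=100_000)
 x = rng.binomial(k, p)
 print(x.mean(), k * alpha / (alpha + beta_))  # the two values should be close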


Generalisations

* The generalization to multiple variables, i.e. a multivariate beta distribution, is called a Dirichlet distribution. Univariate marginals of the Dirichlet distribution have a beta distribution. The beta distribution is conjugate to the binomial and Bernoulli distributions in exactly the same way as the Dirichlet distribution is conjugate to the multinomial distribution and categorical distribution.
* The Pearson type I distribution is identical to the beta distribution (except for arbitrary shifting and re-scaling that can also be accomplished with the four-parameter parametrization of the beta distribution).
* The beta distribution is the special case of the noncentral beta distribution where \lambda = 0: \operatorname{Beta}(\alpha, \beta) = \operatorname{Beta}(\alpha,\beta,0).
* The generalized beta distribution is a five-parameter distribution family which has the beta distribution as a special case.
* The matrix variate beta distribution is a distribution for positive-definite matrices.


Statistical inference


Parameter estimation


Method of moments


Two unknown parameters

Two unknown parameters (\hat{\alpha}, \hat{\beta}) of a beta distribution supported in the [0, 1] interval can be estimated, using the method of moments, with the first two moments (sample mean and sample variance) as follows. Let:
: \text{sample mean}(X) = \bar{x} = \frac{1}{N}\sum_{i=1}^N X_i
be the sample mean estimate and
: \text{sample variance}(X) = \bar{v} = \frac{1}{N-1}\sum_{i=1}^N (X_i - \bar{x})^2
be the sample variance estimate. The method-of-moments estimates of the parameters are
:\hat{\alpha} = \bar{x} \left(\frac{\bar{x}(1 - \bar{x})}{\bar{v}} - 1 \right), if \bar{v} < \bar{x}(1 - \bar{x}),
:\hat{\beta} = (1-\bar{x}) \left(\frac{\bar{x}(1 - \bar{x})}{\bar{v}} - 1 \right), if \bar{v} < \bar{x}(1 - \bar{x}).
When the distribution is required over a known interval other than [0, 1] with random variable ''X'', say [''a'', ''c''] with random variable ''Y'', then replace \bar{x} with \frac{\bar{y}-a}{c-a} and \bar{v} with \frac{\bar{v}_Y}{(c-a)^2} in the above couple of equations for the shape parameters (see the "Alternative parametrizations, four parameters" section below), where:
: \text{sample mean}(Y) = \bar{y} = \frac{1}{N}\sum_{i=1}^N Y_i
: \text{sample variance}(Y) = \bar{v}_Y = \frac{1}{N-1}\sum_{i=1}^N (Y_i - \bar{y})^2
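A minimal implementation of these two estimators might look as follows (a sketch assuming NumPy; the helper name beta_mom and the choice of the unbiased N − 1 sample variance are conventions of this example, not prescribed by the source):

 # Method-of-moments sketch for the two-parameter beta distribution on [0, 1].
 import numpy as np
 
 def beta_mom(samples):
     x_bar = np.mean(samples)
     v_bar = np.var(samples, ddof=1)
     if v_bar >= x_bar * (1 - x_bar):
         raise ValueError("moment condition v < x(1 - x) violated")
     common = x_bar * (1 - x_bar) / v_bar - 1
     return x_bar * common, (1 - x_bar) * common  # (alpha_hat, beta_hat)
 
 rng = np.random.default_rng(4)
 print(beta_mom(rng.beta(2.0, 6.0, size=50_000)))  # roughly (2, 6)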


Four unknown parameters

All four parameters (\hat{\alpha}, \hat{\beta}, \hat{a}, \hat{c}, for a beta distribution supported in the [''a'', ''c''] interval; see the section "Alternative parametrizations, Four parameters") can be estimated, using the method of moments developed by Karl Pearson, by equating sample and population values of the first four central moments (mean, variance, skewness and excess kurtosis).
The excess kurtosis was expressed in terms of the square of the skewness and the sample size ν = α + β (see the previous section "Kurtosis") as follows:
:\text{excess kurtosis} =\frac{6}{3+\nu}\left(\frac{2+\nu}{4} (\text{skewness})^2 - 1\right)\text{ if }(\text{skewness})^2-2< \text{excess kurtosis}< \tfrac{3}{2} (\text{skewness})^2
One can use this equation to solve for the sample size ν = α + β in terms of the square of the skewness and the excess kurtosis as follows:
:\hat{\nu} = \hat{\alpha} + \hat{\beta} = 3\frac{(\text{sample excess kurtosis}) -(\text{sample skewness})^2+2}{\frac{3}{2} (\text{sample skewness})^2 - (\text{sample excess kurtosis})}
:\text{if }(\text{sample skewness})^2-2< \text{sample excess kurtosis}< \tfrac{3}{2} (\text{sample skewness})^2
This is the ratio (multiplied by a factor of 3) between the previously derived limit boundaries for the beta distribution in a space (as originally done by Karl Pearson) defined with coordinates of the square of the skewness in one axis and the excess kurtosis in the other axis.
The case of zero skewness can be solved immediately, because for zero skewness α = β and hence ν = 2α = 2β, therefore α = β = ν/2:
: \hat{\alpha} = \hat{\beta} = \frac{\hat{\nu}}{2}= \frac{\frac{3}{2}(\text{sample excess kurtosis}) +3}{- (\text{sample excess kurtosis})}
: \text{if sample skewness}= 0 \text{ and } -2<\text{sample excess kurtosis}<0
(Excess kurtosis is negative for the beta distribution with zero skewness, ranging from −2 to 0, so that \hat{\nu} — and therefore the sample shape parameters — is positive, ranging from zero when the shape parameters approach zero and the excess kurtosis approaches −2, to infinity when the shape parameters approach infinity and the excess kurtosis approaches zero.)
For non-zero sample skewness one needs to solve a system of two coupled equations. Since the skewness and the excess kurtosis are independent of the parameters \hat{a}, \hat{c}, the parameters \hat{\alpha}, \hat{\beta} can be uniquely determined from the sample skewness and the sample excess kurtosis, by solving the coupled equations with two known variables (sample skewness and sample excess kurtosis) and two unknowns (the shape parameters):
:(\text{skewness})^2 = \frac{4(\beta-\alpha)^2 (1+\nu)}{\alpha\beta(2+\nu)^2}
:\text{excess kurtosis} =\frac{6}{3+\nu}\left(\frac{2+\nu}{4} (\text{skewness})^2 - 1\right)
:\text{if }(\text{skewness})^2-2< \text{excess kurtosis}< \tfrac{3}{2}(\text{skewness})^2
resulting in the following solution:
: \hat{\alpha}, \hat{\beta} = \frac{\hat{\nu}}{2} \left (1 \pm \frac{1}{ \sqrt{1+ \frac{16 (\hat{\nu} + 1)}{(\hat{\nu}+ 2)^2(\text{sample skewness})^2}}} \right )
: \text{if sample skewness}\neq 0 \text{ and } (\text{sample skewness})^2-2< \text{sample excess kurtosis}< \tfrac{3}{2} (\text{sample skewness})^2
where one should take the solutions as follows: \hat{\alpha}>\hat{\beta} for (negative) sample skewness < 0, and \hat{\alpha}<\hat{\beta} for (positive) sample skewness > 0.
The accompanying plot shows these two solutions as surfaces in a space with horizontal axes of (sample excess kurtosis) and (sample squared skewness) and the shape parameters as the vertical axis. The surfaces are constrained by the condition that the sample excess kurtosis must be bounded by the sample squared skewness as stipulated in the above equation. The two surfaces meet at the right edge defined by zero skewness. Along this right edge, both parameters are equal and the distribution is symmetric U-shaped for α = β < 1, uniform for α = β = 1, upside-down-U-shaped for 1 < α = β < 2 and bell-shaped for α = β > 2. The surfaces also meet at the front (lower) edge defined by the "impossible boundary" line (excess kurtosis + 2 − skewness^2 = 0). Along this front (lower) boundary both shape parameters approach zero, and the probability density is concentrated more at one end than the other end (with practically nothing in between), with probabilities p = \tfrac{\beta}{\alpha+\beta} at the left end ''x'' = 0 and q = 1 - p = \tfrac{\alpha}{\alpha+\beta} at the right end ''x'' = 1. The two surfaces become further apart towards the rear edge. At this rear edge the surface parameters are quite different from each other. As remarked, for example, by Bowman and Shenton, sampling in the neighborhood of the line (sample excess kurtosis − (3/2)(sample skewness)^2 = 0) (the just-J-shaped portion of the rear edge where blue meets beige) "is dangerously near to chaos", because at that line the denominator of the expression above for the estimate ν = α + β becomes zero and hence ν approaches infinity as that line is approached. Bowman and Shenton write that "the higher moment parameters (kurtosis and skewness) are extremely fragile (near that line). However, the mean and standard deviation are fairly reliable." Therefore, the problem arises for four-parameter estimation of very skewed distributions such that the excess kurtosis approaches (3/2) times the square of the skewness. This boundary line is produced by extremely skewed distributions with very large values of one of the parameters and very small values of the other parameter. As remarked by Karl Pearson himself, this issue may not be of much practical importance, as this trouble arises only for very skewed J-shaped (or mirror-image J-shaped) distributions with very different values of shape parameters that are unlikely to occur much in practice. The usual skewed bell-shaped distributions that occur in practice do not have this parameter estimation problem.
The remaining two parameters \hat{a}, \hat{c} can be determined using the sample mean and the sample variance using a variety of equations. One alternative is to calculate the support interval range (\hat{c}-\hat{a}) based on the sample variance and the sample kurtosis. For this purpose one can solve, in terms of the range (\hat{c}-\hat{a}), the equation expressing the excess kurtosis in terms of the sample variance and the sample size ν (see the sections titled "Kurtosis" and "Alternative parametrizations, four parameters"):
:\text{excess kurtosis} =\frac{6}{(\hat{\nu}+2)(\hat{\nu}+3)}\bigg(\frac{(\hat{c}-\hat{a})^2}{\text{(sample variance)}} - 6 - 5 \hat{\nu} \bigg)
to obtain:
: (\hat{c}- \hat{a}) = \sqrt{\text{(sample variance)}}\sqrt{6+5\hat{\nu}+\frac{(\hat{\nu}+2)(\hat{\nu}+3)}{6}\text{(sample excess kurtosis)}}
Another alternative is to calculate the support interval range (\hat{c}-\hat{a}) based on the sample variance and the sample skewness. For this purpose one can solve, in terms of the range (\hat{c}-\hat{a}), the equation expressing the squared skewness in terms of the sample variance and the sample size ν (see the sections titled "Skewness" and "Alternative parametrizations, four parameters"):
:(\text{skewness})^2 = \frac{4}{(\hat{\nu}+2)^2}\bigg(\frac{(\hat{c}-\hat{a})^2}{\text{(sample variance)}}-4(1+\hat{\nu})\bigg)
to obtain:
: (\hat{c}- \hat{a}) = \frac{\hat{\nu}+2}{2}\sqrt{\text{(sample variance)}}\sqrt{(\text{sample skewness})^2+\frac{16(1+\hat{\nu})}{(\hat{\nu}+2)^2}}
The remaining parameter can be determined from the sample mean and the previously obtained parameters (\hat{c}-\hat{a}), \hat{\alpha}, \hat{\nu} = \hat{\alpha}+\hat{\beta}:
: \hat{a} = (\text{sample mean}) - \left(\frac{\hat{\alpha}}{\hat{\nu}}\right)(\hat{c}-\hat{a})
and finally, \hat{c}= (\hat{c}- \hat{a}) + \hat{a}.
In the above formulas one may take, for example, as estimates of the sample moments:
:\begin{align}
\text{sample mean} &=\overline{y} = \frac{1}{N}\sum_{i=1}^N Y_i \\
\text{sample variance} &= \overline{v}_Y = \frac{1}{N-1}\sum_{i=1}^N (Y_i - \overline{y})^2 \\
\text{sample skewness} &= G_1 = \frac{N}{(N-1)(N-2)} \frac{\sum_{i=1}^N (Y_i-\overline{y})^3}{\overline{v}_Y^{3/2}} \\
\text{sample excess kurtosis} &= G_2 = \frac{N(N+1)}{(N-1)(N-2)(N-3)} \frac{\sum_{i=1}^N (Y_i-\overline{y})^4}{\overline{v}_Y^2} - \frac{3(N-1)^2}{(N-2)(N-3)}
\end{align}
The estimators ''G''1 for sample skewness and ''G''2 for sample kurtosis are used by DAP/SAS, PSPP/SPSS, and Excel. However, they are not used by BMDP and (according to Joanes and Gill) they were not used by MINITAB in 1998. Actually, Joanes and Gill in their 1998 study concluded that the skewness and kurtosis estimators used in BMDP and in MINITAB (at that time) had smaller variance and mean-squared error in normal samples, but the skewness and kurtosis estimators used in DAP/SAS and PSPP/SPSS, namely ''G''1 and ''G''2, had smaller mean-squared error in samples from a very skewed distribution. It is for this reason that "sample skewness", etc., have been spelled out in the above formulas, to make it explicit that the user should choose the best estimator according to the problem at hand, as the best estimator for skewness and kurtosis depends on the amount of skewness (as shown by Joanes and Gill).
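The moment solution above can be exercised on simulated data. The sketch below (assuming SciPy's bias-corrected ''G''1/''G''2-style estimators via stats.skew and stats.kurtosis; all parameter values are arbitrary) recovers ν̂ and the two shape parameters from the sample skewness and excess kurtosis:

 # Pearson-style moment sketch: nu and the shape parameters from skewness/kurtosis.
 import numpy as np
 from scipy import stats
 
 rng = np.random.default_rng(5)
 y = 1.0 + 4.0 * rng.beta(2.0, 5.0, size=200_000)   # a Beta(2, 5) rescaled to [1, 5]
 
 skew = stats.skew(y, bias=False)                   # G1-type sample skewness
 kurt = stats.kurtosis(y, bias=False)               # G2-type sample excess kurtosis
 nu = 3 * (kurt - skew**2 + 2) / (1.5 * skew**2 - kurt)
 delta = 1 / np.sqrt(1 + 16 * (nu + 1) / ((nu + 2) ** 2 * skew**2))
 a_small, a_large = nu / 2 * (1 - delta), nu / 2 * (1 + delta)
 # positive skew -> alpha < beta; negative skew -> alpha > beta
 alpha_hat, beta_hat = (a_small, a_large) if skew > 0 else (a_large, a_small)
 print(alpha_hat, beta_hat)                         # roughly 2 and 5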


Maximum likelihood


Two unknown parameters

As is also the case for maximum likelihood estimates for the gamma distribution, the maximum likelihood estimates for the beta distribution do not have a general closed form solution for arbitrary values of the shape parameters. If ''X''1, ..., ''X''''N'' are independent random variables each having a beta distribution, the joint log likelihood function for ''N'' iid observations is:
:\begin{align}
\ln\, \mathcal{L} (\alpha, \beta\mid X) &= \sum_{i=1}^N \ln \left (\mathcal{L}_i (\alpha, \beta\mid X_i) \right )\\
&= \sum_{i=1}^N \ln \left (f(X_i;\alpha,\beta) \right ) \\
&= \sum_{i=1}^N \ln \left (\frac{X_i^{\alpha-1}(1-X_i)^{\beta-1}}{\Beta(\alpha,\beta)} \right ) \\
&= (\alpha - 1)\sum_{i=1}^N \ln (X_i) + (\beta- 1)\sum_{i=1}^N \ln (1-X_i) - N \ln \Beta(\alpha,\beta)
\end{align}
Finding the maximum with respect to a shape parameter involves taking the partial derivative with respect to the shape parameter and setting the expression equal to zero, yielding the maximum likelihood estimator of the shape parameters:
:\frac{\partial \ln \mathcal{L}(\alpha,\beta\mid X)}{\partial \alpha} = \sum_{i=1}^N \ln X_i -N\frac{\partial \ln \Beta(\alpha,\beta)}{\partial \alpha}=0
:\frac{\partial \ln \mathcal{L}(\alpha,\beta\mid X)}{\partial \beta} = \sum_{i=1}^N \ln (1-X_i)- N\frac{\partial \ln \Beta(\alpha,\beta)}{\partial \beta}=0
where:
:\frac{\partial \ln \Beta(\alpha,\beta)}{\partial \alpha} = -\frac{\partial \ln \Gamma(\alpha+\beta)}{\partial \alpha}+ \frac{\partial \ln \Gamma(\alpha)}{\partial \alpha}+ \frac{\partial \ln \Gamma(\beta)}{\partial \alpha}=-\psi(\alpha + \beta) + \psi(\alpha) + 0
:\frac{\partial \ln \Beta(\alpha,\beta)}{\partial \beta}= - \frac{\partial \ln \Gamma(\alpha+\beta)}{\partial \beta}+ \frac{\partial \ln \Gamma(\alpha)}{\partial \beta} + \frac{\partial \ln \Gamma(\beta)}{\partial \beta}=-\psi(\alpha + \beta) + 0 + \psi(\beta)
since the digamma function, denoted ψ(α), is defined as the logarithmic derivative of the gamma function:
:\psi(\alpha) =\frac{\partial \ln \Gamma(\alpha)}{\partial \alpha}
To ensure that the values with zero tangent slope are indeed a maximum (instead of a saddle-point or a minimum) one has to also satisfy the condition that the curvature is negative. This amounts to satisfying that the second partial derivative with respect to the shape parameters is negative:
:\frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{\partial \alpha^2}= -N\frac{\partial^2\ln \Beta(\alpha,\beta)}{\partial \alpha^2}<0
:\frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{\partial \beta^2} = -N\frac{\partial^2\ln \Beta(\alpha,\beta)}{\partial \beta^2}<0
Using the previous equations, this is equivalent to:
:\frac{\partial^2\ln \Beta(\alpha,\beta)}{\partial \alpha^2} = \psi_1(\alpha)-\psi_1(\alpha + \beta) > 0
:\frac{\partial^2\ln \Beta(\alpha,\beta)}{\partial \beta^2} = \psi_1(\beta) -\psi_1(\alpha + \beta) > 0
where the trigamma function, denoted ''ψ''1(''α''), is the second of the polygamma functions, and is defined as the derivative of the digamma function:
:\psi_1(\alpha) = \frac{d^2\ln\Gamma(\alpha)}{d\alpha^2}=\, \frac{d\psi(\alpha)}{d\alpha}.
These conditions are equivalent to stating that the variances of the logarithmically transformed variables are positive, since:
:\operatorname{var}[\ln (X)] = \operatorname{E}[\ln^2 (X)] - (\operatorname{E}[\ln (X)])^2 = \psi_1(\alpha) - \psi_1(\alpha + \beta)
:\operatorname{var}[\ln (1-X)] = \operatorname{E}[\ln^2 (1-X)] - (\operatorname{E}[\ln (1-X)])^2 = \psi_1(\beta) - \psi_1(\alpha + \beta)
Therefore, the condition of negative curvature at a maximum is equivalent to the statements:
: \operatorname{var}[\ln (X)] > 0
: \operatorname{var}[\ln (1-X)]> 0
Alternatively, the condition of negative curvature at a maximum is also equivalent to stating that the following logarithmic derivatives of the geometric means ''G''''X'' and ''G''(1−''X'') are positive, since:
: \psi_1(\alpha) - \psi_1(\alpha + \beta) = \frac{\partial \ln G_X}{\partial \alpha} > 0
: \psi_1(\beta) - \psi_1(\alpha + \beta) = \frac{\partial \ln G_{(1-X)}}{\partial \beta} > 0
While these slopes are indeed positive, the other slopes are negative:
:\frac{\partial \ln G_X}{\partial \beta}, \frac{\partial \ln G_{(1-X)}}{\partial \alpha} < 0.
The slopes of the mean and the median with respect to ''α'' and ''β'' display similar sign behavior.
From the condition that at a maximum the partial derivative with respect to the shape parameter equals zero, we obtain the following system of coupled maximum likelihood estimate equations (for the average log-likelihoods) that needs to be inverted to obtain the (unknown) shape parameter estimates \hat{\alpha},\hat{\beta} in terms of the (known) average of logarithms of the samples ''X''1, ..., ''X''''N'':
:\begin{align}
\hat{\operatorname{E}}[\ln (X)] &= \psi(\hat{\alpha}) - \psi(\hat{\alpha} + \hat{\beta})=\frac{1}{N}\sum_{i=1}^N \ln X_i = \ln \hat{G}_X \\
\hat{\operatorname{E}}[\ln(1-X)] &= \psi(\hat{\beta}) - \psi(\hat{\alpha} + \hat{\beta})=\frac{1}{N}\sum_{i=1}^N \ln (1-X_i)= \ln \hat{G}_{(1-X)}
\end{align}
where we recognize \ln \hat{G}_X as the logarithm of the sample geometric mean and \ln \hat{G}_{(1-X)} as the logarithm of the sample geometric mean based on (1 − ''X''), the mirror-image of ''X''. For \hat{\alpha}=\hat{\beta}, it follows that \hat{G}_X=\hat{G}_{(1-X)}.
:\begin{align}
\hat{G}_X &= \prod_{i=1}^N (X_i)^{1/N} \\
\hat{G}_{(1-X)} &= \prod_{i=1}^N (1-X_i)^{1/N}
\end{align}
These coupled equations containing digamma functions of the shape parameter estimates \hat{\alpha},\hat{\beta} must be solved by numerical methods, as done, for example, by Beckman et al. Gnanadesikan et al. give numerical solutions for a few cases. N.L. Johnson and S. Kotz suggest that for "not too small" shape parameter estimates \hat{\alpha},\hat{\beta}, the logarithmic approximation to the digamma function \psi(\hat{\alpha}) \approx \ln(\hat{\alpha}-\tfrac{1}{2}) may be used to obtain initial values for an iterative solution, since the equations resulting from this approximation can be solved exactly:
:\ln \frac{\hat{\alpha} - \tfrac{1}{2}}{\hat{\alpha}+\hat{\beta}-\tfrac{1}{2}} \approx \ln \hat{G}_X
:\ln \frac{\hat{\beta}-\tfrac{1}{2}}{\hat{\alpha}+\hat{\beta}-\tfrac{1}{2}}\approx \ln \hat{G}_{(1-X)}
which leads to the following solution for the initial values (of the estimated shape parameters in terms of the sample geometric means) for an iterative solution:
:\hat{\alpha}\approx \tfrac{1}{2} + \frac{\hat{G}_X}{2(1-\hat{G}_X-\hat{G}_{(1-X)})} \text{ if } \hat{\alpha} >1
:\hat{\beta}\approx \tfrac{1}{2} + \frac{\hat{G}_{(1-X)}}{2(1-\hat{G}_X-\hat{G}_{(1-X)})} \text{ if } \hat{\beta} > 1
Alternatively, the estimates provided by the method of moments can instead be used as initial values for an iterative solution of the maximum likelihood coupled equations in terms of the digamma functions.
When the distribution is required over a known interval other than [0, 1] with random variable ''X'', say [''a'', ''c''] with random variable ''Y'', then replace ln(''Xi'') in the first equation with
:\ln \frac{Y_i-a}{c-a},
and replace ln(1 − ''Xi'') in the second equation with
:\ln \frac{c-Y_i}{c-a}
(see the "Alternative parametrizations, four parameters" section below).
If one of the shape parameters is known, the problem is considerably simplified. The following logit transformation can be used to solve for the unknown shape parameter (for skewed cases such that \hat{\alpha}\neq\hat{\beta}; otherwise, if symmetric, both — equal — parameters are known when one is known):
:\hat{\operatorname{E}} \left[\ln \left(\frac{X}{1-X} \right) \right]=\psi(\hat{\alpha}) - \psi(\hat{\beta})=\frac{1}{N}\sum_{i=1}^N \ln\frac{X_i}{1-X_i} = \ln \hat{G}_X - \ln \hat{G}_{(1-X)}
This logit transformation is the logarithm of the transformation that divides the variable ''X'' by its mirror-image (''X''/(1 − ''X'')), resulting in the "inverted beta distribution" or beta prime distribution (also known as beta distribution of the second kind or Pearson's Type VI) with support [0, +∞). As previously discussed in the section "Moments of logarithmically transformed random variables," the logit transformation \ln\frac{X}{1-X}, studied by Johnson, extends the finite support [0, 1] based on the original variable ''X'' to infinite support in both directions of the real line (−∞, +∞). If, for example, \hat{\beta} is known, the unknown parameter \hat{\alpha} can be obtained in terms of the inverse digamma function of the right hand side of this equation:
:\psi(\hat{\alpha})=\frac{1}{N}\sum_{i=1}^N \ln\frac{X_i}{1-X_i} + \psi(\hat{\beta})
:\hat{\alpha}=\psi^{-1}(\ln \hat{G}_X - \ln \hat{G}_{(1-X)} + \psi(\hat{\beta}))
In particular, if one of the shape parameters has a value of unity, for example for \hat{\beta} = 1 (the power function distribution with bounded support [0,1]), using the identity ψ(''x'' + 1) = ψ(''x'') + 1/''x'' in the equation \psi(\hat{\alpha}) - \psi(\hat{\alpha} + \hat{\beta})= \ln \hat{G}_X, the maximum likelihood estimator for the unknown parameter \hat{\alpha} is, exactly:
:\hat{\alpha}= - \frac{1}{\ln \hat{G}_X}= - \frac{N}{\sum_{i=1}^N \ln X_i}
The beta has support [0, 1], therefore \hat{G}_X < 1, and hence (-\ln \hat{G}_X) >0, and therefore \hat{\alpha} >0.
In conclusion, the maximum likelihood estimates of the shape parameters of a beta distribution are (in general) a complicated function of the sample geometric mean and of the sample geometric mean based on (1 − ''X''), the mirror-image of ''X''. One may ask: if the variance (in addition to the mean) is necessary to estimate two shape parameters with the method of moments, why is the (logarithmic or geometric) variance not necessary to estimate two shape parameters with the maximum likelihood method, for which only the geometric means suffice? The answer is that the mean does not provide as much information as the geometric mean. For a beta distribution with equal shape parameters ''α'' = ''β'', the mean is exactly 1/2, regardless of the value of the shape parameters, and therefore regardless of the value of the statistical dispersion (the variance). On the other hand, the geometric mean of a beta distribution with equal shape parameters ''α'' = ''β'' depends on the value of the shape parameters, and therefore it contains more information. Also, the geometric mean of a beta distribution does not satisfy the symmetry conditions satisfied by the mean; therefore, by employing both the geometric mean based on ''X'' and the geometric mean based on (1 − ''X''), the maximum likelihood method is able to provide best estimates for both parameters ''α'' = ''β'', without need of employing the variance.
One can express the joint log likelihood per ''N'' iid observations in terms of the ''sufficient statistics'' (the sample geometric means) as follows:
:\frac{\ln \mathcal{L} (\alpha, \beta\mid X)}{N} = (\alpha - 1)\ln \hat{G}_X + (\beta- 1)\ln \hat{G}_{(1-X)}- \ln \Beta(\alpha,\beta).
We can plot the joint log likelihood per ''N'' observations for fixed values of the sample geometric means to see the behavior of the likelihood function as a function of the shape parameters α and β. In such a plot, the shape parameter estimators \hat{\alpha},\hat{\beta} correspond to the maxima of the likelihood function. See the accompanying graph, which shows that all the likelihood functions intersect at α = β = 1, the values of the shape parameters that give the maximum entropy (the maximum entropy occurs for shape parameters equal to unity: the uniform distribution). It is evident from the plot that the likelihood function gives sharp peaks for values of the shape parameter estimators close to zero, but that for values of the shape parameter estimators greater than one, the likelihood function becomes quite flat, with less defined peaks. Obviously, the maximum likelihood parameter estimation method for the beta distribution becomes less acceptable for larger values of the shape parameter estimators, as the uncertainty in the peak definition increases with the value of the shape parameter estimators. One can arrive at the same conclusion by noticing that the expression for the curvature of the likelihood function is in terms of the geometric variances:
:\frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)/N}{\partial \alpha^2}= -\operatorname{var}[\ln X]
:\frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)/N}{\partial \beta^2} = -\operatorname{var}[\ln (1-X)]
These variances (and therefore the curvatures) are much larger for small values of the shape parameters α and β. However, for shape parameter values α, β > 1, the variances (and therefore the curvatures) flatten out. Equivalently, this result follows from the Cramér–Rao bound, since the Fisher information matrix components for the beta distribution are these logarithmic variances. The Cramér–Rao bound states that the variance of any ''unbiased'' estimator \hat{\alpha} of α is bounded by the reciprocal of the Fisher information:
:\operatorname{var}(\hat{\alpha})\geq\frac{1}{\psi_1(\alpha) - \psi_1(\alpha + \beta)}
:\operatorname{var}(\hat{\beta}) \geq\frac{1}{\psi_1(\beta) - \psi_1(\alpha + \beta)}
so the variance of the estimators increases with increasing α and β, as the logarithmic variances decrease.
Also, one can express the joint log likelihood per ''N'' iid observations in terms of the digamma function expressions for the logarithms of the sample geometric means as follows:
:\frac{\ln\, \mathcal{L} (\alpha, \beta\mid X)}{N} = (\alpha - 1)(\psi(\hat{\alpha}) - \psi(\hat{\alpha} + \hat{\beta}))+(\beta- 1)(\psi(\hat{\beta}) - \psi(\hat{\alpha} + \hat{\beta}))- \ln \Beta(\alpha,\beta)
This expression is identical to the negative of the cross-entropy (see the section on "Quantities of information (entropy)"). Therefore, finding the maximum of the joint log likelihood of the shape parameters, per ''N'' iid observations, is identical to finding the minimum of the cross-entropy for the beta distribution, as a function of the shape parameters.
:\frac{\ln\, \mathcal{L} (\alpha, \beta\mid X)}{N} = - H = -h - D_{\mathrm{KL}} = -\ln\Beta(\alpha,\beta)+(\alpha-1)\psi(\hat{\alpha})+(\beta-1)\psi(\hat{\beta})-(\alpha+\beta-2)\psi(\hat{\alpha}+\hat{\beta})
with the cross-entropy defined as follows:
:H = \int_{0}^1 - f(X;\hat{\alpha},\hat{\beta}) \ln (f(X;\alpha,\beta)) \, {\rm d}X
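In practice the coupled digamma equations are solved numerically. The following sketch (assuming NumPy and SciPy) uses SciPy's digamma and a generic root finder, with method-of-moments starting values in place of the Johnson–Kotz logarithmic approximation; either choice of initial values is acceptable:

 # Maximum-likelihood sketch for (alpha, beta) on [0, 1]: solve the digamma equations.
 import numpy as np
 from scipy.optimize import fsolve
 from scipy.special import psi  # digamma
 
 rng = np.random.default_rng(6)
 x = rng.beta(2.0, 6.0, size=50_000)
 ln_gx, ln_g1x = np.mean(np.log(x)), np.mean(np.log1p(-x))  # log geometric means
 
 def equations(params):
     a, b = params
     return (psi(a) - psi(a + b) - ln_gx,
             psi(b) - psi(a + b) - ln_g1x)
 
 # Method-of-moments estimates as starting values for the iteration.
 x_bar, v_bar = x.mean(), x.var(ddof=1)
 common = x_bar * (1 - x_bar) / v_bar - 1
 alpha_hat, beta_hat = fsolve(equations, (x_bar * common, (1 - x_bar) * common))
 print(alpha_hat, beta_hat)  # close to (2, 6)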


Four unknown parameters

The procedure is similar to the one followed in the two unknown parameter case. If ''Y''1, ..., ''Y''''N'' are independent random variables each having a beta distribution with four parameters, the joint log likelihood function for ''N'' iid observations is:
:\begin{align}
\ln\, \mathcal{L} (\alpha, \beta, a, c\mid Y) &= \sum_{i=1}^N \ln\,\mathcal{L}_i (\alpha, \beta, a, c\mid Y_i)\\
&= \sum_{i=1}^N \ln\,f(Y_i; \alpha, \beta, a, c) \\
&= \sum_{i=1}^N \ln\,\frac{(Y_i-a)^{\alpha-1}(c-Y_i)^{\beta-1}}{(c-a)^{\alpha+\beta-1}\Beta(\alpha,\beta)}\\
&= (\alpha - 1)\sum_{i=1}^N \ln (Y_i - a) + (\beta- 1)\sum_{i=1}^N \ln (c - Y_i)- N \ln \Beta(\alpha,\beta) - N (\alpha+\beta - 1) \ln (c - a)
\end{align}
Finding the maximum with respect to a shape parameter involves taking the partial derivative with respect to the shape parameter and setting the expression equal to zero, yielding the maximum likelihood estimator of the shape parameters:
:\frac{\partial \ln \mathcal{L}(\alpha, \beta, a, c\mid Y)}{\partial \alpha}= \sum_{i=1}^N \ln (Y_i - a) - N(-\psi(\alpha + \beta) + \psi(\alpha))- N \ln (c - a)= 0
:\frac{\partial \ln \mathcal{L}(\alpha, \beta, a, c\mid Y)}{\partial \beta} = \sum_{i=1}^N \ln (c - Y_i) - N(-\psi(\alpha + \beta) + \psi(\beta))- N \ln (c - a)= 0
:\frac{\partial \ln \mathcal{L}(\alpha, \beta, a, c\mid Y)}{\partial a} = -(\alpha - 1) \sum_{i=1}^N \frac{1}{Y_i - a} \,+ N (\alpha+\beta - 1)\frac{1}{c - a}= 0
:\frac{\partial \ln \mathcal{L}(\alpha, \beta, a, c\mid Y)}{\partial c} = (\beta- 1) \sum_{i=1}^N \frac{1}{c - Y_i} \,- N (\alpha+\beta - 1) \frac{1}{c - a} = 0
These equations can be re-arranged as the following system of four coupled equations (the first two equations are geometric means and the second two equations are harmonic means) in terms of the maximum likelihood estimates for the four parameters \hat{\alpha}, \hat{\beta}, \hat{a}, \hat{c}:
:\frac{1}{N}\sum_{i=1}^N \ln \frac{Y_i - \hat{a}}{\hat{c}-\hat{a}} = \psi(\hat{\alpha})-\psi(\hat{\alpha} +\hat{\beta} )= \ln \hat{G}_X
:\frac{1}{N}\sum_{i=1}^N \ln \frac{\hat{c} - Y_i}{\hat{c}-\hat{a}} = \psi(\hat{\beta})-\psi(\hat{\alpha} + \hat{\beta})= \ln \hat{G}_{(1-X)}
:\frac{1}{\frac{1}{N}\sum_{i=1}^N \frac{\hat{c}-\hat{a}}{Y_i - \hat{a}}} = \frac{\hat{\alpha}-1}{\hat{\alpha}+\hat{\beta}-1}= \hat{H}_X
:\frac{1}{\frac{1}{N}\sum_{i=1}^N \frac{\hat{c}-\hat{a}}{\hat{c} - Y_i}} = \frac{\hat{\beta}-1}{\hat{\alpha}+\hat{\beta}-1} = \hat{H}_{(1-X)}
with sample geometric means:
:\hat{G}_X = \prod_{i=1}^{N} \left (\frac{Y_i-\hat{a}}{\hat{c}-\hat{a}} \right )^{1/N}
:\hat{G}_{(1-X)} = \prod_{i=1}^{N} \left (\frac{\hat{c}-Y_i}{\hat{c}-\hat{a}} \right )^{1/N}
The parameters \hat{a}, \hat{c} are embedded inside the geometric mean expressions in a nonlinear way (to the power 1/''N''). This precludes, in general, a closed form solution, even for an initial value approximation for iteration purposes. One alternative is to use as initial values for iteration the values obtained from the method of moments solution for the four parameter case. Furthermore, the expressions for the harmonic means are well-defined only for \hat{\alpha}, \hat{\beta} > 1, which precludes a maximum likelihood solution for shape parameters less than unity in the four-parameter case. Fisher's information matrix for the four parameter case is positive-definite only for α, β > 2 (for further discussion, see the section on the Fisher information matrix, four parameter case), that is, for bell-shaped (symmetric or unsymmetric) beta distributions, with inflection points located to either side of the mode. The following Fisher information components (which represent the expectations of the curvature of the log likelihood function, and whose closed-form expressions are given in the section on the Fisher information matrix below) have singularities at the following values:
:\alpha = 2: \quad \operatorname{E} \left [- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial a^2} \right ]= \mathcal{I}_{a, a}
:\beta = 2: \quad \operatorname{E}\left [- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial c^2} \right ] = \mathcal{I}_{c, c}
:\alpha = 1: \quad \operatorname{E}\left [- \frac{1}{N}\frac{\partial^2\ln \mathcal{L}}{\partial \alpha\,\partial a}\right ] = \mathcal{I}_{\alpha, a}
:\beta = 1: \quad \operatorname{E}\left [- \frac{1}{N}\frac{\partial^2\ln \mathcal{L}}{\partial \beta\,\partial c} \right ] = \mathcal{I}_{\beta, c}
(for further discussion see the section on the Fisher information matrix). Thus, it is not possible to strictly carry on the maximum likelihood estimation for some well known distributions belonging to the four-parameter beta distribution family, like the continuous uniform distribution (Beta(1, 1, ''a'', ''c'')) and the arcsine distribution (Beta(1/2, 1/2, ''a'', ''c'')). N.L. Johnson and S. Kotz ignore the equations for the harmonic means and instead suggest "If a and c are unknown, and maximum likelihood estimators of ''a'', ''c'', α and β are required, the above procedure (for the two unknown parameter case, with ''X'' transformed as ''X'' = (''Y'' − ''a'')/(''c'' − ''a'')) can be repeated using a succession of trial values of ''a'' and ''c'', until the pair (''a'', ''c'') for which maximum likelihood (given ''a'' and ''c'') is as great as possible, is attained" (where, for the purpose of clarity, their notation for the parameters has been translated into the present notation).
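The trial-value procedure quoted from Johnson and Kotz can be sketched as a profile log likelihood over candidate endpoints (''a'', ''c''). The helper fit_two_param below is a stand-in for any two-parameter maximum-likelihood routine (for example the digamma-equation solver sketched earlier) and is an assumption of this example, not part of the source:

 # Profile log-likelihood sketch for trial endpoints (a, c) enclosing the data;
 # the (a, c) pair maximizing this value is kept, per Johnson and Kotz.
 import numpy as np
 from scipy.special import betaln
 
 def profile_loglik(y, a, c, fit_two_param):
     x = (y - a) / (c - a)                 # rescale data to [0, 1]
     alpha, beta_ = fit_two_param(x)       # hypothetical two-parameter ML fit
     return (np.sum((alpha - 1) * np.log(y - a) + (beta_ - 1) * np.log(c - y))
             - len(y) * (betaln(alpha, beta_) + (alpha + beta_ - 1) * np.log(c - a)))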


Fisher information matrix

Let a random variable ''X'' have a probability density ''f''(''x''; ''α''). The partial derivative with respect to the (unknown, and to be estimated) parameter α of the log likelihood function is called the score. The second moment of the score is called the Fisher information:
:\mathcal{I}(\alpha)=\operatorname{E} \left [\left (\frac{\partial}{\partial\alpha} \ln \mathcal{L}(\alpha\mid X) \right )^2 \right].
The expectation of the score is zero, therefore the Fisher information is also the second moment centered on the mean of the score: the variance of the score.
If the log likelihood function is twice differentiable with respect to the parameter α, and under certain regularity conditions, then the Fisher information may also be written as follows (which is often a more convenient form for calculation purposes):
:\mathcal{I}(\alpha) = - \operatorname{E} \left [\frac{\partial^2}{\partial\alpha^2} \ln (\mathcal{L}(\alpha\mid X)) \right].
Thus, the Fisher information is the negative of the expectation of the second derivative with respect to the parameter α of the log likelihood function. Therefore, Fisher information is a measure of the curvature of the log likelihood function of α. A low-curvature (and therefore high radius of curvature), flatter log likelihood function curve has low Fisher information, while a log likelihood function curve with large curvature (and therefore low radius of curvature) has high Fisher information. When the Fisher information matrix is computed at the estimates of the parameters ("the observed Fisher information matrix") it is equivalent to replacing the true log likelihood surface by a Taylor series approximation, taken as far as the quadratic terms. The word information, in the context of Fisher information, refers to information about the parameters — estimation, sufficiency and properties of variances of estimators. The Cramér–Rao bound states that the inverse of the Fisher information is a lower bound on the variance of any unbiased estimator of a parameter α:
:\operatorname{var}[\hat\alpha] \geq \frac{1}{\mathcal{I}(\alpha)}.
The precision to which one can estimate the parameter α is limited by the Fisher information of the log likelihood function. The Fisher information is a measure of the minimum error involved in estimating a parameter of a distribution, and it can be viewed as a measure of the resolving power of an experiment needed to discriminate between two alternative hypotheses about a parameter.
When there are ''N'' parameters
: \begin{bmatrix} \theta_1 \\ \theta_2 \\ \vdots \\ \theta_N \end{bmatrix},
then the Fisher information takes the form of an ''N''×''N'' positive semidefinite symmetric matrix, the Fisher information matrix, with typical element:
:(\mathcal{I}(\theta))_{i,j}=\operatorname{E} \left [\left (\frac{\partial}{\partial\theta_i} \ln \mathcal{L} \right) \left(\frac{\partial}{\partial\theta_j} \ln \mathcal{L} \right) \right ].
Under certain regularity conditions, the Fisher information matrix may also be written in the following form, which is often more convenient for computation:
: (\mathcal{I}(\theta))_{i,j} = - \operatorname{E} \left [\frac{\partial^2}{\partial\theta_i \, \partial\theta_j} \ln (\mathcal{L}) \right ]\,.
With ''X''1, ..., ''X''''N'' iid random variables, an ''N''-dimensional "box" can be constructed with sides ''X''1, ..., ''X''''N''. Costa and Cover show that the (Shannon) differential entropy ''h''(''X'') is related to the volume of the typical set (having the sample entropy close to the true entropy), while the Fisher information is related to the surface of this typical set.


Two parameters

For ''X''1, ..., ''X''''N'' independent random variables each having a beta distribution parametrized with shape parameters ''α'' and ''β'', the joint log likelihood function for ''N'' iid observations is:
:\ln (\mathcal{L} (\alpha, \beta\mid X) )= (\alpha - 1)\sum_{i=1}^N \ln X_i + (\beta- 1)\sum_{i=1}^N \ln (1-X_i)- N \ln \Beta(\alpha,\beta)
therefore the joint log likelihood function per ''N'' iid observations is:
:\frac{1}{N} \ln(\mathcal{L} (\alpha, \beta\mid X)) = (\alpha - 1)\frac{1}{N}\sum_{i=1}^N \ln X_i + (\beta- 1)\frac{1}{N}\sum_{i=1}^N \ln (1-X_i)-\, \ln \Beta(\alpha,\beta)
For the two-parameter case, the Fisher information has 4 components: 2 diagonal and 2 off-diagonal. Since the Fisher information matrix is symmetric, only one of the off-diagonal components is independent. Therefore, the Fisher information matrix has 3 independent components (2 diagonal and 1 off-diagonal).
Aryal and Nadarajah calculated Fisher's information matrix for the four-parameter case, from which the two-parameter case can be obtained as follows:
:- \frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)/N}{\partial \alpha^2}= \operatorname{var}[\ln (X)]= \psi_1(\alpha) - \psi_1(\alpha + \beta) = \mathcal{I}_{\alpha, \alpha}= \operatorname{E}\left [- \frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)/N}{\partial \alpha^2} \right ] = \ln (\operatorname{var}_{GX})
:- \frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)/N}{\partial \beta^2} = \operatorname{var}[\ln (1-X)]= \psi_1(\beta) - \psi_1(\alpha + \beta) = \mathcal{I}_{\beta, \beta}= \operatorname{E}\left [- \frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)/N}{\partial \beta^2} \right]= \ln (\operatorname{var}_{G(1-X)})
:- \frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)/N}{\partial \alpha\,\partial \beta} = \operatorname{cov}[\ln X,\ln(1-X)] = -\psi_1(\alpha+\beta) = \mathcal{I}_{\alpha, \beta}= \operatorname{E}\left [- \frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)/N}{\partial \alpha\,\partial \beta} \right] = \ln (\operatorname{cov}_{G\,X,(1-X)})
Since the Fisher information matrix is symmetric
: \mathcal{I}_{\alpha, \beta}= \mathcal{I}_{\beta, \alpha}= \ln (\operatorname{cov}_{G\,X,(1-X)})
The Fisher information components are equal to the log geometric variances and log geometric covariance. Therefore, they can be expressed as trigamma functions, denoted ψ1(α), the second of the polygamma functions, defined as the derivative of the digamma function:
:\psi_1(\alpha) = \frac{d^2\ln\Gamma(\alpha)}{d\alpha^2}=\, \frac{d\psi(\alpha)}{d\alpha}.
These derivatives are also derived in the section titled "Maximum likelihood, Two unknown parameters", and plots of the log likelihood function are also shown in that section. The section on the geometric variance and covariance contains plots and further discussion of the Fisher information matrix components — the log geometric variances and log geometric covariance — as a function of the shape parameters α and β, and the section on moments of logarithmically transformed random variables contains the corresponding formulas. Images for the Fisher information components \mathcal{I}_{\alpha, \alpha}, \mathcal{I}_{\beta, \beta} and \mathcal{I}_{\alpha, \beta} are shown there.
The determinant of Fisher's information matrix is of interest (for example for the calculation of Jeffreys prior probability). From the expressions for the individual components of the Fisher information matrix, it follows that the determinant of Fisher's (symmetric) information matrix for the beta distribution is:
:\begin{align}
\det(\mathcal{I}(\alpha, \beta))&= \mathcal{I}_{\alpha, \alpha} \mathcal{I}_{\beta, \beta}-\mathcal{I}_{\alpha, \beta} \mathcal{I}_{\beta, \alpha} \\
&=(\psi_1(\alpha) - \psi_1(\alpha + \beta))(\psi_1(\beta) - \psi_1(\alpha + \beta))-( -\psi_1(\alpha+\beta))( -\psi_1(\alpha+\beta))\\
&= \psi_1(\alpha)\psi_1(\beta)-( \psi_1(\alpha)+\psi_1(\beta))\psi_1(\alpha + \beta)\\
\lim_{\alpha\to 0} \det(\mathcal{I}(\alpha, \beta)) &=\lim_{\beta \to 0} \det(\mathcal{I}(\alpha, \beta)) = \infty\\
\lim_{\alpha\to \infty} \det(\mathcal{I}(\alpha, \beta)) &=\lim_{\beta \to \infty} \det(\mathcal{I}(\alpha, \beta)) = 0
\end{align}
From Sylvester's criterion (checking whether the leading principal minors are all positive), it follows that the Fisher information matrix for the two-parameter case is positive-definite (under the standard condition that the shape parameters are positive, ''α'' > 0 and ''β'' > 0).
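Because the two-parameter components are just trigamma expressions, the matrix and its determinant (whose square root is proportional to the Jeffreys prior mentioned above) are straightforward to evaluate. The following is a small sketch assuming SciPy's polygamma; the parameter values are arbitrary:

 # Two-parameter Fisher information matrix of the beta distribution via trigamma.
 import numpy as np
 from scipy.special import polygamma
 
 def beta_fisher_info(alpha, beta_):
     tri = lambda z: polygamma(1, z)          # trigamma = psi_1
     i_aa = tri(alpha) - tri(alpha + beta_)   # var[ln X]
     i_bb = tri(beta_) - tri(alpha + beta_)   # var[ln (1 - X)]
     i_ab = -tri(alpha + beta_)               # cov[ln X, ln(1 - X)]
     return np.array([[i_aa, i_ab], [i_ab, i_bb]])
 
 I = beta_fisher_info(2.0, 3.0)
 print(I, np.linalg.det(I))  # det = psi1(a)psi1(b) - (psi1(a)+psi1(b))psi1(a+b)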


Four parameters

If ''Y''1, ..., ''Y''''N'' are independent random variables each having a beta distribution with four parameters — the exponents ''α'' and ''β'', and also ''a'' (the minimum of the distribution range) and ''c'' (the maximum of the distribution range); see the section titled "Alternative parametrizations", "Four parameters" — with probability density function:
:f(y; \alpha, \beta, a, c) = \frac{f(x;\alpha,\beta)}{c-a} =\frac{ \left (\frac{y-a}{c-a} \right )^{\alpha-1} \left (\frac{c-y}{c-a} \right)^{\beta-1} }{(c-a)\Beta(\alpha, \beta)}=\frac{ (y-a)^{\alpha-1} (c-y)^{\beta-1} }{(c-a)^{\alpha+\beta-1}\Beta(\alpha, \beta)},
the joint log likelihood function per ''N'' iid observations is:
:\frac{1}{N} \ln(\mathcal{L} (\alpha, \beta, a, c\mid Y))= \frac{\alpha -1}{N}\sum_{i=1}^N \ln (Y_i - a) + \frac{\beta -1}{N}\sum_{i=1}^N \ln (c - Y_i)- \ln \Beta(\alpha,\beta) - (\alpha+\beta -1) \ln (c - a)
For the four parameter case, the Fisher information has 4×4 = 16 components. It has 12 off-diagonal components (= 16 total − 4 diagonal). Since the Fisher information matrix is symmetric, half of these components (12/2 = 6) are independent. Therefore, the Fisher information matrix has 6 independent off-diagonal + 4 diagonal = 10 independent components. Aryal and Nadarajah calculated Fisher's information matrix for the four parameter case as follows:
:- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial \alpha^2}= \operatorname{var}[\ln (X)]= \psi_1(\alpha) - \psi_1(\alpha + \beta) = \mathcal{I}_{\alpha, \alpha}= \operatorname{E}\left [- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial \alpha^2} \right ] = \ln (\operatorname{var}_{GX})
:-\frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial \beta^2} = \operatorname{var}[\ln (1-X)]= \psi_1(\beta) - \psi_1(\alpha + \beta) = \mathcal{I}_{\beta, \beta}= \operatorname{E}\left [- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial \beta^2} \right ] = \ln(\operatorname{var}_{G(1-X)})
:-\frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial \alpha\,\partial \beta} = \operatorname{cov}[\ln X,\ln (1-X)] = -\psi_1(\alpha+\beta) =\mathcal{I}_{\alpha, \beta}= \operatorname{E}\left [- \frac{1}{N}\frac{\partial^2\ln \mathcal{L}}{\partial \alpha\,\partial \beta} \right ] = \ln(\operatorname{cov}_{G\,X,(1-X)})
In the above expressions, the use of ''X'' instead of ''Y'' in the expressions var[ln(''X'')] = ln(var''GX'') is ''not an error''. The expressions in terms of the log geometric variances and log geometric covariance occur as functions of the two-parameter ''X'' ~ Beta(''α'', ''β'') parametrization because when taking the partial derivatives with respect to the exponents (''α'', ''β'') in the four-parameter case, one obtains the identical expressions as for the two-parameter case: these terms of the four-parameter Fisher information matrix are independent of the minimum ''a'' and maximum ''c'' of the distribution's range. The only non-zero term upon double differentiation of the log likelihood function with respect to the exponents ''α'' and ''β'' is the second derivative of the log of the beta function, ln(B(''α'', ''β'')). This term is independent of the minimum ''a'' and maximum ''c'' of the distribution's range. Double differentiation of this term results in trigamma functions. The sections titled "Maximum likelihood", "Two unknown parameters" and "Four unknown parameters" also show this fact.
The Fisher information for ''N'' i.i.d. samples is ''N'' times the individual Fisher information (eq. 11.279, page 394 of Cover and Thomas). (Aryal and Nadarajah take a single observation, ''N'' = 1, to calculate the following components of the Fisher information, which leads to the same result as considering the derivatives of the log likelihood per ''N'' observations. Moreover, below, an erroneous expression for one of the components in Aryal and Nadarajah has been corrected.)
:\begin{align}
\alpha > 2: \quad \operatorname{E}\left [- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial a^2} \right ] &= \mathcal{I}_{a, a}=\frac{\beta(\alpha+\beta-1)}{(\alpha-2)(c-a)^2} \\
\beta > 2: \quad \operatorname{E}\left[-\frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial c^2} \right ] &= \mathcal{I}_{c, c} = \frac{\alpha(\alpha+\beta-1)}{(\beta-2)(c-a)^2} \\
\operatorname{E}\left[- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial a\,\partial c} \right ] &= \mathcal{I}_{a, c} = \frac{\alpha+\beta-1}{(c-a)^2} \\
\alpha > 1: \quad \operatorname{E}\left[- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial \alpha\,\partial a} \right ] &=\mathcal{I}_{\alpha, a} = \frac{\beta}{(\alpha-1)(c-a)} \\
\operatorname{E}\left[- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial \alpha\,\partial c} \right ] &= \mathcal{I}_{\alpha, c} = \frac{1}{c-a} \\
\operatorname{E}\left[- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial \beta\,\partial a} \right ] &= \mathcal{I}_{\beta, a} = -\frac{1}{c-a} \\
\beta > 1: \quad \operatorname{E}\left[- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial \beta\,\partial c} \right ] &= \mathcal{I}_{\beta, c} = -\frac{\alpha}{(\beta-1)(c-a)}
\end{align}
The lower two diagonal entries of the Fisher information matrix, with respect to the parameter ''a'' (the minimum of the distribution's range), \mathcal{I}_{a, a}, and with respect to the parameter ''c'' (the maximum of the distribution's range), \mathcal{I}_{c, c}, are only defined for exponents α > 2 and β > 2 respectively. The Fisher information matrix component \mathcal{I}_{a, a} for the minimum ''a'' approaches infinity for exponent α approaching 2 from above, and the Fisher information matrix component \mathcal{I}_{c, c} for the maximum ''c'' approaches infinity for exponent β approaching 2 from above.
The Fisher information matrix for the four parameter case does not depend on the individual values of the minimum ''a'' and the maximum ''c'', but only on the total range (''c'' − ''a''). Moreover, the components of the Fisher information matrix that depend on the range (''c'' − ''a'') depend only through its inverse (or the square of the inverse), such that the Fisher information decreases for increasing range (''c'' − ''a'').
The accompanying images show some of these Fisher information components; images for the components \mathcal{I}_{\alpha, \alpha} and \mathcal{I}_{\beta, \beta} are shown in the section on the geometric variance. All these Fisher information components look like a basin, with the "walls" of the basin located at low values of the parameters.
The following four-parameter-beta-distribution Fisher information components can be expressed in terms of the two-parameter ''X'' ~ Beta(α, β) expectations of the transformed ratio ((1 − ''X'')/''X'') and of its mirror image (''X''/(1 − ''X'')), scaled by the range (''c'' − ''a''), which may be helpful for interpretation:
:\mathcal{I}_{\alpha, a} =\frac{\operatorname{E} \left[\frac{1-X}{X} \right ]}{c-a}= \frac{\beta}{(\alpha-1)(c-a)} \text{ if }\alpha > 1
:\mathcal{I}_{\beta, c} = -\frac{\operatorname{E} \left [\frac{X}{1-X} \right ]}{c-a}=- \frac{\alpha}{(\beta-1)(c-a)}\text{ if }\beta> 1
These are also the expected values of the "inverted beta distribution" or beta prime distribution (also known as beta distribution of the second kind or Pearson's Type VI) and its mirror image, scaled by the range (''c'' − ''a''). Also, the following Fisher information components can be expressed in terms of the harmonic (1/''X'') variances or of variances based on the ratio transformed variables ((1 − ''X'')/''X'') as follows:
:\begin{align}
\alpha > 2: \quad \mathcal{I}_{a,a} &=\operatorname{var} \left [\frac{1}{X} \right] \left (\frac{\alpha-1}{c-a} \right )^2 =\operatorname{var} \left [\frac{1-X}{X} \right ] \left (\frac{\alpha-1}{c-a} \right)^2 = \frac{\beta(\alpha+\beta-1)}{(\alpha-2)(c-a)^2} \\
\beta > 2: \quad \mathcal{I}_{c, c} &= \operatorname{var} \left [\frac{1}{1-X} \right ] \left (\frac{\beta-1}{c-a} \right )^2 = \operatorname{var} \left [\frac{X}{1-X} \right ] \left (\frac{\beta-1}{c-a} \right )^2 =\frac{\alpha(\alpha+\beta-1)}{(\beta-2)(c-a)^2} \\
\mathcal{I}_{a, c} &=-\operatorname{cov} \left [\frac{1}{X},\frac{1}{1-X} \right ]\frac{(\alpha-1)(\beta-1)}{(c-a)^2} = -\operatorname{cov} \left [\frac{1-X}{X},\frac{X}{1-X} \right ] \frac{(\alpha-1)(\beta-1)}{(c-a)^2} =\frac{\alpha+\beta-1}{(c-a)^2}
\end{align}
See the section "Moments of linearly transformed, product and inverted random variables" for these expectations.
The determinant of Fisher's information matrix is of interest (for example for the calculation of Jeffreys prior probability). For the four-parameter case it is a lengthy expression in the ten independent components listed above, obtained from the standard expansion of the determinant of the symmetric 4×4 matrix (see Aryal and Nadarajah for the explicit expression); it is defined for α, β > 2.
Using Sylvester's criterion (checking whether the leading principal minors are all positive), and since the diagonal components \mathcal{I}_{a, a} and \mathcal{I}_{c, c} have singularities at α = 2 and β = 2, it follows that the Fisher information matrix for the four parameter case is positive-definite for α > 2 and β > 2. Since for α > 2 and β > 2 the beta distribution is (symmetric or unsymmetric) bell shaped, it follows that the Fisher information matrix is positive-definite only for bell-shaped (symmetric or unsymmetric) beta distributions, with inflection points located to either side of the mode. Thus, important well known distributions belonging to the four-parameter beta distribution family, like the parabolic distribution (Beta(2, 2, ''a'', ''c'')) and the continuous uniform distribution (Beta(1, 1, ''a'', ''c'')), have Fisher information components (\mathcal{I}_{a, a},\mathcal{I}_{c, c},\mathcal{I}_{\alpha, a},\mathcal{I}_{\beta, c}) that blow up (approach infinity) in the four-parameter case (although their Fisher information components are all defined for the two-parameter case). The four-parameter Wigner semicircle distribution (Beta(3/2, 3/2, ''a'', ''c'')) and arcsine distribution (Beta(1/2, 1/2, ''a'', ''c'')) have negative Fisher information determinants for the four-parameter case.
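For concreteness, the closed-form components above can be assembled into the full 4×4 matrix and its positive-definiteness checked numerically for α, β > 2. The sketch below uses arbitrary parameter values and orders the parameters as (α, β, ''a'', ''c''); it is an illustration, not part of the source:

 # Assemble the four-parameter Fisher information matrix from the components above.
 import numpy as np
 from scipy.special import polygamma
 
 def beta4_fisher_info(a_, b_, lo, hi):
     r, tri = hi - lo, lambda z: polygamma(1, z)
     I = np.empty((4, 4))
     I[0, 0] = tri(a_) - tri(a_ + b_)                      # I_alpha,alpha
     I[1, 1] = tri(b_) - tri(a_ + b_)                      # I_beta,beta
     I[0, 1] = I[1, 0] = -tri(a_ + b_)                     # I_alpha,beta
     I[2, 2] = b_ * (a_ + b_ - 1) / ((a_ - 2) * r**2)      # I_a,a (alpha > 2)
     I[3, 3] = a_ * (a_ + b_ - 1) / ((b_ - 2) * r**2)      # I_c,c (beta > 2)
     I[2, 3] = I[3, 2] = (a_ + b_ - 1) / r**2              # I_a,c
     I[0, 2] = I[2, 0] = b_ / ((a_ - 1) * r)               # I_alpha,a
     I[0, 3] = I[3, 0] = 1 / r                             # I_alpha,c
     I[1, 2] = I[2, 1] = -1 / r                            # I_beta,a
     I[1, 3] = I[3, 1] = -a_ / ((b_ - 1) * r)              # I_beta,c
     return I
 
 I = beta4_fisher_info(3.0, 4.0, 0.0, 2.0)
 print(np.linalg.eigvalsh(I).min() > 0)  # positive-definite for alpha, beta > 2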


Bayesian inference

The use of beta distributions in Bayesian inference is due to the fact that they provide a family of conjugate prior probability distributions for binomial (including Bernoulli) and geometric distributions. The domain of the beta distribution can be viewed as a probability, and in fact the beta distribution is often used to describe the distribution of a probability value ''p'':
:P(p;\alpha,\beta) = \frac{p^{\alpha-1}(1-p)^{\beta-1}}{\Beta(\alpha,\beta)}.
Examples of beta distributions used as prior probabilities to represent ignorance of prior parameter values in Bayesian inference are Beta(1,1), Beta(0,0) and Beta(1/2,1/2).
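The conjugacy makes the posterior update a one-line computation: a Beta(α0, β0) prior combined with ''s'' successes in ''n'' Bernoulli trials yields a Beta(α0 + ''s'', β0 + ''n'' − ''s'') posterior. A minimal sketch, with arbitrary prior and data values:

 # Conjugate beta-binomial update for a success probability p.
 from scipy import stats
 
 alpha0, beta0 = 1.0, 1.0          # Bayes-Laplace uniform prior Beta(1, 1)
 s, n = 7, 10                      # observed successes / trials
 posterior = stats.beta(alpha0 + s, beta0 + n - s)
 print(posterior.mean(), posterior.interval(0.95))  # posterior mean and 95% interval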


Rule of succession

A classic application of the beta distribution is the rule of succession, introduced in the 18th century by Pierre-Simon Laplace in the course of treating the sunrise problem. It states that, given ''s'' successes in ''n'' conditionally independent Bernoulli trials with probability ''p'', the estimate of the expected value in the next trial is \frac{s+1}{n+2}. This estimate is the expected value of the posterior distribution over ''p'', namely Beta(''s'' + 1, ''n'' − ''s'' + 1), which is given by Bayes' rule if one assumes a uniform prior probability over ''p'' (i.e., Beta(1, 1)) and then observes that ''p'' generated ''s'' successes in ''n'' trials. Laplace's rule of succession has been criticized by prominent scientists. R. T. Cox described Laplace's application of the rule of succession to the sunrise problem (p. 89) as "a travesty of the proper use of the principle." Keynes remarks (Ch. XXX, p. 382) "indeed this is so foolish a theorem that to entertain it is discreditable." Karl Pearson showed that the probability that the next (''n'' + 1) trials will all be successes, after ''n'' successes in ''n'' trials, is only 50%, which has been considered too low by scientists like Jeffreys and unacceptable as a representation of the scientific process of experimentation to test a proposed scientific law. As pointed out by Jeffreys (p. 128), crediting C. D. Broad, Laplace's rule of succession establishes a high probability of success ((''n'' + 1)/(''n'' + 2)) in the next trial, but only a moderate probability (50%) that a further sample (''n'' + 1) comparable in size will be equally successful. As pointed out by Perks, "The rule of succession itself is hard to accept. It assigns a probability to the next trial which implies the assumption that the actual run observed is an average run and that we are always at the end of an average run. It would, one would think, be more reasonable to assume that we were in the middle of an average run. Clearly a higher value for both probabilities is necessary if they are to accord with reasonable belief." These problems with Laplace's rule of succession motivated Haldane, Perks, Jeffreys and others to search for other forms of prior probability (see the next section). According to Jaynes, the main problem with the rule of succession is that it is not valid when ''s'' = 0 or ''s'' = ''n'' (see rule of succession, for an analysis of its validity).


Bayes-Laplace prior probability (Beta(1,1))

The beta distribution achieves maximum differential entropy for Beta(1,1): the uniform probability density, for which all values in the domain of the distribution have equal density. This uniform distribution Beta(1,1) was suggested ("with a great deal of doubt") by Thomas Bayes as the prior probability distribution to express ignorance about the correct prior distribution. This prior distribution was adopted (apparently, from his writings, with little sign of doubt) by Pierre-Simon Laplace, and hence it was also known as the "Bayes-Laplace rule" or the "Laplace rule" of "inverse probability" in publications of the first half of the 20th century. In the later part of the 19th century and early part of the 20th century, scientists realized that the assumption of uniform "equal" probability density depended on the actual functions (for example whether a linear or a logarithmic scale was most appropriate) and parametrizations used. In particular, the behavior near the ends of distributions with finite support (for example near ''x'' = 0, for a distribution with initial support at ''x'' = 0) required particular attention. Keynes (Ch. XXX, p. 381) criticized the use of Bayes's uniform prior probability (Beta(1,1)), according to which all values between zero and one are equiprobable, as follows: "Thus experience, if it shows anything, shows that there is a very marked clustering of statistical ratios in the neighborhoods of zero and unity, of those for positive theories and for correlations between positive qualities in the neighborhood of zero, and of those for negative theories and for correlations between negative qualities in the neighborhood of unity."


Haldane's prior probability (Beta(0,0))

The Beta(0,0) distribution was proposed by J.B.S. Haldane, who suggested that the prior probability representing complete uncertainty should be proportional to ''p''−1(1−''p'')−1. The function ''p''−1(1−''p'')−1 can be viewed as the limit of the numerator of the beta distribution as both shape parameters approach zero: α, β → 0. The Beta function (in the denominator of the beta distribution) approaches infinity for both parameters approaching zero, α, β → 0. Therefore, ''p''−1(1−''p'')−1 divided by the Beta function approaches a 2-point Bernoulli distribution with equal probability 1/2 at each end, at 0 and 1, and nothing in between, as α, β → 0. A coin-toss: one face of the coin being at 0 and the other face being at 1. The Haldane prior probability distribution Beta(0,0) is an "improper prior" because its integration (from 0 to 1) fails to strictly converge to 1 due to the singularities at each end. However, this is not an issue for computing posterior probabilities unless the sample size is very small. Furthermore, Zellner points out that on the log-odds scale (the logit transformation ln(''p''/(1−''p''))), the Haldane prior is the uniformly flat prior. The fact that a uniform prior probability on the logit-transformed variable ln(''p''/(1−''p'')) (with domain (−∞, ∞)) is equivalent to the Haldane prior on the domain [0, 1] was pointed out by Harold Jeffreys in the first edition (1939) of his book Theory of Probability (p. 123). Jeffreys writes "Certainly if we take the Bayes-Laplace rule right up to the extremes we are led to results that do not correspond to anybody's way of thinking. The (Haldane) rule d''x''/(''x''(1−''x'')) goes too far the other way. It would lead to the conclusion that if a sample is of one type with respect to some property there is a probability 1 that the whole population is of that type." The fact that "uniform" depends on the parametrization led Jeffreys to seek a form of prior that would be invariant under different parametrizations.
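The change-of-variables argument behind Zellner's remark can be checked symbolically; the following sketch (assuming the SymPy library) computes the Jacobian of the logit transformation, showing that a flat prior on ln(''p''/(1 − ''p'')) pulls back to a density proportional to ''p''−1(1 − ''p'')−1 on (0, 1), i.e. the Haldane prior.

<syntaxhighlight lang="python">
import sympy as sp

p = sp.symbols('p', positive=True)
logit = sp.log(p / (1 - p))

# A uniform (constant) prior on the logit scale corresponds, on the p scale,
# to a density proportional to |d logit / dp| = 1/(p*(1 - p)).
jacobian = sp.simplify(sp.diff(logit, p))
print(jacobian)   # equivalent to 1/(p*(1 - p)), the (improper) Haldane prior
</syntaxhighlight>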


Jeffreys' prior probability (Beta(1/2,1/2) for a Bernoulli or for a binomial distribution)

Harold Jeffreys proposed to use an uninformative prior probability measure that should be invariant under reparameterization: proportional to the square root of the determinant of Fisher's information matrix. For the Bernoulli distribution, this can be shown as follows: for a coin that is "heads" with probability ''p'' ∈ [0, 1] and is "tails" with probability 1 − ''p'', for a given (''H'', ''T'') ∈ {(0, 1), (1, 0)} the probability is ''p''''H''(1 − ''p'')''T''. Since ''T'' = 1 − ''H'', the Bernoulli distribution is ''p''''H''(1 − ''p'')1 − ''H''. Considering ''p'' as the only parameter, it follows that the log likelihood for the Bernoulli distribution is

:\ln \mathcal{L}(p\mid H) = H \ln(p) + (1-H)\ln(1-p).

The Fisher information matrix has only one component (it is a scalar, because there is only one parameter: ''p''), therefore:

:\begin{align}
\sqrt{\mathcal{I}(p)} &= \sqrt{\operatorname{E}\!\left[\left(\frac{d}{dp}\ln\mathcal{L}(p\mid H)\right)^2\right]} \\
&= \sqrt{\operatorname{E}\!\left[\left(\frac{H}{p}-\frac{1-H}{1-p}\right)^2\right]} \\
&= \sqrt{p\left(\frac{1}{p}-\frac{0}{1-p}\right)^2 + (1-p)\left(\frac{0}{p}-\frac{1}{1-p}\right)^2} \\
&= \frac{1}{\sqrt{p(1-p)}}.
\end{align}

Similarly, for the binomial distribution with ''n'' Bernoulli trials, it can be shown that

:\sqrt{\mathcal{I}(p)} = \frac{\sqrt{n}}{\sqrt{p(1-p)}}.

Thus, for the Bernoulli and binomial distributions, Jeffreys prior is proportional to \scriptstyle \frac{1}{\sqrt{p(1-p)}}, which happens to be proportional to a beta distribution with domain variable ''x'' = ''p'' and shape parameters α = β = 1/2, the arcsine distribution:

:\operatorname{Beta}(\tfrac{1}{2}, \tfrac{1}{2}) = \frac{1}{\pi\sqrt{x(1-x)}}.

It will be shown in the next section that the normalizing constant for Jeffreys prior is immaterial to the final result because the normalizing constant cancels out in Bayes' theorem for the posterior probability. Hence Beta(1/2,1/2) is used as the Jeffreys prior for both Bernoulli and binomial distributions. As shown in the next section, when using this expression as a prior probability times the likelihood in Bayes' theorem, the posterior probability turns out to be a beta distribution. It is important to realize, however, that Jeffreys prior is proportional to \scriptstyle \frac{1}{\sqrt{p(1-p)}} for the Bernoulli and binomial distribution, but not for the beta distribution. Jeffreys prior for the beta distribution is given by the determinant of Fisher's information for the beta distribution, which, as shown in the section on the Fisher information matrix, is a function of the trigamma function ψ1 of the shape parameters α and β as follows:

:\begin{align}
\sqrt{\det(\mathcal{I}(\alpha,\beta))} &= \sqrt{\psi_1(\alpha)\,\psi_1(\beta) - \bigl(\psi_1(\alpha)+\psi_1(\beta)\bigr)\,\psi_1(\alpha+\beta)}\\
\lim_{\alpha\to 0}\sqrt{\det(\mathcal{I}(\alpha,\beta))} &= \lim_{\beta\to 0}\sqrt{\det(\mathcal{I}(\alpha,\beta))} = \infty\\
\lim_{\alpha\to\infty}\sqrt{\det(\mathcal{I}(\alpha,\beta))} &= \lim_{\beta\to\infty}\sqrt{\det(\mathcal{I}(\alpha,\beta))} = 0
\end{align}

As previously discussed, Jeffreys prior for the Bernoulli and binomial distributions is proportional to the arcsine distribution Beta(1/2,1/2), a one-dimensional ''curve'' that looks like a basin as a function of the parameter ''p'' of the Bernoulli and binomial distributions. The walls of the basin are formed by ''p'' approaching the singularities at the ends ''p'' → 0 and ''p'' → 1, where Beta(1/2,1/2) approaches infinity. Jeffreys prior for the beta distribution is a ''2-dimensional surface'' (embedded in a three-dimensional space) that looks like a basin with only two of its walls meeting at the corner α = β = 0 (and missing the other two walls) as a function of the shape parameters α and β of the beta distribution. The two adjoining walls of this 2-dimensional surface are formed by the shape parameters α and β approaching the singularities (of the trigamma function) at α, β → 0. It has no walls for α, β → ∞ because in this case the determinant of Fisher's information matrix for the beta distribution approaches zero.

It will be shown in the next section that Jeffreys prior probability results in posterior probabilities (when multiplied by the binomial likelihood function) that are intermediate between the posterior probability results of the Haldane and Bayes prior probabilities.

Jeffreys prior may be difficult to obtain analytically, and for some cases it just doesn't exist (even for simple distribution functions like the asymmetric triangular distribution). Berger, Bernardo and Sun, in a 2009 paper, defined a reference prior probability distribution that (unlike Jeffreys prior) exists for the asymmetric triangular distribution. They cannot obtain a closed-form expression for their reference prior, but numerical calculations show it to be nearly perfectly fitted by the (proper) prior

:\operatorname{Beta}(\tfrac{1}{2}, \tfrac{1}{2}) \sim \frac{1}{\sqrt{\theta(1-\theta)}}

where θ is the vertex variable for the asymmetric triangular distribution with support [0, 1] (corresponding to the following parameter values in Wikipedia's article on the triangular distribution: vertex ''c'' = ''θ'', left end ''a'' = 0, and right end ''b'' = 1). Berger et al. also give a heuristic argument that Beta(1/2,1/2) could indeed be the exact Berger–Bernardo–Sun reference prior for the asymmetric triangular distribution. Therefore, Beta(1/2,1/2) not only is Jeffreys prior for the Bernoulli and binomial distributions, but also seems to be the Berger–Bernardo–Sun reference prior for the asymmetric triangular distribution (for which the Jeffreys prior does not exist), a distribution used in project management and PERT analysis to describe the cost and duration of project tasks.

Clarke and Barron prove that, among continuous positive priors, Jeffreys prior (when it exists) asymptotically maximizes Shannon's mutual information between a sample of size n and the parameter, and therefore ''Jeffreys prior is the most uninformative prior'' (measuring information as Shannon information). The proof rests on an examination of the Kullback–Leibler divergence between probability density functions for iid random variables.
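A short numerical sketch (assuming SciPy; the parameter values are arbitrary) of the expression above for Jeffreys prior of the ''beta'' distribution, i.e. the square root of the determinant of the 2×2 Fisher information matrix written in terms of the trigamma function, illustrating that it grows without bound as α, β → 0 and decays toward zero as α, β → ∞:

<syntaxhighlight lang="python">
import numpy as np
from scipy.special import polygamma

def jeffreys_prior_beta(a, b):
    """Unnormalized Jeffreys prior for Beta(a, b): sqrt(det I(a, b)) with
    I = [[psi1(a) - psi1(a+b), -psi1(a+b)], [-psi1(a+b), psi1(b) - psi1(a+b)]]."""
    psi1 = lambda x: polygamma(1, x)   # trigamma function
    det = psi1(a) * psi1(b) - (psi1(a) + psi1(b)) * psi1(a + b)
    return np.sqrt(det)

print(jeffreys_prior_beta(0.01, 0.01))   # very large near the corner alpha, beta -> 0
print(jeffreys_prior_beta(100.0, 100.0)) # close to zero for large alpha, beta
</syntaxhighlight>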


Effect of different prior probability choices on the posterior beta distribution

If samples are drawn from the population of a random variable ''X'' that result in ''s'' successes and ''f'' failures in ''n'' Bernoulli trials ''n'' = ''s'' + ''f'', then the likelihood function for parameters ''s'' and ''f'' given ''x'' = ''p'' (the notation ''x'' = ''p'' in the expressions below will emphasize that the domain ''x'' stands for the value of the parameter ''p'' in the binomial distribution) is the following binomial distribution:

:\mathcal{L}(s,f\mid x=p) = \binom{n}{s} x^s(1-x)^f = \binom{n}{s} x^s(1-x)^{n-s}.

If beliefs about prior probability information are reasonably well approximated by a beta distribution with parameters ''α'' Prior and ''β'' Prior, then:

:\operatorname{prior}(x=p;\alpha_\text{Prior},\beta_\text{Prior}) = \frac{x^{\alpha_\text{Prior}-1}(1-x)^{\beta_\text{Prior}-1}}{\Beta(\alpha_\text{Prior},\beta_\text{Prior})}

According to Bayes' theorem for a continuous event space, the posterior probability is given by the product of the prior probability and the likelihood function (given the evidence ''s'' and ''f'' = ''n'' − ''s''), normalized so that the area under the curve equals one, as follows:

:\begin{align}
& \operatorname{posterior}(x=p\mid s,n-s) \\
={} & \frac{\operatorname{prior}(x;\alpha_\text{Prior},\beta_\text{Prior})\,\mathcal{L}(s,f\mid x=p)}{\int_0^1 \operatorname{prior}(x;\alpha_\text{Prior},\beta_\text{Prior})\,\mathcal{L}(s,f\mid x=p)\,dx} \\
={} & \frac{\binom{n}{s} x^{s+\alpha_\text{Prior}-1}(1-x)^{n-s+\beta_\text{Prior}-1}/\Beta(\alpha_\text{Prior},\beta_\text{Prior})}{\int_0^1 \binom{n}{s} x^{s+\alpha_\text{Prior}-1}(1-x)^{n-s+\beta_\text{Prior}-1}/\Beta(\alpha_\text{Prior},\beta_\text{Prior})\,dx} \\
={} & \frac{x^{s+\alpha_\text{Prior}-1}(1-x)^{n-s+\beta_\text{Prior}-1}}{\int_0^1 x^{s+\alpha_\text{Prior}-1}(1-x)^{n-s+\beta_\text{Prior}-1}\,dx} \\
={} & \frac{x^{s+\alpha_\text{Prior}-1}(1-x)^{n-s+\beta_\text{Prior}-1}}{\Beta(s+\alpha_\text{Prior},n-s+\beta_\text{Prior})}.
\end{align}

The binomial coefficient

:\binom{n}{s}=\frac{n!}{s!(n-s)!}

appears both in the numerator and the denominator of the posterior probability, and it does not depend on the integration variable ''x'', hence it cancels out, and it is irrelevant to the final result. Similarly the normalizing factor for the prior probability, the beta function B(αPrior, βPrior), cancels out and it is immaterial to the final result. The same posterior probability result can be obtained if one uses an un-normalized prior

:x^{\alpha_\text{Prior}-1}(1-x)^{\beta_\text{Prior}-1}

because the normalizing factors all cancel out. Several authors (including Jeffreys himself) thus use an un-normalized prior formula since the normalization constant cancels out. The numerator of the posterior probability ends up being just the (un-normalized) product of the prior probability and the likelihood function, and the denominator is its integral from zero to one. The beta function in the denominator, B(''s'' + ''α'' Prior, ''n'' − ''s'' + ''β'' Prior), appears as a normalization constant to ensure that the total posterior probability integrates to unity.

The ratio ''s''/''n'' of the number of successes to the total number of trials is a sufficient statistic in the binomial case, which is relevant for the following results.

For the Bayes' prior probability (Beta(1,1)), the posterior probability is:

:\operatorname{posterior}(p=x\mid s,f) = \frac{x^{s}(1-x)^{n-s}}{\Beta(s+1,n-s+1)}, \text{ with mean} = \frac{s+1}{n+2},\text{ (and mode} = \frac{s}{n}\text{ if } 0 < s < n).

For the Jeffreys' prior probability (Beta(1/2,1/2)), the posterior probability is:

:\operatorname{posterior}(p=x\mid s,f) = \frac{x^{s-\frac{1}{2}}(1-x)^{n-s-\frac{1}{2}}}{\Beta(s+\tfrac{1}{2},n-s+\tfrac{1}{2})}, \text{ with mean} = \frac{s+\tfrac{1}{2}}{n+1},\text{ (and mode} = \frac{s-\tfrac{1}{2}}{n-1}\text{ if } \tfrac{1}{2} < s < n-\tfrac{1}{2}).

and for the Haldane prior probability (Beta(0,0)), the posterior probability is:

:\operatorname{posterior}(p=x\mid s,f) = \frac{x^{s-1}(1-x)^{n-s-1}}{\Beta(s,n-s)}, \text{ with mean} = \frac{s}{n},\text{ (and mode} = \frac{s-1}{n-2}\text{ if } 1 < s < n-1).

From the above expressions it follows that for ''s''/''n'' = 1/2 all the above three prior probabilities result in the identical location for the posterior probability mean = mode = 1/2. For ''s''/''n'' < 1/2, the means of the posterior probabilities, using the following priors, are such that: mean for Bayes prior > mean for Jeffreys prior > mean for Haldane prior. For ''s''/''n'' > 1/2 the order of these inequalities is reversed such that the Haldane prior probability results in the largest posterior mean. The ''Haldane'' prior probability Beta(0,0) results in a posterior probability density with ''mean'' (the expected value for the probability of success in the "next" trial) identical to the ratio ''s''/''n'' of the number of successes to the total number of trials. Therefore, the Haldane prior results in a posterior probability with expected value in the next trial equal to the maximum likelihood. The ''Bayes'' prior probability Beta(1,1) results in a posterior probability density with ''mode'' identical to the ratio ''s''/''n'' (the maximum likelihood).

In the case that 100% of the trials have been successful (''s'' = ''n''), the ''Bayes'' prior probability Beta(1,1) results in a posterior expected value equal to the rule of succession (''n'' + 1)/(''n'' + 2), while the Haldane prior Beta(0,0) results in a posterior expected value of 1 (absolute certainty of success in the next trial). Jeffreys prior probability results in a posterior expected value equal to (''n'' + 1/2)/(''n'' + 1). Perks (p. 303) points out: "This provides a new rule of succession and expresses a 'reasonable' position to take up, namely, that after an unbroken run of n successes we assume a probability for the next trial equivalent to the assumption that we are about half-way through an average run, i.e. that we expect a failure once in (2''n'' + 2) trials. The Bayes–Laplace rule implies that we are about at the end of an average run or that we expect a failure once in (''n'' + 2) trials. The comparison clearly favours the new result (what is now called Jeffreys prior) from the point of view of 'reasonableness'."

Conversely, in the case that 100% of the trials have resulted in failure (''s'' = 0), the ''Bayes'' prior probability Beta(1,1) results in a posterior expected value for success in the next trial equal to 1/(''n'' + 2), while the Haldane prior Beta(0,0) results in a posterior expected value of success in the next trial of 0 (absolute certainty of failure in the next trial). Jeffreys prior probability results in a posterior expected value for success in the next trial equal to (1/2)/(''n'' + 1), which Perks (p. 303) points out: "is a much more reasonably remote result than the Bayes-Laplace result 1/(''n'' + 2)".

Jaynes questions (for the uniform prior Beta(1,1)) the use of these formulas for the cases ''s'' = 0 or ''s'' = ''n'' because the integrals do not converge (Beta(1,1) is an improper prior for ''s'' = 0 or ''s'' = ''n''). In practice, the conditions 0 < ''s'' < ''n'' therefore need to hold for these expressions to be meaningful. Perks (p. 303) shows that, for what is now known as the Jeffreys prior, the probability that a further run of (''n'' + 1) trials will all be successes is ((''n'' + 1/2)/(''n'' + 1))((''n'' + 3/2)/(''n'' + 2)) ⋯ ((2''n'' + 1/2)/(2''n'' + 1)), which for ''n'' = 1, 2, 3 gives 15/24, 315/480, 9009/13440; rapidly approaching a limiting value of 1/\sqrt{2} = 0.70710678\ldots as ''n'' tends to infinity. Perks remarks that what is now known as the Jeffreys prior "is clearly more 'reasonable' than either the Bayes-Laplace result or the result on the (Haldane) alternative rule rejected by Jeffreys which gives certainty as the probability. It clearly provides a very much better correspondence with the process of induction. Whether it is 'absolutely' reasonable for the purpose, i.e. whether it is yet large enough, without the absurdity of reaching unity, is a matter for others to decide. But it must be realized that the result depends on the assumption of complete indifference and absence of knowledge prior to the sampling experiment."
Following are the variances of the posterior distribution obtained with these three prior probability distributions:

for the Bayes' prior probability (Beta(1,1)), the posterior variance is:

:\text{variance} = \frac{(s+1)(n-s+1)}{(n+2)^2(n+3)},\text{ which for } s=\frac{n}{2} \text{ results in variance} = \frac{1}{4(n+3)}

for the Jeffreys' prior probability (Beta(1/2,1/2)), the posterior variance is:

:\text{variance} = \frac{(s+\tfrac{1}{2})(n-s+\tfrac{1}{2})}{(n+1)^2(n+2)},\text{ which for } s=\frac{n}{2} \text{ results in variance} = \frac{1}{4(n+2)}

and for the Haldane prior probability (Beta(0,0)), the posterior variance is:

:\text{variance} = \frac{s(n-s)}{n^2(n+1)},\text{ which for } s=\frac{n}{2} \text{ results in variance} = \frac{1}{4(n+1)}

So, as remarked by Silvey, for large ''n'', the variance is small and hence the posterior distribution is highly concentrated, whereas the assumed prior distribution was very diffuse. This is in accord with what one would hope for, as vague prior knowledge is transformed (through Bayes' theorem) into more precise posterior knowledge by an informative experiment. For small ''n'' the Haldane Beta(0,0) prior results in the largest posterior variance while the Bayes Beta(1,1) prior results in the more concentrated posterior. Jeffreys prior Beta(1/2,1/2) results in a posterior variance in between the other two. As ''n'' increases, the variance rapidly decreases so that the posterior variance for all three priors converges to approximately the same value (approaching zero variance as ''n'' → ∞). Recalling the previous result that the ''Haldane'' prior probability Beta(0,0) results in a posterior probability density with ''mean'' (the expected value for the probability of success in the "next" trial) identical to the ratio ''s''/''n'' of the number of successes to the total number of trials, it follows from the above expression that also the ''Haldane'' prior Beta(0,0) results in a posterior with ''variance'' identical to the variance expressed in terms of the maximum likelihood estimate ''s''/''n'' and sample size (in the parametrization by mean and sample size):

:\text{variance} = \frac{\mu(1-\mu)}{1+\nu} = \frac{\frac{s}{n}\left(1 - \frac{s}{n}\right)}{1+n}

with the mean ''μ'' = ''s''/''n'' and the sample size ''ν'' = ''n''.

In Bayesian inference, using a prior distribution Beta(''α''Prior, ''β''Prior) prior to a binomial distribution is equivalent to adding (''α''Prior − 1) pseudo-observations of "success" and (''β''Prior − 1) pseudo-observations of "failure" to the actual number of successes and failures observed, then estimating the parameter ''p'' of the binomial distribution by the proportion of successes over both real and pseudo-observations. A uniform prior Beta(1,1) does not add (or subtract) any pseudo-observations since for Beta(1,1) it follows that (''α''Prior − 1) = 0 and (''β''Prior − 1) = 0. The Haldane prior Beta(0,0) subtracts one pseudo-observation from each and Jeffreys prior Beta(1/2,1/2) subtracts 1/2 pseudo-observation of success and an equal number of failure. This subtraction has the effect of smoothing out the posterior distribution. If the proportion of successes is not 50% (''s''/''n'' ≠ 1/2), values of ''α''Prior and ''β''Prior less than 1 (and therefore negative (''α''Prior − 1) and (''β''Prior − 1)) favor sparsity, i.e. distributions where the parameter ''p'' is closer to either 0 or 1. In effect, values of ''α''Prior and ''β''Prior between 0 and 1, when operating together, function as a concentration parameter.
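The posterior means and variances quoted above for the Bayes, Jeffreys and Haldane priors can be compared directly; a minimal sketch (assuming SciPy, with arbitrary example data ''s'' = 3, ''n'' = 10):

<syntaxhighlight lang="python">
from scipy.stats import beta

def posterior_summary(s, n, a_prior, b_prior):
    """Mean and variance of the posterior Beta(s + a_prior, n - s + b_prior)."""
    a, b = s + a_prior, n - s + b_prior
    return beta.mean(a, b), beta.var(a, b)

s, n = 3, 10   # illustrative data only
for name, (a0, b0) in [("Haldane Beta(0,0)", (0.0, 0.0)),
                       ("Jeffreys Beta(1/2,1/2)", (0.5, 0.5)),
                       ("Bayes Beta(1,1)", (1.0, 1.0))]:
    print(name, posterior_summary(s, n, a0, b0))
</syntaxhighlight>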
The accompanying plots show the posterior probability density functions for several combinations of sample size ''n'' (from the degenerate case ''n'' = 3 up to ''n'' = 50), number of successes ''s'', and prior Beta(''α''Prior, ''β''Prior). The first plot shows the symmetric cases (''s'' = ''n''/2), with mean = mode = 1/2, and the second plot shows the skewed cases (''s'' = ''n''/4). The images show that there is little difference between the priors for the posterior with a sample size of 50 (characterized by a more pronounced peak near ''p'' = 1/2). Significant differences appear for very small sample sizes (in particular for the flatter distribution for the degenerate case of sample size = 3). Therefore, the skewed cases, with successes ''s'' = ''n''/4, show a larger effect from the choice of prior, at small sample size, than the symmetric cases. For symmetric distributions, the Bayes prior Beta(1,1) results in the most "peaky" and highest posterior distributions and the Haldane prior Beta(0,0) results in the flattest and lowest-peak distribution. The Jeffreys prior Beta(1/2,1/2) lies in between them. For nearly symmetric, not too skewed distributions the effect of the priors is similar. For very small sample size (in this case for a sample size of 3) and skewed distribution (in this example for ''s'' = ''n''/4) the Haldane prior can result in a reverse-J-shaped distribution with a singularity at the left end. However, this happens only in degenerate cases (in this example ''n'' = 3 and hence ''s'' = 3/4 < 1, a degenerate value because ''s'' should be greater than unity in order for the posterior of the Haldane prior to have a mode located between the ends, and because ''s'' = 3/4 is not an integer number, hence it violates the initial assumption of a binomial distribution for the likelihood) and it is not an issue in generic cases of reasonable sample size (such that the condition 1 < ''s'' < ''n'' − 1, necessary for a mode to exist between both ends, is fulfilled).

In Chapter 12 (p. 385) of his book, Jaynes asserts that the ''Haldane prior'' Beta(0,0) describes a ''prior state of knowledge of complete ignorance'', where we are not even sure whether it is physically possible for an experiment to yield either a success or a failure, while the ''Bayes (uniform) prior Beta(1,1) applies if'' one knows that ''both binary outcomes are possible''. Jaynes states: "''interpret the Bayes-Laplace (Beta(1,1)) prior as describing not a state of complete ignorance'', but the state of knowledge in which we have observed one success and one failure... once we have seen at least one success and one failure, then we know that the experiment is a true binary one, in the sense of physical possibility." Jaynes does not specifically discuss Jeffreys prior Beta(1/2,1/2) (Jaynes's discussion of "Jeffreys prior" on pp. 181, 423 and in chapter 12 of his book refers instead to the improper, un-normalized, prior "1/''p'' ''dp''" introduced by Jeffreys in the 1939 edition of his book, seven years before he introduced what is now known as Jeffreys' invariant prior: the square root of the determinant of Fisher's information matrix. ''"1/p" is Jeffreys' (1946) invariant prior for the exponential distribution, not for the Bernoulli or binomial distributions''). However, it follows from the above discussion that the Jeffreys Beta(1/2,1/2) prior represents a state of knowledge in between the Haldane Beta(0,0) and Bayes Beta(1,1) priors.

Similarly, Karl Pearson in his 1892 book The Grammar of Science (p. 144 of the 1900 edition) maintained that the Bayes (Beta(1,1)) uniform prior was not a complete ignorance prior, and that it should be used when prior information justified us to "distribute our ignorance equally". K. Pearson wrote: "Yet the only supposition that we appear to have made is this: that, knowing nothing of nature, routine and anomy (from the Greek ανομία, namely: a- "without", and nomos "law") are to be considered as equally likely to occur. Now we were not really justified in making even this assumption, for it involves a knowledge that we do not possess regarding nature. We use our ''experience'' of the constitution and action of coins in general to assert that heads and tails are equally probable, but we have no right to assert before experience that, as we know nothing of nature, routine and breach are equally probable. In our ignorance we ought to consider before experience that nature may consist of all routines, all anomies (normlessness), or a mixture of the two in any proportion whatever, and that all such are equally probable. Which of these constitutions after experience is the most probable must clearly depend on what that experience has been like."

If there is sufficient sampling data, ''and the posterior probability mode is not located at one of the extremes of the domain'' (''x'' = 0 or ''x'' = 1), the three priors of Bayes (Beta(1,1)), Jeffreys (Beta(1/2,1/2)) and Haldane (Beta(0,0)) should yield similar ''posterior'' probability densities. Otherwise, as Gelman et al. (p. 65) point out, "if so few data are available that the choice of noninformative prior distribution makes a difference, one should put relevant information into the prior distribution", or as Berger (p. 125) points out, "when different reasonable priors yield substantially different answers, can it be right to state that there ''is'' a single answer? Would it not be better to admit that there is scientific uncertainty, with the conclusion depending on prior beliefs?"


Occurrence and applications


Order statistics

The beta distribution has an important application in the theory of order statistics. A basic result is that the distribution of the ''k''th smallest of a sample of size ''n'' from a continuous uniform distribution has a beta distribution (David, H. A., Nagaraja, H. N. (2003) ''Order Statistics'' (3rd Edition). Wiley, New Jersey, p. 458). This result is summarized as:

:U_{(k)} \sim \operatorname{Beta}(k,n+1-k).

From this, and application of the theory related to the probability integral transform, the distribution of any individual order statistic from any continuous distribution can be derived.
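This order-statistic result is easy to verify by simulation; the following sketch (assuming NumPy and SciPy, with arbitrary choices of ''n'' and ''k'') compares the empirical distribution of the ''k''-th smallest of ''n'' uniform variates with Beta(''k'', ''n'' + 1 − ''k''):

<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import beta, kstest

rng = np.random.default_rng(0)
n, k = 10, 3                      # arbitrary example: 3rd smallest of 10 uniforms
u = rng.uniform(size=(100_000, n))
kth_smallest = np.sort(u, axis=1)[:, k - 1]

# Kolmogorov-Smirnov test against the theoretical Beta(k, n + 1 - k) distribution;
# a large p-value is consistent with U_(k) ~ Beta(k, n + 1 - k).
print(kstest(kth_smallest, beta(k, n + 1 - k).cdf))
</syntaxhighlight>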


Subjective logic

In standard logic, propositions are considered to be either true or false. In contradistinction, subjective logic assumes that humans cannot determine with absolute certainty whether a proposition about the real world is absolutely true or false. In subjective logic the posteriori probability estimates of binary events can be represented by beta distributions (A. Jøsang. A Logic for Uncertain Probabilities. ''International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems.'' 9(3), pp. 279–311, June 2001).


Wavelet analysis

A wavelet is a wave-like oscillation with an amplitude that starts out at zero, increases, and then decreases back to zero. It can typically be visualized as a "brief oscillation" that promptly decays. Wavelets can be used to extract information from many different kinds of data, including – but certainly not limited to – audio signals and images. Thus, wavelets are purposefully crafted to have specific properties that make them useful for signal processing. Wavelets are localized in both time and frequency whereas the standard Fourier transform is only localized in frequency. Therefore, standard Fourier transforms are only applicable to stationary processes, while wavelets are applicable to non-stationary processes. Continuous wavelets can be constructed based on the beta distribution. Beta wavelets (H.M. de Oliveira and G.A.A. Araújo. Compactly Supported One-cyclic Wavelets Derived from Beta Distributions. ''Journal of Communication and Information Systems.'' vol. 20, n. 3, pp. 27–33, 2005) can be viewed as a soft variety of Haar wavelets whose shape is fine-tuned by two shape parameters α and β.


Population genetics

The Balding–Nichols model is a two-parameter parametrization of the beta distribution used in population genetics. It is a statistical description of the allele frequencies in the components of a sub-divided population:

:\begin{align}
\alpha &= \mu \nu,\\
\beta &= (1 - \mu)\nu,
\end{align}

where \nu = \alpha+\beta = \frac{1-F}{F} and 0 < F < 1; here ''F'' is (Wright's) genetic distance between two populations.
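A small helper (an illustrative sketch only, with hypothetical values of μ and ''F'') that converts the Balding–Nichols parameters into the corresponding beta shape parameters:

<syntaxhighlight lang="python">
def balding_nichols_shapes(mu, F):
    """Return (alpha, beta) for the Balding-Nichols parametrization,
    where nu = alpha + beta = (1 - F)/F, alpha = mu*nu, beta = (1 - mu)*nu."""
    nu = (1.0 - F) / F
    return mu * nu, (1.0 - mu) * nu

print(balding_nichols_shapes(mu=0.3, F=0.1))   # approximately (2.7, 6.3) for these example values
</syntaxhighlight>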


Project management: task cost and schedule modeling

The beta distribution can be used to model events which are constrained to take place within an interval defined by a minimum and maximum value. For this reason, the beta distribution — along with the triangular distribution — is used extensively in PERT, critical path method (CPM), Joint Cost Schedule Modeling (JCSM) and other project management/control systems to describe the time to completion and the cost of a task. In project management, shorthand computations are widely used to estimate the mean and standard deviation of the beta distribution:

:\begin{align}
\mu(X) &= \frac{a + 4b + c}{6}\\
\sigma(X) &= \frac{c-a}{6}
\end{align}

where ''a'' is the minimum, ''c'' is the maximum, and ''b'' is the most likely value (the mode for ''α'' > 1 and ''β'' > 1).

The above estimate for the mean \mu(X) = \frac{a + 4b + c}{6} is known as the PERT three-point estimation and it is exact for either of the following values of ''β'' (for arbitrary α within these ranges):

:''β'' = ''α'' > 1 (symmetric case) with standard deviation \sigma(X) = \frac{c-a}{2\sqrt{2\alpha+1}}, skewness = 0, and excess kurtosis = \frac{-6}{2\alpha+3}

or

:''β'' = 6 − ''α'' for 5 > ''α'' > 1 (skewed case) with standard deviation

:\sigma(X) = \frac{(c-a)\sqrt{\alpha(6-\alpha)}}{6\sqrt{7}},

skewness = \frac{(3-\alpha)\sqrt{7}}{2\sqrt{\alpha(6-\alpha)}}, and excess kurtosis = \frac{21}{\alpha(6-\alpha)} - 3

The above estimate for the standard deviation ''σ''(''X'') = (''c'' − ''a'')/6 is exact for either of the following values of ''α'' and ''β'':

:''α'' = ''β'' = 4 (symmetric) with skewness = 0, and excess kurtosis = −6/11.
:''β'' = 6 − ''α'' and \alpha = 3 - \sqrt2 (right-tailed, positive skew) with skewness = \frac{1}{\sqrt 2}, and excess kurtosis = 0
:''β'' = 6 − ''α'' and \alpha = 3 + \sqrt2 (left-tailed, negative skew) with skewness = \frac{-1}{\sqrt 2}, and excess kurtosis = 0

Otherwise, these can be poor approximations for beta distributions with other values of α and β, exhibiting average errors of 40% in the mean and 549% in the variance.
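A short sketch (assuming SciPy; the task bounds are hypothetical) of the PERT shorthand estimates next to the exact mean and standard deviation of a four-parameter beta distribution, using α = β = 4, one of the cases listed above for which both shorthand formulas are exact:

<syntaxhighlight lang="python">
from scipy.stats import beta

a, c = 2.0, 14.0          # hypothetical minimum and maximum task duration
b = (a + c) / 2.0         # most likely value; equals the mode when alpha = beta = 4

mean_pert = (a + 4 * b + c) / 6      # PERT three-point estimate of the mean
sd_pert = (c - a) / 6                # PERT shorthand for the standard deviation

# Exact moments of Beta(4, 4) rescaled to the interval [a, c].
exact = beta(4, 4, loc=a, scale=c - a)
print(mean_pert, exact.mean())   # both 8.0
print(sd_pert, exact.std())      # both 2.0
</syntaxhighlight>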


Random variate generation

If ''X'' and ''Y'' are independent, with X \sim \Gamma(\alpha, \theta) and Y \sim \Gamma(\beta, \theta) then

:\frac{X}{X+Y} \sim \Beta(\alpha, \beta).

So one algorithm for generating beta variates is to generate \frac{X}{X+Y}, where ''X'' is a gamma variate with parameters (α, 1) and ''Y'' is an independent gamma variate with parameters (β, 1). In fact, here \frac{X}{X+Y} and X+Y are independent, and X+Y \sim \Gamma(\alpha + \beta, \theta). If Z \sim \Gamma(\gamma, \theta) and Z is independent of X and Y, then \frac{X+Y}{X+Y+Z} \sim \Beta(\alpha+\beta,\gamma) and \frac{X+Y}{X+Y+Z} is independent of \frac{X}{X+Y}. This shows that the product of independent \Beta(\alpha,\beta) and \Beta(\alpha+\beta,\gamma) random variables is a \Beta(\alpha,\beta+\gamma) random variable.

Also, the ''k''th order statistic of ''n'' uniformly distributed variates is \Beta(k, n+1-k), so an alternative if α and β are small integers is to generate α + β − 1 uniform variates and choose the α-th smallest.

Another way to generate the beta distribution is by the Pólya urn model. According to this method, one starts with an "urn" with α "black" balls and β "white" balls and draws uniformly with replacement. Every trial an additional ball is added according to the color of the last ball which was drawn. Asymptotically, the proportion of black and white balls will be distributed according to the beta distribution, where each repetition of the experiment will produce a different value.

It is also possible to use inverse transform sampling.
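A minimal sketch of the gamma-ratio method described above (assuming NumPy and SciPy, with arbitrary shape parameters), checked against the target beta distribution:

<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import beta, kstest

rng = np.random.default_rng(1)
alpha_, beta_ = 2.5, 1.5          # arbitrary example shape parameters

# X ~ Gamma(alpha, 1), Y ~ Gamma(beta, 1) independent  =>  X/(X+Y) ~ Beta(alpha, beta)
x = rng.gamma(shape=alpha_, size=200_000)
y = rng.gamma(shape=beta_, size=200_000)
samples = x / (x + y)

print(kstest(samples, beta(alpha_, beta_).cdf))   # large p-value expected
</syntaxhighlight>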


History

Thomas Bayes, in a posthumous paper published in 1763 by Richard Price, obtained a beta distribution as the density of the probability of success in Bernoulli trials (see above), but the paper does not analyze any of the moments of the beta distribution or discuss any of its properties.

The first systematic modern discussion of the beta distribution is probably due to Karl Pearson. In Pearson's papers the beta distribution is couched as a solution of a differential equation: Pearson's Type I distribution, which is essentially identical to it except for arbitrary shifting and re-scaling (the beta and Pearson Type I distributions can always be equalized by proper choice of parameters). In fact, in several English books and journal articles in the few decades prior to World War II, it was common to refer to the beta distribution as Pearson's Type I distribution. William P. Elderton in his 1906 monograph "Frequency curves and correlation" further analyzes the beta distribution as Pearson's Type I distribution, including a full discussion of the method of moments for the four-parameter case, and diagrams of (what Elderton describes as) U-shaped, J-shaped, twisted J-shaped, "cocked-hat" shapes, horizontal and angled straight-line cases. Elderton wrote "I am chiefly indebted to Professor Pearson, but the indebtedness is of a kind for which it is impossible to offer formal thanks." Elderton in his 1906 monograph provides an impressive amount of information on the beta distribution, including equations for the origin of the distribution chosen to be the mode, as well as for other Pearson distributions: types I through VII. Elderton also included a number of appendixes, including one appendix ("II") on the beta and gamma functions. In later editions, Elderton added equations for the origin of the distribution chosen to be the mean, and analysis of Pearson distributions VIII through XII.

As remarked by Bowman and Shenton, "Fisher and Pearson had a difference of opinion in the approach to (parameter) estimation, in particular relating to (Pearson's method of) moments and (Fisher's method of) maximum likelihood in the case of the Beta distribution." Also according to Bowman and Shenton, "the case of a Type I (beta distribution) model being the center of the controversy was pure serendipity. A more difficult model of 4 parameters would have been hard to find." The long-running public conflict of Fisher with Karl Pearson can be followed in a number of articles in prestigious journals. For example, concerning the estimation of the four parameters for the beta distribution, and Fisher's criticism of Pearson's method of moments as being arbitrary, see Pearson's article "Method of moments and method of maximum likelihood" (published three years after his retirement from University College, London, where his position had been divided between Fisher and Pearson's son Egon), in which Pearson writes "I read (Koshai's paper in the Journal of the Royal Statistical Society, 1933) which as far as I am aware is the only case at present published of the application of Professor Fisher's method. To my astonishment that method depends on first working out the constants of the frequency curve by the (Pearson) Method of Moments and then superposing on it, by what Fisher terms "the Method of Maximum Likelihood" a further approximation to obtain, what he holds, he will thus get, "more efficient values" of the curve constants."

David and Edwards's treatise on the history of statistics cites the first modern treatment of the beta distribution, in 1911, using the beta designation that has become standard, due to Corrado Gini, an Italian statistician, demographer, and sociologist, who developed the Gini coefficient. N. L. Johnson and S. Kotz, in their comprehensive and very informative monograph on leading historical personalities in statistical sciences, credit Corrado Gini as "an early Bayesian... who dealt with the problem of eliciting the parameters of an initial Beta distribution, by singling out techniques which anticipated the advent of the so-called empirical Bayes approach."


References


External links

"Beta Distribution" by Fiona Maclachlan, the Wolfram Demonstrations Project, 2007.
Beta Distribution – Overview and Example, xycoon.com
brighton-webs.co.uk
exstrom.com
Harvard University Statistics 110 Lecture 23 Beta Distribution, Prof. Joe Blitzstein
Mean absolute deviation around the mean

The mean absolute deviation around the mean for the beta distribution with shape parameters ''α'' and ''β'' is:

:\operatorname{E}[|X - E[X]|] = \frac{2 \alpha^\alpha \beta^\beta}{\Beta(\alpha,\beta)(\alpha+\beta)^{\alpha+\beta+1}}

The mean absolute deviation around the mean is a more robust estimator of statistical dispersion than the standard deviation for beta distributions with tails and inflection points at each side of the mode, Beta(''α'', ''β'') distributions with ''α'', ''β'' > 2, as it depends on the linear (absolute) deviations rather than the square deviations from the mean. Therefore, the effect of very large deviations from the mean is not as heavily weighted.

Using Stirling's approximation to the Gamma function, N. L. Johnson and S. Kotz derived an approximation for the ratio of the mean absolute deviation to the standard deviation, valid for values of the shape parameters greater than unity (the relative error for this approximation is only −3.5% for ''α'' = ''β'' = 1, and it decreases to zero as ''α'' → ∞, ''β'' → ∞). At the limit α → ∞, β → ∞, the ratio of the mean absolute deviation to the standard deviation (for the beta distribution) becomes equal to the ratio of the same measures for the normal distribution: \sqrt{\tfrac{2}{\pi}}. For α = β = 1 this ratio equals \tfrac{\sqrt{3}}{2}, so that from α = β = 1 to α, β → ∞ the ratio decreases by 8.5%. For α = β = 0 the standard deviation is exactly equal to the mean absolute deviation around the mean. Therefore, this ratio decreases by 15% from α = β = 0 to α = β = 1, and by 25% from α = β = 0 to α, β → ∞. However, for skewed beta distributions such that α → 0 or β → 0, the ratio of the standard deviation to the mean absolute deviation approaches infinity (although each of them, individually, approaches zero) because the mean absolute deviation approaches zero faster than the standard deviation.

Using the parametrization in terms of mean μ and sample size ν = α + β > 0:

:α = μν, β = (1−μ)ν

one can express the mean absolute deviation around the mean in terms of the mean μ and the sample size ν as follows:

:\operatorname{E}[|X - E[X]|] = \frac{2 \mu^{\mu\nu}(1-\mu)^{(1-\mu)\nu}}{\nu \Beta(\mu\nu,(1-\mu)\nu)}

For a symmetric distribution, the mean is at the middle of the distribution, μ = 1/2, and therefore:

:\begin{align}
\operatorname{E}[|X - E[X]|] &= \frac{2^{1-\nu}}{\nu \Beta(\tfrac{\nu}{2},\tfrac{\nu}{2})}\\
\lim_{\nu\to 0}\operatorname{E}[|X - E[X]|] &= \tfrac{1}{2}\\
\lim_{\nu\to\infty}\operatorname{E}[|X - E[X]|] &= 0
\end{align}

Also, the following limits (with only the noted variable approaching the limit) can be obtained from the above expressions:

:\begin{align}
\lim_{\beta\to 0}\operatorname{E}[|X - E[X]|] &= \lim_{\alpha\to 0}\operatorname{E}[|X - E[X]|] = 0\\
\lim_{\beta\to\infty}\operatorname{E}[|X - E[X]|] &= \lim_{\alpha\to\infty}\operatorname{E}[|X - E[X]|] = 0\\
\lim_{\mu\to 0}\operatorname{E}[|X - E[X]|] &= \lim_{\mu\to 1}\operatorname{E}[|X - E[X]|] = 0
\end{align}


Mean absolute difference

The mean absolute difference for the beta distribution is:

:\mathrm{MD} = \int_0^1 \int_0^1 f(x;\alpha,\beta)\,f(y;\alpha,\beta)\,|x-y|\,dx\,dy = \left(\frac{4}{\alpha+\beta}\right)\frac{\Beta(\alpha+\beta,\alpha+\beta)}{\Beta(\alpha,\alpha)\Beta(\beta,\beta)}

The Gini coefficient for the beta distribution is half of the relative mean absolute difference:

:\mathrm{G} = \left(\frac{2}{\alpha}\right)\frac{\Beta(\alpha+\beta,\alpha+\beta)}{\Beta(\alpha,\alpha)\Beta(\beta,\beta)}


Skewness

The skewness (the third moment centered on the mean, normalized by the 3/2 power of the variance) of the beta distribution is

:\gamma_1 = \frac{\operatorname{E}[(X-\mu)^3]}{(\operatorname{var}(X))^{3/2}} = \frac{2(\beta-\alpha)\sqrt{\alpha+\beta+1}}{(\alpha+\beta+2)\sqrt{\alpha\beta}}.

Letting α = β in the above expression one obtains γ1 = 0, showing once again that for α = β the distribution is symmetric and hence the skewness is zero. Positive skew (right-tailed) for α < β, negative skew (left-tailed) for α > β.

Using the parametrization in terms of mean μ and sample size ν = α + β:

:\begin{align}
\alpha &= \mu\nu, \text{ where } \nu = (\alpha+\beta) > 0\\
\beta &= (1-\mu)\nu, \text{ where } \nu = (\alpha+\beta) > 0.
\end{align}

one can express the skewness in terms of the mean μ and the sample size ν as follows:

:\gamma_1 = \frac{2(1-2\mu)\sqrt{1+\nu}}{(2+\nu)\sqrt{\mu(1-\mu)}}.

The skewness can also be expressed just in terms of the variance ''var'' and the mean μ as follows:

:\gamma_1 = \frac{2(1-2\mu)\sqrt{\operatorname{var}}}{\mu(1-\mu)+\operatorname{var}}\text{ if } \operatorname{var} < \mu(1-\mu)

The accompanying plot of skewness as a function of variance and mean shows that maximum variance (1/4) is coupled with zero skewness and the symmetry condition (μ = 1/2), and that maximum skewness (positive or negative infinity) occurs when the mean is located at one end or the other, so that the "mass" of the probability distribution is concentrated at the ends (minimum variance).

The following expression for the square of the skewness, in terms of the sample size ν = α + β and the variance ''var'', is useful for the method of moments estimation of four parameters:

:(\gamma_1)^2 = \frac{4(\beta-\alpha)^2(\alpha+\beta+1)}{\alpha\beta(\alpha+\beta+2)^2} = \frac{4}{(2+\nu)^2}\bigg(\frac{1}{\operatorname{var}}-4(1+\nu)\bigg)

This expression correctly gives a skewness of zero for α = β, since in that case (see above): \operatorname{var} = \frac{1}{4(1+\nu)}.

For the symmetric case (α = β), skewness = 0 over the whole range, and the following limits apply:

:\lim_{\alpha=\beta\to 0}\gamma_1 = \lim_{\alpha=\beta\to\infty}\gamma_1 = \lim_{\nu\to 0}\gamma_1 = \lim_{\nu\to\infty}\gamma_1 = \lim_{\mu\to\frac{1}{2}}\gamma_1 = 0

For the asymmetric cases (α ≠ β) the following limits (with only the noted variable approaching the limit) can be obtained from the above expressions:

:\begin{align}
&\lim_{\alpha\to 0}\gamma_1 = \lim_{\mu\to 0}\gamma_1 = \infty\\
&\lim_{\beta\to 0}\gamma_1 = \lim_{\mu\to 1}\gamma_1 = -\infty\\
&\lim_{\alpha\to\infty}\gamma_1 = -\frac{2}{\sqrt{\beta}},\quad \lim_{\beta\to 0}(\lim_{\alpha\to\infty}\gamma_1) = -\infty,\quad \lim_{\beta\to\infty}(\lim_{\alpha\to\infty}\gamma_1) = 0\\
&\lim_{\beta\to\infty}\gamma_1 = \frac{2}{\sqrt{\alpha}},\quad \lim_{\alpha\to 0}(\lim_{\beta\to\infty}\gamma_1) = \infty,\quad \lim_{\alpha\to\infty}(\lim_{\beta\to\infty}\gamma_1) = 0\\
&\lim_{\nu\to 0}\gamma_1 = \frac{1-2\mu}{\sqrt{\mu(1-\mu)}},\quad \lim_{\mu\to 0}(\lim_{\nu\to 0}\gamma_1) = \infty,\quad \lim_{\mu\to 1}(\lim_{\nu\to 0}\gamma_1) = -\infty
\end{align}
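The closed-form skewness above can be checked against a library implementation; a quick sketch (assuming NumPy and SciPy, with arbitrary shape parameters):

<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import beta

a, b = 2.0, 5.0   # arbitrary example
gamma1 = 2 * (b - a) * np.sqrt(a + b + 1) / ((a + b + 2) * np.sqrt(a * b))
print(gamma1, beta.stats(a, b, moments='s'))   # the two values agree
</syntaxhighlight>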


Kurtosis

The beta distribution has been applied in acoustic analysis to assess damage to gears, as the kurtosis of the beta distribution has been reported to be a good indicator of the condition of a gear. Kurtosis has also been used to distinguish the seismic signal generated by a person's footsteps from other signals. As persons or other targets moving on the ground generate continuous signals in the form of seismic waves, one can separate different targets based on the seismic waves they generate. Kurtosis is sensitive to impulsive signals, so it is much more sensitive to the signal generated by human footsteps than other signals generated by vehicles, winds, noise, etc. Unfortunately, the notation for kurtosis has not been standardized. Kenney and Keeping use the symbol γ2 for the excess kurtosis, but Abramowitz and Stegun use different terminology. To prevent confusion between kurtosis (the fourth moment centered on the mean, normalized by the square of the variance) and excess kurtosis, when using symbols, they will be spelled out as follows:

:\begin{align}
\text{excess kurtosis} &= \text{kurtosis} - 3\\
&= \frac{\operatorname{E}[(X-\mu)^4]}{(\operatorname{var}(X))^2} - 3\\
&= \frac{6[\alpha^3 - \alpha^2(2\beta-1) + \beta^2(\beta+1) - 2\alpha\beta(\beta+2)]}{\alpha\beta(\alpha+\beta+2)(\alpha+\beta+3)}\\
&= \frac{6[(\alpha-\beta)^2(\alpha+\beta+1) - \alpha\beta(\alpha+\beta+2)]}{\alpha\beta(\alpha+\beta+2)(\alpha+\beta+3)}.
\end{align}

Letting α = β in the above expression one obtains

:\text{excess kurtosis} = -\frac{6}{2\alpha+3}\text{ if }\alpha=\beta.

Therefore, for symmetric beta distributions, the excess kurtosis is negative, increasing from a minimum value of −2 at the limit as α = β → 0, and approaching a maximum value of zero as α = β → ∞. The value of −2 is the minimum value of excess kurtosis that any distribution (not just beta distributions, but any distribution of any possible kind) can ever achieve. This minimum value is reached when all the probability density is entirely concentrated at each end ''x'' = 0 and ''x'' = 1, with nothing in between: a 2-point Bernoulli distribution with equal probability 1/2 at each end (a coin toss: see the section below "Kurtosis bounded by the square of the skewness" for further discussion). The description of kurtosis as a measure of the "potential outliers" (or "potential rare, extreme values") of the probability distribution is correct for all distributions including the beta distribution. When rare, extreme values can occur in the beta distribution, the higher its kurtosis; otherwise, the kurtosis is lower. For α ≠ β, skewed beta distributions, the excess kurtosis can reach unlimited positive values (particularly for α → 0 for finite β, or for β → 0 for finite α) because the side away from the mode will produce occasional extreme values. Minimum kurtosis takes place when the mass density is concentrated equally at each end (and therefore the mean is at the center), and there is no probability mass density in between the ends.

Using the parametrization in terms of mean μ and sample size ν = α + β:

:\begin{align}
\alpha &= \mu\nu, \text{ where } \nu = (\alpha+\beta) > 0\\
\beta &= (1-\mu)\nu, \text{ where } \nu = (\alpha+\beta) > 0.
\end{align}

one can express the excess kurtosis in terms of the mean μ and the sample size ν as follows:

:\text{excess kurtosis} = \frac{6}{3+\nu}\bigg(\frac{(1-2\mu)^2(1+\nu)}{\mu(1-\mu)(2+\nu)} - 1\bigg)

The excess kurtosis can also be expressed in terms of just the following two parameters: the variance ''var'' and the sample size ν as follows:

:\text{excess kurtosis} = \frac{6}{(2+\nu)(3+\nu)}\left(\frac{1}{\operatorname{var}} - 6 - 5\nu\right)\text{ if }\operatorname{var} < \mu(1-\mu)

and, in terms of the variance ''var'' and the mean μ as follows:

:\text{excess kurtosis} = \frac{6\operatorname{var}\,(1 - 5\mu(1-\mu) - \operatorname{var})}{(\mu(1-\mu)+\operatorname{var})(\mu(1-\mu)+2\operatorname{var})}\text{ if }\operatorname{var} < \mu(1-\mu)

The plot of excess kurtosis as a function of the variance and the mean shows that the minimum value of the excess kurtosis (−2, which is the minimum possible value for excess kurtosis for any distribution) is intimately coupled with the maximum value of variance (1/4) and the symmetry condition: the mean occurring at the midpoint (μ = 1/2). This occurs for the symmetric case of α = β = 0, with zero skewness. At the limit, this is the 2-point Bernoulli distribution with equal probability 1/2 at each Dirac delta function end ''x'' = 0 and ''x'' = 1 and zero probability everywhere else. (A coin toss: one face of the coin being ''x'' = 0 and the other face being ''x'' = 1.) Variance is maximum because the distribution is bimodal with nothing in between the two modes (spikes) at each end. Excess kurtosis is minimum: the probability density "mass" is zero at the mean and it is concentrated at the two peaks at each end. Excess kurtosis reaches the minimum possible value (for any distribution) when the probability density function has two spikes at each end: it is bi-"peaky" with nothing in between them. On the other hand, the plot shows that for extreme skewed cases, where the mean is located near one or the other end (μ = 0 or μ = 1), the variance is close to zero, and the excess kurtosis rapidly approaches infinity when the mean of the distribution approaches either end.

Alternatively, the excess kurtosis can also be expressed in terms of just the following two parameters: the square of the skewness, and the sample size ν as follows:

:\text{excess kurtosis} = \frac{6}{3+\nu}\bigg(\frac{(2+\nu)}{4}(\text{skewness})^2 - 1\bigg)\text{ if }(\text{skewness})^2 - 2 < \text{excess kurtosis} < \tfrac{3}{2}(\text{skewness})^2

From this last expression, one can obtain the same limits published practically a century ago by Karl Pearson in his paper, for the beta distribution (see the section below titled "Kurtosis bounded by the square of the skewness"). Setting α + β = ν = 0 in the above expression, one obtains Pearson's lower boundary (values for the skewness and excess kurtosis below the boundary (excess kurtosis + 2 − skewness² = 0) cannot occur for any distribution, and hence Karl Pearson appropriately called the region below this boundary the "impossible region"). The limit of α + β = ν → ∞ determines Pearson's upper boundary.

:\begin{align}
&\lim_{\nu\to 0}\text{excess kurtosis} = (\text{skewness})^2 - 2\\
&\lim_{\nu\to\infty}\text{excess kurtosis} = \tfrac{3}{2}(\text{skewness})^2
\end{align}

therefore:

:(\text{skewness})^2 - 2 < \text{excess kurtosis} < \tfrac{3}{2}(\text{skewness})^2

Values of ν = α + β such that ν ranges from zero to infinity, 0 < ν < ∞, span the whole region of the beta distribution in the plane of excess kurtosis versus squared skewness.

For the symmetric case (α = β), the following limits apply (and for α = β = 1, the uniform distribution, the excess kurtosis equals −6/5):

:\begin{align}
&\lim_{\alpha=\beta\to 0}\text{excess kurtosis} = -2\\
&\lim_{\alpha=\beta\to\infty}\text{excess kurtosis} = 0
\end{align}

For the unsymmetric cases (α ≠ β) the following limits (with only the noted variable approaching the limit) can be obtained from the above expressions:

:\begin{align}
&\lim_{\alpha\to 0}\text{excess kurtosis} = \lim_{\beta\to 0}\text{excess kurtosis} = \lim_{\mu\to 0}\text{excess kurtosis} = \lim_{\mu\to 1}\text{excess kurtosis} = \infty\\
&\lim_{\alpha\to\infty}\text{excess kurtosis} = \frac{6}{\beta},\text{ and } \lim_{\beta\to 0}(\lim_{\alpha\to\infty}\text{excess kurtosis}) = \infty,\ \lim_{\beta\to\infty}(\lim_{\alpha\to\infty}\text{excess kurtosis}) = 0\\
&\lim_{\beta\to\infty}\text{excess kurtosis} = \frac{6}{\alpha},\text{ and } \lim_{\alpha\to 0}(\lim_{\beta\to\infty}\text{excess kurtosis}) = \infty,\ \lim_{\alpha\to\infty}(\lim_{\beta\to\infty}\text{excess kurtosis}) = 0\\
&\lim_{\nu\to 0}\text{excess kurtosis} = -6 + \frac{1}{\mu(1-\mu)},\text{ and } \lim_{\mu\to 0}(\lim_{\nu\to 0}\text{excess kurtosis}) = \infty,\ \lim_{\mu\to 1}(\lim_{\nu\to 0}\text{excess kurtosis}) = \infty
\end{align}


Characteristic function

The characteristic function is the Fourier transform of the probability density function. The characteristic function of the beta distribution is Kummer's confluent hypergeometric function (of the first kind):

:\begin{align}
\varphi_X(\alpha;\beta;t) &= \operatorname{E}\left[e^{itX}\right]\\
&= \int_0^1 e^{itx} f(x;\alpha,\beta)\,dx\\
&= {}_1F_1(\alpha; \alpha+\beta; it)\\
&= \sum_{n=0}^\infty \frac{\alpha^{(n)}}{(\alpha+\beta)^{(n)}}\frac{(it)^n}{n!}\\
&= 1 + \sum_{k=1}^\infty \left(\prod_{r=0}^{k-1}\frac{\alpha+r}{\alpha+\beta+r}\right)\frac{(it)^k}{k!}
\end{align}

where

:x^{(n)} = x(x+1)(x+2)\cdots(x+n-1)

is the rising factorial, also called the "Pochhammer symbol". The value of the characteristic function for ''t'' = 0 is one:

:\varphi_X(\alpha;\beta;0) = {}_1F_1(\alpha;\alpha+\beta;0) = 1.

Also, the real and imaginary parts of the characteristic function enjoy the following symmetries with respect to the origin of variable ''t'':

:\textrm{Re}\left[{}_1F_1(\alpha;\alpha+\beta;it)\right] = \textrm{Re}\left[{}_1F_1(\alpha;\alpha+\beta;-it)\right]

:\textrm{Im}\left[{}_1F_1(\alpha;\alpha+\beta;it)\right] = -\textrm{Im}\left[{}_1F_1(\alpha;\alpha+\beta;-it)\right]

The symmetric case α = β simplifies the characteristic function of the beta distribution to a Bessel function, since in the special case α + β = 2α the confluent hypergeometric function (of the first kind) reduces to a Bessel function (the modified Bessel function of the first kind I_{\alpha-\frac{1}{2}}) using Kummer's second transformation as follows (another example of the symmetric case α = β = ''n''/2 for beamforming applications can be found in Figure 11 of the cited reference):

:\begin{align}
{}_1F_1(\alpha;2\alpha;it) &= e^{\frac{it}{2}}\, {}_0F_1\left(;\alpha+\tfrac{1}{2};\frac{(it)^2}{16}\right)\\
&= e^{\frac{it}{2}}\left(\frac{it}{4}\right)^{\frac{1}{2}-\alpha}\Gamma\left(\alpha+\tfrac{1}{2}\right) I_{\alpha-\frac{1}{2}}\left(\frac{it}{2}\right).
\end{align}

In the accompanying plots, the real part (Re) of the characteristic function of the beta distribution is displayed for symmetric (α = β) and skewed (α ≠ β) cases.


Other moments


Moment generating function

It also follows that the moment generating function is

:\begin{align}
M_X(\alpha;\beta;t) &= \operatorname{E}\left[e^{tX}\right]\\
&= \int_0^1 e^{tx} f(x;\alpha,\beta)\,dx\\
&= {}_1F_1(\alpha;\alpha+\beta;t)\\
&= \sum_{n=0}^\infty \frac{\alpha^{(n)}}{(\alpha+\beta)^{(n)}}\frac{t^n}{n!}\\
&= 1 + \sum_{k=1}^\infty\left(\prod_{r=0}^{k-1}\frac{\alpha+r}{\alpha+\beta+r}\right)\frac{t^k}{k!}
\end{align}

In particular ''M''''X''(''α''; ''β''; 0) = 1.


Higher moments

Using the moment generating function, the ''k''-th raw moment is given by the factor

:\prod_{r=0}^{k-1}\frac{\alpha+r}{\alpha+\beta+r}

multiplying the (exponential series) term \left(\frac{t^k}{k!}\right) in the series of the moment generating function

:\operatorname{E}[X^k] = \frac{\alpha^{(k)}}{(\alpha+\beta)^{(k)}} = \prod_{r=0}^{k-1}\frac{\alpha+r}{\alpha+\beta+r}

where (''x'')(''k'') is a Pochhammer symbol representing rising factorial. It can also be written in a recursive form as

:\operatorname{E}[X^k] = \frac{\alpha+k-1}{\alpha+\beta+k-1}\operatorname{E}[X^{k-1}].

Since the moment generating function M_X(\alpha;\beta;\cdot) has a positive radius of convergence, the beta distribution is determined by its moments.
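The recursion for the raw moments can be implemented in a few lines; a sketch (assuming SciPy only for comparison, with arbitrary parameter values):

<syntaxhighlight lang="python">
from scipy.stats import beta

def raw_moment(k, a, b):
    """E[X^k] for X ~ Beta(a, b) via E[X^k] = (a + k - 1)/(a + b + k - 1) * E[X^(k-1)]."""
    m = 1.0                      # E[X^0] = 1
    for j in range(1, k + 1):
        m *= (a + j - 1) / (a + b + j - 1)
    return m

a, b = 2.0, 3.0                  # arbitrary example
print(raw_moment(3, a, b))       # 0.1142857... = (2/5)*(3/6)*(4/7)
print(beta.moment(3, a, b))      # same value from SciPy
</syntaxhighlight>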


Moments of transformed random variables


=Moments of linearly transformed, product and inverted random variables

= One can also show the following expectations for a transformed random variable, where the random variable ''X'' is Beta-distributed with parameters α and β: ''X'' ~ Beta(α, β). The expected value of the variable 1 − ''X'' is the mirror-symmetry of the expected value based on ''X'': :\begin & \operatorname[1-X] = \frac \\ & \operatorname[X (1-X)] =\operatorname[(1-X)X ] =\frac \end Due to the mirror-symmetry of the probability density function of the beta distribution, the variances based on variables ''X'' and 1 − ''X'' are identical, and the covariance on ''X''(1 − ''X'' is the negative of the variance: :\operatorname[(1-X)]=\operatorname[X] = -\operatorname[X,(1-X)]= \frac These are the expected values for inverted variables, (these are related to the harmonic means, see ): :\begin & \operatorname \left [\frac \right ] = \frac \text \alpha > 1\\ & \operatorname\left [\frac \right ] =\frac \text \beta > 1 \end The following transformation by dividing the variable ''X'' by its mirror-image ''X''/(1 − ''X'') results in the expected value of the "inverted beta distribution" or
beta prime distribution
(also known as beta distribution of the second kind or Pearson distribution, Pearson's Type VI): : \begin & \operatorname\left[\frac\right] =\frac \text\beta > 1\\ & \operatorname\left[\frac\right] =\frac\text\alpha > 1 \end Variances of these transformed variables can be obtained by integration, as the expected values of the second moments centered on the corresponding variables: :\operatorname \left[\frac \right] =\operatorname\left[\left(\frac - \operatorname\left[\frac \right ] \right )^2\right]= :\operatorname\left [\frac \right ] =\operatorname \left [\left (\frac - \operatorname\left [\frac \right ] \right )^2 \right ]= \frac \text\alpha > 2 The following variance of the variable ''X'' divided by its mirror-image (''X''/(1−''X'') results in the variance of the "inverted beta distribution" or
beta prime distribution
(also known as beta distribution of the second kind or Pearson distribution, Pearson's Type VI): :\operatorname \left [\frac \right ] =\operatorname \left [\left(\frac - \operatorname \left [\frac \right ] \right)^2 \right ]=\operatorname \left [\frac \right ] = :\operatorname \left [\left (\frac - \operatorname \left [\frac \right ] \right )^2 \right ]= \frac \text\beta > 2 The covariances are: :\operatorname\left [\frac,\frac \right ] = \operatorname\left[\frac,\frac \right] =\operatorname\left[\frac,\frac\right ] = \operatorname\left[\frac,\frac \right] =\frac \text \alpha, \beta > 1 These expectations and variances appear in the four-parameter Fisher information matrix (.)
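As an illustrative check (not in the original text), the expectations of the inverted and ratio-transformed variables can be compared with Monte Carlo averages; the shape parameters below are chosen large enough for the quoted moments to exist.

```python
# Monte Carlo sanity check of E[1/X] = (a + b - 1)/(a - 1) (for a > 1) and
# E[X/(1-X)] = a/(b - 1) (for b > 1); sample size and parameters are illustrative.
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(0)
a, b = 3.0, 4.0
x = beta.rvs(a, b, size=200_000, random_state=rng)

print(np.mean(1 / x), (a + b - 1) / (a - 1))
print(np.mean(x / (1 - x)), a / (b - 1))  # mean of the beta prime transform
```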


=Moments of logarithmically transformed random variables

= Expected values for Logarithm transformation, logarithmic transformations (useful for maximum likelihood estimates, see ) are discussed in this section. The following logarithmic linear transformations are related to the geometric means ''GX'' and ''G''(1−''X'') (see ): :\begin \operatorname[\ln(X)] &= \psi(\alpha) - \psi(\alpha + \beta)= - \operatorname\left[\ln \left (\frac \right )\right],\\ \operatorname[\ln(1-X)] &=\psi(\beta) - \psi(\alpha + \beta)= - \operatorname \left[\ln \left (\frac \right )\right]. \end Where the
digamma function
ψ(α) is defined as the logarithmic derivative of the
gamma function
: :\psi(\alpha) = \frac Logit transformations are interesting, as they usually transform various shapes (including J-shapes) into (usually skewed) bell-shaped densities over the logit variable, and they may remove the end singularities over the original variable: :\begin \operatorname\left[\ln \left (\frac \right ) \right] &=\psi(\alpha) - \psi(\beta)= \operatorname[\ln(X)] +\operatorname \left[\ln \left (\frac \right) \right],\\ \operatorname\left [\ln \left (\frac \right ) \right ] &=\psi(\beta) - \psi(\alpha)= - \operatorname \left[\ln \left (\frac \right) \right] . \end Johnson considered the distribution of the logit - transformed variable ln(''X''/1−''X''), including its moment generating function and approximations for large values of the shape parameters. This transformation extends the finite support
[0, 1]
based on the original variable ''X'' to infinite support in both directions of the real line (−∞, +∞). Higher order logarithmic moments can be derived by using the representation of a beta distribution as a proportion of two Gamma distributions and differentiating through the integral. They can be expressed in terms of higher order poly-gamma functions as follows: :\begin \operatorname \left [\ln^2(X) \right ] &= (\psi(\alpha) - \psi(\alpha + \beta))^2+\psi_1(\alpha)-\psi_1(\alpha+\beta), \\ \operatorname \left [\ln^2(1-X) \right ] &= (\psi(\beta) - \psi(\alpha + \beta))^2+\psi_1(\beta)-\psi_1(\alpha+\beta), \\ \operatorname \left [\ln (X)\ln(1-X) \right ] &=(\psi(\alpha) - \psi(\alpha + \beta))(\psi(\beta) - \psi(\alpha + \beta)) -\psi_1(\alpha+\beta). \end therefore the
variance
of the logarithmic variables and
covariance
of ln(''X'') and ln(1−''X'') are: :\begin \operatorname[\ln(X), \ln(1-X)] &= \operatorname\left[\ln(X)\ln(1-X)\right] - \operatorname[\ln(X)]\operatorname[\ln(1-X)] = -\psi_1(\alpha+\beta) \\ & \\ \operatorname[\ln X] &= \operatorname[\ln^2(X)] - (\operatorname[\ln(X)])^2 \\ &= \psi_1(\alpha) - \psi_1(\alpha + \beta) \\ &= \psi_1(\alpha) + \operatorname[\ln(X), \ln(1-X)] \\ & \\ \operatorname ln (1-X)&= \operatorname[\ln^2 (1-X)] - (\operatorname[\ln (1-X)])^2 \\ &= \psi_1(\beta) - \psi_1(\alpha + \beta) \\ &= \psi_1(\beta) + \operatorname[\ln (X), \ln(1-X)] \end where the
trigamma function
, denoted ψ1(α), is the second of the
polygamma function
s, and is defined as the derivative of the digamma function: :\psi_1(\alpha) = \frac= \frac. The variances and covariance of the logarithmically transformed variables ''X'' and (1−''X'') are different, in general, because the logarithmic transformation destroys the mirror-symmetry of the original variables ''X'' and (1−''X''), as the logarithm approaches negative infinity for the variable approaching zero. These logarithmic variances and covariance are the elements of the
Fisher information
matrix for the beta distribution. They are also a measure of the curvature of the log likelihood function (see section on Maximum likelihood estimation). The variances of the log inverse variables are identical to the variances of the log variables: :\begin \operatorname\left[\ln \left (\frac \right ) \right] & =\operatorname[\ln(X)] = \psi_1(\alpha) - \psi_1(\alpha + \beta), \\ \operatorname\left[\ln \left (\frac \right ) \right] &=\operatorname ln (1-X)= \psi_1(\beta) - \psi_1(\alpha + \beta), \\ \operatorname\left[\ln \left (\frac \right), \ln \left (\frac\right ) \right] &=\operatorname[\ln(X),\ln(1-X)]= -\psi_1(\alpha + \beta).\end It also follows that the variances of the logit transformed variables are: :\operatorname\left[\ln \left (\frac \right )\right]=\operatorname\left[\ln \left (\frac \right ) \right]=-\operatorname\left [\ln \left (\frac \right ), \ln \left (\frac \right ) \right]= \psi_1(\alpha) + \psi_1(\beta)
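A short numerical sketch (an addition, with arbitrary parameters and assuming SciPy) confirming the digamma and trigamma expressions for the mean and variance of ln(X):

```python
# E[ln X] = psi(a) - psi(a+b) and var[ln X] = psi_1(a) - psi_1(a+b),
# checked by simulation; parameter values and sample size are illustrative.
import numpy as np
from scipy.stats import beta
from scipy.special import digamma, polygamma

a, b = 3.0, 2.0
rng = np.random.default_rng(1)
x = beta.rvs(a, b, size=500_000, random_state=rng)

print(np.mean(np.log(x)), digamma(a) - digamma(a + b))
print(np.var(np.log(x)), polygamma(1, a) - polygamma(1, a + b))
```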


Quantities of information (entropy)

Given a beta distributed random variable, ''X'' ~ Beta(''α'', ''β''), the information entropy, differential entropy of ''X'' is (measured in Nat (unit), nats), the expected value of the negative of the logarithm of the
probability density function
: :\begin h(X) &= \operatorname[-\ln(f(x;\alpha,\beta))] \\ pt&=\int_0^1 -f(x;\alpha,\beta)\ln(f(x;\alpha,\beta)) \, dx \\ pt&= \ln(\Beta(\alpha,\beta))-(\alpha-1)\psi(\alpha)-(\beta-1)\psi(\beta)+(\alpha+\beta-2) \psi(\alpha+\beta) \end where ''f''(''x''; ''α'', ''β'') is the
probability density function
of the beta distribution: :f(x;\alpha,\beta) = \frac x^(1-x)^ The
digamma function
''ψ'' appears in the formula for the differential entropy as a consequence of Euler's integral formula for the harmonic numbers which follows from the integral: :\int_0^1 \frac \, dx = \psi(\alpha)-\psi(1) The information entropy, differential entropy of the beta distribution is negative for all values of ''α'' and ''β'' greater than zero, except at ''α'' = ''β'' = 1 (for which values the beta distribution is the same as the Uniform distribution (continuous), uniform distribution), where the information entropy, differential entropy reaches its Maxima and minima, maximum value of zero. It is to be expected that the maximum entropy should take place when the beta distribution becomes equal to the uniform distribution, since uncertainty is maximal when all possible events are equiprobable. For ''α'' or ''β'' approaching zero, the information entropy, differential entropy approaches its Maxima and minima, minimum value of negative infinity. For (either or both) ''α'' or ''β'' approaching zero, there is a maximum amount of order: all the probability density is concentrated at the ends, and there is zero probability density at points located between the ends. Similarly for (either or both) ''α'' or ''β'' approaching infinity, the differential entropy approaches its minimum value of negative infinity, and a maximum amount of order. If either ''α'' or ''β'' approaches infinity (and the other is finite) all the probability density is concentrated at an end, and the probability density is zero everywhere else. If both shape parameters are equal (the symmetric case), ''α'' = ''β'', and they approach infinity simultaneously, the probability density becomes a spike ( Dirac delta function) concentrated at the middle ''x'' = 1/2, and hence there is 100% probability at the middle ''x'' = 1/2 and zero probability everywhere else. The (continuous case) information entropy, differential entropy was introduced by Shannon in his original paper (where he named it the "entropy of a continuous distribution"), as the concluding part of the same paper where he defined the information entropy, discrete entropy. It is known since then that the differential entropy may differ from the infinitesimal limit of the discrete entropy by an infinite offset, therefore the differential entropy can be negative (as it is for the beta distribution). What really matters is the relative value of entropy. Given two beta distributed random variables, ''X''1 ~ Beta(''α'', ''β'') and ''X''2 ~ Beta(''α''′, ''β''′), the cross entropy is (measured in nats) :\begin H(X_1,X_2) &= \int_0^1 - f(x;\alpha,\beta) \ln (f(x;\alpha',\beta')) \,dx \\ pt&= \ln \left(\Beta(\alpha',\beta')\right)-(\alpha'-1)\psi(\alpha)-(\beta'-1)\psi(\beta)+(\alpha'+\beta'-2)\psi(\alpha+\beta). \end The cross entropy has been used as an error metric to measure the distance between two hypotheses. Its absolute value is minimum when the two distributions are identical. It is the information measure most closely related to the log maximum likelihood (see section on "Parameter estimation. Maximum likelihood estimation")). The relative entropy, or Kullback–Leibler divergence ''D''KL(''X''1 , , ''X''2), is a measure of the inefficiency of assuming that the distribution is ''X''2 ~ Beta(''α''′, ''β''′) when the distribution is really ''X''1 ~ Beta(''α'', ''β''). It is defined as follows (measured in nats). 
:\begin D_(X_1, , X_2) &= \int_0^1 f(x;\alpha,\beta) \ln \left (\frac \right ) \, dx \\ pt&= \left (\int_0^1 f(x;\alpha,\beta) \ln (f(x;\alpha,\beta)) \,dx \right )- \left (\int_0^1 f(x;\alpha,\beta) \ln (f(x;\alpha',\beta')) \, dx \right )\\ pt&= -h(X_1) + H(X_1,X_2)\\ pt&= \ln\left(\frac\right)+(\alpha-\alpha')\psi(\alpha)+(\beta-\beta')\psi(\beta)+(\alpha'-\alpha+\beta'-\beta)\psi (\alpha + \beta). \end The relative entropy, or Kullback–Leibler divergence, is always non-negative. A few numerical examples follow: *''X''1 ~ Beta(1, 1) and ''X''2 ~ Beta(3, 3); ''D''KL(''X''1 , , ''X''2) = 0.598803; ''D''KL(''X''2 , , ''X''1) = 0.267864; ''h''(''X''1) = 0; ''h''(''X''2) = −0.267864 *''X''1 ~ Beta(3, 0.5) and ''X''2 ~ Beta(0.5, 3); ''D''KL(''X''1 , , ''X''2) = 7.21574; ''D''KL(''X''2 , , ''X''1) = 7.21574; ''h''(''X''1) = −1.10805; ''h''(''X''2) = −1.10805. The Kullback–Leibler divergence is not symmetric ''D''KL(''X''1 , , ''X''2) ≠ ''D''KL(''X''2 , , ''X''1) for the case in which the individual beta distributions Beta(1, 1) and Beta(3, 3) are symmetric, but have different entropies ''h''(''X''1) ≠ ''h''(''X''2). The value of the Kullback divergence depends on the direction traveled: whether going from a higher (differential) entropy to a lower (differential) entropy or the other way around. In the numerical example above, the Kullback divergence measures the inefficiency of assuming that the distribution is (bell-shaped) Beta(3, 3), rather than (uniform) Beta(1, 1). The "h" entropy of Beta(1, 1) is higher than the "h" entropy of Beta(3, 3) because the uniform distribution Beta(1, 1) has a maximum amount of disorder. The Kullback divergence is more than two times higher (0.598803 instead of 0.267864) when measured in the direction of decreasing entropy: the direction that assumes that the (uniform) Beta(1, 1) distribution is (bell-shaped) Beta(3, 3) rather than the other way around. In this restricted sense, the Kullback divergence is consistent with the second law of thermodynamics. The Kullback–Leibler divergence is symmetric ''D''KL(''X''1 , , ''X''2) = ''D''KL(''X''2 , , ''X''1) for the skewed cases Beta(3, 0.5) and Beta(0.5, 3) that have equal differential entropy ''h''(''X''1) = ''h''(''X''2). The symmetry condition: :D_(X_1, , X_2) = D_(X_2, , X_1),\texth(X_1) = h(X_2),\text\alpha \neq \beta follows from the above definitions and the mirror-symmetry ''f''(''x''; ''α'', ''β'') = ''f''(1−''x''; ''α'', ''β'') enjoyed by the beta distribution.
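The closed forms above translate directly into code. The sketch below is an illustrative addition (the helper names beta_entropy and beta_kl are hypothetical); it reproduces SciPy's built-in entropy and the numerical Kullback–Leibler values quoted above.

```python
# Differential entropy and KL divergence of beta distributions, written from
# the closed forms above (helper names are hypothetical, parameters arbitrary).
from scipy.special import betaln, digamma
from scipy.stats import beta

def beta_entropy(a, b):
    return (betaln(a, b) - (a - 1) * digamma(a) - (b - 1) * digamma(b)
            + (a + b - 2) * digamma(a + b))

def beta_kl(a1, b1, a2, b2):
    """D_KL( Beta(a1, b1) || Beta(a2, b2) ) in nats."""
    return (betaln(a2, b2) - betaln(a1, b1)
            + (a1 - a2) * digamma(a1) + (b1 - b2) * digamma(b1)
            + (a2 - a1 + b2 - b1) * digamma(a1 + b1))

print(beta_entropy(3, 3), beta(3, 3).entropy())  # both ~ -0.2679
print(beta_kl(1, 1, 3, 3), beta_kl(3, 3, 1, 1))  # ~ 0.5988 and ~ 0.2679, as quoted above
```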


Relationships between statistical measures


Mean, mode and median relationship

If 1 < α < β then mode ≤ median ≤ mean.Kerman J (2011) "A closed-form approximation for the median of the beta distribution". Expressing the mode (only for α, β > 1), and the mean in terms of α and β: : \frac \le \text \le \frac , If 1 < β < α then the order of the inequalities are reversed. For α, β > 1 the absolute distance between the mean and the median is less than 5% of the distance between the maximum and minimum values of ''x''. On the other hand, the absolute distance between the mean and the mode can reach 50% of the distance between the maximum and minimum values of ''x'', for the (Pathological (mathematics), pathological) case of α = 1 and β = 1, for which values the beta distribution approaches the uniform distribution and the information entropy, differential entropy approaches its Maxima and minima, maximum value, and hence maximum "disorder". For example, for α = 1.0001 and β = 1.00000001: * mode = 0.9999; PDF(mode) = 1.00010 * mean = 0.500025; PDF(mean) = 1.00003 * median = 0.500035; PDF(median) = 1.00003 * mean − mode = −0.499875 * mean − median = −9.65538 × 10−6 where PDF stands for the value of the
probability density function
.
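The ordering is easy to verify numerically; the following sketch is an illustrative addition with arbitrary parameters satisfying 1 < α < β.

```python
# For 1 < alpha < beta the mode, median and mean should appear in
# non-decreasing order (illustrative parameter values).
from scipy.stats import beta

a, b = 2.0, 5.0
mode = (a - 1) / (a + b - 2)
median = beta.ppf(0.5, a, b)
mean = a / (a + b)
print(mode, median, mean)  # expected: mode <= median <= mean
```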


Mean, geometric mean and harmonic mean relationship

It is known from the inequality of arithmetic and geometric means that the geometric mean is lower than the mean. Similarly, the harmonic mean is lower than the geometric mean. The accompanying plot shows that for α = β, both the mean and the median are exactly equal to 1/2, regardless of the value of α = β, and the mode is also equal to 1/2 for α = β > 1, however the geometric and harmonic means are lower than 1/2 and they only approach this value asymptotically as α = β → ∞.
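A brief numerical illustration of these orderings for the symmetric case (an addition; it uses the geometric mean G_X = exp(ψ(α) − ψ(α + β)) and, for α > 1, the harmonic mean H_X = (α − 1)/(α + β − 1) given elsewhere in the article):

```python
# For alpha = beta, the mean is exactly 1/2 while the geometric and harmonic
# means lie below 1/2 and approach it as alpha = beta grows (illustrative values).
import numpy as np
from scipy.special import digamma

for a in (2.0, 5.0, 50.0):
    b = a
    mean = a / (a + b)                           # = 1/2
    gmean = np.exp(digamma(a) - digamma(a + b))  # geometric mean
    hmean = (a - 1) / (a + b - 1)                # harmonic mean (valid for a > 1)
    print(a, mean, gmean, hmean)
```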


Kurtosis bounded by the square of the skewness

As remarked by William Feller, Feller, in the Pearson distribution, Pearson system the beta probability density appears as Pearson distribution, type I (any difference between the beta distribution and Pearson's type I distribution is only superficial and it makes no difference for the following discussion regarding the relationship between kurtosis and skewness). Karl Pearson showed, in Plate 1 of his paper published in 1916, a graph with the kurtosis as the vertical axis (ordinate) and the square of the
skewness
as the horizontal axis (abscissa), in which a number of distributions were displayed. The region occupied by the beta distribution is bounded by the following two Line (geometry), lines in the (skewness2,kurtosis) Cartesian coordinate system, plane, or the (skewness2,excess kurtosis) Cartesian coordinate system, plane: :(\text)^2+1< \text< \frac (\text)^2 + 3 or, equivalently, :(\text)^2-2< \text< \frac (\text)^2 At a time when there were no powerful digital computers, Karl Pearson accurately computed further boundaries, for example, separating the "U-shaped" from the "J-shaped" distributions. The lower boundary line (excess kurtosis + 2 − skewness2 = 0) is produced by skewed "U-shaped" beta distributions with both values of shape parameters α and β close to zero. The upper boundary line (excess kurtosis − (3/2) skewness2 = 0) is produced by extremely skewed distributions with very large values of one of the parameters and very small values of the other parameter. Karl Pearson showed that this upper boundary line (excess kurtosis − (3/2) skewness2 = 0) is also the intersection with Pearson's distribution III, which has unlimited support in one direction (towards positive infinity), and can be bell-shaped or J-shaped. His son, Egon Pearson, showed that the region (in the kurtosis/squared-skewness plane) occupied by the beta distribution (equivalently, Pearson's distribution I) as it approaches this boundary (excess kurtosis − (3/2) skewness2 = 0) is shared with the noncentral chi-squared distribution. Karl Pearson (Pearson 1895, pp. 357, 360, 373–376) also showed that the gamma distribution is a Pearson type III distribution. Hence this boundary line for Pearson's type III distribution is known as the gamma line. (This can be shown from the fact that the excess kurtosis of the gamma distribution is 6/''k'' and the square of the skewness is 4/''k'', hence (excess kurtosis − (3/2) skewness2 = 0) is identically satisfied by the gamma distribution regardless of the value of the parameter "k"). Pearson later noted that the chi-squared distribution is a special case of Pearson's type III and also shares this boundary line (as it is apparent from the fact that for the chi-squared distribution the excess kurtosis is 12/''k'' and the square of the skewness is 8/''k'', hence (excess kurtosis − (3/2) skewness2 = 0) is identically satisfied regardless of the value of the parameter "k"). This is to be expected, since the chi-squared distribution ''X'' ~ χ2(''k'') is a special case of the gamma distribution, with parametrization X ~ Γ(k/2, 1/2) where k is a positive integer that specifies the "number of degrees of freedom" of the chi-squared distribution. An example of a beta distribution near the upper boundary (excess kurtosis − (3/2) skewness2 = 0) is given by α = 0.1, β = 1000, for which the ratio (excess kurtosis)/(skewness2) = 1.49835 approaches the upper limit of 1.5 from below. An example of a beta distribution near the lower boundary (excess kurtosis + 2 − skewness2 = 0) is given by α= 0.0001, β = 0.1, for which values the expression (excess kurtosis + 2)/(skewness2) = 1.01621 approaches the lower limit of 1 from above. In the infinitesimal limit for both α and β approaching zero symmetrically, the excess kurtosis reaches its minimum value at −2. This minimum value occurs at the point at which the lower boundary line intersects the vertical axis (ordinate). 
(However, in Pearson's original chart, the ordinate is kurtosis, instead of excess kurtosis, and it increases downwards rather than upwards). Values for the skewness and excess kurtosis below the lower boundary (excess kurtosis + 2 − skewness2 = 0) cannot occur for any distribution, and hence Karl Pearson appropriately called the region below this boundary the "impossible region". The boundary for this "impossible region" is determined by (symmetric or skewed) bimodal "U"-shaped distributions for which the parameters α and β approach zero and hence all the probability density is concentrated at the ends: ''x'' = 0, 1 with practically nothing in between them. Since for α ≈ β ≈ 0 the probability density is concentrated at the two ends ''x'' = 0 and ''x'' = 1, this "impossible boundary" is determined by a
Bernoulli distribution
, where the two only possible outcomes occur with respective probabilities ''p'' and ''q'' = 1−''p''. For cases approaching this limit boundary with symmetry α = β, skewness ≈ 0, excess kurtosis ≈ −2 (this is the lowest excess kurtosis possible for any distribution), and the probabilities are ''p'' ≈ ''q'' ≈ 1/2. For cases approaching this limit boundary with skewness, excess kurtosis ≈ −2 + skewness2, and the probability density is concentrated more at one end than the other end (with practically nothing in between), with probabilities p = \tfrac at the left end ''x'' = 0 and q = 1-p = \tfrac at the right end ''x'' = 1.
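These boundaries can also be checked numerically. The sketch below is an illustrative addition (SciPy reports excess kurtosis); it evaluates the strict bound skewness² − 2 < excess kurtosis < (3/2) skewness² for several parameter pairs, including the near-boundary examples mentioned above.

```python
# Check that the excess kurtosis of Beta(a, b) lies strictly between
# skew^2 - 2 and (3/2) skew^2 for a few illustrative parameter pairs.
from scipy.stats import beta

for a, b in [(0.1, 1000.0), (0.0001, 0.1), (0.5, 0.5), (2.0, 5.0), (30.0, 2.0)]:
    skew, exkurt = (float(v) for v in beta.stats(a, b, moments='sk'))
    print((a, b), skew**2 - 2 < exkurt < 1.5 * skew**2)  # expected: True
```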


Symmetry

All statements are conditional on α, β > 0 * Probability density function Symmetry, reflection symmetry ::f(x;\alpha,\beta) = f(1-x;\beta,\alpha) * Cumulative distribution function Symmetry, reflection symmetry plus unitary Symmetry, translation ::F(x;\alpha,\beta) = I_x(\alpha,\beta) = 1- F(1- x;\beta,\alpha) = 1 - I_(\beta,\alpha) * Mode Symmetry, reflection symmetry plus unitary Symmetry, translation ::\operatorname(\Beta(\alpha, \beta))= 1-\operatorname(\Beta(\beta, \alpha)),\text\Beta(\beta, \alpha)\ne \Beta(1,1) * Median Symmetry, reflection symmetry plus unitary Symmetry, translation ::\operatorname (\Beta(\alpha, \beta) )= 1 - \operatorname (\Beta(\beta, \alpha)) * Mean Symmetry, reflection symmetry plus unitary Symmetry, translation ::\mu (\Beta(\alpha, \beta) )= 1 - \mu (\Beta(\beta, \alpha) ) * Geometric Means each is individually asymmetric, the following symmetry applies between the geometric mean based on ''X'' and the geometric mean based on its
reflection
(1-X) ::G_X (\Beta(\alpha, \beta) )=G_(\Beta(\beta, \alpha) ) * Harmonic means each is individually asymmetric, the following symmetry applies between the harmonic mean based on ''X'' and the harmonic mean based on its
reflection
(1-X) ::H_X (\Beta(\alpha, \beta) )=H_(\Beta(\beta, \alpha) ) \text \alpha, \beta > 1 . * Variance symmetry ::\operatorname (\Beta(\alpha, \beta) )=\operatorname (\Beta(\beta, \alpha) ) * Geometric variances each is individually asymmetric, the following symmetry applies between the log geometric variance based on X and the log geometric variance based on its
reflection
(1-X) ::\ln(\operatorname (\Beta(\alpha, \beta))) = \ln(\operatorname(\Beta(\beta, \alpha))) * Geometric covariance symmetry ::\ln \operatorname(\Beta(\alpha, \beta))=\ln \operatorname(\Beta(\beta, \alpha)) * Mean absolute deviation around the mean symmetry ::\operatorname[, X - E ] (\Beta(\alpha, \beta))=\operatorname[, X - E ] (\Beta(\beta, \alpha)) * Skewness Symmetry (mathematics), skew-symmetry ::\operatorname (\Beta(\alpha, \beta) )= - \operatorname (\Beta(\beta, \alpha) ) * Excess kurtosis symmetry ::\text (\Beta(\alpha, \beta) )= \text (\Beta(\beta, \alpha) ) * Characteristic function symmetry of Real part (with respect to the origin of variable "t") :: \text [_1F_1(\alpha; \alpha+\beta; it) ] = \text [ _1F_1(\alpha; \alpha+\beta; - it)] * Characteristic function Symmetry (mathematics), skew-symmetry of Imaginary part (with respect to the origin of variable "t") :: \text [_1F_1(\alpha; \alpha+\beta; it) ] = - \text [ _1F_1(\alpha; \alpha+\beta; - it) ] * Characteristic function symmetry of Absolute value (with respect to the origin of variable "t") :: \text [ _1F_1(\alpha; \alpha+\beta; it) ] = \text [ _1F_1(\alpha; \alpha+\beta; - it) ] * Differential entropy symmetry ::h(\Beta(\alpha, \beta) )= h(\Beta(\beta, \alpha) ) * Relative Entropy (also called Kullback–Leibler divergence) symmetry ::D_(X_1, , X_2) = D_(X_2, , X_1), \texth(X_1) = h(X_2)\text\alpha \neq \beta * Fisher information matrix symmetry ::_ = _
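A few of these symmetries can be verified numerically; the sketch below is an illustrative addition with arbitrary α ≠ β.

```python
# Reflection symmetries under swapping (alpha, beta): the mean is mirrored,
# the skewness changes sign, and the variance and excess kurtosis are unchanged.
from scipy.stats import beta

a, b = 2.0, 7.0
m1, v1, s1, k1 = (float(v) for v in beta.stats(a, b, moments='mvsk'))
m2, v2, s2, k2 = (float(v) for v in beta.stats(b, a, moments='mvsk'))

print(m1, 1 - m2)   # mean(a, b) = 1 - mean(b, a)
print(v1, v2)       # equal variances
print(s1, -s2)      # skew(a, b) = -skew(b, a)
print(k1, k2)       # equal excess kurtosis
```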


Geometry of the probability density function


Inflection points

For certain values of the shape parameters α and β, the
probability density function
has inflection points, at which the curvature changes sign. The position of these inflection points can be useful as a measure of the Statistical dispersion, dispersion or spread of the distribution. Defining the following quantity: :\kappa =\frac Points of inflection occur, depending on the value of the shape parameters α and β, as follows: *(α > 2, β > 2) The distribution is bell-shaped (symmetric for α = β and skewed otherwise), with two inflection points, equidistant from the mode: ::x = \text \pm \kappa = \frac * (α = 2, β > 2) The distribution is unimodal, positively skewed, right-tailed, with one inflection point, located to the right of the mode: ::x =\text + \kappa = \frac * (α > 2, β = 2) The distribution is unimodal, negatively skewed, left-tailed, with one inflection point, located to the left of the mode: ::x = \text - \kappa = 1 - \frac * (1 < α < 2, β > 2, α+β>2) The distribution is unimodal, positively skewed, right-tailed, with one inflection point, located to the right of the mode: ::x =\text + \kappa = \frac *(0 < α < 1, 1 < β < 2) The distribution has a mode at the left end ''x'' = 0 and it is positively skewed, right-tailed. There is one inflection point, located to the right of the mode: ::x = \frac *(α > 2, 1 < β < 2) The distribution is unimodal negatively skewed, left-tailed, with one inflection point, located to the left of the mode: ::x =\text - \kappa = \frac *(1 < α < 2, 0 < β < 1) The distribution has a mode at the right end ''x''=1 and it is negatively skewed, left-tailed. There is one inflection point, located to the left of the mode: ::x = \frac There are no inflection points in the remaining (symmetric and skewed) regions: U-shaped: (α, β < 1) upside-down-U-shaped: (1 < α < 2, 1 < β < 2), reverse-J-shaped (α < 1, β > 2) or J-shaped: (α > 2, β < 1) The accompanying plots show the inflection point locations (shown vertically, ranging from 0 to 1) versus α and β (the horizontal axes ranging from 0 to 5). There are large cuts at surfaces intersecting the lines α = 1, β = 1, α = 2, and β = 2 because at these values the beta distribution change from 2 modes, to 1 mode to no mode.


Shapes

The beta density function can take a wide variety of different shapes depending on the values of the two parameters ''α'' and ''β''. The ability of the beta distribution to take this great diversity of shapes (using only two parameters) is partly responsible for finding wide application for modeling actual measurements:


=Symmetric (''α'' = ''β'')

= * the density function is symmetry, symmetric about 1/2 (blue & teal plots). * median = mean = 1/2. *skewness = 0. *variance = 1/(4(2α + 1)) *α = β < 1 **U-shaped (blue plot). **bimodal: left mode = 0, right mode =1, anti-mode = 1/2 **1/12 < var(''X'') < 1/4 **−2 < excess kurtosis(''X'') < −6/5 ** α = β = 1/2 is the arcsine distribution *** var(''X'') = 1/8 ***excess kurtosis(''X'') = −3/2 ***CF = Rinc (t) ** α = β → 0 is a 2-point
Bernoulli distribution
with equal probability 1/2 at each Dirac delta function end ''x'' = 0 and ''x'' = 1 and zero probability everywhere else. A coin toss: one face of the coin being ''x'' = 0 and the other face being ''x'' = 1. *** \lim_ \operatorname(X) = \tfrac *** \lim_ \operatorname(X) = - 2 a lower value than this is impossible for any distribution to reach. *** The information entropy, differential entropy approaches a Maxima and minima, minimum value of −∞ *α = β = 1 **the uniform distribution (continuous), uniform
[0, 1]
distribution **no mode **var(''X'') = 1/12 **excess kurtosis(''X'') = −6/5 **The (negative anywhere else) information entropy, differential entropy reaches its Maxima and minima, maximum value of zero **CF = Sinc (t) *''α'' = ''β'' > 1 **symmetric unimodal ** mode = 1/2. **0 < var(''X'') < 1/12 **−6/5 < excess kurtosis(''X'') < 0 **''α'' = ''β'' = 3/2 is a semi-elliptic
[0, 1]
distribution, see: Wigner semicircle distribution ***var(''X'') = 1/16. ***excess kurtosis(''X'') = −1 ***CF = 2 Jinc (t) **''α'' = ''β'' = 2 is the parabolic
[0, 1]
distribution ***var(''X'') = 1/20 ***excess kurtosis(''X'') = −6/7 ***CF = 3 Tinc (t) **''α'' = ''β'' > 2 is bell-shaped, with inflection points located to either side of the mode ***0 < var(''X'') < 1/20 ***−6/7 < excess kurtosis(''X'') < 0 **''α'' = ''β'' → ∞ is a 1-point
degenerate distribution
with a Dirac delta function spike at the midpoint ''x'' = 1/2 with probability 1, and zero probability everywhere else. There is 100% probability (absolute certainty) concentrated at the single point ''x'' = 1/2. *** \lim_ \operatorname(X) = 0 *** \lim_ \operatorname(X) = 0 ***The information entropy, differential entropy approaches a Maxima and minima, minimum value of −∞


=Skewed (''α'' ≠ ''β'')

= The density function is Skewness, skewed. An interchange of parameter values yields the mirror image (the reverse) of the initial curve, some more specific cases: *''α'' < 1, ''β'' < 1 ** U-shaped ** Positive skew for α < β, negative skew for α > β. ** bimodal: left mode = 0, right mode = 1, anti-mode = \tfrac ** 0 < median < 1. ** 0 < var(''X'') < 1/4 *α > 1, β > 1 ** unimodal (magenta & cyan plots), **Positive skew for α < β, negative skew for α > β. **\text= \tfrac ** 0 < median < 1 ** 0 < var(''X'') < 1/12 *α < 1, β ≥ 1 **reverse J-shaped with a right tail, **positively skewed, **strictly decreasing, convex function, convex ** mode = 0 ** 0 < median < 1/2. ** 0 < \operatorname(X) < \tfrac, (maximum variance occurs for \alpha=\tfrac, \beta=1, or α = Φ the Golden ratio, golden ratio conjugate) *α ≥ 1, β < 1 **J-shaped with a left tail, **negatively skewed, **strictly increasing, convex function, convex ** mode = 1 ** 1/2 < median < 1 ** 0 < \operatorname(X) < \tfrac, (maximum variance occurs for \alpha=1, \beta=\tfrac, or β = Φ the Golden ratio, golden ratio conjugate) *α = 1, β > 1 **positively skewed, **strictly decreasing (red plot), **a reversed (mirror-image) power function ,1distribution ** mean = 1 / (β + 1) ** median = 1 - 1/21/β ** mode = 0 **α = 1, 1 < β < 2 ***concave function, concave *** 1-\tfrac< \text < \tfrac *** 1/18 < var(''X'') < 1/12. **α = 1, β = 2 ***a straight line with slope −2, the right-triangular distribution with right angle at the left end, at ''x'' = 0 *** \text=1-\tfrac *** var(''X'') = 1/18 **α = 1, β > 2 ***reverse J-shaped with a right tail, ***convex function, convex *** 0 < \text < 1-\tfrac *** 0 < var(''X'') < 1/18 *α > 1, β = 1 **negatively skewed, **strictly increasing (green plot), **the power function
[0, 1]
distribution ** mean = α / (α + 1) ** median = 1/21/α ** mode = 1 **2 > α > 1, β = 1 ***concave function, concave *** \tfrac < \text < \tfrac *** 1/18 < var(''X'') < 1/12 ** α = 2, β = 1 ***a straight line with slope +2, the right-triangular distribution with right angle at the right end, at ''x'' = 1 *** \text=\tfrac *** var(''X'') = 1/18 **α > 2, β = 1 ***J-shaped with a left tail, convex function, convex ***\tfrac < \text < 1 *** 0 < var(''X'') < 1/18


Related distributions


Transformations

* If ''X'' ~ Beta(''α'', ''β'') then 1 − ''X'' ~ Beta(''β'', ''α'') Mirror image, mirror-image symmetry * If ''X'' ~ Beta(''α'', ''β'') then \tfrac \sim (\alpha,\beta). The
beta prime distribution
, also called "beta distribution of the second kind". * If ''X'' ~ Beta(''α'', ''β'') then \tfrac -1 \sim (\beta,\alpha). * If ''X'' ~ Beta(''n''/2, ''m''/2) then \tfrac \sim F(n,m) (assuming ''n'' > 0 and ''m'' > 0), the F-distribution, Fisher–Snedecor F distribution. * If X \sim \operatorname\left(1+\lambda\tfrac, 1 + \lambda\tfrac\right) then min + ''X''(max − min) ~ PERT(min, max, ''m'', ''λ'') where ''PERT'' denotes a PERT distribution used in PERT analysis, and ''m''=most likely value.Herrerías-Velasco, José Manuel and Herrerías-Pleguezuelo, Rafael and René van Dorp, Johan. (2011). Revisiting the PERT mean and Variance. European Journal of Operational Research (210), p. 448–451. Traditionally ''λ'' = 4 in PERT analysis. * If ''X'' ~ Beta(1, ''β'') then ''X'' ~ Kumaraswamy distribution with parameters (1, ''β'') * If ''X'' ~ Beta(''α'', 1) then ''X'' ~ Kumaraswamy distribution with parameters (''α'', 1) * If ''X'' ~ Beta(''α'', 1) then −ln(''X'') ~ Exponential(''α'')
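The beta prime transformation listed above can be checked by simulation; the sketch below is an illustrative addition (arbitrary parameters and sample size) using a Kolmogorov–Smirnov test against SciPy's beta prime distribution.

```python
# If X ~ Beta(a, b) then X/(1 - X) should follow a beta prime distribution
# with the same parameters; tested here with a KS test (illustrative values).
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
a, b = 3.0, 4.0
x = stats.beta.rvs(a, b, size=100_000, random_state=rng)
y = x / (1 - x)

print(stats.kstest(y, stats.betaprime(a, b).cdf))  # a large p-value is expected
```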


Special and limiting cases

* Beta(1, 1) ~ uniform distribution (continuous), U(0, 1). * Beta(n, 1) ~ Maximum of ''n'' independent rvs. with uniform distribution (continuous), U(0, 1), sometimes called a ''a standard power function distribution'' with density ''n'' ''x''''n''-1 on that interval. * Beta(1, n) ~ Minimum of ''n'' independent rvs. with uniform distribution (continuous), U(0, 1) * If ''X'' ~ Beta(3/2, 3/2) and ''r'' > 0 then 2''rX'' − ''r'' ~ Wigner semicircle distribution. * Beta(1/2, 1/2) is equivalent to the arcsine distribution. This distribution is also Jeffreys prior probability for the
Bernoulli
and binomial distributions. The arcsine probability density is a distribution that appears in several random-walk fundamental theorems. In a fair coin toss
random walk
, the probability for the time of the last visit to the origin is distributed as an (U-shaped) arcsine distribution. In a two-player fair-coin-toss game, a player is said to be in the lead if the random walk (that started at the origin) is above the origin. The most probable number of times that a given player will be in the lead, in a game of length 2''N'', is not ''N''. On the contrary, ''N'' is the least likely number of times that the player will be in the lead. The most likely number of times in the lead is 0 or 2''N'' (following the arcsine distribution). * \lim_ n \operatorname(1,n) = \operatorname(1) the exponential distribution. * \lim_ n \operatorname(k,n) = \operatorname(k,1) the gamma distribution. * For large n, \operatorname(\alpha n,\beta n) \to \mathcal\left(\frac,\frac\frac\right) the normal distribution. More precisely, if X_n \sim \operatorname(\alpha n,\beta n) then \sqrt\left(X_n -\tfrac\right) converges in distribution to a normal distribution with mean 0 and variance \tfrac as ''n'' increases.
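For instance, the identity Beta(n, 1) ~ maximum of n independent U(0, 1) variables is easy to verify by simulation (illustrative sketch, arbitrary n):

```python
# The maximum of n independent U(0, 1) draws has CDF x^n, i.e. Beta(n, 1);
# verified here with a KS test (illustrative n and sample size).
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 5
u = rng.uniform(size=(50_000, n))
m = u.max(axis=1)

print(stats.kstest(m, stats.beta(n, 1).cdf))  # a large p-value is expected
```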


Derived from other distributions

* The ''k''th order statistic of a sample of size ''n'' from the Uniform distribution (continuous), uniform distribution is a beta random variable, ''U''(''k'') ~ Beta(''k'', ''n''+1−''k''). * If ''X'' ~ Gamma(α, θ) and ''Y'' ~ Gamma(β, θ) are independent, then \tfrac \sim \operatorname(\alpha, \beta)\,. * If X \sim \chi^2(\alpha)\, and Y \sim \chi^2(\beta)\, are independent, then \tfrac \sim \operatorname(\tfrac, \tfrac). * If ''X'' ~ U(0, 1) and ''α'' > 0 then ''X''1/''α'' ~ Beta(''α'', 1). The power function distribution. * If X \sim\operatorname(k;n;p), then \sim \operatorname(\alpha, \beta) for discrete values of ''n'' and ''k'' where \alpha=k+1 and \beta=n-k+1. * If ''X'' ~ Cauchy(0, 1) then \tfrac \sim \operatorname\left(\tfrac12, \tfrac12\right)\,
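The gamma-ratio construction in the second bullet above can be simulated directly (illustrative sketch; a common unit scale is assumed for both gamma variables):

```python
# If X ~ Gamma(a, theta) and Y ~ Gamma(b, theta) are independent, then
# X/(X + Y) ~ Beta(a, b); checked with a KS test (illustrative values).
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
a, b = 2.5, 4.0
x = stats.gamma.rvs(a, size=100_000, random_state=rng)
y = stats.gamma.rvs(b, size=100_000, random_state=rng)

print(stats.kstest(x / (x + y), stats.beta(a, b).cdf))  # large p-value expected
```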


Combination with other distributions

* ''X'' ~ Beta(''α'', ''β'') and ''Y'' ~ F(2''β'',2''α'') then \Pr(X \leq \tfrac \alpha ) = \Pr(Y \geq x)\, for all ''x'' > 0.


Compounding with other distributions

* If ''p'' ~ Beta(α, β) and ''X'' ~ Bin(''k'', ''p'') then ''X'' ~ beta-binomial distribution * If ''p'' ~ Beta(α, β) and ''X'' ~ NB(''r'', ''p'') then ''X'' ~ beta negative binomial distribution
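The first compounding relation above can be illustrated by simulation (an addition with arbitrary parameters), comparing draws of X | p ~ Binomial(n, p), p ~ Beta(α, β) with SciPy's beta-binomial pmf.

```python
# Drawing p ~ Beta(a, b) and then X ~ Binomial(n, p) yields a beta-binomial
# distribution; compared here against scipy.stats.betabinom (illustrative values).
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
a, b, n = 2.0, 3.0, 10
p = stats.beta.rvs(a, b, size=200_000, random_state=rng)
x = rng.binomial(n, p)

empirical = np.bincount(x, minlength=n + 1) / x.size
exact = stats.betabinom.pmf(np.arange(n + 1), n, a, b)
print(np.max(np.abs(empirical - exact)))  # should be small
```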


Generalisations

* The generalization to multiple variables, i.e. a Dirichlet distribution, multivariate Beta distribution, is called a
Dirichlet distribution
. Univariate marginals of the Dirichlet distribution have a beta distribution. The beta distribution is Conjugate prior, conjugate to the binomial and Bernoulli distributions in exactly the same way as the
Dirichlet distribution
is conjugate to the multinomial distribution and categorical distribution. * The Pearson distribution#The Pearson type I distribution, Pearson type I distribution is identical to the beta distribution (except for arbitrary shifting and re-scaling that can also be accomplished with the four parameter parametrization of the beta distribution). * The beta distribution is the special case of the noncentral beta distribution where \lambda = 0: \operatorname(\alpha, \beta) = \operatorname(\alpha,\beta,0). * The generalized beta distribution is a five-parameter distribution family which has the beta distribution as a special case. * The matrix variate beta distribution is a distribution for positive-definite matrices.


Statistical inference


Parameter estimation


Method of moments


=Two unknown parameters

= Two unknown parameters ( (\hat, \hat) of a beta distribution supported in the ,1interval) can be estimated, using the method of moments, with the first two moments (sample mean and sample variance) as follows. Let: : \text=\bar = \frac\sum_^N X_i be the sample mean estimate and : \text =\bar = \frac\sum_^N (X_i - \bar)^2 be the sample variance estimate. The method of moments (statistics), method-of-moments estimates of the parameters are :\hat = \bar \left(\frac - 1 \right), if \bar <\bar(1 - \bar), : \hat = (1-\bar) \left(\frac - 1 \right), if \bar<\bar(1 - \bar). When the distribution is required over a known interval other than
[0, 1]
with random variable ''X'', say [''a'', ''c''] with random variable ''Y'', then replace \bar with \frac, and \bar with \frac in the above couple of equations for the shape parameters (see the "Alternative parametrizations, four parameters" section below)., where: : \text=\bar = \frac\sum_^N Y_i : \text = \bar = \frac\sum_^N (Y_i - \bar)^2
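A minimal implementation of these method-of-moments estimates (an illustrative addition; the function name beta_method_of_moments is a hypothetical helper, and the 1/N sample variance convention is used):

```python
# Method-of-moments estimates of (alpha, beta) for a beta sample on [0, 1].
import numpy as np
from scipy import stats

def beta_method_of_moments(x):
    m = np.mean(x)
    v = np.var(x)  # 1/N sample variance
    if v >= m * (1 - m):
        raise ValueError("sample variance too large for a beta fit")
    common = m * (1 - m) / v - 1
    return m * common, (1 - m) * common  # (alpha_hat, beta_hat)

rng = np.random.default_rng(6)
sample = stats.beta.rvs(2.0, 5.0, size=10_000, random_state=rng)
print(beta_method_of_moments(sample))  # should be close to (2, 5)
```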


=Four unknown parameters

= All four parameters (\hat, \hat, \hat, \hat of a beta distribution supported in the [''a'', ''c''] interval -see section Beta distribution#Four parameters 2, "Alternative parametrizations, Four parameters"-) can be estimated, using the method of moments developed by Karl Pearson, by equating sample and population values of the first four central moments (mean, variance, skewness and excess kurtosis). The excess kurtosis was expressed in terms of the square of the skewness, and the sample size ν = α + β, (see previous section Beta distribution#Kurtosis, "Kurtosis") as follows: :\text =\frac\left(\frac (\text)^2 - 1\right)\text^2-2< \text< \tfrac (\text)^2 One can use this equation to solve for the sample size ν= α + β in terms of the square of the skewness and the excess kurtosis as follows: :\hat = \hat + \hat = 3\frac :\text^2-2< \text< \tfrac (\text)^2 This is the ratio (multiplied by a factor of 3) between the previously derived limit boundaries for the beta distribution in a space (as originally done by Karl Pearson) defined with coordinates of the square of the skewness in one axis and the excess kurtosis in the other axis (see ): The case of zero skewness, can be immediately solved because for zero skewness, α = β and hence ν = 2α = 2β, therefore α = β = ν/2 : \hat = \hat = \frac= \frac : \text= 0 \text -2<\text<0 (Excess kurtosis is negative for the beta distribution with zero skewness, ranging from -2 to 0, so that \hat -and therefore the sample shape parameters- is positive, ranging from zero when the shape parameters approach zero and the excess kurtosis approaches -2, to infinity when the shape parameters approach infinity and the excess kurtosis approaches zero). For non-zero sample skewness one needs to solve a system of two coupled equations. Since the skewness and the excess kurtosis are independent of the parameters \hat, \hat, the parameters \hat, \hat can be uniquely determined from the sample skewness and the sample excess kurtosis, by solving the coupled equations with two known variables (sample skewness and sample excess kurtosis) and two unknowns (the shape parameters): :(\text)^2 = \frac :\text =\frac\left(\frac (\text)^2 - 1\right) :\text^2-2< \text< \tfrac(\text)^2 resulting in the following solution: : \hat, \hat = \frac \left (1 \pm \frac \right ) : \text\neq 0 \text (\text)^2-2< \text< \tfrac (\text)^2 Where one should take the solutions as follows: \hat>\hat for (negative) sample skewness < 0, and \hat<\hat for (positive) sample skewness > 0. The accompanying plot shows these two solutions as surfaces in a space with horizontal axes of (sample excess kurtosis) and (sample squared skewness) and the shape parameters as the vertical axis. The surfaces are constrained by the condition that the sample excess kurtosis must be bounded by the sample squared skewness as stipulated in the above equation. The two surfaces meet at the right edge defined by zero skewness. Along this right edge, both parameters are equal and the distribution is symmetric U-shaped for α = β < 1, uniform for α = β = 1, upside-down-U-shaped for 1 < α = β < 2 and bell-shaped for α = β > 2. The surfaces also meet at the front (lower) edge defined by "the impossible boundary" line (excess kurtosis + 2 - skewness2 = 0). 
Along this front (lower) boundary both shape parameters approach zero, and the probability density is concentrated more at one end than the other end (with practically nothing in between), with probabilities p=\tfrac at the left end ''x'' = 0 and q = 1-p = \tfrac at the right end ''x'' = 1. The two surfaces become further apart towards the rear edge. At this rear edge the surface parameters are quite different from each other. As remarked, for example, by Bowman and Shenton, sampling in the neighborhood of the line (sample excess kurtosis - (3/2)(sample skewness)2 = 0) (the just-J-shaped portion of the rear edge where blue meets beige), "is dangerously near to chaos", because at that line the denominator of the expression above for the estimate ν = α + β becomes zero and hence ν approaches infinity as that line is approached. Bowman and Shenton write that "the higher moment parameters (kurtosis and skewness) are extremely fragile (near that line). However, the mean and standard deviation are fairly reliable." Therefore, the problem is for the case of four parameter estimation for very skewed distributions such that the excess kurtosis approaches (3/2) times the square of the skewness. This boundary line is produced by extremely skewed distributions with very large values of one of the parameters and very small values of the other parameter. See for a numerical example and further comments about this rear edge boundary line (sample excess kurtosis - (3/2)(sample skewness)2 = 0). As remarked by Karl Pearson himself this issue may not be of much practical importance as this trouble arises only for very skewed J-shaped (or mirror-image J-shaped) distributions with very different values of shape parameters that are unlikely to occur much in practice). The usual skewed-bell-shape distributions that occur in practice do not have this parameter estimation problem. The remaining two parameters \hat, \hat can be determined using the sample mean and the sample variance using a variety of equations. One alternative is to calculate the support interval range (\hat-\hat) based on the sample variance and the sample kurtosis. For this purpose one can solve, in terms of the range (\hat- \hat), the equation expressing the excess kurtosis in terms of the sample variance, and the sample size ν (see and ): :\text =\frac\bigg(\frac - 6 - 5 \hat \bigg) to obtain: : (\hat- \hat) = \sqrt\sqrt Another alternative is to calculate the support interval range (\hat-\hat) based on the sample variance and the sample skewness. For this purpose one can solve, in terms of the range (\hat-\hat), the equation expressing the squared skewness in terms of the sample variance, and the sample size ν (see section titled "Skewness" and "Alternative parametrizations, four parameters"): :(\text)^2 = \frac\bigg(\frac-4(1+\hat)\bigg) to obtain: : (\hat- \hat) = \frac\sqrt The remaining parameter can be determined from the sample mean and the previously obtained parameters: (\hat-\hat), \hat, \hat = \hat+\hat: : \hat = (\text) - \left(\frac\right)(\hat-\hat) and finally, \hat= (\hat- \hat) + \hat . In the above formulas one may take, for example, as estimates of the sample moments: :\begin \text &=\overline = \frac\sum_^N Y_i \\ \text &= \overline_Y = \frac\sum_^N (Y_i - \overline)^2 \\ \text &= G_1 = \frac \frac \\ \text &= G_2 = \frac \frac - \frac \end The estimators ''G''1 for skewness, sample skewness and ''G''2 for kurtosis, sample kurtosis are used by DAP (software), DAP/SAS System, SAS, PSPP/SPSS, and Microsoft Excel, Excel. 
However, they are not used by BMDP and (according to ) they were not used by MINITAB in 1998. Actually, Joanes and Gill in their 1998 study concluded that the skewness and kurtosis estimators used in BMDP and in MINITAB (at that time) had smaller variance and mean-squared error in normal samples, but the skewness and kurtosis estimators used in DAP (software), DAP/SAS System, SAS, PSPP/SPSS, namely ''G''1 and ''G''2, had smaller mean-squared error in samples from a very skewed distribution. It is for this reason that we have spelled out "sample skewness", etc., in the above formulas, to make it explicit that the user should choose the best estimator according to the problem at hand, as the best estimator for skewness and kurtosis depends on the amount of skewness (as shown by Joanes and Gill).


Maximum likelihood


=Two unknown parameters

= As is also the case for maximum likelihood estimates for the gamma distribution, the maximum likelihood estimates for the beta distribution do not have a general closed form solution for arbitrary values of the shape parameters. If ''X''1, ..., ''XN'' are independent random variables each having a beta distribution, the joint log likelihood function for ''N'' independent and identically distributed random variables, iid observations is: :\begin \ln\, \mathcal (\alpha, \beta\mid X) &= \sum_^N \ln \left (\mathcal_i (\alpha, \beta\mid X_i) \right )\\ &= \sum_^N \ln \left (f(X_i;\alpha,\beta) \right ) \\ &= \sum_^N \ln \left (\frac \right ) \\ &= (\alpha - 1)\sum_^N \ln (X_i) + (\beta- 1)\sum_^N \ln (1-X_i) - N \ln \Beta(\alpha,\beta) \end Finding the maximum with respect to a shape parameter involves taking the partial derivative with respect to the shape parameter and setting the expression equal to zero yielding the maximum likelihood estimator of the shape parameters: :\frac = \sum_^N \ln X_i -N\frac=0 :\frac = \sum_^N \ln (1-X_i)- N\frac=0 where: :\frac = -\frac+ \frac+ \frac=-\psi(\alpha + \beta) + \psi(\alpha) + 0 :\frac= - \frac+ \frac + \frac=-\psi(\alpha + \beta) + 0 + \psi(\beta) since the
digamma function
denoted ψ(α) is defined as the logarithmic derivative of the
gamma function
: :\psi(\alpha) =\frac To ensure that the values with zero tangent slope are indeed a maximum (instead of a saddle-point or a minimum) one has to also satisfy the condition that the curvature is negative. This amounts to satisfying that the second partial derivative with respect to the shape parameters is negative :\frac= -N\frac<0 :\frac = -N\frac<0 using the previous equations, this is equivalent to: :\frac = \psi_1(\alpha)-\psi_1(\alpha + \beta) > 0 :\frac = \psi_1(\beta) -\psi_1(\alpha + \beta) > 0 where the
trigamma function
, denoted ''ψ''1(''α''), is the second of the
polygamma function
s, and is defined as the derivative of the digamma function: :\psi_1(\alpha) = \frac=\, \frac. These conditions are equivalent to stating that the variances of the logarithmically transformed variables are positive, since: :\operatorname[\ln (X)] = \operatorname[\ln^2 (X)] - (\operatorname[\ln (X)])^2 = \psi_1(\alpha) - \psi_1(\alpha + \beta) :\operatorname ln (1-X)= \operatorname[\ln^2 (1-X)] - (\operatorname[\ln (1-X)])^2 = \psi_1(\beta) - \psi_1(\alpha + \beta) Therefore, the condition of negative curvature at a maximum is equivalent to the statements: : \operatorname[\ln (X)] > 0 : \operatorname ln (1-X)> 0 Alternatively, the condition of negative curvature at a maximum is also equivalent to stating that the following logarithmic derivatives of the geometric means ''GX'' and ''G(1−X)'' are positive, since: : \psi_1(\alpha) - \psi_1(\alpha + \beta) = \frac > 0 : \psi_1(\beta) - \psi_1(\alpha + \beta) = \frac > 0 While these slopes are indeed positive, the other slopes are negative: :\frac, \frac < 0. The slopes of the mean and the median with respect to ''α'' and ''β'' display similar sign behavior. From the condition that at a maximum, the partial derivative with respect to the shape parameter equals zero, we obtain the following system of coupled maximum likelihood estimate equations (for the average log-likelihoods) that needs to be inverted to obtain the (unknown) shape parameter estimates \hat,\hat in terms of the (known) average of logarithms of the samples ''X''1, ..., ''XN'': :\begin \hat[\ln (X)] &= \psi(\hat) - \psi(\hat + \hat)=\frac\sum_^N \ln X_i = \ln \hat_X \\ \hat[\ln(1-X)] &= \psi(\hat) - \psi(\hat + \hat)=\frac\sum_^N \ln (1-X_i)= \ln \hat_ \end where we recognize \log \hat_X as the logarithm of the sample geometric mean and \log \hat_ as the logarithm of the sample geometric mean based on (1 − ''X''), the mirror-image of ''X''. For \hat=\hat, it follows that \hat_X=\hat_ . :\begin \hat_X &= \prod_^N (X_i)^ \\ \hat_ &= \prod_^N (1-X_i)^ \end These coupled equations containing
digamma functions of the shape parameter estimates \hat{\alpha},\hat{\beta}
must be solved by numerical methods as done, for example, by Beckman et al. Gnanadesikan et al. give numerical solutions for a few cases. N. L. Johnson and S. Kotz suggest that for "not too small" shape parameter estimates \hat{\alpha},\hat{\beta}, the logarithmic approximation to the digamma function \psi(\hat{\alpha}) \approx \ln(\hat{\alpha}-\tfrac{1}{2}) may be used to obtain initial values for an iterative solution, since the equations resulting from this approximation can be solved exactly:
:\ln \frac{\hat{\alpha} - \tfrac{1}{2}}{\hat{\alpha}+\hat{\beta} - \tfrac{1}{2}} \approx \ln \hat{G}_X
:\ln \frac{\hat{\beta}-\tfrac{1}{2}}{\hat{\alpha}+\hat{\beta} - \tfrac{1}{2}}\approx \ln \hat{G}_{(1-X)}
which leads to the following solution for the initial values (of the estimate shape parameters in terms of the sample geometric means) for an iterative solution:
:\hat{\alpha}\approx \tfrac{1}{2} + \frac{\hat{G}_X}{2(1-\hat{G}_X-\hat{G}_{(1-X)})} \text{ if } \hat{\alpha} >1
:\hat{\beta}\approx \tfrac{1}{2} + \frac{\hat{G}_{(1-X)}}{2(1-\hat{G}_X-\hat{G}_{(1-X)})} \text{ if } \hat{\beta} > 1
Alternatively, the estimates provided by the method of moments can instead be used as initial values for an iterative solution of the maximum likelihood coupled equations in terms of the digamma functions. When the distribution is required over a known interval other than
[0, 1]
with random variable ''X'', say [''a'', ''c''] with random variable ''Y'', then replace ln(''Xi'') in the first equation with
:\ln \frac{Y_i-a}{c-a},
and replace ln(1−''Xi'') in the second equation with
:\ln \frac{c-Y_i}{c-a}
(see "Alternative parametrizations, four parameters" section below). If one of the shape parameters is known, the problem is considerably simplified. The following logit transformation can be used to solve for the unknown shape parameter (for skewed cases such that \hat{\alpha}\neq\hat{\beta}; otherwise, if symmetric, both (equal) parameters are known when one is known):
:\hat{\operatorname{E}} \left[\ln \left(\frac{X}{1-X} \right) \right]=\psi(\hat{\alpha}) - \psi(\hat{\beta})=\frac{1}{N}\sum_{i=1}^N \ln\frac{X_i}{1-X_i} = \ln \hat{G}_X - \ln \left(\hat{G}_{(1-X)}\right)
This logit transformation is the logarithm of the transformation that divides the variable ''X'' by its mirror-image (''X''/(1 − ''X'')), resulting in the "inverted beta distribution" or
beta prime distribution
(also known as beta distribution of the second kind or Pearson's Type VI) with support [0, +∞). As previously discussed in the section "Moments of logarithmically transformed random variables," the logit transformation \ln\frac{X}{1-X}, studied by Johnson, extends the finite support
[0, 1]
based on the original variable ''X'' to infinite support in both directions of the real line (−∞, +∞). If, for example, \hat is known, the unknown parameter \hat can be obtained in terms of the inverse digamma function of the right hand side of this equation: :\psi(\hat)=\frac\sum_^N \ln\frac + \psi(\hat) :\hat=\psi^(\ln \hat_X - \ln \hat_ + \psi(\hat)) In particular, if one of the shape parameters has a value of unity, for example for \hat = 1 (the power function distribution with bounded support [0,1]), using the identity ψ(''x'' + 1) = ψ(''x'') + 1/''x'' in the equation \psi(\hat) - \psi(\hat + \hat)= \ln \hat_X, the maximum likelihood estimator for the unknown parameter \hat is, exactly: :\hat= - \frac= - \frac The beta has support [0, 1], therefore \hat_X < 1, and hence (-\ln \hat_X) >0, and therefore \hat >0. In conclusion, the maximum likelihood estimates of the shape parameters of a beta distribution are (in general) a complicated function of the sample geometric mean, and of the sample geometric mean based on ''(1−X)'', the mirror-image of ''X''. One may ask, if the variance (in addition to the mean) is necessary to estimate two shape parameters with the method of moments, why is the (logarithmic or geometric) variance not necessary to estimate two shape parameters with the maximum likelihood method, for which only the geometric means suffice? The answer is because the mean does not provide as much information as the geometric mean. For a beta distribution with equal shape parameters ''α'' = ''β'', the mean is exactly 1/2, regardless of the value of the shape parameters, and therefore regardless of the value of the statistical dispersion (the variance). On the other hand, the geometric mean of a beta distribution with equal shape parameters ''α'' = ''β'', depends on the value of the shape parameters, and therefore it contains more information. Also, the geometric mean of a beta distribution does not satisfy the symmetry conditions satisfied by the mean, therefore, by employing both the geometric mean based on ''X'' and geometric mean based on (1 − ''X''), the maximum likelihood method is able to provide best estimates for both parameters ''α'' = ''β'', without need of employing the variance. One can express the joint log likelihood per ''N'' independent and identically distributed random variables, iid observations in terms of the ''sufficient statistics'' (the sample geometric means) as follows: :\frac = (\alpha - 1)\ln \hat_X + (\beta- 1)\ln \hat_- \ln \Beta(\alpha,\beta). We can plot the joint log likelihood per ''N'' observations for fixed values of the sample geometric means to see the behavior of the likelihood function as a function of the shape parameters α and β. In such a plot, the shape parameter estimators \hat,\hat correspond to the maxima of the likelihood function. See the accompanying graph that shows that all the likelihood functions intersect at α = β = 1, which corresponds to the values of the shape parameters that give the maximum entropy (the maximum entropy occurs for shape parameters equal to unity: the uniform distribution). It is evident from the plot that the likelihood function gives sharp peaks for values of the shape parameter estimators close to zero, but that for values of the shape parameters estimators greater than one, the likelihood function becomes quite flat, with less defined peaks. 
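As a concrete illustration of the coupled digamma equations and the Johnson and Kotz initial values given above, the following sketch solves them numerically (assuming NumPy and SciPy are available; the synthetic sample, the random seed and the variable names are illustrative choices, not part of the original treatment):

```python
import numpy as np
from scipy.special import digamma
from scipy.optimize import fsolve

rng = np.random.default_rng(0)
sample = rng.beta(2.0, 3.0, size=1000)   # synthetic data with known shape parameters

# Sufficient statistics: logarithms of the two sample geometric means
log_gx = np.mean(np.log(sample))          # ln(G_X)
log_g1mx = np.mean(np.log(1.0 - sample))  # ln(G_(1-X))

def equations(params):
    """Coupled maximum likelihood equations in terms of the digamma function."""
    a, b = params
    return (digamma(a) - digamma(a + b) - log_gx,
            digamma(b) - digamma(a + b) - log_g1mx)

# Johnson & Kotz logarithmic approximation as initial values for the iteration
gx, g1mx = np.exp(log_gx), np.exp(log_g1mx)
alpha0 = 0.5 + gx / (2.0 * (1.0 - gx - g1mx))
beta0 = 0.5 + g1mx / (2.0 * (1.0 - gx - g1mx))

alpha_hat, beta_hat = fsolve(equations, x0=(alpha0, beta0))
print(alpha_hat, beta_hat)  # close to (2, 3) for a large synthetic sample
```

For comparison, `scipy.stats.beta.fit(sample, floc=0, fscale=1)` performs the same two-parameter maximum likelihood fit with the support held fixed at [0, 1].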
Obviously, the maximum likelihood parameter estimation method for the beta distribution becomes less acceptable for larger values of the shape parameter estimators, as the uncertainty in the peak definition increases with the value of the shape parameter estimators. One can arrive at the same conclusion by noticing that the expression for the curvature of the likelihood function is in terms of the geometric variances
:\frac{1}{N}\frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{\partial \alpha^2}= -\operatorname{var}[\ln X]
:\frac{1}{N}\frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{\partial \beta^2} = -\operatorname{var}[\ln (1-X)]
These variances (and therefore the curvatures) are much larger for small values of the shape parameters α and β. However, for shape parameter values α, β > 1, the variances (and therefore the curvatures) flatten out. Equivalently, this result follows from the Cramér–Rao bound, since the
Fisher information
matrix components for the beta distribution are these logarithmic variances. The Cramér–Rao bound states that the
variance of any ''unbiased'' estimator \hat{\alpha} of α is bounded by the reciprocal of the Fisher information:
:\operatorname{var}(\hat{\alpha})\geq\frac{1}{\mathcal{I}(\alpha)}\geq\frac{1}{\psi_1(\alpha) - \psi_1(\alpha + \beta)}
:\operatorname{var}(\hat{\beta}) \geq\frac{1}{\mathcal{I}(\beta)}\geq\frac{1}{\psi_1(\beta) - \psi_1(\alpha + \beta)}
so the variance of the estimators increases with increasing α and β, as the logarithmic variances decrease. Also one can express the joint log likelihood per ''N'' iid observations in terms of the
digamma function
expressions for the logarithms of the sample geometric means as follows:
:\frac{\ln\, \mathcal{L} (\alpha, \beta\mid X)}{N} = (\alpha - 1)(\psi(\hat{\alpha}) - \psi(\hat{\alpha} + \hat{\beta}))+(\beta- 1)(\psi(\hat{\beta}) - \psi(\hat{\alpha} + \hat{\beta}))- \ln \Beta(\alpha,\beta)
This expression is identical to the negative of the cross-entropy (see section on "Quantities of information (entropy)"). Therefore, finding the maximum of the joint log likelihood of the shape parameters, per ''N'' iid observations, is identical to finding the minimum of the cross-entropy for the beta distribution, as a function of the shape parameters:
:\frac{\ln\, \mathcal{L} (\alpha, \beta\mid X)}{N} = - H = -h - D_{\mathrm{KL}} = -\ln\Beta(\alpha,\beta)+(\alpha-1)\psi(\hat{\alpha})+(\beta-1)\psi(\hat{\beta})-(\alpha+\beta-2)\psi(\hat{\alpha}+\hat{\beta})
with the cross-entropy defined as follows:
:H = \int_{0}^1 - f(X;\hat{\alpha},\hat{\beta}) \ln (f(X;\alpha,\beta)) \, dX
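The equivalence between maximizing the joint log likelihood per observation and minimizing the cross-entropy can be explored numerically. The sketch below (a rough illustration: the chosen geometric means 0.35 and 0.45 and the grid bounds are arbitrary but admissible assumptions) evaluates the per-observation log likelihood, written through the sufficient statistics, on a grid of shape parameters, as in the likelihood surfaces described above:

```python
import numpy as np
from scipy.special import betaln

def mean_loglik(alpha, beta, log_gx, log_g1mx):
    """Joint log likelihood per observation, expressed through the sufficient
    statistics ln(G_X) and ln(G_(1-X)); equal to minus the cross-entropy."""
    return (alpha - 1.0) * log_gx + (beta - 1.0) * log_g1mx - betaln(alpha, beta)

log_gx, log_g1mx = np.log(0.35), np.log(0.45)   # illustrative sample geometric means
grid = np.linspace(0.1, 5.0, 200)
surface = np.array([[mean_loglik(a, b, log_gx, log_g1mx) for b in grid] for a in grid])

i, j = np.unravel_index(surface.argmax(), surface.shape)
print("approximate maximizer:", grid[i], grid[j])
```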


Four unknown parameters

= The procedure is similar to the one followed in the two unknown parameter case. If ''Y''1, ..., ''YN'' are independent random variables each having a beta distribution with four parameters, the joint log likelihood function for ''N'' independent and identically distributed random variables, iid observations is: :\begin \ln\, \mathcal (\alpha, \beta, a, c\mid Y) &= \sum_^N \ln\,\mathcal_i (\alpha, \beta, a, c\mid Y_i)\\ &= \sum_^N \ln\,f(Y_i; \alpha, \beta, a, c) \\ &= \sum_^N \ln\,\frac\\ &= (\alpha - 1)\sum_^N \ln (Y_i - a) + (\beta- 1)\sum_^N \ln (c - Y_i)- N \ln \Beta(\alpha,\beta) - N (\alpha+\beta - 1) \ln (c - a) \end Finding the maximum with respect to a shape parameter involves taking the partial derivative with respect to the shape parameter and setting the expression equal to zero yielding the maximum likelihood estimator of the shape parameters: :\frac= \sum_^N \ln (Y_i - a) - N(-\psi(\alpha + \beta) + \psi(\alpha))- N \ln (c - a)= 0 :\frac = \sum_^N \ln (c - Y_i) - N(-\psi(\alpha + \beta) + \psi(\beta))- N \ln (c - a)= 0 :\frac = -(\alpha - 1) \sum_^N \frac \,+ N (\alpha+\beta - 1)\frac= 0 :\frac = (\beta- 1) \sum_^N \frac \,- N (\alpha+\beta - 1) \frac = 0 these equations can be re-arranged as the following system of four coupled equations (the first two equations are geometric means and the second two equations are the harmonic means) in terms of the maximum likelihood estimates for the four parameters \hat, \hat, \hat, \hat: :\frac\sum_^N \ln \frac = \psi(\hat)-\psi(\hat +\hat )= \ln \hat_X :\frac\sum_^N \ln \frac = \psi(\hat)-\psi(\hat + \hat)= \ln \hat_ :\frac = \frac= \hat_X :\frac = \frac = \hat_ with sample geometric means: :\hat_X = \prod_^ \left (\frac \right )^ :\hat_ = \prod_^ \left (\frac \right )^ The parameters \hat, \hat are embedded inside the geometric mean expressions in a nonlinear way (to the power 1/''N''). This precludes, in general, a closed form solution, even for an initial value approximation for iteration purposes. One alternative is to use as initial values for iteration the values obtained from the method of moments solution for the four parameter case. Furthermore, the expressions for the harmonic means are well-defined only for \hat, \hat > 1, which precludes a maximum likelihood solution for shape parameters less than unity in the four-parameter case. Fisher's information matrix for the four parameter case is Positive-definite matrix, positive-definite only for α, β > 2 (for further discussion, see section on Fisher information matrix, four parameter case), for bell-shaped (symmetric or unsymmetric) beta distributions, with inflection points located to either side of the mode. The following Fisher information components (that represent the expectations of the curvature of the log likelihood function) have mathematical singularity, singularities at the following values: :\alpha = 2: \quad \operatorname \left [- \frac \frac \right ]= _ :\beta = 2: \quad \operatorname\left [- \frac \frac \right ] = _ :\alpha = 2: \quad \operatorname\left [- \frac\frac\right ] = _ :\beta = 1: \quad \operatorname\left [- \frac\frac \right ] = _ (for further discussion see section on Fisher information matrix). Thus, it is not possible to strictly carry on the maximum likelihood estimation for some well known distributions belonging to the four-parameter beta distribution family, like the continuous uniform distribution, uniform distribution (Beta(1, 1, ''a'', ''c'')), and the arcsine distribution (Beta(1/2, 1/2, ''a'', ''c'')). 
N. L. Johnson and S. Kotz ignore the equations for the harmonic means and instead suggest "If a and c are unknown, and maximum likelihood estimators of ''a'', ''c'', α and β are required, the above procedure (for the two unknown parameter case, with ''X'' transformed as ''X'' = (''Y'' − ''a'')/(''c'' − ''a'')) can be repeated using a succession of trial values of ''a'' and ''c'', until the pair (''a'', ''c'') for which maximum likelihood (given ''a'' and ''c'') is as great as possible, is attained" (where, for the purpose of clarity, their notation for the parameters has been translated into the present notation).
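In practice, a generic numerical optimizer can play the role of the trial-and-error search over (''a'', ''c'') that Johnson and Kotz describe. The sketch below is only an illustration under stated assumptions (SciPy available; synthetic data; it is not the procedure of the original references): `scipy.stats.beta.fit` maximizes the likelihood over the shape parameters together with `loc` and `scale`, which correspond to ''a'' and (''c'' − ''a''). As discussed above, such a fit can be numerically delicate when the shape parameters approach the singular values.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Synthetic four-parameter beta sample: shapes (2.5, 4.0), support [a, c] = [10, 30]
a_true, c_true = 10.0, 30.0
sample = a_true + (c_true - a_true) * rng.beta(2.5, 4.0, size=2000)

# SciPy parametrizes the support as [loc, loc + scale]
alpha_hat, beta_hat, loc_hat, scale_hat = stats.beta.fit(sample)
a_hat, c_hat = loc_hat, loc_hat + scale_hat
print(alpha_hat, beta_hat, a_hat, c_hat)
```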


Fisher information matrix

Let a random variable X have a probability density ''f''(''x'';''α''). The partial derivative with respect to the (unknown, and to be estimated) parameter α of the log
likelihood function is called the score. The second moment of the score is called the Fisher information:
:\mathcal{I}(\alpha)=\operatorname{E} \left[\left(\frac{\partial}{\partial\alpha} \ln \mathcal{L}(\alpha\mid X) \right)^2 \right].
The expectation of the score is zero, therefore the Fisher information is also the second moment centered on the mean of the score: the
variance
of the score. If the log
likelihood function
is twice differentiable with respect to the parameter α, and under certain regularity conditions, then the Fisher information may also be written as follows (which is often a more convenient form for calculation purposes):
:\mathcal{I}(\alpha) = - \operatorname{E} \left[\frac{\partial^2}{\partial\alpha^2} \ln (\mathcal{L}(\alpha\mid X)) \right].
Thus, the Fisher information is the negative of the expectation of the second derivative with respect to the parameter α of the log
likelihood function.
Therefore, Fisher information is a measure of the curvature of the log likelihood function of α. A low-curvature (and therefore high radius of curvature), flatter log likelihood function curve has low Fisher information, while a log likelihood function curve with large curvature (and therefore low radius of curvature) has high Fisher information. When the Fisher information matrix is computed at the parameter estimates ("the observed Fisher information matrix") it is equivalent to the replacement of the true log likelihood surface by a Taylor's series approximation, taken as far as the quadratic terms. The word information, in the context of Fisher information, refers to information about the parameters, in matters such as estimation, sufficiency and the properties of variances of estimators. The Cramér–Rao bound states that the inverse of the Fisher information is a lower bound on the variance of any
estimator
of a parameter α:
:\operatorname{var}[\hat\alpha] \geq \frac{1}{\mathcal{I}(\alpha)}.
The precision with which one can estimate a parameter α is thus limited by the Fisher information of the log likelihood function. The Fisher information is a measure of the minimum error involved in estimating a parameter of a distribution, and it can be viewed as a measure of the resolving power of an experiment needed to discriminate between two alternative hypotheses about a parameter. When there are ''N'' parameters
:\begin{bmatrix} \theta_1 \\ \theta_2 \\ \vdots \\ \theta_N \end{bmatrix},
then the Fisher information takes the form of an ''N''×''N'' positive semidefinite symmetric matrix, the Fisher information matrix, with typical element:
:\mathcal{I}_{i,j}=\operatorname{E} \left[\left(\frac{\partial}{\partial\theta_i} \ln \mathcal{L} \right) \left(\frac{\partial}{\partial\theta_j} \ln \mathcal{L} \right) \right].
Under certain regularity conditions, the Fisher information matrix may also be written in the following form, which is often more convenient for computation:
:\mathcal{I}_{i, j} = - \operatorname{E} \left[\frac{\partial^2}{\partial\theta_i \, \partial\theta_j} \ln (\mathcal{L}) \right]\,.
With ''X''1, ..., ''XN'' iid random variables, an ''N''-dimensional "box" can be constructed with sides ''X''1, ..., ''XN''. Costa and Cover show that the (Shannon) differential entropy ''h''(''X'') is related to the volume of the typical set (having the sample entropy close to the true entropy), while the Fisher information is related to the surface of this typical set.
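For the beta distribution the two forms of the Fisher information can be compared directly: the score with respect to α is ln(''X'') − (ψ(α) − ψ(α + β)), and its variance should equal ψ1(α) − ψ1(α + β). A minimal Monte Carlo check (assuming NumPy and SciPy; the shape parameters and sample size are arbitrary choices):

```python
import numpy as np
from scipy.special import digamma, polygamma

alpha, beta = 2.0, 3.0
rng = np.random.default_rng(2)
x = rng.beta(alpha, beta, size=200_000)

# Score with respect to alpha: d/d(alpha) ln f(x; alpha, beta)
score = np.log(x) - (digamma(alpha) - digamma(alpha + beta))

print(score.mean())                                       # close to 0
print(score.var())                                        # Monte Carlo Fisher information
print(polygamma(1, alpha) - polygamma(1, alpha + beta))   # psi_1(alpha) - psi_1(alpha + beta)
```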


Two parameters

= For ''X''1, ..., ''X''''N'' independent random variables each having a beta distribution parametrized with shape parameters ''α'' and ''β'', the joint log likelihood function for ''N'' independent and identically distributed random variables, iid observations is: :\ln (\mathcal (\alpha, \beta\mid X) )= (\alpha - 1)\sum_^N \ln X_i + (\beta- 1)\sum_^N \ln (1-X_i)- N \ln \Beta(\alpha,\beta) therefore the joint log likelihood function per ''N'' independent and identically distributed random variables, iid observations is: :\frac \ln(\mathcal (\alpha, \beta\mid X)) = (\alpha - 1)\frac\sum_^N \ln X_i + (\beta- 1)\frac\sum_^N \ln (1-X_i)-\, \ln \Beta(\alpha,\beta) For the two parameter case, the Fisher information has 4 components: 2 diagonal and 2 off-diagonal. Since the Fisher information matrix is symmetric, one of these off diagonal components is independent. Therefore, the Fisher information matrix has 3 independent components (2 diagonal and 1 off diagonal). Aryal and Nadarajah calculated Fisher's information matrix for the four-parameter case, from which the two parameter case can be obtained as follows: :- \frac= \operatorname[\ln (X)]= \psi_1(\alpha) - \psi_1(\alpha + \beta) =_= \operatorname\left [- \frac \right ] = \ln \operatorname_ :- \frac = \operatorname ln (1-X)= \psi_1(\beta) - \psi_1(\alpha + \beta) =_= \operatorname\left [- \frac \right]= \ln \operatorname_ :- \frac = \operatorname[\ln X,\ln(1-X)] = -\psi_1(\alpha+\beta) =_= \operatorname\left [- \frac \right] = \ln \operatorname_ Since the Fisher information matrix is symmetric : \mathcal_= \mathcal_= \ln \operatorname_ The Fisher information components are equal to the log geometric variances and log geometric covariance. Therefore, they can be expressed as
trigamma functions, denoted ψ1(α), the second of the polygamma function
s, defined as the derivative of the digamma function: :\psi_1(\alpha) = \frac=\, \frac. These derivatives are also derived in the and plots of the log likelihood function are also shown in that section. contains plots and further discussion of the Fisher information matrix components: the log geometric variances and log geometric covariance as a function of the shape parameters α and β. contains formulas for moments of logarithmically transformed random variables. Images for the Fisher information components \mathcal_, \mathcal_ and \mathcal_ are shown in . The determinant of Fisher's information matrix is of interest (for example for the calculation of Jeffreys prior probability). From the expressions for the individual components of the Fisher information matrix, it follows that the determinant of Fisher's (symmetric) information matrix for the beta distribution is: :\begin \det(\mathcal(\alpha, \beta))&= \mathcal_ \mathcal_-\mathcal_ \mathcal_ \\ pt&=(\psi_1(\alpha) - \psi_1(\alpha + \beta))(\psi_1(\beta) - \psi_1(\alpha + \beta))-( -\psi_1(\alpha+\beta))( -\psi_1(\alpha+\beta))\\ pt&= \psi_1(\alpha)\psi_1(\beta)-( \psi_1(\alpha)+\psi_1(\beta))\psi_1(\alpha + \beta)\\ pt\lim_ \det(\mathcal(\alpha, \beta)) &=\lim_ \det(\mathcal(\alpha, \beta)) = \infty\\ pt\lim_ \det(\mathcal(\alpha, \beta)) &=\lim_ \det(\mathcal(\alpha, \beta)) = 0 \end From Sylvester's criterion (checking whether the diagonal elements are all positive), it follows that the Fisher information matrix for the two parameter case is Positive-definite matrix, positive-definite (under the standard condition that the shape parameters are positive ''α'' > 0 and ''β'' > 0).
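A short sketch (assuming SciPy; the helper name `fisher_info_beta` and the shape parameters are illustrative) builds the two-parameter Fisher information matrix from the trigamma function and checks the determinant and the positive-definiteness discussed above:

```python
import numpy as np
from scipy.special import polygamma

def fisher_info_beta(alpha, beta):
    """Fisher information matrix (per observation) of Beta(alpha, beta),
    built from the trigamma function psi_1 = polygamma(1, .)."""
    t_a, t_b, t_ab = polygamma(1, alpha), polygamma(1, beta), polygamma(1, alpha + beta)
    return np.array([[t_a - t_ab, -t_ab],
                     [-t_ab, t_b - t_ab]])

I = fisher_info_beta(2.0, 3.0)
print(np.linalg.det(I))                    # determinant, e.g. for a Jeffreys-type prior
print(np.all(np.linalg.eigvalsh(I) > 0))   # positive definite for alpha, beta > 0
```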


Four parameters

= If ''Y''1, ..., ''YN'' are independent random variables each having a beta distribution with four parameters: the exponents ''α'' and ''β'', and also ''a'' (the minimum of the distribution range), and ''c'' (the maximum of the distribution range) (section titled "Alternative parametrizations", "Four parameters"), with
probability density function
: :f(y; \alpha, \beta, a, c) = \frac =\frac=\frac. the joint log likelihood function per ''N'' independent and identically distributed random variables, iid observations is: :\frac \ln(\mathcal (\alpha, \beta, a, c\mid Y))= \frac\sum_^N \ln (Y_i - a) + \frac\sum_^N \ln (c - Y_i)- \ln \Beta(\alpha,\beta) - (\alpha+\beta -1) \ln (c-a) For the four parameter case, the Fisher information has 4*4=16 components. It has 12 off-diagonal components = (4×4 total − 4 diagonal). Since the Fisher information matrix is symmetric, half of these components (12/2=6) are independent. Therefore, the Fisher information matrix has 6 independent off-diagonal + 4 diagonal = 10 independent components. Aryal and Nadarajah calculated Fisher's information matrix for the four parameter case as follows: :- \frac \frac= \operatorname[\ln (X)]= \psi_1(\alpha) - \psi_1(\alpha + \beta) = \mathcal_= \operatorname\left [- \frac \frac \right ] = \ln (\operatorname) :-\frac \frac = \operatorname ln (1-X)= \psi_1(\beta) - \psi_1(\alpha + \beta) =_= \operatorname \left [- \frac \frac \right ] = \ln(\operatorname) :-\frac \frac = \operatorname[\ln X,(1-X)] = -\psi_1(\alpha+\beta) =\mathcal_= \operatorname \left [- \frac\frac \right ] = \ln(\operatorname_) In the above expressions, the use of ''X'' instead of ''Y'' in the expressions var[ln(''X'')] = ln(var''GX'') is ''not an error''. The expressions in terms of the log geometric variances and log geometric covariance occur as functions of the two parameter ''X'' ~ Beta(''α'', ''β'') parametrization because when taking the partial derivatives with respect to the exponents (''α'', ''β'') in the four parameter case, one obtains the identical expressions as for the two parameter case: these terms of the four parameter Fisher information matrix are independent of the minimum ''a'' and maximum ''c'' of the distribution's range. The only non-zero term upon double differentiation of the log likelihood function with respect to the exponents ''α'' and ''β'' is the second derivative of the log of the beta function: ln(B(''α'', ''β'')). This term is independent of the minimum ''a'' and maximum ''c'' of the distribution's range. Double differentiation of this term results in trigamma functions. The sections titled "Maximum likelihood", "Two unknown parameters" and "Four unknown parameters" also show this fact. The Fisher information for ''N'' i.i.d. samples is ''N'' times the individual Fisher information (eq. 11.279, page 394 of Cover and Thomas). (Aryal and Nadarajah take a single observation, ''N'' = 1, to calculate the following components of the Fisher information, which leads to the same result as considering the derivatives of the log likelihood per ''N'' observations. Moreover, below the erroneous expression for _ in Aryal and Nadarajah has been corrected.) 
:\begin \alpha > 2: \quad \operatorname\left [- \frac \frac \right ] &= _=\frac \\ \beta > 2: \quad \operatorname\left[-\frac \frac \right ] &= \mathcal_ = \frac \\ \operatorname\left[- \frac \frac \right ] &= _ = \frac \\ \alpha > 1: \quad \operatorname\left[- \frac \frac \right ] &=\mathcal_ = \frac \\ \operatorname\left[- \frac \frac \right ] &= _ = \frac \\ \operatorname\left[- \frac \frac \right ] &= _ = -\frac \\ \beta > 1: \quad \operatorname\left[- \frac \frac \right ] &= \mathcal_ = -\frac \end The lower two diagonal entries of the Fisher information matrix, with respect to the parameter "a" (the minimum of the distribution's range): \mathcal_, and with respect to the parameter "c" (the maximum of the distribution's range): \mathcal_ are only defined for exponents α > 2 and β > 2 respectively. The Fisher information matrix component \mathcal_ for the minimum "a" approaches infinity for exponent α approaching 2 from above, and the Fisher information matrix component \mathcal_ for the maximum "c" approaches infinity for exponent β approaching 2 from above. The Fisher information matrix for the four parameter case does not depend on the individual values of the minimum "a" and the maximum "c", but only on the total range (''c''−''a''). Moreover, the components of the Fisher information matrix that depend on the range (''c''−''a''), depend only through its inverse (or the square of the inverse), such that the Fisher information decreases for increasing range (''c''−''a''). The accompanying images show the Fisher information components \mathcal_ and \mathcal_. Images for the Fisher information components \mathcal_ and \mathcal_ are shown in . All these Fisher information components look like a basin, with the "walls" of the basin being located at low values of the parameters. The following four-parameter-beta-distribution Fisher information components can be expressed in terms of the two-parameter: ''X'' ~ Beta(α, β) expectations of the transformed ratio ((1-''X'')/''X'') and of its mirror image (''X''/(1-''X'')), scaled by the range (''c''−''a''), which may be helpful for interpretation: :\mathcal_ =\frac= \frac \text\alpha > 1 :\mathcal_ = -\frac=- \frac\text\beta> 1 These are also the expected values of the "inverted beta distribution" or
beta prime distribution
(also known as beta distribution of the second kind or Pearson distribution, Pearson's Type VI) and its mirror image, scaled by the range (''c'' − ''a''). Also, the following Fisher information components can be expressed in terms of the harmonic (1/X) variances or of variances based on the ratio transformed variables ((1-X)/X) as follows: :\begin \alpha > 2: \quad \mathcal_ &=\operatorname \left [\frac \right] \left (\frac \right )^2 =\operatorname \left [\frac \right ] \left (\frac \right)^2 = \frac \\ \beta > 2: \quad \mathcal_ &= \operatorname \left [\frac \right ] \left (\frac \right )^2 = \operatorname \left [\frac \right ] \left (\frac \right )^2 =\frac \\ \mathcal_ &=\operatorname \left [\frac,\frac \right ]\frac = \operatorname \left [\frac,\frac \right ] \frac =\frac \end See section "Moments of linearly transformed, product and inverted random variables" for these expectations. The determinant of Fisher's information matrix is of interest (for example for the calculation of Jeffreys prior probability). From the expressions for the individual components, it follows that the determinant of Fisher's (symmetric) information matrix for the beta distribution with four parameters is: :\begin \det(\mathcal(\alpha,\beta,a,c)) = & -\mathcal_^2 \mathcal_ \mathcal_+\mathcal_ \mathcal_ \mathcal_ \mathcal_+\mathcal_^2 \mathcal_^2 -\mathcal_ \mathcal_ \mathcal_^2\\ & -\mathcal_ \mathcal_ \mathcal_ \mathcal_+\mathcal_^2 \mathcal_ \mathcal_+2 \mathcal_ \mathcal_ \mathcal_ \mathcal_\\ & -2\mathcal_ \mathcal_ \mathcal_ \mathcal_+\mathcal_^2 \mathcal_^2-\mathcal_ \mathcal_ \mathcal_^2+\mathcal_ \mathcal_^2 \mathcal_\\ & -\mathcal_ \mathcal_ \mathcal_ \mathcal_-\mathcal_ \mathcal_ \mathcal_ \mathcal_+\mathcal_ \mathcal_ \mathcal_ \mathcal_\\ & -\mathcal_ \mathcal_ \mathcal_ \mathcal_+\mathcal_ \mathcal_ \mathcal_ \mathcal_-\mathcal_ \mathcal_^2 \mathcal_\\ & +2 \mathcal_ \mathcal_ \mathcal_ \mathcal_-\mathcal_ \mathcal_^2 \mathcal_-\mathcal_^2 \mathcal_ \mathcal_+\mathcal_ \mathcal_ \mathcal_ \mathcal_\text\alpha, \beta> 2 \end Using Sylvester's criterion (checking whether the diagonal elements are all positive), and since diagonal components _ and _ have Mathematical singularity, singularities at α=2 and β=2 it follows that the Fisher information matrix for the four parameter case is Positive-definite matrix, positive-definite for α>2 and β>2. Since for α > 2 and β > 2 the beta distribution is (symmetric or unsymmetric) bell shaped, it follows that the Fisher information matrix is positive-definite only for bell-shaped (symmetric or unsymmetric) beta distributions, with inflection points located to either side of the mode. Thus, important well known distributions belonging to the four-parameter beta distribution family, like the parabolic distribution (Beta(2,2,a,c)) and the continuous uniform distribution, uniform distribution (Beta(1,1,a,c)) have Fisher information components (\mathcal_,\mathcal_,\mathcal_,\mathcal_) that blow up (approach infinity) in the four-parameter case (although their Fisher information components are all defined for the two parameter case). The four-parameter Wigner semicircle distribution (Beta(3/2,3/2,''a'',''c'')) and arcsine distribution (Beta(1/2,1/2,''a'',''c'')) have negative Fisher information determinants for the four-parameter case.
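The expectations behind the components that couple a shape parameter with an endpoint, namely E[(1 − ''X'')/''X''] = β/(α − 1) for α > 1 and E[''X''/(1 − ''X'')] = α/(β − 1) for β > 1, can be verified by numerical integration. A minimal check (assuming SciPy; the shape parameters are arbitrary):

```python
from scipy import stats
from scipy.integrate import quad

alpha, beta = 3.0, 2.5
pdf = stats.beta(alpha, beta).pdf

# E[(1-X)/X] = beta/(alpha-1) for alpha > 1; E[X/(1-X)] = alpha/(beta-1) for beta > 1
m1, _ = quad(lambda x: (1 - x) / x * pdf(x), 0, 1)
m2, _ = quad(lambda x: x / (1 - x) * pdf(x), 0, 1)
print(m1, beta / (alpha - 1))
print(m2, alpha / (beta - 1))
```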


Bayesian inference

The use of Beta distributions in Bayesian inference is due to the fact that they provide a family of conjugate prior probability distributions for binomial (including
Bernoulli
) and geometric distributions. The domain of the beta distribution can be viewed as a probability, and in fact the beta distribution is often used to describe the distribution of a probability value ''p'':
:P(p;\alpha,\beta) = \frac{p^{\alpha-1}(1-p)^{\beta-1}}{\Beta(\alpha,\beta)}.
Examples of beta distributions used as prior probabilities to represent ignorance of prior parameter values in Bayesian inference are Beta(1,1), Beta(0,0) and Beta(1/2,1/2).
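Conjugacy means that a Beta(α, β) prior combined with ''s'' successes and ''f'' = ''n'' − ''s'' failures yields a Beta(α + ''s'', β + ''f'') posterior. A minimal sketch of this update (assuming SciPy; the prior and the data are illustrative choices):

```python
from scipy import stats

a0, b0 = 1.0, 1.0   # Bayes-Laplace uniform prior (illustrative choice)
s, n = 7, 10        # observed successes and trials

# Conjugate update: posterior is Beta(a0 + s, b0 + n - s)
posterior = stats.beta(a0 + s, b0 + (n - s))
print(posterior.mean())            # (a0 + s) / (a0 + b0 + n) = 8/12 here
print(posterior.interval(0.95))    # central 95% credible interval
```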


Rule of succession

A classic application of the beta distribution is the rule of succession, introduced in the 18th century by Pierre-Simon Laplace in the course of treating the sunrise problem. It states that, given ''s'' successes in ''n'' conditional independence, conditionally independent Bernoulli trials with probability ''p,'' that the estimate of the expected value in the next trial is \frac. This estimate is the expected value of the posterior distribution over ''p,'' namely Beta(''s''+1, ''n''−''s''+1), which is given by Bayes' rule if one assumes a uniform prior probability over ''p'' (i.e., Beta(1, 1)) and then observes that ''p'' generated ''s'' successes in ''n'' trials. Laplace's rule of succession has been criticized by prominent scientists. R. T. Cox described Laplace's application of the rule of succession to the sunrise problem ( p. 89) as "a travesty of the proper use of the principle." Keynes remarks ( Ch.XXX, p. 382) "indeed this is so foolish a theorem that to entertain it is discreditable." Karl Pearson showed that the probability that the next (''n'' + 1) trials will be successes, after n successes in n trials, is only 50%, which has been considered too low by scientists like Jeffreys and unacceptable as a representation of the scientific process of experimentation to test a proposed scientific law. As pointed out by Jeffreys ( p. 128) (crediting C. D. Broad ) Laplace's rule of succession establishes a high probability of success ((n+1)/(n+2)) in the next trial, but only a moderate probability (50%) that a further sample (n+1) comparable in size will be equally successful. As pointed out by Perks, "The rule of succession itself is hard to accept. It assigns a probability to the next trial which implies the assumption that the actual run observed is an average run and that we are always at the end of an average run. It would, one would think, be more reasonable to assume that we were in the middle of an average run. Clearly a higher value for both probabilities is necessary if they are to accord with reasonable belief." These problems with Laplace's rule of succession motivated Haldane, Perks, Jeffreys and others to search for other forms of prior probability (see the next ). According to Jaynes, the main problem with the rule of succession is that it is not valid when s=0 or s=n (see rule of succession, for an analysis of its validity).
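A small sketch of the rule itself, returning the posterior predictive probability (''s'' + 1)/(''n'' + 2) that follows from the uniform Beta(1,1) prior (the function name and the example values are illustrative):

```python
def rule_of_succession(s, n):
    """Laplace's rule: P(success on trial n+1 | s successes in n trials), Beta(1,1) prior."""
    return (s + 1) / (n + 2)

print(rule_of_succession(10, 10))  # 11/12: high, but not certain, after an unbroken run
print(rule_of_succession(0, 0))    # 1/2: with no data the rule returns the prior mean
```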


Bayes-Laplace prior probability (Beta(1,1))

The beta distribution achieves maximum differential entropy for Beta(1,1): the Uniform density, uniform probability density, for which all values in the domain of the distribution have equal density. This uniform distribution Beta(1,1) was suggested ("with a great deal of doubt") by Thomas Bayes as the prior probability distribution to express ignorance about the correct prior distribution. This prior distribution was adopted (apparently, from his writings, with little sign of doubt) by Pierre-Simon Laplace, and hence it was also known as the "Bayes-Laplace rule" or the "Laplace rule" of "inverse probability" in publications of the first half of the 20th century. In the later part of the 19th century and early part of the 20th century, scientists realized that the assumption of uniform "equal" probability density depended on the actual functions (for example whether a linear or a logarithmic scale was most appropriate) and parametrizations used. In particular, the behavior near the ends of distributions with finite support (for example near ''x'' = 0, for a distribution with initial support at ''x'' = 0) required particular attention. Keynes ( Ch.XXX, p. 381) criticized the use of Bayes's uniform prior probability (Beta(1,1)) that all values between zero and one are equiprobable, as follows: "Thus experience, if it shows anything, shows that there is a very marked clustering of statistical ratios in the neighborhoods of zero and unity, of those for positive theories and for correlations between positive qualities in the neighborhood of zero, and of those for negative theories and for correlations between negative qualities in the neighborhood of unity. "


Haldane's prior probability (Beta(0,0))

The Beta(0,0) distribution was proposed by J.B.S. Haldane, who suggested that the prior probability representing complete uncertainty should be proportional to ''p''−1(1−''p'')−1. The function ''p''−1(1−''p'')−1 can be viewed as the limit of the numerator of the beta distribution as both shape parameters approach zero: α, β → 0. The Beta function (in the denominator of the beta distribution) approaches infinity, for both parameters approaching zero, α, β → 0. Therefore, ''p''−1(1−''p'')−1 divided by the Beta function approaches a 2-point
Bernoulli distribution
with equal probability 1/2 at each end, at 0 and 1, and nothing in between, as α, β → 0. A coin-toss: one face of the coin being at 0 and the other face being at 1. The Haldane prior probability distribution Beta(0,0) is an "improper prior" because its integration (from 0 to 1) fails to strictly converge to 1 due to the singularities at each end. However, this is not an issue for computing posterior probabilities unless the sample size is very small. Furthermore, Zellner points out that on the log-odds scale, (the logit transformation ln(''p''/1−''p'')), the Haldane prior is the uniformly flat prior. The fact that a uniform prior probability on the logit transformed variable ln(''p''/1−''p'') (with domain (-∞, ∞)) is equivalent to the Haldane prior on the domain
[0, 1]
was pointed out by Harold Jeffreys in the first edition (1939) of his book Theory of Probability ( p. 123). Jeffreys writes "Certainly if we take the Bayes-Laplace rule right up to the extremes we are led to results that do not correspond to anybody's way of thinking. The (Haldane) rule d''x''/(''x''(1−''x'')) goes too far the other way. It would lead to the conclusion that if a sample is of one type with respect to some property there is a probability 1 that the whole population is of that type." The fact that "uniform" depends on the parametrization, led Jeffreys to seek a form of prior that would be invariant under different parametrizations.
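The log-odds equivalence noted by Zellner and Jeffreys follows from a one-line change of variables (a standard calculation, sketched here rather than quoted from the sources): if the prior on ℓ = ln(''p''/(1 − ''p'')) is uniformly flat, the induced (improper) prior on ''p'' is
:\pi(p) \propto \left|\frac{d\ell}{dp}\right| = \frac{d}{dp}\ln\frac{p}{1-p} = \frac{1}{p}+\frac{1}{1-p} = \frac{1}{p(1-p)},
which is precisely Haldane's ''p''−1(1 − ''p'')−1.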


Jeffreys' prior probability (Beta(1/2,1/2) for a Bernoulli or for a binomial distribution)

Harold Jeffreys proposed to use an
uninformative prior probability measure that should be invariant under reparameterization: proportional to the square root of the determinant of Fisher's information matrix. For the
Bernoulli distribution
, this can be shown as follows: for a coin that is "heads" with probability ''p'' ∈
[0, 1]
and is "tails" with probability 1 − ''p'', for a given (H,T) ∈ the probability is ''pH''(1 − ''p'')''T''. Since ''T'' = 1 − ''H'', the
Bernoulli distribution
is ''pH''(1 − ''p'')1 − ''H''. Considering ''p'' as the only parameter, it follows that the log likelihood for the Bernoulli distribution is :\ln \mathcal (p\mid H) = H \ln(p)+ (1-H) \ln(1-p). The Fisher information matrix has only one component (it is a scalar, because there is only one parameter: ''p''), therefore: :\begin \sqrt &= \sqrt \\ pt&= \sqrt \\ pt&= \sqrt \\ &= \frac. \end Similarly, for the Binomial distribution with ''n'' Bernoulli trials, it can be shown that :\sqrt= \frac. Thus, for the
Bernoulli
, and Binomial distributions, Jeffreys prior is proportional to \scriptstyle \frac, which happens to be proportional to a beta distribution with domain variable ''x'' = ''p'', and shape parameters α = β = 1/2, the arcsine distribution: :Beta(\tfrac, \tfrac) = \frac. It will be shown in the next section that the normalizing constant for Jeffreys prior is immaterial to the final result because the normalizing constant cancels out in Bayes theorem for the posterior probability. Hence Beta(1/2,1/2) is used as the Jeffreys prior for both Bernoulli and binomial distributions. As shown in the next section, when using this expression as a prior probability times the likelihood in Bayes theorem, the posterior probability turns out to be a beta distribution. It is important to realize, however, that Jeffreys prior is proportional to \scriptstyle \frac for the Bernoulli and binomial distribution, but not for the beta distribution. Jeffreys prior for the beta distribution is given by the determinant of Fisher's information for the beta distribution, which, as shown in the is a function of the
trigamma function
ψ1 of shape parameters α and β as follows: : \begin \sqrt &= \sqrt \\ \lim_ \sqrt &=\lim_ \sqrt = \infty\\ \lim_ \sqrt &=\lim_ \sqrt = 0 \end As previously discussed, Jeffreys prior for the Bernoulli and binomial distributions is proportional to the arcsine distribution Beta(1/2,1/2), a one-dimensional ''curve'' that looks like a basin as a function of the parameter ''p'' of the Bernoulli and binomial distributions. The walls of the basin are formed by ''p'' approaching the singularities at the ends ''p'' → 0 and ''p'' → 1, where Beta(1/2,1/2) approaches infinity. Jeffreys prior for the beta distribution is a ''2-dimensional surface'' (embedded in a three-dimensional space) that looks like a basin with only two of its walls meeting at the corner α = β = 0 (and missing the other two walls) as a function of the shape parameters α and β of the beta distribution. The two adjoining walls of this 2-dimensional surface are formed by the shape parameters α and β approaching the singularities (of the trigamma function) at α, β → 0. It has no walls for α, β → ∞ because in this case the determinant of Fisher's information matrix for the beta distribution approaches zero. It will be shown in the next section that Jeffreys prior probability results in posterior probabilities (when multiplied by the binomial likelihood function) that are intermediate between the posterior probability results of the Haldane and Bayes prior probabilities. Jeffreys prior may be difficult to obtain analytically, and for some cases it just doesn't exist (even for simple distribution functions like the asymmetric triangular distribution). Berger, Bernardo and Sun, in a 2009 paper defined a reference prior probability distribution that (unlike Jeffreys prior) exists for the asymmetric triangular distribution. They cannot obtain a closed-form expression for their reference prior, but numerical calculations show it to be nearly perfectly fitted by the (proper) prior : \operatorname(\tfrac, \tfrac) \sim\frac where θ is the vertex variable for the asymmetric triangular distribution with support
[0, 1]
(corresponding to the following parameter values in Wikipedia's article on the triangular distribution: vertex ''c'' = ''θ'', left end ''a'' = 0,and right end ''b'' = 1). Berger et al. also give a heuristic argument that Beta(1/2,1/2) could indeed be the exact Berger–Bernardo–Sun reference prior for the asymmetric triangular distribution. Therefore, Beta(1/2,1/2) not only is Jeffreys prior for the Bernoulli and binomial distributions, but also seems to be the Berger–Bernardo–Sun reference prior for the asymmetric triangular distribution (for which the Jeffreys prior does not exist), a distribution used in project management and PERT analysis to describe the cost and duration of project tasks. Clarke and Barron prove that, among continuous positive priors, Jeffreys prior (when it exists) asymptotically maximizes Shannon's mutual information between a sample of size n and the parameter, and therefore ''Jeffreys prior is the most uninformative prior'' (measuring information as Shannon information). The proof rests on an examination of the Kullback–Leibler divergence between probability density functions for iid random variables.
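The Bernoulli/binomial case can also be checked numerically: the square root of the Fisher information, 1/\sqrt{p(1-p)}, integrates to π over [0, 1], so the normalized Jeffreys prior is exactly the Beta(1/2, 1/2) density. A minimal check (assuming NumPy and SciPy; the evaluation point 0.3 is arbitrary):

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

# Unnormalized Jeffreys prior for the Bernoulli parameter: sqrt(I(p)) = 1/sqrt(p(1-p))
Z, _ = quad(lambda p: 1.0 / np.sqrt(p * (1.0 - p)), 0.0, 1.0)
print(Z, np.pi)   # the normalizing constant is pi = B(1/2, 1/2)

p = 0.3
print((1.0 / np.sqrt(p * (1 - p))) / Z)   # normalized Jeffreys prior at p
print(stats.beta(0.5, 0.5).pdf(p))        # same value from the Beta(1/2, 1/2) density
```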


Effect of different prior probability choices on the posterior beta distribution

If samples are drawn from the population of a random variable ''X'' that result in ''s'' successes and ''f'' failures in "n" Bernoulli trials ''n'' = ''s'' + ''f'', then the
likelihood function
for parameters ''s'' and ''f'' given ''x'' = ''p'' (the notation ''x'' = ''p'' in the expressions below will emphasize that the domain ''x'' stands for the value of the parameter ''p'' in the binomial distribution), is the following binomial distribution: :\mathcal(s,f\mid x=p) = x^s(1-x)^f = x^s(1-x)^. If beliefs about prior probability information are reasonably well approximated by a beta distribution with parameters ''α'' Prior and ''β'' Prior, then: :(x=p;\alpha \operatorname,\beta \operatorname) = \frac According to Bayes' theorem for a continuous event space, the posterior probability is given by the product of the prior probability and the likelihood function (given the evidence ''s'' and ''f'' = ''n'' − ''s''), normalized so that the area under the curve equals one, as follows: :\begin & \operatorname(x=p\mid s,n-s) \\ pt= & \frac \\ pt= & \frac \\ pt= & \frac \\ pt= & \frac. \end The binomial coefficient :

:\binom{n}{s}=\frac{n!}{s!\,(n-s)!}=\frac{(s+f)!}{s!\,f!}
appears both in the numerator and the denominator of the posterior probability, and it does not depend on the integration variable ''x'', hence it cancels out, and it is irrelevant to the final result. Similarly the normalizing factor for the prior probability, the beta function B(αPrior,βPrior) cancels out and it is immaterial to the final result. The same posterior probability result can be obtained if one uses an un-normalized prior :x^(1-x)^ because the normalizing factors all cancel out. Several authors (including Jeffreys himself) thus use an un-normalized prior formula since the normalization constant cancels out. The numerator of the posterior probability ends up being just the (un-normalized) product of the prior probability and the likelihood function, and the denominator is its integral from zero to one. The beta function in the denominator, B(''s'' + ''α'' Prior, ''n'' − ''s'' + ''β'' Prior), appears as a normalization constant to ensure that the total posterior probability integrates to unity. The ratio ''s''/''n'' of the number of successes to the total number of trials is a sufficient statistic in the binomial case, which is relevant for the following results. For the Bayes' prior probability (Beta(1,1)), the posterior probability is: :\operatorname(p=x\mid s,f) = \frac, \text=\frac,\text=\frac\text 0 < s < n). For the Jeffreys' prior probability (Beta(1/2,1/2)), the posterior probability is: :\operatorname(p=x\mid s,f) = ,\text = \frac,\text\frac\text \tfrac < s < n-\tfrac). and for the Haldane prior probability (Beta(0,0)), the posterior probability is: :\operatorname(p=x\mid s,f) = \frac, \text = \frac,\text\frac\text 1 < s < n -1). From the above expressions it follows that for ''s''/''n'' = 1/2) all the above three prior probabilities result in the identical location for the posterior probability mean = mode = 1/2. For ''s''/''n'' < 1/2, the mean of the posterior probabilities, using the following priors, are such that: mean for Bayes prior > mean for Jeffreys prior > mean for Haldane prior. For ''s''/''n'' > 1/2 the order of these inequalities is reversed such that the Haldane prior probability results in the largest posterior mean. The ''Haldane'' prior probability Beta(0,0) results in a posterior probability density with ''mean'' (the expected value for the probability of success in the "next" trial) identical to the ratio ''s''/''n'' of the number of successes to the total number of trials. Therefore, the Haldane prior results in a posterior probability with expected value in the next trial equal to the maximum likelihood. The ''Bayes'' prior probability Beta(1,1) results in a posterior probability density with ''mode'' identical to the ratio ''s''/''n'' (the maximum likelihood). In the case that 100% of the trials have been successful ''s'' = ''n'', the ''Bayes'' prior probability Beta(1,1) results in a posterior expected value equal to the rule of succession (''n'' + 1)/(''n'' + 2), while the Haldane prior Beta(0,0) results in a posterior expected value of 1 (absolute certainty of success in the next trial). Jeffreys prior probability results in a posterior expected value equal to (''n'' + 1/2)/(''n'' + 1). Perks (p. 303) points out: "This provides a new rule of succession and expresses a 'reasonable' position to take up, namely, that after an unbroken run of n successes we assume a probability for the next trial equivalent to the assumption that we are about half-way through an average run, i.e. that we expect a failure once in (2''n'' + 2) trials. 
The Bayes–Laplace rule implies that we are about at the end of an average run or that we expect a failure once in (''n'' + 2) trials. The comparison clearly favours the new result (what is now called Jeffreys prior) from the point of view of 'reasonableness'." Conversely, in the case that 100% of the trials have resulted in failure (''s'' = 0), the ''Bayes'' prior probability Beta(1,1) results in a posterior expected value for success in the next trial equal to 1/(''n'' + 2), while the Haldane prior Beta(0,0) results in a posterior expected value of success in the next trial of 0 (absolute certainty of failure in the next trial). Jeffreys prior probability results in a posterior expected value for success in the next trial equal to (1/2)/(''n'' + 1), which Perks (p. 303) points out: "is a much more reasonably remote result than the Bayes-Laplace result 1/(''n'' + 2)". Jaynes questions (for the uniform prior Beta(1,1)) the use of these formulas for the cases ''s'' = 0 or ''s'' = ''n'' because the integrals do not converge (Beta(1,1) is an improper prior for ''s'' = 0 or ''s'' = ''n''). In practice, the conditions 0 (p. 303) shows that, for what is now known as the Jeffreys prior, this probability is ((''n'' + 1/2)/(''n'' + 1))((''n'' + 3/2)/(''n'' + 2))...(2''n'' + 1/2)/(2''n'' + 1), which for ''n'' = 1, 2, 3 gives 15/24, 315/480, 9009/13440; rapidly approaching a limiting value of 1/\sqrt = 0.70710678\ldots as n tends to infinity. Perks remarks that what is now known as the Jeffreys prior: "is clearly more 'reasonable' than either the Bayes-Laplace result or the result on the (Haldane) alternative rule rejected by Jeffreys which gives certainty as the probability. It clearly provides a very much better correspondence with the process of induction. Whether it is 'absolutely' reasonable for the purpose, i.e. whether it is yet large enough, without the absurdity of reaching unity, is a matter for others to decide. But it must be realized that the result depends on the assumption of complete indifference and absence of knowledge prior to the sampling experiment." Following are the variances of the posterior distribution obtained with these three prior probability distributions: for the Bayes' prior probability (Beta(1,1)), the posterior variance is: :\text = \frac,\text s=\frac \text =\frac for the Jeffreys' prior probability (Beta(1/2,1/2)), the posterior variance is: : \text = \frac ,\text s=\frac n 2 \text = \frac 1 and for the Haldane prior probability (Beta(0,0)), the posterior variance is: :\text = \frac, \texts=\frac\text =\frac So, as remarked by Silvey, for large ''n'', the variance is small and hence the posterior distribution is highly concentrated, whereas the assumed prior distribution was very diffuse. This is in accord with what one would hope for, as vague prior knowledge is transformed (through Bayes theorem) into a more precise posterior knowledge by an informative experiment. For small ''n'' the Haldane Beta(0,0) prior results in the largest posterior variance while the Bayes Beta(1,1) prior results in the more concentrated posterior. Jeffreys prior Beta(1/2,1/2) results in a posterior variance in between the other two. As ''n'' increases, the variance rapidly decreases so that the posterior variance for all three priors converges to approximately the same value (approaching zero variance as ''n'' → ∞). 
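The ordering of posterior means and the shrinking posterior variances described above are easy to reproduce. A minimal sketch (assuming SciPy; the data ''s'' = 3, ''n'' = 10 are illustrative, and the Haldane posterior is proper here because 0 < ''s'' < ''n''):

```python
from scipy import stats

s, n = 3, 10   # illustrative data: 3 successes in 10 trials
priors = {"Bayes (1,1)": (1.0, 1.0),
          "Jeffreys (1/2,1/2)": (0.5, 0.5),
          "Haldane (0,0)": (0.0, 0.0)}

for name, (a0, b0) in priors.items():
    post = stats.beta(a0 + s, b0 + (n - s))   # conjugate posterior
    print(name, post.mean(), post.var())
```

With ''s''/''n'' < 1/2 this prints posterior means in the order Bayes > Jeffreys > Haldane, matching the inequalities stated above.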
Recalling the previous result that the ''Haldane'' prior probability Beta(0,0) results in a posterior probability density with ''mean'' (the expected value for the probability of success in the "next" trial) identical to the ratio s/n of the number of successes to the total number of trials, it follows from the above expression that also the ''Haldane'' prior Beta(0,0) results in a posterior with ''variance'' identical to the variance expressed in terms of the max. likelihood estimate s/n and sample size (in ): :\text = \frac= \frac with the mean ''μ'' = ''s''/''n'' and the sample size ''ν'' = ''n''. In Bayesian inference, using a prior distribution Beta(''α''Prior,''β''Prior) prior to a binomial distribution is equivalent to adding (''α''Prior − 1) pseudo-observations of "success" and (''β''Prior − 1) pseudo-observations of "failure" to the actual number of successes and failures observed, then estimating the parameter ''p'' of the binomial distribution by the proportion of successes over both real- and pseudo-observations. A uniform prior Beta(1,1) does not add (or subtract) any pseudo-observations since for Beta(1,1) it follows that (''α''Prior − 1) = 0 and (''β''Prior − 1) = 0. The Haldane prior Beta(0,0) subtracts one pseudo observation from each and Jeffreys prior Beta(1/2,1/2) subtracts 1/2 pseudo-observation of success and an equal number of failure. This subtraction has the effect of smoothing out the posterior distribution. If the proportion of successes is not 50% (''s''/''n'' ≠ 1/2) values of ''α''Prior and ''β''Prior less than 1 (and therefore negative (''α''Prior − 1) and (''β''Prior − 1)) favor sparsity, i.e. distributions where the parameter ''p'' is closer to either 0 or 1. In effect, values of ''α''Prior and ''β''Prior between 0 and 1, when operating together, function as a concentration parameter. The accompanying plots show the posterior probability density functions for sample sizes ''n'' ∈ , successes ''s'' ∈  and Beta(''α''Prior,''β''Prior) ∈ . Also shown are the cases for ''n'' = , success ''s'' =  and Beta(''α''Prior,''β''Prior) ∈ . The first plot shows the symmetric cases, for successes ''s'' ∈ , with mean = mode = 1/2 and the second plot shows the skewed cases ''s'' ∈ . The images show that there is little difference between the priors for the posterior with sample size of 50 (characterized by a more pronounced peak near ''p'' = 1/2). Significant differences appear for very small sample sizes (in particular for the flatter distribution for the degenerate case of sample size = 3). Therefore, the skewed cases, with successes ''s'' = , show a larger effect from the choice of prior, at small sample size, than the symmetric cases. For symmetric distributions, the Bayes prior Beta(1,1) results in the most "peaky" and highest posterior distributions and the Haldane prior Beta(0,0) results in the flattest and lowest peak distribution. The Jeffreys prior Beta(1/2,1/2) lies in between them. For nearly symmetric, not too skewed distributions the effect of the priors is similar. For very small sample size (in this case for a sample size of 3) and skewed distribution (in this example for ''s'' ∈ ) the Haldane prior can result in a reverse-J-shaped distribution with a singularity at the left end. 
However, this happens only in degenerate cases (in this example ''n'' = 3 and hence ''s'' = 3/4 < 1, a degenerate value because ''s'' should be greater than unity in order for the posterior of the Haldane prior to have a mode located between the ends, and because ''s'' = 3/4 is not an integer number, hence it violates the initial assumption of a binomial distribution for the likelihood) and it is not an issue in generic cases of reasonable sample size (such that the condition 1 < ''s'' < ''n'' − 1, necessary for a mode to exist between both ends, is fulfilled). In Chapter 12 (p. 385) of his book, Jaynes asserts that the ''Haldane prior'' Beta(0,0) describes a ''prior state of knowledge of complete ignorance'', where we are not even sure whether it is physically possible for an experiment to yield either a success or a failure, while the ''Bayes (uniform) prior Beta(1,1) applies if'' one knows that ''both binary outcomes are possible''. Jaynes states: "''interpret the Bayes-Laplace (Beta(1,1)) prior as describing not a state of complete ignorance'', but the state of knowledge in which we have observed one success and one failure...once we have seen at least one success and one failure, then we know that the experiment is a true binary one, in the sense of physical possibility." Jaynes does not specifically discuss Jeffreys prior Beta(1/2,1/2) (Jaynes' discussion of "Jeffreys prior" on pp. 181, 423 and in chapter 12 of his book refers instead to the improper, un-normalized prior "1/''p'' ''dp''" introduced by Jeffreys in the 1939 edition of his book, seven years before he introduced what is now known as Jeffreys' invariant prior: the square root of the determinant of Fisher's information matrix. ''"1/p" is Jeffreys' (1946) invariant prior for the exponential distribution, not for the Bernoulli or binomial distributions''). However, it follows from the above discussion that the Jeffreys Beta(1/2,1/2) prior represents a state of knowledge in between the Haldane Beta(0,0) and Bayes Beta(1,1) priors. Similarly, Karl Pearson in his 1892 book ''The Grammar of Science'' (p. 144 of 1900 edition) maintained that the Bayes (Beta(1,1)) uniform prior was not a complete-ignorance prior, and that it should be used when prior information justified "distributing our ignorance equally". K. Pearson wrote: "Yet the only supposition that we appear to have made is this: that, knowing nothing of nature, routine and anomy (from the Greek ανομία, namely: a- "without", and nomos "law") are to be considered as equally likely to occur. Now we were not really justified in making even this assumption, for it involves a knowledge that we do not possess regarding nature. We use our ''experience'' of the constitution and action of coins in general to assert that heads and tails are equally probable, but we have no right to assert before experience that, as we know nothing of nature, routine and breach are equally probable. In our ignorance we ought to consider before experience that nature may consist of all routines, all anomies (normlessness), or a mixture of the two in any proportion whatever, and that all such are equally probable. Which of these constitutions after experience is the most probable must clearly depend on what that experience has been like."
If there is sufficient sampling data, ''and the posterior probability mode is not located at one of the extremes of the domain'' (''x'' = 0 or ''x'' = 1), the three priors of Bayes (Beta(1,1)), Jeffreys (Beta(1/2,1/2)) and Haldane (Beta(0,0)) should yield similar ''posterior'' probability densities. Otherwise, as Gelman et al. (p. 65) point out, "if so few data are available that the choice of noninformative prior distribution makes a difference, one should put relevant information into the prior distribution", or as Berger (p. 125) points out, "when different reasonable priors yield substantially different answers, can it be right to state that there ''is'' a single answer? Would it not be better to admit that there is scientific uncertainty, with the conclusion depending on prior beliefs?"


Occurrence and applications


Order statistics

The beta distribution has an important application in the theory of order statistics. A basic result is that the distribution of the ''k''th smallest of a sample of size ''n'' from a continuous uniform distribution has a beta distribution (David, H. A., Nagaraja, H. N. (2003) ''Order Statistics'' (3rd Edition). Wiley, New Jersey, p. 458). This result is summarized as:
:U_{(k)} \sim \operatorname{Beta}(k,n+1-k).
From this, and application of the theory related to the probability integral transform, the distribution of any individual order statistic from any continuous distribution can be derived.
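As a quick numerical illustration of this result (an added sketch, not from the cited reference; the values ''n'' = 10, ''k'' = 3 and the replication count are arbitrary), the ''k''th smallest of ''n'' uniform variates should have the mean ''k''/(''n'' + 1) of a Beta(''k'', ''n'' + 1 − ''k'') distribution:

```python
import random

random.seed(0)
n, k, reps = 10, 3, 50_000  # arbitrary example values

# k-th smallest of n independent Uniform(0,1) draws, repeated many times
samples = [sorted(random.random() for _ in range(n))[k - 1] for _ in range(reps)]
empirical_mean = sum(samples) / reps

print(f"empirical mean of U_({k}):           {empirical_mean:.4f}")
print(f"mean of Beta({k}, {n + 1 - k}) = k/(n+1): {k / (n + 1):.4f}")
```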


Subjective logic

In standard logic, propositions are considered to be either true or false. In contradistinction, subjective logic assumes that humans cannot determine with absolute certainty whether a proposition about the real world is absolutely true or false. In subjective logic the a posteriori probability estimates of binary events can be represented by beta distributions (A. Jøsang. A Logic for Uncertain Probabilities. ''International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems.'' 9(3), pp. 279-311, June 2001).


Wavelet analysis

A wavelet is a wave-like oscillation with an amplitude that starts out at zero, increases, and then decreases back to zero. It can typically be visualized as a "brief oscillation" that promptly decays. Wavelets can be used to extract information from many different kinds of data, including – but certainly not limited to – audio signals and images. Thus, wavelets are purposefully crafted to have specific properties that make them useful for signal processing. Wavelets are localized in both time and frequency whereas the standard Fourier transform is only localized in frequency. Therefore, standard Fourier transforms are only applicable to stationary processes, while wavelets are applicable to non-stationary processes. Continuous wavelets can be constructed based on the beta distribution. Beta wavelets (H.M. de Oliveira and G.A.A. Araújo. Compactly Supported One-cyclic Wavelets Derived from Beta Distributions. ''Journal of Communication and Information Systems.'' vol. 20, n. 3, pp. 27-33, 2005) can be viewed as a soft variety of Haar wavelets whose shape is fine-tuned by two shape parameters α and β.


Population genetics

The Balding–Nichols model is a two-parameter parametrization of the beta distribution used in population genetics. It is a statistical description of the allele frequencies in the components of a sub-divided population:
: \begin{align} \alpha &= \mu \nu,\\ \beta &= (1 - \mu) \nu, \end{align}
where \nu = \alpha+\beta = \frac{1-F}{F} and 0 < F < 1; here ''F'' is (Wright's) genetic distance between two populations.
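The conversion from (''F'', ''μ'') to the beta shape parameters is simple; the following short Python sketch (added here for illustration; the function name and the example values ''F'' = 0.1, ''μ'' = 0.3 are hypothetical) implements it:

```python
def balding_nichols_params(F, mu):
    """Map Wright's genetic distance F (0 < F < 1) and mean allele frequency mu
    to the shape parameters (alpha, beta) of the Balding-Nichols beta model."""
    if not 0 < F < 1:
        raise ValueError("F must lie strictly between 0 and 1")
    nu = (1 - F) / F              # nu = alpha + beta
    return mu * nu, (1 - mu) * nu

alpha, beta = balding_nichols_params(0.1, 0.3)
print(alpha, beta)  # approximately 2.7 and 6.3 (up to floating-point rounding)
```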


Project management: task cost and schedule modeling

The beta distribution can be used to model events which are constrained to take place within an interval defined by a minimum and maximum value. For this reason, the beta distribution — along with the triangular distribution — is used extensively in PERT, critical path method (CPM), Joint Cost Schedule Modeling (JCSM) and other project management/control systems to describe the time to completion and the cost of a task. In project management, shorthand computations are widely used to estimate the mean and standard deviation of the beta distribution:
: \begin{align} \mu(X) & = \frac{a + 4b + c}{6} \\ \sigma(X) & = \frac{c - a}{6} \end{align}
where ''a'' is the minimum, ''c'' is the maximum, and ''b'' is the most likely value (the mode for ''α'' > 1 and ''β'' > 1). The above estimate for the mean \mu(X) = \frac{a + 4b + c}{6} is known as the PERT three-point estimation and it is exact for either of the following values of ''β'' (for arbitrary α within these ranges):
:''β'' = ''α'' > 1 (symmetric case) with standard deviation \sigma(X) = \frac{c - a}{2\sqrt{2\alpha + 1}}, skewness = 0, and excess kurtosis = \frac{-6}{3 + 2\alpha}
or
:''β'' = 6 − ''α'' for 5 > ''α'' > 1 (skewed case) with standard deviation \sigma(X) = \frac{c - a}{6}\sqrt{\frac{\alpha(6 - \alpha)}{7}}, skewness = \frac{(3 - \alpha)\sqrt{7}}{2\sqrt{\alpha(6 - \alpha)}}, and excess kurtosis = \frac{21}{\alpha(6 - \alpha)} - 3
The above estimate for the standard deviation ''σ''(''X'') = (''c'' − ''a'')/6 is exact for either of the following values of ''α'' and ''β'':
:''α'' = ''β'' = 4 (symmetric) with skewness = 0, and excess kurtosis = −6/11.
:''β'' = 6 − ''α'' and \alpha = 3 - \sqrt{2} (right-tailed, positive skew) with skewness = \frac{1}{\sqrt{2}}, and excess kurtosis = 0
:''β'' = 6 − ''α'' and \alpha = 3 + \sqrt{2} (left-tailed, negative skew) with skewness = \frac{-1}{\sqrt{2}}, and excess kurtosis = 0
Otherwise, these can be poor approximations for beta distributions with other values of α and β, exhibiting average errors of 40% in the mean and 549% in the variance.
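These shorthand formulas are easy to exercise numerically. The following Python sketch (an added illustration; the task durations are invented examples) computes the PERT estimates and, for the exact case ''α'' = ''β'' = 4 with the mode at the midpoint, compares them with the true moments of a beta distribution rescaled to the interval [''a'', ''c'']:

```python
def pert_estimates(a, b, c):
    """PERT three-point shorthand for the mean and standard deviation of a task
    with minimum a, most likely value b, and maximum c."""
    return (a + 4 * b + c) / 6, (c - a) / 6

def scaled_beta_moments(alpha, beta, a, c):
    """Exact mean and standard deviation of a Beta(alpha, beta) variable
    rescaled from [0, 1] to [a, c]."""
    mean01 = alpha / (alpha + beta)
    var01 = alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))
    return a + (c - a) * mean01, (c - a) * var01 ** 0.5

# Hypothetical task: optimistic 2 days, most likely 5 days, pessimistic 14 days.
print(pert_estimates(2.0, 5.0, 14.0))       # (6.0, 2.0)

# For alpha = beta = 4 the shorthand is exact and the mode is the midpoint of [a, c].
a, c = 2.0, 14.0
b = (a + c) / 2
print(pert_estimates(a, b, c))              # (8.0, 2.0)
print(scaled_beta_moments(4, 4, a, c))      # (8.0, 2.0)
```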


Random variate generation

If ''X'' and ''Y'' are independent, with X \sim \Gamma(\alpha, \theta) and Y \sim \Gamma(\beta, \theta) then
:\frac{X}{X+Y} \sim \Beta(\alpha, \beta).
So one algorithm for generating beta variates is to generate \frac{X}{X+Y}, where ''X'' is a gamma variate with parameters (α, 1) and ''Y'' is an independent gamma variate with parameters (β, 1). In fact, here \frac{X}{X+Y} and X+Y are independent, and X+Y \sim \Gamma(\alpha + \beta, \theta). If Z \sim \Gamma(\gamma, \theta) and Z is independent of X and Y, then \frac{X+Y}{X+Y+Z} \sim \Beta(\alpha+\beta,\gamma) and \frac{X+Y}{X+Y+Z} is independent of \frac{X}{X+Y}. This shows that the product of independent \Beta(\alpha,\beta) and \Beta(\alpha+\beta,\gamma) random variables is a \Beta(\alpha,\beta+\gamma) random variable. Also, the ''k''th order statistic of ''n'' uniformly distributed variates is \Beta(k, n+1-k), so an alternative if α and β are small integers is to generate α + β − 1 uniform variates and choose the α-th smallest. Another way to generate the Beta distribution is by the Pólya urn model. According to this method, one starts with an "urn" containing α "black" balls and β "white" balls and draws uniformly with replacement. At every trial an additional ball is added according to the color of the last ball which was drawn. Asymptotically, the proportion of black and white balls will be distributed according to the Beta distribution, where each repetition of the experiment will produce a different value. It is also possible to use inverse transform sampling.
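Here is a minimal Python sketch of the gamma-ratio algorithm just described (an added illustration relying only on the standard library's gamma sampler; the shape parameters and sample size are arbitrary), together with a check of the sample moments against the beta mean and variance:

```python
import random

def beta_via_gammas(alpha, beta):
    """Draw one Beta(alpha, beta) variate as X/(X+Y) with independent
    X ~ Gamma(alpha, 1) and Y ~ Gamma(beta, 1)."""
    x = random.gammavariate(alpha, 1.0)
    y = random.gammavariate(beta, 1.0)
    return x / (x + y)

random.seed(1)
alpha, beta, reps = 2.0, 5.0, 100_000  # arbitrary example values
samples = [beta_via_gammas(alpha, beta) for _ in range(reps)]

mean = sum(samples) / reps
var = sum((s - mean) ** 2 for s in samples) / reps

print(f"sample mean     {mean:.4f} vs theoretical {alpha / (alpha + beta):.4f}")
print(f"sample variance {var:.5f} vs theoretical "
      f"{alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1)):.5f}")
```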


History

Thomas Bayes, in a posthumous paper published in 1763 by Richard Price, obtained a beta distribution as the density of the probability of success in Bernoulli trials (see the discussion of Bayesian inference above), but the paper does not analyze any of the moments of the beta distribution or discuss any of its properties. The first systematic modern discussion of the beta distribution is probably due to Karl Pearson. In Pearson's papers the beta distribution is couched as a solution of a differential equation: Pearson's Type I distribution, to which it is essentially identical except for arbitrary shifting and re-scaling (the beta and Pearson Type I distributions can always be equalized by proper choice of parameters). In fact, in several English books and journal articles in the few decades prior to World War II, it was common to refer to the beta distribution as Pearson's Type I distribution. William P. Elderton in his 1906 monograph "Frequency curves and correlation" further analyzes the beta distribution as Pearson's Type I distribution, including a full discussion of the method of moments for the four-parameter case, and diagrams of (what Elderton describes as) U-shaped, J-shaped, twisted J-shaped, "cocked-hat" shapes, horizontal and angled straight-line cases. Elderton wrote "I am chiefly indebted to Professor Pearson, but the indebtedness is of a kind for which it is impossible to offer formal thanks." Elderton's 1906 monograph provides an impressive amount of information on the beta distribution, including equations for the origin of the distribution chosen to be the mode, as well as for other Pearson distributions: types I through VII. Elderton also included a number of appendixes, including one appendix ("II") on the beta and gamma functions. In later editions, Elderton added equations for the origin of the distribution chosen to be the mean, and analysis of Pearson distributions VIII through XII. As remarked by Bowman and Shenton, "Fisher and Pearson had a difference of opinion in the approach to (parameter) estimation, in particular relating to (Pearson's method of) moments and (Fisher's method of) maximum likelihood in the case of the Beta distribution." Also according to Bowman and Shenton, "the case of a Type I (beta distribution) model being the center of the controversy was pure serendipity. A more difficult model of 4 parameters would have been hard to find." The long-running public conflict of Fisher with Karl Pearson can be followed in a number of articles in prestigious journals. For example, concerning the estimation of the four parameters for the beta distribution, and Fisher's criticism of Pearson's method of moments as being arbitrary, see Pearson's article "Method of moments and method of maximum likelihood" (published three years after his retirement from University College, London, where his position had been divided between Fisher and Pearson's son Egon), in which Pearson writes "I read (Koshai's paper in the Journal of the Royal Statistical Society, 1933) which as far as I am aware is the only case at present published of the application of Professor Fisher's method. To my astonishment that method depends on first working out the constants of the frequency curve by the (Pearson) Method of Moments and then superposing on it, by what Fisher terms "the Method of Maximum Likelihood" a further approximation to obtain, what he holds, he will thus get, "more efficient values" of the curve constants."
David and Edwards's treatise on the history of statistics cites the first modern treatment of the beta distribution, in 1911, using the beta designation that has become standard, due to Corrado Gini, an Italian statistician, demographer, and sociologist, who developed the Gini coefficient. N. L. Johnson and S. Kotz, in their comprehensive and very informative monograph on leading historical personalities in statistical sciences, credit Corrado Gini as "an early Bayesian...who dealt with the problem of eliciting the parameters of an initial Beta distribution, by singling out techniques which anticipated the advent of the so-called empirical Bayes approach."


References


External links


"Beta Distribution"
by Fiona Maclachlan, the Wolfram Demonstrations Project, 2007.
Beta Distribution – Overview and Example
xycoon.com

brighton-webs.co.uk

exstrom.com * *
Harvard University Statistics 110 Lecture 23 Beta Distribution, Prof. Joe Blitzstein
{{DEFAULTSORT:Beta Distribution Continuous distributions Factorial and binomial topics Conjugate prior distributions Exponential family distributions]">X - E[X] &= \sqrt \\ \lim_ \operatorname ]_=_\frac_ The_mean_absolute_deviation_around_the_mean_is_a_more_robust_ Robustness_is_the_property_of_being_strong_and_healthy_in_constitution._When_it_is_transposed_into_a_system,_it_refers_to_the_ability_of_tolerating_perturbations_that_might_affect_the_system’s_functional_body._In_the_same_line_''robustness''_ca_...
_
estimator In statistics, an estimator is a rule for calculating an estimate of a given quantity based on observed data: thus the rule (the estimator), the quantity of interest (the estimand) and its result (the estimate) are distinguished. For example, the ...
_of_
statistical_dispersion In statistics, dispersion (also called variability, scatter, or spread) is the extent to which a distribution is stretched or squeezed. Common examples of measures of statistical dispersion are the variance, standard deviation, and interquartile ...
_than_the_standard_deviation_for_beta_distributions_with_tails_and_inflection_points_at_each_side_of_the_mode,_Beta(''α'', ''β'')_distributions_with_''α'',''β''_>_2,_as_it_depends_on_the_linear_(absolute)_deviations_rather_than_the_square_deviations_from_the_mean.__Therefore,_the_effect_of_very_large_deviations_from_the_mean_are_not_as_overly_weighted. Using_
Stirling's_approximation In mathematics, Stirling's approximation (or Stirling's formula) is an approximation for factorials. It is a good approximation, leading to accurate results even for small values of n. It is named after James Stirling, though a related but less p ...
_to_the_Gamma_function,_Norman_Lloyd_Johnson, N.L.Johnson_and_Samuel_Kotz, S.Kotz_derived_the_following_approximation_for_values_of_the_shape_parameters_greater_than_unity_(the_relative_error_for_this_approximation_is_only_−3.5%_for_''α''_=_''β''_=_1,_and_it_decreases_to_zero_as_''α''_→_∞,_''β''_→_∞): :_\begin \frac_&=\frac\\ &\approx_\sqrt_\left(1+\frac-\frac-\frac_\right),_\text_\alpha,_\beta_>_1. \end At_the_limit_α_→_∞,_β_→_∞,_the_ratio_of_the_mean_absolute_deviation_to_the_standard_deviation_(for_the_beta_distribution)_becomes_equal_to_the_ratio_of_the_same_measures_for_the_normal_distribution:_\sqrt.__For_α_=_β_=_1_this_ratio_equals_\frac,_so_that_from_α_=_β_=_1_to_α,_β_→_∞_the_ratio_decreases_by_8.5%.__For_α_=_β_=_0_the_standard_deviation_is_exactly_equal_to_the_mean_absolute_deviation_around_the_mean._Therefore,_this_ratio_decreases_by_15%_from_α_=_β_=_0_to_α_=_β_=_1,_and_by_25%_from_α_=_β_=_0_to_α,_β_→_∞_._However,_for_skewed_beta_distributions_such_that_α_→_0_or_β_→_0,_the_ratio_of_the_standard_deviation_to_the_mean_absolute_deviation_approaches_infinity_(although_each_of_them,_individually,_approaches_zero)_because_the_mean_absolute_deviation_approaches_zero_faster_than_the_standard_deviation. Using_the__parametrization_in_terms_of_mean_μ_and_sample_size_ν_=_α_+_β_>_0: :α_=_μν,_β_=_(1−μ)ν one_can_express_the_mean_absolute_deviation_around_the_mean_in_terms_of_the_mean_μ_and_the_sample_size_ν_as_follows: :\operatorname[, _X_-_E ]_=_\frac For_a_symmetric_distribution,_the_mean_is_at_the_middle_of_the_distribution,_μ_=_1/2,_and_therefore: :_\begin \operatorname[, X_-_E ]__=_\frac_&=_\frac_\\ \lim__\left_(\lim__\operatorname[, X_-_E ]_\right_)_&=_\tfrac\\ \lim__\left_(\lim__\operatorname[, _X_-_E ]_\right_)_&=_0 \end Also,_the_following_limits_(with_only_the_noted_variable_approaching_the_limit)_can_be_obtained_from_the_above_expressions: :_\begin \lim__\operatorname[, X_-_E ]_&=\lim__\operatorname[, X_-_E ]=_0_\\ \lim__\operatorname[, X_-_E ]_&=\lim__\operatorname[, X_-_E ]_=_0\\ \lim__\operatorname[, X_-_E ]&=\lim__\operatorname[, X_-_E ]_=_0\\ \lim__\operatorname[, X_-_E ]_&=_\sqrt_\\ \lim__\operatorname[, X_-_E ]_&=_0 \end


_Mean_absolute_difference

The_mean_absolute_difference_for_the_Beta_distribution_is: :\mathrm_=_\int_0^1_\int_0^1_f(x;\alpha,\beta)\,f(y;\alpha,\beta)\,, x-y, \,dx\,dy_=_\left(\frac\right)\frac The_Gini_coefficient_for_the_Beta_distribution_is_half_of_the_relative_mean_absolute_difference: :\mathrm_=_\left(\frac\right)\frac


_Skewness

The_skewness_ In_probability_theory_and_statistics,_skewness_is_a_measure_of_the_asymmetry_of_the_probability_distribution_of_a__real-valued_random_variable_about_its_mean._The_skewness_value_can_be_positive,_zero,_negative,_or_undefined. For_a_unimodal__...
_(the_third_moment_centered_on_the_mean,_normalized_by_the_3/2_power_of_the_variance)_of_the_beta_distribution_is :\gamma_1_=\frac_=_\frac_. Letting_α_=_β_in_the_above_expression_one_obtains_γ1_=_0,_showing_once_again_that_for_α_=_β_the_distribution_is_symmetric_and_hence_the_skewness_is_zero._Positive_skew_(right-tailed)_for_α_<_β,_negative_skew_(left-tailed)_for_α_>_β. Using_the__parametrization_in_terms_of_mean_μ_and_sample_size_ν_=_α_+_β: :_\begin __\alpha_&__=_\mu_\nu_,\text\nu_=(\alpha_+_\beta)__>0\\ __\beta_&__=_(1_-_\mu)_\nu_,_\text\nu_=(\alpha_+_\beta)__>0. \end one_can_express_the_skewness_in_terms_of_the_mean_μ_and_the_sample_size_ν_as_follows: :\gamma_1_=\frac_=_\frac. The_skewness_can_also_be_expressed_just_in_terms_of_the_variance_''var''_and_the_mean_μ_as_follows: :\gamma_1_=\frac_=_\frac\text_\operatorname_<_\mu(1-\mu) The_accompanying_plot_of_skewness_as_a_function_of_variance_and_mean_shows_that_maximum_variance_(1/4)_is_coupled_with_zero_skewness_and_the_symmetry_condition_(μ_=_1/2),_and_that_maximum_skewness_(positive_or_negative_infinity)_occurs_when_the_mean_is_located_at_one_end_or_the_other,_so_that_the_"mass"_of_the_probability_distribution_is_concentrated_at_the_ends_(minimum_variance). The_following_expression_for_the_square_of_the_skewness,_in_terms_of_the_sample_size_ν_=_α_+_β_and_the_variance_''var'',_is_useful_for_the_method_of_moments_estimation_of_four_parameters: :(\gamma_1)^2_=\frac_=_\frac\bigg(\frac-4(1+\nu)\bigg) This_expression_correctly_gives_a_skewness_of_zero_for_α_=_β,_since_in_that_case_(see_):_\operatorname_=_\frac. For_the_symmetric_case_(α_=_β),_skewness_=_0_over_the_whole_range,_and_the_following_limits_apply: :\lim__\gamma_1_=_\lim__\gamma_1_=\lim__\gamma_1=\lim__\gamma_1=\lim__\gamma_1_=_0 For_the_asymmetric_cases_(α_≠_β)_the_following_limits_(with_only_the_noted_variable_approaching_the_limit)_can_be_obtained_from_the_above_expressions: :_\begin &\lim__\gamma_1_=\lim__\gamma_1_=_\infty\\ &\lim__\gamma_1__=_\lim__\gamma_1=_-_\infty\\ &\lim__\gamma_1_=_-\frac,\quad_\lim_(\lim__\gamma_1)_=_-\infty,\quad_\lim_(\lim__\gamma_1)_=_0\\ &\lim__\gamma_1_=_\frac,\quad_\lim_(\lim__\gamma_1)_=_\infty,\quad_\lim_(\lim__\gamma_1)_=_0\\ &\lim__\gamma_1_=_\frac,\quad_\lim_(\lim__\gamma_1)__=_\infty,\quad_\lim_(\lim__\gamma_1)_=_-_\infty \end


_Kurtosis

The_beta_distribution_has_been_applied_in_acoustic_analysis_to_assess_damage_to_gears,_as_the_kurtosis_of_the_beta_distribution_has_been_reported_to_be_a_good_indicator_of_the_condition_of_a_gear.
_Kurtosis_has_also_been_used_to_distinguish_the_seismic_signal_generated_by_a_person's_footsteps_from_other_signals._As_persons_or_other_targets_moving_on_the_ground_generate_continuous_signals_in_the_form_of_seismic_waves,_one_can_separate_different_targets_based_on_the_seismic_waves_they_generate._Kurtosis_is_sensitive_to_impulsive_signals,_so_it's_much_more_sensitive_to_the_signal_generated_by_human_footsteps_than_other_signals_generated_by_vehicles,_winds,_noise,_etc.
__Unfortunately,_the_notation_for_kurtosis_has_not_been_standardized._Kenney_and_Keeping
__use_the_symbol_γ2_for_the_excess_kurtosis_ In_probability_theory_and_statistics,_kurtosis_(from__el,_κυρτός,_''kyrtos''_or_''kurtos'',_meaning_"curved,_arching")_is_a_measure_of_the_"tailedness"_of_the_probability_distribution_of_a_real-valued_random_variable._Like_skewness,_kurtosi_...
,_but_Abramowitz_and_Stegun
__use_different_terminology.__To_prevent_confusion
__between_kurtosis_(the_fourth_moment_centered_on_the_mean,_normalized_by_the_square_of_the_variance)_and_excess_kurtosis,_when_using_symbols,_they_will_be_spelled_out_as_follows:
:\begin \text _____&=\text_-_3\\ _____&=\frac-3\\ _____&=\frac\\ _____&=\frac _. \end Letting_α_=_β_in_the_above_expression_one_obtains :\text_=-_\frac_\text\alpha=\beta_. Therefore,_for_symmetric_beta_distributions,_the_excess_kurtosis_is_negative,_increasing_from_a_minimum_value_of_−2_at_the_limit_as__→_0,_and_approaching_a_maximum_value_of_zero_as__→_∞.__The_value_of_−2_is_the_minimum_value_of_excess_kurtosis_that_any_distribution_(not_just_beta_distributions,_but_any_distribution_of_any_possible_kind)_can_ever_achieve.__This_minimum_value_is_reached_when_all_the_probability_density_is_entirely_concentrated_at_each_end_''x''_=_0_and_''x''_=_1,_with_nothing_in_between:_a_2-point_Bernoulli_distribution_ In_probability_theory_and_statistics,_the_Bernoulli_distribution,_named_after_Swiss_mathematician__Jacob_Bernoulli,James_Victor_Uspensky:_''Introduction_to_Mathematical_Probability'',_McGraw-Hill,_New_York_1937,_page_45_is_the__discrete_probabi_...
_with_equal_probability_1/2_at_each_end_(a_coin_toss:_see_section_below_"Kurtosis_bounded_by_the_square_of_the_skewness"_for_further_discussion).__The_description_of_kurtosis_as_a_measure_of_the_"potential_outliers"_(or_"potential_rare,_extreme_values")_of_the_probability_distribution,_is_correct_for_all_distributions_including_the_beta_distribution._When_rare,_extreme_values_can_occur_in_the_beta_distribution,_the_higher_its_kurtosis;_otherwise,_the_kurtosis_is_lower._For_α_≠_β,_skewed_beta_distributions,_the_excess_kurtosis_can_reach_unlimited_positive_values_(particularly_for_α_→_0_for_finite_β,_or_for_β_→_0_for_finite_α)_because_the_side_away_from_the_mode_will_produce_occasional_extreme_values.__Minimum_kurtosis_takes_place_when_the_mass_density_is_concentrated_equally_at_each_end_(and_therefore_the_mean_is_at_the_center),_and_there_is_no_probability_mass_density_in_between_the_ends. Using_the__parametrization_in_terms_of_mean_μ_and_sample_size_ν_=_α_+_β: :_\begin __\alpha_&__=_\mu_\nu_,\text\nu_=(\alpha_+_\beta)__>0\\ __\beta_&__=_(1_-_\mu)_\nu_,_\text\nu_=(\alpha_+_\beta)__>0. \end one_can_express_the_excess_kurtosis_in_terms_of_the_mean_μ_and_the_sample_size_ν_as_follows: :\text_=\frac\bigg_(\frac_-_1_\bigg_) The_excess_kurtosis_can_also_be_expressed_in_terms_of_just_the_following_two_parameters:_the_variance_''var'',_and_the_sample_size_ν_as_follows: :\text_=\frac\left(\frac_-_6_-_5_\nu_\right)\text\text<_\mu(1-\mu) and,_in_terms_of_the_variance_''var''_and_the_mean_μ_as_follows: :\text_=\frac\text\text<_\mu(1-\mu) The_plot_of_excess_kurtosis_as_a_function_of_the_variance_and_the_mean_shows_that_the_minimum_value_of_the_excess_kurtosis_(−2,_which_is_the_minimum_possible_value_for_excess_kurtosis_for_any_distribution)_is_intimately_coupled_with_the_maximum_value_of_variance_(1/4)_and_the_symmetry_condition:_the_mean_occurring_at_the_midpoint_(μ_=_1/2)._This_occurs_for_the_symmetric_case_of_α_=_β_=_0,_with_zero_skewness.__At_the_limit,_this_is_the_2_point_Bernoulli_distribution_ In_probability_theory_and_statistics,_the_Bernoulli_distribution,_named_after_Swiss_mathematician__Jacob_Bernoulli,James_Victor_Uspensky:_''Introduction_to_Mathematical_Probability'',_McGraw-Hill,_New_York_1937,_page_45_is_the__discrete_probabi_...
_with_equal_probability_1/2_at_each__Dirac_delta_function_end_''x''_=_0_and_''x''_=_1_and_zero_probability_everywhere_else._(A_coin_toss:_one_face_of_the_coin_being_''x''_=_0_and_the_other_face_being_''x''_=_1.)__Variance_is_maximum_because_the_distribution_is_bimodal_with_nothing_in_between_the_two_modes_(spikes)_at_each_end.__Excess_kurtosis_is_minimum:_the_probability_density_"mass"_is_zero_at_the_mean_and_it_is_concentrated_at_the_two_peaks_at_each_end.__Excess_kurtosis_reaches_the_minimum_possible_value_(for_any_distribution)_when_the_probability_density_function_has_two_spikes_at_each_end:_it_is_bi-"peaky"_with_nothing_in_between_them. On_the_other_hand,_the_plot_shows_that_for_extreme_skewed_cases,_where_the_mean_is_located_near_one_or_the_other_end_(μ_=_0_or_μ_=_1),_the_variance_is_close_to_zero,_and_the_excess_kurtosis_rapidly_approaches_infinity_when_the_mean_of_the_distribution_approaches_either_end. Alternatively,_the_excess_kurtosis_can_also_be_expressed_in_terms_of_just_the_following_two_parameters:_the_square_of_the_skewness,_and_the_sample_size_ν_as_follows: :\text_=\frac\bigg(\frac_(\text)^2_-_1\bigg)\text^2-2<_\text<_\frac_(\text)^2 From_this_last_expression,_one_can_obtain_the_same_limits_published_practically_a_century_ago_by_Karl_Pearson_in_his_paper,_for_the_beta_distribution_(see_section_below_titled_"Kurtosis_bounded_by_the_square_of_the_skewness")._Setting_α_+_β=_ν_=__0_in_the_above_expression,_one_obtains_Pearson's_lower_boundary_(values_for_the_skewness_and_excess_kurtosis_below_the_boundary_(excess_kurtosis_+_2_−_skewness2_=_0)_cannot_occur_for_any_distribution,_and_hence_Karl_Pearson_appropriately_called_the_region_below_this_boundary_the_"impossible_region")._The_limit_of_α_+_β_=_ν_→_∞_determines_Pearson's_upper_boundary. :_\begin &\lim_\text__=_(\text)^2_-_2\\ &\lim_\text__=_\tfrac_(\text)^2 \end therefore: :(\text)^2-2<_\text<_\tfrac_(\text)^2 Values_of_ν_=_α_+_β_such_that_ν_ranges_from_zero_to_infinity,_0_<_ν_<_∞,_span_the_whole_region_of_the_beta_distribution_in_the_plane_of_excess_kurtosis_versus_squared_skewness. For_the_symmetric_case_(α_=_β),_the_following_limits_apply: :_\begin &\lim__\text_=__-_2_\\ &\lim__\text_=_0_\\ &\lim__\text_=_-_\frac \end For_the_unsymmetric_cases_(α_≠_β)_the_following_limits_(with_only_the_noted_variable_approaching_the_limit)_can_be_obtained_from_the_above_expressions: :_\begin &\lim_\text__=\lim__\text__=_\lim_\text__=_\lim_\text__=\infty\\ &\lim_\text__=_\frac,\text__\lim_(\lim__\text)__=_\infty,\text__\lim_(\lim__\text)__=_0\\ &\lim_\text__=_\frac,\text__\lim_(\lim__\text)__=_\infty,\text__\lim_(\lim__\text)__=_0\\ &\lim__\text__=_-_6_+_\frac,\text__\lim_(\lim__\text)__=_\infty,\text__\lim_(\lim__\text)__=_\infty \end


_Characteristic_function

The_Characteristic_function_(probability_theory), characteristic_function_is_the_Fourier_transform_of_the_probability_density_function.__The_characteristic_function_of_the_beta_distribution_is_confluent_hypergeometric_function, Kummer's_confluent_hypergeometric_function_(of_the_first_kind):
:\begin \varphi_X(\alpha;\beta;t) &=_\operatorname\left[e^\right]\\ &=_\int_0^1_e^_f(x;\alpha,\beta)_dx_\\ &=_1F_1(\alpha;_\alpha+\beta;_it)\!\\ &=\sum_^\infty_\frac__\\ &=_1__+\sum_^_\left(_\prod_^_\frac_\right)_\frac \end where :_x^=x(x+1)(x+2)\cdots(x+n-1) is_the_rising_factorial,_also_called_the_"Pochhammer_symbol".__The_value_of_the_characteristic_function_for_''t''_=_0,_is_one: :_\varphi_X(\alpha;\beta;0)=_1F_1(\alpha;_\alpha+\beta;_0)_=_1__. Also,_the_real_and_imaginary_parts_of_the_characteristic_function_enjoy_the_following_symmetries_with_respect_to_the_origin_of_variable_''t'': :_\textrm_\left_[__1F_1(\alpha;_\alpha+\beta;_it)_\right_]_=_\textrm_\left_[__1F_1(\alpha;_\alpha+\beta;_-_it)_\right_]__ :_\textrm_\left_[__1F_1(\alpha;_\alpha+\beta;_it)_\right_]_=_-_\textrm_\left__[__1F_1(\alpha;_\alpha+\beta;_-_it)_\right_]__ The_symmetric_case_α_=_β_simplifies_the_characteristic_function_of_the_beta_distribution_to_a_Bessel_function,_since_in_the_special_case_α_+_β_=_2α_the_confluent_hypergeometric_function_(of_the_first_kind)_reduces_to_a_Bessel_function_(the_modified_Bessel_function_of_the_first_kind_I__)_using_Ernst_Kummer, Kummer's_second_transformation_as_follows: Another_example_of_the_symmetric_case_α_=_β_=_n/2_for_beamforming_applications_can_be_found_in_Figure_11_of_ :\begin__1F_1(\alpha;2\alpha;_it)_&=_e^__0F_1_\left(;_\alpha+\tfrac;_\frac_\right)_\\ &=_e^_\left(\frac\right)^_\Gamma\left(\alpha+\tfrac\right)_I_\left(\frac\right).\end In_the_accompanying_plots,_the_Complex_number, real_part_(Re)_of_the_Characteristic_function_(probability_theory), characteristic_function_of_the_beta_distribution_is_displayed_for_symmetric_(α_=_β)_and_skewed_(α_≠_β)_cases.


_Other_moments


_Moment_generating_function

It_also_follows_that_the_moment_generating_function_is :\begin M_X(\alpha;_\beta;_t) &=_\operatorname\left[e^\right]_\\_pt&=_\int_0^1_e^_f(x;\alpha,\beta)\,dx_\\_pt&=__1F_1(\alpha;_\alpha+\beta;_t)_\\_pt&=_\sum_^\infty_\frac__\frac_\\_pt&=_1__+\sum_^_\left(_\prod_^_\frac_\right)_\frac \end In_particular_''M''''X''(''α'';_''β'';_0)_=_1.


_Higher_moments

Using_the_moment_generating_function,_the_''k''-th_raw_moment_is_given_by_the_factor :\prod_^_\frac_ multiplying_the_(exponential_series)_term_\left(\frac\right)_in_the_series_of_the_moment_generating_function :\operatorname[X^k]=_\frac_=_\prod_^_\frac where_(''x'')(''k'')_is_a_Pochhammer_symbol_representing_rising_factorial._It_can_also_be_written_in_a_recursive_form_as :\operatorname[X^k]_=_\frac\operatorname[X^]. Since_the_moment_generating_function_M_X(\alpha;_\beta;_\cdot)_has_a_positive_radius_of_convergence,_the_beta_distribution_is_Moment_problem, determined_by_its_moments.


_Moments_of_transformed_random_variables


_=Moments_of_linearly_transformed,_product_and_inverted_random_variables

= One_can_also_show_the_following_expectations_for_a_transformed_random_variable,_where_the_random_variable_''X''_is_Beta-distributed_with_parameters_α_and_β:_''X''_~_Beta(α,_β).__The_expected_value_of_the_variable_1 − ''X''_is_the_mirror-symmetry_of_the_expected_value_based_on_''X'': :\begin &_\operatorname[1-X]_=_\frac_\\ &_\operatorname[X_(1-X)]_=\operatorname[(1-X)X_]_=\frac \end Due_to_the_mirror-symmetry_of_the_probability_density_function_of_the_beta_distribution,_the_variances_based_on_variables_''X''_and_1 − ''X''_are_identical,_and_the_covariance_on_''X''(1 − ''X''_is_the_negative_of_the_variance: :\operatorname[(1-X)]=\operatorname[X]_=_-\operatorname[X,(1-X)]=_\frac These_are_the_expected_values_for_inverted_variables,_(these_are_related_to_the_harmonic_means,_see_): :\begin &_\operatorname_\left_[\frac_\right_]_=_\frac_\text_\alpha_>_1\\ &_\operatorname\left_[\frac_\right_]_=\frac_\text_\beta_>_1 \end The_following_transformation_by_dividing_the_variable_''X''_by_its_mirror-image_''X''/(1 − ''X'')_results_in_the_expected_value_of_the_"inverted_beta_distribution"_or_beta_prime_distribution_ In_probability_theory_and__statistics,_the_beta_prime_distribution_(also_known_as_inverted_beta_distribution_or_beta_distribution_of_the_second_kindJohnson_et_al_(1995),_p_248)_is_an_absolutely_continuous_probability_distribution. __Definitions_ _...
_(also_known_as_beta_distribution_of_the_second_kind_or_Pearson_distribution, Pearson's_Type_VI): :_\begin &_\operatorname\left[\frac\right]_=\frac_\text\beta_>_1\\ &_\operatorname\left[\frac\right]_=\frac\text\alpha_>_1 \end_ Variances_of_these_transformed_variables_can_be_obtained_by_integration,_as_the_expected_values_of_the_second_moments_centered_on_the_corresponding_variables: :\operatorname_\left[\frac_\right]_=\operatorname\left[\left(\frac_-_\operatorname\left[\frac_\right_]_\right_)^2\right]= :\operatorname\left_[\frac_\right_]_=\operatorname_\left_[\left_(\frac_-_\operatorname\left_[\frac_\right_]_\right_)^2_\right_]=_\frac_\text\alpha_>_2 The_following_variance_of_the_variable_''X''_divided_by_its_mirror-image_(''X''/(1−''X'')_results_in_the_variance_of_the_"inverted_beta_distribution"_or_beta_prime_distribution_ In_probability_theory_and__statistics,_the_beta_prime_distribution_(also_known_as_inverted_beta_distribution_or_beta_distribution_of_the_second_kindJohnson_et_al_(1995),_p_248)_is_an_absolutely_continuous_probability_distribution. __Definitions_ _...
_(also_known_as_beta_distribution_of_the_second_kind_or_Pearson_distribution, Pearson's_Type_VI): :\operatorname_\left_[\frac_\right_]_=\operatorname_\left_[\left(\frac_-_\operatorname_\left_[\frac_\right_]_\right)^2_\right_]=\operatorname_\left_[\frac_\right_]_= :\operatorname_\left_[\left_(\frac_-_\operatorname_\left_[\frac_\right_]_\right_)^2_\right_]=_\frac_\text\beta_>_2 The_covariances_are: :\operatorname\left_[\frac,\frac_\right_]_=_\operatorname\left[\frac,\frac_\right]_=\operatorname\left[\frac,\frac\right_]_=_\operatorname\left[\frac,\frac_\right]_=\frac_\text_\alpha,_\beta_>_1 These_expectations_and_variances_appear_in_the_four-parameter_Fisher_information_matrix_(.)


_=Moments_of_logarithmically_transformed_random_variables

= Expected_values_for_Logarithm_transformation, logarithmic_transformations_(useful_for_maximum_likelihood_estimates,_see_)_are_discussed_in_this_section.__The_following_logarithmic_linear_transformations_are_related_to_the_geometric_means_''GX''_and__''G''(1−''X'')_(see_): :\begin \operatorname[\ln(X)]_&=_\psi(\alpha)_-_\psi(\alpha_+_\beta)=_-_\operatorname\left[\ln_\left_(\frac_\right_)\right],\\ \operatorname[\ln(1-X)]_&=\psi(\beta)_-_\psi(\alpha_+_\beta)=_-_\operatorname_\left[\ln_\left_(\frac_\right_)\right]. \end Where_the_digamma_function_ In_mathematics,_the_digamma_function_is_defined_as_the__logarithmic_derivative_of_the_gamma_function: :\psi(x)=\frac\ln\big(\Gamma(x)\big)=\frac\sim\ln-\frac. It_is_the_first_of_the__polygamma_functions._It_is_strictly_increasing_and_strict_...
_ψ(α)_is_defined_as_the_logarithmic_derivative_of_the_gamma_function_ In__mathematics,_the_gamma_function_(represented_by_,_the_capital_letter__gamma_from_the_Greek_alphabet)_is_one_commonly_used_extension_of_the__factorial_function_to_complex_numbers._The_gamma_function_is_defined_for_all_complex_numbers_except_...
: :\psi(\alpha)_=_\frac Logit_transformations_are_interesting,
_as_they_usually_transform_various_shapes_(including_J-shapes)_into_(usually_skewed)_bell-shaped_densities_over_the_logit_variable,_and_they_may_remove_the_end_singularities_over_the_original_variable: :\begin \operatorname\left[\ln_\left_(\frac_\right_)_\right]_&=\psi(\alpha)_-_\psi(\beta)=_\operatorname[\ln(X)]_+\operatorname_\left[\ln_\left_(\frac_\right)_\right],\\ \operatorname\left_[\ln_\left_(\frac_\right_)_\right_]_&=\psi(\beta)_-_\psi(\alpha)=_-_\operatorname_\left[\ln_\left_(\frac_\right)_\right]_. \end Johnson
__considered_the_distribution_of_the_logit_-_transformed_variable_ln(''X''/1−''X''),_including_its_moment_generating_function_and_approximations_for_large_values_of_the_shape_parameters.__This_transformation_extends_the_finite_support_,_1_ The_comma__is_a_punctuation_mark_that_appears_in_several_variants_in_different_languages._It_has_the_same_shape_as_an_apostrophe_or_single_closing_quotation_mark_()_in_many_typefaces,_but_it_differs_from_them_in_being_placed_on_the__baseline_o_...
based_on_the_original_variable_''X''_to_infinite_support_in_both_directions_of_the_real_line_(−∞,_+∞). Higher_order_logarithmic_moments_can_be_derived_by_using_the_representation_of_a_beta_distribution_as_a_proportion_of_two_Gamma_distributions_and_differentiating_through_the_integral._They_can_be_expressed_in_terms_of_higher_order_poly-gamma_functions_as_follows: :\begin \operatorname_\left_[\ln^2(X)_\right_]_&=_(\psi(\alpha)_-_\psi(\alpha_+_\beta))^2+\psi_1(\alpha)-\psi_1(\alpha+\beta),_\\ \operatorname_\left_[\ln^2(1-X)_\right_]_&=_(\psi(\beta)_-_\psi(\alpha_+_\beta))^2+\psi_1(\beta)-\psi_1(\alpha+\beta),_\\ \operatorname_\left_[\ln_(X)\ln(1-X)_\right_]_&=(\psi(\alpha)_-_\psi(\alpha_+_\beta))(\psi(\beta)_-_\psi(\alpha_+_\beta))_-\psi_1(\alpha+\beta). \end therefore_the_variance__ In_probability_theory_and_statistics,_variance_is_the__expectation_of_the_squared__deviation_of_a__random_variable_from_its__population_mean_or__sample_mean._Variance_is_a_measure_of_dispersion,_meaning_it_is_a_measure_of_how_far_a_set_of_numbe_...
_of_the_logarithmic_variables_and_covariance_ In__probability_theory_and__statistics,_covariance_is_a_measure_of_the_joint_variability_of_two__random_variables._If_the_greater_values_of_one_variable_mainly_correspond_with_the_greater_values_of_the_other_variable,_and_the_same_holds_for_the__...
_of_ln(''X'')_and_ln(1−''X'')_are: :\begin \operatorname[\ln(X),_\ln(1-X)]_&=_\operatorname\left[\ln(X)\ln(1-X)\right]_-_\operatorname[\ln(X)]\operatorname[\ln(1-X)]_=_-\psi_1(\alpha+\beta)_\\ &_\\ \operatorname[\ln_X]_&=_\operatorname[\ln^2(X)]_-_(\operatorname[\ln(X)])^2_\\ &=_\psi_1(\alpha)_-_\psi_1(\alpha_+_\beta)_\\ &=_\psi_1(\alpha)_+_\operatorname[\ln(X),_\ln(1-X)]_\\ &_\\ \operatorname_ln_(1-X)&=_\operatorname[\ln^2_(1-X)]_-_(\operatorname[\ln_(1-X)])^2_\\ &=_\psi_1(\beta)_-_\psi_1(\alpha_+_\beta)_\\ &=_\psi_1(\beta)_+_\operatorname[\ln_(X),_\ln(1-X)] \end where_the_trigamma_function_ In_mathematics,_the_trigamma_function,_denoted__or_,_is_the_second_of_the_polygamma_functions,_and_is_defined_by :_\psi_1(z)_=_\frac_\ln\Gamma(z). It_follows_from_this_definition_that :_\psi_1(z)_=_\frac_\psi(z) where__is_the_digamma_functio_...
,_denoted_ψ1(α),_is_the_second_of_the_polygamma_function_ In_mathematics,_the_polygamma_function_of_order__is_a_meromorphic_function_on_the__complex_numbers_\mathbb_defined_as_the_th__derivative_of_the_logarithm_of_the_gamma_function: :\psi^(z)_:=_\frac_\psi(z)_=_\frac_\ln\Gamma(z). Thus :\psi^(z)__...
s,_and_is_defined_as_the_derivative_of_the_digamma_function: :\psi_1(\alpha)_=_\frac=_\frac. The_variances_and_covariance_of_the_logarithmically_transformed_variables_''X''_and_(1−''X'')_are_different,_in_general,_because_the_logarithmic_transformation_destroys_the_mirror-symmetry_of_the_original_variables_''X''_and_(1−''X''),_as_the_logarithm_approaches_negative_infinity_for_the_variable_approaching_zero. These_logarithmic_variances_and_covariance_are_the_elements_of_the_Fisher_information_ In_mathematical_statistics,_the_Fisher_information_(sometimes_simply_called_information)_is_a_way_of_measuring_the_amount_of_information_that_an_observable_random_variable_''X''_carries_about_an_unknown_parameter_''θ''_of_a_distribution_that_model_...
_matrix_for_the_beta_distribution.__They_are_also_a_measure_of_the_curvature_of_the_log_likelihood_function_(see_section_on_Maximum_likelihood_estimation). The_variances_of_the_log_inverse_variables_are_identical_to_the_variances_of_the_log_variables: :\begin \operatorname\left[\ln_\left_(\frac_\right_)_\right]_&_=\operatorname[\ln(X)]_=_\psi_1(\alpha)_-_\psi_1(\alpha_+_\beta),_\\ \operatorname\left[\ln_\left_(\frac_\right_)_\right]_&=\operatorname_ln_(1-X)=_\psi_1(\beta)_-_\psi_1(\alpha_+_\beta),_\\ \operatorname\left[\ln_\left_(\frac_\right),_\ln_\left_(\frac\right_)_\right]_&=\operatorname[\ln(X),\ln(1-X)]=_-\psi_1(\alpha_+_\beta).\end It_also_follows_that_the_variances_of_the_logit_transformed_variables_are: :\operatorname\left[\ln_\left_(\frac_\right_)\right]=\operatorname\left[\ln_\left_(\frac_\right_)_\right]=-\operatorname\left_[\ln_\left_(\frac_\right_),_\ln_\left_(\frac_\right_)_\right]=_\psi_1(\alpha)_+_\psi_1(\beta)


_Quantities_of_information_(entropy)

Given_a_beta_distributed_random_variable,_''X''_~_Beta(''α'',_''β''),_the_information_entropy, differential_entropy_of_''X''_is_(measured_in_Nat_(unit), nats),_the_expected_value_of_the_negative_of_the_logarithm_of_the_probability_density_function_ In_probability_theory,_a_probability_density_function_(PDF),_or_density_of_a_continuous_random_variable,_is_a__function_whose_value_at_any_given_sample_(or_point)_in_the__sample_space_(the_set_of_possible_values_taken_by_the_random_variable)_ca_...
: :\begin h(X)_&=_\operatorname[-\ln(f(x;\alpha,\beta))]_\\_pt&=\int_0^1_-f(x;\alpha,\beta)\ln(f(x;\alpha,\beta))_\,_dx_\\_pt&=_\ln(\Beta(\alpha,\beta))-(\alpha-1)\psi(\alpha)-(\beta-1)\psi(\beta)+(\alpha+\beta-2)_\psi(\alpha+\beta) \end where_''f''(''x'';_''α'',_''β'')_is_the_probability_density_function_ In_probability_theory,_a_probability_density_function_(PDF),_or_density_of_a_continuous_random_variable,_is_a__function_whose_value_at_any_given_sample_(or_point)_in_the__sample_space_(the_set_of_possible_values_taken_by_the_random_variable)_ca_...
_of_the_beta_distribution: :f(x;\alpha,\beta)_=_\frac_x^(1-x)^ The_digamma_function_ In_mathematics,_the_digamma_function_is_defined_as_the__logarithmic_derivative_of_the_gamma_function: :\psi(x)=\frac\ln\big(\Gamma(x)\big)=\frac\sim\ln-\frac. It_is_the_first_of_the__polygamma_functions._It_is_strictly_increasing_and_strict_...
_''ψ''_appears_in_the_formula_for_the_differential_entropy_as_a_consequence_of_Euler's_integral_formula_for_the_harmonic_numbers_which_follows_from_the_integral: :\int_0^1_\frac__\,_dx_=_\psi(\alpha)-\psi(1) The_information_entropy, differential_entropy_of_the_beta_distribution_is_negative_for_all_values_of_''α''_and_''β''_greater_than_zero,_except_at_''α''_=_''β''_=_1_(for_which_values_the_beta_distribution_is_the_same_as_the_Uniform_distribution_(continuous), uniform_distribution),_where_the_information_entropy, differential_entropy_reaches_its_Maxima_and_minima, maximum_value_of_zero.__It_is_to_be_expected_that_the_maximum_entropy_should_take_place_when_the_beta_distribution_becomes_equal_to_the_uniform_distribution,_since_uncertainty_is_maximal_when_all_possible_events_are_equiprobable. For_''α''_or_''β''_approaching_zero,_the_information_entropy, differential_entropy_approaches_its_Maxima_and_minima, minimum_value_of_negative_infinity._For_(either_or_both)_''α''_or_''β''_approaching_zero,_there_is_a_maximum_amount_of_order:_all_the_probability_density_is_concentrated_at_the_ends,_and_there_is_zero_probability_density_at_points_located_between_the_ends._Similarly_for_(either_or_both)_''α''_or_''β''_approaching_infinity,_the_differential_entropy_approaches_its_minimum_value_of_negative_infinity,_and_a_maximum_amount_of_order.__If_either_''α''_or_''β''_approaches_infinity_(and_the_other_is_finite)_all_the_probability_density_is_concentrated_at_an_end,_and_the_probability_density_is_zero_everywhere_else.__If_both_shape_parameters_are_equal_(the_symmetric_case),_''α''_=_''β'',_and_they_approach_infinity_simultaneously,_the_probability_density_becomes_a_spike_(_Dirac_delta_function)_concentrated_at_the_middle_''x''_=_1/2,_and_hence_there_is_100%_probability_at_the_middle_''x''_=_1/2_and_zero_probability_everywhere_else. The_(continuous_case)_information_entropy, differential_entropy_was_introduced_by_Shannon_in_his_original_paper_(where_he_named_it_the_"entropy_of_a_continuous_distribution"),_as_the_concluding_part_of_the_same_paper_where_he_defined_the_information_entropy, discrete_entropy.__It_is_known_since_then_that_the_differential_entropy_may_differ_from_the_infinitesimal_limit_of_the_discrete_entropy_by_an_infinite_offset,_therefore_the_differential_entropy_can_be_negative_(as_it_is_for_the_beta_distribution)._What_really_matters_is_the_relative_value_of_entropy. Given_two_beta_distributed_random_variables,_''X''1_~_Beta(''α'',_''β'')_and_''X''2_~_Beta(''α''′,_''β''′),_the_cross_entropy_is_(measured_in_nats)
:\begin H(X_1,X_2)_&=_\int_0^1_-_f(x;\alpha,\beta)_\ln_(f(x;\alpha',\beta'))_\,dx_\\_pt&=_\ln_\left(\Beta(\alpha',\beta')\right)-(\alpha'-1)\psi(\alpha)-(\beta'-1)\psi(\beta)+(\alpha'+\beta'-2)\psi(\alpha+\beta). \end The_cross_entropy_has_been_used_as_an_error_metric_to_measure_the_distance_between_two_hypotheses.
__Its_absolute_value_is_minimum_when_the_two_distributions_are_identical._It_is_the_information_measure_most_closely_related_to_the_log_maximum_likelihood_(see_section_on_"Parameter_estimation._Maximum_likelihood_estimation")). The_relative_entropy,_or_Kullback–Leibler_divergence_''D''KL(''X''1_, , _''X''2),_is_a_measure_of_the_inefficiency_of_assuming_that_the_distribution_is_''X''2_~_Beta(''α''′,_''β''′)__when_the_distribution_is_really_''X''1_~_Beta(''α'',_''β'')._It_is_defined_as_follows_(measured_in_nats). :\begin D_(X_1, , X_2)_&=_\int_0^1_f(x;\alpha,\beta)_\ln_\left_(\frac_\right_)_\,_dx_\\_pt&=_\left_(\int_0^1_f(x;\alpha,\beta)_\ln_(f(x;\alpha,\beta))_\,dx_\right_)-_\left_(\int_0^1_f(x;\alpha,\beta)_\ln_(f(x;\alpha',\beta'))_\,_dx_\right_)\\_pt&=_-h(X_1)_+_H(X_1,X_2)\\_pt&=_\ln\left(\frac\right)+(\alpha-\alpha')\psi(\alpha)+(\beta-\beta')\psi(\beta)+(\alpha'-\alpha+\beta'-\beta)\psi_(\alpha_+_\beta). \end_ The_relative_entropy,_or_Kullback–Leibler_divergence,_is_always_non-negative.__A_few_numerical_examples_follow: *''X''1_~_Beta(1,_1)_and_''X''2_~_Beta(3,_3);_''D''KL(''X''1_, , _''X''2)_=_0.598803;_''D''KL(''X''2_, , _''X''1)_=_0.267864;_''h''(''X''1)_=_0;_''h''(''X''2)_=_−0.267864 *''X''1_~_Beta(3,_0.5)_and_''X''2_~_Beta(0.5,_3);_''D''KL(''X''1_, , _''X''2)_=_7.21574;_''D''KL(''X''2_, , _''X''1)_=_7.21574;_''h''(''X''1)_=_−1.10805;_''h''(''X''2)_=_−1.10805. The_Kullback–Leibler_divergence_is_not_symmetric_''D''KL(''X''1_, , _''X''2)_≠_''D''KL(''X''2_, , _''X''1)__for_the_case_in_which_the_individual_beta_distributions_Beta(1,_1)_and_Beta(3,_3)_are_symmetric,_but_have_different_entropies_''h''(''X''1)_≠_''h''(''X''2)._The_value_of_the_Kullback_divergence_depends_on_the_direction_traveled:_whether_going_from_a_higher_(differential)_entropy_to_a_lower_(differential)_entropy_or_the_other_way_around._In_the_numerical_example_above,_the_Kullback_divergence_measures_the_inefficiency_of_assuming_that_the_distribution_is_(bell-shaped)_Beta(3,_3),_rather_than_(uniform)_Beta(1,_1)._The_"h"_entropy_of_Beta(1,_1)_is_higher_than_the_"h"_entropy_of_Beta(3,_3)_because_the_uniform_distribution_Beta(1,_1)_has_a_maximum_amount_of_disorder._The_Kullback_divergence_is_more_than_two_times_higher_(0.598803_instead_of_0.267864)_when_measured_in_the_direction_of_decreasing_entropy:_the_direction_that_assumes_that_the_(uniform)_Beta(1,_1)_distribution_is_(bell-shaped)_Beta(3,_3)_rather_than_the_other_way_around._In_this_restricted_sense,_the_Kullback_divergence_is_consistent_with_the_second_law_of_thermodynamics. The_Kullback–Leibler_divergence_is_symmetric_''D''KL(''X''1_, , _''X''2)_=_''D''KL(''X''2_, , _''X''1)_for_the_skewed_cases_Beta(3,_0.5)_and_Beta(0.5,_3)_that_have_equal_differential_entropy_''h''(''X''1)_=_''h''(''X''2). The_symmetry_condition: :D_(X_1, , X_2)_=_D_(X_2, , X_1),\texth(X_1)_=_h(X_2),\text\alpha_\neq_\beta follows_from_the_above_definitions_and_the_mirror-symmetry_''f''(''x'';_''α'',_''β'')_=_''f''(1−''x'';_''α'',_''β'')_enjoyed_by_the_beta_distribution.


_Relationships_between_statistical_measures


_Mean,_mode_and_median_relationship

If_1_<_α_<_β_then_mode_≤_median_≤_mean.Kerman_J_(2011)_"A_closed-form_approximation_for_the_median_of_the_beta_distribution"._
_Expressing_the_mode_(only_for_α,_β_>_1),_and_the_mean_in_terms_of_α_and_β: :__\frac_\le_\text_\le_\frac_, If_1_<_β_<_α_then_the_order_of_the_inequalities_are_reversed._For_α,_β_>_1_the_absolute_distance_between_the_mean_and_the_median_is_less_than_5%_of_the_distance_between_the_maximum_and_minimum_values_of_''x''._On_the_other_hand,_the_absolute_distance_between_the_mean_and_the_mode_can_reach_50%_of_the_distance_between_the_maximum_and_minimum_values_of_''x'',_for_the_(Pathological_(mathematics), pathological)_case_of_α_=_1_and_β_=_1,_for_which_values_the_beta_distribution_approaches_the_uniform_distribution_and_the_information_entropy, differential_entropy_approaches_its_Maxima_and_minima, maximum_value,_and_hence_maximum_"disorder". For_example,_for_α_=_1.0001_and_β_=_1.00000001: *_mode___=_0.9999;___PDF(mode)_=_1.00010 *_mean___=_0.500025;_PDF(mean)_=_1.00003 *_median_=_0.500035;_PDF(median)_=_1.00003 *_mean_−_mode___=_−0.499875 *_mean_−_median_=_−9.65538_×_10−6 where_PDF_stands_for_the_value_of_the_probability_density_function_ In_probability_theory,_a_probability_density_function_(PDF),_or_density_of_a_continuous_random_variable,_is_a__function_whose_value_at_any_given_sample_(or_point)_in_the__sample_space_(the_set_of_possible_values_taken_by_the_random_variable)_ca_...
.


_Mean,_geometric_mean_and_harmonic_mean_relationship

It_is_known_from_the_inequality_of_arithmetic_and_geometric_means_that_the_geometric_mean_is_lower_than_the_mean.__Similarly,_the_harmonic_mean_is_lower_than_the_geometric_mean.__The_accompanying_plot_shows_that_for_α_=_β,_both_the_mean_and_the_median_are_exactly_equal_to_1/2,_regardless_of_the_value_of_α_=_β,_and_the_mode_is_also_equal_to_1/2_for_α_=_β_>_1,_however_the_geometric_and_harmonic_means_are_lower_than_1/2_and_they_only_approach_this_value_asymptotically_as_α_=_β_→_∞.


_Kurtosis_bounded_by_the_square_of_the_skewness

As_remarked_by_William_Feller, Feller,_in_the_Pearson_distribution, Pearson_system_the_beta_probability_density_appears_as_Pearson_distribution, type_I_(any_difference_between_the_beta_distribution_and_Pearson's_type_I_distribution_is_only_superficial_and_it_makes_no_difference_for_the_following_discussion_regarding_the_relationship_between_kurtosis_and_skewness)._Karl_Pearson_showed,_in_Plate_1_of_his_paper_
__published_in_1916,__a_graph_with_the_kurtosis_as_the_vertical_axis_(ordinate)_and_the_square_of_the_skewness_ In_probability_theory_and_statistics,_skewness_is_a_measure_of_the_asymmetry_of_the_probability_distribution_of_a__real-valued_random_variable_about_its_mean._The_skewness_value_can_be_positive,_zero,_negative,_or_undefined. For_a_unimodal__...
_as_the_horizontal_axis_(abscissa),_in_which_a_number_of_distributions_were_displayed.
The region occupied by the beta distribution is bounded by the following two lines in the (skewness², kurtosis) plane, or the (skewness², excess kurtosis) plane:

:(\text{skewness})^2+1< \text{kurtosis}< \frac{3}{2} (\text{skewness})^2 + 3

or, equivalently,

:(\text{skewness})^2-2< \text{excess kurtosis}< \frac{3}{2} (\text{skewness})^2

At a time when there were no powerful digital computers, Karl Pearson accurately computed further boundaries, for example, separating the "U-shaped" from the "J-shaped" distributions. The lower boundary line (excess kurtosis + 2 − skewness² = 0) is produced by skewed "U-shaped" beta distributions with both values of shape parameters α and β close to zero. The upper boundary line (excess kurtosis − (3/2) skewness² = 0) is produced by extremely skewed distributions with very large values of one of the parameters and very small values of the other parameter. Karl Pearson showed that this upper boundary line (excess kurtosis − (3/2) skewness² = 0) is also the intersection with Pearson's distribution III, which has unlimited support in one direction (towards positive infinity), and can be bell-shaped or J-shaped. His son, Egon Pearson, showed that the region (in the kurtosis/squared-skewness plane) occupied by the beta distribution (equivalently, Pearson's distribution I) as it approaches this boundary (excess kurtosis − (3/2) skewness² = 0) is shared with the noncentral chi-squared distribution. Karl Pearson (Pearson 1895, pp. 357, 360, 373–376) also showed that the gamma distribution is a Pearson type III distribution. Hence this boundary line for Pearson's type III distribution is known as the gamma line. (This can be shown from the fact that the excess kurtosis of the gamma distribution is 6/''k'' and the square of the skewness is 4/''k'', hence (excess kurtosis − (3/2) skewness² = 0) is identically satisfied by the gamma distribution regardless of the value of the parameter "k"). Pearson later noted that the chi-squared distribution is a special case of Pearson's type III and also shares this boundary line (as is apparent from the fact that for the chi-squared distribution the excess kurtosis is 12/''k'' and the square of the skewness is 8/''k'', hence (excess kurtosis − (3/2) skewness² = 0) is identically satisfied regardless of the value of the parameter "k"). This is to be expected, since the chi-squared distribution ''X'' ~ χ²(''k'') is a special case of the gamma distribution, with parametrization X ~ Γ(k/2, 1/2) where k is a positive integer that specifies the "number of degrees of freedom" of the chi-squared distribution.

An example of a beta distribution near the upper boundary (excess kurtosis − (3/2) skewness² = 0) is given by α = 0.1, β = 1000, for which the ratio (excess kurtosis)/(skewness²) = 1.49835 approaches the upper limit of 1.5 from below. An example of a beta distribution near the lower boundary (excess kurtosis + 2 − skewness² = 0) is given by α = 0.0001, β = 0.1, for which values the expression (excess kurtosis + 2)/(skewness²) = 1.01621 approaches the lower limit of 1 from above. In the infinitesimal limit for both α and β approaching zero symmetrically, the excess kurtosis reaches its minimum value at −2. This minimum value occurs at the point at which the lower boundary line intersects the vertical axis (ordinate). (However, in Pearson's original chart, the ordinate is kurtosis, instead of excess kurtosis, and it increases downwards rather than upwards).

Values for the skewness and excess kurtosis below the lower boundary (excess kurtosis + 2 − skewness² = 0) cannot occur for any distribution, and hence Karl Pearson appropriately called the region below this boundary the "impossible region". The boundary for this "impossible region" is determined by (symmetric or skewed) bimodal "U"-shaped distributions for which the parameters α and β approach zero and hence all the probability density is concentrated at the ends: ''x'' = 0, 1 with practically nothing in between them. Since for α ≈ β ≈ 0 the probability density is concentrated at the two ends ''x'' = 0 and ''x'' = 1, this "impossible boundary" is determined by a Bernoulli distribution, where the two only possible outcomes occur with respective probabilities ''p'' and ''q'' = 1 − ''p''. For cases approaching this limit boundary with symmetry α = β, skewness ≈ 0, excess kurtosis ≈ −2 (this is the lowest excess kurtosis possible for any distribution), and the probabilities are ''p'' ≈ ''q'' ≈ 1/2. For cases approaching this limit boundary with skewness, excess kurtosis ≈ −2 + skewness², and the probability density is concentrated more at one end than the other end (with practically nothing in between), with probabilities p = \tfrac{\beta}{\alpha+\beta} at the left end ''x'' = 0 and q = 1-p = \tfrac{\alpha}{\alpha+\beta} at the right end ''x'' = 1.
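The boundary inequalities above are easy to check numerically. The following short snippet (added here as an illustration; it is not part of Pearson's analysis) uses SciPy's beta distribution to confirm that a few parameter choices satisfy both inequalities and that the two extreme examples quoted above approach the limiting ratios 1.5 and 1:

```python
# Illustrative check of (skewness)^2 - 2 < excess kurtosis < (3/2)(skewness)^2.
from scipy.stats import beta

for a, b in [(0.1, 1000), (0.0001, 0.1), (2.0, 5.0)]:
    _, _, skew, exkurt = beta.stats(a, b, moments='mvsk')
    s2 = float(skew) ** 2
    print(a, b,
          s2 - 2 < exkurt < 1.5 * s2,     # both boundary inequalities hold
          float(exkurt) / s2,             # approaches 1.5 near the "gamma line"
          (float(exkurt) + 2) / s2)       # approaches 1 near the "impossible boundary"
```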


Symmetry

All statements are conditional on α, β > 0:

* Probability density function reflection symmetry
::f(x;\alpha,\beta) = f(1-x;\beta,\alpha)
* Cumulative distribution function reflection symmetry plus unitary translation
::F(x;\alpha,\beta) = I_x(\alpha,\beta) = 1- F(1- x;\beta,\alpha) = 1 - I_{1-x}(\beta,\alpha)
* Mode reflection symmetry plus unitary translation
::\operatorname{mode}(\Beta(\alpha, \beta))= 1-\operatorname{mode}(\Beta(\beta, \alpha)),\text{ if }\Beta(\beta, \alpha)\ne \Beta(1,1)
* Median reflection symmetry plus unitary translation
::\operatorname{median} (\Beta(\alpha, \beta) )= 1 - \operatorname{median} (\Beta(\beta, \alpha))
* Mean reflection symmetry plus unitary translation
::\mu (\Beta(\alpha, \beta) )= 1 - \mu (\Beta(\beta, \alpha) )
* Geometric means: each is individually asymmetric; the following symmetry applies between the geometric mean based on ''X'' and the geometric mean based on its reflection (1 − ''X'')
::G_X (\Beta(\alpha, \beta) )=G_{(1-X)}(\Beta(\beta, \alpha) )
* Harmonic means: each is individually asymmetric; the following symmetry applies between the harmonic mean based on ''X'' and the harmonic mean based on its reflection (1 − ''X'')
::H_X (\Beta(\alpha, \beta) )=H_{(1-X)}(\Beta(\beta, \alpha) ) \text{ if } \alpha, \beta > 1.
* Variance symmetry
::\operatorname{var} (\Beta(\alpha, \beta) )=\operatorname{var} (\Beta(\beta, \alpha) )
* Geometric variances: each is individually asymmetric; the following symmetry applies between the log geometric variance based on ''X'' and the log geometric variance based on its reflection (1 − ''X'')
::\ln(\operatorname{var}_{GX} (\Beta(\alpha, \beta))) = \ln(\operatorname{var}_{G(1-X)}(\Beta(\beta, \alpha)))
* Geometric covariance symmetry
::\ln \operatorname{cov}_{GX,G(1-X)}(\Beta(\alpha, \beta))=\ln \operatorname{cov}_{GX,G(1-X)}(\Beta(\beta, \alpha))
* Mean absolute deviation around the mean symmetry
::\operatorname{E}[|X - E[X]|] (\Beta(\alpha, \beta))=\operatorname{E}[|X - E[X]|] (\Beta(\beta, \alpha))
* Skewness skew-symmetry
::\operatorname{skewness} (\Beta(\alpha, \beta) )= - \operatorname{skewness} (\Beta(\beta, \alpha) )
* Excess kurtosis symmetry
::\text{excess kurtosis} (\Beta(\alpha, \beta) )= \text{excess kurtosis} (\Beta(\beta, \alpha) )
* Characteristic function symmetry of Real part (with respect to the origin of variable "t")
:: \text{Re} [{}_1F_1(\alpha; \alpha+\beta; it) ] = \text{Re} [ {}_1F_1(\alpha; \alpha+\beta; - it)]
* Characteristic function skew-symmetry of Imaginary part (with respect to the origin of variable "t")
:: \text{Im} [{}_1F_1(\alpha; \alpha+\beta; it) ] = - \text{Im} [ {}_1F_1(\alpha; \alpha+\beta; - it) ]
* Characteristic function symmetry of Absolute value (with respect to the origin of variable "t")
:: \text{Abs} [ {}_1F_1(\alpha; \alpha+\beta; it) ] = \text{Abs} [ {}_1F_1(\alpha; \alpha+\beta; - it) ]
* Differential entropy symmetry
::h(\Beta(\alpha, \beta) )= h(\Beta(\beta, \alpha) )
* Relative entropy (also called Kullback–Leibler divergence) symmetry
::D_{\mathrm{KL}}(X_1 \parallel X_2) = D_{\mathrm{KL}}(X_2 \parallel X_1), \text{ if } h(X_1) = h(X_2), \text{ for } \alpha \neq \beta
* Fisher information matrix symmetry
::{\mathcal{I}}_{i, j} = {\mathcal{I}}_{j, i}
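As a quick illustration (not part of the original list), a few of these symmetry relations can be verified numerically with SciPy; the parameter values below are arbitrary:

```python
import numpy as np
from scipy.stats import beta

a, b = 2.3, 5.7
x = np.linspace(0.01, 0.99, 9)

# pdf reflection symmetry: f(x; a, b) = f(1 - x; b, a)
assert np.allclose(beta.pdf(x, a, b), beta.pdf(1 - x, b, a))
# CDF reflection plus unitary translation: F(x; a, b) = 1 - F(1 - x; b, a)
assert np.allclose(beta.cdf(x, a, b), 1 - beta.cdf(1 - x, b, a))
# skew-symmetry of the skewness, symmetry of the excess kurtosis
s_ab, k_ab = beta.stats(a, b, moments='sk')
s_ba, k_ba = beta.stats(b, a, moments='sk')
assert np.isclose(s_ab, -s_ba) and np.isclose(k_ab, k_ba)
```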


Geometry of the probability density function


Inflection points

For certain values of the shape parameters α and β, the probability density function has inflection points, at which the curvature changes sign. The position of these inflection points can be useful as a measure of the dispersion or spread of the distribution.

Defining the following quantity:

:\kappa =\frac{\sqrt{\frac{(\alpha-1)(\beta-1)}{\alpha+\beta-3}}}{\alpha+\beta-2}

points of inflection occur, depending on the value of the shape parameters α and β, as follows:

*(α > 2, β > 2) The distribution is bell-shaped (symmetric for α = β and skewed otherwise), with two inflection points, equidistant from the mode:
::x = \text{mode} \pm \kappa = \frac{\alpha-1\pm\sqrt{\frac{(\alpha-1)(\beta-1)}{\alpha+\beta-3}}}{\alpha+\beta-2}
* (α = 2, β > 2) The distribution is unimodal, positively skewed, right-tailed, with one inflection point, located to the right of the mode:
::x =\text{mode} + \kappa
* (α > 2, β = 2) The distribution is unimodal, negatively skewed, left-tailed, with one inflection point, located to the left of the mode:
::x = \text{mode} - \kappa
* (1 < α < 2, β > 2, α+β > 2) The distribution is unimodal, positively skewed, right-tailed, with one inflection point, located to the right of the mode:
::x =\text{mode} + \kappa
*(0 < α < 1, 1 < β < 2) The distribution has a mode at the left end ''x'' = 0 and it is positively skewed, right-tailed. There is one inflection point, located to the right of the mode.
*(α > 2, 1 < β < 2) The distribution is unimodal, negatively skewed, left-tailed, with one inflection point, located to the left of the mode:
::x =\text{mode} - \kappa
*(1 < α < 2, 0 < β < 1) The distribution has a mode at the right end ''x'' = 1 and it is negatively skewed, left-tailed. There is one inflection point, located to the left of the mode.

There are no inflection points in the remaining (symmetric and skewed) regions: U-shaped: (α, β < 1), upside-down-U-shaped: (1 < α < 2, 1 < β < 2), reverse-J-shaped: (α < 1, β > 2) or J-shaped: (α > 2, β < 1).

The accompanying plots show the inflection point locations (shown vertically, ranging from 0 to 1) versus α and β (the horizontal axes ranging from 0 to 5). There are large cuts at surfaces intersecting the lines α = 1, β = 1, α = 2, and β = 2 because at these values the beta distribution changes from two modes, to one mode, to no mode.
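The closed-form locations can be cross-checked by locating the sign changes of the second derivative of the density numerically. The sketch below (an illustration added here, assuming SciPy/NumPy; the grid resolution is arbitrary) does this for one bell-shaped case with α, β > 2:

```python
import numpy as np
from scipy.stats import beta

a, b = 4.0, 6.0
mode = (a - 1) / (a + b - 2)
kappa = np.sqrt((a - 1) * (b - 1) / (a + b - 3)) / (a + b - 2)

x = np.linspace(0.01, 0.99, 98001)
d2 = np.gradient(np.gradient(beta.pdf(x, a, b), x), x)   # numerical 2nd derivative
crossings = x[:-1][np.sign(d2[:-1]) != np.sign(d2[1:])]  # curvature sign changes
print(crossings)                      # two inflection points, close to...
print(mode - kappa, mode + kappa)     # ...the closed-form values mode +/- kappa
```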


Shapes

The beta density function can take a wide variety of different shapes depending on the values of the two parameters ''α'' and ''β''. The ability of the beta distribution to take this great diversity of shapes (using only two parameters) is partly responsible for finding wide application for modeling actual measurements:


Symmetric (''α'' = ''β'')

* the density function is symmetric about 1/2 (blue & teal plots).
* median = mean = 1/2.
* skewness = 0.
* variance = 1/(4(2α + 1))
* α = β < 1
** U-shaped (blue plot).
** bimodal: left mode = 0, right mode = 1, anti-mode = 1/2
** 1/12 < var(''X'') < 1/4
** −2 < excess kurtosis(''X'') < −6/5
** α = β = 1/2 is the arcsine distribution
*** var(''X'') = 1/8
*** excess kurtosis(''X'') = −3/2
*** CF = Rinc (t)
** α = β → 0 is a 2-point Bernoulli distribution with equal probability 1/2 at each Dirac delta function end ''x'' = 0 and ''x'' = 1 and zero probability everywhere else. A coin toss: one face of the coin being ''x'' = 0 and the other face being ''x'' = 1.
*** \lim_{\alpha = \beta \to 0} \operatorname{var}(X) = \tfrac{1}{4}
*** \lim_{\alpha = \beta \to 0} \operatorname{excess kurtosis}(X) = - 2 (a lower value than this is impossible for any distribution to reach)
*** The differential entropy approaches a minimum value of −∞
* α = β = 1
** the uniform [0, 1] distribution
** no mode
** var(''X'') = 1/12
** excess kurtosis(''X'') = −6/5
** The (negative anywhere else) differential entropy reaches its maximum value of zero
** CF = Sinc (t)
* ''α'' = ''β'' > 1
** symmetric unimodal
** mode = 1/2.
** 0 < var(''X'') < 1/12
** −6/5 < excess kurtosis(''X'') < 0
** ''α'' = ''β'' = 3/2 is a semi-elliptic [0, 1] distribution, see: Wigner semicircle distribution
*** var(''X'') = 1/16.
*** excess kurtosis(''X'') = −1
*** CF = 2 Jinc (t)
** ''α'' = ''β'' = 2 is the parabolic [0, 1] distribution
*** var(''X'') = 1/20
*** excess kurtosis(''X'') = −6/7
*** CF = 3 Tinc (t)
** ''α'' = ''β'' > 2 is bell-shaped, with inflection points located to either side of the mode
*** 0 < var(''X'') < 1/20
*** −6/7 < excess kurtosis(''X'') < 0
** ''α'' = ''β'' → ∞ is a 1-point degenerate distribution with a Dirac delta function spike at the midpoint ''x'' = 1/2 with probability 1, and zero probability everywhere else. There is 100% probability (absolute certainty) concentrated at the single point ''x'' = 1/2.
*** \lim_{\alpha = \beta \to \infty} \operatorname{var}(X) = 0
*** \lim_{\alpha = \beta \to \infty} \operatorname{excess kurtosis}(X) = 0
*** The differential entropy approaches a minimum value of −∞


Skewed (''α'' ≠ ''β'')

The density function is skewed. An interchange of parameter values yields the mirror image (the reverse) of the initial curve. Some more specific cases:

*''α'' < 1, ''β'' < 1
** U-shaped
** Positive skew for α < β, negative skew for α > β.
** bimodal: left mode = 0, right mode = 1, anti-mode = \tfrac{\alpha-1}{\alpha+\beta-2}
** 0 < median < 1.
** 0 < var(''X'') < 1/4
*α > 1, β > 1
** unimodal (magenta & cyan plots),
** Positive skew for α < β, negative skew for α > β.
** \text{mode}= \tfrac{\alpha-1}{\alpha+\beta-2}
** 0 < median < 1
** 0 < var(''X'') < 1/12
*α < 1, β ≥ 1
** reverse J-shaped with a right tail,
** positively skewed,
** strictly decreasing, convex
** mode = 0
** 0 < median < 1/2.
** 0 < \operatorname{var}(X) < \tfrac{-11+5\sqrt{5}}{2}, (maximum variance occurs for \alpha=\tfrac{\sqrt{5}-1}{2}, \beta=1, or α = Φ the golden ratio conjugate)
*α ≥ 1, β < 1
** J-shaped with a left tail,
** negatively skewed,
** strictly increasing, convex
** mode = 1
** 1/2 < median < 1
** 0 < \operatorname{var}(X) < \tfrac{-11+5\sqrt{5}}{2}, (maximum variance occurs for \alpha=1, \beta=\tfrac{\sqrt{5}-1}{2}, or β = Φ the golden ratio conjugate)
*α = 1, β > 1
** positively skewed,
** strictly decreasing (red plot),
** a reversed (mirror-image) power function [0, 1] distribution
** mean = 1 / (β + 1)
** median = 1 − 1/2^{1/β}
** mode = 0
** α = 1, 1 < β < 2
*** concave
*** 1-\tfrac{1}{\sqrt{2}}< \text{median} < \tfrac{1}{2}
*** 1/18 < var(''X'') < 1/12.
** α = 1, β = 2
*** a straight line with slope −2, the right-triangular distribution with right angle at the left end, at ''x'' = 0
*** \text{median}=1-\tfrac{1}{\sqrt{2}}
*** var(''X'') = 1/18
** α = 1, β > 2
*** reverse J-shaped with a right tail,
*** convex
*** 0 < \text{median} < 1-\tfrac{1}{\sqrt{2}}
*** 0 < var(''X'') < 1/18
*α > 1, β = 1
** negatively skewed,
** strictly increasing (green plot),
** the power function [0, 1] distribution
** mean = α / (α + 1)
** median = 1/2^{1/α}
** mode = 1
** 2 > α > 1, β = 1
*** concave
*** \tfrac{1}{2} < \text{median} < \tfrac{1}{\sqrt{2}}
*** 1/18 < var(''X'') < 1/12
** α = 2, β = 1
*** a straight line with slope +2, the right-triangular distribution with right angle at the right end, at ''x'' = 1
*** \text{median}=\tfrac{1}{\sqrt{2}}
*** var(''X'') = 1/18
** α > 2, β = 1
*** J-shaped with a left tail, convex
*** \tfrac{1}{\sqrt{2}} < \text{median} < 1
*** 0 < var(''X'') < 1/18


Related distributions


Transformations

* If ''X'' ~ Beta(''α'', ''β'') then 1 − ''X'' ~ Beta(''β'', ''α'') (mirror-image symmetry)
* If ''X'' ~ Beta(''α'', ''β'') then \tfrac{X}{1-X} \sim {\beta'}(\alpha,\beta), the beta prime distribution, also called "beta distribution of the second kind".
* If ''X'' ~ Beta(''α'', ''β'') then \tfrac{1}{X} -1 \sim {\beta'}(\beta,\alpha).
* If ''X'' ~ Beta(''n''/2, ''m''/2) then \tfrac{mX}{n(1-X)} \sim F(n,m) (assuming ''n'' > 0 and ''m'' > 0), the Fisher–Snedecor F distribution.
* If X \sim \operatorname{Beta}\left(1+\lambda\tfrac{m-\min}{\max-\min}, 1 + \lambda\tfrac{\max-m}{\max-\min}\right) then min + ''X''(max − min) ~ PERT(min, max, ''m'', ''λ'') where ''PERT'' denotes a PERT distribution used in PERT analysis, and ''m'' = most likely value.Herrerías-Velasco, José Manuel and Herrerías-Pleguezuelo, Rafael and René van Dorp, Johan. (2011). Revisiting the PERT mean and Variance. European Journal of Operational Research (210), p. 448–451. Traditionally ''λ'' = 4 in PERT analysis.
* If ''X'' ~ Beta(1, ''β'') then ''X'' ~ Kumaraswamy distribution with parameters (1, ''β'')
* If ''X'' ~ Beta(''α'', 1) then ''X'' ~ Kumaraswamy distribution with parameters (''α'', 1)
* If ''X'' ~ Beta(''α'', 1) then −ln(''X'') ~ Exponential(''α'')
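A brief Monte Carlo sanity check (added for illustration; the parameter values and sample sizes are arbitrary) of two of these transformations:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a, b = 2.5, 4.0

# X/(1-X) should follow the beta prime distribution with parameters (a, b)
x = stats.beta.rvs(a, b, size=100_000, random_state=rng)
print(stats.kstest(x / (1 - x), stats.betaprime(a, b).cdf).pvalue)

# for X ~ Beta(a, 1), -ln(X) should be Exponential(a), i.e. scale 1/a
y = stats.beta.rvs(a, 1, size=100_000, random_state=rng)
print(stats.kstest(-np.log(y), stats.expon(scale=1 / a).cdf).pvalue)
```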


Special and limiting cases

* Beta(1, 1) ~ U(0, 1), the continuous uniform distribution.
* Beta(n, 1) ~ Maximum of ''n'' independent rvs. with U(0, 1) distribution, sometimes called a ''standard power function distribution'' with density ''n''·''x''^{''n''−1} on that interval.
* Beta(1, n) ~ Minimum of ''n'' independent rvs. with U(0, 1) distribution.
* If ''X'' ~ Beta(3/2, 3/2) and ''r'' > 0 then 2''rX'' − ''r'' ~ Wigner semicircle distribution.
* Beta(1/2, 1/2) is equivalent to the arcsine distribution. This distribution is also the Jeffreys prior probability for the Bernoulli and binomial distributions. The arcsine probability density is a distribution that appears in several random-walk fundamental theorems. In a fair coin toss random walk, the probability for the time of the last visit to the origin is distributed as a (U-shaped) arcsine distribution. In a two-player fair-coin-toss game, a player is said to be in the lead if the random walk (that started at the origin) is above the origin. The most probable number of times that a given player will be in the lead, in a game of length 2''N'', is not ''N''. On the contrary, ''N'' is the least likely number of times that the player will be in the lead. The most likely number of times in the lead is 0 or 2''N'' (following the arcsine distribution).
* \lim_{n \to \infty} n \operatorname{Beta}(1,n) = \operatorname{Exponential}(1), the exponential distribution.
* \lim_{n \to \infty} n \operatorname{Beta}(k,n) = \operatorname{Gamma}(k,1), the gamma distribution.
* For large n, \operatorname{Beta}(\alpha n,\beta n) \to \mathcal{N}\left(\frac{\alpha}{\alpha+\beta},\frac{1}{n}\frac{\alpha\beta}{(\alpha+\beta)^3}\right), the normal distribution. More precisely, if X_n \sim \operatorname{Beta}(\alpha n,\beta n) then \sqrt{n}\left(X_n -\tfrac{\alpha}{\alpha+\beta}\right) converges in distribution to a normal distribution with mean 0 and variance \tfrac{\alpha\beta}{(\alpha+\beta)^3} as ''n'' increases.
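Two of the special cases above can likewise be checked by simulation; the snippet below is an added illustration (the sample sizes and the value of ''n'' are arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 7

# Beta(n, 1) is the distribution of the maximum of n independent U(0, 1) variables
u_max = rng.uniform(size=(100_000, n)).max(axis=1)
print(stats.kstest(u_max, stats.beta(n, 1).cdf).pvalue)

# Beta(1, 1) is the uniform distribution on [0, 1]
print(stats.kstest(stats.beta.rvs(1, 1, size=100_000, random_state=rng),
                   stats.uniform.cdf).pvalue)
```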


Derived from other distributions

* The ''k''th order statistic of a sample of size ''n'' from the continuous uniform distribution is a beta random variable, ''U''(''k'') ~ Beta(''k'', ''n''+1−''k'').
* If ''X'' ~ Gamma(α, θ) and ''Y'' ~ Gamma(β, θ) are independent, then \tfrac{X}{X+Y} \sim \operatorname{Beta}(\alpha, \beta)\,.
* If X \sim \chi^2(\alpha)\, and Y \sim \chi^2(\beta)\, are independent, then \tfrac{X}{X+Y} \sim \operatorname{Beta}(\tfrac{\alpha}{2}, \tfrac{\beta}{2}).
* If ''X'' ~ U(0, 1) and ''α'' > 0 then ''X''^{1/''α''} ~ Beta(''α'', 1), the power function distribution.
* If X \sim\operatorname{Bin}(k;n;p), then p \sim \operatorname{Beta}(\alpha, \beta) (as the posterior under a uniform prior on ''p'') for discrete values of ''n'' and ''k'', where \alpha=k+1 and \beta=n-k+1.
* If ''X'' ~ Cauchy(0, 1) then \tfrac{1}{1+X^2} \sim \operatorname{Beta}\left(\tfrac12, \tfrac12\right)\,
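For example, the gamma-ratio construction in the second bullet can be illustrated as follows (an added sketch; the parameter values are arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
alpha, beta_, theta = 2.0, 5.0, 3.0
x = rng.gamma(alpha, theta, size=100_000)
y = rng.gamma(beta_, theta, size=100_000)
# X/(X+Y) ~ Beta(alpha, beta) for independent gammas with a common scale
print(stats.kstest(x / (x + y), stats.beta(alpha, beta_).cdf).pvalue)
```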


Combination with other distributions

* If ''X'' ~ Beta(''α'', ''β'') and ''Y'' ~ F(2''β'', 2''α'') then \Pr\left(X \leq \tfrac{\alpha}{\alpha + \beta x}\right) = \Pr(Y \geq x)\, for all ''x'' > 0.
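This identity can be confirmed directly from the two distributions' CDFs; the following added check (arbitrary parameter values) evaluates both sides:

```python
import numpy as np
from scipy import stats

alpha, beta_ = 2.0, 3.5
for x in (0.5, 1.0, 2.0, 5.0):
    lhs = stats.beta.cdf(alpha / (alpha + beta_ * x), alpha, beta_)
    rhs = stats.f.sf(x, 2 * beta_, 2 * alpha)   # Pr(Y >= x) for Y ~ F(2*beta, 2*alpha)
    assert np.isclose(lhs, rhs)
```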


Compounding with other distributions

* If ''p'' ~ Beta(α, β) and ''X'' ~ Bin(''k'', ''p'') then ''X'' ~ beta-binomial distribution
* If ''p'' ~ Beta(α, β) and ''X'' ~ NB(''r'', ''p'') then ''X'' ~ beta negative binomial distribution
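The compounding construction can be sketched as follows (an illustration added here, assuming SciPy's betabinom is available; the parameters are arbitrary): drawing ''p'' from a beta distribution and then ''X'' from a binomial with that ''p'' reproduces the beta-binomial probability mass function.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
alpha, beta_, k = 2.0, 5.0, 10

p = stats.beta.rvs(alpha, beta_, size=200_000, random_state=rng)  # p ~ Beta(alpha, beta)
x = rng.binomial(k, p)                                            # X | p ~ Bin(k, p)

empirical = np.bincount(x, minlength=k + 1) / x.size
exact = stats.betabinom.pmf(np.arange(k + 1), k, alpha, beta_)
print(np.max(np.abs(empirical - exact)))   # small, up to sampling error
```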


Generalisations

* The generalization to multiple variables, i.e. a multivariate beta distribution, is called a Dirichlet distribution. Univariate marginals of the Dirichlet distribution have a beta distribution. The beta distribution is conjugate to the binomial and Bernoulli distributions in exactly the same way as the Dirichlet distribution is conjugate to the multinomial distribution and categorical distribution.
* The Pearson type I distribution is identical to the beta distribution (except for arbitrary shifting and re-scaling that can also be accomplished with the four-parameter parametrization of the beta distribution).
* The beta distribution is the special case of the noncentral beta distribution where \lambda = 0: \operatorname{Beta}(\alpha, \beta) = \operatorname{Beta}(\alpha,\beta,0).
* The generalized beta distribution is a five-parameter distribution family which has the beta distribution as a special case.
* The matrix variate beta distribution is a distribution for positive-definite matrices.


Statistical inference


Parameter estimation


Method of moments


Two unknown parameters

Two unknown parameters ((\hat{\alpha}, \hat{\beta}) of a beta distribution supported in the [0, 1] interval) can be estimated, using the method of moments, with the first two moments (sample mean and sample variance) as follows. Let:

: \text{sample mean}=\bar{x} = \frac{1}{N}\sum_{i=1}^N X_i

be the sample mean estimate and

: \text{sample variance} =\bar{v} = \frac{1}{N-1}\sum_{i=1}^N (X_i - \bar{x})^2

be the sample variance estimate. The method-of-moments estimates of the parameters are

:\hat{\alpha} = \bar{x} \left(\frac{\bar{x}(1 - \bar{x})}{\bar{v}} - 1 \right), if \bar{v} <\bar{x}(1 - \bar{x}),
: \hat{\beta} = (1-\bar{x}) \left(\frac{\bar{x}(1 - \bar{x})}{\bar{v}} - 1 \right), if \bar{v}<\bar{x}(1 - \bar{x}).

When the distribution is required over a known interval other than [0, 1] with random variable ''X'', say [''a'', ''c''] with random variable ''Y'', then replace \bar{x} with \frac{\bar{y}-a}{c-a}, and \bar{v} with \frac{\bar{v}_Y}{(c-a)^2} in the above couple of equations for the shape parameters (see the "Alternative parametrizations, four parameters" section below), where:

: \text{sample mean}=\bar{y} = \frac{1}{N}\sum_{i=1}^N Y_i
: \text{sample variance} = \bar{v}_Y = \frac{1}{N-1}\sum_{i=1}^N (Y_i - \bar{y})^2
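A minimal implementation sketch of these moment estimators (assuming support [0, 1]; the function name and test values are illustrative, not from the article):

```python
import numpy as np
from scipy import stats

def beta_mom(samples):
    x_bar = np.mean(samples)
    v_bar = np.var(samples, ddof=1)              # sample variance with 1/(N-1)
    if v_bar >= x_bar * (1 - x_bar):
        raise ValueError("requires sample variance < mean*(1-mean)")
    common = x_bar * (1 - x_bar) / v_bar - 1
    return x_bar * common, (1 - x_bar) * common  # (alpha_hat, beta_hat)

data = stats.beta.rvs(2.0, 6.0, size=50_000,
                      random_state=np.random.default_rng(4))
print(beta_mom(data))   # should be close to (2, 6)
```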


Four unknown parameters

All four parameters (\hat{\alpha}, \hat{\beta}, \hat{a}, \hat{c} of a beta distribution supported in the [''a'', ''c''] interval; see the section "Alternative parametrizations, Four parameters") can be estimated, using the method of moments developed by Karl Pearson, by equating sample and population values of the first four central moments (mean, variance, skewness and excess kurtosis).
The excess kurtosis was expressed in terms of the square of the skewness, and the sample size ν = α + β (see the previous section "Kurtosis"), as follows:

:\text{excess kurtosis} =\frac{6}{3 + \nu}\left(\frac{(2 + \nu)}{4} (\text{skewness})^2 - 1\right)\text{ if (skewness)}^2-2< \text{excess kurtosis}< \tfrac{3}{2} (\text{skewness})^2

One can use this equation to solve for the sample size ν = α + β in terms of the square of the skewness and the excess kurtosis as follows:

:\hat{\nu} = \hat{\alpha} + \hat{\beta} = 3\frac{(\text{sample excess kurtosis}) - (\text{sample skewness})^2 + 2}{\frac{3}{2} (\text{sample skewness})^2 - \text{(sample excess kurtosis)}}
:\text{if (sample skewness)}^2-2< \text{sample excess kurtosis}< \tfrac{3}{2} (\text{sample skewness})^2

This is the ratio (multiplied by a factor of 3) between the previously derived limit boundaries for the beta distribution in a space (as originally done by Karl Pearson) defined with coordinates of the square of the skewness in one axis and the excess kurtosis in the other axis.

The case of zero skewness can be immediately solved because for zero skewness α = β, and hence ν = 2α = 2β, therefore α = β = ν/2:

: \hat{\alpha} = \hat{\beta} = \frac{\hat{\nu}}{2}= \frac{\frac{3}{2}(\text{sample excess kurtosis}) +3}{- \text{(sample excess kurtosis)}}
: \text{if sample skewness}= 0 \text{ and } -2<\text{sample excess kurtosis}<0

(Excess kurtosis is negative for the beta distribution with zero skewness, ranging from −2 to 0, so that \hat{\nu}, and therefore the sample shape parameters, is positive, ranging from zero when the shape parameters approach zero and the excess kurtosis approaches −2, to infinity when the shape parameters approach infinity and the excess kurtosis approaches zero).

For non-zero sample skewness one needs to solve a system of two coupled equations. Since the skewness and the excess kurtosis are independent of the parameters \hat{a}, \hat{c}, the parameters \hat{\alpha}, \hat{\beta} can be uniquely determined from the sample skewness and the sample excess kurtosis, by solving the coupled equations with two known variables (sample skewness and sample excess kurtosis) and two unknowns (the shape parameters):

:(\text{skewness})^2 = \frac{4(\beta-\alpha)^2 (1 + \alpha + \beta)}{\alpha \beta (2 + \alpha + \beta)^2}
:\text{excess kurtosis} =\frac{6}{3 + \alpha + \beta}\left(\frac{(2 + \alpha + \beta)}{4} (\text{skewness})^2 - 1\right)
:\text{if (skewness)}^2-2< \text{excess kurtosis}< \tfrac{3}{2}(\text{skewness})^2

resulting in the following solution:

: \hat{\alpha}, \hat{\beta} = \frac{\hat{\nu}}{2} \left (1 \pm \frac{1}{ \sqrt{1+ \frac{16 (\hat{\nu} + 1)}{(\hat{\nu} + 2)^2(\text{sample skewness})^2}}} \right )
: \text{if sample skewness}\neq 0 \text{ and } (\text{sample skewness})^2-2< \text{sample excess kurtosis}< \tfrac{3}{2} (\text{sample skewness})^2

where one should take the solutions as follows: \hat{\alpha}>\hat{\beta} for (negative) sample skewness < 0, and \hat{\alpha}<\hat{\beta} for (positive) sample skewness > 0.

The accompanying plot shows these two solutions as surfaces in a space with horizontal axes of (sample excess kurtosis) and (sample squared skewness) and the shape parameters as the vertical axis. The surfaces are constrained by the condition that the sample excess kurtosis must be bounded by the sample squared skewness as stipulated in the above equation. The two surfaces meet at the right edge defined by zero skewness. Along this right edge, both parameters are equal and the distribution is symmetric U-shaped for α = β < 1, uniform for α = β = 1, upside-down-U-shaped for 1 < α = β < 2 and bell-shaped for α = β > 2. The surfaces also meet at the front (lower) edge defined by "the impossible boundary" line (excess kurtosis + 2 − skewness² = 0). Along this front (lower) boundary both shape parameters approach zero, and the probability density is concentrated more at one end than the other end (with practically nothing in between), with probabilities p=\tfrac{\beta}{\alpha+\beta} at the left end ''x'' = 0 and q = 1-p = \tfrac{\alpha}{\alpha+\beta} at the right end ''x'' = 1. The two surfaces become further apart towards the rear edge. At this rear edge the surface parameters are quite different from each other. As remarked, for example, by Bowman and Shenton,
sampling in the neighborhood of the line (sample excess kurtosis − (3/2)(sample skewness)² = 0) (the just-J-shaped portion of the rear edge where blue meets beige) "is dangerously near to chaos", because at that line the denominator of the expression above for the estimate ν = α + β becomes zero and hence ν approaches infinity as that line is approached. Bowman and Shenton write that "the higher moment parameters (kurtosis and skewness) are extremely fragile (near that line). However, the mean and standard deviation are fairly reliable." Therefore, the problem arises for four-parameter estimation of very skewed distributions such that the excess kurtosis approaches (3/2) times the square of the skewness. This boundary line is produced by extremely skewed distributions with very large values of one of the parameters and very small values of the other parameter. A numerical example and further comments about this rear edge boundary line (sample excess kurtosis − (3/2)(sample skewness)² = 0) are given elsewhere in this article. As remarked by Karl Pearson himself, this issue may not be of much practical importance, as this trouble arises only for very skewed J-shaped (or mirror-image J-shaped) distributions with very different values of shape parameters that are unlikely to occur much in practice. The usual skewed-bell-shape distributions that occur in practice do not have this parameter estimation problem.

The remaining two parameters \hat{a}, \hat{c} can be determined using the sample mean and the sample variance using a variety of equations. One alternative is to calculate the support interval range (\hat{c}-\hat{a}) based on the sample variance and the sample kurtosis. For this purpose one can solve, in terms of the range (\hat{c}- \hat{a}), the equation expressing the excess kurtosis in terms of the sample variance and the sample size ν (see the sections titled "Kurtosis" and "Alternative parametrizations, four parameters"):

:\text{excess kurtosis} =\frac{6}{(2 + \hat{\nu})(3 + \hat{\nu})}\bigg(\frac{(\hat{c}- \hat{a})^2}{\text{(sample variance)}} - 6 - 5 \hat{\nu} \bigg)

to obtain:

: (\hat{c}- \hat{a}) = \sqrt{\text{(sample variance)}}\sqrt{6+5\hat{\nu}+\frac{(2+\hat{\nu})(3+\hat{\nu})}{6}\text{(sample excess kurtosis)}}

Another alternative is to calculate the support interval range (\hat{c}-\hat{a}) based on the sample variance and the sample skewness. For this purpose one can solve, in terms of the range (\hat{c}-\hat{a}), the equation expressing the squared skewness in terms of the sample variance and the sample size ν (see the sections titled "Skewness" and "Alternative parametrizations, four parameters"):

:(\text{skewness})^2 = \frac{4}{(2+\hat{\nu})^2}\bigg(\frac{(\hat{c}- \hat{a})^2}{\text{(sample variance)}}-4(1+\hat{\nu})\bigg)

to obtain:

: (\hat{c}- \hat{a}) = \frac{\sqrt{\text{(sample variance)}}}{2}\sqrt{(2+\hat{\nu})^2(\text{sample skewness})^2+16(1+\hat{\nu})}

The remaining parameter can be determined from the sample mean and the previously obtained parameters (\hat{c}-\hat{a}), \hat{\alpha}, \hat{\nu} = \hat{\alpha}+\hat{\beta}:

:  \hat{a} = (\text{sample mean}) -  \left(\frac{\hat{\alpha}}{\hat{\nu}}\right)(\hat{c}-\hat{a})

and finally, \hat{c}= (\hat{c}- \hat{a}) + \hat{a}.

In the above formulas one may take, for example, as estimates of the sample moments:

:\begin{align}
\text{sample mean} &=\overline{y} = \frac{1}{N}\sum_{i=1}^N Y_i \\
\text{sample variance} &= \overline{v}_Y = \frac{1}{N-1}\sum_{i=1}^N (Y_i - \overline{y})^2 \\
\text{sample skewness} &= G_1 = \frac{N}{(N-1)(N-2)} \frac{\sum_{i=1}^N (Y_i-\overline{y})^3}{\overline{v}_Y^{3/2}} \\
\text{sample excess kurtosis} &= G_2 = \frac{N(N+1)}{(N-1)(N-2)(N-3)} \frac{\sum_{i=1}^N (Y_i - \overline{y})^4}{\overline{v}_Y^2} - \frac{3(N-1)^2}{(N-2)(N-3)}
\end{align}

The estimators ''G''1 for sample skewness and ''G''2 for sample kurtosis are used by DAP/SAS, PSPP/SPSS, and Excel. However, they are not used by BMDP and (according to Joanes and Gill) they were not used by MINITAB in 1998. Actually, Joanes and Gill in their 1998 study
concluded that the skewness and kurtosis estimators used in BMDP and in MINITAB (at that time) had smaller variance and mean-squared error in normal samples, but the skewness and kurtosis estimators used in DAP/SAS and PSPP/SPSS, namely ''G''1 and ''G''2, had smaller mean-squared error in samples from a very skewed distribution. It is for this reason that we have spelled out "sample skewness", etc., in the above formulas, to make it explicit that the user should choose the best estimator according to the problem at hand, as the best estimator for skewness and kurtosis depends on the amount of skewness (as shown by Joanes and Gill).
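The four-parameter procedure described above can be sketched in code as follows. This is an added illustration that follows the reconstruction in this section (using the ''G''1 and ''G''2 estimators via SciPy's bias-corrected skew and kurtosis); the function name and test values are ours:

```python
import numpy as np
from scipy import stats

def beta4_mom(y):
    mean, var = np.mean(y), np.var(y, ddof=1)
    skew = stats.skew(y, bias=False)                       # G1
    exkurt = stats.kurtosis(y, fisher=True, bias=False)    # G2
    nu = 3 * (exkurt - skew**2 + 2) / (1.5 * skew**2 - exkurt)
    if np.isclose(skew, 0):
        a_shape = b_shape = nu / 2
    else:
        delta = 1 / np.sqrt(1 + 16 * (nu + 1) / ((nu + 2)**2 * skew**2))
        a_shape, b_shape = nu / 2 * (1 - delta), nu / 2 * (1 + delta)
        if skew < 0:                 # alpha_hat > beta_hat for negative skewness
            a_shape, b_shape = b_shape, a_shape
    rng_ = np.sqrt(var) * np.sqrt(6 + 5 * nu + (2 + nu) * (3 + nu) * exkurt / 6)
    a_min = mean - (a_shape / nu) * rng_
    return a_shape, b_shape, a_min, a_min + rng_           # (alpha, beta, a, c)

y = stats.beta.rvs(2.0, 5.0, loc=10.0, scale=4.0, size=200_000,
                   random_state=np.random.default_rng(5))
print(beta4_mom(y))     # roughly (2, 5, 10, 14)
```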


Maximum likelihood


Two unknown parameters

As is also the case for maximum likelihood estimates for the gamma distribution, the maximum likelihood estimates for the beta distribution do not have a general closed form solution for arbitrary values of the shape parameters. If ''X''1, ..., ''XN'' are independent random variables each having a beta distribution, the joint log likelihood function for ''N'' iid observations is:

:\begin{align}
\ln\, \mathcal{L} (\alpha, \beta\mid X) &= \sum_{i=1}^N \ln \left (\mathcal{L}_i (\alpha, \beta\mid X_i) \right )\\
&= \sum_{i=1}^N \ln \left (f(X_i;\alpha,\beta) \right ) \\
&= \sum_{i=1}^N \ln \left (\frac{X_i^{\alpha-1}(1-X_i)^{\beta-1}}{\Beta(\alpha,\beta)} \right ) \\
&= (\alpha - 1)\sum_{i=1}^N \ln (X_i) + (\beta- 1)\sum_{i=1}^N  \ln (1-X_i) - N \ln \Beta(\alpha,\beta)
\end{align}

Finding the maximum with respect to a shape parameter involves taking the partial derivative with respect to the shape parameter and setting the expression equal to zero, yielding the maximum likelihood estimator of the shape parameters:

:\frac{\partial \ln \mathcal{L}(\alpha,\beta\mid X)}{\partial \alpha} = \sum_{i=1}^N \ln X_i -N\frac{\partial \ln \Beta(\alpha,\beta)}{\partial \alpha}=0
:\frac{\partial \ln \mathcal{L}(\alpha,\beta\mid X)}{\partial \beta} = \sum_{i=1}^N  \ln (1-X_i)- N\frac{\partial \ln \Beta(\alpha,\beta)}{\partial \beta}=0

where:

:\frac{\partial \ln \Beta(\alpha,\beta)}{\partial \alpha} = -\frac{\partial \ln \Gamma(\alpha+\beta)}{\partial \alpha}+ \frac{\partial \ln \Gamma(\alpha)}{\partial \alpha}+ \frac{\partial \ln \Gamma(\beta)}{\partial \alpha}=-\psi(\alpha + \beta) + \psi(\alpha) + 0
:\frac{\partial \ln \Beta(\alpha,\beta)}{\partial \beta}= - \frac{\partial \ln \Gamma(\alpha+\beta)}{\partial \beta}+ \frac{\partial \ln \Gamma(\alpha)}{\partial \beta} + \frac{\partial \ln \Gamma(\beta)}{\partial \beta}=-\psi(\alpha + \beta) + 0 + \psi(\beta)

since the digamma function, denoted ψ(α), is defined as the logarithmic derivative of the gamma function:

:\psi(\alpha) =\frac{\partial\ln \Gamma(\alpha)}{\partial\alpha}

To ensure that the values with zero tangent slope are indeed a maximum (instead of a saddle-point or a minimum) one has to also satisfy the condition that the curvature is negative. This amounts to satisfying that the second partial derivative with respect to the shape parameters is negative:

:\frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{\partial \alpha^2}= -N\frac{\partial^2\ln \Beta(\alpha,\beta)}{\partial \alpha^2}<0
:\frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{\partial \beta^2} = -N\frac{\partial^2\ln \Beta(\alpha,\beta)}{\partial \beta^2}<0

Using the previous equations, this is equivalent to:

:\frac{\partial^2\ln \Beta(\alpha,\beta)}{\partial \alpha^2} = \psi_1(\alpha)-\psi_1(\alpha + \beta) > 0
:\frac{\partial^2\ln \Beta(\alpha,\beta)}{\partial \beta^2} = \psi_1(\beta) -\psi_1(\alpha + \beta) > 0

where the trigamma function, denoted ''ψ''1(''α''), is the second of the polygamma functions, and is defined as the derivative of the digamma function:

:\psi_1(\alpha) = \frac{\partial^2\ln\Gamma(\alpha)}{\partial\alpha^2}=\, \frac{\partial\psi(\alpha)}{\partial\alpha}.

These conditions are equivalent to stating that the variances of the logarithmically transformed variables are positive, since:

:\operatorname{var}[\ln (X)] = \operatorname{E}[\ln^2 (X)] - (\operatorname{E}[\ln (X)])^2 = \psi_1(\alpha) - \psi_1(\alpha + \beta)
:\operatorname{var}[\ln (1-X)] = \operatorname{E}[\ln^2 (1-X)] - (\operatorname{E}[\ln (1-X)])^2 = \psi_1(\beta) - \psi_1(\alpha + \beta)

Therefore, the condition of negative curvature at a maximum is equivalent to the statements:

:   \operatorname{var}[\ln (X)] > 0
:   \operatorname{var}[\ln (1-X)] > 0

Alternatively, the condition of negative curvature at a maximum is also equivalent to stating that the following logarithmic derivatives of the geometric means ''GX'' and ''G(1−X)'' are positive, since:

: \psi_1(\alpha) - \psi_1(\alpha + \beta) = \frac{\partial \ln G_X}{\partial \alpha} > 0
: \psi_1(\beta)  - \psi_1(\alpha + \beta) = \frac{\partial \ln G_{(1-X)}}{\partial \beta} > 0

While these slopes are indeed positive, the other slopes are negative:

:\frac{\partial \ln G_X}{\partial \beta}, \frac{\partial \ln G_{(1-X)}}{\partial \alpha} < 0.

The slopes of the mean and the median with respect to ''α'' and ''β'' display similar sign behavior.

From the condition that at a maximum the partial derivative with respect to the shape parameter equals zero, we obtain the following system of coupled maximum likelihood estimate equations (for the average log-likelihoods) that needs to be inverted to obtain the (unknown) shape parameter estimates \hat{\alpha},\hat{\beta} in terms of the (known) average of logarithms of the samples ''X''1, ..., ''XN'':

:\begin{align}
\hat{\operatorname{E}}[\ln (X)] &= \psi(\hat{\alpha}) - \psi(\hat{\alpha} + \hat{\beta})=\frac{1}{N}\sum_{i=1}^N \ln X_i =  \ln \hat{G}_X \\
\hat{\operatorname{E}}[\ln(1-X)] &= \psi(\hat{\beta}) - \psi(\hat{\alpha} + \hat{\beta})=\frac{1}{N}\sum_{i=1}^N \ln (1-X_i)= \ln \hat{G}_{(1-X)}
\end{align}

where we recognize \ln \hat{G}_X as the logarithm of the sample geometric mean and \ln \hat{G}_{(1-X)} as the logarithm of the sample geometric mean based on (1 − ''X''), the mirror-image of ''X''. For \hat{\alpha}=\hat{\beta}, it follows that \hat{G}_X=\hat{G}_{(1-X)}.

:\begin{align}
\hat{G}_X &= \prod_{i=1}^N (X_i)^{1/N} \\
\hat{G}_{(1-X)} &= \prod_{i=1}^N (1-X_i)^{1/N}
\end{align}

These coupled equations containing digamma functions of the shape parameter estimates \hat{\alpha},\hat{\beta} must be solved by numerical methods as done, for example, by Beckman et al. Gnanadesikan et al. give numerical solutions for a few cases. N. L. Johnson and S. Kotz suggest that for "not too small" shape parameter estimates \hat{\alpha},\hat{\beta}, the logarithmic approximation to the digamma function \psi(\hat{\alpha}) \approx \ln(\hat{\alpha}-\tfrac{1}{2}) may be used to obtain initial values for an iterative solution, since the equations resulting from this approximation can be solved exactly:

:\ln \frac{\hat{\alpha} - \tfrac{1}{2}}{\hat{\alpha}+\hat{\beta}-\tfrac{1}{2}}  \approx  \ln \hat{G}_X
:\ln \frac{\hat{\beta} - \tfrac{1}{2}}{\hat{\alpha}+\hat{\beta}-\tfrac{1}{2}}\approx \ln \hat{G}_{(1-X)}

which leads to the following solution for the initial values (of the estimated shape parameters in terms of the sample geometric means) for an iterative solution:

:\hat{\alpha}\approx \tfrac{1}{2} + \frac{\hat{G}_X}{2(1-\hat{G}_X-\hat{G}_{(1-X)})} \text{ if } \hat{\alpha} >1
:\hat{\beta}\approx \tfrac{1}{2} + \frac{\hat{G}_{(1-X)}}{2(1-\hat{G}_X-\hat{G}_{(1-X)})} \text{ if } \hat{\beta} > 1

Alternatively, the estimates provided by the method of moments can instead be used as initial values for an iterative solution of the maximum likelihood coupled equations in terms of the digamma functions.

When the distribution is required over a known interval other than [0, 1]
with random variable ''X'', say [''a'', ''c''] with random variable ''Y'', then replace ln(''Xi'') in the first equation with

:\ln \frac{Y_i-a}{c-a},

and replace ln(1−''Xi'') in the second equation with

:\ln \frac{c-Y_i}{c-a}

(see "Alternative parametrizations, four parameters" section below).

If one of the shape parameters is known, the problem is considerably simplified. The following logit transformation can be used to solve for the unknown shape parameter (for skewed cases such that \hat{\alpha}\neq\hat{\beta}; otherwise, if symmetric, both (equal) parameters are known when one is known):

:\hat{\operatorname{E}} \left[\ln \left(\frac{X}{1-X} \right) \right]=\psi(\hat{\alpha}) - \psi(\hat{\beta})=\frac{1}{N}\sum_{i=1}^N \ln\frac{X_i}{1-X_i} =  \ln \hat{G}_X - \ln \left(\hat{G}_{(1-X)} \right)

This logit transformation is the logarithm of the transformation that divides the variable ''X'' by its mirror-image (''X''/(1 − ''X'')), resulting in the "inverted beta distribution" or beta prime distribution (also known as beta distribution of the second kind or Pearson's Type VI) with support [0, +∞). As previously discussed in the section "Moments of logarithmically transformed random variables," the logit transformation \ln\frac{X}{1-X}, studied by Johnson, extends the finite support [0, 1] based on the original variable ''X'' to infinite support in both directions of the real line (−∞, +∞).

If, for example, \hat{\beta} is known, the unknown parameter \hat{\alpha} can be obtained in terms of the inverse digamma function of the right hand side of this equation:

:\psi(\hat{\alpha})=\frac{1}{N}\sum_{i=1}^N \ln\frac{X_i}{1-X_i} + \psi(\hat{\beta})
:\hat{\alpha}=\psi^{-1}(\ln \hat{G}_X - \ln \hat{G}_{(1-X)} + \psi(\hat{\beta}))

In particular, if one of the shape parameters has a value of unity, for example for \hat{\beta} = 1 (the power function distribution with bounded support [0, 1]), using the identity ψ(''x'' + 1) = ψ(''x'') + 1/''x'' in the equation \psi(\hat{\alpha}) - \psi(\hat{\alpha} + \hat{\beta})= \ln \hat{G}_X, the maximum likelihood estimator for the unknown parameter \hat{\alpha} is, exactly:

:\hat{\alpha}= - \frac{1}{\frac{1}{N}\sum_{i=1}^N \ln X_i}= - \frac{1}{\ln \hat{G}_X}

The beta has support [0, 1], therefore \hat{G}_X < 1, and hence (-\ln \hat{G}_X) >0, and therefore \hat{\alpha} >0.

In conclusion, the maximum likelihood estimates of the shape parameters of a beta distribution are (in general) a complicated function of the sample geometric mean, and of the sample geometric mean based on ''(1−X)'', the mirror-image of ''X''. One may ask, if the variance (in addition to the mean) is necessary to estimate two shape parameters with the method of moments, why is the (logarithmic or geometric) variance not necessary to estimate two shape parameters with the maximum likelihood method, for which only the geometric means suffice? The answer is that the mean does not provide as much information as the geometric mean. For a beta distribution with equal shape parameters ''α'' = ''β'', the mean is exactly 1/2, regardless of the value of the shape parameters, and therefore regardless of the value of the statistical dispersion (the variance). On the other hand, the geometric mean of a beta distribution with equal shape parameters ''α'' = ''β'' depends on the value of the shape parameters, and therefore it contains more information. Also, the geometric mean of a beta distribution does not satisfy the symmetry conditions satisfied by the mean; therefore, by employing both the geometric mean based on ''X'' and the geometric mean based on (1 − ''X''), the maximum likelihood method is able to provide best estimates for both parameters ''α'' = ''β'', without need of employing the variance.

One can express the joint log likelihood per ''N'' iid observations in terms of the ''sufficient statistics'' (the sample geometric means) as follows:

:\frac{\ln\, \mathcal{L} (\alpha, \beta\mid X)}{N} = (\alpha - 1)\ln \hat{G}_X + (\beta- 1)\ln \hat{G}_{(1-X)}- \ln \Beta(\alpha,\beta).
We can plot the joint log likelihood per ''N'' observations for fixed values of the sample geometric means to see the behavior of the likelihood function as a function of the shape parameters α and β. In such a plot, the shape parameter estimators \hat{\alpha},\hat{\beta} correspond to the maxima of the likelihood function. See the accompanying graph that shows that all the likelihood functions intersect at α = β = 1, which corresponds to the values of the shape parameters that give the maximum entropy (the maximum entropy occurs for shape parameters equal to unity: the uniform distribution). It is evident from the plot that the likelihood function gives sharp peaks for values of the shape parameter estimators close to zero, but that for values of the shape parameter estimators greater than one, the likelihood function becomes quite flat, with less defined peaks. Obviously, the maximum likelihood parameter estimation method for the beta distribution becomes less acceptable for larger values of the shape parameter estimators, as the uncertainty in the peak definition increases with the value of the shape parameter estimators. One can arrive at the same conclusion by noticing that the expression for the curvature of the likelihood function is in terms of the geometric variances:

:\frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{\partial \alpha^2}= -\operatorname{var}[\ln X]
:\frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{\partial \beta^2} = -\operatorname{var}[\ln (1-X)]

These variances (and therefore the curvatures) are much larger for small values of the shape parameters α and β. However, for shape parameter values α, β > 1, the variances (and therefore the curvatures) flatten out. Equivalently, this result follows from the Cramér–Rao bound, since the Fisher information matrix components for the beta distribution are these logarithmic variances. The Cramér–Rao bound states that the variance of any ''unbiased'' estimator \hat{\alpha} of α is bounded by the reciprocal of the Fisher information:

:\mathrm{var}(\hat{\alpha})\geq\frac{1}{\mathcal{I}_{\alpha, \alpha}}=\frac{1}{\psi_1(\alpha) - \psi_1(\alpha + \beta)}
:\mathrm{var}(\hat{\beta}) \geq\frac{1}{\mathcal{I}_{\beta, \beta}}=\frac{1}{\psi_1(\beta) - \psi_1(\alpha + \beta)}

so the variance of the estimators increases with increasing α and β, as the logarithmic variances decrease.

Also one can express the joint log likelihood per ''N'' iid observations in terms of the digamma function expressions for the logarithms of the sample geometric means as follows:

:\frac{\ln\, \mathcal{L} (\alpha, \beta\mid X)}{N} = (\alpha - 1)(\psi(\hat{\alpha}) - \psi(\hat{\alpha} + \hat{\beta}))+(\beta- 1)(\psi(\hat{\beta}) - \psi(\hat{\alpha} + \hat{\beta}))- \ln \Beta(\alpha,\beta)

This expression is identical to the negative of the cross-entropy (see section on "Quantities of information (entropy)"). Therefore, finding the maximum of the joint log likelihood of the shape parameters, per ''N'' iid observations, is identical to finding the minimum of the cross-entropy for the beta distribution, as a function of the shape parameters:

:\frac{\ln\, \mathcal{L} (\alpha, \beta\mid X)}{N} = - H = -h - D_{\mathrm{KL}} = -\ln\Beta(\alpha,\beta)+(\alpha-1)\psi(\hat{\alpha})+(\beta-1)\psi(\hat{\beta})-(\alpha+\beta-2)\psi(\hat{\alpha}+\hat{\beta})

with the cross-entropy defined as follows:

:H = \int_{0}^1 - f(X;\hat{\alpha},\hat{\beta}) \ln (f(X;\alpha,\beta)) \, {\rm d}X
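A small numerical solver for the coupled digamma equations, using the Johnson–Kotz approximation for the starting values, can be sketched as follows (an added illustration; the helper names are ours and the root-finder choice is arbitrary):

```python
import numpy as np
from scipy import stats
from scipy.special import digamma
from scipy.optimize import fsolve

def beta_mle(x):
    lng_x = np.mean(np.log(x))        # ln of the sample geometric mean of X
    lng_1mx = np.mean(np.log1p(-x))   # ln of the sample geometric mean of 1-X
    gx, g1mx = np.exp(lng_x), np.exp(lng_1mx)
    # starting values from the approximation psi(z) ~ ln(z - 1/2)
    a0 = 0.5 + gx / (2 * (1 - gx - g1mx))
    b0 = 0.5 + g1mx / (2 * (1 - gx - g1mx))

    def equations(params):
        a, b = params
        return (digamma(a) - digamma(a + b) - lng_x,
                digamma(b) - digamma(a + b) - lng_1mx)

    return fsolve(equations, (a0, b0))

data = stats.beta.rvs(3.0, 7.0, size=100_000,
                      random_state=np.random.default_rng(6))
print(beta_mle(data))   # close to (3, 7)
```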


Four unknown parameters

The procedure is similar to the one followed in the two unknown parameter case. If ''Y''1, ..., ''YN'' are independent random variables each having a beta distribution with four parameters, the joint log likelihood function for ''N'' iid observations is:

:\begin{align}
\ln\, \mathcal{L} (\alpha, \beta, a, c\mid Y) &= \sum_{i=1}^N \ln\,\mathcal{L}_i (\alpha, \beta, a, c\mid Y_i)\\
&= \sum_{i=1}^N \ln\,f(Y_i; \alpha, \beta, a, c) \\
&= \sum_{i=1}^N \ln\,\frac{(Y_i-a)^{\alpha-1}(c-Y_i)^{\beta-1}}{(c-a)^{\alpha+\beta-1}\Beta(\alpha,\beta)}\\
&= (\alpha - 1)\sum_{i=1}^N  \ln (Y_i - a) + (\beta- 1)\sum_{i=1}^N  \ln (c - Y_i)- N \ln \Beta(\alpha,\beta) - N (\alpha+\beta - 1) \ln (c - a)
\end{align}

Finding the maximum with respect to a shape parameter involves taking the partial derivative with respect to the shape parameter and setting the expression equal to zero, yielding the maximum likelihood estimator of the shape parameters:

:\frac{\partial \ln \mathcal{L}(\alpha, \beta, a, c\mid Y)}{\partial \alpha}= \sum_{i=1}^N  \ln (Y_i - a) - N(-\psi(\alpha + \beta) + \psi(\alpha))- N \ln (c - a)= 0
:\frac{\partial \ln \mathcal{L}(\alpha, \beta, a, c\mid Y)}{\partial \beta} = \sum_{i=1}^N  \ln (c - Y_i) - N(-\psi(\alpha + \beta)  + \psi(\beta))- N \ln (c - a)= 0
:\frac{\partial \ln \mathcal{L}(\alpha, \beta, a, c\mid Y)}{\partial a} = -(\alpha - 1) \sum_{i=1}^N  \frac{1}{Y_i - a} \,+ N (\alpha+\beta - 1)\frac{1}{c - a}= 0
:\frac{\partial \ln \mathcal{L}(\alpha, \beta, a, c\mid Y)}{\partial c} = (\beta- 1) \sum_{i=1}^N  \frac{1}{c - Y_i} \,- N (\alpha+\beta - 1) \frac{1}{c - a} = 0

These equations can be re-arranged as the following system of four coupled equations (the first two equations are geometric means and the second two equations are harmonic means) in terms of the maximum likelihood estimates for the four parameters \hat{\alpha}, \hat{\beta}, \hat{a}, \hat{c}:

:\frac{1}{N}\sum_{i=1}^N  \ln \frac{Y_i - \hat{a}}{\hat{c}-\hat{a}} = \psi(\hat{\alpha})-\psi(\hat{\alpha} +\hat{\beta} )=  \ln \hat{G}_X
:\frac{1}{N}\sum_{i=1}^N  \ln \frac{\hat{c} - Y_i}{\hat{c}-\hat{a}} =  \psi(\hat{\beta})-\psi(\hat{\alpha} + \hat{\beta})=  \ln \hat{G}_{(1-X)}
:\frac{1}{\frac{1}{N}\sum_{i=1}^N \frac{\hat{c}-\hat{a}}{Y_i - \hat{a}}} = \frac{\hat{\alpha}-1}{\hat{\alpha}+\hat{\beta}-1}=  \hat{H}_X
:\frac{1}{\frac{1}{N}\sum_{i=1}^N \frac{\hat{c}-\hat{a}}{\hat{c} - Y_i}} = \frac{\hat{\beta}-1}{\hat{\alpha}+\hat{\beta}-1} =  \hat{H}_{(1-X)}

with sample geometric means:

:\hat{G}_X = \prod_{i=1}^{N} \left (\frac{Y_i-\hat{a}}{\hat{c}-\hat{a}} \right )^{1/N}
:\hat{G}_{(1-X)} = \prod_{i=1}^{N} \left (\frac{\hat{c}-Y_i}{\hat{c}-\hat{a}} \right )^{1/N}

The parameters \hat{a}, \hat{c} are embedded inside the geometric mean expressions in a nonlinear way (to the power 1/''N''). This precludes, in general, a closed form solution, even for an initial value approximation for iteration purposes. One alternative is to use as initial values for iteration the values obtained from the method of moments solution for the four parameter case. Furthermore, the expressions for the harmonic means are well-defined only for \hat{\alpha}, \hat{\beta} > 1, which precludes a maximum likelihood solution for shape parameters less than unity in the four-parameter case. Fisher's information matrix for the four parameter case is positive-definite only for α, β > 2 (for further discussion, see section on Fisher information matrix, four parameter case), that is, for bell-shaped (symmetric or unsymmetric) beta distributions, with inflection points located to either side of the mode. The following Fisher information components (that represent the expectations of the curvature of the log likelihood function) have singularities at the following values:

:\alpha = 2: \quad \operatorname{E} \left [- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial a^2} \right ]= {\mathcal{I}}_{a, a}
:\beta = 2: \quad \operatorname{E}\left [- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial c^2} \right ] = {\mathcal{I}}_{c, c}
:\alpha = 1: \quad \operatorname{E}\left [- \frac{1}{N}\frac{\partial^2\ln \mathcal{L}}{\partial \alpha\, \partial a}\right ] = {\mathcal{I}}_{\alpha, a}
:\beta = 1: \quad \operatorname{E}\left [- \frac{1}{N}\frac{\partial^2\ln \mathcal{L}}{\partial \beta\, \partial c} \right ] = {\mathcal{I}}_{\beta, c}

(for further discussion see section on Fisher information matrix). Thus, it is not possible to strictly carry on the maximum likelihood estimation for some well known distributions belonging to the four-parameter beta distribution family, like the continuous uniform distribution (Beta(1, 1, ''a'', ''c'')) and the arcsine distribution (Beta(1/2, 1/2, ''a'', ''c'')). N. L. Johnson and S. Kotz ignore the equations for the harmonic means and instead suggest "If a and c are unknown, and maximum likelihood estimators of ''a'', ''c'', α and β are required, the above procedure (for the two unknown parameter case, with ''X'' transformed as ''X'' = (''Y'' − ''a'')/(''c'' − ''a'')) can be repeated using a succession of trial values of ''a'' and ''c'', until the pair (''a'', ''c'') for which maximum likelihood (given ''a'' and ''c'') is as great as possible, is attained" (where, for the purpose of clarity, their notation for the parameters has been translated into the present notation).
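The Johnson–Kotz suggestion quoted above amounts to profiling the likelihood over trial endpoints. A rough sketch of that idea (an added illustration; the grid choices, sample sizes and names are arbitrary, and SciPy's beta.fit is used for the inner two-parameter fit with the endpoints held fixed):

```python
import numpy as np
from scipy import stats

def beta4_profile_mle(y, n_grid=15):
    span = y.max() - y.min()
    best = None
    for a in np.linspace(y.min() - 0.5 * span, y.min() - 1e-3 * span, n_grid):
        for c in np.linspace(y.max() + 1e-3 * span, y.max() + 0.5 * span, n_grid):
            # two-parameter MLE with the trial endpoints a, c held fixed
            alpha, beta_, _, _ = stats.beta.fit(y, floc=a, fscale=c - a)
            ll = stats.beta.logpdf(y, alpha, beta_, loc=a, scale=c - a).sum()
            if best is None or ll > best[0]:
                best = (ll, alpha, beta_, a, c)
    return best[1:]          # (alpha_hat, beta_hat, a_hat, c_hat)

y = stats.beta.rvs(3.0, 4.0, loc=2.0, scale=5.0, size=10_000,
                   random_state=np.random.default_rng(7))
print(beta4_profile_mle(y))  # roughly (3, 4, 2, 7)
```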


Fisher information matrix

Let a random variable ''X'' have a probability density ''f''(''x'';''α''). The partial derivative with respect to the (unknown, and to be estimated) parameter α of the log likelihood function is called the score. The second moment of the score is called the Fisher information:

:\mathcal{I}(\alpha)=\operatorname{E} \left [\left (\frac{\partial}{\partial\alpha} \ln \mathcal{L}(\alpha\mid X) \right )^2 \right].

The expectation of the score is zero, therefore the Fisher information is also the second moment centered on the mean of the score: the variance of the score.

If the log likelihood function is twice differentiable with respect to the parameter α, and under certain regularity conditions, then the Fisher information may also be written as follows (which is often a more convenient form for calculation purposes):

:\mathcal{I}(\alpha) = - \operatorname{E} \left [\frac{\partial^2}{\partial\alpha^2} \ln (\mathcal{L}(\alpha\mid X)) \right].

Thus, the Fisher information is the negative of the expectation of the second derivative with respect to the parameter α of the log likelihood function. Therefore, Fisher information is a measure of the curvature of the log likelihood function of α. A low-curvature (and therefore high radius of curvature), flatter log likelihood function curve has low Fisher information, while a log likelihood function curve with large curvature (and therefore low radius of curvature) has high Fisher information. When the Fisher information matrix is computed at the parameter estimates ("the observed Fisher information matrix") it is equivalent to the replacement of the true log likelihood surface by a Taylor's series approximation, taken as far as the quadratic terms. The word information, in the context of Fisher information, refers to information about the parameters: information relevant to estimation, sufficiency and the properties of variances of estimators. The Cramér–Rao bound states that the inverse of the Fisher information is a lower bound on the variance of any unbiased estimator of a parameter α:

:\operatorname{var}[\hat\alpha] \geq \frac{1}{\mathcal{I}(\alpha)}.

The precision to which one can estimate the parameter α is limited by the Fisher information of the log likelihood function. The Fisher information is a measure of the minimum error involved in estimating a parameter of a distribution and it can be viewed as a measure of the resolving power of an experiment needed to discriminate between two alternative hypotheses about a parameter.

When there are ''N'' parameters

: \begin{bmatrix} \theta_1 \\ \theta_2 \\ \vdots \\ \theta_N \end{bmatrix},

then the Fisher information takes the form of an ''N''×''N'' positive semidefinite symmetric matrix, the Fisher information matrix, with typical element:

:{(\mathcal{I}(\theta))}_{i, j}=\operatorname{E} \left [\left (\frac{\partial}{\partial\theta_i} \ln \mathcal{L} \right) \left(\frac{\partial}{\partial\theta_j} \ln \mathcal{L} \right) \right ].

Under certain regularity conditions, the Fisher information matrix may also be written in the following form, which is often more convenient for computation:

: {(\mathcal{I}(\theta))}_{i, j} = - \operatorname{E} \left [\frac{\partial^2}{\partial\theta_i \, \partial\theta_j} \ln (\mathcal{L}) \right ]\,.

With ''X''1, ..., ''XN'' iid random variables, an ''N''-dimensional "box" can be constructed with sides ''X''1, ..., ''XN''. Costa and Cover show that the (Shannon) differential entropy ''h''(''X'') is related to the volume of the typical set (having the sample entropy close to the true entropy), while the Fisher information is related to the surface of this typical set.


Two parameters

For ''X''1, ..., ''X''''N'' independent random variables each having a beta distribution parametrized with shape parameters ''α'' and ''β'', the joint log likelihood function for ''N'' iid observations is:

:\ln (\mathcal{L} (\alpha, \beta\mid X) )= (\alpha - 1)\sum_{i=1}^N \ln X_i + (\beta- 1)\sum_{i=1}^N  \ln (1-X_i)- N \ln \Beta(\alpha,\beta)

therefore the joint log likelihood function per ''N'' iid observations is:

:\frac{1}{N} \ln(\mathcal{L} (\alpha, \beta\mid X)) = (\alpha - 1)\frac{1}{N}\sum_{i=1}^N  \ln X_i + (\beta- 1)\frac{1}{N}\sum_{i=1}^N  \ln (1-X_i)-\, \ln \Beta(\alpha,\beta)

For the two parameter case, the Fisher information has 4 components: 2 diagonal and 2 off-diagonal. Since the Fisher information matrix is symmetric, one of these off-diagonal components is independent. Therefore, the Fisher information matrix has 3 independent components (2 diagonal and 1 off-diagonal).

Aryal and Nadarajah calculated Fisher's information matrix for the four-parameter case, from which the two parameter case can be obtained as follows:

:- \frac{\partial^2\ln \mathcal{L}}{N\,\partial \alpha^2}=  \operatorname{var}[\ln (X)]= \psi_1(\alpha) - \psi_1(\alpha + \beta) ={\mathcal{I}}_{\alpha, \alpha}= \operatorname{E}\left [- \frac{\partial^2\ln \mathcal{L}}{N\,\partial \alpha^2} \right ] = \ln \operatorname{var}_{GX}
:- \frac{\partial^2\ln \mathcal{L}}{N\,\partial \beta^2} = \operatorname{var}[\ln (1-X)]= \psi_1(\beta) - \psi_1(\alpha + \beta) ={\mathcal{I}}_{\beta, \beta}=  \operatorname{E}\left [- \frac{\partial^2\ln \mathcal{L}}{N\,\partial \beta^2} \right]= \ln \operatorname{var}_{G(1-X)}
:- \frac{\partial^2\ln \mathcal{L}}{N\,\partial \alpha\,\partial \beta} = \operatorname{cov}[\ln X,\ln(1-X)]  = -\psi_1(\alpha+\beta) ={\mathcal{I}}_{\alpha, \beta}=  \operatorname{E}\left [- \frac{\partial^2\ln \mathcal{L}}{N\,\partial \alpha\,\partial \beta} \right] = \ln \operatorname{cov}_{GX,G(1-X)}

Since the Fisher information matrix is symmetric

: \mathcal{I}_{\alpha, \beta}= \mathcal{I}_{\beta, \alpha}= \ln \operatorname{cov}_{GX,G(1-X)}

The Fisher information components are equal to the log geometric variances and log geometric covariance. Therefore, they can be expressed as trigamma functions, denoted ψ1(α), the second of the polygamma functions, defined as the derivative of the digamma function:

:\psi_1(\alpha) = \frac{d^2\ln\Gamma(\alpha)}{d\alpha^2}=\, \frac{d\psi(\alpha)}{d\alpha}.

These derivatives are also derived in the section titled "Maximum likelihood", "Two unknown parameters", and plots of the log likelihood function are also shown in that section. The section on the geometric variance and covariance contains plots and further discussion of the Fisher information matrix components: the log geometric variances and log geometric covariance as a function of the shape parameters α and β. The section on moments of logarithmically transformed random variables contains formulas for these moments. Images for the Fisher information components \mathcal{I}_{\alpha, \alpha}, \mathcal{I}_{\beta, \beta} and \mathcal{I}_{\alpha, \beta} are shown in that section.

The determinant of Fisher's information matrix is of interest (for example for the calculation of the Jeffreys prior probability). From the expressions for the individual components of the Fisher information matrix, it follows that the determinant of Fisher's (symmetric) information matrix for the beta distribution is:

:\begin{align}
\det(\mathcal{I}(\alpha, \beta))&= \mathcal{I}_{\alpha, \alpha} \mathcal{I}_{\beta, \beta}-\mathcal{I}_{\alpha, \beta} \mathcal{I}_{\beta, \alpha} \\
&=(\psi_1(\alpha) - \psi_1(\alpha + \beta))(\psi_1(\beta) - \psi_1(\alpha + \beta))-( -\psi_1(\alpha+\beta))( -\psi_1(\alpha+\beta))\\
&= \psi_1(\alpha)\psi_1(\beta)-( \psi_1(\alpha)+\psi_1(\beta))\psi_1(\alpha + \beta)\\
\lim_{\alpha\to 0} \det(\mathcal{I}(\alpha, \beta)) &=\lim_{\beta \to 0} \det(\mathcal{I}(\alpha, \beta)) = \infty\\
\lim_{\alpha\to \infty} \det(\mathcal{I}(\alpha, \beta)) &=\lim_{\beta \to \infty} \det(\mathcal{I}(\alpha, \beta)) = 0
\end{align}

From Sylvester's criterion (checking that the leading principal minors, i.e. the first diagonal element and the determinant, are positive), it follows that the Fisher information matrix for the two parameter case is positive-definite (under the standard condition that the shape parameters are positive, ''α'' > 0 and ''β'' > 0).
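The two-parameter Fisher information matrix (and hence the Jeffreys prior, which is proportional to the square root of its determinant) is straightforward to evaluate from trigamma functions; the snippet below is an added illustration:

```python
import numpy as np
from scipy.special import polygamma

def beta_fisher_info(a, b):
    trigamma = lambda z: polygamma(1, z)        # psi_1
    i_aa = trigamma(a) - trigamma(a + b)        # var[ln X]
    i_bb = trigamma(b) - trigamma(a + b)        # var[ln(1 - X)]
    i_ab = -trigamma(a + b)                     # cov[ln X, ln(1 - X)]
    return np.array([[i_aa, i_ab], [i_ab, i_bb]])

I = beta_fisher_info(2.0, 3.0)
det = np.linalg.det(I)       # = psi1(a) psi1(b) - (psi1(a) + psi1(b)) psi1(a+b)
print(I, det, np.sqrt(det))  # sqrt(det) is proportional to the Jeffreys prior at (2, 3)
```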


Four parameters

If ''Y''₁, ..., ''Y''_''N'' are independent random variables each having a beta distribution with four parameters: the exponents ''α'' and ''β'', and also ''a'' (the minimum of the distribution range) and ''c'' (the maximum of the distribution range) (see the section titled "Alternative parametrizations, Four parameters"), with probability density function:

:f(y; \alpha, \beta, a, c) = \frac{(y-a)^{\alpha-1}(c-y)^{\beta-1}}{(c-a)^{\alpha+\beta-1}\Beta(\alpha,\beta)},

the joint log likelihood function per ''N'' iid observations is:

:\frac{1}{N} \ln(\mathcal{L} (\alpha, \beta, a, c\mid Y))= \frac{\alpha -1}{N}\sum_{i=1}^N \ln (Y_i - a) + \frac{\beta-1}{N}\sum_{i=1}^N \ln (c - Y_i)- \ln \Beta(\alpha,\beta) - (\alpha+\beta -1) \ln (c-a)

For the four parameter case, the Fisher information has 4×4 = 16 components. It has 12 off-diagonal components (16 total − 4 diagonal). Since the Fisher information matrix is symmetric, half of these components (12/2 = 6) are independent. Therefore, the Fisher information matrix has 6 independent off-diagonal + 4 diagonal = 10 independent components. Aryal and Nadarajah calculated Fisher's information matrix for the four parameter case as follows:

:- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial \alpha^2}= \operatorname{var}[\ln (X)]= \psi_1(\alpha) - \psi_1(\alpha + \beta) = \mathcal{I}_{\alpha, \alpha}= \operatorname{E}\left [- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial \alpha^2} \right ] = \ln (\operatorname{var}_{GX})

:-\frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial \beta^2} = \operatorname{var}[\ln (1-X)]= \psi_1(\beta) - \psi_1(\alpha + \beta) = \mathcal{I}_{\beta, \beta}= \operatorname{E} \left [- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial \beta^2} \right ] = \ln(\operatorname{var}_{G(1-X)})

:-\frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial \alpha\,\partial \beta} = \operatorname{cov}[\ln X,\ln(1-X)] = -\psi_1(\alpha+\beta) =\mathcal{I}_{\alpha, \beta}= \operatorname{E} \left [- \frac{1}{N}\frac{\partial^2\ln \mathcal{L}}{\partial \alpha\,\partial \beta} \right ] = \ln(\operatorname{cov}_{G(X,1-X)})

In the above expressions, the use of ''X'' instead of ''Y'' in the expressions var[ln(''X'')] = ln(var_''GX'') is ''not an error''. The expressions in terms of the log geometric variances and log geometric covariance occur as functions of the two-parameter ''X'' ~ Beta(''α'', ''β'') parametrization because, when taking the partial derivatives with respect to the exponents (''α'', ''β'') in the four parameter case, one obtains expressions identical to those for the two parameter case: these terms of the four parameter Fisher information matrix are independent of the minimum ''a'' and maximum ''c'' of the distribution's range. The only non-zero term upon double differentiation of the log likelihood function with respect to the exponents ''α'' and ''β'' is the second derivative of the log of the beta function: ln(B(''α'', ''β'')). This term is independent of the minimum ''a'' and maximum ''c'' of the distribution's range. Double differentiation of this term results in trigamma functions. The sections titled "Maximum likelihood", "Two unknown parameters" and "Four unknown parameters" also show this fact.

The Fisher information for ''N'' i.i.d. samples is ''N'' times the individual Fisher information (eq. 11.279, page 394 of Cover and Thomas). (Aryal and Nadarajah take a single observation, ''N'' = 1, to calculate the following components of the Fisher information, which leads to the same result as considering the derivatives of the log likelihood per ''N'' observations. Moreover, below the erroneous expression for \mathcal{I}_{a, a} in Aryal and Nadarajah has been corrected.)

:\begin{align}
\alpha > 2: \quad \operatorname{E}\left [- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial a^2} \right ] &= \mathcal{I}_{a, a}=\frac{\beta(\alpha+\beta-1)}{(\alpha-2)(c-a)^2} \\
\beta > 2: \quad \operatorname{E}\left[-\frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial c^2} \right ] &= \mathcal{I}_{c, c} = \frac{\alpha(\alpha+\beta-1)}{(\beta-2)(c-a)^2} \\
\operatorname{E}\left[- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial a\,\partial c} \right ] &= \mathcal{I}_{a, c} = \frac{\alpha+\beta-1}{(c-a)^2} \\
\alpha > 1: \quad \operatorname{E}\left[- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial \alpha\,\partial a} \right ] &=\mathcal{I}_{\alpha, a} = \frac{\beta}{(\alpha-1)(c-a)} \\
\operatorname{E}\left[- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial \alpha\,\partial c} \right ] &= \mathcal{I}_{\alpha, c} = \frac{1}{c-a} \\
\operatorname{E}\left[- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial \beta\,\partial a} \right ] &= \mathcal{I}_{\beta, a} = -\frac{1}{c-a} \\
\beta > 1: \quad \operatorname{E}\left[- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial \beta\,\partial c} \right ] &= \mathcal{I}_{\beta, c} = -\frac{\alpha}{(\beta-1)(c-a)}
\end{align}

The lower two diagonal entries of the Fisher information matrix, with respect to the parameter ''a'' (the minimum of the distribution's range), \mathcal{I}_{a, a}, and with respect to the parameter ''c'' (the maximum of the distribution's range), \mathcal{I}_{c, c}, are only defined for exponents α > 2 and β > 2 respectively. The Fisher information matrix component \mathcal{I}_{a, a} for the minimum ''a'' approaches infinity for exponent α approaching 2 from above, and the Fisher information matrix component \mathcal{I}_{c, c} for the maximum ''c'' approaches infinity for exponent β approaching 2 from above.

The Fisher information matrix for the four parameter case does not depend on the individual values of the minimum ''a'' and the maximum ''c'', but only on the total range (''c''−''a''). Moreover, the components of the Fisher information matrix that depend on the range (''c''−''a'') depend only through its inverse (or the square of the inverse), such that the Fisher information decreases for increasing range (''c''−''a'').

The accompanying images show the Fisher information components \mathcal{I}_{a, a} and \mathcal{I}_{\alpha, a}. Images for the Fisher information components \mathcal{I}_{\alpha, \alpha} and \mathcal{I}_{\beta, \beta} are shown in an earlier section. All these Fisher information components look like a basin, with the "walls" of the basin located at low values of the parameters.

The following four-parameter beta distribution Fisher information components can be expressed in terms of the two-parameter ''X'' ~ Beta(α, β) expectations of the transformed ratio ((1−''X'')/''X'') and of its mirror image (''X''/(1−''X'')), scaled by the range (''c''−''a''), which may be helpful for interpretation:

:\mathcal{I}_{\alpha, a} =\frac{\operatorname{E}\left[\frac{1-X}{X}\right]}{c-a}= \frac{\beta}{(\alpha-1)(c-a)} \text{ if }\alpha > 1

:\mathcal{I}_{\beta, c} = -\frac{\operatorname{E}\left[\frac{X}{1-X}\right]}{c-a}=- \frac{\alpha}{(\beta-1)(c-a)}\text{ if }\beta> 1

These are also the expected values of the "inverted beta distribution" or beta prime distribution (also known as beta distribution of the second kind or Pearson's Type VI) and its mirror image, scaled by the range (''c'' − ''a'').

Also, the following Fisher information components can be expressed in terms of the harmonic (1/''X'') variances or of variances based on the ratio transformed variables ((1−''X'')/''X'') as follows:

:\begin{align}
\alpha > 2: \quad \mathcal{I}_{a,a} &=\operatorname{var} \left [\frac{1-X}{X} \right] \left (\frac{\alpha-1}{c-a} \right )^2 =\operatorname{var} \left [\frac{1}{X} \right ] \left (\frac{\alpha-1}{c-a} \right)^2 = \frac{\beta(\alpha+\beta-1)}{(\alpha-2)(c-a)^2} \\
\beta > 2: \quad \mathcal{I}_{c, c} &= \operatorname{var} \left [\frac{X}{1-X} \right ] \left (\frac{\beta-1}{c-a} \right )^2 = \operatorname{var} \left [\frac{1}{1-X} \right ] \left (\frac{\beta-1}{c-a} \right )^2 =\frac{\alpha(\alpha+\beta-1)}{(\beta-2)(c-a)^2} \\
\mathcal{I}_{a, c} &=-\operatorname{cov} \left [\frac{1-X}{X},\frac{X}{1-X} \right ]\frac{(\alpha-1)(\beta-1)}{(c-a)^2} = -\operatorname{cov} \left [\frac{1}{X},\frac{1}{1-X} \right ] \frac{(\alpha-1)(\beta-1)}{(c-a)^2} =\frac{\alpha+\beta-1}{(c-a)^2}
\end{align}

See the section "Moments of linearly transformed, product and inverted random variables" for these expectations.

The determinant of Fisher's information matrix is of interest (for example, for the calculation of Jeffreys prior probability). From the expressions for the individual components, the determinant of Fisher's (symmetric) information matrix for the beta distribution with four parameters is the determinant of the 4×4 symmetric matrix built from the ten independent components above:

:\det(\mathcal{I}(\alpha,\beta,a,c)) = \det\begin{pmatrix}
\mathcal{I}_{\alpha,\alpha} & \mathcal{I}_{\alpha,\beta} & \mathcal{I}_{\alpha,a} & \mathcal{I}_{\alpha,c}\\
\mathcal{I}_{\alpha,\beta} & \mathcal{I}_{\beta,\beta} & \mathcal{I}_{\beta,a} & \mathcal{I}_{\beta,c}\\
\mathcal{I}_{\alpha,a} & \mathcal{I}_{\beta,a} & \mathcal{I}_{a,a} & \mathcal{I}_{a,c}\\
\mathcal{I}_{\alpha,c} & \mathcal{I}_{\beta,c} & \mathcal{I}_{a,c} & \mathcal{I}_{c,c}
\end{pmatrix}\text{ if }\alpha, \beta> 2

Using Sylvester's criterion (checking whether the diagonal elements are all positive), and since the diagonal components \mathcal{I}_{a, a} and \mathcal{I}_{c, c} have singularities at α = 2 and β = 2, it follows that the Fisher information matrix for the four parameter case is positive-definite for α > 2 and β > 2. Since for α > 2 and β > 2 the beta distribution is (symmetric or unsymmetric) bell shaped, it follows that the Fisher information matrix is positive-definite only for bell-shaped (symmetric or unsymmetric) beta distributions, with inflection points located to either side of the mode. Thus, important well known distributions belonging to the four-parameter beta distribution family, like the parabolic distribution (Beta(2,2,''a'',''c'')) and the uniform distribution (Beta(1,1,''a'',''c'')), have Fisher information components (\mathcal{I}_{a, a},\mathcal{I}_{c, c},\mathcal{I}_{\alpha, a},\mathcal{I}_{\beta, c}) that blow up (approach infinity) in the four-parameter case (although their Fisher information components are all defined for the two parameter case). The four-parameter Wigner semicircle distribution (Beta(3/2,3/2,''a'',''c'')) and arcsine distribution (Beta(1/2,1/2,''a'',''c'')) have negative Fisher information determinants for the four-parameter case.
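To make the component formulas above concrete, the following illustrative Python sketch (the helper name and the example parameter values are arbitrary) assembles the 4×4 Fisher information matrix per observation and inspects its eigenvalues, which should all be positive for α, β > 2, in line with the positive-definiteness statement above:

```python
import numpy as np
from scipy.special import polygamma


def trigamma(x):
    return polygamma(1, x)


def fisher_info_beta4(alpha, beta, a, c):
    """4x4 Fisher information per observation, parameter order (alpha, beta, a, c).
    The entries involving a and c require alpha > 2 and beta > 2."""
    r = c - a
    i_alpha_alpha = trigamma(alpha) - trigamma(alpha + beta)
    i_beta_beta = trigamma(beta) - trigamma(alpha + beta)
    i_alpha_beta = -trigamma(alpha + beta)
    i_a_a = beta * (alpha + beta - 1) / ((alpha - 2) * r**2)
    i_c_c = alpha * (alpha + beta - 1) / ((beta - 2) * r**2)
    i_a_c = (alpha + beta - 1) / r**2
    i_alpha_a = beta / ((alpha - 1) * r)
    i_alpha_c = 1.0 / r
    i_beta_a = -1.0 / r
    i_beta_c = -alpha / ((beta - 1) * r)
    return np.array([
        [i_alpha_alpha, i_alpha_beta, i_alpha_a, i_alpha_c],
        [i_alpha_beta,  i_beta_beta,  i_beta_a,  i_beta_c],
        [i_alpha_a,     i_beta_a,     i_a_a,     i_a_c],
        [i_alpha_c,     i_beta_c,     i_a_c,     i_c_c],
    ])


M = fisher_info_beta4(3.0, 4.0, 0.0, 10.0)
print(np.linalg.eigvalsh(M))   # expected: all positive, since alpha, beta > 2
```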


Bayesian inference

The use of beta distributions in Bayesian inference is due to the fact that they provide a family of conjugate prior probability distributions for binomial (including Bernoulli) and geometric distributions. The domain of the beta distribution can be viewed as a probability, and in fact the beta distribution is often used to describe the distribution of a probability value ''p'':

:P(p;\alpha,\beta) = \frac{p^{\alpha-1}(1-p)^{\beta-1}}{\Beta(\alpha,\beta)}.

Examples of beta distributions used as prior probabilities to represent ignorance of prior parameter values in Bayesian inference are Beta(1,1), Beta(0,0) and Beta(1/2,1/2).
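For example, conjugacy means that a Beta(α, β) prior on ''p'', combined with ''s'' observed successes and ''f'' failures, yields a Beta(α + ''s'', β + ''f'') posterior. A minimal illustrative sketch in Python (the observation counts below are made-up numbers):

```python
from scipy import stats

alpha_prior, beta_prior = 1.0, 1.0      # Bayes-Laplace uniform prior Beta(1,1)
s, f = 7, 3                             # illustrative successes and failures

posterior = stats.beta(alpha_prior + s, beta_prior + f)    # conjugate update
print("posterior mean:", posterior.mean())                 # (alpha+s)/(alpha+beta+s+f)
print("95% credible interval:", posterior.interval(0.95))
```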


Rule of succession

A classic application of the beta distribution is the rule of succession, introduced in the 18th century by Pierre-Simon Laplace in the course of treating the sunrise problem. It states that, given ''s'' successes in ''n'' conditionally independent Bernoulli trials with probability ''p'', the estimate of the expected value in the next trial is \frac{s+1}{n+2}. This estimate is the expected value of the posterior distribution over ''p'', namely Beta(''s''+1, ''n''−''s''+1), which is given by Bayes' rule if one assumes a uniform prior probability over ''p'' (i.e., Beta(1, 1)) and then observes that ''p'' generated ''s'' successes in ''n'' trials. Laplace's rule of succession has been criticized by prominent scientists. R. T. Cox described Laplace's application of the rule of succession to the sunrise problem (p. 89) as "a travesty of the proper use of the principle". Keynes remarks (Ch. XXX, p. 382) "indeed this is so foolish a theorem that to entertain it is discreditable". Karl Pearson showed that the probability that the next (''n'' + 1) trials will be successes, after ''n'' successes in ''n'' trials, is only 50%, which has been considered too low by scientists like Jeffreys and unacceptable as a representation of the scientific process of experimentation to test a proposed scientific law. As pointed out by Jeffreys (p. 128) (crediting C. D. Broad), Laplace's rule of succession establishes a high probability of success ((''n''+1)/(''n''+2)) in the next trial, but only a moderate probability (50%) that a further sample (''n''+1) comparable in size will be equally successful. As pointed out by Perks, "The rule of succession itself is hard to accept. It assigns a probability to the next trial which implies the assumption that the actual run observed is an average run and that we are always at the end of an average run. It would, one would think, be more reasonable to assume that we were in the middle of an average run. Clearly a higher value for both probabilities is necessary if they are to accord with reasonable belief." These problems with Laplace's rule of succession motivated Haldane, Perks, Jeffreys and others to search for other forms of prior probability (see the following sections). According to Jaynes, the main problem with the rule of succession is that it is not valid when ''s'' = 0 or ''s'' = ''n'' (see rule of succession, for an analysis of its validity).


Bayes-Laplace prior probability (Beta(1,1))

The beta distribution achieves maximum differential entropy for Beta(1,1): the uniform probability density, for which all values in the domain of the distribution have equal density. This uniform distribution Beta(1,1) was suggested ("with a great deal of doubt") by Thomas Bayes as the prior probability distribution to express ignorance about the correct prior distribution. This prior distribution was adopted (apparently, from his writings, with little sign of doubt) by Pierre-Simon Laplace, and hence it was also known as the "Bayes-Laplace rule" or the "Laplace rule" of "inverse probability" in publications of the first half of the 20th century. In the later part of the 19th century and early part of the 20th century, scientists realized that the assumption of uniform "equal" probability density depended on the actual functions (for example whether a linear or a logarithmic scale was most appropriate) and parametrizations used. In particular, the behavior near the ends of distributions with finite support (for example near ''x'' = 0, for a distribution with initial support at ''x'' = 0) required particular attention. Keynes (Ch. XXX, p. 381) criticized the use of Bayes's uniform prior probability (Beta(1,1)), under which all values between zero and one are equiprobable, as follows: "Thus experience, if it shows anything, shows that there is a very marked clustering of statistical ratios in the neighborhoods of zero and unity, of those for positive theories and for correlations between positive qualities in the neighborhood of zero, and of those for negative theories and for correlations between negative qualities in the neighborhood of unity."


Haldane's prior probability (Beta(0,0))

The Beta(0,0) distribution was proposed by J.B.S. Haldane, who suggested that the prior probability representing complete uncertainty should be proportional to ''p''^{−1}(1−''p'')^{−1}. The function ''p''^{−1}(1−''p'')^{−1} can be viewed as the limit of the numerator of the beta distribution as both shape parameters approach zero: α, β → 0. The Beta function (in the denominator of the beta distribution) approaches infinity as both parameters approach zero, α, β → 0. Therefore, ''p''^{−1}(1−''p'')^{−1} divided by the Beta function approaches a 2-point Bernoulli distribution with equal probability 1/2 at each end, at 0 and 1, and nothing in between, as α, β → 0: a coin toss, with one face of the coin at 0 and the other face at 1. The Haldane prior probability distribution Beta(0,0) is an "improper prior" because its integration (from 0 to 1) fails to strictly converge to 1 due to the singularities at each end. However, this is not an issue for computing posterior probabilities unless the sample size is very small. Furthermore, Zellner points out that on the log-odds scale (the logit transformation ln(''p''/(1−''p''))), the Haldane prior is the uniformly flat prior. The fact that a uniform prior probability on the logit transformed variable ln(''p''/(1−''p'')) (with domain (−∞, ∞)) is equivalent to the Haldane prior on the domain [0, 1] was pointed out by Harold Jeffreys in the first edition (1939) of his book Theory of Probability (p. 123). Jeffreys writes "Certainly if we take the Bayes-Laplace rule right up to the extremes we are led to results that do not correspond to anybody's way of thinking. The (Haldane) rule d''x''/(''x''(1−''x'')) goes too far the other way. It would lead to the conclusion that if a sample is of one type with respect to some property there is a probability 1 that the whole population is of that type." The fact that "uniform" depends on the parametrization led Jeffreys to seek a form of prior that would be invariant under different parametrizations.
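Explicitly, the equivalence follows from the change-of-variables rule: if the log-odds θ = ln(''p''/(1 − ''p'')) are assigned a flat (improper) density, then, since dθ/d''p'' = 1/(''p''(1 − ''p'')), the induced density of ''p'' is

:f(p) \propto \left|\frac{d\theta}{dp}\right| = \frac{1}{p(1-p)},

which is the unnormalized Haldane prior Beta(0,0).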


Jeffreys' prior probability (Beta(1/2,1/2) for a Bernoulli or for a binomial distribution)

Harold Jeffreys proposed to use an uninformative prior probability measure that should be invariant under reparameterization: proportional to the square root of the determinant of Fisher's information matrix. For the Bernoulli distribution, this can be shown as follows: for a coin that is "heads" with probability ''p'' ∈ [0, 1] and is "tails" with probability 1 − ''p'', for a given (H, T) ∈ {(0, 1), (1, 0)} the probability is ''p''^''H''(1 − ''p'')^''T''. Since ''T'' = 1 − ''H'', the Bernoulli distribution is ''p''^''H''(1 − ''p'')^{1 − ''H''}. Considering ''p'' as the only parameter, it follows that the log likelihood for the Bernoulli distribution is

:\ln \mathcal{L} (p\mid H) = H \ln(p)+ (1-H) \ln(1-p).

The Fisher information matrix has only one component (it is a scalar, because there is only one parameter: ''p''), therefore:

:\begin{align}
\sqrt{\mathcal{I}(p)} &= \sqrt{\operatorname{E}\!\left[\left(\frac{d}{dp} \ln \mathcal{L}(p\mid H)\right)^2\right]} \\
&= \sqrt{\operatorname{E}\!\left[\left(\frac{H}{p}-\frac{1-H}{1-p}\right)^2\right]} \\
&= \sqrt{p\frac{1}{p^2}+(1-p)\frac{1}{(1-p)^2}} \\
&= \frac{1}{\sqrt{p(1-p)}}.
\end{align}

Similarly, for the binomial distribution with ''n'' Bernoulli trials, it can be shown that

:\sqrt{\mathcal{I}(p)}= \frac{\sqrt{n}}{\sqrt{p(1-p)}}.

Thus, for the Bernoulli and binomial distributions, Jeffreys prior is proportional to \scriptstyle \frac{1}{\sqrt{p(1-p)}}, which happens to be proportional to a beta distribution with domain variable ''x'' = ''p'' and shape parameters α = β = 1/2, the arcsine distribution:

:\Beta(\tfrac{1}{2}, \tfrac{1}{2}) = \frac{1}{\pi \sqrt{x(1-x)}}.

It will be shown in the next section that the normalizing constant for Jeffreys prior is immaterial to the final result because the normalizing constant cancels out in Bayes theorem for the posterior probability. Hence Beta(1/2,1/2) is used as the Jeffreys prior for both Bernoulli and binomial distributions. As shown in the next section, when using this expression as a prior probability times the likelihood in Bayes theorem, the posterior probability turns out to be a beta distribution. It is important to realize, however, that Jeffreys prior is proportional to \scriptstyle \frac{1}{\sqrt{p(1-p)}} for the Bernoulli and binomial distribution, but not for the beta distribution. Jeffreys prior for the beta distribution is given by the determinant of Fisher's information for the beta distribution, which, as shown in the section on the Fisher information matrix, is a function of the trigamma function ψ₁ of shape parameters α and β as follows:

: \begin{align}
\sqrt{\det(\mathcal{I}(\alpha, \beta))} &= \sqrt{\psi_1(\alpha)\psi_1(\beta)-( \psi_1(\alpha)+\psi_1(\beta))\psi_1(\alpha + \beta)} \\
\lim_{\alpha\to 0} \sqrt{\det(\mathcal{I}(\alpha, \beta))} &=\lim_{\beta \to 0} \sqrt{\det(\mathcal{I}(\alpha, \beta))} = \infty\\
\lim_{\alpha\to \infty} \sqrt{\det(\mathcal{I}(\alpha, \beta))} &=\lim_{\beta \to \infty} \sqrt{\det(\mathcal{I}(\alpha, \beta))} = 0
\end{align}

As previously discussed, Jeffreys prior for the Bernoulli and binomial distributions is proportional to the arcsine distribution Beta(1/2,1/2), a one-dimensional ''curve'' that looks like a basin as a function of the parameter ''p'' of the Bernoulli and binomial distributions. The walls of the basin are formed by ''p'' approaching the singularities at the ends ''p'' → 0 and ''p'' → 1, where Beta(1/2,1/2) approaches infinity. Jeffreys prior for the beta distribution is a ''2-dimensional surface'' (embedded in a three-dimensional space) that looks like a basin with only two of its walls meeting at the corner α = β = 0 (and missing the other two walls), as a function of the shape parameters α and β of the beta distribution. The two adjoining walls of this 2-dimensional surface are formed by the shape parameters α and β approaching the singularities (of the trigamma function) at α, β → 0. It has no walls for α, β → ∞ because in this case the determinant of Fisher's information matrix for the beta distribution approaches zero.

It will be shown in the next section that Jeffreys prior probability results in posterior probabilities (when multiplied by the binomial likelihood function) that are intermediate between the posterior probability results of the Haldane and Bayes prior probabilities.

Jeffreys prior may be difficult to obtain analytically, and for some cases it simply does not exist (even for simple distribution functions like the asymmetric triangular distribution). Berger, Bernardo and Sun, in a 2009 paper, defined a reference prior probability distribution that (unlike Jeffreys prior) exists for the asymmetric triangular distribution. They cannot obtain a closed-form expression for their reference prior, but numerical calculations show it to be nearly perfectly fitted by the (proper) prior

: \operatorname{Beta}(\tfrac{1}{2}, \tfrac{1}{2}) \sim\frac{1}{\pi\sqrt{\theta(1-\theta)}}

where θ is the vertex variable for the asymmetric triangular distribution with support [0, 1] (corresponding to the following parameter values in Wikipedia's article on the triangular distribution: vertex ''c'' = ''θ'', left end ''a'' = 0, and right end ''b'' = 1). Berger et al. also give a heuristic argument that Beta(1/2,1/2) could indeed be the exact Berger–Bernardo–Sun reference prior for the asymmetric triangular distribution. Therefore, Beta(1/2,1/2) not only is Jeffreys prior for the Bernoulli and binomial distributions, but also seems to be the Berger–Bernardo–Sun reference prior for the asymmetric triangular distribution (for which the Jeffreys prior does not exist), a distribution used in project management and PERT analysis to describe the cost and duration of project tasks.

Clarke and Barron prove that, among continuous positive priors, Jeffreys prior (when it exists) asymptotically maximizes Shannon's mutual information between a sample of size ''n'' and the parameter, and therefore ''Jeffreys prior is the most uninformative prior'' (measuring information as Shannon information). The proof rests on an examination of the Kullback–Leibler divergence between probability density functions for iid random variables.


Effect of different prior probability choices on the posterior beta distribution

If samples are drawn from the population of a random variable ''X'' that result in ''s'' successes and ''f'' failures in ''n'' Bernoulli trials, ''n'' = ''s'' + ''f'', then the likelihood function for parameters ''s'' and ''f'' given ''x'' = ''p'' (the notation ''x'' = ''p'' in the expressions below will emphasize that the domain ''x'' stands for the value of the parameter ''p'' in the binomial distribution) is the following binomial distribution:

:\mathcal{L}(s,f\mid x=p) = {n \choose s} x^s(1-x)^f = {n \choose s} x^s(1-x)^{n-s}.

If beliefs about prior probability information are reasonably well approximated by a beta distribution with parameters ''α''Prior and ''β''Prior, then:

:\operatorname{PriorProb}(x=p;\alpha \operatorname{Prior},\beta \operatorname{Prior}) = \frac{x^{\alpha \operatorname{Prior}-1}(1-x)^{\beta \operatorname{Prior}-1}}{\Beta(\alpha \operatorname{Prior},\beta \operatorname{Prior})}

According to Bayes' theorem for a continuous event space, the posterior probability density is given by the product of the prior probability and the likelihood function (given the evidence ''s'' and ''f'' = ''n'' − ''s''), normalized so that the area under the curve equals one, as follows:

:\begin{align}
& \operatorname{posterior probability}(x=p\mid s,n-s) \\
= {} & \frac{\operatorname{PriorProb}(x=p;\alpha \operatorname{Prior},\beta \operatorname{Prior}) \mathcal{L}(s,f\mid x=p)}{\int_0^1 \operatorname{PriorProb}(x=p;\alpha \operatorname{Prior},\beta \operatorname{Prior}) \mathcal{L}(s,f\mid x=p)\,dx} \\
= {} & \frac{{n \choose s} x^{s+\alpha \operatorname{Prior}-1}(1-x)^{n-s+\beta \operatorname{Prior}-1}/\Beta(\alpha \operatorname{Prior},\beta \operatorname{Prior})}{\int_0^1 \left ({n \choose s} x^{s+\alpha \operatorname{Prior}-1}(1-x)^{n-s+\beta \operatorname{Prior}-1}/\Beta(\alpha \operatorname{Prior},\beta \operatorname{Prior}) \right )\,dx} \\
= {} & \frac{x^{s+\alpha \operatorname{Prior}-1}(1-x)^{n-s+\beta \operatorname{Prior}-1}}{\int_0^1 \left (x^{s+\alpha \operatorname{Prior}-1}(1-x)^{n-s+\beta \operatorname{Prior}-1}\right )\,dx} \\
= {} & \frac{x^{s+\alpha \operatorname{Prior}-1}(1-x)^{n-s+\beta \operatorname{Prior}-1}}{\Beta(s+\alpha \operatorname{Prior},n-s+\beta \operatorname{Prior})}.
\end{align}

The binomial coefficient

:{n \choose s}=\frac{n!}{s!(n-s)!}=\frac{\Gamma(n+1)}{\Gamma(s+1)\Gamma(n-s+1)}

appears both in the numerator and the denominator of the posterior probability, and it does not depend on the integration variable ''x'', hence it cancels out, and it is irrelevant to the final result. Similarly the normalizing factor for the prior probability, the beta function B(''α''Prior, ''β''Prior), cancels out and it is immaterial to the final result. The same posterior probability result can be obtained if one uses an un-normalized prior

:x^{\alpha \operatorname{Prior}-1}(1-x)^{\beta \operatorname{Prior}-1}

because the normalizing factors all cancel out. Several authors (including Jeffreys himself) thus use an un-normalized prior formula since the normalization constant cancels out. The numerator of the posterior probability ends up being just the (un-normalized) product of the prior probability and the likelihood function, and the denominator is its integral from zero to one. The beta function in the denominator, B(''s'' + ''α''Prior, ''n'' − ''s'' + ''β''Prior), appears as a normalization constant to ensure that the total posterior probability integrates to unity.

The ratio ''s''/''n'' of the number of successes to the total number of trials is a sufficient statistic in the binomial case, which is relevant for the following results.

For the Bayes prior probability (Beta(1,1)), the posterior probability is:

:\operatorname{posterior probability}(p=x\mid s,f) = \frac{x^{s}(1-x)^{n-s}}{\Beta(s+1,n-s+1)}, \text{ with mean} =\frac{s+1}{n+2}\text{ (and mode}=\frac{s}{n}\text{ if } 0 < s < n).

For the Jeffreys prior probability (Beta(1/2,1/2)), the posterior probability is:

:\operatorname{posterior probability}(p=x\mid s,f) = \frac{x^{s-\tfrac{1}{2}}(1-x)^{n-s-\tfrac{1}{2}}}{\Beta(s+\tfrac{1}{2},n-s+\tfrac{1}{2})},\text{ with mean} = \frac{s+\tfrac{1}{2}}{n+1}\text{ (and mode}= \frac{s-\tfrac{1}{2}}{n-1}\text{ if } \tfrac{1}{2} < s < n-\tfrac{1}{2}).

and for the Haldane prior probability (Beta(0,0)), the posterior probability is:

:\operatorname{posterior probability}(p=x\mid s,f) = \frac{x^{s-1}(1-x)^{n-s-1}}{\Beta(s,n-s)}, \text{ with mean} = \frac{s}{n}\text{ (and mode}= \frac{s-1}{n-2}\text{ if } 1 < s < n -1).

From the above expressions it follows that for ''s''/''n'' = 1/2 all three of the above prior probabilities result in the identical location for the posterior probability mean = mode = 1/2. For ''s''/''n'' < 1/2, the means of the posterior probabilities, using the above priors, are such that: mean for Bayes prior > mean for Jeffreys prior > mean for Haldane prior. For ''s''/''n'' > 1/2 the order of these inequalities is reversed, such that the Haldane prior probability results in the largest posterior mean. The ''Haldane'' prior probability Beta(0,0) results in a posterior probability density with ''mean'' (the expected value for the probability of success in the "next" trial) identical to the ratio ''s''/''n'' of the number of successes to the total number of trials. Therefore, the Haldane prior results in a posterior probability with expected value in the next trial equal to the maximum likelihood estimate. The ''Bayes'' prior probability Beta(1,1) results in a posterior probability density with ''mode'' identical to the ratio ''s''/''n'' (the maximum likelihood estimate).

In the case that 100% of the trials have been successful (''s'' = ''n''), the ''Bayes'' prior probability Beta(1,1) results in a posterior expected value equal to the rule of succession (''n'' + 1)/(''n'' + 2), while the Haldane prior Beta(0,0) results in a posterior expected value of 1 (absolute certainty of success in the next trial). Jeffreys prior probability results in a posterior expected value equal to (''n'' + 1/2)/(''n'' + 1). Perks (p. 303) points out:
"This provides a new rule of succession and expresses a 'reasonable' position to take up, namely, that after an unbroken run of n successes we assume a probability for the next trial equivalent to the assumption that we are about half-way through an average run, i.e. that we expect a failure once in (2''n'' + 2) trials. The Bayes–Laplace rule implies that we are about at the end of an average run or that we expect a failure once in (''n'' + 2) trials. The comparison clearly favours the new result (what is now called Jeffreys prior) from the point of view of 'reasonableness'."

Conversely, in the case that 100% of the trials have resulted in failure (''s'' = 0), the ''Bayes'' prior probability Beta(1,1) results in a posterior expected value for success in the next trial equal to 1/(''n'' + 2), while the Haldane prior Beta(0,0) results in a posterior expected value of success in the next trial of 0 (absolute certainty of failure in the next trial). Jeffreys prior probability results in a posterior expected value for success in the next trial equal to (1/2)/(''n'' + 1), which Perks (p. 303) points out "is a much more reasonably remote result than the Bayes-Laplace result 1/(''n'' + 2)".

Jaynes questions (for the uniform prior Beta(1,1)) the use of these formulas for the cases ''s'' = 0 or ''s'' = ''n'' because the integrals do not converge (Beta(1,1) is an improper prior for ''s'' = 0 or ''s'' = ''n''). In practice, the conditions 0 < ''s'' < ''n'' required for a posterior mode to exist between both ends are usually met. Regarding the probability (computed by Pearson, as quoted in the section on the rule of succession) that the next (''n'' + 1) trials will all be successes after ''n'' successes in ''n'' trials, Perks (p. 303) shows that, for what is now known as the Jeffreys prior, this probability is ((''n'' + 1/2)/(''n'' + 1))((''n'' + 3/2)/(''n'' + 2))...((2''n'' + 1/2)/(2''n'' + 1)), which for ''n'' = 1, 2, 3 gives 15/24, 315/480, 9009/13440, rapidly approaching a limiting value of 1/\sqrt{2} = 0.70710678\ldots as ''n'' tends to infinity. Perks remarks that what is now known as the Jeffreys prior "is clearly more 'reasonable' than either the Bayes-Laplace result or the result on the (Haldane) alternative rule rejected by Jeffreys which gives certainty as the probability. It clearly provides a very much better correspondence with the process of induction. Whether it is 'absolutely' reasonable for the purpose, i.e. whether it is yet large enough, without the absurdity of reaching unity, is a matter for others to decide. But it must be realized that the result depends on the assumption of complete indifference and absence of knowledge prior to the sampling experiment."
Following are the variances of the posterior distributions obtained with these three prior probability distributions.

For the Bayes prior probability (Beta(1,1)), the posterior variance is:

:\text{variance} = \frac{(s+1)(n-s+1)}{(n+2)^2(n+3)},\text{ which for } s=\frac{n}{2} \text{ results in variance} =\frac{1}{4(n+3)}

For the Jeffreys prior probability (Beta(1/2,1/2)), the posterior variance is:

: \text{variance} = \frac{(s+\tfrac{1}{2})(n-s+\tfrac{1}{2})}{(n+1)^2(n+2)} ,\text{ which for } s=\frac{n}{2} \text{ results in variance} = \frac{1}{4(n+2)}

and for the Haldane prior probability (Beta(0,0)), the posterior variance is:

:\text{variance} = \frac{s(n-s)}{n^2(n+1)}, \text{ which for }s=\frac{n}{2}\text{ results in variance} =\frac{1}{4(n+1)}

So, as remarked by Silvey, for large ''n'', the variance is small and hence the posterior distribution is highly concentrated, whereas the assumed prior distribution was very diffuse. This is in accord with what one would hope for, as vague prior knowledge is transformed (through Bayes theorem) into more precise posterior knowledge by an informative experiment. For small ''n'' the Haldane Beta(0,0) prior results in the largest posterior variance while the Bayes Beta(1,1) prior results in the more concentrated posterior. Jeffreys prior Beta(1/2,1/2) results in a posterior variance in between the other two. As ''n'' increases, the variance rapidly decreases so that the posterior variance for all three priors converges to approximately the same value (approaching zero variance as ''n'' → ∞). Recalling the previous result that the ''Haldane'' prior probability Beta(0,0) results in a posterior probability density with ''mean'' (the expected value for the probability of success in the "next" trial) identical to the ratio ''s''/''n'' of the number of successes to the total number of trials, it follows from the above expression that also the ''Haldane'' prior Beta(0,0) results in a posterior with ''variance'' identical to the variance of a beta distribution expressed in terms of the maximum likelihood estimate ''s''/''n'' and the sample size:

:\text{variance} = \frac{\mu(1-\mu)}{1 + \nu}= \frac{\frac{s}{n}\left(1 - \frac{s}{n}\right)}{1+n}

with the mean ''μ'' = ''s''/''n'' and the sample size ''ν'' = ''n''.

In Bayesian inference, using a prior distribution Beta(''α''Prior, ''β''Prior) prior to a binomial distribution is equivalent to adding (''α''Prior − 1) pseudo-observations of "success" and (''β''Prior − 1) pseudo-observations of "failure" to the actual number of successes and failures observed, then estimating the parameter ''p'' of the binomial distribution by the proportion of successes over both real and pseudo-observations. A uniform prior Beta(1,1) does not add (or subtract) any pseudo-observations since for Beta(1,1) it follows that (''α''Prior − 1) = 0 and (''β''Prior − 1) = 0. The Haldane prior Beta(0,0) subtracts one pseudo-observation from each, and the Jeffreys prior Beta(1/2,1/2) subtracts 1/2 pseudo-observation of success and an equal number of failure. This subtraction has the effect of smoothing out the posterior distribution. If the proportion of successes is not 50% (''s''/''n'' ≠ 1/2), values of ''α''Prior and ''β''Prior less than 1 (and therefore negative (''α''Prior − 1) and (''β''Prior − 1)) favor sparsity, i.e. distributions where the parameter ''p'' is closer to either 0 or 1. In effect, values of ''α''Prior and ''β''Prior between 0 and 1, when operating together, function as a concentration parameter.
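The following short sketch (with illustrative values of ''n'' and ''s'') compares the posterior mean and variance under the three priors against the closed forms quoted above:

```python
from scipy import stats

n, s = 10, 3
priors = {"Bayes (1,1)": (1.0, 1.0),
          "Jeffreys (1/2,1/2)": (0.5, 0.5),
          "Haldane (0,0)": (0.0, 0.0)}

for name, (a0, b0) in priors.items():
    post = stats.beta(a0 + s, b0 + n - s)        # conjugate update
    print(f"{name}: mean={post.mean():.4f}, var={post.var():.6f}")

# closed-form posterior means: (s+1)/(n+2), (s+1/2)/(n+1), s/n
print((s + 1) / (n + 2), (s + 0.5) / (n + 1), s / n)
```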
The accompanying plots show the posterior probability density functions that result from these three choices of prior, for several sample sizes (from the degenerate case ''n'' = 3 up to ''n'' = 50) and numbers of successes ''s''. The first plot shows the symmetric cases (''s'' = ''n''/2, with mean = mode = 1/2) and the second plot shows skewed cases. The images show that there is little difference between the priors for the posterior with sample size of 50 (characterized by a more pronounced peak near ''p'' = 1/2). Significant differences appear for very small sample sizes (in particular for the flatter distribution for the degenerate case of sample size = 3). Therefore, the skewed cases show a larger effect from the choice of prior, at small sample size, than the symmetric cases. For symmetric distributions, the Bayes prior Beta(1,1) results in the most "peaky" and highest posterior distributions and the Haldane prior Beta(0,0) results in the flattest and lowest-peaked distribution. The Jeffreys prior Beta(1/2,1/2) lies in between them. For nearly symmetric, not too skewed distributions the effect of the priors is similar. For very small sample size (in this case for a sample size of 3) and a skewed distribution the Haldane prior can result in a reverse-J-shaped distribution with a singularity at the left end. However, this happens only in degenerate cases (in this example ''n'' = 3 with a success ratio of 1/4, hence ''s'' = 3/4 < 1, a degenerate value because ''s'' should be greater than unity in order for the posterior of the Haldane prior to have a mode located between the ends, and because ''s'' = 3/4 is not an integer number, hence it violates the initial assumption of a binomial distribution for the likelihood), and it is not an issue in generic cases of reasonable sample size (such that the condition 1 < ''s'' < ''n'' − 1, necessary for a mode to exist between both ends, is fulfilled).

In Chapter 12 (p. 385) of his book, Jaynes asserts that the ''Haldane prior'' Beta(0,0) describes a ''prior state of knowledge of complete ignorance'', where we are not even sure whether it is physically possible for an experiment to yield either a success or a failure, while the ''Bayes (uniform) prior Beta(1,1) applies if'' one knows that ''both binary outcomes are possible''. Jaynes states: "''interpret the Bayes-Laplace (Beta(1,1)) prior as describing not a state of complete ignorance'', but the state of knowledge in which we have observed one success and one failure...once we have seen at least one success and one failure, then we know that the experiment is a true binary one, in the sense of physical possibility." Jaynes does not specifically discuss Jeffreys prior Beta(1/2,1/2) (Jaynes's discussion of "Jeffreys prior" on pp. 181, 423 and in chapter 12 of his book refers instead to the improper, un-normalized prior "1/''p'' ''dp''" introduced by Jeffreys in the 1939 edition of his book, seven years before he introduced what is now known as Jeffreys' invariant prior: the square root of the determinant of Fisher's information matrix. ''"1/p" is Jeffreys' (1946) invariant prior for the exponential distribution, not for the Bernoulli or binomial distributions''). However, it follows from the above discussion that the Jeffreys Beta(1/2,1/2) prior represents a state of knowledge in between the Haldane Beta(0,0) and Bayes Beta(1,1) priors.

Similarly, Karl Pearson in his 1892 book The Grammar of Science (p. 144 of the 1900 edition) maintained that the Bayes (Beta(1,1)) uniform prior was not a complete ignorance prior, and that it should be used when prior information justified "distributing our ignorance equally". K. Pearson wrote: "Yet the only supposition that we appear to have made is this: that, knowing nothing of nature, routine and anomy (from the Greek ανομία, namely: a- "without", and nomos "law") are to be considered as equally likely to occur. Now we were not really justified in making even this assumption, for it involves a knowledge that we do not possess regarding nature. We use our ''experience'' of the constitution and action of coins in general to assert that heads and tails are equally probable, but we have no right to assert before experience that, as we know nothing of nature, routine and breach are equally probable. In our ignorance we ought to consider before experience that nature may consist of all routines, all anomies (normlessness), or a mixture of the two in any proportion whatever, and that all such are equally probable. Which of these constitutions after experience is the most probable must clearly depend on what that experience has been like."

If there is sufficient sampling data, ''and the posterior probability mode is not located at one of the extremes of the domain'' (''x'' = 0 or ''x'' = 1), the three priors of Bayes (Beta(1,1)), Jeffreys (Beta(1/2,1/2)) and Haldane (Beta(0,0)) should yield similar ''posterior'' probability densities. Otherwise, as Gelman et al. (p. 65) point out, "if so few data are available that the choice of noninformative prior distribution makes a difference, one should put relevant information into the prior distribution", or as Berger (p. 125) points out, "when different reasonable priors yield substantially different answers, can it be right to state that there ''is'' a single answer? Would it not be better to admit that there is scientific uncertainty, with the conclusion depending on prior beliefs?"


Occurrence and applications


Order statistics

The beta distribution has an important application in the theory of order statistics. A basic result is that the distribution of the ''k''th smallest of a sample of size ''n'' from a continuous uniform distribution has a beta distribution. (David, H. A., Nagaraja, H. N. (2003) ''Order Statistics'' (3rd Edition). Wiley, New Jersey, p. 458.) This result is summarized as:

:U_{(k)} \sim \operatorname{Beta}(k,n+1-k).

From this, and application of the theory related to the probability integral transform, the distribution of any individual order statistic from any continuous distribution can be derived.
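A quick Monte Carlo check of this result in Python (the sample sizes and seed are chosen arbitrarily):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, k = 10, 3
# k-th smallest of n standard uniforms, over many replications
samples = np.sort(rng.uniform(size=(100_000, n)), axis=1)[:, k - 1]

print("empirical mean:", samples.mean())
print("Beta(k, n+1-k) mean:", stats.beta(k, n + 1 - k).mean())   # k/(n+1)
```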


Subjective logic

In standard logic, propositions are considered to be either true or false. In contradistinction, subjective logic assumes that humans cannot determine with absolute certainty whether a proposition about the real world is absolutely true or false. In subjective logic the a posteriori probability estimates of binary events can be represented by beta distributions. (A. Jøsang. A Logic for Uncertain Probabilities. ''International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems'', 9(3), pp. 279–311, June 2001.)


Wavelet analysis

A wavelet is a wave-like oscillation with an amplitude that starts out at zero, increases, and then decreases back to zero. It can typically be visualized as a "brief oscillation" that promptly decays. Wavelets can be used to extract information from many different kinds of data, including – but certainly not limited to – audio signals and images. Thus, wavelets are purposefully crafted to have specific properties that make them useful for signal processing. Wavelets are localized in both time and frequency, whereas the standard Fourier transform is only localized in frequency. Therefore, standard Fourier transforms are only applicable to stationary processes, while wavelets are applicable to non-stationary processes. Continuous wavelets can be constructed based on the beta distribution. Beta wavelets (H. M. de Oliveira and G. A. A. Araújo. Compactly Supported One-cyclic Wavelets Derived from Beta Distributions. ''Journal of Communication and Information Systems'', vol. 20, n. 3, pp. 27–33, 2005.) can be viewed as a soft variety of Haar wavelets whose shape is fine-tuned by two shape parameters α and β.


Population genetics

The Balding–Nichols model is a two-parameter parametrization of the beta distribution used in population genetics. It is a statistical description of the allele frequencies in the components of a sub-divided population:

: \begin{align}
\alpha &= \mu \nu,\\
\beta &= (1 - \mu) \nu,
\end{align}

where \nu =\alpha+\beta= \frac{1-F}{F} and 0 < F < 1; here ''F'' is (Wright's) genetic distance between two populations.
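A small illustrative helper (the function name and the numerical values of ''F'' and μ are arbitrary) converting the Balding–Nichols parameters into the usual beta shape parameters:

```python
def balding_nichols_params(F, mu):
    """Return (alpha, beta) for the Balding-Nichols model with Wright's F and
    ancestral allele frequency mu, using nu = (1 - F) / F."""
    nu = (1.0 - F) / F
    return mu * nu, (1.0 - mu) * nu


print(balding_nichols_params(F=0.1, mu=0.3))   # (2.7, 6.3)
```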


Project management: task cost and schedule modeling

The beta distribution can be used to model events which are constrained to take place within an interval defined by a minimum and maximum value. For this reason, the beta distribution — along with the triangular distribution — is used extensively in PERT, critical path method (CPM), Joint Cost Schedule Modeling (JCSM) and other project management/control systems to describe the time to completion and the cost of a task. In project management, shorthand computations are widely used to estimate the mean and standard deviation of the beta distribution:

: \begin{align}
\mu(X) & = \frac{a + 4b + c}{6} \\
\sigma(X) & = \frac{c-a}{6}
\end{align}

where ''a'' is the minimum, ''c'' is the maximum, and ''b'' is the most likely value (the mode for ''α'' > 1 and ''β'' > 1).

The above estimate for the mean \mu(X)= \frac{a + 4b + c}{6} is known as the PERT three-point estimation and it is exact for either of the following values of ''β'' (for arbitrary α within these ranges):

:''β'' = ''α'' > 1 (symmetric case) with standard deviation \sigma(X) = \frac{c-a}{2\sqrt{2\alpha+1}}, skewness = 0, and excess kurtosis = \frac{-6}{2\alpha + 3}

or

:''β'' = 6 − ''α'' for 5 > ''α'' > 1 (skewed case) with standard deviation

:\sigma(X) = \frac{(c-a)\sqrt{\alpha(6-\alpha)}}{6\sqrt{7}},

skewness = \frac{(3-\alpha)\sqrt{7}}{2\sqrt{\alpha(6-\alpha)}}, and excess kurtosis = \frac{21}{\alpha(6-\alpha)} - 3

The above estimate for the standard deviation ''σ''(''X'') = (''c'' − ''a'')/6 is exact for either of the following values of ''α'' and ''β'':

:''α'' = ''β'' = 4 (symmetric) with skewness = 0, and excess kurtosis = −6/11.
:''β'' = 6 − ''α'' and \alpha = 3 - \sqrt2 (right-tailed, positive skew) with skewness = \frac{1}{\sqrt{2}}, and excess kurtosis = 0
:''β'' = 6 − ''α'' and \alpha = 3 + \sqrt2 (left-tailed, negative skew) with skewness = -\frac{1}{\sqrt{2}}, and excess kurtosis = 0

Otherwise, these can be poor approximations for beta distributions with other values of α and β, exhibiting average errors of 40% in the mean and 549% in the variance.
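Within the ranges where the shorthand is exact, a small numerical illustration (with arbitrary endpoint and shape values) confirms that the PERT mean coincides with the exact beta mean for ''β'' = 6 − ''α'':

```python
def pert_mean(a, b, c):
    return (a + 4.0 * b + c) / 6.0          # PERT three-point estimate


def pert_sigma(a, c):
    return (c - a) / 6.0                    # PERT standard deviation shorthand


# check against the exact mean for beta = 6 - alpha on [a, c]:
# mode b = a + (c - a)(alpha - 1)/4 and exact mean = a + (c - a) alpha/6
a, c, alpha = 2.0, 14.0, 2.5
b = a + (c - a) * (alpha - 1.0) / 4.0
print(pert_mean(a, b, c), a + (c - a) * alpha / 6.0)   # both equal 7.0
print(pert_sigma(a, c))
```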


Random variate generation

If ''X'' and ''Y'' are independent, with X \sim \Gamma(\alpha, \theta) and Y \sim \Gamma(\beta, \theta), then

:\frac{X}{X+Y} \sim \Beta(\alpha, \beta).

So one algorithm for generating beta variates is to generate \frac{X}{X+Y}, where ''X'' is a gamma variate with parameters (α, 1) and ''Y'' is an independent gamma variate with parameters (β, 1). In fact, here \frac{X}{X+Y} and X+Y are independent, and X+Y \sim \Gamma(\alpha + \beta, \theta). If Z \sim \Gamma(\gamma, \theta) and Z is independent of X and Y, then \frac{X+Y}{X+Y+Z} \sim \Beta(\alpha+\beta,\gamma) and \frac{X+Y}{X+Y+Z} is independent of \frac{X}{X+Y}. This shows that the product of independent \Beta(\alpha,\beta) and \Beta(\alpha+\beta,\gamma) random variables is a \Beta(\alpha,\beta+\gamma) random variable.

Also, the ''k''th order statistic of ''n'' uniformly distributed variates is \Beta(k, n+1-k), so an alternative if α and β are small integers is to generate α + β − 1 uniform variates and choose the α-th smallest.

Another way to generate the beta distribution is by the Pólya urn model. According to this method, one starts with an "urn" containing α "black" balls and β "white" balls and draws uniformly with replacement. On every trial an additional ball is added according to the color of the last ball which was drawn. Asymptotically, the proportion of black and white balls will be distributed according to the beta distribution, where each repetition of the experiment will produce a different value.

It is also possible to use inverse transform sampling.
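A minimal sketch of the gamma-ratio method (the seed, sample counts and shape parameters are arbitrary), cross-checked against the order-statistic method for integer shapes:

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, beta = 2, 3
m = 100_000

# gamma-ratio method: X/(X+Y) with X ~ Gamma(alpha, 1), Y ~ Gamma(beta, 1)
x = rng.gamma(alpha, 1.0, size=m)
y = rng.gamma(beta, 1.0, size=m)
gamma_ratio = x / (x + y)

# order-statistic method: the alpha-th smallest of alpha + beta - 1 uniforms
u = np.sort(rng.uniform(size=(m, alpha + beta - 1)), axis=1)
order_stat = u[:, alpha - 1]

print(gamma_ratio.mean(), order_stat.mean(), alpha / (alpha + beta))   # all near 0.4
```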


History

Thomas Bayes, in a posthumous paper published in 1763 by Richard Price, obtained a beta distribution as the density of the probability of success in Bernoulli trials (see the section on Bayesian inference above), but the paper does not analyze any of the moments of the beta distribution or discuss any of its properties.

The first systematic modern discussion of the beta distribution is probably due to Karl Pearson. In Pearson's papers the beta distribution is couched as a solution of a differential equation: Pearson's Type I distribution, which is essentially identical to it except for arbitrary shifting and re-scaling (the beta and Pearson Type I distributions can always be equalized by proper choice of parameters). In fact, in several English books and journal articles in the few decades prior to World War II, it was common to refer to the beta distribution as Pearson's Type I distribution. William P. Elderton in his 1906 monograph "Frequency curves and correlation" further analyzes the beta distribution as Pearson's Type I distribution, including a full discussion of the method of moments for the four parameter case, and diagrams of (what Elderton describes as) U-shaped, J-shaped, twisted J-shaped, "cocked-hat" shapes, horizontal and angled straight-line cases. Elderton wrote "I am chiefly indebted to Professor Pearson, but the indebtedness is of a kind for which it is impossible to offer formal thanks." Elderton in his 1906 monograph provides an impressive amount of information on the beta distribution, including equations for the origin of the distribution chosen to be the mode, as well as for other Pearson distributions: types I through VII. Elderton also included a number of appendixes, including one appendix ("II") on the beta and gamma functions. In later editions, Elderton added equations for the origin of the distribution chosen to be the mean, and analysis of Pearson distributions VIII through XII.

As remarked by Bowman and Shenton, "Fisher and Pearson had a difference of opinion in the approach to (parameter) estimation, in particular relating to (Pearson's method of) moments and (Fisher's method of) maximum likelihood in the case of the Beta distribution." Also according to Bowman and Shenton, "the case of a Type I (beta distribution) model being the center of the controversy was pure serendipity. A more difficult model of 4 parameters would have been hard to find." The long running public conflict of Fisher with Karl Pearson can be followed in a number of articles in prestigious journals. For example, concerning the estimation of the four parameters for the beta distribution, and Fisher's criticism of Pearson's method of moments as being arbitrary, see Pearson's article "Method of moments and method of maximum likelihood" (published three years after his retirement from University College, London, where his position had been divided between Fisher and Pearson's son Egon), in which Pearson writes "I read (Koshai's paper in the Journal of the Royal Statistical Society, 1933) which as far as I am aware is the only case at present published of the application of Professor Fisher's method. To my astonishment that method depends on first working out the constants of the frequency curve by the (Pearson) Method of Moments and then superposing on it, by what Fisher terms "the Method of Maximum Likelihood" a further approximation to obtain, what he holds, he will thus get, "more efficient values" of the curve constants."

David and Edwards's treatise on the history of statistics cites the first modern treatment of the beta distribution, in 1911, using the beta designation that has become standard, due to Corrado Gini, an Italian statistician, demographer, and sociologist, who developed the Gini coefficient. N. L. Johnson and S. Kotz, in their comprehensive and very informative monograph on leading historical personalities in statistical sciences, credit Corrado Gini as "an early Bayesian...who dealt with the problem of eliciting the parameters of an initial Beta distribution, by singling out techniques which anticipated the advent of the so-called empirical Bayes approach."


References


External links


"Beta_Distribution"
by_Fiona_Maclachlan,_the_Wolfram_Demonstrations_Project,_2007.
Beta_Distribution –_Overview_and_Example
_xycoon.com

_brighton-webs.co.uk

_exstrom.com * *
Harvard_University_Statistics_110_Lecture_23_Beta_Distribution,_Prof._Joe_Blitzstein
Mean absolute deviation around the mean

The mean absolute deviation around the mean for the beta distribution with shape parameters α and β is:

:\operatorname{E}[|X - E[X]|] = \frac{2 \alpha^{\alpha} \beta^{\beta}}{\Beta(\alpha,\beta)(\alpha + \beta)^{\alpha + \beta + 1}}

The mean absolute deviation around the mean is a more robust estimator of statistical dispersion than the standard deviation for beta distributions with tails and inflection points at each side of the mode, Beta(''α'', ''β'') distributions with ''α'', ''β'' > 2, as it depends on the linear (absolute) deviations rather than the square deviations from the mean. Therefore, the effect of very large deviations from the mean is not as overly weighted.

Using Stirling's approximation to the Gamma function, N. L. Johnson and S. Kotz derived the following approximation for values of the shape parameters greater than unity (the relative error for this approximation is only −3.5% for ''α'' = ''β'' = 1, and it decreases to zero as ''α'' → ∞, ''β'' → ∞):

: \begin{align}
\frac{\text{mean abs. dev. from mean}}{\text{standard deviation}} &=\frac{\operatorname{E}[|X - E[X]|]}{\sqrt{\operatorname{var}(X)}}\\
&\approx \sqrt{\frac{2}{\pi}} \left(1+\frac{7}{12 (\alpha+\beta)}-\frac{1}{12 \alpha}-\frac{1}{12 \beta} \right), \text{ if } \alpha, \beta > 1.
\end{align}

At the limit α → ∞, β → ∞, the ratio of the mean absolute deviation to the standard deviation (for the beta distribution) becomes equal to the ratio of the same measures for the normal distribution: \sqrt{\frac{2}{\pi}}. For α = β = 1 this ratio equals \frac{\sqrt{3}}{2}, so that from α = β = 1 to α, β → ∞ the ratio decreases by 8.5%. For α = β = 0 the standard deviation is exactly equal to the mean absolute deviation around the mean. Therefore, this ratio decreases by 15% from α = β = 0 to α = β = 1, and by 25% from α = β = 0 to α, β → ∞. However, for skewed beta distributions such that α → 0 or β → 0, the ratio of the standard deviation to the mean absolute deviation approaches infinity (although each of them, individually, approaches zero) because the mean absolute deviation approaches zero faster than the standard deviation.

Using the parametrization in terms of mean μ and sample size ν = α + β > 0:

:α = μν, β = (1−μ)ν

one can express the mean absolute deviation around the mean in terms of the mean μ and the sample size ν as follows:

:\operatorname{E}[|X - E[X]|] = \frac{2 \mu^{\mu\nu}(1-\mu)^{(1-\mu)\nu}}{\nu \Beta(\mu\nu,(1-\mu)\nu)}

For a symmetric distribution, the mean is at the middle of the distribution, μ = 1/2, and therefore:

: \begin{align}
\operatorname{E}[|X - E[X]|] = \frac{2^{1-\nu}}{\nu \Beta(\tfrac{\nu}{2},\tfrac{\nu}{2})} &= \frac{2^{1-\nu}\,\Gamma(\nu)}{\nu\, \Gamma(\tfrac{\nu}{2})^2} \\
\lim_{\nu \to 0} \left (\lim_{\mu \to \frac{1}{2}} \operatorname{E}[|X - E[X]|] \right ) &= \tfrac{1}{2}\\
\lim_{\nu \to \infty} \left (\lim_{\mu \to \frac{1}{2}} \operatorname{E}[|X - E[X]|] \right ) &= 0
\end{align}

Also, the following limits (with only the noted variable approaching the limit) can be obtained from the above expressions:

: \begin{align}
\lim_{\beta \to 0} \operatorname{E}[|X - E[X]|] &=\lim_{\alpha \to 0} \operatorname{E}[|X - E[X]|]= 0 \\
\lim_{\beta \to \infty} \operatorname{E}[|X - E[X]|] &=\lim_{\alpha \to \infty} \operatorname{E}[|X - E[X]|] = 0\\
\lim_{\mu \to 0} \operatorname{E}[|X - E[X]|]&=\lim_{\mu \to 1} \operatorname{E}[|X - E[X]|] = 0\\
\lim_{\nu \to 0} \operatorname{E}[|X - E[X]|] &= 2\mu(1-\mu) \\
\lim_{\nu \to \infty} \operatorname{E}[|X - E[X]|] &= 0
\end{align}
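The exact ratio and the Johnson–Kotz approximation above can be compared numerically; a brief sketch (the shape parameters are arbitrary illustrative values):

```python
import numpy as np
from scipy import stats
from scipy.special import beta as beta_fn


def mad_beta(a, b):
    """Exact mean absolute deviation around the mean, E|X - E[X]|, for Beta(a, b)."""
    return 2.0 * a**a * b**b / (beta_fn(a, b) * (a + b) ** (a + b + 1))


def ratio_approx(a, b):
    """Johnson-Kotz approximation to (mean abs. deviation)/(standard deviation)."""
    return np.sqrt(2.0 / np.pi) * (1.0 + 7.0 / (12.0 * (a + b))
                                   - 1.0 / (12.0 * a) - 1.0 / (12.0 * b))


a, b = 3.0, 5.0
print(mad_beta(a, b) / stats.beta(a, b).std(), ratio_approx(a, b))
```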


Mean absolute difference

The mean absolute difference for the Beta distribution is: :\mathrm{MD} = \int_0^1 \int_0^1 f(x;\alpha,\beta)\,f(y;\alpha,\beta)\,|x-y| \,dx\,dy = \left(\frac{4}{\alpha+\beta}\right)\frac{\Beta(\alpha+\beta,\alpha+\beta)}{\Beta(\alpha,\alpha)\,\Beta(\beta,\beta)} The Gini coefficient for the Beta distribution is half of the relative mean absolute difference: :\mathrm{G} = \left(\frac{2}{\alpha}\right)\frac{\Beta(\alpha+\beta,\alpha+\beta)}{\Beta(\alpha,\alpha)\,\Beta(\beta,\beta)}
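As a quick numerical sanity check of the two closed forms above, they can be compared against a Monte Carlo estimate of E|X − Y| for independent copies of X. The sketch below uses Python with NumPy/SciPy; the helper function names are illustrative only.

```python
import numpy as np
from scipy.special import beta as B          # Euler beta function
from scipy.stats import beta as beta_dist

def mean_abs_difference(a, b):
    """Closed-form mean absolute difference of Beta(a, b)."""
    return (4.0 / (a + b)) * B(a + b, a + b) / (B(a, a) * B(b, b))

def gini(a, b):
    """Closed-form Gini coefficient of Beta(a, b)."""
    return (2.0 / a) * B(a + b, a + b) / (B(a, a) * B(b, b))

a, b = 2.0, 5.0
rng = np.random.default_rng(0)
x = beta_dist.rvs(a, b, size=200_000, random_state=rng)
y = beta_dist.rvs(a, b, size=200_000, random_state=rng)

print(np.mean(np.abs(x - y)), mean_abs_difference(a, b))          # Monte Carlo vs closed form
print(gini(a, b), mean_abs_difference(a, b) / (2 * a / (a + b)))  # Gini = MD / (2 * mean)
```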


Skewness

The
skewness
(the third moment centered on the mean, normalized by the 3/2 power of the variance) of the beta distribution is :\gamma_1 =\frac{\operatorname{E}[(X - \mu)^3]}{(\operatorname{var}(X))^{3/2}} = \frac{2(\beta-\alpha)\sqrt{\alpha+\beta+1}}{(\alpha+\beta+2)\sqrt{\alpha\beta}} . Letting α = β in the above expression one obtains γ1 = 0, showing once again that for α = β the distribution is symmetric and hence the skewness is zero. Positive skew (right-tailed) for α < β, negative skew (left-tailed) for α > β. Using the parametrization in terms of mean μ and sample size ν = α + β: : \begin{align} \alpha & = \mu \nu ,\text{ where }\nu =(\alpha + \beta) >0\\ \beta & = (1 - \mu) \nu , \text{ where }\nu =(\alpha + \beta) >0. \end{align} one can express the skewness in terms of the mean μ and the sample size ν as follows: :\gamma_1 = \frac{2(1-2\mu)\sqrt{1+\nu}}{(2+\nu)\sqrt{\mu(1-\mu)}}. The skewness can also be expressed just in terms of the variance ''var'' and the mean μ as follows: :\gamma_1 = \frac{2(1-2\mu)\sqrt{\operatorname{var}}}{\mu(1-\mu)+\operatorname{var}}\text{ if } \operatorname{var} < \mu(1-\mu) The accompanying plot of skewness as a function of variance and mean shows that maximum variance (1/4) is coupled with zero skewness and the symmetry condition (μ = 1/2), and that maximum skewness (positive or negative infinity) occurs when the mean is located at one end or the other, so that the "mass" of the probability distribution is concentrated at the ends (minimum variance). The following expression for the square of the skewness, in terms of the sample size ν = α + β and the variance ''var'', is useful for the method of moments estimation of four parameters: :(\gamma_1)^2 =\frac{4(\beta-\alpha)^2 (1+\nu)}{\alpha\beta(2+\nu)^2} = \frac{4}{(2+\nu)^2}\bigg(\frac{1}{\operatorname{var}}-4(1+\nu)\bigg) This expression correctly gives a skewness of zero for α = β, since in that case \operatorname{var} = \frac{1}{4(1+\nu)}. For the symmetric case (α = β), skewness = 0 over the whole range, and the following limits apply: :\lim_{\alpha = \beta \to 0} \gamma_1 = \lim_{\alpha = \beta \to \infty} \gamma_1 =\lim_{\nu \to 0} \gamma_1=\lim_{\nu \to \infty} \gamma_1=\lim_{\mu \to \frac{1}{2}} \gamma_1 = 0 For the asymmetric cases (α ≠ β) the following limits (with only the noted variable approaching the limit) can be obtained from the above expressions: : \begin{align} &\lim_{\alpha \to 0} \gamma_1 =\lim_{\mu \to 0} \gamma_1 = \infty\\ &\lim_{\beta \to 0} \gamma_1 = \lim_{\mu \to 1} \gamma_1= - \infty\\ &\lim_{\alpha \to \infty} \gamma_1 = -\frac{2}{\sqrt{\beta}},\quad \lim_{\beta \to 0}(\lim_{\alpha \to \infty} \gamma_1) = -\infty,\quad \lim_{\beta \to \infty}(\lim_{\alpha \to \infty} \gamma_1) = 0\\ &\lim_{\beta \to \infty} \gamma_1 = \frac{2}{\sqrt{\alpha}},\quad \lim_{\alpha \to 0}(\lim_{\beta \to \infty} \gamma_1) = \infty,\quad \lim_{\alpha \to \infty}(\lim_{\beta \to \infty} \gamma_1) = 0\\ &\lim_{\nu \to 0} \gamma_1 = \frac{1 - 2\mu}{\sqrt{\mu(1-\mu)}},\quad \lim_{\mu \to 0}(\lim_{\nu \to 0} \gamma_1) = \infty,\quad \lim_{\mu \to 1}(\lim_{\nu \to 0} \gamma_1) = - \infty \end{align}
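The closed-form skewness above can be checked against SciPy's built-in moment calculations; a minimal illustrative sketch (the helper name is arbitrary):

```python
import numpy as np
from scipy.stats import beta

def beta_skewness(a, b):
    """Closed-form skewness: 2(b-a)sqrt(a+b+1) / ((a+b+2)sqrt(a*b))."""
    return 2 * (b - a) * np.sqrt(a + b + 1) / ((a + b + 2) * np.sqrt(a * b))

for a, b in [(2.0, 5.0), (5.0, 2.0), (3.0, 3.0), (0.5, 0.5)]:
    s_scipy = beta.stats(a, b, moments="s")       # skewness reported by SciPy
    print(a, b, float(s_scipy), beta_skewness(a, b))
```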


Kurtosis

The beta distribution has been applied in acoustic analysis to assess damage to gears, as the kurtosis of the beta distribution has been reported to be a good indicator of the condition of a gear. Kurtosis has also been used to distinguish the seismic signal generated by a person's footsteps from other signals. As persons or other targets moving on the ground generate continuous signals in the form of seismic waves, one can separate different targets based on the seismic waves they generate. Kurtosis is sensitive to impulsive signals, so it's much more sensitive to the signal generated by human footsteps than other signals generated by vehicles, winds, noise, etc. Unfortunately, the notation for kurtosis has not been standardized. Kenney and Keeping use the symbol γ2 for the
excess kurtosis
, but Abramowitz and Stegun use different terminology. To prevent confusion between kurtosis (the fourth moment centered on the mean, normalized by the square of the variance) and excess kurtosis, when using symbols, they will be spelled out as follows: :\begin \text &=\text - 3\\ &=\frac-3\\ &=\frac\\ &=\frac . \end Letting α = β in the above expression one obtains :\text =- \frac \text\alpha=\beta . Therefore, for symmetric beta distributions, the excess kurtosis is negative, increasing from a minimum value of −2 at the limit as → 0, and approaching a maximum value of zero as → ∞. The value of −2 is the minimum value of excess kurtosis that any distribution (not just beta distributions, but any distribution of any possible kind) can ever achieve. This minimum value is reached when all the probability density is entirely concentrated at each end ''x'' = 0 and ''x'' = 1, with nothing in between: a 2-point
Bernoulli distribution
with equal probability 1/2 at each end (a coin toss: see section below "Kurtosis bounded by the square of the skewness" for further discussion). The description of kurtosis as a measure of the "potential outliers" (or "potential rare, extreme values") of the probability distribution, is correct for all distributions including the beta distribution. When rare, extreme values can occur in the beta distribution, the higher its kurtosis; otherwise, the kurtosis is lower. For α ≠ β, skewed beta distributions, the excess kurtosis can reach unlimited positive values (particularly for α → 0 for finite β, or for β → 0 for finite α) because the side away from the mode will produce occasional extreme values. Minimum kurtosis takes place when the mass density is concentrated equally at each end (and therefore the mean is at the center), and there is no probability mass density in between the ends. Using the parametrization in terms of mean μ and sample size ν = α + β: : \begin \alpha & = \mu \nu ,\text\nu =(\alpha + \beta) >0\\ \beta & = (1 - \mu) \nu , \text\nu =(\alpha + \beta) >0. \end one can express the excess kurtosis in terms of the mean μ and the sample size ν as follows: :\text =\frac\bigg (\frac - 1 \bigg ) The excess kurtosis can also be expressed in terms of just the following two parameters: the variance ''var'', and the sample size ν as follows: :\text =\frac\left(\frac - 6 - 5 \nu \right)\text\text< \mu(1-\mu) and, in terms of the variance ''var'' and the mean μ as follows: :\text =\frac\text\text< \mu(1-\mu) The plot of excess kurtosis as a function of the variance and the mean shows that the minimum value of the excess kurtosis (−2, which is the minimum possible value for excess kurtosis for any distribution) is intimately coupled with the maximum value of variance (1/4) and the symmetry condition: the mean occurring at the midpoint (μ = 1/2). This occurs for the symmetric case of α = β = 0, with zero skewness. At the limit, this is the 2 point
Bernoulli distribution
with equal probability 1/2 at each Dirac delta function end ''x'' = 0 and ''x'' = 1 and zero probability everywhere else. (A coin toss: one face of the coin being ''x'' = 0 and the other face being ''x'' = 1.) Variance is maximum because the distribution is bimodal with nothing in between the two modes (spikes) at each end. Excess kurtosis is minimum: the probability density "mass" is zero at the mean and it is concentrated at the two peaks at each end. Excess kurtosis reaches the minimum possible value (for any distribution) when the probability density function has two spikes at each end: it is bi-"peaky" with nothing in between them. On the other hand, the plot shows that for extreme skewed cases, where the mean is located near one or the other end (μ = 0 or μ = 1), the variance is close to zero, and the excess kurtosis rapidly approaches infinity when the mean of the distribution approaches either end. Alternatively, the excess kurtosis can also be expressed in terms of just the following two parameters: the square of the skewness, and the sample size ν as follows: :\text =\frac\bigg(\frac (\text)^2 - 1\bigg)\text^2-2< \text< \frac (\text)^2 From this last expression, one can obtain the same limits published practically a century ago by Karl Pearson in his paper, for the beta distribution (see section below titled "Kurtosis bounded by the square of the skewness"). Setting α + β= ν = 0 in the above expression, one obtains Pearson's lower boundary (values for the skewness and excess kurtosis below the boundary (excess kurtosis + 2 − skewness2 = 0) cannot occur for any distribution, and hence Karl Pearson appropriately called the region below this boundary the "impossible region"). The limit of α + β = ν → ∞ determines Pearson's upper boundary. : \begin &\lim_\text = (\text)^2 - 2\\ &\lim_\text = \tfrac (\text)^2 \end therefore: :(\text)^2-2< \text< \tfrac (\text)^2 Values of ν = α + β such that ν ranges from zero to infinity, 0 < ν < ∞, span the whole region of the beta distribution in the plane of excess kurtosis versus squared skewness. For the symmetric case (α = β), the following limits apply: : \begin &\lim_ \text = - 2 \\ &\lim_ \text = 0 \\ &\lim_ \text = - \frac \end For the unsymmetric cases (α ≠ β) the following limits (with only the noted variable approaching the limit) can be obtained from the above expressions: : \begin &\lim_\text =\lim_ \text = \lim_\text = \lim_\text =\infty\\ &\lim_\text = \frac,\text \lim_(\lim_ \text) = \infty,\text \lim_(\lim_ \text) = 0\\ &\lim_\text = \frac,\text \lim_(\lim_ \text) = \infty,\text \lim_(\lim_ \text) = 0\\ &\lim_ \text = - 6 + \frac,\text \lim_(\lim_ \text) = \infty,\text \lim_(\lim_ \text) = \infty \end
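For a numerical check of the excess-kurtosis behaviour described in this section, the standard closed form in α and β can be compared with SciPy's value; a minimal illustrative sketch in Python:

```python
from scipy.stats import beta

def beta_excess_kurtosis(a, b):
    """Closed-form excess kurtosis of Beta(a, b)."""
    num = 6 * ((a - b) ** 2 * (a + b + 1) - a * b * (a + b + 2))
    den = a * b * (a + b + 2) * (a + b + 3)
    return num / den

# Symmetric cases approach -2 as the parameters shrink and 0 as they grow;
# strongly skewed cases (e.g. alpha small, beta large) give large positive values.
for a, b in [(0.5, 0.5), (1.0, 1.0), (2.0, 2.0), (2.0, 5.0), (0.1, 1000.0)]:
    k_scipy = beta.stats(a, b, moments="k")   # SciPy's excess kurtosis
    print(a, b, float(k_scipy), beta_excess_kurtosis(a, b))
```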


Characteristic function

The Characteristic function (probability theory), characteristic function is the Fourier transform of the probability density function. The characteristic function of the beta distribution is confluent hypergeometric function, Kummer's confluent hypergeometric function (of the first kind): :\begin \varphi_X(\alpha;\beta;t) &= \operatorname\left[e^\right]\\ &= \int_0^1 e^ f(x;\alpha,\beta) dx \\ &=_1F_1(\alpha; \alpha+\beta; it)\!\\ &=\sum_^\infty \frac \\ &= 1 +\sum_^ \left( \prod_^ \frac \right) \frac \end where : x^=x(x+1)(x+2)\cdots(x+n-1) is the rising factorial, also called the "Pochhammer symbol". The value of the characteristic function for ''t'' = 0, is one: : \varphi_X(\alpha;\beta;0)=_1F_1(\alpha; \alpha+\beta; 0) = 1 . Also, the real and imaginary parts of the characteristic function enjoy the following symmetries with respect to the origin of variable ''t'': : \textrm \left [ _1F_1(\alpha; \alpha+\beta; it) \right ] = \textrm \left [ _1F_1(\alpha; \alpha+\beta; - it) \right ] : \textrm \left [ _1F_1(\alpha; \alpha+\beta; it) \right ] = - \textrm \left [ _1F_1(\alpha; \alpha+\beta; - it) \right ] The symmetric case α = β simplifies the characteristic function of the beta distribution to a Bessel function, since in the special case α + β = 2α the confluent hypergeometric function (of the first kind) reduces to a Bessel function (the modified Bessel function of the first kind I_ ) using Ernst Kummer, Kummer's second transformation as follows: Another example of the symmetric case α = β = n/2 for beamforming applications can be found in Figure 11 of :\begin _1F_1(\alpha;2\alpha; it) &= e^ _0F_1 \left(; \alpha+\tfrac; \frac \right) \\ &= e^ \left(\frac\right)^ \Gamma\left(\alpha+\tfrac\right) I_\left(\frac\right).\end In the accompanying plots, the Complex number, real part (Re) of the Characteristic function (probability theory), characteristic function of the beta distribution is displayed for symmetric (α = β) and skewed (α ≠ β) cases.
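The identification of the characteristic function with Kummer's confluent hypergeometric function can be verified numerically by comparing direct quadrature of E[e^{itX}] with an evaluation of 1F1(α; α+β; it); the sketch below assumes the mpmath library (which handles the complex argument) is available.

```python
import numpy as np
import mpmath
from scipy.integrate import quad
from scipy.stats import beta

a, b, t = 2.0, 3.0, 1.5

# Real and imaginary parts of E[exp(itX)] by direct numerical integration
re = quad(lambda x: np.cos(t * x) * beta.pdf(x, a, b), 0, 1)[0]
im = quad(lambda x: np.sin(t * x) * beta.pdf(x, a, b), 0, 1)[0]

# Kummer's confluent hypergeometric function 1F1(alpha; alpha+beta; it)
phi = complex(mpmath.hyp1f1(a, a + b, 1j * t))

print(re + 1j * im)   # quadrature estimate of the characteristic function
print(phi)            # hypergeometric evaluation; the two should agree
```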


Other moments


Moment generating function

It also follows that the moment generating function is :\begin M_X(\alpha; \beta; t) &= \operatorname\left[e^\right] \\ pt&= \int_0^1 e^ f(x;\alpha,\beta)\,dx \\ pt&= _1F_1(\alpha; \alpha+\beta; t) \\ pt&= \sum_^\infty \frac \frac \\ pt&= 1 +\sum_^ \left( \prod_^ \frac \right) \frac \end In particular ''M''''X''(''α''; ''β''; 0) = 1.
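Similarly, the moment generating function can be checked against SciPy's real-argument confluent hypergeometric function `scipy.special.hyp1f1`; a minimal sketch:

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import hyp1f1
from scipy.stats import beta

a, b, t = 2.0, 3.0, 0.7

# Moment generating function by direct numerical integration of E[exp(tX)]
mgf_quad = quad(lambda x: np.exp(t * x) * beta.pdf(x, a, b), 0, 1)[0]
# Kummer's confluent hypergeometric function 1F1(alpha; alpha+beta; t)
mgf_hyp = hyp1f1(a, a + b, t)
print(mgf_quad, mgf_hyp)   # the two values should agree closely
```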


Higher moments

Using the moment generating function, the ''k''-th raw moment is given by the factor :\prod_^ \frac multiplying the (exponential series) term \left(\frac\right) in the series of the moment generating function :\operatorname[X^k]= \frac = \prod_^ \frac where (''x'')(''k'') is a Pochhammer symbol representing rising factorial. It can also be written in a recursive form as :\operatorname[X^k] = \frac\operatorname[X^]. Since the moment generating function M_X(\alpha; \beta; \cdot) has a positive radius of convergence, the beta distribution is Moment problem, determined by its moments.
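The recursion for the raw moments is easy to implement and compare against SciPy's numerically computed moments; a short illustrative sketch:

```python
from scipy.stats import beta

def raw_moment(a, b, k):
    """k-th raw moment via E[X^k] = prod_{i=0}^{k-1} (a+i)/(a+b+i), i.e. the recursion above."""
    m = 1.0
    for i in range(k):
        m *= (a + i) / (a + b + i)
    return m

a, b = 2.5, 4.0
frozen = beta(a, b)
for k in range(1, 6):
    print(k, raw_moment(a, b, k), frozen.moment(k))   # recursion vs SciPy's moment
```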


Moments of transformed random variables


=Moments of linearly transformed, product and inverted random variables

= One can also show the following expectations for a transformed random variable, where the random variable ''X'' is Beta-distributed with parameters α and β: ''X'' ~ Beta(α, β). The expected value of the variable 1 − ''X'' is the mirror-symmetry of the expected value based on ''X'': :\begin & \operatorname[1-X] = \frac \\ & \operatorname[X (1-X)] =\operatorname[(1-X)X ] =\frac \end Due to the mirror-symmetry of the probability density function of the beta distribution, the variances based on variables ''X'' and 1 − ''X'' are identical, and the covariance on ''X''(1 − ''X'' is the negative of the variance: :\operatorname[(1-X)]=\operatorname[X] = -\operatorname[X,(1-X)]= \frac These are the expected values for inverted variables, (these are related to the harmonic means, see ): :\begin & \operatorname \left [\frac \right ] = \frac \text \alpha > 1\\ & \operatorname\left [\frac \right ] =\frac \text \beta > 1 \end The following transformation by dividing the variable ''X'' by its mirror-image ''X''/(1 − ''X'') results in the expected value of the "inverted beta distribution" or
beta prime distribution
(also known as beta distribution of the second kind or Pearson distribution, Pearson's Type VI): : \begin & \operatorname\left[\frac\right] =\frac \text\beta > 1\\ & \operatorname\left[\frac\right] =\frac\text\alpha > 1 \end Variances of these transformed variables can be obtained by integration, as the expected values of the second moments centered on the corresponding variables: :\operatorname \left[\frac \right] =\operatorname\left[\left(\frac - \operatorname\left[\frac \right ] \right )^2\right]= :\operatorname\left [\frac \right ] =\operatorname \left [\left (\frac - \operatorname\left [\frac \right ] \right )^2 \right ]= \frac \text\alpha > 2 The following variance of the variable ''X'' divided by its mirror-image (''X''/(1−''X'') results in the variance of the "inverted beta distribution" or
beta prime distribution
(also known as beta distribution of the second kind or Pearson distribution, Pearson's Type VI): :\operatorname \left [\frac \right ] =\operatorname \left [\left(\frac - \operatorname \left [\frac \right ] \right)^2 \right ]=\operatorname \left [\frac \right ] = :\operatorname \left [\left (\frac - \operatorname \left [\frac \right ] \right )^2 \right ]= \frac \text\beta > 2 The covariances are: :\operatorname\left [\frac,\frac \right ] = \operatorname\left[\frac,\frac \right] =\operatorname\left[\frac,\frac\right ] = \operatorname\left[\frac,\frac \right] =\frac \text \alpha, \beta > 1 These expectations and variances appear in the four-parameter Fisher information matrix (.)
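The expectations of the inverted and ratio-transformed variables discussed above have simple closed forms (stated explicitly in the code comments below, under the assumption α, β > 1 so that all four expectations are finite); a Monte Carlo sketch:

```python
import numpy as np
from scipy.stats import beta

a, b = 3.0, 4.0   # both > 1 so all four expectations below are finite
x = beta.rvs(a, b, size=500_000, random_state=np.random.default_rng(7))

print(np.mean(1 / x),       (a + b - 1) / (a - 1))   # E[1/X]
print(np.mean(1 / (1 - x)), (a + b - 1) / (b - 1))   # E[1/(1-X)]
print(np.mean(x / (1 - x)), a / (b - 1))             # E[X/(1-X)], the beta prime mean
print(np.mean((1 - x) / x), b / (a - 1))             # E[(1-X)/X]
```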


=Moments of logarithmically transformed random variables

= Expected values for Logarithm transformation, logarithmic transformations (useful for maximum likelihood estimates, see ) are discussed in this section. The following logarithmic linear transformations are related to the geometric means ''GX'' and ''G''(1−''X'') (see ): :\begin \operatorname[\ln(X)] &= \psi(\alpha) - \psi(\alpha + \beta)= - \operatorname\left[\ln \left (\frac \right )\right],\\ \operatorname[\ln(1-X)] &=\psi(\beta) - \psi(\alpha + \beta)= - \operatorname \left[\ln \left (\frac \right )\right]. \end Where the
digamma function
ψ(α) is defined as the logarithmic derivative of the
gamma function
: :\psi(\alpha) = \frac Logit transformations are interesting, as they usually transform various shapes (including J-shapes) into (usually skewed) bell-shaped densities over the logit variable, and they may remove the end singularities over the original variable: :\begin \operatorname\left[\ln \left (\frac \right ) \right] &=\psi(\alpha) - \psi(\beta)= \operatorname[\ln(X)] +\operatorname \left[\ln \left (\frac \right) \right],\\ \operatorname\left [\ln \left (\frac \right ) \right ] &=\psi(\beta) - \psi(\alpha)= - \operatorname \left[\ln \left (\frac \right) \right] . \end Johnson considered the distribution of the logit - transformed variable ln(''X''/1−''X''), including its moment generating function and approximations for large values of the shape parameters. This transformation extends the finite support
[0, 1]
based on the original variable ''X'' to infinite support in both directions of the real line (−∞, +∞). Higher order logarithmic moments can be derived by using the representation of a beta distribution as a proportion of two Gamma distributions and differentiating through the integral. They can be expressed in terms of higher order poly-gamma functions as follows: :\begin \operatorname \left [\ln^2(X) \right ] &= (\psi(\alpha) - \psi(\alpha + \beta))^2+\psi_1(\alpha)-\psi_1(\alpha+\beta), \\ \operatorname \left [\ln^2(1-X) \right ] &= (\psi(\beta) - \psi(\alpha + \beta))^2+\psi_1(\beta)-\psi_1(\alpha+\beta), \\ \operatorname \left [\ln (X)\ln(1-X) \right ] &=(\psi(\alpha) - \psi(\alpha + \beta))(\psi(\beta) - \psi(\alpha + \beta)) -\psi_1(\alpha+\beta). \end therefore the
variance
of the logarithmic variables and
covariance
of ln(''X'') and ln(1−''X'') are: :\begin \operatorname[\ln(X), \ln(1-X)] &= \operatorname\left[\ln(X)\ln(1-X)\right] - \operatorname[\ln(X)]\operatorname[\ln(1-X)] = -\psi_1(\alpha+\beta) \\ & \\ \operatorname[\ln X] &= \operatorname[\ln^2(X)] - (\operatorname[\ln(X)])^2 \\ &= \psi_1(\alpha) - \psi_1(\alpha + \beta) \\ &= \psi_1(\alpha) + \operatorname[\ln(X), \ln(1-X)] \\ & \\ \operatorname ln (1-X)&= \operatorname[\ln^2 (1-X)] - (\operatorname[\ln (1-X)])^2 \\ &= \psi_1(\beta) - \psi_1(\alpha + \beta) \\ &= \psi_1(\beta) + \operatorname[\ln (X), \ln(1-X)] \end where the
trigamma function
, denoted ψ1(α), is the second of the
polygamma function
s, and is defined as the derivative of the digamma function: :\psi_1(\alpha) = \frac= \frac. The variances and covariance of the logarithmically transformed variables ''X'' and (1−''X'') are different, in general, because the logarithmic transformation destroys the mirror-symmetry of the original variables ''X'' and (1−''X''), as the logarithm approaches negative infinity for the variable approaching zero. These logarithmic variances and covariance are the elements of the
Fisher information
matrix for the beta distribution. They are also a measure of the curvature of the log likelihood function (see section on Maximum likelihood estimation). The variances of the log inverse variables are identical to the variances of the log variables: :\begin \operatorname\left[\ln \left (\frac \right ) \right] & =\operatorname[\ln(X)] = \psi_1(\alpha) - \psi_1(\alpha + \beta), \\ \operatorname\left[\ln \left (\frac \right ) \right] &=\operatorname ln (1-X)= \psi_1(\beta) - \psi_1(\alpha + \beta), \\ \operatorname\left[\ln \left (\frac \right), \ln \left (\frac\right ) \right] &=\operatorname[\ln(X),\ln(1-X)]= -\psi_1(\alpha + \beta).\end It also follows that the variances of the logit transformed variables are: :\operatorname\left[\ln \left (\frac \right )\right]=\operatorname\left[\ln \left (\frac \right ) \right]=-\operatorname\left [\ln \left (\frac \right ), \ln \left (\frac \right ) \right]= \psi_1(\alpha) + \psi_1(\beta)
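These logarithmic moments can be verified with SciPy's digamma and polygamma functions together with a Monte Carlo sample; an illustrative sketch:

```python
import numpy as np
from scipy.special import digamma, polygamma
from scipy.stats import beta

a, b = 2.0, 3.0
x = beta.rvs(a, b, size=500_000, random_state=np.random.default_rng(1))
lnx, ln1mx = np.log(x), np.log1p(-x)

print(np.mean(lnx), digamma(a) - digamma(a + b))              # E[ln X]
print(np.var(lnx), polygamma(1, a) - polygamma(1, a + b))     # var[ln X] via the trigamma function
print(np.cov(lnx, ln1mx)[0, 1], -polygamma(1, a + b))         # cov[ln X, ln(1-X)]
```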


Quantities of information (entropy)

Given a beta distributed random variable, ''X'' ~ Beta(''α'', ''β''), the information entropy, differential entropy of ''X'' is (measured in Nat (unit), nats), the expected value of the negative of the logarithm of the
probability density function
: :\begin h(X) &= \operatorname[-\ln(f(x;\alpha,\beta))] \\ pt&=\int_0^1 -f(x;\alpha,\beta)\ln(f(x;\alpha,\beta)) \, dx \\ pt&= \ln(\Beta(\alpha,\beta))-(\alpha-1)\psi(\alpha)-(\beta-1)\psi(\beta)+(\alpha+\beta-2) \psi(\alpha+\beta) \end where ''f''(''x''; ''α'', ''β'') is the
probability density function
of the beta distribution: :f(x;\alpha,\beta) = \frac x^(1-x)^ The
digamma function
''ψ'' appears in the formula for the differential entropy as a consequence of Euler's integral formula for the harmonic numbers which follows from the integral: :\int_0^1 \frac \, dx = \psi(\alpha)-\psi(1) The information entropy, differential entropy of the beta distribution is negative for all values of ''α'' and ''β'' greater than zero, except at ''α'' = ''β'' = 1 (for which values the beta distribution is the same as the Uniform distribution (continuous), uniform distribution), where the information entropy, differential entropy reaches its Maxima and minima, maximum value of zero. It is to be expected that the maximum entropy should take place when the beta distribution becomes equal to the uniform distribution, since uncertainty is maximal when all possible events are equiprobable. For ''α'' or ''β'' approaching zero, the information entropy, differential entropy approaches its Maxima and minima, minimum value of negative infinity. For (either or both) ''α'' or ''β'' approaching zero, there is a maximum amount of order: all the probability density is concentrated at the ends, and there is zero probability density at points located between the ends. Similarly for (either or both) ''α'' or ''β'' approaching infinity, the differential entropy approaches its minimum value of negative infinity, and a maximum amount of order. If either ''α'' or ''β'' approaches infinity (and the other is finite) all the probability density is concentrated at an end, and the probability density is zero everywhere else. If both shape parameters are equal (the symmetric case), ''α'' = ''β'', and they approach infinity simultaneously, the probability density becomes a spike ( Dirac delta function) concentrated at the middle ''x'' = 1/2, and hence there is 100% probability at the middle ''x'' = 1/2 and zero probability everywhere else. The (continuous case) information entropy, differential entropy was introduced by Shannon in his original paper (where he named it the "entropy of a continuous distribution"), as the concluding part of the same paper where he defined the information entropy, discrete entropy. It is known since then that the differential entropy may differ from the infinitesimal limit of the discrete entropy by an infinite offset, therefore the differential entropy can be negative (as it is for the beta distribution). What really matters is the relative value of entropy. Given two beta distributed random variables, ''X''1 ~ Beta(''α'', ''β'') and ''X''2 ~ Beta(''α''′, ''β''′), the cross entropy is (measured in nats) :\begin H(X_1,X_2) &= \int_0^1 - f(x;\alpha,\beta) \ln (f(x;\alpha',\beta')) \,dx \\ pt&= \ln \left(\Beta(\alpha',\beta')\right)-(\alpha'-1)\psi(\alpha)-(\beta'-1)\psi(\beta)+(\alpha'+\beta'-2)\psi(\alpha+\beta). \end The cross entropy has been used as an error metric to measure the distance between two hypotheses. Its absolute value is minimum when the two distributions are identical. It is the information measure most closely related to the log maximum likelihood (see section on "Parameter estimation. Maximum likelihood estimation")). The relative entropy, or Kullback–Leibler divergence ''D''KL(''X''1 , , ''X''2), is a measure of the inefficiency of assuming that the distribution is ''X''2 ~ Beta(''α''′, ''β''′) when the distribution is really ''X''1 ~ Beta(''α'', ''β''). It is defined as follows (measured in nats). 
:\begin D_(X_1, , X_2) &= \int_0^1 f(x;\alpha,\beta) \ln \left (\frac \right ) \, dx \\ pt&= \left (\int_0^1 f(x;\alpha,\beta) \ln (f(x;\alpha,\beta)) \,dx \right )- \left (\int_0^1 f(x;\alpha,\beta) \ln (f(x;\alpha',\beta')) \, dx \right )\\ pt&= -h(X_1) + H(X_1,X_2)\\ pt&= \ln\left(\frac\right)+(\alpha-\alpha')\psi(\alpha)+(\beta-\beta')\psi(\beta)+(\alpha'-\alpha+\beta'-\beta)\psi (\alpha + \beta). \end The relative entropy, or Kullback–Leibler divergence, is always non-negative. A few numerical examples follow: *''X''1 ~ Beta(1, 1) and ''X''2 ~ Beta(3, 3); ''D''KL(''X''1 , , ''X''2) = 0.598803; ''D''KL(''X''2 , , ''X''1) = 0.267864; ''h''(''X''1) = 0; ''h''(''X''2) = −0.267864 *''X''1 ~ Beta(3, 0.5) and ''X''2 ~ Beta(0.5, 3); ''D''KL(''X''1 , , ''X''2) = 7.21574; ''D''KL(''X''2 , , ''X''1) = 7.21574; ''h''(''X''1) = −1.10805; ''h''(''X''2) = −1.10805. The Kullback–Leibler divergence is not symmetric ''D''KL(''X''1 , , ''X''2) ≠ ''D''KL(''X''2 , , ''X''1) for the case in which the individual beta distributions Beta(1, 1) and Beta(3, 3) are symmetric, but have different entropies ''h''(''X''1) ≠ ''h''(''X''2). The value of the Kullback divergence depends on the direction traveled: whether going from a higher (differential) entropy to a lower (differential) entropy or the other way around. In the numerical example above, the Kullback divergence measures the inefficiency of assuming that the distribution is (bell-shaped) Beta(3, 3), rather than (uniform) Beta(1, 1). The "h" entropy of Beta(1, 1) is higher than the "h" entropy of Beta(3, 3) because the uniform distribution Beta(1, 1) has a maximum amount of disorder. The Kullback divergence is more than two times higher (0.598803 instead of 0.267864) when measured in the direction of decreasing entropy: the direction that assumes that the (uniform) Beta(1, 1) distribution is (bell-shaped) Beta(3, 3) rather than the other way around. In this restricted sense, the Kullback divergence is consistent with the second law of thermodynamics. The Kullback–Leibler divergence is symmetric ''D''KL(''X''1 , , ''X''2) = ''D''KL(''X''2 , , ''X''1) for the skewed cases Beta(3, 0.5) and Beta(0.5, 3) that have equal differential entropy ''h''(''X''1) = ''h''(''X''2). The symmetry condition: :D_(X_1, , X_2) = D_(X_2, , X_1),\texth(X_1) = h(X_2),\text\alpha \neq \beta follows from the above definitions and the mirror-symmetry ''f''(''x''; ''α'', ''β'') = ''f''(1−''x''; ''α'', ''β'') enjoyed by the beta distribution.
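The differential entropy and Kullback–Leibler divergence formulas above translate directly into code and reproduce the numerical examples quoted in this section; a minimal sketch using SciPy:

```python
from scipy.special import betaln, digamma
from scipy.stats import beta

def beta_entropy(a, b):
    """Differential entropy of Beta(a, b) in nats (closed form given above)."""
    return (betaln(a, b) - (a - 1) * digamma(a) - (b - 1) * digamma(b)
            + (a + b - 2) * digamma(a + b))

def beta_kl(a1, b1, a2, b2):
    """D_KL( Beta(a1,b1) || Beta(a2,b2) ) in nats (closed form given above)."""
    return (betaln(a2, b2) - betaln(a1, b1)
            + (a1 - a2) * digamma(a1) + (b1 - b2) * digamma(b1)
            + (a2 - a1 + b2 - b1) * digamma(a1 + b1))

print(beta_entropy(1, 1), beta(1, 1).entropy())   # 0 for the uniform case
print(beta_entropy(3, 3), beta(3, 3).entropy())   # about -0.2679
print(beta_kl(1, 1, 3, 3), beta_kl(3, 3, 1, 1))   # about 0.5988 and 0.2679 (not symmetric)
```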


Relationships between statistical measures


Mean, mode and median relationship

If 1 < α < β then mode ≤ median ≤ mean.Kerman J (2011) "A closed-form approximation for the median of the beta distribution". Expressing the mode (only for α, β > 1), and the mean in terms of α and β: : \frac \le \text \le \frac , If 1 < β < α then the order of the inequalities are reversed. For α, β > 1 the absolute distance between the mean and the median is less than 5% of the distance between the maximum and minimum values of ''x''. On the other hand, the absolute distance between the mean and the mode can reach 50% of the distance between the maximum and minimum values of ''x'', for the (Pathological (mathematics), pathological) case of α = 1 and β = 1, for which values the beta distribution approaches the uniform distribution and the information entropy, differential entropy approaches its Maxima and minima, maximum value, and hence maximum "disorder". For example, for α = 1.0001 and β = 1.00000001: * mode = 0.9999; PDF(mode) = 1.00010 * mean = 0.500025; PDF(mean) = 1.00003 * median = 0.500035; PDF(median) = 1.00003 * mean − mode = −0.499875 * mean − median = −9.65538 × 10−6 where PDF stands for the value of the
probability density function
.
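A small numerical illustration of the ordering mode ≤ median ≤ mean for 1 < α < β, together with the closed-form median approximation from the Kerman reference cited above; the variable names are illustrative:

```python
from scipy.stats import beta

a, b = 2.0, 5.0                             # 1 < alpha < beta, so mode <= median <= mean
mode = (a - 1) / (a + b - 2)
median = beta(a, b).median()                # numerically inverted CDF
mean = beta(a, b).mean()
kerman_median = (a - 1/3) / (a + b - 2/3)   # closed-form approximation (Kerman 2011)
print(mode, median, kerman_median, mean)
```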


Mean, geometric mean and harmonic mean relationship

It is known from the inequality of arithmetic and geometric means that the geometric mean is lower than the mean. Similarly, the harmonic mean is lower than the geometric mean. The accompanying plot shows that for α = β, both the mean and the median are exactly equal to 1/2, regardless of the value of α = β, and the mode is also equal to 1/2 for α = β > 1, however the geometric and harmonic means are lower than 1/2 and they only approach this value asymptotically as α = β → ∞.


Kurtosis bounded by the square of the skewness

As remarked by William Feller, Feller, in the Pearson distribution, Pearson system the beta probability density appears as Pearson distribution, type I (any difference between the beta distribution and Pearson's type I distribution is only superficial and it makes no difference for the following discussion regarding the relationship between kurtosis and skewness). Karl Pearson showed, in Plate 1 of his paper published in 1916, a graph with the kurtosis as the vertical axis (ordinate) and the square of the
skewness
as the horizontal axis (abscissa), in which a number of distributions were displayed. The region occupied by the beta distribution is bounded by the following two Line (geometry), lines in the (skewness2,kurtosis) Cartesian coordinate system, plane, or the (skewness2,excess kurtosis) Cartesian coordinate system, plane: :(\text)^2+1< \text< \frac (\text)^2 + 3 or, equivalently, :(\text)^2-2< \text< \frac (\text)^2 At a time when there were no powerful digital computers, Karl Pearson accurately computed further boundaries, for example, separating the "U-shaped" from the "J-shaped" distributions. The lower boundary line (excess kurtosis + 2 − skewness2 = 0) is produced by skewed "U-shaped" beta distributions with both values of shape parameters α and β close to zero. The upper boundary line (excess kurtosis − (3/2) skewness2 = 0) is produced by extremely skewed distributions with very large values of one of the parameters and very small values of the other parameter. Karl Pearson showed that this upper boundary line (excess kurtosis − (3/2) skewness2 = 0) is also the intersection with Pearson's distribution III, which has unlimited support in one direction (towards positive infinity), and can be bell-shaped or J-shaped. His son, Egon Pearson, showed that the region (in the kurtosis/squared-skewness plane) occupied by the beta distribution (equivalently, Pearson's distribution I) as it approaches this boundary (excess kurtosis − (3/2) skewness2 = 0) is shared with the noncentral chi-squared distribution. Karl Pearson (Pearson 1895, pp. 357, 360, 373–376) also showed that the gamma distribution is a Pearson type III distribution. Hence this boundary line for Pearson's type III distribution is known as the gamma line. (This can be shown from the fact that the excess kurtosis of the gamma distribution is 6/''k'' and the square of the skewness is 4/''k'', hence (excess kurtosis − (3/2) skewness2 = 0) is identically satisfied by the gamma distribution regardless of the value of the parameter "k"). Pearson later noted that the chi-squared distribution is a special case of Pearson's type III and also shares this boundary line (as it is apparent from the fact that for the chi-squared distribution the excess kurtosis is 12/''k'' and the square of the skewness is 8/''k'', hence (excess kurtosis − (3/2) skewness2 = 0) is identically satisfied regardless of the value of the parameter "k"). This is to be expected, since the chi-squared distribution ''X'' ~ χ2(''k'') is a special case of the gamma distribution, with parametrization X ~ Γ(k/2, 1/2) where k is a positive integer that specifies the "number of degrees of freedom" of the chi-squared distribution. An example of a beta distribution near the upper boundary (excess kurtosis − (3/2) skewness2 = 0) is given by α = 0.1, β = 1000, for which the ratio (excess kurtosis)/(skewness2) = 1.49835 approaches the upper limit of 1.5 from below. An example of a beta distribution near the lower boundary (excess kurtosis + 2 − skewness2 = 0) is given by α= 0.0001, β = 0.1, for which values the expression (excess kurtosis + 2)/(skewness2) = 1.01621 approaches the lower limit of 1 from above. In the infinitesimal limit for both α and β approaching zero symmetrically, the excess kurtosis reaches its minimum value at −2. This minimum value occurs at the point at which the lower boundary line intersects the vertical axis (ordinate). 
(However, in Pearson's original chart, the ordinate is kurtosis, instead of excess kurtosis, and it increases downwards rather than upwards). Values for the skewness and excess kurtosis below the lower boundary (excess kurtosis + 2 − skewness2 = 0) cannot occur for any distribution, and hence Karl Pearson appropriately called the region below this boundary the "impossible region". The boundary for this "impossible region" is determined by (symmetric or skewed) bimodal "U"-shaped distributions for which the parameters α and β approach zero and hence all the probability density is concentrated at the ends: ''x'' = 0, 1 with practically nothing in between them. Since for α ≈ β ≈ 0 the probability density is concentrated at the two ends ''x'' = 0 and ''x'' = 1, this "impossible boundary" is determined by a
Bernoulli distribution
, where the two only possible outcomes occur with respective probabilities ''p'' and ''q'' = 1−''p''. For cases approaching this limit boundary with symmetry α = β, skewness ≈ 0, excess kurtosis ≈ −2 (this is the lowest excess kurtosis possible for any distribution), and the probabilities are ''p'' ≈ ''q'' ≈ 1/2. For cases approaching this limit boundary with skewness, excess kurtosis ≈ −2 + skewness2, and the probability density is concentrated more at one end than the other end (with practically nothing in between), with probabilities p = \tfrac at the left end ''x'' = 0 and q = 1-p = \tfrac at the right end ''x'' = 1.


Symmetry

All statements are conditional on α, β > 0 * Probability density function Symmetry, reflection symmetry ::f(x;\alpha,\beta) = f(1-x;\beta,\alpha) * Cumulative distribution function Symmetry, reflection symmetry plus unitary Symmetry, translation ::F(x;\alpha,\beta) = I_x(\alpha,\beta) = 1- F(1- x;\beta,\alpha) = 1 - I_(\beta,\alpha) * Mode Symmetry, reflection symmetry plus unitary Symmetry, translation ::\operatorname(\Beta(\alpha, \beta))= 1-\operatorname(\Beta(\beta, \alpha)),\text\Beta(\beta, \alpha)\ne \Beta(1,1) * Median Symmetry, reflection symmetry plus unitary Symmetry, translation ::\operatorname (\Beta(\alpha, \beta) )= 1 - \operatorname (\Beta(\beta, \alpha)) * Mean Symmetry, reflection symmetry plus unitary Symmetry, translation ::\mu (\Beta(\alpha, \beta) )= 1 - \mu (\Beta(\beta, \alpha) ) * Geometric Means each is individually asymmetric, the following symmetry applies between the geometric mean based on ''X'' and the geometric mean based on its
reflection
(1-X) ::G_X (\Beta(\alpha, \beta) )=G_(\Beta(\beta, \alpha) ) * Harmonic means each is individually asymmetric, the following symmetry applies between the harmonic mean based on ''X'' and the harmonic mean based on its
reflection
(1-X) ::H_X (\Beta(\alpha, \beta) )=H_(\Beta(\beta, \alpha) ) \text \alpha, \beta > 1 . * Variance symmetry ::\operatorname (\Beta(\alpha, \beta) )=\operatorname (\Beta(\beta, \alpha) ) * Geometric variances each is individually asymmetric, the following symmetry applies between the log geometric variance based on X and the log geometric variance based on its
reflection
(1-X) ::\ln(\operatorname (\Beta(\alpha, \beta))) = \ln(\operatorname(\Beta(\beta, \alpha))) * Geometric covariance symmetry ::\ln \operatorname(\Beta(\alpha, \beta))=\ln \operatorname(\Beta(\beta, \alpha)) * Mean absolute deviation around the mean symmetry ::\operatorname[, X - E ] (\Beta(\alpha, \beta))=\operatorname[, X - E ] (\Beta(\beta, \alpha)) * Skewness Symmetry (mathematics), skew-symmetry ::\operatorname (\Beta(\alpha, \beta) )= - \operatorname (\Beta(\beta, \alpha) ) * Excess kurtosis symmetry ::\text (\Beta(\alpha, \beta) )= \text (\Beta(\beta, \alpha) ) * Characteristic function symmetry of Real part (with respect to the origin of variable "t") :: \text [_1F_1(\alpha; \alpha+\beta; it) ] = \text [ _1F_1(\alpha; \alpha+\beta; - it)] * Characteristic function Symmetry (mathematics), skew-symmetry of Imaginary part (with respect to the origin of variable "t") :: \text [_1F_1(\alpha; \alpha+\beta; it) ] = - \text [ _1F_1(\alpha; \alpha+\beta; - it) ] * Characteristic function symmetry of Absolute value (with respect to the origin of variable "t") :: \text [ _1F_1(\alpha; \alpha+\beta; it) ] = \text [ _1F_1(\alpha; \alpha+\beta; - it) ] * Differential entropy symmetry ::h(\Beta(\alpha, \beta) )= h(\Beta(\beta, \alpha) ) * Relative Entropy (also called Kullback–Leibler divergence) symmetry ::D_(X_1, , X_2) = D_(X_2, , X_1), \texth(X_1) = h(X_2)\text\alpha \neq \beta * Fisher information matrix symmetry ::_ = _


Geometry of the probability density function


Inflection points

For certain values of the shape parameters α and β, the
probability density function
has inflection points, at which the curvature changes sign. The position of these inflection points can be useful as a measure of the Statistical dispersion, dispersion or spread of the distribution. Defining the following quantity: :\kappa =\frac Points of inflection occur, depending on the value of the shape parameters α and β, as follows: *(α > 2, β > 2) The distribution is bell-shaped (symmetric for α = β and skewed otherwise), with two inflection points, equidistant from the mode: ::x = \text \pm \kappa = \frac * (α = 2, β > 2) The distribution is unimodal, positively skewed, right-tailed, with one inflection point, located to the right of the mode: ::x =\text + \kappa = \frac * (α > 2, β = 2) The distribution is unimodal, negatively skewed, left-tailed, with one inflection point, located to the left of the mode: ::x = \text - \kappa = 1 - \frac * (1 < α < 2, β > 2, α+β>2) The distribution is unimodal, positively skewed, right-tailed, with one inflection point, located to the right of the mode: ::x =\text + \kappa = \frac *(0 < α < 1, 1 < β < 2) The distribution has a mode at the left end ''x'' = 0 and it is positively skewed, right-tailed. There is one inflection point, located to the right of the mode: ::x = \frac *(α > 2, 1 < β < 2) The distribution is unimodal negatively skewed, left-tailed, with one inflection point, located to the left of the mode: ::x =\text - \kappa = \frac *(1 < α < 2, 0 < β < 1) The distribution has a mode at the right end ''x''=1 and it is negatively skewed, left-tailed. There is one inflection point, located to the left of the mode: ::x = \frac There are no inflection points in the remaining (symmetric and skewed) regions: U-shaped: (α, β < 1) upside-down-U-shaped: (1 < α < 2, 1 < β < 2), reverse-J-shaped (α < 1, β > 2) or J-shaped: (α > 2, β < 1) The accompanying plots show the inflection point locations (shown vertically, ranging from 0 to 1) versus α and β (the horizontal axes ranging from 0 to 5). There are large cuts at surfaces intersecting the lines α = 1, β = 1, α = 2, and β = 2 because at these values the beta distribution change from 2 modes, to 1 mode to no mode.


Shapes

The beta density function can take a wide variety of different shapes depending on the values of the two parameters ''α'' and ''β''. The ability of the beta distribution to take this great diversity of shapes (using only two parameters) is partly responsible for finding wide application for modeling actual measurements:


=Symmetric (''α'' = ''β'')

= * the density function is symmetry, symmetric about 1/2 (blue & teal plots). * median = mean = 1/2. *skewness = 0. *variance = 1/(4(2α + 1)) *α = β < 1 **U-shaped (blue plot). **bimodal: left mode = 0, right mode =1, anti-mode = 1/2 **1/12 < var(''X'') < 1/4 **−2 < excess kurtosis(''X'') < −6/5 ** α = β = 1/2 is the arcsine distribution *** var(''X'') = 1/8 ***excess kurtosis(''X'') = −3/2 ***CF = Rinc (t) ** α = β → 0 is a 2-point
Bernoulli distribution
with equal probability 1/2 at each Dirac delta function end ''x'' = 0 and ''x'' = 1 and zero probability everywhere else. A coin toss: one face of the coin being ''x'' = 0 and the other face being ''x'' = 1. *** \lim_ \operatorname(X) = \tfrac *** \lim_ \operatorname(X) = - 2 a lower value than this is impossible for any distribution to reach. *** The information entropy, differential entropy approaches a Maxima and minima, minimum value of −∞ *α = β = 1 **the uniform distribution (continuous), uniform
[0, 1]
distribution **no mode **var(''X'') = 1/12 **excess kurtosis(''X'') = −6/5 **The (negative anywhere else) information entropy, differential entropy reaches its Maxima and minima, maximum value of zero **CF = Sinc (t) *''α'' = ''β'' > 1 **symmetric unimodal ** mode = 1/2. **0 < var(''X'') < 1/12 **−6/5 < excess kurtosis(''X'') < 0 **''α'' = ''β'' = 3/2 is a semi-elliptic
[0, 1]
distribution, see: Wigner semicircle distribution ***var(''X'') = 1/16. ***excess kurtosis(''X'') = −1 ***CF = 2 Jinc (t) **''α'' = ''β'' = 2 is the parabolic
[0, 1]
distribution ***var(''X'') = 1/20 ***excess kurtosis(''X'') = −6/7 ***CF = 3 Tinc (t) **''α'' = ''β'' > 2 is bell-shaped, with inflection points located to either side of the mode ***0 < var(''X'') < 1/20 ***−6/7 < excess kurtosis(''X'') < 0 **''α'' = ''β'' → ∞ is a 1-point
degenerate distribution
with a Dirac delta function spike at the midpoint ''x'' = 1/2 with probability 1, and zero probability everywhere else. There is 100% probability (absolute certainty) concentrated at the single point ''x'' = 1/2. *** \lim_ \operatorname(X) = 0 *** \lim_ \operatorname(X) = 0 ***The information entropy, differential entropy approaches a Maxima and minima, minimum value of −∞


=Skewed (''α'' ≠ ''β'')

= The density function is Skewness, skewed. An interchange of parameter values yields the mirror image (the reverse) of the initial curve, some more specific cases: *''α'' < 1, ''β'' < 1 ** U-shaped ** Positive skew for α < β, negative skew for α > β. ** bimodal: left mode = 0, right mode = 1, anti-mode = \tfrac ** 0 < median < 1. ** 0 < var(''X'') < 1/4 *α > 1, β > 1 ** unimodal (magenta & cyan plots), **Positive skew for α < β, negative skew for α > β. **\text= \tfrac ** 0 < median < 1 ** 0 < var(''X'') < 1/12 *α < 1, β ≥ 1 **reverse J-shaped with a right tail, **positively skewed, **strictly decreasing, convex function, convex ** mode = 0 ** 0 < median < 1/2. ** 0 < \operatorname(X) < \tfrac, (maximum variance occurs for \alpha=\tfrac, \beta=1, or α = Φ the Golden ratio, golden ratio conjugate) *α ≥ 1, β < 1 **J-shaped with a left tail, **negatively skewed, **strictly increasing, convex function, convex ** mode = 1 ** 1/2 < median < 1 ** 0 < \operatorname(X) < \tfrac, (maximum variance occurs for \alpha=1, \beta=\tfrac, or β = Φ the Golden ratio, golden ratio conjugate) *α = 1, β > 1 **positively skewed, **strictly decreasing (red plot), **a reversed (mirror-image) power function ,1distribution ** mean = 1 / (β + 1) ** median = 1 - 1/21/β ** mode = 0 **α = 1, 1 < β < 2 ***concave function, concave *** 1-\tfrac< \text < \tfrac *** 1/18 < var(''X'') < 1/12. **α = 1, β = 2 ***a straight line with slope −2, the right-triangular distribution with right angle at the left end, at ''x'' = 0 *** \text=1-\tfrac *** var(''X'') = 1/18 **α = 1, β > 2 ***reverse J-shaped with a right tail, ***convex function, convex *** 0 < \text < 1-\tfrac *** 0 < var(''X'') < 1/18 *α > 1, β = 1 **negatively skewed, **strictly increasing (green plot), **the power function
[0, 1]
distribution ** mean = α / (α + 1) ** median = 1/21/α ** mode = 1 **2 > α > 1, β = 1 ***concave function, concave *** \tfrac < \text < \tfrac *** 1/18 < var(''X'') < 1/12 ** α = 2, β = 1 ***a straight line with slope +2, the right-triangular distribution with right angle at the right end, at ''x'' = 1 *** \text=\tfrac *** var(''X'') = 1/18 **α > 2, β = 1 ***J-shaped with a left tail, convex function, convex ***\tfrac < \text < 1 *** 0 < var(''X'') < 1/18


Related distributions


Transformations

* If ''X'' ~ Beta(''α'', ''β'') then 1 − ''X'' ~ Beta(''β'', ''α'') Mirror image, mirror-image symmetry * If ''X'' ~ Beta(''α'', ''β'') then \tfrac \sim (\alpha,\beta). The
beta prime distribution
, also called "beta distribution of the second kind". * If ''X'' ~ Beta(''α'', ''β'') then \tfrac -1 \sim (\beta,\alpha). * If ''X'' ~ Beta(''n''/2, ''m''/2) then \tfrac \sim F(n,m) (assuming ''n'' > 0 and ''m'' > 0), the F-distribution, Fisher–Snedecor F distribution. * If X \sim \operatorname\left(1+\lambda\tfrac, 1 + \lambda\tfrac\right) then min + ''X''(max − min) ~ PERT(min, max, ''m'', ''λ'') where ''PERT'' denotes a PERT distribution used in PERT analysis, and ''m''=most likely value.Herrerías-Velasco, José Manuel and Herrerías-Pleguezuelo, Rafael and René van Dorp, Johan. (2011). Revisiting the PERT mean and Variance. European Journal of Operational Research (210), p. 448–451. Traditionally ''λ'' = 4 in PERT analysis. * If ''X'' ~ Beta(1, ''β'') then ''X'' ~ Kumaraswamy distribution with parameters (1, ''β'') * If ''X'' ~ Beta(''α'', 1) then ''X'' ~ Kumaraswamy distribution with parameters (''α'', 1) * If ''X'' ~ Beta(''α'', 1) then −ln(''X'') ~ Exponential(''α'')
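Several of the transformations listed above can be checked empirically with Kolmogorov–Smirnov tests on simulated data; a sketch using SciPy's `betaprime` and `kstest` (large p-values indicate consistency with the stated distribution):

```python
import numpy as np
from scipy.stats import beta, betaprime, expon, kstest

a, b = 2.0, 3.0
rng = np.random.default_rng(2)
x = beta.rvs(a, b, size=100_000, random_state=rng)

print(kstest(1 - x, beta(b, a).cdf).pvalue)               # 1 - X ~ Beta(beta, alpha)
print(kstest(x / (1 - x), betaprime(a, b).cdf).pvalue)    # X/(1-X) ~ beta prime(alpha, beta)

y = beta.rvs(a, 1, size=100_000, random_state=rng)
print(kstest(-np.log(y), expon(scale=1 / a).cdf).pvalue)  # -ln(X) ~ Exponential(alpha)
```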


Special and limiting cases

* Beta(1, 1) ~ uniform distribution (continuous), U(0, 1). * Beta(n, 1) ~ Maximum of ''n'' independent rvs. with uniform distribution (continuous), U(0, 1), sometimes called a ''a standard power function distribution'' with density ''n'' ''x''''n''-1 on that interval. * Beta(1, n) ~ Minimum of ''n'' independent rvs. with uniform distribution (continuous), U(0, 1) * If ''X'' ~ Beta(3/2, 3/2) and ''r'' > 0 then 2''rX'' − ''r'' ~ Wigner semicircle distribution. * Beta(1/2, 1/2) is equivalent to the arcsine distribution. This distribution is also Jeffreys prior probability for the
Bernoulli
and binomial distributions. The arcsine probability density is a distribution that appears in several random-walk fundamental theorems. In a fair coin toss
random walk
, the probability for the time of the last visit to the origin is distributed as an (U-shaped) arcsine distribution. In a two-player fair-coin-toss game, a player is said to be in the lead if the random walk (that started at the origin) is above the origin. The most probable number of times that a given player will be in the lead, in a game of length 2''N'', is not ''N''. On the contrary, ''N'' is the least likely number of times that the player will be in the lead. The most likely number of times in the lead is 0 or 2''N'' (following the arcsine distribution). * \lim_ n \operatorname(1,n) = \operatorname(1) the exponential distribution. * \lim_ n \operatorname(k,n) = \operatorname(k,1) the gamma distribution. * For large n, \operatorname(\alpha n,\beta n) \to \mathcal\left(\frac,\frac\frac\right) the normal distribution. More precisely, if X_n \sim \operatorname(\alpha n,\beta n) then \sqrt\left(X_n -\tfrac\right) converges in distribution to a normal distribution with mean 0 and variance \tfrac as ''n'' increases.
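Two of the limiting cases above can be illustrated numerically: the exponential limit of n·Beta(1, n) and the normal approximation of Beta(αn, βn) for large n (the approximating variance is written here as αβ/((α+β)³n), consistent with the limit stated above); an illustrative sketch:

```python
import numpy as np
from scipy.stats import beta, expon, kstest, norm

rng = np.random.default_rng(3)

# n * Beta(1, n) -> Exponential(1) as n grows
n = 2000
x = n * beta.rvs(1, n, size=100_000, random_state=rng)
print(kstest(x, expon.cdf).pvalue)          # large p-value: consistent with Exp(1)

# Beta(alpha*n, beta*n) is approximately normal for large n: compare a few quantiles
a, b, n = 2.0, 3.0, 500
m = a / (a + b)
s = np.sqrt(a * b / ((a + b) ** 3 * n))
for q in (0.025, 0.5, 0.975):
    print(q, beta.ppf(q, a * n, b * n), norm.ppf(q, loc=m, scale=s))
```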


Derived from other distributions

* The ''k''th order statistic of a sample of size ''n'' from the Uniform distribution (continuous), uniform distribution is a beta random variable, ''U''(''k'') ~ Beta(''k'', ''n''+1−''k''). * If ''X'' ~ Gamma(α, θ) and ''Y'' ~ Gamma(β, θ) are independent, then \tfrac \sim \operatorname(\alpha, \beta)\,. * If X \sim \chi^2(\alpha)\, and Y \sim \chi^2(\beta)\, are independent, then \tfrac \sim \operatorname(\tfrac, \tfrac). * If ''X'' ~ U(0, 1) and ''α'' > 0 then ''X''1/''α'' ~ Beta(''α'', 1). The power function distribution. * If X \sim\operatorname(k;n;p), then \sim \operatorname(\alpha, \beta) for discrete values of ''n'' and ''k'' where \alpha=k+1 and \beta=n-k+1. * If ''X'' ~ Cauchy(0, 1) then \tfrac \sim \operatorname\left(\tfrac12, \tfrac12\right)\,
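The gamma-ratio and order-statistic constructions above are easy to verify by simulation; an illustrative sketch using Kolmogorov–Smirnov tests:

```python
import numpy as np
from scipy.stats import beta, gamma, kstest

rng = np.random.default_rng(4)
a, b = 2.0, 5.0

# X ~ Gamma(a, theta), Y ~ Gamma(b, theta) independent  =>  X/(X+Y) ~ Beta(a, b)
x = gamma.rvs(a, scale=1.0, size=100_000, random_state=rng)
y = gamma.rvs(b, scale=1.0, size=100_000, random_state=rng)
print(kstest(x / (x + y), beta(a, b).cdf).pvalue)

# k-th order statistic of n iid U(0,1) variables  ~  Beta(k, n+1-k)
n, k = 10, 3
u = rng.uniform(size=(100_000, n))
kth = np.sort(u, axis=1)[:, k - 1]
print(kstest(kth, beta(k, n + 1 - k).cdf).pvalue)
```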


Combination with other distributions

* ''X'' ~ Beta(''α'', ''β'') and ''Y'' ~ F(2''β'',2''α'') then \Pr(X \leq \tfrac \alpha ) = \Pr(Y \geq x)\, for all ''x'' > 0.


Compounding with other distributions

* If ''p'' ~ Beta(α, β) and ''X'' ~ Bin(''k'', ''p'') then ''X'' ~ beta-binomial distribution * If ''p'' ~ Beta(α, β) and ''X'' ~ NB(''r'', ''p'') then ''X'' ~ beta negative binomial distribution
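The beta-binomial compounding can be illustrated by simulating p from a beta distribution and then drawing binomial counts, comparing the empirical frequencies with SciPy's `betabinom` pmf; a minimal sketch:

```python
import numpy as np
from scipy.stats import beta, binom, betabinom

rng = np.random.default_rng(5)
a, b, k = 2.0, 3.0, 10

# Compound: p ~ Beta(a, b), then X | p ~ Binomial(k, p)  =>  X ~ BetaBinomial(k, a, b)
p = beta.rvs(a, b, size=200_000, random_state=rng)
x = binom.rvs(k, p, random_state=rng)

empirical = np.bincount(x, minlength=k + 1) / x.size
print(np.round(empirical, 4))
print(np.round(betabinom.pmf(np.arange(k + 1), k, a, b), 4))  # should be close
```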


Generalisations

* The generalization to multiple variables, i.e. a Dirichlet distribution, multivariate Beta distribution, is called a
Dirichlet distribution
. Univariate marginals of the Dirichlet distribution have a beta distribution. The beta distribution is Conjugate prior, conjugate to the binomial and Bernoulli distributions in exactly the same way as the
Dirichlet distribution
is conjugate to the multinomial distribution and categorical distribution. * The Pearson distribution#The Pearson type I distribution, Pearson type I distribution is identical to the beta distribution (except for arbitrary shifting and re-scaling that can also be accomplished with the four parameter parametrization of the beta distribution). * The beta distribution is the special case of the noncentral beta distribution where \lambda = 0: \operatorname(\alpha, \beta) = \operatorname(\alpha,\beta,0). * The generalized beta distribution is a five-parameter distribution family which has the beta distribution as a special case. * The matrix variate beta distribution is a distribution for positive-definite matrices.


Statistical inference


Parameter estimation


Method of moments


=Two unknown parameters

= Two unknown parameters (\hat{\alpha}, \hat{\beta}) of a beta distribution supported in the [0, 1] interval can be estimated, using the method of moments, with the first two moments (sample mean and sample variance) as follows. Let:
: \text{sample mean}(X) = \bar{x} = \frac{1}{N}\sum_{i=1}^N X_i
be the sample mean estimate and
: \text{sample variance}(X) = \bar{v} = \frac{1}{N-1}\sum_{i=1}^N (X_i - \bar{x})^2
be the sample variance estimate. The method of moments (statistics), method-of-moments estimates of the parameters are
:\hat{\alpha} = \bar{x} \left(\frac{\bar{x}(1 - \bar{x})}{\bar{v}} - 1 \right), if \bar{v} < \bar{x}(1 - \bar{x}),
: \hat{\beta} = (1-\bar{x}) \left(\frac{\bar{x}(1 - \bar{x})}{\bar{v}} - 1 \right), if \bar{v} < \bar{x}(1 - \bar{x}).
When the distribution is required over a known interval other than
[0, 1]
with random variable ''X'', say [''a'', ''c''] with random variable ''Y'', then replace \bar{x} with \frac{\bar{y}-a}{c-a} and \bar{v} with \frac{\bar{v}_Y}{(c-a)^2} in the above couple of equations for the shape parameters (see the "Alternative parametrizations, four parameters" section below), where:
: \text{sample mean}(Y) = \bar{y} = \frac{1}{N}\sum_{i=1}^N Y_i
: \text{sample variance}(Y) = \bar{v}_Y = \frac{1}{N-1}\sum_{i=1}^N (Y_i - \bar{y})^2
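A minimal Python implementation of these method-of-moments estimates for data on [0, 1], assuming NumPy (the function name and the synthetic test data are mine):

    import numpy as np

    def beta_method_of_moments(x):
        """Moment estimates (alpha_hat, beta_hat) for data supported on [0, 1]."""
        xbar = x.mean()
        vbar = x.var(ddof=1)                         # sample variance
        if vbar >= xbar * (1.0 - xbar):
            raise ValueError("sample variance too large for a beta fit")
        common = xbar * (1.0 - xbar) / vbar - 1.0
        return xbar * common, (1.0 - xbar) * common

    rng = np.random.default_rng(4)
    data = rng.beta(2.0, 5.0, size=10_000)           # illustrative synthetic sample
    print(beta_method_of_moments(data))              # roughly (2, 5)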


=Four unknown parameters

= All four parameters (\hat, \hat, \hat, \hat of a beta distribution supported in the [''a'', ''c''] interval -see section Beta distribution#Four parameters 2, "Alternative parametrizations, Four parameters"-) can be estimated, using the method of moments developed by Karl Pearson, by equating sample and population values of the first four central moments (mean, variance, skewness and excess kurtosis). The excess kurtosis was expressed in terms of the square of the skewness, and the sample size ν = α + β, (see previous section Beta distribution#Kurtosis, "Kurtosis") as follows: :\text =\frac\left(\frac (\text)^2 - 1\right)\text^2-2< \text< \tfrac (\text)^2 One can use this equation to solve for the sample size ν= α + β in terms of the square of the skewness and the excess kurtosis as follows: :\hat = \hat + \hat = 3\frac :\text^2-2< \text< \tfrac (\text)^2 This is the ratio (multiplied by a factor of 3) between the previously derived limit boundaries for the beta distribution in a space (as originally done by Karl Pearson) defined with coordinates of the square of the skewness in one axis and the excess kurtosis in the other axis (see ): The case of zero skewness, can be immediately solved because for zero skewness, α = β and hence ν = 2α = 2β, therefore α = β = ν/2 : \hat = \hat = \frac= \frac : \text= 0 \text -2<\text<0 (Excess kurtosis is negative for the beta distribution with zero skewness, ranging from -2 to 0, so that \hat -and therefore the sample shape parameters- is positive, ranging from zero when the shape parameters approach zero and the excess kurtosis approaches -2, to infinity when the shape parameters approach infinity and the excess kurtosis approaches zero). For non-zero sample skewness one needs to solve a system of two coupled equations. Since the skewness and the excess kurtosis are independent of the parameters \hat, \hat, the parameters \hat, \hat can be uniquely determined from the sample skewness and the sample excess kurtosis, by solving the coupled equations with two known variables (sample skewness and sample excess kurtosis) and two unknowns (the shape parameters): :(\text)^2 = \frac :\text =\frac\left(\frac (\text)^2 - 1\right) :\text^2-2< \text< \tfrac(\text)^2 resulting in the following solution: : \hat, \hat = \frac \left (1 \pm \frac \right ) : \text\neq 0 \text (\text)^2-2< \text< \tfrac (\text)^2 Where one should take the solutions as follows: \hat>\hat for (negative) sample skewness < 0, and \hat<\hat for (positive) sample skewness > 0. The accompanying plot shows these two solutions as surfaces in a space with horizontal axes of (sample excess kurtosis) and (sample squared skewness) and the shape parameters as the vertical axis. The surfaces are constrained by the condition that the sample excess kurtosis must be bounded by the sample squared skewness as stipulated in the above equation. The two surfaces meet at the right edge defined by zero skewness. Along this right edge, both parameters are equal and the distribution is symmetric U-shaped for α = β < 1, uniform for α = β = 1, upside-down-U-shaped for 1 < α = β < 2 and bell-shaped for α = β > 2. The surfaces also meet at the front (lower) edge defined by "the impossible boundary" line (excess kurtosis + 2 - skewness2 = 0). 
Along this front (lower) boundary both shape parameters approach zero, and the probability density is concentrated more at one end than the other end (with practically nothing in between), with probabilities p=\tfrac at the left end ''x'' = 0 and q = 1-p = \tfrac at the right end ''x'' = 1. The two surfaces become further apart towards the rear edge. At this rear edge the surface parameters are quite different from each other. As remarked, for example, by Bowman and Shenton, sampling in the neighborhood of the line (sample excess kurtosis - (3/2)(sample skewness)2 = 0) (the just-J-shaped portion of the rear edge where blue meets beige), "is dangerously near to chaos", because at that line the denominator of the expression above for the estimate ν = α + β becomes zero and hence ν approaches infinity as that line is approached. Bowman and Shenton write that "the higher moment parameters (kurtosis and skewness) are extremely fragile (near that line). However, the mean and standard deviation are fairly reliable." Therefore, the problem is for the case of four parameter estimation for very skewed distributions such that the excess kurtosis approaches (3/2) times the square of the skewness. This boundary line is produced by extremely skewed distributions with very large values of one of the parameters and very small values of the other parameter. See for a numerical example and further comments about this rear edge boundary line (sample excess kurtosis - (3/2)(sample skewness)2 = 0). As remarked by Karl Pearson himself this issue may not be of much practical importance as this trouble arises only for very skewed J-shaped (or mirror-image J-shaped) distributions with very different values of shape parameters that are unlikely to occur much in practice). The usual skewed-bell-shape distributions that occur in practice do not have this parameter estimation problem. The remaining two parameters \hat, \hat can be determined using the sample mean and the sample variance using a variety of equations. One alternative is to calculate the support interval range (\hat-\hat) based on the sample variance and the sample kurtosis. For this purpose one can solve, in terms of the range (\hat- \hat), the equation expressing the excess kurtosis in terms of the sample variance, and the sample size ν (see and ): :\text =\frac\bigg(\frac - 6 - 5 \hat \bigg) to obtain: : (\hat- \hat) = \sqrt\sqrt Another alternative is to calculate the support interval range (\hat-\hat) based on the sample variance and the sample skewness. For this purpose one can solve, in terms of the range (\hat-\hat), the equation expressing the squared skewness in terms of the sample variance, and the sample size ν (see section titled "Skewness" and "Alternative parametrizations, four parameters"): :(\text)^2 = \frac\bigg(\frac-4(1+\hat)\bigg) to obtain: : (\hat- \hat) = \frac\sqrt The remaining parameter can be determined from the sample mean and the previously obtained parameters: (\hat-\hat), \hat, \hat = \hat+\hat: : \hat = (\text) - \left(\frac\right)(\hat-\hat) and finally, \hat= (\hat- \hat) + \hat . In the above formulas one may take, for example, as estimates of the sample moments: :\begin \text &=\overline = \frac\sum_^N Y_i \\ \text &= \overline_Y = \frac\sum_^N (Y_i - \overline)^2 \\ \text &= G_1 = \frac \frac \\ \text &= G_2 = \frac \frac - \frac \end The estimators ''G''1 for skewness, sample skewness and ''G''2 for kurtosis, sample kurtosis are used by DAP (software), DAP/SAS System, SAS, PSPP/SPSS, and Microsoft Excel, Excel. 
However, they are not used by BMDP and (according to ) they were not used by MINITAB in 1998. Actually, Joanes and Gill in their 1998 study concluded that the skewness and kurtosis estimators used in BMDP and in MINITAB (at that time) had smaller variance and mean-squared error in normal samples, but the skewness and kurtosis estimators used in DAP (software), DAP/SAS System, SAS, PSPP/SPSS, namely ''G''1 and ''G''2, had smaller mean-squared error in samples from a very skewed distribution. It is for this reason that we have spelled out "sample skewness", etc., in the above formulas, to make it explicit that the user should choose the best estimator according to the problem at hand, as the best estimator for skewness and kurtosis depends on the amount of skewness (as shown by Joanes and Gill).
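A sketch of the first step of the four-parameter method of moments described above, assuming SciPy and using my own function name: estimate ν̂ = α̂ + β̂ from the sample skewness and excess kurtosis, then split it into α̂ and β̂ (the smaller shape parameter goes with positive skewness). Which skewness and kurtosis estimators to use is the user's choice, as the text notes; here scipy.stats.skew and scipy.stats.kurtosis are used.

    import numpy as np
    from scipy import stats

    def shape_from_skew_kurt(y):
        skew = stats.skew(y)                         # one possible skewness estimator
        kurt = stats.kurtosis(y)                     # excess kurtosis (Fisher definition)
        nu = 3.0 * (kurt - skew**2 + 2.0) / (1.5 * skew**2 - kurt)
        if np.isclose(skew, 0.0):
            return nu / 2.0, nu / 2.0                # symmetric case: alpha = beta = nu/2
        delta = 1.0 / np.sqrt(1.0 + 16.0 * (nu + 1.0) / ((nu + 2.0)**2 * skew**2))
        a_hat = nu / 2.0 * (1.0 - delta) if skew > 0 else nu / 2.0 * (1.0 + delta)
        return a_hat, nu - a_hat

    rng = np.random.default_rng(5)
    y = 2.0 + 3.0 * rng.beta(2.0, 6.0, size=200_000)  # Beta(2, 6) rescaled to [2, 5]
    print(shape_from_skew_kurt(y))                    # roughly (2, 6)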


Maximum likelihood


=Two unknown parameters

= As is also the case for maximum likelihood estimates for the gamma distribution, the maximum likelihood estimates for the beta distribution do not have a general closed form solution for arbitrary values of the shape parameters. If ''X''1, ..., ''XN'' are independent random variables each having a beta distribution, the joint log likelihood function for ''N'' independent and identically distributed random variables, iid observations is: :\begin \ln\, \mathcal (\alpha, \beta\mid X) &= \sum_^N \ln \left (\mathcal_i (\alpha, \beta\mid X_i) \right )\\ &= \sum_^N \ln \left (f(X_i;\alpha,\beta) \right ) \\ &= \sum_^N \ln \left (\frac \right ) \\ &= (\alpha - 1)\sum_^N \ln (X_i) + (\beta- 1)\sum_^N \ln (1-X_i) - N \ln \Beta(\alpha,\beta) \end Finding the maximum with respect to a shape parameter involves taking the partial derivative with respect to the shape parameter and setting the expression equal to zero yielding the maximum likelihood estimator of the shape parameters: :\frac = \sum_^N \ln X_i -N\frac=0 :\frac = \sum_^N \ln (1-X_i)- N\frac=0 where: :\frac = -\frac+ \frac+ \frac=-\psi(\alpha + \beta) + \psi(\alpha) + 0 :\frac= - \frac+ \frac + \frac=-\psi(\alpha + \beta) + 0 + \psi(\beta) since the
digamma function In mathematics, the digamma function is defined as the logarithmic derivative of the gamma function: :\psi(x)=\frac\ln\big(\Gamma(x)\big)=\frac\sim\ln-\frac. It is the first of the polygamma functions. It is strictly increasing and strict ...
denoted ψ(α) is defined as the logarithmic derivative of the
gamma function In mathematics, the gamma function (represented by , the capital letter gamma from the Greek alphabet) is one commonly used extension of the factorial function to complex numbers. The gamma function is defined for all complex numbers except ...
: :\psi(\alpha) =\frac To ensure that the values with zero tangent slope are indeed a maximum (instead of a saddle-point or a minimum) one has to also satisfy the condition that the curvature is negative. This amounts to satisfying that the second partial derivative with respect to the shape parameters is negative :\frac= -N\frac<0 :\frac = -N\frac<0 using the previous equations, this is equivalent to: :\frac = \psi_1(\alpha)-\psi_1(\alpha + \beta) > 0 :\frac = \psi_1(\beta) -\psi_1(\alpha + \beta) > 0 where the
trigamma function In mathematics, the trigamma function, denoted or , is the second of the polygamma functions, and is defined by : \psi_1(z) = \frac \ln\Gamma(z). It follows from this definition that : \psi_1(z) = \frac \psi(z) where is the digamma functio ...
, denoted ''ψ''1(''α''), is the second of the
polygamma function In mathematics, the polygamma function of order is a meromorphic function on the complex numbers \mathbb defined as the th derivative of the logarithm of the gamma function: :\psi^(z) := \frac \psi(z) = \frac \ln\Gamma(z). Thus :\psi^(z) ...
s, and is defined as the derivative of the digamma function: :\psi_1(\alpha) = \frac=\, \frac. These conditions are equivalent to stating that the variances of the logarithmically transformed variables are positive, since: :\operatorname[\ln (X)] = \operatorname[\ln^2 (X)] - (\operatorname[\ln (X)])^2 = \psi_1(\alpha) - \psi_1(\alpha + \beta) :\operatorname ln (1-X)= \operatorname[\ln^2 (1-X)] - (\operatorname[\ln (1-X)])^2 = \psi_1(\beta) - \psi_1(\alpha + \beta) Therefore, the condition of negative curvature at a maximum is equivalent to the statements: : \operatorname[\ln (X)] > 0 : \operatorname ln (1-X)> 0 Alternatively, the condition of negative curvature at a maximum is also equivalent to stating that the following logarithmic derivatives of the geometric means ''GX'' and ''G(1−X)'' are positive, since: : \psi_1(\alpha) - \psi_1(\alpha + \beta) = \frac > 0 : \psi_1(\beta) - \psi_1(\alpha + \beta) = \frac > 0 While these slopes are indeed positive, the other slopes are negative: :\frac, \frac < 0. The slopes of the mean and the median with respect to ''α'' and ''β'' display similar sign behavior. From the condition that at a maximum, the partial derivative with respect to the shape parameter equals zero, we obtain the following system of coupled maximum likelihood estimate equations (for the average log-likelihoods) that needs to be inverted to obtain the (unknown) shape parameter estimates \hat,\hat in terms of the (known) average of logarithms of the samples ''X''1, ..., ''XN'': :\begin \hat[\ln (X)] &= \psi(\hat) - \psi(\hat + \hat)=\frac\sum_^N \ln X_i = \ln \hat_X \\ \hat[\ln(1-X)] &= \psi(\hat) - \psi(\hat + \hat)=\frac\sum_^N \ln (1-X_i)= \ln \hat_ \end where we recognize \log \hat_X as the logarithm of the sample geometric mean and \log \hat_ as the logarithm of the sample geometric mean based on (1 − ''X''), the mirror-image of ''X''. For \hat=\hat, it follows that \hat_X=\hat_ . :\begin \hat_X &= \prod_^N (X_i)^ \\ \hat_ &= \prod_^N (1-X_i)^ \end These coupled equations containing
digamma function In mathematics, the digamma function is defined as the logarithmic derivative of the gamma function: :\psi(x)=\frac\ln\big(\Gamma(x)\big)=\frac\sim\ln-\frac. It is the first of the polygamma functions. It is strictly increasing and strict ...
s of the shape parameter estimates \hat,\hat must be solved by numerical methods as done, for example, by Beckman et al. Gnanadesikan et al. give numerical solutions for a few cases. Norman Lloyd Johnson, N.L.Johnson and Samuel Kotz, S.Kotz suggest that for "not too small" shape parameter estimates \hat,\hat, the logarithmic approximation to the digamma function \psi(\hat) \approx \ln(\hat-\tfrac) may be used to obtain initial values for an iterative solution, since the equations resulting from this approximation can be solved exactly: :\ln \frac \approx \ln \hat_X :\ln \frac\approx \ln \hat_ which leads to the following solution for the initial values (of the estimate shape parameters in terms of the sample geometric means) for an iterative solution: :\hat\approx \tfrac + \frac \text \hat >1 :\hat\approx \tfrac + \frac \text \hat > 1 Alternatively, the estimates provided by the method of moments can instead be used as initial values for an iterative solution of the maximum likelihood coupled equations in terms of the digamma functions. When the distribution is required over a known interval other than
[0, 1]
with random variable ''X'', say [''a'', ''c''] with random variable ''Y'', then replace ln(''Xi'') in the first equation with :\ln \frac, and replace ln(1−''Xi'') in the second equation with :\ln \frac (see "Alternative parametrizations, four parameters" section below). If one of the shape parameters is known, the problem is considerably simplified. The following logit transformation can be used to solve for the unknown shape parameter (for skewed cases such that \hat\neq\hat, otherwise, if symmetric, both -equal- parameters are known when one is known): :\hat \left[\ln \left(\frac \right) \right]=\psi(\hat) - \psi(\hat)=\frac\sum_^N \ln\frac = \ln \hat_X - \ln \left(\hat_\right) This logit transformation is the logarithm of the transformation that divides the variable ''X'' by its mirror-image (''X''/(1 - ''X'') resulting in the "inverted beta distribution" or
beta prime distribution In probability theory and statistics, the beta prime distribution (also known as inverted beta distribution or beta distribution of the second kindJohnson et al (1995), p 248) is an absolutely continuous probability distribution. Definitions ...
(also known as beta distribution of the second kind or Pearson distribution, Pearson's Type VI) with support [0, +∞). As previously discussed in the section "Moments of logarithmically transformed random variables," the logit transformation \ln\frac, studied by Johnson, extends the finite support
[0, 1]
based on the original variable ''X'' to infinite support in both directions of the real line (−∞, +∞). If, for example, \hat is known, the unknown parameter \hat can be obtained in terms of the inverse digamma function of the right hand side of this equation: :\psi(\hat)=\frac\sum_^N \ln\frac + \psi(\hat) :\hat=\psi^(\ln \hat_X - \ln \hat_ + \psi(\hat)) In particular, if one of the shape parameters has a value of unity, for example for \hat = 1 (the power function distribution with bounded support [0,1]), using the identity ψ(''x'' + 1) = ψ(''x'') + 1/''x'' in the equation \psi(\hat) - \psi(\hat + \hat)= \ln \hat_X, the maximum likelihood estimator for the unknown parameter \hat is, exactly: :\hat= - \frac= - \frac The beta has support [0, 1], therefore \hat_X < 1, and hence (-\ln \hat_X) >0, and therefore \hat >0. In conclusion, the maximum likelihood estimates of the shape parameters of a beta distribution are (in general) a complicated function of the sample geometric mean, and of the sample geometric mean based on ''(1−X)'', the mirror-image of ''X''. One may ask, if the variance (in addition to the mean) is necessary to estimate two shape parameters with the method of moments, why is the (logarithmic or geometric) variance not necessary to estimate two shape parameters with the maximum likelihood method, for which only the geometric means suffice? The answer is because the mean does not provide as much information as the geometric mean. For a beta distribution with equal shape parameters ''α'' = ''β'', the mean is exactly 1/2, regardless of the value of the shape parameters, and therefore regardless of the value of the statistical dispersion (the variance). On the other hand, the geometric mean of a beta distribution with equal shape parameters ''α'' = ''β'', depends on the value of the shape parameters, and therefore it contains more information. Also, the geometric mean of a beta distribution does not satisfy the symmetry conditions satisfied by the mean, therefore, by employing both the geometric mean based on ''X'' and geometric mean based on (1 − ''X''), the maximum likelihood method is able to provide best estimates for both parameters ''α'' = ''β'', without need of employing the variance. One can express the joint log likelihood per ''N'' independent and identically distributed random variables, iid observations in terms of the ''sufficient statistics'' (the sample geometric means) as follows: :\frac = (\alpha - 1)\ln \hat_X + (\beta- 1)\ln \hat_- \ln \Beta(\alpha,\beta). We can plot the joint log likelihood per ''N'' observations for fixed values of the sample geometric means to see the behavior of the likelihood function as a function of the shape parameters α and β. In such a plot, the shape parameter estimators \hat,\hat correspond to the maxima of the likelihood function. See the accompanying graph that shows that all the likelihood functions intersect at α = β = 1, which corresponds to the values of the shape parameters that give the maximum entropy (the maximum entropy occurs for shape parameters equal to unity: the uniform distribution). It is evident from the plot that the likelihood function gives sharp peaks for values of the shape parameter estimators close to zero, but that for values of the shape parameters estimators greater than one, the likelihood function becomes quite flat, with less defined peaks. 
Obviously, the maximum likelihood parameter estimation method for the beta distribution becomes less acceptable for larger values of the shape parameter estimators, as the uncertainty in the peak definition increases with the value of the shape parameter estimators. One can arrive at the same conclusion by noticing that the expression for the curvature of the likelihood function is in terms of the geometric variances :\frac= -\operatorname ln X/math> :\frac = -\operatorname[\ln (1-X)] These variances (and therefore the curvatures) are much larger for small values of the shape parameter α and β. However, for shape parameter values α, β > 1, the variances (and therefore the curvatures) flatten out. Equivalently, this result follows from the Cramér–Rao bound, since the
Fisher information In mathematical statistics, the Fisher information (sometimes simply called information) is a way of measuring the amount of information that an observable random variable ''X'' carries about an unknown parameter ''θ'' of a distribution that model ...
matrix components for the beta distribution are these logarithmic variances. The Cramér–Rao bound states that the
variance In probability theory and statistics, variance is the expectation of the squared deviation of a random variable from its population mean or sample mean. Variance is a measure of dispersion, meaning it is a measure of how far a set of numbe ...
of any ''unbiased'' estimator \hat of α is bounded by the multiplicative inverse, reciprocal of the
Fisher information In mathematical statistics, the Fisher information (sometimes simply called information) is a way of measuring the amount of information that an observable random variable ''X'' carries about an unknown parameter ''θ'' of a distribution that model ...
: :\mathrm(\hat)\geq\frac\geq\frac :\mathrm(\hat) \geq\frac\geq\frac so the variance of the estimators increases with increasing α and β, as the logarithmic variances decrease. Also one can express the joint log likelihood per ''N'' independent and identically distributed random variables, iid observations in terms of the
digamma function In mathematics, the digamma function is defined as the logarithmic derivative of the gamma function: :\psi(x)=\frac\ln\big(\Gamma(x)\big)=\frac\sim\ln-\frac. It is the first of the polygamma functions. It is strictly increasing and strict ...
expressions for the logarithms of the sample geometric means as follows: :\frac = (\alpha - 1)(\psi(\hat) - \psi(\hat + \hat))+(\beta- 1)(\psi(\hat) - \psi(\hat + \hat))- \ln \Beta(\alpha,\beta) this expression is identical to the negative of the cross-entropy (see section on "Quantities of information (entropy)"). Therefore, finding the maximum of the joint log likelihood of the shape parameters, per ''N'' independent and identically distributed random variables, iid observations, is identical to finding the minimum of the cross-entropy for the beta distribution, as a function of the shape parameters. :\frac = - H = -h - D_ = -\ln\Beta(\alpha,\beta)+(\alpha-1)\psi(\hat)+(\beta-1)\psi(\hat)-(\alpha+\beta-2)\psi(\hat+\hat) with the cross-entropy defined as follows: :H = \int_^1 - f(X;\hat,\hat) \ln (f(X;\alpha,\beta)) \, X
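A sketch of the two-parameter maximum-likelihood fit, assuming SciPy and NumPy; the function name is mine. It solves the coupled digamma equations above with a generic root finder, using the method-of-moments estimates as starting values, as the text suggests, and compares the answer with SciPy's own fitter.

    import numpy as np
    from scipy import optimize, special, stats

    def beta_mle(x):
        log_gx = np.log(x).mean()                    # ln of the sample geometric mean of X
        log_g1x = np.log1p(-x).mean()                # ln of the sample geometric mean of 1 - X

        def equations(params):
            a, b = params
            return [special.digamma(a) - special.digamma(a + b) - log_gx,
                    special.digamma(b) - special.digamma(a + b) - log_g1x]

        m, v = x.mean(), x.var(ddof=1)               # method-of-moments starting values
        common = m * (1.0 - m) / v - 1.0
        return optimize.fsolve(equations, [m * common, (1.0 - m) * common])

    rng = np.random.default_rng(6)
    data = rng.beta(0.7, 2.5, size=50_000)           # illustrative synthetic sample
    print(beta_mle(data))                            # close to (0.7, 2.5)
    print(stats.beta.fit(data, floc=0, fscale=1)[:2])  # SciPy's MLE on [0, 1], for comparison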


=Four unknown parameters

= The procedure is similar to the one followed in the two unknown parameter case. If ''Y''1, ..., ''YN'' are independent random variables each having a beta distribution with four parameters, the joint log likelihood function for ''N'' independent and identically distributed random variables, iid observations is: :\begin \ln\, \mathcal (\alpha, \beta, a, c\mid Y) &= \sum_^N \ln\,\mathcal_i (\alpha, \beta, a, c\mid Y_i)\\ &= \sum_^N \ln\,f(Y_i; \alpha, \beta, a, c) \\ &= \sum_^N \ln\,\frac\\ &= (\alpha - 1)\sum_^N \ln (Y_i - a) + (\beta- 1)\sum_^N \ln (c - Y_i)- N \ln \Beta(\alpha,\beta) - N (\alpha+\beta - 1) \ln (c - a) \end Finding the maximum with respect to a shape parameter involves taking the partial derivative with respect to the shape parameter and setting the expression equal to zero yielding the maximum likelihood estimator of the shape parameters: :\frac= \sum_^N \ln (Y_i - a) - N(-\psi(\alpha + \beta) + \psi(\alpha))- N \ln (c - a)= 0 :\frac = \sum_^N \ln (c - Y_i) - N(-\psi(\alpha + \beta) + \psi(\beta))- N \ln (c - a)= 0 :\frac = -(\alpha - 1) \sum_^N \frac \,+ N (\alpha+\beta - 1)\frac= 0 :\frac = (\beta- 1) \sum_^N \frac \,- N (\alpha+\beta - 1) \frac = 0 these equations can be re-arranged as the following system of four coupled equations (the first two equations are geometric means and the second two equations are the harmonic means) in terms of the maximum likelihood estimates for the four parameters \hat, \hat, \hat, \hat: :\frac\sum_^N \ln \frac = \psi(\hat)-\psi(\hat +\hat )= \ln \hat_X :\frac\sum_^N \ln \frac = \psi(\hat)-\psi(\hat + \hat)= \ln \hat_ :\frac = \frac= \hat_X :\frac = \frac = \hat_ with sample geometric means: :\hat_X = \prod_^ \left (\frac \right )^ :\hat_ = \prod_^ \left (\frac \right )^ The parameters \hat, \hat are embedded inside the geometric mean expressions in a nonlinear way (to the power 1/''N''). This precludes, in general, a closed form solution, even for an initial value approximation for iteration purposes. One alternative is to use as initial values for iteration the values obtained from the method of moments solution for the four parameter case. Furthermore, the expressions for the harmonic means are well-defined only for \hat, \hat > 1, which precludes a maximum likelihood solution for shape parameters less than unity in the four-parameter case. Fisher's information matrix for the four parameter case is Positive-definite matrix, positive-definite only for α, β > 2 (for further discussion, see section on Fisher information matrix, four parameter case), for bell-shaped (symmetric or unsymmetric) beta distributions, with inflection points located to either side of the mode. The following Fisher information components (that represent the expectations of the curvature of the log likelihood function) have mathematical singularity, singularities at the following values: :\alpha = 2: \quad \operatorname \left [- \frac \frac \right ]= _ :\beta = 2: \quad \operatorname\left [- \frac \frac \right ] = _ :\alpha = 2: \quad \operatorname\left [- \frac\frac\right ] = _ :\beta = 1: \quad \operatorname\left [- \frac\frac \right ] = _ (for further discussion see section on Fisher information matrix). Thus, it is not possible to strictly carry on the maximum likelihood estimation for some well known distributions belonging to the four-parameter beta distribution family, like the continuous uniform distribution, uniform distribution (Beta(1, 1, ''a'', ''c'')), and the arcsine distribution (Beta(1/2, 1/2, ''a'', ''c'')). 
Norman Lloyd Johnson, N.L.Johnson and Samuel Kotz, S.Kotz ignore the equations for the harmonic means and instead suggest "If a and c are unknown, and maximum likelihood estimators of ''a'', ''c'', α and β are required, the above procedure (for the two unknown parameter case, with ''X'' transformed as ''X'' = (''Y'' − ''a'')/(''c'' − ''a'')) can be repeated using a succession of trial values of ''a'' and ''c'', until the pair (''a'', ''c'') for which maximum likelihood (given ''a'' and ''c'') is as great as possible, is attained" (where, for the purpose of clarity, their notation for the parameters has been translated into the present notation).
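A rough sketch of the Johnson and Kotz suggestion, assuming SciPy (the grid and the synthetic data are illustrative): profile over trial values of (a, c), fit the two shape parameters for each pair with the support held fixed, and keep the pair with the largest log likelihood.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(7)
    y = 1.0 + 4.0 * rng.beta(3.0, 4.0, size=5_000)   # Beta(3, 4) rescaled to [1, 5]

    best = None
    for a in np.linspace(y.min() - 0.5, y.min() - 1e-3, 20):       # trial minima
        for c in np.linspace(y.max() + 1e-3, y.max() + 0.5, 20):   # trial maxima
            alpha, beta_, _, _ = stats.beta.fit(y, floc=a, fscale=c - a)
            ll = stats.beta(alpha, beta_, loc=a, scale=c - a).logpdf(y).sum()
            if best is None or ll > best[0]:
                best = (ll, alpha, beta_, a, c)

    print(best[1:])                                  # roughly (3, 4, 1, 5)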


Fisher information matrix

Let a random variable X have a probability density ''f''(''x'';''α''). The partial derivative with respect to the (unknown, and to be estimated) parameter α of the log
likelihood function The likelihood function (often simply called the likelihood) represents the probability of random variable realizations conditional on particular values of the statistical parameters. Thus, when evaluated on a given sample, the likelihood funct ...
is called the score (statistics), score. The second moment of the score is called the
Fisher information In mathematical statistics, the Fisher information (sometimes simply called information) is a way of measuring the amount of information that an observable random variable ''X'' carries about an unknown parameter ''θ'' of a distribution that model ...
: :\mathcal(\alpha)=\operatorname \left [\left (\frac \ln \mathcal(\alpha\mid X) \right )^2 \right], The expected value, expectation of the score (statistics), score is zero, therefore the Fisher information is also the second moment centered on the mean of the score: the
variance In probability theory and statistics, variance is the expectation of the squared deviation of a random variable from its population mean or sample mean. Variance is a measure of dispersion, meaning it is a measure of how far a set of numbe ...
of the score. If the log
likelihood function The likelihood function (often simply called the likelihood) represents the probability of random variable realizations conditional on particular values of the statistical parameters. Thus, when evaluated on a given sample, the likelihood funct ...
is twice differentiable with respect to the parameter α, and under certain regularity conditions, then the Fisher information may also be written as follows (which is often a more convenient form for calculation purposes): :\mathcal(\alpha) = - \operatorname \left [\frac \ln (\mathcal(\alpha\mid X)) \right]. Thus, the Fisher information is the negative of the expectation of the second derivative with respect to the parameter α of the log
likelihood function The likelihood function (often simply called the likelihood) represents the probability of random variable realizations conditional on particular values of the statistical parameters. Thus, when evaluated on a given sample, the likelihood funct ...
. Therefore, Fisher information is a measure of the curvature of the log likelihood function of α. A low curvature (and therefore high Radius of curvature (mathematics), radius of curvature), flatter log likelihood function curve has low Fisher information; while a log likelihood function curve with large curvature (and therefore low Radius of curvature (mathematics), radius of curvature) has high Fisher information. When the Fisher information matrix is computed at the evaluates of the parameters ("the observed Fisher information matrix") it is equivalent to the replacement of the true log likelihood surface by a Taylor's series approximation, taken as far as the quadratic terms. The word information, in the context of Fisher information, refers to information about the parameters. Information such as: estimation, sufficiency and properties of variances of estimators. The Cramér–Rao bound states that the inverse of the Fisher information is a lower bound on the variance of any
estimator In statistics, an estimator is a rule for calculating an estimate of a given quantity based on observed data: thus the rule (the estimator), the quantity of interest (the estimand) and its result (the estimate) are distinguished. For example, the ...
of a parameter α: :\operatorname[\hat\alpha] \geq \frac. The precision to which one can estimate the estimator of a parameter α is limited by the Fisher Information of the log likelihood function. The Fisher information is a measure of the minimum error involved in estimating a parameter of a distribution and it can be viewed as a measure of the resolving power of an experiment needed to discriminate between two alternative hypothesis of a parameter. When there are ''N'' parameters : \begin \theta_1 \\ \theta_ \\ \dots \\ \theta_ \end, then the Fisher information takes the form of an ''N''×''N'' positive semidefinite matrix, positive semidefinite symmetric matrix, the Fisher Information Matrix, with typical element: :_=\operatorname \left [\left (\frac \ln \mathcal \right) \left(\frac \ln \mathcal \right) \right ]. Under certain regularity conditions, the Fisher Information Matrix may also be written in the following form, which is often more convenient for computation: :_ = - \operatorname \left [\frac \ln (\mathcal) \right ]\,. With ''X''1, ..., ''XN'' iid random variables, an ''N''-dimensional "box" can be constructed with sides ''X''1, ..., ''XN''. Costa and Cover show that the (Shannon) differential entropy ''h''(''X'') is related to the volume of the typical set (having the sample entropy close to the true entropy), while the Fisher information is related to the surface of this typical set.


=Two parameters

= For ''X''1, ..., ''X''''N'' independent random variables each having a beta distribution parametrized with shape parameters ''α'' and ''β'', the joint log likelihood function for ''N'' independent and identically distributed random variables, iid observations is: :\ln (\mathcal (\alpha, \beta\mid X) )= (\alpha - 1)\sum_^N \ln X_i + (\beta- 1)\sum_^N \ln (1-X_i)- N \ln \Beta(\alpha,\beta) therefore the joint log likelihood function per ''N'' independent and identically distributed random variables, iid observations is: :\frac \ln(\mathcal (\alpha, \beta\mid X)) = (\alpha - 1)\frac\sum_^N \ln X_i + (\beta- 1)\frac\sum_^N \ln (1-X_i)-\, \ln \Beta(\alpha,\beta) For the two parameter case, the Fisher information has 4 components: 2 diagonal and 2 off-diagonal. Since the Fisher information matrix is symmetric, one of these off diagonal components is independent. Therefore, the Fisher information matrix has 3 independent components (2 diagonal and 1 off diagonal). Aryal and Nadarajah calculated Fisher's information matrix for the four-parameter case, from which the two parameter case can be obtained as follows: :- \frac= \operatorname[\ln (X)]= \psi_1(\alpha) - \psi_1(\alpha + \beta) =_= \operatorname\left [- \frac \right ] = \ln \operatorname_ :- \frac = \operatorname ln (1-X)= \psi_1(\beta) - \psi_1(\alpha + \beta) =_= \operatorname\left [- \frac \right]= \ln \operatorname_ :- \frac = \operatorname[\ln X,\ln(1-X)] = -\psi_1(\alpha+\beta) =_= \operatorname\left [- \frac \right] = \ln \operatorname_ Since the Fisher information matrix is symmetric : \mathcal_= \mathcal_= \ln \operatorname_ The Fisher information components are equal to the log geometric variances and log geometric covariance. Therefore, they can be expressed as
trigamma function In mathematics, the trigamma function, denoted or , is the second of the polygamma functions, and is defined by : \psi_1(z) = \frac \ln\Gamma(z). It follows from this definition that : \psi_1(z) = \frac \psi(z) where is the digamma functio ...
s, denoted ψ1(α), the second of the
polygamma function In mathematics, the polygamma function of order is a meromorphic function on the complex numbers \mathbb defined as the th derivative of the logarithm of the gamma function: :\psi^(z) := \frac \psi(z) = \frac \ln\Gamma(z). Thus :\psi^(z) ...
s, defined as the derivative of the digamma function: :\psi_1(\alpha) = \frac=\, \frac. These derivatives are also derived in the and plots of the log likelihood function are also shown in that section. contains plots and further discussion of the Fisher information matrix components: the log geometric variances and log geometric covariance as a function of the shape parameters α and β. contains formulas for moments of logarithmically transformed random variables. Images for the Fisher information components \mathcal_, \mathcal_ and \mathcal_ are shown in . The determinant of Fisher's information matrix is of interest (for example for the calculation of Jeffreys prior probability). From the expressions for the individual components of the Fisher information matrix, it follows that the determinant of Fisher's (symmetric) information matrix for the beta distribution is: :\begin \det(\mathcal(\alpha, \beta))&= \mathcal_ \mathcal_-\mathcal_ \mathcal_ \\ pt&=(\psi_1(\alpha) - \psi_1(\alpha + \beta))(\psi_1(\beta) - \psi_1(\alpha + \beta))-( -\psi_1(\alpha+\beta))( -\psi_1(\alpha+\beta))\\ pt&= \psi_1(\alpha)\psi_1(\beta)-( \psi_1(\alpha)+\psi_1(\beta))\psi_1(\alpha + \beta)\\ pt\lim_ \det(\mathcal(\alpha, \beta)) &=\lim_ \det(\mathcal(\alpha, \beta)) = \infty\\ pt\lim_ \det(\mathcal(\alpha, \beta)) &=\lim_ \det(\mathcal(\alpha, \beta)) = 0 \end From Sylvester's criterion (checking whether the diagonal elements are all positive), it follows that the Fisher information matrix for the two parameter case is Positive-definite matrix, positive-definite (under the standard condition that the shape parameters are positive ''α'' > 0 and ''β'' > 0).


=Four parameters

= If ''Y''1, ..., ''YN'' are independent random variables each having a beta distribution with four parameters: the exponents ''α'' and ''β'', and also ''a'' (the minimum of the distribution range), and ''c'' (the maximum of the distribution range) (section titled "Alternative parametrizations", "Four parameters"), with
probability density function In probability theory, a probability density function (PDF), or density of a continuous random variable, is a function whose value at any given sample (or point) in the sample space (the set of possible values taken by the random variable) ca ...
: :f(y; \alpha, \beta, a, c) = \frac =\frac=\frac. the joint log likelihood function per ''N'' independent and identically distributed random variables, iid observations is: :\frac \ln(\mathcal (\alpha, \beta, a, c\mid Y))= \frac\sum_^N \ln (Y_i - a) + \frac\sum_^N \ln (c - Y_i)- \ln \Beta(\alpha,\beta) - (\alpha+\beta -1) \ln (c-a) For the four parameter case, the Fisher information has 4*4=16 components. It has 12 off-diagonal components = (4×4 total − 4 diagonal). Since the Fisher information matrix is symmetric, half of these components (12/2=6) are independent. Therefore, the Fisher information matrix has 6 independent off-diagonal + 4 diagonal = 10 independent components. Aryal and Nadarajah calculated Fisher's information matrix for the four parameter case as follows: :- \frac \frac= \operatorname[\ln (X)]= \psi_1(\alpha) - \psi_1(\alpha + \beta) = \mathcal_= \operatorname\left [- \frac \frac \right ] = \ln (\operatorname) :-\frac \frac = \operatorname ln (1-X)= \psi_1(\beta) - \psi_1(\alpha + \beta) =_= \operatorname \left [- \frac \frac \right ] = \ln(\operatorname) :-\frac \frac = \operatorname[\ln X,(1-X)] = -\psi_1(\alpha+\beta) =\mathcal_= \operatorname \left [- \frac\frac \right ] = \ln(\operatorname_) In the above expressions, the use of ''X'' instead of ''Y'' in the expressions var[ln(''X'')] = ln(var''GX'') is ''not an error''. The expressions in terms of the log geometric variances and log geometric covariance occur as functions of the two parameter ''X'' ~ Beta(''α'', ''β'') parametrization because when taking the partial derivatives with respect to the exponents (''α'', ''β'') in the four parameter case, one obtains the identical expressions as for the two parameter case: these terms of the four parameter Fisher information matrix are independent of the minimum ''a'' and maximum ''c'' of the distribution's range. The only non-zero term upon double differentiation of the log likelihood function with respect to the exponents ''α'' and ''β'' is the second derivative of the log of the beta function: ln(B(''α'', ''β'')). This term is independent of the minimum ''a'' and maximum ''c'' of the distribution's range. Double differentiation of this term results in trigamma functions. The sections titled "Maximum likelihood", "Two unknown parameters" and "Four unknown parameters" also show this fact. The Fisher information for ''N'' i.i.d. samples is ''N'' times the individual Fisher information (eq. 11.279, page 394 of Cover and Thomas). (Aryal and Nadarajah take a single observation, ''N'' = 1, to calculate the following components of the Fisher information, which leads to the same result as considering the derivatives of the log likelihood per ''N'' observations. Moreover, below the erroneous expression for _ in Aryal and Nadarajah has been corrected.) 
:\begin \alpha > 2: \quad \operatorname\left [- \frac \frac \right ] &= _=\frac \\ \beta > 2: \quad \operatorname\left[-\frac \frac \right ] &= \mathcal_ = \frac \\ \operatorname\left[- \frac \frac \right ] &= _ = \frac \\ \alpha > 1: \quad \operatorname\left[- \frac \frac \right ] &=\mathcal_ = \frac \\ \operatorname\left[- \frac \frac \right ] &= _ = \frac \\ \operatorname\left[- \frac \frac \right ] &= _ = -\frac \\ \beta > 1: \quad \operatorname\left[- \frac \frac \right ] &= \mathcal_ = -\frac \end The lower two diagonal entries of the Fisher information matrix, with respect to the parameter "a" (the minimum of the distribution's range): \mathcal_, and with respect to the parameter "c" (the maximum of the distribution's range): \mathcal_ are only defined for exponents α > 2 and β > 2 respectively. The Fisher information matrix component \mathcal_ for the minimum "a" approaches infinity for exponent α approaching 2 from above, and the Fisher information matrix component \mathcal_ for the maximum "c" approaches infinity for exponent β approaching 2 from above. The Fisher information matrix for the four parameter case does not depend on the individual values of the minimum "a" and the maximum "c", but only on the total range (''c''−''a''). Moreover, the components of the Fisher information matrix that depend on the range (''c''−''a''), depend only through its inverse (or the square of the inverse), such that the Fisher information decreases for increasing range (''c''−''a''). The accompanying images show the Fisher information components \mathcal_ and \mathcal_. Images for the Fisher information components \mathcal_ and \mathcal_ are shown in . All these Fisher information components look like a basin, with the "walls" of the basin being located at low values of the parameters. The following four-parameter-beta-distribution Fisher information components can be expressed in terms of the two-parameter: ''X'' ~ Beta(α, β) expectations of the transformed ratio ((1-''X'')/''X'') and of its mirror image (''X''/(1-''X'')), scaled by the range (''c''−''a''), which may be helpful for interpretation: :\mathcal_ =\frac= \frac \text\alpha > 1 :\mathcal_ = -\frac=- \frac\text\beta> 1 These are also the expected values of the "inverted beta distribution" or
beta prime distribution In probability theory and statistics, the beta prime distribution (also known as inverted beta distribution or beta distribution of the second kindJohnson et al (1995), p 248) is an absolutely continuous probability distribution. Definitions ...
(also known as beta distribution of the second kind or Pearson distribution, Pearson's Type VI) and its mirror image, scaled by the range (''c'' − ''a''). Also, the following Fisher information components can be expressed in terms of the harmonic (1/X) variances or of variances based on the ratio transformed variables ((1-X)/X) as follows: :\begin \alpha > 2: \quad \mathcal_ &=\operatorname \left [\frac \right] \left (\frac \right )^2 =\operatorname \left [\frac \right ] \left (\frac \right)^2 = \frac \\ \beta > 2: \quad \mathcal_ &= \operatorname \left [\frac \right ] \left (\frac \right )^2 = \operatorname \left [\frac \right ] \left (\frac \right )^2 =\frac \\ \mathcal_ &=\operatorname \left [\frac,\frac \right ]\frac = \operatorname \left [\frac,\frac \right ] \frac =\frac \end See section "Moments of linearly transformed, product and inverted random variables" for these expectations. The determinant of Fisher's information matrix is of interest (for example for the calculation of Jeffreys prior probability). From the expressions for the individual components, it follows that the determinant of Fisher's (symmetric) information matrix for the beta distribution with four parameters is: :\begin \det(\mathcal(\alpha,\beta,a,c)) = & -\mathcal_^2 \mathcal_ \mathcal_+\mathcal_ \mathcal_ \mathcal_ \mathcal_+\mathcal_^2 \mathcal_^2 -\mathcal_ \mathcal_ \mathcal_^2\\ & -\mathcal_ \mathcal_ \mathcal_ \mathcal_+\mathcal_^2 \mathcal_ \mathcal_+2 \mathcal_ \mathcal_ \mathcal_ \mathcal_\\ & -2\mathcal_ \mathcal_ \mathcal_ \mathcal_+\mathcal_^2 \mathcal_^2-\mathcal_ \mathcal_ \mathcal_^2+\mathcal_ \mathcal_^2 \mathcal_\\ & -\mathcal_ \mathcal_ \mathcal_ \mathcal_-\mathcal_ \mathcal_ \mathcal_ \mathcal_+\mathcal_ \mathcal_ \mathcal_ \mathcal_\\ & -\mathcal_ \mathcal_ \mathcal_ \mathcal_+\mathcal_ \mathcal_ \mathcal_ \mathcal_-\mathcal_ \mathcal_^2 \mathcal_\\ & +2 \mathcal_ \mathcal_ \mathcal_ \mathcal_-\mathcal_ \mathcal_^2 \mathcal_-\mathcal_^2 \mathcal_ \mathcal_+\mathcal_ \mathcal_ \mathcal_ \mathcal_\text\alpha, \beta> 2 \end Using Sylvester's criterion (checking whether the diagonal elements are all positive), and since diagonal components _ and _ have Mathematical singularity, singularities at α=2 and β=2 it follows that the Fisher information matrix for the four parameter case is Positive-definite matrix, positive-definite for α>2 and β>2. Since for α > 2 and β > 2 the beta distribution is (symmetric or unsymmetric) bell shaped, it follows that the Fisher information matrix is positive-definite only for bell-shaped (symmetric or unsymmetric) beta distributions, with inflection points located to either side of the mode. Thus, important well known distributions belonging to the four-parameter beta distribution family, like the parabolic distribution (Beta(2,2,a,c)) and the continuous uniform distribution, uniform distribution (Beta(1,1,a,c)) have Fisher information components (\mathcal_,\mathcal_,\mathcal_,\mathcal_) that blow up (approach infinity) in the four-parameter case (although their Fisher information components are all defined for the two parameter case). The four-parameter Wigner semicircle distribution (Beta(3/2,3/2,''a'',''c'')) and arcsine distribution (Beta(1/2,1/2,''a'',''c'')) have negative Fisher information determinants for the four-parameter case.


Bayesian inference

Beta distributions are used in Bayesian inference because they provide a family of conjugate prior probability distributions for binomial (including
Bernoulli Bernoulli can refer to: People *Bernoulli family of 17th and 18th century Swiss mathematicians: ** Daniel Bernoulli (1700–1782), developer of Bernoulli's principle **Jacob Bernoulli (1654–1705), also known as Jacques, after whom Bernoulli numbe ...
) and geometric distributions. The domain of the beta distribution can be viewed as a probability, and in fact the beta distribution is often used to describe the distribution of a probability value ''p'': :P(p;\alpha,\beta) = \frac{p^{\alpha-1}(1-p)^{\beta-1}}{\Beta(\alpha,\beta)}. Examples of beta distributions used as prior probabilities to represent ignorance of prior parameter values in Bayesian inference are Beta(1,1), Beta(0,0) and Beta(1/2,1/2).


Rule of succession

A classic application of the beta distribution is the rule of succession, introduced in the 18th century by Pierre-Simon Laplace in the course of treating the sunrise problem. It states that, given ''s'' successes in ''n'' conditional independence, conditionally independent Bernoulli trials with probability ''p'', the estimate of the expected value in the next trial is \frac{s+1}{n+2}. This estimate is the expected value of the posterior distribution over ''p,'' namely Beta(''s''+1, ''n''−''s''+1), which is given by Bayes' rule if one assumes a uniform prior probability over ''p'' (i.e., Beta(1, 1)) and then observes that ''p'' generated ''s'' successes in ''n'' trials. Laplace's rule of succession has been criticized by prominent scientists. R. T. Cox described Laplace's application of the rule of succession to the sunrise problem ( p. 89) as "a travesty of the proper use of the principle." Keynes remarks ( Ch.XXX, p. 382) "indeed this is so foolish a theorem that to entertain it is discreditable." Karl Pearson showed that the probability that the next (''n'' + 1) trials will be successes, after ''n'' successes in ''n'' trials, is only 50%, which has been considered too low by scientists like Jeffreys and unacceptable as a representation of the scientific process of experimentation to test a proposed scientific law. As pointed out by Jeffreys ( p. 128) (crediting C. D. Broad ) Laplace's rule of succession establishes a high probability of success ((''n''+1)/(''n''+2)) in the next trial, but only a moderate probability (50%) that a further sample (''n''+1) comparable in size will be equally successful. As pointed out by Perks, "The rule of succession itself is hard to accept. It assigns a probability to the next trial which implies the assumption that the actual run observed is an average run and that we are always at the end of an average run. It would, one would think, be more reasonable to assume that we were in the middle of an average run. Clearly a higher value for both probabilities is necessary if they are to accord with reasonable belief." These problems with Laplace's rule of succession motivated Haldane, Perks, Jeffreys and others to search for other forms of prior probability (see the next section). According to Jaynes, the main problem with the rule of succession is that it is not valid when ''s'' = 0 or ''s'' = ''n'' (see rule of succession, for an analysis of its validity).
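A one-line check of Laplace's rule as the posterior mean of Beta(s+1, n−s+1) under the uniform Beta(1,1) prior, assuming SciPy (the counts are illustrative):

    from scipy import stats

    s, n = 7, 10                                     # illustrative: 7 successes in 10 trials
    posterior = stats.beta(s + 1, n - s + 1)         # uniform prior Beta(1, 1) updated by the data
    print(posterior.mean(), (s + 1) / (n + 2))       # both equal 2/3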


Bayes-Laplace prior probability (Beta(1,1))

The beta distribution achieves maximum differential entropy for Beta(1,1): the Uniform density, uniform probability density, for which all values in the domain of the distribution have equal density. This uniform distribution Beta(1,1) was suggested ("with a great deal of doubt") by Thomas Bayes as the prior probability distribution to express ignorance about the correct prior distribution. This prior distribution was adopted (apparently, from his writings, with little sign of doubt) by Pierre-Simon Laplace, and hence it was also known as the "Bayes-Laplace rule" or the "Laplace rule" of "inverse probability" in publications of the first half of the 20th century. In the later part of the 19th century and early part of the 20th century, scientists realized that the assumption of uniform "equal" probability density depended on the actual functions (for example whether a linear or a logarithmic scale was most appropriate) and parametrizations used. In particular, the behavior near the ends of distributions with finite support (for example near ''x'' = 0, for a distribution with initial support at ''x'' = 0) required particular attention. Keynes ( Ch.XXX, p. 381) criticized the use of Bayes's uniform prior probability (Beta(1,1)) that all values between zero and one are equiprobable, as follows: "Thus experience, if it shows anything, shows that there is a very marked clustering of statistical ratios in the neighborhoods of zero and unity, of those for positive theories and for correlations between positive qualities in the neighborhood of zero, and of those for negative theories and for correlations between negative qualities in the neighborhood of unity. "


Haldane's prior probability (Beta(0,0))

The Beta(0,0) distribution was proposed by J.B.S. Haldane, who suggested that the prior probability representing complete uncertainty should be proportional to ''p''−1(1−''p'')−1. The function ''p''−1(1−''p'')−1 can be viewed as the limit of the numerator of the beta distribution as both shape parameters approach zero: α, β → 0. The Beta function (in the denominator of the beta distribution) approaches infinity, for both parameters approaching zero, α, β → 0. Therefore, ''p''−1(1−''p'')−1 divided by the Beta function approaches a 2-point
Bernoulli distribution In probability theory and statistics, the Bernoulli distribution, named after Swiss mathematician Jacob Bernoulli,James Victor Uspensky: ''Introduction to Mathematical Probability'', McGraw-Hill, New York 1937, page 45 is the discrete probabi ...
with equal probability 1/2 at each end, at 0 and 1, and nothing in between, as α, β → 0. A coin-toss: one face of the coin being at 0 and the other face being at 1. The Haldane prior probability distribution Beta(0,0) is an "improper prior" because its integration (from 0 to 1) fails to strictly converge to 1 due to the singularities at each end. However, this is not an issue for computing posterior probabilities unless the sample size is very small. Furthermore, Zellner points out that on the log-odds scale, (the logit transformation ln(''p''/1−''p'')), the Haldane prior is the uniformly flat prior. The fact that a uniform prior probability on the logit transformed variable ln(''p''/1−''p'') (with domain (-∞, ∞)) is equivalent to the Haldane prior on the domain
[0, 1]
was pointed out by Harold Jeffreys in the first edition (1939) of his book Theory of Probability ( p. 123). Jeffreys writes "Certainly if we take the Bayes-Laplace rule right up to the extremes we are led to results that do not correspond to anybody's way of thinking. The (Haldane) rule d''x''/(''x''(1−''x'')) goes too far the other way. It would lead to the conclusion that if a sample is of one type with respect to some property there is a probability 1 that the whole population is of that type." The fact that "uniform" depends on the parametrization, led Jeffreys to seek a form of prior that would be invariant under different parametrizations.
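A brief sketch comparing the posterior means produced by the three "ignorance" priors discussed in these sections (Bayes-Laplace Beta(1,1), Jeffreys Beta(1/2,1/2), Haldane Beta(0,0)) after observing s successes in n trials; the counts are illustrative, and the Haldane posterior reduces to the sample proportion s/n (it is proper only when 0 < s < n).

    s, n = 3, 20                                     # illustrative counts

    for name, (a0, b0) in [("Bayes-Laplace", (1.0, 1.0)),
                           ("Jeffreys", (0.5, 0.5)),
                           ("Haldane", (0.0, 0.0))]:
        a_post, b_post = a0 + s, b0 + n - s          # conjugate update for binomial data
        print(name, a_post / (a_post + b_post))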


Jeffreys' prior probability (Beta(1/2,1/2) for a Bernoulli or for a binomial distribution)

Harold Jeffreys proposed to use an
uninformative prior In Bayesian statistical inference, a prior probability distribution, often simply called the prior, of an uncertain quantity is the probability distribution that would express one's beliefs about this quantity before some evidence is taken into ...
probability measure that should be Parametrization invariance, invariant under reparameterization: proportional to the square root of the determinant of Fisher's information matrix. For the
Bernoulli distribution In probability theory and statistics, the Bernoulli distribution, named after Swiss mathematician Jacob Bernoulli,James Victor Uspensky: ''Introduction to Mathematical Probability'', McGraw-Hill, New York 1937, page 45 is the discrete probabi ...
, this can be shown as follows: for a coin that is "heads" with probability ''p'' ∈
[0, 1]
and is "tails" with probability 1 − ''p'', for a given (''H'', ''T'') ∈ {(0, 1), (1, 0)} the probability is ''p''^''H''(1 − ''p'')^''T''. Since ''T'' = 1 − ''H'', the
Bernoulli distribution In probability theory and statistics, the Bernoulli distribution, named after Swiss mathematician Jacob Bernoulli,James Victor Uspensky: ''Introduction to Mathematical Probability'', McGraw-Hill, New York 1937, page 45 is the discrete probabi ...
is ''p''^''H''(1 − ''p'')^{1 − ''H''}. Considering ''p'' as the only parameter, it follows that the log likelihood for the Bernoulli distribution is :\ln \mathcal{L} (p\mid H) = H \ln(p)+ (1-H) \ln(1-p). The Fisher information matrix has only one component (it is a scalar, because there is only one parameter: ''p''), therefore: :\begin{align} \sqrt{\mathcal{I}(p)} &= \sqrt{\operatorname{E}\left[\left(\frac{d}{dp} \ln \mathcal{L}(p\mid H)\right)^2\right]} \\ &= \sqrt{\operatorname{E}\left[\left(\frac{H}{p} - \frac{1-H}{1-p}\right)^2\right]} \\ &= \sqrt{p\left(\frac{1}{p}\right)^2 + (1-p)\left(\frac{1}{1-p}\right)^2} \\ &= \frac{1}{\sqrt{p(1-p)}}. \end{align} Similarly, for the Binomial distribution with ''n'' Bernoulli trials, it can be shown that :\sqrt{\mathcal{I}(p)}= \frac{\sqrt{n}}{\sqrt{p(1-p)}}. Thus, for the
Bernoulli Bernoulli can refer to: People *Bernoulli family of 17th and 18th century Swiss mathematicians: ** Daniel Bernoulli (1700–1782), developer of Bernoulli's principle **Jacob Bernoulli (1654–1705), also known as Jacques, after whom Bernoulli numbe ...
, and Binomial distributions, Jeffreys prior is proportional to \scriptstyle \frac{1}{\sqrt{p(1-p)}}, which happens to be proportional to a beta distribution with domain variable ''x'' = ''p'', and shape parameters α = β = 1/2, the arcsine distribution: :\operatorname{Beta}(\tfrac{1}{2}, \tfrac{1}{2}) = \frac{1}{\pi \sqrt{p(1-p)}}. It will be shown in the next section that the normalizing constant for Jeffreys prior is immaterial to the final result because the normalizing constant cancels out in Bayes theorem for the posterior probability. Hence Beta(1/2,1/2) is used as the Jeffreys prior for both Bernoulli and binomial distributions. As shown in the next section, when using this expression as a prior probability times the likelihood in Bayes theorem, the posterior probability turns out to be a beta distribution. It is important to realize, however, that Jeffreys prior is proportional to \scriptstyle \frac{1}{\sqrt{p(1-p)}} for the Bernoulli and binomial distribution, but not for the beta distribution. Jeffreys prior for the beta distribution is given by the determinant of Fisher's information for the beta distribution, which, as shown above, is a function of the
trigamma function
ψ1 of shape parameters α and β as follows:

:\begin{align}
\sqrt{\det(\mathcal{I}(\alpha,\beta))} &= \sqrt{\psi_1(\alpha)\,\psi_1(\beta)-(\psi_1(\alpha)+\psi_1(\beta))\,\psi_1(\alpha+\beta)} \\
\lim_{\alpha\to 0}\sqrt{\det(\mathcal{I}(\alpha,\beta))} &=\lim_{\beta\to 0}\sqrt{\det(\mathcal{I}(\alpha,\beta))} = \infty\\
\lim_{\alpha\to \infty}\sqrt{\det(\mathcal{I}(\alpha,\beta))} &=\lim_{\beta\to \infty}\sqrt{\det(\mathcal{I}(\alpha,\beta))} = 0
\end{align}

As previously discussed, Jeffreys prior for the Bernoulli and binomial distributions is proportional to the arcsine distribution Beta(1/2,1/2), a one-dimensional ''curve'' that looks like a basin as a function of the parameter ''p'' of the Bernoulli and binomial distributions. The walls of the basin are formed by ''p'' approaching the singularities at the ends ''p'' → 0 and ''p'' → 1, where Beta(1/2,1/2) approaches infinity. Jeffreys prior for the beta distribution is a ''2-dimensional surface'' (embedded in a three-dimensional space) that looks like a basin with only two of its walls meeting at the corner α = β = 0 (and missing the other two walls) as a function of the shape parameters α and β of the beta distribution. The two adjoining walls of this 2-dimensional surface are formed by the shape parameters α and β approaching the singularities (of the trigamma function) at α, β → 0. It has no walls for α, β → ∞ because in this case the determinant of Fisher's information matrix for the beta distribution approaches zero. It will be shown in the next section that Jeffreys prior probability results in posterior probabilities (when multiplied by the binomial likelihood function) that are intermediate between the posterior probability results of the Haldane and Bayes prior probabilities. Jeffreys prior may be difficult to obtain analytically, and for some cases it simply does not exist (even for simple distribution functions like the asymmetric triangular distribution). Berger, Bernardo and Sun, in a 2009 paper, defined a reference prior probability distribution that (unlike Jeffreys prior) exists for the asymmetric triangular distribution. They cannot obtain a closed-form expression for their reference prior, but numerical calculations show it to be nearly perfectly fitted by the (proper) prior

:\operatorname{Beta}(\tfrac{1}{2}, \tfrac{1}{2}) \sim \frac{1}{\sqrt{\theta(1-\theta)}}

where θ is the vertex variable for the asymmetric triangular distribution with support
[0, 1]
(corresponding to the following parameter values in Wikipedia's article on the triangular distribution: vertex ''c'' = ''θ'', left end ''a'' = 0,and right end ''b'' = 1). Berger et al. also give a heuristic argument that Beta(1/2,1/2) could indeed be the exact Berger–Bernardo–Sun reference prior for the asymmetric triangular distribution. Therefore, Beta(1/2,1/2) not only is Jeffreys prior for the Bernoulli and binomial distributions, but also seems to be the Berger–Bernardo–Sun reference prior for the asymmetric triangular distribution (for which the Jeffreys prior does not exist), a distribution used in project management and PERT analysis to describe the cost and duration of project tasks. Clarke and Barron prove that, among continuous positive priors, Jeffreys prior (when it exists) asymptotically maximizes Shannon's mutual information between a sample of size n and the parameter, and therefore ''Jeffreys prior is the most uninformative prior'' (measuring information as Shannon information). The proof rests on an examination of the Kullback–Leibler divergence between probability density functions for iid random variables.
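As a numerical illustration of the Bernoulli derivation above, the following sketch (an illustrative check rather than anything canonical; it assumes only that NumPy is available) evaluates the Fisher information of a single Bernoulli trial directly from its definition and confirms that its square root coincides with the unnormalized Beta(1/2,1/2) density 1/√(p(1−p)).

```python
import numpy as np

# Fisher information of one Bernoulli trial, computed from its definition
# I(p) = E[(d/dp log L(p|H))^2], with H in {0,1} and P(H=1) = p.
def bernoulli_fisher_information(p):
    score_heads = 1.0 / p            # d/dp log p
    score_tails = -1.0 / (1.0 - p)   # d/dp log(1 - p)
    return p * score_heads**2 + (1.0 - p) * score_tails**2

p_grid = np.linspace(0.05, 0.95, 19)
sqrt_info = np.sqrt([bernoulli_fisher_information(p) for p in p_grid])
jeffreys_unnormalized = 1.0 / np.sqrt(p_grid * (1.0 - p_grid))  # proportional to Beta(1/2,1/2)

# The two curves agree, confirming sqrt(I(p)) = 1/sqrt(p(1-p)).
assert np.allclose(sqrt_info, jeffreys_unnormalized)
print("max abs difference:", np.max(np.abs(sqrt_info - jeffreys_unnormalized)))
```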


Effect of different prior probability choices on the posterior beta distribution

If samples are drawn from the population of a random variable ''X'' that result in ''s'' successes and ''f'' failures in ''n'' Bernoulli trials ''n'' = ''s'' + ''f'', then the likelihood function for parameters ''s'' and ''f'' given ''x'' = ''p'' (the notation ''x'' = ''p'' in the expressions below will emphasize that the domain ''x'' stands for the value of the parameter ''p'' in the binomial distribution), is the following binomial distribution:

:\mathcal{L}(s,f\mid x=p) = {s+f \choose s} x^s(1-x)^f = {n \choose s} x^s(1-x)^{n-s}.

If beliefs about prior probability information are reasonably well approximated by a beta distribution with parameters ''α'' Prior and ''β'' Prior, then:

:\operatorname{PriorProbability}(x=p;\alpha \operatorname{Prior},\beta \operatorname{Prior}) = \frac{x^{\alpha \operatorname{Prior}-1}(1-x)^{\beta \operatorname{Prior}-1}}{\Beta(\alpha \operatorname{Prior},\beta \operatorname{Prior})}

According to Bayes' theorem for a continuous event space, the posterior probability is given by the product of the prior probability and the likelihood function (given the evidence ''s'' and ''f'' = ''n'' − ''s''), normalized so that the area under the curve equals one, as follows:

:\begin{align}
& \operatorname{posterior\ probability}(x=p\mid s,n-s) \\[6pt]
= {} & \frac{\operatorname{prior\ probability}(x=p;\alpha \operatorname{Prior},\beta \operatorname{Prior})\,\mathcal{L}(s,f\mid x=p)}{\int_0^1 \operatorname{prior\ probability}(x=p;\alpha \operatorname{Prior},\beta \operatorname{Prior})\,\mathcal{L}(s,f\mid x=p)\, dx} \\[6pt]
= {} & \frac{{n \choose s} x^{s+\alpha \operatorname{Prior}-1}(1-x)^{n-s+\beta \operatorname{Prior}-1} / \Beta(\alpha \operatorname{Prior},\beta \operatorname{Prior})}{\int_0^1 \left({n \choose s} x^{s+\alpha \operatorname{Prior}-1}(1-x)^{n-s+\beta \operatorname{Prior}-1}/\Beta(\alpha \operatorname{Prior},\beta \operatorname{Prior})\right) dx} \\[6pt]
= {} & \frac{x^{s+\alpha \operatorname{Prior}-1}(1-x)^{n-s+\beta \operatorname{Prior}-1}}{\int_0^1 x^{s+\alpha \operatorname{Prior}-1}(1-x)^{n-s+\beta \operatorname{Prior}-1}\, dx} \\[6pt]
= {} & \frac{x^{s+\alpha \operatorname{Prior}-1}(1-x)^{n-s+\beta \operatorname{Prior}-1}}{\Beta(s+\alpha \operatorname{Prior},n-s+\beta \operatorname{Prior})}.
\end{align}

The binomial coefficient

:{s+f \choose s}=\frac{(s+f)!}{s!\,f!}=\frac{n!}{s!\,(n-s)!}
appears both in the numerator and the denominator of the posterior probability, and it does not depend on the integration variable ''x'', hence it cancels out, and it is irrelevant to the final result. Similarly the normalizing factor for the prior probability, the beta function B(αPrior,βPrior) cancels out and it is immaterial to the final result. The same posterior probability result can be obtained if one uses an un-normalized prior :x^(1-x)^ because the normalizing factors all cancel out. Several authors (including Jeffreys himself) thus use an un-normalized prior formula since the normalization constant cancels out. The numerator of the posterior probability ends up being just the (un-normalized) product of the prior probability and the likelihood function, and the denominator is its integral from zero to one. The beta function in the denominator, B(''s'' + ''α'' Prior, ''n'' − ''s'' + ''β'' Prior), appears as a normalization constant to ensure that the total posterior probability integrates to unity. The ratio ''s''/''n'' of the number of successes to the total number of trials is a sufficient statistic in the binomial case, which is relevant for the following results. For the Bayes' prior probability (Beta(1,1)), the posterior probability is: :\operatorname(p=x\mid s,f) = \frac, \text=\frac,\text=\frac\text 0 < s < n). For the Jeffreys' prior probability (Beta(1/2,1/2)), the posterior probability is: :\operatorname(p=x\mid s,f) = ,\text = \frac,\text\frac\text \tfrac < s < n-\tfrac). and for the Haldane prior probability (Beta(0,0)), the posterior probability is: :\operatorname(p=x\mid s,f) = \frac, \text = \frac,\text\frac\text 1 < s < n -1). From the above expressions it follows that for ''s''/''n'' = 1/2) all the above three prior probabilities result in the identical location for the posterior probability mean = mode = 1/2. For ''s''/''n'' < 1/2, the mean of the posterior probabilities, using the following priors, are such that: mean for Bayes prior > mean for Jeffreys prior > mean for Haldane prior. For ''s''/''n'' > 1/2 the order of these inequalities is reversed such that the Haldane prior probability results in the largest posterior mean. The ''Haldane'' prior probability Beta(0,0) results in a posterior probability density with ''mean'' (the expected value for the probability of success in the "next" trial) identical to the ratio ''s''/''n'' of the number of successes to the total number of trials. Therefore, the Haldane prior results in a posterior probability with expected value in the next trial equal to the maximum likelihood. The ''Bayes'' prior probability Beta(1,1) results in a posterior probability density with ''mode'' identical to the ratio ''s''/''n'' (the maximum likelihood). In the case that 100% of the trials have been successful ''s'' = ''n'', the ''Bayes'' prior probability Beta(1,1) results in a posterior expected value equal to the rule of succession (''n'' + 1)/(''n'' + 2), while the Haldane prior Beta(0,0) results in a posterior expected value of 1 (absolute certainty of success in the next trial). Jeffreys prior probability results in a posterior expected value equal to (''n'' + 1/2)/(''n'' + 1). Perks (p. 303) points out: "This provides a new rule of succession and expresses a 'reasonable' position to take up, namely, that after an unbroken run of n successes we assume a probability for the next trial equivalent to the assumption that we are about half-way through an average run, i.e. that we expect a failure once in (2''n'' + 2) trials. 
The Bayes–Laplace rule implies that we are about at the end of an average run or that we expect a failure once in (''n'' + 2) trials. The comparison clearly favours the new result (what is now called Jeffreys prior) from the point of view of 'reasonableness'." Conversely, in the case that 100% of the trials have resulted in failure (''s'' = 0), the ''Bayes'' prior probability Beta(1,1) results in a posterior expected value for success in the next trial equal to 1/(''n'' + 2), while the Haldane prior Beta(0,0) results in a posterior expected value of success in the next trial of 0 (absolute certainty of failure in the next trial). Jeffreys prior probability results in a posterior expected value for success in the next trial equal to (1/2)/(''n'' + 1), which Perks (p. 303) points out: "is a much more reasonably remote result than the Bayes-Laplace result 1/(''n'' + 2)". Jaynes questions (for the uniform prior Beta(1,1)) the use of these formulas for the cases ''s'' = 0 or ''s'' = ''n'' because the integrals do not converge (Beta(1,1) is an improper prior for ''s'' = 0 or ''s'' = ''n''). In practice, the conditions 0 (p. 303) shows that, for what is now known as the Jeffreys prior, this probability is ((''n'' + 1/2)/(''n'' + 1))((''n'' + 3/2)/(''n'' + 2))...(2''n'' + 1/2)/(2''n'' + 1), which for ''n'' = 1, 2, 3 gives 15/24, 315/480, 9009/13440; rapidly approaching a limiting value of 1/\sqrt = 0.70710678\ldots as n tends to infinity. Perks remarks that what is now known as the Jeffreys prior: "is clearly more 'reasonable' than either the Bayes-Laplace result or the result on the (Haldane) alternative rule rejected by Jeffreys which gives certainty as the probability. It clearly provides a very much better correspondence with the process of induction. Whether it is 'absolutely' reasonable for the purpose, i.e. whether it is yet large enough, without the absurdity of reaching unity, is a matter for others to decide. But it must be realized that the result depends on the assumption of complete indifference and absence of knowledge prior to the sampling experiment." Following are the variances of the posterior distribution obtained with these three prior probability distributions: for the Bayes' prior probability (Beta(1,1)), the posterior variance is: :\text = \frac,\text s=\frac \text =\frac for the Jeffreys' prior probability (Beta(1/2,1/2)), the posterior variance is: : \text = \frac ,\text s=\frac n 2 \text = \frac 1 and for the Haldane prior probability (Beta(0,0)), the posterior variance is: :\text = \frac, \texts=\frac\text =\frac So, as remarked by Silvey, for large ''n'', the variance is small and hence the posterior distribution is highly concentrated, whereas the assumed prior distribution was very diffuse. This is in accord with what one would hope for, as vague prior knowledge is transformed (through Bayes theorem) into a more precise posterior knowledge by an informative experiment. For small ''n'' the Haldane Beta(0,0) prior results in the largest posterior variance while the Bayes Beta(1,1) prior results in the more concentrated posterior. Jeffreys prior Beta(1/2,1/2) results in a posterior variance in between the other two. As ''n'' increases, the variance rapidly decreases so that the posterior variance for all three priors converges to approximately the same value (approaching zero variance as ''n'' → ∞). 
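To make the comparisons above concrete, the following sketch (illustrative values of ''s'' and ''n'', assuming SciPy is available) performs the conjugate update Beta(''α'' Prior + ''s'', ''β'' Prior + ''n'' − ''s'') for the Bayes, Jeffreys and Haldane priors and prints the posterior mean, mode and variance discussed in this section.

```python
from scipy.stats import beta

s, n = 7, 10          # observed successes and trials (illustrative values)
f = n - s
priors = {"Bayes (1,1)": (1.0, 1.0),
          "Jeffreys (1/2,1/2)": (0.5, 0.5),
          "Haldane (0,0)": (0.0, 0.0)}   # Haldane is improper; posterior is proper when 0 < s < n

for name, (a0, b0) in priors.items():
    a_post, b_post = a0 + s, b0 + f                 # conjugate update
    mean = a_post / (a_post + b_post)
    mode = (a_post - 1) / (a_post + b_post - 2)     # valid when a_post, b_post > 1
    var = beta(a_post, b_post).var()
    print(f"{name:20s} posterior Beta({a_post},{b_post}): "
          f"mean={mean:.4f} mode={mode:.4f} var={var:.5f}")
```

For these numbers the Haldane posterior mean equals the maximum-likelihood estimate ''s''/''n'' = 0.7, while the Bayes and Jeffreys posteriors pull the mean toward 1/2, as stated above.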
Recalling the previous result that the ''Haldane'' prior probability Beta(0,0) results in a posterior probability density with ''mean'' (the expected value for the probability of success in the "next" trial) identical to the ratio s/n of the number of successes to the total number of trials, it follows from the above expression that also the ''Haldane'' prior Beta(0,0) results in a posterior with ''variance'' identical to the variance expressed in terms of the max. likelihood estimate s/n and sample size (in ): :\text = \frac= \frac with the mean ''μ'' = ''s''/''n'' and the sample size ''ν'' = ''n''. In Bayesian inference, using a prior distribution Beta(''α''Prior,''β''Prior) prior to a binomial distribution is equivalent to adding (''α''Prior − 1) pseudo-observations of "success" and (''β''Prior − 1) pseudo-observations of "failure" to the actual number of successes and failures observed, then estimating the parameter ''p'' of the binomial distribution by the proportion of successes over both real- and pseudo-observations. A uniform prior Beta(1,1) does not add (or subtract) any pseudo-observations since for Beta(1,1) it follows that (''α''Prior − 1) = 0 and (''β''Prior − 1) = 0. The Haldane prior Beta(0,0) subtracts one pseudo observation from each and Jeffreys prior Beta(1/2,1/2) subtracts 1/2 pseudo-observation of success and an equal number of failure. This subtraction has the effect of smoothing out the posterior distribution. If the proportion of successes is not 50% (''s''/''n'' ≠ 1/2) values of ''α''Prior and ''β''Prior less than 1 (and therefore negative (''α''Prior − 1) and (''β''Prior − 1)) favor sparsity, i.e. distributions where the parameter ''p'' is closer to either 0 or 1. In effect, values of ''α''Prior and ''β''Prior between 0 and 1, when operating together, function as a concentration parameter. The accompanying plots show the posterior probability density functions for sample sizes ''n'' ∈ , successes ''s'' ∈  and Beta(''α''Prior,''β''Prior) ∈ . Also shown are the cases for ''n'' = , success ''s'' =  and Beta(''α''Prior,''β''Prior) ∈ . The first plot shows the symmetric cases, for successes ''s'' ∈ , with mean = mode = 1/2 and the second plot shows the skewed cases ''s'' ∈ . The images show that there is little difference between the priors for the posterior with sample size of 50 (characterized by a more pronounced peak near ''p'' = 1/2). Significant differences appear for very small sample sizes (in particular for the flatter distribution for the degenerate case of sample size = 3). Therefore, the skewed cases, with successes ''s'' = , show a larger effect from the choice of prior, at small sample size, than the symmetric cases. For symmetric distributions, the Bayes prior Beta(1,1) results in the most "peaky" and highest posterior distributions and the Haldane prior Beta(0,0) results in the flattest and lowest peak distribution. The Jeffreys prior Beta(1/2,1/2) lies in between them. For nearly symmetric, not too skewed distributions the effect of the priors is similar. For very small sample size (in this case for a sample size of 3) and skewed distribution (in this example for ''s'' ∈ ) the Haldane prior can result in a reverse-J-shaped distribution with a singularity at the left end. 
However, this happens only in degenerate cases (in this example ''n'' = 3 and hence ''s'' = 3/4 < 1, a degenerate value because s should be greater than unity in order for the posterior of the Haldane prior to have a mode located between the ends, and because ''s'' = 3/4 is not an integer number, hence it violates the initial assumption of a binomial distribution for the likelihood) and it is not an issue in generic cases of reasonable sample size (such that the condition 1 < ''s'' < ''n'' − 1, necessary for a mode to exist between both ends, is fulfilled). In Chapter 12 (p. 385) of his book, Jaynes asserts that the ''Haldane prior'' Beta(0,0) describes a ''prior state of knowledge of complete ignorance'', where we are not even sure whether it is physically possible for an experiment to yield either a success or a failure, while the ''Bayes (uniform) prior Beta(1,1) applies if'' one knows that ''both binary outcomes are possible''. Jaynes states: "''interpret the Bayes-Laplace (Beta(1,1)) prior as describing not a state of complete ignorance'', but the state of knowledge in which we have observed one success and one failure...once we have seen at least one success and one failure, then we know that the experiment is a true binary one, in the sense of physical possibility." Jaynes does not specifically discuss Jeffreys prior Beta(1/2,1/2) (Jaynes discussion of "Jeffreys prior" on pp. 181, 423 and on chapter 12 of Jaynes book refers instead to the improper, un-normalized, prior "1/''p'' ''dp''" introduced by Jeffreys in the 1939 edition of his book, seven years before he introduced what is now known as Jeffreys' invariant prior: the square root of the determinant of Fisher's information matrix. ''"1/p" is Jeffreys' (1946) invariant prior for the exponential distribution, not for the Bernoulli or binomial distributions''). However, it follows from the above discussion that Jeffreys Beta(1/2,1/2) prior represents a state of knowledge in between the Haldane Beta(0,0) and Bayes Beta (1,1) prior. Similarly, Karl Pearson in his 1892 book The Grammar of Science (p. 144 of 1900 edition) maintained that the Bayes (Beta(1,1) uniform prior was not a complete ignorance prior, and that it should be used when prior information justified to "distribute our ignorance equally"". K. Pearson wrote: "Yet the only supposition that we appear to have made is this: that, knowing nothing of nature, routine and anomy (from the Greek ανομία, namely: a- "without", and nomos "law") are to be considered as equally likely to occur. Now we were not really justified in making even this assumption, for it involves a knowledge that we do not possess regarding nature. We use our ''experience'' of the constitution and action of coins in general to assert that heads and tails are equally probable, but we have no right to assert before experience that, as we know nothing of nature, routine and breach are equally probable. In our ignorance we ought to consider before experience that nature may consist of all routines, all anomies (normlessness), or a mixture of the two in any proportion whatever, and that all such are equally probable. Which of these constitutions after experience is the most probable must clearly depend on what that experience has been like." 
If there is sufficient Sample (statistics), sampling data, ''and the posterior probability mode is not located at one of the extremes of the domain'' (x=0 or x=1), the three priors of Bayes (Beta(1,1)), Jeffreys (Beta(1/2,1/2)) and Haldane (Beta(0,0)) should yield similar posterior probability, ''posterior'' probability densities. Otherwise, as Gelman et al. (p. 65) point out, "if so few data are available that the choice of noninformative prior distribution makes a difference, one should put relevant information into the prior distribution", or as Berger (p. 125) points out "when different reasonable priors yield substantially different answers, can it be right to state that there ''is'' a single answer? Would it not be better to admit that there is scientific uncertainty, with the conclusion depending on prior beliefs?."


Occurrence and applications


Order statistics

The beta distribution has an important application in the theory of order statistics. A basic result is that the distribution of the ''k''th smallest of a sample of size ''n'' from a continuous uniform distribution has a beta distribution.David, H. A., Nagaraja, H. N. (2003) ''Order Statistics'' (3rd Edition). Wiley, New Jersey pp 458. This result is summarized as:

:U_{(k)} \sim \operatorname{Beta}(k,\,n+1-k).

From this, and application of the theory related to the probability integral transform, the distribution of any individual order statistic from any continuous distribution can be derived.
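A quick Monte Carlo check of this order-statistic result (a sketch with arbitrary choices of ''n'', ''k'' and the number of replications, assuming NumPy and SciPy are available):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, k = 10, 3                      # sample size and order-statistic index (illustrative)
reps = 20000

u = rng.uniform(size=(reps, n))
kth_smallest = np.sort(u, axis=1)[:, k - 1]   # k-th order statistic of each sample

# Compare the empirical distribution with Beta(k, n+1-k) via a Kolmogorov-Smirnov test.
d_stat, p_value = stats.kstest(kth_smallest, stats.beta(k, n + 1 - k).cdf)
print(f"KS statistic = {d_stat:.4f}, p-value = {p_value:.3f}")  # a large p-value is consistent
```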


Subjective logic

In standard logic, propositions are considered to be either true or false. In contradistinction, subjective logic assumes that humans cannot determine with absolute certainty whether a proposition about the real world is absolutely true or false. In subjective logic the a posteriori probability estimates of binary events can be represented by beta distributions. (A. Jøsang. A Logic for Uncertain Probabilities. ''International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems.'' 9(3), pp. 279–311, June 2001.)


Wavelet analysis

A wavelet is a wave-like oscillation with an amplitude that starts out at zero, increases, and then decreases back to zero. It can typically be visualized as a "brief oscillation" that promptly decays. Wavelets can be used to extract information from many different kinds of data, including – but certainly not limited to – audio signals and images. Thus, wavelets are purposefully crafted to have specific properties that make them useful for signal processing. Wavelets are localized in both time and frequency, whereas the standard Fourier transform is only localized in frequency. Therefore, standard Fourier transforms are only applicable to stationary processes, while wavelets are applicable to non-stationary processes. Continuous wavelets can be constructed based on the beta distribution. Beta wavelets (H.M. de Oliveira and G.A.A. Araújo. Compactly Supported One-cyclic Wavelets Derived from Beta Distributions. ''Journal of Communication and Information Systems.'' vol. 20, n. 3, pp. 27–33, 2005) can be viewed as a soft variety of Haar wavelets whose shape is fine-tuned by two shape parameters α and β.


Population genetics

The Balding–Nichols model is a two-parameter parametrization of the beta distribution used in population genetics. It is a statistical description of the allele frequencies in the components of a sub-divided population:

:\begin{align}
\alpha &= \mu \nu,\\
\beta &= (1 - \mu) \nu,
\end{align}

where \nu =\alpha+\beta= \frac{1-F}{F} and 0 < ''F'' < 1; here ''F'' is (Wright's) genetic distance between two populations.
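In code the reparametrization is a one-line conversion. The helper below is an illustrative sketch; the function name and the example values of μ and ''F'' are made up for demonstration and are not taken from the literature cited here.

```python
def balding_nichols_shape(mu, F):
    """Convert Balding-Nichols parameters (mu, F) to beta shape parameters (alpha, beta).

    mu is the mean allele frequency (0 < mu < 1) and F is Wright's genetic
    distance (0 < F < 1); nu = alpha + beta = (1 - F) / F.
    """
    nu = (1.0 - F) / F
    return mu * nu, (1.0 - mu) * nu

alpha, beta_ = balding_nichols_shape(mu=0.3, F=0.1)   # example values
print(alpha, beta_)   # 2.7, 6.3 -> mean alpha/(alpha+beta) = 0.3, as required
```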


Project management: task cost and schedule modeling

The beta distribution can be used to model events which are constrained to take place within an interval defined by a minimum and maximum value. For this reason, the beta distribution — along with the triangular distribution — is used extensively in PERT, critical path method (CPM), Joint Cost Schedule Modeling (JCSM) and other project management/control systems to describe the time to completion and the cost of a task. In project management, shorthand computations are widely used to estimate the mean and standard deviation of the beta distribution:

:\begin{align}
\mu(X) & = \frac{a + 4b + c}{6} \\
\sigma(X) & = \frac{c - a}{6}
\end{align}

where ''a'' is the minimum, ''c'' is the maximum, and ''b'' is the most likely value (the mode for ''α'' > 1 and ''β'' > 1). The above estimate for the mean \mu(X)= \frac{a + 4b + c}{6} is known as the PERT three-point estimation and it is exact for either of the following values of ''β'' (for arbitrary α within these ranges):

:''β'' = ''α'' > 1 (symmetric case) with standard deviation \sigma(X) = \frac{c-a}{2\sqrt{1+2\alpha}}, skewness = 0, and excess kurtosis = \frac{-6}{3 + 2\alpha}

or

:''β'' = 6 − ''α'' for 5 > ''α'' > 1 (skewed case) with standard deviation

:\sigma(X) = \frac{(c-a)\sqrt{\alpha(6-\alpha)}}{6\sqrt{7}},

skewness = \frac{(3-\alpha)\sqrt{7}}{2\sqrt{\alpha(6-\alpha)}}, and excess kurtosis = \frac{21}{\alpha(6-\alpha)} - 3

The above estimate for the standard deviation ''σ''(''X'') = (''c'' − ''a'')/6 is exact for either of the following values of ''α'' and ''β'':

:''α'' = ''β'' = 4 (symmetric) with skewness = 0, and excess kurtosis = −6/11.
:''β'' = 6 − ''α'' and \alpha = 3 - \sqrt2 (right-tailed, positive skew) with skewness = \frac{1}{\sqrt{2}}, and excess kurtosis = 0
:''β'' = 6 − ''α'' and \alpha = 3 + \sqrt2 (left-tailed, negative skew) with skewness = \frac{-1}{\sqrt{2}}, and excess kurtosis = 0

Otherwise, these can be poor approximations for beta distributions with other values of α and β, exhibiting average errors of 40% in the mean and 549% in the variance.
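The shorthand PERT estimates are easy to compare against the exact beta moments. The sketch below (illustrative endpoints ''a'' and ''c'', assuming SciPy is available) uses the symmetric case α = β = 4, for which both shorthand formulas are exact.

```python
from scipy.stats import beta

a, c = 2.0, 14.0            # minimum and maximum (illustrative task durations)
alpha, beta_ = 4.0, 4.0     # symmetric case alpha = beta = 4: both estimates are exact

dist = beta(alpha, beta_, loc=a, scale=c - a)          # beta distribution rescaled to [a, c]
b = a + (c - a) * (alpha - 1) / (alpha + beta_ - 2)    # mode (most likely value)

pert_mean = (a + 4 * b + c) / 6
pert_sd = (c - a) / 6
print(f"exact mean {dist.mean():.4f}  vs PERT {pert_mean:.4f}")
print(f"exact sd   {dist.std():.4f}  vs PERT {pert_sd:.4f}")
```

Changing the shape parameters away from the special cases listed above shows directly how poor the shorthand approximations can become.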


Random variate generation

If ''X'' and ''Y'' are independent, with X \sim \Gamma(\alpha, \theta) and Y \sim \Gamma(\beta, \theta) then

:\frac{X}{X+Y} \sim \Beta(\alpha, \beta).

So one algorithm for generating beta variates is to generate \frac{X}{X+Y}, where ''X'' is a gamma variate with parameters (α, 1) and ''Y'' is an independent gamma variate with parameters (β, 1). In fact, here \frac{X}{X+Y} and X+Y are independent, and X+Y \sim \Gamma(\alpha + \beta, \theta). If Z \sim \Gamma(\gamma, \theta) and Z is independent of X and Y, then \frac{X+Y}{X+Y+Z} \sim \Beta(\alpha+\beta,\gamma) and \frac{X+Y}{X+Y+Z} is independent of \frac{X}{X+Y}. This shows that the product of independent \Beta(\alpha,\beta) and \Beta(\alpha+\beta,\gamma) random variables is a \Beta(\alpha,\beta+\gamma) random variable. Also, the ''k''th order statistic of ''n'' uniformly distributed variates is \Beta(k, n+1-k), so an alternative if α and β are small integers is to generate α + β − 1 uniform variates and choose the α-th smallest. Another way to generate the beta distribution is by the Pólya urn model. According to this method, one starts with an "urn" with α "black" balls and β "white" balls and draws uniformly with replacement. Every trial an additional ball is added according to the color of the last ball which was drawn. Asymptotically, the proportion of black and white balls will be distributed according to the beta distribution, where each repetition of the experiment will produce a different value. It is also possible to use inverse transform sampling.
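A minimal sketch of the gamma-ratio method described above (assuming NumPy; the shape parameters are arbitrary), with NumPy's built-in beta sampler used only as a sanity check:

```python
import numpy as np

rng = np.random.default_rng(42)
alpha, beta_, size = 2.5, 4.0, 100_000   # illustrative shape parameters and sample size

# Beta variate as a ratio of gamma variates: X/(X+Y) with X~Gamma(alpha,1), Y~Gamma(beta,1).
x = rng.gamma(alpha, 1.0, size)
y = rng.gamma(beta_, 1.0, size)
samples = x / (x + y)

print("sample mean  :", samples.mean(), " theory:", alpha / (alpha + beta_))
print("sample var   :", samples.var(),
      " theory:", alpha * beta_ / ((alpha + beta_) ** 2 * (alpha + beta_ + 1)))
print("builtin mean :", rng.beta(alpha, beta_, size).mean())
```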


History

Thomas Bayes, in a posthumous paper published in 1763 by Richard Price, obtained a beta distribution as the density of the probability of success in Bernoulli trials (see ), but the paper does not analyze any of the moments of the beta distribution or discuss any of its properties. The first systematic modern discussion of the beta distribution is probably due to Karl Pearson. In Pearson's papers the beta distribution is couched as a solution of a differential equation: Pearson distribution, Pearson's Type I distribution which it is essentially identical to except for arbitrary shifting and re-scaling (the beta and Pearson Type I distributions can always be equalized by proper choice of parameters). In fact, in several English books and journal articles in the few decades prior to World War II, it was common to refer to the beta distribution as Pearson's Type I distribution. William Palin Elderton, William P. Elderton in his 1906 monograph "Frequency curves and correlation" further analyzes the beta distribution as Pearson's Type I distribution, including a full discussion of the method of moments for the four parameter case, and diagrams of (what Elderton describes as) U-shaped, J-shaped, twisted J-shaped, "cocked-hat" shapes, horizontal and angled straight-line cases. Elderton wrote "I am chiefly indebted to Professor Pearson, but the indebtedness is of a kind for which it is impossible to offer formal thanks." William Palin Elderton, Elderton in his 1906 monograph provides an impressive amount of information on the beta distribution, including equations for the origin of the distribution chosen to be the mode, as well as for other Pearson distributions: types I through VII. Elderton also included a number of appendixes, including one appendix ("II") on the beta and gamma functions. In later editions, Elderton added equations for the origin of the distribution chosen to be the mean, and analysis of Pearson distributions VIII through XII. As remarked by Bowman and Shenton "Fisher and Pearson had a difference of opinion in the approach to (parameter) estimation, in particular relating to (Pearson's method of) moments and (Fisher's method of) maximum likelihood in the case of the Beta distribution." Also according to Bowman and Shenton, "the case of a Type I (beta distribution) model being the center of the controversy was pure serendipity. A more difficult model of 4 parameters would have been hard to find." The long running public conflict of Fisher with Karl Pearson can be followed in a number of articles in prestigious journals. For example, concerning the estimation of the four parameters for the beta distribution, and Fisher's criticism of Pearson's method of moments as being arbitrary, see Pearson's article "Method of moments and method of maximum likelihood" (published three years after his retirement from University College, London, where his position had been divided between Fisher and Pearson's son Egon) in which Pearson writes "I read (Koshai's paper in the Journal of the Royal Statistical Society, 1933) which as far as I am aware is the only case at present published of the application of Professor Fisher's method. To my astonishment that method depends on first working out the constants of the frequency curve by the (Pearson) Method of Moments and then superposing on it, by what Fisher terms "the Method of Maximum Likelihood" a further approximation to obtain, what he holds, he will thus get, "more efficient values" of the curve constants." 
David and Edwards's treatise on the history of statistics cites the first modern treatment of the beta distribution, in 1911, using the beta designation that has become standard, due to Corrado Gini, an Italian statistician, demography, demographer, and sociology, sociologist, who developed the Gini coefficient. Norman Lloyd Johnson, N.L.Johnson and Samuel Kotz, S.Kotz, in their comprehensive and very informative monograph on leading historical personalities in statistical sciences credit Corrado Gini as "an early Bayesian...who dealt with the problem of eliciting the parameters of an initial Beta distribution, by singling out techniques which anticipated the advent of the so-called empirical Bayes approach."


References


External links


"Beta Distribution"
by Fiona Maclachlan, the Wolfram Demonstrations Project, 2007.
Beta Distribution – Overview and Example, xycoon.com
brighton-webs.co.uk
exstrom.com
Harvard University Statistics 110 Lecture 23: Beta Distribution, Prof. Joe Blitzstein


Mean absolute difference

The mean absolute difference for the beta distribution is:

:\mathrm{MD} = \int_0^1 \int_0^1 f(x;\alpha,\beta)\,f(y;\alpha,\beta)\,|x-y|\,dx\,dy = \left(\frac{4}{\alpha+\beta}\right)\frac{\Beta(\alpha+\beta,\alpha+\beta)}{\Beta(\alpha,\alpha)\,\Beta(\beta,\beta)}

The Gini coefficient for the beta distribution is half of the relative mean absolute difference:

:\mathrm{G} = \left(\frac{2}{\alpha}\right)\frac{\Beta(\alpha+\beta,\alpha+\beta)}{\Beta(\alpha,\alpha)\,\Beta(\beta,\beta)}
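These closed forms can be verified numerically. The sketch below (arbitrary shape parameters, assuming NumPy and SciPy are available) evaluates the mean absolute difference formula and compares it with a Monte Carlo estimate of E|X − Y| for two independent beta variates.

```python
import numpy as np
from scipy.special import beta as B   # Euler beta function

rng = np.random.default_rng(1)
a, b = 2.0, 5.0                       # illustrative shape parameters

# Closed-form mean absolute difference and Gini coefficient for Beta(a, b).
md = (4.0 / (a + b)) * B(a + b, a + b) / (B(a, a) * B(b, b))
gini = (2.0 / a) * B(a + b, a + b) / (B(a, a) * B(b, b))

# Monte Carlo estimate of E|X - Y| with X, Y independent Beta(a, b).
x = rng.beta(a, b, 200_000)
y = rng.beta(a, b, 200_000)
print("MD   closed form:", md, " Monte Carlo:", np.abs(x - y).mean())
print("Gini closed form:", gini, " (= MD / (2 * mean) =", md / (2 * a / (a + b)), ")")
```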


Skewness

The
skewness
(the third moment centered on the mean, normalized by the 3/2 power of the variance) of the beta distribution is :\gamma_1 =\frac = \frac . Letting α = β in the above expression one obtains γ1 = 0, showing once again that for α = β the distribution is symmetric and hence the skewness is zero. Positive skew (right-tailed) for α < β, negative skew (left-tailed) for α > β. Using the parametrization in terms of mean μ and sample size ν = α + β: : \begin \alpha & = \mu \nu ,\text\nu =(\alpha + \beta) >0\\ \beta & = (1 - \mu) \nu , \text\nu =(\alpha + \beta) >0. \end one can express the skewness in terms of the mean μ and the sample size ν as follows: :\gamma_1 =\frac = \frac. The skewness can also be expressed just in terms of the variance ''var'' and the mean μ as follows: :\gamma_1 =\frac = \frac\text \operatorname < \mu(1-\mu) The accompanying plot of skewness as a function of variance and mean shows that maximum variance (1/4) is coupled with zero skewness and the symmetry condition (μ = 1/2), and that maximum skewness (positive or negative infinity) occurs when the mean is located at one end or the other, so that the "mass" of the probability distribution is concentrated at the ends (minimum variance). The following expression for the square of the skewness, in terms of the sample size ν = α + β and the variance ''var'', is useful for the method of moments estimation of four parameters: :(\gamma_1)^2 =\frac = \frac\bigg(\frac-4(1+\nu)\bigg) This expression correctly gives a skewness of zero for α = β, since in that case (see ): \operatorname = \frac. For the symmetric case (α = β), skewness = 0 over the whole range, and the following limits apply: :\lim_ \gamma_1 = \lim_ \gamma_1 =\lim_ \gamma_1=\lim_ \gamma_1=\lim_ \gamma_1 = 0 For the asymmetric cases (α ≠ β) the following limits (with only the noted variable approaching the limit) can be obtained from the above expressions: : \begin &\lim_ \gamma_1 =\lim_ \gamma_1 = \infty\\ &\lim_ \gamma_1 = \lim_ \gamma_1= - \infty\\ &\lim_ \gamma_1 = -\frac,\quad \lim_(\lim_ \gamma_1) = -\infty,\quad \lim_(\lim_ \gamma_1) = 0\\ &\lim_ \gamma_1 = \frac,\quad \lim_(\lim_ \gamma_1) = \infty,\quad \lim_(\lim_ \gamma_1) = 0\\ &\lim_ \gamma_1 = \frac,\quad \lim_(\lim_ \gamma_1) = \infty,\quad \lim_(\lim_ \gamma_1) = - \infty \end
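The skewness expression in terms of α and β can be cross-checked against SciPy, which computes the same standardized third moment; a short sketch with arbitrary parameter choices:

```python
import numpy as np
from scipy.stats import beta

def beta_skewness(a, b):
    # gamma_1 = 2 (b - a) sqrt(a + b + 1) / ((a + b + 2) sqrt(a b))
    return 2.0 * (b - a) * np.sqrt(a + b + 1.0) / ((a + b + 2.0) * np.sqrt(a * b))

for a, b in [(2.0, 2.0), (2.0, 5.0), (0.5, 0.5), (5.0, 1.0)]:
    formula = beta_skewness(a, b)
    scipy_value = float(beta(a, b).stats(moments='s'))
    print(f"Beta({a},{b}): formula={formula:+.5f}  scipy={scipy_value:+.5f}")
```

The symmetric cases print zero, and α < β gives positive skew, matching the statements above.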


Kurtosis

The beta distribution has been applied in acoustic analysis to assess damage to gears, as the kurtosis of the beta distribution has been reported to be a good indicator of the condition of a gear. Kurtosis has also been used to distinguish the seismic signal generated by a person's footsteps from other signals. As persons or other targets moving on the ground generate continuous signals in the form of seismic waves, one can separate different targets based on the seismic waves they generate. Kurtosis is sensitive to impulsive signals, so it's much more sensitive to the signal generated by human footsteps than other signals generated by vehicles, winds, noise, etc. Unfortunately, the notation for kurtosis has not been standardized. Kenney and Keeping use the symbol γ2 for the
excess kurtosis
, but Abramowitz and Stegun use different terminology. To prevent confusion between kurtosis (the fourth moment centered on the mean, normalized by the square of the variance) and excess kurtosis, when using symbols, they will be spelled out as follows: :\begin \text &=\text - 3\\ &=\frac-3\\ &=\frac\\ &=\frac . \end Letting α = β in the above expression one obtains :\text =- \frac \text\alpha=\beta . Therefore, for symmetric beta distributions, the excess kurtosis is negative, increasing from a minimum value of −2 at the limit as → 0, and approaching a maximum value of zero as → ∞. The value of −2 is the minimum value of excess kurtosis that any distribution (not just beta distributions, but any distribution of any possible kind) can ever achieve. This minimum value is reached when all the probability density is entirely concentrated at each end ''x'' = 0 and ''x'' = 1, with nothing in between: a 2-point
Bernoulli distribution
with equal probability 1/2 at each end (a coin toss: see section below "Kurtosis bounded by the square of the skewness" for further discussion). The description of kurtosis as a measure of the "potential outliers" (or "potential rare, extreme values") of the probability distribution, is correct for all distributions including the beta distribution. When rare, extreme values can occur in the beta distribution, the higher its kurtosis; otherwise, the kurtosis is lower. For α ≠ β, skewed beta distributions, the excess kurtosis can reach unlimited positive values (particularly for α → 0 for finite β, or for β → 0 for finite α) because the side away from the mode will produce occasional extreme values. Minimum kurtosis takes place when the mass density is concentrated equally at each end (and therefore the mean is at the center), and there is no probability mass density in between the ends. Using the parametrization in terms of mean μ and sample size ν = α + β: : \begin \alpha & = \mu \nu ,\text\nu =(\alpha + \beta) >0\\ \beta & = (1 - \mu) \nu , \text\nu =(\alpha + \beta) >0. \end one can express the excess kurtosis in terms of the mean μ and the sample size ν as follows: :\text =\frac\bigg (\frac - 1 \bigg ) The excess kurtosis can also be expressed in terms of just the following two parameters: the variance ''var'', and the sample size ν as follows: :\text =\frac\left(\frac - 6 - 5 \nu \right)\text\text< \mu(1-\mu) and, in terms of the variance ''var'' and the mean μ as follows: :\text =\frac\text\text< \mu(1-\mu) The plot of excess kurtosis as a function of the variance and the mean shows that the minimum value of the excess kurtosis (−2, which is the minimum possible value for excess kurtosis for any distribution) is intimately coupled with the maximum value of variance (1/4) and the symmetry condition: the mean occurring at the midpoint (μ = 1/2). This occurs for the symmetric case of α = β = 0, with zero skewness. At the limit, this is the 2 point
Bernoulli distribution
with equal probability 1/2 at each Dirac delta function end ''x'' = 0 and ''x'' = 1 and zero probability everywhere else. (A coin toss: one face of the coin being ''x'' = 0 and the other face being ''x'' = 1.) Variance is maximum because the distribution is bimodal with nothing in between the two modes (spikes) at each end. Excess kurtosis is minimum: the probability density "mass" is zero at the mean and it is concentrated at the two peaks at each end. Excess kurtosis reaches the minimum possible value (for any distribution) when the probability density function has two spikes at each end: it is bi-"peaky" with nothing in between them. On the other hand, the plot shows that for extreme skewed cases, where the mean is located near one or the other end (μ = 0 or μ = 1), the variance is close to zero, and the excess kurtosis rapidly approaches infinity when the mean of the distribution approaches either end. Alternatively, the excess kurtosis can also be expressed in terms of just the following two parameters: the square of the skewness, and the sample size ν as follows: :\text =\frac\bigg(\frac (\text)^2 - 1\bigg)\text^2-2< \text< \frac (\text)^2 From this last expression, one can obtain the same limits published practically a century ago by Karl Pearson in his paper, for the beta distribution (see section below titled "Kurtosis bounded by the square of the skewness"). Setting α + β= ν = 0 in the above expression, one obtains Pearson's lower boundary (values for the skewness and excess kurtosis below the boundary (excess kurtosis + 2 − skewness2 = 0) cannot occur for any distribution, and hence Karl Pearson appropriately called the region below this boundary the "impossible region"). The limit of α + β = ν → ∞ determines Pearson's upper boundary. : \begin &\lim_\text = (\text)^2 - 2\\ &\lim_\text = \tfrac (\text)^2 \end therefore: :(\text)^2-2< \text< \tfrac (\text)^2 Values of ν = α + β such that ν ranges from zero to infinity, 0 < ν < ∞, span the whole region of the beta distribution in the plane of excess kurtosis versus squared skewness. For the symmetric case (α = β), the following limits apply: : \begin &\lim_ \text = - 2 \\ &\lim_ \text = 0 \\ &\lim_ \text = - \frac \end For the unsymmetric cases (α ≠ β) the following limits (with only the noted variable approaching the limit) can be obtained from the above expressions: : \begin &\lim_\text =\lim_ \text = \lim_\text = \lim_\text =\infty\\ &\lim_\text = \frac,\text \lim_(\lim_ \text) = \infty,\text \lim_(\lim_ \text) = 0\\ &\lim_\text = \frac,\text \lim_(\lim_ \text) = \infty,\text \lim_(\lim_ \text) = 0\\ &\lim_ \text = - 6 + \frac,\text \lim_(\lim_ \text) = \infty,\text \lim_(\lim_ \text) = \infty \end
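Likewise, the excess kurtosis written directly in terms of α and β can be compared with SciPy's value; the sketch below (arbitrary parameters) also illustrates the approach to the two-point limit of −2 for small, equal shape parameters.

```python
from scipy.stats import beta

def beta_excess_kurtosis(a, b):
    # 6 [(a-b)^2 (a+b+1) - a b (a+b+2)] / (a b (a+b+2)(a+b+3))
    num = 6.0 * ((a - b) ** 2 * (a + b + 1.0) - a * b * (a + b + 2.0))
    den = a * b * (a + b + 2.0) * (a + b + 3.0)
    return num / den

for a, b in [(0.05, 0.05), (0.5, 0.5), (2.0, 2.0), (2.0, 5.0)]:
    print(f"Beta({a},{b}): formula={beta_excess_kurtosis(a, b):+.4f}  "
          f"scipy={float(beta(a, b).stats(moments='k')):+.4f}")
# Beta(0.05, 0.05) is close to the two-point limit: excess kurtosis near -2.
```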


Characteristic function

The Characteristic function (probability theory), characteristic function is the Fourier transform of the probability density function. The characteristic function of the beta distribution is confluent hypergeometric function, Kummer's confluent hypergeometric function (of the first kind): :\begin \varphi_X(\alpha;\beta;t) &= \operatorname\left[e^\right]\\ &= \int_0^1 e^ f(x;\alpha,\beta) dx \\ &=_1F_1(\alpha; \alpha+\beta; it)\!\\ &=\sum_^\infty \frac \\ &= 1 +\sum_^ \left( \prod_^ \frac \right) \frac \end where : x^=x(x+1)(x+2)\cdots(x+n-1) is the rising factorial, also called the "Pochhammer symbol". The value of the characteristic function for ''t'' = 0, is one: : \varphi_X(\alpha;\beta;0)=_1F_1(\alpha; \alpha+\beta; 0) = 1 . Also, the real and imaginary parts of the characteristic function enjoy the following symmetries with respect to the origin of variable ''t'': : \textrm \left [ _1F_1(\alpha; \alpha+\beta; it) \right ] = \textrm \left [ _1F_1(\alpha; \alpha+\beta; - it) \right ] : \textrm \left [ _1F_1(\alpha; \alpha+\beta; it) \right ] = - \textrm \left [ _1F_1(\alpha; \alpha+\beta; - it) \right ] The symmetric case α = β simplifies the characteristic function of the beta distribution to a Bessel function, since in the special case α + β = 2α the confluent hypergeometric function (of the first kind) reduces to a Bessel function (the modified Bessel function of the first kind I_ ) using Ernst Kummer, Kummer's second transformation as follows: Another example of the symmetric case α = β = n/2 for beamforming applications can be found in Figure 11 of :\begin _1F_1(\alpha;2\alpha; it) &= e^ _0F_1 \left(; \alpha+\tfrac; \frac \right) \\ &= e^ \left(\frac\right)^ \Gamma\left(\alpha+\tfrac\right) I_\left(\frac\right).\end In the accompanying plots, the Complex number, real part (Re) of the Characteristic function (probability theory), characteristic function of the beta distribution is displayed for symmetric (α = β) and skewed (α ≠ β) cases.
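The identity φ_X(t) = 1F1(α; α + β; it) can be checked by integrating E[e^{itX}] numerically. The sketch below assumes SciPy and mpmath are available (mpmath's hyp1f1 accepts the complex argument it); the parameters are arbitrary.

```python
import numpy as np
from scipy import integrate
from scipy.stats import beta
import mpmath

a, b, t = 2.0, 3.0, 1.5          # illustrative shape parameters and frequency

# Characteristic function by direct integration of e^{itx} f(x; a, b) on [0, 1].
pdf = beta(a, b).pdf
re, _ = integrate.quad(lambda x: np.cos(t * x) * pdf(x), 0, 1)
im, _ = integrate.quad(lambda x: np.sin(t * x) * pdf(x), 0, 1)

# Kummer's confluent hypergeometric function 1F1(a; a+b; it).
kummer = mpmath.hyp1f1(a, a + b, 1j * t)
print("integral:", complex(re, im))
print("1F1     :", complex(kummer))
```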


Other moments


Moment generating function

It also follows that the moment generating function is :\begin M_X(\alpha; \beta; t) &= \operatorname\left[e^\right] \\ pt&= \int_0^1 e^ f(x;\alpha,\beta)\,dx \\ pt&= _1F_1(\alpha; \alpha+\beta; t) \\ pt&= \sum_^\infty \frac \frac \\ pt&= 1 +\sum_^ \left( \prod_^ \frac \right) \frac \end In particular ''M''''X''(''α''; ''β''; 0) = 1.
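For real ''t'' the same Kummer function gives the moment generating function, and SciPy exposes it as hyp1f1; a brief sketch (arbitrary parameters) compares it with a Monte Carlo estimate of E[e^{tX}].

```python
import numpy as np
from scipy.special import hyp1f1

rng = np.random.default_rng(3)
a, b, t = 2.0, 3.0, 0.7          # illustrative values

mgf_kummer = hyp1f1(a, a + b, t)                      # M_X(t) = 1F1(a; a+b; t)
mgf_mc = np.exp(t * rng.beta(a, b, 500_000)).mean()   # Monte Carlo estimate of E[exp(tX)]
print(mgf_kummer, mgf_mc)
```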


Higher moments

Using the moment generating function, the ''k''-th raw moment is given by the factor :\prod_^ \frac multiplying the (exponential series) term \left(\frac\right) in the series of the moment generating function :\operatorname[X^k]= \frac = \prod_^ \frac where (''x'')(''k'') is a Pochhammer symbol representing rising factorial. It can also be written in a recursive form as :\operatorname[X^k] = \frac\operatorname[X^]. Since the moment generating function M_X(\alpha; \beta; \cdot) has a positive radius of convergence, the beta distribution is Moment problem, determined by its moments.
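The recursion for the raw moments is convenient in code. The sketch below (arbitrary shape parameters, assuming SciPy is available) builds E[X^k] recursively and compares it with SciPy's moment method.

```python
from scipy.stats import beta

a, b = 2.5, 4.0                  # illustrative shape parameters
moments = [1.0]                  # E[X^0] = 1
for k in range(1, 6):
    # E[X^k] = (a + k - 1) / (a + b + k - 1) * E[X^(k-1)]
    moments.append(moments[-1] * (a + k - 1.0) / (a + b + k - 1.0))

for k in range(1, 6):
    print(k, moments[k], beta(a, b).moment(k))
```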


Moments of transformed random variables


=Moments of linearly transformed, product and inverted random variables

= One can also show the following expectations for a transformed random variable, where the random variable ''X'' is Beta-distributed with parameters α and β: ''X'' ~ Beta(α, β). The expected value of the variable 1 − ''X'' is the mirror-symmetry of the expected value based on ''X'': :\begin & \operatorname[1-X] = \frac \\ & \operatorname[X (1-X)] =\operatorname[(1-X)X ] =\frac \end Due to the mirror-symmetry of the probability density function of the beta distribution, the variances based on variables ''X'' and 1 − ''X'' are identical, and the covariance on ''X''(1 − ''X'' is the negative of the variance: :\operatorname[(1-X)]=\operatorname[X] = -\operatorname[X,(1-X)]= \frac These are the expected values for inverted variables, (these are related to the harmonic means, see ): :\begin & \operatorname \left [\frac \right ] = \frac \text \alpha > 1\\ & \operatorname\left [\frac \right ] =\frac \text \beta > 1 \end The following transformation by dividing the variable ''X'' by its mirror-image ''X''/(1 − ''X'') results in the expected value of the "inverted beta distribution" or
beta prime distribution
(also known as beta distribution of the second kind or Pearson distribution, Pearson's Type VI): : \begin & \operatorname\left[\frac\right] =\frac \text\beta > 1\\ & \operatorname\left[\frac\right] =\frac\text\alpha > 1 \end Variances of these transformed variables can be obtained by integration, as the expected values of the second moments centered on the corresponding variables: :\operatorname \left[\frac \right] =\operatorname\left[\left(\frac - \operatorname\left[\frac \right ] \right )^2\right]= :\operatorname\left [\frac \right ] =\operatorname \left [\left (\frac - \operatorname\left [\frac \right ] \right )^2 \right ]= \frac \text\alpha > 2 The following variance of the variable ''X'' divided by its mirror-image (''X''/(1−''X'') results in the variance of the "inverted beta distribution" or
beta prime distribution
(also known as beta distribution of the second kind or Pearson distribution, Pearson's Type VI): :\operatorname \left [\frac \right ] =\operatorname \left [\left(\frac - \operatorname \left [\frac \right ] \right)^2 \right ]=\operatorname \left [\frac \right ] = :\operatorname \left [\left (\frac - \operatorname \left [\frac \right ] \right )^2 \right ]= \frac \text\beta > 2 The covariances are: :\operatorname\left [\frac,\frac \right ] = \operatorname\left[\frac,\frac \right] =\operatorname\left[\frac,\frac\right ] = \operatorname\left[\frac,\frac \right] =\frac \text \alpha, \beta > 1 These expectations and variances appear in the four-parameter Fisher information matrix (.)


=Moments of logarithmically transformed random variables

= Expected values for Logarithm transformation, logarithmic transformations (useful for maximum likelihood estimates, see ) are discussed in this section. The following logarithmic linear transformations are related to the geometric means ''GX'' and ''G''(1−''X'') (see ): :\begin \operatorname[\ln(X)] &= \psi(\alpha) - \psi(\alpha + \beta)= - \operatorname\left[\ln \left (\frac \right )\right],\\ \operatorname[\ln(1-X)] &=\psi(\beta) - \psi(\alpha + \beta)= - \operatorname \left[\ln \left (\frac \right )\right]. \end Where the
digamma function
ψ(α) is defined as the logarithmic derivative of the
gamma function
: :\psi(\alpha) = \frac Logit transformations are interesting, as they usually transform various shapes (including J-shapes) into (usually skewed) bell-shaped densities over the logit variable, and they may remove the end singularities over the original variable: :\begin \operatorname\left[\ln \left (\frac \right ) \right] &=\psi(\alpha) - \psi(\beta)= \operatorname[\ln(X)] +\operatorname \left[\ln \left (\frac \right) \right],\\ \operatorname\left [\ln \left (\frac \right ) \right ] &=\psi(\beta) - \psi(\alpha)= - \operatorname \left[\ln \left (\frac \right) \right] . \end Johnson considered the distribution of the logit - transformed variable ln(''X''/1−''X''), including its moment generating function and approximations for large values of the shape parameters. This transformation extends the finite support
[0, 1]
based on the original variable ''X'' to infinite support in both directions of the real line (−∞, +∞). Higher order logarithmic moments can be derived by using the representation of a beta distribution as a proportion of two Gamma distributions and differentiating through the integral. They can be expressed in terms of higher order poly-gamma functions as follows: :\begin \operatorname \left [\ln^2(X) \right ] &= (\psi(\alpha) - \psi(\alpha + \beta))^2+\psi_1(\alpha)-\psi_1(\alpha+\beta), \\ \operatorname \left [\ln^2(1-X) \right ] &= (\psi(\beta) - \psi(\alpha + \beta))^2+\psi_1(\beta)-\psi_1(\alpha+\beta), \\ \operatorname \left [\ln (X)\ln(1-X) \right ] &=(\psi(\alpha) - \psi(\alpha + \beta))(\psi(\beta) - \psi(\alpha + \beta)) -\psi_1(\alpha+\beta). \end therefore the
variance
of the logarithmic variables and
covariance
of ln(''X'') and ln(1−''X'') are: :\begin \operatorname[\ln(X), \ln(1-X)] &= \operatorname\left[\ln(X)\ln(1-X)\right] - \operatorname[\ln(X)]\operatorname[\ln(1-X)] = -\psi_1(\alpha+\beta) \\ & \\ \operatorname[\ln X] &= \operatorname[\ln^2(X)] - (\operatorname[\ln(X)])^2 \\ &= \psi_1(\alpha) - \psi_1(\alpha + \beta) \\ &= \psi_1(\alpha) + \operatorname[\ln(X), \ln(1-X)] \\ & \\ \operatorname ln (1-X)&= \operatorname[\ln^2 (1-X)] - (\operatorname[\ln (1-X)])^2 \\ &= \psi_1(\beta) - \psi_1(\alpha + \beta) \\ &= \psi_1(\beta) + \operatorname[\ln (X), \ln(1-X)] \end where the
trigamma function
, denoted ψ1(α), is the second of the
polygamma function
s, and is defined as the derivative of the digamma function: :\psi_1(\alpha) = \frac= \frac. The variances and covariance of the logarithmically transformed variables ''X'' and (1−''X'') are different, in general, because the logarithmic transformation destroys the mirror-symmetry of the original variables ''X'' and (1−''X''), as the logarithm approaches negative infinity for the variable approaching zero. These logarithmic variances and covariance are the elements of the
Fisher information
matrix for the beta distribution. They are also a measure of the curvature of the log likelihood function (see section on Maximum likelihood estimation). The variances of the log inverse variables are identical to the variances of the log variables: :\begin \operatorname\left[\ln \left (\frac \right ) \right] & =\operatorname[\ln(X)] = \psi_1(\alpha) - \psi_1(\alpha + \beta), \\ \operatorname\left[\ln \left (\frac \right ) \right] &=\operatorname ln (1-X)= \psi_1(\beta) - \psi_1(\alpha + \beta), \\ \operatorname\left[\ln \left (\frac \right), \ln \left (\frac\right ) \right] &=\operatorname[\ln(X),\ln(1-X)]= -\psi_1(\alpha + \beta).\end It also follows that the variances of the logit transformed variables are: :\operatorname\left[\ln \left (\frac \right )\right]=\operatorname\left[\ln \left (\frac \right ) \right]=-\operatorname\left [\ln \left (\frac \right ), \ln \left (\frac \right ) \right]= \psi_1(\alpha) + \psi_1(\beta)
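The digamma and trigamma expressions for the logarithmic moments are easy to confirm by simulation; a sketch with arbitrary parameters, assuming NumPy and SciPy are available:

```python
import numpy as np
from scipy.special import digamma, polygamma

rng = np.random.default_rng(7)
a, b = 2.0, 3.0
x = rng.beta(a, b, 500_000)

mean_lnx = digamma(a) - digamma(a + b)                 # E[ln X]
var_lnx = polygamma(1, a) - polygamma(1, a + b)        # var[ln X] (trigamma differences)
cov_ln = -polygamma(1, a + b)                          # cov[ln X, ln(1-X)]

print("E[ln X]   :", mean_lnx, np.log(x).mean())
print("var[ln X] :", var_lnx, np.log(x).var())
print("cov       :", cov_ln, np.cov(np.log(x), np.log(1 - x))[0, 1])
```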


Quantities of information (entropy)

Given a beta distributed random variable, ''X'' ~ Beta(''α'', ''β''), the information entropy, differential entropy of ''X'' is (measured in Nat (unit), nats), the expected value of the negative of the logarithm of the
probability density function
: :\begin h(X) &= \operatorname[-\ln(f(x;\alpha,\beta))] \\ pt&=\int_0^1 -f(x;\alpha,\beta)\ln(f(x;\alpha,\beta)) \, dx \\ pt&= \ln(\Beta(\alpha,\beta))-(\alpha-1)\psi(\alpha)-(\beta-1)\psi(\beta)+(\alpha+\beta-2) \psi(\alpha+\beta) \end where ''f''(''x''; ''α'', ''β'') is the
probability density function
of the beta distribution: :f(x;\alpha,\beta) = \frac x^(1-x)^ The
digamma function In mathematics, the digamma function is defined as the logarithmic derivative of the gamma function: :\psi(x)=\frac\ln\big(\Gamma(x)\big)=\frac\sim\ln-\frac. It is the first of the polygamma functions. It is strictly increasing and strict ...
''ψ'' appears in the formula for the differential entropy as a consequence of Euler's integral formula for the harmonic numbers which follows from the integral: :\int_0^1 \frac \, dx = \psi(\alpha)-\psi(1) The information entropy, differential entropy of the beta distribution is negative for all values of ''α'' and ''β'' greater than zero, except at ''α'' = ''β'' = 1 (for which values the beta distribution is the same as the Uniform distribution (continuous), uniform distribution), where the information entropy, differential entropy reaches its Maxima and minima, maximum value of zero. It is to be expected that the maximum entropy should take place when the beta distribution becomes equal to the uniform distribution, since uncertainty is maximal when all possible events are equiprobable. For ''α'' or ''β'' approaching zero, the information entropy, differential entropy approaches its Maxima and minima, minimum value of negative infinity. For (either or both) ''α'' or ''β'' approaching zero, there is a maximum amount of order: all the probability density is concentrated at the ends, and there is zero probability density at points located between the ends. Similarly for (either or both) ''α'' or ''β'' approaching infinity, the differential entropy approaches its minimum value of negative infinity, and a maximum amount of order. If either ''α'' or ''β'' approaches infinity (and the other is finite) all the probability density is concentrated at an end, and the probability density is zero everywhere else. If both shape parameters are equal (the symmetric case), ''α'' = ''β'', and they approach infinity simultaneously, the probability density becomes a spike ( Dirac delta function) concentrated at the middle ''x'' = 1/2, and hence there is 100% probability at the middle ''x'' = 1/2 and zero probability everywhere else. The (continuous case) information entropy, differential entropy was introduced by Shannon in his original paper (where he named it the "entropy of a continuous distribution"), as the concluding part of the same paper where he defined the information entropy, discrete entropy. It is known since then that the differential entropy may differ from the infinitesimal limit of the discrete entropy by an infinite offset, therefore the differential entropy can be negative (as it is for the beta distribution). What really matters is the relative value of entropy. Given two beta distributed random variables, ''X''1 ~ Beta(''α'', ''β'') and ''X''2 ~ Beta(''α''′, ''β''′), the cross entropy is (measured in nats) :\begin H(X_1,X_2) &= \int_0^1 - f(x;\alpha,\beta) \ln (f(x;\alpha',\beta')) \,dx \\ pt&= \ln \left(\Beta(\alpha',\beta')\right)-(\alpha'-1)\psi(\alpha)-(\beta'-1)\psi(\beta)+(\alpha'+\beta'-2)\psi(\alpha+\beta). \end The cross entropy has been used as an error metric to measure the distance between two hypotheses. Its absolute value is minimum when the two distributions are identical. It is the information measure most closely related to the log maximum likelihood (see section on "Parameter estimation. Maximum likelihood estimation")). The relative entropy, or Kullback–Leibler divergence ''D''KL(''X''1 , , ''X''2), is a measure of the inefficiency of assuming that the distribution is ''X''2 ~ Beta(''α''′, ''β''′) when the distribution is really ''X''1 ~ Beta(''α'', ''β''). It is defined as follows (measured in nats). 
:\begin D_(X_1, , X_2) &= \int_0^1 f(x;\alpha,\beta) \ln \left (\frac \right ) \, dx \\ pt&= \left (\int_0^1 f(x;\alpha,\beta) \ln (f(x;\alpha,\beta)) \,dx \right )- \left (\int_0^1 f(x;\alpha,\beta) \ln (f(x;\alpha',\beta')) \, dx \right )\\ pt&= -h(X_1) + H(X_1,X_2)\\ pt&= \ln\left(\frac\right)+(\alpha-\alpha')\psi(\alpha)+(\beta-\beta')\psi(\beta)+(\alpha'-\alpha+\beta'-\beta)\psi (\alpha + \beta). \end The relative entropy, or Kullback–Leibler divergence, is always non-negative. A few numerical examples follow: *''X''1 ~ Beta(1, 1) and ''X''2 ~ Beta(3, 3); ''D''KL(''X''1 , , ''X''2) = 0.598803; ''D''KL(''X''2 , , ''X''1) = 0.267864; ''h''(''X''1) = 0; ''h''(''X''2) = −0.267864 *''X''1 ~ Beta(3, 0.5) and ''X''2 ~ Beta(0.5, 3); ''D''KL(''X''1 , , ''X''2) = 7.21574; ''D''KL(''X''2 , , ''X''1) = 7.21574; ''h''(''X''1) = −1.10805; ''h''(''X''2) = −1.10805. The Kullback–Leibler divergence is not symmetric ''D''KL(''X''1 , , ''X''2) ≠ ''D''KL(''X''2 , , ''X''1) for the case in which the individual beta distributions Beta(1, 1) and Beta(3, 3) are symmetric, but have different entropies ''h''(''X''1) ≠ ''h''(''X''2). The value of the Kullback divergence depends on the direction traveled: whether going from a higher (differential) entropy to a lower (differential) entropy or the other way around. In the numerical example above, the Kullback divergence measures the inefficiency of assuming that the distribution is (bell-shaped) Beta(3, 3), rather than (uniform) Beta(1, 1). The "h" entropy of Beta(1, 1) is higher than the "h" entropy of Beta(3, 3) because the uniform distribution Beta(1, 1) has a maximum amount of disorder. The Kullback divergence is more than two times higher (0.598803 instead of 0.267864) when measured in the direction of decreasing entropy: the direction that assumes that the (uniform) Beta(1, 1) distribution is (bell-shaped) Beta(3, 3) rather than the other way around. In this restricted sense, the Kullback divergence is consistent with the second law of thermodynamics. The Kullback–Leibler divergence is symmetric ''D''KL(''X''1 , , ''X''2) = ''D''KL(''X''2 , , ''X''1) for the skewed cases Beta(3, 0.5) and Beta(0.5, 3) that have equal differential entropy ''h''(''X''1) = ''h''(''X''2). The symmetry condition: :D_(X_1, , X_2) = D_(X_2, , X_1),\texth(X_1) = h(X_2),\text\alpha \neq \beta follows from the above definitions and the mirror-symmetry ''f''(''x''; ''α'', ''β'') = ''f''(1−''x''; ''α'', ''β'') enjoyed by the beta distribution.
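The closed-form expressions above for the differential entropy, the cross-entropy and the Kullback–Leibler divergence involve only the log-beta and digamma functions, so they are easy to evaluate. A minimal sketch (not part of the original article; it assumes SciPy) that reproduces the Beta(1, 1) versus Beta(3, 3) and Beta(3, 0.5) versus Beta(0.5, 3) values quoted above:

 from scipy.special import betaln, psi   # log of the Beta function, and the digamma function
 
 def entropy(a, b):
     """Differential entropy h(X) of Beta(a, b), in nats."""
     return betaln(a, b) - (a - 1)*psi(a) - (b - 1)*psi(b) + (a + b - 2)*psi(a + b)
 
 def cross_entropy(a, b, a2, b2):
     """Cross-entropy H(X1, X2) of X1 ~ Beta(a, b) relative to X2 ~ Beta(a2, b2), in nats."""
     return betaln(a2, b2) - (a2 - 1)*psi(a) - (b2 - 1)*psi(b) + (a2 + b2 - 2)*psi(a + b)
 
 def kl(a, b, a2, b2):
     """Kullback-Leibler divergence D_KL(X1 || X2) = -h(X1) + H(X1, X2)."""
     return cross_entropy(a, b, a2, b2) - entropy(a, b)
 
 print(entropy(1, 1), entropy(3, 3))            # 0 and about -0.267864
 print(kl(1, 1, 3, 3), kl(3, 3, 1, 1))          # about 0.598803 and 0.267864
 print(kl(3, 0.5, 0.5, 3), kl(0.5, 3, 3, 0.5))  # both about 7.21574 (equal-entropy skewed case)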


Relationships between statistical measures


Mean, mode and median relationship

If 1 < α < β then mode ≤ median ≤ mean (Kerman J (2011) "A closed-form approximation for the median of the beta distribution"). Expressing the mode (only for α, β > 1) and the mean in terms of α and β:

: \frac{\alpha - 1}{\alpha + \beta - 2} \le \text{median} \le \frac{\alpha}{\alpha + \beta}

If 1 < β < α then the order of the inequalities is reversed. For α, β > 1 the absolute distance between the mean and the median is less than 5% of the distance between the maximum and minimum values of ''x''. On the other hand, the absolute distance between the mean and the mode can reach 50% of the distance between the maximum and minimum values of ''x'', for the (pathological) case of α = 1 and β = 1, for which values the beta distribution approaches the uniform distribution and the differential entropy approaches its maximum value, and hence maximum "disorder".

For example, for α = 1.0001 and β = 1.00000001:
* mode = 0.9999; PDF(mode) = 1.00010
* mean = 0.500025; PDF(mean) = 1.00003
* median = 0.500035; PDF(median) = 1.00003
* mean − mode = −0.499875
* mean − median = −9.65538 × 10⁻⁶

where PDF stands for the value of the probability density function.
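This ordering is easy to check numerically. The sketch below (not part of the original article; it assumes SciPy) compares mode, median and mean for an example with 1 < ''α'' < ''β'', together with the closed-form median approximation (''α'' − 1/3)/(''α'' + ''β'' − 2/3) commonly quoted from the Kerman (2011) reference above (stated here as an assumption, for illustration):

 from scipy.stats import beta
 
 a, b = 2.0, 5.0                      # example with 1 < alpha < beta
 mode   = (a - 1) / (a + b - 2)
 mean   = a / (a + b)
 median = beta.median(a, b)           # numerically exact median
 approx = (a - 1/3) / (a + b - 2/3)   # closed-form approximation, valid for alpha, beta > 1
 
 print(mode <= median <= mean)        # True: mode <= median <= mean when 1 < alpha < beta
 print(mode, median, mean, approx)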


Mean, geometric mean and harmonic mean relationship

It is known from the inequality of arithmetic and geometric means that the geometric mean is lower than the mean. Similarly, the harmonic mean is lower than the geometric mean. The accompanying plot shows that for α = β, both the mean and the median are exactly equal to 1/2, regardless of the value of α = β, and the mode is also equal to 1/2 for α = β > 1; however, the geometric and harmonic means are lower than 1/2, and they only approach this value asymptotically as α = β → ∞.
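Using the standard expressions for these means for the beta distribution (arithmetic mean ''α''/(''α'' + ''β''), geometric mean exp(''ψ''(''α'') − ''ψ''(''α'' + ''β'')), and harmonic mean (''α'' − 1)/(''α'' + ''β'' − 1) for ''α'' > 1), the ordering can be verified numerically. A small sketch, assuming SciPy (not part of the original article):

 import numpy as np
 from scipy.special import psi        # digamma function
 
 for k in (1.5, 3.0, 10.0, 100.0):    # symmetric case alpha = beta = k
     a = b = k
     arithmetic = a / (a + b)                     # equals 1/2 for alpha = beta
     geometric  = np.exp(psi(a) - psi(a + b))     # exp(E[ln X])
     harmonic   = (a - 1) / (a + b - 1)           # 1 / E[1/X], requires alpha > 1
     print(k, arithmetic, geometric, harmonic)    # arithmetic >= geometric >= harmonic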


Kurtosis bounded by the square of the skewness

As remarked by Feller, in the Pearson system the beta probability density appears as type I (any difference between the beta distribution and Pearson's type I distribution is only superficial and makes no difference for the following discussion regarding the relationship between kurtosis and skewness). Karl Pearson showed, in Plate 1 of his paper published in 1916, a graph with the kurtosis as the vertical axis (ordinate) and the square of the skewness as the horizontal axis (abscissa), in which a number of distributions were displayed. The region occupied by the beta distribution is bounded by the following two lines in the (skewness², kurtosis) plane, or the (skewness², excess kurtosis) plane:

:(\text{skewness})^2+1< \text{kurtosis}< \frac{3}{2} (\text{skewness})^2 + 3

or, equivalently,

:(\text{skewness})^2-2< \text{excess kurtosis}< \frac{3}{2} (\text{skewness})^2

At a time when there were no powerful digital computers, Karl Pearson accurately computed further boundaries, for example, separating the "U-shaped" from the "J-shaped" distributions. The lower boundary line (excess kurtosis + 2 − skewness² = 0) is produced by skewed "U-shaped" beta distributions with both values of shape parameters α and β close to zero. The upper boundary line (excess kurtosis − (3/2) skewness² = 0) is produced by extremely skewed distributions with very large values of one of the parameters and very small values of the other parameter. Karl Pearson showed that this upper boundary line (excess kurtosis − (3/2) skewness² = 0) is also the intersection with Pearson's distribution III, which has unlimited support in one direction (towards positive infinity), and can be bell-shaped or J-shaped. His son, Egon Pearson, showed that the region (in the kurtosis/squared-skewness plane) occupied by the beta distribution (equivalently, Pearson's distribution I) as it approaches this boundary (excess kurtosis − (3/2) skewness² = 0) is shared with the noncentral chi-squared distribution. Karl Pearson (Pearson 1895, pp. 357, 360, 373–376) also showed that the gamma distribution is a Pearson type III distribution. Hence this boundary line for Pearson's type III distribution is known as the gamma line. (This can be shown from the fact that the excess kurtosis of the gamma distribution is 6/''k'' and the square of the skewness is 4/''k'', hence (excess kurtosis − (3/2) skewness² = 0) is identically satisfied by the gamma distribution regardless of the value of the parameter ''k''.) Pearson later noted that the chi-squared distribution is a special case of Pearson's type III and also shares this boundary line (as is apparent from the fact that for the chi-squared distribution the excess kurtosis is 12/''k'' and the square of the skewness is 8/''k'', hence (excess kurtosis − (3/2) skewness² = 0) is identically satisfied regardless of the value of the parameter ''k''). This is to be expected, since the chi-squared distribution ''X'' ~ χ²(''k'') is a special case of the gamma distribution, with parametrization X ~ Γ(k/2, 1/2), where k is a positive integer that specifies the "number of degrees of freedom" of the chi-squared distribution.

An example of a beta distribution near the upper boundary (excess kurtosis − (3/2) skewness² = 0) is given by α = 0.1, β = 1000, for which the ratio (excess kurtosis)/(skewness²) = 1.49835 approaches the upper limit of 1.5 from below. An example of a beta distribution near the lower boundary (excess kurtosis + 2 − skewness² = 0) is given by α = 0.0001, β = 0.1, for which values the expression (excess kurtosis + 2)/(skewness²) = 1.01621 approaches the lower limit of 1 from above. In the infinitesimal limit for both α and β approaching zero symmetrically, the excess kurtosis reaches its minimum value at −2. This minimum value occurs at the point at which the lower boundary line intersects the vertical axis (ordinate). (However, in Pearson's original chart, the ordinate is kurtosis, instead of excess kurtosis, and it increases downwards rather than upwards.)

Values for the skewness and excess kurtosis below the lower boundary (excess kurtosis + 2 − skewness² = 0) cannot occur for any distribution, and hence Karl Pearson appropriately called the region below this boundary the "impossible region". The boundary for this "impossible region" is determined by (symmetric or skewed) bimodal "U"-shaped distributions for which the parameters α and β approach zero and hence all the probability density is concentrated at the ends: ''x'' = 0, 1, with practically nothing in between them. Since for α ≈ β ≈ 0 the probability density is concentrated at the two ends ''x'' = 0 and ''x'' = 1, this "impossible boundary" is determined by a Bernoulli distribution, where the two only possible outcomes occur with respective probabilities ''p'' and ''q'' = 1 − ''p''. For cases approaching this limit boundary with symmetry α = β, skewness ≈ 0, excess kurtosis ≈ −2 (this is the lowest excess kurtosis possible for any distribution), and the probabilities are ''p'' ≈ ''q'' ≈ 1/2. For cases approaching this limit boundary with skewness, excess kurtosis ≈ −2 + skewness², and the probability density is concentrated more at one end than the other end (with practically nothing in between), with probabilities p = \tfrac{\beta}{\alpha+\beta} at the left end ''x'' = 0 and q = 1-p = \tfrac{\alpha}{\alpha+\beta} at the right end ''x'' = 1.
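Both boundary lines can be checked against standard moment routines. A minimal sketch (not part of the original article; it assumes SciPy, whose beta.stats reports Fisher, i.e. excess, kurtosis):

 from scipy.stats import beta
 
 def check(a, b):
     skew, ex_kurt = (float(v) for v in beta.stats(a, b, moments='sk'))
     inside = skew**2 - 2 < ex_kurt < 1.5 * skew**2       # strictly between the two boundaries
     print(a, b, inside, ex_kurt / skew**2, (ex_kurt + 2) / skew**2)
 
 check(0.1, 1000)     # near the upper (gamma) line: excess kurtosis / skewness^2 close to 1.49835
 check(0.0001, 0.1)   # near the lower boundary: (excess kurtosis + 2) / skewness^2 close to 1.01621
 check(2, 5)          # a generic case, well inside the region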


Symmetry

All statements are conditional on α, β > 0.
* Probability density function reflection symmetry
::f(x;\alpha,\beta) = f(1-x;\beta,\alpha)
* Cumulative distribution function reflection symmetry plus unitary translation
::F(x;\alpha,\beta) = I_x(\alpha,\beta) = 1- F(1- x;\beta,\alpha) = 1 - I_{1-x}(\beta,\alpha)
* Mode reflection symmetry plus unitary translation
::\operatorname{mode}(\Beta(\alpha, \beta))= 1-\operatorname{mode}(\Beta(\beta, \alpha)),\text{ if }\Beta(\beta, \alpha)\ne \Beta(1,1)
* Median reflection symmetry plus unitary translation
::\operatorname{median}(\Beta(\alpha, \beta) )= 1 - \operatorname{median}(\Beta(\beta, \alpha))
* Mean reflection symmetry plus unitary translation
::\mu(\Beta(\alpha, \beta) )= 1 - \mu(\Beta(\beta, \alpha) )
* Geometric means: each is individually asymmetric; the following symmetry applies between the geometric mean based on ''X'' and the geometric mean based on its reflection (1−''X'')
::G_X(\Beta(\alpha, \beta) )=G_{(1-X)}(\Beta(\beta, \alpha) )
* Harmonic means: each is individually asymmetric; the following symmetry applies between the harmonic mean based on ''X'' and the harmonic mean based on its reflection (1−''X'')
::H_X(\Beta(\alpha, \beta) )=H_{(1-X)}(\Beta(\beta, \alpha) ) \text{ if } \alpha, \beta > 1
* Variance symmetry
::\operatorname{var}(\Beta(\alpha, \beta) )=\operatorname{var}(\Beta(\beta, \alpha) )
* Geometric variances: each is individually asymmetric; the following symmetry applies between the log geometric variance based on ''X'' and the log geometric variance based on its reflection (1−''X'')
::\ln(\operatorname{var_{GX}}(\Beta(\alpha, \beta))) = \ln(\operatorname{var_{G(1-X)}}(\Beta(\beta, \alpha)))
* Geometric covariance symmetry
::\ln \operatorname{cov_{G X,(1-X)}}(\Beta(\alpha, \beta))=\ln \operatorname{cov_{G X,(1-X)}}(\Beta(\beta, \alpha))
* Mean absolute deviation around the mean symmetry
::\operatorname{E}[|X - E[X]|](\Beta(\alpha, \beta))=\operatorname{E}[|X - E[X]|](\Beta(\beta, \alpha))
* Skewness skew-symmetry
::\operatorname{skewness}(\Beta(\alpha, \beta) )= - \operatorname{skewness}(\Beta(\beta, \alpha) )
* Excess kurtosis symmetry
::\text{excess kurtosis}(\Beta(\alpha, \beta) )= \text{excess kurtosis}(\Beta(\beta, \alpha) )
* Characteristic function symmetry of real part (with respect to the origin of the variable ''t'')
::\text{Re}[{}_1F_1(\alpha; \alpha+\beta; it)] = \text{Re}[{}_1F_1(\alpha; \alpha+\beta; -it)]
* Characteristic function skew-symmetry of imaginary part (with respect to the origin of the variable ''t'')
::\text{Im}[{}_1F_1(\alpha; \alpha+\beta; it)] = - \text{Im}[{}_1F_1(\alpha; \alpha+\beta; -it)]
* Characteristic function symmetry of absolute value (with respect to the origin of the variable ''t'')
::\text{Abs}[{}_1F_1(\alpha; \alpha+\beta; it)] = \text{Abs}[{}_1F_1(\alpha; \alpha+\beta; -it)]
* Differential entropy symmetry
::h(\Beta(\alpha, \beta) )= h(\Beta(\beta, \alpha) )
* Relative entropy (also called Kullback–Leibler divergence) symmetry
::D_{\text{KL}}(X_1||X_2) = D_{\text{KL}}(X_2||X_1), \text{ if }h(X_1) = h(X_2)\text{, for (skewed) }\alpha \neq \beta
* Fisher information matrix symmetry
::{\mathcal{I}}_{i,j} = {\mathcal{I}}_{j,i}
_=Moments_of_logarithmically_transformed_random_variables

= Expected_values_for_Logarithm_transformation, logarithmic_transformations_(useful_for_maximum_likelihood_estimates,_see_)_are_discussed_in_this_section.__The_following_logarithmic_linear_transformations_are_related_to_the_geometric_means_''GX''_and__''G''(1−''X'')_(see_): :\begin \operatorname[\ln(X)]_&=_\psi(\alpha)_-_\psi(\alpha_+_\beta)=_-_\operatorname\left[\ln_\left_(\frac_\right_)\right],\\ \operatorname[\ln(1-X)]_&=\psi(\beta)_-_\psi(\alpha_+_\beta)=_-_\operatorname_\left[\ln_\left_(\frac_\right_)\right]. \end Where_the_digamma_function_ In_mathematics,_the_digamma_function_is_defined_as_the__logarithmic_derivative_of_the_gamma_function: :\psi(x)=\frac\ln\big(\Gamma(x)\big)=\frac\sim\ln-\frac. It_is_the_first_of_the__polygamma_functions._It_is_strictly_increasing_and_strict_...
_ψ(α)_is_defined_as_the_logarithmic_derivative_of_the_gamma_function_ In__mathematics,_the_gamma_function_(represented_by_,_the_capital_letter__gamma_from_the_Greek_alphabet)_is_one_commonly_used_extension_of_the__factorial_function_to_complex_numbers._The_gamma_function_is_defined_for_all_complex_numbers_except_...
: :\psi(\alpha)_=_\frac Logit_transformations_are_interesting,
_as_they_usually_transform_various_shapes_(including_J-shapes)_into_(usually_skewed)_bell-shaped_densities_over_the_logit_variable,_and_they_may_remove_the_end_singularities_over_the_original_variable: :\begin \operatorname\left[\ln_\left_(\frac_\right_)_\right]_&=\psi(\alpha)_-_\psi(\beta)=_\operatorname[\ln(X)]_+\operatorname_\left[\ln_\left_(\frac_\right)_\right],\\ \operatorname\left_[\ln_\left_(\frac_\right_)_\right_]_&=\psi(\beta)_-_\psi(\alpha)=_-_\operatorname_\left[\ln_\left_(\frac_\right)_\right]_. \end Johnson
__considered_the_distribution_of_the_logit_-_transformed_variable_ln(''X''/1−''X''),_including_its_moment_generating_function_and_approximations_for_large_values_of_the_shape_parameters.__This_transformation_extends_the_finite_support_,_1_ The_comma__is_a_punctuation_mark_that_appears_in_several_variants_in_different_languages._It_has_the_same_shape_as_an_apostrophe_or_single_closing_quotation_mark_()_in_many_typefaces,_but_it_differs_from_them_in_being_placed_on_the__baseline_o_...
based_on_the_original_variable_''X''_to_infinite_support_in_both_directions_of_the_real_line_(−∞,_+∞). Higher_order_logarithmic_moments_can_be_derived_by_using_the_representation_of_a_beta_distribution_as_a_proportion_of_two_Gamma_distributions_and_differentiating_through_the_integral._They_can_be_expressed_in_terms_of_higher_order_poly-gamma_functions_as_follows: :\begin \operatorname_\left_[\ln^2(X)_\right_]_&=_(\psi(\alpha)_-_\psi(\alpha_+_\beta))^2+\psi_1(\alpha)-\psi_1(\alpha+\beta),_\\ \operatorname_\left_[\ln^2(1-X)_\right_]_&=_(\psi(\beta)_-_\psi(\alpha_+_\beta))^2+\psi_1(\beta)-\psi_1(\alpha+\beta),_\\ \operatorname_\left_[\ln_(X)\ln(1-X)_\right_]_&=(\psi(\alpha)_-_\psi(\alpha_+_\beta))(\psi(\beta)_-_\psi(\alpha_+_\beta))_-\psi_1(\alpha+\beta). \end therefore_the_variance__ In_probability_theory_and_statistics,_variance_is_the__expectation_of_the_squared__deviation_of_a__random_variable_from_its__population_mean_or__sample_mean._Variance_is_a_measure_of_dispersion,_meaning_it_is_a_measure_of_how_far_a_set_of_numbe_...
_of_the_logarithmic_variables_and_covariance_ In__probability_theory_and__statistics,_covariance_is_a_measure_of_the_joint_variability_of_two__random_variables._If_the_greater_values_of_one_variable_mainly_correspond_with_the_greater_values_of_the_other_variable,_and_the_same_holds_for_the__...
_of_ln(''X'')_and_ln(1−''X'')_are: :\begin \operatorname[\ln(X),_\ln(1-X)]_&=_\operatorname\left[\ln(X)\ln(1-X)\right]_-_\operatorname[\ln(X)]\operatorname[\ln(1-X)]_=_-\psi_1(\alpha+\beta)_\\ &_\\ \operatorname[\ln_X]_&=_\operatorname[\ln^2(X)]_-_(\operatorname[\ln(X)])^2_\\ &=_\psi_1(\alpha)_-_\psi_1(\alpha_+_\beta)_\\ &=_\psi_1(\alpha)_+_\operatorname[\ln(X),_\ln(1-X)]_\\ &_\\ \operatorname_ln_(1-X)&=_\operatorname[\ln^2_(1-X)]_-_(\operatorname[\ln_(1-X)])^2_\\ &=_\psi_1(\beta)_-_\psi_1(\alpha_+_\beta)_\\ &=_\psi_1(\beta)_+_\operatorname[\ln_(X),_\ln(1-X)] \end where_the_trigamma_function_ In_mathematics,_the_trigamma_function,_denoted__or_,_is_the_second_of_the_polygamma_functions,_and_is_defined_by :_\psi_1(z)_=_\frac_\ln\Gamma(z). It_follows_from_this_definition_that :_\psi_1(z)_=_\frac_\psi(z) where__is_the_digamma_functio_...
,_denoted_ψ1(α),_is_the_second_of_the_polygamma_function_ In_mathematics,_the_polygamma_function_of_order__is_a_meromorphic_function_on_the__complex_numbers_\mathbb_defined_as_the_th__derivative_of_the_logarithm_of_the_gamma_function: :\psi^(z)_:=_\frac_\psi(z)_=_\frac_\ln\Gamma(z). Thus :\psi^(z)__...
s,_and_is_defined_as_the_derivative_of_the_digamma_function: :\psi_1(\alpha)_=_\frac=_\frac. The_variances_and_covariance_of_the_logarithmically_transformed_variables_''X''_and_(1−''X'')_are_different,_in_general,_because_the_logarithmic_transformation_destroys_the_mirror-symmetry_of_the_original_variables_''X''_and_(1−''X''),_as_the_logarithm_approaches_negative_infinity_for_the_variable_approaching_zero. These_logarithmic_variances_and_covariance_are_the_elements_of_the_Fisher_information_ In_mathematical_statistics,_the_Fisher_information_(sometimes_simply_called_information)_is_a_way_of_measuring_the_amount_of_information_that_an_observable_random_variable_''X''_carries_about_an_unknown_parameter_''θ''_of_a_distribution_that_model_...
_matrix_for_the_beta_distribution.__They_are_also_a_measure_of_the_curvature_of_the_log_likelihood_function_(see_section_on_Maximum_likelihood_estimation). The_variances_of_the_log_inverse_variables_are_identical_to_the_variances_of_the_log_variables: :\begin \operatorname\left[\ln_\left_(\frac_\right_)_\right]_&_=\operatorname[\ln(X)]_=_\psi_1(\alpha)_-_\psi_1(\alpha_+_\beta),_\\ \operatorname\left[\ln_\left_(\frac_\right_)_\right]_&=\operatorname_ln_(1-X)=_\psi_1(\beta)_-_\psi_1(\alpha_+_\beta),_\\ \operatorname\left[\ln_\left_(\frac_\right),_\ln_\left_(\frac\right_)_\right]_&=\operatorname[\ln(X),\ln(1-X)]=_-\psi_1(\alpha_+_\beta).\end It_also_follows_that_the_variances_of_the_logit_transformed_variables_are: :\operatorname\left[\ln_\left_(\frac_\right_)\right]=\operatorname\left[\ln_\left_(\frac_\right_)_\right]=-\operatorname\left_[\ln_\left_(\frac_\right_),_\ln_\left_(\frac_\right_)_\right]=_\psi_1(\alpha)_+_\psi_1(\beta)


_Quantities_of_information_(entropy)

Given_a_beta_distributed_random_variable,_''X''_~_Beta(''α'',_''β''),_the_information_entropy, differential_entropy_of_''X''_is_(measured_in_Nat_(unit), nats),_the_expected_value_of_the_negative_of_the_logarithm_of_the_probability_density_function_ In_probability_theory,_a_probability_density_function_(PDF),_or_density_of_a_continuous_random_variable,_is_a__function_whose_value_at_any_given_sample_(or_point)_in_the__sample_space_(the_set_of_possible_values_taken_by_the_random_variable)_ca_...
: :\begin h(X)_&=_\operatorname[-\ln(f(x;\alpha,\beta))]_\\_pt&=\int_0^1_-f(x;\alpha,\beta)\ln(f(x;\alpha,\beta))_\,_dx_\\_pt&=_\ln(\Beta(\alpha,\beta))-(\alpha-1)\psi(\alpha)-(\beta-1)\psi(\beta)+(\alpha+\beta-2)_\psi(\alpha+\beta) \end where_''f''(''x'';_''α'',_''β'')_is_the_probability_density_function_ In_probability_theory,_a_probability_density_function_(PDF),_or_density_of_a_continuous_random_variable,_is_a__function_whose_value_at_any_given_sample_(or_point)_in_the__sample_space_(the_set_of_possible_values_taken_by_the_random_variable)_ca_...
_of_the_beta_distribution: :f(x;\alpha,\beta)_=_\frac_x^(1-x)^ The_digamma_function_ In_mathematics,_the_digamma_function_is_defined_as_the__logarithmic_derivative_of_the_gamma_function: :\psi(x)=\frac\ln\big(\Gamma(x)\big)=\frac\sim\ln-\frac. It_is_the_first_of_the__polygamma_functions._It_is_strictly_increasing_and_strict_...
_''ψ''_appears_in_the_formula_for_the_differential_entropy_as_a_consequence_of_Euler's_integral_formula_for_the_harmonic_numbers_which_follows_from_the_integral: :\int_0^1_\frac__\,_dx_=_\psi(\alpha)-\psi(1) The_information_entropy, differential_entropy_of_the_beta_distribution_is_negative_for_all_values_of_''α''_and_''β''_greater_than_zero,_except_at_''α''_=_''β''_=_1_(for_which_values_the_beta_distribution_is_the_same_as_the_Uniform_distribution_(continuous), uniform_distribution),_where_the_information_entropy, differential_entropy_reaches_its_Maxima_and_minima, maximum_value_of_zero.__It_is_to_be_expected_that_the_maximum_entropy_should_take_place_when_the_beta_distribution_becomes_equal_to_the_uniform_distribution,_since_uncertainty_is_maximal_when_all_possible_events_are_equiprobable. For_''α''_or_''β''_approaching_zero,_the_information_entropy, differential_entropy_approaches_its_Maxima_and_minima, minimum_value_of_negative_infinity._For_(either_or_both)_''α''_or_''β''_approaching_zero,_there_is_a_maximum_amount_of_order:_all_the_probability_density_is_concentrated_at_the_ends,_and_there_is_zero_probability_density_at_points_located_between_the_ends._Similarly_for_(either_or_both)_''α''_or_''β''_approaching_infinity,_the_differential_entropy_approaches_its_minimum_value_of_negative_infinity,_and_a_maximum_amount_of_order.__If_either_''α''_or_''β''_approaches_infinity_(and_the_other_is_finite)_all_the_probability_density_is_concentrated_at_an_end,_and_the_probability_density_is_zero_everywhere_else.__If_both_shape_parameters_are_equal_(the_symmetric_case),_''α''_=_''β'',_and_they_approach_infinity_simultaneously,_the_probability_density_becomes_a_spike_(_Dirac_delta_function)_concentrated_at_the_middle_''x''_=_1/2,_and_hence_there_is_100%_probability_at_the_middle_''x''_=_1/2_and_zero_probability_everywhere_else. The_(continuous_case)_information_entropy, differential_entropy_was_introduced_by_Shannon_in_his_original_paper_(where_he_named_it_the_"entropy_of_a_continuous_distribution"),_as_the_concluding_part_of_the_same_paper_where_he_defined_the_information_entropy, discrete_entropy.__It_is_known_since_then_that_the_differential_entropy_may_differ_from_the_infinitesimal_limit_of_the_discrete_entropy_by_an_infinite_offset,_therefore_the_differential_entropy_can_be_negative_(as_it_is_for_the_beta_distribution)._What_really_matters_is_the_relative_value_of_entropy. Given_two_beta_distributed_random_variables,_''X''1_~_Beta(''α'',_''β'')_and_''X''2_~_Beta(''α''′,_''β''′),_the_cross_entropy_is_(measured_in_nats)
:\begin H(X_1,X_2)_&=_\int_0^1_-_f(x;\alpha,\beta)_\ln_(f(x;\alpha',\beta'))_\,dx_\\_pt&=_\ln_\left(\Beta(\alpha',\beta')\right)-(\alpha'-1)\psi(\alpha)-(\beta'-1)\psi(\beta)+(\alpha'+\beta'-2)\psi(\alpha+\beta). \end The_cross_entropy_has_been_used_as_an_error_metric_to_measure_the_distance_between_two_hypotheses.
__Its_absolute_value_is_minimum_when_the_two_distributions_are_identical._It_is_the_information_measure_most_closely_related_to_the_log_maximum_likelihood_(see_section_on_"Parameter_estimation._Maximum_likelihood_estimation")). The_relative_entropy,_or_Kullback–Leibler_divergence_''D''KL(''X''1_, , _''X''2),_is_a_measure_of_the_inefficiency_of_assuming_that_the_distribution_is_''X''2_~_Beta(''α''′,_''β''′)__when_the_distribution_is_really_''X''1_~_Beta(''α'',_''β'')._It_is_defined_as_follows_(measured_in_nats). :\begin D_(X_1, , X_2)_&=_\int_0^1_f(x;\alpha,\beta)_\ln_\left_(\frac_\right_)_\,_dx_\\_pt&=_\left_(\int_0^1_f(x;\alpha,\beta)_\ln_(f(x;\alpha,\beta))_\,dx_\right_)-_\left_(\int_0^1_f(x;\alpha,\beta)_\ln_(f(x;\alpha',\beta'))_\,_dx_\right_)\\_pt&=_-h(X_1)_+_H(X_1,X_2)\\_pt&=_\ln\left(\frac\right)+(\alpha-\alpha')\psi(\alpha)+(\beta-\beta')\psi(\beta)+(\alpha'-\alpha+\beta'-\beta)\psi_(\alpha_+_\beta). \end_ The_relative_entropy,_or_Kullback–Leibler_divergence,_is_always_non-negative.__A_few_numerical_examples_follow: *''X''1_~_Beta(1,_1)_and_''X''2_~_Beta(3,_3);_''D''KL(''X''1_, , _''X''2)_=_0.598803;_''D''KL(''X''2_, , _''X''1)_=_0.267864;_''h''(''X''1)_=_0;_''h''(''X''2)_=_−0.267864 *''X''1_~_Beta(3,_0.5)_and_''X''2_~_Beta(0.5,_3);_''D''KL(''X''1_, , _''X''2)_=_7.21574;_''D''KL(''X''2_, , _''X''1)_=_7.21574;_''h''(''X''1)_=_−1.10805;_''h''(''X''2)_=_−1.10805. The_Kullback–Leibler_divergence_is_not_symmetric_''D''KL(''X''1_, , _''X''2)_≠_''D''KL(''X''2_, , _''X''1)__for_the_case_in_which_the_individual_beta_distributions_Beta(1,_1)_and_Beta(3,_3)_are_symmetric,_but_have_different_entropies_''h''(''X''1)_≠_''h''(''X''2)._The_value_of_the_Kullback_divergence_depends_on_the_direction_traveled:_whether_going_from_a_higher_(differential)_entropy_to_a_lower_(differential)_entropy_or_the_other_way_around._In_the_numerical_example_above,_the_Kullback_divergence_measures_the_inefficiency_of_assuming_that_the_distribution_is_(bell-shaped)_Beta(3,_3),_rather_than_(uniform)_Beta(1,_1)._The_"h"_entropy_of_Beta(1,_1)_is_higher_than_the_"h"_entropy_of_Beta(3,_3)_because_the_uniform_distribution_Beta(1,_1)_has_a_maximum_amount_of_disorder._The_Kullback_divergence_is_more_than_two_times_higher_(0.598803_instead_of_0.267864)_when_measured_in_the_direction_of_decreasing_entropy:_the_direction_that_assumes_that_the_(uniform)_Beta(1,_1)_distribution_is_(bell-shaped)_Beta(3,_3)_rather_than_the_other_way_around._In_this_restricted_sense,_the_Kullback_divergence_is_consistent_with_the_second_law_of_thermodynamics. The_Kullback–Leibler_divergence_is_symmetric_''D''KL(''X''1_, , _''X''2)_=_''D''KL(''X''2_, , _''X''1)_for_the_skewed_cases_Beta(3,_0.5)_and_Beta(0.5,_3)_that_have_equal_differential_entropy_''h''(''X''1)_=_''h''(''X''2). The_symmetry_condition: :D_(X_1, , X_2)_=_D_(X_2, , X_1),\texth(X_1)_=_h(X_2),\text\alpha_\neq_\beta follows_from_the_above_definitions_and_the_mirror-symmetry_''f''(''x'';_''α'',_''β'')_=_''f''(1−''x'';_''α'',_''β'')_enjoyed_by_the_beta_distribution.


_Relationships_between_statistical_measures


_Mean,_mode_and_median_relationship

If_1_<_α_<_β_then_mode_≤_median_≤_mean.Kerman_J_(2011)_"A_closed-form_approximation_for_the_median_of_the_beta_distribution"._
_Expressing_the_mode_(only_for_α,_β_>_1),_and_the_mean_in_terms_of_α_and_β: :__\frac_\le_\text_\le_\frac_, If_1_<_β_<_α_then_the_order_of_the_inequalities_are_reversed._For_α,_β_>_1_the_absolute_distance_between_the_mean_and_the_median_is_less_than_5%_of_the_distance_between_the_maximum_and_minimum_values_of_''x''._On_the_other_hand,_the_absolute_distance_between_the_mean_and_the_mode_can_reach_50%_of_the_distance_between_the_maximum_and_minimum_values_of_''x'',_for_the_(Pathological_(mathematics), pathological)_case_of_α_=_1_and_β_=_1,_for_which_values_the_beta_distribution_approaches_the_uniform_distribution_and_the_information_entropy, differential_entropy_approaches_its_Maxima_and_minima, maximum_value,_and_hence_maximum_"disorder". For_example,_for_α_=_1.0001_and_β_=_1.00000001: *_mode___=_0.9999;___PDF(mode)_=_1.00010 *_mean___=_0.500025;_PDF(mean)_=_1.00003 *_median_=_0.500035;_PDF(median)_=_1.00003 *_mean_−_mode___=_−0.499875 *_mean_−_median_=_−9.65538_×_10−6 where_PDF_stands_for_the_value_of_the_probability_density_function_ In_probability_theory,_a_probability_density_function_(PDF),_or_density_of_a_continuous_random_variable,_is_a__function_whose_value_at_any_given_sample_(or_point)_in_the__sample_space_(the_set_of_possible_values_taken_by_the_random_variable)_ca_...
.


_Mean,_geometric_mean_and_harmonic_mean_relationship

It_is_known_from_the_inequality_of_arithmetic_and_geometric_means_that_the_geometric_mean_is_lower_than_the_mean.__Similarly,_the_harmonic_mean_is_lower_than_the_geometric_mean.__The_accompanying_plot_shows_that_for_α_=_β,_both_the_mean_and_the_median_are_exactly_equal_to_1/2,_regardless_of_the_value_of_α_=_β,_and_the_mode_is_also_equal_to_1/2_for_α_=_β_>_1,_however_the_geometric_and_harmonic_means_are_lower_than_1/2_and_they_only_approach_this_value_asymptotically_as_α_=_β_→_∞.


_Kurtosis_bounded_by_the_square_of_the_skewness

As_remarked_by_William_Feller, Feller,_in_the_Pearson_distribution, Pearson_system_the_beta_probability_density_appears_as_Pearson_distribution, type_I_(any_difference_between_the_beta_distribution_and_Pearson's_type_I_distribution_is_only_superficial_and_it_makes_no_difference_for_the_following_discussion_regarding_the_relationship_between_kurtosis_and_skewness)._Karl_Pearson_showed,_in_Plate_1_of_his_paper_
__published_in_1916,__a_graph_with_the_kurtosis_as_the_vertical_axis_(ordinate)_and_the_square_of_the_skewness_ In_probability_theory_and_statistics,_skewness_is_a_measure_of_the_asymmetry_of_the_probability_distribution_of_a__real-valued_random_variable_about_its_mean._The_skewness_value_can_be_positive,_zero,_negative,_or_undefined. For_a_unimodal__...
_as_the_horizontal_axis_(abscissa),_in_which_a_number_of_distributions_were_displayed.
The region occupied by the beta distribution is bounded by the following two lines in the (skewness², kurtosis) plane, or the (skewness², excess kurtosis) plane:

:(\text{skewness})^2+1 < \text{kurtosis} < \frac{3}{2}(\text{skewness})^2 + 3

or, equivalently,

:(\text{skewness})^2-2 < \text{excess kurtosis} < \frac{3}{2}(\text{skewness})^2

At a time when there were no powerful digital computers, Karl Pearson accurately computed further boundaries, for example, separating the "U-shaped" from the "J-shaped" distributions. The lower boundary line (excess kurtosis + 2 − skewness² = 0) is produced by skewed "U-shaped" beta distributions with both values of shape parameters α and β close to zero. The upper boundary line (excess kurtosis − (3/2) skewness² = 0) is produced by extremely skewed distributions with very large values of one of the parameters and very small values of the other parameter. Karl Pearson showed that this upper boundary line (excess kurtosis − (3/2) skewness² = 0) is also the intersection with Pearson's distribution III, which has unlimited support in one direction (towards positive infinity), and can be bell-shaped or J-shaped. His son, Egon Pearson, showed that the region (in the kurtosis/squared-skewness plane) occupied by the beta distribution (equivalently, Pearson's distribution I) as it approaches this boundary (excess kurtosis − (3/2) skewness² = 0) is shared with the noncentral chi-squared distribution. Karl Pearson (Pearson 1895, pp. 357, 360, 373–376) also showed that the gamma distribution is a Pearson type III distribution. Hence this boundary line for Pearson's type III distribution is known as the gamma line. (This can be shown from the fact that the excess kurtosis of the gamma distribution is 6/''k'' and the square of the skewness is 4/''k'', hence (excess kurtosis − (3/2) skewness² = 0) is identically satisfied by the gamma distribution regardless of the value of the parameter ''k''.) Pearson later noted that the chi-squared distribution is a special case of Pearson's type III and also shares this boundary line (as is apparent from the fact that for the chi-squared distribution the excess kurtosis is 12/''k'' and the square of the skewness is 8/''k'', hence (excess kurtosis − (3/2) skewness² = 0) is identically satisfied regardless of the value of the parameter ''k''). This is to be expected, since the chi-squared distribution ''X'' ~ χ²(''k'') is a special case of the gamma distribution, with parametrization X ~ Γ(k/2, 1/2) where k is a positive integer that specifies the "number of degrees of freedom" of the chi-squared distribution.

An example of a beta distribution near the upper boundary (excess kurtosis − (3/2) skewness² = 0) is given by α = 0.1, β = 1000, for which the ratio (excess kurtosis)/(skewness²) = 1.49835 approaches the upper limit of 1.5 from below. An example of a beta distribution near the lower boundary (excess kurtosis + 2 − skewness² = 0) is given by α = 0.0001, β = 0.1, for which values the expression (excess kurtosis + 2)/(skewness²) = 1.01621 approaches the lower limit of 1 from above. In the infinitesimal limit for both α and β approaching zero symmetrically, the excess kurtosis reaches its minimum value of −2. This minimum value occurs at the point at which the lower boundary line intersects the vertical axis (ordinate). (However, in Pearson's original chart, the ordinate is kurtosis, instead of excess kurtosis, and it increases downwards rather than upwards.)

Values for the skewness and excess kurtosis below the lower boundary (excess kurtosis + 2 − skewness² = 0) cannot occur for any distribution, and hence Karl Pearson appropriately called the region below this boundary the "impossible region". The boundary for this "impossible region" is determined by (symmetric or skewed) bimodal "U"-shaped distributions for which the parameters α and β approach zero and hence all the probability density is concentrated at the ends: ''x'' = 0, 1 with practically nothing in between them. Since for α ≈ β ≈ 0 the probability density is concentrated at the two ends ''x'' = 0 and ''x'' = 1, this "impossible boundary" is determined by a Bernoulli distribution, where the two only possible outcomes occur with respective probabilities ''p'' and ''q'' = 1 − ''p''. For cases approaching this limit boundary with symmetry α = β, skewness ≈ 0, excess kurtosis ≈ −2 (this is the lowest excess kurtosis possible for any distribution), and the probabilities are ''p'' ≈ ''q'' ≈ 1/2. For cases approaching this limit boundary with skewness, excess kurtosis ≈ −2 + skewness², and the probability density is concentrated more at one end than the other end (with practically nothing in between), with probabilities p = \tfrac{1}{2}\left(1+\frac{\text{skewness}}{\sqrt{4+\text{skewness}^2}}\right) at the left end ''x'' = 0 and q = 1-p = \tfrac{1}{2}\left(1-\frac{\text{skewness}}{\sqrt{4+\text{skewness}^2}}\right) at the right end ''x'' = 1.
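These two boundary lines are easy to check numerically. The following minimal sketch (not part of the original exposition; it assumes SciPy is available and reuses the example parameter values quoted above) computes the skewness and excess kurtosis of a few beta distributions and verifies that each point falls strictly between the lower "impossible" boundary and the upper "gamma line":

```python
# Sketch: verify that the (skewness^2, excess kurtosis) point of a beta distribution
# lies between the "impossible boundary" and the "gamma line".
from scipy.stats import beta

def boundary_check(a, b):
    mean, var, skew, ex_kurt = beta.stats(a, b, moments='mvsk')
    lower = skew**2 - 2        # impossible boundary: excess kurtosis + 2 - skewness^2 = 0
    upper = 1.5 * skew**2      # gamma line: excess kurtosis - (3/2) skewness^2 = 0
    return lower, ex_kurt, upper

for a, b in [(0.1, 1000), (0.0001, 0.1), (2, 5)]:
    lo, k, hi = boundary_check(a, b)
    print(f"alpha={a}, beta={b}:  {lo:.5f} < {k:.5f} < {hi:.5f}")
```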


_Symmetry

All statements are conditional on α, β > 0:

* Probability density function reflection symmetry
::f(x;\alpha,\beta) = f(1-x;\beta,\alpha)
* Cumulative distribution function reflection symmetry plus unitary translation
::F(x;\alpha,\beta) = I_x(\alpha,\beta) = 1- F(1- x;\beta,\alpha) = 1 - I_{1-x}(\beta,\alpha)
* Mode reflection symmetry plus unitary translation
::\operatorname{mode}(\Beta(\alpha, \beta))= 1-\operatorname{mode}(\Beta(\beta, \alpha)),\text{ if }\Beta(\beta, \alpha)\ne \Beta(1,1)
* Median reflection symmetry plus unitary translation
::\operatorname{median} (\Beta(\alpha, \beta) )= 1 - \operatorname{median} (\Beta(\beta, \alpha))
* Mean reflection symmetry plus unitary translation
::\mu (\Beta(\alpha, \beta) )= 1 - \mu (\Beta(\beta, \alpha) )
* Geometric means: each is individually asymmetric; the following symmetry applies between the geometric mean based on ''X'' and the geometric mean based on its reflection (1−''X'')
::G_X (\Beta(\alpha, \beta) )=G_{(1-X)}(\Beta(\beta, \alpha) )
* Harmonic means: each is individually asymmetric; the following symmetry applies between the harmonic mean based on ''X'' and the harmonic mean based on its reflection (1−''X'')
::H_X (\Beta(\alpha, \beta) )=H_{(1-X)}(\Beta(\beta, \alpha) ) \text{ if } \alpha, \beta > 1 .
* Variance symmetry
::\operatorname{var} (\Beta(\alpha, \beta) )=\operatorname{var} (\Beta(\beta, \alpha) )
* Geometric variances: each is individually asymmetric; the following symmetry applies between the log geometric variance based on ''X'' and the log geometric variance based on its reflection (1−''X'')
::\ln(\operatorname{var_{GX}} (\Beta(\alpha, \beta))) = \ln(\operatorname{var_{G(1-X)}}(\Beta(\beta, \alpha)))
* Geometric covariance symmetry
::\ln \operatorname{cov_{G X,(1-X)}}(\Beta(\alpha, \beta))=\ln \operatorname{cov_{G X,(1-X)}}(\Beta(\beta, \alpha))
* Mean absolute deviation around the mean symmetry
::\operatorname{E}[|X - E[X]|] (\Beta(\alpha, \beta))=\operatorname{E}[|X - E[X]|] (\Beta(\beta, \alpha))
* Skewness skew-symmetry
::\operatorname{skewness} (\Beta(\alpha, \beta) )= - \operatorname{skewness} (\Beta(\beta, \alpha) )
* Excess kurtosis symmetry
::\text{excess kurtosis} (\Beta(\alpha, \beta) )= \text{excess kurtosis} (\Beta(\beta, \alpha) )
* Characteristic function symmetry of Real part (with respect to the origin of variable "t")
:: \text{Re} [{}_1F_1(\alpha; \alpha+\beta; it) ] = \text{Re} [ {}_1F_1(\alpha; \alpha+\beta; - it)]
* Characteristic function skew-symmetry of Imaginary part (with respect to the origin of variable "t")
:: \text{Im} [{}_1F_1(\alpha; \alpha+\beta; it) ] = - \text{Im} [ {}_1F_1(\alpha; \alpha+\beta; - it) ]
* Characteristic function symmetry of Absolute value (with respect to the origin of variable "t")
:: \text{Abs} [ {}_1F_1(\alpha; \alpha+\beta; it) ] = \text{Abs} [ {}_1F_1(\alpha; \alpha+\beta; - it) ]
* Differential entropy symmetry
::h(\Beta(\alpha, \beta) )= h(\Beta(\beta, \alpha) )
* Relative entropy (also called Kullback–Leibler divergence) symmetry
::D_{\mathrm{KL}}(X_1\|X_2) = D_{\mathrm{KL}}(X_2\|X_1), \text{ if } h(X_1) = h(X_2), \text{ for } \alpha \neq \beta
* Fisher information matrix symmetry
::\mathcal{I}_{\alpha,\alpha}(\Beta(\alpha, \beta)) = \mathcal{I}_{\beta,\beta}(\Beta(\beta, \alpha)), \quad \mathcal{I}_{\alpha,\beta}(\Beta(\alpha, \beta)) = \mathcal{I}_{\alpha,\beta}(\Beta(\beta, \alpha))
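The reflection relations above can be spot-checked numerically. A minimal sketch, assuming SciPy and arbitrarily chosen shape parameters:

```python
import numpy as np
from scipy.stats import beta

a, b = 2.5, 0.7          # arbitrary shape parameters for illustration
x = np.linspace(0.01, 0.99, 9)

# density reflection symmetry: f(x; a, b) = f(1 - x; b, a)
assert np.allclose(beta.pdf(x, a, b), beta.pdf(1 - x, b, a))

# variance symmetry and skewness skew-symmetry
va, sa = beta.stats(a, b, moments='vs')
vb, sb = beta.stats(b, a, moments='vs')
assert np.isclose(va, vb) and np.isclose(sa, -sb)

# differential entropy symmetry
assert np.isclose(beta.entropy(a, b), beta.entropy(b, a))
print("reflection symmetries verified")
```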


_Geometry_of_the_probability_density_function


_Inflection_points

For certain values of the shape parameters α and β, the probability density function has inflection points, at which the curvature changes sign. The position of these inflection points can be useful as a measure of the dispersion or spread of the distribution.

Defining the following quantity:

:\kappa =\frac{\sqrt{\frac{(\alpha-1)(\beta-1)}{\alpha+\beta-3}}}{\alpha+\beta-2}

points of inflection occur, depending on the value of the shape parameters α and β, as follows:

*(α > 2, β > 2) The distribution is bell-shaped (symmetric for α = β and skewed otherwise), with two inflection points, equidistant from the mode:
::x = \text{mode} \pm \kappa = \frac{(\alpha-1) \pm \sqrt{\frac{(\alpha-1)(\beta-1)}{\alpha+\beta-3}}}{\alpha+\beta-2}
* (α = 2, β > 2) The distribution is unimodal, positively skewed, right-tailed, with one inflection point, located to the right of the mode:
::x =\text{mode} + \kappa = \frac{2}{\beta}
* (α > 2, β = 2) The distribution is unimodal, negatively skewed, left-tailed, with one inflection point, located to the left of the mode:
::x = \text{mode} - \kappa = 1 - \frac{2}{\alpha}
* (1 < α < 2, β > 2, α+β > 2) The distribution is unimodal, positively skewed, right-tailed, with one inflection point, located to the right of the mode:
::x =\text{mode} + \kappa = \frac{(\alpha-1) + \sqrt{\frac{(\alpha-1)(\beta-1)}{\alpha+\beta-3}}}{\alpha+\beta-2}
*(0 < α < 1, 1 < β < 2) The distribution has a mode at the left end ''x'' = 0 and it is positively skewed, right-tailed. There is one inflection point, located to the right of the mode:
::x = \frac{(\alpha-1) + \sqrt{\frac{(\alpha-1)(\beta-1)}{\alpha+\beta-3}}}{\alpha+\beta-2}
*(α > 2, 1 < β < 2) The distribution is unimodal, negatively skewed, left-tailed, with one inflection point, located to the left of the mode:
::x =\text{mode} - \kappa = \frac{(\alpha-1) - \sqrt{\frac{(\alpha-1)(\beta-1)}{\alpha+\beta-3}}}{\alpha+\beta-2}
*(1 < α < 2, 0 < β < 1) The distribution has a mode at the right end ''x'' = 1 and it is negatively skewed, left-tailed. There is one inflection point, located to the left of the mode:
::x = \frac{(\alpha-1) - \sqrt{\frac{(\alpha-1)(\beta-1)}{\alpha+\beta-3}}}{\alpha+\beta-2}

There are no inflection points in the remaining (symmetric and skewed) regions: U-shaped (α, β < 1), upside-down-U-shaped (1 < α < 2, 1 < β < 2), reverse-J-shaped (α < 1, β > 2) or J-shaped (α > 2, β < 1).

The accompanying plots show the inflection point locations (shown vertically, ranging from 0 to 1) versus α and β (the horizontal axes ranging from 0 to 5). There are large cuts at surfaces intersecting the lines α = 1, β = 1, α = 2, and β = 2 because at these values the beta distribution changes from 2 modes, to 1 mode, to no mode.
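The closed-form inflection points can be cross-checked against a direct numerical root search on the curvature of the density. A sketch for the bell-shaped case (α, β > 2; the particular values 4 and 6 are arbitrary), assuming SciPy:

```python
import numpy as np
from scipy.stats import beta
from scipy.optimize import brentq

a, b = 4.0, 6.0                      # bell-shaped case: alpha > 2, beta > 2
mode = (a - 1) / (a + b - 2)
kappa = np.sqrt((a - 1) * (b - 1) / (a + b - 3)) / (a + b - 2)

# finite-difference second derivative of the pdf (curvature, up to a positive factor)
def d2_pdf(x, h=1e-4):
    return (beta.pdf(x + h, a, b) - 2 * beta.pdf(x, a, b) + beta.pdf(x - h, a, b)) / h**2

# the curvature changes sign once on each side of the mode
left = brentq(d2_pdf, 1e-3, mode)
right = brentq(d2_pdf, mode, 1 - 1e-3)
print(left, mode - kappa)    # should agree to several decimals
print(right, mode + kappa)   # should agree to several decimals
```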


_Shapes

The beta density function can take a wide variety of different shapes depending on the values of the two parameters ''α'' and ''β''. The ability of the beta distribution to take this great diversity of shapes (using only two parameters) is partly responsible for its wide application in modeling actual measurements:


_=Symmetric_(''α''_=_''β'')

=
* the density function is symmetric about 1/2 (blue & teal plots).
* median = mean = 1/2.
* skewness = 0.
* variance = 1/(4(2α + 1))
* α = β < 1
**U-shaped (blue plot).
**bimodal: left mode = 0, right mode = 1, anti-mode = 1/2
**1/12 < var(''X'') < 1/4
**−2 < excess kurtosis(''X'') < −6/5
** α = β = 1/2 is the arcsine distribution
*** var(''X'') = 1/8
***excess kurtosis(''X'') = −3/2
***CF = Rinc (t)
** α = β → 0 is a 2-point Bernoulli distribution with equal probability 1/2 at each Dirac delta function end ''x'' = 0 and ''x'' = 1 and zero probability everywhere else. A coin toss: one face of the coin being ''x'' = 0 and the other face being ''x'' = 1.
*** \lim_{\alpha = \beta \to 0} \operatorname{var}(X) = \tfrac{1}{4}
*** \lim_{\alpha = \beta \to 0} \operatorname{excess kurtosis}(X) = - 2 (a lower value than this is impossible for any distribution to reach)
*** The differential entropy approaches a minimum value of −∞
*α = β = 1
**the uniform [0, 1] distribution
**no mode
**var(''X'') = 1/12
**excess kurtosis(''X'') = −6/5
**The (negative anywhere else) differential entropy reaches its maximum value of zero
**CF = Sinc (t)
*''α'' = ''β'' > 1
**symmetric unimodal
** mode = 1/2.
**0 < var(''X'') < 1/12
**−6/5 < excess kurtosis(''X'') < 0
**''α'' = ''β'' = 3/2 is a semi-elliptic [0, 1] distribution, see: Wigner semicircle distribution
***var(''X'') = 1/16.
***excess kurtosis(''X'') = −1
***CF = 2 Jinc (t)
**''α'' = ''β'' = 2 is the parabolic [0, 1] distribution
***var(''X'') = 1/20
***excess kurtosis(''X'') = −6/7
***CF = 3 Tinc (t)
**''α'' = ''β'' > 2 is bell-shaped, with inflection points located to either side of the mode
***0 < var(''X'') < 1/20
***−6/7 < excess kurtosis(''X'') < 0
**''α'' = ''β'' → ∞ is a 1-point Degenerate distribution with a Dirac delta function spike at the midpoint ''x'' = 1/2 with probability 1, and zero probability everywhere else. There is 100% probability (absolute certainty) concentrated at the single point ''x'' = 1/2.
*** \lim_{\alpha = \beta \to \infty} \operatorname{var}(X) = 0
*** \lim_{\alpha = \beta \to \infty} \operatorname{excess kurtosis}(X) = 0
***The differential entropy approaches a minimum value of −∞


_=Skewed_(''α''_≠_''β'')

=
The density function is skewed. An interchange of parameter values yields the mirror image (the reverse) of the initial curve; some more specific cases:
*''α'' < 1, ''β'' < 1
** U-shaped
** Positive skew for α < β, negative skew for α > β.
** bimodal: left mode = 0, right mode = 1, anti-mode = \tfrac{\alpha-1}{\alpha+\beta-2}
** 0 < median < 1.
** 0 < var(''X'') < 1/4
*α > 1, β > 1
** unimodal (magenta & cyan plots),
**Positive skew for α < β, negative skew for α > β.
**\text{mode}= \tfrac{\alpha-1}{\alpha+\beta-2}
** 0 < median < 1
** 0 < var(''X'') < 1/12
*α < 1, β ≥ 1
**reverse J-shaped with a right tail,
**positively skewed,
**strictly decreasing, convex
** mode = 0
** 0 < median < 1/2.
** 0 < \operatorname{var}(X) < \tfrac{-11+5\sqrt{5}}{2} (maximum variance occurs for \alpha=\tfrac{\sqrt{5}-1}{2}, \beta=1, or α = Φ the golden ratio conjugate)
*α ≥ 1, β < 1
**J-shaped with a left tail,
**negatively skewed,
**strictly increasing, convex
** mode = 1
** 1/2 < median < 1
** 0 < \operatorname{var}(X) < \tfrac{-11+5\sqrt{5}}{2} (maximum variance occurs for \alpha=1, \beta=\tfrac{\sqrt{5}-1}{2}, or β = Φ the golden ratio conjugate)
*α = 1, β > 1
**positively skewed,
**strictly decreasing (red plot),
**a reversed (mirror-image) power function [0,1] distribution
** mean = 1 / (β + 1)
** median = 1 − 1/2^{1/β}
** mode = 0
**α = 1, 1 < β < 2
***concave
*** 1-\tfrac{1}{\sqrt{2}}< \text{median} < \tfrac{1}{2}
*** 1/18 < var(''X'') < 1/12.
**α = 1, β = 2
***a straight line with slope −2, the right-triangular distribution with right angle at the left end, at ''x'' = 0
*** \text{median}=1-\tfrac{1}{\sqrt{2}}
*** var(''X'') = 1/18
**α = 1, β > 2
***reverse J-shaped with a right tail,
***convex
*** 0 < \text{median} < 1-\tfrac{1}{\sqrt{2}}
*** 0 < var(''X'') < 1/18
*α > 1, β = 1
**negatively skewed,
**strictly increasing (green plot),
**the power function [0, 1] distribution
** mean = α / (α + 1)
** median = 1/2^{1/α}
** mode = 1
**2 > α > 1, β = 1
***concave
*** \tfrac{1}{2} < \text{median} < \tfrac{1}{\sqrt{2}}
*** 1/18 < var(''X'') < 1/12
** α = 2, β = 1
***a straight line with slope +2, the right-triangular distribution with right angle at the right end, at ''x'' = 1
*** \text{median}=\tfrac{1}{\sqrt{2}}
*** var(''X'') = 1/18
**α > 2, β = 1
***J-shaped with a left tail, convex
***\tfrac{1}{\sqrt{2}} < \text{median} < 1
*** 0 < var(''X'') < 1/18


_Related_distributions


_Transformations

* If ''X'' ~ Beta(''α'', ''β'') then 1 − ''X'' ~ Beta(''β'', ''α''), mirror-image symmetry
* If ''X'' ~ Beta(''α'', ''β'') then \tfrac{X}{1-X} \sim {\beta'}(\alpha,\beta), the beta prime distribution, also called "beta distribution of the second kind".
* If ''X'' ~ Beta(''α'', ''β'') then \tfrac{1}{X} -1 \sim {\beta'}(\beta,\alpha).
* If ''X'' ~ Beta(''n''/2, ''m''/2) then \tfrac{mX}{n(1-X)} \sim F(n,m) (assuming ''n'' > 0 and ''m'' > 0), the Fisher–Snedecor F distribution.
* If X \sim \operatorname{Beta}\left(1+\lambda\tfrac{m-\min}{\max-\min}, 1 + \lambda\tfrac{\max-m}{\max-\min}\right) then min + ''X''(max − min) ~ PERT(min, max, ''m'', ''λ''), where ''PERT'' denotes a PERT distribution used in PERT analysis, and ''m'' = most likely value.Herrerías-Velasco, José Manuel and Herrerías-Pleguezuelo, Rafael and René van Dorp, Johan. (2011). Revisiting the PERT mean and variance. European Journal of Operational Research (210), p. 448–451. Traditionally ''λ'' = 4 in PERT analysis.
* If ''X'' ~ Beta(1, ''β'') then ''X'' ~ Kumaraswamy distribution with parameters (1, ''β'')
* If ''X'' ~ Beta(''α'', 1) then ''X'' ~ Kumaraswamy distribution with parameters (''α'', 1)
* If ''X'' ~ Beta(''α'', 1) then −ln(''X'') ~ Exponential(''α'')
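Several of these transformations can be verified by simulation. A minimal sketch (arbitrary shape parameters, Kolmogorov–Smirnov tests from SciPy):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a, b = 2.0, 5.0                       # arbitrary shape parameters
x = stats.beta.rvs(a, b, size=20000, random_state=rng)

# 1 - X ~ Beta(b, a)
print(stats.kstest(1 - x, stats.beta(b, a).cdf).pvalue)

# X / (1 - X) ~ BetaPrime(a, b)
print(stats.kstest(x / (1 - x), stats.betaprime(a, b).cdf).pvalue)

# -ln(X) ~ Exponential(a) when X ~ Beta(a, 1)
y = stats.beta.rvs(a, 1, size=20000, random_state=rng)
print(stats.kstest(-np.log(y), stats.expon(scale=1 / a).cdf).pvalue)
```

Large p-values indicate that the transformed samples are consistent with the stated target distributions.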


_Special_and_limiting_cases

* Beta(1, 1) ~ U(0, 1), the continuous uniform distribution.
* Beta(n, 1) ~ Maximum of ''n'' independent rvs. with U(0, 1), sometimes called a ''standard power function distribution'' with density ''n'' ''x''^{''n''−1} on that interval.
* Beta(1, n) ~ Minimum of ''n'' independent rvs. with U(0, 1).
* If ''X'' ~ Beta(3/2, 3/2) and ''r'' > 0 then 2''rX'' − ''r'' ~ Wigner semicircle distribution.
* Beta(1/2, 1/2) is equivalent to the arcsine distribution. This distribution is also the Jeffreys prior probability for the Bernoulli and binomial distributions. The arcsine probability density is a distribution that appears in several random-walk fundamental theorems. In a fair coin toss random walk, the probability for the time of the last visit to the origin is distributed as a (U-shaped) arcsine distribution. In a two-player fair-coin-toss game, a player is said to be in the lead if the random walk (that started at the origin) is above the origin. The most probable number of times that a given player will be in the lead, in a game of length 2''N'', is not ''N''. On the contrary, ''N'' is the least likely number of times that the player will be in the lead. The most likely number of times in the lead is 0 or 2''N'' (following the arcsine distribution).
* \lim_{n \to \infty} n \operatorname{Beta}(1,n) = \operatorname{Exponential}(1), the exponential distribution.
* \lim_{n \to \infty} n \operatorname{Beta}(k,n) = \operatorname{Gamma}(k,1), the gamma distribution.
* For large n, \operatorname{Beta}(\alpha n,\beta n) \to \mathcal{N}\left(\frac{\alpha}{\alpha+\beta},\frac{\alpha\beta}{(\alpha+\beta)^3}\frac{1}{n}\right), the normal distribution. More precisely, if X_n \sim \operatorname{Beta}(\alpha n,\beta n) then \sqrt{n}\left(X_n -\tfrac{\alpha}{\alpha+\beta}\right) converges in distribution to a normal distribution with mean 0 and variance \tfrac{\alpha\beta}{(\alpha+\beta)^3} as ''n'' increases.
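The order-statistic special cases in the first bullets can be illustrated by simulation; a minimal sketch, assuming SciPy and arbitrarily chosen sample sizes:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, m = 5, 100000                      # n uniforms per replication, m replications
u = rng.uniform(size=(m, n))

# Beta(n, 1) is the maximum of n independent U(0, 1) variables,
# Beta(1, n) is the minimum.
print(stats.kstest(u.max(axis=1), stats.beta(n, 1).cdf).pvalue)
print(stats.kstest(u.min(axis=1), stats.beta(1, n).cdf).pvalue)
```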


_Derived_from_other_distributions

* The ''k''th order statistic of a sample of size ''n'' from the uniform distribution is a beta random variable, ''U''(''k'') ~ Beta(''k'', ''n''+1−''k'').
* If ''X'' ~ Gamma(α, θ) and ''Y'' ~ Gamma(β, θ) are independent, then \tfrac{X}{X+Y} \sim \operatorname{Beta}(\alpha, \beta)\,.
* If X \sim \chi^2(\alpha)\, and Y \sim \chi^2(\beta)\, are independent, then \tfrac{X}{X+Y} \sim \operatorname{Beta}(\tfrac{\alpha}{2}, \tfrac{\beta}{2}).
* If ''X'' ~ U(0, 1) and ''α'' > 0 then ''X''^{1/''α''} ~ Beta(''α'', 1), the power function distribution.
* If ''X'' ~ Bin(''n'', ''p''), then the likelihood of ''p'' given ''k'' observed successes in ''n'' trials, normalized over ''p'' ∈ [0, 1], is a \operatorname{Beta}(\alpha, \beta) density with \alpha=k+1 and \beta=n-k+1.
* If ''X'' ~ Cauchy(0, 1) then \tfrac{1}{1+X^2} \sim \operatorname{Beta}\left(\tfrac12, \tfrac12\right)\,
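The gamma-ratio construction above is a standard way to generate beta variates; a minimal sketch with arbitrary parameters, assuming SciPy:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
alpha, beta_, theta = 2.3, 4.1, 1.7   # arbitrary parameters
gx = rng.gamma(alpha, theta, size=50000)
gy = rng.gamma(beta_, theta, size=50000)

# X / (X + Y) ~ Beta(alpha, beta) when X ~ Gamma(alpha, theta), Y ~ Gamma(beta, theta)
print(stats.kstest(gx / (gx + gy), stats.beta(alpha, beta_).cdf).pvalue)
```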


_Combination_with_other_distributions

* If ''X'' ~ Beta(''α'', ''β'') and ''Y'' ~ F(2''β'', 2''α'') then \Pr\left(X \leq \tfrac{\alpha}{\alpha+\beta x}\right) = \Pr(Y \geq x)\, for all ''x'' > 0.


_Compounding_with_other_distributions

* If ''p'' ~ Beta(α, β) and ''X'' ~ Bin(''k'', ''p'') then ''X'' ~ beta-binomial distribution
* If ''p'' ~ Beta(α, β) and ''X'' ~ NB(''r'', ''p'') then ''X'' ~ beta negative binomial distribution
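Compounding can be checked by comparing a two-stage simulation (draw ''p'' from the beta, then a count given ''p'') with the beta-binomial pmf; a minimal sketch with arbitrary parameters, assuming SciPy:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
a, b, k = 2.0, 3.0, 10               # arbitrary beta parameters and number of trials

p = rng.beta(a, b, size=200000)
x = rng.binomial(k, p)               # X | p ~ Bin(k, p) with p ~ Beta(a, b)

# empirical frequencies vs. the beta-binomial pmf (max absolute difference)
emp = np.bincount(x, minlength=k + 1) / x.size
print(np.max(np.abs(emp - stats.betabinom.pmf(np.arange(k + 1), k, a, b))))
```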


_Generalisations

* The generalization to multiple variables, i.e. a multivariate beta distribution, is called a Dirichlet distribution. Univariate marginals of the Dirichlet distribution have a beta distribution. The beta distribution is conjugate to the binomial and Bernoulli distributions in exactly the same way as the Dirichlet distribution is conjugate to the multinomial distribution and categorical distribution.
* The Pearson type I distribution is identical to the beta distribution (except for arbitrary shifting and re-scaling that can also be accomplished with the four parameter parametrization of the beta distribution).
* The beta distribution is the special case of the noncentral beta distribution where \lambda = 0: \operatorname{Beta}(\alpha, \beta) = \operatorname{Beta}(\alpha,\beta,0).
* The generalized beta distribution is a five-parameter distribution family which has the beta distribution as a special case.
* The matrix variate beta distribution is a distribution for positive-definite matrices.


__Statistical_inference_


_Parameter_estimation


_Method_of_moments


_=Two_unknown_parameters

=
Two unknown parameters ((\hat{\alpha}, \hat{\beta}) of a beta distribution supported in the [0, 1] interval) can be estimated, using the method of moments, with the first two moments (sample mean and sample variance) as follows. Let:

: \text{sample mean}=\bar{x} = \frac{1}{N}\sum_{i=1}^N X_i

be the sample mean estimate and

: \text{sample variance} =\bar{v} = \frac{1}{N-1}\sum_{i=1}^N (X_i - \bar{x})^2

be the sample variance estimate. The method-of-moments estimates of the parameters are

:\hat{\alpha} = \bar{x} \left(\frac{\bar{x} (1 - \bar{x})}{\bar{v}} - 1 \right), \text{ if }\bar{v} <\bar{x}(1 - \bar{x}),
: \hat{\beta} = (1-\bar{x}) \left(\frac{\bar{x} (1 - \bar{x})}{\bar{v}} - 1 \right), \text{ if }\bar{v}<\bar{x}(1 - \bar{x}).

When the distribution is required over a known interval other than [0, 1] with random variable ''X'', say [''a'', ''c''] with random variable ''Y'', then replace \bar{x} with \frac{\bar{y}-a}{c-a}, and \bar{v} with \frac{\bar{v}_Y}{(c-a)^2} in the above couple of equations for the shape parameters (see the "Alternative parametrizations, four parameters" section below), where:

: \text{sample mean}=\bar{y} = \frac{1}{N}\sum_{i=1}^N Y_i
: \text{sample variance} = \bar{v}_Y = \frac{1}{N-1}\sum_{i=1}^N (Y_i - \bar{y})^2
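A direct implementation of these moment estimators (the helper name is illustrative, not standard), checked on simulated data:

```python
import numpy as np

def beta_method_of_moments(x):
    """Moment estimates of (alpha, beta) for data supported on [0, 1]."""
    xbar = np.mean(x)
    vbar = np.var(x, ddof=1)
    if vbar >= xbar * (1 - xbar):
        raise ValueError("sample variance too large for a beta fit")
    common = xbar * (1 - xbar) / vbar - 1
    return xbar * common, (1 - xbar) * common

rng = np.random.default_rng(4)
x = rng.beta(2.0, 5.0, size=10000)       # true alpha = 2, beta = 5
print(beta_method_of_moments(x))          # estimates should be close to (2, 5)
```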


_=Four_unknown_parameters

=
All four parameters (\hat{\alpha}, \hat{\beta}, \hat{a}, \hat{c}, of a beta distribution supported in the [''a'', ''c''] interval, see section "Alternative parametrizations, Four parameters") can be estimated, using the method of moments developed by Karl Pearson, by equating sample and population values of the first four central moments (mean, variance, skewness and excess kurtosis). The excess kurtosis was expressed in terms of the square of the skewness, and the sample size ν = α + β (see previous section "Kurtosis"), as follows:

:\text{excess kurtosis} =\frac{6}{3 + \nu}\left(\frac{(2 + \nu)}{4} (\text{skewness})^2 - 1\right)\text{ if (skewness)}^2-2< \text{excess kurtosis}< \tfrac{3}{2} (\text{skewness})^2

One can use this equation to solve for the sample size ν = α + β in terms of the square of the skewness and the excess kurtosis as follows:

:\hat{\nu} = \hat{\alpha} + \hat{\beta} = 3\frac{(\text{sample excess kurtosis}) +2 - (\text{sample skewness})^2}{\frac{3}{2} (\text{sample skewness})^2 - \text{(sample excess kurtosis)}}
:\text{ if (sample skewness)}^2-2< \text{sample excess kurtosis}< \tfrac{3}{2} (\text{sample skewness})^2

This is the ratio (multiplied by a factor of 3) between the previously derived limit boundaries for the beta distribution in a space (as originally done by Karl Pearson) defined with coordinates of the square of the skewness in one axis and the excess kurtosis in the other axis (see the preceding section).

The case of zero skewness can be immediately solved because for zero skewness, α = β and hence ν = 2α = 2β, therefore α = β = ν/2:

: \hat{\alpha} = \hat{\beta} = \frac{\hat{\nu}}{2}= \frac{\frac{3}{2}(\text{sample excess kurtosis}) +3}{- \text{(sample excess kurtosis)}}
: \text{ if sample skewness}= 0 \text{ and } -2<\text{sample excess kurtosis}<0

(Excess kurtosis is negative for the beta distribution with zero skewness, ranging from −2 to 0, so that \hat{\nu} - and therefore the sample shape parameters - is positive, ranging from zero when the shape parameters approach zero and the excess kurtosis approaches −2, to infinity when the shape parameters approach infinity and the excess kurtosis approaches zero.)

For non-zero sample skewness one needs to solve a system of two coupled equations. Since the skewness and the excess kurtosis are independent of the parameters \hat{a}, \hat{c}, the parameters \hat{\alpha}, \hat{\beta} can be uniquely determined from the sample skewness and the sample excess kurtosis, by solving the coupled equations with two known variables (sample skewness and sample excess kurtosis) and two unknowns (the shape parameters):

:(\text{skewness})^2 = \frac{4(\beta-\alpha)^2 (1 + \alpha + \beta)}{\alpha \beta (2 + \alpha + \beta)^2}
:\text{excess kurtosis} =\frac{6}{3 + \alpha + \beta}\left(\frac{(2 + \alpha + \beta)}{4} (\text{skewness})^2 - 1\right)
:\text{ if (skewness)}^2-2< \text{excess kurtosis}< \tfrac{3}{2}(\text{skewness})^2

resulting in the following solution:

: \hat{\alpha}, \hat{\beta} = \frac{\hat{\nu}}{2} \left (1 \pm \frac{1}{\sqrt{1+ \frac{16 (\hat{\nu} + 1)}{(\hat{\nu}+ 2)^2(\text{sample skewness})^2}}} \right )
: \text{ if sample skewness}\neq 0 \text{ and } (\text{sample skewness})^2-2< \text{sample excess kurtosis}< \tfrac{3}{2} (\text{sample skewness})^2

where one should take the solutions as follows: \hat{\alpha}>\hat{\beta} for (negative) sample skewness < 0, and \hat{\alpha}<\hat{\beta} for (positive) sample skewness > 0.

The accompanying plot shows these two solutions as surfaces in a space with horizontal axes of (sample excess kurtosis) and (sample squared skewness) and the shape parameters as the vertical axis. The surfaces are constrained by the condition that the sample excess kurtosis must be bounded by the sample squared skewness as stipulated in the above equation. The two surfaces meet at the right edge defined by zero skewness. Along this right edge, both parameters are equal and the distribution is symmetric U-shaped for α = β < 1, uniform for α = β = 1, upside-down-U-shaped for 1 < α = β < 2 and bell-shaped for α = β > 2. The surfaces also meet at the front (lower) edge defined by the "impossible boundary" line (excess kurtosis + 2 − skewness² = 0). Along this front (lower) boundary both shape parameters approach zero, and the probability density is concentrated more at one end than the other end (with practically nothing in between), with probabilities p=\tfrac{1}{2}\left(1+\frac{\text{skewness}}{\sqrt{4+\text{skewness}^2}}\right) at the left end ''x'' = 0 and q = 1-p = \tfrac{1}{2}\left(1-\frac{\text{skewness}}{\sqrt{4+\text{skewness}^2}}\right) at the right end ''x'' = 1. The two surfaces become further apart towards the rear edge. At this rear edge the surface parameters are quite different from each other. As remarked, for example, by Bowman and Shenton, sampling in the neighborhood of the line (sample excess kurtosis − (3/2)(sample skewness)² = 0) (the just-J-shaped portion of the rear edge where blue meets beige) "is dangerously near to chaos", because at that line the denominator of the expression above for the estimate ν = α + β becomes zero and hence ν approaches infinity as that line is approached. Bowman and Shenton write that "the higher moment parameters (kurtosis and skewness) are extremely fragile (near that line). However, the mean and standard deviation are fairly reliable." Therefore, the problem arises for four-parameter estimation of very skewed distributions such that the excess kurtosis approaches (3/2) times the square of the skewness. This boundary line is produced by extremely skewed distributions with very large values of one of the parameters and very small values of the other parameter. See the relevant section for a numerical example and further comments about this rear edge boundary line (sample excess kurtosis − (3/2)(sample skewness)² = 0). As remarked by Karl Pearson himself, this issue may not be of much practical importance, as this trouble arises only for very skewed J-shaped (or mirror-image J-shaped) distributions with very different values of shape parameters that are unlikely to occur much in practice. The usual skewed-bell-shape distributions that occur in practice do not have this parameter estimation problem.

The remaining two parameters \hat{a}, \hat{c} can be determined using the sample mean and the sample variance using a variety of equations. One alternative is to calculate the support interval range (\hat{c}-\hat{a}) based on the sample variance and the sample kurtosis. For this purpose one can solve, in terms of the range (\hat{c}-\hat{a}), the equation expressing the excess kurtosis in terms of the sample variance and the sample size ν (see the sections titled "Kurtosis" and "Alternative parametrizations, four parameters"):

:\text{excess kurtosis} =\frac{6}{(2 + \hat{\nu})(3 + \hat{\nu})}\bigg(\frac{(\hat{c}- \hat{a})^2}{\text{(sample variance)}} - 6 - 5 \hat{\nu} \bigg)

to obtain:

: (\hat{c}- \hat{a}) = \sqrt{\text{(sample variance)}}\sqrt{6+5\hat{\nu}+\frac{(2+\hat{\nu})(3+\hat{\nu})}{6}\text{(sample excess kurtosis)}}

Another alternative is to calculate the support interval range (\hat{c}-\hat{a}) based on the sample variance and the sample skewness. For this purpose one can solve, in terms of the range (\hat{c}-\hat{a}), the equation expressing the squared skewness in terms of the sample variance and the sample size ν (see the sections titled "Skewness" and "Alternative parametrizations, four parameters"):

:(\text{skewness})^2 = \frac{4}{(2+\hat{\nu})^2}\bigg(\frac{(\hat{c}- \hat{a})^2}{\text{(sample variance)}}-4(1+\hat{\nu})\bigg)

to obtain:

: (\hat{c}- \hat{a}) = \frac{\sqrt{\text{(sample variance)}}}{2}\sqrt{(2+\hat{\nu})^2(\text{sample skewness})^2+16(1+\hat{\nu})}

The remaining parameter can be determined from the sample mean and the previously obtained parameters (\hat{c}-\hat{a}), \hat{\alpha}, \hat{\nu} = \hat{\alpha}+\hat{\beta}:

:  \hat{a} = (\text{sample mean}) -  \left(\frac{\hat{\alpha}}{\hat{\nu}}\right)(\hat{c}-\hat{a})

and finally, \hat{c}= (\hat{c}- \hat{a}) + \hat{a}.

In the above formulas one may take, for example, as estimates of the sample moments:

:\begin{align}
\text{sample mean} &=\overline{y} = \frac{1}{N}\sum_{i=1}^N Y_i \\
\text{sample variance} &= \overline{v}_Y = \frac{1}{N-1}\sum_{i=1}^N (Y_i - \overline{y})^2 \\
\text{sample skewness} &= G_1 = \frac{N}{(N-1)(N-2)} \frac{\sum_{i=1}^N (Y_i-\overline{y})^3}{\overline{v}_Y^{\frac{3}{2}}} \\
\text{sample excess kurtosis} &= G_2 = \frac{N(N+1)}{(N-1)(N-2)(N-3)} \frac{\sum_{i=1}^N (Y_i-\overline{y})^4}{\overline{v}_Y^{2}} - \frac{3(N-1)^2}{(N-2)(N-3)}
\end{align}

The estimators ''G''1 for sample skewness and ''G''2 for sample kurtosis are used by DAP/SAS, PSPP/SPSS, and Excel. However, they are not used by BMDP and (according to Joanes and Gill) they were not used by MINITAB in 1998. Actually, Joanes and Gill in their 1998 study concluded that the skewness and kurtosis estimators used in BMDP and in MINITAB (at that time) had smaller variance and mean-squared error in normal samples, but the skewness and kurtosis estimators used in DAP/SAS and PSPP/SPSS, namely ''G''1 and ''G''2, had smaller mean-squared error in samples from a very skewed distribution. It is for this reason that "sample skewness", etc., have been spelled out in the above formulas, to make it explicit that the user should choose the best estimator according to the problem at hand, as the best estimator for skewness and kurtosis depends on the amount of skewness (as shown by Joanes and Gill).
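The chain of formulas above (ν from the skewness and excess kurtosis, the split into α and β, the range from the variance and kurtosis, and finally the location) can be collected into a short routine. The sketch below is illustrative only; it uses the ''G''1 and ''G''2 estimators via SciPy and assumes the sample actually satisfies the kurtosis/skewness constraints:

```python
import numpy as np
from scipy import stats

def beta4_method_of_moments(y):
    """Sketch of Pearson's four-parameter moment fit: returns (alpha, beta, a, c)."""
    mean = np.mean(y)
    var = np.var(y, ddof=1)
    g1 = stats.skew(y, bias=False)        # sample skewness G1
    g2 = stats.kurtosis(y, bias=False)    # sample excess kurtosis G2

    # sample size parameter nu = alpha + beta from skewness^2 and excess kurtosis
    nu = 3 * (g2 + 2 - g1**2) / (1.5 * g1**2 - g2)

    # split nu into alpha and beta (larger parameter on the side opposite the skew)
    root = 1 / np.sqrt(1 + 16 * (nu + 1) / ((nu + 2)**2 * g1**2))
    alpha, beta = nu / 2 * (1 + root), nu / 2 * (1 - root)
    if g1 > 0:
        alpha, beta = beta, alpha

    # support range from variance and excess kurtosis, then location from the mean
    span = np.sqrt(var) * np.sqrt(6 + 5 * nu + (2 + nu) * (3 + nu) * g2 / 6)
    a = mean - alpha / (alpha + beta) * span
    return alpha, beta, a, a + span

rng = np.random.default_rng(5)
y = 3 + 4 * rng.beta(2.0, 6.0, size=200000)   # true (alpha, beta, a, c) = (2, 6, 3, 7)
print(beta4_method_of_moments(y))
```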


_Maximum_likelihood


_=Two_unknown_parameters

= As_is_also_the_case_for_maximum_likelihood_estimates_for_the_gamma_distribution,_the_maximum_likelihood_estimates_for_the_beta_distribution_do_not_have_a_general_closed_form_solution_for_arbitrary_values_of_the_shape_parameters._If_''X''1,_...,_''XN''_are_independent_random_variables_each_having_a_beta_distribution,_the_joint_log_likelihood_function_for_''N''_independent_and_identically_distributed_random_variables, iid_observations_is: :\begin \ln\,_\mathcal_(\alpha,_\beta\mid_X)_&=_\sum_^N_\ln_\left_(\mathcal_i_(\alpha,_\beta\mid_X_i)_\right_)\\ &=_\sum_^N_\ln_\left_(f(X_i;\alpha,\beta)_\right_)_\\ &=_\sum_^N_\ln_\left_(\frac_\right_)_\\ &=_(\alpha_-_1)\sum_^N_\ln_(X_i)_+_(\beta-_1)\sum_^N__\ln_(1-X_i)_-_N_\ln_\Beta(\alpha,\beta) \end Finding_the_maximum_with_respect_to_a_shape_parameter_involves_taking_the_partial_derivative_with_respect_to_the_shape_parameter_and_setting_the_expression_equal_to_zero_yielding_the_maximum_likelihood_estimator_of_the_shape_parameters: :\frac_=_\sum_^N_\ln_X_i_-N\frac=0 :\frac_=_\sum_^N__\ln_(1-X_i)-_N\frac=0 where: :\frac_=_-\frac+_\frac+_\frac=-\psi(\alpha_+_\beta)_+_\psi(\alpha)_+_0 :\frac=_-_\frac+_\frac_+_\frac=-\psi(\alpha_+_\beta)_+_0_+_\psi(\beta) since_the_digamma_function_ In_mathematics,_the_digamma_function_is_defined_as_the__logarithmic_derivative_of_the_gamma_function: :\psi(x)=\frac\ln\big(\Gamma(x)\big)=\frac\sim\ln-\frac. It_is_the_first_of_the__polygamma_functions._It_is_strictly_increasing_and_strict_...
_denoted_ψ(α)_is_defined_as_the_logarithmic_derivative_of_the_gamma_function_ In__mathematics,_the_gamma_function_(represented_by_,_the_capital_letter__gamma_from_the_Greek_alphabet)_is_one_commonly_used_extension_of_the__factorial_function_to_complex_numbers._The_gamma_function_is_defined_for_all_complex_numbers_except_...
: :\psi(\alpha)_=\frac_ To_ensure_that_the_values_with_zero_tangent_slope_are_indeed_a_maximum_(instead_of_a_saddle-point_or_a_minimum)_one_has_to_also_satisfy_the_condition_that_the_curvature_is_negative.__This_amounts_to_satisfying_that_the_second_partial_derivative_with_respect_to_the_shape_parameters_is_negative :\frac=_-N\frac<0 :\frac_=_-N\frac<0 using_the_previous_equations,_this_is_equivalent_to: :\frac_=_\psi_1(\alpha)-\psi_1(\alpha_+_\beta)_>_0 :\frac_=_\psi_1(\beta)_-\psi_1(\alpha_+_\beta)_>_0 where_the_trigamma_function_ In_mathematics,_the_trigamma_function,_denoted__or_,_is_the_second_of_the_polygamma_functions,_and_is_defined_by :_\psi_1(z)_=_\frac_\ln\Gamma(z). It_follows_from_this_definition_that :_\psi_1(z)_=_\frac_\psi(z) where__is_the_digamma_functio_...
, denoted ''ψ''1(''α''), is the second of the polygamma function
s,_and_is_defined_as_the_derivative_of_the_digamma_function: :\psi_1(\alpha)_=_\frac=\,_\frac. These_conditions_are_equivalent_to_stating_that_the_variances_of_the_logarithmically_transformed_variables_are_positive,_since: :\operatorname[\ln_(X)]_=_\operatorname[\ln^2_(X)]_-_(\operatorname[\ln_(X)])^2_=_\psi_1(\alpha)_-_\psi_1(\alpha_+_\beta)_ :\operatorname_ln_(1-X)=_\operatorname[\ln^2_(1-X)]_-_(\operatorname[\ln_(1-X)])^2_=_\psi_1(\beta)_-_\psi_1(\alpha_+_\beta)_ Therefore,_the_condition_of_negative_curvature_at_a_maximum_is_equivalent_to_the_statements: :___\operatorname[\ln_(X)]_>_0 :___\operatorname_ln_(1-X)>_0 Alternatively,_the_condition_of_negative_curvature_at_a_maximum_is_also_equivalent_to_stating_that_the_following_logarithmic_derivatives_of_the__geometric_means_''GX''_and_''G(1−X)''_are_positive,_since: :_\psi_1(\alpha)_-_\psi_1(\alpha_+_\beta)_=_\frac_>_0 :_\psi_1(\beta)__-_\psi_1(\alpha_+_\beta)_=_\frac_>_0 While_these_slopes_are_indeed_positive,_the_other_slopes_are_negative: :\frac,_\frac_<_0. The_slopes_of_the_mean_and_the_median_with_respect_to_''α''_and_''β''_display_similar_sign_behavior. From_the_condition_that_at_a_maximum,_the_partial_derivative_with_respect_to_the_shape_parameter_equals_zero,_we_obtain_the_following_system_of_coupled_maximum_likelihood_estimate_equations_(for_the_average_log-likelihoods)_that_needs_to_be_inverted_to_obtain_the__(unknown)_shape_parameter_estimates_\hat,\hat_in_terms_of_the_(known)_average_of_logarithms_of_the_samples_''X''1,_...,_''XN'': :\begin \hat[\ln_(X)]_&=_\psi(\hat)_-_\psi(\hat_+_\hat)=\frac\sum_^N_\ln_X_i_=__\ln_\hat_X_\\ \hat[\ln(1-X)]_&=_\psi(\hat)_-_\psi(\hat_+_\hat)=\frac\sum_^N_\ln_(1-X_i)=_\ln_\hat_ \end where_we_recognize_\log_\hat_X_as_the_logarithm_of_the_sample__geometric_mean_and_\log_\hat__as_the_logarithm_of_the_sample__geometric_mean_based_on_(1 − ''X''),_the_mirror-image_of ''X''._For_\hat=\hat,_it_follows_that__\hat_X=\hat__. :\begin \hat_X_&=_\prod_^N_(X_i)^_\\ \hat__&=_\prod_^N_(1-X_i)^ \end These_coupled_equations_containing_digamma_function_ In_mathematics,_the_digamma_function_is_defined_as_the__logarithmic_derivative_of_the_gamma_function: :\psi(x)=\frac\ln\big(\Gamma(x)\big)=\frac\sim\ln-\frac. It_is_the_first_of_the__polygamma_functions._It_is_strictly_increasing_and_strict_...
s_of_the_shape_parameter_estimates_\hat,\hat_must_be_solved_by_numerical_methods_as_done,_for_example,_by_Beckman_et_al._Gnanadesikan_et_al._give_numerical_solutions_for_a_few_cases._Norman_Lloyd_Johnson, N.L.Johnson_and_Samuel_Kotz, S.Kotz_suggest_that_for_"not_too_small"_shape_parameter_estimates_\hat,\hat,_the_logarithmic_approximation_to_the_digamma_function_\psi(\hat)_\approx_\ln(\hat-\tfrac)_may_be_used_to_obtain_initial_values_for_an_iterative_solution,_since_the_equations_resulting_from_this_approximation_can_be_solved_exactly: :\ln_\frac__\approx__\ln_\hat_X_ :\ln_\frac\approx_\ln_\hat__ which_leads_to_the_following_solution_for_the_initial_values_(of_the_estimate_shape_parameters_in_terms_of_the_sample_geometric_means)_for_an_iterative_solution: :\hat\approx_\tfrac_+_\frac_\text_\hat_>1 :\hat\approx_\tfrac_+_\frac_\text_\hat_>_1 Alternatively,_the_estimates_provided_by_the_method_of_moments_can_instead_be_used_as_initial_values_for_an_iterative_solution_of_the_maximum_likelihood_coupled_equations_in_terms_of_the_digamma_functions. When_the_distribution_is_required_over_a_known_interval_other_than_,_1_ The_comma__is_a_punctuation_mark_that_appears_in_several_variants_in_different_languages._It_has_the_same_shape_as_an_apostrophe_or_single_closing_quotation_mark_()_in_many_typefaces,_but_it_differs_from_them_in_being_placed_on_the__baseline_o_...
_with_random_variable_''X'',_say_[''a'',_''c'']_with_random_variable_''Y'',_then_replace_ln(''Xi'')_in_the_first_equation_with :\ln_\frac, and_replace_ln(1−''Xi'')_in_the_second_equation_with :\ln_\frac (see_"Alternative_parametrizations,_four_parameters"_section_below). If_one_of_the_shape_parameters_is_known,_the_problem_is_considerably_simplified.__The_following_logit_transformation_can_be_used_to_solve_for_the_unknown_shape_parameter_(for_skewed_cases_such_that_\hat\neq\hat,_otherwise,_if_symmetric,_both_-equal-_parameters_are_known_when_one_is_known): :\hat_\left[\ln_\left(\frac_\right)_\right]=\psi(\hat)_-_\psi(\hat)=\frac\sum_^N_\ln\frac_=__\ln_\hat_X_-_\ln_\left(\hat_\right)_ This_logit_transformation_is_the_logarithm_of_the_transformation_that_divides_the_variable_''X''_by_its_mirror-image_(''X''/(1_-_''X'')_resulting_in_the_"inverted_beta_distribution"__or_beta_prime_distribution_ In_probability_theory_and__statistics,_the_beta_prime_distribution_(also_known_as_inverted_beta_distribution_or_beta_distribution_of_the_second_kindJohnson_et_al_(1995),_p_248)_is_an_absolutely_continuous_probability_distribution. __Definitions_ _...
_(also_known_as_beta_distribution_of_the_second_kind_or_Pearson_distribution, Pearson's_Type_VI)_with_support_[0,_+∞)._As_previously_discussed_in_the_section_"Moments_of_logarithmically_transformed_random_variables,"_the_logit_transformation_\ln\frac,_studied_by_Johnson,_extends_the_finite_support_,_1_ The_comma__is_a_punctuation_mark_that_appears_in_several_variants_in_different_languages._It_has_the_same_shape_as_an_apostrophe_or_single_closing_quotation_mark_()_in_many_typefaces,_but_it_differs_from_them_in_being_placed_on_the__baseline_o_...
based_on_the_original_variable_''X''_to_infinite_support_in_both_directions_of_the_real_line_(−∞,_+∞). If,_for_example,_\hat_is_known,_the_unknown_parameter_\hat_can_be_obtained_in_terms_of_the_inverse
_digamma_function_of_the_right_hand_side_of_this_equation: :\psi(\hat)=\frac\sum_^N_\ln\frac_+_\psi(\hat)_ :\hat=\psi^(\ln_\hat_X_-_\ln_\hat__+_\psi(\hat))_ In_particular,_if_one_of_the_shape_parameters_has_a_value_of_unity,_for_example_for_\hat_=_1_(the_power_function_distribution_with_bounded_support_[0,1]),_using_the_identity_ψ(''x''_+_1)_=_ψ(''x'')_+_1/''x''_in_the_equation_\psi(\hat)_-_\psi(\hat_+_\hat)=_\ln_\hat_X,_the_maximum_likelihood_estimator_for_the_unknown_parameter_\hat_is,_exactly: :\hat=_-_\frac=_-_\frac_ The_beta_has_support_[0,_1],_therefore_\hat_X_<_1,_and_hence_(-\ln_\hat_X)_>0,_and_therefore_\hat_>0. In_conclusion,_the_maximum_likelihood_estimates_of_the_shape_parameters_of_a_beta_distribution_are_(in_general)_a_complicated_function_of_the_sample__geometric_mean,_and_of_the_sample__geometric_mean_based_on_''(1−X)'',_the_mirror-image_of_''X''.__One_may_ask,_if_the_variance_(in_addition_to_the_mean)_is_necessary_to_estimate_two_shape_parameters_with_the_method_of_moments,_why_is_the_(logarithmic_or_geometric)_variance_not_necessary_to_estimate_two_shape_parameters_with_the_maximum_likelihood_method,_for_which_only_the_geometric_means_suffice?__The_answer_is_because_the_mean_does_not_provide_as_much_information_as_the_geometric_mean.__For_a_beta_distribution_with_equal_shape_parameters_''α'' = ''β'',_the_mean_is_exactly_1/2,_regardless_of_the_value_of_the_shape_parameters,_and_therefore_regardless_of_the_value_of_the_statistical_dispersion_(the_variance).__On_the_other_hand,_the_geometric_mean_of_a_beta_distribution_with_equal_shape_parameters_''α'' = ''β'',_depends_on_the_value_of_the_shape_parameters,_and_therefore_it_contains_more_information.__Also,_the_geometric_mean_of_a_beta_distribution_does_not_satisfy_the_symmetry_conditions_satisfied_by_the_mean,_therefore,_by_employing_both_the_geometric_mean_based_on_''X''_and_geometric_mean_based_on_(1 − ''X''),_the_maximum_likelihood_method_is_able_to_provide_best_estimates_for_both_parameters_''α'' = ''β'',_without_need_of_employing_the_variance. One_can_express_the_joint_log_likelihood_per_''N''_independent_and_identically_distributed_random_variables, iid_observations_in_terms_of_the_''sufficient_statistics''_(the_sample_geometric_means)_as_follows: :\frac_=_(\alpha_-_1)\ln_\hat_X_+_(\beta-_1)\ln_\hat_-_\ln_\Beta(\alpha,\beta). 
We_can_plot_the_joint_log_likelihood_per_''N''_observations_for_fixed_values_of_the_sample_geometric_means_to_see_the_behavior_of_the_likelihood_function_as_a_function_of_the_shape_parameters_α_and_β._In_such_a_plot,_the_shape_parameter_estimators_\hat,\hat_correspond_to_the_maxima_of_the_likelihood_function._See_the_accompanying_graph_that_shows_that_all_the_likelihood_functions_intersect_at_α_=_β_=_1,_which_corresponds_to_the_values_of_the_shape_parameters_that_give_the_maximum_entropy_(the_maximum_entropy_occurs_for_shape_parameters_equal_to_unity:_the_uniform_distribution).__It_is_evident_from_the_plot_that_the_likelihood_function_gives_sharp_peaks_for_values_of_the_shape_parameter_estimators_close_to_zero,_but_that_for_values_of_the_shape_parameters_estimators_greater_than_one,_the_likelihood_function_becomes_quite_flat,_with_less_defined_peaks.__Obviously,_the_maximum_likelihood_parameter_estimation_method_for_the_beta_distribution_becomes_less_acceptable_for_larger_values_of_the_shape_parameter_estimators,_as_the_uncertainty_in_the_peak_definition_increases_with_the_value_of_the_shape_parameter_estimators.__One_can_arrive_at_the_same_conclusion_by_noticing_that_the_expression_for_the_curvature_of_the_likelihood_function_is_in_terms_of_the_geometric_variances :\frac=_-\operatorname_ln_X/math> :\frac_=_-\operatorname[\ln_(1-X)] These_variances_(and_therefore_the_curvatures)_are_much_larger_for_small_values_of_the_shape_parameter_α_and_β._However,_for_shape_parameter_values_α,_β_>_1,_the_variances_(and_therefore_the_curvatures)_flatten_out.__Equivalently,_this_result_follows_from_the_Cramér–Rao_bound,_since_the_Fisher_information_ In_mathematical_statistics,_the_Fisher_information_(sometimes_simply_called_information)_is_a_way_of_measuring_the_amount_of_information_that_an_observable_random_variable_''X''_carries_about_an_unknown_parameter_''θ''_of_a_distribution_that_model_...
matrix components for the beta distribution are these logarithmic variances. The Cramér–Rao bound states that the variance
of any ''unbiased'' estimator \hat{\alpha} of α is bounded by the reciprocal of the Fisher information:
: :\mathrm(\hat)\geq\frac\geq\frac :\mathrm(\hat)_\geq\frac\geq\frac so_the_variance_of_the_estimators_increases_with_increasing_α_and_β,_as_the_logarithmic_variances_decrease. Also_one_can_express_the_joint_log_likelihood_per_''N''_independent_and_identically_distributed_random_variables, iid_observations_in_terms_of_the_digamma_function_ In_mathematics,_the_digamma_function_is_defined_as_the__logarithmic_derivative_of_the_gamma_function: :\psi(x)=\frac\ln\big(\Gamma(x)\big)=\frac\sim\ln-\frac. It_is_the_first_of_the__polygamma_functions._It_is_strictly_increasing_and_strict_...
_expressions_for_the_logarithms_of_the_sample_geometric_means_as_follows: :\frac_=_(\alpha_-_1)(\psi(\hat)_-_\psi(\hat_+_\hat))+(\beta-_1)(\psi(\hat)_-_\psi(\hat_+_\hat))-_\ln_\Beta(\alpha,\beta) this_expression_is_identical_to_the_negative_of_the_cross-entropy_(see_section_on_"Quantities_of_information_(entropy)").__Therefore,_finding_the_maximum_of_the_joint_log_likelihood_of_the_shape_parameters,_per_''N''_independent_and_identically_distributed_random_variables, iid_observations,_is_identical_to_finding_the_minimum_of_the_cross-entropy_for_the_beta_distribution,_as_a_function_of_the_shape_parameters. :\frac_=_-_H_=_-h_-_D__=_-\ln\Beta(\alpha,\beta)+(\alpha-1)\psi(\hat)+(\beta-1)\psi(\hat)-(\alpha+\beta-2)\psi(\hat+\hat) with_the_cross-entropy_defined_as_follows: :H_=_\int_^1_-_f(X;\hat,\hat)_\ln_(f(X;\alpha,\beta))_\,_X_
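The coupled digamma equations can be solved numerically from the two log-geometric-mean sufficient statistics; the sketch below (illustrative, assuming SciPy) uses the method-of-moments estimates as starting values, as suggested above, and cross-checks against SciPy's built-in fit with location and scale held fixed:

```python
import numpy as np
from scipy import stats, optimize, special

rng = np.random.default_rng(6)
x = rng.beta(2.0, 5.0, size=10000)          # true alpha = 2, beta = 5

# sufficient statistics: logs of the two sample geometric means
log_gx = np.mean(np.log(x))
log_g1mx = np.mean(np.log1p(-x))

# maximum likelihood conditions:
#   psi(a) - psi(a + b) = log G_X,   psi(b) - psi(a + b) = log G_(1-X)
def score(params):
    a, b = params
    return [special.digamma(a) - special.digamma(a + b) - log_gx,
            special.digamma(b) - special.digamma(a + b) - log_g1mx]

# method-of-moments estimates as starting values for the iteration
xbar, vbar = np.mean(x), np.var(x, ddof=1)
common = xbar * (1 - xbar) / vbar - 1
a_hat, b_hat = optimize.fsolve(score, [xbar * common, (1 - xbar) * common])
print(a_hat, b_hat)

# cross-check with SciPy's built-in MLE (location and scale held fixed at 0 and 1)
print(stats.beta.fit(x, floc=0, fscale=1)[:2])
```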


_=Four_unknown_parameters

= The_procedure_is_similar_to_the_one_followed_in_the_two_unknown_parameter_case._If_''Y''1,_...,_''YN''_are_independent_random_variables_each_having_a_beta_distribution_with_four_parameters,_the_joint_log_likelihood_function_for_''N''_independent_and_identically_distributed_random_variables, iid_observations_is: :\begin \ln\,_\mathcal_(\alpha,_\beta,_a,_c\mid_Y)_&=_\sum_^N_\ln\,\mathcal_i_(\alpha,_\beta,_a,_c\mid_Y_i)\\ &=_\sum_^N_\ln\,f(Y_i;_\alpha,_\beta,_a,_c)_\\ &=_\sum_^N_\ln\,\frac\\ &=_(\alpha_-_1)\sum_^N__\ln_(Y_i_-_a)_+_(\beta-_1)\sum_^N__\ln_(c_-_Y_i)-_N_\ln_\Beta(\alpha,\beta)_-_N_(\alpha+\beta_-_1)_\ln_(c_-_a) \end Finding_the_maximum_with_respect_to_a_shape_parameter_involves_taking_the_partial_derivative_with_respect_to_the_shape_parameter_and_setting_the_expression_equal_to_zero_yielding_the_maximum_likelihood_estimator_of_the_shape_parameters: :\frac=_\sum_^N__\ln_(Y_i_-_a)_-_N(-\psi(\alpha_+_\beta)_+_\psi(\alpha))-_N_\ln_(c_-_a)=_0 :\frac_=_\sum_^N__\ln_(c_-_Y_i)_-_N(-\psi(\alpha_+_\beta)__+_\psi(\beta))-_N_\ln_(c_-_a)=_0 :\frac_=_-(\alpha_-_1)_\sum_^N__\frac_\,+_N_(\alpha+\beta_-_1)\frac=_0 :\frac_=_(\beta-_1)_\sum_^N__\frac_\,-_N_(\alpha+\beta_-_1)_\frac_=_0 these_equations_can_be_re-arranged_as_the_following_system_of_four_coupled_equations_(the_first_two_equations_are_geometric_means_and_the_second_two_equations_are_the_harmonic_means)_in_terms_of_the_maximum_likelihood_estimates_for_the_four_parameters_\hat,_\hat,_\hat,_\hat: :\frac\sum_^N__\ln_\frac_=_\psi(\hat)-\psi(\hat_+\hat_)=__\ln_\hat_X :\frac\sum_^N__\ln_\frac_=__\psi(\hat)-\psi(\hat_+_\hat)=__\ln_\hat_ :\frac_=_\frac=__\hat_X :\frac_=_\frac_=__\hat_ with_sample_geometric_means: :\hat_X_=_\prod_^_\left_(\frac_\right_)^ :\hat__=_\prod_^_\left_(\frac_\right_)^ The_parameters_\hat,_\hat_are_embedded_inside_the_geometric_mean_expressions_in_a_nonlinear_way_(to_the_power_1/''N'').__This_precludes,_in_general,_a_closed_form_solution,_even_for_an_initial_value_approximation_for_iteration_purposes.__One_alternative_is_to_use_as_initial_values_for_iteration_the_values_obtained_from_the_method_of_moments_solution_for_the_four_parameter_case.__Furthermore,_the_expressions_for_the_harmonic_means_are_well-defined_only_for_\hat,_\hat_>_1,_which_precludes_a_maximum_likelihood_solution_for_shape_parameters_less_than_unity_in_the_four-parameter_case._Fisher's_information_matrix_for_the_four_parameter_case_is_Positive-definite_matrix, positive-definite_only_for_α,_β_>_2_(for_further_discussion,_see_section_on_Fisher_information_matrix,_four_parameter_case),_for_bell-shaped_(symmetric_or_unsymmetric)_beta_distributions,_with_inflection_points_located_to_either_side_of_the_mode._The_following_Fisher_information_components_(that_represent_the_expectations_of_the_curvature_of_the_log_likelihood_function)_have_mathematical_singularity, singularities_at_the_following_values: :\alpha_=_2:_\quad_\operatorname_\left_[-_\frac_\frac_\right_]=__ :\beta_=_2:_\quad_\operatorname\left_[-_\frac_\frac_\right_]_=__ :\alpha_=_2:_\quad_\operatorname\left_[-_\frac\frac\right_]_=___ :\beta_=_1:_\quad_\operatorname\left_[-_\frac\frac_\right_]_=____ (for_further_discussion_see_section_on_Fisher_information_matrix)._Thus,_it_is_not_possible_to_strictly_carry_on_the_maximum_likelihood_estimation_for_some_well_known_distributions_belonging_to_the_four-parameter_beta_distribution_family,_like_the_continuous_uniform_distribution, 
uniform_distribution_(Beta(1,_1,_''a'',_''c'')),_and_the__arcsine_distribution_(Beta(1/2,_1/2,_''a'',_''c'')).__Norman_Lloyd_Johnson, N.L.Johnson_and_Samuel_Kotz, S.Kotz_ignore_the_equations_for_the_harmonic_means_and_instead_suggest_"If_a_and_c_are_unknown,_and_maximum_likelihood_estimators_of_''a'',_''c'',_α_and_β_are_required,_the_above_procedure_(for_the_two_unknown_parameter_case,_with_''X''_transformed_as_''X''_=_(''Y'' − ''a'')/(''c'' − ''a''))_can_be_repeated_using_a_succession_of_trial_values_of_''a''_and_''c'',_until_the_pair_(''a'',_''c'')_for_which_maximum_likelihood_(given_''a''_and_''c'')_is_as_great_as_possible,_is_attained"_(where,_for_the_purpose_of_clarity,_their_notation_for_the_parameters_has_been_translated_into_the_present_notation).


_Fisher_information_matrix

Let a random variable X have a probability density ''f''(''x''; ''α''). The partial derivative with respect to the (unknown, and to be estimated) parameter α of the log likelihood function
is called the score. The second moment of the score is called the Fisher information:
:\mathcal{I}(\alpha)=\operatorname{E} \left [\left (\frac{\partial}{\partial\alpha} \ln \mathcal{L}(\alpha\mid X) \right )^2 \right].

The expectation of the score is zero, therefore the Fisher information is also the second moment centered on the mean of the score: the variance
of the score.

If the log likelihood function is twice differentiable with respect to the parameter α, and under certain regularity conditions,
then the Fisher information may also be written as follows (which is often a more convenient form for calculation purposes):

:\mathcal{I}(\alpha) = - \operatorname{E} \left [\frac{\partial^2}{\partial\alpha^2} \ln (\mathcal{L}(\alpha\mid X)) \right].

Thus, the Fisher information is the negative of the expectation of the second derivative with respect to the parameter α of the log likelihood function
._Therefore,_Fisher_information_is_a_measure_of_the_curvature_of_the_log_likelihood_function_of_α._A_low_curvature_(and_therefore_high_Radius_of_curvature_(mathematics), radius_of_curvature),_flatter_log_likelihood_function_curve_has_low_Fisher_information;_while_a_log_likelihood_function_curve_with_large_curvature_(and_therefore_low_Radius_of_curvature_(mathematics), radius_of_curvature)_has_high_Fisher_information._When_the_Fisher_information_matrix_is_computed_at_the_evaluates_of_the_parameters_("the_observed_Fisher_information_matrix")_it_is_equivalent_to_the_replacement_of_the_true_log_likelihood_surface_by_a_Taylor's_series_approximation,_taken_as_far_as_the_quadratic_terms.
__The_word_information,_in_the_context_of_Fisher_information,_refers_to_information_about_the_parameters._Information_such_as:_estimation,_sufficiency_and_properties_of_variances_of_estimators.__The_Cramér–Rao_bound_states_that_the_inverse_of_the_Fisher_information_is_a_lower_bound_on_the_variance_of_any_
estimator
_of_a_parameter_α: :\operatorname[\hat\alpha]_\geq_\frac. The_precision_to_which_one_can_estimate_the_estimator_of_a_parameter_α_is_limited_by_the_Fisher_Information_of_the_log_likelihood_function._The_Fisher_information_is_a_measure_of_the_minimum_error_involved_in_estimating_a_parameter_of_a_distribution_and_it_can_be_viewed_as_a_measure_of_the_resolving_power_of_an_experiment_needed_to_discriminate_between_two_alternative_hypothesis_of_a_parameter.
When_there_are_''N''_parameters :_\begin_\theta_1_\\_\theta__\\_\dots_\\_\theta__\end, then_the_Fisher_information_takes_the_form_of_an_''N''×''N''_positive_semidefinite_matrix, positive_semidefinite_symmetric_matrix,_the_Fisher_Information_Matrix,_with_typical_element: :_=\operatorname_\left_[\left_(\frac_\ln_\mathcal_\right)_\left(\frac_\ln_\mathcal_\right)_\right_]. Under_certain_regularity_conditions,_the_Fisher_Information_Matrix_may_also_be_written_in_the_following_form,_which_is_often_more_convenient_for_computation: :__=_-_\operatorname_\left_[\frac_\ln_(\mathcal)_\right_]\,. With_''X''1,_...,_''XN''_iid_random_variables,_an_''N''-dimensional_"box"_can_be_constructed_with_sides_''X''1,_...,_''XN''._Costa_and_Cover
__show_that_the_(Shannon)_differential_entropy_''h''(''X'')_is_related_to_the_volume_of_the_typical_set_(having_the_sample_entropy_close_to_the_true_entropy),_while_the_Fisher_information_is_related_to_the_surface_of_this_typical_set.


_=Two_parameters

= For_''X''1,_...,_''X''''N''_independent_random_variables_each_having_a_beta_distribution_parametrized_with_shape_parameters_''α''_and_''β'',_the_joint_log_likelihood_function_for_''N''_independent_and_identically_distributed_random_variables, iid_observations_is: :\ln_(\mathcal_(\alpha,_\beta\mid_X)_)=_(\alpha_-_1)\sum_^N_\ln_X_i_+_(\beta-_1)\sum_^N__\ln_(1-X_i)-_N_\ln_\Beta(\alpha,\beta)_ therefore_the_joint_log_likelihood_function_per_''N''_independent_and_identically_distributed_random_variables, iid_observations_is: :\frac_\ln(\mathcal_(\alpha,_\beta\mid_X))_=_(\alpha_-_1)\frac\sum_^N__\ln_X_i_+_(\beta-_1)\frac\sum_^N__\ln_(1-X_i)-\,_\ln_\Beta(\alpha,\beta) For_the_two_parameter_case,_the_Fisher_information_has_4_components:_2_diagonal_and_2_off-diagonal._Since_the_Fisher_information_matrix_is_symmetric,_one_of_these_off_diagonal_components_is_independent._Therefore,_the_Fisher_information_matrix_has_3_independent_components_(2_diagonal_and_1_off_diagonal). _ Aryal_and_Nadarajah
_calculated_Fisher's_information_matrix_for_the_four-parameter_case,_from_which_the_two_parameter_case_can_be_obtained_as_follows: :-_\frac=__\operatorname[\ln_(X)]=_\psi_1(\alpha)_-_\psi_1(\alpha_+_\beta)_=_=_\operatorname\left_[-_\frac_\right_]_=_\ln_\operatorname__ :-_\frac_=_\operatorname_ln_(1-X)=_\psi_1(\beta)_-_\psi_1(\alpha_+_\beta)_=_=__\operatorname\left_[-_\frac_\right]=_\ln_\operatorname__ :-_\frac_=_\operatorname[\ln_X,\ln(1-X)]__=_-\psi_1(\alpha+\beta)_=_=__\operatorname\left_[-_\frac_\right]_=_\ln_\operatorname_ Since_the_Fisher_information_matrix_is_symmetric :_\mathcal_=_\mathcal_=_\ln_\operatorname_ The_Fisher_information_components_are_equal_to_the_log_geometric_variances_and_log_geometric_covariance._Therefore,_they_can_be_expressed_as_trigamma_function_ In_mathematics,_the_trigamma_function,_denoted__or_,_is_the_second_of_the_polygamma_functions,_and_is_defined_by :_\psi_1(z)_=_\frac_\ln\Gamma(z). It_follows_from_this_definition_that :_\psi_1(z)_=_\frac_\psi(z) where__is_the_digamma_functio_...
s,_denoted_ψ1(α),__the_second_of_the_polygamma_function_ In_mathematics,_the_polygamma_function_of_order__is_a_meromorphic_function_on_the__complex_numbers_\mathbb_defined_as_the_th__derivative_of_the_logarithm_of_the_gamma_function: :\psi^(z)_:=_\frac_\psi(z)_=_\frac_\ln\Gamma(z). Thus :\psi^(z)__...
s,_defined_as_the_derivative_of_the_digamma_function: :\psi_1(\alpha)_=_\frac=\,_\frac._ These_derivatives_are_also_derived_in_the__and_plots_of_the_log_likelihood_function_are_also_shown_in_that_section.___contains_plots_and_further_discussion_of_the_Fisher_information_matrix_components:_the_log_geometric_variances_and_log_geometric_covariance_as_a_function_of_the_shape_parameters_α_and_β.___contains_formulas_for_moments_of_logarithmically_transformed_random_variables._Images_for_the_Fisher_information_components_\mathcal_,_\mathcal__and_\mathcal__are_shown_in_. The_determinant_of_Fisher's_information_matrix_is_of_interest_(for_example_for_the_calculation_of_Jeffreys_prior_probability).__From_the_expressions_for_the_individual_components_of_the_Fisher_information_matrix,_it_follows_that_the_determinant_of_Fisher's_(symmetric)_information_matrix_for_the_beta_distribution_is: :\begin \det(\mathcal(\alpha,_\beta))&=_\mathcal__\mathcal_-\mathcal__\mathcal__\\_pt&=(\psi_1(\alpha)_-_\psi_1(\alpha_+_\beta))(\psi_1(\beta)_-_\psi_1(\alpha_+_\beta))-(_-\psi_1(\alpha+\beta))(_-\psi_1(\alpha+\beta))\\_pt&=_\psi_1(\alpha)\psi_1(\beta)-(_\psi_1(\alpha)+\psi_1(\beta))\psi_1(\alpha_+_\beta)\\_pt\lim__\det(\mathcal(\alpha,_\beta))_&=\lim__\det(\mathcal(\alpha,_\beta))_=_\infty\\_pt\lim__\det(\mathcal(\alpha,_\beta))_&=\lim__\det(\mathcal(\alpha,_\beta))_=_0 \end From_Sylvester's_criterion_(checking_whether_the_diagonal_elements_are_all_positive),_it_follows_that_the_Fisher_information_matrix_for_the_two_parameter_case_is_Positive-definite_matrix, positive-definite_(under_the_standard_condition_that_the_shape_parameters_are_positive_''α'' > 0_and ''β'' > 0).
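Since the two-parameter Fisher information matrix is built entirely from trigamma functions, it (together with its determinant and the associated Cramér–Rao bounds) can be assembled in a few lines; a minimal sketch, assuming SciPy, with arbitrary parameter values:

```python
import numpy as np
from scipy.special import polygamma

def beta_fisher_info(a, b):
    """Per-observation Fisher information matrix of Beta(a, b)."""
    t_a, t_b, t_ab = polygamma(1, a), polygamma(1, b), polygamma(1, a + b)
    return np.array([[t_a - t_ab, -t_ab],
                     [-t_ab,      t_b - t_ab]])

a, b, N = 2.0, 5.0, 10000            # arbitrary parameters and sample size
info = beta_fisher_info(a, b)
print("determinant:", np.linalg.det(info))
# Cramér–Rao lower bounds on the standard errors of unbiased estimators from N observations
print("std-error bounds:", np.sqrt(np.diag(np.linalg.inv(N * info))))
```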


Four parameters

If ''Y''1, ..., ''Y''''N'' are independent random variables each having a beta distribution with four parameters: the exponents ''α'' and ''β'', and also ''a'' (the minimum of the distribution range), and ''c'' (the maximum of the distribution range) (see the section titled "Alternative parametrizations", "Four parameters"), with probability density function:

:f(y; \alpha, \beta, a, c) = \frac{(y-a)^{\alpha-1}(c-y)^{\beta-1}}{(c-a)^{\alpha+\beta-1}\Beta(\alpha,\beta)}.

the joint log likelihood function per ''N'' iid observations is:

:\frac{1}{N} \ln(\mathcal{L} (\alpha, \beta, a, c\mid Y))= \frac{\alpha -1}{N}\sum_{i=1}^N \ln (Y_i - a) + \frac{\beta -1}{N}\sum_{i=1}^N \ln (c - Y_i)- \ln \Beta(\alpha,\beta) - (\alpha+\beta -1) \ln (c-a)

For the four parameter case, the Fisher information has 4*4=16 components. It has 12 off-diagonal components = (4×4 total − 4 diagonal). Since the Fisher information matrix is symmetric, half of these components (12/2=6) are independent. Therefore, the Fisher information matrix has 6 independent off-diagonal + 4 diagonal = 10 independent components. Aryal and Nadarajah calculated Fisher's information matrix for the four parameter case as follows:

:- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}(\alpha,\beta,a,c\mid Y)}{\partial \alpha^2}= \operatorname{var}[\ln X]= \psi_1(\alpha) - \psi_1(\alpha + \beta) = \mathcal{I}_{\alpha,\alpha} = \operatorname{E}\left[- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}(\alpha,\beta,a,c\mid Y)}{\partial \alpha^2} \right] = \ln (\operatorname{var}_{GX})

:-\frac{1}{N} \frac{\partial^2\ln \mathcal{L}(\alpha,\beta,a,c\mid Y)}{\partial \beta^2} = \operatorname{var}[\ln (1-X)]= \psi_1(\beta) - \psi_1(\alpha + \beta) = \mathcal{I}_{\beta,\beta} = \operatorname{E}\left[- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}(\alpha,\beta,a,c\mid Y)}{\partial \beta^2} \right] = \ln(\operatorname{var}_{G(1-X)})

:-\frac{1}{N} \frac{\partial^2\ln \mathcal{L}(\alpha,\beta,a,c\mid Y)}{\partial \alpha\,\partial \beta} = \operatorname{cov}[\ln X,\ln(1-X)] = -\psi_1(\alpha+\beta) =\mathcal{I}_{\alpha,\beta} = \operatorname{E}\left[- \frac{1}{N}\frac{\partial^2\ln \mathcal{L}(\alpha,\beta,a,c\mid Y)}{\partial \alpha\,\partial \beta} \right] = \ln(\operatorname{cov}_{G X,(1-X)})

In the above expressions, the use of ''X'' instead of ''Y'' in the expressions var[ln(''X'')] = ln(var''GX'') is ''not an error''. The expressions in terms of the log geometric variances and log geometric covariance occur as functions of the two parameter ''X'' ~ Beta(''α'', ''β'') parametrization because when taking the partial derivatives with respect to the exponents (''α'', ''β'') in the four parameter case, one obtains the identical expressions as for the two parameter case: these terms of the four parameter Fisher information matrix are independent of the minimum ''a'' and maximum ''c'' of the distribution's range. The only non-zero term upon double differentiation of the log likelihood function with respect to the exponents ''α'' and ''β'' is the second derivative of the log of the beta function: ln(B(''α'', ''β'')). This term is independent of the minimum ''a'' and maximum ''c'' of the distribution's range. Double differentiation of this term results in trigamma functions. The sections titled "Maximum likelihood", "Two unknown parameters" and "Four unknown parameters" also show this fact.

The Fisher information for ''N'' i.i.d. samples is ''N'' times the individual Fisher information (eq. 11.279, page 394 of Cover and Thomas). (Aryal and Nadarajah take a single observation, ''N'' = 1, to calculate the following components of the Fisher information, which leads to the same result as considering the derivatives of the log likelihood per ''N'' observations. Moreover, below the erroneous expression for one of these components in Aryal and Nadarajah has been corrected.)
:\begin{align}
\alpha > 2: \quad \operatorname{E}\left[- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}(\alpha,\beta,a,c\mid Y)}{\partial a^2} \right] &= \mathcal{I}_{a,a}=\frac{\beta(\alpha+\beta-1)}{(\alpha-2)(c-a)^2} \\
\beta > 2: \quad \operatorname{E}\left[-\frac{1}{N} \frac{\partial^2\ln \mathcal{L}(\alpha,\beta,a,c\mid Y)}{\partial c^2} \right] &= \mathcal{I}_{c,c} = \frac{\alpha(\alpha+\beta-1)}{(\beta-2)(c-a)^2} \\
\operatorname{E}\left[- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}(\alpha,\beta,a,c\mid Y)}{\partial a\,\partial c} \right] &= \mathcal{I}_{a,c} = \frac{\alpha+\beta-1}{(c-a)^2} \\
\alpha > 1: \quad \operatorname{E}\left[- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}(\alpha,\beta,a,c\mid Y)}{\partial \alpha\,\partial a} \right] &=\mathcal{I}_{\alpha,a} = \frac{\beta}{(\alpha-1)(c-a)} \\
\operatorname{E}\left[- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}(\alpha,\beta,a,c\mid Y)}{\partial \alpha\,\partial c} \right] &= \mathcal{I}_{\alpha,c} = \frac{1}{c-a} \\
\operatorname{E}\left[- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}(\alpha,\beta,a,c\mid Y)}{\partial \beta\,\partial a} \right] &= \mathcal{I}_{\beta,a} = -\frac{1}{c-a} \\
\beta > 1: \quad \operatorname{E}\left[- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}(\alpha,\beta,a,c\mid Y)}{\partial \beta\,\partial c} \right] &= \mathcal{I}_{\beta,c} = -\frac{\alpha}{(\beta-1)(c-a)}
\end{align}

The lower two diagonal entries of the Fisher information matrix, with respect to the parameter "a" (the minimum of the distribution's range): \mathcal{I}_{a,a}, and with respect to the parameter "c" (the maximum of the distribution's range): \mathcal{I}_{c,c}, are only defined for exponents α > 2 and β > 2 respectively. The Fisher information matrix component \mathcal{I}_{a,a} for the minimum "a" approaches infinity for exponent α approaching 2 from above, and the Fisher information matrix component \mathcal{I}_{c,c} for the maximum "c" approaches infinity for exponent β approaching 2 from above.

The Fisher information matrix for the four parameter case does not depend on the individual values of the minimum "a" and the maximum "c", but only on the total range (''c''−''a''). Moreover, the components of the Fisher information matrix that depend on the range (''c''−''a'') depend only through its inverse (or the square of the inverse), such that the Fisher information decreases for increasing range (''c''−''a'').

The accompanying images show the Fisher information components involving the range parameters. Images for the Fisher information components that involve only the shape parameters are shown in the section titled "Geometric variance and covariance". All these Fisher information components look like a basin, with the "walls" of the basin being located at low values of the parameters.

The following four-parameter-beta-distribution Fisher information components can be expressed in terms of the two-parameter ''X'' ~ Beta(α, β) expectations of the transformed ratio ((1-''X'')/''X'') and of its mirror image (''X''/(1-''X'')), scaled by the range (''c''−''a''), which may be helpful for interpretation:

:\mathcal{I}_{\alpha,a} =\frac{\operatorname{E}\left[\frac{1-X}{X}\right]}{c-a}= \frac{\beta}{(\alpha-1)(c-a)} \text{ if }\alpha > 1

:\mathcal{I}_{\beta,c} = -\frac{\operatorname{E}\left[\frac{X}{1-X}\right]}{c-a}=- \frac{\alpha}{(\beta-1)(c-a)}\text{ if }\beta> 1

These are also the expected values of the "inverted beta distribution" or beta prime distribution (also known as beta distribution of the second kind or Pearson's Type VI) and its mirror image, scaled by the range (''c'' − ''a'').

Also, the following Fisher information components can be expressed in terms of the harmonic (1/X) variances or of variances based on the ratio transformed variables ((1-X)/X) as follows:

:\begin{align}
\alpha > 2: \quad \mathcal{I}_{a,a} &=\operatorname{var}\left[\frac{1}{X}\right] \left(\frac{\alpha-1}{c-a}\right)^2 =\operatorname{var}\left[\frac{1-X}{X}\right] \left(\frac{\alpha-1}{c-a}\right)^2 = \frac{\beta(\alpha+\beta-1)}{(\alpha-2)(c-a)^2} \\
\beta > 2: \quad \mathcal{I}_{c,c} &= \operatorname{var}\left[\frac{1}{1-X}\right] \left(\frac{\beta-1}{c-a}\right)^2 = \operatorname{var}\left[\frac{X}{1-X}\right] \left(\frac{\beta-1}{c-a}\right)^2 =\frac{\alpha(\alpha+\beta-1)}{(\beta-2)(c-a)^2} \\
\mathcal{I}_{a,c} &=-\operatorname{cov}\left[\frac{1}{X},\frac{1}{1-X}\right]\frac{(\alpha-1)(\beta-1)}{(c-a)^2} = -\operatorname{cov}\left[\frac{1-X}{X},\frac{X}{1-X}\right]\frac{(\alpha-1)(\beta-1)}{(c-a)^2} =\frac{\alpha+\beta-1}{(c-a)^2}
\end{align}

See section "Moments of linearly transformed, product and inverted random variables" for these expectations.

The determinant of Fisher's information matrix is of interest (for example for the calculation of Jeffreys prior probability). From the expressions for the individual components, it follows that the determinant of Fisher's (symmetric) information matrix for the beta distribution with four parameters is the determinant of the symmetric 4×4 matrix built from the components above:

:\det(\mathcal{I}(\alpha,\beta,a,c)) = \det
\begin{pmatrix}
\mathcal{I}_{\alpha,\alpha} & \mathcal{I}_{\alpha,\beta} & \mathcal{I}_{\alpha,a} & \mathcal{I}_{\alpha,c} \\
\mathcal{I}_{\alpha,\beta} & \mathcal{I}_{\beta,\beta} & \mathcal{I}_{\beta,a} & \mathcal{I}_{\beta,c} \\
\mathcal{I}_{\alpha,a} & \mathcal{I}_{\beta,a} & \mathcal{I}_{a,a} & \mathcal{I}_{a,c} \\
\mathcal{I}_{\alpha,c} & \mathcal{I}_{\beta,c} & \mathcal{I}_{a,c} & \mathcal{I}_{c,c}
\end{pmatrix}
\text{ if }\alpha, \beta> 2

Using Sylvester's criterion (checking whether the diagonal elements are all positive), and since diagonal components \mathcal{I}_{a,a} and \mathcal{I}_{c,c} have singularities at α=2 and β=2, it follows that the Fisher information matrix for the four parameter case is positive-definite for α>2 and β>2. Since for α > 2 and β > 2 the beta distribution is (symmetric or unsymmetric) bell shaped, it follows that the Fisher information matrix is positive-definite only for bell-shaped (symmetric or unsymmetric) beta distributions, with inflection points located to either side of the mode. Thus, important well known distributions belonging to the four-parameter beta distribution family, like the parabolic distribution (Beta(2,2,a,c)) and the uniform distribution (Beta(1,1,a,c)), have Fisher information components (\mathcal{I}_{a,a},\mathcal{I}_{c,c},\mathcal{I}_{\alpha,a},\mathcal{I}_{\beta,c}) that blow up (approach infinity) in the four-parameter case (although their Fisher information components are all defined for the two parameter case). The four-parameter Wigner semicircle distribution (Beta(3/2,3/2,''a'',''c'')) and arcsine distribution (Beta(1/2,1/2,''a'',''c'')) have negative Fisher information determinants for the four-parameter case.
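The following Python sketch (an illustrative addition, assuming NumPy and SciPy; the function name and parameter ordering are arbitrary choices) assembles the 4×4 per-observation Fisher information matrix from the component formulas quoted above and checks numerically that it is positive-definite in a case with α, β > 2:

```python
import numpy as np
from scipy.special import polygamma

def beta4_fisher_information(a, b, lo, hi):
    """4x4 per-observation Fisher information of Beta(a, b, lo, hi),
    parameter order (alpha, beta, minimum, maximum); needs a > 2 and b > 2."""
    r = hi - lo                                   # the range (c - a)
    t = lambda x: polygamma(1, x)                 # trigamma function
    I_aa, I_bb, I_ab = t(a) - t(a + b), t(b) - t(a + b), -t(a + b)
    I_min  = b * (a + b - 1) / ((a - 2) * r**2)   # I_{a,a}
    I_max  = a * (a + b - 1) / ((b - 2) * r**2)   # I_{c,c}
    I_mm   = (a + b - 1) / r**2                   # I_{a,c}
    I_amin = b / ((a - 1) * r)                    # I_{alpha,a}
    I_amax = 1.0 / r                              # I_{alpha,c}
    I_bmin = -1.0 / r                             # I_{beta,a}
    I_bmax = -a / ((b - 1) * r)                   # I_{beta,c}
    return np.array([[I_aa,   I_ab,   I_amin, I_amax],
                     [I_ab,   I_bb,   I_bmin, I_bmax],
                     [I_amin, I_bmin, I_min,  I_mm  ],
                     [I_amax, I_bmax, I_mm,   I_max ]])

M = beta4_fisher_information(3.0, 4.0, 0.0, 10.0)
print(np.all(np.linalg.eigvalsh(M) > 0))  # True here, consistent with positive-definiteness for alpha, beta > 2
```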


Bayesian inference

The use of Beta distributions in Bayesian inference is due to the fact that they provide a family of conjugate prior probability distributions for binomial (including Bernoulli) and geometric distributions. The domain of the beta distribution can be viewed as a probability, and in fact the beta distribution is often used to describe the distribution of a probability value ''p'':

:P(p;\alpha,\beta) = \frac{p^{\alpha-1}(1-p)^{\beta-1}}{\Beta(\alpha,\beta)}.

Examples of beta distributions used as prior probabilities to represent ignorance of prior parameter values in Bayesian inference are Beta(1,1), Beta(0,0) and Beta(1/2,1/2).
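Because the beta distribution is the conjugate prior for the binomial likelihood, the posterior after observing ''s'' successes and ''f'' failures is again a beta distribution, with the counts simply added to the exponents. A minimal sketch of this update (an illustrative addition; the helper name is an arbitrary choice, and SciPy is assumed):

```python
from scipy import stats

def update_beta_prior(alpha_prior, beta_prior, successes, failures):
    """Conjugate update: Beta prior + binomial likelihood -> Beta posterior,
    with the observed counts added to the exponents."""
    return alpha_prior + successes, beta_prior + failures

# A uniform Beta(1, 1) prior updated with 7 successes and 3 failures:
a_post, b_post = update_beta_prior(1, 1, 7, 3)
posterior = stats.beta(a_post, b_post)
print(a_post, b_post)      # 8 4
print(posterior.mean())    # (s + 1) / (n + 2) = 8/12 under the uniform prior
```

With the uniform prior the posterior mean reproduces the rule of succession discussed in the next section.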


Rule of succession

A classic application of the beta distribution is the rule of succession, introduced in the 18th century by Pierre-Simon Laplace in the course of treating the sunrise problem. It states that, given ''s'' successes in ''n'' conditionally independent Bernoulli trials with probability ''p'', the estimate of the expected value in the next trial is \frac{s+1}{n+2}. This estimate is the expected value of the posterior distribution over ''p'', namely Beta(''s''+1, ''n''−''s''+1), which is given by Bayes' rule if one assumes a uniform prior probability over ''p'' (i.e., Beta(1, 1)) and then observes that ''p'' generated ''s'' successes in ''n'' trials. Laplace's rule of succession has been criticized by prominent scientists. R. T. Cox described Laplace's application of the rule of succession to the sunrise problem (p. 89) as "a travesty of the proper use of the principle." Keynes remarks (Ch.XXX, p. 382) "indeed this is so foolish a theorem that to entertain it is discreditable." Karl Pearson showed that the probability that the next (''n'' + 1) trials will be successes, after n successes in n trials, is only 50%, which has been considered too low by scientists like Jeffreys and unacceptable as a representation of the scientific process of experimentation to test a proposed scientific law. As pointed out by Jeffreys (p. 128) (crediting C. D. Broad), Laplace's rule of succession establishes a high probability of success ((n+1)/(n+2)) in the next trial, but only a moderate probability (50%) that a further sample (n+1) comparable in size will be equally successful. As pointed out by Perks, "The rule of succession itself is hard to accept. It assigns a probability to the next trial which implies the assumption that the actual run observed is an average run and that we are always at the end of an average run. It would, one would think, be more reasonable to assume that we were in the middle of an average run. Clearly a higher value for both probabilities is necessary if they are to accord with reasonable belief." These problems with Laplace's rule of succession motivated Haldane, Perks, Jeffreys and others to search for other forms of prior probability (see the next section). According to Jaynes, the main problem with the rule of succession is that it is not valid when s=0 or s=n (see rule of succession, for an analysis of its validity).


Bayes-Laplace prior probability (Beta(1,1))

The beta distribution achieves maximum differential entropy for Beta(1,1): the uniform probability density, for which all values in the domain of the distribution have equal density. This uniform distribution Beta(1,1) was suggested ("with a great deal of doubt") by Thomas Bayes as the prior probability distribution to express ignorance about the correct prior distribution. This prior distribution was adopted (apparently, from his writings, with little sign of doubt) by Pierre-Simon Laplace, and hence it was also known as the "Bayes-Laplace rule" or the "Laplace rule" of "inverse probability" in publications of the first half of the 20th century. In the later part of the 19th century and early part of the 20th century, scientists realized that the assumption of uniform "equal" probability density depended on the actual functions (for example whether a linear or a logarithmic scale was most appropriate) and parametrizations used. In particular, the behavior near the ends of distributions with finite support (for example near ''x'' = 0, for a distribution with initial support at ''x'' = 0) required particular attention. Keynes (Ch.XXX, p. 381) criticized the use of Bayes's uniform prior probability (Beta(1,1)) that all values between zero and one are equiprobable, as follows: "Thus experience, if it shows anything, shows that there is a very marked clustering of statistical ratios in the neighborhoods of zero and unity, of those for positive theories and for correlations between positive qualities in the neighborhood of zero, and of those for negative theories and for correlations between negative qualities in the neighborhood of unity."


Haldane's prior probability (Beta(0,0))

The Beta(0,0) distribution was proposed by J.B.S. Haldane, who suggested that the prior probability representing complete uncertainty should be proportional to ''p''−1(1−''p'')−1. The function ''p''−1(1−''p'')−1 can be viewed as the limit of the numerator of the beta distribution as both shape parameters approach zero: α, β → 0. The Beta function (in the denominator of the beta distribution) approaches infinity, for both parameters approaching zero, α, β → 0. Therefore, ''p''−1(1−''p'')−1 divided by the Beta function approaches a 2-point Bernoulli distribution with equal probability 1/2 at each end, at 0 and 1, and nothing in between, as α, β → 0. A coin-toss: one face of the coin being at 0 and the other face being at 1. The Haldane prior probability distribution Beta(0,0) is an "improper prior" because its integration (from 0 to 1) fails to strictly converge to 1 due to the singularities at each end. However, this is not an issue for computing posterior probabilities unless the sample size is very small. Furthermore, Zellner points out that on the log-odds scale (the logit transformation ln(''p''/(1−''p''))), the Haldane prior is the uniformly flat prior. The fact that a uniform prior probability on the logit transformed variable ln(''p''/(1−''p'')) (with domain (−∞, ∞)) is equivalent to the Haldane prior on the domain [0, 1] was pointed out by Harold Jeffreys in the first edition (1939) of his book Theory of Probability (p. 123). Jeffreys writes "Certainly if we take the Bayes-Laplace rule right up to the extremes we are led to results that do not correspond to anybody's way of thinking. The (Haldane) rule d''x''/(''x''(1−''x'')) goes too far the other way. It would lead to the conclusion that if a sample is of one type with respect to some property there is a probability 1 that the whole population is of that type." The fact that "uniform" depends on the parametrization led Jeffreys to seek a form of prior that would be invariant under different parametrizations.


Jeffreys' prior probability (Beta(1/2,1/2) for a Bernoulli or for a binomial distribution)

Harold Jeffreys proposed to use an uninformative prior probability measure that should be invariant under reparameterization: proportional to the square root of the determinant of Fisher's information matrix. For the Bernoulli distribution, this can be shown as follows: for a coin that is "heads" with probability ''p'' ∈ [0, 1] and is "tails" with probability 1 − ''p'', for a given (H, T) with H ∈ {0, 1} the probability is ''p''''H''(1 − ''p'')''T''. Since ''T'' = 1 − ''H'', the Bernoulli distribution is ''p''''H''(1 − ''p'')1 − ''H''. Considering ''p'' as the only parameter, it follows that the log likelihood for the Bernoulli distribution is

:\ln \mathcal{L} (p\mid H) = H \ln(p)+ (1-H) \ln(1-p).

The Fisher information matrix has only one component (it is a scalar, because there is only one parameter: ''p''), therefore:

:\begin{align}
\sqrt{\det(\mathcal{I}(p))} &= \sqrt{\operatorname{E}\!\left[ -\frac{d^2}{dp^2} \ln \mathcal{L}(p\mid H) \right]} \\
&= \sqrt{\operatorname{E}\!\left[ \frac{H}{p^2} + \frac{1-H}{(1-p)^2} \right]} \\
&= \sqrt{\frac{p}{p^2} + \frac{1-p}{(1-p)^2}} \\
&= \frac{1}{\sqrt{p(1-p)}}.
\end{align}

Similarly, for the Binomial distribution with ''n'' Bernoulli trials, it can be shown that

:\sqrt{\det(\mathcal{I}(p))}= \frac{\sqrt{n}}{\sqrt{p(1-p)}}.

Thus, for the Bernoulli and Binomial distributions, Jeffreys prior is proportional to \scriptstyle \frac{1}{\sqrt{p(1-p)}}, which happens to be proportional to a beta distribution with domain variable ''x'' = ''p'', and shape parameters α = β = 1/2, the arcsine distribution:

:Beta(\tfrac{1}{2}, \tfrac{1}{2}) = \frac{1}{\pi \sqrt{p(1-p)}}.

It will be shown in the next section that the normalizing constant for Jeffreys prior is immaterial to the final result because the normalizing constant cancels out in Bayes theorem for the posterior probability. Hence Beta(1/2,1/2) is used as the Jeffreys prior for both Bernoulli and binomial distributions. As shown in the next section, when using this expression as a prior probability times the likelihood in Bayes theorem, the posterior probability turns out to be a beta distribution. It is important to realize, however, that Jeffreys prior is proportional to \scriptstyle \frac{1}{\sqrt{p(1-p)}} for the Bernoulli and binomial distribution, but not for the beta distribution. Jeffreys prior for the beta distribution is given by the determinant of Fisher's information for the beta distribution, which, as shown in the previous section, is a function of the trigamma functions ψ1 of shape parameters α and β as follows:

: \begin{align}
\sqrt{\det(\mathcal{I}(\alpha, \beta))} &= \sqrt{\psi_1(\alpha)\psi_1(\beta)-(\psi_1(\alpha)+\psi_1(\beta))\psi_1(\alpha+\beta)} \\
\lim_{\alpha\to 0} \sqrt{\det(\mathcal{I}(\alpha, \beta))} &=\lim_{\beta \to 0} \sqrt{\det(\mathcal{I}(\alpha, \beta))} = \infty\\
\lim_{\alpha\to \infty} \sqrt{\det(\mathcal{I}(\alpha, \beta))} &=\lim_{\beta \to \infty} \sqrt{\det(\mathcal{I}(\alpha, \beta))} = 0
\end{align}

As previously discussed, Jeffreys prior for the Bernoulli and binomial distributions is proportional to the arcsine distribution Beta(1/2,1/2), a one-dimensional ''curve'' that looks like a basin as a function of the parameter ''p'' of the Bernoulli and binomial distributions. The walls of the basin are formed by ''p'' approaching the singularities at the ends ''p'' → 0 and ''p'' → 1, where Beta(1/2,1/2) approaches infinity. Jeffreys prior for the beta distribution is a ''2-dimensional surface'' (embedded in a three-dimensional space) that looks like a basin with only two of its walls meeting at the corner α = β = 0 (and missing the other two walls) as a function of the shape parameters α and β of the beta distribution. The two adjoining walls of this 2-dimensional surface are formed by the shape parameters α and β approaching the singularities (of the trigamma function) at α, β → 0. It has no walls for α, β → ∞ because in this case the determinant of Fisher's information matrix for the beta distribution approaches zero.

It will be shown in the next section that Jeffreys prior probability results in posterior probabilities (when multiplied by the binomial likelihood function) that are intermediate between the posterior probability results of the Haldane and Bayes prior probabilities.

Jeffreys prior may be difficult to obtain analytically, and for some cases it just doesn't exist (even for simple distribution functions like the asymmetric triangular distribution). Berger, Bernardo and Sun, in a 2009 paper, defined a reference prior probability distribution that (unlike Jeffreys prior) exists for the asymmetric triangular distribution. They cannot obtain a closed-form expression for their reference prior, but numerical calculations show it to be nearly perfectly fitted by the (proper) prior

: \operatorname{Beta}(\tfrac{1}{2}, \tfrac{1}{2}) \sim\frac{1}{\pi\sqrt{\theta(1-\theta)}}

where θ is the vertex variable for the asymmetric triangular distribution with support [0, 1] (corresponding to the following parameter values in Wikipedia's article on the triangular distribution: vertex ''c'' = ''θ'', left end ''a'' = 0, and right end ''b'' = 1). Berger et al. also give a heuristic argument that Beta(1/2,1/2) could indeed be the exact Berger–Bernardo–Sun reference prior for the asymmetric triangular distribution. Therefore, Beta(1/2,1/2) not only is Jeffreys prior for the Bernoulli and binomial distributions, but also seems to be the Berger–Bernardo–Sun reference prior for the asymmetric triangular distribution (for which the Jeffreys prior does not exist), a distribution used in project management and PERT analysis to describe the cost and duration of project tasks.

Clarke and Barron prove that, among continuous positive priors, Jeffreys prior (when it exists) asymptotically maximizes Shannon's mutual information between a sample of size n and the parameter, and therefore ''Jeffreys prior is the most uninformative prior'' (measuring information as Shannon information). The proof rests on an examination of the Kullback–Leibler divergence between probability density functions for iid random variables.
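As a numerical illustration (an added sketch, not from the cited sources; the function names are arbitrary and NumPy/SciPy are assumed), the Jeffreys prior for the Bernoulli/binomial parameter, i.e. the arcsine density Beta(1/2,1/2), and the un-normalized Jeffreys prior for the beta distribution's own shape parameters via the trigamma determinant given above, can be evaluated as:

```python
import numpy as np
from scipy.special import polygamma

def jeffreys_prior_bernoulli(p):
    """Jeffreys prior for the Bernoulli/binomial parameter p:
    the arcsine density Beta(1/2, 1/2) = 1 / (pi sqrt(p (1 - p)))."""
    return 1.0 / (np.pi * np.sqrt(p * (1.0 - p)))

def jeffreys_prior_beta_shapes(a, b):
    """Un-normalized Jeffreys prior for the beta shape parameters (alpha, beta):
    the square root of the determinant of the Fisher information matrix."""
    det = (polygamma(1, a) * polygamma(1, b)
           - (polygamma(1, a) + polygamma(1, b)) * polygamma(1, a + b))
    return np.sqrt(det)

print(jeffreys_prior_bernoulli(0.5))        # 2/pi, the bottom of the one-dimensional "basin"
print(jeffreys_prior_beta_shapes(1.0, 1.0)) # finite; it grows without bound as alpha, beta -> 0
```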


Effect of different prior probability choices on the posterior beta distribution

If samples are drawn from the population of a random variable ''X'' that result in ''s'' successes and ''f'' failures in "n" Bernoulli trials ''n'' = ''s'' + ''f'', then the likelihood function for parameters ''s'' and ''f'' given ''x'' = ''p'' (the notation ''x'' = ''p'' in the expressions below will emphasize that the domain ''x'' stands for the value of the parameter ''p'' in the binomial distribution), is the following binomial distribution:

:\mathcal{L}(s,f\mid x=p) = {s+f \choose s} x^s(1-x)^f = {n \choose s} x^s(1-x)^{n-s}.

If beliefs about prior probability information are reasonably well approximated by a beta distribution with parameters ''α'' Prior and ''β'' Prior, then:

:\operatorname{PriorProbability}(x=p;\alpha \operatorname{Prior},\beta \operatorname{Prior}) = \frac{x^{\alpha \operatorname{Prior}-1}(1-x)^{\beta \operatorname{Prior}-1}}{\Beta(\alpha \operatorname{Prior},\beta \operatorname{Prior})}

According to Bayes' theorem for a continuous event space, the posterior probability is given by the product of the prior probability and the likelihood function (given the evidence ''s'' and ''f'' = ''n'' − ''s''), normalized so that the area under the curve equals one, as follows:

:\begin{align}
& \operatorname{posterior probability}(x=p\mid s,n-s) \\
= {} & \frac{\operatorname{prior probability}(x=p;\alpha \operatorname{Prior},\beta \operatorname{Prior}) \, \mathcal{L}(s,f\mid x=p)}{\int_0^1 \operatorname{prior probability}(x=p;\alpha \operatorname{Prior},\beta \operatorname{Prior}) \, \mathcal{L}(s,f\mid x=p) \, dx} \\
= {} & \frac{{n \choose s} x^{s+\alpha \operatorname{Prior}-1}(1-x)^{n-s+\beta \operatorname{Prior}-1}/\Beta(\alpha \operatorname{Prior},\beta \operatorname{Prior})}{\int_0^1 \left({n \choose s} x^{s+\alpha \operatorname{Prior}-1}(1-x)^{n-s+\beta \operatorname{Prior}-1}/\Beta(\alpha \operatorname{Prior},\beta \operatorname{Prior})\right) dx} \\
= {} & \frac{x^{s+\alpha \operatorname{Prior}-1}(1-x)^{n-s+\beta \operatorname{Prior}-1}}{\int_0^1 \left(x^{s+\alpha \operatorname{Prior}-1}(1-x)^{n-s+\beta \operatorname{Prior}-1}\right) dx} \\
= {} & \frac{x^{s+\alpha \operatorname{Prior}-1}(1-x)^{n-s+\beta \operatorname{Prior}-1}}{\Beta(s+\alpha \operatorname{Prior},n-s+\beta \operatorname{Prior})}.
\end{align}

The binomial coefficient

:{s+f \choose s}={n \choose s}=\frac{(s+f)!}{s!\,f!}=\frac{n!}{s!\,(n-s)!}

appears both in the numerator and the denominator of the posterior probability, and it does not depend on the integration variable ''x'', hence it cancels out, and it is irrelevant to the final result. Similarly the normalizing factor for the prior probability, the beta function B(αPrior,βPrior), cancels out and it is immaterial to the final result. The same posterior probability result can be obtained if one uses an un-normalized prior

:x^{\alpha \operatorname{Prior}-1}(1-x)^{\beta \operatorname{Prior}-1}

because the normalizing factors all cancel out. Several authors (including Jeffreys himself) thus use an un-normalized prior formula since the normalization constant cancels out. The numerator of the posterior probability ends up being just the (un-normalized) product of the prior probability and the likelihood function, and the denominator is its integral from zero to one. The beta function in the denominator, B(''s'' + ''α'' Prior, ''n'' − ''s'' + ''β'' Prior), appears as a normalization constant to ensure that the total posterior probability integrates to unity.

The ratio ''s''/''n'' of the number of successes to the total number of trials is a sufficient statistic in the binomial case, which is relevant for the following results.

For the Bayes' prior probability (Beta(1,1)), the posterior probability is:

:\operatorname{posterior probability}(p=x\mid s,f) = \frac{x^{s}(1-x)^{n-s}}{\Beta(s+1,n-s+1)}, \text{ with mean} =\frac{s+1}{n+2},\text{ (and mode}=\frac{s}{n}\text{ if } 0 < s < n).

For the Jeffreys' prior probability (Beta(1/2,1/2)), the posterior probability is:

:\operatorname{posterior probability}(p=x\mid s,f) = \frac{x^{s-\tfrac{1}{2}}(1-x)^{n-s-\tfrac{1}{2}}}{\Beta(s+\tfrac{1}{2},n-s+\tfrac{1}{2})},\text{ with mean} = \frac{s+\tfrac{1}{2}}{n+1},\text{ (and mode}=\frac{s-\tfrac{1}{2}}{n-1}\text{ if } \tfrac{1}{2} < s < n-\tfrac{1}{2}).

and for the Haldane prior probability (Beta(0,0)), the posterior probability is:

:\operatorname{posterior probability}(p=x\mid s,f) = \frac{x^{s-1}(1-x)^{n-s-1}}{\Beta(s,n-s)}, \text{ with mean} = \frac{s}{n},\text{ (and mode}=\frac{s-1}{n-2}\text{ if } 1 < s < n -1).

From the above expressions it follows that for ''s''/''n'' = 1/2 all the above three prior probabilities result in the identical location for the posterior probability mean = mode = 1/2. For ''s''/''n'' < 1/2, the mean of the posterior probabilities, using the following priors, are such that: mean for Bayes prior > mean for Jeffreys prior > mean for Haldane prior. For ''s''/''n'' > 1/2 the order of these inequalities is reversed such that the Haldane prior probability results in the largest posterior mean. The ''Haldane'' prior probability Beta(0,0) results in a posterior probability density with ''mean'' (the expected value for the probability of success in the "next" trial) identical to the ratio ''s''/''n'' of the number of successes to the total number of trials. Therefore, the Haldane prior results in a posterior probability with expected value in the next trial equal to the maximum likelihood. The ''Bayes'' prior probability Beta(1,1) results in a posterior probability density with ''mode'' identical to the ratio ''s''/''n'' (the maximum likelihood).

In the case that 100% of the trials have been successful ''s'' = ''n'', the ''Bayes'' prior probability Beta(1,1) results in a posterior expected value equal to the rule of succession (''n'' + 1)/(''n'' + 2), while the Haldane prior Beta(0,0) results in a posterior expected value of 1 (absolute certainty of success in the next trial). Jeffreys prior probability results in a posterior expected value equal to (''n'' + 1/2)/(''n'' + 1). Perks (p. 303) points out: "This provides a new rule of succession and expresses a 'reasonable' position to take up, namely, that after an unbroken run of n successes we assume a probability for the next trial equivalent to the assumption that we are about half-way through an average run, i.e. that we expect a failure once in (2''n'' + 2) trials. The Bayes–Laplace rule implies that we are about at the end of an average run or that we expect a failure once in (''n'' + 2) trials. The comparison clearly favours the new result (what is now called Jeffreys prior) from the point of view of 'reasonableness'."

Conversely, in the case that 100% of the trials have resulted in failure (''s'' = 0), the ''Bayes'' prior probability Beta(1,1) results in a posterior expected value for success in the next trial equal to 1/(''n'' + 2), while the Haldane prior Beta(0,0) results in a posterior expected value of success in the next trial of 0 (absolute certainty of failure in the next trial). Jeffreys prior probability results in a posterior expected value for success in the next trial equal to (1/2)/(''n'' + 1), which Perks (p. 303) points out: "is a much more reasonably remote result than the Bayes-Laplace result 1/(''n'' + 2)".

Jaynes questions (for the uniform prior Beta(1,1)) the use of these formulas for the cases ''s'' = 0 or ''s'' = ''n'' because the integrals do not converge (Beta(1,1) is an improper prior for ''s'' = 0 or ''s'' = ''n''). In practice, the conditions 0 < ''s'' < ''n'' are usually met. Perks (p. 303) shows that, for what is now known as the Jeffreys prior, the probability that the next (''n'' + 1) trials will be successes, after n successes in n trials, is ((''n'' + 1/2)/(''n'' + 1))((''n'' + 3/2)/(''n'' + 2))...(2''n'' + 1/2)/(2''n'' + 1), which for ''n'' = 1, 2, 3 gives 15/24, 315/480, 9009/13440; rapidly approaching a limiting value of 1/\sqrt{2} = 0.70710678\ldots as n tends to infinity. Perks remarks that what is now known as the Jeffreys prior: "is clearly more 'reasonable' than either the Bayes-Laplace result or the result on the (Haldane) alternative rule rejected by Jeffreys which gives certainty as the probability. It clearly provides a very much better correspondence with the process of induction. Whether it is 'absolutely' reasonable for the purpose, i.e. whether it is yet large enough, without the absurdity of reaching unity, is a matter for others to decide. But it must be realized that the result depends on the assumption of complete indifference and absence of knowledge prior to the sampling experiment."
Following are the variances of the posterior distribution obtained with these three prior probability distributions:

for the Bayes' prior probability (Beta(1,1)), the posterior variance is:

:\text{variance} = \frac{(s+1)(n-s+1)}{(n+2)^2(n+3)},\text{ which for } s=\frac{n}{2} \text{ results in variance} =\frac{1}{4(n+3)}

for the Jeffreys' prior probability (Beta(1/2,1/2)), the posterior variance is:

: \text{variance} = \frac{(s+\tfrac{1}{2})(n-s+\tfrac{1}{2})}{(n+1)^2(n+2)},\text{ which for } s=\frac n 2 \text{ results in variance} = \frac 1 {4(n+2)}

and for the Haldane prior probability (Beta(0,0)), the posterior variance is:

:\text{variance} = \frac{s(n-s)}{n^2(n+1)}, \text{ which for }s=\frac{n}{2}\text{ results in variance} =\frac{1}{4(n+1)}

So, as remarked by Silvey, for large ''n'', the variance is small and hence the posterior distribution is highly concentrated, whereas the assumed prior distribution was very diffuse. This is in accord with what one would hope for, as vague prior knowledge is transformed (through Bayes theorem) into a more precise posterior knowledge by an informative experiment. For small ''n'' the Haldane Beta(0,0) prior results in the largest posterior variance while the Bayes Beta(1,1) prior results in the more concentrated posterior. Jeffreys prior Beta(1/2,1/2) results in a posterior variance in between the other two. As ''n'' increases, the variance rapidly decreases so that the posterior variance for all three priors converges to approximately the same value (approaching zero variance as ''n'' → ∞). Recalling the previous result that the ''Haldane'' prior probability Beta(0,0) results in a posterior probability density with ''mean'' (the expected value for the probability of success in the "next" trial) identical to the ratio s/n of the number of successes to the total number of trials, it follows from the above expression that also the ''Haldane'' prior Beta(0,0) results in a posterior with ''variance'' identical to the variance expressed in terms of the maximum likelihood estimate s/n and the sample size:

:\text{variance} = \frac{\mu(1-\mu)}{1+\nu}= \frac{\frac{s}{n}\left(1-\frac{s}{n}\right)}{1+n}

with the mean ''μ'' = ''s''/''n'' and the sample size ''ν'' = ''n''.

In Bayesian inference, using a prior distribution Beta(''α''Prior,''β''Prior) prior to a binomial distribution is equivalent to adding (''α''Prior − 1) pseudo-observations of "success" and (''β''Prior − 1) pseudo-observations of "failure" to the actual number of successes and failures observed, then estimating the parameter ''p'' of the binomial distribution by the proportion of successes over both real- and pseudo-observations. A uniform prior Beta(1,1) does not add (or subtract) any pseudo-observations since for Beta(1,1) it follows that (''α''Prior − 1) = 0 and (''β''Prior − 1) = 0. The Haldane prior Beta(0,0) subtracts one pseudo observation from each and Jeffreys prior Beta(1/2,1/2) subtracts 1/2 pseudo-observation of success and an equal number of failure. This subtraction has the effect of smoothing out the posterior distribution. If the proportion of successes is not 50% (''s''/''n'' ≠ 1/2), values of ''α''Prior and ''β''Prior less than 1 (and therefore negative (''α''Prior − 1) and (''β''Prior − 1)) favor sparsity, i.e. distributions where the parameter ''p'' is closer to either 0 or 1. In effect, values of ''α''Prior and ''β''Prior between 0 and 1, when operating together, function as a concentration parameter.
The accompanying plots show the posterior probability density functions for a range of sample sizes ''n'', numbers of successes ''s'', and priors Beta(''α''Prior,''β''Prior) ∈ {Beta(0,0), Beta(1/2,1/2), Beta(1,1)}. The first plot shows the symmetric cases, with mean = mode = 1/2, and the second plot shows the skewed cases. The images show that there is little difference between the priors for the posterior with sample size of 50 (characterized by a more pronounced peak near ''p'' = 1/2). Significant differences appear for very small sample sizes (in particular for the flatter distribution for the degenerate case of sample size = 3). Therefore, the skewed cases show a larger effect from the choice of prior, at small sample size, than the symmetric cases. For symmetric distributions, the Bayes prior Beta(1,1) results in the most "peaky" and highest posterior distributions and the Haldane prior Beta(0,0) results in the flattest and lowest peak distribution. The Jeffreys prior Beta(1/2,1/2) lies in between them. For nearly symmetric, not too skewed distributions the effect of the priors is similar. For very small sample size (in this case for a sample size of 3) and a skewed distribution the Haldane prior can result in a reverse-J-shaped distribution with a singularity at the left end. However, this happens only in degenerate cases (in this example ''n'' = 3 and hence ''s'' = 3/4 < 1, a degenerate value because s should be greater than unity in order for the posterior of the Haldane prior to have a mode located between the ends, and because ''s'' = 3/4 is not an integer number, hence it violates the initial assumption of a binomial distribution for the likelihood) and it is not an issue in generic cases of reasonable sample size (such that the condition 1 < ''s'' < ''n'' − 1, necessary for a mode to exist between both ends, is fulfilled).

In Chapter 12 (p. 385) of his book, Jaynes asserts that the ''Haldane prior'' Beta(0,0) describes a ''prior state of knowledge of complete ignorance'', where we are not even sure whether it is physically possible for an experiment to yield either a success or a failure, while the ''Bayes (uniform) prior Beta(1,1) applies if'' one knows that ''both binary outcomes are possible''. Jaynes states: "''interpret the Bayes-Laplace (Beta(1,1)) prior as describing not a state of complete ignorance'', but the state of knowledge in which we have observed one success and one failure...once we have seen at least one success and one failure, then we know that the experiment is a true binary one, in the sense of physical possibility." Jaynes does not specifically discuss Jeffreys prior Beta(1/2,1/2) (Jaynes discussion of "Jeffreys prior" on pp. 181, 423 and on chapter 12 of Jaynes book refers instead to the improper, un-normalized, prior "1/''p'' ''dp''" introduced by Jeffreys in the 1939 edition of his book, seven years before he introduced what is now known as Jeffreys' invariant prior: the square root of the determinant of Fisher's information matrix. ''"1/p" is Jeffreys' (1946) invariant prior for the exponential distribution, not for the Bernoulli or binomial distributions''). However, it follows from the above discussion that Jeffreys Beta(1/2,1/2) prior represents a state of knowledge in between the Haldane Beta(0,0) and Bayes Beta(1,1) prior.

Similarly, Karl Pearson in his 1892 book The Grammar of Science (p. 144 of 1900 edition) maintained that the Bayes (Beta(1,1)) uniform prior was not a complete ignorance prior, and that it should be used when prior information justified to "distribute our ignorance equally". K. Pearson wrote: "Yet the only supposition that we appear to have made is this: that, knowing nothing of nature, routine and anomy (from the Greek ανομία, namely: a- "without", and nomos "law") are to be considered as equally likely to occur. Now we were not really justified in making even this assumption, for it involves a knowledge that we do not possess regarding nature. We use our ''experience'' of the constitution and action of coins in general to assert that heads and tails are equally probable, but we have no right to assert before experience that, as we know nothing of nature, routine and breach are equally probable. In our ignorance we ought to consider before experience that nature may consist of all routines, all anomies (normlessness), or a mixture of the two in any proportion whatever, and that all such are equally probable. Which of these constitutions after experience is the most probable must clearly depend on what that experience has been like."

If there is sufficient sampling data, ''and the posterior probability mode is not located at one of the extremes of the domain'' (x=0 or x=1), the three priors of Bayes (Beta(1,1)), Jeffreys (Beta(1/2,1/2)) and Haldane (Beta(0,0)) should yield similar ''posterior'' probability densities. Otherwise, as Gelman et al. (p. 65) point out, "if so few data are available that the choice of noninformative prior distribution makes a difference, one should put relevant information into the prior distribution", or as Berger (p. 125) points out "when different reasonable priors yield substantially different answers, can it be right to state that there ''is'' a single answer? Would it not be better to admit that there is scientific uncertainty, with the conclusion depending on prior beliefs?."
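The following sketch (an illustrative addition; it handles the improper Haldane prior purely through its limiting posterior, which is proper only when 0 < ''s'' < ''n'', and the helper name is arbitrary) compares the posterior mean and variance under the three priors for a small sample:

```python
from scipy import stats

def binomial_posterior(prior_a, prior_b, s, n):
    """Posterior for the binomial parameter p given s successes in n trials
    and a (possibly improper) Beta(prior_a, prior_b) prior."""
    return stats.beta(prior_a + s, prior_b + (n - s))

priors = {"Haldane  Beta(0,0)":     (0.0, 0.0),  # improper; posterior proper only for 0 < s < n
          "Jeffreys Beta(1/2,1/2)": (0.5, 0.5),
          "Bayes    Beta(1,1)":     (1.0, 1.0)}

s, n = 3, 10
for name, (a0, b0) in priors.items():
    post = binomial_posterior(a0, b0, s, n)
    print(f"{name}  mean={post.mean():.4f}  var={post.var():.5f}")
# For s/n < 1/2 the posterior means order as Bayes > Jeffreys > Haldane (= s/n = 0.3)
```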


Occurrence and applications


Order statistics

The beta distribution has an important application in the theory of order statistics. A basic result is that the distribution of the ''k''th smallest of a sample of size ''n'' from a continuous uniform distribution has a beta distribution. (David, H. A., Nagaraja, H. N. (2003) ''Order Statistics'' (3rd Edition). Wiley, New Jersey, pp 458.) This result is summarized as:

:U_{(k)} \sim \operatorname{Beta}(k,n+1-k).

From this, and application of the theory related to the probability integral transform, the distribution of any individual order statistic from any continuous distribution can be derived.
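A quick Monte Carlo check of this result (an illustrative sketch, assuming NumPy and SciPy):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, k = 10, 3                      # k-th smallest of n iid uniform variates
u = rng.uniform(size=(100_000, n))
kth_smallest = np.sort(u, axis=1)[:, k - 1]

# Compare the empirical mean with the Beta(k, n + 1 - k) mean, k / (n + 1)
print(kth_smallest.mean(), stats.beta(k, n + 1 - k).mean())
# Kolmogorov-Smirnov test against the theoretical beta distribution
print(stats.kstest(kth_smallest, stats.beta(k, n + 1 - k).cdf))
```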


Subjective logic

In standard logic, propositions are considered to be either true or false. In contradistinction, subjective logic assumes that humans cannot determine with absolute certainty whether a proposition about the real world is absolutely true or false. In subjective logic the a posteriori probability estimates of binary events can be represented by beta distributions. (A. Jøsang. A Logic for Uncertain Probabilities. ''International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems.'' 9(3), pp. 279-311, June 2001.)


Wavelet analysis

A wavelet is a wave-like oscillation with an amplitude that starts out at zero, increases, and then decreases back to zero. It can typically be visualized as a "brief oscillation" that promptly decays. Wavelets can be used to extract information from many different kinds of data, including – but certainly not limited to – audio signals and images. Thus, wavelets are purposefully crafted to have specific properties that make them useful for signal processing. Wavelets are localized in both time and frequency whereas the standard Fourier transform is only localized in frequency. Therefore, standard Fourier transforms are only applicable to stationary processes, while wavelets are applicable to non-stationary processes. Continuous wavelets can be constructed based on the beta distribution. Beta wavelets (H.M. de Oliveira and G.A.A. Araújo. Compactly Supported One-cyclic Wavelets Derived from Beta Distributions. ''Journal of Communication and Information Systems.'' vol. 20, n. 3, pp. 27-33, 2005.) can be viewed as a soft variety of Haar wavelets whose shape is fine-tuned by two shape parameters α and β.


Population genetics

The Balding–Nichols model is a two-parameter parametrization of the beta distribution used in population genetics. It is a statistical description of the allele frequencies in the components of a sub-divided population:

: \begin{align}
    \alpha &= \mu \nu,\\
    \beta  &= (1 - \mu) \nu,
  \end{align}

where \nu =\alpha+\beta= \frac{1-F}{F} and 0 < F < 1; here ''F'' is (Wright's) genetic distance between two populations.
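A minimal sketch of this reparametrization (an illustrative addition; the function name is an arbitrary choice):

```python
def balding_nichols_shapes(F, mu):
    """Map the Balding-Nichols parameters (genetic distance F, allele frequency mu)
    to beta shape parameters: nu = (1 - F)/F, alpha = mu*nu, beta = (1 - mu)*nu."""
    if not 0.0 < F < 1.0:
        raise ValueError("F must lie strictly between 0 and 1")
    nu = (1.0 - F) / F
    return mu * nu, (1.0 - mu) * nu

print(balding_nichols_shapes(0.1, 0.3))   # (2.7, 6.3) for F = 0.1, mu = 0.3
```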


Project management: task cost and schedule modeling

The beta distribution can be used to model events which are constrained to take place within an interval defined by a minimum and maximum value. For this reason, the beta distribution – along with the triangular distribution – is used extensively in PERT, critical path method (CPM), Joint Cost Schedule Modeling (JCSM) and other project management/control systems to describe the time to completion and the cost of a task. In project management, shorthand computations are widely used to estimate the mean and standard deviation of the beta distribution:

: \begin{align}
  \mu(X) & = \frac{a + 4b + c}{6} \\
  \sigma(X) & = \frac{c-a}{6}
\end{align}

where ''a'' is the minimum, ''c'' is the maximum, and ''b'' is the most likely value (the mode for ''α'' > 1 and ''β'' > 1).

The above estimate for the mean \mu(X)= \frac{a + 4b + c}{6} is known as the PERT three-point estimation and it is exact for either of the following values of ''β'' (for arbitrary α within these ranges):

:''β'' = ''α'' > 1 (symmetric case) with standard deviation \sigma(X) = \frac{c-a}{2\sqrt{1+2\alpha}}, skewness = 0, and excess kurtosis = \frac{-6}{3+2\alpha}

or

:''β'' = 6 − ''α'' for 5 > ''α'' > 1 (skewed case) with standard deviation

:\sigma(X) = \frac{(c-a)\sqrt{\alpha(6-\alpha)}}{6\sqrt{7}},

skewness = \frac{(3-\alpha)\sqrt{7}}{2\sqrt{\alpha(6-\alpha)}}, and excess kurtosis = \frac{21}{\alpha(6-\alpha)} - 3

The above estimate for the standard deviation ''σ''(''X'') = (''c'' − ''a'')/6 is exact for either of the following values of ''α'' and ''β'':

:''α'' = ''β'' = 4 (symmetric) with skewness = 0, and excess kurtosis = −6/11.
:''β'' = 6 − ''α'' and \alpha = 3 - \sqrt2 (right-tailed, positive skew) with skewness =\frac{1}{\sqrt{2}}, and excess kurtosis = 0
:''β'' = 6 − ''α'' and \alpha = 3 + \sqrt2 (left-tailed, negative skew) with skewness = \frac{-1}{\sqrt{2}}, and excess kurtosis = 0

Otherwise, these can be poor approximations for beta distributions with other values of α and β, exhibiting average errors of 40% in the mean and 549% in the variance.
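A minimal sketch of the PERT shorthand estimates (an illustrative addition; the example inputs are hypothetical task durations):

```python
def pert_estimates(a, b, c):
    """PERT three-point shorthand for a beta-distributed task:
    a = minimum, b = most likely value (mode), c = maximum."""
    mean = (a + 4.0 * b + c) / 6.0
    std_dev = (c - a) / 6.0
    return mean, std_dev

# A task taking at least 2 days, most likely 4 days, at most 10 days:
print(pert_estimates(2.0, 4.0, 10.0))   # (4.666..., 1.333...)
```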


Random variate generation

If ''X'' and ''Y'' are independent, with X \sim \Gamma(\alpha, \theta) and Y \sim \Gamma(\beta, \theta) then

:\frac{X}{X+Y} \sim \Beta(\alpha, \beta).

So one algorithm for generating beta variates is to generate \frac{X}{X+Y}, where ''X'' is a gamma variate with parameters (α, 1) and ''Y'' is an independent gamma variate with parameters (β, 1). In fact, here \frac{X}{X+Y} and X+Y are independent, and X+Y \sim \Gamma(\alpha + \beta, \theta). If Z \sim \Gamma(\gamma, \theta) and Z is independent of X and Y, then \frac{X+Y}{X+Y+Z} \sim \Beta(\alpha+\beta,\gamma) and \frac{X+Y}{X+Y+Z} is independent of \frac{X}{X+Y}. This shows that the product of independent \Beta(\alpha,\beta) and \Beta(\alpha+\beta,\gamma) random variables is a \Beta(\alpha,\beta+\gamma) random variable.

Also, the ''k''th order statistic of ''n'' uniformly distributed variates is \Beta(k, n+1-k), so an alternative if α and β are small integers is to generate α + β − 1 uniform variates and choose the α-th smallest.

Another way to generate the Beta distribution is by the Pólya urn model. According to this method, one starts with an "urn" with α "black" balls and β "white" balls and draws uniformly with replacement. Every trial an additional ball is added according to the color of the last ball which was drawn. Asymptotically, the proportion of black and white balls will be distributed according to the Beta distribution, where each repetition of the experiment will produce a different value.

It is also possible to use inverse transform sampling.
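The following sketch (an illustrative addition, assuming NumPy) implements the gamma-ratio method and, for small integer shape parameters, the order-statistic shortcut, and compares their sample means with the exact mean α/(α + β):

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, beta = 2, 3
size = 100_000

# Gamma-ratio method: X ~ Gamma(alpha, 1), Y ~ Gamma(beta, 1)  =>  X/(X + Y) ~ Beta(alpha, beta)
x = rng.gamma(alpha, size=size)
y = rng.gamma(beta, size=size)
gamma_ratio = x / (x + y)

# Order-statistic method for small integer alpha, beta:
# the alpha-th smallest of (alpha + beta - 1) uniforms is Beta(alpha, beta)
u = np.sort(rng.uniform(size=(size, alpha + beta - 1)), axis=1)
order_stat = u[:, alpha - 1]

print(gamma_ratio.mean(), order_stat.mean(), alpha / (alpha + beta))  # all close to 0.4
```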


History

Thomas Bayes, in a posthumous paper published in 1763 by Richard Price, obtained a beta distribution as the density of the probability of success in Bernoulli trials (see the section titled "Bayesian inference"), but the paper does not analyze any of the moments of the beta distribution or discuss any of its properties.

The first systematic modern discussion of the beta distribution is probably due to Karl Pearson. In Pearson's papers the beta distribution is couched as a solution of a differential equation: Pearson's Type I distribution, which is essentially identical to the beta distribution except for arbitrary shifting and re-scaling (the beta and Pearson Type I distributions can always be equalized by proper choice of parameters). In fact, in several English books and journal articles in the few decades prior to World War II, it was common to refer to the beta distribution as Pearson's Type I distribution. William P. Elderton in his 1906 monograph "Frequency curves and correlation" further analyzes the beta distribution as Pearson's Type I distribution, including a full discussion of the method of moments for the four parameter case, and diagrams of (what Elderton describes as) U-shaped, J-shaped, twisted J-shaped, "cocked-hat" shapes, horizontal and angled straight-line cases. Elderton wrote "I am chiefly indebted to Professor Pearson, but the indebtedness is of a kind for which it is impossible to offer formal thanks." Elderton in his 1906 monograph provides an impressive amount of information on the beta distribution, including equations for the origin of the distribution chosen to be the mode, as well as for other Pearson distributions: types I through VII. Elderton also included a number of appendixes, including one appendix ("II") on the beta and gamma functions. In later editions, Elderton added equations for the origin of the distribution chosen to be the mean, and analysis of Pearson distributions VIII through XII.

As remarked by Bowman and Shenton, "Fisher and Pearson had a difference of opinion in the approach to (parameter) estimation, in particular relating to (Pearson's method of) moments and (Fisher's method of) maximum likelihood in the case of the Beta distribution." Also according to Bowman and Shenton, "the case of a Type I (beta distribution) model being the center of the controversy was pure serendipity. A more difficult model of 4 parameters would have been hard to find." The long running public conflict of Fisher with Karl Pearson can be followed in a number of articles in prestigious journals. For example, concerning the estimation of the four parameters for the beta distribution, and Fisher's criticism of Pearson's method of moments as being arbitrary, see Pearson's article "Method of moments and method of maximum likelihood" (published three years after his retirement from University College, London, where his position had been divided between Fisher and Pearson's son Egon), in which Pearson writes "I read (Koshai's paper in the Journal of the Royal Statistical Society, 1933) which as far as I am aware is the only case at present published of the application of Professor Fisher's method. To my astonishment that method depends on first working out the constants of the frequency curve by the (Pearson) Method of Moments and then superposing on it, by what Fisher terms "the Method of Maximum Likelihood" a further approximation to obtain, what he holds, he will thus get, "more efficient values" of the curve constants."

David and Edwards's treatise on the history of statistics cites the first modern treatment of the beta distribution, in 1911, using the beta designation that has become standard, due to Corrado Gini, an Italian statistician, demographer, and sociologist, who developed the Gini coefficient. N. L. Johnson and S. Kotz, in their comprehensive and very informative monograph on leading historical personalities in statistical sciences, credit Corrado Gini as "an early Bayesian...who dealt with the problem of eliciting the parameters of an initial Beta distribution, by singling out techniques which anticipated the advent of the so-called empirical Bayes approach."


References


External links


"Beta_Distribution"
by_Fiona_Maclachlan,_the_Wolfram_Demonstrations_Project,_2007.
Beta_Distribution –_Overview_and_Example
_xycoon.com

_brighton-webs.co.uk

_exstrom.com * *
Harvard_University_Statistics_110_Lecture_23_Beta_Distribution,_Prof._Joe_Blitzstein
Mean absolute deviation around the mean

The mean absolute deviation around the mean for the beta distribution with shape parameters ''α'' and ''β'' is:

:\operatorname{E}[|X - E[X]|] = \frac{2 \alpha^\alpha \beta^\beta}{\Beta(\alpha,\beta)(\alpha + \beta)^{\alpha + \beta + 1}}

The mean absolute deviation around the mean is a more robust estimator of statistical dispersion than the standard deviation for beta distributions with tails and inflection points at each side of the mode, Beta(''α'', ''β'') distributions with ''α'',''β'' > 2, as it depends on the linear (absolute) deviations rather than the square deviations from the mean. Therefore, the effect of very large deviations from the mean are not as overly weighted.

Using Stirling's approximation to the Gamma function, N. L. Johnson and S. Kotz derived the following approximation for values of the shape parameters greater than unity (the relative error for this approximation is only −3.5% for ''α'' = ''β'' = 1, and it decreases to zero as ''α'' → ∞, ''β'' → ∞):

: \begin{align}
\frac{\text{mean abs. dev. from mean}}{\text{standard deviation}} &=\frac{\operatorname{E}[|X - E[X]|]}{\sqrt{\operatorname{var}(X)}}\\
&\approx \sqrt{\frac{2}{\pi}} \left(1+\frac{7}{12 (\alpha+\beta)}-\frac{1}{12 \alpha}-\frac{1}{12 \beta} \right), \text{ if } \alpha, \beta > 1.
\end{align}

At the limit α → ∞, β → ∞, the ratio of the mean absolute deviation to the standard deviation (for the beta distribution) becomes equal to the ratio of the same measures for the normal distribution: \sqrt{\frac{2}{\pi}}. For α = β = 1 this ratio equals \frac{\sqrt{3}}{2}, so that from α = β = 1 to α, β → ∞ the ratio decreases by 8.5%. For α = β = 0 the standard deviation is exactly equal to the mean absolute deviation around the mean. Therefore, this ratio decreases by 15% from α = β = 0 to α = β = 1, and by 25% from α = β = 0 to α, β → ∞. However, for skewed beta distributions such that α → 0 or β → 0, the ratio of the standard deviation to the mean absolute deviation approaches infinity (although each of them, individually, approaches zero) because the mean absolute deviation approaches zero faster than the standard deviation.

Using the parametrization in terms of mean μ and sample size ν = α + β > 0:

:α = μν, β = (1−μ)ν

one can express the mean absolute deviation around the mean in terms of the mean μ and the sample size ν as follows:

:\operatorname{E}[|X - E[X]|] = \frac{2 \mu^{\mu\nu}(1-\mu)^{(1-\mu)\nu}}{\nu\Beta(\mu \nu,(1-\mu)\nu)}

For a symmetric distribution, the mean is at the middle of the distribution, μ = 1/2, and therefore:

: \begin{align}
\operatorname{E}[|X - E[X]|] = \frac{2\,(\tfrac{1}{2})^{\nu}}{\nu\Beta(\tfrac{\nu}{2},\tfrac{\nu}{2})} &= \frac{2^{1-\nu}}{\nu\Beta(\tfrac{\nu}{2},\tfrac{\nu}{2})} \\
\lim_{\nu \to 0} \left (\lim_{\mu \to \frac{1}{2}} \operatorname{E}[|X - E[X]|] \right ) &= \tfrac{1}{2}\\
\lim_{\nu \to \infty} \left (\lim_{\mu \to \frac{1}{2}} \operatorname{E}[|X - E[X]|] \right ) &= 0
\end{align}

Also, the following limits (with only the noted variable approaching the limit) can be obtained from the above expressions:

: \begin{align}
\lim_{\beta\to 0} \operatorname{E}[|X - E[X]|] &=\lim_{\alpha \to 0} \operatorname{E}[|X - E[X]|]= 0 \\
\lim_{\beta\to \infty} \operatorname{E}[|X - E[X]|] &=\lim_{\alpha \to \infty} \operatorname{E}[|X - E[X]|] = 0\\
\lim_{\mu \to 0} \operatorname{E}[|X - E[X]|]&=\lim_{\mu \to 1} \operatorname{E}[|X - E[X]|] = 0\\
\lim_{\nu \to 0} \operatorname{E}[|X - E[X]|] &= 2\mu(1-\mu) \\
\lim_{\nu \to \infty} \operatorname{E}[|X - E[X]|] &= 0
\end{align}
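As a numerical check of the ratio discussed above (an illustrative sketch, assuming NumPy and SciPy; the exact ratio is obtained here by numerical integration rather than the closed form):

```python
import numpy as np
from scipy import stats

def mad_to_sd_ratio(a, b):
    """Ratio of the mean absolute deviation around the mean to the standard
    deviation for Beta(a, b), computed by numerical integration."""
    dist = stats.beta(a, b)
    mad = dist.expect(lambda x: abs(x - dist.mean()))
    return mad / dist.std()

def mad_to_sd_johnson_kotz(a, b):
    """Johnson-Kotz approximation quoted above, intended for a, b > 1."""
    return np.sqrt(2 / np.pi) * (1 + 7 / (12 * (a + b)) - 1 / (12 * a) - 1 / (12 * b))

for a, b in [(1, 1), (2, 3), (10, 10)]:
    print(a, b, mad_to_sd_ratio(a, b), mad_to_sd_johnson_kotz(a, b))
# At a = b = 1 the exact ratio is sqrt(3)/2; it tends to sqrt(2/pi) as a, b grow large
```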


Mean absolute difference

The mean absolute difference for the Beta distribution is: :\mathrm = \int_0^1 \int_0^1 f(x;\alpha,\beta)\,f(y;\alpha,\beta)\,, x-y, \,dx\,dy = \left(\frac\right)\frac The Gini coefficient for the Beta distribution is half of the relative mean absolute difference: :\mathrm = \left(\frac\right)\frac
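Because the mean absolute difference is an expectation over two independent copies of the variable, it is easy to estimate by simulation; the sketch below (illustrative Python, assuming NumPy; sample size and seed are arbitrary) also forms the Gini coefficient as half the relative mean absolute difference:

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = 2.0, 5.0
x = rng.beta(a, b, size=200_000)
y = rng.beta(a, b, size=200_000)
md = np.mean(np.abs(x - y))            # Monte Carlo estimate of E|X - Y| for independent copies
gini = md / (2 * (a / (a + b)))        # Gini coefficient = relative mean absolute difference / 2
print(md, gini)
```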


Skewness

The
skewness
(the third moment centered on the mean, normalized by the 3/2 power of the variance) of the beta distribution is :\gamma_1 =\frac = \frac . Letting α = β in the above expression one obtains γ1 = 0, showing once again that for α = β the distribution is symmetric and hence the skewness is zero. Positive skew (right-tailed) for α < β, negative skew (left-tailed) for α > β. Using the parametrization in terms of mean μ and sample size ν = α + β: : \begin \alpha & = \mu \nu ,\text\nu =(\alpha + \beta) >0\\ \beta & = (1 - \mu) \nu , \text\nu =(\alpha + \beta) >0. \end one can express the skewness in terms of the mean μ and the sample size ν as follows: :\gamma_1 =\frac = \frac. The skewness can also be expressed just in terms of the variance ''var'' and the mean μ as follows: :\gamma_1 =\frac = \frac\text \operatorname < \mu(1-\mu) The accompanying plot of skewness as a function of variance and mean shows that maximum variance (1/4) is coupled with zero skewness and the symmetry condition (μ = 1/2), and that maximum skewness (positive or negative infinity) occurs when the mean is located at one end or the other, so that the "mass" of the probability distribution is concentrated at the ends (minimum variance). The following expression for the square of the skewness, in terms of the sample size ν = α + β and the variance ''var'', is useful for the method of moments estimation of four parameters: :(\gamma_1)^2 =\frac = \frac\bigg(\frac-4(1+\nu)\bigg) This expression correctly gives a skewness of zero for α = β, since in that case (see ): \operatorname = \frac. For the symmetric case (α = β), skewness = 0 over the whole range, and the following limits apply: :\lim_ \gamma_1 = \lim_ \gamma_1 =\lim_ \gamma_1=\lim_ \gamma_1=\lim_ \gamma_1 = 0 For the asymmetric cases (α ≠ β) the following limits (with only the noted variable approaching the limit) can be obtained from the above expressions: : \begin &\lim_ \gamma_1 =\lim_ \gamma_1 = \infty\\ &\lim_ \gamma_1 = \lim_ \gamma_1= - \infty\\ &\lim_ \gamma_1 = -\frac,\quad \lim_(\lim_ \gamma_1) = -\infty,\quad \lim_(\lim_ \gamma_1) = 0\\ &\lim_ \gamma_1 = \frac,\quad \lim_(\lim_ \gamma_1) = \infty,\quad \lim_(\lim_ \gamma_1) = 0\\ &\lim_ \gamma_1 = \frac,\quad \lim_(\lim_ \gamma_1) = \infty,\quad \lim_(\lim_ \gamma_1) = - \infty \end
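A quick cross-check of the skewness formula against SciPy's built-in moments (a minimal sketch; the parameter pairs are arbitrary):

```python
import numpy as np
from scipy.stats import beta

def beta_skewness(a, b):
    # gamma_1 = 2 (beta - alpha) sqrt(alpha + beta + 1) / ((alpha + beta + 2) sqrt(alpha * beta))
    return 2 * (b - a) * np.sqrt(a + b + 1) / ((a + b + 2) * np.sqrt(a * b))

for a, b in [(2, 2), (2, 5), (5, 2), (0.5, 3)]:
    print((a, b), beta_skewness(a, b), float(beta.stats(a, b, moments='s')))
```

The sign pattern matches the statement above: positive skew for α < β, negative skew for α > β, zero for α = β.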


Kurtosis

The beta distribution has been applied in acoustic analysis to assess damage to gears, as its kurtosis has been reported to be a good indicator of a gear's condition. Kurtosis has also been used to distinguish the seismic signal generated by a person's footsteps from other signals: people and other targets moving on the ground produce continuous seismic waves, so different targets can be separated by the signals they generate, and kurtosis, being sensitive to impulsive signals, responds much more strongly to human footsteps than to signals from vehicles, wind, noise, etc. Unfortunately, the notation for kurtosis has not been standardized. Kenney and Keeping use the symbol γ2 for the
excess kurtosis
, but Abramowitz and Stegun use different terminology. To prevent confusion between kurtosis (the fourth moment centered on the mean, normalized by the square of the variance) and excess kurtosis, when using symbols, they will be spelled out as follows: :\begin \text &=\text - 3\\ &=\frac-3\\ &=\frac\\ &=\frac . \end Letting α = β in the above expression one obtains :\text =- \frac \text\alpha=\beta . Therefore, for symmetric beta distributions, the excess kurtosis is negative, increasing from a minimum value of −2 at the limit as → 0, and approaching a maximum value of zero as → ∞. The value of −2 is the minimum value of excess kurtosis that any distribution (not just beta distributions, but any distribution of any possible kind) can ever achieve. This minimum value is reached when all the probability density is entirely concentrated at each end ''x'' = 0 and ''x'' = 1, with nothing in between: a 2-point
Bernoulli distribution
with equal probability 1/2 at each end (a coin toss: see section below "Kurtosis bounded by the square of the skewness" for further discussion). The description of kurtosis as a measure of the "potential outliers" (or "potential rare, extreme values") of the probability distribution, is correct for all distributions including the beta distribution. When rare, extreme values can occur in the beta distribution, the higher its kurtosis; otherwise, the kurtosis is lower. For α ≠ β, skewed beta distributions, the excess kurtosis can reach unlimited positive values (particularly for α → 0 for finite β, or for β → 0 for finite α) because the side away from the mode will produce occasional extreme values. Minimum kurtosis takes place when the mass density is concentrated equally at each end (and therefore the mean is at the center), and there is no probability mass density in between the ends. Using the parametrization in terms of mean μ and sample size ν = α + β: : \begin \alpha & = \mu \nu ,\text\nu =(\alpha + \beta) >0\\ \beta & = (1 - \mu) \nu , \text\nu =(\alpha + \beta) >0. \end one can express the excess kurtosis in terms of the mean μ and the sample size ν as follows: :\text =\frac\bigg (\frac - 1 \bigg ) The excess kurtosis can also be expressed in terms of just the following two parameters: the variance ''var'', and the sample size ν as follows: :\text =\frac\left(\frac - 6 - 5 \nu \right)\text\text< \mu(1-\mu) and, in terms of the variance ''var'' and the mean μ as follows: :\text =\frac\text\text< \mu(1-\mu) The plot of excess kurtosis as a function of the variance and the mean shows that the minimum value of the excess kurtosis (−2, which is the minimum possible value for excess kurtosis for any distribution) is intimately coupled with the maximum value of variance (1/4) and the symmetry condition: the mean occurring at the midpoint (μ = 1/2). This occurs for the symmetric case of α = β = 0, with zero skewness. At the limit, this is the 2 point
Bernoulli distribution
with equal probability 1/2 at each Dirac delta function end ''x'' = 0 and ''x'' = 1 and zero probability everywhere else. (A coin toss: one face of the coin being ''x'' = 0 and the other face being ''x'' = 1.) Variance is maximum because the distribution is bimodal with nothing in between the two modes (spikes) at each end. Excess kurtosis is minimum: the probability density "mass" is zero at the mean and it is concentrated at the two peaks at each end. Excess kurtosis reaches the minimum possible value (for any distribution) when the probability density function has two spikes at each end: it is bi-"peaky" with nothing in between them. On the other hand, the plot shows that for extreme skewed cases, where the mean is located near one or the other end (μ = 0 or μ = 1), the variance is close to zero, and the excess kurtosis rapidly approaches infinity when the mean of the distribution approaches either end. Alternatively, the excess kurtosis can also be expressed in terms of just the following two parameters: the square of the skewness, and the sample size ν as follows: :\text =\frac\bigg(\frac (\text)^2 - 1\bigg)\text^2-2< \text< \frac (\text)^2 From this last expression, one can obtain the same limits published practically a century ago by Karl Pearson in his paper, for the beta distribution (see section below titled "Kurtosis bounded by the square of the skewness"). Setting α + β= ν = 0 in the above expression, one obtains Pearson's lower boundary (values for the skewness and excess kurtosis below the boundary (excess kurtosis + 2 − skewness2 = 0) cannot occur for any distribution, and hence Karl Pearson appropriately called the region below this boundary the "impossible region"). The limit of α + β = ν → ∞ determines Pearson's upper boundary. : \begin &\lim_\text = (\text)^2 - 2\\ &\lim_\text = \tfrac (\text)^2 \end therefore: :(\text)^2-2< \text< \tfrac (\text)^2 Values of ν = α + β such that ν ranges from zero to infinity, 0 < ν < ∞, span the whole region of the beta distribution in the plane of excess kurtosis versus squared skewness. For the symmetric case (α = β), the following limits apply: : \begin &\lim_ \text = - 2 \\ &\lim_ \text = 0 \\ &\lim_ \text = - \frac \end For the unsymmetric cases (α ≠ β) the following limits (with only the noted variable approaching the limit) can be obtained from the above expressions: : \begin &\lim_\text =\lim_ \text = \lim_\text = \lim_\text =\infty\\ &\lim_\text = \frac,\text \lim_(\lim_ \text) = \infty,\text \lim_(\lim_ \text) = 0\\ &\lim_\text = \frac,\text \lim_(\lim_ \text) = \infty,\text \lim_(\lim_ \text) = 0\\ &\lim_ \text = - 6 + \frac,\text \lim_(\lim_ \text) = \infty,\text \lim_(\lim_ \text) = \infty \end
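The closed form for the excess kurtosis can likewise be checked against SciPy (an illustrative sketch; parameter values arbitrary), including the symmetric special case −6/(2α + 3):

```python
from scipy.stats import beta

def beta_excess_kurtosis(a, b):
    # 6 [ (a - b)^2 (a + b + 1) - a b (a + b + 2) ] / ( a b (a + b + 2) (a + b + 3) )
    num = 6 * ((a - b) ** 2 * (a + b + 1) - a * b * (a + b + 2))
    return num / (a * b * (a + b + 2) * (a + b + 3))

for a, b in [(0.5, 0.5), (1, 1), (2, 2), (2, 5)]:
    print((a, b), beta_excess_kurtosis(a, b), float(beta.stats(a, b, moments='k')))
print(beta_excess_kurtosis(3, 3), -6 / (2 * 3 + 3))   # symmetric case: -6/(2*alpha + 3)
```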


Characteristic function

The Characteristic function (probability theory), characteristic function is the Fourier transform of the probability density function. The characteristic function of the beta distribution is confluent hypergeometric function, Kummer's confluent hypergeometric function (of the first kind): :\begin \varphi_X(\alpha;\beta;t) &= \operatorname\left[e^\right]\\ &= \int_0^1 e^ f(x;\alpha,\beta) dx \\ &=_1F_1(\alpha; \alpha+\beta; it)\!\\ &=\sum_^\infty \frac \\ &= 1 +\sum_^ \left( \prod_^ \frac \right) \frac \end where : x^=x(x+1)(x+2)\cdots(x+n-1) is the rising factorial, also called the "Pochhammer symbol". The value of the characteristic function for ''t'' = 0, is one: : \varphi_X(\alpha;\beta;0)=_1F_1(\alpha; \alpha+\beta; 0) = 1 . Also, the real and imaginary parts of the characteristic function enjoy the following symmetries with respect to the origin of variable ''t'': : \textrm \left [ _1F_1(\alpha; \alpha+\beta; it) \right ] = \textrm \left [ _1F_1(\alpha; \alpha+\beta; - it) \right ] : \textrm \left [ _1F_1(\alpha; \alpha+\beta; it) \right ] = - \textrm \left [ _1F_1(\alpha; \alpha+\beta; - it) \right ] The symmetric case α = β simplifies the characteristic function of the beta distribution to a Bessel function, since in the special case α + β = 2α the confluent hypergeometric function (of the first kind) reduces to a Bessel function (the modified Bessel function of the first kind I_ ) using Ernst Kummer, Kummer's second transformation as follows: Another example of the symmetric case α = β = n/2 for beamforming applications can be found in Figure 11 of :\begin _1F_1(\alpha;2\alpha; it) &= e^ _0F_1 \left(; \alpha+\tfrac; \frac \right) \\ &= e^ \left(\frac\right)^ \Gamma\left(\alpha+\tfrac\right) I_\left(\frac\right).\end In the accompanying plots, the Complex number, real part (Re) of the Characteristic function (probability theory), characteristic function of the beta distribution is displayed for symmetric (α = β) and skewed (α ≠ β) cases.
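The defining integral can be evaluated numerically; the sketch below (illustrative Python with SciPy) computes the real and imaginary parts of E[e^{itX}] by quadrature and illustrates φX(0) = 1 and the stated symmetries in t:

```python
import numpy as np
from scipy.stats import beta
from scipy.integrate import quad

def beta_cf(t, a, b):
    # E[exp(i t X)] for X ~ Beta(a, b), by numerical integration of the real and imaginary parts
    re, _ = quad(lambda x: np.cos(t * x) * beta.pdf(x, a, b), 0, 1)
    im, _ = quad(lambda x: np.sin(t * x) * beta.pdf(x, a, b), 0, 1)
    return complex(re, im)

a, b = 2.0, 3.0
print(beta_cf(0.0, a, b))                       # 1 + 0j
print(beta_cf(2.5, a, b), beta_cf(-2.5, a, b))  # complex conjugates: Re is even in t, Im is odd in t
```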


Other moments


Moment generating function

It also follows that the moment generating function is :\begin M_X(\alpha; \beta; t) &= \operatorname\left[e^\right] \\ pt&= \int_0^1 e^ f(x;\alpha,\beta)\,dx \\ pt&= _1F_1(\alpha; \alpha+\beta; t) \\ pt&= \sum_^\infty \frac \frac \\ pt&= 1 +\sum_^ \left( \prod_^ \frac \right) \frac \end In particular ''M''''X''(''α''; ''β''; 0) = 1.
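As a check, Kummer's confluent hypergeometric function from SciPy can be compared with direct numerical integration of E[e^{tX}] (a minimal sketch; parameter values arbitrary):

```python
import numpy as np
from scipy.stats import beta
from scipy.special import hyp1f1
from scipy.integrate import quad

a, b, t = 2.0, 3.0, 1.5
mgf_series = hyp1f1(a, a + b, t)   # 1F1(alpha; alpha + beta; t)
mgf_quad, _ = quad(lambda x: np.exp(t * x) * beta.pdf(x, a, b), 0, 1)
print(mgf_series, mgf_quad)        # the two values should agree
```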


Higher moments

Using the moment generating function, the ''k''-th raw moment is given by the factor :\prod_^ \frac multiplying the (exponential series) term \left(\frac\right) in the series of the moment generating function :\operatorname[X^k]= \frac = \prod_^ \frac where (''x'')(''k'') is a Pochhammer symbol representing rising factorial. It can also be written in a recursive form as :\operatorname[X^k] = \frac\operatorname[X^]. Since the moment generating function M_X(\alpha; \beta; \cdot) has a positive radius of convergence, the beta distribution is Moment problem, determined by its moments.
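The product formula (and, implicitly, the recursion for raw moments) is easy to check against SciPy's moment routine (an illustrative sketch):

```python
from scipy.stats import beta

def raw_moment(a, b, k):
    # E[X^k] = prod_{r=0}^{k-1} (a + r) / (a + b + r)  (a ratio of rising factorials)
    m = 1.0
    for r in range(k):
        m *= (a + r) / (a + b + r)
    return m

a, b = 2.0, 5.0
for k in range(1, 5):
    print(k, raw_moment(a, b, k), beta.moment(k, a, b))
```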


Moments of transformed random variables


=Moments of linearly transformed, product and inverted random variables

= One can also show the following expectations for a transformed random variable, where the random variable ''X'' is Beta-distributed with parameters α and β: ''X'' ~ Beta(α, β). The expected value of the variable 1 − ''X'' is the mirror-symmetry of the expected value based on ''X'': :\begin & \operatorname[1-X] = \frac \\ & \operatorname[X (1-X)] =\operatorname[(1-X)X ] =\frac \end Due to the mirror-symmetry of the probability density function of the beta distribution, the variances based on variables ''X'' and 1 − ''X'' are identical, and the covariance on ''X''(1 − ''X'' is the negative of the variance: :\operatorname[(1-X)]=\operatorname[X] = -\operatorname[X,(1-X)]= \frac These are the expected values for inverted variables, (these are related to the harmonic means, see ): :\begin & \operatorname \left [\frac \right ] = \frac \text \alpha > 1\\ & \operatorname\left [\frac \right ] =\frac \text \beta > 1 \end The following transformation by dividing the variable ''X'' by its mirror-image ''X''/(1 − ''X'') results in the expected value of the "inverted beta distribution" or
beta prime distribution
(also known as beta distribution of the second kind or Pearson distribution, Pearson's Type VI): : \begin & \operatorname\left[\frac\right] =\frac \text\beta > 1\\ & \operatorname\left[\frac\right] =\frac\text\alpha > 1 \end Variances of these transformed variables can be obtained by integration, as the expected values of the second moments centered on the corresponding variables: :\operatorname \left[\frac \right] =\operatorname\left[\left(\frac - \operatorname\left[\frac \right ] \right )^2\right]= :\operatorname\left [\frac \right ] =\operatorname \left [\left (\frac - \operatorname\left [\frac \right ] \right )^2 \right ]= \frac \text\alpha > 2 The following variance of the variable ''X'' divided by its mirror-image (''X''/(1−''X'') results in the variance of the "inverted beta distribution" or
beta prime distribution
(also known as beta distribution of the second kind or Pearson distribution, Pearson's Type VI): :\operatorname \left [\frac \right ] =\operatorname \left [\left(\frac - \operatorname \left [\frac \right ] \right)^2 \right ]=\operatorname \left [\frac \right ] = :\operatorname \left [\left (\frac - \operatorname \left [\frac \right ] \right )^2 \right ]= \frac \text\beta > 2 The covariances are: :\operatorname\left [\frac,\frac \right ] = \operatorname\left[\frac,\frac \right] =\operatorname\left[\frac,\frac\right ] = \operatorname\left[\frac,\frac \right] =\frac \text \alpha, \beta > 1 These expectations and variances appear in the four-parameter Fisher information matrix (.)
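A Monte Carlo check of two of these expectations (a sketch assuming NumPy; sample size and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
a, b = 3.0, 4.0
x = rng.beta(a, b, size=500_000)
print(np.mean(1 / x),       (a + b - 1) / (a - 1))   # E[1/X], valid for alpha > 1
print(np.mean(x / (1 - x)), a / (b - 1))             # mean of the "inverted beta" (beta prime), valid for beta > 1
```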


=Moments of logarithmically transformed random variables

= Expected values for Logarithm transformation, logarithmic transformations (useful for maximum likelihood estimates, see ) are discussed in this section. The following logarithmic linear transformations are related to the geometric means ''GX'' and ''G''(1−''X'') (see ): :\begin \operatorname[\ln(X)] &= \psi(\alpha) - \psi(\alpha + \beta)= - \operatorname\left[\ln \left (\frac \right )\right],\\ \operatorname[\ln(1-X)] &=\psi(\beta) - \psi(\alpha + \beta)= - \operatorname \left[\ln \left (\frac \right )\right]. \end Where the
digamma function
ψ(α) is defined as the logarithmic derivative of the
gamma function
: :\psi(\alpha) = \frac Logit transformations are interesting, as they usually transform various shapes (including J-shapes) into (usually skewed) bell-shaped densities over the logit variable, and they may remove the end singularities over the original variable: :\begin \operatorname\left[\ln \left (\frac \right ) \right] &=\psi(\alpha) - \psi(\beta)= \operatorname[\ln(X)] +\operatorname \left[\ln \left (\frac \right) \right],\\ \operatorname\left [\ln \left (\frac \right ) \right ] &=\psi(\beta) - \psi(\alpha)= - \operatorname \left[\ln \left (\frac \right) \right] . \end Johnson considered the distribution of the logit - transformed variable ln(''X''/1−''X''), including its moment generating function and approximations for large values of the shape parameters. This transformation extends the finite support
[0, 1]
based on the original variable ''X'' to infinite support in both directions of the real line (−∞, +∞). Higher order logarithmic moments can be derived by using the representation of a beta distribution as a proportion of two Gamma distributions and differentiating through the integral. They can be expressed in terms of higher order poly-gamma functions as follows: :\begin \operatorname \left [\ln^2(X) \right ] &= (\psi(\alpha) - \psi(\alpha + \beta))^2+\psi_1(\alpha)-\psi_1(\alpha+\beta), \\ \operatorname \left [\ln^2(1-X) \right ] &= (\psi(\beta) - \psi(\alpha + \beta))^2+\psi_1(\beta)-\psi_1(\alpha+\beta), \\ \operatorname \left [\ln (X)\ln(1-X) \right ] &=(\psi(\alpha) - \psi(\alpha + \beta))(\psi(\beta) - \psi(\alpha + \beta)) -\psi_1(\alpha+\beta). \end therefore the
variance
of the logarithmic variables and
covariance
of ln(''X'') and ln(1−''X'') are: :\begin \operatorname[\ln(X), \ln(1-X)] &= \operatorname\left[\ln(X)\ln(1-X)\right] - \operatorname[\ln(X)]\operatorname[\ln(1-X)] = -\psi_1(\alpha+\beta) \\ & \\ \operatorname[\ln X] &= \operatorname[\ln^2(X)] - (\operatorname[\ln(X)])^2 \\ &= \psi_1(\alpha) - \psi_1(\alpha + \beta) \\ &= \psi_1(\alpha) + \operatorname[\ln(X), \ln(1-X)] \\ & \\ \operatorname ln (1-X)&= \operatorname[\ln^2 (1-X)] - (\operatorname[\ln (1-X)])^2 \\ &= \psi_1(\beta) - \psi_1(\alpha + \beta) \\ &= \psi_1(\beta) + \operatorname[\ln (X), \ln(1-X)] \end where the
trigamma function
, denoted ψ1(α), is the second of the
polygamma function
s, and is defined as the derivative of the digamma function: :\psi_1(\alpha) = \frac= \frac. The variances and covariance of the logarithmically transformed variables ''X'' and (1−''X'') are different, in general, because the logarithmic transformation destroys the mirror-symmetry of the original variables ''X'' and (1−''X''), as the logarithm approaches negative infinity for the variable approaching zero. These logarithmic variances and covariance are the elements of the
Fisher information
matrix for the beta distribution. They are also a measure of the curvature of the log likelihood function (see section on Maximum likelihood estimation). The variances of the log inverse variables are identical to the variances of the log variables: :\begin \operatorname\left[\ln \left (\frac \right ) \right] & =\operatorname[\ln(X)] = \psi_1(\alpha) - \psi_1(\alpha + \beta), \\ \operatorname\left[\ln \left (\frac \right ) \right] &=\operatorname ln (1-X)= \psi_1(\beta) - \psi_1(\alpha + \beta), \\ \operatorname\left[\ln \left (\frac \right), \ln \left (\frac\right ) \right] &=\operatorname[\ln(X),\ln(1-X)]= -\psi_1(\alpha + \beta).\end It also follows that the variances of the logit transformed variables are: :\operatorname\left[\ln \left (\frac \right )\right]=\operatorname\left[\ln \left (\frac \right ) \right]=-\operatorname\left [\ln \left (\frac \right ), \ln \left (\frac \right ) \right]= \psi_1(\alpha) + \psi_1(\beta)
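These digamma/trigamma expressions can be verified by simulation (an illustrative sketch assuming NumPy and SciPy):

```python
import numpy as np
from scipy.special import digamma, polygamma

rng = np.random.default_rng(2)
a, b = 2.0, 3.0
x = rng.beta(a, b, size=500_000)

print(np.mean(np.log(x)), digamma(a) - digamma(a + b))            # E[ln X]
print(np.var(np.log(x)),  polygamma(1, a) - polygamma(1, a + b))  # var[ln X] via the trigamma function
print(np.cov(np.log(x), np.log(1 - x))[0, 1], -polygamma(1, a + b))  # cov[ln X, ln(1 - X)]
```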


Quantities of information (entropy)

Given a beta distributed random variable, ''X'' ~ Beta(''α'', ''β''), the information entropy, differential entropy of ''X'' is (measured in Nat (unit), nats), the expected value of the negative of the logarithm of the
probability density function
: :\begin h(X) &= \operatorname[-\ln(f(x;\alpha,\beta))] \\ pt&=\int_0^1 -f(x;\alpha,\beta)\ln(f(x;\alpha,\beta)) \, dx \\ pt&= \ln(\Beta(\alpha,\beta))-(\alpha-1)\psi(\alpha)-(\beta-1)\psi(\beta)+(\alpha+\beta-2) \psi(\alpha+\beta) \end where ''f''(''x''; ''α'', ''β'') is the
probability density function
of the beta distribution: :f(x;\alpha,\beta) = \frac x^(1-x)^ The
digamma function
''ψ'' appears in the formula for the differential entropy as a consequence of Euler's integral formula for the harmonic numbers which follows from the integral: :\int_0^1 \frac \, dx = \psi(\alpha)-\psi(1) The information entropy, differential entropy of the beta distribution is negative for all values of ''α'' and ''β'' greater than zero, except at ''α'' = ''β'' = 1 (for which values the beta distribution is the same as the Uniform distribution (continuous), uniform distribution), where the information entropy, differential entropy reaches its Maxima and minima, maximum value of zero. It is to be expected that the maximum entropy should take place when the beta distribution becomes equal to the uniform distribution, since uncertainty is maximal when all possible events are equiprobable. For ''α'' or ''β'' approaching zero, the information entropy, differential entropy approaches its Maxima and minima, minimum value of negative infinity. For (either or both) ''α'' or ''β'' approaching zero, there is a maximum amount of order: all the probability density is concentrated at the ends, and there is zero probability density at points located between the ends. Similarly for (either or both) ''α'' or ''β'' approaching infinity, the differential entropy approaches its minimum value of negative infinity, and a maximum amount of order. If either ''α'' or ''β'' approaches infinity (and the other is finite) all the probability density is concentrated at an end, and the probability density is zero everywhere else. If both shape parameters are equal (the symmetric case), ''α'' = ''β'', and they approach infinity simultaneously, the probability density becomes a spike ( Dirac delta function) concentrated at the middle ''x'' = 1/2, and hence there is 100% probability at the middle ''x'' = 1/2 and zero probability everywhere else. The (continuous case) information entropy, differential entropy was introduced by Shannon in his original paper (where he named it the "entropy of a continuous distribution"), as the concluding part of the same paper where he defined the information entropy, discrete entropy. It is known since then that the differential entropy may differ from the infinitesimal limit of the discrete entropy by an infinite offset, therefore the differential entropy can be negative (as it is for the beta distribution). What really matters is the relative value of entropy. Given two beta distributed random variables, ''X''1 ~ Beta(''α'', ''β'') and ''X''2 ~ Beta(''α''′, ''β''′), the cross entropy is (measured in nats) :\begin H(X_1,X_2) &= \int_0^1 - f(x;\alpha,\beta) \ln (f(x;\alpha',\beta')) \,dx \\ pt&= \ln \left(\Beta(\alpha',\beta')\right)-(\alpha'-1)\psi(\alpha)-(\beta'-1)\psi(\beta)+(\alpha'+\beta'-2)\psi(\alpha+\beta). \end The cross entropy has been used as an error metric to measure the distance between two hypotheses. Its absolute value is minimum when the two distributions are identical. It is the information measure most closely related to the log maximum likelihood (see section on "Parameter estimation. Maximum likelihood estimation")). The relative entropy, or Kullback–Leibler divergence ''D''KL(''X''1 , , ''X''2), is a measure of the inefficiency of assuming that the distribution is ''X''2 ~ Beta(''α''′, ''β''′) when the distribution is really ''X''1 ~ Beta(''α'', ''β''). It is defined as follows (measured in nats). 
:\begin D_(X_1, , X_2) &= \int_0^1 f(x;\alpha,\beta) \ln \left (\frac \right ) \, dx \\ pt&= \left (\int_0^1 f(x;\alpha,\beta) \ln (f(x;\alpha,\beta)) \,dx \right )- \left (\int_0^1 f(x;\alpha,\beta) \ln (f(x;\alpha',\beta')) \, dx \right )\\ pt&= -h(X_1) + H(X_1,X_2)\\ pt&= \ln\left(\frac\right)+(\alpha-\alpha')\psi(\alpha)+(\beta-\beta')\psi(\beta)+(\alpha'-\alpha+\beta'-\beta)\psi (\alpha + \beta). \end The relative entropy, or Kullback–Leibler divergence, is always non-negative. A few numerical examples follow: *''X''1 ~ Beta(1, 1) and ''X''2 ~ Beta(3, 3); ''D''KL(''X''1 , , ''X''2) = 0.598803; ''D''KL(''X''2 , , ''X''1) = 0.267864; ''h''(''X''1) = 0; ''h''(''X''2) = −0.267864 *''X''1 ~ Beta(3, 0.5) and ''X''2 ~ Beta(0.5, 3); ''D''KL(''X''1 , , ''X''2) = 7.21574; ''D''KL(''X''2 , , ''X''1) = 7.21574; ''h''(''X''1) = −1.10805; ''h''(''X''2) = −1.10805. The Kullback–Leibler divergence is not symmetric ''D''KL(''X''1 , , ''X''2) ≠ ''D''KL(''X''2 , , ''X''1) for the case in which the individual beta distributions Beta(1, 1) and Beta(3, 3) are symmetric, but have different entropies ''h''(''X''1) ≠ ''h''(''X''2). The value of the Kullback divergence depends on the direction traveled: whether going from a higher (differential) entropy to a lower (differential) entropy or the other way around. In the numerical example above, the Kullback divergence measures the inefficiency of assuming that the distribution is (bell-shaped) Beta(3, 3), rather than (uniform) Beta(1, 1). The "h" entropy of Beta(1, 1) is higher than the "h" entropy of Beta(3, 3) because the uniform distribution Beta(1, 1) has a maximum amount of disorder. The Kullback divergence is more than two times higher (0.598803 instead of 0.267864) when measured in the direction of decreasing entropy: the direction that assumes that the (uniform) Beta(1, 1) distribution is (bell-shaped) Beta(3, 3) rather than the other way around. In this restricted sense, the Kullback divergence is consistent with the second law of thermodynamics. The Kullback–Leibler divergence is symmetric ''D''KL(''X''1 , , ''X''2) = ''D''KL(''X''2 , , ''X''1) for the skewed cases Beta(3, 0.5) and Beta(0.5, 3) that have equal differential entropy ''h''(''X''1) = ''h''(''X''2). The symmetry condition: :D_(X_1, , X_2) = D_(X_2, , X_1),\texth(X_1) = h(X_2),\text\alpha \neq \beta follows from the above definitions and the mirror-symmetry ''f''(''x''; ''α'', ''β'') = ''f''(1−''x''; ''α'', ''β'') enjoyed by the beta distribution.
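The numerical examples above can be reproduced directly from the closed forms; in the sketch below (illustrative Python, with a hypothetical helper kl_beta), SciPy supplies the differential entropy and the Kullback–Leibler divergence is written out from the formula:

```python
from scipy.stats import beta
from scipy.special import betaln, digamma

def kl_beta(a1, b1, a2, b2):
    # D_KL( Beta(a1, b1) || Beta(a2, b2) ), in nats
    return (betaln(a2, b2) - betaln(a1, b1)
            + (a1 - a2) * digamma(a1)
            + (b1 - b2) * digamma(b1)
            + (a2 - a1 + b2 - b1) * digamma(a1 + b1))

print(beta.entropy(1, 1), beta.entropy(3, 3))            # 0 and about -0.267864
print(kl_beta(1, 1, 3, 3), kl_beta(3, 3, 1, 1))          # about 0.598803 and 0.267864
print(kl_beta(3, 0.5, 0.5, 3), kl_beta(0.5, 3, 3, 0.5))  # symmetric skewed case, both about 7.21574
```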


Relationships between statistical measures


Mean, mode and median relationship

If 1 < α < β then mode ≤ median ≤ mean. Kerman J (2011) "A closed-form approximation for the median of the beta distribution". Expressing the mode (only for α, β > 1) and the mean in terms of α and β: : \frac{\alpha - 1}{\alpha + \beta - 2} \le \text{median} \le \frac{\alpha}{\alpha + \beta} , If 1 < β < α then the order of the inequalities is reversed. For α, β > 1 the absolute distance between the mean and the median is less than 5% of the distance between the maximum and minimum values of ''x''. On the other hand, the absolute distance between the mean and the mode can reach 50% of the distance between the maximum and minimum values of ''x'' for the (Pathological (mathematics), pathological) case of α = 1 and β = 1, for which values the beta distribution approaches the uniform distribution and the information entropy, differential entropy approaches its Maxima and minima, maximum value, and hence maximum "disorder". For example, for α = 1.0001 and β = 1.00000001: * mode = 0.9999; PDF(mode) = 1.00010 * mean = 0.500025; PDF(mean) = 1.00003 * median = 0.500035; PDF(median) = 1.00003 * mean − mode = −0.499875 * mean − median = −9.65538 × 10−6 where PDF stands for the value of the
probability density function
.
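The ordering, and the quality of Kerman's closed-form median approximation (α − 1/3)/(α + β − 2/3), can be checked numerically (a minimal sketch; the parameter choice 1 < α < β is deliberate):

```python
from scipy.stats import beta

a, b = 2.0, 5.0                       # 1 < alpha < beta, so mode <= median <= mean is expected
mode = (a - 1) / (a + b - 2)
mean = a / (a + b)
median = beta.median(a, b)
kerman = (a - 1/3) / (a + b - 2/3)    # Kerman's closed-form approximation, for alpha, beta > 1
print(mode, median, mean)             # increasing order
print(median, kerman)                 # the approximation is close to the exact median
```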


Mean, geometric mean and harmonic mean relationship

It is known from the inequality of arithmetic and geometric means that the geometric mean is lower than the mean. Similarly, the harmonic mean is lower than the geometric mean. The accompanying plot shows that for α = β, both the mean and the median are exactly equal to 1/2, regardless of the value of α = β, and the mode is also equal to 1/2 for α = β > 1, however the geometric and harmonic means are lower than 1/2 and they only approach this value asymptotically as α = β → ∞.
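For the symmetric case these three means are easy to compute explicitly (a sketch using the digamma expression for the geometric mean and E[1/X] for the harmonic mean, both valid here since α = β > 1):

```python
import numpy as np
from scipy.special import digamma

a = b = 3.0
mean = a / (a + b)                               # = 1/2 in the symmetric case
geometric = np.exp(digamma(a) - digamma(a + b))  # G_X = exp(E[ln X])
harmonic = (a - 1) / (a + b - 1)                 # H_X = 1 / E[1/X], valid for alpha > 1
print(harmonic, geometric, mean)                 # harmonic <= geometric <= mean
```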


Kurtosis bounded by the square of the skewness

As remarked by William Feller, Feller, in the Pearson distribution, Pearson system the beta probability density appears as Pearson distribution, type I (any difference between the beta distribution and Pearson's type I distribution is only superficial and it makes no difference for the following discussion regarding the relationship between kurtosis and skewness). Karl Pearson showed, in Plate 1 of his paper published in 1916, a graph with the kurtosis as the vertical axis (ordinate) and the square of the
skewness
as the horizontal axis (abscissa), in which a number of distributions were displayed. The region occupied by the beta distribution is bounded by the following two Line (geometry), lines in the (skewness2,kurtosis) Cartesian coordinate system, plane, or the (skewness2,excess kurtosis) Cartesian coordinate system, plane: :(\text)^2+1< \text< \frac (\text)^2 + 3 or, equivalently, :(\text)^2-2< \text< \frac (\text)^2 At a time when there were no powerful digital computers, Karl Pearson accurately computed further boundaries, for example, separating the "U-shaped" from the "J-shaped" distributions. The lower boundary line (excess kurtosis + 2 − skewness2 = 0) is produced by skewed "U-shaped" beta distributions with both values of shape parameters α and β close to zero. The upper boundary line (excess kurtosis − (3/2) skewness2 = 0) is produced by extremely skewed distributions with very large values of one of the parameters and very small values of the other parameter. Karl Pearson showed that this upper boundary line (excess kurtosis − (3/2) skewness2 = 0) is also the intersection with Pearson's distribution III, which has unlimited support in one direction (towards positive infinity), and can be bell-shaped or J-shaped. His son, Egon Pearson, showed that the region (in the kurtosis/squared-skewness plane) occupied by the beta distribution (equivalently, Pearson's distribution I) as it approaches this boundary (excess kurtosis − (3/2) skewness2 = 0) is shared with the noncentral chi-squared distribution. Karl Pearson (Pearson 1895, pp. 357, 360, 373–376) also showed that the gamma distribution is a Pearson type III distribution. Hence this boundary line for Pearson's type III distribution is known as the gamma line. (This can be shown from the fact that the excess kurtosis of the gamma distribution is 6/''k'' and the square of the skewness is 4/''k'', hence (excess kurtosis − (3/2) skewness2 = 0) is identically satisfied by the gamma distribution regardless of the value of the parameter "k"). Pearson later noted that the chi-squared distribution is a special case of Pearson's type III and also shares this boundary line (as it is apparent from the fact that for the chi-squared distribution the excess kurtosis is 12/''k'' and the square of the skewness is 8/''k'', hence (excess kurtosis − (3/2) skewness2 = 0) is identically satisfied regardless of the value of the parameter "k"). This is to be expected, since the chi-squared distribution ''X'' ~ χ2(''k'') is a special case of the gamma distribution, with parametrization X ~ Γ(k/2, 1/2) where k is a positive integer that specifies the "number of degrees of freedom" of the chi-squared distribution. An example of a beta distribution near the upper boundary (excess kurtosis − (3/2) skewness2 = 0) is given by α = 0.1, β = 1000, for which the ratio (excess kurtosis)/(skewness2) = 1.49835 approaches the upper limit of 1.5 from below. An example of a beta distribution near the lower boundary (excess kurtosis + 2 − skewness2 = 0) is given by α= 0.0001, β = 0.1, for which values the expression (excess kurtosis + 2)/(skewness2) = 1.01621 approaches the lower limit of 1 from above. In the infinitesimal limit for both α and β approaching zero symmetrically, the excess kurtosis reaches its minimum value at −2. This minimum value occurs at the point at which the lower boundary line intersects the vertical axis (ordinate). 
(However, in Pearson's original chart, the ordinate is kurtosis, instead of excess kurtosis, and it increases downwards rather than upwards). Values for the skewness and excess kurtosis below the lower boundary (excess kurtosis + 2 − skewness2 = 0) cannot occur for any distribution, and hence Karl Pearson appropriately called the region below this boundary the "impossible region". The boundary for this "impossible region" is determined by (symmetric or skewed) bimodal "U"-shaped distributions for which the parameters α and β approach zero and hence all the probability density is concentrated at the ends: ''x'' = 0, 1 with practically nothing in between them. Since for α ≈ β ≈ 0 the probability density is concentrated at the two ends ''x'' = 0 and ''x'' = 1, this "impossible boundary" is determined by a
Bernoulli distribution
, where the two only possible outcomes occur with respective probabilities ''p'' and ''q'' = 1−''p''. For cases approaching this limit boundary with symmetry α = β, skewness ≈ 0, excess kurtosis ≈ −2 (this is the lowest excess kurtosis possible for any distribution), and the probabilities are ''p'' ≈ ''q'' ≈ 1/2. For cases approaching this limit boundary with skewness, excess kurtosis ≈ −2 + skewness2, and the probability density is concentrated more at one end than the other end (with practically nothing in between), with probabilities p = \tfrac at the left end ''x'' = 0 and q = 1-p = \tfrac at the right end ''x'' = 1.
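The two boundary examples quoted above can be reproduced with SciPy's moment routines (an illustrative check):

```python
from scipy.stats import beta

for a, b in [(0.1, 1000.0), (0.0001, 0.1)]:
    skew, exkurt = beta.stats(a, b, moments='sk')
    print((a, b),
          float(exkurt / skew**2),        # approaches 3/2 near the upper (gamma) boundary
          float((exkurt + 2) / skew**2))  # approaches 1 near the lower ("impossible") boundary
```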


Symmetry

All statements are conditional on α, β > 0 * Probability density function Symmetry, reflection symmetry ::f(x;\alpha,\beta) = f(1-x;\beta,\alpha) * Cumulative distribution function Symmetry, reflection symmetry plus unitary Symmetry, translation ::F(x;\alpha,\beta) = I_x(\alpha,\beta) = 1- F(1- x;\beta,\alpha) = 1 - I_(\beta,\alpha) * Mode Symmetry, reflection symmetry plus unitary Symmetry, translation ::\operatorname(\Beta(\alpha, \beta))= 1-\operatorname(\Beta(\beta, \alpha)),\text\Beta(\beta, \alpha)\ne \Beta(1,1) * Median Symmetry, reflection symmetry plus unitary Symmetry, translation ::\operatorname (\Beta(\alpha, \beta) )= 1 - \operatorname (\Beta(\beta, \alpha)) * Mean Symmetry, reflection symmetry plus unitary Symmetry, translation ::\mu (\Beta(\alpha, \beta) )= 1 - \mu (\Beta(\beta, \alpha) ) * Geometric Means each is individually asymmetric, the following symmetry applies between the geometric mean based on ''X'' and the geometric mean based on its
reflection
(1-X) ::G_X (\Beta(\alpha, \beta) )=G_(\Beta(\beta, \alpha) ) * Harmonic means each is individually asymmetric, the following symmetry applies between the harmonic mean based on ''X'' and the harmonic mean based on its
reflection
(1-X) ::H_X (\Beta(\alpha, \beta) )=H_(\Beta(\beta, \alpha) ) \text \alpha, \beta > 1 . * Variance symmetry ::\operatorname (\Beta(\alpha, \beta) )=\operatorname (\Beta(\beta, \alpha) ) * Geometric variances each is individually asymmetric, the following symmetry applies between the log geometric variance based on X and the log geometric variance based on its
reflection
(1-X) ::\ln(\operatorname (\Beta(\alpha, \beta))) = \ln(\operatorname(\Beta(\beta, \alpha))) * Geometric covariance symmetry ::\ln \operatorname(\Beta(\alpha, \beta))=\ln \operatorname(\Beta(\beta, \alpha)) * Mean absolute deviation around the mean symmetry ::\operatorname[, X - E ] (\Beta(\alpha, \beta))=\operatorname[, X - E ] (\Beta(\beta, \alpha)) * Skewness Symmetry (mathematics), skew-symmetry ::\operatorname (\Beta(\alpha, \beta) )= - \operatorname (\Beta(\beta, \alpha) ) * Excess kurtosis symmetry ::\text (\Beta(\alpha, \beta) )= \text (\Beta(\beta, \alpha) ) * Characteristic function symmetry of Real part (with respect to the origin of variable "t") :: \text [_1F_1(\alpha; \alpha+\beta; it) ] = \text [ _1F_1(\alpha; \alpha+\beta; - it)] * Characteristic function Symmetry (mathematics), skew-symmetry of Imaginary part (with respect to the origin of variable "t") :: \text [_1F_1(\alpha; \alpha+\beta; it) ] = - \text [ _1F_1(\alpha; \alpha+\beta; - it) ] * Characteristic function symmetry of Absolute value (with respect to the origin of variable "t") :: \text [ _1F_1(\alpha; \alpha+\beta; it) ] = \text [ _1F_1(\alpha; \alpha+\beta; - it) ] * Differential entropy symmetry ::h(\Beta(\alpha, \beta) )= h(\Beta(\beta, \alpha) ) * Relative Entropy (also called Kullback–Leibler divergence) symmetry ::D_(X_1, , X_2) = D_(X_2, , X_1), \texth(X_1) = h(X_2)\text\alpha \neq \beta * Fisher information matrix symmetry ::_ = _
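A few of these symmetries, checked numerically (an illustrative sketch; the point x = 0.3 and the parameters are arbitrary):

```python
from scipy.stats import beta

a, b, x = 2.0, 5.0, 0.3
print(beta.pdf(x, a, b), beta.pdf(1 - x, b, a))       # reflection symmetry of the density
print(beta.cdf(x, a, b), 1 - beta.cdf(1 - x, b, a))   # matching symmetry of the CDF
print(float(beta.stats(a, b, moments='s')),
      -float(beta.stats(b, a, moments='s')))          # skewness is skew-symmetric in (alpha, beta)
```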


Geometry of the probability density function


Inflection points

For certain values of the shape parameters α and β, the
probability density function
has inflection points, at which the curvature changes sign. The position of these inflection points can be useful as a measure of the Statistical dispersion, dispersion or spread of the distribution. Defining the following quantity: :\kappa =\frac Points of inflection occur, depending on the value of the shape parameters α and β, as follows: *(α > 2, β > 2) The distribution is bell-shaped (symmetric for α = β and skewed otherwise), with two inflection points, equidistant from the mode: ::x = \text \pm \kappa = \frac * (α = 2, β > 2) The distribution is unimodal, positively skewed, right-tailed, with one inflection point, located to the right of the mode: ::x =\text + \kappa = \frac * (α > 2, β = 2) The distribution is unimodal, negatively skewed, left-tailed, with one inflection point, located to the left of the mode: ::x = \text - \kappa = 1 - \frac * (1 < α < 2, β > 2, α+β>2) The distribution is unimodal, positively skewed, right-tailed, with one inflection point, located to the right of the mode: ::x =\text + \kappa = \frac *(0 < α < 1, 1 < β < 2) The distribution has a mode at the left end ''x'' = 0 and it is positively skewed, right-tailed. There is one inflection point, located to the right of the mode: ::x = \frac *(α > 2, 1 < β < 2) The distribution is unimodal negatively skewed, left-tailed, with one inflection point, located to the left of the mode: ::x =\text - \kappa = \frac *(1 < α < 2, 0 < β < 1) The distribution has a mode at the right end ''x''=1 and it is negatively skewed, left-tailed. There is one inflection point, located to the left of the mode: ::x = \frac There are no inflection points in the remaining (symmetric and skewed) regions: U-shaped: (α, β < 1) upside-down-U-shaped: (1 < α < 2, 1 < β < 2), reverse-J-shaped (α < 1, β > 2) or J-shaped: (α > 2, β < 1) The accompanying plots show the inflection point locations (shown vertically, ranging from 0 to 1) versus α and β (the horizontal axes ranging from 0 to 5). There are large cuts at surfaces intersecting the lines α = 1, β = 1, α = 2, and β = 2 because at these values the beta distribution change from 2 modes, to 1 mode to no mode.
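Rather than relying on the closed forms, the inflection points can also be located numerically as sign changes of the second derivative of the density (a sketch assuming NumPy and SciPy; the grid resolution is arbitrary):

```python
import numpy as np
from scipy.stats import beta

# Locate inflection points of the Beta(a, b) density as sign changes of a
# finite-difference second derivative on a fine grid.
a, b = 3.0, 4.0                      # alpha, beta > 2: two inflection points, one on each side of the mode
x = np.linspace(1e-4, 1 - 1e-4, 20001)
pdf = beta.pdf(x, a, b)
second = np.gradient(np.gradient(pdf, x), x)
sign_change = np.where(np.diff(np.sign(second)) != 0)[0]
print(x[sign_change])
```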


Shapes

The beta density function can take a wide variety of different shapes depending on the values of the two parameters ''α'' and ''β''. The ability of the beta distribution to take this great diversity of shapes (using only two parameters) is partly responsible for finding wide application for modeling actual measurements:


=Symmetric (''α'' = ''β'')

= * the density function is symmetry, symmetric about 1/2 (blue & teal plots). * median = mean = 1/2. *skewness = 0. *variance = 1/(4(2α + 1)) *α = β < 1 **U-shaped (blue plot). **bimodal: left mode = 0, right mode =1, anti-mode = 1/2 **1/12 < var(''X'') < 1/4 **−2 < excess kurtosis(''X'') < −6/5 ** α = β = 1/2 is the arcsine distribution *** var(''X'') = 1/8 ***excess kurtosis(''X'') = −3/2 ***CF = Rinc (t) ** α = β → 0 is a 2-point
Bernoulli distribution
with equal probability 1/2 at each Dirac delta function end ''x'' = 0 and ''x'' = 1 and zero probability everywhere else. A coin toss: one face of the coin being ''x'' = 0 and the other face being ''x'' = 1. *** \lim_ \operatorname(X) = \tfrac *** \lim_ \operatorname(X) = - 2 a lower value than this is impossible for any distribution to reach. *** The information entropy, differential entropy approaches a Maxima and minima, minimum value of −∞ *α = β = 1 **the uniform distribution (continuous), uniform
[0, 1]
distribution **no mode **var(''X'') = 1/12 **excess kurtosis(''X'') = −6/5 **The (negative anywhere else) information entropy, differential entropy reaches its Maxima and minima, maximum value of zero **CF = Sinc (t) *''α'' = ''β'' > 1 **symmetric unimodal ** mode = 1/2. **0 < var(''X'') < 1/12 **−6/5 < excess kurtosis(''X'') < 0 **''α'' = ''β'' = 3/2 is a semi-elliptic
[0, 1]
distribution, see: Wigner semicircle distribution ***var(''X'') = 1/16. ***excess kurtosis(''X'') = −1 ***CF = 2 Jinc (t) **''α'' = ''β'' = 2 is the parabolic
[0, 1]
distribution ***var(''X'') = 1/20 ***excess kurtosis(''X'') = −6/7 ***CF = 3 Tinc (t) **''α'' = ''β'' > 2 is bell-shaped, with inflection points located to either side of the mode ***0 < var(''X'') < 1/20 ***−6/7 < excess kurtosis(''X'') < 0 **''α'' = ''β'' → ∞ is a 1-point
degenerate distribution
with a Dirac delta function spike at the midpoint ''x'' = 1/2 with probability 1, and zero probability everywhere else. There is 100% probability (absolute certainty) concentrated at the single point ''x'' = 1/2. *** \lim_ \operatorname(X) = 0 *** \lim_ \operatorname(X) = 0 ***The information entropy, differential entropy approaches a Maxima and minima, minimum value of −∞
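The variances and excess kurtoses quoted for the named symmetric special cases above can be confirmed directly (an illustrative sketch):

```python
from scipy.stats import beta

# Variance and excess kurtosis of the named symmetric special cases
cases = {"arcsine (1/2, 1/2)": (0.5, 0.5), "uniform (1, 1)": (1, 1),
         "semicircle (3/2, 3/2)": (1.5, 1.5), "parabolic (2, 2)": (2, 2)}
for name, (a, b) in cases.items():
    var, exkurt = beta.stats(a, b, moments='vk')
    print(name, float(var), float(exkurt))   # expected: 1/8, -3/2; 1/12, -6/5; 1/16, -1; 1/20, -6/7
```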


=Skewed (''α'' ≠ ''β'')

= The density function is Skewness, skewed. An interchange of parameter values yields the mirror image (the reverse) of the initial curve, some more specific cases: *''α'' < 1, ''β'' < 1 ** U-shaped ** Positive skew for α < β, negative skew for α > β. ** bimodal: left mode = 0, right mode = 1, anti-mode = \tfrac ** 0 < median < 1. ** 0 < var(''X'') < 1/4 *α > 1, β > 1 ** unimodal (magenta & cyan plots), **Positive skew for α < β, negative skew for α > β. **\text= \tfrac ** 0 < median < 1 ** 0 < var(''X'') < 1/12 *α < 1, β ≥ 1 **reverse J-shaped with a right tail, **positively skewed, **strictly decreasing, convex function, convex ** mode = 0 ** 0 < median < 1/2. ** 0 < \operatorname(X) < \tfrac, (maximum variance occurs for \alpha=\tfrac, \beta=1, or α = Φ the Golden ratio, golden ratio conjugate) *α ≥ 1, β < 1 **J-shaped with a left tail, **negatively skewed, **strictly increasing, convex function, convex ** mode = 1 ** 1/2 < median < 1 ** 0 < \operatorname(X) < \tfrac, (maximum variance occurs for \alpha=1, \beta=\tfrac, or β = Φ the Golden ratio, golden ratio conjugate) *α = 1, β > 1 **positively skewed, **strictly decreasing (red plot), **a reversed (mirror-image) power function ,1distribution ** mean = 1 / (β + 1) ** median = 1 - 1/21/β ** mode = 0 **α = 1, 1 < β < 2 ***concave function, concave *** 1-\tfrac< \text < \tfrac *** 1/18 < var(''X'') < 1/12. **α = 1, β = 2 ***a straight line with slope −2, the right-triangular distribution with right angle at the left end, at ''x'' = 0 *** \text=1-\tfrac *** var(''X'') = 1/18 **α = 1, β > 2 ***reverse J-shaped with a right tail, ***convex function, convex *** 0 < \text < 1-\tfrac *** 0 < var(''X'') < 1/18 *α > 1, β = 1 **negatively skewed, **strictly increasing (green plot), **the power function
[0, 1]
distribution ** mean = α / (α + 1) ** median = 1/21/α ** mode = 1 **2 > α > 1, β = 1 ***concave function, concave *** \tfrac < \text < \tfrac *** 1/18 < var(''X'') < 1/12 ** α = 2, β = 1 ***a straight line with slope +2, the right-triangular distribution with right angle at the right end, at ''x'' = 1 *** \text=\tfrac *** var(''X'') = 1/18 **α > 2, β = 1 ***J-shaped with a left tail, convex function, convex ***\tfrac < \text < 1 *** 0 < var(''X'') < 1/18


Related distributions


Transformations

* If ''X'' ~ Beta(''α'', ''β'') then 1 − ''X'' ~ Beta(''β'', ''α'') Mirror image, mirror-image symmetry * If ''X'' ~ Beta(''α'', ''β'') then \tfrac \sim (\alpha,\beta). The
beta prime distribution
, also called "beta distribution of the second kind". * If ''X'' ~ Beta(''α'', ''β'') then \tfrac -1 \sim (\beta,\alpha). * If ''X'' ~ Beta(''n''/2, ''m''/2) then \tfrac \sim F(n,m) (assuming ''n'' > 0 and ''m'' > 0), the F-distribution, Fisher–Snedecor F distribution. * If X \sim \operatorname\left(1+\lambda\tfrac, 1 + \lambda\tfrac\right) then min + ''X''(max − min) ~ PERT(min, max, ''m'', ''λ'') where ''PERT'' denotes a PERT distribution used in PERT analysis, and ''m''=most likely value.Herrerías-Velasco, José Manuel and Herrerías-Pleguezuelo, Rafael and René van Dorp, Johan. (2011). Revisiting the PERT mean and Variance. European Journal of Operational Research (210), p. 448–451. Traditionally ''λ'' = 4 in PERT analysis. * If ''X'' ~ Beta(1, ''β'') then ''X'' ~ Kumaraswamy distribution with parameters (1, ''β'') * If ''X'' ~ Beta(''α'', 1) then ''X'' ~ Kumaraswamy distribution with parameters (''α'', 1) * If ''X'' ~ Beta(''α'', 1) then −ln(''X'') ~ Exponential(''α'')
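Two of these transformation facts, checked by simulation with a Kolmogorov–Smirnov test (an illustrative sketch; seeds and sample sizes are arbitrary):

```python
import numpy as np
from scipy.stats import beta, kstest

rng = np.random.default_rng(3)

# -ln(X) ~ Exponential(alpha) when X ~ Beta(alpha, 1)
a = 2.5
x = beta.rvs(a, 1, size=100_000, random_state=rng)
print(kstest(-np.log(x), 'expon', args=(0, 1 / a)).pvalue)     # typically a large p-value

# If X ~ Beta(n/2, m/2) then (m X) / (n (1 - X)) ~ F(n, m)
n, m = 4, 7
y = beta.rvs(n / 2, m / 2, size=100_000, random_state=rng)
print(kstest(m * y / (n * (1 - y)), 'f', args=(n, m)).pvalue)  # typically a large p-value
```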


Special and limiting cases

* Beta(1, 1) ~ uniform distribution (continuous), U(0, 1). * Beta(n, 1) ~ Maximum of ''n'' independent rvs. with uniform distribution (continuous), U(0, 1), sometimes called a ''a standard power function distribution'' with density ''n'' ''x''''n''-1 on that interval. * Beta(1, n) ~ Minimum of ''n'' independent rvs. with uniform distribution (continuous), U(0, 1) * If ''X'' ~ Beta(3/2, 3/2) and ''r'' > 0 then 2''rX'' − ''r'' ~ Wigner semicircle distribution. * Beta(1/2, 1/2) is equivalent to the arcsine distribution. This distribution is also Jeffreys prior probability for the
Bernoulli
and binomial distributions. The arcsine probability density is a distribution that appears in several random-walk fundamental theorems. In a fair coin toss
random walk
, the probability for the time of the last visit to the origin is distributed as an (U-shaped) arcsine distribution. In a two-player fair-coin-toss game, a player is said to be in the lead if the random walk (that started at the origin) is above the origin. The most probable number of times that a given player will be in the lead, in a game of length 2''N'', is not ''N''. On the contrary, ''N'' is the least likely number of times that the player will be in the lead. The most likely number of times in the lead is 0 or 2''N'' (following the arcsine distribution). * \lim_ n \operatorname(1,n) = \operatorname(1) the exponential distribution. * \lim_ n \operatorname(k,n) = \operatorname(k,1) the gamma distribution. * For large n, \operatorname(\alpha n,\beta n) \to \mathcal\left(\frac,\frac\frac\right) the normal distribution. More precisely, if X_n \sim \operatorname(\alpha n,\beta n) then \sqrt\left(X_n -\tfrac\right) converges in distribution to a normal distribution with mean 0 and variance \tfrac as ''n'' increases.
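The normal limit can be illustrated by comparing the exact CDF of Beta(αn, βn) with the approximating normal CDF (a minimal sketch; the evaluation point is arbitrary):

```python
import numpy as np
from scipy.stats import beta, norm

# Normal limit: Beta(a n, b n) is approximately N( a/(a+b), a b / ((a+b)^3 n) ) for large n
a, b, n = 2.0, 3.0, 200
mean = a / (a + b)
sd = np.sqrt(a * b / ((a + b) ** 3 * n))
q = 0.42
print(beta.cdf(q, a * n, b * n), norm.cdf(q, loc=mean, scale=sd))  # agree to roughly two decimals
```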


Derived from other distributions

* The ''k''th order statistic of a sample of size ''n'' from the Uniform distribution (continuous), uniform distribution is a beta random variable, ''U''(''k'') ~ Beta(''k'', ''n''+1−''k''). * If ''X'' ~ Gamma(α, θ) and ''Y'' ~ Gamma(β, θ) are independent, then \tfrac \sim \operatorname(\alpha, \beta)\,. * If X \sim \chi^2(\alpha)\, and Y \sim \chi^2(\beta)\, are independent, then \tfrac \sim \operatorname(\tfrac, \tfrac). * If ''X'' ~ U(0, 1) and ''α'' > 0 then ''X''1/''α'' ~ Beta(''α'', 1). The power function distribution. * If X \sim\operatorname(k;n;p), then \sim \operatorname(\alpha, \beta) for discrete values of ''n'' and ''k'' where \alpha=k+1 and \beta=n-k+1. * If ''X'' ~ Cauchy(0, 1) then \tfrac \sim \operatorname\left(\tfrac12, \tfrac12\right)\,
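Two of these constructions, checked by simulation (an illustrative sketch):

```python
import numpy as np
from scipy.stats import kstest

rng = np.random.default_rng(4)
a, b = 2.0, 3.0

# Ratio of independent Gammas with a common scale is Beta distributed
gx = rng.gamma(shape=a, scale=1.0, size=100_000)
gy = rng.gamma(shape=b, scale=1.0, size=100_000)
print(kstest(gx / (gx + gy), 'beta', args=(a, b)).pvalue)        # typically a large p-value

# k-th order statistic of n uniforms is Beta(k, n + 1 - k)
n, k = 7, 3
u = np.sort(rng.uniform(size=(100_000, n)), axis=1)[:, k - 1]
print(kstest(u, 'beta', args=(k, n + 1 - k)).pvalue)             # typically a large p-value
```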


Combination with other distributions

* ''X'' ~ Beta(''α'', ''β'') and ''Y'' ~ F(2''β'',2''α'') then \Pr(X \leq \tfrac \alpha ) = \Pr(Y \geq x)\, for all ''x'' > 0.


Compounding with other distributions

* If ''p'' ~ Beta(α, β) and ''X'' ~ Bin(''k'', ''p'') then ''X'' ~ beta-binomial distribution * If ''p'' ~ Beta(α, β) and ''X'' ~ NB(''r'', ''p'') then ''X'' ~ beta negative binomial distribution
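The beta-binomial compound can be verified by sampling p from the beta prior and then the binomial outcome, and comparing with SciPy's reference pmf (an illustrative sketch):

```python
import numpy as np
from scipy.stats import betabinom

# Compound: p ~ Beta(a, b), X | p ~ Binomial(k, p)  =>  X ~ BetaBinomial(k, a, b)
rng = np.random.default_rng(5)
a, b, k = 2.0, 3.0, 10
p = rng.beta(a, b, size=200_000)
x = rng.binomial(k, p)
empirical = np.bincount(x, minlength=k + 1) / x.size
print(np.round(empirical, 3))                                  # the two rows should be close
print(np.round(betabinom.pmf(np.arange(k + 1), k, a, b), 3))
```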


Generalisations

* The generalization to multiple variables, i.e. a Dirichlet distribution, multivariate Beta distribution, is called a
Dirichlet distribution
. Univariate marginals of the Dirichlet distribution have a beta distribution. The beta distribution is Conjugate prior, conjugate to the binomial and Bernoulli distributions in exactly the same way as the
Dirichlet distribution
is conjugate to the multinomial distribution and categorical distribution. * The Pearson distribution#The Pearson type I distribution, Pearson type I distribution is identical to the beta distribution (except for arbitrary shifting and re-scaling that can also be accomplished with the four parameter parametrization of the beta distribution). * The beta distribution is the special case of the noncentral beta distribution where \lambda = 0: \operatorname(\alpha, \beta) = \operatorname(\alpha,\beta,0). * The generalized beta distribution is a five-parameter distribution family which has the beta distribution as a special case. * The matrix variate beta distribution is a distribution for positive-definite matrices.


Statistical inference


Parameter estimation


Method of moments


=Two unknown parameters

= Two unknown parameters (\hat{\alpha}, \hat{\beta}, of a beta distribution supported on the [0, 1] interval) can be estimated, using the method of moments, with the first two moments (sample mean and sample variance) as follows. Let:
:\text{sample mean}(X) =\bar{x} = \frac{1}{N}\sum_{i=1}^N X_i
be the sample mean estimate and
:\text{sample variance}(X) =\bar{v} = \frac{1}{N-1}\sum_{i=1}^N (X_i - \bar{x})^2
be the sample variance estimate. The method-of-moments estimates of the parameters are
:\hat{\alpha} = \bar{x} \left(\frac{\bar{x}(1-\bar{x})}{\bar{v}} - 1 \right), if \bar{v} <\bar{x}(1 - \bar{x}),
:\hat{\beta} = (1-\bar{x}) \left(\frac{\bar{x}(1-\bar{x})}{\bar{v}} - 1 \right), if \bar{v} <\bar{x}(1 - \bar{x}).
When the distribution is required over a known interval other than [0, 1] with random variable ''X'', say [''a'', ''c''] with random variable ''Y'', then replace \bar{x} with \frac{\bar{y}-a}{c-a} and \bar{v} with \frac{\bar{v}_Y}{(c-a)^2} in the above pair of equations for the shape parameters (see the "Alternative parametrizations, four parameters" section below), where:
:\text{sample mean}(Y)=\bar{y} = \frac{1}{N}\sum_{i=1}^N Y_i
:\text{sample variance}(Y) = \bar{v}_Y = \frac{1}{N-1}\sum_{i=1}^N (Y_i - \bar{y})^2
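A minimal sketch of the two-parameter method-of-moments recipe above, assuming the data lie in [0, 1]; this is a direct transcription of the formulas, not code from the article:

```python
# Method-of-moments estimates for a beta distribution on [0, 1].
import numpy as np

def beta_method_of_moments(x):
    """Return (alpha_hat, beta_hat) for samples x assumed to lie in [0, 1]."""
    x = np.asarray(x, dtype=float)
    m = x.mean()
    v = x.var(ddof=1)                       # sample variance with N - 1
    if not v < m * (1.0 - m):
        raise ValueError("moment condition v < m(1 - m) violated")
    common = m * (1.0 - m) / v - 1.0
    return m * common, (1.0 - m) * common

# quick self-check against known parameters
rng = np.random.default_rng(2)
sample = rng.beta(2.0, 5.0, size=50_000)
print(beta_method_of_moments(sample))       # roughly (2.0, 5.0)
```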


=Four unknown parameters

= All four parameters (\hat{\alpha}, \hat{\beta}, \hat{a}, \hat{c} of a beta distribution supported in the [''a'', ''c''] interval; see the section "Alternative parametrizations, Four parameters") can be estimated, using the method of moments developed by Karl Pearson, by equating sample and population values of the first four central moments (mean, variance, skewness and excess kurtosis). The excess kurtosis was expressed in terms of the square of the skewness and the sample size ν = α + β (see the previous section "Kurtosis") as follows:
:\text{excess kurtosis} =\frac{6}{3+\nu}\left(\frac{(2+\nu)}{4} (\text{skewness})^2 - 1\right)\text{ if }(\text{skewness})^2-2< \text{excess kurtosis}< \tfrac{3}{2} (\text{skewness})^2
One can use this equation to solve for the sample size ν = α + β in terms of the square of the skewness and the excess kurtosis as follows:
:\hat{\nu} = \hat{\alpha} + \hat{\beta} = 3\frac{(\text{sample excess kurtosis}) - (\text{sample skewness})^2 + 2}{\frac{3}{2} (\text{sample skewness})^2 - (\text{sample excess kurtosis})}
:\text{if }(\text{sample skewness})^2-2< \text{sample excess kurtosis}< \tfrac{3}{2} (\text{sample skewness})^2
This is the ratio (multiplied by a factor of 3) between the previously derived limit boundaries for the beta distribution in a space (as originally done by Karl Pearson) defined with coordinates of the square of the skewness in one axis and the excess kurtosis in the other axis (see the section "Kurtosis"). The case of zero skewness can be immediately solved because for zero skewness, α = β and hence ν = 2α = 2β, therefore α = β = ν/2:
: \hat{\alpha} = \hat{\beta} = \frac{\hat{\nu}}{2}= \frac{\frac{3}{2}(\text{sample excess kurtosis}) + 3}{- (\text{sample excess kurtosis})}
: \text{if sample skewness}= 0 \text{ and } -2<\text{sample excess kurtosis}<0
(Excess kurtosis is negative for the beta distribution with zero skewness, ranging from −2 to 0, so that \hat{\nu} — and therefore the sample shape parameters — is positive, ranging from zero when the shape parameters approach zero and the excess kurtosis approaches −2, to infinity when the shape parameters approach infinity and the excess kurtosis approaches zero.) For non-zero sample skewness one needs to solve a system of two coupled equations. Since the skewness and the excess kurtosis are independent of the parameters \hat{a}, \hat{c}, the parameters \hat{\alpha}, \hat{\beta} can be uniquely determined from the sample skewness and the sample excess kurtosis, by solving the coupled equations with two known variables (sample skewness and sample excess kurtosis) and two unknowns (the shape parameters):
:(\text{skewness})^2 = \frac{4(\hat{\beta}-\hat{\alpha})^2 (1+\hat{\nu})}{\hat{\alpha}\hat{\beta}(2+\hat{\nu})^2}
:\text{excess kurtosis} =\frac{6}{3+\hat{\nu}}\left(\frac{(2+\hat{\nu})}{4} (\text{skewness})^2 - 1\right)
:\text{if }(\text{sample skewness})^2-2< \text{sample excess kurtosis}< \tfrac{3}{2}(\text{sample skewness})^2
resulting in the following solution:
: \hat{\alpha}, \hat{\beta} = \frac{\hat{\nu}}{2} \left (1 \pm \frac{1}{\sqrt{1 + \frac{16(\hat{\nu}+1)}{(\hat{\nu}+2)^2(\text{sample skewness})^2}}} \right )
: \text{if sample skewness}\neq 0 \text{ and } (\text{sample skewness})^2-2< \text{sample excess kurtosis}< \tfrac{3}{2} (\text{sample skewness})^2
where one should take the solutions as follows: \hat{\alpha}>\hat{\beta} for (negative) sample skewness < 0, and \hat{\alpha}<\hat{\beta} for (positive) sample skewness > 0. The accompanying plot shows these two solutions as surfaces in a space with horizontal axes of (sample excess kurtosis) and (sample squared skewness) and the shape parameters as the vertical axis. The surfaces are constrained by the condition that the sample excess kurtosis must be bounded by the sample squared skewness as stipulated in the above equation. The two surfaces meet at the right edge defined by zero skewness. Along this right edge, both parameters are equal and the distribution is symmetric U-shaped for α = β < 1, uniform for α = β = 1, upside-down-U-shaped for 1 < α = β < 2 and bell-shaped for α = β > 2. The surfaces also meet at the front (lower) edge defined by "the impossible boundary" line ((excess kurtosis) + 2 − (skewness)^2 = 0).
Along this front (lower) boundary both shape parameters approach zero, and the probability density is concentrated more at one end than the other end (with practically nothing in between), with probabilities p=\tfrac{\beta}{\alpha+\beta} at the left end ''x'' = 0 and q = 1-p = \tfrac{\alpha}{\alpha+\beta} at the right end ''x'' = 1. The two surfaces become further apart towards the rear edge. At this rear edge the shape parameters are quite different from each other. As remarked, for example, by Bowman and Shenton, sampling in the neighborhood of the line (sample excess kurtosis − (3/2)(sample skewness)^2 = 0) (the just-J-shaped portion of the rear edge where blue meets beige) "is dangerously near to chaos", because at that line the denominator of the expression above for the estimate ν = α + β becomes zero and hence ν approaches infinity as that line is approached. Bowman and Shenton write that "the higher moment parameters (kurtosis and skewness) are extremely fragile (near that line). However, the mean and standard deviation are fairly reliable." Therefore, the problem arises in four-parameter estimation for very skewed distributions such that the excess kurtosis approaches (3/2) times the square of the skewness. This boundary line is produced by extremely skewed distributions with very large values of one of the parameters and very small values of the other parameter. See the section on the kurtosis bounded by the square of the skewness for a numerical example and further comments about this rear edge boundary line (sample excess kurtosis − (3/2)(sample skewness)^2 = 0). As remarked by Karl Pearson himself, this issue may not be of much practical importance, as this trouble arises only for very skewed J-shaped (or mirror-image J-shaped) distributions with very different values of shape parameters that are unlikely to occur much in practice. The usual skewed-bell-shape distributions that occur in practice do not have this parameter estimation problem. The remaining two parameters \hat{a}, \hat{c} can be determined using the sample mean and the sample variance using a variety of equations. One alternative is to calculate the support interval range (\hat{c}-\hat{a}) based on the sample variance and the sample kurtosis. For this purpose one can solve, in terms of the range (\hat{c}- \hat{a}), the equation expressing the excess kurtosis in terms of the sample variance and the sample size ν (see the sections "Kurtosis" and "Alternative parametrizations, four parameters"):
:\text{excess kurtosis} =\frac{6}{(3+\hat{\nu})(2+\hat{\nu})}\bigg(\frac{(\hat{c}-\hat{a})^2}{\bar{v}_Y} - 6 - 5 \hat{\nu} \bigg)
to obtain:
: (\hat{c}- \hat{a}) = \sqrt{\bar{v}_Y}\sqrt{6+5\hat{\nu}+\frac{(2+\hat{\nu})(3+\hat{\nu})}{6}(\text{sample excess kurtosis})}
Another alternative is to calculate the support interval range (\hat{c}-\hat{a}) based on the sample variance and the sample skewness. For this purpose one can solve, in terms of the range (\hat{c}-\hat{a}), the equation expressing the squared skewness in terms of the sample variance and the sample size ν (see the sections titled "Skewness" and "Alternative parametrizations, four parameters"):
:(\text{skewness})^2 = \frac{4}{(2+\hat{\nu})^2}\bigg(\frac{(\hat{c}-\hat{a})^2}{\bar{v}_Y}-4(1+\hat{\nu})\bigg)
to obtain:
: (\hat{c}- \hat{a}) = \frac{\sqrt{\bar{v}_Y}}{2}\sqrt{(2+\hat{\nu})^2(\text{sample skewness})^2+16(1+\hat{\nu})}
The remaining parameter can be determined from the sample mean and the previously obtained parameters (\hat{c}-\hat{a}), \hat{\alpha}, \hat{\nu} = \hat{\alpha}+\hat{\beta}:
: \hat{a} = (\text{sample mean of }Y) - \left(\frac{\hat{\alpha}}{\hat{\nu}}\right)(\hat{c}-\hat{a})
and finally, \hat{c}= (\hat{c}- \hat{a}) + \hat{a}. In the above formulas one may take, for example, as estimates of the sample moments:
:\begin{align} \text{sample mean} &=\overline{y} = \frac{1}{N}\sum_{i=1}^N Y_i \\ \text{sample variance} &= \overline{v}_Y = \frac{1}{N-1}\sum_{i=1}^N (Y_i - \overline{y})^2 \\ \text{sample skewness} &= G_1 = \frac{N}{(N-1)(N-2)} \frac{\sum_{i=1}^N (Y_i-\overline{y})^3}{\overline{v}_Y^{3/2}} \\ \text{sample excess kurtosis} &= G_2 = \frac{N(N+1)}{(N-1)(N-2)(N-3)} \frac{\sum_{i=1}^N (Y_i-\overline{y})^4}{\overline{v}_Y^{2}} - \frac{3(N-1)^2}{(N-2)(N-3)} \end{align}
The estimators ''G''1 for sample skewness and ''G''2 for sample kurtosis are used by DAP/SAS, PSPP/SPSS, and Excel.
However, they are not used by BMDP and (according to Joanes and Gill) they were not used by MINITAB in 1998. In fact, Joanes and Gill in their 1998 study concluded that the skewness and kurtosis estimators used in BMDP and in MINITAB (at that time) had smaller variance and mean-squared error in normal samples, but that the skewness and kurtosis estimators used in DAP/SAS and PSPP/SPSS, namely ''G''1 and ''G''2, had smaller mean-squared error in samples from a very skewed distribution. It is for this reason that we have spelled out "sample skewness", etc., in the above formulas, to make it explicit that the user should choose the best estimator according to the problem at hand, as the best estimator for skewness and kurtosis depends on the amount of skewness (as shown by Joanes and Gill).
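A compact sketch of the four-parameter recipe above (a transcription of the stated formulas, under the stated moment conditions; parameter values and seed are arbitrary and this is not code from the article):

```python
# Four-parameter method of moments for the beta distribution on [a, c].
import numpy as np
from scipy import stats

def beta4_method_of_moments(y):
    y = np.asarray(y, dtype=float)
    mean, var = y.mean(), y.var(ddof=1)
    g1 = stats.skew(y, bias=False)          # G1, adjusted sample skewness
    g2 = stats.kurtosis(y, bias=False)      # G2, adjusted sample excess kurtosis
    if not (g1**2 - 2 < g2 < 1.5 * g1**2):
        raise ValueError("sample moments outside the admissible beta region")
    nu = 3.0 * (g2 - g1**2 + 2.0) / (1.5 * g1**2 - g2)
    if g1 == 0.0:
        a_hat = b_hat = nu / 2.0
    else:
        delta = 1.0 / np.sqrt(1.0 + 16.0 * (nu + 1.0) / ((nu + 2.0)**2 * g1**2))
        a_hat, b_hat = nu / 2.0 * (1.0 - delta), nu / 2.0 * (1.0 + delta)
        if g1 < 0:                           # alpha_hat > beta_hat for negative skewness
            a_hat, b_hat = b_hat, a_hat
    span = np.sqrt(var) * np.sqrt(6.0 + 5.0 * nu + (2.0 + nu) * (3.0 + nu) / 6.0 * g2)
    lo = mean - (a_hat / nu) * span
    return a_hat, b_hat, lo, lo + span       # (alpha, beta, a, c)

sample = np.random.default_rng(3).beta(2.0, 6.0, size=200_000) * 3.0 + 1.0  # Beta(2,6) on [1,4]
print(beta4_method_of_moments(sample))       # roughly (2, 6, 1, 4)
```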


Maximum likelihood


=Two unknown parameters

= As is also the case for maximum likelihood estimates for the gamma distribution, the maximum likelihood estimates for the beta distribution do not have a general closed form solution for arbitrary values of the shape parameters. If ''X''1, ..., ''XN'' are independent random variables each having a beta distribution, the joint log likelihood function for ''N'' independent and identically distributed (iid) observations is:
:\begin{align} \ln\, \mathcal{L} (\alpha, \beta\mid X) &= \sum_{i=1}^N \ln \left (\mathcal{L}_i (\alpha, \beta\mid X_i) \right )\\ &= \sum_{i=1}^N \ln \left (f(X_i;\alpha,\beta) \right ) \\ &= \sum_{i=1}^N \ln \left (\frac{X_i^{\alpha-1}(1-X_i)^{\beta-1}}{\Beta(\alpha,\beta)} \right ) \\ &= (\alpha - 1)\sum_{i=1}^N \ln (X_i) + (\beta- 1)\sum_{i=1}^N \ln (1-X_i) - N \ln \Beta(\alpha,\beta) \end{align}
Finding the maximum with respect to a shape parameter involves taking the partial derivative with respect to the shape parameter and setting the expression equal to zero, yielding the maximum likelihood estimator of the shape parameters:
:\frac{\partial \ln \mathcal{L}(\alpha,\beta\mid X)}{\partial \alpha} = \sum_{i=1}^N \ln X_i -N\frac{\partial \ln \Beta(\alpha,\beta)}{\partial \alpha}=0
:\frac{\partial \ln \mathcal{L}(\alpha,\beta\mid X)}{\partial \beta} = \sum_{i=1}^N \ln (1-X_i)- N\frac{\partial \ln \Beta(\alpha,\beta)}{\partial \beta}=0
where:
:\frac{\partial \ln \Beta(\alpha,\beta)}{\partial \alpha} = -\frac{\partial \ln \Gamma(\alpha+\beta)}{\partial \alpha}+ \frac{\partial \ln \Gamma(\alpha)}{\partial \alpha}+ \frac{\partial \ln \Gamma(\beta)}{\partial \alpha}=-\psi(\alpha + \beta) + \psi(\alpha) + 0
:\frac{\partial \ln \Beta(\alpha,\beta)}{\partial \beta}= - \frac{\partial \ln \Gamma(\alpha+\beta)}{\partial \beta}+ \frac{\partial \ln \Gamma(\alpha)}{\partial \beta} + \frac{\partial \ln \Gamma(\beta)}{\partial \beta}=-\psi(\alpha + \beta) + 0 + \psi(\beta)
since the
digamma function, denoted ψ(α), is defined as the logarithmic derivative of the
gamma function:
:\psi(\alpha) =\frac{d \ln \Gamma(\alpha)}{d\alpha}
To ensure that the values with zero tangent slope are indeed a maximum (instead of a saddle-point or a minimum) one has to also satisfy the condition that the curvature is negative. This amounts to satisfying that the second partial derivative with respect to the shape parameters is negative:
:\frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{\partial \alpha^2}= -N\frac{\partial^2\ln \Beta(\alpha,\beta)}{\partial \alpha^2}<0
:\frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{\partial \beta^2} = -N\frac{\partial^2\ln \Beta(\alpha,\beta)}{\partial \beta^2}<0
Using the previous equations, this is equivalent to:
:\frac{\partial^2\ln \Beta(\alpha,\beta)}{\partial \alpha^2} = \psi_1(\alpha)-\psi_1(\alpha + \beta) > 0
:\frac{\partial^2\ln \Beta(\alpha,\beta)}{\partial \beta^2} = \psi_1(\beta) -\psi_1(\alpha + \beta) > 0
where the
trigamma function, denoted ''ψ''1(''α''), is the second of the
polygamma function
s, and is defined as the derivative of the digamma function:
:\psi_1(\alpha) = \frac{d^2\ln\Gamma(\alpha)}{d\alpha^2}=\, \frac{d \psi(\alpha)}{d\alpha}.
These conditions are equivalent to stating that the variances of the logarithmically transformed variables are positive, since:
:\operatorname{var}[\ln (X)] = \operatorname{E}[\ln^2 (X)] - (\operatorname{E}[\ln (X)])^2 = \psi_1(\alpha) - \psi_1(\alpha + \beta)
:\operatorname{var}[\ln (1-X)] = \operatorname{E}[\ln^2 (1-X)] - (\operatorname{E}[\ln (1-X)])^2 = \psi_1(\beta) - \psi_1(\alpha + \beta)
Therefore, the condition of negative curvature at a maximum is equivalent to the statements:
: \operatorname{var}[\ln (X)] > 0
: \operatorname{var}[\ln (1-X)] > 0
Alternatively, the condition of negative curvature at a maximum is also equivalent to stating that the following logarithmic derivatives of the geometric means ''GX'' and ''G(1−X)'' are positive, since:
: \psi_1(\alpha) - \psi_1(\alpha + \beta) = \frac{\partial \ln G_X}{\partial \alpha} > 0
: \psi_1(\beta) - \psi_1(\alpha + \beta) = \frac{\partial \ln G_{(1-X)}}{\partial \beta} > 0
While these slopes are indeed positive, the other slopes are negative:
:\frac{\partial \ln G_X}{\partial \beta}, \frac{\partial \ln G_{(1-X)}}{\partial \alpha} < 0.
The slopes of the mean and the median with respect to ''α'' and ''β'' display similar sign behavior. From the condition that at a maximum the partial derivative with respect to the shape parameter equals zero, we obtain the following system of coupled maximum likelihood estimate equations (for the average log-likelihoods) that needs to be inverted to obtain the (unknown) shape parameter estimates \hat{\alpha},\hat{\beta} in terms of the (known) average of logarithms of the samples ''X''1, ..., ''XN'':
:\begin{align} \hat{\operatorname{E}}[\ln (X)] &= \psi(\hat{\alpha}) - \psi(\hat{\alpha} + \hat{\beta})=\frac{1}{N}\sum_{i=1}^N \ln X_i = \ln \hat{G}_X \\ \hat{\operatorname{E}}[\ln(1-X)] &= \psi(\hat{\beta}) - \psi(\hat{\alpha} + \hat{\beta})=\frac{1}{N}\sum_{i=1}^N \ln (1-X_i)= \ln \hat{G}_{(1-X)} \end{align}
where we recognize \ln \hat{G}_X as the logarithm of the sample geometric mean and \ln \hat{G}_{(1-X)} as the logarithm of the sample geometric mean based on (1 − ''X''), the mirror-image of ''X''. For \hat{\alpha}=\hat{\beta}, it follows that \hat{G}_X=\hat{G}_{(1-X)}.
:\begin{align} \hat{G}_X &= \prod_{i=1}^N (X_i)^{1/N} \\ \hat{G}_{(1-X)} &= \prod_{i=1}^N (1-X_i)^{1/N} \end{align}
These coupled equations containing
digamma function
s of the shape parameter estimates \hat{\alpha},\hat{\beta} must be solved by numerical methods, as done, for example, by Beckman et al. Gnanadesikan et al. give numerical solutions for a few cases. N. L. Johnson and S. Kotz suggest that for "not too small" shape parameter estimates \hat{\alpha},\hat{\beta}, the logarithmic approximation to the digamma function \psi(\hat{\alpha}) \approx \ln(\hat{\alpha}-\tfrac{1}{2}) may be used to obtain initial values for an iterative solution, since the equations resulting from this approximation can be solved exactly:
:\ln \frac{\hat{\alpha} - \frac{1}{2}}{\hat{\alpha}+\hat{\beta} - \frac{1}{2}} \approx \ln \hat{G}_X
:\ln \frac{\hat{\beta}-\frac{1}{2}}{\hat{\alpha}+\hat{\beta} - \frac{1}{2}}\approx \ln \hat{G}_{(1-X)}
which leads to the following solution for the initial values (of the estimate shape parameters in terms of the sample geometric means) for an iterative solution:
:\hat{\alpha}\approx \tfrac{1}{2} + \frac{\hat{G}_X}{2(1-\hat{G}_X-\hat{G}_{(1-X)})} \text{ if } \hat{\alpha} >1
:\hat{\beta}\approx \tfrac{1}{2} + \frac{\hat{G}_{(1-X)}}{2(1-\hat{G}_X-\hat{G}_{(1-X)})} \text{ if } \hat{\beta} > 1
Alternatively, the estimates provided by the method of moments can instead be used as initial values for an iterative solution of the maximum likelihood coupled equations in terms of the digamma functions. When the distribution is required over a known interval other than
[0, 1]
with random variable ''X'', say [''a'', ''c''] with random variable ''Y'', then replace ln(''Xi'') in the first equation with
:\ln \frac{Y_i-a}{c-a},
and replace ln(1−''Xi'') in the second equation with
:\ln \frac{c-Y_i}{c-a}
(see the "Alternative parametrizations, four parameters" section below). If one of the shape parameters is known, the problem is considerably simplified. The following logit transformation can be used to solve for the unknown shape parameter (for skewed cases such that \hat{\alpha}\neq\hat{\beta}; otherwise, if symmetric, both equal parameters are known when one is known):
:\hat{\operatorname{E}} \left[\ln \left(\frac{X}{1-X} \right) \right]=\psi(\hat{\alpha}) - \psi(\hat{\beta})=\frac{1}{N}\sum_{i=1}^N \ln\frac{X_i}{1-X_i} = \ln \hat{G}_X - \ln \hat{G}_{(1-X)}
This logit transformation is the logarithm of the transformation that divides the variable ''X'' by its mirror-image (''X''/(1 − ''X'')), resulting in the "inverted beta distribution" or
beta prime distribution
(also known as beta distribution of the second kind or Pearson's Type VI) with support [0, +∞). As previously discussed in the section "Moments of logarithmically transformed random variables," the logit transformation \ln\frac{X}{1-X}, studied by Johnson, extends the finite support
[0, 1]
based on the original variable ''X'' to infinite support in both directions of the real line (−∞, +∞). If, for example, \hat{\beta} is known, the unknown parameter \hat{\alpha} can be obtained in terms of the inverse digamma function of the right-hand side of this equation:
:\psi(\hat{\alpha})=\frac{1}{N}\sum_{i=1}^N \ln\frac{X_i}{1-X_i} + \psi(\hat{\beta})
:\hat{\alpha}=\psi^{-1}(\ln \hat{G}_X - \ln \hat{G}_{(1-X)} + \psi(\hat{\beta}))
In particular, if one of the shape parameters has a value of unity, for example for \hat{\beta} = 1 (the power function distribution with bounded support [0,1]), using the identity ψ(''x'' + 1) = ψ(''x'') + 1/''x'' in the equation \psi(\hat{\alpha}) - \psi(\hat{\alpha} + \hat{\beta})= \ln \hat{G}_X, the maximum likelihood estimator for the unknown parameter \hat{\alpha} is, exactly:
:\hat{\alpha}= - \frac{N}{\sum_{i=1}^N \ln X_i}= - \frac{1}{\ln \hat{G}_X}
The beta distribution has support [0, 1], therefore \hat{G}_X < 1, and hence (-\ln \hat{G}_X) >0, and therefore \hat{\alpha} >0. In conclusion, the maximum likelihood estimates of the shape parameters of a beta distribution are (in general) a complicated function of the sample geometric mean and of the sample geometric mean based on (1 − ''X''), the mirror-image of ''X''. One may ask, if the variance (in addition to the mean) is necessary to estimate two shape parameters with the method of moments, why is the (logarithmic or geometric) variance not necessary to estimate two shape parameters with the maximum likelihood method, for which only the geometric means suffice? The answer is that the mean does not provide as much information as the geometric mean. For a beta distribution with equal shape parameters ''α'' = ''β'', the mean is exactly 1/2, regardless of the value of the shape parameters, and therefore regardless of the value of the statistical dispersion (the variance). On the other hand, the geometric mean of a beta distribution with equal shape parameters ''α'' = ''β'' depends on the value of the shape parameters, and therefore it contains more information. Also, the geometric mean of a beta distribution does not satisfy the symmetry conditions satisfied by the mean; therefore, by employing both the geometric mean based on ''X'' and the geometric mean based on (1 − ''X''), the maximum likelihood method is able to provide best estimates for both parameters ''α'' = ''β'', without need of employing the variance. One can express the joint log likelihood per ''N'' iid observations in terms of the ''sufficient statistics'' (the sample geometric means) as follows:
:\frac{\ln\, \mathcal{L} (\alpha, \beta\mid X)}{N} = (\alpha - 1)\ln \hat{G}_X + (\beta- 1)\ln \hat{G}_{(1-X)}- \ln \Beta(\alpha,\beta).
We can plot the joint log likelihood per ''N'' observations for fixed values of the sample geometric means to see the behavior of the likelihood function as a function of the shape parameters α and β. In such a plot, the shape parameter estimators \hat{\alpha},\hat{\beta} correspond to the maxima of the likelihood function. See the accompanying graph that shows that all the likelihood functions intersect at α = β = 1, which corresponds to the values of the shape parameters that give the maximum entropy (the maximum entropy occurs for shape parameters equal to unity: the uniform distribution). It is evident from the plot that the likelihood function gives sharp peaks for values of the shape parameter estimators close to zero, but that for values of the shape parameter estimators greater than one, the likelihood function becomes quite flat, with less defined peaks.
Obviously, the maximum likelihood parameter estimation method for the beta distribution becomes less acceptable for larger values of the shape parameter estimators, as the uncertainty in the peak definition increases with the value of the shape parameter estimators. One can arrive at the same conclusion by noticing that the expression for the curvature of the likelihood function is in terms of the geometric variances:
:\frac{1}{N}\frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{\partial \alpha^2}= -\operatorname{var}[\ln X]
:\frac{1}{N}\frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{\partial \beta^2} = -\operatorname{var}[\ln (1-X)]
These variances (and therefore the curvatures) are much larger for small values of the shape parameters α and β. However, for shape parameter values α, β > 1, the variances (and therefore the curvatures) flatten out. Equivalently, this result follows from the Cramér–Rao bound, since the
Fisher information
matrix components for the beta distribution are these logarithmic variances. The Cramér–Rao bound states that the
variance
of any ''unbiased'' estimator \hat{\alpha} of α is bounded by the reciprocal of the
Fisher information:
:\operatorname{var}(\hat{\alpha})\geq\frac{1}{\mathcal{I}(\alpha)}\geq\frac{1}{\psi_1(\hat{\alpha}) - \psi_1(\hat{\alpha} + \hat{\beta})}
:\operatorname{var}(\hat{\beta}) \geq\frac{1}{\mathcal{I}(\beta)}\geq\frac{1}{\psi_1(\hat{\beta}) - \psi_1(\hat{\alpha} + \hat{\beta})}
so the variance of the estimators increases with increasing α and β, as the logarithmic variances decrease. Also one can express the joint log likelihood per ''N'' iid observations in terms of the
digamma function
expressions for the logarithms of the sample geometric means as follows:
:\frac{\ln\, \mathcal{L} (\alpha, \beta\mid X)}{N} = (\alpha - 1)(\psi(\hat{\alpha}) - \psi(\hat{\alpha} + \hat{\beta}))+(\beta- 1)(\psi(\hat{\beta}) - \psi(\hat{\alpha} + \hat{\beta}))- \ln \Beta(\alpha,\beta)
This expression is identical to the negative of the cross-entropy (see the section on "Quantities of information (entropy)"). Therefore, finding the maximum of the joint log likelihood of the shape parameters, per ''N'' iid observations, is identical to finding the minimum of the cross-entropy for the beta distribution, as a function of the shape parameters.
:\frac{\ln\, \mathcal{L} (\alpha, \beta\mid X)}{N} = - H = -h - D_{KL} = -\ln\Beta(\alpha,\beta)+(\alpha-1)\psi(\hat{\alpha})+(\beta-1)\psi(\hat{\beta})-(\alpha+\beta-2)\psi(\hat{\alpha}+\hat{\beta})
with the cross-entropy defined as follows:
:H = \int_{0}^1 - f(X;\hat{\alpha},\hat{\beta}) \ln (f(X;\alpha,\beta)) \, dX
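A minimal numerical sketch of the coupled digamma equations above, using method-of-moments estimates as starting values; this is one possible implementation with SciPy, not a prescribed algorithm from the article:

```python
# Maximum likelihood for Beta(alpha, beta) by solving the coupled digamma equations.
import numpy as np
from scipy.special import digamma
from scipy.optimize import fsolve

def beta_mle(x):
    x = np.asarray(x, dtype=float)
    ln_gx = np.mean(np.log(x))            # log of sample geometric mean of X
    ln_g1x = np.mean(np.log1p(-x))        # log of sample geometric mean of 1 - X

    # method-of-moments starting values
    m, v = x.mean(), x.var(ddof=1)
    common = m * (1.0 - m) / v - 1.0
    a0, b0 = m * common, (1.0 - m) * common

    def equations(params):
        a, b = params
        return (digamma(a) - digamma(a + b) - ln_gx,
                digamma(b) - digamma(a + b) - ln_g1x)

    return fsolve(equations, x0=[a0, b0])

sample = np.random.default_rng(4).beta(2.0, 5.0, size=10_000)
print(beta_mle(sample))                   # close to (2.0, 5.0)
```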


=Four unknown parameters

= The procedure is similar to the one followed in the two unknown parameter case. If ''Y''1, ..., ''YN'' are independent random variables each having a beta distribution with four parameters, the joint log likelihood function for ''N'' iid observations is:
:\begin{align} \ln\, \mathcal{L} (\alpha, \beta, a, c\mid Y) &= \sum_{i=1}^N \ln\,\mathcal{L}_i (\alpha, \beta, a, c\mid Y_i)\\ &= \sum_{i=1}^N \ln\,f(Y_i; \alpha, \beta, a, c) \\ &= \sum_{i=1}^N \ln\,\frac{(Y_i-a)^{\alpha-1}(c-Y_i)^{\beta-1}}{\Beta(\alpha,\beta)(c-a)^{\alpha+\beta-1}}\\ &= (\alpha - 1)\sum_{i=1}^N \ln (Y_i - a) + (\beta- 1)\sum_{i=1}^N \ln (c - Y_i)- N \ln \Beta(\alpha,\beta) - N (\alpha+\beta - 1) \ln (c - a) \end{align}
Finding the maximum with respect to a shape parameter involves taking the partial derivative with respect to the shape parameter and setting the expression equal to zero, yielding the maximum likelihood estimator of the shape parameters:
:\frac{\partial \ln \mathcal{L}(\alpha, \beta, a, c\mid Y)}{\partial \alpha}= \sum_{i=1}^N \ln (Y_i - a) - N(-\psi(\alpha + \beta) + \psi(\alpha))- N \ln (c - a)= 0
:\frac{\partial \ln \mathcal{L}(\alpha, \beta, a, c\mid Y)}{\partial \beta} = \sum_{i=1}^N \ln (c - Y_i) - N(-\psi(\alpha + \beta) + \psi(\beta))- N \ln (c - a)= 0
:\frac{\partial \ln \mathcal{L}(\alpha, \beta, a, c\mid Y)}{\partial a} = -(\alpha - 1) \sum_{i=1}^N \frac{1}{Y_i - a} \,+ N (\alpha+\beta - 1)\frac{1}{c - a}= 0
:\frac{\partial \ln \mathcal{L}(\alpha, \beta, a, c\mid Y)}{\partial c} = (\beta- 1) \sum_{i=1}^N \frac{1}{c - Y_i} \,- N (\alpha+\beta - 1) \frac{1}{c - a} = 0
These equations can be re-arranged as the following system of four coupled equations (the first two equations are geometric means and the second two equations are harmonic means) in terms of the maximum likelihood estimates for the four parameters \hat{\alpha}, \hat{\beta}, \hat{a}, \hat{c}:
:\frac{1}{N}\sum_{i=1}^N \ln \frac{Y_i - \hat{a}}{\hat{c}-\hat{a}} = \psi(\hat{\alpha})-\psi(\hat{\alpha} +\hat{\beta} )= \ln \hat{G}_X
:\frac{1}{N}\sum_{i=1}^N \ln \frac{\hat{c} - Y_i}{\hat{c}-\hat{a}} = \psi(\hat{\beta})-\psi(\hat{\alpha} + \hat{\beta})= \ln \hat{G}_{(1-X)}
:\frac{\hat{\alpha} - 1}{\hat{\alpha}+\hat{\beta} - 1} = \frac{N}{\sum_{i=1}^N \frac{\hat{c}-\hat{a}}{Y_i - \hat{a}}}= \hat{H}_X
:\frac{\hat{\beta} - 1}{\hat{\alpha}+\hat{\beta} - 1} = \frac{N}{\sum_{i=1}^N \frac{\hat{c}-\hat{a}}{\hat{c} - Y_i}} = \hat{H}_{(1-X)}
with sample geometric means:
:\hat{G}_X = \prod_{i=1}^{N} \left (\frac{Y_i-\hat{a}}{\hat{c}-\hat{a}} \right )^{1/N}
:\hat{G}_{(1-X)} = \prod_{i=1}^{N} \left (\frac{\hat{c}-Y_i}{\hat{c}-\hat{a}} \right )^{1/N}
The parameters \hat{a}, \hat{c} are embedded inside the geometric mean expressions in a nonlinear way (to the power 1/''N''). This precludes, in general, a closed form solution, even for an initial value approximation for iteration purposes. One alternative is to use as initial values for iteration the values obtained from the method of moments solution for the four parameter case. Furthermore, the expressions for the harmonic means are well-defined only for \hat{\alpha}, \hat{\beta} > 1, which precludes a maximum likelihood solution for shape parameters less than unity in the four-parameter case. Fisher's information matrix for the four parameter case is positive-definite only for α, β > 2 (for further discussion, see the section on the Fisher information matrix, four parameter case), that is, for bell-shaped (symmetric or unsymmetric) beta distributions, with inflection points located to either side of the mode. The Fisher information components that represent the expectations of the curvature of the log likelihood function with respect to the end points have mathematical singularities: the diagonal components \mathcal{I}_{a,a} and \mathcal{I}_{c,c} at α = 2 and β = 2 respectively, and the mixed components \mathcal{I}_{\alpha,a} and \mathcal{I}_{\beta,c} at α = 1 and β = 1 respectively (for further discussion see the section on the Fisher information matrix). Thus, it is not possible to strictly carry out the maximum likelihood estimation for some well known distributions belonging to the four-parameter beta distribution family, like the continuous uniform distribution (Beta(1, 1, ''a'', ''c'')) and the arcsine distribution (Beta(1/2, 1/2, ''a'', ''c'')).
N. L. Johnson and S. Kotz ignore the equations for the harmonic means and instead suggest "If a and c are unknown, and maximum likelihood estimators of ''a'', ''c'', α and β are required, the above procedure (for the two unknown parameter case, with ''X'' transformed as ''X'' = (''Y'' − ''a'')/(''c'' − ''a'')) can be repeated using a succession of trial values of ''a'' and ''c'', until the pair (''a'', ''c'') for which maximum likelihood (given ''a'' and ''c'') is as great as possible, is attained" (where, for the purpose of clarity, their notation for the parameters has been translated into the present notation).
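A rough sketch of that profile-likelihood idea (an illustration of the quoted procedure, not Johnson and Kotz's own code): for trial end points (''a'', ''c''), rescale the data to [0, 1], fit the two shape parameters by maximum likelihood, and keep the pair with the largest log likelihood. The grid resolution and margins are arbitrary choices.

```python
# Profile likelihood over trial (a, c) pairs for the four-parameter beta distribution.
import numpy as np
from scipy import stats

def beta4_profile_mle(y, n_grid=15):
    y = np.asarray(y, dtype=float)
    span = y.max() - y.min()
    best = (-np.inf, None)
    for a in np.linspace(y.min() - 0.5 * span, y.min() - 1e-6 * span, n_grid):
        for c in np.linspace(y.max() + 1e-6 * span, y.max() + 0.5 * span, n_grid):
            x = (y - a) / (c - a)
            alpha, beta_, _, _ = stats.beta.fit(x, floc=0, fscale=1)   # 2-parameter MLE
            ll = np.sum(stats.beta.logpdf(y, alpha, beta_, loc=a, scale=c - a))
            if ll > best[0]:
                best = (ll, (alpha, beta_, a, c))
    return best[1]

sample = np.random.default_rng(5).beta(3.0, 4.0, size=2_000) * 2.0 + 1.0   # Beta(3,4) on [1,3]
print(beta4_profile_mle(sample))
```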


Fisher information matrix

Let a random variable X have a probability density ''f''(''x'';''α''). The partial derivative with respect to the (unknown, and to be estimated) parameter α of the log
likelihood function
is called the score. The second moment of the score is called the
Fisher information:
:\mathcal{I}(\alpha)=\operatorname{E} \left [\left (\frac{\partial}{\partial \alpha} \ln \mathcal{L}(\alpha\mid X) \right )^2 \right].
The expectation of the score is zero, therefore the Fisher information is also the second moment centered on the mean of the score: the
variance
of the score. If the log
likelihood function
is twice differentiable with respect to the parameter α, and under certain regularity conditions, then the Fisher information may also be written as follows (which is often a more convenient form for calculation purposes):
:\mathcal{I}(\alpha) = - \operatorname{E} \left [\frac{\partial^2}{\partial \alpha^2} \ln (\mathcal{L}(\alpha\mid X)) \right].
Thus, the Fisher information is the negative of the expectation of the second derivative with respect to the parameter α of the log
likelihood function
. Therefore, Fisher information is a measure of the curvature of the log likelihood function of α. A low-curvature (and therefore high radius of curvature), flatter log likelihood function curve has low Fisher information, while a log likelihood function curve with large curvature (and therefore low radius of curvature) has high Fisher information. When the Fisher information matrix is computed at the estimates of the parameters ("the observed Fisher information matrix") it is equivalent to the replacement of the true log likelihood surface by a Taylor's series approximation, taken as far as the quadratic terms. The word information, in the context of Fisher information, refers to information about the parameters: estimation, sufficiency and properties of variances of estimators. The Cramér–Rao bound states that the inverse of the Fisher information is a lower bound on the variance of any
estimator
of a parameter α:
:\operatorname{var}[\hat\alpha] \geq \frac{1}{\mathcal{I}(\alpha)}.
The precision to which one can estimate the estimator of a parameter α is limited by the Fisher information of the log likelihood function. The Fisher information is a measure of the minimum error involved in estimating a parameter of a distribution and it can be viewed as a measure of the resolving power of an experiment needed to discriminate between two alternative hypotheses of a parameter. When there are ''N'' parameters
: \begin{bmatrix} \theta_1 \\ \theta_2 \\ \vdots \\ \theta_N \end{bmatrix},
then the Fisher information takes the form of an ''N''×''N'' positive semidefinite symmetric matrix, the Fisher information matrix, with typical element:
:\mathcal{I}_{i,j}=\operatorname{E} \left [\left (\frac{\partial}{\partial \theta_i} \ln \mathcal{L} \right) \left(\frac{\partial}{\partial \theta_j} \ln \mathcal{L} \right) \right ].
Under certain regularity conditions, the Fisher information matrix may also be written in the following form, which is often more convenient for computation:
:\mathcal{I}_{i,j} = - \operatorname{E} \left [\frac{\partial^2}{\partial \theta_i \, \partial \theta_j} \ln (\mathcal{L}) \right ]\,.
With ''X''1, ..., ''XN'' iid random variables, an ''N''-dimensional "box" can be constructed with sides ''X''1, ..., ''XN''. Costa and Cover show that the (Shannon) differential entropy ''h''(''X'') is related to the volume of the typical set (having the sample entropy close to the true entropy), while the Fisher information is related to the surface of this typical set.


=Two parameters

= For ''X''1, ..., ''X''''N'' independent random variables each having a beta distribution parametrized with shape parameters ''α'' and ''β'', the joint log likelihood function for ''N'' iid observations is:
:\ln (\mathcal{L} (\alpha, \beta\mid X) )= (\alpha - 1)\sum_{i=1}^N \ln X_i + (\beta- 1)\sum_{i=1}^N \ln (1-X_i)- N \ln \Beta(\alpha,\beta)
therefore the joint log likelihood function per ''N'' iid observations is:
:\frac{1}{N} \ln(\mathcal{L} (\alpha, \beta\mid X)) = (\alpha - 1)\frac{1}{N}\sum_{i=1}^N \ln X_i + (\beta- 1)\frac{1}{N}\sum_{i=1}^N \ln (1-X_i)-\, \ln \Beta(\alpha,\beta)
For the two parameter case, the Fisher information has 4 components: 2 diagonal and 2 off-diagonal. Since the Fisher information matrix is symmetric, one of these off-diagonal components is independent. Therefore, the Fisher information matrix has 3 independent components (2 diagonal and 1 off-diagonal). Aryal and Nadarajah calculated Fisher's information matrix for the four-parameter case, from which the two parameter case can be obtained as follows:
:- \frac{1}{N}\frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{\partial \alpha^2}= \operatorname{var}[\ln (X)]= \psi_1(\alpha) - \psi_1(\alpha + \beta) =\mathcal{I}_{\alpha, \alpha}= \operatorname{E}\left [- \frac{1}{N}\frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{\partial \alpha^2} \right ] = \ln \operatorname{var_{GX}}
:- \frac{1}{N}\frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{\partial \beta^2} = \operatorname{var}[\ln (1-X)]= \psi_1(\beta) - \psi_1(\alpha + \beta) =\mathcal{I}_{\beta, \beta}= \operatorname{E}\left [- \frac{1}{N}\frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{\partial \beta^2} \right]= \ln \operatorname{var_{G(1-X)}}
:- \frac{1}{N}\frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{\partial \alpha \, \partial \beta} = \operatorname{cov}[\ln X,\ln(1-X)] = -\psi_1(\alpha+\beta) =\mathcal{I}_{\alpha, \beta}= \operatorname{E}\left [- \frac{1}{N}\frac{\partial^2\ln \mathcal{L}(\alpha,\beta\mid X)}{\partial \alpha \, \partial \beta} \right] = \ln \operatorname{cov}_{G{X,(1-X)}}
Since the Fisher information matrix is symmetric
: \mathcal{I}_{\alpha, \beta}= \mathcal{I}_{\beta, \alpha}= \ln \operatorname{cov}_{G{X,(1-X)}}
The Fisher information components are equal to the log geometric variances and log geometric covariance. Therefore, they can be expressed as
trigamma functions, denoted ''ψ''1(''α''), the second of the
polygamma function
s, defined as the derivative of the digamma function:
:\psi_1(\alpha) = \frac{d^2\ln\Gamma(\alpha)}{d\alpha^2}=\, \frac{d \psi(\alpha)}{d\alpha}.
These derivatives are also derived in the section "Maximum likelihood, Two unknown parameters", and plots of the log likelihood function are also shown in that section. The section on the geometric variance and covariance contains plots and further discussion of the Fisher information matrix components: the log geometric variances and log geometric covariance as a function of the shape parameters α and β. The section "Moments of logarithmically transformed random variables" contains formulas for moments of logarithmically transformed random variables. Images for the Fisher information components \mathcal{I}_{\alpha, \alpha}, \mathcal{I}_{\beta, \beta} and \mathcal{I}_{\alpha, \beta} are shown in that section. The determinant of Fisher's information matrix is of interest (for example for the calculation of Jeffreys prior probability). From the expressions for the individual components of the Fisher information matrix, it follows that the determinant of Fisher's (symmetric) information matrix for the beta distribution is:
:\begin{align} \det(\mathcal{I}(\alpha, \beta))&= \mathcal{I}_{\alpha, \alpha} \mathcal{I}_{\beta, \beta}-\mathcal{I}_{\alpha, \beta} \mathcal{I}_{\beta, \alpha} \\ &=(\psi_1(\alpha) - \psi_1(\alpha + \beta))(\psi_1(\beta) - \psi_1(\alpha + \beta))-( -\psi_1(\alpha+\beta))( -\psi_1(\alpha+\beta))\\ &= \psi_1(\alpha)\psi_1(\beta)-( \psi_1(\alpha)+\psi_1(\beta))\psi_1(\alpha + \beta)\\ \lim_{\alpha\to 0} \det(\mathcal{I}(\alpha, \beta)) &=\lim_{\beta \to 0} \det(\mathcal{I}(\alpha, \beta)) = \infty\\ \lim_{\alpha\to \infty} \det(\mathcal{I}(\alpha, \beta)) &=\lim_{\beta \to \infty} \det(\mathcal{I}(\alpha, \beta)) = 0 \end{align}
From Sylvester's criterion (checking whether the diagonal elements are all positive), it follows that the Fisher information matrix for the two parameter case is positive-definite (under the standard condition that the shape parameters are positive: ''α'' > 0 and ''β'' > 0).
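A small numerical sketch of the trigamma expressions above (assembled with SciPy; the example parameter values are arbitrary):

```python
# Per-observation 2x2 Fisher information matrix for Beta(alpha, beta), its determinant,
# and its inverse (the Cramér-Rao lower bound per observation).
import numpy as np
from scipy.special import polygamma

def beta_fisher_information(alpha, beta_):
    trigamma = lambda z: polygamma(1, z)
    i_aa = trigamma(alpha) - trigamma(alpha + beta_)
    i_bb = trigamma(beta_) - trigamma(alpha + beta_)
    i_ab = -trigamma(alpha + beta_)
    return np.array([[i_aa, i_ab], [i_ab, i_bb]])

fim = beta_fisher_information(2.0, 3.0)
print(fim)
print(np.linalg.det(fim))     # positive, consistent with positive-definiteness
print(np.linalg.inv(fim))     # Cramér-Rao lower bound (per observation)
```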


=Four parameters

= If ''Y''1, ..., ''YN'' are independent random variables each having a beta distribution with four parameters: the exponents ''α'' and ''β'', and also ''a'' (the minimum of the distribution range), and ''c'' (the maximum of the distribution range) (section titled "Alternative parametrizations", "Four parameters"), with
probability density function:
:f(y; \alpha, \beta, a, c) = \frac{f(x;\alpha,\beta)}{c-a} =\frac{ \left (\frac{y-a}{c-a} \right )^{\alpha-1} \left (\frac{c-y}{c-a} \right )^{\beta-1} }{(c-a)\Beta(\alpha,\beta)}=\frac{ (y-a)^{\alpha-1} (c-y)^{\beta-1} }{(c-a)^{\alpha+\beta-1}\Beta(\alpha,\beta)}.
The joint log likelihood function per ''N'' iid observations is:
:\frac{1}{N} \ln(\mathcal{L} (\alpha, \beta, a, c\mid Y))= \frac{\alpha -1}{N}\sum_{i=1}^N \ln (Y_i - a) + \frac{\beta -1}{N}\sum_{i=1}^N \ln (c - Y_i)- \ln \Beta(\alpha,\beta) - (\alpha+\beta -1) \ln (c-a)
For the four parameter case, the Fisher information has 4×4 = 16 components. It has 12 off-diagonal components (= 4×4 total − 4 diagonal). Since the Fisher information matrix is symmetric, half of these components (12/2 = 6) are independent. Therefore, the Fisher information matrix has 6 independent off-diagonal + 4 diagonal = 10 independent components. Aryal and Nadarajah calculated Fisher's information matrix for the four parameter case as follows:
:- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}(\alpha,\beta,a,c\mid Y)}{\partial \alpha^2}= \operatorname{var}[\ln (X)]= \psi_1(\alpha) - \psi_1(\alpha + \beta) = \mathcal{I}_{\alpha, \alpha}= \operatorname{E}\left [- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}(\alpha,\beta,a,c\mid Y)}{\partial \alpha^2} \right ] = \ln (\operatorname{var_{GX}})
:-\frac{1}{N} \frac{\partial^2\ln \mathcal{L}(\alpha,\beta,a,c\mid Y)}{\partial \beta^2} = \operatorname{var}[\ln (1-X)]= \psi_1(\beta) - \psi_1(\alpha + \beta) =\mathcal{I}_{\beta, \beta}= \operatorname{E} \left [- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}(\alpha,\beta,a,c\mid Y)}{\partial \beta^2} \right ] = \ln(\operatorname{var_{G(1-X)}})
:-\frac{1}{N} \frac{\partial^2\ln \mathcal{L}(\alpha,\beta,a,c\mid Y)}{\partial \alpha \, \partial \beta} = \operatorname{cov}[\ln X,\ln (1-X)] = -\psi_1(\alpha+\beta) =\mathcal{I}_{\alpha, \beta}= \operatorname{E} \left [- \frac{1}{N}\frac{\partial^2\ln \mathcal{L}(\alpha,\beta,a,c\mid Y)}{\partial \alpha \, \partial \beta} \right ] = \ln(\operatorname{cov}_{G{X,(1-X)}})
In the above expressions, the use of ''X'' instead of ''Y'' in the expressions var[ln(''X'')] = ln(var''GX'') is ''not an error''. The expressions in terms of the log geometric variances and log geometric covariance occur as functions of the two-parameter ''X'' ~ Beta(''α'', ''β'') parametrization because, when taking the partial derivatives with respect to the exponents (''α'', ''β'') in the four parameter case, one obtains the identical expressions as for the two parameter case: these terms of the four parameter Fisher information matrix are independent of the minimum ''a'' and maximum ''c'' of the distribution's range. The only non-zero term upon double differentiation of the log likelihood function with respect to the exponents ''α'' and ''β'' is the second derivative of the log of the beta function: ln(B(''α'', ''β'')). This term is independent of the minimum ''a'' and maximum ''c'' of the distribution's range. Double differentiation of this term results in trigamma functions. The sections titled "Maximum likelihood", "Two unknown parameters" and "Four unknown parameters" also show this fact. The Fisher information for ''N'' i.i.d. samples is ''N'' times the individual Fisher information (eq. 11.279, page 394 of Cover and Thomas). (Aryal and Nadarajah take a single observation, ''N'' = 1, to calculate the following components of the Fisher information, which leads to the same result as considering the derivatives of the log likelihood per ''N'' observations. Moreover, below, the erroneous expression in Aryal and Nadarajah for one of these components has been corrected.)
:\begin{align} \alpha > 2: \quad \operatorname{E}\left [- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial a^2} \right ] &= \mathcal{I}_{a, a}=\frac{\beta(\alpha+\beta-1)}{(\alpha-2)(c-a)^2} \\ \beta > 2: \quad \operatorname{E}\left[-\frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial c^2} \right ] &= \mathcal{I}_{c, c} = \frac{\alpha(\alpha+\beta-1)}{(\beta-2)(c-a)^2} \\ \operatorname{E}\left[- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial a \, \partial c} \right ] &= \mathcal{I}_{a, c} = \frac{\alpha+\beta-1}{(c-a)^2} \\ \alpha > 1: \quad \operatorname{E}\left[- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial \alpha \, \partial a} \right ] &=\mathcal{I}_{\alpha, a} = \frac{\beta}{(\alpha-1)(c-a)} \\ \operatorname{E}\left[- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial \alpha \, \partial c} \right ] &= \mathcal{I}_{\alpha, c} = \frac{1}{c-a} \\ \operatorname{E}\left[- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial \beta \, \partial a} \right ] &= \mathcal{I}_{\beta, a} = -\frac{1}{c-a} \\ \beta > 1: \quad \operatorname{E}\left[- \frac{1}{N} \frac{\partial^2\ln \mathcal{L}}{\partial \beta \, \partial c} \right ] &= \mathcal{I}_{\beta, c} = -\frac{\alpha}{(\beta-1)(c-a)} \end{align}
The lower two diagonal entries of the Fisher information matrix, with respect to the parameter ''a'' (the minimum of the distribution's range), \mathcal{I}_{a, a}, and with respect to the parameter ''c'' (the maximum of the distribution's range), \mathcal{I}_{c, c}, are only defined for exponents α > 2 and β > 2 respectively. The Fisher information matrix component \mathcal{I}_{a, a} for the minimum ''a'' approaches infinity for exponent α approaching 2 from above, and the Fisher information matrix component \mathcal{I}_{c, c} for the maximum ''c'' approaches infinity for exponent β approaching 2 from above. The Fisher information matrix for the four parameter case does not depend on the individual values of the minimum ''a'' and the maximum ''c'', but only on the total range (''c''−''a''). Moreover, the components of the Fisher information matrix that depend on the range (''c''−''a'') depend only through its inverse (or the square of the inverse), such that the Fisher information decreases for increasing range (''c''−''a''). The accompanying images show a subset of these Fisher information components; images for the two-parameter components are shown in the section on the geometric variances. All these Fisher information components look like a basin, with the "walls" of the basin being located at low values of the parameters. The following four-parameter-beta-distribution Fisher information components can be expressed in terms of the two-parameter ''X'' ~ Beta(α, β) expectations of the transformed ratio ((1 − ''X'')/''X'') and of its mirror image (''X''/(1 − ''X'')), scaled by the range (''c''−''a''), which may be helpful for interpretation:
:\mathcal{I}_{\alpha, a} =\frac{\operatorname{E}\left[\frac{1-X}{X}\right]}{c-a}= \frac{\beta}{(\alpha-1)(c-a)} \text{ if }\alpha > 1
:\mathcal{I}_{\beta, c} = -\frac{\operatorname{E}\left[\frac{X}{1-X}\right]}{c-a}=- \frac{\alpha}{(\beta-1)(c-a)}\text{ if }\beta> 1
These are also the expected values of the "inverted beta distribution" or
beta prime distribution
(also known as beta distribution of the second kind or Pearson's Type VI) and its mirror image, scaled by the range (''c'' − ''a''). Also, the following Fisher information components can be expressed in terms of the harmonic (1/''X'') variances or of variances based on the ratio transformed variables ((1 − ''X'')/''X'') as follows:
:\begin{align} \alpha > 2: \quad \mathcal{I}_{a,a} &=\operatorname{var} \left [\frac{1-X}{X} \right] \left (\frac{\alpha-1}{c-a} \right )^2 =\operatorname{var} \left [\frac{1}{X} \right ] \left (\frac{\alpha-1}{c-a} \right)^2 = \frac{\beta(\alpha+\beta-1)}{(\alpha-2)(c-a)^2} \\ \beta > 2: \quad \mathcal{I}_{c,c} &= \operatorname{var} \left [\frac{X}{1-X} \right ] \left (\frac{\beta-1}{c-a} \right )^2 = \operatorname{var} \left [\frac{1}{1-X} \right ] \left (\frac{\beta-1}{c-a} \right )^2 =\frac{\alpha(\alpha+\beta-1)}{(\beta-2)(c-a)^2} \\ \mathcal{I}_{a,c} &=-\operatorname{cov} \left [\frac{1-X}{X},\frac{X}{1-X} \right ]\frac{(\alpha-1)(\beta-1)}{(c-a)^2} = -\operatorname{cov} \left [\frac{1}{X},\frac{1}{1-X} \right ] \frac{(\alpha-1)(\beta-1)}{(c-a)^2} =\frac{\alpha+\beta-1}{(c-a)^2} \end{align}
See the section "Moments of linearly transformed, product and inverted random variables" for these expectations. The determinant of Fisher's information matrix is of interest (for example for the calculation of Jeffreys prior probability). From the expressions for the individual components, the determinant of Fisher's (symmetric) information matrix for the beta distribution with four parameters follows by the usual cofactor expansion of the symmetric 4×4 matrix in the ten independent components \mathcal{I}_{i,j}; the resulting lengthy expression is defined for α, β > 2. Using Sylvester's criterion (checking whether the diagonal elements are all positive), and since the diagonal components \mathcal{I}_{a,a} and \mathcal{I}_{c,c} have singularities at α = 2 and β = 2, it follows that the Fisher information matrix for the four parameter case is positive-definite for α > 2 and β > 2. Since for α > 2 and β > 2 the beta distribution is (symmetric or unsymmetric) bell shaped, it follows that the Fisher information matrix is positive-definite only for bell-shaped (symmetric or unsymmetric) beta distributions, with inflection points located to either side of the mode. Thus, important well known distributions belonging to the four-parameter beta distribution family, like the parabolic distribution (Beta(2,2,''a'',''c'')) and the continuous uniform distribution (Beta(1,1,''a'',''c'')), have Fisher information components (\mathcal{I}_{a,a},\mathcal{I}_{c,c},\mathcal{I}_{\alpha,a},\mathcal{I}_{\beta,c}) that blow up (approach infinity) in the four-parameter case (although their Fisher information components are all defined for the two parameter case). The four-parameter Wigner semicircle distribution (Beta(3/2,3/2,''a'',''c'')) and arcsine distribution (Beta(1/2,1/2,''a'',''c'')) have negative Fisher information determinants for the four-parameter case.


Bayesian inference

The use of Beta distributions in Bayesian inference is due to the fact that they provide a family of conjugate prior probability distributions for binomial (including
Bernoulli
) and geometric distributions. The domain of the beta distribution can be viewed as a probability, and in fact the beta distribution is often used to describe the distribution of a probability value ''p'':
:P(p;\alpha,\beta) = \frac{p^{\alpha-1}(1-p)^{\beta-1}}{\Beta(\alpha,\beta)}.
Examples of beta distributions used as prior probabilities to represent ignorance of prior parameter values in Bayesian inference are Beta(1,1), Beta(0,0) and Beta(1/2,1/2).
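A minimal sketch of the conjugacy property (standard Beta-binomial updating, not code from the article): with a Beta(α, β) prior on ''p'' and ''s'' successes in ''n'' Bernoulli trials, the posterior is Beta(α + ''s'', β + ''n'' − ''s'').

```python
# Conjugate beta-binomial update for the probability p of success.
from scipy import stats

def posterior(alpha, beta_, successes, trials):
    return stats.beta(alpha + successes, beta_ + trials - successes)

post = posterior(1.0, 1.0, successes=7, trials=10)   # uniform (Bayes-Laplace) prior
print(post.mean())            # (7 + 1) / (10 + 2) = 0.666..., Laplace's rule of succession
print(post.interval(0.95))    # central 95% credible interval for p
```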


Rule of succession

A classic application of the beta distribution is the rule of succession, introduced in the 18th century by Pierre-Simon Laplace in the course of treating the sunrise problem. It states that, given ''s'' successes in ''n'' conditionally independent Bernoulli trials with probability ''p'', the estimate of the expected value in the next trial is \frac{s+1}{n+2}. This estimate is the expected value of the posterior distribution over ''p'', namely Beta(''s''+1, ''n''−''s''+1), which is given by Bayes' rule if one assumes a uniform prior probability over ''p'' (i.e., Beta(1, 1)) and then observes that ''p'' generated ''s'' successes in ''n'' trials. Laplace's rule of succession has been criticized by prominent scientists. R. T. Cox described Laplace's application of the rule of succession to the sunrise problem (p. 89) as "a travesty of the proper use of the principle." Keynes remarks (Ch. XXX, p. 382) "indeed this is so foolish a theorem that to entertain it is discreditable." Karl Pearson showed that the probability that the next (''n'' + 1) trials will be successes, after ''n'' successes in ''n'' trials, is only 50%, which has been considered too low by scientists like Jeffreys and unacceptable as a representation of the scientific process of experimentation to test a proposed scientific law. As pointed out by Jeffreys (p. 128) (crediting C. D. Broad), Laplace's rule of succession establishes a high probability of success ((''n''+1)/(''n''+2)) in the next trial, but only a moderate probability (50%) that a further sample (''n''+1) comparable in size will be equally successful. As pointed out by Perks, "The rule of succession itself is hard to accept. It assigns a probability to the next trial which implies the assumption that the actual run observed is an average run and that we are always at the end of an average run. It would, one would think, be more reasonable to assume that we were in the middle of an average run. Clearly a higher value for both probabilities is necessary if they are to accord with reasonable belief." These problems with Laplace's rule of succession motivated Haldane, Perks, Jeffreys and others to search for other forms of prior probability (see the following sections). According to Jaynes, the main problem with the rule of succession is that it is not valid when ''s'' = 0 or ''s'' = ''n'' (see the article on the rule of succession for an analysis of its validity).
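A small numerical check of two of the statements above, using standard beta-function identities (a sketch, not code from the article): Laplace's rule gives (''s''+1)/(''n''+2) for the next trial, while, as Pearson noted, the probability that the next ''n''+1 trials are all successes after ''n'' successes in ''n'' trials is exactly 1/2.

```python
# Rule of succession and Pearson's 50% observation, for a uniform prior.
import numpy as np
from scipy.special import betaln

n = 20
# posterior after n successes in n trials under a uniform prior: Beta(n+1, 1)
next_trial = (n + 1) / (n + 2)                            # Laplace's rule of succession
# P(next n+1 trials all succeed) = B(2n+2, 1) / B(n+1, 1)
next_run = np.exp(betaln(2 * n + 2, 1) - betaln(n + 1, 1))
print(next_trial)    # 0.9545... for n = 20
print(next_run)      # 0.5 exactly, for any n
```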


Bayes-Laplace prior probability (Beta(1,1))

The beta distribution achieves maximum differential entropy for Beta(1,1): the uniform probability density, for which all values in the domain of the distribution have equal density. This uniform distribution Beta(1,1) was suggested ("with a great deal of doubt") by Thomas Bayes as the prior probability distribution to express ignorance about the correct prior distribution. This prior distribution was adopted (apparently, from his writings, with little sign of doubt) by Pierre-Simon Laplace, and hence it was also known as the "Bayes-Laplace rule" or the "Laplace rule" of "inverse probability" in publications of the first half of the 20th century. In the later part of the 19th century and early part of the 20th century, scientists realized that the assumption of uniform "equal" probability density depended on the actual functions (for example whether a linear or a logarithmic scale was most appropriate) and parametrizations used. In particular, the behavior near the ends of distributions with finite support (for example near ''x'' = 0, for a distribution with initial support at ''x'' = 0) required particular attention. Keynes (Ch. XXX, p. 381) criticized the use of Bayes's uniform prior probability (Beta(1,1)), under which all values between zero and one are equiprobable, as follows: "Thus experience, if it shows anything, shows that there is a very marked clustering of statistical ratios in the neighborhoods of zero and unity, of those for positive theories and for correlations between positive qualities in the neighborhood of zero, and of those for negative theories and for correlations between negative qualities in the neighborhood of unity."


Haldane's prior probability (Beta(0,0))

The Beta(0,0) distribution was proposed by J.B.S. Haldane, who suggested that the prior probability representing complete uncertainty should be proportional to ''p''^−1(1−''p'')^−1. The function ''p''^−1(1−''p'')^−1 can be viewed as the limit of the numerator of the beta distribution as both shape parameters approach zero: α, β → 0. The Beta function (in the denominator of the beta distribution) approaches infinity for both parameters approaching zero, α, β → 0. Therefore, ''p''^−1(1−''p'')^−1 divided by the Beta function approaches a 2-point
Bernoulli distribution
with equal probability 1/2 at each end, at 0 and 1, and nothing in between, as α, β → 0: a coin-toss, with one face of the coin at 0 and the other face at 1. The Haldane prior probability distribution Beta(0,0) is an "improper prior" because its integration (from 0 to 1) fails to strictly converge to 1 due to the singularities at each end. However, this is not an issue for computing posterior probabilities unless the sample size is very small. Furthermore, Zellner points out that on the log-odds scale (the logit transformation ln(''p''/(1 − ''p''))), the Haldane prior is the uniformly flat prior. The fact that a uniform prior probability on the logit transformed variable ln(''p''/(1 − ''p'')) (with domain (−∞, ∞)) is equivalent to the Haldane prior on the domain
[0, 1]
was pointed out by Harold Jeffreys in the first edition (1939) of his book Theory of Probability (p. 123). Jeffreys writes "Certainly if we take the Bayes-Laplace rule right up to the extremes we are led to results that do not correspond to anybody's way of thinking. The (Haldane) rule d''x''/(''x''(1−''x'')) goes too far the other way. It would lead to the conclusion that if a sample is of one type with respect to some property there is a probability 1 that the whole population is of that type." The fact that "uniform" depends on the parametrization led Jeffreys to seek a form of prior that would be invariant under different parametrizations.


Jeffreys' prior probability (Beta(1/2,1/2) for a Bernoulli or for a binomial distribution)

Harold Jeffreys proposed to use an
uninformative prior
probability measure that should be invariant under reparameterization: proportional to the square root of the determinant of Fisher's information matrix. For the
Bernoulli distribution
, this can be shown as follows: for a coin that is "heads" with probability ''p'' ∈
[0, 1]
and is "tails" with probability 1 − ''p'', for a given (H,T) ∈ the probability is ''pH''(1 − ''p'')''T''. Since ''T'' = 1 − ''H'', the
Bernoulli distribution
is ''p''^''H''(1 − ''p'')^(1 − ''H''). Considering ''p'' as the only parameter, it follows that the log likelihood for the Bernoulli distribution is
:\ln \mathcal{L} (p\mid H) = H \ln(p)+ (1-H) \ln(1-p).
The Fisher information matrix has only one component (it is a scalar, because there is only one parameter: ''p''), therefore:
:\begin{align} \sqrt{\mathcal{I}(p)} &= \sqrt{\operatorname{E}\left[\left(\frac{d}{dp} \ln \mathcal{L}(p\mid H)\right)^2 \right]} \\ &= \sqrt{\operatorname{E}\left[\left(\frac{H}{p} - \frac{1-H}{1-p}\right)^2 \right]} \\ &= \sqrt{p\frac{1}{p^2} + (1-p)\frac{1}{(1-p)^2}} \\ &= \frac{1}{\sqrt{p(1-p)}}. \end{align}
Similarly, for the binomial distribution with ''n'' Bernoulli trials, it can be shown that
:\sqrt{\mathcal{I}(p)}= \frac{\sqrt{n}}{\sqrt{p(1-p)}}.
Thus, for the
Bernoulli
and binomial distributions, Jeffreys prior is proportional to \scriptstyle \frac{1}{\sqrt{p(1-p)}}, which happens to be proportional to a beta distribution with domain variable ''x'' = ''p'' and shape parameters α = β = 1/2, the arcsine distribution:
:\operatorname{Beta}(\tfrac{1}{2}, \tfrac{1}{2}) = \frac{1}{\pi \sqrt{p(1-p)}}.
It will be shown in the next section that the normalizing constant for Jeffreys prior is immaterial to the final result because the normalizing constant cancels out in Bayes theorem for the posterior probability. Hence Beta(1/2,1/2) is used as the Jeffreys prior for both Bernoulli and binomial distributions. As shown in the next section, when using this expression as a prior probability times the likelihood in Bayes theorem, the posterior probability turns out to be a beta distribution. It is important to realize, however, that Jeffreys prior is proportional to \scriptstyle \frac{1}{\sqrt{p(1-p)}} for the Bernoulli and binomial distribution, but not for the beta distribution. Jeffreys prior for the beta distribution is given by the determinant of Fisher's information for the beta distribution, which, as shown in the section on the Fisher information matrix, is a function of the
trigamma function In mathematics, the trigamma function, denoted or , is the second of the polygamma functions, and is defined by : \psi_1(z) = \frac \ln\Gamma(z). It follows from this definition that : \psi_1(z) = \frac \psi(z) where is the digamma functio ...
ψ1 of the shape parameters α and β as follows:

:\begin{align}
\sqrt{\det(\mathcal{I}(\alpha, \beta))} &= \sqrt{\psi_1(\alpha)\psi_1(\beta) - (\psi_1(\alpha) + \psi_1(\beta))\psi_1(\alpha + \beta)} \\
\lim_{\alpha, \beta \to 0} \sqrt{\det(\mathcal{I}(\alpha, \beta))} &= \infty \\
\lim_{\alpha, \beta \to \infty} \sqrt{\det(\mathcal{I}(\alpha, \beta))} &= 0
\end{align}

As previously discussed, Jeffreys prior for the Bernoulli and binomial distributions is proportional to the arcsine distribution Beta(1/2,1/2), a one-dimensional ''curve'' that looks like a basin as a function of the parameter ''p'' of the Bernoulli and binomial distributions. The walls of the basin are formed by ''p'' approaching the singularities at the ends ''p'' → 0 and ''p'' → 1, where Beta(1/2,1/2) approaches infinity. Jeffreys prior for the beta distribution is a ''2-dimensional surface'' (embedded in a three-dimensional space) that looks like a basin with only two of its walls meeting at the corner α = β = 0 (and missing the other two walls), as a function of the shape parameters α and β of the beta distribution. The two adjoining walls of this 2-dimensional surface are formed by the shape parameters α and β approaching the singularities (of the trigamma function) at α, β → 0. It has no walls for α, β → ∞ because in this case the determinant of Fisher's information matrix for the beta distribution approaches zero.

It will be shown in the next section that Jeffreys prior probability results in posterior probabilities (when multiplied by the binomial likelihood function) that are intermediate between the posterior probability results of the Haldane and Bayes prior probabilities.

Jeffreys prior may be difficult to obtain analytically, and for some cases it just doesn't exist (even for simple distribution functions like the asymmetric triangular distribution). Berger, Bernardo and Sun, in a 2009 paper, defined a reference prior probability distribution that (unlike Jeffreys prior) exists for the asymmetric triangular distribution. They cannot obtain a closed-form expression for their reference prior, but numerical calculations show it to be nearly perfectly fitted by the (proper) prior

: \operatorname{Beta}(\tfrac{1}{2}, \tfrac{1}{2}) \sim \frac{1}{\pi\sqrt{\theta(1-\theta)}}

where θ is the vertex variable for the asymmetric triangular distribution with support [0, 1] (corresponding to the following parameter values in the article on the triangular distribution: vertex ''c'' = ''θ'', left end ''a'' = 0, and right end ''b'' = 1). Berger et al. also give a heuristic argument that Beta(1/2,1/2) could indeed be the exact Berger–Bernardo–Sun reference prior for the asymmetric triangular distribution. Therefore, Beta(1/2,1/2) not only is Jeffreys prior for the Bernoulli and binomial distributions, but also seems to be the Berger–Bernardo–Sun reference prior for the asymmetric triangular distribution (for which the Jeffreys prior does not exist), a distribution used in project management and PERT analysis to describe the cost and duration of project tasks.

Clarke and Barron prove that, among continuous positive priors, Jeffreys prior (when it exists) asymptotically maximizes Shannon's mutual information between a sample of size ''n'' and the parameter, and therefore ''Jeffreys prior is the most uninformative prior'' (measuring information as Shannon information). The proof rests on an examination of the Kullback–Leibler divergence between probability density functions for iid random variables.
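The expression above is easy to evaluate numerically. The following sketch (an illustration only; the helper names trigamma and jeffreys_beta and the grid of test points are ad hoc choices) evaluates the square root of the Fisher-information determinant of the beta distribution with SciPy's polygamma function, showing the "basin" behaviour described above.

```python
import numpy as np
from scipy.special import polygamma

def trigamma(x):
    # psi_1(x): the polygamma function of order 1
    return polygamma(1, x)

def jeffreys_beta(a, b):
    # sqrt of det(I(a, b)) for Beta(a, b):
    # det I = psi_1(a) psi_1(b) - (psi_1(a) + psi_1(b)) psi_1(a + b)
    det = trigamma(a) * trigamma(b) - (trigamma(a) + trigamma(b)) * trigamma(a + b)
    return np.sqrt(det)

# The prior blows up as the shape parameters approach 0 and decays toward 0 as they grow.
for a, b in [(0.01, 0.01), (0.5, 0.5), (1, 1), (5, 5), (50, 50)]:
    print(f"alpha=beta={a}: sqrt(det I) = {jeffreys_beta(a, b):.6f}")
```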


Effect of different prior probability choices on the posterior beta distribution

If samples are drawn from the population of a random variable ''X'' that result in ''s'' successes and ''f'' failures in ''n'' Bernoulli trials, ''n'' = ''s'' + ''f'', then the likelihood function for parameters ''s'' and ''f'' given ''x'' = ''p'' (the notation ''x'' = ''p'' in the expressions below will emphasize that the domain ''x'' stands for the value of the parameter ''p'' in the binomial distribution), is the following binomial distribution:

:\mathcal{L}(s,f\mid x=p) = {n \choose s} x^s(1-x)^f = {n \choose s} x^s(1-x)^{n-s}.

If beliefs about prior probability information are reasonably well approximated by a beta distribution with parameters ''α'' Prior and ''β'' Prior, then:

:\operatorname{prior}(x=p;\alpha \operatorname{Prior},\beta \operatorname{Prior}) = \frac{x^{\alpha \operatorname{Prior}-1}(1-x)^{\beta \operatorname{Prior}-1}}{\Beta(\alpha \operatorname{Prior},\beta \operatorname{Prior})}

According to Bayes' theorem for a continuous event space, the posterior probability is given by the product of the prior probability and the likelihood function (given the evidence ''s'' and ''f'' = ''n'' − ''s''), normalized so that the area under the curve equals one, as follows:

:\begin{align}
& \operatorname{posterior}(x=p\mid s,n-s) \\
= {} & \frac{\operatorname{prior}(x=p;\alpha \operatorname{Prior},\beta \operatorname{Prior})\,\mathcal{L}(s,f\mid x=p)}{\int_0^1 \operatorname{prior}(x=p;\alpha \operatorname{Prior},\beta \operatorname{Prior})\,\mathcal{L}(s,f\mid x=p)\,dx} \\
= {} & \frac{{n \choose s} x^{s+\alpha \operatorname{Prior}-1}(1-x)^{n-s+\beta \operatorname{Prior}-1}/\Beta(\alpha \operatorname{Prior},\beta \operatorname{Prior})}{\int_0^1 {n \choose s} x^{s+\alpha \operatorname{Prior}-1}(1-x)^{n-s+\beta \operatorname{Prior}-1}/\Beta(\alpha \operatorname{Prior},\beta \operatorname{Prior})\, dx} \\
= {} & \frac{x^{s+\alpha \operatorname{Prior}-1}(1-x)^{n-s+\beta \operatorname{Prior}-1}}{\int_0^1 x^{s+\alpha \operatorname{Prior}-1}(1-x)^{n-s+\beta \operatorname{Prior}-1}\,dx} \\
= {} & \frac{x^{s+\alpha \operatorname{Prior}-1}(1-x)^{n-s+\beta \operatorname{Prior}-1}}{\Beta(s+\alpha \operatorname{Prior},n-s+\beta \operatorname{Prior})}.
\end{align}

The binomial coefficient

:{n \choose s} = \frac{n!}{s!(n-s)!} = \frac{\Gamma(n+1)}{\Gamma(s+1)\Gamma(n-s+1)}
appears both in the numerator and the denominator of the posterior probability, and it does not depend on the integration variable ''x'', hence it cancels out, and it is irrelevant to the final result. Similarly the normalizing factor for the prior probability, the beta function B(αPrior,βPrior), cancels out and it is immaterial to the final result. The same posterior probability result can be obtained if one uses an un-normalized prior

:x^{\alpha \operatorname{Prior}-1}(1-x)^{\beta \operatorname{Prior}-1}

because the normalizing factors all cancel out. Several authors (including Jeffreys himself) thus use an un-normalized prior formula since the normalization constant cancels out. The numerator of the posterior probability ends up being just the (un-normalized) product of the prior probability and the likelihood function, and the denominator is its integral from zero to one. The beta function in the denominator, B(''s'' + ''α'' Prior, ''n'' − ''s'' + ''β'' Prior), appears as a normalization constant to ensure that the total posterior probability integrates to unity.

The ratio ''s''/''n'' of the number of successes to the total number of trials is a sufficient statistic in the binomial case, which is relevant for the following results.

For the Bayes' prior probability (Beta(1,1)), the posterior probability is:

:\operatorname{posterior}(p=x\mid s,f) = \frac{x^s(1-x)^{n-s}}{\Beta(s+1,n-s+1)}, \quad \text{mean} = \frac{s+1}{n+2}, \quad \text{mode} = \frac{s}{n} \text{ for } 0 < s < n.

For the Jeffreys' prior probability (Beta(1/2,1/2)), the posterior probability is:

:\operatorname{posterior}(p=x\mid s,f) = \frac{x^{s-1/2}(1-x)^{n-s-1/2}}{\Beta(s+\tfrac{1}{2},n-s+\tfrac{1}{2})}, \quad \text{mean} = \frac{s+\tfrac{1}{2}}{n+1}, \quad \text{mode} = \frac{s-\tfrac{1}{2}}{n-1} \text{ for } \tfrac{1}{2} < s < n-\tfrac{1}{2},

and for the Haldane prior probability (Beta(0,0)), the posterior probability is:

:\operatorname{posterior}(p=x\mid s,f) = \frac{x^{s-1}(1-x)^{n-s-1}}{\Beta(s,n-s)}, \quad \text{mean} = \frac{s}{n}, \quad \text{mode} = \frac{s-1}{n-2} \text{ for } 1 < s < n-1.

From the above expressions it follows that for ''s''/''n'' = 1/2 all the above three prior probabilities result in the identical location for the posterior probability mean = mode = 1/2. For ''s''/''n'' < 1/2, the means of the posterior probabilities, using the above priors, are such that: mean for Bayes prior > mean for Jeffreys prior > mean for Haldane prior. For ''s''/''n'' > 1/2 the order of these inequalities is reversed such that the Haldane prior probability results in the largest posterior mean. The ''Haldane'' prior probability Beta(0,0) results in a posterior probability density with ''mean'' (the expected value for the probability of success in the "next" trial) identical to the ratio ''s''/''n'' of the number of successes to the total number of trials. Therefore, the Haldane prior results in a posterior probability with expected value in the next trial equal to the maximum likelihood. The ''Bayes'' prior probability Beta(1,1) results in a posterior probability density with ''mode'' identical to the ratio ''s''/''n'' (the maximum likelihood).

In the case that 100% of the trials have been successful (''s'' = ''n''), the ''Bayes'' prior probability Beta(1,1) results in a posterior expected value equal to the rule of succession (''n'' + 1)/(''n'' + 2), while the Haldane prior Beta(0,0) results in a posterior expected value of 1 (absolute certainty of success in the next trial). Jeffreys prior probability results in a posterior expected value equal to (''n'' + 1/2)/(''n'' + 1). Perks (p. 303) points out: "This provides a new rule of succession and expresses a 'reasonable' position to take up, namely, that after an unbroken run of n successes we assume a probability for the next trial equivalent to the assumption that we are about half-way through an average run, i.e. that we expect a failure once in (2''n'' + 2) trials. 
The Bayes–Laplace rule implies that we are about at the end of an average run or that we expect a failure once in (''n'' + 2) trials. The comparison clearly favours the new result (what is now called Jeffreys prior) from the point of view of 'reasonableness'."

Conversely, in the case that 100% of the trials have resulted in failure (''s'' = 0), the ''Bayes'' prior probability Beta(1,1) results in a posterior expected value for success in the next trial equal to 1/(''n'' + 2), while the Haldane prior Beta(0,0) results in a posterior expected value of success in the next trial of 0 (absolute certainty of failure in the next trial). Jeffreys prior probability results in a posterior expected value for success in the next trial equal to (1/2)/(''n'' + 1), which Perks (p. 303) points out "is a much more reasonably remote result than the Bayes-Laplace result 1/(''n'' + 2)".

Jaynes questions (for the Haldane prior Beta(0,0)) the use of these formulas for the cases ''s'' = 0 or ''s'' = ''n'' because the integrals do not converge (the Haldane Beta(0,0) prior yields an improper posterior for ''s'' = 0 or ''s'' = ''n''). In practice, the conditions 0 < ''s'' < ''n'' are required. Perks (p. 303) shows that, for what is now known as the Jeffreys prior, this probability is ((''n'' + 1/2)/(''n'' + 1))((''n'' + 3/2)/(''n'' + 2))...(2''n'' + 1/2)/(2''n'' + 1), which for ''n'' = 1, 2, 3 gives 15/24, 315/480, 9009/13440; rapidly approaching a limiting value of 1/\sqrt{2} = 0.70710678\ldots as n tends to infinity. Perks remarks that what is now known as the Jeffreys prior: "is clearly more 'reasonable' than either the Bayes-Laplace result or the result on the (Haldane) alternative rule rejected by Jeffreys which gives certainty as the probability. It clearly provides a very much better correspondence with the process of induction. Whether it is 'absolutely' reasonable for the purpose, i.e. whether it is yet large enough, without the absurdity of reaching unity, is a matter for others to decide. But it must be realized that the result depends on the assumption of complete indifference and absence of knowledge prior to the sampling experiment."

Following are the variances of the posterior distribution obtained with these three prior probability distributions. For the Bayes' prior probability (Beta(1,1)), the posterior variance is:

:\text{var} = \frac{(s+1)(n-s+1)}{(n+2)^2(n+3)}, \text{ which for } s=\frac{n}{2} \text{ results in var} =\frac{1}{4(n+3)}

for the Jeffreys' prior probability (Beta(1/2,1/2)), the posterior variance is:

: \text{var} = \frac{(s+\tfrac{1}{2})(n-s+\tfrac{1}{2})}{(n+1)^2(n+2)}, \text{ which for } s=\frac{n}{2} \text{ results in var} = \frac{1}{4(n+2)}

and for the Haldane prior probability (Beta(0,0)), the posterior variance is:

:\text{var} = \frac{s(n-s)}{n^2(n+1)}, \text{ which for } s=\frac{n}{2} \text{ results in var} =\frac{1}{4(n+1)}

So, as remarked by Silvey, for large ''n'', the variance is small and hence the posterior distribution is highly concentrated, whereas the assumed prior distribution was very diffuse. This is in accord with what one would hope for, as vague prior knowledge is transformed (through Bayes theorem) into a more precise posterior knowledge by an informative experiment. For small ''n'' the Haldane Beta(0,0) prior results in the largest posterior variance while the Bayes Beta(1,1) prior results in the more concentrated posterior. Jeffreys prior Beta(1/2,1/2) results in a posterior variance in between the other two. As ''n'' increases, the variance rapidly decreases so that the posterior variance for all three priors converges to approximately the same value (approaching zero variance as ''n'' → ∞). 
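A minimal sketch of these comparisons (illustrative only; the helper name posterior_summary and the values of ''s'' and ''n'' are arbitrary choices) computes the posterior mean, mode and variance under the three priors via the conjugate update Beta(''s'' + α, ''n'' − ''s'' + β).

```python
from scipy.stats import beta

def posterior_summary(s, n, a_prior, b_prior):
    """Posterior Beta(s + a_prior, n - s + b_prior) summaries for a binomial likelihood."""
    a, b = s + a_prior, n - s + b_prior
    mean = a / (a + b)
    mode = (a - 1) / (a + b - 2) if a > 1 and b > 1 else None  # interior mode only
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, mode, var

s, n = 3, 10
priors = {"Haldane Beta(0,0)": (0.0, 0.0),
          "Jeffreys Beta(1/2,1/2)": (0.5, 0.5),
          "Bayes Beta(1,1)": (1.0, 1.0)}

for name, (a0, b0) in priors.items():
    mean, mode, var = posterior_summary(s, n, a0, b0)
    print(f"{name:>24}: mean={mean:.4f}  mode={mode}  var={var:.5f}")

# Cross-check one case against scipy's beta distribution object.
print("scipy check (Bayes):", beta(s + 1, n - s + 1).mean(), beta(s + 1, n - s + 1).var())
```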
Recalling the previous result that the ''Haldane'' prior probability Beta(0,0) results in a posterior probability density with ''mean'' (the expected value for the probability of success in the "next" trial) identical to the ratio s/n of the number of successes to the total number of trials, it follows from the above expression that also the ''Haldane'' prior Beta(0,0) results in a posterior with ''variance'' identical to the variance expressed in terms of the maximum likelihood estimate s/n and the sample size:

:\text{var} = \frac{\mu(1-\mu)}{1+\nu} = \frac{\frac{s}{n}\left(1-\frac{s}{n}\right)}{1+n}

with the mean ''μ'' = ''s''/''n'' and the sample size ''ν'' = ''n''.

In Bayesian inference, using a prior distribution Beta(''α''Prior,''β''Prior) prior to a binomial distribution is equivalent to adding (''α''Prior − 1) pseudo-observations of "success" and (''β''Prior − 1) pseudo-observations of "failure" to the actual number of successes and failures observed, then estimating the parameter ''p'' of the binomial distribution by the proportion of successes over both real- and pseudo-observations. A uniform prior Beta(1,1) does not add (or subtract) any pseudo-observations since for Beta(1,1) it follows that (''α''Prior − 1) = 0 and (''β''Prior − 1) = 0. The Haldane prior Beta(0,0) subtracts one pseudo-observation from each, and Jeffreys prior Beta(1/2,1/2) subtracts 1/2 of a pseudo-observation of success and an equal amount of failure. This subtraction has the effect of smoothing out the posterior distribution. If the proportion of successes is not 50% (''s''/''n'' ≠ 1/2), values of ''α''Prior and ''β''Prior less than 1 (and therefore negative (''α''Prior − 1) and (''β''Prior − 1)) favor sparsity, i.e. distributions where the parameter ''p'' is closer to either 0 or 1. In effect, values of ''α''Prior and ''β''Prior between 0 and 1, when operating together, function as a concentration parameter.

The accompanying plots show the posterior probability density functions for sample sizes ''n'' ∈ , successes ''s'' ∈  and Beta(''α''Prior,''β''Prior) ∈ . Also shown are the cases for ''n'' = , success ''s'' =  and Beta(''α''Prior,''β''Prior) ∈ . The first plot shows the symmetric cases, for successes ''s'' ∈ , with mean = mode = 1/2, and the second plot shows the skewed cases ''s'' ∈ . The images show that there is little difference between the priors for the posterior with sample size of 50 (characterized by a more pronounced peak near ''p'' = 1/2). Significant differences appear for very small sample sizes (in particular for the flatter distribution for the degenerate case of sample size = 3). Therefore, the skewed cases, with successes ''s'' = , show a larger effect from the choice of prior, at small sample size, than the symmetric cases. For symmetric distributions, the Bayes prior Beta(1,1) results in the most "peaky" and highest posterior distributions and the Haldane prior Beta(0,0) results in the flattest and lowest-peaked distribution. The Jeffreys prior Beta(1/2,1/2) lies in between them. For nearly symmetric, not too skewed distributions the effect of the priors is similar. For very small sample size (in this case for a sample size of 3) and skewed distribution (in this example for ''s'' ∈ ) the Haldane prior can result in a reverse-J-shaped distribution with a singularity at the left end. 
However, this happens only in degenerate cases (in this example ''n'' = 3 and hence ''s'' = 3/4 < 1, a degenerate value because s should be greater than unity in order for the posterior of the Haldane prior to have a mode located between the ends, and because ''s'' = 3/4 is not an integer number, hence it violates the initial assumption of a binomial distribution for the likelihood) and it is not an issue in generic cases of reasonable sample size (such that the condition 1 < ''s'' < ''n'' − 1, necessary for a mode to exist between both ends, is fulfilled).

In Chapter 12 (p. 385) of his book, Jaynes asserts that the ''Haldane prior'' Beta(0,0) describes a ''prior state of knowledge of complete ignorance'', where we are not even sure whether it is physically possible for an experiment to yield either a success or a failure, while the ''Bayes (uniform) prior Beta(1,1) applies if'' one knows that ''both binary outcomes are possible''. Jaynes states: "''interpret the Bayes-Laplace (Beta(1,1)) prior as describing not a state of complete ignorance'', but the state of knowledge in which we have observed one success and one failure...once we have seen at least one success and one failure, then we know that the experiment is a true binary one, in the sense of physical possibility." Jaynes does not specifically discuss Jeffreys prior Beta(1/2,1/2) (Jaynes's discussion of "Jeffreys prior" on pp. 181, 423 and in chapter 12 of his book refers instead to the improper, un-normalized, prior "1/''p'' ''dp''" introduced by Jeffreys in the 1939 edition of his book, seven years before he introduced what is now known as Jeffreys' invariant prior: the square root of the determinant of Fisher's information matrix. ''"1/p" is Jeffreys' (1946) invariant prior for the exponential distribution, not for the Bernoulli or binomial distributions''). However, it follows from the above discussion that Jeffreys Beta(1/2,1/2) prior represents a state of knowledge in between the Haldane Beta(0,0) and Bayes Beta(1,1) priors.

Similarly, Karl Pearson in his 1892 book The Grammar of Science (p. 144 of 1900 edition) maintained that the Bayes (Beta(1,1)) uniform prior was not a complete ignorance prior, and that it should be used when prior information justified "distributing our ignorance equally". K. Pearson wrote: "Yet the only supposition that we appear to have made is this: that, knowing nothing of nature, routine and anomy (from the Greek ανομία, namely: a- "without", and nomos "law") are to be considered as equally likely to occur. Now we were not really justified in making even this assumption, for it involves a knowledge that we do not possess regarding nature. We use our ''experience'' of the constitution and action of coins in general to assert that heads and tails are equally probable, but we have no right to assert before experience that, as we know nothing of nature, routine and breach are equally probable. In our ignorance we ought to consider before experience that nature may consist of all routines, all anomies (normlessness), or a mixture of the two in any proportion whatever, and that all such are equally probable. Which of these constitutions after experience is the most probable must clearly depend on what that experience has been like." 
If there is sufficient sampling data, ''and the posterior probability mode is not located at one of the extremes of the domain'' (''x'' = 0 or ''x'' = 1), the three priors of Bayes (Beta(1,1)), Jeffreys (Beta(1/2,1/2)) and Haldane (Beta(0,0)) should yield similar ''posterior'' probability densities. Otherwise, as Gelman et al. (p. 65) point out, "if so few data are available that the choice of noninformative prior distribution makes a difference, one should put relevant information into the prior distribution", or as Berger (p. 125) points out, "when different reasonable priors yield substantially different answers, can it be right to state that there ''is'' a single answer? Would it not be better to admit that there is scientific uncertainty, with the conclusion depending on prior beliefs?"


Occurrence and applications


Order statistics

The beta distribution has an important application in the theory of order statistics. A basic result is that the distribution of the ''k''th smallest of a sample of size ''n'' from a continuous uniform distribution has a beta distribution (David, H. A., Nagaraja, H. N. (2003) ''Order Statistics'' (3rd Edition). Wiley, New Jersey, p. 458). This result is summarized as:

:U_{(k)} \sim \operatorname{Beta}(k,n+1-k).

From this, and application of the theory related to the probability integral transform, the distribution of any individual order statistic from any continuous distribution can be derived.
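This result is easy to check by simulation. The sketch below is an informal verification with arbitrary choices of ''n'', ''k'' and sample count: it compares the empirical distribution of the ''k''-th smallest of ''n'' uniform draws with Beta(''k'', ''n''+1−''k'').

```python
import numpy as np
from scipy.stats import beta, kstest

rng = np.random.default_rng(0)
n, k = 10, 3                      # sample size and order statistic index (arbitrary)
samples = np.sort(rng.uniform(size=(100_000, n)), axis=1)[:, k - 1]  # k-th smallest

# Compare against Beta(k, n + 1 - k) with a Kolmogorov-Smirnov test and moments.
print(kstest(samples, beta(k, n + 1 - k).cdf))
print("empirical mean:", samples.mean(), " theoretical mean:", k / (n + 1))
```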


Subjective logic

In standard logic, propositions are considered to be either true or false. In contradistinction, subjective logic assumes that humans cannot determine with absolute certainty whether a proposition about the real world is absolutely true or false. In subjective logic the ''a posteriori'' probability estimates of binary events can be represented by beta distributions (A. Jøsang. A Logic for Uncertain Probabilities. ''International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems.'' 9(3), pp. 279-311, June 2001).


Wavelet analysis

A wavelet is a wave-like oscillation with an amplitude that starts out at zero, increases, and then decreases back to zero. It can typically be visualized as a "brief oscillation" that promptly decays. Wavelets can be used to extract information from many different kinds of data, including – but certainly not limited to – audio signals and images. Thus, wavelets are purposefully crafted to have specific properties that make them useful for signal processing. Wavelets are localized in both time and frequency whereas the standard Fourier transform is only localized in frequency. Therefore, standard Fourier transforms are only applicable to stationary processes, while wavelets are applicable to non-stationary processes. Continuous wavelets can be constructed based on the beta distribution. Beta wavelets (H.M. de Oliveira and G.A.A. Araújo. Compactly Supported One-cyclic Wavelets Derived from Beta Distributions. ''Journal of Communication and Information Systems.'' vol. 20, n. 3, pp. 27-33, 2005) can be viewed as a soft variety of Haar wavelets whose shape is fine-tuned by two shape parameters α and β.


Population genetics

The Balding–Nichols model is a two-parameter parametrization of the beta distribution used in population genetics. It is a statistical description of the allele frequencies in the components of a sub-divided population:

: \begin{align}
\alpha &= \mu \nu,\\
\beta &= (1 - \mu) \nu,
\end{align}

where \nu = \alpha+\beta = \frac{1-F}{F} and 0 < F < 1; here ''F'' is (Wright's) genetic distance between two populations.
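A minimal sketch of this reparametrization (the function name and the example values of μ and F are illustrative, not taken from the Balding–Nichols paper): convert (μ, F) to the usual shape parameters and draw allele frequencies for several subpopulations.

```python
import numpy as np

def balding_nichols_shapes(mu, F):
    # Convert the (mean, F) parametrization to beta shape parameters:
    # nu = alpha + beta = (1 - F) / F, alpha = mu * nu, beta = (1 - mu) * nu.
    nu = (1.0 - F) / F
    return mu * nu, (1.0 - mu) * nu

rng = np.random.default_rng(1)
alpha, beta_ = balding_nichols_shapes(mu=0.3, F=0.1)   # illustrative values
freqs = rng.beta(alpha, beta_, size=5)                 # allele frequencies in 5 subpopulations
print(alpha, beta_, freqs)
```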


Project management: task cost and schedule modeling

The beta distribution can be used to model events which are constrained to take place within an interval defined by a minimum and maximum value. For this reason, the beta distribution — along with the triangular distribution — is used extensively in PERT, critical path method (CPM), Joint Cost Schedule Modeling (JCSM) and other project management/control systems to describe the time to completion and the cost of a task. In project management, shorthand computations are widely used to estimate the mean and standard deviation of the beta distribution:

:\begin{align}
\mu(X) & = \frac{a + 4b + c}{6} \\
\sigma(X) & = \frac{c - a}{6}
\end{align}

where ''a'' is the minimum, ''c'' is the maximum, and ''b'' is the most likely value (the mode for ''α'' > 1 and ''β'' > 1).

The above estimate for the mean \mu(X) = \frac{a + 4b + c}{6} is known as the PERT three-point estimation and it is exact for either of the following values of ''β'' (for arbitrary α within these ranges):

:''β'' = ''α'' > 1 (symmetric case) with standard deviation \sigma(X) = \frac{c-a}{2\sqrt{2\alpha+1}}, skewness = 0, and excess kurtosis = \frac{-6}{2\alpha + 3}

or

:''β'' = 6 − ''α'' for 5 > ''α'' > 1 (skewed case) with standard deviation \sigma(X) = \frac{(c-a)\sqrt{\alpha(6-\alpha)}}{6\sqrt{7}}, skewness = \frac{(3-\alpha)\sqrt{7}}{2\sqrt{\alpha(6-\alpha)}}, and excess kurtosis = \frac{7(\alpha-3)^2 - 2\alpha(6-\alpha)}{3\alpha(6-\alpha)}

The above estimate for the standard deviation ''σ''(''X'') = (''c'' − ''a'')/6 is exact for either of the following values of ''α'' and ''β'':

:''α'' = ''β'' = 4 (symmetric) with skewness = 0, and excess kurtosis = −6/11.
:''β'' = 6 − ''α'' and \alpha = 3 - \sqrt{2} (right-tailed, positive skew) with skewness = \frac{1}{\sqrt{2}}, and excess kurtosis = 0
:''β'' = 6 − ''α'' and \alpha = 3 + \sqrt{2} (left-tailed, negative skew) with skewness = \frac{-1}{\sqrt{2}}, and excess kurtosis = 0

Otherwise, these can be poor approximations for beta distributions with other values of α and β, exhibiting average errors of 40% in the mean and 549% in the variance.
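As a sanity check on these shorthand rules, the sketch below (purely illustrative; the three-point values of ''a'', ''b'' and ''c'' are made up) compares the PERT estimates with the exact mean and standard deviation of a four-parameter beta distribution in the case α = β = 4, where both shorthands are exact.

```python
import math

def pert_estimates(a, b, c):
    # Classic PERT three-point shorthand: mean = (a + 4b + c)/6, sd = (c - a)/6.
    return (a + 4 * b + c) / 6.0, (c - a) / 6.0

def beta_moments(alpha, beta, a, c):
    # Exact mean and standard deviation of a beta distribution rescaled to [a, c].
    mean = a + (c - a) * alpha / (alpha + beta)
    var = (c - a) ** 2 * alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))
    return mean, math.sqrt(var)

a, c = 2.0, 14.0                       # minimum and maximum task duration (made up)
alpha = beta = 4.0                     # symmetric case where the shorthand is exact
b = a + (c - a) * (alpha - 1) / (alpha + beta - 2)   # mode of the rescaled beta

print("PERT estimate :", pert_estimates(a, b, c))
print("exact moments :", beta_moments(alpha, beta, a, c))
```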


Random variate generation

If ''X'' and ''Y'' are independent, with X \sim \Gamma(\alpha, \theta) and Y \sim \Gamma(\beta, \theta), then

:\frac{X}{X+Y} \sim \Beta(\alpha, \beta).

So one algorithm for generating beta variates is to generate \frac{X}{X+Y}, where ''X'' is a gamma variate with parameters (α, 1) and ''Y'' is an independent gamma variate with parameters (β, 1). In fact, here \frac{X}{X+Y} and X+Y are independent, and X+Y \sim \Gamma(\alpha + \beta, \theta). If Z \sim \Gamma(\gamma, \theta) and Z is independent of X and Y, then \frac{X+Y}{X+Y+Z} \sim \Beta(\alpha+\beta,\gamma) and \frac{X+Y}{X+Y+Z} is independent of \frac{X}{X+Y}. This shows that the product of independent \Beta(\alpha,\beta) and \Beta(\alpha+\beta,\gamma) random variables is a \Beta(\alpha,\beta+\gamma) random variable.

Also, the ''k''th order statistic of ''n'' uniformly distributed variates is \Beta(k, n+1-k), so an alternative if α and β are small integers is to generate α + β − 1 uniform variates and choose the α-th smallest.

Another way to generate the beta distribution is by the Pólya urn model. According to this method, one starts with an "urn" with α "black" balls and β "white" balls and draws uniformly with replacement. On every trial an additional ball is added according to the color of the last ball which was drawn. Asymptotically, the proportion of black and white balls will be distributed according to the beta distribution, where each repetition of the experiment will produce a different value.

It is also possible to use inverse transform sampling.
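A minimal sketch of the gamma-ratio recipe just described (illustrative only; the shape parameters and sample size are arbitrary), comparing its output moments with NumPy's direct beta sampler:

```python
import numpy as np

rng = np.random.default_rng(42)
alpha, beta_, size = 2.5, 4.0, 200_000   # arbitrary shape parameters

# Gamma-ratio method: X ~ Gamma(alpha, 1), Y ~ Gamma(beta, 1), X/(X+Y) ~ Beta(alpha, beta).
x = rng.gamma(alpha, 1.0, size)
y = rng.gamma(beta_, 1.0, size)
via_gamma = x / (x + y)

direct = rng.beta(alpha, beta_, size)     # reference sampler

print("means    :", via_gamma.mean(), direct.mean(), alpha / (alpha + beta_))
print("variances:", via_gamma.var(), direct.var(),
      alpha * beta_ / ((alpha + beta_) ** 2 * (alpha + beta_ + 1)))
```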


History

Thomas Bayes, in a posthumous paper published in 1763 by Richard Price, obtained a beta distribution as the density of the probability of success in Bernoulli trials, but the paper does not analyze any of the moments of the beta distribution or discuss any of its properties.

The first systematic modern discussion of the beta distribution is probably due to Karl Pearson. In Pearson's papers the beta distribution is couched as a solution of a differential equation: Pearson's Type I distribution, to which it is essentially identical except for arbitrary shifting and re-scaling (the beta and Pearson Type I distributions can always be equalized by proper choice of parameters). In fact, in several English books and journal articles in the few decades prior to World War II, it was common to refer to the beta distribution as Pearson's Type I distribution. William P. Elderton in his 1906 monograph "Frequency curves and correlation" further analyzes the beta distribution as Pearson's Type I distribution, including a full discussion of the method of moments for the four parameter case, and diagrams of (what Elderton describes as) U-shaped, J-shaped, twisted J-shaped, "cocked-hat" shapes, horizontal and angled straight-line cases. Elderton wrote "I am chiefly indebted to Professor Pearson, but the indebtedness is of a kind for which it is impossible to offer formal thanks." Elderton in his 1906 monograph provides an impressive amount of information on the beta distribution, including equations for the origin of the distribution chosen to be the mode, as well as for other Pearson distributions: types I through VII. Elderton also included a number of appendixes, including one appendix ("II") on the beta and gamma functions. In later editions, Elderton added equations for the origin of the distribution chosen to be the mean, and analysis of Pearson distributions VIII through XII.

As remarked by Bowman and Shenton, "Fisher and Pearson had a difference of opinion in the approach to (parameter) estimation, in particular relating to (Pearson's method of) moments and (Fisher's method of) maximum likelihood in the case of the Beta distribution." Also according to Bowman and Shenton, "the case of a Type I (beta distribution) model being the center of the controversy was pure serendipity. A more difficult model of 4 parameters would have been hard to find." The long-running public conflict of Fisher with Karl Pearson can be followed in a number of articles in prestigious journals. For example, concerning the estimation of the four parameters for the beta distribution, and Fisher's criticism of Pearson's method of moments as being arbitrary, see Pearson's article "Method of moments and method of maximum likelihood" (published three years after his retirement from University College, London, where his position had been divided between Fisher and Pearson's son Egon) in which Pearson writes "I read (Koshai's paper in the Journal of the Royal Statistical Society, 1933) which as far as I am aware is the only case at present published of the application of Professor Fisher's method. To my astonishment that method depends on first working out the constants of the frequency curve by the (Pearson) Method of Moments and then superposing on it, by what Fisher terms "the Method of Maximum Likelihood" a further approximation to obtain, what he holds, he will thus get, "more efficient values" of the curve constants." 
David and Edwards's treatise on the history of statistics cites the first modern treatment of the beta distribution, in 1911, using the beta designation that has become standard, due to Corrado Gini, an Italian statistician, demographer, and sociologist who developed the Gini coefficient. N. L. Johnson and S. Kotz, in their comprehensive and very informative monograph on leading historical personalities in statistical sciences, credit Corrado Gini as "an early Bayesian...who dealt with the problem of eliciting the parameters of an initial Beta distribution, by singling out techniques which anticipated the advent of the so-called empirical Bayes approach."


References


External links


"Beta Distribution"
by Fiona Maclachlan, the Wolfram Demonstrations Project, 2007.
Beta Distribution – Overview and Example
xycoon.com

brighton-webs.co.uk

exstrom.com
Harvard University Statistics 110 Lecture 23 Beta Distribution, Prof. Joe Blitzstein
* Mean absolute deviation around the mean symmetry
::\operatorname{E}[|X - E[X]|](\Beta(\alpha, \beta)) = \operatorname{E}[|X - E[X]|](\Beta(\beta, \alpha))
* Skewness skew-symmetry
::\operatorname{skewness}(\Beta(\alpha, \beta)) = - \operatorname{skewness}(\Beta(\beta, \alpha))
* Excess kurtosis symmetry
::\text{excess kurtosis}(\Beta(\alpha, \beta)) = \text{excess kurtosis}(\Beta(\beta, \alpha))
* Characteristic function symmetry of Real part (with respect to the origin of variable "t")
::\text{Re}[{}_1F_1(\alpha; \alpha+\beta; it)] = \text{Re}[{}_1F_1(\alpha; \alpha+\beta; -it)]
* Characteristic function skew-symmetry of Imaginary part (with respect to the origin of variable "t")
::\text{Im}[{}_1F_1(\alpha; \alpha+\beta; it)] = - \text{Im}[{}_1F_1(\alpha; \alpha+\beta; -it)]
* Characteristic function symmetry of Absolute value (with respect to the origin of variable "t")
::\text{Abs}[{}_1F_1(\alpha; \alpha+\beta; it)] = \text{Abs}[{}_1F_1(\alpha; \alpha+\beta; -it)]
* Differential entropy symmetry
::h(\Beta(\alpha, \beta)) = h(\Beta(\beta, \alpha))
* Relative entropy (also called Kullback–Leibler divergence) symmetry
::D_{\mathrm{KL}}(X_1 \parallel X_2) = D_{\mathrm{KL}}(X_2 \parallel X_1), \text{ if } h(X_1) = h(X_2), \text{ for (skewed) } \alpha \neq \beta
* Fisher information matrix symmetry
::\mathcal{I}_{i,j}(\Beta(\alpha, \beta)) = \mathcal{I}_{j,i}(\Beta(\beta, \alpha))


Geometry of the probability density function


Inflection points

For certain values of the shape parameters α and β, the probability density function has inflection points, at which the curvature changes sign. The position of these inflection points can be useful as a measure of the dispersion or spread of the distribution. Defining the following quantity:

:\kappa = \frac{\sqrt{\frac{(\alpha-1)(\beta-1)}{\alpha+\beta-3}}}{\alpha+\beta-2}

points of inflection occur, depending on the value of the shape parameters α and β, as follows:

*(α > 2, β > 2) The distribution is bell-shaped (symmetric for α = β and skewed otherwise), with two inflection points, equidistant from the mode:
::x = \text{mode} \pm \kappa = \frac{\alpha-1 \pm \sqrt{\frac{(\alpha-1)(\beta-1)}{\alpha+\beta-3}}}{\alpha+\beta-2}
* (α = 2, β > 2) The distribution is unimodal, positively skewed, right-tailed, with one inflection point, located to the right of the mode:
::x = \text{mode} + \kappa = \frac{2}{\beta}
* (α > 2, β = 2) The distribution is unimodal, negatively skewed, left-tailed, with one inflection point, located to the left of the mode:
::x = \text{mode} - \kappa = 1 - \frac{2}{\alpha}
* (1 < α < 2, β > 2, α+β > 2) The distribution is unimodal, positively skewed, right-tailed, with one inflection point, located to the right of the mode:
::x = \text{mode} + \kappa = \frac{\alpha-1 + \sqrt{\frac{(\alpha-1)(\beta-1)}{\alpha+\beta-3}}}{\alpha+\beta-2}
*(0 < α < 1, 1 < β < 2) The distribution has a mode at the left end ''x'' = 0 and it is positively skewed, right-tailed. There is one inflection point, located to the right of the mode:
::x = \frac{\alpha-1 + \sqrt{\frac{(\alpha-1)(\beta-1)}{\alpha+\beta-3}}}{\alpha+\beta-2}
*(α > 2, 1 < β < 2) The distribution is unimodal, negatively skewed, left-tailed, with one inflection point, located to the left of the mode:
::x = \text{mode} - \kappa = \frac{\alpha-1 - \sqrt{\frac{(\alpha-1)(\beta-1)}{\alpha+\beta-3}}}{\alpha+\beta-2}
*(1 < α < 2, 0 < β < 1) The distribution has a mode at the right end ''x'' = 1 and it is negatively skewed, left-tailed. There is one inflection point, located to the left of the mode:
::x = \frac{\alpha-1 - \sqrt{\frac{(\alpha-1)(\beta-1)}{\alpha+\beta-3}}}{\alpha+\beta-2}

There are no inflection points in the remaining (symmetric and skewed) regions: U-shaped (α, β < 1), upside-down-U-shaped (1 < α < 2, 1 < β < 2), reverse-J-shaped (α < 1, β > 2) or J-shaped (α > 2, β < 1).

The accompanying plots show the inflection point locations (shown vertically, ranging from 0 to 1) versus α and β (the horizontal axes ranging from 0 to 5). There are large cuts at surfaces intersecting the lines α = 1, β = 1, α = 2, and β = 2 because at these values the beta distribution changes from 2 modes, to 1 mode, to no mode.
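The closed form above is easy to cross-check numerically; the sketch below is an informal check with arbitrarily chosen shape parameters, comparing the formula against the sign changes of a numerical second derivative of the density.

```python
import numpy as np
from scipy.stats import beta

alpha, b = 4.0, 3.0        # arbitrary bell-shaped case (alpha > 2, beta > 2)
mode = (alpha - 1) / (alpha + b - 2)
kappa = np.sqrt((alpha - 1) * (b - 1) / (alpha + b - 3)) / (alpha + b - 2)
print("formula:", mode - kappa, mode + kappa)

# Numerical inflection points: where the second derivative of the pdf changes sign.
x = np.linspace(1e-4, 1 - 1e-4, 200_001)
pdf = beta(alpha, b).pdf(x)
d2 = np.gradient(np.gradient(pdf, x), x)
sign_changes = x[1:][np.diff(np.sign(d2)) != 0]
print("numeric:", sign_changes)
```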


Shapes

The beta density function can take a wide variety of different shapes depending on the values of the two parameters ''α'' and ''β''. The ability of the beta distribution to take this great diversity of shapes (using only two parameters) is partly responsible for finding wide application for modeling actual measurements:


Symmetric (''α'' = ''β'')

* the density function is symmetric about 1/2 (blue & teal plots).
* median = mean = 1/2.
* skewness = 0.
* variance = 1/(4(2α + 1))
* α = β < 1
** U-shaped (blue plot).
** bimodal: left mode = 0, right mode = 1, anti-mode = 1/2
** 1/12 < var(''X'') < 1/4
** −2 < excess kurtosis(''X'') < −6/5
** α = β = 1/2 is the arcsine distribution
*** var(''X'') = 1/8
*** excess kurtosis(''X'') = −3/2
*** CF = Rinc (t)
** α = β → 0 is a 2-point Bernoulli distribution with equal probability 1/2 at each Dirac delta function end ''x'' = 0 and ''x'' = 1 and zero probability everywhere else. A coin toss: one face of the coin being ''x'' = 0 and the other face being ''x'' = 1.
*** \lim_{\alpha = \beta \to 0} \operatorname{var}(X) = \tfrac{1}{4}
*** \lim_{\alpha = \beta \to 0} \text{excess kurtosis}(X) = -2 (a lower value than this is impossible for any distribution to reach)
*** The differential entropy approaches a minimum value of −∞
* α = β = 1
** the uniform [0, 1] distribution
** no mode
** var(''X'') = 1/12
** excess kurtosis(''X'') = −6/5
** The (negative anywhere else) differential entropy reaches its maximum value of zero
** CF = Sinc (t)
* ''α'' = ''β'' > 1
** symmetric unimodal
** mode = 1/2.
** 0 < var(''X'') < 1/12
** −6/5 < excess kurtosis(''X'') < 0
** ''α'' = ''β'' = 3/2 is a semi-elliptic [0, 1] distribution, see: Wigner semicircle distribution
*** var(''X'') = 1/16.
*** excess kurtosis(''X'') = −1
*** CF = 2 Jinc (t)
** ''α'' = ''β'' = 2 is the parabolic [0, 1] distribution
*** var(''X'') = 1/20
*** excess kurtosis(''X'') = −6/7
*** CF = 3 Tinc (t)
** ''α'' = ''β'' > 2 is bell-shaped, with inflection points located to either side of the mode
*** 0 < var(''X'') < 1/20
*** −6/7 < excess kurtosis(''X'') < 0
** ''α'' = ''β'' → ∞ is a 1-point degenerate distribution with a Dirac delta function spike at the midpoint ''x'' = 1/2 with probability 1, and zero probability everywhere else. There is 100% probability (absolute certainty) concentrated at the single point ''x'' = 1/2.
*** \lim_{\alpha = \beta \to \infty} \operatorname{var}(X) = 0
*** \lim_{\alpha = \beta \to \infty} \text{excess kurtosis}(X) = 0
*** The differential entropy approaches a minimum value of −∞


Skewed (''α'' ≠ ''β'')

The density function is skewed. An interchange of parameter values yields the mirror image (the reverse) of the initial curve. Some more specific cases:
*''α'' < 1, ''β'' < 1
** U-shaped
** Positive skew for α < β, negative skew for α > β.
** bimodal: left mode = 0, right mode = 1, anti-mode = \tfrac{1-\alpha}{2-\alpha-\beta}
** 0 < median < 1.
** 0 < var(''X'') < 1/4
*α > 1, β > 1
** unimodal (magenta & cyan plots),
** Positive skew for α < β, negative skew for α > β.
** \text{mode} = \tfrac{\alpha-1}{\alpha+\beta-2}
** 0 < median < 1
** 0 < var(''X'') < 1/12
*α < 1, β ≥ 1
** reverse J-shaped with a right tail,
** positively skewed,
** strictly decreasing, convex
** mode = 0
** 0 < median < 1/2.
** 0 < \operatorname{var}(X) < \tfrac{-11+5\sqrt{5}}{2}, (maximum variance occurs for \alpha=\tfrac{\sqrt{5}-1}{2}, \beta=1, or α = Φ the golden ratio conjugate)
*α ≥ 1, β < 1
** J-shaped with a left tail,
** negatively skewed,
** strictly increasing, convex
** mode = 1
** 1/2 < median < 1
** 0 < \operatorname{var}(X) < \tfrac{-11+5\sqrt{5}}{2}, (maximum variance occurs for \alpha=1, \beta=\tfrac{\sqrt{5}-1}{2}, or β = Φ the golden ratio conjugate)
*α = 1, β > 1
** positively skewed,
** strictly decreasing (red plot),
** a reversed (mirror-image) power function [0,1] distribution
** mean = 1 / (β + 1)
** median = 1 − 1/2^{1/β}
** mode = 0
** α = 1, 1 < β < 2
*** concave
*** 1-\tfrac{1}{\sqrt{2}} < \text{median} < \tfrac{1}{2}
*** 1/18 < var(''X'') < 1/12.
** α = 1, β = 2
*** a straight line with slope −2, the right-triangular distribution with right angle at the left end, at ''x'' = 0
*** \text{median} = 1-\tfrac{1}{\sqrt{2}}
*** var(''X'') = 1/18
** α = 1, β > 2
*** reverse J-shaped with a right tail,
*** convex
*** 0 < \text{median} < 1-\tfrac{1}{\sqrt{2}}
*** 0 < var(''X'') < 1/18
*α > 1, β = 1
** negatively skewed,
** strictly increasing (green plot),
** the power function [0, 1] distribution
** mean = α / (α + 1)
** median = 1/2^{1/α}
** mode = 1
** 2 > α > 1, β = 1
*** concave
*** \tfrac{1}{2} < \text{median} < \tfrac{1}{\sqrt{2}}
*** 1/18 < var(''X'') < 1/12
** α = 2, β = 1
*** a straight line with slope +2, the right-triangular distribution with right angle at the right end, at ''x'' = 1
*** \text{median} = \tfrac{1}{\sqrt{2}}
*** var(''X'') = 1/18
** α > 2, β = 1
*** J-shaped with a left tail, convex
*** \tfrac{1}{\sqrt{2}} < \text{median} < 1
*** 0 < var(''X'') < 1/18


Related distributions


Transformations

* If ''X'' ~ Beta(''α'', ''β'') then 1 − ''X'' ~ Beta(''β'', ''α''), mirror-image symmetry
* If ''X'' ~ Beta(''α'', ''β'') then \tfrac{X}{1-X} \sim \operatorname{Beta'}(\alpha,\beta), the beta prime distribution, also called "beta distribution of the second kind".
* If ''X'' ~ Beta(''α'', ''β'') then \tfrac{1}{X} - 1 \sim \operatorname{Beta'}(\beta,\alpha).
* If ''X'' ~ Beta(''n''/2, ''m''/2) then \tfrac{mX}{n(1-X)} \sim F(n,m) (assuming ''n'' > 0 and ''m'' > 0), the Fisher–Snedecor F distribution.
* If X \sim \operatorname{Beta}\left(1+\lambda\tfrac{m-\min}{\max-\min}, 1 + \lambda\tfrac{\max-m}{\max-\min}\right) then min + ''X''(max − min) ~ PERT(min, max, ''m'', ''λ'') where ''PERT'' denotes a PERT distribution used in PERT analysis, and ''m'' = most likely value (Herrerías-Velasco, José Manuel and Herrerías-Pleguezuelo, Rafael and van Dorp, Johan René. (2011). Revisiting the PERT mean and Variance. European Journal of Operational Research (210), p. 448–451). Traditionally ''λ'' = 4 in PERT analysis.
* If ''X'' ~ Beta(1, ''β'') then ''X'' ~ Kumaraswamy distribution with parameters (1, ''β'')
* If ''X'' ~ Beta(''α'', 1) then ''X'' ~ Kumaraswamy distribution with parameters (''α'', 1)
* If ''X'' ~ Beta(''α'', 1) then −ln(''X'') ~ Exponential(''α'')


Special and limiting cases

* Beta(1, 1) ~ U(0, 1), the continuous uniform distribution.
* Beta(n, 1) ~ Maximum of ''n'' independent rvs. with U(0, 1), sometimes called ''a standard power function distribution'' with density ''nx''^(''n''−1) on that interval.
* Beta(1, n) ~ Minimum of ''n'' independent rvs. with U(0, 1)
* If ''X'' ~ Beta(3/2, 3/2) and ''r'' > 0 then 2''rX'' − ''r'' ~ Wigner semicircle distribution.
* Beta(1/2, 1/2) is equivalent to the arcsine distribution. This distribution is also Jeffreys prior probability for the Bernoulli and binomial distributions. The arcsine probability density is a distribution that appears in several random-walk fundamental theorems. In a fair coin toss random walk, the probability for the time of the last visit to the origin is distributed as an (U-shaped) arcsine distribution. In a two-player fair-coin-toss game, a player is said to be in the lead if the random walk (that started at the origin) is above the origin. The most probable number of times that a given player will be in the lead, in a game of length 2''N'', is not ''N''. On the contrary, ''N'' is the least likely number of times that the player will be in the lead. The most likely number of times in the lead is 0 or 2''N'' (following the arcsine distribution).
* \lim_{n \to \infty} n \operatorname{Beta}(1,n) = \operatorname{Exponential}(1), the exponential distribution.
* \lim_{n \to \infty} n \operatorname{Beta}(k,n) = \operatorname{Gamma}(k,1), the gamma distribution.
* For large n, \operatorname{Beta}(\alpha n,\beta n) \to \mathcal{N}\left(\frac{\alpha}{\alpha+\beta},\frac{\alpha\beta}{(\alpha+\beta)^3}\frac{1}{n}\right), the normal distribution. More precisely, if X_n \sim \operatorname{Beta}(\alpha n,\beta n) then \sqrt{n}\left(X_n -\tfrac{\alpha}{\alpha+\beta}\right) converges in distribution to a normal distribution with mean 0 and variance \tfrac{\alpha\beta}{(\alpha+\beta)^3} as ''n'' increases (a quick numerical check of this limit is sketched below).
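The following sketch is an informal empirical check of the last limit (the values of α, β and ''n'' are arbitrary): compare the standard deviation of Beta(α''n'', β''n'') samples with the asymptotic value.

```python
import numpy as np

rng = np.random.default_rng(7)
a, b, n = 2.0, 3.0, 400          # arbitrary shapes and a moderately large n

samples = rng.beta(a * n, b * n, size=200_000)
asymptotic_sd = np.sqrt(a * b / (a + b) ** 3 / n)

print("sample mean:", samples.mean(), " vs ", a / (a + b))
print("sample sd  :", samples.std(), " vs asymptotic ", asymptotic_sd)
```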


Derived from other distributions

* The ''k''th order statistic of a sample of size ''n'' from the uniform distribution is a beta random variable, ''U''(''k'') ~ Beta(''k'', ''n''+1−''k'').
* If ''X'' ~ Gamma(α, θ) and ''Y'' ~ Gamma(β, θ) are independent, then \tfrac{X}{X+Y} \sim \operatorname{Beta}(\alpha, \beta)\,.
* If X \sim \chi^2(\alpha)\, and Y \sim \chi^2(\beta)\, are independent, then \tfrac{X}{X+Y} \sim \operatorname{Beta}(\tfrac{\alpha}{2}, \tfrac{\beta}{2}).
* If ''X'' ~ U(0, 1) and ''α'' > 0 then ''X''^(1/''α'') ~ Beta(''α'', 1), the power function distribution.
* If X \sim \operatorname{Bin}(k;n;p) then, viewed as a function of ''p'', the likelihood is proportional to a \operatorname{Beta}(\alpha, \beta) density for discrete values of ''n'' and ''k'', where \alpha=k+1 and \beta=n-k+1.
* If ''X'' ~ Cauchy(0, 1) then \tfrac{1}{1+X^2} \sim \operatorname{Beta}\left(\tfrac12, \tfrac12\right) (a numerical check of this relation follows below).
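The last relation is easy to verify by simulation (an informal check; the sample size is arbitrary): transform standard Cauchy draws by 1/(1+''X''²) and compare with Beta(1/2, 1/2).

```python
import numpy as np
from scipy.stats import beta, kstest

rng = np.random.default_rng(3)
x = rng.standard_cauchy(100_000)
transformed = 1.0 / (1.0 + x ** 2)

print(kstest(transformed, beta(0.5, 0.5).cdf))   # a large p-value is expected
```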


Combination with other distributions

* If ''X'' ~ Beta(''α'', ''β'') and ''Y'' ~ F(2''β'',2''α'') then \Pr\left(X \leq \tfrac{\alpha}{\alpha+\beta x}\right) = \Pr(Y \geq x)\, for all ''x'' > 0 (a numerical check of this identity follows below).
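A quick numerical check of the identity as reconstructed above (the particular α, β and evaluation point ''x'' are arbitrary):

```python
from scipy.stats import beta, f

a, b, x = 2.5, 4.0, 1.7   # arbitrary parameters and evaluation point

lhs = beta(a, b).cdf(a / (a + b * x))   # Pr(X <= alpha / (alpha + beta x))
rhs = f(2 * b, 2 * a).sf(x)             # Pr(Y >= x) with Y ~ F(2*beta, 2*alpha)
print(lhs, rhs)                         # the two numbers should agree
```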


Compounding with other distributions

* If ''p'' ~ Beta(α, β) and ''X'' ~ Bin(''k'', ''p'') then ''X'' ~ beta-binomial distribution (illustrated below)
* If ''p'' ~ Beta(α, β) and ''X'' ~ NB(''r'', ''p'') then ''X'' ~ beta negative binomial distribution
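A minimal illustration of the first compound (the parameter values and sample size are arbitrary): draw ''p'' from a beta, then ''X'' from a binomial with that ''p'', and compare the resulting counts with SciPy's betabinom (available in recent SciPy versions).

```python
import numpy as np
from scipy.stats import betabinom

rng = np.random.default_rng(11)
a, b, k = 2.0, 5.0, 20            # prior shapes and number of trials

p = rng.beta(a, b, size=200_000)  # p ~ Beta(a, b)
x = rng.binomial(k, p)            # X | p ~ Bin(k, p)  =>  X ~ BetaBinomial(k, a, b)

empirical = np.bincount(x, minlength=k + 1) / x.size
theoretical = betabinom(k, a, b).pmf(np.arange(k + 1))
print(np.round(empirical[:5], 4), np.round(theoretical[:5], 4))
```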


Generalisations

* The generalization to multiple variables, i.e. a multivariate beta distribution, is called a Dirichlet distribution. Univariate marginals of the Dirichlet distribution have a beta distribution. The beta distribution is conjugate to the binomial and Bernoulli distributions in exactly the same way as the Dirichlet distribution is conjugate to the multinomial distribution and categorical distribution.
* The Pearson type I distribution is identical to the beta distribution (except for arbitrary shifting and re-scaling that can also be accomplished with the four parameter parametrization of the beta distribution).
* The beta distribution is the special case of the noncentral beta distribution where \lambda = 0: \operatorname{Beta}(\alpha, \beta) = \operatorname{Beta}(\alpha,\beta,0).
* The generalized beta distribution is a five-parameter distribution family which has the beta distribution as a special case.
* The matrix variate beta distribution is a distribution for positive-definite matrices.


Statistical inference


Parameter estimation


Method of moments


Two unknown parameters

Two unknown parameters ((\hat{\alpha}, \hat{\beta}) of a beta distribution supported in the [0, 1] interval) can be estimated, using the method of moments, with the first two moments (sample mean and sample variance) as follows. Let:

: \text{sample mean} = \bar{x} = \frac{1}{N}\sum_{i=1}^N X_i

be the sample mean estimate and

: \text{sample variance} = \bar{v} = \frac{1}{N}\sum_{i=1}^N (X_i - \bar{x})^2

be the sample variance estimate. The method-of-moments estimates of the parameters are

:\hat{\alpha} = \bar{x} \left(\frac{\bar{x}(1 - \bar{x})}{\bar{v}} - 1 \right), \text{ if } \bar{v} < \bar{x}(1 - \bar{x}),

: \hat{\beta} = (1-\bar{x}) \left(\frac{\bar{x}(1 - \bar{x})}{\bar{v}} - 1 \right), \text{ if } \bar{v} < \bar{x}(1 - \bar{x}).

When the distribution is required over a known interval other than [0, 1] with random variable ''X'', say [''a'', ''c''] with random variable ''Y'', then replace \bar{x} with \frac{\bar{y}-a}{c-a} and \bar{v} with \frac{\bar{v}_Y}{(c-a)^2} in the above couple of equations for the shape parameters (see the "Alternative parametrizations, four parameters" section below), where:

: \text{sample mean} = \bar{y} = \frac{1}{N}\sum_{i=1}^N Y_i

: \text{sample variance} = \bar{v}_Y = \frac{1}{N}\sum_{i=1}^N (Y_i - \bar{y})^2
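A direct transcription of these moment equations (an illustrative sketch; the simulated true parameters are arbitrary):

```python
import numpy as np

def beta_method_of_moments(x):
    """Method-of-moments estimates (alpha_hat, beta_hat) for data on [0, 1]."""
    m = x.mean()
    v = x.var()                      # 1/N sample variance
    if v >= m * (1 - m):
        raise ValueError("sample variance too large for a beta fit")
    common = m * (1 - m) / v - 1
    return m * common, (1 - m) * common

rng = np.random.default_rng(5)
data = rng.beta(2.0, 5.0, size=50_000)   # true parameters chosen arbitrarily
print(beta_method_of_moments(data))      # should be close to (2, 5)
```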


Four unknown parameters

All four parameters (\hat{\alpha}, \hat{\beta}, \hat{a}, \hat{c} of a beta distribution supported in the [''a'', ''c''] interval; see the section "Alternative parametrizations, Four parameters") can be estimated, using the method of moments developed by Karl Pearson, by equating sample and population values of the first four central moments (mean, variance, skewness and excess kurtosis). The excess kurtosis was expressed in terms of the square of the skewness, and the sample size ν = α + β (see the previous section "Kurtosis"), as follows:

:\text{excess kurtosis} = \frac{6}{3+\nu}\left(\frac{2+\nu}{4}(\text{skewness})^2 - 1\right), \text{ valid for }(\text{skewness})^2-2 < \text{excess kurtosis} < \tfrac{3}{2}(\text{skewness})^2

One can use this equation to solve for the sample size ν = α + β in terms of the square of the skewness and the excess kurtosis as follows:

:\hat{\nu} = \hat{\alpha} + \hat{\beta} = 3\frac{(\text{sample excess kurtosis}) - (\text{sample skewness})^2 + 2}{\frac{3}{2}(\text{sample skewness})^2 - (\text{sample excess kurtosis})}

:\text{valid for }(\text{sample skewness})^2-2 < \text{sample excess kurtosis} < \tfrac{3}{2}(\text{sample skewness})^2

This is the ratio (multiplied by a factor of 3) between the previously derived limit boundaries for the beta distribution in a space (as originally done by Karl Pearson) defined with coordinates of the square of the skewness in one axis and the excess kurtosis in the other axis. The case of zero skewness can be immediately solved because for zero skewness, α = β and hence ν = 2α = 2β, therefore α = β = ν/2:

: \hat{\alpha} = \hat{\beta} = \frac{\hat{\nu}}{2} = \frac{\frac{3}{2}(\text{sample excess kurtosis}) + 3}{-(\text{sample excess kurtosis})}

: \text{valid for sample skewness} = 0 \text{ and } -2 < \text{sample excess kurtosis} < 0

(Excess kurtosis is negative for the beta distribution with zero skewness, ranging from −2 to 0, so that \hat{\nu} (and therefore the sample shape parameters) is positive, ranging from zero when the shape parameters approach zero and the excess kurtosis approaches −2, to infinity when the shape parameters approach infinity and the excess kurtosis approaches zero.)

For non-zero sample skewness one needs to solve a system of two coupled equations. Since the skewness and the excess kurtosis are independent of the parameters \hat{a}, \hat{c}, the parameters \hat{\alpha}, \hat{\beta} can be uniquely determined from the sample skewness and the sample excess kurtosis, by solving the coupled equations with two known variables (sample skewness and sample excess kurtosis) and two unknowns (the shape parameters):

:(\text{skewness})^2 = \frac{4(\beta-\alpha)^2 (1+\nu)}{(2+\nu)^2\,\alpha\beta}

:\text{excess kurtosis} = \frac{6}{3+\nu}\left(\frac{2+\nu}{4}(\text{skewness})^2 - 1\right)

:\text{valid for }(\text{skewness})^2-2 < \text{excess kurtosis} < \tfrac{3}{2}(\text{skewness})^2

resulting in the following solution:

: \hat{\alpha}, \hat{\beta} = \frac{\hat{\nu}}{2} \left(1 \pm \frac{1}{\sqrt{1+\frac{16(\hat{\nu}+1)}{(\hat{\nu}+2)^2(\text{sample skewness})^2}}} \right)

: \text{valid for sample skewness} \neq 0 \text{ and } (\text{sample skewness})^2-2 < \text{sample excess kurtosis} < \tfrac{3}{2}(\text{sample skewness})^2

where one should take the solutions as follows: \hat{\alpha} > \hat{\beta} for (negative) sample skewness < 0, and \hat{\alpha} < \hat{\beta} for (positive) sample skewness > 0.

The accompanying plot shows these two solutions as surfaces in a space with horizontal axes of (sample excess kurtosis) and (sample squared skewness) and the shape parameters as the vertical axis. The surfaces are constrained by the condition that the sample excess kurtosis must be bounded by the sample squared skewness as stipulated in the above equation. The two surfaces meet at the right edge defined by zero skewness. Along this right edge, both parameters are equal and the distribution is symmetric U-shaped for α = β < 1, uniform for α = β = 1, upside-down-U-shaped for 1 < α = β < 2 and bell-shaped for α = β > 2. The surfaces also meet at the front (lower) edge defined by "the impossible boundary" line (excess kurtosis + 2 − skewness² = 0). 
Along this front (lower) boundary both shape parameters approach zero, and the probability density is concentrated more at one end than the other end (with practically nothing in between), with probabilities p = \tfrac{\beta}{\alpha+\beta} at the left end ''x'' = 0 and q = 1-p = \tfrac{\alpha}{\alpha+\beta} at the right end ''x'' = 1. The two surfaces become further apart towards the rear edge. At this rear edge the surface parameters are quite different from each other. As remarked, for example, by Bowman and Shenton, sampling in the neighborhood of the line (sample excess kurtosis − (3/2)(sample skewness)² = 0) (the just-J-shaped portion of the rear edge where blue meets beige) "is dangerously near to chaos", because at that line the denominator of the expression above for the estimate ν̂ = α + β becomes zero and hence ν approaches infinity as that line is approached. Bowman and Shenton write that "the higher moment parameters (kurtosis and skewness) are extremely fragile (near that line). However, the mean and standard deviation are fairly reliable." Therefore, the problem is for the case of four-parameter estimation for very skewed distributions such that the excess kurtosis approaches (3/2) times the square of the skewness. This boundary line is produced by extremely skewed distributions with very large values of one of the parameters and very small values of the other parameter. See the numerical example and further comments about this rear edge boundary line (sample excess kurtosis − (3/2)(sample skewness)² = 0). As remarked by Karl Pearson himself, this issue may not be of much practical importance, as this trouble arises only for very skewed J-shaped (or mirror-image J-shaped) distributions with very different values of shape parameters that are unlikely to occur much in practice. The usual skewed-bell-shape distributions that occur in practice do not have this parameter estimation problem.

The remaining two parameters \hat{a}, \hat{c} can be determined using the sample mean and the sample variance using a variety of equations. One alternative is to calculate the support interval range (\hat{c}-\hat{a}) based on the sample variance and the sample kurtosis. For this purpose one can solve, in terms of the range (\hat{c}-\hat{a}), the equation expressing the excess kurtosis in terms of the sample variance and the sample size ν (see the sections "Kurtosis" and "Alternative parametrizations, four parameters"):

:\text{excess kurtosis} = \frac{6}{(\hat{\nu}+2)(\hat{\nu}+3)}\bigg(\frac{(\hat{c}-\hat{a})^2}{\text{(sample variance)}} - 6 - 5 \hat{\nu} \bigg)

to obtain:

: (\hat{c}-\hat{a}) = \sqrt{\text{(sample variance)}}\sqrt{6+5\hat{\nu}+\frac{(\hat{\nu}+2)(\hat{\nu}+3)}{6}\,\text{(sample excess kurtosis)}}

Another alternative is to calculate the support interval range (\hat{c}-\hat{a}) based on the sample variance and the sample skewness. For this purpose one can solve, in terms of the range (\hat{c}-\hat{a}), the equation expressing the squared skewness in terms of the sample variance and the sample size ν (see the sections titled "Skewness" and "Alternative parametrizations, four parameters"):

:(\text{skewness})^2 = \frac{4}{(\hat{\nu}+2)^2}\bigg(\frac{(\hat{c}-\hat{a})^2}{\text{(sample variance)}}-4(1+\hat{\nu})\bigg)

to obtain:

: (\hat{c}-\hat{a}) = \frac{\sqrt{\text{(sample variance)}}}{2}\sqrt{(\hat{\nu}+2)^2(\text{sample skewness})^2+16(1+\hat{\nu})}

The remaining parameter can be determined from the sample mean and the previously obtained parameters (\hat{c}-\hat{a}), \hat{\alpha}, \hat{\nu} = \hat{\alpha}+\hat{\beta}:

: \hat{a} = (\text{sample mean}) - \left(\frac{\hat{\alpha}}{\hat{\nu}}\right)(\hat{c}-\hat{a})

and finally, \hat{c} = (\hat{c}-\hat{a}) + \hat{a}.

In the above formulas one may take, for example, as estimates of the sample moments:

:\begin{align}
\text{sample mean} &= \overline{y} = \frac{1}{N}\sum_{i=1}^N Y_i \\
\text{sample variance} &= \overline{v}_Y = \frac{1}{N}\sum_{i=1}^N (Y_i - \overline{y})^2 \\
\text{sample skewness} &= G_1 = \frac{\sqrt{N(N-1)}}{N-2}\,\frac{\frac{1}{N}\sum_{i=1}^N (Y_i-\overline{y})^3}{\left(\frac{1}{N}\sum_{i=1}^N (Y_i-\overline{y})^2\right)^{3/2}} \\
\text{sample excess kurtosis} &= G_2 = \frac{(N+1)(N-1)}{(N-2)(N-3)}\,\frac{\frac{1}{N}\sum_{i=1}^N (Y_i-\overline{y})^4}{\left(\frac{1}{N}\sum_{i=1}^N (Y_i-\overline{y})^2\right)^2} - \frac{3(N-1)^2}{(N-2)(N-3)}
\end{align}

The estimators ''G''1 for sample skewness and ''G''2 for sample kurtosis are used by DAP/SAS, PSPP/SPSS, and Excel. 
However, they are not used by BMDP and (according to Joanes and Gill) they were not used by MINITAB in 1998. In their 1998 study, Joanes and Gill concluded that the skewness and kurtosis estimators used in BMDP and in MINITAB (at that time) had smaller variance and mean-squared error in normal samples, but that the skewness and kurtosis estimators used in DAP/SAS and PSPP/SPSS, namely ''G''1 and ''G''2, had smaller mean-squared error in samples from a very skewed distribution. It is for this reason that we have spelled out "sample skewness", etc., in the above formulas, to make it explicit that the user should choose the best estimator according to the problem at hand, as the best estimator for skewness and kurtosis depends on the amount of skewness (as shown by Joanes and Gill).
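To make the estimator concrete, the following sketch (in Python, assuming NumPy and SciPy are available) implements the four-parameter method of moments described above, using SciPy's bias-corrected sample skewness ''G''1 and excess kurtosis ''G''2 and the kurtosis-based expression for the support range; the function name beta4_method_of_moments and the synthetic-data check are illustrative choices, not part of the original presentation.

```python
import numpy as np
from scipy import stats

def beta4_method_of_moments(y):
    """Four-parameter beta fit by Pearson's method of moments (sketch).

    Estimates (alpha, beta, a, c) from the sample mean, variance,
    skewness G1 and excess kurtosis G2, using the moment equations above.
    """
    y = np.asarray(y, dtype=float)
    mean = y.mean()
    var = y.var(ddof=1)                       # sample variance (N - 1 denominator)
    g1 = stats.skew(y, bias=False)            # sample skewness G1
    g2 = stats.kurtosis(y, bias=False)        # sample excess kurtosis G2

    # The moment equations admit a beta solution only inside this region.
    if not (g1**2 - 2 < g2 < 1.5 * g1**2):
        raise ValueError("sample moments outside the beta-admissible region")

    nu = 3 * (g2 - g1**2 + 2) / (1.5 * g1**2 - g2)          # nu = alpha + beta

    if np.isclose(g1, 0.0):
        alpha = beta = nu / 2                                # symmetric case
    else:
        delta = 1.0 / np.sqrt(1 + 16 * (nu + 1) / ((nu + 2)**2 * g1**2))
        # alpha > beta for negative skewness, alpha < beta for positive skewness
        alpha = nu / 2 * (1 - np.sign(g1) * delta)
        beta = nu / 2 * (1 + np.sign(g1) * delta)

    # Support range from the sample variance and excess kurtosis, then a and c.
    rng = np.sqrt(var) * np.sqrt(6 + 5 * nu + (2 + nu) * (3 + nu) / 6 * g2)
    a = mean - alpha / (alpha + beta) * rng
    return alpha, beta, a, a + rng

# quick self-check (scipy parametrizes the support as loc = a, scale = c - a)
sample = stats.beta(2.0, 5.0, loc=1.0, scale=3.0).rvs(size=200_000, random_state=0)
print(beta4_method_of_moments(sample))   # should be near (2, 5, 1, 4)
```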


Maximum likelihood


Two unknown parameters

= As is also the case for maximum likelihood estimates for the gamma distribution, the maximum likelihood estimates for the beta distribution do not have a general closed form solution for arbitrary values of the shape parameters. If ''X''1, ..., ''XN'' are independent random variables each having a beta distribution, the joint log likelihood function for ''N'' independent and identically distributed random variables, iid observations is: :\begin \ln\, \mathcal (\alpha, \beta\mid X) &= \sum_^N \ln \left (\mathcal_i (\alpha, \beta\mid X_i) \right )\\ &= \sum_^N \ln \left (f(X_i;\alpha,\beta) \right ) \\ &= \sum_^N \ln \left (\frac \right ) \\ &= (\alpha - 1)\sum_^N \ln (X_i) + (\beta- 1)\sum_^N \ln (1-X_i) - N \ln \Beta(\alpha,\beta) \end Finding the maximum with respect to a shape parameter involves taking the partial derivative with respect to the shape parameter and setting the expression equal to zero yielding the maximum likelihood estimator of the shape parameters: :\frac = \sum_^N \ln X_i -N\frac=0 :\frac = \sum_^N \ln (1-X_i)- N\frac=0 where: :\frac = -\frac+ \frac+ \frac=-\psi(\alpha + \beta) + \psi(\alpha) + 0 :\frac= - \frac+ \frac + \frac=-\psi(\alpha + \beta) + 0 + \psi(\beta) since the
digamma function
denoted ψ(α) is defined as the logarithmic derivative of the
gamma function
: :\psi(\alpha) =\frac To ensure that the values with zero tangent slope are indeed a maximum (instead of a saddle-point or a minimum) one has to also satisfy the condition that the curvature is negative. This amounts to satisfying that the second partial derivative with respect to the shape parameters is negative :\frac= -N\frac<0 :\frac = -N\frac<0 using the previous equations, this is equivalent to: :\frac = \psi_1(\alpha)-\psi_1(\alpha + \beta) > 0 :\frac = \psi_1(\beta) -\psi_1(\alpha + \beta) > 0 where the
trigamma function
, denoted ''ψ''1(''α''), is the second of the
polygamma function
s, and is defined as the derivative of the digamma function: :\psi_1(\alpha) = \frac=\, \frac. These conditions are equivalent to stating that the variances of the logarithmically transformed variables are positive, since: :\operatorname[\ln (X)] = \operatorname[\ln^2 (X)] - (\operatorname[\ln (X)])^2 = \psi_1(\alpha) - \psi_1(\alpha + \beta) :\operatorname ln (1-X)= \operatorname[\ln^2 (1-X)] - (\operatorname[\ln (1-X)])^2 = \psi_1(\beta) - \psi_1(\alpha + \beta) Therefore, the condition of negative curvature at a maximum is equivalent to the statements: : \operatorname[\ln (X)] > 0 : \operatorname ln (1-X)> 0 Alternatively, the condition of negative curvature at a maximum is also equivalent to stating that the following logarithmic derivatives of the geometric means ''GX'' and ''G(1−X)'' are positive, since: : \psi_1(\alpha) - \psi_1(\alpha + \beta) = \frac > 0 : \psi_1(\beta) - \psi_1(\alpha + \beta) = \frac > 0 While these slopes are indeed positive, the other slopes are negative: :\frac, \frac < 0. The slopes of the mean and the median with respect to ''α'' and ''β'' display similar sign behavior. From the condition that at a maximum, the partial derivative with respect to the shape parameter equals zero, we obtain the following system of coupled maximum likelihood estimate equations (for the average log-likelihoods) that needs to be inverted to obtain the (unknown) shape parameter estimates \hat,\hat in terms of the (known) average of logarithms of the samples ''X''1, ..., ''XN'': :\begin \hat[\ln (X)] &= \psi(\hat) - \psi(\hat + \hat)=\frac\sum_^N \ln X_i = \ln \hat_X \\ \hat[\ln(1-X)] &= \psi(\hat) - \psi(\hat + \hat)=\frac\sum_^N \ln (1-X_i)= \ln \hat_ \end where we recognize \log \hat_X as the logarithm of the sample geometric mean and \log \hat_ as the logarithm of the sample geometric mean based on (1 − ''X''), the mirror-image of ''X''. For \hat=\hat, it follows that \hat_X=\hat_ . :\begin \hat_X &= \prod_^N (X_i)^ \\ \hat_ &= \prod_^N (1-X_i)^ \end These coupled equations containing
digamma function
s of the shape parameter estimates \hat,\hat must be solved by numerical methods as done, for example, by Beckman et al. Gnanadesikan et al. give numerical solutions for a few cases. Norman Lloyd Johnson, N.L.Johnson and Samuel Kotz, S.Kotz suggest that for "not too small" shape parameter estimates \hat,\hat, the logarithmic approximation to the digamma function \psi(\hat) \approx \ln(\hat-\tfrac) may be used to obtain initial values for an iterative solution, since the equations resulting from this approximation can be solved exactly: :\ln \frac \approx \ln \hat_X :\ln \frac\approx \ln \hat_ which leads to the following solution for the initial values (of the estimate shape parameters in terms of the sample geometric means) for an iterative solution: :\hat\approx \tfrac + \frac \text \hat >1 :\hat\approx \tfrac + \frac \text \hat > 1 Alternatively, the estimates provided by the method of moments can instead be used as initial values for an iterative solution of the maximum likelihood coupled equations in terms of the digamma functions. When the distribution is required over a known interval other than
[0, 1]
with random variable ''X'', say [''a'', ''c''] with random variable ''Y'', then replace ln(''Xi'') in the first equation with :\ln \frac, and replace ln(1−''Xi'') in the second equation with :\ln \frac (see "Alternative parametrizations, four parameters" section below). If one of the shape parameters is known, the problem is considerably simplified. The following logit transformation can be used to solve for the unknown shape parameter (for skewed cases such that \hat\neq\hat, otherwise, if symmetric, both -equal- parameters are known when one is known): :\hat \left[\ln \left(\frac \right) \right]=\psi(\hat) - \psi(\hat)=\frac\sum_^N \ln\frac = \ln \hat_X - \ln \left(\hat_\right) This logit transformation is the logarithm of the transformation that divides the variable ''X'' by its mirror-image (''X''/(1 - ''X'') resulting in the "inverted beta distribution" or
beta prime distribution
(also known as beta distribution of the second kind or Pearson distribution, Pearson's Type VI) with support [0, +∞). As previously discussed in the section "Moments of logarithmically transformed random variables," the logit transformation \ln\frac, studied by Johnson, extends the finite support
[0, 1]
based on the original variable ''X'' to infinite support in both directions of the real line (−∞, +∞). If, for example, \hat is known, the unknown parameter \hat can be obtained in terms of the inverse digamma function of the right hand side of this equation: :\psi(\hat)=\frac\sum_^N \ln\frac + \psi(\hat) :\hat=\psi^(\ln \hat_X - \ln \hat_ + \psi(\hat)) In particular, if one of the shape parameters has a value of unity, for example for \hat = 1 (the power function distribution with bounded support [0,1]), using the identity ψ(''x'' + 1) = ψ(''x'') + 1/''x'' in the equation \psi(\hat) - \psi(\hat + \hat)= \ln \hat_X, the maximum likelihood estimator for the unknown parameter \hat is, exactly: :\hat= - \frac= - \frac The beta has support [0, 1], therefore \hat_X < 1, and hence (-\ln \hat_X) >0, and therefore \hat >0. In conclusion, the maximum likelihood estimates of the shape parameters of a beta distribution are (in general) a complicated function of the sample geometric mean, and of the sample geometric mean based on ''(1−X)'', the mirror-image of ''X''. One may ask, if the variance (in addition to the mean) is necessary to estimate two shape parameters with the method of moments, why is the (logarithmic or geometric) variance not necessary to estimate two shape parameters with the maximum likelihood method, for which only the geometric means suffice? The answer is because the mean does not provide as much information as the geometric mean. For a beta distribution with equal shape parameters ''α'' = ''β'', the mean is exactly 1/2, regardless of the value of the shape parameters, and therefore regardless of the value of the statistical dispersion (the variance). On the other hand, the geometric mean of a beta distribution with equal shape parameters ''α'' = ''β'', depends on the value of the shape parameters, and therefore it contains more information. Also, the geometric mean of a beta distribution does not satisfy the symmetry conditions satisfied by the mean, therefore, by employing both the geometric mean based on ''X'' and geometric mean based on (1 − ''X''), the maximum likelihood method is able to provide best estimates for both parameters ''α'' = ''β'', without need of employing the variance. One can express the joint log likelihood per ''N'' independent and identically distributed random variables, iid observations in terms of the ''sufficient statistics'' (the sample geometric means) as follows: :\frac = (\alpha - 1)\ln \hat_X + (\beta- 1)\ln \hat_- \ln \Beta(\alpha,\beta). We can plot the joint log likelihood per ''N'' observations for fixed values of the sample geometric means to see the behavior of the likelihood function as a function of the shape parameters α and β. In such a plot, the shape parameter estimators \hat,\hat correspond to the maxima of the likelihood function. See the accompanying graph that shows that all the likelihood functions intersect at α = β = 1, which corresponds to the values of the shape parameters that give the maximum entropy (the maximum entropy occurs for shape parameters equal to unity: the uniform distribution). It is evident from the plot that the likelihood function gives sharp peaks for values of the shape parameter estimators close to zero, but that for values of the shape parameters estimators greater than one, the likelihood function becomes quite flat, with less defined peaks. 
Obviously, the maximum likelihood parameter estimation method for the beta distribution becomes less acceptable for larger values of the shape parameter estimators, as the uncertainty in the peak definition increases with the value of the shape parameter estimators. One can arrive at the same conclusion by noticing that the expression for the curvature of the likelihood function is in terms of the geometric variances :\frac= -\operatorname ln X/math> :\frac = -\operatorname[\ln (1-X)] These variances (and therefore the curvatures) are much larger for small values of the shape parameter α and β. However, for shape parameter values α, β > 1, the variances (and therefore the curvatures) flatten out. Equivalently, this result follows from the Cramér–Rao bound, since the
Fisher information
matrix components for the beta distribution are these logarithmic variances. The Cramér–Rao bound states that the
variance
of any ''unbiased'' estimator \hat of α is bounded by the multiplicative inverse, reciprocal of the
Fisher information
: :\mathrm(\hat)\geq\frac\geq\frac :\mathrm(\hat) \geq\frac\geq\frac so the variance of the estimators increases with increasing α and β, as the logarithmic variances decrease. Also one can express the joint log likelihood per ''N'' independent and identically distributed random variables, iid observations in terms of the
digamma function
expressions for the logarithms of the sample geometric means as follows: :\frac = (\alpha - 1)(\psi(\hat) - \psi(\hat + \hat))+(\beta- 1)(\psi(\hat) - \psi(\hat + \hat))- \ln \Beta(\alpha,\beta) this expression is identical to the negative of the cross-entropy (see section on "Quantities of information (entropy)"). Therefore, finding the maximum of the joint log likelihood of the shape parameters, per ''N'' independent and identically distributed random variables, iid observations, is identical to finding the minimum of the cross-entropy for the beta distribution, as a function of the shape parameters. :\frac = - H = -h - D_ = -\ln\Beta(\alpha,\beta)+(\alpha-1)\psi(\hat)+(\beta-1)\psi(\hat)-(\alpha+\beta-2)\psi(\hat+\hat) with the cross-entropy defined as follows: :H = \int_^1 - f(X;\hat,\hat) \ln (f(X;\alpha,\beta)) \, X
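The coupled digamma equations above are straightforward to solve with a general-purpose root finder. The sketch below (Python with SciPy, both assumptions of this illustration) uses the Johnson and Kotz logarithmic approximation for the starting values, which is adequate when the shape parameters are not too small; the function name beta_mle is ours.

```python
import numpy as np
from scipy.optimize import fsolve
from scipy.special import psi   # digamma function

def beta_mle(x):
    """Maximum-likelihood fit of Beta(alpha, beta) for data in (0, 1) (sketch).

    Solves the coupled equations
        psi(alpha) - psi(alpha + beta) = mean(log x)
        psi(beta)  - psi(alpha + beta) = mean(log(1 - x))
    numerically, starting from the Johnson & Kotz initial values.
    """
    x = np.asarray(x, dtype=float)
    log_gx = np.mean(np.log(x))        # log of the sample geometric mean of X
    log_g1x = np.mean(np.log1p(-x))    # log of the sample geometric mean of 1 - X
    gx, g1x = np.exp(log_gx), np.exp(log_g1x)

    # Initial values from the logarithmic approximation to the digamma function
    # (valid for shape estimates > 1); method-of-moments estimates could be
    # used instead, as noted above.
    a0 = 0.5 + gx / (2 * (1 - gx - g1x))
    b0 = 0.5 + g1x / (2 * (1 - gx - g1x))

    def equations(params):
        a, b = params
        return (psi(a) - psi(a + b) - log_gx,
                psi(b) - psi(a + b) - log_g1x)

    return fsolve(equations, (a0, b0))

rng = np.random.default_rng(0)
data = rng.beta(2.0, 5.0, size=100_000)
print(beta_mle(data))    # should be close to (2, 5)
```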


Four unknown parameters

= The procedure is similar to the one followed in the two unknown parameter case. If ''Y''1, ..., ''YN'' are independent random variables each having a beta distribution with four parameters, the joint log likelihood function for ''N'' independent and identically distributed random variables, iid observations is: :\begin \ln\, \mathcal (\alpha, \beta, a, c\mid Y) &= \sum_^N \ln\,\mathcal_i (\alpha, \beta, a, c\mid Y_i)\\ &= \sum_^N \ln\,f(Y_i; \alpha, \beta, a, c) \\ &= \sum_^N \ln\,\frac\\ &= (\alpha - 1)\sum_^N \ln (Y_i - a) + (\beta- 1)\sum_^N \ln (c - Y_i)- N \ln \Beta(\alpha,\beta) - N (\alpha+\beta - 1) \ln (c - a) \end Finding the maximum with respect to a shape parameter involves taking the partial derivative with respect to the shape parameter and setting the expression equal to zero yielding the maximum likelihood estimator of the shape parameters: :\frac= \sum_^N \ln (Y_i - a) - N(-\psi(\alpha + \beta) + \psi(\alpha))- N \ln (c - a)= 0 :\frac = \sum_^N \ln (c - Y_i) - N(-\psi(\alpha + \beta) + \psi(\beta))- N \ln (c - a)= 0 :\frac = -(\alpha - 1) \sum_^N \frac \,+ N (\alpha+\beta - 1)\frac= 0 :\frac = (\beta- 1) \sum_^N \frac \,- N (\alpha+\beta - 1) \frac = 0 these equations can be re-arranged as the following system of four coupled equations (the first two equations are geometric means and the second two equations are the harmonic means) in terms of the maximum likelihood estimates for the four parameters \hat, \hat, \hat, \hat: :\frac\sum_^N \ln \frac = \psi(\hat)-\psi(\hat +\hat )= \ln \hat_X :\frac\sum_^N \ln \frac = \psi(\hat)-\psi(\hat + \hat)= \ln \hat_ :\frac = \frac= \hat_X :\frac = \frac = \hat_ with sample geometric means: :\hat_X = \prod_^ \left (\frac \right )^ :\hat_ = \prod_^ \left (\frac \right )^ The parameters \hat, \hat are embedded inside the geometric mean expressions in a nonlinear way (to the power 1/''N''). This precludes, in general, a closed form solution, even for an initial value approximation for iteration purposes. One alternative is to use as initial values for iteration the values obtained from the method of moments solution for the four parameter case. Furthermore, the expressions for the harmonic means are well-defined only for \hat, \hat > 1, which precludes a maximum likelihood solution for shape parameters less than unity in the four-parameter case. Fisher's information matrix for the four parameter case is Positive-definite matrix, positive-definite only for α, β > 2 (for further discussion, see section on Fisher information matrix, four parameter case), for bell-shaped (symmetric or unsymmetric) beta distributions, with inflection points located to either side of the mode. The following Fisher information components (that represent the expectations of the curvature of the log likelihood function) have mathematical singularity, singularities at the following values: :\alpha = 2: \quad \operatorname \left [- \frac \frac \right ]= _ :\beta = 2: \quad \operatorname\left [- \frac \frac \right ] = _ :\alpha = 2: \quad \operatorname\left [- \frac\frac\right ] = _ :\beta = 1: \quad \operatorname\left [- \frac\frac \right ] = _ (for further discussion see section on Fisher information matrix). Thus, it is not possible to strictly carry on the maximum likelihood estimation for some well known distributions belonging to the four-parameter beta distribution family, like the continuous uniform distribution, uniform distribution (Beta(1, 1, ''a'', ''c'')), and the arcsine distribution (Beta(1/2, 1/2, ''a'', ''c'')). 
N. L. Johnson and S. Kotz ignore the equations for the harmonic means and instead suggest: "If ''a'' and ''c'' are unknown, and maximum likelihood estimators of ''a'', ''c'', α and β are required, the above procedure (for the two unknown parameter case, with ''X'' transformed as ''X'' = (''Y'' − ''a'')/(''c'' − ''a'')) can be repeated using a succession of trial values of ''a'' and ''c'', until the pair (''a'', ''c'') for which maximum likelihood (given ''a'' and ''c'') is as great as possible, is attained" (where, for the purpose of clarity, their notation for the parameters has been translated into the present notation).
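The "succession of trial values of ''a'' and ''c''" suggested by Johnson and Kotz amounts to a profile likelihood over the support endpoints. A rough sketch of that idea follows, assuming SciPy (whose beta.fit with a fixed location and scale performs the conditional two-parameter maximum likelihood fit); the grid and the helper name beta4_profile_mle are illustrative choices, and in practice stats.beta.fit(y) can also be asked to fit all four parameters jointly.

```python
import numpy as np
from scipy import stats

def beta4_profile_mle(y, a_grid, c_grid):
    """Profile likelihood over trial support endpoints (a, c) (sketch).

    For each trial support [a, c] the two shape parameters are fitted by
    maximum likelihood with the support held fixed; the (a, c) pair with the
    largest maximized log likelihood is returned.
    """
    best = None
    for a in a_grid:
        for c in c_grid:
            if a >= y.min() or c <= y.max():
                continue                      # support must contain the data
            alpha, beta, _, _ = stats.beta.fit(y, floc=a, fscale=c - a)
            loglik = stats.beta.logpdf(y, alpha, beta, loc=a, scale=c - a).sum()
            if best is None or loglik > best[0]:
                best = (loglik, alpha, beta, a, c)
    return best[1:]

rng = np.random.default_rng(1)
y = stats.beta(2.5, 4.0, loc=10.0, scale=20.0).rvs(size=20_000, random_state=rng)
a_grid = np.linspace(y.min() - 2.0, y.min() - 0.01, 8)
c_grid = np.linspace(y.max() + 0.01, y.max() + 2.0, 8)
print(beta4_profile_mle(y, a_grid, c_grid))   # roughly (2.5, 4, 10, 30)
```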


Fisher information matrix

Let a random variable X have a probability density ''f''(''x'';''α''). The partial derivative with respect to the (unknown, and to be estimated) parameter α of the log
likelihood function
is called the score (statistics), score. The second moment of the score is called the
Fisher information
: :\mathcal(\alpha)=\operatorname \left [\left (\frac \ln \mathcal(\alpha\mid X) \right )^2 \right], The expected value, expectation of the score (statistics), score is zero, therefore the Fisher information is also the second moment centered on the mean of the score: the
variance
of the score. If the log
likelihood function
is twice differentiable with respect to the parameter α, and under certain regularity conditions, then the Fisher information may also be written as follows (which is often a more convenient form for calculation purposes): :\mathcal(\alpha) = - \operatorname \left [\frac \ln (\mathcal(\alpha\mid X)) \right]. Thus, the Fisher information is the negative of the expectation of the second derivative with respect to the parameter α of the log
likelihood function
. Therefore, Fisher information is a measure of the curvature of the log likelihood function of α. A low curvature (and therefore high Radius of curvature (mathematics), radius of curvature), flatter log likelihood function curve has low Fisher information; while a log likelihood function curve with large curvature (and therefore low Radius of curvature (mathematics), radius of curvature) has high Fisher information. When the Fisher information matrix is computed at the evaluates of the parameters ("the observed Fisher information matrix") it is equivalent to the replacement of the true log likelihood surface by a Taylor's series approximation, taken as far as the quadratic terms. The word information, in the context of Fisher information, refers to information about the parameters. Information such as: estimation, sufficiency and properties of variances of estimators. The Cramér–Rao bound states that the inverse of the Fisher information is a lower bound on the variance of any
estimator
of a parameter α: :\operatorname[\hat\alpha] \geq \frac. The precision to which one can estimate the estimator of a parameter α is limited by the Fisher Information of the log likelihood function. The Fisher information is a measure of the minimum error involved in estimating a parameter of a distribution and it can be viewed as a measure of the resolving power of an experiment needed to discriminate between two alternative hypothesis of a parameter. When there are ''N'' parameters : \begin \theta_1 \\ \theta_ \\ \dots \\ \theta_ \end, then the Fisher information takes the form of an ''N''×''N'' positive semidefinite matrix, positive semidefinite symmetric matrix, the Fisher Information Matrix, with typical element: :_=\operatorname \left [\left (\frac \ln \mathcal \right) \left(\frac \ln \mathcal \right) \right ]. Under certain regularity conditions, the Fisher Information Matrix may also be written in the following form, which is often more convenient for computation: :_ = - \operatorname \left [\frac \ln (\mathcal) \right ]\,. With ''X''1, ..., ''XN'' iid random variables, an ''N''-dimensional "box" can be constructed with sides ''X''1, ..., ''XN''. Costa and Cover show that the (Shannon) differential entropy ''h''(''X'') is related to the volume of the typical set (having the sample entropy close to the true entropy), while the Fisher information is related to the surface of this typical set.
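The statement that the Fisher information equals the variance of the score is easy to verify numerically. The following check (Python with SciPy, parameter values arbitrary) uses the α-component of the beta distribution, whose closed form, derived in the next subsection, is ψ1(α) − ψ1(α + β):

```python
import numpy as np
from scipy.special import psi, polygamma

# Monte Carlo check: the score has mean zero and its variance is the Fisher
# information, illustrated for the alpha-component of Beta(alpha, beta).
alpha, beta = 2.0, 3.0
rng = np.random.default_rng(0)
x = rng.beta(alpha, beta, size=1_000_000)

# score with respect to alpha: d/d(alpha) ln f(x; alpha, beta)
score = np.log(x) - psi(alpha) + psi(alpha + beta)

print(score.mean())                                      # ~ 0 (expected score)
print(score.var())                                       # ~ Fisher information
print(polygamma(1, alpha) - polygamma(1, alpha + beta))  # psi1(a) - psi1(a + b)
```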


Two parameters

= For ''X''1, ..., ''X''''N'' independent random variables each having a beta distribution parametrized with shape parameters ''α'' and ''β'', the joint log likelihood function for ''N'' independent and identically distributed random variables, iid observations is: :\ln (\mathcal (\alpha, \beta\mid X) )= (\alpha - 1)\sum_^N \ln X_i + (\beta- 1)\sum_^N \ln (1-X_i)- N \ln \Beta(\alpha,\beta) therefore the joint log likelihood function per ''N'' independent and identically distributed random variables, iid observations is: :\frac \ln(\mathcal (\alpha, \beta\mid X)) = (\alpha - 1)\frac\sum_^N \ln X_i + (\beta- 1)\frac\sum_^N \ln (1-X_i)-\, \ln \Beta(\alpha,\beta) For the two parameter case, the Fisher information has 4 components: 2 diagonal and 2 off-diagonal. Since the Fisher information matrix is symmetric, one of these off diagonal components is independent. Therefore, the Fisher information matrix has 3 independent components (2 diagonal and 1 off diagonal). Aryal and Nadarajah calculated Fisher's information matrix for the four-parameter case, from which the two parameter case can be obtained as follows: :- \frac= \operatorname[\ln (X)]= \psi_1(\alpha) - \psi_1(\alpha + \beta) =_= \operatorname\left [- \frac \right ] = \ln \operatorname_ :- \frac = \operatorname ln (1-X)= \psi_1(\beta) - \psi_1(\alpha + \beta) =_= \operatorname\left [- \frac \right]= \ln \operatorname_ :- \frac = \operatorname[\ln X,\ln(1-X)] = -\psi_1(\alpha+\beta) =_= \operatorname\left [- \frac \right] = \ln \operatorname_ Since the Fisher information matrix is symmetric : \mathcal_= \mathcal_= \ln \operatorname_ The Fisher information components are equal to the log geometric variances and log geometric covariance. Therefore, they can be expressed as
trigamma function
s, denoted ψ1(α), the second of the
polygamma function
s, defined as the derivative of the digamma function: :\psi_1(\alpha) = \frac=\, \frac. These derivatives are also derived in the and plots of the log likelihood function are also shown in that section. contains plots and further discussion of the Fisher information matrix components: the log geometric variances and log geometric covariance as a function of the shape parameters α and β. contains formulas for moments of logarithmically transformed random variables. Images for the Fisher information components \mathcal_, \mathcal_ and \mathcal_ are shown in . The determinant of Fisher's information matrix is of interest (for example for the calculation of Jeffreys prior probability). From the expressions for the individual components of the Fisher information matrix, it follows that the determinant of Fisher's (symmetric) information matrix for the beta distribution is: :\begin \det(\mathcal(\alpha, \beta))&= \mathcal_ \mathcal_-\mathcal_ \mathcal_ \\ pt&=(\psi_1(\alpha) - \psi_1(\alpha + \beta))(\psi_1(\beta) - \psi_1(\alpha + \beta))-( -\psi_1(\alpha+\beta))( -\psi_1(\alpha+\beta))\\ pt&= \psi_1(\alpha)\psi_1(\beta)-( \psi_1(\alpha)+\psi_1(\beta))\psi_1(\alpha + \beta)\\ pt\lim_ \det(\mathcal(\alpha, \beta)) &=\lim_ \det(\mathcal(\alpha, \beta)) = \infty\\ pt\lim_ \det(\mathcal(\alpha, \beta)) &=\lim_ \det(\mathcal(\alpha, \beta)) = 0 \end From Sylvester's criterion (checking whether the diagonal elements are all positive), it follows that the Fisher information matrix for the two parameter case is Positive-definite matrix, positive-definite (under the standard condition that the shape parameters are positive ''α'' > 0 and ''β'' > 0).
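In code, the 2×2 matrix and its determinant follow directly from the trigamma function; a small sketch assuming SciPy (parameter values arbitrary):

```python
import numpy as np
from scipy.special import polygamma

def beta_fisher_info(alpha, beta):
    """Per-observation Fisher information matrix of Beta(alpha, beta),
    assembled from trigamma values as in the expressions above."""
    i_aa = polygamma(1, alpha) - polygamma(1, alpha + beta)   # var[ln X]
    i_bb = polygamma(1, beta) - polygamma(1, alpha + beta)    # var[ln(1 - X)]
    i_ab = -polygamma(1, alpha + beta)                        # cov[ln X, ln(1 - X)]
    return np.array([[i_aa, i_ab], [i_ab, i_bb]])

info = beta_fisher_info(2.0, 3.0)
print(info)
print(np.linalg.det(info))                    # psi1(a)psi1(b) - (psi1(a) + psi1(b))psi1(a + b)
print(np.all(np.linalg.eigvalsh(info) > 0))   # positive-definite for alpha, beta > 0
```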


Four parameters

= If ''Y''1, ..., ''YN'' are independent random variables each having a beta distribution with four parameters: the exponents ''α'' and ''β'', and also ''a'' (the minimum of the distribution range), and ''c'' (the maximum of the distribution range) (section titled "Alternative parametrizations", "Four parameters"), with
probability density function
: :f(y; \alpha, \beta, a, c) = \frac =\frac=\frac. the joint log likelihood function per ''N'' independent and identically distributed random variables, iid observations is: :\frac \ln(\mathcal (\alpha, \beta, a, c\mid Y))= \frac\sum_^N \ln (Y_i - a) + \frac\sum_^N \ln (c - Y_i)- \ln \Beta(\alpha,\beta) - (\alpha+\beta -1) \ln (c-a) For the four parameter case, the Fisher information has 4*4=16 components. It has 12 off-diagonal components = (4×4 total − 4 diagonal). Since the Fisher information matrix is symmetric, half of these components (12/2=6) are independent. Therefore, the Fisher information matrix has 6 independent off-diagonal + 4 diagonal = 10 independent components. Aryal and Nadarajah calculated Fisher's information matrix for the four parameter case as follows: :- \frac \frac= \operatorname[\ln (X)]= \psi_1(\alpha) - \psi_1(\alpha + \beta) = \mathcal_= \operatorname\left [- \frac \frac \right ] = \ln (\operatorname) :-\frac \frac = \operatorname ln (1-X)= \psi_1(\beta) - \psi_1(\alpha + \beta) =_= \operatorname \left [- \frac \frac \right ] = \ln(\operatorname) :-\frac \frac = \operatorname[\ln X,(1-X)] = -\psi_1(\alpha+\beta) =\mathcal_= \operatorname \left [- \frac\frac \right ] = \ln(\operatorname_) In the above expressions, the use of ''X'' instead of ''Y'' in the expressions var[ln(''X'')] = ln(var''GX'') is ''not an error''. The expressions in terms of the log geometric variances and log geometric covariance occur as functions of the two parameter ''X'' ~ Beta(''α'', ''β'') parametrization because when taking the partial derivatives with respect to the exponents (''α'', ''β'') in the four parameter case, one obtains the identical expressions as for the two parameter case: these terms of the four parameter Fisher information matrix are independent of the minimum ''a'' and maximum ''c'' of the distribution's range. The only non-zero term upon double differentiation of the log likelihood function with respect to the exponents ''α'' and ''β'' is the second derivative of the log of the beta function: ln(B(''α'', ''β'')). This term is independent of the minimum ''a'' and maximum ''c'' of the distribution's range. Double differentiation of this term results in trigamma functions. The sections titled "Maximum likelihood", "Two unknown parameters" and "Four unknown parameters" also show this fact. The Fisher information for ''N'' i.i.d. samples is ''N'' times the individual Fisher information (eq. 11.279, page 394 of Cover and Thomas). (Aryal and Nadarajah take a single observation, ''N'' = 1, to calculate the following components of the Fisher information, which leads to the same result as considering the derivatives of the log likelihood per ''N'' observations. Moreover, below the erroneous expression for _ in Aryal and Nadarajah has been corrected.) 
:\begin \alpha > 2: \quad \operatorname\left [- \frac \frac \right ] &= _=\frac \\ \beta > 2: \quad \operatorname\left[-\frac \frac \right ] &= \mathcal_ = \frac \\ \operatorname\left[- \frac \frac \right ] &= _ = \frac \\ \alpha > 1: \quad \operatorname\left[- \frac \frac \right ] &=\mathcal_ = \frac \\ \operatorname\left[- \frac \frac \right ] &= _ = \frac \\ \operatorname\left[- \frac \frac \right ] &= _ = -\frac \\ \beta > 1: \quad \operatorname\left[- \frac \frac \right ] &= \mathcal_ = -\frac \end The lower two diagonal entries of the Fisher information matrix, with respect to the parameter "a" (the minimum of the distribution's range): \mathcal_, and with respect to the parameter "c" (the maximum of the distribution's range): \mathcal_ are only defined for exponents α > 2 and β > 2 respectively. The Fisher information matrix component \mathcal_ for the minimum "a" approaches infinity for exponent α approaching 2 from above, and the Fisher information matrix component \mathcal_ for the maximum "c" approaches infinity for exponent β approaching 2 from above. The Fisher information matrix for the four parameter case does not depend on the individual values of the minimum "a" and the maximum "c", but only on the total range (''c''−''a''). Moreover, the components of the Fisher information matrix that depend on the range (''c''−''a''), depend only through its inverse (or the square of the inverse), such that the Fisher information decreases for increasing range (''c''−''a''). The accompanying images show the Fisher information components \mathcal_ and \mathcal_. Images for the Fisher information components \mathcal_ and \mathcal_ are shown in . All these Fisher information components look like a basin, with the "walls" of the basin being located at low values of the parameters. The following four-parameter-beta-distribution Fisher information components can be expressed in terms of the two-parameter: ''X'' ~ Beta(α, β) expectations of the transformed ratio ((1-''X'')/''X'') and of its mirror image (''X''/(1-''X'')), scaled by the range (''c''−''a''), which may be helpful for interpretation: :\mathcal_ =\frac= \frac \text\alpha > 1 :\mathcal_ = -\frac=- \frac\text\beta> 1 These are also the expected values of the "inverted beta distribution" or
beta prime distribution
(also known as beta distribution of the second kind or Pearson distribution, Pearson's Type VI) and its mirror image, scaled by the range (''c'' − ''a''). Also, the following Fisher information components can be expressed in terms of the harmonic (1/X) variances or of variances based on the ratio transformed variables ((1-X)/X) as follows: :\begin \alpha > 2: \quad \mathcal_ &=\operatorname \left [\frac \right] \left (\frac \right )^2 =\operatorname \left [\frac \right ] \left (\frac \right)^2 = \frac \\ \beta > 2: \quad \mathcal_ &= \operatorname \left [\frac \right ] \left (\frac \right )^2 = \operatorname \left [\frac \right ] \left (\frac \right )^2 =\frac \\ \mathcal_ &=\operatorname \left [\frac,\frac \right ]\frac = \operatorname \left [\frac,\frac \right ] \frac =\frac \end See section "Moments of linearly transformed, product and inverted random variables" for these expectations. The determinant of Fisher's information matrix is of interest (for example for the calculation of Jeffreys prior probability). From the expressions for the individual components, it follows that the determinant of Fisher's (symmetric) information matrix for the beta distribution with four parameters is: :\begin \det(\mathcal(\alpha,\beta,a,c)) = & -\mathcal_^2 \mathcal_ \mathcal_+\mathcal_ \mathcal_ \mathcal_ \mathcal_+\mathcal_^2 \mathcal_^2 -\mathcal_ \mathcal_ \mathcal_^2\\ & -\mathcal_ \mathcal_ \mathcal_ \mathcal_+\mathcal_^2 \mathcal_ \mathcal_+2 \mathcal_ \mathcal_ \mathcal_ \mathcal_\\ & -2\mathcal_ \mathcal_ \mathcal_ \mathcal_+\mathcal_^2 \mathcal_^2-\mathcal_ \mathcal_ \mathcal_^2+\mathcal_ \mathcal_^2 \mathcal_\\ & -\mathcal_ \mathcal_ \mathcal_ \mathcal_-\mathcal_ \mathcal_ \mathcal_ \mathcal_+\mathcal_ \mathcal_ \mathcal_ \mathcal_\\ & -\mathcal_ \mathcal_ \mathcal_ \mathcal_+\mathcal_ \mathcal_ \mathcal_ \mathcal_-\mathcal_ \mathcal_^2 \mathcal_\\ & +2 \mathcal_ \mathcal_ \mathcal_ \mathcal_-\mathcal_ \mathcal_^2 \mathcal_-\mathcal_^2 \mathcal_ \mathcal_+\mathcal_ \mathcal_ \mathcal_ \mathcal_\text\alpha, \beta> 2 \end Using Sylvester's criterion (checking whether the diagonal elements are all positive), and since diagonal components _ and _ have Mathematical singularity, singularities at α=2 and β=2 it follows that the Fisher information matrix for the four parameter case is Positive-definite matrix, positive-definite for α>2 and β>2. Since for α > 2 and β > 2 the beta distribution is (symmetric or unsymmetric) bell shaped, it follows that the Fisher information matrix is positive-definite only for bell-shaped (symmetric or unsymmetric) beta distributions, with inflection points located to either side of the mode. Thus, important well known distributions belonging to the four-parameter beta distribution family, like the parabolic distribution (Beta(2,2,a,c)) and the continuous uniform distribution, uniform distribution (Beta(1,1,a,c)) have Fisher information components (\mathcal_,\mathcal_,\mathcal_,\mathcal_) that blow up (approach infinity) in the four-parameter case (although their Fisher information components are all defined for the two parameter case). The four-parameter Wigner semicircle distribution (Beta(3/2,3/2,''a'',''c'')) and arcsine distribution (Beta(1/2,1/2,''a'',''c'')) have negative Fisher information determinants for the four-parameter case.
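Because the closed-form four-parameter components are only defined for large enough exponents, a purely numerical check can be convenient. The sketch below estimates the observed information (the negative Hessian of the mean log likelihood) by central finite differences for a case with α, β > 2, where the matrix should be positive-definite; the helper name, step size, sample size and slightly widened support are all illustrative choices rather than part of the original analysis.

```python
import numpy as np
from scipy import stats

def observed_info_beta4(y, params, h=1e-4):
    """Observed information for the four-parameter beta: the negative Hessian
    of the mean log likelihood, estimated by central finite differences."""
    def mean_loglik(p):
        a_shape, b_shape, a, c = p
        return stats.beta.logpdf(y, a_shape, b_shape, loc=a, scale=c - a).mean()

    p0 = np.asarray(params, dtype=float)
    n = len(p0)
    hess = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            pp, pm, mp, mm = (p0.copy() for _ in range(4))
            pp[i] += h; pp[j] += h
            pm[i] += h; pm[j] -= h
            mp[i] -= h; mp[j] += h
            mm[i] -= h; mm[j] -= h
            hess[i, j] = (mean_loglik(pp) - mean_loglik(pm)
                          - mean_loglik(mp) + mean_loglik(mm)) / (4 * h * h)
    return -hess

rng = np.random.default_rng(2)
y = stats.beta(3.0, 4.0).rvs(size=100_000, random_state=rng)   # support [0, 1]
info = observed_info_beta4(y, (3.0, 4.0, -0.01, 1.01))         # a, c just outside the data
print(np.linalg.eigvalsh(info))   # eigenvalues should all be positive here (alpha, beta > 2)
```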


Bayesian inference

The use of Beta distributions in Bayesian inference is due to the fact that they provide a family of conjugate prior probability distributions for binomial (including
Bernoulli
) and geometric distributions. The domain of the beta distribution can be viewed as a probability, and in fact the beta distribution is often used to describe the distribution of a probability value ''p'': :P(p;\alpha,\beta) = \frac{p^{\alpha-1}(1-p)^{\beta-1}}{\Beta(\alpha,\beta)}. Examples of beta distributions used as prior probabilities to represent ignorance of prior parameter values in Bayesian inference are Beta(1,1), Beta(0,0) and Beta(1/2,1/2).
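Since the beta prior is conjugate to the binomial likelihood, observing ''s'' successes and ''f'' failures simply updates Beta(α, β) to Beta(α + ''s'', β + ''f''); a minimal SciPy illustration (the prior and the counts below are arbitrary):

```python
from scipy import stats

# Conjugate updating: Beta(alpha, beta) prior on p, plus s successes and
# f failures in n = s + f Bernoulli trials, gives a Beta(alpha + s, beta + f)
# posterior.
alpha_prior, beta_prior = 2.0, 2.0     # illustrative prior pseudo-counts
s, f = 7, 3                            # observed successes and failures

posterior = stats.beta(alpha_prior + s, beta_prior + f)
print(posterior.mean())                # posterior mean of p
print(posterior.interval(0.95))        # 95% equal-tailed credible interval
```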


Rule of succession

A classic application of the beta distribution is the rule of succession, introduced in the 18th century by Pierre-Simon Laplace in the course of treating the sunrise problem. It states that, given ''s'' successes in ''n'' conditional independence, conditionally independent Bernoulli trials with probability ''p,'' that the estimate of the expected value in the next trial is \frac. This estimate is the expected value of the posterior distribution over ''p,'' namely Beta(''s''+1, ''n''−''s''+1), which is given by Bayes' rule if one assumes a uniform prior probability over ''p'' (i.e., Beta(1, 1)) and then observes that ''p'' generated ''s'' successes in ''n'' trials. Laplace's rule of succession has been criticized by prominent scientists. R. T. Cox described Laplace's application of the rule of succession to the sunrise problem ( p. 89) as "a travesty of the proper use of the principle." Keynes remarks ( Ch.XXX, p. 382) "indeed this is so foolish a theorem that to entertain it is discreditable." Karl Pearson showed that the probability that the next (''n'' + 1) trials will be successes, after n successes in n trials, is only 50%, which has been considered too low by scientists like Jeffreys and unacceptable as a representation of the scientific process of experimentation to test a proposed scientific law. As pointed out by Jeffreys ( p. 128) (crediting C. D. Broad ) Laplace's rule of succession establishes a high probability of success ((n+1)/(n+2)) in the next trial, but only a moderate probability (50%) that a further sample (n+1) comparable in size will be equally successful. As pointed out by Perks, "The rule of succession itself is hard to accept. It assigns a probability to the next trial which implies the assumption that the actual run observed is an average run and that we are always at the end of an average run. It would, one would think, be more reasonable to assume that we were in the middle of an average run. Clearly a higher value for both probabilities is necessary if they are to accord with reasonable belief." These problems with Laplace's rule of succession motivated Haldane, Perks, Jeffreys and others to search for other forms of prior probability (see the next ). According to Jaynes, the main problem with the rule of succession is that it is not valid when s=0 or s=n (see rule of succession, for an analysis of its validity).
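As a quick worked example of the rule itself, the posterior mean (''s'' + 1)/(''n'' + 2) under the uniform Beta(1,1) prior is a one-line computation:

```python
from fractions import Fraction

def rule_of_succession(s, n):
    """Laplace's rule of succession: posterior mean of p under a uniform
    Beta(1, 1) prior, after s successes in n Bernoulli trials."""
    return Fraction(s + 1, n + 2)

print(rule_of_succession(1, 1))   # 2/3 after one success in one trial
print(rule_of_succession(5, 5))   # 6/7 after five successes in five trials
```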


Bayes-Laplace prior probability (Beta(1,1))

The beta distribution achieves maximum differential entropy for Beta(1,1): the Uniform density, uniform probability density, for which all values in the domain of the distribution have equal density. This uniform distribution Beta(1,1) was suggested ("with a great deal of doubt") by Thomas Bayes as the prior probability distribution to express ignorance about the correct prior distribution. This prior distribution was adopted (apparently, from his writings, with little sign of doubt) by Pierre-Simon Laplace, and hence it was also known as the "Bayes-Laplace rule" or the "Laplace rule" of "inverse probability" in publications of the first half of the 20th century. In the later part of the 19th century and early part of the 20th century, scientists realized that the assumption of uniform "equal" probability density depended on the actual functions (for example whether a linear or a logarithmic scale was most appropriate) and parametrizations used. In particular, the behavior near the ends of distributions with finite support (for example near ''x'' = 0, for a distribution with initial support at ''x'' = 0) required particular attention. Keynes ( Ch.XXX, p. 381) criticized the use of Bayes's uniform prior probability (Beta(1,1)) that all values between zero and one are equiprobable, as follows: "Thus experience, if it shows anything, shows that there is a very marked clustering of statistical ratios in the neighborhoods of zero and unity, of those for positive theories and for correlations between positive qualities in the neighborhood of zero, and of those for negative theories and for correlations between negative qualities in the neighborhood of unity. "


Haldane's prior probability (Beta(0,0))

The Beta(0,0) distribution was proposed by J.B.S. Haldane, who suggested that the prior probability representing complete uncertainty should be proportional to ''p''−1(1−''p'')−1. The function ''p''−1(1−''p'')−1 can be viewed as the limit of the numerator of the beta distribution as both shape parameters approach zero: α, β → 0. The Beta function (in the denominator of the beta distribution) approaches infinity, for both parameters approaching zero, α, β → 0. Therefore, ''p''−1(1−''p'')−1 divided by the Beta function approaches a 2-point
Bernoulli distribution
with equal probability 1/2 at each end, at 0 and 1, and nothing in between, as α, β → 0. A coin-toss: one face of the coin being at 0 and the other face being at 1. The Haldane prior probability distribution Beta(0,0) is an "improper prior" because its integration (from 0 to 1) fails to strictly converge to 1 due to the singularities at each end. However, this is not an issue for computing posterior probabilities unless the sample size is very small. Furthermore, Zellner points out that on the log-odds scale, (the logit transformation ln(''p''/1−''p'')), the Haldane prior is the uniformly flat prior. The fact that a uniform prior probability on the logit transformed variable ln(''p''/1−''p'') (with domain (-∞, ∞)) is equivalent to the Haldane prior on the domain
[0, 1]
was pointed out by Harold Jeffreys in the first edition (1939) of his book Theory of Probability ( p. 123). Jeffreys writes "Certainly if we take the Bayes-Laplace rule right up to the extremes we are led to results that do not correspond to anybody's way of thinking. The (Haldane) rule d''x''/(''x''(1−''x'')) goes too far the other way. It would lead to the conclusion that if a sample is of one type with respect to some property there is a probability 1 that the whole population is of that type." The fact that "uniform" depends on the parametrization, led Jeffreys to seek a form of prior that would be invariant under different parametrizations.


Jeffreys' prior probability (Beta(1/2,1/2) for a Bernoulli or for a binomial distribution)

Harold Jeffreys proposed to use an
uninformative prior
probability measure that should be Parametrization invariance, invariant under reparameterization: proportional to the square root of the determinant of Fisher's information matrix. For the
Bernoulli distribution
, this can be shown as follows: for a coin that is "heads" with probability ''p'' ∈
[0, 1]
and is "tails" with probability 1 − ''p'', for a given (H,T) ∈ the probability is ''pH''(1 − ''p'')''T''. Since ''T'' = 1 − ''H'', the
Bernoulli distribution
is ''p''^''H''(1 − ''p'')^(1 − ''H''). Considering ''p'' as the only parameter, it follows that the log likelihood for the Bernoulli distribution is
:\ln \mathcal{L} (p\mid H) = H \ln(p)+ (1-H) \ln(1-p).
The Fisher information matrix has only one component (it is a scalar, because there is only one parameter: ''p''), therefore:
:\begin{align}
\sqrt{\mathcal{I}(p)} &= \sqrt{\operatorname{E}\left[\left(\frac{d}{dp} \ln \mathcal{L}(p\mid H)\right)^2\right]} \\
&= \sqrt{\operatorname{E}\left[\left(\frac{H}{p} - \frac{1-H}{1-p}\right)^2\right]} \\
&= \sqrt{p\left(\frac{1}{p} - \frac{0}{1-p}\right)^2 + (1-p)\left(\frac{0}{p} - \frac{1}{1-p}\right)^2} \\
&= \frac{1}{\sqrt{p(1-p)}}.
\end{align}
Similarly, for the Binomial distribution with ''n'' Bernoulli trials, it can be shown that
:\sqrt{\mathcal{I}(p)}= \frac{\sqrt{n}}{\sqrt{p(1-p)}}.
Thus, for the
Bernoulli
, and Binomial distributions, Jeffreys prior is proportional to \scriptstyle \frac, which happens to be proportional to a beta distribution with domain variable ''x'' = ''p'', and shape parameters α = β = 1/2, the arcsine distribution: :Beta(\tfrac, \tfrac) = \frac. It will be shown in the next section that the normalizing constant for Jeffreys prior is immaterial to the final result because the normalizing constant cancels out in Bayes theorem for the posterior probability. Hence Beta(1/2,1/2) is used as the Jeffreys prior for both Bernoulli and binomial distributions. As shown in the next section, when using this expression as a prior probability times the likelihood in Bayes theorem, the posterior probability turns out to be a beta distribution. It is important to realize, however, that Jeffreys prior is proportional to \scriptstyle \frac for the Bernoulli and binomial distribution, but not for the beta distribution. Jeffreys prior for the beta distribution is given by the determinant of Fisher's information for the beta distribution, which, as shown in the is a function of the
trigamma function
ψ1 of shape parameters α and β as follows: : \begin \sqrt &= \sqrt \\ \lim_ \sqrt &=\lim_ \sqrt = \infty\\ \lim_ \sqrt &=\lim_ \sqrt = 0 \end As previously discussed, Jeffreys prior for the Bernoulli and binomial distributions is proportional to the arcsine distribution Beta(1/2,1/2), a one-dimensional ''curve'' that looks like a basin as a function of the parameter ''p'' of the Bernoulli and binomial distributions. The walls of the basin are formed by ''p'' approaching the singularities at the ends ''p'' → 0 and ''p'' → 1, where Beta(1/2,1/2) approaches infinity. Jeffreys prior for the beta distribution is a ''2-dimensional surface'' (embedded in a three-dimensional space) that looks like a basin with only two of its walls meeting at the corner α = β = 0 (and missing the other two walls) as a function of the shape parameters α and β of the beta distribution. The two adjoining walls of this 2-dimensional surface are formed by the shape parameters α and β approaching the singularities (of the trigamma function) at α, β → 0. It has no walls for α, β → ∞ because in this case the determinant of Fisher's information matrix for the beta distribution approaches zero. It will be shown in the next section that Jeffreys prior probability results in posterior probabilities (when multiplied by the binomial likelihood function) that are intermediate between the posterior probability results of the Haldane and Bayes prior probabilities. Jeffreys prior may be difficult to obtain analytically, and for some cases it just doesn't exist (even for simple distribution functions like the asymmetric triangular distribution). Berger, Bernardo and Sun, in a 2009 paper defined a reference prior probability distribution that (unlike Jeffreys prior) exists for the asymmetric triangular distribution. They cannot obtain a closed-form expression for their reference prior, but numerical calculations show it to be nearly perfectly fitted by the (proper) prior : \operatorname(\tfrac, \tfrac) \sim\frac where θ is the vertex variable for the asymmetric triangular distribution with support
[0, 1]
(corresponding to the following parameter values in Wikipedia's article on the triangular distribution: vertex ''c'' = ''θ'', left end ''a'' = 0,and right end ''b'' = 1). Berger et al. also give a heuristic argument that Beta(1/2,1/2) could indeed be the exact Berger–Bernardo–Sun reference prior for the asymmetric triangular distribution. Therefore, Beta(1/2,1/2) not only is Jeffreys prior for the Bernoulli and binomial distributions, but also seems to be the Berger–Bernardo–Sun reference prior for the asymmetric triangular distribution (for which the Jeffreys prior does not exist), a distribution used in project management and PERT analysis to describe the cost and duration of project tasks. Clarke and Barron prove that, among continuous positive priors, Jeffreys prior (when it exists) asymptotically maximizes Shannon's mutual information between a sample of size n and the parameter, and therefore ''Jeffreys prior is the most uninformative prior'' (measuring information as Shannon information). The proof rests on an examination of the Kullback–Leibler divergence between probability density functions for iid random variables.
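A short numerical sketch, assuming SciPy, of the two Jeffreys priors discussed here: for the Bernoulli/binomial parameter the prior is proportional to 1/\sqrt{p(1-p)} and normalizes to the arcsine distribution Beta(1/2, 1/2) (normalizing constant B(1/2, 1/2) = π), while for the beta distribution's own shape parameters it is the square root of the Fisher information determinant given above; the helper name is ours.

```python
import numpy as np
from scipy.special import beta as beta_fn, polygamma

# Bernoulli/binomial case: the normalizing constant of p**(-1/2) * (1-p)**(-1/2)
# is B(1/2, 1/2) = pi, i.e. Jeffreys prior is exactly Beta(1/2, 1/2).
print(beta_fn(0.5, 0.5), np.pi)

# Beta-distribution case: Jeffreys prior over the shape parameters is the
# (unnormalized) square root of det I(alpha, beta), a function of trigammas.
def jeffreys_beta_shape_prior(a, b):
    psi1 = lambda z: polygamma(1, z)
    return np.sqrt(psi1(a) * psi1(b) - (psi1(a) + psi1(b)) * psi1(a + b))

print(jeffreys_beta_shape_prior(0.1, 0.1))    # blows up toward the corner a, b -> 0
print(jeffreys_beta_shape_prior(10.0, 10.0))  # decays as a, b grow large
```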


Effect of different prior probability choices on the posterior beta distribution

If samples are drawn from the population of a random variable ''X'' that result in ''s'' successes and ''f'' failures in "n" Bernoulli trials ''n'' = ''s'' + ''f'', then the
likelihood function
for parameters ''s'' and ''f'' given ''x'' = ''p'' (the notation ''x'' = ''p'' in the expressions below will emphasize that the domain ''x'' stands for the value of the parameter ''p'' in the binomial distribution), is the following binomial distribution: :\mathcal(s,f\mid x=p) = x^s(1-x)^f = x^s(1-x)^. If beliefs about prior probability information are reasonably well approximated by a beta distribution with parameters ''α'' Prior and ''β'' Prior, then: :(x=p;\alpha \operatorname,\beta \operatorname) = \frac According to Bayes' theorem for a continuous event space, the posterior probability is given by the product of the prior probability and the likelihood function (given the evidence ''s'' and ''f'' = ''n'' − ''s''), normalized so that the area under the curve equals one, as follows: :\begin & \operatorname(x=p\mid s,n-s) \\ pt= & \frac \\ pt= & \frac \\ pt= & \frac \\ pt= & \frac. \end The binomial coefficient :

\binom{n}{s} = \frac{n!}{s!(n-s)!}
appears both in the numerator and the denominator of the posterior probability, and it does not depend on the integration variable ''x'', hence it cancels out, and it is irrelevant to the final result. Similarly the normalizing factor for the prior probability, the beta function B(αPrior,βPrior) cancels out and it is immaterial to the final result. The same posterior probability result can be obtained if one uses an un-normalized prior :x^(1-x)^ because the normalizing factors all cancel out. Several authors (including Jeffreys himself) thus use an un-normalized prior formula since the normalization constant cancels out. The numerator of the posterior probability ends up being just the (un-normalized) product of the prior probability and the likelihood function, and the denominator is its integral from zero to one. The beta function in the denominator, B(''s'' + ''α'' Prior, ''n'' − ''s'' + ''β'' Prior), appears as a normalization constant to ensure that the total posterior probability integrates to unity. The ratio ''s''/''n'' of the number of successes to the total number of trials is a sufficient statistic in the binomial case, which is relevant for the following results. For the Bayes' prior probability (Beta(1,1)), the posterior probability is: :\operatorname(p=x\mid s,f) = \frac, \text=\frac,\text=\frac\text 0 < s < n). For the Jeffreys' prior probability (Beta(1/2,1/2)), the posterior probability is: :\operatorname(p=x\mid s,f) = ,\text = \frac,\text\frac\text \tfrac < s < n-\tfrac). and for the Haldane prior probability (Beta(0,0)), the posterior probability is: :\operatorname(p=x\mid s,f) = \frac, \text = \frac,\text\frac\text 1 < s < n -1). From the above expressions it follows that for ''s''/''n'' = 1/2) all the above three prior probabilities result in the identical location for the posterior probability mean = mode = 1/2. For ''s''/''n'' < 1/2, the mean of the posterior probabilities, using the following priors, are such that: mean for Bayes prior > mean for Jeffreys prior > mean for Haldane prior. For ''s''/''n'' > 1/2 the order of these inequalities is reversed such that the Haldane prior probability results in the largest posterior mean. The ''Haldane'' prior probability Beta(0,0) results in a posterior probability density with ''mean'' (the expected value for the probability of success in the "next" trial) identical to the ratio ''s''/''n'' of the number of successes to the total number of trials. Therefore, the Haldane prior results in a posterior probability with expected value in the next trial equal to the maximum likelihood. The ''Bayes'' prior probability Beta(1,1) results in a posterior probability density with ''mode'' identical to the ratio ''s''/''n'' (the maximum likelihood). In the case that 100% of the trials have been successful ''s'' = ''n'', the ''Bayes'' prior probability Beta(1,1) results in a posterior expected value equal to the rule of succession (''n'' + 1)/(''n'' + 2), while the Haldane prior Beta(0,0) results in a posterior expected value of 1 (absolute certainty of success in the next trial). Jeffreys prior probability results in a posterior expected value equal to (''n'' + 1/2)/(''n'' + 1). Perks (p. 303) points out: "This provides a new rule of succession and expresses a 'reasonable' position to take up, namely, that after an unbroken run of n successes we assume a probability for the next trial equivalent to the assumption that we are about half-way through an average run, i.e. that we expect a failure once in (2''n'' + 2) trials. 
The Bayes–Laplace rule implies that we are about at the end of an average run or that we expect a failure once in (''n'' + 2) trials. The comparison clearly favours the new result (what is now called the Jeffreys prior) from the point of view of 'reasonableness'." Conversely, in the case that 100% of the trials have resulted in failure (''s'' = 0), the ''Bayes'' prior probability Beta(1,1) results in a posterior expected value for success in the next trial equal to 1/(''n'' + 2), while the Haldane prior Beta(0,0) results in a posterior expected value of success in the next trial of 0 (absolute certainty of failure in the next trial). The Jeffreys prior probability results in a posterior expected value for success in the next trial equal to (1/2)/(''n'' + 1), which Perks (p. 303) points out "is a much more reasonably remote result than the Bayes-Laplace result 1/(''n'' + 2)". Jaynes questions (for the uniform prior Beta(1,1)) the use of these formulas for the cases ''s'' = 0 or ''s'' = ''n'' because the integrals do not converge (Beta(1,1) is an improper prior for ''s'' = 0 or ''s'' = ''n''). In practice, the conditions 0 < ''s'' < ''n'' required for a posterior mode to exist between both ends of the domain are usually met, so this is rarely an issue. Based on the Bayes (uniform) Beta(1,1) prior, the posterior probability that, after an unbroken run of ''n'' successes, the next (''n'' + 1) trials will all be successes is exactly 1/2, whatever the value of ''n'', while the Haldane Beta(0,0) prior gives this probability as 1 (certainty). Perks (p. 303) shows that, for what is now known as the Jeffreys prior, this probability is ((''n'' + 1/2)/(''n'' + 1))((''n'' + 3/2)/(''n'' + 2))...(2''n'' + 1/2)/(2''n'' + 1), which for ''n'' = 1, 2, 3 gives 15/24, 315/480, 9009/13440; rapidly approaching a limiting value of 1/\sqrt{2} = 0.70710678\ldots as ''n'' tends to infinity. Perks remarks that what is now known as the Jeffreys prior: "is clearly more 'reasonable' than either the Bayes-Laplace result or the result on the (Haldane) alternative rule rejected by Jeffreys which gives certainty as the probability. It clearly provides a very much better correspondence with the process of induction. Whether it is 'absolutely' reasonable for the purpose, i.e. whether it is yet large enough, without the absurdity of reaching unity, is a matter for others to decide. But it must be realized that the result depends on the assumption of complete indifference and absence of knowledge prior to the sampling experiment." Following are the variances of the posterior distribution obtained with these three prior probability distributions: for the Bayes prior probability (Beta(1,1)), the posterior variance is:
:\text{variance} = \frac{(n-s+1)(s+1)}{(3+n)(2+n)^2},\text{ which for } s=\frac{n}{2} \text{ results in variance} =\frac{1}{4(n+3)}
for the Jeffreys prior probability (Beta(1/2,1/2)), the posterior variance is:
:\text{variance} = \frac{(n-s+\tfrac{1}{2})(s+\tfrac{1}{2})}{(2+n)(1+n)^2},\text{ which for } s=\frac{n}{2} \text{ results in variance} = \frac{1}{4(n+2)}
and for the Haldane prior probability (Beta(0,0)), the posterior variance is:
:\text{variance} = \frac{(n-s)s}{(1+n)n^2},\text{ which for } s=\frac{n}{2} \text{ results in variance} =\frac{1}{4(n+1)}
So, as remarked by Silvey, for large ''n'', the variance is small and hence the posterior distribution is highly concentrated, whereas the assumed prior distribution was very diffuse. This is in accord with what one would hope for, as vague prior knowledge is transformed (through Bayes' theorem) into more precise posterior knowledge by an informative experiment. For small ''n'' the Haldane Beta(0,0) prior results in the largest posterior variance while the Bayes Beta(1,1) prior results in the most concentrated posterior. The Jeffreys prior Beta(1/2,1/2) results in a posterior variance in between the other two. As ''n'' increases, the variance rapidly decreases so that the posterior variance for all three priors converges to approximately the same value (approaching zero variance as ''n'' → ∞).
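Because the update is conjugate, these posterior summaries are easy to reproduce numerically. The following Python sketch (the values of ''n'' and ''s'' are illustrative, and the use of scipy.stats is only a cross-check, not something prescribed by the text) computes the posterior mean, variance and mode under the Bayes, Jeffreys and Haldane priors:

 from scipy import stats
 
 n, s = 10, 3                      # illustrative number of trials and successes
 f = n - s                         # number of failures
 
 # (alpha, beta) of the three reference priors discussed above
 priors = {"Bayes (1,1)": (1.0, 1.0),
           "Jeffreys (1/2,1/2)": (0.5, 0.5),
           "Haldane (0,0)": (0.0, 0.0)}
 
 for name, (a0, b0) in priors.items():
     a, b = a0 + s, b0 + f         # conjugate update: posterior is Beta(a0 + s, b0 + f)
     mean = a / (a + b)
     var = a * b / ((a + b) ** 2 * (a + b + 1))
     mode = (a - 1) / (a + b - 2) if a > 1 and b > 1 else None  # interior mode needs a, b > 1
     # cross-check the closed-form mean against scipy's beta distribution
     assert abs(stats.beta(a, b).mean() - mean) < 1e-12
     print(f"{name:20s} mean={mean:.4f} var={var:.5f} mode={mode}")

For these inputs the printed mode under the Bayes prior and the printed mean under the Haldane prior both equal ''s''/''n'' = 0.3, matching the statements above.
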
Recalling the previous result that the ''Haldane'' prior probability Beta(0,0) results in a posterior probability density with ''mean'' (the expected value for the probability of success in the "next" trial) identical to the ratio ''s''/''n'' of the number of successes to the total number of trials, it follows from the above expression that the ''Haldane'' prior Beta(0,0) also results in a posterior with ''variance'' identical to the variance expressed in terms of the maximum likelihood estimate ''s''/''n'' and the sample size ''ν'' = ''α'' + ''β'':
:\text{variance} = \frac{\mu(1-\mu)}{1+\nu} = \frac{(s/n)(1-s/n)}{1+n}
with the mean ''μ'' = ''s''/''n'' and the sample size ''ν'' = ''n''. In Bayesian inference, using a Beta(''α''Prior,''β''Prior) prior distribution with a binomial likelihood is equivalent to adding (''α''Prior − 1) pseudo-observations of "success" and (''β''Prior − 1) pseudo-observations of "failure" to the actual number of successes and failures observed, then estimating the parameter ''p'' of the binomial distribution by the proportion of successes over both real- and pseudo-observations. A uniform prior Beta(1,1) does not add (or subtract) any pseudo-observations since for Beta(1,1) it follows that (''α''Prior − 1) = 0 and (''β''Prior − 1) = 0. The Haldane prior Beta(0,0) subtracts one pseudo-observation from each, and the Jeffreys prior Beta(1/2,1/2) subtracts half a pseudo-observation of success and an equal number of failures. This subtraction has the effect of smoothing out the posterior distribution. If the proportion of successes is not 50% (''s''/''n'' ≠ 1/2), values of ''α''Prior and ''β''Prior less than 1 (and therefore negative (''α''Prior − 1) and (''β''Prior − 1)) favor sparsity, i.e. distributions where the parameter ''p'' is closer to either 0 or 1. In effect, values of ''α''Prior and ''β''Prior between 0 and 1, when operating together, function as a concentration parameter. The accompanying plots show the posterior probability density functions for sample sizes ''n'' ∈ , successes ''s'' ∈  and Beta(''α''Prior,''β''Prior) ∈ . Also shown are the cases for ''n'' = , success ''s'' =  and Beta(''α''Prior,''β''Prior) ∈ . The first plot shows the symmetric cases, for successes ''s'' ∈ , with mean = mode = 1/2, and the second plot shows the skewed cases ''s'' ∈ . The images show that there is little difference between the priors for the posterior with sample size of 50 (characterized by a more pronounced peak near ''p'' = 1/2). Significant differences appear for very small sample sizes (in particular for the flatter distribution for the degenerate case of sample size = 3). Therefore, the skewed cases, with successes ''s'' = , show a larger effect from the choice of prior, at small sample size, than the symmetric cases. For symmetric distributions, the Bayes prior Beta(1,1) results in the most "peaky" and highest posterior distributions and the Haldane prior Beta(0,0) results in the flattest and lowest-peaked distribution. The Jeffreys prior Beta(1/2,1/2) lies in between them. For nearly symmetric, not too skewed distributions the effect of the priors is similar. For very small sample size (in this case for a sample size of 3) and skewed distribution (in this example for ''s'' ∈ ) the Haldane prior can result in a reverse-J-shaped distribution with a singularity at the left end.
However, this happens only in degenerate cases (in this example ''n'' = 3 and hence ''s'' = 3/4 < 1, a degenerate value because ''s'' should be greater than unity in order for the posterior of the Haldane prior to have a mode located between the ends, and because ''s'' = 3/4 is not an integer, hence it violates the initial assumption of a binomial distribution for the likelihood) and it is not an issue in generic cases of reasonable sample size (such that the condition 1 < ''s'' < ''n'' − 1, necessary for a mode to exist between both ends, is fulfilled). In Chapter 12 (p. 385) of his book, Jaynes asserts that the ''Haldane prior'' Beta(0,0) describes a ''prior state of knowledge of complete ignorance'', where we are not even sure whether it is physically possible for an experiment to yield either a success or a failure, while the ''Bayes (uniform) prior Beta(1,1) applies if'' one knows that ''both binary outcomes are possible''. Jaynes states: "''interpret the Bayes-Laplace (Beta(1,1)) prior as describing not a state of complete ignorance'', but the state of knowledge in which we have observed one success and one failure...once we have seen at least one success and one failure, then we know that the experiment is a true binary one, in the sense of physical possibility." Jaynes does not specifically discuss the Jeffreys prior Beta(1/2,1/2) (Jaynes' discussion of "Jeffreys prior" on pp. 181, 423 and in chapter 12 of his book refers instead to the improper, un-normalized, prior "1/''p'' ''dp''" introduced by Jeffreys in the 1939 edition of his book, seven years before he introduced what is now known as Jeffreys' invariant prior: the square root of the determinant of Fisher's information matrix. ''"1/p" is Jeffreys' (1946) invariant prior for the exponential distribution, not for the Bernoulli or binomial distributions''). However, it follows from the above discussion that the Jeffreys Beta(1/2,1/2) prior represents a state of knowledge in between the Haldane Beta(0,0) and Bayes Beta(1,1) priors. Similarly, Karl Pearson in his 1892 book The Grammar of Science (p. 144 of the 1900 edition) maintained that the Bayes (Beta(1,1)) uniform prior was not a complete ignorance prior, and that it should be used when prior information justified the decision to "distribute our ignorance equally". K. Pearson wrote: "Yet the only supposition that we appear to have made is this: that, knowing nothing of nature, routine and anomy (from the Greek ανομία, namely: a- "without", and nomos "law") are to be considered as equally likely to occur. Now we were not really justified in making even this assumption, for it involves a knowledge that we do not possess regarding nature. We use our ''experience'' of the constitution and action of coins in general to assert that heads and tails are equally probable, but we have no right to assert before experience that, as we know nothing of nature, routine and breach are equally probable. In our ignorance we ought to consider before experience that nature may consist of all routines, all anomies (normlessness), or a mixture of the two in any proportion whatever, and that all such are equally probable. Which of these constitutions after experience is the most probable must clearly depend on what that experience has been like."
If there is sufficient sampling data, ''and the posterior probability mode is not located at one of the extremes of the domain'' (''x'' = 0 or ''x'' = 1), the three priors of Bayes (Beta(1,1)), Jeffreys (Beta(1/2,1/2)) and Haldane (Beta(0,0)) should yield similar ''posterior'' probability densities. Otherwise, as Gelman et al. (p. 65) point out, "if so few data are available that the choice of noninformative prior distribution makes a difference, one should put relevant information into the prior distribution", or as Berger (p. 125) points out, "when different reasonable priors yield substantially different answers, can it be right to state that there ''is'' a single answer? Would it not be better to admit that there is scientific uncertainty, with the conclusion depending on prior beliefs?"


Occurrence and applications


Order statistics

The beta distribution has an important application in the theory of order statistics. A basic result is that the distribution of the ''k''th smallest of a sample of size ''n'' from a continuous uniform distribution has a beta distribution (David, H. A., Nagaraja, H. N. (2003) ''Order Statistics'' (3rd Edition). Wiley, New Jersey, p. 458). This result is summarized as:
:U_{(k)} \sim \operatorname{Beta}(k,n+1-k).
From this, and application of the theory related to the probability integral transform, the distribution of any individual order statistic from any continuous distribution can be derived.
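As a quick empirical check of this result, the following Python sketch (the sample size, the order-statistic index and the random seed are arbitrary illustrative choices) compares the ''k''th smallest of ''n'' uniform variates against the Beta(''k'', ''n'' + 1 − ''k'') distribution:

 import numpy as np
 from scipy import stats
 
 rng = np.random.default_rng(0)
 n, k = 10, 3                      # illustrative sample size and order-statistic index
 
 # k-th smallest of n standard-uniform variates, repeated many times
 samples = np.sort(rng.uniform(size=(100_000, n)), axis=1)[:, k - 1]
 
 exact = stats.beta(k, n + 1 - k)  # Beta(k, n + 1 - k), as stated above
 print(samples.mean(), exact.mean())        # both close to k/(n + 1) = 0.2727...
 print(stats.kstest(samples, exact.cdf))    # should not reject the beta distribution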


Subjective logic

In standard logic, propositions are considered to be either true or false. In contradistinction, subjective logic assumes that humans cannot determine with absolute certainty whether a proposition about the real world is absolutely true or false. In subjective logic the a posteriori probability estimates of binary events can be represented by beta distributions (A. Jøsang, "A Logic for Uncertain Probabilities", ''International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems'', 9(3), pp. 279–311, June 2001).


Wavelet analysis

A wavelet is a wave-like oscillation with an amplitude that starts out at zero, increases, and then decreases back to zero. It can typically be visualized as a "brief oscillation" that promptly decays. Wavelets can be used to extract information from many different kinds of data, including – but certainly not limited to – audio signals and images. Thus, wavelets are purposefully crafted to have specific properties that make them useful for signal processing. Wavelets are localized in both time and frequency whereas the standard Fourier transform is only localized in frequency. Therefore, standard Fourier transforms are only applicable to stationary processes, while wavelets are applicable to non-stationary processes. Continuous wavelets can be constructed based on the beta distribution. Beta wavelets (H.M. de Oliveira and G.A.A. Araújo, "Compactly Supported One-cyclic Wavelets Derived from Beta Distributions", ''Journal of Communication and Information Systems'', vol. 20, n. 3, pp. 27–33, 2005) can be viewed as a soft variety of Haar wavelets whose shape is fine-tuned by two shape parameters α and β.


Population genetics

The Balding–Nichols model is a two-parameter parametrization of the beta distribution used in population genetics. It is a statistical description of the allele frequencies in the components of a sub-divided population:
:\begin{align} \alpha &= \mu \nu,\\ \beta &= (1 - \mu) \nu, \end{align}
where \nu =\alpha+\beta= \frac{1-F}{F} and 0 < F < 1; here ''F'' is (Wright's) genetic distance between two populations.
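As a brief illustration, the following Python sketch draws allele frequencies from this parametrization (the values of ''μ'' and ''F'' are illustrative assumptions, not taken from the text):

 import numpy as np
 
 rng = np.random.default_rng(1)
 
 mu = 0.3      # ancestral (mean) allele frequency       -- illustrative value
 F = 0.05      # Wright's genetic distance between populations -- illustrative value
 
 nu = (1 - F) / F                       # nu = alpha + beta = (1 - F)/F
 alpha, beta = mu * nu, (1 - mu) * nu   # Balding-Nichols parametrization
 
 # allele frequencies of five sub-populations drawn from this model
 print(rng.beta(alpha, beta, size=5))   # values scatter around mu; spread grows with F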


Project management: task cost and schedule modeling

The beta distribution can be used to model events which are constrained to take place within an interval defined by a minimum and maximum value. For this reason, the beta distribution — along with the triangular distribution — is used extensively in PERT, critical path method (CPM), Joint Cost Schedule Modeling (JCSM) and other project management/control systems to describe the time to completion and the cost of a task. In project management, shorthand computations are widely used to estimate the mean and standard deviation of the beta distribution:
:\begin{align} \mu(X) & = \frac{a + 4b + c}{6} \\ \sigma(X) & = \frac{c - a}{6} \end{align}
where ''a'' is the minimum, ''c'' is the maximum, and ''b'' is the most likely value (the mode for ''α'' > 1 and ''β'' > 1). The above estimate for the mean \mu(X)= \frac{a + 4b + c}{6} is known as the PERT three-point estimation and it is exact for either of the following values of ''β'' (for arbitrary ''α'' within these ranges):
:''β'' = ''α'' > 1 (symmetric case) with standard deviation \sigma(X) = \frac{c - a}{2\sqrt{1 + 2\alpha}}, skewness = 0, and excess kurtosis = \frac{-6}{3 + 2\alpha}
or
:''β'' = 6 − ''α'' for 5 > ''α'' > 1 (skewed case) with standard deviation \sigma(X) = \frac{c - a}{6}\sqrt{\frac{\alpha(6 - \alpha)}{7}}, skewness = \frac{(3 - \alpha)\sqrt{7}}{2\sqrt{\alpha(6 - \alpha)}}, and excess kurtosis = \frac{21}{\alpha(6 - \alpha)} - 3.
The above estimate for the standard deviation ''σ''(''X'') = (''c'' − ''a'')/6 is exact for either of the following values of ''α'' and ''β'':
:''α'' = ''β'' = 4 (symmetric) with skewness = 0, and excess kurtosis = −6/11.
:''β'' = 6 − ''α'' and \alpha = 3 - \sqrt{2} (right-tailed, positive skew) with skewness = \frac{1}{\sqrt{2}}, and excess kurtosis = 0
:''β'' = 6 − ''α'' and \alpha = 3 + \sqrt{2} (left-tailed, negative skew) with skewness = \frac{-1}{\sqrt{2}}, and excess kurtosis = 0
Otherwise, these can be poor approximations for beta distributions with other values of ''α'' and ''β'', exhibiting average errors of 40% in the mean and 549% in the variance.
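To make the comparison concrete, a small Python sketch (the interval [''a'', ''c''] and the choice ''α'' = ''β'' = 4 are illustrative assumptions) contrasts the shorthand PERT estimates with the exact moments of a beta distribution rescaled to [''a'', ''c'']:

 import math
 
 a, c = 2.0, 14.0           # illustrative minimum and maximum task durations
 alpha, beta = 4.0, 4.0     # one of the cases above for which sigma = (c - a)/6 is exact
 
 # exact mean, standard deviation and mode of a beta distribution rescaled to [a, c]
 mean_exact = a + (c - a) * alpha / (alpha + beta)
 sd_exact = (c - a) * math.sqrt(alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1)))
 b = a + (c - a) * (alpha - 1) / (alpha + beta - 2)   # mode, valid since alpha, beta > 1
 
 # PERT shorthand estimates built from (a, b, c)
 mean_pert = (a + 4 * b + c) / 6
 sd_pert = (c - a) / 6
 
 print(mean_exact, mean_pert)   # 8.0  8.0
 print(sd_exact, sd_pert)       # 2.0  2.0  (exact here, since alpha = beta = 4)

For shape parameters outside the special cases listed above, the same comparison exhibits the discrepancies mentioned in the text.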


Random variate generation

If ''X'' and ''Y'' are independent, with X \sim \Gamma(\alpha, \theta) and Y \sim \Gamma(\beta, \theta), then
:\frac{X}{X+Y} \sim \Beta(\alpha, \beta).
So one algorithm for generating beta variates is to generate \frac{X}{X+Y}, where ''X'' is a gamma variate with parameters (α, 1) and ''Y'' is an independent gamma variate with parameters (β, 1). In fact, here \frac{X}{X+Y} and X+Y are independent, and X+Y \sim \Gamma(\alpha + \beta, \theta). If Z \sim \Gamma(\gamma, \theta) and Z is independent of X and Y, then \frac{X+Y}{X+Y+Z} \sim \Beta(\alpha+\beta,\gamma) and \frac{X+Y}{X+Y+Z} is independent of \frac{X}{X+Y}. This shows that the product of independent \Beta(\alpha,\beta) and \Beta(\alpha+\beta,\gamma) random variables is a \Beta(\alpha,\beta+\gamma) random variable. Also, the ''k''th order statistic of ''n'' uniformly distributed variates is \Beta(k, n+1-k), so an alternative if α and β are small integers is to generate α + β − 1 uniform variates and choose the α-th smallest. Another way to generate the beta distribution is by the Pólya urn model. According to this method, one starts with an "urn" containing α "black" balls and β "white" balls and draws from it uniformly with replacement. At every trial an additional ball is added according to the color of the last ball drawn. Asymptotically, the proportion of black and white balls will be distributed according to the beta distribution, where each repetition of the experiment will produce a different value. It is also possible to use inverse transform sampling.
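A minimal Python sketch of the gamma-ratio method described above (the shape parameters and sample size are illustrative; in practice NumPy's built-in beta generator already does this directly):

 import numpy as np
 
 rng = np.random.default_rng(42)
 alpha, beta = 2.5, 4.0            # illustrative shape parameters
 size = 100_000
 
 # gamma-ratio method: X ~ Gamma(alpha, 1), Y ~ Gamma(beta, 1)  =>  X/(X+Y) ~ Beta(alpha, beta)
 x = rng.gamma(alpha, 1.0, size)
 y = rng.gamma(beta, 1.0, size)
 samples = x / (x + y)
 
 # sanity check against the exact mean alpha/(alpha + beta) and NumPy's built-in generator
 print(samples.mean(), alpha / (alpha + beta))
 print(rng.beta(alpha, beta, size).mean())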


History

Thomas Bayes, in a posthumous paper published in 1763 by Richard Price, obtained a beta distribution as the density of the probability of success in Bernoulli trials, but the paper does not analyze any of the moments of the beta distribution or discuss any of its properties. The first systematic modern discussion of the beta distribution is probably due to Karl Pearson. In Pearson's papers the beta distribution is couched as a solution of a differential equation: Pearson's Type I distribution, to which it is essentially identical except for arbitrary shifting and re-scaling (the beta and Pearson Type I distributions can always be equalized by proper choice of parameters). In fact, in several English books and journal articles in the few decades prior to World War II, it was common to refer to the beta distribution as Pearson's Type I distribution. William P. Elderton in his 1906 monograph "Frequency curves and correlation" further analyzes the beta distribution as Pearson's Type I distribution, including a full discussion of the method of moments for the four-parameter case, and diagrams of (what Elderton describes as) U-shaped, J-shaped, twisted J-shaped, "cocked-hat" shapes, horizontal and angled straight-line cases. Elderton wrote "I am chiefly indebted to Professor Pearson, but the indebtedness is of a kind for which it is impossible to offer formal thanks." Elderton's 1906 monograph provides an impressive amount of information on the beta distribution, including equations for the origin of the distribution chosen to be the mode, as well as for other Pearson distributions: types I through VII. Elderton also included a number of appendixes, including one appendix ("II") on the beta and gamma functions. In later editions, Elderton added equations for the origin of the distribution chosen to be the mean, and analysis of Pearson distributions VIII through XII. As remarked by Bowman and Shenton, "Fisher and Pearson had a difference of opinion in the approach to (parameter) estimation, in particular relating to (Pearson's method of) moments and (Fisher's method of) maximum likelihood in the case of the Beta distribution." Also according to Bowman and Shenton, "the case of a Type I (beta distribution) model being the center of the controversy was pure serendipity. A more difficult model of 4 parameters would have been hard to find." The long-running public conflict of Fisher with Karl Pearson can be followed in a number of articles in prestigious journals. For example, concerning the estimation of the four parameters for the beta distribution, and Fisher's criticism of Pearson's method of moments as being arbitrary, see Pearson's article "Method of moments and method of maximum likelihood" (published three years after his retirement from University College, London, where his position had been divided between Fisher and Pearson's son Egon) in which Pearson writes "I read (Koshai's paper in the Journal of the Royal Statistical Society, 1933) which as far as I am aware is the only case at present published of the application of Professor Fisher's method. To my astonishment that method depends on first working out the constants of the frequency curve by the (Pearson) Method of Moments and then superposing on it, by what Fisher terms "the Method of Maximum Likelihood" a further approximation to obtain, what he holds, he will thus get, "more efficient values" of the curve constants."
David and Edwards's treatise on the history of statistics cites the first modern treatment of the beta distribution, in 1911, using the beta designation that has become standard, due to Corrado Gini, an Italian statistician, demographer, and sociologist, who developed the Gini coefficient. N. L. Johnson and S. Kotz, in their comprehensive and very informative monograph on leading historical personalities in statistical sciences, credit Corrado Gini as "an early Bayesian...who dealt with the problem of eliciting the parameters of an initial Beta distribution, by singling out techniques which anticipated the advent of the so-called empirical Bayes approach."


References


External links


"Beta Distribution"
by Fiona Maclachlan, the Wolfram Demonstrations Project, 2007.
Beta Distribution – Overview and Example
xycoon.com

brighton-webs.co.uk

exstrom.com * *
Harvard University Statistics 110 Lecture 23 Beta Distribution, Prof. Joe Blitzstein