In probability theory and statistics, the Dirichlet process (DP) is one of the most popular Bayesian nonparametric models. It was introduced by Thomas Ferguson as a prior over probability distributions. A

\mathrm{DP}\left(s,G_0\right)

is completely defined by its parameters:

G_0

(the ''base distribution'' or ''base measure'') is an arbitrary distribution and

s

(the ''concentration parameter'') is a positive real number (it is often denoted as

\alpha

). According to the Bayesian paradigm these parameters should be chosen based on the available prior information on the domain. The question is: how should we choose the prior parameters

\left(s,G_0\right)

of the DP, in particular the infinite dimensional one

G_0

, in case of lack of prior information? To address this issue, the only prior that has been proposed so far is the limiting DP obtained for

s\rightarrow 0

, which was introduced under the name of Bayesian bootstrap by Rubin (Rubin D (1981). The Bayesian bootstrap. Ann. Stat. 9 130–134); in fact, it can be proven that the Bayesian bootstrap is asymptotically equivalent to the frequentist bootstrap introduced by Bradley Efron (Efron B (1979). Bootstrap methods: Another look at the jackknife. Ann. Stat. 7 1–26). The limiting Dirichlet process

s\rightarrow 0

has been criticized on diverse grounds. From an a-priori point of view, the main criticism is that taking

s\rightarrow 0

is far from leading to a noninformative prior. Moreover, a-posteriori, it assigns zero probability to any set that does not include the observations. The imprecise Dirichlet process has been proposed to overcome these issues. The basic idea is to fix

s > 0

but not to choose any precise base measure

G_0

. More precisely, the imprecise Dirichlet process (IDP) is defined as follows:

\mathrm{IDP}:~\left\{\mathrm{DP}\left(s,G_0\right):~G_0 \in \mathbb{P}\right\}

where

\mathbb{P}

is the set of all probability measures. In other words, the IDP is the set of all Dirichlet processes (with a fixed

s > 0

) obtained by letting the base measure

G_0

span the set of all probability measures.

Inferences with the Imprecise Dirichlet Process

Let

P

be a probability distribution on

(\mathbb{X},\mathcal{B})

(here

\mathbb{X}

is a standard Borel space with Borel

\sigma

-field

\mathcal{B}

) and assume that

P\sim \mathrm{DP}(s,G_0)

. Then consider a real-valued bounded function

f

defined on

(\mathbb{X},\mathcal{B})

. It is well known that the expectation of E(f) with respect to the Dirichlet process is

\mathcal{E}[E(f)]=\mathcal{E}\left[\int f \, dP\right]=\int f \,d\mathcal{E}[P] = \int f \, dG_0.

One of the most remarkable properties of the DP priors is that the posterior distribution of P is again a DP. Let X_1,\dots,X_n be an independent and identically distributed sample from P, with P \sim \mathrm{DP}(s,G_0); then the posterior distribution of P given the observations is

P\mid X_1,\dots,X_n \sim \mathrm{DP}\left(s+n, G_n\right),~~~ \text{with}~~~ G_n=\frac{s}{s+n} G_0+ \frac{1}{s+n} \sum\limits_{i=1}^n \delta_{X_i},

where \delta_{X_i} is an atomic probability measure (Dirac's delta) centered at X_i. Hence, it follows that

\mathcal{E}[E(f)\mid X_1,\dots,X_n]=\int f \, dG_n.

Therefore, for any fixed G_0, we can exploit the previous equations to derive prior and posterior expectations.

In the IDP, G_0 can span the set of all distributions \mathbb{P}. This implies that we will get a different prior and posterior expectation of E(f) for any choice of G_0. A way to characterize inferences for the IDP is by computing lower and upper bounds for the expectation of E(f) w.r.t. G_0 \in \mathbb{P}.
A-priori these bounds are:

\underline{\mathcal{E}}[E(f)]=\inf\limits_{G_0 \in \mathbb{P}} \int f \,dG_0=\inf f, ~~~~\overline{\mathcal{E}}[E(f)]=\sup\limits_{G_0 \in \mathbb{P}} \int f \,dG_0=\sup f,

the lower (upper) bound is obtained by a probability measure that puts all the mass on the infimum (supremum) of f, i.e., G_0=\delta_{X_0} with X_0=\arg \inf f (or respectively with X_0=\arg \sup f). From the above expressions of the lower and upper bounds, it can be observed that the range of \mathcal{E}[E(f)] under the IDP is the same as the original

range of

f

. In other words, by specifying the IDP, we are not giving any prior information on the value of the expectation of

f

. A-priori, IDP is therefore a model of prior (near)-ignorance for

E(f)

. A-posteriori, IDP can learn from data. The posterior lower and upper bounds for the expectation of

E(f)

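are in fact given by:

\begin{align}
\underline{\mathcal{E}}[E(f)\mid X_1,\dots,X_n] & =\inf\limits_{G_0 \in \mathbb{P}} \int f \, dG_n= \frac{s}{s+n} \inf f+ \int f(X) \frac{1}{s+n} \sum\limits_{i=1}^n \delta_{X_i}(dX) \\
& =\frac{s}{s+n} \inf f+ \frac{n}{s+n} \frac{\sum_{i=1}^n f(X_i)}{n}, \\
\overline{\mathcal{E}}[E(f)\mid X_1,\dots,X_n] & =\sup\limits_{G_0 \in \mathbb{P}} \int f \, dG_n= \frac{s}{s+n} \sup f+ \int f(X) \frac{1}{s+n} \sum\limits_{i=1}^n \delta_{X_i}(dX) \\
& =\frac{s}{s+n} \sup f+ \frac{n}{s+n} \frac{\sum_{i=1}^n f(X_i)}{n}.
\end{align}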

It can be observed that the posterior inferences do not depend on

G_0

. To define the IDP, the modeler has only to choose

s

(the concentration parameter). This explains the meaning of the adjective ''near'' in prior near-ignorance: the IDP still requires the modeller to elicit one parameter. However, this is a simple elicitation problem for a nonparametric prior, since we only have to choose the value of a positive scalar (there are not infinitely many parameters left in the IDP model). Finally, observe that for

n \rightarrow \infty

, the IDP satisfies:

\underline{\mathcal{E}}[E(f)\mid X_1,\dots,X_n], \quad \overline{\mathcal{E}}[E(f)\mid X_1,\dots,X_n] \rightarrow S(f),

where

S(f)=\lim_{n\rightarrow\infty} \tfrac{1}{n}\sum_{i=1}^n f(X_i)

. In other words, the IDP is consistent.
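The posterior bounds of this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `idp_posterior_bounds`, its arguments, and the sample values are hypothetical and do not come from any IDP software package; the caller is assumed to supply the infimum and supremum of the bounded function f.

```python
def idp_posterior_bounds(f_values, f_inf, f_sup, s=1.0):
    """Lower and upper posterior expectations of E(f) under the IDP.

    f_values: the evaluations f(X_1), ..., f(X_n) on the observed sample
    f_inf, f_sup: infimum and supremum of the bounded function f
    s: prior strength (concentration parameter), s > 0
    """
    n = len(f_values)
    prior_weight = s / (s + n)            # weight left to the unspecified G_0
    data_part = sum(f_values) / (s + n)   # equals n/(s+n) times the sample mean of f
    lower = prior_weight * f_inf + data_part
    upper = prior_weight * f_sup + data_part
    return lower, upper


# Hypothetical data: f is the identity on [0, 1], so inf f = 0 and sup f = 1.
sample = [0.2, 0.4, 0.6, 0.8]
lower, upper = idp_posterior_bounds(sample, f_inf=0.0, f_sup=1.0, s=1.0)
print(lower, upper)  # approximately 0.4 and 0.6
```

The two bounds differ only through the term weighted by s/(s+n), which is why they both approach the sample mean of f as n grows.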

Choice of the prior strength $s$

The IDP is completely specified by

s

, which is the only parameter left in the prior model. Since the value of

s

determines how quickly the lower and upper posterior expectations converge as the number of observations increases,

s

can be chosen so as to match a certain convergence rate. The parameter

s

can also be chosen to obtain desirable frequentist properties (e.g., credible intervals that are calibrated frequentist intervals, hypothesis tests that are calibrated for the Type I error, etc.); see Example: median test.
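The link between s and the convergence rate can be made explicit by subtracting the posterior lower bound from the posterior upper bound. The tolerance \varepsilon > 0 used here is a symbol introduced only for this illustration:

```latex
\overline{\mathcal{E}}[E(f)\mid X_1,\dots,X_n]-\underline{\mathcal{E}}[E(f)\mid X_1,\dots,X_n]
  =\frac{s}{s+n}\left(\sup f-\inf f\right)\le\varepsilon
  \quad\Longleftrightarrow\quad
  n\ge\frac{s\left(\sup f-\inf f-\varepsilon\right)}{\varepsilon}.
```

For instance, for an indicator function (\sup f-\inf f=1) with s=1 and \varepsilon=0.05, the gap between the bounds falls to 0.05 once n \ge 19, since 1/(1+19)=0.05.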

Example: estimate of the cumulative distribution

Let

X_1,\dots, X_n

be i.i.d. real random variables with

cumulative distribution function

F(x)

. Since

F(x)=E[\mathbb{I}_{(-\infty,x]}], where \mathbb{I}_{(-\infty,x]} is the

indicator function

, we can use IDP to derive inferences about

F(x).

The lower and upper posterior mean of

F(x)

are:

\begin{align}
& \underline{\mathcal{E}}\left[F(x)\mid X_1,\dots,X_n\right]=\underline{\mathcal{E}}\left[E(\mathbb{I}_{(-\infty,x]})\mid X_1,\dots,X_n\right] \\
= {} & \frac{n}{s+n} \frac{\sum_{i=1}^n \mathbb{I}_{(-\infty,x]}(X_i)}{n} =\frac{n}{s+n}\hat{F}(x), \\
& \overline{\mathcal{E}}\left[F(x)\mid X_1,\dots,X_n\right]= \overline{\mathcal{E}}\left[E(\mathbb{I}_{(-\infty,x]})\mid X_1,\dots,X_n\right] \\
= {} & \frac{s}{s+n}+ \frac{n}{s+n} \frac{\sum_{i=1}^n \mathbb{I}_{(-\infty,x]}(X_i)}{n} = \frac{s}{s+n}+ \frac{n}{s+n} \hat{F}(x).
\end{align}

where

\hat{F}(x)

is the

empirical distribution function

. Here, to obtain the lower bound we have exploited the fact that

\inf \mathbb{I}_{(-\infty,x]}=0

and for the upper bound that

\sup \mathbb{I}_{(-\infty,x]}=1

. Note that, for any precise choice of

G_0

(e.g., normal distribution

\mathcal{N}(x;0,1)

), the posterior expectation of

F(x)

will lie between the lower and upper bounds.
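As a rough sketch of this example, the bounds on the posterior mean of F(x) can be computed from the empirical distribution function alone. The function names and the three-point sample below are hypothetical; the standard normal base measure is used only to illustrate that a precise choice of G_0 falls inside the interval.

```python
from math import erf, sqrt


def idp_cdf_bounds(sample, x, s=1.0):
    """Lower and upper posterior means of F(x) under the IDP with strength s."""
    n = len(sample)
    count = sum(1 for xi in sample if xi <= x)  # n times the empirical CDF at x
    lower = count / (s + n)                     # n/(s+n) * F_hat(x)
    upper = (s + count) / (s + n)               # s/(s+n) + n/(s+n) * F_hat(x)
    return lower, upper


def dp_cdf_mean_normal_base(sample, x, s=1.0):
    """Posterior mean of F(x) for the precise choice G_0 = N(0, 1)."""
    n = len(sample)
    count = sum(1 for xi in sample if xi <= x)
    g0_cdf = 0.5 * (1.0 + erf(x / sqrt(2.0)))   # standard normal CDF at x
    return (s * g0_cdf + count) / (s + n)


sample = [-1.0, 0.5, 2.0]                       # hypothetical observations
lower, upper = idp_cdf_bounds(sample, x=1.0, s=1.0)
precise = dp_cdf_mean_normal_base(sample, x=1.0, s=1.0)
print(lower, precise, upper)                    # the precise value lies in [0.5, 0.75]
```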

Example: median test

IDP can also be used for hypothesis testing, for instance to test the hypothesis

F(0)<0.5

, i.e., the median of

F

is greater than zero. By considering the partition

(-\infty,0],(0,\infty)

and the property of the Dirichlet process, it can be shown that the posterior distribution of

F(0)

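is:

F(0) \sim \mathrm{Beta}(\alpha_0+n_{<0},\beta_0+n-n_{<0})

where

n_{<0}

is the number of observations that are less than zero,

\alpha_0=s\int_{-\infty}^0 dG_0

and

\beta_0=s\int_0^\infty dG_0.

By exploiting this property, it follows that:

\underline{\mathcal{P}}[F(0)<0.5\mid X_1,\dots,X_n]= \int\limits_0^{0.5} \mathrm{Beta}(\theta;s+n_{<0},n-n_{<0})d\theta=I_{0.5}(s+n_{<0},n-n_{<0}),

\overline{\mathcal{P}}[F(0)<0.5\mid X_1,\dots,X_n]= \int\limits_0^{0.5} \mathrm{Beta}(\theta;n_{<0},s+n-n_{<0})d\theta=I_{0.5}(n_{<0},s+n-n_{<0}).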

where

I_x(\alpha,\beta)

is the

regularized incomplete beta function

. We can thus perform the hypothesis test:

\underline{\mathcal{P}}[F(0)<0.5\mid X_1,\dots,X_n]>1-\gamma, ~~ \overline{\mathcal{P}}[F(0)<0.5\mid X_1,\dots,X_n]>1-\gamma,

(with

1-\gamma=0.95

for instance) and then:
# if both inequalities are satisfied, we can declare that

F(0)<0.5

with probability larger than

1-\gamma

;
# if only one of the inequalities is satisfied (which necessarily has to be the one for the upper probability), we are in an indeterminate situation, i.e., we cannot decide;
# if neither is satisfied, we can declare that the probability that

F(0)<0.5

is lower than the desired probability of

1-\gamma

. The IDP returns an indeterminate decision when the decision is prior-dependent (that is, when it would depend on the choice of

G_0

). By exploiting the relationship between the

cumulative distribution function

of the

Beta distribution

, and the

cumulative distribution function

of a

random variable

''Z'' from a

binomial distribution

, where the "probability of success" is ''p'' and the sample size is ''n'':

F(k;n,p) = \Pr(Z \le k) = I_{1-p}(n-k, k+1) = 1 - I_p(k+1,n-k),

we can show that the median test derived with the IDP for any choice of

s\geq 1

encompasses the one-sided frequentist sign test as a test for the median. It can in fact be verified that for

s= 1

the

p

-value of the sign test is equal to

1-\underline{\mathcal{P}}[F(0)<0.5\mid X_1,\dots,X_n]. Thus, if \underline{\mathcal{P}}[F(0)<0.5\mid X_1,\dots,X_n]>0.95 then the p-value is less than 0.05 and, thus, the two tests have the same power.
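The median test and its relation to the sign test can be sketched as follows. This is a minimal illustration, not the implementation of any IDP package: all function names and decision labels are hypothetical, s is assumed to be a positive integer so that the regularized incomplete beta function reduces to a binomial tail sum, and the sample is assumed to contain no observations exactly equal to zero.

```python
from math import comb


def reg_inc_beta_int(x, a, b):
    """I_x(a, b) for integers a, b >= 0 (not both zero).

    Uses the identity I_x(a, b) = Pr(Bin(a + b - 1, x) >= a) for positive
    integers; a = 0 or b = 0 is treated as the degenerate limit of the Beta
    distribution (all mass at 0 or at 1, respectively).
    """
    if a == 0:
        return 1.0
    if b == 0:
        return 1.0 if x >= 1 else 0.0
    m = a + b - 1
    return sum(comb(m, j) * x**j * (1 - x) ** (m - j) for j in range(a, m + 1))


def idp_median_test(sample, s=1, gamma=0.05):
    """Return (lower, upper, decision) for the hypothesis F(0) < 0.5."""
    n = len(sample)
    n_neg = sum(1 for x in sample if x < 0)               # n_{<0}
    lower = reg_inc_beta_int(0.5, s + n_neg, n - n_neg)   # all prior mass below zero
    upper = reg_inc_beta_int(0.5, n_neg, s + n - n_neg)   # all prior mass above zero
    if lower > 1 - gamma:
        decision = "F(0) < 0.5"
    elif upper > 1 - gamma:
        decision = "indeterminate"
    else:
        decision = "not supported"
    return lower, upper, decision


def sign_test_p_value(sample):
    """One-sided sign test p-value Pr(Z <= n_{<0}) with Z ~ Binomial(n, 0.5)."""
    n = len(sample)
    n_neg = sum(1 for x in sample if x < 0)
    return sum(comb(n, j) for j in range(n_neg + 1)) / 2**n


sample = [1.0] * 8 + [-1.0] * 2    # hypothetical data: 8 positive, 2 negative
lower, upper, decision = idp_median_test(sample, s=1, gamma=0.05)
print(lower, upper, decision)
print(1 - lower, sign_test_p_value(sample))   # the two values agree for s = 1
```

With these hypothetical data only the upper probability exceeds 0.95, so the test is indeterminate, while the sign test p-value coincides with one minus the lower probability.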

Applications of the Imprecise Dirichlet Process

Dirichlet processes are frequently used in Bayesian nonparametric statistics. The Imprecise Dirichlet Process can be employed instead of the Dirichlet process in any application in which prior information is lacking (and in which it is therefore important to model this state of prior ignorance). In this respect, the Imprecise Dirichlet Process has been used for nonparametric hypothesis testing; see the Imprecise Dirichlet Process statistical package. Based on the Imprecise Dirichlet Process, Bayesian nonparametric near-ignorance versions of the following classical nonparametric estimators have been derived: the Wilcoxon rank sum test and the Wilcoxon signed-rank test. A Bayesian nonparametric near-ignorance model presents several advantages with respect to a traditional approach to hypothesis testing.
# The Bayesian approach allows us to formulate the hypothesis test as a decision problem. This means that we can assess the evidence in favor of the null hypothesis, rather than only rejecting it, and take decisions that minimize the expected loss.
# Because of the nonparametric prior near-ignorance, IDP-based tests allow us to start the hypothesis test with very weak prior assumptions, much in the direction of letting the data speak for themselves.
# Although the IDP test shares several similarities with a standard Bayesian approach, it embodies a significant change of paradigm when it comes to taking decisions. In fact, IDP-based tests have the advantage of producing an indeterminate outcome when the decision is prior-dependent. In other words, the IDP test suspends judgment when the option that minimizes the expected loss changes depending on the Dirichlet process base measure we focus on.
# It has been empirically verified that when the IDP test is indeterminate, the frequentist tests virtually behave as random guessers. This surprising result has practical consequences in hypothesis testing. Assume that we are trying to compare the effects of two medical treatments (Y is better than X) and that, given the available data, the IDP test is indeterminate. In such a situation the frequentist test always issues a determinate response (for instance, that Y is better than X), but it turns out that its response is completely random, as if we were tossing a coin. On the other hand, the IDP test acknowledges the impossibility of making a decision in these cases.
Thus, by saying "I do not know", the IDP test provides richer information to the analyst. The analyst could for instance use this information to collect more data.

Categorical variables

For

categorical variables

, i.e., when

\mathbb{X}

has a finite number of elements, it is known that the Dirichlet process reduces to a

Dirichlet distribution

. In this case, the Imprecise Dirichlet Process reduces to the Imprecise Dirichlet model proposed by Walley as a model for prior (near)-ignorance for chances.


External links

Open source implementation of hypothesis tests based on the IDP
The imprecise probability group at IDSIA