Grouped Dirichlet Distribution
   HOME

TheInfoList



OR:

In
statistics Statistics (from German language, German: ''wikt:Statistik#German, Statistik'', "description of a State (polity), state, a country") is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of ...
, the grouped Dirichlet distribution (GDD) is a multivariate generalization of the
Dirichlet distribution In probability and statistics, the Dirichlet distribution (after Peter Gustav Lejeune Dirichlet), often denoted \operatorname(\boldsymbol\alpha), is a family of continuous multivariate probability distributions parameterized by a vector \bold ...
It was first described by Ng et al. 2008. The Grouped Dirichlet distribution arises in the analysis of
categorical data In statistics, a categorical variable (also called qualitative variable) is a variable that can take on one of a limited, and usually fixed, number of possible values, assigning each individual or other unit of observation to a particular group or ...
where some observations could fall into any of a set of other 'crisp' category. For example, one may have a data set consisting of cases and controls under two different conditions. With complete data, the cross-classification of disease status forms a 2(case/control)-x-(condition/no-condition) table with cell probabilities If, however, the data includes, say, non-respondents which are known to be controls or cases, then the cross-classification of disease status forms a 2-x-3 table. The probability of the last column is the sum of the probabilities of the first two columns in each row, e.g. The GDD allows the full estimation of the cell probabilities under such aggregation conditions.


Probability Distribution

Consider the closed simplex set \mathcal_n=\left\ and \mathbf\in\mathcal_n. Writing \mathbf_=\left(x_1,\ldots,x_\right) for the first n-1 elements of a member of \mathcal_n, the distribution of \mathbf for two partitions has a density function given by : \operatorname_\left(\left.\mathbf_\\mathbf,\mathbf\right)= \frac where \operatorname\left(\mathbf\right) is the Multivariate beta function. Ng et al. went on to define an ''m'' partition grouped Dirichlet distribution with density of \mathbf_ given by : \operatorname_\left(\left.\mathbf_\\mathbf,\mathbf\right) = c_m^\cdot \left(\prod_^n x_i^\right)\cdot \prod_^m\left(\sum_^x_k\right)^ where \mathbf = \left(s_1,\ldots,s_m\right) is a vector of integers with 0=s_0. The normalizing constant given by : c_m=\left\\cdot \operatorname\left(b_1+\sum_^a_k,\ldots,b_m+\sum_^a_k\right) The authors went on to use these distributions in the context of three different applications in medical science.


References

{{Reflist Multivariate continuous distributions Conjugate prior distributions Exponential family distributions Continuous distributions