In statistics and
machine learning, the hierarchical Dirichlet process (HDP) is a
nonparametric Bayesian approach to clustering
grouped data.
It uses a
Dirichlet process for each group of data, with the Dirichlet processes for all groups sharing a base distribution which is itself drawn from a Dirichlet process. This method allows groups to share statistical strength via sharing of clusters across groups. The base distribution being drawn from a Dirichlet process is important, because draws from a Dirichlet process are atomic probability measures, and the atoms will appear in all group-level Dirichlet processes. Since each atom corresponds to a cluster, clusters are shared across all groups. It was developed by
Yee Whye Teh,
Michael I. Jordan,
Matthew J. Beal and
David Blei and published in the ''
Journal of the American Statistical Association'' in 2006,
as a formalization and generalization of the
infinite hidden Markov model
published in 2002.
Model
This model description is sourced from Teh et al. (2006).
The HDP is a model for grouped data. What this means is that the data items come in multiple distinct groups. For example, in a
topic model words are organized into documents, with each document formed by a bag (group) of words (data items). Indexing groups by
<math>j = 1, \ldots, J</math>, suppose each group consists of data items <math>x_{j1}, \ldots, x_{jn_j}</math>.
The HDP is parameterized by a base distribution <math>H</math> that governs the a priori distribution over data items, and a number of concentration parameters that govern the a priori number of clusters and amount of sharing across groups. The <math>j</math>th group is associated with a random probability measure <math>G_j</math> which has distribution given by a Dirichlet process:
:<math>G_j \mid G_0 \sim \operatorname{DP}(\alpha_j, G_0)</math>
where <math>\alpha_j</math> is the concentration parameter associated with the group, and <math>G_0</math> is the base distribution shared across all groups. In turn, the common base distribution is Dirichlet process distributed:
:<math>G_0 \sim \operatorname{DP}(\alpha_0, H)</math>
with concentration parameter <math>\alpha_0</math> and base distribution <math>H</math>. Finally, to relate the Dirichlet processes back with the observed data, each data item <math>x_{ji}</math> is associated with a latent parameter <math>\theta_{ji}</math>:
:<math>\begin{align}
\theta_{ji} \mid G_j &\sim G_j \\
x_{ji} \mid \theta_{ji} &\sim F(\theta_{ji})
\end{align}</math>
The first line states that each parameter has a prior distribution given by <math>G_j</math>, while the second line states that each data item has a distribution <math>F(\theta_{ji})</math> parameterized by its associated parameter. The resulting model above is called an HDP mixture model, with the HDP referring to the hierarchically linked set of Dirichlet processes, and the mixture model referring to the way the Dirichlet processes are related to the data items.
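To make the generative process concrete, the following is a minimal truncated sketch in Python/NumPy. The truncation level <math>K</math>, the Gaussian base distribution <math>H</math>, and the Gaussian observation kernel <math>F(\theta)</math> are illustrative assumptions rather than part of the model definition.
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
K = 50                    # truncation level approximating the infinite DPs
gamma, alpha = 1.0, 1.0   # top-level and group-level concentration parameters

def stick_breaking(conc, size, rng):
    """Truncated stick-breaking weights for a DP with concentration conc."""
    v = rng.beta(1.0, conc, size=size)
    w = v * np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return w / w.sum()    # renormalize the mass lost to truncation

# G_0 ~ DP(gamma, H): weights from stick-breaking, atoms drawn from H = N(0, 2^2).
beta = stick_breaking(gamma, K, rng)
atoms = rng.normal(0.0, 2.0, size=K)

# G_j ~ DP(alpha, G_0): each of G_j's sticks carries an atom drawn from G_0,
# so every group reuses the same shared atoms, with group-specific masses.
J, n = 3, 100
data = {}
for j in range(J):
    w = stick_breaking(alpha, K, rng)                 # weights of G_j's own sticks
    idx = rng.choice(K, size=K, p=beta)               # each stick's atom sampled from G_0
    pi_j = np.bincount(idx, weights=w, minlength=K)   # mass G_j places on shared atom k
    z = rng.choice(K, size=n, p=pi_j)                 # theta_ji ~ G_j: pick a shared atom
    data[j] = rng.normal(atoms[z], 0.5)               # x_ji ~ F(theta_ji) = N(theta_ji, 0.5^2)
</syntaxhighlight>
The sketch makes explicit the point developed next: the groups differ only in their weights, while the atom locations are common to all of them.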
To understand how the HDP implements a clustering model, and how clusters become shared across groups, recall that draws from a Dirichlet process are atomic probability measures with probability one. This means that the common base distribution <math>G_0</math> has a form which can be written as:
:<math>G_0 = \sum_{k=1}^\infty \beta_k \delta_{\theta^*_k}</math>
where there are an infinite number of atoms, <math>\theta^*_k,\ k = 1, 2, \ldots</math>, assuming that the overall base distribution <math>H</math> has infinite support. Each atom is associated with a mass <math>\beta_k</math>, and the masses have to sum to one since <math>G_0</math> is a probability measure. Since <math>G_0</math> is itself the base distribution for the group-specific Dirichlet processes, each <math>G_j</math> will have atoms given by the atoms of <math>G_0</math>, and can itself be written in the form:
:<math>G_j = \sum_{k=1}^\infty \pi_{jk} \delta_{\theta^*_k}</math>
Thus the set of atoms is shared across all groups, with each group having its own group-specific atom masses. Relating this representation back to the observed data, we see that each data item is described by a mixture model:
:<math>x_{ji} \mid G_j \sim \sum_{k=1}^\infty \pi_{jk} F(\theta^*_k)</math>
where the atoms <math>\theta^*_k</math> play the role of the mixture component parameters, while the masses <math>\pi_{jk}</math> play the role of the mixing proportions. In conclusion, each group of data is modeled using a mixture model whose components are shared across all groups but whose mixing proportions are group-specific. In clustering terms, each mixture component models a cluster of data items; the clusters are shared across all groups, and each group, having its own mixing proportions, is composed of a different combination of clusters.
Applications
The HDP mixture model is a natural nonparametric generalization of
Latent Dirichlet allocation, where the number of topics can be unbounded and learnt from data.
Here each group is a document consisting of a bag of words, each cluster is a topic, and each document is a mixture of topics. The HDP is also a core component of the
infinite hidden Markov model,
[Beal, M. J., Ghahramani, Z. and Rasmussen, C. E. (2002). "The infinite hidden Markov model" (PDF). Advances in Neural Information Processing Systems 14: 577–585. Cambridge, MA: MIT Press.]
which is a nonparametric generalization of the
hidden Markov model allowing the number of states to be unbounded and learnt from data.
[Fox, Emily B., et al. "A sticky HDP-HMM with application to speaker diarization." The Annals of Applied Statistics (2011): 1020-1056.]
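As a concrete illustration of the topic-modeling application above, gensim provides an HDP topic model, HdpModel. A minimal sketch, assuming gensim is installed and using made-up toy documents:
<syntaxhighlight lang="python">
from gensim.corpora import Dictionary
from gensim.models import HdpModel

docs = [["dirichlet", "process", "cluster", "atom"],
        ["topic", "model", "document", "topic", "word"],
        ["cluster", "topic", "process", "document"]]

dictionary = Dictionary(docs)                         # map words to integer ids
corpus = [dictionary.doc2bow(doc) for doc in docs]    # bag-of-words per document

# Unlike LDA, the number of topics is not fixed in advance; it is inferred.
hdp = HdpModel(corpus=corpus, id2word=dictionary)
for topic in hdp.print_topics(num_topics=5, num_words=3):
    print(topic)
</syntaxhighlight>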
Generalizations
The HDP can be generalized in a number of directions. The Dirichlet processes can be replaced by
Pitman-Yor processes and
Gamma processes, resulting in the
hierarchical Pitman-Yor process
and hierarchical gamma process. The hierarchy can be deeper, with multiple levels of groups arranged in a hierarchy. Such an arrangement has been exploited in the
sequence memoizer
, a Bayesian nonparametric model for sequences which has a multi-level hierarchy of Pitman-Yor processes. In addition, the Bayesian Multi-Domain Learning (BMDL) model derives domain-dependent latent representations of overdispersed count data based on hierarchical negative binomial factorization, enabling accurate cancer subtyping even when the number of samples for a specific cancer type is small.
[Hajiramezanali, E., Dadaneh, S. Z., Karbalayghareh, A., Zhou, Z. and Qian, X. "Bayesian multi-domain learning for cancer subtype discovery from next-generation sequencing count data" (PDF). 32nd Conference on Neural Information Processing Systems (NIPS 2018), Montréal, Canada.]
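To illustrate the first of these generalizations: in stick-breaking form, the Dirichlet process draws stick proportions <math>v_k \sim \operatorname{Beta}(1, \alpha)</math>, while the Pitman-Yor process draws <math>v_k \sim \operatorname{Beta}(1 - d, \alpha + kd)</math> with discount <math>0 \le d < 1</math>, producing power-law cluster sizes. A minimal sketch with illustrative parameter values:
<syntaxhighlight lang="python">
import numpy as np

def py_stick_breaking(alpha, d, size, rng):
    """Truncated Pitman-Yor stick-breaking: v_k ~ Beta(1 - d, alpha + k*d).
    Setting d = 0 recovers the ordinary Dirichlet process."""
    k = np.arange(1, size + 1)
    v = rng.beta(1.0 - d, alpha + k * d)
    w = v * np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return w / w.sum()

rng = np.random.default_rng(0)
dp_weights = py_stick_breaking(1.0, 0.0, 100, rng)   # Dirichlet process
py_weights = py_stick_breaking(1.0, 0.5, 100, rng)   # Pitman-Yor, discount d = 0.5
# py_weights decays polynomially, spreading mass over many more small
# clusters than the geometrically decaying dp_weights.
</syntaxhighlight>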
See also
*
Chinese restaurant process
References
Stochastic processes
Nonparametric Bayesian statistics