probability theory Probability theory is the branch of mathematics concerned with probability. Although there are several different probability interpretations, probability theory treats the concept in a rigorous mathematical manner by expressing it through a set o ...

, the Chinese restaurant process is a

discrete-time In mathematical dynamics, discrete time and continuous time are two alternative frameworks within which variables that evolve over time are modeled. Discrete time Discrete time views values of variables as occurring at distinct, separate "po ...

stochastic process In probability theory and related fields, a stochastic () or random process is a mathematical object usually defined as a family of random variables. Stochastic processes are widely used as mathematical models of systems and phenomena that appea ...

, analogous to seating customers at tables in a restaurant. Imagine a restaurant with an infinite number of circular tables, each with infinite capacity. Customer 1 sits at the first table. The next customer either sits at the same table as customer 1, or the next table. This continues, with each customer choosing to either sit at an occupied table with a probability proportional to the number of customers already there (i.e., they are more likely to sit at a table with many customers than few), or an unoccupied table. At time ''n'', the ''n'' customers have been partitioned among ''m'' ≤ ''n'' tables (or blocks of the partition). The results of this process are exchangeable, meaning the order in which the customers sit does not affect the probability of the final

distribution Distribution may refer to: Mathematics *Distribution (mathematics), generalized functions used to formulate solutions of partial differential equations * Probability distribution, the probability of a particular value or value range of a vari ...

. This property greatly simplifies a number of problems in

population genetics Population genetics is a subfield of genetics that deals with genetic differences within and between populations, and is a part of evolutionary biology. Studies in this branch of biology examine such phenomena as adaptation, speciation, and pop ...

linguistic analysis In the study of language, description or descriptive linguistics is the work of objectively analyzing and describing how language is actually used (or how it was used in the past) by a speech community. François & Ponsonnet (2013). All acad ...

, and

image recognition Computer vision is an interdisciplinary scientific field that deals with how computers can gain high-level understanding from digital images or videos. From the perspective of engineering, it seeks to understand and automate tasks that the hum ...

. David J. Aldous attributes the restaurant analogy to

Jim Pitman Jim or JIM may refer to: * Jim (given name), a given name * Jim, a diminutive form of the given name James (given name), James * Jim, a short form of the given name Jimmy (given name), Jimmy * OPCW-UN Joint Investigative Mechanism * Jim (comics), ...

and

Lester Dubins Lester Dubins (April 27, 1920 – February 11, 2010) was an American mathematician noted primarily for his research in probability theory. He was a faculty member at the University of California at Berkeley from 1962 through 2004, and in retireme ...

in his 1983 book.

Formal definition

For any positive integer

n

, let

\mathcal_

denote the set of all partitions of the set

\ \triangleq /math>. The Chinese restaurant process takes values in the infinite Cartesian product \prod_ \mathcal_.

The value of the process at time n is a partition B_n of the set /math>, whose probability distribution is determined as follows.  At time n=1, the trivial partition B_1 = \ is obtained (with probability one). At time n+1 the element " n+1 " is either:
# added to one of the blocks of the partition B_n, where each block is chosen with probability, b, /(n+1) where, b, is the size of the block (i.e. number of elements), or 
# added to the partition B_n as a new singleton block, with probability 1/(n+1) .
The random partition so generated has some special properties. It is ''exchangeable'' in the sense that relabeling \ does not change the distribution of the partition, and it is ''consistent'' in the sense that the law of the partition of -1 /math> obtained by removing the element n from the random partition B_n is the same as the law of the random partition B_.

The probability assigned to any particular partition (ignoring the order in which customers sit around any particular table) is

: \Pr(B_n = B) = \frac, \qquad B \in \mathcal_where b is a block in the partition B and, b, is the size of b .

The definition can be generalized by introducing a parameter \theta>0 which modifies the probability of the new customer sitting at a new table to \frac and correspondingly modifies the probability of them sitting at a table of size, b, to \frac . The vanilla process introduced above can be recovered by setting \theta=1 . Intuitively, \theta can be interpreted as the effective number of customers sitting at the first empty table.

Distribution

The Chinese restaurant table distribution (CRT) is the

probability distribution In probability theory and statistics, a probability distribution is the mathematical function that gives the probabilities of occurrence of different possible outcomes for an experiment. It is a mathematical description of a random phenomenon i ...

on the number of tables in the Chinese restaurant process. It can be understood as the sum of

n

independent

Bernoulli Bernoulli can refer to: People *Bernoulli family of 17th and 18th century Swiss mathematicians: ** Daniel Bernoulli (1700–1782), developer of Bernoulli's principle **Jacob Bernoulli (1654–1705), also known as Jacques, after whom Bernoulli numbe ...

random variables, each with a different bias: :

b_n & \sim \operatorname \left( \frac \theta \right) \end

The probability mass function of

L

is given by :

f(\ell) = \frac , s(m,\ell),  \theta^\ell

where

s

denotes

Stirling numbers of the first kind In mathematics, especially in combinatorics, Stirling numbers of the first kind arise in the study of permutations. In particular, the Stirling numbers of the first kind count permutations according to their number of cycles (counting fixed poin ...

Generalization

This construction can be generalized to a model with two parameters,

\theta

\alpha

, commonly called the ''strength'' (or ''concentration'') and ''discount'' parameters respectively. At time

n+1

, the next customer to arrive finds

, B,

occupied tables and decides to sit at an empty table with probability :

\frac,

or at an occupied table

b

of size

, b,

with probability :

\frac.

In order for the construction to define a valid

probability measure In mathematics, a probability measure is a real-valued function defined on a set of events in a probability space that satisfies measure properties such as ''countable additivity''. The difference between a probability measure and the more gener ...

it is necessary to suppose that either

\alpha<0

and

\theta = -L\alpha

for some

L \in \

; or that

0\leq\alpha<1

and

\theta>-\alpha

. Under this model the probability assigned to any particular partition

B

/math>, in terms of the

Pochhammer k-symbol In the mathematical theory of special functions, the Pochhammer ''k''-symbol and the ''k''-gamma function, introduced by Rafael Díaz and Eddy Pariguan are generalizations of the Pochhammer symbol and gamma function. They differ from the Pochhamme ...

, is :

\Pr(B_n = B) = 
    \frac 
    \prod_(1-\alpha)_

where, by convention,

(a)_ = 1

, and for

b > 0

(a)_ = \prod_^(a+ic) = 
\begin
a^b & \textc = 0, \\ \\
\dfrac & \text. 
\end

Thus, for the case when

\theta > 0

the partition probability can be expressed in terms of the

Gamma function In mathematics, the gamma function (represented by , the capital letter gamma from the Greek alphabet) is one commonly used extension of the factorial function to complex numbers. The gamma function is defined for all complex numbers except ...

as :

\Pr(B_n = B) =\frac\dfrac\prod_\dfrac.

In the one-parameter case, where

\alpha

is zero, this simplifies to :

\Pr(B_n = B) = \frac\prod_ \Gamma(, b, ).

Or, when

\theta

is zero, :

\Pr(B_n = B) =\frac\prod_ \frac.

As before, the probability assigned to any particular partition depends only on the block sizes, so as before the random partition is exchangeable in the sense described above. The consistency property still holds, as before, by construction. If

\alpha=0

, the probability distribution of the random partition of the integer

n

thus generated is the

Ewens distribution In population genetics, Ewens's sampling formula, describes the probabilities associated with counts of how many different alleles are observed a given number of times in the sample (statistics), sample. Definition Ewens's sampling formula, introd ...

with parameter

\theta

, used in

and the

unified neutral theory of biodiversity The unified neutral theory of biodiversity and biogeography (here "Unified Theory" or "UNTB") is a theory and the title of a monograph by ecologist Stephen P. Hubbell. It aims to explain the diversity and relative abundance of species in ecolog ...

Derivation

Here is one way to derive this partition probability. Let

C_i

be the random block into which the number

i

is added, for

i =1,2,3,...

. Then :

\Pr(C_i = c\mid C_1,\ldots,C_)
 = \begin
\dfrac & \textc \in \text, \\  \\
\dfrac & \textc\in b;
\end

The probability that

B_n

is any particular partition of the set

\

is the product of these probabilities as

i

runs from

1

n

. Now consider the size of block

b

: it increases by one each time we add one element into it. When the last element in block

b

is to be added in, the block size is

, b, -1

. For example, consider this sequence of choices: (generate a new block

b

)(join

b

)(join

b

)(join

b

). In the end, block

b

has 4 elements and the product of the numerators in the above equation gets

\theta\cdot 1\cdot 2\cdot 3

. Following this logic, we obtain

\Pr(B_n = B)

as above.

Expected number of tables

For the one parameter case, with

\alpha=0

and

0<\theta<\infty

, the number of tables is distributed according to the

chinese restaurant table distribution In probability theory, the Chinese restaurant process is a discrete-time stochastic process, analogous to seating customers at tables in a restaurant. Imagine a restaurant with an infinite number of circular tables, each with infinite capacity. C ...

. The expected value of this random variable, given that there are

n

seated customers, is :

\begin
\sum_^n \frac \theta  = \theta \cdot (\Psi(\theta+n) -  \Psi(\theta))
\end

where

\Psi(\theta)

is the

digamma function In mathematics, the digamma function is defined as the logarithmic derivative of the gamma function: :\psi(x)=\frac\ln\big(\Gamma(x)\big)=\frac\sim\ln-\frac. It is the first of the polygamma functions. It is strictly increasing and strictly ...

. In the general case (

\alpha>0

) the expected number of occupied tables is :

\begin
\frac - \frac \theta \alpha,
\end

however, note that the

\Gamma(\cdot)

function here is not the standard gamma function.

The Indian buffet process

It is possible to adapt the model such that each data point is no longer uniquely associated with a class (i.e., we are no longer constructing a partition), but may be associated with any combination of the classes. This strains the restaurant-tables analogy and so is instead likened to a process in which a series of diners samples from some subset of an infinite selection of dishes on offer at a buffet. The probability that a particular diner samples a particular dish is proportional to the popularity of the dish among diners so far, and in addition the diner may sample from the untested dishes. This has been named the

Indian buffet process In the mathematical theory of probability, the Indian buffet process (IBP) is a stochastic process defining a probability distribution over sparse binary matrices with a finite number of rows and an infinite number of columns. This distribution ...

and can be used to infer latent features in data.Griffiths, T.L. and Ghahramani, Z. (2005
Infinite Latent Feature Models and the Indian Buffet Process
Gatsby Unit Technical Report GCNU-TR-2005-001.

Applications

The Chinese restaurant process is closely connected to

Dirichlet process In probability theory, Dirichlet processes (after the distribution associated with Peter Gustav Lejeune Dirichlet) are a family of stochastic processes whose realizations are probability distributions. In other words, a Dirichlet process is a pro ...

es and Pólya's urn scheme, and therefore useful in applications of

Bayesian statistics Bayesian statistics is a theory in the field of statistics based on the Bayesian interpretation of probability where probability expresses a ''degree of belief'' in an event. The degree of belief may be based on prior knowledge about the event, ...

including

nonparametric Nonparametric statistics is the branch of statistics that is not based solely on parametrized families of probability distributions (common examples of parameters are the mean and variance). Nonparametric statistics is based on either being dist ...

Bayesian methods Bayesian inference is a method of statistical inference in which Bayes' theorem is used to update the probability for a hypothesis as more evidence or information becomes available. Bayesian inference is an important technique in statistics, and e ...

. The Generalized Chinese Restaurant Process is closely related to

Pitman–Yor process In probability theory, a Pitman–Yor process denoted PY(''d'', ''θ'', ''G''0), is a stochastic process whose sample path is a probability distribution. A random sample from this process is an infinite discrete probability distribution ...

. These processes have been used in many applications, including modeling text, clustering biological

microarray A microarray is a multiplex lab-on-a-chip. Its purpose is to simultaneously detect the expression of thousands of genes from a sample (e.g. from a tissue). It is a two-dimensional array on a solid substrate—usually a glass slide or silicon t ...

data, biodiversity modelling, and image reconstruction

References

External links

Introduction to the Dirichlet Distribution and Related Processes by Frigyik, Kapila and Gupta
* A talk by

Michael I. Jordan Michael Irwin Jordan (born February 25, 1956) is an American scientist, professor at the University of California, Berkeley and researcher in machine learning, statistics, and artificial intelligence. Jordan was elected a member of the Nation ...

on the CRP: ** http://videolectures.net/icml05_jordan_dpcrp/ {{DEFAULTSORT:Chinese Restaurant Process Stochastic processes Nonparametric Bayesian statistics

Formal definition

Distribution

Generalization

Derivation

Expected number of tables

The Indian buffet process

Applications

See also

References

External links