[023] DP: Chinese Restaurant Process Viewpoint

Infinite Mixture Models

Let's return to the mixture models.

$\pi$
$N$
$K$
$\alpha$
$K$
$z_i$
$x_i$
$\theta_j$
$\lambda$
$G$
$N$
$H$
$\alpha$
$\bar\theta_i$
$x_i$

The relationship between the 2 models is $\bar\theta_i$ = $\theta_{z_i}$.

In the model on the right,

  • $\Theta = \set{\theta_1, \dots, \theta_K}$
  • $G(\theta) = \sum_{j=1}^K\pi_j\delta(\theta,\theta_j)$
  • $\bar\theta_i \sim G$

We have extended the model on the right to the infinite space $\Theta$:

  • $\Theta = \set{\text{all possible }\theta\text{'s}}$
  • $G(\theta) = \sum_{j=1}^\infty\pi_j\delta(\theta,\theta_j)$
  • $\bar\theta_i \sim G$

Since viewing the clusters in terms of $z_i$ is useful, let's convert back to the model on the left. Our plan is to show a way to sample $z_i$'s and argue that the predictive probability distribution matches the one from Dirichlet process.

Chinese Restaurant Process

Pitman and Dubins named this process after seeing super huge Chinese restaurants in San Francisco's Chinatown.

Definition (Chinese Restaurant Process) Imagine an infinite number of tables labeled $1, 2, \dots$. Customers come in one by one.

  • Customer #1 sits at table 1.
  • Customer #$i$ sits at:
    • an occupied table $j$ with probability $\propto$ the number of people in table $j$, or
    • the first unoccupied table with probability $\propto \alpha$
  • If any customer sits at an empty table $j$, the customer orders food $\theta_j \sim H$ to be shared in the table.

If customer #$i$ sits at table $j$, we let $z_i = j$ and $\bar\theta_i = \theta_j$.

1
$\displaystyle\frac{\alpha}{\alpha}$
...
2
$\displaystyle\frac{1}{1+\alpha}$
$\displaystyle\frac{\alpha}{1+\alpha}$
...
1
3
$\displaystyle\frac{1}{2+\alpha}$
$\displaystyle\frac{1}{2+\alpha}$
$\displaystyle\frac{\alpha}{2+\alpha}$
...
1
2
4
$\displaystyle\frac{1}{3+\alpha}$
$\displaystyle\frac{2}{3+\alpha}$
$\displaystyle\frac{\alpha}{3+\alpha}$
...
1
2
3
$\vdots$
N+1
$\displaystyle\frac{N_j}{N+\alpha}$
$\displaystyle\frac{\alpha}{N+\alpha}$
...
occupied table
empty table

Let $N_j$ be the number of people at table $j$. For customer #($n+1$), we have $$\begin{align*} \prob{\text{choose table }j} &= \frac{N_j}{\alpha + N} \\ \prob{\text{choose a new table}} &= \frac{\alpha}{\alpha + N} \end{align*}$$ which matches the predictive probability from Dirichlet process.

Interestingly, CRP also shows us that $z_i$ are exchangeable; i.e., we can shuffle the order of the customers. For example,

1
2
3
...
1
2
3
4
5
6

The probability of the configuration above is $$\frac{\alpha}{\alpha} \cdot\frac{1}{\alpha+1} \cdot\frac{\alpha}{\alpha+2} \cdot\frac{1}{\alpha+3} \cdot\frac{2}{\alpha+4} \cdot\frac{\alpha}{\alpha+5}$$ If we shuffle the customer and get a new configuration:

1
2
3
...
1
5
2
3
6
4

The probability of the new configuration is $$\frac{\alpha}{\alpha} \cdot\frac{\alpha}{\alpha+1} \cdot\frac{1}{\alpha+2} \cdot\frac{\alpha}{\alpha+3} \cdot\frac{1}{\alpha+4} \cdot\frac{2}{\alpha+5}$$ We can see that the two probabilities are equal.

Finally, we get a graphical model

$\pi$
$N$
$\infty$
$\alpha$
$z_i$
$x_i$
$\theta_j$
$\lambda$

To sample $x_i$'s:

  1. $\pi \sim \Mr{GEM}(\alpha)$
  2. $z_i \sim \pi$
  3. $\theta_j \sim H(\lambda)$
  4. $x_i \sim F(\theta_{z_i})$

Alternatively, the first 3 steps can be sampled using CRP.

Exported: 2016-07-13T01:46:56.425984