# Infinite Mixture Models

Let's return to the mixture models.

The relationship between the 2 models is $\bar\theta_i$ = $\theta_{z_i}$.

In the model on the right,

- $\Theta = \set{\theta_1, \dots, \theta_K}$
- $G(\theta) = \sum_{j=1}^K\pi_j\delta(\theta,\theta_j)$
- $\bar\theta_i \sim G$

We have extended the model on the right to the infinite space $\Theta$:

- $\Theta = \set{\text{all possible }\theta\text{'s}}$
- $G(\theta) = \sum_{j=1}^\infty\pi_j\delta(\theta,\theta_j)$
- $\bar\theta_i \sim G$

Since viewing the clusters in terms of $z_i$ is useful, let's convert back to the model on the left. Our plan is to show a way to sample $z_i$'s and argue that the predictive probability distribution matches the one from Dirichlet process.

# Chinese Restaurant Process

Pitman and Dubins named this process after seeing super huge Chinese restaurants in San Francisco's Chinatown.

** Definition** (Chinese Restaurant Process) Imagine an infinite number of tables labeled $1, 2, \dots$. Customers come in one by one.

- Customer #1 sits at table 1.
- Customer #$i$ sits at:
- an occupied table $j$ with probability $\propto$ the number of people in table $j$, or
- the first unoccupied table with probability $\propto \alpha$

- If any customer sits at an empty table $j$, the customer orders food $\theta_j \sim H$ to be shared in the table.

If customer #$i$ sits at table $j$, we let $z_i = j$ and $\bar\theta_i = \theta_j$.

Let $N_j$ be the number of people at table $j$. For customer #($n+1$), we have $$\begin{align*} \prob{\text{choose table }j} &= \frac{N_j}{\alpha + N} \\ \prob{\text{choose a new table}} &= \frac{\alpha}{\alpha + N} \end{align*}$$ which matches the predictive probability from Dirichlet process.

Interestingly, CRP also shows us that $z_i$ are exchangeable; i.e., we can shuffle the order of the customers. For example,

The probability of the configuration above is $$\frac{\alpha}{\alpha} \cdot\frac{1}{\alpha+1} \cdot\frac{\alpha}{\alpha+2} \cdot\frac{1}{\alpha+3} \cdot\frac{2}{\alpha+4} \cdot\frac{\alpha}{\alpha+5}$$ If we shuffle the customer and get a new configuration:

The probability of the new configuration is $$\frac{\alpha}{\alpha} \cdot\frac{\alpha}{\alpha+1} \cdot\frac{1}{\alpha+2} \cdot\frac{\alpha}{\alpha+3} \cdot\frac{1}{\alpha+4} \cdot\frac{2}{\alpha+5}$$ We can see that the two probabilities are equal.

Finally, we get a graphical model

To sample $x_i$'s:

- $\pi \sim \Mr{GEM}(\alpha)$
- $z_i \sim \pi$
- $\theta_j \sim H(\lambda)$
- $x_i \sim F(\theta_{z_i})$

Alternatively, the first 3 steps can be sampled using CRP.