[024] DP: Collapsed Gibbs Sampling

Inference

Given $x_1, \dots, x_N$, we would like to infer $z_1, \dots, z_N$. We can try

$$\prob{\Mb{z}\mid\Mb{x}} = \frac{\prob{\Mb{x}\mid\Mb{z}}\prob{\Mb{z}}}{\prob{\Mb{x}}}$$

However, enumerate all possible cluster configurations is intractable. So we need to approximate, and one way is to use MCMC.

In particular, we will use Gibbs sampling. The idea is to resample one $z_i$ at a time. To sample $z_i$ in the Chinese restaurant process viewpoint, move customer #$i$ to be the last customer (can do so because of exchangeability), and let the customer choose a table.

Collapsed Gibbs Sampling

$\pi$
$N$
$\infty$
$\alpha$
$z_i$
$x_i$
$\theta_j$
$\lambda$

Input Current $\Mb{z}^{(t-1)}$ in Markov chain

Output $\Mb{z}^{(t)}$

  1. Pick a random permutation $\tau$ of $\set{1,\dots,N}$
  2. For each $i\in\set{\tau(1),\dots,\tau(N)}$, resample $z_i$ as follows.

    a. Find the likelihoods: $$\begin{align*} f_j(x_i) &= \prob{x_i\mid z_i = j, \text{other }x, \text{other }z,\lambda} \\ &= \prob{x_i\mid \text{other }x\text{ in table }j, \lambda} \\ &= \prob{x_i\mid \lambda^*} \\ f_\text{new}(x_i) &= \prob{x_i\mid z_i = \text{new cluster},\text{other }x, \text{other }z,\lambda} \\ &= \prob{x_i\mid \lambda}\end{align*}$$ b. Sample new $z_i$ from the multinomial distribution: $$\begin{align*} \prob{z_i = j} &\propto \crab{N_j\text{ not counting }z_i} f_j(x_i) \\ \prob{z_i = \text{new cluster}} &\propto \alpha f_\text{new}(x_i) \end{align*}$$ c. Clean up empty clusters and establish new clusters (by sampling new $\theta_j$)

  3. Return the new $\Mb{z}$

Notes

  • To calculate $\prob{x_i \mid \lambda}$, use $$\int_\Theta \prob{x_i\mid\theta}\prob{\theta\mid\lambda}\,d\theta$$ Since we have collapsed $\theta$ out, the algorithm is called collapsed Gibbs sampling.
  • We can convert $\prob{x_i\mid \text{other }x\text{ in table }j, \lambda}$ to $\prob{x_i\mid \lambda^*}$ for some $\lambda^*$ if $H$ is a conjugate prior of $F$.
  • We may also want to resample $\alpha$. For example, we can use the prior $\alpha \sim \Mr{Gamma}(a,b)$ and resample $\alpha^{(t)}$ independent from $\alpha^{(t-1)}$.
Exported: 2016-07-13T01:47:06.083426