[013] Mixture Models

As a running example, we will consider the problem of mixture modeling. Let's consider the parametric case first.

Finite Mixture Models

Suppose we have data $x_1, \dots, x_N$ and we want to group them into clusters. Each cluster $j$ has distribution $F(\theta_j)$. (For example, $F$ can be multivariate Gaussian, and $\theta_j$ will be $(\mu_j, \Sigma_j)$.)

Suppose we know the number of clusters $K$. There are two equivalent ways to model the mixture.

View 1


Let $z_i \in \set{1,\dots,K}$ be the cluster assignment of $x_i$. To sample $x_1, \dots, x_N$, for each $i$:

  • $z_i \sim \Mr{Multi}(\pi) = \Mr{Multi}(\pi_1, \dots, \pi_K)$
  • $x_i \sim F(\theta_{z_i})$
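
Marginalizing out the assignment $z_i$ gives the usual mixture density. Here $f(\cdot \mid \theta)$ denotes the density of $F(\theta)$; this symbol is an assumption introduced only for this step.

$$\begin{align*} p(x_i \mid \pi, \theta_1, \dots, \theta_K) &= \sum_{j=1}^K P(z_i = j \mid \pi) \, f(x_i \mid \theta_j) \\ &= \sum_{j=1}^K \pi_j \, f(x_i \mid \theta_j) \end{align*}$$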

The priors are

  • $\theta_j \sim H(\lambda)$ for some conjugate prior $H$ of $F$
  • $\pi \sim \Mr{Dir}(\alpha_1, \dots, \alpha_K)$. Usually, we use a symmetric prior and set $\alpha_j = \alpha_0 / K$ for every $j$.
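
The generative process of View 1 can be sketched in a few lines of NumPy. This is only an illustration under assumed choices: $F$ is taken to be a one-dimensional Gaussian with known standard deviation `obs_std`, and $H$ is a Gaussian prior on the cluster mean. The function name and all parameter names are hypothetical, not part of the notes.

```python
import numpy as np

def sample_finite_mixture(N, K, alpha0=1.0, prior_mean=0.0,
                          prior_std=5.0, obs_std=1.0, seed=0):
    """Draw (pi, theta, z, x) from the finite mixture model of View 1.

    Assumptions for illustration only: F is N(theta_j, obs_std^2) and
    H is N(prior_mean, prior_std^2), i.e. theta_j is a scalar mean.
    """
    rng = np.random.default_rng(seed)
    # pi ~ Dir(alpha0/K, ..., alpha0/K): symmetric Dirichlet prior
    pi = rng.dirichlet(np.full(K, alpha0 / K))
    # theta_j ~ H(lambda): one parameter per cluster
    theta = rng.normal(prior_mean, prior_std, size=K)
    # z_i ~ Multi(pi): cluster assignment of each data point (0-based)
    z = rng.choice(K, size=N, p=pi)
    # x_i ~ F(theta_{z_i}): observation from the assigned cluster
    x = rng.normal(theta[z], obs_std)
    return pi, theta, z, x

pi, theta, z, x = sample_finite_mixture(N=500, K=3, alpha0=3.0)
```

Note that the labels `z` are 0-based here, whereas the text uses $1, \dots, K$.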

View 2


Let $\Theta = \set{\theta_1, \dots, \theta_K}$ be the parameter space for $x$. Let $G$ be a distribution on $\Theta$ defined as

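$$\begin{align*} G(\theta) &= \sum_j \pi_j \delta(\theta, \theta_j) \\ &= \pi_j \text{ for the } j \text{ such that } \theta = \theta_j \end{align*}$$

where $\delta(\theta, \theta_j)$ is 1 if $\theta = \theta_j$ and 0 otherwise, $\pi \sim \Mr{Dir}(\alpha_1, \dots, \alpha_K)$, and $\theta_j \sim H(\lambda)$. (The second line assumes the $\theta_j$ are distinct.)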

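Let $\bar{\theta}_1, \dots, \bar{\theta}_N$ denote draws (samples) from $G$. Then to sample $x_1, \dots, x_N$, we draw $\bar{\theta}_i \sim G$ and then $x_i \sim F(\bar{\theta}_i)$ for each $i$. The connection to View 1 is that $\bar{\theta}_i = \theta_{z_i}$: drawing from $G$ picks the atom $\theta_j$ with probability $\pi_j$, which is the same as first drawing $z_i$ and then looking up $\theta_{z_i}$.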
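
View 2 can be sketched the same way by representing $G$ as a pair of arrays (atoms and weights) and drawing parameters from it directly. As before, the Gaussian choices for $F$ and $H$ and the helper names `make_G` and `sample_from_G` are assumptions made for illustration.

```python
import numpy as np

def make_G(K, alpha0, rng, prior_mean=0.0, prior_std=5.0):
    """Build the discrete distribution G = sum_j pi_j * delta(., theta_j).

    Assumption for illustration: H is N(prior_mean, prior_std^2).
    """
    weights = rng.dirichlet(np.full(K, alpha0 / K))     # pi ~ Dir(...)
    atoms = rng.normal(prior_mean, prior_std, size=K)   # theta_j ~ H
    return atoms, weights

def sample_from_G(atoms, weights, N, rng):
    """Draw theta_bar_1..N ~ G; also return the atom index of each draw."""
    idx = rng.choice(len(atoms), size=N, p=weights)
    return atoms[idx], idx

rng = np.random.default_rng(0)
atoms, weights = make_G(K=3, alpha0=3.0, rng=rng)
theta_bar, z = sample_from_G(atoms, weights, N=500, rng=rng)
# x_i ~ F(theta_bar_i), with F assumed Gaussian with unit std
x = rng.normal(theta_bar, 1.0)

# The index of the chosen atom plays the role of z_i in View 1.
assert np.array_equal(theta_bar, atoms[z])
```

Returning the index alongside each draw is only a convenience for checking $\bar{\theta}_i = \theta_{z_i}$; View 2 itself never needs to name $z_i$.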

Connection with De Finetti's Theorem:

  • $\bar{\theta}_i$ corresponds to $y_i$
  • $\Theta$ corresponds to $Y$
  • The parameter space $\Pi := \set{\text{all possible }\pi\text{'s}}$ corresponds to $\Phi$

It is silly to limit $\Theta$ to a set of only $K$ elements. In the next chapter, we will extend $\Theta$ to the set of all possible $\theta$'s (that are compatible with $F$).
