As a running example, we will use the problem of mixture modeling, starting with the parametric case.

# Finite Mixture Models

Suppose we have data $x_1, \dots, x_N$ and we want to group them into clusters. Each cluster $j$ has distribution $F(\theta_j)$. (For example, $F$ can be multivariate Gaussian, and $\theta_j$ will be $(\mu_j, \Sigma_j)$.)

Suppose we know the number of clusters $K$. There are two ways to model the mixture.

# View 1

Let $z_i \in \set{1,\dots,K}$ be the cluster assignment of $x_i$. To sample $x_1, \dots, x_N$, for each $i$:

- $z_i \sim \Mr{Multi}(\pi) = \Mr{Multi}(\pi_1, \dots, \pi_K)$
- $x_i \sim F(\theta_{z_i})$
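Marginalizing out $z_i$ in the two steps above shows why this is called a mixture. Here $f(\cdot \mid \theta)$ denotes the density of $F(\theta)$; this symbol is introduced only for illustration and assumes $F$ has a density.

$$\begin{align*} p(x_i \mid \pi, \theta_1, \dots, \theta_K) &= \sum_{j=1}^K p(z_i = j \mid \pi) \, p(x_i \mid z_i = j, \theta_j) \\ &= \sum_{j=1}^K \pi_j f(x_i \mid \theta_j) \end{align*}$$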

The priors are

- $\theta_j \sim H(\lambda)$ for some conjugate prior $H$ of $F$
- $\pi \sim \Mr{Dir}(\alpha_1, \dots, \alpha_K)$. Usually, we set $\alpha_j = \alpha_0 / K$ for all $j$.
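The generative process of View 1 can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the text: $F$ is taken to be a one-dimensional Gaussian with known standard deviation `obs_std`, $H$ is a zero-mean Gaussian prior on the cluster mean with standard deviation `prior_std`, and the function names `sample_dirichlet` and `sample_view1` are hypothetical. Cluster indices run from 0 to $K-1$ rather than 1 to $K$.

```python
import random

def sample_dirichlet(alphas, rng):
    # Normalized independent Gamma(alpha_j, 1) draws follow Dir(alphas).
    g = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(g)
    return [v / total for v in g]

def sample_view1(N, K, alpha0=3.0, prior_std=10.0, obs_std=1.0, seed=0):
    rng = random.Random(seed)
    # Priors: pi ~ Dir(alpha0/K, ..., alpha0/K), theta_j ~ H.
    # Assumption: H = Normal(0, prior_std^2) on a scalar cluster mean.
    pi = sample_dirichlet([alpha0 / K] * K, rng)
    theta = [rng.gauss(0.0, prior_std) for _ in range(K)]
    # z_i ~ Multi(pi), using 0-based cluster indices.
    z = rng.choices(range(K), weights=pi, k=N)
    # x_i ~ F(theta_{z_i}); assumption: F = Normal(theta, obs_std^2).
    x = [rng.gauss(theta[zi], obs_std) for zi in z]
    return pi, theta, z, x

pi, theta, z, x = sample_view1(N=10, K=3)
print("mixing weights:", pi)
print("assignments:   ", z)
```

Swapping in a multivariate Gaussian for $F$ would only change the two lines that draw `theta` and `x`.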

# View 2

Let $\Theta = \set{\theta_1, \dots, \theta_K}$ be the parameter space for $x$. Let $G$ be a distribution on $\Theta$ defined as

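$$\begin{align*} G(\theta) &= \sum_{j=1}^K \pi_j \delta(\theta, \theta_j) \\ &= \pi_j \text{ for the } j \text{ such that } \theta = \theta_j \end{align*}$$

where $\delta(\theta, \theta_j)$ is 1 if $\theta = \theta_j$ and 0 otherwise, $\pi \sim \Mr{Dir}(\alpha_1, \dots, \alpha_K)$, and $\theta_j \sim H(\lambda)$. (The second line assumes the $\theta_j$ are distinct, which holds with probability 1 when $H$ is continuous.)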

Let $\bar{\theta}_1, \dots, \bar{\theta}_N$ be draws (samples) from $G$. Then to sample $x_1, \dots, x_N$, we sample $\bar{\theta}_i \sim G$ and then $x_i \sim F(\bar{\theta}_i)$. The connection to View 1 is that $\bar{\theta}_i = \theta_{z_i}$.
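View 2 can be sketched the same way: first build the random discrete distribution $G$ as a list of atoms and weights, then draw each $\bar{\theta}_i$ directly from $G$ with no explicit $z_i$. As before, this is a minimal illustration under assumptions not in the text: $F$ is a one-dimensional Gaussian with known standard deviation, $H$ is a zero-mean Gaussian, and the names `draw_G` and `sample_view2` are hypothetical.

```python
import random

def draw_G(K, alpha0=3.0, prior_std=10.0, rng=None):
    # G = sum_j pi_j * delta(., theta_j), stored as (atoms, weights).
    if rng is None:
        rng = random.Random(0)
    # pi ~ Dir(alpha0/K, ..., alpha0/K) via normalized Gamma draws.
    g = [rng.gammavariate(alpha0 / K, 1.0) for _ in range(K)]
    total = sum(g)
    weights = [v / total for v in g]
    # theta_j ~ H; assumption: H = Normal(0, prior_std^2).
    atoms = [rng.gauss(0.0, prior_std) for _ in range(K)]
    return atoms, weights

def sample_view2(N, atoms, weights, obs_std=1.0, rng=None):
    if rng is None:
        rng = random.Random(1)
    # theta_bar_i ~ G: pick an atom with probability equal to its weight.
    theta_bar = rng.choices(atoms, weights=weights, k=N)
    # x_i ~ F(theta_bar_i); assumption: F = Normal(theta, obs_std^2).
    x = [rng.gauss(t, obs_std) for t in theta_bar]
    return theta_bar, x

atoms, weights = draw_G(K=3)
theta_bar, x = sample_view2(10, atoms, weights)
# Every theta_bar_i equals one of the K atoms, i.e. theta_bar_i = theta_{z_i}.
print(sorted(set(theta_bar)))
```

Because each $\bar{\theta}_i$ is one of the $K$ atoms, the cluster assignment $z_i$ of View 1 can be recovered as the index of the atom that was drawn.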

**Connection with De Finetti's Theorem:**

- $\bar{\theta}_i$ corresponds to $y_i$
- $\Theta$ corresponds to $Y$
- The parameter space $\Pi := \set{\text{all possible }\pi\text{'s}}$ corresponds to $\Phi$

It is silly to limit $\Theta$ to a set of only $K$ elements. In the next chapter, we will extend $\Theta$ to the set of all possible $\theta$'s (that are compatible with $F$).