MCMC / Particle Filtering

Gibbs Sampling, Metropolis-Hastings and Particle Filtering (Sequential Monte Carlo) are sampling-based methods for approximating distributions (joint or marginal) that are difficult to compute directly.

# Particle

All methods above use the concept of a particle, which is simply a complete assignment to all model variables. For example:

|            | $X_1$ | $X_2$ | $X_3$ |
|------------|-------|-------|-------|
| Particle 1 | 1     | 0     | 0     |
| Particle 2 | 1     | 0     | 1     |
| Particle 3 | 0     | 1     | 1     |
| Particle 4 | 0     | 1     | 0     |
| ...        | ...   | ...   | ...   |

If the particles are sampled from some distribution, then we can compute the empirical distribution as an approximation of the true distribution.
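As a minimal sketch of this idea, the four particles from the (hypothetical) table above can be turned into an empirical joint distribution and an empirical marginal by counting:

```python
from collections import Counter

# Hypothetical particles from the table: complete assignments of (X1, X2, X3).
particles = [(1, 0, 0), (1, 0, 1), (0, 1, 1), (0, 1, 0)]

# Empirical joint distribution: relative frequency of each full assignment.
counts = Counter(particles)
empirical = {x: c / len(particles) for x, c in counts.items()}

# Empirical marginal of X1: fraction of particles with X1 = 1.
p_x1_is_1 = sum(1 for x in particles if x[0] == 1) / len(particles)
```

With more particles drawn from the true distribution, these frequencies converge to the true probabilities.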

# MCMC Methods

Markov Chain Monte Carlo (MCMC) algorithms are methods for randomly sampling particles from a complicated distribution based on Markov chains.

## Gibbs Sampling

Gibbs sampling is a popular MCMC method. It is used when the joint probability $\prob{X_1,\dots,X_n}$ is complicated but the conditional probability $\prob{X_i\mid X_\text{other}}$ is easy to compute.

• Initialize a random complete assignment $x = (X_1=v_1,\dots,X_n=v_n)$
• Until convergence:
• Choose a variable $X_i$
• Fix the values of other variables, and then compute $$w(v) = \prob{X_i=v\mid X_{\text{other}} = v_{\text{other}}}$$ for all possible values $v$ of $X_i$.
• Replace $x \gets x[X_i = v]$ where $v$ is sampled from the distribution $w$.
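The loop above can be sketched on a toy pairwise model over two binary variables, $\prob{x_1,x_2} \propto \exp(J\cdot[x_1 = x_2])$, chosen here because its conditionals are trivial to compute (the model and constant $J$ are assumptions for illustration):

```python
import math
import random

random.seed(0)

# Toy model: p(x1, x2) ∝ exp(J * [x1 == x2]); J > 0 favors agreement.
# The conditional p(Xi = v | X_other) is easy, so Gibbs sampling applies.
J = 1.0

def conditional(v, other):
    """w(v) = p(Xi = v | X_other = other), normalized over v in {0, 1}."""
    scores = [math.exp(J * (u == other)) for u in (0, 1)]
    return scores[v] / sum(scores)

x = [random.randint(0, 1), random.randint(0, 1)]  # random initial sample
samples = []
for step in range(10000):
    i = step % 2                              # choose a variable (round-robin)
    w1 = conditional(1, x[1 - i])             # w(v) for v = 1; w(0) = 1 - w1
    x[i] = 1 if random.random() < w1 else 0   # sample v ~ w, set X_i = v
    samples.append(tuple(x))

# Empirical agreement rate should approach 2e^J / (2e^J + 2) = e/(e+1) ≈ 0.731.
agree = sum(1 for a, b in samples if a == b) / len(samples)
```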

Some variants of Gibbs sampling include:

• Block Gibbs sampling: Choose multiple variables at the same time. The function $w$ becomes $$w(v_1,\dots,v_r) = \prob{X_{i_1}=v_1,\dots,X_{i_r}=v_r\mid X_{\text{other}} = v_{\text{other}}}$$
• Collapsed Gibbs sampling: Instead of using all other variables in $X_\text{other}$, marginalize out some variables. This is only a good idea when marginalization is easy to compute.

## Metropolis-Hastings

The other popular MCMC algorithm is the Metropolis-Hastings algorithm, which allows us to sample from an unnormalized distribution $f$.

• Initialize a random sample $x$
• Define a proposal density $Q(x_\text{next}\mid x_\text{prev})$; for example, a Gaussian centered at $x_\text{prev}$. The simple acceptance ratio below assumes $Q$ is symmetric, $Q(x\mid y) = Q(y\mid x)$ (the original Metropolis algorithm); an asymmetric $Q$ requires multiplying the ratio by the correction factor $\frac{Q(x\mid x')}{Q(x'\mid x)}$.
• Until bored:
• Sample $x'$ from $Q(\cdot\mid x)$
• Compute the acceptance ratio $$\alpha = \frac{f(x')}{f(x)}$$
• If $\alpha \geq 1$, then immediately accept $x'$. Otherwise, accept $x'$ with probability $\alpha$.
• If $x'$ is accepted, use $x'$ for the next iteration. Otherwise, keep using $x$.

The acceptance ratio $\alpha$ measures how likely $x'$ is relative to the old sample $x$.
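A minimal sketch of the algorithm, targeting the unnormalized density $f(x) = e^{-x^2/2}$ (a standard normal without its normalizing constant) with a symmetric Gaussian proposal (both choices are assumptions for illustration):

```python
import math
import random

random.seed(0)

def f(x):
    # Unnormalized target: f(x) ∝ standard normal density (no 1/sqrt(2π)).
    return math.exp(-0.5 * x * x)

x = 0.0
samples = []
for _ in range(50000):
    # Symmetric proposal Q: Gaussian step around the previous sample.
    x_new = x + random.gauss(0.0, 1.0)
    alpha = f(x_new) / f(x)           # acceptance ratio
    if random.random() < alpha:       # accept with probability min(1, alpha)
        x = x_new
    samples.append(x)                 # rejected proposals keep the old x

burn_in = 1000                        # discard early, unconverged samples
kept = samples[burn_in:]
mean = sum(kept) / len(kept)                 # ≈ 0 for a standard normal
var = sum(v * v for v in kept) / len(kept)   # ≈ 1
```

Note that only the ratio $f(x')/f(x)$ is ever needed, which is why the normalizing constant of $f$ can be unknown.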

## Problems with MCMC

Although MCMC chains (under mild conditions such as irreducibility) are guaranteed to converge to the true distribution, MCMC methods usually have two disadvantages:

• The samples are correlated. In Metropolis-Hastings, this can be mitigated by occasionally changing the proposal density $Q$ so that far-away points $x_\text{far-away}$ receive more probability under $Q(\cdot\mid x)$.
• During the burn-in period (the first 1000 samples or so; there is no good theoretical guarantee on its length), the samples may not resemble the target distribution at all, so throw away the burn-in samples.

# Particle Filtering (Sequential Monte Carlo)

For simplicity, let's consider only chain-structured graphs such as HMMs.

We first review beam search:

• Initialize $C = \set{\emptyset}$
• For each $i$:
• Extend: $$C \gets \set{x_{1:i-1} \cup \set{X_i = x_i} : x_{1:i-1} \in C, \text{all possible } x_i}$$
• Prune $C$ to the top $K$ partial assignments with the highest weights.

There are two problems with beam search:

• Extend: we need to consider all possible values $x_i$, which is expensive.
• Prune: the set of top $K$ partial assignments is usually not diverse.

Particle filtering solves both problems. Let us define:

• Proposal distribution $\pi_i(x_i\mid x_{1:i-1})$ (similar to Metropolis-Hastings), which is the guess of the value of $X_i$ given the previous values. For HMM, we usually use the transition probability $$\pi_i(x_i\mid x_{1:i-1}) = \prob{x_i\mid x_{i-1}}$$
• Reweighting distribution: $$w_i(x_{1:i}) = \frac{\prob{x_{1:i}}/\prob{x_{1:i-1}}}{\pi_i(x_i\mid x_{1:i-1})}$$ where the numerator is the potential that gets multiplied in when reaching $X_i$. This reweighting function corrects for the "guessed" proposal $\pi_i$. For HMM, $$w_i(x_{1:i}) = \frac{\prob{x_i\mid x_{i-1}}\prob{e_i\mid x_i}}{\prob{x_i\mid x_{i-1}}} = \prob{e_i\mid x_i}$$

The algorithm is as follows:

• Initialize $C = \set{\emptyset}$
• For each $i$:
• Extend: $$C \gets \set{x_{1:i-1} \cup \set{X_i = x_i} : x_{1:i-1} \in C, x_i \sim \pi(x_i\mid x_{1:i-1})}$$ Only sample one $x_i$ for each element in $C$. (Note that multiple elements $x_{1:i-1}$ in $C$ can share the same value.)
• Reweight: Compute weight $w_i(x_{1:i})$ for each $x_{1:i}$ in $C$
• Resample: $C \gets$ sample $K$ elements independently from $C$ according to the computed weight. Note that we can get repeated samples.

Particle filtering can also be viewed as a recursive version of importance sampling.
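The extend/reweight/resample loop above can be sketched on a toy binary HMM. The transition table, emission table, uniform initial state, and evidence sequence below are all assumptions for illustration; the proposal is the transition probability and the weight is the emission probability, as derived above:

```python
import random

random.seed(0)

# Toy HMM (assumed): binary hidden states, binary emissions.
T = {0: [0.9, 0.1], 1: [0.1, 0.9]}   # T[x_prev][x_i] = p(x_i | x_{i-1})
E = {0: [0.8, 0.2], 1: [0.2, 0.8]}   # E[x_i][e_i]   = p(e_i | x_i)
evidence = [0, 0, 1, 1, 1]
K = 1000                              # number of particles

particles = [[] for _ in range(K)]    # C = {∅}, one empty prefix per particle
for e in evidence:
    # Extend: sample ONE x_i per particle from the proposal π = transition
    # (uniform over {0, 1} for the hidden state preceding the chain).
    for p in particles:
        prev = p[-1] if p else random.randint(0, 1)
        p.append(0 if random.random() < T[prev][0] else 1)
    # Reweight: w_i(x_{1:i}) = p(e_i | x_i) for an HMM.
    weights = [E[p[-1]][e] for p in particles]
    # Resample: draw K particles independently, proportional to weight.
    # Copies are needed because the same particle can be drawn repeatedly.
    particles = [list(p) for p in
                 random.choices(particles, weights=weights, k=K)]

# Filtered estimate of p(X_5 = 1 | e_{1:5}): fraction of particles ending in 1.
p_last_1 = sum(p[-1] for p in particles) / K
```

Since the last three observations strongly suggest state 1, the filtered probability of ending in state 1 comes out high (around 0.9 under these tables).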
