**Gibbs Sampling**, **Metropolis-Hastings** and **Particle Filtering** (**Sequential Monte Carlo**) are sampling-based methods for approximating distributions (joint or marginal) that are difficult to compute directly.

# Particle

All of the methods above use the concept of a **particle**, which is simply a complete assignment of values to all model variables. For example:

| | $X_1$ | $X_2$ | $X_3$ |
|---|---|---|---|
| Particle 1 | 1 | 0 | 0 |
| Particle 2 | 1 | 0 | 1 |
| Particle 3 | 0 | 1 | 1 |
| Particle 4 | 0 | 1 | 0 |
| ... | ... | ... | ... |

If the particles are sampled from some distribution, then we can compute the **empirical distribution** as an approximation to the true distribution.
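As a minimal illustration (the particles below are made up), the empirical distribution is just the relative frequency of each assignment among the particles:

```python
from collections import Counter

# Five hypothetical particles: each is a complete assignment of (X1, X2, X3).
particles = [(1, 0, 0), (1, 0, 1), (0, 1, 1), (0, 1, 0), (1, 0, 1)]

# Empirical joint distribution: the relative frequency of each assignment.
counts = Counter(particles)
empirical = {x: c / len(particles) for x, c in counts.items()}

# Marginals fall out of the same particles, e.g. P(X3 = 1):
p_x3 = sum(1 for x in particles if x[2] == 1) / len(particles)  # 3/5 = 0.6
```

Any query (joint, marginal, or conditional) can be answered by counting over the same set of particles.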

# MCMC Methods

**Markov Chain Monte Carlo** (**MCMC**) algorithms are methods for randomly sampling particles from a complicated distribution based on Markov chains.

## Gibbs Sampling

**Gibbs sampling** is a popular MCMC method. It is used when the joint probability $\prob{X_1,\dots,X_n}$ is complicated but the conditional probability $\prob{X_i\mid X_\text{other}}$ is easy to compute.

- Initialize a random sample $x = (X_1=v_1,\dots,X_n=v_n)$
- Until convergence:
  - Choose a variable $X_i$
  - Fix the values of the other variables, and then compute $$w(v) = \prob{X_i=v\mid X_{\text{other}} = v_{\text{other}}}$$ for all possible values $v$ of $X_i$.
  - Replace $x \gets x[X_i = v]$, where $v$ is sampled from the distribution $w$.
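As a sketch of the loop above, assuming a made-up joint table over two binary variables (the conditional $\prob{X_i\mid X_\text{other}}$ is obtained by renormalizing one slice of the table):

```python
import random

# Hypothetical joint distribution p(X1, X2) over two binary variables.
p = {(0, 0): 0.3, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.4}

def gibbs(num_iters, seed=0):
    rng = random.Random(seed)
    x = [rng.randint(0, 1), rng.randint(0, 1)]  # random initial sample
    samples = []
    for _ in range(num_iters):
        i = rng.randrange(2)  # choose a variable X_i (random scan)
        other = x[1 - i]      # fix the value of the other variable
        # w(v) proportional to P(X_i = v, X_other = other), renormalized below
        w = [p[(v, other)] if i == 0 else p[(other, v)] for v in (0, 1)]
        x[i] = 0 if rng.random() < w[0] / (w[0] + w[1]) else 1
        samples.append(tuple(x))
    return samples

samples = gibbs(100_000)
# The empirical frequency of each assignment approaches the joint table p.
```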

Some variants of Gibbs sampling include:

- **Block Gibbs sampling**: Choose multiple variables at the same time. The function $w$ becomes $$w(v_1,\dots,v_r) = \prob{X_{i_1}=v_1,\dots,X_{i_r}=v_r\mid X_{\text{other}} = v_{\text{other}}}$$
- **Collapsed Gibbs sampling**: Instead of conditioning on all of the other variables in $X_\text{other}$, marginalize some of them out. This is only a good idea when the marginalization is easy to compute.

## Metropolis-Hastings

The other popular MCMC algorithm is the **Metropolis-Hastings** algorithm, which allows us to sample from an *unnormalized* distribution $f$.

- Initialize a random sample $x$
- Define a **proposal density** $Q(x_\text{next}\mid x_\text{prev})$. One constraint is that $Q$ must be symmetric: $Q(x\mid y) = Q(y\mid x)$. For example, use a Gaussian distribution centered at $x_\text{prev}$.
- Until bored:
  - Sample $x'$ from $Q(\cdot\mid x)$
  - Compute the **acceptance ratio** $$\alpha = \frac{f(x')}{f(x)}$$
  - If $\alpha \geq 1$, immediately accept $x'$. Otherwise, accept $x'$ with probability $\alpha$.
  - If $x'$ is accepted, use $x'$ in the next iteration. Otherwise, keep using $x$.

The acceptance ratio $\alpha$ indicates how likely the new sample $x'$ is relative to the old sample $x$.
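A minimal sketch of the algorithm with a symmetric Gaussian proposal; the target $f$ here is a made-up *unnormalized* Gaussian, so the sampler never needs the normalizing constant:

```python
import math
import random

def metropolis_hastings(f, num_iters, step=1.0, seed=0):
    """Sample from an unnormalized density f using a symmetric
    Gaussian proposal Q(x_next | x_prev) = Normal(x_prev, step^2)."""
    rng = random.Random(seed)
    x = 0.0  # arbitrary initial sample
    samples = []
    for _ in range(num_iters):
        x_new = rng.gauss(x, step)  # sample x' from Q(. | x)
        alpha = f(x_new) / f(x)     # acceptance ratio
        if alpha >= 1 or rng.random() < alpha:
            x = x_new               # accept x'; otherwise keep x
        samples.append(x)
    return samples

# Unnormalized Gaussian with mean 3; the constant 7.0 is unknown to the
# sampler and irrelevant, since it cancels in the acceptance ratio.
f = lambda x: 7.0 * math.exp(-0.5 * (x - 3.0) ** 2)
samples = metropolis_hastings(f, 50_000)
kept = samples[1000:]  # discard the burn-in period
```

The sample mean of `kept` approaches 3, the mean of the normalized target.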

## Problems with MCMC

Although (irreducible) MCMC chains are guaranteed to converge to the true distribution, MCMC methods usually have two disadvantages:

- The samples are correlated. In Metropolis-Hastings, this can be mitigated by occasionally changing the proposal density $Q$ so that $Q(x_\text{far-away}\mid x)$ has more probability mass.
- During the **burn-in** period (the first 1000 samples or so; there is no good theoretical guarantee), the samples may not be related to the target distribution at all. So throw away the burn-in samples.

# Particle Filtering (Sequential Monte Carlo)

For simplicity, let's consider only chain-structured graphs, such as an HMM.

We first review **beam search**:

- Initialize $C = \set{\emptyset}$
- For each $i$:
  - Extend: $$C \gets \set{x_{1:i-1} \cup \set{X_i = x_i} : x_{1:i-1} \in C, \text{all possible } x_i}$$
  - Prune: keep only the top $K$ partial assignments in $C$ with the highest weights.
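The two steps can be sketched on a tiny hypothetical two-state HMM (all tables below are made up), where the weight of a partial assignment is the product of the transition and emission potentials accumulated so far:

```python
# Hypothetical two-state HMM with observations obs.
trans = {(0, 0): 0.8, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.7}
emit = {(0, 'a'): 0.9, (0, 'b'): 0.1, (1, 'a'): 0.2, (1, 'b'): 0.8}
start = {0: 0.5, 1: 0.5}
obs = ['a', 'b', 'b']

def beam_search(K):
    C = [((), 1.0)]  # partial assignments paired with their weights
    for i, e in enumerate(obs):
        # Extend: try EVERY possible value of X_i (expensive in general)
        extended = []
        for x, w in C:
            for v in (0, 1):
                p_v = start[v] if i == 0 else trans[(x[-1], v)]
                extended.append((x + (v,), w * p_v * emit[(v, e)]))
        # Prune: keep the top K partial assignments by weight
        C = sorted(extended, key=lambda t: t[1], reverse=True)[:K]
    return C
```

With $K \geq 2^n$ the search is exhaustive; a smaller $K$ trades accuracy for speed.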

There are two problems with beam search:

- **Extend**: we need to consider all possible values $x_i$, which is expensive.
- **Prune**: the set of top $K$ partial assignments is usually not diverse.

**Particle filtering** solves both problems. Let us define:

- **Proposal distribution** $\pi_i(x_i\mid x_{1:i-1})$ (similar to Metropolis-Hastings), which is a guess of the value of $X_i$ given the previous values. For an HMM, we usually use the transition probability $$\pi_i(x_i\mid x_{1:i-1}) = \prob{x_i\mid x_{i-1}}$$
- **Reweighting distribution**: $$w_i(x_{1:i}) = \frac{\prob{x_{1:i}}/\prob{x_{1:i-1}}}{\pi_i(x_i\mid x_{1:i-1})}$$ where the numerator is the potential that gets multiplied in when we reach $X_i$. This reweighting function corrects for the "guessed" information in $\pi_i$. For an HMM, $$w_i(x_{1:i}) = \frac{\prob{x_i\mid x_{i-1}}\prob{e_i\mid x_i}}{\prob{x_i\mid x_{i-1}}} = \prob{e_i\mid x_i}$$

The algorithm is as follows:

- Initialize $C = \set{\emptyset}$
- For each $i$:
  - Extend: $$C \gets \set{x_{1:i-1} \cup \set{X_i = x_i} : x_{1:i-1} \in C, x_i \sim \pi_i(x_i\mid x_{1:i-1})}$$ Only sample one $x_i$ for each element of $C$. (Note that multiple elements $x_{1:i-1}$ of $C$ can share the same value.)
  - Reweight: compute the weight $w_i(x_{1:i})$ for each $x_{1:i}$ in $C$.
  - Resample: $C \gets$ $K$ elements sampled independently from $C$ according to the computed weights. Note that we may get repeated samples.

Particle filtering can also be viewed as a recursive version of **importance sampling**.
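The extend/reweight/resample loop can be sketched on a tiny hypothetical two-state HMM (all tables below are made up), using the transition probability as the proposal $\pi_i$ and the emission probability as the weight $w_i$:

```python
import random

# Hypothetical two-state HMM with observations obs.
trans = {(0, 0): 0.8, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.7}
emit = {(0, 'a'): 0.9, (0, 'b'): 0.1, (1, 'a'): 0.2, (1, 'b'): 0.8}
start = {0: 0.5, 1: 0.5}
obs = ['a', 'b', 'b']

def particle_filter(K, seed=0):
    rng = random.Random(seed)
    C = [() for _ in range(K)]  # K empty partial assignments
    for i, e in enumerate(obs):
        # Extend: sample ONE x_i per particle from the proposal pi_i
        extended = []
        for x in C:
            p1 = start[1] if i == 0 else trans[(x[-1], 1)]
            v = 1 if rng.random() < p1 else 0
            extended.append(x + (v,))
        # Reweight: w_i corrects the proposal using the emission
        weights = [emit[(x[-1], e)] for x in extended]
        # Resample: draw K particles with replacement, by weight
        C = rng.choices(extended, weights=weights, k=K)
    return C

particles = particle_filter(20_000)
# The fraction of particles ending in state 1 approximates P(X3 = 1 | obs).
```

Unlike beam search, each particle samples a single value per step (cheap extend), and resampling by weight keeps a diverse, properly weighted population.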