# Exchangeability

In many applications, the **order** of observations does not matter: for example, training instances can be permuted without changing the probability distribution, and the words in a bag-of-words model can be permuted without changing the representation of the document.

Formally,

**Definition** Random variables $(y_1, \dots, y_N)$ are **exchangeable** if for all permutations $\tau$ of $\{1, \dots, N\}$, $$\prob{y_1, \dots, y_N} = \prob{y_{\tau(1)}, \dots, y_{\tau(N)}}$$
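Exchangeability can be checked numerically on a small example. The sketch below (the mixture weights and coin biases are illustrative assumptions) builds a joint distribution as a mixture of i.i.d. coin flips and confirms that every permutation of a fixed binary sequence receives the same probability:

```python
from itertools import permutations

# Joint distribution over binary sequences: first pick a coin bias phi
# (0.3 or 0.7, each with probability 1/2 -- illustrative numbers), then
# flip the coin i.i.d. Mixtures of i.i.d. draws are exchangeable.
def joint(ys):
    def iid_likelihood(phi):
        p = 1.0
        for y in ys:
            p *= phi if y == 1 else 1.0 - phi
        return p
    return 0.5 * iid_likelihood(0.3) + 0.5 * iid_likelihood(0.7)

ys = (1, 1, 0, 1)
probs = [joint(perm) for perm in permutations(ys)]
# All permutations of ys have the same joint probability
# (up to floating-point rounding).
print(max(probs) - min(probs))
```

Note that any mixture of i.i.d. distributions is exchangeable; De Finetti's theorem below says the converse also holds for infinitely exchangeable sequences.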

**Definition** A sequence $\{y_i\}_{i=1}^\infty$ is **infinitely exchangeable** if every finite subsequence is exchangeable.

# De Finetti's Theorem

If $y_1, y_2, \dots$ are infinitely exchangeable, then the joint distribution $\prob{y_1, y_2, \dots}$ can be described succinctly due to the following theorem.

**Theorem** (De Finetti) If $\{y_i\}_{i=1}^\infty$ is infinitely exchangeable with $y_i \in Y$, then there exists a **parameter space** $\Phi$ and a density $\prob{\phi}$ such that for any $N$ observations, $$\prob{y_1, \dots, y_N} = \int_\Phi \prob{\phi}\prod_{i=1}^N\prob{y_i\mid\phi}\,d\phi$$
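The theorem can be sanity-checked in the simplest case. The sketch below (the uniform prior $\prob{\phi} = 1$ on $[0,1]$ is an illustrative assumption) evaluates the De Finetti integral numerically for binary observations and compares it with the known closed form $\int_0^1 \phi^k(1-\phi)^{N-k}\,d\phi = \frac{k!\,(N-k)!}{(N+1)!}$, where $k$ is the number of ones:

```python
import math

def definetti_marginal(ys, grid=100_000):
    """De Finetti integral for binary y_i under a uniform prior
    p(phi) = 1 on [0, 1]: the integral of prod_i p(y_i | phi) d(phi),
    computed with the trapezoidal rule."""
    k, N = sum(ys), len(ys)
    total = 0.0
    for i in range(grid + 1):
        phi = i / grid
        weight = 0.5 if i in (0, grid) else 1.0
        total += weight * phi**k * (1.0 - phi) ** (N - k)
    return total / grid

ys = [1, 0, 1, 1, 0]
k, N = sum(ys), len(ys)
# Closed form of the same integral: k! (N-k)! / (N+1)!
exact = math.factorial(k) * math.factorial(N - k) / math.factorial(N + 1)
print(abs(definetti_marginal(ys) - exact))
```

Because the marginal depends on $y_1, \dots, y_N$ only through the count $k$, permuting the observations leaves it unchanged, as exchangeability requires.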

In other words, the observations are conditionally i.i.d. given a latent parameter $\phi$, so we can always draw a graphical model in which $\phi$ is the single parent of every $y_i$.

If $Y$ is finite, then $\Phi$ has finite dimension, and we can **parameterize** the model with finitely many parameters. (For example, if each $y_i$ is Bernoulli, then $Y = \{0, 1\}$ and we only need one parameter $\phi \in [0, 1]$.)
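Written out for the Bernoulli case, the De Finetti integral collapses to a one-dimensional integral over the single parameter, where $k = \sum_{i=1}^N y_i$ counts the ones:

$$\prob{y_1, \dots, y_N} = \int_0^1 \prob{\phi}\,\phi^k (1-\phi)^{N-k}\,d\phi$$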

However, if $Y$ is infinite, then $\Phi$ has infinite dimension and cannot be described by a finite number of parameters.

This is the motivation for **nonparametric methods**, which do not assume a fixed, finite number of parameters.