Gaussian Mixture Model (GMM)
Similar to K-means clustering, but with a probabilistic flavor. That is, each point can belong to more than one cluster, and its membership is described by a probability distribution.
In the figure above, we will cluster the points using a mixture of 3 Gaussian distributions of the form $\mathcal{N}(x \mid \mu_k, \sigma_k^2)$, $k = 1, 2, 3$. The proportions of the Gaussians are given by $\pi_1, \pi_2, \pi_3$. The likelihood of a single point is defined as follows:

$$p(x_i \mid \theta) = \sum_{k=1}^{3} \pi_k \, \mathcal{N}(x_i \mid \mu_k, \sigma_k^2)$$
We want to maximize the likelihood $p(X \mid \theta) = \prod_{i=1}^{N} p(x_i \mid \theta)$, where $\theta = \{\pi_k, \mu_k, \sigma_k^2\}$, $k = 1, 2, 3$. We have the following constraint:

$$\sum_{k=1}^{3} \pi_k = 1, \qquad \pi_k \ge 0$$
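To make this concrete, here is a minimal sketch (not from the original post) of evaluating this likelihood for a single 1-D point in Python; the function name `gmm_likelihood` and the example parameter values are illustrative only:

```python
import numpy as np
from scipy.stats import norm

def gmm_likelihood(x, pi, mu, sigma):
    """Likelihood p(x | theta) of a single 1-D point under a Gaussian mixture.

    pi, mu, sigma are length-K arrays (here K = 3) holding the mixture
    weights, means, and standard deviations of each component.
    """
    return np.sum(pi * norm.pdf(x, loc=mu, scale=sigma))

# Illustrative parameters: three components with weights summing to 1
pi = np.array([0.5, 0.3, 0.2])
mu = np.array([-2.0, 0.0, 3.0])
sigma = np.array([1.0, 0.5, 1.5])
print(gmm_likelihood(1.0, pi, mu, sigma))
```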
Due to the above constraint, it is difficult to solve the problem directly with methods like gradient descent. So, instead, we will use the EM algorithm to solve this problem.
We will convert the above likelihood function to a marginal likelihood function by introducing a latent variable $z_i \in \{1, 2, 3\}$ for each data point $x_i$. The marginal likelihood of the GMM in this example is as follows:

$$p(x_i \mid \theta) = \sum_{k=1}^{3} p(z_i = k) \, p(x_i \mid z_i = k, \theta) = \sum_{k=1}^{3} \pi_k \, \mathcal{N}(x_i \mid \mu_k, \sigma_k^2)$$
Note that, here, we would in general have a total of $3N$ probability values $p(z_i = k)$ associated with the latent variables $z_i$. However, GMM is a special case in that these mixture weights are the same for all data points, so we will only have 3 values $\pi_1, \pi_2, \pi_3$. This is in contrast to methods like PLSA, where the mixture weights are different for each data point. See here for GMM vs. PLSA. In addition, in GMM, $p(x_i \mid z_i = k)$ is represented by a Gaussian.
GMM: Mixture weights are the same across all data points. Also, $p(x_i \mid z_i = k)$ is Gaussian.
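To make the marginal likelihood concrete, here is a small sketch that sums out the latent variable for each point; computing the sum in log space with `logsumexp` is just a common numerical-stability choice, not something prescribed by the post, and the names are illustrative:

```python
import numpy as np
from scipy.stats import norm
from scipy.special import logsumexp

def gmm_log_likelihood(x, pi, mu, sigma):
    """Total log-likelihood: sum_i log sum_k pi_k * N(x_i | mu_k, sigma_k^2)."""
    # log of each term pi_k * N(x_i | mu_k, sigma_k^2), shape (N, K)
    log_terms = np.log(pi) + norm.logpdf(x[:, None], loc=mu, scale=sigma)
    # Marginalize over the latent variable z_i (sum over k) in log space
    return logsumexp(log_terms, axis=1).sum()
```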
E-step
For each data point $x_i$, compute the responsibility $\gamma_{ik} = p(z_i = k \mid x_i, \theta)$, i.e. the posterior probability that $x_i$ was generated by component $k$.

Using Bayes' rule,

$$\gamma_{ik} = \frac{\pi_k \, \mathcal{N}(x_i \mid \mu_k, \sigma_k^2)}{\sum_{j=1}^{3} \pi_j \, \mathcal{N}(x_i \mid \mu_j, \sigma_j^2)}$$
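A minimal NumPy sketch of this E-step, assuming 1-D data; `gamma[i, k]` holds the responsibility of component $k$ for point $x_i$ (the variable names are illustrative):

```python
import numpy as np
from scipy.stats import norm

def e_step(x, pi, mu, sigma):
    """Compute responsibilities gamma[i, k] = p(z_i = k | x_i, theta).

    x is a 1-D array of N data points; pi, mu, sigma are length-K arrays.
    """
    # Unnormalized posterior: pi_k * N(x_i | mu_k, sigma_k^2), shape (N, K)
    weighted = pi * norm.pdf(x[:, None], loc=mu, scale=sigma)
    # Normalize each row so the responsibilities of a point sum to 1 (Bayes' rule)
    return weighted / weighted.sum(axis=1, keepdims=True)
```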
M-step
In the M-step, we maximize the expected complete-data log-likelihood $\sum_{i}\sum_{k} \gamma_{ik} \log\!\big(\pi_k \, p(x_i \mid z_i = k, \theta)\big)$ with the responsibilities held fixed. Using the assumptions specific to GMM, we can rewrite this expression as follows for 1-D input data points:

$$Q(\theta) = \sum_{i=1}^{N} \sum_{k=1}^{3} \gamma_{ik} \left[ \log \pi_k - \tfrac{1}{2}\log\!\big(2\pi\sigma_k^2\big) - \frac{(x_i - \mu_k)^2}{2\sigma_k^2} \right]$$
Now to determine $\mu_k$, we set the derivative of $Q$ with respect to $\mu_k$ to zero:

$$\frac{\partial Q}{\partial \mu_k} = \sum_{i=1}^{N} \gamma_{ik} \, \frac{x_i - \mu_k}{\sigma_k^2} = 0 \quad\Rightarrow\quad \mu_k = \frac{\sum_{i=1}^{N} \gamma_{ik}\, x_i}{\sum_{i=1}^{N} \gamma_{ik}}$$
Similarly, setting $\partial Q / \partial \sigma_k^2 = 0$ gives

$$\sigma_k^2 = \frac{\sum_{i=1}^{N} \gamma_{ik}\,(x_i - \mu_k)^2}{\sum_{i=1}^{N} \gamma_{ik}}$$
Solving for $\pi_k$ is similar; however, it also includes the constraints $\sum_{k} \pi_k = 1$ and $\pi_k \ge 0$, which can be handled with a Lagrange multiplier. The result is

$$\pi_k = \frac{1}{N} \sum_{i=1}^{N} \gamma_{ik}$$
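The three M-step updates can be written as one short routine; again a sketch with illustrative names, assuming the responsibilities `gamma` come from the E-step above:

```python
import numpy as np

def m_step(x, gamma):
    """Update (pi, mu, sigma) from 1-D data x and responsibilities gamma (shape N x K)."""
    nk = gamma.sum(axis=0)                       # effective number of points per component
    mu = (gamma * x[:, None]).sum(axis=0) / nk   # weighted means
    var = (gamma * (x[:, None] - mu) ** 2).sum(axis=0) / nk  # weighted variances
    pi = nk / len(x)                             # mixture weights (sum to 1 by construction)
    return pi, mu, np.sqrt(var)
```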
Summary
- Initialize $\theta^{(0)} = \{\pi_k^{(0)}, \mu_k^{(0)}, \sigma_k^{2(0)}\}$ for $k = 1, 2, 3$.
- For steps $m = 1, 2, \dots$, do the following:
  - E-step: For each $i$ and $k$, compute $\gamma_{ik} = \dfrac{\pi_k^{(m-1)} \, \mathcal{N}(x_i \mid \mu_k^{(m-1)}, \sigma_k^{2(m-1)})}{\sum_{j} \pi_j^{(m-1)} \, \mathcal{N}(x_i \mid \mu_j^{(m-1)}, \sigma_j^{2(m-1)})}$.
  - M-step: Update $\mu_k^{(m)} = \frac{\sum_i \gamma_{ik} x_i}{\sum_i \gamma_{ik}}$, $\sigma_k^{2(m)} = \frac{\sum_i \gamma_{ik} (x_i - \mu_k^{(m)})^2}{\sum_i \gamma_{ik}}$, and $\pi_k^{(m)} = \frac{1}{N} \sum_i \gamma_{ik}$.
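Putting the steps together, here is a hedged end-to-end sketch of the EM loop for a 1-D mixture; the initialization and the fixed number of iterations are simplistic choices for illustration, not prescribed by the post:

```python
import numpy as np
from scipy.stats import norm

def fit_gmm(x, k=3, n_steps=100, seed=0):
    """Fit a 1-D Gaussian mixture with EM. x: array of N points."""
    rng = np.random.default_rng(seed)
    # Simple initialization: uniform weights, means drawn from the data, equal spreads
    pi = np.full(k, 1.0 / k)
    mu = rng.choice(x, size=k, replace=False)
    sigma = np.full(k, x.std())
    for _ in range(n_steps):
        # E-step: responsibilities gamma[i, k]
        weighted = pi * norm.pdf(x[:, None], loc=mu, scale=sigma)
        gamma = weighted / weighted.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from the responsibilities
        nk = gamma.sum(axis=0)
        mu = (gamma * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((gamma * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
        pi = nk / len(x)
    return pi, mu, sigma

# Example on synthetic 1-D data drawn from three Gaussians
rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(-3, 1.0, 200),
                       rng.normal(0, 0.5, 150),
                       rng.normal(4, 1.5, 250)])
print(fit_gmm(data))
```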