Tuesday, December 5, 2017

Expectation Maximization (EM) Algorithm

Objective of EM Algorithm

From Wikipedia:

In statistics, an expectation–maximization (EM) algorithm is an iterative method to find ML or MAP estimates of parameters in statistical models, where the model depends on unobserved latent variables.

In the statistical model, we have:

  • $X$ - observed data
  • $Z$ - unobserved latent data or missing values
  • $\theta$ - unknown parameters

Then, the likelihood of the complete data is:

$$L(\theta; X, Z) = p(X, Z \mid \theta)$$

Since we do not observe $Z$, we need to marginalize it out.

In its general form, the EM algorithm maximizes the marginal likelihood of the observed data:

$$L(\theta; X) = p(X \mid \theta) = \sum_Z p(X, Z \mid \theta)$$
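
As a quick numerical illustration of this marginalization (a minimal sketch with made-up numbers, assuming a single observation and a discrete latent variable with three states):

```python
import numpy as np
from scipy.special import logsumexp

# Hypothetical values of log p(x, z | theta) for one observation x
# and a discrete latent variable z taking one of three states.
log_joint = np.array([-2.3, -1.1, -4.0])

# Marginal log-likelihood: log p(x | theta) = log sum_z p(x, z | theta),
# computed stably in log space.
log_marginal = logsumexp(log_joint)
print(log_marginal)  # ~ -0.80
```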

General Idea

Suppose we want to maximize the following marginal log-likelihood function:

$$\log p(X \mid \theta) = \log \sum_Z p(X, Z \mid \theta)$$

Similar to variational inference, we approximate $\log p(X \mid \theta)$ with a family of lower bounds $\mathcal{L}(\theta, q)$, where $q$ is known as the variational parameter (a distribution over the latent variables $Z$). In other words, we find variational lower bounds such that $\log p(X \mid \theta) \geq \mathcal{L}(\theta, q)$ for every $q$.

Initialization: Suppose we start at the point $\theta_0$ in the figure below.

[Figure: the marginal log-likelihood curve with the starting point $\theta_0$]

E-step: We want to find the best lower bound, from the family of lower bounds, that touches the log-likelihood at the point $\theta_0$.

The red lower bound in the figure below is the best lower bound that touches the log-likelihood at $\theta_0$.

[Figure: the best (red) lower bound, tight at $\theta_0$]

M-step: Once we have found the best lower bound in the E-step, as shown above, we want to find the point $\theta_1$ that maximizes that lower bound.

As shown in the figure below, $\theta_1$ is the new point.

[Figure: $\theta_1$, the point where the current lower bound is maximized]

Continue iterating: We repeat the E-step and M-step from the new point, as shown in the figure below.

[Figure: successive E- and M-steps climbing toward a local maximum]
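
Written out, one iteration of this alternation is (a compact sketch using the lower bound $\mathcal{L}(\theta, q)$ introduced above):

$$\begin{aligned}
\text{E-step:}\quad & q^{(k+1)} = \arg\max_{q}\; \mathcal{L}\bigl(\theta^{(k)}, q\bigr)\\
\text{M-step:}\quad & \theta^{(k+1)} = \arg\max_{\theta}\; \mathcal{L}\bigl(\theta, q^{(k+1)}\bigr)
\end{aligned}$$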

E-step and KL divergence

Now, in the E-step, we are trying to approximate the marginal log-likelihood $\log p(X \mid \theta)$ with a family of distributions $q(Z)$ over the latent variables. We want to choose the distribution $q$ that minimizes the gap shown in the figure:

[Figure: the gap between the marginal log-likelihood and the lower bound]

By Jensen's inequality, for any distribution $q(Z)$ we have the lower bound:

$$\log p(X \mid \theta) = \log \sum_Z q(Z)\,\frac{p(X, Z \mid \theta)}{q(Z)} \;\geq\; \sum_Z q(Z)\,\log \frac{p(X, Z \mid \theta)}{q(Z)} \;=\; \mathcal{L}(\theta, q)$$

Now, if we use this Jensen lower bound as the approximating function $\mathcal{L}(\theta, q)$, then the gap becomes:

$$\log p(X \mid \theta) - \mathcal{L}(\theta, q) = \sum_Z q(Z)\,\log \frac{q(Z)}{p(Z \mid X, \theta)} = \mathrm{KL}\bigl(q(Z)\,\|\,p(Z \mid X, \theta)\bigr)$$

Therefore, to minimize the gap, we need this KL divergence to be zero, which happens only when both distributions are the same, i.e. $q(Z) = p(Z \mid X, \theta)$. Therefore, we have the following:

E-step
For each data point $x_i$, set the variational distribution over the corresponding latent variable $z_i$ to its posterior:

$$q(z_i) = p(z_i \mid x_i, \theta)$$
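
For concreteness, here is a minimal sketch of this E-step for a hypothetical two-component 1-D Gaussian mixture (the data, parameter values, and variable names are illustrative, not from the post):

```python
import numpy as np
from scipy.stats import norm

# Toy data and (hypothetical) current parameter estimates.
x = np.array([-1.2, -0.8, 0.1, 2.3, 2.9, 3.1])
pi = np.array([0.5, 0.5])          # mixing weights
mu = np.array([-1.0, 3.0])         # component means
sigma = np.array([1.0, 1.0])       # component standard deviations

# E-step: posterior responsibilities r[i, k] = p(z_i = k | x_i, theta),
# computed with Bayes' rule for each data point.
joint = pi * norm.pdf(x[:, None], loc=mu, scale=sigma)  # p(x_i, z_i = k | theta)
r = joint / joint.sum(axis=1, keepdims=True)
print(r.round(3))
```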

M-step

In the M-step, we want to find $\theta^{new} = \arg\max_\theta \mathcal{L}(\theta, q)$. Now, with $q$ fixed,

$$\mathcal{L}(\theta, q) = \sum_Z q(Z)\,\log p(X, Z \mid \theta) \;-\; \sum_Z q(Z)\,\log q(Z)$$

The second term (the entropy of $q$) does not depend on $\theta$, so maximizing the lower bound is equivalent to maximizing the expected complete-data log-likelihood $\mathbb{E}_{q(Z)}\bigl[\log p(X, Z \mid \theta)\bigr]$.
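
Continuing the hypothetical Gaussian-mixture sketch from the E-step above, this maximization has a closed form: the parameters become responsibility-weighted averages (standard GMM updates, shown only as an illustration):

```python
import numpy as np

# Toy data and responsibilities r[i, k] as produced by the E-step sketch above
# (the values here are illustrative placeholders).
x = np.array([-1.2, -0.8, 0.1, 2.3, 2.9, 3.1])
r = np.array([[0.98, 0.02],
              [0.97, 0.03],
              [0.80, 0.20],
              [0.03, 0.97],
              [0.01, 0.99],
              [0.01, 0.99]])

# M-step: re-estimate parameters as responsibility-weighted averages.
Nk = r.sum(axis=0)                                              # effective count per component
pi = Nk / len(x)                                                # mixing weights
mu = (r * x[:, None]).sum(axis=0) / Nk                          # means
sigma = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / Nk)  # standard deviations
print(pi.round(3), mu.round(3), sigma.round(3))
```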

Summary

In summary:

We approximate the marginal log-likelihood using a family of lower bounds:

$$\log p(X \mid \theta) \geq \mathcal{L}(\theta, q)$$

In the E-step, we find the best lower bound by maximizing over this family, so that the gap between the marginal log-likelihood and the lower bound is minimized:

$$q^{(k+1)} = \arg\max_q \mathcal{L}\bigl(\theta^{(k)}, q\bigr)$$

It turns out that this gap is equal to the KL divergence between $q(Z)$ and the posterior $p(Z \mid X, \theta)$, so the maximizing $q$ is the posterior itself.

In the M-step, we maximize this best lower bound to find the best $\theta$:

$$\theta^{(k+1)} = \arg\max_\theta \mathcal{L}\bigl(\theta, q^{(k+1)}\bigr)$$

Convergence

The EM algorithm converges to a local maximum of the marginal likelihood (not necessarily the global one).

[Figure: one EM iteration, from $\theta^{(k)}$ to $\theta^{(k+1)}$, showing the lower bound and the log-likelihood]

In the figure above, at the point $\theta^{(k)}$, we chose the best lower bound $\mathcal{L}(\theta, q^{(k+1)})$ during the E-step; it is tight at $\theta^{(k)}$, i.e. $\mathcal{L}(\theta^{(k)}, q^{(k+1)}) = \log p(X \mid \theta^{(k)})$. At the M-step, we found the point $\theta^{(k+1)}$ where this lower bound is maximized. The log-likelihood at $\theta^{(k+1)}$ is at least as high as the lower bound at that point, i.e.

$$\log p\bigl(X \mid \theta^{(k+1)}\bigr) \;\geq\; \mathcal{L}\bigl(\theta^{(k+1)}, q^{(k+1)}\bigr) \;\geq\; \mathcal{L}\bigl(\theta^{(k)}, q^{(k+1)}\bigr) \;=\; \log p\bigl(X \mid \theta^{(k)}\bigr)$$

That is, the likelihood value at step $k+1$ is at least as high as the value at step $k$: the EM algorithm never decreases the likelihood function. This monotonicity can also be used as a debugging tool.
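
As a sketch of that debugging idea (reusing the hypothetical two-component Gaussian mixture from the earlier snippets), one can log the marginal log-likelihood every iteration and assert that it never decreases:

```python
import numpy as np
from scipy.stats import norm

x = np.array([-1.2, -0.8, 0.1, 2.3, 2.9, 3.1])  # toy data
pi, mu, sigma = np.array([0.5, 0.5]), np.array([0.0, 1.0]), np.array([1.0, 1.0])

def log_likelihood(pi, mu, sigma):
    # log p(X | theta) = sum_i log sum_k pi_k N(x_i | mu_k, sigma_k)
    return np.log((pi * norm.pdf(x[:, None], mu, sigma)).sum(axis=1)).sum()

prev = -np.inf
for it in range(50):
    # E-step: responsibilities.
    joint = pi * norm.pdf(x[:, None], mu, sigma)
    r = joint / joint.sum(axis=1, keepdims=True)
    # M-step: responsibility-weighted parameter updates.
    Nk = r.sum(axis=0)
    pi = Nk / len(x)
    mu = (r * x[:, None]).sum(axis=0) / Nk
    sigma = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / Nk)
    # Debugging check: the log-likelihood must never decrease.
    ll = log_likelihood(pi, mu, sigma)
    assert ll >= prev - 1e-9, "log-likelihood decreased: bug in E/M step"
    prev = ll
    print(it, round(ll, 4))
```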
