Objective of EM Algorithm
From Wikipedia:
"In statistics, an expectation–maximization (EM) algorithm is an iterative method to find maximum likelihood (ML) or maximum a posteriori (MAP) estimates of parameters in statistical models, where the model depends on unobserved latent variables."
In the statistical model, we have:
- $X$ - the observed data
- $Z$ - the unobserved latent data or missing values
- $\theta$ - the unknown parameters

Then the complete-data likelihood function is:

$$L(\theta; X, Z) = p(X, Z \mid \theta)$$

Since we do not observe $Z$, we need to marginalize it out:

$$L(\theta; X) = p(X \mid \theta) = \sum_{Z} p(X, Z \mid \theta)$$

In its general form, the EM algorithm maximizes the marginal likelihood of the problem:

EM Objective:

$$\hat{\theta} = \arg\max_{\theta} \log p(X \mid \theta) = \arg\max_{\theta} \log \sum_{Z} p(X, Z \mid \theta)$$
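As a concrete illustration of this objective (my addition, not from the original derivation), the sketch below evaluates the marginal log-likelihood for a toy 1-D Gaussian mixture, where the latent variable $Z$ is the mixture-component assignment. The names `log_marginal_likelihood`, `weights`, `means`, and `stds` are illustrative choices.

```python
import numpy as np
from scipy.stats import norm

def log_marginal_likelihood(x, weights, means, stds):
    """log p(X | theta) = sum_i log sum_z p(x_i, z | theta) for a toy
    1-D Gaussian mixture; the latent z is the mixture-component index."""
    # joint[k, i] = p(z_i = k) * N(x_i | mu_k, sigma_k)
    joint = weights[:, None] * norm.pdf(x[None, :], means[:, None], stds[:, None])
    return np.sum(np.log(joint.sum(axis=0)))  # marginalize z, then sum the log over data points

x = np.array([-1.2, 0.3, 4.8, 5.1])
print(log_marginal_likelihood(x, np.array([0.5, 0.5]), np.array([0.0, 5.0]), np.array([1.0, 1.0])))
```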
General Idea
Suppose we want to maximize the following marginal log-likelihood function:

$$\log p(X \mid \theta) = \log \sum_{Z} p(X, Z \mid \theta)$$

Similar to variational inference, here we approximate $\log p(X \mid \theta)$ with a family of lower bounds $\mathcal{L}(\theta, q)$, one for each distribution $q(Z)$ over the latent variables. Here, the $q$ are known as variational parameters. In other words, we find the variational lower bounds as follows, using Jensen's inequality (since $\log$ is concave):

$$\log p(X \mid \theta) = \log \sum_{Z} q(Z) \frac{p(X, Z \mid \theta)}{q(Z)} \;\geq\; \sum_{Z} q(Z) \log \frac{p(X, Z \mid \theta)}{q(Z)} = \mathcal{L}(\theta, q)$$
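As a quick numerical sanity check (my addition, reusing the toy mixture from the earlier sketch in self-contained form), the snippet below verifies that $\mathcal{L}(\theta, q) \leq \log p(X \mid \theta)$ holds for an arbitrary choice of $q$:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Toy 1-D Gaussian mixture (illustrative parameters).
x = np.array([-1.2, 0.3, 4.8, 5.1])
weights, means, stds = np.array([0.5, 0.5]), np.array([0.0, 5.0]), np.array([1.0, 1.0])

# joint[k, i] = p(x_i, z_i = k | theta)
joint = weights[:, None] * norm.pdf(x[None, :], means[:, None], stds[:, None])
log_marginal = np.sum(np.log(joint.sum(axis=0)))      # log p(X | theta)

# An arbitrary variational distribution q(z_i) for each data point.
q = rng.random(joint.shape)
q /= q.sum(axis=0, keepdims=True)

# Jensen's lower bound: L(theta, q) = sum_i sum_k q(z_i=k) log( p(x_i, z_i=k | theta) / q(z_i=k) )
lower_bound = np.sum(q * (np.log(joint) - np.log(q)))

print(log_marginal, lower_bound)   # the lower bound never exceeds the marginal log-likelihood
assert lower_bound <= log_marginal + 1e-9
```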
Initialization: Suppose we start with the point $\theta_0$ in the figure above.
E-step: We want to find, from the family of lower bounds, the best lower bound that touches the marginal log-likelihood at the point $\theta_0$.
The red lower bound in the figure below is the best lower bound touching it at the point $\theta_0$.
M-step: Once we have found the best lower bound in the E-step, as shown above, we want to find the point $\theta_1$ that maximizes this lower bound.
As shown in the figure below, $\theta_1$ is the new point.
Continue iteration: We keep alternating the E-step and M-step until convergence, as shown in the figure below; a minimal sketch of this loop follows.
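A minimal sketch of this iterate-until-convergence loop, assuming hypothetical `e_step` and `m_step` callables whose exact form depends on the model:

```python
def em(x, theta0, e_step, m_step, n_iters=100, tol=1e-6):
    """Generic EM loop: alternately pick the best lower bound at the current
    parameters (E-step) and maximize that bound over the parameters (M-step)."""
    theta, prev_ll = theta0, -float("inf")
    for _ in range(n_iters):
        q, log_lik = e_step(x, theta)      # best lower bound touching log p(X|theta) at theta
        theta = m_step(x, q)               # new theta maximizing the chosen lower bound
        if abs(log_lik - prev_ll) < tol:   # stop once the log-likelihood barely changes
            break
        prev_ll = log_lik
    return theta
```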
E-step and KL divergence
Now, in the E-step, we are trying to approximate the marginal log-likelihood $\log p(X \mid \theta)$ with a family of lower bounds, one for each distribution $q(Z)$. We want to choose the distribution $q$ that minimizes the gap shown in the figure:

$$\text{GAP} = \log p(X \mid \theta) - \mathcal{L}(\theta, q)$$

Now, if we use Jensen's lower bound from above as the approximating function, i.e. $\mathcal{L}(\theta, q) = \sum_{Z} q(Z) \log \frac{p(X, Z \mid \theta)}{q(Z)}$, then the gap becomes:

$$\log p(X \mid \theta) - \mathcal{L}(\theta, q) = \sum_{Z} q(Z) \log \frac{q(Z)}{p(Z \mid X, \theta)} = \mathrm{KL}\big(q(Z) \,\|\, p(Z \mid X, \theta)\big)$$

Therefore, to minimize the gap, we need the KL divergence to be zero, which happens only when the two distributions are the same. Therefore, we have the following:
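A self-contained numerical check (my addition, again using a toy Gaussian mixture) that the gap between the marginal log-likelihood and Jensen's lower bound equals $\mathrm{KL}(q \,\|\, p(Z \mid X, \theta))$:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

x = np.array([-1.2, 0.3, 4.8, 5.1])
weights, means, stds = np.array([0.5, 0.5]), np.array([0.0, 5.0]), np.array([1.0, 1.0])

joint = weights[:, None] * norm.pdf(x[None, :], means[:, None], stds[:, None])  # p(x_i, z_i = k)
marginal = joint.sum(axis=0)             # p(x_i | theta)
posterior = joint / marginal             # p(z_i = k | x_i, theta)

q = rng.random(joint.shape)              # arbitrary q(z_i)
q /= q.sum(axis=0, keepdims=True)

gap = np.sum(np.log(marginal)) - np.sum(q * (np.log(joint) - np.log(q)))
kl = np.sum(q * (np.log(q) - np.log(posterior)))   # KL(q || p(Z | X, theta))

print(gap, kl)
assert np.isclose(gap, kl)               # gap == KL divergence, up to floating point error
```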
E-step
For each $i$, set
$$q^{k+1}(z_i) = p(z_i \mid x_i, \theta^k)$$
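For a concrete latent-variable model, this posterior is often available in closed form. A sketch of the E-step for the toy 1-D Gaussian mixture used above (an assumed example; `e_step` and its arguments are illustrative names):

```python
import numpy as np
from scipy.stats import norm

def e_step(x, weights, means, stds):
    """E-step for a 1-D Gaussian mixture: q(z_i = k) = p(z_i = k | x_i, theta)."""
    joint = weights[:, None] * norm.pdf(x[None, :], means[:, None], stds[:, None])
    q = joint / joint.sum(axis=0, keepdims=True)   # posterior responsibilities, shape (K, N)
    log_lik = np.sum(np.log(joint.sum(axis=0)))    # marginal log-likelihood at the current theta
    return q, log_lik
```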
M-step
In the M-step, we want to find $\theta^{k+1} = \arg\max_{\theta} \mathcal{L}(\theta, q^{k+1})$. Now,

$$\mathcal{L}(\theta, q) = \sum_{Z} q(Z) \log \frac{p(X, Z \mid \theta)}{q(Z)} = \sum_{Z} q(Z) \log p(X, Z \mid \theta) - \sum_{Z} q(Z) \log q(Z)$$

The second term does not depend on $\theta$, so the M-step reduces to maximizing the expected complete-data log-likelihood:

$$\theta^{k+1} = \arg\max_{\theta} \mathbb{E}_{q^{k+1}(Z)}\left[\log p(X, Z \mid \theta)\right]$$
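For the same toy Gaussian mixture, this expectation can be maximized in closed form; a sketch (the function name and return convention are my own):

```python
import numpy as np

def m_step(x, q):
    """M-step for a 1-D Gaussian mixture: maximize E_q[log p(X, Z | theta)]
    over theta = (weights, means, stds), which has a closed-form solution."""
    nk = q.sum(axis=1)                    # effective number of points per component
    weights = nk / nk.sum()
    means = (q @ x) / nk
    variances = np.sum(q * (x[None, :] - means[:, None]) ** 2, axis=1) / nk
    return weights, means, np.sqrt(variances)
```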
Summary
In summary:
- We approximate the marginal log-likelihood using a family of lower bounds:
$$\log p(X \mid \theta) \geq \mathcal{L}(\theta, q) = \sum_{Z} q(Z) \log \frac{p(X, Z \mid \theta)}{q(Z)}$$
- In the E-step, we find the best lower bound by maximizing over this family, so that the gap between the marginal log-likelihood and the lower bound is minimized:
$$q^{k+1} = \arg\max_{q} \mathcal{L}(\theta^k, q)$$
- It turns out that this gap is equal to the KL divergence between $q(Z)$ and the posterior $p(Z \mid X, \theta)$, so the E-step reduces to setting $q^{k+1}(Z) = p(Z \mid X, \theta^k)$.
- In the M-step, we maximize the best lower bound to find the best $\theta$:
$$\theta^{k+1} = \arg\max_{\theta} \mathcal{L}(\theta, q^{k+1})$$

A complete sketch combining both steps for a toy mixture model follows.
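Putting the two steps together for the toy 1-D Gaussian mixture used in the earlier sketches (a self-contained illustration, not from the original post):

```python
import numpy as np
from scipy.stats import norm

def em_gmm_1d(x, weights, means, stds, n_iters=50):
    """EM for a 1-D Gaussian mixture: alternate closed-form E- and M-steps."""
    for _ in range(n_iters):
        # E-step: responsibilities q(z_i = k) = p(z_i = k | x_i, theta)
        joint = weights[:, None] * norm.pdf(x[None, :], means[:, None], stds[:, None])
        q = joint / joint.sum(axis=0, keepdims=True)
        # M-step: maximize E_q[log p(X, Z | theta)] in closed form
        nk = q.sum(axis=1)
        weights = nk / nk.sum()
        means = (q @ x) / nk
        stds = np.sqrt(np.sum(q * (x[None, :] - means[:, None]) ** 2, axis=1) / nk)
    return weights, means, stds

rng = np.random.default_rng(2)
x = np.concatenate([rng.normal(0.0, 1.0, 200), rng.normal(5.0, 1.0, 300)])
print(em_gmm_1d(x, np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0])))
```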
Convergence
The EM algorithm converges to a local maximum.
In the figure above, at the point $\theta^k$, we chose the best lower bound curve $\mathcal{L}(\theta, q^{k+1})$ during the E-step, so that $\mathcal{L}(\theta^k, q^{k+1}) = \log p(X \mid \theta^k)$. In the M-step, we found $\theta^{k+1}$ where this lower bound is maximum. The marginal log-likelihood at the point $\theta^{k+1}$ is $\log p(X \mid \theta^{k+1})$, which is higher than or equal to the lower bound at that point, i.e.

$$\log p(X \mid \theta^{k+1}) \geq \mathcal{L}(\theta^{k+1}, q^{k+1}) \geq \mathcal{L}(\theta^k, q^{k+1}) = \log p(X \mid \theta^k)$$

That is, the likelihood value at step $k+1$ is higher than or equal to the value at step $k$: at each iteration, the EM algorithm does not decrease the likelihood function. This monotonicity can also be used as a debugging tool.
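One way to exploit this monotonicity for debugging (a sketch, assuming hypothetical `e_step`/`m_step` callables like the Gaussian-mixture ones above, where the E-step also returns the current log-likelihood):

```python
def run_em_with_check(x, theta, e_step, m_step, n_iters=100):
    """Run EM and assert that the marginal log-likelihood never decreases.
    A drop between iterations signals a bug in the E-step or M-step."""
    prev_ll = -float("inf")
    for it in range(n_iters):
        q, log_lik = e_step(x, *theta)
        assert log_lik >= prev_ll - 1e-8, f"log-likelihood dropped at iteration {it}"
        theta = m_step(x, q)
        prev_ll = log_lik
    return theta
```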