I was investigating scikit-learn's implementation of the EM algorithm for fitting Gaussian Mixture Models, and I was wondering how they came up with using the average log-likelihood instead of the sum of the log-likelihoods to test convergence.

I see that it should cause the algorithm to converge faster (given their default parameters), but where does that idea come from?

Does anyone know if they based this part of the implementation on a specific paper, or if they just came up with it and used it?

In most explanations of the EM algorithm I have come across, the convergence test uses log_likelihoods.sum() rather than log_likelihoods.mean().
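To be concrete, the test I am asking about looks roughly like this (a simplified sketch in the spirit of the scikit-learn code, not the actual source; the function and variable names are my own):

    import numpy as np

    # Simplified EM convergence check: track the change in the *average*
    # per-sample log-likelihood rather than the sum.
    def converged(log_likelihoods, prev_avg, tol=1e-3):
        curr_avg = log_likelihoods.mean()  # rather than log_likelihoods.sum()
        return abs(curr_avg - prev_avg) < tol, curr_avg

    done, avg = converged(np.array([-2.1, -1.9, -2.0]), prev_avg=-2.05)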

1 Answer

It makes unit testing easier: the average log-likelihood is invariant to the size of the sample, whereas the sum grows with the number of data points, so a single tolerance works regardless of sample size.

Reference: the GitHub discussion that led to the change.
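A toy illustration of that invariance (my own example, not from the linked discussion): repeating the same sample ten times multiplies the sum of the log-likelihoods by ten, but leaves the mean, and therefore any fixed tolerance compared against it, unchanged.

    import numpy as np

    rng = np.random.default_rng(0)
    log_lik = rng.normal(loc=-2.0, scale=0.5, size=100)  # per-sample log-likelihoods
    log_lik_big = np.tile(log_lik, 10)                   # same data, 10x the sample size

    print(log_lik.sum(), log_lik_big.sum())    # sums differ by a factor of 10
    print(log_lik.mean(), log_lik_big.mean())  # means are identical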

Emre
  • @Dawny33 no hard feelings, but I still don't understand why you rejected the edit. I just believe that what I'm saying states the same thing as Emre, but more clearly. :) – eliasah Jul 03 '16 at 12:33
  • @Emre what do you think? I think your answer is great, but it just needs some clarification. – eliasah Jul 03 '16 at 12:34
  • @eliasah many of your edits are putting lots of words in the answerer's mouth. They may or may not be right; they shouldn't be an edit. Write your own answer, or comment. – Sean Owen Jul 03 '16 at 13:11