I am reading about the EM (Expectation-Maximization) algorithm in a machine learning book. In the closing remarks of the chapter, the authors note that we cannot judge the "optimality" of the number of components (the number of Gaussians in the mixture) by comparing each fitted model's final log-likelihood, since models with more parameters will inevitably fit the data better.
My questions are therefore:

1) How can we compare the performance of models fitted with different numbers of components?
2) What factors help us decide that a mixture model fitted with EM is sufficient for modeling the observed data?