Questions tagged [expectation-maximization]

An optimization algorithm often used for maximum-likelihood estimation in the presence of missing data.

The expectation-maximization (EM) algorithm is an optimization algorithm. It is an instance of a broader class of optimization algorithms known as Majorization-Minimization (MM). It is a cornerstone of statistics because each iteration is guaranteed not to decrease the likelihood, which makes it particularly suitable for likelihood maximization problems where local maxima abound, especially in the presence of missing data.

More specifically, the EM algorithm is an iterative method for finding maximum likelihood estimates when the model depends on unobserved latent variables. Each iteration alternates the following two steps (a minimal code sketch follows the list):

  • Expectation Step: Compute the expected value of the complete-data log-likelihood with respect to the conditional distribution of the latent variables, given the observed data and the current parameter estimate.
  • Maximization Step: Update the parameter estimate by maximizing the expectation computed in the first step.
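To make the two steps concrete, here is a minimal sketch for a two-component univariate Gaussian mixture, where each point's component membership plays the role of the missing data (NumPy/SciPy only; the synthetic data and starting values are illustrative assumptions, not taken from any question below):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
# Synthetic data drawn from two Gaussians; component membership is the
# "missing data" that EM averages over.
x = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 1.5, 700)])

# Initial guesses for mixing weights, means, and standard deviations.
pi = np.array([0.5, 0.5])
mu = np.array([-1.0, 1.0])
sigma = np.array([1.0, 1.0])

for _ in range(100):
    # E-step: responsibilities, i.e. expected component memberships
    # under the current parameter estimate.
    dens = pi * norm.pdf(x[:, None], mu, sigma)          # shape (n, 2)
    resp = dens / dens.sum(axis=1, keepdims=True)

    # M-step: closed-form maximizers of the expected log-likelihood.
    nk = resp.sum(axis=0)
    pi = nk / len(x)
    mu = (resp * x[:, None]).sum(axis=0) / nk
    sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)

print(pi, mu, sigma)
```

Each pass through the loop is guaranteed not to decrease the observed-data log-likelihood, which is the property that makes EM attractive for this kind of problem.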
23 questions
4 votes, 1 answer

Can Expectation Maximization estimate truth and confusion matrix from multiple noisy sources?

Suppose we have $m$ sources, each of which noisily observes the same set of $n$ independent events from the outcome set $\{A,B,C\}$. Each source has a confusion matrix, for example for source $i$: $$C_i = \begin{bmatrix} 0.98 & 0.01 & 0.07 \\ 0.01 &…
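This setting is essentially the model of Dawid and Skene (1979), and EM can recover both the latent truths and the confusion matrices by alternating between them. A minimal NumPy sketch (the vote-fraction initialization, the smoothing constant, and all names are illustrative choices, not from the question):

```python
import numpy as np

def dawid_skene(obs, K=3, iters=50, eps=1e-6):
    """obs[i, j]: outcome in {0..K-1} reported by source i for event j."""
    m, n = obs.shape
    # Break symmetry by initializing the posterior over true outcomes
    # with per-event vote fractions.
    post = np.stack([(obs == k).mean(axis=0) for k in range(K)], axis=1)  # (n, K)
    for _ in range(iters):
        # M-step: class prior and per-source confusion matrices from
        # expected counts (lightly smoothed to avoid log(0)).
        prior = post.mean(axis=0) + eps
        prior /= prior.sum()
        conf = np.empty((m, K, K))
        for i in range(m):
            for l in range(K):
                conf[i][:, l] = eps + post[obs[i] == l].sum(axis=0)
            conf[i] /= conf[i].sum(axis=1, keepdims=True)
        # E-step: posterior over each event's true outcome given the
        # current prior and confusion matrices.
        log_post = np.log(prior) + sum(np.log(conf[i][:, obs[i]]).T for i in range(m))
        post = np.exp(log_post - log_post.max(axis=1, keepdims=True))
        post /= post.sum(axis=1, keepdims=True)
    return post, conf
```

Here `post[j]` is the inferred distribution over the true outcome of event $j$, and `conf[i]` the recovered confusion matrix of source $i$.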
3 votes, 1 answer

Why does a Gaussian mixture model use expectation maximization instead of gradient descent?

Why does a Gaussian mixture model use expectation maximization instead of gradient descent? What other models use expectation maximization to find optimal parameters instead of using gradient descent?
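One way to frame the trade-off: direct gradient-based optimization of the mixture log-likelihood is perfectly possible, but it needs a reparameterization (or explicit constraints) to keep the weights on the simplex and the scales positive, whereas EM's closed-form M-step handles both automatically and needs no step size. A hedged sketch of the direct route, using one common unconstrained reparameterization:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 1.5, 700)])

def neg_log_lik(theta):
    # Unconstrained parameters: softmax keeps the weights on the simplex,
    # exp keeps the standard deviations positive.
    a, mu, log_sigma = theta[:2], theta[2:4], theta[4:6]
    pi = np.exp(a) / np.exp(a).sum()
    dens = pi * norm.pdf(x[:, None], mu, np.exp(log_sigma))
    return -np.log(dens.sum(axis=1)).sum()

# Quasi-Newton minimization with numerical gradients; a learning-rate-based
# optimizer would work the same way on this objective.
res = minimize(neg_log_lik, x0=np.array([0.0, 0.0, -1.0, 1.0, 0.0, 0.0]))
print(res.x)
```

EM remains the default for GMMs largely because its updates are closed-form and monotonically improve the likelihood. EM is also standard for other latent-variable models, e.g. hidden Markov models (the Baum-Welch algorithm) and factor-analysis-type models with missing data.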
3 votes, 1 answer

Using the EM (Expectation Maximization) Algorithm for Training Logistic Regression

Is it possible to learn the weights of a logistic regression classifier using the EM (Expectation Maximization) algorithm? Is there any example or reference?
3 votes, 1 answer

How to compare the performance of different numbers of mixture components for the EM algorithm?

I am reading about the EM (Expectation-Maximization) algorithm in a machine learning book. In the end-of-chapter remarks, the authors mention that we cannot decide the "optimality" of the number of components (# of mixtures of Gaussians…
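A common practical answer is to fit the mixture for a range of component counts and compare a penalized criterion such as BIC or AIC (or held-out likelihood), since the in-sample likelihood alone always improves as components are added. A short sketch with scikit-learn's GaussianMixture on illustrative synthetic data:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(-2.0, 1.0, (300, 1)), rng.normal(3.0, 1.5, (700, 1))])

# Fit mixtures with 1..6 components and pick the lowest BIC; BIC penalizes
# the extra parameters that additional components introduce.
bics = [GaussianMixture(n_components=k, random_state=0).fit(X).bic(X)
        for k in range(1, 7)]
best_k = int(np.argmin(bics)) + 1
print(bics, best_k)
```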
2 votes, 0 answers

Which latent variable model is better for finding hidden variables?

Currently, I am exploring the concept of latent variables for regression-type datasets. I have gone through the literature on a few of the methods and models used to find latent variables, such as EM algorithms, partial least squares regression, latent…
2 votes, 0 answers

Does feature normalization improve performance of Hidden Markov Models?

For training a Hidden Markov Model (HMM) on a multivariate, continuous time series, is it preferable to scale the data somehow? Some pre-processing steps may be: normalize to zero mean and unit variance; scale to the [-1, 1] interval; scale to [0, 1]…
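For reference, the three candidate scalings in the excerpt map directly onto standard scikit-learn transformers; which one helps is an empirical question, and for Gaussian-emission HMMs standardization mainly improves numerical conditioning and initialization rather than changing the model class. A small sketch (the data here is a placeholder):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

# Placeholder multivariate series of shape (T, d).
X = np.random.default_rng(0).normal(5.0, 3.0, (1000, 4))

X_std = StandardScaler().fit_transform(X)                     # zero mean, unit variance
X_sym = MinMaxScaler(feature_range=(-1, 1)).fit_transform(X)  # [-1, 1]
X_unit = MinMaxScaler().fit_transform(X)                      # [0, 1]
```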
2 votes, 1 answer

Statistical machine translation word alignment for FR-ENG and ENG-FR: what are p(e) and p(f)?

I'm currently trying to implement this paper, but am struggling to understand some of the math here. I'm pretty sure I understand how to implement the E-step, but I'm confused about how to compute the M-step. It says just before section…
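For orientation, in IBM-Model-1-style alignment the M-step is just a renormalization of the expected counts collected in the E-step. A minimal sketch of the standard Model 1 EM loop on a toy corpus (this follows the textbook recipe, not necessarily the exact model in the linked paper):

```python
from collections import defaultdict

# Toy parallel corpus: (French sentence, English sentence) token lists.
corpus = [(["la", "maison"], ["the", "house"]),
          (["la", "fleur"], ["the", "flower"])]

# t[(e, f)] = p(e | f), initialized uniformly over co-occurring pairs.
t = defaultdict(lambda: 0.25)

for _ in range(10):
    counts = defaultdict(float)   # expected counts c(e, f)
    totals = defaultdict(float)   # expected counts c(f)
    # E-step: distribute each English word's probability mass over the
    # French words that could have generated it.
    for f_sent, e_sent in corpus:
        for e in e_sent:
            z = sum(t[(e, f)] for f in f_sent)
            for f in f_sent:
                p = t[(e, f)] / z
                counts[(e, f)] += p
                totals[f] += p
    # M-step: renormalize expected counts into probabilities p(e | f).
    for (e, f), c in counts.items():
        t[(e, f)] = c / totals[f]

print(t[("house", "maison")])
```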
2 votes, 1 answer

Math of Gaussian Mixture Models & EM Algorithm

I'm having a hard time understanding the math behind GMMs. A GMM is a weighted sum over $K$ different Gaussian components with parameters $\mu_k, \sigma_k, \pi_k$. From my understanding, the general overview is: initialize random parameters for each…
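For reference, the updates being described have compact standard forms. Writing $\gamma_{nk}$ for the responsibility of component $k$ for point $x_n$ and $N$ for the number of points:

$$\gamma_{nk} = \frac{\pi_k\,\mathcal{N}(x_n \mid \mu_k, \sigma_k^2)}{\sum_{j=1}^{K} \pi_j\,\mathcal{N}(x_n \mid \mu_j, \sigma_j^2)} \qquad \text{(E-step)}$$

$$N_k = \sum_n \gamma_{nk}, \qquad \pi_k = \frac{N_k}{N}, \qquad \mu_k = \frac{1}{N_k}\sum_n \gamma_{nk}\,x_n, \qquad \sigma_k^2 = \frac{1}{N_k}\sum_n \gamma_{nk}\,(x_n - \mu_k)^2 \qquad \text{(M-step)}$$

The two steps are iterated until the log-likelihood stops improving.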
1 vote, 0 answers

How to derive Evidence Lower Bound in the paper "Zero-Shot Text-to-Image Generation"?

Can someone share the derivation of the evidence lower bound in this paper? Zero-Shot Text-to-Image Generation. The overall procedure can be viewed as maximizing the evidence lower bound (ELB) (Kingma & Welling, 2013; Rezende et al., 2014) on the joint…
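The generic derivation behind any bound of this form is one line of Jensen's inequality; for observations $x$, latents $z$, and any distribution $q(z)$ (the paper's specific factorization over text and image tokens is not reproduced here):

$$\log p_\theta(x) = \log \mathbb{E}_{q(z)}\!\left[\frac{p_\theta(x, z)}{q(z)}\right] \ge \mathbb{E}_{q(z)}\!\left[\log p_\theta(x, z) - \log q(z)\right] = \mathbb{E}_{q(z)}\!\left[\log p_\theta(x \mid z)\right] - D_{\mathrm{KL}}\!\left(q(z)\,\|\,p(z)\right),$$

with equality exactly when $q(z) = p_\theta(z \mid x)$.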
1 vote, 0 answers

Active learning with mixture model cluster assignments - am I injecting bias here?

Suppose I have a dataset of people's phone numbers and heights, and I'm interested in learning the parameters $p_{girl}$, $p_{boy}=1-p_{girl}$, $\mu_{boy}$, $\mu_{girl}$, and overall $\sigma$ governing the distribution of people's heights. I don't…
1 vote, 1 answer

How to find the feature regions where each label is most likely when using decision trees?

Given a decision tree for classification, for example this one: what is the way to find the feature domain (petal and sepal width and length) where a sample would most likely occur in the feature space for each class? It is clear here that for…
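One concrete way to get such regions is to walk the fitted tree and intersect the split thresholds along each root-to-leaf path, which yields an axis-aligned box per leaf together with its majority class. A sketch against scikit-learn's low-level tree_ attributes (the iris setup and all names are illustrative):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(iris.data, iris.target)
t = clf.tree_

def leaf_regions(node=0, lo=None, hi=None):
    """Yield (lower_bounds, upper_bounds, majority_class) for each leaf."""
    d = t.n_features
    lo = np.full(d, -np.inf) if lo is None else lo
    hi = np.full(d, np.inf) if hi is None else hi
    if t.children_left[node] == -1:              # leaf node
        yield lo, hi, int(np.argmax(t.value[node]))
        return
    f, thr = t.feature[node], t.threshold[node]
    left_hi, right_lo = hi.copy(), lo.copy()
    left_hi[f] = min(hi[f], thr)                 # x[f] <= thr goes left
    right_lo[f] = max(lo[f], thr)                # x[f] > thr goes right
    yield from leaf_regions(t.children_left[node], lo, left_hi)
    yield from leaf_regions(t.children_right[node], right_lo, hi)

for lo, hi, cls in leaf_regions():
    print(iris.target_names[cls], lo, hi)
```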
1 vote, 0 answers

E-step for EM algorithm for document clustering

I have code for the E-step of the EM algorithm for document clustering, in the hard-EM version. I'm trying to implement the E-step for the soft-EM algorithm. Here is my code for hard EM: E.step <- function(gamma, model, counts){ N <-…
1 vote, 0 answers

N-Gram Linear Smoothing

In slide 61 of the NLP text, to smooth the n-gram probabilities we need to find the lambdas that maximize the probability of a held-out set, given in terms of M(λ1, λ2, ...λ_k). What does this notation mean, please? Also, it says that "One way is to…
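In such slides, M(λ1, ..., λ_k) typically denotes the (log-)likelihood the interpolated model assigns to the held-out set, viewed as a function of the interpolation weights. Those weights can themselves be fit with EM, treating "which component model generated this token" as the latent variable. A minimal sketch with two hypothetical component models whose per-token held-out probabilities are given as arrays (all numbers here are made up):

```python
import numpy as np

# Hypothetical per-token probabilities from two component models
# (e.g. a unigram and a bigram model) evaluated on held-out data.
p1 = np.array([0.10, 0.02, 0.30, 0.05])
p2 = np.array([0.04, 0.20, 0.10, 0.25])

lam = np.array([0.5, 0.5])
for _ in range(50):
    # E-step: posterior that each held-out token came from each component.
    mix = np.stack([lam[0] * p1, lam[1] * p2])   # shape (2, n_tokens)
    resp = mix / mix.sum(axis=0, keepdims=True)
    # M-step: each lambda becomes the average responsibility of its model.
    lam = resp.mean(axis=1)

print(lam)  # weights maximizing the held-out likelihood of the mixture
```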
1 vote, 1 answer

Gaussian Mixture Models Clustering

When using the EM algorithm for Gaussian mixture models (GMM), in the E-step we take each point x in the training dataset to calculate the "weights", and then update the parameters of each cluster's Gaussian distribution (M-step). I have read that we…
1 vote, 0 answers

Code or Package to cluster sequences (or time series) of different lengths based on HMM?

Is there any existing code or package in Python, R, Java, Matlab, or Scala that implements the sequence clustering algorithms in either of the following two papers? 1) 'Clustering Sequences with Hidden Markov Models' by Padhraic Smyth (1997):…
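If nothing off the shelf turns up, Smyth's 1997 scheme is short enough to sketch: fit a small HMM to each sequence, build a matrix of cross-sequence log-likelihoods as a similarity, and cluster that matrix. A rough outline (the hmmlearn package, the model settings, and spectral clustering as the final step are all our assumptions, not prescribed by either paper):

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM
from sklearn.cluster import SpectralClustering

def cluster_sequences(seqs, n_states=3, n_clusters=2):
    """seqs: list of (T_i, d) arrays, possibly of different lengths."""
    # Fit one small HMM per sequence.
    models = [GaussianHMM(n_components=n_states, random_state=0).fit(s)
              for s in seqs]
    # Symmetrized per-observation log-likelihood between all pairs.
    L = np.array([[m.score(s) / len(s) for s in seqs] for m in models])
    S = (L + L.T) / 2
    # Turn log-likelihoods into non-negative affinities for clustering.
    A = np.exp(S - S.max())
    return SpectralClustering(n_clusters=n_clusters, affinity="precomputed",
                              random_state=0).fit_predict(A)
```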