How do I initialize a Hidden Markov Model when using MFCC features for speech recognition?

Asked May 02 '21 at 01:55

Active May 02 '21 at 17:02

Viewed 26 times

I have a personal dataset of 10000 audio files, each consisting of a single spoken sentence. Each file comes with a transcribed text label that I can use for supervised HMM training.

Now that I have extracted MFCC features, how do I feed the sequence of MFCC vectors for each file into the HMM? After reviewing multiple sources, my impression is that I should initialize the $N \times N$ transition matrix with all of the vectors, but how does this mark the start and end of each spoken sentence? I'm also unsure how to choose the number of states when the files vary in word count.

This is my personal understanding:

$N \times N$ transition probability matrix, which I believe models transitions between words (the language model).
$1 \times N$ start and end probability vectors, which I also believe are $0$th-order Markovian per word.
The type of underlying distribution(s) used for each state.
Please correct me if my perspective is wrong.
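
To make the pieces above concrete, here is a minimal pure-Python sketch of one common arrangement, not a definitive recipe. It assumes a left-to-right HMM in which $N$ counts hidden states rather than MFCC vectors, and it keeps utterance boundaries by storing the stacked frames alongside a list of per-utterance lengths (the same convention as the `lengths` argument of hmmlearn's `fit(X, lengths)`). The function names, the `stay_prob` value, and the toy 2-dimensional "MFCC" frames are all hypothetical.

```python
def left_to_right_transitions(n_states, stay_prob=0.5):
    """Build an n_states x n_states left-to-right transition matrix.

    Each state either stays put or advances to the next state;
    the last state can only loop on itself.
    """
    A = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        if i + 1 < n_states:
            A[i][i] = stay_prob
            A[i][i + 1] = 1.0 - stay_prob
        else:
            A[i][i] = 1.0  # final state: self-loop only
    return A


def start_probabilities(n_states):
    """1 x n_states start vector that always begins in the first state."""
    pi = [0.0] * n_states
    pi[0] = 1.0
    return pi


def split_by_lengths(frames, lengths):
    """Recover the individual utterances from stacked frames."""
    if sum(lengths) != len(frames):
        raise ValueError("lengths must sum to the number of frames")
    sequences, start = [], 0
    for n in lengths:
        sequences.append(frames[start:start + n])
        start += n
    return sequences


# Toy data: two utterances with 3 and 2 frames of 2-dimensional features.
utt_a = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
utt_b = [[1.0, 1.1], [1.2, 1.3]]

stacked = utt_a + utt_b               # all frames in one long list
lengths = [len(utt_a), len(utt_b)]    # boundaries are kept here

A = left_to_right_transitions(3)
pi = start_probabilities(3)
sequences = split_by_lengths(stacked, lengths)
```

In this sketch the MFCC vectors never enter the transition matrix; they would only be scored by the per-state emission distributions, and the sentence boundaries come from `lengths` rather than from the matrix itself.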

edited May 02 '21 at 17:02

The Pointer

asked May 02 '21 at 01:55

Zander
