Representing output labels for character-level speech recognition using an RNN

Asked May 29 '18 at 19:19

Active May 29 '18 at 22:36

Viewed 53 times

I saw this tutorial on generating text with an LSTM. In it, the author trained the network by taking the 100 previous characters as input and the next character as the output label.
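For reference, this is roughly how I understand the tutorial's setup. It is only a minimal sketch: the function name `make_char_pairs`, the integer encoding, and the toy text are my own assumptions, not code from the tutorial.

```python
def make_char_pairs(text, seq_length=100):
    """Slide a window over `text`: each window of `seq_length` characters
    is one input, and the character right after it is the label."""
    # Map every distinct character to an integer index.
    chars = sorted(set(text))
    char_to_int = {c: i for i, c in enumerate(chars)}

    inputs, labels = [], []
    for start in range(len(text) - seq_length):
        window = text[start:start + seq_length]
        next_char = text[start + seq_length]
        inputs.append([char_to_int[c] for c in window])
        labels.append(char_to_int[next_char])
    return inputs, labels, char_to_int


# Toy example with a short window so the pairs are easy to inspect.
inputs, labels, char_to_int = make_char_pairs("hello world", seq_length=3)
print(len(inputs), inputs[0], labels[0])
```

With text, every input window has exactly one well-defined label, because the input and the output live on the same character timeline.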

I am interested in trying some simple speech recognition with an LSTM. I could use MFCC features of the audio signal as input data, but what confuses me most is how to represent the output labels.

The dataset I have is the VCTK corpus, which contains sentence-level audio recordings and their transcriptions.

In the tutorial, the next character after the input sequence was used as the output label. For speech, however, it is impractical to know which part of the audio produced which character without manually transcribing every second of the recording. So, how would I represent the output labels for this problem?
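To make the mismatch concrete, here is a rough sketch of the sizes involved. Everything in it is an assumption for illustration: the 16 kHz sample rate, the 25 ms window and 10 ms hop, the 3-second clip length, the example transcript, and the helper name `num_frames`.

```python
def num_frames(num_samples, sample_rate=16000, win_ms=25, hop_ms=10):
    """Number of feature frames (e.g. MFCC vectors) produced by a sliding
    analysis window without padding. All defaults are assumed values."""
    win = int(sample_rate * win_ms / 1000)  # window length in samples
    hop = int(sample_rate * hop_ms / 1000)  # hop length in samples
    if num_samples < win:
        return 0
    return 1 + (num_samples - win) // hop


# Hypothetical 3-second recording with a sentence-level transcript.
transcript = "please call stella"
frames = num_frames(num_samples=3 * 16000)

# Hundreds of input frames, but only a handful of characters,
# and nothing in the data says which frames belong to which character.
print(frames, len(transcript))
```

So each recording gives me a long sequence of feature vectors and a much shorter character sequence, with no per-frame label like the one the text tutorial gets for free.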

edited May 29 '18 at 22:36

Stephen Rauch


asked May 29 '18 at 19:19

Jahir Islam


0 Answers