Questions tagged [speech-to-text]

46 questions
4
votes
1 answer

Why are observation probabilities modelled as Gaussian distributions in HMM?

HMM is a statistical model with unobserved (i.e. hidden) states, used in recognition algorithms (speech, handwriting, gesture, ...). What distinguishes a DHMM from a CHMM is the transition probability matrix P with elements. In a CHMM, the state space of…
4
votes
2 answers

How to convert a mel spectrogram to log-scaled mel spectrogram

I was reading this paper on environmental noise discrimination using Convolutional Neural Networks and wanted to reproduce their results. They convert WAV files into log-scaled mel spectrograms. How do you do this? I am able to convert a WAV file to a…
Ajay H • 222
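With librosa, the usual recipe is `librosa.feature.melspectrogram` followed by `librosa.power_to_db`. The dB conversion itself is just a clipped logarithm; a minimal numpy sketch mirroring `power_to_db`'s defaults (`ref`, `amin`, `top_db`):

```python
import numpy as np

def power_to_db(S, ref=1.0, amin=1e-10, top_db=80.0):
    """10 * log10(S / ref), clipped below at amin and floored at
    (peak - top_db) dB -- mirroring librosa.power_to_db's defaults."""
    log_spec = 10.0 * np.log10(np.maximum(amin, S))
    log_spec -= 10.0 * np.log10(np.maximum(amin, ref))
    return np.maximum(log_spec, log_spec.max() - top_db)

mel = np.array([[1.0, 0.1], [0.01, 1e-12]])   # toy mel power spectrogram
log_mel = power_to_db(mel, ref=mel.max())     # peak maps to 0 dB
```

In practice you would pass the output of `librosa.feature.melspectrogram(y=wave, sr=sr)` as `S`; the toy array above only illustrates the scaling.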
3
votes
1 answer

How to double audio dataset?

I am trying to develop a mispronunciation detection model for English speech. I use the TIMIT dataset, a phoneme-labelled audio dataset. A phoneme is any of the perceptually distinct units of sound. So, my dataset looks like an audio file and…
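For phoneme-labelled audio, augmentations that preserve time alignment are the safest way to grow the dataset, since the original phoneme boundaries stay valid. Pairing each file with one noisy copy literally doubles it. A minimal numpy sketch (the noise level is illustrative):

```python
import numpy as np

def augment(wave, noise_std=0.005, rng=None):
    """Additive Gaussian noise keeps phoneme boundaries aligned,
    so the original labels still apply. noise_std is illustrative."""
    if rng is None:
        rng = np.random.default_rng(0)
    return wave + rng.normal(0.0, noise_std, wave.shape)

clean = np.zeros(16000)   # 1 s of silence at 16 kHz stands in for a TIMIT file
noisy = augment(clean)    # same length, same phoneme labels apply
```

Time-stretching or pitch-shifting (e.g. via librosa's effects) also works, but stretching changes phoneme boundary times, so the label timestamps must be rescaled accordingly.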
3
votes
1 answer

ASR on low dataset

I am doing ASR (automatic speech recognition) for my master's thesis on a small dataset. The voice and text data are labelled. There are around 4000 phrases and around 5 hours of speech. I don't have a background in speech or signal processing. How huge would be…
2
votes
2 answers

How to evaluate the quality of speech-to-text data without access to the true labels?

I am dealing with a data set of transcribed call-center data, where customers are recorded while interacting with the agent. This is then automatically transcribed by an external transcription system. I want to automatically assess the quality…
miri_h_ds • 21
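One common proxy when no ground truth exists is agreement between two independent ASR systems: utterances where the transcripts diverge are flagged for manual review. A minimal sketch using Python's `difflib` (word-level similarity as a cheap stand-in for computing WER between the two hypotheses):

```python
import difflib

def agreement(hyp_a, hyp_b):
    """Word-level similarity between two independent ASR transcripts;
    low agreement flags utterances that likely need manual review."""
    return difflib.SequenceMatcher(None, hyp_a.split(), hyp_b.split()).ratio()
```

Other label-free signals worth combining with this: the ASR system's own confidence scores (if exposed) and language-model perplexity of the transcripts.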
2
votes
0 answers

How to do phoneme segmentation using dynamic time warping?

Background Information: Dynamic Time Warping (DTW): In time series analysis, dynamic time warping (DTW) is one of the algorithms for measuring similarity between two temporal sequences, which may vary in speed. (Source: Wikipedia) Phoneme…
Sam Kagawa • 21
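For reference, the DTW recurrence itself is short. A textbook numpy implementation over two 1-D sequences (real phoneme segmentation would run it over frame-level feature vectors such as MFCCs, and backtrack through the accumulated-cost matrix to recover the alignment path):

```python
import numpy as np

def dtw(x, y):
    """Accumulated-cost DTW between two 1-D sequences; backtracking
    through D would recover the warping path used for segmentation."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

For segmentation, one sequence would be the utterance's frames and the other a reference pronunciation; phoneme boundaries fall where the warping path crosses reference-phoneme transitions.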
2
votes
1 answer

How is an ASR's output compared to ground truth for validation?

I am curious how it is done, as I am interested in doing something similar. I have some manually transcribed data that contains tags for multiple speakers. I want to compare how well out-of-the-box ASRs (Google, AWS Transcribe) are able to…
Samarth • 339
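The standard metric here is word error rate (WER): the word-level Levenshtein distance between hypothesis and reference, divided by the reference length. Libraries such as jiwer provide it ready-made; a self-contained sketch:

```python
def wer(reference, hypothesis):
    """Word error rate: word-level Levenshtein distance / reference length."""
    r, h = reference.split(), hypothesis.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i                      # all deletions
    for j in range(len(h) + 1):
        d[0][j] = j                      # all insertions
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(r)][len(h)] / len(r)
```

Note that WER is sensitive to text normalisation (casing, punctuation, numerals), so both sides should be normalised identically before scoring; multi-speaker tags are usually stripped first.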
2
votes
2 answers

Creating pronunciation dictionary for ASR

I am working on ASR (automatic speech recognition) for Somali data as my master's thesis, and now I am stuck on how to create a phonetics or pronunciation dictionary for it. I searched the web and could not find one. I'm not sure how to tackle this.…
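When no dictionary exists, a common fallback for languages with fairly regular orthography is a rule-based grapheme-to-phoneme (G2P) mapping that generates the lexicon automatically. A toy sketch (the rule table below is hypothetical and not validated Somali phonology; a real table needs a native speaker or linguistic reference):

```python
# Hypothetical grapheme -> phoneme rules; digraphs listed alongside singles.
RULES = {"sh": "SH", "dh": "DH", "a": "AA", "b": "B", "n": "N"}

def g2p(word):
    """Longest-match G2P: try two-letter digraphs before single letters."""
    phones, i = [], 0
    while i < len(word):
        if word[i:i + 2] in RULES:
            phones.append(RULES[word[i:i + 2]])
            i += 2
        elif word[i] in RULES:
            phones.append(RULES[word[i]])
            i += 1
        else:
            i += 1   # skip graphemes not covered by the toy rule table
    return phones
```

Running `g2p` over the training vocabulary and writing `word<TAB>phone phone …` lines yields the lexicon format toolkits like Kaldi expect.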
2
votes
1 answer

GMM in speech recognition using HMM-GMM

I am trying to solve/understand ASR using HMM-GMM. At an abstract level I understand what's happening, but I did not understand how the GMM fits into it. My data has 5K hours of speech from a single user. I took the above picture from this article. I…
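In an HMM-GMM system, the GMM is each state's emission model: while the HMM handles the sequence of states, the GMM scores how likely a single acoustic feature vector (e.g. one MFCC frame) is under a given state. A minimal diagonal-covariance log-likelihood in numpy:

```python
import numpy as np

def gmm_loglik(x, weights, means, variances):
    """log p(x | state) under a diagonal-covariance GMM: the emission
    score an HMM state assigns to one feature frame x.
    weights: (K,), means/variances: (K, D), x: (D,)."""
    comp = (np.log(weights)
            - 0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)
            - 0.5 * np.sum((x - means) ** 2 / variances, axis=1))
    m = comp.max()
    return m + np.log(np.sum(np.exp(comp - m)))   # stable log-sum-exp
```

During decoding, the Viterbi search combines these per-frame emission scores with the HMM transition probabilities to find the best state sequence.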
2
votes
2 answers

Where to start in natural language processing for a language

My native language is a regional language and few people speak it. I have some assignments in a machine learning course and I was thinking about doing some natural language processing on my native language, but I don't know where to start since there…
2
votes
0 answers

Representing output labels for character-level speech recognition using RNN

I saw this tutorial on generating text using an LSTM. In this tutorial the author trained the network by taking the 100 previous characters as input and the next character as the output label. I am interested in trying some simple speech recognition using…
Jahir Islam • 121
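For character-level speech recognition (typically trained with a CTC loss rather than next-character prediction), the output labels are just integer indices into a character vocabulary, with index 0 conventionally reserved for the CTC blank symbol. A minimal sketch:

```python
# Build the label vocabulary from the training transcripts (toy corpus here).
# Index 0 is conventionally reserved for the CTC blank symbol.
chars = sorted(set("the cat sat"))
char_to_idx = {c: i + 1 for i, c in enumerate(chars)}

def encode(text):
    """Map a transcript to the integer label sequence a CTC loss expects."""
    return [char_to_idx[c] for c in text]
```

Unlike the text-generation tutorial, the input here is a sequence of acoustic frames and the label is the whole character sequence; CTC handles the unknown frame-to-character alignment.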
2
votes
1 answer

Validation loss is less than training loss by 5 units. How is this result interpreted?

I am training a Keras model for end-to-end speech recognition. I have my own dataset of speech containing about 400 wave files. Text transcriptions are also given as input. Model summary…
1
vote
0 answers

How do I initialize a Hidden Markov Model when using MFCC features for speech recognition?

I have a personal dataset of 10000 audio files, each consisting of a single spoken sentence. These files each have transcribed text labels with them that I can use for supervised HMM training. Now that I have extracted MFCC features, how do I…
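A common starting point is a "flat start": every state of a left-to-right HMM is initialized with the global mean and variance of all MFCC frames, transitions are split between self-loop and forward, and training (e.g. Baum-Welch) then differentiates the states. A numpy sketch (3 states per phone and the 0.5/0.5 split are typical but illustrative choices):

```python
import numpy as np

def flat_start(mfcc, n_states=3):
    """Flat start: every state gets the global MFCC mean/variance;
    left-to-right topology with 0.5 self-loop / 0.5 forward transitions.
    mfcc: (n_frames, n_coeffs) array of feature frames."""
    mean = mfcc.mean(axis=0)
    var = mfcc.var(axis=0) + 1e-6          # floor the variance
    A = np.zeros((n_states, n_states))
    for s in range(n_states):
        if s + 1 < n_states:
            A[s, s], A[s, s + 1] = 0.5, 0.5
        else:
            A[s, s] = 1.0                  # final state self-loops
    means = np.tile(mean, (n_states, 1))
    variances = np.tile(var, (n_states, 1))
    return A, means, variances
```

Libraries like hmmlearn accept such initial parameters directly; the transcripts then constrain which phone HMMs are concatenated for each utterance during supervised training.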
1
vote
1 answer

How does Wav2Vec 2.0 feed output from Convolutional Feature Encoder as input to the Transformer Context Network

I was reading the Wav2Vec 2.0 paper and trying to understand the model architecture, but I have trouble understanding how audio raw inputs of variable lengths can be fed through the model, especially from the Convolutional Feature Encoder to the…
user116029 • 11
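The feature encoder copes with variable-length raw audio because each convolution just shortens the sequence deterministically; the Transformer then receives however many frames come out. Using the kernel widths (10, 3, 3, 3, 3, 2, 2) and strides (5, 2, 2, 2, 2, 2, 2) reported for the Wav2Vec 2.0 encoder, the output length for any input is:

```python
def conv_out_len(n_samples,
                 kernels=(10, 3, 3, 3, 3, 2, 2),
                 strides=(5, 2, 2, 2, 2, 2, 2)):
    """Sequence length after the 7 conv layers (no padding):
    L -> floor((L - k) / s) + 1 at each layer."""
    L = n_samples
    for k, s in zip(kernels, strides):
        L = (L - k) // s + 1
    return L
```

One second of 16 kHz audio becomes 49 frames (roughly a 20 ms hop), and within a batch shorter utterances are zero-padded with an attention mask so the Transformer ignores the padding.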
1
vote
1 answer

How to prepare Audio-text data for speech recognition

I have gathered some raw audio from all the conferences, meetings, lectures & casual conversations that I was part of. The machine transcription did not offer good results (from Azure, AWS, etc.). I would transcribe it so as to have both data+label…
johnyc • 11
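Once transcripts exist, most ASR toolkits just want a manifest pairing each audio file with lightly normalised text. A sketch of one JSON-lines entry (the field names follow a common convention, e.g. NeMo-style manifests, not a formal standard):

```python
import json

def manifest_line(wav_path, transcript):
    """One manifest entry: audio path plus lowercased, stripped text.
    Field names ("audio_filepath", "text") follow a common convention."""
    return json.dumps({"audio_filepath": wav_path,
                       "text": transcript.lower().strip()})
```

Long recordings are usually segmented into utterances of a few seconds first (e.g. by silence detection), since most training pipelines expect short audio-text pairs rather than hour-long files.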