I am very new to ML and have limited knowledge of it. I am having trouble with feature normalization. From the post I read, I understand that the normalization statistics should be computed on the training features and then used to scale the validation and test features. My problem is with the implementation: my training samples have a fixed dimension, but the validation and test samples have variable length. I can apply zero-mean, unit-variance normalization to the training data, but I am not sure how to normalize the validation and test samples when their length is not fixed.

  • Can you explain why your validation samples have a different dimension from the training data? Many ML algorithms rely on the training, validation, and test data coming from the same underlying distribution. – Jayaram Iyer Apr 28 '21 at 04:19
  • Can you explain why the training and test data are different? As I understand it, this can cause problems, because your system has been trained on a different distribution of data. – Raul Alvarez Apr 28 '21 at 06:51
  • @RaulAlvarez The [paper](https://arxiv.org/pdf/1911.06878.pdf) I am trying to implement says that their model uses fixed-size (512, 128) samples during training and the complete audio clip as one sample during testing and validation. – skiii gairola May 02 '21 at 11:53

2 Answers

The easiest way is to pad your data to the same length. For example, make all training, validation, and test samples the same length by adding zeros at the end or beginning of each one; that should solve your problem. You can refer to this Keras guide for a better idea:

https://keras.io/guides/understanding_masking_and_padding/
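
As a rough illustration of the padding idea, here is a minimal NumPy sketch. It assumes each clip is a 2D array of shape (frequency bins, time frames) with 128 bins, as described in the comments; the helper name `pad_clips` and the random clips are hypothetical and only there to make the example runnable. It pads at the end of the time axis and also returns a boolean mask, so that padded frames can be ignored later.

```python
import numpy as np

def pad_clips(clips, pad_value=0.0):
    """Pad (n_bins, n_frames) arrays along the time axis to a common length.

    Returns the padded batch and a boolean mask that is True for real frames.
    """
    max_frames = max(clip.shape[1] for clip in clips)
    n_bins = clips[0].shape[0]
    batch = np.full((len(clips), n_bins, max_frames), pad_value, dtype=np.float32)
    mask = np.zeros((len(clips), max_frames), dtype=bool)
    for i, clip in enumerate(clips):
        n_frames = clip.shape[1]
        batch[i, :, :n_frames] = clip   # copy the real frames
        mask[i, :n_frames] = True       # mark them as valid
    return batch, mask

# Hypothetical clips: 128 frequency bins, different numbers of time frames.
rng = np.random.default_rng(0)
clips = [rng.normal(size=(128, t)) for t in (300, 512, 410)]
batch, mask = pad_clips(clips)
print(batch.shape, mask.sum(axis=1))
```

Whether zero is a sensible padding value depends on how the features were scaled, so it is probably worth normalizing before padding, or masking the padded frames in the model.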

DaCard

That is a common case in image and audio processing. You need a normalization whose dimensions stay the same regardless of the input size, such as normalizing per channel.

If you have a 1D vector of features, taking a single mean and variance over all variables still normalizes it in a useful way; this works like a charm in computer vision. It also reduces the memory cost of the normalization, since fewer statistics need to be stored.
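
One way to keep the statistics independent of clip length is sketched below, under some assumptions: the features are spectrogram-like arrays with 128 frequency bins, training segments have 512 frames, and the statistics are taken per frequency bin (treating each bin like a channel). The function names `fit_bin_stats` and `normalize_clip` and the random data are hypothetical. The mean and standard deviation are computed on the training data only, pooled over segments and time, and then broadcast along the time axis of any clip.

```python
import numpy as np

def fit_bin_stats(train_segments, eps=1e-8):
    """Per-frequency-bin mean and std from training data only.

    train_segments has shape (n_segments, n_bins, n_frames).
    """
    mean = train_segments.mean(axis=(0, 2))        # shape (n_bins,)
    std = train_segments.std(axis=(0, 2)) + eps    # eps avoids division by zero
    return mean, std

def normalize_clip(clip, mean, std):
    """Normalize one (n_bins, n_frames) clip; n_frames may be any length."""
    return (clip - mean[:, None]) / std[:, None]

# Hypothetical data: 16 fixed-size training segments, one longer test clip.
rng = np.random.default_rng(0)
train = rng.normal(loc=3.0, scale=2.0, size=(16, 128, 512))
mean, std = fit_bin_stats(train)

train_norm = (train - mean[None, :, None]) / std[None, :, None]
test_clip = rng.normal(loc=3.0, scale=2.0, size=(128, 777))
test_norm = normalize_clip(test_clip, mean, std)
print(mean.shape, test_norm.shape)
```

Replacing the per-bin statistics with a single scalar mean and standard deviation over all values would be the "all variables" variant; in both cases the stored statistics do not depend on the number of time frames.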

Pedro Henrique Monforte
  • In my case, I am dealing with mono-channel audio. The number of frequency bins is fixed at 128, but the number of time frames differs from clip to clip. I tried normalizing per frequency bin, i.e., computing the mean as a vector of length 128 (one mean per frequency bin), but it is not working. – skiii gairola May 02 '21 at 10:53