feature normalisation problem

Question

I am very new to ML and have run into a problem with feature normalization. I have understood from the post that the normalization statistics should be computed on the training features and then used to scale the validation and test features. My difficulty is in the implementation: my training samples have a fixed dimension, but the validation and test samples vary in length. I can apply zero-mean, unit-variance normalization to the training data, but I am not sure how to normalize the validation and test samples when their length is not fixed.

Can you explain why your validation samples have a different dimension from the training data? Many ML algorithms rely on the training, validation, and test data belonging to the same underlying distribution. – Jayaram Iyer, Apr 28 '21 at 04:19
Can you explain why the training and test data are different? As I understand it, this can cause problems, because your system has been trained on a different distribution of data. – Raul Alvarez, Apr 28 '21 at 06:51
@RaulAlvarez The [paper](https://arxiv.org/pdf/1911.06878.pdf) I am trying to implement says that the model is trained on fixed-size samples of shape (512, 128), while each complete audio clip is used as one sample during validation and testing. – skiii gairola, May 02 '21 at 11:53

score 0 · Answer 1 · answered Feb 21 '22 at 01:31

The easiest way is to pad your data to the same length. For example, bring all training, validation, and test samples to the same length by adding zeros at the end or beginning of each one, which should solve your problem. You can refer to this Keras guide for a better idea:

https://keras.io/guides/understanding_masking_and_padding/
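
The padding idea can be sketched roughly as follows, using plain NumPy rather than the Keras utilities from the guide. The helper name `pad_to_length`, the clip shapes, and the (time, frequency) layout are assumptions made for illustration only.

```python
import numpy as np

def pad_to_length(clip, target_len, pad_value=0.0):
    """Pad (or truncate) a (time, freq) array along the time axis."""
    n_frames = clip.shape[0]
    if n_frames >= target_len:
        return clip[:target_len]
    pad = np.full((target_len - n_frames, clip.shape[1]), pad_value, dtype=clip.dtype)
    return np.concatenate([clip, pad], axis=0)

# Hypothetical clips: 128 frequency bins, different numbers of time frames.
clips = [np.ones((300, 128)), np.ones((700, 128))]
target_len = max(c.shape[0] for c in clips)

# Stack the padded clips into one array of shape (n_clips, target_len, 128).
batch = np.stack([pad_to_length(c, target_len) for c in clips])

# Boolean mask marking the real (non-padded) frames of each clip.
mask = np.stack([np.arange(target_len) < c.shape[0] for c in clips])
```

Note that padded zeros would shift the mean and variance if they were included in the statistics, so it is probably safer to compute those on the unpadded frames (for example via `mask`) and pad afterwards.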

score 0 · Answer 2 · answered Apr 28 '21 at 11:15


This is a common situation in image and audio processing. You need to normalize in a way whose dimensions stay the same regardless of the sample length, such as normalizing per channel.
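
One possible reading of per-channel normalization for spectrogram-like data is sketched below, treating each frequency bin as a channel. The array shapes and random data are assumptions for illustration; the key point is that the statistics have a fixed length of 128, so they broadcast over any number of time frames.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: fixed-size training crops and one test clip of arbitrary length.
train = rng.normal(loc=3.0, scale=2.0, size=(10, 512, 128))
test_clip = rng.normal(loc=3.0, scale=2.0, size=(937, 128))

# One mean and one standard deviation per frequency bin,
# pooled over every frame of every training crop.
frames = train.reshape(-1, train.shape[-1])   # shape (10 * 512, 128)
mean = frames.mean(axis=0)                    # shape (128,)
std = frames.std(axis=0) + 1e-8               # small constant avoids division by zero

# Broadcasting applies the same 128 statistics to each frame,
# so the number of time frames in a clip does not matter.
train_norm = (train - mean) / std
test_norm = (test_clip - mean) / std
```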

If you have a 1D vector of features, taking a single mean and variance over all variables will still normalize it to some extent; this works very well in computer vision. It also reduces the space cost of the normalization, since fewer statistics need to be stored.
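
A minimal sketch of this scalar variant, assuming the statistics are taken over every value in the training set (shapes and data are hypothetical). Because only two numbers are stored, the same function can be applied to inputs of any shape.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: fixed-size training crops and a longer full-length test clip.
train = rng.normal(loc=-20.0, scale=5.0, size=(10, 512, 128))
test_clip = rng.normal(loc=-20.0, scale=5.0, size=(1200, 128))

# A single scalar mean and standard deviation over all training values.
mu = float(train.mean())
sigma = float(train.std()) + 1e-8  # small constant avoids division by zero

def normalize(x):
    """Scale any array with the training scalars; the shape of x is irrelevant."""
    return (x - mu) / sigma

test_norm = normalize(test_clip)
```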


Pedro Henrique Monforte


In my case, I am dealing with mono-channel audio. The number of frequency bins is fixed at 128, but the number of time frames differs for each audio clip. I tried to normalize across frequency bins, i.e., I calculated the mean as a vector of length 128 (one mean per frequency bin), but it is not working. – skiii gairola May 02 '21 at 10:53
