I am very new to ML and am having trouble with feature normalization. I understand from the post that we should normalize the training features and then scale the validation/test features using the statistics computed on the training data. My problem is with the implementation: my training samples have a fixed dimension, but the validation and test samples have variable length. I can apply zero-mean, unit-variance normalization to the training data, but I am not sure how to normalize the validation/test samples when their length is not fixed.
-
Can you explain why your validation samples have a different dimension from the training data? Many ML algorithms rely on the assumption that the training, validation, and test data come from the same underlying distribution. – Jayaram Iyer Apr 28 '21 at 04:19
-
Can you explain why training and test data are different? In my understanding, this can bring some issues, as your system has been trained with a different distribution of data. – Raul Alvarez Apr 28 '21 at 06:51
-
@RaulAlvarez The [paper](https://arxiv.org/pdf/1911.06878.pdf) I am trying to implement says that their model uses fixed-size (512, 128) samples during training and the complete audio clip as a single sample during validation and testing. – skiii gairola May 02 '21 at 11:53
2 Answers
The easiest way is to pad your data to the same length. For example, bring all training, validation, and test samples to the same length by adding zeros at the beginning or end of each one; that should solve your problem. You can refer to this Keras example for a better idea.
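To make the padding idea concrete, here is a minimal NumPy sketch. It assumes each clip is a 2D array laid out as (frequency bins, time frames); the helper name `pad_to_length`, the target of 512 frames, and the dummy clips are all hypothetical choices for illustration, not something taken from the question or the paper.

```python
import numpy as np

def pad_to_length(clip, target_frames, pad_value=0.0):
    """Right-pad (or truncate) a (n_bins, n_frames) array along the time axis.

    Hypothetical helper: the layout and the truncation behaviour are assumptions.
    """
    n_frames = clip.shape[1]
    if n_frames >= target_frames:
        # Clip is already long enough: keep only the first target_frames frames.
        return clip[:, :target_frames]
    # Pad nothing on the frequency axis, and only at the end of the time axis.
    pad_width = ((0, 0), (0, target_frames - n_frames))
    return np.pad(clip, pad_width, mode="constant", constant_values=pad_value)

# Dummy clips: 128 frequency bins, varying numbers of time frames.
clips = [np.ones((128, 300)), np.ones((128, 512)), np.ones((128, 700))]

# After padding/truncating, the clips can be stacked into one array.
batch = np.stack([pad_to_length(c, 512) for c in clips])
print(batch.shape)
```

Note that padded zeros will pull the mean and variance towards zero if you compute statistics after padding, so it is probably safer to compute the normalization statistics on the unpadded data.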
DaCard
This is a common situation in image and audio processing. You need to normalize along dimensions that stay the same across samples, for example per channel.
If you have a 1D vector of features, taking the mean and variance over all variables will still normalize it, in a way; this works like a charm in computer vision. It also reduces the space cost of your normalization step.
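For spectrogram-like data, one way to do this could look like the sketch below: compute one mean and one standard deviation per frequency bin from the fixed-size training samples, then apply them to a clip with any number of time frames via broadcasting. The (samples, bins, frames) layout, the function names `fit_bin_stats` and `normalize_clip`, and the random data are assumptions made for illustration only.

```python
import numpy as np

def fit_bin_stats(train_samples, eps=1e-8):
    """Per-frequency-bin mean and std from training data.

    train_samples: array of shape (n_samples, n_bins, n_frames) (assumed layout).
    Statistics are pooled over samples and time frames, leaving one value per bin.
    """
    mean = train_samples.mean(axis=(0, 2))
    std = train_samples.std(axis=(0, 2))
    # Guard against division by zero for constant bins.
    return mean, np.maximum(std, eps)

def normalize_clip(clip, mean, std):
    """Normalize an array whose last two axes are (n_bins, n_frames).

    n_frames may differ from the training length, because the statistics
    only depend on the frequency-bin axis.
    """
    return (clip - mean[:, None]) / std[:, None]

rng = np.random.default_rng(0)

# Dummy training set: 16 fixed-size samples, 128 bins, 512 frames each.
train = rng.normal(loc=3.0, scale=2.0, size=(16, 128, 512))
mean, std = fit_bin_stats(train)
train_norm = normalize_clip(train, mean, std)

# Dummy test clip with a different number of time frames.
test_clip = rng.normal(loc=3.0, scale=2.0, size=(128, 937))
test_norm = normalize_clip(test_clip, mean, std)
print(mean.shape, test_norm.shape)
```

The point of the design is that the statistics have length 128 regardless of clip duration, so the same training-set values can be reused for validation and test clips of any length.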
Pedro Henrique Monforte
-
In my case, I am dealing with mono-channel audio. The number of frequency bins is fixed at 128, but the number of time frames differs between audio clips. I tried normalizing per frequency bin, i.e., I computed the mean as a vector of length 128 (one mean per frequency bin), but it is not working. – skiii gairola May 02 '21 at 10:53