It is customary to normalize feature variables, and this usually improves the performance of a neural network, in particular a CNN. I was wondering whether normalizing the target could also help. I did not notice any improvement with the data set I am using at the moment, but I was curious whether anyone has tried it in the past. Of course, the normalization statistics are computed on the training data only.
1 Answer
One reason for normalising the inputs is to make gradient descent more stable: the gradients spend more time in a comfortable region with meaningful updates, and fewer neurons 'die' during training by getting stuck in one of the tails of e.g. the sigmoid non-linearity.
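As a quick numeric illustration of that point (a sketch, not part of the original answer), the gradient of the sigmoid collapses once a pre-activation drifts into either tail, which is what happens easily with unscaled inputs:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # Derivative of the sigmoid: s(x) * (1 - s(x))
    s = sigmoid(x)
    return s * (1.0 - s)

# Gradient at the centre vs. in the tails of the non-linearity.
for z in [0.0, 2.0, 10.0, 50.0]:
    print(f"pre-activation {z:5.1f} -> gradient {sigmoid_grad(z):.2e}")
# At 0 the gradient is 0.25; by 10 it is ~4.5e-05, and the neuron is effectively 'dead'.
```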
Normalising the output distribution is perhaps not the best idea, as you are by definition altering the definition of the target. This means you are essentially predicting a distribution that doesn't mirror your real-world target (at least not without applying some reverse transform to the predictions later on).
One thing you could do would be to scale the target instead of normalising it. The shape of the distribution remains essentially identical (linear scaling preserves it), but the values themselves may be more easily attainable and therefore faster to optimise for, since they are closer in magnitude to the gradients being computed.
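A minimal sketch of that idea, assuming scikit-learn and a generic regressor as a stand-in for the network: fit the scaler on the training targets only, train in the scaled space, then invert the scale on the predictions so they are back in real-world units.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.linear_model import LinearRegression  # stand-in for any regressor / NN

# Hypothetical data: 200 samples, 5 features, a wide-range target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 1000.0 * X[:, 0] + 50.0 * rng.normal(size=200)

X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

# Fit the target scaler on the training set only (no test leakage).
y_scaler = MinMaxScaler()
y_train_scaled = y_scaler.fit_transform(y_train.reshape(-1, 1)).ravel()

model = LinearRegression()
model.fit(X_train, y_train_scaled)

# Predictions come out in the scaled space; map them back to the original units.
preds_scaled = model.predict(X_test).reshape(-1, 1)
preds = y_scaler.inverse_transform(preds_scaled).ravel()

print("Prediction range (original units):", preds.min(), preds.max())
```

Because `MinMaxScaler` applies a purely linear transform, the shape of the target distribution is preserved and the original scale is recovered exactly by `inverse_transform`.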