
I have a sparse matrix of count data that I'm using as input to a neural network.

I know that input data should usually be normalized (e.g. via min-max scaling, $z$-score standardization, etc.). But what is a good approach for features that are counts? Should I apply a $\log_2(x+1)$ transform and then do a $z$-score standardization? Is there a better approach?
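For concreteness, a minimal sketch of the transform described in the question, assuming the counts live in a hypothetical SciPy CSR matrix `X_counts`. Note that $\log_2(x+1)$ preserves sparsity (zeros stay zero), while $z$-scoring centers each feature and therefore densifies the matrix:

```python
import numpy as np
from scipy import sparse

# Hypothetical sparse count matrix (samples x features); mostly zeros.
X_counts = sparse.csr_matrix(np.array([[0, 3, 0, 10],
                                       [5, 0, 0,  2],
                                       [0, 0, 7,  0]], dtype=float))

# log2(x + 1): zeros map to zero, so the transform can be applied to the
# stored (nonzero) values only and the matrix stays sparse.
X_log = X_counts.copy()
X_log.data = np.log2(X_log.data + 1)

# z-score standardization: subtracting the per-feature mean fills in the
# zeros, so the matrix becomes dense at this step (fine if it fits in memory).
X_dense = X_log.toarray()
mu = X_dense.mean(axis=0)
sigma = X_dense.std(axis=0)
sigma[sigma == 0] = 1.0   # guard against constant (all-zero) features
X_scaled = (X_dense - mu) / sigma
print(X_scaled)
```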


1 Answer


One option is to convert counts to rates. Rates are always bounded between 0 and 1. For example, instead of a count of 100 events, the data could be encoded as a rate of 0.10 (100 events out of 1,000 opportunities).
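A short sketch of that conversion, assuming you know the number of "opportunities" per sample (the `opportunities` array below is hypothetical). Dividing each row by its own denominator keeps the matrix sparse, since zero counts stay zero:

```python
import numpy as np
from scipy import sparse

# Hypothetical counts and per-row opportunities (e.g. total trials per sample).
counts = sparse.csr_matrix(np.array([[100,  20,  0],
                                     [ 50,   5, 10]], dtype=float))
opportunities = np.array([1000.0, 500.0])   # denominator for each row

# Divide each row by its number of opportunities; rates land in [0, 1].
rates = counts.multiply(1.0 / opportunities[:, None])
print(rates.toarray())
```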

Brian Spiering
  • I like this idea, especially if you have an expected top count. If you divide your counts by the expected top count, you convert your input into universal rates that do not depend on the samples you have. It's the same as when you divide pixel values by 255 before inputting them to a convolutional NN. – AlexSC Aug 16 '23 at 10:59