I am working with the first layer of a CNN and trying to understand how to interpret its activation output. My CNN takes a 3-channel input (an RGB picture) and the first layer is Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False).
From this I understand that every input picture will be convolved with a weight tensor of shape 64x3x7x7 (that is, 64 filters of size 3x7x7, each producing one output channel) in order to derive the output.
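To check my understanding of the output shape, here is a minimal sketch of the standard output-size arithmetic for this layer. The 224x224 input size is only an assumption for illustration (my actual images may differ), the helper name `conv2d_output_size` is made up, and dilation is assumed to be 1.

```python
def conv2d_output_size(size, kernel_size, stride, padding):
    """Spatial output size of a convolution along one dimension (dilation = 1)."""
    return (size + 2 * padding - kernel_size) // stride + 1

# Assumed input resolution, not stated in the question.
in_h, in_w = 224, 224

# Layer parameters taken from Conv2d(3, 64, kernel_size=7, stride=2, padding=3).
out_channels = 64
out_h = conv2d_output_size(in_h, kernel_size=7, stride=2, padding=3)
out_w = conv2d_output_size(in_w, kernel_size=7, stride=2, padding=3)

# One activation map per filter, so 64 maps per input picture.
output_shape = (out_channels, out_h, out_w)
print(output_shape)  # (64, 112, 112)
```

So under that assumption there would be 64 activation maps of 112x112 pixels to visualise for each input picture.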
After this is done I would like to visualise the output (the activations of the layer) but I have the following questions:
Should the activations be normalised for the visualisation, and if so, how? My question essentially boils down to: "Are activation values relative?"
For example, if the maximum activation value in channel 1 is 4 while the maximum in channel 5 is 10, should I normalise each channel by its own maximum (4 and 10 respectively) or by the overall maximum (10)? And how should I interpret the result?
Can activations that show no pixel variability be interpreted as "the pattern that the corresponding weight kernel is looking for is not present in the image"?
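To make the normalisation question concrete, here is a minimal sketch of the two options I am comparing. It assumes the activations for one image have already been converted to a NumPy array of shape (channels, height, width), for example via `.detach().cpu().numpy()`. The toy values and the function names are made up for illustration, and min-max scaling is just one possible choice of normalisation.

```python
import numpy as np

def normalize_per_channel(acts, eps=1e-8):
    """Min-max scale every channel to [0, 1] using that channel's own range."""
    lo = acts.min(axis=(1, 2), keepdims=True)
    hi = acts.max(axis=(1, 2), keepdims=True)
    return (acts - lo) / (hi - lo + eps)  # eps avoids division by zero on flat maps

def normalize_global(acts, eps=1e-8):
    """Min-max scale all channels to [0, 1] using one shared range."""
    lo, hi = acts.min(), acts.max()
    return (acts - lo) / (hi - lo + eps)

# Hypothetical toy activations: two 2x2 channels with maxima 4 and 10.
acts = np.array([[[0.0, 4.0], [2.0, 1.0]],
                 [[0.0, 10.0], [5.0, 2.5]]])

per_channel = normalize_per_channel(acts)  # each channel peaks at about 1.0
global_ = normalize_global(acts)           # channels peak at about 0.4 and 1.0

print(per_channel.max(axis=(1, 2)))
print(global_.max(axis=(1, 2)))
```

As far as I can tell, per-channel scaling makes every map use the full display range but discards the relative magnitude between channels, while global scaling keeps the channels comparable at the cost of making weakly activated maps look almost uniformly dark. Note also that a perfectly flat map stays flat (all zeros) under either scheme.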