I am working with the first layer of a CNN and trying to understand how to interpret its activation output. My CNN takes a 3-channel input (an RGB picture), and the first layer is Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False). As I understand it, every input picture is convolved with a weight tensor of shape 64x3x7x7, i.e. 64 filters of size 3x7x7, producing 64 output channels.
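To check my understanding of the shapes, here is a minimal sketch in plain Python. It assumes a 224x224 input (the input size is my assumption for illustration) and uses the standard output-size formula for a convolution with dilation 1; `conv2d_output_size` is a hypothetical helper name, not a library function.

```python
def conv2d_output_size(size, kernel_size, stride, padding):
    """Output size along one spatial dimension for a convolution with dilation 1."""
    return (size + 2 * padding - kernel_size) // stride + 1


in_channels, out_channels, kernel = 3, 64, 7
height = width = 224  # assumed input size, chosen only for illustration

out_h = conv2d_output_size(height, kernel, stride=2, padding=3)
out_w = conv2d_output_size(width, kernel, stride=2, padding=3)

# One 3x7x7 filter per output channel, and one activation map per filter.
weight_shape = (out_channels, in_channels, kernel, kernel)
activation_shape = (out_channels, out_h, out_w)

print(weight_shape)      # (64, 3, 7, 7)
print(activation_shape)  # (64, 112, 112)
```

So under that assumption, each image would give 64 activation maps of 112x112 to visualise.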
Once this is done, I would like to visualise the output (the activations of the layer), but I have the following questions:

  1. Should the activations be normalised for the visualisation, and if so, how? My question essentially boils down to "Are activations relative?"
    For example, if the maximum activation value in channel 1 is 4 while the maximum in channel 5 is 10, should I normalise each channel by its own maximum (4 and 10 respectively) or both by the overall maximum (10)? And how should I interpret the result in each case?

  2. Can activations that show no pixel variability be interpreted as "the pattern that the corresponding weight kernel is looking for is not present in the image"?
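To make the two normalisation options from question 1 concrete, here is a minimal sketch in plain Python. The activation values are made up to match the maxima 4 and 10 from my example, and `min_max` is a hypothetical helper; this only illustrates the two choices and is not meant as the recommended method.

```python
def min_max(values, lo, hi):
    """Rescale values to [0, 1] over the range [lo, hi]; a constant map becomes all zeros."""
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


# Toy activation maps, flattened per channel (made-up values).
activations = {
    "channel_1": [0.0, 1.0, 2.0, 4.0],
    "channel_5": [0.0, 2.5, 5.0, 10.0],
}

# Option A: each channel is scaled by its own min and max.
per_channel = {
    name: min_max(vals, min(vals), max(vals)) for name, vals in activations.items()
}

# Option B: all channels share the global min and max.
all_values = [v for vals in activations.values() for v in vals]
g_lo, g_hi = min(all_values), max(all_values)
global_scaled = {
    name: min_max(vals, g_lo, g_hi) for name, vals in activations.items()
}

print(max(per_channel["channel_1"]), max(per_channel["channel_5"]))
print(max(global_scaled["channel_1"]), max(global_scaled["channel_5"]))
```

With option A both channels reach full brightness, so the spatial pattern inside each map is easy to see but the maps can no longer be compared in strength. With option B channel 1 peaks at 0.4 of the range, so the relative strength between channels is preserved at the cost of contrast in the weaker map.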

User2321
