
Just to summarize "Understanding dropout and gradient descent" and https://stats.stackexchange.com/questions/207481/dropout-backpropagation-implementation:

Suppose I need to implement inverted dropout in my CNN. All the neuron outputs in the dropout layer during the feedforward phase are multiplied by mask/p, where mask is 0 or 1 and p is the retain rate. But should I apply the same operation (including the division by p) in the backpropagation phase? I suppose the answer is yes (see the second link above), but I need to know for sure.
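
To make the forward pass I am describing concrete, here is a minimal NumPy sketch (the function name and caching of the mask are my own choices, not from any particular library):

```python
import numpy as np

def inverted_dropout_forward(x, p):
    """Inverted dropout forward pass at training time.

    x : activations of the layer (NumPy array)
    p : retain probability (probability that a unit is kept)

    Returns the dropped-out activations and the scaled mask,
    which the backward pass can reuse.
    """
    mask = (np.random.rand(*x.shape) < p) / p  # entries are 0 or 1/p
    out = x * mask                             # same as (binary mask * x) / p
    return out, mask
```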

Serge P.

1 Answer


As given in the links, the answer is yes! Note that you divide the mask by p so that you won't need to multiply by p at test time, and since mask/p is just a constant coefficient on the new activation, it comes out of the derivative in the chain rule during backpropagation.
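
As a minimal sketch of the backward pass (assuming the scaled mask from the forward pass above is cached; names are illustrative, not from any framework):

```python
def inverted_dropout_backward(dout, mask):
    """Backward pass of inverted dropout.

    Since out = x * mask, where mask already holds 0 or 1/p,
    d(out)/dx = mask, so the upstream gradient is scaled by the
    same mask/p factor used in the forward pass.
    """
    return dout * mask
```

At test time the layer is simply the identity (no mask, no scaling), which is exactly what the 1/p factor during training buys you.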