Just to summarize "Understanding dropout and gradient descent" and https://stats.stackexchange.com/questions/207481/dropout-backpropagation-implementation:
Suppose I need to implement inverted dropout in my CNN. During the feedforward phase, every neuron output in the dropout layer is multiplied by mask/p, where mask is 0 or 1 and p is the retain rate. Should I apply the same operation (including the division by p) during the backpropagation phase? I suppose the answer is yes (see the second link above), but I need to know for sure.
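To make the question concrete, here is a minimal NumPy sketch of what I have in mind (function names and signatures are just for illustration). The backward pass reuses the same scaled mask from the forward pass, i.e. it also includes the division by p, which is exactly the point I want to confirm:

    import numpy as np

    def dropout_forward(x, p, train=True):
        """Inverted dropout forward pass.
        p is the retain probability; the mask is scaled by 1/p here,
        so no rescaling is needed at test time."""
        if not train:
            return x, None
        mask = (np.random.rand(*x.shape) < p) / p  # entries are 0 or 1/p
        return x * mask, mask

    def dropout_backward(dout, mask):
        """Backward pass: multiply the upstream gradient by the same
        scaled mask used in the forward pass (division by p included)."""
        return dout * mask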