Is applying dropout equivalent to zeroing the output of random neurons in each mini-batch iteration, leaving the rest of the forward and backward steps of back-propagation unchanged? I'm implementing a network from scratch in numpy.
- Yes, although just to be super-duper-extra precise, *Bernoulli* dropout is the same as zeroing out random neurons (some people use other kinds of randomness and call it things like Gaussian dropout; see e.g. https://keras.io/api/layers/regularization_layers/gaussian_dropout/) – John Madden Nov 09 '22 at 21:30
- @Qbik please see the edits to my reply below. – hH1sG0n3 Nov 10 '22 at 09:07
1 Answer
Indeed. To be precise, the dropout operation randomly zeroes elements of the input tensor with probability $p$, and during training the remaining (non-dropped) elements are scaled by a factor of $\frac{1}{1-p}$.
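In expectation this leaves the activations unchanged: each element survives with probability $1-p$ and is multiplied by $\frac{1}{1-p}$, so $\mathbb{E}[\text{output}] = (1-p) \cdot \frac{x}{1-p} = x$.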
For example, see how elements of the input tensor (the top tensor in the printout) are zeroed in the output tensor (the bottom tensor) using PyTorch.
import torch
import torch.nn as nn

m = nn.Dropout(p=0.5)       # each element is zeroed with probability 0.5
input = torch.randn(3, 4)
output = m(input)           # surviving elements are scaled by 1/(1-p) = 2
print(input, '\n', output)
>>> tensor([[-0.9698, -0.9397, 1.0711, -1.4557],
>>> [-0.0249, -0.9614, -0.7848, -0.8345],
>>> [ 0.9420, 0.6565, 0.4437, -0.2312]])
>>> tensor([[-0.0000, -0.0000, 2.1423, -0.0000],
>>> [-0.0000, -0.0000, -1.5695, -1.6690],
>>> [ 0.0000, 0.0000, 0.0000, -0.0000]])
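Since the question mentions implementing the network from scratch in numpy, here is a minimal sketch of the same (inverted) dropout in numpy, assuming the sampled mask is cached so the backward pass can reuse it; the function names and signatures below are illustrative, not from any library.

import numpy as np

def dropout_forward(x, p=0.5, training=True):
    # Inverted dropout: zero each element with probability p and scale the
    # survivors by 1/(1-p) so the expected activation is unchanged.
    if not training or p == 0.0:
        return x, np.ones_like(x)
    mask = (np.random.rand(*x.shape) >= p) / (1.0 - p)
    return x * mask, mask

def dropout_backward(dout, mask):
    # Gradients flow only through the kept units, scaled by the same
    # 1/(1-p) factor that is baked into the mask.
    return dout * mask

At test time, call it with training=False (or skip the layer entirely); no extra rescaling is needed because the scaling was already applied during training.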
EDIT: please note the post has been updated to reflect Todd Sewell's addition in the comments.
hH1sG0n3
- Note that non-dropped-out elements are scaled by 1/(1-p) to compensate for the shift in average magnitude, so it's not _just_ zeroing out some elements. – Todd Sewell Nov 09 '22 at 20:12
- That is very true; I omitted that info from the original PyTorch docs for simplicity, but that made the post only half correct. Amended now to reflect your point. – hH1sG0n3 Nov 09 '22 at 22:17