
Is applying dropout equivalent to zeroing the output of random neurons in each mini-batch iteration, while leaving the rest of the forward and backward steps of back-propagation unchanged? I'm implementing the network from scratch in numpy.

  • Yes, although just to be super-duper-extra precise, *Bernoulli* dropout is the same as zeroing-out random neurons (some people use other kinds of randomness and call it things like Gaussian dropout; see e.g. https://keras.io/api/layers/regularization_layers/gaussian_dropout/) – John Madden Nov 09 '22 at 21:30
  • @Qbik please see edits to my reply below. – hH1sG0n3 Nov 10 '22 at 09:07

1 Answer


Indeed. To be precise, the dropout operation randomly zeroes elements of the input tensor with probability $p$; furthermore, the remaining (non-dropped) elements are scaled by a factor of $\frac{1}{1-p}$ during training, so that the expected value of each activation stays the same.

For example, see how elements of the input tensor (the top tensor in the printout) are zeroed, and the surviving elements scaled, in the output tensor (the bottom tensor in the printout) using PyTorch.

import torch
import torch.nn as nn

# dropout module with drop probability p = 0.5 (training mode by default)
m = nn.Dropout(p=0.5)
input = torch.randn(3, 4)
output = m(input)

print(input, '\n', output)

>>> tensor([[-0.9698, -0.9397,  1.0711, -1.4557],
>>>        [-0.0249, -0.9614, -0.7848, -0.8345],
>>>        [ 0.9420,  0.6565,  0.4437, -0.2312]]) 
>>> tensor([[-0.0000, -0.0000,  2.1423, -0.0000],
>>>        [-0.0000, -0.0000, -1.5695, -1.6690],
>>>        [ 0.0000,  0.0000,  0.0000, -0.0000]])
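
For a from-scratch numpy implementation like the one described in the question, the same train-time behaviour can be sketched roughly as follows (a minimal illustration only; dropout_forward and its variable names are mine, not from any library):

import numpy as np

def dropout_forward(x, p=0.5):
    # Bernoulli mask: each element is kept with probability 1 - p
    mask = np.random.rand(*x.shape) >= p
    # inverted dropout: scale the kept elements by 1/(1-p) so the
    # expected value of each activation is unchanged
    out = x * mask / (1.0 - p)
    return out, mask   # cache the mask for the backward pass

x = np.random.randn(3, 4)
out, mask = dropout_forward(x, p=0.5)
print(x, '\n', out)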

EDIT: please note the post has been updated to reflect Todd Sewell's addition in the comments.
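
Regarding the back-propagation part of the question: since train-time dropout is just an elementwise multiplication by mask / (1 - p), its backward step is the same multiplication applied to the incoming gradient. Continuing the illustrative numpy sketch above:

def dropout_backward(dout, mask, p=0.5):
    # dropout multiplies by mask/(1-p) in the forward pass, so the
    # upstream gradient gets the same elementwise multiplication
    return dout * mask / (1.0 - p)

At test time dropout is simply the identity (no mask, no scaling), which is why the 1/(1-p) scaling is applied during training rather than at inference.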

hH1sG0n3
  • Note that non-dropped-out elements are scaled by 1/(1-p) to compensate for the shift in average magnitude, so it's not _just_ zeroing out some elements. – Todd Sewell Nov 09 '22 at 20:12
  • That is very true; I omitted that detail from the original PyTorch docs for simplicity, but that made the post only half correct. Amended now to reflect your point. – hH1sG0n3 Nov 09 '22 at 22:17