
Is applying dropout equivalent to zeroing the output of random neurons in each mini-batch iteration, while leaving the rest of the forward and backward steps of back-propagation unchanged? I'm implementing the network from scratch in numpy.

  • Yes, although just to be super-duper-extra precise, *Bernoulli* dropout is the same as zeroing-out random neurons (some people use other kinds of randomness and call it things like Gaussian dropout; see e.g. https://keras.io/api/layers/regularization_layers/gaussian_dropout/) – John Madden Nov 09 '22 at 21:30
  • @Qbik please see edits to my reply below. – hH1sG0n3 Nov 10 '22 at 09:07

1 Answer


Indeed. To be precise, the dropout operation randomly zeroes elements of the input tensor with probability $p$; furthermore, the remaining (non-dropped) elements are scaled by a factor of $\frac{1}{1-p}$ during training, so that the expected value of each activation stays the same.

For example, see how elements of the input tensor (the top tensor in the printout) are zeroed, and the surviving elements scaled, in the output tensor (the bottom tensor in the printout) using PyTorch.

import torch
import torch.nn as nn

# dropout module with drop probability p = 0.5 (training mode by default)
m = nn.Dropout(p=0.5)
input = torch.randn(3, 4)
output = m(input)

print(input, '\n', output)

>>> tensor([[-0.9698, -0.9397,  1.0711, -1.4557],
>>>        [-0.0249, -0.9614, -0.7848, -0.8345],
>>>        [ 0.9420,  0.6565,  0.4437, -0.2312]]) 
>>> tensor([[-0.0000, -0.0000,  2.1423, -0.0000],
>>>        [-0.0000, -0.0000, -1.5695, -1.6690],
>>>        [ 0.0000,  0.0000,  0.0000, -0.0000]])
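
For a from-scratch numpy implementation like the one described in the question, the same train-time behaviour can be sketched roughly as follows (a minimal illustration only; dropout_forward and its variable names are mine, not from any library):

import numpy as np

def dropout_forward(x, p=0.5):
    # Bernoulli mask: each element is kept with probability 1 - p
    mask = np.random.rand(*x.shape) >= p
    # inverted dropout: scale the kept elements by 1/(1-p) so the
    # expected value of each activation is unchanged
    out = x * mask / (1.0 - p)
    return out, mask   # cache the mask for the backward pass

x = np.random.randn(3, 4)
out, mask = dropout_forward(x, p=0.5)
print(x, '\n', out)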

EDIT: please note the post has been updated to reflect Todd Sewell's addition in the comments.
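
Regarding the back-propagation part of the question: since train-time dropout is just an elementwise multiplication by mask / (1 - p), its backward step is the same multiplication applied to the incoming gradient. Continuing the illustrative numpy sketch above:

def dropout_backward(dout, mask, p=0.5):
    # dropout multiplies by mask/(1-p) in the forward pass, so the
    # upstream gradient gets the same elementwise multiplication
    return dout * mask / (1.0 - p)

At test time dropout is simply the identity (no mask, no scaling), which is why the 1/(1-p) scaling is applied during training rather than at inference.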

hH1sG0n3
  • Note that non-dropped-out elements are scaled by 1/(1-p) to compensate for the shift in average magnitude, so it's not _just_ zeroing out some elements. – Todd Sewell Nov 09 '22 at 20:12
  • That is very true; I omitted that detail from the original PyTorch docs for simplicity, but that made the post only half correct. Amended now to reflect your point. – hH1sG0n3 Nov 09 '22 at 22:17