
Dropout and weight decay are both regularization techniques. In my experience, dropout has been more widely used over the last few years. Are there scenarios where weight decay shines more than dropout?

David Masip

1 Answer


These techniques are not mutually exclusive; combining dropout with weight decay has become standard practice in deep learning.
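
A minimal sketch of that combination, assuming PyTorch (the architecture, dropout rate, and decay strength below are purely illustrative):

```python
import torch
import torch.nn as nn

# Small MLP with dropout between layers; sizes and p are illustrative.
model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Dropout(p=0.5),  # dropout regularizes the hidden activations
    nn.Linear(256, 10),
)
model.train()  # dropout is only active in training mode

# Weight decay enters through the optimizer's L2 penalty on the weights.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=1e-4)

# One training step on random stand-in data.
x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
loss = nn.functional.cross_entropy(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```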

However, whereas weight decay applies a penalty that grows only linearly, dropout can cause the effective penalty to grow exponentially. This property of dropout can lead to the failure cases proposed and proven in Section 4.2 of this paper.
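
For reference, the weight-decay side of that comparison is the familiar L2-penalized objective (generic notation rather than the paper's: $\mathcal{L}$ is the training loss, $w$ the weights, $\lambda$ the decay coefficient):

$$\min_{w} \; \mathcal{L}(w) + \frac{\lambda}{2} \lVert w \rVert_2^2$$

Dropout admits no comparably simple closed form in general; the penalty it induces depends on the data and the network's activations, which is part of why it can behave so differently.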

In general, research has consistently shown the benefits of dropout (with and without weight decay) for training deep networks. A practical scenario in which weight decay is preferred to the exclusion of dropout would be quite the anomaly.

Ben