28

In Keras, there are two main ways to reduce over-fitting: L1/L2 weight regularization or a dropout layer.

What are some situations in which to use L1/L2 regularization instead of a dropout layer? What are some situations in which a dropout layer is better?

n1k31t4
user781486
  • L1, L2 regularizers don't really work well in NNs. A good intuition is given by lecture 2, 3 or 4 (I am not sure which) of the Stanford CNN course. Just search for it on YouTube – DuttaA Aug 23 '18 at 17:36
  • I would say that these aren't `in Keras`, but rather in ML, statistics etc. – Enzo Dtz Jul 05 '22 at 02:02

2 Answers

24

I am not sure there is a formal way to show which is best in which situations - simply trying out different combinations is likely best!

It is worth noting that Dropout actually does a little more than just provide a form of regularisation: it adds robustness to the network, effectively letting it try out many different sub-networks. This is because the randomly deactivated neurons are essentially removed for that forward/backward pass, giving the same effect as if you had used a totally different network! Have a look at this post for a few more pointers regarding the beauty of dropout layers.
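
As a rough illustration (not part of the original answer; the layer sizes, dropout rate and input dimension below are arbitrary), this is how a dropout layer is typically inserted in Keras:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Toy binary classifier; all sizes and the dropout rate are arbitrary choices.
model = keras.Sequential([
    keras.Input(shape=(20,)),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),   # randomly deactivates 50% of these units on each training step
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# At inference time (model.predict / model.evaluate) the dropout layers are
# automatically switched off, so the full network is used.
```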

$L_1$ versus $L_2$ is easier to explain: $L_2$ penalises outliers more heavily, returning a larger error for those points. Have a look here for more detailed comparisons.
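
For reference, these are the standard forms of the two penalty terms added to the loss (my notation, with $\lambda$ the regularisation strength and $w_i$ the weights):

$$L_1 \text{ penalty: } \lambda \sum_i |w_i| \qquad\qquad L_2 \text{ penalty: } \lambda \sum_i w_i^2$$

Because the $L_2$ term squares its argument (just as the squared-error loss squares the residuals), large values are penalised disproportionately, which is why it comes down harder on outliers.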

n1k31t4
  • When I use L2 regularization, the loss increases. Do you know why? – N.IT Sep 05 '18 at 11:44
  • If your $L_2$ loss increases, so would your $L_1$ loss (just perhaps more slowly). Either way, your model is diverging from a minimum in the loss curve i.e. it isn't learning. You might want to think about other parts of your model again, such as the architecture, data preprocessing or class-imbalance in your data. Those keywords might help you in the right direction with some searches. – n1k31t4 Sep 05 '18 at 13:15
  • Also, dropout can cause validation accuracy sometimes to be higher than train accuracy, which is indicative of a good performance. – Anshuman Kumar Jun 06 '20 at 02:47
  • @AnshumanKumar **which is indicative of a good performance.** Can you elaborate this a bit more? – stuckoverflow Apr 13 '21 at 19:49
  • can we use L1 regularization and dropout together? – MUK May 16 '22 at 12:06
4

It seems deciding between L2 and Dropout is a "guess and check" type of thing, unfortunately. Both are used to make the network more "robust" and reduce overfitting by preventing the network from relying too heavily on any given neuron, i.e. it is generally believed to be better to have many neurons contributing to a model's output rather than a select few. L2 and Dropout achieve this by different means, so you really have to play around to see which gives you the best result.

Dropout randomly mutes some percentage of neurons (provided by you) each forward pass through the network, forcing the network to diversify.

L2 reduces the contribution of outlier weights (those significantly larger than the rest) and prevents any one weight from exploding. This also forces the network to diversify.

L1 should really be in its own category, as it is most useful for feature selection and small networks. It almost does the opposite of L2 and Dropout by simplifying the network and muting some neurons (driving their weights to exactly zero).
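
As a rough sketch of what this looks like in Keras (the penalty strengths here are arbitrary placeholders):

```python
from tensorflow.keras import layers, regularizers

# L2 shrinks all weights towards zero but rarely makes them exactly zero.
dense_l2 = layers.Dense(64, activation="relu",
                        kernel_regularizer=regularizers.l2(1e-4))

# L1 pushes many weights to exactly zero, effectively muting some connections.
dense_l1 = layers.Dense(64, activation="relu",
                        kernel_regularizer=regularizers.l1(1e-4))

# Both penalties can also be combined in a single layer.
dense_both = layers.Dense(64, activation="relu",
                          kernel_regularizer=regularizers.l1_l2(l1=1e-5, l2=1e-4))
```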

If you notice that adding a small regularization decreases your accuracy / increases your loss, it's probably because your network was overfitting.

ACB_prgm