Questions tagged [regularization]

Inclusion of additional constraints (typically a penalty for complexity) in the model fitting process. Used to prevent overfitting / enhance predictive accuracy.

Regularization refers to the inclusion of additional components in the model fitting process that are used to prevent overfitting and/or stabilize parameter estimates.

Parametric approaches to regularization typically add terms to the training error or MLE objective function that penalize model complexity, in addition to the standard data misfit terms (e.g. Ridge Regression, LASSO). This penalty can be interpreted as arising from a prior on the parameter vector in the framework of Bayesian MAP estimation.
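
As a concrete illustration of the penalty interpretation, using standard textbook notation (not tied to any specific question below), the Ridge and LASSO estimates solve

$$\hat{\beta}_{\text{ridge}} = \arg\min_{\beta} \|y - X\beta\|_2^2 + \lambda \|\beta\|_2^2, \qquad \hat{\beta}_{\text{lasso}} = \arg\min_{\beta} \|y - X\beta\|_2^2 + \lambda \|\beta\|_1.$$

In the MAP view, the $L_2$ penalty corresponds to a zero-mean Gaussian prior on $\beta$ and the $L_1$ penalty to a Laplace prior, with $\lambda$ determined by the prior scale and the noise variance.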

Non-parametric regularization techniques include dropout (used in deep learning) and truncated-SVD (used in linear least squares).
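
A minimal sketch of the truncated-SVD idea for linear least squares, using NumPy (the function name and the cutoff `k` are illustrative, not taken from any question below):

```python
import numpy as np

def truncated_svd_solve(A, b, k):
    """Least-squares solution to A x ~ b using only the k largest singular values.

    Dropping the small singular values regularizes the solution: directions in
    which A is nearly rank-deficient, which would otherwise blow up the
    estimate, are simply discarded.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_inv = np.zeros_like(s)
    s_inv[:k] = 1.0 / s[:k]            # invert only the k leading singular values
    return Vt.T @ (s_inv * (U.T @ b))  # x = V diag(s_inv) U^T b
```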

Synonyms include: penalization, shrinkage methods, and constrained fitting.

172 questions
28 votes, 2 answers

When should one use L1, L2 regularization instead of a dropout layer, given that both serve the same purpose of reducing overfitting?

In Keras, there are two methods to reduce overfitting: L1/L2 regularization and dropout layers. In what situations should one use L1/L2 regularization instead of a dropout layer? In what situations is a dropout layer better?
user781486
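
As context for the question above, a minimal Keras sketch of the two options, weight penalties versus a dropout layer (TensorFlow 2.x is assumed; layer sizes, penalty strengths, and the dropout rate are illustrative):

```python
import tensorflow as tf

# Option 1: L1/L2 weight penalties added to the layer's loss contribution.
penalized_layer = tf.keras.layers.Dense(
    64, activation="relu",
    kernel_regularizer=tf.keras.regularizers.l1_l2(l1=1e-5, l2=1e-4),
)

# Option 2: a dropout layer that randomly zeroes activations during training.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1),
])
```
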
18 votes, 3 answers

L1 & L2 Regularization in Light GBM

This question pertains to L1 & L2 regularization parameters in Light GBM. As per official documentation: reg_alpha (float, optional (default=0.)) – L1 regularization term on weights. reg_lambda (float, optional (default=0.)) – L2 regularization term…
Vikrant Arora
15 votes, 5 answers

Why does adding a dropout layer improve deep/machine learning performance, given that dropout suppresses some neurons from the model?

If removing some neurons results in a better performing model, why not use a simpler neural network with fewer layers and fewer neurons in the first place? Why build a bigger, more complicated model in the beginning and suppress parts of it later?
user781486
13 votes, 2 answers

Why use L1 regularization over L2?

When fitting a linear regression model with a loss function, why should I use $L_1$ instead of $L_2$ regularization? Is it better at preventing overfitting? Is it deterministic (so always a unique solution)? Is it better at feature selection (because…
astudentofmaths
11 votes, 4 answers

Choosing regularization method in neural networks

When training neural networks, there are at least four ways to regularize the network: L1 regularization, L2 regularization, dropout, and batch normalization, plus of course other things like weight sharing and reducing the number of connections, which…
Thomas Johnson
10 votes, 2 answers

Are there studies which examine dropout vs other regularizations?

Are there any published papers which compare regularization methods for neural networks, preferably across different domains (or at least different datasets)? I am asking because I currently have the feeling that most people seem to use…
7 votes, 2 answers

Why use regularization instead of decreasing the model's capacity?

Regularization is used to decrease the capacity of a machine learning model to avoid overfitting. Why don't we just use a model with less capacity (e.g. decrease the number of layers)? This would also reduce computation time and memory. My…
7 votes, 2 answers

Light GBM Regressor, L1 & L2 Regularization and Feature Importances

I want to know how L1 & L2 regularization work in Light GBM and how to interpret the feature importances. The scenario is: I used LGBM Regressor with RandomizedSearchCV (cv=3, iterations=50) on a dataset of 400000 observations & 160 variables. In order…
Vikrant Arora
7 votes, 1 answer

Dropout vs weight decay

Dropout and weight decay are both regularization techniques. From my experience, dropout has been more widely used in the last few years. Are there scenarios where weight decay shines more than dropout?
7 votes, 3 answers

Convolutional Neural Network overfitting

I built a CNN to learn to classify EEG data (only about 4000 training examples, 2 classes, 50-50 class balance). Each training example is 64x512, with 5 channels each. I've tried to keep the network as simple/small as possible for testing: ConvLayer…
7 votes, 1 answer

Regularization practice with ANNs

I have learned from some examples that a regularization option exists for ANNs (specifically, in the Keras implementation). As far as I know, regularization in general is a kind of "penalty" on parameters to limit model complexity and…
Hendrik
7 votes, 2 answers

Understanding regularization

I'm currently trying to understand regularization for logistic regression. So far, I'm not quite sure whether I really got it. Basically, the problem is that when we add additional features to a model we might overfit the training set. This leads…
Golo Roden
6 votes, 1 answer

Difference between L1 and L2 regularization

I have seen it said in different places that L1 regularization penalizes weights more than L2. But the derivative of the L1 norm is $\lambda$ and that of the L2 norm is $2\lambda w$. So L1 regularization subtracts a smaller value than L2. Then why is it said that L1…
shaifali Gupta
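
For reference, the gradient comparison behind the question above, in standard notation (not taken from the question itself): for a single weight $w$,

$$\frac{\partial}{\partial w}\,\lambda |w| = \lambda\,\operatorname{sign}(w) \quad (w \neq 0), \qquad \frac{\partial}{\partial w}\,\lambda w^2 = 2\lambda w,$$

so the $L_1$ gradient keeps a constant magnitude $\lambda$ as $w \to 0$, while the $L_2$ gradient shrinks in proportion to $w$; this is why $L_1$ tends to push small weights exactly to zero.
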
6 votes, 2 answers

Understanding XGBoost Training (Multi-class classification)

I have been working with XGBoost for classification (multi-class classification: 6 classes). I use 5-fold CV to train and validate my model. Please refer to the parameters which I used in my model: params = {"objective": 'multi:softprob',…
Mari
6 votes, 1 answer

Which regularization in convolution layers (conv2D)

I am using Keras for a project. I would like to know if it makes any sense to add any kind of regularization component such as kernel, bias or activity regularization in convolutional layers, i.e. Conv2D in Keras. If yes, then which regularization…
Arka Mallick