Questions tagged [lasso]

Least Absolute Shrinkage and Selection Operator (LASSO) regression is a regularization technique used in regression cases where the model overfits or there is high multicollinearity. It has one tuning parameter, $\lambda$; as this value is increased, the estimates are shrunk closer and closer to zero. It differs from ridge regression in that coefficients can be shrunk exactly to zero, which makes lasso regression useful for feature selection.

It is defined by:

$$SSE_{L_1} = \sum_{i=1}^{n}(y_i-\hat{y_i})^2 + \lambda \sum_{j=1}^{P} \lvert \beta_j \rvert$$

where the goal is to reduce model complexity by adding a penalty term to the Sum of Squared Errors (SSE).
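For intuition, here is a minimal sketch (scikit-learn, with made-up toy data) contrasting how the $L_1$ penalty drives some coefficients exactly to zero, while ridge's $L_2$ penalty only shrinks them:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Toy data: 5 informative features and 5 pure-noise features
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 10))
true_coef = np.array([3.0, -2.0, 1.5, 1.0, -0.5, 0, 0, 0, 0, 0])
y = X @ true_coef + rng.normal(scale=0.5, size=200)

lasso = Lasso(alpha=0.1).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

# Lasso sets several of the noise coefficients exactly to zero;
# ridge shrinks them, but they stay (slightly) nonzero.
print(np.sum(lasso.coef_ == 0.0))
print(np.sum(ridge.coef_ == 0.0))
```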

47 questions
4
votes
1 answer

For a square matrix of data, I achieve $R^2=1$ for Linear Regression and $R^2=0$ for Lasso. What's the intuition behind this?

For a square matrix of random data (N columns and N rows), I am fitting two models, linear regression and Lasso. For the linear regression I achieve a perfect score on the train set, while with the Lasso I achieve a score of 0. import pandas as pd import…
Carlos Mougan
  • 6,011
  • 2
  • 15
  • 45
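The setup in this question can be reproduced in a few lines (a sketch with scikit-learn and random data, not the asker's actual code): an unpenalized linear regression can solve an $N \times N$ system exactly, while a sufficiently strong lasso penalty zeroes every coefficient, so the model falls back to predicting the mean and the train $R^2$ is 0.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Lasso

# Square random design: as many features as samples
rng = np.random.default_rng(0)
N = 20
X = rng.normal(size=(N, N))
y = rng.normal(size=N)

ols = LinearRegression().fit(X, y)
lasso = Lasso(alpha=1.0).fit(X, y)

print(ols.score(X, y))    # ~1.0: the square system is fit exactly
print(lasso.score(X, y))  # ~0.0: all coefficients shrunk to zero
```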
4
votes
2 answers

LASSO remaining features for different penalisation

I am using the sklearn LassoCV function and I am changing the penalisation parameter in order to adjust the number of features killed off. For example, for $\alpha = 0.01$ I have 55 features remaining, and for $\alpha=0.5$ I have 6 remaining features.…
prax1telis
  • 141
  • 1
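The effect described here can be checked with a small sweep (a sketch using plain `Lasso` on synthetic data, not the asker's setup): as $\alpha$ grows, fewer coefficients survive.

```python
import numpy as np
from sklearn.linear_model import Lasso

# 5 informative features out of 30
rng = np.random.default_rng(7)
X = rng.normal(size=(100, 30))
y = X[:, :5] @ np.array([2.0, -1.5, 1.0, 0.8, -0.6]) + rng.normal(scale=0.3, size=100)

# Count nonzero coefficients for increasing penalties
surviving = {a: int(np.sum(Lasso(alpha=a, max_iter=10_000).fit(X, y).coef_ != 0))
             for a in (0.01, 0.1, 0.5)}
print(surviving)  # the counts shrink as alpha grows
```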
4
votes
2 answers

Why does Lasso behave "erratically" when the number of features is greater than the number of training instances?

From the book "Hands-on Machine Learning with Scikit-Learn and TensorFlow 2nd edition" chapter 4: In general, Elastic Net is preferred over Lasso since Lasso may behave erratically when the number of features is greater than the number of …
Moaz Ashraf
  • 141
  • 2
3
votes
1 answer

Difference between PCA and regularisation

Currently, I am confused about PCA and regularisation. I wonder what the difference is between PCA and regularisation, particularly lasso (L1) regression? It seems both of them can do feature selection. I have to admit, I am not quite familiar…
Hang
  • 33
  • 2
2
votes
1 answer

Interpreting machine learning coefficients

My dog show predictive tool is having some trouble with its neural net. Broadly, I start with a couple of factors--age, weight, height, breed (which is a set of dummy variables), a subjective cuteness score--and predict whether the animal will win…
2
votes
3 answers

How does Lasso regression shrink coefficients to zero, and why does ridge regression not shrink coefficients to zero?

How does Lasso regression help with feature selection by shrinking coefficients to zero? I have seen the diagram below. Can anyone please explain in simple terms how to relate the diagram to how Lasso shrinks the…
star
  • 1,411
  • 7
  • 18
  • 29
2
votes
2 answers

How do standardization and normalization impact the coefficients of linear models?

One benefit of creating a linear model is that you can look at the coefficients the model learns and interpret them. For example, you can see which features have the most predictive power and which do not. How, if at all, does feature…
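A quick illustration of why scaling matters for penalized models in particular (a sketch with made-up data where the two features contribute equally to the target but differ in scale by a factor of 100):

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2)) * np.array([1.0, 100.0])      # very different scales
y = X[:, 0] + 0.01 * X[:, 1] + rng.normal(scale=0.1, size=300)  # equal contributions

raw = Lasso(alpha=0.05).fit(X, y).coef_
std = Lasso(alpha=0.05).fit(StandardScaler().fit_transform(X), y).coef_

print(raw)  # very unequal coefficients (~1 and ~0.01) despite equal importance
print(std)  # after standardization both are comparable, so the L1 penalty
            # treats the two features even-handedly
```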
2
votes
1 answer

When should we start using stacking of models?

I am solving a Kaggle contest and my single model has reached a score of 0.121; I'd like to know when to start using ensembling/stacking to improve the score. I used lasso and xgboost, and there obviously must be variance associated with those two…
2
votes
0 answers

Why do you need to use group lasso with categorical variables?

From what I've read, you should use group lasso to either discard all of the dummy-encoded variables (of the category) or keep all of them. If you use normal lasso, then some of the variables in the group can be discarded (set to zero) and some might not,…
Ferus
  • 121
  • 1
2
votes
0 answers

Can I rescale a TF matrix or TF-IDF matrix using StandardScaler prior to Logistic Lasso regression?

I am trying to use Logistic Lasso to classify documents as 1 or 0. I've tried using both the TF matrix and TF-IDF matrix representations of the documents as my predictors. I've found that if I use the StandardScaler function in Python (standardizing…
1
vote
1 answer

Lasso regression not getting better without random features

First of all, I'm new to lasso regression, so sorry if this feels stupid. I'm trying to build a regression model and wanted to use lasso regression for feature selection as I have quite a few features to start with. I started by standardizing all…
Onur Ece
  • 11
  • 1
1
vote
1 answer

Lasso Regression for Feature Importance saying almost every feature is unimportant?

I have a metric (RevenueSoFar) that is a great predictor of my target FinalRevenue, as you'd expect: we tend to get 90-95% of the revenue on day 1, and it can then increase over the next 6 days. Therefore I'm also using…
1
vote
2 answers

What is the meaning of the sparsity parameter

Sparse methods such as LASSO contain a parameter $\lambda$ which is associated with the minimization of the $\ell_1$ norm. The higher the value of $\lambda$ ($>0$), the more coefficients are shrunk to zero. What is unclear to me is how…
Sm1
  • 511
  • 3
  • 17
1
vote
1 answer

Generating artificial data to extend learning set

I have a dataset containing 42 instances (X) and one target Y on which I want to perform LASSO regression. All are continuous and numerical. As the sample size is small, I wish to extend it. I am somewhat aware of algorithms like SMOTE used for extending…
rik
  • 11
  • 1
1
vote
1 answer

How is learning rate calculated in sklearn Lasso regression?

I was applying different regression models to the Kaggle Housing dataset for advanced regression. I am planning to test out lasso, ridge, and elastic net. However, none of these models has a learning-rate parameter. How is the learning rate…