Questions tagged [lasso]

Least Absolute Shrinkage and Selection Operator (LASSO) regression is a regularization technique used in regression cases where the model overfits or there is high multicollinearity. It has one tuning parameter, $\lambda$; as this value is increased, the estimates are shrunk closer and closer to zero. It differs from ridge regression in that coefficients can be shrunk exactly to zero, which makes lasso regression useful for feature selection.

It is defined by:

$$SSE_{L_1} = \sum_{i=1}^{n}(y_i-\hat{y_i})^2 + \lambda \sum_{j=1}^{P} \lvert \beta_j \rvert$$

where the goal is to reduce model complexity by adding a penalty term to the Sum of Squared Errors (SSE).
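For intuition, here is a minimal sketch (scikit-learn, with made-up toy data) contrasting how the $L_1$ penalty drives some coefficients exactly to zero, while ridge's $L_2$ penalty only shrinks them:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Toy data: 5 informative features and 5 pure-noise features
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 10))
true_coef = np.array([3.0, -2.0, 1.5, 1.0, -0.5, 0, 0, 0, 0, 0])
y = X @ true_coef + rng.normal(scale=0.5, size=200)

lasso = Lasso(alpha=0.1).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

# Lasso sets several of the noise coefficients exactly to zero;
# ridge shrinks them, but they stay (slightly) nonzero.
print(np.sum(lasso.coef_ == 0.0))
print(np.sum(ridge.coef_ == 0.0))
```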

47 questions
4
votes
1 answer

For a square matrix of data, I achieve $R^2=1$ for Linear Regression and $R^2=0$ for Lasso. What's the intuition behind this?

For a square matrix of random data (N columns and N rows), I am fitting two models, linear regression and Lasso. For the linear regression I achieve a perfect score on the train set, while with the Lasso I achieve a score of 0. import pandas as pd import…
Carlos Mougan
  • 6,011
  • 2
  • 15
  • 45
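The setup in this question can be reproduced in a few lines (a sketch with scikit-learn and random data, not the asker's actual code): an unpenalized linear regression can solve an $N \times N$ system exactly, while a sufficiently strong lasso penalty zeroes every coefficient, so the model falls back to predicting the mean and the train $R^2$ is 0.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Lasso

# Square random design: as many features as samples
rng = np.random.default_rng(0)
N = 20
X = rng.normal(size=(N, N))
y = rng.normal(size=N)

ols = LinearRegression().fit(X, y)
lasso = Lasso(alpha=1.0).fit(X, y)

print(ols.score(X, y))    # ~1.0: the square system is fit exactly
print(lasso.score(X, y))  # ~0.0: all coefficients shrunk to zero
```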
4
votes
2 answers

LASSO remaining features for different penalisation

I am using the sklearn LassoCV function and I am changing the penalisation parameter in order to adjust the number of features killed off. For example, for $\alpha = 0.01$ I have 55 features remaining, and for $\alpha=0.5$ I have 6 remaining features.…
prax1telis
  • 141
  • 1
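The effect described here can be checked with a small sweep (a sketch using plain `Lasso` on synthetic data, not the asker's setup): as $\alpha$ grows, fewer coefficients survive.

```python
import numpy as np
from sklearn.linear_model import Lasso

# 5 informative features out of 30
rng = np.random.default_rng(7)
X = rng.normal(size=(100, 30))
y = X[:, :5] @ np.array([2.0, -1.5, 1.0, 0.8, -0.6]) + rng.normal(scale=0.3, size=100)

# Count nonzero coefficients for increasing penalties
surviving = {a: int(np.sum(Lasso(alpha=a, max_iter=10_000).fit(X, y).coef_ != 0))
             for a in (0.01, 0.1, 0.5)}
print(surviving)  # the counts shrink as alpha grows
```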
4
votes
2 answers

Why does Lasso behave "erratically" when the number of features is greater than the number of training instances?

From the book "Hands-on Machine Learning with Scikit-Learn and TensorFlow 2nd edition" chapter 4: In general, Elastic Net is preferred over Lasso since Lasso may behave erratically when the number of features is greater than the number of …
Moaz Ashraf
  • 141
  • 2
3
votes
1 answer

Difference between PCA and regularisation

Currently, I am confused about PCA and regularisation. I wonder what the difference is between PCA and regularisation, particularly lasso (L1) regression? It seems both of them can do feature selection. I have to admit, I am not quite familiar…
Hang
  • 33
  • 2
2
votes
1 answer

Interpreting machine learning coefficients

My dog show predictive tool is having some trouble with its neural net. Broadly, I start with a couple of factors--age, weight, height, breed (which is a set of dummy variables), a subjective cuteness score--and predict whether the animal will win…
2
votes
3 answers

How does Lasso regression shrink coefficients to zero, and why does ridge regression not shrink coefficients to zero?

How does Lasso regression help with feature selection by shrinking coefficients to zero? I have seen the diagram below. Can anyone please explain in simple terms how to relate the diagram to how Lasso shrinks the…
star
  • 1,411
  • 7
  • 18
  • 29
2
votes
2 answers

How do standardization and normalization impact the coefficients of linear models?

One benefit of creating a linear model is that you can look at the coefficients the model learns and interpret them. For example, you can see which features have the most predictive power and which do not. How, if at all, does feature…
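A quick illustration of why scaling matters for penalized models in particular (a sketch with made-up data where the two features contribute equally to the target but differ in scale by a factor of 100):

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2)) * np.array([1.0, 100.0])      # very different scales
y = X[:, 0] + 0.01 * X[:, 1] + rng.normal(scale=0.1, size=300)  # equal contributions

raw = Lasso(alpha=0.05).fit(X, y).coef_
std = Lasso(alpha=0.05).fit(StandardScaler().fit_transform(X), y).coef_

print(raw)  # very unequal coefficients (~1 and ~0.01) despite equal importance
print(std)  # after standardization both are comparable, so the L1 penalty
            # treats the two features even-handedly
```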
2
votes
1 answer

When should we start using stacking of models?

I am solving a Kaggle contest and my single model has reached a score of 0.121; I'd like to know when to start using ensembling/stacking to improve the score. I used lasso and xgboost, and there obviously must be variance associated with those two…
2
votes
0 answers

Why do you need to use group lasso with categorical variables?

From what I've read, you should use group lasso to either discard all of the dummy-encoded variables (of the category) or keep all of them. If you use normal lasso, then some of the variables in the group can be discarded (set to zero) and some might not,…
Ferus
  • 121
  • 1
2
votes
0 answers

Can I rescale a TF matrix or TF-IDF matrix using StandardScaler prior to Logistic Lasso regression?

I am trying to use Logistic Lasso to classify documents as 1 or 0. I've tried using both the TF matrix and TF-IDF matrix representations of the documents as my predictors. I've found that if I use the StandardScaler function in Python (standardizing…
1
vote
1 answer

Lasso regression not getting better without random features

First of all, I'm new to lasso regression, so sorry if this feels stupid. I'm trying to build a regression model and wanted to use lasso regression for feature selection as I have quite a few features to start with. I started by standardizing all…
Onur Ece
  • 11
  • 1
1
vote
1 answer

Lasso Regression for Feature Importance saying almost every feature is unimportant?

I have a metric (RevenueSoFar) that is a great predictor of my target FinalRevenue, as you'd expect: we tend to get 90-95% of the revenue on day 1, and it can then increase over the next 6 days. Therefore I'm also using…
1
vote
2 answers

What is the meaning of the sparsity parameter

Sparse methods such as LASSO contain a parameter $\lambda$ which is associated with the minimization of the $\ell_1$ norm. The higher the value of $\lambda$ ($>0$), the more coefficients are shrunk to zero. What is unclear to me is how…
Sm1
  • 511
  • 3
  • 17
1
vote
1 answer

Generating artificial data to extend learning set

I have a dataset containing 42 instances (X) and one target Y on which I want to perform LASSO regression. All are continuous and numerical. As the sample size is small, I wish to extend it. I am somewhat aware of algorithms like SMOTE used for extending…
rik
  • 11
  • 1
1
vote
1 answer

How is learning rate calculated in sklearn Lasso regression?

I was applying different regression models to the Kaggle Housing dataset for advanced regression. I am planning to test out lasso, ridge, and elastic net. However, none of these models has a learning-rate parameter. How is the learning rate…