I have a dataset containing 42 instances (X) and one target Y, on which I want to perform LASSO regression. All variables are continuous and numerical. As the sample size is small, I wish to extend it. I am somewhat aware of algorithms like SMOTE, which are used to extend imbalanced datasets. Is there anything available for my case, where there is no imbalance?
You could use the same technique as SMOTE to generate artificial instances, but it's unlikely to help you obtain a better model. IMHO, generating artificial instances often causes more problems than it solves. – Erwan Aug 23 '20 at 20:46
@Erwan thanks for your answer. Could you briefly explain the problems that might crop up in that case, or point me towards some resources about them? – rik Aug 25 '20 at 06:04
The main issue is that the artificial data distribution can be very different from the true distribution: it creates instances which might not exist in the true distribution, it doesn't represent most of the rare instances in the true distribution (because they're not in the small real sample), and it oversamples some instances which occur by chance in the small real sample. To be clear, I'm not saying you shouldn't use it at all: if you don't have any other choice, it's better than nothing! – Erwan Aug 25 '20 at 08:42
1 Answer
SMOTE can be used to resample any dataset with continuous features, imbalanced or not.
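If you use the Python implementation of SMOTE (imbalanced-learn), `sampling_strategy` can be set to `'all'`, which resamples all classes. Note that `fit_resample` expects `y` to hold class labels, and that `'all'` only tops each class up to the size of the majority class, so on already balanced data it generates no new samples. Passing a dict that maps each class to a larger target count enlarges every class.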
Something like:
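```python
from collections import Counter
from imblearn.over_sampling import SMOTE
# y must hold class labels. 'all' only tops classes up to the majority size,
# so to enlarge balanced data, request (e.g.) twice as many rows per class.
sm = SMOTE(sampling_strategy={c: 2 * n for c, n in Counter(y).items()})
X_res, y_res = sm.fit_resample(X, y)
```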
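Because imbalanced-learn's `fit_resample` expects `y` to hold class labels, a continuous target such as the Y in the question is rejected. Below is a minimal sketch of applying the same interpolation idea to a regression target with plain NumPy. `smote_regression` is a hypothetical helper written for illustration, not part of imbalanced-learn and not a published algorithm: each new row is interpolated between a real row and one of its k nearest neighbours, and Y is interpolated with the same weight. It assumes at least two rows and features on comparable scales (standardize first, since neighbours are found by Euclidean distance); the data in the usage part is made up.

```python
import numpy as np


def smote_regression(X, y, n_new, k=5, seed=None):
    """Hypothetical helper: SMOTE-style oversampling for a continuous target.

    Each synthetic row lies on the segment between a randomly chosen real row
    and one of its k nearest neighbours (Euclidean distance on X); y is
    interpolated with the same weight. Returns real rows followed by new ones.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    k = min(k, n - 1)  # cannot have more neighbours than other rows

    # Pairwise squared distances; a row must not count as its own neighbour.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)
    neighbours = np.argsort(d2, axis=1)[:, :k]  # shape (n, k)

    base = rng.integers(0, n, size=n_new)                       # real rows to start from
    partner = neighbours[base, rng.integers(0, k, size=n_new)]  # one neighbour of each
    lam = rng.random(n_new)                                     # interpolation weights in [0, 1)

    X_new = X[base] + lam[:, None] * (X[partner] - X[base])
    y_new = y[base] + lam * (y[partner] - y[base])
    return np.vstack([X, X_new]), np.concatenate([y, y_new])


# Illustrative data only: 42 rows, 3 roughly standard-normal features, linear target.
rng = np.random.default_rng(0)
X = rng.normal(size=(42, 3))
y = X @ np.array([1.5, 0.0, -2.0]) + rng.normal(scale=0.1, size=42)

X_aug, y_aug = smote_regression(X, y, n_new=42, k=5, seed=0)
print(X_aug.shape, y_aug.shape)  # (84, 3) (84,)
```

Because every synthetic value is a weighted average of two real values, nothing falls outside the range already observed, which is one concrete form of the limitation raised in the comments. If you do use augmented rows, it seems safer to add them only to the training folds and to evaluate the LASSO model on real, held-out instances.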
Brian Spiering