How to tackle imbalanced regression?

Question

I've recently encountered a problem where I want to fit a regression model on data whose target variable is about 75% zeros, with the rest being continuous. That makes it a regression problem; however, the non-zero values also have very high variance: they can range anywhere from 1 to 105 million.

What would be an effective approach to such a problem? Because of the high variance, I keep getting regressors that fit the zeros too closely, and as a result I get a very high MAE. I understand that in classification you can use balanced class weighting, for example in random forests, but what is the equivalent for regression problems? Does scikit-learn have anything similar?

score 3 · Answer 1 · edited Apr 22 '22 at 20:38

Zero-inflated models (https://en.wikipedia.org/wiki/Zero-inflated_model) first predict whether an individual's response will be zero, and then, among the non-zero responses, predict the numeric value (a count or continuous amount).
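
The same two-step idea can be sketched directly in scikit-learn. This is a rough illustration rather than a true zero-inflated model: it is closer to a two-part (hurdle-style) setup, with a classifier for zero vs. non-zero and a regressor trained only on the non-zero rows. The synthetic data, the choice of random forests, the log1p transform of the target, and the 0.5 threshold are all assumptions made for the example, not something from the question.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# Synthetic data (assumption): mostly zeros, heavy-tailed non-zero values.
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 3))
p_nonzero_true = 1.0 / (1.0 + np.exp(1.5 - X[:, 1]))
is_nonzero = rng.random(n) < p_nonzero_true
amount = np.exp(2.0 + X[:, 0] + rng.normal(scale=0.5, size=n))
y = np.where(is_nonzero, amount, 0.0)

# Step 1: classify zero vs. non-zero; class_weight handles the imbalance.
clf = RandomForestClassifier(n_estimators=50, class_weight="balanced", random_state=0)
clf.fit(X, y > 0)

# Step 2: regress on the non-zero rows only, on a log scale to tame the variance.
mask = y > 0
reg = RandomForestRegressor(n_estimators=50, random_state=0)
reg.fit(X[mask], np.log1p(y[mask]))

def predict_two_part(X_new, threshold=0.5):
    """Predict 0 unless the classifier thinks the row is non-zero."""
    p_nonzero = clf.predict_proba(X_new)[:, 1]   # column 1 is the True class
    magnitude = np.expm1(reg.predict(X_new))     # undo the log1p transform
    return np.where(p_nonzero >= threshold, magnitude, 0.0)

pred = predict_two_part(X)
```

Thresholding suits a metric like MAE; if you want an expected value instead, multiplying `p_nonzero` by `magnitude` is the usual alternative.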

If your non-zero values could be considered count or rate data, you might use:

statsmodels.discrete.count_model.ZeroInflatedPoisson

edited Apr 22 '22 at 20:38 by Ethan

answered Apr 22 '22 at 16:17 by clementzach
