I have a dataset of about 20k rows. A couple hundred of those rows are extreme values, and 10 or so of them are even more extreme. They are correct, though, and have a unique tag, so when that tag comes up I am hoping the model treats it as the special case it is instead of averaging it down the way a linear regression model would.
As far as I know, decision-tree-style models are the best at this, and I've been having good luck with LightGBM, XGBoost and CatBoost, most of all with CatBoost.
I was wondering if anyone has any tips for getting more accurate predictions for this regression. Is there perhaps a certain eval metric or scoring function I should focus on?
It's hard to trust the R² score in this situation: while the model is accurate on 90% of the dataset, it is sometimes extremely off in these special cases.
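To make the problem concrete, here is a minimal sketch of the kind of check I mean: the overall R² next to the mean absolute error computed separately for the normal rows and the tagged extreme rows. The numbers are made up for illustration (this is not my real data), and the helper names `r2`, `mae` and `report_by_group` are hypothetical, not part of any library.

```python
def r2(y_true, y_pred):
    """Coefficient of determination over all rows."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def report_by_group(y_true, y_pred, is_extreme):
    """Overall R2 plus MAE split into normal rows and tagged extreme rows."""
    rows = list(zip(y_true, y_pred, is_extreme))
    normal = [(t, p) for t, p, e in rows if not e]
    extreme = [(t, p) for t, p, e in rows if e]
    return {
        "r2_all": r2(y_true, y_pred),
        "mae_normal": mae(*zip(*normal)),
        "mae_extreme": mae(*zip(*extreme)),
    }


# Toy data (made up): 18 ordinary rows and 2 tagged extreme rows.
y_true = [10] * 18 + [1000, 1000]
y_pred = [11] * 18 + [800, 800]       # extremes are "averaged down"
is_extreme = [False] * 18 + [True, True]

result = report_by_group(y_true, y_pred, is_extreme)
print(result)
```

On this toy data the overall R² stays above 0.95 even though each of the two extreme rows is off by 200, which is exactly the situation where the single score looks fine and the special cases are not.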