
I have a dataset of around 180k observations of 13 variables (a mix of numerical and categorical features). It is a binary classification problem, but the classes are imbalanced (25:1 in favour of the negative class). I wanted to deploy XGBoost (in R) and reach the best possible Precision & Recall. To deal with the imbalance I tried upsampling the positive class, as well as assigning a high XGB weight to the positive class. However, although Recall is pretty high, Precision is very poor (around 0.10).

My parameter tuning for XGB:

  • Random search of parameters - 10 iterations
  • 5-fold CV
  • Parameter intervals: max_depth = 3-10, lambda = 0-50, gamma = 0-10, min_child_weight = 1-10, eta = 0.01-0.20 (a rough sketch of this search is shown below)
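A rough sketch of this kind of search in R (illustrative only: dtrain stands for the training xgb.DMatrix, "aucpr" is just one possible eval metric, and nrounds / early stopping are placeholders):

```r
library(xgboost)

# Random search: 10 random draws from the intervals above, scored by 5-fold CV.
set.seed(42)
results <- lapply(1:10, function(i) {
  params <- list(
    objective        = "binary:logistic",
    eval_metric      = "aucpr",              # precision-recall AUC, suited to imbalance
    max_depth        = sample(3:10, 1),
    lambda           = runif(1, 0, 50),
    gamma            = runif(1, 0, 10),
    min_child_weight = sample(1:10, 1),
    eta              = runif(1, 0.01, 0.20)
  )
  cv <- xgb.cv(params = params, data = dtrain, nrounds = 500, nfold = 5,
               early_stopping_rounds = 20, verbose = 0)
  list(params = params,
       best_score = max(cv$evaluation_log$test_aucpr_mean))
})
```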

Then I tried Random Forest with the upsampled dataset and it performed surprisingly well, with Recall 0.88 and Precision 0.73 (on the test dataset).

Could someone please tell me whether it is possible for RF to outperform XGB by this much, or is it a sign that I am doing something wrong? Thank you very much.

Filip
  • Up/down-sampling is not the same as using weighting, so I am not surprised you got different results. Use the same up/down-sampled data set for RF and XGB and then compare. – user2974951 Jan 06 '22 at 11:12
  • Yes, it is possible that an RF can outperform an xgboost model. There is no "best" algorithm across all problems and data (features, signal, noise). Different algorithms might also find very similar results. What does best possible precision and recall mean? Those are chosen for a specific cutoff value. How are you choosing the cutoff value? – Craig Jan 06 '22 at 11:41
  • @user2974951 sorry, I probably wrote it a bit unclearly. I tried both methods separately - when I did upsampling there were no weights - so I compared upsampled XGB without weights against upsampled RF – Filip Jan 06 '22 at 12:10
  • @Craig I went for the recall/precision measures because of the imbalanced dataset - to know how reliable the predictions of the minority class are. I am using a 0.5 cutoff for now. – Filip Jan 06 '22 at 12:14
  • So you built both models on the same data, RF on up-sampled data and XGB on the same up-sampled data, without any weighting? And similarly for the weighted non-up-sampled data? And the RF model obtained better results in both cases? – user2974951 Jan 06 '22 at 13:44
  • I agree with @Craig that the answer to the title question is certainly "it's possible". That said, 10 random hyperparameter points may well have been too few to find a reasonably good one. – Ben Reiniger Jan 06 '22 at 15:37
  • It would be interesting to see if the XGB and Random Forest are comparable in precision and recall after you add colsample_bytree and subsample, @Filip. – hjerp Jan 22 '22 at 10:56

1 Answer


There are two important things in random forests: "bagging" and "random". Broadly speaking, bagging means that only a part of the "rows" is used at a time (see details here), while "random" means that only a small fraction of the "columns" (features, usually $\sqrt{m}$ by default) is used to make a single split. This also lets seemingly "weak" features have a say in the prediction and avoids a few features dominating the model (and thus helps to avoid overfitting).
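To make that mapping concrete, here is a minimal sketch with the randomForest package (x and y stand for the feature data frame and the factor label; the values shown are simply the classification defaults):

```r
library(randomForest)

# Each tree sees a bootstrap sample of the rows ("bagging") and each split
# considers only mtry randomly chosen columns ("random").
fit <- randomForest(
  x, y,
  ntree    = 500,
  mtry     = floor(sqrt(ncol(x))),   # columns tried per split (default for classification)
  sampsize = nrow(x),                # rows drawn per tree ...
  replace  = TRUE                    # ... with replacement, i.e. a bootstrap sample
)
```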

Looking at your XGB parameters, I notice that you do not subsample rows and columns, which is possible via the parameters subsample and colsample_bytree. You could also use scale_pos_weight to tackle the imbalanced classes. Subsetting columns and rows can be useful if you have some dominant features or observations in your data. I suspect that with subsampling (this would be "stochastic gradient boosting") the XGB results would improve and be "closer" to the results obtained with a random forest.
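For example, a parameter list along these lines could be tried (values are illustrative, not tuned):

```r
library(xgboost)

# Illustrative values only; tune them (e.g. within your random search).
params <- list(
  objective        = "binary:logistic",
  eval_metric      = "aucpr",
  max_depth        = 6,
  eta              = 0.05,
  subsample        = 0.8,    # fraction of rows used per boosting round
  colsample_bytree = 0.7,    # fraction of columns used per tree
  scale_pos_weight = 25      # roughly the negative:positive ratio
)
```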

Also make sure you use enough boosting rounds (to allow good learning progress). You can add a watchlist and an early_stopping_rounds criterion to stop boosting once no more progress is made. In this case you would set nrounds to a "high" number and stop boosting when no further learning progress has been made for early_stopping_rounds rounds, as in the generic sketch below.
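A generic sketch (dtrain and dvalid are assumed to be xgb.DMatrix objects for the training and validation data, and params is a list like the one above):

```r
# Deliberately large nrounds; early stopping picks the actual number of rounds.
fit <- xgb.train(
  params                = params,
  data                  = dtrain,
  nrounds               = 5000,
  watchlist             = list(train = dtrain, valid = dvalid),
  early_stopping_rounds = 50,     # stop if the validation score has not improved for 50 rounds
  print_every_n         = 100
)
fit$best_iteration  # boosting round with the best validation score
```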

Peter