
I've searched everywhere and still couldn't figure this one out. This post mentioned that Gradient Boosting is better than Random Forest for unbalanced data. Why is that? Is Random Forest worse because of bootstrapping (perhaps the bootstrap samples aren't stratified during training, so the minority class gets under-represented)?

Any thoughts?

Thanks in advance

  • I'd highly recommend checking out this Stack Exchange [question](https://stats.stackexchange.com/questions/173390/gradient-boosting-tree-vs-random-forest). Although it's a general comparison of Random Forest versus Gradient Boosting, it contains a lot of insights. – Miss.Alpha Oct 17 '22 at 21:42

1 Answer


Boosting builds its ensemble sequentially, increasing the weight of wrongly predicted instances at each iteration. Random Forest builds its ensemble with bootstrap aggregating (bagging), where each tree's training set is drawn by sampling with replacement from the full data. On an imbalanced dataset, those uniform bootstrap samples reproduce the same skewed class distribution (and may even miss minority-class instances entirely), so every tree faces the same problem the full dataset has. Boosting, by contrast, keeps up-weighting the misclassified instances, which on imbalanced data are typically the minority class, so later learners concentrate on exactly the examples a Random Forest tends to neglect.
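To see this in practice, here's a minimal sketch using scikit-learn on a synthetic imbalanced dataset. The dataset size, class ratio, and default hyperparameters are my own choices for illustration, and the exact scores will vary; the point is just to compare the two ensembles on the minority class.

```python
# Minimal sketch: Random Forest vs. Gradient Boosting on imbalanced data.
# Synthetic dataset: ~95% negatives, ~5% positives (illustrative choice).
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

X, y = make_classification(
    n_samples=5000, n_features=20, weights=[0.95, 0.05], random_state=0
)
# Stratify the split so the test set keeps the same class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

for model in (RandomForestClassifier(random_state=0),
              GradientBoostingClassifier(random_state=0)):
    model.fit(X_train, y_train)
    # F1 on the minority (positive) class, which plain accuracy would hide.
    print(type(model).__name__, f1_score(y_test, model.predict(X_test)))
```

Accuracy would look high for both models here simply because the majority class dominates, which is why the sketch scores the minority-class F1 instead.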

Ashish Jain