
I know that we iteratively model the residuals in the case of a gradient boosted regression problem. The intuition is explained very well on Kaggle.

Can someone explain what residuals are modeled in the case of a classification scenario?
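
To make the regression case concrete, this is the kind of loop I have in mind (a minimal sketch with squared error; the toy data and names are just for illustration):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Gradient boosted regression with squared error:
# each new tree is fit to the current residuals y - F(x).
rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

n_trees, learning_rate = 50, 0.1
F = np.full_like(y, y.mean())  # initial prediction: the mean of y
trees = []
for _ in range(n_trees):
    residuals = y - F                                 # what the next tree models
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
    F += learning_rate * tree.predict(X)              # update the ensemble
    trees.append(tree)
```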

Arc

1 Answer


It's a similar trick to logistic regression: we work with an unbounded value (the log-odds) that can be mapped to a probability with the sigmoid function. Only at the very end of the gradient boosted tree model do we map it to a probability. The loss function used for deciding the weights of the terminal nodes is adapted from the usual logistic loss so that the individual trees never have to map to probabilities directly. I cannot find the derivation at the moment, but it should be easy to derive.
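
As a quick sketch of that derivation (my own notation): for binary labels $y \in \{0, 1\}$, let $F(x)$ be the unbounded score (log-odds) and $p = \sigma(F) = 1/(1 + e^{-F})$. The log loss is

$$\ell(y, F) = -\,y \log \sigma(F) - (1 - y) \log\bigl(1 - \sigma(F)\bigr).$$

Since $\sigma'(F) = \sigma(F)\,(1 - \sigma(F))$, differentiating with respect to $F$ gives

$$\frac{\partial \ell}{\partial F} = \sigma(F) - y = p - y,$$

so the pseudo-residual each tree is fit to (the negative gradient) is simply $y - p$: the difference between the label and the current predicted probability.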

EDIT: I found the post where I read this the other day; it's over at Stats StackExchange: https://stats.stackexchange.com/questions/204154/classification-with-gradient-boosting-how-to-keep-the-prediction-in-0-1
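
For illustration, here is a minimal sketch of the scheme (my own toy data and names; real implementations additionally refit each leaf with a Newton step based on the adapted loss, which this sketch skips by keeping the tree's mean-residual leaf values):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)  # binary labels in {0, 1}

n_trees, learning_rate = 50, 0.1
p0 = y.mean()
F = np.full_like(y, np.log(p0 / (1 - p0)))  # F holds unbounded log-odds
trees = []
for _ in range(n_trees):
    p = sigmoid(F)              # current probabilities
    residuals = y - p           # pseudo-residuals: negative gradient of the log loss
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
    F += learning_rate * tree.predict(X)
    trees.append(tree)

# Only at the very end are the accumulated log-odds mapped to probabilities.
probs = sigmoid(F)
print("training accuracy:", ((probs > 0.5).astype(float) == y).mean())
```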

Jan van der Vegt