I know that we iteratively model the residuals in a gradient boosted regression problem. The intuition is very well explained on Kaggle.
Can someone explain what the residuals are that get modeled in a classification scenario?
It's a similar trick to the one used in logistic regression: the trees model an unbounded value (the log-odds) that can be mapped to a probability with the sigmoid function, and this mapping only happens at the very end of the gradient boosted tree model. The residuals each new tree fits are the negative gradients of the log loss with respect to that unbounded score, which work out to (observed label minus predicted probability). The loss function used for deciding the weights of the terminal nodes is likewise adapted from the usual log loss so that it operates on the raw score and does not have to map directly to probabilities. I cannot find the derivation at the moment, but it should be easy to work out.
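Here is a minimal sketch of that idea, assuming standard log-loss pseudo-residuals and plain squared-error regression trees (the Newton-step adjustment of the terminal-node weights mentioned above is omitted for brevity, so this is illustrative rather than a faithful reimplementation of any particular library):

```python
# Toy binary-classification gradient boosting: trees are fit to pseudo-residuals
# of the log loss, and the sigmoid is only applied at prediction time.
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_gbm(X, y, n_trees=100, learning_rate=0.1, max_depth=3):
    # Start from the log-odds of the base rate: an unbounded score, not a probability.
    p0 = np.clip(y.mean(), 1e-6, 1 - 1e-6)
    f0 = np.log(p0 / (1 - p0))
    F = np.full(len(y), f0)
    trees = []
    for _ in range(n_trees):
        # Pseudo-residuals: negative gradient of the log loss w.r.t. the raw score F,
        # which reduces to (observed label - current predicted probability).
        residuals = y - sigmoid(F)
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residuals)
        F += learning_rate * tree.predict(X)
        trees.append(tree)
    return f0, trees


def predict_proba(f0, trees, X, learning_rate=0.1):
    # Sum the unbounded contributions of all trees, then map to a probability once at the end.
    F = f0 + learning_rate * sum(t.predict(X) for t in trees)
    return sigmoid(F)
```

The key point the sketch tries to show is that the "residuals" in classification are differences between labels and probabilities, but the trees themselves predict corrections to the unbounded log-odds score, never probabilities directly.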
EDIT: I found the post where I read this the other day; it's over at Stats StackExchange: https://stats.stackexchange.com/questions/204154/classification-with-gradient-boosting-how-to-keep-the-prediction-in-0-1