I'm predicting a response that I would typically model with a gamma distribution. The parameters are relatively simple: I'm using the defaults apart from these:

  • learning_rate = 0.01
  • max_depth = 6
  • base_score = the average of y

Since my base_score is set to the average of y, I would expect the average of my predictions to stay there as more trees are fit; after all, on average the predictions are already close to the target.

This isn't the case, though. In the plot below, the average prediction is in orange and the average of the actuals is in yellow. The average prediction does converge eventually, but the mean squared error on validation (light blue line) tells us the best model was at about 200 trees.

The absolute values on the y-axes don't matter, so I've removed them.

What could be causing this?
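
To make the expectation concrete, here is a minimal hand-worked sketch of a single boosting step. It rests on several assumptions that are not stated above: a gamma objective with a log link (loss `f + y * exp(-f)`), base_score applied on the log scale, Newton leaf weights with no regularisation, and a made-up four-row dataset with one hypothetical split. Under those assumptions the gradients sum to zero at the mean of y, yet the average prediction can still move after one tree, because each leaf is updated on the log scale.

```python
import math

# Toy data, made up for illustration only; the mean of y is 3.0
y = [1.0, 1.0, 1.0, 9.0]
mean_y = sum(y) / len(y)

learning_rate = 0.01       # as in the question
f0 = math.log(mean_y)      # assumption: base_score enters on the log scale

# Assumed gamma loss with log link: loss = f + y * exp(-f)
grad = [1.0 - yi * math.exp(-f0) for yi in y]
hess = [yi * math.exp(-f0) for yi in y]

# At the mean of y the gradients cancel out over the whole dataset
total_grad = sum(grad)

# Hypothetical single split: first three rows go left, last row goes right
leaves = [[0, 1, 2], [3]]

def leaf_weight(rows):
    """Newton step for one leaf, without regularisation (assumption)."""
    return -sum(grad[i] for i in rows) / sum(hess[i] for i in rows)

# Apply one shrunken tree on the log scale
f1 = [f0] * len(y)
for rows in leaves:
    w = leaf_weight(rows)
    for i in rows:
        f1[i] = f0 + learning_rate * w

mean_before = math.exp(f0)
mean_after = sum(math.exp(f) for f in f1) / len(f1)
print(mean_y, mean_before, mean_after)
```

In this toy case the average prediction ends up slightly below the average of y after one tree. Whether the same effect explains the plot depends on the objective and settings actually used, so this is only a way to test the intuition, not a diagnosis.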
