
According to the original xgboost paper, the best split at a node is found by maximising the quantity below:

$ \mathcal{L}_{\text{split}} = \frac{1}{2} \left[ \frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{G_I^2}{H_I + \lambda} \right] - \gamma $

The xgboost package has a `gamma` parameter. Assuming it is the same $\gamma$ as in the equation, why would it affect the choice of split, given that it is a constant subtracted from the gain of every candidate split?

Bob

1 Answer


You are correct that it does not affect the choice of which split to make. Instead, it affects the choice of whether to make any split. If every $\mathcal{L}_{\text{split}}$ is negative, then no split will be made at the node, pre-pruning the tree.
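To make this concrete, here is a minimal sketch in plain Python. It is a toy illustration, not xgboost's actual implementation: the helper names `split_gain` and `best_split` and the candidate gradient/Hessian sums are hypothetical, and the gain uses squared gradient sums as in the paper. It shows that changing `gamma` never changes which candidate has the highest gain, only whether that gain is positive.

```python
def split_gain(G_L, H_L, G_R, H_R, lam, gamma):
    """Gain of one candidate split; G_* and H_* are gradient and Hessian sums."""
    G_I, H_I = G_L + G_R, H_L + H_R  # parent node totals
    return 0.5 * (
        G_L**2 / (H_L + lam) + G_R**2 / (H_R + lam) - G_I**2 / (H_I + lam)
    ) - gamma


def best_split(candidates, lam, gamma):
    """Return (index of best candidate, its gain, whether the split is made)."""
    gains = [split_gain(*c, lam=lam, gamma=gamma) for c in candidates]
    best = max(range(len(gains)), key=gains.__getitem__)
    return best, gains[best], gains[best] > 0


# Hypothetical candidate splits of one node: (G_L, H_L, G_R, H_R).
candidates = [(-4.0, 5.0, 3.0, 5.0), (-2.0, 3.0, 1.0, 7.0), (-1.0, 6.0, 0.0, 4.0)]

for gamma in (0.0, 1.0, 3.0):
    idx, gain, do_split = best_split(candidates, lam=1.0, gamma=gamma)
    print(f"gamma={gamma}: best candidate={idx}, gain={gain:.3f}, split={do_split}")
```

The same candidate wins for every value of `gamma`, because `gamma` shifts all gains by the same amount. Once `gamma` exceeds the best unpenalised gain, every gain is negative and the node is left as a leaf.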

See also Tree complexity in xgboost

Ben Reiniger