
According to xgboost paper, regularization is given by:

$$\Omega(f) = \gamma T + \frac12 \lambda \|w\|^2$$

where $T$ is the complexity of the tree (i.e., the number of leaves) and $\gamma$ weights that penalty.

The parameter gamma in the xgboost library, on the other hand, is documented as the minimum loss reduction required to make a further split at a node. So is the $\gamma$ in the equation above the one used by the xgboost software package? I could not find any reference to it.

zzzbob

1 Answer


In the paragraph following equation (1):

$T$ is the number of leaves in the tree.

$\gamma$ is a hyperparameter that affects how much regularization occurs on the size (number of leaves) of the tree.

Now it turns out that you can interpret $\gamma$ (at least roughly, see note at bottom) as ([source]):

Minimum loss reduction required to make a further partition on a leaf node of the tree. The larger gamma is, the more conservative the algorithm will be.

You can see that from equation (2), the regularized objective:

$$\mathcal{L}(\phi) = \sum_i l(\hat{y}_i, y_i) + \sum_k \Omega(f_k),\\ \text{where }\Omega(f)=\gamma T + \frac12 \lambda \|w\|^2.$$

By making the split, you increase $T$ by one, so the penalty increases by $\gamma$, and so your base loss term $l$ needs to decrease by at least $\gamma$ for this to be an overall improvement. Note: Of course, this ignores what happens to the leaf weights $w$ in splitting one node into two.
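This connection becomes exact in the paper's split-gain formula (equation 7), where $\gamma$ appears as a flat deduction from the loss reduction of a candidate split. Below is a minimal sketch of that formula in plain Python (my own illustration, not code from xgboost itself); `G` and `H` denote the sums of first- and second-order gradients over a leaf's instances:

```python
def split_gain(G_L, H_L, G_R, H_R, lam, gamma):
    """Loss reduction from splitting one leaf into left/right children
    (equation 7 of the xgboost paper). A split is worth keeping only
    when this is positive, i.e. the raw reduction exceeds gamma."""
    def score(G, H):
        # Structure score of a single leaf: G^2 / (H + lambda)
        return G * G / (H + lam)

    raw_reduction = 0.5 * (score(G_L, H_L) + score(G_R, H_R)
                           - score(G_L + G_R, H_L + H_R))
    return raw_reduction - gamma

# The same candidate split with gamma = 0 versus gamma = 3
# (gradient sums here are made-up numbers for illustration):
print(split_gain(4.0, 3.0, -2.0, 2.0, lam=1.0, gamma=0.0))  # positive: split helps
print(split_gain(4.0, 3.0, -2.0, 2.0, lam=1.0, gamma=3.0))  # negative: split rejected
```

With `gamma = 0` the split reduces the loss, but raising `gamma` past the raw reduction makes the very same split unprofitable, which is exactly the "minimum loss reduction" behavior the documentation describes.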

Ben Reiniger