
Do algorithms such as GBM, XGBoost, CatBoost, and LightGBM ever perform more than two-way splits at a node in their decision trees? Can a node be split into three or more branches instead of only binary splits? Can more than one feature be used in deciding how to split a node? Can a feature be re-used when splitting a descendant node?

Chong Lip Phang
  • Your question needs more focus: the last two questions are sufficiently different from the title question that they should be asked separately. And indeed those have already been asked, e.g. https://datascience.stackexchange.com/q/60848/55122 and https://datascience.stackexchange.com/q/10713/55122 – Ben Reiniger Dec 18 '20 at 15:35

1 Answer


Gradient boosting can be applied to any base model, so using it with a Quinlan-family decision tree (which allows such higher-arity splits for categorical features) should make this possible. However, all implementations of gradient-boosted trees that I know of (and certainly XGBoost, CatBoost, and LightGBM) use CART as their tree model, so you won't get anything but binary trees. (These GBMs do modify CART a little, e.g. by using histogram binning to reduce the split searches, but nothing as drastic as n-ary splits for categoricals.)
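
For intuition, here is a minimal sketch of least-squares gradient boosting with a pluggable base learner; it is not any library's actual implementation, and the function names fit_gbm and predict_gbm are hypothetical. sklearn's CART-based DecisionTreeRegressor is used as the base model here, but a Quinlan-style tree supporting n-ary categorical splits could in principle be substituted:

    # Minimal sketch: least-squares gradient boosting over any base regressor.
    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    def fit_gbm(X, y, n_estimators=100, learning_rate=0.1, make_base_learner=None):
        if make_base_learner is None:
            make_base_learner = lambda: DecisionTreeRegressor(max_depth=3)
        f0 = float(np.mean(y))              # initial constant prediction
        pred = np.full(len(y), f0)
        learners = []
        for _ in range(n_estimators):
            residuals = y - pred            # negative gradient of squared error
            tree = make_base_learner()      # any regressor with fit/predict works
            tree.fit(X, residuals)
            pred = pred + learning_rate * tree.predict(X)
            learners.append(tree)
        return f0, learning_rate, learners

    def predict_gbm(model, X):
        f0, learning_rate, learners = model
        return f0 + learning_rate * sum(t.predict(X) for t in learners)

Nothing in this loop cares whether the base tree makes binary or n-ary splits; the binary restriction comes from the CART learner itself, not from the boosting framework.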

Ben Reiniger
  • Hey, thanks for the answer. Why are the limitations put into place? Won't it be more accurate and flexible to do away with such limitations? Should I try implementing my own gradient boosting regressor that can perform n-ary splits, use more than one feature at a split, and reuse features? – Chong Lip Phang Dec 18 '20 at 15:52
  • "More accurate and flexible" on the training set might overfit, so it's not an obvious improvement. multivariate splits and n-ary splits for continuous variables will hurt training time, as well; but if you build such a thing and can show improvement on some benchmark datasets, maybe that can be overcome. Reusing features is already standard. – Ben Reiniger Dec 18 '20 at 16:11
  • It suddenly occurred to me that there is no point in doing a ternary split, as it can always be reproduced by two levels of binary splits (see the sketch below). As for multivariate splits, I think they are worth looking into. I suppose reusing features in the descendants of a decision tree is already standard in regressor implementations. – Chong Lip Phang Dec 18 '20 at 16:17
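
As a concrete illustration of the equivalence mentioned in the last comment (a hypothetical sketch, not taken from any of the libraries discussed): a ternary split on a numeric feature with cut points at 1 and 2 is reproduced exactly by two nested binary splits.

    # Hypothetical example: ternary split vs. two nested binary splits.
    def ternary_split(x):
        if x < 1:
            return "left"
        elif x < 2:
            return "middle"
        else:
            return "right"

    def nested_binary_splits(x):
        if x < 1:                                 # first binary split at the root
            return "left"
        return "middle" if x < 2 else "right"     # second binary split on the right child

    # Both routings agree for points in each of the three regions.
    assert all(ternary_split(v) == nested_binary_splits(v) for v in (0.5, 1.5, 2.5))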