
At each node of a decision tree, we must choose a collection of features to split along.

Suppose we know a priori that the features can be partitioned into subsets that are 'correlated', e.g. one subset describes someone's hat and another subset describes their shoes.

Is there any way to force this partitioning to be respected when choosing which features to split along?

For example, if you are choosing $k$ features, make sure that all $k$ come from the same subset of the partition.

  • Welcome to DataScienceSE. I think there is a confusion here: at each node a single feature is selected as the most discriminative (not a group of features), so the question doesn't really make sense imho. – Erwan Sep 07 '21 at 10:18
  • I don't believe this is true. See the `max_features` argument of sklearn's decision tree implementations. – Ryan Keathley Sep 07 '21 at 23:49
  • `max_features` is used to limit the number of *candidate features* when looking for the best split at a particular node, see [this question and its answer](https://datascience.stackexchange.com/questions/41417/how-max-features-parameter-works-in-decisiontreeclassifier). Whatever the value of `max_features`, for one node there is always a single feature selected among the candidate features. Btw the definition of a decision tree makes this a requirement, since there can be only one condition tested at every node. – Erwan Sep 08 '21 at 00:23
  • Ah, thank you for your comments. Surely it's still possible to consider multiple features though, just not within the usual definition of a decision tree. – Ryan Keathley Sep 08 '21 at 04:14
  • The only way to handle a group of features in a single node with the regular decision tree method is to predefine (engineer) some features made from a combination of features. As far as I know, all the existing variants of DT work this way, so what you're proposing would require inventing a new learning algorithm, which is not trivial. – Erwan Sep 08 '21 at 11:15
  • For splits based on multiple features, see https://datascience.stackexchange.com/q/60848/55122. But grouping them would need a further implementation quirk. – Ben Reiniger Sep 09 '21 at 23:37

3 Answers


Maybe you can try running a principal component analysis (PCA) on your data set first, and then use the resulting components as the variables to build your tree. That way, at each split, the tree algorithm will be selecting from specific combinations of your original features.

PCA builds components that describe patterns present in your data, such as contrasts between variables, overall size, and so on.
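As a rough illustration (my own sketch, not the answerer's code; the data here is synthetic purely for demonstration), this is what a PCA-then-tree pipeline could look like in scikit-learn:

```python
# Sketch: run PCA first, then fit a decision tree on the components.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

pipe = make_pipeline(
    PCA(n_components=5),  # components are linear combinations of the original features
    DecisionTreeClassifier(max_depth=4, random_state=0),
)
pipe.fit(X, y)  # every split in the tree now tests a single principal component
```

Note that plain PCA mixes all features together, so it does not by itself respect a predefined partition; you would need to apply it per subset if you want each component confined to one group.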

Nick

One simple way would be to create new (composite) features associated with each subgroup of original features, and feed these new composite features to the tree model instead.

Otherwise, there is no built-in way for current tree algorithms to handle a subgroup of correlated features as a single 'super feature'.
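As an illustration of the composite-feature idea (a sketch of my own, not part of the answer), each predefined group could be collapsed to one column, here simply the group mean, before fitting the tree:

```python
# Sketch: collapse each predefined feature group into one composite column,
# then train an ordinary decision tree on the composites.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=6, random_state=0)

# Hypothetical grouping: columns 0-2 describe the "hat", columns 3-5 the "shoes".
groups = [[0, 1, 2], [3, 4, 5]]

# One composite feature per group (here the mean; a per-group PCA would also work).
X_composite = np.column_stack([X[:, g].mean(axis=1) for g in groups])

tree = DecisionTreeClassifier(random_state=0).fit(X_composite, y)
# Each split now uses exactly one composite feature, i.e. one group.
```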

Nikos M.

This can be achieved, and it is already implemented in XGBoost. See a full description here.
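Presumably this refers to XGBoost's feature interaction constraints, which restrict which features are allowed to appear together along a single branch. A rough sketch with the scikit-learn wrapper, reusing a hypothetical two-group partition:

```python
# Sketch assuming the answer means XGBoost's interaction constraints:
# features may only co-occur along a branch within their own group.
from sklearn.datasets import make_classification
from xgboost import XGBClassifier

X, y = make_classification(n_samples=200, n_features=6, random_state=0)

model = XGBClassifier(
    n_estimators=50,
    max_depth=3,
    # column indices 0-2 and 3-5 form the two permitted interaction groups
    interaction_constraints=[[0, 1, 2], [3, 4, 5]],
)
model.fit(X, y)
```

Interaction constraints limit which features can co-occur on a path rather than forcing every candidate set at a node to come from one group, so this is close to, but not exactly, what the question asks for.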