Evaluating optimal values for depth of tree

Question

I'm studying the performance of an AdaBoost model and I wonder how its performance depends on the depth of the trees.

Here's the accuracy for the model with a depth of 1

and here with a depth of 3

From my point of view, I would say the lower plot (depth 3) looks better, but I suspect the upper one (depth 1) is actually better, because its training curve doesn't drop to zero (overfitting?). The question "Hyperparameter tunning for Random Forest- choose the best max depth" and its answer seem to support this assumption.

I think your y axis is labelled incorrectly. It should say log loss, not accuracy. — Jonathan, Jul 19 '21 at 05:00
True, I was wondering about that too, but the question itself still stands? — Ben, Jul 19 '21 at 06:40

score 1 · Answer 1 · answered Jul 19 '21 at 09:53

The training error shouldn't be too far from the test error; otherwise you are in a high-variance scenario, and the model may turn out to be overfitted in production.

However, somewhat higher variance can be normal as the depth increases, although it shouldn't happen if you have enough data.

Consequently, if you don't have a lot of data, a depth of 1 seems better, and you should increase the number of training iterations to lower the error.
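To check this on your own data, something like the following sketch could help. It assumes scikit-learn, uses synthetic data from `make_classification` as a stand-in for the real dataset, and reports misclassification error rather than log loss, so the numbers will not match the plots in the question. The helper name `error_curves` and all parameter values are illustrative choices of mine.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption); replace with your own X and y.
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)


def error_curves(max_depth, n_estimators=100):
    """Return per-iteration train and test error lists for one tree depth."""
    model = AdaBoostClassifier(
        # Base learner passed positionally, because its keyword name
        # differs between scikit-learn versions.
        DecisionTreeClassifier(max_depth=max_depth),
        n_estimators=n_estimators,
        random_state=0,
    )
    model.fit(X_train, y_train)
    # staged_score yields the accuracy after each boosting iteration.
    train_err = [1.0 - s for s in model.staged_score(X_train, y_train)]
    test_err = [1.0 - s for s in model.staged_score(X_test, y_test)]
    return train_err, test_err


results = {}
for depth in (1, 2, 3):
    train_err, test_err = error_curves(depth)
    gap = test_err[-1] - train_err[-1]  # large gap suggests overfitting
    results[depth] = (train_err[-1], test_err[-1], gap)
    print(f"depth={depth}: train={train_err[-1]:.3f} "
          f"test={test_err[-1]:.3f} gap={gap:.3f}")
```

Plotting `train_err` and `test_err` against the iteration index for each depth would give curves comparable to the ones in the question, and raising `n_estimators` shows whether more iterations keep lowering the test error at depth 1.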

In addition, there is only a small difference in test results between a depth of 1 and a depth of 3, so the small benefit of a depth of 3 isn't worth the risk of ending up in a high-variance scenario. But maybe a max depth of 2 is better than 1...
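One way to make this trade-off explicit is to prefer the shallowest depth whose test error is within some tolerance of the best one. The sketch below is just one possible rule, not a standard procedure: the function name `pick_depth`, the tolerance value, and the error numbers are all made up for illustration and are not read off the plots in the question.

```python
def pick_depth(test_error_by_depth, tolerance=0.01):
    """Return the shallowest depth whose test error is within
    `tolerance` of the lowest test error."""
    best = min(test_error_by_depth.values())
    candidates = [
        depth
        for depth, err in test_error_by_depth.items()
        if err - best <= tolerance
    ]
    return min(candidates)


# Illustrative numbers only (hypothetical test errors per depth).
example = {1: 0.120, 2: 0.112, 3: 0.115}

print(pick_depth(example))                 # 1: all depths are within 0.01
print(pick_depth(example, tolerance=0.0))  # 2: strictly lowest test error
```

With a tolerance of 0 this reduces to picking the depth with the lowest test error; a larger tolerance expresses how much test error you are willing to give up for a simpler, lower-variance model.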
