Nested cross-validation generalization error for multiple models

Question

I am referring to this question:

Nested cross-validation and selecting the best regression model - is this the right SKLearn process?

The answers there show that nested CV can estimate the generalization error of hyperparameter optimization for different algorithms. But in my opinion, the choice between different algorithms is also an optimization process, and so it has a generalization error of its own. Therefore, either the algorithm choice should be part of the inner CV, or a third level of CV would have to be introduced to evaluate the error of the algorithm choice. Is this assumption correct?

score 1 · Accepted Answer · answered Nov 07 '18 at 14:29

In general you are right, and as far as I can see that is what the linked answer already does: the models are compared with each other while the best tuning for each of them is found, both inside the loop. It looks fine.
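To make that concrete, here is a minimal sketch of one way to put the algorithm choice inside the inner loop with scikit-learn. It is not the code from the linked answer: the synthetic dataset, the two candidate models (`Ridge`, `RandomForestRegressor`), the grids and the fold counts are all assumptions chosen only for illustration. The idea is to treat the estimator itself as one more hyperparameter of a pipeline step, so the outer loop scores the whole "pick an algorithm and tune it" procedure.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.pipeline import Pipeline

# Synthetic data, purely for illustration (assumption, not from the question).
X, y = make_regression(n_samples=120, n_features=8, noise=10.0, random_state=0)

# The "model" step is a placeholder that the grid search swaps out.
pipe = Pipeline([("model", Ridge())])

# One grid per algorithm: the algorithm is searched like any other hyperparameter.
param_grid = [
    {"model": [Ridge()], "model__alpha": [0.1, 1.0, 10.0]},
    {
        "model": [RandomForestRegressor(n_estimators=20, random_state=0)],
        "model__max_depth": [3, None],
    },
]

inner_cv = KFold(n_splits=3, shuffle=True, random_state=0)  # selection + tuning
outer_cv = KFold(n_splits=5, shuffle=True, random_state=1)  # error estimate

# Inner loop: chooses both the algorithm and its hyperparameters.
search = GridSearchCV(
    pipe, param_grid, cv=inner_cv, scoring="neg_mean_squared_error"
)

# Outer loop: each fold refits the full search on its training part and
# scores the winner on data that played no role in the selection.
outer_scores = cross_val_score(
    search, X, y, cv=outer_cv, scoring="neg_mean_squared_error"
)
nested_mse = -outer_scores.mean()
print(f"Nested CV estimate of MSE: {nested_mse:.2f}")
```

Because the comparison between algorithms happens inside `search`, the outer score already accounts for the algorithm choice and no third CV level is needed. Note that different outer folds may end up selecting different algorithms; the number estimates the error of the selection procedure, not of one fixed model.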

As for your broader point, yes. But in machine learning we have to stop or limit our attempts at some point, because the number of algorithms that can do the task is very large. We usually evaluate different families of algorithms and then narrow the search from there, but in the end we can never claim that the best model we found is the best possible one. From another point of view, this is the main idea behind many ML research papers: they creatively find or modify an algorithm and show on a benchmark dataset that it works better than the previously applied algorithms.
