- Is the resulting $\hat{L}(h_*,V)$ an unbiased estimate of the true loss?
No. You have taken multiple noisy measurements of the loss, each with some uncertainty, and then kept the best one (the minimum or maximum, depending on the metric). That selection makes the reported value optimistically biased; the small simulation below illustrates the effect.
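As a concrete illustration (a quick Monte Carlo sketch with made-up numbers, not part of the original setup), suppose 20 candidate models all have exactly the same true error rate, and you keep the one with the lowest measured error on a finite validation set:

```python
import numpy as np

rng = np.random.default_rng(0)

n_candidates = 20   # models compared on the same validation set
true_error = 0.30   # every candidate has the same true error rate
n_val = 200         # validation set size
n_trials = 2000     # Monte Carlo repetitions

selected_scores = []
for _ in range(n_trials):
    # Each candidate's measured validation error is a noisy estimate
    # of the same true error (binomial sampling noise).
    measured = rng.binomial(n_val, true_error, size=n_candidates) / n_val
    # Model selection keeps the candidate with the lowest measured error.
    selected_scores.append(measured.min())

print(f"true error:                      {true_error:.3f}")
print(f"mean validation error of winner: {np.mean(selected_scores):.3f}")
# The winner's validation error is systematically below the true error,
# i.e. L_hat(h_*, V) is optimistically biased.
```

The winner's measured error comes out well below 0.30 on average, even though no candidate is actually any better than the others - that gap is exactly the bias in $\hat{L}(h_*,V)$.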
- How can I bound the true loss using $\hat{L}(h_*,V)$ ?
In the general case you cannot. It depends on how much over-fitting is occurring on the training set, the size of the cv set, the number of times it has been used, and how similar the model's performance was on each use. There is also sampling bias in the cv set, which interacts with the selection process.
What is generally done if you need an unbiased estimate for the final production model is a train/cv/test split. The cv set is used for model selection, and once you have selected a single model, you estimate its loss - or other key metric - on the test set. If you want that measure to stay unbiased, it is important to use the test set minimally and never to select models; otherwise you just repeat the same problem one level up. A minimal sketch of this workflow follows.
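Here is one way to organise it with scikit-learn; the dataset, the candidate models and the split proportions are all placeholders, not part of the question:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Hypothetical data; substitute your own X, y.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = (X[:, 0] + 0.5 * rng.normal(size=1000) > 0).astype(int)

# Split once into train / cv / test (here 60 / 20 / 20).
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_cv, X_test, y_cv, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

# Model selection: compare as many candidates as you like on the cv set.
candidates = {C: LogisticRegression(C=C, max_iter=1000) for C in (0.01, 0.1, 1.0, 10.0)}
cv_scores = {}
for C, model in candidates.items():
    model.fit(X_train, y_train)
    cv_scores[C] = accuracy_score(y_cv, model.predict(X_cv))

best_C = max(cv_scores, key=cv_scores.get)

# The test set is touched exactly once, only for the selected model.
final_model = LogisticRegression(C=best_C, max_iter=1000).fit(X_train, y_train)
test_score = accuracy_score(y_test, final_model.predict(X_test))
print(f"selected C={best_C}, cv accuracy={cv_scores[best_C]:.3f}, "
      f"test accuracy={test_score:.3f}")
```

The key design point is that only `final_model` ever sees `X_test`, and only after all selection decisions are frozen.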
Another approach which maintains more confidence in cv-based metrics is to use k-fold cross validation. Taking the mean of a metric over the k folds is still biased once you have used it to guide a few decisions, but the bias is reduced somewhat. You can take that idea further with nested cross-validation, which gives you an approximately unbiased estimate of the performance of the whole selection procedure (i.e. a model tuned with the same hyper-parameter search) while using more of your data; a sketch follows.
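A nested cross-validation sketch in the same spirit, again with scikit-learn and a hypothetical dataset and parameter grid: the inner `GridSearchCV` selects hyper-parameters, and the outer `cross_val_score` scores the selected model on folds the inner search never saw.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score

# Hypothetical data; substitute your own X, y.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Inner loop: hyper-parameter selection by cross-validation.
param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}
inner_search = GridSearchCV(LogisticRegression(max_iter=1000), param_grid, cv=5)

# Outer loop: each outer fold only scores the model that the inner search
# selected without seeing it, so the averaged outer score estimates the
# performance of the whole selection procedure rather than of one lucky fit.
outer_scores = cross_val_score(inner_search, X, y, cv=5)
print(f"nested CV accuracy: {outer_scores.mean():.3f} +/- {outer_scores.std():.3f}")
```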