
For gradient boosting models such as XGBoost and LightGBM, does n_warmup_steps in optuna.pruners.MedianPruner refer to the minimum number of folds evaluated before pruning is triggered?

I.e. if the number of CV folds equals 5, does n_warmup_steps=1 mean pruning only takes place after at least 2 of the 5 folds have been evaluated (since step counting starts at 0)?

1 Answer


The steps in n_warmup_steps refer to the intermediate steps reported during training, not to CV folds. For gradient boosting models these are typically the boosting iterations. Pruning is disabled while the reported step is below n_warmup_steps, so with n_warmup_steps=1 the very first reported step (step 0) is exempt, and the trial can be pruned from step 1 onward. To put it differently, with a small n_warmup_steps a trial has to achieve a good result soon after training has started.
As you are using optuna.pruners.MedianPruner, the following applies (copied from the official documentation):
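Concretely, Optuna counts reported steps from 0 and skips the pruning check while the step is below n_warmup_steps. A tiny pure-Python sketch of that gate (pruning_allowed is an illustrative name, not Optuna's API):

```python
def pruning_allowed(step, n_warmup_steps):
    """Model of MedianPruner's warmup gate: pruning is skipped
    while the reported step is below n_warmup_steps (steps start at 0)."""
    return step >= n_warmup_steps

# With n_warmup_steps=1 the first reported step (step 0) is exempt;
# pruning can fire from the second reported step (step 1) onward.
print([pruning_allowed(s, 1) for s in range(4)])  # [False, True, True, True]
```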

Prune if the trial’s best intermediate result is worse than median of intermediate results of previous trials at the same step.

So if a trial's best intermediate result so far (measured via the objective/loss function) is worse than the median of the intermediate results of previous trials at the same step, it will be pruned.
The parameter n_warmup_steps can be used in conjunction with n_startup_trials in order to build a robust median first, i.e. a number of complete trials that are never pruned. After n_startup_trials, a trial will be pruned after n_warmup_steps if it performs worse than the median of the preceding trials at the same number of boosting steps.
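This interaction can be sketched in a few lines of pure Python (should_prune, completed_trials, etc. are illustrative names, not Optuna's API; a lower loss is assumed to be better):

```python
import statistics

def should_prune(step, best_so_far, completed_trials,
                 n_startup_trials=5, n_warmup_steps=0):
    """Sketch of the median rule.

    completed_trials: list of dicts mapping step -> reported loss
    best_so_far: the current trial's best (lowest) loss reported up to `step`
    """
    if len(completed_trials) < n_startup_trials:
        return False   # n_startup_trials: never prune while the median is unreliable
    if step < n_warmup_steps:
        return False   # n_warmup_steps: never prune this early in a trial
    losses_at_step = [t[step] for t in completed_trials if step in t]
    if not losses_at_step:
        return False
    return best_so_far > statistics.median(losses_at_step)

# Two finished trials that reported a loss at steps 0 and 1:
history = [{0: 0.9, 1: 0.5}, {0: 0.8, 1: 0.4}]

# With n_warmup_steps=1, a poor value at step 0 is tolerated...
print(should_prune(0, 1.2, history, n_startup_trials=2, n_warmup_steps=1))  # False
# ...but the same trial is pruned at step 1 if it is still above the median (0.45):
print(should_prune(1, 1.2, history, n_startup_trials=2, n_warmup_steps=1))  # True
```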

The rationale behind the parameter n_warmup_steps is that a trial can start with a loss value worse than the median, but can eventually yield a much better loss value after more iterations (e.g. depending on the learning rate [...]).

It's not easy for me to explain; maybe the images at the beginning of this article give a better intuition.

OliverHennhoefer