
I am using RandomizedSearchCV for hyperparameter optimization. When I run the search, it shows the scores for each model it trains. The problem is that it trains far more than 10 models, when I expect it to train just 10 because I set n_iter=10. Why is that? What should I do to limit the total runs to 10?

Here is my code:

from catboost import CatBoostRegressor
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import randint

params = {
    'iterations': randint(100, 1000),
    'depth': randint(3, 10),
    'learning_rate': [0.01, 0.02, 0.03, 0.04, 0.05],
    'l2_leaf_reg': randint(1, 10),
    'border_count': randint(32, 255),
    'bagging_temperature': [0.0, 0.2, 0.4, 0.6, 0.8, 1.0],
}

model = CatBoostRegressor(loss_function='RMSE', od_type='Iter', cat_features=[0, 1, 2, 3, 4], task_type='GPU')
search = RandomizedSearchCV(model, param_distributions=params, n_iter=10, scoring='neg_root_mean_squared_error')

# Fit the model
search.fit(X_train, y_train, eval_set=(X_val, y_val), plot=True)

2 Answers

0

Are you just overlooking the "CV" part? By default, 5-fold cross-validation is performed to estimate the performance of each hyperparameter combination, so $5\cdot10=50$ models will be trained ($+1$ for the final refit model).
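If you want fewer total fits, the knob to turn is cv rather than n_iter: the total is n_iter candidates times cv folds, plus one final refit. A minimal sketch (on a toy dataset, without the GPU and categorical-feature settings from the question):

from catboost import CatBoostRegressor
from scipy.stats import randint
from sklearn.datasets import make_regression
from sklearn.model_selection import RandomizedSearchCV

# Toy data, just to make the sketch self-contained.
X, y = make_regression(n_samples=200, n_features=5, random_state=0)

model = CatBoostRegressor(loss_function='RMSE', verbose=0)
search = RandomizedSearchCV(
    model,
    param_distributions={'depth': randint(3, 10), 'learning_rate': [0.01, 0.05, 0.1]},
    n_iter=10,
    cv=3,        # 3 folds instead of the default 5 -> 10 * 3 = 30 CV fits
    refit=True,  # plus 1 final refit on the whole training set
    scoring='neg_root_mean_squared_error',
)
search.fit(X, y)
print(len(search.cv_results_['params']))  # 10 sampled hyperparameter combinations

With cv=3 you get 31 fits instead of 51; the number of hyperparameter combinations tried is still 10.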

Ben Reiniger
-1

The n_iter parameter in RandomizedSearchCV specifies the number of iterations to run for hyperparameter optimization. However, it does not necessarily limit the total number of models that will be trained.

The reason for this is that RandomizedSearchCV performs a randomized search over the hyperparameter space. This means that it randomly samples from the parameter distributions for each iteration. Therefore, it's possible that some hyperparameter combinations will be repeated in multiple iterations, leading to more than 10 models being trained.
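You can check what n_iter controls by drawing the candidate combinations yourself with sklearn's ParameterSampler (a minimal sketch reusing part of the params dict from the question):

from scipy.stats import randint
from sklearn.model_selection import ParameterSampler

# A subset of the params dict from the question.
params = {
    'depth': randint(3, 10),
    'learning_rate': [0.01, 0.02, 0.03, 0.04, 0.05],
}

# RandomizedSearchCV draws its candidate combinations the same way internally.
candidates = list(ParameterSampler(params, n_iter=10, random_state=0))
print(len(candidates))  # exactly 10 sampled hyperparameter combinations

Each sampled combination is then fit once per cross-validation fold.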

To limit the total number of models trained to 10, you can set the max_evals parameter in CatBoostRegressor. This parameter limits the maximum number of iterations for hyperparameter optimization. You can set it to 10 to ensure that only 10 models are trained:

model = CatBoostRegressor(loss_function='RMSE', od_type='Iter',
                          cat_features=[0, 1, 2, 3, 4],
                          task_type='GPU', max_evals=10)

search = RandomizedSearchCV(model, param_distributions=params,
                            n_iter=10, scoring='neg_root_mean_squared_error')

search.fit(X_train, y_train, eval_set=(X_val, y_val), plot=True)

Here is another way:

If you want to limit the total number of model fits to 10, you can set the n_estimators parameter in CatBoostRegressor to 1 and set the cv parameter in RandomizedSearchCV to 10 (or any other number less than or equal to 10). This ensures that only one model is fit for each combination of hyperparameters and that a total of 10 models are fit.

model = CatBoostRegressor(
    n_estimators=1, # Only fit one model per hyperparameter combination
    loss_function='RMSE', od_type='Iter',
    cat_features=[0, 1, 2, 3, 4], task_type='GPU'
)

search = RandomizedSearchCV(
    model, param_distributions=params, n_iter=10,
    scoring='neg_root_mean_squared_error', cv=10 # Limit to 10 folds
)

# Fit the model
search.fit(X_train, y_train, eval_set=(X_val, y_val), plot=True)
Hrushi
  • hi welphera, thank you for your response. I am puzzled by this expression, though: "This means that it randomly samples from the parameter distributions for each iteration. Therefore, it's possible that some hyperparameter combinations will be repeated in multiple iterations, leading to more than 10 models being trained." Do you mean that there can be more than one hyperparameter combination to be trained for each iteration? And would you recommend setting max_evals to the number I set for n_iter, or would it lead to worse hyperparameter optimization? – Mehmet Deniz Apr 01 '23 at 19:12
  • Set max_evals=10. – Hrushi Apr 02 '23 at 20:20