I want to optimize CatBoost and XGBoost models and visualize the process, with these requirements:

  1. Use 3-fold cross-validation
  2. Use my own pre-processing pipeline (missing-value imputation, over- or undersampling)
  3. Use cross-validation and visualization tools that are independent of CatBoost and XGBoost (i.e. NOT catboost.cv or xgboost.cv)
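These requirements amount to a plain cross-validation loop. Below is a minimal sketch, assuming numeric features in a NumPy array, with scikit-learn's `SimpleImputer` standing in for your own imputer and a hypothetical `random_oversample` helper standing in for imblearn's `RandomOverSampler`. The key point is that imputation and resampling are fitted on the training fold only, while the validation fold is imputed but never resampled (an imblearn `Pipeline` likewise applies samplers only during `fit`).

```python
import numpy as np
from sklearn.base import clone
from sklearn.impute import SimpleImputer
from sklearn.model_selection import StratifiedKFold


def random_oversample(X, y, rng):
    """Hypothetical stand-in for imblearn's RandomOverSampler: duplicate random rows
    of every smaller class until each class matches the largest one."""
    classes, counts = np.unique(y, return_counts=True)
    parts = []
    for cls, count in zip(classes, counts):
        idx = np.flatnonzero(y == cls)
        extra = rng.choice(idx, size=counts.max() - count, replace=True)
        parts.append(np.concatenate([idx, extra]))
    keep = np.concatenate(parts)
    return X[keep], y[keep]


def make_cv_folds(X, y, imputer=None, n_splits=3, seed=11):
    """Yield (X_train, y_train, X_val, y_val) per fold, preprocessed without leakage."""
    imputer = SimpleImputer() if imputer is None else imputer
    rng = np.random.default_rng(seed)
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train_idx, val_idx in skf.split(X, y):
        imp = clone(imputer).fit(X[train_idx])  # fit imputation on the training fold only
        X_tr, y_tr = random_oversample(imp.transform(X[train_idx]), y[train_idx], rng)
        X_val = imp.transform(X[val_idx])        # validation fold: impute, never resample
        yield X_tr, y_tr, X_val, y[val_idx]
```

Because the folds are produced outside any booster, the same list of folds can be reused for CatBoost and XGBoost, which keeps their curves comparable.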

Now I have this:

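from catboost import CatBoostClassifier
from imblearn.over_sampling import RandomOverSampler
from imblearn.pipeline import Pipeline as ImbPipeline
from sklearn.model_selection import GridSearchCV

# Imputer, cols_cat, X_train and y_train are defined elsewhere
pipe = ImbPipeline(steps=[
    ('imp', Imputer()),
    ('res', RandomOverSampler(random_state=11)),  # applied to training folds only
    ('clf', CatBoostClassifier(cat_features=cols_cat, random_seed=11, verbose=50))
])

param = [{'clf__iterations': [500]}]

# cv=3 is explicit: GridSearchCV defaults to 5 folds in current scikit-learn
cb = GridSearchCV(pipe, param, cv=3)
cb.fit(X_train, y_train)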

After cross-validation is complete (500 iterations, 3 folds), I want to build a plot like this:

100 iterations - average CV test score across the 3 folds ...
200 iterations - average CV test score across the 3 folds ...
300 iterations - average CV test score across the 3 folds ...
400 iterations - average CV test score across the 3 folds ...
500 iterations - average CV test score across the 3 folds ...
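One way to get this table without refitting for every iteration count is to train each fold's model once with the maximum number of iterations and score its staged predictions. `CatBoostClassifier` has a `staged_predict_proba` method which, with the default `eval_period=1`, yields predictions after each iteration; scikit-learn's `GradientBoostingClassifier` has a method with the same name. The sketch below assumes `folds` is an iterable of already-preprocessed `(X_train, y_train, X_val, y_val)` tuples, uses ROC AUC as the test score, and uses `GradientBoostingClassifier` only so it runs without CatBoost installed. The function names are hypothetical.

```python
import numpy as np
from sklearn.base import clone
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score


def averaged_staged_auc(model, folds):
    """Fit `model` once per fold; return (mean, std) across folds of the
    validation ROC AUC after each boosting iteration."""
    curves = []
    for X_tr, y_tr, X_val, y_val in folds:
        clf = clone(model).fit(X_tr, y_tr)
        # staged_predict_proba yields class probabilities after 1, 2, ..., n trees
        curves.append([roc_auc_score(y_val, proba[:, 1])
                       for proba in clf.staged_predict_proba(X_val)])
    curves = np.asarray(curves)  # shape: (n_folds, n_iterations)
    return curves.mean(axis=0), curves.std(axis=0)


def checkpoint_table(mean_curve, step=100):
    """Return [(iterations, mean score), ...] every `step` iterations."""
    return [(k, float(mean_curve[k - 1]))
            for k in range(step, len(mean_curve) + 1, step)]
```

With CatBoost installed, passing something like `CatBoostClassifier(iterations=500, random_seed=11, verbose=0)` as `model` should give a 500-point curve, and `checkpoint_table(mean)` then returns the 100/200/300/400/500 rows above. Each fold is trained exactly once, so the cost is the same as a single 500-iteration cross-validation.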

It's similar to early stopping, but I want the results averaged over the 3 folds. I'm willing to wait for all 500 iterations, but I don't want to run a search like this one, which refits the model once per iteration count:

param = [{'clf__iterations': [100, 200, 300, 400, 500]}]
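If every fold is trained for the full iteration budget once, choosing the iteration count afterwards is an averaged form of early stopping: take the argmax of the mean curve across folds instead of stopping each fold separately. A minimal sketch, assuming `mean_curve` and `std_curve` are per-iteration arrays (mean and standard deviation across folds); `best_iteration` and `plot_cv_curve` are hypothetical helpers, and the plot assumes matplotlib is installed.

```python
import numpy as np


def best_iteration(mean_curve, std_curve=None):
    """Averaged analogue of early stopping: the 1-based iteration count with the
    highest mean validation score, that score, and its std (or None)."""
    mean_curve = np.asarray(mean_curve)
    best = int(np.argmax(mean_curve))  # first maximum wins on ties
    std = None if std_curve is None else float(np.asarray(std_curve)[best])
    return best + 1, float(mean_curve[best]), std


def plot_cv_curve(mean_curve, std_curve, ax=None):
    """Plot the mean CV score per iteration with a +/- 1 std band across folds."""
    import matplotlib.pyplot as plt  # imported here so best_iteration works without matplotlib
    if ax is None:
        ax = plt.gca()
    mean_curve, std_curve = np.asarray(mean_curve), np.asarray(std_curve)
    iters = np.arange(1, len(mean_curve) + 1)
    ax.plot(iters, mean_curve, label="mean CV test score")
    ax.fill_between(iters, mean_curve - std_curve, mean_curve + std_curve,
                    alpha=0.2, label="+/- 1 std across folds")
    ax.axvline(best_iteration(mean_curve)[0], linestyle="--", color="grey",
               label="best iteration")
    ax.set_xlabel("iterations")
    ax.set_ylabel("score")
    ax.legend()
    return ax
```

The final model would then be refitted on the whole training set with `iterations` set to the returned count, since no single fold's model saw all the data.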

catboost.cv can build such plots, but it doesn't support custom pipelines.
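Per-iteration scores don't require catboost.cv or xgboost.cv: a model fitted once can be scored at several truncation points. As far as I know, `XGBClassifier` has no `staged_predict_proba`, but its `predict_proba` accepts `iteration_range=(begin, end)` (XGBoost 1.4 or later), which restricts prediction to those boosting rounds. A minimal sketch, assuming a fitted binary `XGBClassifier` and an already-imputed, non-resampled validation fold; `xgb_checkpoint_auc` is a hypothetical helper.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def xgb_checkpoint_auc(clf, X_val, y_val, checkpoints=(100, 200, 300, 400, 500)):
    """Validation ROC AUC of a fitted XGBClassifier truncated to its first k
    boosting rounds, for each k in `checkpoints`."""
    scores = []
    for k in checkpoints:
        # iteration_range=(0, k) uses only boosting rounds [0, k) for prediction
        proba = clf.predict_proba(X_val, iteration_range=(0, k))
        scores.append(roc_auc_score(y_val, proba[:, 1]))
    return np.asarray(scores)
```

Averaging `xgb_checkpoint_auc(...)` over the 3 folds with `np.mean(..., axis=0)` gives the same kind of table as above. Alternatively, passing `eval_set=[(X_val, y_val)]` to `fit` records the evaluation metric after every round in `clf.evals_result()`.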

Using libraries like Optuna is also an option; I'm not tied to GridSearchCV.

How can I achieve this?

Ars ML