
I am using GridSearchCV to tune the hyperparameters of my model, together with a pipeline and cross-validation. When I run it to tune XGBoost, every score comes back as nan. However, the same code works for other classifiers such as random forest and returns complete results.

from sklearn.model_selection import StratifiedKFold, GridSearchCV
from xgboost import XGBClassifier
from imblearn.under_sampling import RepeatedEditedNearestNeighbours
from imblearn.pipeline import Pipeline  # imblearn's Pipeline allows sampler steps

kf = StratifiedKFold(n_splits=10, shuffle=False)

SCORING = ['accuracy', 'precision', 'recall', 'f1']

# define parameters for hyperparameter tuning
params = {
    'Classifier__n_estimators': [5, 10, 20, 50, 100, 200]
}

XGB = XGBClassifier()
UnSam = RepeatedEditedNearestNeighbours()

# undersample inside each CV fold, then fit the classifier
pipe = Pipeline(steps=[('UnderSampling', UnSam), ('Classifier', XGB)])
# ___________________________________________

mod = GridSearchCV(pipe, params, cv=kf, scoring=SCORING, refit='f1', return_train_score=True)
mod.fit(X_train, y_train)

When I run this code, I get the following results:

{'Classifier__n_estimators': 5}
__________________________________________________
F1 :  [nan nan nan nan nan nan] 
 Recall :  [nan nan nan nan nan nan] 
 Accuracy :  [nan nan nan nan nan nan] 
 Precision :  [nan nan nan nan nan nan]

Another weird thing is that when I apply the same code to tune the penalty in LogisticRegression, it returns nan for l1 and elasticnet.

from sklearn.linear_model import LogisticRegression

kf = StratifiedKFold(n_splits=10, shuffle=False)

SCORING = ['accuracy', 'precision', 'recall', 'f1']

# define parameters for hyperparameter tuning
params = {
    'Classifier__penalty': ['l1', 'l2', 'elasticnet']
}

LR = LogisticRegression(random_state=0)
UnSam = RepeatedEditedNearestNeighbours()

pipe = Pipeline(steps=[('UnderSampling', UnSam), ('Classifier', LR)])
# ___________________________________________

mod = GridSearchCV(pipe, params, cv=kf, scoring=SCORING, refit='f1', return_train_score=True)
mod.fit(X_train, y_train)

The results are as follows:

{'Classifier__penalty': 'l2'}
__________________________________________________
F1 :  [  nan 0.363   nan] 
 Recall :  [   nan 0.4188    nan] 
 Accuracy :  [   nan 0.7809    nan] 
 Precision :  [   nan 0.3215    nan]
Amin

1 Answer


By default, GridSearchCV provides a score of nan when fitting the model fails. You can change that behavior and raise an error by setting the parameter error_score="raise", or you can try fitting a single model to get the error. You can then use the traceback to help figure out where the problem is.
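For example, a minimal sketch reusing the pipeline and grid from the question, so the first failing fit raises its exception instead of being scored nan:

# Same search as in the question, but error_score="raise" makes a
# failing fit raise with a full traceback instead of scoring nan.
mod = GridSearchCV(pipe, params, cv=kf, scoring=SCORING, refit='f1',
                   return_train_score=True, error_score='raise')
mod.fit(X_train, y_train)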

For the LogisticRegression, I can identify the likely culprit: the default solver is lbfgs, which supports only the l2 penalty (or none), so the l1 and elasticnet fits fail. Use saga, which handles all three.
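A sketch of that fix, assuming you want to keep all three penalties in the grid (note that elasticnet additionally requires l1_ratio to be set, and saga often needs more iterations to converge):

from sklearn.linear_model import LogisticRegression

# saga supports l1, l2, and elasticnet; raise max_iter since saga
# can be slow to converge, especially on unscaled data.
LR = LogisticRegression(solver='saga', random_state=0, max_iter=5000)

params = {
    'Classifier__penalty': ['l1', 'l2', 'elasticnet'],
    'Classifier__l1_ratio': [0.5],  # only used when penalty='elasticnet'
}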

I don't immediately see an issue with the XGBoost model or parameters. Get the error traceback as described in the first paragraph, and search for or ask about it as a separate question if needed.
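For instance, fitting a single candidate directly (a sketch reusing names from the question) usually reproduces the failure with a readable traceback:

# Bypass GridSearchCV: fit one candidate configuration so any
# exception propagates with its full traceback.
pipe.set_params(Classifier__n_estimators=5)
pipe.fit(X_train, y_train)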

Ben Reiniger
  • Useful info, upvoted. But do you know why a model fit would fail in the first place? The linked post shows GridSearchCV scores for some CV executions but nan for others. Can you help with this? https://datascience.stackexchange.com/questions/109322/how-to-gridsearchcv-on-balancedbagging-classifier – The Great Mar 24 '22 at 13:41