I have an imbalanced dataset and I want to train a binary classifier on it.
Here is the approach I used, which gave (relatively) acceptable performance:
1- I made a random split to get train/test sets.
2- In the training set, I down-sampled the majority class to make the training set balanced. To do that, I used the resample method from the sklearn.utils module (see the sketch after this list).
3- I trained the model and then evaluated its performance on the test set (which is unseen and still imbalanced).
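A minimal sketch of those three steps, using synthetic data in place of my real dataset (the sizes, class ratio, and variable names are just placeholders):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

# Synthetic imbalanced data standing in for my real dataset
X, y = make_classification(n_samples=10_000, weights=[0.9, 0.1], random_state=42)

# Step 1: random train/test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Step 2: down-sample the majority class (label 0) in the training set only
X_maj, y_maj = X_train[y_train == 0], y_train[y_train == 0]
X_min, y_min = X_train[y_train == 1], y_train[y_train == 1]
X_maj_down, y_maj_down = resample(
    X_maj, y_maj, replace=False, n_samples=len(y_min), random_state=42
)
X_bal = np.vstack([X_maj_down, X_min])
y_bal = np.concatenate([y_maj_down, y_min])

# Step 3: train on the balanced set, evaluate on the (still imbalanced) test set
clf = LogisticRegression(max_iter=1000).fit(X_bal, y_bal)
print(classification_report(y_test, clf.predict(X_test)))
print("AUC:", roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]))
```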
I got fairly acceptable results, including precision, recall, F1 score, and AUC.
Afterwards, I wanted to try something out, so I flipped the labels in both the training and test sets (i.e. converting 1 to 0 and 0 to 1).
Then I repeated step 3 and trained the model again on the flipped labels. This time, the model's performance dropped: I got much lower precision and F1 score on the test set.
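The flip itself was just an inversion of the binary labels; continuing the sketch above:

```python
# Flip the labels in both sets (1 -> 0, 0 -> 1), then repeat step 3
y_bal_flipped = 1 - y_bal
y_test_flipped = 1 - y_test

clf_flipped = LogisticRegression(max_iter=1000).fit(X_bal, y_bal_flipped)
print(classification_report(y_test_flipped, clf_flipped.predict(X_test)))
```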
Additional details:
The model was trained with GridSearchCV using a LogisticRegression estimator.
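Roughly, the grid search setup looked like the following, again continuing the sketch above (the parameter grid shown is only illustrative, not my exact grid):

```python
from sklearn.model_selection import GridSearchCV

# Illustrative parameter grid; my actual grid differed
param_grid = {"C": [0.01, 0.1, 1, 10], "penalty": ["l2"]}
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid,
    scoring="f1",
    cv=5,
)
search.fit(X_bal, y_bal)
print(search.best_params_)
```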
I then have two questions: is there anything wrong with my approach (i.e. the down-sampling)? And why did flipping the labels lead to worse results? I have a feeling it could be because my test set is still imbalanced, but more insight would be appreciated.