I'm working with an imbalanced multi-class dataset. I'm trying to tune the hyperparameters of a DecisionTreeClassifier, a RandomForestClassifier and a GradientBoostingClassifier using randomized search and Bayesian search.
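For reference, here is roughly what my tuning setup looks like (RandomForestClassifier shown; the dataset and parameter ranges here are placeholders, not my real ones):

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# stand-in imbalanced multi-class data (my real dataset differs)
X, y = make_classification(n_samples=500, n_classes=3, n_informative=5,
                           weights=[0.7, 0.2, 0.1], random_state=0)

# placeholder search space
param_dist = {"n_estimators": randint(50, 300), "max_depth": randint(2, 10)}

search = RandomizedSearchCV(RandomForestClassifier(random_state=0),
                            param_dist, n_iter=10,
                            scoring="accuracy",  # the choice I'm unsure about
                            cv=3, random_state=0)
search.fit(X, y)
print(search.best_params_)
```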
So far I have used plain accuracy as the scoring, which is not really suitable for assessing a model's performance on imbalanced data (which I'm not doing anyway). Is it also unsuitable for parameter tuning?
I found that, for example, recall_micro and recall_weighted yield exactly the same results as accuracy. The same should hold for other metrics like f1_micro.
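A quick check with made-up labels confirms this (for single-label multi-class problems, micro-averaged recall, weighted recall and micro-averaged F1 all reduce to accuracy):

```python
from sklearn.metrics import accuracy_score, f1_score, recall_score

# toy multi-class labels, just for illustration
y_true = [0, 0, 0, 1, 1, 2, 2, 2, 2, 2]
y_pred = [0, 1, 0, 1, 2, 2, 2, 0, 2, 2]

acc = accuracy_score(y_true, y_pred)
rec_micro = recall_score(y_true, y_pred, average="micro")
rec_weighted = recall_score(y_true, y_pred, average="weighted")
f1_micro = f1_score(y_true, y_pred, average="micro")

print(acc, rec_micro, rec_weighted, f1_micro)  # all 0.7
```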
So my question is: does the choice of scoring matter for tuning? I see that recall_macro leads to lower scores, since it doesn't take the number of samples per class into account. So which metric should I use?