For the imbalanced datasets:
- Can we say the precision-recall curve is more informative, and thus more reliable, than the ROC curve?
- Can we rely on the F1-score to evaluate the skill of the resulting model in this case?
For the imbalanced datasets:
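Saito and Rehmsmeier argue that precision-recall curves are more useful than ROC curves in "The Precision-Recall Plot Is More Informative than the ROC Plot When Evaluating Binary Classifiers on Imbalanced Datasets". Their point is that an ROC plot can be visually misleading because specificity is easy to misinterpret: the false positive rate is computed over the large negative class, so it can stay small even when false positives outnumber true positives, whereas precision shows that directly.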
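To make the difference concrete, here is a minimal sketch in plain Python. The confusion-matrix counts are hypothetical, chosen only to illustrate a roughly 1:100 class imbalance; they are not taken from the paper, and `rates` is just an illustrative helper name.

```python
def rates(tp, fp, fn, tn):
    """Return (TPR, FPR, precision) from confusion-matrix counts."""
    tpr = tp / (tp + fn)        # recall / sensitivity, the ROC y-axis
    fpr = fp / (fp + tn)        # 1 - specificity, the ROC x-axis
    precision = tp / (tp + fp)  # the precision-recall y-axis
    return tpr, fpr, precision


# Hypothetical classifier at one threshold:
# 100 actual positives, 10,000 actual negatives.
tpr, fpr, precision = rates(tp=80, fp=200, fn=20, tn=9800)

print(f"TPR={tpr:.3f}  FPR={fpr:.3f}  precision={precision:.3f}")
# TPR=0.800  FPR=0.020  precision=0.286
```

On an ROC plot this operating point (FPR 0.02, TPR 0.80) sits close to the top-left corner and looks strong, yet fewer than one in three predicted positives is correct. The precision-recall view makes that visible.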
The F1-score is the harmonic mean of precision and recall, so it weights them equally. In some domains it may be more useful to use the more general Fβ-score, which weights precision more heavily when β < 1 and recall more heavily when β > 1.
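As a rough illustration of the weighting, the general F-beta score is (1 + β²) · P · R / (β² · P + R), which reduces to F1 at β = 1. The sketch below assumes example values of precision 0.5 and recall 0.8; the function name `f_beta` and those numbers are illustrative only.

```python
def f_beta(precision, recall, beta=1.0):
    """F-beta score: weighted harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0  # convention: avoid division by zero
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Assumed example values, not from any real model.
p, r = 0.5, 0.8
for beta in (0.5, 1.0, 2.0):
    print(f"beta={beta}: F={f_beta(p, r, beta):.3f}")
# beta=0.5: F=0.541
# beta=1.0: F=0.615
# beta=2.0: F=0.714
```

With recall higher than precision here, the score rises as β grows (recall counts for more) and falls as β shrinks (precision counts for more).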