Suitable metric choice for imbalanced multi-class dataset (classes have equal importance)

Question

What type of metrics should I use to evaluate my classification models, given that I have two imbalanced multi-class datasets (21 and 16 classes, respectively) where all classes have equal importance?

I am fairly convinced that macro-averaged metrics, such as macro F1 and macro TNR, are the right choice. Are macro-averaged metrics suitable for my problem, given the setup above?

Links of possible interest, especially Harrell’s blog: https://stats.stackexchange.com/questions/357466/are-unbalanced-datasets-problematic-and-how-does-oversampling-purport-to-he https://www.fharrell.com/post/class-damage/ https://www.fharrell.com/post/classification/ https://stats.stackexchange.com/a/359936/247274 https://stats.stackexchange.com/questions/464636/proper-scoring-rule-when-there-is-a-decision-to-make-e-g-spam-vs-ham-email https://twitter.com/f2harrell/status/1062424969366462473?lang=en — Dave, Apr 03 '21 at 11:33

score 2 · Accepted Answer · answered Apr 03 '21 at 23:31

Yes, a macro-averaged measure is the standard choice in this context: a macro-averaged score is simply the unweighted mean of the per-class scores, so it treats every class equally regardless of how many instances it has.
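For reference, the usual definitions can be written as follows, for $K$ classes with per-class counts $TP_k$, $FP_k$, $FN_k$, $TN_k$ (one-vs-rest). Note that some authors define macro F1 instead as the harmonic mean of macro precision and macro recall, which can give a slightly different number; the per-class-mean form below is the common one.

```latex
\text{Score}_{\text{macro}} = \frac{1}{K}\sum_{k=1}^{K} \text{Score}_k,
\qquad
\text{F1}_k = \frac{2\,TP_k}{2\,TP_k + FP_k + FN_k},
\qquad
\text{TNR}_k = \frac{TN_k}{TN_k + FP_k}

P_{\text{micro}} = \frac{\sum_k TP_k}{\sum_k (TP_k + FP_k)},
\qquad
R_{\text{micro}} = \frac{\sum_k TP_k}{\sum_k (TP_k + FN_k)},
\qquad
\text{F1}_{\text{micro}} = \frac{2\,P_{\text{micro}}\,R_{\text{micro}}}{P_{\text{micro}} + R_{\text{micro}}}
```

The micro average pools the counts over all instances before computing the score, so large classes dominate it; in single-label multi-class classification $\sum_k FP_k = \sum_k FN_k$ (each error is one false positive and one false negative), and micro F1 reduces to plain accuracy.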

With a strongly imbalanced dataset, this means that a small class with only a few instances in the data is given as much weight as the majority class, whereas a micro-average pools all instances and is therefore dominated by the majority class. Since small classes are generally harder for a classifier to identify correctly, the macro-averaged score is usually lower than the micro-averaged one (this is normal, of course).
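To illustrate the gap, here is a minimal pure-Python sketch on a made-up toy dataset (100 instances, classes "A", "B", "C" with 90/8/2 instances; the data and the helper names `confusion_counts`, `macro_f1`, `micro_f1`, `macro_tnr` are hypothetical, chosen for illustration only). The classifier is right on the majority class, gets half of "B", and never predicts "C". In practice scikit-learn's `f1_score` with `average="macro"` or `average="micro"` computes the F1 variants; the zero-division handling below (score 0 when a denominator is 0) is an assumption and is a configurable choice in libraries.

```python
def safe_div(num, den):
    # Assumed convention: an undefined ratio (0/0) counts as 0.
    return num / den if den else 0.0


def confusion_counts(y_true, y_pred, labels):
    """One-vs-rest (TP, FP, FN, TN) for each class of a single-label problem."""
    n = len(y_true)
    counts = {}
    for k in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == k and p == k)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != k and p == k)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == k and p != k)
        counts[k] = (tp, fp, fn, n - tp - fp - fn)
    return counts


def f1(tp, fp, fn):
    return safe_div(2 * tp, 2 * tp + fp + fn)


def macro_f1(y_true, y_pred, labels):
    # Unweighted mean of per-class F1: every class counts equally.
    c = confusion_counts(y_true, y_pred, labels)
    return sum(f1(tp, fp, fn) for tp, fp, fn, _ in c.values()) / len(labels)


def micro_f1(y_true, y_pred, labels):
    # Pool the counts over all classes first: dominated by large classes.
    c = confusion_counts(y_true, y_pred, labels)
    tp = sum(v[0] for v in c.values())
    fp = sum(v[1] for v in c.values())
    fn = sum(v[2] for v in c.values())
    return f1(tp, fp, fn)


def macro_tnr(y_true, y_pred, labels):
    # Unweighted mean of per-class specificity, TN / (TN + FP).
    c = confusion_counts(y_true, y_pred, labels)
    return sum(safe_div(tn, tn + fp) for _, fp, _, tn in c.values()) / len(labels)


# Hypothetical imbalanced toy data: 90 "A", 8 "B", 2 "C".
labels = ["A", "B", "C"]
y_true = ["A"] * 90 + ["B"] * 8 + ["C"] * 2
# Classifier: all "A" correct, 4 of 8 "B" correct, every "C" predicted as "A".
y_pred = ["A"] * 90 + ["B"] * 4 + ["A"] * 4 + ["A"] * 2

print("macro F1 :", round(macro_f1(y_true, y_pred, labels), 3))   # 0.545
print("micro F1 :", round(micro_f1(y_true, y_pred, labels), 3))   # 0.94
print("macro TNR:", round(macro_tnr(y_true, y_pred, labels), 3))  # 0.8
```

Micro F1 here equals the accuracy (94/100), which looks good because the majority class is handled well, while macro F1 exposes that class "C" is never detected (its F1 is 0) and "B" is only half recovered.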
