
I am looking for a way to quantify the performance of multi-class model labelers, and thus compare them. I want to account for the fact that some classes are 'closer' than others (for example, a 'car' is closer to a 'truck' than to a 'flower'). So, if a labeler classifies a car as a truck, that is better than classifying the car as a flower. I am considering using a Jaccard similarity score. Will this do what I want?

Tavi
  • How would you use Jaccard similarity exactly? – Erwan Feb 06 '20 at 18:11
  • The Jaccard score computes the average of the per-class Jaccard similarity coefficients, where each coefficient is the size of the intersection of the two sets of labels divided by the size of their union. – Tavi Feb 06 '20 at 18:21
  • Yes, but I mean: what are the sets that you are going to compare? I don't see how Jaccard can find a higher similarity between the classes "car" and "truck" than between "car" and "flower". Or maybe you independently calculate the similarity based on the words' contexts in a large corpus? – Erwan Feb 06 '20 at 18:24
  • I was also thinking that RMSE might be valuable, since it takes the distance from the truth into account. – Tavi Feb 06 '20 at 18:25
  • Agreed. I would have to manually indicate 'closeness' – Tavi Feb 06 '20 at 18:26
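To make Erwan's point concrete, here is a small sketch (with made-up labels) of why a plain Jaccard score cannot capture class 'closeness': it only counts exact matches per class, so a car mislabeled as a truck is penalized exactly like a car mislabeled as a flower.

```python
def jaccard_per_class(y_true, y_pred, cls):
    """Jaccard coefficient for one class: |intersection| / |union|."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    return tp / (tp + fp + fn)

def macro_jaccard(y_true, y_pred):
    """Average the per-class coefficients (macro averaging)."""
    classes = sorted(set(y_true) | set(y_pred))
    return sum(jaccard_per_class(y_true, y_pred, c) for c in classes) / len(classes)

y_true    = ["car", "car", "truck", "flower"]
pred_near = ["truck", "car", "truck", "flower"]   # car -> truck: a "close" miss
pred_far  = ["flower", "car", "truck", "flower"]  # car -> flower: a "far" miss

# Both labelers get the same macro-Jaccard score, even though one makes
# a "closer" mistake than the other:
print(macro_jaccard(y_true, pred_near))
print(macro_jaccard(y_true, pred_far))
```

Both calls print the same value (2/3), confirming that Jaccard alone does not distinguish between near and far misclassifications.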

1 Answer


There is no commonly established metric for that. You'll have to write custom code based on manually indicating rank-ordered preferences among misclassifications.
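One minimal sketch of what such custom code might look like: a hand-specified cost matrix encoding your rank-ordered preferences among misclassifications. The class names and cost values below are purely illustrative assumptions, not canonical numbers.

```python
# Hypothetical misclassification costs: 0.0 = correct, higher = worse
# mistake. A "close" confusion (car <-> truck) is assigned a lower cost
# than a "far" one (car <-> flower). These values are illustrative only.
COST = {
    ("car", "car"): 0.0,       ("car", "truck"): 0.2,   ("car", "flower"): 1.0,
    ("truck", "truck"): 0.0,   ("truck", "car"): 0.2,   ("truck", "flower"): 1.0,
    ("flower", "flower"): 0.0, ("flower", "car"): 1.0,  ("flower", "truck"): 1.0,
}

def mean_misclassification_cost(y_true, y_pred):
    """Average cost per prediction: 0.0 is perfect, higher is worse."""
    return sum(COST[(t, p)] for t, p in zip(y_true, y_pred)) / len(y_true)

# A labeler that confuses cars with trucks now scores better (lower)
# than one that confuses cars with flowers:
print(mean_misclassification_cost(["car", "car"], ["truck", "car"]))   # 0.1
print(mean_misclassification_cost(["car", "car"], ["flower", "car"]))  # 0.5
```

With this scheme, lower is better, and comparing labelers reduces to comparing their mean costs on a shared evaluation set.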

Brian Spiering