I am looking for a way to quantify the performance of multi-class model labelers, and thus compare them. I want to account for the fact that some classes are 'closer' than others (for example, a car is 'closer' to a truck than to a flower). So, if a labeler classifies a car as a truck, that is better than classifying the car as a flower. I am considering using a Jaccard similarity score. Will this do what I want?
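To make the comparison concrete, here is a minimal pure-Python sketch of a micro-averaged multi-class Jaccard score (the class names are just the examples above). Note that it scores a 'close' mistake and a 'far' mistake identically, which is exactly the concern:

```python
def micro_jaccard(y_true, y_pred):
    # Pool per-class true positives / false positives / false negatives
    # across all classes ("micro" averaging), then take TP / (TP + FP + FN).
    labels = set(y_true) | set(y_pred)
    tp = fp = fn = 0
    for c in labels:
        for t, p in zip(y_true, y_pred):
            if t == c and p == c:
                tp += 1
            elif p == c:
                fp += 1
            elif t == c:
                fn += 1
    return tp / (tp + fp + fn)

y_true = ["car", "car", "car", "truck"]
y_near = ["truck", "car", "car", "truck"]   # a car mistaken for a truck
y_far  = ["flower", "car", "car", "truck"]  # a car mistaken for a flower

print(micro_jaccard(y_true, y_near))  # 0.6
print(micro_jaccard(y_true, y_far))   # 0.6 -- the same score either way
```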
- How would you use Jaccard similarity exactly? – Erwan Feb 06 '20 at 18:11
- The Jaccard score averages the Jaccard similarity coefficients. So basically it's the ratio of the intersection to the union of the two (or more) sets of labels. – Tavi Feb 06 '20 at 18:21
- Yes, but I mean: what are the sets that you are going to compare? I don't see how Jaccard can find a higher similarity between the classes "car" and "truck" than between "car" and "flower". Or maybe you independently calculate the similarity based on the words' context in a large corpus? – Erwan Feb 06 '20 at 18:24
- I was also thinking that RMSE might be valuable, since that takes into consideration the distance from the truth. – Tavi Feb 06 '20 at 18:25
- Agreed. I would have to manually indicate 'closeness'. – Tavi Feb 06 '20 at 18:26
1 Answer
There is no commonly established metric for that. You'll have to write custom code based on manually indicated, rank-ordered preferences among misclassifications.
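A minimal sketch of such custom code, assuming a hand-specified cost matrix; the class names and penalty values below are illustrative assumptions, not canonical choices:

```python
# Hand-specified misclassification costs: (true, predicted) -> penalty.
# 0.0 means correct, small values mean "close" mistakes, 1.0 means "far" ones.
COST = {
    ("car", "car"): 0.0,    ("car", "truck"): 0.2,    ("car", "flower"): 1.0,
    ("truck", "car"): 0.2,  ("truck", "truck"): 0.0,  ("truck", "flower"): 1.0,
    ("flower", "car"): 1.0, ("flower", "truck"): 1.0, ("flower", "flower"): 0.0,
}

def mean_misclassification_cost(y_true, y_pred, cost=COST):
    """Average penalty over all predictions; 0.0 is perfect, higher is worse."""
    return sum(cost[(t, p)] for t, p in zip(y_true, y_pred)) / len(y_true)

# A labeler that confuses car/truck now scores better (lower cost)
# than one that confuses car/flower:
print(mean_misclassification_cost(["car", "car"], ["truck", "car"]))   # 0.1
print(mean_misclassification_cost(["car", "car"], ["flower", "car"]))  # 0.5
```

Lower mean cost is better, so two labelers can be compared directly on the same validation set.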
Brian Spiering