
I have a logistic regression classifier that shows differing levels of performance for precision and recall at different probability boundaries as follows:

[Plot of precision and recall against the decision threshold; the two curves intersect at roughly 0.82.]

The default threshold for the classifier to decide which class an instance belongs to is 0.5. However, am I right in understanding that, to get the best precision/recall trade-off, I should set the decision threshold to about 0.82, where the two curves intersect? That can be done in Scikit-Learn, but I want to make sure I am drawing the correct conclusions. Any advice would be appreciated.
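For reference, here is a minimal sketch of how a custom threshold could be applied with scikit-learn (the data and the 0.82 value are illustrative): `predict()` hard-codes the 0.5 cut-off, so you compare `predict_proba` against your own threshold instead.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data just to make the sketch runnable.
X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression().fit(X_train, y_train)

# predict_proba gives P(class=1); compare it against the chosen
# threshold instead of relying on the 0.5 default used by predict().
threshold = 0.82
y_pred = (clf.predict_proba(X_test)[:, 1] >= threshold).astype(int)
```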

Sandy Lee

1 Answer


The intersection of the precision and recall curves is certainly a good choice, but it's not the only one possible.

The choice depends primarily on the application: in some applications very high recall is crucial (e.g. a fire alarm system), whereas in others precision matters more (e.g. deciding whether somebody needs a risky medical treatment). If your application needs high recall you'd choose a threshold below 0.6; if it needs high precision you'd choose one around 0.85-0.9.

If none of these cases applies, people usually choose an evaluation metric to optimize: the F1-score is a common one, sometimes accuracy (but don't use accuracy if there is strong class imbalance). The F1-score is likely to be optimal around the point where the two curves intersect, but that's not guaranteed: for example, it might peak a bit before 0.8, where recall decreases slowly while precision increases fast (this is just an example, I'm not sure of course).
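Scanning thresholds for the maximal F1-score can be sketched with `precision_recall_curve` (synthetic data here, just to show the mechanics):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

probs = LogisticRegression().fit(X_train, y_train).predict_proba(X_val)[:, 1]

# One (precision, recall) pair per candidate threshold; the last pair
# has no associated threshold, so it is dropped below.
precision, recall, thresholds = precision_recall_curve(y_val, probs)

# F1 at each candidate threshold (small epsilon avoids division by zero).
f1 = 2 * precision[:-1] * recall[:-1] / (precision[:-1] + recall[:-1] + 1e-12)
best_threshold = thresholds[np.argmax(f1)]
```

In practice you would select the threshold on a validation set, then report metrics on a held-out test set.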

My point is that even though it's a perfectly reasonable choice in this case, in general there's no particular reason to automatically choose the point where precision and recall are equal.

Erwan
  • Makes sense. Another method I have come across for threshold selection is the Zweig-Campbell score (paper available here: https://bit.ly/3qnoIuW). Have you ever come across that? Is it useful? It is starting to feel like I am over thinking this. – Sandy Lee Jan 15 '21 at 12:24
  • @SandyLee no, I don't know this Zweig-Campbell score. Interestingly, when I searched it on DataScienceSE and [CrossValidatedSE](https://stats.stackexchange.com) to see if it is commonly used (turns out it's not), I found [this answer](https://stats.stackexchange.com/a/278969/250483) which is probably relevant to your question. Personally, in cases where there's no reason to favor either precision or recall I would just select the threshold where the F1-score is maximal: I've seen this a lot so it's pretty standard... but that doesn't necessarily mean that it's "the right way" ;) – Erwan Jan 15 '21 at 23:07
  • Thanks for your help with this Erwan. I really appreciate the support. – Sandy Lee Jan 19 '21 at 09:34