
In the context of performance measures for classification, I have a question about recall and precision.

Looking at the definition of recall:

$recall = \frac{T_p}{T_p+F_n}$

When I look at this, it sounds like a *conditional* probability to me -- the probability that a test instance will be classified as positive, given that it is indeed positive:

$recall = \Pr(X \text{ is predicted as positive} \mid X = \text{positive}) = \frac{T_p}{T_p+F_n}$

Here I am taking the liberty of assuming (as it follows from the definition of $F_n$) that $F_n$ in the denominator is actually the count of test instances which are positive but got misclassified as negative.

In the same light, if I now think of precision, I think of it as the probability that a new test instance is actually positive, given that it is predicted as positive:

$precision = \frac{T_p}{T_p+F_p} = \Pr(X = \text{positive} \mid X \text{ is predicted as positive})$
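
To make my interpretation concrete, here is a minimal Python sketch with made-up confusion-matrix counts (the numbers are purely hypothetical):

```python
# Hypothetical counts from a confusion matrix
tp, fp, fn = 40, 10, 20

recall = tp / (tp + fn)      # Pr(predicted positive | actually positive)
precision = tp / (tp + fp)   # Pr(actually positive | predicted positive)

print(recall)     # 40 / 60 ≈ 0.667
print(precision)  # 40 / 50 = 0.8
```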

Is this interpretation of precision and recall correct?

Dhiraj
    Yes, it is correct. – Valentas Apr 04 '17 at 08:15
  • Please post it as an answer, so I can accept it as an answer. Just formality. Thx – Dhiraj Apr 04 '17 at 08:17
  • Tried posting confirmation myself that yes it is correct interpretation but looks like this site doesn't allow such a short answer. So I guess I will just have to leave this as it is then. – Dhiraj Apr 05 '17 at 23:02

2 Answers


Your interpretation is correct.

$ F_n $ represents the number of false negatives whereas $ F_p $ represents the number of false positives.

Recall

answers the question: when presented a positive example, how often does the classifier get it right?

aliases: sensitivity, true positive rate (TPR)

Precision

answers the question: out of all the examples the classifier thought were positive, how often were the examples actually positive?

aliases: positive predictive value (PPV)
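
As a quick sanity check, the ratio definitions agree with scikit-learn's built-in metrics. A minimal sketch on made-up labels, assuming scikit-learn is installed:

```python
from sklearn.metrics import precision_score, recall_score

y_true = [1, 1, 1, 0, 0, 0]  # hypothetical true labels
y_pred = [1, 1, 0, 1, 0, 0]  # hypothetical predictions

# From these labels: TP = 2, FN = 1, FP = 1
print(recall_score(y_true, y_pred))     # 2 / (2 + 1) ≈ 0.667
print(precision_score(y_true, y_pred))  # 2 / (2 + 1) ≈ 0.667
```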

Ben

Another interesting insight is that Bayes' theorem connects precision and recall when they are viewed as estimates of conditional probabilities: $P(\hat{C}=P \mid C=P) := \frac{P(C=P,\hat{C}=P)}{P(C=P)} = \frac{P(C=P \mid \hat{C}=P)\,P(\hat{C}=P)}{P(C=P)},$

with

  • $\hat{C}$ predicted class
  • $C$ true class (e.g. $C=P$ means that the true class is positive)
  • $P(\hat{C}=P|C=P)$ Recall
  • $P(C=P|\hat{C}=P)$ Precision
  • $P(C=P)$ prevalence
  • $P(\hat{C}=P)$ probability of a positive prediction (implicitly depending on the classification threshold), estimated by the fraction of predicted positive cases

The implicit dependence of $P(\hat{C}=P)$ on the threshold is the reason why the precision-recall curve is not just a straight line.
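
A minimal numerical check of this identity in Python, with made-up confusion-matrix counts:

```python
# Hypothetical confusion-matrix counts
tp, fp, fn, tn = 40, 10, 20, 30
n = tp + fp + fn + tn

recall = tp / (tp + fn)             # P(C_hat = P | C = P)
precision = tp / (tp + fp)          # P(C = P | C_hat = P)
prevalence = (tp + fn) / n          # P(C = P)
pred_positive_rate = (tp + fp) / n  # P(C_hat = P)

# Bayes' theorem: precision = recall * prevalence / P(predicted positive)
assert abs(precision - recall * prevalence / pred_positive_rate) < 1e-12
print(precision)  # 0.8
```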

Ggjj11