
My problem is the following: I have a binary logistic regression model trained on a very imbalanced dataset that outputs a predicted probability for each example. As can be seen in the images, as the threshold is increased there's a certain point where it stops predicting. I am researching calibration techniques to try to make it work better, but I thought maybe I could get some direction here.

I've tried giving weights to the classes, but it didn't seem to help much.

Is it a probability calibration problem?

The three graphs below are shown in no particular order.

Thanks in advance.

[Graphs 1, 2, and 3]

fractalnature

1 Answer


Given a confusion matrix:

            predicted
            (+)  (-)
           -----------
       (+) | TP | FN |
actual     |----+----|
       (-) | FP | TN |
           -----------

we know that:

Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
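
For a quick sanity check, here is a minimal sketch of computing both metrics with scikit-learn (the labels below are toy values, not taken from the question):

    import numpy as np
    from sklearn.metrics import confusion_matrix

    # Toy labels, purely for illustration -- substitute your own y_true / y_pred.
    y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
    y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

    # Note: scikit-learn lays the matrix out as [[TN, FP], [FN, TP]]
    # (rows = actual, columns = predicted, labels sorted ascending),
    # i.e. flipped relative to the drawing above.
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

    precision = tp / (tp + fp)   # TP / (TP + FP)
    recall = tp / (tp + fn)      # TP / (TP + FN)
    print(f"precision={precision:.2f}  recall={recall:.2f}")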

"As can be seen in the images, as the threshold is increased there's a certain point where it stops predicting"

That's not quite true: it only stops predicting one of the classes, which is understandable, because you moved the threshold so far that all of your predictions land on the other class.

Don't optimise this by hand. A random forest, for example, will determine the cut-off level implicitly; otherwise, treat the decision threshold as a hyperparameter and tune it with ordinary hyperparameter optimisation.
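
If you do decide to tune the threshold explicitly, a common recipe is to sweep candidate cut-offs on held-out predicted probabilities and keep the one that maximises a metric you care about. A minimal sketch using F1 on synthetic imbalanced data (every name and number here is illustrative):

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import precision_recall_curve
    from sklearn.model_selection import train_test_split

    # Synthetic imbalanced data (~5% positives), purely for illustration.
    X, y = make_classification(n_samples=5000, weights=[0.95, 0.05],
                               random_state=0)
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, stratify=y,
                                                random_state=0)

    model = LogisticRegression(class_weight="balanced", max_iter=1000)
    model.fit(X_tr, y_tr)
    probs = model.predict_proba(X_val)[:, 1]

    # Sweep all candidate thresholds and keep the one maximising F1
    # on the validation set.
    precision, recall, thresholds = precision_recall_curve(y_val, probs)
    f1 = 2 * precision * recall / (precision + recall + 1e-12)
    best = np.argmax(f1[:-1])  # the last precision/recall pair has no threshold
    print(f"best threshold={thresholds[best]:.3f}  F1={f1[best]:.3f}")

    y_pred = (probs >= thresholds[best]).astype(int)

Recent scikit-learn releases (1.5+) also ship TunedThresholdClassifierCV in sklearn.model_selection, which automates this kind of sweep with cross-validation.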

Noah Weber