
My problem is the following: I have a binary logistic regression model trained on a very imbalanced dataset that outputs a predicted probability for each example. As can be seen in the images, as the threshold is increased there's a certain point where it stops predicting. I am researching calibration techniques to try to make it work better, but I thought maybe I could get some direction here.

I've tried giving weights to the classes, but it didn't seem to help much.

Is it a probability calibration problem?

The three graphs below are shown in no particular order.

Thanks in advance.

[Graphs 1, 2, and 3]

fractalnature

1 Answer


Given a confusion matrix:

            predicted
            (+)  (-)
           -----------
       (+) | TP | FN |
actual     |----+----|
       (-) | FP | TN |
           -----------

we know that:

Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
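
For a quick sanity check, here is a minimal sketch of computing both metrics with scikit-learn (the labels below are toy values, not taken from the question):

    import numpy as np
    from sklearn.metrics import confusion_matrix

    # Toy labels, purely for illustration -- substitute your own y_true / y_pred.
    y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
    y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

    # Note: scikit-learn lays the matrix out as [[TN, FP], [FN, TP]]
    # (rows = actual, columns = predicted, labels sorted ascending),
    # i.e. flipped relative to the drawing above.
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

    precision = tp / (tp + fp)   # TP / (TP + FP)
    recall = tp / (tp + fn)      # TP / (TP + FN)
    print(f"precision={precision:.2f}  recall={recall:.2f}")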

"As can be seen in the images, as the threshold is increased there's a certain point where it stops predicting"

That's not quite true: it only stops predicting one of the classes, which is understandable, because you moved the threshold so far that all of your predictions land on the other class.

Don't optimise this by hand. A random forest, for example, will determine the cut-off level implicitly; otherwise, treat the decision threshold as a hyperparameter and tune it with ordinary hyperparameter optimisation.
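
If you do decide to tune the threshold explicitly, a common recipe is to sweep candidate cut-offs on held-out predicted probabilities and keep the one that maximises a metric you care about. A minimal sketch using F1 on synthetic imbalanced data (every name and number here is illustrative):

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import precision_recall_curve
    from sklearn.model_selection import train_test_split

    # Synthetic imbalanced data (~5% positives), purely for illustration.
    X, y = make_classification(n_samples=5000, weights=[0.95, 0.05],
                               random_state=0)
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, stratify=y,
                                                random_state=0)

    model = LogisticRegression(class_weight="balanced", max_iter=1000)
    model.fit(X_tr, y_tr)
    probs = model.predict_proba(X_val)[:, 1]

    # Sweep all candidate thresholds and keep the one maximising F1
    # on the validation set.
    precision, recall, thresholds = precision_recall_curve(y_val, probs)
    f1 = 2 * precision * recall / (precision + recall + 1e-12)
    best = np.argmax(f1[:-1])  # the last precision/recall pair has no threshold
    print(f"best threshold={thresholds[best]:.3f}  F1={f1[best]:.3f}")

    y_pred = (probs >= thresholds[best]).astype(int)

Recent scikit-learn releases (1.5+) also ship TunedThresholdClassifierCV in sklearn.model_selection, which automates this kind of sweep with cross-validation.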

Noah Weber