
I want to recreate the catboost.utils.select_threshold(desc) method for a CalibratedClassifierCV model.

In CatBoost I can pass a desired FPR value, and the method returns the boundary at which that FPR is reached.

My goal is to apply the same logic after computing fpr, tpr and thresholds with sklearn.metrics.roc_curve.

I have the following code:

prob_pred = model.predict_proba(X[features_list])[:, 1]
            
fpr, tpr, thresholds = metrics.roc_curve(X['target'], prob_pred)

optimal_idx = np.argmax(tpr - fpr) # here I need to use FPR=0.1
boundary = thresholds[optimal_idx]
 
binary_pred = [1 if i >= boundary else 0 for i in prob_pred]

I guess there should be a simple formula, but I am not sure where to plug in the 0.1 value to adjust the threshold.

Michael

1 Answer


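After some research and testing, it turns out to be this simple. roc_curve returns fpr in non-decreasing order and thresholds in decreasing order, so the last threshold whose FPR is still within the limit is the lowest boundary that satisfies it: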

from sklearn.metrics import roc_curve

def select_threshold(proba, target, fpr_max=0.1):
    # compute the ROC curve
    fpr, tpr, thresholds = roc_curve(target, proba)
    # take the last (lowest) threshold whose FPR is still <= fpr_max
    best_threshold = thresholds[fpr <= fpr_max][-1]

    return best_threshold
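
To sanity-check the rule above end to end, here is a minimal sketch on synthetic data. The dataset from make_classification, the LogisticRegression base estimator, and the helper name threshold_at_fpr are all my own assumptions for illustration, not part of the original question. It applies the same selection rule, turns the probabilities into binary predictions, and measures the FPR actually achieved.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split


def threshold_at_fpr(y_true, proba, fpr_max=0.1):
    """Hypothetical helper: lowest roc_curve threshold with FPR <= fpr_max."""
    fpr, tpr, thresholds = roc_curve(y_true, proba)
    # fpr is non-decreasing, so searchsorted finds the last index
    # whose FPR is still within the budget.
    idx = np.searchsorted(fpr, fpr_max, side="right") - 1
    return thresholds[idx]


# Synthetic binary classification data (assumption, for illustration only)
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

model = CalibratedClassifierCV(LogisticRegression(max_iter=1000), cv=3)
model.fit(X_train, y_train)
proba = model.predict_proba(X_test)[:, 1]

boundary = threshold_at_fpr(y_test, proba, fpr_max=0.1)
binary_pred = (proba >= boundary).astype(int)

# Empirical FPR: share of actual negatives that were predicted positive
achieved_fpr = binary_pred[y_test == 0].mean()
print(f"boundary={boundary:.4f}, achieved FPR={achieved_fpr:.4f}")
```

Note that the achieved FPR is only guaranteed on the data the threshold was picked on; on new data it can drift, so it is probably worth choosing the boundary on a held-out set.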
Michael