
I read this interesting book on conformal predictors: https://arxiv.org/abs/2107.07511. Conformal predictors are a way to choose a prediction set that is guaranteed to include the true label with some pre-chosen certainty. I was wondering if there is a way to get conformal predictors to output calibrated probabilities. For example, say I have a binary classification problem (dog vs. cat images). Conformal predictors can be used to predict whether an image is a dog or a cat in difficult examples, but what I'm looking for is something like calibrated p-values for the prediction. The sigmoid outputs (from my neural net, for example) are well known not to reflect actual p-values. Can conformal predictors do this (assuming, of course, that I have a calibration dataset available)? If so, can anyone point me to the procedure? I can't find it.

user151434

1 Answer


I think what you are looking for is something like crepes: https://github.com/henrikbostrom/crepes, which seems to do exactly what you are asking (providing p-values). I stumbled upon it while looking at methods to calibrate models (i.e. fitting a spline on the outputs).

The code below provides what you ask for with a scikit-learn random forest:

from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from crepes import WrapClassifier
from sklearn.ensemble import RandomForestClassifier

# Binary classification dataset from OpenML.
dataset = fetch_openml(name="qsar-biodeg", parser="auto")

X = dataset.data.values.astype(float)
y = dataset.target.values

# Split into training and test sets, then split off a calibration set
# from the training data (calibration must not reuse the training examples).
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5)
X_prop_train, X_cal, y_prop_train, y_cal = train_test_split(X_train, y_train, test_size=0.25)

# Wrap the underlying model, fit it on the proper training set,
# calibrate on the held-out calibration set, then get conformal p-values.
rf = WrapClassifier(RandomForestClassifier(n_jobs=-1))
rf.fit(X_prop_train, y_prop_train)
rf.calibrate(X_cal, y_cal)
rf.predict_p(X_test)
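
If you also want the usual conformal prediction sets at a chosen confidence level rather than p-values, crepes has a predict_set method as well; the confidence keyword below is how I remember the API, so double-check against the docs:

# 0/1 matrix with one column per class; a 1 means that class is included
# in the prediction set for the corresponding test example.
prediction_sets = rf.predict_set(X_test, confidence=0.95)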

Notice that we need to split the data in three so that calibration is performed on data that was not used for training. Regarding other implementations (TensorFlow/Keras, PyTorch), I don't know whether this is directly compatible (I think not). I have also found a Venn-ABERS implementation that doesn't seem to need access to the model.
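
That said, split (inductive) conformal p-values only need the model's predicted probabilities on a held-out calibration set, so you can compute them for a PyTorch model without crepes. Below is a minimal NumPy sketch of my own, assuming the standard hinge nonconformity score (1 minus the predicted probability of a label) and smoothed p-values with uniform tie-breaking, as described in the conformal prediction literature; it is an illustration, not the crepes implementation.

import numpy as np

def split_conformal_p_values(cal_probs, cal_labels, test_probs, rng=None):
    # cal_probs:  (n_cal, n_classes) predicted probabilities on the calibration set
    # cal_labels: (n_cal,) integer class indices of the true calibration labels
    # test_probs: (n_test, n_classes) predicted probabilities on the test set
    # Returns an (n_test, n_classes) array with one p-value per candidate label.
    rng = np.random.default_rng() if rng is None else rng
    n_cal = len(cal_labels)
    # Hinge nonconformity score of each calibration example w.r.t. its true label.
    cal_scores = 1.0 - cal_probs[np.arange(n_cal), cal_labels]
    # Nonconformity of each test example w.r.t. every candidate label.
    test_scores = 1.0 - test_probs
    # Uniform tie-breaking variable ("smoothed" conformal p-values).
    theta = rng.uniform(size=test_scores.shape)
    greater = (cal_scores[None, None, :] > test_scores[:, :, None]).sum(axis=-1)
    equal = (cal_scores[None, None, :] == test_scores[:, :, None]).sum(axis=-1)
    return (greater + theta * (equal + 1)) / (n_cal + 1)

For a binary network you can build the two-column probability matrices as np.column_stack([1 - p, p]) from the sigmoid outputs. The guarantee is that, under exchangeability, the p-value attached to the true label is uniformly distributed, so thresholding it at a chosen significance level gives prediction sets with the advertised coverage.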

Lucas Morin
Hello, yes I did come across crepes and you are right, but as you mentioned I have a PyTorch model. They use the function found here: https://github.com/henrikbostrom/crepes/blob/main/src/crepes/base.py (lines 108-162). I'm not sure where this equation comes from, or the rationale for it. It seems very weird to me (it has an np.random.rand?) – user151434 Jul 04 '23 at 20:20