6

When using the Python / sklearn API of XGBoost, are the probabilities obtained via the predict_proba method "real probabilities", or do I have to use binary:logitraw and manually apply the sigmoid function?

I wanted to experiment with different cutoff points. Currently, using binary:logistic via sklearn's XGBClassifier, the probabilities returned by predict_proba cluster around the two classes rather than forming a continuous distribution in which moving the cutoff point would change the final scoring.

Is this the right way to obtain probabilities for experimenting with the cutoff value?
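For reference, here is a minimal sketch of the setup I mean (the toy dataset and variable names are placeholders, not my real data):

```python
import numpy as np
from sklearn.datasets import make_classification
from xgboost import XGBClassifier

# Toy data standing in for the real problem
X, y = make_classification(n_samples=1000, random_state=0)

# binary:logistic is the default objective for XGBClassifier
clf = XGBClassifier(objective="binary:logistic")
clf.fit(X, y)

# Column 1 is the predicted probability of the positive class
proba = clf.predict_proba(X)[:, 1]

# Experimenting with a non-default cutoff instead of the implicit 0.5
cutoff = 0.3
preds = (proba >= cutoff).astype(int)
```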

[image: distribution of the predicted probabilities]

Georg Heiler

2 Answers

3

Curious, Georg, whether you ran across this article in your pursuit of generating probabilities. It is worth noting that binary:logistic and multi:softprob return the predicted probability of each data point belonging to each class.

You can look here to see how predict_proba is implemented: XGBoost Predict_Proba Code
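A quick way to convince yourself that predict_proba already applies the sigmoid (a sketch; it assumes a fitted XGBClassifier clf and feature matrix X as in the question, and uses Booster.predict with output_margin=True to obtain the raw logits):

```python
import numpy as np
import xgboost as xgb

# Raw, untransformed margins (the binary:logitraw-style scores)
booster = clf.get_booster()
margin = booster.predict(xgb.DMatrix(X), output_margin=True)

# Manually applying the sigmoid ...
manual = 1.0 / (1.0 + np.exp(-margin))

# ... matches what predict_proba returns for the positive class
proba = clf.predict_proba(X)[:, 1]
print(np.allclose(manual, proba))  # expected: True
```

So the values from predict_proba are already on the probability scale; there is no need to switch to binary:logitraw and transform them yourself.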

  • Still, the output does not look like probabilities: https://github.com/dmlc/xgboost/issues/1763. In the JVM version it seems to work better. – Georg Heiler Jan 09 '17 at 06:39
0

The LightGBM forum provided the answer ;) https://github.com/Microsoft/LightGBM/issues/272#issuecomment-276168493

  • Apparently, my model fits the data very well, i.e. it is very confident about the class probabilities.
  • I did not expect such clear-cut boundaries and was therefore confused about whether this could be correct; a histogram of the predicted probabilities (see the sketch below) makes the effect visible.
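A sketch of that check (assuming a fitted classifier clf and data X as in the question):

```python
import matplotlib.pyplot as plt

proba = clf.predict_proba(X)[:, 1]

# A very confident model piles most of the mass near 0 and 1,
# which looks like "two classes" rather than a smooth continuum.
plt.hist(proba, bins=50)
plt.xlabel("predicted P(y = 1)")
plt.ylabel("count")
plt.show()
```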
Georg Heiler
  • Can you please summarize the main points from that article? If the link stops working, this answer will become useless. Also, we don't want to be just a link farm pointing to other places. (See also http://datascience.stackexchange.com/help/deleted-answers.) Thanks! – D.W. Jan 30 '17 at 23:09