25

The predict call below is returning negative values as well, so the output cannot be probabilities.

library(xgboost)

param <- list(max.depth = 5, eta = 0.01, objective = "binary:logistic", subsample = 0.9)
bst <- xgboost(params = param, data = x_mat, label = y_mat, nrounds = 3000)

pred_s <- predict(bst, x_mat_s2)

I googled and tried `pred_s <- predict(bst, x_mat_s2, type = "response")`, but it didn't work.

Question

How can I predict probabilities instead?

GeorgeOfTheRF
  • I don't see any obvious issues (although I'm more familiar with the Python wrapper). Have you tried adding `outputmargin = F` to the `predict` call? If `outputmargin` is somehow set to `T`, it will return the value before the logistic transformation; see the sketch just after these comments. – inversion Sep 10 '15 at 17:24
  • Doesn't it output probabilities by default with the settings you used? I mean: have you examined `pred_s`, and are you certain those are not probabilities? – kpb Sep 08 '15 at 11:58
  • No, it has negative values. A probability should vary between 0 and 1. – GeorgeOfTheRF Sep 08 '15 at 12:05
  • For Python, you can copy the `predict_proba` implementation from the `sklearn` API: https://github.com/dmlc/xgboost/blob/master/python-package/xgboost/sklearn.py#L534 – Anton Tarasenko Jan 19 '18 at 15:39
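A minimal sketch of the `outputmargin` suggestion from the comments. For a `binary:logistic` model, `predict` with `outputmargin = FALSE` (the default) applies the logistic transformation, so the values lie in [0, 1]; with `outputmargin = TRUE` it returns the raw margin, which can be negative. `bst` and `x_mat_s2` are the question's own objects.

# Probabilities: logistic transform applied (the default behaviour)
pred_prob <- predict(bst, x_mat_s2, outputmargin = FALSE)

# Raw margin scores: these can be negative
pred_margin <- predict(bst, x_mat_s2, outputmargin = TRUE)

range(pred_prob)                                    # should fall within [0, 1]
all.equal(pred_prob, 1 / (1 + exp(-pred_margin)))   # the logistic link ties the two together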

4 Answers

22

Just use `predict_proba` instead of `predict`. You can leave the objective as `binary:logistic`.

ihadanny
  • If this were Python and not R, then this answer might be sensible. Wrong language. – B_Miner Aug 15 '16 at 18:30
  • Oops! Thanks @B_Miner. I'm not deleting this answer, as it might be helpful for others who make the same mistake and think we're talking about Python. – ihadanny Aug 16 '16 at 08:49
  • For me this does not do the trick: http://datascience.stackexchange.com/questions/14527/xgboost-predict-probabilities – Georg Heiler Nov 08 '16 at 07:59
  • xgboost does not have a predict_proba function – Ashoka Lella Aug 24 '17 at 19:26
  • The XGBoost classifier does have a `predict_proba` option: https://xgboost.readthedocs.io/en/latest/python/python_api.html – Paul Bendevis Dec 24 '19 at 19:07
  • I made the same mistake of assuming this question was about Python, and I found my solution because someone else made the same mistake and answered the question. So upvoting the answer. Is something wrong with Stack Overflow? Just wondering... – Hamza Zubair Sep 21 '20 at 05:18
  • This is a common misconception, and it really depends on what you mean by probability. The default output using the "binary:logistic" objective is a classification score, not a probability. A prediction is assigned to either class by e.g. >= 0.5 == 1. For a true probability, a binomial for instance, you require the 'multi:softprob' objective, as outlined in @cyberj0g's answer [here](https://datascience.stackexchange.com/a/8812/47485) – BenP Feb 05 '21 at 15:26
17

I know I'm a bit late, but to get probabilities from xgboost you should specify the `multi:softprob` objective, like this:

bst <- xgboost(params = param, data = x_mat, label = y_mat, nrounds = 3000,
               objective = "multi:softprob", num_class = 2)  # multi:* objectives require num_class

From `?xgb.train`:

`multi:softprob`: same as softmax, but output a vector of ndata * nclass, which can be further reshaped to ndata, nclass matrix. The result contains predicted probabilities of each data point belonging to each class.
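To turn that flat vector back into per-observation probabilities, a short sketch, reusing the question's `x_mat_s2` and the two-class setup above:

pred_s <- predict(bst, x_mat_s2)

# multi:softprob returns the class probabilities grouped per observation,
# so fill the matrix row-wise: one row per observation, one column per class.
prob_mat <- matrix(pred_s, ncol = 2, byrow = TRUE)
head(prob_mat)    # column 1: P(class 0), column 2: P(class 1)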

cyberj0g
  • Thanks. How is this loss function different from binary:logistic for binary classification? – GeorgeOfTheRF Nov 12 '15 at 07:30
  • It's just a generalization of the logistic function to the multi-class case; there should be no significant difference. – cyberj0g Nov 12 '15 at 07:39
  • I used it and it worked well, but I had an issue: I get the warning "Parameters: { scale_pos_weight } might not be used.", which comes from the fact that I use multi:softprob ([source](https://github.com/dmlc/xgboost/issues/5717#issuecomment-634621557)). With my unbalanced data this parameter seems really important (if I go for a classic binary classification), and currently, when I change it, it has no impact because it's not taken into account. – Adept Aug 13 '20 at 08:06
3

After the prediction:

pred_s <- predict(bst, x_mat_s2)

You can get the probability with:

pred_s$data

If this is a binary classification, then pred_s$data includes prob.0, prob.1, and response.

So you can get prob.1 with:

pred_s$data$prob.1
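Note that plain `xgboost::predict` returns a bare numeric vector, so a prediction object with a `$data` slot like the one above implies the model was trained through a wrapper package. A minimal sketch, assuming the `mlr` package (an assumption; the answer does not say which wrapper was used) and hypothetical data frames `train_df` and `test_df` with target column `y`:

library(mlr)

# Hypothetical mlr workflow (assumption: pred_s came from an mlr model).
task <- makeClassifTask(data = train_df, target = "y")
lrn  <- makeLearner("classif.xgboost", predict.type = "prob",
                    max_depth = 5, eta = 0.01, nrounds = 3000)
mod  <- train(lrn, task)

pred_s <- predict(mod, newdata = test_df)
head(pred_s$data)     # columns: prob.0, prob.1, response
pred_s$data$prob.1    # probability of class "1"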
Dera
0

Many years late, but I noticed this; here is how I do it:

pred_s <- predict(bst, x_mat_s2, type = "prob")

Typically I then work with the probability of my upper class, "1":

pred_s[,2]
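For a plain `xgb.Booster`, `predict` has no `type` argument (unknown extra arguments are silently ignored), so this pattern suggests a model trained through a wrapper such as `caret`. A minimal sketch under that assumption, with hypothetical objects reusing the question's names:

library(caret)

# Hypothetical caret workflow (assumption: bst came from caret::train).
fit <- train(x = as.data.frame(x_mat),
             y = factor(y_mat, labels = c("class0", "class1")),
             method = "xgbTree")

pred_s <- predict(fit, newdata = as.data.frame(x_mat_s2), type = "prob")
head(pred_s)    # a data frame with one column per class
pred_s[, 2]     # probability of the upper class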
RichardBJ