25

The predict call below is returning negative values as well, so the output cannot be probabilities.

library(xgboost)

param <- list(max.depth = 5, eta = 0.01, objective = "binary:logistic", subsample = 0.9)
bst <- xgboost(params = param, data = x_mat, label = y_mat, nrounds = 3000)

pred_s <- predict(bst, x_mat_s2)

I googled and tried `pred_s <- predict(bst, x_mat_s2, type = "response")`, but it didn't work.

Question

How can I predict probabilities instead?

GeorgeOfTheRF
  • I don't see any obvious issues (although I'm more familiar with the Python wrapper). Have you tried adding `outputmargin = F` to the `predict` call? If `outputmargin` is somehow set to `T`, it will return the value before the logistic transformation; see the sketch just after these comments. – inversion Sep 10 '15 at 17:24
  • Doesn't it output probabilities by default with the settings you used? I mean: have you examined `pred_s`, and are you certain those are not probabilities? – kpb Sep 08 '15 at 11:58
  • No, it has negative values. A probability should vary between 0 and 1. – GeorgeOfTheRF Sep 08 '15 at 12:05
  • For Python, you can copy the `predict_proba` implementation from the `sklearn` API: https://github.com/dmlc/xgboost/blob/master/python-package/xgboost/sklearn.py#L534 – Anton Tarasenko Jan 19 '18 at 15:39
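A minimal sketch of the `outputmargin` suggestion from the comments. For a `binary:logistic` model, `predict` with `outputmargin = FALSE` (the default) applies the logistic transformation, so the values lie in [0, 1]; with `outputmargin = TRUE` it returns the raw margin, which can be negative. `bst` and `x_mat_s2` are the question's own objects.

# Probabilities: logistic transform applied (the default behaviour)
pred_prob <- predict(bst, x_mat_s2, outputmargin = FALSE)

# Raw margin scores: these can be negative
pred_margin <- predict(bst, x_mat_s2, outputmargin = TRUE)

range(pred_prob)                                    # should fall within [0, 1]
all.equal(pred_prob, 1 / (1 + exp(-pred_margin)))   # the logistic link ties the two together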

4 Answers

22

Just use `predict_proba` instead of `predict`. You can leave the objective as `binary:logistic`.

ihadanny
  • If this were Python and not R, then this answer might be sensible. Wrong language. – B_Miner Aug 15 '16 at 18:30
  • Oops! Thanks @B_Miner. I'm not deleting this answer, as it might be helpful for others who make the same mistake and think we're talking about Python. – ihadanny Aug 16 '16 at 08:49
  • For me this does not do the trick: http://datascience.stackexchange.com/questions/14527/xgboost-predict-probabilities – Georg Heiler Nov 08 '16 at 07:59
  • xgboost does not have a predict_proba function – Ashoka Lella Aug 24 '17 at 19:26
  • The XGBoost classifier does have a `predict_proba` option: https://xgboost.readthedocs.io/en/latest/python/python_api.html – Paul Bendevis Dec 24 '19 at 19:07
  • I made the same mistake of assuming this question was about Python, and I found my solution because someone else made the same mistake and answered the question. So upvoting the answer. Is something wrong with Stack Overflow? Just wondering... – Hamza Zubair Sep 21 '20 at 05:18
  • This is a common misconception, and it really depends on what you mean by probability. The default output using the "binary:logistic" objective is a classification score, not a probability. A prediction is assigned to either class by e.g. >= 0.5 == 1. For a true probability, a binomial for instance, you require the 'multi:softprob' objective, as outlined in @cyberj0g's answer [here](https://datascience.stackexchange.com/a/8812/47485) – BenP Feb 05 '21 at 15:26
17

I know I'm a bit late, but to get probabilities from xgboost you should specify the `multi:softprob` objective, like this:

bst <- xgboost(params = param, data = x_mat, label = y_mat, nrounds = 3000,
               objective = "multi:softprob", num_class = 2)  # multi:* objectives require num_class

From `?xgb.train`:

`multi:softprob`: same as softmax, but output a vector of ndata * nclass, which can be further reshaped to ndata, nclass matrix. The result contains predicted probabilities of each data point belonging to each class.
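To turn that flat vector back into per-observation probabilities, a short sketch, reusing the question's `x_mat_s2` and the two-class setup above:

pred_s <- predict(bst, x_mat_s2)

# multi:softprob returns the class probabilities grouped per observation,
# so fill the matrix row-wise: one row per observation, one column per class.
prob_mat <- matrix(pred_s, ncol = 2, byrow = TRUE)
head(prob_mat)    # column 1: P(class 0), column 2: P(class 1)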

cyberj0g
  • Thanks. How is this loss function different from binary:logistic for binary classification? – GeorgeOfTheRF Nov 12 '15 at 07:30
  • It's just a generalization of the logistic function to the multi-class case; there should be no significant difference. – cyberj0g Nov 12 '15 at 07:39
  • I used it and it worked well, but I had an issue: I get the warning "Parameters: { scale_pos_weight } might not be used.", which comes from the fact that I use multi:softprob ([source](https://github.com/dmlc/xgboost/issues/5717#issuecomment-634621557)). With my unbalanced data this parameter seems really important (if I go for a classic binary classification), and currently, when I change it, it has no impact because it's not taken into account. – Adept Aug 13 '20 at 08:06
3

After the prediction:

pred_s <- predict(bst, x_mat_s2)

You can get the probability with:

pred_s$data

If this is a binary classification, then pred_s$data includes prob.0, prob.1, and response.

So you can get prob.1 with:

pred_s$data$prob.1
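Note that plain `xgboost::predict` returns a bare numeric vector, so a prediction object with a `$data` slot like the one above implies the model was trained through a wrapper package. A minimal sketch, assuming the `mlr` package (an assumption; the answer does not say which wrapper was used) and hypothetical data frames `train_df` and `test_df` with target column `y`:

library(mlr)

# Hypothetical mlr workflow (assumption: pred_s came from an mlr model).
task <- makeClassifTask(data = train_df, target = "y")
lrn  <- makeLearner("classif.xgboost", predict.type = "prob",
                    max_depth = 5, eta = 0.01, nrounds = 3000)
mod  <- train(lrn, task)

pred_s <- predict(mod, newdata = test_df)
head(pred_s$data)     # columns: prob.0, prob.1, response
pred_s$data$prob.1    # probability of class "1"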
Dera
0

Many years late, but I noticed this; here is how I do it:

pred_s <- predict(bst, x_mat_s2, type = "prob")

Typically I then work with the probability of my upper class, "1":

pred_s[,2]
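For a plain `xgb.Booster`, `predict` has no `type` argument (unknown extra arguments are silently ignored), so this pattern suggests a model trained through a wrapper such as `caret`. A minimal sketch under that assumption, with hypothetical objects reusing the question's names:

library(caret)

# Hypothetical caret workflow (assumption: bst came from caret::train).
fit <- train(x = as.data.frame(x_mat),
             y = factor(y_mat, labels = c("class0", "class1")),
             method = "xgbTree")

pred_s <- predict(fit, newdata = as.data.frame(x_mat_s2), type = "prob")
head(pred_s)    # a data frame with one column per class
pred_s[, 2]     # probability of the upper class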
RichardBJ