I am currently using XGBoost for risk prediction. It seems to do a good job on the binary classification itself, but the probability outputs are way off: changing the value of a single feature in an observation by a very small amount can make the probability output jump from 0.5 to 0.99.
I barely see outputs in the 0.6–0.8 range; the predictions cluster at the extremes instead, pinned up at 0.99 or 1 on the high end.
I am aware of post-training calibration methods such as Platt scaling and logistic correction, but I was wondering whether there is anything I can tweak in the XGBoost training process itself.
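For concreteness, this is the kind of post-training step I am trying to avoid (a minimal sketch of Platt scaling on synthetic data, assuming scikit-learn and the Python xgboost package; all names and values here are illustrative, not my actual setup):

```python
import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; the real features are proprietary.
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.5, random_state=0)
X_calib, X_test, y_calib, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

booster = xgb.XGBClassifier(n_estimators=200, eval_metric="logloss")
booster.fit(X_train, y_train)

# Raw log-odds margins from the booster (before the sigmoid is applied).
margin_calib = booster.predict(X_calib, output_margin=True)

# Platt scaling: a 1-D logistic regression mapping margin -> probability,
# fitted on a held-out calibration set.
platt = LogisticRegression()
platt.fit(margin_calib.reshape(-1, 1), y_calib)

# Calibrated probabilities for new data.
margin_test = booster.predict(X_test, output_margin=True)
p_calibrated = platt.predict_proba(margin_test.reshape(-1, 1))[:, 1]
```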
I call XGBoost from several languages over FFI, so it would be nice if I could fix this without pulling in a separate calibration library, e.g., by changing the eval metric from AUC to log loss.
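Something along these lines is what I have in mind (a minimal sketch using the Python xgboost package on synthetic data; the parameter values are illustrative guesses, not recommendations):

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 10))
y = (X[:, 0] + 0.5 * rng.normal(size=2000) > 0).astype(int)

dtrain = xgb.DMatrix(X[:1500], label=y[:1500])
dvalid = xgb.DMatrix(X[1500:], label=y[1500:])

params = {
    "objective": "binary:logistic",
    # Log loss is a proper scoring rule, so early stopping on it
    # penalizes overconfident probabilities; AUC only ranks them.
    "eval_metric": "logloss",
    # Gentler trees tend to produce less extreme leaf values:
    "eta": 0.05,
    "max_depth": 3,
    "min_child_weight": 10,
    # Caps each leaf's weight update per boosting round.
    "max_delta_step": 1,
}

booster = xgb.train(
    params, dtrain, num_boost_round=1000,
    evals=[(dvalid, "valid")],
    early_stopping_rounds=30, verbose_eval=False,
)
```

Since early stopping now watches log loss on held-out data, training halts once the probabilities stop improving rather than once the ranking stops improving, which I would hope keeps the outputs from saturating at the extremes. Is this the right direction, or is post-training calibration unavoidable here?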