Questions tagged [xgboost]

For questions related to the eXtreme Gradient Boosting algorithm.

691 questions
81 votes · 5 answers

GBM vs XGBOOST? Key differences?

I am trying to understand the key differences between GBM and XGBoost. I tried to Google it, but could not find any good answers explaining the differences between the two algorithms, or why XGBoost almost always performs better than GBM. What makes…
Aman · 977 · 1 gold · 8 silver · 8 bronze
58 votes · 6 answers

Does XGBoost handle multicollinearity by itself?

I'm currently using XGBoost on a data set with 21 features (selected from a list of some 150 features), which I then one-hot encoded to obtain ~98 features. A few of these 98 features are somewhat redundant; for example, a variable (feature) $A$ also…
yad · 1,773 · 3 gold · 16 silver · 27 bronze
55 votes · 2 answers

How to interpret the output of XGBoost importance?

I ran an xgboost model, but I don't know exactly how to interpret the output of xgb.importance. What is the meaning of Gain, Cover, and Frequency, and how do we interpret them? Also, what do Split, RealCover, and RealCover% mean? I have some extra…
user14204
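
In the Python API, the corresponding importance types can be inspected directly via get_score; a minimal sketch with synthetic data (feature names and parameter values are illustrative only):

    import numpy as np
    import xgboost as xgb

    # Synthetic binary-classification data, purely for illustration.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 5))
    y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

    dtrain = xgb.DMatrix(X, label=y, feature_names=[f"f{i}" for i in range(5)])
    bst = xgb.train({"objective": "binary:logistic"}, dtrain, num_boost_round=20)

    # "gain": average loss reduction from splits on the feature;
    # "cover": average number of samples affected by those splits;
    # "weight": how often the feature is used to split (R's "Frequency").
    for imp_type in ("gain", "cover", "weight"):
        print(imp_type, bst.get_score(importance_type=imp_type))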
42 votes · 2 answers

LightGBM vs XGBoost

I'm trying to understand which is better (more accurate, especially in classification problems). I've been searching for articles comparing LightGBM and XGBoost but found only…
Sergey Nizhevyasov · 533 · 1 gold · 4 silver · 4 bronze
40 votes · 4 answers

Why do we need XGBoost and Random Forest?

I wasn't clear on a couple of concepts: XGBoost converts weak learners to strong learners. What's the advantage of doing this? Combining many weak learners instead of just using a single tree? Random Forest uses various samples of the data to create…
40 votes · 6 answers

Unbalanced multiclass data with XGBoost

I have 3 classes with this distribution: Class 0: 0.1169, Class 1: 0.7668, Class 2: 0.1163. And I am using xgboost for classification. I know that there is a parameter called scale_pos_weight. But how is it handled for the 'multiclass' case, and how can…
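
scale_pos_weight applies only to binary objectives; for multiclass problems the usual workaround is per-sample weights. A minimal sketch, assuming inverse-frequency ("balanced") weighting is wanted (class proportions taken from the question, data synthetic):

    import numpy as np
    import xgboost as xgb

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 10))
    # Labels drawn with roughly the imbalance described in the question.
    y = rng.choice(3, size=1000, p=[0.1169, 0.7668, 0.1163])

    # Weight each sample inversely to its class frequency.
    counts = np.bincount(y)
    weights = (len(y) / (len(counts) * counts))[y]

    dtrain = xgb.DMatrix(X, label=y, weight=weights)
    params = {"objective": "multi:softprob", "num_class": 3}
    bst = xgb.train(params, dtrain, num_boost_round=50)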
36 votes · 1 answer

Why is xgboost so much faster than sklearn GradientBoostingClassifier?

I'm trying to train a gradient boosting model on 50k examples with 100 numeric features. XGBClassifier handles 500 trees within 43 seconds on my machine, while GradientBoostingClassifier handles only 10 trees(!) in 1 minute and 2 seconds :( I…
ihadanny · 1,357 · 2 gold · 11 silver · 19 bronze
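
This kind of gap is easy to reproduce; a rough sketch (tree counts shrunk so it finishes quickly, and timings will vary by machine):

    import time
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from xgboost import XGBClassifier

    X, y = make_classification(n_samples=50_000, n_features=100, random_state=0)

    # sklearn's GradientBoostingClassifier finds splits single-threaded on
    # exact values; xgboost with "hist" uses multithreaded histogram splits.
    for name, model in [
        ("xgboost", XGBClassifier(n_estimators=100, tree_method="hist")),
        ("sklearn", GradientBoostingClassifier(n_estimators=10)),
    ]:
        t0 = time.time()
        model.fit(X, y)
        print(f"{name}: {time.time() - t0:.1f}s")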
35 votes · 3 answers

xgboost: give more importance to recent samples

Is there a way to add more importance to points which are more recent when analyzing data with xgboost?
kilojoules · 453 · 1 gold · 4 silver · 6 bronze
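
One common approach is to pass per-sample weights that decay with age; a minimal sketch (the exponential decay and its half-life are arbitrary choices for illustration):

    import numpy as np
    import xgboost as xgb

    rng = np.random.default_rng(0)
    n = 1000
    X = rng.normal(size=(n, 5))
    y = rng.normal(size=n)

    # Assume rows are ordered oldest-to-newest; newer rows get larger weights
    # via an exponential decay with a half-life of 250 samples.
    age = np.arange(n)[::-1]          # 0 = newest row
    weights = 0.5 ** (age / 250)

    dtrain = xgb.DMatrix(X, label=y, weight=weights)
    bst = xgb.train({"objective": "reg:squarederror"}, dtrain, num_boost_round=50)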
33 votes · 3 answers

Hypertuning XGBoost parameters

XGBoost has been doing a great job when it comes to dealing with both categorical and continuous dependent variables. But how do I select the optimal parameters for an XGBoost problem? This is how I applied the parameters for a recent Kaggle…
Dawny33 · 8,226 · 12 gold · 47 silver · 104 bronze
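
A common starting point is scikit-learn's search utilities over the sklearn-style wrapper; a minimal sketch (the search space below is an arbitrary example, not a recommendation):

    from scipy.stats import randint, uniform
    from sklearn.datasets import make_classification
    from sklearn.model_selection import RandomizedSearchCV
    from xgboost import XGBClassifier

    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

    # Illustrative ranges only; adjust to your problem.
    param_dist = {
        "max_depth": randint(3, 10),
        "learning_rate": uniform(0.01, 0.3),
        "subsample": uniform(0.5, 0.5),
        "colsample_bytree": uniform(0.5, 0.5),
        "n_estimators": randint(100, 500),
    }

    search = RandomizedSearchCV(XGBClassifier(), param_dist, n_iter=20,
                                cv=3, scoring="roc_auc", random_state=0)
    search.fit(X, y)
    print(search.best_params_, search.best_score_)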
32 votes · 3 answers

Is it necessary to normalize data for XGBoost?

MinMaxScaler() in scikit-learn is used for data normalization (a.k.a. feature scaling). Data normalization is not necessary for decision trees. Since XGBoost is based on decision trees, is it necessary to do data normalization using MinMaxScaler()…
user781486 · 1,305 · 2 gold · 16 silver · 18 bronze
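
A quick way to convince yourself: tree splits depend only on the ordering of feature values, and MinMax scaling is monotonic, so predictions should be essentially unchanged. A small sketch to check:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.preprocessing import MinMaxScaler
    from xgboost import XGBClassifier

    X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
    X_scaled = MinMaxScaler().fit_transform(X)

    p_raw = XGBClassifier(n_estimators=50, random_state=0).fit(X, y).predict_proba(X)
    p_scl = XGBClassifier(n_estimators=50, random_state=0).fit(X_scaled, y).predict_proba(X_scaled)

    # Should be near zero: scaling does not change the split structure.
    print(np.abs(p_raw - p_scl).max())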
26 votes · 2 answers

How to fit pairwise ranking models in XGBoost?

As far as I know, to train learning-to-rank models, you need three things in the dataset: a label or relevance, a group or query id, and a feature vector. For example, the Microsoft Learning to Rank dataset uses this format (label, group id, and…
tokestermw · 418 · 1 gold · 4 silver · 8 bronze
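
A minimal sketch of the pairwise setup in the Python API (data and group sizes are synthetic; note that set_group takes per-query document counts, not query ids):

    import numpy as np
    import xgboost as xgb

    rng = np.random.default_rng(0)
    X = rng.normal(size=(12, 4))
    y = rng.integers(0, 3, size=12)     # relevance labels (0 = irrelevant)

    dtrain = xgb.DMatrix(X, label=y)
    dtrain.set_group([4, 4, 4])         # three queries, four documents each

    bst = xgb.train({"objective": "rank:pairwise"}, dtrain, num_boost_round=20)
    scores = bst.predict(dtrain)        # higher score = ranked higher within a query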
25 votes · 4 answers

How to predict probabilities in xgboost using R?

The predict function below is giving negative values as well, so they cannot be probabilities.
param <- list(max.depth = 5, eta = 0.01, objective = "binary:logistic", subsample = 0.9)
bst <- xgboost(param, data = x_mat, label = y_mat, nround = 3000)
pred_s <-…
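
Negative outputs usually indicate the model is returning raw margins rather than probabilities (for instance, because the objective was never actually applied). For reference, in the Python API a binary:logistic model returns probabilities from predict unless the margin is explicitly requested; a minimal sketch with synthetic data:

    import numpy as np
    import xgboost as xgb

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 4))
    y = (X[:, 0] > 0).astype(int)

    dtrain = xgb.DMatrix(X, label=y)
    bst = xgb.train({"objective": "binary:logistic"}, dtrain, num_boost_round=30)

    probs = bst.predict(dtrain)                        # probabilities in [0, 1]
    margins = bst.predict(dtrain, output_margin=True)  # raw scores, can be negative
    print(probs.min(), probs.max(), margins.min())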
25 votes · 4 answers

Is feature engineering still useful when using XGBoost?

I was reading material related to XGBoost. It seems that this method does not require any variable scaling, since it is tree-based and can capture complex non-linear patterns and interactions. And it can handle both numerical and…
KevinKim · 625 · 1 gold · 7 silver · 13 bronze
22 votes · 1 answer

Lightgbm vs xgboost vs catboost

I've seen that in Kaggle competitions people are using LightGBM where they used to use XGBoost. My question is: when would you rather use XGBoost instead of LightGBM? What about CatBoost?
David Masip · 5,981 · 2 gold · 23 silver · 61 bronze
22 votes · 1 answer

XGBRegressor vs. xgboost.train huge speed difference?

If I train my model using the following code:
import xgboost as xg
params = {'max_depth': 3, 'min_child_weight': 10, 'learning_rate': 0.3,
          'subsample': 0.5, 'colsample_bytree': 0.6, 'obj': 'reg:linear',
          'n_estimators': 1000, 'eta': 0.3}
features =…
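
The usual explanation for gaps like this is that xgb.train ignores sklearn-style keys such as n_estimators (the tree count must go in num_boost_round), so the native call silently trains far fewer trees. A sketch of the two equivalent calls (parameter values from the question, data synthetic; also note 'obj' should be 'objective', and reg:linear is the deprecated name for reg:squarederror):

    import numpy as np
    import xgboost as xgb

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 8))
    y = rng.normal(size=500)

    params = {"max_depth": 3, "min_child_weight": 10, "eta": 0.3,
              "subsample": 0.5, "colsample_bytree": 0.6,
              "objective": "reg:squarederror"}

    # Native API: the number of trees is num_boost_round, not n_estimators.
    bst = xgb.train(params, xgb.DMatrix(X, label=y), num_boost_round=1000)

    # sklearn-style wrapper: n_estimators plays that role instead.
    reg = xgb.XGBRegressor(n_estimators=1000, max_depth=3, min_child_weight=10,
                           learning_rate=0.3, subsample=0.5, colsample_bytree=0.6)
    reg.fit(X, y)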