Questions tagged [catboost]

36 questions
22
votes
1 answer

LightGBM vs XGBoost vs CatBoost

I've seen that in Kaggle competitions people are using LightGBM where they used to use XGBoost. My question is: when would you rather use XGBoost instead of LightGBM? What about CatBoost?
David Masip
  • 5,981
  • 2
  • 23
  • 61
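All three libraries expose a near-identical scikit-learn-style interface, so the question above is mostly about data characteristics (categorical features, dataset size, training speed) rather than code. A minimal sketch on synthetic data, assuming all three packages are installed:

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from xgboost import XGBClassifier
    from lightgbm import LGBMClassifier
    from catboost import CatBoostClassifier

    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    # The fit/score calls are interchangeable; only the defaults differ.
    for model in (XGBClassifier(), LGBMClassifier(), CatBoostClassifier(verbose=0)):
        model.fit(X_tr, y_tr)
        print(type(model).__name__, model.score(X_te, y_te))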
6
votes
1 answer

How to obtain SHAP values for a CatBoost model in R?

I'm asked to create a SHAP analysis in R but I cannot find how to obtain it for a CatBoost model. I can get the SHAP values of an XGBoost model with shap_values <- shap.values(xgb_model = model, X_train = train_X), but not for CatBoost. Here is…
user100740
  • 91
  • 2
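In the Python API, CatBoost computes SHAP values through get_feature_importance with type="ShapValues" (the R package is reported to expose an analogous catboost.get_feature_importance). A minimal Python sketch on synthetic data:

    import numpy as np
    from catboost import CatBoostClassifier, Pool

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)

    model = CatBoostClassifier(iterations=50, verbose=0).fit(X, y)

    # Per-sample SHAP values; the last column is the expected (base) value.
    shap = model.get_feature_importance(Pool(X, y), type="ShapValues")
    print(shap.shape)  # (n_samples, n_features + 1)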
4
votes
1 answer

Are linear models better when dealing with too many features? If so, why?

I had to build a classification model to predict what the user rating would be from his/her review. (I was dealing with this dataset: Trip Advisor Hotel Reviews.) After some preprocessing, I compared the results of a Logistic…
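A hedged sketch of the comparison being asked about, with the 20 newsgroups corpus standing in for the Trip Advisor reviews: on high-dimensional sparse TF-IDF features, a plain linear model is often competitive with boosting.

    from sklearn.datasets import fetch_20newsgroups
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    # Stand-in text data; the original question used Trip Advisor reviews.
    data = fetch_20newsgroups(subset="train", categories=["sci.med", "sci.space"])
    X = TfidfVectorizer(max_features=20000).fit_transform(data.data)  # sparse, wide
    print(cross_val_score(LogisticRegression(max_iter=1000), X, data.target).mean())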
3
votes
0 answers

What is the concept behind the categorical-encoding used in the CatBoost benchmark problems?

I'm working through CatBoost quality benchmark problems (here). I'm particularly intrigued by the methodology adopted to convert categorical features to numerical values as described in the comparison_description.pdf (here). What is the reasoning…
PPR
  • 171
  • 1
  • 5
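The methodology in question is ordered target statistics: each row's category is encoded using only the target values of rows that precede it in a random permutation, plus a prior, so a row's own label never leaks into its encoding. A simplified sketch (CatBoost itself averages over several permutations; this uses one, with a prior weight of 1):

    import numpy as np

    def ordered_target_stats(cats, target, prior=0.5, seed=0):
        # Walk the rows in a random "history" order; encode each category
        # from the target sums/counts accumulated before the current row.
        perm = np.random.default_rng(seed).permutation(len(cats))
        sums, counts = {}, {}
        encoded = np.empty(len(cats))
        for i in perm:
            c = cats[i]
            encoded[i] = (sums.get(c, 0.0) + prior) / (counts.get(c, 0) + 1)
            sums[c] = sums.get(c, 0.0) + target[i]
            counts[c] = counts.get(c, 0) + 1
        return encoded

    cats = np.array(["a", "b", "a", "a", "b"])
    y = np.array([1, 0, 1, 0, 1])
    print(ordered_target_stats(cats, y))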
2
votes
1 answer

CatBoost multiclass classification evaluation metric: Kappa & WKappa

I am working on an unbalanced classification problem and I want to use Kappa as my evaluation metric. Considering the classifier accepts weights (which I have given it), should I still be using weighted kappa, or just use the standard kappa? I am not…
Musa
  • 31
  • 2
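Whichever way that question is answered, both metrics are available in CatBoost as eval_metric strings, independently of any class weights. A toy sketch with hypothetical weights for three classes:

    import numpy as np
    from catboost import CatBoostClassifier

    X = np.random.default_rng(0).normal(size=(90, 4))
    y = np.repeat([0, 1, 2], 30)          # 3-class toy target

    model = CatBoostClassifier(
        iterations=100,
        eval_metric="WKappa",             # or "Kappa"
        class_weights=[1.0, 2.0, 5.0],    # hypothetical per-class weights
        verbose=0,
    ).fit(X, y)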
2
votes
0 answers

CatBoost: Categorical Feature Encoding

I would like to understand all the methods available in Catboost for encoding categorical features. Unfortunately, the published articles by Yandex ("CatBoost: gradient boosting with categorical features support" and "CatBoost: unbiased boosting…
calpyte
  • 121
  • 2
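Beyond the papers, the encodings are steered by training parameters: one_hot_max_size switches low-cardinality features to one-hot, while simple_ctr and combinations_ctr select the target-statistic ("CTR") types, with documented values such as Borders, Buckets, BinarizedTargetMeanValue, and Counter. A hedged sketch (toy data, illustrative settings):

    import pandas as pd
    from catboost import CatBoostClassifier

    df = pd.DataFrame({"city": list("abcabcab") * 5, "x": range(40)})
    y = [0, 1] * 20

    model = CatBoostClassifier(
        iterations=50,
        one_hot_max_size=2,                  # one-hot only if <= 2 distinct values
        simple_ctr=["Borders", "Counter"],   # CTR types for single features
        combinations_ctr=["Counter"],        # CTR types for feature combinations
        verbose=0,
    ).fit(df, y, cat_features=["city"])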
2
votes
1 answer

How do we target-encode categorical features in multi-class classification problems?

Say I have a multiclass problem with a dataset like this:

    user_id | price | target
    --------+-------+--------
          1 |    30 | apple
          1 |    20 | samsung
          2 |    32 | samsung
          2 |    40 | huawei
        ... |   ... | ...

where I have a lot of users, i.e. One Hot…
CutePoison
  • 450
  • 2
  • 8
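One common answer is per-class target encoding: with K classes, each categorical value gets K encoded columns holding the share of each class within that category. A naive in-sample sketch on the data above:

    import pandas as pd

    df = pd.DataFrame({
        "user_id": [1, 1, 2, 2],
        "price":   [30, 20, 32, 40],
        "target":  ["apple", "samsung", "samsung", "huawei"],
    })

    onehot = pd.get_dummies(df["target"])                  # one column per class
    enc = onehot.groupby(df["user_id"]).transform("mean")  # class share per user
    enc.columns = [f"user_id_enc_{c}" for c in enc.columns]
    print(df.join(enc))

In practice the shares would be computed out-of-fold (or with CatBoost-style ordered statistics) to avoid target leakage.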
2
votes
1 answer

How to tell CatBoost which feature is categorical?

I am excited to learn that CatBoost can handle categorical features by itself. One of my features, Department ID, is categorical. However, it looks numeric, since its values are like 1001, 1002, ..., 1218. Those numbers are just IDs of the…
Fred Chang
  • 85
  • 1
  • 6
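A minimal sketch: listing the column name (or index) in cat_features makes CatBoost treat the numeric-looking IDs as categories rather than magnitudes.

    import pandas as pd
    from catboost import CatBoostClassifier

    df = pd.DataFrame({
        "dept_id": [1001, 1002, 1218, 1001, 1002, 1218] * 5,  # IDs, not amounts
        "hours":   [40, 35, 20, 45, 38, 22] * 5,
    })
    y = [0, 1, 1, 0, 1, 1] * 5

    model = CatBoostClassifier(iterations=50, verbose=0)
    model.fit(df, y, cat_features=["dept_id"])  # dept_id handled as categorical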
1
vote
1 answer

Does Gradient Boosting perform n-ary splits where n > 2?

I wonder whether algorithms such as GBM, XGBoost, CatBoost, and LightGBM perform more than two splits at a node in the decision trees? Can a node be split into 3 or more branches instead of merely binary splits? Can more than one feature be used in…
1
vote
0 answers

Feature Selection before modeling with Boosting Trees

I have read in some papers that the subset of features chosen for a boosting tree algorithm makes a big difference to performance, so I've been trying RFE, Boruta, variable clustering, correlation, WOE & IV, and Chi-square. Let's say I have a…
Mamoud
  • 11
  • 2
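A sketch of one option from that list, scikit-learn's RFE wrapped around a CatBoost estimator (synthetic data, arbitrary step and target sizes):

    from sklearn.datasets import make_classification
    from sklearn.feature_selection import RFE
    from catboost import CatBoostClassifier

    X, y = make_classification(n_samples=500, n_features=30, n_informative=5,
                               random_state=0)

    # RFE drops the 5 least important features per round until 10 remain.
    selector = RFE(CatBoostClassifier(iterations=100, verbose=0),
                   n_features_to_select=10, step=5).fit(X, y)
    print(selector.support_)  # boolean mask of the retained features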
1
vote
2 answers

Does gradient boosting error always decrease faster and reach lower values on training data?

I am building another XGBoost model and I'm really trying not to overfit the data. I split my data into train and test sets and fit the model with early stopping based on the test-set error, which results in the following loss plot: I'd say this is…
Xaume
  • 182
  • 2
  • 11
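A sketch of the setup being described, assuming a recent xgboost version (argument placement has moved between the constructor and fit across releases): train with early stopping on a held-out set, then compare the train and test loss curves.

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from xgboost import XGBClassifier

    X, y = make_classification(n_samples=2000, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    model = XGBClassifier(n_estimators=500, early_stopping_rounds=20,
                          eval_metric="logloss")
    model.fit(X_tr, y_tr, eval_set=[(X_tr, y_tr), (X_te, y_te)], verbose=False)

    history = model.evals_result()  # validation_0 = train, validation_1 = test
    print(model.best_iteration, history["validation_1"]["logloss"][-1])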
1
vote
2 answers

RandomizedSearchCV(n_iter=10) doesn't stop after training 10 models

I am using RandomizedSearchCV for hyperparameter optimization. When I run it, it shows the scores for each model training. The problem is that it trains far more than 10 models, when I expect it to train just 10 by specifying…
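The likely explanation: n_iter counts sampled parameter settings, not model fits. Each setting is fitted once per CV fold, plus a final refit, so n_iter=10 with cv=5 performs 51 fits. A sketch:

    from scipy.stats import randint
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import RandomizedSearchCV

    X, y = make_classification(random_state=0)

    # 10 settings x 5 folds + 1 refit = 51 fits, but only 10 distinct settings.
    search = RandomizedSearchCV(
        RandomForestClassifier(random_state=0),
        {"max_depth": randint(2, 10)},
        n_iter=10, cv=5, random_state=0,
    ).fit(X, y)
    print(len(search.cv_results_["params"]))  # 10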
1
vote
0 answers

Optuna pruning during cross-validation, does it make sense?

I'm currently trying to build a model using CatBoost. For parameter tuning, I'm using Optuna with cross-validation, pruning trials based on the intermediate cross-validation scores. Here is a minimal example: def objective(trial): …
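A hedged sketch of the pattern in question: report the running mean CV score after each fold and let a pruner stop unpromising trials early (synthetic data, arbitrary search space):

    import numpy as np
    import optuna
    from catboost import CatBoostClassifier
    from sklearn.datasets import make_classification
    from sklearn.model_selection import StratifiedKFold

    X, y = make_classification(n_samples=500, random_state=0)

    def objective(trial):
        depth = trial.suggest_int("depth", 4, 10)
        scores = []
        for step, (tr, va) in enumerate(StratifiedKFold(5).split(X, y)):
            model = CatBoostClassifier(depth=depth, iterations=100, verbose=0)
            model.fit(X[tr], y[tr])
            scores.append(model.score(X[va], y[va]))
            trial.report(float(np.mean(scores)), step)  # intermediate value
            if trial.should_prune():                    # pruner checks per fold
                raise optuna.TrialPruned()
        return float(np.mean(scores))

    study = optuna.create_study(direction="maximize",
                                pruner=optuna.pruners.MedianPruner())
    study.optimize(objective, n_trials=20)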
1
vote
1 answer

If I use Weight of Evidence to transform categorical variables, do I still need to pass their indexes to CatBoost?

I'm using Weight of Evidence (WOE) to encode my categorical features. Do I still need to tell CatBoost that they are categorical by using the cat_features parameter?
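Presumably not: after WOE encoding, the columns are ordinary numeric scores, so they are not listed in cat_features. A toy sketch of unsmoothed WOE for a binary target:

    import numpy as np
    import pandas as pd
    from catboost import CatBoostClassifier

    rng = np.random.default_rng(0)
    df = pd.DataFrame({"city": rng.choice(list("abc"), size=200)})
    y = pd.Series(rng.integers(0, 2, size=200))

    # WOE per category: log( P(category | y=1) / P(category | y=0) )
    good = df["city"][y == 1].value_counts(normalize=True)
    bad = df["city"][y == 0].value_counts(normalize=True)
    df["city_woe"] = df["city"].map(np.log(good / bad))

    # city_woe is numeric now; no cat_features argument is needed.
    CatBoostClassifier(iterations=50, verbose=0).fit(df[["city_woe"]], y)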
1
vote
0 answers

Intuition behind CatBoost encoding techniques

Can anyone help me understand the effect of the various bucketing techniques used in the CatBoost algorithm for categorical features? There are Borders, Buckets, BinarizedTargetMeanValue, and Counter encoding techniques, and I am not able to get a proper…