Questions tagged [catboost]
36 questions
22
votes
1 answer
LightGBM vs XGBoost vs CatBoost
I've seen that in Kaggle competitions people are using LightGBM where they used to use XGBoost. My question is: when would you rather use XGBoost instead of LightGBM? What about CatBoost?
David Masip
- 5,981
- 2
- 23
- 61
6
votes
1 answer
How to obtain SHAP values for a CatBoost model in R?
I've been asked to create a SHAP analysis in R, but I cannot find how to obtain it for a CatBoost model. I can get the SHAP values of an XGBoost model with
shap_values <- shap.values(xgb_model = model, X_train = train_X)
but not for CatBoost.
Here is…
user100740
- 91
- 2
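For what it's worth, CatBoost has native SHAP support, so the SHAPforxgboost helper above isn't required. A minimal sketch using CatBoost's Python API (the R package exposes the analogous catboost.get_feature_importance), with synthetic data standing in for the real training set:

from catboost import CatBoostClassifier, Pool
from sklearn.datasets import make_classification

# Synthetic stand-in for the real training data.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
model = CatBoostClassifier(iterations=50, verbose=False)
model.fit(X, y)

# type="ShapValues" returns an (n_samples, n_features + 1) array;
# the last column is the expected value (base prediction).
shap_values = model.get_feature_importance(Pool(X, y), type="ShapValues")
per_feature = shap_values[:, :-1]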
4
votes
1 answer
Are linear models better when dealing with too many features? If so, why?
I had to build a classification model to predict what the user rating would be based on his/her review. (I was dealing with this dataset: Trip Advisor Hotel Reviews)
After some preprocessing, I compared the results of a Logistic…
dsbr__0
- 191
- 1
- 3
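A common intuition here: TF-IDF text features are wide and sparse, and a linear model fits one weight per term, which regularizes well in that regime. A toy sketch under that assumption (made-up reviews, not the Trip Advisor data):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical reviews; the real comparison would use the Trip Advisor set.
reviews = ["great location and friendly staff",
           "dirty room and rude service",
           "lovely pool, would stay again",
           "terrible experience, never again"]
ratings = [5, 1, 5, 1]

# TF-IDF produces a high-dimensional sparse matrix; the linear model
# learns a single weight per term, which scales well with feature count.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(reviews, ratings)
print(clf.predict(["friendly staff and lovely pool"]))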
3
votes
0 answers
What is the concept behind the categorical encoding used in the CatBoost benchmark problems?
I'm working through CatBoost quality benchmark problems (here). I'm particularly intrigued by the methodology adopted to convert categorical features to numerical values as described in the comparison_description.pdf (here). What is the reasoning…
PPR
- 171
- 1
- 5
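The core idea in that benchmark description is an ordered target statistic: each row's category is replaced by a target average computed only from rows preceding it in a random permutation, which avoids target leakage. A simplified sketch (the documented CatBoost formula is (countInClass + prior) / (totalCount + 1); the prior value here is illustrative):

import numpy as np

def ordered_target_stat(categories, targets, prior=0.5, seed=0):
    # Encode each row using only the rows before it in a random
    # permutation: a simplified version of CatBoost's ordered statistic.
    rng = np.random.default_rng(seed)
    sums, counts = {}, {}
    encoded = np.empty(len(categories))
    for idx in rng.permutation(len(categories)):
        c = categories[idx]
        encoded[idx] = (sums.get(c, 0.0) + prior) / (counts.get(c, 0) + 1.0)
        sums[c] = sums.get(c, 0.0) + targets[idx]
        counts[c] = counts.get(c, 0) + 1
    return encoded

print(ordered_target_stat(["a", "a", "b", "b", "a"], [1, 0, 1, 1, 1]))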
2
votes
1 answer
CatBoost multiclass classification evaluation metric: Kappa & WKappa
I am working on an unbalanced classification problem and I want to use Kappa as my evaluation metric. Considering the classifier accepts weights (which I have given it), should I still be using weighted kappa or just use the standard kappa? I am not…
Musa
- 31
- 2
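CatBoost accepts both as metrics, and the metric choice is separate from the loss weighting: class_weights reweights the training loss, while Kappa vs. WKappa changes only how agreement is scored. A sketch (the weights and data are illustrative):

from catboost import CatBoostClassifier
from sklearn.datasets import make_classification

# Illustrative imbalanced 3-class data.
X, y = make_classification(n_samples=300, n_classes=3, n_informative=6,
                           weights=[0.7, 0.2, 0.1], random_state=0)

model = CatBoostClassifier(iterations=100,
                           loss_function="MultiClass",
                           class_weights=[1.0, 3.5, 7.0],  # reweights the loss
                           eval_metric="WKappa",           # or "Kappa"
                           verbose=False)
model.fit(X, y)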
2
votes
0 answers
CatBoost: Categorical Feature Encoding
I would like to understand all the methods available in CatBoost for encoding categorical features.
Unfortunately, the published articles by Yandex ("CatBoost: gradient boosting with categorical features support" and "CatBoost: unbiased boosting…
calpyte
- 121
- 2
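As a rough map of the knobs involved (parameter names are from the library; the values are illustrative): categories with few levels can be one-hot encoded via one_hot_max_size, and everything else goes through CTRs, whose types are chosen with simple_ctr and combinations_ctr:

import pandas as pd
from catboost import CatBoostClassifier

df = pd.DataFrame({"color": ["a", "b", "a", "c"],
                   "x": [1.0, 2.0, 3.0, 4.0],
                   "y": [0, 1, 0, 1]})

model = CatBoostClassifier(
    iterations=50,
    one_hot_max_size=10,                # <= 10 levels: plain one-hot
    simple_ctr=["Borders", "Counter"],  # CTR types for single features
    combinations_ctr=["Borders"],       # CTR types for feature combinations
    verbose=False,
)
model.fit(df[["color", "x"]], df["y"], cat_features=["color"])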
2
votes
1 answer
How do we target-encode categorical features in multi class classification problems?
Say I have a multiclass problem with a dataset like this:
user_id | price | target
--------+-------+--------
      1 |    30 | apple
      1 |    20 | samsung
      2 |    32 | samsung
      2 |    40 | huawei
    ... |   ... | ...
where I have a lot of users, i.e. One Hot…
CutePoison
- 450
- 2
- 8
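One standard answer is one-vs-rest: with K classes, build K indicator columns and target-encode the categorical feature against each, giving K encoded columns per feature. A sketch on the question's own data (a real pipeline would compute these out-of-fold to avoid leakage):

import pandas as pd

df = pd.DataFrame({
    "user_id": [1, 1, 2, 2],
    "price":   [30, 20, 32, 40],
    "target":  ["apple", "samsung", "samsung", "huawei"],
})

# One indicator column per class, then a per-user mean of each indicator.
indicators = pd.get_dummies(df["target"])
for cls in indicators.columns:
    df[f"user_te_{cls}"] = (
        indicators[cls].groupby(df["user_id"]).transform("mean")
    )
print(df)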
2
votes
1 answer
How to tell CatBoost which feature is categorical?
I am excited to learn that CatBoost can handle categorical features by itself. One of my features, Department ID, is categorical. However, it looks numeric, since the values are like 1001, 1002, ..., 1218. Those numbers are just IDs of the…
Fred Chang
- 85
- 1
- 6
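Two common fixes, sketched below with hypothetical column names: list the column in cat_features, and/or cast it to string so it can't be read as a number:

import pandas as pd
from catboost import CatBoostClassifier

# Hypothetical data with a numeric-looking categorical ID.
df = pd.DataFrame({
    "department_id": [1001, 1002, 1218, 1001],
    "salary": [50_000, 62_000, 48_000, 55_000],
    "left_company": [0, 1, 0, 1],
})

# Casting to string makes the categorical intent explicit...
df["department_id"] = df["department_id"].astype(str)

model = CatBoostClassifier(iterations=50, verbose=False)
# ...and cat_features tells CatBoost to treat the column as categorical.
model.fit(df[["department_id", "salary"]], df["left_company"],
          cat_features=["department_id"])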
1
vote
1 answer
Does Gradient Boosting perform n-ary splits where n > 2?
I wonder whether algorithms such as GBM, XGBoost, CatBoost, and LightGBM perform more than two splits at a node in the decision trees? Can a node be split into 3 or more branches instead of merely binary splits? Can more than one feature be used in…
Chong Lip Phang
- 221
- 2
- 8
1
vote
0 answers
Feature Selection before modeling with Boosting Trees
I have read in some papers that the subset of features chosen for a boosting tree algorithm will make a big difference in performance, so I've been trying RFE, Boruta, variable clustering, correlation, WOE & IV, and Chi-square.
Let's say I have a…
Mamoud
- 11
- 2
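For the RFE part, CatBoost's scikit-learn compatibility means it can serve as the RFE estimator directly, since it exposes feature_importances_; a sketch under that assumption:

from catboost import CatBoostClassifier
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE

X, y = make_classification(n_samples=300, n_features=20,
                           n_informative=5, random_state=0)

# RFE refits the estimator repeatedly, dropping the weakest features
# according to feature_importances_ each round.
selector = RFE(CatBoostClassifier(iterations=50, verbose=False),
               n_features_to_select=5)
selector.fit(X, y)
print(selector.support_)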
1
vote
2 answers
Does gradient boosting error always decrease faster and lower on training data?
I am building another XGBoost model and I'm really trying not to overfit the data. I split my data into train and test sets and fit the model with early stopping based on the test-set error, which results in the following loss plot:
I'd say this is…
Xaume
- 182
- 2
- 11
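The usual way to inspect this is to record both curves during training; with early stopping, training loss typically keeps falling while the held-out loss flattens or rises. A sketch with XGBoost (in older versions early_stopping_rounds is passed to fit() instead of the constructor):

import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = xgb.XGBClassifier(n_estimators=500, early_stopping_rounds=20,
                          eval_metric="logloss")
# Track the loss on both the training and the held-out set.
model.fit(X_tr, y_tr, eval_set=[(X_tr, y_tr), (X_te, y_te)], verbose=False)
# {'validation_0': train curve, 'validation_1': test curve}
curves = model.evals_result()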
1
vote
2 answers
RandomizedSearchCV(n_iter=10) doesn't stop after training 10 models
I am using RandomizedSearchCV for hyperparameter optimization. When I run the model, it shows the scores for each model training. The problem is, it trains way more than 10 models when in fact I expect it to train just 10 models by specifying…
Mehmet Deniz
- 31
- 4
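The likely explanation: n_iter counts parameter settings, not fits; each setting is trained once per CV fold, plus one final refit. A sketch where n_iter=10 and cv=5 produce 10 × 5 + 1 = 51 trainings:

from catboost import CatBoostClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=200, random_state=0)

search = RandomizedSearchCV(
    CatBoostClassifier(verbose=False),
    param_distributions={"depth": [4, 5, 6, 7, 8],
                         "learning_rate": [0.03, 0.1, 0.3],
                         "iterations": [50, 100]},
    n_iter=10,   # 10 parameter settings...
    cv=5,        # ...each fitted 5 times, plus 1 refit = 51 fits total
    random_state=0,
)
search.fit(X, y)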
1
vote
0 answers
Optuna pruning during cross-validation, does it make sense?
I'm currently trying to build a model using CatBoost. For my parameter tuning, I'm using Optuna with cross-validation, pruning trials based on the intermediate cross-validation scores. Here is a minimal example:
def objective(trial):
…
GiusWestsideDS
- 11
- 1
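For reference, one way such an objective is typically structured: report the running mean after each fold and let the pruner stop hopeless trials early; pruning on partial CV scores trades some reliability of the estimate for speed. A sketch (data and parameters are illustrative):

import numpy as np
import optuna
from catboost import CatBoostClassifier
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold

X, y = make_classification(n_samples=300, random_state=0)

def objective(trial):
    depth = trial.suggest_int("depth", 4, 8)
    scores = []
    for step, (tr, va) in enumerate(StratifiedKFold(5).split(X, y)):
        model = CatBoostClassifier(iterations=100, depth=depth, verbose=False)
        model.fit(X[tr], y[tr])
        scores.append(accuracy_score(y[va], model.predict(X[va])))
        # Report the running mean so the pruner can act between folds.
        trial.report(float(np.mean(scores)), step)
        if trial.should_prune():
            raise optuna.TrialPruned()
    return float(np.mean(scores))

study = optuna.create_study(direction="maximize",
                            pruner=optuna.pruners.MedianPruner())
study.optimize(objective, n_trials=20)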
1
vote
1 answer
If I use Weight of Evidence to transform categorical variables, do I still need to pass their indices to CatBoost?
I'm using Weight of Evidence (WOE) to encode my categorical features. Do I still need to inform CatBoost that they are categorical features by using the cat_features parameter?
Jorge Amaral
- 131
- 2
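The usual answer is no: after WOE encoding the column is a continuous float, so it is no longer categorical from CatBoost's point of view, and cat_features in fact only accepts integer or string columns. A sketch with a hand-rolled WOE:

import numpy as np
import pandas as pd
from catboost import CatBoostClassifier

df = pd.DataFrame({"city": ["a", "a", "b", "b", "b", "c"],
                   "y":    [1,   0,   1,   1,   0,   0]})

# WOE per category: log of (share of positives) over (share of negatives).
pos = df.groupby("city")["y"].sum() / df["y"].sum()
neg = df.groupby("city")["y"].apply(lambda s: (1 - s).sum()) / (1 - df["y"]).sum()
df["city_woe"] = df["city"].map(np.log((pos + 1e-6) / (neg + 1e-6)))

# The WOE column is numeric, so no cat_features entry is needed (or allowed).
model = CatBoostClassifier(iterations=50, verbose=False)
model.fit(df[["city_woe"]], df["y"])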
1
vote
0 answers
Intuition behind CatBoost encoding techniques
Can anyone please help me understand the effect of the various bucketing techniques used in the CatBoost algorithm for categorical features? There are Border, Buckets, BinarizedTargetMeanValue, and Counter encoding techniques, and I am not able to get a proper…
Mimansa Maheshwari
- 11
- 1
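As a rough sketch of two of those statistics (Counter is count-based, BinarizedTargetMeanValue is a smoothed target mean; Border and Buckets then control how such numeric statistics are split into threshold bins inside the trees):

import pandas as pd

df = pd.DataFrame({"color": ["red", "red", "blue", "green", "red"],
                   "y":     [1, 0, 1, 0, 1]})

# Counter-style: encode by how often the category occurs (no target used).
df["color_counter"] = df.groupby("color")["color"].transform("count")

# Target-mean-style: encode by a smoothed mean of the target per category.
prior, a = 0.5, 1.0
stats = df.groupby("color")["y"].agg(["sum", "count"])
df["color_target_mean"] = df["color"].map(
    (stats["sum"] + a * prior) / (stats["count"] + a))
print(df)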