Questions tagged [gradient-boosting-decision-trees]
46 questions
6
votes
1 answer
What is Pruning & Truncation in Decision Trees?
Pruning & Truncation
As per my understanding:
Truncation: Stop the tree while it is still growing so that it does not end up with leaves containing very few data points. One way to do this is to set a minimum number of training inputs to use on each…
Pluviophile
- 3,520
- 11
- 29
- 49
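As an aside on the truncation idea in the excerpt above, the minimum-samples stopping rule can be sketched in a few lines. This is a toy illustration with made-up helper names, not code from any particular library:

```python
# A minimal sketch of pre-pruning ("truncation") in a regression tree:
# growth stops as soon as a candidate split would leave a child with
# fewer than `min_samples_leaf` points.

def best_split(xs, ys, min_samples_leaf):
    """Return the threshold minimising squared error, or None if every
    admissible split violates the minimum-leaf-size rule."""
    def sse(vals):
        if not vals:
            return 0.0
        mean = sum(vals) / len(vals)
        return sum((v - mean) ** 2 for v in vals)

    best, best_err = None, float("inf")
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if len(left) < min_samples_leaf or len(right) < min_samples_leaf:
            continue  # truncation: this split would create a tiny leaf
        err = sse(left) + sse(right)
        if err < best_err:
            best, best_err = t, err
    return best

xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.1, 0.9, 5.0, 5.2, 4.8]
print(best_split(xs, ys, min_samples_leaf=2))  # 3: split between the clusters
print(best_split(xs, ys, min_samples_leaf=4))  # None: every split refused
```

With `min_samples_leaf=4` no split of six points can leave four on each side, so the node simply becomes a leaf — exactly the "stop while still growing" behaviour described above.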
5
votes
1 answer
Multi-target regression tree with additional constraint
I have a regression problem where I need to predict three dependent variables ($y$) based on a set of independent variables ($x$):
$$ (y_1,y_2,y_3) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n +u. $$
To solve this problem, I would…
Peter
- 7,277
- 5
- 18
- 47
4
votes
1 answer
XGBoost - imputing vs. keeping NaN
What is the benefit of imputing numerical or categorical features when using DT methods such as XGBoost that can handle missing values? This question is mainly for when the values are missing not at random.
An example of missing not at random…
thereandhere1
- 715
- 1
- 7
- 22
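For context on why imputation may be unnecessary here: XGBoost learns a per-split "default direction" for missing values, sending them to whichever child lowers the loss more. A toy sketch of that idea (the function names are mine, not the library's):

```python
# Sketch of the "default direction" idea XGBoost uses for missing values
# (learned per split, rather than imputed beforehand).
import math

def choose_default_direction(x, y, threshold):
    """Send NaNs left or right, whichever yields lower squared error."""
    def sse(vals):
        if not vals:
            return 0.0
        m = sum(vals) / len(vals)
        return sum((v - m) ** 2 for v in vals)

    present = [(xi, yi) for xi, yi in zip(x, y) if not math.isnan(xi)]
    missing = [yi for xi, yi in zip(x, y) if math.isnan(xi)]
    left = [yi for xi, yi in present if xi <= threshold]
    right = [yi for xi, yi in present if xi > threshold]
    err_left = sse(left + missing) + sse(right)   # NaNs routed left
    err_right = sse(left) + sse(right + missing)  # NaNs routed right
    return "left" if err_left <= err_right else "right"

x = [0.1, 0.2, float("nan"), 0.9, 1.0]
y = [1.0, 1.1, 5.0, 5.1, 4.9]
print(choose_default_direction(x, y, threshold=0.5))  # "right"
```

When values are missing not at random, the missingness itself is informative, and this learned routing can exploit it — something a mean or median imputation would erase.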
3
votes
3 answers
Am I building a good or bad prediction model using the Gradient Boosting classifier algorithm?
I am building a binary classification model using the Gradient Boosting classifier for imbalanced data with an event rate of 0.11% and a sample size of 350,000 records (split into 70% training and 30% testing).
I have successfully tuned hyperparameters using GridSearchCV, and…
RajendraW
- 33
- 4
3
votes
3 answers
Example for Boosting
Can someone tell me exactly how boosting, as implemented by LightGBM or XGBoost, works in a real-world scenario? I know it splits the tree leaf-wise instead of level-wise, which will contribute to the global average, not just the loss of the branch which…
Chris_007
- 193
- 5
2
votes
1 answer
What if the root of such a tree is pruned in xgboost?
Extreme Gradient Boosting stops growing a tree if $\gamma$ is greater than the impurity reduction given in eq. (7) (see below). What happens if a tree's root has negative impurity? I think there is no way for boosting to go on, because the next…
Davi Américo
- 123
- 4
1
vote
0 answers
Why is the average prediction moving away from average response for a reg:gamma model
I'm predicting a response that I would typically model under a gamma distribution with relatively simple parameters. I'm using the defaults other than these:
learning_rate = 0.01
max_depth = 6
base_score = the average of y
Since my base_score…
Mattice Verhoeven
- 111
- 5
1
vote
1 answer
Why is HistGradientBoostingRegressor in sklearn so fast and low on memory?
I trained multiple models for my problem, and most ensemble algorithms resulted in lengthy fit and train times and a huge model size on disk (approx. 10 GB for RandomForest), but when I tried HistGradientBoostingRegressor from sklearn, the fit and training…
ro23
- 35
- 4
1
vote
1 answer
Tree complexity and gamma parameter in xgboost
According to xgboost paper, regularization is given by:
$$\Omega(f) = \gamma T + \lambda || w||^2$$
where $T$ is the number of leaves in the tree and $\gamma$ is the penalty paid for each additional leaf.
The parameter gamma in xgboost library, on the other hand, controls…
zzzbob
- 45
- 4
1
vote
1 answer
Why is monotonic constraint disabled when using MAE as an objective to LGBM?
I tried to use monotonic constraints in LGBM, but if I use mean absolute error as the objective, it gives a warning that monotonic constraints are not supported for l1.
What is the reason? Thanks!
morqueatsz
- 25
- 3
1
vote
1 answer
Why is gradient boosting better than random forest for unbalanced data?
I've searched everywhere and still couldn't figure this one out.
This post mentioned that Gradient Boosting is better than Random Forest for unbalanced data. Why is that? Is Random Forest worse because of bootstrapping (perhaps this wouldn't get a…
Aldla E Aoepql
- 41
- 2
1
vote
1 answer
How do machine learning algorithms process text?
I'm still new to machine learning and have been trying to expand my knowledge of it. For my first project, I want to classify whether a tweet is suicidal or not using the gradient boosting algorithm.
I do know that ML models can't process plain text…
Emman
- 11
- 2
1
vote
0 answers
ML model that doesn't average/penalize extreme values
I have a 20k-row dataset; a couple hundred of those rows are extreme values, and 10 or so of them are even more extreme. But they are correct and have a unique tag, so when that tag comes up I am hoping the ML treats the case as the unique one it is and not…
Jroc561
- 11
- 1
1
vote
0 answers
Error from XGBoost missing data handling
I have a regression problem with a very large dataset: >50 million rows, 81 features, and 1 target, all positive float values unevenly distributed between 0 and 1 million. I've trained an XGBoost model on the data and got a relatively good $R^2$ score…
lexan55
- 36
- 2
1
vote
0 answers
CatBoost not working properly when I remove non-important variables (source of randomness?)
I was wondering if anyone has encountered the same. The thing is, when I run a CatBoost model, delete the non-important variables (feature importance by prediction importance = 0; in fact these variables are not in the boosting trees), and rerun the…
Tom
- 53
- 8