How do GBM algorithms such as XGBoost or LightGBM handle NaN values? I know that they learn how to replace NaN values with other values, but my question is: how exactly do they do it?
-
In short, XGBoost learns a default direction for missing values at each split: during training it works out which branch gives the better result when the value is missing, and at prediction time a missing value takes that branch. See https://arxiv.org/abs/1603.02754 – Aditya Jan 06 '20 at 12:49
-
What about LightGBM? – user10296606 Jan 06 '20 at 13:07
1 Answer
7
LightGBM ignores missing values when choosing a split, then assigns them to whichever side reduces the loss the most. See https://github.com/microsoft/LightGBM/issues/2921
There are some options you can set, such as use_missing=false, which disables the special handling of missing values. You can also set the zero_as_missing option to treat zeros as missing values.
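To make the idea concrete, here is a minimal sketch of choosing a split on a single feature when some values are NaN. It is not LightGBM's actual code: the real library works on histograms and gradient statistics, while this toy version uses plain squared error, and the names sse and best_split_with_missing are made up for illustration. Candidate thresholds come from the non-missing values only, and for each candidate the rows with NaN are tried on both sides, keeping the cheaper one.

```python
import math


def sse(values):
    """Sum of squared errors around the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split_with_missing(x, y):
    """Return (threshold, missing_goes_left, loss) for one feature.

    Hypothetical helper, not a LightGBM API. Thresholds are enumerated
    over non-missing values only; NaN rows are then tried on each side.
    Returns None if there are fewer than two distinct non-missing values.
    """
    present = [(xi, yi) for xi, yi in zip(x, y) if not math.isnan(xi)]
    missing_y = [yi for xi, yi in zip(x, y) if math.isnan(xi)]
    uniq = sorted({xi for xi, _ in present})

    best = None
    for lo, hi in zip(uniq, uniq[1:]):
        thr = (lo + hi) / 2  # midpoint between neighbouring values
        left = [yi for xi, yi in present if xi <= thr]
        right = [yi for xi, yi in present if xi > thr]
        for goes_left in (True, False):
            if goes_left:
                loss = sse(left + missing_y) + sse(right)
            else:
                loss = sse(left) + sse(right + missing_y)
            if best is None or loss < best[2]:
                best = (thr, goes_left, loss)
    return best


nan = float("nan")
x = [1.0, 2.0, 3.0, 4.0, nan, nan]
y = [0.0, 0.0, 10.0, 10.0, 10.0, 10.0]
print(best_split_with_missing(x, y))  # (2.5, False, 0.0)
```

In this toy data the rows with a missing feature have the same target as the high-valued rows, so the sketch sends missing values to the right branch. The chosen direction is stored with the split, which is why a NaN at prediction time can be routed without any imputation.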
Noah Weber
-
The link you provided, http://mlexplained.com/2018/01/05/lightgbm-and-xgboost-explained/, redirects to electronicsmarket.org for me. I don't know why. – Ihor B. Sep 15 '21 at 10:44