
How do GBM algorithms, such as XGBoost or LightGBM, handle NaN values? I know that they learn how to replace NaN values with other values, but my question is: how exactly do they do it?

Hagbard
user10296606
  • In short, at every split XGBoost also learns a default direction for missing values: during training it tries sending the rows with a missing value to each side and keeps the direction with the better gain, and at prediction time a row whose value is missing simply follows that direction. Refer to https://arxiv.org/abs/1603.02754 – Aditya Jan 06 '20 at 12:49
  • What about LightGBM? – user10296606 Jan 06 '20 at 13:07
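The default-direction idea from the first comment (the "sparsity-aware split finding" of the XGBoost paper linked there) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not XGBoost's code: a single feature, gradient/hessian statistics already computed (here for squared loss at an initial prediction of 0, so g = prediction - label and h = 1), and the split gain written without the paper's constant factor 1/2 and its gamma penalty. The names find_split, score and route are hypothetical.

```python
import math

def score(g, h, lam):
    # Structure-score term G^2 / (H + lambda) from the XGBoost paper
    return g * g / (h + lam)

def find_split(x, grad, hess, lam=1.0):
    """Best (gain, threshold, default_left) for one feature (sketch).

    Missing entries (NaN) are never sorted or compared; only their summed
    gradient statistics are used, once on each side of every candidate split.
    """
    present = sorted((v, g, h) for v, g, h in zip(x, grad, hess) if not math.isnan(v))
    g_miss = sum(g for v, g in zip(x, grad) if math.isnan(v))
    h_miss = sum(h for v, h in zip(x, hess) if math.isnan(v))
    g_tot, h_tot = sum(grad), sum(hess)
    parent = score(g_tot, h_tot, lam)

    best = (float("-inf"), None, None)
    g_left = h_left = 0.0  # statistics of present rows with value < threshold
    for k in range(len(present) - 1):
        v, g, h = present[k]
        g_left += g
        h_left += h
        v_next = present[k + 1][0]
        if v == v_next:
            continue  # no threshold separates equal values
        threshold = (v + v_next) / 2
        g_right = g_tot - g_miss - g_left
        h_right = h_tot - h_miss - h_left
        # Enumerate both default directions for the missing rows
        for default_left in (True, False):
            if default_left:
                children = score(g_left + g_miss, h_left + h_miss, lam) + score(g_right, h_right, lam)
            else:
                children = score(g_left, h_left, lam) + score(g_right + g_miss, h_right + h_miss, lam)
            gain = children - parent
            if gain > best[0]:
                best = (gain, threshold, default_left)
    return best

def route(value, threshold, default_left):
    # At prediction time a missing value just follows the learned default
    if math.isnan(value):
        return "left" if default_left else "right"
    return "left" if value < threshold else "right"

# Squared loss at initial prediction 0: grad = 0 - label, hess = 1
nan = float("nan")
x = [1.0, 2.0, 8.0, 9.0, nan, nan]
labels = [0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
grad = [0.0 - t for t in labels]
hess = [1.0] * len(labels)
gain, threshold, default_left = find_split(x, grad, hess)
print(threshold, default_left)  # 5.0 False: missing rows are sent right
print(route(nan, threshold, default_left))  # right
```

Because the missing rows here have the same label as the high-valued rows, the best gain comes from sending them right; if their labels matched the low-valued rows instead, the same threshold would be chosen with the default flipped to the left.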

1 Answer


LightGBM ignores missing values while searching for a split, then allocates them to whichever side reduces the loss the most. https://github.com/microsoft/LightGBM/issues/2921

There are some options you can set, such as use_missing=false, which disables the special handling of missing values. You can also use the zero_as_missing option to treat zeros, in addition to NaN, as missing values. (Source: GitHub)
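A plain-Python sketch of the two steps described above, under simplifying assumptions: squared-error loss, a single feature, and an exhaustive threshold search instead of LightGBM's histogram bins. It illustrates the answer's description, not LightGBM's implementation. is_missing follows how the LightGBM documentation describes the options (by default only NaN is missing; with zero_as_missing=true, NaN and zeros are both missing; use_missing=false turns the special handling off). The helpers is_missing, sse and split_then_allocate are hypothetical names, and split_then_allocate always runs with missing-value handling enabled.

```python
import math

def is_missing(v, use_missing=True, zero_as_missing=False):
    # Mirrors how the LightGBM docs describe use_missing / zero_as_missing;
    # a plain-Python illustration, not LightGBM's own code.
    if not use_missing:
        return False  # special handling of missing values disabled
    if zero_as_missing:
        return math.isnan(v) or v == 0.0  # NaN and zeros both count as missing
    return math.isnan(v)  # default: only NaN is missing

def sse(ys):
    # Squared-error loss of predicting the mean of ys
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)

def split_then_allocate(x, y, zero_as_missing=False):
    """Simplified two-step sketch (hypothetical helper):
    1) pick the threshold using only the non-missing rows;
    2) send the missing rows to the side that yields the lower loss."""
    rows = sorted((v, t) for v, t in zip(x, y) if not is_missing(v, zero_as_missing=zero_as_missing))
    missing_y = [t for v, t in zip(x, y) if is_missing(v, zero_as_missing=zero_as_missing)]

    best_loss, best_thr = float("inf"), None
    for k in range(len(rows) - 1):
        if rows[k][0] == rows[k + 1][0]:
            continue  # no threshold separates equal values
        thr = (rows[k][0] + rows[k + 1][0]) / 2
        left = [t for v, t in rows if v < thr]
        right = [t for v, t in rows if v >= thr]
        loss = sse(left) + sse(right)
        if loss < best_loss:
            best_loss, best_thr = loss, thr
    if best_thr is None:
        return None  # fewer than two distinct non-missing values

    left = [t for v, t in rows if v < best_thr]
    right = [t for v, t in rows if v >= best_thr]
    loss_if_left = sse(left + missing_y) + sse(right)
    loss_if_right = sse(left) + sse(right + missing_y)
    return best_thr, "left" if loss_if_left <= loss_if_right else "right"

x = [0.0, 0.0, 1.0, 2.0, 8.0, 9.0]
y = [1.0, 1.0, 0.0, 0.0, 1.0, 1.0]
print(split_then_allocate(x, y))                        # zeros are ordinary values
print(split_then_allocate(x, y, zero_as_missing=True))  # (5.0, 'right')
```

With zero_as_missing=True the two zero-valued rows no longer take part in the threshold search; they are set aside and then grouped with the high-valued rows whose labels they share.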

Noah Weber
  • The link you provided, http://mlexplained.com/2018/01/05/lightgbm-and-xgboost-explained/, redirects me to electronicsmarket.org. I don't know why. – Ihor B. Sep 15 '21 at 10:44