
I'm trying to understand how the base value is calculated, so I used an example from SHAP's GitHub notebook, Census income classification with LightGBM.

Right after training the LightGBM model, I applied explainer.shap_values() to each row of the test set individually. force_plot() then shows the base value, the model output value, and the contribution of each feature.

My understanding is that the base value is derived when the model has no features. But how is it actually calculated in SHAP?

desertnaut
David293836

1 Answer


As you say, it's the value of a feature-less model, which is generally the average of the model's output over the training set (often in log-odds, for classification). With force_plot, you pass the desired base value yourself as the first argument; in that notebook it is explainer.expected_value[1], the average for the second class.

https://github.com/slundberg/shap/blob/06c9d18f3dd014e9ed037a084f48bfaf1bc8f75a/shap/plots/force.py#L31

https://github.com/slundberg/shap/issues/352#issuecomment-447485624
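To make the "average output" idea concrete, here is a minimal sketch in plain Python. It does not use SHAP and is not how the library computes anything internally. It assumes a hypothetical linear model in log-odds space (the weights, bias, and background data are made up) with features treated as independent, in which case the Shapley value of feature j has the closed form w_j * (x_j - mean_j). The base value is then just the mean raw output over the background data, and base value plus contributions gives back the model output for a row.

```python
import math
import random
from statistics import fmean

random.seed(0)

# Hypothetical background data: 200 rows, 3 features.
background = [[random.gauss(0, 1) for _ in range(3)] for _ in range(200)]

# Hypothetical linear model in log-odds space (weights and bias are made up).
weights = [0.8, -1.5, 0.3]
bias = -0.4


def log_odds(row):
    """Raw model output (margin) for one row."""
    return bias + sum(w * x for w, x in zip(weights, row))


# Base value: the average raw model output over the background data.
base_value = fmean(log_odds(r) for r in background)

# Per-feature means of the background data.
feature_means = [fmean(col) for col in zip(*background)]


def contributions(row):
    """Closed-form Shapley values for a linear model with independent features."""
    return [w * (x - m) for w, x, m in zip(weights, row, feature_means)]


row = [1.2, -0.7, 0.5]
phi = contributions(row)

# Additivity: base value + feature contributions equals the model output.
assert math.isclose(base_value + sum(phi), log_odds(row))
```

Note that the average is taken in log-odds space here, so applying a sigmoid to `base_value` will generally not give the average predicted probability.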

Ben Reiniger
  • I don't really get the sense of a feature-less model. I know this is the correct explanation, also given by the author of the original paper, but I am not able to understand it. Wouldn't that prediction be obtained from my trained model, which actually uses some features? – Alexbrini Nov 23 '20 at 09:31
  • @Alexbrini, as I understand it, the featureless model can simply predict a fixed value (e.g. the average of the output), with no features used whatsoever. – Nikos M. Jun 08 '21 at 08:36
  • Do you know what it means when I see `0.5` as the base value for all my instances? When I draw a waterfall plot, I see `0` for all my instances. You can refer to this post: https://stats.stackexchange.com/questions/569843/why-shap-base-value-is-0-5-for-all-my-instances-and-what-does-it-mean – The Great Mar 31 '22 at 07:32