
I'm using an XGBoost model for multi-class classification and am looking at feature importance using SHAP values. I'm curious whether multicollinearity is a problem for the interpretation of the SHAP values. As far as I know, XGBoost is not affected by multicollinearity, so I assume SHAP won't be affected either?

hideonbush

1 Answer


Shapley values are designed to deal with this problem. You might want to have a look at the literature.

They are based on the idea of a cooperative game, where the goal is to compute each player's contribution to the total outcome.

Let's say the Champions League final is Real Madrid vs. Liverpool, and Madrid somehow has only 3 players, 1, 2 and 3, who together score 5 goals.

To calculate each player's Shapley value, you average their marginal contribution over every possible coalition of players:

$S_1 = \frac{1}{3}\left( v(\{1,2,3\}) - v(\{2,3\})\right) + \frac{1}{6}\left( v(\{1,2\}) - v(\{2\})\right) + \frac{1}{6}\left( v(\{1,3\}) - v(\{3\})\right) + \frac{1}{3}\left( v(\{1\}) - v(\emptyset)\right)$

$S_2 = \frac{1}{3}\left( v(\{1,2,3\}) - v(\{1,3\})\right) + \frac{1}{6}\left( v(\{1,2\}) - v(\{1\})\right) + \frac{1}{6}\left( v(\{2,3\}) - v(\{3\})\right) + \frac{1}{3}\left( v(\{2\}) - v(\emptyset)\right)$

$S_3 = \frac{1}{3}\left( v(\{1,2,3\}) - v(\{1,2\})\right) + \frac{1}{6}\left( v(\{1,3\}) - v(\{1\})\right) + \frac{1}{6}\left( v(\{2,3\}) - v(\{2\})\right) + \frac{1}{3}\left( v(\{3\}) - v(\emptyset)\right)$

where $v$ is the value function of a coalition: for Real Madrid, the number of goals scored by each combination of players.

As you can see, the theoretical definition already encapsulates the dependence between features (players). The theory guarantees that the sum of the contributions equals the total prediction: $S_1 + S_2 + S_3 = 5$.
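
As a sanity check, here is a minimal sketch of that computation in Python. Only the grand coalition's value, $v(\{1,2,3\}) = 5$, is given above; the other coalition values are made-up numbers for illustration. The efficiency property guarantees the three contributions sum to 5 regardless of those numbers.

```python
import itertools
from math import factorial

# Goals scored by each coalition of players (only v({1,2,3}) = 5 comes
# from the example above; the rest are hypothetical).
v = {
    (): 0,
    (1,): 1, (2,): 1, (3,): 0,
    (1, 2): 3, (1, 3): 2, (2, 3): 2,
    (1, 2, 3): 5,
}

players = [1, 2, 3]
n = len(players)

def shapley(player):
    """Average the player's marginal contribution over all coalitions."""
    total = 0.0
    others = [p for p in players if p != player]
    for r in range(len(others) + 1):
        for coalition in itertools.combinations(others, r):
            s = len(coalition)
            # Same weights as the formulas above: 1/3, 1/6, 1/6, 1/3.
            weight = factorial(s) * factorial(n - s - 1) / factorial(n)
            with_player = tuple(sorted(coalition + (player,)))
            total += weight * (v[with_player] - v[coalition])
    return total

values = {p: shapley(p) for p in players}
print(values)                 # each player's contribution
print(sum(values.values()))   # efficiency: equals v({1,2,3}) = 5
```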

Let's see now if the Real Madrid players get some high Shapley values next week.

Carlos Mougan
  • Thanks, but how do you understand this, then: "Like many other permutation-based interpretation methods, the Shapley value method suffers from inclusion of unrealistic data instances when features are correlated"? – hideonbush May 24 '22 at 13:32
  • It has nothing to do with collinearity; $S_i$ can turn out to be something that makes no sense from a business perspective – Carlos Mougan May 25 '22 at 11:55
  • There are still some problems with the usage of SHAP values as an explanation tool, which appear in the case of high collinearity. If you have highly collinear features, their marginal contribution will decrease, which might surprise users expecting a given feature to have high importance. – Lucas Morin May 25 '22 at 12:16
  • That's also what I'm thinking; do you have a source saying this? @lcrmorin – hideonbush May 25 '22 at 13:05
  • https://arxiv.org/pdf/1903.10464.pdf – Carlos Mougan May 25 '22 at 13:38
  • Also, since you are using XGBoost, you need to differentiate between observational and interventional TreeSHAP, which are different from SHAP and Shapley values (see the sketch after these comments) – Carlos Mougan May 25 '22 at 13:41
  • https://arxiv.org/pdf/2006.16234.pdf – Carlos Mougan May 25 '22 at 13:41
  • "On theother hand, papers like Frye et al. (2019) have noted that using the interventional Shapley value (which breaks the dependence between features) will lead to evaluating the model on “impossible data points” that lie off the true data manifold." – Carlos Mougan May 25 '22 at 13:42
  • The "impossible data points" and the "colinearity" are somehow different problems – Carlos Mougan May 25 '22 at 13:43
  • The last paper is very interesting! I'm not looking at causal relationships, so I believe I'm going with the "true to the model" interpretation. As far as I can see, the first paper looks at the KernelExplainer, not the TreeExplainer, so there might be a difference in terms of correlation - am I right? – hideonbush May 25 '22 at 14:25
  • @hideonbush best I can do is a Kaggle notebook: https://www.kaggle.com/code/lucasmorin/shap-explainability-with-colinearity though I don't know if the problem is with Shapley values or their SHAP approximation - lgbm now implements the SHAP TreeExplainer in its predict method. So I am not sure how my experience matches what Carlos mentions. – Lucas Morin May 25 '22 at 15:17
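
To illustrate the observational/interventional distinction discussed in the comments, here is a minimal sketch using the shap library with XGBoost. The toy dataset and model settings are made up for illustration; the point is only the feature_perturbation argument of shap.TreeExplainer.

```python
import numpy as np
import shap
import xgboost as xgb

# Toy tabular data (hypothetical; any X, y would do here).
rng = np.random.default_rng(0)
X = rng.random((200, 4))
y = (X[:, 0] + X[:, 1] > 1).astype(int)

model = xgb.XGBClassifier(n_estimators=50, max_depth=3).fit(X, y)

# Observational / path-dependent TreeSHAP: uses the trees' own cover
# statistics, so correlated features share credit along tree paths.
expl_obs = shap.TreeExplainer(model, feature_perturbation="tree_path_dependent")

# Interventional TreeSHAP: breaks feature dependence by intervening with a
# background dataset, which can evaluate the model on off-manifold
# ("impossible") data points when features are correlated.
expl_int = shap.TreeExplainer(
    model, data=X[:100], feature_perturbation="interventional"
)

sv_obs = expl_obs.shap_values(X)
sv_int = expl_int.shap_values(X)
```

Comparing sv_obs and sv_int on correlated features is a quick way to see how much the choice of estimator, rather than the model itself, drives the attributions.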