Questions tagged [shap]

68 questions
13
votes
2 answers

SHAP value analysis gives different feature importance on train and test set

Should SHAP value analysis be done on the train or test set? What does it mean if the feature importance based on mean |SHAP value| is different between the train and test set of my lightgbm model? I intend to use SHAP analysis to identify how each…
pbk
  • 133
  • 1
  • 5
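
A minimal sketch of the comparison (synthetic data and a default LGBMClassifier are assumptions, not the asker's setup): compute mean |SHAP| per feature on each split and compare the rankings.

```python
import numpy as np
import shap
import lightgbm as lgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = lgb.LGBMClassifier().fit(X_train, y_train)

explainer = shap.TreeExplainer(model)

def mean_abs_shap(X_part):
    sv = explainer.shap_values(X_part)
    if isinstance(sv, list):      # some shap versions return one array per class
        sv = sv[1]
    return np.abs(sv).mean(axis=0)

imp_train, imp_test = mean_abs_shap(X_train), mean_abs_shap(X_test)
print(np.argsort(-imp_train))     # feature ranking on the train split
print(np.argsort(-imp_test))      # feature ranking on the test split
```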
6
votes
1 answer

How to achieve SHAP values for a CatBoost model in R?

I'm asked to create a SHAP analysis in R but I cannot find out how to obtain it for a CatBoost model. I can get the SHAP values of an XGBoost model with shap_values <- shap.values(xgb_model = model, X_train = train_X) but not for CatBoost. Here is…
user100740
  • 91
  • 2
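
For reference, a hedged sketch of the Python analogue: CatBoost can return SHAP values natively via get_feature_importance(type='ShapValues'); the catboost R package exposes a similarly named catboost.get_feature_importance, which should accept type = 'ShapValues' as well (worth verifying against the R docs).

```python
import numpy as np
from catboost import CatBoostClassifier, Pool
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=500, n_features=8, random_state=0)
model = CatBoostClassifier(iterations=100, verbose=False).fit(X, y)

# Returns shape (n_samples, n_features + 1); the last column is the expected value.
shap_matrix = model.get_feature_importance(data=Pool(X, y), type="ShapValues")
shap_values, expected_value = shap_matrix[:, :-1], shap_matrix[0, -1]
print(shap_values.shape, expected_value)
```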
5
votes
1 answer

Shapley values without intercept (or without `expected_value`)

I have a model and I want to interpret it using feature contributions. In the end, I want to have some contribution per feature such that the sum of contributions equals the prediction of the model. One approach may be to use…
David Masip
  • 5,981
  • 2
  • 23
  • 61
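
One hypothetical illustration (not necessarily the asker's truncated approach): fold the base value into the per-feature contributions, here proportionally to |SHAP|, so the contributions alone sum to the raw prediction.

```python
import numpy as np

def contributions_without_intercept(shap_row, expected_value):
    """Redistribute expected_value across features by |SHAP| weight."""
    weights = np.abs(shap_row) / np.abs(shap_row).sum()
    return shap_row + weights * expected_value

shap_row = np.array([0.4, -0.1, 0.2])   # illustrative SHAP values for one row
expected_value = 0.5                    # illustrative base value
contrib = contributions_without_intercept(shap_row, expected_value)
assert np.isclose(contrib.sum(), shap_row.sum() + expected_value)
```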
5
votes
1 answer

How is the "base value" of SHAP values calculated?

I'm trying to understand how the base value is calculated, so I used an example from SHAP's GitHub notebook, Census income classification with LightGBM. Right after I trained the LightGBM model, I applied explainer.shap_values() on each row of the…
David293836
  • 197
  • 1
  • 6
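
A minimal sketch of the usual answer (synthetic data assumed): for tree models the base value is approximately the mean raw-score prediction over the background data the explainer was built from.

```python
import numpy as np
import shap
import lightgbm as lgb
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, random_state=0)
model = lgb.LGBMClassifier().fit(X, y)
explainer = shap.TreeExplainer(model)

base = explainer.expected_value
if isinstance(base, (list, np.ndarray)):   # some versions return one per class
    base = base[-1]

# TreeExplainer explains the raw margin (log-odds) output, so compare there.
print(base, model.predict(X, raw_score=True).mean())
```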
5
votes
1 answer

Explanation of how DeepExplainer works to obtain SHAP values in simple terms

I have been using DeepExplainer (DE) to obtain the approximate SHAP values for my MLP model. I am following the SHAP Python library. Now I'd like to learn more about the logic behind DE. From the relevant paper it is not clear to me how SHAP values are…
mlee_jordan
  • 153
  • 1
  • 8
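
A minimal usage sketch (a tiny illustrative Keras MLP; depending on the shap and TensorFlow versions, compatibility shims may be needed): DeepExplainer combines DeepLIFT-style backpropagation with Shapley values, attributing the difference between a prediction and the expectation over a background sample.

```python
import numpy as np
import shap
import tensorflow as tf

# Tiny MLP standing in for the question's model; the architecture is illustrative.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(10,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

X = np.random.rand(200, 10).astype("float32")
background = X[:50]                    # expectations are taken over this sample
explainer = shap.DeepExplainer(model, background)
shap_values = explainer.shap_values(X[:5])   # attributions vs. the background
```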
5
votes
1 answer

Is it valid to compare SHAP values across models?

Let's say I have three models: a random forest with 100 trees, a random forest with 1000 trees, and an XGBoost model. I can rank the importance of my features on my dataset for each model using SHAP, and compare relative importance across models. What…
DKL
  • 78
  • 5
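
A hedged sketch of one common workaround (synthetic regression data assumed): normalize each model's mean |SHAP| vector so it sums to 1, then compare relative rankings; whether the absolute magnitudes are comparable across models is exactly the open question here.

```python
import numpy as np
import shap
import xgboost as xgb
from sklearn.ensemble import RandomForestRegressor
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=500, n_features=8, random_state=0)
models = {
    "rf_100": RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y),
    "rf_1000": RandomForestRegressor(n_estimators=1000, random_state=0).fit(X, y),
    "xgb": xgb.XGBRegressor().fit(X, y),
}
for name, m in models.items():
    sv = shap.TreeExplainer(m).shap_values(X)
    imp = np.abs(sv).mean(axis=0)
    # Normalized importances are comparable as rankings within each model.
    print(name, np.argsort(-imp), np.round(imp / imp.sum(), 3))
```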
3
votes
0 answers

Explain FastText model using SHAP values

I have trained a fastText model and a fully connected network built on its embeddings. I figured out how to use LIME on it: a complete example can be found in Natural Language Processing Is Fun Part 3: Explaining Model Predictions. The idea is clear -…
Mikhail_Sam
  • 131
  • 4
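
A hypothetical sketch (the prediction function and dimensions are illustrative stand-ins): when the downstream network consumes fixed-size embedding vectors, the model-agnostic KernelExplainer can attribute a prediction to individual embedding dimensions.

```python
import numpy as np
import shap

def predict_fn(emb_batch):
    """Stand-in for the dense network on top of fastText embeddings."""
    return emb_batch @ np.linspace(-1.0, 1.0, emb_batch.shape[1])

background = np.random.RandomState(0).rand(50, 20)   # sampled embedding vectors
explainer = shap.KernelExplainer(predict_fn, background)

# Attribute three (random, illustrative) embeddings to their 20 dimensions.
shap_values = explainer.shap_values(np.random.rand(3, 20), nsamples=200)
print(np.array(shap_values).shape)
```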
3
votes
1 answer

Is multicollinearity a problem when interpreting SHAP values from an XGBoost model?

I'm using an XGBoost model for multi-class classification and am looking at feature importance using SHAP values. I'm curious whether multicollinearity is a problem for the interpretation of the SHAP values. As far as I know, XGB is not affected by…
hideonbush
  • 31
  • 1
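
A small hedged demo of the usual caveat (synthetic data, illustrative hyperparameters): with a duplicated, perfectly collinear feature the model's predictions are unaffected, but SHAP credit is typically split between the copies, changing each copy's apparent importance.

```python
import numpy as np
import shap
import xgboost as xgb
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, n_features=5, n_informative=5,
                           n_redundant=0, random_state=0)
X_dup = np.hstack([X, X[:, [0]]])        # exact copy of feature 0
model = xgb.XGBClassifier(colsample_bytree=0.5, random_state=0).fit(X_dup, y)

sv = shap.TreeExplainer(model).shap_values(X_dup)
# With column subsampling, trees use both copies, so columns 0 and 5
# typically end up sharing the credit that one feature would get alone.
print(np.round(np.abs(sv).mean(axis=0), 3))
```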
2
votes
2 answers

Difference between feature effect and feature importance

Is there a difference between feature effect (e.g. SHAP effect) and feature importance in machine learning terminology?
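
A minimal sketch of the distinction as it appears in shap's own plots (synthetic data assumed): the beeswarm summary plot shows per-sample, signed feature effects, while the bar variant collapses them into one unsigned importance per feature.

```python
import shap
import xgboost as xgb
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=300, n_features=6, random_state=0)
model = xgb.XGBRegressor().fit(X, y)
sv = shap.TreeExplainer(model).shap_values(X)

shap.summary_plot(sv, X)                    # effects: sign and magnitude per sample
shap.summary_plot(sv, X, plot_type="bar")   # importance: mean |SHAP| per feature
```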
2
votes
1 answer

How can I interpret shap.summary_plot and its gray color concerning outliers/anomalies?

Inspired by this notebook, I'm experimenting with the IsolationForest algorithm using scikit-learn==0.22.2.post1 for anomaly detection on the SF version of the KDDCUP99 dataset, which includes 4 attributes. The data is fetched directly from sklearn and…
Mario
  • 335
  • 5
  • 18
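
A hedged sketch of the usual explanation (synthetic data with injected NaNs; the question's KDDCUP99 setup is not reproduced): the red-blue scale in shap.summary_plot encodes the feature's numeric value, and dots are drawn gray when a value cannot be mapped to that scale, e.g. NaN or non-numeric columns.

```python
import numpy as np
import shap
from sklearn.ensemble import IsolationForest

X = np.random.RandomState(0).rand(300, 4)
X_display = X.copy()
X_display[::25, 2] = np.nan               # inject missing values for coloring

model = IsolationForest(random_state=0).fit(X)
sv = shap.TreeExplainer(model).shap_values(X)

# Coloring uses X_display; the NaN cells cannot be mapped to the red-blue
# scale and are drawn gray.
shap.summary_plot(sv, X_display)
```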
2
votes
0 answers

Shapley summary plot interpretation question

I have a question about interpreting a SHAP summary plot; I have attached a sample plot. If I am interpreting it correctly, low values of feature 1 are associated with large negative values for the dependent variable. However, feature 1 takes…
2
votes
0 answers

How do SHAP values explain the contribution of features for outlier events?

I'm trying to understand and experiment with how SHAP values can explain the behaviour of each outlier event (row) and how this relates to shap.force_plot(). I have already created a simple synthetic dataset with 7 outliers. I didn't get how 4.85…
Mario
  • 335
  • 5
  • 18
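
A minimal sketch of inspecting a single outlier row (synthetic data assumed): the force plot shows how each feature pushes that row's prediction away from the base value, explainer.expected_value.

```python
import numpy as np
import shap
from sklearn.ensemble import IsolationForest

X = np.random.RandomState(0).rand(200, 4)
model = IsolationForest(random_state=0).fit(X)
explainer = shap.TreeExplainer(model)
sv = explainer.shap_values(X)

i = int(np.argmin(model.decision_function(X)))   # most anomalous row
shap.force_plot(explainer.expected_value, sv[i], X[i], matplotlib=True)
```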
2
votes
1 answer

Shapley contribution when coalition is 0

I am exploring Shapley values for channel attribution based on [here][1]. Consider C1, C2, C3, C4 as the 4 channels in question. Some of the coalitions have no value, for example: (C1, C2) -> 20, (C1, C3, C4) -> 10, (C1, C2, C3, C4) -> 0. The reason being there…
Kenny
  • 121
  • 1
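
A worked sketch of exact Shapley values for a 4-player game like this one (the value table below is the question's example; unlisted coalitions are assumed worth 0): marginal contributions are averaged over all orderings, so a grand coalition worth 0 forces the attributions to sum to 0.

```python
from itertools import permutations
from math import factorial

players = ["C1", "C2", "C3", "C4"]
v = {frozenset(["C1", "C2"]): 20,
     frozenset(["C1", "C3", "C4"]): 10,
     frozenset(players): 0}                 # grand coalition explicitly worth 0

def value(coalition):
    return v.get(frozenset(coalition), 0)   # unlisted coalitions default to 0

shapley = {p: 0.0 for p in players}
for order in permutations(players):
    seen = set()
    for p in order:
        shapley[p] += value(seen | {p}) - value(seen)
        seen.add(p)

shapley = {p: s / factorial(len(players)) for p, s in shapley.items()}
print(shapley)   # sums to value(grand coalition), i.e. 0 here
```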
2
votes
0 answers

Aggregate SHAP importances from different models

A couple of questions on the SHAP approach to estimating feature importance. I would like to use random forest, logistic regression, SVM, and kNN to train four classification models on a dataset. Parameters in each training are chosen to…
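
A hedged sketch of one possible recipe (an assumption, not an established standard): normalize each model's mean |SHAP| vector to sum to 1 before averaging, so no single model's output scale dominates the aggregate.

```python
import numpy as np

def aggregate_importances(per_model_mean_abs_shap):
    """Average per-model mean |SHAP| vectors after per-model normalization."""
    normalized = [imp / imp.sum() for imp in per_model_mean_abs_shap]
    return np.mean(normalized, axis=0)

# Illustrative mean |SHAP| vectors for the four models (made-up numbers).
rf  = np.array([0.30, 0.10, 0.60])
lr  = np.array([0.05, 0.03, 0.02])
svm = np.array([0.40, 0.40, 0.20])
knn = np.array([0.10, 0.10, 0.10])
print(aggregate_importances([rf, lr, svm, knn]))
```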
2
votes
1 answer

Getting the positive impacting features using SHAP

I'm attempting to use SHAP to automatically extract feature names that have a positive impact on my regression models. On inspection of the code I see that the bar plot, for example, determines these by taking the mean absolute SHAP values for a…
amateurjustin
  • 75
  • 1
  • 8
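
A minimal sketch of the signed alternative (synthetic regression data assumed): instead of the bar plot's mean absolute SHAP, keep the sign and select features whose mean SHAP value is positive, i.e. those that push the model output up on average.

```python
import numpy as np
import shap
import xgboost as xgb
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=400, n_features=8, random_state=0)
model = xgb.XGBRegressor().fit(X, y)
sv = shap.TreeExplainer(model).shap_values(X)

mean_signed = sv.mean(axis=0)                    # keep the sign, unlike the bar plot
positive_features = np.where(mean_signed > 0)[0]
print(positive_features, np.round(mean_signed[positive_features], 3))
```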