A Data Odyssey
Days ago
43,669 views

SHAP with Python (Code and Explanations)

SHAP is the most powerful Python package for understanding and debugging your machine learning models. It can be used to explain both individual predictions and trends across multiple predictions. We show how by walking through the code and explanations for the SHAP waterfall plot, force plot, mean absolute SHAP plot, beeswarm plot and dependence plots.
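The waterfall and force plots both rest on SHAP's additivity property: a prediction equals the base value (the average prediction over the background data) plus the sum of that prediction's SHAP values. As a minimal sketch of the idea, the code below computes exact Shapley values by brute force for a tiny, made-up model, filling in "missing" features from background rows. The model, the background data and the helper names are all hypothetical; the SHAP package itself uses much faster algorithms (such as TreeSHAP for tree models) instead of enumerating every coalition.

```python
from itertools import combinations
from math import factorial

# Hypothetical toy model with three features (not the model from the video).
def model(x):
    return 2.0 * x[0] + 1.0 * x[1] - 0.5 * x[2] + x[0] * x[1]

# Hypothetical background data that defines the base value.
background = [
    [0.0, 1.0, 2.0],
    [1.0, 0.0, 1.0],
    [2.0, 2.0, 0.0],
]

def value(subset, x):
    """Average prediction when the features in `subset` are fixed to x
    and the remaining features are taken from each background row."""
    total = 0.0
    for row in background:
        z = [x[j] if j in subset else row[j] for j in range(len(x))]
        total += model(z)
    return total / len(background)

def shapley_values(x):
    """Exact Shapley values by enumerating all coalitions (exponential cost)."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for s in combinations(others, size):
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                phi[i] += weight * (value(set(s) | {i}, x) - value(set(s), x))
    return phi

x = [1.0, 2.0, 3.0]
phi = shapley_values(x)
base_value = value(set(), x)  # mean prediction over the background data

print("base value:", base_value)
print("SHAP values:", phi)
print("base value + sum of SHAP values:", base_value + sum(phi))
print("model prediction:", model(x))
```

The last two printed numbers agree, which is exactly what a waterfall plot draws: it starts at the base value and adds one SHAP value per feature until it reaches the prediction.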
SHAP course: adataodyssey.com/courses/shap...
XAI course: adataodyssey.com/courses/xai-...
Newsletter signup: mailchi.mp/40909011987b/signup
*NOTE*: You will now get the XAI course for free if you sign up to the newsletter (not the SHAP course).
Read the companion article (no-paywall link):
towardsdatascience.com/introd...
SHAP for Categorical Features (no-paywall link): towardsdatascience.com/shap-f...
Medium: / conorosullyds
Twitter: / conorosullyds
Mastodon: sigmoid.social/@conorosully
Website: adataodyssey.com/

Comments: 70

@adataodyssey · 2 months ago
*NOTE*: You will now get the XAI course for free if you sign up to the newsletter (not the SHAP course).
SHAP course: adataodyssey.com/courses/shap-with-python/
XAI course: adataodyssey.com/courses/xai-with-python/
Newsletter signup: mailchi.mp/40909011987b/signup
@mohadesehkeshavarz9107
1 month ago
Why can't I get the XAI course for free? Has the offer ended?
@adataodyssey
1 month ago
@@mohadesehkeshavarz9107 If you sign up for the newsletter, you will get a coupon that gives you free access to the XAI course. If you are still having trouble, send me your email on Instagram.
@tamojitmaiti · 2 months ago
This is so clear and concise! Thank you!
@adataodyssey
2 months ago
No problem Tamojit! This is my goal. More XAI content is on the way.
@cutestbear3327 · 6 months ago
Thank you for the awesome video~ I really like the way you explain everything thoroughly and meticulously. It's really friendly to people like us who have just begun our journey into data science.
@adataodyssey
6 months ago
I'm glad you found it useful! Are there any other related concepts you are interested in learning about?
@cutestbear3327
6 months ago
@@adataodyssey Hi Conor, thanks for your kind reply. I am happy to go with whatever topic you dive into. Maybe random forest (and its hyperparameter tuning), since it is such a classic? May you have fun and enjoy continued success on YouTube~~ Cheers
@murilopalomosebilla2999 · 8 months ago
Really well explained. Thanks ^^
@adataodyssey
8 months ago
No problem! I'm glad you found it useful.
@thegerman1239 · 5 months ago
Thank you so much for this awesome video! I'm currently writing a term paper about this topic and other machine learning explainability techniques. This helped me out a lot while creating my examples! Kind regards from Germany!
@adataodyssey
5 months ago
Guten Tag! I'm glad this helped. I also have videos about the maths behind Shapley values: kzread.info/dash/bejne/h36Z15Ryp9SdlJM.htmlsi=-s-QTmLoQmSiYwFD and kzread.info/dash/bejne/lG2l08R_pce8mKw.htmlsi=uMpSUk7ue6Tzs8SQ
@thegerman1239
5 months ago
Hey, I'm done with the paper! The videos about the math really helped me as well. You're a champ!
@adataodyssey
5 months ago
@@thegerman1239 Great stuff! All the best with the result.
@yukiwang5825 · 9 months ago
Wonderful video! Thanks for this.
@adataodyssey
9 months ago
Thanks :)
@bakerb-rz6lv · 11 months ago
love you, bro.😀
@pilarangelicarodriguezcaba8199 · 3 months ago
Really easy to understand, a lot better than the official documentation for SHAP plots.
@adataodyssey
3 months ago
Thank you! This was my motivation for the content. Had to do a lot of work to understand the method fully :)
@shotclock5424 · 2 months ago
This is the best way to explain explanations 😁 I would be interested to see a video of yours with more complex models, like deep neural networks on signal data, and how we can use SHAP on them. Great work!
@adataodyssey
2 months ago
Thank you! I will keep that in mind.
@felicebugge · 10 days ago
Really useful, thank you
@adataodyssey
9 days ago
No problem Felice!
@wangchris5468 · 9 months ago
Lovely ~~~~ 👍👍👍
@adataodyssey
9 months ago
Thanks!
@melih6826 · 7 months ago
Hi Conor, you mentioned as a limitation of SHAP values that "highly correlated features are a problem when using the SHAP values technique", but in this video the heat map shows that the features are highly correlated?
@adataodyssey
7 months ago
The problem with correlated features is that they can lead to unexpected model predictions. This happens when we sample combinations of feature values that do not exist in the dataset. Some models will still produce reasonable predictions even if there are correlated features. The point is that you can still use SHAP even if you have correlated features; you just need to be aware that the results may be negatively impacted. It is important to validate the results using other methods and visualisations. For example, it's not included here, but in the course we use SHAP interaction values to find an interaction between two features. We then confirm this interaction using a scatter plot. In other words, we had a useful result even with highly correlated features. I hope that makes sense!
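To make the point about correlated features concrete: SHAP estimates a feature's contribution by combining the explained row's feature values with values taken from other rows of the background data. When two features are strongly correlated, these hybrid rows can be combinations that never occur in the real data. The sketch below uses two simulated, hypothetical features and compares how far real rows and hybrid rows sit from the relationship that links them. It only illustrates the issue; it is not a diagnostic provided by the SHAP package.

```python
import random

random.seed(0)

# Simulated data: x2 is almost a linear function of x1 (strong correlation).
n = 1000
x1 = [random.uniform(0.0, 1.0) for _ in range(n)]
x2 = [2.0 * a + random.gauss(0.0, 0.05) for a in x1]

def distance_from_relationship(a, b):
    """How far an (x1, x2) pair lies from the x2 = 2 * x1 relationship."""
    return abs(b - 2.0 * a)

# Real rows keep x1 and x2 together.
real = [distance_from_relationship(a, b) for a, b in zip(x1, x2)]

# Hybrid rows pair x1 from one row with x2 from a randomly chosen row,
# similar in spirit to how sampling features independently breaks the dependence.
hybrid = [distance_from_relationship(x1[i], x2[random.randrange(n)]) for i in range(n)]

mean_real = sum(real) / n
mean_hybrid = sum(hybrid) / n
print(f"mean distance, real rows:   {mean_real:.3f}")
print(f"mean distance, hybrid rows: {mean_hybrid:.3f}")
```

The hybrid rows land far from anything the model saw during training, which is where its predictions, and therefore the SHAP values built from them, can become unreliable.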
@markfedenia3383 · 9 months ago
I see that cuML computes Shapley values; however, it does not look like its Explainer object is compatible with shap. Do you know if there is any way to use the cuML Explainer object and model with the shap package? (By the way, excellent videos.)
@adataodyssey
9 months ago
Thanks! I'm not too familiar with cuML, but I think it should be possible. You would have to replace the SHAP values and base values in a SHAP values object with those from the cuML explainer. It's not exactly what you are looking for, but this article explains how you can manipulate the SHAP values object and then use the SHAP plots as normal: towardsdatascience.com/shap-for-categorical-features-7c63e6a554ea?sk=2eca9ff9d28d1c8bfde82f6784bdba19
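If you copy SHAP values from another library (such as cuML) into a SHAP values object, one cheap sanity check is that they still satisfy additivity: for each row, the base value plus the sum of the SHAP values should reproduce the model's prediction. The sketch below assumes you already have the values, base values and predictions as arrays; the helper name `check_additivity` and all numbers are hypothetical, and no cuML or SHAP calls are made.

```python
import numpy as np

def check_additivity(values, base_values, predictions, atol=1e-4):
    """Return True if base_values + values.sum(axis=1) matches predictions.

    values:      (n_samples, n_features) SHAP values from any explainer
    base_values: scalar or (n_samples,) expected model output
    predictions: (n_samples,) model output for the same rows
    """
    values = np.asarray(values, dtype=float)
    reconstructed = np.asarray(base_values, dtype=float) + values.sum(axis=1)
    return bool(np.allclose(reconstructed, predictions, atol=atol))

# Hypothetical numbers standing in for output copied from another library.
values = np.array([[0.5, -0.2, 0.1],
                   [-0.3, 0.4, 0.0]])
base_values = np.array([10.0, 10.0])
predictions = np.array([10.4, 10.1])

print(check_additivity(values, base_values, predictions))
```

If the check passes, the arrays are at least consistent with each other before you assign them to the `values` and `base_values` attributes of the SHAP values object and call the usual plots.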
@rafaelagd0 · 1 year ago
Great video! Could you comment on the future of SHAP? It seems the project was abandoned. The latest commit is from June 2022 and there is a pile of 1.5k issues. I couldn't find much information about it, and other packages seem to depend on it, so there may be no alternative.
@adataodyssey
1 year ago
That is a good point, Rafael! I think SHAP has a good future regardless of the package. The method is widely used in industry and is based on solid theory: Shapley values, which have been around for a long time. For now, the package works well for me. The 1.5k issues are more an indication of its popularity than of major problems with the package. Hopefully, if it does run into serious issues, updates will be made. If not, I'm sure something will take its place. As I mentioned, it is very popular, so someone is sure to take advantage of that. The code and method are all open source, so it shouldn't be too hard to replicate. I know there are already other implementations in R (see the IML package).
@apogounte8239 · 6 months ago
Hi! Interesting video! Just wanted to mention that if you just run shap.plots.waterfall(shap_values[0]), you don't get the actual feature names on the y-axis; instead you get feature 5, feature 2, etc. Is there a quick fix?
@adataodyssey
6 months ago
Yes, you should be able to fix that. You can try:
1) Make sure the feature matrix X that you pass to the explainer (i.e. shap_values = explainer(X)) is a pandas DataFrame whose column names are the correct feature names. You can check them using X.columns.
2) Update shap_values after they have been created, using something like: shap_values.feature_names = list(["feature 1","feature 2", ... ]). It is important to pass the new names as a list.
Let me know if that helps!
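Option 1 above can be sketched like this: wrap a NumPy feature matrix in a pandas DataFrame with the real column names before explaining it, so SHAP can pick the names up from the columns. The data and feature names are hypothetical, and the SHAP calls are left as comments because they need a fitted model.

```python
import numpy as np
import pandas as pd

# Hypothetical feature matrix without column names (e.g. the result of X.values).
X_array = np.array([[0.45, 0.37, 0.10],
                    [0.53, 0.42, 0.14]])
feature_names = ["length", "diameter", "height"]  # hypothetical names

# Give the matrix proper column names before passing it to the explainer.
X = pd.DataFrame(X_array, columns=feature_names)
print(X.columns.tolist())

# With a fitted model, the rest would look like this (not run here):
# explainer = shap.Explainer(model)
# shap_values = explainer(X)
# shap.plots.waterfall(shap_values[0])
#
# Option 2, renaming after the values have been computed:
# shap_values.feature_names = list(feature_names)
```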
@ooplectures3828 · 9 months ago
Please explain how I can use SHAP to determine feature importance for each class in a multiclass classification problem. I need to know which features, or feature values, contribute to the prediction of each class in a multiclass system.
@adataodyssey
9 months ago
This has been on the list for a while. I'm not sure when I'll be able to do it but hopefully soon!
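One common approach can be sketched with NumPy. For multiclass models, SHAP values typically carry an extra class dimension, and the mean absolute SHAP value of each feature within one class's slice shows which features drive the prediction of that class. The layout `(n_samples, n_features, n_classes)` is an assumption: depending on the SHAP version and explainer, the output may instead be a list with one array per class. The values below are random placeholders and the feature names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder SHAP values: 100 samples, 4 features, 3 classes.
# Assumed layout: (n_samples, n_features, n_classes).
shap_vals = rng.normal(size=(100, 4, 3))
feature_names = ["f0", "f1", "f2", "f3"]  # hypothetical names

# Mean |SHAP| per feature for each class -> shape (n_features, n_classes).
importance = np.abs(shap_vals).mean(axis=0)

for k in range(importance.shape[1]):
    order = np.argsort(importance[:, k])[::-1]
    ranked = [feature_names[i] for i in order]
    print(f"class {k}: features ranked by mean |SHAP| -> {ranked}")
```

With a SHAP values object in the same layout, slicing out one class (for example `shap_values[:, :, k]`) and passing it to the bar or beeswarm plot should give the same per-class view graphically.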
@NasirUddin-im2zb · 7 months ago
When I was running my code, I had this issue regarding SHAP: FutureWarning: In the future `np.long` will be defined as the corresponding NumPy scalar. long_ = _make_signed(np.long). I did pip install 1.20.0, 1.24.2, 1.22.2 and so on, but none of them worked. What can I do? If you can suggest something, it would be great.
@adataodyssey
7 months ago
Hi Nasir, sorry about that. I've never seen that issue before. To confirm, do you mean that you installed different versions of NumPy? This link might help: github.com/neonbjb/tortoise-tts/issues/379
They suggest trying: pip install numpy==1.20.0
@anki8136 · 7 months ago
Hey Conor, thanks for the course. I just have one doubt: how do I interpret this stacked force plot? I am having some problems with it. Can you make a video or something?
@adataodyssey
7 months ago
Hi Anki, I am sorry that the explanation was not clear. Yet, I am reluctant to make a video on the stacked force plot, because in practice I have not found it very useful. It is used to explore relationships between features and SHAP values, but you can do this using the dependence plots, which are also easier to understand. In the course, I go into a bit more detail on the stacked force plot. Did you see that section?
@anki8136
7 months ago
@@adataodyssey No, I haven't seen that video yet, but I will watch it now.
@adataodyssey
7 months ago
@@anki8136 Okay, hopefully that clears things up for you. It is in the aggregations lesson.
@user-ji3ib1rn8s · 3 months ago
I tried XGBoost on a different dataset, but it gave neither a good scatter plot nor a red line that clearly separates the observations. So which other model should one use if there are 870 features?
@adataodyssey
3 months ago
That is a lot of features! It will be very hard to get good explanations with so many. Try to reduce the number of features by removing the highly correlated ones.
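A minimal sketch of that suggestion with pandas: compute the absolute correlation matrix and drop one feature from every pair whose correlation exceeds a threshold. The threshold of 0.9, the helper name `drop_correlated` and the data are hypothetical choices; this greedy rule simply keeps the column that appears first, so deciding which feature of a pair is more meaningful is still up to you.

```python
import numpy as np
import pandas as pd

def drop_correlated(X, threshold=0.9):
    """Drop each column that is highly correlated with an earlier column."""
    corr = X.corr().abs()
    # Keep only the upper triangle so each pair of columns is checked once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return X.drop(columns=to_drop), to_drop

# Hypothetical example: "b" is almost a copy of "a", "c" is unrelated.
rng = np.random.default_rng(1)
a = rng.normal(size=200)
X = pd.DataFrame({
    "a": a,
    "b": a + rng.normal(scale=0.01, size=200),
    "c": rng.normal(size=200),
})

X_reduced, dropped = drop_correlated(X)
print("dropped:", dropped)
print("kept:", X_reduced.columns.tolist())
```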
@slimanearbaoui1237 · 1 year ago
Can this library work with LSTM models?
@adataodyssey
1 year ago
Hi Slimane :) I've never applied it to LSTM models. Applying SHAP to deep learning models can be challenging, but you may be able to apply SHAP to an LSTM model with some work. I have applied it to convolutional neural networks used for image classification and regression tasks. I've linked two articles below. I used PyTorch, but I know that SHAP also works with Keras.
towardsdatascience.com/image-classification-with-pytorch-and-shap-can-you-trust-an-automated-car-4d8d12714eea?sk=b04dcbb8a09f049f605d2110b5c8d851
towardsdatascience.com/using-shap-to-debug-a-pytorch-image-regression-model-4b562ddef30d?sk=7eb3016839186f1ba2a6f1f105f8ff64
@shamkhalmammadov4083 · 1 year ago
Can you please make another example with categorical variables?
@adataodyssey
1 year ago
Hi Shamkhal, there is a video in the course that explains categorical features :) Otherwise, you might find this article useful (no-paywall link): towardsdatascience.com/shap-for-categorical-features-7c63e6a554ea?sk=2eca9ff9d28d1c8bfde82f6784bdba19
@shamkhalmammadov4083
1 year ago
@@adataodyssey Thank you very much! I am a big fan. I loved the way you explained SHAP. I got Medium 3 days ago just to read your article. I still have a big problem with the waterfall plot: my target variable has 3 classes (0, 1, 2), and for some reason I cannot make a waterfall plot.
@adataodyssey
1 year ago
@@shamkhalmammadov4083 Okay, in this case your target variable is categorical. I assumed you meant a categorical input feature. I have only worked with binary target variables. Can you send me a link to your dataset?
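A likely cause with a three-class target is that each explanation has an extra class dimension, while a waterfall plot can only show one row for one output. Assuming the layout `(n_samples, n_features, n_classes)`, selecting one sample and one class gives the one-dimensional explanation the plot expects; with a SHAP values object the equivalent would be something like `shap.plots.waterfall(shap_values[0, :, 2])`. The sketch below only demonstrates the indexing on placeholder NumPy arrays; the shapes and base values are hypothetical.

```python
import numpy as np

# Placeholder SHAP values: 5 samples, 4 features, 3 classes (assumed layout).
values = np.arange(5 * 4 * 3, dtype=float).reshape(5, 4, 3)
base_values = np.array([0.2, 0.3, 0.5])  # one base value per class (hypothetical)

sample, cls = 0, 2
row = values[sample, :, cls]  # SHAP values of sample 0 for class 2
base = base_values[cls]       # matching base value for class 2

print(row.shape)              # one value per feature
print(base + row.sum())       # raw model output for class 2, by additivity
```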
@mulusewwondieyaltaye4937 · 1 month ago
I can't access the SHAP Python course. Could you please give me access?
@adataodyssey
1 month ago
Hi Mulusew, the SHAP course is no longer free. But you will now get free access to my XAI course if you sign up to the newsletter.
@KOTESWARARAOMAKKENAPHD · 8 months ago
I got an error in the boxplot code.
@adataodyssey
8 months ago
Sorry to hear that. Can you describe the error in more detail?
@digitama · 4 months ago
Your explanation is very interesting, but I ran into the error "Numba needs NumPy 1.20 or less", and no matter how much I downgraded NumPy and Numba, the problem still doesn't go away. Any suggestions?
@adataodyssey
4 months ago
Sorry to hear that! Did you try downgrading only the NumPy package? You could also try upgrading the Numba package instead, so that it is in line with the latest version of NumPy. Remember to restart your kernel after installing a new package if you are working in a notebook.
@digitama
4 months ago
@@adataodyssey I did downgrade Numba, but I haven't tried upgrading it. What version should I upgrade to?
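Before downgrading or upgrading anything, it helps to see exactly which versions are installed in the environment your notebook uses. A small standard-library sketch; the default package list is an assumption, so adjust it to your setup. It does not know which NumPy and Numba versions are compatible; Numba's installation documentation lists the supported NumPy range for each release.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_versions(packages=("numpy", "numba", "shap", "xgboost")):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found

for name, ver in installed_versions().items():
    print(f"{name:8s} {ver or 'not installed'}")
```

This reports what is installed on disk; a running notebook kernel keeps using the versions it already imported until it is restarted.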
@bakerb-rz6lv · 11 months ago
I ran into a strange bug. I copied your code and ran it. This morning, the code worked correctly, but now it doesn't work. I did not change anything! After I run "explainer = shap.Explainer(model)", the error message is: TypeError: The passed model is not callable and cannot be analyzed directly with the given masker! Model: XGBRegressor(base_score=None, booster=None, callbacks=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, early_stopping_rounds=None, enable_categorical=False, eval_metric=None, feature_types=None, gamma=None, gpu_id=None, grow_policy=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_bin=None, max_cat_threshold=None, max_cat_to_onehot=None, max_delta_step=None, max_depth=None, max_leaves=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=None, ...)
@adataodyssey
11 months ago
Can you try running this code: explainer = shap.Explainer(model, X[0:10]), where X is the feature matrix used to train your model? For some models, you need to pass this in as a masker (background data). You can see the full example for a random forest here: github.com/conorosully/SHAP-tutorial/blob/main/src/project_1_solution.ipynb
@bakerb-rz6lv
11 months ago
@@adataodyssey It still doesn't work. Strangely, it says "AttributeError: module 'numpy' has no attribute 'bool'". I do not understand what this has to do with NumPy. All the packages I use are the newest versions.
@bakerb-rz6lv
11 months ago
@@adataodyssey And I found another difference. In your GitHub code, at step 9 (Train model), your output is XGBRegressor(base_score=0.5, booster='gbtree', callbacks=None, colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, early_stopping_rounds=None, enable_categorical=False, eval_metric=None, gamma=0, gpu_id=-1, grow_policy='depthwise', importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_bin=256, max_cat_to_onehot=4, max_delta_step=0, max_depth=6, max_leaves=0, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=0, num_parallel_tree=1, predictor='auto', random_state=0, reg_alpha=0, reg_lambda=1, ...) But my output, and the output in your video, is: XGBRegressor(base_score=None, booster=None, callbacks=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, early_stopping_rounds=None, enable_categorical=False, eval_metric=None, feature_types=None, gamma=None, gpu_id=None, grow_policy=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_bin=None, max_cat_threshold=None, max_cat_to_onehot=None, max_delta_step=None, max_depth=None, max_leaves=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=None, ...)
@adataodyssey
11 months ago
@@bakerb-rz6lv Sometimes, if you are using the newest versions, other packages have not caught up yet. It could be that SHAP relies on an older version of NumPy. See this similar issue: stackoverflow.com/questions/74893742/how-to-solve-attributeerror-module-numpy-has-no-attribute-bool#:~:text=This%20means%20you%20are%20using,while%20that%20isn't%20fixed. The important point is: "Then, in version NumPy 1.24.0, the deprecated np.bool was entirely removed. This means you are using a NumPy version that removed the deprecated ways AND the library you are using wasn't updated to match that version (uses something like np.bool instead of just bool)." You could try installing an earlier version of NumPy, but this is just a guess on my part.
@bakerb-rz6lv
11 months ago
@@adataodyssey God damn it! You are right. I installed numpy==1.22.3 and it works correctly. Maybe you can pin this comment to the top to let other beginners know.
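The AttributeError in this thread comes from `np.bool`, a deprecated alias of Python's built-in `bool`, which NumPy 1.24 removed; library code that still uses the alias fails on those versions. Pinning an older NumPy, as done here, works around it, and a newer SHAP release that no longer uses the alias is the other way out. In your own code, the spellings below work across NumPy versions. This is a small sketch; note that NumPy 2.0 brought `np.bool` back as a name for `np.bool_`, so `hasattr(np, "bool")` depends on the installed version.

```python
import numpy as np

print("NumPy version:", np.__version__)
# False on NumPy 1.24 to 1.26; True on older releases and again on NumPy 2.x.
print("np.bool available:", hasattr(np, "bool"))

# Portable alternatives that work on old and new NumPy versions:
mask_builtin = np.array([1, 0, 1]).astype(bool)     # Python's built-in bool
mask_numpy = np.array([1, 0, 1]).astype(np.bool_)   # NumPy's boolean scalar type

print(mask_builtin.dtype, mask_numpy.dtype)
```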
@noazamstein5795 · 2 months ago
What does it mean that being male increases the prediction by 0.78, AND ALSO not being an infant FURTHER increases it by 0.42? These two are obviously mutually exclusive, so I would expect one of them to carry the sum 0.78 + 0.42, or something else.
@adataodyssey
2 months ago
Your confusion is warranted, as there is no clear interpretation for this feature. In the model, there are three one-hot encoded sex features (M, F and I), which are mutually exclusive. You are right: by summing up their SHAP values, you get a clear interpretation of the contribution of the original categorical feature. Unfortunately, there is no easy way to do this with the SHAP package. We discuss this in my SHAP course. You can also find a solution in this article: towardsdatascience.com/shap-for-categorical-features-7c63e6a554ea?sk=2eca9ff9d28d1c8bfde82f6784bdba19
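The summing idea can be sketched with NumPy: add up the SHAP values of the one-hot columns that came from the same categorical feature to get one contribution for the original feature. Because SHAP values are additive, each row's total is unchanged. The helper `combine_one_hot`, the column names (`sex_M`, `sex_F`, `sex_I`) and the numbers are hypothetical; the linked article describes doing this on a SHAP values object.

```python
import numpy as np

def combine_one_hot(values, feature_names, prefix):
    """Sum the SHAP values of all columns whose name starts with `prefix`.

    Returns (new_values, new_names) where those columns are replaced by a
    single column named after the prefix (without a trailing "_").
    """
    values = np.asarray(values, dtype=float)
    group = [i for i, name in enumerate(feature_names) if name.startswith(prefix)]
    keep = [i for i in range(len(feature_names)) if i not in group]

    combined = values[:, group].sum(axis=1, keepdims=True)
    new_values = np.hstack([values[:, keep], combined])
    new_names = [feature_names[i] for i in keep] + [prefix.rstrip("_")]
    return new_values, new_names

# Hypothetical SHAP values for 2 rows; sex is one-hot encoded as M/F/I.
names = ["length", "sex_M", "sex_F", "sex_I"]
vals = np.array([[0.30, 0.50, -0.10, 0.20],
                 [-0.20, -0.05, 0.15, -0.40]])

new_vals, new_names = combine_one_hot(vals, names, "sex_")
print(new_names)
print(new_vals)
```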
@bakerb-rz6lv · 11 months ago
Hello, teacher. I used another method to train my model. Here is some of my code:
import numpy as np
import xgboost as xgb
from sklearn.model_selection import train_test_split
# Extract feature and target arrays
X, y = df.drop('Grade', axis=1), df[['Grade']]
# Extract text features
cats = X.select_dtypes(exclude=np.number).columns.tolist()
# Convert to Pandas category
for col in cats:
    X[col] = X[col].astype('category')
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)
dtrain_reg = xgb.DMatrix(X_train, y_train, enable_categorical=True)
dtest_reg = xgb.DMatrix(X_test, y_test, enable_categorical=True)
@bakerb-rz6lv
11 months ago
import shap
params = {"objective": "reg:squarederror", "tree_method": "gpu_hist"}
n = 100
model = xgb.train(
    params=params,
    dtrain=dtrain_reg,
    num_boost_round=n,
)
explainer = shap.Explainer(model)
shap_values = explainer(X)
@bakerb-rz6lv
11 months ago
And something goes wrong: TypeError: The passed model is not callable and cannot be analyzed directly with the given masker! Model: How can I fix it?
@adataodyssey
10 months ago
Sorry I missed this comment! But I think I answered your question in the other comment :)