I understand from resources like this one that the Population Stability Index (PSI) can be used to test for data drift when a machine learning model is in production. However, the resources I have looked at describe PSI in terms of a single variable. Can PSI be applied when observations include multiple variables? How?
1 Answer
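PSI is a special case of a stability index ("SI"), which measures how much the distribution of a single variable differs between two groups (here, presumably, the development data and the production data). PSI is usually computed on the model's final score (y_hat_prod), which for a linear model is essentially a weighted sum of the feature variables (Sum_Beta_Xis). Because the score combines all the features, a PSI on the score is one way to summarize drift across multiple variables with a single number.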
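For reference, the index is commonly written as below. The symbols are notation introduced here, not taken from the resources mentioned in the question: the variable is split into $B$ bins, $e_i$ is the proportion of development (expected) observations in bin $i$, and $a_i$ is the proportion of production (actual) observations in the same bin.

$$
\mathrm{PSI} = \sum_{i=1}^{B} (a_i - e_i)\,\ln\frac{a_i}{e_i}
$$

Every term is non-negative, so the index is zero only when the two binned distributions match and grows as they diverge.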
If you want to measure the difference in the distribution of each X individually, you can do that too. The result is normally called the Characteristic Stability Index (CSI) for that variable, and it is computed independently of the other Xs and of y_hat.
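A minimal sketch of both uses, assuming numeric variables and equal-frequency bins taken from the development sample. The function name `stability_index`, the feature names, the synthetic data, and the weights in `beta` are all hypothetical and only illustrate the idea; in practice the weights come from the trained model.

```python
import math
import random
from bisect import bisect_right


def stability_index(expected, actual, n_bins=10, eps=1e-6):
    """Stability index between two samples of one numeric variable."""
    # Interior bin edges: approximate quantiles of the development sample.
    ordered = sorted(expected)
    edges = [ordered[(len(ordered) * k) // n_bins] for k in range(1, n_bins)]

    def proportions(values):
        counts = [0] * n_bins
        for v in values:
            counts[bisect_right(edges, v)] += 1
        # Floor at eps so an empty bin cannot cause log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    e = proportions(expected)
    a = proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))


random.seed(0)
n = 5000
# Synthetic data (assumption): "income" drifts in production, "age" does not.
dev = {
    "age": [random.gauss(40, 10) for _ in range(n)],
    "income": [random.gauss(50, 15) for _ in range(n)],
}
prod = {
    "age": [random.gauss(40, 10) for _ in range(n)],
    "income": [random.gauss(60, 15) for _ in range(n)],
}

# CSI: one index per feature, each computed independently of the others.
csi = {name: stability_index(dev[name], prod[name]) for name in dev}

# PSI on the score: a weighted sum of the features (hypothetical weights).
beta = {"age": 0.02, "income": 0.05}


def score(data):
    return [sum(beta[k] * data[k][i] for k in beta) for i in range(n)]


psi_score = stability_index(score(dev), score(prod))

print(csi)
print(psi_score)
```

The per-feature values show which variable moved, while the index on the score shows whether that movement is large enough to shift the model output.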
skoh
Can you please comment on how the weights are determined to compute Sum_Beta_Xis? Also, are there any references you can cite? – Fijoy Vadakkumpadan Aug 15 '23 at 18:40
The weights are determined during model training; they are the model's fitted `parameters`. A good reference for the SI is https://arize.com/blog-course/population-stability-index-psi/ – skoh Aug 18 '23 at 20:49