Questions tagged [data-drift]
9 questions
8
votes
1 answer
How to Combat Data Drift
I have customer demographic data that include columns like: age, the first half of the postcode, occupation (there is a defined list of possible occupations), and more. Each month I get a new batch of 1000 rows of this type of data (which is not…
scott lucas
4
votes
3 answers
What techniques are used to analyze data drift?
I've created a model that has recently started suffering from drift.
I believe the drift is due to changes in the dataset but I don't know how to show that quantitatively.
What techniques are typically used to analyze and explain model (data)…
Connor
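One common way to quantify drift across all features at once is a "domain classifier": label reference rows 0 and recent rows 1, then check how well a model can tell them apart. The sketch below is illustrative only; the synthetic data, the `drift_auc` helper name, and the choice of logistic regression are assumptions, not something taken from the question.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score


def drift_auc(reference, current):
    """Cross-validated ROC AUC of a classifier separating reference from current rows.

    An AUC near 0.5 means the two samples are hard to tell apart;
    an AUC well above 0.5 suggests the feature distribution has changed.
    """
    X = np.vstack([reference, current])
    y = np.concatenate([np.zeros(len(reference)), np.ones(len(current))])
    clf = LogisticRegression()
    return float(cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean())


# Synthetic data (assumption): three Gaussian features, one of them shifted.
rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, size=(1000, 3))
no_drift = rng.normal(0.0, 1.0, size=(1000, 3))
drifted = rng.normal(0.0, 1.0, size=(1000, 3))
drifted[:, 0] += 1.0  # shift the mean of the first feature

auc_no_drift = drift_auc(reference, no_drift)
auc_drifted = drift_auc(reference, drifted)
print(f"AUC without drift: {auc_no_drift:.3f}, with drift: {auc_drifted:.3f}")
```

If the AUC is high, the fitted classifier's coefficients (or feature importances, for a tree model) give a first hint about which columns changed.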
2
votes
0 answers
Detecting Data Drift in Audio Data
For a given set of audio files collected from an industrial process via a microphone, I have extracted suitable features and fed them into a neural network to train a binary classifier, as depicted below.
The model has been performing quite well…
TwinPenguins
1
vote
1 answer
Do you have to use clustering with SciKit-Learn's Mutual Information metric?
I'd like to calculate the mutual information between two datasets, but I'd prefer not to cluster them first.
I'm thinking of using SciKit-Learn's mutual_info_score metric, but its documentation suggests the inputs should be clusters, not whole…
Connor
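For context, `mutual_info_score` works on any pair of discrete label arrays, or on a contingency table passed through its `contingency` argument, so no clustering algorithm is strictly needed. A minimal sketch, assuming the two "datasets" are paired 1-D continuous samples that can be discretized into histogram bins (the helper name `binned_mutual_info`, the bin count, and the synthetic data are all assumptions):

```python
import numpy as np
from sklearn.metrics import mutual_info_score


def binned_mutual_info(a, b, n_bins=10):
    """Estimate mutual information (in nats) between two paired 1-D samples.

    The samples are discretized with a 2-D histogram, and the resulting
    contingency table is handed to scikit-learn instead of cluster labels.
    """
    contingency = np.histogram2d(a, b, bins=n_bins)[0].astype(np.int64)
    return float(mutual_info_score(None, None, contingency=contingency))


# Synthetic data (assumption): y_dependent is a noisy copy of x.
rng = np.random.default_rng(0)
x = rng.normal(size=5_000)
y_dependent = x + 0.5 * rng.normal(size=5_000)
y_independent = rng.normal(size=5_000)

mi_dependent = binned_mutual_info(x, y_dependent)
mi_independent = binned_mutual_info(x, y_independent)
print(f"dependent: {mi_dependent:.3f} nats, independent: {mi_independent:.3f} nats")
```

The estimate depends on the number of bins, so it is more useful for comparing pairs of variables than as an absolute value.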
1
vote
0 answers
Very different behaviors between PSI and KS tests for data drift
I want to set up a process for detecting data drift and I am trying to decide which metric to pick. The KS test flags half of my features as drifted, although when I compare the distributions of the two datasets, I don't see much variation. PSI…
Fatima
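One plausible reason for this kind of disagreement is sample size: the KS test is a significance test, so with many rows even a tiny shift yields a small p-value, while PSI measures the size of the shift. The sketch below illustrates this on synthetic data; the `psi` helper, the decile binning, and the sample sizes are assumptions, and the 0.1 cutoff mentioned afterwards is only a commonly quoted rule of thumb.

```python
import numpy as np
from scipy.stats import ks_2samp


def psi(reference, current, n_bins=10, eps=1e-6):
    """Population Stability Index with quantile bins taken from the reference sample."""
    # Inner bin edges are reference quantiles; the outer bins are open-ended.
    inner_edges = np.quantile(reference, np.linspace(0.0, 1.0, n_bins + 1))[1:-1]
    ref_bins = np.searchsorted(inner_edges, reference, side="right")
    cur_bins = np.searchsorted(inner_edges, current, side="right")
    ref_frac = np.bincount(ref_bins, minlength=n_bins) / len(reference)
    cur_frac = np.bincount(cur_bins, minlength=n_bins) / len(current)
    # Clip so that empty bins do not produce log(0) or division by zero.
    ref_frac = np.clip(ref_frac, eps, None)
    cur_frac = np.clip(cur_frac, eps, None)
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))


# Synthetic data (assumption): a very small mean shift, but many rows.
rng = np.random.default_rng(0)
reference = rng.normal(loc=0.0, scale=1.0, size=100_000)
current = rng.normal(loc=0.05, scale=1.0, size=100_000)

ks_stat, ks_pvalue = ks_2samp(reference, current)
psi_value = psi(reference, current)
print(f"KS statistic={ks_stat:.4f}, p-value={ks_pvalue:.2e}, PSI={psi_value:.5f}")
```

Here the KS p-value is far below 0.05 while the PSI stays well under 0.1, so the two metrics answer different questions: "is there any detectable difference?" versus "how large is the difference?".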
1
vote
1 answer
How to train a pre-trained model on a new dataset?
I have trained a deep neural network on some existing data. In the meantime, I have collected and labeled more data so that I can feed it to the model to improve its performance. The question is, should I feed:
Option 1- New data to the already…
Mahdi Amrollahi
1
vote
1 answer
What are the advantages of model drift vs concept drift in online learning?
I have asked this question at the link below, but I'm also posting it here to get more insight:
https://stats.stackexchange.com/questions/602282/what-are-the-advantages-of-model-drift-vs-concept-drift-in-online-learning
Let's say I have a simple linear…
Ash
1
vote
1 answer
What is the difference between Covariate Shift, Label Shift, Concept Shift, Concept Drift, and Prior Probability Shift?
As a beginner in MLOps, I was overwhelmed by some confusing definitions.
As far as I understand, when we have a classifier or regressor with a function y = f(X):
Covariate Shift is a change in the distribution of the independent variables (X),
Label Shift…
Mohsen Mahmoodzadeh
0
votes
1 answer
Can I use Population Stability Index (PSI) when observations have multiple variables?
I understand from resources like this one that the Population Stability Index (PSI) can be used to test for data drift when a machine learning model is in production. However, the resources I have looked at describe PSI in terms of a single…
Fijoy Vadakkumpadan