Questions tagged [data-drift]

9 questions
8
votes
1 answer

How to Combat Data Drift

I have customer demographic data that include columns like: age, the first half of the postcode, occupation (there is a defined list of possible occupations), and more. Each month I get a new batch of 1000 rows of this type of data (which is not…
4
votes
3 answers

What techniques are used to analyze data drift?

I've created a model that has recently started suffering from drift. I believe the drift is due to changes in the dataset but I don't know how to show that quantitatively. What techniques are typically used to analyze and explain model (data)…
2
votes
0 answers

Detecting Data Drift in Audio Data

For a given set of audio files collected from an industrial process via a microphone, I have extracted suitable features and fed them into a neural network for training a binary classifier as depicted below. The model has been performing quite well…
1
vote
1 answer

Do you have to use clustering with SciKit-Learn's Mutual Information metric?

I'd like to calculate the mutual information between two datasets, but I'd prefer not to cluster them first. I'm thinking of using SciKit-Learn's mutual_info_score metric, but its documentation suggests the inputs should be clusters, not whole…
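A minimal sketch of what the excerpt is asking about: `mutual_info_score` works on two aligned arrays of discrete labels, so the labels do not have to come from a clustering algorithm. The example below assumes two paired continuous samples and turns them into labels by quantile binning. The synthetic data, the `to_bins` helper, and the choice of 10 bins are illustrative assumptions, not part of the question.

```python
import numpy as np
from sklearn.metrics import mutual_info_score

rng = np.random.default_rng(0)

# Synthetic paired samples (assumption): y depends on x plus noise.
x = rng.normal(size=5_000)
y = x + rng.normal(scale=0.5, size=5_000)


def to_bins(values, n_bins=10):
    """Hypothetical helper: map continuous values to integer quantile-bin labels."""
    edges = np.quantile(values, np.linspace(0, 1, n_bins + 1))[1:-1]
    return np.digitize(values, edges)


# Mutual information (in nats) between the binned samples.
mi_dependent = mutual_info_score(to_bins(x), to_bins(y))

# Shuffling y breaks the pairing, so the estimate should fall close to zero.
mi_shuffled = mutual_info_score(to_bins(x), to_bins(rng.permutation(y)))

print(f"dependent: {mi_dependent:.3f} nats, shuffled: {mi_shuffled:.3f} nats")
```

The estimate depends on the number of bins, so it is best read as a relative measure rather than an exact value.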
1
vote
0 answers

Very different behaviors between PSI and KS tests for data drift

I want to set up a process for data drift and I am trying to see which metric to pick. The KS test flags half of my features as drifted, although when I look at the distributions of the two datasets, I don't see much variation. PSI…
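To make the comparison in this excerpt concrete, here is a hedged sketch that computes both metrics on the same pair of samples. It assumes a numeric feature, synthetic normal data with a deliberately small mean shift, and a quantile-binned PSI. The `psi` function, the bin count, and the `eps` floor are illustrative choices, not a standard library API. With large samples, a KS p-value can become very small even when the shift is minor, while PSI measures the size of the shift and stays small.

```python
import numpy as np
from scipy.stats import ks_2samp


def psi(reference, current, n_bins=10, eps=1e-6):
    """Population Stability Index over quantile bins of the reference sample."""
    reference = np.asarray(reference, dtype=float)
    current = np.asarray(current, dtype=float)
    # Interior edges come from reference quantiles; the outer bins are open-ended.
    edges = np.quantile(reference, np.linspace(0, 1, n_bins + 1))[1:-1]
    ref_counts = np.bincount(np.searchsorted(edges, reference), minlength=n_bins)
    cur_counts = np.bincount(np.searchsorted(edges, current), minlength=n_bins)
    # Floor the fractions at eps so empty bins do not produce log(0).
    ref_frac = np.clip(ref_counts / ref_counts.sum(), eps, None)
    cur_frac = np.clip(cur_counts / cur_counts.sum(), eps, None)
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))


rng = np.random.default_rng(0)
reference = rng.normal(loc=0.0, scale=1.0, size=50_000)
current = rng.normal(loc=0.05, scale=1.0, size=50_000)  # small synthetic shift

ks_result = ks_2samp(reference, current)
psi_value = psi(reference, current)

print(f"KS statistic={ks_result.statistic:.4f}, p-value={ks_result.pvalue:.2e}")
print(f"PSI={psi_value:.4f}")
```

The two numbers answer different questions: the KS p-value asks whether any difference is detectable at this sample size, whereas PSI summarizes how large the difference is.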
Fatima
1
vote
1 answer

How to train a pre-trained model on a new dataset?

I have trained a deep NN model based on some existing data. In the meantime, I have collected more data and labeled it so that I can feed it to the model to improve its performance. The question is, should I feed: Option 1- New data to the already…
Mahdi Amrollahi
1
vote
1 answer

What are the advantages of model drift vs concept drift in online learning?

I have asked this question on Cross Validated, but I'm also posting it here to get better insight: https://stats.stackexchange.com/questions/602282/what-are-the-advantages-of-model-drift-vs-concept-drift-in-online-learning Let's say I have a simple linear…
Ash
1
vote
1 answer

What is the difference between Covariate Shift, Label Shift, Concept Shift, Concept Drift, and Prior Probability Shift?

As a beginner in MLOps, I was overwhelmed by some confusing definitions. As far as I understand, when we have a classifier or regressor with y = f(X) function: Covariate Shift is changing the distribution of independent variables (X), Label Shift…
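One common way to keep these terms apart is to write them against the two factorizations of the joint distribution. The block below is a sketch of frequently used definitions, assuming a training distribution and a test (production) distribution. The subscripts are notation introduced here, and usage of these terms varies between authors.

```latex
\[
p(x, y) = p(y \mid x)\, p(x) = p(x \mid y)\, p(y)
\]
\begin{align*}
\text{Covariate shift:} \quad
  & p_{\text{train}}(x) \neq p_{\text{test}}(x),
  && p_{\text{train}}(y \mid x) = p_{\text{test}}(y \mid x) \\
\text{Label shift (prior probability shift):} \quad
  & p_{\text{train}}(y) \neq p_{\text{test}}(y),
  && p_{\text{train}}(x \mid y) = p_{\text{test}}(x \mid y) \\
\text{Concept shift (concept drift):} \quad
  & p_{\text{train}}(y \mid x) \neq p_{\text{test}}(y \mid x)
\end{align*}
```

Under this reading, "drift" usually refers to the same change happening gradually over time rather than between two fixed datasets.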
0
votes
1 answer

Can I use Population Stability Index (PSI) when observations have multiple variables?

I understand from resources like this one that the Population Stability Index (PSI) can be used to test for data drift when a machine learning model is in production. However, the resources I have looked at describe PSI in terms of a single…