Highest Voted 'data-drift' Questions - Data Science Stack Exchange

8

votes

1 answer

How to Combat Data Drift

I have customer demographic data that include columns like: age, the first half of the postcode, occupation (there is a defined list of possible occupations), and more. Each month I get a new batch of 1000 rows of this type of data (which is not…

asked Sep 14 '22 at 11:16

scott lucas

83
3

4

votes

3 answers

What techniques are used to analyze data drift?

I've created a model that has recently started suffering from drift. I believe the drift is due to changes in the dataset but I don't know how to show that quantitatively. What techniques are typically used to analyze and explain model (data)…

dataset machine-learning-model data-science-model concept-drift data-drift

asked Mar 16 '23 at 23:38

Connor

597
1
15

2

votes

0 answers

Detecting Data Drift in Audio Data

For a given set of audio files collected from an industrial process via a microphone, I have extracted suitable features and fed them into a neural network to train a binary classifier, as depicted below. The model has been performing quite well…

machine-learning time-series audio-recognition concept-drift data-drift

asked Jan 12 '22 at 22:28

TwinPenguins

4,157
3
17
53

1

vote

1 answer

Do you have to use clustering with SciKit-Learn's Mutual Information metric?

I'd like to calculate the mutual information between two datasets, but I'd prefer not to cluster them first. I'm thinking of using SciKit-Learn's mutual_info_score metric, but its documentation suggests the inputs should be clusters, not whole…

scikit-learn dataset clustering mutual-information data-drift

asked Mar 20 '23 at 14:04

Connor

597
1
15

1

vote

0 answers

Very different behaviors between PSI and KS tests for data drift

I want to set up a process for detecting data drift and I am trying to decide which metric to pick. The KS test flags half of my features as drifted, although when I compare the distributions of the two datasets, I don't see much variation. PSI…

data-drift

asked Mar 18 '23 at 02:26

Fatima

71
3
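A minimal sketch of why the two metrics in the question above can disagree, using synthetic data rather than the asker's features. The KS test measures statistical significance, so with large samples it rejects even for a small shift, while PSI measures the size of the shift and stays low. Everything here is an assumption made for illustration: the helper names `ks_two_sample` and `psi`, the Gaussian samples, the 10 quantile bins, the asymptotic p-value formula, and the common rule of thumb that a PSI below 0.1 indicates no meaningful change.

```python
import bisect
import math
import random


def ks_two_sample(a, b):
    """Two-sample KS statistic with an asymptotic p-value (approximation)."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    d = 0.0
    # The largest gap between the two empirical CDFs occurs at a data point.
    for x in a + b:
        cdf_a = bisect.bisect_right(a, x) / n
        cdf_b = bisect.bisect_right(b, x) / m
        d = max(d, abs(cdf_a - cdf_b))
    # Kolmogorov limiting distribution; reasonable for large n and m only.
    lam = math.sqrt(n * m / (n + m)) * d
    p = 2.0 * sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        for k in range(1, 101)
    )
    return d, min(max(p, 0.0), 1.0)


def psi(reference, current, bins=10, eps=1e-6):
    """Population Stability Index using quantile bins of the reference sample."""
    ref = sorted(reference)
    # Interior cut points taken at reference quantiles.
    edges = [ref[int(len(ref) * i / bins)] for i in range(1, bins)]

    def shares(values):
        counts = [0] * bins
        for v in values:
            counts[bisect.bisect_right(edges, v)] += 1
        # Floor the shares so the logarithm is always defined.
        return [max(c / len(values), eps) for c in counts]

    p_ref, p_cur = shares(reference), shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(p_ref, p_cur))


rng = random.Random(0)
reference = [rng.gauss(0.0, 1.0) for _ in range(20000)]
current = [rng.gauss(0.1, 1.0) for _ in range(20000)]  # small mean shift

d, p_value = ks_two_sample(reference, current)
psi_value = psi(reference, current)
print(f"KS statistic={d:.4f}, p-value={p_value:.2e}, PSI={psi_value:.4f}")
```

With samples this large, the KS p-value falls well below 0.05 while the PSI stays under 0.1, so the two metrics answer different questions: "is there any detectable difference?" versus "is the difference large?".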

1

vote

1 answer

How to train a pre-trained model on a new dataset?

I have trained a deep neural network model on some existing data. In the meantime, I have collected more data and labeled it so that I can feed it to the model to improve its performance. The question is, should I feed: Option 1: new data to the already…

deep-learning gradient-descent data-drift

asked Feb 16 '23 at 09:59

Mahdi Amrollahi

263
2
10

1

vote

1 answer

What are the advantages of model drift vs concept drift in online learning?

I have asked this question on Cross Validated, but I'm also posting it here to get better insight: https://stats.stackexchange.com/questions/602282/what-are-the-advantages-of-model-drift-vs-concept-drift-in-online-learning Let's say I have a simple linear…

online-learning linear-models concept-drift data-drift

asked Jan 17 '23 at 20:56

Ash

130
4

1

vote

1 answer

What is the difference between Covariate Shift, Label Shift, Concept Shift, Concept Drift, and Prior Probability Shift?

As a beginner in MLOps, I was overwhelmed by some confusing definitions. As far as I understand, when we have a classifier or regressor with a function y = f(X): Covariate Shift is a change in the distribution of the independent variables (X), Label Shift…

data distribution mlops data-drift concept-drift

asked Nov 01 '22 at 11:35

Mohsen Mahmoodzadeh

43
1
5

0

votes

1 answer

Can I use Population Stability Index (PSI) when observations have multiple variables?

I understand from resources like this one that the Population Stability Index (PSI) can be used to test for data drift when a machine learning model is in production. However, the resources I have looked at describe PSI in terms of a single…

machine-learning machine-learning-model mlops deployment data-drift

asked Oct 17 '22 at 01:40

Fijoy Vadakkumpadan

113
4

Questions tagged [data-drift]