How does misrepresented, disproportionate data affect modelling?

Asked Aug 13 '21 at 09:58

Active Aug 13 '21 at 10:13

Let's say I have a dataset recording whether a pregnancy occurred on each attempt. The ground-truth success-to-failure ratio is 30:70, but the dataset I have now is 70:30. How would that be an issue when modelling?

I understand that most modern algorithms can handle disproportionate (imbalanced) data well. But will the misrepresented class ratio affect the outcome of the model?

I've tried looking for papers and articles, but they don't seem to answer my question. Links to papers and websites would be appreciated!

edited Aug 13 '21 at 10:13

asked Aug 13 '21 at 09:58

user122977

In general, yes, the outcome will be affected: statistical learning learns from the training samples, so if these are disproportionate, the learning will be disproportionate too. – Nikos M. Aug 13 '21 at 18:04
Related: https://datascience.stackexchange.com/q/99954/64377 – Erwan Aug 13 '21 at 18:12
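
As a rough illustration of the effect described in the comments: if the only difference between the dataset and reality is the class ratio (often called prior or label shift), and the model outputs reasonably calibrated probabilities, then the predicted probabilities can be rescaled from the 70:30 training ratio to the 30:70 ground truth. The sketch below is a minimal example under those assumptions; the function name `adjust_for_prior_shift` and its parameters are hypothetical and do not come from any library.

```python
def adjust_for_prior_shift(p_train, train_prior, true_prior):
    """Rescale a positive-class probability from the training class
    ratio to the true class ratio.

    Assumes only the class proportions differ between the training
    data and reality (the features of each class look the same).
    """
    if not (0.0 < train_prior < 1.0 and 0.0 < true_prior < 1.0):
        raise ValueError("priors must be strictly between 0 and 1")
    # Reweight each class by how over- or under-represented it was.
    pos = p_train * (true_prior / train_prior)
    neg = (1.0 - p_train) * ((1.0 - true_prior) / (1.0 - train_prior))
    # Renormalise so the two class probabilities sum to 1 again.
    return pos / (pos + neg)


# Model trained on 70% successes, while the real success rate is 30%.
for p in (0.3, 0.5, 0.7, 0.9):
    print(p, round(adjust_for_prior_shift(p, 0.7, 0.3), 3))
```

Under these assumptions, a prediction equal to the training base rate (0.7) maps back to the true base rate (0.3), which shows how much an uncorrected model would overestimate success. If the relationship between the features and the outcome also differs between the dataset and reality, this rescaling alone is not enough.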

0 Answers