
Let's say I have a dataset recording whether each pregnancy attempt succeeded. The true success-to-failure ratio is 30:70, but the dataset I have is 70:30. How would that be an issue when modelling?

I understand that most modern algorithms can handle imbalanced data well. But will the misrepresented class proportions affect the outcome of the model?

I've tried looking for papers and articles, but they don't seem to answer my question. Links to papers and websites would be appreciated!

user122977
  • In general, yes, the outcome will be affected: statistical learning learns from the training samples, so if their class proportions are skewed, what the model learns will be skewed too. – Nikos M. Aug 13 '21 at 18:04
  • Related: https://datascience.stackexchange.com/q/99954/64377 – Erwan Aug 13 '21 at 18:12
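
To make the first comment concrete, here is a minimal sketch of how a predicted probability could be rescaled when the training class ratio differs from the real one. It assumes that only the class proportions are off (the features of successes and failures look the same in the sample as in reality), which may not hold for this dataset. The function name `adjust_for_prior_shift` and its parameters are made up for illustration; the 0.7 and 0.3 values are the ratios from the question.

```python
def adjust_for_prior_shift(p, train_prior, true_prior):
    """Rescale a predicted positive-class probability to a new base rate.

    p           -- probability of success predicted by a model trained on
                   the skewed sample
    train_prior -- share of successes in the training data (0.7 here)
    true_prior  -- share of successes in the real population (0.3 here)

    Assumption: only the class proportions differ between the training
    sample and reality, not the feature distribution within each class.
    """
    # Reweight each class by how over- or under-represented it was.
    pos = p * (true_prior / train_prior)
    neg = (1.0 - p) * ((1.0 - true_prior) / (1.0 - train_prior))
    return pos / (pos + neg)


# A model trained on the 70:30 sample that outputs its own base rate (0.7)
# for an "average" case maps back to the real base rate after correction.
average_case = adjust_for_prior_shift(0.7, train_prior=0.7, true_prior=0.3)

# A case the skewed model calls a coin flip is less likely than that in reality.
coin_flip_case = adjust_for_prior_shift(0.5, train_prior=0.7, true_prior=0.3)

print(round(average_case, 3), round(coin_flip_case, 3))
```

Without a correction of this kind, the uncorrected probabilities would tend to overstate the chance of success, which is the effect the comment describes. Rankings of cases against each other are unchanged by this rescaling; only the probability values and any threshold applied to them move.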

0 Answers