I am working on a dataset with both categorical and numerical (continuous and discrete) features: 26 columns and 30244 rows. The target is categorical (1, 2, 3), and I am performing EDA on this dataset.

  • Categorical features encoded as numbers (e.g. gender, with values 0 and 1) are also included when I draw the correlation heatmap with seaborn. As far as I know, such a heatmap is meant for checking the correlation between continuous numerical features (correct me if I am wrong). Should I remove these features before drawing the heatmap?
  • I have another feature named "born month". Is this also a categorical feature, since it can only take values from 1 to 12? If so, do I need to remove it as well before drawing the heatmap?
  • Should I run a test such as the chi-square test on those features?
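
To make the last bullet concrete, here is a minimal sketch of what a chi-square test of independence between one categorical feature and the categorical target computes. It is written in plain Python purely for illustration: the function name `chi_square` and the toy `gender`/`target` values are made up and do not come from the dataset. It applies no continuity correction and returns no p-value; in practice a library routine such as `scipy.stats.chi2_contingency` would probably be the more convenient choice.

```python
from collections import Counter
from math import sqrt


def chi_square(xs, ys):
    """Return (chi2 statistic, degrees of freedom, Cramér's V)
    for two equally long sequences of categorical values."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    observed = Counter(zip(xs, ys))  # joint counts per (feature, target) pair
    row_totals = Counter(xs)         # counts per feature level
    col_totals = Counter(ys)         # counts per target class

    chi2 = 0.0
    for r, r_count in row_totals.items():
        for c, c_count in col_totals.items():
            # Expected count if the feature and the target were independent.
            expected = r_count * c_count / n
            chi2 += (observed[(r, c)] - expected) ** 2 / expected

    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    # Cramér's V rescales chi2 to [0, 1] so different features are comparable.
    k = min(len(row_totals), len(col_totals)) - 1
    cramers_v = sqrt(chi2 / (n * k)) if k > 0 else 0.0
    return chi2, dof, cramers_v


# Hypothetical toy data: gender (0/1) against a three-class target (1/2/3).
gender = [0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1]
target = [1, 1, 2, 3, 2, 3, 3, 2, 1, 3, 1, 2]
chi2, dof, v = chi_square(gender, target)
print(f"chi2={chi2:.3f}, dof={dof}, Cramér's V={v:.3f}")
```

The same function could be applied to "born month" if it is treated as a 12-level categorical feature, which is an assumption about how that column should be interpreted.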
Jonathan
leahnanno
  • Presumably you must consider jurisdictional issues, including the jurisdiction where the data is created, stored, and used. And of course you might then want to consider whether it's ethical to release some of that data. – Mr R Jun 03 '21 at 21:06
  • @MrR Sorry but I didn't get you. – leahnanno Jun 04 '21 at 07:57
  • Hi @leahnanno - In some places (Europe? California? and others) there are very strict privacy rules, which will affect the data in the data set. Just removing the name may not be sufficient to anonymise it. – Mr R Jun 04 '21 at 12:37
  • @MrR oh, I don't have such a problem with this dataset. This dataset is provided to us by a company for research purposes. – leahnanno Jun 04 '21 at 15:41

0 Answers