This might not be a very good question, but I'd still like to ask: is it beneficial to do exploratory data analysis (EDA) before running a clustering algorithm?

I understand that EDA helps us generate useful insights into the data, which is crucial for data understanding. But suppose we set aside the standard checks and manipulations (removing outliers, scaling, dropping constant-value columns, dropping null or all-zero columns, etc.) and we have 20-30 features. How will EDA help me produce good, sensible clusters? Is it even necessary to do EDA before clustering?

Note: I am using k-means

Akash Dubey

1 Answer

How would you know that you need cluster analysis before looking at your data?

Setting aside data quality questions (which you should never do), a bare minimum of EDA will help you:

  • Know whether a clustering analysis is relevant at all (rarely, in my opinion)
  • Know whether k-means is the best clustering tool (rarely, in my opinion)
  • Get an idea of the number of clusters
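
As a rough illustration of the last point, one common exploratory check is the "elbow" heuristic: run k-means for several values of k and look at where the within-cluster sum of squares stops dropping sharply. The sketch below is only a toy, pure-Python version under stated assumptions: the data in `points` is made up, the helper names `farthest_point_init` and `kmeans_inertia` are hypothetical, and the deterministic farthest-point initialisation is a simplification chosen for reproducibility, not what a library implementation would necessarily do.

```python
from math import dist  # Euclidean distance, Python 3.8+

# Made-up 2-D toy data: three well-separated groups of four points each.
points = [
    (0, 0), (1, 0), (0, 1), (1, 1),
    (10, 0), (11, 0), (10, 1), (11, 1),
    (0, 10), (1, 10), (0, 11), (1, 11),
]

def farthest_point_init(points, k):
    """Deterministic start: first point, then repeatedly the point farthest from all chosen centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    return centers

def kmeans_inertia(points, k, max_iter=100):
    """Run Lloyd's algorithm and return the within-cluster sum of squared distances."""
    centers = farthest_point_init(points, k)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its members
        # (an empty cluster keeps its previous center).
        new_centers = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else centers[i]
            for i, members in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return sum(min(dist(p, c) ** 2 for c in centers) for p in points)

# Inertia for k = 1..4; a sharp drop followed by a plateau suggests a candidate k.
inertias = {k: kmeans_inertia(points, k) for k in range(1, 5)}
for k, value in inertias.items():
    print(k, round(value, 2))
```

On this toy data the inertia falls steeply up to k = 3 and barely improves afterwards, which is the kind of hint EDA can give you. With 20-30 real features the curve is rarely this clean, so treat it as one input among several.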

You should then do some EDA after clustering as well, to understand what the clusters you obtained actually represent.

Edit: Basically, it will help you answer this kind of question: How do I interpret my result of clustering?
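
A minimal sketch of what that post-clustering EDA could look like: summarise each cluster by its size and per-feature mean, then compare the profiles. Everything here is an assumption for illustration: the `rows`, the `labels` (as if they came from a k-means run), the feature names, and the helper `profile_clusters` are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

def profile_clusters(rows, labels, feature_names):
    """Return {cluster_label: {"size": n, feature: mean, ...}} for quick comparison."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    profile = {}
    for label, members in sorted(grouped.items()):
        summary = {"size": len(members)}
        # zip(*members) turns the member rows into per-feature columns.
        for name, column in zip(feature_names, zip(*members)):
            summary[name] = mean(column)
        profile[label] = summary
    return profile

# Made-up data: two features per row and cluster labels assumed to come from k-means.
rows = [(1.0, 10.0), (3.0, 14.0), (20.0, 1.0)]
labels = [0, 0, 1]
feature_names = ["f1", "f2"]  # hypothetical feature names

profile = profile_clusters(rows, labels, feature_names)
for label, summary in profile.items():
    print(label, summary)
```

Clusters whose profiles differ on only one or two features, or that are tiny compared with the rest, are usually worth a closer look before you attach a meaning to them.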

Lucas Morin