
I'm using k-means to profile users based on several columns (I'm working in RStudio).

To analyze my clusters, I decided to build a radar chart, so I applied feature scaling, (x - min(x)) / diff(range(x)), to bring my values into [0, 1] (to get a reasonably good idea of my data per cluster). However, since there are multiple choices for normalization, I was wondering whether doing my analysis with another one - for instance (x - mean(x)) / sd(x) - would give me the same results (in general, at least).

Or am I completely wrong to use my scaled data, and should I use my unscaled data in my radar chart?
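For reference, here is a minimal sketch of the two normalizations mentioned above, written as R functions. The function names `min_max` and `z_score` and the sample vector are made up for illustration. Note that the parentheses around the numerator matter, because R evaluates the division before the subtraction.

```r
# Min-max scaling: maps the values of x into [0, 1].
min_max <- function(x) (x - min(x)) / diff(range(x))

# Z-score standardization: mean 0, standard deviation 1.
z_score <- function(x) (x - mean(x)) / sd(x)

x <- c(2, 4, 6, 10)   # made-up values, for illustration only

mm <- min_max(x)
zs <- z_score(x)

range(mm)   # smallest value becomes 0, largest becomes 1
mean(zs)    # approximately 0
sd(zs)      # approximately 1

# Without the parentheses, the division happens first:
# this computes x - (min(x) / diff(range(x))), which is not in [0, 1].
unscaled <- x - min(x) / diff(range(x))
```

Both transformations are affine in `x`, so within a single column they preserve the ordering and relative spacing of the values. What differs between them is how much weight each column ends up with relative to the others.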

MBB
  • Yes, it affects the result. There's no right way, and that's one problem with clustering. – Emre Nov 01 '17 at 00:07
  • I would standardize the data first before running K-Means. This [post](https://stats.stackexchange.com/questions/21222/are-mean-normalization-and-feature-scaling-needed-for-k-means-clustering) provides some illustrative examples! – Shadi Oct 31 '17 at 23:02

1 Answer


Yes. Feature scaling can completely change the clustering result.

People usually scale each attribute either to the range [0, 1] or to a standard deviation of 1.
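One way to check this on your own data is to run k-means on each version and cross-tabulate the cluster assignments. The sketch below uses made-up data; the column names `income` and `visits`, the seed, and the choice of k = 3 are all assumptions for illustration, not recommendations.

```r
set.seed(42)  # arbitrary seed, only so the run is repeatable

# Hypothetical data: two columns on very different scales,
# one of them with a skewed distribution.
n <- 200
df <- data.frame(
  income = c(rnorm(n / 2, 30000, 5000), rnorm(n / 2, 60000, 5000)),
  visits = rexp(n, rate = 1 / 5)
)

min_max <- function(x) (x - min(x)) / diff(range(x))

raw <- as.matrix(df)
mm  <- apply(raw, 2, min_max)  # each column rescaled to [0, 1]
zs  <- scale(raw)              # each column centered, then divided by its sd

k <- 3
fit_raw <- kmeans(raw, centers = k, nstart = 25)
fit_mm  <- kmeans(mm,  centers = k, nstart = 25)
fit_zs  <- kmeans(zs,  centers = k, nstart = 25)

# If the choice of scaling did not matter, every row of these tables
# would have a single non-zero cell (cluster labels may be permuted).
table(min_max = fit_mm$cluster, z_score = fit_zs$cluster)
table(raw = fit_raw$cluster, z_score = fit_zs$cluster)
```

On unscaled data the column with the largest numeric range tends to dominate the Euclidean distance, so the raw clustering is mostly driven by that one column.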

However, that is nothing but a heuristic.

In many cases, the need for scaling is nothing but a symptom of data that is inappropriate for the method. Naive scaling does not really fix this; it is just a hack that often works.

For statistically meaningful results, all axes should be scaled to reflect attribute relevance, such that a difference of 1 unit is of the same importance in each attribute.
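As a rough sketch of that idea, you could standardize first and then multiply each column by a relevance weight of your own choosing. The data, the column names and the weights below are hypothetical; picking the weights is a judgment about your domain, not something the code can decide for you.

```r
set.seed(1)  # arbitrary seed, only so the run is repeatable

# Hypothetical data with two attributes on different scales.
x <- cbind(age = rnorm(100, 40, 10), spend = rnorm(100, 500, 200))

# Hypothetical relevance weights: here a one-sd difference in spend is
# treated as twice as important as a one-sd difference in age.
w <- c(age = 1, spend = 2)

z  <- scale(x)                           # unit standard deviation per column
xw <- sweep(z, 2, w[colnames(z)], `*`)   # multiply each column by its weight

apply(xw, 2, sd)   # each column's sd now equals its weight

fit <- kmeans(xw, centers = 3, nstart = 25)
```

Because k-means works with squared Euclidean distances, a column multiplied by a weight w contributes w^2 times as much to the objective as it did before.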

Has QUIT--Anony-Mousse