
I have data obtained from a survey and I would like to perform a 3D clustering of the individuals who have answered the survey based on 3 of the questions they have answered:

  • Are you satisfied with product x? The range of responses is: strongly agree, agree, neutral, not satisfied, not satisfied at all.

  • Would you buy product x again? The range of responses is: strongly agree, agree, neutral, disagree, do not agree, not at all agree.

  • How much would you recommend product x to your family, friends... (range from 0 to 10)?

What I am not sure about is how to start. Should I first convert the answers to the first two questions to numbers (for example, strongly agree: 1)? I am also unsure which libraries would let me do this.

The result I would like to get is something like the example visualization on the following page: a 3D plot that can be rotated and that shows the clusters found: https://projector.tensorflow.org/

Thisisme

1 Answer


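Convert your categorical choices into numbers in the range 0 to 1, keeping the order of the answers (for example, strongly agree: 1, down to the most negative answer: 0). Rescale your 0-10 recommendation score to 0-1 as well. Then throw sklearn k-means at it, and use the elbow method to decide how many clusters there are.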
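A minimal sketch of those steps, under some assumptions: the answer lists are copied from the question, the levels are treated as evenly spaced (a simplification, since nothing in the survey says they are), and the survey responses here are randomly generated stand-ins, not real data. The helper names (`encode_ordinal`, `encode_survey`, `elbow_inertias`) are made up for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Answer scales as listed in the question, ordered from most to least positive.
SATISFIED = ["strongly agree", "agree", "neutral",
             "not satisfied", "not satisfied at all"]
BUY_AGAIN = ["strongly agree", "agree", "neutral",
             "disagree", "do not agree", "not at all agree"]


def encode_ordinal(answers, levels):
    """Map ordered answers to evenly spaced values: first level -> 1, last -> 0.

    Even spacing is an assumption, not something the survey guarantees.
    """
    top = len(levels) - 1
    return np.array([(top - levels.index(str(a))) / top for a in answers])


def encode_survey(satisfied, buy_again, recommend):
    """Build an (n, 3) feature matrix with every column scaled to [0, 1]."""
    return np.column_stack([
        encode_ordinal(satisfied, SATISFIED),
        encode_ordinal(buy_again, BUY_AGAIN),
        np.asarray(recommend, dtype=float) / 10.0,  # 0-10 score -> 0-1
    ])


def elbow_inertias(X, k_values, seed=0):
    """Fit k-means for each k and return the inertias for an elbow plot."""
    return [
        KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X).inertia_
        for k in k_values
    ]


# Randomly generated stand-in responses (NOT real survey data).
rng = np.random.default_rng(0)
n = 200
X = encode_survey(
    rng.choice(SATISFIED, size=n),
    rng.choice(BUY_AGAIN, size=n),
    rng.integers(0, 11, size=n),
)

# Look for the k where the inertia curve flattens out.
inertias = elbow_inertias(X, range(1, 8))

# k=3 is only a placeholder; use whatever the elbow suggests for your data.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
```

Plot `inertias` against `range(1, 8)` and look for the bend in the curve; `labels` then holds one cluster id per respondent.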

For plotting the clusters, a plotly 3D scatterplot will get you up and running very quickly, with a separate color for each cluster. I question whether this is the right technique, though: because the answers are discrete, you will have points on top of each other. You could jitter the points (add noise in each direction) or use the opacity of the color to show how many points sit at each location.
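Here is a rough sketch of the jitter-plus-opacity idea, using plotly's `graph_objects` API. The feature matrix and labels below are tiny made-up placeholders, the noise scale of 0.02 is an arbitrary choice you would tune by eye, and `jitter` and `cluster_figure` are hypothetical helper names.

```python
import numpy as np


def jitter(X, scale=0.02, seed=0):
    """Add small Gaussian noise in every direction so stacked points separate."""
    rng = np.random.default_rng(seed)
    return X + rng.normal(0.0, scale, size=X.shape)


def cluster_figure(X, labels,
                   axis_names=("satisfied", "buy again", "recommend")):
    """Build a rotatable 3D scatterplot with one trace (color) per cluster."""
    # Imported here so the jitter helper still works without plotly installed.
    import plotly.graph_objects as go

    fig = go.Figure()
    for c in np.unique(labels):
        pts = X[labels == c]
        fig.add_trace(go.Scatter3d(
            x=pts[:, 0], y=pts[:, 1], z=pts[:, 2],
            mode="markers",
            name=f"cluster {c}",
            # Partial opacity makes dense spots look darker than sparse ones.
            marker=dict(size=4, opacity=0.5),
        ))
    fig.update_layout(scene=dict(
        xaxis_title=axis_names[0],
        yaxis_title=axis_names[1],
        zaxis_title=axis_names[2],
    ))
    return fig


# Made-up placeholder data: two groups of identical (overlapping) points.
X = np.array([[1.0, 1.0, 0.9]] * 5 + [[0.0, 0.2, 0.1]] * 5)
labels = np.array([0] * 5 + [1] * 5)

X_jittered = jitter(X)
```

Calling `cluster_figure(X_jittered, labels).show()` opens the interactive plot. Only the plotted coordinates are jittered; the clustering itself should be run on the original, unjittered values.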

Michael Higgins