0

I have a dataset with a two-class label column (good and bad). I want to apply k-means to it using Python. Should I keep the label column in the data, or do I have to delete it?

Has QUIT--Anony-Mousse
lona
  • K-means clustering is done to give labels to data. You already have those, so why are you applying k-means? What is the problem statement? – bkshi Feb 09 '19 at 08:58
  • I think the OP actually meant the dataset contains a `binary feature`. – Louis T Feb 09 '19 at 09:49
  • Possible duplicate of [K-Means clustering for mixed numeric and categorical data](https://datascience.stackexchange.com/questions/22/k-means-clustering-for-mixed-numeric-and-categorical-data) – Louis T Feb 09 '19 at 09:50
  • Thanks all for your answers. Yes, my dataset consists of binary features, and I want to use clustering to test new samples. – lona Feb 11 '19 at 03:23

1 Answer

1

Delete the label column.

If you want to compare the clusters to the labels later, the labels must not be part of the data passed to k-means. Otherwise the clustering has already seen the answer, and the comparison tells you nothing.

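Also, k-means only works well on continuous variables anyway: it minimizes squared distances to cluster means, and a mean is not a meaningful summary of a categorical or binary value.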
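As a minimal sketch of the advice above, assuming pandas and scikit-learn are available: the toy data, the column names `f1`, `f2` and `label`, and the choice of the adjusted Rand index for the comparison are all illustrative assumptions, not something taken from the question. The label column is set aside before clustering and only used afterwards to evaluate the result.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: two continuous features plus a "label" column.
rng = np.random.default_rng(0)
good = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))
bad = rng.normal(loc=[5.0, 5.0], scale=0.5, size=(50, 2))
df = pd.DataFrame(np.vstack([good, bad]), columns=["f1", "f2"])
df["label"] = ["good"] * 50 + ["bad"] * 50

# Hold the labels out: k-means only ever sees the feature columns.
labels = df["label"]
X = df.drop(columns=["label"])

# Standardize the features so that no single column dominates the distances.
scaler = StandardScaler().fit(X)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
clusters = kmeans.fit_predict(scaler.transform(X))

# Only now bring the labels back, to compare them with the clusters.
ari = adjusted_rand_score(labels, clusters)
table = pd.crosstab(labels, clusters, rownames=["label"], colnames=["cluster"])
print(table)
print("adjusted Rand index:", ari)

# A new sample is assigned to the nearest centroid, using the same scaling.
new_sample = pd.DataFrame([[4.8, 5.1]], columns=["f1", "f2"])
new_cluster = kmeans.predict(scaler.transform(new_sample))[0]
print("new sample falls in cluster", new_cluster)
```

Cluster numbers are arbitrary, so cluster 0 does not have to correspond to "good"; that is why the comparison uses a cross-tabulation and a permutation-invariant score rather than a direct equality check.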

Has QUIT--Anony-Mousse