I have a dataset with a label column containing two classes (good and bad). I want to apply k-means to this dataset using Python. Should I keep the label column in the data, or do I have to delete it?
-
K-means clustering is done to give labels to data. You already have those, so why are you applying k-means? What is the problem statement? – bkshi Feb 09 '19 at 08:58
-
I think the OP actually meant the dataset contains a `binary feature`. – Louis T Feb 09 '19 at 09:49
-
Possible duplicate of [K-Means clustering for mixed numeric and categorical data](https://datascience.stackexchange.com/questions/22/k-means-clustering-for-mixed-numeric-and-categorical-data) – Louis T Feb 09 '19 at 09:50
-
Thanks all for your answers. Yes, my dataset consists of binary features, and I want to use clustering to test new samples. – lona Feb 11 '19 at 03:23
1 Answer
Delete the label column.
Assuming that you want to compare the clusters to the labels later, then the labels must not be part of the data passed to k-means.
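And k-means only works well on continuous variables anyway: it minimizes squared Euclidean distances to cluster means, and a mean is not a meaningful center for binary or categorical features.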
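A minimal sketch of what that could look like, assuming pandas and scikit-learn. The toy data and the column names `x1`, `x2`, and `label` are made up for illustration and are not from the question; the features here are continuous on purpose.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

# Hypothetical toy data: two continuous features plus a "label" column.
df = pd.DataFrame({
    "x1": [1.0, 1.2, 0.8, 1.1, 8.0, 8.3, 7.9, 8.1],
    "x2": [2.0, 1.9, 2.2, 2.1, 9.0, 9.2, 8.8, 9.1],
    "label": ["good", "good", "good", "good", "bad", "bad", "bad", "bad"],
})

# Set the labels aside and drop them from the data passed to k-means.
labels = df["label"]
features = df.drop(columns=["label"])

# Two clusters to match the two classes; random_state fixed for repeatability.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
clusters = kmeans.fit_predict(features)

# Only now compare the clusters to the held-out labels.
print(pd.crosstab(labels, clusters, rownames=["label"], colnames=["cluster"]))
ari = adjusted_rand_score(labels, clusters)
print("Adjusted Rand index:", ari)

# Assign a new, unlabeled sample to its nearest cluster center.
new_sample = pd.DataFrame({"x1": [7.5], "x2": [8.5]})
new_cluster = kmeans.predict(new_sample)[0]
print("New sample falls in cluster", new_cluster)
```

The cluster numbers are arbitrary, so cluster 0 does not necessarily correspond to "good". That is why the comparison uses a cross-tabulation and the adjusted Rand index rather than plain accuracy.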
Has QUIT--Anony-Mousse