
I'm trying to build a machine learning project to predict Pre-Eclampsia (a hypertensive condition that can develop in pregnant women from the 20th week of pregnancy).

I have access to clinical data from pregnant women that could help predict this condition. However, there is no target column recording the diagnosis (for example, a column called “Pre-Eclampsia” whose rows are true or false). Because I'm not a doctor, I don't think it would be appropriate to fill in this column myself in order to use a supervised algorithm, even though reading articles gives me a sense of the criteria used to diagnose this disease.

Therefore, I would have to opt for unsupervised learning, and I was recommended to use MeanShift. After cleaning and adjusting the dataset I started using it, but I would like to be able to identify which instances fall into each generated cluster.

[Image: MeanShift application]

As you can see, the instances form groups. I would like to know the attributes of the instances in each group, so that I could find out (perhaps with a doctor's help) which of these groups correspond to Pre-Eclampsia.

If you have new ideas and suggestions, please feel free to speak up.

Below are the data and the Python code showing how I'm applying MeanShift.

[Image: Dataset used to predict the condition]

import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth

# `data` is the cleaned and adjusted dataset
bandwidth = estimate_bandwidth(data, quantile=0.2)

ms = MeanShift(bandwidth=bandwidth, bin_seeding=True)
labels = ms.fit_predict(data)  # one cluster label per row of `data`
cluster_centers = ms.cluster_centers_

labels_unique = np.unique(labels)
n_clusters_ = len(labels_unique)

print(f'Estimated number of clusters: {n_clusters_}')

>>> Estimated number of clusters: 8
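
A minimal sketch of one way to see which instances land in each cluster, assuming the cleaned data is a pandas DataFrame. The blob data and the column names (`systolic_bp`, `diastolic_bp`, `proteinuria`) are hypothetical placeholders for illustration only, not the real dataset. The idea is that the label array has one entry per row, in the same order as the rows, so it can be attached as a new column and used for filtering or grouping.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import MeanShift, estimate_bandwidth
from sklearn.datasets import make_blobs

# Placeholder data standing in for the cleaned clinical dataset.
# The column names are hypothetical, chosen only for illustration.
X, _ = make_blobs(n_samples=300, centers=3, n_features=3,
                  cluster_std=0.6, random_state=0)
data = pd.DataFrame(X, columns=["systolic_bp", "diastolic_bp", "proteinuria"])

bandwidth = estimate_bandwidth(data, quantile=0.2, random_state=0)
ms = MeanShift(bandwidth=bandwidth, bin_seeding=True)
labels = ms.fit_predict(data)  # labels[i] is the cluster of row i

# Attach the cluster label to every row without modifying `data` itself.
clustered = data.assign(cluster=labels)

# How many instances ended up in each cluster.
cluster_sizes = clustered["cluster"].value_counts().sort_index()

# Mean of every attribute per cluster, a compact profile to review with a clinician.
cluster_profile = clustered.groupby("cluster").mean()

# All instances (with their attributes) that belong to one cluster, here cluster 0.
cluster_0 = clustered[clustered["cluster"] == 0]

print(cluster_sizes)
print(cluster_profile)
print(cluster_0.head())
```

With the real dataset, `clustered.groupby("cluster").describe()` gives a fuller per-cluster summary (min, max, quartiles) than the means alone. Since MeanShift is distance-based, attributes on very different scales may dominate the clustering unless they are standardized first.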

Thanks in advance for all your help.
