
I have a feature vector containing values of different data types. Considering all the values in that vector, I have to classify each sample as Good or Bad.

Which algorithm should I use to get just a Good or Bad output from a feature vector that mixes different data types?

The feature vectors are as follows:

[Application_Name (string), Uptime (integer), Criticality factor (float between 0 and 1), and a few integer-type features]
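Before most algorithms can use a vector like this, the string field has to be encoded as numbers and the numeric fields put on a comparable scale. The following is a minimal sketch, assuming pandas and scikit-learn are available. The column names (`application_name`, `uptime`, `criticality`, `error_count`) and the rows are hypothetical placeholders, not the asker's real data:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy rows: column names and values are made up for illustration.
df = pd.DataFrame({
    "application_name": ["billing", "auth", "billing", "search"],
    "uptime": [120, 3400, 98, 560],
    "criticality": [0.9, 0.2, 0.85, 0.5],
    "error_count": [14, 1, 12, 4],
})

# One-hot encode the string column; standardize the numeric columns so that
# a large-valued feature such as uptime does not dominate distance-based methods.
preprocess = ColumnTransformer([
    ("app", OneHotEncoder(handle_unknown="ignore"), ["application_name"]),
    ("num", StandardScaler(), ["uptime", "criticality", "error_count"]),
])

X = preprocess.fit_transform(df)
print(X.shape)  # 3 one-hot columns (3 distinct names) + 3 scaled numeric columns
```

One-hot encoding is only one option. If there are many distinct application names, dropping the column or grouping rare names may work better.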

Green Falcon
Amruth
  • Don't you have any labels for your features? – Green Falcon Feb 27 '19 at 08:52
  • @Media I updated the feature vector in the question: Application_Name (string), Uptime (integer), Criticality factor (float between 0 and 1), and a few integer-type features. – Amruth Feb 27 '19 at 08:56
  • Is your task unsupervised? – Green Falcon Feb 27 '19 at 08:58
  • @Media, I am kind of new to this; anything is fine, whichever is best. All the values in the vector are available and are uncorrelated. – Amruth Feb 27 '19 at 09:03
  • I think Media is asking whether you already have a set of data labelled 'Good' and 'Bad' to train a model on. If not, this is an unsupervised task. – Dan Carter Feb 27 '19 at 09:06
  • @DanCarter Yes, this is an unsupervised task. I was thinking of deciding the boundary based on feedback. Can the feedback value be included as a parameter in the feature vector? Actually, I don't know. How do we decide the boundary? – Amruth Feb 27 '19 at 09:12
  • How about random answering? How can you verify that what you are doing is not just random? – Has QUIT--Anony-Mousse Feb 27 '19 at 18:53

1 Answer


Based on the comments, I'll try to answer. I assume you don't have the corresponding labels. As a solution, you can start with the k-means algorithm, the easiest starting point, and set its hyperparameter k to two. This gives you two clusters, and you can evaluate the results yourself. As another approach, you can increase k and evaluate the results again. You can also use Gaussian Mixture Models, which can find elliptical clusters of different shapes and sizes and may give better results than k-means. The point is that you have to evaluate the results as an expert and label them manually; clustering only accelerates this labeling process. After that, you can employ a simple MLP to learn a discriminative model.
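The workflow above (cluster, let an expert name the clusters, then train a classifier) could look roughly like this. It is a minimal sketch using scikit-learn. It assumes the features are already numeric, so a string column such as Application_Name must be encoded first (e.g. one-hot). Synthetic `make_blobs` data stands in for the real vectors, and the mapping of cluster 0 to "Good" is a hypothetical placeholder for the expert's manual judgment:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.mixture import GaussianMixture
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the (already numeric) feature vectors; replace with real data.
X, _ = make_blobs(n_samples=300, centers=2, n_features=4, random_state=0)
X = StandardScaler().fit_transform(X)

# Step 1: k-means with k = 2, hoping the two clusters correspond to Good / Bad.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
km_labels = kmeans.fit_predict(X)

# Alternative: a Gaussian mixture with full covariances, which can fit
# elliptical clusters of different shapes and sizes.
gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0)
gmm_labels = gmm.fit_predict(X)

# Trying larger k: the silhouette score is only a rough aid; the expert still decides.
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(k, round(silhouette_score(X, labels), 3))

# Step 2 (manual): an expert inspects each cluster and names it.
# Hypothetical assumption: cluster 0 was judged "Good", cluster 1 "Bad".
cluster_to_label = {0: "Good", 1: "Bad"}
y = np.array([cluster_to_label[c] for c in km_labels])

# Step 3: train a small MLP on the expert-confirmed labels.
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=0)
clf.fit(X, y)
print(clf.predict(X[:5]))
```

In practice the expert would review and correct individual labels before step 3. Otherwise the MLP merely relearns the k-means boundary.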

Green Falcon