Does anyone have suggestions for a specific algorithm or implementation for the case where the labeled data comes from only one class and the unlabeled data can come from either class? I don't know the proportion of Class A to Class B within the unlabeled data, and my labeled data is not randomly chosen.
-
"my labeled data is not randomly chosen." Can you explain this further? – Bert Kellerman May 28 '21 at 22:54
-
@BertKellerman I mean I didn't label the data myself. I'm using a well-known source that has labels for only one class. – Deli May 28 '21 at 23:26
-
@BertKellerman You can ignore that part. I think I should use a one-class classifier, but I'm not sure it's an appropriate method for my case, where I don't know the proportion of Class A to Class B within the unlabeled data. – Deli May 28 '21 at 23:28
1 Answer
This is called PU (positive-unlabeled) learning. It works with a probabilistic classifier, provided certain assumptions about how the data was labeled are met.
If the assumptions are met, you:
- Label the positive, already labeled instances as positive.
- Label the unlabeled instances as negative.
- Train a probabilistic classifier.
Under those assumptions, the resulting classifier ranks instances in the same order as a classifier trained on a dataset with true positive/negative labels would.
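To make the procedure concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration and not part of the original answer: the synthetic one-dimensional data, the labeling probability of 0.3 (applied to every true positive regardless of its features), the hypothetical helpers `fit_logistic_1d` and `auc`, and a small logistic regression fitted by gradient descent standing in for whatever probabilistic classifier you would actually use. It trains one model on the PU labels and one on the true labels, then compares how well each ranks the instances.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic_1d(xs, targets, lr=0.5, epochs=300):
    """Fit p(target=1 | x) = sigmoid(w*x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, t in zip(xs, targets):
            err = sigmoid(w * x + b) - t
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [sc for sc, y in zip(scores, labels) if y == 1]
    neg = [sc for sc, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


random.seed(0)

# Synthetic data (assumption): true positives ~ N(2, 1), true negatives ~ N(0, 1).
y_true = [1 if random.random() < 0.4 else 0 for _ in range(1000)]
xs = [random.gauss(2.0 if y == 1 else 0.0, 1.0) for y in y_true]

# PU labels: each true positive is labeled with the same probability (0.3),
# independent of x. Everything else stays unlabeled and is treated as negative.
s = [1 if y == 1 and random.random() < 0.3 else 0 for y in y_true]

# Steps 1-3: labeled -> positive, unlabeled -> negative, train the classifier.
w_pu, b_pu = fit_logistic_1d(xs, s)
# Reference model trained on the true labels (unavailable in a real PU setting).
w_pn, b_pn = fit_logistic_1d(xs, y_true)

scores_pu = [sigmoid(w_pu * x + b_pu) for x in xs]
scores_pn = [sigmoid(w_pn * x + b_pn) for x in xs]

# Both rankings are evaluated against the true labels.
auc_pu = auc(scores_pu, y_true)
auc_pn = auc(scores_pn, y_true)
print(f"AUC with PU labels:   {auc_pu:.3f}")
print(f"AUC with true labels: {auc_pn:.3f}")
```

In this toy setup the two AUC values should come out essentially equal, because both models score instances monotonically in the single feature. The PU model's probabilities themselves are lower, since it predicts the chance of being labeled and not the chance of being positive, so they should not be read as class probabilities without further correction.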
This video covers the assumptions pretty well and the Elkan paper is pretty accessible.
Bert Kellerman