Generic strategy for object detection

Question

I have a huge collection of objects of which only a tiny fraction belong to a class of interest. The collection is initially unlabelled, but labels can be added using an expensive operation (for example, by a human).

Currently I use the simple generic machine learning strategy:

Use hand-crafted rules to select a smaller subset of objects (thus leaving out a fraction of interesting ones).
Label part of the smaller subset, and use these for training and choosing a classification algorithm and its parameters.
Classify the remaining objects in the smaller set (and also perhaps in the big set).

This has two drawbacks:

The labeller still needs to see a huge number of uninteresting objects, and therefore is able to label only a very small fraction of interesting ones.
The objects not in the smaller set are completely ignored in the learning phase, resulting in a loss of some information (the classification algorithm might not work well on this complement).

It seems that it would be better to use online learning, i.e., to select the objects shown to the labeller based on the previous labels. But then it is no longer obvious that the result of the classification algorithm retains its nice theoretical properties (i.e., statistical consistency).

Is there a general framework for active object detection which works either theoretically or practically (or both)? I could not get the complete picture from the Wikipedia article on active learning.

score 2 · Answer 1 · answered Feb 28 '17 at 12:46

The setting you are dealing with is semi-supervised: most of your data is unlabelled, and you can obtain some labelled data through manual labelling.

Active learning is one method for coping with this situation: it focuses your labelling effort on the objects whose labels are expected to be most beneficial. For a survey of these techniques, see Settles, Burr (2010), "Active Learning Literature Survey", Computer Sciences Technical Report 1648, University of Wisconsin–Madison, retrieved 2014-11-18.

Please note that even if you focus your labelling effort using active learning, the cost of labelling is still a significant constraint.
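To make the idea concrete, here is a minimal sketch of pool-based active learning with uncertainty sampling, one of the query strategies covered in the survey. Everything in it is an illustrative assumption rather than something from the question: the toy 1-D data, the tiny hand-written logistic regression, and the names `active_learning`, `train_logreg`, `predict_proba` and `oracle` (which stands in for the expensive human labeller).

```python
import math
import random


def predict_proba(w, x):
    """Logistic model P(y=1 | x); the last entry of w is the bias."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + w[-1]
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X, y, epochs=200, lr=0.5):
    """Fit logistic regression with plain batch gradient descent."""
    d = len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            for j in range(d):
                grad[j] += err * xi[j]
            grad[d] += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def active_learning(pool, oracle, seed_idx, budget):
    """Query labels one at a time until `budget` objects are labelled."""
    labelled = {i: oracle(pool[i]) for i in seed_idx}
    while len(labelled) < budget:
        idx = list(labelled)
        w = train_logreg([pool[i] for i in idx], [labelled[i] for i in idx])
        candidates = [i for i in range(len(pool)) if i not in labelled]
        if not candidates:
            break
        # Uncertainty sampling: query the object whose predicted
        # probability is closest to 0.5.
        q = min(candidates, key=lambda i: abs(predict_proba(w, pool[i]) - 0.5))
        labelled[q] = oracle(pool[q])
    idx = list(labelled)
    w = train_logreg([pool[i] for i in idx], [labelled[i] for i in idx])
    return w, labelled


if __name__ == "__main__":
    rng = random.Random(0)
    pool = [[rng.gauss(0.0, 1.0)] for _ in range(500)]  # toy 1-D objects
    oracle = lambda x: 1 if x[0] > 2.0 else 0            # rare class: x > 2
    w, labelled = active_learning(pool, oracle, seed_idx=[0, 1], budget=20)
    print(sum(labelled.values()), "positives among", len(labelled), "labels")
```

Note that the labelled set produced this way is no longer an i.i.d. sample from the pool, which is exactly the consistency concern raised in the question; this sketch does not attempt to correct for that bias.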

There are other methods of coping with the semi-supervised setting, such as co-training. The classical reference on co-training is Blum, A., Mitchell, T., "Combining labeled and unlabeled data with co-training", COLT: Proceedings of the Workshop on Computational Learning Theory, Morgan Kaufmann, 1998, p. 92-100.
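As a rough illustration of co-training, the sketch below trains one classifier per feature "view" and lets each of them add its most confident predictions on unlabelled objects to a shared labelled set. It is a simplified variant, not the exact procedure from the paper: the 1-D nearest-centroid classifier, the confidence measure, and the names `co_train`, `centroid_fit` and `centroid_predict` are all assumptions made for this example.

```python
def centroid_fit(xs, ys):
    """1-D nearest-centroid model: the mean feature value of each class."""
    means = {}
    for c in (0, 1):
        vals = [x for x, y in zip(xs, ys) if y == c]
        means[c] = sum(vals) / len(vals)
    return means


def centroid_predict(means, x):
    """Return (label, confidence); confidence is the gap between the
    distances to the two class centroids."""
    d0, d1 = abs(x - means[0]), abs(x - means[1])
    return (0 if d0 <= d1 else 1), abs(d0 - d1)


def co_train(view_a, view_b, labels, rounds=5, per_class=1):
    """view_a, view_b: two feature views of the same objects (lists of floats).
    labels: dict index -> 0/1 for the initially labelled objects;
    it must contain at least one example of each class."""
    labels = dict(labels)
    for _ in range(rounds):
        for view in (view_a, view_b):
            idx = list(labels)
            model = centroid_fit([view[i] for i in idx],
                                 [labels[i] for i in idx])
            unlabelled = [i for i in range(len(view)) if i not in labels]
            preds = {i: centroid_predict(model, view[i]) for i in unlabelled}
            # This view's classifier pseudo-labels its most confident
            # objects of each class; the other view trains on them next.
            for c in (0, 1):
                ranked = sorted((i for i in unlabelled if preds[i][0] == c),
                                key=lambda i: -preds[i][1])
                for i in ranked[:per_class]:
                    labels[i] = c
    return labels


if __name__ == "__main__":
    view_a = [0.0, 10.0, 0.5, 9.5, 1.0, 9.0]
    view_b = [-5.0, 5.0, -4.0, 4.0, -6.0, 6.0]
    print(co_train(view_a, view_b, {0: 0, 1: 1}))
```

Co-training relies on the two views being individually informative; with a very rare class of interest, the pseudo-labels for that class deserve extra scrutiny, since early mistakes are fed back into training.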
