
Context

We work on medical image segmentation. There are many potential labels for one and the same region we segment: medically defined labels such as anatomical regions, more biological labels such as tissue types, or spatial labels such as left/right. Many labels can also be further differentiated into (hierarchical) sub-labels.

Clarification

The question is about the number of classes / target labels used in a multi-label classification/segmentation. It is not about the number of samples and not about the number of input features.

Hypothesis

Are more target labels in a multi-label classification always better?

  1. Yes, even if these labels are not used for the final task, they act as additional features / more knowledge during training.

1.1. Yes, but only if the labels are good / not misleading.

1.2. Yes, even subpar labels can act as noisy labels that still support the training.

  2. No, more specialized models (including only the really relevant labels) work better and are easier to train.

A colleague and I had a disagreement on this, and I was sure that someone had already done research on it. Surprisingly, my short literature search did not bring up anything useful. Most results are about "more samples" or "more features". For more samples, my understanding is that more is generally better, but the benefit saturates at some point. For more features, the current state seems to be "it depends". I would guess that both of these results also apply to "more labels", but I would be interested in further insights.

Are you aware of any research covering this question? Any personal experience in multi-label classification/segmentation?


In response to @Erwan's comments, let's take the 2 classes vs. 100 classes example.

The point about overall performance undoubtedly being lower makes sense (random baseline example). But that is not really a metric one cares about; what we care about is the per-label performance. Our scenario is that these 2 classes are the ones that mainly matter, but we also have knowledge about the other 98 classes. The question then becomes whether we should include all that information during training (in the form of labels) or not, and how doing so affects the performance on the 2 classes. My thought process is in terms of multi-task learning (which, to my knowledge, is proven to work well).
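Concretely, the setup I have in mind is a shared backbone with one head for the 2 classes of interest and a down-weighted auxiliary head for the other 98 labels. Below is a minimal sketch, assuming PyTorch, a generic encoder-decoder `backbone`, per-label sigmoid/BCE as usual in multi-label segmentation, and illustrative names such as `aux_weight` (none of this comes from a specific paper):

```python
import torch
import torch.nn as nn

class MultiTaskSegModel(nn.Module):
    """Shared features, separate heads for primary and auxiliary labels."""
    def __init__(self, backbone: nn.Module, feat_channels: int,
                 n_primary: int = 2, n_auxiliary: int = 98):
        super().__init__()
        self.backbone = backbone                        # shared feature extractor
        self.primary_head = nn.Conv2d(feat_channels, n_primary, kernel_size=1)
        self.aux_head = nn.Conv2d(feat_channels, n_auxiliary, kernel_size=1)

    def forward(self, x):
        feats = self.backbone(x)                        # (B, feat_channels, H, W)
        return self.primary_head(feats), self.aux_head(feats)

def multitask_loss(primary_logits, aux_logits, primary_target, aux_target,
                   aux_weight: float = 0.3):
    """Per-label BCE on the 2 classes of interest plus a down-weighted
    term for the other labels; aux_weight = 0 recovers the specialized model."""
    bce = nn.BCEWithLogitsLoss()
    return bce(primary_logits, primary_target) + aux_weight * bce(aux_logits, aux_target)
```

Whether the auxiliary term helps or hurts the 2 classes of interest is exactly the empirical question I'm asking about.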

This could again hint towards an "it depends".

For example, take a hierarchy of labels, say classes 1, 2, and 3, where classes 2 and 3 can only be subsets of class 1's area.

  1. If we are only interested in classes 2 and 3, would including class 1 help? My assumption would be that class 1 adds context and limits for classes 2 and 3 (see the sketch after this list).
  2. If we are only interested in class 1, would including classes 2 and 3 help? Here I'm unsure. Classes 2 and 3 could give additional structure / information about the expected content of class 1, so including them could make the model more robust.
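One hedged way to encode "classes 2 and 3 can only lie inside class 1" is to predict the children conditionally on the parent, or to penalize child probability mass outside the parent. A minimal sketch, assuming PyTorch and per-pixel sigmoid probabilities (function and tensor names are illustrative):

```python
import torch

def hierarchical_probs(parent_logit, child_logits):
    """Hard coupling: p(child) = p(child | parent) * p(parent), so the
    children can never exceed the parent's probability at any pixel."""
    p_parent = torch.sigmoid(parent_logit)               # class 1, shape (B, 1, H, W)
    p_child_given_parent = torch.sigmoid(child_logits)   # classes 2 and 3, (B, 2, H, W)
    return p_parent, p_parent * p_child_given_parent

def hierarchy_penalty(p_parent, p_children):
    """Soft alternative: extra loss term whenever a child probability
    exceeds its parent's, leaving the heads otherwise independent."""
    return torch.clamp(p_children - p_parent, min=0).mean()
```

Both variants inject the hierarchy as extra structure; whether either one improves the individual classes is again an empirical question.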

(disclaimer: I'm biased towards "more labels" being better)

Spenhouet
  • Welcome to DataScienceSE. I'm not an expert in image classification, but in general the answer is no: more labels make the problem harder for the model because it must find even more subtle distinguishing patterns between the labels, and sometimes these patterns don't exist, which causes more prediction errors. Keep in mind that a random baseline achieves 50% accuracy with 2 classes but only 1% accuracy with 100 classes. Of course this all depends on the data, but generally I'd suggest starting with a small number of classes and adding more later if this works well. – Erwan Jun 01 '22 at 08:53
  • @Erwan Thanks for your input. I have added your example and my assumptions about it to my question above. – Spenhouet Jun 01 '22 at 10:30
  • What matters is whether there is enough information in the features to determine the label. This is why more instances tend to improve performance: the data is potentially a more representative sample, allowing the model to capture patterns more reliably. This does not extend to labels, because the labels are not input information the model can use; they are the output. Assuming there is enough information in the features in general, more labels mean that the model needs more instances, because it needs to find more patterns to distinguish the different classes. – Erwan Jun 01 '22 at 16:14
  • The model needs a representative sample for every class in the training data, otherwise there's a high risk of overfitting. – Erwan Jun 01 '22 at 16:21
  • In the data we work with, all samples contain all classes. "the labels are not an input information the model can use" -> but is that true? They are a training-time input, right? It is additional information provided at training time. – Spenhouet Jun 01 '22 at 17:38
  • Oh sorry, I totally forgot that this is multi-label classification; my previous comments are actually wrong. OK, so multi-label is equivalent to training an independent binary classification model for every class. So in this case the classes don't have any influence on each other, as opposed to multi-class classification. This implies that the number of classes simply doesn't matter, but of course each class/model still needs a representative sample. About the class as input information in the training: of course, but the job of the model is to represent the relation features -> class. ... – Erwan Jun 01 '22 at 17:57
  • ... So the information that the model can use for this is only in the features; it doesn't have any choice about the class, it "uses" it only to the extent that it predicts yes or no for a particular class. For example, if one wants to guess a person's age, they will use any info available (physical features, medical history, ...), but the age itself is the target; it's not usable information. – Erwan Jun 01 '22 at 18:00
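To make the point from the comments concrete: with per-label sigmoid outputs, the training objective decomposes into one independent binary decision per class, and the classes only interact through the shared features. A minimal sketch, assuming PyTorch and illustrative shapes:

```python
import torch
import torch.nn as nn

batch, n_classes, h, w = 4, 100, 64, 64
logits = torch.randn(batch, n_classes, h, w)                     # raw per-class scores
targets = torch.randint(0, 2, (batch, n_classes, h, w)).float()  # multi-label masks

# One sigmoid + binary cross-entropy per class and pixel: each class is scored
# on its own, so adding or removing classes does not change the objective of
# the remaining ones (only the shared features couple them).
loss = nn.BCEWithLogitsLoss()(logits, targets)
```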

0 Answers