
I’m trying to make a model which will classify text into about 500 different classes. I think I have to customize the architecture of the Pooling Classifier, which currently looks like this:

(1): PoolingLinearClassifier(
  (layers): Sequential(
    (0): BatchNorm1d(1200, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (1): Dropout(p=0.2, inplace=False)
    (2): Linear(in_features=1200, out_features=50, bias=True)
    (3): ReLU(inplace=True)
    (4): BatchNorm1d(50, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (5): Dropout(p=0.1, inplace=False)
    (6): Linear(in_features=50, out_features=498, bias=True)
  )
)

I think I have to change the (2) Linear layer to have more out_features, because in the last (6) Linear layer I predict more out_features than I have in_features. What do you think?
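Rebuilding the printed head in plain PyTorch, the change I have in mind would look something like this (the width 400 is just a placeholder, not a value from my model):

```python
import torch
import torch.nn as nn

# Rebuild the head shown in the printout above (layer indices mirror it).
head = nn.Sequential(
    nn.BatchNorm1d(1200),
    nn.Dropout(p=0.2),
    nn.Linear(1200, 50),
    nn.ReLU(inplace=True),
    nn.BatchNorm1d(50),
    nn.Dropout(p=0.1),
    nn.Linear(50, 498),
)

# The proposed change: widen the middle layer, e.g. 50 -> 400.
# The following BatchNorm1d and the final Linear must match the new width.
hidden = 400  # hypothetical width
head[2] = nn.Linear(1200, hidden)
head[4] = nn.BatchNorm1d(hidden)
head[6] = nn.Linear(hidden, 498)

x = torch.randn(8, 1200)  # a batch of 8 pooled feature vectors
print(head(x).shape)      # torch.Size([8, 498])
```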

Best regards

maliniaki

2 Answers


You created a bottleneck by reducing dimensions and then upscaling later. Bottlenecks are common in neural nets; we used them in our CNNs when working with images and audio. However, I can't say I have seen them used in linear classifiers before.

Your data is noisy by nature; text is famously hard to work with (meaning is built up over multiple sentences and phrases, and can be changed by a single word three sentences later), so a bottleneck is the last thing I would advise, especially with only one hidden layer. Make the hidden layer a lot wider than the input (let's say, 4096 and bigger) and then start slowly cutting it in half until you get to your last layer. You are doing classification, so the output layer should have a SoftMax activation function, not a linear one. The dropout could be a bit bigger; put it around 0.33 and up to 0.5 (dropouts of 0.2 and 0.1 are typical for CNNs and RNNs).
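A sketch of the wider head described above. The starting width of 4096, the halving schedule, and the exact dropout values are this answer's suggestions, not anything from the original model; I end with LogSoftmax so it pairs with `nn.NLLLoss` (drop that layer if you train with `nn.CrossEntropyLoss`, which expects raw logits):

```python
import torch
import torch.nn as nn

def block(n_in, n_out, p):
    """Linear block with dropout in the suggested 0.33-0.5 range."""
    return nn.Sequential(
        nn.BatchNorm1d(n_in),
        nn.Dropout(p),
        nn.Linear(n_in, n_out),
        nn.ReLU(inplace=True),
    )

# Start wide, then halve down toward the 498 classes (illustrative widths).
head = nn.Sequential(
    block(1200, 4096, p=0.5),
    block(4096, 2048, p=0.4),
    block(2048, 1024, p=0.33),
    nn.Linear(1024, 498),
    nn.LogSoftmax(dim=1),  # softmax output; remove if using CrossEntropyLoss
)

probs = head(torch.randn(16, 1200)).exp()  # log-probs back to probabilities
print(probs.shape)                         # torch.Size([16, 498])
```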

I hope your dataset is more than 500k examples in size; even that might not be sufficient for the task at hand, given you are working with text represented as a 1200-element array with 500 different classes.


Try pooling in between; that will reduce the size so that it's compatible. Have a look here

Noah Weber
    But it is wrong that in the last linear layer I want to predict 498 classes from only 50 in_features, isn't it? – maliniaki Mar 11 '20 at 14:33