It's a mistake to use LabelEncoder for a categorical feature; it should be used only for a categorical target variable. This is because it converts values to integers, thereby introducing an arbitrary order over the values.
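A minimal sketch of the problem (the color values are just a made-up example):

```python
from sklearn.preprocessing import LabelEncoder

colors = ["red", "green", "blue", "green"]
encoded = LabelEncoder().fit_transform(colors)
print(encoded)  # [2 1 0 1]
# The integers imply blue < green < red, an ordering that doesn't
# exist in the data, and the model will treat it as meaningful.
```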
The values which don't appear in the training set are not what causes the poor performance (you can check this). It's very likely that your model overfits: since the features have many distinct values, you would need a massive number of instances for the model to see a representative sample of every value. Real data is never like that, and it's clear from your description that some values occur too rarely (that's why some appear only in the test set).
The solution is to simplify the data so that the model doesn't rely on patterns which appear by chance in the training set:
- Replace values which appear rarely with a special value, e.g. RARE_VALUE. Try different thresholds for the minimum frequency.
- Encode the categorical features with one-hot encoding (OHE).
- Since the rare values were removed, the number of OHE features will be lower. To avoid overfitting, the ratio of instances to features should be high enough.
- In case there are still values in the test set which don't occur in the training set, replace them with the special value RARE_VALUE as well (see the sketch after this list).
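Here is a sketch of the whole procedure. It assumes pandas DataFrames named train and test with a categorical column "city", and a minimum frequency of 10; these names and the threshold are placeholders, not part of your data. It also assumes scikit-learn 1.2+ (older versions use sparse=False instead of sparse_output=False).

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

MIN_FREQ = 10  # tune this: a higher threshold means fewer OHE features

# 1. Find the values that are frequent enough in the training set.
counts = train["city"].value_counts()
frequent = set(counts[counts >= MIN_FREQ].index)

# 2. Replace rare training values with the special value.
train["city"] = train["city"].where(train["city"].isin(frequent), "RARE_VALUE")

# 3. Fit the one-hot encoding on the simplified training column.
ohe = OneHotEncoder(handle_unknown="ignore", sparse_output=False)
X_train = ohe.fit_transform(train[["city"]])

# 4. Test values that are rare *or* never seen in training also
#    become RARE_VALUE, so the encoder can handle them.
test["city"] = test["city"].where(test["city"].isin(frequent), "RARE_VALUE")
X_test = ohe.transform(test[["city"]])
```

Note that handle_unknown="ignore" is only a safety net here: after step 4, anything unseen has already been mapped to RARE_VALUE, so the encoder should never actually encounter an unknown category.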