I have trained a Random Forest model on a dataset and would like to predict outcomes on other data that were not seen during training. When I do this, I get:
ValueError: Number of features of the model must match the input. Model n_features is 12 and input n_features is 13
The problem is that the training data and the prediction set do not contain the same variables. For example, I capture the count of some feature D via dummy variables D_0, D_1, D_2, D_3, which indicate the number of occurrences of D. I might have no D_2 in my training data but a D_2 in my prediction data set.
What is the best practice in such a case? I am planning to use this estimator repeatedly on future data, and I cannot know in advance which features will be present. Should I check for inconsistencies between the two feature lists and manually correct those that do not overlap? In the above example, I would recode all occurrences of D_2 as D_3 in order to align the feature lists.
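To make the check concrete, here is a minimal sketch of what I have in mind, assuming the features live in pandas DataFrames and the dummies come from pd.get_dummies. The helper name align_features and the toy data are hypothetical, made up only for illustration. It reports which columns differ and then reindexes the new data to the training columns, filling absent columns with 0.

```python
import pandas as pd

def align_features(X_new, train_columns, fill_value=0):
    """Hypothetical helper: force X_new to have exactly the training columns.

    Columns seen only in X_new are dropped; columns seen only in training
    are added and filled with fill_value.
    """
    extra = [c for c in X_new.columns if c not in train_columns]
    missing = [c for c in train_columns if c not in X_new.columns]
    # reindex keeps the training column order, which the model also expects
    aligned = X_new.reindex(columns=train_columns, fill_value=fill_value)
    return aligned, missing, extra

# Toy data (assumption): D takes the values 0, 1, 3 in training
# but 0, 2, 3 in the data to predict on.
train = pd.get_dummies(pd.DataFrame({"D": [0, 1, 3, 1]}), columns=["D"], dtype=int)
new = pd.get_dummies(pd.DataFrame({"D": [0, 2, 3]}), columns=["D"], dtype=int)

aligned, missing, extra = align_features(new, list(train.columns))
print("missing in new data:", missing)
print("unseen in training:", extra)
print(aligned)
```

Note that this simply drops the unseen D_2 column, so a row with D = 2 ends up with all dummies set to 0. That is not the same as recoding D_2 to D_3 as described above, and I am not sure which of the two is preferable.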