This question is on an implementation aspect of scikit-learn's DecisionTreeClassifier().
How do I get the feature names ranked in descending order, from the feature_importances_ returned by the scikit-learn DecisionTreeClassifier()?
The problem is that the input features to the classifier are not the original ones: they are numerically encoded columns produced by pandas.get_dummies.
For example, I take the mushroom dataset from the UCI repository.
Features in the dataset include cap_shape, cap_surface, cap_color, odor, etc.
pandas.get_dummies encodes each of these into multiple columns, one per value of the original feature.
Say cap_shape has the values b, c, f, k, ...; after encoding, the new columns are cap_shape_b, cap_shape_c, cap_shape_f, and so on. Similar transformations happen for the other features.
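To make the encoding concrete, here is a minimal sketch on a toy DataFrame. The rows are made up for illustration and are not the actual UCI mushroom data; only the column names cap_shape and odor come from the dataset.

```python
import pandas as pd

# Toy stand-in for two mushroom columns; the values are invented for illustration.
df = pd.DataFrame({
    "cap_shape": ["b", "c", "f", "b"],
    "odor": ["a", "c", "f", "l"],
})

# Each categorical column becomes one indicator column per distinct value,
# named "<original column>_<value>" (the default prefix separator is "_").
encoded = pd.get_dummies(df)
print(list(encoded.columns))
```

Because the original names themselves contain underscores (cap_shape), the original feature cannot be recovered reliably by splitting an encoded name on "_" alone.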
After training, the classifier tells me that the top features are the following encoded columns, which come from just two original features:
cap_shape_b, cap_shape_c, cap_shape_f, odor_a, odor_c, odor_f, odor_l.
From this result returned by the classifier, I would like my function to return the original features, that is, cap_shape and odor, ranked in descending order of importance.
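For reference, the kind of function I have in mind might look like the sketch below. It is only one possible approach, under two assumptions of mine: the list of original column names is still available, and summing the importances of all dummy columns that belong to one original column is an acceptable way to score that column. The function name rank_original_features and the toy data are hypothetical.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier


def rank_original_features(importances, encoded_columns, original_columns):
    """Sum dummy-column importances per original column, sorted descending."""
    # Try longer names first so "cap_shape" is matched before a shorter
    # column name that happens to be a prefix of it.
    candidates = sorted(original_columns, key=len, reverse=True)
    totals = {name: 0.0 for name in original_columns}
    for col, imp in zip(encoded_columns, importances):
        for name in candidates:
            if col == name or col.startswith(name + "_"):
                totals[name] += float(imp)
                break
    return pd.Series(totals).sort_values(ascending=False)


# Toy data (invented): the label depends only on odor.
X_raw = pd.DataFrame({
    "cap_shape": ["b", "c", "f", "b", "c", "f"],
    "odor": ["a", "a", "f", "f", "a", "f"],
})
y = [0, 0, 1, 1, 0, 1]

X = pd.get_dummies(X_raw, dtype=float)
clf = DecisionTreeClassifier(random_state=0).fit(X, y)

ranking = rank_original_features(clf.feature_importances_, X.columns, X_raw.columns)
print(ranking)
```

The names of the original features in ranked order are then available as list(ranking.index).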