I have a question about heat maps and correlation among variables. I created this heat map to look at possible correlations between the variables and the target, and I got very small values. I would like to set a small threshold, e.g. 0.05, for selecting features. Does that make sense, or should I exclude all of them?

[Image: correlation heat map of the variables and the target]

Math
  • Do you mean you want to filter out input variables that are highly correlated with each other, or exclude input variables that have a low correlation with the target? – German C M Feb 27 '21 at 18:53
  • I have tried to select highly correlated features as follows: important_features = correlation_target[correlation_target > 0.05]. The model's performance improved, but I do not know whether this kind of selection makes sense – Math Feb 27 '21 at 19:31

1 Answer

From the information you provide, it seems you are carrying out feature selection based on the correlation between your predictor variables and the target.
This is a valid type of feature selection (see here) in the family of univariate filter methods, although it is not the only one. It is fast and intuitive, but you may want to look at other methods as well. You might also be interested in:

  • variance threshold selection (also a univariate filter method, applied per input feature): it assumes that a feature with higher variance may carry more predictive power
  • sequential backward selection (look here): it has a higher computational cost, but features are judged in subsets (not independently, as above), and it is fine if you don't have many features (as seems to be the case here)
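
As a rough illustration of the two alternatives above, here is a minimal sketch using scikit-learn. The synthetic data, the 0.01 variance cutoff, the LogisticRegression estimator and the choice of 3 features are all assumptions made for the example, not values taken from your problem.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector, VarianceThreshold
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the real data (assumption): 8 features, 3 informative.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=3, n_redundant=0, random_state=0
)
# Append a constant column so the variance filter has something to drop.
X = np.hstack([X, np.ones((X.shape[0], 1))])

# Variance threshold: drops features whose variance is below the cutoff.
# Variance depends on the scale of each feature, so the cutoff (0.01 here,
# an arbitrary choice) is only meaningful for comparably scaled features.
vt = VarianceThreshold(threshold=0.01)
X_vt = vt.fit_transform(X)

# Sequential backward selection: start from all remaining features and
# repeatedly remove the one whose removal hurts the cross-validated score least.
sfs = SequentialFeatureSelector(
    LogisticRegression(max_iter=1000),
    n_features_to_select=3,  # arbitrary target size for this sketch
    direction="backward",
    cv=5,
)
sfs.fit(X_vt, y)
selected = sfs.get_support(indices=True)  # column indices into X_vt

print("kept after variance filter:", X_vt.shape[1])
print("kept after backward selection:", selected)
```

Because the backward search refits the model for every candidate removal, its cost grows quickly with the number of features, which is why it suits small feature sets.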

There are many other strategies for feature selection (you might want to check this source).
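
Coming back to the threshold in your comment: a filter like correlation_target > 0.05 keeps only positively correlated features, so a feature with a strong negative correlation would be dropped. A minimal sketch of the difference, assuming pandas and a made-up DataFrame (the column names pos, neg, noise and target are hypothetical):

```python
import numpy as np
import pandas as pd

# Made-up data (assumption): one positively related feature, one negatively
# related feature, and one unrelated feature.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "pos": rng.normal(size=n),
    "neg": rng.normal(size=n),
    "noise": rng.normal(size=n),
})
df["target"] = 2.0 * df["pos"] - 2.0 * df["neg"] + rng.normal(size=n)

# Pearson correlation of every feature with the target.
correlation_target = df.corr()["target"].drop("target")

# Signed filter, as in the comment: misses negatively correlated features.
signed_selection = correlation_target[correlation_target > 0.05]

# Filter on the absolute value: keeps strong correlations of either sign.
abs_selection = correlation_target[correlation_target.abs() > 0.05]

print(correlation_target)
print("signed:", list(signed_selection.index))
print("absolute:", list(abs_selection.index))
```

With a cutoff as low as 0.05, a feature can pass simply because of sampling noise, so it is worth confirming on held-out data that the selected subset really improves the model.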

German C M