
Suppose a hypothetical dataset {S} has 100 input feature variables X and 10 predicted variables Y:

X1  ...  X100 | Y1  ...  Y10
 1  ...     2 |  3  ...     4
 4  ...     3 |  2  ...     1

Let's say I want to improve the accuracy of Y1. I am prepared to constrain or remove input variables in order to increase the accuracy. How would I go about finding the culprits that make Y1 more variable than needed?

E.g. I find that X49 adds the biggest swing in variance to Y1, and after constraining it, Y1 is fitted better.

How would I go about finding that it's X49?

EDIT: I'm asking for approaches to sensitivity analysis, not for deciding which variables to remove. Let's assume all 100 X variables are important, but some need to be constrained (e.g. X49).

1 Answer


There might be a smarter method, but I would simply fit a model without $X_i$ for every feature $X_i$ (and also a reference model with all the features). Compared with the reference, the model where $X_{49}$ is removed should obtain the lowest variance if $X_{49}$ is responsible for a lot of the variance.

Be careful: in general, a feature that causes a lot of variance is an important one, since if it weren't important it wouldn't have much impact on the target.
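A minimal sketch of the procedure above, on synthetic data. The setup is hypothetical: 10 features stand in for the question's 100, feature index 4 plays the role of X49 (it has a much larger coefficient, so it drives most of Y1's variance), and ordinary least squares via NumPy stands in for whatever model you actually use.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 10

# Independent unit-variance features; feature 4 is the "X49" of this toy
# example, with a coefficient of 5 while all others have coefficient 1.
X = rng.normal(size=(n, p))
beta = np.ones(p)
beta[4] = 5.0
y = X @ beta + rng.normal(size=n)


def fit_predict(A, y):
    """Ordinary least squares (no intercept; the data is centered by design)."""
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return A @ coef


# Reference model with all the features.
ref_pred_var = np.var(fit_predict(X, y))

# Leave-one-feature-out: refit without each X_i and record
# (a) the variance of the model's predictions and
# (b) the residual variance (how much worse the fit gets).
pred_var, resid_var = [], []
for i in range(p):
    Xi = np.delete(X, i, axis=1)
    yhat = fit_predict(Xi, y)
    pred_var.append(np.var(yhat))
    resid_var.append(np.var(y - yhat))

# The culprit is the feature whose removal leaves the least-variable model.
culprit = int(np.argmin(pred_var))
print("culprit feature index:", culprit)  # index 4 here, by construction
```

Note how the same loop also illustrates the caveat: removing the culprit gives the biggest jump in residual variance, because the feature that contributes the most variance is also the one the model needs most.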

Erwan