
Let's say I put the following two datasets in the best possible model (same model for both):

  • A raw dataset, with the variables just as they came from the query.
  • A feature-engineered dataset, with hundreds of created variables, which came from the same raw dataset I just mentioned.

Could the difference between the two AUCs be large? By how much?

  • Any ground-rules here, on what "raw vs feature-engineered" and "best possible model" can mean? – Ben Reiniger Jan 17 '20 at 21:58
  • Yes. Raw: the variables have missing values, no grouping variables are derived (mean by group or similar), and no summations (A+B, A-B), ratios (A/B), or similar are calculated. Feature-engineered: mean encoding, frequency encoding, impact encoding, binning into ranges, ranks, lagged variables, and new variables derived from clustering (a short sketch of two of these encodings follows below). Best model: let's say XGBoost. – Juan Esteban de la Calle Jan 17 '20 at 22:07
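
For concreteness, here is a minimal sketch of two of the encodings named in that comment (frequency encoding and mean/impact encoding). The column names city and y are hypothetical, and in practice target encoding should be computed out-of-fold to avoid leakage:

    # Minimal illustrative sketch (hypothetical column names): frequency encoding
    # and mean/impact encoding of a categorical column "city" against target "y".
    import pandas as pd

    df = pd.DataFrame({
        "city": ["A", "A", "B", "C", "B", "A"],
        "y":    [1, 0, 1, 0, 1, 1],
    })

    # Frequency encoding: replace each category by its relative frequency.
    df["city_freq"] = df["city"].map(df["city"].value_counts(normalize=True))

    # Mean (impact/target) encoding: replace each category by the mean of y within it.
    # In practice this should be computed out-of-fold to avoid target leakage.
    df["city_mean_y"] = df["city"].map(df.groupby("city")["y"].mean())

    print(df)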

2 Answers


Yes, the performance can vary a lot using feature engineering.

Example: suppose a dataset where the response variable $y$ is true if and only if $x$ is odd.

x    y
346  F
13   T
178  F
64   F
987  T
...

Most learning models will fail to identify this pattern from the raw value and will perform poorly, usually falling back to always predicting the majority class. However, simply adding a feature $x \% 2$ to the data will allow any model to perform perfectly.

Of course this is a toy example, but the point is that a single well-chosen feature can drastically change performance. Naturally, the size of the increase depends entirely on the data and on the nature of the features added.
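
As a rough illustration of that point, here is a minimal sketch (the data generation and the choice of a shallow decision tree are assumptions for illustration only): the model scores near chance on the raw integer alone, but perfectly once the parity feature $x \% 2$ is added.

    # Minimal sketch: AUC with and without an engineered parity feature.
    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(0)
    x = rng.integers(0, 1000, size=5000)
    y = x % 2  # response is 1 exactly when x is odd

    X_raw = x.reshape(-1, 1)             # raw feature only
    X_eng = np.column_stack([x, x % 2])  # raw feature plus engineered parity feature

    for name, X in [("raw", X_raw), ("engineered", X_eng)]:
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
        model = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X_tr, y_tr)
        auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
        print(f"{name}: AUC = {auc:.3f}")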

Erwan

I would say that the best possible model for the raw data would derive all the meaningful features that you would have created from the data anyway.

And I would say that the best possible model for the feature-engineered dataset would remove or ignore unnecessary features.

The best possible model would have an AUC of 1 anyway, since it makes all predictions correctly.

But even in the presence of noise, where an AUC of 1 cannot be achieved, I think the argument holds.

But the speed of learning/convergence may vary.

Pieter21