I am trying to understand how recursive feature elimination with cross-validation works (RFECV in sklearn). Let's say that we have 10 features, and we perform RFECV with min_features_to_select=3, cv=5 and scoring='roc_auc'.
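Concretely, the setup I have in mind is roughly the following (I'm using LogisticRegression and a synthetic 10-feature dataset just for illustration):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression

# Synthetic data with 10 features, just for illustration
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

estimator = LogisticRegression(max_iter=1000)
selector = RFECV(
    estimator,
    min_features_to_select=3,
    cv=5,
    scoring="roc_auc",
)
selector.fit(X, y)

print(selector.n_features_)  # number of features kept
print(selector.support_)     # boolean mask of the selected features
print(selector.ranking_)     # feature rankings (1 = selected)
```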
Let's consider the first elimination step. We fit the estimator 5 times, each time using one of the 5 folds as the test set and the remaining folds as the training set. Each fit gives us a model that we evaluate on its test set. Now, since we have 5 models, how do we select which feature to discard?
Do we take the average coefficient (or importance) of each feature across the 5 models and drop the feature with the lowest average? Does such an approach make sense?
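To make that first interpretation concrete, here is a rough sketch of what I mean by "average the per-fold coefficients and drop the weakest feature". This is just my mental model written out (assuming a coefficient-based estimator), not necessarily what RFECV actually does:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
remaining = list(range(X.shape[1]))  # indices of features still in play

# One elimination step under interpretation (1):
# fit on each training fold, then average the absolute coefficients
fold_importances = []
for train_idx, test_idx in StratifiedKFold(n_splits=5).split(X, y):
    model = LogisticRegression(max_iter=1000)
    model.fit(X[train_idx][:, remaining], y[train_idx])
    fold_importances.append(np.abs(model.coef_).ravel())

mean_importance = np.mean(fold_importances, axis=0)
weakest = remaining[int(np.argmin(mean_importance))]
remaining.remove(weakest)  # drop the feature with the lowest average importance
print("dropped feature", weakest)
```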
Or do we retrain the estimator on the whole set and drop the feature with the lowest coefficient (importance)? Does this mean that we use the average AUC over the 5 folds just to keep track of performance? That is, if at the next step the CV performance is lower than before, do we stop dropping features?
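And here is a sketch of the second interpretation: refit on the whole set to decide which feature to drop, and use the mean cross-validated AUC only to track performance as features are removed (again, this is just my reading, with LogisticRegression as a stand-in estimator):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
remaining = list(range(X.shape[1]))
history = []

while len(remaining) >= 3:
    # Mean AUC over 5 folds for the current feature subset,
    # used only to track performance in this interpretation
    auc = cross_val_score(
        LogisticRegression(max_iter=1000),
        X[:, remaining], y, cv=5, scoring="roc_auc",
    ).mean()
    history.append((len(remaining), auc))
    if len(remaining) == 3:
        break

    # Refit on the whole set and drop the lowest-|coefficient| feature
    model = LogisticRegression(max_iter=1000).fit(X[:, remaining], y)
    weakest = remaining[int(np.argmin(np.abs(model.coef_)))]
    remaining.remove(weakest)

for n_feats, auc in history:
    print(f"{n_feats} features -> mean AUC {auc:.3f}")
```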