I have some data X on which I want to do the following:
- Train two models: an SVM and a logistic regression.
- Train a stacking classifier on top of the models from (1).
- Calibrate the stacker from (2).
The stacking classifier should be trained on data the base models have not seen, i.e. we could have X_train and X_stack, where the models in (1) are trained on X_train and their predictions on X_stack are used to train (2).
Then we want to calibrate, so we need yet another dataset, X_cal. As you can see, a lot of data is now lost to the (crucial) training step, since each stage needs its own split.
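To make the splits concrete, here is a minimal sketch of the three-way split I have in mind (the 60/20/20 proportions and the variable names are just placeholders):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Dummy data standing in for X and y.
X, y = np.random.rand(1000, 5), np.random.randint(0, 2, 1000)

# First carve off the calibration set, then split the remainder into
# base-model training data vs. stacker training data.
X_rest, X_cal, y_rest, y_cal = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_stack, y_train, y_stack = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=0
)
# Result: 60% X_train, 20% X_stack, 20% X_cal
```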
I'm thinking about using cross-validation for this, but I'm not sure how exactly it should be done. Note that my models take a fairly long time to train, so I was hoping to train the two models only once on X_train and then have a single dataset X_other_stuff for all the other "stuff", e.g. training the stacking classifier (another logistic regression) and training the calibrator.
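The "train once, reuse" idea above could look roughly like this (all names are placeholders, and note that reusing the same held-out set for both the stacker and the calibrator may leak information between those two steps):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_train, y_train = rng.random((200, 5)), rng.integers(0, 2, 200)
X_other, y_other = rng.random((100, 5)), rng.integers(0, 2, 100)

# (1) Fit each expensive base model exactly once.
svm = SVC(probability=True, random_state=0).fit(X_train, y_train)
lr = LogisticRegression().fit(X_train, y_train)

# (2) Meta-features for the stacker: the base models' predicted
# probabilities on data they were NOT trained on.
meta = np.column_stack([
    svm.predict_proba(X_other)[:, 1],
    lr.predict_proba(X_other)[:, 1],
])

# Train the stacker (another logistic regression) on the meta-features.
stacker = LogisticRegression().fit(meta, y_other)
```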
I know sklearn provides a calibrator and a stacker that can do this via CV, but as mentioned my models take quite some time to train; a 5-fold scheme (i.e. training each model at least 5 times) would simply be too time-consuming, so I'm trying to write it myself.
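For what it's worth, sklearn can reuse already-fitted estimators without the internal CV refits: CalibratedClassifierCV accepts cv="prefit", and I believe StackingClassifier also accepts cv="prefit" in recent versions (worth checking against your installed version; newer releases may prefer wrapping the model in FrozenEstimator instead). A sketch of the calibration half, with placeholder data:

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_fit, y_fit = rng.random((200, 4)), rng.integers(0, 2, 200)
X_cal, y_cal = rng.random((100, 4)), rng.integers(0, 2, 100)

model = LogisticRegression().fit(X_fit, y_fit)           # fitted once
calibrated = CalibratedClassifierCV(model, cv="prefit")  # does not refit `model`
calibrated.fit(X_cal, y_cal)                             # fits only the calibrator
probs = calibrated.predict_proba(X_cal)
```

This avoids retraining the expensive model entirely: fit() here only learns the calibration mapping on X_cal.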