The question is pretty simple.
In stacking, the predictions of the level 0 models are used as features to train a level 1 model.
However, predictions on which data? Intuitively, it makes more sense to predict on the test set and use those predictions to train the final classifier.
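To make the setup concrete, here is a minimal sketch of what I have in mind (scikit-learn, with placeholder data and models chosen only for illustration):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Toy data and a single train/test split (just for illustration)
X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Level 0: base models fitted on the training set
level0 = [LogisticRegression(max_iter=1000), RandomForestClassifier(random_state=0)]
for model in level0:
    model.fit(X_train, y_train)

# My question concerns this step: which data should the level 0 models
# predict on to build the level 1 features? Here they predict on the
# test set, as described above.
meta_features = np.column_stack([m.predict_proba(X_test)[:, 1] for m in level0])

# Level 1: the final classifier is trained on those predictions
level1 = LogisticRegression()
level1.fit(meta_features, y_test)
```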
I am not sure whether this approach causes data leakage, but I don't think it does (the final classifier only has the information that the initial models do, i.e. information derived from the training data; it doesn't know whether those predictions are good or not).
Is this reasoning correct?