I am using LightGBM on time series data. I first split the data set into 10 equal, chronologically ordered folds (10% each). The last fold is held out as a test set.
For each candidate set of hyperparameters I use an expanding window: train on folds 1–6 and predict fold 7, then train on folds 1–7 and predict fold 8, then train on folds 1–8 and predict fold 9. I average the error over these three validation folds and pick the hyperparameters that minimise it. With those hyperparameters I train on folds 1–9 and predict fold 10 as the final estimate of model performance, roughly as in the sketch below.
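To make the setup concrete, here is a rough sketch of what I mean (hypothetical arrays `X`, `y` already sorted in time order, a made-up parameter grid, MSE as the error metric, and LightGBM's scikit-learn API — the specifics are placeholders, not my actual code):

```python
import numpy as np
import lightgbm as lgb
from sklearn.metrics import mean_squared_error

n = len(X)
fold_edges = np.linspace(0, n, 11, dtype=int)  # boundaries of 10 equal folds

def fold_slice(i):
    # folds are 1-indexed to match the description above
    return slice(fold_edges[i - 1], fold_edges[i])

# hypothetical grid, just for illustration
param_grid = [{"num_leaves": 31, "learning_rate": 0.05},
              {"num_leaves": 63, "learning_rate": 0.10}]

best_params, best_err = None, np.inf
for params in param_grid:
    errs = []
    # expanding window: train on folds 1..k, validate on fold k+1, for k = 6, 7, 8
    for k in (6, 7, 8):
        train, valid = slice(0, fold_edges[k]), fold_slice(k + 1)
        model = lgb.LGBMRegressor(**params).fit(X[train], y[train])
        errs.append(mean_squared_error(y[valid], model.predict(X[valid])))
    if np.mean(errs) < best_err:
        best_err, best_params = np.mean(errs), params

# final performance estimate: train on folds 1..9, test on fold 10
train, test = slice(0, fold_edges[9]), fold_slice(10)
model = lgb.LGBMRegressor(**best_params).fit(X[train], y[train])
test_err = mean_squared_error(y[test], model.predict(X[test]))

# the model I actually deploy is then refit on all 10 folds
final_model = lgb.LGBMRegressor(**best_params).fit(X, y)
```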
The final model itself will then just be trained on all the data (the last line of the sketch). Does the above make sense?