I am trying to build machine learning models (GBM, RF, Stacking) on top of a dataset that is about 3 GB in size on my local computer. However, I only have 4 GB of memory (of which only 2 GB are available).
My question is: is it reasonable to split the data into 20% for the training set, 10% for the validation set, and the remaining 70% for the test set? I also split the test set into 7 equal subsets with the same class distribution. I am doing this because I cannot test a model on the full dataset at once.
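In code, the split I am doing looks roughly like this (just a sketch, assuming the data, or a chunk of it, fits into a pandas DataFrame `df` with a label column `target`; those names are placeholders):

```python
import pandas as pd
from sklearn.model_selection import train_test_split, StratifiedKFold

# 20% train, 10% validation, 70% test, stratified on the label
train_df, rest_df = train_test_split(
    df, train_size=0.20, stratify=df["target"], random_state=42
)
val_df, test_df = train_test_split(
    rest_df, train_size=0.125, stratify=rest_df["target"], random_state=42
)  # 0.125 of the remaining 80% is roughly 10% of the whole dataset

# split the test set into 7 disjoint subsets with the same label distribution
skf = StratifiedKFold(n_splits=7, shuffle=True, random_state=42)
test_subsets = [test_df.iloc[idx] for _, idx in skf.split(test_df, test_df["target"])]
```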
Still, I am not really convinced by this approach, and I am not sure it is good enough to produce a robust final model. What can I do? I am new to machine learning and big data.