I am working on a vehicle detector trained on semi-supervised (pseudo-labeled) data. I created the labels by running another, very accurate object detector, which is too slow for me to use for real-time detection.
I generated data from multiple videos of different streets: around 10 videos and more than 35K frames in total. I created training and validation datasets by splitting the data 80%-20%, and I got around 90% precision and recall on the validation set.
But when I tested on two videos that were not used to create the train-validation dataset, I got only 15%. So my model seems to be learning the scenes I used in training, which is why I get good results on validation with the same scenes but poor results on new ones.
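To make the gap between the two numbers concrete, here is a minimal sketch of a video-level split, assuming the 80%-20% split above was made over individual frames, so frames from the same video ended up on both sides. It keeps every frame of a given video in either training or validation, so the validation score is measured on unseen scenes. The function name `split_by_video`, the `(video_id, frame_path)` tuples, and the toy file names are all hypothetical and only for illustration.

```python
import random
from collections import defaultdict


def split_by_video(frames, val_fraction=0.2, seed=0):
    """Split (video_id, frame_path) pairs so that no video is shared
    between the training and validation sets."""
    # Group frame paths by the video they came from.
    by_video = defaultdict(list)
    for video_id, frame_path in frames:
        by_video[video_id].append(frame_path)

    # Shuffle the video ids (not the frames) reproducibly.
    video_ids = sorted(by_video)
    random.Random(seed).shuffle(video_ids)

    # Hold out a fraction of whole videos, at least one.
    n_val = max(1, round(len(video_ids) * val_fraction))
    val_ids, train_ids = video_ids[:n_val], video_ids[n_val:]

    train = [(v, f) for v in train_ids for f in by_video[v]]
    val = [(v, f) for v in val_ids for f in by_video[v]]
    return train, val


# Toy data (hypothetical): 10 videos with 5 frames each.
frames = [
    (f"video_{v:02d}", f"video_{v:02d}/frame_{i:05d}.jpg")
    for v in range(10)
    for i in range(5)
]
train, val = split_by_video(frames)
train_videos = {v for v, _ in train}
val_videos = {v for v, _ in val}
print(len(train), len(val), sorted(val_videos))
```

With a split like this, the validation score should behave more like the score on the two held-out videos, which makes it a more useful signal when comparing changes to the model or the data.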
I tried adding more videos; the model improved from 15% to 18%, but that is still not enough.
I don't think adding more videos will really help. What would you suggest I do to make my model generalize better?