I'm trying to fine-tune a BERT-based model for a binary classification task (the data is in English). The dataset I'm working with is quite small (~500 samples, of which 80% are currently used for training), and I'm wondering whether there is a rule of thumb for the minimum number of samples required to produce a decent model. I could grow the dataset by labeling more samples manually, though that is only feasible up to a few thousand samples.
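For context, here is roughly what my setup looks like. This is a minimal sketch using Hugging Face Transformers; the checkpoint name `bert-base-uncased`, the hyperparameters, and the `load_my_data` helper are placeholders rather than my exact code:

```python
# Sketch: fine-tune a BERT model for binary classification on ~500 English
# samples, using an 80/20 train/validation split.
from sklearn.model_selection import train_test_split
from datasets import Dataset
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    TrainingArguments,
    Trainer,
)

# Placeholder helper: returns ~500 texts and their 0/1 labels.
texts, labels = load_my_data()

# 80% train / 20% validation, stratified so both classes appear in each split.
train_texts, val_texts, train_labels, val_labels = train_test_split(
    texts, labels, test_size=0.2, stratify=labels, random_state=42
)

model_name = "bert-base-uncased"  # assumption: any BERT-style checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

train_ds = Dataset.from_dict({"text": train_texts, "label": train_labels}).map(tokenize, batched=True)
val_ds = Dataset.from_dict({"text": val_texts, "label": val_labels}).map(tokenize, batched=True)

# Illustrative hyperparameters only.
args = TrainingArguments(
    output_dir="bert-binary-clf",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    learning_rate=2e-5,
)

trainer = Trainer(model=model, args=args, train_dataset=train_ds, eval_dataset=val_ds)
trainer.train()
print(trainer.evaluate())
```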
Any ideas? If this is problem-dependent, suggestions on how to estimate the required dataset size would also be appreciated.
Thanks!