I'm trying to fine-tune a BERT-based model for a binary classification task (the data is in English). The dataset I'm working with is quite small (~500 samples, of which 80% are currently used for training), and I'm wondering whether there is a rule of thumb for the minimum number of samples required to produce a decent model. I could grow the dataset by labeling more samples manually, though that is only feasible up to a few thousand samples.
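For context, here is roughly what my setup looks like. This is a minimal sketch using Hugging Face Transformers; the checkpoint name `bert-base-uncased`, the hyperparameters, and the `load_my_data` helper are placeholders rather than my exact code:

```python
# Sketch: fine-tune a BERT model for binary classification on ~500 English
# samples, using an 80/20 train/validation split.
from sklearn.model_selection import train_test_split
from datasets import Dataset
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    TrainingArguments,
    Trainer,
)

# Placeholder helper: returns ~500 texts and their 0/1 labels.
texts, labels = load_my_data()

# 80% train / 20% validation, stratified so both classes appear in each split.
train_texts, val_texts, train_labels, val_labels = train_test_split(
    texts, labels, test_size=0.2, stratify=labels, random_state=42
)

model_name = "bert-base-uncased"  # assumption: any BERT-style checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

train_ds = Dataset.from_dict({"text": train_texts, "label": train_labels}).map(tokenize, batched=True)
val_ds = Dataset.from_dict({"text": val_texts, "label": val_labels}).map(tokenize, batched=True)

# Illustrative hyperparameters only.
args = TrainingArguments(
    output_dir="bert-binary-clf",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    learning_rate=2e-5,
)

trainer = Trainer(model=model, args=args, train_dataset=train_ds, eval_dataset=val_ds)
trainer.train()
print(trainer.evaluate())
```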
Any ideas? If this is problem-dependent, suggestions on how to estimate the required dataset size would also be appreciated.
Thanks!