Which data cleaning steps should be performed before fine-tuning a BERT model on custom text data?
Does it make sense to apply stemming or lemmatization if neither was applied during the original pre-training of the BERT Base/Large model?