3

I'm looking to train an ELECTRA model on unlabelled data from a specific domain. Is there any objection to using the same data for the unsupervised pre-training and then again downstream for the supervised task?

user103134

1 Answer

1

Not at all. A recent ACL paper from AllenAI ("Don't Stop Pretraining", Gururangan et al., 2020) even recommends this. They suggest continuing pre-training on the unlabelled task data before fine-tuning and report that it reduces the problems caused by domain mismatch. So if you train the model on in-domain data from the very beginning, that is probably a good thing, provided you have enough data for it.
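To make the data flow concrete, here is a minimal sketch in plain Python of how the same texts can feed both stages. Everything in it is an assumption for illustration: the `labelled` examples, the helper names `pretraining_corpus` and `corrupt`, and the 15% replacement rate. A uniform random sampler stands in for ELECTRA's small generator network, so this only shows the shape of the data, not real pre-training.

```python
import random

# Hypothetical labelled task data: (text, label) pairs.
labelled = [
    ("the patient was given aspirin", 1),
    ("no adverse reaction was observed", 0),
    ("the dose was increased after a week", 1),
]

def pretraining_corpus(examples):
    """Drop the labels: the same texts serve as the unlabelled corpus."""
    return [text for text, _ in examples]

def corrupt(tokens, vocab, rate, rng):
    """Build one replaced-token-detection example.

    Assumption: a random sampler replaces ELECTRA's learned generator.
    Returns the corrupted tokens and a 0/1 flag per position
    (1 = token differs from the original).
    """
    corrupted, flags = [], []
    for tok in tokens:
        new = rng.choice(vocab) if rng.random() < rate else tok
        corrupted.append(new)
        # A sampled token that equals the original counts as "original".
        flags.append(int(new != tok))
    return corrupted, flags

rng = random.Random(0)
corpus = pretraining_corpus(labelled)
vocab = sorted({tok for text in corpus for tok in text.split()})

# Stage 1 data: discriminator inputs and targets, no task labels involved.
stage1 = [corrupt(text.split(), vocab, 0.15, rng) for text in corpus]

# Stage 2 data: the very same texts, now paired with their task labels.
stage2 = labelled
```

The point is that stage 1 never sees the task labels, so reusing the texts does not leak supervision into pre-training. You should still keep your evaluation labels out of fine-tuning as usual.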

Jindřich