
Currently learning and reading about transformer models, I understand that during the pretraining stage BERT is trained on a large corpus via MLM and NSP. But during fine-tuning, for example when classifying the sentiment of some other text, are all of the BERT parameters (110M+ parameters plus the final classification layer) updated, or only the final classification layer? I couldn't find a concrete answer to this in the resources I've been looking at.

Thank you in advance.

spnc

2 Answers


Both approaches are reasonable. Updating the BERT weights will take longer to train, but should give more accurate results.
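A minimal sketch of the two options using the Hugging Face transformers library (the model name, label count, and variable names are illustrative assumptions, not something stated in the question):

```python
# Sketch: full fine-tuning vs. training only the classification head.
# Assumes `transformers` and `torch` are installed; "bert-base-uncased"
# and num_labels=2 are illustrative choices.
from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# Option A: full fine-tuning -- all ~110M BERT parameters plus the new
# classification head receive gradients (slower, usually more accurate).
full_params = [p for p in model.parameters() if p.requires_grad]

# Option B: freeze the encoder so only the classification head is trained
# (faster and cheaper, but typically less accurate).
for param in model.bert.parameters():
    param.requires_grad = False
head_params = [p for p in model.parameters() if p.requires_grad]

print(f"trainable (full fine-tuning): {sum(p.numel() for p in full_params):,}")
print(f"trainable (head only):        {sum(p.numel() for p in head_params):,}")
```

Whichever set of parameters ends up with `requires_grad = True` is what the optimizer will update during fine-tuning.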

Akavall

By default, BERT fine-tuning involves learning a task-specific layer (for a classification task, a small neural network on top of the [CLS] token) as well as updating the existing parameters of the model to adapt it to the task. So it's both: the new layer plus the BERT model weights. However, you can instead use just the embedding of the [CLS] token and train only the layer on top of it to reduce the training cost. It's a matter of trade-off between performance and compute cost, as in the sketch below.
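A rough sketch of the lighter-weight option, keeping BERT frozen as a feature extractor and training only a linear layer on the [CLS] embedding (model name, label count, and the example sentence are assumptions for illustration):

```python
# Frozen BERT as a feature extractor; only `classifier` is trained.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
encoder = BertModel.from_pretrained("bert-base-uncased")
encoder.eval()
for p in encoder.parameters():
    p.requires_grad = False          # no updates to the 110M BERT weights

# The only trainable parameters: a linear layer on top of the [CLS] embedding.
classifier = torch.nn.Linear(encoder.config.hidden_size, 2)

inputs = tokenizer("great movie, would watch again", return_tensors="pt")
with torch.no_grad():                # BERT forward pass only produces features
    cls_embedding = encoder(**inputs).last_hidden_state[:, 0, :]  # [CLS] token

logits = classifier(cls_embedding)   # gradients flow only through `classifier`
```

In full fine-tuning you would skip the freezing and the `torch.no_grad()` block, so gradients also flow back through the encoder.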

Ashwin Geet D'Sa