I'm currently learning about transformer models. I understand that during the pretraining stage, BERT is trained on a large corpus via masked language modeling (MLM) and next sentence prediction (NSP). But during fine-tuning, for example when classifying sentiment on a new dataset, are all of BERT's parameters updated (the 110M+ pretrained weights plus the new final classification layer), or only the final classification layer? I couldn't find a concrete answer to this in the resources I've been reading.
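To make the two options I mean concrete, here is a minimal pure-Python sketch (the parameter counts are rough, made-up figures, and `trainable_params` is just an illustrative helper, not anything from a real library) contrasting full fine-tuning with training only the classification head:

```python
# Toy parameter counts, purely for illustration (not exact BERT sizes).
ENCODER_PARAMS = 110_000_000   # pretrained BERT-base body (rough figure)
HEAD_PARAMS = 768 * 2 + 2      # new linear head: hidden_size -> 2 classes, plus biases

def trainable_params(freeze_encoder: bool) -> int:
    """How many parameters the optimizer would update under each regime."""
    return (0 if freeze_encoder else ENCODER_PARAMS) + HEAD_PARAMS

# Option 1: full fine-tuning -- every weight, encoder included, gets gradients.
full = trainable_params(freeze_encoder=False)

# Option 2: feature extraction ("linear probing") -- encoder frozen, head only.
head_only = trainable_params(freeze_encoder=True)

print(full)       # 110_001_538
print(head_only)  # 1_538
```

So the question is whether standard BERT fine-tuning corresponds to option 1 or option 2 here.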
Thank you in advance.