Questions tagged [finetuning]

70 questions
12 votes, 2 answers

What are good ranges for BERT hyperparameters when fine-tuning it on a very small dataset?

I need to fine-tune a BERT model (from the huggingface repository) on a sentence classification task. However, my dataset is really small: I have 12K sentences, and only 10% of them are from the positive class. Does anyone here have any experience on…
zwlayer
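As a starting point for the question above, here is a minimal Python sketch. It enumerates the fine-tuning grid suggested in the original BERT paper's appendix (batch size 16 or 32; Adam learning rate 5e-5, 3e-5 or 2e-5; 2, 3 or 4 epochs). It also computes "balanced" class weights, n_samples / (n_classes * class_count), which is the same heuristic scikit-learn uses, for a 90/10 split. The helper names and class counts are hypothetical. The grid was proposed for the paper's benchmark datasets, so treat it as a range to search rather than a recommendation for a 12K-sentence dataset.

```python
from itertools import product

# Search ranges from the BERT paper's fine-tuning appendix (Devlin et al.);
# a starting grid, not a guarantee for very small datasets.
BATCH_SIZES = [16, 32]
LEARNING_RATES = [5e-5, 3e-5, 2e-5]
EPOCHS = [2, 3, 4]


def hyperparameter_grid():
    """Return every (batch_size, learning_rate, epochs) combination."""
    return list(product(BATCH_SIZES, LEARNING_RATES, EPOCHS))


def balanced_class_weights(counts):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes get larger weights, so a weighted loss penalises
    mistakes on them more heavily.
    """
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * c) for label, c in counts.items()}


# Hypothetical counts matching the question: 12K sentences, 10% positive.
counts = {"negative": 10_800, "positive": 1_200}
weights = balanced_class_weights(counts)

print(len(hyperparameter_grid()))  # 18
print({k: round(v, 3) for k, v in weights.items()})  # {'negative': 0.556, 'positive': 5.0}
```

These weights would typically be passed to a weighted cross-entropy loss. Whether that helps more than resampling or threshold tuning on this dataset is an empirical question.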
4 votes, 1 answer

Which pretrained image model is the fastest?

I have been working with pre-trained models and was curious which of the pre-trained computer vision models has the fastest forward pass. I have been trying to achieve faster processing in one-shot learning and have tried the…
3 votes, 1 answer

How to combine different models in Keras?

I have a pre-trained network consisting of two parts: feature extraction and similarity learning. The network takes two inputs and predicts whether the images are the same or not. The feature-extraction part was VGGNet 16 with all layers frozen. I only…
2 votes, 1 answer

Combining textual and numeric features into pre-trained Transformer BERT

I have a dataset with 3 columns: Text; Meta-data (intending to extract features from it, then use those, i.e., numerical features); Target label. Question 1: How can I use a pre-trained BERT instance on more than the text? One theoretical solution…
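One common approach to this kind of question is late fusion. The text encoder's sentence vector (for example, BERT's [CLS] output) is concatenated with standardized numeric metadata features, and a classification head is placed on the fused vector. Below is a minimal pure-Python sketch of that data flow only. The 4-dimensional embedding stands in for a real encoder output, and all names and values are hypothetical.

```python
import math


def standardize(column):
    """Z-score a numeric column so its scale is comparable to the embedding."""
    mean = sum(column) / len(column)
    std = math.sqrt(sum((x - mean) ** 2 for x in column) / len(column))
    std = std or 1.0  # constant column: avoid dividing by zero
    return [(x - mean) / std for x in column]


def fuse(text_embedding, numeric_features):
    """Late fusion: concatenate the text vector with the numeric features."""
    return list(text_embedding) + list(numeric_features)


def linear_head(features, weights, bias):
    """One logit from a linear classification head over the fused vector."""
    if len(features) != len(weights):
        raise ValueError("feature/weight length mismatch")
    return sum(f * w for f, w in zip(features, weights)) + bias


# Hypothetical metadata column (e.g. document length) for a batch of 3 rows.
lengths = standardize([120.0, 80.0, 100.0])
# Stand-in for row 0's sentence embedding (BERT-base would give 768 dims).
embedding_row0 = [0.1, -0.3, 0.5, 0.2]
fused = fuse(embedding_row0, [lengths[0]])
logit = linear_head(fused, weights=[0.5, 0.1, -0.2, 0.3, 0.4], bias=0.0)
print(len(fused))  # 5
```

In a real model, the head (and often the encoder) would be trained end to end. Standardizing matters because raw numeric columns can be orders of magnitude larger than embedding values.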
2 votes, 2 answers

Does fine-tuning BERT involve updating all of the parameters or just the final classification layer?

I'm currently learning about transformer models. I understand that during the pretraining stage, BERT is trained on a large corpus via MLM and NSP. But during fine-tuning, for example when trying to classify sentiment based on another text, are…
spnc
2 votes, 1 answer

LLM Fine-Tuning - Supervised Fine-Tuning Trainer (SFTTrainer) vs transformers Trainer

When should one opt for the Supervised Fine-Tuning Trainer (SFTTrainer) instead of the regular Transformers Trainer for instruction fine-tuning of Large Language Models (LLMs)? From what I gather, the regular Transformers Trainer typically…
2 votes, 0 answers

What is zero-shot vs one-shot vs few-shot learning?

Are there any papers or research that generalize the matrix of how the various *-shot learning settings are defined? There is a wide variety of papers that title themselves as *-shot learning, with some variants of how *-shots are defined,…
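Many few-shot papers frame the setting as N-way K-shot episodes. You sample N classes, give the model K labelled "support" examples per class, then evaluate on held-out "query" examples from the same classes. Under that convention, one-shot is K = 1 and few-shot is a small K. Zero-shot corresponds to K = 0, where the model has no labelled examples of the target classes and must rely on side information such as class descriptions. Below is a minimal sketch of episode sampling under that assumption, with hypothetical names and toy data.

```python
import random
from collections import defaultdict


def sample_episode(dataset, n_way, k_shot, n_query, seed=None):
    """Sample an N-way K-shot episode from (example, label) pairs.

    Returns (support, query): support holds k_shot labelled examples for
    each of n_way randomly chosen classes; query holds n_query further
    examples per class for evaluation. k_shot=0 gives an empty support
    set (the zero-shot case); k_shot=1 is one-shot.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for example, label in dataset:
        by_label[label].append(example)

    # Only classes with enough examples for both splits can be used.
    eligible = [c for c, xs in by_label.items() if len(xs) >= k_shot + n_query]
    if len(eligible) < n_way:
        raise ValueError("not enough classes with enough examples")

    classes = rng.sample(sorted(eligible), n_way)
    support, query = [], []
    for c in classes:
        picked = rng.sample(by_label[c], k_shot + n_query)  # no overlap
        support += [(x, c) for x in picked[:k_shot]]
        query += [(x, c) for x in picked[k_shot:]]
    return support, query


# Toy dataset: 5 classes with 10 examples each.
toy = [(f"x{c}_{i}", f"class{c}") for c in range(5) for i in range(10)]
support, query = sample_episode(toy, n_way=3, k_shot=1, n_query=2, seed=0)
print(len(support), len(query))  # 3 6
```

Papers differ mainly in how they pick N, K, and the side information available at K = 0, which is one reason the terminology varies.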
2 votes, 1 answer

Does GPT-3 remember data from prompts used to fine-tune it?

I am trying to fine-tune a model using OpenAI's fine-tuning API. I am passing bodies of text (for example, newspaper articles) as prompts and the data I want from them as completions. Let us consider the following: if a newspaper article I passed as…
2 votes, 0 answers

How many samples in dataset are required to fine-tune BERT for binary classification?

I'm trying to fine-tune a BERT-based model for a binary classification task (data is in English). The dataset I'm working with is quite small (~500 samples, out of which 80% are currently used for training), and I'm wondering if there is a rule of…
Occasus
2 votes, 1 answer

Is it possible to add new vocabulary to BERT's tokenizer when fine-tuning?

I want to fine-tune BERT by training it on a domain dataset of my own. The domain is specific and includes many terms that probably weren't included in the original dataset BERT was trained on. I know I have to use BERT's tokenizer as the model was…
user123635
1 vote, 1 answer

Fine-tune the RetinaNet model in PyTorch

I would like to fine-tune the pre-trained RetinaNet model available in torchvision in order to create my own object detector. I'm trying to replicate what is done for the FastRCNN at this…
xcsob
1 vote, 1 answer

Why not use linear regression for fine-tuning the last layer of a neural network?

In transfer learning, often only the last layer of the network is retrained using gradient descent. However, the last layer of a common neural network performs only a linear transformation, so why do we use gradient descent and not linear (or…
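Here is a minimal sketch of the trade-off the question raises, assuming a squared-error loss. With the earlier layers frozen, the last layer is a linear model on fixed features. Ordinary least squares therefore gives it in closed form, and gradient descent on the same objective converges to the same weights. The 1-D features are hypothetical stand-ins for penultimate-layer activations. With the usual softmax cross-entropy loss for classification, there is no closed-form solution, which is one reason gradient-based training is the common default.

```python
def least_squares_1d(xs, ys):
    """Closed-form ordinary least squares fit of y ≈ w*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    w = sxy / sxx
    return w, my - w * mx


def gradient_descent_1d(xs, ys, lr=0.05, steps=2000):
    """Minimise the same mean squared error iteratively."""
    w = b = 0.0
    n = len(xs)
    for _ in range(steps):
        errs = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2 * sum(e * x for e, x in zip(errs, xs)) / n  # dMSE/dw
        grad_b = 2 * sum(errs) / n                              # dMSE/db
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical frozen features and targets lying on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
print(least_squares_1d(xs, ys))  # (2.0, 1.0)
print(tuple(round(v, 4) for v in gradient_descent_1d(xs, ys)))  # approximately (2.0, 1.0)
```

In higher dimensions, the closed form requires solving a d×d linear system. Gradient descent instead fits into the same mini-batch training loop used for the rest of the network.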
1 vote, 2 answers

Difference between using BERT as a 'feature extractor' and fine-tuning BERT with its layers fixed

I understand that there are two ways of leveraging BERT for an NLP classification task. BERT might perform ‘feature extraction’, with its output fed into another (classification) model. The other way is fine-tuning BERT on some text…
MilaHalina
1 vote, 1 answer

Train on multiple domains, then fine-tune on a specific domain

Would it make sense to first train a model on images from multiple domains, and then do "fine-tuning" on one specific domain to improve its performance there? For instance, one could train an object detector on car-camera footage recorded in NYC,…
1 vote, 0 answers

Post-classification after inference in deep learning models

I designed a fire detector using deep-learning binary classification in Keras (fire vs. none). It's a simple model with a few layers. In my training dataset, I included both fire and smoke, and both are detected (all under "fire"; mostly real…