What is zero-shot vs one-shot vs few-shot learning?


Are there any papers or research works that generalize the matrix of how `*-shot(s) learning` is defined?

There's a wide variety of papers that title themselves as *-shot(s) learning, with several variants of how the *-shots are defined, e.g.:

Classic ML [i.e. no free lunch]: the model is trained on task A on the training split of dataset B and evaluated on the same task A on held-out or cross-validated splits of dataset B
Zero-shot learning in the context of pre-trained / foundation models
- Variant 1: [i.e. Train task A on data B, eval task A on data C] The model is trained on task A with dataset B, yet it also works for task A on dataset C by capitalizing on the representation learnt from training on task A with dataset B. Datasets B and C share some commonality but are different enough to call the use of the model on dataset C "zero-shot"
  - i.e. Same task, different data domain/features
  - e.g. Model trained for machine translation (task A) on English-Spanish + English-French (dataset B) and evaluated on French-Spanish data (dataset C)
- Variant 2: [i.e. Train task X on data B, eval task Y on data B/C] The model is trained on task X with dataset B and evaluated on task Y with dataset C, which has the same properties as dataset B but may or may not differ from it as much as the datasets in Variant 1 do.
  - i.e. Different task, same/different data domain/features
  - e.g. Model trained for language modelling (task X) with a classifier head on Wikipedia text (data B) and evaluated on text classification (task Y) on either Wiki text (data B) or user-generated Twitter text (data C)
One-shot learning in the context of general gradient-based or optimization/prediction-based ML
- Variant 1: Model trained on task A for a single epoch/pass through dataset B and evaluated on held-out / cross-validated splits of dataset B
- Variant 2: Model trained on task A with a single data point from dataset B and evaluated on held-out / cross-validated splits of dataset B
- Variant 3: Model trained on task A for a single epoch/pass or with a single data point from dataset B and evaluated in a "zero-shot" manner on task X with dataset B or dataset C
  - Is this then "zero-one-shot learning"? Is there a name for this? "One-shot learning/training, zero-shot evaluation"?
- Variant 4: Model is pre-trained on task A with dataset B until convergence, then fine-tuned for a single epoch/pass or on a single data point for one of:
  - task A with dataset C (one-shot domain adaptation)
  - task X with dataset B (one-shot task adaptation)
  - task X with dataset C (one-shot domain + task adaptation)

And for few-shot learning, the premise seems to be the same as one-shot, except that instead of a single epoch/data point, it's a few epochs/data points.
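A rough way to make the variants above concrete is to encode each setup as a record and label it with a few rules. This is only a sketch of one reading of the list, not an established definition from any paper: the names `Budget`, `Phase`, `Setup`, `shift`, `shots` and `label` are hypothetical, "one data point" and "one epoch" are both mapped to one-shot, and evaluation after a tune phase is assumed to use the tune phase's task and data.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Budget(Enum):
    """How much data a training phase sees (hypothetical categories)."""
    ONE_POINT = "one data point"
    ONE_EPOCH = "one epoch"
    FEW = "a few epochs / data points"
    FULL = "until convergence"


@dataclass(frozen=True)
class Phase:
    task: str       # e.g. "A" or "X"
    data: str       # e.g. "B" or "C"
    budget: Budget


@dataclass(frozen=True)
class Setup:
    train: Phase
    tune: Optional[Phase]  # None = no fine-tuning phase
    eval_task: str
    eval_data: str


def shift(task1: str, data1: str, task2: str, data2: str) -> str:
    """Describe what changes between two (task, data) pairs."""
    new_task, new_data = task1 != task2, data1 != data2
    if new_task and new_data:
        return "domain + task"
    if new_task:
        return "task"
    if new_data:
        return "domain"
    return ""


def shots(budget: Budget) -> str:
    """Assumed mapping from a limited budget to a *-shot name."""
    if budget is Budget.FULL:
        raise ValueError("a full budget has no *-shot name")
    if budget in (Budget.ONE_POINT, Budget.ONE_EPOCH):
        return "one-shot"
    return "few-shot"


def label(s: Setup) -> str:
    """Label a setup following the variants listed in the question."""
    if s.tune is None:
        moved = shift(s.train.task, s.train.data, s.eval_task, s.eval_data)
        if s.train.budget is Budget.FULL:
            # Classic ML, or zero-shot Variant 1 (domain) / Variant 2 (task)
            return "classic" if not moved else f"zero-shot ({moved} shift)"
        name = shots(s.train.budget)
        if not moved:
            return name  # one/few-shot Variants 1-2
        # Variant 3: limited training, "zero-shot" evaluation
        return f"{name} training, zero-shot evaluation ({moved} shift)"
    # Variant 4: pre-train fully, then adapt with a limited budget.
    # Assumption: evaluation uses the same task/data as the tune phase.
    if s.tune.budget is Budget.FULL:
        return "classic"  # a full-budget tune phase gets no *-shot label
    moved = shift(s.train.task, s.train.data, s.tune.task, s.tune.data)
    name = shots(s.tune.budget)
    return f"{name} {moved} adaptation" if moved else f"{name} fine-tuning"


if __name__ == "__main__":
    print(label(Setup(Phase("A", "B", Budget.FULL), None, "A", "C")))
    print(label(Setup(Phase("A", "B", Budget.FULL), Phase("X", "C", Budget.FEW), "X", "C")))
```

Spelling out the rules this way makes the fuzzy parts visible: the split between "one epoch" and "one data point" and the treatment of a full-budget tune phase both have to be decided explicitly, and a different reading of the papers would change those branches.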

To kind of put the above into tables:

| Shots | (Pre-)train for | (Pre-)train on | (Pre-)train amount of data | Tune for | Tune on | Tune amount of data | Eval for | Eval on |
|---|---|---|---|---|---|---|---|---|
| Classic | task A | data B (train) | data B until convergence | task A | data B (valid) | all of data B (valid) | task A | data B (test) |
| Zero | task A | data B (train) | data B until convergence | - | - | - | task A | data C |
| Zero | task A | data B (train) | data B until convergence | - | - | - | task X | data B / C |

| Shots | (Pre-)train for | (Pre-)train on | (Pre-)train amount of data | Tune for | Tune on | Tune amount of data | Eval for | Eval on |
|---|---|---|---|---|---|---|---|---|
| One | task A | data B (train) | data B, one epoch | - | - | - | task A | data B (test) |
| One | task A | data B (train) | data B, one data point | - | - | - | task A | data B (test) |
| One | task A | data B (train) | data B, one epoch / one data point | - | - | - | task X | data B / C |
| One | task A | data B (train) | all of data B (train) | task X | data B / C | one epoch / one data point | task X | data B / C |

| Shots | (Pre-)train for | (Pre-)train on | (Pre-)train amount of data | Tune for | Tune on | Tune amount of data | Eval for | Eval on |
|---|---|---|---|---|---|---|---|---|
| Few | task A | data B (train) | data B, a few epochs | - | - | - | task A | data B (test) |
| Few | task A | data B (train) | data B, a few data points | - | - | - | task A | data B (test) |
| Few | task A | data B (train) | data B, a few epochs / a few data points | - | - | - | task X | data B / C |
| Few | task A | data B (train) | all of data B (train) | task X | data B / C | a few epochs / a few data points | task X | data B / C |
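To check which combinations the tables leave out, the rows can be encoded as cells of a coarse grid and the grid enumerated. This is a hypothetical sketch of one reading of the tables: "one epoch" and "one data point" are merged into `"one"`, their few-* forms into `"few"`, "data B / C" is taken to cover both datasets, and a change of task or data is measured against pre-training. The names `covered_cells` and `uncovered_cells` are illustrative only.

```python
from itertools import product

# A cell is (pre-train budget, tune budget or None, task changed?, data changed?).
# "Changed" is relative to the pre-training task/data; when there is a tune
# phase, evaluation is assumed to use the tune phase's task and data.
TRAIN_BUDGETS = ("one", "few", "full")
TUNE_BUDGETS = (None, "one", "few", "full")


def covered_cells() -> set:
    """Cells named by the rows of the three tables (one reading of them)."""
    cells = {
        ("full", "full", False, False),  # Classic: tune task A on B (valid)
        ("full", None, False, True),     # Zero: eval task A on data C
        ("full", None, True, False),     # Zero: eval task X on data B
        ("full", None, True, True),      # Zero: eval task X on data C
    }
    for b in ("one", "few"):
        cells.add((b, None, False, False))   # rows 1-2: eval task A on B (test)
        cells.add((b, None, True, False))    # row 3: eval task X on data B
        cells.add((b, None, True, True))     # row 3: eval task X on data C
        cells.add(("full", b, True, False))  # row 4: tune task X on data B
        cells.add(("full", b, True, True))   # row 4: tune task X on data C
    return cells


def uncovered_cells() -> list:
    """Every combination in the grid that no table row names."""
    named = covered_cells()
    grid = product(TRAIN_BUDGETS, TUNE_BUDGETS, (False, True), (False, True))
    return [cell for cell in grid if cell not in named]


if __name__ == "__main__":
    total = len(TRAIN_BUDGETS) * len(TUNE_BUDGETS) * 4
    gaps = uncovered_cells()
    print(f"{len(gaps)} of {total} cells are not covered by any row")
    for cell in gaps:
        print(cell)
```

Under this encoding the script flags, for example, `("full", "one", False, True)`: the one-shot domain adaptation case (task A with dataset C) that Variant 4 lists in the text but that the last row of the One table (tune for task X) does not include. Many other flagged cells, such as limited pre-training followed by limited tuning, are combinations the tables simply do not name, so a gap here does not mean the setup is invalid.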

The matrix of what counts as zero-shot, one-shot, few-shot is kinda fuzzy.

Are there other variants of the `*-shot(s) learning` that the above matrix didn't manage to cover?

Thanks for that question. I also struggle to find a unique definition, and I feel like some papers use a loose definition on purpose to get a better title. One thing that might be important and that is not covered in your question is the size of the dataset: a single pass over a small / medium / big / gigantic dataset is not the same thing. – Lucas Morin, Apr 19 '23 at 15:04
