Are there any papers/research work that deal with generalizing the matrix of how *-shot(s) learning is defined?
There's a wide variety of papers that title themselves *-shot(s) learning, with several variants of how the *-shots are defined, e.g.:
- Classic ML learning [i.e. No free lunch]
  - The model is trained on task A using the training split of dataset B and evaluated on the same task A using held-out or cross-validated splits of dataset B.
- Zero-shot learning in the context of pre-trained / foundation models
  - Variant 1: [i.e. Train task A on data B, eval task A on data C] The model is trained on task A with dataset B, but it seems to work for task A on dataset C by capitalizing on the representation learnt from training on task A with dataset B. Datasets B and C share some commonality but are different enough to call using the model on dataset C "zero-shot".
    - i.e. Same task, different data domain/features
    - e.g. A model trained for machine translation (task A) on English-Spanish + French-Spanish (dataset B) and evaluated on English-French data (dataset C)
  - Variant 2: [i.e. Train task X on data B, eval task Y on data B/C] The model is trained on task X with dataset B and evaluated on task Y with either dataset B or a dataset C that has similar properties to dataset B but may or may not differ from it as much as the datasets in Variant 1 do.
    - i.e. Different task, same/different data domain/features
    - e.g. A model trained for language modelling (task X) with a classifier head on Wikipedia text (data B) and evaluated on text classification (task Y) on either Wikipedia text (data B) or user-generated Twitter text (data C)
- One-shot learning in the context of general gradient-based or optimization/prediction-based ML
  - Variant 1: Trained on task A with a single epoch/pass through dataset B and evaluated on held-out / cross-validated splits of dataset B
  - Variant 2: Trained on task A with a single data point from dataset B and evaluated on held-out / cross-validated splits of dataset B
  - Variant 3: Trained on task A with a single epoch/pass or a single data point from dataset B and evaluated in a "zero-shot" manner on task X with dataset B or dataset C
    - Is this then "zero-one-shot learning"? Is there a name for this? "One-shot learning/training, zero-shot evaluation"?
  - Variant 4: The model is pre-trained on task A with dataset B till convergence and then fine-tuned with a single epoch/pass or a single data point for one of:
    - task A with dataset C (one-shot domain adaptation)
    - task X with dataset B (one-shot task adaptation)
    - task X with dataset C (one-shot domain + task adaptation)
- Few-shot learning: the premise seems to be the same as one-shot, but instead of a single epoch/data point, it's a few epochs/data points.
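
One way to make the variants above concrete is to encode each setup as a sequence of stages and derive a label from how the stages differ. This is a minimal sketch of one possible reading, not an established taxonomy: `Stage`, `shift`, `shots`, `classify` and the `few_max=10` cut-off are all hypothetical, and `amount` deliberately does not distinguish epochs from data points, which is one of the ambiguities raised above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Stage:
    """One stage of a setup (hypothetical encoding of the matrix columns)."""
    task: str
    data: str
    amount: Optional[int] = None  # positive epochs or data points; None = "till convergence"


def shift(a: Stage, b: Stage) -> str:
    """Name how two stages differ in (task, data), ignoring the amount."""
    if a.task == b.task and a.data == b.data:
        return "none"
    if a.task == b.task:
        return "domain"
    if a.data == b.data:
        return "task"
    return "domain + task"


def shots(amount: Optional[int], few_max: int = 10) -> str:
    """Map an amount of training to a shot label; few_max is an arbitrary cut-off."""
    if amount is None or amount > few_max:
        return "full"
    return "one" if amount == 1 else "few"


def classify(train: Stage, evaluate: Stage, tune: Optional[Stage] = None) -> str:
    """Return one possible label for a (pre-)train / tune / eval setup."""
    if tune is not None:
        # Variant-4 style: pre-train, then adapt with a small tuning stage.
        kind = shift(train, tune)
        suffix = "fine-tuning" if kind == "none" else f"{kind} adaptation"
        return f"{shots(tune.amount)}-shot {suffix}"
    label, eval_shift = shots(train.amount), shift(train, evaluate)
    if label == "full":
        return "classic" if eval_shift == "none" else f"zero-shot ({eval_shift} shift)"
    if eval_shift == "none":
        return f"{label}-shot"
    # Variant-3 style: one/few-shot training followed by zero-shot evaluation.
    return f"{label}-shot training, zero-shot evaluation ({eval_shift} shift)"


if __name__ == "__main__":
    B = Stage("task A", "data B")
    print(classify(B, Stage("task A", "data C")))  # zero-shot (domain shift)
    print(classify(Stage("task A", "data B", 1), Stage("task X", "data C")))
    print(classify(B, Stage("task X", "data C"), tune=Stage("task X", "data C", 1)))
```

Under this reading, one-shot Variant 3 comes out as "one-shot training, zero-shot evaluation", and the three Variant 4 cases come out as one-shot domain, task, or domain + task adaptation.
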
To kind of put the above into tables:
| Shots | (Pre-)train for | (Pre-)train on | (Pre-)train amount | Tune for | Tune on | Tune amount | Eval for | Eval on |
|---|---|---|---|---|---|---|---|---|
| Classic | task A | data B (train) | data B till convergence | task A | data B (valid) | all data B (valid) | task A | data B (test) |
| Zero | task A | data B (train) | data B till convergence | - | - | - | task A | data C |
| Zero | task A | data B (train) | data B till convergence | - | - | - | task X | data B / C |
| Shots | (Pre-)train for | (Pre-)train on | (Pre-)train amount | Tune for | Tune on | Tune amount | Eval for | Eval on |
|---|---|---|---|---|---|---|---|---|
| One | task A | data B (train) | data B one epoch | - | - | - | task A | data B (test) |
| One | task A | data B (train) | data B one data point | - | - | - | task A | data B (test) |
| One | task A | data B (train) | data B one epoch / one data point | - | - | - | task X | data B / C |
| One | task A | data B (train) | all data B (train) | task X | data B / C | one epoch / one data point | task X | data B / C |
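
The first two one-shot rows share the label "one-shot" but can consume very different amounts of data. A minimal sketch in plain Python, assuming a hypothetical toy "dataset B" generated from y = 3x and a one-parameter linear model trained with plain SGD (the data, learning rate, and epoch counts are illustrative choices, not from any paper):

```python
def sgd(points, epochs=1, lr=0.1, w=0.0):
    """Plain SGD on the toy model y = w * x with squared loss; returns w."""
    for _ in range(epochs):
        for x, y in points:
            w -= lr * 2 * (w * x - y) * x  # gradient of (w*x - y)**2 w.r.t. w
    return w


def mse(w, points):
    """Mean squared error of the toy model on (x, y) pairs."""
    return sum((w * x - y) ** 2 for x, y in points) / len(points)


# Hypothetical toy "dataset B" following y = 3x, split into train and test.
train_b = [(i / 10, 3 * i / 10) for i in range(1, 11)]
test_b = [(i / 10 + 0.05, 3 * (i / 10 + 0.05)) for i in range(1, 10)]

w_epoch = sgd(train_b, epochs=1)      # one-shot Variant 1: one pass over data B
w_point = sgd(train_b[:1], epochs=1)  # one-shot Variant 2: a single data point
w_full = sgd(train_b, epochs=50)      # classic: train (roughly) till convergence

for name, w in [("one epoch", w_epoch), ("one point", w_point), ("converged", w_full)]:
    print(f"{name:>9}: w={w:.3f}  test MSE={mse(w, test_b):.4f}")
```

Here "one epoch" means ten updates while "one data point" means a single update, so the two models end up far apart even though both would be reported as one-shot. Replacing `epochs=1` with a small `k`, or `train_b[:1]` with `train_b[:k]`, gives the corresponding few-shot readings.
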
| Shots | (Pre-)train for | (Pre-)train on | (Pre-)train amount | Tune for | Tune on | Tune amount | Eval for | Eval on |
|---|---|---|---|---|---|---|---|---|
| Few | task A | data B (train) | data B a few epochs | - | - | - | task A | data B (test) |
| Few | task A | data B (train) | data B a few data points | - | - | - | task A | data B (test) |
| Few | task A | data B (train) | data B a few epochs / a few data points | - | - | - | task X | data B / C |
| Few | task A | data B (train) | all data B (train) | task X | data B / C | a few epochs / a few data points | task X | data B / C |
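
The last row of each table (and one-shot Variant 4) pre-trains to convergence and then tunes on a small amount of data from another domain. A minimal sketch in plain Python, assuming two hypothetical toy domains for the same task (data B follows y = 3x, data C follows y = 3x + 1), a two-parameter linear model, and the "a few data points" reading (three points from data C, passed over repeatedly); all names and numbers are illustrative:

```python
def sgd(points, w=0.0, b=0.0, epochs=1, lr=0.1):
    """Plain SGD on the toy model y = w * x + b with squared loss; returns (w, b)."""
    for _ in range(epochs):
        for x, y in points:
            err = w * x + b - y
            w -= lr * 2 * err * x  # gradient of err**2 w.r.t. w
            b -= lr * 2 * err      # gradient of err**2 w.r.t. b
    return w, b


def mse(params, points):
    """Mean squared error of the toy model on (x, y) pairs."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in points) / len(points)


# Hypothetical toy domains for the same task: data B is y = 3x, data C is y = 3x + 1.
xs = [i / 10 for i in range(1, 11)]
data_b = [(x, 3 * x) for x in xs]
data_c = [(x, 3 * x + 1) for x in xs]
few_c = data_c[::4]  # "a few data points" from data C (x = 0.1, 0.5, 0.9)
held_out_c = [p for i, p in enumerate(data_c) if i % 4]  # the rest of data C

pretrained = sgd(data_b, epochs=500)         # pre-train on data B "till convergence"
tuned = sgd(few_c, *pretrained, epochs=200)  # few-shot domain adaptation on data C

print("zero-shot, held-out C:", mse(pretrained, held_out_c))
print("few-shot,  held-out C:", mse(tuned, held_out_c))
```

The first print corresponds to using the pre-trained model on data C with no tuning at all (zero-shot Variant 1); the second is the same model after few-shot domain adaptation.
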
The matrix of what counts as zero-shot, one-shot, few-shot is kinda fuzzy.