What do warmup steps and warmup proportion mean? How do I select the number of warmup steps?
Does the learning rate change every batch or every epoch when warmup steps = 1?
Answering your four questions
References:
I will quote from several resources that explain this well.
a) Warm-up: A phase in the beginning of your neural network training where you start with a learning rate much smaller than your "initial" learning rate and then increase it over a few iterations or epochs until it reaches that "initial" learning rate.
Another nice explanation. This one also has example code and a graph.
Warmup is a method of warming up the learning rate mentioned in the ResNet paper. At the beginning of training, it uses a small learning rate to train for some epochs or steps (for example, 4 epochs or 10,000 steps), and then switches to the preset learning rate for the rest of training.
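To make the quoted description concrete, here is a minimal sketch of a linear warmup schedule. It is not taken from the quoted resources: the function name warmup_lr, the linear ramp, and the values base_lr=0.1 and warmup_steps=5 are all assumptions chosen for illustration.

```python
def warmup_lr(step, base_lr=0.1, warmup_steps=5):
    """Hypothetical linear warmup: ramp up to base_lr, then hold it constant.

    `step` counts gradient updates (batches), starting from 0.
    """
    if step < warmup_steps:
        # Fraction of the warmup phase completed after this step.
        return base_lr * (step + 1) / warmup_steps
    return base_lr


# Print the learning rate used at each of the first few steps.
for step in range(8):
    print(step, round(warmup_lr(step), 4))
```

Because this schedule is indexed by step, the learning rate changes on every batch, not once per epoch. Under this particular definition, warmup_steps=1 already returns base_lr at step 0, so there is effectively no warmup at all. Other implementations may start the ramp from zero or count in epochs instead.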
Now, carefully read this one from Stack Overflow:
A training step is one gradient update. In one step, batch_size examples are processed. An epoch consists of one full cycle through the training data. This is usually many steps. As an example, if you have 2,000 images and use a batch size of 10, an epoch consists of: 2,000 images / (10 images / step) = 200 steps.
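The arithmetic in that quote can be sketched in a few lines, and it also gives one way to turn a warmup proportion into a number of warmup steps. This assumes that "warmup proportion" means the fraction of all training steps spent in warmup, which is a common convention but is not stated in the quote. The function names and the values of 3 epochs and a proportion of 0.1 are hypothetical.

```python
import math


def steps_per_epoch(num_examples, batch_size):
    """Number of gradient updates needed to see every example once."""
    # Round up so a final, smaller batch still counts as a step.
    return math.ceil(num_examples / batch_size)


def warmup_steps_from_proportion(num_examples, batch_size, num_epochs,
                                 warmup_proportion):
    """Assumed convention: warmup steps = total steps * warmup proportion."""
    total_steps = steps_per_epoch(num_examples, batch_size) * num_epochs
    return int(total_steps * warmup_proportion)


# The example from the quote: 2,000 images with a batch size of 10.
print(steps_per_epoch(2000, 10))  # 200

# Hypothetical run: 3 epochs, with 10% of all steps used for warmup.
print(warmup_steps_from_proportion(2000, 10, 3, 0.1))
```

With these numbers, training runs for 600 steps in total, so a proportion of 0.1 corresponds to roughly the first 60 batches being trained with a reduced learning rate.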