I was just looking at the PyTorch docs on the different available learning-rate schedulers, and I found one here that I'm having some trouble understanding.
The others make sense to me: as training progresses, the learning rate gradually decreases. But in my study so far, I have yet to come across a model that needs such a "kickstart" mechanism.
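To show the kind of behaviour I mean, here is a minimal sketch that just prints the learning rate each epoch (I'm assuming a warmup-style scheduler such as `LinearLR` with `start_factor` below 1 for illustration; I'm not certain this is exactly the one from the docs):

```python
import torch
from torch import nn, optim
from torch.optim.lr_scheduler import LinearLR

# Toy model and optimizer just so the scheduler has something to attach to.
model = nn.Linear(10, 1)
optimizer = optim.SGD(model.parameters(), lr=0.1)

# Starts at 10% of the base LR and ramps linearly up to 100% over 5 steps.
scheduler = LinearLR(optimizer, start_factor=0.1, end_factor=1.0, total_iters=5)

for epoch in range(8):
    # ... training step would go here ...
    print(epoch, scheduler.get_last_lr())
    optimizer.step()      # step the optimizer before the scheduler
    scheduler.step()
```

The printed learning rate starts at 0.01 and climbs to the base rate of 0.1 over the first five epochs, rather than decaying like the other schedulers do.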
Could someone please help me figure out why we need this?
