I am reading this paper and came across this section. I cannot see, even intuitively, how the formula helps the optimizer avoid getting stuck in local minima. Please explain how it works.
-
The answers to [What is momentum in neural network?](https://datascience.stackexchange.com/q/84167/135707) provide some good explanations of momentum, as well as references for further reading. – Lynn Oct 23 '22 at 07:55
-
Thanks for the link and for mentioning the keyword "momentum". I found a more intuitive explanation, which is what I was actually looking for: https://stats.stackexchange.com/questions/457212/how-does-stochastic-gradient-descent-with-momentum-distinguish-between-local-min . – chuackt Oct 24 '22 at 07:49
