I am reading this paper and came across this section. I cannot see, even intuitively, how the formula helps the optimizer avoid getting stuck in local minima. Please explain how it works.

[Image: the section of the paper containing the formula; not reproduced here]

Ethan
chuackt
    The answers to [What is momentum in neural network?](https://datascience.stackexchange.com/q/84167/135707) provide some good explanations of momentum, as well as references for further reading. – Lynn Oct 23 '22 at 07:55
  • Thanks for the link and for mentioning the keyword "momentum". I found a more sensible explanation, which is what I was actually looking for: https://stats.stackexchange.com/questions/457212/how-does-stochastic-gradient-descent-with-momentum-distinguish-between-local-min – chuackt Oct 24 '22 at 07:49

0 Answers