
When approximating gradients, using the actual machine epsilon to shift the weights results in wildly large gradient approximations, because the "width" of the approximation triangle is disproportionately small relative to floating-point precision. In Andrew Ng's course he uses 0.01, but I suppose that's for example purposes only.

This makes me wonder: is there a method to choose an appropriate epsilon value for gradient approximation based on, e.g., the current error value of the network?
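A minimal Python sketch of the effect described above. It assumes a plain central-difference approximation of d/dx sin(x) at x = 1 instead of a network's loss, so the exact derivative, cos(1), is known. With h equal to machine epsilon (`sys.float_info.epsilon`), f(x + h) − f(x − h) is only a few units in the last place of f, so round-off dominates the estimate, while moderate steps are far more accurate. The `rule_of_thumb` step, ε^(1/3)·max(1, |x|), is a common numerical-analysis heuristic for central differences; it is an addition here, not something from the course.

```python
import math
import sys


def central_difference(f, x, h):
    """Symmetric (two-sided) approximation of f'(x) using step h."""
    return (f(x + h) - f(x - h)) / (2 * h)


x = 1.0
exact = math.cos(x)                   # analytic derivative of sin at x
machine_eps = sys.float_info.epsilon  # about 2.22e-16 for IEEE 754 doubles
# Heuristic step for central differences (assumption, not from the course):
# balances truncation error (~h^2) against round-off error (~eps / h).
rule_of_thumb = machine_eps ** (1 / 3) * max(1.0, abs(x))

errors = {}
for h in (1e-1, 1e-2, 1e-5, rule_of_thumb, machine_eps):
    approx = central_difference(math.sin, x, h)
    errors[h] = abs(approx - exact)
    print(f"h = {h:.3e}  approx = {approx:.12f}  abs error = {errors[h]:.3e}")
```

For a network, x would be a single weight and f the loss with all other weights held fixed. Note that this heuristic scales with the magnitude of the weight, not with the current error value.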

Dávid Tóth



It sounds like the epsilon value is a hyperparameter and the error value is an evaluation metric. Given that, cross-validation can be used to find the epsilon value that minimizes the error value.
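A minimal sketch of that idea in Python, under stated assumptions: the answer does not name a model, so this uses a hypothetical toy linear model `y ≈ w*x + b` trained by gradient descent on central-difference gradients of a log-cosh loss, which makes the step `h` influence the trained parameters. K-fold cross-validation then scores each candidate `h` by mean validation MSE and keeps the lowest. All function names (`train`, `cross_validate_step`, etc.), the synthetic data, and the candidate values are illustrative.

```python
import math
import random


def log_cosh_loss(params, xs, ys):
    """Mean log-cosh loss of the toy linear model y ~ w*x + b."""
    w, b = params
    return sum(math.log(math.cosh(w * x + b - y)) for x, y in zip(xs, ys)) / len(xs)


def numeric_grad(loss, params, xs, ys, h):
    """Central-difference gradient: (L(p + h*e_i) - L(p - h*e_i)) / (2h)."""
    grad = []
    for i in range(len(params)):
        up = list(params)
        up[i] += h
        down = list(params)
        down[i] -= h
        grad.append((loss(up, xs, ys) - loss(down, xs, ys)) / (2 * h))
    return grad


def train(xs, ys, h, lr=0.5, steps=300):
    """Plain gradient descent driven only by the numerical gradient."""
    params = [0.0, 0.0]
    for _ in range(steps):
        g = numeric_grad(log_cosh_loss, params, xs, ys, h)
        params = [p - lr * gi for p, gi in zip(params, g)]
    return params


def mse(params, xs, ys):
    """Evaluation metric on held-out data: mean squared error."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def cross_validate_step(xs, ys, candidates, k=5, seed=0):
    """Return the step h with the lowest mean k-fold validation MSE."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = {}
    for h in candidates:
        fold_errors = []
        for fold in folds:
            held_out = set(fold)
            train_idx = [i for i in idx if i not in held_out]
            params = train([xs[i] for i in train_idx], [ys[i] for i in train_idx], h)
            fold_errors.append(mse(params, [xs[i] for i in fold], [ys[i] for i in fold]))
        scores[h] = sum(fold_errors) / k
    best = min(scores, key=scores.get)
    return best, scores


# Synthetic data (assumption): y = 3x + 2 plus small Gaussian noise.
rng = random.Random(42)
xs = [rng.uniform(-1.0, 1.0) for _ in range(60)]
ys = [3.0 * x + 2.0 + rng.gauss(0.0, 0.1) for x in xs]

best, scores = cross_validate_step(xs, ys, candidates=(0.5, 1e-4, 1e-10))
for h, score in scores.items():
    print(f"h = {h:.1e}  mean validation MSE = {score:.5f}")
print("selected h:", best)
```

Log-cosh was chosen deliberately: for a purely quadratic loss such as MSE, central differences are exact up to round-off, so every reasonable `h` would score almost identically and the search would have nothing to distinguish.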

Brian Spiering