Consider using the gradient ascent method to find a maximum of a simple 1-D function, $y = x^2$.
If we start at the point $x = 3$ on the x-axis:
$$ \frac{\partial f}{\partial x} \biggr\rvert_{x=3} = 2x \biggr\rvert_{x=3} = 6 $$
This means that the direction in which we should move is $6$.
Gradient ascent gives the update rule: `x = old_x + learning_rate * gradient`
What I can't understand is why we need to multiply the `learning_rate` by the gradient. Why can't we just use `x = old_x + learning_rate * sign(gradient)`?
After all, if we take a step of size `learning_rate` in the positive direction, that is already the largest change in x we can make.
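
To make the comparison concrete, here is a minimal Python sketch of the two update rules applied to $y = x^2$, starting from $x = 3$. The helper names (`gradient`, `sign`, `ascend`), the learning rate of 0.1 and the three steps are assumptions chosen only for illustration.

```python
def gradient(x):
    # Derivative of y = x**2 with respect to x.
    return 2 * x


def sign(value):
    # Returns 1, -1 or 0 depending on the sign of value.
    return (value > 0) - (value < 0)


def ascend(x, learning_rate, steps, use_sign):
    # Apply the update rule `steps` times and record every x visited.
    path = [x]
    for _ in range(steps):
        g = gradient(x)
        direction = sign(g) if use_sign else g
        x = x + learning_rate * direction
        path.append(x)
    return path


# Rule 1: x = old_x + learning_rate * gradient
plain = ascend(3.0, 0.1, 3, use_sign=False)
# Rule 2: x = old_x + learning_rate * sign(gradient)
signed = ascend(3.0, 0.1, 3, use_sign=True)

print(plain)
print(signed)
```

With the first rule the step size scales with the gradient (roughly 0.6, then 0.72, then 0.864 here), while with the sign rule every step has the same size, `learning_rate`.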
I know the reasoning behind finding the direction of maximum increase from this equation:
$$ \operatorname{grad}(f(x)) \cdot \vec{v} = |\operatorname{grad}(f(x))| \, |\vec{v}| \cos(\theta) $$
But I can't understand why using only the sign of the gradient (plus or minus) is not enough for ascending.
