
Suppose we have a learning rate $\alpha_n$ for the $n^{\text{th}}$ step of the gradient descent process. What would be the impact of using a constant value for $\alpha_n$ in gradient descent?

Umbrage

2 Answers


Intuitively, if $\alpha$ is too large you may "overshoot" the minimum and end up bouncing around the search space without converging. If $\alpha$ is too small, convergence will be slow and you could end up stuck on a plateau or in a local minimum.

That's why most learning rate schedules start with a somewhat larger learning rate for quick early gains and then reduce it gradually.
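
As a minimal sketch of that behavior (my own illustration, not code from the answer, assuming the toy one-dimensional cost $J(\theta) = \theta^2$), the snippet below compares a constant learning rate with a simple decay schedule $\alpha_n = \alpha_0 / (1 + \text{decay} \cdot n)$:

```python
def gradient_descent(alpha0, steps=50, decay=0.0, theta=5.0):
    """Minimize J(theta) = theta**2 by gradient descent and return the final theta."""
    for n in range(steps):
        alpha_n = alpha0 / (1.0 + decay * n)   # constant when decay == 0
        grad = 2.0 * theta                     # dJ/dtheta
        theta = theta - alpha_n * grad
    return theta

# Too-large constant rate: the iterate overshoots, oscillates, and diverges.
print(gradient_descent(alpha0=1.1))
# Small constant rate: converges, but slowly.
print(gradient_descent(alpha0=0.01))
# Larger starting rate with decay: fast early progress, stable later steps.
print(gradient_descent(alpha0=0.9, decay=0.5))
```

The three runs show exactly the trade-off above: divergence, slow convergence, and a decayed schedule that gets the quick early gains without the instability.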

oW_

Gradient descent has the following rule:

$\theta_{j} := \theta_{j} - \alpha \frac{\partial}{\partial \theta_{j}} J(\theta)$

Here $\theta_{j}$ is a parameter of your model and $J$ is the cost/loss function. As the iterates approach a minimum, the gradient $\frac{\partial}{\partial \theta_{j}} J(\theta)$ shrinks toward 0, so the update $\alpha \frac{\partial}{\partial \theta_{j}} J(\theta)$ gets smaller at each step even when $\alpha$ is fixed. $\alpha$ can be constant, and in many cases it is, but varying $\alpha$ might help you converge faster.
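
To illustrate (a sketch on hypothetical toy least-squares data, not code from the answer), the loop below applies this rule with a constant $\alpha$ and prints the update norm, which shrinks as the gradient itself approaches 0:

```python
import numpy as np

# Toy data generated from y = 1 + 2*x (hypothetical example for illustration only).
X = np.c_[np.ones(20), np.linspace(0.0, 1.0, 20)]   # design matrix with columns [1, x]
y = X @ np.array([1.0, 2.0])                        # targets with true theta = [1, 2]

theta = np.zeros(2)   # parameters theta_0 (intercept) and theta_1 (slope)
alpha = 0.5           # constant learning rate

for step in range(200):
    grad = X.T @ (X @ theta - y) / len(y)   # dJ/dtheta_j for J = mean squared error / 2
    theta = theta - alpha * grad            # simultaneous update of every theta_j
    if step % 50 == 0:
        # Even though alpha is fixed, the step alpha * grad shrinks as grad -> 0.
        print(step, np.linalg.norm(alpha * grad))

print(theta)   # close to [1.0, 2.0]
```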

Wes