Questions tagged [learning-rate]

43 questions
8 votes, 1 answer

When should you use learning rate scheduling over an adaptive learning rate optimization algorithm?

To converge to the optimum properly, various algorithms with adaptive learning rates have been invented, such as AdaGrad, Adam, and RMSProp. On the other hand, there are learning rate schedules such as power scheduling and…
Blaszard
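
A minimal sketch of the two options being contrasted, assuming TensorFlow/Keras (all values are placeholders): a power (inverse-time) schedule attached to plain SGD, next to the adaptive Adam optimizer.

```python
import tensorflow as tf

# Power (inverse-time) scheduling: lr(t) = lr0 / (1 + decay_rate * t / decay_steps).
power_schedule = tf.keras.optimizers.schedules.InverseTimeDecay(
    initial_learning_rate=0.1, decay_steps=1000, decay_rate=1.0)
sgd_scheduled = tf.keras.optimizers.SGD(learning_rate=power_schedule)

# Adaptive alternative: Adam keeps a single base rate but scales each parameter's
# step by running statistics of its gradients.
adam = tf.keras.optimizers.Adam(learning_rate=1e-3)
```
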
7 votes, 1 answer

Is it a good practice to always apply `ReduceLROnPlateau()`, given that models benefit from reducing the learning rate once learning stagnates?

The rationale behind the Keras function ReduceLROnPlateau() is that models benefit from reducing the learning rate once learning stagnates. Is it good practice to always apply ReduceLROnPlateau()? What are some situations, if any, in which not to apply…
user781486
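
A minimal sketch of the callback in question, assuming TensorFlow/Keras; the monitor, factor, patience, and floor are placeholder values, not recommendations.

```python
import tensorflow as tf

# Halve the learning rate whenever val_loss has not improved for 5 epochs,
# but never drop below 1e-6.
reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(
    monitor="val_loss", factor=0.5, patience=5, min_lr=1e-6)

# model.fit(x_train, y_train, validation_data=(x_val, y_val), callbacks=[reduce_lr])
# (model and data are not defined here; the callback only acts through fit().)
```
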
4 votes, 1 answer

Need to kickstart learning rates

I was just looking at the PyTorch docs for the different available schedulers and found one that I am having some trouble understanding. The others make sense: as training progresses, the learning rate gradually decreases. But in my…
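
The excerpt does not say which scheduler is meant, but one PyTorch scheduler that deliberately raises the learning rate again is CyclicLR; a hedged sketch with placeholder values:

```python
import torch

# CyclicLR bounces the rate between base_lr and max_lr instead of letting it
# decay monotonically, "kickstarting" it after each decay phase.
model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
scheduler = torch.optim.lr_scheduler.CyclicLR(
    optimizer, base_lr=1e-4, max_lr=1e-2, step_size_up=200)
# Unlike epoch-based schedulers, CyclicLR is stepped after every batch.
```
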
4 votes, 1 answer

Should a Learning Rate Scheduler adjust the learning rate by optimization step (batch) or by epoch?

The PyTorch docs suggest that torch.optim.lr_scheduler provides several methods to adjust the learning rate based on the number of epochs. However, other sources suggest that the learning rate should be adjusted at every optimization step…
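
A minimal sketch of the convention, using a toy model and dataset: epoch-based schedulers such as StepLR are stepped once per epoch, after the inner batch loop, while batch-based schedulers such as OneCycleLR are stepped after every optimizer update.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

data = TensorDataset(torch.randn(64, 10), torch.randn(64, 1))
loader = DataLoader(data, batch_size=16)

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

for epoch in range(30):
    for x, y in loader:
        optimizer.zero_grad()
        loss = torch.nn.functional.mse_loss(model(x), y)
        loss.backward()
        optimizer.step()
    scheduler.step()  # once per epoch for an epoch-based schedule
```
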
3 votes, 3 answers

Why is the sign of the gradient (plus or minus) not enough for finding the steepest ascent?

Consider a simple 1-D function $y = x^2$ whose maximum we want to find with the gradient ascent method. If we start at the point $x = 3$: $$ \frac{\partial f}{\partial x} \biggr\rvert_{x=3} = 2x \biggr\rvert_{x=3} = 6 $$ This means that a direction in which…
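
A toy illustration of the point at issue: the ascent step uses the gradient's value, not just its sign, so the step length scales with how steep the function is at the current point.

```python
# Ascent step on f(x) = x^2 starting at x = 3, as in the question.
lr = 0.1
x = 3.0
grad = 2 * x            # df/dx = 2x, so 6 at x = 3
x_next = x + lr * grad  # ascent step: 3 + 0.1 * 6 = 3.6
print(x_next)
```
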
3 votes, 1 answer

Which learning rate should I choose?

I'm training a segmentation model, Unet++, on 2D images and am now trying to find the optimal learning rate. The backbone of the model is Resnet34, I use the Adam optimizer, and the loss function is the Dice loss. Also, I use a few…
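
One hedged way to approach this is a short sweep over a log-spaced grid of candidate rates, keeping the value whose training loss drops fastest without diverging; the toy regression model below stands in for the Unet++/Resnet34 setup described in the question.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

data = TensorDataset(torch.randn(128, 10), torch.randn(128, 1))

for lr in [1e-5, 1e-4, 1e-3, 1e-2]:
    model = torch.nn.Linear(10, 1)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for x, y in DataLoader(data, batch_size=32):
        optimizer.zero_grad()
        loss = torch.nn.functional.mse_loss(model(x), y)
        loss.backward()
        optimizer.step()
    print(f"lr={lr:.0e}  last batch loss={loss.item():.4f}")
```
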
3 votes, 2 answers

Scikit-learn linear regression - learning rate and epoch adjustment

I am trying to learn linear regression using ordinary least squares and gradient descent from scratch. I read the documentation for the scikit-learn function and do not see a way to adjust the learning rate or the number of epochs with the…
chrisper
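
For context, scikit-learn's LinearRegression solves ordinary least squares in closed form and so exposes no learning rate or epoch count; SGDRegressor is the estimator that fits the same model by gradient descent and does expose them. A minimal sketch:

```python
from sklearn.linear_model import LinearRegression, SGDRegressor

ols = LinearRegression()  # closed-form OLS: nothing to tune for the optimizer
sgd = SGDRegressor(learning_rate="constant", eta0=0.01, max_iter=1000)
```
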
3 votes, 1 answer

Learning rate scheduler

A very important aspect of deep learning is the learning rate. Can someone tell me how to initialize the lr and how to choose the decay rate? I'm sure there are valuable pointers that some experienced people in the community can share with…
user
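
There is no universal answer, but a common hedged starting point is to pick an initial rate that trains stably and decay it exponentially; all numbers below are assumptions to tune, not recommendations (TensorFlow/Keras assumed).

```python
import tensorflow as tf

# Decay the rate by 4% every 1000 steps, starting from 1e-3.
schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-3, decay_steps=1000, decay_rate=0.96)
optimizer = tf.keras.optimizers.Adam(learning_rate=schedule)
```
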
2 votes, 1 answer

Intuition behind Adagrad optimization

The paper ADADELTA: An Adaptive Learning Rate Method describes a method called Adagrad, which has the following update rule: $$ X_{n+1} = X_n - \frac{\mathrm{lr}}{\sqrt{\sum_{i=0}^{n} g_i^2}}\, g_n $$ Now I understand that this update rule dynamically…
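
A bare-bones NumPy sketch of the quoted update rule, with a small epsilon added for numerical stability: the base rate is divided by the root of the accumulated squared gradients, so frequently updated coordinates get smaller steps over time.

```python
import numpy as np

def adagrad_step(x, grad, grad_sq_sum, lr=0.1, eps=1e-8):
    grad_sq_sum = grad_sq_sum + grad ** 2
    x = x - lr / (np.sqrt(grad_sq_sum) + eps) * grad
    return x, grad_sq_sum

x = np.array([3.0])
acc = np.zeros_like(x)
for _ in range(5):
    g = 2 * x               # gradient of f(x) = x^2
    x, acc = adagrad_step(x, g, acc)
print(x)
```
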
2 votes, 2 answers

Constant Learning Rate for Gradient Descent

Suppose we have a learning rate $\alpha_n$ for the $n^{\text{th}}$ step of the gradient descent process. What would be the impact of using a constant value for $\alpha_n$ in gradient descent?
Umbrage
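
A toy 1-D illustration of what a constant step size does on $f(x) = x^2$: the iterates satisfy $x_{n+1} = (1 - 2\alpha)x_n$, so a small constant $\alpha$ converges while a large one overshoots and diverges.

```python
def run(alpha, steps=10, x=3.0):
    for _ in range(steps):
        x = x - alpha * 2 * x   # gradient of x^2 is 2x
    return x

print(run(0.1))   # shrinks toward 0
print(run(1.1))   # grows without bound
```
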
2 votes, 1 answer

Is it necessary to tune the step size when using Adam?

The Adam optimizer has four main hyperparameters. For example, looking at the Keras interface, we have `keras.optimizers.Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.0, amsgrad=False)`. The first hyperparameter is called the step size…
2 votes, 1 answer

Constant validation loss & accuracy, training accuracy fluctuates

I am training a SqueezeNet model for binary classification of images. I have 79,968 images for training (a 50:50 split between the two classes) and 8,892 images in the validation set. After 35,000 iterations my training accuracy fluctuates between 1 and 0.96875. The…
2 votes, 1 answer

Why are optimization algorithms slower at critical points?

I just found the animation below from Alec Radford's presentation. As can be seen, all algorithms slow down considerably at the saddle point (where the derivative is 0) and speed up once they get out of it. Regular SGD simply gets stuck at the…
ShellRox
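
A small numerical illustration of why progress stalls there: near a saddle point the gradient is close to zero, so a fixed-learning-rate step is correspondingly tiny, and steps grow again once the gradient does.

```python
import numpy as np

# f(x, y) = x^2 - y^2 has a saddle point at the origin.
def grad(p):
    x, y = p
    return np.array([2 * x, -2 * y])

lr = 0.1
near_saddle = np.array([1e-3, 1e-4])
far_away = np.array([1.0, 0.5])
print(np.linalg.norm(lr * grad(near_saddle)))  # ~2e-4: an almost negligible step
print(np.linalg.norm(lr * grad(far_away)))     # ~0.22: a much larger step
```
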
1 vote, 1 answer

Is there a relationship between learning rate and training set size?

I have a large dataset for training a neural network model. However, I don't have enough resources to do proper hyperparameter tuning on the whole dataset. Therefore, my idea is to tune the learning rate on a subset of the data (let's say…
jakes
1 vote, 0 answers

Why does the learning rate influence whether I get an error from BCE or not?

When I use a learning rate higher than 0.001, I get this: Assertion `input_val >= zero && input_val <= one` failed. This means that the input I gave to BCE is above 1 or below 0, right? Why does changing the learning rate cause this error? Also, I…
SamuelS
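
For context: PyTorch's nn.BCELoss asserts its inputs lie in [0, 1], and a larger learning rate can push the network's outputs out of that range (for example through a missing or overflowing sigmoid). A hedged sketch of the usual workaround, BCEWithLogitsLoss, which takes raw logits and applies the sigmoid internally:

```python
import torch
import torch.nn as nn

logits = torch.tensor([3.2, -5.0, 0.7])   # raw outputs, not restricted to [0, 1]
targets = torch.tensor([1.0, 0.0, 1.0])
loss = nn.BCEWithLogitsLoss()(logits, targets)
print(loss)
```
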