Questions tagged [gradient]

32 questions
3 votes · 1 answer

How does the batch normalization layer resolve the vanishing gradient problem?

According to this article: https://towardsdatascience.com/the-vanishing-gradient-problem-69bf08b15484 The vanishing gradient problem occurs when using the sigmoid activation function, because sigmoid maps a large input space into a small space, so the…
3 votes · 3 answers

Why is the sign of the gradient (plus or minus) not enough for finding the steepest ascent?

Consider a simple 1-D function $y = x^2$, and suppose we want to find a maximum with the gradient ascent method. If we start at the point 3 on the x-axis: $$ \frac{\partial y}{\partial x} \biggr\rvert_{x=3} = 2x \biggr\rvert_{x=3} = 6 $$ This means that a direction in which…
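A minimal sketch (my own, not from the question) of why the update uses the full gradient rather than just its sign: with a fixed learning rate, a sign-only step always moves by the same amount, while a gradient-scaled step grows or shrinks with the local slope.

    def grad(x):
        return 2 * x  # dy/dx for y = x^2

    lr = 0.1
    x_full, x_sign = 3.0, 3.0
    for _ in range(5):
        x_full += lr * grad(x_full)                          # step length scales with |gradient|
        x_sign += lr * (1.0 if grad(x_sign) > 0 else -1.0)   # step length ignores the magnitude
        print(f"full gradient: x = {x_full:.3f}   sign only: x = {x_sign:.3f}")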
3 votes · 1 answer

Gradient Checking: Mean Squared Error. Why does a huge epsilon improve the discrepancy?

I am using custom C++ code, and coded a simple "Mean Squared Error" layer. I am temporarily using it for a classification task, not simple regression ... maybe this causes the issues? I don't have anything else before this layer - not even a simple…
Kari · 2,686
3 votes · 1 answer

Differentiable approximation for counting negative values in an array

I have an array of arrival times and I want to convert it to count data using PyTorch in a differentiable way. Example arrival times: arrival_times = [2.1, 2.9, 5.1] and let's say the total range is 6 seconds. What I want to have is: counts = [0,…
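One common relaxation (my own sketch, not the asker's code; the grid and temperature are assumptions) is to replace the hard step "arrival ≤ t" with a steep sigmoid, so the per-second cumulative counts stay differentiable with respect to the arrival times:

    import torch

    arrival_times = torch.tensor([2.1, 2.9, 5.1], requires_grad=True)
    t_grid = torch.arange(1, 7, dtype=torch.float32)   # seconds 1..6, the assumed range

    temperature = 0.01  # smaller -> closer to a hard count, but steeper gradients
    soft_counts = torch.sigmoid(
        (t_grid[:, None] - arrival_times[None, :]) / temperature
    ).sum(dim=1)

    soft_counts.sum().backward()     # gradients flow back to arrival_times
    print(soft_counts)               # approaches the hard counts [0, 0, 2, 2, 2, 3]
    print(arrival_times.grad)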
2 votes · 1 answer

Gradient passthrough in PyTorch

I need to quantize the inputs, but the method (bucketize) I need to use is not differentiable. I can of course detach the tensor, but then I lose the flow of gradients to earlier weights. I guess the question is quite simple: how do you continue…
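A minimal sketch of the straight-through estimator, a common workaround for exactly this situation (the bucket boundaries and shapes below are illustrative assumptions, not the asker's code): use the quantized values in the forward pass, but let gradients flow as if the operation were the identity.

    import torch

    boundaries = torch.tensor([0.25, 0.5, 0.75])                 # assumed bucket edges
    x = torch.rand(4, requires_grad=True)

    hard = torch.bucketize(x.detach(), boundaries).to(x.dtype)   # non-differentiable forward value
    quantized = x + (hard - x).detach()                          # forward: hard value, backward: identity

    quantized.sum().backward()
    print(x.grad)                                                # all ones: the gradient passes through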
2 votes · 1 answer

How to choose an appropriate epsilon value when approximating gradients to check training?

While approximating gradients, using the actual epsilon to shift the weights results in wildly large gradient approximations, as the "width" of the approximation triangle used is disproportionately small. In Andrew Ng's course he uses 0.01, but I…
Dávid Tóth · 145
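A small sketch of the usual trade-off (my own toy loss, not the asker's network): with the centered difference $(f(w+\epsilon) - f(w-\epsilon)) / (2\epsilon)$ in double precision, values around $10^{-4}$ to $10^{-6}$ typically balance truncation error (epsilon too large) against round-off error (epsilon too small).

    import numpy as np

    f = lambda w: np.sum(w ** 3)          # hypothetical loss with a non-zero third derivative
    w = np.array([0.3, -1.2, 2.0])
    true_grad = 3 * w ** 2

    for eps in [1e-1, 1e-4, 1e-7, 1e-12]:
        approx = np.array([
            (f(w + eps * e) - f(w - eps * e)) / (2 * eps)
            for e in np.eye(len(w))
        ])
        print(f"eps={eps:.0e}  max error={np.max(np.abs(approx - true_grad)):.2e}")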
2 votes · 1 answer

Tensorflow.Keras: How to get the gradient for an output class w.r.t. a given input?

I have implemented and trained a sequential model using tf.keras. Say I am given an input array of size 8x8 and an output [0,1,0,...(rest all 0)]. How do I calculate the gradient of the input w.r.t. the given output? model = ... output =…
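A minimal sketch with tf.GradientTape (the model architecture and shapes below are placeholders, not the asker's trained model): take the score of the class that is 1 in the one-hot output and differentiate it with respect to the input.

    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(8, 8)),
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(10),
    ])

    x = tf.random.uniform((1, 8, 8))
    with tf.GradientTape() as tape:
        tape.watch(x)                     # x is a plain tensor, so watch it explicitly
        logits = model(x)
        class_score = logits[:, 1]        # the class marked 1 in the one-hot output

    grad = tape.gradient(class_score, x)  # shape (1, 8, 8), same as the input
    print(grad.shape)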
2 votes · 1 answer

Vanishing Gradient vs Exploding Gradient as Activation function?

ReLU is used as an activation function that serves two purposes: breaking linearity in a DNN, and helping handle the vanishing gradient problem. For the exploding gradient problem, we use the gradient clipping approach, where we set the max threshold limit of…
vipin bansal · 1,252
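For reference, a minimal gradient-clipping sketch in PyTorch (model and data are hypothetical): gradients whose global norm exceeds the threshold are rescaled before the optimizer step, which bounds exploding gradients without changing their direction.

    import torch
    from torch import nn

    model = nn.Linear(10, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    x, y = torch.randn(32, 10), torch.randn(32, 1)

    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # cap the global gradient norm
    optimizer.step()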
2 votes · 1 answer

What does it mean for a method to be invariant to diagonal rescaling of the gradients?

In the paper that describes Adam: A Method for Stochastic Optimization, the authors state: The method is straightforward to implement, is computationally efficient, has little memory requirements, is invariant to diagonal rescaling of the…
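One way to read that claim (a sketch of my own, ignoring the small $\epsilon$ added to the denominator): if every gradient is rescaled as $g_t \mapsto D g_t$ for a fixed positive diagonal matrix $D$, then the moment estimates scale as $m_t \mapsto D m_t$ and $v_t \mapsto D^2 v_t$, so the elementwise update step is unchanged: $$ \frac{D\,\hat{m}_t}{\sqrt{D^2\,\hat{v}_t}} = \frac{D\,\hat{m}_t}{D\sqrt{\hat{v}_t}} = \frac{\hat{m}_t}{\sqrt{\hat{v}_t}} $$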
1 vote · 0 answers

Which Neural Network or Gradient Boosting framework is the simplest for Custom Loss Functions?

I need to implement a custom loss function. The function is relatively simple: $$-\sum \limits_{i=1}^m [O_{1,i} \cdot y_i-1] \ \cdot \ \operatorname{ReLU}(O_{1,i} \cdot \hat{y}_i - 1)$$ with $O$ being some external attribute specific to each case. I…
Borut Flis · 189
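A minimal sketch of one reading of that formula in PyTorch (shapes and the way $O$ is passed in are my assumptions, not the asker's setup); any framework with autograd will then supply the gradients of the custom loss for free.

    import torch

    def custom_loss(y_pred, y_true, o):
        # -sum_i (O_i * y_i - 1) * ReLU(O_i * y_hat_i - 1)
        return -torch.sum((o * y_true - 1) * torch.relu(o * y_pred - 1))

    y_pred = torch.rand(8, requires_grad=True)   # hypothetical predictions
    y_true = torch.rand(8)
    o = torch.rand(8)                            # the external per-sample attribute O

    loss = custom_loss(y_pred, y_true, o)
    loss.backward()                              # autograd supplies the gradient w.r.t. y_pred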
1 vote · 1 answer

Why does my manual derivative of Layer Normalization imply no gradient flow?

I recently tried computing the derivative of the layer norm function (https://arxiv.org/abs/1607.06450), an essential component of transformers, but the result suggests that no gradient flows through the operation, which can't be true. Here's my…
Alex · 13
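A quick autograd check (not the asker's derivation) highlights a common gotcha here: when the upstream gradient is constant across the normalized dimension (e.g. backpropagating out.sum()), the layer-norm input gradient really is zero, because mean-subtraction removes any constant direction; a non-constant upstream gradient shows that gradients do flow.

    import torch

    x = torch.randn(4, 8, requires_grad=True)
    ln = torch.nn.LayerNorm(8)

    ln(x).sum().backward()            # constant upstream gradient -> (near) zero input gradient
    print(x.grad.abs().max())

    x.grad = None
    (ln(x) ** 2).sum().backward()     # non-constant upstream gradient -> clearly non-zero
    print(x.grad.abs().max())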
1 vote · 1 answer

Vanishing gradient and zero gradient

There is a well-known problem, the vanishing gradient, in backpropagation training of a Feedforward Neural Network (FNN) (here we don't consider the vanishing gradient of a Recurrent Neural Network). I don't understand why a vanishing gradient does not mean the…
user6703592 · 127
1 vote · 1 answer

Can mini-batch gradient descent outperform batch gradient descent?

While going through the second course of Andrew Ng's deep learning specialization, I came across a sentence that said: With a well-tuned mini-batch size, usually it outperforms either gradient descent or stochastic gradient descent…
1 vote · 1 answer

CNN gradients with different magnitudes

I have a CNN architecture with two cross-entropy losses $\mathcal{L}_1$ and $\mathcal{L}_2$ summed into the total loss $\mathcal{L} = \mathcal{L}_1 + \mathcal{L}_2$. The task I want to solve is Unsupervised Domain Adaptation. I have attested the…
aretor · 117
1 vote · 0 answers

Matlab Optimization. Meaning of warning: "The slope should be 2. It appears to be 1."

I'm using the manopt package to solve some optimization problems in Matlab. The problem is of the form: problem.cost = @(x) f(x); problem.egrad = @(x) g(x). After the problem definition, I check the gradient consistency using the following…
Springberg · 111