Questions tagged [backpropagation]

Use for questions about backpropagation, which is commonly used in training neural networks in conjunction with an optimization method such as gradient descent.

307 questions
108
votes
5 answers

Backprop Through Max-Pooling Layers?

This is a small conceptual question that's been nagging me for a while: How can we back-propagate through a max-pooling layer in a neural network? I came across max-pooling layers while going through this tutorial for Torch 7's nn library. The…
shinvu • 1,210 • 2 • 9 • 7
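For reference, a minimal NumPy sketch of the routing rule this question is about: in the backward pass, only the input position that won the max inside each pooling window receives the upstream gradient. The function name and the 2x2/stride-2 setup are illustrative assumptions, not the Torch 7 code from the question.

    import numpy as np

    def maxpool_backward_2x2(x, grad_out):
        """Backward pass of 2x2 max pooling with stride 2 on a (H, W) input."""
        grad_in = np.zeros_like(x)
        for i in range(0, x.shape[0], 2):
            for j in range(0, x.shape[1], 2):
                window = x[i:i + 2, j:j + 2]
                # locate the max inside this window; it gets all the gradient
                k, l = np.unravel_index(np.argmax(window), window.shape)
                grad_in[i + k, j + l] = grad_out[i // 2, j // 2]
        return grad_in

    x = np.arange(16.0).reshape(4, 4)
    print(maxpool_backward_2x2(x, np.ones((2, 2))))  # a single 1.0 per window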
40
votes
4 answers

Guidelines for selecting an optimizer for training neural networks

I have been using neural networks for a while now. However, one thing that I constantly struggle with is the selection of an optimizer for training the network (using backprop). What I usually do is just start with one (e.g. standard SGD) and then…
mplappert • 501 • 1 • 4 • 4
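A hedged PyTorch sketch of the trial-and-compare loop the question describes; the model, data, and learning rate below are placeholders, not the asker's setup.

    import torch
    from torch import nn

    def make_model():
        return nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))

    x, y = torch.randn(256, 10), torch.randn(256, 1)
    for name, opt_cls in [("SGD", torch.optim.SGD), ("Adam", torch.optim.Adam)]:
        model = make_model()
        opt = opt_cls(model.parameters(), lr=1e-3)
        for _ in range(200):
            opt.zero_grad()
            loss = nn.functional.mse_loss(model(x), y)
            loss.backward()   # backprop computes the gradients
            opt.step()        # the chosen optimizer applies them
        print(name, loss.item())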
26
votes
1 answer

Back-propagation in a CNN

I have the following CNN: I start with an input image of size 5x5. Then I apply convolution using a 2x2 kernel and stride = 1, which produces a feature map of size 4x4. Then I apply 2x2 max-pooling with stride = 2, which reduces the feature map to size 2x2.…
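The shapes in the excerpt check out; a quick PyTorch verification (random weights, since only the shapes matter here):

    import torch
    from torch import nn

    x = torch.randn(1, 1, 5, 5)                      # 5x5 input image
    conv = nn.Conv2d(1, 1, kernel_size=2, stride=1)  # -> 4x4 feature map
    pool = nn.MaxPool2d(kernel_size=2, stride=2)     # -> 2x2 feature map
    print(conv(x).shape)                             # torch.Size([1, 1, 4, 4])
    print(pool(conv(x)).shape)                       # torch.Size([1, 1, 2, 2])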
26
votes
4 answers

Gradients for bias terms in backpropagation

I was trying to implement a neural network from scratch to understand the maths behind it. My problem relates to backpropagation when we take the derivative with respect to the bias, and I derived all the equations used in backpropagation. Now…
user34042 • 395 • 1 • 3 • 7
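The short answer to this class of question: for a dense layer $z = Wx + b$ we have $\partial z / \partial b = 1$, so the bias gradient is just the upstream delta summed over the batch. A minimal NumPy sketch with illustrative names (not the asker's code):

    import numpy as np

    batch, n_in, n_out = 32, 10, 5
    x = np.random.randn(batch, n_in)
    delta = np.random.randn(batch, n_out)   # dL/dz from the layer above

    grad_W = x.T @ delta                    # shape (n_in, n_out)
    grad_b = delta.sum(axis=0)              # shape (n_out,): the bias gradient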
24
votes
1 answer

Deep Neural Network - Backpropagation with ReLU

I'm having some difficulty deriving back-propagation with ReLU, and I did some work, but I'm not sure if I'm on the right track. Cost function: $\frac{1}{2}(y-\hat y)^2$, where $y$ is the real value and $\hat y$ is the predicted value. Also assume…
user1157751 • 669 • 1 • 7 • 21
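For orientation, assuming the truncated setup ends with a prediction $\hat y = \mathrm{ReLU}(z)$ (an assumption; the excerpt is cut off), the key step is

$$\frac{\partial L}{\partial z} = \frac{\partial L}{\partial \hat y}\cdot\frac{\partial \hat y}{\partial z} = (\hat y - y)\cdot\begin{cases}1 & z > 0\\ 0 & z \le 0,\end{cases}$$

i.e. the ReLU passes the error term through wherever its input was positive and blocks it elsewhere ($z = 0$ is conventionally given derivative 0).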
23
votes
1 answer

How do Gradient Descent and Backpropagation work together?

Please forgive me as I am new to this. I have attached a diagram trying to model my understanding of neural networks and back-propagation. From videos on Coursera and resources online I formed the following understanding of how neural networks…
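A minimal PyTorch sketch of the division of labour the question asks about: backpropagation computes the gradients, gradient descent uses them to update the parameters. The toy model and learning rate are illustrative.

    import torch

    w = torch.randn(3, requires_grad=True)
    x, y = torch.randn(3), torch.tensor(1.0)
    lr = 0.1
    for _ in range(50):
        loss = (w @ x - y) ** 2
        loss.backward()            # backpropagation: fills w.grad
        with torch.no_grad():
            w -= lr * w.grad       # gradient descent: the update step
            w.grad.zero_()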
22
votes
1 answer

Understanding Timesteps and Batch Size of Keras LSTMs considering Hidden States and TBPTT

What I am trying to do is predict the next data point $x_t$ for each point in the time series $[x_0, x_1, x_2, \dots, x_T]$, in the context of a real-time data stream; in theory the series is infinite. If a new value $x$ is…
KenMarsu
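A hedged sketch of the stateful setup such questions usually revolve around: batch size 1, one timestep per update, with the hidden state carried across calls rather than reset. All details below (sine-wave data, 16 units) are assumptions, not the asker's code.

    import numpy as np
    from tensorflow import keras

    model = keras.Sequential([
        keras.Input(batch_shape=(1, 1, 1)),    # batch 1, 1 timestep, 1 feature
        keras.layers.LSTM(16, stateful=True),  # state persists between batches
        keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")

    stream = np.sin(np.linspace(0, 10, 200)).astype("float32")
    for t in range(len(stream) - 1):
        x = stream[t:t + 1].reshape(1, 1, 1)   # one new point from the stream
        y = stream[t + 1:t + 2].reshape(1, 1)  # target: the next point
        model.train_on_batch(x, y)             # hidden state is not reset here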
22
votes
2 answers

Sliding window leads to overfitting in LSTM?

Will I overfit my LSTM if I train it via the sliding-window approach? Why do people not seem to use it for LSTMs? For a simplified example, assume that we have to predict the sequence of characters: A B C D E F G H I J K L M N O P Q R S T U V W X Y…
Kari • 2,686 • 1 • 17 • 47
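For context, the sliding-window construction at issue turns every length-$w$ window into a training sample, so consecutive samples overlap almost entirely; that heavy overlap is what drives the overfitting concern. A tiny sketch on the excerpt's alphabet example (window length 4 is an arbitrary choice):

    seq = "ABCDEFGHIJKLMNOPQRSTUVWXY"
    w = 4
    samples = [(seq[i:i + w], seq[i + w]) for i in range(len(seq) - w)]
    print(samples[:3])  # [('ABCD', 'E'), ('BCDE', 'F'), ('CDEF', 'G')]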
21
votes
1 answer

What do "compile", "fit", and "predict" do in Keras sequential models?

I am a little confused about these parts of the Keras sequential model API. Could someone explain what exactly each one does? I mean, does compile do the forward pass and calculate the cost function, then pass it through fit to do the backward…
user3486308 • 1,260 • 5 • 16 • 27
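Roughly: compile() only configures the optimizer, loss, and metrics (no passes are run); fit() runs the forward pass, backpropagation, and the weight updates; predict() runs the forward pass only. A minimal sketch with placeholder data:

    import numpy as np
    from tensorflow import keras

    x, y = np.random.rand(100, 4), np.random.rand(100, 1)
    model = keras.Sequential([
        keras.Input(shape=(4,)),
        keras.layers.Dense(8, activation="relu"),
        keras.layers.Dense(1),
    ])
    model.compile(optimizer="sgd", loss="mse")  # configuration only
    model.fit(x, y, epochs=5, verbose=0)        # forward + backward + update
    preds = model.predict(x)                    # forward pass only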
18
votes
4 answers

Question about bias in Convolutional Networks

I am trying to figure out how many weights and biases are needed for a CNN. Say I have a (3, 32, 32)-image and want to apply a (32, 5, 5)-filter. For each feature map I have 5x5 weights, so I should have 3 x (5x5) x 32 parameters. Now I need to add…
user • 1,971 • 6 • 20 • 36
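The count in the excerpt is right for the weights: 32 filters × (3 × 5 × 5) = 2,400, plus one bias per filter, i.e. per output feature map, not per output location. A quick PyTorch check:

    import torch
    from torch import nn

    conv = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=5)
    print(conv.weight.shape)  # torch.Size([32, 3, 5, 5]) -> 2400 weights
    print(conv.bias.shape)    # torch.Size([32])          -> 32 biases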
16
votes
1 answer

Back-propagation through max pooling layers

I have a small sub-question to this question. I understand that when back-propagating through a max pooling layer the gradient is routed back in a way that the neuron in the previous layer which was selected as max gets all the gradient. What I'm…
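The routing described in the excerpt is easy to confirm with autograd: after pooling, only the argmax position of each window carries a nonzero gradient (a random 4x4 input is assumed, so ties are negligible).

    import torch

    x = torch.randn(1, 1, 4, 4, requires_grad=True)
    torch.nn.functional.max_pool2d(x, kernel_size=2).sum().backward()
    print(x.grad)  # exactly one 1.0 per 2x2 window, zeros elsewhere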
14
votes
1 answer

Differences between gradient calculated by different reduction methods in PyTorch

I'm playing with the different reduction methods provided in the built-in loss functions. In particular, I would like to compare the following: the averaged gradient obtained by performing a backward pass for each loss value calculated with reduction="none"; the…
Zhuoran Liu • 141 • 1 • 3
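A hedged sketch of the comparison the question sets up, on a toy linear model: the gradient from reduction="mean" should match the mean over the per-sample losses from reduction="none", up to floating-point error.

    import torch
    from torch import nn

    w = torch.randn(4, 1, requires_grad=True)
    x, y = torch.randn(8, 4), torch.randn(8, 1)

    # one backward pass on the pre-averaged loss
    nn.functional.mse_loss(x @ w, y, reduction="mean").backward()
    grad_mean = w.grad.clone()

    # per-sample losses, averaged before the backward pass
    w.grad = None
    nn.functional.mse_loss(x @ w, y, reduction="none").mean().backward()
    print(torch.allclose(grad_mean, w.grad))  # True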
14
votes
3 answers

Creating neural net for xor function

It is a well-known fact that a 1-layer network cannot learn the XOR function, since it is not linearly separable. I attempted to create a 2-layer network, using the logistic sigmoid function and backprop, to predict XOR. My network has 2 neurons…
user • 1,971 • 6 • 20 • 36
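A from-scratch NumPy sketch of the 2-layer sigmoid network the question describes, trained on XOR with plain backprop. The hyperparameters (hidden size 2, learning rate 0.5, 10,000 epochs, seed) are illustrative; with only 2 hidden units the net can land in a local minimum, so a different seed is sometimes needed.

    import numpy as np

    rng = np.random.default_rng(0)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    W1, b1 = rng.normal(size=(2, 2)), np.zeros(2)
    W2, b2 = rng.normal(size=(2, 1)), np.zeros(1)
    lr = 0.5
    for _ in range(10000):
        h = sigmoid(X @ W1 + b1)             # hidden layer
        out = sigmoid(h @ W2 + b2)           # output layer
        d_out = (out - y) * out * (1 - out)  # delta at the output
        d_h = (d_out @ W2.T) * h * (1 - h)   # delta at the hidden layer
        W2 -= lr * (h.T @ d_out)
        b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * (X.T @ d_h)
        b1 -= lr * d_h.sum(axis=0)
    print(out.round().ravel())  # ideally [0. 1. 1. 0.]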
12
votes
2 answers

How does backpropagation work through a max-pooling layer when doing a batch?

Let's assume that we are using a batch size of 100 samples for learning. So in every batch, the weight of every neuron (and bias, etc.) is updated by subtracting the learning rate times the average error value that we found using the 100…
Nathan B • 241 • 1 • 2 • 5
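One point worth separating out here: the max-pool routing happens per sample. Each sample in the batch sends gradient to its own argmax locations, and only the resulting parameter gradients get averaged in the update. A tiny check (batch of 2, assumed setup):

    import torch

    x = torch.randn(2, 1, 2, 2, requires_grad=True)  # batch of 2 samples
    torch.nn.functional.max_pool2d(x, 2).sum().backward()
    print(x.grad[0], x.grad[1])  # each sample routes to its own max position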
11
votes
1 answer

Synthetic Gradients: good number of layers & neurons

I would like to train my LSTM with a "synthetic gradients" Decoupled Neural Interface (DNI). How do I decide on the number of layers and neurons for my DNI? Searching for them by trial and error, or worse, by a genetic algorithm, which would…
Kari • 2,686 • 1 • 17 • 47
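For context, the DNI gradient model in question is itself just a small network mapping a layer's activations to a predicted gradient of the same shape, so "layers and neurons" here means the depth and width of this auxiliary net. A loose sketch in the spirit of Jaderberg et al. (2016); the sizes are illustrative, not recommendations.

    import torch
    from torch import nn

    class SyntheticGradient(nn.Module):
        """Predicts dL/d(activations) from the activations themselves."""
        def __init__(self, dim, hidden=128):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))

        def forward(self, activations):
            return self.net(activations)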