Questions tagged [weight-initialization]

Use this tag when asking about the weight initialization of neural networks used in machine learning and deep learning.

Weight initialization describes how the weights and, if available, biases of a neural network are initialized.
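As a minimal illustration of what the tag covers, here is a sketch of two common initialization schemes in NumPy (the layer sizes are hypothetical, and the formulas are the standard Glorot-uniform and He-normal rules):

```python
import numpy as np

rng = np.random.default_rng(0)

def glorot_uniform(n_in, n_out):
    # Glorot/Xavier uniform: U(-limit, limit) with limit = sqrt(6 / (n_in + n_out)).
    limit = np.sqrt(6.0 / (n_in + n_out))
    return rng.uniform(-limit, limit, size=(n_in, n_out))

def he_normal(n_in, n_out):
    # He initialization, suited to ReLU layers: N(0, 2 / n_in).
    return rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out))

# Weights for a hypothetical 784 -> 256 dense layer; biases commonly start at zero.
W = glorot_uniform(784, 256)
b = np.zeros(256)
```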

51 questions
10
votes
2 answers

What is the difference between DDQN and DQN?

I don't think I understood the difference between DQN and DDQN in implementation. I understand that we update the target network during the running of DDQN, but I do not understand how it is done in this code. We put the…
7
votes
2 answers

What are the cases where it is fine to initialize all weights to zero?

I've taken a few online courses in machine learning, and in general, the advice has been to choose random weights for a neural network to ensure that your neurons don't all learn the same thing, breaking symmetry. However, there were other cases…
6
votes
1 answer

Is it wrong to use Glorot initialization with ReLU activation?

I'm reading that Keras's default initialization is glorot_uniform. However, all of the tutorials I see use relu activation as the go-to for hidden layers, yet I do not see them specifying the initialization for those layers as he. Would it be…
6
votes
2 answers

What are the reasons for drawing initial neural network weights from the Gaussian distribution?

Are there theoretical or empirical reasons for drawing initial weights of a multilayer perceptron from a Gaussian rather than from, say, a Cauchy distribution?
5
votes
1 answer

Where Does the Normal Glorot Initialization Come from?

The famous Glorot initialization was first described in the paper Understanding the difficulty of training deep feedforward neural networks. In that paper, they derive the following uniform initialization, cf. Eq. (16) in their…
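For reference, the uniform initialization the question refers to (Eq. (16) of Glorot & Bengio, 2010) draws the weights of a layer with fan-in $n_j$ and fan-out $n_{j+1}$ as:

```latex
W \sim U\left[-\frac{\sqrt{6}}{\sqrt{n_j + n_{j+1}}},\ \frac{\sqrt{6}}{\sqrt{n_j + n_{j+1}}}\right]
```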
4
votes
2 answers

Compare Coefficients of Different Regression Models

In my project, I am using a suite of shallow and deep learning models in order to see which has the best performance on my data. However, among the shallow machine learning models, I want to be able to compare the coefficients of each regression…
4
votes
4 answers

Why are deep learning models unstable compared to machine learning models?

I would like to understand why deep learning models are so unstable. Suppose I use the same dataset to train a machine learning model multiple times (for example logistic regression) and a deep learning model multiple times as well (for example…
3
votes
2 answers

Result of uniform weight initialization in all neurons

Background: cs231n poses a question regarding how to initialize weights. Question: Please confirm or correct my understanding. I think the weight value will be the same in all the neurons with ReLU activation. When W = 0 or less in all neurons, the…
3
votes
1 answer

How large of a value should a weight have in a neural network?

If you're assigning random values to the weights in a neural network before backpropagation, is there a certain maximum or minimum value for each weight (for example, 0 < w < 1000), or can weights take on any value? Could a network potentially…
2
votes
1 answer

Why is it okay to set the bias vector up with zeros, and not the weight matrices?

We do not initialize weight matrices with zeros because the symmetry isn’t broken during the backward pass, and subsequently in the parameter updating process. But it is safe to set the bias vector up with zeros, and they are updated…
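A tiny NumPy sketch of the symmetry argument raised here (the layer sizes are hypothetical): with all-zero weights every hidden unit computes the identical value and so receives the identical gradient, whereas zero biases combined with random weights still break symmetry.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=(4, 3))  # a batch of 4 example inputs

b = np.zeros(5)  # zero bias vector

# Random weights: each hidden unit computes a different pre-activation,
# so symmetry is broken even though the biases start at zero.
W = rng.normal(size=(3, 5)) * 0.1
z = x @ W + b
print(np.allclose(z, z[:, :1]))  # False: the units (columns) differ

# All-zero weights: every hidden unit computes the identical value, so in
# the backward pass every unit would receive the identical gradient update.
z0 = x @ np.zeros((3, 5)) + b
print(np.allclose(z0, z0[:, :1]))  # True: all units are identical
```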
2
votes
1 answer

A model that only works by setting all initial weights to zero

In this model from MusicNet, they set the initial weights of their neural network to all zeros. self.linear = torch.nn.Linear(regions*k, m, bias=False).cuda() torch.nn.init.constant(self.linear.weight, 0) However, people normally randomize the…
2
votes
3 answers

weight training speed too slow in CNNs

I'm writing my own CNN code from scratch. Though I get fast, converged, and satisfactory results, the trained weights change very little in value (while the cost/loss function drops rapidly over time in a seemingly converged manner). My initial weights:…
2
votes
0 answers

TF: What is the difference between the 'kernel weights' and the 'recurrent kernel weights' in LSTMs/GRUs?

Context: I am trying to understand the differences between the GRU/LSTM cells from tensorflow and pytorch (for research reproducibility) and noticed that TensorFlow differentiates between the kernel_initializer and the recurrent_initializer (see…
1
vote
0 answers

Question regarding weight initialization of an artificial neural network

This is what I'm trying to implement in Python: w0,...,w8 = vector w1 of shape (9,1); w9,...,w11 = vector w2 of shape (3,1); b0 (the first bias) is of shape (3,1); b1 is of shape (1,1); vector X is of shape (99, 3). I don't know where the problem resides…
1
vote
1 answer

Shared classifier for 3 neural networks (is this weights sharing?)

I would like to create 3 different VGGs with a shared classifier. Basically, each of these architectures has only the convolutions, and then I combine all the nets, with a classifier. For a better explanation, let’s see this image: I have no idea…