Questions tagged [mini-batch-gradient-descent]

Mini-batch gradient descent is a variation of the gradient descent algorithm that splits the training dataset into small batches, which are used to calculate the model error and update the model coefficients. Implementations may average (or sum) the gradient over the mini-batch, which reduces the variance of the gradient estimate compared with single-example updates. Because the weights are updated several times per epoch rather than once, mini-batch training typically converges faster than full-batch gradient descent while remaining more computationally efficient than pure stochastic gradient descent.
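
As a concrete illustration of the update scheme described above, here is a minimal NumPy sketch of mini-batch gradient descent for linear regression (function and variable names are illustrative only, not taken from any question below): the data are shuffled each epoch, split into batches, and the gradient averaged over each batch drives one parameter update, so the weights are updated many times per epoch.

```python
import numpy as np

def minibatch_gradient_descent(X, y, lr=0.01, batch_size=32, epochs=100, seed=0):
    """Fit y ≈ X @ w + b by mini-batch gradient descent on the MSE loss."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0

    for _ in range(epochs):
        # Shuffle once per epoch so each mini-batch is a random subset.
        order = rng.permutation(n_samples)
        for start in range(0, n_samples, batch_size):
            idx = order[start:start + batch_size]
            X_batch, y_batch = X[idx], y[idx]

            # Gradient of the MSE averaged over the mini-batch; averaging
            # (rather than summing) keeps the step size independent of
            # the batch size.
            error = X_batch @ w + b - y_batch
            grad_w = X_batch.T @ error / len(idx)
            grad_b = error.mean()

            # One parameter update per mini-batch: many updates per epoch.
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b

# Toy usage: recover w ≈ [2, -3], b ≈ 0.5 from noisy synthetic data.
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 2))
y = X @ np.array([2.0, -3.0]) + 0.5 + 0.01 * rng.normal(size=1000)
w, b = minibatch_gradient_descent(X, y)
print(w, b)
```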

56 questions
22 votes • 2 answers

Sliding window leads to overfitting in LSTM?

Will I overfit my LSTM if I train it via the sliding-window approach? Why do people not seem to use it for LSTMs? For a simplified example, assume that we have to predict the sequence of characters: A B C D E F G H I J K L M N O P Q R S T U V W X Y…
Kari • 2,686
13 votes • 2 answers

Why does averaging the gradient work in gradient descent?

In full-batch gradient descent or mini-batch GD we compute gradients from several training examples. We then average them to obtain a "high-quality" gradient from several estimates and finally use it to correct the network all at once. But why…
Kari • 2,686
8 votes • 2 answers

Why is taking the gradient of the average error in SGD not correct, but rather the average of the gradients of single errors?

I am a little confused about taking averages in cost functions and SGD. So far I have always thought that in SGD you would compute the average error for a batch and then backpropagate it, but then I was told in a comment on this question that that was wrong…
7 votes • 1 answer

sklearn: SGDClassifier yields lower accuracy than LogisticRegression

I'm participating in the Kaggle Iceberg Classifier Challenge, where the idea is to classify whether an object present in a radar image is an iceberg or a ship. I am currently trying to implement stochastic gradient descent to get a better idea for…
6 votes • 1 answer

Changing the batch size during training

The choice of batch size is in some sense a measure of stochasticity: on the one hand, smaller batch sizes make gradient descent more stochastic, so SGD can deviate significantly from exact GD on the whole dataset, but they allow for more…
6 votes • 1 answer

Does small batch size improve the model?

I'm training an LSTM with Keras. I've noticed that the smaller the batch size, the more the loss decreases during each epoch, which makes me think that the network handles fewer items at a time better. Is this normal behavior in general?
pairon • 395
6 votes • 2 answers

Latent loss in variational autoencoder drowns generative loss

I'm trying to run a variational autoencoder on the CIFAR-10 dataset, for which I've put together a simple network in TensorFlow with 4 layers each in the encoder and decoder and an encoded vector size of 256. For calculating the latent loss, I'm…
6 votes • 1 answer

How backpropagation through gradient descent represents the error after each forward pass

In a neural network multilayer perceptron, I understand that the main difference between stochastic gradient descent (SGD) and gradient descent (GD) lies in how many samples are chosen during training. That is, SGD iteratively chooses one…
5 votes • 1 answer

Train loss vs validation loss

I have a few basic questions about tracking losses during training. If I am using mini-batch training, should I validate after each batch update or after I have seen the entire dataset? What should be the condition to stop the training to prevent…
4 votes • 2 answers

In sequence models, is it possible to have training batches with different timesteps each to reduce the required padding per input sequence?

I want to train an LSTM model with variable length inputs. Specifically I want to use as little padding as possible while still using minibatches. As far as I understand each batch requires a fixed number of timesteps for all inputs, necessitating…
3 votes • 1 answer

Will stochastic gradient descent converge for multivariate linear regression

I am trying to figure out if stochastic gradient descent for a multivariate linear regression will converge (assuming there is no mini-batching, i.e., the batch size is 1). My guess is yes, based on the fact that stochastic gradient descent will…
3 votes • 1 answer

Plotting Gradient Descent in 3d - Contour Plots

I have generated 3 parameters along with the cost function. I have the $\theta$ lists and the cost list of 100 values from the 100 iterations. I would like to plot the last 2 parameters against cost in 3d to visualize the level sets on the contour…
3 votes • 1 answer

Training a model on random samples from a large dataset

I have a huge dataset (more than 1 million data points). My dataset is text, and I am doing NER on it to identify a few entities. If I randomly choose 100 data points from the total dataset and train my model (LSTM), will this yield good results? I will be…
rawwar • 831
3 votes • 1 answer

What does a minibatch for an LSTM look like?

A minibatch is a collection of examples that are fed into the network (example after example), and backprop is done after every single example. We then take the average of these gradients and update our weights. This completes the processing of one minibatch. I…
Kari • 2,686
2 votes • 3 answers

How much of a problem is each member of a batch having the same label?

I have a batch size of 128 and a total data size of around 10 million, and I am classifying between 4 different label values. How much of a problem is it if each batch only contains data with one label? So for example - batch 0 all have the 3rd…